Geometric ergodicity of Gibbs samplers for the Horseshoe and its regularized variants
Suman K. Bhattacharya, Kshitij Khare and Subhadip Pal
Department of Statistics, University of Florida
Department of Bioinformatics and Biostatistics, University of Louisville
Abstract:
The Horseshoe is a widely used and popular continuous shrinkage prior for high-dimensional Bayesian linear regression. Recently, regularized versions of the Horseshoe prior have also been introduced in the literature. Various Gibbs sampling Markov chains have been developed in the literature to generate approximate samples from the corresponding intractable posterior densities. Establishing geometric ergodicity of these Markov chains provides crucial technical justification for the accuracy of asymptotic standard errors for Markov chain based estimates of posterior quantities. In this paper, we establish geometric ergodicity for various Gibbs samplers corresponding to the Horseshoe prior and its regularized variants in the context of linear regression. First, we establish geometric ergodicity of a Gibbs sampler for the original Horseshoe posterior under strictly weaker conditions than existing analyses in the literature. Second, we consider the regularized Horseshoe prior introduced in [17], and prove geometric ergodicity for a Gibbs sampling Markov chain to sample from the corresponding posterior without any truncation constraint on the global and local shrinkage parameters. Finally, we consider a variant of this regularized Horseshoe prior introduced in [14], and again establish geometric ergodicity for a Gibbs sampling Markov chain to sample from the corresponding posterior.
MSC 2010 subject classifications:
Primary 60J05, 60J20; secondary 33C10.
Keywords and phrases:
Markov chain Monte Carlo, geometric ergodicity, high-dimensional linear regression, Horseshoe prior.
1. Introduction
Consider the linear model y = Xβ + σε, where y ∈ R^n is the response vector, X is the n × p design matrix, β ∈ R^p is the vector of regression coefficients, ε is the error vector with i.i.d. standard normal components, and σ² is the error variance. The goal is to estimate the unknown parameters (β, σ²). In modern applications, datasets where the number of predictors p is much larger than the sample size n are commonly encountered. A standard approach for meaningful statistical estimation in these over-parametrized settings is to assume that only a few of the signals are prominent (the others are small/insignificant). This is mathematically formalized by assuming that the underlying regression coefficient vector is sparse. In the Bayesian paradigm, this assumption of sparsity is accommodated either by choosing spike-and-slab priors (mixtures of a point mass at zero and an absolutely continuous density) or absolutely continuous shrinkage priors which selectively shrink the small/insignificant signals.

A variety of useful shrinkage priors have been proposed in the literature (see [2, 4, 18] and the references therein), and the Horseshoe prior ([4]) is a widely used and highly popular choice. The Horseshoe prior for linear regression is specified as follows:

    β | λ, σ², τ² ~ N_p(0, σ²τ²Λ)
    λ_i ~ C⁺(0, 1) independently for i = 1, 2, ..., p
    τ ~ π_τ(·), σ² ~ Inverse-Gamma(a, b),   (1.1)

where N_d denotes the d-variate normal density, Λ is a diagonal matrix with diagonal entries given by {λ_j²}_{j=1}^p, and Inverse-Gamma(a, b) denotes the Inverse-Gamma density with shape parameter a and rate parameter b. The vector λ = (λ_j)_{j=1}^p is referred to as the vector of local (component-wise) shrinkage parameters, while τ is referred to as the global shrinkage parameter. The resulting posterior distribution of (β, σ²) is intractable in the sense that closed form computations or i.i.d. sampling from this distribution are not feasible. Several Gibbs sampling Markov chains have been proposed in the literature to generate approximate samples from the Horseshoe posterior; see for example [1, 7, 8, 12].

The fact that parameter values which are far away from zero are not regularized at all due to the heavy tails is considered to be a key strength of the Horseshoe prior. However, as pointed out in Piironen, Vehtari [17], this can be undesirable when the parameters are only weakly identified. To address this issue, [17] introduced the regularized Horseshoe prior, given by

    β_i | λ, σ², τ² ~ N(0, (1/c² + 1/(λ_i²τ²))⁻¹ σ²) independently for i = 1, 2, ..., p
    λ_i ~ C⁺(0, 1) independently for i = 1, 2, ..., p
    τ ~ π_τ(·), σ² ~ Inverse-Gamma(a, b).

Here c is a finite constant which controls additional regularization of all regression parameters (large and small). The original Horseshoe prior can be recovered by letting c → ∞. Piironen, Vehtari [17] use a Hamiltonian Monte Carlo (HMC) based approach to generate approximate samples from the corresponding regularized Horseshoe posterior distribution. Also, any Gibbs sampler for the Horseshoe posterior can be suitably adapted to the regularized setting.

For any practitioner using Markov chain Monte Carlo, it is crucial to understand the accuracy of the resulting MCMC based estimates by obtaining valid standard errors for these estimates. The notion of geometric ergodicity plays an important role in this endeavor, as explained below. Let (β_m, σ²_m)_{m ≥ 0} denote a Harris ergodic Markov chain with the Horseshoe or regularized Horseshoe posterior density, denoted by π_H(· | y), as its stationary density. The Markov chain is said to be geometrically ergodic if

    ‖K^m_{β₀,σ₀²} − Π_H‖_TV ≤ C(β₀, σ₀²) γ^m

for some γ ∈ (0, 1), where K^m_{β₀,σ₀²} denotes the distribution of the Markov chain started at (β₀, σ₀²) after m steps, Π_H denotes the stationary distribution, and ‖·‖_TV denotes the total variation norm. Suppose we wish to evaluate the posterior expectation

    E_{π_H(·|y)} g = ∫∫ g(β, σ²) π_H(β, σ² | y) dβ dσ²

for a real-valued measurable function g of interest. Harris ergodicity guarantees that the Markov chain based estimator

    ḡ_m := (1/(m+1)) Σ_{i=0}^m g(β_i, σ²_i)

is strongly consistent for E_{π_H(·|y)} g. An estimate by itself, however, is not quite useful without an associated standard error. All known methods to compute consistent estimates (see for example [6], [4]) of the standard error for ḡ_m require the existence of a Markov chain CLT which establishes

    √m (ḡ_m − E_{π_H(·|y)} g) → N(0, σ²_g)

for some σ²_g ∈ (0, ∞).
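For concreteness, the following R sketch illustrates how such a CLT is typically turned into a standard error in practice, via the classical batch means estimator of σ²_g. This is our illustration, not code from the paper: the helper name batch_means_se and the AR(1) process standing in for MCMC output are hypothetical, and dedicated packages such as mcmcse offer more refined estimators.

```r
# Batch means estimate of the asymptotic variance sigma_g^2 in the Markov
# chain CLT, applied to a scalar functional g of the MCMC output.
# 'draws' holds g(beta_i, sigma_i^2) for i = 0, ..., m.
batch_means_se <- function(draws) {
  m <- length(draws)
  a <- floor(sqrt(m))                 # number of batches
  b <- floor(m / a)                   # batch length
  batch_means <- sapply(seq_len(a),
                        function(k) mean(draws[((k - 1) * b + 1):(k * b)]))
  sigma2_hat <- b * var(batch_means)  # estimates sigma_g^2
  sqrt(sigma2_hat / m)                # standard error of the MCMC average
}

# Toy illustration with an autocorrelated stand-in for MCMC draws:
set.seed(1)
g_draws <- as.numeric(arima.sim(list(ar = 0.9), n = 1e5))
c(estimate = mean(g_draws), se = batch_means_se(g_draws))
```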
In turn, the standard approach for establishing a Markov chain CLT requires proving geometric ergodicity of the underlying Markov chain. To summarize, proving geometric ergodicity helps rigorously establish the asymptotic validity of CLT based standard error estimates used by MCMC practitioners.

Establishing geometric ergodicity for continuous state space Markov chains encountered in most statistical applications is in general a very challenging task. For a significant majority of Markov chains in statistical applications, the question of whether they are geometrically ergodic or not has not been resolved, although there have been some success stories. In the context of Markov chains arising in Bayesian shrinkage, geometric ergodicity of Gibbs samplers corresponding to various shrinkage priors such as the Bayesian lasso, Normal-Gamma, Dirichlet-Laplace and double Pareto priors has been recently established in [9, 15, 16].

Results for the Horseshoe prior remained elusive until very recently. The marginal Horseshoe prior on entries of β (integrating out λ, given τ) has an infinite spike near zero and significantly heavier tails than the shrinkage priors mentioned above. This structure, while making it very attractive for sparsity selection, implicitly creates a lot of complications and challenges in the geometric ergodicity analysis using drift and minorization techniques. Recently, the authors in Johndrow et al. [8] derived a two-block Gibbs sampler for the Horseshoe posterior (the 'exact algorithm' in [8, Section 2.1], henceforth referred to as the JOB Gibbs sampler), and established geometric ergodicity ([8, Theorem 14]). However, the truncation assumptions needed for this result are rather restrictive, requiring all the local shrinkage parameters λ_i to be bounded above by a finite constant, and also requiring the global shrinkage parameter τ to be bounded above and below by finite positive constants. In parallel work (Biswas et al. [3], uploaded on arXiv a few days prior to our submission), geometric ergodicity for the JOB Gibbs sampler has now been established without requiring truncation of the local shrinkage parameters. However, the requirement of the global shrinkage parameter τ to be bounded above and below remains.

Contribution: The first contribution of this paper is the proof of geometric ergodicity for a Horseshoe Gibbs sampler (see Theorem 2.1) with no truncation assumptions on the local shrinkage parameters, and with the global shrinkage parameter only required to be truncated below by a finite positive constant and to have a finite δ-th prior moment for some δ > 0. As discussed in Remark 2.1, the assumption of truncation below by a positive constant can be further relaxed to the existence of the negative (p + δ)/2-th prior moment for some δ > 0. Our approach focuses on the λ-block of the Gibbs sampler and establishes a drift condition (Lemma 2.1) using a drift function which is 'unbounded off compact sets', and that directly leads to geometric ergodicity. On the other hand, the approaches in [8, 3] use other drift functions (using all the parameters or a different parameter block than λ) which are not unbounded off compact sets, and hence need an additional minorization argument.

Next we move to the regularized Horseshoe setting of Piironen, Vehtari [17].
As mentioned previously, [17] use a Hamiltonian Monte Carlo (HMC) based approach to generate approximate samples from the corresponding regularized Horseshoe posterior distribution, but do not investigate geometric ergodicity of the proposed Markov chain. It is not clear whether the intricate sufficient conditions needed for geometric ergodicity of HMC chains in Livingstone et al. [10] apply to the HMC chain in [17]. Given the variety of efficient Gibbs samplers available for the original Horseshoe posterior, it is natural to consider an appropriately adapted version of any of these samplers for the regularized Horseshoe posterior.

Contribution: As the second main contribution of this paper, we establish geometric ergodicity for one such Gibbs sampler for the regularized Horseshoe posterior (see Theorem 3.1) with no truncation assumptions on the global and local shrinkage parameters at all. The seemingly minor change in the prior structure (compared to the original Horseshoe) leads to crucial changes in our convergence analysis. For example, we need a different drift function for this analysis (Lemma 3.1) compared to the original Horseshoe analysis. This drift function is not 'unbounded off compact sets', and hence we need an additional minorization condition (Lemma 3.2) to establish geometric ergodicity.

Recently, Nishimura, Suchard [14] construct a further variant of the regularized Horseshoe prior of [17] by changing the algebraic form of the conditional prior density of β for computational simplicity. Their prior specification is as follows:

    π(β_j, λ_j² | τ², σ²) ∝ (τ²λ_j²)^{−1/2} exp[−(β_j²/(2σ²))(1/c² + 1/(τ²λ_j²))] π_ℓ(λ_j²) independently for j = 1, 2, ..., p
    τ ~ π_τ(·), σ² ~ Inverse-Gamma(a, b).

The algebraic modification, in particular the removal of the factor (c⁻² + (λ_i²τ²)⁻¹)^{1/2} from the normalizing constant of the conditional prior for β_i, simplifies posterior computation (see Section 3.3 for more details). Nishimura, Suchard [14] prove geometric ergodicity for the related but structurally different setting of Polya-Gamma logistic regression assuming that the global shrinkage parameter τ is bounded above and below by finite positive constants. However, as discussed in Remark 3.1, several details of this analysis break down in the linear regression setting.

Contribution: We focus on the linear regression setting, and leverage our analysis in the original Horseshoe setting to prove geometric ergodicity of a Gibbs sampler corresponding to [14]'s regularized variant with the global shrinkage parameter only required to be bounded below by a finite positive constant and to have a finite (p + δ)/2-th moment for some δ > 0.
2. Geometric ergodicity of a Horseshoe Gibbs sampler
2.1. The Gibbs sampler

In this section, we describe in detail the Horseshoe Gibbs sampler that will be analyzed in subsequent sections. As pointed out in Makalic, Schmidt [12], if λ_j² | ν_j ~ Inverse-Gamma(1/2, 1/ν_j) and ν_j ~ Inverse-Gamma(1/2, 1), then marginally λ_j ~ C⁺(0, 1). Hence, with ν = (ν₁, ν₂, ..., ν_p), the Horseshoe prior in (1.1) can be alternatively written as

    β | λ, σ², τ² ~ N_p(0, σ²τ²Λ)
    λ_j² | ν ~ Inverse-Gamma(1/2, 1/ν_j) independently for j = 1, 2, ..., p
    ν_j ~ Inverse-Gamma(1/2, 1) independently for j = 1, 2, ..., p
    τ ~ π_τ(·), σ² ~ Inverse-Gamma(a, b).   (2.1)

Using the prior above and after straightforward calculations, the various conditional posterior distributions can be derived as follows:

    β | σ², τ², λ, ν, y ~ N_p(A⁻¹Xᵀy, σ²A⁻¹)
    σ² | τ², λ, ν, y ~ Inverse-Gamma(a + n/2, yᵀ(I_n − P̃_X)y/2 + b)
    λ_j² | ν_j, σ², τ², β_j, y ~ Inverse-Gamma(1, 1/ν_j + β_j²/(2σ²τ²)) independently for j = 1, 2, ..., p
    ν_j | λ_j², τ², y ~ Inverse-Gamma(1, 1 + 1/λ_j²) independently for j = 1, 2, ..., p
    τ² | λ, y ~ π(τ² | λ, y) ∝ (yᵀ(I_n − P̃_X)y/2 + b)^{−(a + n/2)} |I_p + XᵀXΛ_*|^{−1/2} π_τ(τ²),   (2.2)

where Λ_* = τ²Λ, A = XᵀX + Λ_*⁻¹ and P̃_X = XA⁻¹Xᵀ.

Consider a two-block Gibbs sampling Markov chain with transition kernel K_aug (with blocks (β, σ², ν, τ²) and λ) whose one-step transition from (β₀, σ₀², ν₀, τ₀², λ₀) to (β₁, σ₁², ν₁, τ₁², λ₁) is given as follows.

1. Draw (β₁, σ₁², ν₁, τ₁²) from π(β, σ², ν, τ² | λ₀, y). This can be done by sequentially drawing τ², then ν, then σ², and then β from the appropriate conditional posterior densities in (2.2).
2. Draw λ₁ from π(λ | β₁, ν₁, σ₁², τ₁², y). This can be done by independently drawing the components of λ from the appropriate full conditional posterior density in (2.2).

The JOB Gibbs sampler from Johndrow et al. [8] is very similar to the above two-block Gibbs sampler K_aug. The difference is that the latent variables ν are not used, and the two blocks used in the JOB Gibbs sampler are (β, σ², τ²) and λ. While the sampling steps for β, σ², τ² are exactly the same as above, the components of λ are sampled differently. In particular, each λ_j is sampled from the conditional density given β_j, σ², τ², y (no conditioning on ν_j). This conditional density is not a standard density, and draws are made using a rejection sampler. To summarize, by introducing the latent variables ν, we replace the p rejection sampler based draws from a non-standard density in the JOB Gibbs sampler (for components of λ) with 2p draws from standard Inverse-Gamma densities (for components of λ and ν). An illustrative sketch of one K_aug transition is given at the end of this subsection.

The Gibbs sampler K_aug can essentially be considered a hybrid of the JOB Gibbs sampler and the Gibbs sampler in Makalic, Schmidt [12], which uses a latent variable ξ (in addition to ν) to replace the draws from the non-standard π(τ² | λ, y) density with two draws from standard Inverse-Gamma densities. As mentioned in the introduction, the geometric ergodicity result for the JOB Gibbs sampler in [8, Theorem 14] has been established by assuming that the local shrinkage parameters in λ are all bounded above, and the global shrinkage parameter τ is bounded above and below. In very recent follow-up work [3], the authors establish geometric ergodicity for a class of Half-t Gibbs samplers of which the JOB Gibbs sampler is a member. In this work, the truncation assumption on the local shrinkage parameters has been removed, but the global shrinkage parameter is still assumed to be truncated above and below.
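To fix ideas, here is a minimal R sketch of one transition of K_aug based on the conditionals in (2.2). This is our illustration rather than the authors' implementation: draw_tau2 is a hypothetical user-supplied sampler for the non-standard density π(τ² | λ, y) (e.g., a rejection or Metropolis step), and Inverse-Gamma(shape, rate) draws are generated as reciprocals of Gamma draws. For large p, the β-step would in practice use more efficient sampling schemes available in the literature.

```r
# One transition of the two-block Gibbs sampler K_aug (sketch).
# lambda2 is the current vector (lambda_1^2, ..., lambda_p^2).
rinvgamma <- function(n, shape, rate) 1 / rgamma(n, shape = shape, rate = rate)

kaug_step <- function(lambda2, y, X, a, b, draw_tau2) {
  n <- nrow(X); p <- ncol(X)
  tau2  <- draw_tau2(lambda2, y, X)              # tau^2 | lambda, y (placeholder)
  nu    <- rinvgamma(p, 1, 1 + 1 / lambda2)      # nu_j | lambda_j^2, ...
  A     <- crossprod(X) + diag(1 / (tau2 * lambda2), p)
  A_inv <- chol2inv(chol(A))
  rss   <- sum(y^2) - drop(t(y) %*% X %*% A_inv %*% t(X) %*% y)
  sig2  <- rinvgamma(1, a + n / 2, rss / 2 + b)  # sigma^2 | tau^2, lambda, nu, y
  mu    <- drop(A_inv %*% crossprod(X, y))
  beta  <- mu + drop(crossprod(chol(sig2 * A_inv), rnorm(p)))  # beta | ...
  lambda2_new <- rinvgamma(p, 1, 1 / nu + beta^2 / (2 * sig2 * tau2))
  list(beta = beta, sig2 = sig2, nu = nu, tau2 = tau2, lambda2 = lambda2_new)
}
```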
However, we show below that geometric ergodicity for the hybrid Gibbs sampler K_aug can be established with no truncation at all on the local shrinkage parameters in λ, and only assuming that the global shrinkage parameter τ is truncated below. The reasons for this improved analysis of the hybrid chain K_aug lie in the intricacies of the drift and minorization approach ([20]), which is the state of the art technique for proving geometric ergodicity for general state space Markov chains. The introduction of the latent variables ν, the resulting Inverse-Gamma posterior conditionals for the entries of λ and ν, and the avoidance of the latent variable ξ for the global shrinkage parameter τ provide just the right ingredients for establishing a geometric drift condition in Section 2.2, which is then leveraged to establish geometric ergodicity. Even a minor deviation in the structure of the Markov chain (such as in the JOB Gibbs sampler or the Gibbs sampler of [12]) leads to a breakdown of the intricate argument.

Before proceeding further, we note that geometric ergodicity of a two-block Gibbs sampler can be established by showing that either of its two marginal chains is geometrically ergodic (see for example [19]). Hence, we focus on the marginal λ-chain corresponding to K_aug. The one-step transition dynamics of this Markov chain from λ_m to λ_{m+1} is given as follows:

1. Draw τ² from π(τ² | λ_m, y).
2. Draw ν from π(ν | λ_m, τ², y) = ∏_{j=1}^p Inverse-Gamma(1, 1 + 1/λ_{j;m}²).
3. Draw σ² from π(σ² | τ², λ_m, ν, y) = Inverse-Gamma(a + n/2, yᵀ(I_n − P̃_X)y/2 + b).
4. Draw β from π(β | σ², τ², λ_m, ν, y) = N_p(A⁻¹Xᵀy, σ²A⁻¹).
5. Finally, draw λ_{m+1} from π(λ | β, ν, σ², τ², y) = ∏_{j=1}^p Inverse-Gamma(1, 1/ν_j + β_j²/(2σ²τ²)).

The Markov transition density (MTD) corresponding to the marginal λ-chain is given by

    k(λ₀, λ₁) = ∫_{R₊} ∫_{R₊} ∫_{R^p} ∫_{R₊^p} π(λ₁ | β, ν, σ², τ², y) π(β, ν, σ², τ² | λ₀, y) dν dβ dσ² dτ²
             = ∫_{R₊} ∫_{R₊} ∫_{R^p} ∫_{R₊^p} π(λ₁ | β, ν, σ², τ², y) π(β | σ², τ², λ₀, ν, y) π(σ² | τ², λ₀, ν, y) π(ν | λ₀, τ², y) π(τ² | λ₀, y) dν dβ dσ² dτ².   (2.3)

We now establish a drift condition for the marginal λ-chain, which will then be used to establish geometric ergodicity for the two-block Horseshoe Gibbs sampler K_aug.

2.2. A drift condition for the marginal λ-chain

Consider the function V : R₊^p → [0, ∞) given by

    V(λ) = Σ_{j=1}^p (λ_j²)^{δ₁} + Σ_{j=1}^p (λ_j²)^{−δ₂},   (2.4)

where δ₁, δ₂ ∈ (0, 1) are some constants. The next result establishes a geometric drift condition for the marginal λ-chain using the function V with appropriately small values of δ₁ and δ₂.
Lemma 2.1. Suppose the prior density π_τ for the global shrinkage parameter is truncated below, i.e., π_τ(u) = 0 for u < T for some T > 0, and satisfies

    ∫_T^∞ u^{δ/2} π_τ(u) du < ∞

for some δ > 0. Then, there exist δ₁, δ₂ ∈ (0, 1) such that for every λ₀ ∈ R₊^p we have

    E[V(λ₁) | λ₀] = ∫_{R₊^p} k(λ₀, λ₁) V(λ₁) dλ₁ ≤ γ* V(λ₀) + b*   (2.5)

with 0 < γ* = γ*(δ₁, δ₂) < 1 and b* = b*(δ₁, δ₂) < ∞.

Proof. Note that by linearity

    E[V(λ₁) | λ₀] = Σ_{j=1}^p E[(λ_{1j}²)^{δ₁} | λ₀] + Σ_{j=1}^p E[(λ_{1j}²)^{−δ₂} | λ₀].   (2.6)

We first consider the terms in the second sum in (2.6). Fix j ∈ {1, 2, ..., p} arbitrarily. It follows from the definition of the MTD (2.3) that

    E[(λ_{1j}²)^{−δ₂} | λ₀] = E[E[E[E[E[(λ_{1j}²)^{−δ₂} | β, ν, σ², τ², y] | σ², τ², λ₀, ν, y] | τ², λ₀, ν, y] | λ₀, τ², y] | λ₀, y].   (2.7)

The five iterated expectations correspond to the five conditional densities in (2.3). Starting with the innermost expectation, and using the fact that 1/λ_{1j}² (conditioned on β, ν, σ², τ², y) follows a Gamma distribution with shape parameter 1 and rate parameter 1/ν_j + β_j²/(2σ²τ²), we obtain

    E[(λ_{1j}²)^{−δ₂} | β, ν, σ², τ², y] = Γ(1 + δ₂)(1/ν_j + β_j²/(2σ²τ²))^{−δ₂} = Γ(1 + δ₂)[1/ν_j + {(2σ²τ²)^{δ₂}/|β_j|^{2δ₂}}^{−1/δ₂}]^{−δ₂}.

Note that the function y ↦ (c₀ + y^{−1/δ₂})^{−δ₂} on (0, ∞) is concave for c₀ > 0 and δ₂ ∈ (0, 1). Hence, by Jensen's inequality,

    E[E[(λ_{1j}²)^{−δ₂} | β, ν, σ², τ², y] | σ², τ², λ₀, ν, y] ≤ Γ(1 + δ₂)[1/ν_j + {E[(2σ²τ²)^{δ₂}/|β_j|^{2δ₂} | σ², τ², λ₀, ν, y]}^{−1/δ₂}]^{−δ₂}.   (2.8)

Note that the conditional distribution of β_j given σ², τ², λ₀, ν, y is Gaussian with variance

    σ_j² := σ² e_jᵀA⁻¹e_j ≥ σ²(ω̄ + 1/(τ²λ_{j;0}²))⁻¹.

Here ω̄ is the maximum eigenvalue of XᵀX and e_j is the p × 1 vector with j-th entry equal to 1 and all other entries equal to 0. Using Proposition A1 from Pal, Khare [16] regarding the negative moments of a Gaussian random variable, and choosing δ₂ ∈ (0, 1/2), we obtain

    E[(2σ²τ²)^{δ₂}/|β_j|^{2δ₂} | σ², τ², λ₀, ν, y] ≤ (2σ²τ²)^{δ₂} Γ((1 − 2δ₂)/2)/(2^{δ₂}√π σ_j^{2δ₂}) ≤ (Γ((1 − 2δ₂)/2)/√π)(ω̄τ² + 1/λ_{j;0}²)^{δ₂}.   (2.9)

Combining (2.8) and (2.9), and then using the fact that (u + v)^{δ₂} ≤ u^{δ₂} + v^{δ₂} for δ₂ ∈ (0, 1) and u, v ≥ 0, it follows that

    E[E[(λ_{1j}²)^{−δ₂} | β, ν, σ², τ², y] | σ², τ², λ₀, ν, y] ≤ Γ(1 + δ₂)[1/ν_j + {(Γ((1 − 2δ₂)/2)/√π)(ω̄^{δ₂}(τ²)^{δ₂} + (λ_{j;0}²)^{−δ₂})}^{−1/δ₂}]^{−δ₂}.   (2.10)

Note that the bound in (2.10) does not depend on σ². Again, using the concavity of the function y ↦ (c₀ + y^{−1/δ₂})^{−δ₂} together with Jensen's inequality, and the fact that ν_j (given τ², λ₀, y) has an Inverse-Gamma distribution with shape parameter 1 and rate parameter 1 + 1/λ_{j;0}², so that E[ν_j^{δ₂} | τ², λ₀, y] = Γ(1 − δ₂)(1 + 1/λ_{j;0}²)^{δ₂}, it follows that

    E[E[E[(λ_{1j}²)^{−δ₂} | β, ν, σ², τ², y] | σ², τ², λ₀, ν, y] | τ², λ₀, y]
    ≤ Γ(1 + δ₂)[{Γ(1 − δ₂)}^{−1/δ₂}(1 + 1/λ_{j;0}²)^{−1} + {(Γ((1 − 2δ₂)/2)/√π)(ω̄^{δ₂}(τ²)^{δ₂} + (λ_{j;0}²)^{−δ₂})}^{−1/δ₂}]^{−δ₂}.   (2.11)

Let us now take the expectation of the expression in (2.11) with respect to the conditional distribution of τ² given λ₀, y. Using for a third time the concavity of y ↦ (c₀ + y^{−1/δ₂})^{−δ₂} along with Jensen's inequality, and invoking Proposition B.1, which guarantees that E[(τ²)^{δ₂} | λ₀, y] ≤ C₀ uniformly in λ₀ (here δ₂ is chosen smaller than δ/2, so that the assumed prior moment condition applies), we get

    E[(λ_{1j}²)^{−δ₂} | λ₀] ≤ Γ(1 + δ₂)[{Γ(1 − δ₂)}^{−1/δ₂}(1 + 1/λ_{j;0}²)^{−1} + {(Γ((1 − 2δ₂)/2)/√π)(ω̄^{δ₂}C₀ + (λ_{j;0}²)^{−δ₂})}^{−1/δ₂}]^{−δ₂}
    ≤ Γ(1 + δ₂)(C + (λ_{j;0}²)^{−δ₂})[{Γ(1 − δ₂)}^{−1/δ₂} + {√π/Γ((1 − 2δ₂)/2)}^{1/δ₂}]^{−δ₂}; C = max{1, ω̄^{δ₂}C₀}.   (2.12)

Combining (2.7), (2.8), (2.10), (2.11) and (2.12), we get

    E[Σ_{j=1}^p (λ_{1j}²)^{−δ₂} | λ₀] ≤ γ₁(δ₂) Σ_{j=1}^p (λ_{j;0}²)^{−δ₂} + b₁,   (2.13)

where

    γ₁(δ₂) = Γ(1 + δ₂)[{Γ(1 − δ₂)}^{−1/δ₂} + {√π/Γ((1 − 2δ₂)/2)}^{1/δ₂}]^{−δ₂} and b₁ = p·C·γ₁(δ₂).

Next, consider E[Σ_{j=1}^p (λ_{1j}²)^{δ₁} | λ₀]. Fix j ∈ {1, 2, ..., p} arbitrarily. Choosing δ₁ ∈ (0, 1/2), and using (u + v)^{δ₁} ≤ u^{δ₁} + v^{δ₁} for u, v ≥ 0, it follows that

    E[(λ_{1j}²)^{δ₁} | β, ν, σ², τ², y] = Γ(1 − δ₁)(1/ν_j + β_j²/(2σ²τ²))^{δ₁} ≤ Γ(1 − δ₁)(ν_j^{−δ₁} + |β_j|^{2δ₁}/(2σ²τ²)^{δ₁}).

For j = 1, 2, ..., p, denote

    μ_j = e_jᵀA⁻¹Xᵀy,   (2.14)

where A = XᵀX + (τ²Λ)⁻¹. Using |β_j|^{2δ₁} ≤ |β_j − μ_j|^{2δ₁} + |μ_j|^{2δ₁}, the Gaussian moment bound E[|β_j − μ_j|^{2δ₁} | σ², τ², λ₀, ν, y] ≤ σ_j^{2δ₁} 2^{δ₁} Γ((1 + 2δ₁)/2)/√π together with σ_j² ≤ σ²τ²λ_{j;0}², and E[ν_j^{−δ₁} | τ², λ₀, y] ≤ Γ(1 + δ₁), it follows that

    E[E[E[(λ_{1j}²)^{δ₁} | β, ν, σ², τ², y] | σ², τ², λ₀, ν, y] | τ², λ₀, y]
    ≤ Γ(1 − δ₁)(Γ(1 + δ₁) + (Γ((1 + 2δ₁)/2)/√π) λ_{j;0}^{2δ₁} + |μ_j|^{2δ₁}/(2σ²τ²)^{δ₁})
    (⋆⋆) ≤ Γ(1 − δ₁)(Γ(1 + δ₁) + (Γ((1 + 2δ₁)/2)/√π) λ_{j;0}^{2δ₁} + T*/(2σ²)^{δ₁})

for some T* > 0. Here (⋆⋆) follows from Proposition A.5 together with the fact that τ² is supported on [T, ∞). Hence, since σ² given τ², λ₀, ν, y is Inverse-Gamma with shape a + n/2 and rate at least b,

    E[(λ_{1j}²)^{δ₁} | λ₀] ≤ Γ(1 − δ₁)(Γ(1 + δ₁) + (Γ((1 + 2δ₁)/2)/√π) λ_{j;0}^{2δ₁} + (T*/(2b)^{δ₁}) Γ(a + n/2 + δ₁)/Γ(a + n/2)).

It follows that

    E[Σ_{j=1}^p (λ_{1j}²)^{δ₁} | λ₀] ≤ γ₂(δ₁) Σ_{j=1}^p (λ_{j;0}²)^{δ₁} + b₂,   (2.15)

where

    γ₂(δ₁) = Γ(1 − δ₁)Γ((1 + 2δ₁)/2)/√π and b₂ = p·Γ(1 − δ₁)(Γ(1 + δ₁) + (T*/(2b)^{δ₁}) Γ(a + n/2 + δ₁)/Γ(a + n/2)).

The result follows by combining (2.13) and (2.15) with γ* = max{γ₁(δ₂), γ₂(δ₁)} and b* = b₁ + b₂. Note that γ* = max{γ₁(δ₂), γ₂(δ₁)} < 1 for appropriately small values of δ₁ and δ₂.
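As a quick numerical sanity check on the final step, the factor γ₂(δ₁) = Γ(1 − δ₁)Γ((1 + 2δ₁)/2)/√π appearing in (2.15) can be evaluated directly in R. This snippet is ours, and uses the expression as reconstructed above:

```r
# Evaluate gamma_2(delta_1) = Gamma(1 - delta_1) * Gamma((1 + 2*delta_1)/2) / sqrt(pi)
# from (2.15); the drift argument needs this factor to be strictly below 1.
gamma2 <- function(d1) gamma(1 - d1) * gamma((1 + 2 * d1) / 2) / sqrt(pi)
round(gamma2(c(0.01, 0.05, 0.1, 0.25, 0.45)), 4)  # all values are below 1
```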
Remark 2.1. Note that the only place in the proof of Lemma 2.1 where we need τ² to be truncated below is in showing that E[(τ²)^{−δ₁} | λ₀, y] is uniformly bounded in λ₀. In Proposition B.2, we show that this follows by assuming the weaker condition that the negative (p + δ)/2-th prior moment of τ² is finite for some δ > 0.

We now explain why the geometric drift condition established in Lemma 2.1 for the marginal λ-chain implies geometric ergodicity of the two-block Horseshoe Gibbs sampler K_aug. Note that for every d ∈ R, the set

    B(V, d) = {λ ∈ R₊^p : V(λ) = Σ_{j=1}^p (λ_j²)^{δ₁} + Σ_{j=1}^p (λ_j²)^{−δ₂} ≤ d}

is a compact set. Since k(λ₀, λ₁) is continuous in λ₀, a standard argument using Fatou's lemma along with Theorem 6.0.1 of Meyn, Tweedie [13] can be used to establish that the marginal λ-chain is unbounded off petite sets. Lemma 15.2.8 of Meyn, Tweedie [13] then implies geometric ergodicity of the marginal λ-chain. Using Lemma 2.4 in Diaconis et al. [5] now gives the following result.
Theorem 2.1. Suppose the prior density π_τ for the global shrinkage parameter is truncated below, i.e., π_τ(u) = 0 for u < T for some T > 0, and satisfies

    ∫_T^∞ u^{δ/2} π_τ(u) du < ∞

for some δ > 0. Then the two-block Horseshoe Gibbs sampler with transition kernel K_aug is geometrically ergodic. The assumption of truncation below (i.e., T > 0) can be replaced by the weaker assumption that T = 0 and that the negative (p + δ)/2-th prior moment of τ² is finite for some δ > 0.

Note that the above result establishes geometric ergodicity, which, as described earlier, helps rigorously establish the asymptotic validity of Markov chain CLT based standard error estimates. However, if quantitative bounds on the distance to stationarity are needed, then an additional minorization condition needs to be established. For the sake of completeness, we derive such a condition in Appendix C (see Lemma C.1).
2.3. Computational experiments

The objective of this study is to examine the practical feasibility/scalability of the Gibbs sampler described and analyzed in Section 2.1 by comparing its computational performance with the JOB Gibbs sampler. We have considered two different simulation settings. For the first simulation setting, we fix the sample size n to be 500 and the number of predictors p to be 1000. The first 20 entries of the "true" regression coefficient vector β₀ := (β₀₁, ..., β₀p) are specified as β₀j = 2s_j, where the s_j's are a sequence of equally spaced values in a fixed interval around zero, and the remaining entries are set to zero. The entries of X are generated independently from N(0, 1), and y is generated from the model y = Xβ₀ + ε, where the error vector ε has i.i.d. normal entries with mean 0 and standard deviation 0.1. For the second simulation setting, the same procedure described above is followed with n = 750 and p = 1500. (An illustrative data-generation sketch is provided after Table 1.)

We generate 20 datasets from each of the two simulation settings, and run both Gibbs samplers on each of the 40 datasets. For a fair comparison, both algorithms were implemented in R. The simulations were run on a machine with a 64-bit Windows 7 operating system, 8 GB RAM and a 3.4 GHz processor. We provide the run-times for generating 2500 iterations from each of the Gibbs samplers in Table 1. In the case n = 750, p = 1500, the average CPU time required for the JOB sampler is 9213.84 seconds, compared to an average of 9114.19 seconds for the proposed sampler. In the case n = 500, p = 1000, the corresponding average times are 2446.90 and 2391.10 seconds. To assess the mixing of the MCMC chains, we considered the cumulative average plots of the function βᵀβ. These plots for a randomly selected dataset from each of the two simulation settings are provided in Figure 1 and Figure 2. The plots for all the other Markov chains are similar to the ones presented here.

It is evident from the above results that the Gibbs sampler analyzed in this paper has comparable (slightly better) computational performance than the JOB Gibbs sampler in the above settings, and hence is practically useful. The geometric ergodicity result in Theorem 2.1 therefore helps provide asymptotically valid standard error estimates for corresponding MCMC approximations, under weaker assumptions compared to the JOB Gibbs sampler.

In [8], the authors discuss a time-inhomogeneous approximation/modification to the JOB Gibbs sampler, termed the approximate Gibbs sampler, for faster and more scalable computation. We would like to mention that the exact same modifications can be used for the Gibbs sampler described in Section 2.1 to obtain a corresponding approximate, faster, time-inhomogeneous version.

(a) Simulation setting: n = 500, p = 1000

JOB sampler    Proposed sampler
2300.2         2236.7
2303.5         2247.33
2662.55        2312.96
2700.01        2669.57
2593.41        2600.05
2675.34        2569.93
2755.46        2763.22
2528.5         2447.42
2641.8         2580.94
2664.14        2593.7
2360.96        2593.69
2313.37        2248.14
2301.66        2240.99
2298.01        2248.02
2305.22        2254.25
2294.81        2242.13
2313.77        2244.53
2290.15        2240.65
2307.51        2242.81
2327.64        2244.93
(b) Simulation setting: n = 750, p = 1500

JOB sampler    Proposed sampler
9211.42        9092.32
9192.64        9101.27
9236.63        9105.14
9190.44        9125.47
9241.26        9128.65
9204.89        9108.78
9222.37        9100.23
9225.12        9137.84
9196.45        9110.89
9197.96        9135.35
9204.08        9110.05
9213.73        9126.43
9215.79        9117.76
9216.99        9108.28
9212.47        9098.57
9220           9115.62
9232.6         9125.46
9196.04        9112.85
9233.48        9122.75
9212.5         9100.18
Table 1
The run-times (in seconds) required to generate 2500 MCMC samples using the JOB sampler and the proposed sampler for the simulated datasets. The left sub-table (a) corresponds to the simulation setting n = 500, p = 1000, while the right sub-table (b) corresponds to the simulation setting n = 750, p = 1500.

Fig 1: Cumulative average plots for the function βᵀβ corresponding to a randomly selected data set in the n = 500, p = 1000 simulation setting; panel (a) shows the JOB sampler and panel (b) the proposed sampler.

Fig 2: Cumulative average plots for the function βᵀβ corresponding to a randomly selected data set in the n = 750, p = 1500 simulation setting; panel (a) shows the JOB sampler and panel (b) the proposed sampler.
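For completeness, the data-generating mechanism described in this section can be sketched in R as follows. This is our illustration: the endpoints of the equally spaced grid for the s_j's are an assumption (the exact interval is not preserved in our source), and the remaining p − 20 coefficients are set to zero.

```r
# Generate one replicate dataset as described in Section 2.3 (sketch).
make_dataset <- function(n = 500, p = 1000, n_signal = 20, sd_err = 0.1) {
  s <- seq(-0.5, 0.5, length.out = n_signal)   # assumed grid for the s_j's
  beta0 <- c(2 * s, rep(0, p - n_signal))      # first 20 entries beta_j = 2 s_j
  X <- matrix(rnorm(n * p), n, p)              # i.i.d. N(0, 1) entries
  y <- drop(X %*% beta0) + rnorm(n, sd = sd_err)
  list(y = y, X = X, beta0 = beta0)
}
dat1 <- make_dataset()                    # first simulation setting
dat2 <- make_dataset(n = 750, p = 1500)   # second simulation setting
```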
3. Geometric ergodicity for regularized Horseshoe Gibbs samplers
3.1. A Gibbs sampler for the regularized Horseshoe

Recall from the introduction that the regularized Horseshoe prior developed in Piironen, Vehtari [17] is given by

    β_i | λ_i, σ², τ² ~ N(0, (1/c² + 1/(λ_i²τ²))⁻¹ σ²) independently for i = 1, 2, ..., p
    λ_i ~ C⁺(0, 1) independently for i = 1, 2, ..., p
    τ ~ π_τ(·), σ² ~ Inverse-Gamma(a, b).   (3.1)

The only difference between this prior and the original Horseshoe prior in (1.1) is the additional regularization introduced in the prior conditional variance of the β_i's through the constant c. As c → ∞ in (3.1), one reverts back to the original Horseshoe specification in (1.1).

Note that one of the salient features of the Horseshoe prior is the lack of shrinkage/regularization of parameter values that are far away from zero. The authors in [17] argue that while this feature is one of the key strengths of the Horseshoe prior in many situations, it can be a drawback in settings where the parameters are weakly identified. We refer the reader to [17] for a thorough motivation and discussion of the properties and performance of this prior vis-a-vis the Horseshoe prior. Our focus in this paper is to look at Markov chains to sample from the resulting intractable regularized Horseshoe posterior, and investigate properties such as geometric ergodicity.

The authors in [17] use Hamiltonian Monte Carlo (HMC) to generate samples from the posterior distribution. Geometric ergodicity of this HMC chain, however, has not been established. In recent work [10], sufficient conditions for geometric ergodicity (or lack thereof) of general HMC chains have been provided. However, these conditions, namely Assumptions A1, A2, A3 in [10], are rather complex and intricate, and at least to the best of our understanding it is unclear and hard to verify whether these conditions are satisfied by the HMC chain in [17].

Given the host of Gibbs samplers available in the literature for the original Horseshoe posterior, it is natural to consider a Gibbs sampler to sample from the regularized Horseshoe posterior as well. In fact, after introducing the augmented variables {ν_j}_{j=1}^p, the following conditional posterior distributions can be obtained after straightforward computations:

    β | σ², τ², λ, y ~ N(A_c⁻¹Xᵀy, σ²A_c⁻¹)
    σ² | τ², λ, y ~ Inverse-Gamma(a + n/2, yᵀ(I_n − XA_c⁻¹Xᵀ)y/2 + b)
    ν_j | λ_j², y ~ Inverse-Gamma(1, 1 + 1/λ_j²) independently for j = 1, 2, ..., p
    π(λ | β, ν, σ², τ², y) = ∏_{j=1}^p g(λ_j² | ν_j, β_j, σ², τ², y)
    τ² | λ, y ~ π(τ² | λ, y),   (3.2)

where

    g(λ_j² | ν_j, β_j, σ², τ², y) ∝ (1/c² + 1/(τ²λ_j²))^{1/2} (λ_j²)^{−3/2} exp[−(1/λ_j²)(1/ν_j + β_j²/(2σ²τ²))]

for j = 1, 2, ..., p,

    π(τ² | λ, y) ∝ |A_c|^{−1/2} ∏_{j=1}^p (1/c² + 1/(τ²λ_j²))^{1/2} (yᵀ(I_n − XA_c⁻¹Xᵀ)y/2 + b)^{−(a+n/2)} π_τ(τ²),

and A_c = XᵀX + (τ²Λ)⁻¹ + c⁻²I_p. Most of the above densities are standard and can be easily sampled from. Efficient rejection/Metropolis samplers can be used to sample from the one-dimensional non-standard densities g(λ_j² | ν_j, β_j, σ², τ², y) and π(τ² | λ, y) (see Appendix D).
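Purely as an illustration of one such Metropolis option (the tailored samplers are in Appendix D of the paper), the R sketch below updates a single λ_j² by independence Metropolis-Hastings, using the Inverse-Gamma(1/2, r) kernel of g as the proposal; the target then differs from the proposal only through the weight factor (1/c² + 1/(τ²λ_j²))^{1/2}, so the acceptance ratio is just a ratio of weights. This is our sketch, not the paper's algorithm.

```r
# Independence Metropolis-Hastings update for one lambda_j^2 under the
# regularized Horseshoe (illustrative sketch).  Target: g(u) proportional to
# w(u) * u^(-3/2) * exp(-r/u), with w(u) = sqrt(1/c^2 + 1/(tau2 * u)) and
# r = 1/nu_j + beta_j^2 / (2 * sig2 * tau2).  Proposal: Inverse-Gamma(1/2, r),
# so the acceptance ratio reduces to w(u_prop) / w(u_curr).
update_lambda2_j <- function(u_curr, r, tau2, c) {
  u_prop <- 1 / rgamma(1, shape = 1 / 2, rate = r)  # Inverse-Gamma(1/2, r) draw
  w <- function(u) sqrt(1 / c^2 + 1 / (tau2 * u))
  if (runif(1) < w(u_prop) / w(u_curr)) u_prop else u_curr
}
```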
Hence, a two-block Gibbs sampler, whose one-step transition from (β₀, σ₀², ν₀, τ₀², λ₀) to (β₁, σ₁², ν₁, τ₁², λ₁) is given by sampling sequentially from π(β, σ², ν, τ² | λ₀, y) and π(λ | β₁, σ₁², ν₁, τ₁², y), can be used to generate approximate samples from the regularized Horseshoe posterior. We will denote the transition kernel of this two-block Gibbs sampler by K_aug,reg (analogous to K_aug in the original Horseshoe setting).

Our goal now is to establish geometric ergodicity for K_aug,reg. We will achieve this by focusing on the marginal λ-chain corresponding to K_aug,reg. The one-step transition dynamics of this Markov chain from λ_m to λ_{m+1} is given as follows:

1. Draw τ² from π(τ² | λ_m, y).
2. Draw ν from π(ν | λ_m, y) = ∏_{j=1}^p Inverse-Gamma(1, 1 + 1/λ_{j;m}²).
3. Draw σ² from π(σ² | τ², λ_m, y) = Inverse-Gamma(a + n/2, yᵀ(I_n − XA_c⁻¹Xᵀ)y/2 + b).
4. Draw β from π(β | σ², τ², λ_m, y) = N_p(A_c⁻¹Xᵀy, σ²A_c⁻¹).
5. Finally, draw λ_{m+1} from π(λ | β, ν, σ², τ², y) = ∏_{j=1}^p g(λ_j² | ν_j, β_j, σ², τ², y).

The Markov transition density (MTD) corresponding to the marginal λ-chain is given by

    k(λ₀, λ₁) = ∫_{R₊} ∫_{R₊} ∫_{R^p} ∫_{R₊^p} π(λ₁ | β, ν, σ², τ², y) π(β, ν, σ², τ² | λ₀, y) dν dβ dσ² dτ²
             = ∫_{R₊} ∫_{R₊} ∫_{R^p} ∫_{R₊^p} π(λ₁ | β, ν, σ², τ², y) π(β | σ², τ², λ₀, ν, y) π(σ² | τ², λ₀, ν, y) π(ν | λ₀, τ², y) π(τ² | λ₀, y) dν dβ dσ² dτ².   (3.3)

3.2. Analysis of the marginal λ-chain

The geometric ergodicity of the λ-chain will be established using a drift and minorization analysis. However, given the modifications in the regularized Horseshoe posterior, the drift function V(λ) (see (2.4)) used for the original Horseshoe does not work in this case. We will instead use another drift function Ṽ(λ) defined by

    Ṽ(λ) = Σ_{j=1}^p (λ_j²)^{−δ/2}, for some constant δ ∈ (0, 1).   (3.4)

As discussed previously, the function V(λ) is unbounded off petite sets, and the V-based drift condition in Lemma 2.1 is enough to guarantee geometric ergodicity for the original Horseshoe Gibbs sampler K_aug; a minorization condition is only needed if one also wants to obtain quantitative convergence bounds for the distance to stationarity. The function Ṽ, however, is not unbounded off petite sets, since

    B(Ṽ, d) = {λ ∈ R₊^p : Ṽ(λ) ≤ d}

is not a compact subset of R₊^p for d > 0. Hence, a drift condition with Ṽ needs to be complemented with a minorization condition in order to establish geometric ergodicity (Theorem 3.1). We establish these two conditions respectively in Sections 3.2.1 and 3.2.2 below. As opposed to the original Horseshoe setting, we do not require the prior density π_τ to be truncated below away from zero. Only the existence of the δ/2-th prior moment of τ² is assumed for some δ ∈ (0, 1).

3.2.1. A drift condition

Lemma 3.1. Suppose ∫_{R₊} u^{δ/2} π_τ(u) du < ∞, where δ ∈ (0, 1) is the constant appearing in (3.4). Then, provided δ is sufficiently small (which entails no loss of generality, since the moment condition for a given δ implies the same condition for all smaller δ), there exist constants 0 < γ* = γ*(δ) < 1 and b* < ∞ such that

    E[Ṽ(λ₁) | λ₀] ≤ γ* Ṽ(λ₀) + b*   (3.5)

for every λ₀ ∈ R₊^p.

Proof. Note that by linearity

    E[Ṽ(λ₁) | λ₀] = Σ_{j=1}^p E[(λ_{1j}²)^{−δ/2} | λ₀].   (3.6)

Fix j ∈ {1, 2, ..., p} arbitrarily. It follows from the definition of the MTD (3.3) that

    E[(λ_{1j}²)^{−δ/2} | λ₀] = E[E[E[E[E[(λ_{1j}²)^{−δ/2} | β, ν, σ², τ², y] | σ², τ², λ₀, ν, y] | τ², λ₀, ν, y] | λ₀, τ², y] | λ₀, y].   (3.7)

We begin by evaluating the innermost expectation. Let r_j = 1/ν_j + β_j²/(2σ²τ²). Using √(u + v) ≤ √u + √v (for u, v ≥ 0) in the numerator, we get

    E[(λ_{1j}²)^{−δ/2} | β, ν, σ², τ², y]
    = [∫₀^∞ u^{−δ/2} (1/c² + 1/(τ²u))^{1/2} u^{−3/2} e^{−r_j/u} du] / [∫₀^∞ (1/c² + 1/(τ²u))^{1/2} u^{−3/2} e^{−r_j/u} du]
    ≤ [(1/|c|) ∫₀^∞ u^{−δ/2} u^{−3/2} e^{−r_j/u} du] / [∫₀^∞ (1/c² + 1/(τ²u))^{1/2} u^{−3/2} e^{−r_j/u} du]
      + [(τ²)^{−1/2} ∫₀^∞ u^{−δ/2} u^{−2} e^{−r_j/u} du] / [(τ²)^{−1/2} ∫₀^∞ u^{−2} e^{−r_j/u} du],   (3.8)

where the second ratio is obtained by bounding the denominator below by its (τ²u)^{−1/2} piece. The first term in the last inequality of (3.8) can be expressed as ((τ²)^{δ/2}/|c|) E[X^{δ/2}]/E[(1/c² + X)^{1/2}], where τ²X ~ Gamma(1/2, 1/ν_j + β_j²/(2σ²τ²)). Using Young's inequality (X^{δ/2} ≤ δX^{1/2} + (1 − δ)) together with E[(1/c² + X)^{1/2}] ≥ max{1/|c|, E[X^{1/2}]}, it follows that the first term is bounded above by (max{1, |c|}/|c|)(τ²)^{δ/2}. The second term in the last inequality of (3.8) is basically an Inverse-Gamma expectation, and is exactly equal to Γ(1 + δ/2)(1/ν_j + β_j²/(2σ²τ²))^{−δ/2}. Hence, we get

    E[(λ_{1j}²)^{−δ/2} | β, ν, σ², τ², y] ≤ (max{1, |c|}/|c|)(τ²)^{δ/2} + Γ(1 + δ/2)(1/ν_j + β_j²/(2σ²τ²))^{−δ/2}.

Note that the conditional distribution of β_j given σ², τ², λ₀, ν, y is Gaussian with variance

    σ_j² := σ² e_jᵀA_c⁻¹e_j ≥ σ²(ω̄ + c⁻² + 1/(τ²λ_{j;0}²))⁻¹.

Here ω̄ is the maximum eigenvalue of XᵀX.
Now, proceeding exactly with the analysis from (2.8) to (2.13) in the proof of Lemma 2.1 (with δ₂ there replaced by δ/2, and ω̄ replaced by ω̄ + c⁻²), and using Proposition B.3 instead of Proposition B.1 to bound E[(τ²)^{δ/2} | λ₀, y] uniformly in λ₀, yields

    E[Σ_{j=1}^p (λ_{1j}²)^{−δ/2} | λ₀] ≤ γ*(δ) Σ_{j=1}^p (λ_{j;0}²)^{−δ/2} + b*

with

    γ*(δ) = Γ(1 + δ/2)[{Γ(1 − δ/2)}^{−2/δ} + {√π/Γ((1 − δ)/2)}^{2/δ}]^{−δ/2}

and

    b* = p(max{1, |c|}/|c|)C₁ + p·max{1, (ω̄ + c⁻²)^{δ/2}C₁}·γ*(δ).

Here C₁ is the uniform bound from Proposition B.3. It can be shown that γ*(δ) < 1 for all sufficiently small δ.

3.2.2. A minorization condition

As discussed previously, the drift function Ṽ is not unbounded off compact sets, and the drift condition in Lemma 3.1 needs to be complemented by an associated minorization condition to establish geometric ergodicity. Fix a d > 0 and define

    B(Ṽ, d) = {λ ∈ R₊^p : Ṽ(λ) ≤ d}.   (3.9)

We now establish the following minorization condition associated with the geometric drift condition in Lemma 3.1.
Lemma 3.2. There exist a constant ε* = ε*(Ṽ, d) > 0 and a density function h on R₊^p such that

    k(λ₀, λ₁) ≥ ε* h(λ₁)   (3.10)

for every λ₀ ∈ B(Ṽ, d).

Proof.
Fix λ₀ ∈ B(Ṽ, d) arbitrarily, and let ω_* = max{ω̄ + c⁻², d^{2/δ}}, where ω̄ denotes the maximum eigenvalue of XᵀX. In order to prove (3.10), we demonstrate appropriate lower bounds for each of the conditional densities appearing in (3.3), and then integrate them sequentially. Since λ₀ ∈ B(Ṽ, d) implies 1/λ_{j;0}² ≤ d^{2/δ} for every j, each conditional density in (3.3) can be bounded below by an explicit expression which depends on λ₀ only through ω_*. In particular, bounding the relevant quadratic forms and determinants using the eigenvalue bounds c⁻²I_p ⪯ A_c ⪯ ω_*(1 + 1/τ²)I_p, one obtains from (3.2) lower bounds of the following form:

    π(σ² | τ², λ₀, ν, y) ≥ (b^{a+n/2}/Γ(a + n/2)) (σ²)^{−(a+n/2)−1} exp{−(yᵀy/2 + b)/σ²},
    π(ν | λ₀, y) ≥ ∏_{j=1}^p ν_j^{−2} exp{−(1 + d^{2/δ})/ν_j},

together with analogous (more involved) lower bounds for π(τ² | λ₀, y), π(β | σ², τ², λ₀, ν, y) and π(λ₁ | β, ν, σ², τ², y). Substituting these bounds into (3.3), and integrating sequentially over ν (using the lower bounds in Proposition B.6), over β (using the lower bound in Proposition B.7), over σ² (a standard Inverse-Gamma integral), and finally over τ², it follows that

    k(λ₀, λ₁) ≥ ε* h(λ₁) for every λ₁ ∈ R₊^p,

where ε* is a strictly positive constant depending on y, X, a, b, c, p and d (in particular, through a finite expectation with respect to π_τ), but not on λ₀ or λ₁, and h is the probability density on R₊^p given by

    h(λ) = ∏_{j=1}^p 2ηλ_j/(1 + ηλ_j²)², with η = max{1, ω_*}.

This completes the proof of the minorization condition for the MTD k corresponding to the regularized Horseshoe λ-chain.

The drift and minorization conditions in Lemma 3.1 and Lemma 3.2 can be combined with Theorem 12 of Rosenthal [20] to establish geometric ergodicity of the regularized Horseshoe Gibbs sampler, which is stated as follows.
Theorem 3.1. Suppose the prior density π_τ(·) for the global shrinkage parameter satisfies

    ∫_{R₊} u^{δ/2} π_τ(u) du < ∞

for some δ ∈ (0, 1). Then, the regularized Horseshoe Gibbs sampler with transition kernel K_aug,reg is geometrically ergodic.

3.3. A variant of the regularized Horseshoe prior

The following variant of the regularized Horseshoe shrinkage prior has been introduced in Nishimura, Suchard [14]:

    π(β_j, λ_j² | τ², σ²) ∝ (τ²λ_j²)^{−1/2} exp[−(β_j²/(2σ²))(1/c² + 1/(τ²λ_j²))] π_ℓ(λ_j²) independently for j = 1, 2, ..., p
    σ² ~ Inverse-Gamma(a, b); τ ~ π_τ(·),   (3.12)

where π_ℓ and π_τ are probability densities. Note that based on the above specification,

    β_j | λ_j, τ², σ² ~ N(0, (1/c² + 1/(τ²λ_j²))⁻¹ σ²),

identical to the specification in [17]. The difference is that instead of λ, τ and σ² having independent priors, we now have

    π(λ_j² | τ², σ²) = c(τ²)(1 + τ²λ_j²/c²)^{−1/2} π_ℓ(λ_j²),   (3.13)

where

    1/c(τ²) = ∫₀^∞ (1 + τ²λ_j²/c²)^{−1/2} π_ℓ(λ_j²) dλ_j².   (3.14)

The principal motivation for the algebraic modification of the prior as compared to that of [17] is the resulting simplification of the posterior computation, although an alternative interpretation using fictitious data is also discussed in [14]. In fact, taking π_ℓ to be the half-Cauchy density and using its representation in terms of a mixture of Inverse-Gamma densities ([12]), the following conditional posterior distributions can be obtained from straightforward computations after augmenting the latent variables {ν_j}_{j=1}^p:

    β | σ², τ², λ, y ~ N(A_c⁻¹Xᵀy, σ²A_c⁻¹)
    σ² | τ², λ, y ~ Inverse-Gamma(a + n/2, yᵀ(I_n − XA_c⁻¹Xᵀ)y/2 + b)
    ν_j | λ_j², y ~ Inverse-Gamma(1, 1 + 1/λ_j²) independently for j = 1, 2, ..., p
    τ² | λ, y ~ π(τ² | λ, y)
    λ_j² | ν_j, β_j, σ², τ², y ~ Inverse-Gamma(1, 1/ν_j + β_j²/(2σ²τ²)) independently for j = 1, 2, ..., p,   (3.15)

where

    π(τ² | λ, y) ∝ |τ²A_c|^{−1/2} (yᵀ(I_n − XA_c⁻¹Xᵀ)y/2 + b)^{−(a+n/2)} π_τ(τ²) c(τ²)^p

and A_c = XᵀX + (τ²Λ)⁻¹ + c⁻²I_p. Most of the above conditional posterior densities, including those for the local shrinkage parameters {λ_j²}_{j=1}^p, are standard probability distributions (as opposed to the non-standard ones in the regularized Horseshoe posterior in (3.2)) and can be easily sampled from. An efficient Metropolis sampler for the non-standard (one-dimensional) density π(τ² | λ, y) can be constructed similarly to the one provided in Appendix D. Hence, a two-block Gibbs sampler, whose one-step transition from (β₀, σ₀², ν₀, τ₀², λ₀) to (β₁, σ₁², ν₁, τ₁², λ₁) is given by sampling sequentially from π(β, σ², ν, τ² | λ₀, y) and π(λ | β₁, σ₁², ν₁, τ₁², y), can be used to generate approximate samples from the regularized Horseshoe posterior in (3.15). We will denote the Markov transition kernel of this two-block Gibbs sampler by K̃_aug,reg (analogous to K_aug,reg in the regularized Horseshoe setting).
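To make the extra factor concrete, the normalizing constant c(τ²) in (3.14) can be computed by one-dimensional quadrature. In the R sketch below (ours, not the paper's code), π_ℓ is taken to be the density of λ_j² induced by a standard half-Cauchy prior on λ_j, namely π_ℓ(u) = 1/(π√u(1 + u)); this parametrization in terms of u = λ_j² is our illustrative choice.

```r
# Numerical evaluation of c(tau^2) from (3.14), with pi_l the density of
# lambda_j^2 when lambda_j ~ C+(0, 1): pi_l(u) = 1 / (pi * sqrt(u) * (1 + u)).
c_tau2 <- function(tau2, c) {
  integrand <- function(u) {
    (1 + tau2 * u / c^2)^(-1 / 2) / (pi * sqrt(u) * (1 + u))
  }
  1 / integrate(integrand, lower = 0, upper = Inf)$value
}
c_tau2(tau2 = 1, c = 1)   # example evaluation
```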
The transition density of the corresponding marginal λ-chain can be obtained by substituting the appropriate conditional posterior densities into the expression (3.3). Note that the above conditional posterior distributions are very similar to those for the original Horseshoe Gibbs sampler K_aug given by (2.2) in Section 2. The only differences are:

1. the matrix A appearing in (2.2) has been replaced by A_c (which is the matrix A plus the added regularization introduced in the prior conditional variance of β through the constant c) in (3.15), and
2. the form of the posterior conditional density of the global shrinkage parameter, namely π(τ² | λ, y), is different due to the additional term c(τ²)^p.
Theorem 3.2. Suppose the prior density of the global shrinkage parameter is truncated below away from zero, that is, π_τ(u) = 0 for u < T for some T > 0, and satisfies

    ∫_T^∞ u^{(p+δ)/2} π_τ(u) du < ∞

for some δ > 0. Then, the regularized Horseshoe Gibbs sampler corresponding to the transition kernel K̃_aug,reg is geometrically ergodic.

The above theorem can be proved by essentially following verbatim the proof of Lemma 2.1 (which establishes geometric ergodicity for K_aug) with the same geometric drift function as in Lemma 2.1, and replacing the matrix A by the matrix A_c at the relevant places. However, appropriate modifications are needed using the following two facts.

1. In the original Horseshoe setting, a uniform upper bound for the conditional posterior means of the β_j's (see (2.14) for the definition) was established in Proposition A.5 in Appendix A. However, in the current context, the added regularization of c⁻²I_p in A_c immediately provides the uniform upper bound without the need for additional analysis.
2. The conditional posterior density π(τ² | λ, y) is different from the original Horseshoe setting. Hence, the upper bound for the δ/2-th moment of this density needs to be established afresh; due to the additional term c(τ²)^p in the conditional density, a stronger assumption of the existence of the (p + δ)/2-th prior moment is required (as compared to the δ/2-th moment in Theorem 2.1 and Theorem 3.1).
Remark 3.1. In [14], the authors focus on Bayesian logistic regression for their geometric ergodicity analysis. They use the regularized Horseshoe prior in (3.12) without the parameter $\sigma^2$, as there is no need for an error variance parameter with the Binomial likelihood. However, for computational purposes, additional latent parameters $\omega = \{\omega_j\}_{j=1}^p$ with Polya-Gamma prior distributions are introduced. A two-block Gibbs sampler with blocks $(\beta, \lambda)$ and $(\omega, \tau)$ is then constructed, and its geometric ergodicity is established assuming that the global shrinkage parameter $\tau$ is bounded away from zero and infinity [14, Theorem 4.6].

Many details of this analysis break down when translating to the Bayesian linear regression framework considered in our paper. The parameters $\omega$ are now replaced by the error variance parameter $\sigma^2$. One can still construct a two-block Gibbs sampler with blocks $(\beta, \lambda)$ and $(\sigma^2, \tau)$, but many conditional independence and other algebraic niceties involving $\omega$, which are crucial in establishing the minorization condition in the logistic regression context, do not hold analogously with $\sigma^2$ in the linear regression context. The structural differences also imply that the drift condition with the function $\sum_{j=1}^p |\beta_j|^{-\delta}$ does not work out in the linear regression setting.

Remark 3.2.
The geometric ergodicity result (Theorem 3.2) corresponding to the regularized Horseshoe variant in [14] requires truncation of the global shrinkage parameter $\tau^2$ below away from zero. Such an assumption is not required for the geometric ergodicity result (Theorem 3.1) corresponding to the regularized Horseshoe of [17]. Also, due to the presence of the additional term $(c(\tau^2))^p$ in $\pi(\tau^2 \mid \lambda, y)$, a stronger moment assumption is required for Theorem 3.2 as compared to Theorem 3.1.

The primary objective of this section is to examine the practical feasibility/scalability of the two regularized Horseshoe Gibbs samplers described in Sections 3.1 and 3.3. We consider a simulation setting with $n = 500$ samples and $p = 1000$ variables. We generate 10 replicated datasets following exactly the same procedure as outlined in Section 2.3. For each of these 10 datasets, we run four Gibbs samplers: the Gibbs sampler $K_{aug,reg}$ for the regularized Horseshoe in [17] with $c = 1$ and $c = 100$, and the Gibbs sampler $\tilde K_{aug,reg}$ for the regularized Horseshoe variant in [14] with $c = 1$ and $c = 100$.

Again, both algorithms were implemented in R. Due to maintenance issues, an older machine, albeit with the same OS/RAM/processor specifications, was used for these experiments as compared to the one used in Section 2.3. The run-times for 2500 iterations of each Gibbs sampler for each replication and each value of $c$ are provided in Table 2. Cumulative average plots for the function $\beta^T\beta$ were used to monitor and confirm sufficient mixing of all the Markov chains. In all the settings, and across all the replications, the Gibbs samplers needed roughly 5500 seconds to complete the required 2500 iterations.

We also tried to use the Hamiltonian Monte Carlo based algorithm for the regularized Horseshoe in [17], as implemented in the R package hsstan. However, the maximum treedepth (set to 10) is exceeded in all of the 2500 iterations. This issue persists even after warming up for up to 7000 iterations and then running for 2500 more iterations. As we understand, this indicates poor adaptation, and raises questions about adequate posterior exploration and mixing of the Markov chain. A proposed remedy in this setting (Chapter 15.2 of the Stan reference manual on mc-stan.org) is to increase the tree depth. The hsstan function, however, did not allow us to pass max_treedepth or max_depth as a parameter and change its value. Nevertheless, from the point of view of scalability, the time taken per iteration with maximum treedepth 10 was roughly one-fourth of that of the various Gibbs samplers. If the maximum treedepth were increased enough to resolve the issue pointed out above, it is very likely that the time taken per iteration would be around the same as, or more than, those of the Gibbs samplers (increasing the tree depth by 1 in the No-U-Turn HMC sampler effectively doubles the computation time per iteration).

To conclude, the Gibbs samplers described in Sections 3.1 and 3.3 provide practically feasible approaches which are computationally competitive with the HMC based approach. The geometric ergodicity results in Theorems 3.1 and 3.2 help provide the practitioner with asymptotically valid standard error estimates for corresponding MCMC based approximations to posterior quantities of interest.
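For completeness, the cumulative average diagnostic mentioned above takes only a few lines in R. Here beta_draws is an assumed (iterations x p) matrix of stored draws of $\beta$; the object name and storage format are illustrative.

```r
## Cumulative average plot of beta' beta, assuming 'beta_draws' holds one
## MCMC draw of beta per row (name and storage format are illustrative).
stat    <- rowSums(beta_draws^2)              # beta^T beta at each iteration
cum_avg <- cumsum(stat) / seq_along(stat)     # running mean up to each iteration
plot(cum_avg, type = "l", xlab = "iteration",
     ylab = "cumulative average of beta' beta")
```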
Replication    c = 100    c = 1
1              5505.93    5783.77
2              5704.37    5755.31
3              5563.22    5756.45
4              5644.05    5807.36
5              5534.64    5917.48
6              5491.00    5692.67
7              5487.85    5629.29
8              5589.35    5812.13
9              5634.18    5819.75
10             5679.75    5715.75

(a) Gibbs sampler $K_{aug,reg}$ for the regularized Horseshoe posterior in [17]

Replication    c = 100    c = 1
1              5422.53    5289.71
2              5488.50    5461.39
3              5459.94    5425.37
4              5479.54    5272.11
5              5486.41    5373.16
6              5332.89    5430.52
7              5549.28    5180.04
8              5343.74    5102.16
9              5462.57    5446.23
10             5252.53    5103.76

(b) Gibbs sampler $\tilde K_{aug,reg}$ for the regularized Horseshoe variant in [14]

Table 2
The run-times (in seconds) required to generate 2500 MCMC samples for the Gibbs samplers corresponding to the regularized Horseshoe in [17] and the regularized Horseshoe variant in [14], for 10 simulated data sets with $n = 500$ and $p = 1000$.

Appendix A: Uniform bound on $\mu_j$

The goal of this appendix is to show that $\mu_j = e_j^T A^{-1} X^T y$ defined in (2.14) is uniformly bounded in $\lambda$ (even when $n < p$). This result will be established through a sequence of five propositions.

Proposition A.1.
Let $\Lambda \in \mathbb{R}^{p \times p}$ be any diagonal matrix with positive diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$. Let $X \in \mathbb{R}^{n \times p}$ be any matrix with rank $r$. Let the singular value decomposition of $X$ be $X = UDV^T$, where $D \in \mathbb{R}^{r \times r}$ is a diagonal matrix with positive diagonal elements $d_1, \ldots, d_r$, while $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{p \times r}$ are such that $U^TU = I$ and $V^TV = I$. If $\lambda_0 \le \min\{\lambda_1, \ldots, \lambda_p\}$ is any positive number, then for arbitrary $y \in \mathbb{R}^n$,
\[
y^T X (X^TX + \Lambda)^{-1} X^T y \ \le\ y^Ty - \|P_{U^\perp} y\|^2 - \sum_{i=1}^r \frac{\lambda_0 \tilde u_i^2}{d_i^2 + \lambda_0},
\]
where $\tilde u_i$ is the $i$-th component of the vector $\tilde u = U^T y$ and $P_{U^\perp}$ is the orthogonal projection matrix onto the orthogonal complement of the column space of $U$.

Proof. Without loss of generality we assume that $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, where $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_p$. By the hypothesis of the proposition, $\lambda_0 \le \lambda_1$. We now define a set of diagonal matrices $\{\Lambda^{(1)}, \ldots, \Lambda^{(j)}, \ldots, \Lambda^{(p)}\}$ in the following manner. For $j = 1, \ldots, (p-1)$, $\Lambda^{(j)}$ has its first $j$ diagonal elements identical and equal to $\lambda_0$, while the remaining $(p-j)$ diagonal elements are identical to those of the matrix $\Lambda$. Also let $\Lambda^{(p)} = \lambda_0 I_{p \times p}$ and $\Lambda^{(0)} = \Lambda$. The above matrices satisfy the relation
\[
\Lambda^{(j-1)} = \Lambda^{(j)} + (\lambda_j - \lambda_0)\, e_j e_j^T \quad \text{for } j = 1, \ldots, p,
\]
where $e_j \in \mathbb{R}^p$ denotes the $j$-th elementary vector. Now, using the Sherman-Morrison-Woodbury formula for inverting matrices, we get that
\[
(X^TX + \Lambda^{(j-1)})^{-1} = \big(X^TX + \Lambda^{(j)} + (\lambda_j - \lambda_0)e_je_j^T\big)^{-1} = (X^TX + \Lambda^{(j)})^{-1} - \frac{(\lambda_j - \lambda_0)(X^TX + \Lambda^{(j)})^{-1} e_j e_j^T (X^TX + \Lambda^{(j)})^{-1}}{1 + (\lambda_j - \lambda_0)\, e_j^T (X^TX + \Lambda^{(j)})^{-1} e_j}.
\]
Consequently,
\[
(X^TX + \Lambda^{(j-1)})^{-1} - (X^TX + \Lambda^{(j)})^{-1} = - \frac{(\lambda_j - \lambda_0)(X^TX + \Lambda^{(j)})^{-1} e_j e_j^T (X^TX + \Lambda^{(j)})^{-1}}{1 + (\lambda_j - \lambda_0)\, e_j^T (X^TX + \Lambda^{(j)})^{-1} e_j}. \tag{A.1}
\]
Aggregating the equations (A.1) over $j = 1, \ldots, p$, we get that
\[
(X^TX + \Lambda)^{-1} = (X^TX + \lambda_0 I)^{-1} - \sum_{j=1}^p \frac{(\lambda_j - \lambda_0)(X^TX + \Lambda^{(j)})^{-1} e_j e_j^T (X^TX + \Lambda^{(j)})^{-1}}{1 + (\lambda_j - \lambda_0)\, e_j^T (X^TX + \Lambda^{(j)})^{-1} e_j}, \tag{A.2}
\]
where we have used the fact that $\Lambda^{(0)} = \Lambda$ and $\Lambda^{(p)} = \lambda_0 I$. If $y \in \mathbb{R}^n$ is an arbitrary vector, then it follows from (A.2) that
\begin{align*}
y^TX(X^TX + \Lambda)^{-1}X^Ty &= y^TX(X^TX + \lambda_0 I)^{-1}X^Ty - \sum_{j=1}^p \frac{(\lambda_j - \lambda_0)\, \big\|e_j^T(X^TX + \Lambda^{(j)})^{-1}X^Ty\big\|^2}{1 + (\lambda_j - \lambda_0)\, e_j^T (X^TX + \Lambda^{(j)})^{-1} e_j}\\
&\le y^TX(X^TX + \lambda_0 I)^{-1}X^Ty, \tag{A.3}
\end{align*}
because $\lambda_j \ge \lambda_0$. Now consider the singular value decomposition of the matrix $X = UDV^T$, where $D$ is a diagonal matrix with positive diagonal elements $d_1, \ldots, d_r$, and $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{p \times r}$ are such that $U^TU = V^TV = I$. Also let $V^\perp \in \mathbb{R}^{p \times (p-r)}$ be such that the matrix $[V, V^\perp] \in \mathbb{R}^{p \times p}$ is an orthogonal matrix, i.e., the columns of $V^\perp$ constitute an orthonormal basis for the orthogonal complement of the column space of $V$.
Now consider the fact that
\[
(X^TX + \lambda_0 I)^{-1} = \big[VDU^TUDV^T + \lambda_0 I\big]^{-1} = \big[VD^2V^T + \lambda_0 VV^T + \lambda_0 V^\perp (V^\perp)^T\big]^{-1} = \big[V(D^2 + \lambda_0 I)V^T + \lambda_0 V^\perp (V^\perp)^T\big]^{-1} = V(D^2 + \lambda_0 I)^{-1}V^T + \frac{1}{\lambda_0} V^\perp (V^\perp)^T.
\]
Note that $(V^\perp)^T V = 0$. Thus
\[
y^TX(X^TX + \lambda_0 I)^{-1}X^Ty = y^TX\left[V(D^2 + \lambda_0 I)^{-1}V^T + \frac{1}{\lambda_0}V^\perp(V^\perp)^T\right]VDU^Ty = y^T(UDV^T)\,V(D^2 + \lambda_0 I)^{-1}V^T\,VDU^Ty = y^TUD(D^2 + \lambda_0 I)^{-1}DU^Ty = \sum_{i=1}^r \frac{d_i^2\, \tilde u_i^2}{d_i^2 + \lambda_0}, \tag{A.4}
\]
where $d_1, \ldots, d_r > 0$ and $\tilde u_i$ is the $i$-th entry of the vector $U^Ty$. Let $U^\perp$ denote the orthogonal completion of the matrix $U$; then
\[
\sum_{i=1}^r \tilde u_i^2 = y^TUU^Ty = y^Ty - y^TU^\perp(U^\perp)^Ty = y^Ty - \|P_{U^\perp}y\|^2, \tag{A.5}
\]
where $P_{U^\perp}$ denotes the orthogonal projection onto the column space of $U^\perp$. Finally, it follows from (A.3), (A.4) and (A.5) that
\[
y^TX(X^TX + \Lambda)^{-1}X^Ty \ \le\ \sum_{i=1}^r \tilde u_i^2 - \sum_{i=1}^r \frac{\lambda_0 \tilde u_i^2}{d_i^2 + \lambda_0} = y^Ty - \|P_{U^\perp}y\|^2 - \sum_{i=1}^r \frac{\lambda_0 \tilde u_i^2}{d_i^2 + \lambda_0}.
\]
Note that $\|P_{U^\perp}y\|^2 + \sum_{i=1}^r \frac{\lambda_0 \tilde u_i^2}{d_i^2 + \lambda_0} > 0$ whenever $\sum_{i=1}^r \tilde u_i^2 + \|P_{U^\perp}y\|^2 = y^Ty > 0$.
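As an informal sanity check (separate from the proof), the inequality of Proposition A.1 can be verified numerically on a random instance in R:

```r
## Numerical spot-check of Proposition A.1 on a random instance.
set.seed(1)
n <- 5; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
lam  <- runif(p, 0.5, 3)            # diagonal of Lambda
lam0 <- min(lam)                    # any positive lambda_0 <= min lambda_j works
lhs <- drop(t(y) %*% X %*% solve(crossprod(X) + diag(lam), crossprod(X, y)))
sv  <- svd(X); d <- sv$d; U <- sv$u # here rank r = n, so P_{U-perp} y = 0
u_t <- drop(crossprod(U, y))
rhs <- sum(y^2) - sum((y - U %*% u_t)^2) - sum(lam0 * u_t^2 / (d^2 + lam0))
stopifnot(lhs <= rhs)               # the bound of Proposition A.1
```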
Proposition A.2. Let $X = [x_1, x_2, \ldots, x_p] \in \mathbb{R}^{n \times p}$ and $X_1 = [x_2, \ldots, x_p]$. Let $X_1 = U_1D_1V_1^T$ be the singular value decomposition, where $d_1, \ldots, d_r > 0$ are the diagonal elements of $D_1$. Let $\delta_1 > 0$ and let $\Delta_{p-1}$ be any diagonal matrix with positive diagonal elements $\delta_2, \ldots, \delta_p$. If
\[
T := (\|x_1\|^2 + \delta_1) - x_1^T X_1 (X_1^TX_1 + \Delta_{p-1})^{-1} X_1^T x_1,
\]
then for any $0 < \delta_0 \le \min\{\delta_2, \ldots, \delta_p\}$,
\[
T \ \ge\ \delta_1 + \|P_{U_1^\perp} x_1\|^2 + \sum_{i=1}^r \frac{\delta_0 \tilde u_i^2}{d_i^2 + \delta_0},
\]
where $\tilde u_i$ is the $i$-th component of the vector $\tilde u = U_1^T x_1$ and $P_{U_1^\perp}$ is the orthogonal projection matrix onto the orthogonal complement of the column space of $U_1$.

Proof. Using Proposition A.1, for any $\delta_0 \le \min\{\delta_2, \ldots, \delta_p\}$, we get that
\[
x_1^T X_1 (X_1^TX_1 + \Delta_{p-1})^{-1} X_1^T x_1 \ \le\ x_1^Tx_1 - \|P_{U_1^\perp}x_1\|^2 - \sum_{i=1}^r \frac{\delta_0 \tilde u_i^2}{d_i^2 + \delta_0}.
\]
Consequently, $T = (x_1^Tx_1 + \delta_1) - x_1^T X_1 (X_1^TX_1 + \Delta_{p-1})^{-1} X_1^T x_1 \ge \delta_1 + \|P_{U_1^\perp}x_1\|^2 + \sum_{i=1}^r \frac{\delta_0 \tilde u_i^2}{d_i^2 + \delta_0}$.

Proposition A.3.
Let $X \in \mathbb{R}^{n \times p}$ be an arbitrary matrix and $\Delta_p \in \mathbb{R}^{p \times p}$ be any diagonal matrix with positive diagonal elements $\delta_1, \ldots, \delta_p$. Consider the following partition of the matrix
\[
X^TX + \Delta_p = \begin{bmatrix} \|x_1\|^2 + \delta_1 & x_1^TX_1 \\ X_1^Tx_1 & X_1^TX_1 + \Delta_{p-1} \end{bmatrix},
\]
where $\Delta_{p-1}$ is the diagonal matrix with diagonal elements $\delta_2, \ldots, \delta_p$. If $(X_1^TX_1 + \Delta_{p-1})^{-1}\Delta_{p-1}$ is uniformly bounded, then the first column of the matrix $(X^TX + \Delta_p)^{-1}\Delta_p$ is uniformly bounded. The notations $x_1$ and $X_1$ are as defined in Proposition A.2.

Proof. With the above partition of $X^TX + \Delta_p$, the Schur complement of the first block of the matrix is given by
\[
T = (\|x_1\|^2 + \delta_1) - x_1^TX_1(X_1^TX_1 + \Delta_{p-1})^{-1}X_1^Tx_1.
\]
Employing the inversion formula for block matrices [11], we get that
\[
(X^TX + \Delta_p)^{-1} = \begin{bmatrix} \frac{1}{T} & -\frac{x_1^TX_1(X_1^TX_1 + \Delta_{p-1})^{-1}}{T} \\[4pt] -\frac{(X_1^TX_1 + \Delta_{p-1})^{-1}X_1^Tx_1}{T} & (X_1^TX_1 + \Delta_{p-1})^{-1} + \frac{(X_1^TX_1 + \Delta_{p-1})^{-1}X_1^Tx_1\, x_1^TX_1(X_1^TX_1 + \Delta_{p-1})^{-1}}{T} \end{bmatrix}.
\]
In the next two bullet points, we show that if $(X_1^TX_1 + \Delta_{p-1})^{-1}\Delta_{p-1}$ is uniformly bounded, then so are all the entries of the vector
\[
\begin{bmatrix} \frac{\delta_1}{T} \\[4pt] -\frac{\delta_1 (X_1^TX_1 + \Delta_{p-1})^{-1}X_1^Tx_1}{T} \end{bmatrix},
\]
which is the first column of the matrix $(X^TX + \Delta_p)^{-1}\Delta_p$.

• To show $0 < \delta_1/T \le 1$: by Proposition A.2, $T \ge \delta_1 > 0$, and for any $0 < \delta_0 \le \min\{\delta_2, \ldots, \delta_p\}$,
\[
\frac{\delta_1}{T} \ \le\ \frac{\delta_1}{\delta_1 + \|P_{U_1^\perp}x_1\|^2 + \sum_{i=1}^r \frac{\delta_0 \tilde u_i^2}{d_i^2 + \delta_0}} \ \le\ 1, \tag{A.6}
\]
where the notations $\tilde u_i$, $P_{U_1^\perp}$, $d_i$ are as in Proposition A.2.

• To show $\delta_1(X_1^TX_1 + \Delta_{p-1})^{-1}X_1^Tx_1/T$ is uniformly bounded: let $v_1, v_2$ be such that $x_1 = v_1 + v_2$, where $v_1$ belongs to the column space of $X_1$ and $v_2$ belongs to its orthogonal complement. Therefore $v_1 = X_1 l$ for some vector $l \in \mathbb{R}^{p-1}$. Consequently,
\[
(X_1^TX_1 + \Delta_{p-1})^{-1}X_1^Tx_1 = (X_1^TX_1 + \Delta_{p-1})^{-1}X_1^T(X_1 l + v_2) = (X_1^TX_1 + \Delta_{p-1})^{-1}X_1^TX_1 l = \big[I - (X_1^TX_1 + \Delta_{p-1})^{-1}\Delta_{p-1}\big]\, l,
\]
which is uniformly bounded, as we are assuming that the matrix $(X_1^TX_1 + \Delta_{p-1})^{-1}\Delta_{p-1}$ is uniformly bounded. Combining this fact with (A.6), we conclude that all the entries of the vector $\delta_1(X_1^TX_1 + \Delta_{p-1})^{-1}X_1^Tx_1/T$ are also uniformly bounded.
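The block-inversion identity invoked from [11] can likewise be spot-checked numerically; the snippet below verifies that the $(1,1)$ entry of $(X^TX + \Delta_p)^{-1}$ equals $1/T$ for a random instance.

```r
## Spot-check: (1,1) entry of (X'X + Delta)^{-1} equals 1/T,
## where T is the Schur complement of the first block.
set.seed(2)
n <- 6; p <- 4
X <- matrix(rnorm(n * p), n, p)
delta <- runif(p, 0.1, 2)
A <- crossprod(X) + diag(delta)
T_schur <- A[1, 1] - A[1, -1] %*% solve(A[-1, -1], A[-1, 1])
stopifnot(abs(solve(A)[1, 1] - 1 / T_schur) < 1e-10)
```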
Proposition A.4. Let $X \in \mathbb{R}^{n \times p}$ be an arbitrary matrix and $\Delta_p \in \mathbb{R}^{p \times p}$ be any diagonal matrix with positive diagonal elements $\delta_1, \ldots, \delta_p$. Then for arbitrary $p$ and $n$, the matrix $(X^TX + \Delta_p)^{-1}\Delta_p$ is uniformly bounded. Specifically,
\[
\sup_{\delta_1, \ldots, \delta_p > 0} \big| e_i^T (X^TX + \Delta_p)^{-1}\Delta_p\, e_j \big| < C,
\]
where $C$ is a finite constant that does not depend on $\delta_1, \ldots, \delta_p$.

Proof. We prove the result by induction on the integer $k$, where the induction hypothesis is as follows.

$H(k)$: Let $n$ be an arbitrary positive integer. Then, for the positive integer $k$, the matrix $(X^TX + \Delta_k)^{-1}\Delta_k$ is uniformly bounded for all $X \in \mathbb{R}^{n \times k}$ and every diagonal matrix $\Delta_k \in \mathbb{R}^{k \times k}$ with positive diagonal elements.

Initial step:
The hypothesis trivially holds for $k = 1$. We now show that $H(k)$ is true for the case $k = 2$. Let $X = [x_1, x_2] \in \mathbb{R}^{n \times 2}$ and
\[
\Delta_2 = \begin{bmatrix} \delta_1 & 0 \\ 0 & \delta_2 \end{bmatrix}, \quad \delta_1, \delta_2 > 0. \quad \text{Define } A := \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} := X^TX; \ \text{then}
\]
\[
(A + \Delta_2)^{-1}\Delta_2 = \frac{1}{(a_{1,1}a_{2,2} - a_{1,2}a_{2,1}) + \delta_2 a_{1,1} + \delta_1 a_{2,2} + \delta_1\delta_2} \begin{bmatrix} \delta_1(a_{2,2} + \delta_2) & -\delta_2 a_{1,2} \\ -\delta_1 a_{2,1} & \delta_2(a_{1,1} + \delta_1) \end{bmatrix}.
\]
• Note that
\[
\sup_{\delta_1, \delta_2 > 0} \big| e_1^T (A + \Delta_2)^{-1}\Delta_2\, e_1 \big| = \sup_{\delta_1, \delta_2 > 0} \frac{\delta_1(a_{2,2} + \delta_2)}{(a_{1,1}a_{2,2} - a_{1,2}a_{2,1}) + \delta_2 a_{1,1} + \delta_1(a_{2,2} + \delta_2)} \le 1,
\]
since $(a_{1,1}a_{2,2} - a_{1,2}a_{2,1}) \ge 0$ because $X^TX$ is a nonnegative definite matrix.
Additionally, $a_{1,1} = \|x_1\|^2 \ge 0$ and $a_{2,2} = \|x_2\|^2 \ge 0$, where $X = [x_1, x_2]$.

• Similarly,
\[
\sup_{\delta_1, \delta_2 > 0} \big| e_1^T (A + \Delta_2)^{-1}\Delta_2\, e_2 \big| = \sup_{\delta_1, \delta_2 > 0} \frac{|a_{1,2}|\,\delta_2}{(a_{1,1}a_{2,2} - a_{1,2}a_{2,1}) + \delta_2 a_{1,1} + \delta_1(a_{2,2} + \delta_2)} \ \le\ \frac{|a_{1,2}|}{a_{1,1}} \tag{A.7}
\]
for the case when $a_{1,1} \ne 0$. On the contrary, if $a_{1,1} = 0$, then it follows from (A.7) that
\[
\sup_{\delta_1, \delta_2 > 0} \big| e_1^T (A + \Delta_2)^{-1}\Delta_2\, e_2 \big| = 0, \tag{A.8}
\]
because $a_{1,1} = \|x_1\|^2 = 0$ implies that $a_{1,2} = x_1^Tx_2 = 0$. In a similar fashion we can show that the absolute values of the other two entries of the matrix $(X^TX + \Delta_2)^{-1}\Delta_2$ can be bounded above by numbers that do not depend on $\delta_1, \delta_2$. Consequently, $H(k)$ holds for $k = 2$.

Induction step:
Suppose $H(k)$ holds for $k = 1, 2, \ldots, (p-1)$. We now show that $H(k)$ holds for $k = p$ as well. Let $X \in \mathbb{R}^{n \times p}$ be an arbitrary matrix and $\Delta_p$ a diagonal matrix with positive diagonal elements $\delta_1, \ldots, \delta_p$. Consider the partition of the matrix $X^TX + \Delta_p$ as follows:
\[
X^TX + \Delta_p = \begin{bmatrix} \|x_1\|^2 + \delta_1 & x_1^TX_1 \\ X_1^Tx_1 & X_1^TX_1 + \Delta_{p-1} \end{bmatrix}, \tag{A.9}
\]
where $X_1$, $x_1$, $\Delta_{p-1}$ are as in Proposition A.2. Since $(X_1^TX_1 + \Delta_{p-1})^{-1}\Delta_{p-1}$ satisfies the conditions of the induction hypothesis $H(p-1)$, it is uniformly bounded. Therefore, using Proposition A.3, the first column of the matrix $(X^TX + \Delta_p)^{-1}\Delta_p$ is uniformly bounded.

In the remainder of the proof, we show that the $m$-th column of $(X^TX + \Delta_p)^{-1}\Delta_p$ is uniformly bounded for any $m > 1$.
Consider the permutation matrix $P_{1,m} = [e_m, e_2, \ldots, e_{m-1}, e_1, e_{m+1}, \ldots, e_p]$. Note that $P_{1,m}$ is generated by exchanging the $1$st and $m$-th columns of an identity matrix. $P_{1,m}$ is a symmetric and orthogonal matrix, i.e., $P_{1,m}^T = P_{1,m}$ and $P_{1,m}^TP_{1,m} = P_{1,m}^2 = I$. Now consider
\[
P_{1,m}(X^TX + \Delta_p)^{-1}\Delta_p P_{1,m} = P_{1,m}(X^TX + \Delta_p)^{-1}P_{1,m}\, P_{1,m}\Delta_p P_{1,m} = \big(P_{1,m}^TX^TXP_{1,m} + P_{1,m}\Delta_pP_{1,m}\big)^{-1}P_{1,m}\Delta_pP_{1,m} = (X^{*T}X^* + \Delta_p^*)^{-1}\Delta_p^*, \tag{A.10}
\]
where $X^* := XP_{1,m}$ is obtained by exchanging the first and $m$-th columns of $X$, while $\Delta_p^* := P_{1,m}\Delta_pP_{1,m}$ is the diagonal matrix in which the first and the $m$-th diagonal elements of $\Delta_p$ are exchanged. We can represent $X^{*T}X^* + \Delta_p^*$ as
\[
\begin{bmatrix} \|x_1^*\|^2 + \delta_1^* & x_1^{*T}X_1^* \\ X_1^{*T}x_1^* & X_1^{*T}X_1^* + \Delta_{p-1}^* \end{bmatrix},
\]
where the notations are analogous to those in (A.9). The matrix $(X_1^{*T}X_1^* + \Delta_{p-1}^*)^{-1}\Delta_{p-1}^*$ satisfies the conditions of the induction hypothesis $H(p-1)$, so the first column of $(X^{*T}X^* + \Delta_p^*)^{-1}\Delta_p^*$ is uniformly bounded as well. It follows from (A.10) that the permuted version of the first column of $(X^{*T}X^* + \Delta_p^*)^{-1}\Delta_p^*$ is
\[
P_{1,m}\big[(X^{*T}X^* + \Delta_p^*)^{-1}\Delta_p^*\, e_1\big] = P_{1,m}P_{1,m}(X^TX + \Delta_p)^{-1}\Delta_pP_{1,m}e_1 = \big[(X^TX + \Delta_p)^{-1}\Delta_p\big]e_m,
\]
which is the $m$-th column of $(X^TX + \Delta_p)^{-1}\Delta_p$. Therefore, we infer that all the columns of the matrix $(X^TX + \Delta_p)^{-1}\Delta_p$ are uniformly bounded, and conclude that $H(k)$ holds for the case $k = p$.
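As a numerical illustration of Proposition A.4 (again, no substitute for the induction argument), the largest entry of $(X^TX + \Delta_p)^{-1}\Delta_p$ stays bounded while the $\delta_j$'s sweep several orders of magnitude:

```r
## Illustration of Proposition A.4: entries of (X'X + Delta)^{-1} Delta
## stay bounded as the delta_j's vary over many orders of magnitude.
set.seed(3)
n <- 4; p <- 6
X <- matrix(rnorm(n * p), n, p)
max_entry <- replicate(500, {
  delta <- 10^runif(p, -4, 4)
  max(abs(solve(crossprod(X) + diag(delta)) %*% diag(delta)))
})
summary(max_entry)   # maxima stay bounded across all 500 random Deltas
```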
Proposition A.5. Let $X \in \mathbb{R}^{n \times p}$ be an arbitrary matrix and $\Delta_p \in \mathbb{R}^{p \times p}$ be any diagonal matrix with positive diagonal elements $\delta_1, \ldots, \delta_p$. Then for arbitrary $p$ and $n$:
1. The matrix $(X^TX + \Delta_p)^{-1}X^TX$ is uniformly bounded. Specifically,
\[
\sup_{\delta_1, \ldots, \delta_p > 0} \big| e_i^T (X^TX + \Delta_p)^{-1}X^TX\, e_j \big| < C,
\]
where $C$ is a finite constant that does not depend on $\delta_1, \ldots, \delta_p$.
2. The vector $(X^TX + \Delta_p)^{-1}X^Ty$ is uniformly bounded.

Proof. Part (1): Note that $(X^TX + \Delta_p)^{-1}X^TX = I - (X^TX + \Delta_p)^{-1}\Delta_p$. Using Proposition A.4, we know that the matrix $(X^TX + \Delta_p)^{-1}\Delta_p$ is uniformly bounded. Consequently, $(X^TX + \Delta_p)^{-1}X^TX$ is also uniformly bounded.

Part (2): Let $y = v_1 + v_2$, where $v_1$ belongs to the column space of $X$ and $v_2$ is perpendicular to the column space of $X$. Therefore $v_1 = Xl$ for some vector $l \in \mathbb{R}^p$. Consequently,
\[
(X^TX + \Delta_p)^{-1}X^Ty = (X^TX + \Delta_p)^{-1}X^T(v_1 + v_2) = (X^TX + \Delta_p)^{-1}X^T(Xl + v_2) = \big[(X^TX + \Delta_p)^{-1}X^TX\big]\, l.
\]
Therefore, part (1) of the result ensures that $(X^TX + \Delta_p)^{-1}X^Ty$ is uniformly bounded.
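Since the object of interest in this appendix is $\mu_j = e_j^T A^{-1} X^T y$, the same style of numerical illustration applies directly to part (2) of Proposition A.5, here with $n < p$:

```r
## Illustration of Proposition A.5, part (2): mu_j = e_j' (X'X + Delta)^{-1} X'y
## remains bounded in the diagonal entries of Delta (here with n < p).
set.seed(4)
n <- 4; p <- 6
X <- matrix(rnorm(n * p), n, p); y <- rnorm(n)
max_mu <- replicate(500, {
  delta <- 10^runif(p, -4, 4)
  max(abs(solve(crossprod(X) + diag(delta), crossprod(X, y))))
})
summary(max_mu)   # stays bounded, as guaranteed by Proposition A.5
```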
Appendix B: Other technical results

Proposition B.1. Let $\delta$ be chosen as in Lemma 2.1. Then for any $\epsilon > 0$ there exists $C_1 > 0$ (not depending on $\lambda$) such that
\[
E\Big[\big(\tau^2\big)^{\delta/2} \,\Big|\, \lambda, y\Big] \le C_1.
\]

Proof.
For any $\epsilon > 0$, note that
\[
E\Big[(\tau^2)^{\delta/2} \,\Big|\, \lambda, y\Big] = E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 < \epsilon]} \,\Big|\, \lambda, y\Big] + E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big] \le \epsilon^{\delta/2} + E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big]. \tag{B.1}
\]
Next we demonstrate an upper bound for the second term in (B.1):
\begin{align*}
E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big] &= \int_\epsilon^\infty (\tau^2)^{\delta/2}\, \pi(\tau^2 \mid \lambda, y)\, d\tau^2\\
&= \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2} \left(\frac{y^T(I_n - XA^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2 X^TX\Lambda|^{1/2}}\, d\tau^2}{\int_0^\infty \left(\frac{y^T(I_n - XA^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2 X^TX\Lambda|^{1/2}}\, d\tau^2}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2}\, \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2 X^TX\Lambda|^{1/2}}\, d\tau^2}{\int_0^\epsilon \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2 X^TX\Lambda|^{1/2}}\, d\tau^2}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2}\, \frac{\pi_\tau(\tau^2)}{|I_p + \epsilon X^TX\Lambda|^{1/2}}\, d\tau^2}{\int_0^\epsilon \frac{\pi_\tau(\tau^2)}{|I_p + \epsilon X^TX\Lambda|^{1/2}}\, d\tau^2}\\
&= \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2}\, \pi_\tau(\tau^2)\, d\tau^2}{\int_0^\epsilon \pi_\tau(\tau^2)\, d\tau^2} < \infty. \tag{B.2}
\end{align*}
This completes the proof with
\[
C_1 = \epsilon^{\delta/2} + \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2}\, \pi_\tau(\tau^2)\, d\tau^2}{\int_0^\epsilon \pi_\tau(\tau^2)\, d\tau^2}.
\]

Proposition B.2.
Suppose there exists a $\delta > 0$ such that
\[
\int_0^\infty (\tau^2)^{-\frac{p+\delta}{2}}\, \pi_\tau(\tau^2)\, d\tau^2 < \infty.
\]
Then for any $\epsilon > 0$ there exists $\tilde C > 0$ (not depending on $\lambda$) such that
\[
E\Big[\big(\tau^2\big)^{-\delta/2} \,\Big|\, \lambda, y\Big] \le \tilde C.
\]

Proof.
For any $\epsilon > 0$, note that
\[
E\Big[(\tau^2)^{-\delta/2} \,\Big|\, \lambda, y\Big] = E\Big[(\tau^2)^{-\delta/2} I_{[\tau^2 < \epsilon]} \,\Big|\, \lambda, y\Big] + E\Big[(\tau^2)^{-\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big] \le \epsilon^{-\delta/2} + E\Big[(\tau^2)^{-\delta/2} I_{[\tau^2 \le \epsilon]} \,\Big|\, \lambda, y\Big]. \tag{B.3}
\]
Next we demonstrate an upper bound for the second term in (B.3):
\begin{align*}
E\Big[(\tau^2)^{-\delta/2} I_{[\tau^2 \le \epsilon]} \,\Big|\, \lambda, y\Big] &= \int_0^\epsilon (\tau^2)^{-\delta/2}\, \pi(\tau^2 \mid \lambda, y)\, d\tau^2\\
&= \frac{\int_0^\epsilon (\tau^2)^{-\delta/2} \left(\frac{y^T(I_n - XA^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2X^TX\Lambda|^{1/2}}\, d\tau^2}{\int_0^\infty \left(\frac{y^T(I_n - XA^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2X^TX\Lambda|^{1/2}}\, d\tau^2}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_0^\epsilon (\tau^2)^{-\delta/2}\, \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2X^TX\Lambda|^{1/2}}\, d\tau^2}{\int_\epsilon^\infty \frac{\pi_\tau(\tau^2)}{|I_p + \tau^2X^TX\Lambda|^{1/2}}\, d\tau^2}\\
&= \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_0^\epsilon (\tau^2)^{-\frac{p+\delta}{2}}\, \frac{\pi_\tau(\tau^2)}{|\tau^{-2}I_p + X^TX\Lambda|^{1/2}}\, d\tau^2}{\int_\epsilon^\infty (\tau^2)^{-\frac{p}{2}}\, \frac{\pi_\tau(\tau^2)}{|\tau^{-2}I_p + X^TX\Lambda|^{1/2}}\, d\tau^2}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_0^\epsilon (\tau^2)^{-\frac{p+\delta}{2}}\, \frac{\pi_\tau(\tau^2)}{|\epsilon^{-1}I_p + X^TX\Lambda|^{1/2}}\, d\tau^2}{\int_\epsilon^\infty (\tau^2)^{-\frac{p}{2}}\, \frac{\pi_\tau(\tau^2)}{|\epsilon^{-1}I_p + X^TX\Lambda|^{1/2}}\, d\tau^2}\\
&= \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_0^\epsilon (\tau^2)^{-\frac{p+\delta}{2}}\, \pi_\tau(\tau^2)\, d\tau^2}{\int_\epsilon^\infty (\tau^2)^{-\frac{p}{2}}\, \pi_\tau(\tau^2)\, d\tau^2} < \infty. \tag{B.4}
\end{align*}

Proposition B.3.
Refer to (3.2) and Lemma 3.1. Then for any $\epsilon > 0$ and for any $\delta > 0$, there exists $C_2 > 0$ such that
\[
E\Big[(\tau^2)^{\delta/2} \,\Big|\, \lambda, y\Big] \le C_2.
\]
Proof. Fix an $\epsilon > 0$ and a $\delta > 0$. As before,
\[
E\Big[(\tau^2)^{\delta/2} \,\Big|\, \lambda, y\Big] = E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 < \epsilon]} \,\Big|\, \lambda, y\Big] + E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big] \le \epsilon^{\delta/2} + E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big]. \tag{B.5}
\]
Next we demonstrate an upper bound for the second term in (B.5):
\begin{align*}
E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big] &= \int_\epsilon^\infty (\tau^2)^{\delta/2}\, \pi(\tau^2 \mid \lambda, y)\, d\tau^2\\
&= \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2} \left(\frac{y^T(I_n - XA^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{|c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}{|X^TX + c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}\, \pi_\tau(\tau^2)\, d\tau^2}{\int_0^\infty \left(\frac{y^T(I_n - XA^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{|c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}{|X^TX + c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}\, \pi_\tau(\tau^2)\, d\tau^2}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2}\, \frac{|c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}{|X^TX + c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}\, \pi_\tau(\tau^2)\, d\tau^2}{\int_0^\epsilon \frac{|c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}{|X^TX + c^{-2}I_p + (\tau^2\Lambda)^{-1}|^{1/2}}\, \pi_\tau(\tau^2)\, d\tau^2} \tag{B.6}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{1}{\int_0^\epsilon \pi_\tau(\tau^2)\, d\tau^2} \int_\epsilon^\infty (\tau^2)^{\delta/2}\, \pi_\tau(\tau^2)\, d\tau^2 < \infty.
\end{align*}
Note that the ratio of the two determinants inside the integrals in the numerator and denominator of (B.6) can be represented in the form
\[
\frac{|B_1 + \tau^{-2}I_p|}{|B_1 + B_2 + \tau^{-2}I_p|} = \frac{\prod_{k=1}^p \big(s_k(B_1) + \tau^{-2}\big)}{\prod_{k=1}^p \big(s_k(B_1 + B_2) + \tau^{-2}\big)}
\]
for appropriate symmetric non-negative definite matrices $B_1$ and $B_2$, with their respective eigenvalues denoted by $s_k(\cdot)$. Since every eigenvalue of $B_1$ is bounded above by the corresponding eigenvalue of $B_1 + B_2$, it follows that the ratio of determinants is a decreasing function of $\tau^2$, and can be replaced by its value at $\tau^2 = \epsilon$ in both places, with the inequality going in the right direction. This completes the proof with
\[
C_2 = \epsilon^{\delta/2} + \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{1}{\int_0^\epsilon \pi_\tau(\tau^2)\, d\tau^2} \int_\epsilon^\infty (\tau^2)^{\delta/2}\, \pi_\tau(\tau^2)\, d\tau^2.
\]

Proposition B.4.
Let $\delta$ be chosen as in Theorem 3.2. Then for any $\epsilon > 0$ there exists $C_3 > 0$ (not depending on $\lambda$) such that
\[
E\Big[\big(\tau^2\big)^{\delta/2} \,\Big|\, \lambda, y\Big] \le C_3.
\]

Proof.
For any $\epsilon > T$ (without loss of generality, since the prior is supported on $[T, \infty)$),
\[
E\Big[(\tau^2)^{\delta/2} \,\Big|\, \lambda, y\Big] = E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 < \epsilon]} \,\Big|\, \lambda, y\Big] + E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big] \le \epsilon^{\delta/2} + E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big]. \tag{B.7}
\]
Next we demonstrate an upper bound for the second term in (B.7):
\begin{align*}
E\Big[(\tau^2)^{\delta/2} I_{[\tau^2 \ge \epsilon]} \,\Big|\, \lambda, y\Big] &= \int_\epsilon^\infty (\tau^2)^{\delta/2}\, \pi(\tau^2 \mid \lambda, y)\, d\tau^2\\
&= \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2} \left(\frac{y^T(I_n - XA_c^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{\pi_\tau(\tau^2)\, c(\tau^2)^p}{|\tau^2(X^TX + c^{-2}I_p) + \Lambda^{-1}|^{1/2}}\, d\tau^2}{\int_T^\infty \left(\frac{y^T(I_n - XA_c^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \frac{\pi_\tau(\tau^2)\, c(\tau^2)^p}{|\tau^2(X^TX + c^{-2}I_p) + \Lambda^{-1}|^{1/2}}\, d\tau^2}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2}\, \frac{\pi_\tau(\tau^2)\, c(\tau^2)^p}{|\tau^2(X^TX + c^{-2}I_p) + \Lambda^{-1}|^{1/2}}\, d\tau^2}{\int_T^\epsilon \frac{\pi_\tau(\tau^2)\, c(\tau^2)^p}{|\tau^2(X^TX + c^{-2}I_p) + \Lambda^{-1}|^{1/2}}\, d\tau^2}\\
&\le \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{\int_\epsilon^\infty (\tau^2)^{\delta/2}\, \frac{\pi_\tau(\tau^2)\, c(\tau^2)^p}{|\epsilon(X^TX + c^{-2}I_p) + \Lambda^{-1}|^{1/2}}\, d\tau^2}{\int_T^\epsilon \frac{\pi_\tau(\tau^2)\, c(\tau^2)^p}{|\epsilon(X^TX + c^{-2}I_p) + \Lambda^{-1}|^{1/2}}\, d\tau^2}\\
&\le \tilde C^p \left(\frac{y^Ty + 2b}{2b}\right)^{a+\frac{n}{2}} \frac{1}{\int_T^\epsilon \pi_\tau(\tau^2)\, d\tau^2} \int_T^\infty (\tau^2)^{\delta/2}\, (\tau^2)^{p/2}\, \pi_\tau(\tau^2)\, d\tau^2. \tag{B.8}
\end{align*}
The last inequality follows from the fact that
\[
1 \le c(\tau^2) \le \tilde C \sqrt{\tau^2}
\]
for an appropriate constant $\tilde C$, and the final integral is finite by the assumption of Theorem 3.2.

Proposition B.5. Let $f : \mathbb{R}^p \mapsto [0, \infty)$ and $g : \mathbb{R}^p \mapsto (0, \infty)$ be two functions such that $\int_{\mathbb{R}^p} f(x)\, dx < \infty$ and $0 < \int_{\mathbb{R}^p} f(x)g(x)\, dx < \infty$. Then
\[
\int_{\mathbb{R}^p} \frac{f(x)}{g(x)}\, dx \ \ge\ \frac{\left(\int_{\mathbb{R}^p} f(x)\, dx\right)^2}{\int_{\mathbb{R}^p} f(x)g(x)\, dx}.
\]
Proof. Follows from the Cauchy-Schwarz inequality.
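For the reader's convenience, the Cauchy-Schwarz step behind Proposition B.5 can be written out in one display:
\[
\left(\int_{\mathbb{R}^p} f(x)\, dx\right)^2 = \left(\int_{\mathbb{R}^p} \sqrt{f(x)g(x)}\, \sqrt{\frac{f(x)}{g(x)}}\, dx\right)^2 \ \le\ \left(\int_{\mathbb{R}^p} f(x)g(x)\, dx\right) \left(\int_{\mathbb{R}^p} \frac{f(x)}{g(x)}\, dx\right),
\]
and rearranging gives the stated bound.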
Proposition B.6.
For any $j \in \{1, 2, \cdots, p\}$ and any $d > 0$, there exists some $\alpha > 0$ such that
\[
\int_0^\infty \frac{\nu_j^{-2} \exp\left[-\frac{1}{\nu_j}\left(d^{2/\delta} + \frac{1}{\lambda_j^2}\right)\right]}{\sqrt{\nu_j} + \frac{\sigma\sqrt{\tau^2}}{|\beta_j|}}\, d\nu_j \ \ge\ \alpha \left(1 + \frac{1}{\lambda_j^2}\right)^{-2} \left(1 + \frac{\sigma\sqrt{\tau^2}}{|\beta_j|}\right)^{-1}.
\]

Proof.
Follows from Proposition B.5 with
\[
f(\nu_j) = \nu_j^{-2} \exp\left[-\frac{1}{\nu_j}\left(d^{2/\delta} + \frac{1}{\lambda_j^2}\right)\right], \qquad g(\nu_j) = \sqrt{\nu_j} + \frac{\sigma\sqrt{\tau^2}}{|\beta_j|},
\]
and a constant $\alpha > 0$ depending only on $d^{2/\delta}$.

Proposition B.7.
There exists a positive definite matrix $M_\tau$ such that
\begin{align*}
&\int_{\mathbb{R}^p} \frac{\exp\left[-\frac{(\beta - \Omega^{-1}X^Ty)^T \Omega\, (\beta - \Omega^{-1}X^Ty) + \beta^T(\tau^2\Lambda)^{-1}\beta}{2\sigma^2}\right]}{\prod_{j=1}^p \left(1 + \frac{\sigma\sqrt{\tau^2}}{|\beta_j|}\right)}\, d\beta\\
&\qquad \ge\ (2\pi\sigma^2)^{p/2}\, |c|^{-p}\, |M_\tau|^{-1} \left(\frac{\sqrt{\tau^2}}{|c|}\right)^{-p} \exp\left[-\frac{y^TX\big(c^{-2}I_p + \Omega^{-1} - M_\tau^{-1}\big)X^Ty}{2\sigma^2}\right].
\end{align*}

Proof.
Follows from Proposition B.5 with
\[
f(\beta) = \exp\left[-\frac{(\beta - \Omega^{-1}X^Ty)^T \Omega\, (\beta - \Omega^{-1}X^Ty) + \beta^T(\tau^2\Lambda)^{-1}\beta}{2\sigma^2}\right], \qquad g(\beta) = \prod_{j=1}^p \left(1 + \frac{\sigma\sqrt{\tau^2}}{|\beta_j|}\right),
\]
and $M_\tau = \Omega + (\tau^2\Lambda)^{-1}$.

Appendix C: Minorization condition for the Horseshoe Gibbs sampler

Lemma C.1.
For every $d > 0$, there exists a constant $\epsilon^* = \epsilon^*(V, d) > 0$ and a density function $h$ on $\mathbb{R}_+^p$ such that
\[
k(\lambda, \tilde\lambda) \ \ge\ \epsilon^*\, h(\tilde\lambda) \tag{C.1}
\]
for every $\lambda \in B(V, d)$ (see Section 2.2 for the definition).

Proof: Fix a $\lambda \in B(V, d)$. In order to prove (C.1) we demonstrate appropriate lower bounds for the conditional densities appearing in (2.3). From (2.2) we have the following:
\[
\pi(\tau^2 \mid \lambda, y) \ \ge\ \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} (\omega_1^*)^{-p/2}\, (\tau^2)^{-p/2}\, \pi_\tau(\tau^2),
\]
where $\omega_1^* = \frac{1}{T} + \bar\omega\, d^{2/\delta}$ (recall that $\bar\omega$ is the maximum eigenvalue of $X^TX$ and that the prior density $\pi_\tau$ is truncated below at some $T > 0$). Similarly,
\[
\pi(\nu \mid \lambda, y) \ \ge\ \prod_{j=1}^p \left\{\nu_j^{-2} \exp\left[-\frac{1}{\nu_j}\big(1 + d^{2/\delta}\big)\right]\right\},
\]
\[
\pi(\sigma^2 \mid \tau^2, \lambda, y) \ \ge\ \frac{b^{a+\frac{n}{2}}}{\Gamma\big(a+\frac{n}{2}\big)}\, (\sigma^2)^{-(a+\frac{n}{2})-1} \exp\left[-\frac{1}{2\sigma^2}\big(y^Ty + 2b\big)\right],
\]
\[
\pi(\beta \mid \sigma^2, \tau^2, \lambda, y) \ \ge\ (2\pi\sigma^2)^{-p/2}\, d^{-p/\delta}\, (\tau^2)^{-p/2} \exp\left[-\frac{(\beta - M^{-1}X^Ty)^T M\, (\beta - M^{-1}X^Ty) + y^T(I - XM^{-1}X^T)y}{2\sigma^2}\right],
\]
since
\begin{align*}
(\beta - A^{-1}X^Ty)^T A\, (\beta - A^{-1}X^Ty) &= \beta^TA\beta - 2\beta^TX^Ty + y^TXA^{-1}X^Ty\\
&\le \beta^TM\beta - 2\beta^TX^Ty + y^Ty\\
&= (\beta - M^{-1}X^Ty)^T M\, (\beta - M^{-1}X^Ty) + y^T(I - XM^{-1}X^T)y,
\end{align*}
where $M = \omega_2^*\big(1 + \frac{1}{\tau^2}\big)I_p$ and $\omega_2^* = \max\{\bar\omega,\, d^{2/\delta}\}$. Finally,
\[
\pi(\tilde\lambda \mid \beta, \nu, \sigma^2, \tau^2, y) \ \ge\ \prod_{j=1}^p \left\{\frac{\beta_j^2}{2\sigma^2\tau^2}\, (\tilde\lambda_j^2)^{-2} \exp\left[-\frac{1}{\tilde\lambda_j^2}\left(\frac{1}{\nu_j} + \frac{\beta_j^2}{2\sigma^2\tau^2}\right)\right]\right\}. \tag{C.2}
\]
Putting all the lower bounds in (C.2) into the expression for the MTD (2.3), we have:
\begin{align*}
k(\lambda, \tilde\lambda) \ \ge\ & (2\pi)^{-p/2}\, (\omega_1^*)^{-p/2}\, d^{-p/\delta} \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} \frac{b^{a+\frac{n}{2}}}{\Gamma\big(a+\frac{n}{2}\big)}\\
& \times \int_{[T,\infty)} \int_{\mathbb{R}_+} \int_{\mathbb{R}^p} \int_{\mathbb{R}_+^p} \prod_{j=1}^p \left\{\nu_j^{-2} \exp\left[-\frac{1}{\nu_j}\left(1 + d^{2/\delta} + \frac{1}{\tilde\lambda_j^2}\right)\right]\right\}\\
& \times \exp\left[-\frac{(\beta - M^{-1}X^Ty)^T M\, (\beta - M^{-1}X^Ty) + \beta^T(\tau^2\tilde\Lambda)^{-1}\beta}{2\sigma^2}\right] \exp\left[-\frac{y^T(I - XM^{-1}X^T)y + y^Ty + 2b}{2\sigma^2}\right]\\
& \times (\sigma^2)^{-(a+\frac{n+p}{2})-1} \prod_{j=1}^p \left\{\frac{\beta_j^2}{2\sigma^2\tau^2}\, (\tilde\lambda_j^2)^{-2}\right\} (\tau^2)^{-p/2}\, (\tau^2)^{-p/2}\, \pi_\tau(\tau^2)\ d\nu\, d\beta\, d\sigma^2\, d\tau^2.
\end{align*}
Next we perform the inner integral with respect to $\nu$, and noting that $1 + d^{2/\delta} + \frac{1}{\tilde\lambda_j^2} \le \big(1 + d^{2/\delta}\big)\big(1 + \frac{1}{\tilde\lambda_j^2}\big)$, we have:
\begin{align*}
k(\lambda, \tilde\lambda) \ \ge\ & (2\pi)^{-p/2}\, (\omega_1^*)^{-p/2}\, d^{-p/\delta} \big(1 + d^{2/\delta}\big)^{-p} \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} \frac{b^{a+\frac{n}{2}}}{\Gamma\big(a+\frac{n}{2}\big)}\\
& \times \int_{[T,\infty)} \int_{\mathbb{R}_+} \int_{\mathbb{R}^p} \prod_{j=1}^p \left\{\frac{\beta_j^2}{2\sigma^2\tau^2}\right\} \exp\left[-\frac{(\beta - M^{-1}X^Ty)^T M\, (\beta - M^{-1}X^Ty) + \beta^T(\tau^2\tilde\Lambda)^{-1}\beta}{2\sigma^2}\right]\\
& \times (\sigma^2)^{-(a+\frac{n+p}{2})-1} \exp\left[-\frac{y^T(I - XM^{-1}X^T)y + y^Ty + 2b}{2\sigma^2}\right]\\
& \times \prod_{j=1}^p \left(1 + \frac{1}{\tilde\lambda_j^2}\right)^{-1} (\tilde\lambda_j^2)^{-2}\, (\tau^2)^{-p/2}\, (\tau^2)^{-p/2}\, \pi_\tau(\tau^2)\ d\beta\, d\sigma^2\, d\tau^2.
\end{align*}
Now recall that $M = \omega_2^*\big(1 + \frac{1}{\tau^2}\big)I_p$.
Hence
\begin{align*}
(\beta - M^{-1}X^Ty)^T M\, (\beta - M^{-1}X^Ty) + \beta^T(\tau^2\tilde\Lambda)^{-1}\beta &= \beta^T\big(M + (\tau^2\tilde\Lambda)^{-1}\big)\beta - 2\beta^TX^Ty + y^TXM^{-1}X^Ty\\
&\le \beta^TQ\beta - 2\beta^TX^Ty + y^TXM^{-1}X^Ty\\
&= (\beta - Q^{-1}X^Ty)^T Q\, (\beta - Q^{-1}X^Ty) + y^TX\big(M^{-1} - Q^{-1}\big)X^Ty,
\end{align*}
where $Q = \omega_2^*\big(1 + \frac{1}{\tau^2}\big)\big(I_p + \tilde\Lambda^{-1}\big)$. Hence it follows that
\begin{align*}
k(\lambda, \tilde\lambda) \ \ge\ & (\omega_1^*)^{-p/2}\, d^{-p/\delta} \big(1 + d^{2/\delta}\big)^{-p} \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} \frac{b^{a+\frac{n}{2}}}{\Gamma\big(a+\frac{n}{2}\big)}\\
& \times \int_{[T,\infty)} \int_{\mathbb{R}_+} \int_{\mathbb{R}^p} \prod_{j=1}^p \left\{\frac{\beta_j^2}{2\sigma^2\tau^2}\right\} (2\pi\sigma^2)^{-p/2}\, |Q|^{1/2} \exp\left[-\frac{(\beta - Q^{-1}X^Ty)^T Q\, (\beta - Q^{-1}X^Ty)}{2\sigma^2}\right]\\
& \times |Q|^{-1/2}\, (\sigma^2)^{-(a+\frac{n}{2})-1} \exp\left[-\frac{y^T(I - XQ^{-1}X^T)y + y^Ty + 2b}{2\sigma^2}\right]\\
& \times \prod_{j=1}^p \left(1 + \frac{1}{\tilde\lambda_j^2}\right)^{-1} (\tilde\lambda_j^2)^{-2}\, (\tau^2)^{-p/2}\, (\tau^2)^{-p/2}\, \pi_\tau(\tau^2)\ d\beta\, d\sigma^2\, d\tau^2.
\end{align*}
Note that if $\beta \sim N\big(Q^{-1}X^Ty,\ \sigma^2 Q^{-1}\big)$, then the innermost integral with respect to $\beta$ is equal to
\begin{align*}
E\left[\prod_{j=1}^p \frac{\beta_j^2}{2\sigma^2\tau^2}\right] &= (2\sigma^2\tau^2)^{-p} \prod_{j=1}^p E\big[\beta_j^2\big]; \quad \text{since } Q \text{ is a diagonal matrix, the } \beta_j\text{'s are independent,}\\
&\ge (2\sigma^2\tau^2)^{-p} \prod_{j=1}^p \mathrm{Var}[\beta_j] = (2\omega_2^*)^{-p}\, (1 + \tau^2)^{-p} \prod_{j=1}^p \left(1 + \frac{1}{\tilde\lambda_j^2}\right)^{-1}.
\end{align*}
Also, noting that $|Q| = (\omega_2^*)^p \big(1 + \frac{1}{\tau^2}\big)^p \prod_{j=1}^p \big(1 + \frac{1}{\tilde\lambda_j^2}\big)$, we have the following lower bound:
\begin{align*}
k(\lambda, \tilde\lambda) \ \ge\ & 2^{-p}\, (\omega_1^*)^{-p/2}\, (\omega_2^*)^{-3p/2}\, d^{-p/\delta} \big(1 + d^{2/\delta}\big)^{-p} \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} \frac{b^{a+\frac{n}{2}}}{\Gamma\big(a+\frac{n}{2}\big)}\\
& \times \int_{[T,\infty)} \int_{\mathbb{R}_+} (\sigma^2)^{-(a+\frac{n}{2})-1} \exp\left[-\frac{y^T(I - XQ^{-1}X^T)y + y^Ty + 2b}{2\sigma^2}\right]\\
& \times \prod_{j=1}^p \left(1 + \frac{1}{\tilde\lambda_j^2}\right)^{-5/2} (\tilde\lambda_j^2)^{-2}\, (1 + \tau^2)^{-p}\big(1 + \tfrac{1}{\tau^2}\big)^{-p/2}(\tau^2)^{-p}\, \pi_\tau(\tau^2)\ d\sigma^2\, d\tau^2.
\end{align*}
Further noting that $y^T(I - XQ^{-1}X^T)y \le y^Ty$, and that $(1+\tau^2)^{-p}\big(1 + \frac{1}{\tau^2}\big)^{-p/2} \ge \big(1 + \frac{1}{T}\big)^{-3p/2}(\tau^2)^{-p}$ for $\tau^2 \ge T$, we have:
\begin{align*}
k(\lambda, \tilde\lambda) \ \ge\ & 2^{-p}\, (\omega_1^*)^{-p/2}\, (\omega_2^*)^{-3p/2} \big(1 + \tfrac{1}{T}\big)^{-3p/2} d^{-p/\delta} \big(1 + d^{2/\delta}\big)^{-p} \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} \frac{b^{a+\frac{n}{2}}}{\Gamma\big(a+\frac{n}{2}\big)}\\
& \times \int_{[T,\infty)} \int_{\mathbb{R}_+} (\sigma^2)^{-(a+\frac{n}{2})-1} \exp\left[-\frac{y^Ty + b}{\sigma^2}\right] d\sigma^2\ (\tau^2)^{-2p}\, \pi_\tau(\tau^2)\, d\tau^2 \times \prod_{j=1}^p \left(1 + \frac{1}{\tilde\lambda_j^2}\right)^{-5/2} (\tilde\lambda_j^2)^{-2}.
\end{align*}
Integrating with respect to $\sigma^2$, we have:
\[
k(\lambda, \tilde\lambda) \ \ge\ 3^{-p}\, (\omega_1^*)^{-p/2}\, (\omega_2^*)^{-3p/2} \big(1 + \tfrac{1}{T}\big)^{-3p/2} d^{-p/\delta} \big(1 + d^{2/\delta}\big)^{-p} \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} \left(\frac{b}{y^Ty + b}\right)^{a+\frac{n}{2}} \prod_{j=1}^p \frac{3}{2}\, \frac{\sqrt{\tilde\lambda_j^2}}{\big(1 + \tilde\lambda_j^2\big)^{5/2}} \int_T^\infty (\tau^2)^{-2p}\, \pi_\tau(\tau^2)\, d\tau^2 = \epsilon^*\, h(\tilde\lambda),
\]
where
\[
\epsilon^* = 3^{-p}\, (\omega_1^*)^{-p/2}\, (\omega_2^*)^{-3p/2} \big(1 + \tfrac{1}{T}\big)^{-3p/2} d^{-p/\delta} \big(1 + d^{2/\delta}\big)^{-p} \left(\frac{2b}{y^Ty + 2b}\right)^{a+\frac{n}{2}} \left(\frac{b}{y^Ty + b}\right)^{a+\frac{n}{2}} \int_T^\infty (\tau^2)^{-2p}\, \pi_\tau(\tau^2)\, d\tau^2
\]
and $h$ is a probability density on $\mathbb{R}_+^p$ given by
\[
h(\tilde\lambda) = \prod_{j=1}^p \frac{3}{2}\, \frac{\sqrt{\tilde\lambda_j^2}}{\big(1 + \tilde\lambda_j^2\big)^{5/2}}\, I_{(0,\infty)}\big(\tilde\lambda_j^2\big).
\]
Hence, the minorization condition for the MTD (2.3) is established.

Appendix D: Samplers for the conditional posterior distributions of $\lambda$ and $\tau^2$ for $K_{aug,reg}$
D.1. Rejection sampler for $\lambda$

Recall that the target distribution $g(\cdot \mid \nu, \beta, \sigma^2, \tau^2, y)$ has density proportional to the function $\phi(\cdot)$, where
\[
\phi(x) = \left(\frac{1}{c^2} + \frac{1}{\tau^2 x}\right)^{1/2} x^{-3/2} \exp\left[-\frac{1}{x}\left(\frac{1}{\nu} + \frac{\beta^2}{2\sigma^2\tau^2}\right)\right].
\]
Consider a probability density function $\psi$ on $\mathbb{R}_+$ as follows:
\[
\psi(x) = \frac{|c|^{-1}\sqrt{\frac{1}{\nu} + \frac{\beta^2}{2\sigma^2\tau^2}}}{\sqrt{\pi}\left(|c|^{-1} + (\tau^2)^{-1/2}\right)}\, x^{-3/2} \exp\left[-\frac{1}{x}\left(\frac{1}{\nu} + \frac{\beta^2}{2\sigma^2\tau^2}\right)\right] + \frac{(\tau^2)^{-1/2}\left(\frac{1}{\nu} + \frac{\beta^2}{2\sigma^2\tau^2}\right)}{|c|^{-1} + (\tau^2)^{-1/2}}\, x^{-2} \exp\left[-\frac{1}{x}\left(\frac{1}{\nu} + \frac{\beta^2}{2\sigma^2\tau^2}\right)\right].
\]
Note that the above is a convex combination of two Inverse-Gamma densities and is easy to sample from. After simple algebraic manipulation, one can show that
\[
\sup_{x \in (0,\infty)} \frac{\phi(x)}{\psi(x)} \le M, \quad \text{where} \quad M = \frac{\sqrt{\pi}\left(|c|^{-1} + (\tau^2)^{-1/2}\right)}{\sqrt{\frac{1}{\nu} + \frac{\beta^2}{2\sigma^2\tau^2}}} + \frac{|c|^{-1} + (\tau^2)^{-1/2}}{\frac{1}{\nu} + \frac{\beta^2}{2\sigma^2\tau^2}}.
\]
We apply the following algorithm. For $i = 1, 2, \cdots$:

1. sample $X_i$ from $\psi(\cdot)$;
2. sample $U_i$ from the uniform distribution over $(0, 1)$;
3. accept $X_i$ if $U_i \le \frac{\phi(X_i)}{M\, \psi(X_i)}$; otherwise, reject $X_i$;
4. repeat the above three steps until we reach a sample of a predetermined size, say, $p$.
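The following R sketch implements this accept-reject scheme for a single draw of $\lambda_j^2$; the function and argument names are illustrative, and the mixture weights and envelope constant $M$ are exactly those displayed above.

```r
## Rejection sampler sketch for one lambda_j^2 draw (names are illustrative).
sample_lambda2 <- function(nu, beta_j, sigma2, tau2, cc) {
  m  <- 1 / nu + beta_j^2 / (2 * sigma2 * tau2)       # common Inverse-Gamma rate
  w1 <- 1 / abs(cc); w2 <- 1 / sqrt(tau2)             # mixture weights (unnormalized)
  phi <- function(x) sqrt(1 / cc^2 + 1 / (tau2 * x)) * x^(-3/2) * exp(-m / x)
  psi <- function(x) (w1 * sqrt(m) / sqrt(pi) * x^(-3/2) +
                      w2 * m * x^(-2)) * exp(-m / x) / (w1 + w2)
  M <- sqrt(pi) * (w1 + w2) / sqrt(m) + (w1 + w2) / m # envelope constant
  repeat {
    ## draw from the two-component Inverse-Gamma mixture psi
    shape <- if (runif(1) < w1 / (w1 + w2)) 1 / 2 else 1
    x <- 1 / rgamma(1, shape = shape, rate = m)       # Inverse-Gamma(shape, m)
    if (runif(1) <= phi(x) / (M * psi(x))) return(x)  # accept; else try again
  }
}
```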
D.2. Metropolis sampler for $\tau^2$

Recall that the target distribution $\pi(\cdot \mid \lambda, y)$ has density proportional to the function $\phi(\cdot)$, where
\[
\phi(x) = |A_c|^{-1/2} \prod_{j=1}^p \left(\frac{1}{c^2} + \frac{1}{x\lambda_j^2}\right)^{1/2} \left(\frac{y^T(I_n - XA_c^{-1}X^T)y}{2} + b\right)^{-(a+\frac{n}{2})} \pi_\tau(x),
\]
$A_c = X^TX + (x\Lambda)^{-1} + c^{-2}I_p$, and $\pi_\tau(\cdot)$ is a probability density function supported on $\mathbb{R}_+$. We will also need to pick what is called a "proposal distribution" that changes location at each iteration of the algorithm; we will call this $q(u \mid x)$. Then the algorithm is:

1. Choose some initial value $x_0$.
2. For $i = 1, \cdots, p$:
(a) sample $x_i^*$ from $q(u \mid x_{i-1})$;
(b) set $x_i = x_i^*$ with probability
\[
\alpha = \min\left(\frac{\phi(x_i^*)\, q(x_{i-1} \mid x_i^*)}{\phi(x_{i-1})\, q(x_i^* \mid x_{i-1})},\ 1\right);
\]
otherwise set $x_i = x_{i-1}$.
Oftentimes we choose $q$ to be a $N(x, 1)$ distribution. This has the convenient property of symmetry, which means that $q(u \mid x) = q(x \mid u)$, so the quantity $\alpha$ simplifies to
\[
\alpha = \min\left(\frac{\phi(x_i^*)}{\phi(x_{i-1})},\ 1\right),
\]
which is much easier to calculate. This variant is called a Metropolis sampler.
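A minimal R sketch of this random-walk Metropolis update is given below; log_phi is an assumed user-supplied function returning $\log \phi$ (and -Inf for arguments outside the support), so all names here are illustrative.

```r
## Random-walk Metropolis for tau^2 (sketch). 'log_phi' is assumed to return
## the log of the unnormalized target, and -Inf outside the support (x <= 0).
metropolis_tau2 <- function(x0, log_phi, n_iter = 1000, sd = 1) {
  x <- numeric(n_iter + 1); x[1] <- x0
  for (i in 2:(n_iter + 1)) {
    prop <- rnorm(1, mean = x[i - 1], sd = sd)   # symmetric N(x, sd^2) proposal
    if (log(runif(1)) < log_phi(prop) - log_phi(x[i - 1])) {
      x[i] <- prop                               # accept
    } else {
      x[i] <- x[i - 1]                           # reject: stay put
    }
  }
  x
}
```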
References

[1] Bhattacharya, Anirban, Chakraborty, Antik and Mallick, Bani K. Fast sampling with Gaussian scale mixture priors in high-dimensional regression // Biometrika. 2016. 103, 4. 985–991.
[2] Bhattacharya, Anirban, Pati, Debdeep, Pillai, Natesh and Dunson, David. Dirichlet–Laplace Priors for Optimal Shrinkage // Journal of the American Statistical Association. 2015. 110.
[3] Biswas, Niloy, Bhattacharya, Anirban, Jacob, Pierre and Johndrow, James. Coupled Markov chain Monte Carlo for high-dimensional regression with Half-t priors // arXiv. Dec 2020.
[4] Carvalho, Carlos M., Polson, Nicholas G. and Scott, James G. The horseshoe estimator for sparse signals // Biometrika. 2010. 97, 2. 465–480.
[5] Diaconis, Persi, Khare, Kshitij and Saloff-Coste, Laurent. Gibbs Sampling, Exponential Families and Orthogonal Polynomials // Statistical Science. May 2008. 23, 2. 151–178.
[6] Flegal, James M. and Jones, Galin L. Batch means and spectral variance estimators in Markov chain Monte Carlo // Ann. Statist. 2010. 38, 2. 1034–1070.
[7] Hahn, Paul, He, Jingyu and Lopes, Hedibert. Bayesian Factor Model Shrinkage for Linear IV Regression With Many Instruments // Journal of Business and Economic Statistics. 2018. 36, 2. 278–287.
[8] Johndrow, James E., Orenstein, Paulo and Bhattacharya, Anirban. Scalable Approximate MCMC Algorithms for the Horseshoe Prior // Journal of Machine Learning Research. 2020. 21. 1–61.
[9] Khare, Kshitij and Hobert, James P. Geometric ergodicity of the Bayesian lasso // Electron. J. Statist. 2013. 7. 2150–2163.
[10] Livingstone, Samuel, Betancourt, Michael, Byrne, Simon and Girolami, Mark. On the geometric ergodicity of Hamiltonian Monte Carlo // Bernoulli. 2019. 25, 4A. 3109–3138.
[11] Lu, Tzon-Tzer and Shiou, Sheng-Hua. Inverses of 2 x 2 block matrices // Computers and Mathematics with Applications. 2002. 43. 119–129.
[12] Makalic, Enes and Schmidt, Daniel F. A Simple Sampler for the Horseshoe Estimator // IEEE Signal Processing Letters. Jan 2016. 23, 1. 179–182.
[13] Meyn, S. P. and Tweedie, R. L. Markov Chains and Stochastic Stability. London: Springer-Verlag, 1993.
[14] Nishimura, Akihiko and Suchard, Marc A. Shrinkage with shrunken shoulders: inference via geometrically / uniformly ergodic Gibbs sampler // arXiv. Jul 2020.
[15] Pal, Subhadip, Khare, Kshitij and Hobert, James P. Trace Class Markov Chains for Bayesian Inference with Generalized Double Pareto Shrinkage Priors // Scandinavian Journal of Statistics. June 2017. 44, 2. 307–323.
[16] Pal, Subhadip and Khare, Kshitij. Geometric ergodicity for Bayesian shrinkage models // Electron. J. Statist. 2014. 8, 1. 604–645.
[17] Piironen, Juho and Vehtari, Aki. Sparsity information and regularization in the horseshoe and other shrinkage priors // Electron. J. Statist. 2017. 11, 2. 5018–5051.
[18] Polson, Nicholas G. and Scott, James G. On the Half-Cauchy Prior for a Global Scale Parameter // Bayesian Analysis. 2012. 7, 4. 887–902.
[19] Roberts, Gareth and Rosenthal, Jeffrey. Markov chains and de-initializing processes // Scandinavian Journal of Statistics. 2001. 28. 489–504.
[20]