An Efficient Gibbs Sampling Algorithm for Bayesian Graphical LASSO Models with the Positive Definite Constraint on the Precision Matrix*

Sakae Oya† (Graduate School of Economics, Keio University)
Teruo Nakatsuma (Faculty of Economics, Keio University)
Abstract
Wang (2012) proposed a novel Gibbs sampling algorithm for Bayesian analysis of Gaussian graphical LASSO models. In this paper, we propose a modification to Wang (2012)'s algorithm so that the precision matrix in a graphical LASSO model never fails to be positive definite in any cycle of the Gibbs sampling procedure. Our proposed algorithm is designed to sample the off-diagonal elements of the precision matrix exactly from the region where the precision matrix remains positive definite. As a result, it is more stable in the sense that the sampling procedure will not halt due to numerical exceptions related to the lack of positive definiteness. The simulation results show that our proposed algorithm significantly improves the performance of parameter estimation and graphical structure learning.
Keywords:
Hit-and-Run algorithm, Gibbs sampler, Positive definiteness, Precision matrix

* This research is supported by JSPS Grants-in-Aid for Scientific Research Grant Number 19K01592 and the Keio Economic Society.
[email protected]

1 Introduction
Suppose Y is an (n × p) data matrix of p variables and n observations, and the t-th row vector of Y, y_t (1 ≤ t ≤ n), follows a multivariate normal distribution N(0, Ω^{-1}), where Ω = (ω_ij) (1 ≤ i, j ≤ p) is the inverse of the covariance matrix, called the precision matrix. In the multivariate normal distribution, ω_ij = 0 implies that y_ti and y_tj are conditionally independent given the other variables. Therefore the set of non-zero off-diagonal elements in Ω constitutes an undirected graphical structure among (y_t1, ..., y_tp), which is called the Gaussian graphical model. We may estimate Ω by maximizing the log likelihood:

    ℓ(Ω) = −(np/2) log 2π + (n/2) log|Ω| − (1/2) tr(SΩ),    (1)

where S = (s_ij) = Y'Y. In practice, however, the MLE with (1) does not produce estimates of off-diagonal ω_ij's that are exactly equal to zero. To obtain "zero estimates" of ω_ij's, we may employ a LASSO-type penalized MLE:

    max_{Ω ∈ M+}  (n/2) log|Ω| − (1/2) tr(SΩ) − λ‖Ω‖₁,    (2)

where ‖Ω‖₁ = Σ_{i≤j} |ω_ij| and M+ is the subset of the parameter space of Ω in which Ω is a positive definite precision matrix. The solution of (2) is called the graphical LASSO estimator, and this model has been studied extensively in recent years, including Meinshausen and Bühlmann (2006), Yuan and Lin (2007), Banerjee et al. (2008), Friedman et al. (2008) and Guo et al. (2011), among others.
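For reference, the penalized problem (2) with a common shrinkage parameter can be computed with off-the-shelf software. Below is a minimal sketch using scikit-learn's GraphicalLasso (a convenient choice we assume here, not part of this paper's method); implementations differ in how they scale the penalty and whether they penalize the diagonal, so alpha corresponds to λ only up to such conventions.

    # A minimal sketch (not from the paper): the graphical LASSO point
    # estimator in (2) computed with scikit-learn.
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)
    Y = rng.standard_normal((200, 10))        # n = 200 observations, p = 10 variables

    model = GraphicalLasso(alpha=0.1).fit(Y)  # alpha plays the role of lambda
    Omega_hat = model.precision_              # sparse estimate of the precision matrix
    print((np.abs(Omega_hat) > 1e-8).sum())   # number of non-zero entries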
Note that the penalty in (2) is equivalent to the logarithm of

    p(ω_ij) = { λ e^{−λ ω_ii},        (i = j);
              { (λ/2) e^{−λ |ω_ij|},  (i ≠ j).    (3)

From the viewpoint of Bayesian statistics, as in Marlin et al. (2009) and Marlin and Murphy (2009), the graphical LASSO estimator is a MAP (maximum a posteriori) estimator of Ω in which the prior distribution of each diagonal element is exponential and that of each off-diagonal element is Laplace, as in (3). This is a natural extension of the original Bayesian LASSO by Park and Casella (2008), who extended the LASSO regression by Tibshirani (1996) to a Bayesian counterpart.

Based on this interpretation, Wang (2012) and Khondker et al. (2013) independently proposed Markov chain sampling algorithms to generate the precision matrix Ω from its posterior distribution. Wang (2012) developed a Gibbs sampling algorithm, while Khondker et al. (2013) devised a Metropolis-Hastings algorithm that can generate a positive definite precision matrix. In this paper, we explore Wang (2012)'s approach since it is a pure Gibbs sampler and does not suffer from the problem of a low acceptance rate even if the dimension of Ω is high.

Let us preview Wang (2012)'s algorithm (block Gibbs sampler) briefly, though we will discuss it in detail later in Section 2. Wang (2012)'s block Gibbs sampler generates the i-th diagonal element ω_ii and the off-diagonal elements in the i-th column (or row) alternately in the following fashion.

Block Gibbs sampler for the precision matrix
  For i = 1, ..., p, repeat Step 1 to Step 3.
  Step 1: Partition Ω into the i-th diagonal element ω_ii, the off-diagonal elements (ω_1i, ..., ω_{i−1,i}, ω_{i+1,i}, ..., ω_pi) and the rest.
  Step 2: Generate ω_ii from its full conditional posterior distribution.
  Step 3: Generate (ω_1i, ..., ω_{i−1,i}, ω_{i+1,i}, ..., ω_pi) from their full conditional posterior distribution.

These full conditional posterior distributions will be derived in Section 2. Since Wang (2012)'s block Gibbs sampler enables us to generate Ω from the posterior distribution so easily, it has become an indispensable building block for recent applied research in Bayesian analysis of Gaussian graphical models. For example, Li et al. (2019) proposed a Gibbs sampling algorithm for Bayesian graphical horseshoe models, which is a natural extension of Wang (2012)'s block Gibbs sampler for the Bayesian graphical LASSO.

Although this block Gibbs sampler is nice and elegant, the precision matrix Ω generated with this sampling algorithm is not necessarily positive definite, because Step 3 does not generate the off-diagonal elements of Ω from M+. To demonstrate our point, we ran Monte Carlo experiments similar to the ones conducted by Wang (2012). We generated data sets with six different graph structures (AR(1), AR(2), Block, Star, Circle and Full) and two different dimensions (p = 30, 100), in which the shrinkage parameter λ may differ from element to element in Ω. We will go into details about the design of each Monte Carlo experiment in Section 4. The number of iterations in the block Gibbs sampler is 10,000 for each experiment. Thus, if we count every Ω that is partially updated in Step 1 to Step 3 as a distinctive one, we have 300,000 (p = 30) or 1,000,000 (p = 100) replications of Ω in one experiment. The results of the Monte Carlo experiments are summarized in Table 1.

Table 1: The number of violations of the positive definiteness of Ω

  p    AR(1)    AR(2)    Block    Star     Circle    Full
  30   7,644    561      12       27       77,768    14
       (2.55)   (0.19)   (0.00)   (0.01)   (25.88)   (0.00)
  100  566      9        0        2,524    205,093   0
       (0.06)   (0.00)   (0.00)   (0.25)   (20.51)   (0.00)

  Notes: (a) the number of generated Ω's is p × 10,000; (b) figures in parentheses are percentages of the total.

In the case of p = 30, the violation of positive definiteness occurs in all designs. In particular, about one quarter of the generated Ω's do not satisfy the positive definiteness in the Circle design. In the case of p = 100, the violation of positive definiteness is less severe for some designs, but the ratio of violations is still high (20.51%) in the Circle design.

To address this issue, we propose to improve Wang (2012)'s block Gibbs sampler so that the generated Ω never fails to be positive definite. Although it may seem intractable to guarantee the positive definiteness of Ω in each cycle of the block Gibbs sampler, it turns out that the Hit-and-Run algorithm by Bélisle et al. (1993) is applicable to the Bayesian (adaptive) graphical LASSO in a fairly straightforward manner, and the resultant algorithm is a pure Gibbs sampler without any Metropolis-Hastings step. Therefore our proposed algorithm enjoys the same efficiency as Wang (2012)'s algorithm, yet it prevents Ω from violating positive definiteness.
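The violation counts in Table 1 can be tallied with a simple positive definiteness check after each partial update of Ω. Below is a minimal sketch of such a check, assuming Ω is stored as a NumPy array; the helper name is ours, not taken from the paper's code. It relies on the fact that a Cholesky factorization of a symmetric matrix succeeds if and only if the matrix is positive definite.

    # A minimal sketch (illustrative, not from the paper's supplementary code):
    # np.linalg.cholesky succeeds if and only if the symmetric matrix is
    # positive definite, so a LinAlgError signals a violation.
    import numpy as np

    def is_positive_definite(Omega: np.ndarray) -> bool:
        """Return True if the symmetric matrix Omega is positive definite."""
        try:
            np.linalg.cholesky(Omega)
            return True
        except np.linalg.LinAlgError:
            return False

    # Example: a symmetric matrix that is not positive definite
    # (its determinant is 2.0 * 1.0 - 1.9**2 < 0).
    Omega = np.array([[2.0, 1.9],
                      [1.9, 1.0]])
    print(is_positive_definite(Omega))  # False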
The main body of this paper is organized as follows. In Section 2, we will briefly review Wang (2012)'s Gibbs sampling algorithm for the Bayesian adaptive graphical LASSO, though Wang (2012) also derived the algorithm for the Bayesian graphical LASSO with the common shrinkage parameter. (We explain the Bayesian adaptive LASSO in Section 2; in our experience, the violation of positive definiteness occurs whether it is adaptive or not.) In Section 3, we will derive the full conditional posterior distributions of the elements of the precision matrix under the positive definiteness constraint and propose our modified sampler. In Section 4, we will report the results of Monte Carlo experiments. Section 5 concludes.

2 Wang (2012)'s Block Gibbs Sampler

In this section, we briefly review the Gibbs sampling algorithm developed by Wang (2012). Although Wang (2012) derived it for the Bayesian graphical LASSO with the prior distribution (3) as well, we consider a more general prior setting that allows λ in (3) to vary for each element of the precision matrix Ω, namely,

    p(ω_ij) = { λ_ii e^{−λ_ii ω_ii},         (i = j);
              { (λ_ij/2) e^{−λ_ij |ω_ij|},   (i ≠ j),    (4)

which is called the adaptive graphical LASSO. (Wang (2012) assumes that the prior distribution of each diagonal element ω_ii is (λ_ii/2) exp(−(λ_ii/2) ω_ii) instead of λ_ii exp(−λ_ii ω_ii). This is because Wang (2012) employs ‖Ω‖₁ = Σ_{i=1}^p Σ_{j=1}^p |ω_ij| as the penalty, in which each off-diagonal element ω_ij (i ≠ j) appears twice, whereas ours is ‖Ω‖₁ = Σ_{i=1}^p Σ_{j=1}^i |ω_ij|, which includes the lower triangular part of Ω only.) Since Wang (2012) demonstrated that the Bayesian adaptive LASSO outperformed the non-adaptive counterpart in terms of parameter estimation and graphical structure learning, we illustrate the Gibbs sampling algorithm for the Bayesian adaptive graphical LASSO in detail.

To derive the Gibbs sampling algorithm, Wang (2012) utilizes the well-known fact that the Laplace distribution in (4) can be expressed as a scale mixture of normal distributions with an exponential mixing distribution:

    ω_ij | τ_ij ~ N(0, τ_ij),   τ_ij ~ Exp(λ_ij²/2).    (5)

By using the gamma distribution Ga(r, s) as the common prior for λ_ij (1 ≤ i ≤ j ≤ p), we obtain the joint posterior distribution of ω = {ω_ij}_{i≤j}, τ = {τ_ij}_{i<j} and λ = {λ_ij}_{i≤j}:

    p(ω, τ, λ | Y) ∝ |Ω|^{n/2} exp[−(1/2) tr(SΩ)] × ∏_{i=1}^p λ_ii e^{−λ_ii ω_ii}
                     × ∏_{i<j} (2π τ_ij)^{−1/2} exp(−ω_ij²/(2τ_ij)) (λ_ij²/2) e^{−λ_ij² τ_ij/2}
                     × ∏_{i≤j} λ_ij^{r−1} e^{−s λ_ij}.    (6)

The full conditional posterior distributions of λ_ij and τ_ij are standard:

    λ_ij | ω, Y ~ Ga(r + 1, s + |ω_ij|),    (7)
    1/τ_ij | ω, λ, Y ~ IG(λ_ij/|ω_ij|, λ_ij²),    (8)

where IG(µ′, λ′) denotes the inverse Gaussian distribution with mean µ′ and shape λ′. The non-trivial part is the update of the elements of Ω itself. Partition Ω as

    Ω = [ Ω_11  ω_12 ; ω_12'  ω_22 ],    (9)

where Ω_11 is a ((p − 1) × (p − 1)) matrix, ω_12 is a ((p − 1) × 1) vector and ω_22 is a scalar. Without loss of generality, we can rearrange the rows and columns of Ω so that the lower-right corner of Ω, ω_22, is the diagonal element to be generated from its full conditional posterior distribution. Likewise, we partition S, Υ and λ as

    S = [ S_11  s_12 ; s_12'  s_22 ],   Υ = [ Υ_11  τ_12 ; τ_12'  0 ],   λ = [ Λ_11  λ_12 ; λ_12'  λ_22 ],    (10)

where Υ is the (p × p) symmetric matrix whose off-diagonal (i, j) element is τ_ij and whose diagonal elements are all equal to zero, while λ_22 is the element in λ that corresponds with the diagonal element ω_22 in the prior distribution (4).

With the partition of Ω in (9) and that of S in (10), we have

    tr(SΩ) = s_22 ω_22 + 2 s_12' ω_12 + tr(S_11 Ω_11),   and   |Ω| = (ω_22 − ω_12' Ω_11^{-1} ω_12) |Ω_11|.

Then the likelihood can be expressed as

    p(Y | Ω) ∝ |Ω|^{n/2} exp[−(1/2) tr(SΩ)]
             ∝ (ω_22 − ω_12' Ω_11^{-1} ω_12)^{n/2} |Ω_11|^{n/2}
               × exp[−(1/2){s_22 ω_22 + 2 s_12' ω_12 + tr(S_11 Ω_11)}].    (11)

By applying a change of variables,

    (ω_12, ω_22) → (β = ω_12, γ = ω_22 − ω_12' Ω_11^{-1} ω_12),    (12)

to the likelihood (11), we have

    p(Y | Ω) ∝ γ^{n/2} exp[−(1/2){s_22 γ + s_22 β' Ω_11^{-1} β + 2 s_12' β + tr(S_11 Ω_11)}]
             ∝ γ^{n/2} exp[−(1/2){s_22 γ + s_22 β' Ω_11^{-1} β + 2 s_12' β}].    (13)

With the adaptive prior (4) and the flat prior p(γ) ∝ constant, Wang (2012) proposes to use

    γ | θ_{−γ}, Y ~ Ga(n/2 + 1, (s_22 + 2λ_22)/2),    (14)
    β | θ_{−β}, Y ~ N(−C s_12, C),   C = {(s_22 + 2λ_22) Ω_11^{-1} + D_τ^{-1}}^{-1},   D_τ = diag(τ_12),    (15)

as the full conditional posterior distributions of γ and β, where θ_{−γ} (resp. θ_{−β}) stands for all the remaining parameters other than γ (resp. β).
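The following is a minimal sketch of one draw from (14) and (15) for a single column, assuming the partition (9)-(10) has already been formed; the function name and argument layout are illustrative, not taken from the paper's supplementary code. Note that numpy parameterizes the gamma distribution by a scale rather than a rate, and that nothing in this unconstrained update enforces positive definiteness, which is the issue taken up in Section 3.

    # A minimal sketch (illustrative names) of Wang (2012)'s draws (14)-(15)
    # for one column of Omega, given the partition (9)-(10).
    import numpy as np

    def wang_column_update(Omega11_inv, s12, s22, lam22, tau12, n, rng):
        """One unconstrained update of (omega_12, omega_22) as in (14)-(15)."""
        # (14): gamma ~ Ga(n/2 + 1, rate=(s22 + 2*lam22)/2); numpy uses scale = 1/rate.
        gamma = rng.gamma(shape=n / 2 + 1, scale=2.0 / (s22 + 2.0 * lam22))
        # (15): beta ~ N(-C s12, C), C = {(s22 + 2*lam22) Omega11^{-1} + D_tau^{-1}}^{-1}
        C = np.linalg.inv((s22 + 2.0 * lam22) * Omega11_inv + np.diag(1.0 / tau12))
        beta = rng.multivariate_normal(-C @ s12, C)
        omega12 = beta                                      # new off-diagonal column
        omega22 = gamma + omega12 @ Omega11_inv @ omega12   # back out omega_22 via (12)
        return omega12, omega22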
In summary, Wang (2012)'s block Gibbs sampler is given as follows.

Block Gibbs sampler for all parameters
  For i = 1, ..., p, repeat Step 1 to Step 5.
  Step 1: Rearrange Ω, S, Υ and λ so that ω_ii is in the place of ω_22 in Ω, and partition them as in (9) and (10).
  Step 2: Generate γ ← Ga(n/2 + 1, (s_22 + 2λ_22)/2) and set ω_22 = γ + ω_12' Ω_11^{-1} ω_12.
  Step 3: Generate β ← N(−C s_12, C) and set ω_12 = β.
  Step 4: Generate λ_ij ← Ga(r + 1, s + |ω_ij|).
  Step 5: Generate υ_ij ← IG(λ_ij/|ω_ij|, λ_ij²) and set τ_ij = 1/υ_ij for the off-diagonal elements.

3 Modified Block Gibbs Sampler

As we pointed out in the introduction, Wang (2012)'s block Gibbs sampler does not necessarily guarantee the positive definiteness of the generated Ω's. In this section, we directly derive the full conditional posterior distributions of ω_12 and ω_22 without the transformation (12), and propose an efficient sampling method to generate them under the positive definiteness constraint Ω ∈ M+.

First, let us derive the full conditional posterior distribution of ω_22. Given that Ω from the previous iteration of the block Gibbs sampler is positive definite, the newly generated ω_12 and ω_22 must satisfy

    ω_22 > ω_12' Ω_11^{-1} ω_12    (16)

to ensure that the updated Ω is also positive definite. In other words, the conditional prior distribution of ω_22 given ω_12 and Ω_11 must be

    p(ω_22 | ω_12, Ω_11) ∝ λ_22 exp(−λ_22 ω_22) 1_{M+_22}(ω_22),    (17)

where M+_22 = {ω_22 : ω_22 > ω_12' Ω_11^{-1} ω_12}. Therefore, by ignoring the parts that do not depend on ω_22 in (11), we have

    p(ω_22 | θ_{−ω_22}, Y) ∝ (ω_22 − ω_12' Ω_11^{-1} ω_12)^{n/2} exp(−(s_22/2) ω_22) exp(−λ_22 ω_22) 1_{M+_22}(ω_22)
                           ∝ (ω_22 − ω_12' Ω_11^{-1} ω_12)^{n/2} exp[−((s_22 + 2λ_22)/2)(ω_22 − ω_12' Ω_11^{-1} ω_12)] 1_{M+_22}(ω_22).    (18)

The full conditional posterior distribution of ω_22 in (18) is a shifted gamma distribution:

    ω_22 = u + ω_12' Ω_11^{-1} ω_12,   u ~ Ga(n/2 + 1, (s_22 + 2λ_22)/2).    (19)

Obviously, the distribution of u in (19) is equivalent to that of γ in (14). Thus (19) and (14) are basically identical to each other, and ω_22 generated from either (19) or (14) always satisfies the positive definiteness condition (16).

Next, let us derive the full conditional posterior distribution of ω_12. For the same reason as in (17), the conditional prior distribution of ω_12 must be the following truncated multivariate normal distribution:

    p(ω_12 | ω_22, Ω_11) ∝ exp(−(1/2) ω_12' D_τ^{-1} ω_12) 1_{M+_12}(ω_12),    (20)

where M+_12 = {ω_12 : ω_22 > ω_12' Ω_11^{-1} ω_12}. As a result, the full conditional posterior distribution of ω_12 is also a truncated multivariate normal distribution:

    ω_12 | θ_{−ω_12}, Y ~ N(−C s_12, C) 1_{M+_12}(ω_12).    (21)

On the other hand, Wang (2012) proposes to use the unconstrained multivariate normal distribution (15), which does not impose the truncation 1_{M+_12}(ω_12), to generate ω_12 (= β). Consequently, if we generate ω_12 from (15), there is no guarantee that the newly updated ω_12 satisfies the positive definiteness condition (16). This is the reason why the generated Ω's are not always positive definite, as shown in Table 1. Therefore, in order to ensure the positive definiteness of Ω, it is preferable to use the truncated multivariate normal distribution (21) in the block Gibbs sampler.

Since either a naive rejection method or a Metropolis-Hastings algorithm is inefficient even for a modest-size graphical model, we apply the Hit-and-Run algorithm (Bélisle et al. (1993)) to generate ω_12 from the truncated multivariate normal distribution (21).

Hit-and-Run algorithm
  Step 1: Pick a point α on the unit sphere randomly as α = z/‖z‖, z ~ N(0, I).
  Step 2: Generate a random scalar κ from the distribution with the density

    f(κ) ∝ p(ω_12 + κα) 1_{M+_12}(ω_12 + κα),    (22)

  where p(·) is the density of N(−C s_12, C) in (21).
  Step 3: Set ω_12 + κα as the new ω_12.
It is straightforward to show that the distribution of κ in (22) is

    κ ~ N(µ_κ, σ_κ²) 1_{M+_12}(ω_12 + κα),    (23)

where

    µ_κ = −(s_12' α + ω_12' C^{-1} α)/(α' C^{-1} α),   σ_κ² = 1/(α' C^{-1} α).

The indicator function 1_{M+_12}(ω_12 + κα) is equal to one if and only if

    (ω_12 + κα)' Ω_11^{-1} (ω_12 + κα) − ω_22 < 0.

This means that κ must satisfy

    aκ² + 2bκ + c < 0,   a = α' Ω_11^{-1} α,   b = ω_12' Ω_11^{-1} α,   c = ω_12' Ω_11^{-1} ω_12 − ω_22.

Note that a > 0 and c < 0 because Ω is positive definite, so that b² − ac > 0 and the quadratic equation aκ² + 2bκ + c = 0 has two distinct real roots. Therefore the distribution in (23) is a truncated univariate normal distribution on the interval

    R+ = {κ : (−b − √(b² − ac))/a < κ < (−b + √(b² − ac))/a}.

Hence, by using the Hit-and-Run algorithm, sampling from the seemingly intractable distribution (21) is reduced to sampling from the truncated univariate normal distribution

    κ ~ N(µ_κ, σ_κ²) 1_{R+}(κ),

and the sampling procedure becomes far simpler.

By replacing (14) in Step 2 with (19) and (15) in Step 3 with the Hit-and-Run algorithm, we have the modified block Gibbs sampler as follows.

Modified block Gibbs sampler
  For i = 1, ..., p, repeat Step 1 to Step 5.
  Step 1: Rearrange Ω, S, Υ and λ so that ω_ii is in the place of ω_22 in Ω, and partition them as in (9) and (10).
  Step 2: Generate u ← Ga(n/2 + 1, (s_22 + 2λ_22)/2) and set ω_22 = u + ω_12' Ω_11^{-1} ω_12.
  Step 3: (a) Generate z ← N(0, I) and set α = z/‖z‖.
          (b) Generate κ ← N(µ_κ, σ_κ²) 1_{R+}(κ) and update the old ω_12 with ω_12 + κα.
  Step 4: Generate λ_ij ← Ga(r + 1, s + |ω_ij|).
  Step 5: Generate υ_ij ← IG(λ_ij/|ω_ij|, λ_ij²) and set τ_ij = 1/υ_ij for the off-diagonal elements.
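The following is a minimal sketch of Step 3, with illustrative names; scipy.stats.truncnorm is one convenient (assumed) choice for the truncated univariate normal draw. Here C_inv denotes C^{-1} = (s_22 + 2λ_22) Ω_11^{-1} + D_τ^{-1}, which is available without inverting C.

    # A minimal sketch (illustrative, not the paper's supplementary code) of
    # one Hit-and-Run update of omega_12 inside the positive definite region.
    import numpy as np
    from scipy.stats import truncnorm

    def hit_and_run_update(omega12, omega22, Omega11_inv, C_inv, s12, rng):
        """Step 3 of the modified sampler: move omega_12 along a random direction."""
        z = rng.standard_normal(omega12.shape)
        alpha = z / np.linalg.norm(z)                  # Step 3(a): random direction
        # Untruncated normal parameters for kappa as in (23)
        denom = alpha @ C_inv @ alpha
        mu = -(s12 @ alpha + omega12 @ C_inv @ alpha) / denom
        sigma = np.sqrt(1.0 / denom)
        # Coefficients of a*kappa^2 + 2*b*kappa + c < 0, which defines R+
        a = alpha @ Omega11_inv @ alpha
        b = omega12 @ Omega11_inv @ alpha
        c = omega12 @ Omega11_inv @ omega12 - omega22
        root = np.sqrt(b * b - a * c)                  # real since a > 0 and c < 0
        lo, hi = (-b - root) / a, (-b + root) / a
        # Draw kappa from N(mu, sigma^2) truncated to (lo, hi);
        # truncnorm takes standardized bounds.
        kappa = truncnorm.rvs((lo - mu) / sigma, (hi - mu) / sigma,
                              loc=mu, scale=sigma, random_state=rng)
        return omega12 + kappa * alpha                 # Step 3(b): the new omega_12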
4 Monte Carlo Experiments

In this section, we report the results of Monte Carlo experiments in order to compare our modified block Gibbs sampler with Wang (2012)'s original algorithm in terms of accuracy in parameter estimation and graphical structure learning. For brevity, we shall refer to Wang (2012)'s original algorithm as BGS (block Gibbs sampler) and our modified version as HRS (Hit-and-Run sampler). Following Wang (2012), we examine the following six different specifications of the Gaussian graphical model in the Monte Carlo experiments:

(a) AR(1): σ_ij = 0.7^{|i−j|}.
(b) AR(2): ω_ii = 1, ω_{i,i−1} = ω_{i−1,i} = 0.5 and ω_{i,i−2} = ω_{i−2,i} = 0.25.
(c) Block: σ_ii = 1, σ_ij = 0.5 for 1 ≤ i ≠ j ≤ p/2, σ_ij = 0.5 for p/2 + 1 ≤ i ≠ j ≤ p, and σ_ij = 0 otherwise.
(d) Star: ω_ii = 1, ω_{1,i} = ω_{i,1} = 0.1, and ω_ij = 0 otherwise.
(e) Circle: ω_ii = 2, ω_{i−1,i} = ω_{i,i−1} = 1, and ω_{1p} = ω_{p1} = 0.9.
(f) Full: ω_ii = 2 and ω_ij = 1 for i ≠ j,

where σ_ij (1 ≤ i, j ≤ p) is the (i, j) element of the covariance matrix Ω^{-1} in the Gaussian graphical model.

Other settings for the Monte Carlo experiments also mirror Wang (2012). For each model, we generate a sample of (p × 1) random vectors y_1, ..., y_n independently from N(0, Ω^{-1}). We consider two cases, (n, p) = (50, 30) and (n, p) = (200, 100), so that we have 12 (= 6 × 2) scenarios in the experiments. The hyperparameters in the prior distribution of λ_ij are r = 10^{-2} and s = 10^{-6}. For both BGS and HRS, the number of burn-in iterations is 5,000 and the Monte Carlo sample from the following 10,000 iterations is used in Bayesian inference. (The same design of simulation, i.e., the specifications of Ω, the combinations of (n, p), the hyperparameters, the burn-in iterations and the size of the Monte Carlo sample, was used in producing the results in Table 1.) We repeat each scenario of the simulation 50 times and obtain a set of point estimates of Ω. All computations were implemented on a workstation with 64GB RAM and a six-core 3.4GHz Intel Xeon processor using Python 3.6.1. HRS requires additional computations because it explicitly imposes the positive definite constraint Ω ∈ M+, but we observed only a modest difference in computation time between HRS and BGS.

In order to compare HRS with BGS in terms of accuracy in point estimation of the precision matrix Ω, we compute two sample loss functions, Stein's loss and the Frobenius norm, as measurements of the discrepancy between the point estimate and the true Ω. Table 2 shows the sample median loss (Stein's loss in the upper half and the Frobenius norm in the lower half) of the 50 replications in the 12 scenarios for BGS and HRS. Figures in parentheses are the standard errors.

Table 2: Sample Median Loss in Point Estimation of Ω

                 AR(1)   AR(2)   Block   Star    Circle  Full
  Stein's loss
  p = 30   BGS   1.88    4.48    1.38    1.52    1.81    19.31
                 (0.32)  (0.49)  (0.28)  (0.26)  (0.32)  (0.87)
           HRS   …       …       …       …       …       …
                 (0.20)  (0.18)  (0.18)  (0.20)  (0.16)  (0.52)
  p = 100  BGS   3.02    4.25    2.81    3.75    3.08    69.65
                 (0.20)  (0.26)  (0.18)  (0.22)  (0.18)  (1.10)
           HRS   …       …       …       …       …       …
                 (0.08)  (0.08)  (0.07)  (0.08)  (0.05)  (0.72)
  Frobenius norm
  p = 30   BGS   4.04    3.01    2.19    2.19    2.51    29.61
                 (0.55)  (0.18)  (0.35)  (0.30)  (0.42)  (0.06)
           HRS   …       …       …       …       …       …
                 (0.27)  (0.13)  (0.22)  (0.22)  (0.07)  (0.49)
  p = 100  BGS   4.38    2.33    2.90    3.20    2.59    99.61
                 (0.29)  (0.12)  (0.13)  (0.13)  (0.25)  (0.02)
           HRS   …       …       …       …       …       …
                 (0.11)  (0.05)  (0.08)  (0.07)  (0.03)  (0.41)

  Notes: (a) the smaller loss is boldfaced; (b) figures in parentheses are the standard errors.

The loss is unanimously and substantially smaller for HRS than for BGS. This observation is valid not only for the Circle model, in which the positive definiteness of Ω is most frequently violated as shown in Table 1, but also for the other models with different graphical structures. Interestingly, HRS outperforms BGS even for the Full model, in which Ω is not sparse and the estimation loss of the graphical LASSO model is expected to be much worse. Furthermore, this tendency is unchanged in both the small-size (p = 30) and the large-size (p = 100) models. All in all, the results in Table 2 suggest that imposing the positive definiteness constraint remarkably improves the accuracy in point estimation of Ω in the Bayesian adaptive graphical LASSO.

To assess the performance of graphical structure learning, we check whether the point estimate of Ω can successfully restore the true structure from the simulated data. Recall that there is no connection between nodes, say node i and node j (1 ≤ i, j ≤ p), if ω_ij = 0. As in Fan et al. (2009), we use the following rule to determine whether a pair of nodes is connected or not:

    |ˆω_ij| ≥ 10^{-3}   (node i and node j are connected);
    |ˆω_ij| < 10^{-3}   (node i and node j are not connected),    (24)

where ˆω_ij is the point estimate of ω_ij computed with the Monte Carlo sample of Ω that we generate for each scenario with HRS or BGS. Then, with the estimated graphical structures (we have 50 of them), accuracy in graphical structure learning is measured with three criteria: specificity, sensitivity and the Matthews correlation coefficient (MCC), namely

    Specificity = TN/(TN + FP),   Sensitivity = TP/(TP + FN),
    MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),    (25)

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives in the 50 replications, respectively.
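Below is a minimal sketch (ours, not the paper's code) of how rule (24) and the criteria in (25) can be computed for a single point estimate against the true precision matrix; in the experiments, the TP/TN/FP/FN counts are accumulated over the 50 replications before the ratios are formed.

    # A minimal sketch: apply rule (24) to a point estimate and compute the
    # three criteria in (25) over the off-diagonal elements.
    import numpy as np

    def structure_metrics(Omega_hat, Omega_true, tol=1e-3):
        """Return (specificity, sensitivity, MCC) of the estimated graph."""
        off = ~np.eye(Omega_true.shape[0], dtype=bool)   # off-diagonal mask
        est = np.abs(Omega_hat[off]) >= tol              # (24): detected edges
        true = Omega_true[off] != 0                      # true edges
        tp = float(np.sum(est & true))
        tn = float(np.sum(~est & ~true))
        fp = float(np.sum(est & ~true))
        fn = float(np.sum(~est & true))
        specificity = tn / (tn + fp)
        sensitivity = tp / (tp + fn)
        mcc = (tp * tn - fp * fn) / np.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return specificity, sensitivity, mcc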
Table 3 reports the calculated criteria for the 12 scenarios.

Table 3: Accuracy in Graphical Structure Learning

                 AR(1)   AR(2)   Block   Star    Circle
  Specificity
  p = 30   BGS   6.00    10.22   7.10    6.77    12.34
           HRS   …       …       …       …       …
  p = 100  BGS   10.69   20.39   12.93   12.15   28.45
           HRS   …       …       …       …       …
  Sensitivity
  p = 30   BGS   100.00  99.31   100.00  …       …
           HRS   …       …       …       …       …
  p = 100  BGS   100.00  100.00  100.00  100.00  100.00
           HRS   100.00  100.00  100.00  100.00  100.00
  MCC
  p = 30   BGS   7.85    12.39   5.19    3.45    11.74
           HRS   …       …       …       …       …
  p = 100  BGS   5.96    11.18   3.89    6.41    10.86
           HRS   …       …       …       …       …

  Notes: (a) the better result is boldfaced; (b) figures are in percentage.

In Table 3, HRS outperforms BGS for all scenarios except for the sensitivity of the Star model with p = 30, though the sensitivity of HRS is still more than 90%. Especially in the case of p = 100, the values of specificity are more than 90% for HRS, which means that most of the zero off-diagonal elements in Ω are correctly identified. This accuracy is quite crucial when we try to detect the true graphical structure in practice. It seems that imposing the positive definiteness constraint also enhances graphical structure learning in the Bayesian adaptive graphical LASSO.

(The results of BGS in Table 3 are far different from those in Wang (2012, Table 2). We guess that this discrepancy is caused by the difference in the criteria for detecting connections. Wang (2012, p. 882) states "we claim {ω_ij = 0} if ˆω_ij < 10^{-3} as Fan et al. (2009)", which means that a negative ˆω_ij, whether it is near or far away from zero, is regarded as evidence against a connection between nodes. As a result, negative relations between nodes would be over-rejected and the estimated graphical structure would be too sparse, in the sense that the precision matrix includes too many zeros in the off-diagonal elements. To confirm this conjecture, we recalculated the three criteria in (25) without the absolute value in (24) and found that the recalculated results were comparably similar to those in Wang (2012).)

5 Conclusion

In this paper, we proposed a modification to Wang (2012)'s famous Gibbs sampling algorithm for the Bayesian graphical LASSO. Our modified algorithm guarantees the positive definiteness of the precision matrix throughout the loop of Gibbs sampling by generating the off-diagonal elements of the precision matrix from a truncated multivariate normal distribution whose support is the region where the updated precision matrix remains positive definite. To facilitate sampling from such a complicated distribution, we proposed to utilize the Hit-and-Run algorithm by Bélisle et al. (1993). The derived algorithm is still a pure Gibbs sampler and keeps the efficiency and scalability of Wang (2012)'s original algorithm. In the simulation study, we showed that our modified algorithm remarkably improves accuracy in point estimation as well as in graphical structure learning. Since the part of the Gibbs sampling algorithm in which the precision matrix is updated is common to other graphical shrinkage models such as the graphical horseshoe model, it is straightforward to incorporate our modified algorithm into the Gibbs sampling algorithm for the Bayesian graphical horseshoe model by Li et al. (2019) or other related sampling procedures for Gaussian graphical models with scale-mixture-of-normals shrinkage priors.
Supplementary Material

Python code for the block Gibbs sampler and the Hit-and-Run sampler used in this paper is available from https://github.com/sakaesaserunrun/onglasso .

References

O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485-516, 2008.

C. J. P. Bélisle, H. E. Romeijn, and R. L. Smith. Hit-and-Run algorithms for generating multivariate distributions. Mathematics of Operations Research, 18(2):255-266, 1993.

J. Fan, Y. Feng, and Y. Wu. Network exploration via the adaptive lasso and SCAD penalties. Annals of Applied Statistics, 3(2):521-541, 2009.

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.

J. Guo, E. Levina, G. Michailidis, and J. Zhu. Joint estimation of multiple graphical models. Biometrika, 98(1):1-15, 2011.

Z. Khondker, H. Zhu, W. Lin, and J. Ibrahim. The Bayesian covariance lasso. Statistics and Its Interface, 6(2):243-259, 2013.

Y. Li, B. A. Craig, and A. Bhadra. The graphical horseshoe estimator for inverse covariance matrices. Journal of Computational and Graphical Statistics, 28(3):747-757, 2019. doi: 10.1080/10618600.2019.1575744.

B. M. Marlin and K. P. Murphy. Sparse Gaussian graphical models with unknown block structure. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pages 705-712, New York, NY, USA, 2009. Association for Computing Machinery.

B. M. Marlin, M. Schmidt, and K. P. Murphy. Group sparse priors for covariance estimation. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09), pages 383-392, Arlington, Virginia, USA, 2009. AUAI Press.

N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436-1462, 2006.

T. Park and G. Casella. The Bayesian lasso. Journal of the American Statistical Association, 103:681-686, 2008.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.

H. Wang. Bayesian graphical lasso methods and efficient posterior computation. Bayesian Analysis, 7(4):867-886, 2012.

M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19-35, 2007.