[PDF] BDgraph: An R Package for Bayesian Structure Learning in Graphical Models

Abstract

Graphical models provide powerful tools to uncover complicated patterns in multivariate data and are commonly used in Bayesian statistics and machine learning. In this paper, we introduce the R package BDgraph which performs Bayesian structure learning for general undirected graphical models (decomposable and non-decomposable) with continuous, discrete, and mixed variables. The package efficiently implements recent improvements in the Bayesian literature, including that of Mohammadi and Wit (2015) and Dobra and Mohammadi (2018). To speed up computations, the computationally intensive tasks have been implemented in C++ and interfaced with R, and the package has parallel computing capabilities. In addition, the package contains several functions for simulation and visualization, as well as several multivariate datasets taken from the literature and used to describe the package capabilities. The paper includes a brief overview of the statistical methods which have been implemented in the package. The main part of the paper explains how to use the package. Furthermore, we illustrate the package's functionality in both real and artificial examples.

Full PDF

JJSS

Journal of Statistical Software

April 2019, Volume 89, Issue 3. doi: 10.18637/jss.v089.i03

BDgraph : An R Package for Bayesian StructureLearning in Graphical Models

Reza Mohammadi

University of Amsterdam

Ernst C. Wit

Universita della Svizzera Italiana

Abstract

Graphical models provide powerful tools to uncover complicated patterns in multi-variate data and are commonly used in Bayesian statistics and machine learning. In thispaper, we introduce an R package BDgraph which performs Bayesian structure learn-ing for general undirected graphical models (decomposable and non-decomposable) withcontinuous, discrete, and mixed variables. The package eﬃciently implements recent im-provements in the Bayesian literature, including that of Mohammadi and Wit (2015) andDobra and Mohammadi (2018). To speed up computations, the computationally intensivetasks have been implemented in

C++ and interfaced with R , and the package has parallelcomputing capabilities. In addition, the package contains several functions for simula-tion and visualization, as well as several multivariate datasets taken from the literatureand are used to describe the package capabilities. The paper includes a brief overviewof the statistical methods which have been implemented in the package. The main bodyof the paper explains how to use the package. Furthermore, we illustrate the package’sfunctionality in both real and artiﬁcial examples. Keywords : Bayesian structure learning, Gaussian graphical models, Gaussian copula, Covari-ance selection, Birth-death process, Markov chain Monte Carlo, G-Wishart,

BDgraph , R .

1. Introduction

Graphical models (Lauritzen 1996) are commonly used, particularly in Bayesian statistics andmachine learning, to describe the conditional independence relationships among variables inmultivariate data. In graphical models, each random variable is associated with a node in agraph and links represent conditional dependency between variables, whereas the absence ofa link implies that the variables are independent conditional on the rest of the variables (thepairwise Markov property).In recent years, signiﬁcant progress has been made in designing eﬃcient algorithms to discover a r X i v : . [ s t a t . M L ] M a y BDgraph : An R Package for Bayesian Structure Learning in Graphical Models graph structures from multivariate data (Dobra, Lenkoski, and Rodriguez 2011; Dobra andLenkoski 2011; Jones, Carvalho, Dobra, Hans, Carter, and West 2005; Dobra and Mohammadi2018; Mohammadi and Wit 2015; Mohammadi, Abegaz Yazew, van den Heuvel, and Wit2017a; Friedman, Hastie, and Tibshirani 2008; Meinshausen and Buhlmann 2006; Murrayand Ghahramani 2004; Pensar, Nyman, Niiranen, Corander et al. et al.

BDgraph package (Mohammadi and Wit 2019) in R ( R CoreTeam 2019) for Bayesian structure learning in undirected graphical models. The package candeal with Gaussian, non-Gaussian, discrete and mixed datasets. The package includes vari-ous functional modules, including data generation for simulation, several search algorithms,graph estimation routines, a convergence check and a visualization tool; see Figure 1. Ourpackage eﬃciently implements recent improvements in the Bayesian literature, including thoseof Mohammadi and Wit (2015); Mohammadi et al. (2017a); Dobra and Mohammadi (2018);Lenkoski (2013); Mohammadi, Massam, and Gerald (2017b); Dobra and Lenkoski (2011); Hoﬀ(2007). For a Bayesian framework of Gaussian graphical models, we implement the methoddeveloped by Mohammadi and Wit (2015) and for Gaussian copula graphical models we usethe method described by Mohammadi et al. (2017a) and Dobra and Lenkoski (2011). Tomake our Bayesian methods computationally feasible for moderately high-dimensional data,we eﬃciently implement the

BDgraph package in

C++ linked to R . To make the package easyto use, the BDgraph package uses several S3 classes as return values of its functions. Thepackage is available under the general public license (GPL ≥

3) from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/packages=BDgraph .In the Bayesian literature, the

BDgraph is one of the few R packages which is available onlinefor Gaussian graphical models and Gaussian copula graphical models. Another R package is ssgraph (Mohammadi 2019) which is based on spike-and-slab proir. On the other hand, morepackages seem to be available in the frequentist literature. The existing packages include huge (Zhao, Liu, Roeder, Laﬀerty, and Wasserman 2019), glasso (Friedman, Hastie, and Tibshirani2018), bnlearn (Scutari 2010), pcalg (Kalisch, M¨achler, Colombo, Maathuis, and B¨uhlmann2012), netgwas (Behrouzi and Wit 2017), and QUIC (Hsieh, Sustik, Dhillon, and Ravikumar2011, 2014).In Section 2 we illustrate the user interface of the

BDgraph package. In Section 3 we explainsome methodological background of the package. In this regard, in Section 3.1 we brieﬂyexplain the Bayesian framework for Gaussian graphical models for continuous data. In Section3.2 we brieﬂy describe the Bayesian framework in the Gaussian copula graphical models fordata that do not follow the Gaussianity assumption, such as non-Gaussian continuous, discreteor mixed data. In Section 4 we describe the main functions implemented in the

BDgraph package. In addition, we explain the user interface and the performance of the package by asimple simulation example. In Section 6, using the functions implemented in the

BDgraph package, we study two actual datasets.

2. User interface

In the R environment, one can access and load the BDgraph package by using the following ournal of Statistical Software

R> install.packages( "BDgraph" )R> library( "BDgraph" )

By loading the

BDgraph package we automatically load the igraph (Csardi and Nepusz 2006)package, since the

BDgraph package depends on this package for graph visualization. The igraph package is available on the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org .To speed up computations, we eﬃciently implement the

BDgraph package by linking the

C++ code to R . The computationally extensive tasks of the package are implemented in parallel in C++ using

OpenMP (Board 2008). For the

C++ code, we use the highly optimized

LAPACK (Anderson, Bai, Bischof, Blackford, Demmel, Dongarra, Du Croz, Greenbaum, Hammarling,McKenney, and Sorensen 1999) and

BLAS (Lawson, Hanson, Kincaid, and Krogh 1979)linear algebra libraries on systems that provide them. The use of these libraries signiﬁcantlyimproves program speed.We design the

BDgraph package to provide a Bayesian framework for undirected graph esti-mation of diﬀerent types of datasets such as continuous, discrete or mixed data. The packagefacilitates a pipeline for analysis by three functional modules; see Figure 1. These modulesare as follows: > Continuous> Discrete> MixedM1: Data> Binary > GGMs> DGMs> GCGMsM2: Methods M3: Algorithm M3: Results> Convergence> Selection> Comparison> Visualization> BDMCMC> RJMCMC> Hill Climbingbdgraph.sim()graph.sim() bdgraph(data,method=”ggm”, algorithm=“bdmcmc”)bdgraph.mpl(,method=“ggm”,algorithm=“bdmcmc”)ssgraph(data, method=“ggm”) plinks(), select(), compare(), plotcoda()

Figure 1: Conﬁguration of the

BDgraph package which includes three main parts: (M1) datasimulation, (M2) several statistical methods, (M3) several search algorithms, (M4) variousfunctions to evaluate convergence of the search algorithms, estimation of the true graph,assessment and comparison of the results and graph visualization.

Module 1. Data simulation:

Function bdgraph.sim simulates multivariate Gaussian, dis-crete, binary, and mixed data with diﬀerent undirected graph structures, including "random" , "cluster" , "scale-free" , "lattice" , "hub" , "star" , "circle" , "AR(1)" , "AR(2)" , and "fixed" graphs. Users can determine the sparsity of the graph structure and can gener-ate mixed data, including "count" , "ordinal" , "binary" , "Gaussian" and "non-Gaussian" variables. Module 2. Methods:

The function bdgraph and bdgraph.mpl provide several estimationmethods regarding to the type of data:

BDgraph : An R Package for Bayesian Structure Learning in Graphical Models • Bayesian graph estimation for the multivariate data that follow the Gaussianity as-sumption, based on the Gaussian graphical models (GGMs); see Mohammadi and Wit(2015); Dobra et al. (2011). • Bayesian graph estimation for multivariate non-Gaussian, discrete, and mixed data,based on Gaussian copula graphical models (GCGMs); see Mohammadi et al. (2017a);Dobra and Lenkoski (2011). • Bayesian graph estimation for multivariate discrete and binary data, based on discretegraphical models (DGMs); see Dobra and Mohammadi (2018).

Module 3. Algorithms:

The function bdgraph and bdgraph.mpl provide several samplingalgorithms: • Birth-death MCMC (BDMCMC) sampling algorithms (Algorithms 2 and 3) desciribedin Mohammadi and Wit (2015). • Reversible jump MCMC (RJMCMC) sampling algorithms desciribed in Dobra andLenkoski (2011). • Hill-climbing (HC) search algorithm desciribed in Pensar et al. (2017).

Module 4. Results:

Includes four types of functions: • Graph selection : The functions select , plinks , and pgraph provide the selectedgraph, the posterior link inclusion probabilities and the posterior probability of eachgraph, respectively. • Convergence check : The functions plotcoda and traceplot provide several visual-ization plots to monitor the convergence of the sampling algorithms. • Comparison and goodness-of-fit : The functions compare and plotroc provide sev-eral comparison measures and an ROC plot for model comparison. • Visualization : The plotting functions plot.bdgraph and plot.sim provide visual-izations of the simulated data and estimated graphs.

3. Methodological background

In Section 3.1, we brieﬂy explain the Gaussian graphical model for multivariate data. Then weillustrate the birth-death MCMC algorithm for sampling from the joint posterior distributionover Gaussian graphical models; for more details see Mohammadi and Wit (2015). In Section3.2, we brieﬂy describe the Gaussian copula graphical model (Dobra and Lenkoski 2011),which can deal with non-Gaussian, discrete or mixed data. Then we explain the birth-deathMCMC algorithm which is designed for the Gaussian copula graphical models; for more detailssee Mohammadi et al. (2017a).

In graphical models, each random variable is associated with a node and conditional depen-dence relationships among random variables are presented as a graph G = ( V, E ) in which V = { , , ..., p } speciﬁes a set of nodes and a set of existing links E ⊂ V × V (Lauritzen 1996).Our focus here is on undirected graphs, in which ( i, j ) ∈ E ⇔ ( j, i ) ∈ E . The absence of a ournal of Statistical Software N p ( µ, K − ). Here we assume µ = 0. Let Z = ( Z (1) , ..., Z ( n ) ) (cid:62) be theobserved data of n independent samples, then the likelihood function is P r ( Z | K, G ) ∝ | K | n/ exp (cid:26) −

12 tr( KU ) (cid:27) , (1)where U = Z (cid:62) Z .In GGMs, conditional independence is implied by the form of the precision matrix. Basedon the pairwise Markov property, variables i and j are conditionally independent given theremaining variables, if and only if K ij = 0. This property implies that the links in graph G = ( V, E ) correspond with the nonzero elements of the precision matrix K ; this means that E = { ( i, j ) | K ij (cid:54) = 0 } . Given graph G , the precision matrix K is constrained to the cone P G of symmetric positive deﬁnite matrices with elements K ij equal to zero for all ( i, j ) / ∈ E .We consider the G-Wishart distribution W G ( b, D ) to be a prior distribution for the precisionmatrix K with density P r ( K | G ) = 1 I G ( b, D ) | K | ( b − / exp (cid:26) −

12 tr( DK ) (cid:27) K ∈ P G ) , (2)where b > D is a symmetric positive deﬁnite matrix, I G ( b, D ) isthe normalizing constant with respect to the graph G and 1( x ) evaluates to 1 if x holds, andotherwise to 0. The G-Wishart distribution is a well-known prior for the precision matrix,since it represents the conjugate prior for multivariate Gaussian data as in (1).For full graphs, the G-Wishart distribution reduces to the standard Wishart distribution,hence the normalizing constant has an explicit form (Muirhead 1982). Also, for decomposablegraphs, the normalizing constant has an explicit form (Roverato 2002); however, for non-decomposable graphs, it does not. In that case it can be estimated by using the Monte Carlomethod (Atay-Kayis and Massam 2005), the Laplace approximation (Lenkoski and Dobra2011), or recent approximation by Mohammadi et al. (2017b). In the BDgraph package, wedesign the gnorm function to estimate the log of the normalizing constant by using the MonteCarlo method proposed Atay-Kayis and Massam (2005).Since the G-Wishart prior is a conjugate prior to the likelihood (1), the posterior distributionof K is P r ( K | Z , G ) = 1 I G ( b ∗ , D ∗ ) | K | ( b ∗ − / exp (cid:26) −

12 tr( D ∗ K ) (cid:27) , where b ∗ = b + n and D ∗ = D + S , that is, W G ( b ∗ , D ∗ ). Direct sampler from G-Wishart

Several sampling methods from the G-Wishart distribution have been proposed; to reviewexisting methods see Wang and Li (2012). More recently, Lenkoski (2013) has developedan exact sampling algorithm for the G-Wishart distribution, borrowing an idea from Hastie,Tibshirani, and Friedman (2009).

BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

Algorithm 1 . Exact sampling from the precision matrix

Input:

A graph G = ( V, E ) with precision matrix K and Σ = K − Output:

An exact sample from the precision matrix. Set Ω = Σ repeat for i = 1 , ..., p do Let N i ⊂ V be the neighbor set of node i in G . Form Ω N i and Σ N i ,i and solveˆ β ∗ i = Ω − N i Σ N i ,i Form ˆ β i ∈ R p − by padding the elements of ˆ β ∗ i to the appropriate locations and zerosin those locations not connected to i in G Update Ω i, − i and Ω − i,i with Ω − i, − i ˆ β i end for until convergence return K = Ω − In the

BDgraph package, we use Algorithm 1 to sample from the posterior distribution of theprecision matrix. We implement the algorithm in the package as a function rgwish ; see the R code below for illustration. R> adj <- matrix( c( 0, 0, 1, 0, 0, 0, 1, 0, 0 ), 3, 3 )R> adj [,1] [,2] [,3][1,] 0 0 1[2,] 0 0 0[3,] 1 0 0

R> sample <- rgwish( n = 1, adj = adj, b = 3, D = diag( 3 ) )R> round( sample, 2 ) [,1] [,2] [,3][1,] 2.37 0.00 -2.12[2,] 0.00 6.15 0.00[3,] -2.12 0.00 7.26

This matrix is a sample from a G-Wishart distribution with b = 3 and D = I as an identitymatrix and a graph structure with adjacency matrix adj . BDMCMC algorithm for GGMs

Consider the joint posterior distribution of the graph G and the precision matrix K given by P r ( K, G | Z ) ∝ P r ( Z | K ) P r ( K | G ) P r ( G ) . (3)For the prior distribution of the graph G = ( V, E ), we consider a Bernoulli prior on each linkinclusion indicator variable as follow

P r ( G ) ∝ (cid:18) θ − θ (cid:19) | E | , (4) ournal of Statistical Software | E | indicate the number of links in the graph G (graph size) and parameter θ ∈ (0 ,

1) isa prior probability of existing link. For the case θ = 0 . BDgraph ),we will have a uniform distribution over all graph space, as a non-informative prior. For theprior distribution of the precision matrix conditional on the graph G , we use a G-Wishart W G ( b, D ).Here we consider a computationally eﬃcient birth-death MCMC sampling algorithm proposedby Mohammadi and Wit (2015) for Gaussian graphical models. The algorithm is based on acontinuous time birth-death Markov process, in which the algorithm explores the graph spaceby adding/removing a link in a birth/death event.In the birth-death process, for a particular pair of graph G = ( V, E ) and precision matrix K ,each link dies independently of the rest as a Poisson process with death rate δ e ( K ). Since thelinks are independent, the overall death rate is δ ( K ) = (cid:80) e ∈ E δ e ( K ). Birth rates β e ( K ) for e / ∈ E are deﬁned similarly. Thus the overall birth rate is β ( K ) = (cid:80) e/ ∈ E β e ( K ).Since the birth and death events are independent Poisson processes, the time between twosuccessive events is exponentially distributed with mean 1 / ( β ( K ) + δ ( K )). The time betweensuccessive events can be considered as inverse support for any particular instance of the state( G, K ). The probabilities of birth and death events are

P r (birth of link e ) = β e ( K ) β ( K ) + δ ( K ) , for each e / ∈ E, (5) P r (death of link e ) = δ e ( K ) β ( K ) + δ ( K ) , for each e ∈ E. (6)The birth and death rates of links occur in continuous time with the rates determined by thestationary distribution of the process. The BDMCMC algorithm is designed in such a waythat the stationary distribution is equal to the target joint posterior distribution of the graphand the precision matrix (3).Mohammadi and Wit (2015, Theorem 3.1) derived a condition that guarantees the abovebirth and death process converges to our target joint posterior distribution (3). By followingtheir Theorem we deﬁne the birth and death rates, as below β e ( K ) = min (cid:26) P r ( G + e , K + e | Z ) P r ( G, K | Z ) , (cid:27) , for each e / ∈ E, (7) δ e ( K ) = min (cid:26) P r ( G − e , K − e | Z ) P r ( G, K | Z ) , (cid:27) , for each e ∈ E, (8)in which G + e = ( V, E ∪ { e } ) and K + e ∈ P G + e and similarly G − e = ( V, E \ { e } ) and K − e ∈ P G − e . For computation part related to the ratio of posterior see Mohammadi et al. (2017b).Algorithm 2 provides the pseudo-code for our BDMCMC sampling scheme which is based onthe above birth and death rates.Note, step 1 of the algorithm is suitable for parallel computation. In the BDgraph , we imple-ment this step of algorithm in parallel using

OpenMP in C++ to speed up the computations.The BDMCMC sampling algorithm is designed in such a way that a sample (

G, K ) is ob-tained at certain jump moments, { t , t , ... } (see Figure 2). For eﬃcient posterior inference BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

Algorithm 2 . BDMCMC algorithm for GGMs

Input:

A graph G = ( V, E ) and a precision matrix K . Output:

Samples from the joint posterior distribution of (

G, K ), (3), and waiting times. for N iteration do

1. Sample from the graph.

Based on birth and death process β ( K ) = (cid:80) e ∈ / ∈ E β e ( K ) δ ( K ) = (cid:80) e ∈ E δ e ( K ) W ( K ) = 1 / ( β ( K ) + δ ( K ))

2. Sample from the precision matrix.

By using Algorithm 1. end for of the parameters, we use the Rao-Blackwellized estimator, which is an eﬃcient estimator forcontinuous time MCMC algorithms (Capp´e, Robert, and Ryd´en 2003, Section 2.5). By usingthe Rao-Blackwellized estimator, for example, one can estimate the posterior distribution ofthe graphs proportional to the total waiting times of each graph. G " G G $ G % G & G ' Pr G data time t ' t & t % t $ t t " t - .Pr G data G GG W ' G " G G $ G % G & G ' BDMCMC sampling algorithm scheme Estimated graphdistributionGraph distribution W & Figure 2: This image visualizes the Algorithm 2. The left side shows the true posteriordistribution of the graph. The middle panel presents a continuous time BDMCMC samplingalgorithm where { W , W , ... } denote waiting times and { t , t , ... } denote jumping times. Theright side denotes the estimated posterior probability of the graphs in proportion to the totalof their waiting times, according to the Rao-Blackwellized estimator. In practice we encounter both discrete and continuous variables; Gaussian copula graphicalmodelling has been proposed by Dobra and Lenkoski (2011) to describe dependencies betweensuch heterogeneous variables. Let Y (as observed data) be a collection of continuous, binary,ordinal or count variables with the marginal distribution F j of Y j and F − j as its pseudoinverse. For constructing a joint distribution of Y , we introduce a multivariate Gaussian ournal of Statistical Software Z , ..., Z n iid ∼ N p (0 , Γ( K )) ,Y ij = F − j (Φ( Z ij )) , (9)where Γ( K ) is the correlation matrix for a given precision matrix K . The joint distributionof Y is given by P r ( Y ≤ Y , . . . , Y p ≤ Y p ) = C ( F ( Y ) , . . . , F p ( Y p ) | Γ( K )) , (10)where C ( · ) is the Gaussian copula given by C ( u , . . . , u p | Γ) = Φ p (cid:0) Φ − ( u ) , . . . , Φ − ( u p ) | Γ (cid:1) , with u v = F v ( Y v ) and Φ p ( · ) is the cumulative distribution of multivariate Gaussian and Φ( · )is the cumulative distribution of univariate Gaussian distributions. It follows that Y v = F − v (Φ( Z v )) for v = 1 , ..., p . If all variables are continuous then the margins are unique;thus zeros in K imply conditional independence, as in Gaussian graphical models (Hoﬀ 2007;Abegaz and Wit 2015). For discrete variables, the margins are not unique but still well-deﬁned(Nelsen 2007).In semiparametric copula estimation, the marginals are treated as nuisance parameters andestimated by the rescaled empirical distribution. The joint distribution in (10) is thenparametrized only by the correlation matrix of the Gaussian copula. We are interested toinfer the underlying graph structure of the observed variables Y implied by the continuouslatent variables Z . Since Z are unobservable we follow the idea of Hoﬀ (2007) of associatingthem with the observed data as below.Given the observed data Y from a sample of n observations, we constrain the samples fromlatent variables Z to belong to the set D ( Y ) = { Z ∈ R n × p : L rj ( Z ) < z ( r ) j < U rj ( Z ) , r = 1 , . . . , n ; j = 1 , . . . , p } , where L rj ( Z ) = max (cid:110) Z ( s ) j : Y ( s ) j < Y ( r ) j (cid:111) and U rj ( Z ) = min (cid:110) Z ( s ) j : Y ( r ) j < Y ( s ) j (cid:111) . (11)Following Hoﬀ (2007) we infer the latent space by substituting the observed data Y with theevent D ( Y ) and deﬁne the likelihood as P r ( Y | K, G, F , ..., F p ) = P r ( Z ∈ D ( Y ) | K, G ) P r ( Y | Z ∈ D ( Y ) , K, G, F , ..., F p ) . The only part of the observed data likelihood relevant for inference on K is P r ( Z ∈ D ( Y ) | K, G ). Thus, the likelihood function is given by

P r ( Z ∈ D ( Y ) | K, G ) =

P r ( Z ∈ D ( Y ) | K, G ) = (cid:90) D ( Y ) P r ( Z | K, G ) dZ (12)where P r ( Z | K, G ) is deﬁned in (1).

BDMCMC algorithm for GCGMs BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

The joint posterior distribution of the graph G and precision matrix K for the GCGMs is P r ( K, G | Z ∈ D ( Y )) ∝ P r ( K, G ) P r ( Z ∈ D ( Y ) | K, G ) . (13)Sampling from this posterior distribution can be done by using the birth-death MCMC al-gorithm. Mohammadi et al. (2017a) have developed and extended the birth-death MCMCalgorithm to more general cases of GCGMs. We summarize their algorithm as follows: Algorithm 3 . BDMCMC algorithm for GCGMs

Input:

A graph G = ( V, E ) and a precision matrix K . Output:

Samples from the joint posterior distribution of (

G, K ), (13), and waiting times. for N iteration do

1. Sample the latent data.

For each r ∈ V and j ∈ { , ..., n } , we update the latentvalues from its full conditional distribution as follows Z ( j ) r | Z V \{ r } = z ( j ) V \{ r } , K ∼ N ( − (cid:88) r (cid:48) K rr (cid:48) z ( j ) r (cid:48) /K rr , /K rr ) , truncated to the interval (cid:104) L jr ( Z ) , U jr ( Z ) (cid:105) in (11).

2. Sample from the graph.

Same as step 1 in Algorithm 2.

3. Sample from the precision matrix.

By using Algorithm 1. end for In step 1, the latent variables Z are sampled conditional on the observed data Y . The othersteps are the same as in Algorithm 2. Remark: in cases where all variables are continuous, we do not need to sample from latentvariables in each iteration of Algorithm 2, since all margins in the Gaussian copula are unique.Thus, for these cases, we transfer our non-Gaussian data to Gaussian, and then we runAlgorithm 2; see example 6.2.

Alternative RJMCMC algorithm

RJMCMC is a special case of the trans-dimensional MCMC methodology Green (2003). TheRJMCMC approach is based on an ergodic discrete-time Markov chain. In graphical models,a RJMCMC algorithm can be designed in such a way that its stationary distribution is thejoint posterior distribution of the graph and the parameters of the graph, e.g., 3 for GGMsand 13 for GCGMs.A RJMCMC can be implemented in various diﬀerent ways. Giudici and Green (1999) im-plemented this algorithm only for the decomposable GGMs, because of the expensive com-putation of the normalizing constant I G ( b, D ). The RJMCMC approach developed by Dobra et al. (2011) and Dobra and Lenkoski (2011) is based on the Cholesky decomposition of theprecision matrix. It uses an approximation for dealing with the extensive computation of thenormalizing constant. To avoid the intractable normalizing constant calculation, Lenkoski(2013) and Wang and Li (2012) implemented a special case of RJMCMC algorithm, whichis based on the exchange algorithm (Murray, Ghahramani, and MacKay 2012). Our imple-mentation of RJMCMC algorithm in the BDgraph package deﬁnes the acceptance probabilityproportional to the birth/death rates in our BDMCMC algorithm. Moreover, we implement ournal of Statistical Software et al. (2017b) for the ratio of the normalizing constant of G-Wishartdistribution.

4. The BDgraph environment

The

BDgraph package provides a set of comprehensive tools related to Bayesian graphicalmodels; we describe below the essential functions available in the package.

We design the function bdgraph , as the main function of the package, to take samples fromthe posterior distributions based on both of our Bayesian frameworks (GGMs and GCGMs).By default, the bdgraph function is based on underlying sampling algorithms (Algorithms 2and 3). Moreover, as an alternative to those BDMCMC sampling algorithms, we implementRJMCMC sampling algorithms for both the Gaussian and non-Gaussian frameworks. Byusing the following function bdgraph( data, n = NULL, method = "ggm", algorithm = "bdmcmc", iter = 5000,burnin = iter / 2, not.cont = NULL, g.prior = 0.5, df.prior = 3,g.start = "empty", jump = NULL, save = FALSE, print = 1000, cores = NULL,threshold = 1e-8 ) we obtain a sample from our target joint posterior distribution. bdgraph returns an objectof S3 class type “ bdgraph ”. The functions plot , print and summary are working with theobject “ bdgraph ”. The input data can be an ( n × p ) matrix or a data.frame or a covariance( p × p ) matrix ( n is the sample size and p is the dimension); it can also be an object of class“ sim ”, which is the output of function bdgraph.sim .The argument method determines the type of methods, GGMs, GCGMs. Option “ ggm ” isbased on Gaussian graphical models (Algorithm 2) that is designed for multivariate Gaussiandata. Option “ gcgm ” is based on the GCGMs (Algorithm 3) that is designed for non-Gaussiandata such as, non-Gaussian continuous, discrete or mixed data.The argument algorithm refers the type of sampling algorithms which could be based onBDMCMC or RJMCMC. Option “ bdmcmc ” (as default) is for the BDMCMC sampling algo-rithms (Algorithms 2 and 3). Option “ rjmcmc ” is for the RJMCMC sampling algorithms,which are alternative algorithms. See Mohammadi and Wit (2015, Section 4), Mohammadi et al. (2017a, Section 2.2.3).The argument g.start speciﬁes the initial graph for our sampling algorithm. It could be empty (default) or full . Option empty means the initial graph is an empty graph and full means a full graph. It also could be an object with S3 class "bdgraph" , which allows usersto run the sampling algorithm from the last objects of the previous run.The argument jump determines the number of links that are simultaneously updated in theBDMCMC algorithm.For parallel computation in C++ which is based on

OpenMP (Board 2008), user can useargument cores which speciﬁes the number of cores to use for parallel execution.2

BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

Note, the package

BDgraph has two other sampling functions, bdgraph.mpl and bdgraph.ts which are designed in the similar framework as the function bdgraph . The function bdgraph.mpl is for Bayesian model determination in undirected graphical models based on marginal pseudo-likelihood, for both continuous and discrete variables; For more details see Dobra and Mo-hammadi (2018). The function bdgraph.ts is for Bayesian model determination in time seriesgraphical models (Tank, Foti, and Fox 2015).

We design the

BDgraph package in such a way that posterior graph selection can be done basedon both Bayesian model averaging (BMA), as default, and maximum a posterior probability(MAP). The functions select and plinks are designed for the objects of class bdgraph toprovide BMA and MAP estimations for posterior graph selection.The function plinks( bdgraph.obj, round = 2, burnin = NULL ) provides estimated posterior link inclusion probabilities for all possible links, which is basedon BMA estimation. In cases where the sampling algorithm is based on BDMCMC, theseprobabilities for all possible links e = ( i, j ) in the graph can be estimated using a Rao-Blackwellized estimate (Capp´e et al. P r ( e ∈ E | data ) = (cid:80) Nt =1 e ∈ E ( t ) ) W ( K ( t ) ) (cid:80) Nt =1 W ( K ( t ) ) , (14)where N is the number of iteration and W ( K ( t ) ) are the weights of the graph G ( t ) with theprecision matrix K ( t ) .The function select( bdgraph.obj, cut = NULL, vis = FALSE ) provides the inferred graph based on both BMA (as default) and MAP estimators. The in-ferred graph based on BMA estimation is a graph with links for which the estimated posteriorprobabilities are greater than a certain cut-point (as default cut=0.5 ). The inferred graphbased on MAP estimation is a graph with the highest posterior probability.Note, for posterior graph selection based on MAP estimation we should save all adjacencymatrices by using the option save = TRUE in the function bdgraph . Saving all the adjacencymatrices could, however, cause memory problems; to see how we cope with this problem thereader is referred to Appendix 7. In general, convergence in MCMC approaches can be diﬃcult to evaluate. From a theoreticalpoint of view, the sampling distribution will converge to the target joint posterior distributionas the number of iteration increases to inﬁnity. Because we normally have little theoreticalinsight about how quickly MCMC algorithms converge to the target stationary distribution wetherefore rely on post hoc testing of the sampled output. In general, the sample is divided into ournal of Statistical Software plotcoda and traceplot are two visualization functions for the objects of class bdgraph that make it possible to check the convergence of the search algorithms in

BDgraph . Thefunction plotcoda( bdgraph.obj, thin = NULL, control = TRUE, main = NULL, ... ) provides the trace of estimated posterior probability of all possible links to check convergenceof the search algorithms. Option control is designed for the case where if control=TRUE (asdefault) and the dimension ( p ) is greater than 15, then 100 links are randomly selected forvisualization.The function traceplot( bdgraph.obj, acf = FALSE, pacf = FALSE, main = NULL, ... ) provides the trace of graph size to check convergence of the search algorithms. Option acf is for visualization of the autocorrelation functions for graph size; option pacf visualizes thepartial autocorrelations. The functions compare and plotroc are designed to evaluate and compare the performanceof the selected graph. These functions are particularly useful for simulation studies. Withthe function compare( target, est, est2 = NULL, est3 = NULL, est4 = NULL, main = NULL,vis = FALSE ) we can evaluate the performance of the Bayesian methods available in our

BDgraph packageand compare them with alternative approaches. This function provides several measures suchas the balanced F -score measure (Baldi, Brunak, Chauvin, Andersen, and Nielsen 2000),which is deﬁned as follows: F -score = 2TP2TP + FP + FN , (15)where TP, FP and FN are the number of true positives, false positives and false negatives,respectively. The F -score lies between 0 and 1, where 1 stands for perfect identiﬁcation and0 for no true positives.The function plotroc( target, est, est2 = NULL, est3 = NULL, est4 = NULL, cut = 20,smooth = FALSE, label = TRUE, main = "ROC Curve" ) provides a ROC plot for visualization comparison based on the estimated posterior link in-clusion probabilities.4 BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

The function bdgraph.sim is designed to simulate diﬀerent types of datasets with variousgraph structures. The function bdgraph.sim( p = 10, graph = "random", n = 0, type = "Gaussian", prob = 0.2,size = NULL, mean = 0, class = NULL, cut = 4, b = 3, D = diag( p ),K = NULL, sigma = NULL, vis = FALSE ) can simulate multivariate Gaussian, non-Gaussian, discrete, binary and mixed data with dif-ferent undirected graph structures, including "random" , "cluster" , "scale-free" , "lattice" , "hub" , "star" , "circle" , "AR(1)" , "AR(2)" , and "fixed" graphs. Users can specify thetype of multivariate data by option type and the graph structure by option graph . Theycan determine the sparsity level of the obtained graph by using option prob . With this func-tion users can generate mixed data from "count" , "ordinal" , "binary" , "Gaussian" and "non-Gaussian" distributions. bdgraph.sim returns an object of the S3 class type “ sim ”.Functions plot and print work with this object type.There is another function in the BDgraph package with the name graph.sim which is designedto simulate diﬀerent types of graph structures. The function graph.sim( p = 10, graph = "random", prob = 0.2, size = NULL, class = NULL,cut = 4, vis = FALSE ) can simulate diﬀerent undirected graph structures, including "random" , "cluster" , "scale-free" ’, "lattice" , "hub" , "star" , and "circle" graphs. Users can specify the type of graph struc-ture by option graph . They can determine the sparsity level of the obtained graph by usingoption prob . bdgraph.sim returns an object of the S3 class type “ graph ”. Functions plot and print work with this object type.

5. An example on simulated data

We illustrate the user interface of the

BDgraph package by use of a simple simulation. Weperform all the computations on an MacBook Pro with 2 . bdgraph.sim we simulate 60 observations ( n = 60) from a multivariateGaussian distribution with 8 variables ( p = 8) and “scale-free” graph structure, as below. R> data.sim <- bdgraph.sim( n = 60, p = 8, graph = "scale-free",+ type = "Gaussian" )R> round( head( data.sim $ data, 4 ), 2 ) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 0.72 -0.91 -1.23 -0.16 0.20 -0.47 0.08 1.07[2,] 0.25 -0.11 0.09 0.53 0.10 -0.04 -0.13 -0.67[3,] -0.42 -0.09 -0.28 -0.42 2.04 0.84 -0.79 1.24[4,] -0.33 -0.50 0.68 -1.33 -1.15 0.25 -0.35 2.97

Since the generated data are Gaussian, we run the BDMCMC algorithm which is based onGaussian graphical models. For this we choose method = "ggm" , as follows: ournal of Statistical Software R> sample.bdmcmc <- bdgraph( bdgraph( data = data.sim, method = "ggm",+ algorithm = "bdmcmc", iter = 5000, save = TRUE )

We choose option “ save = TRUE ” to save the samples in order to check convergence of thealgorithm. Running this function takes less than one second, as the computational intensivetasks are performed in

C++ and interfaced with R .Since the function bdgraph returns an object of class S3 , users can see the summary resultas follows R> summary( sample.bdmcmc ) $selected_g[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 0 1 1 0 0 0 1 0[2,] 0 0 0 1 0 0 0 0[3,] 0 0 0 0 0 1 0 0[4,] 0 0 0 0 0 0 0 1[5,] 0 0 0 0 0 0 0 0[6,] 0 0 0 0 0 0 0 0[7,] 0 0 0 0 0 0 0 0[8,] 0 0 0 0 0 0 0 0$p_links[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 0 0.51 1.00 0.27 0.21 0.31 0.74 0.11[2,] 0 0.00 0.29 1.00 0.25 0.18 0.49 0.14[3,] 0 0.00 0.00 0.24 0.27 0.79 0.44 0.22[4,] 0 0.00 0.00 0.00 0.32 0.30 0.34 1.00[5,] 0 0.00 0.00 0.00 0.00 0.25 0.40 0.22[6,] 0 0.00 0.00 0.00 0.00 0.00 0.23 0.37[7,] 0 0.00 0.00 0.00 0.00 0.00 0.00 0.19[8,] 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00$K_hat[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 3.81 0.33 3.19 -0.09 0.04 0.14 -0.84 0.02[2,] 0.33 4.24 -0.06 -3.43 -0.07 -0.02 0.41 -0.02[3,] 3.19 -0.06 5.54 -0.08 -0.06 -0.75 0.41 0.08[4,] -0.09 -3.43 -0.08 9.28 -0.15 0.10 -0.18 1.62[5,] 0.04 -0.07 -0.06 -0.15 0.76 -0.06 0.16 -0.04[6,] 0.14 -0.02 -0.75 0.10 -0.06 3.08 0.04 -0.14[7,] -0.84 0.41 0.41 -0.18 0.16 0.04 5.56 0.04[8,] 0.02 -0.02 0.08 1.62 -0.04 -0.14 0.04 1.21

The summary results are the adjacency matrix of the selected graph ( selected_g ) based onBMA estimation, the estimated posterior probabilities of all possible links ( p_links ) and theestimated precision matrix (

K_hat ).6

BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

In addition, the function summary reports a visualization summary of the results as we cansee in Figure 3. At the top-left is the graph with the highest posterior probability. The plotat the top-right gives the estimated posterior probabilities of all the graphs which are visitedby the BDMCMC algorithm; it indicates that our algorithm visits more than 2000 diﬀerentgraphs. The plot at the bottom-left gives the estimated posterior probabilities of the size ofthe graphs; it indicates that our algorithm visited mainly graphs with sizes between 4 and 18links. At the bottom-right is the trace of our algorithm based on the size of the graphs.

Selected graph

Graph with edge posterior probability > 0.5 ll l l l lll

12 34 5 678 . . . . Posterior probability of graphs graph P r( g r aph | da t a ) . . . . Posterior probability of graphs size

Trace of graph size G r aph s i z e Figure 3: Visualization summary of simulation data based on output of bdgraph function.The ﬁgure at the top-left is the inferred graph with the highest posterior probability. Theﬁgure at the top-right gives the estimated posterior probabilities of all visited graphs. Theﬁgure at the bottom-left gives the estimated posterior probabilities of all visited graphs basedon the size of the graphs. The ﬁgure at in the bottom-right is the trace of our algorithmbased on the size of the graphs.The function compare provides several measures to evaluate the performance of our algorithmsand compare them with alternative approaches with respect to the true graph structure. Toevaluate the performance of our BDMCMC algorithm (Algorithm 2) and compare it with thatof an alternative algorithm, we also run the RJMCMC algorithm under the same conditionsas below.

R> sample.rjmcmc <- bdgraph( data = data.sim, method = "ggm",+ algorithm = "rjmcmc", iter = 5000, save = TRUE ) ournal of Statistical Software

R> plotroc( data.sim, sample.bdmcmc, sample.rjmcmc, smooth = TRUE ) which visualizes an ROC plot for both algorithms, BDMCMC and RJMCMC; see Figure 4. . . . . . . ROC Curve

False Postive Rate T r ue P o s t i v e R a t e BDMCMCRJMCMC

Figure 4: ROC plot to compare the performance of the BDMCMC and RJMCMC algorithmsfor a simulated toy example.We can also compare the performance of those algorithms by using the compare function asfollows:

R> compare( data.sim, sample.bdmcmc, sample.rjmcmc,+ main = c( "True graph", "BDMCMC", "RJMCMC" ) )

True graph BDMCMC RJMCMCtrue positive 7 5.000 5.000true negative 21 20.000 19.000false positive 0 1.000 2.000false negative 0 2.000 2.000F1-score 1 0.769 0.714specificity 1 0.952 0.905sensitivity 1 0.714 0.714MCC 1 0.704 0.619

The results show that for this speciﬁc simulated example both algorithms have more or lessthe same performance; See Mohammadi and Wit (2015, Section 4), Mohammadi et al. (2017a,Section 2.2.3).In this simulation example, we run both BDMCMC and RJMCMC algorithms for 5 , ,

500 of them as burn-in. To check whether the number of iterations is enoughand to monitoring the convergence of our both algorithm, we run8

BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

R> plotcoda( sample.bdmcmc )R> plotcoda( sample.rjmcmc )

The results in Figure 5 indicate that our BDMCMC algorithm converges faster with comparewith RJMCMC algorithm. . . . . . . Iteration P o s t e r i o r li n k p r obab ili t y Trace of the Posterior Probabilities of the Links . . . . . . Iteration P o s t e r i o r li n k p r obab ili t y Trace of the Posterior Probabilities of the Links

Figure 5: Plot for monitoring the convergence based on the trace of estimated posterior prob-ability of all possible links for the BDMCMC algorithm (left) and the RJMCMC algorithm(right).

6. Application to real datasets

In this section we analyze two datasets from genetics and sociology, using the functionsavailable in the

BDgraph package. In Section 6.1 we analyze a labor force survey dataset,involving mixed data. In Section 6.2 we analyze human gene expression data, which do notfollow the Gaussianity assumption. Both datasets are available in the

BDgraph package.

Hoﬀ (2007) analyzes the multivariate associations among income, education and family back-ground, using data concerning 1002 males in the U.S labor force. The dataset is available inthe

BDgraph package.

R> data( "surveyData" )R> head( surveyData, 5 ) income degree children pincome pdegree pchildren age[1,] NA 1 3 3 1 5 59[2,] 11 0 3 NA 0 7 59[3,] 8 1 1 NA 0 9 25 ournal of Statistical Software [4,] 25 3 2 NA 0 5 55[5,] 100 3 2 4 3 2 56 Missing data are indicated by NA ; in general, the rate of missing data is about 9%, with higherrates for the variables income and pincome . In this dataset we have seven observed variablesas follows: income : An ordinal variable indicating respondent’s income in 1000s of dollars. degree : An ordinal variable with ﬁve categories indicating respondent’s highest educationaldegree. children : A count variable indicating number of children of respondent. pincome : An ordinal variable with ﬁve categories indicating ﬁnancial status of respondent’sparents. pdegree : An ordinal variable with ﬁve categories indicating highest educational degree ofrespondent’s parents. pchildren : A count variable indicating number of children of respondent’s parents. age : A count variable indicating respondent’s age in years.Since the variables are measured on various scales, the marginal distributions are heteroge-neous, which makes the study of their joint distribution very challenging. However, we canapply to this dataset our Bayesian framework based on the Gaussian copula graphical models.Thus, we run the function bdgraph with option method = "gcgm" . For the prior distributionsof the graph and precision matrix, as default of the function bdgraph , we place a uniformdistribution as a uninformative prior on the graph and a G-Wishart W G (3 , I ) on the precisionmatrix. We run our function for 10 ,

000 iterations with 7 ,

000 as burn-in.

R> sample.bdmcmc <- bdgraph( data = surveyData, method = "gcgm",+ iter = 10000, burnin = 7000 )R> summary( sample.bdmcmc ) $selected_gincome degree children pincome pdegree pchildren ageincome 0 1 1 0 0 0 1degree 0 0 1 0 1 1 0children 0 0 0 0 1 1 1pincome 0 0 0 0 1 0 0pdegree 0 0 0 0 0 1 1pchildren 0 0 0 0 0 0 0age 0 0 0 0 0 0 0$p_links income degree children pincome pdegree pchildren ageincome 0 1 1.00 0.37 0.06 0.05 1.00degree 0 0 0.67 0.20 1.00 0.78 0.16children 0 0 0.00 0.34 0.72 1.00 1.00pincome 0 0 0.00 0.00 1.00 0.40 0.09pdegree 0 0 0.00 0.00 0.00 0.92 0.99 BDgraph : An R Package for Bayesian Structure Learning in Graphical Models pchildren 0 0 0.00 0.00 0.00 0.00 0.05age 0 0 0.00 0.00 0.00 0.00 0.00$K_hat income degree children pincome pdegree pchildren ageincome 1.33 -1.46 -0.54 -0.10 0.00 0.00 -0.33degree -1.46 7.63 0.46 0.08 -1.20 0.23 -0.04children -0.54 0.46 7.21 0.19 0.26 -0.40 -1.81pincome -0.10 0.08 0.19 6.92 -1.09 0.13 0.01pdegree 0.00 -1.20 0.26 -1.09 1.36 0.20 0.22pchildren 0.00 0.23 -0.40 0.13 0.20 1.17 0.00age -0.33 -0.04 -1.81 0.01 0.22 0.00 1.79

The results of the function summary are the adjacency matrix of the selected graph ( selected_g ),estimated posterior probabilities of all possible links ( p_links ) and estimated precision matrix(

K_hat ). + + + − + − − + + + − − incomedegree children pincomepdegreepchildren age Figure 6: Inferred graph for the labor force survey data based on output from bdgraph .Sign “+” represents a positively correlated relationship between associated variables and “-”represents a negatively correlated relationship.Figure 6 presents the selected graph, a graph with links for which the estimated posteriorprobabilities are greater than 0 .

5. Links in the graph are indicated by signs “+” and “-”, whichrepresent positively and negatively correlated relationships between associated variables, re-spectively. ournal of Statistical Software . pdegree ) and fertility( child ), a link which is not selected by Hoﬀ. Here, by using the functions that are available in the

BDgraph package, we study the structurelearning of the sparse graphs applied to the human gene expression data which were originallydescribed by Stranger, Nica, Forrest, Dimas, Bird, Beazley, Ingle, Dunning, Flicek, Koller et al. (2007). They collected data to measure gene expression in B-lymphocyte cells from Utahinhabitants with Northern and Western European ancestry. They considered 60 individualswhose genotypes were available online at ftp://ftp.sanger.ac.uk/pub/genevar . Here thefocus was on the 3 ,

125 Single Nucleotide Polymorphisms (SNPs) that were found in the 5’UTR (untranslated region) of mRNA (messenger RNA) with a minor allele frequency ≥ . ,

293 total available probes, we consider the 100 most variable probes thatcorrespond to diﬀerent Illumina TargetID transcripts. The data for these 100 probes areavailable in our package. To see the data users can run the code

R> data( "geneExpression" )R> dim( geneExpression )

60 100

The data consist of only 60 observations ( n = 60) across 100 genes ( p = 100). This datasetis an interesting case study for graph structure learning, as it has been used by Bhadra andMallick (2013); Mohammadi and Wit (2015); Gu, Cao, Ning, and Liu (2015).In this dataset, all the variables are continuous but not Gaussian, as can be seen in Figure 7.Thus, we apply Gaussian copula graphical models, using the function bdgraph with option method = "gcgm" . For the prior distributions of the graph we use a Bernoulli prior on eachlink inclusion (4), encourage sparsity by considering θ = 0 .

1, using the function bdgraph withoption g.prior = 0.1 . For the prior distributions of the precision matrix, as default of thefunction bdgraph , we place the G-Wishart W G (3 , I ) on the precision matrix.We run our function for 10 ,

000 iterations with 7 ,

000 as burn-in as follows:

R> sample.bdmcmc <- bdgraph( data = geneExpression, method = "gcgm",+ g.prior = 0.1, iter = 10000, burnin = 7000 ) BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

GI_18426974−S F r equen cy GI_41197088−S F r equen cy GI_17981706−S F r equen cy GI_41190507−S F r equen cy GI_33356162−S F r equen cy Hs.449605−S F r equen cy Figure 7: Univariate histograms of ﬁrst 6 genes in human gene dataset.This took less than 3 minutes. We use the following code to visualize the graph with estimatedposterior probabilities greater than 0 . R> select( sample.bdmcmc, cut = 0.5, vis = TRUE )

By using option vis = TRUE , the function plots the selected graph. Figure 9 visualizes theselected graph which consists of 176 links with estimated posterior probabilities (14) greaterthan 0 . plinks reports the estimated posterior probabilities of all possible links in thegraph. For our data the output of this function is a 100 ×

100 matrix. Figure 8 reports thevisualization of that matrix.Most of the links in our selected graph conform to results in previous studies. For instance,Bhadra and Mallick (2013) found 54 signiﬁcant interactions between genes, most of which arecovered by our method. In addition, our approach indicates additional gene interactions withhigh posterior probabilities that are not found in previous studies; this result may complementthe analysis of human gene interaction networks.

7. Conclusion

We have presented the

BDgraph package which was designed for Bayesian structure learn-ing in general – decomposable and non-decomposable – undirected graphical models. Thepackage implements recent improvements in computation, sampling and inference of Gaus-sian graphical models (Mohammadi and Wit 2015; Dobra et al. et al.

BDgraph package in the future. Futureversions of the package will contain more options for prior distributions of the graph and ournal of Statistical Software Posterior Probabilities of all Links

Figure 8: Image visualization of the estimated posterior probabilities of all possible links inthe graph on human gene expression data.precision matrix. One possible extension of our package, would be to deal with outliers, byusing robust Bayesian graphical modelling using Dirichlet t -Distributions (Finegold and Drton2014; Mohammadi and Wit 2014). An implementation of this method would be desirable inactual applications. Acknowledgments

The authors are grateful to the associated editor and reviewers for helpful criticism of the orig-inal of both the manuscript and the R package. We would like to thank Sven Baars for parallelimplementation in C++ . We also would like to thank Sourabh Kotnala for implementing thepackage in

C++ . Appendix: Dealing with memory usage restriction

The memory usage restriction is one of the challenges of Bayesian inference for maximum aposterior probability (MAP) estimation and monitoring the convergence check, especially forhigh-dimensional problems. For example, to compute MAP estimation in our

BDgraph pack-age, we must document the adjacency matrices of all the visited graphs by our MCMC sam-4

BDgraph : An R Package for Bayesian Structure Learning in Graphical Models l l ll l l ll ll lll l lll ll l lll ll ll llll l ll l lll l lll l ll l ll l ll ll l l ll l lll l ll ll ll ll lll l l ll ll lll ll llll l l llll

GI_1842 GI_4119 GI_1798GI_4119GI_3335 Hs.4496 GI_3754Hs.5121 GI_3754Hs.4495 Hs.4064GI_1864Hs.4496 hmm3574hmm1029Hs.4496GI_1109 Hs.5121GI_3754 hmm3577GI_2138GI_2775GI_1351 GI_1302GI_4504 GI_1199GI_3335 GI_3753hmm1028GI_4266GI_3491 GI_3137 GI_4265GI_4119 GI_2351 GI_7661GI_2748GI_1655GI_3422GI_3165GI_2775GI_8923GI_2007 GI_3079GI_3107 GI_2789 GI_2430GI_1974 GI_2861 GI_2776GI_4507GI_2146GI_1421 GI_2789 Hs.1851 GI_4505GI_3422 GI_2747 GI_4504GI_2161GI_2449 GI_1922 GI_2202GI_9961GI_2138GI_2479 GI_3856GI_2855 GI_2030GI_1615 Hs.1712Hs.1363GI_4502 GI_4504 GI_7657 GI_4247GI_3754 GI_7662GI_1332 GI_4505GI_4135GI_2037 hmm9615hmm3587 GI_7019GI_3753GI_3134GI_1864 GI_3040 GI_5454GI_4035GI_1837GI_4507GI_1460

Figure 9: The inferred graph for human gene expression data using Gaussian copula graphicalmodels. This graph consists of 176 links with estimated posterior probabilities greater than0 . R . Indeed, the function bdgraph in our package in case save = TRUE is documented to return all of the adjacency matrices forall iterations after burn-in. For instance, for the case R> iter <- 10000R> burnin <- 7000R> p <- 100R> graph <- matrix( 1, p, p )R> print( ( iter - burnin ) * object.size( graph ), units = "auto" )

A naive way is to save all the matrices, which leads to memory usage restriction, as it costs3 . R> string_graph <- paste( graph[ upper.tri( graph ) ], collapse = '' )R> print( ( iter - burnin ) * object.size( string_graph ), units = "auto" ) ournal of Statistical Software In this eﬃcient way we need only 241 . . References

Abegaz F, Wit E (2015). “Copula Gaussian graphical models with penalized ascent MonteCarlo EM algorithm.”

Statistica Neerlandica , (4), 419–441. doi:10.1111/stan.12066 .Anderson E, Bai Z, Bischof C, Blackford S, Demmel J, Dongarra J, Du Croz J, Greenbaum A,Hammarling S, McKenney A, Sorensen D (1999). LAPACK

Users’ Guide . Third edition.Society for Industrial and Applied Mathematics, Philadelphia, PA. ISBN 0-89871-447-8(paperback).Atay-Kayis A, Massam H (2005). “A Monte Carlo method for computing the marginal likeli-hood in nondecomposable Gaussian graphical models.”

Biometrika , (2), 317–335. doi:10.1093/biomet/92.2.317 .Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H (2000). “Assessing the Accuracy ofPrediction Algorithms for Classiﬁcation: an Overview.” Bioinformatics , (5), 412–424. doi:10.1093/bioinformatics/16.5.412 .Behrouzi P, Wit EC (2017). “netgwas: An R Package for Network-Based Genome-WideAssociation Studies.” arXiv preprint arXiv:1710.01236 .Behrouzi P, Wit EC (2019). “Detecting epistatic selection with partially observed genotypedata by using copula graphical models.” Journal of the Royal Statistical Society: Series C(Applied Statistics) , (1), 141–160. doi:10.1111/rssc.12287 .Bhadra A, Mallick BK (2013). “Joint High-Dimensional Bayesian Variable and CovarianceSelection with an Application to eQTL Analysis.” Biometrics , (2), 447–457. doi:10.1111/biom.12021 .Board OAR (2008). “ OpenMP

Application Program Interface Version 3.0.” URL .Capp´e O, Robert C, Ryd´en T (2003). “Reversible Jump, Birth-and-Death and More GeneralContinuous Time Markov Chain Monte Carlo Samplers.”

Journal of the Royal StatisticalSociety: Series B (Statistical Methodology) , (3), 679–700. doi:10.1111/1467-9868.00409 .Csardi G, Nepusz T (2006). “The igraph Software Package for Complex Network Research.”

InterJournal , Complex Systems , 1695. URL http://igraph.org .Dobra A, Lenkoski A (2011). “Copula Gaussian graphical models and their application tomodeling functional disability data.”

The Annals of Applied Statistics , (2A), 969–993. doi:10.1214/10-aoas397 .6 BDgraph : An R Package for Bayesian Structure Learning in Graphical Models

Dobra A, Lenkoski A, Rodriguez A (2011). “Bayesian Inference for General Gaussian Graph-ical Models with Application to Multivariate Lattice Data.”

Journal of the American Sta-tistical Association , (496), 1418–1433. doi:10.1198/jasa.2011.tm10465 .Dobra A, Mohammadi R (2018). “Loglinear model selection and human mobility.” The Annalsof Applied Statistics , (2), 815–845.Dyrba M, Grothe MJ, Mohammadi A, Binder H, Kirste T, Teipel SJ, Initiative ADN, et al. (2018). “Comparison of diﬀerent hypotheses regarding the spread of Alzheimer’s diseaseusing Markov random ﬁelds and multimodal imaging.” Journal of Alzheimer’s Disease , (3), 731–746. doi:10.3233/jad-161197 .Finegold M, Drton M (2014). “Robust Bayesian Graphical Modeling Using Dirichlet t-Distributions.” Bayesian Analysis , (3), 521–550. doi:10.1214/13-ba856 .Friedman J, Hastie T, Tibshirani R (2008). “Sparse Inverse Covariance Estimation with theGraphical Lasso.” Biostatistics , (3), 432–441. doi:10.1093/biostatistics/kxm045 .Friedman J, Hastie T, Tibshirani R (2018). glasso : Graphical Lasso- Estimation of GaussianGraphical Models . R package version 1.10, URL http://CRAN.R-project.org/package=glasso .Giudici P, Green P (1999). “Decomposable graphical Gaussian model determination.” Biometrika , (4), 785–801. doi:10.1093/biomet/86.4.785 .Green PJ (2003). “Trans-dimensional markov chain monte carlo.” Oxford Statistical ScienceSeries , pp. 179–198.Gu Q, Cao Y, Ning Y, Liu H (2015). “Local and Global Inference for High DimensionalGaussian Copula Graphical Models.” arXiv preprint arXiv:1502.02347 .Hastie T, Tibshirani R, Friedman J (2009).

The Elements of Statistical Learning: DataMining, Inference, and Prediction . Springer-Verlag.Hoﬀ PD (2007). “Extending the Rank Likelihood for Semiparametric Copula Estimation.”

The Annals of Applied Statistics , pp. 265–283. doi:10.1214/07-aoas107 .Hsieh CJ, Sustik MA, Dhillon IS, Ravikumar P (2011). “Sparse Inverse Covariance MatrixEstimation Using Quadratic Approximation.” In J Shawe-Taylor, R Zemel, P Bartlett,F Pereira, K Weinberger (eds.),

Advances in Neural Information Processing Systems 24 ,pp. 2330–2338. http://nips.cc/. URL http://cs.utexas.edu/users/sustik/QUIC .Hsieh CJ, Sustik MA, Dhillon IS, Ravikumar P (2014). “QUIC: quadratic approximation forsparse inverse covariance estimation.”

The Journal of Machine Learning Research , (1),2911–2947.Jones B, Carvalho C, Dobra A, Hans C, Carter C, West M (2005). “Experiments in StochasticComputation for High-Dimensional Graphical Models.” Statistical Science , (4), 388–400. doi:10.1214/088342305000000304 .Kalisch M, M¨achler M, Colombo D, Maathuis MH, B¨uhlmann P (2012). “Causal InferenceUsing Graphical Models with the R Package pcalg .” Journal of Statistical Software , (11),1–26. doi:10.18637/jss.v047.i11 . ournal of Statistical Software Graphical Models , volume 17. Oxford University Press, USA.Lawson CL, Hanson RJ, Kincaid DR, Krogh FT (1979). “Basic Linear Algebra Subprogramsfor Fortran Usage.”

ACM Transactions on Mathematical Software (TOMS) , (3), 308–323. doi:10.1145/355841.355847 .Lenkoski A (2013). “A Direct Sampler for G-Wishart Variates.” Stat , (1), 119–128. doi:10.1002/sta4.23 .Lenkoski A, Dobra A (2011). “Computational aspects related to inference in Gaussian graph-ical models with the G-wishart prior.” Journal of Computational and Graphical Statistics , (1), 140–157. doi:10.1198/jcgs.2010.08181 .Meinshausen N, Buhlmann P (2006). “High-Dimensional Graphs and Variable Selec-tion with the Lasso.” The Annals of Statistics , (3), 1436–1462. doi:10.1214/009053606000000281 .Mohammadi A, Abegaz Yazew F, van den Heuvel E, Wit EC (2017a). “Bayesian modellingof Dupuytren disease using Gaussian copula graphical models.” Journal of Royal StatisticalSociety-Series C , (3), 629–645. doi:10.1111/rssc.12171 .Mohammadi A, Wit EC (2014). “Contributed Discussion on Article by Finegold and Drton.” Bayesian Analysis , (3), 577–579. doi:10.1214/13-ba856d .Mohammadi A, Wit EC (2015). “Bayesian Structure Learning in Sparse Gaussian GraphicalModels.” Bayesian Analysis , (1), 109–138. doi:10.1214/14-ba889 .Mohammadi R (2019). ssgraph : Bayesian Graphical Estimation using Spike-and-Slab Priors . R package version 1.8.Mohammadi R, Massam H, Gerald L (2017b). “The Ratio of Normalizing Constants forBayesian Graphical Gaussian Model Selection.” arXiv preprint arXiv:1706.04416 .Mohammadi R, Wit EC (2019). BDgraph : Bayesian Structure Learning in Graphical Modelsusing Birth-Death MCMC . R package version 2.59, URL http://CRAN.R-project.org/package=BDgraph .Muirhead R (1982). Aspects of Multivariate Statistical Theory , volume 42. Wiley OnlineLibrary. doi:10.1002/9780470316559 .Murray I, Ghahramani Z (2004). “Bayesian Learning in Undirected Graphical Models: Ap-proximate MCMC Algorithms.” In

Proceedings of the 20th conference on Uncertainty inartiﬁcial intelligence , pp. 392–399. AUAI Press.Murray I, Ghahramani Z, MacKay D (2012). “MCMC for doubly-intractable distributions.” arXiv preprint arXiv:1206.6848 .Nelsen RB (2007).

An introduction to copulas . Springer Science & Business Media.Pensar J, Nyman H, Niiranen J, Corander J, et al. (2017). “Marginal pseudo-likelihoodlearning of discrete Markov network structures.”

Bayesian analysis , (4), 1195–1215. doi:10.1214/16-ba1032 .8 BDgraph : An R Package for Bayesian Structure Learning in Graphical Models R Core Team (2019). R : A Language and Environment for Statistical Computing . R Founda-tion for Statistical Computing, Vienna, Austria. URL .Rolfs B, Rajaratnam B, Guillot D, Wong I, Maleki A (2012). “Iterative thresholding algorithmfor sparse inverse covariance estimation.” In

Advances in Neural Information ProcessingSystems , pp. 1574–1582.Roverato A (2002). “Hyper Inverse Wishart Distribution for Non-decomposable Graphs andits Application to Bayesian Inference for Gaussian Graphical Models.”

Scandinavian Journalof Statistics , (3), 391–411. doi:10.1111/1467-9469.00297 .Scutari M (2010). “Learning Bayesian Networks with the bnlearn R Package.”

Journal ofStatistical Software , (3), 22. doi:10.18637/jss.v035.i03 .Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M,Flicek P, Koller D, et al. (2007). “Population Genomics of Human Gene Expression.” Naturegenetics , (10), 1217–1224. doi:10.1038/ng2142 .Tank A, Foti N, Fox E (2015). “Bayesian structure learning for stationary time series.” InProceedings of the 31st Conference on Uncertainty in Artiﬁcial Intelligence , pp. 872–881.Wang H, Li S (2012). “Eﬃcient Gaussian Graphical Model Determination under G-WishartPrior Distributions.”

Electronic Journal of Statistics , , 168–198. doi:10.1214/12-ejs669 .Wit EC, Abbruzzo A (2015a). “Factorial graphical models for dynamic networks.” NetworkScience , (01), 37–57. doi:10.1017/nws.2015.2 .Wit EC, Abbruzzo A (2015b). “Inferring slowly-changing dynamic gene-regulatory networks.” BMC bioinformatics , (Suppl 6), S5. doi:10.1186/1471-2105-16-s6-s5 .Zhao T, Liu H, Roeder K, Laﬀerty J, Wasserman L (2019). huge : High-dimensional Undi-rected Graph Estimation . R package version 1.3.2, URL http://CRAN.R-project.org/package=huge . Aﬃliation:

Reza Mohammadi,Operation Management Section,Faculty of Economics end Business,University of Amsterdam,Amsterdam, Netherlands,E-mail: [email protected]

URL: ournal of Statistical Software [email protected]

URL:

Journal of Statistical Software published by the American Statistical Association

Volume 89, Issue 3

Submitted: