[PDF] Efficient Bayesian Structural Equation Modeling in Stan

Abstract

Structural equation models comprise a large class of popular statistical models, including factor analysis models, certain mixed models, and extensions thereof. Model estimation is complicated by the fact that we typically have multiple interdependent response variables and multiple latent variables (which may also be called random effects or hidden variables), often leading to slow and inefficient MCMC samples. In this paper, we describe and illustrate a general, efficient approach to Bayesian SEM estimation in Stan, contrasting it with previous implementations in R package blavaan (Merkle & Rosseel, 2018). After describing the approaches in detail, we conduct a practical comparison under multiple scenarios. The comparisons show that the new approach is clearly better. We also discuss ways that the approach may be extended to other models that are of interest to psychometricians.

Full PDF

EEﬃcient Bayesian Structural Equation Modeling in

Stan

Edgar C. Merkle

University of Missouri

Ellen Fitzsimmons

University of Missouri

James Uanhoro

Ohio State University

Ben Goodrich

Columbia University

Abstract

Structural equation models comprise a large class of popular statistical models, in-cluding factor analysis models, certain mixed models, and extensions thereof. Modelestimation is complicated by the fact that we typically have multiple interdependent re-sponse variables and multiple latent variables (which may also be called random eﬀects orhidden variables), often leading to slow and ineﬃcient MCMC samples. In this paper, wedescribe and illustrate a general, eﬃcient approach to Bayesian SEM estimation in

Stan ,contrasting it with previous implementations in R package blavaan (Merkle and Rosseel2018). After describing the approaches in detail, we conduct a practical comparison undermultiple scenarios. The comparisons show that the new approach is clearly better. Wealso discuss ways that the approach may be extended to other models that are of interestto psychometricians. Keywords : Bayesian SEM, structural equation model,

Stan , JAGS , MCMC, blavaan .

1. Introduction

Structural equation models (SEMs) are commonly used in the social sciences, where it iscustomary to (attempt to) measure unobservable traits such as cognitive abilities, attitudes,and proﬁciencies. Such models provide a formal way of connecting these unobservable traits torelated, observed variables (e.g., test scores, Likert responses, etc), which has led to increasedpopularity of SEMs. SEMs are also related to research on causality and directed acyclicgraphs (e.g., Pearl 2013), to generalized linear mixed models (e.g., Bates, Mächler, Bolker,and Walker 2015; Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin 2013; Stroup 2013),and to time series models (e.g., Driver, Oud, and Voelkle 2017), illustrating the models’ broadapplicability across disciplines.A deﬁning feature of the SEM framework is the ability to instantiate regressions on latentvariables, as opposed to observed variables. This framework is more general than the tradi-tional mixed modeling framework, allowing for products between latent variables and otherfree parameters, as might be seen in factor analysis (e.g., Bollen 1989; Merkle and Wang2018). The generality of SEM implies that the estimation methods are relatively complex,which has historically led researchers to rely on closed-source implementations of optimiza-tion methods via software like

Mplus (Muthén and Muthén 1998–2017),

LISREL (Jöreskogand Sörbom 1997), and

EQS (Bentler 2000–2008). A small number of more recent R ( R CoreTeam 2019) packages, including sem (Fox, Nie, and Byrnes 2017a),

OpenMx (Boker, Neale,Maes, Wilde, Spiegel, Brick, Spies, Estabrook, Kenny, Bates, Mehta, and Fox 2011), and a r X i v : . [ s t a t . C O ] A ug Eﬃcient SEM in Stan lavaan (Rosseel 2012), provide open source SEM functionality that utilize classical estimationmethods including maximum likelihood or least squares.While maximum likelihood and least squares methods are most popular, Bayesian approachesto SEM and related models have received increased recent attention (e.g., Depaoli and van deSchoot 2017; Fox 2010; Jackman 2009; Kaplan 2014; Kruschke 2011; MacCallum, Edwards,and Cai 2012; Merkle and Wang 2018; Muthén and Asparouhov 2012; van Erp, Mulder, andOberski 2018). Researchers have speciﬁcally found the methods to be useful for estimationof complex SEMs (including, e.g., latent variable interactions; Lee, Song, and Tang 2007), forautomatically handling uncertainty associated with latent variable estimation, and for scalingto high-dimensional datasets.Despite the increased popularity of Bayesian latent variable models, coding the models via

JAGS (Plummer 2003) or

Stan ( Stan

Development Team 2017) syntax can be diﬃcult, andthe resulting sampling can be time-consuming. These issues have been partially addressed by R package blavaan (Merkle and Rosseel 2018), which uses lavaan model speciﬁcation syntaxand originally relied on JAGS for model estimation (via package runjags , which provides an R interface to JAGS ; see Denwood 2016). Other R packages have addressed these issues forrelated models, including brms (Bürkner 2017) for mixed and multivariate models, ctsem (Driver et al. edstan (Furr 2017) for item response models, and pcFactorStan (Pritikin and Furr 2019) for pairwise comparison factor models.The original blavaan approach was similar to the brms approach for generalized linear mixed(and related) models, where JAGS code was generated at runtime from the user-speciﬁedmodel syntax. However, this approach became very slow for some models, forcing the userto wait hours or more for enough samples to make inferences. This reduced the viabilityof blavaan for applied data analysis and for simulation studies, leading us to implement

Stan functionality in blavaan . The original

Stan implementation was similar to the

JAGS implementation, generating

Stan syntax for a user-speciﬁed model and relying on package rstan (Stan Development Team 2018) for MCMC. This

Stan implementation has not beenformally described, which represents one contribution of the current paper. In general, though,the

Stan implementation was not much faster or more eﬃcient than the

JAGS approach.The primary contribution of this paper is to describe and illustrate a new approach to

Stan

SEM estimation, which greatly improves the speed and eﬃciency of model estimation. Theapproach can be ﬂexibly applied to models in the traditional SEM framework, with general(possibly non-conjugate) prior distributions. It has been implemented in blavaan alongsidethe previous

JAGS and

Stan implementations, allowing for easy comparison across implemen-tations.In the sections below, we ﬁrst formally deﬁne the models under consideration. We thendescribe the three MCMC approaches that are now implemented in blavaan : the original

JAGS approach described in Merkle and Rosseel (2018), the original

Stan approach that isbeing formally described in this paper for the ﬁrst time, and the new

Stan approach that isthe primary focus of this paper. After describing the approaches, we explicitly discuss someproblematic issues associated with estimation of SEMs via MCMC. These are issues that areoften overlooked in the literature, but they are necessary to have fully functional BayesianSEM software. Finally, we compare the three approaches via three examples, highlighting theadvantages of the new

Stan approach. dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich

2. Model deﬁnition

The blavaan package generally relies on the lavaan representation of a structural equationmodel, which is based on the LISREL “all-y” representation (e.g., Jöreskog and Sörbom1997).Let y i be the p (continuous) observed variables associated with observation i . Then a struc-tural equation model with m latent variables may be represented by the equations y i = ν + Λ η i + (cid:15) i (1) η i = α + Bη i + ζ i , (2)where η i is an m × (cid:15) i is a p × ζ i is an m × ν and α contain intercept parameters for the manifest and latent variables, respectively; Λ isa matrix of factor loadings; and B contains parameters that reﬂect directed paths betweenlatent variables.The residuals (cid:15) i and ζ i are assumed to be multivariate normal: (cid:15) i ∼ N p ( , Θ ) (3) ζ i ∼ N m ( , Ψ ) , (4)where the associated covariance matrices are often diagonal. These assumptions imply thatthe marginal distribution of y (integrating out the latent variables) is multivariate normalwith parameters µ = ν + Λ α (5) Σ = Λ ( I − B ) − Ψ ( I − B > ) − Λ > + Θ , (6)which requires that ( I − B ) be invertible. The traditional LISREL framework includes ad-ditional matrices for exogenous observed variables, but these are not often utilized in lavaan .Instead, an exogenous observed variable is “upgraded” to latent variable status, where thelatent variable accounts for all of the observed variable’s variance (and the associated varianceparameter in Θ is ﬁxed to 0).Many Bayesian approaches to SEM estimation rely on sampling the η i in tandem with othermodel parameters. This is advantageous because observed variables are often independentconditioned on the η i , so that the conditional distribution of each observed variable is aunivariate normal. However, as we will see later, the sampling of the η i can have a majorimpact on the speed and eﬃciency of MCMC estimation.

3. MCMC approaches

In the sections below, we brieﬂy describe the three MCMC approaches implemented in blavaan : the original

JAGS approach, the original

Stan approach, and the new

Stan approachthat is the focus of this paper.

In previous work (Merkle and Rosseel 2018), we developed a parameter expansion approachthat can be applied to SEMs for continuous data (also see Palomo, Dunson, and Bollen 2007).

Eﬃcient SEM in Stan

The method allows researchers to place prior distributions on intuitive sets of parameters andcan be generally implemented in

JAGS .The approach involves converting the model of interest (the “inferential model”) to an over-parameterized, equivalent model (the “working model”) from which it is easier to sample.The conversion focuses on a model’s covariance parameters, converting each covariance to a“phantom” latent variable. This conversion makes observed variables conditionally indepen-dent of one another (conditioned on latent variables), meaning that our likelihood involves aseries of univariate distributions instead of a single multivariate distribution. Such a conver-sion can speed up sampling in

JAGS , where computations involving the multivariate normaldistribution are very slow. The full details underlying these procedures can be found in Merkleand Rosseel (2018).

The phantom latent variable approach is not essential in

Stan and, in testing, we found thatthe approach did not lead to gains in sampling speed or eﬃciency. However, we did make initialprogress in

Stan by capitalizing on the structure of the SEM latent variable covariance matrix Ψ . This capitalization was inspired by related work on estimating multivariate autoregressivemodels in Stan (Joseph 2016).We provide an overview of this approach here. For traditional SEMs, the distribution oflatent variables can typically be expressed as (see Equation (2)) η ∼ N (( I − B ) − α , ( I − B ) − Ψ ( I − B > ) − ) . Evaluation of this multivariate normal log-likelihood is time-consuming in Stan because weneed to compute the inverse and determinant of the covariance matrix. However, for manymodels, the structure of the covariance matrix leads to simpliﬁcations. For example, thematrix B is often triangular with zeros along its diagonal (leading to a so-called “recursive”model), and the matrix Ψ is often diagonal. When both of these properties are fulﬁlled, we canuse standard matrix properties (e.g., Petersen and Pedersen 2012) to write the determinantas a product of scalar values:det( I − B ) − Ψ ( I − B > ) − ) = (det( I − B )) − det( Ψ )(det( I − B )) − = 1 · m Y i =1 ψ ii · . Relatedly, the inverse of Σ is simpliﬁed as(( I − B ) − Ψ ( I − B > ) − ) − = ( I − B > ) Ψ − ( I − B ) , which completely removes the need to compute matrix inversions when Ψ is diagonal. Wheneither B is triangular or Ψ is diagonal (but not both), we can use a subset of the abovesimpliﬁcations to improve sampling eﬃciency as much as possible. Given a speciﬁc model,package blavaan automatically determines which simpliﬁcations are available and uses themfor Stan estimation. The simpliﬁcations are implemented in

Stan as a custom log-probabilitydensity function. This implementation is available in blavaan via the argument target ="stanclassic" . dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich Both methods mentioned above exploit the fact that the latent variables in the model can besampled along with other model parameters. This generally simpliﬁes likelihood computationsand allows us to immediately extend the methods to situations where observed variables havenon-normal distributions. Most Bayesian approaches to SEM, and to other models with“random” parameters, sample the latent variables.However, the sampling of latent variables greatly increases the dimension of the parameterspace, which can reduce sampling speed and eﬃciency. Unlike

JAGS , the key to fast samplingin

Stan is to work with a model likelihood that is marginal over latent variables. This issomewhat unintuitive, because previous researchers have focused on the simpliﬁcations thatwe can gain from sampling the latent variables. One concern related to using the marginallikelihood involves our inability to make inferences about the latent variables (because they areintegrated out of the likelihood). But this concern is addressed by blavaan because, conditionalon the other model parameters, the latent variable posterior distribution is tractable. Thus,the latent variables can be sampled in a “generated quantities” block within the

Stan syntax,even though they do not directly play a role in the MCMC sampling.The new blavaan approach utilizes the marginal likelihood, and we have written a single

Stan program that can estimate the majority of multivariate normal SEMs that a user couldspecify. This ﬁle is compiled once during (or before) package installation. Then, once theuser speciﬁes a model, many pieces of information about the data and about the model arepassed to the compiled model, with sampling occurring immediately. It is inconvenient toenter all the required model information by hand. Thus, to complement the

Stan ﬁle, wehave new R code that serves as a pipeline from blavaan to the Stan model and back. The blavaan user will not notice many diﬀerences, because the commands for model speciﬁcationand estimation are the same as before. However, the model is now sent to the pre-compiled

Stan code by default, whereas the previous approaches wrote

JAGS or Stan code at runtime.It is worth noting that our

Stan

SEM ﬁle stands on its own, so that users of languages beyond R (e.g., Python ) could also utilize the ﬁle if they can pass all the required data in to the

Stan model. This is more challenging than it may sound due to the many pieces of data that arerequired, including the dimensions of all SEM matrices, the free entries of SEM matrices,equality constraints on free parameters, prior distribution parameters, and so on.Because our

Stan model is precompiled, the possible models that can be estimated are re-stricted in two ways. First, there is some inﬂexibility in choice of prior distributions. For mosttypes of model parameters, the form of each parameter’s prior cannot be changed (thoughthe speciﬁc prior hyperparameters can). For example, regression parameters ( B ) in blavaan currently have N(0 ,

10) priors by default, where the normal distribution is parameterized bystandard deviation. Users can change the mean or standard deviation of this normal prior,but they cannot change the fact that the prior is normal. However, for scale parameters,users have the option to place priors on variances, standard deviations, or precisions.The second restriction of the precompiled

Stan model involves equality constraints. While ourcode currently allows for equality constraints within a class of parameters (e.g., loadings canbe constrained equal to one another or intercepts can be constrained equal to one another),it does not allow for constraints across classes of parameters. Additionally, if users wish toset one parameter equal to a function of other parameters, that is not currently possible.However, if users desire features that are not included in our current implementation, they

Eﬃcient SEM in Stan can take our

Stan ﬁle, make the desired changes, and recompile the model. Alternatively, theycould use the original MCMC methods available in blavaan , which provide more ﬂexibilitybecause they are not precompiled.

4. Challenging issues

While the above methods can be readily applied to “vanilla” models such as conﬁrmatoryfactor analysis with uncorrelated factors, the SEM framework includes many model and datacharacteristics that require further attention for estimation. Below, we highlight three char-acteristics requiring special attention.

The general SEM presented earlier includes two covariance matrices with free parameters: Θ and Ψ . These matrices can include some ﬁxed values and some free values, so priordistributions for these matrices are not straightforward. That is, there are some models forwhich we cannot simply place an inverse Wishart prior on the covariance matrix, nor an LKJprior (Lewandowski, Kurowicka, and Joe 2009). Those priors were meant for unrestrictedcovariance/correlation matrices, not for matrices with some ﬁxed values and some free values.In the Stan approaches, we consequently decompose the covariance matrices into standarddeviations and correlations. For example, Θ is written as Θ = D Θ R Θ D Θ , (7)where D Θ is a diagonal matrix of standard deviations and R Θ is a correlation matrix. Priordistributions are then placed individually on the free standard deviation parameters and onthe free correlation parameters within the two matrices. This approach is similar to thatof Barnard, McCulloch, and Meng (2000), and Liu, Zhang, and Grimm (2016) provide acomparison of this approach to the use of inverse Wisharts in the context of growth curvemodels.The use of an independent prior on each free parameter can sometimes lead to a non-positivedeﬁnite covariance matrix during MCMC sampling. Stan is able to reject such a covariancematrix and continue sampling, whereas

JAGS will terminate. This is why we developedthe parameter expansion method in

JAGS : the parameter-expanded model involves diagonalcovariance matrices that cannot become non-positive deﬁnite. But the non-positive deﬁnitecovariance matrices still have implications for model calibration, as detailed in the simulation-based calibration study later.We are aware of a variety of other prior distributions proposed for covariance matrices (e.g.,Chung, Gelman, Rabe-Hesketh, Liu, and Dorie 2015; Consonni and Veronese 2003; Mulderand Pericchi 2018; Spezia 2018). The strategy implemented in blavaan is worthwihle becauseit is relatively easy to specify informative prior distributions for individual standard deviationand correlation parameters in the model. In contrast, many of the other prior distributionsare proposed for convenience or due to the fact that they maintain positive deﬁniteness, andthey have less-intuitive interpretations as compared to our approach. We plan to furtherconsider these alternative priors in the future. dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich While it is often useful and desirable to directly model the missing values with the rest of themodel (e.g., Merkle 2011; O’Muircheartaigh and Moustaki 1999), blavaan employs a “missingat random” approach to missing data that diﬀers across

JAGS and

Stan . In

JAGS , one caninclude NA values in the data, and

JAGS will sample these missing values as if they wereextra model parameters. In contrast,

Stan does not allow NA values in the data, so thatone must handle the missing data manually. We utilize a “full information” likelihood (e.g.,Wothke 2000) in our

Stan models, which is the same likelihood that is used to handle missingdata in lavaan and other software that performs maximum likelihood SEM estimation. Thisrequires some additional overhead in preparing the data to be sent to

Stan , because each case’sobserved values must be indexed, and cases are sorted by missing data pattern to speed upcomputations. Missing values could also be directly sampled (“imputed”) in

Stan , thoughthis functionality is not currently available.

Structural equation models typically require some parameter constraints to achieve parameteridentiﬁcation, where we must “set the scale” of each latent variable. The two most popularways to do this involve (i) ﬁxing each latent variable’s variance to 1, or (ii) ﬁxing a singleloading (parameter in λ ) to 1 for each latent variable. Of these two, the latter method ismost straightforward to implement in a Bayesian setting.The former method (of ﬁxing each latent variance to 1) is more challenging. This is because,as described by Peeters (2012), one loading per latent variable must be sign constrained toachieve global parameter identiﬁcation. Otherwise, the sign of each loading may ﬂip back andforth, with a model’s regression parameters and covariance parameters potentially ﬂippingalong with the loadings. One solution to this issue involves the placement of a truncatednormal prior (truncated from below at 0) on one loading per latent variable, preventing thesign changes. This solution is adopted in blavaan ’s JAGS approach to model estimation.A diﬀerent solution is implemented in blavaan ’s Stan approaches. In those approaches, thesign ﬂipping is allowed to occur during MCMC sampling. The issue is then handled aftersampling, in the “generated quantities” block. In this block, one loading per latent variable istransformed to always be positive, and the signs of associated parameters (loadings, regres-sions, and covariance parameters) are ﬂipped every time the sampled value of the focal loadingis negative. This approach can improve sampling eﬃciency because no boundary constraintsare introduced in the parameter space. This approach was discussed in a thread on the StanDiscourse site ( https://discourse.mc-stan.org/t/latent-factor-loadings/1483 ).

5. Applications

The estimation approaches described above are all implemented in package blavaan for generalSEM estimation. These include the original

JAGS approach (obtained via argument target ="jags" ), the original

Stan approach ( target = "stanclassic" ), and the new

Stan approach( target = "stan" ).For example, the following code speciﬁes a model for the well-known “political democracy”data (Bollen 1989) and estimates it via each of the three approaches. This dataset includes

Eﬃcient SEM in Stan

75 countries measured on 11 attributes, seven of which were measured in 1960 and 4 of whichwere measured in 1965. The intent of the model is to study relationships between countries’levels of industrialization and democracy over time.

R> model <- '+

The above commands use the default number of burnin/warmup and sampling iterations, aswell as the package’s default prior distribution for each type of model parameter. The defaultprior distributions have generally been chosen to be weakly informative for a variety of SEMstypically encountered in practice; some further discussion of prior distributions appears laterin the simulation-based calibration section.Following model estimation, convergence diagnostics such as Rhat and eﬀective sample sizeare immediately available via the summary method and blavInspect() function, and manytypes of plots are available via the plot method, which relies on package bayesplot (Gabryand Mahr 2019). Further examples of blavaan syntax and functionality can be found in Merkleand Rosseel (2018), noting that target="jags" was the default at the time that paper waswritten, while target="stan" is now the default.In the following sections, we compare the three MCMC approaches on speed and samplingeﬃciency via three example models. We then study the extent to which the best method (themarginal

Stan method) is calibrated.

All comparisons are carried out on a Dell desktop with a large amount of RAM, runningUbuntu Linux. We deﬁne sampling eﬃciency as “eﬀective sample size per second” (ESS/s).Eﬀective sample sizes are computed via the rstan monitor() function, and sampling time ismeasured after

Stan model compilation. The warmup time for

Stan models was ﬁxed to 300iterations, whereas the burn-in time for

JAGS models was ﬁxed to 1000 iterations. There isarbitrariness in the warmup and burn-in choices, so that the ESS/s metric is somewhat crude.But we think the metric is suﬃcient to illustrate the advantage of the new

Stan approach. dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich Parameter Number

ESS / s JAGSNew StanOld Stan

Figure 1: Sampling eﬃciency of the MCMC procedures for the Bollen example. Parameternumbers correspond to diﬀerent types of parameters: loadings are 1–8; regressions are 9–11;observed variances are 12–22; latent variances are 23–25; intercepts are 26–36.We examine the MCMC methods’ speed and eﬃciency on three models, two of which arepopular models often used to illustrate SEM methods. The third is a more complex modelthat is known to pose diﬃculties for the original blavaan approach. We do not conduct a fullMonte Carlo study here, so our results are subject to noise. But the results are generallyconsistent across the models presented here as well as many others not presented, so we thinkthey can be taken as general evidence for the approaches’ relative performance. The resultsare also consistent with those of Yackulic, Dodrill, Dzul, Sanderlin, and Reid (in press), whostudy marginalization of discrete latent variables in ecological models.

Political democracy

For our ﬁrst comparison, we continue with the Bollen (1989) political democracy model. The blavaan code to ﬁt the model was shown earlier. The

JAGS method was fastest here, averaging0.55 seconds per 100 iterations. Next fastest was the new (marginal)

Stan method, averaging1.44 seconds per 100 iterations, followed by the old

Stan method at 7.36 seconds per 100iterations. But it is more important to examine the methods’ sampling eﬃciencies (eﬀectivesample size per second), which are shown in Figure 1. A separate metric is shown for eachparameter, with the parameters being numbered along the x-axis. Parameters are orderedon the x-axis by parameter type, with the details being shown in the ﬁgure caption. Theﬁgure shows that the speed of

JAGS is oﬀset by the eﬀective sample size, so that the new

Stan method is best in terms of sampling eﬃciency. The old

Stan method exhibits eﬃciencysimilar to that of

JAGS .0 Eﬃcient SEM in Stan

Holzinger and Swineford

Our second example involves a conﬁrmatory factor analysis of the Holzinger and Swineford(1939) data. This is the version of the dataset included in package lavaan , which has 301individuals measured on nine cognitive scales. The conﬁrmatory factor model ﬁt to the dataincludes three latent variables, each of which is associated with three observed variables. The blavaan code to specify and ﬁt the model is

R> HS.model <- ' visual =~ x1 + x2 + x3+ textual =~ x4 + x5 + x6+ speed =~ x7 + x8 + x9 'R> fit <- bcfa(HS.model, data = HolzingerSwineford1939) where additional arguments would typically be used to specify the number of sampling iter-ations, to specify priors, to specify the MCMC sampler, and so on.In terms of speed, the

JAGS method is again fastest, averaging 0.91 seconds per 100 iterations.This was followed by the marginal

Stan method at 2.83 seconds per 100 iterations, then theold

Stan method at 9.55 seconds per 100 iterations. The ESS/s metrics for this model arevisualized in Figure 2. The graph is similar to that of the previous section, with fewer param-eters in this model as compared to the last model. We see that the gold line, representing thenew

Stan method, is the highest for all the model parameters, being at least twice as largeas the other methods’ eﬃciencies. The

JAGS and old

Stan methods are again similar to oneanother for this example, with

JAGS being better for the majority of parameters.

Growth model

For our ﬁnal comparison, we use a “Multiple indicator univariate latent change score” modelpresented in Kievit, Brandmaier, Ziegler, van Harmelen, de Mooij, Moutoussis, Goodyer,Bullmore, Jones, Fonagy, Lindenberger, and Dolan (2018). The blavaan code to ﬁt thismodel, as speciﬁed by Kievit et al. (2018), is

R> MILCS <- '+ COG_T1 =~ 1*T1X1 + T1X2 + T1X3+ COG_T2 =~ 1*T2X1 + equal("COG_T1 =~ T1X2")*T2X2 + equal("COG_T1 =~ T1X3")*T2X3++ COG_T2 ~ 1*COG_T1+ dCOG1 =~ 1*COG_T2+ COG_T2 ~ 0*1+ COG_T2 ~~ 0*COG_T2++ dCOG1 ~ 1+ COG_T1 ~ 1+ dCOG1 ~~ dCOG1+ COG_T1 ~~ COG_T1+ dCOG1 ~ COG_T1++ T1X1 ~~ T2X1+ T1X2 ~~ T2X2 dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich Parameter Number

ESS / s JAGSNew StanOld Stan

Figure 2: Sampling eﬃciency of the MCMC procedures for the Holzinger & Swineford ex-ample. Parameter numbers correspond to diﬀerent types of parameters: loadings are 1–6;observed variable variances are 7–15; latent variable variances are 16–18; latent variable co-variances are 19–21; intercepts are 22–30. + T1X3 ~~ T2X3++ T1X1 ~~ T1X1+ T1X2 ~~ T1X2+ T1X3 ~~ T1X3++ T2X1 ~~ equal("T1X1 ~~ T1X1")*T2X1+ T2X2 ~~ equal("T1X2 ~~ T1X2")*T2X2+ T2X3 ~~ equal("T1X3 ~~ T1X3")*T2X3++ T1X1 ~ 0*1+ T1X2 ~ 1+ T1X3 ~ 1+ T2X1 ~ 0*1+ T2X2 ~ equal("T1X2 ~ 1")*1+ T2X3 ~ equal("T1X3 ~ 1")*1+ 'R> fit <- blavaan(MILCS, data = simdatMILCS, fixed.x = FALSE) where further information on this model and its speciﬁcation can be found in the originalauthors’ paper. We ﬁt the model to 500 artiﬁcial observations, where the data were generatedvia code that was included with the Kievit paper. This model, and others described in the2

Eﬃcient SEM in Stan

Parameter Number

ESS / s JAGSNew StanOld Stan

Figure 3: Sampling eﬃciency of the MCMC procedures for the growth model example. Pa-rameter numbers correspond to diﬀerent types of parameters: loadings are 1–2; regressionsare 3; observed variances are 4–6; observed variable covariances are 7–9; latent variances are10–11; intercepts are 12–13; latent means are 14–15.Kievit paper, have been especially diﬃcult to ﬁt in blavaan , requiring long run times and highautocorrelation among parameter draws. We ended up thinning the

JAGS samples by 20 inour analyses here, because it was the only way that we could consistently obtain an Rhatvalue below 1.05.The sampling speed is now reversed, with the marginal

Stan method at 23.4 seconds per 100iterations, the

JAGS method at 28.4 seconds per 100 iterations, and the old

Stan method at577.27 seconds per 100 iterations. If we instead compute the

JAGS speed while accountingfor thinning (i.e., counting only each twentieth iteration in the computations), then the

JAGS speed is at 567.98 seconds per 100 iterations.The approaches’ sampling eﬃciencies are shown in Figure 3, where parameter ordering isagain described in the ﬁgure caption. The

JAGS and old

Stan methods are very low for thismodel, with the new

Stan method displaying much better eﬃciency and yielding useful resultsin a matter of minutes, as opposed to hours or days. This and related examples (not shown)convinced us to make the new

Stan method the default in blavaan , replacing the originaldefault of

JAGS . The new

Stan method reliably produces fast, eﬃcient samples for a largenumber of models, whereas the other methods exhibit more variability in their speeds andeﬃciencies, and are seldom clearly better than the new

Stan method.

The posterior estimates resulting from blavaan have been veriﬁed in a few manners. Initially,we treated lavaan as a gold standard, comparing posterior means and standard deviations dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich ν λ β θ ψ ρ Set 1 (default) N(0,32) N(0,10) N(0,10) Gamma(1,.5) Gamma(1,.5) Beta(1,1)Set 2 (informative) N(0,32) N(1.25,.25) N(1.5,.25) Gamma(10,10) Gamma(10,10) Beta(5,5)

Table 1: Prior distributions used in the simulation-based calibration study. Normal priors areparameterized using standard deviations. Gamma priors are placed on standard deviationsassociated with θ and ψ parameters, as opposed to variances or precisions.under weak priors to the maximum likelihood estimates and standard errors from lavaan . Forall three MCMC methods, the posterior means obtained under this approach typically agreewith lavaan estimates to about one decimal point. The posterior standard deviations tend tobe close to, but slightly larger than, the maximum likelihood standard errors. Now that thereare multiple MCMC methods implemented in blavaan , we have also been able to compareMCMC methods to one another in order to verify that they were producing similar posteriordistributions.Here, we use the simulation-based calibration method proposed by Talts, Betancourt, Simp-son, Vehtari, and Gelman (2018) to study the calibration of blavaan ’s new (marginal) Stan implementation. This involved repeatedly generating data from the model’s prior distribu-tion, ﬁtting the model to the generated data, and examining the ranks of the posterior MCMCsamples relative to samples from the prior distribution. If the MCMC algorithm is calibrated,then these ranks should be approximately uniformly distributed. Deviations from uniformityare then taken as miscalibration.

Method

Our simulation-based calibration study utilized the political democracy model presented ear-lier. We generated 500 datasets of size 75 from the prior predictive distribution and ﬁt themodel to each generated dataset via MCMC. The study involved two conditions that dif-fered by the prior distributions that were used. First, we used the default prior distributionsfrom blavaan , which are intended to be weakly informative in many situations encountered inpractice. Second, we used a set of more informative prior distributions to contrast with thenoninformative priors. Both sets of prior distributions are shown in Table 5.2.

Results

Rank frequencies for the blavaan default priors are shown in Figure 4. Perhaps surprisingly,the distributions are far from uniform, with peaks generally occurring near zero and one.These peaks represent posterior distributions that exhibit less variability than they should,given the non-informative priors from which we started. Clearly, the results are far from theuniformity that would be expected from a calibrated algorithm.The non-uniformity occurs because our model has a large number of parameters with indepen-dent prior distributions, with the parameters all contributing to the model-implied covariancematrix (see Equation (6)). For the model considered here, there are many combinations of pa-rameters that lead to a non-positive deﬁnite covariance matrix, and the MCMC sampler willavoid these combinations of parameters during sampling. In the simulation-based calibrationstudy, this leads to posterior samples that are not calibrated with respect to the independentpriors. Instead, we might say that the posteriors are calibrated with respect to regions of the4

Eﬃcient SEM in Stan prior distribution that are positive deﬁnite.To provide evidence that the MCMC algorithm is indeed calibrated with respect to priorsthat maintain positive deﬁniteness, we show the results of the informative priors in Figure 5.These frequencies are now much closer to uniform, because the information in the prior dis-tributions now generally leads to positive deﬁnite model covariance matrices. This interplaybetween the informativeness of prior distributions and posterior calibration is worthy of fur-ther attention, because existing MCMC algorithms for SEM use a series of independent priorson parameters that each play a role in the model-implied covariance matrix. A researcher’spriors may be more informative than expected, based solely on the fact that the model-impliedcovariance matrix must remain positive deﬁnite during MCMC sampling. Further, dependingon the model likelihood used (marginal vs. conditional), the degree of information present inuninformative priors may vary. These results highlight the utility of blavaan for conductingdetailed study of MCMC algorithms, as well as the fact that there is room to improve thedefault blavaan priors in the future.

6. Conclusion

The results in this paper show that we can improve sampling eﬃciency by integrating thelatent variables out of the model likelihood, which is the opposite of most popular approachesto Bayesian SEM estimation (where the popular approaches are largely based on resultssummarized by, e.g., Lee 2007; Song and Lee 2012). We can expect the marginal samplingeﬃciency to be even more advantageous as sample sizes increase, because the sample sizehas no impact on the dimension of the parameter space here. In contrast, the dimension ofthe parameter space increases with sample size under conditional approaches, where latentvariables count as parameters.While the marginal approach is promising, use of the marginal likelihood leads us back toproblems that frequentists often encounter in SEM. These problems include the fact that themarginal likelihood does not have a closed form when we have non-normal observed variables(e.g., ordinal variables) or when we have latent variable interactions. We think that someprogress can be made here by employing other Bayesian methods, including data augmen-tation (e.g., Chib and Greenberg 1998) in the ordinal case. The use of data augmentationfor psychometric models has been described by Fox (2010) and Fox, Mulder, and Sinharay(2017b), and such methods may be implemented in future versions of blavaan .For situations where the marginal likelihood does not exist in closed form, it is also possibleto move back to the original blavaan approaches that sample the latent variables. However,in our experience, the original approaches are even slower and less eﬃcient in those situations(as compared to the models considered here), making them questionable for applied work.Further, even if those methods did exhibit reasonable eﬃciency, the marginal likelihood isgenerally necessary for obtaining suitable information criteria such as DIC (Spiegelhalter,Best, Carlin, and Linde 2002) or WAIC (Watanabe 2010). Merkle, Furr, and Rabe-Hesketh(2019) discuss why the marginal likelihood is preferable here, and Zhang, Tao, Wang, and Shi(2019) discuss related applications of DIC to multilevel item response models. Thus, we thinkthat use of the new

Stan approach, paired with new tricks for handling non-normal observedvariables, is the most viable approach for applications to the non-normal modeling situationstypically encountered in practice. dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich Theta_sd[3] Theta_sd[4] Theta_sd[5] Theta_sd[6] Theta_sd[7] Theta_sd[8] Theta_sd[9]Theta_r[4] Theta_r[5] Theta_r[6] Theta_sd[1] Theta_sd[10] Theta_sd[11] Theta_sd[2]Nu[9] Psi_sd[1] Psi_sd[2] Psi_sd[3] Theta_r[1] Theta_r[2] Theta_r[3]Nu[2] Nu[3] Nu[4] Nu[5] Nu[6] Nu[7] Nu[8]Lambda_y[5] Lambda_y[6] Lambda_y[7] Lambda_y[8] Nu[1] Nu[10] Nu[11]B[1] B[2] B[3] Lambda_y[1] Lambda_y[2] Lambda_y[3] Lambda_y[4] c oun t Figure 4: Simulation-based calibration rank frequencies, default blavaan priors.6

Eﬃcient SEM in Stan

Theta_sd[3] Theta_sd[4] Theta_sd[5] Theta_sd[6] Theta_sd[7] Theta_sd[8] Theta_sd[9]Theta_r[4] Theta_r[5] Theta_r[6] Theta_sd[1] Theta_sd[10] Theta_sd[11] Theta_sd[2]Nu[9] Psi_sd[1] Psi_sd[2] Psi_sd[3] Theta_r[1] Theta_r[2] Theta_r[3]Nu[2] Nu[3] Nu[4] Nu[5] Nu[6] Nu[7] Nu[8]Lambda_y[5] Lambda_y[6] Lambda_y[7] Lambda_y[8] Nu[1] Nu[10] Nu[11]B[1] B[2] B[3] Lambda_y[1] Lambda_y[2] Lambda_y[3] Lambda_y[4] c oun t Figure 5: Simulation-based calibration rank frequencies, informative priors. dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich

JAGS or in

Stan . For example, the pre-compiled marginal

Stan approach can be modiﬁedso that the latent variables are part of the model likelihood. This leads to a method that issimilar to the “old

Stan ” method, except that it simpliﬁes computation of the model likelihoodin diﬀerent ways. In limited testing, we found that this approach was somewhat closer tothe

JAGS approach in sampling eﬃciency, but still similar to the “old

Stan ” approach studiedin this paper. Alternatively, it is possible to deﬁne a marginal approach in

JAGS , but ourlimited testing there indicates that use of the multivariate normal distribution leads to majordecreases in

JAGS sampling eﬃciency.Finally, our analyses here have included no use of parallelization. While between-chain paral-lelization is immediately available in blavaan , there have been recent advances in within-chainparallelization in

Stan . These advances are promising for further improvement of

Stan sam-pling eﬃciency, and we plan to include this in future versions of blavaan .

7. Acknowledgments

This work was partially supported by a research leave to the ﬁrst author, which was providedby the University of Missouri.

References

Barnard J, McCulloch R, Meng XL (2000). “Modeling Covariance Matrices in Terms ofStandard Deviations and Correlations, with Application to Shrinkage.”

Statistica Sinica , , 1281–1311.Bates D, Mächler M, Bolker B, Walker S (2015). “Fitting Linear Mixed-Eﬀects Models Using lme4 .” Journal of Statistical Software , (1), 1–48. doi:10.18637/jss.v067.i01 .Bentler PM (2000–2008). EQS . Multivariate Soft-ware, Inc.Boker S, Neale M, Maes H, Wilde M, Spiegel M, Brick T, Spies J, Estabrook R, Kenny S,Bates T, Mehta P, Fox J (2011). “

OpenMx : An Open Source Extended Structural EquationModeling Framework.”

Psychometrika , , 306–317.Bollen KA (1989). Structural Equations with Latent Variables . New York: John Wiley &Sons.Bürkner PC (2017). “ brms : An R Package for Bayesian Multilevel Models Using

Stan .” Journal of Statistical Software , (1), 1–28. doi:10.18637/jss.v080.i01 .Chib S, Greenberg E (1998). “Analysis of Multivariate Probit Models.” Biometrika , ,347–361.Chung Y, Gelman A, Rabe-Hesketh S, Liu J, Dorie V (2015). “Weakly Informative Prior forPoint Estimation of Covariance Matrices in Hierarchical Models.” Journal of Educationaland Behavioral Statistics , (2), 136–157. doi:10.3102/1076998615570945 .8 Eﬃcient SEM in Stan

Consonni G, Veronese P (2003). “Enriched Conjugate and Reference Priors for the WishartFamily on Symmetric Cones.”

The Annals of Statistics , (5), 1491–1516. doi:10.1214/aos/1065705116 .Denwood MJ (2016). “ runjags : An R Package Providing Interface Utilities, Model Templates,Parallel Computing Methods and Additional Distributions for MCMC Models in

JAGS .” Journal of Statistical Software , , 1–25. doi:10.18637/jss.v071.i09 .Depaoli S, van de Schoot R (2017). “Improving Transparency and Replication in BayesianStatistics: The WAMBS-Checklist.” Psychological Methods , , 240–261.Driver CC, Oud JHL, Voelkle MC (2017). “Continuous Time Structural Equation Modelingwith R Package ctsem .” Journal of Statistical Software , (5), 1–35. doi:10.18637/jss.v077.i05 .Fox J, Nie Z, Byrnes J (2017a). sem : Structural Equation Models . R package version 3.1-9,URL https://CRAN.R-project.org/package=sem .Fox JP (2010). Bayesian Item Response Modeling: Theory and Applications . New York, NY:Springer-Verlag.Fox JP, Mulder J, Sinharay S (2017b). “Bayes Factor Covariance Testing in Item ResponseModels.”

Psychometrika , (4), 979–1006. doi:10.1007/s11336-017-9577-6 . URL https://doi.org/10.1007/s11336-017-9577-6 .Furr DC (2017). edstan : Stan

Models for Item Response Theory . R package version 1.0.6,URL https://CRAN.R-project.org/package=edstan .Gabry J, Mahr T (2019). bayesplot : Plotting for Bayesian Models . R package version 1.7.1,URL https://CRAN.R-project.org/package=bayesplot .Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013). Bayesian DataAnalysis . 3rd edition. Chapman & Hall/CRC.Holzinger KJ, Swineford FA (1939).

A Study of Factor Analysis: The Stability of a Bi-factorSolution . Number 48 in Supplementary Educational Monograph. Chicago: University ofChicago Press.Jackman S (2009).

Bayesian Analysis for the Social Sciences . John Wiley & Sons.Jöreskog KG, Sörbom D (1997).

LISREL 8

User’s Reference Guide . Scientiﬁc Software Inter-national.Joseph M (2016). “Exact Sparse CAR Models in

Stan .” Stan

Case Studies , . URL http://mc-stan.org/users/documentation/case-studies/mbjoseph-CARStan.html .Kaplan D (2014). Bayesian Statistics for the Social Sciences . New York: The Guilford Press.Kievit RA, Brandmaier AM, Ziegler G, van Harmelen AL, de Mooij SM, Moutoussis M,Goodyer IM, Bullmore E, Jones PB, Fonagy P, Lindenberger U, Dolan RJ (2018). “De-velopmental Cognitive Neuroscience Using Latent Change Score Models: A Tutorial andApplications.”

Developmental Cognitive Neuroscience , , 99–117. doi:https://doi.org/10.1016/j.dcn.2017.11.007 . dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich Doing Bayesian Data Analysis: A Tutorial with R and BUGS . Burling-ton, MA: Academic Press.Lee SY (2007).

Structural Equation Modeling: A Bayesian Approach . Chichester, England:John Wiley & Sons.Lee SY, Song XY, Tang NS (2007). “Bayesian Methods for Analyzing Structural EquationModels with Covariates, Interaction, and Quadratic Latent Variables.”

Structural EquationModeling , , 404–434.Lewandowski D, Kurowicka D, Joe H (2009). “Generating Random Correlation MatricesBased on Vines and Extended Onion Method.” Journal of Multivariate Analysis , ,1989–2001.Liu H, Zhang Z, Grimm KJ (2016). “Comparison of Inverse Wishart and Separation-StrategyPriors for Bayesian Estimation of Covariance Parameter Matrix in Growth Curve Analysis.” Structural Equation Modeling , , 353–367.MacCallum RC, Edwards MC, Cai L (2012). “Hopes and Cautions in Implementing BayesianStructural Equation Modeling.” Psychological Methods , , 340–345.Merkle EC (2011). “A Comparison of Imputation Methods for Bayesian Factor AnalysisModels.” Journal of Educational and Behavioral Statistics , , 257–276. doi:10.3102/1076998610375833 .Merkle EC, Furr D, Rabe-Hesketh S (2019). “Bayesian Model Assessment: Use ofConditional vs Marginal Likelihoods.” Psychometrika , , 802–829. doi:10.1007/s11336-019-09679-0 .Merkle EC, Rosseel Y (2018). “ blavaan : Bayesian Structural Equation Models via ParameterExpansion.” Journal of Statistical Software , (4), 1–30.Merkle EC, Wang T (2018). “Bayesian Latent Variable Models for the Analysis of Ex-perimental Psychology Data.” Psychonomic Bulletin & Review , , 256–270. doi:10.3758/s13423-016-1016-7 .Mulder J, Pericchi LR (2018). “The Matrix-F Prior for Estimating and Testing CovarianceMatrices.” Bayesian Analysis , , 1193–1214.Muthén B, Asparouhov T (2012). “Bayesian Structural Equation Modeling: A More FlexibleRepresentation of Substantive Theory.” Psychological Methods , , 313–335.Muthén LK, Muthén B (1998–2017). Mplus

User’s Guide . 8th edition. Los Angeles, CA:Muthén & Muthén.O’Muircheartaigh C, Moustaki I (1999). “Symmetric Pattern Models: A Latent VariableApproach to Item Non-Response in Attitude Scales.”

Journal of the Royal Statistical SocietyA , , 177–194.Palomo J, Dunson DB, Bollen K (2007). “Bayesian Structural Equation Modeling.” In SY Lee(ed.), Handbook of Latent Variable and Related Models , pp. 163–188. Elsevier.0

Eﬃcient SEM in Stan

Pearl J (2013). “Linear Models: A Useful “Microscope” for Causal Analysis.”

Journal ofCausal Inference , , 155–170.Peeters CFW (2012). “Rotational Uniqueness Conditions under Oblique Factor CorrelationMetric.” Psychometrika , , 288–292.Petersen KB, Pedersen MS (2012). “The Matrix Cookbook.” Version 20121115, URL .Plummer M (2003). “ JAGS : A Program for Analysis of Bayesian Graphical Models UsingGibbs Sampling.” In K Hornik, F Leisch, A Zeileis (eds.),

Proceedings of the 3rd Interna-tional Workshop on Distributed Statistical Computing .Pritikin JN, Furr DC (2019). pcFactorStan : Stan

Models for the Pairwise ComparisonFactor Model . R package version 0.11, URL https://CRAN.R-project.org/package=pcFactorStan . R Core Team (2019). R : A Language and Environment for Statistical Computing . R Founda-tion for Statistical Computing, Vienna, Austria. URL .Rosseel Y (2012). “ lavaan : An R Package for Structural Equation Modeling.”

Journal ofStatistical Software , (2), 1–36. URL .Song XY, Lee SY (2012). Basic and Advanced Bayesian Structural Equation Modeling: WithApplications in the Medical and Behavioral Sciences . Chichester, England: John Wiley &Sons.Spezia L (2018). “Modeling Covariance Matrices by the Trigonometric Separation Strategywith Application to Hidden Markov Models.”

TEST , Online ﬁrst .Spiegelhalter DJ, Best NG, Carlin BP, Linde AVD (2002). “Bayesian Measures of ModelComplexity and Fit.”

Journal of the Royal Statistical Society B , , 583–639.Stan Development Team (2018). “ RStan : The R Interface to

Stan .” R package version 2.18.2,URL http://mc-stan.org/ . Stan

Development Team (2017).

Stan

Modeling Language Users Guide and Reference Manual,Version 2.17.0 . URL http://mc-stan.org/ .Stroup WW (2013).

Generalized Linear Mixed Models: Modern Concepts, Methods and Ap-plications . Boca Raton, FL: CRC Press.Talts S, Betancourt M, Simpson D, Vehtari A, Gelman A (2018). “ValidatingBayesian Inference Algorithms with Simulation-Based Calibration.” Retrieved fromhttp://arxiv.org/abs/1804.06788.van Erp S, Mulder J, Oberski DL (2018). “Prior Sensitivity Analysis in Default BayesianStructural Equation Modeling.”

Psychological Methods , , 363–388.Watanabe S (2010). “Asymptotic Equivalence of Bayes Cross Validation and Widely Appli-cable Information Criterion in Singular Learning Theory.” Journal of Machine LearningResearch , , 3571–3594. dgar C. Merkle, Ellen Fitzsimmons, James Uanhoro, Ben Goodrich Modeling Longitudinal and Multilevel Data: PracticalIssues, Applied Approaches, and Speciﬁc Examples . Mahwah, NJ: Lawrence Erlbaum As-sociates.Yackulic CB, Dodrill M, Dzul M, Sanderlin JS, Reid JA (in press). “A Need for Speed inBayesian Population Models: A Practical Guide to Marginalizing and Recovering DiscreteLatent States.”

Ecological Applications . doi:10.1002/eap.2112 .Zhang X, Tao J, Wang C, Shi NZ (2019). “Bayesian Model Selection Methods for MultilevelIRT Models: A Comparison of Five DIC-based Indices.” Journal of Educational Measure-ment , , 3–27. Aﬃliation:

Edgar C. MerkleDepartment of Psychological SciencesUniversity of Missouri28A McAlester HallColumbia, MO, USA 65211E-mail: [email protected]