bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R
Jouni Helske∗,a, Matti Vihola a
a Department of Mathematics and Statistics, University of Jyväskylä, Finland
Abstract
We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference for the latent states based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretised diffusion latent state processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using the approximations. The package directly supports models with linear-Gaussian state dynamics (but with non-Gaussian observation models), and has an Rcpp interface for specifying custom non-linear models.
Key words:
R package; state space models; time series; Bayesian inference; Markov chain Monte Carlo; sequential Monte Carlo
1. Introduction
State space models (SSM) are a flexible class of latent variable models commonly used in analysing time series data (cf. Durbin and Koopman, 2012). There are a number of packages available for state space modelling for R (R Core Team, 2020), especially for two special cases: a linear-Gaussian SSM (LGSSM), where both the observation and state densities are Gaussian with linear relationships with the states, and an SSM with a discrete state space, which is sometimes called a hidden Markov model (HMM). These classes admit analytically tractable marginal likelihood functions and conditional state distributions (conditioned on the observations), making inference relatively straightforward. See for example Helske (2017), Helske and Helske (2019), Petris and Petrone (2011), and Tusell (2011) for reviews of some of the R packages dealing with these types of models. The present R package bssm is designed for Bayesian inference of general state space models with non-Gaussian and/or non-linear observational and state equations. The package aims to provide easy-to-use and efficient functions for fully Bayesian inference with common time series models such as the basic structural time series model (BSM) (Harvey, 1989) with exogenous covariates, simple stochastic volatility models, and discretised diffusion models, making it straightforward and efficient to make predictions and perform other inference in a Bayesian setting.

When extending state space modelling to non-linear or non-Gaussian models, some difficulties arise. As the likelihood is no longer analytically tractable, computing the latent state distributions, as well as hyperparameter estimation of the model, becomes more challenging. One general option is to use Markov chain Monte Carlo (MCMC) methods targeting the full joint posterior of hyperparameters and the latent states, for example by Gibbs sampling or Hamiltonian Monte Carlo.
Unfortunately, the joint posterior is typically very high-dimensional and, due to the strong autocorrelation structures of the state densities, the efficiency of such methods can be relatively poor. Another asymptotically exact approach is based on the pseudo-marginal particle MCMC approach (Andrieu et al., 2010), where the likelihood function and the state distributions are estimated using sequential Monte Carlo (SMC), i.e. the particle filter (PF) algorithm. Instead of computationally demanding Monte Carlo methods, approximation-based methods such as the extended and unscented Kalman filters may be used, as well as Laplace approximations, which are provided for example by the INLA (Lindgren and Rue, 2015) R package. The latter are computationally appealing, but may lead to hard-to-quantify biases.

Some of the R packages suitable for Bayesian state space modelling include pomp (King et al., 2016), rbi (Jacob and Funk, 2020), and rstan (Stan Development Team, 2019). With the package pomp, the user defines the model using R or C snippets for simulation from, and evaluation of, the latent state and observation level densities, allowing flexible model construction. The rbi package is an interface to LibBi (Murray, 2015), a standalone software with a focus on Bayesian state space modelling on high-performance computers. The rstan package provides an R interface to the Stan (Carpenter et al., 2017) C++ package, a general statistical modelling platform. The pomp package provides several simulation-based inference methods mainly based on iterated filtering and maximum likelihood, whereas rbi is typically used for Bayesian inference via particle MCMC. For a more detailed comparison of rbi/LibBi and pomp with examples, see Funk and King (2020).

The key difference to the aforementioned packages, and the motivation behind the present bssm package, is to combine the use of approximation-based methods with Monte Carlo methods, leading to efficient and unbiased inference, as suggested in Vihola et al. (2020). In a nutshell, the method uses MCMC which targets an approximate marginal posterior of the hyperparameters, and an importance sampling type weighting which provides asymptotically exact inference on the joint posterior of hyperparameters and the latent states. In addition to this two-stage procedure, bssm also supports delayed acceptance pseudo-marginal MCMC (Christen and Fox, 2005) using the approximations, and direct pseudo-marginal MCMC. To our knowledge, importance sampling and delayed acceptance in this form are not available in other Bayesian state space modelling packages in R.

∗ Corresponding author.
Email addresses: [email protected] (Jouni Helske), [email protected] (Matti Vihola)
Preprint submitted to Computational Statistics and Data Analysis, January 22, 2021
2. Supported models
We denote the sequence of observations (y_1, …, y_T) by y, and the sequence of latent state variables (α_1, …, α_T) by α. The latent states α_t ∈ R^d are typically vector-valued, whereas we focus mainly on scalar observations y_t ∈ R (vector-valued observations are also supported, assuming conditional independence, given α_t, in the case of non-Gaussian observations).

A general state space model consists of two parts: observation level densities g_t^(θ)(y_t | α_t) and latent state transition densities µ_t^(θ)(α_{t+1} | α_t). Typically both g_t^(θ) and µ_t^(θ) depend on an unknown parameter vector θ, for which we can define an arbitrary prior p(θ).

In a linear-Gaussian SSM, both g_t^(θ) and µ_t^(θ) are Gaussian densities and they depend linearly on the current and previous state vectors, respectively. Section 2.1 describes a common extension to these models supported by bssm, which relaxes the assumptions on the observational density g_t^(θ) by allowing exponential family links and stochastic volatility models. While the main focus of bssm is on state space models with linear-Gaussian state dynamics, there is also support for more general non-linear models, discussed briefly in Section 2.2.

2.1. Models with linear-Gaussian state dynamics

The primary class of models supported by bssm consists of SSMs with linear-Gaussian state dynamics of the form

α_{t+1} = c_t + T_t α_t + R_t η_t,

where c_t ∈ R^d, T_t ∈ R^{d×d}, and R_t ∈ R^{d×k} can depend on the unknown parameters θ and covariates. The noise terms η_t ∼ N(0, I_k) and α_1 ∼ N(a_1, P_1) are independent. These state dynamics can be combined with an observational level density g_t of the form

g_t(y_t | d_t + Z_t α_t, φ, u_t),

where the parameters φ and the known vector u_t are distribution specific and can be omitted in some cases.
Currently, the following observational level distributions are supported:

• Gaussian distribution: y_t = d_t + Z_t α_t + H_t ε_t with ε_t ∼ N(0, I).
• Poisson distribution: g_t(y_t | d_t + Z_t α_t, u_t) = Poisson(u_t exp(d_t + Z_t α_t)), where u_t is the known exposure at time t.
• Binomial distribution: g_t(y_t | d_t + Z_t α_t, u_t) = B(u_t, exp(d_t + Z_t α_t) / (1 + exp(d_t + Z_t α_t))), where u_t is the size and exp(d_t + Z_t α_t) / (1 + exp(d_t + Z_t α_t)) is the probability of success.
• Negative binomial distribution: g_t(y_t | d_t + Z_t α_t, φ, u_t) = NB(exp(d_t + Z_t α_t), φ, u_t), where u_t exp(d_t + Z_t α_t) is the expected value, φ is the dispersion parameter, and u_t is a known offset term.
• Gamma distribution: g_t(y_t | d_t + Z_t α_t, φ, u_t) = Gamma(exp(d_t + Z_t α_t), φ, u_t), where u_t exp(d_t + Z_t α_t) is the expected value, φ is the shape parameter, and u_t is a known offset term.
• Stochastic volatility model: y_t = exp(α_t / 2) ε_t with ε_t ∼ N(0, 1), and α_{t+1} = µ + ρ(α_t − µ) + σ_η η_t with η_t ∼ N(0, 1) and α_1 ∼ N(µ, σ_η² / (1 − ρ²)).

For multivariate models, these distributions can be combined arbitrarily, except for the stochastic volatility model, which is currently handled separately. Also, for a fully Gaussian model, the observational level errors ε_t can be correlated across time series.

2.2. Other state space models

The general non-linear Gaussian model in bssm has the form

y_t = Z(t, α_t, θ) + H(t, α_t, θ) ε_t,
α_{t+1} = T(t, α_t, θ) + R(t, α_t, θ) η_t,
α_1 ∼ N(a_1(θ), P_1(θ)),

with t = 1, …, n, ε_t ∼ N(0, I_p), and η_t ∼ N(0, I_k).

The bssm package also supports models where the state equation is defined as a continuous-time diffusion model of the form

dα_t = µ(t, α_t, θ) dt + σ(t, α_t, θ) dB_t,  t ≥ 0,

where B_t is a (vector-valued) Brownian motion and µ and σ are vector- and matrix-valued functions, with the univariate observation density p(y_k | α_k) defined at integer times k = 1, …, n.
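As a concrete illustration of the models in Section 2.1, the following base R sketch (our own illustration, not package code) simulates the simplest member of this class, a scalar local level model with a Poisson observation density:

```r
# Base R sketch: local level model, i.e. c_t = 0, T_t = 1, R_t = sigma_eta,
# combined with the observation density g_t(y_t | alpha_t) = Poisson(exp(alpha_t)).
set.seed(42)
n <- 100
sigma_eta <- 0.1
alpha <- numeric(n)
alpha[1] <- rnorm(1, mean = 0, sd = 1)  # alpha_1 ~ N(a_1, P_1)
for (t in seq_len(n - 1)) {
  # alpha_{t+1} = T_t alpha_t + R_t eta_t with eta_t ~ N(0, 1)
  alpha[t + 1] <- alpha[t] + sigma_eta * rnorm(1)
}
y <- rpois(n, lambda = exp(alpha))      # observed counts
```

Data of this kind can be modelled in bssm with its non-Gaussian model building functions discussed in Section 4.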
3. Inference methods
The main goal of bssm is to facilitate easy-to-use full Bayesian inference of the joint posterior p(α, θ | y) for the models discussed in Section 2. The inference methods implemented in bssm are based on a factorised approach where the joint posterior of the hyperparameters θ and the latent states α is written as

p(α, θ | y) ∝ p(θ) p(α, y | θ) = p(θ) p(y | θ) p(α | y, θ),

where p(y | θ) is the parameter marginal likelihood and p(α | y, θ) is the smoothing distribution. All the inference algorithms are based on a Markov chain Monte Carlo on the parameters θ, whose single iteration may be summarised as follows:

Algorithm 1. One iteration of the MCMC algorithm for sampling p(θ | y).
1. Draw a proposal θ′ ∼ N(θ_{i−1}, Σ_{i−1}).
2. Calculate the (approximate) marginal likelihood p̂(y | θ′).
3. Accept the proposal with probability α := min{1, [p(θ′) p̂(y | θ′)] / [p(θ_{i−1}) p̂(y | θ_{i−1})]}.
4. If the proposal θ′ is accepted, set θ_i = θ′; otherwise, set θ_i = θ_{i−1}.
5. Adapt the proposal covariance matrix Σ_{i−1} → Σ_i.

The adaptation step 5 in bssm currently implements the robust adaptive Metropolis algorithm (Vihola, 2012) with a fixed target acceptance rate (0.234 by default), provided by the ramcmc package (Helske, 2016). The (approximate) marginal likelihood p̂(y | θ) takes different forms, leading to the different inference algorithms discussed below.

3.1. Direct inference

The simplest case is a linear-Gaussian SSM, where we can use the exact marginal likelihood p̂(y | θ) = p(y | θ), in which case Algorithm 1 reduces to an (adaptive) random-walk Metropolis algorithm targeting the posterior marginal of the parameters θ. Inference from the full posterior may be done using the simulation smoothing algorithm (Durbin and Koopman, 2002).

The other 'direct' option, which can be used with any model, is the bootstrap particle filter (BSF) (Gordon et al., 1993), which leads to a random p̂(y | θ) that is an unbiased estimator of p(y | θ). In this case, Algorithm 1 reduces to an (adaptive) particle marginal Metropolis-Hastings (Andrieu et al., 2010). Full posterior inference is achieved simultaneously, by picking particle trajectories based on their ancestries as in the filter-smoother algorithm (Kitagawa, 1996). Note that with the BSF, the desired acceptance rate needs to be lower, depending on the number of particles used (Doucet et al., 2015).

3.2. Approximate inference: Laplace approximation and the extended Kalman filter

The direct BSF discussed above may be used with any non-linear and/or non-Gaussian model, but may be slow and/or mix poorly.
To alleviate this, bssm provides efficient (intermediate) approximate inference in the case of the non-Gaussian observation models of Section 2.1, and in the case of the non-linear dynamics of Section 2.2.

With the non-Gaussian models of Section 2.1, we use an approximating Gaussian model p̃(y, α | θ), which is a Laplace approximation of p(α, y | θ) following Durbin and Koopman (2000). We write the likelihood as follows:

p(y | θ) = ∫ p(α, y | θ) dα = p̃(y | θ) E[ p(y | α, θ) / p̃(y | α, θ) ],

where p̃(y | θ) is the likelihood of the Laplace approximation and the expectation is taken with respect to its conditional p̃(α | y, θ) (Durbin and Koopman, 2012). Indeed, denoting by α̂ the mode of p̃(α | θ, y), we may write

log p(y | θ) = log p̃(y | θ) + log [ p(y | α̂, θ) / p̃(y | α̂, θ) ] + log E[ ( p(y | α, θ) / p(y | α̂, θ) ) / ( p̃(y | α, θ) / p̃(y | α̂, θ) ) ],

and if p̃ resembles p for typical values of α, the last term, the logarithm of the expectation, is close to zero. We take p̂(y | θ) to be the expression on the right, dropping the expectation term.

When p̂ is approximate, the MCMC algorithm targets an approximate posterior marginal. Approximate full inference may be done analogously to Section 3.1, by simulating trajectories conditional on the sampled parameter configurations θ_i. We believe that approximate inference is often good enough for model development, but strongly recommend using post-correction, as discussed in Section 3.3, to check the validity of the final inference.

In addition to these algorithms, bssm also supports p̂(y | θ) based on the extended Kalman filter (EKF) or the iterated EKF (IEKF) (Jazwinski, 1970), which can be used for models with non-linear dynamics (Section 2.2). Approximate smoothing based on the (iterated) EKF is also supported. It is also possible to perform direct inference as in Section 3.1, but instead of the BSF, employ a particle filter based on the EKF (Van Der Merwe et al., 2001).
3.3. Post-correction and delayed acceptance

The inference methods in Section 3.2 are efficient, but come with a bias. The bssm package implements importance-sampling type post-correction as discussed in Vihola et al. (2020). Indeed, having MCMC samples (θ_i) from the approximate posterior constructed as in Section 3.2, we may produce (random) weights and latent states, such that the weighted samples form estimators which are consistent with respect to the true posterior p(α, θ | y).

The primary approach which we recommend for post-correction is based on the 'ψ-APF', a particle filter using the Gaussian approximations of Section 3.2. In essence, this particle filter employs the dynamics and a look-ahead strategy coming from the approximation, which leads to low-variance estimators; see Vihola et al. (2020) and the package vignettes for a more detailed description. Naturally, the ψ-APF can also be used in place of the BSF in the direct inference of Section 3.1.

As an alternative to approximate MCMC with post-correction, bssm also supports an analogous delayed acceptance method (Banterle et al., 2019; Christen and Fox, 2005), here denoted by DA-MCMC. This algorithm is similar to Algorithm 1, but in case of 'acceptance', it leads to a second-stage acceptance using the same weights as the post-correction would; see Vihola et al. (2020) for details. Note that, as in the direct approach for non-Gaussian/non-linear models, the desired acceptance rate with DA-MCMC should be lower than the default 0.234.

The DA-MCMC also leads to consistent posterior estimators, and often outperforms the direct particle marginal Metropolis-Hastings. However, empirical findings (Vihola et al., 2020) and theoretical considerations (Franks and Vihola, 2020) suggest that approximate inference with post-correction may often be preferable. The bssm package supports parallelisation of the post-correction, which may further promote the latter.
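To make the importance-sampling weighting concrete, here is a self-contained base R sketch with purely synthetic numbers (it does not call bssm; in the package the log-weights come from the ψ-APF): draws from an approximate posterior are corrected by self-normalised weights.

```r
# Self-normalised importance sampling correction on synthetic draws.
# Approximate posterior: N(1, 1); "true" posterior: N(1.2, 1).
set.seed(2)
theta <- rnorm(1e5, mean = 1, sd = 1)        # draws from the approximation
logw <- dnorm(theta, 1.2, 1, log = TRUE) -   # synthetic log-weights:
  dnorm(theta, 1, 1, log = TRUE)             # log(true / approximate)
w <- exp(logw - max(logw))                   # stabilise before exponentiating
post_mean <- sum(w * theta) / sum(w)         # consistent estimate, close to 1.2
```

Subtracting the maximum log-weight before exponentiating avoids numerical overflow and leaves the self-normalised estimate unchanged.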
3.4. Inference for diffusion models

For general continuous-time diffusion models, the transition densities are intractable. The bssm package uses the Milstein time-discretisation scheme for approximate simulation, and inference is based on the corresponding BSF. A finer time-discretisation mesh gives less bias than a coarser one, at the cost of increased computational complexity. The DA and IS approaches can be used to speed up the inference by using a coarse discretisation in the first stage and a finer mesh in the second stage. For a comparison of the DA and IS approaches in the case of a geometric Brownian motion model, see Vihola et al. (2020) and https://cran.r-project.org/package=bssm/vignettes/psi_pf.html.

4. Using the bssm package

Main functions of bssm are written in C++, with the help of the Rcpp and RcppArmadillo packages. On the R side, the package uses S3 methods to provide a relatively unified workflow independent of the type of model one is working with. The model building functions such as bsm_ng and svm are used to construct the model objects, which can then be passed to other methods, such as logLik and run_mcmc, which compute the log-likelihood value and run the MCMC algorithm, respectively. We will now briefly describe the main functionality of bssm. For more detailed descriptions of the different functions, see the corresponding documentation in R and the package vignettes.

For models with linear-Gaussian state dynamics (Section 2.1), bssm includes some predefined models such as bsm_lg and bsm_ng for univariate Gaussian and non-Gaussian structural time series models (BSM) (Harvey, 1989) with external covariates, for which the user only needs to supply the data and priors for the unknown model parameters. In addition, bssm supports the general model building functions ssm_ulg and ssm_mlg for univariate and multivariate Gaussian models, and ssm_ung and ssm_mng for their non-Gaussian counterparts. For these models, users need to supply R functions for the evaluation of the prior density and for updating the model matrices given the current value of the parameter vector θ. It is also possible to avoid defining the matrices manually by leveraging the formula interface of the KFAS package (Helske, 2017) together with the as_bssm function, which converts a KFAS model into an equivalent bssm model object. This is especially useful in the case of complex multivariate models with covariates.

As an example, consider a Gaussian local linear trend model of the form

y_t = µ_t + ε_t,
µ_{t+1} = µ_t + ν_t + η_t,
ν_{t+1} = ν_t + ξ_t,

with zero-mean Gaussian noise terms ε_t, η_t, ξ_t with unknown standard deviations.
This model can be built with the bsm_lg function as

suppressPackageStartupMessages(library("bssm"))
data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)
bsm_model <- bsm_lg(y = nhtemp, sd_y = prior, sd_level = prior,
  sd_slope = prior)

Here we use the helper function halfnormal, which defines a half-Normal prior distribution for the standard deviation parameters, with the first argument defining the initial value of the parameter and the second the scale parameter of the half-Normal distribution. Other prior options are normal, tnormal (truncated normal), gamma, and uniform.

As an example of a multivariate model, consider a bivariate Poisson model with a latent random walk, defined as

y_{i,t} ∼ Poisson(exp(x_t)), i = 1, 2,
x_{t+1} = x_t + η_t,

with η_t ∼ N(0, σ²) and prior σ ∼ Gamma(2, 0.01). This model can be constructed with the ssm_mng function as

set.seed(1)
x <- cumsum(c(1, rnorm(49, sd = 0.2)))
y <- cbind(rpois(50, exp(x)), rpois(50, exp(x)))
prior_fn <- function(theta) {
  dgamma(theta, 2, 0.01, log = TRUE)
}
update_fn <- function(theta) {
  list(R = array(theta, c(1, 1, 1)))
}
mng_model <- ssm_mng(y = y, Z = matrix(1, 2, 1), T = 1, R = 0.1,
  P1 = 1, distribution = "poisson", init_theta = 0.1,
  prior_fn = prior_fn, update_fn = update_fn)
Here prior_fn defines the log-prior for the model, and update_fn defines how the model components depend on the hyperparameters θ.

For models where the state equation is no longer linear-Gaussian, we use a pointer-based interface, defining all model components, as well as the functions giving the Jacobians of Z(·) and T(·) needed by the extended Kalman filter, as C++ snippets. A general non-linear Gaussian model can be defined with the function ssm_nlg. Discretely observed diffusion models, where the state process is assumed to be a continuous stochastic process, can be constructed using the ssm_sde function, which takes pointers to C++ functions defining the drift, the diffusion, the derivative of the diffusion function, and the log-densities of the observations and the prior. As an example of the latter, let us consider an Ornstein-Uhlenbeck process

dα_t = ρ(ν − α_t) dt + σ dB_t,

with parameters θ = (ρ, ν, σ) = (0.5, 2, 1) and the initial condition α_0 = 1. For the observation density, we use a Poisson distribution with parameter exp(α_k). We first simulate a trajectory x_0, …, x_n using the sde.sim function from the sde package (Iacus, 2016) and use that for the simulation of the observations y:

set.seed(1)
suppressPackageStartupMessages(library("sde"))
x <- sde.sim(t0 = 0, T = 100, X0 = 1, N = 100,
  drift = expression(0.5 * (2 - x)),
  sigma = expression(1),
  sigma.x = expression(0))
y <- rpois(100, exp(x[-1]))

We then compile and build the model as
Rcpp::sourceCpp("ssm_sde_template.cpp")
pntrs <- create_xptrs()
sde_model <- sde_ssm(y, pntrs$drift, pntrs$diffusion,
  pntrs$ddiffusion, pntrs$obs_density, pntrs$prior,
  c(0.5, 2, 1), 1, FALSE)

The templates for SDE and non-linear Gaussian models can be found in the package vignettes on CRAN (https://CRAN.R-project.org/package=bssm).

For an LGSSM, bssm has the methods kfilter and smoother for performing Kalman filtering and smoothing. For non-Gaussian models with linear-Gaussian dynamics, approximate filtering and smoothing estimates can be obtained by calls to kfilter and smoother, in which case these functions first construct an approximating Gaussian model, for which the Kalman filter/smoother is then used. For non-linear models defined by ssm_nlg, we can run approximate filtering using the extended Kalman filter with the function ekf, the unscented Kalman filter with the function ukf, or the iterated EKF (IEKF) by changing the argument iekf_iter of the ekf function. The function ekf_smoother can be used for smoothing based on the EKF/IEKF.

For particle filtering, the bssm package supports the general bootstrap particle filter (function bootstrap_filter) for all model classes of bssm. For ssm_nlg models, extended Kalman particle filtering (Van Der Merwe et al., 2001) is also supported (function ekpf_filter). For particle smoothing, the function particle_smoother with smoothing based on the BSF is available for all models. In addition, the ψ-APF (using the argument method = "psi") is available for all models except the ssm_sde class. Currently, only the filter-smoother approach (Kitagawa, 1996) for particle smoothing is supported.

Figure 1: Posterior densities of the linear-Gaussian model for the nhtemp data.
The main purpose of bssm is to allow efficient MCMC-based inference for various state space models. For this task, the method run_mcmc can be used. The function takes a number of arguments, depending on the model class, but for many of these, default values are provided. For linear-Gaussian models, we only need to supply the number of iterations. Here we run 100,000 MCMC iterations for the local linear trend model of the nhtemp data defined above, where the first 10,000 iterations are discarded as burn-in (the burn-in phase is also used for the adaptation of the proposal distribution):

mcmc_bsm <- run_mcmc(bsm_model, iter = 1e5, burnin = 1e4)
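The returned chain is stored in a compressed jump chain form, discussed below. The idea is plain run-length encoding, which base R can illustrate without the package (expand_sample performs the analogous expansion for run_mcmc output):

```r
# Jump chain representation: store each accepted state once plus a stay count.
x <- c(1, 2, 2, 1, 1, 1)
jump <- rle(x)                  # values (1, 2, 1), lengths (1, 2, 3)
rep(jump$values, jump$lengths)  # expands back to the original chain
```

Storing only the accepted states and their stay counts can substantially reduce memory use when the acceptance rate is low.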
The print method for the output of the MCMC algorithms gives a summary of the results, and detailed summaries for θ and α can be obtained using the summary function. For all MCMC algorithms, bssm uses a so-called jump chain representation of the Markov chain X_1, …, X_n, where we only store each accepted X_k and the number of steps the chain stayed in the same state. For example, the chain (1, 2, 2, 1, 1, 1) is stored as the accepted states (1, 2, 1) together with the counts N = (1, 2, 3). The jump chain can be transformed back to a standard Markov chain using the function expand_sample, which returns an object of class mcmc of the coda package (Plummer et al., 2006) (thus the plotting and diagnostic methods of coda can also be used). We can also directly transform the posterior samples to a data.frame by using the as.data.frame method for the MCMC output (for IS-weighting, the returned data frame contains an additional column weights). This is useful, for example, for visualisation purposes with the ggplot2 package:

suppressPackageStartupMessages(library("ggplot2"))
d <- as.data.frame(mcmc_bsm, variable = "theta")
ggplot(d, aes(x = value)) +
  geom_density(bw = 0.1, fill = "grey") +
  facet_wrap(~ variable, scales = "free") +
  theme_bw()

suppressPackageStartupMessages(library("dplyr"))
options(dplyr.summarise.inform = FALSE)
d <- as.data.frame(mcmc_bsm, variable = "states")

Figure 2: Observed annual average temperatures in New Haven (black dots) and predicted mean (solid line) with 95% prediction intervals (grey ribbon) from bssm.
summary_y <- d %>%
  filter(variable == "level") %>%
  group_by(time) %>%
  summarise(mean = mean(value),
    lwr = quantile(value, 0.025),
    upr = quantile(value, 0.975))

ggplot(summary_y, aes(x = time, y = mean)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.25) +
  geom_line() +
  geom_point(data = data.frame(mean = nhtemp,
    time = time(nhtemp))) +
  theme_bw() + xlab("Year") +
  ylab("Mean annual temperature in New Haven")

For non-Gaussian models, the default MCMC algorithm is approximate inference combined with importance sampling post-correction (Section 3.3). It is also possible to first perform approximate MCMC using the argument mcmc_type = "approx", and then perform the post-correction step using the results from the approximate MCMC. In doing so, we can also use the function suggest_N to find a suitable number of particles N for the ψ-APF in the spirit of Doucet et al. (2015):

out_approx <- run_mcmc(model, mcmc_type = "approx", iter = 50000)
est_N <- suggest_N(out_approx, model)
out_exact <- post_correct(model, out_approx, particles = est_N$N)

The function suggest_N computes the standard deviation of the logarithm of the post-correction weights (i.e. the random part of the log-likelihood of the ψ-APF) at the approximate MAP estimate of θ using a range of values of N, and returns a list with component N, which is the smallest number of particles for which the standard deviation was less than one. For small and moderate problems, 10-20 particles are typically enough.
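The selection rule behind suggest_N can be mimicked on synthetic data. In the sketch below (our own illustration, not package code) we simply assume that the log-weight standard deviation decays like 1/sqrt(N), as is typical for particle filter likelihood estimates:

```r
# Pick the smallest N whose log-weight standard deviation is below one.
set.seed(3)
candidate_N <- c(5, 10, 20, 40, 80)
log_weights <- lapply(candidate_N,
  function(N) rnorm(200, sd = 3 / sqrt(N)))  # synthetic stand-in for psi-APF output
sds <- vapply(log_weights, sd, numeric(1))
smallest <- candidate_N[which(sds < 1)[1]]   # first N satisfying the criterion
```

In the package the log-weights are of course computed by running the ψ-APF itself rather than being drawn from a normal distribution.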
5. Comparison of IS-MCMC and HMC
Vihola et al. (2020) compared the computational efficiency of delayed acceptance MCMC and importance sampling type MCMC approaches in various settings. Here we make a small experiment comparing generic Hamiltonian Monte Carlo using the NUTS sampler (Hoffman and Gelman, 2014) with rstan (Stan Development Team, 2019), and IS-MCMC with bssm. For the complete code of the experiment, see Appendix A.

We consider the case of a random walk with drift model with negative binomial observations and a known covariate x_t, defined as

y_t ∼ NB(exp(β x_t + µ_t), φ),
µ_{t+1} = µ_t + ν_t + η_t,
ν_{t+1} = ν_t,

with a zero-mean Gaussian noise term η_t with unknown standard deviation σ_µ. Based on this, we simulate one realization of y and x with n = 200, φ = 5, β = −0.9, ν_1 = 0.01, and σ_µ = 0.1. We used the bsm_ng function for model building, with prior variances 100 and 0.01 for the initial states µ_1 and ν_1. For the hyperparameters, we used fairly uninformative half-Normal distributions with standard deviation 0.5 for σ_µ and 0.1 for σ_ν. We then ran the IS-MCMC algorithm with run_mcmc using a burn-in phase of length 10,000 and 50,000 iterations after the burn-in, with 10 particles per SMC.

Using the same set-up, we ran the MCMC with rstan using 15,000 iterations (with the first 5,000 used for warm-up). Note that in order to avoid sampling problems, it was necessary to tweak the default control parameters of the sampler (see Appendix A).

Table 1 shows the results. We see that both methods produce identical results (within the Monte Carlo error), but while rstan produces similar Monte Carlo standard errors with a smaller number of total iterations than bssm, the total computation time of rstan is 70 times higher than with bssm (58 minutes versus 50 seconds).

Table 1: Estimates of the posterior mean, standard deviation (SD) and Monte Carlo standard error of the mean (MCSE) for the hyperparameters θ and the latent states at the last time point for the example model.

            bssm                          rstan
        Mean      SD      MCSE        Mean      SD      MCSE
σ_µ     0.093     0.036   8 × 10^−…   0.090     0.036   9 × 10^−…
σ_ν     0.003     0.003   5 × 10^−…   0.003     0.003   7 × 10^−…
φ       …409      0.925   2 × 10^−…   …386      0.898   1 × 10^−…
β       −0.912    0.057   1 × 10^−…   −0.911    0.056   7 × 10^−…
µ       …963      0.350   5 × 10^−…   …965      0.349   4 × 10^−…
ν       0.006     0.020   3 × 10^−…   0.006     0.019   2 × 10^−…
6. Conclusions
State space models are a flexible tool for analysing a variety of time series data. Here we introduced the R package bssm for fully Bayesian state space modelling for a large class of models, with several alternative MCMC sampling strategies. All computationally intensive parts of the package are implemented in C++, with parallel computation support for IS-MCMC, making it an attractive option for many common models where relatively accurate Gaussian approximations are available.

Compared to early versions of the bssm package, the option to define R functions for model updating and prior evaluation has lowered the bar for analysing custom models. The package is also written in a way that makes it relatively easy to extend with new model types similar to the current bsm_lg in the future. The bssm package could be expanded to allow other proposal adaptation schemes, such as the adaptive Metropolis algorithm of Haario et al. (2001), as well as support for multivariate SDE models and automatic differentiation for EKF-type algorithms.
7. Acknowledgements
This work has been supported by the Academy of Finland research grants 284513, 312605, 311877, and 331817.
8. References
Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society B 72, 269–342.
Banterle, M., Grazian, C., Lee, A., Robert, C.P., 2019. Accelerating Metropolis-Hastings algorithms by delayed acceptance. Foundations of Data Science 1, 103. doi:10.3934/fods.2019005
Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A., 2017. Stan: A probabilistic programming language. Journal of Statistical Software 76, 1–32. doi:10.18637/jss.v076.i01
Christen, J.A., Fox, C., 2005. Markov chain Monte Carlo using an approximation. Journal of Computational and Graphical Statistics 14, 795–810. doi:10.1198/106186005X76983
Doucet, A., Pitt, M.K., Deligiannidis, G., Kohn, R., 2015. Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102, 295–313. doi:10.1093/biomet/asu075
Durbin, J., Koopman, S.J., 2000. Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives. Journal of the Royal Statistical Society B 62, 3–56.
Durbin, J., Koopman, S.J., 2002. A simple and efficient simulation smoother for state space time series analysis. Biometrika 89, 603–615.
Durbin, J., Koopman, S.J., 2012. Time Series Analysis by State Space Methods, 2nd ed. Oxford University Press, New York.
Franks, J., Vihola, M., 2020. Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance. Stochastic Processes and their Applications 130, 6157–6183.
Funk, S., King, A.A., 2020. Choices and trade-offs in inference with infectious disease models. Epidemics 30, 100383. doi:10.1016/j.epidem.2019.100383
Gordon, N.J., Salmond, D.J., Smith, A.F.M., 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F 140, 107–113.
Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive Metropolis algorithm. Bernoulli 7, 223–242.
Harvey, A.C., 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Helske, J., 2016. ramcmc: Robust adaptive Metropolis algorithm.
Helske, J., 2017. KFAS: Exponential family state space models in R. Journal of Statistical Software 78, 1–39. doi:10.18637/jss.v078.i10
Helske, S., Helske, J., 2019. Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software 88, 1–32. doi:10.18637/jss.v088.i03
Hoffman, M.D., Gelman, A., 2014. The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. The Journal of Machine Learning Research 15, 1593–1623.
Iacus, S.M., 2016. sde: Simulation and inference for stochastic differential equations.
Jacob, P.E., Funk, S., 2020. rbi: Interface to LibBi.
Jazwinski, A., 1970. Stochastic Processes and Filtering Theory. Academic Press.
King, A.A., Nguyen, D., Ionides, E.L., 2016. Statistical inference for partially observed Markov processes via the R package pomp. Journal of Statistical Software 69, 1–43. doi:10.18637/jss.v069.i12
Kitagawa, G., 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics 5, 1–25.
Lindgren, F., Rue, H., 2015. Bayesian spatial modelling with R-INLA. Journal of Statistical Software 63, 1–25. doi:10.18637/jss.v063.i19
Murray, L., 2015. Bayesian state-space modelling on high-performance hardware using LibBi. Journal of Statistical Software 67, 1–36. doi:10.18637/jss.v067.i10
Petris, G., Petrone, S., 2011. State space models in R. Journal of Statistical Software 41, 1–25. doi:10.18637/jss.v041.i04
Plummer, M., Best, N., Cowles, K., Vines, K., 2006. CODA: Convergence diagnosis and output analysis for MCMC. R News 6, 7–11.
R Core Team, 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Stan Development Team, 2019. RStan: The R interface to Stan.
Tusell, F., 2011. Kalman filtering in R. Journal of Statistical Software 39, 1–27. doi:10.18637/jss.v039.i02
Van Der Merwe, R., Doucet, A., De Freitas, N., Wan, E.A., 2001. The unscented particle filter, in: Advances in Neural Information Processing Systems, pp. 584–590.
Vihola, M., 2012. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing 22, 997–1008. doi:10.1007/s11222-011-9269-5
Vihola, M., Helske, J., Franks, J., 2020. Importance sampling type estimators based on approximate marginal MCMC. Scandinavian Journal of Statistics. doi:10.1111/sjos.12492

Appendix A. Code for Section 5

library("bssm")
set.seed(123)
n <- 200
sd_level <- 0.1
drift <- 0.01
beta <- -0.9
phi <- 5
level <- cumsum(c(5, drift + rnorm(n - 1, sd = sd_level)))
x <- 3 + (1:n) * drift + sin(1:n + runif(n, -1, 1))
y <- rnbinom(n, size = phi, mu = exp(beta * x + level))

bssm_model <- bsm_ng(y,
  xreg = x,
  beta = normal(0, 0, 10),
  phi = halfnormal(1, 10),
  sd_level = halfnormal(0.1, 1),
  sd_slope = halfnormal(0.01, 0.1),
  a1 = c(0, 0), P1 = diag(c(10, 0.1)^2))

fit_bssm <- run_mcmc(bssm_model, iter = 60000, burnin = 10000,
  particles = 10, seed = 1)

library("rstan")
stan_model <- "data {int