bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R
Jouni Helske∗,a, Matti Vihola a
a Department of Mathematics and Statistics, University of Jyväskylä, Finland
Abstract
We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference for the latent states based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretised diffusion latent state processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using the approximations. The package directly supports models with linear-Gaussian state dynamics (but with non-Gaussian observation models), and has an Rcpp interface for specifying custom non-linear models.
Key words:
R package; state space models; time series; Bayesian inference; Markov chain Monte Carlo; sequential Monte Carlo
1. Introduction
State space models (SSM) are a flexible class of latent variable models commonly used in analysing time series data (cf. Durbin and Koopman, 2012). There are a number of packages available for state space modelling for R (R Core Team, 2020), especially for two special cases: a linear-Gaussian SSM (LGSSM), where both the observation and state densities are Gaussian with linear relationships with the states, and an SSM with a discrete state space, which is sometimes called a hidden Markov model (HMM). These classes admit analytically tractable marginal likelihood functions and conditional state distributions (conditioned on the observations), making inference relatively straightforward. See for example Helske (2017), Helske and Helske (2019), Petris and Petrone (2011), and Tusell (2011) for reviews of some of the R packages dealing with these types of models. The present R package bssm is designed for Bayesian inference of general state space models with non-Gaussian and/or non-linear observational and state equations. The package aims to provide easy-to-use and efficient functions for fully Bayesian inference with common time series models such as the basic structural time series model (BSM) (Harvey, 1989) with exogenous covariates, simple stochastic volatility models, and discretised diffusion models, making it straightforward and efficient to make predictions and perform other inference in a Bayesian setting.

When extending state space modelling to non-linear or non-Gaussian models, some difficulties arise. As the likelihood is no longer analytically tractable, computing the latent state distributions, as well as hyperparameter estimation of the model, becomes more challenging. One general option is to use Markov chain Monte Carlo (MCMC) methods targeting the full joint posterior of hyperparameters and the latent states, for example by Gibbs sampling or Hamiltonian Monte Carlo.
Unfortunately, the joint posterior is typically very high-dimensional and, due to the strong autocorrelation structures of the state densities, the efficiency of such methods can be relatively poor. Another asymptotically exact approach is based on the pseudo-marginal particle MCMC approach (Andrieu et al., 2010), where the likelihood function and the state distributions are estimated using sequential Monte Carlo (SMC), i.e. the particle filter (PF) algorithm. Instead of computationally demanding Monte Carlo methods, approximation-based methods such as the extended and unscented Kalman filters may be used, as well as Laplace approximations, which are provided for example by the INLA (Lindgren and Rue, 2015) R package. The latter are computationally appealing, but may lead to hard-to-quantify biases.

Some of the R packages suitable for Bayesian state space modelling include pomp (King et al., 2016), rbi (Jacob and Funk, 2020), and rstan (Stan Development Team, 2019). With the package pomp, the user defines the model using R or C snippets for simulation from, and evaluation of, the latent state and observation level densities, allowing flexible model construction. The rbi package is an interface to LibBi (Murray, 2015), a standalone software with a focus on Bayesian state space modelling on high-performance computers. The rstan package provides an R interface to the Stan (Carpenter et al., 2017) C++ package, a general statistical modelling platform. The pomp package provides several simulation-based inference methods mainly based on iterated filtering and maximum likelihood, whereas rbi is typically used for Bayesian inference via particle MCMC. For a more detailed comparison of rbi/LibBi and pomp with examples, see Funk and King (2020).

The key difference to the aforementioned packages, and the motivation behind the present bssm package, is to combine the use of approximation-based methods with Monte Carlo methods, leading to efficient and unbiased inference, as suggested in Vihola et al. (2020). In a nutshell, the method uses MCMC which targets an approximate marginal posterior of the hyperparameters, and an importance sampling type weighting which provides asymptotically exact inference on the joint posterior of hyperparameters and the latent states. In addition to this two-stage procedure, bssm also supports delayed acceptance pseudo-marginal MCMC (Christen and Fox, 2005) using the approximations, and direct pseudo-marginal MCMC. To our knowledge, importance sampling and delayed acceptance in this form are not available in other Bayesian state space modelling packages in R.

∗ Corresponding author.
Email addresses: [email protected] (Jouni Helske), [email protected] (Matti Vihola)
Preprint submitted to Computational Statistics and Data Analysis, January 22, 2021
2. Supported models
We denote the sequence of observations (y_1, …, y_T) by y, and the sequence of latent state variables (α_1, …, α_T) by α. The latent states α_t ∈ R^d are typically vector-valued, whereas we focus mainly on scalar observations y_t ∈ R (vector-valued observations are also supported, assuming conditional independence, given α_t, in the case of non-Gaussian observations).

A general state space model consists of two parts: observation level densities g_t^(θ)(y_t | α_t) and latent state transition densities µ_t^(θ)(α_{t+1} | α_t). Typically both g_t^(θ) and µ_t^(θ) depend on an unknown parameter vector θ, for which we can define an arbitrary prior p(θ).

In a linear-Gaussian SSM, both g_t^(θ) and µ_t^(θ) are Gaussian densities and they depend linearly on the current and previous state vectors, respectively. Section 2.1 describes a common extension to these models supported by bssm, which relaxes the assumptions on the observational density g_t^(θ) by allowing exponential family links and stochastic volatility models. While the main focus of bssm is on state space models with linear-Gaussian state dynamics, there is also support for more general non-linear models, discussed briefly in Section 2.2.

2.1. Models with linear-Gaussian state dynamics

The primary class of models supported by bssm consists of SSMs with linear-Gaussian state dynamics of the form

α_{t+1} = c_t + T_t α_t + R_t η_t,

where c_t ∈ R^d, T_t ∈ R^{d×d}, and R_t ∈ R^{d×k} can depend on the unknown parameters θ and covariates. The noise terms η_t ∼ N(0, I_k) and α_1 ∼ N(a_1, P_1) are independent. These state dynamics can be combined with an observational level density g_t of the form

g_t(y_t | d_t + Z_t α_t, φ, u_t),

where the parameters φ and the known vector u_t are distribution specific and can be omitted in some cases.
Currently, the following observational level distributions are supported:

• Gaussian distribution: y_t = d_t + Z_t α_t + H_t ε_t with ε_t ∼ N(0, I).
• Poisson distribution: g_t(y_t | d_t + Z_t α_t, u_t) = Poisson(u_t exp(d_t + Z_t α_t)), where u_t is the known exposure at time t.
• Binomial distribution: g_t(y_t | d_t + Z_t α_t, u_t) = B(u_t, exp(d_t + Z_t α_t) / (1 + exp(d_t + Z_t α_t))), where u_t is the size and exp(d_t + Z_t α_t) / (1 + exp(d_t + Z_t α_t)) is the probability of success.
• Negative binomial distribution: g_t(y_t | d_t + Z_t α_t, φ, u_t) = NB(exp(d_t + Z_t α_t), φ, u_t), where u_t exp(d_t + Z_t α_t) is the expected value, φ is the dispersion parameter, and u_t is a known offset term.
• Gamma distribution: g_t(y_t | d_t + Z_t α_t, φ, u_t) = Gamma(exp(d_t + Z_t α_t), φ, u_t), where u_t exp(d_t + Z_t α_t) is the expected value, φ is the shape parameter, and u_t is a known offset term.
• Stochastic volatility model: y_t = exp(α_t / 2) ε_t with ε_t ∼ N(0, 1), and α_{t+1} = µ + ρ(α_t − µ) + σ_η η_t with η_t ∼ N(0, 1) and α_1 ∼ N(µ, σ_η² / (1 − ρ²)).

For multivariate models, these distributions can be combined arbitrarily, except for the stochastic volatility model, which is currently handled separately. Also, for a fully Gaussian model, the observational level errors ε_t can be correlated across time series.

2.2. Other state space models

The general non-linear Gaussian model in bssm has the form

y_t = Z(t, α_t, θ) + H(t, α_t, θ) ε_t,
α_{t+1} = T(t, α_t, θ) + R(t, α_t, θ) η_t,
α_1 ∼ N(a_1(θ), P_1(θ)),

with t = 1, …, n, ε_t ∼ N(0, I_p), and η_t ∼ N(0, I_k).

The bssm package also supports models where the state equation is defined as a continuous-time diffusion model of the form

dα_t = µ(t, α_t, θ) dt + σ(t, α_t, θ) dB_t,  t ≥ 0,

where B_t is a (vector-valued) Brownian motion and µ and σ are vector- and matrix-valued functions, with the univariate observation density p(y_k | α_k) defined at integer times k = 1, …, n.
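As a concrete illustration of the models in Section 2.1, the following base R sketch (our own illustration, not package code) simulates the simplest member of this class, a scalar local level model with a Poisson observation density:

```r
# Base R sketch: local level model, i.e. c_t = 0, T_t = 1, R_t = sigma_eta,
# combined with the observation density g_t(y_t | alpha_t) = Poisson(exp(alpha_t)).
set.seed(42)
n <- 100
sigma_eta <- 0.1
alpha <- numeric(n)
alpha[1] <- rnorm(1, mean = 0, sd = 1)  # alpha_1 ~ N(a_1, P_1)
for (t in seq_len(n - 1)) {
  # alpha_{t+1} = T_t alpha_t + R_t eta_t with eta_t ~ N(0, 1)
  alpha[t + 1] <- alpha[t] + sigma_eta * rnorm(1)
}
y <- rpois(n, lambda = exp(alpha))      # observed counts
```

Data of this kind can be modelled in bssm with its non-Gaussian model building functions discussed in Section 4.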
3. Inference methods
The main goal of bssm is to facilitate easy-to-use full Bayesian inference of the joint posterior p(α, θ | y) for the models discussed in Section 2. The inference methods implemented in bssm are based on a factorised approach where the joint posterior of the hyperparameters θ and the latent states α is written as

p(α, θ | y) ∝ p(θ) p(α, y | θ) = p(θ) p(y | θ) p(α | y, θ),

where p(y | θ) is the parameter marginal likelihood and p(α | y, θ) is the smoothing distribution. All the inference algorithms are based on a Markov chain Monte Carlo on the parameters θ, whose single iteration may be summarised as follows:

Algorithm 1. One iteration of the MCMC algorithm for sampling p(θ | y).
1. Draw a proposal θ′ ∼ N(θ_{i−1}, Σ_{i−1}).
2. Calculate the (approximate) marginal likelihood p̂(y | θ′).
3. Accept the proposal with probability α := min{1, [p(θ′) p̂(y | θ′)] / [p(θ_{i−1}) p̂(y | θ_{i−1})]}.
4. If the proposal θ′ is accepted, set θ_i = θ′; otherwise, set θ_i = θ_{i−1}.
5. Adapt the proposal covariance matrix Σ_{i−1} → Σ_i.

The adaptation step 5 in bssm currently implements the robust adaptive Metropolis algorithm (Vihola, 2012) with a fixed target acceptance rate (0.234 by default), provided by the ramcmc package (Helske, 2016). The (approximate) marginal likelihood p̂(y | θ) takes different forms, leading to the different inference algorithms discussed below.

3.1. Direct inference

The simplest case is a linear-Gaussian SSM, where we can use the exact marginal likelihood p̂(y | θ) = p(y | θ), in which case Algorithm 1 reduces to an (adaptive) random-walk Metropolis algorithm targeting the posterior marginal of the parameters θ. Inference from the full posterior may be done using the simulation smoothing algorithm (Durbin and Koopman, 2002).

The other 'direct' option, which can be used with any model, is the bootstrap particle filter (BSF) (Gordon et al., 1993), which leads to a random p̂(y | θ) that is an unbiased estimator of p(y | θ). In this case, Algorithm 1 reduces to an (adaptive) particle marginal Metropolis-Hastings (Andrieu et al., 2010). Full posterior inference is achieved simultaneously, by picking particle trajectories based on their ancestries as in the filter-smoother algorithm (Kitagawa, 1996). Note that with the BSF, the desired acceptance rate needs to be lower, depending on the number of particles used (Doucet et al., 2015).

3.2. Approximate inference: Laplace approximation and the extended Kalman filter

The direct BSF discussed above may be used with any non-linear and/or non-Gaussian model, but may be slow and/or mix poorly.
To alleviate this, bssm provides efficient (intermediate) approximate inference in the case of the non-Gaussian observation models of Section 2.1, and in the case of the non-linear dynamics of Section 2.2.

With the non-Gaussian models of Section 2.1, we use an approximating Gaussian model p̃(y, α | θ), which is a Laplace approximation of p(α, y | θ) following Durbin and Koopman (2000). We write the likelihood as follows:

p(y | θ) = ∫ p(α, y | θ) dα = p̃(y | θ) E[ p(y | α, θ) / p̃(y | α, θ) ],

where p̃(y | θ) is the likelihood of the Laplace approximation and the expectation is taken with respect to its conditional p̃(α | y, θ) (Durbin and Koopman, 2012). Indeed, denoting by α̂ the mode of p̃(α | θ, y), we may write

log p(y | θ) = log p̃(y | θ) + log [ p(y | α̂, θ) / p̃(y | α̂, θ) ] + log E[ ( p(y | α, θ) / p(y | α̂, θ) ) / ( p̃(y | α, θ) / p̃(y | α̂, θ) ) ],

and if p̃ resembles p for typical values of α, the last term, the logarithm of the expectation, is close to zero. We take p̂(y | θ) to be the expression on the right, dropping the expectation term.

When p̂ is approximate, the MCMC algorithm targets an approximate posterior marginal. Approximate full inference may be done analogously to Section 3.1, by simulating trajectories conditional on the sampled parameter configurations θ_i. We believe that approximate inference is often good enough for model development, but strongly recommend using post-correction, as discussed in Section 3.3, to check the validity of the final inference.

In addition to these algorithms, bssm also supports p̂(y | θ) based on the extended Kalman filter (EKF) or the iterated EKF (IEKF) (Jazwinski, 1970), which can be used for models with non-linear dynamics (Section 2.2). Approximate smoothing based on the (iterated) EKF is also supported. It is also possible to perform direct inference as in Section 3.1, but instead of the BSF, employ a particle filter based on the EKF (Van Der Merwe et al., 2001).
3.3. Post-correction and delayed acceptance

The inference methods in Section 3.2 are efficient, but come with a bias. The bssm package implements importance-sampling type post-correction as discussed in Vihola et al. (2020). Indeed, having MCMC samples (θ_i) from the approximate posterior constructed as in Section 3.2, we may produce (random) weights and latent states, such that the weighted samples form estimators which are consistent with respect to the true posterior p(α, θ | y).

The primary approach which we recommend for post-correction is based on the 'ψ-APF', a particle filter using the Gaussian approximations of Section 3.2. In essence, this particle filter employs the dynamics and a look-ahead strategy coming from the approximation, which leads to low-variance estimators; see Vihola et al. (2020) and the package vignettes for a more detailed description. Naturally, the ψ-APF can also be used in place of the BSF in the direct inference of Section 3.1.

As an alternative to approximate MCMC with post-correction, bssm also supports an analogous delayed acceptance method (Banterle et al., 2019; Christen and Fox, 2005), here denoted by DA-MCMC. This algorithm is similar to Algorithm 1, but in case of 'acceptance', it leads to a second-stage acceptance using the same weights as the post-correction would; see Vihola et al. (2020) for details. Note that, as in the direct approach for non-Gaussian/non-linear models, the desired acceptance rate with DA-MCMC should be lower than the default 0.234.

The DA-MCMC also leads to consistent posterior estimators, and often outperforms the direct particle marginal Metropolis-Hastings. However, empirical findings (Vihola et al., 2020) and theoretical considerations (Franks and Vihola, 2020) suggest that approximate inference with post-correction may often be preferable. The bssm package supports parallelisation of the post-correction, which may further promote the latter.
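To make the importance-sampling weighting concrete, here is a self-contained base R sketch with purely synthetic numbers (it does not call bssm; in the package the log-weights come from the ψ-APF): draws from an approximate posterior are corrected by self-normalised weights.

```r
# Self-normalised importance sampling correction on synthetic draws.
# Approximate posterior: N(1, 1); "true" posterior: N(1.2, 1).
set.seed(2)
theta <- rnorm(1e5, mean = 1, sd = 1)        # draws from the approximation
logw <- dnorm(theta, 1.2, 1, log = TRUE) -   # synthetic log-weights:
  dnorm(theta, 1, 1, log = TRUE)             # log(true / approximate)
w <- exp(logw - max(logw))                   # stabilise before exponentiating
post_mean <- sum(w * theta) / sum(w)         # consistent estimate, close to 1.2
```

Subtracting the maximum log-weight before exponentiating avoids numerical overflow and leaves the self-normalised estimate unchanged.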
3.4. Inference for diffusion models

For general continuous-time diffusion models, the transition densities are intractable. The bssm package uses the Milstein time-discretisation scheme for approximate simulation, and inference is based on the corresponding BSF. A finer time-discretisation mesh gives less bias than a coarser one, at the cost of increased computational complexity. The DA and IS approaches can be used to speed up the inference by using a coarse discretisation in the first stage and a finer mesh in the second stage. For a comparison of the DA and IS approaches in the case of a geometric Brownian motion model, see Vihola et al. (2020) and https://cran.r-project.org/package=bssm/vignettes/psi_pf.html.

4. Using the bssm package

Main functions of bssm are written in C++, with the help of the Rcpp and RcppArmadillo packages. On the R side, the package uses S3 methods to provide a relatively unified workflow independent of the type of model one is working with. The model building functions such as bsm_ng and svm are used to construct the model objects, which can then be passed to other methods, such as logLik and run_mcmc, which compute the log-likelihood value and run the MCMC algorithm, respectively. We will now briefly describe the main functionality of bssm. For more detailed descriptions of the different functions, see the corresponding documentation in R and the package vignettes.

For models with linear-Gaussian state dynamics (Section 2.1), bssm includes some predefined models such as bsm_lg and bsm_ng for univariate Gaussian and non-Gaussian structural time series models (BSM) (Harvey, 1989) with external covariates, for which the user only needs to supply the data and priors for the unknown model parameters. In addition, bssm supports the general model building functions ssm_ulg and ssm_mlg for univariate and multivariate Gaussian models, and ssm_ung and ssm_mng for their non-Gaussian counterparts. For these models, users need to supply R functions for the evaluation of the prior density and for updating the model matrices given the current value of the parameter vector θ. It is also possible to avoid defining the matrices manually by leveraging the formula interface of the KFAS package (Helske, 2017) together with the as_bssm function, which converts a KFAS model into an equivalent bssm model object. This is especially useful in the case of complex multivariate models with covariates.

As an example, consider a Gaussian local linear trend model of the form

y_t = µ_t + ε_t,
µ_{t+1} = µ_t + ν_t + η_t,
ν_{t+1} = ν_t + ξ_t,

with zero-mean Gaussian noise terms ε_t, η_t, ξ_t with unknown standard deviations.
This model can be built with the bsm_lg function as

suppressPackageStartupMessages(library("bssm"))
data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)
bsm_model <- bsm_lg(y = nhtemp, sd_y = prior, sd_level = prior,
  sd_slope = prior)

Here we use the helper function halfnormal, which defines a half-Normal prior distribution for the standard deviation parameters, with the first argument defining the initial value of the parameter and the second the scale parameter of the half-Normal distribution. Other prior options are normal, tnormal (truncated normal), gamma, and uniform.

As an example of a multivariate model, consider a bivariate Poisson model with a latent random walk, defined as

y_{i,t} ∼ Poisson(exp(x_t)), i = 1, 2,
x_{t+1} = x_t + η_t,

with η_t ∼ N(0, σ²) and prior σ ∼ Gamma(2, 0.01). This model can be constructed with the ssm_mng function as

set.seed(1)
x <- cumsum(c(1, rnorm(49, sd = 0.2)))
y <- cbind(rpois(50, exp(x)), rpois(50, exp(x)))
prior_fn <- function(theta) {
  dgamma(theta, 2, 0.01, log = TRUE)
}
update_fn <- function(theta) {
  list(R = array(theta, c(1, 1, 1)))
}
mng_model <- ssm_mng(y = y, Z = matrix(1, 2, 1), T = 1, R = 0.1,
  P1 = 1, distribution = "poisson", init_theta = 0.1,
  prior_fn = prior_fn, update_fn = update_fn)
Here prior_fn defines the log-prior for the model, and update_fn defines how the model components depend on the hyperparameters θ.

For models where the state equation is no longer linear-Gaussian, we use a pointer-based interface, defining all model components, as well as the functions giving the Jacobians of Z(·) and T(·) needed by the extended Kalman filter, as C++ snippets. A general non-linear Gaussian model can be defined with the function ssm_nlg. Discretely observed diffusion models, where the state process is assumed to be a continuous stochastic process, can be constructed using the ssm_sde function, which takes pointers to C++ functions defining the drift, the diffusion, the derivative of the diffusion function, and the log-densities of the observations and the prior. As an example of the latter, let us consider an Ornstein-Uhlenbeck process

dα_t = ρ(ν − α_t) dt + σ dB_t,

with parameters θ = (ρ, ν, σ) = (0.5, 2, 1) and the initial condition α_0 = 1. For the observation density, we use a Poisson distribution with parameter exp(α_k). We first simulate a trajectory x_0, …, x_n using the sde.sim function from the sde package (Iacus, 2016) and use that for the simulation of the observations y:

set.seed(1)
suppressPackageStartupMessages(library("sde"))
x <- sde.sim(t0 = 0, T = 100, X0 = 1, N = 100,
  drift = expression(0.5 * (2 - x)),
  sigma = expression(1),
  sigma.x = expression(0))
y <- rpois(100, exp(x[-1]))

We then compile and build the model as
Rcpp::sourceCpp("ssm_sde_template.cpp")
pntrs <- create_xptrs()
sde_model <- sde_ssm(y, pntrs$drift, pntrs$diffusion,
  pntrs$ddiffusion, pntrs$obs_density, pntrs$prior,
  c(0.5, 2, 1), 1, FALSE)

The templates for SDE and non-linear Gaussian models can be found in the package vignettes on CRAN (https://CRAN.R-project.org/package=bssm).

For an LGSSM, bssm has the methods kfilter and smoother for performing Kalman filtering and smoothing. For non-Gaussian models with linear-Gaussian dynamics, approximate filtering and smoothing estimates can be obtained by calls to kfilter and smoother, in which case these functions first construct an approximating Gaussian model, for which the Kalman filter/smoother is then used. For non-linear models defined by ssm_nlg, we can run approximate filtering using the extended Kalman filter with the function ekf, the unscented Kalman filter with the function ukf, or the iterated EKF (IEKF) by changing the argument iekf_iter of the ekf function. The function ekf_smoother can be used for smoothing based on the EKF/IEKF.

For particle filtering, the bssm package supports the general bootstrap particle filter (function bootstrap_filter) for all model classes of bssm. For ssm_nlg models, extended Kalman particle filtering (Van Der Merwe et al., 2001) is also supported (function ekpf_filter). For particle smoothing, the function particle_smoother with smoothing based on the BSF is available for all models. In addition, the ψ-APF (using the argument method = "psi") is available for all models except the ssm_sde class. Currently, only the filter-smoother approach (Kitagawa, 1996) for particle smoothing is supported.

Figure 1: Posterior densities of the linear-Gaussian model for the nhtemp data.
The main purpose of bssm is to allow efficient MCMC-based inference for various state space models. For this task, the method run_mcmc can be used. The function takes a number of arguments, depending on the model class, but for many of these, default values are provided. For linear-Gaussian models, we only need to supply the number of iterations. Here we run 100,000 MCMC iterations for the local linear trend model of the nhtemp data defined above, where the first 10,000 iterations are discarded as burn-in (the burn-in phase is also used for the adaptation of the proposal distribution):

mcmc_bsm <- run_mcmc(bsm_model, iter = 1e5, burnin = 1e4)
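The returned chain is stored in a compressed jump chain form, discussed below. The idea is plain run-length encoding, which base R can illustrate without the package (expand_sample performs the analogous expansion for run_mcmc output):

```r
# Jump chain representation: store each accepted state once plus a stay count.
x <- c(1, 2, 2, 1, 1, 1)
jump <- rle(x)                  # values (1, 2, 1), lengths (1, 2, 3)
rep(jump$values, jump$lengths)  # expands back to the original chain
```

Storing only the accepted states and their stay counts can substantially reduce memory use when the acceptance rate is low.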
The print method for the output of the MCMC algorithms gives a summary of the results, and detailed summaries for θ and α can be obtained using the summary function. For all MCMC algorithms, bssm uses a so-called jump chain representation of the Markov chain X_1, …, X_n, where we only store each accepted X_k and the number of steps the chain stayed in the same state. For example, the chain (1, 2, 2, 1, 1, 1) is stored as the accepted states (1, 2, 1) together with the counts N = (1, 2, 3). The jump chain can be transformed back to a standard Markov chain using the function expand_sample, which returns an object of class mcmc of the coda package (Plummer et al., 2006) (thus the plotting and diagnostic methods of coda can also be used). We can also directly transform the posterior samples to a data.frame by using the as.data.frame method for the MCMC output (for IS-weighting, the returned data frame contains an additional column weights). This is useful, for example, for visualisation purposes with the ggplot2 package:

suppressPackageStartupMessages(library("ggplot2"))
d <- as.data.frame(mcmc_bsm, variable = "theta")
ggplot(d, aes(x = value)) +
  geom_density(bw = 0.1, fill = "grey") +
  facet_wrap(~ variable, scales = "free") +
  theme_bw()

suppressPackageStartupMessages(library("dplyr"))
options(dplyr.summarise.inform = FALSE)
d <- as.data.frame(mcmc_bsm, variable = "states")

Figure 2: Observed annual average temperatures in New Haven (black dots) and predicted mean (solid line) with 95% prediction intervals (grey ribbon) from bssm.
summary_y <- d %>%
  filter(variable == "level") %>%
  group_by(time) %>%
  summarise(mean = mean(value),
    lwr = quantile(value, 0.025),
    upr = quantile(value, 0.975))

ggplot(summary_y, aes(x = time, y = mean)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.25) +
  geom_line() +
  geom_point(data = data.frame(mean = nhtemp,
    time = time(nhtemp))) +
  theme_bw() + xlab("Year") +
  ylab("Mean annual temperature in New Haven")

For non-Gaussian models, the default MCMC algorithm is approximate inference combined with importance sampling post-correction (Section 3.3). It is also possible to first perform approximate MCMC using the argument mcmc_type = "approx", and then perform the post-correction step using the results from the approximate MCMC. In doing so, we can also use the function suggest_N to find a suitable number of particles N for the ψ-APF in the spirit of Doucet et al. (2015):

out_approx <- run_mcmc(model, mcmc_type = "approx", iter = 50000)
est_N <- suggest_N(out_approx, model)
out_exact <- post_correct(model, out_approx, particles = est_N$N)

The function suggest_N computes the standard deviation of the logarithm of the post-correction weights (i.e. the random part of the log-likelihood of the ψ-APF) at the approximate MAP estimate of θ using a range of values of N, and returns a list with component N, which is the smallest number of particles for which the standard deviation was less than one. For small and moderate problems, 10-20 particles are typically enough.
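The selection rule behind suggest_N can be mimicked on synthetic data. In the sketch below (our own illustration, not package code) we simply assume that the log-weight standard deviation decays like 1/sqrt(N), as is typical for particle filter likelihood estimates:

```r
# Pick the smallest N whose log-weight standard deviation is below one.
set.seed(3)
candidate_N <- c(5, 10, 20, 40, 80)
log_weights <- lapply(candidate_N,
  function(N) rnorm(200, sd = 3 / sqrt(N)))  # synthetic stand-in for psi-APF output
sds <- vapply(log_weights, sd, numeric(1))
smallest <- candidate_N[which(sds < 1)[1]]   # first N satisfying the criterion
```

In the package the log-weights are of course computed by running the ψ-APF itself rather than being drawn from a normal distribution.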
5. Comparison of IS-MCMC and HMC
Vihola et al. (2020) compared the computational efficiency of delayed acceptance MCMC and importance sampling type MCMC approaches in various settings. Here we make a small experiment comparing generic Hamiltonian Monte Carlo using the NUTS sampler (Hoffman and Gelman, 2014) with rstan (Stan Development Team, 2019), and IS-MCMC with bssm. For the complete code of the experiment, see Appendix A.

We consider the case of a random walk with drift model with negative binomial observations and a known covariate x_t, defined as

y_t ∼ NB(exp(β x_t + µ_t), φ),
µ_{t+1} = µ_t + ν_t + η_t,
ν_{t+1} = ν_t,

with a zero-mean Gaussian noise term η_t with unknown standard deviation σ_µ. Based on this, we simulate one realization of y and x with n = 200, φ = 5, β = −0.9, ν_1 = 0.01, and σ_µ = 0.1. We used the bsm_ng function for model building, with prior variances 100 and 0.01 for the initial states µ_1 and ν_1. For the hyperparameters, we used fairly uninformative half-Normal distributions with standard deviation 0.5 for σ_µ and 0.1 for σ_ν. We then ran the IS-MCMC algorithm with run_mcmc using a burn-in phase of length 10,000 and 50,000 iterations after the burn-in, with 10 particles per SMC.

Using the same set-up, we ran the MCMC with rstan using 15,000 iterations (with the first 5,000 used for warm-up). Note that in order to avoid sampling problems, it was necessary to tweak the default control parameters of the sampler (see Appendix A).

Table 1 shows the results. We see that both methods produce identical results (within the Monte Carlo error), but while rstan produces similar Monte Carlo standard errors with a smaller number of total iterations than bssm, the total computation time of rstan is 70 times higher than with bssm (58 minutes versus 50 seconds).

Table 1: Estimates of the posterior mean, standard deviation (SD) and Monte Carlo standard error of the mean (MCSE) for the hyperparameters θ and the latent states at the last time point for the example model.

            bssm                          rstan
        Mean      SD      MCSE        Mean      SD      MCSE
σ_µ     0.093     0.036   8 × 10^−…   0.090     0.036   9 × 10^−…
σ_ν     0.003     0.003   5 × 10^−…   0.003     0.003   7 × 10^−…
φ       …409      0.925   2 × 10^−…   …386      0.898   1 × 10^−…
β       −0.912    0.057   1 × 10^−…   −0.911    0.056   7 × 10^−…
µ       …963      0.350   5 × 10^−…   …965      0.349   4 × 10^−…
ν       0.006     0.020   3 × 10^−…   0.006     0.019   2 × 10^−…
6. Conclusions
State space models are a flexible tool for analysing a variety of time series data. Here we introduced the R package bssm for fully Bayesian state space modelling for a large class of models, with several alternative MCMC sampling strategies. All computationally intensive parts of the package are implemented in C++, with parallel computation support for IS-MCMC, making it an attractive option for many common models where relatively accurate Gaussian approximations are available.

Compared to early versions of the bssm package, the option to define R functions for model updating and prior evaluation has lowered the bar for analysing custom models. The package is also written in a way that makes it relatively easy to extend with new model types similar to the current bsm_lg in the future. The bssm package could be expanded to allow other proposal adaptation schemes, such as the adaptive Metropolis algorithm of Haario et al. (2001), as well as support for multivariate SDE models and automatic differentiation for EKF-type algorithms.
7. Acknowledgements
This work has been supported by the Academy of Finland research grants 284513, 312605, 311877, and 331817.
8. References
Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society B 72, 269–342.
Banterle, M., Grazian, C., Lee, A., Robert, C.P., 2019. Accelerating Metropolis-Hastings algorithms by delayed acceptance. Foundations of Data Science 1, 103. doi:10.3934/fods.2019005
Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A., 2017. Stan: A probabilistic programming language. Journal of Statistical Software 76, 1–32. doi:10.18637/jss.v076.i01
Christen, J.A., Fox, C., 2005. Markov chain Monte Carlo using an approximation. Journal of Computational and Graphical Statistics 14, 795–810. doi:10.1198/106186005X76983
Doucet, A., Pitt, M.K., Deligiannidis, G., Kohn, R., 2015. Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102, 295–313. doi:10.1093/biomet/asu075
Durbin, J., Koopman, S.J., 2000. Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives. Journal of the Royal Statistical Society B 62, 3–56.
Durbin, J., Koopman, S.J., 2002. A simple and efficient simulation smoother for state space time series analysis. Biometrika 89, 603–615.
Durbin, J., Koopman, S.J., 2012. Time Series Analysis by State Space Methods, 2nd ed. Oxford University Press, New York.
Franks, J., Vihola, M., 2020. Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance. Stochastic Processes and their Applications 130, 6157–6183.
Funk, S., King, A.A., 2020. Choices and trade-offs in inference with infectious disease models. Epidemics 30, 100383. doi:10.1016/j.epidem.2019.100383
Gordon, N.J., Salmond, D.J., Smith, A.F.M., 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F 140, 107–113.
Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive Metropolis algorithm. Bernoulli 7, 223–242.
Harvey, A.C., 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Helske, J., 2016. ramcmc: Robust adaptive Metropolis algorithm.
Helske, J., 2017. KFAS: Exponential family state space models in R. Journal of Statistical Software 78, 1–39. doi:10.18637/jss.v078.i10
Helske, S., Helske, J., 2019. Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software 88, 1–32. doi:10.18637/jss.v088.i03
Hoffman, M.D., Gelman, A., 2014. The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. The Journal of Machine Learning Research 15, 1593–1623.
Iacus, S.M., 2016. sde: Simulation and inference for stochastic differential equations.
Jacob, P.E., Funk, S., 2020. rbi: Interface to LibBi.
Jazwinski, A., 1970. Stochastic Processes and Filtering Theory. Academic Press.
King, A.A., Nguyen, D., Ionides, E.L., 2016. Statistical inference for partially observed Markov processes via the R package pomp. Journal of Statistical Software 69, 1–43. doi:10.18637/jss.v069.i12
Kitagawa, G., 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics 5, 1–25.
Lindgren, F., Rue, H., 2015. Bayesian spatial modelling with R-INLA. Journal of Statistical Software 63, 1–25. doi:10.18637/jss.v063.i19
Murray, L., 2015. Bayesian state-space modelling on high-performance hardware using LibBi. Journal of Statistical Software 67, 1–36. doi:10.18637/jss.v067.i10
Petris, G., Petrone, S., 2011. State space models in R. Journal of Statistical Software 41, 1–25. doi:10.18637/jss.v041.i04
Plummer, M., Best, N., Cowles, K., Vines, K., 2006. CODA: Convergence diagnosis and output analysis for MCMC. R News 6, 7–11.
R Core Team, 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Stan Development Team, 2019. RStan: The R interface to Stan.
Tusell, F., 2011. Kalman filtering in R. Journal of Statistical Software 39, 1–27. doi:10.18637/jss.v039.i02
Van Der Merwe, R., Doucet, A., De Freitas, N., Wan, E.A., 2001. The unscented particle filter, in: Advances in Neural Information Processing Systems, pp. 584–590.
Vihola, M., 2012. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing 22, 997–1008. doi:10.1007/s11222-011-9269-5
Vihola, M., Helske, J., Franks, J., 2020. Importance sampling type estimators based on approximate marginal MCMC. Scandinavian Journal of Statistics. doi:10.1111/sjos.12492

Appendix A. Code for Section 5

library("bssm")
set.seed(123)
n <- 200
sd_level <- 0.1
drift <- 0.01
beta <- -0.9
phi <- 5
level <- cumsum(c(5, drift + rnorm(n - 1, sd = sd_level)))
x <- 3 + (1:n) * drift + sin(1:n + runif(n, -1, 1))
y <- rnbinom(n, size = phi, mu = exp(beta * x + level))

bssm_model <- bsm_ng(y,
  xreg = x,
  beta = normal(0, 0, 10),
  phi = halfnormal(1, 10),
  sd_level = halfnormal(0.1, 1),
  sd_slope = halfnormal(0.01, 0.1),
  a1 = c(0, 0), P1 = diag(c(10, 0.1)^2))

fit_bssm <- run_mcmc(bssm_model, iter = 60000, burnin = 10000,
  particles = 10, seed = 1)

library("rstan")
stan_model <- "data {int