Efficient Bayesian generalized linear models with time-varying coefficients: The walker package in R
Jouni Helske
Department of Mathematics and Statistics, University of Jyväskylä, FI-40014 Jyväskylä, Finland
Abstract
The R package walker extends standard Bayesian generalized linear models to the case where the effects of the explanatory variables can vary in time. This allows, for example, modelling the effects of interventions, such as changes in tax policy, whose effect gradually increases over time. The Markov chain Monte Carlo algorithms powering the Bayesian inference are based on Hamiltonian Monte Carlo provided by the Stan software, using a state space representation of the model to marginalise over the regression coefficients for efficient low-dimensional sampling.
Key words:
Bayesian inference, Time-varying regression, R, Markov chain Monte Carlo
Email address: [email protected] (Jouni Helske)
Preprint submitted to SoftwareX, September 16, 2020.

Required Metadata

C1  Current code version: 0.4.1
C2  Permanent link to code/repository used for this code version: https://github.com/helske/walker
C3  Code Ocean compute capsule: none
C4  Legal Code License: GPL3
C5  Code versioning system used: git
C6  Software code languages, tools, and services used: R, Stan, C++
C7  Compilation requirements, operating environments & dependencies: R version 3.4.0 and up, C++14; R packages bayesplot, BH, coda, dplyr, ggplot2, Hmisc, KFAS, methods, rlang, rstan, rstantools, StanHeaders, Rcpp, RcppArmadillo, RcppEigen
C8  If available, link to developer documentation/manual: https://cran.r-project.org/web/packages/walker/walker.pdf
C9  Support email for questions: jouni.helske@iki.fi

Table 1: Code metadata
1. Motivation and significance
Assume a time series of interest y_1, ..., y_T which is linearly dependent on some other predictor time series X_1, ..., X_T, where X_t = (x_{1t}, ..., x_{pt})' contains the predictor variables at time point t, and define the linear-Gaussian time series regression model as

    y_t = X_t' β + ε_t,  t = 1, ..., T,  (1)

where ε_t ∼ N(0, σ²_ε) and β is a vector of p unknown regression coefficients (the first one typically being the intercept term).

It is not always reasonable to assume that the relationship between y_t and some predictor x_{it} stays constant over t = 1, ..., T, the time period of interest. The convenient linear relationship approximation may hold only piecewise if the underlying relationship is nonlinear, or when there are unmeasured confounders that alter the relationship between the measured variables. Allowing the regression coefficients to vary over time can in some instances alleviate these problems. It is also possible that our knowledge of the phenomenon of interest already leads us to suspect time-varying relationships (see, e.g., Chapter 1 in [1]).

The basic time series regression model (1) can be extended to allow the unknown regression coefficients β_t to vary over time. This can be done in various ways, for example by constructing a semiparametric model based on kernel smoothing [2], or parametrically by using dynamic Bayesian networks [3] or state space models [4]. We follow the state space modelling approach and define a Gaussian time series regression model with random walk coefficients as

    y_t = X_t' β_t + ε_t,
    β_{t+1} = β_t + η_t,  (2)

where η_t ∼ N(0, D), with D being a p × p diagonal matrix with diagonal elements σ²_{i,η}, i = 1, ..., p, and we define a prior distribution N(μ_{β_1}, σ²_{β_1}) for the first time point β_1.
The bottom equation in (2) defines a random walk process for the regression coefficients, with D determining the degree of variability of the coefficients (with D = 0 the model collapses to the basic regression model (1)). Our goal is Bayesian estimation of the unknown regression coefficients β_1, ..., β_T and the standard deviations σ = (σ_ε, σ_{1,η}, ..., σ_{p,η}). Although in principle we can estimate these using general-purpose Markov chain Monte Carlo (MCMC) software such as Stan [5] or
BUGS [6], these standard implementations can be computationally inefficient and prone to severe problems related to the convergence of the underlying MCMC algorithm, due to the nature of our models of interest. For example, in a (block) Gibbs sampling approach we target the joint posterior p(β, σ | y) by sampling from p(β | σ, y) and p(σ | β, y). But because of strong autocorrelations between the coefficients β at different time points, as well as with the associated standard deviation parameters, this MCMC scheme can lead to slow mixing. Also, the total number of parameters to sample increases with the number of data points T. Although the Hamiltonian Monte Carlo algorithms offered by Stan are typically more efficient at exploring high-dimensional posteriors than Gibbs-type algorithms, we still encounter similar problems.

An alternative solution, used by the R [7] package walker, is based on the property that model (2) can be written as a linear Gaussian state space model (see, e.g., Section 3.6.1 in [8]). This allows us to marginalize the regression coefficients β during MCMC sampling by using the Kalman filter, leading to fast and accurate inference of the marginal posterior p(σ | y). Then, the corresponding joint posterior p(σ, β | y) = p(β | σ, y)p(σ | y) can be obtained by simulating the regression coefficients given the marginal posterior of the standard deviations. This sampling can be performed, for example, by the simulation smoothing algorithm [9].

The marginalization of the regression coefficients cannot be directly extended to generalized linear models such as Poisson regression, as the marginal log-likelihood is intractable.
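In the Gaussian case, the marginalisation idea can be illustrated outside of walker with the KFAS package (one of walker's dependencies): the Kalman filter evaluates the marginal likelihood p(y | σ) of model (2) with the coefficients integrated out. The following is a rough sketch on simulated data, not walker's actual implementation, which performs the same marginalisation inside Stan:

```r
library(KFAS)

# Simulate data from model (2) with a single random walk coefficient
set.seed(1)
n <- 100
x <- rnorm(n)
beta <- cumsum(rnorm(n, 0, 0.1))   # random walk coefficient
y <- rnorm(n, beta * x, 0.5)

# State space form: NA entries in Q and H mark the unknown variances
model <- SSModel(y ~ -1 + SSMregression(~ -1 + x, Q = NA), H = NA)

# Maximise the marginal likelihood p(y | sigma) over the two variances;
# the coefficients beta_1, ..., beta_T never enter the optimisation
fit_ml <- fitSSM(model, inits = c(-1, -1), method = "BFGS")
logLik(fit_ml$model)
```

In walker the same marginal likelihood is instead combined with the priors and explored with Hamiltonian Monte Carlo, after which the coefficients are simulated from p(β | σ, y).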
However, it is possible to use a Gaussian approximation of this exponential family state space model [10], and the resulting samples from the approximating posterior can then be weighted using an importance sampling type correction [11], leading again to asymptotically exact inference.

When modelling the regression coefficients as a simple random walk, the posterior estimates of these coefficients can have large short-term variation which might not be realistic in practice. One way of imposing more smoothness on the estimates is to switch from random walk coefficients to integrated second order random walk coefficients, defined as

    β_{t+1} = β_t + ν_t,
    ν_{t+1} = ν_t + ξ_t,

with ξ_t ∼ N(0, D_ξ). This is a local linear trend model [12] (also known as an integrated random walk) with the restriction that there is no noise on the β level. For this model, we can apply the same estimation techniques as in the random walk coefficient case. More complex patterns for β are also possible. For example, we can define β_{t+1} ∼ N(β_t, γ_t σ²_η), where γ_1, ..., γ_T is a known monotonically decreasing sequence of values, leading to a case where β_t gradually converges to a constant over time.
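In walker, these integrated random walk coefficients are requested with the rw2() formula function described in Section 2. A hypothetical sketch, assuming variable names y and x and a prior argument nu for the initial slope ν_1 as described in the package documentation:

```r
# Smoother coefficient paths via second order (integrated) random walks;
# beta and sigma are priors as in rw1(), nu is the prior for the initial
# slope (assumed argument names, see ?rw2)
fit_rw2 <- walker(y ~ 0 + rw2(~ x, beta = c(0, 10), sigma = c(0, 10),
                              nu = c(0, 10)),
                  sigma_y = c(0, 10), chains = 2)
```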
2. Software description
A stable version of walker is available on CRAN (https://cran.r-project.org/package=walker), while the current development version can be installed from GitHub (https://github.com/helske/walker). The MCMC sampling is handled by rstan [13], the R interface to Stan, while the model definitions and analysis of the results are performed in R, leading to fast and flexible modelling.

For defining the models, the walker package uses a similar Wilkinson-Rogers model formulation syntax [14] as, for example, the basic linear model function lm in R, but the formula for walker also recognizes two custom functions, rw1 and rw2, for random walk and integrated random walk coefficients respectively. For example, by typing

fit <- walker(y ~ 0 + x +
  rw1(~ z, beta = c(0, 1), sigma = c(0, 1)),
  sigma_y = c(0, 1))

we define a model with a time-invariant coefficient for predictor x, and first order random walk coefficients for z and the time-varying intercept term. Priors for β (including the intercept) and σ are defined as vectors of length two which define the mean and standard deviation of the normal distribution, respectively (for standard deviation parameters the prior is truncated at zero).

The function walker creates the model based on the formula and the prior definitions, and then calls the sampling function from the rstan package. The resulting posterior samples can then be converted to a data frame format using the function as.data.frame for easy visualization and further analysis.

In addition to the main functions walker for the Gaussian case and walker_glm for the Poisson and negative binomial models, the package contains additional functions for visualization of the results (e.g., plot_coef and pp_check) and out-of-sample prediction (function predict). Also, as the modelling functions return the full stanfit object used in MCMC sampling, this object can be analysed using many general diagnostic and graphical tools provided by several Stan related R packages such as ShinyStan [15].
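For count responses, walker_glm follows the same formula syntax. A hypothetical sketch with made-up variable names (counts, z), assuming the distribution and exposure arguments described in ?walker_glm:

```r
# Poisson regression with a time-varying coefficient for z;
# u gives the exposure term of the Poisson model (here constant 1)
fit_pois <- walker_glm(counts ~ rw1(~ z, beta = c(0, 2), sigma = c(0, 2)),
                       distribution = "poisson",
                       u = rep(1, length(counts)),
                       chains = 2)
```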
3. Illustrative Examples
As an illustrative example, let us consider observations y of length n = 100, generated by a random walk (i.e., a time-varying intercept) and two predictors. First we simulate the coefficients, the predictors, and the observations:

set.seed(1)
n <- 100
beta1 <- cumsum(c(0.5, rnorm(n - 1, 0, sd = 0.05)))
beta2 <- cumsum(c(-1, rnorm(n - 1, 0, sd = 0.15)))
x1 <- rnorm(n, mean = 2)
x2 <- cos(1:n)
rw <- cumsum(rnorm(n, 0, 0.5))
signal <- rw + beta1 * x1 + beta2 * x2
y <- rnorm(n, signal, 0.5)

Then we can call the function walker. As noted in Section 2, the model is defined as a simple formula object, and in addition to the prior definitions we can pass various arguments to the sampling method of rstan, such as the number of iterations iter and the number of chains chains to be used for the MCMC (default values for these are 2000 and 4, respectively).

fit <- walker(y ~ 0 + rw1(~ x1 + x2,
  beta = c(0, 10), sigma = c(0, 10)),
  sigma_y = c(0, 10), chains = 2, seed = 1)
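Since the fitted object carries the underlying Stan fit, the usual Stan convergence checks apply directly. A minimal sketch; the parameter name sigma_y is an assumption based on the packaged Stan model:

```r
# Standard MCMC diagnostics on the underlying Stan fit
print(fit$stanfit, pars = "sigma_y")
rstan::check_hmc_diagnostics(fit$stanfit)  # divergences, treedepth, E-BFMI
```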
The output of walker is a walker_fit object, which is essentially a list with the stanfit from Stan's sampling function, the original observations y, and the covariate matrix xreg. This allows us to use all the postprocessing functions for stanfit objects.

Figure 1 shows how walker recovers the true coefficient processes relatively well: the posterior intervals contain the true coefficients and the posterior mean estimates closely follow the true values. For drawing the figure, we first use the function as.data.frame to extract our posterior samples of the coefficients as a data frame, and then use the packages dplyr [16] and ggplot2 [17] to summarise and plot the posterior means and 95% posterior intervals, respectively:

library(dplyr)
library(ggplot2)
sumr <- as.data.frame(fit, "tv") %>%
  group_by(variable, time) %>%
  summarise(mean = mean(value),
            lwr = quantile(value, 0.025),
            upr = quantile(value, 0.975))
sumr$true <- c(rw, beta1, beta2)
ggplot(sumr, aes(y = mean, x = time, colour = variable)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = variable),
              colour = NA, alpha = 0.2) +
  geom_line(aes(linetype = "Estimate"), lwd = 1) +
  geom_line(aes(y = true, linetype = "True"), lwd = 1) +
  scale_linetype_manual(values = c("solid", "dashed")) +
  theme_bw() + xlab("Time") + ylab("Value") +
  theme(legend.position = "bottom",
        legend.title = element_blank())

More examples can be found in the package vignette and function documentation pages (e.g., by typing vignette("walker") or ?walker_glm in R), including a comparison between the marginalization approach of walker and a "naive" implementation with Stan, and an example on scalability.
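Out-of-sample predictions can be drawn from the fitted model as well. A hypothetical sketch, assuming predict() accepts a data frame of future covariate values as described in the documentation of the predict method:

```r
# Posterior predictive draws for 10 future time points; newdata must
# contain the same covariates as the original model
new_data <- data.frame(x1 = rnorm(10, mean = 2), x2 = cos(101:110))
pred <- predict(fit, newdata = new_data)
plot_predict(pred)  # posterior predictive intervals for the new points
```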
Figure 1: Posterior means (solid lines) and 95% posterior intervals (shaded areas) of the time-varying regression coefficients and the true data generating values (dashed lines) of the illustrative example.

4. Impact and conclusions

The walker package extends standard Bayesian generalized linear models to the flexible time-varying coefficients case in a computationally efficient manner, which allows researchers in economics, social sciences, and other fields to relax the sometimes unreasonable assumption of a stable, time-invariant relationship between the response variable and (some of) the predictors. Similar methods have previously been used in a maximum likelihood setting, for example in studying the diminishing effects of advertising [18] and the demand for international reserves [19].

There are several ways in which walker can be extended in the future. There are already plans for additional forms of time-varying coefficients (such as a stationary autoregressive process), support for more priors, and additional distributions for the response variables (e.g., negative binomial).
5. Conflict of Interest
We wish to confirm that there are no known conflicts of interest associatedwith this publication and there has been no significant financial support forthis work that could have influenced its outcome.
Acknowledgements
This work has been supported by the Academy of Finland research grants 284513, 312605, and 311877.

References

[1] M. Moryson, Testing for Random Walk Coefficients in Regression and State Space Models, Springer Science & Business Media, 2012.
[2] P. M. Robinson, Nonparametric estimation of time-varying parameters, in: Statistical Analysis and Forecasting of Economic Structural Change, Springer, 1989, pp. 253–264.
[3] P. Dagum, A. Galper, E. Horvitz, Dynamic network models for forecasting, in: Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence, UAI '92, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1992, pp. 41–48.
[4] A. C. Harvey, G. D. A. Phillips, The estimation of regression models with time-varying parameters, in: M. Deistler, E. Fürst, G. Schwödiauer (Eds.), Games, Economic Dynamics, and Time Series Analysis, Physica-Verlag HD, Heidelberg, 1982, pp. 306–321.
[5] Stan Development Team, The Stan C++ Library, version 2.15.0 (2016). URL https://mc-stan.org/
[6] D. J. Lunn, A. Thomas, N. Best, D. Spiegelhalter, WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility (Oct 2000). doi:10.1023/A:1008929526011.
[7] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2020). URL https://www.R-project.org/
[8] J. Durbin, S. J. Koopman, Time Series Analysis by State Space Methods, 2nd Edition, Oxford University Press, New York, 2012.
[9] J. Durbin, S. J. Koopman, A simple and efficient simulation smoother for state space time series analysis, Biometrika 89 (2002) 603–615.
[10] J. Durbin, S. J. Koopman, Monte Carlo maximum likelihood estimation for non-Gaussian state space models, Biometrika 84 (3) (1997) 669–684.
[11] M. Vihola, J. Helske, J. Franks, Importance sampling type estimators based on approximate marginal MCMC, Scandinavian Journal of Statistics (2020). doi:10.1111/sjos.12492.
[13] Stan Development Team, RStan: the R interface to Stan. URL https://mc-stan.org/
[14] G. N. Wilkinson, C. E. Rogers, Symbolic description of factorial models for analysis of variance, Journal of the Royal Statistical Society, Series C (Applied Statistics) 22 (3) (1973) 392–399.
[15] Stan Development Team, shinystan: Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models, R package version 2.3.0 (2017). URL https://mc-stan.org/
[16] H. Wickham, R. François, L. Henry, K. Müller, dplyr: A Grammar of Data Manipulation, R package version 0.7.6 (2018). URL https://CRAN.R-project.org/package=dplyr
[17] H. Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org
[18] H. W. Kinnucan, M. Venkateswaran, Generic advertising and the structural heterogeneity hypothesis, Canadian Journal of Agricultural Economics/Revue canadienne d'agroéconomie 42 (3) (1994) 381–396. doi:10.1111/j.1744-7976.1994.tb00032.x.
[19] M. Bahmani-Oskooee, F. Brown, Kalman filter approach to estimate the demand for international reserves, Applied Economics 36 (15) (2004) 1655–1668. doi:10.1080/0003684042000218543.