Efficient Bayesian generalized linear models with time-varying coefficients: The walker package in R
Jouni Helske
Department of Mathematics and Statistics, University of Jyväskylä, FI-40014 Jyväskylä, Finland
Abstract
The R package walker extends standard Bayesian generalized linear models to the case where the effects of the explanatory variables can vary in time. This allows, for example, modelling the effects of interventions, such as changes in tax policy, whose effect gradually increases over time. The Markov chain Monte Carlo algorithms powering the Bayesian inference are based on Hamiltonian Monte Carlo provided by the Stan software, using a state space representation of the model to marginalise over the regression coefficients for efficient low-dimensional sampling.
Key words:
Bayesian inference, Time-varying regression, R, Markov chain Monte Carlo
Email address: [email protected] (Jouni Helske)
Preprint submitted to SoftwareX, September 16, 2020.

Required Metadata

C1  Current code version: 0.4.1
C2  Permanent link to code/repository used for this code version: https://github.com/helske/walker
C3  Code Ocean compute capsule: none
C4  Legal Code License: GPL3
C5  Code versioning system used: git
C6  Software code languages, tools, and services used: R, Stan, C++
C7  Compilation requirements, operating environments & dependencies: R version 3.4.0 and up, C++14; R packages bayesplot, BH, coda, dplyr, ggplot2, Hmisc, KFAS, methods, rlang, rstan, rstantools, StanHeaders, Rcpp, RcppArmadillo, RcppEigen
C8  If available, link to developer documentation/manual: https://cran.r-project.org/web/packages/walker/walker.pdf
C9  Support email for questions: jouni.helske@iki.fi

Table 1: Code metadata
1. Motivation and significance
Assume a time series of interest y_1, ..., y_T which is linearly dependent on some other predictor time series X_1, ..., X_T, where X_t = (x_{1t}, ..., x_{pt})' contains the predictor variables at time point t, and define the linear-Gaussian time series regression model as

    y_t = X_t' β + ε_t,  t = 1, ..., T,  (1)

where ε_t ∼ N(0, σ²_ε) and β is a vector of p unknown regression coefficients (the first one typically being the intercept term).

It is not always reasonable to assume that the relationship between y_t and some predictor x_{it} stays constant over t = 1, ..., T, the time period of interest. The convenient linear relationship approximation may hold only piecewise if the underlying relationship is nonlinear, or when there are unmeasured confounders that alter the relationship between the measured variables. Allowing the regression coefficients to vary over time can in some instances alleviate these problems. It is also possible that our knowledge of the phenomenon of interest already leads us to suspect time-varying relationships (see, e.g., Chapter 1 in [1]).

The basic time series regression model (1) can be extended to allow the unknown regression coefficients β_t to vary over time. This can be done in various ways, for example by constructing a semiparametric model based on kernel smoothing [2], or parametrically by using dynamic Bayesian networks [3] or state space models [4]. We follow the state space modelling approach and define a Gaussian time series regression model with random walk coefficients as

    y_t = X_t' β_t + ε_t,
    β_{t+1} = β_t + η_t,  (2)

where η_t ∼ N(0, D), with D being a p × p diagonal matrix with diagonal elements σ²_{i,η}, i = 1, ..., p, and we define a prior distribution N(μ_{β_1}, σ²_{β_1}) for the first time point β_1.
The bottom equation in (2) defines a random walk process for the regression coefficients, with D determining the degree of variability of the coefficients (with D = 0 the model collapses to the basic regression model (1)). Our goal is Bayesian estimation of the unknown regression coefficients β_1, ..., β_T and the standard deviations σ = (σ_ε, σ_{1,η}, ..., σ_{p,η}). Although in principle we can estimate these using general-purpose Markov chain Monte Carlo (MCMC) software such as Stan [5] or
BUGS [6], these standard implementations can be computationally inefficient and prone to severe problems related to the convergence of the underlying MCMC algorithm, due to the nature of our models of interest. For example, in a (block) Gibbs sampling approach we target the joint posterior p(β, σ | y) by sampling from p(β | σ, y) and p(σ | β, y). But because of strong autocorrelations between the coefficients β at different time points, as well as with the associated standard deviation parameters, this MCMC scheme can lead to slow mixing. Also, the total number of parameters to sample increases with the number of data points T. Although the Hamiltonian Monte Carlo algorithms offered by Stan are typically more efficient at exploring high-dimensional posteriors than Gibbs-type algorithms, we still encounter similar problems.

An alternative solution, used by the R [7] package walker, is based on the property that model (2) can be written as a linear Gaussian state space model (see, e.g., Section 3.6.1 in [8]). This allows us to marginalize the regression coefficients β during MCMC sampling by using the Kalman filter, leading to fast and accurate inference of the marginal posterior p(σ | y). Then, the corresponding joint posterior p(σ, β | y) = p(β | σ, y)p(σ | y) can be obtained by simulating the regression coefficients given the marginal posterior of the standard deviations. This sampling can be performed, for example, by the simulation smoothing algorithm [9].

The marginalization of the regression coefficients cannot be directly extended to generalized linear models such as Poisson regression, as the marginal log-likelihood is intractable.
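In the Gaussian case, the marginalisation idea can be illustrated outside of walker with the KFAS package (one of walker's dependencies): the Kalman filter evaluates the marginal likelihood p(y | σ) of model (2) with the coefficients integrated out. The following is a rough sketch on simulated data, not walker's actual implementation, which performs the same marginalisation inside Stan:

```r
library(KFAS)

# Simulate data from model (2) with a single random walk coefficient
set.seed(1)
n <- 100
x <- rnorm(n)
beta <- cumsum(rnorm(n, 0, 0.1))   # random walk coefficient
y <- rnorm(n, beta * x, 0.5)

# State space form: NA entries in Q and H mark the unknown variances
model <- SSModel(y ~ -1 + SSMregression(~ -1 + x, Q = NA), H = NA)

# Maximise the marginal likelihood p(y | sigma) over the two variances;
# the coefficients beta_1, ..., beta_T never enter the optimisation
fit_ml <- fitSSM(model, inits = c(-1, -1), method = "BFGS")
logLik(fit_ml$model)
```

In walker the same marginal likelihood is instead combined with the priors and explored with Hamiltonian Monte Carlo, after which the coefficients are simulated from p(β | σ, y).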
However, it is possible to use a Gaussian approximation of this exponential family state space model [10], and the resulting samples from the approximating posterior can then be weighted using an importance sampling type correction [11], leading again to asymptotically exact inference.

When modelling the regression coefficients as a simple random walk, the posterior estimates of these coefficients can have large short-term variation which might not be realistic in practice. One way of imposing more smoothness on the estimates is to switch from random walk coefficients to integrated second order random walk coefficients, defined as

    β_{t+1} = β_t + ν_t,
    ν_{t+1} = ν_t + ξ_t,

with ξ_t ∼ N(0, D_ξ). This is a local linear trend model [12] (also known as an integrated random walk) with the restriction that there is no noise on the β level. For this model, we can apply the same estimation techniques as in the random walk coefficient case. More complex patterns for β are also possible. For example, we can define β_{t+1} ∼ N(β_t, γ_t σ²_η), where γ_1, ..., γ_T is a known monotonically decreasing sequence of values, leading to a case where β_t gradually converges to a constant over time.
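In walker, these integrated random walk coefficients are requested with the rw2() formula function described in Section 2. A hypothetical sketch, assuming variable names y and x and a prior argument nu for the initial slope ν_1 as described in the package documentation:

```r
# Smoother coefficient paths via second order (integrated) random walks;
# beta and sigma are priors as in rw1(), nu is the prior for the initial
# slope (assumed argument names, see ?rw2)
fit_rw2 <- walker(y ~ 0 + rw2(~ x, beta = c(0, 10), sigma = c(0, 10),
                              nu = c(0, 10)),
                  sigma_y = c(0, 10), chains = 2)
```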
2. Software description
A stable version of walker is available on CRAN (https://cran.r-project.org/package=walker), while the current development version can be installed from GitHub (https://github.com/helske/walker). The MCMC sampling is handled by rstan [13], the R interface to Stan, while the model definitions and analysis of the results are performed in R, leading to fast and flexible modelling.

For defining the models, the walker package uses a similar Wilkinson-Rogers model formulation syntax [14] as, for example, the basic linear model function lm in R, but the formula for walker also recognizes two custom functions, rw1 and rw2, for random walk and integrated random walk coefficients respectively. For example, by typing

fit <- walker(y ~ 0 + x +
  rw1(~ z, beta = c(0, 1), sigma = c(0, 1)),
  sigma_y = c(0, 1))

we define a model with a time-invariant coefficient for predictor x, and first order random walk coefficients for z and the time-varying intercept term. Priors for β (including the intercept) and σ are defined as vectors of length two which define the mean and standard deviation of the normal distribution, respectively (for standard deviation parameters the prior is truncated at zero).

The function walker creates the model based on the formula and the prior definitions, and then calls the sampling function from the rstan package. The resulting posterior samples can then be converted to a data frame format using the function as.data.frame for easy visualization and further analysis.

In addition to the main functions walker for the Gaussian case and walker_glm for the Poisson and negative binomial models, the package contains additional functions for visualization of the results (e.g., plot_coef and pp_check) and out-of-sample prediction (function predict). Also, as the modelling functions return the full stanfit object used in MCMC sampling, this object can be analysed using many general diagnostic and graphical tools provided by several Stan related R packages such as ShinyStan [15].
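For count responses, walker_glm follows the same formula syntax. A hypothetical sketch with made-up variable names (counts, z), assuming the distribution and exposure arguments described in ?walker_glm:

```r
# Poisson regression with a time-varying coefficient for z;
# u gives the exposure term of the Poisson model (here constant 1)
fit_pois <- walker_glm(counts ~ rw1(~ z, beta = c(0, 2), sigma = c(0, 2)),
                       distribution = "poisson",
                       u = rep(1, length(counts)),
                       chains = 2)
```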
3. Illustrative Examples
As an illustrative example, let us consider observations y of length n = 100, generated by a random walk (i.e., a time-varying intercept) and two predictors. First we simulate the coefficients, the predictors, and the observations:

set.seed(1)
n <- 100
beta1 <- cumsum(c(0.5, rnorm(n - 1, 0, sd = 0.05)))
beta2 <- cumsum(c(-1, rnorm(n - 1, 0, sd = 0.15)))
x1 <- rnorm(n, mean = 2)
x2 <- cos(1:n)
rw <- cumsum(rnorm(n, 0, 0.5))
signal <- rw + beta1 * x1 + beta2 * x2
y <- rnorm(n, signal, 0.5)

Then we can call the function walker. As noted in Section 2, the model is defined as a simple formula object, and in addition to the prior definitions we can pass various arguments to the sampling method of rstan, such as the number of iterations iter and the number of chains chains to be used for the MCMC (default values for these are 2000 and 4, respectively).

fit <- walker(y ~ 0 + rw1(~ x1 + x2,
  beta = c(0, 10), sigma = c(0, 10)),
  sigma_y = c(0, 10), chains = 2, seed = 1)
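Since the fitted object carries the underlying Stan fit, the usual Stan convergence checks apply directly. A minimal sketch; the parameter name sigma_y is an assumption based on the packaged Stan model:

```r
# Standard MCMC diagnostics on the underlying Stan fit
print(fit$stanfit, pars = "sigma_y")
rstan::check_hmc_diagnostics(fit$stanfit)  # divergences, treedepth, E-BFMI
```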
The output of walker is a walker_fit object, which is essentially a list with the stanfit from Stan's sampling function, the original observations y, and the covariate matrix xreg. This allows us to use all the postprocessing functions for stanfit objects.

Figure 1 shows how walker recovers the true coefficient processes relatively well: the posterior intervals contain the true coefficients and the posterior mean estimates closely follow the true values. For drawing the figure, we first use the function as.data.frame to extract our posterior samples of the coefficients as a data frame, and then use the packages dplyr [16] and ggplot2 [17] to summarise and plot the posterior means and 95% posterior intervals, respectively:

library(dplyr)
library(ggplot2)
sumr <- as.data.frame(fit, "tv") %>%
  group_by(variable, time) %>%
  summarise(mean = mean(value),
            lwr = quantile(value, 0.025),
            upr = quantile(value, 0.975))
sumr$true <- c(rw, beta1, beta2)
ggplot(sumr, aes(y = mean, x = time, colour = variable)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = variable),
              colour = NA, alpha = 0.2) +
  geom_line(aes(linetype = "Estimate"), lwd = 1) +
  geom_line(aes(y = true, linetype = "True"), lwd = 1) +
  scale_linetype_manual(values = c("solid", "dashed")) +
  theme_bw() + xlab("Time") + ylab("Value") +
  theme(legend.position = "bottom",
        legend.title = element_blank())

More examples can be found in the package vignette and function documentation pages (e.g., by typing vignette("walker") or ?walker_glm in R), including a comparison between the marginalization approach of walker and a "naive" implementation with Stan, and an example on scalability.
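Out-of-sample predictions can be drawn from the fitted model as well. A hypothetical sketch, assuming predict() accepts a data frame of future covariate values as described in the documentation of the predict method:

```r
# Posterior predictive draws for 10 future time points; newdata must
# contain the same covariates as the original model
new_data <- data.frame(x1 = rnorm(10, mean = 2), x2 = cos(101:110))
pred <- predict(fit, newdata = new_data)
plot_predict(pred)  # posterior predictive intervals for the new points
```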
Figure 1: Posterior means (solid lines) and 95% posterior intervals (shaded areas) of the time-varying regression coefficients and the true data generating values (dashed lines) of the illustrative example.

4. Impact and conclusions

The walker package extends standard Bayesian generalized linear models to the flexible time-varying coefficients case in a computationally efficient manner, which allows researchers in economics, social sciences, and other fields to relax the sometimes unreasonable assumption of a stable, time-invariant relationship between the response variable and (some of) the predictors. Similar methods have previously been used in a maximum likelihood setting, for example in studying the diminishing effects of advertising [18] and the demand for international reserves [19].

There are several ways in which walker can be extended in the future. There are already plans for additional forms of time-varying coefficients (such as a stationary autoregressive process), support for more priors, and additional distributions for the response variables (e.g., negative binomial).
5. Conflict of Interest
We wish to confirm that there are no known conflicts of interest associatedwith this publication and there has been no significant financial support forthis work that could have influenced its outcome.
Acknowledgements
This work has been supported by the Academy of Finland research grants 284513, 312605, and 311877.

References

[1] M. Moryson, Testing for Random Walk Coefficients in Regression and State Space Models, Springer Science & Business Media, 2012.
[2] P. M. Robinson, Nonparametric estimation of time-varying parameters, in: Statistical Analysis and Forecasting of Economic Structural Change, Springer, 1989, pp. 253–264.
[3] P. Dagum, A. Galper, E. Horvitz, Dynamic network models for forecasting, in: Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence, UAI '92, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1992, pp. 41–48.
[4] A. C. Harvey, G. D. A. Phillips, The estimation of regression models with time-varying parameters, in: M. Deistler, E. Fürst, G. Schwödiauer (Eds.), Games, Economic Dynamics, and Time Series Analysis, Physica-Verlag HD, Heidelberg, 1982, pp. 306–321.
[5] Stan Development Team, The Stan C++ Library, version 2.15.0 (2016). URL https://mc-stan.org/
[6] D. J. Lunn, A. Thomas, N. Best, D. Spiegelhalter, WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility (Oct 2000). doi:10.1023/A:1008929526011.
[7] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2020). URL https://www.R-project.org/
[8] J. Durbin, S. J. Koopman, Time Series Analysis by State Space Methods, 2nd Edition, Oxford University Press, New York, 2012.
[9] J. Durbin, S. J. Koopman, A simple and efficient simulation smoother for state space time series analysis, Biometrika 89 (2002) 603–615.
[10] J. Durbin, S. J. Koopman, Monte Carlo maximum likelihood estimation for non-Gaussian state space models, Biometrika 84 (3) (1997) 669–684.
[11] M. Vihola, J. Helske, J. Franks, Importance sampling type estimators based on approximate marginal MCMC, Scandinavian Journal of Statistics (2020). doi:10.1111/sjos.12492.
[13] Stan Development Team, RStan: the R interface to Stan. URL https://mc-stan.org/
[14] G. N. Wilkinson, C. E. Rogers, Symbolic description of factorial models for analysis of variance, Journal of the Royal Statistical Society, Series C (Applied Statistics) 22 (3) (1973) 392–399.
[15] Stan Development Team, shinystan: Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models, R package version 2.3.0 (2017). URL https://mc-stan.org/
[16] H. Wickham, R. François, L. Henry, K. Müller, dplyr: A Grammar of Data Manipulation, R package version 0.7.6 (2018). URL https://CRAN.R-project.org/package=dplyr
[17] H. Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org
[18] H. W. Kinnucan, M. Venkateswaran, Generic advertising and the structural heterogeneity hypothesis, Canadian Journal of Agricultural Economics/Revue canadienne d'agroéconomie 42 (3) (1994) 381–396. doi:10.1111/j.1744-7976.1994.tb00032.x.
[19] M. Bahmani-Oskooee, F. Brown, Kalman filter approach to estimate the demand for international reserves, Applied Economics 36 (15) (2004) 1655–1668. doi:10.1080/0003684042000218543.