Developments on the Bayesian Structural Time Series Model: Trending Growth
David Kohns† and Arnab Bhattacharjee
Department of Economics, Heriot-Watt University

November 3, 2020
Abstract
This paper investigates the added benefit of internet search data in the form of Google Trends for nowcasting real U.S. GDP growth in real time through the lens of the mixed-frequency augmented Bayesian Structural Time Series model (BSTS) of Scott and Varian (2014). We show that a large-dimensional set of search terms is able to improve nowcasts early in the quarter, before other macro data become available. Search terms with high inclusion probability are negatively correlated with GDP growth, which we reason stems from their signalling special attention, likely due to expected large troughs. We further offer several improvements on the priors: we allow state variances to be shrunk to zero to avoid overfitting states, extend the SSVS prior to the more flexible normal-inverse-gamma prior of Ishwaran et al. (2005), which stays agnostic about the underlying model size, and adapt the horseshoe prior of Carvalho et al. (2010) to the BSTS. The application to nowcasting GDP growth, as well as a simulation study, shows that the horseshoe prior BSTS improves markedly over the SSVS and the original BSTS model, with the largest gains to be expected in dense data-generating processes.
Keywords:
Global-Local Priors, Non-Centred State Space, Shrinkage, Google Trends

∗ The authors would like to thank Arnab Bhattacharjee, Mark Schaffer, Atanas Christev, Aubrey Poon, Gary Koop and all participants of the Scottish Graduate Program in Economics conference in Crieff for their invaluable feedback.

† Email: [email protected]

Introduction
The objective of nowcast models is to produce 'early' forecasts of the target variable which exploit the real-time data publication schedule of the explanatory data set. Nowcasting is particularly relevant to central banks and other policy environments that are tasked with conducting forward-looking policies on the basis of key economic variables such as GDP or inflation. These, however, are published with a lag of up to 7 weeks with respect to their reference period. Since even monthly macro data arrive with considerable lag, it is now common to combine, next to traditional macroeconomic data, ever more information from Big Data sources such as internet search terms, satellite data, scanner data, etc. (Bok et al., 2018), which have the advantage of being available in near real time.

As such, a burgeoning literature has studied the utility of using Google search data in the form of Google Trends (GT), which measure the relative search volume of certain search terms entered into the Google search engine, to nowcast economic time series. The majority of these applications study improvements of GT-based nowcasts compared to simple autoregressive processes and survey indicators. The latter comparison seems natural, as search term volume can be viewed as a survey of supply and demand of products and an expression of views and sentiments (Scott and Varian, 2014). In particular, the literature established improvements in predicting the UK housing market (McLaren and Shanbhogue, 2011), unemployment-related data in the UK, Israel and Germany (Askitas and Zimmermann, 2009; Suhoy et al., 2009), US private consumption (Vosen and Schmidt, 2011) and price expectations (Guzman, 2011; D'Amuri and Marcucci, 2017).
A commonality within these and related studies is that search terms have predictive power especially when they are directly related to demand/supply decisions and signals of preference (Niesert et al., 2020), such as unemployment data, where internet search engines provide the dominant funnel through which job seekers find jobs (Smith, 2016). While the link of internet search to aggregate economic behaviour is less clear from a theoretical point of view, a growing number of studies report that Google Trends are useful in nowcasting there as well. D'Amuri and Marcucci (2017) show that hand-picked search categories related to 'jobs' help predict headline US unemployment, which is confirmed by Choi and Varian (2012) as well as Niesert et al. (2020), who select Google Trends in a more automatic way.

The exact lag in publications of GDP and inflation depends as well on which vintage of data the econometrician wishes to forecast. Since early vintages of aggregate quantities such as GDP can display substantial variation between vintages, this is not a trivial issue.

The BSTS model of Scott and Varian (2014) takes the form:

$$
\begin{aligned}
y_t &= \tau_t + x_t'\beta + \delta_t + \varepsilon_t, &\quad \varepsilon_t &\sim N(0, \sigma^2_y)\\
\tau_t &= \tau_{t-1} + \alpha_{t-1} + \varepsilon^{\tau}_t, &\quad \varepsilon^{\tau}_t &\sim N(0, \sigma^2_\tau)\\
\alpha_t &= \alpha_{t-1} + \varepsilon^{\alpha}_t, &\quad \varepsilon^{\alpha}_t &\sim N(0, \sigma^2_\alpha)\\
\delta_t &= -\sum_{s=1}^{S-1} \delta_{t-s} + \varepsilon^{\delta}_t, &\quad \varepsilon^{\delta}_t &\sim N(0, \sigma^2_\delta)
\end{aligned} \tag{1}
$$

In (1), any persistence in the data is modeled through a latent time-varying slope, τ_t, with drift α_t, originally proposed by Harvey (1990) as the local-linear-trend model, and seasonality is captured by S seasonal components. The local-linear-trend model has the intuitive appeal that τ_t describes a long-run trend, deviations from which, applied to the level of GDP, can be interpreted as the output gap (Watson, 1986; Grant and Chan, 2017).
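To make the roles of the components concrete, the following sketch simulates the local-linear-trend model with a seasonal component as in (1), with the regression part x_t'β omitted. This is our own illustration, not the authors' code; all variable names and parameter values are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
T, S = 80, 4                                   # quarters, seasonal period
sig_y, sig_tau, sig_alpha, sig_delta = 0.5, 0.1, 0.05, 0.2

tau = np.zeros(T)     # local level (long-run trend)
alpha = np.zeros(T)   # stochastic drift of the trend
delta = np.zeros(T)   # seasonal component, sums to roughly zero over S periods
for t in range(1, T):
    alpha[t] = alpha[t - 1] + sig_alpha * rng.standard_normal()
    tau[t] = tau[t - 1] + alpha[t - 1] + sig_tau * rng.standard_normal()
    delta[t] = -delta[max(t - S + 1, 0):t].sum() + sig_delta * rng.standard_normal()

y = tau + delta + sig_y * rng.standard_normal(T)   # observation equation
```

Tightening `sig_tau` and `sig_alpha` towards zero collapses the trend towards a deterministic line, which is precisely the behaviour the shrinkage priors discussed below are designed to test for.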
The second level of the trend, α_t, thus allows for stochastic changes in the output gap, which, when first differencing y_t, results in GDP growth following a unit-root process (Clark, 1987). While assuming that GDP growth can drift without bound remains a topic of controversy in economics, there is mounting evidence that U.S. GDP growth experienced multiple structural breaks (Kim and Nelson, 1999; McConnell and Perez-Quiros, 2000; Jurado et al., 2015). Antolin-Diaz et al. (2017) convincingly show that allowing for smooth shifts in GDP growth is preferred over deterministic structural breaks, especially when the state variances are tightly controlled by priors such that the stochastic trend does not wander too wildly. In fact, Antolin-Diaz et al. (2017) show that nowcasts of GDP growth improve markedly by modeling time-variation in long-run growth as opposed to filtering it out via, e.g., the Hodrick-Prescott filter. As there is not much theoretical basis for a local linear trend in GDP growth, our first point of improvement is to generalise the state space approach (1) to a non-centred formulation which offers more aggressive regularisation on the latent state variances so as to prevent over-fitting. We use a prior with positive probability on zero which, importantly, allows us to formally test the null of zero posterior variance (Frühwirth-Schnatter and Wagner, 2010) and, therefore, whether a state is constant or time-varying. This would not be possible in a frequentist approach to estimating model (1), due to boundary testing issues.

Our second improvement concerns the SSVS prior. The SSVS prior, as proposed in the original BSTS model of Scott and Varian (2014), has been formulated under the assumption of a fixed expected model size parameter. As shown by Giannone et al. (2017), however, the posterior distribution of model sizes, and therefore the resulting degree of sparsity in the model, might be overly influenced by fixing the model size parameter a priori.
They therefore advocate the use of an 'agnostic' prior that allows the data to determine which model size is preferred, which we extend to the BSTS model.

While spike-and-slab priors are widely acknowledged to be the gold standard for Bayesian variable selection and model averaging, a computational bottleneck is presented by their discrete mixture representation, which quickly becomes infeasible to compute in very large dimensional settings (George and McCulloch, 1993). Further, spike-and-slab priors struggle with correlated data, as the correlation causes multi-modal posteriors and therefore bad mixing (Piironen et al., 2017). While Niesert et al. (2020) have improved on mixing via Hamiltonian Monte Carlo, our third methodological contribution is to further extend the BSTS framework to more modern global-local priors which offer continuous shrinkage. Global-local priors such as the horseshoe (Carvalho et al., 2010) and Dirichlet-Laplace prior (Bhattacharya et al., 2016) not only provide computational advantages over the spike-and-slab prior, but also offer favourable asymptotics.

This is supported by findings of Sims (2012), who finds that modeling time-variation is preferred over de-trending a priori.
In this paper, we relate monthly macro data based on Giannone et al. (2016) and internet search information via U-MIDAS skip-sampling to real quarterly U.S. GDP growth. The U-MIDAS approach to mixed frequency belongs to the broader class of 'partial system' models (Bańbura et al., 2013), which directly relate higher-frequency information to the lower-frequency target variable by vertically realigning the covariate vector. Switching notation from equation (1) to make explicit that x_t is sampled at a higher, i.e., monthly, frequency, denote x_{t,M} = (x_{1,t,M}, ..., x_{K,t,M}) and β_M = (β_{1,M}, ..., β_{K,M})', where M = (1, 2, 3) denotes the monthly observation of the covariate within quarter t. By concatenating each monthly column, we obtain a T × K* matrix X and a K* × 1 vector β. This vertical realignment is visualised for a single representative regressor below:

$$
\begin{array}{c|ccc}
y_{1st\ quarter} & x_{Mar} & x_{Feb} & x_{Jan} \\
y_{2nd\ quarter} & x_{Jun} & x_{May} & x_{Apr} \\
\vdots & \vdots & \vdots & \vdots
\end{array} \tag{2}
$$

As early data vintages of U.S. GDP can exhibit substantial variation compared to final vintages (Croushore, 2006; Sims, 2002), it is not trivial which data to use in evaluating nowcast models on historical data. Further complications can arise through changing definitions or methods of measurement of data (Carriero et al., 2015). However, as in our application only a few explanatory variables have recorded real-time vintages (see Giannone et al. (2016)) and our training window is restricted to begin only in 2004 (since this is the earliest data point for the Google Trends database), we decided to use final vintages of our data. We therefore consider a pseudo real-time data set: we use the latest vintage of data, but, at each point of the forecast horizon, we use only the data published up to that point in time.

The target variable for this application is deseasonalised U.S. real GDP growth (GDP growth) as downloaded from the FRED website. We found that pre-deseasonalised data improved forecast accuracy compared to modeling seasonality in our state space system. This might be due to the small sample size. As Google Trends are only available from 01/01/2004-01/06/2019 at the time of download, the period under investigation pertains to the same period in quarters (60 quarters).
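The vertical realignment in (2) amounts to simple skip-sampling of each monthly series into three quarterly columns. A minimal sketch of this step (`umidas_realign` is our own illustrative helper, not from the paper):

```python
import numpy as np

def umidas_realign(x_monthly):
    """Realign a monthly series of length 3T into a T x 3 block whose
    columns are the skip-sampled months of each quarter, latest first,
    as in the schematic (2)."""
    x = np.asarray(x_monthly, dtype=float)
    T = len(x) // 3
    blocks = x[:3 * T].reshape(T, 3)   # rows: (month 1, month 2, month 3)
    return blocks[:, ::-1]             # reorder to (month 3, month 2, month 1)

# one year of a monthly indicator -> 4 quarterly rows of 3 columns
X = umidas_realign(np.arange(1, 13))
# first quarter row is (Mar, Feb, Jan) = (3, 2, 1)
```

Stacking such blocks side by side for all K monthly regressors yields the T × K* design matrix X, with K* = 3K here.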
We split the data set into a training sample of 45 quarters (2004q2-2015q2) and a forecast sample of 15 quarters (2015q3-2019q1).

The macro data set pertains to an updated version of the database of Giannone et al. (2016) (henceforth, 'macro data'), which contains 13 time series that are closely watched by professional and institutional forecasters, such as real indicators (industrial production, house starts, total construction expenditure, etc.), price data (CPI, PPI, PCE inflation), financial market data (BAA-AAA spread) and credit, labour and economic uncertainty measures (volume of commercial loans, civilian unemployment, economic uncertainty index, etc.). Table (1) gives an overview of all data along with FRED codes.

The deseasonalisation pertains here to the X-13-ARIMA method and was performed prior to download from the FRED-MD website.

2.3 Google Trends

Google Trends (GT) are indices produced by Google on the relative popularity of a given search term or pre-specified search categories, conditional on a given time frame and location. Our sample comprises 27 Google Trends (overview in C) which have been chosen based on the root term methodology as in Koop and Onorante (2019) and Bock (2018). In general, there is no consensus on how to optimally select search terms for final estimation. Methods which have been proposed in the previous literature fall into: (i) pre-screening through correlation with the target variable as found via Google Correlate (Scott and Varian, 2014; Niesert et al., 2020; Choi and Varian, 2012), (ii) cross-validation (Ferrara and Simoni, 2019), (iii) use of prior economic intuition where search terms are selected through backward induction (e.g. Smith (2016); Ettredge et al. (2005); Askitas and Zimmermann (2009)), and (iv) root terms, which similarly specify a list of search terms through backward induction, but additionally download "suggested" search terms from the Google interface.
This serves to broaden the semantic variety of search terms in a semi-automatic way. As methodologies based on pure correlation do not preclude spurious relationships (Scott and Varian, 2014; Niesert et al., 2020; Ferrara and Simoni, 2019), we opt for the root term methodology, as from the authors' perspective it currently provides the best guarantee of finding economically relevant Google Trends. It is also important to note that in the selection process, search terms are not tossed out based on expected performance, but only when they are simply not related to GDP. Since search terms can display seasonality, we deseasonalise all Google Trends with the Loess filter, as recommended by Scott and Varian (2014), which is implemented with the "stl" command in R, and make sure that they are individually stationary to avoid any spurious correlation.

Finally, our pseudo real-time calendar can be found in table 1 and has been constructed after the data's real publication schedule. It comprises a total of 31 vintages, which make for an equal number of information sets Ω^v_t for v = 1, ...,
31, which are used to construct nowcasts as explained in section 4. In order for the Google Trends search indexes to represent the latest available information within a given month, we treat them as only observable at the end of the given month.

Unfortunately, Google Correlate has suspended updating their databases past 2017.

To mitigate any inaccuracy stemming from sampling error, we downloaded the set of Google Trends seven times between 01/07-08/07/2019 and took the cross-sectional average. Since we used the same IP address and Google-mail account, there might still be some unaccounted measurement error, which could be further mitigated by using web-crawling.

| Vintage | Timing | Release | Variable Name | Pub. lag | Transformation | FRED Code |
|---|---|---|---|---|---|---|
| 0 | First day of month 1 | No information available | - | - | - | - |
| 1 | Last day of month 1 | Fed. funds rate & credit spread | fedfunds & baa | m | 3 | FEDFUNDS & BAAY10 |
| 2 | Last day of month 1 | Google Trends | - | m | 4 | - |
| 3 | 1st bus. day of month 2 | Economic Policy Uncertainty Index | uncertainty | m-1 | 1 | USEPUINDXM |
| 4 | 1st Friday of month 2 | Employment situation | hours & unrate | m-1 | 2 | AWHNONAG & UNRATE |
| 5 | Middle of month 2 | CPI | cpi | m-1 | 2 | CPI |
| 6 | 15th-17th of month 2 | Industrial Production | indpro | m-1 | 2 | INDPRO |
| 7 | 3rd week of month 2 | Credit & M2 | loans & m2 | m-1 | 2 | LOANS & M2 |
| 8 | Later part of month 2 | Housing starts | housst | m-1 | 1 | HOUST |
| 9 | Last week of month 2 | PCE & PCEPI | pce & pce2 | m-1 | 2 | PCE & PCEPI |
| 10 | Last day of month 2 | Fed. funds rate & credit spread | fedfunds & baa | m | 3 | FEDFUNDS & BAAY10 |
| 11 | Last day of month 2 | Google Trends | - | m | 4 | - |
| 12 | 1st bus. day of month 3 | Economic Policy Uncertainty Index | uncertainty | m-1 | 1 | USEPUINDXM |
| 13 | 1st bus. day of month 3 | Construction starts | construction | m-2 | 1 | TTLCONS |
| 14 | 1st Friday of month 3 | Employment situation | hours & unrate | m-1 | 2 | AWHNONAG & UNRATE |
| 15 | Middle of month 3 | CPI | cpi | m-1 | 2 | CPI |
| 16 | 15th-17th of month 3 | Industrial Production | indpro | m-1 | 2 | INDPRO |
| 17 | 3rd week of month 3 | Credit & M2 | loans & m2 | m-1 | 2 | LOANS & M2 |
| 18 | Later part of month 3 | Housing starts | housst | m-1 | 1 | HOUST |
| 19 | Last week of month 3 | PCE & PCEPI | pce & pce2 | m-1 | 2 | PCE & PCEPI |
| 20 | Last day of month 3 | Fed. funds rate & credit spread | fedfunds & baa | m | 3 | FEDFUNDS & BAAY10 |
| 21 | Last day of month 3 | Google Trends | - | m | 4 | - |
| 22 | 1st bus. day of month 4 | Economic Policy Uncertainty Index | uncertainty | m-1 | 1 | USEPUINDXM |
| 23 | 1st bus. day of month 4 | Construction starts | construction | m-2 | 1 | TTLCONS |
| 24 | 1st Friday of month 4 | Employment situation | hours & unrate | m-1 | 2 | AWHNONAG & UNRATE |
| 25 | Middle of month 4 | CPI | cpi | m-1 | 2 | CPI |
| 26 | 15th-17th of month 4 | Industrial Production | indpro | m-1 | 2 | INDPRO |
| 27 | 3rd week of month 4 | Credit & M2 | loans & m2 | m-1 | 2 | LOANS & M2 |
| 28 | Later part of month 4 | Housing starts | housst | m-1 | 1 | HOUST |
| 29 | Last week of month 4 | PCE & PCEPI | pce & pce2 | m-1 | 2 | PCE & PCEPI |
| 30 | Later part of month 5 | Housing starts | housst | m-2 | 1 | HOUST |

Table 1: Pseudo real-time calendar based on actual publication dates. Transformation: 1 = monthly change, 2 = monthly growth rate, 3 = no change, 4 = LOESS decomposition. Pub. lag: m = data for the given month within the reference period, m-1 = data with one month's lag to publication in the reference period, m-2 = data with 2 months' lag to publication in the reference period.
Methodology
The original state space formulation of Scott and Varian (2014) collects the states τ_t, α_t and δ_t in (1) and estimates them jointly via the forward filtering backward sampling (FFBS) algorithm of Durbin and Koopman (2002), which is based on the Kalman filter. While very popular, the recursive structure of the algorithm is costly in terms of computation but, more importantly, relies on independent Normal-Inverse Gamma (N-IG) priors for the states and state variances, which Frühwirth-Schnatter and Wagner (2010) show can lead to overfitting state spaces and therefore imprecise forecasts. This is due to the fact that in state space models, the state variances determine whether a state process is fixed or time-varying (see equation (1)). Instead, Frühwirth-Schnatter and Wagner (2010) propose a non-centred representation of the state space which dissects the dynamics into a non-time-varying and a time-varying component, where the former models the state standard deviation directly in the observation equation. A normal prior on the state standard deviation can be shown to imply a Gamma prior on the state variance, which allows for far more mass on 0, therefore applying more shrinkage. Taking (1) as our centred state space (and ignoring X for now), we re-write it equivalently as:

$$
y_t = \tau_0 + \sigma_\tau \tilde{\tau}_t + t\alpha_0 + \sigma_\alpha \sum_{s=1}^{t} \tilde{\alpha}_s + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2_y) \tag{3}
$$

where

$$
\tilde{\tau}_t = \tilde{\tau}_{t-1} + \tilde{u}^{\tau}_t, \quad \tilde{u}^{\tau}_t \sim N(0,1), \qquad
\tilde{\alpha}_t = \tilde{\alpha}_{t-1} + \tilde{u}^{\alpha}_t, \quad \tilde{u}^{\alpha}_t \sim N(0,1) \tag{4}
$$

with starting values τ̃_0 = α̃_0 = 0. To see that (3) and (4) are equivalent to (1), let:

$$
\alpha_t = \alpha_0 + \sigma_\alpha \tilde{\alpha}_t, \qquad
\tau_t = \tau_0 + \sigma_\tau \tilde{\tau}_t + t\alpha_0 + \sigma_\alpha \sum_{s=1}^{t} \tilde{\alpha}_s \tag{5}
$$

Substituting into y_t = τ_t + ε_t, it is clear that

$$
\begin{aligned}
\alpha_t - \alpha_{t-1} &= \sigma_\alpha(\tilde{\alpha}_t - \tilde{\alpha}_{t-1}) = \sigma_\alpha \tilde{u}^{\alpha}_t \\
\tau_t - \tau_{t-1} &= \alpha_0 + \sigma_\alpha \tilde{\alpha}_t + \sigma_\tau(\tilde{\tau}_t - \tilde{\tau}_{t-1}) = \alpha_t + \sigma_\tau \tilde{u}^{\tau}_t
\end{aligned} \tag{6}
$$

which recovers (1). Since σ_τ and σ_α are allowed to have support on the real line, they are not identified in multiplication with the states: the likelihood is invariant to changes in the signs of σ_α and σ_τ. Consequently, mixing of the posterior state standard deviations can be bad and their distributions are likely to be bi-modal (Frühwirth-Schnatter and Wagner, 2010). This issue is combated by randomly permuting signs in the Gibbs sampler, as explained below. Similar to Frühwirth-Schnatter and Wagner (2010), we assume normal priors centred at 0 for σ_i: σ_i ∼ N(0, V_i) for all i ∈ {τ, α}. Collecting all state space parameters in θ = (τ_0, α_0, σ_τ, σ_α), we assume an independent multivariate normal prior with diagonal covariance:

$$
\theta \sim N(\theta_0, V_\theta) \tag{7}
$$

While the state processes {τ̃_t, α̃_t}_{t=1}^T can be estimated by any state space algorithm, we opt for the precision sampler method of Chan (2017), as outlined in Appendix (A.2.1). In contrast to FFBS-type algorithms, it samples the states without recursive estimation, which speeds up computation significantly.

Alternatively, one could use bridge methods as in Ferrara and Simoni (2019), who update monthly GT indexes on a weekly basis. We leave this for future investigation.

Variable selection in the BSTS model of Scott and Varian (2014) is done via a two-component conjugate spike-and-slab prior which utilises a variant of Zellner's g-prior and a fixed expected model size.
While computationally fast due to the assumption of conjugacy, many high-dimensional problems benefit from prior independence (Moran et al., 2018) and a fully hierarchical formulation that lets the data decide on the most likely values of the parameters (Ishwaran et al., 2005). We therefore follow Ishwaran et al. (2005)'s extension to the SSVS prior, the Normal-Inverse-Gamma prior:

$$
\begin{aligned}
\beta_j \mid \gamma_j, \delta_j &\sim \gamma_j N(0, \delta_j) + (1-\gamma_j) N(0, c \times \delta_j) \\
\delta_j &\sim \mathrm{Gamma}(a_1, a_2) \\
\gamma_j &\sim \mathrm{Bernoulli}(\pi) \\
\pi &\sim \mathrm{Beta}(b_1, b_2)
\end{aligned} \tag{8}
$$

where j ∈ (1, ..., K). The intuition compared to the spike-and-slab prior of Scott and Varian (2014) remains the same in that the covariate's effect is modeled by a mixture of normals, where it is either shrunk close to zero via a narrow distribution around zero, the spike component, or estimated freely through a relatively diffuse normal distribution, the slab component. Sorting into each component is handled through an indicator variable, γ_j, and the hyperparameter c is chosen to be a very small number, thereby forcing shrinkage of noise variables to close to zero. While in the original BSTS model the indicator variable γ_j depends on a fixed prior π which governs the prior inclusion probability of a variable, (8) allows for it to be estimated from the data through another level of hierarchy. We set b_1 = b_2 = 1, which effectively assumes that any expected model size is a priori possible and thus allows for sparse but also dense model solutions, as recommended by Giannone et al. (2017). Finally, the prior variance δ_j is also allowed to be hierarchical. Posteriors are standard and described in the appendix (A.1). The posterior of γ_j is of special interest to the analyst, as it gives a data-informed measure of the importance of a variable. Namely, p(γ_j | y) can be interpreted as the posterior inclusion probability of a variable.
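For intuition, one Gibbs update of the indicators γ_j compares the spike and slab likelihoods of the current draw of β_j. The sketch below is our own illustration with toy values (the exact full conditionals are in Appendix A.1); all function names are ours.

```python
import numpy as np

def norm_pdf(b, var):
    return np.exp(-0.5 * b**2 / var) / np.sqrt(2 * np.pi * var)

def draw_gamma(beta, delta, pi, c=1e-4, rng=None):
    """One Gibbs draw of inclusion indicators for the mixture prior (8):
    slab weight pi*N(beta_j; 0, delta_j) vs spike weight (1-pi)*N(beta_j; 0, c*delta_j)."""
    rng = rng or np.random.default_rng()
    w_slab = pi * norm_pdf(beta, delta)
    w_spike = (1 - pi) * norm_pdf(beta, c * delta)
    p_incl = w_slab / (w_slab + w_spike)
    return rng.random(beta.shape) < p_incl, p_incl

beta = np.array([2.0, 0.001])   # one strong effect, one negligible effect
gamma, p_incl = draw_gamma(beta, delta=np.ones(2), pi=0.5,
                           rng=np.random.default_rng(0))
```

A large coefficient lands in the slab with probability near one, while a near-zero coefficient is allocated to the spike; averaging the indicator draws over the chain yields the posterior inclusion probabilities discussed above.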
The horseshoe prior, like many recently popularised shrinkage priors, belongs to the broader class of global-local priors, which take the following general form:

$$
\begin{aligned}
\beta_j \mid \lambda_j, \nu, \sigma &\sim N(0, \lambda_j^2 \nu^2 \sigma^2), \quad j \in (1, \ldots, K) \\
\lambda_j &\sim \pi(\lambda_j)\, d\lambda_j, \quad j \in (1, \ldots, K) \\
\nu &\sim \pi(\nu)\, d\nu
\end{aligned} \tag{9}
$$

The idea of this family of priors is that the global scale ν controls the overall shrinkage applied to the regression, while the local scale λ_j allows for the local possibility of regressors escaping shrinkage when they have large effects on the response. A variety of global-local shrinkage priors have been proposed (see Polson and Scott (2010)), but here we focus on arguably the most popular, the horseshoe prior of Carvalho et al. (2010), which employs two half-Cauchy distributions for ν and λ_j:

$$
\lambda_j \sim C^+(0, 1), \qquad \nu \sim C^+(0, 1) \tag{10}
$$

It can further be shown that these two fat-tailed priors imply a shrinkage profile that has the spike-and-slab prior in its limit and therefore offers a continuous approximation to the SSVS (Piironen et al., 2017). Due to its special connection to frequentist shrinkage priors (Polson and Scott, 2010), it can be shown that it not only offers good finite sample performance, but also has favourable asymptotic behaviour compared to competing global priors (Bhadra et al., 2019). Posteriors are described in the appendix (A.1).
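The 'horseshoe' name comes from the implied shrinkage factor κ_j = 1/(1 + λ_j²): with ν = σ = 1 and a unit design, κ_j follows a Beta(1/2, 1/2) distribution, piling mass near 0 (coefficients left untouched) and near 1 (coefficients shrunk towards zero). A quick Monte Carlo check of this textbook property (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10_000
lam = np.abs(rng.standard_cauchy(K))   # local scales: lambda_j ~ C+(0, 1)
kappa = 1.0 / (1.0 + lam**2)           # shrinkage factor with nu = sigma = 1

# the U-shape: substantial mass in both tails of [0, 1]
frac_untouched = np.mean(kappa < 0.1)  # signals that escape shrinkage
frac_killed = np.mean(kappa > 0.9)     # noise shrunk towards zero
```

Both tail fractions come out near 0.20, matching the Beta(1/2, 1/2) tail probability (2/π)·arcsin(√0.1) ≈ 0.205; it is this simultaneous support for 'keep' and 'kill' that makes the prior a continuous stand-in for the discrete spike-and-slab.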
Although the horseshoe prior will shrink noise variables close to zero, the importance of a variable for nowcasts may not be immediately clear from posterior summary statistics of the coefficients, especially when the posterior is multi-modal. To aid interpretability and simultaneously preserve predictive ability, we apply the SAVS algorithm of Ray and Bhattacharya (2018) to the posterior coefficients on a draw-by-draw basis. The algorithm uses a useful heuristic, inspired by frequentist lasso estimation, to threshold posterior regression coefficients to zero:

$$
\phi_j = \mathrm{sign}(\hat{\beta}_j)\, \|X_j\|^{-2} \max\left( |\hat{\beta}_j| \|X_j\|^2 - \kappa_j,\; 0 \right), \tag{11}
$$

where X_j = (x_{j1}, ..., x_{jT})' is the j-th column of the regressor matrix X, sign(x) returns the sign of x, and β̂ represents a draw from the regression posterior. The parameter κ_j in (11) acts as a threshold for each coefficient, akin to the penalty parameter in lasso regression, which could be selected via cross-validation. Ray and Bhattacharya (2018) propose a simpler solution,

$$
\kappa_j = \frac{1}{|\hat{\beta}_j|^2}, \tag{12}
$$

which penalises a given coefficient in inverse proportion to its square and provides good performance compared to alternative penalty levels (Ray and Bhattacharya, 2018; Huber et al., 2019). To see the similarity to lasso-style regularisation, the solution to (11) can be obtained from the following minimisation problem, which is reminiscent of Zou (2006):

$$
\phi^* = \underset{\phi}{\mathrm{argmin}} \left\{ \|X\hat{\beta} - X\phi\|^2 + \sum_{j=1}^{K} \kappa_j |\phi_j| \right\} \tag{13}
$$

where φ* is the sparsified regression vector. The relative frequency of non-zero entries in the posterior coefficient vector can, analogously to the SSVS posterior, be interpreted as posterior inclusion probabilities. Integrating over the uncertainty of the parameters to obtain the predictive distribution p(ỹ | y), we receive something similar to a Bayesian Model Averaged (BMA) posterior (Koop and Onorante, 2019).
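The per-draw sparsification in (11)-(12) is a one-line soft-threshold. A sketch (our own function name; the posterior draw and design matrix below are placeholders, not the paper's data):

```python
import numpy as np

def savs(beta_draw, X):
    """Signal-adaptive variable selector: eq. (11) with penalty (12),
    applied to one posterior draw of the coefficient vector."""
    beta_draw = np.asarray(beta_draw, dtype=float)
    col_norm2 = (np.asarray(X, dtype=float) ** 2).sum(axis=0)  # ||X_j||^2
    kappa = 1.0 / beta_draw**2                                 # penalty (12)
    shrunk = np.maximum(np.abs(beta_draw) * col_norm2 - kappa, 0.0)
    return np.sign(beta_draw) * shrunk / col_norm2

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
phi = savs(np.array([1.5, -0.8, 0.01]), X)   # third coefficient is noise
```

Because κ_j explodes as β̂_j approaches zero, small draws are thresholded exactly to zero while sizeable draws are barely perturbed, which is why the sparsified chain largely preserves the horseshoe's predictive fit.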
With the conditional posteriors at hand (see A.1), we sample the states as well as the regression parameters with the following Gibbs sampler:

1. Sample (τ̃, α̃ | y, θ, β, σ_y)
2. Sample (θ | y, β, τ̃, α̃, σ_y)
3. Randomly permute the signs of (τ̃, α̃) and (σ_τ, σ_α)
4. Sample (β | y, θ, τ̃, α̃, σ_y)
5. Sample (σ_y | y, θ, τ̃, α̃, β)

As mentioned in 3.1, the states are sampled in a non-recursive fashion which exploits sparse matrix computation and precision sampling. The exact sampling algorithm is given in (A.2.1). After having sampled θ in step 2, we randomly permute the signs of (τ̃, α̃) and (σ_τ, σ_α), as alluded to in 3.1, to aid mixing. Step 4 of the sampler depends on the prior and its respective hyperpriors. While the posterior sampling scheme for the SSVS is standard, we use the efficient posterior sampler of Bhattacharya et al. (2016) to sample the regression coefficients under the horseshoe prior. Compared to Cholesky-based sampling as used for the SSVS, computation speed is markedly improved (see (A.1.1)). Note that in step 4, we perform SAVS sparsification via (11) on a per-iteration basis.

Nowcasting U.S. Real GDP Growth
The predictive model used to generate in- as well as out-of-sample predictions is:

$$
\begin{aligned}
y_t &= \tau_0 + \sigma_\tau \tilde{\tau}_t + t\alpha_0 + \sigma_\alpha \sum_{s=1}^{t} \tilde{\alpha}_s + x_t'\beta + \varepsilon_t, &\quad \varepsilon_t &\sim N(0, \sigma^2_y) \\
\tilde{\tau}_t &= \tilde{\tau}_{t-1} + \tilde{u}^{\tau}_t, &\quad \tilde{u}^{\tau}_t &\sim N(0, 1) \\
\tilde{\alpha}_t &= \tilde{\alpha}_{t-1} + \tilde{u}^{\alpha}_t, &\quad \tilde{u}^{\alpha}_t &\sim N(0, 1)
\end{aligned} \tag{14}
$$

where t = (1, ..., T) is used as the training sample. We estimate three variants of (14) based on priors (8), (10) and (11), as well as the original BSTS model of Scott and Varian (2014) and a simple AR(2) model for comparison. As is standard for BSTS applications, we first compare the in-sample cumulative absolute one-step-ahead forecast error, which is generated as a side product of the state space, as well as the inclusion probabilities of the variables, so as to shed light on which variables drive fit. Out-of-sample nowcasts are generated from the posterior predictive distribution p(y_{T+1} | Ω^v_T) for growth observation y_{T+1}, conditional on the real-time information set Ω^v_T, where v = 1, ..., 31 refers to the vintages within the pseudo real-time calendar (table 1). This results in 31 different nowcasts, which are generated on a rolling basis until the end of the forecast sample, T_end. Variables that have not yet been published as of vintage v are zeroed out, as recommended by Carriero et al. (2015).

Point forecasts are computed as the mean of the posterior predictive distribution and are compared via the real-time root-mean-squared forecast error (RT-RMSFE), which is calculated for each vintage as:

$$
\text{RT-RMSFE} = \sqrt{\frac{1}{T_{end}} \sum_{j=1}^{T_{end}} \left( y_{T+j} - \hat{y}^{v}_{T+j \mid \Omega^{v}_{T+j-1}} \right)^2 }, \tag{15}
$$

where ŷ^v_{T+j|Ω^v_{T+j-1}} is the mean of the posterior predictive for vintage v using information until T + j − 1. Density forecasts are compared via the real-time log predictive density score (RT-LPDS),

$$
\begin{aligned}
\text{RT-LPDS} &= \frac{1}{T_{end}} \sum_{j=1}^{T_{end}} \log p(y_{T+j} \mid \Omega^{v}_{T+j-1}) \\
&= \frac{1}{T_{end}} \sum_{j=1}^{T_{end}} \log \int p(y_{T+j} \mid \Omega^{v}_{T+j-1}, \zeta_{T+j-1})\, p(\zeta_{T+j-1} \mid \Omega^{v}_{T+j-1})\, d\zeta_{T+j-1} \\
&\approx \frac{1}{T_{end}} \sum_{j=1}^{T_{end}} \log \left( \frac{1}{M} \sum_{m=1}^{M} p(y_{T+j} \mid \Omega^{v}_{T+j-1}, \zeta^{m}_{T+j-1}) \right),
\end{aligned} \tag{16}
$$

and the real-time continuously ranked probability score (RT-CRPS),

$$
\text{RT-CRPS} = \frac{1}{T_{end}} \sum_{j=1}^{T_{end}} \left| y_{T+j} - y^{v}_{T+j \mid \Omega^{v}_{T+j-1}} \right| - \frac{1}{2} \left| y^{v,A}_{T+j \mid \Omega^{v}_{T+j-1}} - y^{v,B}_{T+j \mid \Omega^{v}_{T+j-1}} \right|, \tag{17}
$$

where ζ_{T+j−1}, for brevity of notation, collects all model parameters as defined for each model, which are estimated with expanding in-sample information until T + j − 1, and y^{v,A}_{T+j|Ω^v_{T+j-1}} and y^{v,B}_{T+j|Ω^v_{T+j-1}} are independently drawn from the posterior predictive p(y^v_{T+j} | Ω^v_{T+j-1}).

The LPDS, as shown by Frühwirth-Schnatter (1995) in a setting where time-varying and fixed components of a structural state space model are chosen, can be interpreted as a log-marginal likelihood based on the in-sample information and therefore makes for a model-founded scoring function. The RT-CRPS can be thought of as the probabilistic generalisation of the mean absolute forecast error.
Similar to the log-score, it belongs to the broader class of strictly proper scoring functions (Gneiting and Raftery, 2007), which allow for comparing density forecasts in a consistent manner. To facilitate the discussion, the objective is to maximise the RT-LPDS and minimise the RT-CRPS. For all forecast metrics, the predictive distribution used for (15), (16) and (17) is traditionally generated in state space models via the prediction equations of the Kalman filter (Harvey, 1990). We instead use the simpler approximate method of Cogley et al. (2005), which we found to make no practical difference for our sample. The method is described in A.2.2.

Finally, to test whether a state variance is equal to zero, we make use of the Savage-Dickey density ratio test evaluated at σ_{τ,α} = 0:

$$
\mathrm{DS} = \frac{p(\sigma_{\tau,\alpha} = 0)}{p(\sigma_{\tau,\alpha} = 0 \mid y)} \tag{18}
$$

It can be shown that for nested models, the DS statistic is equivalent to the Bayes factor, computed as the ratio of the prior to the posterior density of the parameter of interest at zero (Verdinelli and Wasserman, 1995). The intuition for the test is simple: if the prior probability density function (PDF) allocates more mass at 0 than the posterior does at that point, there is evidence in favour of the unrestricted model, i.e., σ_{τ,α} ≠ 0. While the priors for the state variances have well-known forms and can thus be evaluated analytically, we estimate the denominator for all models through Monte Carlo integration.

We do not report calibration tests, as there are too few out-of-sample observations to meaningfully determine calibration.

Although the CRPS is a symmetric scoring function, it penalises outliers less aggressively than the log-score, which is an advantage in small forecast samples such as ours.
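The point and density scores can be sketched directly from predictive draws. The helpers below are our own illustration (the CRPS uses the standard two-sample estimator E|Y − y| − ½E|Y − Y′|, with the second copy formed by permuting the draws):

```python
import numpy as np

def rt_rmsfe(y_true, y_hat):
    """Root-mean-squared forecast error as in (15), for one vintage's
    sequence of point nowcasts."""
    y_true, y_hat = np.asarray(y_true, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y_true - y_hat) ** 2))

def crps(y, draws, rng=None):
    """Sample CRPS for a single outcome y from posterior predictive draws."""
    rng = rng or np.random.default_rng(0)
    draws = np.asarray(draws, float)
    pairs = rng.permutation(draws)   # acts as an independent second copy
    return np.mean(np.abs(draws - y)) - 0.5 * np.mean(np.abs(draws - pairs))
```

A predictive distribution concentrated exactly on the outcome scores zero under both metrics, and a sharper predictive centred on the truth beats a diffuse one, which is the 'strictly proper' behaviour referenced above.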
4.1 In-Sample Results

Figure (2) shows the in-sample cumulative one-step-ahead prediction errors for the proposed priors, where the information set pertains to the whole in-sample period without ragged edges. From figure (2), it is clear that the horseshoe prior BSTS (HS-BSTS) provides the best in-sample predictions at all time periods. The HS-SAVS-BSTS and SSVS-BSTS initially provide similar fit, but diverge in performance around the financial crisis. It is striking that, compared to the former two, the HS-BSTS provides very stable performance, as indicated by a nearly linear increase in errors throughout the entire estimation sample. It is also apparent that the SAVS algorithm is not able to retain the fit of the HS prior alone, which, as we show in the next subsection, is in contrast to the out-of-sample results.

Figure 2: Cumulative one-step-ahead forecast errors in-sample from 3 different models: (1) SSVS-BSTS, (2) HS-BSTS and (3) HS-SAVS-BSTS.

The posterior marginal inclusion probabilities of the SSVS and HS-SAVS priors indicate which variables drive fit and are plotted for the top ten most drawn variables in figure (3). The colors of the bars indicate the sign of a variable's effect when included in the model, on a continuous scale from white (positive relationship) to black (negative relationship), and the prefix 'GT' indicates whether the variable is a Google Trend. The number [0,1,2] appended to a variable indicates its position in a given quarter, with 0 being the latest month. It is clear that the SAVS-extended HS prior allows for larger models compared to the SSVS prior, as there are more variables with larger posterior inclusion probability. This is confirmed by the posterior distribution of model sizes (see figure (4)).

Figure 3: Posterior inclusion probabilities of (left) the SSVS-BSTS and (right) the HS-SAVS-BSTS prior.
Next to differences in model selection, the inclusion probabilities of a given selected skip-sampled variable within a given quarter also indicate a difference in how the priors deal with correlated data. The SSVS prior tends to select only the most dominant of the skip-sampled information, while the SAVS extended HS prior allocates significant inclusion probability to all months within a quarter. For example, while the SSVS prior selects the variable 'pce2', i.e., PCE inflation for the first month in a given quarter, the HS-SAVS prior allocates nearly the same inclusion probability to all construction start variables ('const0'-'const2'). Hence, it allows for greater model uncertainty, which is what we would expect from correlated macro data as discussed in Cross et al. (2020); Giannone et al. (2017). In contrast to Giannone et al. (2017), the posterior model size distributions of both the SAVS and SSVS algorithms are relatively sparse compared to the overall size of the data set.

Further, of the Google Trends, both priors select the root-term 'real GDP growth' as the most important one. Although, a-priori, it is not clear whether the root term refers to positive or negative news about GDP growth, the posterior sign as indicated by the color of the bar in figure (3) is clearly negative, which corroborates the graphical investigation from figure (9). This gives tentative evidence that this search term acts as a warning signal of downturns in GDP growth, as alluded to in the introduction.

Lastly, we see from figure (5) that there is support in the data for a local trend, but not a local linear trend model: the posterior for σ_τ is clearly bi-modal with less mass on zero than the prior, while the posterior for σ_α has substantially more mass on zero than the prior.
The Bayes factors are 3.74 and 0.29 for the two state standard deviations, respectively.

Figure 4: Posterior distribution of model sizes for (left) the SSVS-BSTS and (right) the HS-SAVS-BSTS.

Figure 5: Distribution of the (left) trend state standard deviation and (right) slope standard deviation for the HS-BSTS model.

Out-of-Sample Results

We now turn to out-of-sample nowcasting performance, where nowcasts are produced following the real time data publication calendar as explained in section 2. We first evaluate point and then density fit. RT-RMSFE are plotted in figure (6) for the competing non-centred BSTS estimators, as well as the AR(2) benchmark and the original BSTS model. Notice that in all nowcast figures, we represent vintages in which Google Trends are published by grey vertical bars and plot the results for the original BSTS model on the right axis for readability. The following points emerge from figure (6): firstly, it is clear that all proposed BSTS models based on the non-centred state space offer large performance gains over the original BSTS model. Secondly, all models almost monotonically increase in precision as more data are released, where, as expected, the BSTS models eventually outperform the AR(2) benchmark. Thirdly, among non-centred BSTS models, the HS-BSTS does best, however it is closely followed by the HS-SAVS-BSTS. Hence, the HS-SAVS-BSTS is largely able to preserve fit, which is expected as the SAVS algorithm thresholds to zero those noise variables that the horseshoe prior already shrinks close to zero. Compared to the SSVS-BSTS, the horseshoe prior based BSTS models offer 15-20% improvements in terms of RT-RMSFE. Finally, Google Trends provide only modest improvements in point nowcast accuracy: only in the first vintage in which Google Trends are published do the estimators show slight improvements in fit.

Figure 6: Real-Time RMSFE of all competing models. The RT-RMSFE for the BSTS are plotted on the right axis.
Grey vertical bars indicate vintages in which GT are published.

Similar to the real-time point forecasts, we plot real-time LPDS (RT-LPDS) and CRPS (RT-CRPS) in figures (7) and (8). The RT-LPDS and RT-CRPS mostly confirm the findings for the point nowcasts. As more information comes in, density nowcasts become more accurate, as can be seen from the increasing lines in (7) and decreasing lines in (8). Similar to the RT-RMSFE, the non-centred BSTS models offer large improvements over the original BSTS model of Scott and Varian (2014), and the horseshoe prior models provide the best fit, with, however, a slight advantage for the HS-BSTS model. Improvements of the HS based models over the SSVS-BSTS are of similar magnitude to the point nowcasts. In contrast to the point nowcasts, all non-centred BSTS models quickly outperform the benchmark AR(2) model.

Figure 7: Real-Time log-predictive density scores (RT-LPDS) for all competing models. The RT-LPDS for the BSTS are plotted on the right axis. Grey vertical bars indicate vintages in which GT are published.

Figure 8: Real-Time CRPS scores (RT-CRPS) for all competing models. The RT-CRPS for the BSTS are plotted on the right axis. Grey vertical bars indicate vintages in which GT are published.
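For reference, the CRPS reported above can be estimated directly from posterior predictive draws via the kernel representation of a strictly proper score in Gneiting and Raftery (2007), CRPS(F, y) = E|X − y| − ½E|X − X′|; a minimal sketch (function name ours):

```python
import numpy as np

def crps_from_draws(draws, y):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| for predictive draws X
    (lower is better, consistent with minimising the RT-CRPS)."""
    draws = np.asarray(draws, dtype=float)
    term1 = np.mean(np.abs(draws - y))
    term2 = 0.5 * np.mean(np.abs(draws[:, None] - draws[None, :]))
    return term1 - term2
```

The real-time average over out-of-sample quarters is then a simple mean of these per-observation scores.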
Simulation Study
The empirical application showed that the proposed BSTS models perform better in point as well as density forecasts compared to the original model of Scott and Varian (2014), and that both the SAVS augmented horseshoe prior and the SSVS-BSTS choose models which are relatively sparse compared to the dimensionality of the regressor set. This finding is in contrast to previous studies using macroeconomic data, such as Giannone et al. (2017) and Cross et al. (2020), who find that priors able to accommodate dense models generally outperform sparsity favouring priors. As the innovation in this paper compared to previous work is the estimation of a latent local-linear trend, which might filter out the co-movement with the macro data, we compare the ability of the proposed priors and of the original BSTS model of Scott and Varian (2014) to capture both sparse and dense environments. Further, to make the simulations closer to our empirical application, we additionally test the priors' ability to detect insignificant state variances. Specifically, we simulate local-linear-trend models as in (14) which have either the trend variance, the slope variance, both, or neither equal to zero.

We generate 20 fictitious samples for (σ_τ, σ_α) = {(0.5, 0), (0, 0.5), (0.5, 0.5), (0, 0)} for a dense and a sparse DGP, where the sparse coefficient vector is defined as:

β_sparse = (1, 1/2, 1/3, 1/4, 1/5, 0′_{K−5})′   (19)

and the dense coefficient vector is defined as:

β_dense = ((1/p_d) ι′_{p_d K}, 0′_{K−p_d K})′   (20)

where p_d is set to 2/3. For both coefficient vectors, the dimensionality K is set to 300, which is high dimensional compared to the number of observations T = 150. We account explicitly for mixed frequencies by first generating the covariate matrix according to a multivariate normal distribution with mean 0 and a covariance matrix with its (i, j)th element defined as 0.9^{|i−j|}, and then skip-sampling each covariate individually following the U-MIDAS methodology as in (2).
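The simulation design above can be sketched as follows. The correlation parameter 0.9, the uniform dense coefficient value 1/p_d, and the function names reflect our reading of (19)-(20) and are assumptions; the U-MIDAS skip-sampling step is omitted for brevity:

```python
import numpy as np

def simulate_dgp(T=150, K=300, sigma_tau=0.5, sigma_alpha=0.0,
                 dense=False, p_d=2 / 3, rho=0.9, seed=0):
    """Local-linear-trend DGP with Toeplitz-correlated regressors (a sketch)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(K)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # cov[i, j] = rho^{|i-j|}
    X = rng.multivariate_normal(np.zeros(K), Sigma, size=T)
    beta = np.zeros(K)
    if dense:
        beta[: round(p_d * K)] = 1.0 / p_d               # fraction p_d of nonzeros
    else:
        beta[:5] = 1.0 / np.arange(1, 6)                 # (1, 1/2, 1/3, 1/4, 1/5)
    tau = np.zeros(T)
    alpha = np.zeros(T)
    for t in range(1, T):                                # local linear trend
        alpha[t] = alpha[t - 1] + sigma_alpha * rng.normal()
        tau[t] = tau[t - 1] + alpha[t - 1] + sigma_tau * rng.normal()
    y = tau + X @ beta + rng.normal(size=T)
    return y, X, beta, tau

def root_mean_coefficient_bias(beta_hat, beta):
    """Root mean coefficient bias: sqrt(||beta_hat - beta||^2 / K)."""
    return np.sqrt(np.sum((beta_hat - beta) ** 2) / len(beta))
```

Each prior can then be scored against the known β and state variances, as is done in the remainder of this section.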
Since in the simulations we know the true regression coefficient values as well as the state variances, we compare the performance of the different priors via coefficient bias for the regression coefficients and Dickey-Savage density ratios evaluated at zero state variances. Bias is calculated as

Root Mean Coefficient Bias = √(‖β̂ − β‖² / K),   (21)

where β̂ refers to the mean of the posterior distribution. We estimate the original BSTS model with the expected model size, π, equal to the true number of non-zero coefficients.

As can be seen from table (2), both the non-centred BSTS models and the original BSTS model of Scott and Varian (2014) do better in sparse than in dense DGPs, which is similar to the finding of Cross et al. (2020). The largest gains in estimation accuracy of the proposed BSTS models over Scott and Varian (2014), however, can be found in dense DGPs, where the proposed estimators offer gains in accuracy well in excess of 50%. In sparse designs, by contrast, the latter slightly outperforms the former. This was to be expected given that the spike-and-slab prior uses a point mass prior on zero and that the true expected model size is used. It is encouraging that the differences in accuracy are very small.

Within the proposed estimators in dense designs, the HS prior BSTS versions are 30-40% more accurate compared to the SSVS-BSTS, which is in line with the empirical application. Hence, these results support the conclusion that continuous shrinkage priors are clearly preferred over spike-and-slab models in dense DGPs with a latent local-linear trend component.

The Dickey-Savage density ratio tests confirm that the non-centred state space models are able to correctly identify which of the state variances are significant and which are not, even in high dimensional regression settings.
It is interesting to note that the Dickey-Savage tests are, however, sensitive to correctly pinning down the regression coefficient vector: in dense designs, where the SSVS prior does worse than the horseshoe prior, the DS tests in cases (0, 0.5) and (0, 0) show false support for a significant σ_τ. Note that we do not report DS tests for the original BSTS model, since its prior on the state variance has no mass on zero and the restriction is therefore not testable. Further, since the SAVS algorithm is performed on a per-iteration basis after inference, the posterior of σ_{τ,α} remains unaffected, so the HS-SAVS model receives the same DS results as the HS-BSTS model.

                       Sparse                                Dense
(σ_τ, σ_α)     (0.5,0)  (0,0.5)  (0.5,0.5)  (0,0)    (0.5,0)  (0,0.5)   (0.5,0.5)  (0,0)

Bias
HS             0.034    0.036    0.036      0.034    0.293    0.289     0.289      0.281
HS-SAVS        0.035    0.035    0.035      0.035    0.33     0.327     0.32       0.321
SSVS           0.035    0.038    0.036      0.036    0.415    0.416     0.421      0.418
BSTS           0.02     0.02     0.021      0.021    0.795    0.567     0.582      0.579

DS (σ_τ = 0)
HS             516.78   0.81     4.267      0.891    1.959    0.701     3.521      1.493
SSVS           629.41   0.824    0.89       0.19     10.775   3.053     3.804      1.402

DS (σ_α = 0)
HS             0.062    41.587   722.319    0.026    0.112    1772.907  96.015     0.068
SSVS           0.058    4.63E+10 1.05E+08   0.005    0.071    1.29E+10  7.56E+04   0.021

Table 2: Average Dickey-Savage density ratio and bias results for the simulations.

Conclusion

In this paper, we investigated the added benefit of a host of Google Trends search terms in nowcasts of U.S. real GDP growth through the lens of improved Bayesian structural time series (BSTS) models. We have extended the BSTS of Scott and Varian (2014) to a non-centred formulation which allows state variances to be shrunk to zero in order to avoid overfitting states and therefore lets the data speak about the latent structure. We have further extended and compared priors for the regression part which stay agnostic about the underlying model dimension so as to accommodate both sparse and dense solutions, as well as the widely successful horseshoe prior of Carvalho et al. (2010).

We find that Google Trends improve point as well as density nowcasts in real time within the sample under investigation, where the largest improvements can be expected before macro information trickles in. The search terms with the highest posterior inclusion probability within all models have negative correlation with GDP growth, which we reason to stem from them signalling special attention likely due to expected large troughs. This claim in particular should be investigated with larger Google Trends samples and for other countries. The superior performance of the proposed models over the original BSTS model is confirmed in a simulation study, which shows that among the proposed models the horseshoe prior BSTS performs best and that the largest gains in estimation accuracy can be expected in dense DGPs. It further confirms that the non-centred state priors are able to correctly identify the latent structure, but are sensitive to the efficacy of the regression prior in detecting signals from noise.

An aspect which remained unexplored in this study is that Google Trends might have time-varying importance in relationship to the macro variable under investigation, as highlighted by Koop and Onorante (2019). Search terms can be highly contextual and might therefore be able to predict turning points in some periods but not in others. Given the limited quarterly observations of Google Trends, this research question will grow in importance over time.

References

Alexopoulos, M. and J. Cohen (2015). The power of print: Uncertainty shocks, markets, and the economy.
International Review of Economics & Finance 40, 8–28.

Antolin-Diaz, J., T. Drechsel, and I. Petrella (2017). Tracking the slowdown in long-run GDP growth. Review of Economics and Statistics 99(2), 343–356.

Askitas, N. and K. F. Zimmermann (2009). Google econometrics and unemployment forecasting.

Baker, S. R., N. Bloom, and S. J. Davis (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics 131(4), 1593–1636.

Bańbura, M., D. Giannone, M. Modugno, and L. Reichlin (2013). Now-casting and the real-time data flow. In Handbook of Economic Forecasting, Volume 2, pp. 195–237. Elsevier.

Belmonte, M. A., G. Koop, and D. Korobilis (2014). Hierarchical shrinkage in time-varying parameter models. Journal of Forecasting 33(1), 80–94.

Bhadra, A., J. Datta, N. G. Polson, B. Willard, et al. (2019). Lasso meets horseshoe: A survey. Statistical Science 34(3), 405–427.

Bhattacharya, A., A. Chakraborty, and B. K. Mallick (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika 103(4), 985–991.

Bitto, A. and S. Frühwirth-Schnatter (2019). Achieving shrinkage in a time-varying parameter model framework. Journal of Econometrics 210(1), 75–97.

Bock, J. (2018). Quantifying macroeconomic expectations in stock markets using Google Trends. Available at SSRN 3218912.

Bok, B., D. Caratelli, D. Giannone, A. M. Sbordone, and A. Tambalotti (2018). Macroeconomic nowcasting and forecasting with big data. Annual Review of Economics 10, 615–643.

Carriero, A., T. E. Clark, and M. Marcellino (2015). Realtime nowcasting with a Bayesian mixed frequency model with stochastic volatility. Journal of the Royal Statistical Society: Series A (Statistics in Society) 178(4), 837.

Carter, C. K. and R. Kohn (1994). On Gibbs sampling for state space models. Biometrika 81(3), 541–553.

Carvalho, C. M., N. G. Polson, and J. G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480.

Chan, J. C. (2017). Notes on Bayesian macroeconometrics. Manuscript available at http://joshuachan.org.

Choi, H. and H. Varian (2012). Predicting the present with Google Trends. Economic Record 88, 2–9.

Clark, P. K. (1987). The cyclical component of US economic activity. The Quarterly Journal of Economics 102(4), 797–814.

Cogley, T., S. Morozov, and T. J. Sargent (2005). Bayesian fan charts for UK inflation: Forecasting and sources of uncertainty in an evolving monetary system. Journal of Economic Dynamics and Control 29(11), 1893–1925.

Cross, J. L., C. Hou, and A. Poon (2020). Macroeconomic forecasting with large Bayesian VARs: Global-local priors and the illusion of sparsity. International Journal of Forecasting.

Croushore, D. (2006). Forecasting with real-time macroeconomic data. Handbook of Economic Forecasting 1, 961–982.

Durbin, J. and S. J. Koopman (2002). A simple and efficient simulation smoother for state space time series analysis. Biometrika 89(3), 603–616.

D'Amuri, F. and J. Marcucci (2017). The predictive power of Google searches in forecasting US unemployment. International Journal of Forecasting 33(4), 801–816.

Ettredge, M., J. Gerdes, and G. Karuga (2005). Using web-based search data to predict macroeconomic statistics. Communications of the ACM 48(11), 87–92.

Ferrara, L. and A. Simoni (2019). When are Google data useful to nowcast GDP? An approach via pre-selection and shrinkage.

Frühwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. Journal of Time Series Analysis 15(2), 183–202.

Frühwirth-Schnatter, S. (1995). Bayesian model discrimination and Bayes factors for linear Gaussian state space models. Journal of the Royal Statistical Society: Series B (Methodological) 57(1), 237–246.

Frühwirth-Schnatter, S. and H. Wagner (2010). Stochastic model specification search for Gaussian and partial non-Gaussian state space models. Journal of Econometrics 154(1), 85–100.

Gentzkow, M., B. Kelly, and M. Taddy (2019). Text as data. Journal of Economic Literature 57(3), 535–74.

George, E. I. and R. E. McCulloch (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88(423), 881–889.

Giannone, D., M. Lenza, and G. E. Primiceri (2017). Economic predictions with big data: The illusion of sparsity.

Giannone, D., F. Monti, and L. Reichlin (2016). Exploiting the monthly data flow in structural forecasting. Journal of Monetary Economics 84, 201–215.

Gneiting, T. and A. E. Raftery (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477), 359–378.

Grant, A. L. and J. C. Chan (2017). A Bayesian model comparison for trend-cycle decompositions of output. Journal of Money, Credit and Banking 49(2-3), 525–552.

Guzman, G. (2011). Internet search behavior as an economic forecasting tool: The case of inflation expectations. Journal of Economic and Social Measurement 36(3), 119–167.

Harvey, A. (2006). Forecasting with unobserved components time series models. Handbook of Economic Forecasting 1, 327–412.

Harvey, A. C. (1990). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

Huber, F., G. Koop, and L. Onorante (2019). Inducing sparsity and shrinkage in time-varying parameter models. arXiv preprint arXiv:1905.10787.

Ishwaran, H., J. S. Rao, et al. (2005). Spike and slab variable selection: frequentist and Bayesian strategies. The Annals of Statistics 33(2), 730–773.

Jurado, K., S. C. Ludvigson, and S. Ng (2015). Measuring uncertainty. American Economic Review 105(3), 1177–1216.

Kim, C.-J. and C. R. Nelson (1999). Has the US economy become more stable? A Bayesian approach based on a Markov-switching model of the business cycle. Review of Economics and Statistics 81(4), 608–616.

Konrath, S., T. Kneib, and L. Fahrmeir (2008). Bayesian regularisation in structured additive regression models for survival data.

Koop, G. and L. Onorante (2019). Macroeconomic nowcasting using Google probabilities. Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part A (Advances in Econometrics) 40, 17–40.

Manela, A. and A. Moreira (2017). News implied volatility and disaster concerns. Journal of Financial Economics 123(1), 137–162.

McConnell, M. M. and G. Perez-Quiros (2000). Output fluctuations in the United States: What has changed since the early 1980's? American Economic Review 90(5), 1464–1476.

McLaren, N. and R. Shanbhogue (2011). Using internet search data as economic indicators. Bank of England Quarterly Bulletin (2011), Q2.

Moran, G. E., V. Ročková, E. I. George, et al. (2018). Variance prior forms for high-dimensional Bayesian variable selection. Bayesian Analysis, 1091–1119.

Niesert, R. F., J. A. Oorschot, C. P. Veldhuisen, K. Brons, and R.-J. Lange (2020). Can Google search data help predict macroeconomic series? International Journal of Forecasting 36(3), 1163–1172.

Piironen, J., A. Vehtari, et al. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics 11(2), 5018–5051.

Polson, N. G. and J. G. Scott (2010). Shrink globally, act locally: Sparse Bayesian regularization and prediction. Bayesian Statistics 9, 501–538.

Ray, P. and A. Bhattacharya (2018). Signal adaptive variable selector for the horseshoe prior. arXiv preprint arXiv:1810.09004.

Scott, S. L. and H. R. Varian (2014). Predicting the present with Bayesian structural time series. International Journal of Mathematical Modelling and Numerical Optimisation 5(1-2), 4–23.

Shapiro, A. H., M. Sudhof, and D. Wilson (2020). Measuring news sentiment. Federal Reserve Bank of San Francisco.

Sims, C. (2012). Comment to Stock and Watson (2012). Brookings Papers on Economic Activity, Spring, 81–156.

Sims, C. A. (2002). The role of models and probabilities in the monetary policy process. Brookings Papers on Economic Activity 2002(2), 1–40.

Smith, P. (2016). Google's MIDAS touch: Predicting UK unemployment with internet search data. Journal of Forecasting 35(3), 263–284.

Suhoy, T. et al. (2009). Query indices and a 2008 downturn: Israeli data. Technical report, Bank of Israel.

Verdinelli, I. and L. Wasserman (1995). Computing Bayes factors using a generalization of the Savage-Dickey density ratio. Journal of the American Statistical Association 90(430), 614–618.

Vosen, S. and T. Schmidt (2011). Forecasting private consumption: survey-based indicators vs. Google Trends. Journal of Forecasting 30(6), 565–578.

Watson, M. W. (1986). Univariate detrending methods with stochastic trends. Journal of Monetary Economics 18(1), 49–75.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476), 1418–1429.
A Appendix
A.1 Posteriors
In this section of the appendix, we provide the conditional posterior distributions for the regression parameters.

A.1.1 Horseshoe Prior
Starting from model (3), assume that the states and state variances have already been drawn in steps 1.–3. of section 3.5 (see A.2.1 for the posterior of the states). We subtract off τ such that y − τ = y* = Xβ + ε, ε ~ N(0, σ²I_T). Printing the prior here again for convenience:

β_j | λ_j, ν, σ² ~ N(0, λ_j² ν² σ²), j ∈ 1, …, K
λ_j ~ C⁺(0, 1)
ν ~ C⁺(0, 1)
σ² ~ f   (22)

Then, by standard calculations (see Bhattacharya et al. (2016)):

β | y*, λ, ν, σ² ~ N(A⁻¹X′y*, σ²A⁻¹)
A = X′X + Λ*⁻¹
Λ* = ν² diag(λ₁², …, λ_K²)   (23)

Instead of computing the large dimensional inverse A⁻¹ directly, we rely on a data augmentation technique introduced by Bhattacharya et al. (2016). This reduces the computational complexity from O(K³) to O(T²K). Suppose the posterior is normal N_K(μ, Σ) with

Σ = (Φ′Φ + D⁻¹)⁻¹, μ = ΣΦ′α,   (24)

where α ∈ R^{T×1}, Φ ∈ R^{T×K} and D ∈ R^{K×K} is symmetric positive definite. Bhattacharya et al. (2016) show that an exact sampling algorithm is given by:

Algorithm 1
Fast Horseshoe Sampler

1. Sample independently u ~ N(0, D) and δ ~ N(0, I_T)
2. Set ξ = Φu + δ
3. Solve (ΦDΦ′ + I_T)w = (α − ξ)
4. Set θ = u + DΦ′w

Notice that here Φ = X/σ, D = σ²Λ* and α = y*/σ.

A.1.2 SSVS Prior

Conditioning on the states as in A.1.1, we apply the prior:

β_j | γ_j, δ_j² ~ γ_j N(0, δ_j²) + (1 − γ_j) N(0, c × δ_j²)
δ_j² ~ G⁻¹(a₁, a₂)
γ_j ~ Bernoulli(π₀)
π₀ ~ B(b₁, b₂),   (25)

where G⁻¹ and B stand for the inverse gamma and beta distribution respectively. The conditional posteriors are standard and derived for example in George and McCulloch (1993) and Ishwaran et al. (2005). The difference to the prior of George and McCulloch (1993) lies in the additional prior for δ_j², which is assumed to be inverse gamma. It can be shown that this implies marginally a mixture of student-t distributions for β_j (Konrath et al., 2008). We sample from the conditional posteriors in the following way:

Algorithm 2
SSVS Sampler

1. For j ∈ {1, …, K}, sample each γ_j | β_j, δ_j², π₀, y ∝ (1 − π₀) N(β_j | 0, c × δ_j²) I_{γ_j=0} + π₀ N(β_j | 0, δ_j²) I_{γ_j=1}
2. Sample π₀ ~ B(b₁ + n₁, b₂ + K − n₁), where n₁ = Σ_j I_{γ_j=1}
3. Sample β | γ, δ², σ², y ~ N(A⁻¹X′y*/σ², A⁻¹), where A = X′X/σ² + D⁻¹ and D = diag(δ_j²γ_j)
4. Sample σ² ~ G⁻¹(c₁, C₁), where c₁ = c + T/2, C₁ = C + (y* − Xβ)′(y* − Xβ)/2 and the prior is p(σ²) ~ G⁻¹(c, C)

A.2 State Space Estimation and Forecasting
A.2.1 Estimation
Assume analogously to A.1.1 and A.1.2 that all regression parameters have been sampled, such that conditionally on β we estimate

y − Xβ = ŷ, with ŷ_t = τ₀ + σ_τ τ̃_t + tα₀ + σ_α Σ_{s=1}^t α̃_s + ε_t, ε_t ~ N(0, σ_y²),

and

τ̃_t = τ̃_{t−1} + u_t^τ, u_t^τ ~ N(0, 1)
α̃_t = α̃_{t−1} + u_t^α, u_t^α ~ N(0, 1).

Since {τ̃_t, α̃_t}_{t=1}^T are independent of the other parameters in the non-centred formulation, we proceed by first estimating the states and then θ = {τ̃, α̃, σ_τ, σ_α}.

The states {τ̃_t, α̃_t}_{t=1}^T can be sampled by any state space algorithm, e.g. Durbin and Koopman (2002), Carter and Kohn (1994) or Frühwirth-Schnatter (1994). We instead opt for the precision sampler of Chan (2017), which exploits the joint distribution of the states and, paired with sparse matrix operations, yields significant increases in statistical as well as computational efficiency (Grant and Chan, 2017). Since α̃_s enters the observation equation as a sum, we define Ã_t = Σ_{s=1}^t α̃_s. Notice that equation (4) implies that Hα̃ = ũ^α, where H is the first difference matrix and ũ^α ~ N(0, I_T). Notice further that Ã₁ = α̃₁, which implies that Ã_t − Ã_{t−1} = α̃_t. Hence, this gives us back the desired HÃ = α̃. Solving, Ã = H⁻¹α̃ = H⁻¹H⁻¹ũ^α. Therefore

Ã ~ N(0, (H₂′H₂)⁻¹), where H₂ = HH.   (26)

To sample the states jointly, define ξ = (τ̃′, Ã′)′. Then ŷ can be re-written as:

ŷ = τ₀ι_T + α₀t_T + X_ξξ + ε,   (27)

where ι_T is a vector of ones, t_T is defined as (1, 2, …, T)′ and X_ξ = (σ_τ I_T, σ_α I_T). Since X_ξ is a sparse matrix, manipulations in programs which utilise sparse matrix operations will be very fast. Similar calculations result in the implicit prior τ̃ ~ N(0, (H′H)⁻¹). Now, since by assumption τ̃ and Ã are independent, the combined prior for ξ is:

ξ ~ N(0, P_ξ⁻¹),   (28)

where P_ξ = diag(H′H, H₂′H₂).
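The prior precision P_ξ above, combined with the measurement equation, yields a Gaussian posterior for ξ that can be drawn with a single Cholesky factorization of the posterior precision. A dense-matrix sketch (the paper uses sparse operations; the starting values τ₀, α₀ are omitted here for brevity, and the function name is ours):

```python
import numpy as np

def draw_states(y_hat, sigma_tau, sigma_alpha, sigma_y, rng):
    """One joint draw of xi = (tau_tilde', A_tilde')' from its Gaussian posterior
    with precision K = P_xi + X_xi' X_xi / sigma_y^2 (precision-sampler sketch)."""
    T = len(y_hat)
    H = np.eye(T) - np.eye(T, k=-1)          # first-difference matrix
    H2 = H @ H
    P = np.zeros((2 * T, 2 * T))             # prior precision P_xi = diag(H'H, H2'H2)
    P[:T, :T] = H.T @ H
    P[T:, T:] = H2.T @ H2
    X = np.hstack([sigma_tau * np.eye(T), sigma_alpha * np.eye(T)])
    K = P + X.T @ X / sigma_y**2             # posterior precision
    xi_bar = np.linalg.solve(K, X.T @ y_hat / sigma_y**2)
    L = np.linalg.cholesky(K)                # K = L L'
    xi = xi_bar + np.linalg.solve(L.T, rng.normal(size=2 * T))
    return xi[:T], xi[T:]                    # (tau_tilde, A_tilde)
```

In practice K is banded, so sparse Cholesky routines reduce the cost to O(T) per draw, which is the point of the precision sampler.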
The posterior is thus standard:

p(ξ | y, σ_y²) ~ N(ξ̄, K_ξ⁻¹),   (29)

where K_ξ = P_ξ + σ_y⁻²X_ξ′X_ξ and ξ̄ = K_ξ⁻¹(σ_y⁻²X_ξ′(y − τ₀ι_T − α₀t_T)). Conditionally on ξ, the starting values β₀ = (τ₀, α₀)′ are drawn by simple linear regression results, where we specify a generic diagonal prior covariance V_{β₀}.

A.2.2 Forecasting
Taking equation (1) as our starting point, it is well known that the predictive density p(y_t | y^{1:t−1}, β, θ, σ_y²), where y^{1:t−1} = (y₁, …, y_{t−1}), can be generated by the Kalman filter. Since equation (14) is instead estimated by precision sampling, and hence without Kalman recursions, the literature has proposed (1) conditionally optimal Kalman mixture approximations (Bitto and Frühwirth-Schnatter, 2019) and (2) pure simulation based methods to approximate the predictive density (Belmonte et al., 2014). [Put in here the first difference matrix as defined on p. 81, Chan (2017)] Following the simulation approach, p(y_t | y^{1:t−1}) can be generated by first drawing from the non-centred states, which, together with the draws of the other model parameters, yields draws from the predictive density. More specifically, for posterior draw m = 1, …, M:

1. Draw τ̃_t^{(m)} from N(τ̃_{t−1}^{(m)}, 1) and α̃_t^{(m)} from N(α̃_{t−1}^{(m)}, 1)
2. Generate α_t^{(m)} = α₀^{(m)} + σ_α^{(m)}α̃_t^{(m)} and τ_t^{(m)} = τ₀^{(m)} + σ_τ^{(m)}τ̃_t^{(m)} + tα₀^{(m)} + σ_α^{(m)}Σ_{s=1}^t α̃_s^{(m)}
3. Generate y_t^{(m)} = x_t′β^{(m)} + τ_t^{(m)} + σ_y^{(m)}u, where u ~ N(0, 1)

To obtain an estimate of p(y_t | y^{1:t−1}), one can then use a kernel density smoother such as "kdensity" in Matlab.

B Graphs
B.1 In-Sample Results
Figure 9: Posterior inclusion probabilities for the original BSTS model of Scott and Varian (2014).
Data