Real-time Inflation Forecasting Using Non-linear Dimension Reduction Techniques
NIKO HAUZENBERGER¹,², FLORIAN HUBER¹, and KARIN KLIEBER¹,∗
¹University of Salzburg  ²Vienna University of Economics and Business
December 16, 2020
In this paper, we assess whether using non-linear dimension reduction techniques pays off for forecasting inflation in real-time. Several recent methods from the machine learning literature are adopted to map a large dimensional dataset into a lower dimensional set of latent factors. We model the relationship between inflation and these latent factors using state-of-the-art time-varying parameter (TVP) regressions with shrinkage priors. Using monthly real-time data for the US, our results suggest that adding such non-linearities yields forecasts that are on average highly competitive to ones obtained from methods using linear dimension reduction techniques. Zooming into model performance over time moreover reveals that controlling for non-linear relations in the data is of particular importance during recessionary episodes of the business cycle.
JEL: C11, C32, C40, C53, E31
Keywords: Non-linear principal components, machine learning, time-varying parameter regression, density forecasting, real-time data

∗ Corresponding author: Karin Klieber. Salzburg Centre of European Union Studies, University of Salzburg. Address: Mönchsberg 2a, 5020 Salzburg, Austria. Email: [email protected]. We thank Michael Pfarrhofer and Anna Stelzer for valuable comments and suggestions. The authors gratefully acknowledge financial support from the Austrian Science Fund (FWF, grant no. ZK 35) and the Oesterreichische Nationalbank (OeNB, Anniversary Fund, project no. 18127).

1 Introduction
Inflation expectations are used as crucial inputs for economic decision making in central banks such as the European Central Bank (ECB) and the US Federal Reserve (Fed). Given current and expected inflation, economic agents decide on how much to consume, save, and invest. In addition, measures of inflation expectations are often employed to estimate the slope of the Phillips curve or to infer the output gap or the natural rate of interest. Hence, being able to accurately predict inflation is key for designing and implementing appropriate monetary policies in a forward-looking manner.

Although the literature on modeling inflation is voluminous and the efforts invested considerable, predicting inflation remains a difficult task (Stock and Watson, 2007), and simple univariate models are still difficult to beat. The recent literature, however, has shown that using large datasets (Stock and Watson, 2002) and/or sophisticated models (see Koop and Potter, 2007; Koop and Korobilis, 2012; D'Agostino et al., 2013; Koop and Korobilis, 2013; Clark and Ravazzolo, 2015; Chan et al., 2018; Jarocinski and Lenza, 2018) has the potential to improve upon simpler benchmarks.

These studies often extract information from huge datasets. This is commonly achieved by extracting a relatively small number of principal components (PCs) and including them in a second-stage regression model. While this approach performs well empirically, it fails to capture non-linear relations in the dataset. In the presence of non-linearities, using simple PCs potentially reduces predictive accuracy by ignoring important features of the data. Moreover, the regression model that links the PCs with inflation is often assumed to feature constant parameters and homoscedastic errors. In the presence of structural breaks and/or heteroscedasticity, this may adversely affect forecasting accuracy.

Investigating whether allowing for non-linearities in the compression stage pays off for inflation forecasting is the key objective of the present paper. Building on recent advances in machine learning (see Gallant and White, 1992; McAdam and McNelis, 2005; Exterkate et al., 2016; Chakraborty and Joseph, 2017; Heaton et al., 2017; Mullainathan and Spiess, 2017; Feng et al., 2018; Coulombe et al., 2019; Kelly et al., 2019; Medeiros et al., 2019), we adopt several non-linear dimension reduction techniques. The resulting latent factors are then linked to inflation in a second-stage regression. In this second-stage regression we allow for substantial flexibility. Specifically, we consider dynamic regression models that allow for time-varying parameters (TVPs) and stochastic volatility (SV). Since the inclusion of a relatively large number of latent factors can still imply a considerable number of parameters (and this problem is even more severe in the TVP regression case), we rely on state-of-the-art shrinkage techniques.

From an empirical standpoint it is necessary to investigate how these dimension reduction techniques perform over time and during different business cycle phases. We show this using a thorough real-time forecasting experiment for the US. Our forecasting application uses monthly real-time datasets (i.e., the FRED-MD database proposed in McCracken and Ng, 2016) and includes a battery of well-established models commonly used in central banks and other policy institutions to forecast inflation.

Our results show that dimension reduction techniques yield forecasts that are highly competitive to the ones obtained from using linear methods based on PCs.
At first glance, this shows that existing models already perform well and that using more sophisticated methods yields only modest gains in predictive accuracy. However, zooming into model performance over time reveals that controlling for non-linear relations in the data is of particular importance during recessionary episodes of the business cycle.

This finding gives rise to the second contribution of our paper. Since we find that more sophisticated non-linear dimension reduction methods outperform simpler techniques during recessions, we combine the considered models using dynamic model averaging (see Raftery et al., 2010; Koop and Korobilis, 2013). We show that combining our proposed set of models with a variety of standard forecasting models yields predictive densities which are superior to the single best performing model in overall terms. These effects are even more pronounced when interest centers on multi-step-ahead forecasting.

The remainder of this paper is structured as follows. Section 2 discusses a set of dimension reduction techniques. Section 3 introduces the econometric modeling environment that we use to forecast inflation. Section 4 provides the results of the forecasting horse race and introduces weighted combinations of the competing models, including the results of the forecast combinations. The last section summarizes and concludes the paper.
2 Dimension Reduction Techniques

Suppose that we are interested in predicting inflation using a large number of $K$ regressors that we store in a $T \times K$ matrix $X = (x_1, \ldots, x_T)'$, where $x_t$ denotes a $K$-dimensional vector of observations at time $t$. If $K$ is large relative to $T$, estimation of an unrestricted model that uses all columns in $X$ quickly becomes cumbersome and overfitting issues arise. As a solution, dimension reduction techniques are commonly employed (see, e.g., Stock and Watson, 2002; Bernanke et al., 2005). These methods strike a balance between model fit and parsimony. At a very general level, the key idea is to introduce a function $f$ that takes the matrix $X$ as input and yields a lower dimensional representation $Z = f(X) = (z_1, \ldots, z_T)'$, which is of dimension $T \times q$, as output. The critical assumption to achieve parsimony is that $K \gg q$. The latent factors in $Z$ are then linked to inflation through a dynamic regression model (see Section 3).

The function $f: \mathbb{R}^{T \times K} \to \mathbb{R}^{T \times q}$ is typically assumed to be linear, with the most prominent example being PCs. In this paper, we consider several choices of $f$ that range from linear to highly non-linear (such as manifold learning as well as deep learning algorithms) specifications. We subsequently analyze how these different specifications impact inflation forecasting accuracy. In the following subsections, we briefly discuss the different techniques and refer to the original papers for additional information.

2.1 Principal Component Analysis

Minor alterations of the main PCA algorithm allow for introducing non-linearities in two ways. First, we can introduce a non-linear function $g$ that maps the covariates onto a matrix $W = g(X)$. Second, we could alter the sample covariance matrix (the kernel) with a function $h$: $\kappa = h(W'W)$. Both $W$ and $\kappa$ form the two main ingredients of a general PCA reducing the dimension to $q$, as outlined below (for details, see Schölkopf et al., 1998).

Independent of the functional form of $g$ and $h$, we obtain PCs by performing a truncated singular value decomposition (SVD) of the transformed sample covariance matrix $\kappa$. Conditional on the first $q$ eigenvalues, the resulting factor matrix $Z$ is of dimension $T \times q$. These PCs, for appropriate $q$, explain the vast majority of variation in $X$. In the following, the relationship between the PCs and $X$ is:

$$ Z = f(X) = g(X)\,\Lambda(\kappa) = W \Lambda(\kappa), \qquad (1) $$

with $\Lambda(\kappa)$ being the truncated $K \times q$ eigenvector matrix of $\kappa$ (Stock and Watson, 2002). Notice that this is always conditional on deciding on a suitable number $q$ of PCs. The number of factors is a crucial parameter that strongly influences predictive accuracy and inference (Bai and Ng, 2002). In our empirical work, we consider a small ($q = 5$), moderate ($q = 15$), and large ($q = 30$) number of PCs. In the case of a large number of PCs, we use shrinkage to solve overparameterization concerns.

By varying the functional form of $g$ and $h$ we are now able to discuss the first set of linear and non-linear dimension reduction techniques belonging to the class of PCA (a short code sketch covering all three variants follows the list):
1. Linear PCs. The simplest way is to define both $g$ and $h$ as the identity function, resulting in $W = X$ and $\kappa = X'X$. Due to the linear link between the PCs and the data, PCA is very easy to implement and yields consistent estimators for the latent factors as $K$ and $T$ go to infinity (Stock and Watson, 2002; Bai and Ng, 2008). Even if there is some time-variation in the factor loadings, Stock and Watson (1999) show that principal components remain a consistent estimator for the factors if $K$ is large.
2. Squared PCs. The literature suggests several ways to overcome the linearity restriction of PCs. Bai and Ng (2008), for example, apply a quadratic link function between the latent factors and the regressors, yielding a more flexible factor structure. This method considers squaring the elements of $X$, resulting in

$$ W = X^{(2)} \quad \text{and} \quad \kappa = (X^{(2)})'(X^{(2)}), \qquad (2) $$

with $X^{(2)} = (X \odot X)$ and $\odot$ denoting element-wise multiplication. Squared PCs focus on the second moments of the covariate matrix and allow for a non-linear relationship between the principal components and the predictors. Bai and Ng (2008) show that quadratic variables can have substantial predictive power as they provide additional information on the underlying time series. Intuitively speaking, given that we transform our data to stationarity in the empirical work, this transformation strongly overweights situations characterized by sharp movements in the columns of $X$ (such as during a recession). By contrast, periods characterized by little variation in our macroeconomic panel are transformed to fluctuate mildly around zero (and thus carry little predictive content for inflation). In our empirical model, our regressions always feature lagged inflation, and this transformation thus effectively implies that in tranquil periods the model is close to an autoregressive model, whereas in crisis periods more information is introduced.
3. Kernel PCs. Another approach for non-linear PCs is kernel principal component analysis (KPCA). KPCA dates back to Schölkopf et al. (1998), who proposed using integral operator kernel functions to compute PCs in a non-linear manner. In essence, this amounts to implicitly applying a non-linear transformation of the data through a kernel function and then applying PCA on this transformed dataset. Such an approach has been used for forecasting in Giovannelli (2012) and Exterkate et al. (2016). We allow for non-linearities in the kernel function between the data and the factors by defining $h$ to be a Gaussian or a polynomial kernel $\kappa$ (which is $K \times K$), with the $(i,j)$th element given by

$$ \kappa_{ij} = \exp\left( -\frac{\lVert x_{\bullet i} - x_{\bullet j} \rVert^2}{c_1} \right) \qquad (3) $$

for a Gaussian kernel and

$$ \kappa_{ij} = \left( \frac{x_{\bullet i}' x_{\bullet j}}{c_2} + 1 \right)^2 \qquad (4) $$

for a polynomial kernel. Here, $W = X$ (i.e., $g$ is the identity function), $x_{\bullet i}$ and $x_{\bullet j}$ ($i, j = 1, \ldots, K$) denote two columns of $X$, while $c_1$ and $c_2$ are scaling parameters. As suggested by Exterkate et al. (2016), we set $c_1 = \sqrt{(K+2)/2}$ and $c_2 = \sqrt{c_K}/\pi$, with $c_K$ being the 95th percentile of the $\chi^2$ distribution with $K$ degrees of freedom.
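To fix ideas, the following R sketch computes the three PC variants for a generic standardized data matrix X. It follows the construction $Z = W\Lambda(\kappa)$ in Eq. (1); the helper names and the choice to use base R (rather than the authors' exact code, which is not published here) are illustrative assumptions.

```r
# Hedged sketch: linear, squared, and Gaussian-kernel PCs for a T x K matrix X
# of standardized, stationary covariates; q is the number of factors.
pc_linear  <- function(X, q = 5) prcomp(X)$x[, 1:q]       # W = X, kappa = X'X
pc_squared <- function(X, q = 5) prcomp(X * X)$x[, 1:q]   # W = X (.) X, Eq. (2)

pc_kernel <- function(X, q = 5, c1 = sqrt((ncol(X) + 2) / 2)) {
  D2    <- as.matrix(dist(t(X)))^2   # squared distances between columns of X
  kappa <- exp(-D2 / c1)             # Gaussian kernel, Eq. (3)
  Lam   <- eigen(kappa, symmetric = TRUE)$vectors[, 1:q, drop = FALSE]
  X %*% Lam                          # Z = W Lambda(kappa) with W = X, Eq. (1)
}
Z <- pc_kernel(scale(X))             # T x 5 factor matrix
```

Note that prcomp() centers the data internally, which is consistent with the standardization applied in our empirical work.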
2.2 Diffusion Maps

Diffusion maps, originally proposed in Coifman et al. (2005) and Coifman and Lafon (2006), are another set of non-linear dimension reduction techniques that retain local interactions between data points in the presence of substantial non-linearities in the data. The local interactions are preserved by introducing a random walk process. The random walk captures the notion that moving between similar data points is more likely than moving to points which are less similar. We assume that the weight function which determines the strength of the relationship between $x_{\bullet i}$ and $x_{\bullet j}$ is given by

$$ w(x_{\bullet i}, x_{\bullet j}) = \exp\left( -\frac{\lVert x_{\bullet i} - x_{\bullet j} \rVert^2}{c} \right), \qquad (5) $$

where $\lVert x_{\bullet i} - x_{\bullet j} \rVert$ denotes the Euclidean distance between $x_{\bullet i}$ and $x_{\bullet j}$ and $c$ is a tuning parameter set such that $w(x_{\bullet i}, x_{\bullet j})$ is close to zero except for $x_{\bullet i} \approx x_{\bullet j}$. Here, $c$ is determined by the median distance of the $k$-nearest neighbors of $x_{\bullet i}$, as suggested by Zelnik-Manor and Perona; $k$ is chosen by taking a small percentage of $K$ (i.e., 1%) such that it scales with the size of the dataset. (For an application to astronomical spectra, see Richards et al., 2009.)

The probability of moving from $x_{\bullet i}$ to $x_{\bullet j}$ is then simply obtained by normalizing:

$$ p_{i \to j} = \text{Prob}(x_{\bullet i} \to x_{\bullet j}) = \frac{w(x_{\bullet i}, x_{\bullet j})}{\sum_{j'} w(x_{\bullet i}, x_{\bullet j'})}. \qquad (6) $$

This probability tends to be small except for the situation where $x_{\bullet i}$ and $x_{\bullet j}$ are similar to each other. As a result, the probability that the random walk moves from $x_{\bullet i}$ to $x_{\bullet j}$ will be large if they are equal but rather small if both covariates differ strongly. Let $P$ denote a transition matrix of dimension $K \times K$ with $(i,j)$th element given by $p_{i \to j}$. The probability of moving from $x_{\bullet i}$ to $x_{\bullet j}$ in $n = 1, 2, \ldots$ steps is then simply given by the matrix power $P^n$, with typical element denoted by $p^n_{i \to j}$. Using a biorthogonal spectral decomposition of $P^n$ yields:

$$ p^n_{i \to j} = \sum_{s \ge 0} \lambda_s^n\, \psi_s(x_{\bullet i})\, \phi_s(x_{\bullet j}), \qquad (7) $$

with $\psi_s$ and $\phi_s$ denoting left and right eigenvectors of $P$, respectively. The corresponding eigenvalues are given by $\lambda_s$. We then proceed by computing the so-called diffusion distance as follows:

$$ \xi_n(x_{\bullet i}, x_{\bullet j}) = \sum_{u} \frac{\left( p^n_{i \to u} - p^n_{j \to u} \right)^2}{p_0(x_{\bullet u})}, \qquad (8) $$

with $p_0$ being a normalizing factor that measures the proportion of time the random walk spends at $x_{\bullet u}$. This measure turns out to be robust with respect to noise and outliers. Coifman and Lafon (2006) show that

$$ \xi_n(x_{\bullet i}, x_{\bullet j}) = \sum_{s=1}^{\infty} \lambda_s^{2n} \left( \psi_s(x_{\bullet i}) - \psi_s(x_{\bullet j}) \right)^2. \qquad (9) $$

This allows us to introduce the family of diffusion maps from $\mathbb{R}^K \to \mathbb{R}^q$ given by:

$$ \Xi_n(x_{\bullet i}) = \left[ \lambda_1^n \psi_1(x_{\bullet i}), \ldots, \lambda_q^n \psi_q(x_{\bullet i}) \right]'. \qquad (10) $$

The diffusion distance can then be approximated as:

$$ \xi_n(x_{\bullet i}, x_{\bullet j}) \approx \sum_{s=1}^{q} \lambda_s^{2n} \left( \psi_s(x_{\bullet i}) - \psi_s(x_{\bullet j}) \right)^2 = \lVert \Xi_n(x_{\bullet i}) - \Xi_n(x_{\bullet j}) \rVert^2. \qquad (11) $$

Intuitively, this equation states that we now approximate diffusion distances in $\mathbb{R}^K$ through the Euclidean distance between $\Xi_n(x_{\bullet i})$ and $\Xi_n(x_{\bullet j})$. This discussion implies that we have to choose $n$ and $q$, and we do this by setting $q = \{5, 15, 30\}$ according to our approach with either a small, moderate, or large number of factors, and $n = T$, the number of time periods. The algorithm in our application is implemented using the R package diffusionMap (Richards and Cannoodt, 2019).
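A minimal sketch using the diffusionMap package cited above. Embedding the T observations directly (rather than the K columns) to obtain a T x q factor matrix, and leaving the bandwidth at the package default, are assumptions made here for illustration.

```r
library(diffusionMap)  # package cited in the text (Richards and Cannoodt, 2019)
# Hedged sketch: diffusion-map factors from pairwise Euclidean distances.
D    <- as.matrix(dist(scale(X)))  # pairwise distances; the default eps.val
dmap <- diffuse(D, neigen = 5)     # plays the role of c in Eq. (5); q = 5
Z_dm <- dmap$X                     # leading diffusion coordinates
```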
2.3 Local Linear Embedding

Locally linear embeddings (LLE) have been introduced by Roweis and Saul (2000). Intuitively, the LLE algorithm maps a high dimensional input dataset $X$ into a lower dimensional space while being neighborhood-preserving. This implies that points which are close to each other in the original space are also close to each other in the transformed space.

The LLE algorithm is based on the assumption that each $x_{\bullet i}$ is sampled from some underlying manifold. If this manifold is well defined, each $x_{\bullet i}$ and its neighbors $x_{\bullet j}$ are located close to a locally linear patch of this manifold. One consequence is that each $x_{\bullet i}$ can, conditional on suitably chosen linear coefficients, be reconstructed from its neighbors $x_{\bullet j}$, $j \neq i$. This reconstruction, however, will be corrupted by measurement errors. Roweis and Saul (2000) introduce a cost function to quantify these errors:

$$ C(\Omega) = \sum_i \left\lVert x_{\bullet i} - \sum_j \omega_{ij} x_{\bullet j} \right\rVert^2, \qquad (12) $$

with $\Omega$ denoting a weight matrix with $(i,j)$th element given by $\omega_{ij}$. This cost function is then minimized subject to the constraint that each $x_{\bullet i}$ is reconstructed only from its neighbors. This implies that $\omega_{ij} = 0$ if $x_{\bullet j}$ is not a neighbor of $x_{\bullet i}$. The second constraint is that the matrix $\Omega$ is row-stochastic, i.e., its rows sum to one. Conditional on these two restrictions, the cost function can be minimized by solving a least squares problem.

To make this algorithm operational we need to define our notion of neighbors. In the following, we use the $k$-nearest neighbors in terms of the Euclidean distance. We choose the number of neighbors by applying the algorithm proposed by Kayo (2006), which automatically determines the optimal number for $k$. The $q$ latent factors in $Z$, with typical $i$th column $z_{\bullet i}$, are then obtained by minimizing:

$$ \Phi(Z) = \sum_i \left\lVert z_{\bullet i} - \sum_j \omega_{ij} z_{\bullet j} \right\rVert^2, \qquad (13) $$

which implies a quadratic form in $z_t$. Subject to suitable constraints, this problem can be easily solved by computing:

$$ M = (I_T - \Omega)'(I_T - \Omega), \qquad (14) $$

and finding the $q + 1$ eigenvectors of $M$ associated with the $q + 1$ smallest eigenvalues. The bottom eigenvector is then discarded to arrive at $q$ factors. For our application, we use the R package lle (Diedrich and Abel, 2012).

2.4 Isometric Feature Mapping

Isometric feature mapping (ISOMAP) is one of the earliest methods developed in the category of manifold learning algorithms. Introduced by Tenenbaum et al. (2000), the ISOMAP algorithm determines the geodesic distance on the manifold and uses multidimensional scaling to come up with a low number of factors describing the underlying dataset. Originally, ISOMAP was constructed for applications in visual perception and image recognition. In economics and finance, some recent papers highlight its usefulness (see, e.g., Ribeiro et al., 2008; Lin et al., 2011; Orsenigo and Vercellis, 2013; Zime, 2014).

The algorithm consists of three steps. In the first step, a dissimilarity index that measures the distance between data points is computed. These distances are then used to identify neighboring points on the manifold. In the second step, the algorithm estimates the geodesic distance between the data points as shortest path distances. In the third step, metric scaling is performed by applying classical multidimensional scaling (MDS) to the matrix of distances. For the dissimilarity transformation, we determine the distance between points $i$ and $j$ by the Manhattan index $d_{ij} = \sum_k |x_{ki} - x_{kj}|$ and collect those points where $i$ is one of the $k$-nearest neighbors of $j$ in a dissimilarity matrix. For our empirical application, we again choose the number of neighbors by applying the algorithm proposed by Kayo (2006) and use the algorithm implemented in the R package vegan (Oksanen et al., 2019).

The described non-linear transformation of the dataset enables the identification of a non-linear structure hidden in a high-dimensional dataset and maps it to a lower dimension. Instead of pairwise Euclidean distances, ISOMAP uses the geodesic distances on the manifold and compresses information under consideration of the global structure.
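A hedged sketch of both manifold learners using the two R packages cited above. The paper selects the number of neighbors via Kayo (2006); the fixed k = 20 below is an illustrative stand-in, and embedding the T observations is likewise an assumption.

```r
library(lle)    # cited above (Diedrich and Abel, 2012)
library(vegan)  # cited above (Oksanen et al., 2019)
Xs      <- scale(X)                                # standardized T x K matrix
out_lle <- lle(Xs, m = 5, k = 20, reg = 2)         # q = 5 embedding, Eq. (13)
Z_lle   <- out_lle$Y                               # T x 5 LLE factors

d_man <- dist(Xs, method = "manhattan")            # Manhattan dissimilarities
Z_iso <- isomap(d_man, ndim = 5, k = 20)$points    # geodesic distances + MDS
```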
2.5 Autoencoders

Deep learning algorithms are characterized by not only non-linearly converting input to output but also by representing the input itself in a transformed way. This is called representation learning, in the sense that representations of the data are expressed in terms of other, simpler representations before mapping the data input to output values.

One tool which performs both representation of the input and representation to output is the autoencoder (AE). The first step is accomplished by the encoder function, which maps an input to an internal representation. The second part, which maps the representation to the output, is called the decoder function. Their ability to extract factors which largely explain the variability of the observed data in a non-linear manner makes deep learners a powerful tool complementing the range of commonly used dimension reduction techniques (Goodfellow et al., 2016). In empirical finance, Heaton et al. (2017), Feng et al. (2018), and Kelly et al. (2019) show that the application of these methods is beneficial for predicting asset returns.

Based on deep learning techniques, we propose obtaining hierarchical predictors $Z$ by applying a number of $l \in \{1, \ldots, L\}$ non-linear transformations to $X$. The non-linear transformations are also called hidden layers, with $L$ giving the depth of our architecture and $f_1, \ldots, f_L$ denoting univariate activation functions for each layer. More specifically, activation functions (non-linearly) transform data in each layer, taking the output of the previous layer. A common choice is the hyperbolic tangent (tanh), given by $\frac{\exp(X) - \exp(-X)}{\exp(X) + \exp(-X)}$, justified by several findings in recent studies such as Saxe et al. (2019) or Andreini et al. (2020).

The structure of our deep learning algorithm can be represented in the form of a composition of univariate semi-affine functions given by

$$ f_l^{W^{(l)}, b_l} = f_l\left( \sum_{i=1}^{N_l} W^{(l)}_{\bullet i}\, \hat{x}^{(l)}_{\bullet i} + b_l \right), \quad 1 \le l \le L, \qquad (15) $$

with $W^{(l)}$ denoting a weighting matrix associated with layer $l$ (with $W^{(l)}_{\bullet i}$ denoting the $i$th column of $W^{(l)}$), $\hat{x}^{(l)}_{\bullet i}$ denoting the $i$th column of the input matrix $\hat{X}^{(l)}$ to layer $l$, $b_l$ the corresponding bias term, and $N_l$ the number of neurons that determines the width of the network. Notice that if $l = 1$, $\hat{X}^{(1)} = X$, while for deeper layers the input matrix is obtained recursively by using the activation functions.

The lower dimensional representation of our covariate matrix is then obtained by computing the composite map:

$$ Z = f(X) = \left( f_1^{W^{(1)}, b_1} \circ \cdots \circ f_L^{W^{(L)}, b_L} \right)(X). \qquad (16) $$

The optimal sets of $\hat{W} = (\hat{W}^{(1)}, \ldots, \hat{W}^{(L)})$ and $\hat{b} = (\hat{b}_1, \ldots, \hat{b}_L)$ are obtained by minimizing a loss function, most commonly the mean squared error of the in-sample fit. The complexity of the neural network is determined by choosing the number of hidden layers $L$ and the number of neurons in each layer $N_l$. We create five hidden layers with the number of neurons evenly downsizing to the desired number of factors. Corresponding to the standard literature (see, e.g., Huang, 2003; Heaton, 2008), a huge number of covariates requires a more complex structure (i.e., a higher number of hidden layers). Furthermore, it is recommended to set the number of neurons between the size of the input and the output layer, where $N_l$ is high in the first hidden layer and smaller in the following layers. We employ the R interface to keras (Allaire and Chollet, 2019), a high-level neural networks API and widely used package for implementing deep learning models.
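A compressed sketch of such an autoencoder in the R interface to keras cited above. The layer widths, epoch count, and batch size are illustrative assumptions and only loosely mirror the five-hidden-layer architecture described in the text.

```r
library(keras)  # R interface cited in the text (Allaire and Chollet, 2019)
# Hedged sketch: tanh autoencoder with a q = 5 bottleneck for a T x K matrix X.
K_dim  <- ncol(X)
input  <- layer_input(shape = K_dim)
code   <- input %>%
  layer_dense(units = 64, activation = "tanh") %>%
  layer_dense(units = 16, activation = "tanh") %>%
  layer_dense(units = 5,  activation = "tanh", name = "bottleneck")
output <- code %>%
  layer_dense(units = 16, activation = "tanh") %>%
  layer_dense(units = 64, activation = "tanh") %>%
  layer_dense(units = K_dim)                      # linear reconstruction layer
ae <- keras_model(input, output)
ae %>% compile(optimizer = "adam", loss = "mse")  # MSE in-sample loss
ae %>% fit(X, X, epochs = 50, batch_size = 32, verbose = 0)
encoder <- keras_model(input, get_layer(ae, "bottleneck")$output)
Z_ae    <- predict(encoder, X)                    # T x 5 non-linear factors
```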
3 Econometric Framework

In the following, we introduce the predictive regression that links our target variable (US inflation) to $Z$ and $p$ lags of inflation. Following Stock and Watson (1999), inflation is specified such that:

$$ y_{t+h} = \ln\left(\frac{CPI_{t+h}}{CPI_t}\right) - \ln\left(\frac{CPI_t}{CPI_{t-1}}\right), \qquad (17) $$

with $CPI_{t+h}$ denoting the consumer price index in period $t + h$. In the empirical application we set $h \in \{1, 3, 12\}$. $y_{t+h}$ is then modeled using a dynamic regression model:

$$ y_{t+h} = d_t' \beta_{t+h} + \epsilon_{t+h}, \quad \epsilon_{t+h} \sim \mathcal{N}(0, \sigma^2_{t+h}), \qquad (18) $$

where $\beta_{t+h}$ is a vector of TVPs associated with $M\, (= q + p)$ covariates denoted by $d_t$, and $\sigma^2_{t+h}$ is a time-varying error variance. $d_t$ might include the latent factors extracted from the various methods discussed in the previous section, lags of inflation, an intercept term, or other covariates which are not compressed.

Following much of the literature (Taylor, 1982; Belmonte et al., 2014; Kalli and Griffin, 2014; Kastner and Frühwirth-Schnatter, 2014; Stock and Watson, 2016; Chan, 2017; Huber et al., 2020), we assume that the TVPs and the error variances evolve according to independent stochastic processes:

$$ \begin{pmatrix} \beta_{t+h} \\ \log \sigma^2_{t+h} \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \beta_{t+h-1} \\ \mu_h + \rho_h \log \sigma^2_{t+h-1} \end{pmatrix}, \begin{pmatrix} V & 0 \\ 0 & \vartheta_h \end{pmatrix} \right), \qquad (19) $$

with $\mu_h$ denoting the conditional mean of the log-volatility, $\rho_h$ its persistence parameter, and $\vartheta_h$ the error variance of $\log \sigma^2_{t+h}$. The matrix $V$ is an $M \times M$-dimensional variance-covariance matrix with $V = \text{diag}(v_1, \ldots, v_M)$ and $v_j$ being the process innovation variance that determines the amount of time-variation in $\beta_{t+h}$. This setup implies that the TVPs are assumed to follow a random walk process while the log-volatilities evolve according to an AR(1) process.

The model described by Eqs. (18) and (19) is a flexible state space model that encompasses a wide range of models commonly used for forecasting inflation. For instance, if we set $V = 0_M$ and $\vartheta_h = 0$, we obtain a constant parameter model. If $d_t$ includes the lags of inflation and (lagged) PCs, we obtain a model closely related to the one used in Stock and Watson (2002). If we set $d_t = 1$ and allow for TVPs, we obtain a model very closely related to the unobserved components stochastic volatility (UC-SV) model successfully adopted in Stock and Watson (1999). A plethora of other models can be identified by appropriately choosing $d_t$, $V$, and $\vartheta_h$. This flexibility, however, calls for model selection. We select appropriate submodels by using Bayesian methods for estimation and forecasting. These techniques are further discussed in Appendix A and allow for data-based shrinkage towards simpler nested alternatives.

4 Forecasting US Inflation in Real Time

4.1 Data and Forecasting Design

For the empirical application, we consider the popular FRED-MD database. This dataset is publicly accessible and available in real-time. The monthly data vintages ensure that we only use information that would have been available at the time a given forecast is being produced. A detailed description of the database can be found in McCracken and Ng (2016). To achieve approximate stationarity we transform the dataset as given in Appendix B. Furthermore, each time series is standardized to have sample mean zero and unit sample variance prior to using the non-linear dimension reduction techniques.

Our US dataset includes 105 monthly variables that span the period from 1963:01 to 2019:06. The forecasting design relies on a rolling window, as justified by Clark (2011), that initially ranges from 1980:01 to 1999:12. For each month of the hold-out sample, which starts in 2000:01 and ends in 2018:12, we compute the $h$-step-ahead predictive distribution for each model (for $h \in \{1, 3, 12\}$), keeping the length of the estimation sample fixed at 240 observations (i.e., a rolling window of 20 years).

One key limitation is that all methods are specified conditionally on $d_t$ and thus implicitly on the specific function $f$ used to move from $X$ to $Z$. Another key objective of this paper is therefore to control for uncertainty with respect to $f$ by using dynamic model averaging techniques. For obtaining predictive combinations, we use the first 24 observations of our hold-out sample. The remaining periods (i.e., ranging from 2002:01 to 2018:12) then constitute our evaluation sample. For these periods we contrast each forecast (including the combined ones) with the realization of inflation in the final vintage of 2019:06. With such a strategy we aim at minimizing the risk that realized inflation, especially at the end of the evaluation sample, is still subject to revisions itself. (In general, the literature argues that most data revisions take place in the first quarter, while afterwards the vintages remain relatively unchanged; see Croushore, 2011; Pfarrhofer, 2020. A gap of six months between the final observation of inflation in the evaluation sample (2018:12) and our final vintage (2019:06) is therefore considered enough to render the evaluation valid.)

In terms of competing models, we can classify the specifications along two dimensions:
1. How $d_t$ is constructed. First and importantly, let $s_t$ denote a $\tilde{K}$-dimensional vector of covariates excluding $y_t$. $x_t = (s_t', \ldots, s_{t-p+1}')'$ is then composed of $p$ lags of $s_t$, with $K = p\tilde{K}$. In our empirical work we set $p = 12$ and include all variables in the dataset (except for the CPI series, i.e., $\tilde{K} = 104$). We then use the different dimension reduction techniques outlined in Section 2 to estimate $z_t$. Moreover, we add $p$ lags of $y_t$ to $z_t$ (a short sketch of how $d_t$ is assembled follows this list). This serves to investigate how different dimension reduction techniques perform when interest centers on predicting inflation. Moreover, we also consider simple AR(12) models as well as extended Phillips curve models (see, e.g., De Mol et al., 2008; Stock and Watson, 2008; Koop and Korobilis, 2012; Hauzenberger et al., 2019) as additional competitors. For the estimation of the extended Phillips curve model we select 20 covariates such that various economic sectors are covered, e.g.:
   • real activity: industrial production (INDPRO), real personal income (W875RX1), housing (HOUST, PERMIT), capacity utilization (CUMFNS), etc.
   • labor market: unemployment rates (UNRATE, CLAIMSx), employment (PAYEMS), average weekly hours of production (CES0600000007), etc.
   • price indices: producer price index (PPICMM)
   • others: the federal funds rate (FEDFUNDS), money supply (M2REAL), 3-month (TB3MS) and 10-year (GS10) treasuries, etc.
   Details can be found in Appendix B.
2. The relationship between $d_t$ and $y_{t+h}$. The second dimension along which our models differ is the specific relationship described by Eq. (18). To investigate whether non-linear dimension reduction techniques are sufficient to control for unknown forms of non-linearities, we benchmark all our models that feature TVPs against their respective constant parameter counterparts. To perform model selection we consider two priors. The first one is the Horseshoe (HS) prior (Carvalho et al., 2010) and the second one is the stochastic search variable selection (SSVS) prior outlined in George and McCulloch (1993).
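To make the construction in item 1 concrete, the following R sketch assembles $d_t$ for a generic covariate matrix S and an inflation series y built as in Eq. (17). The use of linear PCA for the compression step and all variable names are illustrative assumptions; any $f(\cdot)$ from Section 2 could be substituted.

```r
# Hedged sketch: stack p = 12 lags of the covariates, compress them to z_t,
# and append p lags of inflation plus an intercept to form d_t.
p  <- 12
Xl <- embed(as.matrix(S), p)       # rows: x_t = (s_t', ..., s_{t-p+1}')'
Z  <- prcomp(scale(Xl))$x[, 1:5]   # z_t via linear PCA (any f from Section 2)
Yl <- embed(y, p)                  # p lags of inflation, aligned with Xl
D  <- cbind(1, Z, Yl)              # d_t: intercept, factors, inflation lags

Tn <- length(y)
h  <- 1                            # forecast horizon
yh <- y[(p + h):Tn]                # target y_{t+h} for t = p, ..., T - h
Dh <- D[1:(Tn - p - h + 1), ]      # regressors aligned with the target
```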
4.2 A First Look at the Latent Factors

In this subsection we briefly discuss what the factors obtained from the different dimension reduction techniques look like. For exposition, we choose $q = 5$ factors. Panels (a) to (h) in Figure 1 show the different factors and reveal remarkable differences across the methods used to compress the data. Considering the different variants of the PCs suggests that the factors behave quite similarly and exhibit rather persistent behavior. This, however, does not hold for the case of squared PCA. In this case, the factors show sharp spikes during the global financial crisis. This is not surprising, since squaring the input dataset, which has been transformed for stationarity, strongly amplifies sharp movements when mapping $x_t$ to $z_t$.

A similar pattern arises for LLE (see panel (d)). In this case, some of the factors behave similarly to a regime-switching process with a moderate number of regimes. For instance, the dark gray line behaves similarly to the PCs during the first few years of the sample. It then strongly decreases in the midst of the 1980s before returning to values observed in the beginning of the sample. Then, in the first half of the 1990s, we observe a strong increase (reaching a peak of around 5) before the factor quickly reverts back to the previous regime. This regime stays in place from 1996 to around 2003. Then we again find that dynamics change and the corresponding factor increases in the run-up to the global financial crisis. Similar patterns can be found for the other factors obtained from using LLE to compress the input data.

Considering ISOMAP shows that the first few factors appear to be highly persistent. These factors look very smooth for some periods but seem to exhibit oscillating behavior during other time periods. The intensity of these cycles, however, is small. The final few factors are fully characterized by these oscillating dynamics.

This brief discussion shows that the non-linear dimension reduction techniques yield broadly similar results, but with distinct dynamics. Some of them (especially the autoencoder) pick up a lot of high frequency movements. These movements might be irrelevant for modeling inflation dynamics but could nevertheless carry relevant information during certain periods in time. A similar argument applies to the other techniques, which also yield factors that change their behavior over time.

4.3 Forecasting Results

We now consider the point and density forecasting performance of the different models and dimension reduction techniques. The forecast performance is evaluated through averaged log predictive likelihoods (LPLs) for density forecasts and root mean squared errors (RMSEs) for point forecasts. Superior models are those with high scores in terms of LPL and low values in terms of RMSE.
Formal descriptions of the evaluation metrics are provided in Appendix A. We benchmark all models relative to the autoregressive (AR) model with constant parameters and the HS prior. The first entry in the tables gives the actual LPL score (in averages) with actual RMSEs in parentheses for our benchmark model. The remaining entries are relative LPLs with relative RMSEs in parentheses.

Starting with the one-step-ahead horizon, Table 1 shows the relative LPLs and RMSEs (in parentheses) for inflation forecasts. This table suggests that, in terms of density forecasts, using dimension reduction techniques (both linear and non-linear) and allowing for non-linearities between the factors and inflation improves density forecasts substantially. This does not carry over to point forecasts. When we consider relative RMSEs, only small improvements are obtained by using more sophisticated modeling techniques.

Comparing linear to non-linear dimension reduction methods suggests that forecasts can be further improved. In particular, we observe that, among the different reduction techniques, squared PCA performs well. One explanation for this might relate to the fact that simple models such as a random walk or other univariate benchmarks are hard to beat in a real-time forecasting exercise (see Atkeson et al., 2001; Stock and Watson, 2008; Stella and Stock, 2013). When taking a closer look at panel (h) of Figure 1, we see that the factors are close to zero in tranquil periods while, at the same time, showing substantial movements in times of turmoil. Conditional on relatively small regression coefficients in Eq. (18), this pattern suggests that the forecast densities are close to the ones obtained from a random walk model. But in recessionary episodes, the factors convey information on the level and volatility of inflation that might be useful for predicting during crisis periods (see, e.g., Chan, 2017; Huber and Pfarrhofer, 2020).

When we consider the different specifications for the observation equation, we find that allowing for time-variation in the parameters improves one-step-ahead predictive densities. These improvements appear to be substantial for all specifications except the model using squared PCA. For squared PCA, we find only limited differences between constant and TVP regressions (conditional on the specific prior). The single best performing model for the one-step-ahead inflation forecasts is the TVP model with a Horseshoe prior and five factors obtained by using squared PCA.

Again, the strong differences in predictive accuracy between constant and TVP specifications arise from the necessity to discriminate between different stages of the business cycle. The somewhat smaller differences in the case of squared PCA are driven by the specific shape of the latent factors and the reason outlined in the previous paragraph.

Next, we inspect the longer forecast horizons in greater detail. Table 2 depicts the forecast performance of all competitors one-quarter and one-year ahead. The table indicates that non-linear dimension reduction techniques clearly outperform the autoregressive benchmark and perform similarly to the linear PCAs. The results reveal that diffusion maps, isometric feature mapping, and squared PCA in combination with time variation in the coefficients yield high LPLs. Here, again, the best performing model is squared PCA, which beats all other dimension reduction techniques irrespective of the prior structure or whether constant or time-varying parameters are considered.
For point forecasts, we again find little differences relative to the univariate benchmark model.
Figure 1: Illustration of linear and non-linear dimension reduction techniques applied to our US dataset with $\tilde{K} = 104$, based on the last vintage (end of 2018). Focussing on $q = 5$, we depict normalized factors with mean zero and variance one, ranging from January 1980 to December 2018. Panels: (a) Autoencoder; (b) Diffusion Maps; (c) ISOMAP; (d) LLE; (e) PCA Gaussian kernel; (f) PCA linear; (g) PCA polynomial kernel; (h) PCA squared.

Table 1: One-month ahead forecast performance.
Specification                 const. (HS)     const. (SSVS)   TVP (HS)       TVP (SSVS)
AR                            -336.98 (1.18)    0.40 (1.01)   15.57 (1.00)   19.69 (1.01)
Autoencoder (q = 5)              1.67 (1.00)    4.64 (–)      13.71 (1.00)   22.51 (–)
Autoencoder (q = 15)             1.00 (1.00)    2.88 (1.01)   10.79 (1.01)   14.00 (1.05)
Autoencoder (q = 30)             2.32 (1.00)    0.31 (1.01)   12.93 (1.00)   12.97 (1.06)
Diffusion Maps (q = 5)           2.57 (1.00)    1.14 (1.01)   13.81 (1.01)   15.59 (1.12)
Diffusion Maps (q = 15)          0.71 (1.00)    2.92 (1.01)   13.54 (1.00)   17.26 (1.06)
Diffusion Maps (q = 30)          2.28 (1.00)    3.14 (1.02)   14.44 (1.00)   -0.36 (1.15)
Extended PC                     11.25 (–)      15.73 (1.07)       –              –
ISOMAP (q = 5)                   0.99 (1.00)   -0.58 (1.01)   10.80 (1.00)   19.21 (1.01)
ISOMAP (q = 15)                  0.06 (1.00)    1.30 (1.01)    9.71 (1.01)   18.86 (1.02)
ISOMAP (q = 30)                 -1.18 (1.00)    2.38 (1.01)    9.73 (1.02)   20.37 (1.03)
LLE (q = 5)                      0.18 (1.00)   -1.83 (1.01)   13.81 (–)      19.75 (1.01)
LLE (q = 15)                    -2.02 (1.01)    0.05 (1.01)   11.64 (1.00)   19.06 (1.01)
LLE (q = 30)                    -1.11 (1.00)   -3.63 (1.01)    6.71 (1.01)   19.68 (1.01)
PCA gauss. kernel (q = 5)       -0.74 (1.00)    0.67 (1.01)   13.69 (1.00)   15.85 (1.05)
PCA gauss. kernel (q = 15)      -0.20 (1.00)    2.65 (1.01)   14.49 (1.01)   11.27 (1.17)
PCA gauss. kernel (q = 30)       0.28 (1.00)    6.86 (1.01)   15.78 (1.01)   -5.34 (1.30)
PCA linear (q = 5)              -0.80 (1.00)    0.51 (1.01)   11.48 (1.01)   18.95 (1.03)
PCA linear (q = 15)             -0.51 (1.01)    2.32 (1.01)   12.56 (1.02)   18.95 (1.04)
PCA linear (q = 30)              0.27 (1.01)    7.05 (1.00)   16.46 (1.02)       – (1.03)
PCA poly. kernel (q = 5)         1.86 (1.00)   -0.39 (1.01)   12.52 (1.00)   15.02 (1.05)
PCA poly. kernel (q = 15)       -0.11 (1.00)    2.78 (1.01)   15.56 (1.00)   11.82 (1.18)
PCA poly. kernel (q = 30)        0.64 (1.00)    4.44 (1.01)   16.10 (1.01)    0.59 (1.22)
PCA squared (q = 5)             16.79 (–)          –              –              –
PCA squared (q = 15)                –              –              –              –
PCA squared (q = 30)                –              –              –              –
Note: The first (red shaded) entry gives the actual average LPL, with the actual RMSE in parentheses, for our benchmark model, the autoregressive (AR) model with constant parameters and the HS prior. All other entries are relative LPLs with relative RMSEs in parentheses. Missing entries are denoted by '–'.

Table 2: One-quarter and one-year ahead forecast performance.
One-quarter ahead:

Specification                 const. (HS)     const. (SSVS)   TVP (HS)       TVP (SSVS)
AR                            -383.12 (1.31)   13.10 (0.99)   23.66 (1.00)   31.64 (1.03)
Autoencoder (q = 5)              1.02 (1.00)   17.96 (1.00)   26.39 (0.99)   36.60 (–)
Autoencoder (q = 15)             0.34 (1.00)   10.34 (1.00)   21.68 (1.00)   34.66 (1.06)
Autoencoder (q = 30)             0.00 (1.00)   19.77 (1.00)   19.29 (1.00)   33.63 (1.09)
Diffusion Maps (q = 5)           1.09 (1.00)   17.18 (0.99)   25.75 (0.99)   40.24 (1.13)
Diffusion Maps (q = 15)         -1.56 (–)      18.56 (–)      25.54 (1.00)       – (1.03)
Diffusion Maps (q = 30)             –              –              –              –
Extended PC                         –              –              –              –
ISOMAP (q = 5/15/30)                –              –              –              –
LLE (q = 5)                     -2.94 (1.00)    8.09 (1.00)   24.14 (1.00)   33.60 (1.03)
LLE (q = 15)                    -7.05 (1.00)   10.62 (1.00)   16.88 (1.00)   33.67 (1.03)
LLE (q = 30)                    -6.21 (1.00)    8.25 (1.00)   15.55 (1.00)   31.47 (1.04)
PCA gauss. kernel (q = 5)        2.83 (1.00)   14.65 (0.99)   24.91 (1.00)   32.19 (1.08)
PCA gauss. kernel (q = 15)       0.85 (1.00)   18.40 (1.00)   21.89 (1.00)   27.12 (1.34)
PCA gauss. kernel (q = 30)       4.74 (1.00)   18.56 (1.00)   27.43 (1.00)    9.82 (1.55)
PCA linear (q = 5)               1.06 (1.00)   12.45 (1.00)   20.24 (1.00)   34.72 (1.04)
PCA linear (q = 15)              4.74 (1.00)   16.12 (1.00)   22.90 (1.01)   37.77 (1.05)
PCA linear (q = 30)              7.76 (0.99)   21.16 (0.99)   22.94 (1.01)   45.40 (1.04)
PCA poly. kernel (q = 5)         2.32 (1.00)   10.80 (1.00)   21.68 (1.00)   34.27 (1.09)
PCA poly. kernel (q = 15)       -1.33 (1.00)   14.52 (1.00)   23.42 (1.00)   31.02 (1.24)
PCA poly. kernel (q = 30)        1.68 (1.00)   19.78 (0.99)   23.15 (–)      23.30 (1.36)
PCA squared (q = 5)                 – (1.03)       – (1.06)       – (1.01)       – (2.60)
PCA squared (q = 15)            51.10 (1.04)   54.01 (1.03)   57.09 (1.03)   32.56 (3.18)
PCA squared (q = 30)            48.84 (1.04)   52.21 (1.03)   60.12 (1.04)   23.10 (3.35)

One-year ahead:

Specification                 const. (HS)     const. (SSVS)   TVP (HS)       TVP (SSVS)
AR                            -408.15 (1.41)    8.87 (1.01)   16.52 (1.00)   26.25 (1.01)
Autoencoder (q = 5)              0.51 (1.01)   11.24 (1.00)   15.55 (–)      34.21 (–)
Autoencoder (q = 15)             3.44 (1.00)   12.95 (1.01)   15.60 (1.00)   39.35 (1.02)
Autoencoder (q = 30)            -1.54 (1.00)   12.97 (1.00)   12.55 (1.00)   37.53 (1.04)
Diffusion Maps (q = 5)          -0.43 (1.00)   17.39 (1.00)   16.16 (1.00)   29.34 (1.48)
Diffusion Maps (q = 15)             – (1.01)       – (1.01)       – (1.01)       – (1.04)
Diffusion Maps (q = 30)             –              –              –              –
Extended PC                         –              –              –              –
ISOMAP (q = 5/15/30)                –              –              –              –
LLE (q = 5)                     -1.86 (1.00)    5.70 (1.01)   11.68 (1.00)   27.44 (1.01)
LLE (q = 15)                     1.02 (1.00)    4.45 (1.01)    9.70 (1.00)   26.66 (1.01)
LLE (q = 30)                    -4.56 (1.00)    4.14 (1.00)    8.30 (1.00)   29.24 (1.01)
PCA gauss. kernel (q = 5)        1.53 (1.00)   10.61 (1.00)   14.64 (1.00)   31.25 (1.11)
PCA gauss. kernel (q = 15)      -2.78 (1.01)   15.55 (1.01)   15.88 (1.05)   27.95 (1.32)
PCA gauss. kernel (q = 30)      -2.27 (1.01)   19.62 (1.02)   11.78 (1.05)   16.79 (1.51)
PCA linear (q = 5)               0.61 (1.01)   10.96 (1.01)   18.44 (1.00)   30.69 (1.02)
PCA linear (q = 15)             -0.79 (1.01)   16.83 (1.01)   18.19 (1.03)   36.97 (1.09)
PCA linear (q = 30)              2.52 (1.00)   22.03 (–)      16.49 (1.03)   42.25 (1.10)
PCA poly. kernel (q = 5)         4.74 (1.00)   13.20 (1.01)   15.21 (1.00)   34.53 (1.08)
PCA poly. kernel (q = 15)        2.37 (1.00)   11.31 (1.01)   16.09 (1.01)   32.05 (1.25)
PCA poly. kernel (q = 30)       -0.07 (1.00)   15.35 (1.00)   15.37 (1.02)    8.87 (1.48)
PCA squared (q = 5)                 – (–)          – (1.02)       – (1.03)       – (3.36)
PCA squared (q = 15)            55.54 (1.00)   68.09 (1.25)   67.21 (1.05)   28.52 (4.32)
PCA squared (q = 30)            62.97 (0.99)   69.93 (1.25)   70.63 (1.05)   19.67 (4.62)
Note: The first (red shaded) entry gives the actual average LPL, with the actual RMSE in parentheses, for our benchmark model, the autoregressive (AR) model with constant parameters and the HS prior. All other entries are relative LPLs with relative RMSEs in parentheses. Missing entries are denoted by '–'.

So far, the LPLs are averaged over the full evaluation sample and thus only measure model quality over the full hold-out period (Geweke and Amisano, 2010). However, this might mask important differences in the forecast performance of the different models and compression techniques over time. Figure 2 depicts the average LPLs along the hold-out sample for the short-run forecasting exercise. The figure suggests a great deal of performance variation over time. Regardless of the model specification and the number of factors included in the models, accounting for instabilities in the relationship between the factors and inflation through time-varying parameters improves the forecasting performance. Especially during the global financial crisis (the gray shaded area), more flexible model specifications yield greater improvements relative to the univariate benchmark and compared to constant specifications.

4.4 Forecast Combination

The final paragraph of the previous subsection showed that model performance varies considerably over time. The key implication is that non-linear compression techniques are useful during turbulent times, whereas the forecast evidence is less pronounced in normal times. In this subsection, we ask whether combining models in a dynamic manner further improves predictive accuracy.

After having obtained the predictive densities of $y_{t+h}$ for the different dimensionality reduction techniques and model specifications, the goal is to exploit the advantages of both linear and non-linear approaches. This is achieved by combining models in a model pool such that better performing models over certain periods receive larger weights while inferior models are subsequently down-weighted. The literature on forecast combinations suggests several different weighting schemes, ranging from simply averaging over all models (see, e.g., Hendry and Clements, 2004; Hall and Mitchell, 2007; Clark and McCracken, 2010; Berg and Henzel, 2015) to estimating weights based on the models' performance according to the minimization of an objective or loss function (see, e.g., Timmermann, 2006; Hall and Mitchell, 2007; Geweke and Amisano, 2011; Conflitti et al., 2015; Pettenuzzo and Ravazzolo, 2016) or according to the posterior probabilities of the predictive densities (see, e.g., Raftery et al., 2010; Koop and Korobilis, 2012; Beckmann et al., 2020). Since the weights might change over time, we aim to compute them in a dynamic manner.

Combining the different predictive densities according to their posterior probabilities is referred to as Bayesian model averaging (BMA). The resulting weights are capable of reflecting the predictive power of each model for the respective periods. Dynamic model averaging (DMA), as specified by Raftery et al. (2010), extends the approach by adding a discount (or forgetting) factor to control for a model's forecasting performance in the recent past. The 'recent past' is determined by the discount factor, with higher values attaching greater importance to past forecasting performances of the model and lower values gradually ignoring the results of past predictive densities. Similar to Beckmann et al. (2020), Koop and Korobilis (2012), and Raftery et al.
(2010), we apply DMA to combine the predictive densities of our various models.
Figure 2: Evolution of one-month ahead cumulative LPBFs relative to the benchmark. Panels: (a) no dimension reduction (AR(p) and extended PC); (b) q = 5; (c) q = 15; (d) q = 30, each shown for the const. (HS), const. (SSVS), TVP (HS), and TVP (SSVS) specifications. The red dashed lines refer to the maximum/minimum Bayes factor over the full hold-out sample. The light gray shaded areas indicate NBER recessions in the US.

DMA works as follows. Let $\varrho_{t+h|t+h} = (\varrho_{t+h|t+h,1}, \ldots, \varrho_{t+h|t+h,J})'$ denote a set of weights for $J$ competing models. These (horizon-specific) weights vary over time and depend on the recent predictive performance of the models according to:

$$ \varrho_{t+h|t,j} = \frac{\varrho_{t|t,j}^{\delta}}{\sum_{l=1}^{J} \varrho_{t|t,l}^{\delta}}, \qquad (20) $$

$$ \varrho_{t+h|t+h,j} = \frac{\varrho_{t+h|t,j}\; p_j(y_{t+h} \mid y^t)}{\sum_{l=1}^{J} \varrho_{t+h|t,l}\; p_l(y_{t+h} \mid y^t)}, \qquad (21) $$

where $p_j(y_{t+h} \mid y^t)$ denotes the $h$-step-ahead predictive distribution of model $j$ and $\delta \in (0, 1]$ is a discount (forgetting) factor. In our application, we set $\delta = 0.9$. Notice that if $\delta = 1$ we obtain standard BMA weights, while $\delta = 0$ would imply that the weights depend exclusively on the forecasting performance in the last period.
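A compact sketch of this two-step updating in R; the function name and the example predictive likelihoods are ours, while $\delta = 0.9$ follows the text.

```r
# Hedged sketch of Eqs. (20)-(21): discount past performance with delta,
# then re-weight by the realized predictive likelihoods p_j(y_{t+h} | y^t).
dma_update <- function(w, pl, delta = 0.9) {
  w_pred <- w^delta / sum(w^delta)   # Eq. (20): forgetting / flattening step
  w_post <- w_pred * pl              # Eq. (21): Bayesian update ...
  w_post / sum(w_post)               # ... followed by normalization
}
# Usage: start from equal weights over J models, update period by period.
w <- rep(1 / 3, 3)
w <- dma_update(w, pl = c(0.21, 0.48, 0.05))   # pl: predictive likelihoods
```

Setting delta = 1 in this function reproduces standard BMA updating, while values below one gradually discount older forecast performance.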
Weights obtained by combining models according to their predictive power convey useful information about the adequacy of each model over time. In order to get a comprehensive picture of the effects of different model modifications, we combine our models and model specifications in various ways.

Table 3 presents the forecasting results when we use DMA to combine models. Again, all models are benchmarked to the AR model with constant parameters and the HS prior. The first row depicts the relative performance of the best performing single model for the chosen time horizon.

The table can be understood as follows. Each entry includes all dimension reduction techniques. The rows define whether the model space includes all factors $q \in \{5, 15, 30\}$ or whether we combine models with a fixed number of factors exclusively. The columns refer to model spaces which include only constant parameter, only time-varying parameter, or both specifications in the respective model pool. Since we also discriminate between two competing priors, we also consider model weights if we condition on either the HS or the SSVS prior or average across both prior specifications (the upper part of the table with {HS, SSVS}).

Across all three forecast horizons considered, we again find only limited accuracy improvements for point forecasts relative to the AR model. This, however, does not carry over to LPLs. For density forecasts, we find that DMA-based combinations improve upon the single best performing model for all forecast horizons. Hence, allowing models to change over the hold-out period leads to superior predictive accuracy.

Table 3: Forecast performance of predictive combinations.
Single best performing model: 30.15 (0.99) one-month ahead; 65.09 (0.98) one-quarter ahead; 80.56 (0.95) one-year ahead.

One-month ahead:

Prior        Combination         Const.          TVP             {const., TVP}
{HS, SSVS}   q = {5, 15, 30}        –               –               –
HS           q = {5, 15, 30}        –               –               –
HS           q = 5               17.97 (1.00)    29.95 (1.01)    26.85 (1.01)
HS           q = 15              18.82 (1.01)    29.44 (–)       26.37 (1.00)
HS           q = 30              18.32 (–)       26.66 (1.00)    23.81 (–)
SSVS         q = {5, 15, 30}        – (1.00)        – (1.04)        – (1.04)
SSVS         q = 5/15/30            –               –               –

One-quarter ahead:

Prior        Combination         Const.          TVP             {const., TVP}
{HS, SSVS}   q = {5, 15, 30}        –               –               –
HS           q = {5, 15, 30}        –               –               –
HS           q = 5               52.31 (1.03)    63.74 (1.01)    61.57 (1.01)
HS           q = 15              50.34 (1.03)    59.23 (1.02)    56.77 (1.02)
HS           q = 30              48.94 (1.03)    61.53 (1.03)    58.95 (1.02)
SSVS         q = {5, 15, 30}        – (1.02)        – (1.00)        – (0.98)
SSVS         q = 5/15/30            –               –               –

One-year ahead:

Prior        Combination         Const.          TVP             {const., TVP}
{HS, SSVS}   q = {5, 15, 30}        –               –               –
HS           q = {5, 15, 30}        –               –               –
HS           q = 5               72.00 (0.93)    82.70 (1.01)    81.20 (1.00)
HS           q = 15              66.54 (0.96)    82.79 (1.01)    81.50 (1.01)
HS           q = 30              70.54 (0.96)    83.68 (1.01)    82.02 (1.00)
SSVS         q = {5, 15, 30}        – (1.03)        – (1.22)        – (1.06)
SSVS         q = 5/15/30            –               –               –
Note: The first (grey shaded) row states the results of the single best performing model, as presented in the previous subsection, for each forecast horizon, benchmarked to the AR model with constant parameters and the HS prior. All other rows show the relative results for the combinations of the different dimension reduction techniques according to the specifications stated in the row and column headers. For example, the entry in row {HS, SSVS}, q = {5, 15, 30} and column Const. combines all models estimated with constant parameters, the HS prior, the SSVS prior, and 5, 15, and 30 factors. Entries denote relative LPLs with relative RMSEs in parentheses, benchmarked against the AR model with constant parameters and the HS prior. Missing entries are denoted by '–'.

Comparing whether restricting the model a priori improves predictions yields mixed insights. For the one-month and one-quarter-ahead predictions, we find that a combination scheme that uses only TVP models but both priors and q = 30 factors yields the most precise forecasts. In the case of one-year-ahead forecasts, we find that pooling across the different q's and exclusively including constant parameter models translates into the highest LPLs. In general, the differences in predictive performance across the DMA-based averaging schemes are small. Hence, as a general suggestion, we can recommend applying DMA and using the most exhaustive model space available (i.e., including both priors, the different numbers of factors, and both TVP and constant parameter regressions).

To investigate which model receives substantial posterior weight over time, Figure 3 depicts the weights associated with the one-step-ahead LPLs over the hold-out period. Panel (a) displays the weight placed on models that allow for TVPs, panel (b) shows the weight attached to the different numbers of factors, and panel (c) shows the weight attached to each model. These weights are obtained by using the full model space (i.e., including both priors, TVP and constant parameter regressions, and all numbers of factors). The weight placed on TVP specifications, for instance, is then simply obtained by summing up the weights associated with the different models that feature TVPs.

Starting with the top panel of the figure, we observe that during the beginning of the sample, appreciable model weight is placed on constant parameter models. In mid-2006, this changes and DMA places increasing posterior mass on models that allow for time-variation in the parameters. In the period from the beginning of 2007 to the onset of the financial crisis, we see that the weight on TVP models somewhat decreases. During the financial crisis, we again experience a pronounced increase in posterior weight towards TVP regressions. In that period, constant parameter models only play a limited role in forming inflation forecasts. With a few exceptions, the remainder of the hold-out period is characterized by evenly distributed posterior mass across constant and TVP regressions.

The middle panel of Figure 3 shows that DMA places increasing posterior mass on models with a large number of factors during recessions (and, similar to panel (a), in 2006). This indicates that in turbulent times it seems to pay off to include many factors. Since our previous analysis reveals that point forecasts are very similar to the ones obtained from simpler univariate models, this finding is most likely driven by a superior density forecasting performance.
Hence, we conjecture that the main driving force behind the strong performance of a model with many factors is that this increases posterior uncertainty (through the inclusion of a large number of covariates), which ultimately leads to slightly wider credible sets, implying a higher probability of observing outlying observations.

The bottom panel (panel (c)) of Figure 3 provides information on how much weight is allocated to models that exploit non-linear dimension reduction techniques. Again, we observe that non-linear dimension reduction techniques obtain considerable posterior mass during 2006 and the financial crisis of 2007/2008. In 2006, the autoencoder with q = 15 receives substantial posterior weight. During the financial crisis, we find that diffusion maps and squared PCA feature large weights. Apart from these two periods, the weights allocated to non-linear dimension reduction techniques are generally close to zero.
Figure 3: Evolution of the weights determined by DMA for one-month ahead cumulative LPBFs. Panels: (a) parameter change (constant vs. TVP); (b) number of factors; (c) model selection across all dimension reduction techniques and factor counts.

This discussion highlights that the strong performance of DMA relative to the single best performing model can be, at least partly, attributed to changes in model weights across business cycles. In expansionary periods with stable inflation rates and macroeconomic fundamentals, linear and simple models dominate the model pool. By contrast, adding more sophisticated models and dimension reduction techniques pays off during recessions. A dynamic combination of different approaches thus improves real-time inflation forecasts.
5 Closing Remarks

In macroeconomics, the vast majority of researchers compress information using linear methods such as principal components to efficiently summarize huge datasets in forecasting applications. Machine learning techniques describing large datasets with relatively few latent factors have gained relevance in recent years in various areas. In this paper, we have shown that using such approaches potentially improves real-time inflation forecasts for a wide range of competing model specifications. Our findings indicate that the point forecasts of simpler models are hard to beat. But when interest centers on predictive distributions, we find that more sophisticated modeling techniques that rely on non-linear dimension reduction yield favorable inflation predictions. These predictions can be further improved by using DMA to dynamically weight the different models, dimension reduction methods, and priors. Doing so further improves density forecasts. Weights obtained from dynamic model averaging reveal that using TVP models in combination with non-linear approaches to dimension reduction is preferred in turbulent times.

References
Allaire, J., and F. Chollet (2019): keras: R Interface to 'Keras', R package version 2.2.5.0.
Andreini, P., C. Izzo, and G. Ricco (2020): "Deep Dynamic Factor Models," arXiv preprint arXiv:2007.11887.
Atkeson, A., and L. E. Ohanian (2001): "Are Phillips curves useful for forecasting inflation?," Federal Reserve Bank of Minneapolis Quarterly Review, 25(1), 2-11.
Bai, J., and S. Ng (2002): "Determining the number of factors in approximate factor models," Econometrica, 70(1), 191-221.
Bai, J., and S. Ng (2008): "Forecasting economic time series using targeted predictors," Journal of Econometrics, 146(2), 304-317.
Beckmann, J., G. Koop, D. Korobilis, and R. A. Schüssler (2020): "Exchange rate predictability and dynamic Bayesian learning," Journal of Applied Econometrics, 35(4), 410-421.
Belmonte, M., G. Koop, and D. Korobilis (2014): "Hierarchical shrinkage in time-varying coefficient models," Journal of Forecasting, 33(1), 80-94.
Berg, T. O., and S. R. Henzel (2015): "Point and density forecasts for the euro area using Bayesian VARs," International Journal of Forecasting, 31(4), 1067-1095.
Bernanke, B. S., J. Boivin, and P. Eliasz (2005): "Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach," The Quarterly Journal of Economics, 120(1), 387-422.
Carter, C., and R. Kohn (1994): "On Gibbs sampling for state space models," Biometrika, 81(3), 541-553.
Carvalho, C. M., N. G. Polson, and J. G. Scott (2010): "The horseshoe estimator for sparse signals," Biometrika, 97(2), 465-480.
Chakraborty, C., and A. Joseph (2017): "Machine learning at central banks," Bank of England Working Papers 674, Bank of England.
Chan, J. C. (2017): "The stochastic volatility in mean model with time-varying parameters: An application to inflation modeling," Journal of Business & Economic Statistics, 35(1), 17-28.
Chan, J. C., T. E. Clark, and G. Koop (2018): "A new model of inflation, trend inflation, and long-run inflation expectations," Journal of Money, Credit and Banking, 50(1), 5-53.
Clark, T. E. (2011): "Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility," Journal of Business & Economic Statistics, 29(3), 327-341.
Clark, T. E., and M. W. McCracken (2010): "Averaging forecasts from VARs with uncertain instabilities," Journal of Applied Econometrics, 25(1), 5-29.
Clark, T. E., and F. Ravazzolo (2015): "Macroeconomic forecasting performance under alternative specifications of time-varying volatility," Journal of Applied Econometrics, 30(4), 551-575.
Coifman, R. R., and S. Lafon (2006): "Diffusion maps," Applied and Computational Harmonic Analysis, 21(1), 5-30.
Coifman, R. R., S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker (2005): "Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps," Proceedings of the National Academy of Sciences, 102(21), 7426-7431.
Conflitti, C., C. De Mol, and D. Giannone (2015): "Optimal combination of survey forecasts," International Journal of Forecasting, 31(4), 1096-1103.
Coulombe, P. G., M. Leroux, D. Stevanovic, and S. Surprenant (2019): "How is machine learning useful for macroeconomic forecasting?," CIRANO Working Papers 2019s-22, CIRANO.
Croushore, D. (2011): "Frontiers of real-time data analysis," Journal of Economic Literature, 49(1), 72-100.
D'Agostino, A., L. Gambetti, and D. Giannone (2013): "Macroeconomic forecasting and structural change," Journal of Applied Econometrics, 28(1), 82-101.
De Mol, C., D. Giannone, and L. Reichlin (2008): "Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components?," Journal of Econometrics, 146(2), 318-328.
Diedrich, H., and D. M. Abel (2012): lle: Locally linear embedding, R package version 1.1.
Exterkate, P., P. J. Groenen, C. Heij, and D. van Dijk (2016): "Nonlinear forecasting with many predictors using kernel ridge regression," International Journal of Forecasting, 32(3), 736-753.
Feng, G., J. He, and N. G. Polson (2018): "Deep learning for predicting asset returns," arXiv preprint arXiv:1804.09314.
Frühwirth-Schnatter, S. (1994): "Data augmentation and dynamic linear models," Journal of Time Series Analysis, 15(2), 183-202.
Frühwirth-Schnatter, S., and H. Wagner (2010): "Stochastic model specification search for Gaussian and partial non-Gaussian state space models," Journal of Econometrics, 154(1), 85-100.
Gallant, A. R., and H. White (1992): "On learning the derivatives of an unknown mapping with multilayer feedforward networks," Neural Networks, 5(1), 129-138.
George, E. I., and R. E. McCulloch (1993): "Variable selection via Gibbs sampling," Journal of the American Statistical Association, 88(423), 881-889.
George, E. I., D. Sun, and S. Ni (2008): "Bayesian stochastic search for VAR model restrictions," Journal of Econometrics, 142(1), 553-580.
Geweke, J., and G. Amisano (2010): "Comparing and evaluating Bayesian predictive distributions of asset returns," International Journal of Forecasting, 26(2), 216-230.
Geweke, J., and G. Amisano (2011): "Optimal prediction pools," Journal of Econometrics, 164(1), 130-141.
Giovannelli, A. (2012): "Nonlinear forecasting using large datasets: Evidences on US and Euro area economies," CEIS Research Paper 255, Tor Vergata University, CEIS.
Goodfellow, I., Y. Bengio, and A. Courville (2016): Deep Learning. MIT Press.
Hall, S. G., and J. Mitchell (2007): "Combining density forecasts," International Journal of Forecasting, 23(1), 1-13.
Hauzenberger, N., F. Huber, G. Koop, and L. Onorante (2019): "Fast and Flexible Bayesian Inference in Time-varying Parameter Regression Models," arXiv preprint arXiv:1910.10779.
Heaton, J. (2008): Introduction to neural networks with Java. Heaton Research, Inc.
Heaton, J. B., N. G. Polson, and J. H. Witte (2017): "Deep learning for finance: deep portfolios," Applied Stochastic Models in Business and Industry, 33(1), 3-12.
Hendry, D. F., and M. P. Clements (2004): "Pooling of forecasts," The Econometrics Journal, 7(1), 1-31.
Huang, G.-B. (2003): "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks, 14(2), 274-281.
Huber, F., G. Koop, and L. Onorante (2020): "Inducing sparsity and shrinkage in time-varying parameter models," Journal of Business & Economic Statistics, (forthcoming).
Huber, F., and M. Pfarrhofer (2020): "Dynamic shrinkage in time-varying parameter stochastic volatility in mean models," Journal of Applied Econometrics, (forthcoming).
Jarocinski, M., and M. Lenza (2018): "An inflation-predicting measure of the output gap in the Euro area," Journal of Money, Credit and Banking, 50(6), 1189-1224.
Kalli, M., and J. E. Griffin (2014): "Time-varying sparsity in dynamic regression models," Journal of Econometrics, 178(2), 779-793.
Kastner, G. (2016): "Dealing with stochastic volatility in time series using the R package stochvol," Journal of Statistical Software, 69(5), 1-30.
Kastner, G., and S. Frühwirth-Schnatter (2014): "Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models," Computational Statistics & Data Analysis, 76, 408-423.
Kayo, O. (2006): "Locally linear embedding algorithm: Extensions and applications."
Kelly, B. T., S. Pruitt, and Y. Su (2019): "Characteristics are covariances: A unified model of risk and return," Journal of Financial Economics, 134(3), 501-524.
Koop, G., and D. Korobilis (2012): "Forecasting inflation using dynamic model averaging," International Economic Review, 53(3), 867-886.
Koop, G., and D. Korobilis (2013): "Large time-varying parameter VARs," Journal of Econometrics, 177(2), 185-198.
Koop, G., and S. M. Potter (2007): "Estimation and forecasting in models with multiple breaks," The Review of Economic Studies, 74(3), 763-789.
Lin, F., C.-C. Yeh, and M.-Y. Lee (2011): "The use of hybrid manifold learning and support vector machines in the prediction of business failure," Knowledge-Based Systems, 24(1), 95-101.
Makalic, E., and D. F. Schmidt (2015): "A simple sampler for the horseshoe estimator," IEEE Signal Processing Letters, 23(1), 179-182.
McAdam, P., and P. McNelis (2005): "Forecasting inflation with thick models and neural networks," Economic Modelling, 22(5), 848-867.
McCracken, M. W., and S. Ng (2016): "FRED-MD: A monthly database for macroeconomic research," Journal of Business & Economic Statistics, 34(4), 574-589.
Medeiros, M. C., G. F. Vasconcelos, Á. Veiga, and E. Zilberman (2019): "Forecasting inflation in a data-rich environment: the benefits of machine learning methods," Journal of Business & Economic Statistics, (forthcoming).
Mullainathan, S., and J. Spiess (2017): "Machine learning: An applied econometric approach," Journal of Economic Perspectives, 31(2), 87-106.
Oksanen, J., F. G. Blanchet, M. Friendly, R. Kindt, P. Legendre, D. McGlinn, P. R. Minchin, R. B. O'Hara, G. L. Simpson, P. Solymos, M. H. H. Stevens, E. Szoecs, and H. Wagner (2019): vegan: Community Ecology Package, R package version 2.5-6.
Orsenigo, C., and C. Vercellis (2013): "Linear versus nonlinear dimensionality reduction for banks' credit rating prediction," Knowledge-Based Systems, 47, 14-22.
Pettenuzzo, D., and F. Ravazzolo (2016): "Optimal portfolio choice under decision-based model combinations," Journal of Applied Econometrics, 31(7), 1312-1332.
Pfarrhofer, M. (2020): "Forecasts with Bayesian vector autoregressions under real time conditions," arXiv preprint arXiv:2004.04984.
Polson, N. G., and J. G. Scott (2010): "Shrink globally, act locally: Sparse Bayesian regularization and prediction," Bayesian Statistics, 9, 501-538.
Raftery, A., M. Kárný, and P. Ettler (2010): "Online prediction under model uncertainty via Dynamic Model Averaging: Application to a cold rolling mill," Technometrics, 52(1), 52-66.
Ribeiro, B., A. Vieira, and J. C. das Neves (2008): "Supervised Isomap with dissimilarity measures in embedding learning," in Iberoamerican Congress on Pattern Recognition, pp. 389-396. Springer.
Richards, J., and R. Cannoodt (2019): diffusionMap: Diffusion Map, R package version 1.2.0.
Richards, J. W., P. E. Freeman, A. B. Lee, and C. M. Schafer (2009): "Exploiting low-dimensional structure in astronomical spectra," The Astrophysical Journal, 691(1), 32-42.
Roweis, S. T., and L. K. Saul (2000): "Nonlinear dimensionality reduction by locally linear embedding," Science, 290(5500), 2323-2326.
Saxe, A. M., Y. Bansal, J. Dapello, M. Advani, A. Kolchinsky, B. D. Tracey, and D. D. Cox (2019): "On the information bottleneck theory of deep learning," Journal of Statistical Mechanics: Theory and Experiment, 2019(12), 124020.
Schölkopf, B., A. Smola, and K.-R. Müller (1998): "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, 10(5), 1299-1319.
Stella, A., and J. H. Stock (2013): "A state-dependent model for inflation forecasting," FRB International Finance Discussion Paper, (1062).
Stock, J., and M. Watson (1999): "Forecasting inflation," Journal of Monetary Economics, 44(2), 293-335.
Stock, J., and M. Watson (2002): "Macroeconomic forecasting using diffusion indexes," Journal of Business & Economic Statistics, 20(2), 147-162.
Stock, J., and M. Watson (2008): "Phillips curve inflation forecasts," NBER Working Papers 14322, National Bureau of Economic Research, Inc.
Stock, J. H., and M. W. Watson (2007): "Why has U.S. inflation become harder to forecast?," Journal of Money, Credit and Banking, 39(s1), 3-33.
Stock, J. H., and M. W. Watson (2016): "Core inflation and trend inflation," Review of Economics and Statistics, 98(4), 770-784.
Taylor, S. J. (1982): "Financial returns modelled by the product of two stochastic processes: a study of the daily sugar prices 1961-75," Time Series Analysis: Theory and Practice, 1, 203-226.
Tenenbaum, J. B., V. De Silva, and J. C. Langford (2000): "A global geometric framework for nonlinear dimensionality reduction," Science, 290(5500), 2319-2323.
Timmermann, A. (2006): "Forecast combinations," Handbook of Economic Forecasting, 1, 135-196.
Zelnik-Manor, L., and P. Perona (2004): "Self-tuning spectral clustering," Advances in Neural Information Processing Systems, 17, 1601-1608.
Zime, S. (2014): "Economic performance evaluation and classification using hybrid manifold learning and support vector machine model," IEEE, 184-191.

Appendices

A Technical Appendix
A.1 Non-centered Parameterization
To implement the Bayesian shrinkage priors in the TVP regression defined by Eq. 18 and Eq. 19, we use the non-centered parameterization proposed in Frühwirth-Schnatter and Wagner (2010). Intuitively speaking, this allows us to move the process innovation variances into the observation equation and to discriminate between a time-invariant and a time-varying part of the model. The non-centered parameterization of the model is given by:

    y_{t+h} = d_{t+h}' \beta + d_{t+h}' \sqrt{V} \tilde{\beta}_{t+h} + \epsilon_{t+h},   \epsilon_{t+h} ~ N(0, \sigma_{t+h}^2),   (B.1)
    \tilde{\beta}_{t+h} = \tilde{\beta}_{t+h-1} + \varepsilon_{t+h},   \varepsilon_{t+h} ~ N(0, I_M),   \tilde{\beta}_0 = 0_M,   (B.2)

where the jth element of \tilde{\beta}_{t+h} is given by \tilde{\beta}_{j,t+h} = (\beta_{j,t+h} - \beta_j) / \sqrt{v_j} for j = 1, ..., M.

Conditional on the normalized states \tilde{\beta}_{t+h}, Eq. B.1 can be written as a linear regression model as follows:

    y_{t+h} = D_{t+h}' \alpha + \epsilon_{t+h},   (B.3)

with D_{t+h} = [d_{t+h}', (\tilde{\beta}_{t+h} \odot d_{t+h})']' denoting a 2M-dimensional vector of regressors and \alpha = (\beta', \sqrt{v_1}, ..., \sqrt{v_M})' a 2M-dimensional coefficient vector. This parameterization implies that the state innovation variances (or, more precisely, their square roots) are moved into the observation equation, so that we can estimate them alongside \beta (conditional on the states \tilde{\beta}_{t+h}).
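As a small illustration of Eq. B.3, the following R sketch (ours, with all inputs simulated for illustration) builds the stacked regressor matrix with rows D_{t+h}' from a factor matrix and a draw of the normalized states:

    # Sketch: the static representation of the non-centered TVP regression, Eq. (B.3).
    set.seed(1)
    TT <- 200; M <- 3
    d <- matrix(rnorm(TT * M), TT, M)                             # latent factors d_t
    beta_tilde <- apply(matrix(rnorm(TT * M), TT, M), 2, cumsum)  # random walks with N(0, I_M) shocks
    D <- cbind(d, beta_tilde * d)        # rows D_t' = [d_t', (beta_tilde_t * d_t)']
    # Conditional on beta_tilde, y = D %*% alpha + eps is linear in
    # alpha = (beta', sqrt(v_1), ..., sqrt(v_M))'.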
A.2 Prior Setup

A.2.1 Priors on the Regression Coefficients
We use a zero-mean multivariate Gaussian prior on \alpha:

    \alpha | V_0 ~ N(0, V_0),   (B.4)

with V_0 denoting a 2M-dimensional prior variance-covariance matrix, V_0 = diag(\tau_1^2, ..., \tau_{2M}^2). This matrix collects the prior shrinkage parameters \tau_j^2 associated with the time-invariant regression coefficients and the process innovation standard deviations.

In the empirical work, the priors we consider differ in the specification of V_0. The first is the stochastic search variable selection (SSVS) prior of George and McCulloch (1993) and the second is the Horseshoe (HS) prior of Carvalho et al. (2010).

1. SSVS Prior:
The SSVS prior pushes coefficients associated with irrelevant variables towards zero by using a mixture of two Gaussians. A specific mixture component is selected by introducing an auxiliary binary indicator variable \gamma_j. More formally, the SSVS prior specifies \tau_j^2 (j = 1, ..., 2M) such that

    \tau_j^2 = (1 - \gamma_j) \tau_{0j}^2 + \gamma_j \tau_{1j}^2,   (B.5)

with \tau_{0j}^2 \ll \tau_{1j}^2 being fixed prior variances. If \gamma_j = 1, the prior variance is \tau_{1j}^2, which is set to a large value; hence, little shrinkage is introduced. By contrast, if \gamma_j = 0, the prior variance \tau_{0j}^2 is close to zero, the corresponding amount of shrinkage is large, and the posterior distribution is tightly centered on zero.

The prior probability that \gamma_j = 1 is set equal to

    Prob(\gamma_j = 1) = 1 - Prob(\gamma_j = 0) = p_m,   p_m = 1/2.   (B.6)

This choice of the prior inclusion probability implies that every quantity is equally likely to enter the model.

To control for scaling differences, we adopt the semi-automatic approach proposed in George et al. (2008) and choose \tau_{0j}^2 = 0.01 \hat{\sigma}_j^2 and \tau_{1j}^2 = 100 \hat{\sigma}_j^2 for j = 1, ..., 2M. Here, \hat{\sigma}_j^2 denotes the OLS variance of the corresponding coefficient in a standard regression model with constant parameters.
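A minimal R sketch of the implied mixture selection for a single coefficient, anticipating step 4 of the sampler in Section A.3, looks as follows (our illustration; the values of alpha_j, tau0 and tau1 are placeholders):

    # Sketch: drawing the SSVS indicator and prior variance for one coefficient.
    ssvs_draw <- function(alpha_j, tau0, tau1, p_m = 0.5) {
      u1 <- dnorm(alpha_j, 0, tau1) * p_m          # proportional to u_1j
      u0 <- dnorm(alpha_j, 0, tau0) * (1 - p_m)    # proportional to u_0j
      gamma_j <- rbinom(1, 1, u1 / (u0 + u1))      # select the mixture component
      c(gamma = gamma_j, tau2 = if (gamma_j == 1) tau1^2 else tau0^2)
    }
    ssvs_draw(alpha_j = 0.4, tau0 = 0.01, tau1 = 1)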
2. Horseshoe Prior:

The horseshoe prior of Carvalho et al. (2010) achieves shrinkage by introducing local and global shrinkage parameters (see Polson and Scott, 2010). These follow a standard half-Cauchy distribution restricted to the positive real numbers:

    \tau_j^2 = \zeta_j^2 \varsigma^2,   \zeta_j ~ C^+(0, 1),   \varsigma ~ C^+(0, 1).   (B.7)

While the global component \varsigma strongly pushes all coefficients in \alpha towards the prior mean (i.e., zero), the local scalings {\zeta_j}_{j=1}^{2M} allow for variable-specific departures from zero even in light of a global scaling parameter close to zero. This flexibility leads to heavy tails in the marginal prior (obtained after integrating out \zeta_j), which turns out to be useful for forecasting.
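For reference, one sweep of the auxiliary inverse-Gamma representation of Makalic and Schmidt (2015), used in step 4 of the algorithm below, can be sketched in a few lines of R (our illustration; alpha denotes the current coefficient draw and sig2 plays the role of the global parameter):

    # Sketch: one sweep of the horseshoe update via inverse-Gamma auxiliaries
    # (Makalic and Schmidt, 2015); 1/rgamma(n, a, rate = b) draws from G^-1(a, b).
    hs_step <- function(alpha, eta, phi, sig2) {
      K <- length(alpha)
      zeta2 <- 1 / rgamma(K, shape = 1, rate = 1 / eta + alpha^2 / (2 * sig2))
      sig2  <- 1 / rgamma(1, shape = (K + 1) / 2,
                          rate = 1 / phi + 0.5 * sum(alpha^2 / zeta2))
      eta <- 1 / rgamma(K, shape = 1, rate = 1 + 1 / zeta2)
      phi <- 1 / rgamma(1, shape = 1, rate = 1 + 1 / sig2)
      list(tau2 = zeta2 * sig2, zeta2 = zeta2, sig2 = sig2, eta = eta, phi = phi)
    }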
A.3 Full Conditional Posterior Simulation

We carry out posterior inference by using a Markov chain Monte Carlo (MCMC) algorithm to simulate from the joint posterior of the parameters, the log-volatilities and the TVPs. This MCMC algorithm consists of the following steps:

1. Conditional on the time-varying part of the coefficients and the stochastic volatilities, we draw \alpha = (\beta', \sqrt{v_1}, ..., \sqrt{v_M})' from N(\bar{\alpha}, \bar{V}) with \bar{V} = (\tilde{D}'\tilde{D} + V_0^{-1})^{-1} and \bar{\alpha} = \bar{V} \tilde{D}'\tilde{y}. Here, \tilde{y} is a T-dimensional vector with typical element y_{t+h}/\sigma_{t+h} and \tilde{D} is a T x 2M matrix with typical row D_{t+h}'/\sigma_{t+h}.

2. Controlling for all other model parameters, the full history of \tilde{\beta}_{t+h} is sampled using the forward-filtering backward-sampling (FFBS) algorithm proposed by Carter and Kohn (1994) and Frühwirth-Schnatter (1994). For constant parameter models this step is skipped.

3. The stochastic volatilities \log \sigma_{t+h} are drawn by employing the algorithm of Kastner and Frühwirth-Schnatter (2014), implemented in the stochvol R package of Kastner (2016).

4. Sampling the diagonal elements of V_0 depends on the specific prior setup chosen.

   • If the SSVS prior is used, we simulate the indicators \gamma_j from a Bernoulli distribution with the probability that \gamma_j = 1 given by

       Prob(\gamma_j = 1 | \alpha_j) = u_{1j} / (u_{0j} + u_{1j}),
       u_{1j} = \tau_{1j}^{-1} \exp{ -\alpha_j^2 / (2 \tau_{1j}^2) } p_m,
       u_{0j} = \tau_{0j}^{-1} \exp{ -\alpha_j^2 / (2 \tau_{0j}^2) } (1 - p_m).

   • If we adopt the HS prior, we rely on the hierarchical representation of Makalic and Schmidt (2015). Introducing auxiliary random quantities \eta_j and \varphi that follow an inverse Gamma distribution, we can draw \zeta_j^2 and \varsigma^2 as follows:

       \zeta_j^2 | \alpha_j, \varsigma^2, \eta_j ~ G^{-1}(1, \eta_j^{-1} + \alpha_j^2 / (2 \varsigma^2)),
       \varsigma^2 | \alpha, \zeta, \varphi ~ G^{-1}((2M + 1)/2, \varphi^{-1} + (1/2) \sum_{j=1}^{2M} \alpha_j^2 \zeta_j^{-2}),
       \eta_j | \zeta_j ~ G^{-1}(1, 1 + \zeta_j^{-2}),
       \varphi | \varsigma^2 ~ G^{-1}(1, 1 + \varsigma^{-2}).

We sample from the relevant full conditional posterior distributions iteratively. This is repeated 10,000 times, with an initial set of draws discarded as burn-in.
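Returning to step 1, the Gaussian update amounts to a standard Bayesian regression draw. A compact R sketch (ours; the Cholesky-based draw is one common implementation choice, not necessarily the authors'):

    # Sketch of step 1: drawing alpha from N(alpha_bar, V_bar) after dividing
    # the data by the stochastic volatilities.
    draw_alpha <- function(y, D, sigma, V0_inv) {
      Dt <- D / sigma                                   # row t becomes D_t' / sigma_t
      yt <- y / sigma
      V_bar <- chol2inv(chol(crossprod(Dt) + V0_inv))   # (D~'D~ + V0^-1)^-1
      a_bar <- V_bar %*% crossprod(Dt, yt)              # V_bar D~' y~
      as.vector(a_bar + t(chol(V_bar)) %*% rnorm(ncol(D)))
    }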
Data Appendix

The Federal Reserve Economic Data (FRED) database contains monthly observations of macroeconomic variables for the US and is available for download at https://research.stlouisfed.org. Details on the dataset can be found in McCracken and Ng (2016). For each data vintage (available from 1999:08), the time series start in January 1959. Due to missing values in some of the series, we preselect 105 variables and transform them according to Table C.1. We select all variables for our models except for the extended Phillips curve, for which we choose the variables indicated by the column PART.

Table C.1: Data description
FRED Mnemonic | Description | Trans I(0) | PART | FULL
RPI | Real personal income | 5 | | x
W875RX1 | Real personal income ex transfer receipts | 5 | x | x
INDPRO | IP Index | 5 | x | x
IPFPNSS | IP: Final Products | 5 | | x
IPFINAL | IP: Final Products (Market Group) | 5 | | x
IPCONGD | IP: Consumer Goods | 5 | | x
IPMAT | IP: Materials | 5 | | x
IPMANSICS | IP: Manufacturing (SIC) | 5 | | x
CUMFNS | Capacity Utilization: Manufacturing | 2 | x | x
CLF16OV | Civilian Labor Force | 5 | | x
CE16OV | Civilian Employment | 5 | | x
UNRATE | Civilian Unemployment Rate | 2 | x | x
UEMPMEAN | Average Duration of Unemployment (Weeks) | 2 | | x
UEMPLT5 | Civilians Unemployed: Less Than 5 Weeks | 5 | | x
UEMP5TO14 | Civilians Unemployed for 5-14 Weeks | 5 | | x
UEMP15OV | Civilians Unemployed: 15 Weeks & Over | 5 | | x
UEMP15T26 | Civilians Unemployed for 15-26 Weeks | 5 | | x
UEMP27OV | Civilians Unemployed for 27 Weeks and Over | 5 | | x
CLAIMSx | Initial Claims | 5 | x | x
PAYEMS | All Employees: Total nonfarm | 5 | x | x
USGOOD | All Employees: Goods-Producing Industries | 5 | | x
CES1021000001 | All Employees: Mining and Logging: Mining | 5 | | x
USCONS | All Employees: Construction | 5 | | x
MANEMP | All Employees: Manufacturing | 5 | | x
DMANEMP | All Employees: Durable goods | 5 | | x
NDMANEMP | All Employees: Nondurable goods | 5 | | x
SRVPRD | All Employees: Service-Providing Industries | 5 | | x
USWTRADE | All Employees: Wholesale Trade | 5 | | x
USTRADE | All Employees: Retail Trade | 5 | | x
USFIRE | All Employees: Financial Activities | 5 | | x
USGOVT | All Employees: Government | 5 | | x
CES0600000007 | Avg Weekly Hours: Goods-Producing | 1 | x | x
AWOTMAN | Avg Weekly Overtime Hours: Manufacturing | 2 | | x
AWHMAN | Avg Weekly Hours: Manufacturing | 1 | | x
CES0600000008 | Avg Hourly Earnings: Goods-Producing | 6 | x | x
CES2000000008 | Avg Hourly Earnings: Construction | 6 | | x
CES3000000008 | Avg Hourly Earnings: Manufacturing | 6 | | x
HOUST | Housing Starts: Total New Privately Owned | 4 | x | x
HOUSTNE | Housing Starts, Northeast | 4 | | x
HOUSTMW | Housing Starts, Midwest | 4 | | x
HOUSTS | Housing Starts, South | 4 | | x
HOUSTW | Housing Starts, West | 4 | | x
PERMIT | New Private Housing Permits (SAAR) | 4 | | x
PERMITNE | New Private Housing Permits, Northeast (SAAR) | 4 | | x
PERMITMW | New Private Housing Permits, Midwest (SAAR) | 4 | | x
PERMITS | New Private Housing Permits, South (SAAR) | 4 | | x
PERMITW | New Private Housing Permits, West (SAAR) | 4 | | x
CMRMTSPLx | Real Manu. and Trade Industries Sales | 5 | x | x
RETAILx | Retail and Food Services Sales | 5 | | x
AMDMNOx | New Orders for Durable goods | 5 | | x
ANDENOx | New Orders for Nondefense Capital goods | 5 | | x
AMDMUOx | Unfilled Orders for Durable goods | 5 | | x
BUSINVx | Total Business Inventories | 5 | x | x
ISRATIOx | Total Business: Inventories to Sales Ratio | 2 | | x
UMCSENTx | Consumer Sentiment Index | 2 | | x
OILPRICEx | Crude Oil, spliced WTI and Cushing | 6 | | x
PPICMM | PPI: Metals and metal products | 6 | x | x
CPIAUCSL | CPI: All Items | 6 | | x
CPIAPPSL | CPI: Apparel | 6 | | x
CPITRNSL | CPI: Transportation | 6 | | x
CPIMEDSL | CPI: Medical Care | 6 | | x
CUSR0000SAC | CPI: Commodities | 6 | | x
CUSR0000SAS | CPI: Services | 6 | | x
CPIULFSL | CPI: All Items Less Food | 6 | | x
CUSR0000SA0L5 | CPI: All Items Less Medical Care | 6 | | x
FEDFUNDS | Effective Federal Funds Rate | 2 | x | x
M1SL | M1 Money Stock | 6 | | x
M2SL | M2 Money Stock | 6 | | x
M2REAL | Real M2 Money Stock | 5 | x | x
AMBSL | St. Louis Adjusted Monetary Base | 6 | | x
TOTRESNS | Total Reserves of Depository Institutions | 6 | | x
NONBORRES | Reserves of Depository Institutions | 7 | | x
BUSLOANS | Commercial and Industrial Loans | 6 | x | x
REALLN | Real Estate Loans at All Commercial Banks | 6 | x | x
NONREVSL | Total Nonrevolving Credit | 6 | | x
CONSPI | Nonrevolving consumer credit to Personal Income | 2 | | x
MZMSL | MZM Money Stock | 6 | | x
DTCOLNVHFNM | Consumer Motor Vehicle Loans Outstanding | 6 | | x
DTCTHFNM | Total Consumer Loans and Leases Outstanding | 6 | | x
INVEST | Securities in Bank Credit at All Commercial Banks | 6 | | x
CP3Mx | 3-Month AA Financial Commercial Paper Rate | 2 | | x
TB3MS | 3-Month Treasury Bill | 2 | x | x
TB6MS | 6-Month Treasury Bill | 2 | | x
GS1 | 1-Year Treasury Rate | 2 | | x
GS5 | 5-Year Treasury Rate | 2 | | x
GS10 | 10-Year Treasury Rate | 2 | x | x
AAA | Moody's Seasoned Aaa Corporate Bond Yield | 2 | | x
BAA | Moody's Seasoned Baa Corporate Bond Yield | 2 | | x
COMPAPFFx | 3-Month Commercial Paper Minus FEDFUNDS | 1 | | x
TB3SMFFM | 3-Month Treasury C Minus FEDFUNDS | 1 | | x
TB6SMFFM | 6-Month Treasury C Minus FEDFUNDS | 1 | | x
T1YFFM | 1-Year Treasury C Minus FEDFUNDS | 1 | | x
T5YFFM | 5-Year Treasury C Minus FEDFUNDS | 1 | | x
T10YFFM | 10-Year Treasury C Minus FEDFUNDS | 1 | | x
AAAFFM | Moody's Aaa Corporate Bond Minus FEDFUNDS | 1 | | x
BAAFFM | Moody's Baa Corporate Bond Minus FEDFUNDS | 1 | | x
TWEXMMTH | Trade Weighted U.S. Dollar Index: Major Currencies | 5 | | x
EXSZUSx | Switzerland / U.S. Foreign Exchange Rate | 5 | x | x
EXJPUSx | Japan / U.S. Foreign Exchange Rate | 5 | | x
EXUSUKx | U.S. / UK Foreign Exchange Rate | 5 | | x
EXCAUSx | Canada / U.S. Foreign Exchange Rate | 5 | | x
S.P.500 | S&P's Common Stock Price Index: Composite | 5 | x | x
S.P..indust | S&P's Common Stock Price Index: Industrials | 5 | | x
S.P.div.yield | S&P's Composite Common Stock: Dividend Yield | 2 | | x
S.P.PE.ratio | S&P's Composite Common Stock: Price-Earnings Ratio | 5 | | x
Note: Column Trans I(0) denotes the transformation applied to each time series x_t to achieve approximate stationarity: (1) no transformation, (2) \Delta x_t, (4) \log(x_t), (5) \Delta \log(x_t), (6) \Delta^2 \log(x_t), (7) \Delta(x_t / x_{t-1} - 1.0).
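For reproducibility, the transformation codes map into R as follows (a minimal sketch for a single series x, following the code numbers in the note above):

    # Sketch: applying the Table C.1 transformation codes to one series.
    transform_series <- function(x, code) {
      switch(as.character(code),
        "1" = x,                                            # no transformation
        "2" = c(NA, diff(x)),                               # first difference
        "4" = log(x),                                       # log level
        "5" = c(NA, diff(log(x))),                          # log first difference
        "6" = c(NA, NA, diff(log(x), differences = 2)),     # second log difference
        "7" = c(NA, NA, diff(x[-1] / x[-length(x)] - 1)))   # change in the growth rate
    }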