Macroeconomic Forecasting with Fractional Factor Models
Tobias Hartl∗
University of Regensburg, 93053 Regensburg, Germany
Institute for Employment Research (IAB), 90478 Nuremberg, Germany

May 2020
Abstract.
We combine high-dimensional factor models with fractional integration methods and derive models where nonstationary, potentially cointegrated data of different persistence is modelled as a function of common fractionally integrated factors. A two-stage estimator that combines principal components and the Kalman filter is proposed. The forecast performance is studied for a high-dimensional US macroeconomic data set, where we find that the benefits from the fractional factor models can be substantial, as they outperform univariate autoregressions, principal components, and the factor-augmented error-correction model.
Keywords.
Fractional integration, state space model, principal components, long memory, Kalman filter
JEL-Classification.
C32, C38, C51, C53

∗ Corresponding author. E-Mail: [email protected]. The author thanks Federico Carlini, Manfred Deistler, Christoph Rust, Rolf Tschernig, Enzo Weber, Roland Weigand, and the participants at the Long Memory Conference 2018 in Aalborg, at the Workshop on Statistics and Econometrics 2018 in Passau, at the Econometric Society Europe Meeting 2018 in Cologne, at the Annual Meeting of the German Statistical Society 2018 in Linz, and at the International Conference on Computational and Financial Econometrics 2018 in Pisa for many valuable comments. Support through the projects TS283/1-1 and WE4847/4-1 financed by the German Research Foundation (DFG) is gratefully acknowledged.

Introduction
At least since the seminal work of Forni et al. (2000) and Stock and Watson (2002), factor models have become a popular tool for forecasting macroeconomic dynamics, as they handle covariation in the cross-section efficiently by condensing it to a typically small number of common latent factors. Regardless of their applicability to large data sets, the major drawback of standard factor models is an inefficient use of longitudinal information: in contrast to e.g. VARMA models, the vast majority of factor models requires stationarity. Consequently, key features of macroeconomic data, such as nonstationary trends and cointegration, are not captured adequately by standard factor models but rather differenced away. Over-differencing of latent processes poses an additional risk, since model selection criteria and model specification tests for the number of factors are likely to miss these components, as eigenvalues corresponding to over-differenced series converge to zero.

A more flexible setup is suggested by a young strand of the factor model literature that adds unit roots to the model (see, e.g., Peña and Poncela, 2006; Eickmeier, 2009; Chang et al., 2009; Banerjee et al., 2014, 2016; Barigozzi et al., 2016). But these models come with the drawback of requiring a priori assumptions about the degree of persistence, and typically the series under study are assumed to be I(1). This makes an endogenous treatment of the (unknown) long-run dynamic characteristics of observable time series impossible. Statistical inference about the degree of persistence of an observable variable is then limited to prior unit root testing, ignoring the non-standard behavior of many economic series that are fractionally integrated.
Misspecifying the integration orders of the observable variables may bias the factor estimates, can yield wrong inference about the number of common factors, and is likely to deteriorate the forecast performance.

To address these problems, semiparametric methods that are robust to fractional integration have been proposed by Luciani and Veredas (2015) for a single fractionally integrated factor and by Ergemen (2019) for pervasive fractionally integrated nuisance. Allowing for a wide range of persistence and an endogenous treatment of integration orders, Hartl and Weigand (2019b) derive a parametric fractionally integrated factor model and apply it to realized covariance matrices.

In macroeconomics, fractionally integrated factor models have not played a role so far, although there is comprehensive evidence for long memory and fractional cointegration in the data (cf. e.g. Hassler and Wolters, 1995; Baillie, 1996; Gil-Alaña and Robinson, 1997; Tschernig et al., 2013).

Tackling this issue, this paper aims to provide insights on whether fractional integration techniques have merit at least for a relevant fraction of the numerous and heterogeneous macroeconomic variables typically under study. By elaborating fractionally integrated factor models, we construct setups where cross-sectional covariation in the data in levels is driven by fractionally integrated latent factors that may impose cointegration relations. In detail, we propose three different factor models that generalize the aforementioned factor models to fractionally integrated processes. The first model introduces ARFIMA processes in the nonstationary factor model setup of Barigozzi et al. (2016), while the second model distinguishes between purely fractionally integrated factors that impose cointegration relations and I(0) factors that model common short-run behavior of the data.
Finally, our third model generalizes the pre-differencing of the data for standard I(0) factor models by taking fractional differences.

Like standard factor models, they are applicable to high-dimensional data, but bear several advantages: the fractional factor models allow for a joint modelling of data of different persistence, do not require prior assumptions about the degree of persistence of the data but treat the integration orders endogenously, capture cointegration via the common fractionally integrated factors, and are more robust to over-differencing.

For the estimation of the latent factors we introduce a two-stage estimator, where initial factor estimates are obtained via principal components, before the model is cast in state space form such that the Kalman filter and smoother are applicable. For the latter to be computationally feasible, we use ARMA approximations for fractionally integrated processes as suggested in Hartl and Weigand (2019a). Estimation of the unknown model parameters and the latent factors is then carried out jointly via an expectation maximization algorithm.

In a pseudo out-of-sample forecast experiment for the high-dimensional US macroeconomic data set of McCracken and Ng (2016), we study the forecast accuracy of the fractional factor models. We provide a guided choice among the different models by considering the forecast performance for 112 macroeconomic variables. Finally, we find comprehensive evidence that adequately combining fractional integration techniques and factor models can improve forecasts substantially compared to standard factor models and other benchmarks.

The remaining paper is organized as follows. Section 2 details the construction of fractional factor models. The two-stage estimator for the factors and model parameters is discussed in section 3. Section 4 compares the forecast performance of the fractional factor models to different benchmarks in a pseudo out-of-sample forecast experiment, before section 5 concludes.
To begin with, consider the general form of a high-dimensional factor model for possibly fractionally integrated data
$$y_t = f(\chi_t) + u_t, \qquad t = 1, ..., T, \qquad (1)$$
where $y_t = (y_{1,t}, ..., y_{N,t})'$ is an $N$-dimensional observable time series with entries $y_{i,t} \sim I(d_i^*)$ that are integrated of order $d_i^*$, $d_i^* \in \mathbb{R}_{\geq 0}$. An integration order $d_i^*$ implies that the $d_i^*$-th fractional difference of a series is I(0), i.e. $\Delta_+^{d_i^*} y_{i,t} \sim I(0)$, $i = 1, ..., N$. The vector $\chi_t$ is $r$-dimensional and accounts for common short- and long-run dynamics among the $y_t$, and $u_t = (u_{1,t}, ..., u_{N,t})'$ holds the $N$ idiosyncratic errors and has a diagonal variance matrix.

The fractional difference operator $\Delta^d$ is defined as
$$\Delta^d = (1-L)^d = \sum_{j=0}^{\infty} \pi_j(d) L^j, \qquad \pi_j(d) = \frac{j-d-1}{j}\,\pi_{j-1}(d), \; j = 1, 2, ..., \qquad \pi_0(d) = 1, \qquad (2)$$
and a $+$-subscript denotes truncation of an operator at $t \leq 0$, e.g. for an arbitrary stochastic process $z_t$, $\Delta_+^d z_t = \sum_{j=0}^{t-1} \pi_j(d) L^j z_t$ (see e.g. Johansen, 2008). For $d \in \mathbb{N}$, fractionally integrated processes nest the standard integer-integrated specifications (e.g. I(0), I(1), and I(2) processes), whereas $d \in \mathbb{R}_{\geq 0}$ adds flexibility to the weighting of past shocks. Throughout the paper, we adopt the type II definition of fractional integration (Marinucci and Robinson, 1999) that assumes zero starting values for all fractional processes and, as a consequence, allows for a seamless treatment of the asymptotically stationary ($d < 1/2$) and the nonstationary ($d \geq 1/2$) case. Due to the type II definition the inverse fractional difference $\Delta_+^{-d} z_t$ exists.

Standard factor models, such as those considered in Forni et al. (2000), Bai and Ng (2002), and Stock and Watson (2002), are special cases of (1). They extract $r$ common factors of a data set in first and second differences, implying that the series in $y_t$ are I(1) and I(2). The common factors in $f(\chi_t)$ then correspond to the common trends in the Granger representation theorem for cointegrated data (see Barigozzi et al., 2016).

To give an intuition on how fractional integration affects the long-run properties of a time series, we note the following. For positive $d$ the autocovariance function of an I($d$) process decays at a hyperbolic rate, implying that a shock has a persistent impact on the I($d$) process, and a greater $d$ implies a more persistent impact of a shock. While an I($d = 1$) process is an unweighted sum of past shocks, an I($d$) process in general can be interpreted as a weighted sum of past shocks, where the weights depend on $d$ via (2). Furthermore, if a linear combination of a vector I($d$) process exists that is integrated of order $b < d$, then the series are cointegrated. Cointegration implies common (fractionally) integrated trends, which our models capture via $f(\chi_t)$. For a discussion of cointegration relations in a fractionally integrated factor model setup we refer to Hartl and Weigand (2019b).

We introduce three different fractionally integrated factor models in the next sections that are nested in (1) and differ in the functional relation between $\chi_t$ and $y_t$. Section 2.1 generalizes nonstationary factor models (cf. e.g. Barigozzi et al., 2016) to fractionally integrated processes. In section 2.2 we distinguish between fractionally integrated factors, which account for long-run co-movements in $y_t$, and I(0) factors, which allow for common short-run dynamics. Finally, section 2.3 generalizes the pre-differencing of standard factor models to fractional differencing.
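To make the truncated operator concrete, the recursion for $\pi_j(d)$ in (2) and the type II fractional difference $\Delta_+^d$ can be sketched in a few lines of Python. This is an illustration only; the function names are mine, not from the paper.

```python
import numpy as np

def frac_weights(d, n):
    """First n coefficients pi_j(d) of (1 - L)^d via the recursion in (2)."""
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = (j - d - 1.0) / j * pi[j - 1]
    return pi

def frac_diff(z, d):
    """Type II (truncated) fractional difference: sum_{j=0}^{t-1} pi_j(d) z_{t-j}."""
    T = len(z)
    pi = frac_weights(d, T)
    return np.array([pi[: t + 1] @ z[t::-1] for t in range(T)])

# d = 1 reproduces ordinary first differences (with z_1 kept as starting value)
z = np.array([1.0, 2.0, 4.0, 7.0])
print(frac_diff(z, 1.0))                   # [1. 1. 2. 3.]

# under the type II definition the inverse operator exists exactly:
x = frac_diff(z, -0.4)                     # cumulate z with order 0.4
print(np.allclose(frac_diff(x, 0.4), z))   # True
```

The last two lines illustrate the remark above: because all processes start from zero, applying $\Delta_+^{d}$ to $\Delta_+^{-d} z_t$ recovers $z_t$ exactly in finite samples.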
Consider a simple multivariate unobserved components model
$$y_t = \Lambda f_t + u_t, \qquad t = 1, ..., T, \qquad (3)$$
where $f(\chi_t) = \Lambda f_t$ in (1), $f_t = (f_{1,t}, ..., f_{r,t})'$ holds the $r$ common factors, $\Lambda$ is an $N \times r$ matrix of factor loadings that is assumed to have full column rank, and the errors $u_t$ account for idiosyncratic dynamics. The latent factors are assumed to follow $r$ fractionally integrated autoregressive processes
$$B_j(L) \Delta_+^{d_j} f_{j,t} = \zeta_{j,t}, \qquad j = 1, ..., r, \qquad (4)$$
where $B_j(L) = 1 - \sum_{k=1}^{p} B_{j,k} L^k$ is a stable lag polynomial. For the pervasive shocks that drive $f_t$ we assume $(\zeta_{1,t}, ..., \zeta_{r,t})' = \zeta_t \sim NID(0, Q)$, where $Q$ is diagonal. A matrix formulation of (4) follows directly by defining $d = (d_1, ..., d_r)'$ and the lag polynomials $D(d) = \mathrm{diag}(\Delta_+^{d_1}, ..., \Delta_+^{d_r})$ and $B(L) = \mathrm{diag}(B_1(L), ..., B_r(L))$, such that $B(L) D(d) f_t = \zeta_t$.

The errors $u_{i,t}$ are assumed to be mutually independent and are allowed to be autocorrelated,
$$\rho_i(L) u_{i,t} = \xi_{i,t}, \qquad \xi_{i,t} \sim NID(0, \sigma_{\xi_i}^2), \qquad i = 1, ..., N, \qquad (5)$$
where $\rho_i(L) = \sum_{k=0}^{p_i} \rho_{i,k} L^k$ is a stationary autoregressive lag polynomial.

As a consequence, the model may explain the various degrees of common persistence that characterize the data by common components with long memory. For $d_1 = ... = d_r = 0$ the model nests the approximate dynamic factor model of Stock and Watson (2002), while $d_j \in \{0, 1\}$, $j = 1, ..., r$, yields a nonstationary dynamic factor model with I(1) factors as considered in e.g. Barigozzi et al. (2016). Therefore, the model can be interpreted as a fractional generalization that requires neither prior differencing of the data nor a priori assumptions about the integration orders.

A more parsimonious factor model is proposed by Hartl and Weigand (2019b).
Their model distinguishes between $r_1$ purely fractional factors $f_t^{(1)} = (f_{1,t}^{(1)}, ..., f_{r_1,t}^{(1)})'$, which establish cointegration relations among the $y_t$, and $r_2$ stationary autoregressive components $f_t^{(2)} = (f_{1,t}^{(2)}, ..., f_{r_2,t}^{(2)})'$, which account for common short-run behavior. We consider a slightly more general modification that allows for autocorrelated idiosyncratic errors. The general framework for the dynamic orthogonal fractional components model is then given by
$$y_t = \begin{bmatrix} \Lambda^{(1)} & \Lambda^{(2)} \end{bmatrix} \begin{pmatrix} f_t^{(1)} \\ f_t^{(2)} \end{pmatrix} + u_t, \qquad t = 1, ..., T, \qquad (6)$$
$$\Delta_+^{d_j} f_{j,t}^{(1)} = \zeta_{j,t}^{(1)}, \qquad j = 1, ..., r_1, \qquad (7)$$
$$B_k^{(2)}(L) f_{k,t}^{(2)} = \zeta_{k,t}^{(2)}, \qquad k = 1, ..., r_2, \qquad (8)$$
$$\rho_i(L) u_{i,t} = \xi_{i,t}, \qquad i = 1, ..., N, \qquad (9)$$
for all $t = 1, ..., T$ and $r = r_1 + r_2 \leq N$. The $N$ idiosyncratic shocks $\xi_t = (\xi_{1,t}, ..., \xi_{N,t})'$ are assumed to follow independent Gaussian white noise processes, $\xi_t \sim NID(0, H)$. For the pervasive shocks $\zeta_t^{(1)} = (\zeta_{1,t}^{(1)}, ..., \zeta_{r_1,t}^{(1)})'$ and $\zeta_t^{(2)} = (\zeta_{1,t}^{(2)}, ..., \zeta_{r_2,t}^{(2)})'$ we assume $\mathrm{vec}(\zeta_t^{(1)}, \zeta_t^{(2)}) \sim NID(0, Q)$, where $Q$ is diagonal. In addition, we assume that the errors $u_t$ are independent of the common components $f_t$.

Define the polynomials $B^{(2)}(L) = \mathrm{diag}(B_1^{(2)}(L), ..., B_{r_2}^{(2)}(L))$ and $D^{(1)}(d) = \mathrm{diag}(\Delta_+^{d_1}, ..., \Delta_+^{d_{r_1}})$. Then the model can be shown to be nested in the setup of section 2.1 for $f_t = \mathrm{vec}(f_t^{(1)}, f_t^{(2)})$, $B(L) = \mathrm{diag}(I, B^{(2)}(L))$, and $D(d) = \mathrm{diag}(D^{(1)}(d), I)$. In terms of (1) the model specifies $f(\chi_t) = \Lambda^{(1)} f_t^{(1)} + \Lambda^{(2)} f_t^{(2)}$.

Note that the NID assumption on $\zeta_t$ together with a diagonal $Q$ yields $r$ orthogonal factors $f_t$. This common feature of many unobserved components models, which also applies to the models in sections 2.1 and 2.3, reduces the estimation uncertainty of the loadings and makes the framework very attractive for forecasting.
Since $u_t$ and $\zeta_t$ are assumed to be independent, any correlation among the variables in $y_t$ stems from the common long- and short-run components $f_t^{(1)}$ and $f_t^{(2)}$.

A third model that completes our toolbox of fractionally integrated factor models takes fractional differences of the observable variables to arrive at a short memory model, where all components are at most I(0). Hence, we contrast our two models from sections 2.1 and 2.2 with an additional approach that excludes fractional integration from the factors. For this purpose we define
$$\Delta_+^{d_i^*} y_{i,t} = \Lambda_i f_t + \xi_{i,t}, \qquad t = 1, ..., T, \quad i = 1, ..., N, \qquad (10)$$
$$B_j(L) f_{j,t} = \zeta_{j,t}, \qquad j = 1, ..., r. \qquad (11)$$
As before, $y_t = (y_{1,t}, ..., y_{N,t})'$ are the observable variables, $\Lambda = [\Lambda_1', ..., \Lambda_N']'$ holds the factor loadings, and $f_t = (f_{1,t}, ..., f_{r,t})'$ contains the $r$ latent factors. In the notation of (1) this implies $f(\chi_t) = D(-d^*) \Lambda f_t$ and $u_t = D(-d^*) \xi_t$ with $\xi_t = (\xi_{1,t}, ..., \xi_{N,t})'$ and $d^* = (d_1^*, ..., d_N^*)'$.

By defining $B(L) = \mathrm{diag}(B_1(L), ..., B_r(L))$ as in sections 2.1 and 2.2, the factors $f_t$ can be written as a diagonal VAR process $B(L) f_t = \zeta_t$, where $\zeta_t = (\zeta_{1,t}, ..., \zeta_{r,t})'$. The idiosyncratic and pervasive shocks are assumed to be orthogonal and to follow independent Gaussian white noise processes, $\xi_t \sim NID(0, H)$ and $\zeta_t \sim NID(0, Q)$.

By taking fractional differences prior to estimating a factor model, our approach generalizes the pre-differencing of standard factor models to the fractional domain. In fractional differences, our model is an approximate dynamic factor model and, therefore, nests the model of Stock and Watson (2002) for $d_1^*, ..., d_N^* \in \mathbb{N}$.

Taking fractional differences of order $d_i^*$ ensures for each $\Delta_+^{d_i^*} y_{i,t}$ that the common and idiosyncratic components are at most I(0).
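A minimal sketch of this pre-differencing approach, assuming the integration orders $d_i^*$ are already available (in practice they would be estimated, e.g. semiparametrically). Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def frac_diff(z, d):
    """Truncated (type II) fractional difference of a univariate series."""
    T = len(z)
    pi = np.empty(T)
    pi[0] = 1.0
    for j in range(1, T):
        pi[j] = (j - d - 1.0) / j * pi[j - 1]
    return np.array([pi[: t + 1] @ z[t::-1] for t in range(T)])

def prediff_pc(Y, d_star, r):
    """Fractionally difference each column of Y by its order d_i^*, then
    extract r principal-component factors from the differenced data."""
    Yd = np.column_stack([frac_diff(Y[:, i], d_star[i]) for i in range(Y.shape[1])])
    Yd = Yd - Yd.mean(axis=0)
    # PC via eigendecomposition of the sample covariance of the differenced data
    eigval, eigvec = np.linalg.eigh(np.cov(Yd, rowvar=False))
    load = eigvec[:, ::-1][:, :r]          # loadings: leading eigenvectors
    return Yd @ load, load                 # factor estimates and loadings

# toy example: one I(1) factor loading on N = 20 series
rng = np.random.default_rng(0)
T, N, r = 300, 20, 1
f = np.cumsum(rng.standard_normal(T))
Y = np.outer(f, rng.standard_normal(N)) + 0.1 * rng.standard_normal((T, N))
fhat, load = prediff_pc(Y, d_star=np.ones(N), r=r)
```

In the toy example the estimated factor tracks the fractional difference of the true factor closely (up to sign and scale, the usual PC rotation indeterminacy).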
Note that fractional differences are less sensitive to over-differencing than integer differences, since the method ensures that the fractional difference of the most persistent factor that loads on $y_{i,t}$ is I(0) for all $i = 1, ..., N$.

In this section we discuss both the estimation of the latent factors $f(\chi_t)$ in (1) for the three different factor models proposed in sections 2.1 to 2.3 and the estimation of the unknown model parameters. The expectation-maximization (EM) algorithm is a natural choice for the estimation of parametric factor models (cf. e.g. Jungbacker and Koopman, 2015) and has been derived for fractionally integrated factor models in Hartl and Weigand (2019a, appendix B). In the E-step, the latent factors are estimated for a given set of parameters via the Kalman filter. The M-step then updates the parameter vector by maximizing the likelihood function given the factor estimates from the E-step. Therefore, the EM algorithm allows for a joint estimation of factors and model parameters.

Since the EM algorithm is a parametric estimator, it requires starting values for the unknown model parameters in sections 2.1 to 2.3. We tackle this problem by proposing a two-stage estimator. The first stage is described in section 3.1: we estimate the latent factors via the nonparametric method of principal components (PC) and propose estimators for the unknown model parameters. We include a consistency proof for the PC estimator for fractionally integrated factors with integration orders in $\mathbb{R}_{\geq 0}$, since consistency of the PC estimator has so far only been shown in more restrictive settings.

The second stage is considered in section 3.2. We derive an approximate state space formulation for each of the factor models in sections 2.1 to 2.3, so that the Kalman filter can be applied to estimate the latent factors. Finally, we discuss the joint estimation of the model parameters and the latent factors via the EM algorithm.
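To fix ideas, one EM iteration for a static Gaussian factor model (a deliberate simplification of the parametric models above, not the paper's fractional specification) can be sketched as follows; these are the classical Rubin-Thayer updates, and the names are mine.

```python
import numpy as np

def em_step(Y, Lam, Psi):
    """One EM iteration for y_t = Lam f_t + u_t, f_t ~ N(0, I_r),
    u_t ~ N(0, Psi) with Psi diagonal (Rubin-Thayer updates)."""
    T, N = Y.shape
    Pinv = np.diag(1.0 / np.diag(Psi))
    G = np.linalg.inv(np.eye(Lam.shape[1]) + Lam.T @ Pinv @ Lam)
    F = Y @ Pinv @ Lam @ G                      # E[f_t | y_t], stacked as rows
    M = T * G + F.T @ F                         # sum_t E[f_t f_t' | y_t]
    Lam_new = (Y.T @ F) @ np.linalg.inv(M)
    Psi_new = np.diag(np.diag(Y.T @ Y - Lam_new @ F.T @ Y)) / T
    return Lam_new, Psi_new

def loglik(Y, Lam, Psi):
    """Gaussian log-likelihood under Sigma = Lam Lam' + Psi."""
    T, N = Y.shape
    S = Y.T @ Y / T
    Sig = Lam @ Lam.T + Psi
    sign, logdet = np.linalg.slogdet(Sig)
    return -0.5 * T * (N * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Sig, S)))

rng = np.random.default_rng(1)
T, N, r = 200, 10, 2
Y = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) \
    + 0.5 * rng.standard_normal((T, N))
Lam, Psi = rng.standard_normal((N, r)), np.eye(N)
ll = [loglik(Y, Lam, Psi)]
for _ in range(20):
    Lam, Psi = em_step(Y, Lam, Psi)
    ll.append(loglik(Y, Lam, Psi))
```

Each iteration is guaranteed not to decrease the likelihood, which is the property the paper exploits; in the fractional models the E-step is replaced by the Kalman smoother on the state space form of section 3.2.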
Sufficient conditions for a consistent estimation of $f(\chi_t)$ in (1) via PC were derived in Bai and Ng (2002) for stationary processes, in Bai (2004) for I(1) common components, and in Bai and Ng (2004) for $y_t \sim I(1)$, where nonstationarity may also stem from the idiosyncratic components. For a single fractionally integrated factor and fractional integration orders in $[0, 1]$, Luciani and Veredas (2015) have shown that the methods of Bai and Ng (2004) are also applicable. We generalize their results to non-negative integration orders and multiple fractionally integrated factors by showing consistency of the PC estimator for $f(\chi_t)$ in (1).

Since PC are estimated via an eigendecomposition of $\mathrm{Var}(y_t)$, the applicability of the PC estimator depends crucially on the stability of the variance. For $\max(d_i^*) < 0.5$ the $y_t$ are asymptotically stationary, and consequently the variance of $y_t$ converges as $t \to \infty$. Therefore, the PC estimator satisfies the assumptions of Bai and Ng (2002), where assumption A postulates boundedness of $\mathrm{plim}_{T\to\infty} T^{-1} \sum_{t=1}^{T} f_t f_t' = \Sigma_f < \infty$.

For $\max(d_i^*) \geq 0.5$ and $d_1 = ... = d_r$, we show that there exists a matrix $H$ such that the factors $f_t$ are estimated consistently up to a rotation by PC,
$$\frac{1}{T} \sum_{t=1}^{T} \left\| \hat{f}_t - H' f_t \right\|^2 \overset{p}{\longrightarrow} 0.$$
Expressions for $\hat{f}_t$ and $H$, together with a detailed proof, are given in appendix A.

Whenever there is at least one $d_j \neq d_k$, $j, k = 1, ..., r$, direct estimation of all common fractionally integrated factors is not feasible, since, depending on the scaling of the PC, either the contribution of the least persistent factors to the covariance of $y_t$ converges to zero, or the contribution of the most persistent factors diverges. In this case, one needs to separate $y_t$ into blocks of equal persistence. Starting with the most persistent block, latent factors are estimated via PC and projected out. The adjusted variables are then added to the next block of $y_t$, and the procedure repeats until a stationary set of variables is obtained.

Having established consistency of PC for the estimation of the latent fractionally integrated factors, we turn to the estimation of the dynamic parameters for the common factors. Since the dynamic properties differ among the three frameworks discussed in section 2, we consider them separately in the following.

ARFI factors
The common components of the model in section 2.1 are assumed to follow $r$ independent autoregressive fractionally integrated processes. Therefore, we rotate the PC estimates via the method of Matteson and Tsay (2011) to obtain dynamic orthogonal components. The parameters in (4) are estimated by maximizing the likelihood function for a multivariate fractionally integrated process (see Nielsen, 2004), which is given by
$$l(d, B, Q) = -\frac{T}{2} \log|Q| - \frac{1}{2} \sum_{t=1}^{T} (B(L)D(d)f_t)' \, Q^{-1} \, (B(L)D(d)f_t), \qquad (12)$$
with $B = \mathrm{vec}(B_1, ..., B_p) = \mathrm{vec}(B(L))$, and $B(L)$, $D(d)$ as defined in section 2.1. Plugging in the first-order condition $Q = Q(d, B) = T^{-1} \sum_{t=1}^{T} (B(L)D(d)f_t)(B(L)D(d)f_t)'$ and dropping the constant terms gives $l^*(d, B) = -\frac{T}{2} \log|Q(d, B)|$, which we maximize to obtain estimates for the unknown parameters $d_1, ..., d_r$, $B_1, ..., B_p$. For some data sets the assumption of orthogonal factors may be violated. Then, the diagonal assumption on $B(L)$ can be dropped, which does not affect the identification of the fractional factor VAR but increases the number of unknown parameters in (12). Factor loadings $\Lambda$ in (3) are estimated via ordinary least squares (OLS).

FI and AR factors
To derive an estimator for the dynamic parameters of the model in section 2.2, we first need to distinguish between the space spanned by the purely fractional factors and the stationary autoregressive components. We identify the two factor subspaces of $f_t^{(1)}$ and $f_t^{(2)}$ up to a rotation by estimating the fractional cointegration subspace and its orthogonal complement via the semiparametric method of Chen and Hurvich (2006), who use eigenvectors of an averaged periodogram matrix of the first $m$ Fourier frequencies to estimate the fractional cointegration subspace. Finally, orthogonal series within the fractional and non-fractional factors are obtained by applying the decorrelation method of Matteson and Tsay (2011). The resulting fractional and non-fractional factor estimates are denoted as $\hat{f}_t^{(1)}$ and $\hat{f}_t^{(2)}$, respectively.

Given the factor estimates $\hat{f}_t^{(1)}$ and $\hat{f}_t^{(2)}$ together with the observable variables $y_t$, we estimate the factor loadings $\Lambda$ in (6) and the AR coefficients in (8) via OLS. Estimates for the fractional integration orders of the common components in (7) are obtained by maximizing the likelihood of the $r_1$ ARFIMA(0, $d_j$, 0) processes, $j = 1, ..., r_1$.

AR factors
Due to the stationary representation of the model in section 2.3, the PC estimator of Bai and Ng (2002) is directly applicable. The factors are again decorrelated by means of the dynamic orthogonal components of Matteson and Tsay (2011). For a discussion of the consequences when a diagonal representation of the common factors is not feasible, we refer to the ARFI case. The dynamic coefficients for the $r$ common factors in (11), together with their factor loadings in (10), are estimated via OLS.

AR errors
An estimate for the idiosyncratic errors is obtained via $\hat{u}_t = y_t - \hat{\Lambda} \hat{f}_t$. Since the errors are assumed to follow $N$ independent autoregressive processes, the AR parameters are estimated via OLS.

The second stage of our estimator combines factor estimation for a given set of parameters via the Kalman filter and smoother together with parameter optimization via maximum likelihood (ML) in an EM algorithm. For the Kalman filter to be applicable, the different components of our fractional factor models are cast in state space form. Note that for a given sample size $T$ a finite state space representation of a type II fractionally integrated process exists but requires a state vector of dimension $T - 1$, as (2) shows. Since the Kalman filter sequentially inverts the $(T-1) \times (T-1)$ autocovariance matrix for each factor, a full representation of a fractionally integrated process can be very costly from a computational perspective, in particular for long time series. Therefore, section 3.2.1 discusses finite approximations that resemble the dynamic properties of fractionally integrated processes well and are computationally feasible. Section 3.2.2 derives the state space representation and section 3.2.3 considers parameter estimation.
The literature has considered a variety of approximations for long memory processes: Palma (2007, section 4.2) suggests truncated AR approximations, whereas Chan and Palma (1998) study truncated MA approximations. In a simulation study, Hartl and Weigand (2019a) find that small ARMA($v, w$) models with low orders $v, w$ outperform pure AR and MA approximations even if a high number of lags enters the latter models. In addition, the ML estimator for the integration order is found to be more precise when an ARMA approximation is used. As their simulation studies show, the ML estimates for an approximate representation of a fractionally integrated process converge to the ML estimates of the exact state space representation as $T \to \infty$. For the latter, consistency is proven in Hartl et al. (2020).

Following the suggestions of Hartl and Weigand (2019a), an ARMA(4, 4) process is used to approximate the purely fractional factors of section 2.2. For ARFIMA processes, whose dynamic properties stem not only from the fractional differencing operator, the approximation quality of ARMA processes is not clear. Therefore, we use pure AR(5) processes to resemble the properties of the fractional differencing operator in the ARFI case of section 2.1. For an arbitrary integration parameter $b$ the approximations are given by
$$\Delta_+^b \overset{a}{=} \left[ \frac{a(L, b)}{m(L, b)} \right]_+ = \left[ \frac{1 - a_1(b)L - \cdots - a_v(b)L^v}{1 + m_1(b)L + \cdots + m_w(b)L^w} \right]_+,$$
where $m_k(b)$ are the MA coefficients, $k = 1, ..., w$, and $a_l(b)$ are the AR parameters, $l = 1, ..., v$, with $(v = 4, w = 4)$ for purely fractional factors as in (7) and $(v = 5, w = 0)$ for ARFI factors as in (4).

The ARMA parameters are chosen beforehand for a given sample size $T$ and fractional integration order $b$ by minimizing the distance between the generic process $x_t = \Delta_+^{-b} z_t = \sum_{j=0}^{t-1} \pi_j(-b) z_{t-j}$ and its approximation $\tilde{x}_t = [m(L,b) a(L,b)^{-1}]_+ z_t = \sum_{j=0}^{t-1} \tilde{\psi}_j(-b) z_{t-j}$, $z_t \sim NID(0, 1)$, $t = 1, ..., T$, where $\tilde{\psi}_j(-b)$ is the $j$-th coefficient of the ARMA Wold representation and $\pi_j(-b)$ is its counterpart from (2). We use the mean squared error over $t = 1, ..., T$ as the distance measure
$$MSE_T^b = \frac{1}{T} \sum_{t=1}^{T} \sum_{j=0}^{t-1} \left( \tilde{\psi}_j(-b) - \pi_j(-b) \right)^2.$$
For a given sample size $T$ and integration order $b$, we collect the ARMA coefficients in a $(v+w)$-vector $\varphi_T(b) = (a_1(b), ..., a_v(b), m_1(b), ..., m_w(b))'$. The ARMA coefficient estimates are then defined via $\hat{\varphi}_T(b) = \arg\min_\varphi MSE_T^b$. Following Hartl and Weigand (2019a), for a given $T$ optimization is carried out for each value on a grid for $b$. The ARMA coefficients are smoothed using cubic regression splines, such that a continuous, differentiable function $\varphi_T(b)$ in $b$ is obtained.
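The distance criterion $MSE_T^b$ can be computed directly from the two coefficient sequences. The sketch below compares the Wold coefficients of a candidate ARMA($v,w$) filter with $\pi_j(-b)$; the helper names are mine, and no fitted coefficients from the paper are used.

```python
import numpy as np

def pi_weights(d, n):
    """Coefficients pi_j(d) of (1 - L)^d from (2)."""
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = (j - d - 1.0) / j * pi[j - 1]
    return pi

def arma_wold(a, m, n):
    """First n Wold coefficients psi_j of m(L)/a(L), where
    a(L) = 1 - a_1 L - ... - a_v L^v and m(L) = 1 + m_1 L + ... + m_w L^w."""
    psi = np.zeros(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = m[j - 1] if j <= len(m) else 0.0
        for l in range(1, min(j, len(a)) + 1):
            psi[j] += a[l - 1] * psi[j - l]
    return psi

def mse_criterion(a, m, b, T):
    """MSE_T^b: mean over t = 1, ..., T of the cumulated squared gaps
    between the approximating Wold coefficients and pi_j(-b)."""
    gaps = (arma_wold(a, m, T) - pi_weights(-b, T)) ** 2
    return np.mean(np.cumsum(gaps))
```

Minimizing `mse_criterion` over `(a, m)` on a grid of `b` values, and smoothing the minimizers across the grid, mirrors the construction of $\hat{\varphi}_T(b)$ described above.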
The technical details and several simulation studies are contained in Hartl and Weigand (2019a).

With a smooth function $\varphi_T(b)$ in $b$ at hand, parameter optimization for the models in sections 2.1 and 2.2 can be conducted over the low-dimensional vector of fractional integration orders $d$, which keeps the dimension of the parameter vector within the optimization procedure manageable and independent of the length of the ARMA approximations.

With these approximations at hand, we can turn to the state space representation of our fractional factor models. A general representation of a state space model is given by
$$\tilde{y}_t = Z \alpha_t + \xi_t, \qquad \alpha_{t+1} = T \alpha_t + R \zeta_{t+1}, \qquad (13)$$
where $a_{t|T} = \mathrm{E}(\alpha_t \,|\, \tilde{y}_1, ..., \tilde{y}_T, \theta)$ and $P_{t|T} = \mathrm{Var}(\alpha_t \,|\, \tilde{y}_1, ..., \tilde{y}_T, \theta)$. The covariance matrices of the disturbances, $Q = \mathrm{Var}(\zeta_t)$ and $H = \mathrm{Var}(\xi_t)$, are diagonal for all $t = 1, ..., T$. Without loss of generality, we set $Q = I$ for all fractional factor models in state space form to distinguish between the factor loadings $\Lambda$ and the variance of the factor innovations $Q$. The matrices $Z$, $T$, $R$, and the states $\alpha_t$ differ for the three fractional factor models and are derived separately in the following.

ARFI factors
By approximating the fractional difference operator of our first model from section 2.1, equation (4) becomes
$$\zeta_{j,t} = B_j(L)(1-L)^{d_j} f_{j,t} \overset{a}{=} B_j(L)\, a(L, d_j)_+ f_{j,t}, \qquad j = 1, ..., r, \quad t = 1, ..., T.$$
Let $A(L, d) = I - \sum_{j=1}^{v} A_j(d) L^j$ with $A_j(d) = \mathrm{diag}(a_j(d_1), ..., a_j(d_r))$, $j = 1, ..., v$. Then $B(L) A(L, d) = \sum_{k=0}^{p+v} \sum_{l=0}^{k} B_l A_{k-l}(d) L^k$, where $A_0(d) = B_0 = -I$, $A_l(d) = 0$ for all $l > v$, and $B_l = 0$ for all $l > p$.

Jungbacker and Koopman (2015) suggest eliminating autocorrelation in the idiosyncratic errors $u_t$ via the observation equation instead of accounting for it via the state equation. We follow their suggestion and manipulate the observable variables,
$$\tilde{y}_{i,t} = y_{i,t} - \sum_{j=1}^{p_i} \rho_{i,j} y_{i,t-j}, \qquad i = 1, ..., N. \qquad (14)$$
For the state space representation we collect the adjusted observable variables in $\tilde{y}_t = (\tilde{y}_{1,t}, ..., \tilde{y}_{N,t})'$ and define $\Psi_j = \mathrm{diag}(\rho_{1,j}, ..., \rho_{N,j})$ such that $\tilde{y}_t = y_t - \sum_{j=1}^{\max(p_i)} \Psi_j y_{t-j}$.

A state space representation of (3), (4), and (5) follows directly by defining the system matrices $T$, $Z$, $R$, together with the state vector $\alpha_t$ in (13), as follows. $T$ depends on $d$ and $B(L)$, whereas $Z$ depends on $\Lambda$ and $\rho_i(L)$, $i = 1, ..., N$:
$$T = \begin{bmatrix} B_1 + A_1(d) & \cdots & -\sum_{l=0}^{u-1} B_l A_{u-1-l}(d) & -\sum_{l=0}^{u} B_l A_{u-l}(d) \\ I & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & I & 0 \end{bmatrix}, \qquad Z = \begin{bmatrix} \Lambda' & -(\Psi_1 \Lambda)' & \cdots & -(\Psi_{u-1} \Lambda)' \end{bmatrix}',$$
$\alpha_t = (f_t', ..., f_{t-u+1}')'$ holds the states, $R = [I, 0]'$ is a selection matrix, and $u$ is defined as $\max(p + v, \max(p_i) + 1)$. To distinguish between the $r$ factors, we restrict the first $r$ rows of $\Lambda$ to form a lower triangular matrix.

FI and AR factors
Starting with the ARMA approximations of the purely fractional factors in (7), an approximate representation of the latent fractionally integrated factors is given by $f_t^{(1)} \overset{a}{=} M(L, d) A(L, d)^{-1} \zeta_t^{(1)}$, where the matrix AR and MA polynomials are $M(L, d) = I + M_1(d)L + \cdots + M_w(d)L^w$, $M_j(d) = \mathrm{diag}(m_j(d_1), ..., m_j(d_{r_1}))$, $A(L, d) = I - A_1(d)L - \cdots - A_v(d)L^v$, $A_j(d) = \mathrm{diag}(a_j(d_1), ..., a_j(d_{r_1}))$, with $M_j(d) = 0$ for all $j > w$ and $A_j(d) = 0$ for all $j > v$.

Regarding $u_t$, we again eliminate autocorrelation from the idiosyncratic errors by manipulating $y_t$ as in (14), i.e. $\tilde{y}_t = y_t - \sum_{j=1}^{\max(p_i)} \Psi_j y_{t-j} = \Psi(L) y_t$. For the latent fractionally integrated factors, this implies $\Psi(L) \Lambda^{(1)} f_t^{(1)} \overset{a}{=} \Psi(L) \Lambda^{(1)} M(L, d) A(L, d)^{-1} \zeta_t^{(1)}$.

The state space form (13) of the model is then obtained by imposing a block diagonal structure on $T = \mathrm{diag}(T^{(1)}, T^{(2)})$, where the first block $T^{(1)}$ solely depends on $d$, whereas the second block $T^{(2)}$ depends on $B(L)$:
$$T^{(1)} = \begin{bmatrix} A_1(d) & \cdots & A_{u_1 - 1}(d) & A_{u_1}(d) \\ I & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & I & 0 \end{bmatrix}, \qquad T^{(2)} = \begin{bmatrix} B_1^{(2)} & \cdots & B_{u_2 - 1}^{(2)} & B_{u_2}^{(2)} \\ I & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & I & 0 \end{bmatrix},$$
where $u_1 = \max(v, w + \max(p_i) + 1)$, $u_2 = \max(p, \max(p_i) + 1)$, and $\max(p_i)$ is the maximum lag order of the idiosyncratic errors $u_t$ in (9). $T^{(1)}$ accounts for the dynamic properties of the fractionally integrated factors, whereas $T^{(2)}$ models the stationary variation of the $f_t^{(2)}$.

The two blocks of $Z = [Z^{(1)} \; Z^{(2)}]$ depend on $\Lambda^{(1)}$, $\Lambda^{(2)}$, $d$, and $\rho(L)$, and are given by
$$Z^{(1)} = \left[ \Lambda^{(1)} \quad \sum_{k=0}^{1} -\Psi_k \Lambda^{(1)} M_{1-k}(d) \quad \cdots \quad \sum_{k=0}^{u_1 - 1} -\Psi_k \Lambda^{(1)} M_{u_1 - 1 - k}(d) \right],$$
$$Z^{(2)} = \left[ \Lambda^{(2)} \quad -\Psi_1 \Lambda^{(2)} \quad \cdots \quad -\Psi_{u_2 - 1} \Lambda^{(2)} \right],$$
whereas the state vector is given by
$$\alpha_t = \begin{pmatrix} \alpha_t^{(1)} \\ \alpha_t^{(2)} \end{pmatrix}, \qquad \alpha_t^{(1)} = \begin{pmatrix} (I - A_1(d)L - \cdots - A_v(d)L^v)^{-1} \zeta_t^{(1)} \\ \vdots \\ \zeta_{t-u_1+1}^{(1)} \end{pmatrix}, \qquad \alpha_t^{(2)} = \begin{pmatrix} f_t^{(2)} \\ \vdots \\ f_{t-u_2+1}^{(2)} \end{pmatrix}.$$
Note that Ψ_j = 0 for all j > max(p_i) and B^{(2)}_j = 0 for all j > p. Finally, the selection matrices are given by R = diag(R^{(1)}, R^{(2)}) with R^{(1)} = [I, 0]′ and R^{(2)} = [I, 0]′, whereas the disturbances in the state equation are ζ_{t+1} = (ζ^{(1)′}_{t+1}, ζ^{(2)′}_{t+1})′ with ζ_t ∼ NID(0, Q) for all t = 1, ..., T. Note that for the fractionally integrated factors the observation equation yields

Z^{(1)} α^{(1)}_t = Ψ(L) Λ^{(1)} M(L, d) A(L, d)^{−1} ζ^{(1)}_t ≈ Ψ(L) Λ^{(1)} f^{(1)}_t,

whereas for the stationary AR factors it gives Z^{(2)} α^{(2)}_t = Ψ(L) Λ^{(2)} f^{(2)}_t.

The r_1 independent fractional factors are identified by imposing a block triangular structure on Λ^{(1)} while sorting the observations y_t with respect to their order of fractional integration in ascending order. As a consequence, the first block of variables in y_t is driven by the least persistent factor f_{1,t}, the second block of variables depends on f_{1,t} and f_{2,t}, whereas the r_1-th block with the highest order of fractional integration is allowed to be influenced by all fractional factors. In addition, the first r_2 rows of Λ^{(2)} form a lower triangular matrix to identify the I(0) factors f^{(2)}_t.

AR factors
Since the factors of our third model (10) are stationary autoregressive processes, a state space representation as in (13) follows immediately by defining ỹ_t = (Δ^{d*_1}_+ y_{1,t}, ..., Δ^{d*_N}_+ y_{N,t})′. The factors enter the state vector directly, whereas their dynamic coefficients in (11) are contained in T. Furthermore, the factor loadings are modelled via Z, and R is again a selection matrix:

α_t = (f′_t, f′_{t−1}, ..., f′_{t−p})′,

T = [ B_1  ···  B_{p−1}  B_p ]
    [  I   ···    0       0  ]
    [       ⋱             ⋮  ]
    [  0   ···    I       0  ],

Z = [ Λ  0  ···  0 ],   R = [I, 0]′.

For identification of the factors, we restrict the first r rows of Λ to be lower triangular.

We collect the unknown parameters d, Λ, B_1, ..., B_p, ρ_{1,1}, ..., ρ_{N,p_N}, and H, which enter the system matrices T, Z, and H of the state space model, in a parameter vector θ. To estimate θ, we adopt the approach of Hartl and Weigand (2019a), who derive an analytical solution to the optimization problem of the expected complete Gaussian likelihood function of the state space model, together with a computationally fast combination of the EM algorithm and gradient-based optimization.

In the expectation step of the EM algorithm, we estimate the smoothed states and disturbances, together with the corresponding covariance matrices, for a given set of parameters θ_j via the Kalman filter and smoother.
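Both the truncated fractional difference coefficients a_j(d) and the companion-form stacking of T used throughout sections 2.1–2.3 are mechanical to construct. A minimal sketch (the paper provides no code; function names are illustrative assumptions):

```python
import numpy as np

def frac_diff_coefs(d, v):
    """Coefficients pi_j(d) of the truncated expansion
    (1 - L)^d = sum_{j=0}^{v} pi_j(d) L^j, via the standard recursion
    pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    pi = np.empty(v + 1)
    pi[0] = 1.0
    for j in range(1, v + 1):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return pi

def companion(blocks):
    """Stack r x r lag coefficient matrices [C_1, ..., C_u] into the
    (r*u) x (r*u) companion form used for T, with identity matrices
    on the sub-diagonal blocks."""
    r, u = blocks[0].shape[0], len(blocks)
    T = np.zeros((r * u, r * u))
    T[:r, :] = np.hstack(blocks)
    T[r:, :r * (u - 1)] = np.eye(r * (u - 1))
    return T
```

For d = 1 the recursion reproduces the first difference, which is a quick sanity check on the expansion.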
The M-step then maximizes the likelihood given the Kalman filter and smoother estimates to obtain θ̂_{j+1}. After either a convergence criterion is satisfied or a predefined number of iterations m is reached, the resulting parameter estimates θ̂ from the EM algorithm are used as starting values for maximum likelihood estimation via the BFGS algorithm, which uses the analytical solution for the score vector of Hartl and Weigand (2019a), since the EM algorithm was found to be slow around the optimum. In case of the stationary factor model in fractional differences, the matrices T and Z are functions of two disjoint parameter spaces; a simplification of the EM algorithm is therefore obtained directly by solving the score vector for vec(T) and vec(Z) (see Jungbacker and Koopman; 2015, appendix A.2). Forecasts are obtained by shifting the system one period ahead and plugging in the smoothed factor estimates from the Kalman filter.

Having discussed the estimation of the latent factors together with the unknown parameters for the three fractionally integrated factor models in sections 2.1–2.3, we investigate their predictive accuracy when neither the DGP, nor the starting values, nor the number of factors are known to the researcher. For this purpose we study the forecast performance of our three models in a pseudo out-of-sample forecast experiment with an underlying data set for the United States of America that consists of 112 macroeconomic variables and spans from January 1960 to December 2016 (see McCracken and Ng; 2016). To compare the forecast performance of the different factor models, we report the resulting mean squared prediction errors (MSPE) for a selected subset of economic variables that represent different segments of the economy. All forecast models allow for seven common factors in the data, which is suggested by the PC_p criterion of Bai and Ng (2002) after deterministic terms have been eliminated from the fractionally differenced data set.
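The filter and smoother recursions that deliver the E-step quantities can be sketched for a scalar state. This is a deliberately minimal univariate stand-in, not the paper's multivariate implementation with analytical score; function and argument names are hypothetical:

```python
import numpy as np

def kalman_smoother(y, T, Z, Q, H, a0=0.0, P0=1e4):
    """Kalman filter and Rauch-Tung-Striebel smoother for the scalar model
    y_t = Z*a_t + eps_t, Var(eps) = H;  a_{t+1} = T*a_t + eta_t, Var(eta) = Q.
    Returns the smoothed states a_{t|T}, the basic E-step quantities."""
    n = len(y)
    a_filt = np.zeros(n); P_filt = np.zeros(n)   # a_{t|t}, P_{t|t}
    P_pred = np.zeros(n)                         # P_{t|t-1}
    a, P = a0, P0
    for t in range(n):
        P_pred[t] = P
        F = Z * P * Z + H                        # prediction-error variance
        K = P * Z / F                            # Kalman gain
        a, P = a + K * (y[t] - Z * a), P - K * Z * P
        a_filt[t], P_filt[t] = a, P
        a, P = T * a, T * P * T + Q              # one-step-ahead prediction
    a_sm = a_filt.copy()                         # backward RTS recursion
    for t in range(n - 2, -1, -1):
        J = P_filt[t] * T / P_pred[t + 1]
        a_sm[t] = a_filt[t] + J * (a_sm[t + 1] - T * a_filt[t])
    return a_sm
```

An M-step would then re-estimate the parameters from these smoothed moments; the scalar case only illustrates the recursions.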
Lag lengths of the different AR polynomials for the common factors and idiosyncratic components are chosen via the Bayesian Information Criterion (BIC). For the first forecasting period we obtain starting values for the Kalman filter from the principal components estimator, as described in section 3.1. In all subsequent periods the optimized parameters from the preceding step are used as starting values. Finally, the number of iterations of the EM algorithm is set to ten. To distinguish between the first-stage and the second-stage estimator, we denote the principal components forecasts as PC and the Kalman filter forecasts as KF. Abbreviations for the three fractional factor models are: dynamic fractional factor model (DFFM) in section 2.1, dynamic orthogonal fractional components (
DOFC ) in section 2.2, and dynamic factor model in fractionaldifferences (
DFFD) in section 2.3.

Forecasts are conducted for horizons h = 1, ...,
12 in a recursive window forecast experiment, where the first forecast period is January 2000, whereas the last is December 2016, leading to 204 forecasts for 112 variables and 12 horizons. The
DOFC model introduced in section 2.2 includes r_1 = 3 fractionally integrated factors, since a higher number was not found to increase the forecast precision substantially. As a consequence, the number of remaining I(0) factors is set to r_2 = 4. The latter restriction is confirmed by the PC_p criterion of Bai and Ng (2002), which suggests four factors after the fractionally integrated factors have been projected out.

A stationary data set for the DFFD model in section 2.3 is obtained by estimating the integration order of each y_{i,t} via the exact local Whittle estimator of Shimotsu and Phillips (2005) with a tuning parameter of 0.[...]. The first benchmark is a univariate autoregression (AR), where the AR lag order is chosen via the Akaike Information Criterion for each y_{i,t}. The second benchmark is a standard approximate dynamic factor model (cf. e.g. Stock and Watson; 2002) that is estimated via principal components (PC) based on a pre-differenced data set, i.e.

Δ^{k_i} y_{i,t+h} = Λ_i f_{t+h} + ξ_{i,t+h},   φ(L) f_{t+h} = ζ_{t+h},

where ξ_{i,t}, ζ_{j,t} are mutually independent and white noise for all t = 1, ..., T, and k_i is an integer taken from McCracken and Ng (2016). Our third benchmark adds lagged dependent variables to the approximate dynamic factor model. It is given by

c_i(L) Δ^{k_i} y_{i,t+h} = Λ_i f_{t+h} + ξ_{i,t+h},   φ(L) f_{t+h} = ζ_{t+h},

where ξ_{i,t}, ζ_{j,t} are again mutually independent and white noise for all t = 1, ..., T. We denote it as PCAR. Finally, the last benchmark is the so-called factor-augmented error-correction model (
FECM), which separates the observable variables into two disjoint samples y = (y^{(1)′}, y^{(2)′})′ and shrinks the latter sample via principal components to f̂. A vector error-correction model is then estimated for (y^{(1)′}, f̂′)′. Details on the forecast properties are found in Banerjee et al. (2014). Since we only obtain predictions for y^{(1)}, the FECM results are only reported in tables 2 and 3.
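The semiparametric first step behind the DFFD data set, estimating each memory parameter before differencing, can be sketched with a plain local Whittle objective minimized on a grid. This is a simplified stand-in for the exact local Whittle estimator of Shimotsu and Phillips (2005); the bandwidth m and the grid bounds are hypothetical choices:

```python
import numpy as np

def local_whittle_d(y, m):
    """Grid-search local Whittle estimate of the integration order d over
    the stationary range (a simplified illustration, not the exact local
    Whittle estimator used in the paper)."""
    n = len(y)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n     # Fourier frequencies
    w = np.fft.fft(y - np.mean(y))[1:m + 1]
    I = np.abs(w) ** 2 / (2.0 * np.pi * n)          # periodogram
    def R(d):                                        # profiled objective
        return np.log(np.mean(lam ** (2 * d) * I)) - 2 * d * np.mean(np.log(lam))
    grid = np.linspace(-0.45, 0.45, 181)
    return grid[np.argmin([R(d) for d in grid])]
```

For short memory the estimate should be near zero, while a random walk pushes it to the upper end of the stationary grid.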
Table 1 shows, for a given forecast horizon h, how often each specification leads to the smallest MSPE among all 112 variables. Hence, it illustrates how frequently fractional factor models outperform widely used forecast methods like autoregressive models and principal components of integer-differenced data. To assess the extent of forecast improvement from the fractional factor models, tables 2 and 3 report the relative MSPE for the twelve depicted variables and for h = 1, 2, 3, 6, 9, and 12. Consequently, they also show how strongly the forecast accuracy fluctuates for each specification and highlight the robustness of the forecast results when a model is not chosen to be the best one.

            Benchmarks        DFFM        DOFC        DFFD
Horizon   AR   PC  PCAR     PC   KF     PC   KF     PC   KF
   1      14   10   16       7    0      3   26     20   16
   2      14   12   11       8    0      1   26     24   16
   3      14   10    6       6    1      5   28     23   19
   4      17    9    7      11    2      4   25     20   17
   5      15   11    7      10    1      5   27     22   14
   6      17    8    5      10    6      6   23     19   18
   7      15    9    3      10    4      7   26     18   20
   8      14    8    3      10   12      7   20     19   19
   9      15    8    3      10   10      7   22     17   20
  10      15    8    3      11   14      7   16     15   23
  11      15    8    3      13   15      7   13     15   23
  12      14    7    3      10   17      9   13     16   23

Table 1: Frequency of smallest MSPE: The table shows how often, for a given forecast horizon, a specification came with the smallest mean squared prediction error of all models.

We find that fractional factor models tend to outperform classical autoregressive models, pre-differenced principal components models, and mixtures of these two model classes. Over all 1344 conducted forecasts, the benchmarks only exhibit a smaller MSPE than the fractional factor models in 357 cases (26.6%), as table 1 shows. Hence, for the remaining 987 forecasts (73.4%) the smallest MSPE is achieved by one of the six fractional factor models. Within the benchmarks, principal components often come with the smallest MSPE.
Nonetheless, they are often beaten by one of the fractional factor models. Among those, the dynamic orthogonal fractional components model in state space form most frequently produces the best predictions for forecast horizons up to 9 months.

In addition to the good performance of the DOFC-KF specification, the DFFD models complement the predictive power of fractional factor models. Whenever the DOFC-KF model does not provide the best forecasts, the fractionally differenced models are likely to exhibit the smallest MSPE. Furthermore, principal components are found to perform relatively well, at least for smaller forecast horizons, when the data is in fractional differences, whereas they are typically beaten by the state space formulation in the DOFC framework. This might be a result of the additional structure that is imposed on the DOFC-KF model via the block-triangular identification of the fractional factors relative to the DOFC-PC case, whereas only little additional structure is imposed on the DFFD-KF specification relative to principal components. For larger forecast horizons, the forecast performance of the DFFD-KF model improves, leading to the highest number of best predictions for h = 10, 11, and 12.

We find that gains from the fractional factor models can be large relative to the four benchmarks. In many cases, fractional factor models reduce the MSPE relative to the AR benchmark by more than 25%. For some target variables, the MSPE is cut by half when fractional factor models are used, and reductions of more than 80% are possible. Within the class of fractional factor models, we find the DOFC-KF specification to perform best. For h = 1, 2, and
3, the most accurate predictions for the consumer price index, the personal consumption index, and average hourly earnings are obtained from the DOFC-KF specification, which reduces the MSPE relative to the AR benchmark by more than 50%. In addition, the DOFC-KF specification frequently exhibits the smallest MSPE for the St. Louis adjusted monetary base, total reserves of depository institutions, and the S&P500. The stable and reliable performance of the DOFC-KF forecasts is illustrated by the fact that their largest relative MSPE is 1.29, whereas the smallest relative MSPE is 0.17. The good performance of the DOFC-KF specification is complemented by the DFFD model.

                     Benchmarks                DFFM          DOFC          DFFD
                 AR    PC  PCAR  FECM        PC    KF      PC    KF      PC    KF
Horizon h = 1
INDPRO         1.00  0.92  0.92  1.22      1.60  3.26    1.96  1.06    0.90  0.99
UNRATE         1.00  0.96  0.95  0.88      1.06  2.67    1.32  0.91    0.89  0.88
AWOTMAN        1.00  0.85  0.90  0.92      1.25  2.02    1.40  0.91    0.93  0.96
HOUST          1.00  0.71  0.70  0.99      0.98  1.36    0.94  0.87    1.09  1.07
AMBSL          1.00  0.86  1.17  0.69      0.77  2.62    0.77  0.81    2.78  3.15
TOTRESNS       1.00  0.71  1.45  0.69      0.66  2.28    0.70  0.69    4.84  5.14
S.P.500        1.00  1.08  1.08  1.01      1.04  2.67    1.24  1.02    1.06  0.99
FEDFUNDS       1.00  2.51  2.36  2.64      1.15  4.43    1.17  1.21    3.53  1.25
EXUSUKx        1.00  1.18  1.08  1.08      1.11  2.75    1.17  1.06    1.12  1.10
CPIAUCSL       1.00  0.64  0.96  0.45      1.77  2.01    1.58  0.41    0.49  0.49
PCEPI          1.00  0.69  0.98  0.50      3.32  1.98    2.70  0.41    0.47  0.47
CES0600000008  1.00  0.88  1.14  0.48      2.50  1.00    3.06  0.30    0.48  0.41
Horizon h = 2
INDPRO         1.00  0.90  0.90  1.35      2.10  2.38    2.63  1.15    0.81  0.99
UNRATE         1.00  1.03  0.98  0.79      1.06  2.10    1.62  0.96    0.83  0.86
AWOTMAN        1.00  0.81  0.92  0.80      1.22  1.69    1.50  0.86    0.96  1.02
HOUST          1.00  0.71  0.71  1.01      0.99  1.18    0.95  0.99    1.02  1.00
AMBSL          1.00  0.79  1.27  0.86      0.76  1.25    0.80  0.72    1.52  1.65
TOTRESNS       1.00  0.69  1.56  0.75      0.67  1.17    0.75  0.65    2.45  2.51
S.P.500        1.00  1.14  1.16  1.18      1.05  1.55    1.30  1.02    1.11  0.98
FEDFUNDS       1.00  1.66  1.79  2.66      0.91  2.08    0.92  0.95    2.40  1.07
EXUSUKx        1.00  1.20  1.13  1.15      1.18  1.71    1.22  1.05    1.11  1.08
CPIAUCSL       1.00  0.61  0.98  0.54      1.76  0.50    1.62  0.42    0.60  0.55
PCEPI          1.00  0.62  0.99  0.56      3.24  0.45    2.66  0.39    0.52  0.48
CES0600000008  1.00  0.99  1.16  0.37      2.16  0.64    3.26  0.21    0.34  0.28
Horizon h = 3
INDPRO         1.00  1.04  1.04  1.50      2.41  2.47    2.93  1.28    0.81  1.00
UNRATE         1.00  1.15  1.08  0.82      1.11  2.21    1.76  1.04    0.82  0.88
AWOTMAN        1.00  0.79  0.95  0.76      1.10  1.56    1.47  0.84    1.03  1.02
HOUST          1.00  0.70  0.70  1.01      0.88  1.11    0.85  0.95    1.09  0.99
AMBSL          1.00  0.77  1.41  0.97      0.74  0.95    0.81  0.66    1.16  1.22
TOTRESNS       1.00  0.73  1.62  0.81      0.69  0.94    0.78  0.64    1.81  1.81
S.P.500        1.00  1.25  1.27  1.30      1.08  1.45    1.35  1.02    1.18  1.00
FEDFUNDS       1.00  1.31  1.41  2.66      0.85  1.54    0.82  0.91    1.92  1.03
EXUSUKx        1.00  1.26  1.18  1.21      1.22  1.55    1.29  1.07    1.12  1.08
CPIAUCSL       1.00  0.57  1.01  0.54      1.71  0.47    1.63  0.38    0.63  0.56
PCEPI          1.00  0.60  1.01  0.58      3.12  0.43    2.72  0.36    0.53  0.47
CES0600000008  1.00  1.13  1.18  0.40      1.79  0.56    2.95  0.18    0.27  0.22
Table 2: Selected relative mean squared prediction errors for h = 1, 2, and 3. Variable codes are INDPRO: industrial production index; UNRATE: unemployment rate; AWOTMAN: average weekly overtime hours in the manufacturing business; HOUST: housing starts; AMBSL: St. Louis adjusted monetary base; TOTRESNS: total reserves of depository institutions; S.P.500: S&P500 index; FEDFUNDS: effective federal funds rate; EXUSUKx: US / UK foreign exchange rate; CPIAUCSL: consumer price index; PCEPI: personal consumption index; CES0600000008: average hourly earnings

                     Benchmarks                DFFM          DOFC          DFFD
                 AR    PC  PCAR  FECM        PC    KF      PC    KF      PC    KF
Horizon h = 6
INDPRO         1.00  1.27  1.27  1.59      2.10  1.82    2.55  1.29    0.95  1.08
UNRATE         1.00  1.58  1.46  1.11      1.25  2.04    1.86  1.25    1.05  1.06
AWOTMAN        1.00  0.92  1.12  0.76      1.03  1.33    1.40  0.87    1.17  1.13
HOUST          1.00  0.63  0.63  1.02      0.77  0.88    0.78  0.82    0.98  0.87
AMBSL          1.00  0.89  1.87  0.87      0.62  0.68    0.76  0.55    0.89  0.90
TOTRESNS       1.00  0.95  1.62  0.75      0.63  0.68    0.76  0.57    1.18  1.16
S.P.500        1.00  1.50  1.53  1.40      1.09  1.21    1.47  1.00    1.19  1.01
FEDFUNDS       1.00  1.28  1.33  2.70      0.90  1.21    0.76  0.91    1.36  1.06
EXUSUKx        1.00  1.31  1.30  1.46      1.23  1.18    1.35  1.03    1.06  1.00
CPIAUCSL       1.00  0.57  1.09  0.46      1.49  0.26    1.65  0.31    0.63  0.54
PCEPI          1.00  0.59  1.06  0.54      2.69  0.25    2.85  0.34    0.52  0.44
CES0600000008  1.00  1.82  1.27  0.54      0.86  0.50    2.38  0.17    0.21  0.16
Horizon h = 9
INDPRO         1.00  1.43  1.43  1.85      1.83  1.48    2.30  1.25    1.00  1.12
UNRATE         1.00  1.88  1.77  1.48      1.23  1.76    1.65  1.26    1.17  1.12
AWOTMAN        1.00  1.11  1.33  0.74      1.01  1.20    1.34  0.91    1.21  1.14
HOUST          1.00  0.60  0.61  1.02      0.72  0.78    0.73  0.76    0.94  0.85
AMBSL          1.00  1.31  3.30  0.91      0.59  0.59    0.79  0.51    0.81  0.80
TOTRESNS       1.00  1.54  1.74  0.73      0.61  0.60    0.76  0.53    0.97  0.94
S.P.500        1.00  1.76  1.79  1.50      1.09  1.13    1.54  0.99    1.19  1.01
FEDFUNDS       1.00  1.41  1.50  2.55      0.93  1.16    0.79  0.96    1.18  1.07
EXUSUKx        1.00  1.40  1.41  1.78      1.29  1.08    1.42  1.01    1.05  0.98
CPIAUCSL       1.00  0.64  1.13  0.43      1.32  0.21    1.56  0.31    0.60  0.51
PCEPI          1.00  0.65  1.10  0.53      2.40  0.21    2.70  0.37    0.50  0.42
CES0600000008  1.00  2.65  1.36  0.60      0.49  0.46    1.81  0.19    0.16  0.12
Horizon h = 12
INDPRO         1.00  1.57  1.58  2.00      1.70  1.28    2.20  1.20    0.99  1.11
UNRATE         1.00  2.20  2.09  1.68      1.16  1.50    1.45  1.22    1.18  1.12
AWOTMAN        1.00  1.26  1.50  0.78      0.99  1.12    1.27  0.94    1.17  1.10
HOUST          1.00  0.61  0.61  1.11      0.72  0.74    0.71  0.74    0.94  0.86
AMBSL          1.00  1.84  6.01  0.77      0.52  0.52    0.75  0.45    0.70  0.69
TOTRESNS       1.00  2.36  1.93  0.63      0.55  0.53    0.70  0.48    0.80  0.77
S.P.500        1.00  2.00  2.03  1.57      1.07  1.09    1.59  0.98    1.20  1.00
FEDFUNDS       1.00  1.54  1.65  2.39      0.95  1.11    0.83  1.00    1.10  1.06
EXUSUKx        1.00  1.53  1.58  2.01      1.36  1.05    1.48  0.99    1.05  0.98
CPIAUCSL       1.00  0.79  1.18  0.40      1.14  0.15    1.43  0.34    0.54  0.46
PCEPI          1.00  0.76  1.14  0.51      2.08  0.16    2.47  0.42    0.46  0.39
CES0600000008  1.00  4.01  1.47  0.65      0.36  0.45    1.52  0.23    0.15  0.11
Table 3: Selected relative mean squared prediction errors for h = 6, 9, and 12. Variable codes are as in table 2.

For the industrial production index, the DFFD-PC specification exhibits the smallest MSPE at every forecast horizon. In addition, the DFFD-KF specification produces accurate predictions for the S&P500, average hourly earnings, and the US / UK foreign exchange rate. Furthermore, its forecast performance is almost as stable as the DOFC-KF prediction quality.

Finally, the DFFM model, which serves as the most general framework as it nests the two remaining fractional factor model formulations, cannot compete with the other fractional factor models, as its predictive power fluctuates strongly. Nonetheless, for larger forecast horizons, the DFFM-KF formulation produces accurate forecasts for the consumer price and personal consumption indices.

Note that the only difference between the benchmark PC model and the DFFD-PC specification is the pre-differencing. As one can see, the two models largely coincide in their performance relative to the AR benchmark. The advantages over the AR model are therefore likely to result from cross-sectional dependencies that are detected by the common factors.
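The quantities reported in tables 1–3 are simple reductions of an MSPE array. A sketch, where the array layout mspe[h, i, j] (horizon, variable, model) and the function names are assumptions:

```python
import numpy as np

def best_model_counts(mspe):
    """Count, per horizon, how often each model attains the smallest
    MSPE across variables (the tabulation behind table 1)."""
    winners = np.argmin(mspe, axis=2)          # best model per (h, variable)
    n_models = mspe.shape[2]
    return np.stack([(winners == j).sum(axis=1) for j in range(n_models)],
                    axis=1)

def relative_mspe(mspe, benchmark=0):
    """Divide every model's MSPE by a benchmark model's MSPE
    (as reported, relative to the AR column, in tables 2 and 3)."""
    return mspe / mspe[:, :, benchmark:benchmark + 1]
```

Each row of the count matrix sums to the number of variables, which is a quick consistency check against the 112 forecasts per horizon.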
In addition, the better performance of the DFFD-PC model can be explained bythe sensitivity of standard PC methods to spurious coefficients, as Franses and Janssens(2019) argue.Turning to the DOFC-KF specification, which explicitly models fractional cointegrationrelations instead of eliminating them as in the DFFD model, we note that the forecastquality of the two models is similar for many predictions. Nonetheless, gains from theDOFC-KF specification relative to the DFFD model can be large, especially in situationswhere the latter produces a relative MSPE >
1. Consider e.g. the forecasts for the adjusted monetary base (AMBSL) and the total reserves of depository institutions (TOTRESNS) in tables 2 and 3, where the DOFC-KF and the FECM model perform well, whereas the DFFD-KF model yields large MSPEs. While the former two models take cointegration into account, the DFFD-KF model eliminates long-run components by prior differencing and is likely to produce over-differenced short-run components. Hence, the better performance of the DOFC-KF model over the DFFD-KF model is likely to result from cointegration relations and over-differencing of additive short-run factors.

Finally, we want to draw inference on the performance of the fractional factor models during the world financial crisis. By studying the predictive power of the fractional factor models during this period, we shed light on the behavior of this model class when the economy
is hit by a large shock and pushed out of its equilibrium growth path. For this purpose, figure 1 sketches the three-step-ahead predictions for the twelve selected target variables from the two best performing fractional factor models, together with the AR benchmark and the realizations of the target variables, from January 2007 to December 2011. As the graphs show, the forecast performance of the fractional factor models is not systematically affected by the global financial crisis relative to the AR benchmark. Instead, the forecasts converge towards the realizations of the observable variables rapidly after the crisis. The DOFC-KF forecasts seem to be the least affected by the large shock, as they converge faster towards the observable variables. Furthermore, the AR and DFFD-KF predictions for the adjusted monetary base and total reserves of depository institutions seem to be polluted by the crisis until the end of 2009, which substantiates the relative robustness of the DOFC-KF specification.

Figure 1: Forecast performance of the DOFC-KF, DFFD-KF, and AR models during the world financial crisis 2007–2010 for h = 3. Variable codes are as in table 2.

We have derived three different fractional factor models that allow the joint modelling of data of different persistence. A two-stage estimator for the fractional factors and model parameters was proposed.
In a macroeconomic forecast experiment, it was shown that incorporating fractional integration into the class of factor models improves forecast performance substantially.

Future research could examine whether a combination of the DOFC model in state space form and a factor model in fractional differences can improve the predictive power of fractional factor models. Furthermore, one could combine principal components and the Kalman filter analogously to Bräuning and Koopman (2014) by reducing the dimension of a subset of observable variables via principal components in order to speed up the estimation of the parameters. Additionally, fractional factor models could be used to explore common trends and cycles in macroeconomic variables and to identify cointegrated blocks. Finally, future research could address the predictive power of fractional factor models for other data sets and economies. If gains are of similar size as for the US, we are confident that fractional factor models have the potential to become a widely used tool for predicting macroeconomic dynamics.
Consistency of Principal Components for Fractionally Integrated Data
To prove consistency of principal components in a fractionally integrated setup, we define a minimal fractionally integrated factor model

y_t = Λ f_t + e_t,   for all t = 1, ..., T,   (15)
(1 − L)^{d_j} f_{j,t} = z_{j,t},   for all j = 1, ..., r,   (16)

where y_t = (y_{1,t}, ..., y_{N,t})′ is an N-dimensional vector holding the observable data at time t, f_t = (f_{1,t}, ..., f_{r,t})′ ∼ I(d) is r × 1 and contains the unobserved common factors, and Λ is N × r and holds the factor loadings. The N × 1 vector e_t = (e_{1,t}, ..., e_{N,t})′ and the r × 1 vector z_t = (z_{1,t}, ..., z_{r,t})′ are I(0) stochastic processes, z_{j,t} = Σ_{k=0}^{t−1} c_{j,k} ε_{j,t−k} with ε_{j,t} ∼ NID(0, σ_j²) for all j = 1, ..., r. The model nests the fractional factor models of section 2 for d_1 = ... = d_r = d. In matrix form, equation (15) is written as y = f Λ′ + e, where y = (y_1, ..., y_T)′ is T × N, f = (f_1, ..., f_T)′ is T × r, and e = (e_1, ..., e_T)′ is T × N. In the following, we define ‖X‖ = √tr(X′X), and M < ∞ is a positive constant. To extend the proofs of Bai and Ng (2002) and Bai (2004) to the nonstationary fractional case, we make the following assumptions:

Assumption 1 (Common stochastic trends). The common stochastic trends satisfy the following conditions:
1. E|z_{j,t}|^q ≤ M for some q > max(2, d − 0.5) and for all t = 1, ..., T, j = 1, ..., r,
2. The common stochastic trends are mutually independent.

Assumption 2 (Loadings). The factor loadings Λ = (λ′_1, ..., λ′_N)′ are either deterministic such that ‖λ_i‖ ≤ M for all i = 1, ..., N, or stochastic such that E‖λ_i‖⁴ ≤ M for all i = 1, ..., N. In either case, Λ′Λ/N →_p Σ_Λ as N → ∞, where Σ_Λ is an r × r positive definite deterministic matrix.

Assumption 3 (Errors). The errors e_t satisfy, for all i = 1, ..., N and t = 1, ..., T:
1. E[e_{i,t}] = 0, E|e_{i,t}|⁸ ≤ M,
2. E[e′_s e_t / N] = E[N^{−1} Σ_{i=1}^{N} e_{i,s} e_{i,t}] = γ_N(s, t), |γ_N(s, s)| ≤ M for all s = 1, ..., T, and T^{−1} Σ_{s=1}^{T} Σ_{t=1}^{T} |γ_N(s, t)| ≤ M,
3. E[e_{i,t} e_{j,t}] = τ_{ij,t} with |τ_{ij,t}| ≤ |τ_{ij}| for some τ_{ij}, and N^{−1} Σ_{i=1}^{N} Σ_{j=1}^{N} |τ_{ij}| ≤ M,
4. E[e_{i,t} e_{j,s}] = τ_{it,js} and N^{−1} T^{−1} Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{t=1}^{T} Σ_{s=1}^{T} |τ_{it,js}| ≤ M,
5. For every (t, s), E|N^{−1/2} Σ_{i=1}^{N} { e_{i,s} e_{i,t} − E[e_{i,s} e_{i,t}] }|⁴ ≤ M.

Assumption 4 (Independence). {λ_i}, {z_t}, and {e_t} are mutually independent stochastic random variables.

Under these assumptions, corollary 1 follows directly.
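As an illustration of this setup, the minimal model (15)–(16) can be simulated with a single fractional factor and the factor recovered from the leading eigenvector of yy′. The sample sizes, memory parameter, and seed below are arbitrary illustrative choices, not values used in the paper:

```python
import numpy as np

def frac_integrate(z, d):
    """Type-II I(d) process: f_t = sum_{k=0}^{t-1} psi_k z_{t-k}, where
    psi_k are the MA coefficients of (1 - L)^{-d}."""
    n = len(z)
    psi = np.empty(n)
    psi[0] = 1.0
    for k in range(1, n):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return np.convolve(z, psi)[:n]

rng = np.random.default_rng(1)
T, N, d = 400, 60, 0.8
f = frac_integrate(rng.standard_normal(T), d)        # common I(d) factor
lam = rng.standard_normal(N)                          # loadings
y = np.outer(f, lam) + rng.standard_normal((T, N))    # y = f lam' + e
vals, vecs = np.linalg.eigh(y @ y.T)                  # PC from y y'
f_hat = vecs[:, -1]                                   # leading eigenvector
corr = abs(np.corrcoef(f_hat, f)[0, 1])               # high if the factor dominates
```

Up to scale and sign (the rotation H in theorem 1), the leading eigenvector tracks the nonstationary factor closely in this design.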
Corollary 1. Under assumption 1, Wu and Shao (2006, corollary 2.1) yields

T^{−d−1/2} Σ_{t=1}^{⌊rT⌋} f_{j,t} →_d κ B_d(r),

such that plim_{T→∞} T^{−2d} Σ_{t=1}^{T} f_{j,t}² ≤ M, where κ is a constant, r ∈ [0, 1], B_d is fractional Brownian motion of type II, B_d(r) = Γ(d + 1)^{−1} ∫_0^r (r − s)^d dB(s), and B is standard Brownian motion generated by ε_{j,t}.

Theorem 1 (Consistency of principal components). Suppose assumptions 1–4 hold. Then, for d ≥ 0.5, the common factors are estimated consistently via principal components up to a rotation, such that

δ_NT ( T^{−1} Σ_{t=1}^{T} ‖f̂_t − H′ f_t‖² ) = O_p(1),

where δ_NT = min(N, T^{2d−1}), f̂ = N^{−1} T^{−2d} y y′ f̃ is a rescaled version of the principal components together with Λ̂ = Λ̃ (Λ̃′ Λ̃ N^{−1})^{−1}, f̃ is T^d times the eigenvectors of y y′, Λ̃′ = T^{−2d} f̃′ y, and H = (Λ′ Λ N^{−1})(f′ f̃) T^{−2d}.

Proof of Theorem 1. First of all, note that f̃ Λ̃′ = f̂ Λ̂′, since

f̂ Λ̂′ = T^{−2d} y y′ f̃ (Λ̃′ Λ̃)^{−1} Λ̃′ = f̃ Λ̃′,

where the last equality holds because f̃ collects eigenvectors of y y′ with f̃′ f̃ = T^{2d} I, so that y y′ f̃ = f̃ D and Λ̃′ Λ̃ = T^{−4d} f̃′ y y′ f̃ = T^{−2d} D for the diagonal matrix D of the corresponding eigenvalues. Next,

‖H‖ ≤ ‖Λ′ Λ N^{−1}‖ ‖f′ f T^{−2d}‖^{1/2} ‖f̃′ f̃ T^{−2d}‖^{1/2},

where the first term is O_p(1) by assumption 2, the second term is O_p(1) by corollary 1, and the last term is O_p(1) by construction.
Now

f̂_t − H′ f_t = N^{−1} T^{−2d} ( Σ_{s=1}^{T} f̃_s y′_s y_t − Σ_{s=1}^{T} f̃_s f′_s Λ′ Λ f_t )
             = N^{−1} T^{−2d} ( Σ_{s=1}^{T} f̃_s f′_s Λ′ e_t + Σ_{s=1}^{T} f̃_s e′_s Λ f_t + Σ_{s=1}^{T} f̃_s e′_s e_t )
             = T^{−2d} ( Σ_{s=1}^{T} f̃_s γ_N(s, t) + Σ_{s=1}^{T} f̃_s ζ_{s,t} + Σ_{s=1}^{T} f̃_s η_{s,t} + Σ_{s=1}^{T} f̃_s ξ_{s,t} ),

where, to be consistent with the proofs of Bai (2004), we define

γ_N(s, t) = N^{−1} Σ_{i=1}^{N} E(e_{i,s} e_{i,t}),   ζ_{s,t} = N^{−1} Σ_{i=1}^{N} ( e_{i,s} e_{i,t} − E(e_{i,s} e_{i,t}) ),
η_{s,t} = N^{−1} f′_s Λ′ e_t,   ξ_{s,t} = N^{−1} f′_t Λ′ e_s.

Consequently,

T^{−1} Σ_{t=1}^{T} ‖f̂_t − H′ f_t‖² ≤ 4 T^{−1} Σ_{t=1}^{T} [ ‖T^{−2d} Σ_{s=1}^{T} f̃_s γ_N(s, t)‖² + ‖T^{−2d} Σ_{s=1}^{T} f̃_s ζ_{s,t}‖² + ‖T^{−2d} Σ_{s=1}^{T} f̃_s η_{s,t}‖² + ‖T^{−2d} Σ_{s=1}^{T} f̃_s ξ_{s,t}‖² ].
(17)

The first term of equation (17) satisfies

T^{−1} Σ_{t=1}^{T} ‖Σ_{s=1}^{T} T^{−2d} f̃_s γ_N(s, t)‖² ≤ T^{−1} Σ_{t=1}^{T} [ Σ_{s=1}^{T} ‖T^{−2d} f̃_s‖² ] [ Σ_{s=1}^{T} γ_N(s, t)² ],

where, from corollary 1, Σ_{s=1}^{T} ‖T^{−2d} f̃_s‖² = Σ_{s=1}^{T} tr( T^{−4d} f̃′_s f̃_s ) = O_p(T^{−2d}), and by assumption 3

T^{−1} Σ_{s=1}^{T} Σ_{t=1}^{T} γ_N(s, t)² = T^{−1} Σ_{s=1}^{T} Σ_{t=1}^{T} γ_N(s, s) γ_N(t, t) ρ(s, t)² ≤ M T^{−1} Σ_{s=1}^{T} Σ_{t=1}^{T} |γ_N(s, s) γ_N(t, t)|^{1/2} |ρ(s, t)| = M T^{−1} Σ_{s=1}^{T} Σ_{t=1}^{T} |γ_N(s, t)| ≤ M².   (18)

From (18) it follows that T^{−1} Σ_{t=1}^{T} ‖Σ_{s=1}^{T} T^{−2d} f̃_s γ_N(s, t)‖² = O_p(T^{−2d}). For the second term in (17) one has

Σ_{t=1}^{T} ‖T^{−2d} Σ_{s=1}^{T} f̃_s ζ_{s,t}‖² = Σ_{t=1}^{T} Σ_{s=1}^{T} Σ_{u=1}^{T} T^{−4d} ( f̃′_s f̃_u ) ζ_{s,t} ζ_{u,t}
≤ T^{−4d} ( Σ_{s=1}^{T} Σ_{u=1}^{T} ( f̃′_s f̃_u )² )^{1/2} ( Σ_{s=1}^{T} Σ_{u=1}^{T} ( Σ_{t=1}^{T} ζ_{s,t} ζ_{u,t} )² )^{1/2}
≤ ( T^{−2d} Σ_{s=1}^{T} ‖f̃_s‖² ) T^{−2d} ( Σ_{s=1}^{T} Σ_{u=1}^{T} ( Σ_{t=1}^{T} ζ_{s,t} ζ_{u,t} )² )^{1/2},   (19)

where T^{−2d} Σ_{s=1}^{T} ‖f̃_s‖² is O_p(1) from corollary 1 and E[( Σ_{t=1}^{T} ζ_{s,t} ζ_{u,t} )²] ≤ T² max_{s,t} E|ζ_{s,t}|⁴. For the latter term one has, by assumption 3,

E|ζ_{s,t}|⁴ = N^{−2} E| N^{−1/2} Σ_{i=1}^{N} [ e_{i,s} e_{i,t} − E(e_{i,s} e_{i,t}) ] |⁴ ≤ N^{−2} M,

as in Bai and Ng
(2002) and, therefore, E[( Σ_{t=1}^{T} ζ_{s,t} ζ_{u,t} )²] is O_p(T² N^{−2}). Together with (19) this implies

T^{−1} Σ_{t=1}^{T} ‖T^{−2d} Σ_{s=1}^{T} f̃_s ζ_{s,t}‖² ≤ T^{−1} O_p(1) O_p(T^{2−2d} N^{−1}) = O_p(T^{1−2d} N^{−1}) = O_p(N^{−1}),

since 1 − 2d ≤ 0. Considering the third term of equation (17), we have

T^{−1} Σ_{t=1}^{T} ‖T^{−2d} Σ_{s=1}^{T} f̃_s η_{s,t}‖² = T^{−1} Σ_{t=1}^{T} ‖ Σ_{s=1}^{T} T^{−2d} f̃_s f′_s Λ′ e_t N^{−1} ‖²
≤ T^{−1} Σ_{t=1}^{T} N^{−2} ‖e′_t Λ‖² ( Σ_{s=1}^{T} ‖T^{−2d} f̃_s f′_s‖ )²
≤ T^{−1} Σ_{t=1}^{T} N^{−2} ‖e′_t Λ‖² ( T^{−2d} Σ_{s=1}^{T} ‖f_s‖² ) ( T^{−2d} Σ_{s=1}^{T} ‖f̃_s‖² )
= T^{−1} Σ_{t=1}^{T} N^{−2} ‖e′_t Λ‖² O_p(1),

where the last step follows from corollary 1. Note that

E‖ N^{−1/2} Σ_{i=1}^{N} e_{i,t} λ_i ‖² = N^{−1} Σ_{i=1}^{N} Σ_{j=1}^{N} E( e_{i,t} e_{j,t} λ′_i λ_j ) ≤ λ̄² N^{−1} Σ_{i=1}^{N} Σ_{j=1}^{N} |τ_{ij}| ≤ λ̄² M,

with λ̄ < ∞ from assumption 2, and therefore one has for the third term

T^{−1} Σ_{t=1}^{T} ‖T^{−2d} Σ_{s=1}^{T} f̃_s η_{s,t}‖² = T^{−1} Σ_{t=1}^{T} O_p(N^{−1}) = O_p(N^{−1}).
Finally, for the last term, one has
\[
\frac{1}{T}\sum_{t=1}^{T}\Big\|T^{-2d}\sum_{s=1}^{T}\tilde f_s\,\xi_{s,t}\Big\|^2
= \frac{1}{T}\sum_{t=1}^{T}\Big\|\sum_{s=1}^{T}T^{-2d}\tilde f_s\,\frac{f_t'\Lambda' e_s}{N}\Big\|^2
\le \frac{1}{T}\Big(\sum_{s=1}^{T}\big\|N^{-1}e_s'\Lambda\big\|^2\Big)\sum_{t=1}^{T}\sum_{s=1}^{T}\big\|T^{-2d}\tilde f_s f_t'\big\|^2
\le \frac{1}{T}\Big(\sum_{s=1}^{T}\big\|N^{-1}e_s'\Lambda\big\|^2\Big)\Big(\sum_{t=1}^{T}\big\|T^{-d}f_t\big\|^2\Big)\Big(\sum_{s=1}^{T}\big\|T^{-d}\tilde f_s\big\|^2\Big)
= \frac{1}{T}\Big(\sum_{s=1}^{T}\big\|N^{-1}e_s'\Lambda\big\|^2\Big)\Big(\sum_{t=1}^{T}\sum_{l=1}^{r}T^{-2d}f_{l,t}^2\Big)\Big(T^{-2d}\sum_{s=1}^{T}\|\tilde f_s\|^2\Big)
= O_p(N^{-1})
\]
by corollary 1. Consequently, theorem 1 holds.

References

Bai, J. (2004). Estimating cross-section common stochastic trends in nonstationary panel data,
Journal of Econometrics (1): 137–183.

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models, Econometrica (1): 191–221.

Bai, J. and Ng, S. (2004). A PANIC attack on unit roots and cointegration, Econometrica (4): 1127–1177.

Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics, Journal of Econometrics (1): 5–59.

Banerjee, A., Marcellino, M. and Masten, I. (2014). Forecasting with factor-augmented error correction models, International Journal of Forecasting (3): 589–612.

Banerjee, A., Marcellino, M. and Masten, I. (2016). An overview of the factor-augmented error-correction model, in E. Hillebrand and S. J. Koopman (eds), Dynamic Factor Models (Advances in Econometrics, Volume 35), Emerald Group Publishing Limited, pp. 3–41.

Barigozzi, M., Lippi, M. and Luciani, M. (2016). Non-stationary dynamic factor models for large datasets, Working paper, Board of Governors of the Federal Reserve System. URL: http://dx.doi.org/10.2139/ssrn.2402185

Bräuning, F. and Koopman, S. J. (2014). Forecasting macroeconomic variables using collapsed dynamic factor analysis, International Journal of Forecasting (3): 572–584.

Chan, N. H. and Palma, W. (1998). State space modeling of long-memory processes, The Annals of Statistics (2): 719–740.

Chang, Y., Miller, J. I. and Park, J. Y. (2009). Extracting a common stochastic trend: Theory with some applications, Journal of Econometrics (2): 231–247.

Chen, W. W. and Hurvich, C. M. (2006). Semiparametric estimation of fractional cointegration subspaces, The Annals of Statistics (6): 2939–2979.

Eickmeier, S. (2009). Comovements and heterogeneity in the Euro area analyzed in a non-stationary dynamic factor model, Journal of Applied Econometrics (6): 933–959.

Ergemen, Y. E. (2019). System estimation of panel data models under long-range dependence, Journal of Business & Economic Statistics (1): 13–26.

Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation, The Review of Economics and Statistics (4): 540–554.

Franses, P. H. and Janssens, E. (2019). Spurious principal components, Applied Economics Letters (1): 37–39.

Gil-Alaña, L. A. and Robinson, P. M. (1997). Testing of unit root and other nonstationary hypotheses in macroeconomic time series, Journal of Econometrics (2): 241–268.

Hartl, T., Tschernig, R. and Weber, E. (2020). Fractional trends in unobserved components models, arXiv:2005.03988, arXiv.org. URL: https://arxiv.org/pdf/2005.03988.pdf

Hartl, T. and Weigand, R. (2019a). Approximate state space modelling of unobserved fractional components, arXiv:1812.09142, arXiv.org. URL: https://arxiv.org/pdf/1812.09142.pdf

Hartl, T. and Weigand, R. (2019b). Multivariate fractional components analysis, arXiv:1812.09149, arXiv.org. URL: https://arxiv.org/pdf/1812.09149.pdf

Hassler, U. and Wolters, J. (1995). Long memory in inflation rates: International evidence, Journal of Business & Economic Statistics (1): 37–45.

Johansen, S. (2008). A representation theory for a class of vector autoregressive models for fractional processes, Econometric Theory (3): 651–676.

Jungbacker, B. and Koopman, S. J. (2015). Likelihood-based dynamic factor analysis for measurement and forecasting, Econometrics Journal: C1–C21.

Luciani, M. and Veredas, D. (2015). Estimating and forecasting large panels of volatilities with approximate dynamic factor models, Journal of Forecasting: 163–176.

Marinucci, D. and Robinson, P. (1999). Alternative forms of fractional Brownian motion, Journal of Statistical Planning and Inference: 111–122.

Matteson, D. S. and Tsay, R. S. (2011). Dynamic orthogonal components for multivariate time series, Journal of the American Statistical Association (496): 1450–1463.

McCracken, M. W. and Ng, S. (2016). FRED-MD: A monthly database for macroeconomic research, Journal of Business & Economic Statistics (4): 574–589.

Nielsen, M. Ø. (2004). Efficient inference in multivariate fractionally integrated time series models, Econometrics Journal: 63–97.

Palma, W. (2007). Long-Memory Time Series: Theory and Methods, Wiley.

Peña, D. and Poncela, P. (2006). Nonstationary dynamic factor analysis, Journal of Statistical Planning and Inference (4): 1237–1257.

Shimotsu, K. and Phillips, P. C. B. (2005). Exact local Whittle estimation of fractional integration,