Fractional trends in unobserved components models
Tobias Hartl∗, Rolf Tschernig, and Enzo Weber
University of Regensburg, 93053 Regensburg, Germany
Institute for Employment Research (IAB), 90478 Nuremberg, Germany
May 2020
Abstract.
We develop a generalization of unobserved components models that allows for a wide range of long-run dynamics by modelling the permanent component as a fractionally integrated process. The model allows for cointegration, does not require stationarity, and can be cast in state space form. We derive the Kalman filter estimator for the common fractionally integrated component and establish consistency and asymptotic (mixed) normality of the maximum likelihood estimator. We apply the model to extract a common long-run component of three US inflation measures, where we show that the I(1) assumption is likely to be violated for the common trend.

Keywords. long memory, unobserved components, fractional cointegration, Kalman filter, state space models
JEL-Classification.
C32, C51, E31

∗ Corresponding author. E-Mail: [email protected]. The authors thank Uwe Hassler, Morten Ø. Nielsen, Christoph Rust, the participants of the econometric seminar in Nuremberg, the department seminar at the Christian Albrechts University Kiel, the DAGStat conference 2019 in Munich, the workshop on high-dimensional time series in economics and finance 2019 in Vienna, the Annual Meeting of the German Statistical Society 2019 in Trier, the Annual Meeting of the German Economic Society 2019 in Leipzig, the Seminar on International Economic Policy at the University of Zurich, the International Conference on Computational and Financial Econometrics 2019 in London, the Symposium in Honor of Michael Hauser at WU Vienna, and the Standing Field Committee in Econometrics of the German Economic Society for many valuable comments. Support through the projects TS283/1-1 and WE4847/4-1 financed by the German Research Foundation (DFG) is gratefully acknowledged.

Introduction
Unobserved components (UC) models are widely used to decompose time series into latent components of different persistence. Applications in economics include, among others, trend-cycle decompositions, the analysis of long-run equilibrium relations, testing for mean reversion, e.g. in asset returns, and forecasting (see Kim and Nelson; 1999; Koopman and Shephard; 2015, for an overview).

Despite their widespread use, current UC models exhibit two major limitations. First, they require a priori assumptions about the integration order of a series and, therefore, an endogenous treatment of the long-run dynamic characteristics is infeasible. Second, they restrict the long-run component to be I(0), I(1), or I(2). Statistical inference about the degree of persistence of a long-run component is then limited to prior unit root testing, ignoring the non-standard behavior of economic series that exhibit long memory and hindering the estimation of the integration order on a continuous support jointly with the other parameters of the model. Furthermore, model selection uncertainty from prior unit root testing is not taken into account. Finally, misspecification of the integration order may pollute the estimates of permanent and transitory components and bias the variance estimates for the permanent and transitory shocks.

While for the Beveridge-Nelson decomposition a generalization to ARFIMA processes was derived by Ariño and Marmol (2004) and Proietti (2016), and low-frequency transformations that allow for fractional integration have been proposed by Müller and Watson (2018), UC models lack a generalization to fractionally integrated processes. Deriving such a generalization is particularly challenging: it requires studying the convergence properties of the Kalman filter, through which the unobserved components are estimated, when fractional integration is allowed.
In addition, to enable feasible estimation for time series of length n with n large, a modification of the Kalman filter is necessary, as the state vector of fractionally integrated processes is of dimension n + 1, thus making the standard Kalman filter inapplicable from a computational perspective. Moreover, the asymptotic theory of the maximum likelihood estimator, which is used to estimate the model parameters, has to be derived. So far, asymptotic results are only available for the I(1) case considered in Chang et al. (2009), where, in contrast to our model, the integration order is assumed to be known. Providing the theoretical analysis required for fractionally integrated UC models together with a computationally feasible estimator for the latent components is the core of this paper.

We contribute to the literature by deriving a fractionally integrated unobserved components model that allows for a flexible treatment of the long-run dynamic characteristics of multivariate stochastic processes by letting the common integration order take values in a set of positive real numbers including zero. Since we model a p-dimensional vector of observable random variables {y_t}_{t=1}^n as a linear function of a scalar latent variable x_t that is fractionally integrated of order b, our model exhibits p − 1 cointegration relations that are I(0). The model is cast in state space form and allows for asymptotically stationary and nonstationary data. Although an exact state space representation of our model exists, estimating a latent fractionally integrated component via the Kalman filter is computationally infeasible for time series with sample size n large. Therefore, we derive a modified version of the Kalman filter that is based on a truncated state space representation of our fractionally integrated unobserved components model while correcting the observable variables for the approximation error that results from the truncation.
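The growing state dimension can be traced to the type II moving-average representation of a fractionally integrated process: its coefficients decay hyperbolically rather than geometrically, so no fixed finite-order Markovian state is exact. A minimal sketch, using the standard binomial recursion for the fractional-difference coefficients (the helper name `pi_coeffs` is ours, chosen for illustration):

```python
import numpy as np

def pi_coeffs(d, n):
    """Coefficients of the type II fractional difference (1 - L)^d via the
    standard recursion pi_0(d) = 1, pi_j(d) = ((j - 1 - d)/j) * pi_{j-1}(d)."""
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = (j - 1 - d) / j * pi[j - 1]
    return pi

b = 0.75
phi = pi_coeffs(-b, 1000)   # phi_j(b) = pi_j(-b): MA coefficients of (1-L)^{-b}

# Hyperbolic decay ~ j^{b-1}: after 1000 lags the weights are still
# non-negligible, so an exact state vector must carry every past shock,
# which is why the exact state dimension grows with the sample size.
print(phi[[1, 10, 100, 999]])
```

The coefficient sequences of (1 − L)^b and (1 − L)^{−b} convolve to the identity, which is the algebraic fact behind the exact inversion of the truncated operators used below.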
Our modified Kalman filter yields the same prediction error and likelihood function as the standard Kalman filter that is based on the full state space representation of a fractionally integrated process, but greatly reduces the computing time by keeping the state dimension manageable. E.g. for our application in section 4, the modified Kalman filter is found to be about 150 times faster than the standard Kalman filter.

The second main technical contribution of our paper is to establish the asymptotic theory for the maximum likelihood estimator of our fractionally integrated unobserved components model. Since the asymptotic properties of the objective function depend on the fractional integration order b of the data-generating process and differ for b < 1/2 and b > 1/2, we consider the asymptotically stationary case and the nonstationary case separately, where in each case the objective function of the maximum likelihood estimator uniformly converges. While a central limit theorem for martingale difference sequences holds for b < 1/2, for b > 1/2 a rotation of the parameter estimator converges at rate n^b to a mixed normal distribution, thus reflecting the behavior of cointegration models. From these results, it follows for the model parameters that standard inference results remain valid when a fractionally integrated component is introduced.

As an empirical application, we consider the estimation of unobserved long-run inflation by extracting a common fractional component from a set of price measures for the US. For inflation, there exists substantial evidence suggesting that the series are fractionally integrated (cf. e.g. Hassler and Wolters; 1995; Tschernig et al.; 2013). We confirm such findings and estimate the integration order of unobserved long-run inflation to be smaller than one. By generalizing the treatment of the I(1) component to the fractional case, we are able to show consistency, to derive the convergence rates for different parameters, and to establish a central limit theorem for the maximum likelihood estimator. In section 4 the model is applied to extract a common long-run component from different US inflation measures. Section 5 concludes. All proofs are collected in the appendix.

In this section we first derive the fractionally integrated unobserved components model and state the necessary assumptions for identification. Next, we cast the model in state space form, from which we derive the Kalman filter estimator for the latent common long-run component, thereby generalizing the permanent-transitory decomposition of Chang et al. (2009).
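Before the formal treatment, the data-generating process at the heart of this section can be sketched in a few lines: a scalar type II fractionally integrated trend loads on p observables. All helper names and parameter values below are hypothetical choices for illustration; the final check exploits that, under the type II convention with zero pre-sample values, applying Δ^b_+ after Δ^{−b}_+ returns the original series exactly.

```python
import numpy as np

def pi_coeffs(d, n):
    # binomial coefficients of (1 - L)^d: pi_0 = 1, pi_j = ((j-1-d)/j) pi_{j-1}
    out = np.empty(n)
    out[0] = 1.0
    for j in range(1, n):
        out[j] = (j - 1 - d) / j * out[j - 1]
    return out

def fracdiff(z, d):
    """Apply the truncated operator Delta^d_+ (zero pre-sample values, type II)."""
    n = len(z)
    pi = pi_coeffs(d, n)
    return np.array([pi[: t + 1][::-1] @ z[: t + 1] for t in range(n)])

rng = np.random.default_rng(0)
n, b = 300, 0.8                            # hypothetical sample size and order
beta = np.array([1.0, 0.5, -0.3])          # loadings, first entry positive
eta = rng.standard_normal(n)               # trend shocks with unit variance
x = fracdiff(eta, -b)                      # x_t = Delta^{-b}_+ eta_t, an I(b) trend
u = 0.5 * rng.standard_normal((n, 3))      # idiosyncratic noise, diagonal Sigma
y = x[:, None] * beta + u                  # y_t = beta x_t + u_t

# type II exactness: fractionally differencing the trend recovers the shocks
assert np.max(np.abs(fracdiff(x, b) - eta)) < 1e-8
```

The O(n²) convolution in `fracdiff` is fine for a sketch; the point of the paper's modified Kalman filter is precisely to avoid carrying all n past shocks in the state when filtering such a trend.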
Furthermore, since the Kalman filter estimator based on the exact state space representation is computationally infeasible for long time series, we propose a modified Kalman filter estimator that is based on a finite ARMA approximation of the fractionally integrated process but directly corrects for the resulting approximation error. In corollary 2.4 we show that the modified estimator yields the same prediction error as the estimator that is based on the exact state space representation and, therefore, has the same likelihood, but keeps the state dimension manageable.

To begin with, consider the unobserved components model

y_t = β x_t + u_t, ∆^b_+ x_t = η_t, t = 1, ..., n, (1)

where y_t is a p-dimensional observable time series, x_t is a scalar latent variable that is fractionally integrated of order b, x_t ∼ I(b), b ∈ D, D = {d ∈ ℝ | 0 ≤ d < 3/2, d ≠ 1/2}, β is a p × 1 vector of loadings, u_t ∼ NID(0, Σ) and η_t ∼ NID(0, 1) are shocks of dimension p and 1 that are independent, and Σ is diagonal and has full rank. The model may be interpreted as a system where p observable variables y_t are driven by one common, fractionally integrated stochastic trend x_t, such that the whole system is I(b) and p − 1 cointegration relations exist. The parameters to be estimated are β, Σ, and b. They are collected in θ = (β′, (vech Σ)′, b)′ ∈ Θ. We exclude the singular point b = 1/2 that separates the asymptotically stationary region b < 1/2, where the maximum likelihood estimator is asymptotically Gaussian, from the nonstationary region b > 1/2, where a rotation of the parameter estimator for β is asymptotically mixed normal, as will be shown in section 3. The same restriction applies to other cointegrated models (cf. e.g. Johansen and Nielsen; 2012). Since we impose Var(η_t) = 1, Σ diagonal and of full rank, the model is identified up to a sign for β. Therefore we restrict the first entry of β to be positive for unique identification.

The fractional difference operator ∆^b is defined as

∆^b = (1 − L)^b = Σ_{j=0}^∞ π_j(b) L^j, π_j(b) = ((j − 1 − b)/j) π_{j−1}(b) for j = 1, 2, ..., π_0(b) = 1,

and a +-subscript amounts to a truncation of an operator at t ≤ 0, i.e. for an arbitrary process z_t, ∆^b_+ z_t = Σ_{j=0}^{t−1} π_j(b) L^j z_t (see e.g. Johansen; 2008). For b ∈ ℕ the fractional long-run component nests the standard integer-integrated specifications, whereas b ∈ D adds flexibility to the weighting of past shocks. Throughout the paper, we adopt the type II definition of fractional integration (Marinucci and Robinson; 1999) that assumes zero starting values for all fractional processes and, as a consequence, allows for a smooth treatment of the asymptotically stationary (b < 1/2) and the nonstationary (b > 1/2) case. Under the type II definition, the inverse ∆^{−b}_+ exists and is given by ∆^{−b}_+ z_t = (1 − L)^{−b}_+ z_t = Σ_{j=0}^{t−1} ϕ_j(b) z_{t−j}, where ϕ_j(b) = π_j(−b) for all j. Finally, we make use of the fractional lag operator introduced in Johansen (2008) that is defined as L_b = 1 − ∆^b_+ and nests the standard lag operator L = L_1 for b = 1. Note that L_b z_t preserves the integration order of a random variable z_t since b ∈ D is restricted to be non-negative.

Let 1(b ≥ 1) be an indicator function that becomes one if b ≥ 1 and zero otherwise, and let d = b − 1(b ≥ 1) denote the mean-reverting fraction of a long memory process. Define ∆^{−d}_+ = Σ_{j=0}^{t−1} ϕ_j(d) L^j and ∆^d_+ = Σ_{j=0}^{t−1} π_j(d) L^j as functions of d, such that ∆^{−b}_+ = (1 − L)^{−1(b≥1)}_+ Σ_{j=0}^{t−1} ϕ_j(d) L^j distinguishes between an integer integration order and the fractionally integrated polynomial with d ∈ [0, 1). We often suppress the argument d in the binomial expansion of the fractional difference operators ∆^d_+, ∆^{−d}_+ and denote by π_j, ϕ_j the j-th coefficients of ∆^d_+, ∆^{−d}_+ if not stated differently. Then x_t in (1) is represented as

x_t = 1(b ≥ 1) x_{t−1} + Σ_{j=0}^{t−1} ϕ_j η_{t−j}. (2)

Given the parameters b, β, and Σ, the exact state space representation of our model (1) is given by α_{t+1} = T α_t + R η_{t+1}, y_t = Z α_t + u_t, where

T = [ 1(b≥1)  1  0  ···  0 ]      R = [ ϕ_0     ]      α_t = [ x_t                              ]
    [ 0       0  1  ···  0 ]          [ ϕ_1     ]            [ ϕ_1 η_t + ··· + ϕ_n η_{t−n+1}   ]
    [ ⋮              ⋱     ]          [ ⋮       ]            [ ⋮                                ]
    [ 0       0  0  ···  1 ]          [ ϕ_{n−1} ]            [ ϕ_{n−1} η_t + ϕ_n η_{t−1}       ]
    [ 0       0  0  ···  0 ],         [ ϕ_n     ],           [ ϕ_n η_t                         ],

Z = [β 0 ··· 0], and where η_t = 0 for all t ≤ 0.

Let F_t be the σ-field generated by the observable variables y_1, ..., y_t. Furthermore, let z_{t|s} = E_θ(z_t | F_s) for z = x, α, and P_{t|s} = Var_θ(α_t | F_s) with ω^{(i,j)}_t as its (i, j)-th entry for s = t − 1. The θ-subscript denotes that expectations are taken given a parameter vector θ, and E_{θ_0}(y_t | F_{t−1}) = E(y_t | F_{t−1}). Additionally, let α^{(j)}_{t|t−1} denote the j-th entry of α_{t|t−1}. The prediction and updating steps of the Kalman filter for model (1) given the observable data and the parameter vector θ are

v_t(θ) = y_t − E_θ(y_t | F_{t−1}) = y_t − β E_θ(x_t | F_{t−1}) = y_t − β x_{t|t−1}, (3)
F_t = Var_θ(v_t(θ) | F_{t−1}) = β Var_θ(x_t | F_{t−1}) β′ + Σ = β ω^{(1,1)}_t β′ + Σ, (4)
α_{t+1|t} = T α_{t|t−1} + T P_{t|t−1} Z′ F_t^{−1} v_t(θ), (5)
P_{t+1|t} = T P_{t|t−1} T′ − T P_{t|t−1} Z′ F_t^{−1} Z P_{t|t−1} T′ + R R′. (6)

The following theorem states the conditional expectation of the latent variable x_t given F_{t−1} and generalizes the results of Chang et al. (2009) for I(1) stochastic trends to the fractional domain.

Theorem 2.1.
For the exact state space representation of the unobserved components model (1) the conditional expectation of the latent variable x_{t+1} is given by

x_{t+1|t} = (β′Σ^{−1}/(β′Σ^{−1}β)) y_{t+1} − z_{t+1}(θ), where
z_{t+1}(θ) = (β′Σ^{−1}/(β′Σ^{−1}β)) ( ∆^b_+ y_{t+1} − E_θ(∆^b_+ y_{t+1} | F_t) ),
v_{t+1}(θ) = ( I − ββ′Σ^{−1}/(β′Σ^{−1}β) ) y_{t+1} + β z_{t+1}(θ). (7)

The proof of theorem 2.1 is contained in appendix A.1. There, and in the proofs that follow, we denote by w_t any I(0) process that is a function of the underlying NID distributed shocks u_1, ..., u_t and η_1, ..., η_t. Since E_{θ_0}(y_t | F_{t−1}) = E(y_t | F_{t−1}), and thus v_t(θ_0) = y_t − E(y_t | F_{t−1}), it follows that (v_t(θ_0), F_t) is a martingale difference sequence (MDS).

Theorem 2.1 illustrates that the Kalman filter estimator x_{t+1|t} can be decomposed into a linear combination of y_{t+1} that is I(b_0) and an additive component z_{t+1}(θ), where the latter is the prediction error for the fractionally differenced univariate process ∆^b_+ (β′Σ^{−1}/(β′Σ^{−1}β)) y_{t+1} given the filtration F_t. The integration order of this prediction error is given by the following lemma.

Lemma 2.2. The univariate prediction error z_t(θ) is I(b_0 − b) for all t = 1, ..., n.

The proof is included in appendix A.1. Thus, the Kalman filter estimator x_{t+1|t} is always I(b_0). The prediction error v_{t+1}(θ) combines errors from β ≠ β_0 and errors from b ≠ b_0. It is I(b_0) for β ≠ β_0, since ( I − ββ′Σ^{−1}/(β′Σ^{−1}β) ) β_0 x_{t+1} ≠ 0, whereas β = β_0 yields v_{t+1}(θ) = ( I − β_0 β_0′Σ^{−1}/(β_0′Σ^{−1}β_0) ) u_{t+1} + β_0 z_{t+1}(θ) ∼ I(b_0 − b) by lemma 2.2. Finally, v_{t+1}(θ_0) ∼ I(0).

Although a finite-order state space representation of the system in (1) exists, since a fractionally integrated process of type II exhibits a finite-order autoregressive representation of length n − 1, estimating such a system is only computationally feasible when n is small. To estimate α_t the Kalman filter computes the inverse of the (n + 1) × (n + 1) covariance matrix P_{t|t−1} for t = 1, ..., n sequentially, which makes the filter inapplicable for large n. As a solution, Chan and Palma (1998) suggest to truncate the Wold representation of a fractionally integrated process after m lags before the model is cast in state space form, and provide consistency results for b < 1/2. Hartl and Weigand (2019) find that a purely fractionally integrated trend is well approximated by finite ARMA processes in several simulation studies. For optimization purposes their approach is particularly convenient since it maps from the fractional integration order b to its related ARMA coefficients and, therefore, optimization is conducted over b.

Nonetheless, the literature lacks consistency results for finite approximations of fractionally integrated processes in state space form when b > 1/2, and we expect any estimator that truncates the fractionally integrated process at lag m, m < n, to become inconsistent as soon as b > 1/2 and b ≠ 1, since the variance of the truncated sum (1 − L)^{−1(b≥1)} Σ_{j=m+1}^{n−1} ϕ_j(d) η_{n−j} diverges as n → ∞. As a solution, we include a correction for the resulting approximation error that allows us to contribute to the literature on fractionally integrated processes in state space form by deriving consistency results for the maximum likelihood estimator when b ∈ D. To obtain a computationally feasible representation, we approximate the fractionally integrated process by a finite-order ARMA process, but directly correct for the resulting approximation error. We base our theoretical analysis on ARMA(1, m) approximations of x_t, where the moving average polynomial truncates the stable part of the Wold representation of a fractionally integrated process, whereas the AR polynomial controls for integration orders greater or equal to one. As will be shown in this section, the modified Kalman filter yields the same likelihood function as the one that is based on the exact state space representation of a fractionally integrated process.

Let ỹ_t denote an approximate version of (1) and (2) that is obtained by truncating the fractional polynomial Σ_{i=0}^{t−1} ϕ_i η_{t−i} after lag m,

ỹ_t = β x̃_t + u_t, x̃_t = 1(b ≥ 1) x̃_{t−1} + Σ_{i=0}^{m} ϕ_i η_{t−i}, (8)

such that (1 − L)^{1(b≥1)} (x̃_t − x_t) = − Σ_{i=m+1}^{t−1} ϕ_i η_{t−i}. The system matrices and variables of the approximate state space representation are denoted with a tilde, i.e. T̃, Z̃, R̃, α̃_t, ṽ_t(θ), P̃_{t|s}, and ω̃^{(i,j)}_t. Hence, T̃ = T_{(1:(m+1), 1:(m+1))} consists of the upper m + 1 columns and rows of T, Z̃ = Z_{(·, 1:(m+1))} holds the first m + 1 columns of Z, R̃ = R_{(1:(m+1), ·)} consists of the first m + 1 rows of R, and the (m + 1)-dimensional vector α̃_t is given by α̃_t = ( x̃_t, ϕ_1 η_t + ··· + ϕ_m η_{t+1−m}, ···, ϕ_m η_t )′. P̃_{t|s} and ṽ_t(θ) are defined accordingly. The Kalman filter equations (3) to (6) hold equivalently if denoted with a tilde.

In the following theorem we state the conditional expectation x̃_{t+1|t} of the truncated model as a function of x_{t+1|t} and an approximation error.

Theorem 2.3. Let e_i be a (1 × t) unit vector with a one at column i and zeros elsewhere. Define Y_t = (y_1′, ..., y_t′)′ and η_t = (η_1, ..., η_t)′. For the truncated model (8) the conditional expectation of the latent variable can be written as

x̃_{t+1|t} = x_{t+1|t} − ε_{t+1}(θ),
ε_{t+1}(θ) = Σ_{i=m+1}^{t} ϕ_i e_{t+1−i} Σ_{η_t Y_t} Σ_{Y_t}^{−1} Y_t if b < 1,
ε_{t+1}(θ) = Σ_{s=m+1}^{t} Σ_{i=m+1}^{s} ϕ_i e_{s+1−i} Σ_{η_t Y_t} Σ_{Y_t}^{−1} Y_t if b ≥ 1,

where ε_{t+1}(θ) denotes the approximation error, and Σ_{η_t Y_t} = Cov_θ(η_t, Y_t), Σ_{Y_t} = Var_θ(Y_t). Furthermore E_θ(ε_{t+1}(θ)) = 0. Details on these matrices are presented in the proof, which is contained in appendix A.1.

The prediction error v_{t+1}(θ) can be decomposed into the prediction error of the truncated model plus the approximation error

v_{t+1}(θ) = y_{t+1} − E_θ(y_{t+1} | F_t) = y_{t+1} − E_θ(ỹ_{t+1} | F_t) − E_θ(y_{t+1} − ỹ_{t+1} | F_t)
           = y_{t+1} − β x̃_{t+1|t} − β ε_{t+1}(θ) = ṽ_{t+1}(θ) − β ε_{t+1}(θ).

Note that the approximation error ε_t(θ) is the Kalman filter estimate for x_t − x̃_t = (1 − L)^{−1(b≥1)} Σ_{i=m+1}^{t−1} ϕ_i η_{t−i} given F_{t−1} and, therefore, it is F_{t−1}-measurable and can be calculated given the formula in theorem 2.3. Consequently, the results from theorem 2.1 for the exact representation carry over if y_t is corrected for the approximation error, as the following corollary states.

Corollary 2.4. Define ÿ_t = y_t − β ε_t(θ). Using the results in theorems 2.1 and 2.3 yields

E_θ(ÿ_{t+1} | F_t) = E_θ(ỹ_{t+1} | F_t) = β x̃_{t+1|t},
x̃_{t+1|t} = (β′Σ^{−1}/(β′Σ^{−1}β)) y_{t+1} − ε_{t+1}(θ) − z_{t+1}(θ), and
v_{t+1}(θ) = ÿ_{t+1} − E_θ(ỹ_{t+1} | F_t) = ÿ_{t+1} − E_θ(ÿ_{t+1} | F_t).

From corollary 2.4 it follows that the prediction errors of the exact representation (1) using {y_t}_{t=1}^n and the truncated model (8) together with the approximation-corrected {ÿ_t}_{t=1}^n are identical and have the same conditional likelihood given θ. Hence, maximizing the likelihood of the approximation-corrected truncated model solves the same optimization problem as for the exact state space representation but requires a smaller number of state estimates from the Kalman filter if m < n. The modified Kalman filter outperforms the standard Kalman filter from a computational perspective whenever p ≪ n, as it requires to invert the np × np matrix Σ_{Y_n} once, whereas the Kalman filter based on the full representation of (1) sequentially inverts the (n + 1) × (n + 1) matrix P_{t|t−1} for each t = 1, ..., n. E.g. for our application in section 4, the modified Kalman filter is about 150 times faster than the standard Kalman filter.

Although we base our theoretical analysis on ARMA(1, m) approximations of x_t, including further lags of the autoregressive polynomial may improve the approximation quality in finite samples, as Hartl and Weigand (2019) show, and, therefore, speed up the parameter optimization. Nonetheless, the asymptotic results remain unaffected by an extended AR polynomial since correcting for the approximation error yields an exact representation of a fractionally integrated process anyway.
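The role of the correction term can be illustrated numerically for the mean-reverting range b < 1. For NID(0, 1) shocks the variance that a plain truncation at lag m discards is the tail sum of squared coefficients; since ϕ_j(b)² decays like j^{2b−2}, this tail settles for b < 1/2 but keeps growing with the sample size for 1/2 < b < 1, which is the source of the inconsistency the approximation-error correction repairs. A sketch under these assumptions (helper names are ours):

```python
import numpy as np

def phi_coeffs(b, n):
    # phi_j(b) = pi_j(-b): MA coefficients of (1 - L)^{-b}, type II convention
    out = np.empty(n)
    out[0] = 1.0
    for j in range(1, n):
        out[j] = (j - 1 + b) / j * out[j - 1]
    return out

def tail_variance(b, m, n):
    """Variance of the discarded tail sum_{j=m+1}^{n-1} phi_j(b) eta_{t-j}
    for NID(0, 1) shocks: the sum of squared coefficients beyond lag m."""
    phi = phi_coeffs(b, n)
    return float(np.sum(phi[m + 1:] ** 2))

m = 50
for n in (500, 2000, 8000):
    # b = 0.25: the tail variance settles; b = 0.75: it grows with n,
    # since phi_j^2 ~ j^{2b-2} is not summable for b > 1/2
    print(n, tail_variance(0.25, m, n), tail_variance(0.75, m, n))
```

Increasing m only postpones the problem for b > 1/2; the correction term ε_t(θ) removes it entirely, at the cost of the one-off np × np inversion discussed above.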
For notational convenience we therefore stick to the simplest ARMA(1, m) approximation in section 2, whereas in our empirical application in section 4 we use ARMA(4, 4) approximations for a faster convergence of the estimator.

Having shown that the exact representation (1) together with {y_t}_{t=1}^n yields the same conditional likelihood of the prediction error as the truncated, approximation-corrected model (8) together with {ÿ_t}_{t=1}^n for a given θ, we turn to the estimation of the unknown parameters θ in the subsequent section, where we focus on the exact state space representation of (1). For the asymptotic results to carry over to the truncated, approximation-corrected model it is required that ε_t(θ) < ∞, and therefore the truncation parameter is required to depend on the sample size n, m = m(n), whenever b > 1/2.

In this section we derive the maximum likelihood (ML) estimator for the unknown parameters θ in the unobserved components model (1) with a common fractional trend and determine the asymptotic properties of the ML estimator. With respect to the latter, two major difficulties have to be tackled. First, as already becomes clear from theorem 2.1 and lemma 2.2, z_t(θ) depends on b_0 − b and is nonstationary for b_0 − b ≥ 1/2. We tackle this issue by first establishing consistency of the ML estimator for b, where we show that the estimator is nested in the ARFIMA optimization problem considered in Nielsen (2015). There, consistency of the estimator for b is shown by splitting D into different intervals and showing that the relevant parameter space reduces to D(κ) = D ∩ {b : b − b_0 ≥ −1/2 + κ}, 0 < κ < 1/2, where the objective function of the estimator converges uniformly. Consequently, z_t(θ) and the partial derivative of v_t(θ) w.r.t. b converge to stationary processes. The second difficulty arises from the partial derivative of v_t(θ) w.r.t. β, which is I(b_0) and implies that the convergence rate of the ML estimator for β depends on b_0 for b_0 ∈ (1/2, 3/2). We therefore treat the cases b_0 ∈ [0, 1/2) and b_0 ∈ (1/2, 3/2) separately. For both cases we show that the ML estimator of θ converges to a normal distribution, whereas in the latter case a certain rotation of the parameters is asymptotically mixed normally distributed.

The section is organized as follows. We first state the log likelihood of the state space model (1) together with its first and second derivatives and comment on the convergence of the prediction error variance F_t in (4). Next, we show consistency of the ML estimator for b. Finally, we derive the asymptotic distribution for the ML estimator of θ for the asymptotically stationary case b_0 ∈ [0, 1/2) and the nonstationary case b_0 ∈ (1/2, 3/2). The log likelihood is

l_n(θ) = −(n/2) log det F^{[n]} − (1/2) tr( F^{[n]−1} Σ_{t=1}^n v_t(θ) v_t(θ)′ ), (9)

where F^{[n]} = lim_{t→∞} Var_θ(v_t(θ) | F_{t−1}) is the steady state variance of the prediction error that depends on the fixed system dimension n due to the type II definition of long memory. The existence of a steady state F^{[n]} is shown in lemma A.5 in appendix A.2. The derivation of the asymptotic properties of the ML estimator requires convergence of the steady state variance F^{[n]} as n → ∞. This is shown in lemma A.6 in appendix A.2, where special care is taken w.r.t. the state dimension increasing with n.

An analytical solution for the score and Hessian matrix was derived in Chang et al. (2009) and is given by

s_n(θ) = −(n/2) (∂(vec F^{[n]})′/∂θ) vec F^{[n]−1} + (1/2) (∂(vec F^{[n]})′/∂θ) vec( F^{[n]−1} ( Σ_{t=1}^n v_t(θ) v_t(θ)′ ) F^{[n]−1} ) − Σ_{t=1}^n (∂v_t(θ)′/∂θ) F^{[n]−1} v_t(θ), (10)

and

H_n(θ) = Σ_{h=1}^{8} H_{n,h}(θ), (11)

H_{n,1}(θ) = −(n/2) [ I ⊗ (vec F^{[n]−1})′ ] ( ∂²/∂θ∂θ′ ⊗ vec F^{[n]} ),
H_{n,2}(θ) = (1/2) { I ⊗ ( vec[ F^{[n]−1} ( Σ_{t=1}^n v_t(θ) v_t(θ)′ ) F^{[n]−1} ] )′ } ( ∂²/∂θ∂θ′ ⊗ vec F^{[n]} ),
H_{n,3}(θ) = (n/2) (∂(vec F^{[n]})′/∂θ) ( F^{[n]−1} ⊗ F^{[n]−1} ) (∂(vec F^{[n]})/∂θ′),
H_{n,4}(θ) = −(1/2) (∂(vec F^{[n]})′/∂θ) [ F^{[n]−1} ⊗ F^{[n]−1} ( Σ_{t=1}^n v_t(θ) v_t(θ)′ ) F^{[n]−1} + F^{[n]−1} ( Σ_{t=1}^n v_t(θ) v_t(θ)′ ) F^{[n]−1} ⊗ F^{[n]−1} ] (∂(vec F^{[n]})/∂θ′),
H_{n,5}(θ) = − Σ_{t=1}^n (∂v_t(θ)′/∂θ) F^{[n]−1} (∂v_t(θ)/∂θ′),
H_{n,6}(θ) = − Σ_{t=1}^n ( I ⊗ v_t(θ)′ F^{[n]−1} ) ( ∂²/∂θ∂θ′ ⊗ v_t(θ) ),
H_{n,7}(θ) = (∂(vec F^{[n]})′/∂θ) ( F^{[n]−1} ⊗ F^{[n]−1} ) Σ_{t=1}^n ( ∂v_t(θ)/∂θ′ ⊗ v_t(θ) ),
H_{n,8}(θ) = H_{n,7}(θ)′.

Having stated the log likelihood together with its derivatives, we turn to the estimation of b. By theorem 2.1 the prediction error has the decomposition v_{t+1}(θ) = ( I − ββ′Σ^{−1}/(β′Σ^{−1}β) ) y_{t+1} + β z_{t+1}(θ). Since the second term, z_{t+1}(θ), is I(b_0 − b) by lemma 2.2, the prediction error is I(b_0) whenever β ≠ β_0 and I(b_0 − b) in case of β = β_0. However, since the first term in v_{t+1}(θ) is invariant with respect to b, only the second term β z_{t+1}(θ) matters w.r.t. estimating b. The latter term is asymptotically stationary if b_0 − b < 1/2, such that a law of large numbers can be applied to obtain uniform convergence of the objective function for b. For b_0 − b ≥ 1/2, z_{t+1}(θ) is nonstationary, and the rate of convergence of the objective function (9) depends on b_0 − b. Thus, the objective function of the ML estimator for b does not converge uniformly on D. For ARFIMA models Nielsen (2015) shows consistency of the conditional sum-of-squares (CSS) estimator for b, and the CSS estimator has the same limit distribution as the maximum likelihood estimator under Gaussianity (Hualde and Robinson; 2011). Thus, by showing that our objective function of the ML estimator for b is asymptotically nested in the ARFIMA objective function considered in Nielsen (2015) and that our setup satisfies assumptions A to D in Nielsen (2015), we prove that consistency for the ML estimator of b carries over from the CSS estimator. The following theorem summarizes the results.

Theorem 3.1.
The ML estimator for b in model (1) is consistent, i.e. ˆ b p −→ b as n → ∞ . The proof is contained in appendix A.2.Theorem 3.1 implies that the relevant parameter space for b asymptotically reduces to theneighborhood of b , implying that z t +1 (ˆ θ ) is asymptotically stationary and the objectivefunction for the ML estimator of b converges uniformly.11 .2 Asymptotic distribution of the maximum likelihood estima-tor Next we turn to the asymptotic analysis of the maximum likelihood estimator for θ . Toderive its asymptotic properties, we follow the well-established approach used for stationarymodels and apply a first order Taylor expansion to the score vector, which yields s n (ˆ θ n ) = s n ( θ ) + H n ( θ n )(ˆ θ n − θ ) , (12)where ˆ θ n is the maximum likelihood estimator for θ , and H n ( θ n ) denotes the Hessian withrows evaluated at mean values between ˆ θ n and θ . Given that s n (ˆ θ n ) = 0 if ˆ θ n is an interiorsolution, we write ν (cid:48) n A − (ˆ θ n − θ ) = − (cid:104) ν − n A (cid:48) H n ( θ n ) Aν − (cid:48) n (cid:105) − (cid:2) ν − n A (cid:48) s n ( θ ) (cid:3) , (13)where ν n is a scaling matrix and A is a rotation matrix that will be defined in (19) below.Again following Chang et al. (2009), the score vector (10) evaluated at the true parametervalue θ is given by s n ( θ ) = 12 ∂ (vec F [ n ] ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:16) F [ n ] − ⊗ F [ n ] − (cid:17) vec n (cid:88) t =1 (cid:16) v t ( θ ) v t ( θ ) (cid:48) − F [ n ]0 (cid:17) − n (cid:88) t =1 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) , (14)where F [ n ]0 is F [ n ] evaluated at θ = θ .It is easy to see that the only stochastic component in s n ( θ ) is v t ( θ ) and its derivativeevaluated at θ . From the decomposition of v t ( θ ) derived in theorem 2.1, one can obtainits derivatives stated in the following lemma. Lemma 3.2.
The first partial derivatives of $v_t(\theta)$, evaluated at $\theta_0$, are given by
$$ \frac{\partial v_t(\theta)'}{\partial\beta}\bigg|_{\theta=\theta_0} = -\left(I - \frac{\Sigma^{-1}\beta\beta'}{\beta'\Sigma^{-1}\beta}\right)x_t + a_\beta(u_t,\eta_t), \qquad \frac{\partial v_t(\theta)'}{\partial\operatorname{vec}\Sigma}\bigg|_{\theta=\theta_0} = a_\Sigma(u_t,\eta_t), \qquad \frac{\partial v_t(\theta)'}{\partial b}\bigg|_{\theta=\theta_0} = a_b(u_t,\eta_t), $$
where the $a_i(u_t,\eta_t)$ are $I(0)$ processes that depend on $\eta_1,\ldots,\eta_t$, $u_1,\ldots,u_t$, $i = \beta, \Sigma, b$.

The proof of lemma 3.2 is contained in appendix A.2. As the lemma shows, $\partial v_t(\theta)'/\partial\beta$ at $\theta_0$ is the only source of fractional integration in the gradient $s_n(\theta_0)$, whereas $\partial v_t(\theta)'/\partial\operatorname{vec}\Sigma$ and $\partial v_t(\theta)'/\partial b$ at $\theta_0$ are $I(0)$. Similar to the $I(1)$ case studied in Chang et al. (2009), the partial derivative $\partial v_t(\theta)'/\partial\beta$ at $\theta_0$ is a process of dimension $(p\times p)$ that is driven by one common fractionally integrated trend $x_t$, such that $\partial v_t(\theta)'/\partial\beta$ at $\theta_0$ is cointegrated. Defining the $(p\times p)$-dimensional projection matrix
$$ P_x = I - \frac{\Sigma^{-1}\beta\beta'}{\beta'\Sigma^{-1}\beta}, \quad (15) $$
as Chan and Palma (1998) do for the $I(1)$ case, allows us to write $\partial v_t(\theta)'/\partial\beta|_{\theta=\theta_0} = -P_x x_t + a_\beta(u_t,\eta_t)$. While for each column of $\partial v_t(\theta)'/\partial\beta|_{\theta=\theta_0}$ the dimension of the cointegration space is $p-1$, $c\beta$ is the only common cointegrating vector for all $p$ columns, where $c$ is any nonzero constant: eliminating the single common trend from all $p$ derivatives gives $\beta'\,\partial v_t(\theta)'/\partial\beta|_{\theta=\theta_0} \sim I(0)$. Thus, the projection matrix satisfies $\beta' P_x = 0$. Furthermore, $P_x\Sigma^{-1}\beta = 0$ holds. From the latter equation it follows that $P_x\Sigma^{-1}$ is relevant for determining the cointegration space for $y_t = \beta x_t + u_t$. To deal with the singularity in $P_x$, we follow the approach of Chang et al. (2009) and define $\Gamma$ as a $p\times(p-1)$ matrix for which
$$ \Gamma'\Sigma^{-1}\beta = 0, \qquad \Gamma'\Sigma^{-1}\Gamma = I. \quad (16) $$
Note that $P_x = \Sigma^{-1}\Gamma\Gamma'$. Thus, $\Gamma'\Sigma^{-1}$ determines the $p-1$ cointegration relations for $y_t$. From the left equation in (16) it follows that the cointegration vectors for $y_t$ and for the partial derivatives $\partial v_t(\theta)'/\partial\beta|_{\theta=\theta_0}$ are orthogonal. For a broad discussion of the cointegrating properties we refer to Chang et al. (2009, ch. 4). In addition, note that the derivatives $\partial v_t(\theta)'/\partial\theta = -\,\partial x_{t|t-1}\beta'/\partial\theta$ are $\mathcal F_{t-1}$-measurable, since $x_{t|t-1}$ is $\mathcal F_{t-1}$-measurable.

Next, we study the asymptotic properties of $v_t(\theta_0)$ and $\partial v_t(\theta)'/\partial\theta$ at $\theta=\theta_0$. From the Kalman recursions, in particular (3) and (5), which contain random components, it follows that $v_t(\theta_0)$ is normally distributed, since the recursions are linear and the errors $\eta_t$ and $u_t$ are assumed to be NID. Furthermore, $(v_t(\theta_0), \mathcal F_t)$ is a martingale difference sequence (MDS) by construction. Moreover, the MDS is asymptotically stationary since its conditional variance $\operatorname{Var}(v_t(\theta_0)|\mathcal F_{t-1})$ converges asymptotically, $\lim_{n\to\infty}\lim_{t\to\infty}\operatorname{Var}(v_t(\theta_0)|\mathcal F_{t-1}) = F$, as shown in lemma A.6 in the appendix, so that $F$ is the asymptotic variance of the MDS $v_t(\theta_0)$. Since $v_t(\theta_0)$ adapted to $\mathcal F_t$ is uncorrelated, normally distributed due to the NID errors as argued above, and has a finite asymptotic variance, we have $v_t(\theta_0) \overset{d}{\to} \mathrm{NID}(0, F_0^{[n]})$ as $t\to\infty$ for given $n$ and given the adaption to $\mathcal F_t$. It follows from the results of Muirhead (1982, pp. 85–91) on the asymptotic properties of the Wishart distribution that
$$ \frac{1}{\sqrt n}\sum_{t=1}^n \operatorname{vec}\left(v_t(\theta_0)v_t(\theta_0)' - F_0^{[n]}\right) \overset{d}{\to} N\big(0,\,(I+K)(F\otimes F)\big), \quad (17) $$
as $n\to\infty$, where $K$ is the commutation matrix. As shown in lemma A.7 in appendix A.2, $\left(\frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0} F_0^{[n]-1} v_t(\theta_0),\, \mathcal F_t\right)$ is a MDS, since the partial derivative is $\mathcal F_{t-1}$-measurable.
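The convergence in (17) can be illustrated numerically. The following sketch is a toy Monte Carlo (not part of the model) for the scalar case $p = 1$ with an assumed prediction error variance $F$: for $v_t \sim \mathrm{NID}(0, F)$, the sequence $v_t^2 - F$ is an MDS with variance $2F^2$, so its $n^{-1/2}$-scaled sum should be approximately $N(0, 2F^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2_000, 4_000
F = 1.5  # assumed scalar prediction error variance (illustrative value only)

# Draw v_t ~ NID(0, F); v_t^2 - F is a martingale difference sequence with
# variance 2*F^2, so n^{-1/2} * sum_t (v_t^2 - F) is approximately N(0, 2*F^2).
v = rng.normal(0.0, np.sqrt(F), size=(reps, n))
s = (v ** 2 - F).sum(axis=1) / np.sqrt(n)

print(s.mean(), s.std())  # mean near 0, standard deviation near sqrt(2)*F
```

Across the `reps` replications the sample mean and standard deviation of `s` settle near the theoretical values, mirroring the scalar version of the Wishart-based limit in (17).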
Moreover, the two terms in the score vector (14), $\sum_{t=1}^n\left(v_t(\theta_0)v_t(\theta_0)' - F_0^{[n]}\right)$ and $\sum_{t=1}^n \frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0} F_0^{[n]-1} v_t(\theta_0)$, become independent asymptotically. Thus, we obtain results comparable to Chang et al. (2009, p. 234).

3.3 Asymptotic distribution of the ML estimator for b < 1/2

For $b_0 < 1/2$, we use a central limit theorem (CLT) for MDS that applies to $\frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0} F_0^{[n]-1} v_t(\theta_0)$, since the partial derivatives at $\theta_0$ are asymptotically stationary. Furthermore, we show convergence in distribution for the first term in (14). Lemma A.8 in appendix A.2 summarizes the results for both terms. A martingale CLT for the gradient (14) then yields $\frac{1}{\sqrt n}\, s_n(\theta_0) \overset{d}{\to} N(0, J)$, where $J$ is the limiting information matrix (Davidson; 2000, eq. 11.3.11). Asymptotic independence of both stochastic terms in (14) facilitates the computation of $J$. Finally, from Davidson (2000, eq. 11.3.15) a CLT for the ML estimator $\hat\theta_n$ follows, as shown in theorem 3.3.

Theorem 3.3. For $b_0 \in [0, 1/2)$ the maximum likelihood estimator is consistent and asymptotically normally distributed, $\sqrt n(\hat\theta_n - \theta_0) \overset{d}{\to} N(0, J^{-1})$ as $n\to\infty$, with
$$ J = \operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_{t=1}^n \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} F^{-1} \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0} + \frac{1}{2}\left[\frac{\partial(\operatorname{vec} F)'}{\partial\theta}\bigg|_{\theta=\theta_0}\left(F^{-1}\otimes F^{-1}\right)\frac{\partial\operatorname{vec} F}{\partial\theta'}\bigg|_{\theta=\theta_0}\right]. $$

The proof is contained in appendix A.2.

3.4 Asymptotic distribution of the ML estimator for b > 1/2

Having shown that the ML estimator is asymptotically normal for $b_0 < 1/2$, we turn to the nonstationary case $b_0 \in (1/2, 3/2)$, where $v_t(\theta)$ at $\theta_0$ is a nonstationary process. Inference for a broad class of (potentially) nonstationary models is considered in Wooldridge (1994, sections 8 and 11), where sufficient conditions for consistency and asymptotic (mixed) normality of the ML estimator are derived; the same framework allowed Park and Phillips (2001) to study the asymptotic behavior of the NLS estimator for nonlinear cointegration models. Chang et al. (2009) extend this setup by including a rotation matrix $A$, and their setup also nests our fractional trend model. It requires the following three sufficient conditions:

ML1: $\nu_n^{-1} A'\, s_n(\theta_0) \overset{d}{\to} N$ as $n\to\infty$,

ML2: $-\nu_n^{-1} A'\, H_n(\theta_0)\, A\, \nu_n^{-1\prime} \overset{d}{\to} M$ as $n\to\infty$, with $M$ positive definite with probability one, and

ML3: there exists a sequence of invertible normalization matrices $\mu_n$ such that $\mu_n\nu_n^{-1} \to 0$ and $\sup_{\theta\in\Theta_n}\left\|\mu_n^{-1} A'\left(H_n(\theta) - H_n(\theta_0)\right) A\, \mu_n^{-1\prime}\right\| \overset{p}{\to} 0$, where $\Theta_n = \left\{\theta \,\big|\, \|\mu_n' A^{-1}(\theta - \theta_0)\| \le 1\right\}$ is a sequence of shrinking neighborhoods of $\theta_0$.

The random matrices $M$ and $N$ and the nonstochastic matrices $A$ and $\nu_n$ will be defined below in (19), and $\mu_n$ in the proof of lemma A.11. As in the $I(1)$ case considered in Chang et al. (2009), under conditions ML1 to ML3 equation (13) converges as $n\to\infty$,
$$ \nu_n' A^{-1}(\hat\theta_n - \theta_0) = -\left[\nu_n^{-1} A' H_n(\theta_0) A\nu_n^{-1\prime}\right]^{-1}\left[\nu_n^{-1} A' s_n(\theta_0)\right] + o_p(1) \overset{d}{\to} M^{-1}N. \quad (18) $$
Showing that ML1 to ML3 hold, such that (18) follows, is the subject of the remainder of this section, where we proceed as follows. To distinguish between $I(b)$ and $I(0)$ processes we first derive an expression for the rotation matrix $A$. Lemma 3.4 contains a functional central limit theorem (FCLT) for the different components of $A'\, s_n(\theta_0)$, which directly yields the entries of the scaling matrix $\nu_n$.
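The $n^{b}$ block of $\nu_n$ reflects how fast the nonstationary fractional component grows. A small numerical check (helper name and the illustrative order $b = 1.2$ are ours) computes the type II moving average weights of $\Delta_+^{-b}$ and confirms that $\operatorname{Var}(x_n) = \sigma^2\sum_j \psi_j^2$ grows at rate $n^{2b-1}$, i.e. $x_t = O_p(t^{b-1/2})$; after the additional summation over $t$ in the score, this is what the $n^{b}$ normalization absorbs:

```python
import numpy as np

def frac_weights(b, n):
    """MA coefficients psi_j of the type II filter (1-L)^{-b}:
    psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + b) / j."""
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + b) / j
    return psi

b = 1.2  # hypothetical order in the nonstationary region (1/2, 3/2)
# Var(x_n) = sigma^2 * sum_{j<n} psi_j^2; doubling n should multiply this
# variance by roughly 2^{2b-1}, so the log2-ratio recovers the exponent.
v1 = (frac_weights(b, 2_000) ** 2).sum()
v2 = (frac_weights(b, 4_000) ** 2).sum()
growth = np.log2(v2 / v1)
print(growth)  # close to 2b - 1 = 1.4
```

The check is deterministic: no simulation is needed because the variance of a type II fractional process is a pure function of the filter weights.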
Finally, in lemmas A.9 to A.11 we prove that ML1 to ML3 hold and thus (18). Theorem 3.5 summarizes the results and defines $M$, $N$. As lemma 3.2 shows, the partial derivative w.r.t. $\beta$ at $\theta_0$ is the only source of fractional integration in the partial derivatives of $v_t(\theta)$ at $\theta_0$, whereas the partial derivatives w.r.t. $\operatorname{vec}\Sigma$ and $b$ are $I(0)$. Again following Chang et al. (2009), to distinguish between $I(0)$ and $I(b)$ components, let the rotation matrix be defined as $A = \left[A_N\; A_S\; A_D\right]$, where $A_N$ is $k\times(p-1)$, $A_S$ is $k\times(1 + p(p+1)/2)$, and $A_D$ is $k\times 1$, where $k = p + p(p+1)/2 + 1$ is the dimension of $\theta$. The scaling matrix $\nu_n$ adjusts for the different convergence rates:
$$ A_N = \begin{bmatrix}\Gamma\\ 0\end{bmatrix}, \qquad A_S = \begin{bmatrix}\beta(\beta'\Sigma^{-1}\beta)^{-1/2} & 0\\ 0 & I\end{bmatrix}, \qquad A_D = \begin{bmatrix}0\\ 1\end{bmatrix}, \qquad \nu_n = \begin{bmatrix} n^{b_0} I_{p-1} & 0\\ 0 & n^{1/2} I_{k-p+1}\end{bmatrix}. \quad (19) $$
From lemma 3.2 and the properties of $\Gamma$ in (16) it is easy to see that
$$ A_N' \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} = -\Gamma' x_t + \Gamma' a_\beta(u_t,\eta_t) \sim I(b_0), \quad (20) $$
whereas $A_S' \frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0}$ and $A_D' \frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0}$ are $I(0)$. To derive the distributional properties of $M$, $N$ in (18) we define the partial sums
$$ U_n(r) = \frac{1}{\sqrt n}\sum_{t=1}^{\lfloor nr\rfloor} F_0^{[n]-1} v_t(\theta_0), \qquad W_n(r) = \frac{1}{\sqrt n}\sum_{t=1}^{\lfloor nr\rfloor} A_S' \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} F_0^{[n]-1} v_t(\theta_0), $$
$$ Y_n(r) = \frac{1}{\sqrt n}\sum_{t=1}^{\lfloor nr\rfloor} A_D' \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} F_0^{[n]-1} v_t(\theta_0), \qquad X_n(r) = \frac{1}{n^{b_0-1/2}}\sum_{t=1}^{\lfloor nr\rfloor} A_N'\, \Delta \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0}, $$
and
$$ V_n = \frac{1}{n^{b_0}}\sum_{t=1}^{n} A_N' \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} F_0^{[n]-1} v_t(\theta_0). $$
Since multiplication by $A_S'$ and $A_D'$ eliminates the nonstationary part of $\frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0}$, and $\frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0} F_0^{[n]-1} v_t(\theta_0)$ is a MDS, the FCLT of Chang et al. (2009, lemma 3.3) carries over directly for $U_n(r)$, $W_n(r)$ and $Y_n(r)$. For $X_n(r)$, which contains nonstationary fractionally integrated common components, we extend their FCLT in the following lemma, where $\Rightarrow$ denotes weak convergence.

Lemma 3.4.
For $b_0 \in (1/2, 3/2)$ the following FCLT holds for the partial sums:
$$ \left(U_n(r),\, W_n(r),\, Y_n(r),\, X_n(r)\right) \Rightarrow \left(U(r),\, W(r),\, Y(r),\, X(r)\right) $$
as $n\to\infty$, where $U(\cdot)$, $W(\cdot)$, $Y(\cdot)$ are multivariate Brownian motions, whereas $X(\cdot)$ is a fractional Brownian motion of type II that is independent of $U(r)$. Furthermore, $V_n \overset{d}{\to} V = \int X(r)\,\mathrm dU(r)$, and $V$ has full rank a.s.

The proof is contained in appendix A.2. Denoting in the sequel $W(1)$ by $W$ and $Y(1)$ by $Y$, one has
$$ \operatorname{Var}(W) = \operatorname*{plim}_{n\to\infty}\, A_S'\left(\frac{1}{n}\sum_{t=1}^n \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} F^{-1} \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}\right) A_S, \quad (21) $$
$$ \operatorname{Var}(Y) = \operatorname*{plim}_{n\to\infty}\, A_D'\left(\frac{1}{n}\sum_{t=1}^n \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} F^{-1} \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}\right) A_D. \quad (22) $$
With the FCLT of lemma 3.4 at hand, lemmas A.9 to A.11 prove that the conditions ML1 to ML3 hold. They are contained in appendix A.2. The following theorem summarizes the results by stating the asymptotic properties of the maximum likelihood estimator.

Theorem 3.5.
The ML estimator for model (1) satisfies, for $b_0\in(1/2, 3/2)$,
$$ \nu_n' A^{-1}\left(\hat\theta_n - \theta_0\right) \overset{d}{\to} M^{-1} N, $$
where
$$ N = \begin{pmatrix} -\left(\int X(r)\,\mathrm dU(r)\right)' & Z' - W' & Q - Y \end{pmatrix}', \qquad M = \begin{bmatrix} \int X(r)\, F^{-1} X'(r)\,\mathrm dr & 0 & 0\\ 0 & \operatorname{Var}(Z) + \operatorname{Var}(W) & 0\\ 0 & 0 & \operatorname{Var}(Q) + \operatorname{Var}(Y) \end{bmatrix}, \quad (23) $$
$$ Z_n = \frac{1}{2}\, A_S'\left[\frac{\partial(\operatorname{vec}F)'}{\partial\theta}\bigg|_{\theta=\theta_0}\left(F^{-1}\otimes F^{-1}\right)\operatorname{vec}\left(\frac{1}{\sqrt n}\sum_{t=1}^n \left(v_t(\theta_0)v_t(\theta_0)' - F\right)\right)\right] \overset{d}{\to} Z, \quad (24) $$
$$ Q_n = \frac{1}{2}\, A_D'\left[\frac{\partial(\operatorname{vec}F)'}{\partial\theta}\bigg|_{\theta=\theta_0}\left(F^{-1}\otimes F^{-1}\right)\operatorname{vec}\left(\frac{1}{\sqrt n}\sum_{t=1}^n \left(v_t(\theta_0)v_t(\theta_0)' - F\right)\right)\right] \overset{d}{\to} Q, \quad (25) $$
as $n\to\infty$, with $Z\sim N(0,\operatorname{Var}(Z))$, $Q\sim N(0,\operatorname{Var}(Q))$,
$$ \operatorname{Var}(Z) = \frac{1}{2}\, A_S'\left[\frac{\partial(\operatorname{vec}F)'}{\partial\theta}\bigg|_{\theta=\theta_0}\left(F^{-1}\otimes F^{-1}\right)\frac{\partial(\operatorname{vec}F)}{\partial\theta'}\bigg|_{\theta=\theta_0}\right] A_S, \quad (26) $$
$$ \operatorname{Var}(Q) = \frac{1}{2}\, A_D'\left[\frac{\partial(\operatorname{vec}F)'}{\partial\theta}\bigg|_{\theta=\theta_0}\left(F^{-1}\otimes F^{-1}\right)\frac{\partial(\operatorname{vec}F)}{\partial\theta'}\bigg|_{\theta=\theta_0}\right] A_D. \quad (27) $$
$\operatorname{Var}(W)$ and $\operatorname{Var}(Y)$ are given in (21) and (22).

Define $\left(R'\; S'\right)' = \left[\operatorname{Var}(Z) + \operatorname{Var}(W)\right]^{-1}(Z - W)$. Then it follows from theorem 3.5 that
$$ \frac{\beta'\Sigma^{-1}}{(\beta'\Sigma^{-1}\beta)^{1/2}}\left[\sqrt n\,(\hat\beta - \beta)\right] \overset{d}{\to} R, \quad (28) $$
$$ \Gamma'\Sigma^{-1}\left[n^{b_0}(\hat\beta - \beta)\right] \overset{d}{\to} -\left[\int X(r)\, F^{-1} X'(r)\,\mathrm dr\right]^{-1}\int X(r)\,\mathrm dU(r), \quad (29) $$
$$ \sqrt n\left(\operatorname{vech}\hat\Sigma - \operatorname{vech}\Sigma\right) \overset{d}{\to} S, \quad (30) $$
$$ \sqrt n\,(\hat b - b_0) \overset{d}{\to} \left[\operatorname{Var}(Q) + \operatorname{Var}(Y)\right]^{-1}(Q - Y). \quad (31) $$
Chang et al. (2009, p. 236) conclude from their counterpart of theorem 3.5 that
$$ \sqrt n\left(\hat\beta - \beta\right) \overset{d}{\to} \beta(\beta'\Sigma^{-1}\beta)^{-1/2} R. \quad (32) $$
To show this, multiply (28) by $\beta(\beta'\Sigma^{-1}\beta)^{-1/2}$ and then insert (15) to obtain
$$ \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\left(\sqrt n\,(\hat\beta - \beta)\right) = \left(\sqrt n\,(\hat\beta - \beta)\right) - P_x'\left(\sqrt n\,(\hat\beta - \beta)\right) \overset{d}{\to} \beta(\beta'\Sigma^{-1}\beta)^{-1/2} R. $$
Using $P_x' = \Gamma\Gamma'\Sigma^{-1}$, the second term converges to zero in probability as $n\to\infty$ for $b_0 > 1/2$, since $\Gamma'\Sigma^{-1}\left(n^{b_0}(\hat\beta - \beta)\right) = O_p(1)$.

From theorem 3.5 it follows directly that the maximum likelihood estimator for $\theta$ is consistent and asymptotically normal. As in the $I(1)$ model of Chang et al. (2009), the estimator for $\theta$ converges at rate $\sqrt n$ with one particular exception: $\Gamma'\Sigma^{-1}\hat\beta$ converges at rate $n^{b_0}$ and is mixed normally distributed. Recall that the rotation $\Gamma'\Sigma^{-1}$ is the cointegrating matrix, as it projects out the common fractional trend, $\Gamma'\Sigma^{-1}\beta x_t = 0$. Therefore, the faster convergence rate for the cointegrating matrix in error-correction models carries over to the fractionally integrated unobserved components model. Additionally, theorem 3.5 shows that the standard inference results, which were shown to be valid for nonstationary $I(1)$ trends in state space models by Chang et al. (2009), remain valid when the persistence of the common component is generalized to the nonstationary fractional domain. Due to (30), (31), and (32) the information matrix equality holds asymptotically. Thus, an estimate for the parameter covariance matrix can be obtained from the negative inverse of the Hessian matrix computed in the numerical optimization.

In a nutshell, the ML estimator is consistent for $b_0 \in D$. It converges to the normal distribution as $n\to\infty$ whenever $b_0 < 1/2$, as shown in theorem 3.3. For $b_0 \in (1/2, 3/2)$ it is asymptotically mixed normally distributed, so that $t$-ratios for parameter significance and asymptotic tests such as the likelihood ratio test, the Wald test, and the LM test remain valid in the fractionally integrated UC model within the two distinct intervals in $D$. Therefore, our results for the nonstationary region generalize the statement of Chang et al. (2009) for the $I(1)$ case. Based on simulation results, Hartl and Weigand (2019) report good finite sample performance of the ML estimator for fractionally integrated UC models.

We apply our fractional UC model to extract a common long-run component from three inflation measures for the US: the consumer price index (CPI), the personal consumption expenditures index (PCI), and the producer price index (PPI). The literature has so far only considered an $I(1)$ common component in US inflation (cf. e.g. Doménech and Gómez; 2006; Stock and Watson; 2016) that was interpreted as long-run or core inflation. We contribute to the literature by investigating whether the $I(1)$ assumption for the long-run component holds. Furthermore, we show how estimates for the long-run component $x_t$ together with its fundamental shocks $\eta_t$ are affected if fractional integration is allowed for. If the $I(1)$ assumption for the long-run component is violated in the $I(1)$ UC model, then the asymptotic results of Chang et al. (2009) are not applicable. In that case the fractional UC model provides valid inferential results, as it covers integration orders $b \in D$.

The data was downloaded from the Federal Reserve Bank of St. Louis (mnemonics: CPIAUCSL, PCEPI, WPSFD49207), is in monthly frequency, and spans from 1961:1 to 2018:12. The three series were generated via log differences, $\pi_{i,t} = 100 \times \Delta\log\mathrm{price}_{i,t}$, where $i \in \{CPI, PCI, PPI\}$ indexes the inflation measures.
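The inflation transformation above is a one-liner in practice. The sketch below uses a synthetic random-walk-in-logs stand-in for the three price index levels (the actual FRED series CPIAUCSL, PCEPI, WPSFD49207 are not reproduced here); 1961:1 to 2018:12 gives 696 monthly observations, of which one is lost to differencing.

```python
import numpy as np

# Synthetic stand-in for three monthly log price index levels, 1961:1-2018:12
# (696 months); in practice these would be the logs of the FRED series.
rng = np.random.default_rng(1)
log_price = np.cumsum(0.002 + 0.003 * rng.standard_normal((696, 3)), axis=0)

# Month-on-month inflation in percent: pi_t = 100 * delta log(price_t).
pi = 100.0 * np.diff(log_price, axis=0)
print(pi.shape)  # (695, 3)
```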
Since all three series intend to measure price growth for the US, we model them as a function of one common scalar long-run component $x_t$, which in our case is a fractionally integrated trend, and three uncorrelated idiosyncratic components $u_t$:
$$ \begin{pmatrix} \pi_{CPI,t}\\ \pi_{PCI,t}\\ \pi_{PPI,t} \end{pmatrix} = \begin{pmatrix} 1\\ \beta_{PCI}\\ \beta_{PPI} \end{pmatrix} x_t + \begin{pmatrix} u_t^{CPI}\\ u_t^{PCI}\\ u_t^{PPI} \end{pmatrix}. \quad (33) $$
This implies a cointegration rank of $r = p - 1 = 2$. We leave $\operatorname{Var}(\eta_t) = \sigma_\eta^2$ unrestricted and restrict $\beta_{CPI}$ to one for unique identification of $x_t$. Since the standard deviations of the three inflation measures differ considerably, we allow for $\beta_{PCI} \neq 1$ and $\beta_{PPI} \neq 1$.

We enrich our ARMA approximation of the fractionally integrated process $x_t$ by additional AR coefficients, which does not affect the asymptotic properties of the ML estimator but reduces the approximation error. Since choosing the same lag order for the AR and the MA polynomial is computationally efficient, as any AR polynomial of order less than or equal to $m$ does not affect the dimension of the state vector, we use ARMA($m$, $m$) approximations. As Hartl and Weigand (2019) demonstrate in a simulation study, a moderate $m$ already renders the approximation error negligible; we therefore use ARMA(4, 4) approximations in the following. Since the Wold representation of an ARMA process $a(L)\tilde x_t = b(L)\eta_t$ is given by $\tilde x_t = a(L)^{-1} b(L)\eta_t = \psi(L)\eta_t$, the approximation error becomes
$$ \tilde\epsilon_{t+1}(\theta) = \begin{cases} \sum_{i=1}^{t}(\varphi_i - \psi_i)\, e_{t+1-i}\, \Sigma_{\eta_t Y_t}\Sigma_{Y_t Y_t}^{-1} & \text{if } b < 1,\\[4pt] \sum_{s=1}^{t}\sum_{i=1}^{s}(\varphi_i - \psi_i)\, e_{s+1-i}\, \Sigma_{\eta_t Y_t}\Sigma_{Y_t Y_t}^{-1} & \text{if } b \ge 1, \end{cases} $$
and is again $\mathcal F_t$-measurable. Technically, for a fixed $b$, the ARMA coefficients in $a(L)$, $b(L)$, and thus $\psi(L)$, are obtained beforehand by minimizing the mean squared error between the Wold representations of $\tilde x_t$ and $x_t$. A continuous function that maps from the integration order $b$ to the ARMA coefficients is then constructed by first optimizing over a grid of $b$ and second smoothing the ARMA coefficients over $b$ using splines. Hence, optimization of the likelihood for the fractionally integrated UC model is conducted over the scalar fractional integration order $b$ and does not involve the estimation of any parameters in $a(L)$, $b(L)$. This procedure keeps the dimension of the parameter vector $\theta$ small during the optimization. Further details together with simulation results are contained in Hartl and Weigand (2019).

Starting values for the ML estimator of $\theta$ are obtained by drawing 1000 combinations of initial values for $b$, $\beta$, and $\Sigma$ from uniform distributions with appropriate support and maximizing the likelihood while ignoring the approximation error. As Hartl and Weigand (2019) show, this procedure already yields quite precise estimates for the unknown parameters and is computationally fast. The optimized parameters corresponding to the largest likelihood are then taken as starting values for the approximation-corrected ML estimator. For an unconstrained optimization, we use a matrix logarithm parametrization for the covariance matrices. Standard errors are denoted in parentheses.

For the loadings we estimate $\hat\beta_{PCI}$ and $\hat\beta_{PPI}$, which reflect the heterogeneous volatility of the three inflation measures. The integration order estimate $\hat b = 0.476$ is in line with earlier evidence on fractional integration in US inflation: estimates of $\hat b = 0.41$ for US CPI inflation from 1969:1 to 1992:12 have been reported, while Baillie (1996) estimates $\hat b = 0.47$ for US CPI inflation from 1948:1 to 1990:7. Hence, there is substantial evidence for long-run inflation being mean-reverting and integrated of order around 1/2. Our estimated integration order of 0.476 implies that a unit shock still has more than 14% of its initial impact on inflation after one year, and more than 4% of its initial impact after ten years. We also estimate the (log) standard deviations of the fundamental shocks $\eta_t$ and of the idiosyncratic components $u_t$. Thus, the $I(1)$ assumption for the long-run component is likely to be violated.

As a benchmark we also report results based on the fractionally cointegrated VAR (FCVAR) model of Johansen and Nielsen (2012). Note that the two models are not nested, since they specify the fundamental shocks differently. For the FCVAR model, we estimate an integration order $\hat b_{FCVAR} = 0.394$, which also contradicts the $I(1)$ assumption for inflation. The smaller estimated integration order for the FCVAR model may be explained by the findings of Sun and Phillips (2004), who show that an additive $I(0)$ term can downward-bias the estimated integration order when the $I(0)$ term is not properly included in the model. Furthermore, we can calculate an estimate for $\beta$ from the orthogonal complement of the cointegrating vector of the FCVAR model and obtain $\hat\beta_{FCVAR}$, whose second entry is $0.904$. Again, the results obtained from the FCVAR model differ slightly from the fractionally integrated unobserved components model but point in a similar direction.

Figure 1 sketches the dynamics of the estimated common fractionally integrated component $\hat x_t$ and the idiosyncratic disturbances $\hat u_t$ together with two standard deviations (dashed). As one can see, the common component captures the dynamics of the three inflation measures well. Due to the long memory property, mean reversion can take quite a long time, as the 1970s and the second half of the 1980s show. The disturbance terms seem to be $I(0)$, such that the long-run dynamics of the three inflation measures are well described by one common fractionally integrated trend component and, therefore, two fractional cointegration relations exist. As the figure shows, $u_t$ may be heteroskedastic and even autocorrelated. These features could be included in the model, and we leave this challenge open for future research.

Figure 1 (panels: common fractional component; CPI, PCI, and PPI disturbances): Common fractional component and $I(0)$ idiosyncratic disturbances of US consumer price index, personal consumption expenditures: chain index, and producer price index. Shaded areas correspond to NBER recession periods.

We compare our results with the $I(1)$ UC model that was studied in Chang et al. (2009) by estimating the latter as a benchmark. While we obtain similar estimates for the loadings in $\beta$, the log likelihood of the $I(1)$ UC model is 234.322 and hence clearly smaller than in the fractionally integrated setup. Figure 2 plots the long-run component estimate from the $I(1)$ UC model for US inflation together with the fractional trend estimate on the left-hand side. The other graph shows the periodogram for the two fundamental shock series that drive the long-run components and are assumed to follow Gaussian white noise processes in both models.

Figure 2 (panels: trend; smoothed periodogram): Common trend and smoothed periodogram of the fundamental shock series for the $I(1)$ common trend model (solid) and the fractional trend model (dashed).

As the graphs show, the two trend estimates are very similar, although the solid line was generated by an $I(1)$ filter, that is, an unweighted sum of past shocks, whereas the dashed line was generated by a fractional filter with $b = 0.476$ that assigns decreasing weights to $\hat\eta_{t-h}$ as $h$ increases. The similarity of the two processes results from a violation of the white noise assumption for the fundamental shocks of the $I(1)$ UC model: as the periodogram shows, these shocks exhibit a zero at the origin, which indicates anti-persistence, whereas the periodogram of the fundamental shocks for the fractional unobserved components model does not show such violations of the white noise assumption. In addition, the exact local Whittle estimator (with bandwidth $m$ chosen as in Shimotsu and Phillips (2005)) suggests an integration order of $-0.486$ for the fundamental shocks of the $I(1)$ trend (and $0.00$ for those of the $I(d)$ trend). Applying an $I(1)$ filter to an anti-persistent shock series with integration order $-0.486$ produces a series that is integrated of order $0.514$, not an $I(1)$ trend. Estimating a misspecified $I(1)$ common trend model for US inflation therefore pollutes the fundamental shock estimates and leads to wrong conclusions about their persistence. Since inflation shocks are misleadingly assumed to exhibit a permanent impact, the $I(1)$ model produces incorrect impulse responses, whereas the $I(d)$ model correctly captures the mean-reverting nature of inflation via the impulse response function.

Since the Gaussian white noise assumption for the fundamental shocks is crucial for consistency and asymptotic normality of the ML estimator of Chang et al. (2009), a violation may yield inconsistent parameter estimates and incorrect inference. Thus, for US inflation we find that a fractional common component should be considered instead of an $I(1)$ trend component. In general, the fundamental shocks of the permanent component should be checked for (anti-)persistence. We expect further consequences in the general multivariate $I(d)$ case that carry over from $I(1)$ UC models: if additional unobserved components are added to the model that correlate with the fundamental shocks, as e.g. in the correlated $I(1)$ UC model of Morley et al. (2003) or the simultaneous UC model of Weber (2011), a violation of the $I(1)$ assumption may produce spurious cycles and bias the estimates for the latent components.

We propose a multivariate fractionally integrated unobserved components model and derive a computationally efficient modification of the Kalman filter to estimate a single, fractionally integrated common component. Furthermore, we show consistency and assess the asymptotic distribution of the maximum likelihood estimator for integration orders $b \in D = \{d \in \mathbb R \mid 0 \le d < 3/2,\ d \neq 1/2\}$, thereby generalizing the asymptotic results of Chang et al. (2009) for a common $I(1)$ component. As we show, the maximum likelihood estimator is asymptotically normally distributed whenever $b_0 < 1/2$. For $b_0 \in (1/2, 3/2)$ it is asymptotically mixed normally distributed, and the estimator for the cointegrating matrix converges at the faster rate $n^{b_0}$. We apply our fractionally integrated unobserved components model to extract a long-run component from three US inflation series and obtain an estimated integration order of $0.476$ for the long-run component. Due to a violation of the $I(1)$ assumption, the widely applied $I(1)$ unobserved components model yields anti-persistent long-run shocks, while those from our fractionally integrated model appear to be in line with the model assumptions.

Future research could generalize our results to multiple common long-run components, potentially exhibiting different integration orders. Furthermore, a trend-cycle decomposition that allows for autocorrelated idiosyncratic shocks may yield new insights with regard to common trends and cycles for macroeconomic time series. Finally, settings with dependent shocks, such as the correlated unobserved components model of Morley et al. (2003) and the simultaneous unobserved components model of Weber (2011), could be considered.

Acknowledgments
The authors thank Uwe Hassler, Ulrich Müller, Morten Ø. Nielsen, Christoph Rust, the participants of the econometric seminar in Nuremberg, the department seminar at the Christian Albrechts University Kiel, the DAGStat conference 2019 in Munich, the workshop on high-dimensional time series in economics and finance 2019 in Vienna, the Annual Meeting of the German Statistical Society 2019 in Trier, the Annual Meeting of the German Economic Society 2019 in Leipzig, the Seminar on International Economic Policy at the University of Zurich, the International Conference on Computational and Financial Econometrics 2019 in London, the Symposium in Honor of Michael Hauser at WU Vienna, and the Standing Field Committee in Econometrics of the German Economic Society for many valuable comments. This work was supported by the German Research Foundation (DFG) via the projects TS283/1-1 and WE4847/4-1.
Mathematical appendix
A.1 Proofs for section 2
The following lemma is required for theorem 2.1.
Lemma A.1.
For the prediction error variance $F_t$ in (4) it holds that
$$ \beta' F_t^{-1}\beta = \frac{\beta'\Sigma^{-1}\beta}{1 + \beta'\Sigma^{-1}\beta\,\omega_t^{(1,1)}}. $$

Proof of Lemma A.1. From the inverse of the prediction error variance,
$$ F_t^{-1} = \Sigma^{-1} - \Sigma^{-1}\beta\beta'\Sigma^{-1}\,\omega_t^{(1,1)}\left(1 + \beta'\Sigma^{-1}\beta\,\omega_t^{(1,1)}\right)^{-1}, $$
it follows that
$$ F_t^{-1}\beta = \frac{\Sigma^{-1}\beta}{1 + \beta'\Sigma^{-1}\beta\,\omega_t^{(1,1)}}, $$
and premultiplying by $\beta'$ yields the result.

Proof of Theorem 2.1.
Using (5) of the exact state space representation, $\alpha_{t+1|t} = T\alpha_{t|t-1} + T P_{t|t-1} Z' F_t^{-1} v_t(\theta)$, and using $\beta' F_t^{-1} = \beta'\Sigma^{-1}\left(1 + \beta'\Sigma^{-1}\beta\,\omega_t^{(1,1)}\right)^{-1}$ analogously to the result of lemma A.1, the conditional expectation of $x_{t+1}$ is given by
$$ x_{t+1|t} = \mathbb 1(b\ge 1)\, x_{t|t-1} + \alpha^{(2)}_{t|t-1} + \left(\mathbb 1(b\ge 1)\,\omega_t^{(1,1)} + \omega_t^{(2,1)}\right)\frac{\beta'\Sigma^{-1}}{1 + \beta'\Sigma^{-1}\beta\,\omega_t^{(1,1)}}\, v_t(\theta). \quad (34) $$
Next, we iterate $\alpha^{(2)}_{t|t-1}$ using (5) and define $N_t = 1 + \beta'\Sigma^{-1}\beta\,\omega_t^{(1,1)}$ to obtain
$$ \alpha^{(2)}_{t|t-1} = \alpha^{(t+1)}_{1|0} + \sum_{j=0}^{t-2}\frac{\beta'\Sigma^{-1}\,\omega_{t-1-j}^{(1,j)}}{N_{t-1-j}}\left(y_{t-1-j} - \beta x_{t-1-j|t-2-j}\right). \quad (35) $$
After inserting (35) into (34) one has
$$ x_{t+1|t} = \mathbb 1(b\ge 1)\left(x_{t|t-1} + \frac{\beta'\Sigma^{-1}\,\omega_t^{(1,1)}}{N_t}\left(y_t - \beta x_{t|t-1}\right)\right) + \alpha^{(t+1)}_{1|0} + \sum_{j=0}^{t-1}\frac{\beta'\Sigma^{-1}\,\omega_{t-j}^{(1,j)}}{N_{t-j}}\left(y_{t-j} - \beta x_{t-j|t-j-1}\right), \quad (36) $$
where $\alpha^{(t+1)}_{1|0} = 0$. To unify the denominators we add and subtract $\mathbb 1(b\ge 1)\left[x_{t|t-1} + \frac{\beta'\Sigma^{-1}}{N_n}(y_t - \beta x_{t|t-1})\right]$ together with $\frac{\varphi_{j+1}(d)}{N_n}$ inside the sum of (36):
$$ x_{t+1|t} = \mathbb 1(b\ge 1)\left[x_{t|t-1} + \frac{\beta'\Sigma^{-1}}{N_n}\left(y_t - \beta x_{t|t-1}\right)\right] + \sum_{j=0}^{t-1}\varphi_{j+1}(d)\,\frac{\beta'\Sigma^{-1}}{N_n}\left(y_{t-j} - \beta x_{t-j|t-j-1}\right) + z_{1,t}(\theta), $$
$$ z_{1,t}(\theta) = \mathbb 1(b\ge 1)\left[x_{t|t-1} + \frac{\omega_t^{(1,1)}\,\beta'\Sigma^{-1}}{N_t}\left(y_t - \beta x_{t|t-1}\right)\right] - \mathbb 1(b\ge 1)\left[x_{t|t-1} + \frac{\beta'\Sigma^{-1}}{N_n}\left(y_t - \beta x_{t|t-1}\right)\right] + \sum_{j=0}^{t-1}\left[\frac{\omega_{t-j}^{(1,j)}}{N_{t-j}} - \frac{\varphi_{j+1}(d)}{N_n}\right]\beta'\Sigma^{-1}\left(y_{t-j} - \beta x_{t-j|t-j-1}\right). $$
Subtracting $\mathbb 1(b\ge 1)\, x_{t|t-1}$ and using the fractional difference operator, $\sum_{j=0}^{t-1}\varphi_{j+1}(d)\, y_{t-j} = (\Delta_+^{-d} - 1)\, y_{t+1}$, gives
$$ \Delta^{\mathbb 1(b\ge 1)} x_{t+1|t} = \mathbb 1(b\ge 1)\,\frac{\beta'\Sigma^{-1}}{N_n}\left(y_t - \beta x_{t|t-1}\right) + \frac{\beta'\Sigma^{-1}}{N_n}\left((\Delta_+^{-d} - 1)\, y_{t+1} - \beta x_{t+1|t}\right) + z_{1,t}(\theta). $$
By taking fractional differences $\Delta_+^d$ one has
$$ \Delta_+^{b}\, x_{t+1|t} = \mathbb 1(b\ge 1)\,\frac{\beta'\Sigma^{-1}}{N_n}\,\Delta_+^{d}\left(y_t - \beta x_{t|t-1}\right) + \frac{\beta'\Sigma^{-1}}{N_n}\left(1 - \Delta_+^{d}\right)\left(y_{t+1} - \beta x_{t+1|t}\right) + \Delta_+^{d} z_{1,t}(\theta) = \frac{\beta'\Sigma^{-1}}{N_n}\left(1 - \Delta_+^{b}\right)\left(y_{t+1} - \beta x_{t+1|t}\right) + \Delta_+^{d} z_{1,t}(\theta), $$
where the last step follows from $\Delta_+^{d} L + (1 - \Delta_+^{d}) = (1-L)_+^{d} L + 1 - (1-L)_+^{d} = 1 - (1-L)_+^{d}(1-L) = 1 - (1-L)_+^{d+1} = L_{d+1}$. Bringing all $x_{t+1|t}$ to the left-hand side and solving for $x_{t+1|t}$ yields
$$ x_{t+1|t} = \left(1 - L_b\,\frac{1 + \beta'\Sigma^{-1}\beta\left(\omega_n^{(1,1)} - 1\right)}{N_n}\right)^{-1}\left\{\frac{\beta'\Sigma^{-1}}{N_n}\, L_b\, y_{t+1} + \Delta_+^{d} z_{1,t}(\theta)\right\} = \sum_{j=0}^{\infty}\left(\frac{1 + \beta'\Sigma^{-1}\beta\left(\omega_n^{(1,1)} - 1\right)}{N_n}\, L_b\right)^{j}\left\{\frac{\beta'\Sigma^{-1}}{N_n}\, L_b\, y_{t+1} + \Delta_+^{d} z_{1,t}(\theta)\right\} $$
$$ = \sum_{j=0}^{\infty}\left(\frac{1 + \beta'\Sigma^{-1}\beta\left(\omega_n^{(1,1)} - 1\right)}{N_n}\right)^{j}\left\{\frac{\beta'\Sigma^{-1}}{N_n}\, y_{t+1} + \Delta_+^{d} z_{1,t}(\theta)\right\} + w_{t+1}, $$
where $w_{t+1}$ is an $I(0)$ process that accounts for the impact of the fractional differences in $L_b$. Finally, using a geometric series and plugging in $N_n$ gives
$$ x_{t+1|t} = \left(1 - \frac{1 + \beta'\Sigma^{-1}\beta\left(\omega_n^{(1,1)} - 1\right)}{N_n}\right)^{-1}\left\{\frac{\beta'\Sigma^{-1}}{N_n}\, y_{t+1} + \Delta_+^{d} z_{1,t}(\theta)\right\} + w_{t+1} = \frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_{t+1} + \frac{1 + \beta'\Sigma^{-1}\beta\,\omega_n^{(1,1)}}{\beta'\Sigma^{-1}\beta}\,\Delta_+^{d} z_{1,t}(\theta) + w_{t+1} = \frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_{t+1} - z_{t+1}(\theta), \quad (37) $$
where $z_{t+1}(\theta) = -\frac{1 + \beta'\Sigma^{-1}\beta\,\omega_n^{(1,1)}}{\beta'\Sigma^{-1}\beta}\,\Delta_+^{d} z_{1,t}(\theta) - w_{t+1}$, and the minus sign is included to facilitate its interpretation.
By multiplication of (37) with $\beta$ one obtains the conditional expectation
$$ \mathrm E_\theta(y_{t+1}|\mathcal F_t) = \beta x_{t+1|t} = \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_{t+1} - \beta z_{t+1}(\theta), \quad (38) $$
and the prediction error in (7). To derive an expression for $z_{t+1}(\theta)$, we add and subtract $\frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, v_{t+1}(\theta)$ to $v_{t+1}(\theta) = y_{t+1} - \mathrm E_\theta(y_{t+1}|\mathcal F_t)$:
$$ v_{t+1}(\theta) = \left(I - \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\right) y_{t+1} + \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\left(y_{t+1} - \mathrm E_\theta(y_{t+1}|\mathcal F_t)\right), \quad (39) $$
since $\left(I - \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\right)\mathrm E_\theta(y_{t+1}|\mathcal F_t) = \left(I - \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\right)\beta\,\mathrm E_\theta(x_{t+1}|\mathcal F_t) = 0$. By adding and subtracting $\sum_{i=1}^{t}\pi_i(b)\, y_{t+1-i}$ inside the last parentheses, equation (39) becomes
$$ v_{t+1}(\theta) = \left(I - \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\right) y_{t+1} + \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\left(\Delta_+^{b}\, y_{t+1} - \mathrm E_\theta(\Delta_+^{b}\, y_{t+1}|\mathcal F_t)\right). \quad (40) $$
We can plug (40) into (7) and solve for $\beta z_{t+1}(\theta)$, which yields
$$ \beta z_{t+1}(\theta) = \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\left(\Delta_+^{b}\, y_{t+1} - \mathrm E_\theta(\Delta_+^{b}\, y_{t+1}|\mathcal F_t)\right). $$
This completes the proof.
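The truncated fractional difference operator $\Delta_+^d$ that drives the proof can be made concrete in a few lines. The following sketch (helper names are ours) computes the expansion coefficients $\pi_j(d)$ of $(1-L)^d$ and confirms that the truncated filters $\Delta_+^d$ and $\Delta_+^{-d}$ invert each other exactly under the type II convention, i.e. with pre-sample values set to zero:

```python
import numpy as np

def pi_coeffs(d, n):
    """Expansion coefficients of (1 - L)^d:
    pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return pi

def frac_diff(x, d):
    """Truncated (type II) fractional difference Delta_+^d x,
    treating x_t = 0 for t <= 0."""
    n = len(x)
    c = pi_coeffs(d, n)
    return np.array([c[:t + 1][::-1] @ x[:t + 1] for t in range(n)])

x = np.sin(np.arange(30) / 3.0)          # any sample path
y = frac_diff(frac_diff(x, 0.7), -0.7)   # apply Delta_+^d, then Delta_+^{-d}
print(np.max(np.abs(y - x)))             # numerically zero
```

The recovery is exact (up to floating point error) because, as formal power series, $(1-L)^{-d}(1-L)^{d} = 1$, and both truncated filters are causal with the same zero pre-sample convention.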
Proof of Lemma 2.2.
To derive the integration order of $z_t(\theta)$ given in theorem 2.1, which is the prediction error of the univariate process $\Delta_+^{b}\frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_t$, we show that $z_t(\theta)$ is identical to the residuals in Nielsen (2015), for which he determined the integration order. First, consider $\zeta_t(\theta)$, for which one has from model (1)
$$ \Delta_+^{b}\frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_t = \zeta_t(\theta) = \eta_t + \Delta_+^{b}\frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, u_t. $$
Since $\eta_t \sim I(0)$ and $\Delta_+^{b} u_t \sim I(-b)$, their sum $\zeta_t(\theta)$ is $I(0)$ due to the aggregation properties of fractional processes. Furthermore, since $\eta_t$ and $\Delta_+^{b} u_t$ are independent, it follows from Granger and Newbold (1986, p. 29) that $\zeta_t(\theta) = A_+(L,\theta)\, g_t = \sum_{i=0}^{t-1} A_i(\theta)\, g_{t-i}$ follows a moving average process of order $n-1$, where $g_t$ is Gaussian white noise and zero for all $t \le 0$. The coefficients $A_i$ are obtained by matching the partial autocovariance functions of $A_+(L,\theta)\, g_t$ and $\eta_t + \Delta_+^{b}\frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, u_t$. They are $A_0 = 1$, $A_i = \pi_i(b)\left(1 + \beta'\Sigma^{-1}\beta\right)^{-1/2}$, and $\operatorname{Var}(g_t) = 1 + (\beta'\Sigma^{-1}\beta)^{-1}$. Due to the $I(0)$ property, $A_+(L,\theta)$ remains invertible as $n\to\infty$. Additionally, $\frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_t$ has an ARMA($n-1$, $n-1$) state space representation (cf. Durbin and Koopman; 2012, ch. 3.4). Rearranging with $B_+(L,\theta)$ as the truncated inverse of $A_+(L,\theta)$ gives
$$ \frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_t = -\left(B_+(L,\theta)\Delta_+^{b} - 1\right)\frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_t + g_t, $$
from which it becomes clear that, for a given $\theta$, the prediction error $z_t(\theta)$ and the residuals $g_t(\theta)$ as considered in Nielsen (2015) are identical, since using (39),
$$ z_t(\theta) = \frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\left[y_t - \mathrm E_\theta(y_t|\mathcal F_{t-1})\right] = \frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\left[y_t + \left(B_+(L,\theta)\Delta_+^{b} - 1\right) y_t\right] = B_+(L,\theta)\Delta_+^{b}\frac{\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\, y_t = g_t(\theta). \quad (41) $$
From $\Delta_+^{b} y_t \sim I(b_0 - b)$ it follows that $z_t(\theta) \sim I(b_0 - b)$.

The following lemmas are required for theorem 2.3.

Lemma A.2.
The covariance of $y_t$, $\eta_{t-j}$ is given by
$$\mathrm{Cov}_\theta(y_t, \eta_{t-j}) = \begin{cases} \beta \sum_{i=0}^{j} \varphi_i & \text{if } b \geq 1, \\ \beta \varphi_j & \text{if } b < 1, \end{cases} \qquad j = 0, \ldots, t-1. \qquad (42)$$

Proof of Lemma A.2. Let $b \geq 1$. From $x_t = (1-L)_+^{-1(b \geq 1)} \sum_{i=0}^{t-1} \varphi_i \eta_{t-i}$ it follows that
$$\mathrm{Cov}_\theta(y_t, \eta_{t-j}) = \mathrm{Cov}_\theta(\beta x_t + u_t, \eta_{t-j}) = \mathrm{Cov}_\theta\Big(\beta \sum_{s=1}^{t} \sum_{i=0}^{s-1} \varphi_i \eta_{s-i} + u_t,\ \eta_{t-j}\Big) = \beta \sum_{s=1}^{t} \sum_{i=0}^{s-1} \varphi_i\, \mathrm{Cov}_\theta(\eta_{s-i}, \eta_{t-j}) = \beta \sum_{s=t-j}^{t} \varphi_{s-(t-j)} = \beta \sum_{i=0}^{j} \varphi_i.$$
For $b < 1$, $\mathrm{Cov}_\theta(y_t, \eta_{t-j}) = \mathrm{Cov}_\theta(\beta \sum_{i=0}^{t-1} \varphi_i \eta_{t-i} + u_t, \eta_{t-j}) = \beta \varphi_j$, $j = 0, \ldots, t-1$.

Lemma A.3.
The autocovariance function of $y_t$ satisfies
$$\mathrm{Cov}_\theta(y_t, y_{t-k}) = \begin{cases} \beta\beta' \sum_{s=1}^{t-k}\sum_{u=1}^{s}\sum_{s'=u}^{t-k} \varphi_{s-u}\varphi_{s'-u} + \beta\beta' \sum_{s=t-k+1}^{t}\sum_{u=1}^{t-k}\sum_{s'=u}^{t-k} \varphi_{s-u}\varphi_{s'-u} & \text{if } b \geq 1, \\ \beta\beta' \sum_{l=0}^{t-1-k} \varphi_{k+l}\varphi_l & \text{if } b < 1, \end{cases} \qquad k = 1, \ldots, t-1. \qquad (43)$$

Proof of Lemma A.3. Let $b \geq 1$. From $x_t = (1-L)_+^{-1(b \geq 1)} \sum_{i=0}^{t-1} \varphi_i \eta_{t-i}$ one has for $k = 1, \ldots, t-1$
$$\mathrm{Cov}_\theta(y_t, y_{t-k}) = \mathrm{Cov}_\theta\Big(\beta \sum_{s=1}^{t}\sum_{i=0}^{s-1} \varphi_i \eta_{s-i} + u_t,\ \beta \sum_{s'=1}^{t-k}\sum_{i'=0}^{s'-1} \varphi_{i'} \eta_{s'-i'} + u_{t-k}\Big) = \beta \Big[\sum_{s=1}^{t}\sum_{i=0}^{s-1}\sum_{s'=1}^{t-k}\sum_{i'=0}^{s'-1} \varphi_i \varphi_{i'}\, \mathrm{Cov}(\eta_{s-i}, \eta_{s'-i'})\Big] \beta'.$$
Substituting $u = s-i$ and $u' = s'-i'$ one obtains
$$\mathrm{Cov}_\theta(y_t, y_{t-k}) = \beta\beta' \sum_{s=1}^{t-k}\sum_{u=1}^{s}\sum_{s'=1}^{t-k}\sum_{u'=1}^{s'} \varphi_{s-u}\varphi_{s'-u'}\, \mathrm{Cov}_\theta(\eta_u, \eta_{u'}) + \beta\beta' \sum_{s=t-k+1}^{t}\sum_{u=1}^{t-k}\sum_{s'=1}^{t-k}\sum_{u'=1}^{s'} \varphi_{s-u}\varphi_{s'-u'}\, \mathrm{Cov}_\theta(\eta_u, \eta_{u'}) = \beta\beta' \sum_{s=1}^{t-k}\sum_{u=1}^{s}\sum_{s'=u}^{t-k} \varphi_{s-u}\varphi_{s'-u} + \beta\beta' \sum_{s=t-k+1}^{t}\sum_{u=1}^{t-k}\sum_{s'=u}^{t-k} \varphi_{s-u}\varphi_{s'-u}, \qquad b \geq 1.$$
For $b < 1$, $\mathrm{Cov}_\theta(y_t, y_{t-k}) = \mathrm{Cov}_\theta\big(\beta \sum_{i=0}^{t-1} \varphi_i \eta_{t-i} + u_t,\ \beta \sum_{l=0}^{t-k-1} \varphi_l \eta_{t-k-l} + u_{t-k}\big) = \beta\beta' \sum_{l=0}^{t-k-1} \varphi_{k+l}\varphi_l$. Here $t-i = t-k-l$ was used to obtain $i = k+l$ and $l \leq t-k-1$.

Corollary A.4.
Given θ , the joint normal distribution of η t = ( η , ..., η t ) (cid:48) , Y t = ( y (cid:48) , ..., y (cid:48) t ) (cid:48) is given by (cid:32) η t Y t (cid:33) ∼ N (cid:32) , (cid:34) I t Σ η t Y t Σ (cid:48) η t Y t Σ Y t (cid:35)(cid:33) , (44) where the ( tp × t ) covariance matrix Σ (cid:48) η t Y t has entries Cov θ ( y s , η s − i ) , i = 0 , . . . , s − for s = 1 , . . . , t given by (42) and zero matrices for all s with i > and s + i ≤ t − . The ( tp × tp ) covariance matrix Σ Y t has entries given by (43) .Proof of Theorem 2.3. Using lemmas A.2, A.3, and corollary A.4, the conditional expec-tation of the latent state from the truncated model (8) can be rearranged such that˜ x t +1 | t = x t +1 | t + E θ (˜ x t +1 − x t +1 |F t ) = x t +1 | t − ∆ − ( b ≥ t (cid:88) i = m +1 ϕ i E θ ( η t +1 − i |F t ) == x t +1 | t − ∆ − ( b ≥ t (cid:88) i = m +1 ϕ i e t +1 − i Σ η t Y t Σ − Y t Y t , where the last step follows from E θ ( η t |F t ) = Σ η t Y t Σ − Y t ( Y t − E θ ( Y t )) = Σ η t Y t Σ − Y t Y t . A.2 Proofs for section 3
Lemma A.5.
For a fixed state dimension n the prediction error covariance matrix of the exact model (1) has a steady state P_{t|t−1} = P^{[n]} + O(e^{−t}) and, therefore, ω_t^{(i,j)} → ω^{(i,j)[n]} as t → ∞, and lim_{t→∞} F_t = F^{[n]}, where the superscript [n] denotes the dependence of lim_{t→∞} F_t on the system dimension n due to the type II definition of fractional integration.

Proof of Lemma A.5. As shown by Anderson and Moore (1979, section 4.4), any stable, time-invariant state space model with positive semi-definite initial prediction error covariance matrix P_{1|0} has a steady state solution for P_{t+1|t}. Furthermore, a non-stable system has a steady state solution for P_{t+1|t} if it is stabilisable and detectable and if P_{1|0} is positive semi-definite. Note that P_{1|0} is the outer product matrix with entries φ_i φ_j, i, j = 0, …, n, which follows from α_1 = (x_1, φ_1 η_1, …, φ_n η_1)′. Therefore, the matrix P_{1|0} is positive semidefinite. Hence, it is sufficient to show that our model is stable for b < 1 and stabilisable and detectable for b ≥
1. For this, consider the representation y t = Z ∗ α ∗ t = (cid:104) Z I (cid:105) (cid:32) α t u t (cid:33) , α ∗ t = T ∗ α ∗ t − + G (cid:32) η t u t (cid:33) = (cid:34) T
00 0 (cid:35) α ∗ t − + (cid:34) R I (cid:35) (cid:32) η t u t (cid:33) . The following definitions are taken from Harvey (1990, section 3.3). A system is stableif the characteristic roots of the transition matrix T ∗ have modulus less than one, i.e. | λ i ( T ∗ ) | < ∀ i . Furthermore, a system is called stabilisable if there exists a matrix S suchthat | λ i ( T ∗ + GS (cid:48) ) | < ∀ i . Finally, a system is detectable if there exists a matrix D suchthat | λ i ( T ∗ − DZ ∗ ) | < ∀ i .Beginning with the stable case, b <
1, we note that T ∗ is a strictly upper triangularmatrix, such that its eigenvalues λ i ( T ∗ ) = 0 ∀ i . Another way to see this is to rewrite x t as x t = − (cid:80) t − i =1 π i ( b ) x t − i + η t , where all roots of − (cid:80) t − i =1 π i ( b ) L i lie outside the unit circlefor b <
1. Hence, for b < b ∈ [1 , .
5) the system is not stable since its largest eigenvalue equals 1 due to the unit root imposed on x_t via T. Nonetheless, the nonstationary unobserved components model is detectable since a (n+1+p) × p matrix D with D_(1,1) = 1/β_(1) in its upper left entry and all other elements 0 yields a strictly upper triangular matrix T* − DZ* such that all eigenvalues are zero. Furthermore, the model is stabilisable since an (n+1+p) × (1+p) matrix S with S (1, = −φ_i. Therefore, the nonstationary model is also stabilisable. Consequently, lemma A.5 follows.

Lemma A.6. As n → ∞ the steady state prediction error variance F^{[n]} defined in lemma A.5 converges, lim_{n→∞} F^{[n]} = lim_{n→∞} lim_{t→∞} F_t^{[n]} = F, where F_t^{[n]} = Var_θ(v_t(θ)|F_{t−1}) indicates the dependence of F_t on the state dimension n, and 0 < F < ∞.

Proof of Lemma A.6. To prove lemma A.6 we first consider F_t^{[n]} and derive the limits for F_t^{[n+1]} − F_t^{[n]}. Note that F_t^{[n+1]}, F_t^{[n]} are identical for t ≤ n, such that lim_{n→∞} F_t^{[n+1]} − F_t^{[n]} = 0 holds for a fixed t. Thus, we only consider t > n. Next, we show that the limit of F_t^{[n]} is bounded. To simplify the notation, we define P = I − Σ^{−1}ββ′/(β′Σ^{−1}β) analogous to (15). Then from theorem 2.1, v_t(θ) = P′y_t + (I − P)′(∆_+^b y_t − E_θ(∆_+^b y_t|F_{t−1})), such that F_t^{[n]} = Var_θ(v_t(θ)|F_{t−1}) = Var_θ(P′u_t(θ) + (I − P)′∆_+^b y_t|F_{t−1}) = −P′ΣP + P′Σ + ΣP + Var_θ((I − P)′∆_+^b y_t|F_{t−1}) since P′βx_t = 0 and Cov_θ(P′u_t(θ), (I − P)′∆_+^b y_t|F_{t−1}) = P′Σ(I − P). Furthermore, −P′ΣP + P′Σ + ΣP = Σ − ββ′/(β′Σ^{−1}β), which can be seen by plugging in P.
Again using P′βx_t = 0, the latter term is Var_θ((I − P)′∆_+^b y_t|F_{t−1}) = Var_θ(∆_+^b y_t − P′∆_+^b u_t(θ)|F_{t−1}) = Var_θ(βη_t(θ) + (I − P)′∆_+^b u_t(θ)|F_{t−1}) = ββ′ + Var_θ((I − P)′∆_+^b u_t(θ)|F_{t−1}). Thus

F_t^{[n]} = Σ − ββ′/(β′Σ^{−1}β) + ββ′ + Var_θ((I − P)′∆_+^b u_t(θ)|F_{t−1}). (45)

For the latter term we define A_n = Var_θ((I − P)′ Σ_{i=0}^{n−1} π_i(b) u_{t−i}(θ)), which is independent of t due to t > n, and B_{n,t} = Var_θ(E_θ((I − P)′ Σ_{i=0}^{n−1} π_i(b) u_{t−i}(θ)|F_{t−1})). It follows from the law of total variance that

Var_θ((I − P)′ Σ_{i=0}^{n−1} π_i(b) u_{t−i}(θ)|F_{t−1}) = A_n − B_{n,t}. (46)

Since all other terms are constant, the difference between F_t^{[n+1]} and F_t^{[n]} solely depends on (46) and is given by

F_t^{[n+1]} − F_t^{[n]} = A_{n+1} − A_n − (B_{n+1,t} − B_{n,t}). (47)

In the following, we consider A_{n+1} − A_n and B_{n+1,t} − B_{n,t} separately, where we show that their limits converge to zero. Since A_{n+1} = Var_θ((I − P)′ Σ_{i=0}^{n} π_i(b) u_{t−i}(θ)) = (I − P)′Σ(I − P) Σ_{i=0}^{n} π_i(b)^2, and analogously for A_n, one directly has

A_{n+1} − A_n = (I − P)′Σ(I − P) π_n(b)^2. (48)

Note that A_n is invariant w.r.t. t, and lim_{n→∞} lim_{t→∞}(A_{n+1} − A_n) = lim_{n→∞}(A_{n+1} − A_n) = 0 since π_n(b) = O(n^{−1−b}) (cf. e.g. Hassler; 2018, lemma 5.1). The calculation of B_{n+1,t} − B_{n,t} is more involved.
By writing B n +1 ,t = B n,t + C n +1 ,t + D n +1 ,t + D (cid:48) n +1 ,t with C n +1 ,t = Var θ (E θ (( I − P ) (cid:48) π n ( b ) u t − n ( θ ) |F t − )), D n +1 ,t = Cov θ (E θ (( I − P ) (cid:48) (cid:80) n − i =0 π i ( b ) u t − i ( θ ) |F t − ) , E θ (( I − P ) (cid:48) π n ( b ) u t − n ( θ ) |F t − )) the difference becomes B n +1 ,t − B n,t = C n +1 ,t + D n +1 ,t + D (cid:48) n +1 ,t .For D n +1 ,t , define Y t − = ( y (cid:48) , ..., y (cid:48) t − ) (cid:48) and Σ Y t − = Var θ ( Y t − ). Then it follows fromDurbin and Koopman (2012, lemma 1)Cov θ (cid:32) E θ (cid:32) n − (cid:88) i =0 π i ( b ) u t − i ( θ ) (cid:12)(cid:12)(cid:12)(cid:12) F t − (cid:33) , E θ ( π n ( b ) u t − n ( θ ) |F t − ) (cid:33) = n − (cid:88) i =0 π i ( b ) Cov θ (cid:16) Cov θ ( u t − i ( θ ) , Y t − ) Σ − Y t − Y t − , Cov θ ( u t − n ( θ ) , Y t − ) Σ − Y t − Y t − (cid:17) π n ( b ) == π n ( b ) n − (cid:88) i =0 π i ( b ) Cov θ ( u t − i ( θ ) , Y t − ) Σ − Y t − Cov θ ( u t − n ( θ ) , Y t − ) (cid:48) = π n ( b ) n − (cid:88) i =1 π i ( b ) ΣE t − i Σ − Y t − E (cid:48) t − n Σ, (49)where E j = (cid:104) p × p · · · p × p I p × p p × p · · · p × p (cid:105) is a p × ( t − p selection matrix,with an identity matrix in its j -th block and all other blocks zero. Hence, Σ − Y t − E (cid:48) t − n picks the columns corresponding to Cov θ ( Y t − , y t − n ) from the inverse Σ − Y t − , and hence ΣE t − i Σ − Y t − E (cid:48) t − n Σ is finite for all t > n . Since the sum (cid:80) n − i =1 π i ( b ) < ∞ for all n (Hassler;2018, lemma 5.2), it follows for (49) that (cid:80) n − i =1 π i ( b ) ΣE t − i Σ − Y t − E (cid:48) t − n Σ is finite. As notedbefore π n ( b ) = O ( n − − b ), such that the limit lim n →∞ lim t →∞ D n +1 ,t = 0.For C n +1 ,t = π n ( b ) Var θ (E θ (( I − P ) (cid:48) u t − n ( θ ) |F t − )) one obtains from the law of totalvariance that Var θ (E θ ( u t − n ( θ ) |F t − )) ≤ Var θ ( u t − n ( θ )) = Σ. 
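The decay rate $\pi_n(b) = O(n^{-1-b})$, which drives all the limits in this proof (cf. Hassler; 2018, lemma 5.1), is easy to check numerically. A small sketch (ours, for illustration only, with the standard coefficient recursion for $(1-L)^b$):

```python
def frac_diff_coeffs(b, n):
    """Coefficients pi_j(b) of the expansion (1 - L)^b = sum_j pi_j(b) L^j."""
    pi = [1.0]
    for j in range(1, n):
        pi.append(pi[-1] * (j - 1 - b) / j)
    return pi

b = 0.4
pi = frac_diff_coeffs(b, 4001)
# n^{1+b} |pi_n(b)| approaches the constant 1/|Gamma(-b)|, so the scaled
# coefficients at n = 2000 and n = 4000 should be nearly equal.
r = (abs(pi[4000]) * 4000 ** (1 + b)) / (abs(pi[2000]) * 2000 ** (1 + b))
```

In particular $\sum_j |\pi_j(b)| < \infty$ for $b \in (0,1)$, since $|\pi_j(b)| \sim j^{-1-b}$ is summable, which is the summability behind the vanishing of $A_{n+1} - A_n$ above.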
Since $\pi_n(b) = O(n^{-1-b})$, $\lim_{n\to\infty}\lim_{t\to\infty} C_{n+1,t} = 0$. The results for $C_{n+1,t}$, $D_{n+1,t}$ imply $\lim_{n\to\infty}\lim_{t\to\infty}(B_{n+1,t} - B_{n,t}) = 0$. It then follows for (47) that
$$\lim_{n\to\infty}\lim_{t\to\infty}(F_t^{[n+1]} - F_t^{[n]}) = \lim_{n\to\infty}\lim_{t\to\infty}(A_{n+1} - A_n) - \lim_{n\to\infty}\lim_{t\to\infty}(B_{n+1,t} - B_{n,t}) = 0. \qquad (50)$$
Finally, to prove boundedness of $\lim_{n\to\infty} F^{[n]}$, it is sufficient to show that in (45) the limit $\lim_{n\to\infty}\lim_{t\to\infty} \mathrm{Var}_\theta((I-P)'\Delta_+^b u_t(\theta)|\mathcal{F}_{t-1}) < \infty$. From the law of total variance in (46) it follows that $\mathrm{Var}_\theta((I-P)'\sum_{i=0}^{n-1}\pi_i(b) u_{t-i}(\theta)|\mathcal{F}_{t-1}) \leq A_n$ since $B_{n,t} \geq$
0. For A n =( I − P ) (cid:48) Σ ( I − P ) (cid:80) n − i =0 π i ( b ), note that lim t →∞ A n = A n , and lim n →∞ (cid:80) n − i =0 π i ( b ) < ∞ , (cf.e.g. Hassler; 2018, lemma 5.2). Hence, lim n →∞ F [ n ] < ∞ and lim n →∞ lim t →∞ F [ n ] t = F . Proof of Theorem 3.1.
The prediction error z t ( θ ) of ∆ b + β (cid:48) Σ − β (cid:48) Σ − β y t is the only component in v t ( θ ) that depends on b . Therefore, it is the only part in v t ( θ ) that matters for estimating b . In the proof of lemma 2.2 we showed that the prediction error z t ( θ ) is identical tothe residuals in Nielsen (2015) (compare (41)). While Nielsen (2015) considers the CSSestimator, we consider the ML estimator based on (9). The latter also contains F [ n ] whichdepends on the sample size n . By lemma A.6 the steady state prediction error variance F [ n ] converges to F as n → ∞ . Therefore, the ML estimator and the CSS estimator areasymptotically equivalent and it suffices to consider the behavior of the sum of squaredresiduals (cid:80) nt =1 v t ( θ ) v t ( θ ) (cid:48) in (9). By the equivalence of the prediction errors stated abovethis objective function is nested in the ARFIMA objective function considered in Nielsen(2015). Thus, his consistency results carry over to the ML estimator of b if for z t ( θ )assumptions A – D in Nielsen (2015) hold.Since g t defined in the proof of lemma 2.2 is univariate Gaussian white noise with positivevariance and b ∈ D , assumptions A and B in Nielsen (2015) are satisfied. Following theproof of lemma 2.2, ζ t ( θ ) = A + ( L, θ ) g t is I (0) which guarantees a well-defined inverse ofthe MA polynomial A + ( L, θ ) even for n → ∞ . Therefore, assumptions C and D in Nielsen(2015) hold. Under these assumptions it follows that the CSS estimator for b is consistent.Since the CSS estimator has the same limit distribution as the ML estimator as arguedbefore, it follows that ˆ b p −→ b as n → ∞ . Proof of Lemma 3.2.
The partial derivatives of v t ( θ ) w.r.t. β , Σ have been derived for the I (1) case in Chang et al. (2009, lemma 3.2). We obtain similar expressions for the I ( b )case. Note that from theorem 2.1 and (41) v t ( θ ) (cid:48) = y (cid:48) t (cid:18) I − Σ − ββ (cid:48) β (cid:48) Σ − β (cid:19) + z t ( θ ) β (cid:48) = y (cid:48) t (cid:18) I − Σ − ββ (cid:48) β (cid:48) Σ − β (cid:19) + B + ( L, θ )∆ b + y (cid:48) t Σ − ββ (cid:48) β (cid:48) Σ − β , with B + ( L, θ ) = (cid:2) − (1 + β (cid:48) Σ − β ) − / + (1 + β (cid:48) Σ − β ) − / ∆ b + (cid:3) − following from the proofof lemma 2.2. The derivative w.r.t. vec Σ , evaluated at θ , is I (0) and given by ∂v t ( θ ) (cid:48) ∂ vec Σ = Σ − β ⊗ Σ − β (cid:48) Σ − β (cid:18) I − ββ (cid:48) Σ − β (cid:48) Σ − β (cid:19) (cid:0) y t β (cid:48) − B + ( L, θ )∆ b + y t β (cid:48) (cid:1) + ∂B + ( L, θ ) ∂ vec Σ ∆ b + y (cid:48) t Σ − ββ (cid:48) β (cid:48) Σ − β ,∂v t ( θ ) (cid:48) ∂ vec Σ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ = Σ − β ⊗ Σ − β (cid:48) Σ − β (cid:18) I − β β (cid:48) Σ − β (cid:48) Σ − β (cid:19) β x t β (cid:48) + w t = a ( u t , η t ) , ∂B + ( L, θ ) / ( ∂ vec Σ ) = − / Σ − β ⊗ Σ − β )(1 + β (cid:48) Σ − β ) − / B ( L, θ )(∆ b + −
1) is astationary filter, w t , a ( u t , η t ) are I (0) processes that depend on u , ..., u t , η , ..., η t . Next,consider the derivative w.r.t. β , evaluated at θ . For x t | t − one has ∂x t | t − ∂β = Σ − β (cid:48) Σ − β (cid:18) I − ββ (cid:48) Σ − β (cid:48) Σ − β (cid:19) (cid:0) y t − B + ( L, θ )∆ b + y t (cid:1) − ∂B + ( L, θ ) ∂β β (cid:48) Σ − β (cid:48) Σ − β ∆ b + y t , where ∂B + ( L, θ ) /∂β = Σ − βB ( L, θ )(∆ b + − β (cid:48) Σ − β ) − / is a stationary filter. Thus ∂v t ( θ ) (cid:48) ∂β = − β (cid:48) Σ − β (cid:48) Σ − β (cid:2) y t − B + ( L, θ )∆ b + y t (cid:3) I + ∂B + ( L, θ ) ∂β β (cid:48) Σ − β (cid:48) Σ − β ∆ b + y t β (cid:48) − Σ − β (cid:48) Σ − β (cid:18) I − ββ (cid:48) Σ − β (cid:48) Σ − β (cid:19) (cid:2) y t − B + ( L, θ )∆ b + y t (cid:3) β (cid:48) , (51) ∂v t ( θ ) (cid:48) ∂β (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ = − (cid:18) I − Σ − β β (cid:48) β (cid:48) Σ − β (cid:19) x t + a β ( u t , η t ) , where again a β ( u t , η t ) ∼ I (0) depends on u , ..., u t , η , ..., η t .For the derivative w.r.t. b , one obtains ∂v t ( θ ) (cid:48) /∂b = ( ∂z t ( θ ) /∂b ) β (cid:48) . From (41) one has z t ( θ ) = B + ( L, θ ) β (cid:48) Σ − β (cid:48) Σ − β ∆ b − b + ∆ b + y t ,∂z t ( θ ) ∂b = β (cid:48) Σ − β (cid:48) Σ − β (cid:20) ∂B + ( L, θ ) ∂b ∆ b − b + ∆ b + y t + B + ( L, θ ) ∂∂b ∆ b − b + ∆ b + y t (cid:21) . (52)To calculate the partial derivatives in (52) we rearrange ∆ b − b + ∆ b + y t = (1 − L )∆ b − b − ∆ b + y t =(1 − L ) (cid:80) t − j =0 π j ( b − b − b + y t − j , where π j ( b − b −
1) =
Γ(1+ b − b + j )Γ( j +1)Γ(1+ b − b ) , and Γ( u ) is thegamma function at u . Define Ψ( u ) as the digamma function at u , Ψ( u ) = ∂ Γ( u ) /∂u Γ( u ) . Itsatisfies Ψ( u + j ) − Ψ( u ) = (cid:80) j − k =0 ( u + k ) − for positive u . Due to theorem 3.1 | b − b | boilsdown to the stationary region, such that 1 + b − b is positive asymptotically. Then ∂π j ( b − b − ∂b = − [Ψ(1 + b − b + j ) − Ψ(1 + b − b )] Γ(1 + b − b + j )Γ( j + 1)Γ(1 + b − b ) == − j − (cid:88) k =0 (1 + b − b + k ) − π j ( b − b − , (53)and ∂∂b ∆ b − b + ∆ b + y t = − (1 − L ) t − (cid:88) j =1 j (cid:88) k =1 b − b + k π j ( b − b − b + y t − j . (54)35he first term in (52) is ∂B + ( L, θ ) ∂b ∆ b − b + ∆ b + y t = − B ( L, θ )(1 + β (cid:48) Σ − β ) − / ∆ b + ∂∂b ∆ b − b + ∆ b + y t . (55)By plugging (55) and (54) into (52) one obtains ∂z t ( θ ) /∂b∂z t ( θ ) ∂b = β (cid:48) Σ − β (cid:48) Σ − β B + ( L, θ ) (cid:32) B + ( L, θ )∆ b + (cid:112) β (cid:48) Σ − β − (cid:33) ∆ t − (cid:88) j =1 j (cid:88) k =1 π j ( b − b − b − b + k ∆ b + y t − j . (56)For θ = θ one has π j ( −
1) = 1. The sum in (56) becomes (1 − L ) (cid:80) t − j =1 (cid:80) jk =1 k − ∆ b + y t − j = (cid:80) t − j =1 (cid:80) jk =1 k − ∆ b + y t − j − (cid:80) t − j =1 (cid:80) jk =1 k − ∆ b + y t − − j = (cid:80) t − j =1 j − ∆ b + y t − j , which is stationaryand F t − -measurable. For (56) evaluated at θ one has ∂z t ( θ ) ∂b (cid:12)(cid:12)(cid:12) θ = θ = β (cid:48) Σ − β (cid:48) Σ − β B + ( L, θ ) (cid:32) B + ( L, θ )∆ b + (cid:112) β (cid:48) Σ − β − (cid:33) t − (cid:88) j =1 j − ∆ b + y t − j , which is stationary since B + ( L, θ ) is a stationary polynomial. Thus, ( ∂v t ( θ ) (cid:48) /∂b ) | θ = θ =( ∂z t ( θ ) /∂b ) | θ = θ β (cid:48) = a b ( u t , η t ).The following lemmas are required for theorem 3.3 Lemma A.7.
The process ∂v_t(θ)′/∂θ|_{θ=θ_0} F^{[n]−1} v_t(θ_0), together with F_t, is a martingale difference sequence.

Proof of Lemma A.7. Note that ∂v_t(θ)′/∂θ|_{θ=θ_0} = −∂x_{t|t−1}β′/∂θ|_{θ=θ_0} is F_{t−1}-measurable since x_{t|t−1} is F_{t−1}-measurable. Hence,

E[∂v_t(θ)′/∂θ|_{θ=θ_0} F^{[n]−1} v_t(θ_0) | F_{t−1}] = ∂v_t(θ)′/∂θ|_{θ=θ_0} F^{[n]−1} E[v_t(θ_0)|F_{t−1}] = 0,

and E[∂v_t(θ)′/∂θ|_{θ=θ_0} F^{[n]−1} v_t(θ_0)] = 0 by the law of iterated expectations. Since y_t and x_t are normally distributed, E[|y_t|] < ∞ for every finite t, so that E[|v_t(θ_0)|] < ∞ and E[|x_t v_t(θ_0)|] < ∞ hold as well. Therefore E[|∂v_t(θ)′/∂θ|_{θ=θ_0} F^{[n]−1} v_t(θ_0)|] < ∞. Under these two conditions the process is a martingale difference sequence (Davidson; 2000, thm. 6.2.1).

Lemma A.8. If b < 0.5
, a CLT for the gradient in (14) yields √ n n (cid:88) t =1 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) d −→ G, (57)12 ∂ (vec F [ n ] ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:16) F [ n ] − ⊗ F [ n ] − (cid:17) vec (cid:32) √ n n (cid:88) t =1 (cid:16) v t ( θ ) v t ( θ ) (cid:48) − F [ n ]0 (cid:17)(cid:33) d −→ J, (58) G ∼ N(0 , Var( G )) , J ∼ N(0 , Var( J )) , as n → ∞ where Var( G ) = plim n →∞ n n (cid:88) t =1 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F − ∂v t ( θ ) ∂θ (cid:48) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ , Var( J ) = 12 (cid:34) ∂ (vec F ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:0) F − ⊗ F − (cid:1) ∂ vec F∂θ (cid:48) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:35) . Proof of Lemma A.8.
Due to lemma A.7, the l.h.s. of (57) together with F t is a MDS. Sincewe show below that Var (cid:104) ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) (cid:105) < ∞ holds, a MDS CLT (cf. Davidson;2000, thm. 6.2.3) applies and yields equation (57). From lemma A.6 one has F [ n ]0 → F for n → ∞ and F t, = Var ( v t ( θ ) |F t − ) = F + o (1) so thatVar (cid:34) ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) (cid:35) = E (cid:34) Var (cid:32) ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) |F t − (cid:33)(cid:35) = E (cid:34) ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F − ∂v t ( θ ) ∂θ (cid:48) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:35) + o (1) . (59)For the decisive block in (59) we have, using lemma 3.2 and the projection matrix (15),E (cid:16)(cid:0) − P x x t + a β ( u t , η t ) (cid:1) F − (cid:0) − P x x t + a β ( u t , η t ) (cid:1) (cid:48) (cid:17) + o (1) . (60)The leading term in (60) is P x F − P (cid:48) x E( x t ). It is finite for b < / x t is asymptoticallystationary and so are all cross products from (60). Thus, the covariance matrix is finitefor b < /
2. Hence n (cid:80) nt =1 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) v t ( θ ) (cid:48) F [ n ] − ∂v t ( θ ) ∂θ (cid:48) (cid:12)(cid:12) θ = θ p −→ Var( G ) as n → ∞ , where 0 < Var( G ) < ∞ and Var( G ) results from (59). The proof of (58) isidentical to Chang et al. (2009, lemma 3.4) except for the additional use of lemma A.6.With these lemmas at hand, we are ready to prove theorem 3.3. Proof of Theorem 3.3.
As noted in section 3, both terms in (14), Σ_{t=1}^{n}(v_t(θ_0)v_t(θ_0)′ − F_0^{[n]}) and Σ_{t=1}^{n} ∂v_t(θ)′/∂θ|_{θ=θ_0} F^{[n]−1} v_t(θ_0), are asymptotically independent. Therefore, it follows from lemmas A.7 and A.8 that √n s_n(θ_0) →d G + J ∼ N(0, J) as n → ∞, with J = Var(G) + Var(J) and each variance given in lemma A.8. Using the results of Davidson (2000, Ch. 11.3.3), it follows for b < 0.5 that √n(θ̂_n − θ_0) →d N(0, J^{−1}) as n → ∞.

Proof of Lemma 3.4.
First note that from lemma A.7 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) is a martingaledifference sequence adapted to the sigma-algebra F t . To prove weak convergence of U n ( r ), W n ( r ), Y n ( r ) observe that multiplication with A (cid:48) S and A (cid:48) D eliminates the nonstationarypart, so that F [ n ] − v t ( θ ) , A (cid:48) S ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) , A (cid:48) D ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) , are (asymptotically) stationary martingale difference sequences. Therefore, a functionalcentral limit theorem for stationary martingale difference sequences (cf. eg. Davidson; 1994,thm. 27.14) implies ( U n ( r ) , W n ( r ) , Y n ( r )) ⇒ ( U ( r ) , W ( r ) , Y ( r )) as n → ∞ .For the nonstationary, fractionally integrated X n ( r ) it follows from (20) that, b > / A (cid:48) N ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12) θ = θ = − Γ (cid:48) ∆ − b + η t + Γ (cid:48) a β ( u t , η t ) , (61)where n − b +1 / ∆ − b + η t weakly converges to fractional Brownian motion of type II (cf. Jo-hansen and Nielsen; 2010, eq. 6), whereas for b > / I (0) component Γ (cid:48) a β ( u t , η t ) in X n ( r ) converges to zero due to scaling. Hence, X n ( r ) ⇒ X ( r ) as n → ∞ .For V n , it follows from (51) by plugging in y t and rearranging terms that the partial deriva-tive ∂v t ( θ ) (cid:48) /∂β | θ = θ = V x,t + V η,t + V u,t + V B,t , where V x,t = − P x x t , V η,t = P x B + ( L, θ ) η t , V B,t = ∂B + ( L, θ ) /∂β | θ = θ (cid:0) η t β (cid:48) + ( β (cid:48) Σ − )( β (cid:48) Σ − β ) − ∆ b + u t β (cid:48) (cid:1) , and V u,t = − β (cid:48) Σ − β (cid:48) Σ − β (1 − B + ( L, θ )∆ b + ) u t I − Σ − β (cid:48) Σ − β (cid:18) I − β β (cid:48) Σ − β (cid:48) Σ − β (cid:19) (1 − B + ( L, θ )∆ b + ) u t β (cid:48) . 
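The scaling $n^{-b+1/2}$ applied to $\Delta_+^{-b}\eta_t$ above reflects that the variance of the truncated fractional integral grows at rate $n^{2b-1}$ for $b > 1/2$, which is what produces a type II fractional Brownian motion in the limit. A deterministic numerical check of the variance growth (a sketch under our own naming, not the authors' code):

```python
def frac_int_coeffs(b, n):
    """Coefficients phi_j of (1 - L)^{-b} = sum_j phi_j L^j, j = 0, ..., n-1."""
    phi = [1.0]
    for j in range(1, n):
        phi.append(phi[-1] * (j - 1 + b) / j)
    return phi

def var_x(b, t):
    """Var(x_t) for x_t = sum_{i=0}^{t-1} phi_i eta_{t-i} with Var(eta_s) = 1."""
    return sum(c * c for c in frac_int_coeffs(b, t))

b = 0.75
# For b > 1/2 the variance diverges like t^{2b-1}; doubling t therefore scales
# Var(x_t) by roughly 2^{2b-1}.
ratio = var_x(b, 4000) / var_x(b, 2000)
```

The same computation with $b < 1/2$ yields a bounded variance, consistent with the stationary region treated elsewhere in the paper.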
Note that Γ (cid:48) V B,t = 0, which can be seen directly by plugging in the partial derivativeof B + ( L, θ ) as given in the proof of lemma 3.2 and using (16). V u,t only depends on u , ..., u t − , since B = π ( b ) = 1, which eliminates u t in (1 − B + ( L, θ )∆ b + ) u t . Furthermore v t ( θ ) = P x ( β x t + u t ) + β z t ( θ ) = P x u t + β z t ( θ ) only depends on contemporaneous u t , η t ,since z t ( θ ) = η t + β (cid:48) Σ − ( β (cid:48) Σ − β ) − u t is Gaussian white noise, as discussed in the proofof lemma 2.2. Finally, the relation Γ (cid:48) F [ n ] − = Γ (cid:48) Σ − will be helpful in proving convergenceof V n , and follows from plugging in F [ n ] − from lemma A.1 and using (16). For V n one thenhas V n = 1 n b n (cid:88) t =1 Γ (cid:48) ( V x + V η + V u ) F [ n ] − ( P x u t + β z t ( θ )) = − n b n (cid:88) t =1 Γ (cid:48) Σ − x t u t + o p (1) , (cid:80) nt =1 Γ (cid:48) V u F [ n ] − ( P x u t + β z t ( θ )) = O p ( n / ) as V u is I (0), depends on u , ..., u t − and u t , η t are iid, (cid:80) nt =1 Γ (cid:48) V η F [ n ] − ( P x u t + β z t ( θ )) = (cid:80) nt =1 Γ (cid:48) Σ − B + ( L, θ ) η t ( P x u t + β z t ( θ )) = (cid:80) nt =1 Γ (cid:48) Σ − B + ( L, θ ) η t u t = O p ( n / ), since Γ (cid:48) Σ − β = 0 and η t , u t are independent.Finally, n − b (cid:80) nt =1 Γ (cid:48) V x F [ n ] − ( P x u t + β z t ( θ )) = n − b (cid:80) nt =1 − Γ (cid:48) Σ − x t u t . Since η t , u t areindependent, one can apply a central limit theorem for fractionally integrated processes (cf.e.g. Johansen and Nielsen; 2010, eq. 7) and write V n d −→ V = (cid:82) X ( r ) d U ( r ) as n → ∞ . Lemma A.9.
For b ∈ (1 / , / and ν − n given in (19) , the score vector of the likelihoodfunction for the fractional unobserved components model satisfies ν − n A (cid:48) s n ( θ ) d −→ N = − (cid:82) X ( r ) d U ( r ) Z − WQ − Y , as n → ∞ , with Z n , Q n given in (24) , (25) , Z n d −→ Z ∼ N(0 , Var( Z )) , and Q n d −→ Q ∼ N(0 , Var( Q )) ,as n → ∞ . Var( Z ) , Var( Q ) are given in (26) , (27) .Proof of Lemma A.9. Note that for the first block of ν − n A (cid:48) s n ( θ ) one has for b > / n − b A (cid:48) N s n ( θ ) = 12 n b − / A (cid:48) N ∂ (vec F [ n ] ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:16) F [ n ] − ⊗ F [ n ] − (cid:17) × vec (cid:34) √ n n (cid:88) t =1 (cid:16) v t ( θ ) v t ( θ ) (cid:48) − F [ n ] − (cid:17)(cid:35) − n b A (cid:48) N n (cid:88) t =1 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F [ n ] − v t ( θ ) == n − b +1 / O p (1) − V n d −→ − V = − (cid:90) X ( r ) d U ( r ) , as n → ∞ due to lemma 3.4 and (17). Next, observe that for Z n in (24), additionallyapplying lemma A.6, one has Z n d −→ Z ∼ N(0 , Var( Z )), as n → ∞ , with Var( Z ) given in(26), as Chang et al. (2009, lemma 3.4) show. Since Q n in (25) only differs from Z n by itsrotation matrix, Q n d −→ Q ∼ N(0 , Var( Q )) follows analogously. Using also the partial sumsdefined for lemma 3.4 one obtains for the second block √ n A (cid:48) S s n ( θ ) = Z n − W n d −→ Z − W, as n → ∞ and analogously for the third block √ n A (cid:48) D s n ( θ ) = Q n − Y n d −→ Q − Y . Lemma A.10.
The Hessian matrix satisfies − ν − n A (cid:48) H n ( θ ) Aν − (cid:48) n d −→ M > 0 , a.s. as n → ∞ , with M given in (23).

Proof of Lemma A.10. By (11) we have ν − n A (cid:48) H n ( θ ) Aν − (cid:48) n = ν − n A (cid:48) (cid:0)(cid:80) h =1 H n,h ( θ ) (cid:1) Aν − (cid:48) n . Starting with the upper-left block the decisive term stems from H n, ( θ ) such that 1 n b A (cid:48) N H n ( θ ) A N = − n b A (cid:48) N (cid:32) n (cid:88) t =1 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F [ n ] − ∂v t ( θ ) ∂θ (cid:48) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:33) A N + o p (1) d −→ − (cid:90) X ( r ) F − X ( r ) (cid:48) d r, as n → ∞ , where the nonstationary term converges due to lemma 3.4 and the continuous mapping theorem and where o p (1) accounts for the components in the Hessian matrix that converge to zero in probability. The upper-middle block is n b . A (cid:48) N H n ( θ ) A S = O p ( n − / ) since for the components including fractionally integrated processes due to H n,h ( θ ), h = 5 , , , A (cid:48) N (cid:32) n (cid:88) t =1 ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ F [ n ] − ∂v t ( θ ) ∂θ (cid:48) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ (cid:33) A S = n (cid:88) t =1 − Γ (cid:48) x t F [ n ] − w t + w t = O p ( n b ) , n (cid:88) t =1 (cid:16) I ⊗ v t ( θ ) (cid:48) F [ n ] − (cid:17) (cid:18) ∂ ∂θ∂θ (cid:48) ⊗ v t ( θ ) (cid:19) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ = O p ( n b ) , ∂ (vec F [ n ] ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ ( F [ n ] − ⊗ F [ n ] − ) n (cid:88) t =1 (cid:32) ∂v t ( θ ) ∂θ (cid:48) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ ⊗ v t ( θ ) (cid:33) = O p ( n b ) , n (cid:88) t =1 (cid:32) ∂v t ( θ ) (cid:48) ∂θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ ⊗ v t ( θ ) (cid:48) (cid:33) ( F [ n ] − ⊗ F [ n ] − ) ∂ vec F [ n ] ∂θ (cid:48) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ = O p ( n b ) .
The center-middle block converges to $\frac{1}{n} A_S' H_n(\theta_0) A_S \xrightarrow{p} -\operatorname{Var}(W) - \operatorname{Var}(Z)$, as shown in Chang et al. (2009, eq. 56–64).

For the last component, $\frac{1}{n} A_D' H_n(\theta_0) A_D$, due to the relevant $H_{n,h}(\theta_0)$, $h = 3, \ldots, 7$, we define
$$
\frac{1}{n} A_D' H_n(\theta_0) A_D = A_n^* + B_n^* + C_n^* + D_n^* + D_n^{*\prime} + o_p(1),
$$
$$
A_n^* = -A_D' \Bigg[ \frac{\partial(\operatorname{vec} F)'}{\partial\theta}\bigg|_{\theta=\theta_0} \big( F^{-1} \otimes F^{-1} \big) \frac{\partial \operatorname{vec} F}{\partial\theta'}\bigg|_{\theta=\theta_0} \Bigg] A_D + o_p(1) = -\operatorname{Var}(Q) + o_p(1),
$$
$$
B_n^* = -\frac{1}{n} \sum_{t=1}^{n} A_D' \Bigg( \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0} F^{-1} \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0} \Bigg) A_D = -\operatorname{Var}(Y) + o_p(1),
$$
$$
C_n^* = -\frac{1}{n} \sum_{t=1}^{n} A_D' \big( I \otimes v_t(\theta_0)' F^{-1} \big) \bigg( \frac{\partial^2}{\partial\theta\,\partial\theta'} \otimes v_t(\theta) \bigg)\bigg|_{\theta=\theta_0} A_D = O_p(n^{-1/2}),
$$
$$
D_n^* = A_D' \Bigg[ \frac{\partial(\operatorname{vec} F)'}{\partial\theta}\bigg|_{\theta=\theta_0} \big( F^{-1} \otimes F^{-1} \big) \frac{1}{n} \sum_{t=1}^{n} \Bigg( \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0} \otimes v_t(\theta_0) \Bigg) \Bigg] A_D = O_p(n^{-1/2}).
$$
The results for $A_n^*$ and $B_n^*$ follow directly from lemma A.9 and (22). The result for $D_n^*$ holds since $\partial v_t(\theta)/\partial b\,|_{\theta=\theta_0}$ is stationary and $\mathcal{F}_{t-1}$-measurable, as shown in the proof of lemma 3.2, such that $\partial v_t(\theta)/\partial b\,|_{\theta=\theta_0} \otimes v_t(\theta_0)$ is a stationary MDS. For $C_n^*$ to hold, we require stationarity of $\partial^2 v_t(\theta)/(\partial b\,\partial\beta')$, $\partial^2 v_t(\theta)/(\partial b\,\partial(\operatorname{vec}\Sigma)')$, and $\partial^2 v_t(\theta)/\partial b^2$ at $\theta = \theta_0$. Since $\partial v_t(\theta)/\partial b = (\partial z_t(\theta)/\partial b)\,\beta'$, equation (56) shows directly that the former two conditions hold, as the partial derivatives w.r.t. $\beta'$ and $(\operatorname{vec}\Sigma)'$ do not change the persistence of the process.

For $\partial^2 v_t(\theta)/\partial b^2$ we decompose $(\partial^2 z_t(\theta)/\partial b^2)|_{\theta=\theta_0} = (\beta_0'\Sigma_0^{-1}\beta_0)^{-1}\beta_0'\Sigma_0^{-1}(Z_1 + Z_2 + Z_3)$, where
$$
Z_1 = B_+(L,\theta_0)\,(1+\beta_0'\Sigma_0^{-1}\beta_0)^{-1/2}\,(1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}\Big(\frac{\partial}{\partial b}(b-b_0+k)^{-1}\pi_j(b-b_0)\Big)\Delta^{b_0+}y_{t-j}\Big|_{\theta=\theta_0},
$$
$$
Z_2 = B_+(L,\theta_0)\,(1+\beta_0'\Sigma_0^{-1}\beta_0)^{-1/2}\,\frac{\partial B_+(L,\theta)}{\partial b}\bigg|_{\theta=\theta_0}\sum_{j=1}^{t-1}j^{-1}\,\Delta^{b_0+}y_{t-j},
$$
$$
Z_3 = B_+(L,\theta_0)\,(1+\beta_0'\Sigma_0^{-1}\beta_0)^{-1/2}\,\Big(\frac{\partial}{\partial b}B_+(L,\theta)\Delta^{b+}\Big)\bigg|_{\theta=\theta_0}\sum_{j=1}^{t-1}j^{-1}\,\Delta^{b_0+}y_{t-j}.
$$
The three different components are obtained by applying the product rule to the partial derivative of (56). $Z_2$ is stationary, since the stationary filter $\partial B_+(L,\theta)/\partial b\,|_{\theta=\theta_0}$ applied to a stationary series yields a stationary process, see (55). $Z_3$ is stationary, since we can write $\big(\frac{\partial}{\partial b}B_+(L,\theta)\Delta^{b+}\big)\big|_{\theta=\theta_0} = \big(\frac{\partial}{\partial b}B_+(L,\theta)\Delta^{b-b_0+}\big)\big|_{\theta=\theta_0}\Delta^{b_0+}$ and $\big(\frac{\partial}{\partial b}B_+(L,\theta)\Delta^{b-b_0+}\big)\big|_{\theta=\theta_0}$ is a stationary filter, as shown in the proof of lemma 3.2.

For $Z_1$ it remains to be shown that $(1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}\big(\frac{\partial}{\partial b}(b-b_0+k)^{-1}\pi_j(b-b_0)\big)\Delta^{b_0+}y_{t-j}\big|_{\theta=\theta_0}$ is stationary. From
$$
\Big(\frac{\partial}{\partial b}(b-b_0+k)^{-1}\pi_j(b-b_0)\Big)\Delta^{b_0+}y_{t-j}\Big|_{\theta=\theta_0} = k^{-1}\pi_j(b-b_0)\,\Delta^{b_0+}y_{t-j}\Big|_{\theta=\theta_0} - \frac{\partial\pi_j(b-b_0)}{\partial b}\bigg|_{\theta=\theta_0}k^{-1}\,\Delta^{b_0+}y_{t-j},
$$
together with (53) it follows that
$$
(1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}\Big(\frac{\partial}{\partial b}(b-b_0+k)^{-1}\pi_j(b-b_0)\Big)\Delta^{b_0+}y_{t-j}\Big|_{\theta=\theta_0} = (1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}k^{-1}\,\Delta^{b_0+}y_{t-j} - (1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}k^{-1}\sum_{l=1}^{j}l^{-1}\,\Delta^{b_0+}y_{t-j}.
$$
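As a plain numerical sanity check of the two differenced double sums just displayed (outside the formal argument), note that the inner sums $\sum_{k=1}^{j}k^{-1}$ and $\sum_{k=1}^{j}k^{-1}\sum_{l=1}^{j}l^{-1}$ are the harmonic-number weights $H_j$ and $H_j^2$, which first-differencing telescopes; the array `u` below merely stands in for the stationary input $\Delta^{b_0+}y_t$ and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 60
u = rng.standard_normal(T)                 # stand-in for the stationary input Delta^{b0+} y
H = np.cumsum(1.0 / np.arange(1, T + 1))   # harmonic numbers H_1, ..., H_T

def S(t, w):
    # sum_{j=1}^{t-1} w_j * u_{t-j}, with u_1, ..., u_T stored in u[0], ..., u[T-1]
    j = np.arange(1, t)
    return np.sum(w[j - 1] * u[t - j - 1])

t = 50
j = np.arange(1, t)

# (1 - L) sum_j H_j u_{t-j} telescopes to sum_j j^{-1} u_{t-j}
lhs1 = S(t, H) - S(t - 1, H)
rhs1 = np.sum((1.0 / j) * u[t - j - 1])

# (1 - L) sum_j H_j^2 u_{t-j} telescopes to sum_j (j^{-2} + 2 j^{-1} H_{j-1}) u_{t-j}
Hm1 = np.concatenate(([0.0], H[:-1]))      # H_0 = 0, H_1, ..., H_{T-1}
lhs2 = S(t, H**2) - S(t - 1, H**2)
rhs2 = np.sum((1.0 / j**2 + 2.0 * Hm1[j - 1] / j) * u[t - j - 1])

print(np.isclose(lhs1, rhs1), np.isclose(lhs2, rhs2))  # True True
```

Both identities agree with the closed forms obtained by telescoping the harmonic weights, since $H_j - H_{j-1} = j^{-1}$ and $H_j^2 - H_{j-1}^2 = j^{-2} + 2j^{-1}H_{j-1}$.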
The former term is
$$
(1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}k^{-1}\,\Delta^{b_0+}y_{t-j} = \sum_{j=1}^{t-1}j^{-1}\,\Delta^{b_0+}y_{t-j},
$$
whereas the latter term is
$$
(1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}k^{-1}\sum_{l=1}^{j}l^{-1}\,\Delta^{b_0+}y_{t-j} = \sum_{j=1}^{t-1}j^{-2}\,\Delta^{b_0+}y_{t-j} + 2\sum_{j=2}^{t-1}\Delta^{b_0+}y_{t-j}\,j^{-1}\sum_{k=1}^{j-1}k^{-1}.
$$
Hence
$$
(1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}k^{-1}\,\Delta^{b_0+}y_{t-j} - (1-L)\sum_{j=1}^{t-1}\sum_{k=1}^{j}k^{-1}\sum_{l=1}^{j}l^{-1}\,\Delta^{b_0+}y_{t-j} = \sum_{j=1}^{t-1}\big(j^{-1}-j^{-2}\big)\Delta^{b_0+}y_{t-j} - 2\sum_{j=2}^{t-1}\Delta^{b_0+}y_{t-j}\,j^{-1}\sum_{k=1}^{j-1}k^{-1},
$$
and therefore it is stationary. Thus, $Z_1$ is stationary, such that $\big(\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\big)\big|_{\theta=\theta_0}A_D$ has finite second moments and is $\mathcal{F}_{t-1}$-measurable. Therefore, it follows directly that $A_D'\big(I \otimes v_t(\theta_0)'F^{-1}\big)\big(\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\big)\big|_{\theta=\theta_0}A_D$ is a stationary MDS, such that the result for $C_n^*$ holds.

Lemma A.11.
There exists a sequence of invertible normalization matrices $\mu_n$ such that $\mu_n \nu_n^{-1} \to 0$ a.s. and $\sup_{\theta\in\Theta_n}\|\mu_n^{-1}A'(H_n(\theta) - H_n(\theta_0))A\mu_n^{-1\prime}\| \xrightarrow{p} 0$, where $\Theta_n = \{\theta : \|\mu_n' A^{-1}(\theta - \theta_0)\| \le 1\}$ is a sequence of shrinking neighborhoods of $\theta_0$.

Proof of Lemma A.11. First, determine all $\theta$ that fulfill $\Theta_n = \{\theta : \|\mu_n' A^{-1}(\theta-\theta_0)\| \le 1\}$. Analogously to Chang et al. (2009, p. 245), we let $\mu_n = \nu_n^{1-\gamma}$ for small $\gamma > 0$. Further, denote the vector of rows $i$ to $j$ of a vector $\delta$ by $\delta_{(i:j)}$. All $\theta \in \Theta_n$ are given by those $\delta = \mu_n' A^{-1}(\theta - \theta_0)$ for which $\|\delta\| \le 1$:
$$
\beta = \beta_0 + n^{-b_0(1-\gamma)}\,\Gamma\,\delta_{(1:p-1)} + n^{-(1-\gamma)/2}\,\beta\,(\beta'\Sigma^{-1}\beta)^{-1/2}\,\delta_{(p)}, \qquad (62)
$$
$$
\operatorname{vech}\Sigma = \operatorname{vech}\Sigma_0 + n^{-(1-\gamma)/2}\,\delta_{(p+1:k-1)}, \qquad (63)
$$
$$
b = b_0 + n^{-(1-\gamma)/2}\,\delta_{(k)}. \qquad (64)
$$
By the properties of the projection matrix $P_x$, multiplication of (62) by $\Gamma'\Sigma_0^{-1}$ and $\beta_0'\Sigma_0^{-1}(\beta_0'\Sigma_0^{-1}\beta_0)^{-1/2}$ delivers
$$
\Gamma'\Sigma_0^{-1}(\beta - \beta_0) = O\big(n^{-b_0(1-\gamma)}\big), \qquad \beta_0'\Sigma_0^{-1}(\beta_0'\Sigma_0^{-1}\beta_0)^{-1/2}(\beta - \beta_0) = O\big(n^{-(1-\gamma)/2}\big).
$$
In (62) to (64), $\beta$, $\Sigma$, and $b$ are marginally smaller or larger than their true values, depending on the sign of the elements of $\delta$. Note that the sign of $\delta_{(k)}$ matters in (64). Choosing $b \ge b_0$ gives $(\Delta^{b+} - \Delta^{b_0+})y_t \sim I(0)$, whereas $b < b_0$ yields $(\Delta^{b+} - \Delta^{b_0+})y_t \sim I(b_0 - b)$. The latter case is implied by $\delta_{(k)} < 0$. Thus, $b = b_0 - n^{-(1-\gamma)/2}|\delta_{(k)}|$ covers the more general case and is considered in the following. The results carry over to $b > b_0$ straightforwardly.

For lemma A.11 to be satisfied, for the nonstationary components in (11) involving $H_{n,h}(\theta)$, $h = 5, 6, 7$, we need to show that
$$
\frac{1}{n^{2b_0(1-\gamma)}}\,A_N'\sum_{t=1}^{n}\Bigg(\frac{\partial v_t(\theta)'}{\partial\theta} - \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0}\Bigg)F^{[n]-1}\,\frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}A_N \xrightarrow{p} 0, \qquad (65)
$$
$$
\frac{1}{n^{2b_0(1-\gamma)}}\,A_N'\sum_{t=1}^{n}\Bigg(\frac{\partial v_t(\theta)'}{\partial\theta} - \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0}\Bigg)F^{[n]-1}\Bigg(\frac{\partial v_t(\theta)}{\partial\theta'} - \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}\Bigg)A_N \xrightarrow{p} 0, \qquad (66)
$$
$$
\frac{1}{n^{1-\gamma}}\,A_j'\sum_{t=1}^{n}\Bigg(\frac{\partial v_t(\theta)'}{\partial\theta} - \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0}\Bigg)F^{[n]-1}\,\frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}A_j \xrightarrow{p} 0, \qquad (67)
$$
$$
\frac{1}{n^{1-\gamma}}\sum_{t=1}^{n}A_j'\Big(I \otimes (v_t(\theta)' - v_t(\theta_0)')F^{[n]-1}\Big)\bigg(\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\bigg)\bigg|_{\theta=\theta_0}A_j \xrightarrow{p} 0, \qquad (68)
$$
$$
\frac{1}{n^{1-\gamma}}\sum_{t=1}^{n}A_j'\Big(I \otimes v_t(\theta_0)'F^{[n]-1}\Big)\Bigg[\bigg(\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\bigg) - \bigg(\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\bigg)\bigg|_{\theta=\theta_0}\Bigg]A_j \xrightarrow{p} 0, \qquad (69)
$$
$$
A_j'\,\frac{\partial(\operatorname{vec}F^{[n]})'}{\partial\theta}\bigg|_{\theta=\theta_0}\big(F^{[n]-1}\otimes F^{[n]-1}\big)\frac{1}{n^{1-\gamma}}\sum_{t=1}^{n}\Bigg(\frac{\partial v_t(\theta)}{\partial\theta'} - \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}\Bigg)\otimes v_t(\theta_0)\,A_j \xrightarrow{p} 0, \qquad (70)
$$
$$
A_j'\,\frac{\partial(\operatorname{vec}F^{[n]})'}{\partial\theta}\bigg|_{\theta=\theta_0}\big(F^{[n]-1}\otimes F^{[n]-1}\big)\frac{1}{n^{1-\gamma}}\sum_{t=1}^{n}\frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}\otimes\big(v_t(\theta) - v_t(\theta_0)\big)\,A_j \xrightarrow{p} 0, \qquad (71)
$$
$$
\frac{1}{n^{1-\gamma}}\,A_j'\sum_{t=1}^{n}\Bigg(\frac{\partial v_t(\theta)'}{\partial\theta} - \frac{\partial v_t(\theta)'}{\partial\theta}\bigg|_{\theta=\theta_0}\Bigg)F^{[n]-1}\Bigg(\frac{\partial v_t(\theta)}{\partial\theta'} - \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}\Bigg)A_j \xrightarrow{p} 0, \qquad (72)
$$
$$
\frac{1}{n^{1-\gamma}}\sum_{t=1}^{n}A_j'\Big[I \otimes (v_t(\theta)' - v_t(\theta_0)')F^{[n]-1}\Big]\Bigg[\bigg(\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\bigg) - \bigg(\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\bigg)\bigg|_{\theta=\theta_0}\Bigg]A_j \xrightarrow{p} 0, \qquad (73)
$$
$$
A_j'\,\frac{\partial(\operatorname{vec}F^{[n]})'}{\partial\theta}\bigg|_{\theta=\theta_0}\big(F^{[n]-1}\otimes F^{[n]-1}\big)\frac{1}{n^{1-\gamma}}\sum_{t=1}^{n}\Bigg(\frac{\partial v_t(\theta)}{\partial\theta'} - \frac{\partial v_t(\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0}\Bigg)\otimes\big(v_t(\theta) - v_t(\theta_0)\big)\,A_j \xrightarrow{p} 0, \qquad (74)
$$
for $j = S, D$. Analogously to Chang et al. (2009), we only prove convergence of the nonstationary components, since the required conditions obviously hold for the stationary terms. Let $\Delta(n^K x_t)$ denote terms that are of order $n^K$ times $x_t$ or of a lower order. $\tilde w_t$ denotes terms that converge from an $I\big(n^{-(1-\gamma)/2}\big)$ process to an $I(0)$ process as $n \to \infty$. For the analysis of the convergence rates of the various differences above, $\beta$ can be rewritten based on (62) as $\beta = \beta_0 - n^{-b_0(1-\gamma)}\Gamma\delta_{(1:p-1)} - n^{-(1-\gamma)/2}\beta(\beta'\Sigma^{-1}\beta)^{-1/2}\delta_{(p)}$. To obtain the required convergence rates, iterate this equation by inserting it again for the $\beta$ in the numerator in the third term. By denoting $g = \delta_{(p)}(\beta_0'\Sigma_0^{-1}\beta_0)^{-1/2}$, this leads to
$$
\beta = \beta_0\big(1 + n^{-(1-\gamma)/2}g\big) + \big(n^{-b_0(1-\gamma)} + n^{-(1/2+b_0)(1-\gamma)}g\big)\Gamma\delta_{(1:p-1)} + n^{-(1-\gamma)}g^2\beta. \qquad (75)
$$
Consider the difference $v_t(\theta) - v_t(\theta_0)$ first. From theorem 2.1 and (41), and by denoting $P = I - \frac{\Sigma^{-1}\beta\beta'}{\beta'\Sigma^{-1}\beta}$ analogously to (15), one has $v_t(\theta) - v_t(\theta_0) = P'\beta_0 x_t + B_+(L,\theta)(I - P')\Delta^{b+}y_t + w_t$, where $w_t$ denotes some $I(0)$ terms.
Note that $\Delta^{b+}y_t$ in the second term is $I(b_0 - b) = I\big(n^{-(1-\gamma)/2}\big)$ by lemma 2.2 and (64), and is therefore abbreviated by $\tilde w_t$. Since $P'\beta = 0$ and $b_0 > 1/2$, inserting (75) for $\beta_0$ in the first term delivers
$$
v_t(\theta) - v_t(\theta_0) = \Delta\big(n^{-b_0(1-\gamma)}x_t\big) + \Delta\big(n^{-(1-\gamma)}x_t\big) + \tilde w_t + w_t. \qquad (76)
$$
For considering the differences in the partial derivatives, we start from (51), derived in the proof of lemma 3.2, and focus on terms driven by $x_t$:
$$
\frac{\partial v_t(\theta)'}{\partial\beta} = -\Bigg[\frac{\beta'\Sigma^{-1}\beta_0}{\beta'\Sigma^{-1}\beta}\,I + \frac{\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\bigg(I - \frac{\beta\beta'\Sigma^{-1}}{\beta'\Sigma^{-1}\beta}\bigg)\beta_0\beta'\Bigg]x_t + \tilde w_t + w_t.
$$
Next, insert $\beta$ from (75) and collect terms, such that
$$
\frac{\partial v_t(\theta)'}{\partial\beta} = -P\,x_t + \Delta\big(n^{-b_0(1-\gamma)}x_t\big) + \tilde w_t + w_t, \qquad (77)
$$
$$
\frac{\partial v_t(\theta)'}{\partial\beta} - \frac{\partial v_t(\theta)'}{\partial\beta}\bigg|_{\theta=\theta_0} = -(P - P_x)\,x_t + \Delta\big(n^{-b_0(1-\gamma)}x_t\big) + \tilde w_t + w_t. \qquad (78)
$$
Based on (75), one can show that $P - P_x = O\big(n^{-(1-\gamma)/2}\big)$, so that (78) can also be written as $\frac{\partial v_t(\theta)'}{\partial\beta} - \frac{\partial v_t(\theta)'}{\partial\beta}\big|_{\theta=\theta_0} = \Delta\big(n^{-(1-\gamma)/2}x_t\big) + \tilde w_t + w_t$, which directly yields $\frac{\partial v_t(\theta)'}{\partial\theta} - \frac{\partial v_t(\theta)'}{\partial\theta}\big|_{\theta=\theta_0} = \Delta\big(n^{-(1-\gamma)/2}x_t\big) + \tilde w_t + w_t$. From this result, it follows for the second-order partial derivatives that
$$
\frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta) - \frac{\partial^2}{\partial\theta\,\partial\theta'}\otimes v_t(\theta)\bigg|_{\theta=\theta_0} = \Delta\big(n^{-(1-\gamma)/2}x_t\big) + \tilde w_t + w_t. \qquad (79)
$$
Now we are ready to check (65) to (74). We begin with (65). Using (78), the above result on $P - P_x$, (19), and lemma 3.2, the leading term in (65) can be stated as
$$
n^{2\gamma b_0}\,\Gamma'\,O\big(n^{-(1-\gamma)/2}\big)\Bigg(\frac{1}{n^{b_0+1/2}}\sum_{t=1}^{n}x_t\Bigg)F^{[n]-1}P_x'\,\Gamma = O\big(n^{-1/2+\gamma(1/2+2b_0)}\big)\,O_p(1) = o_p(1)
$$
for small $\gamma$, and where $n^{-b_0-1/2}\sum_{t=1}^{n}x_t = O_p(1)$ can be shown.
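The boundedness claim for the scaled partial sum of the fractionally integrated component can be illustrated by a small simulation (purely illustrative, not part of the argument; the type-II fractional filter, the seed, and the value $b = 0.75$ are choices made for this example):

```python
import numpy as np

def frac_coeffs(d, n):
    # coefficients psi_j of (1 - L)^{-d}: psi_0 = 1, psi_j = psi_{j-1} (j - 1 + d) / j
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi

def simulate_type2(b, n, rng):
    # type-II I(b) process: x_t = sum_{j=0}^{t-1} psi_j eps_{t-j}
    eps = rng.standard_normal(n)
    # the first n terms of the full convolution give the truncated (type-II) filter
    return np.convolve(frac_coeffs(b, n), eps)[:n]

rng = np.random.default_rng(1)
b = 0.75  # illustrative memory parameter with b > 1/2
scaled = [abs(simulate_type2(b, n, rng).sum()) / n ** (b + 0.5)
          for n in (200, 800, 3200)]
print(scaled)  # scaled absolute partial sums; they do not diverge as n grows
```

Under the scaling $n^{-b-1/2}$, the partial sums stay bounded in probability, in line with the functional convergence of $x_{\lfloor nr \rfloor}/n^{b-1/2}$ to a fractional process $X(r)$ used throughout the proof.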
Similarly, (66) can be derived. For the remaining equations, note that $\frac{\partial v_t(\theta)'}{\partial \operatorname{vec}\Sigma} = \Delta\big(n^{-b_0(1-\gamma)}x_t\big) + \tilde w_t + w_t$, which can be seen directly by plugging (62) into the formula for the partial derivative as given in the proof of lemma 3.2. Furthermore, $\frac{\partial v_t(\theta)'}{\partial b} = \tilde w_t + w_t$. Therefore, from (77), (19), by inserting (75) for $\beta$ in the numerator, and by the properties of the projection matrix $P$, one obtains
$$
A_S'\,\frac{\partial v_t(\theta)'}{\partial\beta} = \beta'(\beta'\Sigma^{-1}\beta)^{-1/2}\,\frac{\partial v_t(\theta)'}{\partial\beta} = \Delta\big(n^{-b_0(1-\gamma)}x_t\big) + \tilde w_t + w_t. \qquad (80)
$$
The same holds for the second-order partial derivatives. Since $\frac{\partial v_t(\theta)}{\partial\theta'}\big|_{\theta=\theta_0}A_S \sim I(0)$, and since $\frac{\partial v_t(\theta)}{\partial\theta'}\big|_{\theta=\theta_0}A_D$ converges to an $I(0)$ process, as $A_D$ only picks the partial derivative w.r.t. $b$, equations (67), (69), and (72) follow directly. Equations (68) and (73) can be shown by using (76). To prove (70), (71), and (74), note that
$$
\bigg(\frac{\partial v_t(\theta)}{\partial\theta'}\otimes v_t(\theta)\bigg)A_S = \bigg(\frac{\partial v_t(\theta)}{\partial\theta'}\otimes v_t(\theta)\bigg)(A_S\otimes 1) = \frac{\partial v_t(\theta)}{\partial\theta'}A_S\otimes v_t(\theta).
$$
Then (71) follows from $\frac{\partial v_t(\theta)}{\partial\theta'}\big|_{\theta=\theta_0}A_S = w_t$ and $\frac{\partial v_t(\theta)}{\partial\theta'}\big|_{\theta=\theta_0}A_D = \tilde w_t$ together with (76), whereas (70) follows from (80) together with $v_t(\theta_0) = w_t$. Finally, (74) follows from (80) together with (76). This completes the proof of lemma A.11.

References

Anderson, B. D. O. and Moore, J. B. (1979).
Optimal Filtering, Prentice-Hall Information and System Sciences Series, Prentice-Hall, Englewood Cliffs, NJ.

Ariño, M. A. and Marmol, F. (2004). A permanent-transitory decomposition for ARFIMA processes, Journal of Statistical Planning and Inference (1): 87–97.

Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics, Journal of Econometrics (1): 5–59.

Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models, Journal of the Royal Statistical Society B (4): 659–672.

Chan, N. H. and Palma, W. (1998). State space modeling of long-memory processes, The Annals of Statistics (2): 719–740.

Chang, Y., Miller, J. I. and Park, J. Y. (2009). Extracting a common stochastic trend: Theory with some applications, Journal of Econometrics (2): 231–247.

Davidson, J. (1994). Stochastic Limit Theory, Oxford University Press.

Davidson, J. (2000). Econometric Theory, Blackwell Publishers.

Doménech, R. and Gómez, V. (2006). Estimating potential output, core inflation, and the NAIRU as latent variables, Journal of Business & Economic Statistics (3): 354–365.

Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods: Second Edition, Oxford Statistical Science Series.

Granger, C. and Newbold, P. (1986). Forecasting Economic Time Series, Academic Press.

Hartl, T. and Weigand, R. (2019). Approximate state space modelling of unobserved fractional components, Papers, arXiv.org.
URL: https://EconPapers.repec.org/RePEc:arx:papers:1812.09142

Harvey, A. C. (1990). Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press.

Hassler, U. (2018). Time Series Analysis with Long Memory in View, Wiley Series in Probability and Statistics, Wiley.

Hassler, U. and Wolters, J. (1995). Long memory in inflation rates: International evidence, Journal of Business & Economic Statistics (1): 37–45.

Hualde, J. and Robinson, P. M. (2011). Gaussian pseudo-maximum likelihood estimation of fractional time series models, The Annals of Statistics (6): 3152–3181.

Johansen, S. (2008). A representation theory for a class of vector autoregressive models for fractional processes, Econometric Theory (3): 651–676.

Johansen, S. and Nielsen, M. Ø. (2010). Likelihood inference for a nonstationary fractional autoregressive model, Journal of Econometrics (1): 51–66.

Johansen, S. and Nielsen, M. Ø. (2012). Likelihood inference for a fractionally cointegrated vector autoregressive model, Econometrica (6): 2667–2732.

Kim, C.-J. and Nelson, C. R. (1999). State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications, The MIT Press.

Koopman, S. J. and Shephard, N. (2015). Unobserved Components and Time Series Econometrics, Oxford University Press.

Marinucci, D. and Robinson, P. (1999). Alternative forms of fractional Brownian motion, Journal of Statistical Planning and Inference: 111–122.

Morley, J. C., Nelson, C. R. and Zivot, E. (2003). Why are the Beveridge-Nelson and unobserved-components decompositions of GDP so different?, The Review of Economics and Statistics (2): 235–243.

Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory, Wiley Series in Probability and Mathematical Statistics, Wiley, New York, NY.

Müller, U. K. and Watson, M. W. (2018). Long-run covariability, Econometrica (3): 775–804.

Nielsen, M. Ø. (2015). Asymptotics for the conditional-sum-of-squares estimator in multivariate fractional time-series models, Journal of Time Series Analysis (2): 154–188.

Park, J. Y. and Phillips, P. C. B. (2001). Nonlinear regressions with integrated time series, Econometrica: 117–161.

Proietti, T. (2016). Component-wise representations of long-memory models and volatility prediction, Journal of Financial Econometrics (4): 668–692.

Robinson, P. M. (2006). Conditional-sum-of-squares estimation of models for stationary time series with long memory, in H.-C. Ho, C.-K. Ing and T. L. Lai (eds), Time Series and Related Topics, Institute of Mathematical Statistics, pp. 130–137.

Shimotsu, K. and Phillips, P. C. B. (2005). Exact local Whittle estimation of fractional integration, The Annals of Statistics (4): 1890–1933.

Stock, J. H. and Watson, M. W. (2016). Core inflation and trend inflation, The Review of Economics and Statistics (4): 770–784.

Sun, Y. and Phillips, P. C. B. (2004). Understanding the Fisher equation, Journal of Applied Econometrics (7): 869–886.

Tschernig, R., Weber, E. and Weigand, R. (2013). Long-run identification in a fractionally integrated system, Journal of Business & Economic Statistics (4): 438–450.

Weber, E. (2011). Analyzing U.S. output and the great moderation by simultaneous unobserved components, Journal of Money, Credit and Banking (8): 1579–1597.

Wooldridge, J. M. (1994). Estimation and inference for dependent processes, in R. F. Engle and D. McFadden (eds),