Dimension Reduction for High Dimensional Vector Autoregressive Models
Gianluca Cubadda∗ (Università di Roma "Tor Vergata")
Alain Hecq† (Maastricht University)

September 9, 2020‡

Abstract
This paper aims to decompose a large dimensional vector autoregressive (VAR) model into two components, the first one being generated by a small-scale VAR and the second one being a white noise sequence. Hence, a reduced number of common factors generates the entire dynamics of the large system through a VAR structure. This modelling extends the common feature approach to high dimensional systems, and it differs from the dynamic factor models, in which the idiosyncratic components can also embed a dynamic pattern. We show the conditions under which this decomposition exists, and we provide statistical tools to detect its presence in the data and to estimate the parameters of the underlying small-scale VAR model. We evaluate the practical value of the proposed methodology by simulations as well as by empirical applications on both economic and financial time series.

Keywords: vector autoregressive models, dimension reduction, reduced-rank regression, multivariate autoregressive index model, common features, factor models.

∗ Università di Roma "Tor Vergata", Dipartimento di Economia e Finanza, Via Columbia 2, 00133 Roma, Italy. Email: [email protected].
† Maastricht University, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Email: [email protected].
‡ We would like to thank Marcelo C. Medeiros for providing us with the high frequency data on stock prices that we have used to construct the realized variances and covariances. The usual disclaimers apply.

Introduction
For decades, the vector autoregressive (VAR) model has been the de facto standard tool for investigating multivariate time series data. In macroeconometrics, VARs are routinely used for forecasting, for extracting co-movements such as the presence of cointegration, for testing for Granger causality, as well as for performing structural analysis. However, a drawback of the VAR is that the number of parameters to be estimated increases quadratically with the number of variables and linearly with the number of lags. This quickly compromises the estimation results outside the case of small-scale models. There is currently an increasing interest in jointly modelling many variables and, consequently, in assessing the feasibility of working with large dimensional VAR models. Indeed, the increase of data availability in economics and finance is associated with the common belief that using more information with high dimensional econometric and statistical models will improve our understanding of the macroeconomy as well as forecast accuracy (see Boivin and Ng, 2006 for counter examples). Consequently, as the number of time series jointly considered in VARs cannot be too large compared to a given number of observations, different attempts have been proposed in the literature to tackle the curse of dimensionality. These methods can be gathered in two categories: the dimension reduction approach on the one hand and regularization techniques on the other hand. The latter group includes both Bayesian methods (Banbura et al., 2010), although Bayesian techniques are also used to estimate large reduced-rank VARs (see Carriero et al., 2011), and the more recent booming contributions on penalized estimation of sparse VARs (Wilms and Croux, 2016, Hsu et al., 2008, Nicholson et al., 2018, Davis et al., 2016, Smeekes and Wijler, 2018, Kock and Callot, 2015, Hecq et al., 2019).
The former group of methods, to which our paper wishes to contribute, includes reduced-rank techniques (Reinsel, 1983, Ahn and Reinsel, 1988, Carriero et al., 2011, Cubadda and Hecq, 2011, Bernardini and Cubadda, 2015) and the huge literature on factor models (surveyed in Bai and Ng, 2008, Stock and Watson 2006, 2011, 2015).

Differently from those contributions on system dimension reduction, we provide a framework where the whole dynamics of the system is due to a small-scale VAR model. Following Lam et al. (2011) and Lam and Yao (2012), we first decompose the large multivariate time series Y_t into two parts: a linear function of a small-scale dynamic component x_t and a static component \varepsilon_t that, and this is the key point that makes our approach different from the usual dynamic factor model, is unpredictable from the past. The capital Y_t stresses that we start from a potentially high-dimensional time series, whereas the small x_t recalls that a small number of factors is responsible for the entire dynamics of the system. We provide the conditions under which such a dynamic component x_t is generated by a small-scale VAR model. In particular, we show that the large VAR model of series Y_t must be endowed with both the serial correlation common feature (Engle and Kozicki, 1993) and an index structure (Reinsel, 1983) in order to ensure that the dynamic component x_t follows a VAR model. Hence, we provide a link between factor modelling for high-dimensional time series and the reduced-rank VAR approach which, to the best of our knowledge, was not noted before. This bridge allows us to unravel common cyclical features and to impose their presence in large VARs. Obviously, the decomposition that we consider might not exist. Based on the eigenanalysis proposed in Lam et al.
(2011) and Lam and Yao (2012), we provide statistical tools to verify whether there exists in series Y_t a dynamic component x_t that is generated by such a small-scale VAR model, and to estimate the parameters of the underlying small-scale VAR model. The estimated component x_t can then be easily used to predict the future realizations of the large dimensional system Y_t.

The rest of the paper is organized as follows. Section 2 presents the main results on both the model representation and the statistical inference. Section 3 conducts a Monte Carlo study to evaluate the finite sample properties of the proposed tools. It emerges that information criteria detect the dimension reduction better than the eigenvalue approach proposed by Lam and Yao (2012). Section 4 provides two empirical applications (the first one on US macroeconomic series and the second one on realized variances and covariances) to illustrate the practical value of our approach. Finally, Section 5 concludes.

Footnotes: (1) It is usual and correct to quote Sims (1980) here for his fundamental contribution to this literature; an earlier reference is Quenouille (1957). (2) Both in terms of the number of series that are easily available and in the use of different sampling frequencies (e.g. mixed frequency VARs, see Goetz et al., 2016).

This section starts by presenting the derivation of the proposed modelling; it then discusses statistical inference in a large dimensional framework.
Let us assume that the n-vector time series Y_t is generated by the following second-order stationary VAR(p) model

Y_t = \sum_{j=1}^{p} \Phi_j Y_{t-j} + \epsilon_t,    (1)

where t = 1, ..., T, \Phi_j is an n × n matrix for j = 1, ..., p with \Phi_p ≠ 0, such that the roots of det(I_n − \sum_{j=1}^{p} \Phi_j z^j) lie outside the unit circle; \epsilon_t is an n-vector of errors such that E(\epsilon_t) = 0, E(\epsilon_t \epsilon_t') = \Sigma_\epsilon is a finite and positive definite matrix, E(\epsilon_t | F_{t-1}) = 0, and F_t is the natural filtration of the process Y_t. For the sake of simplicity, we assume that deterministic elements are absent (or that the series have been demeaned or detrended first).

Condition 1
For any j = 1, ..., p it holds that \Phi_j = A \omega_j', where A is a full-rank n × r (r < n) matrix and \Omega' = [\omega_1', ..., \omega_p'] is a full-rank r × np matrix. Without loss of generality, we assume that A is an orthonormal matrix, namely A'A = I_r.

Condition 1 is popularly known in time series econometrics as the serial correlation common feature (Engle and Kozicki, 1993). It was extensively studied in connection with cointegration (see, inter alia, Vahid and Engle, 1993, Hecq et al. 2006, Athanasopoulos et al.). Moreover, under Condition 1 the individual elements of Y_t follow parsimonious univariate models, thus solving the so-called autoregressivity paradox (Cubadda et al., 2009). See Centoni and Cubadda (2015) for a survey. In the analysis that follows, we focus on the case where n is large, virtually with a similar magnitude as the sample size T, whereas r is small compared to T.

We start by noting that under Condition 1 we can use the identity AA' + A_\perp A_\perp' = I_n in order to decompose series Y_t as follows

Y_t = A x_t + \varepsilon_t,    (2)

where x_t = A'Y_t, \varepsilon_t = A_\perp A_\perp' \epsilon_t, and A_\perp is a full-rank n × (n − r) matrix such that A_\perp' A = 0 and A_\perp' A_\perp = I_{n-r}. Following Lam and Yao (2012), we call x_t and \varepsilon_t respectively the dynamic and the static component of series Y_t. Indeed, we have that

E(\varepsilon_t | F_{t-1}) = A_\perp A_\perp' E(\epsilon_t | F_{t-1}) = 0,
E(Y_{t+k} | F_t) = A E(x_{t+k} | F_t),    (3)

where k, the forecast horizon, is any positive integer.

Remark 2
Note that the assumption that the matrix \Omega defined in Condition 1 has full column rank is equivalent to requiring that no linear combination of x_t is an innovation w.r.t. the past. Hence, decomposition (2) allows us to disentangle the latent autocorrelated component x_t, whose dimension cannot be further reduced, from the unpredictable component \varepsilon_t.

Remark 3
Representation (2) resembles a factor model, but there are some relevant differences. First, the static component \varepsilon_t differs from the idiosyncratic shocks in approximate factor models (Chamberlain and Rothschild, 1983) in that the former is driven by the (n − r) shocks A_\perp' \epsilon_t only. However, if n is large compared to r (or, more formally, if n diverges whereas r remains fixed), this difference becomes negligible. Second, in most of the factor literature it is assumed that the factors and the idiosyncratic shocks are uncorrelated at any leads and lags, whereas in Equation (2) E(\varepsilon_{t+k} x_t') = 0 only for k > 0. However, we can always transform the original decomposition (2) in such a way that the two components are contemporaneously uncorrelated and the static component is still a white noise. Indeed, using the decomposition of the identity matrix that Centoni and Cubadda (2003) proposed to measure the cyclical effects of permanent and transitory shocks,

A \underbrace{(A' \Sigma_\epsilon^{-1} A)^{-1} A' \Sigma_\epsilon^{-1}}_{\tilde{A}'} + \underbrace{\Sigma_\epsilon A_\perp (A_\perp' \Sigma_\epsilon A_\perp)^{-1}}_{\tilde{A}_\perp} A_\perp' = I_n,    (4)

we can decompose series Y_t as

Y_t = A \tilde{x}_t + \tilde{\varepsilon}_t,    (5)

where \tilde{x}_t = \tilde{A}' Y_t and \tilde{\varepsilon}_t = \tilde{A}_\perp A_\perp' \epsilon_t. It follows that E(\tilde{\varepsilon}_t \tilde{x}_t') = \tilde{A}_\perp A_\perp' \Sigma_\epsilon \tilde{A} = 0 and E(\tilde{\varepsilon}_t | F_{t-1}) = \tilde{A}_\perp A_\perp' E(\epsilon_t | F_{t-1}) = 0.

Decompositions (2) and (5) are isomorphic representations of series Y_t. Indeed, in view of Equations (5), (3), and (4) we easily obtain

E(Y_{t+k} | F_t) = A E(\tilde{x}_{t+k} | F_t) = A \tilde{A}' E(Y_{t+k} | F_t) = A \tilde{A}' A E(x_{t+k} | F_t) = A E(x_{t+k} | F_t),

given that \tilde{A}' A = I_r. Hence, the forecasts of series Y_t based on representations (2) and (5) are identical. Although representation (5) allows for decomposing the variability of each element of Y_t into the parts associated with the two components, which is an aspect of interest for several econometric analyses, it is more convenient to rely on representation (2) for statistical inference. Indeed, Lam et al.
(2011) and Lam and Yao (2012) showed how to consistently estimate both r and A (or, more formally, an orthonormal n × r matrix that lies in the space generated by the columns of A) under assumptions that are compatible with those in Condition 1, even when the dimension n diverges. Their method is simple to perform, since it is based on the eigenanalysis of the sum of the squared autocovariance matrices of series Y_t. Having obtained an estimator, say \hat{A}, it is then possible to estimate the dynamic and static components in (2) respectively as \hat{x}_t = \hat{A}' Y_t and \hat{\varepsilon}_t = (I − \hat{A}\hat{A}') Y_t.

However, in order to forecast series Y_t, as well as to compute the residuals of Model (1) that can in turn be used to estimate \Sigma_\epsilon, estimating the loading matrix A is not enough. In a dimension reduction perspective, in this paper we look for the condition under which the dynamic component x_t is generated by a small-scale VAR(p) model. To this end, we notice first that under Condition 1 we can rewrite Model (1) as follows

Y_t = \sum_{j=1}^{p} A \omega_j' Y_{t-j} + \epsilon_t,    (6)

which is popularly known as the reduced-rank VAR model (RRVAR) and was extensively analyzed, inter alia, by Velu et al. (1986) and Ahn and Reinsel (1988). Second, pre-multiplying both sides of Equation (6) by A' we get

x_t = \sum_{j=1}^{p} \omega_j' Y_{t-j} + \xi_t,    (7)

where \xi_t = A' \epsilon_t. However, Equation (7) does not yet provide a small-scale model for series Y_t. Indeed, \Omega being an np × r matrix with elements \omega_j in (7), the number of parameters still grows proportionally with n (although not with n^2 as in the unrestricted VAR). The next step is then to insert Equation (2) into (7), such that we obtain

x_t = \sum_{j=1}^{p} \omega_j' A x_{t-j} + \sum_{j=1}^{p} \omega_j' \varepsilon_{t-j} + \xi_t,

which allows us to derive the condition under which the dynamic component x_t is generated by a VAR(p) process as follows:

Condition 4
For any j = 1, ..., p it holds that \omega_j ∈ Sp(A), where Sp(A) indicates the space generated by the columns of A. Notice that this is equivalent to requiring that \omega_j = A \alpha_j', where \alpha_j is an r × r matrix.

Indeed, under Condition 4 we have that \omega_j' A x_{t-j} = \alpha_j x_{t-j} and \omega_j' \varepsilon_{t-j} = 0 for j = 1, ..., p; hence the data generating process of the dynamic component x_t boils down to

x_t = \sum_{j=1}^{p} \alpha_j x_{t-j} + \xi_t.    (8)

The intuition behind the algebra is that Condition 4 requires that the same linear combinations of Y_t that are unpredictable from the past are also irrelevant predictors of the dynamic component x_t. Remarkably, the RRVAR model (6) of series Y_t can be rewritten as

Y_t = \sum_{j=1}^{p} A \alpha_j \underbrace{A' Y_{t-j}}_{x_{t-j}} + \epsilon_t.    (9)

Model (9) is interesting since it combines the features of the reduced-rank VAR model with those of the multivariate autoregressive index (MAI) model proposed by Reinsel (1983). Recently, there has been a renewed interest in the MAI because it allows to rewrite the VAR in a similar way as the popular dynamic factor model; see, inter alia, Carriero et al. (2016), Cubadda et al. (2017), and Cubadda and Guardabascio (2019). Remarkably, the reduced-rank VAR and the MAI have been considered separately in the literature, whereas Condition 4 reveals that the combination of the two models allows for an important dimension reduction in large VARs. In what follows, we call Model (9) the reduced-rank multivariate autoregressive index (RRMAI) model.
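As a minimal numerical sketch of decomposition (2) and of Models (8)-(9) (the design below, with p = 1, Gaussian errors and illustrative parameter values, is our own and not taken from the paper), one can simulate an RRMAI process and check that x_t = A'Y_t indeed behaves as a small-scale VAR:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, T = 8, 2, 3000

# Orthonormal A and its complement A_perp from a full QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A, A_perp = Q[:, :r], Q[:, r:]
assert np.allclose(A @ A.T + A_perp @ A_perp.T, np.eye(n))  # identity behind (2)

alpha = np.array([[0.6, 0.1], [0.0, 0.5]])  # stable r x r coefficient matrix
Phi = A @ alpha @ A.T                       # RRMAI coefficient, as in Eq. (9)

Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = Phi @ Y[t - 1] + rng.standard_normal(n)

x = Y @ A  # dynamic component x_t = A'Y_t

# Under Condition 4, x_t follows the small-scale VAR (8); OLS on the
# simulated path recovers alpha up to sampling error
B = np.linalg.solve(x[:-1].T @ x[:-1], x[:-1].T @ x[1:])
alpha_hat = B.T
print(np.max(np.abs(alpha_hat - alpha)) < 0.1)  # True
```

The point of the sketch is that the full n × n coefficient matrix Phi has rank r, so all the autocorrelation of the n series is channelled through the r-dimensional process x_t.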
Remark 5
In order to perform structural analysis through the RRMAI, it is preliminarily required to obtain its Wold representation. One way to achieve this goal is to invert the polynomial coefficient matrix in Equation (9), such that one obtains

Y_t = \underbrace{(I_n − \sum_{j=1}^{p} A \alpha_j A' L^j)^{-1}}_{\Theta(L)} \epsilon_t.    (10)

Interestingly, it is possible to rewrite (10) in terms of the innovations of the two components in Equation (5). Indeed, inserting the identity (4) in (10) it easily follows that

Y_t = C(L) \tilde{\xi}_t + \Theta(L) \tilde{\varepsilon}_t,

where C(L) = \Theta(L) A and \tilde{\xi}_t = \tilde{A}' \epsilon_t are the innovations of the dynamic components \tilde{x}_t in decomposition (5). Since \tilde{\xi}_t and \tilde{\varepsilon}_t are uncorrelated at any lead and lag, it is possible to recover the structural shocks solely from the reduced form shocks of the common components \tilde{x}_t. For instance, one may obtain the structural shocks as C_1^{-1} A_1 \tilde{\xi}_t and the impulse response functions from C(L) A_1^{-1} C_1, where A_1 is the matrix formed by the first q rows of A and C_1 is a lower triangular matrix such that C_1 C_1' = A_1 \tilde{A}' \Sigma_\epsilon \tilde{A} A_1'. The advantage of this approach is that it requires to identify q shocks only instead of n of them.

Remark 6
Notice that if the matrices A and \Omega have full rank, then the r × rp matrix \alpha' = [\alpha_1, ..., \alpha_p] has full rank as well. Hence, Model (8) cannot be a RRVAR. However, it is entirely possible that Model (8) follows one of the various restricted versions of the VAR that do not impose a common left null space on all the coefficient matrices; see, inter alia, Ahn and Reinsel (1988), Cubadda and Hecq (2001) and Cubadda and Guardabascio (2019).

Remark 7
Another popular approach in econometrics to exploit the information in large dimensional time series is the factor augmented VAR (FAVAR), as originally proposed by Bernanke et al. (2005). In such a modelling, first r unobserved factors are extracted from the high dimensional time series; then it is assumed that these factors, along with a small set of key observed variables, jointly follow a VAR model. In view of Equation (9), it is easy to see that the main difference between the FAVAR and our approach is that in the latter any subset of series Y_t is a function of the first p lags of the dynamic components x_t only. The reason is that in our modelling the dynamic component x_t captures by construction all the relevant information in series Y_t.

In order to consistently estimate the matrix A, we start by relying on the approach suggested by Lam et al. (2011) and Lam and Yao (2012). This approach has recently been extended in various directions, such as cointegration (Zhang et al., 2019), principal component analysis for stationary time series (Chang et al., 2018), and multivariate volatilities modelling (Tao et al., 2011; Li et al., 2016). The starting point of this methodology is that the matrix A lies in the space generated by the eigenvectors associated with the r non-zero eigenvalues of the matrix

M = \sum_{j=1}^{\bar{p}} \Sigma_y(j) \Sigma_y(j)',

where \Sigma_y(j) = E(Y_t Y_{t-j}'). Given the assumption that series Y_t follows a VAR(p) model, one would ideally fix \bar{p} = p, but the methodology remains valid even if \bar{p} > p. Let us indicate with \hat{V}_i the matrix formed by the eigenvectors associated with the i (≤ n) largest eigenvalues of the matrix

\hat{M} = \sum_{j=1}^{\bar{p}} \hat{\Sigma}_y(j) \hat{\Sigma}_y(j)',

where \hat{\Sigma}_y(j) denotes the sample covariance matrix of Y_t at lag j. Under regularity conditions that are compatible with our assumptions, Lam et al.
(2011) show that \hat{V}_r is a \sqrt{T}-consistent estimator of A (up to an orthonormal transformation) when r is fixed, n, T → ∞, and T is O(n). Notice that these assumptions imply that \sqrt{T}/n → 0, hence we can use \hat{x}_{t,r} = \hat{V}_r' Y_t in place of the factors x_t in the subsequent analysis and treat \hat{x}_{t,r} as observed, as the estimation error of \hat{V}_r is asymptotically irrelevant (Bai and Ng, 2006). Moreover, Lam and Yao (2012) proved that a consistent estimator of r is provided by

\hat{r} = \arg\min_{i=1,...,R} \{ \hat{\lambda}_{i+1} / \hat{\lambda}_i \},

where R is a constant such that r < R < n and \hat{\lambda}_i is the i-th largest eigenvalue of the matrix \hat{M}.

However, \hat{r} consistently estimates the rank of the matrix A when Condition 1 only applies, whereas we need an estimator of r that is subject to Condition 4 as well. Let us first consider the problem of estimating the parameters of Model (9), assuming that r is known and having fixed A equal to \hat{V}_r. In order to accomplish this goal, it is convenient to rewrite Model (9) in its matrix form

Y = Z \alpha A' + \epsilon,    (11)

where Y = [y_{p+1}, ..., y_T]', \epsilon = [\epsilon_{p+1}, ..., \epsilon_T]', z_t = [x_t', ..., x_{t-p+1}']', and Z = [z_p, ..., z_{T-1}]'. Then apply the Vec operator to both sides of Equation (11) and use the property Vec(ABC) = (C' ⊗ A)Vec(B) to get

Vec(Y) = (A ⊗ Z) Vec(\alpha) + Vec(\epsilon),    (12)

from which it is easy to see that the ordinary least squares (OLS) estimator of Vec(\alpha) in Equation (12) takes the following form:

Vec(\hat{\alpha}) = [(A' ⊗ Z')(A ⊗ Z)]^{-1} (A' ⊗ Z') Vec(Y) = [A' ⊗ (Z'Z)^{-1} Z'] Vec(Y).    (13)

The main theoretical justification for considering the estimator (13) is that it is equivalent to applying OLS to (8), which is in turn the quasi maximum likelihood (QML) estimator of the parameters \alpha, conditionally on A, in the small-scale VAR model of the factors x_t.

Footnote: This result is obtained by post-multiplying both sides of Equation (11) by A and then applying the Vec operator to get Vec(YA) = (I_r ⊗ Z) Vec(\alpha) + Vec(\epsilon A). It is easy to see that the OLS estimator of Vec(\alpha) in the model above is the same as (13).
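The eigenanalysis of \hat{M} and the eigenvalue-ratio estimator of r can be sketched in a few lines (the function name and the simulated one-factor design are our own illustration, not code from the paper):

```python
import numpy as np

def lam_yao(Y, p_bar):
    """Eigen-decompose M_hat = sum_{j<=p_bar} S_y(j) S_y(j)' and return the
    eigenvectors (descending eigenvalue order) plus the ratio estimate of r."""
    T, n = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((n, n))
    for j in range(1, p_bar + 1):
        S_j = Yc[j:].T @ Yc[:-j] / T       # lag-j sample autocovariance
        M += S_j @ S_j.T
    lam, V = np.linalg.eigh(M)             # ascending order; reverse it
    lam, V = lam[::-1], V[:, ::-1]
    R = n // 2                             # search bound with r < R < n
    r_hat = int(np.argmin(lam[1:R + 1] / lam[:R])) + 1
    return V, lam, r_hat

# One strong autocorrelated factor loaded on the first series
rng = np.random.default_rng(1)
T, n = 2000, 8
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
Y = np.zeros((T, n))
Y[:, 0] = x
Y += 0.5 * rng.standard_normal((T, n))     # white-noise static part
V_hat, lam, r_hat = lam_yao(Y, p_bar=2)
```

In this strong one-factor design the first eigenvalue of \hat{M} dominates the others by orders of magnitude, so the ratio estimator selects r̂ = 1 and the first column of V_hat spans (approximately) the loading space.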
An alternative estimator of the parameters \alpha can be obtained by applying generalized least squares (GLS) to Equation (12). In particular, pre-multiply both sides of Equation (12) by \Sigma_\epsilon^{-1/2} ⊗ I_{T-p} to get

(\Sigma_\epsilon^{-1/2} ⊗ I_{T-p}) Vec(Y) = (\Sigma_\epsilon^{-1/2} A ⊗ Z) Vec(\alpha) + (\Sigma_\epsilon^{-1/2} ⊗ I_{T-p}) Vec(\epsilon).    (14)

In the appendix we prove that the OLS estimator of Vec(\alpha) in Equation (14) takes the following form:

Vec(\tilde{\alpha}) = [(A' \Sigma_\epsilon^{-1} A)^{-1} A' \Sigma_\epsilon^{-1} ⊗ (Z'Z)^{-1} Z'] Vec(Y).    (15)

In view of Equation (14), it is easy to see that the GLS estimator (15) is the QML estimator of the parameters \alpha, conditionally on A, in Model (11). The relation, in terms of efficiency, between the estimators (13) and (15) is provided in the following theorem.

Theorem 8
Estimator (15) of Vec(\alpha) has a mean square error matrix, conditionally on A, that is not larger than that of estimator (13). The two estimators have the same mean square error matrix when A' \epsilon_t and A_\perp' \epsilon_t are not correlated.

Proof.
See the appendix.

In order to derive a feasible GLS (FGLS) estimator, we suggest the following switching algorithm, which has the property of increasing the Gaussian likelihood, conditional on A, at each step.

1. In view of Equation (9) and given (initial) estimates of \alpha, maximize the conditional Gaussian likelihood L(\Sigma_\epsilon | \alpha, A) by estimating \Sigma_\epsilon with (T − p)^{-1} (Y' − A\alpha'Z')(Y − Z\alpha A').
2. Given the previously obtained estimate of \Sigma_\epsilon, maximize L(\alpha | \Sigma_\epsilon, A) by estimating the elements of \alpha with (15).

3. Repeat steps 1 and 2 until numerical convergence occurs.

In order to speed up the numerical convergence of the algorithm, it is important to choose the initial values of the coefficient matrix \alpha correctly. An obvious choice is to resort to \hat{\alpha}, which provides a consistent estimate of \alpha as T increases.

A practical problem that arises when the sample size T and the dimension n are of similar magnitude is that the estimate of the matrix \Sigma_\epsilon is singular or nearly singular. We propose to solve this problem by ignoring the error cross-correlations in the estimation method. In particular, we suggest to use a diagonal matrix \Delta_\epsilon with the same diagonal as \Sigma_\epsilon in place of \Sigma_\epsilon itself in the FGLS procedure. This solution has two main motivations. First, it makes the objective function of the switching algorithm become trace(ln(\Delta_\epsilon)), which is a common approximation of ln(det(\Sigma_\epsilon)) in high-dimensional settings; see Hu et al. (2017) and the references therein. Second, it is reasonable to presume that the fraction of unanticipated co-movements among variables is small when the conditioning information set is large.

Finally, in order to identify the number of factors r, we suggest the following strategy. For i = 1, ..., R estimate, either by OLS or FGLS, the models

Y_t = \sum_{j=1}^{p} \hat{V}_i \alpha_{j,i} \hat{V}_i' Y_{t-j} + \epsilon_{t,i},

where \alpha_{j,i} is an i × i matrix for j = 1, ..., p, and estimate r as the index \hat{r} that minimizes an information criterion, where the measure of fit is trace(ln(\Delta_\epsilon)), given the assumption that \Sigma_\epsilon is diagonal, and the overall number of parameters is ((p − 1)i + n)i, given that the number of free parameters in a basis of the space spanned by A is equal to ni − i^2.

Footnote: A general proof of the convergence of this family of iterative procedures is given by Oberhofer and Kmenta (1974).
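The switching algorithm with the diagonal \Delta_\epsilon simplification can be sketched as follows (a hypothetical helper, not the authors' code; the OLS start corresponds to estimator (13) and the weighted step to estimator (15) with \Sigma_\epsilon replaced by \Delta_\epsilon):

```python
import numpy as np

def fgls_switching(Y, Z, A, n_iter=50, tol=1e-8):
    """Alternate between Delta given alpha (step 1) and the GLS step (15)
    for alpha given Delta (step 2) until trace(ln(Delta)) stabilizes.
    Y is (T-p) x n, Z is (T-p) x rp, A is n x r orthonormal."""
    rp, r = Z.shape[1], A.shape[1]
    alpha = np.linalg.solve(Z.T @ Z, Z.T @ Y @ A)   # OLS start, estimator (13)
    prev = np.inf
    for _ in range(n_iter):
        E = Y - Z @ alpha @ A.T                     # residuals of Model (11)
        delta = (E ** 2).mean(axis=0)               # diagonal of Delta
        obj = np.sum(np.log(delta))                 # trace(ln(Delta))
        if abs(prev - obj) < tol:
            break
        prev = obj
        Dinv = np.diag(1.0 / delta)
        # (A' Delta^-1 A)^-1 A' Delta^-1, the left factor of estimator (15)
        left = np.linalg.solve(A.T @ Dinv @ A, A.T @ Dinv)
        vec_alpha = np.kron(left, np.linalg.solve(Z.T @ Z, Z.T)) @ Y.flatten("F")
        alpha = vec_alpha.reshape(rp, r, order="F")
    return alpha, delta
```

Note that the Kronecker product mirrors the Vec algebra of Equations (12)-(15); with homoskedastic, uncorrelated errors the weighted step collapses to the OLS starting value.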
In this section we perform a Monte Carlo study to evaluate the finite sample performances of the OLS and FGLS estimators of the parameters of Model (6), having estimated the matrix A according to Lam et al. (2011) in both cases. We consider the following n-dimensional stationary VAR(2) process

Y_t = \underbrace{G diag(\delta_1) G^+}_{\Phi_1} Y_{t-1} + \underbrace{G diag(\delta_2) G^+}_{\Phi_2} Y_{t-2} + \epsilon_t,    (16)

where G is an n × r matrix whose columns are generated as r i.i.d. N_n(0, I_n) draws, G^+ = (G'G)^{-1} G' is the Moore-Penrose pseudoinverse of the matrix G, \delta_1 = 2 diag(m) cos(\omega), m is an r-vector drawn from a uniform distribution, \omega is an r-vector drawn from a U_r[0, \pi] distribution, \delta_2 = −m^2 (elementwise), and the \epsilon_t are i.i.d. N_n(0, \Sigma_\epsilon). Notice that Equation (16) is equivalent to

Y_t = A \alpha_1 A' Y_{t-1} + A \alpha_2 A' Y_{t-2} + \epsilon_t,

where A = G S^{-1}, S = (G'G)^{1/2}, and \alpha_i = S diag(\delta_i) S^{-1} for i = 1, 2. To generate Y_t, we first generate the dynamic component

x_t = diag(\delta_1) x_{t-1} + diag(\delta_2) x_{t-2} + G^+ \epsilon_t,

and then the static component \varepsilon_t = G_\perp G_\perp' \epsilon_t, where G_\perp' G_\perp = I_{n-r} and \eta_t = [(G^+ \epsilon_t)', (G_\perp' \epsilon_t)']' are i.i.d. N_n(0, \Sigma_\eta). We can finally obtain

Y_t = G x_t + \varepsilon_t.    (17)

An important role in the data generating process is played by the covariance matrix \Sigma_\eta = D C D, where D is an n × n diagonal matrix with positive diagonal elements and C is a positive definite correlation matrix. In order to ensure that the innovations of the dynamic and static components provide equal contributions to the variability of Y_t over different n, we fix

D = [ \sqrt{n/r} I_r, 0_{r×(n-r)}; 0_{(n-r)×r}, \sqrt{n/(n-r)} I_{n-r} ],

and we generate C as the Toeplitz matrix with (i, j) element equal to \rho^{|i-j|}, where \rho is a scalar drawn from a uniform distribution.

Footnote: Notice that the squared Euclidean norms of the columns of G are all proportional to n on average over the replications. Hence, the simulated factors x_t are strong according to the definition of Lam and Yao (2012).
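The design in Equations (16)-(17) can be sketched as follows (a simplified illustration: we set \Sigma_\epsilon = I_n rather than using the DCD structure, and the bounds used for m are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, T, burn = 50, 3, 200, 50

G = rng.standard_normal((n, r))
G_plus = np.linalg.solve(G.T @ G, G.T)        # Moore-Penrose inverse of G
Qf, _ = np.linalg.qr(G, mode="complete")
G_perp = Qf[:, r:]                            # orthogonal complement of G

# Factor AR(2) coefficients give complex roots of modulus m and angle w,
# so each factor is stationary (the bounds for m are illustrative)
m = rng.uniform(0.5, 0.9, r)
w = rng.uniform(0.0, np.pi, r)
d1, d2 = 2.0 * m * np.cos(w), -m ** 2

eps = rng.standard_normal((T + burn, n))      # Sigma_eps = I_n for simplicity
x = np.zeros((T + burn, r))
for t in range(2, T + burn):
    x[t] = d1 * x[t - 1] + d2 * x[t - 2] + G_plus @ eps[t]

# Static component and final series, as in Equation (17)
Y = (x @ G.T + eps @ G_perp @ G_perp.T)[burn:]
```

After discarding the burn-in observations, Y holds a T × n panel whose entire autocorrelation structure comes from the r simulated factors.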
However, the randomness of the matrix G ensures, for each replication, some variability of the degree of strength across the r factors.

3.2 Results

From (16)-(17) we generate systems of successively n = 100, 200, 400, 800 variables. We consider both r = 3 and r = 9 dynamic components; r = 3 is indeed often assumed in financial applications (Fama-French factors), whereas with large sets of macroeconomic series several studies find that there exist around 8 to 10 common components. The number of observations equals T = 0.5n, n, 1.5n. We consequently evaluate the performance of our approach when the number of variables is respectively larger than, equal to, or smaller than the sample size. We simulate T + 50 observations; the first 50 points are used as a burn-in period and the remaining ones for estimation. Frequencies are reported for 1000 replications.

The proposed methods are evaluated by means of two statistics. We first report the percentage with which the true number of dynamic components r is correctly identified (% r̂ = 3 in Table 1 or % r̂ = 9 in Table 2), using both the test by Lam and Yao (2012, denoted LY in our tables) and the information criteria of Akaike (AIC), Schwarz (BIC) and Hannan-Quinn (HQIC). We also report the frequencies with which those procedures underestimate the number of factors (% r̂ < r), as well as the average of the estimates r̂ obtained over the 1000 replications. Second, we report the Frobenius distance between the estimates of \Phi = [\Phi_1', \Phi_2']' and the true values, relative to the Frobenius norm of \Phi (RFD). The results reported in Tables 1 and 2 are for the OLS estimator for each combination of n, T and r. Note that the results for the FGLS estimator are almost identical to the OLS ones and hence are omitted for the sake of space.

As expected, we notice that all the methods perform better as the dimension of the system n increases and, conditionally on a given n, as the sample size T gets larger. As regards the estimation of the true number of dynamic components r, we see that the information criteria outperform the LY test.
In particular, HQIC [AIC] identifies the correct model more often than the competitors when r is equal to 3 [9]. We notice that all the criteria perform worse in identification when the number of dynamic components r is larger.

With respect to the RFD, we observe that the models identified by the AIC provide estimates of \Phi that are less accurate than those obtained with the other criteria over all the settings. This disappointing outcome, which is likely due to the systematic tendency of the AIC to overfit the model, suggests that this information criterion should be used with caution in empirical applications. The RFDs of the models identified by the BIC and HQIC are instead rather similar, with the HQIC performing slightly better than the BIC when r = 9 and n > 200, and slightly worse in the remaining cases. Again, the RFDs of the models identified by all the criteria display a clear tendency to decrease as both n and T get larger.

This section illustrates the feasibility and the practical value of our new data reduction approach in two relevant applications. The first one looks for the presence of co-movements within 195 US quarterly economic and financial time series obtained from the FRED-QD dataset. The second application investigates commonalities in 30 US equity realized variances, for which we compute the 435 realized covariances. Consequently, the first study looks at the existence of common cyclical features in large systems, whereas the second application deals with common volatility features for many variances and covariances.

Table 1: Monte Carlo results, r = 3
                  n = 100                         n = 200
Method   % r̂=r  % r̂<r  avg r̂    RFD     % r̂=r  % r̂<r  avg r̂    RFD
T = 0.5n
LY        27.2    72.2   1.821     -       38.0    61.4   2.006     -
BIC       41.2    58.7   2.148   88.25     59.9    40.1   2.509   78.88
HQ        62.2    33.7   2.619   94.10     76.3    22.0   2.776   80.77
AIC       40.1    12.7   5.076  189.27     66.7     6.6   3.485  108.66
T = n
LY        34.5    65.0   1.952     -       47.3    52.3   2.189     -
BIC       55.4    44.6   2.445   77.97     73.3    26.7   2.697   72.21
HQ        74.6    22.9   2.757   79.92     87.2    11.8   2.883   72.65
AIC       63.7     8.4   3.453   98.08     72.5     3.2   3.399   89.20
T = 1.5n
LY        43.0    56.9   2.079     -       56.1    43.9   2.300     -
BIC       66.1    33.8   2.590   73.38     81.0    19.0   2.792   67.84
HQ        84.0    15.3   2.838   73.60     91.0     8.5   2.916   68.16
AIC       69.6     4.1   3.416   88.43     76.6     1.9   3.354   80.13

                  n = 400                         n = 800
Method   % r̂=r  % r̂<r  avg r̂    RFD     % r̂=r  % r̂<r  avg r̂    RFD
T = 0.5n
LY        47.0    51.9   2.227     -       59.6    39.2   2.407     -
BIC       71.9    28.1   2.685   71.80     83.8    16.2   2.822   67.61
HQ        86.9    12.6   2.871   72.03     91.9     7.3   2.932   68.54
AIC       73.1     2.9   3.411   97.14     73.4     1.0   3.412   95.08
T = n
LY        58.4    40.9   2.398     -       68.3    29.7   2.602     -
BIC       83.5    16.5   2.825   67.17     93.3     6.7   2.932   64.66
HQ        92.8     6.4   2.939   67.99     98.4     1.5   2.986   64.71
AIC       72.4     1.5   3.420   87.39     76.4     0.1   3.378   81.01
T = 1.5n
LY        62.8    36.8   2.445     -       72.8    26.5   2.620     -
BIC       85.6    14.4   2.848   66.65     94.9     5.1   2.949   64.63
HQ        96.5     2.7   2.981   67.03     99.4     0.5   2.996   64.66
AIC       73.6     0.5   3.413   81.19     74.4     0.1   3.424   79.41

Notes: percentages with which each method correctly estimates or underestimates the true r, the average of the estimates of r over 1000 replications, and the Frobenius distance between \Phi and its estimates relative to the Frobenius norm of \Phi.

Table 2: Monte Carlo results, r = 9

                  n = 100                         n = 200
Method   % r̂=r  % r̂<r  avg r̂    RFD     % r̂=r  % r̂<r  avg r̂    RFD
T = 0.5n
LY         0.3    99.6   1.964     -        1.3    98.5   2.252     -
BIC        0.3    99.7   3.177   85.96      2.1    97.9   4.896   76.10
HQ         8.9    86.3   5.915   96.80     23.8    75.6   7.170   78.98
AIC       13.0    14.2   9.976  141.05     46.6    17.3   9.271   94.31
T = n
LY         1.0    99.0   2.120     -        4.9    95.1   2.573     -
BIC        1.3    98.7   4.664   76.13     10.1    89.9   6.476   66.12
HQ        16.0    83.2   6.819   77.37     46.1    53.9   8.098   66.38
AIC       38.1    24.4   9.161   89.67     66.6     7.2   9.254   72.27
T = 1.5n
LY         1.1    98.9   2.195     -        7.2    92.6   2.814     -
BIC        6.3    93.7   5.820   70.15     18.6    81.4   7.150   61.22
HQ        31.3    68.5   7.603   70.48     59.0    40.9   8.404   61.16
AIC       53.6    16.4   9.201   76.60     71.0     3.8   9.288   65.40

                  n = 400                         n = 800
Method   % r̂=r  % r̂<r  avg r̂    RFD     % r̂=r  % r̂<r  avg r̂    RFD
T = 0.5n
LY         5.5    94.4   2.663     -       15.2    84.8   3.358     -
BIC       11.8    88.2   6.604   66.50     31.1    68.9   7.691   58.68
HQ        46.4    53.4   8.156   67.02     66.8    33.0   8.570   58.65
AIC       70.8     6.7   9.208   74.70     78.4     2.9   9.192   64.78
T = n
LY        13.7    86.3   3.423     -       23.2    76.8   4.080     -
BIC       31.2    68.8   7.730   58.86     48.2    51.8   8.236   53.53
HQ        73.2    26.8   8.652   58.79     83.3    16.7   8.808   53.36
AIC       77.2     1.9   9.244   63.00     82.4     0.0   9.216   57.32
T = 1.5n
LY        18.3    81.7   3.757     -       30.9    69.1   4.777     -
BIC       41.8    58.2   8.077   55.09     61.9    38.1   8.504   51.91
HQ        79.2    20.8   8.754   54.92     91.1     8.9   8.904   51.82
AIC       80.9     0.6   9.225   58.05     82.3     0.2   9.207   54.88

Notes: see the notes of Table 1.

4.1 Co-movements in quarterly US time series
In the first application a high-dimensional VAR for the US economy is investigated. We use FRED-QD, a quarterly database for macroeconomic research maintained by the Federal Reserve Bank of St. Louis. Lately, FRED-QD has frequently been updated, and different releases of the whole set of series are available online. A detailed description of the variables and of the proposed transformations used to achieve stationarity of each series is available on the FRED-QD website (https://research.stlouisfed.org/econ/mccracken/fred-databases/). For comparison convenience, we use the transformations proposed in FRED-QD to stationarize the time series, although the results of unit root tests might suggest alternative transformations in some cases.

After some necessary cleaning of the dataset (e.g. some series are not available for the whole sample), we are left with 236 quarterly observations, from 1959Q3 until 2018Q2, on 195 macroeconomic variables. Running a VAR with the usual methods is clearly not feasible for such a large dataset. Hecq et al. (2019) employ a similar dataset to test for Granger causality between important macro variables in a sparse regression setting. The aim of this application is different, as we search for the existence of a small dimensional VAR in terms of linear combinations of the original series that can generate the dynamics of the whole system.

We investigate the robustness of our framework to several lag lengths, with T = 236 and n = 195. We present the results relative to the VAR(3) model, given that more generous choices of the lag length do not provide large improvements in fit. Table 3 provides the numbers of common dynamic components that are estimated by the Lam and Yao approach (LY) as well as by the information criteria, using both OLS and FGLS in estimation. The maximum number of common dynamic factors is fixed to 13.
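To appreciate the dimension reduction at stake in this application, one can compare the parameter count of the unrestricted VAR with that of the RRMAI, using the count ((p − 1)r + n)r derived in the previous section (a back-of-the-envelope sketch with the n, p and r values used below):

```python
# Unrestricted VAR(3) with n = 195 variables vs RRMAI with r = 12 components
n, p, r = 195, 3, 12
var_params = p * n ** 2                  # one n x n matrix per lag
rrmai_params = ((p - 1) * r + n) * r     # p r^2 free alphas + (nr - r^2) for A
print(var_params, rrmai_params)  # 114075 2628
```

The RRMAI thus needs roughly 2% of the parameters of the unrestricted VAR at this system size.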
It emerges from Table 3 that, similarly to what was found in the Monte Carlo section, the Lam and Yao approach detects r = 1. This is likely misleading, given the huge heterogeneity of the series we work with. For each information criterion, we get the same estimate of r whether OLS or FGLS is used in estimation. The AIC hits the upper bound for r, whereas the BIC indicates r = 8 and the HQIC suggests r = 12.

Table 3: Detection of the number of common dynamic components (p = 3)

        LY    BIC    HQIC    AIC
LY       1
OLS             8      12     13
FGLS            8      12     13

Although r = 8 is a rather typical choice in the empirical literature, we opt for r = 12 as suggested by the HQIC. The reason for this choice is twofold: first, we follow the indications coming from the Monte Carlo study; second, in our modelling the common components generate the whole dynamics of the system, whereas the idiosyncratic components are usually autocorrelated in approximate factor models.

Next, we compute two statistics in order to evaluate how well the model fits the data. First, we consider the coefficients of determination of each element of Y_t as obtained by model (11). Second, we apply the orthogonalized decomposition (5) and then compute the squared correlation coefficients between each element of Y_t and its counterpart in the common part A x̃_t. We denote the former statistic as R²_{Y,Z} and the latter as R²_{Y,x̃}. (Our data file is available upon request.)
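As a rough sketch of the detection step (our own code, not the authors' implementation), a Lam and Yao (2012)-style eigenvalue-ratio estimate of the number of common dynamic components, based on a matrix M built from the first few sample autocovariances, can be written as:

```python
import numpy as np

def lam_yao_r(Y: np.ndarray, p0: int = 3, r_max: int = 13) -> int:
    """Ratio-based estimate of the number of common dynamic factors.

    Y is (T, n).  Builds M = sum_k Sigma_y(k) Sigma_y(k)' from the first
    p0 sample autocovariances and returns the minimizer of the eigenvalue
    ratios lambda_{i+1}/lambda_i, in the spirit of Lam and Yao (2012).
    """
    T, n = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((n, n))
    for k in range(1, p0 + 1):
        S = Yc[k:].T @ Yc[:-k] / (T - k)   # sample Sigma_y(k) = E(Y_t Y_{t-k}')
        M += S @ S.T                        # accumulate squared autocovariances
    lam = np.linalg.eigvalsh(M)[::-1]       # eigenvalues in descending order
    lam = np.maximum(lam, 1e-15 * lam[0])   # guard against numerical negatives
    ratios = lam[1:r_max + 1] / lam[:r_max]
    return int(np.argmin(ratios)) + 1
```

On data generated by a low-dimensional factor structure, the smallest ratio typically occurs right after the true number of factors.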
Based on the OLS estimates of the coefficients of the RRMAI model, we report in Table 4 the means as well as the quartiles of the empirical distributions of both R²_{Y,Z} and R²_{Y,x̃}.

Table 4: Measures of fit of the RRMAI model and the orthogonalized decomposition (means and quartiles Q1-Q3 of R²_{Y,Z} and R²_{Y,x̃}; the numerical entries are not legible in this copy)

Table 5 reports R²_{Y,Z} and R²_{Y,x̃} for four key macroeconomic variables: Real Gross Domestic Product (GDPC1); All Employees: Total Nonfarm (PAYEMS); Personal Consumption Expenditures: Chain-type Price Index (PCECTPI); Effective Federal Funds Rate (FEDFUNDS).

Table 5: Measures of fit for four key aggregate variables (GDPC1, PAYEMS, PCECTPI, FEDFUNDS; the numerical entries are not legible in this copy)

We look for the presence of common volatility features between a set of equity volatilities and covolatilities. Both their computational simplicity and their theoretical foundations make realized volatility measures (realized variance, bi-power variation, median realized variance, etc.) very attractive among practitioners and academics for modelling time-varying volatilities and monitoring financial risk. As an example, the 5-minute realized variance (RV5), a benchmark often considered in practice (see Liu et al., 2015), is obtained as $RV_t \equiv \sum_{j=1}^{M} r_{j,t}^2$, where $r_{j,t} = \ln P_{j,t} - \ln P_{j-1,t}$ are the high-frequency returns observed for $j = 1, \ldots, M$ intra-day 5-minute periods. In the same way that the realized variance (RV) uses high-frequency data to estimate the integrated variance, the realized covariance (RC) estimates the integrated covariance. For the returns of $n$ assets stacked in a column vector $r_t$, the realized covariance, which is necessary if one wishes to build portfolios, is obtained as $RC_t^{(d)} = M^{-1}\sum_{j=1}^{M} r_{j,t} r_{j,t}'$, where the superscript $(d)$ reminds us that the matrix is constructed for a day using intra-day series.
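As a minimal illustration (our own sketch, not the authors' code), the realized variances and the realized covariance matrix for one day can be computed from intraday prices as:

```python
import numpy as np

def realized_measures(prices: np.ndarray):
    """Daily realized variance and covariance from intraday prices.

    prices is (M + 1, n): M + 1 intraday price observations on n assets.
    Returns RV_t = sum_j r_{j,t}^2 (per asset) and, following the text,
    RC_t = M^{-1} sum_j r_{j,t} r_{j,t}'.
    """
    r = np.diff(np.log(prices), axis=0)   # r_{j,t} = ln P_{j,t} - ln P_{j-1,t}
    M = r.shape[0]
    rv = (r ** 2).sum(axis=0)             # realized variances
    rc = r.T @ r / M                      # realized covariance matrix
    return rv, rc                         # rc is generically PD when M > n
```

The positive definiteness of the covariance estimate requires more intraday observations than assets, which is the constraint M > n discussed next.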
Note that the $RC_t^{(d)}$ matrix is positive definite when $M > n$, namely when the number of high-frequency observations per day is larger than the number of series.

In this paper we consider the daily realized variances of 30 US assets. We investigate the period from March 2008 until February 2017 (2236 trading days). We have constructed 10-minute realized variances and covariances. This frequency is a good compromise between the low variance associated with high-frequency data, the microstructure noise (McAleer and Medeiros, 2008), the small bias of lower sampling frequencies, and the constraint $M > n$. (Realized variances were in fact computed at different frequencies: 1, 5, 10, 15, 30, 65 and 130 minutes, in addition to the estimation using daily returns. The latter estimation has the advantage of being unbiased but the drawback of being very noisy (de Pooter et al., 2008). This trade-off between bias and variance is thus of crucial importance (see Martens, 2004).)

Bauer and Vorkink (2011) propose to model the matrix logarithm of $RC_t^{(d)}$, a matrix that they call the log-space volatility, such that
$$
A_t^{(d)} = \operatorname{logm}\big(RC_t^{(d)}\big) = V_t^{(d)} L_t^{(d)} V_t^{(d)\prime},
$$
where $V_t^{(d)}$ is the eigenvector matrix of $RC_t^{(d)}$ and $L_t^{(d)}$ is a diagonal matrix whose diagonal elements are the natural logarithms of the eigenvalues of $RC_t^{(d)}$. The inverse transformation is easily obtained as $RC_t^{(d)} = V_t^{(d)} \Lambda_t^{(d)} V_t^{(d)\prime}$, where $\Lambda_t^{(d)}$ is a diagonal matrix whose diagonal elements are the natural exponentials of the eigenvalues of $A_t^{(d)}$. Bauer and Vorkink (2011) use $a_t^{(d)} = \operatorname{vech}\big(A_t^{(d)}\big)$, where $\operatorname{vech}$ denotes the operator stacking the elements on and below the diagonal of $A_t^{(d)}$ into an $n(n+1)/2$-dimensional vector, to which they add a panel-type HAR estimation of the realized covariances. The underlying idea lies in the DCC-type factorization $RC_t^{(d)} = D_t^{(d)} R_t^{(d)} D_t^{(d)}$, where $R_t^{(d)}$ is the realized correlation matrix and $D_t^{(d)}$ is the diagonal matrix having the realized standard deviations along the diagonal. We first consider
$$
l_t^{(d)} = \ln\big(\operatorname{diagr}(D_t^{(d)})\big),
$$
where $\operatorname{diagr}(D_t^{(d)})$ denotes the operator stacking the diagonal elements of $D_t^{(d)}$ into an $n$-vector. Second, in order to transform the realized correlations to make their distributions closer to Gaussianity, we employ the Fisher $z$-transformation of the below-diagonal elements of $R_t^{(d)}$ (Barndorff-Nielsen and Shephard, 2004). In particular, we take
$$
z_t^{(d)} = \operatorname{arctanh}\big(\operatorname{vecl}(R_t^{(d)})\big),
$$
where $\operatorname{vecl}(R_t^{(d)})$ denotes the operator stacking the below-diagonal elements of $R_t^{(d)}$ into an $n(n-1)/2$-dimensional vector. We model the series $l_t^{(d)}$ and $z_t^{(d)}$ individually.

Following, inter alia, Bubák et al. (2011), Souček and Todorova (2013), and Cubadda et al. (2017), the joint dynamics of the series $Y_t^{(d)}$ (respectively $l_t^{(d)}$ and $z_t^{(d)}$) can be analyzed through a multivariate version of the univariate HAR model. The Vector Heterogeneous Autoregressive model (VHAR) reads as follows:
$$
Y_{t+1}^{(d)} = \gamma + \Phi^{(d)} Y_t^{(d)} + \Phi^{(w)} Y_t^{(w)} + \Phi^{(m)} Y_t^{(m)} + \varepsilon_{t+1}, \tag{18}
$$
where $(d)$, $(w)$, and $(m)$ denote, respectively, time horizons of one day, one week (5 days), and one month (22 days),
$$
Y_t^{(w)} = \frac{1}{5}\sum_{i=0}^{4} Y_{t-i}^{(d)}, \qquad Y_t^{(m)} = \frac{1}{22}\sum_{i=0}^{21} Y_{t-i}^{(d)},
$$
$\gamma$ is a vector of $N$ constant terms, $\Phi^{(i)}$ is an $N \times N$ matrix for $i = d, w, m$, and the $\varepsilon_t$ are i.i.d. innovations with $\mathrm{E}(\varepsilon_t) = 0$, $\mathrm{E}(\varepsilon_t \varepsilon_t') = \Sigma$ (positive definite), and finite fourth moments. The unrestricted VAR associated with our VHAR sets $p = 22$.

(The HAR model for daily series is able to capture the long-memory features observed in realized volatility measures (see Andersen et al., 2001). The HAR is a parsimonious restricted autoregressive model of lag order 22 with daily, weekly and monthly effects; it can easily be estimated by ordinary least squares (OLS) and has been shown to perform well in forecasting exercises (see e.g. Santos and Ziegelmann, 2014, and the references therein). Some authors call its multivariate version the MHAR, for Multivariate Heterogeneous Autoregressive model. The mean, variances and Mean Squared Error (MSE) of the realized measures were computed for each estimation frequency in a similar way as in de Pooter et al. (2008); these computations show that the optimal frequency here is 10 minutes, i.e. the frequency that minimizes the MSE for the variances.)
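To fix ideas, the construction of $l_t^{(d)}$ and $z_t^{(d)}$ and the equation-by-equation OLS estimation of the VHAR in (18) can be sketched as follows. This is our own minimal sketch with hypothetical function names, not the authors' code.

```python
import numpy as np

def to_l_z(RC):
    """Split RC_t = D_t R_t D_t into log realized standard deviations l_t
    and Fisher z-transformed below-diagonal correlations z_t."""
    d = np.sqrt(np.diag(RC))                  # realized standard deviations
    R = RC / np.outer(d, d)                   # realized correlation matrix
    i, j = np.tril_indices(R.shape[0], k=-1)  # vecl: below-diagonal elements
    return np.log(d), np.arctanh(R[i, j])

def vhar_ols(Y):
    """OLS of the VHAR: Y_{t+1} on a constant and the daily, weekly (5-day
    average) and monthly (22-day average) components of Y; Y is (T, N)."""
    T, N = Y.shape
    rows = range(21, T - 1)                   # t with full 22-day history and a target
    Yd = Y[21:T - 1]
    Yw = np.stack([Y[t - 4:t + 1].mean(axis=0) for t in rows])
    Ym = np.stack([Y[t - 21:t + 1].mean(axis=0) for t in rows])
    X = np.hstack([np.ones((Yd.shape[0], 1)), Yd, Yw, Ym])
    B, *_ = np.linalg.lstsq(X, Y[22:], rcond=None)
    return B                                  # (1 + 3N, N) coefficient matrix
```

Applying `to_l_z` day by day yields the panels of $l_t^{(d)}$ and $z_t^{(d)}$ that `vhar_ols` then takes as input.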
However, three restricted dynamic components will be used in the analysis, and the Lam and Yao (2012) framework will be implemented on the following sum of the three squared autocovariances:
$$
M = \Sigma_y(d)\,\Sigma_y(d)' + \Sigma_y(w)\,\Sigma_y(w)' + \Sigma_y(m)\,\Sigma_y(m)',
$$
where $\Sigma_y(d) = \mathrm{E}\big(Y_t^{(d)} Y_{t-1}^{(d)\prime}\big)$, $\Sigma_y(w) = \mathrm{E}\big(Y_t^{(d)} Y_{t-1}^{(w)\prime}\big)$, and $\Sigma_y(m) = \mathrm{E}\big(Y_t^{(d)} Y_{t-1}^{(m)\prime}\big)$.

Table 6 provides the results of our tests with $r_{max} = 50$ restricted multivariate HAR components. Looking at the AIC or the HQIC, one can see that there are more commonalities in the 435 correlations than in the 30 variances. The whole set of correlations can be summarized by 19 (49) factors using the HQIC (AIC). This finding is however very far from the assumption of Oh and Patton (2016), who impose common parameters on every correlation. Surprisingly, the 30 realized variances display a large number of common volatility components, with 11 and 17 factors according to, respectively, the HQIC and the AIC.

Table 6: Detection of the number of common dynamic components

        l_t^{(d)}, n = 30    z_t^{(d)}, n = 435
LY              1                    1
BIC             7                    4
HQIC           11                   19
AIC            17                   49

Conclusions

This paper provides a link between two related but different strands of the literature on data reduction for multivariate time series: dynamic factor modelling on the one hand and the common feature methodology on the other hand. The former approach has the advantage that a limited number of factors is often enough to summarize the variation of a large dataset in economic and financial applications. The common feature approach, however, has the nice theoretical property that the related factors possess a given time series feature (autocorrelation, volatility, trends, etc.) whereas the uncommon components do not.

Building on Lam et al. (2011) and Lam and Yao (2012), we propose a dimension reduction approach such that both a common right space and a common left null space are present in the coefficient matrices of a large VAR model.
This specification allows us to detect a small-dimensional VAR that is responsible for generating the whole dynamics of the system. This approach has many potential applications, such as forecasting big data from a small-scale VAR without losing relevant information, structural VAR analysis, and the modelling of realized covariance matrices.

Our Monte Carlo study shows that one should rely on information criteria, and in particular the HQIC, to detect the number of common dynamic components. We illustrate the feasibility of our framework on large-dimensional macroeconomic and financial time series relative to the US economy. Around 12 factors generate the entire dynamics of 195 aggregate economic variables. We also consider co-movements in realized volatilities and co-volatilities between 30 US equities. In relative terms, only a few common volatility components (19) are needed to capture the 435 correlation series, whereas more than 10 factors are needed for the 30 realized variances.
References

[1] Ahn, S.K. and G.C. Reinsel (1988), Nested Reduced Rank Autoregressive Models for Multiple Time Series, Journal of the American Statistical Association, 83, 849-856.
[2] Andersen, T., Bollerslev, T., Diebold, F.X. and P. Labys (2001), The distribution of realized exchange rate volatility, Journal of the American Statistical Association, 96, 42-55.
[3] Athanasopoulos, G., Guillen, O.T., Issler, J.V., and F. Vahid (2011), Model Selection, Estimation and Forecasting in VAR Models with Short-run and Long-run Restrictions, Journal of Econometrics.
[4] Bai, J. and S. Ng (2006), Confidence intervals for diffusion index forecasts and inference with factor-augmented regressions, Econometrica, 74, 1133-1150.
[5] Bai, J. and S. Ng (2008), Large Dimensional Factor Analysis, Foundations and Trends in Econometrics, 3, 89-163.
[6] Banbura, M., Giannone, D., and L. Reichlin (2010), Large Bayesian VARs, Journal of Applied Econometrics, 25, 71-92.
[7] Barndorff-Nielsen, O.E. and N. Shephard (2004), Econometric Analysis of Realized Covariation: High Frequency Based Covariance, Regression, and Correlation in Financial Economics, Econometrica, 72, 885-925.
[8] Bernanke, B., Boivin, J., and P.S. Eliasz (2005), Measuring the Effects of Monetary Policy: A Factor-augmented Vector Autoregressive (FAVAR) Approach, The Quarterly Journal of Economics, 120, 387-422.
[9] Bernardini, E. and G. Cubadda (2015), Macroeconomic Forecasting and Structural Analysis through Regularized Reduced-Rank Regression, International Journal of Forecasting, 31, 682-691.
[10] Boivin, J. and S. Ng (2006), Are more data always better for factor analysis?, Journal of Econometrics, 132, 169-194.
[11] Bubák, V., Kočenda, E. and F. Žikeš (2011), Volatility transmission in emerging European exchange markets, Journal of Banking & Finance, 35, 2829-2841.
[12] Carriero, A., Kapetanios, G., and M. Marcellino (2011), Forecasting Large Datasets with Bayesian Reduced Rank Multivariate Models, Journal of Applied Econometrics, 26, 735-761.
[13] Carriero, A., Kapetanios, G., and M. Marcellino (2016), Structural analysis with Multivariate Autoregressive Index models, Journal of Econometrics, 192, 332-348.
[14] Centoni, M., and G. Cubadda (2003), Measuring the business cycle effects of permanent and transitory shocks in cointegrated time series, Economics Letters, 80, 45-51.
[15] Centoni, M., and G. Cubadda (2015), Common feature analysis of economic time series: An overview and recent developments, Communications for Statistical Applications and Methods, 22, 1-20.
[16] Chamberlain, G., and M. Rothschild (1983), Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets, Econometrica, 51, 1281-1304.
[17] Chang, J., Guo, B., and Q. Yao (2018), Principal Component Analysis for Second-Order Stationary Vector Time Series, Annals of Statistics, 46, 2094-2124.
[18] Corsi, F. (2009), A Simple Approximate Long-Memory Model of Realized Volatility, Journal of Financial Econometrics, 7, 174-196.
[19] Cubadda, G. (2007), A Unifying Framework for Analyzing Common Cyclical Features in Cointegrated Time Series, Computational Statistics and Data Analysis, 52, 896-906.
[20] Cubadda, G., and B. Guardabascio (2019), Representation, Estimation and Forecasting of the Multivariate Index-Augmented Autoregressive Model, International Journal of Forecasting.
[21] Cubadda, G., Guardabascio, B., and A. Hecq (2017), A Vector Heterogeneous Autoregressive Index Model for Realized Volatility Measures, International Journal of Forecasting, 33, 337-344.
[22] Cubadda, G. and A. Hecq (2011), Testing for Common Autocorrelation in Data Rich Environments, Journal of Forecasting, 30, 325-335.
[23] Cubadda, G., Hecq, A. and F.C. Palm (2009), Studying Co-movements in Large Multivariate Models Prior to Modeling, Journal of Econometrics.
[24] Davis, R.A., Zang, P. and T. Zheng (2016), Sparse Vector Autoregressive Modeling, Journal of Computational and Graphical Statistics, 25, 1077-1096.
[25] De Pooter, M., Martens, M., and D. Van Dijk (2008), Predicting the daily covariance matrix for S&P 100 stocks using intraday data - but which frequency to use?, Econometric Reviews, 27, 199-229.
[26] Engle, R.F., and S. Kozicki (1993), Testing for Common Features (with comments), Journal of Business and Economic Statistics, 11, 369-395.
[27] Goetz, T., Hecq, A. and S. Smeekes (2016), Testing for Granger-causality in large mixed-frequency VARs, Journal of Econometrics.
[28] Hecq, A., Laurent, S. and F.C. Palm (2016), On the Univariate Representation of BEKK Models with Common Factors, Journal of Time Series Econometrics, 8.
[29] Hecq, A., Palm, F.C. and J.P. Urbain (2006), Common cyclical features analysis in VAR models with cointegration, Journal of Econometrics, 132, 117-141.
[30] Hecq, A., Margaritella, L. and S. Smeekes (2019), Granger Causality Testing in High-Dimensional VARs: a Post-Double-Selection Procedure, http://arxiv.org/abs/1902.10991.
[31] Hsu, N.J., Hung, H.L. and Y.M. Chang (2008), Subset selection for vector autoregressive processes using Lasso, Computational Statistics & Data Analysis, 52, 3645-3657.
[32] Hu, Z., Dong, K., Dai, W. and T. Tong (2017), A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix, International Journal of Biostatistics, 13, 1-24.
[33] Kock, A.B. and L. Callot (2015), Oracle inequalities for high dimensional vector autoregressions, Journal of Econometrics.
[34] Lam, C. and Q. Yao (2012), Factor modeling for high-dimensional time series: Inference for the number of factors, Annals of Statistics, 40, 694-726.
[35] Lam, C., Yao, Q., and N. Bathia (2011), Estimation of latent factors for high-dimensional time series, Biometrika, 98, 901-918.
[36] Li, W., Gao, J., Li, K., and Q. Yao (2016), Modeling Multivariate Volatilities via Latent Common Factors, Journal of Business & Economic Statistics, 34, 564-573.
[37] Liu, L., Patton, A. and K. Sheppard (2015), Does anything beat 5-minute RV? A comparison of realized measures across multiple asset classes, Journal of Econometrics.
[38] Martens, M. (2004), Estimating unbiased and precise realized covariances, SSRN Electronic Journal, http://dx.doi.org/10.2139/ssrn.556118.
[39] McAleer, M. and M.C. Medeiros (2008), Realized volatility: a review, Econometric Reviews.
[40] Nicholson, W., Wilms, I., Bien, J. and D.S. Matteson (2018), High Dimensional Forecasting via Interpretable Vector Autoregression, arXiv:1412.5250v3.
[41] Oberhofer, W. and J. Kmenta (1974), A general procedure for obtaining maximum likelihood estimates in generalized regression models, Econometrica, 42, 579-590.
[42] Quenouille, M.H. (1957), The Analysis of Multiple Time Series, Griffin's Statistical Monographs & Courses.
[43] Reinsel, G. (1983), Some results on multivariate autoregressive index models, Biometrika, 70, 145-156.
[44] Sims, C. (1980), Macroeconomics and Reality, Econometrica, 48, 1-48.
[45] Smeekes, S. and E. Wijler (2018), Macroeconomic forecasting using penalized regression methods, International Journal of Forecasting, 34, 408-430.
[46] Souček, M. and N. Todorova (2013), Realized volatilities transmission between crude oil and equity futures markets: A multivariate HAR approach, Energy Economics, 40, 586-597.
[47] Stock, J.H. and M.W. Watson (2006), Forecasting with Many Predictors, Handbook of Forecasting, North Holland.
[48] Stock, J.H. and M.W. Watson (2006), Dynamic Factor Models, in Clements, M.P. and Hendry, D.F. (eds), Oxford Handbook of Forecasting, Oxford: Oxford University Press.
[49] Stock, J.H. and M.W. Watson (2006), Factor Models for Macroeconomics, in J.B. Taylor and H. Uhlig (eds), Handbook of Macroeconomics, Vol. 2, North Holland.
[50] Tao, M., Wang, Y., Yao, Q., and J. Zou (2011), Large Volatility Matrix Inference via Combining Low-Frequency and High-Frequency Approaches, Journal of the American Statistical Association, 106.
[51] Vahid, F., and R.F. Engle (1993), Common trends and common cycles, Journal of Applied Econometrics, 8, 341-360.
[52] Velu, R., Reinsel, G., and D. Wichern (1986), Reduced Rank Models for Multiple Time Series, Biometrika, 73, 105-118.
[53] Wilms, I. and C. Croux (2016), Forecasting using sparse cointegration, International Journal of Forecasting, 32, 1256-1267.
[54] Zhang, R., Robinson, P., and Q. Yao (2019), Identifying Cointegration by Eigenanalysis, Journal of the American Statistical Association.
Appendix

In this appendix we prove the main results of Subsection 2.2.

In this subsection we derive the OLS estimator of $\mathrm{Vec}(\alpha)$ in Equation (14):
$$
\begin{aligned}
\mathrm{Vec}(\tilde{\alpha})
&= \left[(A'\Sigma_\epsilon^{-1/2}\otimes Z')(\Sigma_\epsilon^{-1/2}A\otimes Z)\right]^{-1}(A'\Sigma_\epsilon^{-1/2}\otimes Z')(\Sigma_\epsilon^{-1/2}\otimes I_{T-p})\,\mathrm{Vec}(Y)\\
&= \left(A'\Sigma_\epsilon^{-1}A\otimes Z'Z\right)^{-1}(A'\Sigma_\epsilon^{-1/2}\otimes Z')(\Sigma_\epsilon^{-1/2}\otimes I_{T-p})\,\mathrm{Vec}(Y)\\
&= \left[(A'\Sigma_\epsilon^{-1}A)^{-1}\otimes (Z'Z)^{-1}\right](A'\Sigma_\epsilon^{-1}\otimes Z')\,\mathrm{Vec}(Y)\\
&= \left[(A'\Sigma_\epsilon^{-1}A)^{-1}A'\Sigma_\epsilon^{-1}\otimes (Z'Z)^{-1}Z'\right]\mathrm{Vec}(Y).
\end{aligned}
$$

In this subsection we provide the proof of Theorem 6. The thesis is the inequality
$$
\mathrm{E}[\mathrm{Vec}(\hat{\alpha}-\alpha)\,\mathrm{Vec}(\hat{\alpha}-\alpha)'\,|\,A] \;\ge\; \mathrm{E}[\mathrm{Vec}(\tilde{\alpha}-\alpha)\,\mathrm{Vec}(\tilde{\alpha}-\alpha)'\,|\,A] \tag{19}
$$
for all $\mathrm{Vec}(\alpha)\in\mathbb{R}^{nr}$. Let us start by inserting Equation (11) into (13) and (15) to respectively get
$$
\mathrm{Vec}(\hat{\alpha}) = \mathrm{Vec}(\alpha) + \left[A'\otimes (Z'Z)^{-1}Z'\right]\mathrm{Vec}(\epsilon), \tag{20}
$$
$$
\mathrm{Vec}(\tilde{\alpha}) = \mathrm{Vec}(\alpha) + \left[(A'\Sigma_\epsilon^{-1}A)^{-1}A'\Sigma_\epsilon^{-1}\otimes (Z'Z)^{-1}Z'\right]\mathrm{Vec}(\epsilon). \tag{21}
$$
In view of Equations (20) and (21) we get
$$
\mathrm{E}[\mathrm{Vec}(\hat{\alpha}-\alpha)\,\mathrm{Vec}(\hat{\alpha}-\alpha)'\,|\,A] = A'\Sigma_\epsilon A \otimes (Z'Z)^{-1},
$$
$$
\mathrm{E}[\mathrm{Vec}(\tilde{\alpha}-\alpha)\,\mathrm{Vec}(\tilde{\alpha}-\alpha)'\,|\,A] = (A'\Sigma_\epsilon^{-1}A)^{-1} \otimes (Z'Z)^{-1}.
$$
Since the Kronecker product of two positive semidefinite matrices is positive semidefinite as well, proving (19) requires showing that
$$
A'\Sigma_\epsilon A \;\ge\; (A'\Sigma_\epsilon^{-1}A)^{-1}. \tag{22}
$$
In order to prove inequality (22), we notice that $B'\Sigma_\epsilon B = (B'\Sigma_\epsilon^{-1}B)^{-1}$, where $B = [A, A_\perp]$.
Partitioning both sides of the equation above conformably in blocks, we get
$$
\begin{bmatrix} A'\Sigma_\epsilon A & A'\Sigma_\epsilon A_\perp \\ A_\perp'\Sigma_\epsilon A & A_\perp'\Sigma_\epsilon A_\perp \end{bmatrix}
=
\begin{bmatrix} A'\Sigma_\epsilon^{-1} A & A'\Sigma_\epsilon^{-1} A_\perp \\ A_\perp'\Sigma_\epsilon^{-1} A & A_\perp'\Sigma_\epsilon^{-1} A_\perp \end{bmatrix}^{-1}. \tag{23}
$$
Applying the rule for the inverse of a partitioned symmetric matrix to the upper-left block of the matrices in (23), we see that
$$
A'\Sigma_\epsilon A = (A'\Sigma_\epsilon^{-1}A)^{-1} + (A'\Sigma_\epsilon^{-1}A)^{-1}A'\Sigma_\epsilon^{-1}A_\perp\left[A_\perp'\Sigma_\epsilon^{-1}A_\perp - A_\perp'\Sigma_\epsilon^{-1}A(A'\Sigma_\epsilon^{-1}A)^{-1}A'\Sigma_\epsilon^{-1}A_\perp\right]^{-1}A_\perp'\Sigma_\epsilon^{-1}A(A'\Sigma_\epsilon^{-1}A)^{-1}. \tag{24}
$$
Finally, noticing that the matrix in square brackets in (24) is the conditional variance matrix of $A_\perp'\Sigma_\epsilon^{-1}\epsilon_t$ given $A'\Sigma_\epsilon^{-1}\epsilon_t$, inequality (22) trivially follows.

Moreover, the equality sign holds when $A'\Sigma_\epsilon^{-1}A_\perp = 0$, that is, when the matrix $A$ belongs to the space spanned by $r$ of the eigenvectors of $\Sigma_\epsilon^{-1}$, and consequently $A_\perp$ belongs to the space spanned by the remaining $n-r$ eigenvectors. Since the eigenvectors of $\Sigma_\epsilon^{-1}$ are the same as those of $\Sigma_\epsilon$, we conclude that the two estimators have the same variance when $A'\epsilon_t$ and $A_\perp'\epsilon_t$ are uncorrelated.
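As a sanity check (ours, not part of the paper), the two matrix facts the proof rests on can be verified numerically, assuming, as the argument requires, that $B = [A, A_\perp]$ is orthonormal: the identity $B'\Sigma_\epsilon B = (B'\Sigma_\epsilon^{-1}B)^{-1}$, and the positive semidefiniteness of $A'\Sigma_\epsilon A - (A'\Sigma_\epsilon^{-1}A)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2

# Random positive definite Sigma_eps and an orthonormal B = [A, A_perp]
C = rng.standard_normal((n, n))
Sigma = C @ C.T + n * np.eye(n)
B, _ = np.linalg.qr(rng.standard_normal((n, n)))
A, A_perp = B[:, :r], B[:, r:]

# Identity used in the proof: B' Sigma B = (B' Sigma^{-1} B)^{-1},
# which holds because B is orthogonal (B B' = I)
lhs = B.T @ Sigma @ B
rhs = np.linalg.inv(B.T @ np.linalg.inv(Sigma) @ B)
assert np.allclose(lhs, rhs)

# Inequality (22): A' Sigma A - (A' Sigma^{-1} A)^{-1} is positive semidefinite
gap = A.T @ Sigma @ A - np.linalg.inv(A.T @ np.linalg.inv(Sigma) @ A)
assert np.linalg.eigvalsh(gap).min() > -1e-8
```

The gap matrix is exactly zero when the columns of A span an eigenspace of Sigma, matching the equality case discussed above.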