Bayesian analysis of seasonally cointegrated VAR model
Justyna Wróblewska∗

Abstract
The paper aims at developing the Bayesian seasonally cointegrated model for quarterly data. We propose the prior structure, derive the set of full conditional posterior distributions, and propose the sampling scheme. The identification of cointegrating spaces is obtained via orthonormality restrictions imposed on the vectors spanning them. In the case of the annual frequency, the cointegrating vectors are complex, which should be taken into account when identifying them. The point estimation of the cointegrating spaces is also discussed. The presented methods are illustrated by a simulation experiment and are employed in an analysis of money and prices in the Polish economy.
Keywords: seasonal cointegration, reduced rank regression, error correction model, Bayesian analysis, Bayesian model comparison
JEL Classification: C11, C32, C53

∗ Cracow University of Economics, Department of Econometrics and Operational Research, Rakowicka 27, 31-510 Kraków, Poland, e-mail: [email protected]

Introduction
Many macroeconomic quarterly or monthly time series display both strong trend and strong seasonal behavior. The idea of cointegration at the zero frequency (assuming a common stochastic trend controlling the long-run behavior of the series) is well known and often employed in empirical analyses. Methods enabling estimation of the parameters of vector error correction models are well established within both the classical (see e.g. Johansen, 1995) and the Bayesian paradigm (see e.g. Koop et al., 2006). However, although the idea of cointegration at seasonal frequencies was introduced in 1990 (Hylleberg et al., 1990), it has received much less interest. In most studies, seasonally adjusted data are analyzed or the seasonality is modeled via seasonal dummies. However, there are papers documenting unwelcome effects of seasonal adjustment. Such procedures may, for example, change both the short- and long-run behavior of the series (e.g. Nerlove, 1964, Cubadda, 1999, Hecq, 1998, Granger, Siklos, 1995). Abeysinghe (1994) discusses the problems which may occur in studies where seasonality is modeled by seasonal dummies in cases when such behavior is generated by seasonal unit roots. Moreover, Löf, Franses (2001) show that taking seasonal cointegration into account improves the forecasting abilities of models. The likelihood method enabling estimation of seasonal VEC models was proposed by Johansen, Schaumburg (1999) (see also Cubadda, Omtzigt, 2005 for further discussion), but to the best of our knowledge, there are no Bayesian seasonal VEC models. This paper aims to fill this gap by building a Bayesian model for quarterly seasonally cointegrated time series. Priors for the parameters are imposed. The set of full conditional posterior distributions is obtained. Additionally, the point estimation of the cointegrating spaces is discussed.
The proposed methods are illustrated by a small simulation experiment and by an empirical analysis of money and prices in the Polish economy, conducted in a four-dimensional seasonal system.
We start the analysis with the assumption that the n-dimensional quarterly time series {y_t} has the following VAR(k) representation:

y_t = Σ_{j=1}^{k} A_j y_{t−j} + Φ D_t + ε_t,  ε_t ∼ iiN(0, Σ),  (1)

where D_t contains deterministic components such as a constant, a trend or seasonal dummies. The initial conditions y_0, y_{−1}, …, y_{−k+1} are fixed, and A(L) = I_n − A_1 L − ⋯ − A_k L^k is the polynomial matrix of the process (1). (Note that the very same problems appear in the case of filtering non-seasonal unit roots, see e.g. Meyer, Winker, 2005, Hamilton, 2018.)

By the Lagrange expansion of the polynomial A(z) around the points z_1, z_2, …, z_S, A(z) may be written as (Johansen, Schaumburg, 1999, Kotłowski, 2005):

A(z) = p(z) I_n + Σ_{s=1}^{S} A(z_s) [p_s(z) z] / [p_s(z_s) z_s] + p(z) z Ȧ(z),  (2)

where p(z) = Π_{s=1}^{S} (1 − z̄_s z), p_j(z) = Π_{s≠j} (1 − z̄_s z) = p(z)/(1 − z̄_j z) for z ≠ z_j, and Ȧ(z) is a polynomial matrix. If additionally z_s is a root of the characteristic polynomial of the process (1), i.e. |A(z_s)| = 0, then A(z_s) is of reduced rank and can be decomposed as a product of full column rank matrices, A(z_s) = a_s b_s′.

For processes cointegrated at the zero and quarterly frequencies (i.e. |A(z)| = 0 for z_1 = 1, z_2 = −1, z_3 = i, z_4 = z̄_3 = −i), the expansion (2) leads to the following seasonal error correction representation (see e.g. Hylleberg et al., 1990, Johansen, Schaumburg, 1999, Cubadda, Omtzigt, 2005, Kotłowski, 2005):

Δ_4 y_t = α_1 β_1′ ỹ_t^{(1)} + α_2 β_2′ ỹ_t^{(2)} + α_⋆ β̄_⋆′ ỹ_t^{(3)} + ᾱ_⋆ β_⋆′ ȳ̃_t^{(3)} + Σ_{j=1}^{k−4} Γ_j Δ_4 y_{t−j} + Φ̃ D̃_t + ε_t,  ε_t ∼ iiN(0, Σ),  (3)

where Δ_4 y_t = (1 − L^4) y_t = y_t − y_{t−4}, with L denoting the lag operator, and Γ_j = −Σ_{l=1}^{[(k−j)/4]} A_{j+4l}, j = 1, 2, …, k − 4.
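The factorization underlying p(z) and the filters p_s(z) can be checked numerically. The sketch below (numpy; an illustration, not the paper's code) verifies that the four unit roots 1, −1, i, −i multiply out to 1 − z^4, and that removing the root z = 1 yields the zero-frequency filter 1 + L + L^2 + L^3:

```python
import numpy as np

# Roots of 1 - z^4 on the unit circle: frequencies 0, pi, pi/2, 3pi/2.
roots = [1, -1, 1j, -1j]

def poly_prod(factors):
    """Multiply polynomials given as coefficient arrays in increasing powers of z."""
    out = np.array([1.0 + 0j])
    for f in factors:
        out = np.convolve(out, f)
    return out

# p(z) = prod_s (1 - conj(z_s) z); should equal 1 - z^4.
p = poly_prod([np.array([1.0, -np.conj(zs)]) for zs in roots])

# p_1(z) = p(z) / (1 - z): the zero-frequency filter 1 + z + z^2 + z^3.
p1 = poly_prod([np.array([1.0, -np.conj(zs)]) for zs in roots if zs != 1])
```

Each `p_s(z)` is obtained the same way by leaving out the corresponding root, which is exactly how the filtered series below are constructed.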
Vectors ỹ_t^{(·)} may contain deterministic components, so ỹ_t^{(·)} = (y_t^{(·)′}, d_t^{(·)′})′. In particular, let us consider a deterministic component of the following form: Φ D_t = μ + a cos(πt/2) + b sin(πt/2) + c cos(πt) + γt (see Franses, Kunst, 1999 and Kotłowski, 2005). Applying the polynomials p_·(L) to the stochastic and deterministic parts of the process (1) leads to the vectors ỹ_t^{(·)} of the following forms:

• at the zero frequency: ỹ_t^{(1)} = (y_t^{(1)′}, d_t^{(1)′})′, where y_t^{(1)} = p_1(L) L y_t = (1 + L + L^2 + L^3) L y_t = y_{t−1} + y_{t−2} + y_{t−3} + y_{t−4}; the term (1 + L + L^2 + L^3) L D_t leads to a linear trend d_t^{(1)} restricted to the cointegration space and an unrestricted constant,

• at the π frequency: ỹ_t^{(2)} = (y_t^{(2)′}, d_t^{(2)′})′, where y_t^{(2)} = p_2(L) L y_t = (1 − L + L^2 − L^3) L y_t = y_{t−1} − y_{t−2} + y_{t−3} − y_{t−4}; the term (1 − L + L^2 − L^3) L D_t leads to d_t^{(2)} = cos(πt) and an unrestricted constant,

• at the π/2 and 3π/2 frequencies: ỹ_t^{(3)} = (y_t^{(3)′}, d_t^{(3)′})′, where y_t^{(3)} = p_3(L) L y_t = (−i − L + iL^2 + L^3) L y_t = −i y_{t−1} − y_{t−2} + i y_{t−3} + y_{t−4}; the term (−i − L + iL^2 + L^3) L D_t leads to d_t^{(3)} = cos(πt/2) − i sin(πt/2) and an unrestricted constant.

Note that the unrestricted constant occurring at each of the above frequencies results from the linear trend assumed for the level of the analyzed process (1). If there is no linear trend, but only a constant, there is neither a trend restricted to the cointegration space at the zero frequency nor an unrestricted constant in the representation (3), but there is a constant restricted to the cointegration space at the zero frequency (d_t^{(1)} = 1); see e.g. Juselius (2006, pp. 93-112) for a discussion of the meaning of the dummies gathered in the vectors D_t and D̃_t, and of the relations between them.

Note also that α_⋆ β̄_⋆′ ỹ_t^{(3)} and ᾱ_⋆ β_⋆′ ȳ̃_t^{(3)} are complex conjugates, so their sum equals twice their real part:

α_⋆ β̄_⋆′ ỹ_t^{(3)} + ᾱ_⋆ β_⋆′ ȳ̃_t^{(3)} = 2 Re(α_⋆ β̄_⋆′ ỹ_t^{(3)}) = 2[(α_I β_R′ − α_R β_I′)(y_{t−1} − y_{t−3}) − (α_R β_R′ + α_I β_I′)(y_{t−2} − y_{t−4})],  (4)

where α_R, β_R denote the real parts of α_⋆ and β_⋆ respectively, and α_I, β_I their imaginary parts (α_⋆ = α_R + iα_I, β_⋆ = β_R + iβ_I).

This leads to the more commonly used representation of the seasonally cointegrated quarterly VAR process (see e.g. Hylleberg et al., 1990, Johansen, Schaumburg, 1999, Cubadda, Omtzigt, 2005, Kotłowski, 2005):

Δ_4 y_t = Π_1 ỹ_t^{(1)} + Π_2 ỹ_t^{(2)} + Π_3 ỹ_t^{(32)} + Π_4 ỹ_t^{(31)} + Σ_{i=1}^{k−4} Γ_i Δ_4 y_{t−i} + Φ̃ D̃_t + ε_t,  ε_t ∼ iiN(0, Σ),  (5)

where Π_1 = α_1 β_1′, Π_2 = α_2 β_2′, Π_3 = 2(−α_R β_R′ + α_I β_I′), Π_4 = 2(α_I β_R′ − α_R β_I′), ỹ_t^{(31)} = (y_t^{(31)′}, d_t^{(31)′})′ with y_t^{(31)} = (1 − L^2) L y_t = y_{t−1} − y_{t−3} and d_t^{(31)} = sin(πt/2), and ỹ_t^{(32)} = (y_t^{(32)′}, d_t^{(32)′})′ with y_t^{(32)} = (1 − L^2) L^2 y_t = y_{t−2} − y_{t−4} and d_t^{(32)} = cos(πt/2).

To save on notation we introduce the matrix form of the model (3):

Z_0 = Z_1 β_1 α_1′ + Z_2 β_2 α_2′ + Z_3 β̄_⋆ α_⋆′ + Z̄_3 β_⋆ ᾱ_⋆′ + Z_4 Γ + E,  (6)

where Z_0 = (Δ_4 y_1, Δ_4 y_2, …, Δ_4 y_T)′, Z_1 = (ỹ_1^{(1)}, ỹ_2^{(1)}, …, ỹ_T^{(1)})′, Z_2 = (ỹ_1^{(2)}, ỹ_2^{(2)}, …, ỹ_T^{(2)})′, Z_3 = (ỹ_1^{(3)}, ỹ_2^{(3)}, …, ỹ_T^{(3)})′ = −Z_{32} − iZ_{31}, Z_{31} = (ỹ_1^{(31)}, ỹ_2^{(31)}, …, ỹ_T^{(31)})′, Z_{32} = (ỹ_1^{(32)}, ỹ_2^{(32)}, …, ỹ_T^{(32)})′, Z_4 = (z_1, z_2, …, z_T)′ with z_t′ = (Δ_4 y_{t−1}′, Δ_4 y_{t−2}′, …, Δ_4 y_{t−k+4}′, D̃_t′), Γ = (Γ_1, Γ_2, …, Γ_{k−4}, Φ̃)′, and E = (ε_1, ε_2, …, ε_T)′.

As the analyzed data are informative only about the cointegration spaces, not the cointegrating vectors, during estimation we have to deal with the non-identification occurring in the products α_1 β_1′, α_2 β_2′, β̄_⋆ α_⋆′, β_⋆ ᾱ_⋆′. Therefore, we employ the methods proposed by Koop et al. (2009). Following their ideas, we consider two observationally equivalent representations of each of these products. In the A−B representation the matrices belong to the real or complex spaces of appropriate dimensions, whereas in the α−β representation the β's have orthonormal columns and the α's still belong to the real or complex spaces:

• A_1 B_1′ ≡ α_1 β_1′, A_1 ∈ R^{n×r_1}, B_1 ∈ R^{m_1×r_1}, α_1 = A_1 (B_1′B_1)^{1/2} ∈ R^{n×r_1}, β_1 = B_1 (B_1′B_1)^{−1/2} ∈ V_{r_1,m_1},

• A_2 B_2′ ≡ α_2 β_2′, A_2 ∈ R^{n×r_2}, B_2 ∈ R^{m_2×r_2}, α_2 = A_2 (B_2′B_2)^{1/2} ∈ R^{n×r_2}, β_2 = B_2 (B_2′B_2)^{−1/2} ∈ V_{r_2,m_2},

• A_⋆ B̄_⋆′ ≡ α_⋆ β̄_⋆′, A_⋆ = A_R + iA_I ∈ C^{n×r_3}, B_⋆ = B_R + iB_I ∈ C^{m_3×r_3}, α_⋆ = A_⋆ (B̄_⋆′B_⋆)^{1/2} ∈ C^{n×r_3}, β_⋆ = B_⋆ (B̄_⋆′B_⋆)^{−1/2} ∈ V^C_{r_3,m_3},

where V_{r_j,m_j}, j = 1, 2, denotes the Stiefel manifold, i.e. the set of m_j × r_j matrices with orthonormal columns (V_{r_j,m_j} = {X ∈ R^{m_j×r_j} : X′X = I_{r_j}}), and V^C_{r_3,m_3} stands for the complex Stiefel manifold, i.e. the set of m_3 × r_3 semi-unitary matrices (V^C_{r_3,m_3} = {X ∈ C^{m_3×r_3} : X̄′X = I_{r_3}}).

Note that with this approach the non-identification issue is only partially solved, as there is a many-to-one relationship between the Stiefel manifolds and the Grassmann manifolds to which the cointegration spaces belong: if X is an element of the (complex) Stiefel manifold and the r_j × r_j (j = 1, 2, 3) matrix O is an element of the (complex) orthogonal group (O′O = OO′ = I_{r_j} for j = 1, 2; Ō′O = OŌ′ = I_{r_3}), then XO is an element of the same (complex) Stiefel manifold and X and XO span the same element of the Grassmann manifold G_{r_j,m_j−r_j} (j = 1, 2) or G^C_{r_3,m_3−r_3}, collecting the r_j-dimensional planes passing through the origin in the real (j = 1, 2) or complex (j = 3) m_j-dimensional vector space (see e.g. James, 1954, Chern, Wolfson, 1987): XX′ = XOO′X′ in the real case and XX̄′ = XOŌ′X̄′ in the complex case.

The first two products (i.e. α_1 β_1′ and α_2 β_2′) involve only real matrices, so they can be treated exactly as proposed by Koop et al. (2009).
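The mapping between the two parameterizations can be illustrated in a few lines. The sketch below (numpy; an illustration, not the authors' code) normalizes a random (A, B) pair into the α−β form via the symmetric square root of B′B, covering both the real and the complex (conjugate-transpose) case:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 4, 4, 2

def normalize(A, B):
    """Map an (A, B) pair to the alpha-beta parameterization:
    beta has orthonormal columns, alpha absorbs (B'B)^{1/2}."""
    # symmetric (Hermitian) square root of B'B via eigendecomposition
    w, V = np.linalg.eigh(B.conj().T @ B)
    half = V @ np.diag(np.sqrt(w)) @ V.conj().T
    halfinv = V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T
    return A @ half, B @ halfinv   # alpha, beta

# real case (frequencies 0 and pi)
A = rng.standard_normal((n, r))
B = rng.standard_normal((m, r))
alpha, beta = normalize(A, B)

# complex case (annual frequency) uses the conjugate transpose throughout
Ac = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
Bc = rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))
alc, bec = normalize(Ac, Bc)
```

By construction the products A B′ and α β′ coincide while β lands on the (complex) Stiefel manifold, which is precisely the observational equivalence exploited in the sampler.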
In the third case, their procedures have to be adjusted to complex matrices and spaces.

We start the analysis with the A−B parameterization and impose the following prior distributions on the model parameters:

• the inverse Wishart distribution for the covariance matrix: Σ ∼ iW(S, q),

• the matrix normal distribution for Γ: Γ | Σ, ν ∼ mN(μ_Γ, Σ, νΩ_Γ),

• the matrix normal distribution for the adjustment coefficients at frequency 0: A_1 | Σ, ν ∼ mN(μ_1, νΩ_1, Σ),

• the matrix normal distribution for the un-normalized cointegrating vectors at frequency 0: B_1 ∼ mN(0, m_1^{−1} I_{r_1}, P_1), which leads to the matrix angular central Gaussian distribution for its orientation, β_1 ∼ MACG(P_1) (see Chikuse, 1990, 2003); via the matrix P_1 the researcher can incorporate prior knowledge about the cointegration space at the zero frequency (see Koop et al., 2009 for details),

• the matrix normal distribution for the adjustment coefficients at frequency π: A_2 | Σ, ν ∼ mN(μ_2, νΩ_2, Σ),

• the matrix normal distribution for the un-normalized cointegrating vectors at frequency π: B_2 ∼ mN(0, m_2^{−1} I_{r_2}, P_2), so β_2 ∼ MACG(P_2) (see the explanation stated for B_1),

• the complex matrix normal distribution for the adjustment coefficients at frequencies π/2 and 3π/2: A_⋆ | Σ, ν ∼ mCN(μ_⋆, νI_{r_3}, Σ), i.e.

p(A_⋆ | Σ) = π^{−nr_3} ν^{−nr_3} |Σ|^{−r_3} exp{−tr Σ^{−1} (A_⋆ − μ_⋆)(ν^{−1} I_{r_3})(Ā_⋆ − μ̄_⋆)′},

so E(A_⋆) = μ_⋆ and V(vec(A_⋆)) = νI_{r_3} ⊗ Σ, where μ_⋆ = μ_{⋆R} + iμ_{⋆I}. Note that imposing such a distribution on A_⋆ is equivalent to assuming that

(A_R′, A_I′)′ | Σ ∼ mN((μ_{⋆R}′, μ_{⋆I}′)′, νI_{r_3}, (1/2) diag(Σ, Σ)),

with 0_{n×n} off-diagonal blocks,

• the complex matrix normal distribution for the un-normalized cointegrating vectors at frequencies π/2 and 3π/2: B_⋆ ∼ mCN(0_{m_3×r_3}, m_3^{−1} I_{r_3}, P_⋆), where P_⋆ = P_{⋆R} + iP_{⋆I} is a Hermitian (P_⋆ = P̄_⋆′, i.e. P_{⋆R} = P_{⋆R}′, P_{⋆I} = −P_{⋆I}′) positive definite matrix, so for the real and imaginary parts of B_⋆ we impose the matrix normal distribution of the following form:

(B_R′, B_I′)′ ∼ mN(0_{2m_3×r_3}, (2m_3)^{−1} I_{r_3}, [P_{⋆R}, −P_{⋆I}; P_{⋆I}, P_{⋆R}]).

Such a distribution leads to the complex matrix angular central Gaussian distribution for the orientation part of the matrix B_⋆ (see Wróblewska, 2020).

• The parameter ν may be estimated or set by the researcher. In the case of estimated ν we propose to impose the inverse gamma distribution: ν ∼ iG(s_ν, n_ν).

The joint prior distribution is truncated by the non-explosiveness condition, taking into account the appropriate numbers of unit roots at each frequency. The prior distributions proposed above, together with the likelihood function, lead to the joint posterior distribution with the following kernel:

p(θ | y) ∝ ν^{−n_ν − n[n(k−4)+l+r_1+r_2+2r_3]/2 − 1} exp(−s_ν/ν) |Σ|^{−[q+n(k−4)+l+r_1+r_2+2r_3+T+n+1]/2} ×
× exp{−(1/2) tr[Σ^{−1}(S + (1/ν)(Γ − μ_Γ)′Ω_Γ^{−1}(Γ − μ_Γ) + (2/ν)(A_⋆ − μ_⋆)(Ā_⋆ − μ̄_⋆)′)]} ×  (7)
× exp{−(1/2) tr[Σ^{−1}((1/ν)(A_1 − μ_1)Ω_1^{−1}(A_1 − μ_1)′ + (1/ν)(A_2 − μ_2)Ω_2^{−1}(A_2 − μ_2)′ + E′E)]} ×
× exp{−(1/2) tr[m_1 B_1′P_1^{−1}B_1 + m_2 B_2′P_2^{−1}B_2 + 2m_3 B̄_⋆′P_⋆^{−1}B_⋆]} I_{[0,1]}(|λ|_max),

where θ = (Σ, Γ, A_1, A_2, A_⋆, B_1, B_2, B_⋆) collects all model parameters, l denotes the number of deterministic components gathered in D̃_t, I_{[a,b]}(·) is the indicator function of the interval [a, b], and λ stands for the vector of eigenvalues of the companion matrix, that is the matrix of the form

A = [A_1, A_2, …, A_{k−1}, A_k; I_n, 0, …, 0, 0; 0, I_n, …, 0, 0; ⋮, , ⋱, , ⋮; 0, 0, …, I_n, 0],  (8)

where I_n is the n-dimensional identity matrix, A_1 = Π_1 + Π_2 + Π_4 + Γ_1, A_2 = Π_1 − Π_2 + Π_3 + Γ_2, A_3 = Π_1 + Π_2 − Π_4 + Γ_3, A_4 = I_n + Π_1 − Π_2 − Π_3 + Γ_4, A_i = Γ_i − Γ_{i−4} for i = 5, 6, …, k, Π_1 = α_1β_1′ ≡ A_1B_1′, Π_2 = α_2β_2′ ≡ A_2B_2′, Π_4 = 2(α_Iβ_R′ − α_Rβ_I′) ≡ 2(A_IB_R′ − A_RB_I′), Π_3 = 2(−α_Rβ_R′ + α_Iβ_I′) ≡ 2(−A_RB_R′ + A_IB_I′), and Γ_i = 0 for i > k − 4. The matrix A makes it possible to write the analyzed process in the VAR(1) form (see e.g. Lütkepohl, 2005, pp. 15-16).

From the posterior kernel (7) we obtain the set of full conditional posterior distributions of the model parameters:

• the inverse Wishart distribution for the covariance matrix of the errors Σ:

p(Σ | ·, y) = f_iW(S̄, q + n(k−4) + l + r_1 + r_2 + 2r_3 + T),  (9)

where

S̄ = S + (1/ν)[(Γ − μ_Γ)′Ω_Γ^{−1}(Γ − μ_Γ) + 2(A_⋆ − μ_⋆)(Ā_⋆ − μ̄_⋆)′ + (A_1 − μ_1)Ω_1^{−1}(A_1 − μ_1)′ + (A_2 − μ_2)Ω_2^{−1}(A_2 − μ_2)′] + E′E,  (10)

• the inverse gamma distribution for ν (if it is estimated):

p(ν | ·, y) = f_iG(s̄_ν, n̄_ν),  (11)

where n̄_ν = n_ν + n[n(k−4) + l + r_1 + r_2 + 2r_3]/2 and

s̄_ν = s_ν + (1/2) tr{Σ^{−1}[(Γ − μ_Γ)′Ω_Γ^{−1}(Γ − μ_Γ) + 2(A_⋆ − μ_⋆)(Ā_⋆ − μ̄_⋆)′ + (A_1 − μ_1)Ω_1^{−1}(A_1 − μ_1)′ + (A_2 − μ_2)Ω_2^{−1}(A_2 − μ_2)′]},  (12)

• the matrix normal distribution for Γ:

p(Γ | ·, y) = f_mN(μ̄_Γ, Σ, Ω̄_Γ),  (13)

where Ω̄_Γ = ((1/ν)Ω_Γ^{−1} + Z_4′Z_4)^{−1} and μ̄_Γ = Ω̄_Γ[(1/ν)Ω_Γ^{−1}μ_Γ + Z_4′(Z_0 − Z_1B_1A_1′ − Z_2B_2A_2′ − 2Re(Z_3B̄_⋆A_⋆′))],

• the matrix normal distribution for A_1:

p(A_1 | ·, y) = f_mN(μ̄_1, Ω̄_1, Σ),  (14)

where Ω̄_1 = ((1/ν)Ω_1^{−1} + B_1′Z_1′Z_1B_1)^{−1} and μ̄_1 = [(1/ν)μ_1Ω_1^{−1} + (Z_0 − Z_2B_2A_2′ − 2Re(Z_3B̄_⋆A_⋆′) − Z_4Γ)′Z_1B_1]Ω̄_1,

• the matrix normal distribution for A_2:

p(A_2 | ·, y) = f_mN(μ̄_2, Ω̄_2, Σ),  (15)

where Ω̄_2 = ((1/ν)Ω_2^{−1} + B_2′Z_2′Z_2B_2)^{−1} and μ̄_2 = [(1/ν)μ_2Ω_2^{−1} + (Z_0 − Z_1B_1A_1′ − 2Re(Z_3B̄_⋆A_⋆′) − Z_4Γ)′Z_2B_2]Ω̄_2,

• the matrix normal distribution for A_RI = (A_R, A_I)′:

p(A_RI | ·, y) = f_mN(μ̄_RI, Σ, Ω̄_RI),  (16)

where Ω̄_RI = ((2/ν)I_{2r_3} + X_⋆′X_⋆)^{−1}, μ̄_RI = Ω̄_RI[(2/ν)μ_RI + X_⋆′(Z_0 − Z_1B_1A_1′ − Z_2B_2A_2′ − Z_4Γ)], μ_RI = (μ_{⋆R}, μ_{⋆I})′, and X_⋆ = 2[Re(Z_3B̄_⋆), −Im(Z_3B̄_⋆)] = 2[−(Z_{32}B_R + Z_{31}B_I), Z_{31}B_R − Z_{32}B_I],

• the normal distribution for the vector b_1 = vec(B_1):

p(b_1 | ·, y) = f_N(μ̄_{b1}, Ω̄_{b1}),  (17)

where Ω̄_{b1} = [(m_1 I_{r_1} ⊗ P_1^{−1}) + (A_1′Σ^{−1}A_1 ⊗ Z_1′Z_1)]^{−1} and μ̄_{b1} = Ω̄_{b1} vec[Z_1′(Z_0 − Z_2B_2A_2′ − 2Re(Z_3B̄_⋆A_⋆′) − Z_4Γ)Σ^{−1}A_1],

• the normal distribution for the vector b_2 = vec(B_2):

p(b_2 | ·, y) = f_N(μ̄_{b2}, Ω̄_{b2}),  (18)

where Ω̄_{b2} = [(m_2 I_{r_2} ⊗ P_2^{−1}) + (A_2′Σ^{−1}A_2 ⊗ Z_2′Z_2)]^{−1} and μ̄_{b2} = Ω̄_{b2} vec[Z_2′(Z_0 − Z_1B_1A_1′ − 2Re(Z_3B̄_⋆A_⋆′) − Z_4Γ)Σ^{−1}A_2],

• the normal distribution for the vector b_R = vec(B_R):

p(b_R | ·, y) = f_N(μ̄_{bR}, Ω̄_{bR}),  (19)

where Ω̄_{bR} = [x_{bR}′(Σ^{−1} ⊗ I_T)x_{bR} + 2(m_3 I_{r_3} ⊗ P_{⋆R}^{−1})]^{−1}, μ̄_{bR} = 2Ω̄_{bR} vec[Z_{31}′Y_{bR}Σ^{−1}A_I − Z_{32}′Y_{bR}Σ^{−1}A_R], x_{bR} = 2(A_I ⊗ Z_{31}) − 2(A_R ⊗ Z_{32}), and Y_{bR} = Z_0 − Z_1B_1A_1′ − Z_2B_2A_2′ − Z_4Γ + 2Z_{31}B_IA_R′ + 2Z_{32}B_IA_I′,

• the normal distribution for the vector b_I = vec(B_I):

p(b_I | ·, y) = f_N(μ̄_{bI}, Ω̄_{bI}),  (20)

where Ω̄_{bI} = [x_{bI}′(Σ^{−1} ⊗ I_T)x_{bI} + 2(m_3 I_{r_3} ⊗ (P_{⋆R} + P_{⋆I}P_{⋆R}^{−1}P_{⋆I})^{−1})]^{−1}, μ̄_{bI} = 2Ω̄_{bI} vec[m_3(P_{⋆R} + P_{⋆I}P_{⋆R}^{−1}P_{⋆I})^{−1}P_{⋆I}P_{⋆R}^{−1}B_R − Z_{31}′Y_{bI}Σ^{−1}A_R − Z_{32}′Y_{bI}Σ^{−1}A_I], x_{bI} = −2(A_R ⊗ Z_{31}) − 2(A_I ⊗ Z_{32}), and Y_{bI} = Z_0 − Z_1B_1A_1′ − Z_2B_2A_2′ − Z_4Γ − 2Z_{31}B_RA_I′ + 2Z_{32}B_RA_R′.

Having the set of full conditional posterior distributions, a pseudo-random sample from the joint posterior distribution may be obtained with the help of the Gibbs sampler, similarly as in Koop et al. (2009) for the CI(1,1) case. In the first step the initial values Σ^{(0)}, ν^{(0)}, Γ^{(0)}, A_1^{(0)}, B_1^{(0)}, A_2^{(0)}, B_2^{(0)}, A_⋆^{(0)}, B_⋆^{(0)} are set, and then the following steps are iterated:

• draw Σ^{(s)} from the inverse Wishart distribution (9),
• draw ν^{(s)} from the inverse gamma distribution (11), if it is estimated,
• draw Γ^{(s)} from the matrix normal distribution (13),
• draw A_1^{(s)} from the matrix normal distribution (14),
• draw vec(B_1)^{(s)} from the normal distribution (17) and reshape it to obtain B_1^{(s)},
• obtain β_1^{(s)} = B_1^{(s)}(B_1^{(s)′}B_1^{(s)})^{−1/2} and α_1^{(s)} = A_1^{(s)}(B_1^{(s)′}B_1^{(s)})^{1/2},
• draw A_2^{(s)} from the matrix normal distribution (15),
• draw vec(B_2)^{(s)} from the normal distribution (18) and reshape it to obtain B_2^{(s)},
• obtain β_2^{(s)} = B_2^{(s)}(B_2^{(s)′}B_2^{(s)})^{−1/2} and α_2^{(s)} = A_2^{(s)}(B_2^{(s)′}B_2^{(s)})^{1/2},
• draw A_R^{(s)} and A_I^{(s)} from the matrix normal distribution (16),
• draw vec(B_R)^{(s)} from the normal distribution (19) and reshape it to obtain B_R^{(s)},
• draw vec(B_I)^{(s)} from the normal distribution (20) and reshape it to obtain B_I^{(s)},
• set A_⋆^{(s)} = A_R^{(s)} + iA_I^{(s)} and B_⋆^{(s)} = B_R^{(s)} + iB_I^{(s)},
• obtain β_⋆^{(s)} = B_⋆^{(s)}(B̄_⋆^{(s)′}B_⋆^{(s)})^{−1/2} and α_⋆^{(s)} = A_⋆^{(s)}(B̄_⋆^{(s)′}B_⋆^{(s)})^{1/2},
• check the non-explosiveness condition; if it is fulfilled, keep the draws and increase the iteration counter.

Note that in models with unit roots at various frequencies the non-explosiveness condition should be examined carefully, taking into account the exact number of unit roots at the particular frequencies. The square root of the complex Hermitian matrix (B̄_⋆^{(s)′}B_⋆^{(s)}) may be obtained with the Newton method proposed by Higham (1986).

Point estimation of the cointegration space
Information about the cointegration spaces obtained from the data may be summarized in a point estimate of these spaces and a measure of the dispersion of their posterior distributions. Villani (2006) proposed to employ the Frobenius (Hilbert-Schmidt) matrix norm to build the loss function needed for point estimation of the real cointegration space. The same approach can be used to estimate complex spaces (see e.g. Srivastava, 2000). Applying the Frobenius matrix norm ‖A‖_F = (tr(Ā′A))^{1/2} to the projection matrices, we can build a loss function of the following form:

l(β, β̃) = ‖ββ̄′ − β̃β̄̃′‖_F^2 = 2(r − tr(ββ̄′β̃β̄̃′)),  (21)

where r denotes the number of cointegrating vectors. The loss function (21) reaches its minimum at

β̂ = (ν_1, ν_2, …, ν_r),  (22)

where ν_i (i = 1, 2, …, r) is the eigenvector of the matrix E(ββ̄′) corresponding to its i-th largest eigenvalue (see Chikuse, 2003, Villani, 2006). The numerical evaluation of β̂ is obtained with the use of the pseudo-random sample from the posterior distribution of β, {β^{(s)}, s = 1, 2, …, S}, by approximating E(ββ̄′) with S^{−1} Σ_{s=1}^{S} β^{(s)}β̄^{(s)′}. Following Villani (2006) we use the projective Frobenius span variation:

τ_sp(β) = [r − Σ_{i=1}^{r} λ_i] / [r(m − r)/m] ∈ [0, 1],  (23)

where λ_i is the i-th largest eigenvalue of E(ββ̄′). The measure τ_sp(β) reaches its minimum value when the distribution is degenerate, whereas it hits the maximum value for the uniform distribution over the (complex) Grassmann manifold (G_{r,m−r}).
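A minimal numerical sketch of this point estimator (numpy; illustrative only): it averages the projectors β^{(s)}β̄^{(s)′} over posterior draws, takes the r leading eigenvectors as β̂, and evaluates τ_sp:

```python
import numpy as np

def space_estimate(draws):
    """Point estimate of a (possibly complex) cointegration space from
    posterior draws of beta with orthonormal columns: the r leading
    eigenvectors of the averaged projector approximate E(beta beta^H)."""
    S = len(draws)
    m, r = draws[0].shape
    P = sum(b @ b.conj().T for b in draws) / S      # approximates E(beta beta^H)
    w, V = np.linalg.eigh(P)                        # eigenvalues in ascending order
    beta_hat = V[:, ::-1][:, :r]                    # r leading eigenvectors
    tau = (r - w[::-1][:r].sum()) / (r * (m - r) / m)   # span variation in [0, 1]
    return beta_hat, tau
```

For a degenerate posterior (all draws spanning the same plane) τ_sp is zero, and β̂ spans that plane, mirroring the properties stated above.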
Simulation experiment

As a first illustration of the proposed methods, we perform a small simulation study. We use one of the data generating processes proposed by Cubadda, Omtzigt (2005). We simulate 250 data points from:

Δ_4 y_t = A_1B_1′y_t^{(1)} + A_2B_2′y_t^{(2)} + A_⋆B̄_⋆′y_t^{(3)} + Ā_⋆B_⋆′ȳ_t^{(3)} + Γ_1Δ_4 y_{t−1} + ε_t,  ε_t ∼ iiN(0, Σ),  (24)

where B_1 = B_2 = (·, −·)′, B_⋆ = (·, ·)′ + i(·, ·)′, A_1 = (−0.·, ·)′, A_2 = (0.·, ·)′, A_⋆ = i(0.·, ·)′, Γ_1 = [0.·, −0.·; −0.·, 0.·] and Σ = [·, −√·; −√·, 0.·] (the dot entries mark digits lost in this source). Initial values are set to zero; the first 50 points are then discarded, so we are left with 200 modeled data points. We impose the following priors:

• Σ ∼ iW(0.·I, ·),
• Γ_1 | Σ ∼ mN(0, Σ, νI),
• A_1 | Σ ∼ mN(0_{n×r_1}, νI_{r_1}, Σ),
• B_1 ∼ mN(0, m_1^{−1}I_{r_1}, 0.·I),
• A_2 | Σ ∼ mN(0_{n×r_2}, νI_{r_2}, Σ),
• B_2 ∼ mN(0, m_2^{−1}I_{r_2}, 0.·I),
• A_⋆ | Σ ∼ mCN(0_{n×r_3}, νI_{r_3}, Σ),
• B_⋆ ∼ mCN(0_{m_3×r_3}, m_3^{−1}I_{r_3}, 0.·I),
• ν ∼ iG(1, ·).

Table 1 gathers the results of the Bayesian model comparison (see the appendix for more information about the employed methods). Note:
The symbol d denotes the type of deterministic components (d = 1: a trend restricted to the cointegration space at the zero frequency and an unrestricted constant; d = 2: an unrestricted constant; d = 3: a constant restricted to the cointegration relation at the zero frequency; d = 4: no deterministic components), s can be 0 (no seasonal dummies) or 1 (a model with seasonal dummies), and r_1, r_2, r_3 ∈ {0, 1, 2}. The true model is M_{4,0,1,1,1}, and the prior probability of each of the compared specifications is equal. Source:
Own calculations based on 1.5 million draws from the prior distribution.
Table 1: The most probable models in the simulation experiment (p(M_{d,s,r_1,r_2,r_3} | y) > 0.001).

There are 3 models with posterior probability higher than 0.001, and together they gather 0.997 of the probability mass. The true model is in second place, with posterior probability equal to 0.383. All the models displayed in Table 1 have the proper numbers of cointegrating vectors, have no seasonal dummies, and differ only in the type of deterministic components. In Table 2 we present the marginal posterior probabilities of the models' features. Source:
Own calculations based on 1.5 million draws from the prior distribution.
Table 2: Posterior probabilities of the model features.

The results of the model comparison generally point correctly to the proper model characteristics, with the exception of the type of deterministic components: the models with a constant restricted to the cointegrating space at the zero frequency gather 0.606 of the posterior probability, which is approximately 1.5 times more than the whole posterior probability of the models without a constant, i.e. of the true specification. In the next step of this small simulation experiment we estimate the true model and present the distances between the obtained cointegrating spaces and the assumed ones (Table 3).

frequency | assumed vector (β_i) | point estimate (β̂_i) | τ_sp(β_i) | l(β_i, β̂_i)
0 (i = 1) | (1/√2, −1/√2)′ | (−0.·, 0.·)′ | · | ·
π (i = 2) | (1/√2, −1/√2)′ | (−0.·, 0.·)′ | · | ·
π/2, 3π/2 (i = ⋆) | (1/√2, ·)′ + i(·, 1/√2)′ | (0.·, 0.·)′ + i(·, −0.·)′ | · | ·

Note: the dot entries mark digits lost in this source; 1/√2 ≈ 0.71. Source:
Own calculations based on 200,000 accepted draws, preceded by 100,000 discarded draws from the posterior distribution.
Table 3: The results of the cointegrating spaces' estimation in the simulation experiment.

At first glance one may notice significant differences between the point estimates of the cointegrating vectors at the annual frequency and the assumed ones, but it should be remembered that the data contain information only about the cointegrating spaces, not the vectors, and the distance between the true and the estimated space is very low (see the last column of Table 3). Generally, the results of the performed simulation experiment prove that the proposed procedures work well, as the differences between the true and estimated spaces at all frequencies are negligible. Moreover, the values of the measure τ_sp(β) are close to zero, so the posterior distributions of the cointegrating spaces are almost degenerate, which can be expected in the analysis of artificial data. We now proceed to employ the proposed model in a real-data analysis.

Empirical illustration
In the empirical analysis we consider a four-dimensional time series consisting of GDP in constant prices of 2010, the consumer price index (CPI), the broad monetary aggregate M3, and the spread between long- and short-term interest rates, approximated as the difference between the 10-year bond yield and the 3-month WIBOR. The quarterly data cover the period 2002Q1-2019Q4. A similar model was analyzed by Kotłowski (2005). Figure 1 depicts the analyzed time series. Seasonality may be observed in the paths of all the considered series, with the strongest seasonal pattern visible in GDP. The trending behavior of the series is also noticeable.
Figure 1: The analyzed data.

Following Franses (1994) and Engle et al. (1993) we also present the unit-root transformations of the analyzed time series (Figure 2), where the trending behavior and seasonal patterns are more visible. The seasonal variation of GDP differs from the changes observed in the rest of the series.
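The transformations shown in Figure 2 can be computed directly by lagging the series; the sketch below (numpy, illustrative only) returns y^{(1)}, y^{(2)}, y^{(31)}, y^{(32)} and the annual difference for a series of length T:

```python
import numpy as np

def hegy_transforms(y):
    """Quarterly unit-root transformations of a series y of length T
    (1-D or (T, n)); each returned series has length T - 4 and follows
    y1_t = y_{t-1}+y_{t-2}+y_{t-3}+y_{t-4}, etc."""
    y = np.asarray(y, dtype=float)
    l1, l2, l3, l4 = y[3:-1], y[2:-2], y[1:-3], y[:-4]
    y1 = l1 + l2 + l3 + l4          # zero frequency: (1+L+L^2+L^3) L y_t
    y2 = l1 - l2 + l3 - l4          # pi frequency:   (1-L+L^2-L^3) L y_t
    y31 = l1 - l3                   # (1-L^2) L   y_t
    y32 = l2 - l4                   # (1-L^2) L^2 y_t
    d4 = y[4:] - l4                 # annual difference (1-L^4) y_t
    return y1, y2, y31, y32, d4
```

Applied to a pure linear trend, y^{(1)} keeps the trend while y^{(2)}, y^{(31)}, y^{(32)} and the annual difference are constant, which is the behavior visible in the panels of Figure 2.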
Figure 2: The unit-root transformations of the analyzed data.

To fully define the Bayesian seasonally cointegrated VAR model we impose the following priors:

• Σ ∼ iW(0.·I, ·),
• Γ | Σ ∼ mN(0, Σ, νI_l), where l denotes the number of dummies outside the cointegration spaces,
• A_1 | Σ ∼ mN(0_{n×r_1}, νI_{r_1}, Σ),
• B_1 ∼ mN(0, m_1^{−1}I_{r_1}, 0.·I),
• A_2 | Σ ∼ mN(0_{n×r_2}, νI_{r_2}, Σ), B_2 ∼ mN(0, m_2^{−1}I_{r_2}, 0.·I),
• A_⋆ | Σ ∼ mCN(0_{n×r_3}, νI_{r_3}, Σ),
• B_⋆ ∼ mCN(0_{m_3×r_3}, m_3^{−1}I_{r_3}, 0.·I),
• ν ∼ iG(1, ·),

where the dot entries mark digits lost in this source. In order to check the nature of the analyzed series we start the analysis with a Bayesian model comparison. The models may differ in the number of cointegrating relations (r_j ∈ {0, 1, 2, 3, 4} for j = 1, 2, 3) at the zero (j = 1), π (j = 2) and π/2, 3π/2 (j = 3) frequencies. We consider models with a linear trend restricted to the cointegration space at the zero frequency and an unrestricted constant (d = 1), models with an unrestricted constant (d = 2), specifications with a constant restricted to the cointegration space at the zero frequency (d = 3), and models without a constant (d = 4). Models without (s = 0) and with (s = 1) seasonal dummies were examined. Each of the considered specifications has five lags in the VAR representation. After excluding impossible feature combinations and keeping one representative of each set of observationally equivalent models, we are left with 784 pairwise different models. We assume an equal prior probability for each specification, i.e. p(M_{d,s,r_1,r_2,r_3}) = 1/784 ≈ 0.0013.
Note that imposing a uniform probability on the model space does not lead to a uniform prior distribution of the model features (see the numbers in parentheses in Table 5). The models with posterior probability higher than 0.01 are displayed in Table 4. Almost all of the listed models assume that the analyzed time series may be treated as a realization of a process with 3 or 4 bi-annual relations (note that 4 means stability at the π frequency). The models ranked in first and second place assume one long-run relationship and an unrestricted constant; they differ only in the number of cointegrating vectors at the bi-annual frequency (4 or 3). Note that, according to the results displayed in Table 5, models with two relations at the zero frequency are the most probable and gather 0.409 of the posterior probability, whereas models with one relation gather 0.363. Models assuming stability at the bi-annual frequency together obtained 0.432 of the probability mass, but there is still evidence of cointegration at this frequency, as the whole posterior probability of the models assuming it is higher and equals 0.562. The posterior probability of the number of cointegrating relations at the annual frequency is also diffuse, but the whole probability of cointegration at this frequency equals 0.949, so there is strong confirmation of the existence of cointegration of this type. Source:
Own calculations based on 1.5 million draws from the prior distribution.
Table 4: The most probable models ( p ( M d,s,r ,r ,r | y ) > . ).18 ( d = 1 | y ) p ( d = 2 | y ) p ( d = 3 | y ) p ( d = 4 | y ) p ( s = 0 | y ) p ( s = 1 | y ) p ( r = 0 | y ) p ( r = 1 | y ) p ( r = 2 | y ) p ( r = 3 | y ) p ( r = 4 | y ) p ( r = 0 | y ) p ( r = 1 | y ) p ( r = 2 | y ) p ( r = 3 | y ) p ( r = 4 | y ) p ( r = 0 | y ) p ( r = 1 | y ) p ( r = 2 | y ) p ( r = 3 | y ) p ( r = 4 | y ) Note:
In parentheses - the assumed prior probabilities.
Source:
Own calculations based on 1.5 million draws from the prior distribution.
Table 5: Posterior probabilities of the models’ features.

The discussion of the nature of the analyzed time series will be complemented by point estimation of the cointegration spaces in the most probable model.

β̂ = (−0.…, 0.…, −0.…, 0.…)′, τ_{sp(β)} = 0.… (25)

The posterior distribution of the cointegration space at the zero frequency is quite diffuse, as the measure τ_{sp(β)} lies roughly in the middle of the interval [0, 1]. Visual inspection of the deviations from the obtained cointegrating relation (graphed in Figure 3) confirms its stationarity.

The estimation results for the cointegration space at the annual frequency are a bit surprising. The measure τ_{sp(β⋆)} is just over 0.9, so the posterior distribution of the cointegration space is almost uniform.

β̂⋆ = (…) + i (…), τ_{sp(β⋆)} = 0.… (26)

Deviations from the obtained relations seem to be stationary, but a closer look at the first relation shows that its real and imaginary parts are almost the same. Moreover, these paths resemble the dynamics of the transformation of SPREAD, i.e. SPREAD_t − SPREAD_{t−2} (see Figure 4). Such results may indicate stationarity of SPREAD at the annual frequency. This hypothesis may be checked by a Bayesian comparison of models with and without such a stationarity restriction imposed, but this is left for further research. A similar remark applies to the 3rd relation.

[Figure 3 panels: ŷ_t^{(1)}, Re(β̂⋆′ y_t^{(3)}), Im(β̂⋆′ y_t^{(3)}).]
Source:
Own calculations.
Figure 3: Deviations from the obtained cointegrating relations.

[Figure 4 series: Re[1. relation](t+1), Im[1. relation](t), SPREAD(t) − SPREAD(t−2).]
Source:
Own calculations.
Figure 4: Comparison of
SPREAD_t − SPREAD_{t−2} and deviations from the 1st cointegrating relation at the annual frequency.

In this paper, a Bayesian seasonally cointegrated vector error correction model for quarterly data was introduced. The empirical usefulness of the discussed methods was illustrated by the analysis of a four-dimensional time series. The results of the model comparison indicate that the data support the hypothesis of cointegration at the zero and annual frequencies. There is also evidence of cointegration at the bi-annual frequency. This paper focuses only on Bayesian model comparison and point estimation in the most probable model, but as the posterior distribution over model specifications is diffuse, it would be useful to take advantage of Bayesian knowledge pooling in the set of the most probable models. Moreover, as there are papers demonstrating the risk of omitting seasonal cointegration (see Introduction), the presented research can also be extended by a comparison of forecasts from models taking into account relationships at seasonal frequencies and those allowing only for long-run relations. A similar comparison may be performed for structural analyses. Such an examination is left for further research.
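The knowledge pooling mentioned above can be read as Bayesian model averaging: a quantity of interest shared by all specifications is combined with the posterior model probabilities as weights. A minimal sketch with purely hypothetical weights and forecasts (none of these numbers come from the paper):

```python
def model_average(posteriors, forecasts):
    """Pool model-specific point forecasts by posterior model probabilities."""
    total = sum(posteriors.values())
    # renormalise in case only the most probable models are retained
    return sum(posteriors[m] / total * forecasts[m] for m in posteriors)

# hypothetical posterior probabilities of three retained models
post = {"M1": 0.20, "M2": 0.15, "M3": 0.05}
# hypothetical model-specific point forecasts of the same quantity
fc = {"M1": 1.0, "M2": 2.0, "M3": 4.0}

pooled = model_average(post, fc)  # (0.5*1 + 0.375*2 + 0.125*4) = 1.75
```

The same weighting applies to predictive densities or impulse responses, which is what a forecast comparison across the retained model set would build on.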
References
Abeysinghe, T. (1994). Deterministic seasonal models and spurious regressions. Journal of Econometrics, 61(2), 259-272.
Chern, S. S., Wolfson, J. G. (1987). Harmonic maps of the two-sphere into a complex Grassmann manifold II. Annals of Mathematics, 125(2), 301-335.
Chikuse, Y. (1990). The matrix angular central Gaussian distribution. Journal of Multivariate Analysis, 33(2), 265-274.
Chikuse, Y. (2003). Statistics on special manifolds. Lecture Notes in Statistics (Vol. 174). Springer Science & Business Media.
Cubadda, G. (1999). Common cycles in seasonal non-stationary time series. Journal of Applied Econometrics, 14(3), 273-291.
Cubadda, G., Omtzigt, P. (2005). Small-sample improvements in the statistical analysis of seasonally cointegrated systems. Computational Statistics & Data Analysis, 49(2), 333-348.
Engle, R. F., Granger, C. W. J., Hylleberg, S. (1993). Seasonal cointegration: The Japanese consumption function. Journal of Econometrics, 55, 275-298.
Franses, P. H. (1994). A multivariate approach to modeling univariate seasonal time series. Journal of Econometrics, 63(1), 133-151.
Franses, P. H., Kunst, R. M. (1999). On the role of seasonal intercepts in seasonal cointegration. Oxford Bulletin of Economics and Statistics, 61(3), 409-433.
Ghysels, E., Perron, P. (1993). The effect of seasonal adjustment filters on tests for a unit root. Journal of Econometrics, 55(1-2), 57-98.
Ghysels, E., Lee, H. S., Siklos, P. L. (1993). On the (mis)specification of seasonality and its consequences: an empirical investigation with US data. Empirical Economics, 18(4), 747-760.
Ghysels, E., Granger, C. W., Siklos, P. L. (1996). Is seasonal adjustment a linear or nonlinear data-filtering process? Journal of Business & Economic Statistics, 14(3), 374-386.
Granger, C. W. J., Siklos, P. L. (1995). Systematic sampling, temporal aggregation, seasonal adjustment, and cointegration theory and evidence. Journal of Econometrics, 66(1-2), 357-369.
Hamilton, J. D. (2018). Why you should never use the Hodrick-Prescott filter. Review of Economics and Statistics, 100(5), 831-843.
Hecq, A. (1998). Does seasonal adjustment induce common cycles? Economics Letters, 59(3), 289-297.
Higham, N. J. (1986). Newton’s method for the matrix square root. Mathematics of Computation, 46(174), 537-549.
Hylleberg, S., Engle, R. F., Granger, C. W. J., Yoo, B. S. (1990). Seasonal integration and cointegration. Journal of Econometrics, 44(1-2), 215-238.
James, A. T. (1954). Normal multivariate analysis and the orthogonal group. The Annals of Mathematical Statistics, 25(1), 40-75.
Johansen, S. (1995). Likelihood-based inference in cointegrated vector autoregressive models. Oxford University Press.
Johansen, S., Schaumburg, E. (1999). Likelihood analysis of seasonal cointegration. Journal of Econometrics, 88(2), 301-339.
Juselius, K. (2006). The cointegrated VAR model: methodology and applications. Oxford University Press.
Koop, G., Strachan, R., van Dijk, H., Villani, M. (2006). Bayesian approaches to cointegration. In T. C. Mills, K. Patterson (Eds.), Palgrave Handbook of Econometrics, Volume 1: Econometric Theory. Palgrave Macmillan.
Koop, G., León-González, R., Strachan, R. W. (2009). Efficient posterior simulation for cointegrated models with priors on the cointegration space. Econometric Reviews, 29(2), 224-242.
Kotlowski, J. (2005). Money and prices in the Polish economy. Seasonal cointegration approach. Warsaw School of Economics Working Paper No. 3-05, Warszawa, Poland.
Löf, M., Franses, P. H. (2001). On forecasting cointegrated seasonal time series. International Journal of Forecasting, 17(4), 607-621.
Lütkepohl, H. (2005). New introduction to multiple time series analysis. Springer Science & Business Media.
Meyer, M., Winker, P. (2005). Using HP filtered data for econometric analysis: some evidence from Monte Carlo simulations. Allgemeines Statistisches Archiv, 89(3), 303-320.
Nerlove, M. (1964). Spectral analysis of seasonal adjustment procedures. Econometrica, 32(3), 241-286.
Osiewalski, J. (2001). Ekonometria bayesowska w zastosowaniach [Bayesian econometrics in applications]. Wydawnictwo Akademii Ekonomicznej.
Osiewalski, J., Pipień, M. (1999). Bayesian forecasting of foreign exchange rates using GARCH models with skewed t conditional distributions. In MACROMODELS’98 Conference Proceedings (Vol. 2, pp. 195-218).
Pajor, A. (2017). Estimating the marginal likelihood using the arithmetic mean identity. Bayesian Analysis, 12(1), 261-287.
Srivastava, A. (2000). A Bayesian approach to geometric subspace estimation. IEEE Transactions on Signal Processing, 48(5), 1390-1400.
Villani, M. (2006). Bayesian point estimation of the cointegration space. Journal of Econometrics, 134(2), 645-664.
Wallis, K. F. (1974). Seasonal adjustment and relations between variables. Journal of the American Statistical Association, 69(345), 18-31.
Wróblewska, J. (2020). A note on a complex extension of the matrix angular central Gaussian distribution. arXiv preprint arXiv:2010.03243.
Zellner, A. (1971). An introduction to Bayesian inference in econometrics. Wiley.
Appendix - the Bayesian model comparison