Cointegration and unit root tests: A fully Bayesian approach
Marcio Alves Diniz, Carlos Alberto de Braganca Pereira, Julio Michael Stern
MARCIO ALVES DINIZ, FEDERAL UNIVERSITY OF SÃO CARLOS
CARLOS ALBERTO DE BRAGANÇA PEREIRA, FEDERAL UNIVERSITY OF MATO GROSSO DO SUL
JULIO MICHAEL STERN, UNIVERSITY OF SÃO PAULO

ABSTRACT. To perform statistical inference for time series, one should be able to assess if they present deterministic or stochastic trends. For univariate analysis, one way to detect stochastic trends is to test if the series has unit roots; for multivariate studies, it is often relevant to search for stationary linear relationships between the series, that is, to test whether they cointegrate. The main goal of this article is to briefly review the shortcomings of unit root and cointegration tests proposed by the Bayesian approach to statistical inference and to show how they can be overcome by the fully Bayesian significance test (FBST), a procedure designed to test sharp or precise hypotheses. We will compare its performance with the most used frequentist alternatives, namely, the Augmented Dickey-Fuller test for unit roots and the maximum eigenvalue test for cointegration.
Keywords: time series, Bayesian inference, hypothesis testing, unit root, cointegration.
1. INTRODUCTION
Several time series present deterministic or stochastic trends, which imply that the effects of these trends on the level of the series are permanent. Consequently, the mean and variance of the series will not be constant and will not revert to a long term value. This feature reflects the fact that the stochastic processes generating these series are not (weakly) stationary, which creates problems for inductive inference based on the most traditional estimators or predictors, since the usual properties of these procedures will not be valid under such conditions.

Therefore, when modelling non-stationary time series, one should be able to properly detrend the series, either by directly modelling the trend with deterministic functions, or by transforming the series to remove stochastic trends. To determine which strategy is the suitable solution, several statistical tests have been developed since the 1970's by the frequentist school of statistical inference.

The Augmented Dickey-Fuller (ADF) test is one of the most popular tests used to assess if a time series has a stochastic trend or, for series described by auto-regressive models, if it has a unit root. When one is searching for long term relationships between the multiple series under analysis, it is crucial to know if there are stationary linear combinations of these series, i.e. if the series are cointegrated. Cointegration tests were developed, also by the frequentist school, in the late 1980's [14] and early 1990's [21]. Only in the late 1980's did the Bayesian approach to testing the presence of unit roots start to be developed.

Both unit root and cointegration tests may be considered tests of precise or sharp hypotheses, i.e. those in which the dimension of the parameter space under the tested hypothesis is smaller than the dimension of the unrestricted parameter space.
Testing sharp hypotheses poses major difficulties for both the frequentist and Bayesian paradigms, such as the need to eliminate nuisance parameters. The main goal of this article is to briefly review the shortcomings of the tests proposed by the Bayesian school and how they can be overcome by the fully Bayesian significance test (FBST). More specifically, we will compare its performance with the most used frequentist alternatives, the ADF test for unit roots and the maximum eigenvalue test for cointegration. Since this is a review article, it is important to remark that the results presented here were published elsewhere by the same authors; see [9, 10].
To accomplish this objective we define the FBST in the next section, also showing how it can be implemented in a general context. The following section discusses the problems of testing the existence of unit roots in univariate time series and how the Bayesian tests approach the problem. Section 4 then shows how the FBST is applied to test if a time series has unit roots and illustrates this with applications to a real data set. In the sequel we discuss the Bayesian alternatives to cointegration tests and then apply the FBST to test for cointegration using real data sets. We conclude with some remarks and possible extensions for future work.

2. FBST

The fully Bayesian significance test was proposed in [33] mainly to deal with sharp hypotheses. The procedure has several properties (see [34, 46]), most interestingly the fact that it is based only on posterior densities, thus avoiding the necessity of complications such as the elimination of nuisance parameters or the adoption of priors with positive probabilities attached to sets of zero Lebesgue measure.

We shall consider general statistical models in which the parameter space is denoted by Θ ⊆ R^m, m ∈ N. A sharp hypothesis H assumes that θ, the parameter vector of the chosen statistical model, belongs to a sub-manifold Θ_H of smaller dimension than Θ. This implies, for continuous parameter spaces, that the subset Θ_H has null Lebesgue measure whenever H is sharp. The sample space, the set of all possible values of the observable random variables (or vectors), is here denoted by X.

Following the Bayesian paradigm, let h(·) be a prior probability density over Θ, x ∈ X the observed sample (scalar or vector), and L(· | x) the likelihood derived from the data x. To evaluate the Bayesian evidence given by the FBST, the sole relevant entity is the posterior probability density for θ given x,

g(θ | x) ∝ h(θ) · L(θ | x).

It is important to highlight that the procedure may also be used when the parameter space is discrete.
However, when the posterior probability distribution over Θ is absolutely continuous, the FBST appears as a more suitable alternative for significance hypothesis testing. For notational simplicity, we will denote Θ_H by H in the sequel.

Let r(θ) be a reference density on Θ such that the function s(θ) = g(θ | x)/r(θ) is a relative surprise function. The reference density is important because it guarantees that the FBST is invariant under reparametrizations (see [17]), even when r(θ) is improper. Thus, when r(θ) is proportional to a constant, the surprise function will be, in practical terms, equivalent to the posterior distribution. For the applications considered in this article we will use the improper uniform density as reference density on Θ. The authors of [30] remark that it is possible to generalize the procedure using other reference densities such as neutral, reference or non-informative priors (see [34, 44]), if they are available and desirable.

Definition 1 [Tangent set]: Considering a sharp hypothesis H : θ ∈ Θ_H, the tangential set of the hypothesis given the sample is

(1) T_x = {θ ∈ Θ : s(θ) > s*},

where s* = sup_{θ ∈ H} s(θ). Notice that the tangent set T_x is the highest relative surprise set, that is, the set of points of the parameter space with higher relative surprise than any point in H, being tangential to H in this sense. This approach takes into consideration the statistical model in which the hypothesis is defined, using several components of the model to define an evidential measure favouring the hypothesis.

Definition 2 [Evidence]: The Bayesian evidence value against H, ēv, is defined as
(2) ēv = P(θ ∈ T_x | x) = ∫_{T_x} dG_x(θ),

where G_x(θ) denotes the posterior distribution function of θ and the above integral is of the Riemann-Stieltjes type. Definition 2 sets ēv as the posterior probability of the tangent set, which is interpreted as an evidence value against H. Hence, the evidence value supporting H is the complement of ēv, namely, ev = 1 − ēv. Notwithstanding, ev is not evidence against A : θ ∉ Θ_H, the alternative hypothesis (which is not sharp anyway). Equivalently, ēv is not evidence in favor of A, although it is against H.

Definition 3 [Test]: The FBST is the procedure that rejects H whenever ev = 1 − ēv is smaller than a critical level, ev_c.

Thus, we are left with the problem of deciding the critical level ev_c for each particular application. We briefly discuss this and other practical issues in the following subsection.

2.1. Practical implementation: critical values and numerical computation.
Since ev (also called e-value) is a statistic, it has a sampling distribution derived from the adopted statistical model, and in principle this distribution could be used to find a threshold value. If the likelihood and the posterior distribution satisfy certain regularity conditions, [11] proved that, asymptotically, there is a relationship between ev and the p-values obtained from the frequentist likelihood ratio procedure used to test the same hypotheses. This fact provides a way to find, at least asymptotically, a critical value of ev below which the tested hypothesis is rejected.

In a recent review [46], the authors discuss different ways to provide a threshold for ev. Among these alternatives, we highlight the standardized e-value, which follows, asymptotically, the uniform distribution on (0, 1).

One could also try to define the FBST as a Bayes test derived from a particular loss function and the respective minimization of the posterior expected loss. Following this strategy, [30] showed that there are loss functions which result in ev as a Bayes estimator of φ = I_H(θ). Hence, the FBST is in fact a Bayes procedure in the formal sense defined by Wald in [52].
General algorithm: compute ev supporting hypothesis H : θ ∈ Θ_H
1. Specify the statistical model (likelihood) and prior distribution on Θ.
2. Specify the reference density, r(θ), and derive the relative surprise function, s(θ).
3. Find s*, the maximum value of s(θ) under the constraint θ ∈ H.
4. Integrate the posterior distribution over the tangent set, eq. (2), to find ēv.
5. Find ev = 1 − ēv.
TABLE 1. Pseudocode to implement the FBST

To compute the evidence value supporting H defined in the last section we need to follow the steps shown in Table 1. After defining the statistical model and prior, it is simple to find the surprise function, s(θ). In step 3 one should find the point of the parameter space in H that maximizes s(θ), that is, solve a problem of constrained numerical maximization. In several applications this step does not have a closed form solution, requiring the use of numerical optimizers.

Step 4 involves the integration of the posterior distribution over a subset of Θ, the tangent set T_x, which can be highly complex. Once more, since in many cases it is fairly difficult to find an explicit expression for T_x, one may use various numerical techniques to compute the integral. If it is possible to generate random samples from the posterior distribution, Monte Carlo integration provides an estimate of ēv, as we will show in this work. Another alternative is to use approximation techniques, such as those proposed in [48], based on a Laplace approximation. We discuss how to implement such approximations for unit root and cointegration tests in [9, 10].

(See [39], p. 436. See also [3] for more on the standardized version of ev. I_A(x) denotes the indicator function, equal to one if x ∈ A and zero if x ∉ A.)
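The steps of Table 1 can be sketched in a deliberately simple setting: a bivariate normal posterior with independent components (means m, variances v, both treated as known for the sake of the example) and the sharp hypothesis H : μ1 = μ2. All model choices, names and numerical values below are ours, for illustration only; with a flat reference density the surprise function reduces to the posterior itself, and the constrained maximum has a closed form (the precision-weighted mean), so no numerical optimizer is needed.

```python
import numpy as np

def fbst_ev(m, v, n_draws=200_000, seed=0):
    """e-value supporting H: mu1 = mu2 under independent N(m[i], v[i]) posteriors.

    Flat reference density, so the surprise function is the posterior itself.
    """
    rng = np.random.default_rng(seed)

    def log_post(t1, t2):  # log posterior density, up to an additive constant
        return -(t1 - m[0]) ** 2 / (2 * v[0]) - (t2 - m[1]) ** 2 / (2 * v[1])

    # Step 3: maximize the posterior on the line mu1 = mu2 = t.
    # The objective is quadratic in t, so the optimum is the precision-weighted mean.
    t_star = (m[0] / v[0] + m[1] / v[1]) / (1 / v[0] + 1 / v[1])
    log_s_star = log_post(t_star, t_star)

    # Step 4: Monte Carlo estimate of ev-bar = P(posterior density > s* | x).
    t1 = rng.normal(m[0], np.sqrt(v[0]), n_draws)
    t2 = rng.normal(m[1], np.sqrt(v[1]), n_draws)
    ev_bar = np.mean(log_post(t1, t2) > log_s_star)

    # Step 5: evidence supporting H.
    return 1.0 - ev_bar

# Posterior means far apart -> little support for H; close together -> strong support.
ev_far = fbst_ev(m=[0.0, 4.0], v=[0.1, 0.1])
ev_close = fbst_ev(m=[0.0, 0.1], v=[0.1, 0.1])
```

In the first call the constrained maximum lies deep in the tails of the posterior, so almost every draw has higher density and ēv ≈ 1; in the second the constraint passes near the posterior mode and the evidence supporting H is large.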
3. BAYESIAN UNIT ROOT TESTS
Before presenting the Bayesian procedures used to test the presence of unit roots, let us fix notation. We will denote by y_t the t-th value of a univariate time series observed at t = 1, . . . , T + p dates, where T and p are positive integers. The usual approach is to assume that the series under analysis is described by an auto-regressive process with p lags, AR(p), meaning that the data generating process is fully described by a stochastic difference equation of order p, possibly with an intercept or drift and a deterministic linear trend, i.e.

(3) y_t = μ + δ·t + φ_1 y_{t−1} + . . . + φ_p y_{t−p} + ε_t,

with ε_t i.i.d. N(0, σ²) for t = 1, . . . , T + p. Using the lag or backshift operator B, we denote y_{t−k} by B^k y_t (the set of polynomial operators, such as lag polynomials like φ(B), induces an algebra that is isomorphic to the algebra of polynomials in real or complex variables; see Dhrymes (2000)), allowing us to rewrite (3) as

(4) (1 − φ_1 B − . . . − φ_p B^p) y_t = μ + δ·t + ε_t,

where φ(B) = (1 − φ_1 B − . . . − φ_p B^p) is the autoregressive polynomial. The difference equation (3) will be stable, implying that the process generating {y_t}_{t=1}^{T+p} is (weakly) stationary, whenever the roots of the characteristic polynomial φ(z), z ∈ C, lie outside the unit circle (there may be complex roots). If some of the roots lie exactly on the unit circle, the process is said to have unit roots. In order to test such a hypothesis statistically, (3) is rewritten as

(5) Δy_t = μ + δ·t + Γ_0 y_{t−1} + Γ_1 Δy_{t−1} + . . . + Γ_{p−1} Δy_{t−p+1} + ε_t,

where Δy_t = y_t − y_{t−1}, Γ_0 = φ_1 + . . . + φ_p − 1 and Γ_i = −∑_{j=i+1}^{p} φ_j, for i = 1, . . . , p − 1. If the generating process has only one unit root, one root of the complex polynomial φ(z),

1 − φ_1 z − φ_2 z² − . . . − φ_p z^p,

is equal to one, meaning that 1 − φ_1 − φ_2 − . . . − φ_p = φ(1) = 0, and all the other roots are on or outside the unit circle. In this case, Γ_0 = 0, the hypothesis that will be tested when modelling (5). Even though tests based on these assumptions verify if the process has a single unit root, there are generalizations based on the same principles that test the existence of multiple unit roots (see [13]).

The search for Bayesian unit root tests began in the late 1980's. As far as we know, [42] and [43] were the first works to propose a Bayesian approach for unit root tests. The frequentist critics of these articles received a proper answer in [35, 36], generating a fruitful debate that produced a long list of papers in the literature of Bayesian time series (see the Journal of Applied Econometrics, volume 6, number 4, 1991). A good summary of the debate and the Bayesian papers that resulted from it is presented in [2]. We will present here only the most relevant strategies proposed by the Bayesian school to test for unit roots.

Let θ = (ρ, ψ) be the parameter vector, in which ρ = ∑_{i=1}^{p} φ_i and ψ = (μ, δ, Γ_1, . . . , Γ_{p−1}). Assuming σ fixed, the prior density for θ can be factorized as h(θ) = h(ρ) · h(ψ | ρ). The marginal likelihood for ρ, denoted by L_m, is:
L_m(ρ | y) ∝ ∫_Ψ L(θ | y) · h(ψ | ρ) dψ,

where y = {y_t}_{t=1}^{T+p} is the observations vector, L(θ | y) the full likelihood and Ψ the support of the random vector ψ. This marginal likelihood, associated with a prior for ρ, is the main ingredient used by standard Bayesian procedures to test the existence of unit roots. Even though the procedure varies among authors according to some specific aspects, mentioned below, basically all of them use Bayes factors and posterior probabilities.

One important issue is the specification of the null hypothesis: some authors, starting from [40], consider H_0 : ρ = 1 against H_1 : ρ <
1. This is the way the frequentist school addresses the problem, but following this approach no explosive value for ρ is considered. The decision theoretic Bayesian approach, starting from [12], solved the problem using the posterior probabilities ratio or Bayes factor:

B = L_m(ρ = 1 | y) / ∫ L_m(ρ | y) · h(ρ) dρ.

Advocates of this solution argue that one of the advantages of this approach is that the null and the alternative hypotheses are given equal weight. However, the expression above is not defined if h(ρ) is not a proper density, since the denominator of the Bayes factor is equal to the predictive density, which is defined only if h(ρ) is a proper density. There are also problems if L_m(ρ = 1 | y) is zero or infinite.

The problem is approached in [35, 26] by testing H_0 : ρ ≥ 1 against H_1 : ρ < 1, considering explicitly explosive values for ρ. The main advantage of this strategy is the possibility to compute posterior probabilities like

P(ρ > 1 | y) = ∫_1^∞ g_m(ρ | y) dρ,

defined even for improper priors on ρ, where g_m is the marginal posterior for ρ.

In [8] the authors do not choose ρ as the parameter of interest, examining instead the largest absolute value of the roots of the characteristic polynomial and then verifying if it is smaller or larger than one. Usually this value is slightly smaller than ρ, but the authors argue that this small difference may be important. When this approach is used, unit roots are found less frequently. For an AR(3) model with a constant and deterministic trend, [8] derive the posterior density of the dominant root for the 14 series used in [31] and conclude the following: for eleven of the series the dominant root was smaller than one, that is to say, the series were trend-stationary. These results were based on flat priors for the autoregressive parameters and the deterministic trend coefficient.

Another controversy is about the prior over ρ: [35] argues that the difference between the results given by the frequentist and Bayesian inferences is due to the fact that the flat prior proposed in [42] overweights the stationary region. Hence, he derived a Jeffreys prior for the AR(1) model: this prior quickly diverges as ρ increases and becomes larger than one. The obtained posterior led to the same results as [31], which will be discussed in detail in the following section. The critics of the approach adopted by Phillips in [35] judged the Jeffreys prior unrealistic from a subjective point of view. This is a nonsensical objection if one considers that the Jeffreys prior is crucial to ensure an invariant inferential procedure, and invariance is a highly desirable property, for either objective or subjective reasons (see [45] for more on invariance in physics and statistical models). A final controversial point concerns the modelling of initial observations.
If the likelihood explicitly models the initial observed values (an exact likelihood), the process is implicitly considered stationary. In fact, when it is known that the process is stationary and it is believed that the data generating process has been working for a long period, it is reasonable to assume that the parameters of the model determine the marginal distribution of the initial observations. In the simplest AR(1) model, this would imply that y_1 ∼ N(0, σ²/(1 − ρ²)). In this scenario, performing the inference conditional on the first observation would discard relevant information. On the other hand, there is no marginal distribution defined for y_1 if the generating process is not stationary. Then, it is valid
to use a likelihood conditional on initial observations. For the models presented here, we always work with the conditional likelihood. As argued in [42], inferences for stationary models are little affected by the use of conditional likelihoods, especially for large samples. He compares these inferences with the ones based on exact likelihoods under explicit modelling of initial observations.
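The posterior probability P(ρ ≥ 1 | y) discussed above can be approximated by Monte Carlo. The sketch below assumes the simplest zero-mean AR(1) with a conditional likelihood and the prior h(ρ, σ) ∝ 1/σ; under these assumptions σ² given y is Inverse-Gamma and ρ given σ² is normal around the OLS estimate. The simulated series, seeds and names are illustrative choices of ours, not the data sets analyzed in the literature reviewed here.

```python
import numpy as np

def prob_explosive(y, n_draws=20_000, seed=1):
    """Monte Carlo approximation of P(rho >= 1 | y) for a zero-mean AR(1)
    with conditional likelihood and prior h(rho, sigma) proportional to 1/sigma."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    y_lag, y_cur = y[:-1], y[1:]
    T = len(y_cur)
    sxx = y_lag @ y_lag
    rho_hat = (y_lag @ y_cur) / sxx            # OLS estimate of rho
    ssr = np.sum((y_cur - rho_hat * y_lag) ** 2)
    # sigma^2 | y ~ Inverse-Gamma((T-1)/2, ssr/2);  rho | sigma^2, y ~ N(rho_hat, sigma^2/sxx)
    sigma2 = (ssr / 2) / rng.gamma((T - 1) / 2, size=n_draws)
    rho = rng.normal(rho_hat, np.sqrt(sigma2 / sxx))
    return np.mean(rho >= 1.0)

rng = np.random.default_rng(42)
walk = np.cumsum(rng.standard_normal(150))     # true rho = 1
ar = np.zeros(200)                             # stationary AR(1), rho = 0.5
for t in range(1, 200):
    ar[t] = 0.5 * ar[t - 1] + rng.standard_normal()

p_walk, p_ar = prob_explosive(walk), prob_explosive(ar)
```

For the clearly stationary series the posterior mass above one is essentially zero, while for the random walk it is not negligible, illustrating why this formulation accommodates explosive values of ρ.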
4. IMPLEMENTING THE FBST FOR UNIT ROOT TESTING
We will now describe how to use the FBST to test for the presence of unit roots, referring to the general model (5), i.e.

Δy_t = μ + δ·t + Γ_0 y_{t−1} + Γ_1 Δy_{t−1} + . . . + Γ_{p−1} Δy_{t−p+1} + ε_t,

where ε_t i.i.d. ∼ N(0, σ²) for t = 1, . . . , T + p, recalling also that the hypothesis being tested is Γ_0 = 0. We slightly change the notation of the last section, now using ψ to denote the vector (μ, δ, Γ_0, Γ_1, . . . , Γ_{p−1}) and setting θ = (ψ, σ).

Recalling the steps to implement the FBST displayed in Table 1, we have just specified the statistical model. The likelihood, conditional on the first p observations, derived from the Gaussian model is

(6) L(θ | y) = (2π)^{−T/2} σ^{−T} exp{ −(1/(2σ²)) · ∑_{t=p+1}^{T+p} ε_t² },

in which ε_t = Δy_t − μ − δ·t − Γ_0 y_{t−1} − Γ_1 Δy_{t−1} − . . . − Γ_{p−1} Δy_{t−p+1}. To complete step 1 of Table 1 we need a prior distribution for θ. For all the series modelled in this article we will use the following non-informative prior:

(7) h(θ) = h(ψ, σ) ∝ 1/σ.

We are aware of the problems caused by improper priors applied to this problem when one uses alternative approaches, like those mentioned in [2]. However, one of our goals is to show how the FBST can be implemented even for a potentially problematic prior like this one. To write the posterior we use the following notation:

ΔY = [Δy_{p+1}, Δy_{p+2}, . . . , Δy_{T+p}]′,   ψ = [μ, δ, Γ_0, Γ_1, . . . , Γ_{p−1}]′,

X = ⎡ 1   p+1   y_p        Δy_p        . . .  Δy_2    ⎤
    ⎢ 1   p+2   y_{p+1}    Δy_{p+1}    . . .  Δy_3    ⎥
    ⎢ ...  ...   ...        ...               ...     ⎥
    ⎣ 1   T+p   y_{T+p−1}  Δy_{T+p−1}  . . .  Δy_{T+1}⎦

with ΔY of dimension (T × 1), X of dimension (T × (p + 2)) and ψ of dimension ((p + 2) × 1). Thanks to this notation we can write, using primes to denote transposed matrices:

∑_{t=p+1}^{T+p} ε_t² = (ΔY − Xψ)′(ΔY − Xψ) = (ΔY − ΔŶ)′(ΔY − ΔŶ) + (ψ − ψ̂)′X′X(ψ − ψ̂),

where ψ̂ = (X′X)^{−1}X′·ΔY is the ordinary least squares (OLS) estimator of ψ and ΔŶ = Xψ̂ its prediction for ΔY. Thus, the full posterior is

(8) g(θ | y) ∝ σ^{−(T+1)} exp{ −(1/(2σ²)) [(ΔY − ΔŶ)′(ΔY − ΔŶ) + (ψ − ψ̂)′X′X(ψ − ψ̂)] },

a Normal-Inverse Gamma density.

Step 2 demands a reference density in order to define the relative surprise function. Since we will use the improper density r(θ) ∝
1, the surprise function will be equivalent to the posterior distribution in our applications.

(It is also possible to include q ∈ N moving average terms in (3) to model the process, a case that will not be covered in this article but that, in principle, should not imply major problems for the FBST.)
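The notation above can be made concrete with a short numpy sketch that builds ΔY, X and the OLS estimator ψ̂, and checks numerically the sum-of-squares decomposition used to obtain (8). The indexing conventions and all names are ours; the simulated series is an arbitrary random walk used only to exercise the construction.

```python
import numpy as np

def build_design(y, p):
    """DY (T,) and X (T, p+2) for model (5); y holds y_1, ..., y_{T+p}.

    Row for date t (t = p+1, ..., T+p): [1, t, y_{t-1}, dy_{t-1}, ..., dy_{t-p+1}].
    """
    y = np.asarray(y, float)
    dy = np.diff(y)                               # dy[i] = y_{i+2} - y_{i+1}
    Tp = len(y)
    T = Tp - p
    t_idx = np.arange(p + 1, Tp + 1, dtype=float) # deterministic trend column
    lagged_dys = [dy[p - 1 - j: Tp - 1 - j] for j in range(1, p)]
    X = np.column_stack([np.ones(T), t_idx, y[p - 1: Tp - 1]] + lagged_dys)
    return dy[p - 1:], X

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(104))           # T + p = 104 with p = 4 below
DY, X = build_design(y, p=4)
psi_hat = np.linalg.lstsq(X, DY, rcond=None)[0]   # OLS estimator of psi

# Verify the decomposition: (DY - X psi)'(DY - X psi) = SSR + (psi - psi_hat)' X'X (psi - psi_hat)
psi = rng.standard_normal(X.shape[1])             # an arbitrary parameter vector
lhs = (DY - X @ psi) @ (DY - X @ psi)
ssr = (DY - X @ psi_hat) @ (DY - X @ psi_hat)
rhs = ssr + (psi - psi_hat) @ (X.T @ X) @ (psi - psi_hat)
```

The identity holds for any ψ because the OLS residuals are orthogonal to the columns of X, which is what allows the posterior (8) to factor into the Normal-Inverse Gamma form.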
Given this, to find s* (Step 3) we need to find the maximum value of the posterior under the hypothesis being tested, in our case, Γ_0 = 0. The restricted (under H) posterior distribution is

(9) g_r(θ_r | y) ∝ σ^{−(T+1)} exp{ −(1/(2σ²)) [(ΔY − ΔŶ_r)′(ΔY − ΔŶ_r) + (ψ_r − ψ̂_r)′X_r′X_r(ψ_r − ψ̂_r)] },

in which θ_r = (ψ_r, σ), ψ_r being the vector ψ without Γ_0, X_r the matrix X above without its third column (since under H : Γ_0 = 0 the third column of X, associated with Γ_0, drops out), ψ̂_r = (X_r′X_r)^{−1}X_r′·ΔY the least squares estimator of ψ_r, and ΔŶ_r = X_r ψ̂_r the vector of predicted values for ΔY given by the restricted model. From (9), it is easy to show that the maximum a posteriori (MAP) estimator of θ_r is (ψ̂_r, σ̂_r), with

σ̂_r = sqrt{ (ΔY − ΔŶ_r)′(ΔY − ΔŶ_r) / (T + 1) }.

Plugging the values of ψ̂_r and σ̂_r into (9) we find s*, as requested in Step 3. Step 4 will also be easy to implement thanks to the structure of the models assumed in this section. Since the full posterior, (8), is a Normal-Inverse Gamma density, a simple Gibbs sampler allows us to obtain a random sample from this distribution, suggesting a Monte Carlo approach to compute ēv. From (8), the conditional posteriors of ψ and σ are, respectively,

(10) g_ψ(ψ | σ, y) ∝ N(ψ̂, σ²(X′X)^{−1}),

(11) g_σ(σ | ψ, y) ∝ IG((T + 1)/2, H),

in which H = (1/2)[(ΔY − ΔŶ)′(ΔY − ΔŶ) + (ψ − ψ̂)′X′X(ψ − ψ̂)], IG denotes the Inverse-Gamma distribution and ψ̂ is the OLS estimator of ψ, as above.
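Steps 3 to 5 can then be sketched end to end. The code below is a minimal numpy implementation of the restricted maximization, the Gibbs sampler and the Monte Carlo estimate of ēv, run on simulated series. Sample sizes, seeds and simulated data are our illustrative choices; in particular, the sampler draws σ² given ψ from an Inverse-Gamma obtained by a change of variables from the prior 1/σ, a detail whose exact parametrization in the original paper is given in its Appendix A.

```python
import numpy as np

def fbst_unit_root(y, p=1, n_draws=5_000, seed=0):
    """FBST e-value supporting H: Gamma_0 = 0 in model (5), flat reference density."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    dy = np.diff(y)
    Tp, T = len(y), len(y) - p
    t_idx = np.arange(p + 1, Tp + 1, dtype=float)
    DY = dy[p - 1:]
    lags = (np.column_stack([dy[p - 1 - j: Tp - 1 - j] for j in range(1, p)])
            if p > 1 else np.empty((T, 0)))
    X = np.column_stack([np.ones(T), t_idx, y[p - 1: Tp - 1], lags])

    def log_post(psi, sigma2, Xm, Ym):            # log of (8)/(9), up to a constant
        resid = Ym - Xm @ psi
        return -(T + 1) / 2 * np.log(sigma2) - resid @ resid / (2 * sigma2)

    # Step 3: restricted MAP (drop the third column, the y_{t-1} term).
    Xr = np.delete(X, 2, axis=1)
    psi_r = np.linalg.lstsq(Xr, DY, rcond=None)[0]
    ssr_r = np.sum((DY - Xr @ psi_r) ** 2)
    log_s_star = log_post(psi_r, ssr_r / (T + 1), Xr, DY)

    # Step 4: Gibbs sampler on the full model.
    XtX = X.T @ X
    psi_hat = np.linalg.solve(XtX, X.T @ DY)
    L = np.linalg.cholesky(np.linalg.inv(XtX))
    k = X.shape[1]
    sigma2, above = 1.0, 0
    for _ in range(n_draws):
        psi = psi_hat + np.sqrt(sigma2) * (L @ rng.standard_normal(k))
        ssr = np.sum((DY - X @ psi) ** 2)
        # sigma^2 | psi, y ~ Inverse-Gamma(T/2, ssr/2) under the prior 1/sigma
        sigma2 = (ssr / 2) / rng.gamma(T / 2)
        above += log_post(psi, sigma2, X, DY) > log_s_star
    return 1.0 - above / n_draws                   # Step 5: ev = 1 - ev-bar

rng = np.random.default_rng(7)
walk = np.cumsum(rng.standard_normal(150))         # one unit root
ar = np.zeros(200)                                 # stationary AR(1), rho = 0.3
for t in range(1, 200):
    ar[t] = 0.3 * ar[t - 1] + rng.standard_normal()

ev_walk = fbst_unit_root(walk, p=1)
ev_ar = fbst_unit_root(ar, p=1)
```

For the stationary series the restricted maximum lies far below the posterior mode, so nearly all draws exceed s* and the e-value supporting Γ_0 = 0 is close to zero; for the random walk the evidence supporting the unit root is much larger.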
With a sizeable random sample from the full posterior, we estimate ēv as the proportion of sampled vectors that yield a posterior value greater than s*, found in Step 3. Hence, in Step 5, we only compute one minus the estimate of ēv found in Step 4. The whole procedure is summarized in Table 2.

4.1. Results.
We implemented the FBST as described above to test for the presence of unit roots in 14 U.S. macroeconomic time series, all with annual frequency, first mentioned in [31]. We used the extended series, analyzed in [40].

Table 3 reports the names of the tested series, the year of the first observation, the adopted value for p, whether a linear (deterministic) trend was included in the model, the ADF test statistic and its respective p-value. The last two columns report the posterior probability of non-stationarity, Γ_0 ≥
0, and the FBST e-values for the specified models. In order to obtain comparable results, we have adopted the models chosen by [2] for all the series.

(See equation (5). Appendix A brings the parametrization and the probability density function of the Inverse-Gamma distribution. For the implementations in this article we sampled 51,000 vectors from (8) and discarded the first 1,000 as a burn-in sample. See equation (8). We have used the computer package described in [29] to find the ADF p-values, available in the R library urca.)

General algorithm: compute ev supporting hypothesis H : Γ_0 = 0
1. Statistical model: likelihood (6); prior: h(θ) ∝ 1/σ.
2. Reference density: r(θ) ∝ 1; relative surprise function: g(θ | y).
3. Find s*: (9) evaluated at ψ̂_r and σ̂_r.
4. Gibbs sampler (from eqs. (10) and (11)) to obtain N random samples of parameter vectors from (8). Evaluate the posterior at the sampled vectors and estimate ēv as the proportion of the N draws at which the evaluated values are larger than s*.
5. Find ev = 1 − ēv.
TABLE 2. Pseudocode to implement the FBST for unit root tests

The results show that the non-stationary posterior probabilities are quite distant from the ADF p-values. These results were highlighted in [42, 43]. Considering the simplest AR(1) model, they argued that, since frequentist inference is based on the distribution of ρ̂ | ρ =
1, the non-stationary posterior probabilities provide counterintuitive conclusions, since the referred distribution is skewed. Their main argument is that Bayesian inference uses a distribution (the marginal posterior of ρ) which is not skewed.

As mentioned before, [35] claims that the difference in results between the frequentist and Bayesian approaches is due to the flat prior, which puts much weight on the stationary region. He proposed the use of Jeffreys priors, which restored the conclusions drawn by the frequentist test. Phillips argued that the flat prior was, actually, informative when used in time series models like those for unit root tests. Using simulations he shows that "[the use of a] flat prior has a tendency to bias the posterior towards stationarity. ... even when [the estimate] is close to unity, there may still be a non negligible downward bias in the [flat] posterior probabilities". Notwithstanding, the e-values reported in the last column are quite close to the ADF p-values even using the flat prior criticized by Phillips.

Series               start  p  trend  ADF    p-value  P(Γ_0 ≥ 0 | y)  e-value
Real GNP             1909   2  yes    -3.52  0.044    0.0005          0.040
Nominal GNP          1909   2  yes    -2.06  0.559    0.0238          0.523
Real GNP per capita  1909   2  yes    -3.59  0.037    0.0004          0.034
Industrial prod.     1860   2  yes    -3.62  0.032    0.0003          0.028
Employment           1890   2  yes    -3.47  0.048    0.0004          0.043
Unemployment rate    1890   4  no     -4.04  0.019    0.0001          0.020
GNP deflator         1889   2  yes    -1.62  0.778    0.0584          0.762
Consumer prices      1860   4  yes    -1.22  0.902    0.1154          0.983
Nominal wages        1900   2  yes    -2.40  0.377    0.0106          0.341
Real wages           1900   2  yes    -1.71  0.739    0.0475          0.715
Money stock          1889   2  yes    -2.91  0.164    0.0029          0.147
Velocity             1869   2  yes    -1.62  0.779    0.0620          0.777
Bond yield           1900   4  no     -1.35  0.602    0.0962          0.936
Stock prices         1871   2  yes    -2.44  0.357    0.0103          0.349
TABLE 3. Unit root tests for Nelson and Plosser data
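For the frequentist column of Table 3, the ADF statistic is the t-ratio of Γ_0 in regression (5). A minimal numpy sketch is given below; only the t-statistic is computed, since p-values require the non-standard Dickey-Fuller distribution (obtained in this article from the package [29] in the R library urca). The simulated series, seeds and function names are illustrative assumptions of ours.

```python
import numpy as np

def adf_tstat(y, p=1, trend=True):
    """t-statistic of Gamma_0 in regression (5); to be compared with
    tabulated Dickey-Fuller critical values (not computed here)."""
    y = np.asarray(y, float)
    dy = np.diff(y)
    Tp, T = len(y), len(y) - p
    cols = [np.ones(T), y[p - 1: Tp - 1]]            # constant and y_{t-1}
    if trend:
        cols.insert(1, np.arange(p + 1, Tp + 1, dtype=float))
    for j in range(1, p):                             # lagged differences
        cols.append(dy[p - 1 - j: Tp - 1 - j])
    X = np.column_stack(cols)
    DY = dy[p - 1:]
    beta, *_ = np.linalg.lstsq(X, DY, rcond=None)
    resid = DY - X @ beta
    s2 = resid @ resid / (T - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
    k = 2 if trend else 1                             # position of the y_{t-1} coefficient
    return beta[k] / se[k]

rng = np.random.default_rng(11)
walk = np.cumsum(rng.standard_normal(300))            # unit root
ar = np.zeros(300)                                    # stationary AR(1), rho = 0.3
for t in range(1, 300):
    ar[t] = 0.3 * ar[t - 1] + rng.standard_normal()

t_walk = adf_tstat(walk, p=1)
t_ar = adf_tstat(ar, p=1)
```

For the stationary series the statistic is strongly negative, well beyond the usual critical values, while for the random walk it stays in the non-rejection region, mirroring the pattern of the ADF column of Table 3.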
5. BAYESIAN COINTEGRATION TESTS
(All the models considered the intercept or constant term, μ in (8).)

Before starting our brief review of the most relevant Bayesian cointegration tests, we fix notation and present the definitions to which we will refer in the sequel.

All the tests mentioned here are based on the following multivariate framework. Let Y_t = [y_{1t} . . . y_{nt}]′ be a vector of n ∈ N time series, all of them assumed to be integrated of order d ∈ N, i.e. to have d unit roots. The series are said to be cointegrated if there is a nontrivial linear combination of them that is integrated of order d − b, with b ∈ N, 0 < b ≤ d. We will assume that, as in most applications, d = b = 1, meaning that, if the time series in Y_t are cointegrated, there is a linear combination a′Y_t that is stationary, where a ∈ R^n is the cointegrating vector. Since the linear combination a′Y_t is often motivated by problems found in economics, it is called a long-run equilibrium relationship. The explanation is that non-stationary time series that are related by a long-run relationship cannot drift too far from the equilibrium because economic forces will act to restore the relationship.

Notice also that: (i) the cointegrating vector is not uniquely determined, since for any scalar s, (s · a) is also a cointegrating vector; and (ii) if Y_t has more than two series, it is possible that there is more than one cointegrating vector generating a stationary linear combination.

It is assumed that the data generating process of Y_t is described by the following vector autoregression with p ∈ N lags, denoted VAR(p), and given by:

(12) Y_t = c + Φ D_t + Φ_1 Y_{t−1} + . . . + Φ_p Y_{t−p} + E_t,

in which c is an (n × 1) vector of constants, D_t a vector of deterministic variables with coefficient matrix Φ, the Φ_i are (n × n) coefficient matrices, and E_t is an (n × 1) stochastic vector with multivariate normal distribution with null expected value and covariance matrix Ω, denoted N_n(0, Ω). This dynamic model is assumed valid for t = 1, . . . , T + p, the available span of observations of Y_t. Model (12) can be rewritten using the lag or backshift operator as

(13) (I_n − Φ_1 B − . . . − Φ_p B^p) Y_t = c + Φ D_t + E_t,

where Φ(B) = I_n − Φ_1 B − . . . − Φ_p B^p is the (multivariate) autoregressive polynomial and I_n denotes the n-dimensional identity matrix. The associated characteristic polynomial in this context is the determinant of Φ(z), z ∈ C. If all the roots of the characteristic polynomial lie outside the unit circle, it is possible to show that Y_t has a stationary representation, such as equation (12).
In order to determine if this is the case, model (12) is rewritten as a (vector) error correction model (VECM):

(14) ΔY_t = c + Φ D_t + Γ_1 ΔY_{t−1} + . . . + Γ_{p−1} ΔY_{t−p+1} + Π Y_{t−1} + E_t,

where ΔY_t = [Δy_{1t} . . . Δy_{nt}]′, Γ_i = −(Φ_{i+1} + . . . + Φ_p) for i = 1, 2, . . . , p − 1, and Π = −Φ(1) = −(I_n − Φ_1 − . . . − Φ_p).

It is possible to show that, when all the roots of det(Φ(z)) are outside the unit circle, the matrix Π in (14) has full rank, i.e. all the n eigenvalues of Π are non null. If the rank of Π is null, this matrix cannot be distinguished from a null matrix, implying that the series in Y_t have at least one unit root and a valid representation is a VAR of order p − 1 in the differences, i.e. model (14) without the term Π Y_{t−1}.

Finally, if the (n × n) matrix Π has rank r, 0 < r < n, it has n − r null eigenvalues, implying that the series in Y_t have at least one unit root and their valid representation is given by the VECM in equation (14). In this case, Π = αβ′, where α and β are (n × r) matrices of rank r. Matrix β is the one with the cointegrating vectors and matrix α is called the loading matrix, since it contains the weights of the equilibrium relationships. The tests developed in [21] focus on the rank of matrix Π.

(As in the univariate case, one may include moving average terms in (12), i.e. lags of E_t, but this, in principle, would not cause major problems in the Bayesian framework. The deterministic variables in D_t may be, for instance, deterministic trends or seasonal dummies. See Johansen (1996). It is possible that the series in Y_t have two unit roots each, implying that the correct VECM must be written with Δ²Y_t as the dependent variable.)

The pioneering Bayesian works to study VAR models and reduced rank regressions are [7, 1, 16]. However, the main concern of these papers is to estimate the model parameters and their (marginal) posterior distributions. The usual approach is to assume a given rank for the long run matrix Π and proceed with all the computations conditional on the given rank. The Bayesian initiatives to test the rank of the referred matrix are recent, the main reference for Bayesian inference on VECM's being [25].

To justify inferential procedures based on prespecified ranks of matrix Π, [2] argued that an empirical cointegration analysis should be based on economic theory, which proposes models obeying equilibrium relationships. According to this view, cointegration research should be "confirmatory" rather than "exploratory". Even though the advocated conditional inference is of simple implementation and very useful for small samples, [2] recognized that tests for the rank of matrix Π should be developed.
To our knowledge, few initiatives with this purpose have been developed up to now.

One common approach to test sharp hypotheses in the Bayesian framework is by means of Bayes factors. Testing the rank of matrix Π by Bayes factors entails several computational complications and requires the use of proper priors, as shown in [22]. Following an informal approach, [1] obtained the posterior distribution of the ordered eigenvalues of the "squared" long run matrix, Π′·Π, obtained from a VAR model without assuming the existence of cointegration relations. If the long run matrix has reduced rank, it has some null eigenvalues, and this should be revealed by the smallest eigenvalues accumulating most of their posterior probability mass on values close to zero. The computations can be made straightforwardly by simulating values of the long run matrix from its (marginal) posterior distribution, which is a matrix Student-t distribution under the non-informative prior (16), also considered in the sequel.

Another common procedure is to estimate the rank of Π as the value r that maximizes the (marginal) posterior distribution of the rank. Conditioned on such an estimate, one proceeds to derive the full posterior and eventually estimate the cointegration space, i.e. the linear space spanned by β.

A different approach was proposed by [4], who used the Posterior Information Criterion (PIC), developed in [37], as a criterion to choose the mode of the posterior distribution of the rank of Π.
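As an illustration of this eigenvalue-based diagnostic (a sketch with simulated numbers, not the posterior computation of [1]), one can inspect the ordered eigenvalues of Π′Π for a noisy estimate of a reduced-rank long run matrix; the n − r smallest eigenvalues concentrate near zero:

```python
import numpy as np

rng = np.random.default_rng(1)

n, r = 4, 2
# Hypothetical reduced-rank long-run matrix plus small "estimation" noise.
alpha = rng.normal(size=(n, r))
beta = rng.normal(size=(n, r))
Pi_hat = alpha @ beta.T + 1e-4 * rng.normal(size=(n, n))

# Ordered eigenvalues of the "squared" long-run matrix Pi' Pi.
eigvals = np.sort(np.linalg.eigvalsh(Pi_hat.T @ Pi_hat))[::-1]

# The n - r smallest eigenvalues are near zero, revealing rank r.
print(eigvals)
```

In the Bayesian version of the diagnostic, each posterior draw of the long run matrix yields one such set of eigenvalues, and the probability mass of the smallest ones near zero is examined.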
However, as highlighted in [25], one of the advantages of the Bayesian approach is the possibility to incorporate the uncertainty about the parameters in the analysis, represented by the posterior distribution of the rank: whatever tool the scientist uses to infer the value of r, it should be derived from this posterior distribution.

The authors of [23] nested the reduced rank models in an unrestricted VAR and used Metropolis-Hastings sampling with the Savage-Dickey density ratio to estimate the Bayes factors of all the models with incomplete rank up to the model with full rank. The Bayes factor derivation requires the estimation of an error correction factor for the incomplete rank. This factor, however, is not defined for improper priors due to a problem known as the Bartlett paradox, which arises whenever one compares models of different dimensions. The difficulty is relevant in the present case because, after deriving the rank posterior density, one may consider that models of different dimensions are being compared. Informally, the paradox states that improper priors should be avoided when one computes Bayes factors (except for parameters common to both models), since such priors depend on arbitrary constants (arising from non-convergent integrals).

More recently, [51] developed an efficient procedure to obtain the posterior distribution of the rank using a uniform proper prior over the linearly normalized cointegration space. The author derived solutions for the posterior probabilities of the null rank and of the full rank of Π. The posterior probability of each intermediate rank is derived from the posterior samples of the matrices that compose the long run matrix (α and β), properly normalized, under each rank, using the method proposed by [5].

6. IMPLEMENTING THE FBST AS A COINTEGRATION TEST
This section describes how to implement the FBST to test for cointegration. We will proceed in the same spirit of Section 4, i.e. following the steps given in Table 1 to implement the test for cointegration.

Let us begin by recalling the VECM given by equation (14):

∆Y_t = c + Φ D_t + Γ_1 ∆Y_{t−1} + ... + Γ_{p−1} ∆Y_{t−p+1} + Π Y_{t−1} + E_t,  t = p+1, ..., T+p,

in which E_t ~ i.i.d. N_n(0, Ω), with 0 a null vector of dimension (n × 1) and Ω a symmetric positive definite real matrix. Notice that these assumptions already specify the statistical model (Gaussian) and its implied likelihood. Before giving it explicitly, let us rewrite equation (14) using matrix notation:

(15) ∆Y = Z · η + E.

(See [50].)
where ∆Y stacks the transposed difference vectors,

∆Y = [∆Y′_{p+1}; ∆Y′_{p+2}; ...; ∆Y′_{T+p}],

Z stacks the rows [1, D′_t, ∆Y′_{t−1}, ..., ∆Y′_{t−p+1}, Y′_{t−1}] for t = p+1, ..., T+p, i.e.

Z = [1  D′_{p+1}  ∆Y′_p        ...  ∆Y′_2      Y′_p
     1  D′_{p+2}  ∆Y′_{p+1}    ...  ∆Y′_3      Y′_{p+1}
     ⋮  ⋮         ⋮                 ⋮          ⋮
     1  D′_{T+p}  ∆Y′_{T+p−1}  ...  ∆Y′_{T+1}  Y′_{T+p−1}],

η = [c′; Φ′; Γ′_1; ...; Γ′_{p−1}; Π′] (the coefficient matrices stacked vertically), and the error matrix is given by E ∼ MN_{T×n}(0, I_T, Ω), denoting the matrix normal distribution. Now the parameter vector is given by Θ = (η, Ω).

Notice that ∆Y is formed by piling up the T transposed vectors ∆Y′_t, thus resulting in a matrix with T rows and n columns (n is the number of time series in vector Y_t), these being also the dimensions of matrix E. Matrix Z is constructed likewise, always piling up the transposed vectors, resulting in a matrix with T rows and k = pn + d + 1 columns, where d denotes the dimension of the deterministic terms D_t. Matrix η stacks the coefficient matrices, all piled up properly, resulting in a matrix with k rows and n columns.

Given the assumptions above, ∆Y ∼ MN_{T×n}(Z·η, I_T, Ω), implying that the likelihood is

L(Θ | y) ∝ |Ω|^{−T/2} exp{ −(1/2)·tr[Ω^{−1}(∆Y − Z·η)′(∆Y − Z·η)] },

where y denotes the set of observed values of the vectors Y_t for t = 1, ..., T+p. As in Section 4, we will consider an improper prior for Θ, given by

(16) h(Θ) = h(η, Ω) ∝ |Ω|^{−(n+1)/2},

and our reference density, r(Θ), will be proportional to a constant, leading to a surprise function equivalent to the (full) posterior distribution. These choices correspond to steps 1 and 2 of Table 1.
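The stacking construction, the OLS estimator η̂ = (Z′Z)^{−1}Z′∆Y and the residual cross-product S used in the posterior (17) can be sketched as follows (simulated data; D_t is reduced to the constant term, so d = 0 in the column count):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: n = 2 series, p = 2 lags.
n, p = 2, 2
raw_len = 103                                  # raw sample length, T + p
Y = rng.normal(size=(raw_len, n)).cumsum(axis=0)   # two random walks
dY = np.diff(Y, axis=0)                        # dY[i] corresponds to Delta Y_{i+1}

T = raw_len - p                                # effective sample size
DY = dY[p - 1:]                                # rows: Delta Y_t', t = p+1, ..., T+p

# Rows of Z: [1, Delta Y'_{t-1}, ..., Delta Y'_{t-p+1}, Y'_{t-1}].
rows = []
for t in range(p, raw_len):                    # 0-based index of Y_t
    lagged = [dY[t - 1 - j] for j in range(1, p)]
    rows.append(np.concatenate([[1.0], *lagged, Y[t - 1]]))
Z = np.array(rows)

k = 1 + (p - 1) * n + n                        # number of columns of Z (d = 0 here)

# OLS estimator and residual cross-product appearing in the posterior (17).
eta_hat = np.linalg.solve(Z.T @ Z, Z.T @ DY)
S = DY.T @ DY - DY.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ DY)
```

The matrices DY and Z have T rows each, and S is the symmetric positive semidefinite cross-product of the OLS residuals.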
These modeling choices imply the following posterior density:

(17) g(Θ | y) ∝ |Ω|^{−(T+n+1)/2} exp{ −(1/2)·tr[Ω^{−1}(∆Y − Z·η)′(∆Y − Z·η)] }
            = |Ω|^{−(T+n+1)/2} exp{ −(1/2)·tr{Ω^{−1}[S + (η − η̂)′·Z′Z·(η − η̂)]} },

where η̂ = (Z′Z)^{−1}Z′∆Y and S = ∆Y′∆Y − ∆Y′Z(Z′Z)^{−1}Z′∆Y.

To implement step 3 of Table 1, we need to find the maximum a posteriori of (17) under the constraint Θ ∈ Θ_H, i.e. we need to maximize the posterior in Θ_H. Since we are testing the rank of matrix Π, it is necessary to maximize the posterior assuming the rank of Π is r, 0 ≤ r ≤ n. Thanks to the modeling choices made here (Gaussian likelihood and equation (16) as prior), our posterior is almost identical to a Gaussian likelihood, allowing us to find this maximum using a strategy similar to that proposed by [21], who derived the maximum of the (Gaussian) likelihood function assuming a reduced rank for Π. We will summarize Johansen's algorithm, providing in Appendix B a heuristic argument of why it indeed yields the maximum value of the posterior under the assumed hypotheses.

We begin by estimating a VAR(p−1) model for ∆Y_t with all the explanatory variables shown in (14) except for Y_{t−1}. Using the matrix notation established above, this corresponds to estimating ∆Y = Z_0 · η_0 + U, where Z_0 has rows [1, D′_t, ∆Y′_{t−1}, ..., ∆Y′_{t−p+1}] for t = p+1, ..., T+p and η_0 = [e′; τ′; υ′_1; ...; υ′_{p−1}] stacks the auxiliary coefficient matrices; in other words, Z_0 is obtained from matrix Z by extracting its last n columns, exactly those corresponding to Y_{t−1}.

(See Appendix A for more information on the matrix normal distribution; see also the discussion made at the beginning of Section 5.)
We also estimate a second set of auxiliary equations, regressing Y_{t−1} on a vector of constants and D_t, ∆Y_{t−1}, ..., ∆Y_{t−p+1}. By piling up the transposed vectors Y′_{t−1} for t = p+1, ..., T+p we obtain a (T × n) matrix, denoted by Y_{−1}:

Y_{−1} = [Y′_p; Y′_{p+1}; ...; Y′_{T+p−1}].

These equations can be represented by Y_{−1} = Z_0 · η_1 + V, where η_1 = [m′; ν′; ζ′_1; ...; ζ′_{p−1}] stacks the auxiliary coefficients.

Considering the OLS estimates of these two sets of equations and their respective estimated residuals, we may write

(18) ∆Y = Z_0 · η̂_0 + Û,
(19) Y_{−1} = Z_0 · η̂_1 + V̂,

where η̂_0 = (Z′_0 Z_0)^{−1} Z′_0 · ∆Y, η̂_1 = (Z′_0 Z_0)^{−1} Z′_0 · Y_{−1}, and Û and V̂ are the respective matrices of estimated residuals. It is possible to show that the estimated residuals of these auxiliary regressions are related through Π in the regression

(20) Û = V̂ Π′ + Ŵ.

One can prove that the OLS estimate of Π obtained from (15) and that obtained from (20) are numerically identical, as are the estimated residuals Ê and Ŵ.

The second stage of Johansen's algorithm requires the computation of the following sample covariance matrices of the OLS residuals obtained above:

Σ̂_VV = (1/T)·V̂′V̂,  Σ̂_UU = (1/T)·Û′Û,  Σ̂_UV = (1/T)·Û′V̂,  Σ̂_VU = Σ̂′_UV,

and from these we find the n eigenvalues of the matrix

Σ̂^{−1}_VV · Σ̂_VU · Σ̂^{−1}_UU · Σ̂_UV,

ordering them decreasingly, λ̂_1 > λ̂_2 > ... > λ̂_n. The maximum value attained by the log-posterior subject to the constraint that there are r (0 ≤ r ≤ n) cointegration relationships is

(21) ℓ* = K − ((T+n+1)/2)·log|Σ̂_UU| − ((T+n+1)/2)·Σ_{i=1}^{r} log(1 − λ̂_i),

where K is a constant that depends only on T, n and y.
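Under simplifying assumptions (p = 1, a constant as the only deterministic term, simulated cointegrated data), the two auxiliary regressions and the eigenvalue step above can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: n = 2 cointegrated random walks, p = 1.
n, raw_len = 2, 402
alpha = np.array([[-0.4], [0.2]])
beta = np.array([[1.0], [-1.0]])
Pi = alpha @ beta.T
Y = np.zeros((raw_len, n))
for t in range(1, raw_len):
    Y[t] = Y[t - 1] + Pi @ Y[t - 1] + rng.normal(scale=0.1, size=n)

dY = np.diff(Y, axis=0)
DY = dY                      # Delta Y_t stacked (p = 1, so no lagged differences)
Ylag = Y[:-1]                # Y_{t-1} stacked
T = DY.shape[0]
Z0 = np.ones((T, 1))         # only a constant, since p = 1

# Residuals of DY and Ylag regressed on Z0 (here: simple demeaning).
proj = Z0 @ np.linalg.solve(Z0.T @ Z0, Z0.T)
U = DY - proj @ DY
V = Ylag - proj @ Ylag

S_vv = V.T @ V / T
S_uu = U.T @ U / T
S_uv = U.T @ V / T

# Eigenvalues of S_vv^{-1} S_vu S_uu^{-1} S_uv, sorted decreasingly.
M = np.linalg.solve(S_vv, S_uv.T) @ np.linalg.solve(S_uu, S_uv)
lam = np.sort(np.linalg.eigvals(M).real)[::-1]

def ell_star(r):
    # Maximum of the log-posterior under rank(Pi) = r, up to the constant K (eq. 21).
    c = (T + n + 1) / 2.0
    return -c * np.log(np.linalg.det(S_uu)) - c * np.log(1 - lam[:r]).sum()
```

With one true cointegrating relation, λ̂_1 is clearly separated from λ̂_2, and ℓ* is non-decreasing in r, the property exploited by the sequential testing procedure below.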
Since ℓ* represents the maximum of the log-posterior, s* is obtained simply as s* = exp(ℓ*), completing step 3 of Table 1. (Notes: the numerical identity of the OLS estimates of Π obtained from (15) and from (20) is known as the Frisch-Waugh-Lovell theorem in the econometrics literature; see [18], theorem 3.3, or [6], Section 2.4. The constant K depends on y through the marginal distribution of the data set. See Appendix A for more on the Inverse-Wishart distribution.)

As in Section 4, we compute ēv in step 4 by means of a Monte Carlo algorithm. It is easy to factor the full posterior (17) as the product of a (matrix) normal and an Inverse-Wishart density, suggesting a Gibbs sampler to generate random samples from the full posterior. The conditional posteriors for η and Ω are, respectively,
(22) g_η(η | Ω, y) = MN_{k×n}(η̂, (Z′Z)^{−1}, Ω),
(23) g_Ω(Ω | η, y) = IW(Ω | S + (η − η̂)′·Z′Z·(η − η̂), T),

where S = ∆Y′∆Y − ∆Y′Z(Z′Z)^{−1}Z′∆Y, IW denotes the Inverse-Wishart distribution, k = pn + d + 1 is the number of rows of η, and η̂ is its OLS estimator, as above. From a Gibbs sampler with these conditionals we obtain a random sample from the full posterior and estimate ēv as the proportion of sampled parameter vectors that yield a value of the posterior greater than s*. Finally, we obtain ev = 1 − ēv in the final step (5). The whole implementation for cointegration tests, following the assumptions made in this section, is summarized in Table 4.

General algorithm: compute ev supporting hypothesis H: rank(Π) = r (0 ≤ r ≤ n) in model (14)
1. Statistical model: Gaussian; prior: h(Θ) ∝ |Ω|^{−(n+1)/2}.
2. Reference density: r(Θ) ∝ 1; relative surprise function: g(Θ | y).
3. Find s*: Johansen's algorithm; obtain ℓ* from equation (21) and set s* = exp(ℓ*).
4. Gibbs sampler (from eqs. (22) and (23)) to obtain N random samples of parameter vectors from (17). Evaluate the posterior at the sampled vectors and estimate ēv as the proportion of the N samples for which the evaluated values are larger than s*.
5. Find ev = 1 − ēv.

TABLE 4. Pseudocode to implement the FBST for cointegration tests

Before presenting the results of the procedure applied to real data sets, it is important to remark one feature of the FBST applied to cointegration tests. Since the estimated eigenvalues λ̂_i lie between zero and one, (21) shows that ℓ*_0 ≤ ℓ*_1 ≤ ... ≤ ℓ*_n, where ℓ*_r denotes the maximum of the posterior (17) under the assumption that Π has rank r, 0 ≤ r ≤ n. Therefore, one may say that the hypotheses rank(Π) = r are nested, in the sense that the respective e-values obtained by the FBST for these hypotheses are always non-decreasing: ev(0) ≤ ev(1) ≤ ... ≤ ev(n). This nested formulation is also present in the frequentist procedure proposed by [21], based on the likelihood ratio statistics for successive ranks of Π. Thus, the FBST should be used, like the maximum eigenvalue test, in a sequential procedure to test for the number of cointegrating relationships. We will show how this should be done when presenting the applied results in the sequel.

6.1. Results.
Now we present, by means of four examples, the application of the FBST as a cointegration test. In all the examples we have adopted a Gaussian likelihood and the improper prior (16). The Gibbs sampler was implemented as described above, providing 51,000 random vectors from the posterior (17). The first 1,000 samples were discarded as burn-in, the remaining 50,000 being used to estimate the integral (2). The tables show the e-values computed from the FBST and the maximum eigenvalue test statistics with their respective p-values.
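The Gibbs sampler just described can be sketched as follows. This is a minimal illustration on simulated data: the threshold s* below is an arbitrary placeholder (a quantile of the sampled posterior values) rather than the output of Johansen's algorithm, so the resulting e-value is illustrative only.

```python
import numpy as np
from scipy.stats import invwishart, matrix_normal

rng = np.random.default_rng(4)

# Hypothetical small system so the sketch runs fast: n = 2, p = 1, constant only.
n, raw_len = 2, 202
Y = rng.normal(scale=0.1, size=(raw_len, n)).cumsum(axis=0)
DY = np.diff(Y, axis=0)
Z = np.hstack([np.ones((raw_len - 1, 1)), Y[:-1]])   # rows: [1, Y'_{t-1}]
T, k = Z.shape

ZtZ = Z.T @ Z
eta_hat = np.linalg.solve(ZtZ, Z.T @ DY)
S = DY.T @ DY - DY.T @ Z @ np.linalg.solve(ZtZ, Z.T @ DY)
ZtZ_inv = np.linalg.inv(ZtZ)

def log_post(eta, Omega):
    # Log of the posterior (17), up to an additive constant.
    R = DY - Z @ eta
    return (-(T + n + 1) / 2) * np.linalg.slogdet(Omega)[1] \
        - 0.5 * np.trace(np.linalg.solve(Omega, R.T @ R))

# Gibbs sampler alternating between the conditionals (22) and (23).
draws = []
Omega = S / T
for it in range(600):
    eta = matrix_normal.rvs(mean=eta_hat, rowcov=ZtZ_inv, colcov=Omega,
                            random_state=rng)
    A = S + (eta - eta_hat).T @ ZtZ @ (eta - eta_hat)
    Omega = invwishart.rvs(df=T, scale=A, random_state=rng)
    if it >= 100:                   # discard burn-in
        draws.append(log_post(eta, Omega))

# Placeholder threshold standing in for s* = exp(ell*) from Johansen's algorithm.
s_star_log = np.quantile(draws, 0.9)
ev_bar = np.mean(np.array(draws) > s_star_log)
ev = 1.0 - ev_bar
```

In the actual test, s* comes from step 3 of Table 4, and the full 51,000-draw chain replaces this short one.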
Example 1: we analysed four electroencephalography (EEG) signals from a subject that had previously presented epileptic seizures. The original study, [41], aimed at detecting seizures based on multiple hours of recordings for each individual; the cointegration analysis of the mentioned signals was presented by [32]. In fact, the cointegration hypothesis is tested using the phase processes of the original signals.

The labels of the modelled series refer to the electrodes on the scalp. As seen in Figures 1 and 2, the series are called FP1-F7, FP1-F3, FP2-F4 and FP2-F8, where FP refers to the frontal lobes and F refers to a row of electrodes placed behind these. Even numbered electrodes are on the right side and odd numbered electrodes are on the left side. The electrodes for these four signals mirror each other on the left and right sides of the scalp.

(Notes: the estimated eigenvalues λ̂_i correspond to the squared canonical correlations between ∆Y_t and Y_{−1}, corrected for the variables in Z_0, and therefore lie between 0 and 1. The phase processes are estimated by passing the signal through the Hilbert transform and then "unwrapping" the resulting transform; see [32], Sections 2 and 5, for more details, and also [15] for more details on "unwrapping". The original data set is available at https://physionet.org/content/chbmit/1.0.0/.)

The recordings of the studied subject identified a seizure in the interval (measured in seconds) [ , ]. Therefore, like [32], we analyze the period of 41 seconds prior to the seizure, interval [ , ], and the subsequent 41 seconds, interval [ , ], the seizure period. In the sequel we will refer to these as prior to seizure and during seizure, respectively. Since the sampling frequency is 256 measurements per second, there are a total of 10,496 measurements for each of the four signals.

FIGURE 1. Estimated phase processes prior to a seizure
FIGURE 2. Estimated phase processes during a seizure

Figures 1 and 2 display the estimated phases based on the original signals. The model proposed by [32] is a VAR(1), resulting in the VECM

(24) ∆Y_t = c + Π Y_{t−1} + E_t.

Tables 5 and 6 present the results, which essentially lead to the same conclusions obtained by [32], even though they based their findings on the trace test.

H        FBST    Max. eig.    p-value
r = 0
r = 1
r = 2
r = 3

TABLE 5. FBST and max. eig. test: prior to seizure

H        FBST    Max. eig.    p-value
r = 0
r = 1
r = 2
r = 3

TABLE 6. FBST and max. eig. test: during seizure
Example 2: [49] compare three methods for producing empirical seasonal temperature forecasts over South America. One of these methods is based on a (possible) long-term cointegration relationship between the temperatures of the quarter March-April-May (MAM) of each year and the temperatures of the previous months of November-December-January (NDJ). When there is such a relationship, the authors use the NDJ temperatures (of the previous year) as a predictor for the following MAM season.

(Notes to Example 1: the subject is an eleven-year-old female; [32] used 40 seconds for each period, obtaining slightly different results; see Table 8 of [32].)

The original data set has monthly temperatures for each coordinate (latitude and longitude) of the covered area (Global Historical Climatology Network (GHCN)/Climate Anomaly Monitoring System (CAMS) 2m temperature analysis, 0.5° × 0.5° resolution, available at https://climexp.knmi.nl/NCEPData/ghcn_cams_05.nc). The mentioned series of temperatures (MAM and NDJ) are computed as seasonal averages from this monthly data set by averaging over three consecutive months. Since data are available from January 1949 to May 2020, the time series of seasonal average surface temperatures has length 72 for each grid point.

The authors of [49] consider Y_t as a two dimensional vector, its first component being the seasonal (average) MAM temperature of year t and the second component the seasonal NDJ temperature of the previous year. They consider a VAR(2) without deterministic terms to model the series, resulting in the VECM

(25) ∆Y_t = Γ_1 ∆Y_{t−1} + Π Y_{t−1} + E_t.

We have chosen five grid points corresponding to major Brazilian cities to test the cointegration hypothesis for the mentioned seasonal series. Figures 3 and 4 show the seasonal temperatures for São Paulo and Brasília respectively, indicating that the cointegration hypothesis is plausible for both cities.
FIGURE 3. Seasonal (MAM and NDJ) temperatures for São Paulo from 1949 to 2020
FIGURE 4. Seasonal (MAM and NDJ) temperatures for Brasília from 1949 to 2020

                 H: r = 0                      H: r = 1
                 FBST   Max. eig.   p-value    FBST   Max. eig.   p-value
São Paulo
Rio de Janeiro
Belo Horizonte
Brasília
Salvador

TABLE 7. FBST and maximum eigenvalue test applied to temperature data (MAM and NDJ series) of the mentioned Brazilian cities

The results are shown in Table 7. One remark about Brasília seems in order. The city was built to be the federal capital, being officially inaugurated in April 1960. Construction began circa 1957, and before that the site had no human occupation. The process of moving the whole administration from Rio de Janeiro, the former capital, was slow, and only the 1980 census detected a population over 1 million inhabitants. Figure 4 indicates that the seasonal temperatures began to rise exactly after 1980.
Example 3: we applied the FBST to the Finnish data set used in the seminal work [21]. The authors used the logarithms of the series of M1, among other variables, to search for a long-term relationship. The sample has 106 quarterly observations of the mentioned variables, starting at the second quarter of 1958 and finishing in the third quarter of 1984. The chosen model was a VAR(2) with unrestricted constant and seasonal dummies for the first three quarters of the year. Writing the chosen model in the error correction form, we have:

(26) ∆Y_t = c + Φ_1 D_{1t} + Φ_2 D_{2t} + Φ_3 D_{3t} + Γ_1 ∆Y_{t−1} + Π Y_{t−1} + E_t,

where Π = Φ_1 + Φ_2 − I, Γ_1 = −Φ_2, c is a vector of constants and D_{it} denotes the seasonal dummy for quarter i = 1, 2, 3. The results are displayed in Table 8.

(Notes: the coordinates chosen in Example 2 were the closest ones to 23.5505°S, 46.6333°W for São Paulo; 22.9068°S, 43.1729°W for Rio de Janeiro; 19.9167°S, 43.9345°W for Belo Horizonte; 15.8267°S, 47.9218°W for Brasília; and 12.9777°S, 38.5016°W for Salvador. The present population of the Federal District, which includes Brasília and minor surrounding cities, is almost 3.2 million inhabitants.)

H        FBST    Max. eig.    p-value
r = 0
r = 1
r = 2

TABLE 8. FBST and maximum eigenvalue test applied to the Finnish data of Johansen and Juselius (1990)

In [21], the authors concluded that there are at least two cointegration vectors. Since the hypotheses being tested are nested, for the FBST r = …

Example 4: as a final example, we apply the FBST to a US data set discussed in [27]. The observations have annual periodicity, spanning 1900 to 1985. We tested for cointegration between real national income and M1.

H        FBST    Max. eig.    p-value
r = 0
r = 1

TABLE 9. FBST and maximum eigenvalue test applied to the US data of Lucas (2000)

7. FINAL REMARKS
In the past few decades, the econometric literature introduced statistical tests to identify unit roots and cointegration relationships in time series. The Bayesian approach to these topics advanced considerably after the 1990s, developing interesting alternatives, mostly for unit root testing. The (parametric) frequentist tests mentioned here may not be suitable because these procedures rely on the distribution of the test statistic (usually derived assuming the hypothesis being tested is true), which depends on a particular statistical model (usually Gaussian). When the distributions of such statistics cannot be obtained, the procedure is saved by asymptotic results. If the researcher considers different statistical models and the available sample is small, the results of the tests may be quite misleading.

The present work reviewed a simple and powerful Bayesian procedure that can be applied to both purposes: unit root and cointegration testing. We have also shown that the FBST works considerably well even when one uses improper priors, a choice that may preclude the derivation of Bayes factors, the standard Bayesian procedure for hypothesis testing.

(Notes: in equation (26), the series in Y_t have one unit root with drift vector c, and the cointegrating relations may have a non-zero mean; for more information about how to specify deterministic terms in a VAR see [28], chapter 6. Whether an e-value supports a hypothesis clearly depends on the chosen criterion used to make a decision, such as a cut-off value for the FBST or a loss function, as discussed in Section 2.)
A long series of articles has shown the versatility and properties of the FBST, such as: a. the e-value derivation and implementation are straightforward from its general definition; b. it uses absolutely no artificial devices such as a distinct probability measure on the hypothesis set induced by some specific parametrization; c. it is in strict compliance with the likelihood principle; d. it can conduct the test with any prior distribution; e. it does not need closed-form assumptions concerning error distributions, even for small samples; f. it is an exact procedure, since it does not rely on asymptotic approximations; and g. it is invariant with respect to the null hypothesis parametrization and with respect to the parameter space parametrization.

To proceed with this research agenda, it would be interesting to perform more simulation studies with the FBST applied to unit root testing for a larger group of parametric and semi-parametric models (likelihoods). Another possibility is to include moving average terms in the data generating processes and work with Gaussian and non-Gaussian ARMA models. Notice that, given the points made above, these extensions would not impose major problems on the FBST as they would on the frequentist procedures. Regarding cointegration, the same extensions may be studied in future works, although the adoption of statistical models outside the Gaussian family would require further efforts to numerically implement the FBST. We shall also investigate the effect of the prior choice on the estimates of cointegration relations, especially for small samples.

ACKNOWLEDGMENTS
This work was supported by UFSCar - Federal University of São Carlos, USP - University of São Paulo, and UFMS - Federal University of Mato Grosso do Sul. This work was also partially supported by CNPq - the Brazilian National Council of Technological and Scientific Development (grants PQ 307648-2018-4, 302767-2017-7, 301892-2015-6 and 308776-2014-3); and FAPESP - the State of São Paulo Research Foundation (grants CEPID Shell-RCGI 2014-50279-4; CEPID CeMEAI 2013-07375-0). The authors are extremely grateful for the support received from their colleagues, collaborators, users and critics in the construction of their research projects.

APPENDIX A. NON-STANDARD DISTRIBUTIONS USED IN THIS ARTICLE
A.1. Inverse-Gamma. The probability density function of the Inverse-Gamma distribution is given by

f(x | a, b) = (b^a / Γ(a)) · (1/x)^{a+1} · exp(−b/x),  for x > 0,

where a and b are both positive real numbers and Γ is the gamma function.

A.2. Matrix Normal.
The probability density function of a random matrix X with dimensions p × q that follows the matrix normal distribution MN_{p×q}(M, U, V) has the form

f(X | M, U, V) = exp( −(1/2)·tr[ V^{−1}(X − M)′ U^{−1}(X − M) ] ) / ( (2π)^{pq/2} |V|^{p/2} |U|^{q/2} ),

where M ∈ R^{p×q}, U ∈ R^{p×p} and V ∈ R^{q×q}, with U and V symmetric positive definite matrices. The matrix normal distribution can be characterized by the multivariate normal distribution as follows: X ∼ MN_{p×q}(M, U, V) if and only if vec(X) ∼ N_{pq}(vec(M), V ⊗ U), where ⊗ denotes the Kronecker product and vec the vectorization of a matrix.

A.3. Inverse-Wishart.
The probability density function of the Inverse-Wishart distribution is

f(x | Λ, ν) = ( |Λ|^{ν/2} / ( 2^{νp/2} Γ_p(ν/2) ) ) · |x|^{−(ν+p+1)/2} · exp[ −(1/2)·tr(Λ x^{−1}) ],

where x and Λ are p × p positive-definite matrices and Γ_p is the multivariate gamma function. Notice that we may also write the same density with tr(x^{−1}Λ) inside the exponential function, as is convenient in our implementation of the Gibbs sampler in Section 6. (See [44], p. 253, for the vec characterization of the matrix normal stated above; see also [46] and the references therein.)

APPENDIX B. HEURISTIC PROOF OF JOHANSEN'S PROCEDURE
The goal of this appendix is to provide a brief heuristic explanation of the procedure, discussed in Section 6, that finds the maximum of posterior (17) subject to the hypothesis that matrix Π has reduced rank r, 0 ≤ r ≤ n. The procedure is based on the algorithm proposed in [20, 21] to maximize a Gaussian likelihood under the same assumption (reduced rank of matrix Π). As mentioned in Section 6, Johansen's algorithm can be applied to the posterior (17) since this distribution is very close to a (multivariate) Gaussian likelihood.

The first step of the algorithm involves "concentrating" the posterior, i.e. assuming Ω and Π given and maximizing the posterior with respect to all the other parameters in Θ. Hence, let γ denote the matrix η without the block corresponding to Π, i.e.

γ′ = [ c  Φ  Γ_1  ...  Γ_{p−1} ].

The concentrated log-posterior, denoted by M, is found by replacing γ with γ̂(Π) in (17):

(27) M(Π, Ω | y) = ln[ g(γ̂(Π), Π, Ω | y) ] = C + ((T+n+1)/2)·ln|Ω^{−1}| − (1/2)·tr[ Ω^{−1}(Û − V̂Π′)′(Û − V̂Π′) ],

where C is a constant that depends on T, n and y. The strategy behind concentrating the posterior is that, if we can find the values Ω̂ and Π̂ that maximize M, then these same values, along with γ̂(Π̂), will maximize (17) under the constraint rank(Π) = r. Carrying the concentration one step further, we can find the value of Ω that maximizes (27) assuming Π known, giving

Ω̂(Π) = (1/(T+n+1)) · (Û − V̂Π′)′(Û − V̂Π′).
To evaluate the concentrated log-posterior at Ω̂(Π), notice that

tr[ Ω̂(Π)^{−1}(Û − V̂Π′)′(Û − V̂Π′) ] = tr[ (T+n+1)·I_n ] = n(T+n+1),

and therefore, denoting by M* this new concentrated log-posterior, we have

(28) M*(Π | y) = C − n(T+n+1)/2 − ((T+n+1)/2)·ln | (1/(T+n+1))·(Û − V̂Π′)′(Û − V̂Π′) |
(29)           = C − n(T+n+1)/2 − ((T+n+1)/2)·ln | (T/(T+n+1)) · (1/T)·(Û − V̂Π′)′(Û − V̂Π′) |
(30)           = C − n(T+n+1)/2 − ((T+n+1)/2)·ln [ (T/(T+n+1))^n · | (1/T)·(Û − V̂Π′)′(Û − V̂Π′) | ]
(31)           = K − ((T+n+1)/2)·ln | (1/T)·(Û − V̂Π′)′(Û − V̂Π′) |,

where K is a new constant depending only on T, n and y. Equation (31) gives the maximum value one can achieve for the log-posterior for any given matrix Π. Thus, maximizing the posterior comes down to choosing Π so as to minimize the determinant

| (1/T)·(Û − V̂Π′)′(Û − V̂Π′) |

subject to the constraint rank(Π) = r. The solution of this problem demands the analysis of the sample covariance matrices of the OLS residuals Û and V̂; here we only present the final expression for the maximum value achieved by the log-posterior, denoted ℓ* in Section 6:

(32) ℓ* = K − ((T+n+1)/2)·ln|Σ̂_UU| − ((T+n+1)/2)·Σ_{i=1}^{r} ln(1 − λ̂_i).

The formal proof of Johansen's algorithm can be found in [19], chapter 20.
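The concentration step can be checked numerically (a sketch with arbitrary residual matrices, for illustration only): Ω̂(Π) attains the maximum of (27) over Ω, and evaluating the trace at Ω̂(Π) gives n(T + n + 1), as used in (28):

```python
import numpy as np

rng = np.random.default_rng(5)

# Arbitrary residual matrices U_hat, V_hat and a candidate Pi.
T, n = 50, 2
U_hat = rng.normal(size=(T, n))
V_hat = rng.normal(size=(T, n))
Pi = rng.normal(scale=0.1, size=(n, n))

R = U_hat - V_hat @ Pi.T            # rows hold (U_t - Pi V_t)'
A = R.T @ R
c = T + n + 1

def M(Omega):
    # Concentrated log-posterior (27) as a function of Omega, up to the constant C.
    _, logdet = np.linalg.slogdet(Omega)
    return -(c / 2) * logdet - 0.5 * np.trace(np.linalg.solve(Omega, A))

Omega_hat = A / c

# Trace identity used in (28): tr[Omega_hat^{-1} A] = n (T + n + 1).
trace_at_opt = np.trace(np.linalg.solve(Omega_hat, A))

# Omega_hat should beat positive definite perturbations of itself.
best = M(Omega_hat)
for _ in range(20):
    Q = np.eye(n) + rng.normal(scale=0.1, size=(n, n))
    assert M(Q @ Omega_hat @ Q.T) <= best + 1e-9
```

The perturbations Q Ω̂ Q′ stay positive definite while moving Ω̂ in generic directions, so the loop probes the maximality claim without solving the optimization analytically.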
OINTEGRATION AND UNIT ROOT TESTS: A FULLY BAYESIAN APPROACH 19
Chapter 20 of [19] provides the formal derivation of (32).R
EFERENCES [1] L. Bauwens and M. Lubrano.
Advances in Econometrics , volume 11, part B, pages 3-28. London: JAIPress, 1996.[2] L. Bauwens, M. Lubrano, and J.-F. Richard.
Bayesian Inference in Dynamic Econometric Models . Oxford:Oxford University Press, 1999.[3] W. Borges and J. M. Stern. The Rules of Logic Composition for the Bayesian Epistemic E-Values.
LogicJournal of the IGPL , 15:401-420, 2007.[4] J. Chao and P. C. Phillips. Model selection in partially nonstationary vector autoregressive processes withreduced rank structure.
Journal of Econometrics , 91:227-271, 1999.[5] S. Chib and E. Greenberg. Understanding the Metropolis-Hastings algorithm.
The American Statistician ,49:327-335, 1995.[6] R. Davidson and J. G. MacKinnon.
Econometric Theory and Methods . Oxford: Oxford University Press,2004.[7] D. DeJong. Co-integration and trend-stationary in macroeconomic time series.
Journal of Econometrics ,52:347-370, 1992.[8] D. DeJong and C. H. Whiteman. Reconsidering Trends and random walks in macroeconomic time series.
Journal of Monetary Economics , 28:221-254, 1991.[9] M. A. Diniz, C.A.B. Pereira, J.M.Stern. Unit Roots: Bayesian Significance Test.
Communications in Sta-tistics: Theory and Methods , 40:4200-4213, 2011.[10] M. A. Diniz, C.A.B. Pereira, J.M.Stern. Cointegration: Bayesian Significance Test.
Communications inStatistics: Theory and Methods , 41:3562-3574, 2012.[11] M. A. Diniz, C. A. B. Pereira, A. Polpo, J. M. Stern and S. Wechsler. Relationship between Bayesian andfrequentist significance indices.
International Journal for Uncertainty Quantification , 2:161172, 2012.[12] D. A. Dickey and W. A. Fuller. Distribution of the estimators for autoregressive time series with a unit root.
Journal of the American Statistical Association , 74:427-431, 1979.[13] D. A. Dickey and S. G. Pantula. Determining the Ordering of Differencing in Autoregressive Processes.
Journal of Business & Economic Statistics , 5:455-461, 1987.[14] R. F. Engle and C. W. J. Granger. Co-Integration and Error Correction: Representation, Estimation, andTesting.
Econometrica , 55:251-276.[15] W. J. Freeman. Hilbert transform for brain waves. Scholarpedia, 2(1):1338, 2007.[16] J. Geweke. Bayesian reduced rank regression in econometrics.
Journal of Econometrics, 75:121-146, 1996.
[17] I. J. Good. Good Thinking: The Foundations of Probability and Its Applications. Minneapolis: University of Minnesota Press, 1983.
[18] W. H. Greene. Econometric Analysis. New Jersey: Prentice Hall, 2008.
[19] J. D. Hamilton. Time Series Analysis. Princeton: Princeton University Press, 1994.
[20] S. Johansen. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control, 12:231-254, 1988.
[21] S. Johansen and K. Juselius. Maximum likelihood estimation and inference on cointegration - with application to the demand for money. Oxford Bulletin of Economics and Statistics, 52:169-210, 1990.
[22] F. Kleibergen and R. Paap. Priors, posterior odds and Lagrange multiplier statistics in Bayesian analyses of cointegration. Discussion paper, Tinbergen Institute, 1997.
[23] F. Kleibergen and R. Paap. Priors, posterior odds and Bayes factors for a Bayesian analysis of cointegration. Journal of Econometrics, 111:223-249, 2002.
[24] D. Kwiatkowski, P. C. Phillips, P. Schmidt and Y. Shin. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics, 54:159-178, 1992.
[25] G. Koop, R. Strachan, H. K. van Dijk and M. Villani. The Palgrave Handbook of Theoretical Econometrics, chapter 25. Palgrave Macmillan, 2006.
[26] M. Lubrano. Testing for unit roots in a Bayesian framework. Journal of Econometrics, 69:81-109, 1995.
[27] R. Lucas. Inflation and welfare. Econometrica, 68:247-274, 2000.
[28] H. Lütkepohl. New Introduction to Multiple Time Series Analysis. Berlin: Springer, 2005.
[29] J. G. MacKinnon. Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal of Business and Economic Statistics, 12:167-176, 1994.
[30] M. R. Madruga, L. G. Esteves and S. Wechsler. On the Bayesianity of Pereira-Stern tests. Test, 10:291-299, 2001.
[31] C. Nelson and C. Plosser. Trends and random walks in macroeconomic time series: Some evidence and implications. Journal of Monetary Economics, 10:139-162, 1982.
[32] J. Østergaard, A. Rahbek and S. Ditlevsen. Oscillating systems with cointegrated phase processes. Journal of Mathematical Biology, 75:845-883, 2017.
[33] C. A. B. Pereira and J. M. Stern. Evidence and credibility: Full Bayesian significance test for precise hypotheses. Entropy, 1:69-80, 1999.
[34] C. A. B. Pereira, J. M. Stern and S. Wechsler. Can a significance test be genuinely Bayesian? Bayesian Analysis, 1:79-100, 2008.
[35] P. C. Phillips. To criticize the critics: An objective Bayesian analysis of stochastic trends. Journal of Applied Econometrics, 6:333-364, 1991a.
[36] P. C. Phillips. Bayesian routes and unit roots: De rebus prioribus semper est disputandum. Journal of Applied Econometrics, 6:435-474, 1991b.
[37] P. C. Phillips. Econometric model determination. Econometrica, 59:283-306, 1996.
[38] P. C. Phillips and Z. Xiao. A primer on unit root testing. Journal of Economic Surveys, 12:423-469, 1998.
[39] M. Schervish. Theory of Statistics. New York: Springer, 1995.
[40] P. C. Schotman and H. K. van Dijk. On Bayesian routes to unit roots.
Journal of Applied Econometrics ,49:387-401, 1991.[41] A. H. Shoeb.
Application of machine learning to epileptic seizure onset detection and treatment . Disserta-tion, MIT Health Sciences and Technology Division.[42] C. A. Sims. Bayesian skepticism on unit root econometrics.
Journal of Economic Dynamics and Control ,12:463-474, 1988.[43] C. A. Sims and H. Uhlig. Understanding unit rooters: a helicopter tour.
Econometrica , 59:1591-1600, 1991.[44] J. M. Stern.
Cognitive Constructivism and the Epistemic Significance of Sharp Statistical Hypotheses inNatural Sciences . Preprint availabe in arXiv:1006.5471 .[45] J. M. Stern. Symmetry, Invariance and Ontology in Physics and Statistics.
Symmetry , 3:611-635, 2011.[46] J. M. Stern and C. A. B. Pereira.
The e-value: A Fully Bayesian Significance Measure for Precise StatisticalHypotheses and its Research Program . Preprint available in arXiv:2001.10577 .[47] K. Sugita. Testing For Cointegration Rank Using Bayes Factors. The Warwick Economics Research PaperSeries, 654, 2002.[48] L. Tierney and J. B. Kadane. Accurate approximation for posterior moments and marginal densities.
Journalof the American Statistical Association , 81:82-86, 1986.[49] A. A. Turasie and C. A. S. Coelho. Cointegration modelling for empirical South American seasonal tem-perature forecasts.
International Journal of Climatololgy , 36: 4523-4533, 2016.[50] I. Verdinelli and L. Wasserman. Computing Bayes factors using a generalization of the Savage-Dickeydensity ratio.
Journal of the American Statistical Association , 90:614-618, 1995.[51] M. Villani. Bayesian reference analysis of cointegration.
Econometric Theory , 21:326-357, 2005.[52] A. Wald.