Approximate Bayes factors for unit root testing
Martin Magris∗ and Alexandros Iosifidis
Department of Electrical and Computer Engineering, Aarhus University, Denmark
Abstract
This paper introduces a feasible and practical Bayesian method for unit root testing in financial time series. We propose a convenient approximation of the Bayes factor in terms of the Bayesian Information Criterion as a straightforward and effective strategy for testing the unit root hypothesis. Our approximate approach relies on few assumptions, is of general applicability, and preserves a satisfactory error rate. Among its advantages, it does not require the prior distribution on the model's parameters to be specified. Our simulation study and empirical application on real exchange rates show great accordance between the suggested simple approach and both Bayesian and non-Bayesian alternatives.
Keywords: Unit root inference; Bayesian analysis; Bayes factor; BIC
JEL classification: C11; C12; C22
∗ Corresponding author, [email protected]. Submitted for consideration at the 2021 annual conference of the International Association for Applied Econometrics (IAAE).

1 Introduction

In time series analysis the interest is often in the persistence properties of the process over the medium-long term. In this regard, unit roots are important since they relate to the persistence of shocks and to non-mean-reverting dynamics (e.g. Campbell and Perron, 1991). In other words, unit roots go hand in hand with non-stationarity. This has severe implications for the applicability of standard econometric techniques, among which spurious regression (Granger et al., 1974; Phillips, 1986) is the most notable one. Non-stationarity appears to arise quite commonly in economic time series, and with little surprise unit root testing has been one of the most prolific research areas in econometrics. Most theoretical and empirical work in the domain of non-stationary time series relies on classical frequentist methods, which stand as the reference approach.

Among others, (Zellner, 1971; Geweke, 1988; Poirier, 1988) showed that the Bayesian framework is in general well-suited to inferential problems in econometrics, and it was since (Sims, 1988) that a number of Bayesian methods for unit root testing and the corresponding Bayesian unit root literature developed. The earliest works appear remarkably optimistic and confident about the impact that a Bayesian approach could have on unit root testing, even claiming an overall superiority of the Bayesian approach over classical methods (Sims and Uhlig, 1991). On the contrary, the adoption and development of Bayesian methods would shortly appear to be much less straightforward and heavily debated. It was in (Phillips, 1991b) that a reconciliation with frequentist methods was first discussed in the light of impartial and objective Bayesian methods. This work raised several further issues related to the Bayesian approach to unit root testing that stimulated active research and strong debates. (Phillips, 1991a,b) advanced the idea that the achievement of impartial Bayesian analysis through flat priors on the parameters is not well-suited to time series models. Actually, flat priors over the autoregressive parameter achieve the opposite effect and have been shown to be quite informative (e.g. Kim and Maddala, 1991; Schotman and Van Dijk, 1991b; Phillips, 1991b; Leamer, 1991, among the others). A long debate on appropriate uninformative priors for Bayesian unit root inference followed (see Section 2). The determination of suitable input information through the prior distribution, which is generally the major reason for the divergence between the classical and the Bayesian approach, is tightly bound to the exact hypothesis being tested. For a simple autoregressive process of order one, hereafter denoted by AR(1), the unit root inference problem corresponds to testing the null hypothesis of the autoregressive parameter being equal to one. In general, we shall emphasize the exact/point-wise/non-interval nature of a hypothesis by referring to it as a point hypothesis. The goal of testing a point null hypothesis cannot be easily achieved with continuous priors, as the consequent continuous posterior would assign zero weight to the unit root hypothesis. Feasible tests can either use discontinuous priors that assign a non-zero mass to the unit root hypothesis and distribute the remaining one over some interval (e.g. Schotman and Van Dijk, 1991a; DeJong and Whiteman, 1991), or test closely-related non-point hypotheses with continuous priors (Koop, 1994, is instructive as it considers three possible nulls). The first approach is susceptible to poor objectivity (Phillips et al., 1993) while the second one does not properly match the exact and exclusive purpose of testing the unit root hypothesis (Schotman and Van Dijk, 1991b). Consequently, the Bayesian analysis can be respectively based on two radically different approaches: on Bayes factors and odds ratios, or on Bayes confidence sets and probability intervals over the posterior, with the first one being of difficult interpretation in terms of traditional p-values (Berger and Delampady, 1987). Furthermore, the inclusion of an intercept, a trend, or any richer structure in a simple AR(1) model is not a smooth extension, as prior beliefs on the autoregressive parameter generally change according to the particular deterministic component added to the model (Schotman and Van Dijk, 1991b). Also, the particular form under which a model is expressed, e.g. "structural" or "reduced" form, plays a role in unveiling feasible directions for the Bayesian analysis. Not less importantly, (Schotman and Van Dijk, 1991b; Uhlig, 1994; Lubrano, 1995) underline the importance of the conditioning set, and in particular the sensitivity of the whole inferential procedure to the first observation. Lastly, Bayesian methods are generally known for not being keen on simple algebra: also in unit root inference, numerical methods are often required for achieving approximate solutions (e.g. Zivot, 1994).

In this paper, we propose an approximate Bayesian unit root testing procedure that mitigates the above-mentioned criticalities, especially the choice of the prior. In particular, we focus on the simple AR(1) dynamics, which is a fundamental process and a major ingredient in the theory of non-stationary time series and unit root testing, and on testing the point null unit root hypothesis. We apply standard approximation results to obtain a generic Bayesian testing procedure based on approximate Bayes factors and thus approximate posterior odds. With an asymptotic error rate sufficient to guarantee the applicability of the proposed method also for samples of moderate size, the approximate form of the Bayes factor is independent of the choice of the priors, is remarkably simple to compute, and scales to more complex models as long as their maximum likelihood estimates are attainable. Indeed, our approximate Bayes factor is formulated as a simple function of the well-known Bayesian Information Criterion (BIC), being thus easy to implement and attractive for empirical research.
Although the BIC approximation of Bayes factors has proved to be a valid tool in different fields, its use in unit root testing has not been investigated, though it appears to be very well-suited for this class of problems. In the empirical sections, we propose a Monte Carlo experiment and an empirical application on real exchange rates, and analyze the performance of our BIC approach with respect to other competing Bayesian methods and the most widespread frequentist Dickey-Fuller test (Dickey and Fuller, 1981). Our experiment validates the proposed approach, as it stands out as a feasible and viable simple alternative for Bayesian unit root inference.

This paper is organized as follows. Section 2 reviews the literature on approaches, issues and advances in Bayesian unit root testing, underlining the numerous problems that arise in this context. Section 3 introduces Bayes factors and generally discusses the Bayesian testing framework based on evidence. Section 4 introduces our suggested testing approach based on the BIC approximation of Bayes factors. Section 5 reports the results of the Monte Carlo study, while an empirical application on real exchange rates is presented in Section 6. Section 7 concludes and suggests directions for future research. The Appendix collects details on some benchmark Bayesian unit root tests and further results from the simulation study.

2 Literature review

On the premise that the asymptotic distribution theory changes discontinuously between the stationary and the unit root case, with classical hypothesis testing appearing as an unreasonable inferential procedure opposed to a Bayesian flat prior, the Bayesian analysis of unit root models was first suggested in (Sims, 1988). Since then, a wide and rich literature in the field pinpointed the advantages and flaws of the Bayesian approach, making it complex and long-debated. First and most notably, the identification of a suitable prior is in this context remarkably difficult and widely discussed. Besides this, there are several issues associated with model specification and formulation, the role played by the initial observation, issues related to the invariance of the prior under different sampling frequencies, and computational arguments. In the following, we review the most relevant aspects of these issues. For a more comprehensive overview of the topic see e.g. (Maddala and Kim, 1998, Ch. 8), and the articles in the following dedicated special issues:
Journal of Applied Econometrics (1991, vol. 6, n. 4), Econometric Theory (1994, vol. 10), and Journal of Econometrics (1995, vol. 69, n. 1).
Here we shortly outline the setting based on flat priors adopted, among the others, by (Sims and Uhlig, 1991; Sims, 1988; Geweke, 1988; Thornber, 1967; Zellner, 1971; Schotman and Van Dijk, 1991b). Consider the simple AR(1) model
$$x_t = \rho x_{t-1} + u_t,$$
and assume a flat prior for $(\rho, \sigma)$, $\pi(\rho, \sigma) \propto 1/\sigma$, with $-\infty < \rho < \infty$ and $\sigma > 0$. $\rho$ is the autoregressive parameter and $\rho = 1$ corresponds to the unit root hypothesis of interest, under which the AR(1) model reduces to a random walk. Let $x_0$ be the initial starting value of the $T$ consecutive observations. With Gaussian likelihood
$$L(x \mid \rho, \sigma, x_0) = (2\pi)^{-T/2} \sigma^{-T} \exp\left(-\frac{\sum_{t=1}^{T}(x_t - \rho x_{t-1})^2}{2\sigma^2}\right),$$
the joint posterior for $(\rho, \sigma)$ is given by
$$\pi(\rho, \sigma \mid x, x_0) \propto \sigma^{-T-1} \exp\left(-\frac{\sum_{t=1}^{T}(x_t - \rho x_{t-1})^2}{2\sigma^2}\right) = \sigma^{-T-1} \exp\left[-\frac{R + (\rho - \hat{\rho})^2 Q}{2\sigma^2}\right],$$
with $\hat{\rho} = \sum x_t x_{t-1} / \sum x_{t-1}^2$ being the OLS estimate for $\rho$, $R = \sum \hat{\epsilon}_t^2 = \sum (x_t - \hat{\rho} x_{t-1})^2$ the residual sum of squares, and $Q = \sum x_{t-1}^2$. For the above joint posterior, one obtains the following margins:
$$\pi(\rho \mid x, x_0) \propto \left[R + (\rho - \hat{\rho})^2 Q\right]^{-T/2}, \qquad \pi(\sigma \mid x, x_0) \propto \sigma^{-T} \exp\left(-\frac{R}{2\sigma^2}\right).$$
Our notation distinguishes between priors and posteriors depending on the conditioning set: $\pi(\cdot)$ as opposed to $\pi(\cdot \mid \text{data})$, respectively.

The marginal posterior for $\rho$ has the form of a symmetric univariate t-distribution, centered around the OLS estimate $\hat{\rho}$, while the marginal posterior for $\sigma$ is an inverted gamma-2 distribution (Zellner, 1971). Sims and Uhlig (1991) conclude that classical methods relying on asymmetric distributions of the OLS estimator of $\rho$, such as the Dickey-Fuller statistics, attribute too much weight to large values of $\rho$, while the above Bayesian framework based on the flat prior is a more logical and sounder basis for inference than classical testing. This argument is advanced by comparing the distributions $\rho \mid \hat{\rho} = 1$ and $\hat{\rho} \mid \rho = 1$, with the first being the posterior distribution of the true parameter with the estimated parameter taken as given (Bayesian approach), and the second being the sampling distribution of the estimated parameter under the value of the true parameter (classical approach). While classical methods are generally based on an asymmetric and nonstandard distribution for the autoregressive parameter, Bayesian methods lead to a symmetric and standard posterior. The asymmetry in $\hat{\rho} \mid \rho = 1$ drives the argument that classical procedures based on p-values are misleading.
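For concreteness, the following minimal sketch (ours, not from the paper) evaluates the flat-prior marginal posterior $\pi(\rho \mid x, x_0)$ just derived on a grid; the simulated random-walk series and the grid bounds are illustrative assumptions.

```python
import numpy as np

def ar1_flat_prior_posterior(x, rho_grid):
    # Marginal posterior of rho under the flat prior pi(rho, sigma) proportional
    # to 1/sigma: pi(rho | x, x0) ~ [R + (rho - rho_hat)^2 Q]^(-T/2),
    # normalized numerically on the supplied grid.
    x_lag, x_now = x[:-1], x[1:]
    T = x_now.size
    Q = np.sum(x_lag ** 2)                       # Q = sum of x_{t-1}^2
    rho_hat = np.sum(x_now * x_lag) / Q          # OLS estimate of rho
    R = np.sum((x_now - rho_hat * x_lag) ** 2)   # residual sum of squares
    log_kernel = -(T / 2) * np.log(R + (rho_grid - rho_hat) ** 2 * Q)
    kernel = np.exp(log_kernel - log_kernel.max())   # stabilize before exp
    return kernel / (kernel.sum() * (rho_grid[1] - rho_grid[0]))

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(201))          # a pure random walk (rho = 1)
grid = np.linspace(0.8, 1.1, 3001)
post = ar1_flat_prior_posterior(x, grid)
print("posterior mode:", grid[post.argmax()])    # close to the OLS estimate
```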
A similar approach is that developed in (Schotman and Van Dijk, 1991b), see Appendix A.1 for a detailed description. This appears to be the most widespread setting for Bayesian unit root testing, commonly referenced also in the recent literature and here used as a benchmark. It takes $\rho \in \{S, 1\}$, $S = \{\rho \mid -1 < a \le \rho < 1\}$, and specifies the priors for $\rho$ and $\sigma$ as
$$\Pr(\rho = 1) = \pi_0, \qquad \Pr(\rho \mid \rho \in S) = \frac{1}{1-a}, \qquad \pi(\sigma) \propto \frac{1}{\sigma}.$$
That is, $\rho$ is taken uniform over $S$ and with probability mass $\pi_0$ on $\rho = 1$; $\sigma$ and $\rho$ are independent. The mass at $\rho = 1$ is intended to allow for a feasible testing of the null hypothesis $H_0: \rho = 1$ (see Section 3.2). For clarity, we shall refer to such an exact/point-wise/non-interval null hypothesis as a point null. Restrictions over the domain of $\rho$ are also adopted in (Geweke, 1988) and (DeJong and Whiteman, 1991), with the latter also providing an empirical analysis and a comparative study over the classical approach on the Nelson-Plosser data. The arbitrariness in selecting the restricted domain for the autoregressive parameter and the values of the statistics supporting a unit root decision are pointed out and criticized in (Sowell, 1991) and (Phillips, 1991b).

Opposed to the use of non-flat priors like Normal-Wishart conjugates, which are known to be informative about the properties of the model (Zellner, 1971; Phillips, 1991b) and which correspond to a prior belief that explosive roots are unlikely when centered around the unit root (Uhlig, 1994), the choice of flat priors is generally attractive. This is because flat priors often appear as a suitable approach for attributing a degree of "neutrality" or "objectivity" to Bayesian analyses, while being convenient in terms of computations, often leading to algebraic solutions (e.g. Zellner, 1971). The early Bayesian inference as of (Sims, 1988) strongly favors the above approach, but the use of flat priors is not as innocuous as it may appear. Phillips (1991b) raised concerns on uniform priors since the inference on $\rho$ is conditional on the observed sample moments and sufficient statistics, which depend on the value taken by $\rho$ and are radically different for $\rho = 1$ and $|\rho| < 1$. The use of flat priors does not correspond to uninformativeness and indeed is shown to downweight large values of $\rho$, i.e. of unit roots and explosive processes. This is because when $|\rho|$ is large, the data is more informative about $\rho$, and treating all the values of $\rho$ as equally likely implicitly corresponds to downweighting large values of $\rho$. Consequently, the testing strategy based on regressions with or without unit roots taken with the same likelihood irrespectively of the value of $\rho$ is inadequate. Furthermore, Phillips (1991b) finds that the discrepancy in the results between standard and Bayesian methods in unit root testing for macroeconomic US time series is largely due to the use of the misleading flat prior.

As an objective setting, (Phillips, 1991b) proposed a Jeffreys' prior (Jeffreys, 1946; Perks, 1947). Jeffreys' priors (often called "ignorance priors") are defined up to a proportionality factor as $\propto \sqrt{|i|}$, where $|i|$ is the determinant of the expected Fisher information matrix $i$. Jeffreys' priors render the posterior invariant under one-to-one reparametrizations and enjoy a number of desirable properties (Ly et al., 2016). Such priors are often interpreted as reflecting the properties of the sampling process and emphasizing data evidence, being interpretable as equivalent to the information that a typical single observation, on average, provides (Xia and Griffiths, 2012). Yet, the general use of the Jeffreys' prior with respect to the flat prior and its ability to convey a state of "ignorance" about the existence of a unit root has been readily argued in (Leamer, 1991). Furthermore, in an extensive Monte Carlo study Kim and Maddala (1991) find that the above approach favors big values of $\rho$, distorting the sample evidence. (Uhlig, 1994) finds however that for the univariate AR(1) model the major differences between flat and informative priors are limited to the explosive regions. Further details on the approach of (Phillips, 1991b) are provided in Appendix A.2.

As argued in (Schotman and Van Dijk, 1991b), a genuine and exclusive interest in the unit root hypothesis formally corresponds to testing the null $\rho = 1$. Therefore, this hypothesis should be preferred for a Bayesian treatment of unit root inference. If a continuous prior density for $\rho$ is adopted, the probability associated with $\rho = 1$ is zero, and so is its posterior probability (e.g. Zivot, 1994). In other words, testing a point null $\rho = 1$ – which corresponds to the only exact hypothesis on the existence of a unit root – is not trivial. Indeed, a continuous prior $\pi(\rho)$ requires a full Bayesian analysis for retrieving the corresponding posterior $\pi(\rho \mid x, x_0) \propto L(x \mid \rho, x_0)\,\pi(\rho)$, for some data $x$ and initial value $x_0$. A posterior confidence interval at the probability level $\alpha$ is a subset $C_\alpha$ such that
$$\Pr(C_\alpha \mid x, x_0) = \int_{C_\alpha} \pi(\rho \mid x, x_0)\, d\rho \ge 1 - \alpha,$$
where typically the subset $C_\alpha$ is chosen in such a way that its size is minimal (highest posterior density criterion). The unit root hypothesis can be rejected by checking whether $\rho = 1$ does not belong to the posterior confidence interval $C_\alpha$. However, it is not unlikely for the posterior to be bimodal (e.g. Kim and Maddala, 1991), so that $C_\alpha$ may result in a disconnected set. Since the interest is around the unit root, an alternative is to redefine $C_\alpha$ so that
$$\Pr(C_\alpha \mid x, x_0) = \int_{\rho_{\inf}}^{\rho_{\sup}} \pi(\rho \mid x, x_0)\, d\rho \ge 1 - \alpha,$$
with $\rho_{\inf}$ ($\rho_{\sup}$) being any convenient truncation point of the likelihood or $-\infty$ ($+\infty$).
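As an illustration of the credible-set route, the toy sketch below (our assumption-laden illustration, reusing ar1_flat_prior_posterior from the previous snippet) builds a grid-based highest-posterior-density set and checks whether $\rho = 1$ falls inside it.

```python
import numpy as np

def unit_root_in_hpd_set(x, alpha=0.05):
    # Grid HPD set: collect grid points by decreasing posterior density until
    # their accumulated mass reaches 1 - alpha, then test membership of rho = 1.
    grid = np.linspace(0.5, 1.5, 4001)
    post = ar1_flat_prior_posterior(x, grid)     # defined in the previous sketch
    dx = grid[1] - grid[0]
    order = np.argsort(post)[::-1]               # highest-density points first
    mass = np.cumsum(post[order]) * dx
    n_in = np.searchsorted(mass, 1 - alpha) + 1  # points carrying 1 - alpha mass
    in_set = np.zeros(grid.size, dtype=bool)
    in_set[order[:n_in]] = True
    return in_set[np.argmin(np.abs(grid - 1.0))]  # True: do not reject rho = 1

rng = np.random.default_rng(1)
print(unit_root_in_hpd_set(np.cumsum(rng.standard_normal(301))))  # random walk
```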
Alternatively, one can consider the probability
$$\Pr(\rho \ge 1 \mid x, x_0) = \int_{1}^{+\infty} \pi(\rho \mid x, x_0)\, d\rho \qquad (1)$$
and decide to reject the unit root when its value is below a certain threshold, say 5%. This however is no longer a test of a point null hypothesis, as now $H_0$ jointly covers the unit and explosive root cases:
$$H_0: \rho \ge 1 \qquad H_1: \rho < 1.$$
An alternative is to assign $\rho = 1$ a positive probability $\pi_0$ and assign to the values of $\rho$ over some interval $S$ the density $(1 - \pi_0)\,\pi(\rho)$, where $\pi(\rho)$ is a proper prior on $S$ (as e.g. in Schotman and Van Dijk, 1991b; DeJong and Whiteman, 1991). As in (Schotman and Van Dijk, 1991b; Zivot, 1994), a common choice for $S$ is the stationary region $|\rho| < 1$. With this procedure, a discontinuous prior can be easily adopted and the following hypotheses can be tested:
$$H_0: \rho = 1 \qquad H_1: \rho \in S.$$
In general, $H_1$ is not a generic non-unit root alternative, since explosive processes are ruled out, and neither is it a properly stationary alternative, since the lower bound of $S$ could potentially be either within or outside the stationary region. Furthermore, it is possible to draw a data-driven criterion for selecting $S$ in the testing procedure (see Appendix A.1). In this case, the specific formulation of the alternative depends on the OLS estimate of $\rho$ and on the sample size. It is therefore the test itself that defines the possible values of $\rho$ under $H_1$, and such implicitly imposed restrictions on $S$ are de facto analogous to adopting a strong prior.

In addition to the above difficulties, the whole inferential procedure is remarkably sensitive to model formulation, and the effect of nuisance parameters is not negligible. Schotman and Van Dijk (1991b) warn about the effects of extending the AR(1) model with trends and intercepts, since the inclusion of such elements changes with (or implicitly corresponds to) prior beliefs. Their study shows that the use of Jeffreys' priors as in (Phillips, 1991b) downweights the unit root hypothesis relative to a flat prior in models with trend and intercept. By using the alternative reduced-form parametrization of the AR model with trend, Schotman and Van Dijk (1991b) are able to explain the observed bias towards stationarity under flat priors. Lately, Phillips (1991b) explains that such behavior is dependent on the initial value and that conditioning the likelihood on $x_0$ resolves the issue. The apparently secondary role played by the initial value is thus shown to have an enormous impact on inference. Analyses in this regard can be found in (Zivot, 1994) and (Lubrano, 1995). The latter shows that the treatment of the first observation produces results that are more or less in accordance with the classical results and that e.g. the fixed or random treatment of $x_0$ does make a difference when an intercept is included or not (opposed to the simple AR(1) as outlined in (Zellner, 1971)). Following (Thornber, 1967), Lubrano (1995) extends the discussion suggesting the use of uninformative Beta densities. Further issues related to the choice of priors are interactions/correlations between the different elements in multivariate parameters (i.e. appropriate prior specifications with non-diagonal covariance matrices), their often improper nature, and computational issues (e.g. Zivot, 1994). Also, the conceptual problem of adopting priors that are irrespective and insensible to the sampling frequency of the observations has been pointed out e.g. in (Leamer, 1991) and (Sims, 1991).

The literature review would certainly be incomplete without references to more recent works. Though the major advances in the theory of Bayesian unit root testing come from the 20th century, Bayesian unit root testing keeps being an area of active research. Closely related to the above literature is the Full Bayesian Significance Test (FBST) of de Bragança Pereira and Stern (1999). This procedure allows testing for the point null unit root hypothesis, has no limiting requirements on the prior, allows for flexible error distributions, applies to small sample sizes, and is invariant with respect to the model's parametrization.
Though attractive, the FBST has not gained popularity in Bayesian unit root testing, as its only application is that of (Diniz et al., 2011), where the FBST performance is compared against the Augmented Dickey-Fuller test but not against existing alternative Bayesian methods, in both their simulation study and application. Recently, unit root testing in Stochastic Volatility (SV) models, where the underlying volatility process is unobservable, has also attracted several research contributions. So and Li (1999) develop a Markov Chain Monte Carlo approximation of the odds for certain SV models. Their method is improved in (Li and Yu, 2010) with a more robust algorithm and increased test power. Extensions with leverage effects within an SV model on an AR(1) process are considered in (Li et al., 2012). Kalaylıoğlu and Ghosh (2009) focus on the role played by priors in unit root Bayesian inference in SV models. They introduce a class of non-informative priors to develop a testing procedure that is feasible based on Gibbs-sampled approximations of the posterior but unpractical for adopting Bayes factors. Simulation methods for approximating posterior credibility intervals are also adopted in (Kalaylıoğlu et al., 2013), where correlations between the returns' series errors and the latent SV with potential unit roots are allowed to be non-zero. Extensions for heteroskedasticity are considered in (Chen et al., 2013). Severe distortions in the size of the Dickey-Fuller test statistics under unit roots in an AR(1) with SV are reported in an extensive simulation study in (Zhang et al., 2013), where a Bayesian testing approach is introduced as a remedy. Developments over more complex dynamics than the simple AR(1) model include non-normal innovations (Hasegawa et al., 2000), polynomial trends (Chaturvedi and Kumar, 2005), nonlinear smooth transitions (Chen and Lee, 2015) and structural breaks (Park and Shintani, 2016; Vosseler, 2016), also in panel data (Kumar and Agiwal, 2018). Not strictly relevant for our analysis, yet closely related to the Bayesian unit root literature, are the dozens of research works on Bayesian cointegration testing, appeared since (Koop, 1991).

3 Hypothesis testing and evidence

Consider a random variable $X$ with density parametrized over $\theta \in \Theta$. The hypothesis testing problem we are concerned about consists of deciding among the null hypothesis $H_0: \theta = \theta_0$ and the alternative $H_1: \theta \ne \theta_0$. This is achieved by considering suitable measures of evidence of a hypothesis against the other, such as the widespread p-value, or Bayes factors and Bayesian posterior probabilities.

3.1 Testing and the strength of evidence

Let us denote by $T(\cdot)$ a test statistic, and by $t = T(x)$ its value when data $X = x$ is observed. The null hypothesis $H_0$ is rejected in favor of the alternative $H_1$ if $T(x)$ is more extreme than one would expect if $H_0$ were true. By choosing a significance level $\alpha$, $H_0$ is rejected when the probability of $T(X)$ being greater than $T(x)$ is small (i.e. lower than or equal to $\alpha$), given that $H_0$ is true. Formally, the hypothesis $H_0$ is rejected if
$$\Pr(|T(X)| \ge T(x) \mid H_0) \le \alpha,$$
that is, extreme values of the statistic are deemed to provide evidence against $H_0$. While it is straightforward to identify in higher values of $|t|$ stronger evidence against the null hypothesis, the problem of evaluating the strength of evidence for $H_1$ against $H_0$ is left open. Frequentists use a scale of evidence set from Fisher's work in the 1920s: a common interpretation is that smaller significance levels correspond to stronger evidence against the null (e.g. $\alpha = 0.05$ to moderate and $\alpha = 0.01$ to strong evidence), with essentially neutral evidence at large significance levels (around $\alpha = 0.9$). In other words, the higher the evidence, the lower the Type I error of rejecting the true hypothesis. The Bayesian literature provides different answers to the problem, based on quantities known as the Bayes factor and posterior odds.

3.2 Posterior odds for a point null
Considering the unknown model probabilities as random, Bayes' rule yields posterior probabilities given the observed data. For the hypothesis $H_0$ (and analogously for $H_1$):
$$\Pr(H_0 \mid x) = \frac{\Pr(x \mid H_0)\Pr(H_0)}{\Pr(x)},$$
where $\Pr(H_0)$ is the prior probability of hypothesis $H_0$ being true, and $\Pr(x) = \Pr(x \mid H_0)\Pr(H_0) + \Pr(x \mid H_1)\Pr(H_1)$ is the marginal density of $X = x$. The quantity $\Pr(x \mid H_0)$ is referred to as the marginal likelihood or marginal probability (of the data) under $H_0$. In this context, the language and notation encountered in the literature can be heterogeneous and slightly abused, as it is common to indistinctly use the terms (joint) "density", "likelihood", (joint) "probability" and their corresponding notations, e.g. $f$, $L$, $\Pr$. Here we adopt the notation and terminology of (Kass and Raftery, 1995). By the law of total probability, the marginal probability $\Pr(x \mid H_0)$ is obtained by marginalizing out the parameter $\theta_0$ under $H_0$:
$$\Pr(x \mid H_0) = \int \Pr(x \mid \theta_0, H_0)\, \pi(\theta_0 \mid H_0)\, d\theta_0, \qquad (2)$$
where $\pi(\theta_0 \mid H_0)$ is a continuous density. From a frequentist perspective, $\pi(\theta_0 \mid H_0)$ is a mere weight function to allow the computation of the average likelihood. For a Bayesian, $\pi(\theta_0 \mid H_0)$ would be the prior density for $\theta_0$ conditional on $H_0$ being true (Berger and Delampady, 1987). Note the difference between the prior probability of a hypothesis or model being true as opposed to the prior density referring to its corresponding parameter.

The ratio of the posterior probabilities for the two hypotheses is referred to as the posterior odds ratio, or posterior odds:
$$\frac{\Pr(H_0 \mid x)}{\Pr(H_1 \mid x)} = \frac{\Pr(x \mid H_0)}{\Pr(x \mid H_1)} \cdot \frac{\Pr(H_0)}{\Pr(H_1)}.$$
Analogously, one may define the prior odds as $\Pr(H_0)/\Pr(H_1)$. Posterior odds quantify the evidence of $H_0$ over $H_1$ after data $x$ has been observed. On the other hand, prior odds do not convey any evidence, as they solely quantify the prior plausibility of $H_0$ over $H_1$ before any data is observed. The interpretation of odds ratios is straightforward as they correspond to simple probability ratios. Odds ratios $K$ greater than one (or $\log K > 0$) stand for evidence in favour of the null hypothesis, with a corresponding probability $K/(1+K)$. This is aligned with the general Bayesian rationale. After observing $x$, the prior probability $\Pr(H_0) = \pi_0$ and the corresponding prior odds $\pi_0/(1-\pi_0)$, reflecting the prior belief in $H_0$ being the true model, are updated into the posterior odds $K$ and the corresponding new posterior probability of $H_0$ being true. For a point hypothesis $H_0: \theta = \theta_0$, the assignment of a positive probability will rarely be thought possible for $\theta = \theta_0$ to hold exactly: this is to be understood as a realistic approximation of the hypothesis $H_0: |\theta - \theta_0| \le b$, for some small $b$, so that $\pi(\theta_0 \mid H_0)$ in fact represents the prior probability assigned to $\{|\theta - \theta_0| \le b\}$ (see Berger and Sellke, 1987). The way to depict such a prior is through a smooth density with a sharp peak around $\theta_0$.

The focus of this paper is on the ratio
$$B_{01} = \frac{\Pr(x \mid H_0)}{\Pr(x \mid H_1)},$$
referred to as the Bayes factor. With this definition, the posterior odds for a null hypothesis $H_0$ and the alternative $H_1$, as discussed so far, can be written as:
$$\frac{\Pr(H_0 \mid x)}{\Pr(H_1 \mid x)} = B_{01}\, \frac{\Pr(H_0)}{\Pr(H_1)}.$$
Bayes factors can be interpreted either as the ratio quantifying the plausibility of observing the data $x$ under $H_0$ over $H_1$, or as the degree by which the observed data updates the prior odds $\Pr(H_0)/\Pr(H_1)$. With respect to the posterior odds, which involve prior probabilities on the hypotheses, the interest in Bayes factors arises from the fact that they appear as actual odds implied by the observed data only. Moreover, Bayes factors are of attractive interpretation since they can be viewed as likelihood ratios obtained by averaging the likelihoods $\Pr(x \mid \theta_k, H_k)$ across $\theta_k \mid H_k$, with weights $\pi(\theta_k \mid H_k)$, $k = \{0, 1\}$. Not less importantly, the calculation of the Bayes factor requires the prescription only of the prior distributions $\pi(\theta_k \mid H_k)$, while the full Bayesian analysis leading to posterior odds requires the additional specification of the prior probabilities $\Pr(H_k)$, for $k = \{0, 1\}$. The interpretation from above still applies: a Bayes factor of e.g. $1/10$ means that $H_1$ is supported by evidence 10 times as high as $H_0$ is. Furthermore, under the appealing "neutral" choice $\Pr(H_0) = \Pr(H_1) = 1/2$, Bayes factors coincide with posterior odds, further enforcing $B_{01}$ as a suitable alternative to p-values. Similar to the frequentist approach, where decisions are based on the critical level $\alpha$, the higher the Bayes factor the stronger the evidence in favor of $H_0$ over $H_1$. (Jeffreys, 1961) provides a scale for interpreting $B_{01}$ as the degree to which $H_0$ is supported by the data over $H_1$, with ratios greater than 100 being decisive. A comparison between the Bayesian and frequentist decision scales can be found in (Efron et al., 2001).

3.3 Posterior odds for unit root testing

We now show how the above description of Bayes factors and posterior odds practically applies to test the set of hypotheses
$$H_0: \rho = 1 \qquad H_1: \rho \in S.$$
Here we extend the discussion to any generic AR-like model, holding the usual interpretation of $\rho$ as the autoregressive parameter of interest. First, to compare the average likelihood of a model over the complementary region through posterior odds as outlined above, we assign a certain positive weight $\pi_0$ to the point null hypothesis $H_0$ and share the complement $(1 - \pi_0)$ over the interval $S$ relevant under $H_1$. Second, as a general case the likelihood function is parametrized over a rich $K$-dimensional parameter $\{\rho, \theta\}$ defined over some appropriate set $\{S, \Theta\} \in \mathbb{R}^K$. Therefore, the computation of the Bayes factor in this setting involves a multidimensional marginalization over the elements of $\theta$, and the posterior odds read:
$$K = \frac{\pi_0}{1 - \pi_0} \cdot \frac{\int_\Theta L(x \mid \rho = 1, \theta, x_0)\, \pi(\theta \mid \rho = 1)\, d\theta}{\int_S \int_\Theta L(x \mid \rho, \theta, x_0)\, \pi(\theta \mid \rho)\, \pi(\rho)\, d\theta\, d\rho} = \frac{\Pr(\rho = 1 \mid x)}{\Pr(\rho \in S \mid x)}. \qquad (3)$$
Eq. (3) embeds a feasible and largely adopted specification for the prior over $(\rho, \theta)$ that imposes conditional independence on the parameters, allowing for a convenient factorization of the joint prior as a product of conditionally independent factors, that is $\pi(\rho, \theta) = \pi(\theta \mid \rho)\, \pi(\rho)$. Rather than the above probability notation ($\Pr$), typical in introductory discussions on Bayes factors, here we use the likelihood notation ($L$), as it is more common in the related econometric literature. For inference on the simple AR(1) process $x_t = \rho x_{t-1} + u_t$, the parameter dimension is $K = 2$, as $\theta$ generally includes the unknown variance $\sigma^2$ of the innovations. In this case, the posterior odds have the simple form
$$K = \frac{\pi_0}{1 - \pi_0} \cdot \frac{\int_0^\infty L(x \mid \rho = 1, \sigma, x_0)\, \pi(\sigma)\, d\sigma}{\int_S \int_0^\infty L(x \mid \rho, \sigma, x_0)\, \pi(\sigma)\, \pi(\rho)\, d\sigma\, d\rho}.$$
The conditional independence assumption among the parameters is generally not very restrictive and is commonly extended to unconditional independence, so that $\sigma$ is provided with its own prior $\pi(\sigma \mid \rho) = \pi(\sigma)$. Further information can be found e.g. in (Schotman and Van Dijk, 1991b; Zivot, 1994; Zellner and Siow, 1980, among the others). In the following section we show how to approximate the general Bayes factor involved in Eq. (3) with a friendly form of moderate error that neither requires integration nor the priors to be specified.
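Before turning to the approximation, a quick numerical illustration (ours, with arbitrary example values) of the odds-to-probability mapping used throughout: the snippet maps a Bayes factor and prior odds into posterior odds and the posterior probability $K/(1+K)$.

```python
def posterior_prob_null(bf_01, prior_prob_null=0.5):
    # Posterior odds K = B01 * prior odds; Pr(H0 | x) = K / (1 + K).
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    K = bf_01 * prior_odds
    return K / (1.0 + K)

print(posterior_prob_null(10.0))  # B01 = 10:   Pr(H0 | x) ~ 0.909
print(posterior_prob_null(0.1))   # B01 = 1/10: Pr(H0 | x) ~ 0.091, evidence for H1
```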
4 The BIC approximation of Bayes factors

For $k = \{0, 1\}$ consider the densities $\Pr(x \mid H_k) = \int \Pr(x \mid \theta_k, H_k)\, \pi(\theta_k \mid H_k)\, d\theta_k$ involved in the definition of Bayes factors. Let $\theta_k$ be the parameter under $H_k$, $\pi(\theta_k \mid H_k)$ its prior, and $\Pr(x \mid \theta_k, H_k)$ the prior density of $x$ given the values of $\theta_k$. In general, $\theta_k$ represents a vector parameter of dimension $d_k$. In the following, we shall refer to the marginal probability of the data (or marginal likelihood) $\Pr(x \mid H_k)$ as $I$ and adopt a simplified notation where we drop $k$ and rewrite the marginal likelihood as
$$I = \int \Pr(x \mid \theta, H)\, \pi(\theta \mid H)\, d\theta.$$
Except for some elementary cases where the above integral can be evaluated analytically, the computation of the marginal likelihood is intractable and requires numerical methods. In fact, analytic solutions for $I$ are limited to exponential family distributions and conjugate priors, including normal linear models (e.g. DeGroot, 2005; Zellner, 1971). A general description of the different approaches for evaluating $I$ with numerical methods is provided in (Evans et al., 1995).

To recover a first useful approximation for $I$, assume that the posterior density, proportional to $\Pr(x \mid \theta, H)\, \pi(\theta \mid H)$, is peaked around its maximum $\tilde{\theta}$, which is the posterior mode. This is generally the case for large samples if the likelihood function of the data $\Pr(x \mid \theta, H)$ is peaked around its maximum $\hat{\theta}$ (Kass and Raftery, 1995). Let $g(\theta) = \log(\Pr(x \mid \theta, H)\, \pi(\theta \mid H))$ and consider its Taylor expansion around $\tilde{\theta}$: $g(\theta) = g(\tilde{\theta}) + (\theta - \tilde{\theta})^\top g'(\tilde{\theta}) + \frac{1}{2}(\theta - \tilde{\theta})^\top g''(\tilde{\theta})(\theta - \tilde{\theta}) + o(\|\theta - \tilde{\theta}\|^2)$. Since $g'(\tilde{\theta}) = 0$, as $g$ reaches its maximum at $\tilde{\theta}$, it follows that
$$I = \int \exp[g(\theta)]\, d\theta \approx \exp[g(\tilde{\theta})] \int \exp\left[\tfrac{1}{2}(\theta - \tilde{\theta})^\top g''(\tilde{\theta})(\theta - \tilde{\theta})\right] d\theta, \qquad (4)$$
where we recognize in the integrand the kernel of a generic $d$-dimensional multivariate normal distribution with mean $\tilde{\theta}$ and covariance matrix $\tilde{\Sigma}$. $\tilde{\Sigma}$ corresponds to minus the inverse Hessian matrix of the second-order derivatives of $g(\theta)$ evaluated at $\theta = \tilde{\theta}$, i.e. $\tilde{\Sigma}^{-1} = -g''(\tilde{\theta})$. The integral in Eq. (4) therefore equals $(2\pi)^{d/2}|\tilde{\Sigma}|^{1/2}$, from which the following approximation, known as the Laplace approximation (e.g. Konishi and Kitagawa, 2008), follows:
$$\tilde{I} = (2\pi)^{d/2}\, |\tilde{\Sigma}|^{1/2}\, \Pr(x \mid \tilde{\theta}, H)\, \pi(\tilde{\theta} \mid H). \qquad (5)$$
In particular, as $n$ diverges, $I = \tilde{I}\,(1 + O(n^{-1}))$, see e.g. (Tierney et al., 1989; Kass et al., 1991). Eq. (5) can be applied to any regular statistical model and stands as a viable general approach for evaluating the marginal likelihoods involved in the definitions of Bayes factors with an approximation error of order $O(n^{-1})$. Slate (1994) discusses requirements on the sample size for reaching posterior normality, and the accuracy of Laplace's method has been more generally investigated in (e.g. Efron and Hinkley, 1978; Kass and Vaidyanathan, 1992). An empirical rule is provided in (Kass and Raftery, 1995): sample sizes of at least $5d$ provide a satisfactory accuracy in well-behaved problems, with $20d$ applicable in most situations.

The use of Eq. (5) is impractical since $\tilde{\theta}$ refers to the posterior mode and $\tilde{\Sigma}$ to the negative inverse Hessian of $g(\theta)$, while maximum likelihood estimates and information matrices are of common use and generally readily available as standard outputs in any statistical software. Indeed, a variation over Eq. (5) that has attracted much attention uses the maximum likelihood estimator $\hat{\theta}$, applies to large samples where $\tilde{\theta} \approx \hat{\theta}$, and relies on the covariance matrix $\hat{\Sigma}$ such that $\hat{\Sigma}^{-1}$ corresponds to the observed information matrix, i.e. the negative Hessian of the log-likelihood evaluated at $\hat{\theta}$ (Tierney et al., 1989; Kass and Vaidyanathan, 1992):
$$\hat{I} = (2\pi)^{d/2}\, |\hat{\Sigma}|^{1/2}\, \Pr(x \mid \hat{\theta}, H)\, \pi(\hat{\theta} \mid H). \qquad (6)$$
The relative error in this case is still of the best rate $O(n^{-1})$. If one replaces the observed information matrix with the expected information matrix $i$, the asymptotic error rate moves to the larger order $O(n^{-1/2})$. The expected information matrix $i$ is a $d \times d$ matrix whose $(h, k)$ element is
$$-\mathbb{E}\left[\frac{\partial^2 \log \Pr(x_i \mid \theta, H)}{\partial \theta_h\, \partial \theta_k}\bigg|_{\theta = \hat{\theta}}\right],$$
where the expectation is taken over $x_i$ with $\theta$ held constant. Therefore, in large samples the observed information matrix $\hat{\Sigma}^{-1}$ can be approximated through the expected information matrix, $\hat{\Sigma}^{-1} \approx n i$, so that $n^d |i| = |\hat{\Sigma}^{-1}| = |\hat{\Sigma}|^{-1}$. With this substitution, Eq. (6) rewrites as
$$\hat{I} = (2\pi)^{d/2}\, n^{-d/2}\, |i|^{-1/2}\, \Pr(x \mid \hat{\theta}, H)\, \pi(\hat{\theta} \mid H),$$
from which
$$\log I = \log \Pr(x \mid \hat{\theta}, H) + \log \pi(\hat{\theta} \mid H) + \frac{d}{2}\log(2\pi) - \frac{d}{2}\log n - \frac{1}{2}\log |i| + O\left(n^{-1/2}\right). \qquad (7)$$
Note that the prior density $\pi(\theta \mid H)$ needs to be fully specified, as it is involved throughout the approximation procedure. This leads to the conclusive approximation form that does not involve prior densities:
$$\log I = \log \Pr(x \mid \hat{\theta}, H) - \frac{d}{2}\log n + O(1). \qquad (8)$$
This last approximation is in virtue of the fact that in Eq. (7), besides $\log \Pr(x \mid \hat{\theta}, H)$ and $\log n$, which are respectively of order $O(n)$ and $O(\log n)$, all the remaining terms are of order $O(1)$ or lower. From Eq. (8) we have that the log marginal likelihood is thus equal to the maximized log-likelihood minus a correction term, where the approximation error is $O(1)$. Even though the $O(1)$ term does not vanish, because all the other terms tend to infinity as $n$ increases, the error is dominated and vanishes as a proportion of $\log I$. Raftery (1995) shows that in practice the error term is not as high as one might think, although an $O(1)$ error suggests that the approximation is in general quite crude. In fact, the error can be of a smaller order of magnitude given a reasonable choice of the prior.

As a remark, the definition of the sample size $n$ should reflect the rate at which the Hessian matrix of the log-likelihood grows, i.e. it should be satisfactory for the approximation $\hat{\Sigma}^{-1} \approx n i$. This $n$ turns out to be the number of contributions to the summation appearing in the definition of the Hessian (Raftery, 1995; Kass and Raftery, 1995) – e.g. in survival analysis, $n$ would match the number of non-censored observations rather than the total number of observations.
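To make Eq. (5) concrete, here is a small generic sketch (our illustration, not the paper's code) of the Laplace approximation to log I for a user-supplied log of likelihood-times-prior; the finite-difference Hessian and the optimizer choice are implementation assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_marginal(log_joint, theta_init):
    # Laplace approximation of log I, Eq. (5):
    # log I ~ (d/2) log(2 pi) + (1/2) log |Sigma_tilde| + g(theta_tilde),
    # with g = log(likelihood * prior) and Sigma_tilde = (-g''(theta_tilde))^{-1}.
    res = minimize(lambda th: -log_joint(th), np.atleast_1d(theta_init))
    theta_t = res.x                       # posterior mode (numerical)
    d, eps = theta_t.size, 1e-5
    H = np.zeros((d, d))                  # finite-difference Hessian of g
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (log_joint(theta_t + ei + ej) - log_joint(theta_t + ei - ej)
                       - log_joint(theta_t - ei + ej) + log_joint(theta_t - ei - ej)
                      ) / (4 * eps ** 2)
    _, logdet = np.linalg.slogdet(np.linalg.inv(-H))   # log |Sigma_tilde|
    return 0.5 * d * np.log(2 * np.pi) + 0.5 * logdet + log_joint(theta_t)

# Toy check: for a standard normal "posterior kernel", I = 1 so log I = 0,
# and the Laplace approximation is exact.
print(laplace_log_marginal(lambda th: -0.5 * th[0] ** 2 - 0.5 * np.log(2 * np.pi), 0.3))
```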
The above discussion provides the basis for the following approximation of the Bayes factor. Hereafter, we focus on the case where the null hypothesis is nested. That is, we assume some parametrization under $H_1$ of the form $\theta_1 = (\rho, \beta)$ such that $H_0$ is obtained from $H_1$ by imposing the restriction $\rho = \rho_0$ for some $\rho_0$. Both $\rho$ and $\beta$ can be vectors. Let $\theta_1$ denote the parameter under $H_1$ with prior $\pi(\theta_1 \mid H_1) = \pi(\rho, \beta \mid H_1)$, and for $H_0: \rho = \rho_0$ let the prior be $\pi(\theta_0 \mid H_0) = \pi(\beta \mid H_0)$. Based on Eq. (6), by applying the definition of the Bayes factor to the log-ratio of the marginal likelihoods, one obtains:
$$2 \log B_{10} \approx \Lambda + \log |\hat{\Sigma}_1| - \log |\hat{\Sigma}_0| + 2\log\frac{\pi(\hat{\theta}_1 \mid H_1)}{\pi(\hat{\theta}_0 \mid H_0)} + (d_1 - d_0)\log(2\pi),$$
where $\Lambda = 2(\log \Pr(x \mid \hat{\theta}_1, H_1) - \log \Pr(x \mid \hat{\theta}_0, H_0))$ corresponds to the log-likelihood ratio statistic with $d_1 - d_0$ degrees of freedom. Refer to (Raftery, 1996) for an additional discussion, and for the approximation of the Bayes factor under Eq. (5). On the other hand, based on the approximation in Eq. (8), one obtains:
$$2 \log B_{10} \approx \Lambda - (d_1 - d_0)\log(n) = 2S, \qquad (9)$$
where
$$2S = 2\log \Pr(x \mid \hat{\theta}_1, H_1) - 2\log \Pr(x \mid \hat{\theta}_0, H_0) - (d_1 - d_0)\log(n) = \left[d_0 \log(n) - 2\log \Pr(x \mid \hat{\theta}_0, H_0)\right] - \left[d_1 \log(n) - 2\log \Pr(x \mid \hat{\theta}_1, H_1)\right].$$
The following consistency result for $n \to \infty$, known as the Schwarz criterion, holds:
$$\frac{S - \log B_{10}}{\log B_{10}} \to 0. \qquad (10)$$
This establishes $S$ as a standardized quantity to be used even when the priors are hard to set, and a useful reference quantity in scientific reporting (Kass and Raftery, 1995). The $O(1)$ error implies that even in large samples $S$ does not lead to the correct value in absolute terms; still, the error does go to zero as a proportion of the actual log of the Bayes factor (e.g. Kass and Raftery, 1995; Raftery, 1995, 1996). Importantly, for certain classes of priors the error of approximation reduces to $O(n^{-1/2})$. One class is that of Jeffreys' priors with a specific choice of the constant preceding them; another class is that of unit information priors (Raftery, 1995; Wasserman, 2000; Wagenmakers, 2007). With respect to subjectively determined priors, a surprisingly good agreement between the Schwarz criterion and actual Bayes factors is observed in Kass and Wasserman (1995). In general, when the sample size $n$ is sufficiently large the approximation is very satisfactory for most purposes and is of widespread use, including applications in psychology (e.g. Wasserman, 2000), ecology (e.g. Aho et al., 2014), and computer vision (e.g. Stanford and Raftery, 2002). Kass and Wasserman (1995) further show that for the intuitive and reasonable choice of the unit information prior, $\exp(S)/B_{10} \to 1$ at rate $O(n^{-1/2})$. This provides a direct interpretation of the Schwarz criterion in terms of Bayes factors and evidence.

For a given model $k$, recall the definition of the Bayesian Information Criterion (BIC):
$$\mathrm{BIC}_k = d_k \log n - 2\log \Pr(x \mid \hat{\theta}_k, H_k). \qquad (11)$$
It is easy to recognize that the right side of Eq. (8) is closely related to BIC, as $-2\log I = \mathrm{BIC} + O(1)$, and that $2S = \mathrm{BIC}_0 - \mathrm{BIC}_1 = \Delta\mathrm{BIC}_{10}$. Eq. (9) is then equivalent to
$$\log B_{10} \approx \frac{1}{2}\Delta\mathrm{BIC}_{10}. \qquad (12)$$
Eq. (12) establishes $\Delta\mathrm{BIC}_{10}$ as an approximate measure of the log-evidence in support of the hypothesis $H_1$ over $H_0$. We shall refer to either Eq. (12) or its exp-version $B_{10} \approx \exp(\frac{1}{2}\Delta\mathrm{BIC}_{10})$ as the BIC approximation of the Bayes factor. For the above BIC approximation, the approximation error is generally $O(1)$, but in virtue of the Schwarz criterion $(\frac{1}{2}\Delta\mathrm{BIC}_{10} - \log B_{10})/\log B_{10} \to 0$, at rate $O(n^{-1/2})$ for certain priors. Eq. (12) formally justifies the extensive practice of model selection based on the smallest BIC value. Indeed, the higher the evidence in support of model 1, i.e. the higher the log-Bayes factor, the more positive $\Delta\mathrm{BIC}_{10}$ and the smaller $\mathrm{BIC}_1$ with respect to $\mathrm{BIC}_0$.

In the context of linear models with normal errors, BIC rewrites in the alternative convenient form
$$\mathrm{BIC}_k = n \log\left(1 - R_k^2\right) + d_k \log n,$$
with $R_k^2$ being the usual R-squared. The proportion $1 - R_k^2$ of the variance that model $k$ fails to explain relates to the sum-of-squares table through the equivalence $1 - R_k^2 = \mathrm{SSE}_k / \mathrm{SS}_{\mathrm{total}}$, where $\mathrm{SSE}_k$ is the sum of squared errors for model $k$ and $\mathrm{SS}_{\mathrm{total}}$ the total sum of squares. This leads to the following expression:
$$\Delta\mathrm{BIC}_{10} = n \log\frac{\mathrm{SSE}_0}{\mathrm{SSE}_1} - (d_1 - d_0)\log(n). \qquad (13)$$
Applied examples on the use of Eq. (13) are provided e.g. in (Wagenmakers, 2007; Masson, 2011). Furthermore, for nested models such that $d_1 - d_0 = 1$, $\Lambda \approx t^2$, with $t$ being the t-statistic for testing the significance of the parameter in model 1 that is set to zero in model 0, and $\Lambda$ the corresponding likelihood ratio statistic. From Eq. (12):
$$2\log B_{10} \approx \Delta\mathrm{BIC}_{10} = \Lambda - \log n \approx t^2 - \log n. \qquad (14)$$
This underlines a correspondence between $t^2$, $\Delta\mathrm{BIC}_{10}$ and $B_{10}$, which means that the t-statistic can be directly translated into BIC and into grades of evidence through Bayes factors (Johnson, 2005). High values of $t^2$ support the statistical significance of the additional parameter in the full model. In turn, $\mathrm{BIC}_1$ is smaller than $\mathrm{BIC}_0$, so $\Delta\mathrm{BIC}_{10} > 0$ and $\log B_{10} > 0$, which is indicative of evidence against the reduced model.
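To illustrate Eqs. (12)-(13) in the unit root setting, the sketch below (our illustration; function and variable names are ours) computes the approximate log-Bayes factor for $H_0: \rho = 1$ against an unrestricted AR(1) without intercept, so that $d_1 - d_0 = 1$.

```python
import numpy as np

def bic_log_bf10_unit_root(x):
    # log B10 ~ (1/2) dBIC10 with dBIC10 = n log(SSE0/SSE1) - log n (Eq. (13));
    # model 1: x_t = rho x_{t-1} + u_t (rho free), model 0: rho = 1 imposed.
    x_lag, x_now = x[:-1], x[1:]
    n = x_now.size
    rho_hat = np.sum(x_now * x_lag) / np.sum(x_lag ** 2)   # OLS = Gaussian MLE
    sse1 = np.sum((x_now - rho_hat * x_lag) ** 2)          # unrestricted model
    sse0 = np.sum((x_now - x_lag) ** 2)                    # random-walk residuals
    return 0.5 * (n * np.log(sse0 / sse1) - np.log(n))     # log B10

rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal(501))        # true rho = 1
log_b10 = bic_log_bf10_unit_root(x)
# With equal prior odds, Pr(H0 | x) = B01 / (1 + B01) = 1 / (1 + exp(log B10)).
print("log B10:", round(log_b10, 3), " Pr(H0|x):", round(1 / (1 + np.exp(log_b10)), 3))
```

Negative values of log B10 (equivalently, Pr(H0 | x) above one half) support the unit root null, mirroring the sign convention used for the Bayes factors reported in Appendix A.3.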
By reparametrization, the above extends to hypotheses where an element $\theta'$ of $\theta$ is set to a fixed value (rather than zero), e.g. $H: \theta' = 1$ is analogous to $H: \theta'' = 0$ by taking $\theta'' = \theta' - 1$.

The major contribution of this paper is to test for unit roots in financial time series by the BIC approximation of the Bayes factor, Eq. (12). We shall list the major points in favor of this approach.

i. As reviewed above, the choice of the prior is the principal problem in Bayesian testing of unit roots. On the contrary, our proposed BIC approximation does not require a full specification of the priors – neither that of the autoregressive parameter, nor of any other parameter. The independence of the BIC approximation from prior specifications is also attractive from the point of view of objectivity in Bayesian analysis.

ii. Bayes factors and posterior odds allow testing point nulls on the autoregressive parameter, which can be problematic as shown earlier. This setting is however natural for the BIC approximation.

iii. The BIC approximation to the Bayes factor is a general procedure that does not depend on the model form or parametrization. Regardless of whether the AR model under investigation has an intercept, a trend component, exogenous regressors, or any richer structure, the BIC procedure applies. In general, the error is $O(1)$ but reduces to zero as a proportion of the log-Bayes factor (cf. Eq. (10)).

iv. Testing based on the BIC approximation does not require any integration and does not present major computational issues: it only requires the maximum likelihood estimates, as of the definition in Eq. (11).

v. The applicability of the method depends on the feasibility of the approximation in Eq. (6), i.e. on the feasible hypotheses that the data $x$ consist of i.i.d. observations, that the posterior is peaked around its maximum, and that the sample size is sufficient for $\tilde{\theta} \approx \hat{\theta}$ and $\hat{\Sigma}^{-1} \approx n i$ to be satisfactory approximations. A sample size of about 20 times the number of parameters appears to be generally fair for well-behaved problems where the likelihood is not grossly non-normal. For the simple AR(1) model with unknown innovations' variance, this corresponds to about 40 points, e.g. about two months of daily market data.

Lastly, consider that even if the applicability of the above procedure is quite broad, the time series models involved in classical unit root testing applications are broadly of linear form, so the simplified alternative form in Eq. (13) commonly applies. Also, the point form of the null hypothesis naturally suggests a nested hypotheses structure where the only restriction is on the autoregressive parameter. That is, applications will generally encounter linear model specifications where $d_1 - d_0 = 1$.

5 Simulation study

To validate our proposed testing methodology and compare it with some existing alternatives, we develop a Monte Carlo simulation study. In particular, we simulate 20,000 simple AR(1) processes $x_t = \rho x_{t-1} + u_t$, with $t = \{1, \ldots, T\}$ and independent standard normal innovations, by considering different sample lengths $T = \{50, 100, 200, 500, 1000, 5000\}$ and different values of the autoregressive parameter $\rho = \{0.2, 0.5, 0.8, 0.9, 0.99, 0.999, 1\}$. We test the point null unit root hypothesis $H_0: \rho = 1$ and summarize in Table 1 the corresponding results. All the tables related to the Monte Carlo study report averages across the simulated samples. A minimal sketch of the simulation loop is given below.
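The following loop (our reduced illustration: far fewer replications than the paper's 20,000, and reusing bic_log_bf10_unit_root from the previous sketch) reproduces the structure of the experiment.

```python
import numpy as np
from scipy.special import expit

def simulate_ar1(rho, T, rng):
    # One path of x_t = rho * x_{t-1} + u_t, x_0 = 0, standard normal innovations.
    x = np.zeros(T + 1)
    for t in range(1, T + 1):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(42)
n_rep = 500  # reduced for illustration; the paper uses 20,000 replications
for T in (50, 100, 200, 500, 1000, 5000):
    for rho in (0.2, 0.5, 0.8, 0.9, 0.99, 0.999, 1.0):
        # Pr(H0 | x) = 1 / (1 + exp(log B10)) under prior odds equal to one.
        p = np.mean([expit(-bic_log_bf10_unit_root(simulate_ar1(rho, T, rng)))
                     for _ in range(n_rep)])
        print(f"T={T:5d}  rho={rho:5.3f}  mean Pr(H0|x) = {p:.3f}")
```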
Table 1 includes probabilities associated with the different alternative testing methods. For the approach of Schotman and Van Dijk (1991b) we adopt both a lower integration bound $a$ for the autoregressive parameter that is fixed a priori (SVD) and the data-driven bound discussed in Appendix A.1 (SVD*). We also report the posterior probabilities $\Pr_{\rho \ge 1} = \Pr(\rho \ge 1 \mid x)$ from (Phillips, 1991b), computed through Eq. (1). These posterior probabilities are reported only for $T \le 200$, as larger $T$ quickly drives Eq. (17) below machine precision and the integration turns problematic: recovering such probabilities for large $T$ is here beyond our scope. For the SVD, SVD* and BIC entries in Table 1, we report as our main result the probabilities corresponding to the Bayes factors in Table 3, since these are of easier interpretation. Furthermore, assuming prior odds equal to one, these probabilities are also interpretable as posterior odds.

Our simulation results show a smooth and coherent behavior of the proposed BIC approximation, leading to posterior probabilities behaving in accordance with what one might expect from this controlled setting. (i) The acceptance probabilities for the null are progressively higher as $\rho$ approaches unity. (ii) For a fixed $\rho$, larger samples reduce the posterior unit root probability for small values of $\rho$ while increasing it for $\rho$ around 1. Indeed, larger samples embed higher evidence in support of the true hypothesis, for which we observe increasing posterior probabilities (and log-Bayes factors). With $\rho = 0.8$ and $T = 50$, for the BIC approximation we compute an average posterior probability for $H_0$ of .240; however, as $T$ increases, the evidence accumulates towards the true hypothesis $\rho = 0.8$ and the unit root probability vanishes (from $T = 200$ ahead). Similarly, for $\rho = 1$ and $T = 50$, the small size of the sample advocates for a stationary dynamics with a considerable probability of about .20 (cf. Table 1), while at larger $T$ the probability associated with far-from-unity values of $\rho$ sharply decreases to zero.

Furthermore, our simulation study leads to posterior probabilities that are also well-aligned across the BIC, SVD and SVD* testing approaches, following the same trends across different values of $\rho$ for fixed $T$. The BIC approximation however leads to probabilities that are not uniformly greater or smaller than those from SVD*. In fact, for small to moderate sample sizes BIC returns higher posterior probabilities for the unit root hypothesis, while smaller ones with respect to SVD* for large $T$. This behavior could be partially explained by the different specifications of the alternative hypothesis. While for SVD the reported probabilities are those of a unit root against a stationary alternative, for BIC the alternative is a generic $H_1: \rho \ne 1$. It is thus reasonable that the rejection probabilities are larger for BIC than SVD, as by construction for BIC the feasible parameter space under the alternative $H_1$ is broader. Also with respect to the posterior probabilities of explosive roots/non-stationary dynamics $\Pr_{\rho \ge 1}$ under the ignorance prior, the BIC approximation appears coherent and well-aligned. Though the BIC and $\Pr_{\rho \ge 1}$ probabilities refer to different hypotheses, we observe that indeed the higher the posterior probability of a unit root, the higher the posterior probability supporting the non-stationary option.

As expected, the p-values associated with the classical frequentist Dickey-Fuller test cannot exclude the non-stationary hypothesis with increasing confidence as $\rho$ moves towards one. Accordingly, the evidence in support of a unit root is reflected in increasing posterior BIC probabilities. All the above discussion applies as well to the Bayes factors reported in Appendix A.3, where negative signs stand for evidence against the unit root null.

This study confirms an overall very satisfactory behavior of the proposed method with respect to some Bayesian and non-Bayesian alternatives for unit root hypothesis testing. Despite the general $O(1)$ error associated with the BIC approximation and its complete independence from the prior specification, our simulated posterior probabilities are well-behaved (i.e. showing the desirable smooth monotonicity over $\rho$ for $T$ fixed, and the other way around), coherent with their expected behavior, and aligned with the decisions over the null that the other approaches suggest.

       T = 50                                     T = 500
ρ      SVD   SVD*  BIC   DF    Pr_{ρ≥1}    ρ      SVD   SVD*  BIC   DF
0.200  .000  .000  .000  .001  .015        0.200  .000  .000  .000  .001
0.500  .003  .002  .004  .001  .107        0.500  .000  .000  .000  .001
0.800  .292  .124  .240  .032  .244        0.800  .000  .000  .000  .001
0.900  .686  .341  .545  .111  .306        0.900  .000  .000  .000  .001
0.990  .955  .663  .787  .390  .445        0.990  .957  .336  .801  .115
0.999  .973  .719  .797  .485  .514        0.999  .994  .652  .924  .406
1.000  .975  .729  .798  .501  .529        1.000  .996  .701  .926  .491

       T = 100                                    T = 1000
0.200  .000  .000  .000  .001  .002        0.200  .000  .000  .000  .001
0.500  .000  .000  .000  .001  .078        0.500  .000  .000  .000  .001
0.800  .041  .011  .031  .003  .184        0.800  .000  .000  .000  .001
0.900  .458  .125  .321  .032  .243        0.900  .000  .000  .000  .001
0.990  .965  .608  .827  .332  .425        0.990  .900  .131  .623  .034
0.999  .983  .701  .849  .474  .528        0.999  .996  .618  .942  .355
1.000  .985  .714  .850  .495  .546        1.000  .998  .697  .947  .488

       T = 200                                    T = 5000
0.200  .000  .000  .000  .001  .000        0.200  .000  .000  .000  .001
0.500  .000  .000  .000  .001  .057        0.500  .000  .000  .000  .001
0.800  .000  .000  .000  .001  .138        0.800  .000  .000  .000  .001
0.900  .084  .012  .049  .004  .182        0.900  .000  .000  .000  .001
0.990  .968  .529  .844  .249  .389        0.990  .001  .000  .000  .001
0.999  .990  .686  .888  .457  .534        0.999  .996  .351  .932  .123
1.000  .992  .706  .889  .492  .562        1.000  1.000 .697  .976  .487

Table 1: Simulation results. SVD, SVD* and BIC: unit root posterior probabilities (prior odds equal to one). DF: p-values of the Dickey-Fuller test. Pr_{ρ≥1}: posterior probabilities Pr(ρ ≥ 1 | x).
6 Empirical application

In our empirical application, we analyze Real Exchange Rate (RER) time series for nine major currencies. RERs are obtained by deflating nominal exchange rates by the relative price of domestic vs. foreign goods and services, thus reflecting the competitiveness of a country with respect to a reference basket. Common choices for deflation are the consumer price index (CPI), producer price indices, or GDP-based deflators. An increase in RER implies that exports become more expensive and imports turn cheaper, indicating a loss in trade competitiveness, for instance in response to an appreciation of the domestic currency, or in response to increased domestic inflation. We extract the official monthly RER (CPI-deflated) time series distributed by the European Commission and available from the Statistical Data Warehouse of the European Central Bank, for the period between January 2010 and November 2020. This corresponds to 131 records for each of the nine major currencies we consider in the analysis. Data and its documentation are available at https://sdw.ecb.europa.eu/browse.do?node=9691113. The nine currencies are: Australian dollar (AUD), Canadian dollar (CAD), Swiss franc (CHF), Chinese yuan (CNY), euro (EUR), British pound (GBP), Hong Kong dollar (HKD), Japanese yen (JPY), and US dollar (USD).

Real exchange rates relate to the long-run equilibrium condition – known as Purchasing Power Parity (PPP) – which implies a steady long-term level and a constant unconditional mean for RER series. The existence of a unit root would contradict this theory. There have been considerable efforts in empirically verifying the PPP theory, and several papers discussed mid- and long-term departures from the expected RER stationarity, leading to a controversial debate. The essence is that conclusions based on empirical research strongly depend on the exact definition of equilibrium that one adopts, on the methods used to test it, on the underlying hypotheses on the time series, and on its length. This applies to unit root analyses as well, which often lead to opposite results on PPP's validity (see e.g. MacDonald, 1995, and the references therein). In this regard, the unit root analysis of Parikh and Wakerly (2000) suggests that such equilibrium is expected to be generally observed over a time span of at least 50 years. Non-short-lived disequilibrium periods in RER dynamics are thus common. At first sight, Figure 1 seems to confirm such a tendency, suggesting a recent disequilibrium period for some of the major currencies. For instance, we observe an upward trend for the US dollar series and an apparent non-mean-reverting behavior for the Chinese yuan and Japanese yen, which gradually adjust towards new RER levels, suggesting the presence of unit roots.
Figure 1: Real exchange rates for some selected currencies.Posterior probabilities of the unit root hypothesis in RERs are reported in Table 2. Our resultsindicate ubiquitous evidence in favor of the unit root hypothesis for all the currencies. Bayesian re-sults are aligned with the p-values from the Dickey-Fuller (DF) test and posterior probabilities from(Phillips, 1991b). Indeed, higher posterior probabilities are both associated with higher p-values of theDickey-Fuller statistics and higher posterior probabilities Pr ρ ≥ = Pr( ρ ≥ | x ) associated with the non-stationary alternative. Explanatory are the results for the Euro-zone where the .047 p-value of the DFtest indicates a mid rejection of the unit root hypothesis. This is aligned with Figure 1 where the EURseries indeed displays a much stationary behavior than any other series. Accordingly, BIC probabilitiestake their smallest value across all the countries analyzed, with the stationary hypothesis having .455probability, much smaller than the average .880 observed for all the other currencies, where the DFp-value is on average .625. This is also aligned with the conclusion one would draw from the SVD* ap-proach, where the highest evidence in favor of stationarity is again associated with the euro series. Thisis reasonable, as euro-zone countries have a significant presence in the basket for determining RER’sCPI correction (19 out of 37 countries in the basket adopt euro). The observed greater plausibilityof the stationary hypothesis is here not surprising but expected and coherent. Lastly, note that BICprobabilities appear to uniformly dominate SVD* ones, this could perhaps be explained in terms of thetypical bias towards stationarity implied by the use of uniform priors. Data and its documentation are available at https://sdw.ecb.europa.eu/browse.do?node=9691113 . Australian dollar (AUD), Canadian dollar (CAD), Swiss franc (CHF), Chinese yuan (CNY), euro (EUR), Britishpound (GBP), Hong Kong dollar (HKD), Japanese yen (JPY), US dollar (USD). og BF Prob.Currency SVD* BIC SVD* BIC DF Pr ρ ≥ AUD -0.333 1.076 .418 .746 .090 .288CAD 1.455 2.431 .811 .919 .676 .705CHF 0.996 2.282 .730 .907 .442 .569CNY 1.848 2.044 .864 .885 .904 .875EUR -1.194 0.180 .232 .545 .047 .161GBP 0.403 1.802 .599 .858 .240 .374HKD 1.812 2.120 .860 .893 .883 .857JPY 1.154 2.369 .760 .914 .533 .553USD 1.641 2.349 .838 .913 .802 .778
Conclusions

Unit root testing has historically been among the most active areas in econometric research. With classical frequentist methods based on Dickey-Fuller (DF) statistics broadly adopted and well-established, Bayesian methods never gained much attention in empirical research and applications. However, research and debate on Bayesian unit root testing have been very active and lively. Indeed, the econometric research in the field pointed out a series of unique criticalities that make the Bayesian approach to unit root inference particularly challenging.

On the other hand, the testing procedure based on the BIC approximation of the Bayes factor addressed in this paper appears to provide a simple and satisfactory method for Bayesian unit root testing. With such an approach, the integration problem involved in Bayes factors reduces to a standard maximum likelihood estimation problem, and the Bayes factor takes the very simple form of Eq. (12). Notably, (i) priors are not involved in the approximated form (though they determine the asymptotic error rate), (ii) the procedure smoothly scales to more complex time series models, and (iii) it allows targeting the exact point-null unit root hypothesis. The value of the Bayes factor discriminates between the hypotheses according to the common interpretation scale of (Jeffreys, 1961), as illustrated in the sketch below.
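For concreteness, here is a minimal sketch of the computation behind a BIC-approximated log Bayes factor and a Jeffreys-style reading of its output. The maximized log-likelihoods are hypothetical placeholder values, and the parameter counts (one for the random walk, two for the stationary AR(1)) reflect our reading of the zero-mean setup rather than the paper's exact bookkeeping in Eq. (12).

import numpy as np

def log_bf01_bic(loglik0, loglik1, k0, k1, T):
    # log BF_{01} ~ (l0 - l1) + ((k1 - k0)/2) log T, i.e. -(BIC0 - BIC1)/2.
    return (loglik0 - loglik1) + 0.5 * (k1 - k0) * np.log(T)

def jeffreys_label(log_bf):
    # Rough reading of Jeffreys' (1961) evidence categories on the log10 scale.
    b = log_bf / np.log(10.0)   # convert natural log to log10
    if b < 0:
        return "evidence against H0"
    if b < 0.5:
        return "barely worth mentioning"
    if b < 1.0:
        return "substantial"
    if b < 1.5:
        return "strong"
    if b < 2.0:
        return "very strong"
    return "decisive"

# Example with hypothetical maximized log-likelihoods and T = 131 observations:
lbf = log_bf01_bic(loglik0=-185.2, loglik1=-184.9, k0=1, k1=2, T=131)
print(round(lbf, 3), jeffreys_label(lbf))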
The simulation study confirms the validity of our proposed approach, showing that, in this controlled setting, the BIC approximation leads to decisions that are entirely coherent with the expectations under a wide range of values for the autoregressive parameter and sample sizes. The posterior probabilities associated with the null are furthermore aligned with those of other Bayesian procedures and in accordance with the p-values from the DF test. The same coherence between BIC, other Bayesian methods, and the frequentist DF test also arises from the analysis of the real exchange rate series. In particular, our BIC-based conclusion of non-stationarity matches the decisions one would draw based on the DF and SVD tests. The results are furthermore aligned with the posterior probabilities from (Phillips, 1991b), supporting an apparent violation of the real exchange rate and purchasing power parity equilibrium in the last decade.

Recognizing that BIC is just one information criterion (IC) among many others, some of which perhaps reduce to BIC as special cases, it would be interesting to explore the use of such alternatives. Among these, generalized variants of BIC (see e.g. Konishi and Kitagawa, 2008, Ch. 9), the FIC (Wei, 1992), and the ICs based on predictive distributions (Phillips and Ploberger, 1994, 1996) serve a broader model-selection scope (beyond, e.g., mitigating the drawbacks associated with prior selection in unit root testing) and rely on different theoretical bases and motivations. Generalized ICs are potentially superior for the generic purpose of model selection; however, they do not necessarily have a direct connection with well-identified Bayesian inference problems, though the use of some of them in unit root testing can now be motivated by our results on BIC. Perhaps a direction for future work could be that of investigating to what extent ICs and model selection techniques have a clear link with certain problems in Bayesian inference. This would shed light on whether it is possible to rely on ICs and model selection methods as general tools for approximate Bayesian inference in situations where, e.g., priors are difficult to specify.

Acknowledgements
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 890690.
References
Aho, K., Derryberry, D., and Peterson, T. (2014). Model selection for ecologists: the worldviews of AIC and BIC. Ecology, 95(3):631–636.
Berger, J. O. and Delampady, M. (1987). Testing precise hypotheses. Statistical Science, pages 317–335.
Berger, J. O. and Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence. Journal of the American Statistical Association, 82(397):112–122.
Campbell, J. Y. and Perron, P. (1991). Pitfalls and opportunities: what macroeconomists should know about unit roots. NBER Macroeconomics Annual, 6:141–201.
Chaturvedi, A. and Kumar, J. (2005). Bayesian unit root test for model with maintained trend. Statistics & Probability Letters, 74(2):109–115.
Chen, C. W., Chen, S.-Y., and Lee, S. (2013). Bayesian unit root test in double threshold heteroskedastic models. Computational Economics, 42(4):471–490.
Chen, C. W. and Lee, S. (2015). A local unit root test in mean for financial time series. Journal of Statistical Computation and Simulation, 86(4):788–806.
de Bragança Pereira, C. A. and Stern, J. M. (1999). Evidence and credibility: full Bayesian significance test for precise hypotheses. Entropy, 1(4):99–110.
DeGroot, M. H. (2005). Optimal Statistical Decisions, volume 82. John Wiley & Sons.
DeJong, D. N. and Whiteman, C. H. (1991). Reconsidering 'trends and random walks in macroeconomic time series'. Journal of Monetary Economics, 28(2):221–254.
Dickey, D. A. and Fuller, W. A. (1981). Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica, pages 1057–1072.
Diniz, M., Pereira, C. A. d. B., and Stern, J. M. (2011). Unit roots: Bayesian significance test. Communications in Statistics - Theory and Methods, 40(23):4200–4213.
Efron, B., Gous, A., Kass, R., Datta, G., and Lahiri, P. (2001). Scales of evidence for model selection: Fisher versus Jeffreys. Lecture Notes - Monograph Series, pages 208–256.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika, 65(3):457–482.
Evans, M. and Swartz, T. (1995). Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems. Statistical Science, 10(3):254–272.
Geweke, J. (1988). The secular and cyclical behavior of real GDP in 19 OECD countries, 1957–1983. Journal of Business & Economic Statistics, 6(4):479–486.
Granger, C. W. J. and Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics, 2(2):111–120.
Hasegawa, H., Chaturvedi, A., and van Hoa, T. (2000). Bayesian unit root test in nonnormal AR(1) model. Journal of Time Series Analysis, 21(3):261–280.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 186(1007):453–461.
Jeffreys, H. (1961). Theory of Probability. Clarendon Press, Oxford.
Johnson, V. E. (2005). Bayes factors based on test statistics. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(5):689–701.
Kalaylıoğlu, Z. I., Bozdemir, B., and Ghosh, S. K. (2013). Bayesian unit-root testing in stochastic volatility models with correlated errors. Hacettepe Journal of Mathematics and Statistics, 42(6):659–669.
Kalaylıoğlu, Z. I. and Ghosh, S. K. (2009). Bayesian unit-root tests for stochastic volatility models. Statistical Methodology, 6(2):189–201.
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430):773–795.
Kass, R. E., Tierney, L., and Kadane, J. B. (1991). Laplace's method in Bayesian analysis. Contemporary Mathematics, 115:89–99.
Kass, R. E. and Vaidyanathan, S. K. (1992). Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions. Journal of the Royal Statistical Society: Series B (Methodological), 54(1):129–144.
Kass, R. E. and Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90(431):928–934.
Kim, I.-M. and Maddala, G. (1991). Flat priors vs. ignorance priors in the analysis of the AR(1) model. Journal of Applied Econometrics, 6(4):375–380.
Konishi, S. and Kitagawa, G. (2008). Information Criteria and Statistical Modeling. Springer Science & Business Media.
Koop, G. (1991). Cointegration tests in present value relationships: A Bayesian look at the bivariate properties of stock prices and dividends. Journal of Econometrics, 49(1-2):105–139.
Koop, G. (1994). An objective Bayesian analysis of common stochastic trends in international stock prices and exchange rates. Journal of Empirical Finance, 1(3-4):343–364.
Kumar, J. and Agiwal, V. (2018). Panel data unit root test with structural break: A Bayesian approach. Hacettepe Journal of Mathematics and Statistics, 48(3).
Leamer, E. E. (1991). Comment on 'To criticize the critics'. Journal of Applied Econometrics, pages 371–373.
Li, Y., Chong, T. T.-L., and Zhang, J. (2012). Testing for a unit root in the presence of stochastic volatility and leverage effect. Economic Modelling, 29(5):2035–2038.
Li, Y. and Yu, J. (2010). A new Bayesian unit root test in stochastic volatility models. Singapore School of Economics, Research Collection Paper 1240.
Lubrano, M. (1995). Testing for unit roots in a Bayesian framework. Journal of Econometrics, 69(1):81–109.
Ly, A., Verhagen, J., and Wagenmakers, E.-J. (2016). Harold Jeffreys's default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72:19–32.
MacDonald, R. (1995). Long-run exchange rate modeling: a survey of the recent evidence. Staff Papers, 42(3):437–489.
Maddala, G. S. and Kim, I.-M. (1998). Unit Roots, Cointegration, and Structural Change. Cambridge University Press.
Masson, M. E. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43(3):679–690.
Parikh, A. and Wakerly, E. (2000). Real exchange rates and unit root tests. Weltwirtschaftliches Archiv, 136(3):478–490.
Park, J. Y. and Shintani, M. (2016). Testing for a unit root against transitional autoregressive models. International Economic Review, 57(2):635–664.
Perks, W. (1947). Some observations on inverse probability including a new indifference rule. Journal of the Institute of Actuaries (1886-1994), 73(2):285–334.
Phillips, P. C. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics, 33(3):311–340.
Phillips, P. C. (1991a). Bayesian routes and unit roots: De rebus prioribus semper est disputandum. Journal of Applied Econometrics, 6(4):435–473.
Phillips, P. C. (1991b). To criticize the critics: An objective Bayesian analysis of stochastic trends. Journal of Applied Econometrics, 6(4):333–364.
Phillips, P. C. (1993). The Long-Run Australian Consumption Function Reexamined: An Empirical Exercise in Bayesian Inference. Cowles Foundation for Research in Economics, Yale University.
Phillips, P. C. and Ploberger, W. (1994). Posterior odds testing for a unit root with data-based model selection. Econometric Theory, pages 774–808.
Phillips, P. C. and Ploberger, W. (1996). An asymptotic theory of Bayesian inference for time series. Econometrica, pages 381–412.
Poirier, D. J. (1988). Frequentist and subjectivist perspectives on the problems of model building in economics. Journal of Economic Perspectives, 2(1):121–144.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25:111–163.
Raftery, A. E. (1996). Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika, 83(2):251–266.
Schotman, P. and Van Dijk, H. K. (1991a). A Bayesian analysis of the unit root in real exchange rates. Journal of Econometrics, 49(1-2):195–238.
Schotman, P. C. and Van Dijk, H. K. (1991b). On Bayesian routes to unit roots. Journal of Applied Econometrics, 6(4):387–401.
Sims, C. A. (1988). Bayesian skepticism on unit root econometrics. Journal of Economic Dynamics and Control, 12(2-3):463–474.
Sims, C. A. (1991). Comment by Christopher A. Sims on 'To criticize the critics', by Peter C. B. Phillips. Journal of Applied Econometrics, 6(4):423–434.
Sims, C. A. and Uhlig, H. (1991). Understanding unit rooters: A helicopter tour. Econometrica, pages 1591–1599.
Slate, E. H. (1994). Parameterizations for natural exponential families with quadratic variance functions. Journal of the American Statistical Association, 89(428):1471–1482.
So, M. K. and Li, W. (1999). Bayesian unit-root testing in stochastic volatility models. Journal of Business & Economic Statistics, 17(4):491–496.
Sowell, F. (1991). On DeJong and Whiteman's Bayesian inference for the unit root model. Journal of Monetary Economics, 28(2):255–263.
Stanford, D. C. and Raftery, A. E. (2002). Approximate Bayes factors for image segmentation: The pseudolikelihood information criterion (PLIC). IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(11):1517–1520.
Thornber, H. (1967). Finite sample Monte Carlo studies: An autoregressive illustration. Journal of the American Statistical Association, 62(319):801–818.
Tierney, L., Kass, R. E., and Kadane, J. B. (1989). Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84(407):710–716.
Uhlig, H. (1994). What macroeconomists should know about unit roots: a Bayesian perspective. Econometric Theory, pages 645–671.
Vosseler, A. (2016). Bayesian model selection for unit root testing with multiple structural breaks. Computational Statistics & Data Analysis, 100:616–630.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5):779–804.
Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1):92–107.
Wei, C.-Z. (1992). On predictive least squares principles. The Annals of Statistics, pages 1–42.
Xia, C. and Griffiths, W. (2012). Bayesian unit root testing: The effect of choice of prior on test outcomes. Advances in Econometrics, 30:27–57.
Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. John Wiley & Sons.
Zellner, A. and Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. Trabajos de Estadística y de Investigación Operativa, 31(1):585–603.
Zhang, J. Y., Li, Y., and Chen, Z. M. (2013). Unit root hypothesis in the presence of stochastic volatility, a Bayesian analysis. Computational Economics, 41(1):89–100.
Zivot, E. (1994). A Bayesian analysis of the unit root hypothesis within an unobserved components model. Econometric Theory, pages 552–578.
Appendices
A.1 The model of Schotman and Van Dijk (1991b)
Consider the simplest autoregressive process of order one with zero mean,

    x_t = \rho x_{t-1} + u_t.   (15)

Assume that (i) x_0 is a known constant, implying that we work conditionally on the initial observation, (ii) u_t are independent and identically distributed (i.i.d.) normal random variables with mean zero and unknown variance \sigma^2, and (iii) \rho \in S \cup \{1\}, with S = \{\rho \mid -1 < a \le \rho < 1\}. We assume (iv) to observe a sample of T observations on a time series \{x_t\}. The Bayesian analysis is carried out via posterior odds. For the simple model here under consideration we have

    K = K_0 \frac{\int_0^\infty L(x \mid \rho = 1, \sigma, x_0)\, \pi(\sigma)\, d\sigma}{\int_S \int_0^\infty L(x \mid \rho, \sigma, x_0)\, \pi(\sigma)\, \pi(\rho)\, d\sigma\, d\rho} = \frac{\Pr(\rho = 1 \mid x, x_0)}{\Pr(\rho \in S \mid x, x_0)},   (16)

where K_0 represents the prior odds ratio in favour of the hypothesis \rho = 1, and K the corresponding posterior odds ratio. The ratio between the integrals corresponds to the Bayes factor; \pi(\sigma) and \pi(\rho) represent the prior densities for \sigma and \rho \in S, and L(x \mid \cdot) the likelihood function for the observed data x. The prior odds K_0 express the relative weight of the null hypothesis against its stationary alternative, such that the point \rho = 1 is given the probability mass \pi_0 = K_0/(1 + K_0); analogously, K/(1 + K) provides the posterior probability of the null hypothesis \rho = 1.

Schotman and Van Dijk (1991b) specify the marginal distributions of \rho and \sigma as

    \Pr(\rho = 1) = \pi_0, \qquad \Pr(\rho \mid \rho \in S) = \frac{1}{1 - a}, \qquad \pi(\sigma) \propto \frac{1}{\sigma},

that is, \rho is taken uniform over S and with probability mass \pi_0 on \rho = 1, with \sigma and \rho independent. Besides the fact that the density for \rho depends only on the parameter a, with great simplification of the integration problem in the denominator of Eq. (16), the overall solution even in this simple setting is not obvious:

    K = \frac{\pi_0}{1 - \pi_0}\, C_T^{-1} (T - 1)^{-1/2} \left( \frac{\sigma_1^2}{\hat\sigma^2} \right)^{-(T-1)/2} \frac{1 - a}{s_{\hat\rho}} \left[ F\!\left( \frac{1 - \hat\rho}{s_{\hat\rho}};\, T - 1 \right) - F\!\left( \frac{a - \hat\rho}{s_{\hat\rho}};\, T - 1 \right) \right]^{-1}.

Here \hat\rho is the OLS estimator of \rho, s_{\hat\rho}^2 the squared OLS standard error of \hat\rho, \hat\sigma^2 the estimated variance of the residuals, \sigma_1^2 the variance of the first differences of x, F(\cdot\,; \nu) the cumulative density of the t-distribution with \nu degrees of freedom, C_T = \Gamma((T-1)/2)\, \pi^{1/2} / \Gamma(T/2) a constant, and \pi_0/(1 - \pi_0) the prior odds ratio K_0.

By choosing a prior equal balance between the stationary and the random walk hypothesis, \pi_0 = 1/2, and setting a = a^* with

    a^* = \hat\rho + s_{\hat\rho}\, F^{-1}\!\left( \alpha F(-\hat\tau) \right),

where \hat\tau = (\hat\rho - 1)/s_{\hat\rho} is the Dickey-Fuller statistic and 0 < \alpha < 1 is such that the interval [a^*, 1) carries 1 - \alpha of the posterior probability mass associated with stationarity, the posterior odds simplify to

    K = C_T^{-1} (T - 1)^{-1/2} \left( \frac{\sigma_1^2}{\hat\sigma^2} \right)^{-(T-1)/2} \frac{-\hat\tau - F^{-1}(\alpha F(-\hat\tau))}{(1 - \alpha)\, F(-\hat\tau)}.
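Under the reconstruction above, the closed-form posterior odds can be evaluated numerically as follows; all inputs are the OLS quantities defined in the text, scipy's Student-t cdf plays the role of F(·; T−1), and the function name and defaults are ours.

import numpy as np
from scipy.stats import t as student_t
from scipy.special import gammaln

def svd_posterior_odds(x, a=-0.99, pi0=0.5):
    # Posterior odds K for rho = 1 vs rho in S = [a, 1) in the zero-mean AR(1),
    # following our reconstruction of Schotman and Van Dijk (1991b).
    y, ylag = x[1:], x[:-1]
    T = y.size
    rho = (ylag @ y) / (ylag @ ylag)                 # OLS estimate of rho
    resid = y - rho * ylag
    s_rho = np.sqrt((resid @ resid / (T - 1)) / (ylag @ ylag))  # OLS std. error
    sig2_diff = np.mean(np.diff(x) ** 2)             # variance of first differences
    sig2_hat = np.mean(resid ** 2)                   # residual variance
    F = lambda z: student_t.cdf(z, df=T - 1)
    logC = gammaln((T - 1) / 2) + 0.5 * np.log(np.pi) - gammaln(T / 2)
    log_mass = np.log(F((1 - rho) / s_rho) - F((a - rho) / s_rho))
    logK = (np.log(pi0 / (1 - pi0)) - logC - 0.5 * np.log(T - 1)
            - 0.5 * (T - 1) * np.log(sig2_diff / sig2_hat)
            + np.log((1 - a) / s_rho) - log_mass)
    return np.exp(logK)

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(200))   # a pure random-walk sample path
print(svd_posterior_odds(x))              # large values favor the unit root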
A.2 The model of Phillips (1991b)

Phillips (1991b) adopts the information matrix prior from (Jeffreys, 1946). In particular, for a generic family of densities with parameter \theta = (\rho, \sigma) and information matrix i_\theta, the uninformative Jeffreys' prior he considers is defined as

    \pi(\theta) \propto |i_\theta|^{1/2}.

For the AR(1) model x_t = \rho x_{t-1} + u_t with u_t i.i.d. zero-mean normal with variance \sigma^2, and initial value x_0, the above prior becomes

    \pi(\rho, \sigma) \propto \sigma^{-1} I_\rho^{1/2},

where the continuous function I_\rho, for -\infty < \rho < +\infty and sample size T, is defined as:

    I_\rho = \frac{T}{1 - \rho^2} - \frac{1}{1 - \rho^2} \frac{1 - \rho^{2T}}{1 - \rho^2} + \left( \frac{x_0}{\sigma} \right)^2 \frac{1 - \rho^{2T}}{1 - \rho^2}   if \rho \neq 1,

    I_\rho = \frac{T(T - 1)}{2} + T \left( \frac{x_0}{\sigma} \right)^2   if \rho = 1.

This choice of the prior achieves tighter confidence sets for large values of |\rho|, is invariant to transformations of the parameter, and enjoys other desirable properties. The prior depends on x_0, and its information grows with the sample size T at a geometric rate when \rho > 1.
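A small sketch (ours, not Phillips') evaluating the information term I_\rho and the resulting prior kernel \pi(\rho, \sigma) \propto \sigma^{-1} I_\rho^{1/2} on a grid of \rho values:

import numpy as np

def I_rho(rho, T, x0, sigma):
    # Information measure entering Phillips' (1991b) Jeffreys prior for the AR(1).
    if np.isclose(rho, 1.0):
        return T * (T - 1) / 2 + T * (x0 / sigma) ** 2
    g = (1 - rho ** (2 * T)) / (1 - rho ** 2)    # sum_{t=0}^{T-1} rho^{2t}
    return T / (1 - rho ** 2) - g / (1 - rho ** 2) + (x0 / sigma) ** 2 * g

def jeffreys_kernel(rho, sigma, T, x0):
    # Unnormalized prior density pi(rho, sigma) ~ sigma^{-1} * sqrt(I_rho).
    return np.sqrt(I_rho(rho, T, x0, sigma)) / sigma

for rho in (0.5, 0.9, 0.99, 1.0, 1.01):
    print(rho, jeffreys_kernel(rho, sigma=1.0, T=100, x0=0.0))

The ratio of the kernel at, say, rho = 1.01 to its value at rho = 0.5 makes visible how sharply the prior mass concentrates in the explosive region as T grows, the behavior discussed in the text.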