On the relationship between Uhlig extended and beta-Bartlett processes
Víctor Peña
Baruch College, The City University of New York

Kaoru Irie
Faculty of Economics, University of Tokyo

July 7, 2020
Abstract
Stochastic volatility processes are used in multivariate time-series analysis to track time-varying patterns in covariance structure. Uhlig extended and beta-Bartlett processes are especially useful for analyzing high-dimensional time-series because they are conjugate with Wishart likelihoods. In this article, we show that Uhlig extended and beta-Bartlett processes are closely related, but not equivalent: their hyperparameters can be matched so that they have the same forward-filtered posteriors and one-step ahead forecasts, but different joint (smoothed) posterior distributions. Under this circumstance, Bayes factors can't discriminate between the models, and alternative approaches to model comparison are needed. We illustrate these issues in a retrospective analysis of volatilities of returns of foreign exchange rates. Additionally, we provide a backward sampling algorithm for the beta-Bartlett process, for which retrospective analysis had not been developed.
Keywords:
Stochastic volatility, state-space models, Bayesian model comparison
1 Introduction
Time-series data with time-varying covariance structure arise in fields as diverse as finance, neuroimaging, and online marketing. In such applications, stochastic volatility processes are necessary for successful forecasting and decision making. As illustrated in West (2020), models with conjugate sequential updates are particularly attractive for analyzing high-dimensional time-series, since implementing richly-parametrized models that require Markov chain Monte Carlo methods for posterior inference (e.g., Lopes et al. (2010), Nakajima and West (2012), Kastner et al. (2017), and Shirota et al. (2017)) may not be computationally feasible.

Two classes of stochastic volatility processes that are conjugate with Wishart likelihoods coexist in the literature: matrix-beta processes, which build upon Uhlig (1997), and beta-Bartlett processes, which were first used in Quintana et al. (2003). To date, the most flexible matrix-beta process is the Uhlig extended process (Windle and Carvalho, 2014), which is the one we consider herein.

Our main contributions are (1) studying the relationship between Uhlig extended and beta-Bartlett processes (Section 3) and (2) providing the first backward sampler in the literature for beta-Bartlett processes (Section 4). The hyperparameters of the processes can be matched so that they yield the same marginal likelihoods, so Bayes factors can't be used to distinguish them. In Section 5, we describe approaches to model comparison that don't regard the processes as equivalent, which we then apply in a foreign exchange rates application in Section 6. We end the article with conclusions in Section 7.
2 Models

We use the notation 1:n to compactly denote the set {1, 2, ..., n}. Following Prado and West (2010), we use the notation D_t for our "information set" at time t. Before we observe any data, our prior knowledge is denoted D_0; at time t ∈ 1:T, the information set is D_t = {D_{t−1}, y_t}. We denote q-dimensional normal random variables with mean µ and covariance matrix Σ as N_q(µ, Σ), chi-squared random variables with k > 0 degrees of freedom as χ²_k, and Beta random variables with shape parameters a > 0 and b > 0 as Beta(a, b). The less common Wishart_q(k, A) and MatrixBeta_q(n/2, k/2) distributions are as defined in Windle and Carvalho (2014) and the supplementary material to this article. For extrema, we use the notation a ∧ b = min(a, b) and a ∨ b = max(a, b). Finally, we use the notation uchol(·) for the function that returns the upper-triangular Cholesky factor of a symmetric positive-definite matrix.

The models we study rely heavily on the Bartlett decomposition of Wishart-distributed matrices, which we now review. Let W ∼ Wishart_q(k, A) be a q × q random matrix with k > q − 1 degrees of freedom and scale matrix A. Its Bartlett decomposition is W = (UP)′UP, where P = uchol(A) and U = (u_ij)_{i,j ∈ 1:q} is upper-triangular with u_ij iid ∼ N(0, 1) for i < j, independent of (u_ii)² ind.∼ χ²_{k−i+1} for i ∈ 1:q.

Windle and Carvalho (2014) extend a model that was originally proposed in Uhlig (1997). Given (q × q)-dimensional symmetric positive-definite matrices {y_t}_{t ∈ 1:T}, their model can be written as

    y_t | Φ^U_t ind.∼ Wishart_q(k, (kΦ^U_t)^{−1}),    Φ^U_t = (U^U_{t−1} P^U_{t−1})′ Ψ_t U^U_{t−1} P^U_{t−1} / λ,    (1)

where Ψ_t ∼ MatrixBeta_q(n/2, k/2) and U^U_{t−1} and P^U_{t−1} come from the Bartlett decomposition Φ^U_{t−1} = (U^U_{t−1} P^U_{t−1})′ U^U_{t−1} P^U_{t−1}. The model is completed with the prior Φ^U_0 | D_0 ∼ Wishart_q(n + k, (kD^U_0)^{−1}). The hyperparameters are 0 < λ < 1, n > q − 1, and k, which is either a positive integer less than q or a real number greater than q − 1. We refer to the process on {Φ^U_t}_{t ∈ 1:T} implied by the model above as the Uhlig extended (UE) process. The prior distributions and forward-filtered posteriors, as derived in Windle and Carvalho (2014), are given in Table 1.

In contrast, the beta-Bartlett (BB) stochastic volatility process (Quintana et al., 2003) can be written as

    y_t | Φ^B_t ind.∼ Wishart_q(k, (kΦ^B_t)^{−1}),    Φ^B_t = (Ũ_t P^B_{t−1})′ Ũ_t P^B_{t−1} / b,    (2)

where P^B_{t−1} is defined via the Bartlett decomposition Φ^B_{t−1} = (U^B_{t−1} P^B_{t−1})′ U^B_{t−1} P^B_{t−1}, and Ũ_t = (ũ_{ij,t})_{i,j ∈ 1:q} is constructed by modifying the diagonal elements of U^B_{t−1} = (u^B_{ij,t−1})_{i,j ∈ 1:q} as explained in Table 1. The hyperparameters of the model are k > 0, 0 < β < 1, 0 < b < 1, and k_0 > 0, which appears in the prior Φ^B_0 | D_0 ∼ Wishart_q(k_0, (kD^B_0)^{−1}) for symmetric positive-definite D^B_0. We refer to the process defined on {Φ^B_t}_{t ∈ 1:T} as the beta-Bartlett (BB) process. The prior distributions and forward-filtered posteriors with this model can be found in Table 1.

Table 1: Comparison of Uhlig extended and beta-Bartlett models.

                 Uhlig extended                                              beta-Bartlett
  Likelihood     y_t | Φ^U_t ind.∼ Wishart_q(k, (kΦ^U_t)^{−1})               y_t | Φ^B_t ind.∼ Wishart_q(k, (kΦ^B_t)^{−1})
  State evol.    Φ^U_t = (U^U_{t−1}P^U_{t−1})′ Ψ_t U^U_{t−1}P^U_{t−1}/λ      Φ^B_t = (Ũ_t P^B_{t−1})′ Ũ_t P^B_{t−1}/b
                 Φ^U_{t−1} = (U^U_{t−1}P^U_{t−1})′U^U_{t−1}P^U_{t−1}         Φ^B_{t−1} = (U^B_{t−1}P^B_{t−1})′U^B_{t−1}P^B_{t−1}
  Error          Ψ_t ∼ MatrixBeta_q(n/2, k/2)                                ũ_{ij,t} = u^B_{ij,t−1} (i ≠ j)
                                                                             (ũ_{ii,t})² = η_{i,t}(u^B_{ii,t−1})²
                                                                             η_{i,t} ind.∼ Beta((βk_{t−1} − i + 1)/2, (1 − β)k_{t−1}/2)
  Post. at t−1   Φ^U_{t−1} | D_{t−1} ∼ Wishart_q(n + k, (kD^U_{t−1})^{−1})   Φ^B_{t−1} | D_{t−1} ∼ Wishart_q(k_{t−1}, (kD^B_{t−1})^{−1})
  Prior at t     Φ^U_t | D_{t−1} ∼ Wishart_q(n, (kλD^U_{t−1})^{−1})          Φ^B_t | D_{t−1} ∼ Wishart_q(βk_{t−1}, (kbD^B_{t−1})^{−1})
  Post. at t     Φ^U_t | D_t ∼ Wishart_q(n + k, (kD^U_t)^{−1})               Φ^B_t | D_t ∼ Wishart_q(k_t, (kD^B_t)^{−1})
                 D^U_t = λD^U_{t−1} + y_t                                    D^B_t = bD^B_{t−1} + y_t and k_t = βk_{t−1} + k

3 Relationship between the processes

The priors, forward-filtered posteriors, and one-step ahead forecast distributions of the models defined in Equations (1) and (2) coincide under the condition

    k_0 = n + k,    β = n/(n + k),    b = λ,    and    D^B_0 = D^U_0.    (3)

The change of variables is bijective, so if the hyperparameters are set by maximizing the marginal likelihoods of the models, the condition is satisfied.

However, the two processes are not equivalent under Equation (3), because the conditionals Φ^U_t | Φ^U_{t−1}, D_{t−1} and Φ^B_t | Φ^B_{t−1}, D_{t−1} differ. Assume that Equation (3) holds, that Φ^U_{t−1} = Φ^B_{t−1} = Φ_{t−1} = (U_{t−1}P_{t−1})′U_{t−1}P_{t−1} and D_{t−1} are given, and that U_{t−1} = (u_{ij,t−1})_{i,j ∈ 1:q}. If the conditionals were equal in distribution, they would be equal in expectation.
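The matching of the forward filters can also be checked numerically. The following is a minimal NumPy sketch (not from the paper; the data are simulated and all variable names are ours) verifying that, under condition (3), the UE and BB forward-filtering recursions of Table 1 track identical posterior parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
q, T = 3, 50
n, k, lam = 5.0, 1.0, 0.95

# hypothetical data: rank-one "squared return" matrices y_t = r_t r_t'
y = [np.outer(r, r) for r in rng.standard_normal((T, q))]

# the matching (3): k0 = n + k, beta = n/(n + k), b = lam, same D_0
k0, beta, b = n + k, n / (n + k), lam

D_u = np.eye(q)   # D^U_0
D_b = np.eye(q)   # D^B_0
k_t = k0
for t in range(T):
    D_u = lam * D_u + y[t]   # UE posterior: Wishart_q(n + k, (k D^U_t)^{-1})
    D_b = b * D_b + y[t]     # BB posterior: Wishart_q(k_t, (k D^B_t)^{-1})
    k_t = beta * k_t + k     # BB degrees-of-freedom recursion
    assert np.allclose(D_u, D_b)     # identical scale parameters
    assert np.isclose(k_t, n + k)    # k_t stays fixed at n + k
```

Because b = λ and D^B_0 = D^U_0, the scale recursions coincide term by term, and k_t = βk_{t−1} + k stays fixed at n + k when started at k_0 = n + k; hence the matched filters (and therefore the one-step forecasts) are identical.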
Using results in Konno (1988),

    E(Φ^U_t − Φ^B_t | Φ_{t−1}, D_{t−1}) = (1/λ) P′_{t−1} [ (n/(n + k)) U′_{t−1} U_{t−1} − E(Ũ′_t Ũ_t | Φ_{t−1}, D_{t−1}) ] P_{t−1},

where, for i, j ∈ 1:q,

    E[(Ũ′_t Ũ_t)_{ij} | Φ_{t−1}, D_{t−1}] = Σ_{l=1}^{i∧j−1} u_{li,t−1} u_{lj,t−1} + δ_{ij} (n − i + 1)(u_{ii,t−1})² / (n − i + 1 + k) + (1 − δ_{ij}) g(i, j),

    g(i, j) = [ Γ((n − i∧j + 2)/2) Γ((n − i∧j + k + 1)/2) ] / [ Γ((n − i∧j + 1)/2) Γ((n − i∧j + k + 2)/2) ] · u_{i∧j,i∧j,t−1} u_{i∧j,i∨j,t−1},

(Ũ′_t Ũ_t)_{ij} is the (i, j)th element of Ũ′_t Ũ_t, and δ_{ij} = 1 if i = j and δ_{ij} = 0 otherwise. In general, E[(Ũ′_t Ũ_t)_{ij} | Φ_{t−1}, D_{t−1}] and the corresponding element of (n/(n + k)) U′_{t−1} U_{t−1} aren't equal: if U_{t−1} is diagonal,

    E[(Ũ′_t Ũ_t)_{ii} | Φ_{t−1}, D_{t−1}] = (n − i + 1)(u_{ii,t−1})² / (n − i + 1 + k) ≠ n(u_{ii,t−1})² / (n + k)    for i ∈ 2:q.

This distinction affects the (smoothed) posterior distributions Φ^U_{1:T} | D_T and Φ^B_{1:T} | D_T. Dropping the process superscripts, the posterior distribution can be factorized as

    p(Φ_{1:T} | D_T) = p(Φ_T | D_T) ∏_{t=1}^{T−1} p(Φ_t | Φ_{t+1}, D_t),    (4)

and the conditionals p(Φ_t | Φ_{t+1}, D_t) of the UE and BB processes are different. For the UE model, we have Φ_t = λΦ_{t+1} + Z_t, where Z_t ∼ Wishart_q(k, (kD^U_t)^{−1}); for the BB model, see Section 4. Below, we compare the expectations and variances of the conditionals in a concrete example to prove our claim, as well as to gain some intuition.

Example 1.
Assume that Equation (3) holds, let k = 1, let P_t = uchol((kD_t)^{−1}) be the identity matrix, and let Υ = uchol(Φ_{t+1}) = (υ_ij)_{i,j ∈ 1:q}. Then,

    E[(Φ^U_t)_{ij} | Φ_{t+1}, D_t] = λ Σ_{l=1}^{i∧j} υ_{li} υ_{lj} + δ_{ij};    V[(Φ^U_t)_{ij} | Φ_{t+1}, D_t] = 1 + δ_{ij},

where (Φ^U_t)_{ij} is the (i, j)th element of Φ^U_t. Similarly, for the BB process:

    E[(Φ^B_t)_{ij} | Φ_{t+1}, D_t] = λ Σ_{l=1}^{i∧j−1} υ_{li} υ_{lj} + δ_{ij}(λυ²_{ii} + 1) + (1 − δ_{ij}) h(i, j),

    V[(Φ^B_t)_{ij} | Φ_{t+1}, D_t] = 2δ_{ij} + (1 − δ_{ij}) [ λ² υ²_{i∧j,i∨j} υ²_{i∧j,i∧j} + λ υ²_{i∧j,i∨j} − h(i, j)² ],

with h(i, j) = √λ υ_{i∧j,i∨j} U(−1/2, 0, λυ²_{i∧j,i∧j}/2), where U(a, b, z) is Tricomi's confluent hypergeometric function (see e.g. Abramowitz et al. (1988)). The expressions for the diagonal elements coincide, but that need not be the case for the off-diagonal elements: if υ_ij > 0, then E[(Φ^B_t)_{ij} | Φ_{t+1}, D_t] > E[(Φ^U_t)_{ij} | Φ_{t+1}, D_t], and if λυ²_{ij} < 1, then V[(Φ^B_t)_{ij} | Φ_{t+1}, D_t] < V[(Φ^U_t)_{ij} | Φ_{t+1}, D_t].

4 Backward sampler for beta-Bartlett processes

Forward-filtered posteriors and forecast distributions for BB processes were derived in Quintana et al. (2010), but a backward sampler was not developed. Our sampler uses the factorization of Φ_{1:T} | D_T in Section 3: it consists in drawing Φ*_T ∼ Wishart_q(k_T, (kD^B_T)^{−1}) and iteratively sampling Φ*_t ∼ Φ_t | Φ*_{t+1}, D_t.

Given Φ_{t+1} and D_t, consider the decomposition Φ_{t+1} = (Ũ*_{t+1} P_t)′ Ũ*_{t+1} P_t / b; that is, Ũ*_{t+1} = uchol(b (P^{−1}_t)′ Φ_{t+1} P^{−1}_t). Then, we can generate U*_t = (u*_{ij,t})_{i,j ∈ 1:q} as follows. The off-diagonal elements are u*_{ij,t} = ũ*_{ij,t+1} for i < j, and the diagonal elements are (u*_{ii,t})² = (ũ*_{ii,t+1})² + θ_{it}, where θ_{it} iid ∼ χ²_{(1−β)k_t}. Finally, we can set Φ_t = (U*_t P_t)′ U*_t P_t.

The expression for the conditional of (u*_{ii,t})² given (ũ*_{ii,t+1})² can be justified using standard results for the univariate gamma-beta discount model (see e.g. Exercise 4 in Section 4.6 of Prado and West (2010)).
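The gamma-beta discount fact invoked here can be illustrated by simulation: if g ∼ χ²_{k_t} and η ∼ Beta(βk_t/2, (1 − β)k_t/2) independently, then ηg ∼ χ²_{βk_t} and g − ηg ∼ χ²_{(1−β)k_t}, independently of each other. A Monte Carlo sketch (the constants are arbitrary and not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
kt, beta, N = 8.0, 0.75, 200_000

g = rng.chisquare(kt, N)                               # plays the role of (u_ii)^2
eta = rng.beta(beta * kt / 2, (1 - beta) * kt / 2, N)  # beta discount factor
h = eta * g                                            # discounted value, (u~_ii)^2
theta = g - h                                          # increment added back in the sampler

# beta-gamma algebra: h ~ chi2_{beta*kt}, theta ~ chi2_{(1-beta)*kt}, independent
assert abs(h.mean() - beta * kt) < 0.05
assert abs(theta.mean() - (1 - beta) * kt) < 0.05
assert abs(h.var() - 2 * beta * kt) < 0.3
assert abs(np.corrcoef(h, theta)[0, 1]) < 0.02
```

This is exactly why adding an independent χ²_{(1−β)k_t} variate to (ũ*_{ii,t+1})² recovers the correct conditional for (u*_{ii,t})².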
To relate U*_t to Ũ*_t, observe that Φ_t = (U*_t P_t)′ U*_t P_t = (Ũ*_t P_{t−1})′ Ũ*_t P_{t−1} / b. Therefore,

    Ũ*_t = uchol(b (P′_{t−1})^{−1} Φ_t P^{−1}_{t−1}) = uchol(b (U*_t P_t P^{−1}_{t−1})′ U*_t P_t P^{−1}_{t−1}) = √b U*_t P_t P^{−1}_{t−1}.

The matrix P_{t−1} is upper-triangular, so it can be inverted cheaply using back-substitution. The backward sampler for the UE process requires simulating Wishart random matrices for all t ∈ 1:T, whereas the BB process only requires sampling a Wishart random matrix for t = T; for t ∈ 1:(T−1), it only requires sampling q chi-squared random variates. Explicit pseudocode for the backward sampler can be found in Algorithm 1. The sampler can also be used in multivariate dynamic linear models with BB stochastic volatilities (as in Section 10.4.8 of Prado and West (2010)).

Algorithm 1: Backward sampler for Φ^B_{1:T} | D_T
  Input: b, β, k_t, and P_t = uchol((kD^B_t)^{−1}) from Φ^B_t | D_t ∼ Wishart_q(k_t, (kD^B_t)^{−1}), t ∈ 1:T.
  Output: Φ^{B*}_{1:T} ∼ Φ^B_{1:T} | D_T.
  1. Set U*_T = (u*_{T,ij})_{i,j ∈ 1:q}, with (u*_{T,ii})² ind.∼ χ²_{k_T − i + 1} for i ∈ 1:q and u*_{T,ij} iid ∼ N(0, 1) for i < j ≤ q.
  2. Set Φ^{B*}_T = (U*_T P_T)′ U*_T P_T and Ũ*_T = √b U*_T P_T P^{−1}_{T−1}.
  3. For descending t ∈ T:2:
       Draw θ_{(t−1),i} ind.∼ χ²_{(1−β)k_{t−1}} for i ∈ 1:q.
       Set U*_{t−1} = (u*_{(t−1),ij})_{i,j ∈ 1:q}, with u*_{(t−1),ii} = √((ũ*_{t,ii})² + θ_{(t−1),i}) for i ∈ 1:q and u*_{(t−1),ij} = ũ*_{t,ij} for i < j ≤ q.
       Set Φ^{B*}_{t−1} = (U*_{t−1} P_{t−1})′ U*_{t−1} P_{t−1}.
       If t ≥ 3, set Ũ*_{t−1} = √b U*_{t−1} P_{t−1} P^{−1}_{t−2}.
  4. Return Φ^{B*}_{1:T}.

5 Model comparison

If Equation (3) is satisfied, the marginal likelihoods of the UE and BB models are equal, and we can't use Bayes factors (or posterior model probabilities) to compare them. However, the difference between Φ^U_{1:T} | D_T and Φ^B_{1:T} | D_T can be substantial in practice, as we see in Section 6.

Instead of Bayes factors, we can use posterior likelihood ratios (Aitkin, 1991) and posterior predictive checks (Gelman et al., 1996) to compare the models.
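Algorithm 1 can be implemented compactly. The sketch below is ours, not the authors' code; the helper `uchol` and the list-of-matrices interface are conventions we adopt for illustration:

```python
import numpy as np

def uchol(a):
    """Upper-triangular factor P with a = P' P (NumPy returns the lower factor)."""
    return np.linalg.cholesky(a).T

def bb_backward_sample(rng, P, k_seq, b, beta):
    """One draw of Phi^B_{1:T} | D_T following Algorithm 1.

    P[t] = uchol((k D^B_t)^{-1}) and k_seq[t] = k_t are the forward-filtered
    posterior parameters, 0-indexed over t = 0, ..., T-1."""
    T = len(P)
    q = P[0].shape[0]
    phi = [None] * T
    # final time: draw Phi_T ~ Wishart_q(k_T, (k D^B_T)^{-1}) via Bartlett
    u = np.triu(rng.standard_normal((q, q)), 1)
    u[np.diag_indices(q)] = np.sqrt(rng.chisquare(k_seq[-1] - np.arange(q)))
    x = u @ P[-1]
    phi[-1] = x.T @ x
    for t in range(T - 1, 0, -1):
        # relate U*_t to U~*_t: U~*_t = sqrt(b) U*_t P_t P_{t-1}^{-1}
        u_tilde = np.sqrt(b) * u @ P[t] @ np.linalg.inv(P[t - 1])
        u = np.triu(u_tilde, 1)  # off-diagonal elements are carried over
        theta = rng.chisquare((1 - beta) * k_seq[t - 1], size=q)
        u[np.diag_indices(q)] = np.sqrt(np.diag(u_tilde) ** 2 + theta)
        x = u @ P[t - 1]
        phi[t - 1] = x.T @ x
    return phi
```

Each call returns one joint draw of Φ^B_{1:T} | D_T; repeating the call gives the posterior sample used for retrospective analysis.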
Both posterior likelihood ratios and posterior predictive checks can be implemented given posterior draws, but they have been criticized for, among other reasons, using the data twice (Gelman et al., 2013). Alternatively, Kamary et al. (2014) propose comparing models via mixtures, which here amounts to fitting

    y_t | α, Φ^U_t, Φ^B_t ind.∼ α Wishart_q(k, (kΦ^U_t)^{−1}) + (1 − α) Wishart_q(k, (kΦ^B_t)^{−1}),

where α ∼ Beta(a_0, b_0), and studying the posterior distribution of the mixture weight α.

6 Application: foreign exchange rates

We perform a retrospective analysis of volatilities of daily returns of exchange rates of three currencies measured in US dollars: euros (EUR), British pounds (GBP), and Canadian dollars (CAD), observed from January 2008 to October 2010 (T = 739). The vector of returns r_t can be turned into a rank-1 symmetric matrix by computing y_t = r_t r′_t. Our observational model is y_t | Φ_t ind.∼ Wishart_q(1, Φ^{−1}_t); that is, we set k = 1, which is equivalent to modeling r_t | Φ_t ind.∼ N_q(0_q, Φ^{−1}_t).

The estimate of the volatility matrix at the starting point, D_0, is computed as the sample average of the data in 2007. The other hyperparameters are obtained by maximizing the marginal likelihood of the model; the maximizer has n = 5.
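For the k = 1 case used in this application, the marginal likelihood is a product of multivariate-t one-step forecast densities (the expressions are given in the supplementary material). A sketch of its evaluation for the UE model (the function name and interface are ours):

```python
import math
import numpy as np

def ue_log_marglik(returns, D0, n, lam):
    """Log marginal likelihood of the UE model with k = 1: the sum over t of
    the multivariate-t log forecast densities log p(r_t | D_{t-1})."""
    q = D0.shape[0]
    D = D0.copy()
    logdet = np.linalg.slogdet(D0)[1]
    const = (math.lgamma((n + 1) / 2) - math.lgamma((n + 1 - q) / 2)
             - 0.5 * q * math.log(math.pi))
    total = 0.0
    for r in returns:
        quad = float(r @ np.linalg.solve(D, r)) / lam   # r' D^{-1} r / lam
        total += (const - 0.5 * (q * math.log(lam) + logdet)
                  - 0.5 * (n + 1) * math.log1p(quad))
        # conjugate update D_t = lam * D_{t-1} + r r', and the determinant
        # recursion log|D_t| = log(1 + quad) + q log(lam) + log|D_{t-1}|
        D = lam * D + np.outer(r, r)
        logdet += math.log1p(quad) + q * math.log(lam)
    return total
```

Maximizing this function over a grid of (n, λ) reproduces the hyperparameter search described in the supplementary material.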
Figure 1: Posterior medians and 95% credible intervals of correlations computed from sampled (Φ^U_t)^{−1} and (Φ^B_t)^{−1} for the UE (top row) and BB (bottom row) models.

The logarithm of the posterior likelihood ratio of the UE model to the BB model is

    ℓ_UB = log { E[L(Φ^U_{1:T}) | D_T] / E[L(Φ^B_{1:T}) | D_T] },    L(Φ_{1:T}) = ∏_{t=1}^{T} N_q(r_t | 0_q, Φ^{−1}_t),

where the expectations are computed from posterior samples of Φ^U_{1:T} | D_T and Φ^B_{1:T} | D_T. In this application, ℓ_UB is negative, which favors the BB model.

A mixture weight α close to 0 favors the BB model, whereas a mixture weight near 1 favors UE. Starting with α ∼ Beta(1, 1), we obtain E(α | D_T) = 0.498 and P(α < 0.5 | D_T) = 0.533, with small estimated Monte Carlo standard errors (Geyer, 1992).

7 Conclusions

UE and BB processes can be parametrized so that they yield the same forecasts and marginal likelihoods. Therefore, practitioners who are only concerned with forecasting, and Bayesians who compare models using Bayes factors, can treat these processes as equivalent. On the other hand, Section 6 shows that the smoothed posteriors can be rather different, and that approaches to model comparison such as posterior likelihood ratios don't see the models as equivalent. This example calls for further investigation to discover when this phenomenon occurs, and how to proceed when it does.
Acknowledgement
We thank Mike West (Duke University) for his encouragement and feedback.
References
Abramowitz, M., I. A. Stegun, and R. H. Romer (1988). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables.

Aitkin, M. (1991). Posterior Bayes factors. Journal of the Royal Statistical Society: Series B (Methodological) 53(1), 111–128.

Gelman, A., X.-L. Meng, and H. Stern (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 733–760.

Gelman, A., C. P. Robert, and J. Rousseau (2013). Inherent difficulties of non-Bayesian likelihood-based inference, as revealed by an examination of a recent book by Aitkin. Statistics & Risk Modeling 30, 105–120.

Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 473–483.

Kamary, K., K. Mengersen, C. P. Robert, and J. Rousseau (2014). Testing hypotheses via a mixture estimation model. arXiv preprint arXiv:1412.2044.

Kastner, G., S. Frühwirth-Schnatter, and H. F. Lopes (2017). Efficient Bayesian inference for multivariate factor stochastic volatility models. Journal of Computational and Graphical Statistics 26(4), 905–917.

Konno, Y. (1988). Exact moments of the multivariate F and beta distributions. Journal of the Japan Statistical Society 18(2), 123–130.

Lopes, H. F., R. McCulloch, and R. Tsay (2010). Cholesky stochastic volatility. Technical report, University of Chicago, Booth School of Business.

Nakajima, J. and M. West (2012). Dynamic factor volatility modeling: A Bayesian latent threshold approach. Journal of Financial Econometrics 11(1), 116–153.

Prado, R. and M. West (2010). Time Series: Modeling, Computation, and Inference. CRC Press.

Quintana, J. M., C. M. Carvalho, J. Scott, and T. Costigliola (2010). Futures markets, Bayesian forecasting and risk modeling. The Handbook of Applied Bayesian Analysis, 343–365.

Quintana, J. M., V. Lourdes, O. Aguilar, and J. Liu (2003). Global gambling. Bayesian Statistics VII, 349–368.

Shirota, S., Y. Omori, H. F. Lopes, and H. Piao (2017). Cholesky realized stochastic volatility model. Econometrics and Statistics 3, 34–59.

Uhlig, H. (1997). Bayesian vector autoregressions with stochastic volatility. Econometrica: Journal of the Econometric Society, 59–73.

West, M. (2020). Bayesian forecasting of multivariate time series: scalability, structure uncertainty and decisions. Annals of the Institute of Statistical Mathematics 72(1), 1–31.

Windle, J. and C. M. Carvalho (2014). A tractable state-space model for symmetric positive-definite matrices. Bayesian Analysis 9(4), 759–792.

Supplementary material
In this supplementary document, we set the notation for the distributions we use in the main text, give an additional example to compare the forward conditional distributions of Uhlig extended and beta-Bartlett processes, and provide technical details and extra results for the foreign exchange rates application.
Distributions
The definitions of the Wishart and matrix beta distributions can be found, for instance, in Windle and Carvalho (2014) and Prado and West (2010). We include them here for completeness.
Wishart:
Let A be a q × q symmetric positive-definite matrix. Then, A ∼ Wishart_q(h, S) if its probability density function is

    p(A) = 2^{−hq/2} |S|^{−h/2} Γ_q(h/2)^{−1} |A|^{(h−q−1)/2} exp{ −(1/2) tr(S^{−1}A) },

where h > q − 1 and Γ_q(h/2) is the multivariate gamma function evaluated at h/2. The definition can be extended to h ≤ q − 1, in which case A is rank-deficient; see e.g. Windle and Carvalho (2014) for details.

Matrix beta distribution:
Let A_1 ∼ Wishart_q(n_1, Σ^{−1}) and A_2 ∼ Wishart_q(n_2, Σ^{−1}) be independent, where Σ is symmetric positive-definite, n_1 > q − 1, and n_2 is either a positive integer less than q or a real number greater than q − 1. Define T = uchol(A_1 + A_2) and B = (T^{−1})′ A_1 T^{−1}. Then, B ∼ MatrixBeta_q(n_1/2, n_2/2).

Additional example: conditional distributions
Let Φ_{t+1} = diag(φ_1, ..., φ_q) and D^{−1}_t = diag(d_1, ..., d_q). For simplicity, we let k = 1, although the same computations could be done for k ≠ 1. For the UE process,

    E(Φ^U_t | Φ_{t+1}, D_t) = diag(λφ_1 + d_1, ..., λφ_q + d_q);    V[(Φ^U_t)_{ij} | Φ_{t+1}, D_t] = δ_{ij} d²_i + d_i d_j,

where δ_{ij} is Kronecker's delta. For the BB process, we have

    E(Φ^B_t | Φ_{t+1}, D_t) = diag(λφ_1 + d_1, ..., λφ_q + d_q);    V[(Φ^B_t)_{ij} | Φ_{t+1}, D_t] = 2δ_{ij} d²_i.

Foreign exchange rates: technical details
As we mentioned in the main text, we take k = 1, which implies that we can simply work with normal likelihoods for the returns; we use normal likelihoods in our implementation. In Windle and Carvalho (2014), the discounting parameter λ is automatically chosen to satisfy λ^{−1} = 1 + k/(n − q − 1), so that E[Φ^{−1}_t | D_t] = E[Φ^{−1}_{t+1} | D_t], a property the authors deem desirable. In contrast, we directly maximize the marginal likelihood over (n, λ) under no constraint. The marginal likelihood is the product of one-step ahead forecast densities, each of which is a multivariate-t density:

    p(r_t | D_{t−1}) = ∫ N_q(r_t | 0, Φ^{−1}_t) Wishart_q(Φ_t | n, (λD_{t−1})^{−1}) dΦ_t
                     = [ Γ((n + 1)/2) / Γ((n + 1 − q)/2) ] |λD_{t−1}|^{−1/2} π^{−q/2} (1 + r′_t D^{−1}_{t−1} r_t / λ)^{−(n+1)/2}.

In addition, the determinant of D_t can be updated sequentially using the convenient relation

    log|D_t| = log(1 + r′_t D^{−1}_{t−1} r_t / λ) + q log(λ) + log|D_{t−1}|,

so the evaluation of the marginal likelihood isn't computationally demanding. To maximize the marginal likelihood, we evaluate it on a grid of values of n and λ.

The posterior likelihood ratio (Aitkin, 1991) can be hard to estimate numerically, but its logarithm is stable. To see this, recall that

    ℓ_UB = log { E[L(Φ^U_{1:T}) | D_T] / E[L(Φ^B_{1:T}) | D_T] },    L(Φ_{1:T}) = ∏_{t=1}^{T} N_q(r_t | 0_q, Φ^{−1}_t).

Now, based on Monte Carlo samples Φ^{U*}_{1:N} ∼ Φ^U_{1:T} | D_T and Φ^{B*}_{1:N} ∼ Φ^B_{1:T} | D_T,

    ℓ_UB ≈ LSE(ℓ(Φ^{U*}_{1:N})) − LSE(ℓ(Φ^{B*}_{1:N})),

where ℓ is log L(Φ_{1:T}) and LSE is the log-sum-exp function, which can be implemented in a numerically stable way.

We implement the mixture model approach proposed in Kamary et al. (2014) through a missing-data augmented Gibbs sampler. The target model is defined by the mixture of likelihoods

    r_t | α, Φ^U_t, Φ^B_t ∼ α N_q(0, (Φ^U_t)^{−1}) + (1 − α) N_q(0, (Φ^B_t)^{−1}).
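The log-sum-exp evaluation of ℓ_UB described above can be sketched as follows (the function names are ours); note that the 1/N factors cancel between numerator and denominator:

```python
import numpy as np

def log_post_lik_ratio(loglik_u, loglik_b):
    """l_UB = log(mean exp(loglik_u)) - log(mean exp(loglik_b)),
    computed stably via log-sum-exp (the 1/N factors cancel)."""
    loglik_u = np.asarray(loglik_u)
    loglik_b = np.asarray(loglik_b)

    def lse(x):
        m = x.max()  # subtract the max before exponentiating
        return m + np.log(np.sum(np.exp(x - m)))

    return lse(loglik_u) - lse(loglik_b)
```

The inputs are the posterior-sample log-likelihoods ℓ(Φ^{U*}_{1:N}) and ℓ(Φ^{B*}_{1:N}); because these are typically large negative numbers, naive exponentiation would underflow, while the shifted version does not.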
We implement the following augmented model:

    r_t = z_t r^U_t + (1 − z_t) r^B_t,
    r^M_t | Φ^M_t ∼ N_q(0, (Φ^M_t)^{−1}),  M ∈ {U, B},
    z_t iid ∼ Bernoulli(α),
    α ∼ Beta(a_0, b_0).

The actual observed return, r_t, is defined separately from the inputs of the two models, r^U_t and r^B_t. At each iteration of the Gibbs sampler, conditional on z_t, we decide which model is fed by r_t and which model is "missing" its observation. The notable advantage of this approach is that the missing observation, either r^U_t or r^B_t, is a parameter, so it is sampled through the course of the Gibbs sampler. As a result, the sampling of Φ^M_{1:T} is based on a full sequence of observations r^M_{1:T}, and we can apply the forward-filtering equations and backward sampler described in the main text.

The Gibbs sampler consists in iteratively sampling from the following full-conditional distributions:

• Sample z_t from a Bernoulli distribution with probabilities

      P[z_t = 1 | −] ∝ α N_q(r^U_t | 0, (Φ^U_t)^{−1}),
      P[z_t = 0 | −] ∝ (1 − α) N_q(r^B_t | 0, (Φ^B_t)^{−1}).

  In computation, one can work on the log-scale:

      log P(z_t = 1 | −) = c + log(α) + (1/2) log|Φ^U_t| − (1/2)(r^U_t)′ Φ^U_t r^U_t,
      log P(z_t = 0 | −) = c + log(1 − α) + (1/2) log|Φ^B_t| − (1/2)(r^B_t)′ Φ^B_t r^B_t,

  where c is a common constant.

• Define r^U_t and r^B_t as follows:
  – If z_t = 1, set r^U_t = r_t and generate r^B_t ∼ N_q(0, (Φ^B_t)^{−1}).
  – If z_t = 0, generate r^U_t ∼ N_q(0, (Φ^U_t)^{−1}) and set r^B_t = r_t.

• Sample α from Beta(a_1, b_1), where

      a_1 = a_0 + Σ_{t=1}^{T} z_t,    b_1 = b_0 + Σ_{t=1}^{T} (1 − z_t).

• Sample {Φ^U_{1:T}} and {Φ^B_{1:T}} using the forward-filtering equations and the backward sampler described in the main text.

Foreign exchange rates: additional results and figures
Figure 4 shows the original series of returns. Figure 5 shows the contours of the marginal likelihood, along with its maximizer, which has n = 5. As a robustness check on the mixture-based comparison, we repeat the analysis with the prior α ∼ Beta(10, 1), which favors the UE model a priori; the results, shown in Figure 6, are similar to those obtained with α ∼ Beta(1, 1): P(α < 0.5 | D_T) = 0.412 and E(α | D_T) = 0.505.
Figure 2: Top: posterior log-likelihoods under the Uhlig extended (UE; solid red) and beta-Bartlett (BB; dashed blue) models. Bottom: prior on the mixture weight α (dashed blue) and its posterior (red histogram); the vertical line is at 0.5.
Figure 3: Lengths of 95% posterior predictive intervals and cumulative empirical coverage rates.
Figure 4: Time series of daily returns of EUR, GBP, and CAD in US dollars.

Figure 5: Contour plot of the marginal likelihood as a function of (n, λ). The red circle indicates the maximizer, which has n = 5.

Figure 6: Histogram and sample path of α under the prior α ∼ Beta(10, 1); P(α < 0.5 | data) = 0.412 and E(α | data) = 0.505.