A mixture autoregressive model based on Gaussian and Student's t-distributions
Savi Virolainen
University of Helsinki
Abstract
We introduce a new mixture autoregressive model which combines Gaussian and Student's t mixture components. The model has very attractive properties analogous to the Gaussian and Student's t mixture autoregressive models, but it is more flexible as it enables modelling of series which consist of both conditionally homoscedastic Gaussian regimes and conditionally heteroscedastic Student's t regimes. The usefulness of our model is demonstrated in an empirical application to the monthly U.S. interest rate spread between the 3-month Treasury bill rate and the effective federal funds rate.

Keywords: nonlinear autoregression, mixture model, regime switching, interest rate spread
This work was supported by the Academy of Finland under Grant 308628. Contact address: Savi Virolainen, Faculty of Social Sciences, University of Helsinki, P. O. Box 17, FI-00014 University of Helsinki, Finland; e-mail: savi.virolainen@helsinki.fi. The author has no conflict of interest to declare.

1 Introduction
Recently, Kalliovirta et al. (2015) introduced a mixture autoregressive model based on the Gaussian distribution with very attractive features. The Gaussian mixture autoregressive (GMAR) model has linear Gaussian autoregressions as its component models and mixing weights that, for a $p$th order model, depend on the full distribution of the $p$ past observations. The specific formulation of the mixing weights leads to ergodicity and full knowledge of the stationary distribution of $p + 1$ consecutive observations. Moreover, it allows regime switches to depend on the level, variability, and temporal dependence of the past observations.

Meitz et al. (2018a) proposed a mixture autoregressive model closely related to the GMAR model but based on Student's t-distribution. The Student's t mixture autoregressive (StMAR) model has linear Student's t autoregressions as its component models and mixing weights constructed analogously to the GMAR model, leading to similar theoretical and practical properties. The linear Student's t autoregressions have the same form for the conditional mean as the Gaussian autoregressions (a linear function of the past observations) but a different conditional variance. In particular, the conditional variances of the Student's t autoregressions depend on quadratic forms of the past observations, whereas in the Gaussian case the conditional variances of the component models are constants. Utilization of the t-distribution hence allows the StMAR model not only to account for larger kurtosis than the GMAR model but also for stronger forms of conditional heteroskedasticity.

In this paper, we propose a generalization of the GMAR and StMAR models. The G-StMAR model accommodates both Gaussian autoregressions and Student's t autoregressions as its component models, and its mixing weights are constructed analogously to the GMAR and StMAR models, leading to similar attractive features.
It thus enables modelling of series which consist of regimes with time-varying conditional variance and excess kurtosis as well as regimes with constant conditional variance and zero excess kurtosis. It turns out that the G-StMAR model is a limiting case of a StMAR model with the t-distributions of some regimes tending to normal distributions as the degrees of freedom parameters tend to infinity. As opposed to the limiting StMAR model, the advantage of the G-StMAR model is that it removes the redundant degrees of freedom parameters from the model and is free from numerical problems induced by weak identification of very large degrees of freedom parameters.

We demonstrate the usefulness of the G-StMAR model in an empirical application to the monthly U.S. interest rate spread between the 3-month Treasury bill (TB) rate and the effective federal funds (FF) rate. Our G-StMAR model identifies three regimes for the spread, with a GMAR type regime mainly appearing after the financial crisis in 2008, when the zero lower bound limits movements of the spread. The remaining regimes are of the StMAR type, one accommodating eras of low mean and high variability and the other eras of high mean and moderate variability. The former StMAR type regime often dominates when the market possibly anticipates decreases in the FF rate or has an increased preference for safety, whereas the latter one mostly prevails when the Fed is arguably not expected to significantly decrease the FF rate target. Our findings are consistent with Sarno and Thornton (2003), who found that the FF rate seems to adjust to the TB rate, supporting the hypothesis that the market anticipates movements of the FF rate, moving the TB rate, and hence the spread, in advance.

The rest of this article is organized as follows. Section 2 first introduces the component processes of the G-StMAR model and then proceeds to define the G-StMAR model and discusses its theoretical properties.
Section 3 discusses maximum likelihood (ML) estimation of the model parameters and establishes the asymptotic properties of the ML estimator. In particular, it is discussed how the accompanying R package uGMAR (Virolainen, 2020) estimates the model parameters in practice with a two-phase procedure. Section 4 describes a simple model selection procedure and discusses numerical consequences of very large degrees of freedom parameter estimates. Section 5 presents the empirical application to the interest rate spread and Section 6 concludes. Details of the estimation procedure employed by uGMAR, as well as proofs of the stated theorems, are given in an Appendix.

Throughout this paper, we use the following notation. We write $x = (x_1, ..., x_n)$ for the column vector $x$ where the components $x_i$ may be either scalars or (column) vectors. The notation $x \sim n_d(\mu, \Gamma)$ signifies that the random vector $x$ has a $d$-dimensional Gaussian distribution with mean $\mu$ and (positive definite) covariance matrix $\Gamma$. Similarly, $x \sim t_d(\mu, \Gamma, \nu)$ signifies that $x$ has a $d$-dimensional t-distribution with mean $\mu$, (positive definite) covariance matrix $\Gamma$, and degrees of freedom $\nu$ (assumed to satisfy $\nu > 2$). The density functions and some properties of the multivariate Gaussian and Student's t-distributions are given in an Appendix. The vectorization operator $vec$ stacks the columns of a matrix on top of each other, $\iota_d$ is the $d$-dimensional vector $(1, 0, ..., 0)$, $I_d$ signifies the identity matrix of dimension $d$, and $\otimes$ denotes the Kronecker product. Moreover, $\mathbf{1}_d$ and $\mathbf{0}_d$ denote $d$-dimensional vectors of ones and zeros, respectively.
We consider mixture autoregressive models in which each observation is generated by a mixture component that is randomly selected according to the probabilities given by the mixing weights. The mixture components are either (linear) conditionally homoscedastic Gaussian autoregressions as in the GMAR model (Kalliovirta et al., 2015) or conditionally heteroscedastic Student's t autoregressions as in the StMAR model (Meitz et al., 2018a). The mixing weights are functions of the past observations constructed in a way that, for a $p$th order model, leads to ergodicity and full knowledge of the stationary distribution of $p + 1$ consecutive observations. Moreover, as the mixing weights depend on the full distribution of the past $p$ observations, they allow regime switches to depend on the level, variability, and temporal dependence of the past observations. In this section, we first introduce the component processes of the G-StMAR model and then proceed to define the G-StMAR model and discuss its properties.

2.1 Linear Gaussian and Student's t autoregressions

To develop theory and notation, we first consider the component processes of the G-StMAR model. For a linear $p$th order Gaussian or Student's t autoregression $z_t$, we have

$z_t = \varphi_0 + \sum_{i=1}^{p} \varphi_i z_{t-i} + \sigma_t \varepsilon_t, \quad \varepsilon_t \sim IID(0, 1),$  (2.1)

where $\sigma_t > 0$, $\varphi_0 \in \mathbb{R}$, and the autoregressive (AR) parameter $\varphi = (\varphi_1, ..., \varphi_p)$ satisfies the stationarity condition $\varphi \in S_p$, where

$S_p = \{ (\varphi_1, ..., \varphi_p) \in \mathbb{R}^p : 1 - \sum_{i=1}^{p} \varphi_i z^i \neq 0 \text{ for } |z| \leq 1 \}.$  (2.2)

In the case of a Gaussian autoregression, the distribution of the error terms $\varepsilon_t$ is standard normal and $\sigma_t$ is a constant $\sigma$ for all $t$.
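The stationarity condition $\varphi \in S_p$ of (2.2) can be checked numerically by locating the roots of the AR polynomial. The following minimal sketch in Python (our own illustration, not part of the paper or the uGMAR package) tests whether all roots lie outside the unit circle:

```python
import numpy as np

def is_stationary(phi):
    """Check phi in S_p (eq. 2.2): the AR polynomial
    1 - phi_1 z - ... - phi_p z^p must have no roots with |z| <= 1."""
    # numpy.roots takes coefficients from the highest power downwards:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = [-c for c in reversed(phi)] + [1.0]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))
```

For example, an AR(1) with $\varphi_1 = 0.5$ lies in $S_1$, whereas $\varphi_1 = 1.2$ does not.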
Denoting $z_t = (z_t, ..., z_{t-p+1})$, $\mu = E[z_t]$, $\gamma_j = Cov(z_t, z_{t-j})$, and $\gamma_p = (\gamma_1, ..., \gamma_p)$ (with $\mu_p = \mu \mathbf{1}_p$ and $\mu_{p+1} = \mu \mathbf{1}_{p+1}$), it is well known that the stationary solution to (2.1) for the Gaussian autoregression satisfies

$z_t \sim n_p(\mu_p, \Gamma_p),$  (2.3)

$(z_t, z_{t-1}) \sim n_{p+1}(\mu_{p+1}, \Gamma_{p+1}),$  (2.4)

$z_t \mid z_{t-1} \sim n_1(\mu + \gamma_p' \Gamma_p^{-1}(z_{t-1} - \mu_p),\ \gamma_0 - \gamma_p' \Gamma_p^{-1} \gamma_p) = n_1(\varphi_0 + \varphi' z_{t-1}, \sigma^2),$  (2.5)

where $\mu = \varphi_0 / (1 - \varphi' \mathbf{1}_p)$, $\gamma_p = \Gamma_p \varphi$, and the covariance matrices $\Gamma_p$ and $\Gamma_{p+1}$ are Toeplitz matrices given as (see, e.g., Lütkepohl (2005), eq. (2.1.39))

$vec(\Gamma_p) = (I_{p^2} - (\Phi \otimes \Phi))^{-1} \iota_{p^2} \sigma^2, \quad \Phi = \begin{bmatrix} \varphi_1 \cdots \varphi_{p-1} & \varphi_p \\ I_{p-1} & \mathbf{0}_{p-1} \end{bmatrix}, \quad \Gamma_{p+1} = \begin{bmatrix} \gamma_0 & \gamma_p' \\ \gamma_p & \Gamma_p \end{bmatrix}.$  (2.6)

Using the same notation as in (2.3)-(2.5) for $z_{t-1}$, $\mu_p$, and $\Gamma_p$, the Student's t autoregressions utilized by Meitz et al. (2018a) (which have also appeared at least in Spanos (1994) and Heracleous and Spanos (2006)) are obtained by letting $\varepsilon_t \sim t_1(0, 1, \nu + p)$ with $\nu > 2$ in (2.1) and defining

$\sigma_t^2 = \frac{\nu - 2 + (z_{t-1} - \mu_p)' \Gamma_p^{-1} (z_{t-1} - \mu_p)}{\nu - 2 + p}\, \sigma^2.$  (2.7)

This definition (which requires the stationarity condition on the AR parameter) guarantees stationarity of the Student's t autoregressions. Distributional properties of such stationary Student's t autoregressions are similar to the Gaussian case; in particular (Meitz et al., 2018a, Theorem 1),

$z_t \sim t_p(\mu_p, \Gamma_p, \nu),$  (2.8)

$(z_t, z_{t-1}) \sim t_{p+1}(\mu_{p+1}, \Gamma_{p+1}, \nu),$  (2.9)

$z_t \mid z_{t-1} \sim t_1(\varphi_0 + \varphi' z_{t-1}, \sigma_t^2, \nu + p).$  (2.10)

The aforementioned properties of the component processes are essential in the following discussion and will be exploited implicitly. Gaussian component processes of the G-StMAR model are referred to as GMAR type and Student's t component processes as StMAR type, since they are identical to the component processes of the GMAR model (Kalliovirta et al., 2015) and the StMAR model (Meitz et al., 2018a), respectively.
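The Toeplitz covariance matrix $\Gamma_p$ in (2.6) can be computed directly from the vec formula. The sketch below (Python with NumPy; the function name is our own, not from uGMAR) builds the companion matrix $\Phi$ and solves the linear system:

```python
import numpy as np

def stationary_cov(phi, sigma2):
    """Gamma_p from vec(Gamma_p) = (I_{p^2} - Phi (x) Phi)^{-1} iota_{p^2} sigma^2,
    where Phi is the companion matrix of eq. (2.6)."""
    p = len(phi)
    Phi = np.zeros((p, p))
    Phi[0, :] = phi                      # first row: phi_1, ..., phi_p
    if p > 1:
        Phi[1:, :-1] = np.eye(p - 1)     # subdiagonal identity block
    iota = np.zeros(p * p)
    iota[0] = 1.0                        # iota_{p^2} = (1, 0, ..., 0)
    vec_gamma = np.linalg.solve(np.eye(p * p) - np.kron(Phi, Phi), iota) * sigma2
    return vec_gamma.reshape((p, p), order="F")  # invert the column-stacking vec
```

For an AR(1) this reproduces the textbook variance $\sigma^2 / (1 - \varphi_1^2)$, and for higher orders it agrees with the Yule-Walker equations.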
2.2 The Gaussian and Student's t mixture autoregressive model

Let $y_t$ ($t = 1, 2, ...$) be the real-valued time series of interest, and let $\mathcal{F}_{t-1}$ denote the $\sigma$-algebra generated by the random variables $\{y_{t-j}, j > 0\}$. For a G-StMAR model with $M$ mixture components and autoregressive order $p$, we have

$y_t = \sum_{m=1}^{M} s_{m,t} (\mu_{m,t} + \sigma_{m,t} \varepsilon_{m,t}), \quad \varepsilon_{m,t} \sim IID(0, 1),$  (2.11)

$\mu_{m,t} = \varphi_{m,0} + \sum_{i=1}^{p} \varphi_{m,i} y_{t-i}, \quad m = 1, ..., M,$  (2.12)

where $\sigma_{m,t}^2 > 0$ are $\mathcal{F}_{t-1}$-measurable, $\varepsilon_{m,t}$ are independent of $\mathcal{F}_{t-1}$, $\varphi_{m,0} \in \mathbb{R}$, $\varphi_m \in S_p$ (the set $S_p$ is defined in (2.2)), and $s_{1,t}, ..., s_{M,t}$ are unobservable regime variables such that for each $t$, exactly one of them takes the value one and the others take the value zero. Given the past of $y_t$, $s_{m,t}$ and $\varepsilon_{m,t}$ are assumed to be conditionally independent, and the conditional probability for regime $m$ occurring at time $t$ is expressed in terms of the mixing weights $\alpha_{m,t} \equiv Pr(s_{m,t} = 1 \mid \mathcal{F}_{t-1})$ that satisfy $\sum_{m=1}^{M} \alpha_{m,t} = 1$ (for all $t = 1, 2, ...$). Each observation is thus generated by a linear autoregression corresponding to some (unobserved) mixture component $m$ which is selected randomly according to the probabilities determined by the mixing weights.

The first $M_1$ mixture components are (linear) Gaussian autoregressions and the remaining $M_2 \equiv M - M_1$ are Student's t autoregressions. Regarding equation (2.11), this means that for $m = 1, ..., M_1$, the terms $\varepsilon_{m,t}$ have standard normal distributions and the variances $\sigma_{m,t}^2$ are constants $\sigma_m^2$. For $m = M_1 + 1, ..., M$, the terms $\varepsilon_{m,t}$ follow the t-distribution $t_1(0, 1, \nu_m + p)$ and the variances $\sigma_{m,t}^2$ are as in equation (2.7), except that $z_{t-1}$ is replaced with $y_{t-1} = (y_{t-1}, ..., y_{t-p})$ and the regime-specific parameters $\varphi_{m,0}$, $\varphi_m$, $\sigma_m^2$, $\nu_m$ are used to define $\mu_p$ and $\Gamma_p$ therein.
The component-specific conditional means $\mu_{m,t}$ are defined by equation (2.12) for all the components. Based on the above specifications, the conditional density function of a G-StMAR model with autoregressive order $p$ is given as

$f(y_t \mid \mathcal{F}_{t-1}) = \sum_{m=1}^{M_1} \alpha_{m,t}\, n_1(y_t; \mu_{m,t}, \sigma_m^2) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p),$  (2.13)

where the conditional densities $n_1(y_t; \mu_{m,t}, \sigma_m^2)$ and $t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p)$ are obtained from the properties of the component processes (using the regime-specific parameters). The form of the Student's t density function in (2.13) is given in the online Appendix. The G-StMAR model adds to the class of mixture models introduced by Le et al. (1996) and further developed by Wong and Li (2000, 2001a,b), Glasbey (2001), Lanne and Saikkonen (2003), and Wong et al. (2009), to name a few.

In order to specify the mixing weights $\alpha_{m,t}$ in (2.13), we first define the following function for notational convenience. Let

$d_m(y; \mu_{m,p}, \Gamma_m, \nu_m) = \begin{cases} n_p(y; \mu_{m,p}, \Gamma_m), & \text{when } m \leq M_1, \\ t_p(y; \mu_{m,p}, \Gamma_m, \nu_m), & \text{when } m > M_1, \end{cases}$  (2.14)

where the $p$-dimensional densities $n_p(y; \mu_{m,p}, \Gamma_m)$ and $t_p(y; \mu_{m,p}, \Gamma_m, \nu_m)$ correspond to the stationary distribution of the $m$th component process (given in equations (2.3) and (2.8)). Denoting $y_{t-1} = (y_{t-1}, ..., y_{t-p})$, the mixing weights of the G-StMAR model are defined as

$\alpha_{m,t} = \frac{\alpha_m d_m(y_{t-1}; \mu_{m,p}, \Gamma_m, \nu_m)}{\sum_{n=1}^{M} \alpha_n d_n(y_{t-1}; \mu_{n,p}, \Gamma_n, \nu_n)},$  (2.15)

where the parameters $\alpha_1, ..., \alpha_M$ satisfy $\sum_{m=1}^{M} \alpha_m = 1$. The mixing weights are thus weighted ratios of the stationary densities of the component processes corresponding to the $p$ previous observations. This specific definition of the mixing weights is appealing, as it states that an observation is more likely to be generated from a regime with a higher relative weighted likelihood.
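For illustration, the mixing weights (2.15) can be evaluated as follows in the first-order case $p = 1$, where the stationary component densities are univariate. This is a sketch only (plain Python; the helper names are ours), assuming the covariance-matrix parameterization of the t-density stated in the paper's appendix:

```python
import math

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def t_pdf(y, mu, var, nu):
    # Student's t parameterized by its mean and variance (not scale), nu > 2
    c = math.gamma((nu + 1) / 2) / (
        math.gamma(nu / 2) * math.sqrt(math.pi * (nu - 2) * var))
    return c * (1 + (y - mu) ** 2 / ((nu - 2) * var)) ** (-(nu + 1) / 2)

def mixing_weights(y_prev, alphas, means, variances, dofs, M1):
    """Eq. (2.15) with p = 1: alpha_{m,t} is proportional to alpha_m times the
    stationary density of regime m evaluated at the previous observation.
    Regimes 0..M1-1 are GMAR type; dofs[m] is used only for m >= M1."""
    dens = [norm_pdf(y_prev, means[m], variances[m]) if m < M1
            else t_pdf(y_prev, means[m], variances[m], dofs[m])
            for m in range(len(alphas))]
    total = sum(a * d for a, d in zip(alphas, dens))
    return [a * d / total for a, d in zip(alphas, dens)]
```

An observation close to the stationary mean of a regime (relative to its variance) raises that regime's weight for the next period.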
Moreover, it allows the probabilities of each regime occurring to depend on the level, variability, and temporal dependence of the past observations. This is not only convenient for forecasting, but it also allows the researcher to associate specific characteristics with different regimes. It turns out that this formulation of the mixing weights also leads to attractive theoretical properties such as a fully known stationary distribution of the realizations $(y_t, ..., y_{t-h})$, $h = 0, 1, ..., p$, and ergodicity of the process. These theoretical properties are formally stated in Theorem 1 below.

Before stating the theorem, a few notational conventions are provided. We collect the parameters of the G-StMAR model into an $(M(p+3) + M_2 - 1) \times 1$ vector $\theta \equiv (\vartheta_1, ..., \vartheta_M, \alpha_1, ..., \alpha_{M-1}, \nu)$, where $\vartheta_m = (\varphi_{m,0}, \varphi_m, \sigma_m^2)$, $\varphi_m = (\varphi_{m,1}, ..., \varphi_{m,p})$, $m = 1, ..., M$, and $\nu = (\nu_{M_1+1}, ..., \nu_M)$. The parameter $\alpha_M$ is omitted because it is obtained from the restriction $\sum_{m=1}^{M} \alpha_m = 1$. The parameter space of the G-StMAR model is

$\Theta = \{ \theta \in \mathbb{R}^{M(2+p)} \times (0, 1)^{M-1} \times (2, \infty)^{M_2} : \varphi_m \in S_p,\ \sigma_m^2 > 0,\ \text{for all } m = 1, ..., M \},$  (2.16)

where the restriction $\nu_m > 2$ ($m = M_1 + 1, ..., M$) is made to ensure the existence of finite second moments, and the set $S_p$ is as in (2.2). A G-StMAR model with autoregressive order $p$, $M_1$ GMAR type regimes, and $M_2$ StMAR type regimes is referred to as the G-StMAR($p, M_1, M_2$) model whenever clarity of the presentation requires.

Theorem 1
Consider the G-StMAR process $y_t$ generated by (2.13) and (2.15) with $\theta \in \Theta$. Then $\mathbf{y}_t = (y_t, ..., y_{t-p+1})$ ($t = 1, 2, ...$) is a Markov chain on $\mathbb{R}^p$ with a stationary distribution characterized by the density

$f(y; \theta) = \sum_{m=1}^{M_1} \alpha_m n_p(y; \mu_{m,p}, \Gamma_m) + \sum_{m=M_1+1}^{M} \alpha_m t_p(y; \mu_{m,p}, \Gamma_m, \nu_m).$  (2.17)

Moreover, $\mathbf{y}_t$ is ergodic.

The stationary distribution of $\mathbf{y}_t$ is a mixture of $p$-dimensional normal and t-distributions with constant mixing weights $\alpha_m$. By the well-known properties of the normal and t-distributions, all of its moments lower than $\min\{\nu_{M_1+1}, ..., \nu_M\}$ exist and are finite. Moreover, as shown in the proof of Theorem 1, for any $h = 0, 1, ..., p$, the marginal stationary distribution of the vector $(y_t, ..., y_{t-h})$ is also a mixture of normal and t-distributions. This gives the parameters $\alpha_m$ an interpretation as the unconditional probabilities of the observation $y_t$ being generated from the $m$th component process. Similarly to the GMAR and StMAR processes, the mean, variance, and first $p$ autocovariances of $y_t$ are thus

$E[y_t] \equiv \mu_y = \sum_{m=1}^{M} \alpha_m \mu_m, \quad \gamma_j \equiv \sum_{m=1}^{M} \alpha_m \gamma_{m,j} + \sum_{m=1}^{M} \alpha_m (\mu_m - \mu_y)^2, \quad j = 0, 1, ..., p,$  (2.18)

where $\gamma_{m,j}$ is the $j$th autocovariance of the $m$th component process.

The conditional mean and variance of the G-StMAR process are obtained from the definition of the model as $E[y_t \mid \mathcal{F}_{t-1}] = \sum_{m=1}^{M} \alpha_{m,t} \mu_{m,t}$ and

$Var(y_t \mid \mathcal{F}_{t-1}) = \sum_{m=1}^{M_1} \alpha_{m,t} \sigma_m^2 + \sum_{m=M_1+1}^{M} \alpha_{m,t} \sigma_{m,t}^2 + \sum_{m=1}^{M} \alpha_{m,t} \Big( \mu_{m,t} - \sum_{n=1}^{M} \alpha_{n,t} \mu_{n,t} \Big)^2.$  (2.19)

The conditional mean shares a common form with the GMAR and StMAR models but differs from them in the definition of the mixing weights.
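The unconditional moments in (2.18) above are straightforward mixture computations; as a sketch (plain Python, function name ours):

```python
def unconditional_moments(alphas, mu, gamma0):
    """Stationary mean and variance of y_t from eq. (2.18): weighted averages
    of the regimes' stationary means/variances plus a between-regime term."""
    mu_y = sum(a * m for a, m in zip(alphas, mu))
    var_y = (sum(a * g for a, g in zip(alphas, gamma0))
             + sum(a * (m - mu_y) ** 2 for a, m in zip(alphas, mu)))
    return mu_y, var_y
```

For example, two equally weighted regimes with stationary means 0 and 2 and unit stationary variances give mean 1 and variance 2: the spread of the regime means contributes an extra unit of variance.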
The conditional variance consists of three components: the first is related to the conditional variances of the GMAR type components and the second to those of the StMAR type components, whereas the third term captures the heteroskedasticity caused by variation in the conditional mean.

Notice that the GMAR model (Kalliovirta et al., 2015) can be obtained as a special case of the G-StMAR model by setting $M_1 = M$ and $M_2 = 0$, and similarly the StMAR model (Meitz et al., 2018a) is obtained by setting $M_1 = 0$ and $M_2 = M$. We simply need to drop the corresponding terms from the formulas, and all the definitions and results stated in this and the next section then also hold for the GMAR and StMAR models individually. However, some theory developed for the GMAR model, such as geometric ergodicity (Kalliovirta et al., 2015, Theorem A.1), has not been established for the StMAR and G-StMAR models. The GMAR model also requires fewer (currently) unverified assumptions than the StMAR and G-StMAR models for concluding asymptotic normality of the maximum likelihood estimator (see Kalliovirta et al., 2015, Section 2, Meitz et al., 2018a, Theorem 3, and Theorem 2 of this paper).

3 Estimation

Parameters of the G-StMAR model can be estimated with the method of maximum likelihood (ML). Because the stationary distribution of the process is known, the exact log-likelihood function can be used. Suppose the observed time series is $y_{-p+1}, ..., y_0, y_1, ..., y_T$ and that the initial values are stationary. Then the log-likelihood function of the G-StMAR model takes the form

$L(\theta) = \log \Big( \sum_{m=1}^{M_1} \alpha_m n_p(\mathbf{y}_0; \mu_{m,p}, \Gamma_m) + \sum_{m=M_1+1}^{M} \alpha_m t_p(\mathbf{y}_0; \mu_{m,p}, \Gamma_m, \nu_m) \Big) + \sum_{t=1}^{T} l_t(\theta),$  (3.1)

where

$l_t(\theta) = \log \Big( \sum_{m=1}^{M_1} \alpha_{m,t}\, n_1(y_t; \mu_{m,t}, \sigma_m^2) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p) \Big),$  (3.2)

and the density functions $n_d(\cdot; \cdot)$ and $t_d(\cdot; \cdot)$ follow the notation described in Section 2.2.
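To make the structure of (3.2) concrete, the following toy evaluates the scaled conditional log-likelihood in the GMAR special case ($M_2 = 0$, $p = 1$), where the regimes' stationary moments have closed forms. This is an illustrative sketch in Python, not the uGMAR implementation:

```python
import math

def gmar_cond_loglik(y, alphas, phi0, phi1, sigma2):
    """T^{-1} sum_t l_t(theta) of eq. (3.2) for all-Gaussian regimes, p = 1."""
    def npdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    ll = 0.0
    for t in range(1, len(y)):
        comps = []
        for a, p0, p1, s2 in zip(alphas, phi0, phi1, sigma2):
            mu_m = p0 / (1 - p1)          # stationary mean of regime m
            g0_m = s2 / (1 - p1 ** 2)     # stationary variance of regime m
            # (alpha_m, stationary density at y_{t-1}, cond. mean, cond. variance)
            comps.append((a, npdf(y[t - 1], mu_m, g0_m), p0 + p1 * y[t - 1], s2))
        tot = sum(a * d for a, d, _, _ in comps)   # normalizer of eq. (2.15)
        ll += math.log(sum(a * d / tot * npdf(y[t], mu_t, s2)
                           for a, d, mu_t, s2 in comps))
    return ll / (len(y) - 1)
```

With a single regime the mixing weight is identically one and this collapses to the ordinary Gaussian AR(1) conditional log-likelihood.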
If stationarity of the initial values seems unreasonable, one can condition on the initial values by dropping the first term on the right-hand side of (3.1) and base the estimation on the resulting conditional log-likelihood function.

In what follows, we assume estimation based on the conditional log-likelihood function $L_T^{(c)}(\theta) = T^{-1} \sum_{t=1}^{T} l_t(\theta)$, i.e., that the ML estimator $\hat{\theta}_T$ maximizes $L_T^{(c)}(\theta)$. We have scaled the conditional log-likelihood function with the sample size $T$ so that the notation is consistent with the referenced literature.

To investigate the asymptotic properties of the ML estimator $\hat{\theta}_T$, the parameter space $\Theta$ given in (2.16) needs to be restricted in a way that guarantees identification of the parameters. This amounts to requiring that the components of the G-StMAR model cannot be "relabelled" so that one ends up with the same model but a different parameter vector; that is,

$\alpha_1 > \cdots > \alpha_{M_1} > 0$, $\alpha_{M_1+1} > \cdots > \alpha_M > 0$, and $\vartheta_i = \vartheta_j$ only if one of the conditions (1) $1 \leq i = j \leq M$, (2) $i \leq M_1 < j$, (3) $i, j > M_1$ and $\nu_i \neq \nu_j$ is satisfied.  (3.3)

The restrictions required to establish the asymptotic properties of the ML estimator are summarized in the following assumption.

Assumption 1
The true parameter value $\theta_0$ is an interior point of $\bar{\Theta}$, which is a compact subset of $\{\theta \in \Theta : (3.3) \text{ holds}\}$.

Asymptotic properties of the ML estimator under conventional high-level conditions are stated in the following theorem (which is similar to Theorem 3 in Meitz et al. (2018a) on the ML estimator of the StMAR model). Denote $\mathcal{I}(\theta) = E\big[\frac{\partial l_t(\theta)}{\partial \theta} \frac{\partial l_t(\theta)}{\partial \theta'}\big]$ and $\mathcal{J}(\theta) = E\big[\frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'}\big]$.

Theorem 2

Suppose that $y_t$ are generated by the stationary and ergodic G-StMAR process of Theorem 1 and that Assumption 1 holds. Then $\hat{\theta}_T$ is strongly consistent, i.e., $\hat{\theta}_T \to \theta_0$ almost surely. Suppose further that (i) $T^{1/2} \frac{\partial}{\partial \theta} L_T^{(c)}(\theta_0) \xrightarrow{d} N(0, \mathcal{I}(\theta_0))$ with $\mathcal{I}(\theta_0)$ finite and positive definite, (ii) $\mathcal{J}(\theta_0) = -\mathcal{I}(\theta_0)$, and (iii) $E\big[\sup_{\theta \in \bar{\Theta}_0} \big| \frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'} \big|\big] < \infty$ for some $\bar{\Theta}_0$, a compact convex set contained in the interior of $\bar{\Theta}$ that has $\theta_0$ as an interior point. Then $T^{1/2} (\hat{\theta}_T - \theta_0) \xrightarrow{d} N(0, -\mathcal{J}(\theta_0)^{-1})$.

If one is willing to assume the validity of conditions (i)-(iii) of Theorem 2, the ML estimator $\hat{\theta}_T$ has the conventional limiting distribution, implying that approximate standard errors for the estimates are obtained as usual. Moreover, standard likelihood-based tests are applicable as long as the orders $M_1$ and $M_2$ are correctly specified. If $M_1$ or $M_2$ is chosen too large, some of the parameters are not identified, causing the result of Theorem 2 to break down. This particularly happens when one tests for the number of regimes, as the null hypothesis would imply that some regime is removed from the model (see the related discussion in Kalliovirta et al., 2015, Section 3.3.2). Similar caution also applies to testing whether a regime is of the GMAR type against the alternative that it is of the StMAR type, as under the null hypothesis $\nu_m = \infty$ for the StMAR type regime $m$ being tested, violating Assumption 1.
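The approximate standard errors obtained "as usual" are the square roots of the diagonal of the inverse observed information matrix. A sketch of this computation with a central-difference Hessian (Python/NumPy; a toy of our own, not the uGMAR routine):

```python
import numpy as np

def approx_standard_errors(neg_loglik, theta_hat, h=1e-4):
    """SE = sqrt(diag(H^{-1})), with H the central-difference Hessian of the
    negative (unscaled) log-likelihood evaluated at the ML estimate."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    k = theta_hat.size
    E = np.eye(k) * h
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = (neg_loglik(theta_hat + E[i] + E[j])
                       - neg_loglik(theta_hat + E[i] - E[j])
                       - neg_loglik(theta_hat - E[i] + E[j])
                       + neg_loglik(theta_hat - E[i] - E[j])) / (4 * h ** 2)
    return np.sqrt(np.diag(np.linalg.inv(H)))
```

For a quadratic negative log-likelihood $50(\theta - 1)^2$ the Hessian is 100, giving a standard error of 0.1; if the Hessian is nearly singular (as with weakly identified degrees of freedom parameters), the inversion fails or produces meaningless values.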
Numerical consequences of the weak identification of very large degrees of freedom parameters are briefly discussed in Section 4.

3.1 Two-phase estimation

Finding the ML estimates amounts to maximizing the log-likelihood function (3.1) over the high-dimensional parameter space (2.16) subject to several constraints. Due to the complexity of the log-likelihood function, finding an analytical solution is infeasible, so numerical optimization methods are required. The EM algorithm (Redner and Walker, 1984) has been a popular choice for estimating mixture models (e.g., Wong and Li, 2000, 2001a,b, and Wong et al., 2009), as it is suitable for problems where not all the data relevant to estimation are observed (for mixture models, that is the origin of each observation; in our case, the random variables $s_{1,t}, ..., s_{M,t}$ in (2.11)). [Footnote: Meitz and Saikkonen (2017) have, however, recently developed such tests for mixture models with Gaussian conditional densities.] uGMAR instead employs a two-phase procedure in which a genetic algorithm is first used to explore the parameter space. Brief descriptions of the employed genetic algorithm and its modifications are given in an Appendix. After running the genetic algorithm, the estimation is finalized with a variable metric algorithm (Nash, 1990, algorithm 21, implemented by R Core Team, 2020) using a central difference approximation for the gradient of the log-likelihood function. Because of the presence of multiple local maxima, a (sometimes large) number of estimation rounds should be performed to obtain reliable results, for which uGMAR makes use of parallel computing to shorten the estimation time. In addition to the G-StMAR model, uGMAR also accommodates the GMAR and StMAR models.

4 Building a G-StMAR Model
In empirical applications, building a G-StMAR model amounts to finding a suitable autoregressive order $p$, the number of GMAR type regimes $M_1$, and the number of StMAR type regimes $M_2$. Different strategies for choosing the number of each type of regime may be considered depending on the application. We propose a simple model selection procedure which takes advantage of the observation that the G-StMAR model is a limiting case of the StMAR model.

It is easy to see that the linear Gaussian autoregression defined in Section 2.1 is obtained as a limiting case of the Student's t autoregression with the degrees of freedom parameter tending to infinity. As the mixing weights (2.15) are weighted ratios of the component process densities, it then follows that the G-StMAR($p, M_1, M_2$) model is obtained as a limiting case of a StMAR($p, M$) model with the parameters $\nu_1, ..., \nu_{M_1}$ tending to infinity. Consequently, if a StMAR($p, M$) model is fitted to data generated by a G-StMAR($p, M_1, M - M_1$) process, then asymptotically, $M_1$ regimes of the fitted StMAR model are expected to get large degrees of freedom estimates. We therefore suggest building a G-StMAR model by first finding a suitable StMAR model, and then estimating the appropriate G-StMAR model if the fitted StMAR model contains large degrees of freedom estimates. A StMAR model can be specified, for example, by using information criteria together with quantile residual diagnostics (see, e.g., Kalliovirta, 2012).

Overly large degrees of freedom estimates in a StMAR model are redundant, but their weak identification also causes several inconveniences in numerical analysis of the model. They lead to a nearly numerically singular Hessian matrix of the log-likelihood function when evaluated at the estimate, often making the approximate standard errors unavailable. Weakly identified degrees of freedom parameters also cause inconvenience in quantile residual based model diagnostics.
In particular, the quantile residual tests proposed by Kalliovirta (2012) require a positive definite approximation of the Hessian matrix (evaluated at the ML estimate). The tests are thus not applicable to StMAR models with too large degrees of freedom estimates, whereas they are for the corresponding G-StMAR models. Applicability of Kalliovirta's (2012) tests, which take into account [Footnote: The definition of the StMAR($p, M$) model is technically the same as that of the G-StMAR($p, 0, M$) model.]

5 Empirical application

We consider the monthly U.S. interest rate spread between the 3-month Treasury bill (TB) secondary market rate and the effective federal funds (FF) rate, covering the period from 1954VII to 2019VII (781 observations). The series is plotted in Figure 1 (top left) along with the 3-month TB and FF rates, with the shaded areas indicating the periods of (NBER based) U.S. recessions. All the data were taken from the Federal Reserve Bank of St. Louis database.

Treasury bills are short-term pure discount bonds which are backed by the U.S. government and are therefore generally considered to be almost free from default risk. The effective federal funds rate is the averaged rate at which depository institutions loan federal funds to each other overnight. The overnight FF lending agreements are among the most liquid financial assets, but unlike TBs, they are subject to a notable default risk. The relationship between the TB and FF rates has been studied, among others, by Simon (1990) and Sarno and Thornton (2003), while Kishor and Marfatia (2013) examine the relationship between the TB and FF futures rates.

According to term structure theory, a long-term interest rate should reflect the current and expected future short-term rates, as well as perceptions of risk and liquidity in the form of a (possibly time-varying) premium. Simon (1990) studied the predictive power of the weekly spread between the 3-month TB and FF rates on the future levels of the FF rate in 1972-1987.
He argued that the current and expected future FF rates affect the spread between the TB and FF rates through the repurchase agreement (repo) market, because repos are closely linked to the FF rate, and corporations with funds to invest can buy TBs as an alternative to investing in consecutive overnight repos. [Footnote: In a repo, the borrower sells a security to the lender and agrees to repurchase it in the future (often on the next day). Effectively, repos function similarly to collateralized loans. See Baklanova et al. (2015) for an overview of the U.S. repo market.] He also considered the spread between the Eurodollar and TB rates as a risk premium for bank safety. He found that the spread between the 3-month TB and FF rates had significant predictive power on future levels of the FF rate in the volatile nonborrowed reserves operating period (late 1979 - late 1982) but less or none in the other subperiods.

Sarno and Thornton (2003) identified an error correction model (ECM) between the daily 3-month TB and effective FF rates (covering the period from 1974 to 1999) and showed that their ECM, which allows for asymmetries and nonlinearities, outperforms the alternative of a linear ECM. One of their main findings was that the FF rate (which is controlled by the Fed) seems to adjust to the TB rate and not vice versa, supporting the hypothesis that the market anticipates changes in the FF rate, moving the TB rate in advance. Moreover, it appears that the adjustment speed depends on the sign and size of the deviation from the long-run equilibrium. Sarno and Thornton (2003) argued that although there have been a number of procedural changes affecting the predictability of the FF rate, their results indicate that the changes have been statistically unimportant. Furthermore, their robustness checks indicate that their findings on the adjustments from disequilibria also hold for monthly data.
Variations and asymmetries in the adjustment speed, on the other hand, indicate that the dynamics of the spread between the TB and FF rates might fluctuate along with the level of the spread. This suggests that a mixture model which is able to encapsulate such behaviour, such as the G-StMAR model, could be an appropriate choice. Kishor and Marfatia (2013) argued that the results in Sarno and Thornton (2003) are not very surprising, since the effective FF rate always tends to revert back to the FF target rate and does not incorporate the market's expectations of changes in the future FF rate. To get around this, they studied the relationship between the 3-month TB rate and the 1-month FF futures rate, which does incorporate information about the market's anticipations of the future FF rate. They fitted a linear ECM to a daily series from 1989 to 2008, and found that the TB rate and the FF futures rate both seem to move to correct a short-run disequilibrium.

[Footnote: A Eurodollar time deposit is a U.S. dollar-denominated deposit at a bank outside the U.S. with a fixed maturity.]

[Figure 1: ... G-StMAR( , , ) model fitted to the interest rate spread series. The shaded areas indicate the periods of (NBER based) U.S. recessions. On right, a Gaussian kernel density estimate of the interest rate spread (black solid line), the mixture density implied by the fitted G-StMAR( , , ) model (grey dashed line), and the regime densities (blue, green, and red dotted lines).]

Interestingly, the spread between the 3-month TB rate and the effective FF rate is, most of the time covered in our sample period, negative. Sarno and Thornton (2003) made a similar observation for their daily series and suggested that only a small fraction of the negative difference could be attributed to the low default risk of TBs, and that a more plausible explanation is that the interest on TBs is exempt from some local and state taxes.
As the smaller taxes have a larger effect on the paid net interest (relative to the interest paid on federal funds) when the interest rates are higher, some movements of the spread could be partially caused by the differences in taxation.

5.1 Estimation and model selection

We employ the method of maximum likelihood based on the exact log-likelihood function for estimating the parameters of the considered models. Adequacy of the estimated models is examined using quantile residual diagnostics in the framework presented in Kalliovirta (2012). The quantile residuals of a correctly specified G-StMAR model are asymptotically independent with standard normal distributions (Kalliovirta, 2012, Lemma 2.1), so they can be used for graphical analysis in a similar fashion to conventional Pearson's residuals. In addition to graphical analysis of the quantile residuals, we perform Kalliovirta's (2012) asymptotic tests (which take into account the uncertainty caused by the estimation of the parameters) for testing normality, autocorrelation, and conditional heteroskedasticity of the quantile residuals. The estimation, quantile residual diagnostics, and other numerical analysis of the models are conducted using the R package uGMAR (Virolainen, 2020), which is available through the CRAN repository. uGMAR estimates the model parameters using the two-phase procedure described in Section 3.1.

Following the model selection procedure described in Section 4, we started by finding a suitable StMAR model. First, we estimated the StMAR($p, M$) models with one mixture component, $M = 1$, and autoregressive orders $p = 1, ...,$ and found that the order $p = 6$ yields the largest likelihood.
Adequacy of the StMAR( , ) model was clearly rejected by the quantile residual tests (see Table 2), so we estimated the StMAR(p, M) models with orders p = 1, . . . , and M = 2, . The order (p, M) = (5, ) minimized the Schwarz-Bayesian information criterion (BIC) and the Hannan-Quinn information criterion (HQIC), whereas the Akaike information criterion (AIC) was minimized by the order (p, M) = (5, ). Inappropriate estimates extremely near the border of the stationarity region were discarded, as they are not solutions of interest (but maximize the likelihood for a rather technical reason); in such cases the next-best local maximum of the log-likelihood function was considered instead.

Footnote: There is also MATLAB code available for the StMAR model in the form of the StMAR MATLAB Toolbox by Meitz et al. (2018b).

In both the StMAR( , ) and the StMAR( , ) models, a very large degrees of freedom estimate was obtained for one regime (approximately and , respectively), so
we estimated the corresponding G-StMAR( , , ) and G-StMAR( , , ) models. Removing the weakly identified degrees of freedom parameters by switching to the G-StMAR models enabled us to compute approximate standard errors of the estimates and to calculate Kalliovirta's (2012) test statistics (see Section 4). The values of the information criteria are reported in Table 2, and the parameter estimates of the G-StMAR models are reported in Table 1, with the approximate standard errors of the estimates in brackets.

Estimates regarding the GMAR type regime are quite similar for the two G-StMAR models, and their standard errors are relatively large. This is because for both models the GMAR type regime mainly occurs in the period of near-zero interest rates after 2008, and there are hence only a few observations from that regime (regime 1 in Figure 1, bottom left, which displays the mixing weights of the G-StMAR( , , ) model; the mixing weights of the G-StMAR( , , ) model are not shown). The three zeros in the variance parameter estimates (and in their standard errors) signify that the estimates (and their standard errors) round to zero at three-digit accuracy, implying that the GMAR type regime exhibits very low variability (conditionally and unconditionally). The small mixing weight parameter estimates, interpreted as the unconditional probability of the GMAR type regime occurring, reflect the observation that eras of such low variability have been rare in the sample period. Also, a remarkably large standard error for the second regime's variance parameter sticks out for both models. Examination of the profile log-likelihood functions (not shown) does not, however, reveal anything notable.

Since the AR parameter estimates of the G-StMAR( , , ) model are somewhat similar in all regimes, we estimated a StMAR( , ) model with the AR parameters restricted to be the same in all regimes, allowing for changes in the level, variability, and kurtosis only.
The degrees of freedom estimate for one regime was very large (approximately ), so we estimated the corresponding restricted G-StMAR model, which we refer to as the G-StMAR( , , )r model. The parameter estimates of this model are also presented in Table 1 with the related statistics, and the values of ...

Footnote: More accurate values for the ML estimate of σ² and its standard error are ·10⁻ and ·10⁻ for the G-StMAR( , , ) model, ·10⁻ and ·10⁻ for the G-StMAR( , , ) model, and ·10⁻ and ·10⁻ for the G-StMAR( , , )r model, respectively.

... The sample autocorrelation at lag sticks out for each of the three models, however.

In order to further study adequacy of the models, we employed Kalliovirta's (2012) tests for normality, autocorrelation, and conditional heteroskedasticity of the quantile residuals, taking into account , , , and lags in the autocorrelation and heteroskedasticity tests. The p-values obtained from the tests are reported in Table 2. The normality test rejects for all three models at the level of significance, possibly because of the fat lower tails in the quantile residuals' distributions. More interestingly, despite the similarities in the graphical analysis, the autocorrelation tests unambiguously reject adequacy of the G-StMAR( , , ) model, whereas the p-values are reasonable for the G-StMAR( , , ) model, which also passes the heteroskedasticity tests. The p-values of the autocorrelation tests are rather small also for the restricted G-StMAR( , , )r model, which is preferred by the information criteria, showing some evidence of inadequacy. We therefore prefer the unrestricted G-StMAR( , , ) model, whose overall adequacy seems quite satisfactory. The fact that the restricted model has information criteria values superior to the unrestricted models, however, suggests that imposing the autocorrelation structure to be the same for all regimes would also be a reasonable modelling choice.
Footnote: For comparison, we also estimated the GMAR(p, M) models with orders p = 1, . . . , and M = 1, . . . , . The values of the information criteria were, however, found inferior to our G-StMAR models, with the GMAR( , ) model minimizing BIC (− ), and the GMAR( , ) model minimizing HQIC (− ) and AIC (− ).

[Table 1: Maximum likelihood estimates of the G-StMAR( , , ), the G-StMAR( , , ), and the restricted G-StMAR( , , )r model based on the exact log-likelihood function, with approximate standard errors of the estimates presented in brackets. The statistics µ_m and γ_{m,0}, m = 1, 2, 3, are the stationary mean and variance of each regime, respectively. Likewise, the statistics µ_y and γ_0 are the stationary mean and variance of the process. The maximized log-likelihoods of the models are presented in the bottom row of the table. The numerical entries of the table were lost in extraction.]

[Figure 2: Graphical quantile residual diagnostics for the models presented in Table 1. The top row is for the G-StMAR( , , ) model, the middle row for the G-StMAR( , , ) model, and the bottom row for the G-StMAR( , , )r model. The first column presents the time series, the second column the normal quantile plot, and the third column the autocorrelation function of the quantile residuals. The fourth column presents the autocorrelation function of the squared quantile residuals. The blue solid line in the quantile plots displays the theoretical quantiles, and the blue dashed lines in the autocorrelation function plots are the 95% bounds ±1.96/√T (T = 776, as the first p values are the initial values) for autocorrelations of an IID sequence, presented to give an approximate perception of the magnitude of the sample autocorrelations.]
[Table 2 body: for each of the StMAR(6,1), G-StMAR( , , ), G-StMAR( , , ), and restricted G-StMAR( , , )r models, the p-values of the normality test and of the autocorrelation and conditional heteroskedasticity tests (each at four different numbers of lags), together with the AIC, HQIC, and BIC values; the numerical entries were lost in extraction.]
Table 2: The p-values obtained from Kalliovirta's (2012) quantile residual tests, testing for normality, autocorrelation, and conditional heteroskedasticity of the quantile residuals. The p-values smaller than . are bolded. In order to improve the size properties of the tests, we employed the simulation procedure proposed by Kalliovirta (2012) using samples of length .

Discussion

Our model selection procedure led to the (unrestricted) G-StMAR( , , ) model, which identifies three statistical regimes for the spread between the 3-month TB secondary market rate and the effective FF rate. The mixing weights of the model are presented in Figure 1 (bottom left) along with the interest rate spread series (top left). The GMAR type regime (red) dominates the period of near-zero interest rates occurring after 2008, where the spread also stays close to zero and exhibits very low variability. The second regime (green) identifies periods of high variability and low mean, spanning through most of the recessions, whereas the third regime (blue) often occurs after the recessions, when the spread varies moderately around zero. These characteristics of the regimes are also highlighted in Figure 1 (right), where a kernel density estimate of the spread (black solid line) is presented with the model-implied density (grey dashed line) and the regime densities (red, green, and blue dotted lines; the regime densities are multiplied by the mixing weight parameter estimates α_m, m = 1, 2, 3). The model-implied density matches the skewed distribution of the observations fairly well, but the peakedness of the distribution seems somewhat exaggerated and the lower tail is not fat enough.

Based on our G-StMAR( , , ) model, the regime-specific unconditional mean of the spread varies from the −. percentage points of the first (GMAR type) and third regimes to the −. percentage points of the second regime, with each regime regularly occurring for several consecutive months.
As the second regime dominates during most of the recessions, and also often occurs before the recessions, when the interest rates are relatively high, it seems plausible that part of the larger negative mean is explained by expectations of a decrease in the near-future FF rate. The third regime, on the other hand, mostly occurs after the recessions, when the interest rates seem relatively low, possibly indicating that the larger mean of the regime could be related to the lack of expected decreases in the FF rate. These findings are consistent with Sarno and Thornton (2003), who found that the FF rate corrects disequilibria from the long-run relationship, supporting the hypothesis that ...

Footnote: By a regime occurring at a point of time we mean that, according to the estimated mixing weights, the process generated an observation from that regime with a probability close to one.
This article introduced a mixture autoregressive model which is a combination of the Gaussian mixture autoregressive (GMAR) model (Kalliovirta et al., 2015) and the Student's t mixture autoregressive (StMAR) model (Meitz et al., 2018a). This model, referred to as the G-StMAR model, has several attractive theoretical and practical properties that are analogous to those of the GMAR and StMAR models. In addition to discussing the properties, it was noted that estimating the parameters of the G-StMAR model can be challenging in practice. Following Dorsey and Mayer (1995) (and Meitz et al., 2018a,b), we suggested using a two-phase estimation procedure, where a genetic algorithm is used to find starting values for a gradient-based method, and accompanied the paper with the R package uGMAR (Virolainen, 2020), which implements the two-phase estimation procedure with a modified version of a genetic algorithm.

We stated that the G-StMAR model is a limiting case of a StMAR model with some degrees of freedom parameters tending to infinity, and found that large degrees of freedom estimates in a StMAR model are not only redundant but also cause several inconveniences in the numerical analysis of the model. In particular, weak identification of large degrees of freedom parameters was found to lead to a numerically nearly singular approximation of the observed information matrix when evaluated at the estimate, making the approximate standard errors of the estimates and Kalliovirta's (2012) diagnostic tests often unavailable. Removing the redundant degrees of freedom parameters by switching to a G-StMAR model was concluded to obviate these problems.

As an empirical application, we considered the monthly U.S. interest rate spread between the 3-month Treasury bill rate and the effective federal funds rate.
Our G-StMAR model identified three regimes for the spread, with a switch from a StMAR type regime to a GMAR type regime arising from a switch in the economic regime, namely, to a regime where the zero lower bound limits the movements of the interest rates. The two StMAR type regimes accommodate eras of low mean and high variability, and of high mean and moderate variability. The first StMAR type regime arguably often occurs when the market anticipates decreases in the FF rate or possibly has increased preferences for safety, whereas the second one mostly occurs when the Fed is arguably not expected to significantly decrease the FF rate target. As opposed to modelling the series with a StMAR model containing an overly large degrees of freedom estimate, switching to the more parsimonious G-StMAR model allowed us to numerically compute approximate standard errors for the estimates, and moreover, to perform Kalliovirta's (2012) quantile residual tests, which turned out to play a significant role in the model selection.

Acknowledgements

The author thanks Markku Lanne, Mika Meitz, and Pentti Saikkonen, who commented on the work and gave insightful suggestions that helped to improve the paper substantially. The author also thanks the Academy of Finland for financing the project.
References
Baklanova, V., Copeland, A., and McCaughrin, R. (2015). Reference guide to U.S. repo and securities lending markets, Staff Report No. 740. Technical report, Federal Reserve Bank of New York.
Ding, P. (2016). On the conditional distribution of the multivariate t distribution. The American Statistician, 70(3):293–295.
Dorsey, R. and Mayer, W. (1995). Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features. Journal of Business and Economic Statistics, 13(1):53–66.
Glasbey, C. (2001). Non-linear autoregressive time series with multivariate Gaussian mixtures as marginal distributions. Journal of the Royal Statistical Society: Series C, 50(2):143–154.
Goffe, W., Ferrier, G., and Rogers, J. (1994). Global optimization of statistical functions with simulated annealing. Journal of Econometrics, 60(1–2):65–99.
Heracleous, M. and Spanos, A. (2006). Student's t dynamic linear regression: re-examining volatility modeling. In Terrell, D. and Fomby, T. B., editors, Econometric Analysis of Financial Time Series (Advances in Econometrics), volume 20, chapter 1, pages 289–319. Emerald Group Publishing Limited, Bingley.
Holzmann, H., Munk, A., and Gneiting, T. (2006). Identifiability of finite mixtures of elliptical distributions. Scandinavian Journal of Statistics, 33(4):753–763.
Kalliovirta, L. (2012). Misspecification tests based on quantile residuals. The Econometrics Journal, 15(2):358–393.
Kalliovirta, L., Meitz, M., and Saikkonen, P. (2015). A Gaussian mixture autoregressive model for univariate time series. Journal of Time Series Analysis, 36(2):247–266.
Kalliovirta, L., Meitz, M., and Saikkonen, P. (2016). Gaussian mixture vector autoregression. Journal of Econometrics, 192(2):465–498.
Kishor, N. and Marfatia, H. (2013). Does the federal funds futures rate contain information about the Treasury bill rate? Applied Financial Economics, 23(16):1311–1324.
Lanne, M. and Saikkonen, P. (2003). Modeling the U.S. short-term interest rate by mixture autoregressive processes. Journal of Financial Econometrics, 1(1):96–125.
Le, N., Martin, R., and Raftery, A. (1996). Modeling flat stretches, bursts, and outliers in time series using mixture transition distribution models. Journal of the American Statistical Association, 91(436):1504–1515.
Meitz, M., Preve, D., and Saikkonen, P. (2018a). A mixture autoregressive model based on Student's t-distribution. Unpublished working paper, available as arXiv:1805.04010.
Meitz, M., Preve, D., and Saikkonen, P. (2018b). StMAR Toolbox: A MATLAB Toolbox for Student's t Mixture Autoregressive Models.
Meitz, M. and Saikkonen, P. (2017). Testing for observation-dependent regime switching in mixture autoregressive models. Unpublished working paper, available as arXiv:1711.03959.
Meyn, S. and Tweedie, R. (2009). Markov Chains and Stochastic Stability. Cambridge University Press, Cambridge, 2nd edition.
Monahan, J. (1984). A note on enforcing stationarity in autoregressive-moving average models. Biometrika, 71(2):403–404.
Nash, J. (1990). Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation. Adam Hilger, Bristol and New York, 2nd edition.
Newey, W. and McFadden, D. (1994). Large sample estimation and hypothesis testing. In Engle, R. and McFadden, D., editors, Handbook of Econometrics, volume 4, chapter 36. Elsevier Science B.V.
Patnaik, L. and Srinivas, M. (1994). Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics, 24(4):656–667.
R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Rao, R. (1962). Relations between weak and uniform convergence of measures with applications. The Annals of Mathematical Statistics, 33(2):659–680.
Redner, R. and Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239.
Sarno, L. and Thornton, D. (2003). The dynamic relationship between the federal funds rate and the Treasury bill rate: An empirical investigation. Journal of Banking & Finance, 27(6):1079–1110.
Simon, D. (1990). Expectations and the Treasury bill–federal funds rate spread over recent monetary policy regimes. Journal of Finance, 45(2):567–577.
Smith, R., Dike, B., and Stegmann, S. (1995). Fitness inheritance in genetic algorithms. Proceedings of the 1995 ACM Symposium on Applied Computing, pages 345–350.
Spanos, A. (1994). On modeling heteroskedasticity: the Student's t and elliptical linear regression models. Econometric Theory, 10(2):286–315.
Virolainen, S. (2020). uGMAR: Estimate Univariate Gaussian or Student's t Mixture Autoregressive Model. R package version 3.2.3, available at CRAN: https://CRAN.R-project.org/package=uGMAR.
Wong, C., Chan, W., and Kam, P. (2009). A Student's t-mixture autoregressive model with applications to heavy-tailed financial data. Biometrika, 96(3):751–760.
Wong, C. and Li, W. (2000). On a mixture autoregressive model. Journal of the Royal Statistical Society: Series B, 62(1):95–115.
Wong, C. and Li, W. (2001a). On a mixture autoregressive conditional heteroscedastic model. Journal of the American Statistical Association, 96(455):982–995.
Wong, C. and Li, W. (2001b). On a logistic mixture autoregressive model. Biometrika, 88(3):833–846.
Appendix A Modified genetic algorithm
As discussed in Section 3.1, the accompanying R package uGMAR (Virolainen, 2020) employs a two-phase procedure for estimating the parameters of the G-StMAR model (and also of the GMAR (Kalliovirta et al., 2015) and StMAR (Meitz et al., 2018a) models). In the first phase, a genetic algorithm is used to find starting values for a gradient-based variable metric algorithm (Nash, 1990, algorithm 21), which then, in the second phase, accurately converges to a nearby local maximum or saddle point. In this appendix, we first briefly describe how our version of the genetic algorithm functions in general, and then discuss the specific modifications made to enhance estimation of the G-StMAR model (for a more detailed description of the genetic algorithm, see, e.g., Dorsey and Mayer, 1995).

In a genetic algorithm, an initial population consisting of different parameter vectors (often drawn at random) is first constructed. The algorithm then operates iteratively so that in each iteration, referred to as a generation, the current population of candidate solutions goes through the phases of selection, crossover, and mutation. In the selection phase, parameter vectors are sampled with replacement from the current population into the reproduction pool, according to probabilities based on their fitness, that is, on the related log-likelihoods. In the crossover phase, some of the parameter vectors in the reproduction pool are crossed over with each other, with the probabilities of experiencing crossover given by the crossover rate. Finally, some of the parameter vectors are mutated in the mutation phase, with the mutation probabilities given by the mutation rate. In our version of the genetic algorithm, mutation means that the mutating parameter vector is fully replaced with another parameter vector drawn at random (in Dorsey and Mayer, 1995, mutations are drawn for each scalar component of the parameter vector individually).
The reproduction pool that has experienced crossovers and mutations forms the new population, and the algorithm proceeds to the next generation, evolving towards the global maximum one generation after another.

Because the G-StMAR model can be challenging to estimate even with a robust estimation algorithm such as the genetic algorithm, we have made modifications to improve its performance. In particular, a slightly modified version of the individually adaptive crossover and mutation rates introduced by Patnaik and Srinivas (1994) is employed in order to force the sub-average solutions to be disrupted while protecting the better ones. The fitness inheritance proposed by Smith et al. (1995) is deployed to shorten the estimation time by cutting down the number of computationally costly evaluations of the log-likelihood function. In order to enhance thorough exploration of the parameter space, the algorithm proposed by Monahan (1984) is used in some random mutations to generate parameter vectors near the boundary of the stationarity region. In the case of premature convergence, most of the population is mutated so that exploration of the parameter space continues. Moreover, after a large number of generations have been run, the random mutations are targeted to a neighbourhood of the best-so-far parameter vector for faster convergence; we call these smart mutations.

Footnote: We modified the adaptive crossover rate to enforce a 40% minimum crossover rate for all individuals in the population.
In addition to the modifications described above, we have made further adjustments to accommodate the special structure of the log-likelihood function. Specifically, the definition of the mixing weights (2.15) implies that if a regime has parameter values that fit poorly relative to the other regimes, its mixing weights drop to near zero. The surface of the log-likelihood function thus flattens in the related directions, meaning that the algorithm is unable to converge properly if the proposed parameter vectors do not provide a reasonable fit for all regimes. This problem of unidentified (or redundant) regimes often occurs when the number of mixture components is chosen too large, but it can be present even when the number of mixture components is chosen correctly. In uGMAR, we try to resolve this problem by penalizing parameter vectors containing redundant regimes with smaller probabilities of being chosen to the reproduction pool. Moreover, smart mutations are targeted only to the neighbourhood of parameter values that identify all regimes. If such parameter vectors have not been found (after a large number of generations have been run), combining regimes from different parameter vectors is attempted along with random search.
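The basic selection–crossover–mutation loop described above can be sketched as follows. This is a toy maximization of a multimodal objective standing in for the log-likelihood, not the uGMAR implementation; the objective function, search bounds, and rate constants are assumptions made for the illustration. As in our version of the algorithm, mutation replaces the whole vector with a fresh random draw:

```python
import numpy as np

rng = np.random.default_rng(2)

def fitness(x):
    # Multimodal toy objective standing in for the log-likelihood;
    # x has shape (n, dim). Its global maximum is at the origin.
    return -np.sum(x**2, axis=1) + 2.0 * np.cos(3.0 * x).sum(axis=1)

def genetic_algorithm(pop_size=50, dim=2, generations=200,
                      crossover_rate=0.8, mutation_rate=0.1):
    pop = rng.uniform(-3, 3, size=(pop_size, dim))  # random initial population
    for _ in range(generations):
        fit = fitness(pop)
        # Selection: sample the reproduction pool with replacement, with
        # probabilities proportional to (shifted) fitness.
        probs = fit - fit.min() + 1e-9
        probs /= probs.sum()
        pool = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        # Crossover: combine random pairs with probability crossover_rate.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                a, b = pool[i].copy(), pool[i + 1].copy()
                w = rng.random()
                pool[i], pool[i + 1] = w*a + (1-w)*b, (1-w)*a + w*b
        # Mutation: replace the whole vector with a fresh random draw.
        for i in range(pop_size):
            if rng.random() < mutation_rate:
                pool[i] = rng.uniform(-3, 3, size=dim)
        pop = pool  # the modified pool is the next generation
    return pop[np.argmax(fitness(pop))]

best = genetic_algorithm()
print(best)
```

The convex-combination crossover used here is one simple choice among many; Dorsey and Mayer (1995) instead cross over individual components of the parameter vectors.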
Appendix B Properties of multivariate Gaussian and Student's t-distributions

Denote a d-dimensional real valued vector by 𝒚. It is well known that the density function of the d-dimensional multivariate Gaussian distribution with mean 𝝁 and covariance matrix 𝚪 is

n_d(𝒚; 𝝁, 𝚪) = (2π)^{−d/2} det(𝚪)^{−1/2} exp{ −(1/2)(𝒚 − 𝝁)′𝚪^{−1}(𝒚 − 𝝁) }.  (B.1)

Similarly to Meitz et al. (2018a) but differing from the standard form, we parametrize the Student's t-distribution using its covariance matrix as a parameter together with the mean and degrees of freedom. The density function of such a d-dimensional t-distribution with mean 𝝁, covariance matrix 𝚪, and ν > 2 degrees of freedom is

t_d(𝒚; 𝝁, 𝚪, ν) = C_d(ν) det(𝚪)^{−1/2} ( 1 + (𝒚 − 𝝁)′𝚪^{−1}(𝒚 − 𝝁)/(ν − 2) )^{−(d+ν)/2},  (B.2)

where

C_d(ν) = Γ((d + ν)/2) / ( √(π^d (ν − 2)^d) Γ(ν/2) ),  (B.3)

and Γ(·) is the gamma function. We assume that the covariance matrix 𝚪 is positive definite for both distributions.

Consider a partition 𝑿 = (𝑿₁, 𝑿₂) of either a normally or t-distributed (with ν degrees of freedom) random vector 𝑿, such that 𝑿₁ has dimension (d₁ × 1) and 𝑿₂ has dimension (d₂ × 1). Consider also a corresponding partition of the mean vector 𝝁 = (𝝁₁, 𝝁₂) and the covariance matrix

𝚪 = [ 𝚪₁₁  𝚪₁₂ ; 𝚪₁₂′  𝚪₂₂ ],  (B.4)

where, for example, the dimension of 𝚪₁₁ is (d₁ × d₁). Then, in the case of normally distributed 𝑿, 𝑿₁ has the marginal distribution n_{d₁}(𝝁₁, 𝚪₁₁) and 𝑿₂ has the marginal distribution n_{d₂}(𝝁₂, 𝚪₂₂). In the t-distributed case, the marginal distributions are t_{d₁}(𝝁₁, 𝚪₁₁, ν) and t_{d₂}(𝝁₂, 𝚪₂₂, ν), respectively (see, e.g., Ding (2016), also in what follows).

In the normally distributed case, the conditional distribution of the random vector 𝑿₁ given 𝑿₂ = 𝒙₂ is

𝑿₁ | (𝑿₂ = 𝒙₂) ∼ n_{d₁}(𝝁_{1|2}(𝒙₂), 𝚪_{1|2}),  (B.5)

where

𝝁_{1|2}(𝒙₂) = 𝝁₁ + 𝚪₁₂𝚪₂₂^{−1}(𝒙₂ − 𝝁₂) and  (B.6)
𝚪_{1|2} = 𝚪₁₁ − 𝚪₁₂𝚪₂₂^{−1}𝚪₁₂′.  (B.7)

In the t-distributed case, the analogous conditional distribution is

𝑿₁ | (𝑿₂ = 𝒙₂) ∼ t_{d₁}(𝝁_{1|2}(𝒙₂), 𝚪_{1|2}(𝒙₂), ν + d₂),  (B.8)

where 𝝁_{1|2}(𝒙₂) = 𝝁₁ + 𝚪₁₂𝚪₂₂^{−1}(𝒙₂ − 𝝁₂) and

𝚪_{1|2}(𝒙₂) = [ (ν − 2 + (𝒙₂ − 𝝁₂)′𝚪₂₂^{−1}(𝒙₂ − 𝝁₂)) / (ν − 2 + d₂) ] ( 𝚪₁₁ − 𝚪₁₂𝚪₂₂^{−1}𝚪₁₂′ ).
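The conditional moment formulas (B.6)–(B.7), and the factorization of the joint Gaussian density they imply, can be verified numerically. A minimal sketch, assuming NumPy and SciPy; the dimension, partition, and parameter values are arbitrary choices made for the illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

# A positive definite covariance matrix for a 3-dimensional Gaussian,
# partitioned so that X1 is the first component and X2 the last two.
A = rng.normal(size=(3, 3))
Gamma = A @ A.T + 3 * np.eye(3)
mu = np.array([0.5, -1.0, 2.0])
d1 = 1

G11, G12 = Gamma[:d1, :d1], Gamma[:d1, d1:]
G22 = Gamma[d1:, d1:]
mu1, mu2 = mu[:d1], mu[d1:]

x = np.array([0.3, -0.7, 1.5])
x1, x2 = x[:d1], x[d1:]

# Conditional moments of X1 given X2 = x2, as in (B.6)-(B.7).
mu_cond = mu1 + G12 @ np.linalg.solve(G22, x2 - mu2)
Gamma_cond = G11 - G12 @ np.linalg.solve(G22, G12.T)

# The joint density factorizes into conditional times marginal.
lhs = multivariate_normal(mu, Gamma).pdf(x)
rhs = (multivariate_normal(mu_cond, Gamma_cond).pdf(x1)
       * multivariate_normal(mu2, G22).pdf(x2))
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```

The same check with `scipy.stats.multivariate_t` would require converting between the covariance parametrization used here and SciPy's shape-matrix parametrization, so only the Gaussian case is shown.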
In particular, we have

n_d(𝒙; 𝝁, 𝚪) = n_{d₁}(𝒙₁; 𝝁_{1|2}(𝒙₂), 𝚪_{1|2}) n_{d₂}(𝒙₂; 𝝁₂, 𝚪₂₂) and  (B.9)
t_d(𝒙; 𝝁, 𝚪, ν) = t_{d₁}(𝒙₁; 𝝁_{1|2}(𝒙₂), 𝚪_{1|2}(𝒙₂), ν + d₂) t_{d₂}(𝒙₂; 𝝁₂, 𝚪₂₂, ν).  (B.10)

Appendix C Proofs
C.1 Proof of Theorem 1
Suppose {y_t}_{t=1}^∞ is a G-StMAR process. Then the process 𝒚_t = (y_t, …, y_{t−p+1}) is clearly a Markov chain on ℝ^p. Let 𝒚₀ = (y₀, …, y_{−p+1}) be a random vector whose distribution is characterized by the density function

f(𝒚₀; θ) = Σ_{m=1}^{M₁} α_m n_p(𝒚₀; µ_m 1_p, Γ_{m,p}) + Σ_{m=M₁+1}^{M} α_m t_p(𝒚₀; µ_m 1_p, Γ_{m,p}, ν_m).

According to equations (2.3)–(2.5), (2.8)–(2.10), (2.13), and (2.15), the density of the conditional distribution of y₁ given 𝒚₀ is

f(y₁ | 𝒚₀; θ) = Σ_{m=1}^{M₁} α_m [ n_p(𝒚₀; µ_m 1_p, Γ_{m,p}) / f(𝒚₀; θ) ] n₁(y₁; µ_{m,1}, σ_m²)
  + Σ_{m=M₁+1}^{M} α_m [ t_p(𝒚₀; µ_m 1_p, Γ_{m,p}, ν_m) / f(𝒚₀; θ) ] t₁(y₁; µ_{m,1}, σ²_{m,1}, ν_m + p)  (C.1)
= Σ_{m=1}^{M₁} [ α_m / f(𝒚₀; θ) ] n_{p+1}((y₁, 𝒚₀); µ_m 1_{p+1}, Γ_{m,p+1})
  + Σ_{m=M₁+1}^{M} [ α_m / f(𝒚₀; θ) ] t_{p+1}((y₁, 𝒚₀); µ_m 1_{p+1}, Γ_{m,p+1}, ν_m).  (C.2)

The random vector (y₁, 𝒚₀) therefore has the density function

f((y₁, 𝒚₀); θ) = Σ_{m=1}^{M₁} α_m n_{p+1}((y₁, 𝒚₀); µ_m 1_{p+1}, Γ_{m,p+1}) + Σ_{m=M₁+1}^{M} α_m t_{p+1}((y₁, 𝒚₀); µ_m 1_{p+1}, Γ_{m,p+1}, ν_m).  (C.3)

Using properties of marginal densities of multivariate normal and t-distributions, by integrating y_{−p+1} out we obtain the density of 𝒚₁ as f(𝒚₁; θ) = Σ_{m=1}^{M₁} α_m n_p(𝒚₁; µ_m 1_p, Γ_{m,p}) + Σ_{m=M₁+1}^{M} α_m t_p(𝒚₁; µ_m 1_p, Γ_{m,p}, ν_m). Thus, the random vectors 𝒚₀ and 𝒚₁ are identically distributed. As the process {𝒚_t}_{t=1}^∞ is a (time-homogeneous) Markov chain, it follows that {𝒚_t}_{t=1}^∞ has a stationary distribution π_𝒚(·) characterized by the density f(·; θ) = Σ_{m=1}^{M₁} α_m n_p(·; µ_m 1_p, Γ_{m,p}) + Σ_{m=M₁+1}^{M} α_m t_p(·; µ_m 1_p, Γ_{m,p}, ν_m) (Meyn and Tweedie, 2009, pp. 230–231).

For ergodicity, let P^p_𝒚(𝒚, ·) = P(𝒚_p ∈ · | 𝒚₀ = 𝒚) signify the p-step transition probability measure of the process 𝒚_t.
Using the p-th order Markov property of y_t, it is easy to check that P^p_𝒚(𝒚, ·) has the density

f(y₁, …, y_p | 𝒚₀; θ) = ∏_{t=1}^{p} ( Σ_{m=1}^{M₁} α_{m,t} n₁(y_t; µ_{m,t}, σ_m²) + Σ_{m=M₁+1}^{M} α_{m,t} t₁(y_t; µ_{m,t}, σ²_{m,t}, ν_m + p) ).  (C.4)

Footnote: Because the covariance matrices Γ_{m,p+1} (m = 1, …, M) have the Toeplitz form and µ_m 1_p = (µ_m, …, µ_m), the marginal densities of random vectors shorter than p are obtained by integrating the desired random variables out, and their distributions are mixtures of normal and t-distributions.

Clearly f(y₁, …, y_p | 𝒚₀; θ) > 0 for all (y₁, …, y_p) ∈ ℝ^p and all 𝒚₀ ∈ ℝ^p, so we can conclude that 𝒚_t is ergodic in the sense of Meyn and Tweedie (2009, Ch. 13) by using arguments identical to those used in the proof of Theorem 1 in Kalliovirta et al. (2015). ∎

C.2 Proof of Theorem 2
First note that L_T^{(c)}(θ) is continuous, which together with Assumption 1 of the main paper implies the existence of a measurable maximizer θ̂_T. In order to conclude strong consistency of θ̂_T, it needs to be shown that (see, e.g., Newey and McFadden, 1994, Theorem 2.1 and the discussion on page 2122)

(i) the uniform strong law of large numbers holds for the log-likelihood function, that is, sup_{θ∈Θ} | L_T^{(c)}(θ) − E[L_T^{(c)}(θ)] | → 0 almost surely as T → ∞;

(ii) and that the limit of L_T^{(c)}(θ) is uniquely maximized at θ = θ₀.

Proof of (i).
Because the initial values are assumed to be from the stationary distribution,the process y t = ( y t , ..., y t − p +1 ) , and hence also y t , is stationary and ergodic, and E (cid:104) L ( c ) T ( θ ) (cid:105) = E [ l t ( θ )] . To conclude (i), it thus suffices to show that E [sup θ ∈ Θ | l t ( θ ) | ] < ∞ (see Rao, 1962).This is done by using compactness of the parameter space to derive finite lower and upper boundsfor l t ( θ ) which is given by l t ( θ ) = log (cid:32) M (cid:88) m =1 α m,t n ( y t ; µ m,t , σ m ) + M (cid:88) m = M +1 α m,t t (cid:0) y t ; µ m,t , σ m,t , ν m + p (cid:1)(cid:33) . (C.5)We know from the structure of the parameter space that c ≤ σ m ≤ c and c ≤ α m ≤ − c forall m = 1 , ..., M , and c ≤ ν m ≤ c for all m = M + 1 , ..., M , for some < c < , c < ∞ and34 > . Because the exponential function is bounded from above by one on the non-positive realaxis, and in addition c ≤ σ m , there exists a constant U < ∞ such that n ( y t ; µ m,t , σ m ) = (cid:0) πσ m (cid:1) − / exp (cid:18) − ( y t − µ m,t ) σ m (cid:19) ≤ U (C.6)for all m = 1 , ..., M .We also have c ≤ ν m + p ≤ c + p for all m = M + 1 , ..., M . Combined with the fact thatthe Gamma function is continuous on the positive real axis, this implies that there exist constants c > and c < ∞ such that c ≤ C ( ν m + p ) = Γ (cid:0) ν m + p (cid:1)(cid:112) π ( ν m + p − (cid:0) ν m + p (cid:1) ≤ c (C.7)for all m = M + 1 , ..., M . Because Γ m and hence Γ − m is positive definite, σ m ≥ c and c ≤ ν m ≤ c , we can find some c > such that σ m,t = ν m − y t − − µ m p ) (cid:48) Γ − m ( y t − − µ m p ) ν m − p σ m ≥ c (C.8)for all m = M + 1 , ..., M . Combined with (C.7) and (C.8), the inequality − (1 + ν m + p ) / < implies that there exists a constant U < ∞ for which t (cid:0) y t ; µ m,t , σ m,t , ν m + p (cid:1) = C ( ν m + p ) σ m,t (cid:18) y t − µ m,t ) ( ν m + p − σ m,t (cid:19) − (1+ ν m + p ) / ≤ U . (C.9)for all m = M + 1 , ..., M . 
According to (C.6), (C.9) and the restriction $0 \le \alpha_{m,t} \le 1$, there exists a constant $U_3 < \infty$ such that
\[
l_t(\boldsymbol{\theta}) = \log\Bigg( \sum_{m=1}^{M_1} \alpha_{m,t}\, n_1\big(y_t; \mu_{m,t}, \sigma_m^2\big) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1\big(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p\big) \Bigg) \le U_3. \tag{C.10}
\]
We know from compactness of the parameter space that
\[
\frac{(y_t - \mu_{m,t})^2}{2\sigma_m^2} \le c_6\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big), \tag{C.11}
\]
implying
\[
\exp\bigg\{ -\frac{(y_t - \mu_{m,t})^2}{2\sigma_m^2} \bigg\} \ge \exp\big\{ -c_6\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big) \big\}, \tag{C.12}
\]
for all $m = 1, \ldots, M_1$, and for some finite constant $c_6$. By $\sigma_m^2 \le c_2$ it also holds that $(2\pi\sigma_m^2)^{-1/2} \ge (2\pi c_2)^{-1/2}$, so
\[
n_1\big(y_t; \mu_{m,t}, \sigma_m^2\big) \ge (2\pi c_2)^{-1/2} \exp\big\{ -c_6\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big) \big\} \tag{C.13}
\]
for all $m = 1, \ldots, M_1$.

Accordingly, since $\sigma_{m,t}^2 \ge c_5$ and $\nu_m \ge c_1$, it holds for some $c_7 < \infty$ that
\[
\frac{(y_t - \mu_{m,t})^2}{(\nu_m + p - 2)\,\sigma_{m,t}^2} \le c_7\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big), \quad m = M_1+1, \ldots, M. \tag{C.14}
\]
Thus, because $\nu_m \le c_2$ and the inner functions below take values larger than one, we have
\[
\bigg(1 + \frac{(y_t - \mu_{m,t})^2}{(\nu_m + p - 2)\,\sigma_{m,t}^2}\bigg)^{-(1 + \nu_m + p)/2} \ge \big(c_7\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big)^{-(1 + c_2 + p)/2}. \tag{C.15}
\]
As Meitz et al. (2018a) state in the proof of their Theorem 3, the quadratic form on the right-hand side of (C.8) satisfies
\[
(\mathbf{y}_{t-1} - \mu_m \mathbf{1}_p)' \boldsymbol{\Gamma}_m^{-1} (\mathbf{y}_{t-1} - \mu_m \mathbf{1}_p) \le c_8\big(1 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big) \tag{C.16}
\]
for all $m = M_1+1, \ldots, M$, and for some $c_8 < \infty$. Since also $0 < \nu_m - 2 \le c_2$ and $\sigma_m^2 \le c_2$, we have $\sigma_{m,t}^2 \le c_9\big(1 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)$ for some finite constant $c_9$. Combining the former inequality with (C.7) and (C.15) yields the lower bound
\[
t_1\big(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p\big) \ge c_3 \big(c_9\big(1 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big)^{-1/2} \big(c_7\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big)^{-(1 + c_2 + p)/2}. \tag{C.17}
\]
Finally, the restriction $\sum_{m=1}^{M} \alpha_{m,t} = 1$ together with (C.13) and (C.17) implies
\[
l_t(\boldsymbol{\theta}) \ge \min\bigg\{ -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(c_2) - c_6\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big),\ \log(c_3) - \frac{1}{2}\log\big(c_9\big(1 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big) - \frac{1 + c_2 + p}{2}\log\big(c_7\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big) \bigg\}. \tag{C.18}
\]
As $E\big[y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big] < \infty$ (because $y_t$ is stationary and has finite second moments), it follows from Jensen's inequality that
\[
E\big[\log\big(c_7\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big)\big] < \infty \quad\text{and}\quad E\big[\log\big(c_9\big(1 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big)\big] < \infty. \tag{C.19}
\]
The upper bound (C.10) together with (C.18) and finiteness of the aforementioned expectations shows that $E\big[\sup_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} |l_t(\boldsymbol{\theta})|\big] < \infty$. $\square$

Proof of (ii).
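The finiteness claims in (C.19) rest on stationarity plus Jensen's inequality: for nonnegative $X$ with $E[X] < \infty$, $E[\log(1 + X)] \le \log(1 + E[X]) < \infty$. As an illustrative sanity check (a hypothetical toy setup of our own, not part of the proof), the sketch below simulates a stationary Gaussian AR(1) with $p = 1$ and verifies that the sample analogue of $E\big[\log\big(1 + y_t^2 + \mathbf{y}_{t-1}'\mathbf{y}_{t-1}\big)\big]$ stays below the Jensen bound.

```python
import math
import random

random.seed(1)

# Stationary Gaussian AR(1): y_t = phi*y_{t-1} + e_t, e_t ~ N(0, 1),
# so the stationary variance is 1/(1 - phi^2).
phi, n = 0.5, 200_000
y = [random.gauss(0.0, math.sqrt(1.0 / (1.0 - phi ** 2)))]  # stationary start
for _ in range(n - 1):
    y.append(phi * y[-1] + random.gauss(0.0, 1.0))

# Sample analogue of E[log(1 + y_t^2 + y_{t-1}'y_{t-1})] for p = 1, c = 1.
vals = [math.log(1.0 + y[t] ** 2 + y[t - 1] ** 2) for t in range(1, n)]
sample_mean = sum(vals) / len(vals)

# Jensen: E[log(1 + X)] <= log(1 + E[X]); here E[X] = 2/(1 - phi^2).
jensen_bound = math.log(1.0 + 2.0 / (1.0 - phi ** 2))
```

The simulated mean lies strictly below `jensen_bound`, reflecting the strict concavity of the logarithm; the only model ingredients the argument actually needs are stationarity and finite second moments.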
Given that condition (3.3) of the main paper sets a unique order for the mixture components, proving that this identification condition is satisfied amounts to showing that $E[l_t(\boldsymbol{\theta})] \le E[l_t(\boldsymbol{\theta}_0)]$, and that the equality $E[l_t(\boldsymbol{\theta})] = E[l_t(\boldsymbol{\theta}_0)]$ implies
\[
\boldsymbol{\vartheta}_m = \boldsymbol{\vartheta}_{\tau(m),0} \ \text{ and } \ \alpha_m = \alpha_{\tau(m),0} \ \text{ when } m = 1, \ldots, M_1, \ \text{ and}
\]
\[
(\boldsymbol{\vartheta}_m, \nu_m) = (\boldsymbol{\vartheta}_{\tau(m),0}, \nu_{\tau(m),0}) \ \text{ and } \ \alpha_m = \alpha_{\tau(m),0} \ \text{ when } m = M_1+1, \ldots, M, \tag{C.20}
\]
for some permutations $\{\tau(1), \ldots, \tau(M_1)\}$ and $\{\tau(M_1+1), \ldots, \tau(M)\}$. For notational clarity, we omit the subscripts from $y_t$ and $\mathbf{y}_{t-1}$, and write $\mu_{m,t} = \mu(\mathbf{y}; \boldsymbol{\vartheta}_m)$, $\sigma_m^2 = \sigma_m^2(\boldsymbol{\vartheta}_m)$ and $\sigma_{m,t}^2 = \sigma_{m,t}^2(\mathbf{y}; \boldsymbol{\vartheta}_m, \nu_m)$ for the expressions in (C.5), making clear their dependence on the parameter value. We leave the dependence of $\alpha_{m,t}$ on $\boldsymbol{\theta}$ and $\mathbf{y}$ unmarked and denote by $\alpha_{m,0,t}$ mixing weights based on the true parameter value.

Making use of the fact that the density function of $(y_t, \mathbf{y}_{t-1})$ has the form
\[
f\big((y_t, \mathbf{y}_{t-1}); \boldsymbol{\theta}\big) = \sum_{m=1}^{M_1} \alpha_m\, n_{p+1}\big((y_t, \mathbf{y}_{t-1}); \mu_m \mathbf{1}_{p+1}, \boldsymbol{\Gamma}_{m,p+1}\big) + \sum_{m=M_1+1}^{M} \alpha_m\, t_{p+1}\big((y_t, \mathbf{y}_{t-1}); \mu_m \mathbf{1}_{p+1}, \boldsymbol{\Gamma}_{m,p+1}, \nu_m\big)
\]
(see the proof of Theorem 1) and reasoning based on the Kullback-Leibler divergence, one can use arguments analogous to those in Kalliovirta et al. (2015, p. 265) to conclude that $E[l_t(\boldsymbol{\theta})] - E[l_t(\boldsymbol{\theta}_0)] \le 0$, with equality if and only if for almost all $(y, \mathbf{y}) \in \mathbb{R}^{p+1}$
\[
\sum_{m=1}^{M_1} \alpha_{m,t}\, n_1\big(y; \mu(\mathbf{y}; \boldsymbol{\vartheta}_m), \sigma_m^2(\boldsymbol{\vartheta}_m)\big) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1\big(y; \mu(\mathbf{y}; \boldsymbol{\vartheta}_m), \sigma_{m,t}^2(\mathbf{y}; \boldsymbol{\vartheta}_m, \nu_m), \nu_m + p\big)
\]
\[
= \sum_{m=1}^{M_1} \alpha_{m,0,t}\, n_1\big(y; \mu(\mathbf{y}; \boldsymbol{\vartheta}_{m,0}), \sigma_m^2(\boldsymbol{\vartheta}_{m,0})\big) + \sum_{m=M_1+1}^{M} \alpha_{m,0,t}\, t_1\big(y; \mu(\mathbf{y}; \boldsymbol{\vartheta}_{m,0}), \sigma_{m,t}^2(\mathbf{y}; \boldsymbol{\vartheta}_{m,0}, \nu_{m,0}), \nu_{m,0} + p\big). \tag{C.21}
\]
For each fixed $\mathbf{y}$ at a time, the mixing weights, conditional means and variances in (C.21) are constants, so we may apply the result on identification of finite mixtures of normal and $t$-distributions in Holzmann et al.
(2006, Example 1) (their parametrization of the $t$-distribution differs slightly from ours, but identification with their parametrization implies identification with ours). For each fixed $\mathbf{y}$, there thus exists a permutation $\{\tau(1), \ldots, \tau(M_1)\}$ (that may depend on $\mathbf{y}$) of the index set $\{1, \ldots, M_1\}$ such that
\[
\alpha_{m,t} = \alpha_{\tau(m),0,t}, \quad \mu(\mathbf{y}; \boldsymbol{\vartheta}_m) = \mu(\mathbf{y}; \boldsymbol{\vartheta}_{\tau(m),0}) \quad\text{and}\quad \sigma_m^2(\boldsymbol{\vartheta}_m) = \sigma_m^2(\boldsymbol{\vartheta}_{\tau(m),0}) \tag{C.22}
\]
for almost all $y \in \mathbb{R}$ ($m = 1, \ldots, M_1$). Analogously, for each fixed $\mathbf{y}$ there exists a permutation $\{\tau(M_1+1), \ldots, \tau(M)\}$ (that may depend on $\mathbf{y}$) of the index set $\{M_1+1, \ldots, M\}$ such that
\[
\nu_m = \nu_{\tau(m),0}, \quad \alpha_{m,t} = \alpha_{\tau(m),0,t}, \quad \mu(\mathbf{y}; \boldsymbol{\vartheta}_m) = \mu(\mathbf{y}; \boldsymbol{\vartheta}_{\tau(m),0}) \quad\text{and}\quad \sigma_{m,t}^2(\mathbf{y}; \boldsymbol{\vartheta}_m, \nu_m) = \sigma_{m,t}^2(\mathbf{y}; \boldsymbol{\vartheta}_{\tau(m),0}, \nu_{\tau(m),0}), \tag{C.23}
\]
for almost all $y \in \mathbb{R}$ ($m = M_1+1, \ldots, M$).

As argued by Kalliovirta et al. (2015, pp. 265-266) for the GMAR type components, it follows from (C.22) that $\boldsymbol{\vartheta}_m = \boldsymbol{\vartheta}_{\tau(m),0}$ and $\alpha_m = \alpha_{\tau(m),0}$ for $m = 1, \ldots, M_1$. Accordingly, Meitz et al. (2018a) showed that (C.23) implies $\boldsymbol{\vartheta}_m = \boldsymbol{\vartheta}_{\tau(m),0}$, $\nu_m = \nu_{\tau(m),0}$ and $\alpha_m = \alpha_{\tau(m),0}$ for $m = M_1+1, \ldots, M$, completing the proof of strong consistency.

Given consistency and the assumptions of the theorem, asymptotic normality of the ML estimator can now be concluded using standard arguments. The required steps can be found, for example, in Kalliovirta et al. (2016, proof of Theorem 3). We omit the details for brevity. $\square$