Inference in mixed causal and noncausal models with generalized Student's t-distributions
Francesco Giancaterini∗ and Alain Hecq
Department of Quantitative Economics, School of Business and Economics, Maastricht University
December, 2020
Abstract
This paper analyzes the properties of the Maximum Likelihood Estimator for mixed causal and noncausal models when the error term follows a Student's t-distribution. In particular, we compare several existing methods to compute the expected Fisher information matrix and show that they cannot be applied in the heavy-tail framework. For this purpose, we propose a new approach to make inference on causal and noncausal parameters in finite sample sizes. It is based on the empirical variance computed on the generalized Student's t, even when the population variance is not finite. Monte Carlo simulations show the good performance of our new estimator for fat-tail series. We illustrate how the different approaches lead to different standard errors in four time series: the annual debt-to-GDP ratio for Canada, the variation of daily Covid-19 deaths in Belgium, monthly wheat prices and the monthly inflation rate in Brazil.

Keywords: MLE, noncausal models, generalized Student's t-distribution, inference.
JEL: C22
∗ Corresponding author: Francesco Giancaterini, Maastricht University, School of Business and Economics, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Email: [email protected]. The authors would like to thank Sean Telg, Elisa Voisin and Ines Wilms for their various suggestions. All errors are ours.

Mixed causal and noncausal models (MAR) are time series processes with both lead and lag components. Such specifications allow one to capture nonlinear features such as bubbles, namely processes that experience a rapid increase followed by a sudden crash. Linear autoregressive models (e.g., ARMA models) cannot exhibit these bubble patterns. MAR models have successfully been implemented on several time series, for instance commodity prices, the inflation rate, bitcoin and other equity prices. Furthermore, forecasts from mixed causal and noncausal models often beat those from linear ones. They also have an economic flavor: they are interpreted as situations in which economic agents have more information than econometricians, linking MAR models with the existence of non-fundamentalness in structural econometric models (see Alessi et al. (2011) and Lanne and Saikkonen (2013)). Still, their estimation, and in particular making inference on MAR parameters, is far from trivial.

This paper analyzes the behaviour of the Maximum Likelihood Estimator (MLE) for mixed causal and noncausal models with an error term following a Student's t-distribution. Although most theoretical results for MARs are derived under the assumption of a finite variance of the error term (see i.a. Breidt, Davis, Lii and Rosenblatt (1991); Lanne and Saikkonen (2011)), we emphasize that working with the generalized version of the Student's t also allows us to cover infinite variance cases (when the degrees of freedom ν of the Student's t satisfy 1 < ν ≤ 2, and in particular when they lie between 1.5 and 2).
The alternative methods to make inference in the infinite variance cases would be either to work with a different asymptotic theory (Davis and Resnick (1985)), to use different distributions (see the work on alpha-stable distributions by Fries and Zakoian (2019)), or, in the case of purely noncausal models, to rely on bootstrap estimators (Cavaliere, Nielsen and Rahbek (2020)).

The rest of the paper is organized as follows. Section 2 introduces mixed causal and noncausal models. Section 3 presents the different ways of obtaining the expected Fisher information matrix for MARs; the existing strategies are briefly reviewed. Section 4 proposes a new approach to compute the standard errors of causal and noncausal parameters, based on a robust estimator of the residuals. We show its validity in finite samples. Section 5 studies, using Monte Carlo simulations, the performance of the current methodologies and of the new approach. Section 6 is dedicated to the empirical applications on four different time series. Section 7 concludes.

Breidt et al. (1991) introduce a maximum likelihood procedure for estimating the parameters of noncausal processes. Their starting point is the autoregressive model

a(L) y_t = ε_t,   (1)

where L is the backshift operator and ε_t is an independent and identically distributed (i.i.d.) non-Gaussian sequence of random variables with mean zero and finite variance. It is assumed that the autoregressive polynomial a(z) = 1 − a_1 z − ··· − a_p z^p has no roots on the unit circle, so that a(z) ≠ 0 for |z| = 1. Breidt et al. (1991) further assume that the polynomial a(z) has respectively s roots inside and r roots outside the unit circle. Equation (1) can be factored as

a(z) = ϕ*(z) φ(z),   (2)

where ϕ*(z) is called the noncausal polynomial since its roots are inside the unit circle, such that ϕ*(z) = 1 − ϕ*_1 z − ... − ϕ*_s z^s ≠ 0 for |z| ≥ 1.
Breidt et al. (1991) derive the covariance matrix of the estimated parameters only for probability density functions of ε_t that satisfy a certain set of assumptions listed in Section 3. (The non-Gaussianity of ε_t is required to identify noncausal from causal models.) The generalized Student's t-distribution with degrees of freedom equal to or less than 2 does not satisfy one of these assumptions and, as a consequence, this approach cannot be used in the heavy-tail framework.

Lanne and Saikkonen (2011) directly start with a mixed causal and noncausal model expressed as the product of backward- and forward-looking polynomials

φ(L) ϕ(L⁻¹) y_t = ε_t,   (3)

where L⁻¹ produces leads such that L⁻¹ y_t = y_{t+1}. We denote such a model an MAR(r, s), with φ(L) the causal/autoregressive polynomial of order r and ϕ(L⁻¹) the noncausal/lead polynomial of order s. With this representation it is assumed that both φ(z) and ϕ(z) have their roots outside the unit circle:

φ(z) ≠ 0 and ϕ(z) ≠ 0 for |z| ≤ 1.   (4)

Note that purely causal or purely noncausal models are respectively obtained when ϕ(L⁻¹) = 1 or φ(L) = 1. In (3) the parameter vectors φ = (φ_1, ..., φ_r) and ϕ = (ϕ_1, ..., ϕ_s) turn out to be orthogonal to the parameters that describe the distribution of the error term ε_t (see Lemma 1 of Lanne and Saikkonen (2011)). They can be estimated by an AMLE approach; AMLE refers to the approximate maximum likelihood estimator, so called because we lose the first r and the last s observations when estimating an MAR(r, s). An important and useful feature of mixed causal and noncausal models is that we can set:

u_t = ϕ(L⁻¹) y_t  ↔  u_t = φ_1 u_{t−1} + ··· + φ_r u_{t−r} + ε_t,   (5)
v_t = φ(L) y_t  ↔  v_t = ϕ_1 v_{t+1} + ··· + ϕ_s v_{t+s} + ε_t.   (6)

In order to obtain the standard errors of the estimated parameters, Lanne and Saikkonen work with a density function which satisfies assumptions similar to those presented in Breidt et al. (1991) and, in particular, that it must have a finite variance.

Hecq, Lieb and Telg (2016) propose a new approach to more easily compute the standard errors for MAR(r, s) models, using the generalized t-distribution and relying on the results developed for the linear regression model by Fonseca et al. (2008).
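Representations (5) and (6) lend themselves to a direct implementation. The sketch below is our own illustration (not code from the MARX package; the function name is ours); it computes the residuals of an MAR(1,1) with known coefficients by applying the lead filter first and the lag filter second:

```python
import numpy as np

def mar11_residuals(y, phi, varphi):
    """Residuals of an MAR(1,1): eps_t = phi(L) varphi(L^-1) y_t.

    First apply the noncausal (lead) filter, u_t = y_t - varphi * y_{t+1},
    as in (5); then the causal (lag) filter, eps_t = u_t - phi * u_{t-1}.
    As in the AMLE setup, the first r = 1 and last s = 1 observations are lost.
    """
    y = np.asarray(y, dtype=float)
    u = y[:-1] - varphi * y[1:]   # u_t = varphi(L^-1) y_t, for t = 1, ..., T-1
    return u[1:] - phi * u[:-1]   # eps_t = phi(L) u_t,     for t = 2, ..., T-1
```

Applying the filters in the opposite order, through v_t = φ(L)y_t in (6), returns the same residuals, since the lag and lead polynomials commute.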
Their approach, also implemented in the R package MARX, works if and only if E(ε_t²) < ∞, and hence if the degrees of freedom are larger than 2. We show however that this approach can be misleading, as it imposes strong restrictions that can lead to incorrect estimates of the standard errors.

Let us consider a general density function f and denote the likelihood function of θ by

L(θ) = ∏_{t=1}^{T} f(ε_t | θ).

We indicate with θ_0 = (θ_1, ..., θ_p) the vector of the true values of the causal and noncausal coefficients (p = r + s). The other parameters of the general density function (degrees of freedom and scale parameter) are, for the moment, assumed to be known and equal to their true population values; we will show next that they are independent from the estimation of θ. Furthermore, we assume that ε_t has a finite variance, equal to σ². Taking logs of L(θ), we obtain the log-likelihood function

l(θ) = ln L(θ) = Σ_{t=1}^{T} ln f(ε_t | θ).   (7)

Defining b(θ) = δl(θ)/δθ, the score vector of the log-likelihood, the MLE of θ is given by the solution θ̂ to the p = r + s equations b(θ̂) = 0. If the sample size is sufficiently large, it turns out that the distribution of the maximum likelihood estimator θ̂ can be well approximated by

θ̂ ≈ N(θ_0, I(θ_0)⁻¹),   (8)

where I is the expected Fisher information matrix

I(θ) = −E[ δ²l(θ) / δθδθ′ ].   (9)

Since it is not always trivial to evaluate analytically the expected value of the Hessian matrix, we can also compute the observed Fisher information matrix:

I(θ) = −[ δ²l(θ) / δθδθ′ ].   (10)

By the law of large numbers, the observed information (10) converges in probability to the expected information (9).
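When the analytical Hessian is cumbersome, the observed information (10) can be approximated by finite differences. The sketch below is generic and our own (the helper names and the step size h are arbitrary choices, not part of any package discussed here):

```python
import numpy as np

def observed_info(loglik, theta, h=1e-5):
    """Observed Fisher information (10): minus the Hessian of the
    log-likelihood at theta, via central finite differences."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = h
            ej = np.zeros(p); ej[j] = h
            H[i, j] = (loglik(theta + ei + ej) - loglik(theta + ei - ej)
                       - loglik(theta - ei + ej) + loglik(theta - ei - ej)) / (4 * h * h)
    return -H

def std_errors(loglik, theta_hat):
    """Standard errors implied by (8): sqrt of diag(I(theta_hat)^(-1))."""
    return np.sqrt(np.diag(np.linalg.inv(observed_info(loglik, theta_hat))))
```

For instance, for a Gaussian location model with unit variance and T observations, the routine recovers the textbook standard error 1/√T.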
In practice, since the true value of θ is not known, these two matrices are obtained by replacing the population parameters by their ML estimates, which yields the expected and observed information matrices evaluated at θ̂.

Let us start with the observed information matrix of an MAR(r, s) as described in (3). We consider ε_t i.i.d. and distributed according to a generalized Student's t-distribution, such that its density function at time t is

f_σ(ε_t; ν, η) = Γ((ν+1)/2) / [ Γ(ν/2) √(πν) η ] · [ 1 + (1/ν)(ε_t/η)² ]^{−(ν+1)/2},   (11)

with the corresponding approximate log-likelihood function, conditional on y = [y_1, ..., y_T], equal to

l(φ, ϕ, ν, η | y) = (T − p) [ ln Γ((ν+1)/2) − ln(√(πν) η) − ln Γ(ν/2) ] − (ν+1)/2 · Σ_{t=r+1}^{T−s} ln[ 1 + (1/ν) (φ(L)ϕ(L⁻¹) y_t / η)² ].   (12)

We indicate with ν_0 and η_0 the true values of the degrees of freedom and of the scale parameter, respectively. Instead, σ² denotes the true value of the variance of the error term which, for a generalized Student's t-distribution, is equal to

σ² = η² ν / (ν − 2),  ∀ ν > 2.

(We use the notation of Hamilton (1994, p. 143).)
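For concreteness, the generalized Student's t log-likelihood (11)-(12) can be evaluated directly on the residuals ε_t = φ(L)ϕ(L⁻¹)y_t. A minimal sketch (our own helper, written with log-gamma terms for numerical stability):

```python
import numpy as np
from math import lgamma, log, pi

def gen_t_loglik(eps, nu, eta):
    """Sum of log-densities of i.i.d. draws from the generalized
    Student's t (11): each term is
    ln Gamma((nu+1)/2) - ln Gamma(nu/2) - ln(sqrt(pi*nu)*eta)
    - (nu+1)/2 * ln(1 + (eps/eta)^2 / nu)."""
    eps = np.asarray(eps, dtype=float)
    c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(pi * nu) - log(eta)
    return eps.size * c - 0.5 * (nu + 1) * np.sum(np.log1p((eps / eta) ** 2 / nu))
```

With ν = 1 and η = 1 this reduces to the Cauchy log-density, while for ν > 2 the implied error variance is η²ν/(ν − 2), as above.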
In this case, the observed information matrix is given by

I(θ) = − [ δ²l(θ)/δφδφ′   δ²l(θ)/δφδϕ′
           δ²l(θ)/δϕδφ′   δ²l(θ)/δϕδϕ′ ],   (13)

knowing that, in the general case:

δ²l(θ)/δφδφ′ = 2(ν+1) η⁻⁴ ( Σ_{t=r+1}^{T−s} (ν + z_t²)⁻² U_{t−1} U′_{t−1} [z_t η]² ) − (ν+1) η⁻² ( Σ_{t=r+1}^{T−s} (ν + z_t²)⁻¹ U_{t−1} U′_{t−1} );

δ²l(θ)/δϕδϕ′ = 2(ν+1) η⁻⁴ ( Σ_{t=r+1}^{T−s} (ν + z_t²)⁻² V_{t+1} V′_{t+1} [z_t η]² ) − (ν+1) η⁻² ( Σ_{t=r+1}^{T−s} (ν + z_t²)⁻¹ V_{t+1} V′_{t+1} );

δ²l(θ)/δφδϕ′ = 2(ν+1) η⁻⁴ ( Σ_{t=r+1}^{T−s} (ν + z_t²)⁻² U_{t−1} V′_{t+1} [z_t η]² ) − (ν+1) η⁻² ( Σ_{t=r+1}^{T−s} (ν + z_t²)⁻¹ (U_{t−1} V′_{t+1} + Y_t z_t η) );

with z_t = φ(L) u_t / η = ϕ(L⁻¹) v_t / η = φ(L) ϕ(L⁻¹) y_t / η, U_{t−1} = (u_{t−1}, ..., u_{t−r})′, V_{t+1} = (v_{t+1}, ..., v_{t+s})′, and Y_t an r × s matrix with elements y_{t−i+j} (i = 1, ..., r and j = 1, ..., s).

Section 3.1 shows that in mixed causal and noncausal models the expected Fisher information matrix (unlike the observed I(θ̂)) cannot be computed when the population variance is not finite. In Section 5 we will evaluate, by means of Monte Carlo simulations, whether the observed Fisher information matrix still allows Equation (8) to be respected in this context.

Lanne and Saikkonen (2011) propose to calculate the asymptotic covariance matrix using a general (Lebesgue) density function which depends on a parameter vector λ, in which all the distributional parameters are collected (the scale parameter and the degrees of freedom which, exactly as in the previous section, are respectively indicated with η and ν). Furthermore, it is characterized by an i.i.d. innovation term with finite and constant variance, equal to σ². Similar conditions as those of Andrews et al. (2006) must be satisfied.
In detail, these are:

(A1) For all x ∈ R and all λ ∈ Λ, f(x; λ) > 0, and f(x; λ) is twice continuously differentiable with respect to (x, λ).

(A2) For all λ ∈ Λ, ∫ x f′(x; λ) dx = −1.

(A3) For all λ ∈ Λ, ∫ f″(x; λ) dx = 0.

(A4) ∫ x² f″(x; λ) dx = 2.

(A5) J = ∫ (f′(x; λ))² / f(x; λ) dx > 0.

(A7) For j, k = 1, ..., d and all λ ∈ Λ,
• f(x; λ) is dominated by a function f₀(x) such that ∫ x² f₀(x) dx < ∞, and
• x² (f′(x; λ)/f(x; λ))², x² |f″(x; λ)/f(x; λ)|, |x| |δf′(x; λ)/δλ_j| / f(x; λ), (δf(x; λ)/δλ_j)² / f(x; λ)², and |δ²f(x; λ)/δλ_j δλ_k| / f(x; λ) are dominated by a₁ + a₂|x|^{c₁}, where a₁, a₂ and c₁ are nonnegative constants and ∫ |x|^{c₁} f₀(x) dx < ∞.

In this Section we relax the assumption that the distributional parameters of the density f are known. We also need to introduce some notation used in their paper. Let ζ_t ∼ i.i.d.(0, 1) and define the stationary AR(r) process u*_t by φ(L) u*_t = ζ_t and the stationary AR(s) process v*_t by ϕ(L) v*_t = ζ_t. Define also U*_{t−1} = (u*_{t−1}, ..., u*_{t−r})′, V*_{t−1} = (v*_{t−1}, ..., v*_{t−s})′ and the associated covariance matrices Γ_{U*} = Cov(U*_{t−1}), Γ_{V*} = Cov(V*_{t−1}) and Γ_{U*V*} = Cov(U*_{t−1}, V*_{t−1}) = Γ′_{V*U*}.

Theorem 1 (Lanne and Saikkonen, 2011). Given conditions (A1)-(A7), there exists a sequence of local maximizers θ̂ = (φ̂, ϕ̂, η̂, ν̂) of l(θ) in (7) such that

(T − p)^{1/2} (θ̂ − θ_0) →d N(0, diag(Σ⁻¹, Ω⁻¹)),

where Σ⁻¹ is the asymptotic variance-covariance matrix of the AML estimators of (φ, ϕ), such that

Σ = [ J Γ_{U*}      Γ_{U*V*}
      Γ_{V*U*}      J Γ_{V*} ] = [ σ² J̃ Γ_{U*}   Γ_{U*V*}
                                   Γ_{V*U*}      σ² J̃ Γ_{V*} ],   (14)

and Ω⁻¹ is the asymptotic variance-covariance matrix of the distributional parameters (Ω is the matrix defined in Equation (11) in Lanne and Saikkonen (2011)).

Lanne and Saikkonen (2008) show in detail how to obtain the Σ matrix. Furthermore, they show that the block diagonality holds because of representation (3) and conditions (A2)-(A4). Due to the block diagonality of the covariance matrix of the limiting distribution, the AML estimators of (φ̂, ϕ̂) and (ν̂, η̂) are asymptotically independent. The matrix Σ is positive definite if condition (A5) holds. By the Cauchy-Schwarz inequality and (A2),

1 = { ∫ x [f′_σ(x; λ)/f_σ(x; λ)] f_σ(x; λ) dx }² ≤ { ∫ x² f_σ(x; λ) dx } { ∫ (f′_σ(x; λ)/f_σ(x; λ))² f_σ(x; λ) dx } = σ² J̃,   (15)

with equality if and only if f is Gaussian. Hence the condition holds for non-Gaussian f, since J̃ can be rewritten as

J̃ = σ⁻² ∫ (f′(x; λ))² / f(x; λ) dx = σ⁻² J,   (16)

where the density function inside the integral refers to a rescaled density function (that is, with unit variance).
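The bound (15) can be checked numerically for a given density. The sketch below is our own code (the parameter values are illustrative, and the quadrature is a simple trapezoid rule on a truncated grid, so the result is approximate); it computes σ²J̃ for the generalized Student's t directly from the density:

```python
import numpy as np
from math import gamma, sqrt, pi

def t_density(x, nu, eta):
    """Generalized Student's t density with scale eta."""
    c = gamma((nu + 1) / 2) / (gamma(nu / 2) * sqrt(pi * nu) * eta)
    return c * (1.0 + (x / eta) ** 2 / nu) ** (-(nu + 1) / 2)

def sigma2_J_tilde(nu, eta, lim=200.0, n=400_001):
    """sigma^2 times the integral of f'(x)^2 / f(x) dx, using central
    differences and the trapezoid rule on [-lim, lim]; the truncation
    error is tiny for nu > 2 since the integrand decays like x^-(nu+3)."""
    x = np.linspace(-lim, lim, n)
    f = t_density(x, nu, eta)
    fp = np.gradient(f, x)                              # numerical f'(x)
    g = fp ** 2 / f
    J = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(x))     # trapezoid rule
    sigma2 = eta ** 2 * nu / (nu - 2)                   # population variance
    return sigma2 * J
```

For ν = 5 and η = 2 the computed value is about 1.25 > 1, in line with (15) for a non-Gaussian density.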
In other words, Σ is positive definite if σ² J̃ > 1 (which in particular implies J > 0, i.e. (A5)). In our case, ε_t is i.i.d. according to a generalized Student's t-distribution and

σ² J̃ = σ² ∫ [f′_σ(ε_t; ν, η)]² / f_σ(ε_t; ν, η) dε_t = ( η² ν/(ν−2) ) ( η⁻² (ν+1)/(ν+3) ) = ν(ν+1) / [(ν−2)(ν+3)] > 1,  ∀ ν > 2,

with f_σ(ε_t; ν, η) defined in (11). It is easy to see that this approach works if and only if ε_t has finite variance. This is the reason why most authors (e.g. Lanne and Saikkonen (2011)) use a standardized Student's t-distribution (such that σ = η) in their empirical applications. When we consider this type of standardized distribution, the log-likelihood function is

l′(φ, ϕ, ν, η | y) = Σ_t ln{ Γ((ν+1)/2) / [ Γ(ν/2) √(πν) η √((ν−2)/ν) ] · [ 1 + (1/ν) ( φ(L)ϕ(L⁻¹) y_t / (η √((ν−2)/ν)) )² ]^{−(ν+1)/2} },

such that, unlike (12), its structure only ensures convergence for ν > 2. We also observe that the heavier the tails are, the faster the estimator seems to converge (see Hecq et al. (2016)).

Hecq et al. (2016) take their inspiration from the conclusions of Fonseca et al. (2008), who consider the linear regression model

y_i = X′_i β + ε_i  (i = 1, ..., T),   (17)

where X_i and β are both p × 1 vectors and the ε_i are i.i.d. following a generalized Student's t-distribution with ν degrees of freedom and a scale parameter η, such that

c(ν, η) = Γ((ν+1)/2) ν^{ν/2} / [ Γ(ν/2) √π η ],   (18)

the log-likelihood function being

l(β, η, ν | y, X) = T ln c(ν, η) − (ν+1)/2 · Σ_{i=1}^{T} ln(ν + z_i²),   (19)

where z_i = (y_i − X′_i β)/η. The first derivative of the log-likelihood function with respect to β is given by

δl(β, η, ν | y, X)/δβ = (ν+1) Σ_{i=1}^{T} (ν + z_i²)⁻¹ [ X_i (y_i − X′_i β) / η² ],

whereas the second derivative with respect to β, applying the product and chain rules, is

δ²l(β, η, ν | y, X)/δβδβ′ = 2(ν+1) η⁻⁴ Σ_{i=1}^{T} [ (ν + z_i²)⁻² (X_i X′_i)(y_i − X′_i β)² ] − (ν+1) η⁻² Σ_{i=1}^{T} [ (ν + z_i²)⁻¹ (X_i X′_i) ].   (20)

In order to obtain the expected Fisher information I(·) with respect to β, we take the expectation of this expression and multiply it by −1. Fonseca et al. (2008) show that

I(β) = −E[ δ²l(β, η, ν | y, X)/δβδβ′ ] = η⁻² (ν+1)/(ν+3) Σ_{i=1}^{T} X_i X′_i.   (21)

Hecq et al. (2016) adapt the results obtained by Fonseca et al. (2008) to the noncausal model setup. That is, they consider a general MAR(r, s) model

φ(L) ϕ(L⁻¹) y_t = ε_t,   (22)

where ε_t ∼ t(ν, η). To obtain a similar model setup, Hecq et al. (2016) use representations (5) and (6): they substitute these alternative representations of the noncausal model into the original linear representation (17), so that they can compute the standard errors of the causal/noncausal coefficients using the results of Fonseca et al. (2008). In other words, they obtain the standard errors of the causal coefficients using (5), assuming the noncausal parameters are known.
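In matrix form, (21) is one line of code. The following is our own sketch of the resulting standard errors (not the MARX implementation):

```python
import numpy as np

def expected_info_beta(X, nu, eta):
    """Expected Fisher information for beta in the Student's t
    regression, eq. (21): I(beta) = eta^(-2) (nu+1)/(nu+3) sum_i X_i X_i'."""
    X = np.asarray(X, dtype=float)
    return (nu + 1) / ((nu + 3) * eta ** 2) * (X.T @ X)

def std_errors_beta(X, nu, eta):
    """Standard errors: square roots of the diagonal of I(beta)^(-1)."""
    return np.sqrt(np.diag(np.linalg.inv(expected_info_beta(X, nu, eta))))
```

In the adaptation of Hecq et al. (2016), the rows X_i would collect the lagged u's of (5) (respectively the led v's of (6)), the other block of coefficients being treated as known.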
Instead, for the standard errors of the noncausal parameters, they use representation (6), supposing that the causal coefficients are known. This is of course an approximation, which leads to a block diagonal, conditional version of the expected Fisher information matrix (14). For instance, in an MAR(1,1) they obtain the following conditional expected Fisher information matrices for the causal and the noncausal parts:

I(φ | ϕ) = −E[ δ²l(φ, η, ν | ϕ) / δφ² ],  I(ϕ | φ) = −E[ δ²l(ϕ, η, ν | φ) / δϕ² ],

implying both δ²l(·)/δφδϕ = 0 and δ²l(·)/δϕδφ = 0; hence

I(φ, ϕ) = −E[ δ²l(φ, η, ν | ϕ)/δφ²          0
              0          δ²l(ϕ, η, ν | φ)/δϕ² ].   (23)

Obviously, when we invert I(φ, ϕ), we obtain different results from those we would obtain by inverting the complete Fisher information matrix. Hecq et al. (2016) illustrate that this approximation gives mildly satisfactory results. Furthermore, exactly as in Lanne and Saikkonen (2011), Hecq et al. (2016) state that this methodology can be applied only if the error term has a finite variance (hence if ν > 2).

In this section, we investigate the conclusions obtained by Hecq et al. (2016) in the finite variance case of the error term. In particular, we want to evaluate to what extent the assumption of block diagonality of the conditional expected Fisher information matrix yields misspecified standard errors. Indeed, their approach is implemented in the R package MARX and has been applied in several studies. For this purpose, we compute the empirical density functions of the percentage difference between the standard errors obtained through the two aforementioned approaches. In particular, we analyze the empirical density functions of Z_{φ,i} and Z_{ϕ,j}, where:

Z_{φ,i} = (S.E.L._{φ,i} − S.E.H._{φ,i}) / S.E.H._{φ,i} × 100;  Z_{ϕ,j} = (S.E.L._{ϕ,j} − S.E.H._{ϕ,j}) / S.E.H._{ϕ,j} × 100.

S.E.L._{φ,i} indicates the standard error of the i-th causal coefficient obtained through the expected Fisher information matrix (Σ), with i ∈ [1, r], whereas S.E.H._{φ,i} represents the standard error of the i-th causal coefficient derived from Hecq et al.'s approach. The same holds for S.E.L._{ϕ,j} and S.E.H._{ϕ,j}; the only difference is that the latter refer to the j-th noncausal coefficient, with j ∈ [1, s].

The data generating process is an MAR(1,1) with a scale parameter η = 5, T = 1000 observations and 10000 replications. In addition, we consider different values of the degrees of freedom (ν = 3, ν = 4 and ν = 5) and different combinations of values for the causal/noncausal coefficients, that is:

• φ = 0.65, ϕ = 0.35;
• φ = 0.5, ϕ = 0.5;
• φ = 0.35, ϕ = 0.65.

Figures 1-3 show the empirical density functions of Z_φ and of Z_ϕ obtained through the Monte Carlo experiments. We conclude that the standard errors proposed by Lanne and Saikkonen (2011) should be used for non-heavy-tailed models. The approximation developed in Hecq et al. (2016), on the other hand, underestimates the standard errors and consequently provides too narrow a confidence interval. Furthermore, this underestimation decreases with decreasing degrees of freedom. Hence, although the approach proposed by Hecq et al. (2016) is easy to implement, it should only be applied in cases of heavy-tailed disturbances, where Lanne and Saikkonen (2011)'s method already shows some convergence problems. This happens, due to estimation uncertainty, when the degrees of freedom are small, even though the population variance is finite.

In this section we propose a new methodology to compute the standard errors of MAR parameters. It is valid for mixed causal and noncausal models whenever the error term is distributed according to a generalized Student's t-distribution and the sample size is finite. Although in the heavy-tail framework it is not possible to derive the theoretical limiting distributions of these parameters, Monte Carlo simulations in the next section show how our new estimator empirically satisfies Equation (8) for ν ∈ (1, D], with D < ∞.

In Section 3.1, it is stated that the variance of the error term (σ²) multiplies the block diagonal matrices of the expected Fisher information matrix defined in (14).
Since the Student's t-distribution with heavy-tailed innovations is characterized by an undefined variance, the expected Fisher information matrix cannot be computed in this context.

Figure 1: Density plots of the variables Z_φ and Z_ϕ, based on 5 degrees of freedom and T = 1000 observations.

Figure 2: Density plots of the variables Z_φ and Z_ϕ, based on 4 degrees of freedom and T = 1000 observations.

Figure 3: Density plots of the variables Z_φ and Z_ϕ, based on 3 degrees of freedom and T = 1000 observations.

Our alternative strategy consists in replacing the variance of the error term with the variance of the residuals (σ²_ε̂) in (14). Furthermore, especially in those cases where the population variance is not finite (ν ∈ (1, 2]), we rely on a robust estimator of the scale. The median absolute deviation of the residuals is

MAD_ε̂ = median(|ε̂_i − median(ε̂_i)|),   (24)

and a consistent estimate of the standard deviation is given by

σ_ε̂ = k × MAD_ε̂.   (25)

Rousseeuw et al. (1993) show that, if we set k to 1.48, Equation (25) ensures convergence to the standard deviation under the assumption of normality. Let us now find the value of k that, when multiplied by the MAD estimator, provides a robust estimate of the standard deviation of the residuals under the assumption of a Student's t-distribution, for ν ∈ (1, D]. The standard deviation of the residuals depends on two different parameters: the degrees of freedom (ν) and the sample size (T). This implies that k is also a function of ν and T:

k(ν, T) = σ̂_ε̂(ν, T) / MAD_ε̂.
(26)

In other words, k is a random variable with a different density function for each value of ν and T. Suppose ν = 1.5 and T = 500: to obtain the empirical density function of k(1.5, 500), we run a Monte Carlo experiment with ν = 1.5 and T = 500, and in each replication we compute the value of k using Equation (26). In this way, the Monte Carlo experiment yields as many values of k as the number of replications. To identify from these values the empirical density function of k, we use kernel density estimation. Extreme values of k can affect the non-parametric estimation; to avoid this, we keep only the values of k within the range

[Q1 − 1.5 × IQR, Q3 + 1.5 × IQR],

where Q1 and Q3 are respectively the first and the third quartile of k and IQR is its interquartile range. With the remaining values, we obtain an empirical density function as shown in Figure 4. In addition to choosing ν = 1.5 and T = 500, we consider N = 700,000 replications. A large number of replications is important to obtain an empirical density function that is as accurate as possible. Finally, in order to obtain a robust estimate of the standard deviation of the residuals, we take the mode of k, indicating this value as k*. Appendix A provides values of k* for other ν and T.

In conclusion, this approach gives us a Fisher information matrix

Σ̄ = [ σ²_ε̂ J̃ Γ_{U*}   Γ_{U*V*}
      Γ_{V*U*}        σ²_ε̂ J̃ Γ_{V*} ],   (27)

where, using Equation (25), we have

σ²_ε̂ J̃ = ( k*(ν, T) × MAD_ε̂ )² η⁻² (ν+1)/(ν+3).

(For now we are not interested in the scale parameter of the error term since, as Equation (26) shows, η does not affect k.)

Figure 4: Empirical density function of k(ν = 1.5, T = 500), obtained using 700,000 replications.
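The construction of k* can be sketched as follows. This is our own code: the Tukey-fence factor 1.5 and the histogram-based mode are stand-ins for the trimming constant and the kernel-density step described above, which the paper does not fully specify:

```python
import numpy as np

def mad(x):
    """Median absolute deviation of the residuals, eq. (24)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def k_draws(nu, T, n_rep=2000, seed=0):
    """Monte Carlo draws of k(nu, T) = sigma_hat / MAD, eq. (26).
    Unit scale is enough, since eta cancels in the ratio."""
    rng = np.random.default_rng(seed)
    ks = np.empty(n_rep)
    for i in range(n_rep):
        eps = rng.standard_t(nu, size=T)
        ks[i] = eps.std(ddof=1) / mad(eps)
    return ks

def k_star(ks, bins=60):
    """Trim draws outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR], then take the
    midpoint of the fullest histogram bin as an estimate of the mode."""
    q1, q3 = np.percentile(ks, [25, 75])
    fence = 1.5 * (q3 - q1)
    kept = ks[(ks >= q1 - fence) & (ks <= q3 + fence)]
    counts, edges = np.histogram(kept, bins=bins)
    j = int(np.argmax(counts))
    return 0.5 * (edges[j] + edges[j + 1])
```

As a sanity check, for large ν the draws concentrate near the Gaussian value k ≈ 1.48 of Rousseeuw et al. (1993), while for heavy tails the sample standard deviation inflates relative to the MAD and the mode is larger.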
So far we have seen that, for mixed causal and noncausal models with an innovation term distributed according to a generalized Student's t-distribution, it is not possible to derive the theoretical limiting distribution in the heavy-tail framework. This section focuses on identifying, through Monte Carlo experiments, which of the aforementioned estimators of the standard errors satisfies Equation (8) in finite samples. As previously stated, we also include in the analysis the standard errors obtained from the observed Fisher information matrix I(φ̂, ϕ̂).

For this purpose, we run several Monte Carlo simulations characterized by N = 10000 replications each. The data generating process is an MAR(1,1) with a scale parameter η = 3 and sample sizes T = (100, 200, 500, 1000). We also consider several degrees of freedom ν, both above and below 2, and different combinations of causal and noncausal coefficients, that is:

• φ = 0, ϕ = 0;
• φ = 0.65, ϕ = 0.35;
• φ = 0.5, ϕ = 0.5;
• φ = 0.35, ϕ = 0.65.

For each replication we test whether the estimated causal and noncausal coefficients are equal to their respective true values. In particular, we compute two different t-tests, H₀: φ = φ₀ and H₀: ϕ = ϕ₀, against the two-sided alternatives φ ≠ φ₀ and ϕ ≠ ϕ₀, respectively. Tables 1-14 show the empirical rejection frequencies (at the nominal significance level of 5%) obtained using the different methodologies to compute the standard errors. In particular, the columns Σ̄̂ and I(φ̂, ϕ̂) indicate the empirical rejection frequencies whenever the standard errors are obtained from the matrices (27) and (23), respectively.

We observe that for ν > 2 the rejection frequencies obtained with (23) remain well above 5%, even though the Student's t-distribution is characterized by tails fatter than a standard normal distribution. The reason is that in the denominator of the t-test we have underestimated standard errors (see Section 3.3). We also observe that our new approach and the observed Fisher information matrix show smaller distortions in small sample sizes (T = 100, T = 200) than the expected Fisher information matrix; the latter only gets close to the 5% nominal rejection frequency for T = 1000. For ν ≤ 2, the conditional expected Fisher information matrix (23) performs better than the observed one, but the results are still far from those that we would have obtained in the case of a standard normal distribution. Our new approach (Σ̄̂) is the only one that allows us to empirically satisfy Equation (8). Moreover, this new method provides slightly smaller empirical rejection frequencies for high values of the causal and noncausal coefficients. This is not an issue of great relevance in terms of inference, as high values are likely significantly different from zero.

We illustrate the differences and the similarities in the computed standard errors of MAR models on four time series.
These are: (a) the annual debt to GDP ratio for Canada from 1870 to 2015 (source:IMF), (b) the variation of daily Covid-19 deaths in Belgium from 10/March/2020 to 17/July/2020(source: WHO), (c) the monthly wheat prices from January 1990 until September 2020 (source:IMF) and (d) the monthly inflation rate in Brazil (obtained from year to year difference on IPCAindex ) observed from January 1997 to June 2020 (source: Central Bank of Brazil). Figure 5presents the data. With this panel of applications we want to show that MAR models are alsointeresting for modeling other series than the usual commodity prices.The way to estimate MAR models imply a series of steps. We first estimate a conventionalcausal autoregressive model by OLS in order to obtain the lag order p using information criteria(see Lanne and Saikkonen (2011)). We find p = 2 for three out of the four series, namely forinflation, debt to GDP ratio and wheat prices whereas p = 4 is chosen for Belgian’s Covid series.Using an AML approach and searching for the r and s with p = r + s that maximize the generalizedStudent’s t likelihood function, Canadian debt/GDP, wheat prices as well as Brazilian inflationfollow a MAR(1,1) and the variation of Covid-19 deaths a MAR(2,2). We detail next the valueof estimated parameters and their standard errors obtained using methods reviewed and newlyintroduced in this paper.From our simulation results, we can expect some differences and similarities given the degreesof freedom estimated for the four variables: for the Canadian series ˆ ν = 2 .
37, for ‘Covid-19 dataˆ ν = 1 .
17, on wheat prices ˆ ν = 2 .
21 and ˆ ν = 3 .
22 for Brazilian inflation. Although we observefat tails in each series, it is only on daily Belgian data that the the degree of freedom is below 2.However, none of them is significantly different from two. To check this, we use the standard errorsgiven by − ( T − p ) − δ l T ( ˆ φ , ˆ ϕ , θ ) /δ θ δ θ (cid:48) with ˆ θ = (ˆ ν, ˆ η ), being a consistent estimator of theexpected Fisher information matrix of the distributional parameters Ω (see Lanne and Saikkonen(2011)). This matrix, unlike Σ, has no restrictions and can be computed also when the population The IPCA targets population families with household income ranging from 1 to 40 minimum wages. Thisincome range guarantees a 90% coverage of families living in 13 geographic zones: metropolitan areas of Bel´em,Fortaleza, Recife, Salvador, Belo Horizonte, Vit´oria, Rio de Janeiro, S˜ao Paulo, Curitiba, Porto Alegre, as well as theFederal District and the cities of Goiˆania and Campo Grande. Basket items include Food and Beverages, Housing,Household Articles, Wearing Apparel, Transportation, Health and Personal Care, Personal Expenses, Education andCommunication. Empirical rejection frequencies - MAR(1,1): φ = 0 , ϕ = 0 , ν = 3 Sample size ˆΣ I ( (cid:98) φ, (cid:98) ϕ ) I ( (cid:98) φ, (cid:98) ϕ ) ˆ¯Σ φ ϕ φ ϕ φ ϕ φ ϕ T=100 24.26% 23.64% 12.08% 11.70% 18.36% 18.45% 9.52% 9.44%T=200 14.93% 15.43% 8.43% 8.77% 14.61% 15.33% 7.21% 7.53%T=500 8.98% 9.35% 5.82% 6.65% 12.14% 12.25% 5.26% 5.76%T=1000 7.17% 7.28% 5.50% 5.48% 10.50% 10.73% 4.74% 4.56%
Table 1: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
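The quantity tabulated here and in the following tables is an empirical rejection frequency: the share of Monte Carlo replications whose t-ratio, (estimate minus true value) divided by its standard error, falls outside ±1.96. A minimal sketch of that computation, assuming studentized estimates as inputs (the function name and the illustrative normal draws are ours, not the paper's):

```python
import numpy as np

def rejection_frequency(estimates, std_errors, true_value, crit=1.96):
    """Fraction of replications whose t-ratio (theta_hat - theta0)/se
    falls outside the interval [-crit, +crit]."""
    t_ratios = (np.asarray(estimates) - true_value) / np.asarray(std_errors)
    return np.mean(np.abs(t_ratios) > crit)

# Illustration: if the t-ratios were exactly standard normal, the
# rejection frequency should be close to the nominal 5% level.
rng = np.random.default_rng(0)
draws = rng.standard_normal(100_000)            # plays the role of the estimates
freq = rejection_frequency(draws, np.ones_like(draws), 0.0)
```

A well-calibrated standard-error estimator should drive this frequency toward 5% as T grows, which is the pattern the tables are meant to reveal.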
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 3

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100       22.37%   22.36%   10.52%   10.83%   16.85%   16.53%    8.21%    8.20%
T=200       13.91%   13.90%    6.96%    7.77%   12.55%   13.31%    4.47%    4.80%
T=500        8.22%    8.42%    5.47%    5.89%   10.26%   10.27%    4.76%    5.16%
T=1000       7.38%    6.95%    5.65%    5.37%    9.66%    9.37%    4.82%    4.27%

Table 2: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 3

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100       23.50%   23.80%   11.33%   11.16%   17.44%   17.35%    8.53%    8.58%
T=200       14.60%   14.97%    7.65%    8.20%   14.26%   14.65%    5.19%    5.21%
T=500        8.82%    8.95%    5.85%    6.25%   11.55%   11.91%    5.08%    5.36%
T=1000       7.30%    7.04%    5.35%    5.32%   10.64%   10.25%    4.54%    4.44%

Table 3: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 3

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100       23.12%   22.34%   10.80%   10.47%   16.34%   16.75%    8.33%    8.17%
T=200       13.47%   13.76%    7.40%    6.80%   12.44%   13.06%    6.30%    6.41%
T=500        8.68%    8.66%    5.80%    9.97%   10.47%   10.48%    5.12%    5.08%
T=1000       7.07%    6.52%    5.37%    5.21%    9.30%    9.07%    4.41%    4.10%

Table 4: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.

Empirical rejection frequencies - MAR(1,1): φ = 0, ϕ = 0, ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     13.32%   13.56%   12.20%   12.63%    6.18%    6.41%
T=200          /        /     10.50%   11.08%    9.62%   10.21%    5.18%    5.73%
T=500          /        /      9.55%    9.81%    8.73%    8.67%    4.71%    4.74%
T=1000         /        /      8.87%    9.58%    7.88%    8.68%    4.18%    4.89%

Table 5: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
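From Table 5 onward the designs set ν below 2, so the error term has a finite mean but an infinite population variance (which is why the Σ̂ columns are not available). Draws from such a generalized Student's t can be generated by rescaling a standard Student's t variate by the scale parameter η; a small sketch under that assumption (the function name is ours):

```python
import numpy as np

def generalized_t_sample(nu, eta, size, rng=None):
    """Draws from a Student's t with nu degrees of freedom rescaled by
    the scale parameter eta.  For 1 < nu <= 2 the draws have a finite
    mean but an infinite population variance."""
    rng = np.random.default_rng() if rng is None else rng
    return eta * rng.standard_t(nu, size=size)

# Error term for a heavy-tail Monte Carlo design such as those above.
rng = np.random.default_rng(1)
eps = generalized_t_sample(nu=1.5, eta=1.0, size=1000, rng=rng)
```

Because the population variance does not exist for these draws, sample second moments diverge as the sample grows, which is exactly the situation in which the classical information-matrix estimators break down.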
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     10.06%   11.82%   10.93%   11.41%    5.54%    5.73%
T=200          /        /      7.38%    9.64%    7.82%    9.28%    4.36%    5.04%
T=500          /        /      6.72%    8.36%    7.18%    8.10%    3.74%    4.56%
T=1000         /        /      6.53%    8.83%    6.81%    8.32%    3.77%    4.45%

Table 6: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     11.24%   11.18%   11.72%   11.54%    5.58%    6.08%
T=200          /        /      8.59%    8.87%    8.81%    9.14%    4.58%    4.74%
T=500          /        /      7.33%    7.45%    7.63%    7.67%    3.99%    4.08%
T=1000         /        /      7.29%    8.37%    7.34%    7.96%    4.04%    4.41%

Table 7: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     12.29%    9.95%   11.32%   11.11%    5.86%    5.64%
T=200          /        /      9.45%    7.77%    9.10%    8.04%    4.71%    4.44%
T=500          /        /      8.29%    6.74%    7.77%    7.17%    4.18%    3.90%
T=1000         /        /      7.92%    7.19%    7.72%    7.43%    4.33%    4.13%

Table 8: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.

Empirical rejection frequencies - MAR(1,1): φ = 0, ϕ = 0, ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     16.72%   16.19%   12.63%   12.90%    5.81%    5.82%
T=200          /        /     13.93%   14.63%   11.06%   11.31%    5.35%    5.37%
T=500          /        /     12.65%   12.84%   10.32%   10.10%    5.41%    4.84%
T=1000         /        /     11.92%   11.99%    9.62%    9.46%    4.97%    5.42%

Table 9: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     11.28%   14.71%   10.90%   12.08%    4.75%    5.13%
T=200          /        /      9.50%   12.47%    9.00%   10.33%    4.19%    5.01%
T=500          /        /      7.70%   11.20%    8.06%    9.67%    3.94%    4.56%
T=1000         /        /      7.63%   10.76%    7.15%    8.79%    3.61%    4.81%

Table 10: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     13.09%   13.13%   11.52%   11.87%    4.98%    5.18%
T=200          /        /     11.20%   10.76%    9.67%    9.63%    4.35%    4.80%
T=500          /        /      9.39%    9.95%    8.79%    8.91%    4.19%    4.15%
T=1000         /        /      9.04%    9.03%    7.61%    8.23%    3.93%    4.34%

Table 11: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     14.27%   11.48%   12.00%   10.98%    5.10%    4.70%
T=200          /        /     12.51%    9.09%   10.16%    8.54%    4.67%    4.27%
T=500          /        /     10.71%    8.39%    9.32%    8.47%    4.49%    3.96%
T=1000         /        /      9.96%    7.84%    8.23%    7.46%    4.23%    3.94%

Table 12: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.

Empirical rejection frequencies - MAR(1,1): φ = 0, ϕ = 0, ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     21.30%   21.22%   14.91%   14.34%    6.11%    6.16%
T=200          /        /     20.15%   19.66%   13.85%   14.08%    5.91%    6.24%
T=500          /        /     19.00%   18.45%   13.34%   12.93%    6.03%    5.28%
T=1000         /        /     18.18%   18.64%   12.95%   12.89%    5.62%    5.56%

Table 13: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     13.70%   18.38%   12.30%   13.39%    4.62%    5.69%
T=200          /        /     11.85%   16.86%   10.69%   13.02%    4.51%    5.59%
T=500          /        /     10.22%   15.17%    9.22%   11.57%    4.04%    4.26%
T=1000         /        /      9.69%    5.35%    8.61%   11.38%    3.53%    4.81%

Table 14: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     16.54%   16.12%   13.26%   12.65%    5.31%    5.09%
T=200          /        /     14.97%   13.93%   12.12%   12.00%    4.90%    4.91%
T=500          /        /     13.21%   12.87%   10.67%   10.72%    4.51%    3.98%
T=1000         /        /     12.02%   11.29%    9.82%   10.35%    3.86%    4.12%

Table 15: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
Empirical rejection frequencies - MAR(1,1): φ = 0., ϕ = 0., ν = 1.

Sample size      Σ̂                I(φ̂, ϕ̂)         I(φ̂, ϕ̂)         Σ̄̂
               φ        ϕ        φ        ϕ        φ        ϕ        φ        ϕ
T=100          /        /     19.05%   13.43%   14.26%   12.09%    5.79%    4.72%
T=200          /        /     17.39%   11.42%   12.92%    9.67%    5.27%    4.34%
T=500          /        /     15.54%    9.96%   11.59%    9.24%    5.05%    3.48%
T=1000         /        /     14.67%    9.88%   11.37%    9.22%    4.73%    3.63%

Table 16: Percentage of observations outside the interval [-1.96, +1.96]. This value is equal to 5% in a standard normal distribution.
[Figure 5 here. Panels: (a) Canadian government debt as % of GDP (1870-2015), annual data for the Canadian debt expressed as a percentage of GDP; (b) Covid-19 Belgium: variation of daily deaths (10/03/2020 - 17/07/2020), daily data for the variation of Covid-19 deaths in Belgium; (c) Wheat prices (1990:01-2020:09), monthly data; (d) Brazilian inflation (1997:01-2020:06), monthly data for the inflation rate in Brazil.]
Figure 5: Charts of the four time series covered by the empirical investigation.

The results are sensitive to the approach used to compute the standard errors. In the empirical application concerning the Canadian debt (Table 17), we obtain too narrow confidence intervals whenever we compute the standard errors using the Hecq et al. (2016) or the Lanne and Saikkonen (2011) methodology. Instead, in the table associated with the Brazilian inflation rate (Table 20), we notice that the standard errors obtained through the robust estimator of the residuals are larger (and consequently so are the confidence intervals) than those obtained by the "traditional" methodologies described in Section 3. The same is true for the causal coefficient in the wheat prices application (Table 19). For the noncausal coefficient of the same empirical application, we obtain too narrow confidence intervals when the standard error is computed by I(φ̂, ϕ̂) and by Σ̂. Finally, the time series of the variation of Covid-19 deaths in Belgium (Table 18) is characterized by an error term with an undefined variance. Our method yields narrower confidence intervals than those obtained using Hecq et al. (2016) and the observed Fisher information matrix.

Canadian debt expressed as % of GDP
                          Standard errors
Estimated coefficients    I(φ̂, ϕ̂)    I(φ̂, ϕ̂)    Σ̂           Σ̄̂
φ̂ = 0.6504               0.072153   0.031742   0.031260   0.048623
ϕ̂ = 0.8860               0.029278   0.012380   0.019083   0.029682
η̂ = 2.        ν̂ = 2.37

Table 17: Estimated coefficients and standard errors for the Canadian debt ratio.
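Standard errors such as those reported in Tables 17-20 can be based on the inverse of the negative Hessian of the log-likelihood at the ML estimate. The following is a generic finite-difference sketch, not the paper's exact implementation (the function names are ours, and the toy Gaussian likelihood only illustrates the mechanics):

```python
import numpy as np

def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            e_i = np.zeros(k); e_i[i] = h
            e_j = np.zeros(k); e_j[j] = h
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * h**2)
    return H

def hessian_std_errors(loglik, theta_hat):
    """Standard errors from the inverse of the negative Hessian of the
    log-likelihood evaluated at the ML estimate."""
    H = numerical_hessian(loglik, theta_hat)
    return np.sqrt(np.diag(np.linalg.inv(-H)))

# Toy check: for an i.i.d. N(mu, 1) log-likelihood with n = 400, the
# standard error of mu should be close to 1/sqrt(400) = 0.05.
x = np.random.default_rng(2).standard_normal(400)
loglik = lambda th: -0.5 * np.sum((x - th[0]) ** 2)
se = hessian_std_errors(loglik, np.array([x.mean()]))
```

In the MAR setting, `loglik` would be the generalized Student's t log-likelihood as a function of the parameters of interest, and the distinction between the estimators compared in this paper lies precisely in which (expected, observed, or sample-variance-based) curvature matrix is inverted.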
Covid-19 in Belgium: variation of daily deaths
                          Standard errors
Estimated coefficients    I(φ̂, ϕ̂)    I(φ̂, ϕ̂)    Σ̂    Σ̄̂
φ̂₁ = -0.4660             0.056116   0.028793   /    0.024285
φ̂₂ = -0.5853             0.033870   0.028793   /    0.024277
ϕ̂₁ = 0.0803              0.045516   0.025620   /    0.023881
ϕ̂₂ = 0.6037              0.033870   0.025619   /    0.023881
η̂ = 4.        ν̂ = 1.17

Table 18: Estimated coefficients and standard errors for the variation of daily Covid-19 deaths in Belgium.
Wheat prices
                          Standard errors
Estimated coefficients    I(φ̂, ϕ̂)    I(φ̂, ϕ̂)    Σ̂           Σ̄̂
φ̂ = 0.9241               0.007701   0.003549   0.007851   0.013969
ϕ̂ = 0.2866               0.051349   0.023949   0.019681   0.035019
η̂ = 6.        ν̂ = 2.21

Table 19: Estimated coefficients and standard errors for wheat prices.
Brazilian inflation rate
                          Standard errors
Estimated coefficients    I(φ̂, ϕ̂)    I(φ̂, ϕ̂)    Σ̂           Σ̄̂
φ̂ = 0.5842               0.036605   0.028383   0.038656   0.046492
ϕ̂ = 0.9385               0.009304   0.006895   0.016444   0.019777
η̂ = 0.        ν̂ = 3.22

Table 20: Estimated coefficients and standard errors for the inflation rate in Brazil.

In this paper we first reviewed the behaviour of the ML estimator for mixed causal and noncausal models, focusing in particular on those with an error term distributed according to a generalized Student's t-distribution. We have seen that the expected Fisher information matrix of the causal and noncausal parameters (derived by Lanne and Saikkonen (2011)) can be computed if and only if the probability density function satisfies a certain set of assumptions. The generalized Student's t-distribution with an infinite variance (ν ∈ (1, 2]) does not satisfy these assumptions.

The following table shows the different values of k that maximize their own empirical density functions according to the different values of T and ν selected for the Monte Carlo simulations presented in Section 5.

k*        ν = 1.      ν = 1.      ν = 1.      ν = 1.      ν = 1.      ν = 3
T=100     4.186322    3.317155    3.049654    2.866044    2.57295     1.937395
T=200     5.311298    3.901011    3.557615    3.2330488   2.85024     2.02271
T=500     7.266156    4.941986    4.297126    3.849296    3.233094    2.082257
T=1000    9.022733    5.839081    4.971029    4.330869    3.491673    2.116381

References
Alessi, L., Barigozzi, M., and Capasso, M. Non-fundamentalness in structural econometric models: A review. International Statistical Review 79, 1 (2011), 16-47.

Andrews, B., and Davis, R. A. Model identification for infinite variance autoregressive processes. Journal of Econometrics 172, 2 (2013), 222-234.

Andrews, B., Davis, R. A., and Breidt, F. J. Maximum likelihood estimation for all-pass time series models. Journal of Multivariate Analysis 97, 7 (2006), 1638-1659.

Breidt, F. J., Davis, R. A., Lii, K.-S., and Rosenblatt, M. Maximum likelihood estimation for noncausal autoregressive processes. Journal of Multivariate Analysis 36, 2 (1991), 175-198.

Cavaliere, G., Nielsen, H. B., and Rahbek, A. Bootstrapping noncausal autoregressions: with applications to explosive bubble modeling. Journal of Business & Economic Statistics 38, 1 (2020), 55-67.

Davis, R., and Resnick, S. Limit theory for moving averages of random variables with regularly varying tail probabilities. The Annals of Probability (1985), 179-195.

Davis, R. A., Knight, K., and Liu, J. M-estimation for autoregressions with infinite variance. Stochastic Processes and Their Applications 40, 1 (1992), 145-180.

Fries, S., and Zakoian, J.-M. Mixed causal-noncausal AR processes and the modelling of explosive bubbles (2017).

Hamilton, J. Time Series Analysis. Princeton University Press (1994).

Hecq, A., Issler, J. V., and Telg, S. Mixed causal-noncausal autoregressions with exogenous regressors. Journal of Applied Econometrics 35, 3 (2020), 328-343.

Hecq, A., Lieb, L., and Telg, S. Identification of mixed causal-noncausal models in finite samples. Annals of Economics and Statistics/Annales d'Économie et de Statistique, 123/124 (2016), 307-331.

Hecq, A., and Voisin, E. Predicting bubble bursts in oil prices using mixed causal-noncausal models. arXiv preprint arXiv:1911.10916 (2019).

Hecq, A., and Voisin, E. Forecasting bubbles with mixed causal-noncausal autoregressive models. Econometrics and Statistics (2020).

Lanne, M., and Saikkonen, P. Modeling expectations with noncausal autoregressions. Available at SSRN 1210122 (2008).

Lanne, M., and Saikkonen, P. Noncausal autoregressions for economic time series. Journal of Time Series Econometrics 3, 3 (2011).

Lanne, M., and Saikkonen, P. Noncausal vector autoregression. Econometric Theory (2013), 447-481.

Rousseeuw, P. J., and Croux, C. Alternatives to the median absolute deviation. Journal of the American Statistical Association 88, 424 (1993), 1273-1283.