Identifiability and Estimation of Possibly Non-Invertible SVARMA Models: A New Parametrisation

Abstract

This article deals with parameterisation, identifiability, and maximum likelihood (ML) estimation of possibly non-invertible structural vector autoregressive moving average (SVARMA) models driven by independent and non-Gaussian shocks. In contrast to previous literature, the novel representation of the MA polynomial matrix using the Wiener-Hopf factorisation (WHF) focuses on the multivariate nature of the model, generates insights into its structure, and uses this structure for devising optimisation algorithms. In particular, it allows parameterising the location of determinantal zeros inside and outside the unit circle, and it allows for MA zeros at zero, which can be interpreted as informational delays. This is highly relevant for data-driven evaluation of Dynamic Stochastic General Equilibrium (DSGE) models. Typically imposed identifying restrictions on the shock transmission matrix as well as on the determinantal root location are made testable. Furthermore, we provide low-level conditions for asymptotic normality of the ML estimator and analytic expressions for the score and the information matrix. As an application, we estimate the Blanchard and Quah model and show that our method provides further insights regarding non-invertibility using a standard macroeconometric model. These and further analyses are implemented in a well-documented R-package.


arXiv [econ.EM]

Bernd Funovits

Proposed Running Head

Possibly Non-Invertible SVARMA in WHF

Affiliation

University of Helsinki

Faculty of Social Sciences, Discipline of Economics, P. O. Box 17 (Arkadiankatu 7), FIN-00014 University of Helsinki

and

TU Wien

Institute of Statistics and Mathematical Methods in Economics, Econometrics and System Theory, Wiedner Hauptstr. 8, A-1040 Vienna

E-mail: bernd.funovits@helsinki.fi

Abstract

This paper deals with parameterisation, identifiability, and maximum likelihood (ML) estimation of possibly non-invertible structural vector autoregressive moving average (SVARMA) models driven by independent and non-Gaussian shocks. We introduce a new parameterisation of the MA polynomial matrix based on the Wiener-Hopf factorisation (WHF) and show that the model is identified in this parametrisation for a generic set in the parameter space (when certain just-identifying restrictions are imposed). When the SVARMA model is driven by Gaussian errors, neither the static shock transmission matrix nor the location of the determinantal zeros of the MA polynomial matrix can be identified without imposing further identifying restrictions on the parameters. We characterise the classes of observational equivalence with respect to second moment information at different stages of the modelling process. Subsequently, cross-sectional and temporal independence and non-Gaussianity of the shocks are used to solve these identifiability problems and identify the true root location of the MA polynomial matrix as well as the static shock transmission matrix (up to permutation and scaling). Typically imposed identifying restrictions on the shock transmission matrix as well as on the determinantal root location are made testable. Furthermore, we provide low-level conditions for asymptotic normality of the ML estimator. The estimation procedure is illustrated with various examples from the economic literature and implemented as an R-package.

Keywords: Non-invertibility, structural vector autoregressive moving-average models, non-Gaussianity, identifiability

JEL classification: C32, C51, E52

Introduction

Tracing out the response of variables of interest with respect to underlying economic shocks is part of almost every macroeconometric analysis. The main tool for generating this so-called impulse response function (IRF) is the structural vector autoregressive (SVAR) model. In this article, we will point out the deficiencies of SVAR models and suggest a superior alternative: possibly non-invertible SVARMA models.

If the error terms driving the economy are Gaussian or (cross-sectionally) uncorrelated (as opposed to independent), one has to resort to identifying restrictions (obtained from economic theory) in order to draw conclusions about the underlying shocks driving the economy, with respect to which we want to analyse system responses. An immense body of literature has therefore been dedicated to devising (mainly story-driven) identification strategies for the static shock transmission matrix in SVARs (Kilian and Lütkepohl, 2017, Chapter 4). Recently, Lanne et al. (2017) and Gouriéroux et al. (2017) have shown that SVAR models driven by independent non-Gaussian components are identified up to scaling and permutations, which makes the typically imposed identifying restrictions testable. In particular, infinitely many linear combinations of shocks generating the same second moments are reduced to a finite set of linear combinations generating the same distributional outcome. It is thus possible to employ a data-driven approach instead of a story-telling approach.

Deficiencies of IRF Analysis with SVAR Models.

While these data-driven SVAR identification and estimation strategies are a step forward, two deficiencies of SVAR models remain. First, it is known that complex dynamics are better approximated and described by SVARMA models (Hannan and Deistler, 2012). Especially in macroeconometrics, where data are sometimes available only at quarterly frequency, it is of paramount importance to use parsimoniously parameterised models (e.g. SVARMA models) for which the IRF and variance decompositions can be obtained straightforwardly. Poskitt (2016), Poskitt and Yao (2017), Raghavan et al. (2016), Athanasopoulos and Vahid (2008a), and Athanasopoulos and Vahid (2008b) provide ample evidence and make a strong case for using VARMA models instead of VAR models for econometric analysis. Second, SVAR models exclude a priori the existence of determinantal MA roots. This is especially problematic in structural economic environments where economic agents have more information available than outside observers (corresponding to determinantal MA roots inside the unit circle (Hansen and Sargent, 1991, page 86)). While the literature on SVAR models is abundant, the contributions regarding possibly non-invertible SVARMA models are easier to keep track of, see Gouriéroux et al. (2019) and references therein.

The Dynamic Identifiability Problem.

Extending the approach in Lanne et al. (2017) to invertible SVARMA models creates a well-understood source of possible non-identifiability in terms of possible non-coprimeness of the AR and MA matrix polynomials; the static identifiability problem concerning the static shock transmission matrix remains the same. However, when allowing for possibly non-invertible SVARMA models, a different and more difficult identifiability problem appears. The difficulty is due to the fact that (multivariate) spectral factorisation techniques are necessary to understand the structure of observational equivalence with the same second moment information. The recent contribution of Gouriéroux et al. (2019) provides an overview of estimation strategies (mainly in the case of one MA lag) and applications in macroeconomics and finance, and applies the results by Chan and Ho (2004); Chan et al. (2006) on unique representation of multivariate linear processes to derive identifiability of the possibly non-invertible SVARMA model. We focus here on a general treatment of the whole model class, provide a new parametrisation for the MA polynomial, show that this parametrisation is identifiable under different non-Gaussianity assumptions and (just-identifying) parameter restrictions, and provide low-level conditions on the true shock densities such that the ML estimator is asymptotically normal. Moreover, we characterise the classes of observational equivalence in terms of second moment information at different stages of the modelling process, i.e. from rational spectral density to spectral factors (or equivalently the IRF), from spectral factor to AR and MA polynomial and static shock transmission matrix, and finally from the MA matrix polynomial to the (without further assumptions in general non-unique) WHF.

Consequences of Dynamic Non-Identifiability for IRF.

To illustrate the importance of identifying the root location correctly, consider the example given in Gouriéroux et al. (2019), who refer to Lippi and Reichlin (1993). When the true model for productivity is given as $y_t = \varepsilon_t + b\varepsilon_{t-1}$, where $(\varepsilon_t)$ is an i.i.d. shock to productivity with variance equal to one, and the largest impact of a productivity shock is delayed, i.e. $b > 1$, we cannot reconstruct these shocks from present and past observed data (hence the term “non-invertibility”). Moreover, it is easy to see that the process $x_t = \eta_t + \frac{1}{b}\eta_{t-1}$, where $(\eta_t)$ is a white noise process with variance $b^2$, has the same autocovariance function (and spectral density) as $(y_t)$.

Software for possibly non-invertible VARMA estimation.

One (perceived) disadvantage of VARMA models is the increased complexity of the estimation procedure compared to VAR models. Two rebuttals are in order. First, there are many sophisticated (e.g. non-linear threshold) VAR models whose estimation is arguably more involved than that of VARMA models. Second, there are many stable and openly available software implementations which should put the complexities of estimating VAR and VARMA models on an equal footing. Examples of implementations in the R software environment (R Core Team, 2019) are Scherrer and Funovits (2020b), Tsay (2013); Tsay and Wood (2018) and Gilbert (2015), see also Scherrer and Deistler (2019) for a comparison and further comments on the latter packages, and in MATLAB Gomez (2015, 2016). This article is accompanied by an R-package which implements the developed methods and contains various worked examples from the economic literature. It can be downloaded from https://github.com/bfunovits/.
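As a small numerical aside to the productivity example above (and independent of the packages just cited), the following Python sketch checks that the MA(1) process with coefficient $b$ and unit shock variance shares its autocovariances and spectral density with the MA(1) process with coefficient $1/b$ and shock variance $b^2$. The value $b = 2$ and the helper names are illustrative assumptions; they are not taken from the paper or its R-package.

```python
import numpy as np

def ma1_autocov(theta, sigma2):
    """Autocovariances (gamma_0, gamma_1) of x_t = e_t + theta * e_{t-1}, Var(e_t) = sigma2."""
    return sigma2 * (1.0 + theta**2), sigma2 * theta

def ma1_spectrum(theta, sigma2, lam):
    """Spectral density (up to the 1/(2*pi) factor) evaluated at z = exp(-i*lam)."""
    z = np.exp(-1j * lam)
    return sigma2 * np.abs(1.0 + theta * z) ** 2

b = 2.0  # illustrative value with b > 1 (non-invertible case)
lam = np.linspace(-np.pi, np.pi, 201)

gamma_noninv = ma1_autocov(b, 1.0)       # y_t = eps_t + b * eps_{t-1}, Var(eps_t) = 1
gamma_inv = ma1_autocov(1.0 / b, b**2)   # x_t = eta_t + (1/b) * eta_{t-1}, Var(eta_t) = b^2
same_spectrum = np.allclose(ma1_spectrum(b, 1.0, lam),
                            ma1_spectrum(1.0 / b, b**2, lam))

# The MA root -1/theta lies inside the unit circle for y_t and outside for x_t.
root_noninv, root_inv = -1.0 / b, -b
print(gamma_noninv, gamma_inv, same_spectrum)
```

Second moments therefore cannot distinguish the two root locations, although the implied impulse responses $(1, b)$ and $(1, 1/b)$ differ.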

Outline.

The rest of the paper is structured as follows. In section 2, the SVARMA model and the WHF parametrisation are introduced and the latter is shown to be unique under certain parameter restrictions. In section 3, the identifiability problem is analysed and the classes of observational equivalence with respect to second moment information are characterised. Moreover, the (static and dynamic) identifiability result is stated and proved, and an identification scheme for selecting a particular signed permutation is presented. In section 4, the maximum likelihood (ML) estimator is derived and shown to be consistent and asymptotically normal. Detailed illustrations are contained in the associated R-package. The Appendix contains results on zeros and poles at infinity of rational matrices, details on the (non-)uniqueness of the WHF, and derivations regarding asymptotic normality of the ML estimator.

Notation.

We use $z$ as a complex variable as well as the backward shift operator on a stochastic process, i.e. $z(y_t)_{t\in\mathbb{Z}} = (y_{t-1})_{t\in\mathbb{Z}}$, and define $i = \sqrt{-1}$. The transpose of an $(m \times n)$-dimensional matrix $A$ is represented by $A'$. For the sub-matrix of $A$ consisting of rows $m_1$ to $m_2$, $1 \le m_1 \le m_2 \le m$, we write $A_{[m_1:m_2,\bullet]}$ and analogously $A_{[\bullet,n_1:n_2]}$ for the sub-matrix of $A$ consisting of columns $n_1$ to $n_2$, $1 \le n_1 \le n_2 \le n$. The column-wise vectorisation of $A \in \mathbb{R}^{m\times n}$ is denoted by $\mathrm{vec}(A) \in \mathbb{R}^{mn\times 1}$ and for a square matrix $B \in \mathbb{R}^{n\times n}$ we denote with $\mathrm{vecd}^{\circ}(B) \in \mathbb{R}^{n(n-1)}$ the vectorisation where the diagonal elements of $B$ are left out. The $n$-dimensional identity matrix is denoted by $I_n$, an $n$-dimensional diagonal matrix with diagonal elements $(a_1, \dots, a_n)$ is denoted by $\mathrm{diag}(a_1, \dots, a_n)$, and the inequality “$> 0$” means positive definiteness in the context of matrices. The column vector $\iota_i$ has a one at position $i$ and zeros everywhere else. The expectation of a random variable with respect to a given probability space is denoted by $E(\cdot)$. Convergence in probability and in distribution are denoted by $\xrightarrow{p}$ and $\xrightarrow{d}$, respectively. Partial derivatives $\left.\frac{\partial f(x)}{\partial x}\right|_{x=x_0}$ of a real-valued function $f(x)$ evaluated at a point $x_0 \in \mathbb{R}^k$ are denoted by $f_x(x_0)$ and considered columns.

Model

We start from an $n$-dimensional VARMA system
$$\underbrace{\left(I_n - a_1 z - \cdots - a_p z^p\right)}_{=a(z)}\, y_t = \underbrace{\left(I_n + b_1 z + \cdots + b_q z^q\right)}_{=b(z)}\, B\varepsilon_t, \qquad a_i, b_i \in \mathbb{R}^{n\times n}. \tag{1}$$
The shocks $(\varepsilon_t)_{t\in\mathbb{Z}}$ driving the system are identically and independently distributed (i.i.d.) across time, have zero mean, and diagonal covariance matrix $\Sigma^2$ with positive diagonal elements $\sigma_i^2$, whose positive square root is in turn denoted by $\sigma_i$. To simplify presentation, we also introduce the column vector $\sigma = (\sigma_1, \dots, \sigma_n)'$ and $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n)$, as well as $x'_{t-1} = \left(y'_{t-1}, \dots, y'_{t-p}\right)$ and $w'_{t-1} = \left(\varepsilon'_{t-1}B', \dots, \varepsilon'_{t-q}B'\right)$ such that equation (1) can be written as
$$y_t = (a_1, \dots, a_p)\, x_{t-1} + (b_1, \dots, b_q)\, w_{t-1} + B\varepsilon_t.$$
We assume that the stability condition
$$\det(a(z)) \neq 0, \quad |z| \le 1, \tag{2}$$
holds, that there are no determinantal zeros of $b(z)$ on the unit circle, i.e.
$$\det(b(z)) \neq 0, \quad |z| = 1, \tag{3}$$
and that $B$ is invertible and has ones on its diagonal. Furthermore, we assume that the polynomial matrices $a(z)$ and $b(z)$ are left-coprime, that $a_p$ and $b_q$ are non-zero, and that $(a_p, b_q)$ is of full rank.

Remark. An assumption similar to the full rank assumption on $(a_p, b_q)$ seems to be missing in Gouriéroux et al. (2019). While assuming coprimeness reduces the equivalence class of SVARMA models $(a(z), b(z)B)$ that generate the same transfer function $k(z) = a(z)^{-1}b(z)B$, it is not sufficient to guarantee that the equivalence class is a singleton. For example, if $u\,(a_p, b_q) = 0$ and $u \neq 0$, then for $\tilde{u}(z) = I_n + uz$ the pair $(\tilde{u}(z)a(z), \tilde{u}(z)b(z)B)$ is another realisation of the transfer function $k(z) = (\tilde{u}(z)a(z))^{-1}(\tilde{u}(z)b(z)B)$ which satisfies all requirements on the parameter space.

The stationary solution $(y_t)_{t\in\mathbb{Z}}$ of the system (1) is called an ARMA process.
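Conditions (2) and (3) can be checked numerically through companion matrices, since the non-zero eigenvalues of the companion matrix are the reciprocals of the determinantal zeros. The sketch below is a minimal Python illustration of that standard equivalence; the function names and the bivariate coefficient matrices are made up for this example and are not part of the paper or its R-package.

```python
import numpy as np

def companion(coefs):
    """Companion matrix of I - c_1 z - ... - c_m z^m for a list of (n x n) arrays c_i."""
    n, m = coefs[0].shape[0], len(coefs)
    top = np.hstack(coefs)
    if m == 1:
        return top
    bottom = np.hstack([np.eye(n * (m - 1)), np.zeros((n * (m - 1), n))])
    return np.vstack([top, bottom])

def det_zeros(coefs, tol=1e-12):
    """Finite zeros of det(I - c_1 z - ... - c_m z^m): reciprocals of non-zero eigenvalues."""
    eig = np.linalg.eigvals(companion(coefs))
    return 1.0 / eig[np.abs(eig) > tol]

def is_stable(a):
    """Condition (2): det a(z) != 0 for |z| <= 1, with a(z) = I - a_1 z - ... - a_p z^p."""
    return bool(np.all(np.abs(det_zeros(a)) > 1.0))

def ma_zero_count(b, tol=1e-8):
    """Zeros of det b(z), b(z) = I + b_1 z + ... + b_q z^q, inside/on/outside the unit circle."""
    mod = np.abs(det_zeros([-bi for bi in b]))
    return (int(np.sum(mod < 1 - tol)),
            int(np.sum(np.abs(mod - 1) <= tol)),
            int(np.sum(mod > 1 + tol)))

# Hypothetical VARMA(1,1) coefficients, n = 2
a1 = np.array([[0.5, 0.1], [0.0, 0.3]])
b1 = np.array([[2.0, 0.0], [0.0, 0.5]])   # det b(z) = (1 + 2z)(1 + 0.5z)

stable = is_stable([a1])
inside, on_circle, outside = ma_zero_count([b1])
print(stable, inside, on_circle, outside)
```

In this made-up system the AR part is stable and $\det(b(z))$ has one zero inside and one outside the unit circle, so condition (3) holds while the MA part is non-invertible.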
Footnote: Determinantal zeros of $b(z)$ on the unit circle correspond to unit canonical correlations between the future $(y_t, y_{t+1}, \dots)$ and the past $(y_{t-1}, y_{t-2}, \dots)$ of a stationary stochastic process (Hannan and Poskitt, 1988). Therefore, it seems reasonable to exclude this case from the analysis.

Footnote: Two matrix polynomials are called left-coprime if $(a(z), b(z))$ is of full row rank for all $z \in \mathbb{C}$. For equivalent definitions see Hannan and Deistler (2012), Lemma 2.2.1 on page 40.

Footnote: The stability, coprimeness, and full-rank assumptions on the parameters in $a(z)$ and $b(z)$ could be relaxed. The full rank assumption on $(a_p, b_q)$ is over-identifying in the sense that some rational transfer functions cannot be parameterised by any VARMA(p,q) system which satisfies this assumption, see Hannan (1971) or Hannan and Deistler (2012), Chapter 2.7 on page 77. To solve this problem, one could consider the parameter space where the column degrees of $(a(z), b(z))$ are fixed to be $(p_1, \dots, p_n, q_1, \dots, q_n)$ as in Deistler (1983) or Hannan and Deistler (2012, Chapter 2.7). Be that as it may, we impose slightly stronger assumptions to strike a balance between notational complexity and generality, and to focus on the essential part of this contribution: using non-Gaussianity to reduce the equivalence class of stable SVARMA models which generate the same second moments.

Parametrisation using the Wiener-Hopf Factorisation

The following parametrisation of the MA polynomial matrix $b(z)$ is useful for gaining structural insights into the behaviour of the system and for deriving asymptotic properties and analytic expressions for the score, the information matrix, and the Hessian of the ML estimator.
Every $b(z) = I_n + b_1 z + \cdots + b_q z^q$ without zeros on the unit circle can be represented as a product of a backward, a shift, and a forward part such that $b(z) = p(z)s(z)f(z)$, where the polynomial matrix $p(z) = p_0 + p_1 z + \cdots + p_{q_p} z^{q_p}$ has no zeros inside or on the unit circle, $s(z)$ is a diagonal matrix with diagonal entries of the form $z^{\kappa_i}$, where $\kappa_1 \ge \cdots \ge \kappa_n$ holds for the so-called partial indices $\kappa_i \in \mathbb{Z}$, and $f(z) = f_0 + f_1 z^{-1} + \cdots + f_{q_f} z^{-q_f}$ has no zeros or poles outside the unit circle; in particular, it has no zeros or poles at infinity.

Finite and Infinite Zeros and Poles of a Rational Matrix.

Here we provide simple definitions of finite and infinite zeros and poles of a square matrix $R(z)$ whose elements are rational functions and whose determinant is not identically zero. While these definitions suffice for understanding the factorisation mentioned above, we will discuss different definitions of finite and infinite zeros and poles (in a more general setting) in the Appendix.

A finite pole of $R(z)$ at $z_0 \in \mathbb{C}$ is defined as a point for which an element of $R(z)$ has a pole. At points where $R(z)$ does not have a pole, $R(z)$ has a finite zero at $z_0$ if and only if $\det(R(z_0)) = 0$. More generally, $R(z)$ has a zero at $z_0$ if and only if $R(z)^{-1}$ has a pole at $z_0$.

Regarding the point at infinity, $R(z)$ has a pole at infinity if any element is unbounded when $|z| \to \infty$, or equivalently, if $R\left(\frac{1}{z}\right)$ has a pole at zero. If there is no pole at infinity, it has a zero at infinity if and only if the determinant of $R\left(\frac{1}{z}\right)$ is zero when evaluated at zero. Otherwise, $R(z)$ has a zero at infinity if and only if any element of $\left(R\left(\frac{1}{z}\right)\right)^{-1}$ has a pole at zero.

Notice that $f(z)$ having no pole at infinity implies that $\left.f\left(\frac{1}{z}\right)\right|_{z=0}$ is finite (or equivalently that $\lim_{|z|\to\infty} f(z)$ is finite). Moreover, $f(z)$ not having infinite zeros implies that $\left.f\left(\frac{1}{z}\right)\right|_{z=0} = f_0$ is of full rank.

Existence of the WHF.

The factorisation of $b(z)$ into $(p(z), s(z), f(z))$ is known as the Wiener-Hopf factorisation (WHF) (Clancey and Gohberg, 1981, Chapter I), (Gohberg et al., 2003), see also Onatski (2006); Al-Sadoon (2018); Al-Sadoon and Zwiernik (2019) for its use in rational expectations models. The WHF exists in more general cases than required for the representation of $b(z)$ described above: every rational matrix function without determinantal zeros on the unit circle admits a WHF [...] $M(z)$ could be singular when evaluated at $z = 0$. Consequently, starting from an MA polynomial in WHF can be considered slightly more general than starting from an MA matrix polynomial $b(z)$ with $b(0) = I_n$.

Footnote: In the univariate case, a polynomial of degree $d$ has $d$ poles at infinity. There are different definitions for zeros at infinity. In the Appendix, we will define and discuss poles and zeros at infinity for rational matrices via the Smith-McMillan form of a rational matrix.

Footnote: In system theory, a rational matrix function satisfying $\lim_{|z|\to\infty} R(z) < \infty$ or $\lim_{|z|\to\infty} R(z) = 0$ is called proper or strictly proper, respectively. The latter is often used for finding a system realisation of the transfer function since it is easy to build a state space system $(A, B, C)$ from a strictly proper $R(z) = C(zI - A)^{-1}B$ and subsequently obtain a proper one as $\left(C(zI - A)^{-1}B + I\right)D$.

Uniqueness of the WHF.

While the WHF is not unique, the non-uniqueness can be tamed with reasonable effort for the cases relevant to us. The relevant cases are the ones where the first $k$ partial indices are equal to $\kappa + 1$ and the last $(n - k)$ ones are equal to $\kappa$. We will denote this by $(\kappa, k)$, $0 \le \kappa \le q$ and $k \in \{0, \dots, n-1\}$. In the case $(\kappa, 0)$, the WHF is essentially unique in the sense that the equivalence class of WHFs for $b(z)$ is parametrised by the set of non-singular matrices of dimension $(n \times n)$. In particular, requiring that $p(0) = I_n$ results in a unique WHF of $b(z)$. In the case $(\kappa, k)$, $k \neq 0$, the equivalence class of WHFs for $b(z)$ is parametrised by the block upper triangular unimodular matrices $u(z)$ for which $u_{[k+1:n,\,1:k]}(z) = 0$, the diagonal blocks are constant, and the degree of $u_{[1:k,\,k+1:n]}(z)$ is at most one. Generically, one can choose a canonical representative of a simple form by restricting certain parameters to zero and one (which is easily implementable). For the construction of this canonical representative we refer to the Appendix.

Generic WHF.

The reason for considering the above cases as the relevant ones is the following. It is a generic property of the parameter space (for which $\det(b(z)) \neq 0$ for $|z| = 1$ holds and which is endowed with the relative topology of the $qn^2$-dimensional Euclidean space) for the MA polynomial matrix $b(z)$ that the difference between the largest and the smallest partial index is smaller than two (Gohberg and Krein, 1960), (Gohberg et al., 2003, Section 1.5), (Al-Sadoon, 2018, Supplementary Appendix). We summarise this in

Theorem 1.

Every matrix polynomial $b(z) = I_n + b_1 z + \cdots + b_q z^q$ without determinantal zeros on the unit circle and whose parameter space is the open subset $\bigcup_{|z|=1} \left\{ (b_1, \dots, b_q) \in \mathbb{R}^{n^2 q} \mid \det(b(z)) \neq 0 \right\}$ can generically be factorised as $b(z) = p(z)s(z)f(z)$, where $p(z)$ has no zeros or poles inside or on the unit circle, $s(z) = \mathrm{diag}\left(z^{\kappa+1}, \dots, z^{\kappa+1}, z^{\kappa}, \dots, z^{\kappa}\right)$ with $(\kappa, k)$ such that $0 \le \kappa \le q$ and $k \in \{0, \dots, n-1\}$, and $f(z)$ has no zeros or poles outside or on the unit circle. There are $n \cdot \kappa + k$ zeros inside the unit circle and $\deg(\det(b(z))) - (n \cdot \kappa + k)$ zeros outside the unit circle. In the case $k = 0$, $p(z) = I_n + p_1 z + \cdots + p_{q-\kappa} z^{q-\kappa}$ and $f(z) = f_0 + f_1 z^{-1} + \cdots + I_n z^{-\kappa}$. In the case $k \neq 0$, we have that $\deg\left(p_{[\bullet,1:k]}(z)\right) = q - \kappa - 1$, $\deg\left(p_{[\bullet,k+1:n]}(z)\right) = q - \kappa$, $p_0 = \begin{pmatrix} I_k & p_{0,12} \\ 0 & I_{n-k} \end{pmatrix}$, $p_{1,12} = 0_{k \times (n-k)}$, and that $f(z) = f_0 + f_1 z^{-1} + \cdots + f_{\kappa+1} z^{-(\kappa+1)}$ where $\begin{pmatrix} f_{\kappa+1,[1:k,\bullet]} \\ f_{\kappa,[k+1:n,\bullet]} \end{pmatrix} = \begin{pmatrix} I_k & -p_{0,12} \\ 0 & I_{n-k} \end{pmatrix}$ and $f_{\kappa+1,[k+1:n,\bullet]} = 0$.

For derivations, we refer to the Appendix. In the Appendix, we also discuss the relation of the rep- [...]

Footnote: A property is generic if it holds on a superset of an open and dense set, see Anderson et al. (2016) and references therein.

Footnote: The subset of $\mathbb{R}^{n^2 q}$ on which $\det(b(z))$ has no zeros on the unit circle is open because it is the union of the open sets $\left\{ (b_1, \dots, b_q) \in \mathbb{R}^{n^2 q} \mid \det(b(z)) \neq 0 \right\}$ for all $z$ on the unit circle.

Factorisation involving polynomials in $z$.

For obtaining formulae in connection with asymptotic behaviour of the maximum likelihood estimator, it is advantageous to consider the factorisation $b(z) = p(z)g(z)$, where $g(z) := s(z)f(z)$.
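A worked scalar case may help to fix ideas. The following derivation is an illustration only, assuming $n = 1$, $q = 1$ and an illustrative coefficient $\beta$ that does not appear in the paper; it simply instantiates the formulas of Theorem 1 for $k = 0$.

```latex
% Scalar MA(1): b(z) = 1 + \beta z with its zero at z_0 = -1/\beta.
\begin{align*}
|\beta| < 1 &: \quad (\kappa, k) = (0, 0), \quad p(z) = 1 + \beta z, \quad s(z) = 1, \quad f(z) = 1, \\
|\beta| > 1 &: \quad (\kappa, k) = (1, 0), \quad p(z) = 1, \quad s(z) = z, \quad f(z) = \beta + z^{-1}.
\end{align*}
% Check for |\beta| > 1:
\begin{align*}
p(z)\, s(z)\, f(z) &= z \left( \beta + z^{-1} \right) = 1 + \beta z = b(z), \\
f(z) &= \frac{\beta z + 1}{z}
  \quad \text{(zero at } -1/\beta \text{ and pole at } 0 \text{, both inside the unit circle; } f_0 = \beta \neq 0 \text{)}, \\
g(z) &= s(z) f(z) = 1 + \beta z, \qquad g_0 = 1 .
\end{align*}
```

In the case $|\beta| > 1$ there is $n \cdot \kappa + k = 1$ zero inside the unit circle, in line with the count in Theorem 1, and $g(z)$ has degree $\kappa = 1$ with zero-lag coefficient equal to one.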
Important properties of $g(z)$ are the non-singularity of the zero-lag coefficient, the non-singular row-end matrix (see the Appendix for a definition), and the row degrees of $\kappa + 1$ of the first $k$ rows, and $\kappa$ of the last $n - k$ rows. Furthermore, notice that $g_0$ is the identity in the case $(\kappa, 0)$.

Identifiability Analysis

We follow Rothenberg (1971) to define identifiability of parametric models. The external characteristic of the stationary solution $(y_t)_{t\in\mathbb{Z}}$ of (1) is the probability distribution function (or a subset of corresponding moments). A particular system (1) is described by the parameters of (1) which satisfy assumptions (2) and (3) as well as the coprimeness assumption, the full rank assumption, and the assumptions on $B$ and $\Sigma$. The model is then characterised by the set of all a priori possible systems, which we will call internal characteristics. Two systems of the form (1) are called observationally equivalent if they imply the same external characteristics of $(y_t)_{t\in\mathbb{Z}}$. A system is identifiable if there is no other observationally equivalent system. The identifiability problem is concerned with the existence of an injective function from the internal characteristics to the external characteristics, see Deistler and Seifert (1978) for a more detailed discussion.

The classical (non-)identifiability issues where the external characteristics are described by the second moments of $(y_t)_{t\in\mathbb{Z}}$ are best understood in terms of the spectral density of the stationary solution of (1). The spectral density, i.e. the Fourier transform of the autocovariance function $\gamma(s) = E\left(y_t y'_{t-s}\right)$, $s \in \mathbb{Z}$, of $(y_t)_{t\in\mathbb{Z}}$, is
$$f(z) = a(z)^{-1} b(z) B \Sigma^2 B' b'\left(\tfrac{1}{z}\right) a'\left(\tfrac{1}{z}\right)^{-1},$$
evaluated at $z = e^{-i\lambda}$, $\lambda \in [-\pi, \pi]$.

The Dynamic Identifiability Problem.

Starting the identifiability analysis from this rational spectral density, it is well known (Rozanov, 1967, Theorem 10.1, page 47), (Hannan, 1970, Theorem II.10’ page 66 and Theorem III.1 on page 129), (Baggio and Ferrante, 2016), that there exists a canonical rational spectral factor $l(z)$ without zeros or poles inside or on the unit circle such that $f(z) = l(z)l'\left(\frac{1}{z}\right)$. This canonical spectral factor is unique up to orthogonal post-multiplication. In order to focus on the non-uniqueness implied by different pole and zero locations, we will for now abstract from the “static” non-uniqueness of spectral factors implied by orthogonal post-multiplication by requiring that the coefficient pertaining to power zero of $z$ in the respective spectral factor is lower-triangular with positive diagonal elements.

When allowing for spectral factors with unrestricted zero and pole location, there exist, in general, infinitely many rational all-pass filters $V(z)$, which satisfy $V(z)V'\left(\frac{1}{z}\right) = I_n$ (Alpay and Gohberg, 1988, page 207), such that $f(z) = \left[l(z)V(z)\right]\left[V'\left(\frac{1}{z}\right)l'\left(\frac{1}{z}\right)\right] = \tilde{l}(z)\tilde{l}'\left(\frac{1}{z}\right)$ holds. Requiring that the spectral factors with arbitrary pole and zero location be minimal, Baggio and Ferrante (2019) have recently shown that all minimal spectral factors $\tilde{l}(z)$ of $f(z)$ can be obtained by right-multiplying the divisors of a particular rational all-pass filter $T(z)$ on the canonical spectral factor $l(z)$. We may obtain $T(z) = l(z)^{-1}j(z)$ from the canonical spectral factor $l(z)$ (without zeros and poles inside or on the unit circle) and another “extremal” spectral factor $j(z)$ which has no zeros and poles outside or on the unit circle. Since $l(z)l'\left(\frac{1}{z}\right) = j(z)j'\left(\frac{1}{z}\right)$, it is clear that $T(z)$ is indeed all-pass. Moreover, the all-pass filter $T(z)$ may be represented as the product of orthogonal matrices and so-called Blaschke matrices of the form
$$\begin{pmatrix} I_r & 0_{r\times(n-r)} \\ 0_{(n-r)\times r} & I_{n-r}\,\frac{1-\bar{\alpha}z}{z-\alpha} \end{pmatrix},$$
see Hannan (1970, page 65), Lippi and Reichlin (1994, Theorem 1, page 311), and Alpay and Gohberg (1988, Theorem 3.12, page 208), which immediately provides the (finitely many) all-pass divisors of $T(z)$ which in turn generate a finite number of minimal spectral factors with different zero and pole locations.

Footnote: The inverse of this function, i.e. from the external to the internal characteristics, is called the identifying function.

Footnote: A spectral factor is minimal if the number of its finite and infinite poles (including multiplicities) is one half of the number of finite and infinite poles (including multiplicities) of the spectral density, see the Appendix for the definition of zeros and poles including their multiplicities and structure using the Smith-McMillan form. This excludes, e.g., spectral factors that are obtained by post-multiplying the canonical spectral factor by all-pass filters which do not cancel any zero or pole of $l(z)$ and which correspond to what Lippi and Reichlin (1994) call “non-basic representations”.

Footnote: The rational matrices $T_l(z)$ and $T_r(z)$ are respectively left all-pass divisor and right all-pass divisor of the rational all-pass filter $T(z)$ if $T(z) = T_l(z)T_r(z)$ holds and there are no (finite or infinite) pole or zero cancellations between $T_l(z)$ and $T_r(z)$.

The Static Identifiability Problem.

Let us now turn to the static identifiability problem and notice that the dynamic and static identifiability problems cannot be treated independently. Indeed, for transfer functions $k(z) = a(z)^{-1}b(z)B$ satisfying the assumptions of section 2 it holds that $k_0 k_0'$ is maximal when all zeros of $b(z)$ are outside the unit circle (Rozanov, 1967, Theorem 4.2, page 60). This is a consequence of the fact that the Blaschke factor $b_\alpha(z) = \frac{1-\bar{\alpha}z}{z-\alpha}$, which mirrors a zero at $\alpha$ with $|\alpha| > 1$ inside the unit circle, has absolute value smaller than one when evaluated at $z = 0$. Intuitively, this is due to the fact that whenever there are zeros of the MA polynomial matrix inside the unit circle, the information space of the agents is strictly larger than the information space of the outside observer (Hansen and Sargent, 1991, page 86).

Assuming that we know the true zero and pole locations in $k(z) = a(z)^{-1}b(z)B$, it can be shown that any other minimal spectral factor $\tilde{k}(z)$ with the same zero and pole locations can be obtained by orthogonal right transformation of $k(z)$, i.e. $\tilde{k}(z) = k(z)Q$ where $Q$ is an orthogonal matrix (Baggio and Ferrante, 2016). Continuing with the parametrisation that we discussed in section 2, i.e. $a_0 = I_n$, $b_0 = I_n$, the static shock transmission matrix has ones on its diagonal, and $\Sigma^2$ contains the (positive) variances of the economic shocks, we will now conclude the discussion of static observational equivalence in terms of second moments. Transforming the pair $(B, \Sigma)$ with an orthogonal matrix $Q$ to $\left(B\Sigma Q\tilde{\Sigma}^{-1}, |\tilde{\Sigma}|\right)$, where $\tilde{\Sigma}$ is a diagonal matrix such that the diagonal elements of $\tilde{B}$ are equal to one and $|\tilde{\Sigma}|$ is the same matrix but with positive elements only, generates the same spectral density because $B\Sigma^2 B' = \tilde{B}\tilde{\Sigma}^2\tilde{B}'$ where $\tilde{B} = B\Sigma Q\tilde{\Sigma}^{-1}$. Hence, the class of observational equivalence is at least $\frac{n(n-1)}{2}$-dimensional.

We will show in the next section that under two different sets of assumptions on the joint distribution of the components of the inputs $(\varepsilon_t)$ to (1), $(a(z), b(z))$ are unique and $(B, \Sigma)$ are unique up to signed permutations.

Footnote: A square matrix $Q$ is orthogonal if $QQ' = Q'Q = I_n$.

In this section, we will first provide some intuition as to how non-Gaussianity and higher order information may help identify, on the one hand, the orthogonal matrix and the static shock transmission matrix (up to signed permutations) and, on the other hand, the dynamic all-pass filter which “rotates” the canonical spectral factor to the true zero and pole location. Subsequently, we will use these insights to show under which conditions our model is identifiable. Finally, we discuss advantages and disadvantages of various rules for choosing a particular permutation and scaling.

In order to strengthen intuition as to how non-Gaussianity and independence help reduce the size of the class of observational equivalence, consider the following example featuring two identically and independently uniformly distributed random variables. Rotating these two variables by 45 degrees (with rotation matrix $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$) leads to marginal distributions which are “more Gaussian” (e.g. measured by the absolute value of the excess kurtosis) than the original variables.
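The uniform example can be checked directly. The sketch below is an illustration only: the sample size, the seed, and the helper `excess_kurtosis` are assumptions made for this example. It rotates two independent uniform variables by 45 degrees and compares the sample excess kurtosis before and after; the population values are $-1.2$ for a uniform variable and $-0.6$ for each rotated component.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth standardised moment minus 3 (zero for a Gaussian)."""
    xc = x - x.mean()
    return float(np.mean(xc**4) / np.mean(xc**2) ** 2 - 3.0)

rng = np.random.default_rng(0)                 # fixed seed, illustrative
n_obs = 200_000                                # illustrative sample size
u = rng.uniform(-1.0, 1.0, size=(2, n_obs))    # two independent uniform variables

# 45-degree rotation matrix (1/sqrt(2)) * [[1, -1], [1, 1]]
rot = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
v = rot @ u

kurt_original = [excess_kurtosis(row) for row in u]
kurt_rotated = [excess_kurtosis(row) for row in v]
corr_rotated = float(np.corrcoef(v)[0, 1])     # rotated components stay uncorrelated

print(kurt_original, kurt_rotated, corr_rotated)
```

The rotated components remain uncorrelated, so second moments cannot tell the two coordinate systems apart, whereas the excess kurtosis does.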
This suggests that searching for linear combinations that lead to "maximally non-Gaussian" variables might pin down a rotation. Similarly, the all-pass filters described in the previous section can be interpreted as "dynamic rotations". Rather than taking linear combinations of the components at one point in time, special linear combinations of the whole stochastic process are considered. In the dynamic setting, we are thus searching for the "dynamic rotation" which transforms uncorrelated inputs (which one obtains from any spectral factor) to independent underlying economic shocks.

The (non-)uniqueness of the infinite MA representation of multivariate linear processes driven by non-Gaussian inputs is well understood in the literature and analysed, e.g., in Chan et al. (2006); Chan and Ho (2004). These insights are used in Lanne and Saikkonen (2013) and Gouriéroux et al. (2019) to show to which extent their respective non-causal and non-invertible models are identified. Interestingly, the dynamic identifiability result of Chan and Ho (2004) builds on Chapter 5 in Kagan et al. (1973) in the same way as the static identifiability problem described in Lanne et al. (2017) builds on Theorem 3.1.1 on page 89 in Kagan et al. (1973). In both cases, higher order information is included in the guise of the characteristic function of the whole process or of the components at one point in time, respectively.

We now introduce the first of two possible assumptions on the joint distribution of the components of $\varepsilon_t$ that is sufficient for identifiability of model (1).

Assumption 1 (Non-zero cumulant). The components of $\varepsilon_t$ are mutually independent (but not necessarily identically distributed). Each component has a non-zero cumulant of order $r \geq 3$ and finite moments up to order $\tau$, where $\tau$ is an even integer and strictly larger than $r$.

Lemma 1 (Kagan et al. (1973), Theorem 3.1.1). Let $X_1, \ldots, X_n$ be independent (not necessarily identically distributed) random variables, and define $Y_1 = \sum_{i=1}^n a_i X_i$ and $Y_2 = \sum_{i=1}^n b_i X_i$, where $a_i$ and $b_i$ are constants. If $Y_1$ and $Y_2$ are independent, then the random variables $X_j$ for which $a_j b_j \neq 0$ are all normally distributed.

In the following, Lemma 1 is used to draw conclusions about the columns of $M$ in $\varepsilon_t = M \varepsilon^*_t$, where $M = B^{-1} B^*$ and where both $\varepsilon_t$ and $\varepsilon^*_t$ are assumed to be (cross-sectionally) independent and non-Gaussian. The components of $\varepsilon_t$ correspond to $Y_1, Y_2$, and the components of $\varepsilon^*_t$ correspond to $X_1, \ldots, X_n$. For example, for components 1 and 2 of $\varepsilon_t$ we have $\varepsilon_{1,t} = (m_{11}, \ldots, m_{1n}) \varepsilon^*_t$ and $\varepsilon_{2,t} = (m_{21}, \ldots, m_{2n}) \varepsilon^*_t$. If any pair of coefficients $(m_{1k}, m_{2k})$ satisfies $m_{1k} m_{2k} \neq 0$, then the corresponding component $\varepsilon^*_{k,t}$ is Gaussian according to the Lemma. By Assumption 1, at most one component of $\varepsilon^*_t$ is allowed to have a Gaussian marginal distribution. It follows that there cannot be another pair $(m_{1l}, m_{2l})$, $l \neq k$, that satisfies $m_{1l} m_{2l} \neq 0$. In particular, there is (at most) one non-zero term in the scalar product, so that $\langle m_{1,\bullet}, m_{2,\bullet} \rangle = m_{1k} m_{2k}$, where $m_{i,\bullet}$ denotes the $i$-th row of $M$. If $\langle m_{1,\bullet}, m_{2,\bullet} \rangle = m_{1k} m_{2k} \neq 0$, we obtain a contradiction to the assumption that $E(\varepsilon_{1,t} \varepsilon_{2,t}) = 0$: from the fact that one (exactly one) component $\varepsilon^*_{k,t}$ is Gaussian and $\varepsilon_{i,t} = m_{i,\bullet} (\varepsilon^*_{1,t}, \ldots, \varepsilon^*_{n,t})'$, we obtain that $E(\varepsilon_{1,t} \varepsilon_{2,t}) = m_{1,\bullet} D^* m'_{2,\bullet} = d^*_k m_{1k} m_{2k} \neq 0$. It thus follows that all pairs $(m_{1k}, m_{2k})$ satisfy $m_{1k} m_{2k} = 0$. Since this argument holds for all pairs in $\varepsilon_{1,t}, \ldots, \varepsilon_{n,t}$, it follows that every column contains at most one non-zero element. Finally, non-singularity implies that every column contains exactly one non-zero element.

The second set of assumptions on the joint distribution of the economic shocks is summarised in

Assumption 2 (Identically distributed components). The components of $\varepsilon_t$ are independent, identically distributed, and non-Gaussian.

Note that in Chan and Ho (2004, Theorem 3, page 8), the authors do not require that the components of $\varepsilon_t$ be non-Gaussian but only that they be independent and identically distributed. The non-Gaussianity follows in their case from assuming that the observed output process is non-Gaussian.

Finally, let us state the result on identifiability of our model.

Theorem 2. Under Assumption 1 or 2, and the assumptions outlined below equation (1), the parameters $(a(z), p(z) s(z) f(z), B, \Sigma)$ in model (1) are identifiable up to signed permutations of $B$.

The proof is a straightforward application of Chan and Ho (2004); Chan et al. (2006), using the fact that the shifts are identified, see Gouriéroux et al. (2019, Appendix B, page 34f.).
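The intuition from the uniform example earlier in this section can be checked numerically. The following is a minimal sketch in plain Python (not part of the paper or of its R-package; the function names, sample size, and seed are illustrative assumptions). It draws two independent uniform variables with unit variance, rotates them by 45 degrees, and compares the sample excess kurtosis before and after the rotation.

```python
import math
import random


def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment / variance^2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def rotate_45(x1, x2):
    """Apply the rotation matrix (1/sqrt(2)) * [[1, -1], [1, 1]]."""
    c = 1.0 / math.sqrt(2.0)
    return c * (x1 - x2), c * (x1 + x2)


def kurtosis_before_after(n_draws=100_000, seed=0):
    """Excess kurtosis of a uniform variable and of its rotated version."""
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1
    x1 = [rng.uniform(-half_width, half_width) for _ in range(n_draws)]
    x2 = [rng.uniform(-half_width, half_width) for _ in range(n_draws)]
    y1 = [rotate_45(a, b)[0] for a, b in zip(x1, x2)]
    return excess_kurtosis(x1), excess_kurtosis(y1)


if __name__ == "__main__":
    before, after = kurtosis_before_after()
    # Population values: -1.2 for the uniform distribution and, because fourth
    # cumulants of independent variables add, -1.2 * (1/4 + 1/4) = -0.6 after rotation.
    print(f"original: {before:.3f}, rotated: {after:.3f}")
```

The rotated variable is closer to Gaussian in the sense of a smaller absolute excess kurtosis, which is why maximising non-Gaussianity can single out the unrotated, independent components.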

In this section, we describe how to pick one particular permutation and scaling from the class of observational equivalence described in the previous section. In order to do this, we describe different identification schemes, i.e. rules for choosing a particular permutation and scaling of the matrix $B$. We start by repeating two identification schemes presented in Lanne et al. (2017) (which are in turn based on Ilmonen and Paindaveine (2011) and Hallin and Mehta (2015)). The first identification scheme, which is convenient for deriving asymptotic properties and which we refer to as identification scheme A, consists in firstly scaling all columns of $B$ such that their norm is equal to one, secondly permuting the columns such that the absolute value of each diagonal element is larger than the absolute value of all elements in the same row with a higher column index, and finally scaling all columns of $B$ such that the diagonal elements are equal to one. The second identification scheme consists of the same first two steps, but instead of scaling the columns in the last step such that their diagonal elements are equal to one, it is required that the diagonal elements are positive. Sometimes, the second identification scheme turns out to be more flexible, for example when testing hypotheses involving diagonal elements. Regarding the derivation of asymptotic properties, however, one would need to maximise the constrained (log-)likelihood function where the restrictions that the columns of $B$ have length one are taken into account.
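The three steps of identification scheme A can be sketched as follows. This is an illustrative plain-Python implementation under stated assumptions, not the routine used in the companion R-package: the function name is hypothetical, matrices are lists of rows, and the column permutation is found by brute force over all permutations, which is only sensible for small $n$.

```python
import itertools
import math


def identification_scheme_a(B):
    """Normalise columns, permute them, and rescale to a unit diagonal.

    B is a non-singular square matrix given as a list of rows.
    Raises ValueError if no admissible column permutation exists.
    """
    n = len(B)
    columns = [[B[i][j] for i in range(n)] for j in range(n)]

    # Step 1: scale every column to Euclidean norm one.
    unit_columns = []
    for col in columns:
        norm = math.sqrt(sum(x * x for x in col))
        unit_columns.append([x / norm for x in col])

    # Step 2: look for a column order in which each diagonal element dominates
    # (in absolute value) all elements to its right in the same row.
    for perm in itertools.permutations(range(n)):
        ordered = [unit_columns[j] for j in perm]  # ordered[j][i] is row i, column j
        diagonal_nonzero = all(ordered[i][i] != 0.0 for i in range(n))
        dominant = all(
            abs(ordered[i][i]) > abs(ordered[j][i])
            for i in range(n)
            for j in range(i + 1, n)
        )
        if diagonal_nonzero and dominant:
            # Step 3: rescale each column so that its diagonal element equals one.
            scaled = [[x / col[i] for x in col] for i, col in enumerate(ordered)]
            return [[scaled[j][i] for j in range(n)] for i in range(n)]

    raise ValueError("identification scheme A is not defined for this matrix")


if __name__ == "__main__":
    for row in identification_scheme_a([[1.0, 2.0], [0.5, 0.1]]):
        print(row)
```

For the second identification scheme, the last step would instead flip the sign of every column whose diagonal element is negative and leave the column norms at one.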
Given that in the case $(\kappa, k)$, $k \neq 0$, one needs to impose (non-overidentifying) restrictions on the parameters in the WHF, (non-overidentifying) restrictions on the parameters in an otherwise unconstrained static shock transmission matrix $B$ do not add further burden on the researcher.

It is important to realise that the transformations used in the identification schemes described above exist not on the whole parameter space but only on a topologically large subset of it. For details, see Proposition 2 in Lanne et al. (2017), including an example of a matrix for which the above identification schemes are not defined. The third identification scheme, similar to the one in Chen and Bickel (2005) on page 3626, does not exclude any non-singular matrix $B$ and is defined by the following steps. Firstly, the columns of $B$ are scaled to have norm equal to one. Secondly, in each column, the element with largest absolute value is made positive. Finally, the columns are ordered according to $\prec$ such that $c \prec d$ for two columns $c, d$ of $B$ if and only if there exists a $k \in \{1, \ldots, n\}$ such that $c_k < d_k$ and $c_j = d_j$ for all $j \in \{1, \ldots, k-1\}$.

Note that in the derivation of the ML estimator, we impose only that the diagonal elements of $B$ be equal to one. Thus, the restrictions, in general, do not suffice to pin down the particular permutation and scaling for $B$. However, the fact that the observationally equivalent points in the parameter space are discrete ensures the existence of a consistent root, i.e. the solution of the first order conditions obtained from taking derivatives of the standardised log-likelihood function. Should the gradient descent algorithm return a $B$ matrix which does not satisfy the identification scheme, it can be easily transformed such that the identification scheme is satisfied. The companion R-package to this article transforms the $B$ matrix such that all restrictions described here are satisfied.

Now that we have firstly obtained a discrete set of observationally equivalent SVARMA systems and secondly provided different rules to select a unique representative, we may proceed to local ML estimation of the true underlying parameter.

Maximum Likelihood Estimation

In this section, we treat local ML estimation of (1) in the parametrisation derived in Theorem 1. In particular, we show that the ML estimator (MLE) is asymptotically normal.

Whereas the essential part of this article is the identifiability analysis and the implied non-singularity of the information matrix of the MLE when (1) is parametrised (including zero-, one-, and equality restrictions on the polynomial matrices) with the WHF, the asymptotic theory is standard. Except for the fact that we consider here the multivariate case, it is identical to the asymptotic analysis in Lii and Rosenblatt (1992) and Rosenblatt (2000, Chapter 8). The multivariate matrix calculus and the treatment of the components' densities are similar to Lanne et al. (2017). The derivations of the score and second order partial derivatives, the information matrix, and the Hessian are straightforward but tedious. The scores and essential differences to the derivations in Lii and Rosenblatt (1992) and Lanne et al. (2017) are summarised in the Appendix. More detail regarding the implementation can be found in the documentation of the associated R-package, which can be downloaded from https://github.com/bfunovits/.

We first describe the parameter space over which we optimise the log-likelihood function. Second, we make assumptions on the densities of the components of $\varepsilon_t$. This allows us to provide explicit expressions for the individual contributions to the standardised log-likelihood function and its partial derivatives.

For given integer valued parameters $(p, q, (\kappa, k))$, we vectorise the system parameters, i.e. the ones in $(a(z), p(z), f(z))$, in column-major order. This order is chosen because firstly the ML estimation is implemented in R (R Core Team, 2019), whose storage order is column-major, and secondly it builds on the packages RLDM and rationalmatrices, whose objects lend themselves to vectorising in the described way. The AR parameters are vectorised as $\tau_1 = \mathrm{vec}(a_1, \ldots, a_p)$, the "stable" MA parameters for $(\kappa, 0)$ as $\tau_2 = \mathrm{vec}(p_1, \ldots, p_{q-\kappa})$ and for $(\kappa, k)$, $k \neq 0$, as

$$\tau_2 = \mathrm{vec}\left( \begin{pmatrix} I_k & 0_{k \times (n-k)} \\ p_{0,21} & I_{n-k} \end{pmatrix}, \begin{pmatrix} p_{1,11} & 0_{k \times (n-k)} \\ p_{1,21} & p_{1,22} \end{pmatrix}, \ldots, p_{q-\kappa-1}, p_{q-\kappa,[\bullet,k+1:n]} \right).$$

Regarding the reference to Lii and Rosenblatt (1992) and Rosenblatt (2000, Chapter 8): these authors in turn refer to Lehmann (1983, page 430). However, the proof in Lehmann (1983) requires assumptions on the third order partial derivatives of the individual contributions to the log-likelihood function (rather than the second order partial derivatives), while this is not necessary in Lii and Rosenblatt (1992) or here.

Regarding the column-major order: a parametrisation where the columns of $(a(z), b(z))$ are reordered as

$$\mathrm{vec}\left[ \left( a_{1,[\bullet,1]}, \ldots, a_{p,[\bullet,1]} \mid a_{1,[\bullet,2]}, \ldots, a_{p,[\bullet,2]} \mid \cdots \mid a_{1,[\bullet,n]}, \ldots, a_{p,[\bullet,n]} \,\|\, b_{1,[\bullet,1]}, \ldots, b_{q,[\bullet,1]} \mid \cdots \mid b_{1,[\bullet,n]}, \ldots, b_{q,[\bullet,n]} \right)' \right]$$

is advocated in Hannan and Deistler (2012, page 133) for the invertible VARMA case because it leads to comparably simple formulae for the covariance of the asymptotic distribution. However, in our case it is more difficult (if not impossible) to obtain an elegant integral representation. Therefore, we opt for a form in which the partial derivatives are easier to obtain.

The abbreviation RLDM stands for Rational Linear Dynamic Models.
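The column-major vectorisation used for the system parameters can be illustrated with a minimal sketch in plain Python (the helper names `vec` and `vec_blocks` are illustrative assumptions, not functions of RLDM or rationalmatrices). Stacking the columns of the horizontally concatenated matrix $(a_1, \ldots, a_p)$ is the same as concatenating the column-major vectorisations of the individual coefficient matrices.

```python
def vec(matrix):
    """Stack the columns of a matrix (given as a list of rows) into one list."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def vec_blocks(*matrices):
    """Column-major vectorisation of the block row (m_1, ..., m_p)."""
    stacked = []
    for m in matrices:
        stacked.extend(vec(m))
    return stacked


if __name__ == "__main__":
    a1 = [[1, 2],
          [3, 4]]
    a2 = [[5, 6],
          [7, 8]]
    print(vec_blocks(a1, a2))  # [1, 3, 2, 4, 5, 7, 6, 8]
```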

It turns out that it is more convenient to parametrise the "unstable" MA parameters in

$$g(z) = s(z) f(z) = \begin{pmatrix} f_{\kappa+1,[1:k,\bullet]} \\ f_{\kappa,[k+1:n,\bullet]} \end{pmatrix} + \begin{pmatrix} f_{\kappa,[1:k,\bullet]} \\ f_{\kappa-1,[k+1:n,\bullet]} \end{pmatrix} z + \cdots + \begin{pmatrix} f_{1,[1:k,\bullet]} \\ f_{0,[k+1:n,\bullet]} \end{pmatrix} z^{\kappa} + \begin{pmatrix} f_{0,[1:k,\bullet]} \\ 0_{(n-k) \times n} \end{pmatrix} z^{\kappa+1}$$

$$= \begin{pmatrix} I_k & 0_{k \times (n-k)} \\ -p_{0,21} & I_{n-k} \end{pmatrix} + g_1 z + \cdots + \begin{pmatrix} g_{\kappa+1,[1:k,\bullet]} \\ 0_{(n-k) \times n} \end{pmatrix} z^{\kappa+1},$$

rather than the ones in $f(z)$ directly. Of course, they are in a one-to-one relation and can be easily obtained from each other whenever necessary. Note that none of the parameters in $g_0$ are free because there are equality restrictions between $p_0$ and $g_0$ (in the case $(\kappa, k)$, $k \neq 0$). The parameters in $g(z)$ are vectorised, in the case $(\kappa, k)$, $k \neq 0$, as

$$\tau_3 = \mathrm{vec}\left( \begin{pmatrix} I_k & 0_{k \times (n-k)} \\ -p_{0,21} & I_{n-k} \end{pmatrix}, g_1, \ldots, g_{\kappa}, \begin{pmatrix} g_{\kappa+1,[1:k,\bullet]} \\ 0_{(n-k) \times n} \end{pmatrix} \right)$$

and as $\tau_3 = \mathrm{vec}(g_1, \ldots, g_{\kappa})$ when $k = 0$.

Restrictions on system parameters.

Obviously, not all parameters in $\tau' = (\tau_1', \tau_2', \tau_3')$ are free. There are $n(n-1) + kn$ zero-restrictions and $n$ one-restrictions in $\tau_2$, $kn + (n-k)^2 - n + (n-k)n$ zero-restrictions and $n$ one-restrictions in $\tau_3$, and $k(n-k)$ restrictions between the parameters in $\tau_2$ and $\tau_3$, as described in Theorem 1. We represent these restrictions in the implicit form (Gouriéroux and Monfort, 1989) as $R\tau = r$, where $R$ is of full row rank and has $n_\tau$ columns, with $n_\tau = n^2(p + q + 3)$. Note, however, that when implementing this estimation method, it is more convenient to write them in the explicit form.

The parameter space in detail.

The (free) parameters pertaining to the underlying economic shocks are vectorised and summarised in $\gamma = (\beta', \sigma', \lambda')'$.

Assumption 3.

The true parameter value $\theta_0$ belongs to the permissible parameter space $\Theta = \Theta_\tau \times \Theta_\beta \times \Theta_\sigma \times \Theta_\lambda = \Theta_\tau \times \Theta_\gamma$, where

1. $\Theta_\tau$ with $\Theta_\tau \subseteq \mathbb{R}^{n^2(p+q)}$ is such that conditions (2), (3), the coprimeness assumption and the full rank assumption on $(a_p, b_q)$ are satisfied, and

2. $\Theta_\beta = \mathrm{vecd}^{\circ}(\mathcal{B}) = \{ \beta \in \mathbb{R}^{n(n-1)} \mid \beta = \mathrm{vecd}^{\circ}(B) \text{ for some } B \in \mathcal{B} \}$. The vector $\beta$ collects the off-diagonal elements of $B$.

3. For the scalings, $\Theta_\sigma = \mathbb{R}^n_+$ holds, and

4. for the additional parameters appearing in the component densities, we have $\Theta_\lambda = \Theta_{\lambda_1} \times \cdots \times \Theta_{\lambda_n} \subseteq \mathbb{R}^d$ with $\Theta_{\lambda_i} \subseteq \mathbb{R}^{d_i}$ open for every $i \in \{1, \ldots, n\}$ and $d = d_1 + \cdots + d_n$.

We also introduce the non-singleton compact and convex subset $\Theta_0 = \Theta_{0,\tau} \times \Theta_{0,\gamma}$ of the interior of $\Theta$ which contains the true parameter value $\theta_0$.

The component densities.

Regarding the component densities of the i.i.d. shock process $(\varepsilon_t)$, we have

Assumption 4.

For each $i \in \{1, \ldots, n\}$, the distribution of the error term $\varepsilon_{i,t}$ has a (Lebesgue) density $f_{i,\sigma_i}(x; \lambda_i) = \sigma_i^{-1} f_i(\sigma_i^{-1} x; \lambda_i)$ which may also depend on a parameter vector $\lambda_i \in \mathbb{R}^{d_i}$.

Thus, the individual contributions in the (standardised) log-likelihood function

$$L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} l_t(\varepsilon_t(\theta), \theta) \qquad (4)$$

are

$$l_t(\varepsilon_t(\theta), \theta) = \sum_{i=1}^{n} \log\left[ f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right) \right] - \log\{ |\det[f_0]| \} - \log\{ |\det[B(\beta)]| \} - \sum_{i=1}^{n} \log(\sigma_i), \qquad (5)$$

where $u_t(\theta) = a_\tau(z) y_t + (I - p_\tau(z) s_\tau(z) f_\tau(z)) B(\beta) \varepsilon_t(\theta)$ and $\iota_i$ is the unit column-vector with a one at the $i$-th position.

The expressions for the partial derivatives of the individual contributions to the standardised log-likelihood function are given as

$$\frac{\partial l_t(\theta)}{\partial \tau_1} = -x_{b,t-1}(\theta) B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta),$$

$$\frac{\partial l_t(\theta)}{\partial \tau_2} = -\left( \left[ f(z)^{-1} s(z)^{-1} p(z)^{-1} \right] \left[ w'_{g,t-1}(\theta) \otimes I_n \right] \right)' B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta),$$

$$\frac{\partial l_t(\theta)}{\partial \tau_3} = -\left( \left[ f(z)^{-1} s(z)^{-1} p(z)^{-1} \right] \left[ w'_{p,t-1}(\theta) \otimes I_n \right] \right)' B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) - \frac{\partial \mathrm{vec}(f_0)}{\partial \tau_3} \mathrm{vec}\left( f_0'^{-1} \right),$$

$$\frac{\partial l_t(\theta)}{\partial \beta} = -H' \sum_{i=1}^{q} \left( B(\beta)^{-1} u_{t-i}(\theta) \otimes b_i' B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right) - H' \left( B(\beta)^{-1} u_t(\theta) \otimes B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right) - H' \mathrm{vec}\left( B'(\beta)^{-1} \right),$$

$$\frac{\partial}{\partial \sigma} l_t(\theta) = -\Sigma^{-2} \left[ e_{x,t}(\theta) \odot \varepsilon_t(\theta) + \sigma \right],$$

$$\frac{\partial}{\partial \lambda} l_t(\theta) = e_{\lambda,t}(\theta),$$

where

$$x'_{b,t-1} = \left[ f(z)^{-1} z^{-\kappa} p(z)^{-1} \right] \left[ x'_{t-1} \otimes I_n \right], \qquad x'_{t-1} = \left( y'_{t-1}, \ldots, y'_{t-p} \right),$$

$$w'_{g,t-1} = \left( g(z) u_{1,t-1}(\theta), \ldots, g(z) u_{n,t-1}(\theta) \mid \cdots \mid g(z) u_{1,t-(q-\kappa)}(\theta), \ldots, g(z) u_{n,t-(q-\kappa)}(\theta) \right),$$

$$w'_{p,t-1} = \left( p(z) u_{1,t-1}(\theta), \ldots, p(z) u_{n,t-1}(\theta) \mid \cdots \mid p(z) u_{1,t-\kappa}(\theta), \ldots, p(z) u_{n,t-\kappa}(\theta) \right),$$

and the matrix $H \in \mathbb{R}^{n^2 \times n(n-1)}$, consisting of zeros and ones, is implicitly defined by $\mathrm{vec}(B(\beta)) = H\beta + \mathrm{vec}(I_n)$ for $B$ in $\mathcal{B}$.

The other main differences in the partial derivatives of the log-likelihood function compared to the invertible Gaussian case are the appearance of $f(z)$ and $g(z)$, the term $\log\{ |\det[f_0]| \}$, and the fact that the expressions

$$e_{i,x,t}(\theta) = \frac{\partial}{\partial x} \log\left[ f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right) \right] = \frac{f_{i,x}\left( \sigma_i^{-1} \varepsilon_{i,t}(\theta); \lambda_i \right)}{f_i\left( \sigma_i^{-1} \varepsilon_{i,t}(\theta); \lambda_i \right)}$$

and

$$e_{i,\lambda_i,t}(\theta) = \frac{\partial}{\partial \lambda_i} \log\left[ f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right) \right] = \frac{f_{i,\lambda_i}\left( \sigma_i^{-1} \varepsilon_{i,t}(\theta); \lambda_i \right)}{f_i\left( \sigma_i^{-1} \varepsilon_{i,t}(\theta); \lambda_i \right)},$$

with $f_{i,x}(x; \lambda_i) = \frac{\partial}{\partial x} f_i(x; \lambda_i)$ and $f_{i,\lambda_i}(x; \lambda_i) = \frac{\partial}{\partial \lambda_i} f_i(x; \lambda_i)$, do not simplify as in the Gaussian case (compare the terms $\tilde{I}$ and $\tilde{J}$ in Rosenblatt (2000, Chapter 8)). Evaluated at the truth, i.e. $\theta = \theta_0$, we have that $\varepsilon_{i,t}(\theta_0) = \varepsilon_{i,t}$ and

$$e_{i,x,t} = e_{i,x,t}(\theta_0) = \left. \frac{\partial}{\partial x} \log\left[ f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right) \right] \right|_{\theta = \theta_0} = \frac{f_{i,x}\left( \sigma_{i,0}^{-1} \varepsilon_{i,t}; \lambda_{i,0} \right)}{f_i\left( \sigma_{i,0}^{-1} \varepsilon_{i,t}; \lambda_{i,0} \right)}.$$

The following assumptions are similar to Lii and Rosenblatt (1992); Lanne et al. (2017).

Assumption 5.

The following conditions hold for $i \in \{1, \ldots, n\}$.

1. For all $x \in \mathbb{R}$ and all $\lambda_i \in \Theta_{0,\lambda_i}$, $f_i(x; \lambda_i) > 0$ and $f_i(x; \lambda_i)$ is twice continuously differentiable with respect to $(x; \lambda_i)$.

2. The function $f_{i,x}(x; \lambda_{i,0})$ is integrable with respect to $x$, i.e., $\int |f_{i,x}(x; \lambda_{i,0})| \, dx < \infty$.

3. For all $x \in \mathbb{R}$, $x^2 \frac{f_{i,x}^2(x; \lambda_i)}{f_i^2(x; \lambda_i)}$ and $\frac{\| f_{i,\lambda_i}(x; \lambda_i) \|^2}{f_i^2(x; \lambda_i)}$ are dominated by $c_1 (1 + |x|^{c_2})$ with $c_1, c_2 \geq 0$ and $\int |x|^{c_2} f_i(x; \lambda_{i,0}) \, dx < \infty$.

4. $\int \sup_{\lambda_i \in \Theta_{0,\lambda_i}} \| f_{i,\lambda_i}(x; \lambda_i) \| \, dx < \infty$.

and

Assumption 6.

The following conditions hold for $i \in \{1, \ldots, n\}$.

1. The functions $f_{i,xx}(x; \lambda_{i,0})$ and $f_{i,x\lambda_i}(x; \lambda_{i,0})$ are integrable with respect to $x$, i.e., $\int |f_{i,xx}(x; \lambda_{i,0})| \, dx < \infty$ and $\int \| f_{i,x\lambda_i}(x; \lambda_{i,0}) \| \, dx < \infty$.

2. $\int \sup_{\lambda_i \in \Theta_{0,\lambda_i}} \| f_{i,\lambda_i \lambda_i}(x; \lambda_i) \| \, dx < \infty$.

3. For all $x \in \mathbb{R}$ and all $\lambda_i \in \Theta_{0,\lambda_i}$, $\frac{f_{i,x}^2(x; \lambda_i)}{f_i^2(x; \lambda_i)}$ and $\left| \frac{f_{i,xx}(x; \lambda_i)}{f_i(x; \lambda_i)} \right|$ are dominated by $a_0 (1 + |x|^{a_1})$, $\left\| \frac{f_{i,x\lambda_i}(x; \lambda_i)}{f_i(x; \lambda_i)} \right\|$ and $\left\| \frac{f_{i,x}(x; \lambda_i)}{f_i(x; \lambda_i)} \frac{f_{i,\lambda_i}(x; \lambda_i)}{f_i(x; \lambda_i)} \right\|$ are dominated by $a_0 (1 + |x|^{a_2})$, and $\left\| \frac{f_{i,\lambda_i}(x; \lambda_i)}{f_i(x; \lambda_i)} \right\|^2$ and $\left\| \frac{f_{i,\lambda_i \lambda_i}(x; \lambda_i)}{f_i(x; \lambda_i)} \right\|$ are dominated by $a_0 (1 + |x|^{a_3})$, with $a_0, a_1, a_2, a_3 \geq 0$ such that $\int \left( |x|^{a_1} + |x|^{a_2} + |x|^{a_3} \right) f_i(x; \lambda_{i,0}) \, dx < \infty$.

In combination, these assumptions allow us to prove, in the same way as in Lii and Rosenblatt (1992),

Theorem 3.

Under Assumptions 3, 4, 5, 6, and one of Assumption 1 or 2, there exists a sequence of maximisers $\hat{\theta}_T$ of (4) such that $\sqrt{T} ( \hat{\theta}_T - \theta_0 )$ converges in distribution to $N(0, S)$, where

$$S = \begin{pmatrix} \mathcal{I} & R' \\ R & 0 \end{pmatrix}^{-1} \begin{pmatrix} \mathcal{I} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \mathcal{I} & R' \\ R & 0 \end{pmatrix}^{-1}$$

and $\mathcal{I} = E\left[ l_{\theta,t}(\theta_0) \, l'_{\theta,t}(\theta_0) \right]$.

We illustrate the estimation procedure by estimating the two equation system of Blanchard and Quah (1989), the three equation monetary model involving (log-deviations from the steady state of) the unemployment gap, the inflation rate, and the Federal Funds rate, and the four equation model where we additionally include the Kansas City Financial Condition Index (KCFCI). The analyses are available in the vignettes of the associated R-package, which can be downloaded with the command remotes::install_github("bfunovits/svarmawhf", auth_token = "___", build_vignettes = TRUE).
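For readers who want to experiment outside the R-package, a single likelihood contribution of the form (5) can be evaluated in a few lines. The following plain-Python sketch is an illustration under explicit assumptions, not the package's implementation: it fixes $n = 2$, uses Student-t component densities with the degrees of freedom playing the role of $\lambda_i$, takes the vector $u_t(\theta)$ as given, and expects the log absolute determinant term of (5) that stems from the WHF factor to be supplied by the caller. All function and argument names are hypothetical.

```python
import math


def log_student_t_density(x, dof):
    """Log density of a Student-t distribution with `dof` degrees of freedom."""
    return (
        math.lgamma((dof + 1.0) / 2.0)
        - math.lgamma(dof / 2.0)
        - 0.5 * math.log(dof * math.pi)
        - (dof + 1.0) / 2.0 * math.log1p(x * x / dof)
    )


def solve_2x2(B, u):
    """Return (B^{-1} u, det B) for a non-singular 2 x 2 matrix B (list of rows)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    e1 = (B[1][1] * u[0] - B[0][1] * u[1]) / det
    e2 = (-B[1][0] * u[0] + B[0][0] * u[1]) / det
    return [e1, e2], det


def loglik_contribution(u_t, B, sigma, dof, log_abs_det_f):
    """One term l_t of the standardised log-likelihood for n = 2.

    u_t            list with the two elements of u_t(theta), assumed given
    B              2 x 2 static shock transmission matrix B(beta)
    sigma          the two scale parameters sigma_i
    dof            degrees of freedom of the two Student-t densities (lambda_i)
    log_abs_det_f  value of the log|det| term of (5) coming from the WHF factor
    """
    eps, det_B = solve_2x2(B, u_t)  # eps_i = iota_i' B^{-1} u_t
    value = -log_abs_det_f - math.log(abs(det_B))
    for e, s, nu in zip(eps, sigma, dof):
        value += log_student_t_density(e / s, nu) - math.log(s)
    return value


if __name__ == "__main__":
    B = [[1.0, 0.3], [0.2, 1.0]]
    print(loglik_contribution([0.5, -0.1], B, [1.0, 2.0], [5.0, 8.0], 0.0))
```

Averaging such contributions over $t = 1, \ldots, T$ gives the standardised log-likelihood (4); the recursion that produces $u_t(\theta)$ from the data and the system parameters is deliberately left out of this sketch.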

Acknowledgements

Financial support by the Research Funds of the University of Helsinki as well as by funds of the Oesterreichische Nationalbank (Austrian Central Bank, Anniversary Fund, project number: 17646) is gratefully acknowledged. For computations, the Finnish Grid and Cloud Infrastructure with persistent identifier urn:nbn:fi:research-infras-2016072533 was used. Juho Koistinen, Mika Meitz, Markku Lanne, and Wolfgang Scherrer provided helpful comments on various versions of this article.

In this article, we introduced a new parametrisation for stable and possibly non-invertible SVARMA models (1) driven by independent and non-Gaussian shocks. Every MA polynomial with no determinantal zeros on the unit circle can be factorised in the way described. We showed that the model in this parametrisation is (under certain affine restrictions) identifiable up to permutation and scaling of the static shock transmission matrix. These results generalise the SVAR results in Lanne et al. (2017) to the possibly non-invertible SVARMA case. Moreover, we provide a computationally feasible method for estimating possibly non-invertible SVARMA models. Illustrations can be found in the vignette of the associated R-package, downloadable from https://github.com/bfunovits/.

References

Majid M. Al-Sadoon. The linear systems approach to linear rational expectations models. Econometric Theory, 34(3):628–658, 2018.

Majid M. Al-Sadoon and Piotr Zwiernik. The identification problem for linear rational expectations models, 2019.

Daniel Alpay and Israel Gohberg. Topics in Interpolation Theory of Rational Matrix-valued Functions, chapter Unitary Rational Matrix Functions, pages 175–222. 1988.

Brian D.O. Anderson and John B. Moore. Optimal filtering. Dover Publications, New York, 2005.

Brian D.O. Anderson, Manfred Deistler, Elisabeth Felsenstein, Bernd Funovits, Lukas Koelbl, and Mohsen Zamani. Multivariate AR Systems and Mixed Frequency Data: G-Identifiability and Estimation. Econometric Theory, 32(4):793–826, Aug 2016.

George Athanasopoulos and Farshid Vahid. Varma versus var for macroeconomic forecasting. Journal of Business & Economic Statistics, 26(2):237–252, 2008a.

George Athanasopoulos and Farshid Vahid. A complete varma modelling methodology based on scalar components. Journal of Time Series Analysis, 29(3):533–554, 2008b.

Giacomo Baggio and Augusto Ferrante. On the factorization of rational discrete-time spectral densities. IEEE Transactions on Automatic Control, 61(4):969–981, 2016.

Giacomo Baggio and Augusto Ferrante. On Minimal Spectral Factors With Zeroes and Poles Lying on Prescribed Regions. IEEE Transactions on Automatic Control, 61(8):2251–2255, Aug 2016. ISSN 0018-9286.

Giacomo Baggio and Augusto Ferrante. Parametrization of minimal spectral factors of discrete-time rational spectral densities. IEEE Transactions on Automatic Control, 64(1):396–403, Jan 2019.

Olivier J. Blanchard and Danny Quah. The Dynamic Effects of Aggregate Demand and Supply Disturbances. American Economic Review, 79(4):655–673, 1989.

Albrecht Böttcher and Sergei M. Grudsky. Spectral Properties of Banded Toeplitz Matrices. SIAM, 2005.

Kung-Sik Chan and Lop-Hing Ho. On the Unique Representation of non-Gaussian Multivariate Linear Processes. Technical report, Department of Statistics & Actuarial Science, University of Iowa, 2004. URL https://pdfs.semanticscholar.org/fd93/194a3fd280de596ef5135aa7954eab5e51a1.pdf.

Kung-Sik Chan, Lop-Hing Ho, and Howell Tong. A Note on Time-Reversibility of Multivariate Linear Processes. Biometrika, 93:221–227, 2006.

Aiyou Chen and Peter J. Bickel. Consistent independent component analysis and prewhitening. IEEE Transactions on Signal Processing, 53:3625–3632, 2005.

Kevin F. Clancey and Israel Gohberg. Factorization of Matrix Functions and Singular Integral Operators. Springer Basel AG, 1981.

Manfred Deistler. The Properties of the Parameterization of ARMAX Systems and Their Relevance for Structural Estimation and Dynamic Specification. Econometrica, 51(4):1187–1207, July 1983.

Manfred Deistler and Hans-Günther Seifert. Identifiability and Consistent Estimability in Econometric Models. Econometrica, 46(6):969–980, July 1978.

Felix R. Gantmacher. The Theory of Matrices, volume 1. AMS Chelsea Publishing, 1959.

Andre J. Geurts and Cornelis Praagman. Column Reduction of Polynomial Matrices; Some Remarks on the Algorithm of Wolovich. European Journal of Control, 2(2):152–157, 1996.

Paul D. Gilbert. dse: Dynamic Systems Estimation (Time Series Package), 2015. URL https://cran.r-project.org/package=dse.

Israel Gohberg and M. G. Krein. Systems of integral equations on a half-line with kernel depending upon the difference of the arguments. 14(2):217–287, 1960.

Israel Gohberg, Marinus A. Kaashoek, and Ilya M. Spitkovsky. An Overview of Matrix Factorization Theory and Operator Applications. In Israel Gohberg, Nenad Manojlovic, and António Ferreira dos Santos, editors, Factorization and Integrable Systems, pages 1–102, Basel, 2003. Birkhäuser Basel. ISBN 978-3-0348-8003-9.

Israel Gohberg, Peter Lancaster, and Leiba Rodman. Matrix Polynomials. SIAM, Philadelphia, 2009.

Victor Gomez. SSMMATLAB: A set of matlab programs for the statistical analysis of state space models. Journal of Statistical Software, 66(9):1–37, 2015. ISSN 1548-7660.

Victor Gomez. Multivariate Time Series With Linear State Space Structure. Springer, 2016.

Christian Gouriéroux and Joann Jasiak. Noncausal vector autoregressive process: Representation, identification and semi-parametric estimation. Journal of Econometrics, 200(1):118–134, 2017. doi: http://dx.doi.org/10.1016/j.jeconom.2017.01.011.

Christian Gouriéroux and Alain Monfort. A General Framework for Testing a Null Hypothesis in a Mixed Form. Econometric Theory, 5(1):63–82, 1989.

Christian Gouriéroux, Alain Monfort, and Jean-Paul Renne. Statistical inference for independent component analysis: Application to structural VAR models. Journal of Econometrics, 196:111–126, 2017.

Christian Gouriéroux, Alain Monfort, and Jean-Paul Renne. Identification and Estimation in Non-Fundamental Structural VARMA Models. Review of Economic Studies, pages 1–39, 2019.

Marc Hallin and Chintan Mehta. R-Estimation for Asymmetric Independent Component Analysis. Journal of the American Statistical Association, 110(509):218–232, 2015.

Edward J. Hannan. Multiple Time Series. Wiley, 1970.

Edward J. Hannan. The identification problem for multiple equation systems with moving average errors. Econometrica, 39(5):751–765, September 1971.

Edward J. Hannan and Manfred Deistler. The Statistical Theory of Linear Systems. SIAM Classics in Applied Mathematics, Philadelphia, 2012.

Edward J. Hannan and Donald S. Poskitt. Unit Canonical Correlations between Future and Past. The Annals of Statistics, 16(2):784–790, June 1988.

Lars Peter Hansen and Thomas J. Sargent. Rational Expectations Econometrics, chapter Two Difficulties in Interpreting Vector Autoregressions, pages 77–120. Westview Press, 1991.

David A. Harville. Matrix Algebra From a Statistician's Perspective. Springer, 1997.

Diederich Hinrichsen and Anthony J. Pritchard. Mathematical Systems Theory I - Modelling, State Space Analysis, Stability and Robustness. Springer-Verlag, Berlin, Heidelberg, 2005.

Pauliina Ilmonen and Davy Paindaveine. Semiparametrically efficient inference based on signed ranks in symmetric independent component models. Annals of Statistics, 39(5):2448–2476, 2011.

Abram M. Kagan, Yuri V. Linnik, and Calyampudi R. Rao. Characterization Problems in Mathematical Statistics. John Wiley & Sons, 1973.

Thomas Kailath. Linear Systems. Prentice Hall, Englewood Cliffs, N.J., 1980.

Lutz Kilian and Helmut Lütkepohl. Structural Vector Autoregressive Analysis. Cambridge University Press, 2017.

Markku Lanne and Pentti Saikkonen. Noncausal Vector Autoregression. Econometric Theory, 29:447–481, 6 2013. ISSN 1469-4360.

Markku Lanne, Mika Meitz, and Pentti Saikkonen. Identification and estimation of non-gaussian structural vector autoregressions. Journal of Econometrics, 196(2):288–304, 2017.

Erich Leo Lehmann. Theory of Point Estimation. Springer Science & Business, 1983.

Keh-Shin Lii and Murray Rosenblatt. An approximate maximum likelihood estimation for non-Gaussian non-minimum phase moving average processes. Journal of Multivariate Analysis, 43(2):272–299, 1992.

Marco Lippi and Lucrezia Reichlin. The Dynamic Effects of Aggregate Demand and Supply Disturbances: Comment. The American Economic Review, 83(3):644–652, 1993.

Marco Lippi and Lucrezia Reichlin. VAR Analysis, Nonfundamental Representations, Blaschke Matrices. Journal of Econometrics, 63(1):307–325, 1994. ISSN 0304-4076. doi: http://dx.doi.org/10.1016/0304-4076(93)01570-C.

Alexei Onatski. Winding number criterion for existence and uniqueness of equilibrium in linear rational expectations models. Journal of Economic Dynamics and Control, 30(2):323–345, February 2006.

Donald S. Poskitt. Vector autoregressive moving average identification for macroeconomic modeling: A new methodology. Journal of Econometrics, 192:468–484, June 2016.

Donald S. Poskitt and Wenying Yao. Vector autoregressions and macroeconomic modeling: An error taxonomy. Journal of Business & Economic Statistics, 35(3):407–419, 2017.

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2019.

Mala Raghavan, George Athanasopoulos, and Param Silvapulle. Canadian monetary policy analysis using a structural VARMA model. Canadian Journal of Economics, 49(1):347–373, 2016.

Murray Rosenblatt. Gaussian and non-Gaussian linear time series and random fields. Springer Science & Business Media, 2000.

Thomas J. Rothenberg. Identification in Parametric Models. Econometrica, 39(3):577–591, May 1971.

Yuri A. Rozanov. Stationary Random Processes. Holden-Day, San Francisco, 1967.

Wolfgang Scherrer and Manfred Deistler. Vector autoregressive moving average models. In Hrishikesh D. Vinod and C. R. Rao, editors, Handbook of statistics 41, volume 41. North-Holland, 2019.

Wolfgang Scherrer and Bernd Funovits. rationalmatrices: Classes and Methods for Rational Matrices, 2020a.

Wolfgang Scherrer and Bernd Funovits. RLDM: A Package for Modeling of Time Series with a Rational Spectral Density, 2020b.

George A. F. Seber. A Matrix Handbook for Statisticians. John Wiley & Sons, 2008.

Ruey S. Tsay. Multivariate Time Series Analysis With R and Financial Applications. John Wiley & Sons, Inc., Hoboken, New Jersey, 2013.

Ruey S. Tsay and David Wood. MTS: All-Purpose Toolkit for Analyzing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models, 2018. R package version 1.0.

William A. Wolovich. Linear Multivariable Systems. Springer Verlag, third edition, 1974.

Zeros and Poles at Inﬁnity

We first define zeros and poles at infinity for univariate rational functions. Zeros at infinity are important for the correct specification of the parameter space (which is mainly relevant for the multivariate case). The importance of poles at infinity lies mainly in the fact that the number of finite and infinite poles must always be equal to the number of finite and infinite zeros.

Then, we do the same for matrices whose entries are polynomials or rational functions, where we additionally discuss the Smith-McMillan form for obtaining finite zeros and poles, column-reduced and row-reduced matrices, and two different ways of obtaining the zeros and poles at infinity (one via Möbius transformations, the other via valuation theory).

A.1 Univariate Rational Functions

Pole at Inﬁnity.

A rational function r ( z ) has a pole at inﬁnity of degree n if and only if lim z →∞ r ( z ) z n exists and is a non-zeronumber. Example 1.

A polynomial p ( z ) = p + p z + · · · + p d z d with p d = 0 has d poles at inﬁnity. When dividing this polynomial by z d ,one obtains a function without zeros at inﬁnity because for r ( z ) = p ( z ) z d = p z − d + p z − d + · · · + p d the limit lim z →∞ r ( z ) is thenon-zero number p d . Zero at Inﬁnity and Valuation at Inﬁnity.

A rational function r ( z ) = n ( z ) d ( z ) , where n ( z ) and d ( z ) are polynomials with deg ( n ( z )) = p n and deg ( d ( z )) = p d , has a zero at inﬁnity if p q > p n . In this case, the degree of this zero at inﬁnity is equal to p q − p n . One could rewrite the rational function by dividing n ( z ) and d ( z ) by their respective highest degrees to obtain r ( z ) = z p n (cid:0) n z − p n + n z − p n + · · · + n p n (cid:1) z p d ( d + d z − + · · · + d p d z p d ) = z p n − p d (cid:0) n z − p n + n z − p n + · · · + n p n (cid:1) ( d z − p d + d z − p d + · · · + d p d ) from which the degree of a pole ( p n > p d ) or zero ( p d > p n ) at inﬁnity can be easily obtained. We will also deﬁne the valuationof r ( z ) at inﬁnity as v ∞ ( r ( z )) = p d − p n , i.e. the degree of the denominator of r ( z ) minus the degree of the numerator of r ( z ) .Thus, a pole at inﬁnity implies a negative valuation at inﬁnity, and a zero at inﬁnity implies a positive valuation at inﬁnity. Theconcept of valuations will be important when characterising zeros and poles at inﬁnity of rational matrix functions. Deﬁning Zeros at Inﬁnity with respect to the Parameter Space.

Notice that in the deﬁnition given above, zeros at inﬁnityappear only in conjunction with rational functions. There are, however, other deﬁnitions for zeros at inﬁnity which are relevantfor polynomials and which additionally consider an appropriate parameter space. Let the ( d + 1) -dimensional tuple of complexnumbers ( c , c , . . . , c d ) be the parameter space for the polynomial c ( z ) = c + c z + · · · + c d − z d − + c d z d . The number ofzeros at inﬁnity of such a c ( z ) is equal to d − deg ( c ( z )) . If, for example, c d = 0 and c d − = 0 , then c ( z ) is said to have onezero at inﬁnity. As an example, consider the polynomial c + bz + az and its associated parameter space ( c, b, a ) . The roots for a = 0 are equal to z ± = − b ±√ b − ac a . When considering the limit for a going to zero, it is easy to see (applying the rule ofl’Hôpital) that z + converges to − cb and that z − is unbounded. For a more formal statement, see Theorem 4.1.2 on page 371 inHinrichsen and Pritchard (2005). A-1 onfusing Notation for Backward Shift in System Theory. In system theory (Kailath, 1980; Anderson and Moore, 2005;Hinrichsen and Pritchard, 2005), it is common to use the complex variable z as forward shift (rather than as backward shift as iscommonly the case in econometrics and statistics). Intuitively (and with a certain amount of hand-waiving), this is due to analogywith continuous time systems, where the inﬁnitesimal operator dt corresponds to “small forward step in time”. More formally, itis due to the deﬁnitions of the z -transform of a discrete time signal ( y t ) t ∈ N as a formal power series Z ( y t ) ( z ) = P ∞ t =0 y t z − t (Hinrichsen and Pritchard, 2005, page 735ﬀ.) and the Laplace transform of a continuous time signal ( y t ) t ∈ R > as, withoutconsidering well-deﬁnedness of the integral, L ( y t ) ( s ) = R ∞ y t e − st dt (Hinrichsen and Pritchard, 2005, page 739ﬀ.). 
In particular, the $z$-transform of the discrete time signal $(y_{t-1})$ corresponds to the one of $(y_t)$ multiplied by $z^{-1}$, see Proposition A.3.6.(i) on page 737 in Hinrichsen and Pritchard (2005).

A.2 Matrices whose Elements are Polynomials or Rational Functions

In addition to the definitions given in the main text, which are only given for square rational matrices whose determinant is not identically zero, we will here provide definitions for a (possibly rank-deficient) rational matrix $R(z)$ of dimension $(n \times q)$. The Smith-McMillan form of $R(z)$ is used to define finite poles and zeros. The zeros and poles at infinity are characterised directly via valuation and also with the Smith-McMillan form of $R\left(\frac{1}{z}\right)$. These definitions provide more insight into the structure of the rational matrix in terms of multiplicities and the dimension of the kernel.

Definition of the Smith-McMillan form.

The Smith-McMillan form is a canonical form for rational matrices and is based on the Smith form, which is the equivalent canonical form for polynomial matrices. The Smith form (SF) of a polynomial matrix $P(z)$ of dimensions $(n \times q)$ is obtained from elementary row and column polynomial matrix transformations, i.e. by left- and right-multiplication with so-called unimodular matrices (Gohberg et al., 2009, Chapter S1.1, pages 313ff.). The Smith-McMillan form (SMF), in turn, is obtained by first obtaining the least common multiple $s(z)$ of the denominators of the elements of $R(z)$ and subsequently computing the Smith form of $s(z)R(z)$. Finally, the Smith-McMillan form (Hannan and Deistler, 2012, page 53; Kailath, 1980, Section 6.5.2, page 443) of a rational matrix $R(z)$ of dimensions $(n \times q)$ and rank $s$ is equal to $R(z) = u(z)\Lambda(z)v(z)$, where $u(z)$ and $v(z)$ are unimodular matrices (i.e. polynomial matrices with nonzero constant determinant), and $\Lambda(z)$ is a diagonal matrix in which only the first $s$ elements $\lambda_i(z) = \frac{n_i(z)}{d_i(z)}$ are non-zero, and $n_i(z)$, $d_i(z)$ are relatively prime monic (i.e. the coefficient pertaining to the highest power is one) polynomials. Furthermore, it is required that $n_i(z)$ divides $n_{i+1}(z)$, and that $d_{i+1}(z)$ divides $d_i(z)$.

Definition of finite zeros and poles of a rational matrix.

The finite poles of $R(z)$ are the zeros of the denominator polynomials $d_i(z)$. Note that it is possible that $R(z)$ has both a pole and a zero at the same point $z_0$. The finite zeros of $R(z)$ are the zeros of the numerator polynomials $n_i(z)$. Note that if (for square matrices) the determinant of $R(z)$ has a zero of multiplicity two at $z_0$, the rank deficiency of $R(z_0)$ can be one, the "usual" case when the parameters of $R(z)$ are unrestricted, or two. In the square case, a definition involving the determinant can only make sense in the non-singular case, i.e. when $\det(R(z))$ is not identically zero. Still, it is possible that a zero in one $d_i(z)$ at $z_0$ may cancel out a zero in one $n_j(z)$, $j \neq i$, at $z_0$ in the determinant.

(Footnote on unimodular matrices: A unimodular matrix is a square polynomial matrix with non-zero constant determinant.)

(Footnote on the "usual" case: If the parameters of a square $n$-dimensional $R(z)$ are unrestricted, "usually" $n_1(z) = \cdots = n_{n-1}(z) = 1$ and $d_n(z) = \cdots = d_2(z) = 1$ hold such that $n_n(z) = \prod_i (z - z_i)$ and $d_1(z) = \prod_j (z - p_j)$.)

Definition of zeros and poles at infinity of a rational matrix with the Smith-McMillan form.

The zeros and poles of $R(z)$ at infinity are the zeros and poles of $R\left(\frac{az+b}{cz+d}\right)$ at $z = -\frac{d}{c}$, where $c \neq 0$ and $ad - bc \neq 0$. Often, $c = b = 1$ and $a = d = 0$ are chosen. In that case, the zeros and poles of $R(z)$ at infinity are obtained as the zeros and poles of the numerator and denominator polynomials of the Smith-McMillan form of $R\left(\frac{1}{z}\right)$ at $z = 0$. Note that this definition using the Smith-McMillan form does not require being precise about the considered parameter space (as was the case for univariate polynomials).

Definition of zeros and poles at infinity of a rational matrix via valuation.

Obviously, it is possible to rewrite the Smith-McMillan form as a product of diagonal matrices $M_{\alpha_j}(z)$, each of which has only (finite) zeros and poles at $\alpha_j$, and one non-square matrix $S$ consisting of zeros and ones such that $\Lambda(z) = S \prod_{j=1}^{k} M_{\alpha_j}(z)$. The matrices $M_{\alpha}(z)$ can be obtained directly by calculating the minors of $R(z)$, as will be described below. Importantly, this is also possible for $M_\infty(z)$, which contains the zero and pole structure at infinity. In this way, one may circumvent the calculation of the Smith-McMillan form of $R\left(\frac{1}{z}\right)$.

Analogously to the valuation at infinity of a univariate rational function, we define the valuation at $\alpha \in \mathbb{C}$ as the integer $v$ in $r(z) = (z - \alpha)^v \frac{p(z)}{q(z)}$, where $p(z)$ and $q(z)$ are polynomials without common factors which do not have $(z - \alpha)$ as a factor. We will denote the valuation of $r(z)$ at $\alpha$ as $v_\alpha(r(z))$.

For treating the multivariate case, we need to consider minors of dimensions $(i \times i)$, $i \in \{1, \ldots, \min(n, q)\}$. The $i$-th valuation of $R(z)$ at $\alpha \in \mathbb{C}$, denoted as $v^{(i)}_\alpha(R(z))$, is obtained as the minimal valuation at $\alpha$ of all $(i \times i)$ minors of $R(z)$, where the valuation of a polynomial that is identically zero is equal to infinity. The degrees of the diagonal elements in $M_\alpha(z)$ are obtained as

$$\left( v^{(1)}_\alpha(R(z)),\; v^{(2)}_\alpha(R(z)) - v^{(1)}_\alpha(R(z)),\; \ldots,\; v^{(s)}_\alpha(R(z)) - v^{(s-1)}_\alpha(R(z)) \right).$$

The degrees of $M_\infty(z)$ can thus be obtained as

$$\left( v^{(1)}_\infty(R(z)),\; v^{(2)}_\infty(R(z)) - v^{(1)}_\infty(R(z)),\; \ldots,\; v^{(s)}_\infty(R(z)) - v^{(s-1)}_\infty(R(z)) \right).$$

Zeros at infinity of unimodular matrices.

While for univariate polynomials introducing zeros at infinity seems to be a bit artificial, they have an immediate interpretation for unimodular matrices. For example, the unimodular matrix $t(z) = \begin{pmatrix} 1 & z \\ 0 & 1 \end{pmatrix}$ has a zero at infinity because the Smith-McMillan form of $t\left(\frac{1}{z}\right) = \begin{pmatrix} 1 & \frac{1}{z} \\ 0 & 1 \end{pmatrix}$ is $\begin{pmatrix} \frac{1}{z} & 0 \\ 0 & z \end{pmatrix}$. Thus, the unimodular matrix $t(z)$ has a zero and a pole at infinity. Equivalently, the zeros and poles at infinity can be obtained via its valuations. Since $v^{(1)}_\infty(t(z)) = -1$ and $v^{(2)}_\infty(t(z)) = 0$, we obtain that the degrees in $M_\infty(z)$ are equal to $(-1, 1)$.

A.3 Row- and Column-Reduced Polynomial Matrices

The degree of a polynomial matrix is defined as the maximum of the degrees of its elements. Likewise, the degree of row $i$ is defined as the maximum of the degrees of the polynomials in row $i$. The rows of the row-end-matrix are the coefficients pertaining to the row degree of the respective row. A polynomial matrix is called row-reduced if its row-end-matrix is of full rank. For example, the polynomial matrix $\begin{pmatrix} z^{n_1} & 0 \\ 0 & z^{n_2} \end{pmatrix}$ is row-reduced, while $\begin{pmatrix} 1 & z \\ z & z^2 \end{pmatrix}$ is not. Sometimes it is useful to write a (square) polynomial matrix $P(z)$ with row-end-matrix denoted as $P_{hr}$ and row degrees $(p_1, \ldots, p_n)$ as

$$P(z) = \begin{pmatrix} z^{p_1} & & \\ & \ddots & \\ & & z^{p_n} \end{pmatrix} P_{hr} + M(z),$$

where the degree of $M_{[i,\bullet]}(z)$ is smaller than $p_i$. The same applies to the columns of a polynomial matrix to obtain the column-end-matrix.

(Footnote on minors, Section A.2: A minor is the determinant of a square submatrix. There are $\binom{n}{i}\binom{q}{i}$ different $(i \times i)$-minors in an $(n \times q)$-dimensional matrix (Gantmacher, 1959, page 2).)

(Footnote on the Smith-McMillan form of $t\left(\frac{1}{z}\right)$, Section A.2: First multiply $t\left(\frac{1}{z}\right)$ by the least common multiple $z$ of the denominators and obtain that $z \begin{pmatrix} 1 & \frac{1}{z} \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ z & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & z^2 \end{pmatrix} \begin{pmatrix} 1 & z \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Dividing both sides by $z$ results in the Smith-McMillan form.)

B The Wiener-Hopf Factorisation

In this section, we construct the (left-) WHF of $b(z)$ using the SF, see also Al-Sadoon (2018). We start from the matrix polynomial $b(z) = I + b_1 z + \cdots + b_q z^q$ and obtain

$$b(z) = \underbrace{\left[ u(z)\Lambda_p(z) \right]}_{=\tilde{\tilde{p}}(z)} \underbrace{\left[ \Lambda_f(z) v(z) \right]}_{=\tilde{\tilde{\tilde{f}}}(z)} = \underbrace{\left[ \tilde{\tilde{p}}(z) w(z)^{-1} \right]}_{=\tilde{p}(z)} \underbrace{\left[ w(z) \tilde{\tilde{\tilde{f}}}(z) \right]}_{=\tilde{\tilde{f}}(z)},$$

where $\Lambda_p(z)$ has only zeros outside the unit circle, $\Lambda_f(z)$ has only zeros inside the unit circle, and $w(z)$ is a unimodular matrix which row-reduces $\tilde{\tilde{\tilde{f}}}(z)$, see Wolovich (1974, Theorem 2.5.7, page 28), Kailath (1980, page 386), Geurts and Praagman (1996). Subsequently, we permute the rows of $\tilde{\tilde{f}}(z)$ such that the row degrees $\kappa_i$ satisfy the inequalities $\kappa_1 \geq \cdots \geq \kappa_n$, and we extract the highest degree of each row to obtain the partial indices

$$b(z) = \underbrace{\left[ \tilde{p}(z) P' \right]}_{=p(z)} \underbrace{\left[ P \tilde{\tilde{f}}(z) \right]}_{=\tilde{f}(z)} = p(z) \underbrace{\operatorname{diag}\left( z^{\kappa_1}, \ldots, z^{\kappa_n} \right)}_{=s(z)} \underbrace{\left[ \operatorname{diag}\left( z^{-\kappa_1}, \ldots, z^{-\kappa_n} \right) \tilde{f}(z) \right]}_{=f(z)}.$$

Note that $f(z)$ does not have poles at infinity since its degree is zero and that it does not have zeros at infinity because $f\left(\frac{1}{z}\right)$ evaluated at $z = 0$ is by construction of full rank.

B.1 Non-Uniqueness of the WHF and Degrees of the Factors

It is shown in Clancey and Gohberg (1981, Theorem I.1.2, page 11) that for $(\kappa, k)$, the equivalence class of WHFs is described by the block upper triangular unimodular matrices for which $u_{[k+1:n,\,1:k]}(z) = 0$, the diagonal blocks are constant, and the degree of $u_{[1:k,\,k+1:n]}(z)$ is at most one. More specifically, we have that $\mathring{p}(z) = p(z)u(z)$, $\mathring{s}(z) = s(z)$, $\mathring{f}(z) = s(z)^{-1}u(z)^{-1}s(z)f(z)$. Note that $v(z) = s(z)^{-1}u(z)^{-1}s(z)$ is of the form $v(z) = v_0 + \begin{pmatrix} 0 & v_1 \\ 0 & 0 \end{pmatrix} z^{-1}$ and that this transformation does not change the row degrees of

$$f(z) = f_0 + f_1 z^{-1} + \cdots + \begin{pmatrix} f_{\kappa,[1:k,\bullet]} \\ f_{\kappa,[k+1:n,\bullet]} \end{pmatrix} z^{-\kappa} + \begin{pmatrix} f_{\kappa+1,[1:k,\bullet]} \\ 0_{(n-k)\times n} \end{pmatrix} z^{-\kappa-1}$$

or

$$g(z) := s(z)f(z) = \begin{pmatrix} f_{\kappa+1,[1:k,\bullet]} \\ f_{\kappa,[k+1:n,\bullet]} \end{pmatrix} + \begin{pmatrix} f_{\kappa,[1:k,\bullet]} \\ f_{\kappa-1,[k+1:n,\bullet]} \end{pmatrix} z + \cdots + \begin{pmatrix} f_{1,[1:k,\bullet]} \\ f_{0,[k+1:n,\bullet]} \end{pmatrix} z^{\kappa} + \begin{pmatrix} f_{0,[1:k,\bullet]} \\ 0_{(n-k)\times n} \end{pmatrix} z^{\kappa+1}.$$

Moreover, it follows from the row-reducedness of $f(z)$ and $s(z)f(z)$ together with the predictable degree property (Kailath, 1980, Theorem 6.3-13, page 387) that the first $k$ columns of $p(z)$ have degree smaller than or equal to $q - \kappa - 1$ and the last $n - k$ columns of $p(z)$ have degree smaller than or equal to $q - \kappa$. Therefore, the transformation $u(z) = u_0 + \begin{pmatrix} 0 & u_1 \\ 0 & 0 \end{pmatrix} z$ does not change the highest column degrees of $p(z)$.

Last, note that due to the fact that $b(0) = I_n$, it holds that $p_0^{-1} = \begin{pmatrix} f_{\kappa+1,[1:k,\bullet]} \\ f_{\kappa,[k+1:n,\bullet]} \end{pmatrix}$.

B.2 Canonical Representative for $(\kappa, k)$, $k \neq 0$

We will now construct a canonical WHF by choosing $u(z)$ and setting certain parameters in $p_0$ and $p_1$ of $p(z) = p_0 + p_1 z + \cdots + p_{q_p} z^{q_p}$ equal to zero and one. First, we will determine $u_0$ in $u(z) = u_0 + \begin{pmatrix} 0 & u_1 \\ 0 & 0 \end{pmatrix} z$. Let us partition the matrix $p_0 = \begin{pmatrix} p_{0,11} & p_{0,12} \\ p_{0,21} & p_{0,22} \end{pmatrix}$ and assume that $p_{0,11}$ is invertible. Then, right-multiplying $p(z)$ with

$$u_0 = \begin{pmatrix} p_{0,11}^{-1} & 0 \\ 0 & I_{n-k} \end{pmatrix} \begin{pmatrix} I_k & -p_{0,12} \\ 0 & I_{n-k} \end{pmatrix} \begin{pmatrix} I_k & 0 \\ 0 & \left( p_{0,22} - p_{0,21}p_{0,11}^{-1}p_{0,12} \right)^{-1} \end{pmatrix}$$

we obtain

$$\begin{aligned} \begin{pmatrix} p_{0,11} & p_{0,12} \\ p_{0,21} & p_{0,22} \end{pmatrix} u_0 &= \begin{pmatrix} I_k & p_{0,12} \\ p_{0,21}p_{0,11}^{-1} & p_{0,22} \end{pmatrix} \begin{pmatrix} I_k & -p_{0,12} \\ 0 & I_{n-k} \end{pmatrix} \begin{pmatrix} I_k & 0 \\ 0 & \left( p_{0,22} - p_{0,21}p_{0,11}^{-1}p_{0,12} \right)^{-1} \end{pmatrix} \\ &= \begin{pmatrix} I_k & 0 \\ p_{0,21}p_{0,11}^{-1} & p_{0,22} - p_{0,21}p_{0,11}^{-1}p_{0,12} \end{pmatrix} \begin{pmatrix} I_k & 0 \\ 0 & \left( p_{0,22} - p_{0,21}p_{0,11}^{-1}p_{0,12} \right)^{-1} \end{pmatrix} \\ &= \begin{pmatrix} I_k & 0 \\ p_{0,21}p_{0,11}^{-1} & I_{n-k} \end{pmatrix}. \end{aligned}$$

Last, we may choose $u_1$ such that $p_{1,12} = 0$.

B.3 The Factorisation in Lanne and Saikkonen (2013)

Here, we point out differences and similarities of the WHF to the factorisation of the AR matrix polynomial in Lanne and Saikkonen (2013). It is of the form

$$\tilde{a}(z) = \Pi(z)\Phi\left(\tfrac{1}{z}\right) = \left( I - \Pi_1 z - \cdots - \Pi_r z^r \right)\left( I - \Phi_1 z^{-1} - \cdots - \Phi_s z^{-s} \right),$$

where both $\det(\Pi(z))$ and $\det(\Phi(z))$ have no zeros inside or on the unit circle. Thus, $\Phi\left(\frac{1}{z}\right)$ corresponds (roughly) to our $f(z)$.

Let us start by noting that if $s > 0$, there are negative powers in $\tilde{a}(z)$. This is due to the fact that Lanne and Saikkonen (2013) do not start from a polynomial matrix but directly from the factorisation. Moreover, the coefficient matrices pertaining to $z^{-s}$, $z^0$, $z^r$ are $\Phi_s$, $I_n + \sum_{k=1}^{\min(s,r)} \Pi_k \Phi_k$, $\Pi_r$, none of which is assumed to be of full rank.

In order to make the comparison easier, we introduce "pseudo partial indices" such that

$$a(z) = \Pi(z)\, z^s\, \Phi\left(\tfrac{1}{z}\right) = \left( I - \Pi_1 z - \cdots - \Pi_r z^r \right) z^s \left( I - \Phi_1 z^{-1} - \cdots - \Phi_s z^{-s} \right)$$

and compare it to the WHF of a polynomial matrix of the same form as the MA polynomial in the main text but with partial indices $\kappa_i = \kappa$, i.e. $b(z) = p(z) z^{\kappa} f(z)$ where $\deg(p(z)) = q - \kappa$ and $\deg(f(z)) = \kappa$, $b(0) = I_n$, and $\det(b(z)) \neq 0$ for $|z| = 1$.

First, note that in the case of constant partial indices it is possible to normalise $p_0$ to $I_n$ and that the condition $b(0) = I_n$ implies that $f_\kappa = I_n$. Thus, the normalisation of $\Pi(z)$ seems reasonable in this context, while the normalisation of $\Phi\left(\frac{1}{z}\right)$ is more difficult to bring into line with the WHF (and the generic existence of a factorisation of the kind in Lanne and Saikkonen (2013)). Second, while $s(z)f(z)$ is row-reduced and $f(\infty)$ is of full rank as well, the row-end-matrix of $\Phi(z)$ is not necessarily of full rank. However, the row-end-matrix of $z^s \Phi\left(\frac{1}{z}\right)$ is by definition of the factorisation equal to the identity matrix and therefore of full rank. Last, and even though this is entangled with $\Phi(z)$ not being of full row rank, the "pseudo partial indices" are restricted to be identical.

C Analytic Formulae and Asymptotic Derivations

Under the assumptions of Section 4, we fill in here the missing pieces and technicalities regarding the asymptotic behaviour of the MLE. In particular, we discuss the representation of the WHF with (finite sections of) Toeplitz operators (Böttcher and Grudsky, 2005), and derive the partial derivatives of the log-likelihood function from which the conditions for the asymptotic theory can be easily verified.

C.1 Notation and (Toeplitz) System Representations

The individual contribution at time $t$ to the (standardised) log-likelihood function, i.e. equation (5), is here repeated as

$$l_t(\theta) = \sum_{i=1}^{n} \log\left[ f_i\left( \sigma_i^{-1} \varepsilon_{i,t}(\theta); \lambda_i \right) \right] - \log\left[ \left| \det(f_0) \right| \right] - \log\left\{ \left| \det\left[ B(\beta) \right] \right| \right\} - \sum_{i=1}^{n} \log(\sigma_i),$$

where $\varepsilon_{i,t}(\theta) = \iota_i' B(\beta)^{-1} u_t(\theta)$ and $u_t(\theta) = m(z;\theta)^{-1} y_t$ with $m(z;\theta) = a(z;\theta)^{-1} p(z;\theta) s(z;\theta) f(z;\theta)$ such that $m(z;\theta) B(\beta) = k(z;\theta)$.

Derivatives of the component densities.

For the first partial derivatives of $l_t(\theta)$, the expressions

$$e_{i,x,t}(\theta) = \frac{\partial}{\partial x} \log\left[ f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right) \right] = \frac{f_{i,x}\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right)}{f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right)},$$

where $\iota_i$ is the unit vector which is one at position $i$ and zero otherwise, and

$$e_{i,\lambda_i,t}(\theta) = \frac{\partial}{\partial \lambda_i} \log\left[ f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right) \right] = \frac{f_{i,\lambda_i}\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right)}{f_i\left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta); \lambda_i \right)},$$

where $f_{i,x}(x; \lambda_i) = \frac{\partial}{\partial x} f_i(x; \lambda_i)$ and $f_{i,\lambda_i}(x; \lambda_i) = \frac{\partial}{\partial \lambda_i} f_i(x; \lambda_i)$, will be used extensively. The corresponding versions for all components are $e_{x,t}(\theta) = \left( e_{1,x,t}(\theta), \ldots, e_{n,x,t}(\theta) \right)'$ of dimension $n$ and $e_{\lambda,t}(\theta) = \left( e'_{1,\lambda_1,t}(\theta), \ldots, e'_{n,\lambda_n,t}(\theta) \right)'$ of dimension $d = d_1 + \cdots + d_n$. The notation $\frac{\partial e_{i,x,t}(\theta_0)}{\partial x} := \left. \frac{\partial e_{i,x,t}(\theta)}{\partial x} \right|_{\theta = \theta_0}$ is used to denote the derivative evaluated at a particular point.

Two different ways to express the partial derivatives of $u_t(\theta)$. The observations may be represented at one particular point in time or as a system containing all observations $(y_T, \ldots, y_1)$ as well as starting values $(y_0, \ldots, y_{1-p})$. The starting values for the process $(u_t)$ are set to zero, i.e. $(u_0, \ldots, u_{1-q}) = 0$. For simplicity, we also set the starting values $(y_0, \ldots, y_{1-p})$ equal to zero. If clarity of presentation is not affected, we use $x_{t-1} = \left( y'_{t-1}, \ldots, y'_{t-p} \right)'$ of dimension $np$ and $w^{(q)}_{t-1}(\theta) = \left( u'_{t-1}(\theta), \ldots, u'_{t-q}(\theta) \right)'$ of dimension $nq$ as shorthand notation.

One point in time.

For one particular point in time, we have

$$u_t(\theta) = y_t - (a_1, \ldots, a_p) \begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix} - (b_1, \ldots, b_q) \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-q}(\theta) \end{pmatrix} \qquad (6)$$

for $t \in \{1, \ldots, T\}$.

System representation.

All observations can be written as

$$(y_T \cdots y_1) - a_1 (y_{T-1} \cdots y_0) - \cdots - a_p (y_{T-p} \cdots y_{1-p}) = (u_T(\theta) \cdots u_1(\theta)) + b_1 (u_{T-1}(\theta) \cdots u_0(\theta)) + \cdots + b_q (u_{T-q}(\theta) \cdots u_{1-q}(\theta)). \qquad (7)$$

Defining the matrix

$$L = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ \vdots & & & 0 & 1 \\ 0 & \cdots & \cdots & \cdots & 0 \end{pmatrix} \in \mathbb{R}^{T \times T}$$

corresponding to the (non-invertible) lag operator on $\mathbb{N}$ such that

$$L \begin{pmatrix} u'_T(\theta) \\ u'_{T-1}(\theta) \\ \vdots \\ u'_1(\theta) \end{pmatrix} = \begin{pmatrix} u'_{T-1}(\theta) \\ u'_{T-2}(\theta) \\ \vdots \\ 0_{1 \times n} \end{pmatrix},$$

equation (7) can be written as

$$(y_T \cdots y_1) - a_1 (y_{T-1} \cdots y_0) - \cdots - a_p (y_{T-p} \cdots y_{1-p}) = (u_T(\theta) \cdots u_1(\theta)) + b_1 (u_T(\theta) \cdots u_1(\theta)) L' + \cdots + b_q (u_T(\theta) \cdots u_1(\theta)) (L')^q.$$

Vectorising equation (7) leads to

$$\begin{aligned} & \operatorname{vec}(y_T \cdots y_1) - \left[ \begin{pmatrix} y'_{T-1} \\ y'_{T-2} \\ \vdots \\ y'_0 \end{pmatrix} \otimes I_n,\; \begin{pmatrix} y'_{T-2} \\ y'_{T-3} \\ \vdots \\ y'_{-1} \end{pmatrix} \otimes I_n,\; \ldots,\; \begin{pmatrix} y'_{T-p} \\ y'_{T-p-1} \\ \vdots \\ y'_{1-p} \end{pmatrix} \otimes I_n \right] \underbrace{\operatorname{vec}(a_1, \ldots, a_p)}_{=\tau_1} \qquad (8) \\ &= \operatorname{vec}(u_T(\theta) \cdots u_1(\theta)) + \left[ L \begin{pmatrix} u'_T(\theta) \\ \vdots \\ u'_1(\theta) \end{pmatrix} \otimes I_n,\; L^2 \begin{pmatrix} u'_T(\theta) \\ \vdots \\ u'_1(\theta) \end{pmatrix} \otimes I_n,\; \ldots,\; L^q \begin{pmatrix} u'_T(\theta) \\ \vdots \\ u'_1(\theta) \end{pmatrix} \otimes I_n \right] \underbrace{\operatorname{vec}(b_1, \ldots, b_q)}_{=\pi} \\ &= \left[ I_{Tn} + \sum_{i=1}^{q} \left( L^i \otimes b_i \right) \right] \operatorname{vec}(u_T(\theta) \cdots u_1(\theta)), \end{aligned}$$

where the vectorisation formula $\operatorname{vec}(ABC) = (C' \otimes A) \operatorname{vec}(B)$ has been applied to $\left\{ [I_n]\,[a_j]\,[(y_{T-j} \cdots y_{1-j})] \right\}$ on the left-hand side and to $\left( [b_j]\,[(u_T(\theta) \cdots u_1(\theta))]\,[(L')^j] \right)$ and $\left( [I_n]\,[b_j]\,[(u_T(\theta) \cdots u_1(\theta)) (L')^j] \right)$ on the right-hand side of equation (7).

By using the (conditional maximum likelihood) assumption that $(y_0, \ldots, y_{1-p})$ be zero, we can also vectorise the left-hand side of equation (7) as

$$\operatorname{vec}\left[ (y_T \cdots y_1) - a_1 (y_T \cdots y_1) L' - \cdots - a_p (y_T \cdots y_1) (L')^p \right] = \operatorname{vec}(y_T \cdots y_1) - \sum_{j=1}^{p} \left( L^j \otimes a_j \right) \operatorname{vec}(y_T \cdots y_1)$$

in order to obtain

$$\mathcal{B} \begin{pmatrix} u_T(\theta) \\ \vdots \\ u_1(\theta) \end{pmatrix} = \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix},$$

where

$$\mathcal{A} = \left[ I_{Tn} - \sum_{i=1}^{p} \left( L^i \otimes a_i \right) \right] = \begin{pmatrix} I_n & -a_1 & \cdots & -a_p & 0 & \cdots & 0 \\ 0 & I_n & -a_1 & \cdots & -a_p & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ \vdots & & \ddots & I_n & -a_1 & \cdots & -a_p \\ \vdots & & & \ddots & \ddots & \ddots & \vdots \\ \vdots & & & & \ddots & I_n & -a_1 \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & I_n \end{pmatrix} \in \mathbb{R}^{Tn \times Tn}$$

and

$$\mathcal{B} = \left[ I_{Tn} + \sum_{i=1}^{q} \left( L^i \otimes b_i \right) \right] = \begin{pmatrix} I_n & b_1 & \cdots & b_q & 0 & \cdots & 0 \\ 0 & I_n & b_1 & \cdots & b_q & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ \vdots & & \ddots & I_n & b_1 & \cdots & b_q \\ \vdots & & & \ddots & \ddots & \ddots & \vdots \\ \vdots & & & & \ddots & I_n & b_1 \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & I_n \end{pmatrix} \in \mathbb{R}^{Tn \times Tn}.$$

WHF as (finite sections of) Toeplitz operators.

Similar to, e.g., Böttcher and Grudsky (2005, Chapter 1), we represent the WHF of $b(z) = p(z)s(z)f(z)$ in terms of finite sections of the corresponding Toeplitz operator. Let $\mathcal{B} = I_{Tn} + \sum_{i=1}^{q} \left( L^i \otimes b_i \right)$ denote the block upper triangular Toeplitz matrix with $I_n$ on the block diagonal and $b_i$ on the $i$-th block superdiagonal. We have for $(\kappa, 0)$ that $\mathcal{B} = \mathcal{P}\mathcal{S}\mathcal{F}$, where $\mathcal{S} = (L \otimes I_n)^{\kappa}$,

$$\mathcal{P} = \begin{pmatrix} I_n & p_1 & \cdots & p_{q-\kappa} & 0 & \cdots & 0 \\ 0 & I_n & p_1 & \cdots & p_{q-\kappa} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ \vdots & & \ddots & I_n & p_1 & \cdots & p_{q-\kappa} \\ \vdots & & & \ddots & \ddots & \ddots & \vdots \\ \vdots & & & & \ddots & I_n & p_1 \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & I_n \end{pmatrix}, \qquad \mathcal{F} = \begin{pmatrix} f_0 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ f_1 & f_0 & \ddots & & & & \vdots \\ \vdots & f_1 & f_0 & \ddots & & & \vdots \\ f_{\kappa-1} & \vdots & \ddots & \ddots & \ddots & & \vdots \\ I_n & f_{\kappa-1} & & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & & \ddots & f_0 & 0 \\ 0 & \cdots & I_n & f_{\kappa-1} & \cdots & f_1 & f_0 \end{pmatrix},$$

and for $(\kappa, k)$ with $k \neq 0$, the matrix $\mathcal{S}$ is replaced with $(L \otimes I_n)^{\kappa} \left[ (L \otimes S_{1,k}) + (I_T \otimes S_{2,k}) \right]$, where $S_{1,k} = \begin{pmatrix} I_k & 0 \\ 0 & 0_{n-k} \end{pmatrix}$ and $S_{2,k} = \begin{pmatrix} 0_k & 0 \\ 0 & I_{n-k} \end{pmatrix}$, and some matrices in $\mathcal{B}$ and $\mathcal{F}$ are adjusted and have zero-, one- and equality-restrictions. The matrices $\mathcal{B}$ and $\mathcal{F}$ are invertible if and only if the matrix on the diagonal is invertible. For $\mathcal{B}$ this is obvious, for $\mathcal{F}$ it holds by construction of the WHF.

(Footnote on the Toeplitz representation: Notice that we consider a left-WHF in contrast to the right-WHF analysed in Böttcher and Grudsky (2005, page 6). Therefore, the results in Böttcher and Grudsky (2005) are sometimes not directly transferable. Moreover, these authors treat the univariate case. Be that as it may, the multivariate generalisation is (for our requirements) obvious.)
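The finite-section matrix $I_{Tn} + \sum_{i=1}^{q} (L^i \otimes b_i)$ from the system representation can be assembled directly from Kronecker products. The following is a minimal sketch in Python with NumPy, assuming randomly drawn coefficient matrices $b_1, b_2$ and zero starting values; the helper names `shift_matrix` and `ma_toeplitz` are hypothetical and are not taken from the accompanying R-package. It checks that multiplying the stacked vector $(u_T', \ldots, u_1')'$ by this matrix reproduces $u_t + b_1 u_{t-1} + \cdots + b_q u_{t-q}$ for every $t$.

```python
import numpy as np

def shift_matrix(T):
    """T x T matrix L with ones on the first superdiagonal (finite lag operator)."""
    return np.eye(T, k=1)

def ma_toeplitz(b_coeffs, T):
    """Assemble I_{Tn} + sum_i kron(L^i, b_i) for MA coefficient matrices b_1, ..., b_q."""
    n = b_coeffs[0].shape[0]
    L = shift_matrix(T)
    B = np.eye(T * n)
    for i, b_i in enumerate(b_coeffs, start=1):
        B += np.kron(np.linalg.matrix_power(L, i), b_i)
    return B

rng = np.random.default_rng(0)
n, q, T = 2, 2, 6
b = [rng.normal(size=(n, n)) for _ in range(q)]   # illustrative b_1, b_2
u = rng.normal(size=(T, n))                       # row t-1 holds u_t, t = 1, ..., T

B = ma_toeplitz(b, T)
stacked_u = u[::-1].reshape(-1)                   # (u_T', ..., u_1')'
lhs = B @ stacked_u

# Direct recursion u_t + sum_i b_i u_{t-i} with starting values u_0 = u_{-1} = ... = 0
direct = np.zeros((T, n))
for t in range(1, T + 1):
    value = u[t - 1].copy()
    for i, b_i in enumerate(b, start=1):
        if t - i >= 1:
            value += b_i @ u[t - i - 1]
    direct[t - 1] = value
rhs = direct[::-1].reshape(-1)

print(np.allclose(lhs, rhs))
```

Because the matrix is block upper triangular with identity blocks on the diagonal, its determinant equals one for every $T$, which is the sense in which invertibility is obvious for any finite sample size.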

Moreover, it also holds that for growing sample size $T$, the inverses of $\mathcal{P}$ and $\mathcal{F}$ exist in the sense that the operator norms that are induced by the $\ell^\infty$ and the $\ell^1$ norm, i.e. the maximum row-sum and maximum column-sum norm, remain finite. The same does not hold for $\mathcal{B}$: while $\mathcal{B}$ is invertible for every finite sample size $T$, the norm of its inverse diverges to infinity for sample size going to infinity. See Böttcher and Grudsky (2005, Chapter 1.6) for a more precise statement.

While $\mathcal{S}$ (corresponding to the backward shift in the Toeplitz representation) is not invertible, its Moore-Penrose pseudo-inverse $\mathcal{S}^{\dagger}$ is equal to $(F \otimes I_n)$, where

$$F = \begin{pmatrix} 0 & \cdots & \cdots & 0 \\ 1 & 0 & & \vdots \\ & \ddots & \ddots & \vdots \\ 0 & & 1 & 0 \end{pmatrix}.$$

The polynomial $b(z)$ can also be represented as $b(z) = p(z)g(z)$ where $g(z) = s(z)f(z)$. This will be useful when deriving analytic formulae for the score with respect to the system parameters. In this case, we have in the case $(\kappa, 0)$ that $\mathcal{B} = \mathcal{P}\mathcal{G}$, where

$$\mathcal{G} = \begin{pmatrix} I_n & g_1 & \cdots & g_\kappa & 0 & \cdots & 0 \\ 0 & I_n & g_1 & \cdots & g_\kappa & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ \vdots & & \ddots & I_n & g_1 & \cdots & g_\kappa \\ \vdots & & & \ddots & \ddots & \ddots & \vdots \\ \vdots & & & & \ddots & I_n & g_1 \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & I_n \end{pmatrix}.$$

In the case $(\kappa, k)$, there are some changes to the parameter matrices as described above and in the main text. Notice that, similar to the finite sections of the Toeplitz operator corresponding to $b(z)$, the $(nT \times nT)$-dimensional matrix $\mathcal{G}$ is invertible for every $T$, but for $T$ going to infinity, the induced operator norms of the inverses diverge.

C.2 Score of System Parameters

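C.2.1 System Parameters: Generalities

$$\begin{aligned} \frac{\partial l_t(\theta)}{\partial \tau} &= \frac{\partial}{\partial \tau} \left\{ \sum_{i=1}^{n} \log\left[ f_i\left( \sigma_i^{-1} \left( \iota_i' B(\beta)^{-1} u_t(\theta) \right); \lambda_i \right) \right] - \log\left[ \left| \det(f_0) \right| \right] - \log\left\{ \left| \det\left[ B(\beta) \right] \right| \right\} - \sum_{i=1}^{n} \log(\sigma_i) \right\} \\ &= \frac{\partial}{\partial \tau} \left\{ \sum_{i=1}^{n} \log\left[ f_i\left( \sigma_i^{-1} \left( u_t'(\theta) B'(\beta)^{-1} \iota_i \right); \lambda_i \right) \right] - \log\left[ \left| \det(f_0) \right| \right] \right\} \\ &= \sum_{i=1}^{n} e_{i,x,t}(\theta) \frac{\partial u_t(\theta)'}{\partial \tau} \sigma_i^{-1} B'(\beta)^{-1} \iota_i - \frac{\partial}{\partial \tau} \log\left[ \left| \det(f_0) \right| \right] \\ &= \frac{\partial u_t(\theta)'}{\partial \tau} B'(\beta)^{-1} \Sigma^{-1} e_{x,t} - \frac{1}{\det(f_0)} \frac{\partial \det(f_0)}{\partial \tau} \\ &= \frac{\partial u_t(\theta)'}{\partial \tau} B'(\beta)^{-1} \Sigma^{-1} e_{x,t} - \frac{1}{\det(f_0)} \det(f_0) \frac{\partial \operatorname{vec}(f_0)'}{\partial \tau} \operatorname{vec}\left( f_0'^{-1} \right) \\ &= \frac{\partial u_t(\theta)'}{\partial \tau} B'(\beta)^{-1} \Sigma^{-1} e_{x,t} - \frac{\partial \operatorname{vec}(f_0)'}{\partial \tau} \operatorname{vec}\left( f_0'^{-1} \right). \end{aligned}$$

For the derivative of the determinant, we have that $\frac{\partial \det(Z)}{\partial x'} = \operatorname{vec}\left[ \operatorname{adj}(Z)' \right]' \frac{\partial \operatorname{vec}(Z)}{\partial x'} = \det(Z) \operatorname{vec}\left( Z'^{-1} \right)' \frac{\partial \operatorname{vec}(Z)}{\partial x'}$ or equivalently $\frac{\partial \det(Z)}{\partial x} = \det(Z) \frac{\partial \operatorname{vec}(Z)'}{\partial x} \operatorname{vec}\left( Z'^{-1} \right)$, see Seber (2008, 17.26(c), page 361).

(Footnote on the pseudo-inverse used in Section C.1: The Moore-Penrose pseudo-inverse $A^{\dagger}$ of a square matrix $A$ satisfies $AA^{\dagger}A = A$, $A^{\dagger}AA^{\dagger} = A^{\dagger}$, $\left( AA^{\dagger} \right)' = AA^{\dagger}$, and $\left( A^{\dagger}A \right)' = A^{\dagger}A$.)

Partitioning of system parameters.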

Parameters pertaining to $a(z)$, $p(z)$, and $f(z)$ are in $\tau_1$, $\tau_2$, $\tau_3$ respectively. Remember that

$$u_t(\theta) = a_{\tau}(z) y_t + \left( I - p_{\tau}(z) s_{\tau}(z) f_{\tau}(z) \right) \underbrace{B(\beta) \varepsilon_t(\theta)}_{=u_t(\theta)}.$$

C.2.2 AR Parameters

The derivative of $u_t$ with respect to $\tau_1$ for one equation. We obtain from vectorising (6) that

$$\begin{aligned} u_t(\theta) &= y_t - (a_1, \ldots, a_p) \begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix} - (b_1, \ldots, b_q) \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-q}(\theta) \end{pmatrix} \\ &= y_t - \left( \left( y'_{t-1}, \ldots, y'_{t-p} \right) \otimes I_n \right) \operatorname{vec}(a_1, \ldots, a_p) - (b_1, \ldots, b_q) \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-q}(\theta) \end{pmatrix}. \end{aligned}$$

Transposition and differentiation lead to

$$u'_t(\theta) = y'_t - \tau_1' \left[ \begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix} \otimes I_n \right] - \left( u'_{t-1}(\theta), \ldots, u'_{t-q}(\theta) \right) \begin{pmatrix} b'_1 \\ \vdots \\ b'_q \end{pmatrix}$$

and

$$\frac{\partial u'_t(\theta)}{\partial \tau_1} = -\left( x_{t-1} \otimes I_n \right) - \left( \frac{\partial u'_{t-1}(\theta)}{\partial \tau_1}, \ldots, \frac{\partial u'_{t-q}(\theta)}{\partial \tau_1} \right) \begin{pmatrix} b'_1 \\ \vdots \\ b'_q \end{pmatrix}.$$

Finally, we may express $\frac{\partial u_t(\theta)}{\partial \tau_1'}$ using a lag polynomial, i.e.

$$\left[ p(z) z^{\kappa} f(z) \right] \underbrace{\frac{\partial u_t(\theta)}{\partial \tau_1'}}_{(n \times n^2 p)} = -\underbrace{\left( x'_{t-1} \otimes I_n \right)}_{(n \times n^2 p)} \iff \frac{\partial u_t(\theta)}{\partial \tau_1'} = -\left[ f(z)^{-1} z^{-\kappa} p(z)^{-1} \right] \left[ x'_{t-1} \otimes I_n \right].$$

(Footnote: It is irrelevant here that we use $b(z)$ instead of its WHF because both derivatives are zero.)

Note that the power series $f(z)^{-1} = \sum_{j=0}^{\infty} h_j z^{-j}$ only depends on non-positive powers of $z$ and that $h_0 = f_0^{-1}$ is non-singular. For convenience, we define the quantity $x'_{b,t-1} = \left[ f(z)^{-1} z^{-\kappa} p(z)^{-1} \right] \left[ x'_{t-1} \otimes I_n \right]$.

Result for $l_{\tau_1,t}(\theta)$ for one point in time. This implies for the score that

$$\frac{\partial l_t(\theta)}{\partial \tau_1} = -x_{b,t-1}(\theta) B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta).$$

The derivative of $u_t$ with respect to $\tau_1$ for all points in time. Rewriting equation (8) as

$$\begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix} - \left[ \begin{pmatrix} x'_{T-1} \\ \vdots \\ x'_0 \end{pmatrix} \otimes I_n \right] \underbrace{\operatorname{vec}(a_1, \ldots, a_p)}_{=\tau_1} = \underbrace{\left[ I_{Tn} + \sum_{i=1}^{q} \left( L^i \otimes b_i \right) \right]}_{=\mathcal{B}=\mathcal{P}\mathcal{S}\mathcal{F}} \operatorname{vec}(u_T(\theta) \cdots u_1(\theta)),$$

transposing it and taking partial derivatives leads to

$$\frac{\partial}{\partial \tau_1} \left( u'_T(\theta) \cdots u'_1(\theta) \right) = -\frac{\partial}{\partial \tau_1} \left[ \tau_1' \left[ \left( x_{T-1} \cdots x_0 \right) \otimes I_n \right] \mathcal{P}'^{-1} (\mathcal{S}')^{\dagger} \mathcal{F}'^{-1} \right] = -\left[ \left( x_{T-1} \cdots x_0 \right) \otimes I_n \right] \mathcal{P}'^{-1} (\mathcal{S}')^{\dagger} \mathcal{F}'^{-1}.$$

Note that $\mathcal{P}'^{-1}$ is block-lower-triangular and $\mathcal{F}'^{-1}$ is block-upper-triangular. Their block diagonals correspond to the coefficients of the associated power series in the WHF, whose (matrix-) norms are decreasing at an exponential rate.

Result for $\frac{\partial L_T(\theta)}{\partial \tau_1}$. The partial derivative of the standardised log-likelihood function with respect to $\tau_1$ is

$$\frac{\partial L_T(\theta)}{\partial \tau_1} = \frac{1}{T} \sum_{t=1}^{T} l_{\tau_1,t}(\theta) = -\frac{1}{T} \left[ \left( x_{T-1} \cdots x_0 \right) \otimes I_n \right] \mathcal{P}'^{-1} (\mathcal{S}')^{\dagger} \mathcal{F}'^{-1} \left( I_T \otimes B'(\beta)^{-1} \Sigma^{-1} \right) \begin{pmatrix} e_{x,T}(\theta) \\ \vdots \\ e_{x,1}(\theta) \end{pmatrix}.$$

C.2.3 "Stable" MA Parameters

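We consider the case $(\kappa, 0)$ for $\tau_2$ such that the free parameters are in $\tau_2 = \operatorname{vec}(p_1, \ldots, p_{q-\kappa})$. Taking the partial derivative with respect to $\tau_2'$ of

$$\begin{aligned} a(z) y_t &= \left[ p(z) g(z) \right] u_t(\theta) \\ &= (I_n, p_1, \ldots, p_{q-\kappa}) \left( I_{q-\kappa+1} \otimes g(z) \right) \begin{pmatrix} u_t(\theta) \\ u_{t-1}(\theta) \\ \vdots \\ u_{t-(q-\kappa)}(\theta) \end{pmatrix} \\ &= (I_n, p_1, \ldots, p_{q-\kappa}) \begin{pmatrix} v_t(\theta) \\ v_{t-1}(\theta) \\ \vdots \\ v_{t-(q-\kappa)}(\theta) \end{pmatrix} \\ &= v_t(\theta) + (p_1, \ldots, p_{q-\kappa}) \begin{pmatrix} v_{t-1}(\theta) \\ \vdots \\ v_{t-(q-\kappa)}(\theta) \end{pmatrix}, \end{aligned}$$

where $v_t(\theta) = g(z) u_t(\theta)$, we obtain

$$\frac{\partial v_t(\theta)}{\partial \tau_2'} = -(p_1, \ldots, p_{q-\kappa}) \begin{pmatrix} \frac{\partial v_{t-1}(\theta)}{\partial \tau_2'} \\ \vdots \\ \frac{\partial v_{t-(q-\kappa)}(\theta)}{\partial \tau_2'} \end{pmatrix} - \left[ \begin{pmatrix} v_{t-1}(\theta) \\ \vdots \\ v_{t-(q-\kappa)}(\theta) \end{pmatrix}' \otimes I_n \right] \iff p(z) \frac{\partial v_t(\theta)}{\partial \tau_2'} = -\left[ \begin{pmatrix} v_{t-1}(\theta) \\ \vdots \\ v_{t-(q-\kappa)}(\theta) \end{pmatrix}' \otimes I_n \right].$$

We define $w'_{g,t-1} = \left[ \begin{pmatrix} v_{t-1}(\theta) \\ \vdots \\ v_{t-(q-\kappa)}(\theta) \end{pmatrix}' \otimes I_n \right]$ and obtain

$$\left( I_n + b_1 z + \cdots + b_q z^q \right) \frac{\partial u_t(\theta)}{\partial \tau_2'} = -\left[ w'_{g,t-1}(\theta) \otimes I_n \right] \iff \frac{\partial u_t(\theta)}{\partial \tau_2'} = -f(z)^{-1} z^{-\kappa} p(z)^{-1} \left[ w'_{g,t-1}(\theta) \otimes I_n \right].$$

Note that $w'_{g,t-1} = \left( g(z) u_{1,t-1}, \ldots, g(z) u_{n,t-1} \,|\, \cdots \,|\, g(z) u_{1,t-(q-\kappa)}, \ldots, g(z) u_{n,t-(q-\kappa)} \right)$.

All Equations.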

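Rewriting equation (8) as

$$\begin{aligned} \begin{pmatrix} u_T(\theta) \\ \vdots \\ u_1(\theta) \end{pmatrix} &= \Big( I_{nT} - \underbrace{\mathcal{P}\mathcal{G}}_{=\mathcal{B}} \Big) \begin{pmatrix} u_T(\theta) \\ \vdots \\ u_1(\theta) \end{pmatrix} + \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix} = \begin{pmatrix} u_T(\theta) \\ \vdots \\ u_1(\theta) \end{pmatrix} - \mathcal{P} \begin{pmatrix} v_T(\theta) \\ \vdots \\ v_1(\theta) \end{pmatrix} + \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix} \\ &= -\left[ L \begin{pmatrix} v'_T(\theta) \\ \vdots \\ v'_1(\theta) \end{pmatrix} \otimes I_n,\; L^2 \begin{pmatrix} v'_T(\theta) \\ \vdots \\ v'_1(\theta) \end{pmatrix} \otimes I_n,\; \ldots,\; L^{q-\kappa} \begin{pmatrix} v'_T(\theta) \\ \vdots \\ v'_1(\theta) \end{pmatrix} \otimes I_n \right] \tau_2 + \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix} \\ &= -\left[ \begin{pmatrix} v'_{T-1}(\theta) & \cdots & v'_{T-(q-\kappa)}(\theta) \\ \vdots & & \vdots \\ v'_0(\theta) & \cdots & v'_{1-(q-\kappa)}(\theta) \end{pmatrix} \otimes I_n \right] \tau_2 + \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix}, \end{aligned}$$

transposing and taking derivatives leads to

$$\frac{\partial \left( v'_T(\theta), \ldots, v'_1(\theta) \right)}{\partial \tau_2} \underbrace{\mathcal{B}'}_{=\mathcal{F}'\mathcal{S}'\mathcal{P}'} = -\left[ \begin{pmatrix} v_{T-1}(\theta) & \cdots & v_0(\theta) \\ \vdots & & \vdots \\ v_{T-(q-\kappa)}(\theta) & \cdots & v_{1-(q-\kappa)}(\theta) \end{pmatrix} \otimes I_n \right],$$

which in turn is equivalent to

$$\frac{\partial \left( v'_T(\theta), \ldots, v'_1(\theta) \right)}{\partial \tau_2} = -\left[ \begin{pmatrix} v_{T-1}(\theta) & \cdots & v_0(\theta) \\ \vdots & & \vdots \\ v_{T-(q-\kappa)}(\theta) & \cdots & v_{1-(q-\kappa)}(\theta) \end{pmatrix} \otimes I_n \right] \mathcal{P}'^{-1} \mathcal{S}'^{\dagger} \mathcal{F}'^{-1}.$$

Result for $\frac{\partial L_T(\theta)}{\partial \tau_2}$. Finally, we obtain for the partial derivative of the standardised log-likelihood function with respect to $\tau_2$ that

$$\frac{\partial L_T(\theta)}{\partial \tau_2} = \frac{1}{T} \sum_{t=1}^{T} l_{\tau_2,t}(\theta) = -\frac{1}{T} \left[ \left( w_{g,T-1}(\theta) \cdots w_{g,0}(\theta) \right) \otimes I_n \right] \mathcal{P}'^{-1} \mathcal{S}'^{\dagger} \mathcal{F}'^{-1} \left( I_T \otimes B'(\beta)^{-1} \Sigma^{-1} \right) \begin{pmatrix} e_{x,T}(\theta) \\ \vdots \\ e_{x,1}(\theta) \end{pmatrix}.$$

C.2.4 "Unstable" MA Parameters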

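Similarly, we consider the case $(\kappa, 0)$ for $\tau_3$ such that the free parameters are in $\tau_3 = \operatorname{vec}(g_1, \ldots, g_\kappa)$. Taking the partial derivative with respect to $\tau_3'$ of

$$\begin{aligned} a(z) y_t &= \left[ p(z) g(z) \right] u_t(\theta) \\ &= p(z) (I_n, g_1, \ldots, g_\kappa) \begin{pmatrix} u_t(\theta) \\ u_{t-1}(\theta) \\ \vdots \\ u_{t-\kappa}(\theta) \end{pmatrix} \\ &= p(z) (g_1, \ldots, g_\kappa) \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-\kappa}(\theta) \end{pmatrix} + p(z) u_t(\theta), \end{aligned}$$

we obtain

$$-p(z) \frac{\partial u_t(\theta)}{\partial \tau_3'} = \left[ \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-\kappa}(\theta) \end{pmatrix}' \otimes p(z) \right] \frac{\partial \tau_3}{\partial \tau_3'} + p(z) (g_1, \ldots, g_\kappa) \begin{pmatrix} \frac{\partial u_{t-1}(\theta)}{\partial \tau_3'} \\ \vdots \\ \frac{\partial u_{t-\kappa}(\theta)}{\partial \tau_3'} \end{pmatrix} \iff \underbrace{p(z) g(z)}_{=b(z)} \frac{\partial u_t(\theta)}{\partial \tau_3'} = -\left[ \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-\kappa}(\theta) \end{pmatrix}' \otimes p(z) \right].$$

We define $w'_{p,t-1} = \left[ \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-\kappa}(\theta) \end{pmatrix}' \otimes p(z) \right]$ and obtain

$$\begin{aligned} \left( I_n + b_1 z + \cdots + b_q z^q \right) \frac{\partial u_t(\theta)}{\partial \tau_3'} = -\left[ w'_{p,t-1}(\theta) \otimes I_n \right] \iff \frac{\partial u_t(\theta)}{\partial \tau_3'} &= -f(z)^{-1} z^{-\kappa} p(z)^{-1} \left[ w'_{p,t-1}(\theta) \otimes I_n \right] \\ &= -f(z)^{-1} \left[ \left( u'_{t+\kappa-1}(\theta), \ldots, u'_t(\theta) \right) \otimes I_n \right]. \end{aligned}$$

Similar to the partial derivative with respect to $\tau_2$, we see that $w'_{p,t-1} = \left( p(z) u_{1,t-1}, \ldots, p(z) u_{n,t-1} \,|\, \cdots \,|\, p(z) u_{1,t-\kappa}, \ldots, p(z) u_{n,t-\kappa} \right)$.

All Equations.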

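Rewriting equation (8) as

$$\begin{aligned} \mathcal{G} \begin{pmatrix} u_T(\theta) \\ \vdots \\ u_1(\theta) \end{pmatrix} = \mathcal{P}^{-1} \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix} \iff \begin{pmatrix} u_T(\theta) \\ \vdots \\ u_1(\theta) \end{pmatrix} &= \left( I_{nT} - \mathcal{G} \right) \begin{pmatrix} u_T(\theta) \\ \vdots \\ u_1(\theta) \end{pmatrix} + \mathcal{P}^{-1} \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix} \\ &= -\left[ \begin{pmatrix} u'_{T-1}(\theta) & \cdots & u'_{T-\kappa}(\theta) \\ \vdots & & \vdots \\ u'_0(\theta) & \cdots & u'_{1-\kappa}(\theta) \end{pmatrix} \otimes I_n \right] \tau_3 + \mathcal{P}^{-1} \mathcal{A} \begin{pmatrix} y_T \\ \vdots \\ y_1 \end{pmatrix}, \end{aligned}$$

transposing and taking derivatives leads to

$$\frac{\partial \left( u'_T(\theta), \ldots, u'_1(\theta) \right)}{\partial \tau_3} \mathcal{G}' = -\left[ \begin{pmatrix} u_{T-1}(\theta) & \cdots & u_0(\theta) \\ \vdots & & \vdots \\ u_{T-\kappa}(\theta) & \cdots & u_{1-\kappa}(\theta) \end{pmatrix} \otimes I_n \right] \iff \frac{\partial \left( u'_T(\theta), \ldots, u'_1(\theta) \right)}{\partial \tau_3} \underbrace{\mathcal{B}'}_{=\mathcal{F}'\mathcal{S}'\mathcal{P}'} = -\left[ \begin{pmatrix} u_{T-1}(\theta) & \cdots & u_0(\theta) \\ \vdots & & \vdots \\ u_{T-\kappa}(\theta) & \cdots & u_{1-\kappa}(\theta) \end{pmatrix} \otimes I_n \right] \mathcal{P}',$$

which in turn is equivalent to

$$\frac{\partial \left( u'_T(\theta), \ldots, u'_1(\theta) \right)}{\partial \tau_3} = -\left[ \begin{pmatrix} u_{T-1}(\theta) & \cdots & u_0(\theta) \\ \vdots & & \vdots \\ u_{T-\kappa}(\theta) & \cdots & u_{1-\kappa}(\theta) \end{pmatrix} \otimes I_n \right] \mathcal{S}'^{\dagger} \mathcal{F}'^{-1}.$$

Result for $\frac{\partial L_T(\theta)}{\partial \tau_3}$. Finally, we obtain for the partial derivative of the standardised log-likelihood function with respect to $\tau_3$ that

$$\frac{\partial L_T(\theta)}{\partial \tau_3} = \frac{1}{T} \sum_{t=1}^{T} l_{\tau_3,t}(\theta) = -\frac{1}{T} \left[ \left( w_{T-1}(\theta) \cdots w_0(\theta) \right) \otimes I_n \right] \mathcal{S}'^{\dagger} \mathcal{F}'^{-1} \left( I_T \otimes B'(\beta)^{-1} \Sigma^{-1} \right) \begin{pmatrix} e_{x,T}(\theta) \\ \vdots \\ e_{x,1}(\theta) \end{pmatrix} - \frac{\partial \operatorname{vec}(f_0)'}{\partial \tau_3} \operatorname{vec}\left( f_0'^{-1} \right),$$

where $w'_{T-1}(\theta) = \left( u'_{T-1}(\theta), \ldots, u'_{T-\kappa}(\theta) \right)$.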
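The score expressions above involve the pseudo-inverse of the (transposed) shift matrix rather than an ordinary inverse. As a minimal numerical sketch, assuming the case $(\kappa, 0)$ with the shift matrix taken to be $(L \otimes I_n)^{\kappa}$ and with illustrative dimensions, the following Python snippet checks that its Moore-Penrose pseudo-inverse coincides with $(L' \otimes I_n)^{\kappa}$ and satisfies the four Penrose conditions. The variable names are hypothetical.

```python
import numpy as np

T, n, kappa = 5, 2, 2                      # illustrative dimensions
L = np.eye(T, k=1)                         # ones on the first superdiagonal

# Finite-section shift matrix (L kron I_n)^kappa and its numerical pseudo-inverse
S = np.linalg.matrix_power(np.kron(L, np.eye(n)), kappa)
S_pinv = np.linalg.pinv(S)

# Candidate closed form: the opposite shift (L' kron I_n)^kappa
S_back = np.linalg.matrix_power(np.kron(L.T, np.eye(n)), kappa)

matches_closed_form = np.allclose(S_pinv, S_back)
penrose_conditions = [
    np.allclose(S @ S_back @ S, S),
    np.allclose(S_back @ S @ S_back, S_back),
    np.allclose((S @ S_back).T, S @ S_back),
    np.allclose((S_back @ S).T, S_back @ S),
]
print(matches_closed_form, all(penrose_conditions))
```

The closed form holds because the shift maps a subset of the standard basis vectors isometrically onto another subset and sends the rest to zero, so its pseudo-inverse is simply its transpose.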
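C.3 Score of Noise Parameters

C.3.1 Partial Derivative with respect to β

By taking the derivative of (5), we obtain for $\beta \in \mathbb{R}^{n(n-1)}$

\[
\begin{aligned}
\frac{\partial l_t(\theta)}{\partial \beta}
&= \frac{\partial}{\partial \beta} \left\{ \sum_{i=1}^{n} \log \left[ f_i \left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta) ; \lambda_i \right) \right] \right\}
 - \frac{\partial \log \{ \det [ B(\beta) ] \}}{\partial \beta} \\
&= \sum_{i=1}^{n} e_{i,x,t}(\theta)\, \sigma_i^{-1} \frac{\partial}{\partial \beta} \left( u_t'(\theta) B'(\beta)^{-1} \iota_i + \tfrac{1}{2} \operatorname{vec} \left( \iota_i' B(\beta)^{-1} u_t(\theta) \right) \right)
 - \frac{1}{\det(B(\beta))} \frac{\partial \det(B(\beta))}{\partial \beta} \\
&= \sum_{i=1}^{n} e_{i,x,t}(\theta)\, \sigma_i^{-1} \frac{\partial}{\partial \beta} \Bigl( u_t'(\theta) B'(\beta)^{-1} \iota_i + \tfrac{1}{2} \underbrace{\left[ \left( u_t'(\theta) \otimes \iota_i' \right) \operatorname{vec} \left( B(\beta)^{-1} \right) \right]}_{= \text{scalar}} \Bigr)
 - \frac{1}{\det(B(\beta))} \left( \det(B(\beta)) \frac{\partial \operatorname{vec}(B(\beta))'}{\partial \beta} \operatorname{vec} \left( B'(\beta)^{-1} \right) \right) \\
&= \sum_{i=1}^{n} e_{i,x,t}(\theta)\, \sigma_i^{-1} \left\{ \left( \frac{\partial u_t'(\theta)}{\partial \beta} \right) B'(\beta)^{-1} \iota_i + \left[ \frac{\partial \operatorname{vec} \left( B(\beta)^{-1} \right)'}{\partial \beta} \left( u_t(\theta) \otimes \iota_i \right) \right] \right\}
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right) \\
&= \left( \frac{\partial u_t'(\theta)}{\partial \beta} \right) B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta)
 + \left[ \frac{\partial \operatorname{vec} \left( B(\beta)^{-1} \right)'}{\partial \beta} \left( u_t(\theta) \otimes \Sigma^{-1} e_{x,t}(\theta) \right) \right]
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right),
\end{aligned}
\]

where we used again that the derivative of the determinant is

\[
\frac{\partial \det(Z)}{\partial x'} = \operatorname{vec} \left[ \operatorname{adj}(Z)' \right]' \frac{\partial \operatorname{vec}(Z)}{\partial x'} = \det(Z) \operatorname{vec} \left( Z'^{-1} \right)' \frac{\partial \operatorname{vec}(Z)}{\partial x'}
\]

or equivalently

\[
\frac{\partial \det(Z)}{\partial x} = \det(Z) \frac{\partial \operatorname{vec}(Z)'}{\partial x} \operatorname{vec} \left( Z'^{-1} \right)
\]

(Seber, 2008, 17.26(c), page 361).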
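The log-determinant term can be checked numerically. The sketch below compares $H' \operatorname{vec}(B'(\beta)^{-1})$ with a central finite-difference gradient of $\log \det B(\beta)$ for $n = 3$. It assumes that $B(\beta)$ has a unit diagonal and that $\beta$ fills the off-diagonal cells in column-major (vec) order, so that $H$ is a 0/1 selection matrix; this ordering, the helper names, and the numerical values are illustrative assumptions.

```python
import math


def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def inverse(m):
    """Inverse via the adjugate: inv[i][j] = cofactor(j, i) / det(m)."""
    n, d = len(m), det(m)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            minor = [row[:i] + row[i + 1:] for k, row in enumerate(m) if k != j]
            inv[i][j] = (-1) ** (i + j) * det(minor) / d
    return inv


def off_diagonal_cells(n):
    """Off-diagonal (row, col) cells in column-major (vec) order."""
    return [(r, c) for c in range(n) for r in range(n) if r != c]


def build_B(beta, n):
    """B(beta): identity matrix with beta written into the off-diagonal cells."""
    B = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for value, (r, c) in zip(beta, off_diagonal_cells(n)):
        B[r][c] = value
    return B


def analytic_gradient(beta, n):
    """H' vec(B'^{-1}): off-diagonal cells of the transposed inverse."""
    inv = inverse(build_B(beta, n))
    return [inv[c][r] for (r, c) in off_diagonal_cells(n)]


def numeric_gradient(beta, n, h=1e-6):
    """Central finite differences of log det B(beta)."""
    grad = []
    for k in range(len(beta)):
        up = list(beta); up[k] += h
        dn = list(beta); dn[k] -= h
        diff = math.log(det(build_B(up, n))) - math.log(det(build_B(dn, n)))
        grad.append(diff / (2 * h))
    return grad


beta = [0.2, -0.1, 0.3, 0.1, -0.2, 0.15]   # illustrative values, det B > 0
```

In the score, this quantity enters with a negative sign, since the log-likelihood contribution contains $-\log \det B(\beta)$.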
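Moreover, we have that $\operatorname{vec}(B(\beta)) = H \beta + \operatorname{vec}(I_n)$ and thus $\frac{\partial}{\partial \beta'} \operatorname{vec}(B(\beta)) = H$. We obtain from Seber (2008), 17.33(b), page 363, that

\[
\frac{\partial \operatorname{vec} \left( F^{-1} \right)}{\partial x'} = - \left( F'^{-1} \otimes F^{-1} \right) \frac{\partial \operatorname{vec}(F)}{\partial x'}
\quad \text{and} \quad
\frac{\partial \operatorname{vec} \left( F^{-1} \right)'}{\partial x} = - \frac{\partial \operatorname{vec}(F)'}{\partial x} \left( F^{-1} \otimes F'^{-1} \right).
\]

This result can be obtained by taking the derivative of $F F^{-1} = I$ such that we obtain $F \frac{\partial F^{-1}}{\partial x_j} + \frac{\partial F}{\partial x_j} F^{-1} = 0$. Vectorization of $\frac{\partial F^{-1}}{\partial x_j} = - F^{-1} \frac{\partial F}{\partial x_j} F^{-1}$ gives the desired result, see Harville (1997), page 366. This leads to

\[
\begin{aligned}
\frac{\partial l_t(\theta)}{\partial \beta}
&= \left( \frac{\partial u_t'(\theta)}{\partial \beta} \right) B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta)
 + \left[ \frac{\partial \operatorname{vec} \left( B(\beta)^{-1} \right)'}{\partial \beta} \left( u_t(\theta) \otimes \Sigma^{-1} e_{x,t}(\theta) \right) \right]
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right) \\
&= \left( \frac{\partial u_t'(\theta)}{\partial \beta} \right) B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta)
 - H' \left( B(\beta)^{-1} \otimes B'(\beta)^{-1} \right) \left( u_t(\theta) \otimes \Sigma^{-1} e_{x,t}(\theta) \right)
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right) \\
&= \left( \frac{\partial u_t'(\theta)}{\partial \beta} \right) B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta)
 - H' \left( B(\beta)^{-1} u_t(\theta) \otimes B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right)
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right).
\end{aligned}
\tag{9}
\]

The derivative of $u_t$ with respect to $\beta$ for one equation. From

\[
u_t(\theta) = y_t - (a_1, \ldots, a_p) \begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix}
 - (b_1, \ldots, b_q) \begin{pmatrix} u_{t-1}(\theta) \\ \vdots \\ u_{t-q}(\theta) \end{pmatrix}
\]

we obtain immediately

\[
\frac{\partial u_t'(\theta)}{\partial \beta}
= - \left( \frac{\partial u_{t-1}'(\theta)}{\partial \beta} \; \cdots \; \frac{\partial u_{t-q}'(\theta)}{\partial \beta} \right)
\begin{pmatrix} b_1' \\ \vdots \\ b_q' \end{pmatrix}.
\]

Additionally, an explicit expression for the derivative of $u_t(\theta) = B(\beta) \varepsilon_t(\theta) = \left( \varepsilon_t'(\theta) \otimes I_n \right) \operatorname{vec}(B(\beta))$ with respect to $\beta$ can be found as $\frac{\partial u_t'(\theta)}{\partial \beta} = H' \left( \varepsilon_t(\theta) \otimes I_n \right)$ and subsequently combined with the quantity above. We thus obtain

\[
\begin{aligned}
\frac{\partial u_t'(\theta)}{\partial \beta}
&= - H' \left[ \left( \varepsilon_{t-1}(\theta) \otimes I_n \right), \ldots, \left( \varepsilon_{t-q}(\theta) \otimes I_n \right) \right]
 \begin{pmatrix} b_1' \\ \vdots \\ b_q' \end{pmatrix} \\
&= - H' \left[ \left( \varepsilon_{t-1}(\theta), \ldots, \varepsilon_{t-q}(\theta) \right) \otimes I_n \right]
 \begin{pmatrix} b_1' \\ \vdots \\ b_q' \end{pmatrix} \\
&= - H' \sum_{i=1}^{q} \left( \varepsilon_{t-i}(\theta) \otimes b_i' \right)
 = - H' \sum_{i=1}^{q} \left( B(\beta)^{-1} u_{t-i}(\theta) \otimes b_i' \right).
\end{aligned}
\]

Result for $l_{\beta,t}(\theta)$ for one point in time. The above leads to

\[
\begin{aligned}
\frac{\partial l_t(\theta)}{\partial \beta}
&= \left( \frac{\partial u_t'(\theta)}{\partial \beta} \right) B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta)
 - H' \left( B(\beta)^{-1} u_t(\theta) \otimes B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right)
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right) \\
&= - \left( \frac{\partial u_{t-1}'(\theta)}{\partial \beta} \; \cdots \; \frac{\partial u_{t-q}'(\theta)}{\partial \beta} \right)
 \begin{pmatrix} b_1' \\ \vdots \\ b_q' \end{pmatrix} B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta)
 - H' \left( B(\beta)^{-1} u_t(\theta) \otimes B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right)
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right) \\
&= - H' \sum_{i=1}^{q} \left( B(\beta)^{-1} u_{t-i}(\theta) \otimes b_i' B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right)
 - H' \left( B(\beta)^{-1} u_t(\theta) \otimes B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right)
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right) \\
&= - H' \sum_{i=1}^{q} \left( B(\beta)^{-1} \otimes b_i' \right) \left( u_{t-i}(\theta) \otimes B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right)
 - H' \left( B(\beta)^{-1} \otimes B'(\beta)^{-1} \Sigma^{-1} \right) \left( u_t(\theta) \otimes e_{x,t}(\theta) \right)
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right) \\
&= - H' \left[ B(\beta)^{-1} \otimes \begin{pmatrix} I_n & b_1' & \cdots & b_q' \end{pmatrix} \right]
 \left[ \begin{pmatrix} u_t(\theta) \\ u_{t-1}(\theta) \\ \vdots \\ u_{t-q}(\theta) \end{pmatrix} \otimes \left( B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right) \right]
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right).
\end{aligned}
\]

Result for $\partial L_t(\theta) / \partial \beta$.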
Finally, we obtain for the partial derivative of the standardised log-likelihood function with respect to $\beta$ that

\[
\frac{\partial L_t(\theta)}{\partial \beta}
= \frac{1}{T} \sum_{t=1}^{T} l_{\beta,t}(\theta)
= - \frac{1}{T} H' \left[ B(\beta)^{-1} \otimes \begin{pmatrix} I_n & b_1' & \cdots & b_q' \end{pmatrix} \right]
 \sum_{t=1}^{T} \left[ \begin{pmatrix} u_t(\theta) \\ u_{t-1}(\theta) \\ \vdots \\ u_{t-q}(\theta) \end{pmatrix} \otimes \left( B'(\beta)^{-1} \Sigma^{-1} e_{x,t}(\theta) \right) \right]
 - H' \operatorname{vec} \left( B'(\beta)^{-1} \right).
\]

C.3.2 Partial Derivative with respect to σ

Since the individual contribution to the (standardised) log-likelihood function is

\[
l_t(\theta) = \sum_{i=1}^{n} \log \left[ f_i \left( \sigma_i^{-1} \iota_i' B(\beta)^{-1} u_t(\theta) ; \lambda_i \right) \right] - \log \{ \det [ B(\beta) ] \} - \sum_{i=1}^{n} \log(\sigma_i),
\]

we obtain that

\[
\begin{aligned}
\frac{\partial}{\partial \sigma} l_t(\theta)
&= \sum_{i=1}^{n} e_{i,x,t}(\theta) \left( - \iota_i \sigma_i^{-2} \right) \iota_i' \underbrace{B(\beta)^{-1} u_t(\theta)}_{= \varepsilon_t(\theta)}
 - \underbrace{\sum_{i=1}^{n} \iota_i \sigma_i^{-1}}_{= \Sigma^{-2} \sigma} \\
&= - \sum_{i=1}^{n} \sigma_i^{-2} \left( \iota_i \iota_i' \right) e_{i,x,t}(\theta)\, \varepsilon_t(\theta) - \Sigma^{-2} \sigma \\
&= - \Sigma^{-2} \left[ e_{x,t}(\theta) \odot \varepsilon_t(\theta) + \sigma \right],
\end{aligned}
\]

where $\odot$ denotes element-wise multiplication. The partial derivative of $l_t(\theta)$ with respect to $\sigma$ is thus identical to the one derived in Lanne et al. (2017).

Result for $\partial L_t(\theta) / \partial \sigma$. Finally, we obtain for the partial derivative of the standardised log-likelihood function with respect to $\sigma$ that

\[
\frac{\partial L_t(\theta)}{\partial \sigma}
= \frac{1}{T} \sum_{t=1}^{T} l_{\sigma,t}(\theta)
= - \frac{1}{T} \Sigma^{-2} \left( \sum_{t=1}^{T} e_{x,t}(\theta) \odot \varepsilon_t(\theta) \right)
 - \begin{pmatrix} \sigma_1^{-1} \\ \vdots \\ \sigma_n^{-1} \end{pmatrix}.
\]

C.3.3 Partial Derivative with respect to λ

Analogous to $l_{\sigma,t}(\theta)$, the partial derivative of $l_t(\theta)$ with respect to $\lambda$ is identical to the one derived in Lanne et al. (2017), i.e. $\frac{\partial}{\partial \lambda_i} l_t(\theta) = e_{i,\lambda_i,t}$ for all $i$.
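The expression for the score with respect to $\sigma$ can be verified against finite differences. The sketch below assumes, purely for illustration, that each $f_i$ is a Student-t density with $\lambda_i = \nu_i$ degrees of freedom (not rescaled to unit variance), for which $e_{i,x,t} = -(\nu_i + 1) x / (\nu_i + x^2)$ at $x = \sigma_i^{-1} \varepsilon_{i,t}$. The shock vector $\varepsilon_t = B(\beta)^{-1} u_t(\theta)$ is treated as given, and the term $-\log \det B(\beta)$ is dropped because it does not depend on $\sigma$. All function names and numerical values are hypothetical.

```python
import math


def log_t_density(x, nu):
    """Log density of a Student-t variable with nu degrees of freedom."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(x * x / nu))


def e_x(x, nu):
    """Derivative of log_t_density with respect to x."""
    return -(nu + 1) * x / (nu + x * x)


def loglik_sigma_part(eps, sigma, nu):
    """Sigma-dependent part of l_t: sum_i [log f_i(eps_i / sigma_i) - log sigma_i]."""
    return sum(log_t_density(e / s, v) - math.log(s)
               for e, s, v in zip(eps, sigma, nu))


def score_sigma(eps, sigma, nu):
    """Analytic score: -Sigma^{-2} [e_x (elementwise *) eps + sigma]."""
    return [-(e_x(e / s, v) * e + s) / (s * s)
            for e, s, v in zip(eps, sigma, nu)]


def numeric_score_sigma(eps, sigma, nu, h=1e-6):
    """Central finite differences of loglik_sigma_part in each sigma_i."""
    grad = []
    for k in range(len(sigma)):
        up = list(sigma); up[k] += h
        dn = list(sigma); dn[k] -= h
        diff = loglik_sigma_part(eps, up, nu) - loglik_sigma_part(eps, dn, nu)
        grad.append(diff / (2 * h))
    return grad


eps = [0.7, -1.3, 0.2]     # illustrative structural shocks at one point in time
sigma = [1.2, 0.8, 1.5]    # illustrative scale parameters
nu = [5.0, 8.0, 3.0]       # illustrative degrees of freedom
```

Averaging `score_sigma` over $t = 1, \ldots, T$ gives the sample score; the constant part $-\Sigma^{-2}\sigma$ then reduces to the vector of $-\sigma_i^{-1}$.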