A Robust Score-Driven Filter for Multivariate Time Series
AA Robust Score-Driven Filter for Multivariate Time Series
Enzo D’Innocenzo, Alessandra Luati, Mario MazzocchiUniversity of BolognaDepartment of Statistics
Abstract
A novel multivariate score-driven model is proposed to extract signals from noisyvector processes. By assuming that the conditional location vector from a multivari-ate Student’s t distribution changes over time, we construct a robust filter which isable to overcome several issues that naturally arise when modeling heavy-tailed phe-nomena and, more in general, vectors of dependent non-Gaussian time series. Wederive conditions for stationarity and invertibility and estimate the unknown param-eters by maximum likelihood. Strong consistency and asymptotic normality of theestimator are proved and the finite sample properties are illustrated by a Monte-Carlostudy. From a computational point of view, analytical formulae are derived, whichconsent to develop estimation procedures based on the Fisher scoring method. Thetheory is supported by a novel empirical illustration that shows how the model canbe effectively applied to estimate consumer prices from home scanner data. Keywords:
Robust filtering; Multivariate models; Score-driven models; Homescandata
The analysis of multivariate time series has a long history, due to the empirical evidence frommost research fields that time series resulting from complex phenomena do not only depend ontheir own past, but also on the history of other variables. For this reason, from [Hannan, 1970],the literature on multivariate time series has grown very fast. The leading example is thedynamic representation of the conditional mean of a vector process which gives rise to vectorautoregressive processes, see [Hamilton, 1994] and [L¨utkepohl, 2007].Following the taxonomy proposed in [Cox et al., 1981], two main classes of models can beconsidered when analysing dynamic phenomena: parameter-driven and observation-driven mod-els. The class of parameter driven model is a broad class, which involves the widely appliedstate-space models, or unobserved component models [Harvey, 1989, West and Harrison, 1997].Within this framework, parameters are allowed to vary over time as dynamic processes driven by a r X i v : . [ ec on . E M ] S e p diosyncratic innovations. Hence, likelihood functions are analytically tractable only in specificcases, notably linear Gaussian models, where inference can be handled by the Kalman filter. Onthe other hand, parameter-driven models are very sensitive to small deviations from the distribu-tional assumptions. In addition, the Gaussian assumption often turns out to be too restrictive,and flexible specifications may be more appropriate. Thus, a fast growing field of researchis dealing with nonlinear or non-Gaussian state-space models, resting on computer intensivesimulation methods like the particle filter discussed in [Durbin and Koopman, 2012]. Althoughthese methods provide extremely powerful instruments for estimating nonlinear and/or non-Gaussian models, they can be computationally demanding. Furthermore, it may be difficult toderive the statistical properties of the implied estimators, due to the complexity of the jointlikelihood function.In contrast, in observation-driven models, the dynamics of time varying parameters dependon deterministic functions of lagged variables. This enables a stochastic evolution of the param-eters which become predictable given the past observations. [Koopman et al., 2016] assess theperformances and optimality properties of the two classes of models, in terms of their predictivelikelihood. The main advantage of observation-driven models is that the likelihood function isavailable in closed form, even in nonlinear and/or non-Gaussian cases. Thus, the asymptoticanalysis of the estimators becomes feasible and computational costs are reduced drastically.Within the class of observation-driven models, score-driven models are a valid option formodeling time series that do not fall in the category of linear Gaussian processes. Exampleshave been proposed in the context of volatility estimation and originally referred to as gen-eralised autoregressive score (GAS) models, [Creal et al., 2011], and as dynamic conditionalscore (DCS) models, [Harvey, 2013]. The key feature of these models is that the dynamicsof time-varying parameters are driven by the score of the conditional likelihood, which needsnot necessarily to be Gaussian but can be heavy tailed. For example, it may follow a Stu-dent’s t distribution [Harvey and Luati, 2014, Linton and Wu, 2020], an exponential general-ized beta distribution [Caivano et al., 2016], a binomial distribution as in the vaccine exampleby [Hansen and Schmidtblaicher, 2019], or represented by a mixture [Lucas et al., 2019]. Fur-ther applications are discussed in [Creal et al., 2013]. The optimality of the score as a drivingforce for time varying parameters in observation-driven models is discussed in [Blasques et al., 2015].According to which conditional distribution is adopted, specific situations may be convenientlyhandled due to the properties of the score. As an example for the univariate case, if a heavy-tailed distribution is specified (e.g. Student’s t ), the resulting score-driven model yields a sim-ple and natural model-based signal extraction filter which is robust to extreme observations,without any external interventions or diagnostics like outlier detection and dummy variables[Harvey and Luati, 2014].In score-driven models, as well as in all observation-driven models, the time varying pa-rameters are updated by filtering procedures, i.e. weighted sums of functions of past obser-vations, given some initial conditions that can be fixed or estimated along with the staticparameters. A robust filtering procedure should assign less weight to extreme observations in rder to prevent biased inference of the signal and the parameters. In particular, the workof [Calvet et al., 2015] provides a remarkable application of robust methods when dealing withcontaminated observations. The authors show that a substantial efficiency gain can be achievedby huberizing the derivative of the log -observation density. As we show in the present study,the same holds if one considers an alternative robustification method, based on the specifica-tion of a conditional multivariate Student’s t distribution. A similar approach can be foundin [Prucha and Kelejian, 1984] and [Fiorentini et al., 2003], where the multivariate Student’s t distribution provides a valid alternative to relax the normality assumption. In the context ofscore-driven models, [Creal et al., 2014] mention the relevance of modeling high-frequency datawith outliers and heavy tails by means of the multivariate Student’s t distribution.In this paper, we specify a score-driven model for the time-varying location of a multivariateStudent’s t distribution. We envisage three main contributions to the existing literature.The first contribution is the full derivation of the probabilistic and asymptotic theory behindthe multivariate dynamic score-driven model for conditional Student’s t distributions, includingthe conditions of stationarity, ergodicity and invertibility. Also, we prove strong consistencyand asymptotic normality of maximum likelihood estimators of the static parameters. It isnoteworthy to remark that when the degrees of freedom of the Student’s t distribution tend toinfinity, we recover a linear Gaussian state-space model and the Kalman filter recursions.The second contribution is the development of an estimation scheme grounded on Fisher’sscoring method, based on closed-form analytic expressions, which can be directly implementedinto any statistical or matrix-friendly software.The third contribution of the paper is an innovative application, dealing with estimation ofregional consumer prices based on home scanner data. The use of scanner data to computeofficial consumer price indices (CPIs) is gaining popularity, because of their timeliness and ahigh level of product and geographical detail [Shapiro and Feenstra, 2003]. However, they alsosuffer from a variety of shortcomings, which make time series of scanner data prices (SDPs)potentially very noisy, especially when they are estimated for population sub-groups, or at theregional level [Silver, 1995]. There is extensive research and a lively debate on the issues relatedto the computation and use of scanner data based CPIs. In a dedicated session of the 2019meeting of the the Ottawa Group on Price Indices, it has been suggested to adopt model-based filtering techniques to extract the signal from scanner-based time series of price data.These filtered estimates lose the classical price index formula interpretation, but are expectedto deliver the same information content with a better signal-to-noise ratio. We show that ourrobust multivariate model, applied to SDPs, provides information on the dynamics of the timeseries and on their interrelations without being affected from outlying observations, which arenaturally downweighted in the updating mechanism.The paper is organised as follows. In Section 2 the model is specified. Section 3 deals with thestochastic properties: stationarity and invertibility conditions are derived along with bounds for See Jens Mehroff presentation at https://eventos.fgv.br/sites/eventos.fgv.br/files/arquivos/u161/towards_a_new_paradigm_for_scanner_data_price_indices_0.pdf he moments. In Section 4 maximum likelihood estimation is discussed. The asymptotic theoryis derived in Section 4.1 and the computational aspects are discussed in Section 4.2. Finitesample properties are analysed in Section 5. The empirical analysis is reported in section 6.Some concluding remarks are drawn in Section 7. The main proofs are collected in Appendix A,while the relevant quantities for the implementation of the Fisher scoring algorithm as well asthe proofs of some auxiliary Lemmata are deferred to Appendix B and Appendix C, respectively. t Location Model
Let us consider a vector of N ≥ stochastic processes { y t } t ∈ Z and let F t − = σ { y t − , y t − , . . . } be its filtration at time t − . We assume that the process is generated by a conditional Student’s t distribution with ν > degrees of freedom, f ( y t |F t − ) = Γ (cid:16) ν + N (cid:17) Γ (cid:16) ν (cid:17) ( πν ) N/ | Ω | − / " y t − µ t ) > Ω − ( y t − µ t ) ν − ( ν + N ) / where µ t is a time varying location vector and Ω is the scale matrix of y t , that we assumehere to be static. A location-scale representation of y t is the following, y t = µ t + Ω (cid:15) t , t = 1 , . . . , T, (1)where (cid:15) t IID ∼ t ν ( N , I N ) , i.e. the independent identically distributed ( IID ) multivariate standard t -variate, where N denotes the zero mean vector of R N and I N the N × N identity matrix.The well-known relation holds between the scale matrix Ω and the covariance matrix Σ of y t , Σ = ( ν/ ( ν − Ω . As we have adopted the parameterisation based on the scale matrix,the location vector always exists. In contrast, the conditional mean exists for ν > . If therepresentation based on the covariance matrix is preferred, then the stronger restriction ν > has to be imposed.Our interest is in recovering µ t based on a set of observed vector of time series from y t , for t = 1 , . . . , T . With no distributional assumptions on the dynamics of µ t , a filter can be specified, µ t +1 | t = φ ( µ t | t − , y t , θ ) , i.e. a stochastic recurrence equation (SRE) that approximates thepath of µ t , based on some function φ of the past observations and a set of static parameters θ ∈ Θ ⊂ R p , where Θ is a compact parameter space that possibly include a starting value, say µ | . The subscript notation t | t − is used to emphasise the fact that µ t | t − is an approximationof the dynamic location process at time t given the past, that is equivalent to say that µ t | t − is F t − -measurable. The specification of the mapping φ usually involves some autoregressivescheme for the evolution of the dynamic parameter combined with the specification of a drivingforce, usually a highly nonlinear function of the past observations that forms a martingaledifference sequence, playing an analogous role of the error term in parameter-driven models.In this paper, we approximate the temporal changes of the conditional location vector by elying on the score-driven framework of [Creal et al., 2013] and [Harvey, 2013], as follows, µ t +1 | t − ω = Φ ( µ t | t − − ω ) + Ku t , (2)where µ t | t − = ( µ ,t | t − , . . . , µ N,t | t − ) > , ω = ( ω , . . . , ω N ) > is the N -dimensional vector of un-conditional means, Φ and K are N × N matrices of coefficients and the driving force, { u t } t ∈ N ,is a martingale difference sequence proportional to the score of the conditional likelihood of µ t , u t = ( y t − µ t | t − ) /w t , (3)where w t = 1 + ( y t − µ t | t − ) > Ω − ( y t − µ t | t − ) /ν . As a matter of fact, ∂ ln f ( y t |F t − ) ∂ µ t | t − = Ω − ν + Nν u t . The score as the driving force in an updating equation for a time varying parameter is thekey feature of score-driven models. The rationale behind the recursion (2) is very intuitive.Analogously to the Gauss-Newton algorithm, it improves the model fit by pointing in thedirection of the greatest increase of the likelihood. A detailed discussion on optimality of scoredriven updates in observation driven models is given by [Blasques et al., 2015].In the context of location estimation under the Student’s t assumption, a further relevantmotivation for the score-driven methodology lies in the robustness of the implied filters. Indeed,the positive scaling factors w t in equation 3 are scalar weights that involve the Mahalanobisdistance. They possess the role of re-weighting the large deviation from the mean incorporatedin the innovation error v t = y t − µ t | t − . The bulk of the robustness comes precisely from thiscomponent. We would like to remark that when ν → ∞ , u t converges to v t and equations (1)and (2) coincide with the innovation form of a linear Gaussian state-space model.A formal proof of the robustness of the method is in the following Lemma, which providessufficient conditions for a filter to be robust, in line with [Calvet et al., 2015]. The key resultis that u t in equation (3) can be written as u t = v t (1 − b t ) with b t = 1 − /w t where b t = v > t Ω − v t /ν v > t Ω − v t /ν , ≤ b t ≤ , with b t ∼ B eta (cid:18) N , ν (cid:19) , (4)which emphasises that the driving force u t is a continuous function of a beta distributed randomvariable, see Pag. 19 of [Kotz and Nadarajah, 2004] or Proposition 39 of [Harvey, 2013]. Lemma 2.1 ( Uniformly Boundedness and Moments of the Score ) . For < ν < ∞ , thevector sequence { u t } t ∈ N is uniformly bounded, that is sup t E [ k u t k ] < ∞ . Hence, it possesses allthe even moments E [ k u t k s ] = k Ω k s B (cid:16) N +2 s , ν +2 s (cid:17) B (cid:16) N , ν (cid:17) (cid:18) νN (cid:19) s . where s = 1 , , . . . and B ( α, β ) = Γ( α )Γ( β ) / Γ( α + β ) is the beta function. The odd moments f u t are all equal to zero.Proof. See Appendix A.1.The moment structure reveals important features of the driving force u t , that turns out tobe an an IID sequence with zero mean vector and (vec) -variance covariance matrix, E [ u t ⊗ u t ] = vec E [ u t u > t ] = ν ( ν + N )( ν + N + 2) vec Ω . To conclude, let us define the conditional expectation operator E t − [ x ] for a random vari-able x as a shorthand notation for E [ x |F t − ] . We note that the multi-step forecasts can bestraightforwardly obtained as E T [ y T + l ] = E T [ µ T + l | T + l − ] = ω + l − X j =1 Φ j ( µ T +1 | T − ω ) . We begin this section by giving conditions under which the multivariate DCS- t Location Modelproduces stationary and ergodic paths, i.e. we derive the stochastic properties of the model asa data generating process. Afterwards, we discuss invertibility of the filter. The latter propertyis crucial for estimation and prediction, especially in nonlinear multivariate models, in that itensures that the real path of the dynamic location vector can be consistently approximated bysome measurable function of the past information, see the discussion in [Blasques et al., 2018].
Let us express the dynamics of the signal in terms of the noise term, µ t +1 − ω = Φ ( µ t − ω ) + K Ω / (cid:15) t (cid:15) > t (cid:15) t /ν , t ∈ Z (5)where we have removed the subscript t | t − since we now interpret equation (5) as a Markovchain. With this specification, it is evident that the driving force is independent of µ t because sois the stationary ergodic sequence { (cid:15) t } t ∈ Z . This implies that one can generate a stationary andergodic vector processes { y t } t ∈ Z , which satisfy equations (1) and (2), by drawing IID sequencesfrom the assumed multivariate Student’s t and then plugging this sequence into the transitionmechanism in (5). Equation (5) allows us to derive the stochastic properties of our score-drivenmodel, summarized in the next Lemma. Lemma 3.1 ( Stationarity, Ergodicity and Moments of the Dynamic Location ) . Con-sider the recursion (5) and let { (cid:15) t } t ∈ Z be a stationary and ergodic vector sequence. Assume that ( Φ ) < and det K = 0 , where % ( X ) denote the spectral radius of any N × N -dimensionalreal matrix. Then (5) admits a unique strictly stationary solution { µ t } t ∈ Z with representation µ t +1 − ω = ∞ X j =0 Φ j K Ω / (cid:15) t (cid:15) > t − j (cid:15) t − j /ν . Furthermore, E [ k µ t k m ] < ∞ for every m > .Proof. See Appendix A.1.The stability condition % ( Φ ) < , is a well-known condition in the theory of linear systems,see [Hannan and Deistler, 1987], [Hannan, 1970] or [L¨utkepohl, 2007], which extends to thecase of the present nonlinear model. As a consequence of Lemma 3.1, we derive the momentstructure of the process (1). Lemma 3.2 ( Bounded Moments ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume that % ( Φ ) < and det K = 0 . If E [ k (cid:15) t k m ] < ∞ , ∀ m > ν − δ , then E [ k y t k m ] < ∞ .Proof. See Appendix A.1
For filtering purposes, it is convenient to compactly write equations (2) and (3) as µ t +1 | t − ω = Φ ( µ t | t − − ω ) + K y t − µ t | t − y t − µ t | t − ) > Ω − ( y t − µ t | t − ) /ν , t ∈ N . (6)By starting at some initial value, µ | , and using equation (6) for t = 1 , . . . , T one can recovera unique filtered path { ˆ µ t | t − } t ∈ N for every θ ∈ Θ . A desirable property is that the values usedto initialise the process are asymptotically negligible, in the sense that as the time t increases,the impact of the chosen µ | eventually vanishes and the process will converge to a uniquestationary and ergodic sequence. This stability property of the filtered sequence ˆ µ t | t − is knownas invertibility, see [Straumann and Mikosch, 2006]. Moreover, an invertible model allows oneto consistently estimate the innovations, that are typically obtained from the prediction error v t . This is crucial for the asymptotic theory of the estimator defined below, since if invertibilitydoes not hold, the objective function will depend on the starting value µ | even asymptotically,which entails an inconsistent estimator.The seminal paper of [Bougerol, 1993] provides sufficient conditions for invertibility, such asthe contraction condition and the existence of logarithmic moments. By relying on Theorem 3.1of [Bougerol, 1993], [Straumann, 2005] and [Straumann and Mikosch, 2006] discuss stationarityand invertibility conditions with applications to different classes of GARCH models and developa unified asymptotic theory based on SREs. As we show in the proof of Lemma 3.3 below, thelogarithmic moment conditions required in Theorem 3.1 of [Bougerol, 1993] are easily satisfied ince µ t | t − is a continuous vector-valued function of the uniformly bounded driving force in(3). On the other hand, for the contraction condition, we need to define the Jacobian matrixof the SRE in (6), that is X t = ∂ µ t +1 | t ∂ µ > t | t − = Φ + K ∂ u t ∂ µ > t | t − . (7)With the next Lemma, we give the relevant condition under which the SRE in (6) is con-tractive on average and thus, the convergence of the filtered { ˆ µ t | t − } t ∈ N to a unique F t − -measurable stationary and ergodic solution { µ t | t − } t ∈ N is obtained as a corollary of Theorem3.1 of [Bougerol, 1993] (or equivalently Theorem 2.8 of [Straumann and Mikosch, 2006]) irre-spective of the initialization µ | . Moreover, both { ˆ µ t | t − } t ∈ N and { µ t | t − } t ∈ N have m ≥ bounded moments as a consequence of Lemma 2.1. Lemma 3.3 ( Invertibility of the Dynamic Location Filter ) . Consider the model specifiedby equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Let the conditions of Lemma 3.1hold true. Assume that E " log sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) k Y j =1 X k − j +1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) < , (8) for k ≥ and where Θ is the compact parameter space and X t is defined in (7) . Then, thefiltered location vector { ˆ µ t | t − } t ∈ N is invertible and converges exponentially fast almost surely(e.a.s.) to the unique stationary ergodic sequence { µ t | t − } t ∈ Z for any initialization of the filter-ing recursion, µ | ∈ R N , that is, sup θ ∈ Θ k ˆ µ t | t − − µ t | t − k e.a.s. −−−→ as t → ∞ , (9) Furthermore, sup t E [sup θ ∈ Θ k ˆ µ t | t − k m ] < ∞ and E [sup θ ∈ Θ k µ t | t − k m ] < ∞ for every m ≥ .Proof. See Appendix A.1.The contraction condition in equation (8) imposes restrictions on the compact parameterspace Θ that cannot be checked directly. Also the expectation in equation (8) cannot beverified in practice, since it depends on the true unknown distribution, see also the discussionin [Blasques et al., 2018]. Thus, one can rely on sufficient conditions which are typically morerestrictive than (8) and that we discuss in the following remark, see also [Linton and Wu, 2020]. Remark 1.
The contraction condition in (8) is satisfied if E [log sup θ ∈ Θ k X k ] < . Since X = Φ + 2 /ν K /w ( y − µ | )( y − µ | ) > Ω − − K /w , where w t is definied in equation (3) , it is not immediate to obtain a sharp bound for (8) .However, by substituting ( y t − µ t | t − ) by Ω / (cid:15) t we obtain the following sufficient contraction ondition for invertibility E " log sup θ ∈ Θ k X k = E " log sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) Φ + 2 /ν K Ω / (cid:15) (cid:15) > (1 + (cid:15) > (cid:15) /ν ) Ω − / − K (cid:15) > (cid:15) /ν (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) < , which can be easily evaluated by simulation methods since (cid:15) t ∼ t ν ( N , I N ) . Moreover, if E [log sup θ ∈ Θ k X k ] < does not hold, we can take k = 2 in (8) which is much easier tobe satisfied. Consider the stationary and ergodic process { y t } t ∈ Z defined in equation (1). The unconditionaldistributions of ( µ > , µ > , . . . , µ > T ) > and ( y > , y > , . . . , y > T ) > are unknown. However, condition-ally on some non-random starting value for the dynamic location, µ | , by recursively applyingequation (2), the conditional distribution of y , . . . , y T , can be characterized by the distribu-tion of the IID random vector (cid:15) t , thus implying that the log -likelihood function for a singleobservation has the form ‘ t ( θ ) = ln Γ ν + N ! − ln Γ ν ! − N πν ) −
12 ln | Ω | − ν + N " v > t Ω − v t ν , (10)where θ = ( ξ > , ψ > ) > ∈ Θ ⊂ R p , ξ = ( ν, (vech( Ω )) > , ω > ) > and ψ = ((vec Φ ) > , (vec K ) > ) > .The dimension of the p -vector of unknown parameters θ is thus determined by the dimensionsof ξ ∈ R s , with s = 1 + N ( N + 1) + N and ψ ∈ R d , with d = ( N × N ) + ( N × N ) and hence, p = s + d .Lemma 3.3, ensures that the initial conditions for the function µ t | t − are asymptoticallyequivalent such that, once an initial value has been fixed, it is possible to obtain an approximatedversion of the conditional log -likelihood, ˆ ‘ t ( θ ) , by replacing µ t | t − in equation (10) by theapproximated dynamic location ˆ µ t | t − . Thus, for the whole sample, we obtain ˆ ‘ T ( θ ) = T X t =1 ˆ ‘ t ( θ ) (11)and the maximum likelihood estimator of θ is b θ T = arg max θ ∈ Θ ˆ ‘ T ( θ ) . Strong consistency and asymptotic normality of the maximum likelihood estimator for theproposed model are derived for T → ∞ and N is fixed. Assumption 1. . % ( Φ ) < and det K = 0 ,2. the uniform contraction condition E " log sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) Q kj =1 X k − j +1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) < , holds for k ≥
3. the parameter space Θ is compact with < ν < ∞ ,4. the true parameter vector θ belongs to the interior of Θ , i.e. θ ∈ int ( Θ ) ,5. E [ k X t k ] < . The next results is concerned with the consistency of ˆ θ T . Theorem 4.1 ( Consistency ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Furthermore, suppose that conditions 1–4 in Assumption 1 holdtrue. Then, ˆ θ T a.s. −−→ θ as T → ∞ . Proof.
See Appendix A.2.We now turn to asymptotic normality.
Theorem 4.2 ( Asymptotic Normality ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Let the conditions of Theorem 4.1 hold true and fur-thermore, suppose that condition 5 in Assumption 1 is satisfied. Then, √ T (ˆ θ T − θ ) = ⇒ N ( , I ( θ ) − ) , where, I ( θ ) = − E " d ‘ t ( θ ) d θ d θ > (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ , is the Fisher’s Information matrix evaluated at the true parameter vector θ .Proof. See Appendix A.2.By Theorem 4.1, I ( θ ) can be consistently estimated by b I (ˆ θ T ) = − T T X t =1 " d ˆ ‘ t ( θ ) d θ d θ > (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = b θ T . (12)The general formula for the second partial derivatives in (12) has the form below d ‘ t ( θ ) d θ d θ > = ∂ ‘ t ( θ ) ∂ θ ∂ θ > + d ( µ t | t − − ω ) d θ > ! > ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − d ( µ t | t − − ω ) d θ > ! + ∂‘ t ( θ ) ∂ µ > t | t − d ( µ t | t − − ω ) d θ d θ > , (13) ince the dynamic location and its derivatives are nonlinear functions of the vector of parameters θ . However, the assumption of correct specification implies that the score vector forms amartingale difference sequence. In addition, the dynamic location (and its derivatives) are F t − -measurable and therefore the last component of equation (13) will cancel out after theapplication of the conditional expectation.Therefore, by the law of iterated expectations, one has I ( θ ) = E [ I t ( θ )] = − lim n →∞ E t − n ( . . . E t − " E t − " d ‘ t ( θ ) d θ d θ > , which, in turn, can be consistently estimated by b I T ( θ ) = − T T X t =1 E t − " d ˆ ‘ t ( θ ) d θ d θ > (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ = b θ T . As a matter of fact, the latter estimator b I ( θ ) is strongly consistent, i.e. b I T ( θ ) a.s. −−→ I ( θ ) as T → ∞ , and it is easier and more stable to implement than b I ( θ ) , since it avoids the recursive evaluationof the second derivatives of the dynamic location vector.Maximum likelihood estimation and inference is carried out by means of Fisher’s scoringmethod. The development of a proper iterative procedure requires explicit formulae for thescore vector and the Hessian matrix. In the following section, we discuss the computationalaspects related to maximum likelihood estimation. A strongly reliable Fisher-scoring method based on analytical formulae (reported in AppendixB) is developed, which can be directly implemented in any statistical package through thefollowing steps:1. Choose a starting value b θ (0) T = ( ν (0) , (vech( Ω ) (0) ) > , ( ω (0) ) > , (vec( Φ ) (0) ) > , (vec( K ) (0) ) > ) >
2. For h > , update b θ ( h ) T using the scoring rule b θ ( h +1) T = b θ ( h ) T + h b I T ( θ ) ( h ) i − b s T ( θ ) ( h ) , where the score vector and the conditional information are, respectively, s T ( θ ) = T X t =1 s t ( θ ) = T X t =1 d‘ t ( θ ) d θ and I T ( θ ) = − T X t =1 E t − [ H t ( θ )] = − T X t =1 E t − " d ‘ t ( θ ) d θ d θ > . . Repeat until convergence, e.g., (cid:13)(cid:13)(cid:13)b θ ( h +1) T − b θ ( h ) T (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)b θ ( h ) T (cid:13)(cid:13)(cid:13) < δ for some fixed small δ > .The expressions for the score s t ( θ ) and the information matrix I T ( θ ) are in Appendix B. Toinitialise the estimation procedure, we follow the approach suggested in [Fiorentini et al., 2003].First, a consistent estimator of the restricted version of the parameter vector ˜ θ T is obtained bythe Gaussian quasi-maximum likelihood procedure in [Bollerslev and Wooldridge, 1992]. Sec-ond, a consistent method of moment is adopted for the degrees of freedom ν , by making use ofthe empirical coefficient of excess kurtosis ˜ κ on the standardized residuals and the relationship ˜ ν = (4˜ κ + 6) / ˜ κ . Convergence is fast in that usually few iterations of that procedure are needed,which makes scoring methods particularly appealing for estimation purposes. The finite-sample properties of the maximum likelihood estimator are investigated via Monte-Carlo simulations. We focus on: the distribution of the maximum likelihood estimator of thethe degrees of freedom ν , the scale matrix Ω , and the unconditional mean vector ω . Moreover,we consider Φ , which contains the autoregressive coefficients, and finally the matrix K . It iswell-known that estimating the degrees of freedom in multivariate Student’s t distributions canbe quite challenging, since the implied profile likelihood is remarkably flat [Breusch et al., 1997].Hence, we assume that the distribution of the heavy-tailed IID errors will be (cid:15) t ∼ t ν ( , I ) ,where ν ∈ { , , , } , that is, a standard bivariate Student’s t with three, five and tendegrees of freedom, while with the case when ν = 100 we cover the case when (cid:15) t tends tobehave like a standard Gaussian noise. A property of the multivariate model introduced sofar is that it estimates a linear Gaussian state space model when the latter is the true datagenerating process. In this sense, the filter is robust to misspecification if normality holds. Onthe other hand, it is important to remark that we are assuming that all the time series sharethe same degrees of freedom ν .In practice, we simulate data from the different specifications of the standard bivariateStudent’s t and for each of the realized paths of the time series we consider the recursion (2)for µ t | t − , which satisfies the stationarity conditions of Lemma (3.1). During the process whichgenerates the data, we use a burn-in period of , replications and we store T = 250 , and , observations. This ensures that the collected { y t } are stationary ergodic.With this simulated data at hand, we start the Fisher’s scoring algorithm based on theanalytical formulae described in Section 4.2. We stop the whole estimation process after amaximum of ten iterations in order to assess the rate of convergence and evaluate the precision f scoring rule. We repeat this simulation scheme M = 1 , times for each case and we use theempirical measures of bias and root mean square error to quantify the accuracy of our proposedestimators.Formally, the empirical bias measure and the empirical root mean square error of ˆ ν over the M = 1 , replications are computed as Bias (ˆ ν ) = 1 M M X m =1 (ˆ ν m − ν ) , RMSE (ˆ ν ) = vuut M M X m =1 (ˆ ν m − ν ) . In general, given the considered data generating process (DGP), we expect the distributionsof the estimators to be well approximated by a Gaussian distribution with low values of biasesand root mean square error.
In the bivariate case, the vector of parameters assumes the following form θ = ( ν, Ω , Ω , Ω , ω , ω , Φ , Φ , Φ , Φ , κ , κ , κ , κ ) > , thus θ ∈ R , which means that we need to estimate parameters, in order to obtain acomplete bivariate system.The true parameters of the considered DGP are ν ∈ { , , , } , Ω = I , ω = h − i , Φ = .
85 0 . .
00 0 . , K = .
95 0 . .
05 0 . . The results are reported in Tables 1 to 4 according to the values of the degrees of freedom ν ∈ { , , , } , respectively. Each table contains three columns which are associated withthe time series dimensions, that is T = 250 , and , .The first evident result, common to al the tables, is that as the time series dimensionincreases, the values of the empirical Bias and
RMSE tend to reduce sharply, which is line withthe consistency Theorem 4.1. In particular, we note that even if the value of ν is very low,namely ν = 3 , the results are still satisfactory, which shows that the model and the estimationprocedure are robust against heavy-tailed data and potential outliers.In general, estimation of the number of degrees of freedom is rather accurate. In the Gaussiancase when ν = 100 the filter collapses to the Kalman filter, the degrees of freedom parameteris recovered already in the case of the smallest sample size. Moreover, the decreasing biasand RMSE patterns may be due to the fixed initial value of the dynamic location vector µ | which was used to start the filter recursions. However, the invertibility conditions introducedin Lemma 3.3 ensure that for T → ∞ , this initial estimation bias will eventually tapers off.In conclusion, the ML estimations deliver satisfactory results in terms of bias and root meansquare error, hence the reliability of the Fisher-scoring method. ν = 3 . T = 250 T = 500 T = 1000 Estimate Bias RMSE Estimate Bias RMSE Estimate Bias RMSE ν .
987 0 .
013 0 .
457 2 .
979 0 .
021 0 .
473 3 . - .
016 0 . .
972 0 .
028 0 .
132 0 .
972 0 .
028 0 .
130 0 .
988 0 .
011 0 . - .
000 0 .
000 0 . - .
004 0 .
004 0 .
074 0 .
000 0 .
000 0 . .
971 0 .
029 0 .
135 0 .
972 0 .
028 0 .
136 0 .
991 0 .
008 0 . ω - . - .
004 0 . - . - .
009 0 . - .
008 0 .
009 0 . ω . - .
003 0 .
215 5 . - .
004 0 .
207 5 . - .
002 0 . .
831 0 .
019 0 .
063 0 .
836 0 .
014 0 .
062 0 .
840 0 .
010 0 . . - .
001 0 .
082 0 .
000 0 .
000 0 . - .
000 0 .
001 0 . .
000 0 .
000 0 . - .
001 0 .
001 0 . - .
000 0 .
000 0 . .
768 0 .
032 0 .
091 0 .
771 0 .
029 0 .
084 0 .
789 0 .
011 0 . κ . - .
005 0 .
184 0 .
941 0 .
009 0 .
117 0 . - .
004 0 . κ . - .
002 0 .
142 0 . - .
004 0 .
145 0 .
049 0 .
000 0 . κ . - .
004 0 .
149 0 . - .
007 0 .
152 0 . - .
001 0 . κ .
898 0 .
002 0 .
182 0 .
894 0 .
006 0 .
186 0 .
899 0 .
001 0 . Table 2: Monte-Carlo Simulation results for ν = 5 . T = 250 T = 500 T = 1000 Estimate Bias RMSE Estimate Bias RMSE Estimate Bias RMSE ν . - .
090 1 .
084 5 . - .
075 0 .
693 5 . - .
012 0 . .
978 0 .
220 0 .
121 0 .
993 0 .
007 0 .
086 0 .
997 0 .
003 0 . . - .
001 0 . - .
002 0 .
002 0 . - .
001 0 .
001 0 . .
974 0 .
025 0 .
123 0 .
988 0 .
012 0 .
084 0 .
992 0 .
008 0 . ω - . - .
027 0 . - . - .
006 0 . - .
002 0 .
002 0 . ω . - .
011 0 .
268 4 .
995 0 .
005 0 .
156 4 .
997 0 .
003 0 . .
831 0 .
019 0 .
055 0 .
831 0 .
019 0 .
055 0 .
844 0 .
006 0 . - .
000 0 .
001 0 . - .
000 0 .
001 0 .
068 0 .
000 0 .
000 0 . - .
000 0 .
001 0 . - .
001 0 .
001 0 . - .
001 0 .
001 0 . .
776 0 .
023 0 .
069 0 .
777 0 .
023 0 .
069 0 .
788 0 .
012 0 . κ .
974 0 .
002 0 .
154 0 .
949 0 .
001 0 .
103 0 . - .
001 0 . κ .
047 0 .
003 0 .
115 0 .
050 0 .
000 0 .
075 0 .
050 0 .
000 0 . κ . - .
005 0 .
112 0 . - .
005 0 .
073 0 . - .
002 0 . κ .
896 0 .
004 0 .
138 0 . - .
001 0 .
099 0 . - .
000 0 . ν = 10 . T = 250 T = 500 T = 1000 Estimate Bias RMSE Estimate Bias RMSE Estimate Bias RMSE ν . - .
529 4 .
727 10 . - .
384 2 .
232 10 . - .
226 1 . .
989 0 .
011 0 .
097 0 .
995 0 .
005 0 .
075 0 .
995 0 .
004 0 . - .
001 0 .
002 0 .
067 0 .
000 0 .
000 0 . - .
000 0 .
002 0 . .
991 0 .
008 0 .
108 0 .
991 0 .
009 0 .
074 0 .
993 0 .
006 0 . ω - . - .
015 0 . - . - .
006 0 . - . - .
004 0 . ω . - .
013 0 .
287 4 .
994 0 .
006 0 .
204 4 .
997 0 .
002 0 . .
834 0 .
016 0 .
052 0 .
838 0 .
011 0 .
032 0 .
845 0 .
005 0 . - .
005 0 .
006 0 .
064 0 . - .
003 0 . - .
002 0 .
002 0 . . - .
002 0 .
047 0 . - .
001 0 . - .
000 0 .
000 0 . .
781 0 .
019 0 .
063 0 .
789 0 .
011 0 .
040 0 .
794 0 .
006 0 . κ .
926 0 .
024 0 .
113 0 .
946 0 .
003 0 .
089 0 .
949 0 .
001 0 . κ . - .
009 0 .
083 0 . - .
001 0 .
065 0 .
049 0 .
001 0 . κ .
042 0 .
007 0 .
091 0 .
048 0 .
002 0 .
064 0 .
049 0 .
000 0 . κ .
877 0 .
023 0 .
123 0 .
896 0 .
004 0 .
083 0 .
893 0 .
007 0 . Table 4: Monte-Carlo Simulation results for ν = 100 . T = 250 T = 500 T = 1000 Estimate Bias RMSE Estimate Bias RMSE Estimate Bias RMSE ν .
708 3 .
290 25 .
729 98 .
782 1 .
218 13 .
782 100 . - .
855 12 . .
999 0 .
001 0 .
119 1 . - .
017 0 .
085 1 . - .
002 0 . - .
005 0 .
005 0 .
068 0 . - .
006 0 .
056 0 . - .
001 0 . .
991 0 .
009 0 .
106 1 . - .
008 0 .
084 1 . - .
003 0 . ω - . - .
026 0 . - . - .
035 0 . - .
032 0 .
032 0 . ω .
944 0 .
056 0 .
304 5 . - .
042 0 .
230 5 . - .
009 0 . .
826 0 .
023 0 .
054 0 .
834 0 .
016 0 .
039 0 .
841 0 .
009 0 . . - .
004 0 .
060 0 . - .
002 0 .
040 0 . - .
001 0 . - .
001 0 .
001 0 . - .
004 0 .
004 0 .
033 0 . - .
001 0 . .
780 0 .
019 0 .
061 0 .
785 0 .
015 0 .
043 0 .
785 0 .
015 0 . κ .
945 0 .
004 0 .
121 0 .
946 0 .
004 0 .
093 0 .
947 0 .
003 0 . κ .
048 0 .
002 0 .
094 0 . - .
002 0 .
062 0 . - .
005 0 . κ . - .
014 0 .
098 0 .
049 0 .
001 0 .
069 0 .
049 0 .
001 0 . κ . - .
010 0 .
108 0 . - .
003 0 .
090 0 . - .
008 0 . In order to demonstrate a potential use of the robust score-driven filter, we show an innovativeapplication to the estimation of consumer prices from homescan data. This field of application s gaining interest, due to the growing availability of high frequency and high detail purchasedata collected through scanner technologies at the retail point (retail scan) or household level(homescan). The latter of type of data allows one to obtain cost-of-living measures for vulner-able sub-groups of the population, and to explore the distributional effects of fiscal measures.While being a valuable source for detailed price information, post-purchase homescan pricedata are affected by a measurement noise that can be potentially large in small samples, andthe application of filtering techniques may help to mitigate such noise and control for outliers.Scanner data are collected either at the retail level, e.g. supermarket data, or from householdsin consumer panels, i.e. homescan data. Retail scanner data are widely used to estimateprices, both for continuity with the traditional price survey methodology, and because they areexpected to suffer less from the substitution (unit value) bias ([Silver and Heravi, 2001]). Thisbias is due to the fact that scanner data are based on actual transactions, i.e. prices are onlyobserved after the consumer purchases the good. This implies that the observed price embodiesa quality choice component, as consumers confronted with a price increase may opt for a cheaperoption (or a cheaper retailer) and information on non-purchased items is missing. The bias canbe particularly important for aggregated goods, such as those goods commonly represented bycategory-level prices like food and drinks. Thus, a wide body of research has been devotedto improve sampling strategies and the choice of weights in aggregation. A well-documentedproblem is the change in the composition of the consumption basket over time, an issue that canbe exacerbated by high-frequency data [Feenstra and Shapiro, 2003]. For example, stockpilingof goods during promotion periods generate bias in price indices, as the purchased quantitiesare not independent over subsequent time periods [Ivancic et al., 2011, Melser, 2018].Although supermarket-level scanner data allow to mitigate the problem, as one expects awide range of products to be purchased across the population of customers within a given timeperiod, the use of homescan data to estimate prices and price indices has potentially major ad-vantages. These advantages lie in the possibility to exploit household-level heterogeneity. Mostimportantly, it becomes feasible to estimate prices faced by particular population sub-groupswhose consumption basket differs from the average one, as elderly households or low-incomegroups [Kaplan and Schulhofer-Wohl, 2017, Broda et al., 2009]. However, the unit value issueis heavier with homescan data, as individual households buy a small range of products. Thus,variable shopping frequencies and zero purchases make it necessary to rely on very large sam-ples of households to control the bias. The problem becomes even more conspicuous for pricesat the regional level, for products that are not frequently purchased and for products whosedemand is highly seasonal.Robust filtering techniques may constitute a powerful solution to the above mentioned prob-lems, and may perform well even with relatively small samples of household as the one used inour application.To illustrate the potential contribution of the proposed method, we exploit a data-set thathas been recently used to evaluate the effects of a tax on sugar-sweetened beveraged introducedin France in 2012 [Capacci et al., 2019]. Our data consists of weekly scanner price data for ood and non-alcoholic drinks. The data were collected in a single region, within the ItalianGfK homescan consumer panel, based on purchase information on 318 households surveyed inthe Piedmont region, over the period between January 2011 and December 2012. The regionalscope and the relatively small sample provide an ideal setting to test the applicability andeffectiveness of the multivariate filtering approach. The data for our application consist of three time series of weekly unit values for food items,non-alcoholic drinks and Coca-Cola purchased by a sample of 318 households residing in thePiedmont region, Italy, over the period 2011-2012, and collected within the GfK Europanelhomescan survey. The data-set provides information on weekly expenditures and purchasedquantities for each of the three aggregated items, and unit values are obtained as expenditure-quantity ratios.Average unit values are shown in Table 5. Food and non-alcoholic drinks are compositeaggregates, hence they are potentially subject to fluctuations in response to changes in theconsumer basket even when prices are stable. Instead, Coca-Cola is a relatively homogeneousgood, with little variability due to different packaging sizes.
Table 5: Average unit values, e per kilogram, Piedmont homescan data (standard deviations in brackets) Food .
343 (0 . .
226 (0 . Non-alcoholic drinks .
434 (0 . .
426 (0 . Coca-Cola .
000 (0 . .
100 (0 . We fit the multivariate score-driven model developed in the paper to the considered vectorof time series. Maximum likelihood estimation produces the following multivariate dynamicsystem of time varying locations for Drinks (D), Food (F) and Coca-Cola (C), ˆ ω = . (0 . . (0 . − . (0 . ˆ Φ = .
839 0 .
015 0 . (0 . . . − .
528 0 .
912 0 . (0 . . . .
222 0 .
023 0 . (0 . . . ˆ K = . − .
023 0 . (0 . . . .
334 0 . − . (0 . . . − . − . − . (0 . . . here the values in parenthesis are the standard errors and with ˆ ν = 6 .
921 (0 . , ˆ Ω = . · · (0 . .
348 53 . · (0 . . − . − .
579 9 . (0 . . . × − . The estimated degrees of freedom are approximately . We remark here that the assumptionof a (conditional) multivariate Student’s t distribution for the data generating process impliesthat all the univariate marginal distributions are tail equivalent, see [Resnick, 2004]. This re-quires the implicit underlying assumption that the level of heavy-tailedness across the observedtime series vector is fairly homogeneous. To investigate this issue, and for the sake of com-parisons, we have carried out a univariate analysis, as in [Harvey and Luati, 2014], from whichit resulted that the estimated degrees of freedom were very low for Coca-Cola (about ) andmedium size (smaller than ) for the other two series, as expected. Hence, the multivariatescore-driven model developed in the paper reveals to be a good compromise between a multi-variate non-robust filter, based on a linear Gaussian model, and a robust univariate estimator.Indeed, a multivariate Portmanteau test on the residuals obtained from the three univariatemodels is carried out to test the null hypothesis H : R = · · · = R m = , where R i is thesample cross-correlation matrix for some i ∈ { , . . . , m } against the alternative H : R i = .The results of Table 6 indicate rejection of the null hypothesis of absence of of serial dependencein the trivariate series at the significance level. Table 6:
Multivariate Portmanteau test. m Q ( m ) df p-value . . . . . . . . . . We also remark that the estimated degrees of freedom close to rule out the hypothesisthat the data come from a linear Gaussian state-space model, in which case the estimateddegrees of freedom would be definitely higher. Nevertheless, we have fitted a misspecifiedlinear Gaussian state-space model estimated with the Kalman filter and, as expected, alongwith a higher sensitivity to extreme values, in particular in the last period of the Coca-Colaseries, likelihood and information criteria are in favour of the multivariate model based on theconditional Student’s t distribution. Likelihood, Akaike and Bayesian information criteria. log - Lik AIC BICKF . - . - . DCS-t . - . - . The matrix of the estimated autoregressive coefficients ˆ Φ measures the dependence acrossthe dynamic locations µ t | t − , while the estimated scale matrix ˆ Ω measures the concurrentrelationship between the three series under investigation, i.e. drink, food and Coca-Cola prices.For these matrices, we report the estimates of the coefficients and, in parenthesis, the relativestandard errors. The diagonal elements of ˆ Φ show that each variable of interest is highlypersistent. In order to explore the relation among the series, we implement an impulse responseanalysis. Figure 1 shows the estimated impulse response functions. Drink Price on Drink Price −0.050.000.050.10 2 4 6 8 10 12 14 16 18 20 22 24
Food Price on Drink Price −0.2−0.10.00.10.2 2 4 6 8 10 12 14 16 18 20 22 24
Coca Cola Price on Drink Price −2−10123 2 4 6 8 10 12 14 16 18 20 22 24
Drink Price on Food Price −0.250.000.250.500.751.00 2 4 6 8 10 12 14 16 18 20 22 24
Food Price on Food Price −1.0−0.50.00.5 2 4 6 8 10 12 14 16 18 20 22 24
Coca Cola Price on Food Price −0.50.00.5 2 4 6 8 10 12 14 16 18 20 22 24
Drink Price on Coca Cola Price −0.2−0.10.00.1 2 4 6 8 10 12 14 16 18 20 22 24
Food Price on Coca Cola Price
Coca Cola Price on Coca Cola Price
Figure 1: Estimated impulse response functions of the filtered ˆ µ t | t − for a unit shock. The nonlinear impulse are computed by using the local projections approach of [ ´Oscar Jord´a, 2005],and the confidence bands are obtained by using the Newey-West corrected standard-errors (see[Newey and West, 1987]). What emerges is a negative relation between drink and food prices:a unit shock in drink prices will produce a negative shock in food prices. This may adjustments n purchasing decisions by the households aimed at mitigating the rising cost of their shoppingbasket. This would be evidence that univariate signals are likely to suffer from the unit valuebias. Similarly, a non trivial negative relation exists between food and Coca-Cola prices. Aunit shock on food prices yields a concurrent negative impact on Coca-Cola prices, which is alsonoted from the analysis of the cross-correlations. As one might expect, a positive correlationexists between Coca-Cola prices and drink prices, as the former product belongs to the lattercategory. Instead, unit shocks on food prices seem to have negligible correlation (if any) ondrink prices. Figure 2 shows the original unit value time series and the corresponding signals extractedthrough the multivariate score-driven filter. Noise and outliers, as well as some irregular periodicpattern, are clearly visible in the drinks and food series. On the other hand, the Coca-Colaseries is relatively regular, with the exception of few peaks, including a couple of large outliersin the second year. Given the homogeneous nature of the good, it is reasonable to believe thatthose extreme values are the results of measurement error.
Months
Drink Price
Months
Food Price
Months
Coca−Cola Price
Figure 2: Original series (dotted line) and estimated signals
The estimates illustrate an effective noise reduction and return patterns that are smootherand more consistent with a regular price time series. As one would expect, the Coca-Cola CS- t series is very flat, and suggests a relatively stable price over the two-years time window,with no outliers.Figure 3 shows the monthly natural logarithm differences of the raw homescan prices (HSP)and the estimated signals, together with changes in the official Regional CPIs (R-CPI) for foodand non-alcoholic drinks, whereas no CPI to the brand detail is produced. The R-CPIs areprovided by the National Statistical Institute (ISTAT). They have a monthly frequency andare built with a traditional survey-based approach on retailers. The comparison between thescore-driven filtered values and the R-CPIs is purely indicative, as the unit values from thehomescan data are weekly, whereas the official CPIs are monthly. This frequency differencemay lead to biased comparisons [Diewert et al., 2016]. Nevertheless, the graphs confirm thatthe score-driven signals are effective in reducing the noise in the data. This is especially true forthe food series, whose CPIs are more volatile compared to drinks. The correlation between theraw homescan log-differenced unit value and the log-differenced food CPI is 0.05, against 0.44when the filtered time series is considered. For the non-alcoholic drinks price series the gainis less conspicuous, as prices evolve very regularly over the time window. Still, an inexistentcorrelation between the HSP and the R-CPI (-0.02) turns into a positive one (+0.11) whenconsidering the score-driven estimates and the R-CPI. −0.10−0.050.000.05 2011−01 2011−07 2012−01 2012−07 2013−01 Months
Food −0.10.00.1 2011−01 2011−07 2012−01 2012−07 2013−01
Months
Non−Alcoholic Drinks
Figure 3: Raw unit value series (dotted line), estimated signal and Regional CPIs (log differences, grey line)
In essence, the empirical evidence suggests that a robust multivariate approach to model-based signal extraction produce meaningful price series from homescan data, especially whennoise and outliers in the original data are relevant. We find the approach to perform reasonablywell even with a low number of sampled households (318) and price time series (3), and with relatively short time window (104 weeks). Future research might shed further light on theimplications of dealing with a larger number of price series and longer time series. We presented a nonlinear and multivariate dynamic location model which enables the extractionof reliable signals from vector processes affected by outliers and possibly non-Gaussian errors.Its peculiarity lies in the specification of a score-robust updating equation for the time-varyingconditional location vector. Compared to the existing literature on observation driven modelsfor time varying parameters, the model has two innovative features: (a) it extends the univariatefirst-order dynamic conditional location score by [Harvey and Luati, 2014] to the multivariatesetting; and (b) it extends the dynamic model for time varying volatilities and correlations by[Creal et al., 2011] to the location case.We derived the stochastic properties of the model: bounded moments, stationarity, ergodic-ity, and filter invertibility. Parameters are estimated by maximum likelihood and we providedclosed formulae for the score vector and the Hessian matrix, which can be directly used for ascoring procedure. Consistency and asymptotic normality have been proved and a Monte-Carlostudy showed good and reliable finite sample properties. In the case when the degrees of free-dom tend to infinity, or, in practice, their estimate is of the order of hundreds, our specificationconverges to a linear and Gaussian model.Our empirical application showed that robust filtering may lead to satisfactory estimatesof price signals from homescan data, in the case when the multivariate dimension is low. Wecontribute to research in this area with two promising results. First, we show that robustmodeling allowing for heavy tails is more effective in dealing with noisy series affected byoutliers or extreme observations. Second, the multivariate extension of the DCS- t model hasshown more appropriate than the robust univariate filtering approach in the case of scannerprice data, as price time series are expected to have a good degree of correlation. This provesto be valuable information to reduce the noise across the modelled price time series. References [Billingsley, 1961] Billingsley, P. (1961). The lindeberg-levy theorem for martingales.
Proceed-ings of the American Mathematical Society , 12(5):788–792.[Blasques et al., 2018] Blasques, F., Gorgi, P., Koopman, S. J., and Wintenberger, O. (2018).Feasible invertibility conditions and maximum likelihood estimation for observation-drivenmodels.
Electronic Journal of Statistics , 12:1019–1052.[Blasques et al., 2015] Blasques, F., Koopman, S. J., and Lucas, A. (2015). Information-theoretic optimality of observation-driven time series models for continuous responses.
Biometrika , 102(2):325–343. Bollerslev and Wooldridge, 1992] Bollerslev, T. and Wooldridge, J. M. (1992). Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covari-ances.
Econometric Reviews , 11(2):143–172.[Bougerol, 1993] Bougerol, P. (1993). Kalman filtering with random coefficients and contrac-tions.
SIAM Journal on Control and Optimization , 31(4):942–959.[Breusch et al., 1997] Breusch, T. S., Robertson, J. C., and Welsh, A. H. (1997). The em-peror’s new clothes: a critique of the multivariate t regression model.
Statistica Neerlandica ,51(3):269–286.[Broda et al., 2009] Broda, C., Leibtag, E., and Weinstein, D. E. (2009). The role of prices inmeasuring the poor’s living standards.
Journal of Economic Perspectives , 23:77–97.[Caivano et al., 2016] Caivano, M., Harvey, A., and Luati, A. (2016). Robust time series modelswith trend and seasonal components.
SERIEs , 7(1):99–120.[Calvet et al., 2015] Calvet, L. E., Czellar, V., and Ronchetti, E. (2015). Robust filtering.
Journal of the American Statistical Association , 110(512):1591–1606.[Capacci et al., 2019] Capacci, S., Allais, O., Bonnet, C., and Mazzocchi, M. (2019). Theimpact of the French soda tax on prices, purchases and tastes. An ex post evaluation.
PlosOne .[Cox et al., 1981] Cox, D. R., Gudmundsson, G., Lindgren, G., Bondesson, L., Harsaae, E.,Laake, P., Juselius, K., and Lauritzen, S. L. (1981). Statistical analysis of time series: Somerecent developments [with discussion and reply].
Scandinavian Journal of Statistics , 8(2):93–115.[Creal et al., 2011] Creal, D., Koopman, S. J., and Lucas, A. (2011). A dynamic multivariateheavy-tailed model for time-varying volatilities and correlations.
Journal of Business &Economic Statistics , 29(4):552–563.[Creal et al., 2013] Creal, D., Koopman, S. J., and Lucas, A. (2013). Generalized autoregressivescore models with applications.
Journal of Applied Econometrics , 28(5):777–795.[Creal et al., 2014] Creal, D., Schwaab, B., S.J., K., Lucas, A., and Scharth, M. (2014).Observation-driven mixed-measurement dynamic factor models with an application to creditrisk.
The Review of Economics and Statistics , 96(5):898–915.[Diewert et al., 2016] Diewert, W. E., Fox, K. J., and de Haan, J. (2016). A newly identifiedsource of potential CPI bias: Weekly versus monthly unit value price indexes.
EconomicsLetters , 141:169–172.[Durbin and Koopman, 2012] Durbin, J. and Koopman, S. (2012).
Time series Analysis byState Space Methods . Oxford University Press, second edition. Fang et al., 2002] Fang, H.-B., Fang, K.-T., and Kotz, S. (2002). The meta-elliptical distribu-tions with given marginals.
Journal of Multivariate Analysis , 82(1):1 – 16.[Fang et al., 1990] Fang, K., Kotz, S., and Ng, K. (1990).
Symmetric multivariate and relateddistributions . Number 36 in Monographs on statistics and applied probability. Chapman &Hall, London [u.a.].[Feenstra and Shapiro, 2003] Feenstra, R. C. and Shapiro, M. D. (2003). High-frequency sub-stitution and the measurement of price indexes. In Feenstra, R. C. and Shapiro, M. D.,editors,
Scanner Data and Price Indexes , volume 64 of
Studies in Income and Wealth , pages123–150. The University of Chicago Press.[Fiorentini et al., 2003] Fiorentini, G., Sentana, E., and Calzolari, G. (2003). Maximum like-lihood estimation and inference in multivariate conditionally heteroscedastic dynamic re-gression models with student t innovations.
Journal of Business & Economic Statistics ,21(4):532–546.[Hamilton, 1994] Hamilton, J. (1994).
Time series analysis . Princeton Univ. Press, Princeton,NJ.[Hannan, 1970] Hannan, E. J. (1970).
Multiple Time Series . Wiley Series in Probability andStatistics. Wiley.[Hannan and Deistler, 1987] Hannan, E. J. and Deistler, M. (1987).
The Statistical Theory ofLinear Systems . John Wiley & Sons, Inc., New York, NY, USA.[Hansen and Schmidtblaicher, 2019] Hansen, P. R. and Schmidtblaicher, M. (2019). A dynamicmodel of vaccine compliance: How fake news undermined the danish hpv vaccine program.
Journal of Business & Economic Statistics , 0(0):1–21.[Harvey, 1989] Harvey, A. C. (1989).
Forecasting, structural time series models and the KalmanFilter . Cambridge University Press, Great Britain.[Harvey, 2013] Harvey, A. C. (2013).
Dynamic models for Volatility and Heavy Tails . Econo-metric Society Monograph - Cambridge University Press.[Harvey and Luati, 2014] Harvey, A. C. and Luati, A. (2014). Filtering with heavy tails.
Jour-nal of the American Statistical Association , 109(507):1112–1122.[Ivancic et al., 2011] Ivancic, L., Diewert, W. E., and Fox, K. J. (2011). Scanner data, timeaggregation and the construction of price indexes.
Journal of Econometrics , 161:24–35.[Kalman, 1960] Kalman, R. E. (1960). A new approach to linear filtering and prediction prob-lems.
Transactions of the ASME–Journal of Basic Engineering , 82(Series D):35–45.[Kaplan and Schulhofer-Wohl, 2017] Kaplan, G. and Schulhofer-Wohl, S. (2017). Inflation atthe household level.
Journal of Monetary Economics , 91:19–38. Kohn and Ansley, 1989] Kohn, R. and Ansley, C. F. (1989). A fast algorithm for signal ex-traction, influence and cross-validation in state space models.
Biometrika , 76(1):65–79.[Koopman et al., 2016] Koopman, S. J., Lucas, A., and Scharth, M. (2016). Predicting time-varying parameters with parameter-driven and observation-driven models.
The Review ofEconomics and Statistics , 98(1):97–110.[Kotz and Nadarajah, 2004] Kotz, S. and Nadarajah, S. (2004).
Multivariate T-Distributionsand Their Applications . Cambridge University Press.[Linton and Wu, 2020] Linton, O. and Wu, J. (2020). A coupled component dcs-egarch modelfor intraday and overnight volatility.
Journal of Econometrics , 217:176–201.[Lucas et al., 2019] Lucas, A., Schaumburg, J., and Schwaab, B. (2019). Bank business modelsat zero interest rates.
Journal of Business & Economic Statistics , 37(3):542–555.[L¨utkepohl, 2007] L¨utkepohl, H. (2007).
New Introduction to Multiple Time Series Analysis .Springer Berlin Heidelberg.[Magnus and Neudecker, 2019] Magnus, J. and Neudecker, H. (2019).
Matrix Differential Cal-culus with Applications in Statistics and Econometrics . Wiley Series in Probability andStatistics. Wiley.[Melser, 2018] Melser, D. (2018). Scanner data price indexes: Addressing some unresolvedissues.
Journal of Business & Economic Statistics , 36(3):516–522.[Newey and West, 1987] Newey, W. K. and West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix.
Econometrica ,55(3):703–708.[ ´Oscar Jord´a, 2005] ´Oscar Jord´a (2005). Estimation and inference of impulse responses by localprojections.
The American Economic Review , 95(1):161–182.[Prucha and Kelejian, 1984] Prucha, I. R. and Kelejian, H. H. (1984). The structure of simulta-neous equation estimators: A generalization towards nonnormal disturbances.
Econometrica ,52(3):721–736.[Resnick, 2004] Resnick, S. (2004). On the foundations of multivariate heavy-tail analysis.
Journal of Applied Probability , 41:191–212.[Shapiro and Feenstra, 2003] Shapiro, M. D. and Feenstra, R. C., editors (2003).
Scanner Dataand Price Indexes , volume 64 of
Studies in Income and Wealth , Chicago and London. TheUniversity of Chicago Press.[Silver, 1995] Silver, M. (1995). Elementary aggregates, micro-indices and scanner data: Someissues in the compilation of consumer price indices.
Review of Income and Wealth , 41(4):427–438. Silver and Heravi, 2001] Silver, M. and Heravi, S. (2001). Scanner data and the measurementof inflation.
The Economic Journal , 111(June):383–404.[Straumann, 2005] Straumann, D. (2005).
Estimation in Conditionally Heteroscedastic TimeSeries Models . Lecture Notes in Statistics. Springer Berlin Heidelberg.[Straumann and Mikosch, 2006] Straumann, D. and Mikosch, T. (2006). Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrenceequations approach.
The Annals of Statistics , 34(5):2449–2495.[van der Vaart, 1998] van der Vaart, A. W. (1998).
Asymptotic statistics . Cambridge Series inStatistical and Probabilistic Mathematics. Cambridge University Press.[West and Harrison, 1997] West, M. and Harrison, J. (1997).
Bayesian Forecasting and Dy-namic Models (2Nd Ed.) . Springer-Verlag, Berlin, Heidelberg.[White, 1994] White, H. (1994).
Estimation, Inference and Specification Analysis . EconometricSociety Monographs. Cambridge University Press.[White, 2001] White, H. (2001).
Asymptotic Theory for Econometricians . Economic Theory,Econometrics, and Mathematical Economics. Emerald Group Publishing Limited.
Appendix A Main Proofs
A.1 Proofs of Lemmata for the Dynamic Location
Proof of Lemma 2.1 (Uniformly Boundedness and Moments of the Score)
Proof.
Let us consider the following stochastic representation of y t in equation (1), y t = µ t | t − + Ω / z t s r t s t /ν , (14)where z t is uniformly distributed on the Unit Sphere in R N and r t ∼ χ N is independentof s t ∼ χ ν . One can then express b t in equation (4) as b t = r t / ( s t + r t ) and u t in (3) as u t = √ ν q b t (1 − b t ) Ω / z t . The random vector z t and the random variables b t are independentof each other by construction, see [Fang et al., 1990]. It follows that for even integers m =2 s, s = 1 , , . . . , the moments of u t can be expressed as E (cid:20) k u t k m (cid:21) = ν m/ k Ω k m/ E (cid:20) b m/ t (1 − b t ) m/ (cid:21) E (cid:20) k z t k m (cid:21) = k Ω k m/ B (cid:16) N , ν (cid:17) (cid:18) νN (cid:19) m/ Z b N + m − t (1 − b t ) ν + m − d b t = k Ω k m/ (cid:18) νN (cid:19) m/ B (cid:16) N + m , ν + m (cid:17) B (cid:16) N , ν (cid:17) . roof of Lemma 3.1 (Stationarity, Ergodicity and Moments of the Dynamic Location) Proof.
Let us consider equation (5). The recursion is linear in µ t for a given (cid:15) t and hence thecondition needed for the process to produce stationary ergodic paths boils down to the stabilitycondition % ( Φ ) < . By combining equation (5) with Lemma 2.1, taking the expectation andusing the triangle, H¨older and Minkowsky inequalities, respectively, we get E " k µ t +1 − ω k m = E "(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∞ X j =0 Φ j Ku t − j (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) m ≤ E " ∞ X j =0 k Φ j Ku t − j k m ≤ ( ¯ c ∞ X j =0 ¯ ρ j E (cid:20) k u t − j k m (cid:21)! /m ) m < ∞ , where ¯ c = N k K k and ¯ ρ = % ( Φ ) < . The last inequality follows from a standard result inlinear algebra, as k Φ k = k P Λ P − k = tr( Λ ) = P Ni =1 ρ i where ρ i are the eigenvalues of Φ . Proof of Lemma 3.2 (Bounded Moments)
Proof.
One has E [ k y t k m ] ≤ C E [ k µ t k m ] + C E [ k (cid:15) t k m ] < ∞ , which follows by the c m -inequality,by the fact that µ t is uniformly bounded from Lemmata 2.1 and 3.1 and by the properties ofthe multivariate Student’s t distribution. Proof of Lemma 3.3 (Invertibility of the Dynamic Location Filter)
Proof.
Equation (6) can be embedded in a first order nonlinear dynamic system µ t +1 | t = φ ( µ t | t − , y t , θ ) , t ∈ N , where the dynamic location vector takes his values in a Borel subset M of R N . Let us define inductively, for k ≥ and any initialization µ | ∈ M , a sequence ofLipschitz maps φ ( k +1) : M × R N × Θ M for k ≥ such that φ ( k +1) ( µ | , y , . . . , y k +1 , θ ) = φ ( φ ( k ) ( µ | , y , . . . , y k , θ ) , y k +1 , θ ) . By applying the mean value theorem to the Lipschitz map φ ( µ t | t − , y t , θ ) , we obtain µ t +1 | t = c X ?t µ t | t − + ϕ ( ˆ µ ?t | t − , y t , θ ) , (15)where ˆ µ ?t | t − denotes a set of points between ˆ µ t | t − and µ t | t − , so that c X ?t = φ ( ˆ µ ?t | t − , y t , θ ) ,where φ ( · ) denotes the first partial derivatives of φ ( · ) with respect to the transpose of µ ?t | t − ,and ϕ ( ˆ µ ?t | t − , y t , θ ) = φ ( ˆ µ t | t − , y t , θ ) − φ ( µ ?t | t − , y t , θ ) ˆ µ t | t − . Equation (15) is a multivariatestochastic recurrence equation (MSRE), that can be viewed as vector autoregressive processwith random coefficients { c X ?t } and { ϕ ( ˆ µ ?t | t − , y t , θ ) } .The sufficient conditions for invertibility given by [Bougerol, 1993] and [Straumann and Mikosch, 2006] hen become E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) ϕ ( µ | , y , θ ) (cid:13)(cid:13)(cid:13) < ∞ , (16) E " log sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) k Y j =1 X k − j +1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) < , (17)for k ≥ .Let us consider condition (16). One has E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) ϕ ( µ | , y , θ ) (cid:13)(cid:13)(cid:13) ≤ E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) φ ( µ | , y , θ ) (cid:13)(cid:13)(cid:13) + E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) φ ( µ | , y , θ ) (cid:13)(cid:13)(cid:13) + log (cid:13)(cid:13)(cid:13) µ | (cid:13)(cid:13)(cid:13) . (18)Now, E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) φ ( µ | , y , θ ) (cid:13)(cid:13)(cid:13) ≤ log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) ω (cid:13)(cid:13)(cid:13) + log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) Φ (cid:13)(cid:13)(cid:13) + log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) µ | − ω (cid:13)(cid:13)(cid:13) + E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) u t (cid:13)(cid:13)(cid:13) < ∞ , by compactness of Θ and since u t is uniformly bounded, as shown in Lemma 2.1. Moreover, E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) φ ( µ | , y , θ ) (cid:13)(cid:13)(cid:13) = E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) Φ + 2 /ν K /w ( y − µ | )( y − µ | ) > Ω − − K /w (cid:13)(cid:13)(cid:13) ≤ + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) Φ (cid:13)(cid:13)(cid:13) + log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) K /ν (cid:13)(cid:13)(cid:13) + E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) ( y − µ | )( y − µ | ) > Ω − (cid:13)(cid:13)(cid:13) /w + E " log + sup θ ∈ Θ (cid:13)(cid:13)(cid:13) K (cid:13)(cid:13)(cid:13) /w < ∞ , by compactness of Θ with < ν < ∞ and since the existence of the logarithmic moment isalways ensured by Lemma 2.1. In particular, we note that E " log + sup θ ∈ Θ k ( y − µ | ) k /w < ∞ , is always ensured since w t is bounded away from zero and the ratio is uniformly bounded ∀ y t ∈ R N . Therefore, condition (16) is fulfilled. s far as condition (17) is concerned, the exponentially fast almost sure convergence of thefiltered { ˆ µ t | t − } t ∈ N may be obtained as an application of Theorem 3.1 in [Bougerol, 1993] orTheorem 2.7 in [Straumann and Mikosch, 2006], since the contraction condition 17 implies that sup θ ∈ Θ k ˆ µ t +1 | t − µ t +1 | t k = sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) t − Y i =0 c X ?t − i (cid:16) ˆ µ | − µ | (cid:17)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ % t c, where c > and < % < are constants. Therefore, all the requirements of [Bougerol, 1993]’sTheorem are satisfied. Additionally, the claim that the moments are bounded follow fromuniform boundedness of u t , see Lemma 2.1. A.2 Proof of Consistency and Asymptotic Normality of the MLE
Let us define the empirical average log -likelihood function based on the chosen initial value µ | and on the recursion ˆ µ t | t − b L T ( θ ) = 1 T T X t =1 b ‘ t ( θ ) , (19)and the likelihood based on the stationary solution µ t | t − L T ( θ ) = 1 T T X t =1 ‘ t ( θ ) , (20)with the following limit L ( θ ) = E [ ‘ t ( θ )] . (21) Proof of Theorem 4.1 (Consistency)
Proof.
One has, sup θ ∈ Θ | b L T ( θ ) − L ( θ ) | ≤ sup θ ∈ Θ | b L T ( θ ) − L T ( θ ) | + sup θ ∈ Θ |L T ( θ ) − L ( θ ) | . By Lemma C.3, sup θ ∈ Θ | b L T ( θ ) − L T ( θ ) | a.s. −−→ , as t → ∞ , and by Lemma C.4, sup θ ∈ Θ |L T ( θ ) −L ( θ ) | a.s. −−→ , as t → ∞ . Also, by the Ergodic Theorem lim T →∞ b L T ( θ ) = lim T →∞ L T ( θ ) = L ( θ ) , and, in conclusion, by Lemma C.2, L ( θ ) < L ( θ ) , ∀ θ = θ . Following similar argumentsof Theorem 3.4 in [White, 1994], one can show that strong consistency holds if ∀ θ = θ , ∃B η ( θ ) , where B η ( θ ) = { θ : k θ − θ k > η, η > } s.t. for any sequence of maximizers { θ ? } ∈ Θ and θ ? ∈ B η ( θ ) , lim sup T →∞ sup θ ? ∈B η ( θ ) b L T ( θ ) < lim T →∞ b L T ( θ ) a.s. ith a similar reasoning, by the reverse Fatou’s Lemma and the Ergodic Theorem we get lim sup T →∞ sup θ ? ∈B η ( θ ) b L T ( θ ) = lim sup T →∞ sup θ ? ∈B η ( θ ) L T ( θ ) = lim sup T →∞ sup θ ? ∈B η ( θ ) T T X t =1 ‘ t ( θ ) ≤ lim sup T →∞ T T X t =1 sup θ ? ∈B η ( θ ) ‘ t ( θ ) = E " sup θ ? ∈B η ( θ ) ‘ t ( θ ) , and therefore, ∀ ε > ∃ η > s.t. E " sup θ ? ∈B η ( θ ) ‘ t ( θ ) < E (cid:20) ‘ t ( θ ) (cid:21) + ε = L ( θ ) + ε. Note that ε can be made arbitrarily small. Therefore, the uniqueness of identifiability of themaximizer θ ∈ Θ , is ensured by the uniqueness of θ as the maximizer of the likelihood, seeLemma C.2, the compactness of the parameter space Θ and finally, the continuity of the limit L ( θ ) in θ ∈ Θ which is ensured from the continuity of L T ( θ ) in θ ∈ Θ , ∀ T ∈ N and theuniform convergence in Lemma C.4. Then, the strong consistency follows by Theorem 3.4 in[White, 1994]. Proof of Theorem 4.2 (Asymptotic Normality)
Proof.
Standard arguments for the proof of asymptotic normality and the Taylor’s theoremlead to the expansion of the conditional likelihood’s score function around a neighborhood of θ , which yields = √ T L T (ˆ θ T ) = √ T (cid:20) b L T ( θ ) − L T ( θ ) (cid:21) + √ T L T ( θ )++ (cid:20)(cid:16) L T ( θ ) − L ( θ ) (cid:17) + (cid:16) b L T ( θ ? ) − L T ( θ ) (cid:17) + L ( θ ) (cid:21) × (cid:20) √ T (ˆ θ T − θ ) (cid:21) , (22)where θ ? lies on the chord between ˆ θ T and θ , componentwise.First, the fact that √ T L T ( θ ) obeys the CLT for martingales is entailed in Lemma C.9.Convergence of the first difference in square brackets of equation (22) is ensured by LemmaC.10. Thus, by the asymptotic equivalence (see Lemma 4.7 in [White, 2001]) b L T ( θ ) has thesame asymptotic distribution of √ T L T ( θ ) . As regards the second line, we have that the middleterm vanishes almost surely and exponentially fast, since Lemma C.12 demonstrates that theinitial conditions for the likelihood’s second derivatives are asymptotically irrelevant and theconsistency theorem further ensures the convergence in the same point by continuity argumentsof the likelihood’s second derivatives. In addition, the first term in the brackets of the secondline vanishes as well by the uniform law of large numbers discussed in Lemma C.13. Finally,with Lemma C.11 at hand, we can easily solve equation (22), since L ( θ ) is nonsingular.Slusky’s Lemma (see Lemma 2.8 (iii) of [van der Vaart, 1998]) completes the proof. ppendix B and C are reported in the supplementary online material. ppendix B Computational Aspects This Appendix is devoted to the construction of score vector and the Hessian matrix, essen-tial for estimation and inference. Our approach to tackle this problem follows the matrixdifferential calculus style of [Magnus and Neudecker, 2019]. As argued by the authors, one ofthe advantages to represent the conditional log-density in its differential form is that we canstraightforwardly retrieve all the partial derivatives, thus avoiding the problem of dealing withthe dimensions of the matrices and vectors involved.
B.1 The Score Vector
The expressions for the score might be collected in a single vector, s t ( θ ) = h s ( ν ) t ( θ ) s (v( Ω )) t ( θ ) s ( ω ) t ( θ ) s (v( Φ )) t ( θ ) s (v( K )) t ( θ ) i > , yielding the recursions for the static parameters s ( ν ) t ( θ ) = 12 " ψ ν + N ! − ψ ν ! − Nν + ν + Nν b t − ln w t + ν + Nν w t d ( µ t | t − − ω ) dν ! > Ω − ( y t − µ t | t − ) , s (v( Ω )) t ( θ ) = 12 D > N ( Ω − / ⊗ Ω − / ) " ν + Nν w t ( (cid:15) t ⊗ (cid:15) t ) − vec I N + ν + Nν w t d ( µ t | t − − ω ) d (vech( Ω )) > ! > Ω − ( y t − µ t | t − ) , for the unconditional mean s ( ω ) t ( θ ) = ν + Nν w t d ( µ t | t − − ω ) d ω > ! > Ω − ( y t − µ t | t − ) , and for the remaining parameters determining the dynamics of the location vector s (v( Φ )) t ( θ ) = ν + Nν w t d ( µ t | t − − ω ) d (vec Φ ) > ! > Ω − ( y t − µ t | t − ) , s (v( K )) t ( θ ) = ν + Nν w t d ( µ t | t − − ω ) d (vec K ) > ! > Ω − ( y t − µ t | t − ) . imilarly, the conditional information matrix may be represented as follows, I t ( θ ) = I ( ν ) t ( θ ) I ( ν, v( Ω )) t ( θ ) × N I ( ν, v( Φ )) t ( θ ) I ( ν, v( K )) t ( θ ) I (v( Ω ) ,ν ) t ( θ ) I (v( Ω )) t ( θ ) N × N I (v( Ω ) , v( Φ )) t ( θ ) I (v( Ω ) , v( K )) t ( θ ) N × N × N I ( ω ) t ( θ ) N × N N × N I (v( Φ ) ,ν ) t ( θ ) I (v( Φ ) , v( Ω )) t ( θ ) N × N I (v( Φ )) t ( θ ) I (v( Φ ) , v( K )) t ( θ ) I (v( K ) ,ν ) t ( θ ) I (v( K ) , v( Ω )) t ( θ ) N × N I (v( K ) , v( Φ )) t ( θ ) I (v( K )) t ( θ ) . The four blocks of the matrix have the following expansions: the first block is composed by I ( ν ) t ( θ ) = 14 " ψ ν ! − ψ ν + N ! − N ( ν + N + 4) ν ( ν + N )( ν + N + 2) + ν + Nν + N + 2 d ( µ t | t − − ω ) dν ! > Ω − d ( µ t | t − − ω ) dν ! , I (v( Ω ) ,ν ) t ( θ ) = − ν + N )( ν + N + 2) D > N (vech( Ω − ))+ ν + Nν + N + 2 d ( µ t | t − − ω ) d (vech( Ω )) > ! > Ω − d ( µ t | t − − ω ) dν ! , I (v( Ω )) t ( θ ) = ν + N ν + N + 2) D > N ( Ω − ⊗ Ω − ) D N − ν + N + 2) D > N (vech( Ω − ))(vech( Ω − )) > D N + ν + Nν + N + 2 d ( µ t | t − − ω ) d (vech( Ω )) > ! > Ω − d ( µ t | t − − ω ) d (vech( Ω )) > ! . The second, I (v( Φ ) ,ν ) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d (vec Φ ) > ! > Ω − d ( µ t | t − − ω ) dν ! , I (v( Φ ) , v( Ω )) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d (vec Φ ) > ! > Ω − d ( µ t | t − − ω ) d (vech( Ω )) > ! , I (v( K ) , v( Ω )) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d (vec K ) > ! > Ω − d ( µ t | t − − ω ) d (vech( Ω )) > ! , I (v( K ) ,ν ) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d (vec K ) > ! > Ω − d ( µ t | t − − ω ) dν ! . hird, the unconditional mean I ( ω ) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d ω > ! > Ω − d ( µ t | t − − ω ) d ω > ! . By symmetry, the fourth and last block are composed by I (v( Φ )) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d (vec Φ ) > ! > Ω − d ( µ t | t − − ω ) d (vec Φ ) > ! , I (v( Φ ) , v( K )) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d (vec Φ ) > ! > Ω − d ( µ t | t − − ω ) d (vec K ) > ! , I (v( K )) t ( θ ) = ν + Nν + N + 2 d ( µ t | t − − ω ) d (vec K ) > ! > Ω − d ( µ t | t − − ω ) d (vec K ) > ! . Notably, I ( ω , ξ ) t ( θ ) = and I ( ω , ψ ) t ( θ ) = , i.e. ω is asymptotically independent of the otherparameters. Moreover, none of the terms of the conditional information matrix involves thesecond derivatives of the dynamic location. This result is a direct consequence of the asymptoticproperties of the proposed maximum likelihood estimator under the assumption of correctspecification of the model and some regularity conditions.To construct the score vector, we take the first differential of the likelihood function (10) d ‘ t ( θ ) = 12 " ψ ν + N ! − ψ ν ! − Nν + ν + Nν b t − ln w t (d ν )+ 12 (d vech( Ω )) > D > N ( Ω − / ⊗ Ω − / ) " ν + Nν w t ( (cid:15) t ⊗ (cid:15) t ) − vec I N + ν + Nν w t (d µ t | t − ) > Ω − ( y t − µ t | t − ) , (23)where ψ ( x ) = d ln Γ( x ) /d ( x ) is the digamma function and D N the duplication matrix, whichallow us to write d vec Ω = D N (d vech( Ω )) , since the scale matrix is symmetric. Secondly,we define s t ( θ ) = d‘ t ( θ ) /d θ and partition the parameter as θ = ( ξ > , ψ > ) > , so that the scorevector can be partitioned into two blocks and two distinct applications of the chain rule arerequired. Specifically, for ξ = ( ω > , (vech( Ω )) > , ν ) > , we have s ( ξ ) t ( θ ) = d‘ t ( θ ) d ξ = ∂‘ t ( θ ) ∂ ξ + d ( µ t | t − − ω ) d ξ > ! > ∂‘ t ( θ ) ∂ µ t | t − , while for ψ = ((vec Φ ) > , (vec K ) > ) > , we have s ( ψ ) t ( θ ) = d‘ t ( θ ) d ψ = d ( µ t | t − − ω ) d ψ > ! > ∂‘ t ( θ ) ∂ µ t | t − . et us start by considering the first differential of the dynamic location d( µ t +1 | t − ω ) = Φ d( µ t | t − − ω ) + h ( µ t | t − − ω ) > ⊗ I N i d vec Φ + h ( u t ) > ⊗ I N i d vec K + K (d u t ) , where d u t =( y t − µ t | t − ) b t (1 − b t ) /ν (d ν )+ ( y t − µ t | t − )(1 − b t ) /ν ( (cid:15) t ⊗ (cid:15) t ) > ( Ω − / ⊗ Ω − / ) D N (d vech( Ω ))+ 2( y t − µ t | t − )(1 − b t ) /ν ( y t − µ t | t − ) > Ω − (d µ t | t − ) − (1 − b t )(d µ t | t − ) . Let us embed the dynamic differential as a stochastic recurrence equation d( µ t +1 | t − ω ) = X t d( µ t | t − − ω ) + R t , where X t = Φ + K C t , (24)and R t = Ka t d ν + KB t d vec Ω + D t d vec Φ + E t d vec K . The terms of the latter equations are a t = ∂ u t ∂ν = ( y t − µ t | t − ) b t (1 − b t ) /ν, B t = ∂ u t ∂ (vech( Ω )) > = (1 − b t ) /ν ( y t − µ t | t − )( Ω − / (cid:15) t ⊗ Ω − / (cid:15) t ) > D N , C t = ∂ u t ∂ µ > t | t − = 2(1 − b t ) /ν ( y t − µ t | t − )( y t − µ t | t − ) > Ω − − (1 − b t ) I N , which we write, for convenience, also in their vectorised form a t = b / t (1 − b t ) / /ν Ω / z t , vec B t = νb / t (1 − b t ) / ( Ω − / ⊗ Ω − / ⊗ Ω / )( z t ⊗ z t ⊗ z t ) , vec C t = 2 b t (1 − b t ) ( Ω − / ⊗ Ω / )( z t ⊗ z t ) − (1 − b t ) vec I N . (25) he partial derivatives C = ∂ ( µ t | t − − ω ) ∂ ω > = ( I N − Φ ) , D t = ∂ ( µ t | t − − ω ) ∂ (vec Φ ) > = h ( µ t | t − − ω ) > ⊗ I N i , E t = ∂ ( µ t | t − − ω ) ∂ (vec K ) > = h ( u t ) > ⊗ I N i , are required to obtain the final recursions, necessary for the iterative procedure d ( µ t +1 | t − ω ) dν = X t d ( µ t | t − − ω ) dν + Ka t ,d ( µ t +1 | t − ω ) d (vech( Ω )) > = X t d ( µ t | t − − ω ) d (vech( Ω )) > + KB t ,d ( µ t +1 | t − ω ) d ω > = X t d ( µ t | t − − ω ) d ω > + C , (26) d ( µ t +1 | t − ω ) d (vec Φ ) > = X t d ( µ t | t − − ω ) d (vec Φ ) > + D t ,d ( µ t +1 | t − ω ) d (vec K ) > = X t d ( µ t | t − − ω ) d (vec K ) > + E t . The discussion on the required partial derivatives of the log -likelihood function is similarlytackled. From (23) the calculation are straightforward, we define α t = ∂‘ t ( θ ) ∂ν = 12 " ψ ν + N ! − ψ ν ! − Nν + ν + Nν b t − ln w t , β t = ∂‘ t ( θ ) ∂ (vech( Ω )) = 12 D > N ( Ω − / ⊗ Ω − / ) " ν + Nν w t ( (cid:15) t ⊗ (cid:15) t ) − vec I N , ς t = ∂‘ t ( θ ) ∂ µ t | t − = ν + Nν w t Ω − ( y t − µ t | t − ) . B.2 The Hessian Matrix
Like in the previous section, we obtain the second differential of the conditional log -likelihoodby differentiating (23), which yields d ‘ t ( θ ) = 12 " ψ ν + N ! − ψ ν ! + Nν − Nν b t − ν + Nν b t (1 − b t ) + 1 ν b t (d ν )+ " ν + N ν (1 − b t ) (d vec Ω ) > ( Ω − / ⊗ Ω − / )( (cid:15) t (cid:15) > t ⊗ (cid:15) t (cid:15) > t )( Ω − / ⊗ Ω − / )(d vec Ω ) " ν + Nν (1 − b t )(d µ t | t − ) > Ω − (d µ t | t − ) − " ν + Nν (1 − b t )(d µ t | t − ) > Ω − / (cid:15) t + " ν + Nν (1 − b t ) (d µ t | t − ) > ( Ω − / (cid:15) t (cid:15) > t Ω − / ⊗ (cid:15) > t Ω − / )(d vec Ω ) − " ν + Nν (1 − b t )(d vec Ω ) > ( Ω − ⊗ Ω − / (cid:15) t (cid:15) > t Ω − / )(d vec Ω ) + " ν + Nν (1 − b t )(d µ t | t − ) > ( (cid:15) > t Ω − / ⊗ Ω − )(d vec Ω ) + " ν + Nν (1 − b t ) (d µ t | t − ) > Ω − / (cid:15) t (cid:15) > t Ω − / (d µ t | t − ) + "
12 (d vec Ω ) > ( Ω − ⊗ Ω − )(d vec Ω ) + " (d µ t | t − ) > Ω / (cid:15) t + 12 (d vec Ω ) > ( Ω − / ⊗ Ω − / )( (cid:15) t ⊗ (cid:15) t ) × " ν + Nν b t (1 − b t ) − Nν (1 − b t ) (d ν ) , (27)where ψ ( x ) = d ln Γ( x ) /d ( x ) is the trigamma function.We thus define the Hessian matrix H t ( θ ) = d ‘ t ( θ ) d θ d θ > , Similar arguments as those used in the computation of the score vector lead us to decomposethe Hessian into four blocks and then apply the chain rule separately to each block. The firstset is ξ = ( ω > , (vech( Ω )) > , ν ) > , H ( ξ ) t ( θ ) = d ‘ t ( θ ) d ξ d ξ > = ∂ ‘ t ( θ ) ∂ ξ ∂ ξ > + d ( µ t | t − − ω ) d ξ > ! > ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − d ( µ t | t − − ω ) d ξ > ! + ∂‘ t ( θ ) ∂ µ > t | t − d ( µ t | t − − ω ) d ξ d ξ > . As regards the second vector of parameters ψ = ((vec Φ ) > , (vec K ) > ) > , we have H ( ψ ) t ( θ ) = d ‘ t ( θ ) d ψ d ψ > = d ( µ t | t − − ω ) d ψ > ! > ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − d ( µ t | t − − ω ) d ψ > ! + ∂‘ t ( θ ) ∂ µ > t | t − d ( µ t | t − − ω ) d ψ d ψ > , nd finally, by symmetry, we get the remaining blocks H ( ξ , ψ ) t ( θ ) = d ‘ t ( θ ) d ξ d ψ > = d ( µ t | t − − ω ) d ξ > ! > ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − d ( µ t | t − − ω ) d ψ > ! + ∂‘ t ( θ ) ∂ µ > t | t − d ( µ t | t − − ω ) d ξ d ψ > . As far as the second differentials of the dynamic equation are concerned, we have d µ t +1 | t = Φ d µ t | t − + 2[d( µ t | t − − ω ) > ⊗ I N ]d vec Φ + 2[d( u t ) > ⊗ I N ] vec K + K (d u t ) , that, in turn, implies expanding d u t with respect to the parameters of the Student’s t .After some algebra we get the second differential of the driving-force d u t =2( y t − µ t | t − ) /ν h b t (1 − b t ) /ν − b t (1 − b t ) i (d ν )+ 2(1 − b t ) /ν (cid:26)h (d vec Ω ) > ⊗ ( y t − µ t | t − )( (cid:15) t ⊗ (cid:15) t ) > i vec( Ω − / ⊗ Ω − / ) (cid:27) × h ( (cid:15) t ⊗ (cid:15) t ) > ( Ω − / ⊗ Ω − / )(d vec Ω ) i − − b t ) /ν (cid:26)h (d vec Ω ) > ( Ω − / (cid:15) t ⊗ Ω − ) ⊗ ( y t − µ t | t − )( y t − µ t | t − ) > Ω − i (d vec Ω ) (cid:27) + 8(1 − b t ) /ν (cid:26)h (d µ t | t − ) > ⊗ ( y t − µ t | t − )( y t − µ t | t − ) > i vec Ω − (cid:27) × h ( y t − µ t | t − ) > Ω − (d µ t | t − ) i − − b t ) /ν (cid:26)h (d µ t | t − ) > Ω − ⊗ I N ih ( y t − µ t | t − ) ⊗ I N + I N ⊗ ( y t − µ t | t − ) i (d µ t | t − ) (cid:27) − − b t ) /ν (cid:26)h (d µ t | t − ) > Ω − ⊗ I N ih ( y t − µ t | t − ) ⊗ I N i (d µ t | t − ) (cid:27) + 2(1 − b t ) /ν (cid:26)h ( y t − µ t | t − )( y t − µ t | t − ) > Ω − (d µ t | t − ) i(cid:27) − (1 − b t ) (cid:26) (d µ t | t − ) (cid:27) + 4(1 − b t ) /ν (cid:26)h (d µ t | t − ) > ⊗ Ω / (cid:15) t (cid:15) > t Ω / i (vec Ω − )( (cid:15) t ⊗ (cid:15) t ) > ( Ω − ⊗ Ω − )(d vec Ω ) (cid:27) − (1 − b t ) /ν (cid:26)h (d µ t | t − ) > ⊗ I N i (vec I N ) h ( (cid:15) t ⊗ (cid:15) t ) > ( Ω − ⊗ Ω − )(d vec Ω ) i(cid:27) − − b t ) /ν (cid:26)h (d µ t | t − ) > ⊗ Ω / (cid:15) t (cid:15) > t Ω / i ( Ω − ⊗ Ω − )(d vec Ω ) (cid:27) + (cid:26) [(d vec Ω ) > ⊗ ( y t − µ t | t − )( (cid:15) t ⊗ (cid:15) t ) > ] vec( Ω − / ⊗ Ω − / ) (cid:27) × h b t (1 − b t ) /ν − (1 − b t ) /ν ) i (d ν )+ 2 (cid:26)h (d µ t | t − ) > ⊗ ( y t − µ t | t − )( y t − µ t | t − ) > i × vec Ω − (cid:27)h b t (1 − b t ) /ν − (1 − b t ) /ν i (d ν ) − (cid:26)h (d µ t | t − ) > ⊗ I N i (vec I N ) (cid:27)h b t (1 − b t ) /ν i (d ν ) . et us write d ( µ t +1 | t − ω ) = X t d ( µ t | t − − ω ) + K d( µ t | t − − ω ) > C t d( µ t | t − − ω ) + Q t , where X t is as in (24) and Q t = Ka t d ν + KB t d vec Ω + K (d vec Ω ) > d aB t d ν + D t d vec Φ + E t d vec K + (d vec Φ ) > d DE t (d vec K ) . (28)We now derive the terms of recursion (28). We first need a set of partial derivative a t = ∂ u t ∂ν =2( y t − µ t | t − ) /ν h b t (1 − b t ) /ν − b t (1 − b t ) i , (29) B t = ∂ u t ∂ (vech( Ω )) ∂ (vech( Ω )) > =2(1 − b t ) /ν × (cid:26)h D > N ⊗ ( y t − µ t | t − )( (cid:15) t ⊗ (cid:15) t ) > i vec( Ω − / ⊗ Ω − / ) (cid:27) × h ( (cid:15) t ⊗ (cid:15) t ) > ( Ω − / ⊗ Ω − / ) D N i − − b t ) /ν (cid:26)h D > N ( Ω − / (cid:15) t ⊗ Ω − ) ⊗ ( y t − µ t | t − )( y t − µ t | t − ) > Ω − i D N (cid:27) , (30) C t = ∂ u t ∂ µ t | t − ∂ µ > t | t − =8(1 − b t ) /ν (cid:26)h I N ⊗ ( y t − µ t | t − )( y t − µ t | t − ) > i vec Ω − (cid:27) × h ( y t − µ t | t − ) > Ω − i − b t ) /ν (cid:26)h Ω − ⊗ I N i × h ( y t − µ t | t − ) ⊗ I N + I N ⊗ ( y t − µ t | t − ) i(cid:27) − − b t ) /ν (cid:26)h Ω − ⊗ I N ih ( y t − µ t | t − ) ⊗ I N i(cid:27) . (31)Secondly, a set of partial cross-derivatives aB t = ∂ u t ∂ (vech( Ω )) ∂ν =[ I N ⊗ ( y t − µ t | t − )( (cid:15) t ⊗ (cid:15) t ) > ] vec( Ω − / ⊗ Ω − / ) × h b t (1 − b t ) /ν − (1 − b t ) /ν ) i , d a C t = ∂ u t ∂ µ t | t − ∂ν =2 (cid:26)h I N ⊗ ( y t − µ t | t − )( y t − µ t | t − ) > i vec Ω − (cid:27) × h b t (1 − b t ) /ν − (1 − b t ) /ν i − (cid:26)h (d µ t | t − ) > ⊗ I N i (vec I N ) (cid:27)h b t (1 − b t ) /ν i , (32) d B C t = ∂ u t ∂ µ t | t − ∂ (vech( Ω )) > =4(1 − b t ) /ν (cid:26)h I N ⊗ Ω / (cid:15) t (cid:15) > t Ω / i (33) (vec Ω − )( (cid:15) t ⊗ (cid:15) t ) > ( Ω − ⊗ Ω − ) (cid:27) − (1 − b t ) /ν (cid:26)h I N ⊗ I N i (vec I N ) h ( (cid:15) t ⊗ (cid:15) t ) > ( Ω − ⊗ Ω − ) i(cid:27) − − b t ) /ν (cid:26)h I N ⊗ Ω / (cid:15) t (cid:15) > t Ω / i ( Ω − ⊗ Ω − ) (cid:27) . (34)In addition, we still need a set of partial derivatives defined by D t = ∂ [ d ( µ t | t − − ω )] ∂ (vec Φ ) d (vec Φ ) > =2 " d ( µ t | t − − ω ) d (vec Φ ) > ! > ⊗ I N , E t = ∂ [ d ( µ t | t − − ω )] ∂ (vec K ) d (vec K ) > =2 " C > t d ( µ t | t − − ω ) d (vec K ) > ! > ⊗ I N , and finally conclude the derivation with d DE t = ∂ [ d ( µ t | t − − ω )] ∂ (vec Φ ) d (vec K ) > = " C > t d ( µ t | t − − ω ) d (vec Φ ) > ! > ⊗ I N . We therefore have obtained a new set of recursions composed by d ( µ t +1 | t − ω ) dν = X t d ( µ t | t − − ω ) dν + K d ( µ t | t − − ω ) dν ! > C t d ( µ t | t − − ω ) dν ! + Ka t ,d ( µ t +1 | t − ω ) d (vech( Ω )) d (vech( Ω )) > = X t d ( µ t | t − − ω ) d (vech( Ω )) d (vech( Ω )) > + K d ( µ t | t − − ω ) d (vech( Ω )) > ! > C t d ( µ t | t − − ω ) d (vech( Ω )) > ! + KB t , ( µ t +1 | t − ω ) d (vech( Ω )) dν = X t d ( µ t | t − − ω ) d (vech( Ω )) dν + K d ( µ t | t − − ω ) d (vech( Ω )) > ! > C t d ( µ t | t − − ω ) dν ! + K d aB t , which continue with d ( µ t +1 | t − ω ) d (vec Φ ) d (vec Φ ) > = X t d ( µ t | t − − ω ) d (vec Φ ) d (vec Φ ) > + K d ( µ t | t − − ω ) d (vec Φ ) > ! > C t d ( µ t | t − − ω ) d (vec Φ ) > ! + D t ,d ( µ t +1 | t − ω ) d (vec K ) d (vec K ) > = X t d ( µ t | t − − ω ) d (vec K ) d (vec K ) > + K d ( µ t | t − − ω ) d (vec K ) > ! > C t d ( µ t | t − − ω ) d (vec K ) > ! + E t ,d ( µ t +1 | t − ω ) d (vec Φ ) d (vec K ) > = X t d ( µ t | t − − ω ) d (vec Φ ) d (vec K ) > + K d ( µ t | t − − ω ) d (vec Φ ) > ! > C t d ( µ t | t − − ω ) d (vec K ) > ! + d DE t , and conclude with d ( µ t +1 | t − ω ) d ( ν ) d (vec Φ ) > = X t d ( µ t | t − − ω ) d ( ν ) d (vec Φ ) > + K d ( µ t | t − − ω ) dν ! > C t d ( µ t | t − − ω ) d (vec Φ ) > ! ,d ( µ t +1 | t − ω ) d ( ν ) d (vec K ) > = X t d ( µ t | t − − ω ) d ( ν ) d (vec K ) > + K d ( µ t | t − − ω ) dν ! > C t d ( µ t | t − − ω ) d (vec K ) > ! ,d ( µ t +1 | t − ω ) d ( ν ) d (vec Φ ) > = X t d ( µ t | t − − ω ) d (vech( Ω )) d (vec Φ ) > + K d ( µ t | t − − ω ) d (vech( Ω )) > ! > C t d ( µ t | t − − ω ) d (vec Φ ) > ! ,d ( µ t +1 | t − ω ) d (vech( Ω )) d (vec K ) > = X t d ( µ t | t − − ω ) d (vech( Ω )) d (vec K ) > + K d ( µ t | t − − ω ) d (vech( Ω )) > ! > C t d ( µ t | t − − ω ) d (vec K ) > ! . The construction of the Hessian can now be completed by deriving the remaining second-orderpartial derivatives of the second differential in (27). By virtue of this representation, one can how that α t = ∂ ‘ t ( θ ) ∂ν = 12 " ψ ν + N ! − ψ ν ! + Nν − Nν b t − ν + Nν b t (1 − b t ) + 1 ν b t , β t = ∂ ‘ t ( θ ) ∂ (vech( Ω )) ∂ (vech( Ω )) > = " ν + N ν (1 − b t ) D > N ( Ω − / ⊗ Ω − / )( (cid:15) t (cid:15) > t ⊗ (cid:15) t (cid:15) > t ) × ( Ω − / ⊗ Ω − / ) D N − " ν + Nν (1 − b t ) D > N ( Ω − ⊗ Ω − / (cid:15) t (cid:15) > t Ω − / ) D N + " D > N ( Ω − ⊗ Ω − ) D N , ς t = ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − = " ν + Nν − b t ) Ω − / (cid:15) t (cid:15) > t Ω − / − " ν + Nν (1 − b t ) Ω − , and d α β t = ∂ ‘ t ( θ ) ∂ (vech( Ω )) ∂ν = 12 D > N ( Ω − / ⊗ Ω − / )( (cid:15) t ⊗ (cid:15) t ) " ν + Nν b t (1 − b t ) − Nν (1 − b t ) , c α ς t = ∂ ‘ t ( θ ) ∂ µ t | t − ∂ν = Ω / (cid:15) t " ν + Nν b t (1 − b t ) − Nν (1 − b t ) , c βς t = ∂ ‘ t ( θ ) ∂ µ t | t − ∂ (vech( Ω )) > = " ν + Nν (1 − b t ) ( Ω − / (cid:15) t (cid:15) > t Ω − / ⊗ (cid:15) > t Ω − / ) D N + " ν + Nν − b t )( (cid:15) > t Ω − / ⊗ Ω − ) D N . B.3 The Conditional Information Matrix
Taking the conditional expectation of the negative Hessian matrix yields the conditional infor-mation matrix needed for the Fisher’s scoring method. Likewise to the score and the Hessian,we start the discussion by taking advantage from the differentials of the log -likelihood function. E t − [d ‘ t ( θ )] = " ψ ν + N ! − ψ ν ! + N ( ν + N + 4)2 ν ( ν + N )( ν + N + 2) (d ν )+ " ν + N + 2) (d vec Ω ) > (vec Ω − )(vec Ω − ) > (d vec Ω ) − " ν + N ν + N + 2) (d vec Ω ) > ( Ω − ⊗ Ω − )(d vec Ω ) " ν + N )( ν + N + 2) (d vec Ω ) > (vec Ω − )(d ν ) − " ν + Nν + N + 2 (d µ t | t − ) > Ω − (d µ t | t − ) . The calculations of this matrix require for the first set ξ = ( ω > , (vech( Ω )) > , ν ) > , I ( ξ ) t ( θ ) = − E t − " d ‘ t ( θ ) d ξ d ξ > = I ( ξ ) ( θ ) + d ( µ t | t − − ω ) d ξ > ! > I ( µ ) ( θ ) d ( µ t | t − − ω ) d ξ > ! , for the second vector ψ = ((vec Φ ) > , (vec K ) > ) > , I ( ψ ) t ( θ ) = − E t − " d ‘ t ( θ ) d ψ d ψ > = d ( µ t | t − − ω ) d ψ > ! > I ( µ ) ( θ ) d ( µ t | t − − ω ) d ψ > ! , and in conclusion, the negative conditional expected value of the cross-second derivatives are I ( ξ,ψ ) t ( θ ) = − E t − " d ‘ t ( θ ) d ξ d ψ > = d ( µ t | t − − ω ) d ξ > ! > I ( µ ) ( θ ) d ( µ t | t − − ω ) d ψ > ! . Now, by equation (26) the calculations boils down to the static terms of the matrix. Specifically, I ( µ ) ( θ ) = − E t − " ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − = ν + Nν + N + 2 Ω − , while the terms of the static matrix I ( ξ ) ( θ ) are I ( ν ) ( θ ) = − E t − " ∂ ‘ t ( θ ) ∂ν = 14 " ψ ν ! − ψ ν + N ! − N ( ν + N + 4) ν ( ν + N )( ν + N + 2) , I (v( Ω )) ( θ ) = − E t − " ∂ ‘ t ( θ ) ∂ (vech( Ω )) ∂ (vech( Ω )) > = ν + N ν + N + 2) D > N ( Ω − ⊗ Ω − ) D N − ν + N + 2) D > N (vech( Ω − ))(vech( Ω − )) > D N , and lastly the cross terms I (v( Ω ) ,ν ) ( θ ) = − E t − " ∂ ‘ t ( θ ) ∂ (vech( Ω )) ∂ν = − ν + N )( ν + N + 2) D > N (vech( Ω − )) . With these last derivations, we have completed the derivations for the Fisher’s scoring methodin the multivariate DCS- t set up. .4 Third differentials This section derives the third differential of the conditional log -likelihood with respect to thedynamic location, auxiliary to the proof of the asymptotic normality of the MLE, see LemmaC.12. By differentiating equation (27) with respect µ t | t − one obtains d µ t | t − ‘ t ( θ ) = " ν + Nν (1 − b t ) (d µ t | t − ) > Ω − / (cid:15) t (d µ t | t − ) > Ω − / (cid:15) t (cid:15) > t (d µ t | t − ) + " ν + Nν (1 − b t ) (d µ t | t − ) > [ Ω − / (cid:15) t ⊗ I N + I N ⊗ (cid:15) t Ω − / ](d µ t | t − ) − " ν + Nν (1 − b t ) (d µ t | t − ) > Ω − / (cid:15) t (d µ t | t − ) > Ω − (d µ t | t − ) − " ν + Nν (1 − b t ) (d µ t | t − ) > Ω − / (cid:15) t (d µ t | t − ) Ω − / (cid:15) t − " ν + Nν (1 − b t )(d µ t | t − ) > Ω − / (d µ t | t − ) − " ν + Nν (1 − b t )(d µ t | t − ) > Ω − / (cid:15) t . (35) Appendix C Lemmata
Lemmata for the Proof of ConsistencyLemma C.1 ( Uniform integrability of the likelihood ) . Consider the model specified byequations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume that conditions 1, 2 and 3in Assumption 1 are satisfied. Then, E " sup θ ∈ Θ | ‘ t ( θ ) | < ∞ . Proof.
Consider the t -th contribution to the likelihood in equation (10). We have that E " sup θ ∈ Θ | ‘ t ( θ ) | ≤ sup θ ∈ Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln Γ ν + N !(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + sup θ ∈ Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln Γ ν !(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + sup θ ∈ Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) N πν ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + sup θ ∈ Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)
12 ln | Ω | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + ν + N E " sup θ ∈ Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln y t − µ t | t − ) > Ω − ( y t − µ t | t − ) ν !(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < ∞ , since the compactness of the parameter space Θ with < ν < ∞ ensures that the first threeterms are finite, there exist Ω − > and Ω + < ∞ such that Ω − < | Ω | < Ω + and moreover, thelogarithmic moment in the last term exist as a consequence of Lemmata 3.1, 3.2 and 3.3 with m > . In particular, we can show that E " sup θ ∈ Θ | ( y t − µ t | t − ) > Ω − ( y t − µ t | t − ) /ν | m < ∞ , is always satisfied for some m > and with ν > , implying the existence of the requiredlogarithmic moment. emma C.2 ( Uniqueness of identifiability of the true parameter vector ) . Considerthe model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume thatconditions 1, 2, 3 and 4 in Assumption 1 are satisfied. Then, E h | ‘ t ( θ ) | i < ∞ . Furthermore,for every θ = θ ∈ Θ , E h | ‘ t ( θ ) | i < E h | ‘ t ( θ ) | i . Proof.
We immediately note that E h | ‘ t ( θ ) | i < ∞ follows from Lemma C.1 and then we canturn to the second statement.To prove the uniqueness of identifiability of θ it is sufficient to consider the sequence { ‘ t ( θ ) − ‘ t ( θ ) } t ∈ Z under the assumption that ( ν, vech Ω ) > = ( ν , vech Ω ) > . Thus, denot-ing with µ t | t − ( θ ) and µ t | t − ( θ ) the dynamic location vector evaluated at θ and the trueparameter vector θ respectively, and as v t ( θ ) = y t − µ t | t − ( θ ) and v t ( θ ) = y t − µ t | t − ( θ ) the difference becomes ‘ t ( θ ) − ‘ t ( θ ) = ln (cid:20) v t ( θ )) > v t ( θ ) Ω − ( v t ( θ )) /ν (cid:21) − ln (cid:20) v t ( θ )) > Ω − ( v t ( θ )) /ν (cid:21) ≤ (cid:20) ( v t ( θ )) > Ω − ( v t ( θ )) /ν (cid:21) − (cid:20) ( v t ( θ )) > Ω − ( v t ( θ )) /ν (cid:21) , where the inequality is implied by the elementary relation ln( x ) ≤ x − for x > , that will bealways strict unless x = 1 , which is the case if and only if µ t | t − ( θ ) = µ t | t − ( θ ) almost surelysince, Ω is symmetric positive definite and < ν < ∞ . Thus, taking expectation yields E [ ‘ t ( θ ) − ‘ t ( θ )] < E (cid:20) tr (cid:18) ( Ω − /ν ) (cid:18) ( v t ( θ ))( v t ( θ )) > − v t ( θ )( v t ( θ ))( v t ( θ )) > (cid:19)(cid:19)(cid:21) . Also, we can write the recursion ( µ t +1 | t ( θ ) − µ t ]1 | t ( θ )) as ( µ t +1 | t ( θ ) − µ t ]1 | t ( θ )) = ( ω − ω ) + ( Φ − Φ ) ω + ( Φ − Φ ) µ t | t − ( θ ) + ( K − K ) u t , and the relation above entails the fact that if µ t | t − ( θ ) = µ t | t − ( θ ) ∀ t almost surely, then ( ω − ω ) + ( Φ − Φ ) ω =( Φ − Φ ) µ t | t − ( θ ) + ( K − K ) u t , almost surely. Nonetheless, as det K = 0 , the whole multivariate system of equations is stochas-tic, and one cannot find a nontrivial solution of the system that will cancel out the driving force u t of the dynamic location vector. As a result, the only available option reduces to the equiv-alence between all the parameters, that is ω = ω , Φ = Φ and K = K . Therefore, we haveshown that E [ ‘ t ( θ )] < E [ ‘ t ( θ )] for every θ = θ . Lemma C.3 ( Uniform convergence of the likelihood function ) . Consider the modelspecified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume that conditions1, 2 and 3 in Assumption 1 are satisfied. Then, sup θ ∈ Θ | b L T ( θ ) − L T ( θ ) | a.s. −−→ as t → ∞ , here b L T ( θ ) is the empirical likelihood started with µ | and L ( θ ) is the unique stationaryergodic counterpart, defined in (19) and (20) respectively.Proof. We apply a mean-value expansion of the log -likelihood around ˆ µ ?t | t − which is on thechord between the started filtered location ˆ µ t | t − and µ t | t − . We take the supremum over thecompact parameter space and see that sup θ ∈ Θ | b L T ( θ ) − L T ( θ ) | ≤ sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ b L T ( θ ) ∂ ˆ µ ? > t | t − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) sup θ ∈ Θ k ˆ µ t | t − − µ t | t − k , where by direct calculation and the triangle inequality we get sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ b L T ( θ ) ∂ ˆ µ ? > t | t − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ T T X t =1 sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) Ω − ν + Nν ( y t − ˆ µ ?t | t − )1 + ( y t − ˆ µ ?t | t − ) > Ω − ( y t − ˆ µ ?t | t − ) /ν (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ c Ω max θ ∈ Θ ν + Nν ! T T X t =1 sup θ ∈ Θ k y t − ˆ µ ?t | t − k× sup θ ∈ Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) h y t − ˆ µ ?t | t − ) > Ω − ( y t − ˆ µ ?t | t − ) /ν i − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . Note that the compactness of the parameter space imposed by condition 3 is crucial here.Moreover, if we treat the dynamic location vector as a fixed parameter with value ˆ µ ?t | t − and let y t → ∞ the entire term in the right hand side of the latter inequality will vanish. Hence, weobtain that sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ b L T ( θ ) ∂ ˆ µ ? > t | t − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p (1) , which is enough to ensure the existence of log -moments.Furthermore, conditions 1 and 2 are needed in order to keep the data stationary and ergodic andthe filter invertible respectively. Thus, we can apply Lemma 3.3 and obtain sup θ ∈ Θ k ˆ µ t | t − − µ t | t − k e.a.s. −−→ . In conclusion, by Lemma 2.1 in [Straumann and Mikosch, 2006] the claimedalmost sure convergence holds.
Lemma C.4 ( Uniform Law of Large Numbers of the likelihood ) . Consider the modelspecified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume that conditions1, 2 and 3 in Assumption 1 are satisfied. Then, sup θ ∈ Θ |L T ( θ ) − L ( θ ) | a.s. −−→ as t → ∞ , where L T ( θ ) (the stationary ergodic average likelihood) and L ( θ ) (the limit likelihood) are defined in (20) and (21) , respectively.Proof. We note that Theorem A.2.2 of [White, 1994], i.e. the Uniform Law of Large Numbersin its version for ergodic stationary processes, applies straightforwardly to our case since1. the parameter space is compact,2. the empirical likelihood function L T ( θ ) defined in (20) is continuous in θ ∀ y t and ∀ θ ∈ Θ is measurable in y t ,3. by Lemma C.2 we obtain the identifiability and the moment bound E h | ‘ t ( θ ) | i < ∞ ensurethe dominance condition. hus, all the conditions of Theoerm A.2.2 in [White, 1994] are met and the proof is complete. Lemmata for the Proof of Asymptotic NormalityLemma C.5 ( Stationarity, ergodicity and moments for the first derivatives of the dy-namic location ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationaryand ergodic. Consider the stochastic difference equation d( µ t +1 | t − ω ) = X t d( µ t | t − − ω ) + R t , where X t is defined in (24) Assume that conditions 1, 2 and 3 in Assumption 1 are sat-isfied. Then the sequence d ( µ t +1 | t − ω ) = P ∞ j =0 Q jk =1 X t − k ! R t − j is stationary and er-godic, and converges almost surely to the unique stationary ergodic solution. Furthermore, E [ k d ( µ t | t − − ω ) k m ] < ∞ for every m > .Proof. The proof follows the arguments of the proof of Lemma 3.1, which now applies byrewriting X t and all the components of R t in terms of the innovations and independently of µ t | t − so that a stationary ergodic sequence { ( X t , R t ) } t ∈ Z can be generated. It follows fromLemma 3.1 that the first condition is used in order to keep the multivariate system stableand the matrices X t random, while the contraction condition 2 for linear stochastic differenceequations gives us the sufficient condition under which d ( µ t +1 | t − ω ) admits and converges to aunique stationary and ergodic solution { d ( µ t | t − − ω ) } t ∈ Z , see [Bougerol, 1993]. Moreover, theH¨older and Minkowsky inequalities imply E " k d ( µ t +1 | t − ω ) k m ≤ ∞ X j =0 E "(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) j Y k =0 X t − k (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) m /m E " k R t − j k m /m . In addition, from equation (25) we note that when θ = θ E " k X t k m ≤k Φ k m + E " k K C t k m ≤ ¯ ρ m + c K E " b m/ t (1 − b t ) m/ E h k ( z t ⊗ z t ) k m i + c K N m/ E h (1 − b t ) m/ i = ¯ ρ m + c K E h k z t k m i B (cid:16) N + m , ν + m (cid:17) B (cid:16) N , ν (cid:17) + c K N m/ B (cid:16) N , ν + m (cid:17) B (cid:16) N , ν (cid:17) = ¯ ρ m + c K N m B (cid:16) N + m , ν + m (cid:17) B (cid:16) N , ν (cid:17) + c K N m/ B (cid:16) N , ν + m (cid:17) B (cid:16) N , ν (cid:17) < ∞ , by Lemma 2.1. Note that the condition 1 is needed in order to keep the matrix X t randomand identifiable. t remains to prove the moment bounds of R t for every m > . We have, E " k a t k m = E " b m/ t (1 − b t ) m/ /ν m/ E " k Ω / z t k m ≤ c Ω N m/ B (cid:16) N +3 m , ν + m (cid:17) B (cid:16) N , ν (cid:17) < ∞ , In addition, E " k vec B t k m = E " ν m/ b m/ t (1 − b t ) m/ E "(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ( Ω − / z t ⊗ Ω − / z t ⊗ Ω / z t ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) m ≤ c Ω E h k z t k m i B (cid:16) N +3 m , ν + m (cid:17) B (cid:16) N , ν (cid:17) = c Ω N m/ B (cid:16) N +3 m , ν + m (cid:17) B (cid:16) N , ν (cid:17) < ∞ , and E " k D t k m = E "(cid:13)(cid:13)(cid:13)h ( µ t | t − − ω ) > ⊗ I N i(cid:13)(cid:13)(cid:13) m ≤ ( √ N ¯ c ∞ X j =0 ¯ ρ j E (cid:20) k u t − j k m (cid:21)! /m ) m < ∞ , by Lemma 3.1, and finally, by Lemma 2.1, E " k E t k m = E "(cid:13)(cid:13)(cid:13)h ( u t ) > ⊗ I N i(cid:13)(cid:13)(cid:13) m ≤ N m/ E " k u t k m ≤ c Ω ν m/ B (cid:16) N + m , ν + m (cid:17) B (cid:16) N , ν (cid:17) . < ∞ . Lemma C.6 ( Stationarity, ergodicity and moments of the second derivatives ofthe dynamic location ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Consider the stochastic difference equation d ( µ t +1 | t − ω ) = X t d ( µ t | t − − ω ) + K d( µ t | t − − ω ) > C t d( µ t | t − − ω ) + Q t , where X t is defined in (24) , Assume that conditions 1 and 2 in Assumption 1 are satisfied.Then the series { d ( µ t +1 − ω ) } t ∈ Z is stationary and ergodic, and d ( µ t +1 | t − ω ) = ∞ X j =0 ( j Y k =1 X t − k !" K d ( µ t − j | t − j − − ω ) > C t − j d ( µ t − j | t − j − − ω ) + Q t − j converges almost surely to the unique stationary ergodic solution. Furthermore, E [ k d ( µ t | t − − ω ) k m ] < ∞ for every m > .Proof. From Lemma 3.1, the first two conditions ensure stationary and ergodicity of the se-quence { d ( µ t +1 | t − ω ) } t ∈ Z . Moreover, H¨older and Minkowsky inequalities, along with the ndependence between each component, imply that E " k d ( µ t +1 | t − ω ) k m ≤ ∞ X j =0 ( E "(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) j Y k =0 X t − k (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) m /m × c K E " k d ( µ t − j | t − j − − ω ) k m / m E " k C t − j k m /m + E " k Q t − j k m /m !) , from which we can see that, by Lemma C.5, the first two terms are uniformly bounded andthe third is the second derivative of the driving force with respect to the dynamic locationvector. In the same spirit of Lemma C.5, let us consider equation (31). Then, the c m -inequalityestablishes that E " k C t k m ≤ E " [8(1 − b t ) /ν ] m × (cid:13)(cid:13)(cid:13)(cid:13)(cid:26)h ( I N ⊗ v t ) v > t i vec Ω − (cid:27)(cid:13)(cid:13)(cid:13)(cid:13) m (cid:13)(cid:13)(cid:13)h v > t Ω − i(cid:13)(cid:13)(cid:13)(cid:13) m + E " [2(1 − b t ) /ν ] m × (cid:13)(cid:13)(cid:13)(cid:13)(cid:26)h Ω − ⊗ I N ih v t ⊗ I N + I N ⊗ v t i(cid:27)(cid:13)(cid:13)(cid:13)(cid:13) m + E " [2(1 − b t ) /ν ] m (cid:13)(cid:13)(cid:13)(cid:13)(cid:26)h Ω − ⊗ I N ih v t ⊗ I N i(cid:27)(cid:13)(cid:13)(cid:13)(cid:13) m ≤ C E h k z t k m i + C E h k z t k m i + C E h k z t k m i < ∞ . Straightforward calculations show that analogous results hold for each component of Q t , sothat Q t is uniformly bounded. Lemma C.7 ( Invertibility for the first and second derivatives of the dynamic loca-tion filter ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary andergodic. Let the conditions of Lemmata 3.1, C.5 and C.6 hold true. Consider further the fil-tering equation (6) under the condition of Lemma 3.3. Then, for any initialization of the filter µ | and its first derivatives in d µ | , the perturbed first and second derivatives of the dynamiclocation filter, i.e. { d ( ˆ µ t | t − − ω ) } t ∈ N and { d ( ˆ µ t | t − − ω ) } t ∈ N , converge exponentially fast almostsurely to the unique stationary ergodic solution { d ( µ t | t − − ω ) } t ∈ Z and { d ( µ t | t − − ω ) } t ∈ Z .Furthermore, for any m > E [sup θ ∈ Θ k d ( ˆ µ t | t − − ω ) k m ] < ∞ and E [sup θ ∈ Θ k d ( ˆ µ t | t − − ω ) k m ] < ∞ ,2. E [sup θ ∈ Θ k d ( µ t | t − − ω ) k m ] < ∞ and E [sup θ ∈ Θ k d ( µ t | t − − ω ) k m ] < ∞ .Proof. We provide a detailed discussion for the first case, that is the convergence of the per-turbed first derivatives, since the proof for the the convergence of the perturbed second deriva-tives follows the same line.The proof of this Lemma builds upon the arguments of Theorem 2.10 in [Straumann and Mikosch, 2006]for perturbed stochastic recurrence equations. In particular, the perturbed stochastic recurrenceequation corresponds to d ( ˆ µ t +1 | t − ω ) = c X t d ( ˆ µ t | t − − ω ) + c R t , which is a nonlinear function f the initial sequence { ( ˆ µ t | t − − ω ) } t ∈ N . The relevant contraction condition 8 of Lemma 3.3holds and the required convergence of the recurrence equation is obtained if sup θ ∈ Θ k c X t − X t k e.a.s. −−→ and sup θ ∈ Θ k c R t − R t k e.a.s. −−→ as t → ∞ . (36)In order to verify these conditions, we use the mean value theorem, giving sup θ ∈ Θ k c X t − X t k ≤ sup θ ∈ Θ k C t k sup θ ∈ Θ k ˆ µ t | t − − µ t | t − k , (37)and sup θ ∈ Θ k c R t − R t k ≤ sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) C t d B C t d a C t (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) sup θ ∈ Θ k ˆ µ t | t − − µ t | t − k , where the expression for C t , d B C t and d B C t can be found in (31), (32) and (33) respectively. We can combine the resultsobtained in Lemma C.6 together with the almost sure exponentially fast convergence (9) inLemma 3.3, in order to achieve the required convergences in (36). As in Lemma C.5 we canshow by direct calculations that the property of uniformly boundedness applies to each thesederivatives, since they are continuous functions of w t in equation (3). We obtain sup θ ∈ Θ k C t k = O p (1) , sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) C t d B C t d a C t (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p (1) and sup θ ∈ Θ k ˆ µ t | t − − µ t | t − k = o e.a.s. (1) as t → ∞ . Thus, repeated applications of Lemma 2.1 in [Straumann and Mikosch, 2006] ensure the re-quired convergence in (37). Summarising, sup θ ∈ Θ k d ( ˆ µ t | t − − ω ) − d ( µ t | t − − ω ) k e.a.s. −−→ as t → ∞ . Since the sequence { d ( ˆ µ t | t − − ω ) } t ∈ N is a nonlinear function of both the perturbed recurrence { d ( ˆ µ t | t − − ω ) } t ∈ N and the filter { ( ˆ µ t | t − − ω ) } t ∈ N the same arguments apply sequentially,yielding sup θ ∈ Θ k d ( ˆ µ t | t − − ω ) − d ( µ t | t − − ω ) k e.a.s. −−→ as t → ∞ . The second claim for the moment bounds follows by the continuous mapping theorem, sincethe derivatives are nonlinear continuous functions of µ t | t − , which has unbounded moments, seeLemma 3.3. Lemma C.8 ( Martingale difference property of the likelihood’s first derivatives ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic.Assume that conditions 1, 2, 3 and 4 in Assumption 1 are satisfied. Then, the first derivatives ofthe log -likelihood { d‘ t ( θ ) } t ∈ Z form a martingale difference sequence with finite second moments, .e. E t − [ d‘ t ( θ )] = 0 and E [ | d‘ t ( θ ) | ] < ∞ . Proof.
The first derivatives of the log -likelihood contribution at time t with respect to theparameter vector θ can be retrieved from the differential in equation (23), We have E t − [d ‘ t ( θ )] = 12 " ψ ν + N ! − ψ ν ! − Nν + ν + Nν E t − [ b t ] − E t − [ln w t ] (d ν )+ 12 (d vech( Ω )) > D > N ( Ω − / ⊗ Ω − / ) " ν + Nν E t − [( (cid:15) t ⊗ (cid:15) t ) /w t ] − vec I N + ν + Nν (d µ t | t − ) > Ω − E t − [( y t − µ t | t − ) /w t ] , since the derivatives obtained from d µ t | t − are F t − -measurables. When θ = θ , one has E t − [ b t ] = Nν + N , E t − [ln(1 /w t )] = E t − [ln(1 − b t )] = ψ ν ! − ψ ν + N ! , E t − [( (cid:15) t ⊗ (cid:15) t ) /w t ] = ν E t − [( z t ⊗ z t )] E t − [ b t ] = νν + N vec I N , E t − [( y t − µ t | t − ) /w t ] = √ ν E t − [ q b t (1 − b t )] Ω / E t − [ z t ] = , where the first and the second equality follow from the propertis of the beta distribution,see equation (4). The third and the fourth equalities are obtained based on the stochasticrepresentation of the model given in equation (14). Thus, by substitutions, we obtain themartingale difference property.The second claim follows by Lemmata 2.1, 3.1, 3.2, 3.3, C.1 and by an application of thecontinuous mapping theorem to d‘ t ( θ ) . Lemma C.9 ( CLT for the likelihood’s first derivatives ) . Consider the model specifiedby equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume that conditions 1, 2and 3 in Assumption 1 are satisfied. Then, the first derivatives of the log -likelihood L T ( θ ) = T P Tt =1 d‘ t ( θ ) obeys the CLT for martingale difference sequences, that is √ T L T ( θ ) = ⇒ N ( , V ) as t → ∞ , where V = E " ( L T ( θ ))( L T ( θ )) > as t → ∞ . Proof.
Consider Lemma C.8, where the relevant properties of the likelihood’s first derivativesare obtained. With the support of the Cram´er-Wold device (see [van der Vaart, 1998] pag.16) the CLT for martingales of [Billingsley, 1961] directly applies to the linear combination √ T L T ( θ ) = √ T T P Tt =1 d‘ t ( θ ) = ⇒ N ( , V ) . Lemma C.10 ( Convergence in probability of the likelihood’s first derivatives ) . Con-sider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume hat conditions 1, 2, 3 and 4 in Assumption 1 are satisfied. Then, √ T k b L T ( θ ) − L T ( θ ) k P −→ as T → ∞ , where b L T ( θ ) = T P Tt =1 d b ‘ t ( θ ) collects the likelihood’s first derivatives startedwith µ | , while L T ( θ ) = T P Tt =1 d‘ t ( θ ) is the unique stationary ergodic counterpart.Proof. Almost sure convergence can be proved based on the invertibility of the location filter,see Lemma 3.3, and its derivatives, see Lemma C.7. Invertibility also ensures that the perturbedfirst differential of the dynamic location will converge to its unique stationary ergodic solution, sup θ ∈ Θ k ˆ µ t | t − − µ t | t − k e.a.s. −−→ and sup θ ∈ Θ k d ˆ µ t | t − − d µ t | t − k e.a.s. −−→ as t → ∞ . (38)Hence, we can rely on a multivariate mean value expansion about all the elements of thevectors ˆ µ ?t | t − and d ˆ µ ?t | t − , which lie on the chords between ( ˆ µ t | t − , µ t | t − ) and ( d ˆ µ t | t − , d µ t | t − ) respectively, yielding √ T k b L T ( θ ) − L T ( θ ) k ≤ √ T (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ ( b L T ( θ )) ∂ ˆ µ ? > t | t − ∂ ( b L T ( θ )) ∂ ( d ˆ µ ? > t | t − ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ( ˆ µ t | t − ( θ ) − µ t | t − ( θ ))( d ˆ µ t | t − ( θ ) − d µ t | t − ( θ )) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) . (39)The first term on the right hand of the inequality is uniformly bounded. Exponentially fastalmost sure convergence of the second term in the right hand side is obtained by Lemma C.7.By means of analogouos arguments as in Lemma C.3 we can show that (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ ( b L T ( θ )) ∂ ˆ µ ? > t | t − (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p (1) , and (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ ( b L T ( θ )) ∂ ( d ˆ µ ? > t | t − ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) = O p (1) . Moreover, the results obtained in 38 imply that for t large enough max {k ˆ µ t | t − ( θ ) − µ t | t − ( θ ) k , k d ˆ µ t | t − ( θ ) − d µ t | t − ( θ ) k} < . By using the Chebyshev and the c m inequalities we then have that for ε > and some m > P (cid:18) √ T k b L T ( θ ) − L T ( θ ) k > ε (cid:19) ≤ √ Tε m E [ k b L T ( θ ) − L T ( θ ) k m ] ≤ T m/ ε m T X t =1 E [ k d b ‘ t ( θ ) − d‘ T ( θ ) k m ] ≤ T m/ ε m O p ( t% t ) , which is O p ( T m/ ) and this implies the claimed convergence in probability. Lemma C.11 ( Properties of the likelihood’s second derivatives ) . Consider the modelspecified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume that conditions1, 2, 3, 4 and 5 in Assumption 1 are satisfied. Then, the second derivatives of the likelihood d ‘ t ( θ ) } t ∈ Z are stationary ergodic with bounded moments and nonsingular. In particular, E [ d ‘ t ( θ )] < ∞ , Proof.
The complete equation of the second differential is more subtle than the first, thus weleave it in (27). We prove the arguments by considering equation (13), namely d ‘ t ( θ ) d θ d θ > = ∂ ‘ t ( θ ) ∂ θ ∂ θ > + d ( µ t | t − − ω ) d θ > ! > ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − d ( µ t | t − − ω ) d θ > ! + ∂‘ t ( θ ) ∂ µ > t | t − d ( µ t | t − − ω ) d θ d θ > . By taking the expectation, we get a finite and static term in the first summand on the righthand side, while by the independence and the martingale difference sequence properties of thescore vector, the last term becomes null. Thus, we can focus our attention on the middle term.Define I ( µ t | t − ) ( θ ) = − E " d ( µ t | t − − ω ) d θ > ! > ∂ ‘ t ( θ ) ∂ µ t | t − ∂ µ > t | t − d ( µ t | t − − ω ) d θ > ! . Note that, by independence, we can express the vectorized counterpart as vec I ( µ t | t − ) ( θ ) = E " d ( µ t | t − − ω ) d θ > ! ⊗ d ( µ t | t − − ω ) d θ > ! > vec I ( µ ) ( θ ) . By Lemmata 3.3, C.7 the dynamic location filter and its differentials are invertible and achievetheir own unique stationary ergodic solution with an unbounded number of finite moments.Thus, we obtain the desired result by repeated applications of the law of iterated expectationto the following equality E t − " d ( µ t +1 − ω ) d θ > ! ⊗ d ( µ t +1 − ω ) d θ > ! > = E t − " X t d ( µ t | t − − ω ) d θ > + d R t d θ > ! ⊗ X t d ( µ t | t − − ω ) d θ > + d R t d θ > ! > = d ( µ t | t − − ω ) d θ > ⊗ d ( µ t | t − − ω ) d θ > ! > E t − " X t ⊗ X t ! > + E t − " d R t d θ > ⊗ d R t d θ > ! > + E t − " X t d ( µ t | t − − ω ) d θ > ⊗ d R t d θ > ! > + E t − " d R t d θ > ⊗ X t d ( µ t | t − − ω ) d θ > ! > . Note that the contraction conditions 2 and 5 are more than enough to ensure the stability of therecursions since E t − [ k X t ⊗ X t k ] = E t − [ k X t k ] < , while Lemma C.5 ensures the existenceof the required moments. Lemma C.12 ( Uniform convergence of the likelihood’s second derivatives ) . Considerthe model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and ergodic. Assume thatconditions 1, 2 and 3, in Assumption 1 are satisfied. Then, sup θ ∈ Θ | b L T ( θ ) − L T ( θ ) | a.s. −−→ as t → ∞ , where b L T ( θ ) = T P Tt =1 d b ‘ t ( θ ) collects the likelihood’s second derivatives startedwith µ | , while L T ( θ ) = T P Tt =1 d ‘ t ( θ ) is the unique stationary ergodic counterpart.Proof. The second derivatives of the likelihood are nonlinear functions of the filtered locationvector and its first end second derivatives. Hence, the mean value theorem is applied for eachdynamic equation. As a result, sup θ ∈ Θ k b L T ( θ ) − L T ( θ ) k ≤ sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∂ b L T ( θ ) ∂ ˆ µ ? > t | t − ∂ b L T ( θ ) ∂ ( d ˆ µ ? > t | t − ) ∂ b L T ( θ ) ∂ ( d ˆ µ ? > t | t − ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ( ˆ µ t | t − − µ t | t − )( d ˆ µ t | t − − d µ t | t − )( d ˆ µ t | t − − d µ t | t − ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) . Thus, the proof follows by the same arguments of the proof of Lemma C.10, i.e. by obtaining theuniformly boundedness of the first term and the exponentially fast convergence of the secondterm in the right hand side respectively. Note that the last component of the first term in theright hand side involves a third order differential, which can be found in (35) and is uniformlybounded. Subsequent applications of Lemma 2.1 of [Straumann and Mikosch, 2006] yield thedesired result.
Lemma C.13 ( Uniform Law of Large Numbers of the likelihood’s second deriva-tives ) . Consider the model specified by equations (1) and (2) with { (cid:15) t } t ∈ Z stationary and er-godic. Assume that conditions 1, 2, 3 and 4 in Assumption 1 are satisfied. Then, sup θ ∈ Θ |L T ( θ ) −L ( θ ) | a.s. −−→ as t → ∞ , where L T ( θ ) = T P Tt =1 d ‘ t ( θ ) and L ( θ ) = E [ d ‘ t ( θ )] denotes itslimit.Proof. Note that L T ( θ ) is a function of { y t , y t − , . . . } and therefore stationary and ergodic.The Lemma follows straightforwardly from Lemma C.11 and the The Uniform Law of LargeNumbers for ergodic stationary processes, see Theoerm A.2.2 in [White, 1994] and LemmaC.4.and therefore stationary and ergodic.The Lemma follows straightforwardly from Lemma C.11 and the The Uniform Law of LargeNumbers for ergodic stationary processes, see Theoerm A.2.2 in [White, 1994] and LemmaC.4.