Mixing convergence of LSE for supercritical Gaussian AR(2) processes using random scaling
Mátyás Barczy∗, Gyula Pap∗

∗ MTA-SZTE Analysis and Stochastics Research Group, Bolyai Institute, University of Szeged, Aradi vértanúk tere 1, H-6720 Szeged, Hungary. e-mail: [email protected] (M. Barczy).

Abstract
We prove mixing convergence of the least squares estimator of the autoregressive parameters for supercritical Gaussian autoregressive processes of order 2 having real characteristic roots with different absolute values. We use an appropriate random scaling such that the limit distribution is a two-dimensional normal distribution concentrated on a one-dimensional ray determined by the characteristic root having the larger absolute value.
Studying the asymptotic behaviour of the Least Squares Estimator (LSE) of the AutoRegressive (AR) parameters of AR processes has a long history; it goes back at least to Mann and Wald [5]. Most authors have proved convergence in distribution of appropriately normalized versions of the LSE in question, but one can rarely find other types of convergence in the corresponding limit theorems. For some AR processes of order $p$, Jeganathan [4, Theorems 9, 14 and 17] proved so-called strong convergence (see Jeganathan [4, Definition 3]) of the LSE in question, and for AR processes of order 1, Häusler and Luschgy [3, Chapter 9] proved so-called stable and mixing convergence (see Appendix A) of the LSE. To our knowledge, results on stable (mixing) convergence of the LSE of the AR parameters of higher-order AR processes are not available in the literature. In the present paper we consider a supercritical Gaussian AR process of order 2 having real characteristic roots with different absolute values, and we prove mixing convergence of the LSE of its AR parameters using a suitable random normalization.

[Footnote: Mathematics Subject Classification: 62F12, 62H12, 60G15, 60F05. Key words and phrases: autoregressive processes, least squares estimator, stable convergence, mixing convergence. Mátyás Barczy is supported by grant NKFIH-1279-2/2020 of the Ministry for Innovation and Technology, Hungary.]

Let $\mathbb{Z}_+$, $\mathbb{N}$, $\mathbb{R}$, $\mathbb{R}_+$ and $\mathbb{R}_{++}$ denote the set of non-negative integers, positive integers, real numbers, non-negative real numbers and positive real numbers, respectively. Let $(Z_n)_{n\in\mathbb{N}}$ be a sequence of independent and identically distributed random variables such that $Z_1$ is normally distributed with mean $0$ and variance $\sigma^2$, where $\sigma \in \mathbb{R}_{++}$. Let $(X_0, X_{-1})^\top$ be a random vector with values in $\mathbb{R}^2$, independent of $(Z_n)_{n\in\mathbb{N}}$, and suppose that $X_0$ and $X_{-1}$ have finite second moments. Let us consider a Gaussian autoregressive process of order 2 (AR(2) process) $(X_n)_{n\geq -1}$ defined by

(1.1) $\qquad X_n = \vartheta_1 X_{n-1} + \vartheta_2 X_{n-2} + Z_n, \qquad n \in \mathbb{N},$

where $(\vartheta_1, \vartheta_2)^\top \in \mathbb{R}^2$. Note that (1.1) implies

(1.2) $\qquad \begin{bmatrix} X_n \\ X_{n-1} \end{bmatrix} = \boldsymbol{\vartheta} \begin{bmatrix} X_{n-1} \\ X_{n-2} \end{bmatrix} + \begin{bmatrix} Z_n \\ 0 \end{bmatrix}, \qquad n \in \mathbb{N},$

and hence

(1.3) $\qquad \begin{bmatrix} X_n \\ X_{n-1} \end{bmatrix} = \boldsymbol{\vartheta}^k \begin{bmatrix} X_{n-k} \\ X_{n-k-1} \end{bmatrix} + \sum_{j=n-k+1}^{n} \boldsymbol{\vartheta}^{n-j} \begin{bmatrix} Z_j \\ 0 \end{bmatrix}, \qquad n \in \mathbb{Z}_+, \quad k \in \{0, 1, \ldots, n\},$

where $\sum_{j=n+1}^{n} := 0$ and $\boldsymbol{\vartheta} := \begin{bmatrix} \vartheta_1 & \vartheta_2 \\ 1 & 0 \end{bmatrix}$. By $\varrho(\boldsymbol{\vartheta})$ we denote the spectral radius of $\boldsymbol{\vartheta}$. The AR(2) process $(X_n)_{n\geq -1}$ given in (1.1) is called subcritical, critical and supercritical if $\varrho(\boldsymbol{\vartheta}) < 1$, $\varrho(\boldsymbol{\vartheta}) = 1$ and $\varrho(\boldsymbol{\vartheta}) > 1$, respectively. We have $\varrho(\boldsymbol{\vartheta}) = \max\{|\lambda_+|, |\lambda_-|\}$, where $\lambda_+$ and $\lambda_-$ denote the eigenvalues of $\boldsymbol{\vartheta}$ given by

(1.4) $\qquad \lambda_+ := \frac{\vartheta_1 + \sqrt{\vartheta_1^2 + 4\vartheta_2}}{2} \in \mathbb{C}, \qquad \lambda_- := \frac{\vartheta_1 - \sqrt{\vartheta_1^2 + 4\vartheta_2}}{2} \in \mathbb{C}.$

Note that $\lambda_+$ and $\lambda_-$ are the (characteristic) roots of the autoregressive (characteristic) polynomial $x^2 - \vartheta_1 x - \vartheta_2$ of the AR(2) process $(X_n)_{n\geq -1}$. The process $(X_n)_{n\geq -1}$ is called explosive if its characteristic polynomial has at least one root outside the unit circle (supercritical case) but no roots on the unit circle, i.e., $\varrho(\boldsymbol{\vartheta}) > 1$, $|\lambda_+| \neq 1$ and $|\lambda_-| \neq 1$; see Jeganathan [4, Section 6]. If both roots $\lambda_+$ and $\lambda_-$ lie outside the unit circle, i.e., $|\lambda_+| > 1$ and $|\lambda_-| > 1$, then the process $(X_n)_{n\geq -1}$ is called purely explosive. An explosive, but not purely explosive, AR(2) process is sometimes called partially explosive.

In this paper we consider a supercritical AR(2) process supposing also that $|\lambda_+| \neq |\lambda_-|$. Then $\vartheta_1^2 + 4\vartheta_2 > 0$, hence $\lambda_+ \in \mathbb{R}$ and $\lambda_- \in \mathbb{R}$. Note that this case includes the (purely and partially) explosive case, but it also includes the case $|\lambda_+| > |\lambda_-| = 1$ (i.e., when there is a so-called unit root of the characteristic polynomial of $(X_n)_{n\geq -1}$).

We will study the asymptotic behaviour of the LSE of the parameters $\vartheta_1$ and $\vartheta_2$ based on the observations $X_{-1}, X_0, X_1, \ldots, X_n$, using an appropriate random normalization, with the aim of establishing mixing convergence (see Appendix A) of the LSE in question. For each $n \in \mathbb{N}$, a least squares estimator $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ of $(\vartheta_1, \vartheta_2)^\top$ based on the observations $X_{-1}, X_0, X_1, \ldots, X_n$ can be obtained by minimizing the sum of squares
$$\sum_{k=1}^{n} (X_k - \vartheta_1 X_{k-1} - \vartheta_2 X_{k-2})^2$$
with respect to $(\vartheta_1, \vartheta_2)^\top$ over $\mathbb{R}^2$. It is known that for each $n \in \mathbb{N}$ with $n \geq 3$, a unique LSE $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ of $(\vartheta_1, \vartheta_2)^\top$ based on the observations $X_{-1}, X_0, X_1, \ldots, X_n$ exists with probability 1, and this LSE has the form
$$\begin{bmatrix} \widehat\vartheta_1^{(n)} \\ \widehat\vartheta_2^{(n)} \end{bmatrix} = \left( \sum_{k=1}^{n} \begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}^\top \right)^{-1} \sum_{k=1}^{n} X_k \begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}, \qquad n \in \mathbb{N},$$
on the event
$$\left\{ \det\left( \sum_{k=1}^{n} \begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}^\top \right) > 0 \right\},$$
see, e.g., Lemma 2.2. Venkataraman [14], [15], [16] and Narasimham [9] proved convergence in distribution of $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ for explosive AR(2) processes with not necessarily Gaussian innovations and without a unit root, using non-random normalizations. Further, part (b) of Theorem 3 in Monsour [8] gives a general result on the asymptotic behaviour of the LSE of the AR parameters of a supercritical AR process of order $p \in \mathbb{N}$ using a non-random normalization, establishing weak convergence of the LSE in question. Part (c) of Theorem 3 in Monsour [8] describes the asymptotic behaviour of the LSE of the AR parameters of a general AR process of order $p \in \mathbb{N}$ using a random normalization, proving weak convergence of the LSE in question.
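The classification of an AR(2) process via the roots in (1.4) can be sketched numerically. The following is a minimal illustration (the function names and parameter values are our own, not from the paper):

```python
# Classification of an AR(2) process by the roots of its characteristic
# polynomial x^2 - theta1*x - theta2, following (1.4).
import cmath

def char_roots(theta1, theta2):
    """Return (lambda_plus, lambda_minus) as in (1.4)."""
    d = cmath.sqrt(theta1**2 + 4*theta2)
    return (theta1 + d) / 2, (theta1 - d) / 2

def classify(theta1, theta2):
    """Sub-/critical/supercritical according to the spectral radius."""
    lp, lm = char_roots(theta1, theta2)
    rho = max(abs(lp), abs(lm))
    if rho < 1:
        return "subcritical"
    if rho == 1:
        return "critical"
    return "supercritical"

# x^2 - 2x + 0.75 = (x - 1.5)(x - 0.5): supercritical with |lambda_+| != |lambda_-|
lp, lm = char_roots(2.0, -0.75)
print(lp.real, lm.real, classify(2.0, -0.75))
```

Since the spectral radius of the companion matrix equals $\max\{|\lambda_+|,|\lambda_-|\}$, this reproduces the trichotomy stated after (1.3).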
In Section 4 we give a detailed comparison of our forthcoming result with the existing ones, especially with those of Monsour [8]. Recently, Aknouche [1] has studied the asymptotic behaviour of the LSE for some explosive strong periodic AR processes of order $p \in \mathbb{N}$; in the case of independent and periodically distributed Gaussian innovations with zero mean, it has been shown that the LSE, using an appropriate random scaling, converges in distribution to a $p$-dimensional standard normal distribution.

In the present paper we consider a supercritical AR(2) process $(X_k)_{k\geq -1}$ given in (1.1) with real characteristic roots $\lambda_1, \lambda_2 \in \mathbb{R}$ satisfying $|\lambda_1| \neq |\lambda_2|$, and we show that
$$\begin{bmatrix} \big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} & \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} \\ \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} & \big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} \end{bmatrix} \begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2 \end{bmatrix}$$
converges mixing to $\sigma N [1, \operatorname{sign}(\lambda_1)]^\top$ as $n \to \infty$, where $\lambda_1$ denotes the characteristic root having the larger absolute value and $N$ is a one-dimensional standard normally distributed random variable; see Theorem 3.1. The limit distribution is in fact a two-dimensional normal distribution concentrated on a one-dimensional ray determined by the characteristic root having the larger absolute value. Our proof is based on a multidimensional stable limit theorem (see Theorem B.1, proved in Barczy and Pap [2]), which is a multidimensional analogue of the corresponding one-dimensional result in Häusler and Luschgy [3, Theorem 8.2]. Our proof technique is motivated by that of Theorem 9.2 in Häusler and Luschgy [3], and it is completely different from that of Monsour [8]. We note that we tried to prove mixing (or stable) convergence of $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ as $n \to \infty$ in the supercritical case using a non-random normalization, but our attempts have not been successful so far (for more details, see the end of Section 4).

The paper is structured as follows. In Section 2, we recall the derivation of the LSE $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ of $(\vartheta_1, \vartheta_2)^\top$, together with a useful decomposition of $(\widehat\vartheta_1^{(n)} - \vartheta_1, \widehat\vartheta_2^{(n)} - \vartheta_2)^\top$, which is used in the proof of Theorem 3.1. Section 3 contains the precise formulation of our main result, Theorem 3.1, together with its two corollaries. Section 4 is devoted to a detailed comparison of Theorem 3.1 with the existing results in the literature, especially with those of Monsour [8]. All the proofs can be found in Section 5. We close the paper with three appendices: we recall the notions of stable and mixing convergence (Appendix A), a multidimensional analogue of a one-dimensional stable limit theorem in Häusler and Luschgy [3, Theorem 8.2] which was proved in Barczy and Pap [2, Theorem 1.4] (Appendix B), and Lenglart's inequality (Appendix C).

In what follows, we collect the notations used in the paper and not defined so far. Let $\log^+(x) := \log(x)\,\mathbb{1}_{\{x \geq 1\}} + 0 \cdot \mathbb{1}_{\{x < 1\}}$ for $x \in \mathbb{R}_+$. The Borel $\sigma$-algebra on $\mathbb{R}$ is denoted by $\mathcal{B}(\mathbb{R})$. Convergence in probability under a probability measure $P$ and convergence in distribution under a probability measure $P$ will be denoted by $\overset{P}{\longrightarrow}$ and $\overset{\mathcal{D}(P)}{\longrightarrow}$, respectively. For an event $A$ with $P(A) > 0$, let $P_A(\cdot) := P(\cdot \mid A) = P(\cdot \cap A)/P(A)$ denote the conditional probability measure given $A$. Let $E_P$ denote expectation under the probability measure $P$. Almost sure equality under a probability measure $P$ and equality in distribution will be denoted by $\overset{P\text{-a.s.}}{=}$ and $\overset{\mathcal{D}}{=}$, respectively. Every random variable will be defined on a complete probability space $(\Omega, \mathcal{F}, P)$ (i.e., a probability space $(\Omega, \mathcal{F}, P)$ having the property that for all $B \in \mathcal{F}$ with $P(B) = 0$ and all subsets $A \subset B$, we have $A \in \mathcal{F}$). For a random variable $\xi : \Omega \to \mathbb{R}^d$, its distribution on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ under $P$ is denoted by $P^\xi$.
By $\|x\|$ and $\|A\|$ we denote the Euclidean norm of a vector $x \in \mathbb{R}^d$ and the induced matrix norm of a matrix $A \in \mathbb{R}^{d\times d}$, respectively. By $\langle x, y\rangle$ we denote the Euclidean inner product of vectors $x, y \in \mathbb{R}^d$. The null vector and the null matrix will be denoted by $\mathbf{0}$. Moreover, $I_d \in \mathbb{R}^{d\times d}$ denotes the identity matrix, and $e_1, \ldots, e_d$ denote the natural basis in $\mathbb{R}^d$. For a symmetric and positive semidefinite matrix $A \in \mathbb{R}^{d\times d}$, its unique symmetric, positive semidefinite square root is denoted by $A^{1/2}$. If $V \in \mathbb{R}^{d\times d}$ is symmetric and positive semidefinite, then $\mathcal{N}_d(\mathbf{0}, V)$ denotes the $d$-dimensional normal distribution with mean vector $\mathbf{0} \in \mathbb{R}^d$ and covariance matrix $V$. In case $d = 1$, instead of $\mathcal{N}_1$ we simply write $\mathcal{N}$.

For each $n \in \mathbb{N}$, a least squares estimator $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ of $(\vartheta_1, \vartheta_2)^\top$ based on the observations $X_{-1}, X_0, X_1, \ldots, X_n$ can be obtained by minimizing the sum of squares $\sum_{k=1}^n (X_k - \vartheta_1 X_{k-1} - \vartheta_2 X_{k-2})^2$ with respect to $(\vartheta_1, \vartheta_2)^\top$ over $\mathbb{R}^2$. For each $n \in \mathbb{N}$, we define the function $Q_n : \mathbb{R}^{n+2} \times \mathbb{R}^2 \to \mathbb{R}$ by
$$Q_n(x_{-1}, x_0, x_1, \ldots, x_n; \vartheta_1, \vartheta_2) := \sum_{k=1}^n (x_k - \vartheta_1 x_{k-1} - \vartheta_2 x_{k-2})^2$$
for all $(x_{-1}, x_0, x_1, \ldots, x_n)^\top \in \mathbb{R}^{n+2}$ and $(\vartheta_1, \vartheta_2)^\top \in \mathbb{R}^2$. By definition, for each $n \in \mathbb{N}$, a least squares estimator of $(\vartheta_1, \vartheta_2)^\top$ is a measurable function $F_n : \mathbb{R}^{n+2} \to \mathbb{R}^2$ such that
$$Q_n(x_{-1}, x_0, x_1, \ldots, x_n; F_n(x_{-1}, x_0, x_1, \ldots, x_n)) = \inf_{(\vartheta_1, \vartheta_2)^\top \in \mathbb{R}^2} Q_n(x_{-1}, x_0, x_1, \ldots, x_n; \vartheta_1, \vartheta_2)$$
for all $(x_{-1}, x_0, x_1, \ldots, x_n)^\top \in \mathbb{R}^{n+2}$. Next, we give the solutions of this extremum problem.

Lemma 2.1. For each $n \in \mathbb{N}$, any least squares estimator of $(\vartheta_1, \vartheta_2)^\top$ is a measurable function $F_n : \mathbb{R}^{n+2} \to \mathbb{R}^2$ for which

(2.1) $\qquad F_n(x_{-1}, x_0, x_1, \ldots, x_n) = G_n(x_{-1}, x_0, x_1, \ldots, x_n)^{-1} H_n(x_{-1}, x_0, x_1, \ldots, x_n)$

on the set $D_n := \{(x_{-1}, x_0, x_1, \ldots, x_n)^\top \in \mathbb{R}^{n+2} : \det(G_n(x_{-1}, x_0, x_1, \ldots, x_n)) > 0\}$, where
$$G_n(x_{-1}, x_0, \ldots, x_n) := \sum_{k=1}^n \begin{bmatrix} x_{k-1} \\ x_{k-2} \end{bmatrix}\begin{bmatrix} x_{k-1} \\ x_{k-2} \end{bmatrix}^\top, \qquad H_n(x_{-1}, x_0, \ldots, x_n) := \sum_{k=1}^n x_k \begin{bmatrix} x_{k-1} \\ x_{k-2} \end{bmatrix}.$$

The next result is about the unique existence of $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$.

Lemma 2.2. Let $(X_k)_{k\geq -1}$ be an AR(2) process given in (1.1) such that $(Z_n)_{n\in\mathbb{N}}$ is a sequence of independent and identically distributed random variables with $Z_1 \overset{\mathcal{D}}{=} \mathcal{N}(0, \sigma^2)$, where $\sigma \in \mathbb{R}_{++}$, and $(X_0, X_{-1})^\top$ is a random vector with values in $\mathbb{R}^2$, independent of $(Z_n)_{n\in\mathbb{N}}$ and with $E_P(X_0^2) < \infty$, $E_P(X_{-1}^2) < \infty$. Then for each $n \in \mathbb{N}$ with $n \geq 3$, we have $P(\Omega_n) = 1$ for the event $\Omega_n$ given by

(2.2) $\qquad \Omega_n := \left\{ \det\left( \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}^\top \right) > 0 \right\},$

and hence a unique least squares estimator $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ of $(\vartheta_1, \vartheta_2)^\top$ based on the observations $X_{-1}, X_0, X_1, \ldots, X_n$ exists with probability 1, and this least squares estimator has the form

(2.3) $\qquad \begin{bmatrix} \widehat\vartheta_1^{(n)} \\ \widehat\vartheta_2^{(n)} \end{bmatrix} = \left( \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}^\top \right)^{-1} \sum_{k=1}^n X_k \begin{bmatrix} X_{k-1} \\ X_{k-2} \end{bmatrix}, \qquad n \in \mathbb{N},$

on the event $\Omega_n$.
By Lemma 2.2, for each $n \in \mathbb{N}$ with $n \geq 3$, on the event $\Omega_n$ having probability 1, we have
$$\begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2 \end{bmatrix} = \left(\sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top\right)^{-1} \sum_{k=1}^n X_k \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix} - \begin{bmatrix} \vartheta_1 \\ \vartheta_2\end{bmatrix}$$
$$= \left(\sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top\right)^{-1}\left( \sum_{k=1}^n X_k \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix} - \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top \begin{bmatrix} \vartheta_1 \\ \vartheta_2\end{bmatrix}\right)$$
$$= \left(\sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top\right)^{-1} \sum_{k=1}^n (X_k - \vartheta_1 X_{k-1} - \vartheta_2 X_{k-2}) \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix} = \left(\sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top\right)^{-1} \sum_{k=1}^n Z_k \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}.$$
Hence

(2.4) $\qquad \begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2 \end{bmatrix} = \langle M\rangle_n^{-1} M_n,$

where

(2.5) $\qquad M_n := \sigma^{-2} \sum_{k=1}^n Z_k \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}, \qquad n \in \mathbb{N},$

with $M_0 := \mathbf{0}$, is a square integrable martingale with respect to the filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}$, where $\mathcal{F}_n := \sigma(X_{-1}, X_0, X_1, \ldots, X_n) = \sigma(X_{-1}, X_0, Z_1, \ldots, Z_n)$, $n \in \mathbb{Z}_+$, and $(\langle M\rangle_n)_{n\in\mathbb{Z}_+}$ is its quadratic characteristic process given by

(2.6) $\qquad \langle M\rangle_n = \sigma^{-2} \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top, \qquad n \in \mathbb{N},$

with $\langle M\rangle_0 := \mathbf{0}$. Indeed, for each $n \in \mathbb{N}$, we have
$$\langle M\rangle_n := \sum_{k=1}^n E_P\big((M_k - M_{k-1})(M_k - M_{k-1})^\top \,\big|\, \mathcal{F}_{k-1}\big) = \sum_{k=1}^n E_P\left( \sigma^{-2} Z_k \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix} \left(\sigma^{-2} Z_k \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\right)^{\!\top} \,\Bigg|\, \mathcal{F}_{k-1}\right)$$
$$= \sigma^{-4} \sum_{k=1}^n E_P(Z_k^2) \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top = \sigma^{-2} \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top.$$
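The decomposition (2.4) is an exact algebraic identity on $\Omega_n$, and (since $\sigma$ cancels) $\langle M\rangle_n^{-1} M_n = G_n^{-1} \sum_{k=1}^n Z_k (X_{k-1}, X_{k-2})^\top$. A hedged numerical check, with our own illustrative parameters ($\vartheta_1 = 2.7$, $\vartheta_2 = -1.8$, $\sigma = 1$):

```python
# Numerical check of the decomposition (2.4): on Omega_n the LSE error
# equals <M>_n^{-1} M_n; the sigma factors cancel, so we compare
# G_n^{-1} H_n - theta  with  G_n^{-1} sum_k Z_k (X_{k-1}, X_{k-2})^T.
import random

random.seed(7)
th1, th2, sigma, n = 2.7, -1.8, 1.0, 30   # illustrative parameters
xs, zs = [0.5, 1.0], []                   # X_{-1}, X_0
for _ in range(n):
    z = random.gauss(0.0, sigma)
    zs.append(z)
    xs.append(th1*xs[-1] + th2*xs[-2] + z)

s11 = sum(xs[k]**2 for k in range(1, n+1))
s22 = sum(xs[k-1]**2 for k in range(1, n+1))
s12 = sum(xs[k]*xs[k-1] for k in range(1, n+1))
h1 = sum(xs[k+1]*xs[k] for k in range(1, n+1))
h2 = sum(xs[k+1]*xs[k-1] for k in range(1, n+1))
det = s11*s22 - s12**2

# left-hand side of (2.4): LSE minus the true parameter
lhs = ((s22*h1 - s12*h2)/det - th1, (s11*h2 - s12*h1)/det - th2)
# right-hand side: G_n^{-1} sum_k Z_k (X_{k-1}, X_{k-2})^T   (zs[k-1] = Z_k)
m1 = sum(zs[k-1]*xs[k] for k in range(1, n+1))
m2 = sum(zs[k-1]*xs[k-1] for k in range(1, n+1))
rhs = ((s22*m1 - s12*m2)/det, (s11*m2 - s12*m1)/det)
print(lhs, rhs)
```

The two sides agree up to floating-point rounding, as the identity predicts.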
As a consequence of (2.6), we have $\Omega_n = \{\det(\langle M\rangle_n) > 0\}$, where $\Omega_n$ is given in (2.2).

Using an appropriate random scaling, we prove mixing convergence of the LSE $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ given in (2.3) as $n \to \infty$ in supercritical cases with real characteristic roots having different absolute values. Considering a supercritical AR(2) process given in (1.1) such that $|\lambda_+| \neq |\lambda_-|$, let us introduce the notation
$$(\lambda_1, \lambda_2) := \begin{cases} (\lambda_+, \lambda_-) & \text{if } |\lambda_+| > |\lambda_-|, \\ (\lambda_-, \lambda_+) & \text{if } |\lambda_-| > |\lambda_+|, \end{cases}$$
where $\lambda_+$ and $\lambda_-$ are given in (1.4), i.e., $\lambda_1$ and $\lambda_2$ are the characteristic roots having the larger and the smaller absolute value, respectively.

Theorem 3.1. Let $(X_k)_{k\geq -1}$ be an AR(2) process given in (1.1) such that $(Z_n)_{n\in\mathbb{N}}$ is a sequence of independent and identically distributed random variables with $Z_1 \overset{\mathcal{D}}{=} \mathcal{N}(0,\sigma^2)$, where $\sigma \in \mathbb{R}_{++}$, and $(X_0, X_{-1})^\top$ is a random vector with values in $\mathbb{R}^2$, independent of $(Z_n)_{n\in\mathbb{N}}$ and with $E_P(X_0^2) < \infty$, $E_P(X_{-1}^2) < \infty$. Suppose that the autoregressive (characteristic) polynomial $x^2 - \vartheta_1 x - \vartheta_2$ of $(X_k)_{k\geq -1}$ has real roots $\lambda_1$ and $\lambda_2$ with $|\lambda_1| > |\lambda_2|$ and $|\lambda_1| > 1$. Then

(3.1) $\qquad \begin{bmatrix} \big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} & \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} \\ \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} & \big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} \end{bmatrix} \begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2 \end{bmatrix} \to \sigma N \begin{bmatrix} 1 \\ \operatorname{sign}(\lambda_1) \end{bmatrix}$ $\quad\mathcal{F}_\infty$-mixing under $P_{\{Y \neq 0\}}$ as $n \to \infty$,

where $\mathcal{F}_\infty := \sigma\big(\cup_{n=0}^\infty \mathcal{F}_n\big)$ with $\mathcal{F}_n = \sigma(X_{-1}, X_0, X_1, \ldots, X_n)$, $n \in \mathbb{Z}_+$, the random variable $N$ is $P$-independent of $\mathcal{F}_\infty$, $N \overset{\mathcal{D}}{=} \mathcal{N}(0,1)$, and

(3.2) $\qquad Y := \frac{\lambda_1}{\lambda_1 - \lambda_2}(X_0 - \lambda_2 X_{-1}) + \frac{\lambda_1}{\lambda_1 - \lambda_2} \sum_{j=1}^\infty \lambda_1^{-j} Z_j,$

where the series is absolutely convergent $P$-almost surely, and $P(Y \neq 0) = 1$.

Note that the random scaling matrix in (3.1) is well-defined with probability 1 for each $n \in \mathbb{N}$ with $n \geq 3$, since $X_1$ is absolutely continuous, yielding $P(\sum_{k=1}^n X_{k-1}^2 > 0) = 1$ and $P(\sum_{k=1}^n X_{k-2}^2 > 0) = 1$. Note also that the $P$-almost sure absolute convergence of the series $\sum_{j=1}^\infty \lambda_1^{-j} Z_j$ in the definition of $Y$ follows, e.g., from Lemma 8.1 in Häusler and Luschgy [3], since $E_P(\log^+(|Z_1|)) < \infty$ and $|\lambda_1| > 1$, $\lambda_1 \in \mathbb{R}$. Remark also that $Y$ is $\mathcal{F}_\infty$-measurable, since the series $\sum_{j=1}^\infty \lambda_1^{-j} Z_j$ converges $P$-a.s., $\sum_{j=1}^n \lambda_1^{-j} Z_j$ is $\mathcal{F}_\infty$-measurable for all $n \in \mathbb{N}$, and the underlying probability space $(\Omega, \mathcal{F}, P)$ is complete.

In the next remark we give another representation of the law of the limit random variable in (3.1) in Theorem 3.1.

Remark 3.2. By Step 6 of the proof of Theorem 3.1, the law of the limit random variable in (3.1) can be represented as the law of the $P$-almost surely convergent series $\sigma \sum_{j=0}^\infty \lambda_1^{-j} N_j$, where $(N_j)_{j\in\mathbb{Z}_+}$ is a sequence of independent and identically distributed $\mathbb{R}^2$-valued random vectors, $P$-independent of $\mathcal{F}_\infty$, such that
$$N_0 \overset{\mathcal{D}}{=} \mathcal{N}_2\left(\mathbf{0},\; (1 - \lambda_1^{-2})\begin{bmatrix} 1 & \operatorname{sign}(\lambda_1) \\ \operatorname{sign}(\lambda_1) & 1 \end{bmatrix}\right). \quad ✷$$

Next, we formulate two corollaries of the proof of Theorem 3.1 about the asymptotic behaviour of $M_n$ (given in (2.5)) and its quadratic characteristic process $\langle M\rangle_n$ (given in (2.6)) as $n \to \infty$, establishing stable convergence and $P$-almost sure convergence, respectively. These two results can be interesting in their own right as well.

Corollary 3.3. Under the conditions of Theorem 3.1, for the process $(M_n)_{n\in\mathbb{N}}$ defined in (2.5), we have
$$\lambda_1^{-n} M_n \to \eta \sum_{j=0}^\infty \lambda_1^{-j} N_j \overset{\mathcal{D}}{=} \eta N \begin{bmatrix} 1 \\ \operatorname{sign}(\lambda_1)\end{bmatrix} \qquad \mathcal{F}_\infty\text{-stably under } P_{\{Y\neq 0\}} \text{ as } n \to \infty,$$
where

• $\eta = \dfrac{|Y|}{\sigma\sqrt{\lambda_1^2 - 1}} \begin{bmatrix} 1 & 0 \\ 0 & |\lambda_1|^{-1} \end{bmatrix}$ with $Y$ given in (3.2);

• $(N_j)_{j\in\mathbb{Z}_+}$ is a sequence of independent and identically distributed $\mathbb{R}^2$-valued random vectors, $P$-independent of $\mathcal{F}_\infty$, such that $N_0 \overset{\mathcal{D}}{=} \mathcal{N}_2\left(\mathbf{0}, (1-\lambda_1^{-2})\begin{bmatrix} 1 & \operatorname{sign}(\lambda_1) \\ \operatorname{sign}(\lambda_1) & 1 \end{bmatrix}\right)$;

• the series $\sum_{j=0}^\infty \lambda_1^{-j} N_j$ converges $P$-almost surely;

• $N$ is $P$-independent of $\mathcal{F}_\infty$ with $N \overset{\mathcal{D}}{=} \mathcal{N}(0,1)$.

Consequently, $N$ and $\eta$ are $P$-independent, since $\eta$ is $\mathcal{F}_\infty$-measurable (see the end of Step 3 in the proof of Theorem 3.1).

We also describe the $P$-almost sure asymptotic behaviour of $\langle M\rangle_n$ (given in (2.6)) as $n \to \infty$.
Corollary 3.4. Under the conditions of Theorem 3.1, for the process $(\langle M\rangle_n)_{n\in\mathbb{N}}$ given in (2.6), we have

(3.3) $\qquad \lambda_1^{-2n} \langle M\rangle_n = \lambda_1^{-2n} \sigma^{-2} \begin{bmatrix} \sum_{k=1}^n X_{k-1}^2 & \sum_{k=1}^n X_{k-1}X_{k-2} \\ \sum_{k=1}^n X_{k-1}X_{k-2} & \sum_{k=1}^n X_{k-2}^2 \end{bmatrix} \overset{P\text{-a.s.}}{\longrightarrow} \frac{Y^2}{(\lambda_1^2 - 1)\sigma^2} \begin{bmatrix} 1 & \lambda_1^{-1} \\ \lambda_1^{-1} & \lambda_1^{-2} \end{bmatrix}$ as $n \to \infty$,

where $Y$ is given in (3.2). In particular, $\lambda_1^{-4n} \det(\langle M\rangle_n) \overset{P\text{-a.s.}}{\longrightarrow} 0$ as $n \to \infty$, since the limit matrix in (3.3) has determinant zero.

We give a detailed comparison of Theorem 3.1 (of the present paper) and part (c) of Theorem 3 in Monsour [8] specialized to the two-dimensional case. We will distinguish two supercritical cases, namely, without a unit root and with a unit root.

First of all, we note that we are not convinced that the proof of part (c) of Theorem 3 in Monsour [8] is complete/correct. Namely, with the notations of Monsour [8], we cannot see how the weak convergence $O_n^\top O_n \overset{\mathcal{D}(P)}{\longrightarrow} I_p$ as $n \to \infty$ yields that $O_n$ converges in distribution as $n \to \infty$, where $O_n := \big(V G \sum_{j=1}^n \widetilde X_{j-1} \widetilde X_{j-1}^\top G^\top V^\top\big)^{-1/2} V G R^{-n} H^{1/2}$, $n \in \mathbb{N}$, and we cannot see why $\big(O_n, H^{-1/2} R^n \sum_{j=1}^n \widetilde X_{j-1} \varepsilon_j\big)$ converges in distribution as $n \to \infty$ (which is implicitly used in the proof in question).

Let us suppose that the conditions of Theorem 3.1 hold together with $|\lambda_2| \neq 1$, i.e., $\lambda_1, \lambda_2 \in \mathbb{R}$ and either $|\lambda_1| > |\lambda_2| > 1$ (purely explosive case) or $|\lambda_1| > 1 > |\lambda_2|$ (partially explosive case). For simplicity, let us suppose that $\sigma = 1$. Provided that the proof of part (c) of Theorem 3 in Monsour [8] is complete, in our case it should yield that

(4.1) $\qquad \begin{bmatrix} \sum_{k=1}^n X_{k-1}^2 & \sum_{k=1}^n X_{k-1}X_{k-2} \\ \sum_{k=1}^n X_{k-1}X_{k-2} & \sum_{k=1}^n X_{k-2}^2 \end{bmatrix}^{1/2} \begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2 \end{bmatrix} \overset{\mathcal{D}(P)}{\longrightarrow} \mathcal{N}_2(\mathbf{0}, I_2)$ as $n \to \infty$.

We note that in part (c) of Theorem 3 in Monsour [8], the limit law is represented in a more complicated form, but from its proof it turns out that in the special case of no characteristic roots on the unit circle (as in our considered case) the limit law can be represented in the form $O(\zeta_1, \zeta_2)^\top \overset{\mathcal{D}}{=} \mathcal{N}_2(\mathbf{0}, I_2)$, where $O$ is some random $2\times 2$ orthogonal matrix, independent of $(\zeta_1, \zeta_2)^\top$ having a 2-dimensional standard normal distribution (see page 305 in Monsour [8]).
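The ray concentration asserted in Theorem 3.1 can be observed numerically. Since the scaling matrix in (3.1) equals $\operatorname{diag}\big(\sum X_{k-1}^2, \sum X_{k-2}^2\big)^{-1/2} G_n$ and, by (2.4) with $\sigma=1$, the LSE error is $G_n^{-1}\sum_k Z_k(X_{k-1},X_{k-2})^\top$, the scaled statistic collapses algebraically to $\big(\sum_k Z_k X_{k-1}/(\sum_k X_{k-1}^2)^{1/2},\; \sum_k Z_k X_{k-2}/(\sum_k X_{k-2}^2)^{1/2}\big)$. A sketch with our own illustrative purely explosive parameters ($\lambda_1 = 1.5 > 0$, $\lambda_2 = 1.2$):

```python
# Monte-Carlo sketch of the random scaling in Theorem 3.1: for a purely
# explosive path with lambda_1 = 1.5 > 0 the two components of the scaled
# LSE error become nearly equal, i.e. the statistic concentrates on the
# ray spanned by (1, sign(lambda_1))^T = (1, 1)^T.
import random

def scaled_error(seed, n=60, th1=2.7, th2=-1.8):
    random.seed(seed)
    xs, zs = [0.5, 1.0], []               # X_{-1}, X_0
    for _ in range(n):
        z = random.gauss(0.0, 1.0)
        zs.append(z)
        xs.append(th1*xs[-1] + th2*xs[-2] + z)
    s11 = sum(xs[k]**2 for k in range(1, n+1))
    s22 = sum(xs[k-1]**2 for k in range(1, n+1))
    m1 = sum(zs[k-1]*xs[k] for k in range(1, n+1))     # sum Z_k X_{k-1}
    m2 = sum(zs[k-1]*xs[k-1] for k in range(1, n+1))   # sum Z_k X_{k-2}
    # the scaled LSE error collapses to these two self-normalized sums
    return m1 / s11**0.5, m2 / s22**0.5

v1, v2 = scaled_error(seed=3)
print(v1, v2)
```

The two components agree up to a geometrically small remainder, reflecting that the two-dimensional normal limit in (3.1) is concentrated on a one-dimensional ray.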
In the purely explosive case, the weak convergence in (4.1) was established in several other papers, see, e.g., Jeganathan [4, Theorem 14], Mikulski and Monsour [6, Theorem 1] or Monsour and Mikulski [7, Theorem on page 146].

Since mixing convergence yields convergence in distribution, as a consequence of Theorem 3.1, we have

(4.2) $\qquad \begin{bmatrix} \big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} & \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} \\ \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} & \big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} \end{bmatrix}\begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2\end{bmatrix} \overset{\mathcal{D}(P)}{\longrightarrow} N \begin{bmatrix} 1 \\ \operatorname{sign}(\lambda_1)\end{bmatrix}$ as $n \to \infty$,

where $N$ is a standard normally distributed random variable.

Next we check that (4.1) yields (4.2). Using (2.4) and (2.6), the weak convergence in (4.1) can be rewritten in the form
$$\langle M\rangle_n^{1/2} \langle M\rangle_n^{-1} M_n = \langle M\rangle_n^{-1/2} M_n \overset{\mathcal{D}(P)}{\longrightarrow} \mathcal{N}_2(\mathbf{0}, I_2) \quad\text{as } n \to \infty.$$
By the decomposition in Step 1 of the proof of Theorem 3.1, the weak convergence in (4.2) can be rewritten in the form
$$A_n M_n \overset{\mathcal{D}(P)}{\longrightarrow} N \begin{bmatrix} 1 \\ \operatorname{sign}(\lambda_1)\end{bmatrix} \quad\text{as } n \to \infty,$$
where $A_n$, $n \in \mathbb{N}$, is given in (5.2) with $\sigma = 1$. Since on the event $\Omega_n$ (given in (2.2)) we have $A_n M_n = A_n \langle M\rangle_n^{1/2}(\langle M\rangle_n^{-1/2} M_n)$, to check that (4.1) yields (4.2), by Slutsky's lemma, it is enough to verify that

(4.3) $\qquad A_n \langle M\rangle_n^{1/2} \overset{P\text{-a.s.}}{\longrightarrow} \frac{1}{\sqrt{1+\lambda_1^{-2}}}\begin{bmatrix} 1 & \lambda_1^{-1} \\ \operatorname{sign}(\lambda_1) & |\lambda_1|^{-1}\end{bmatrix}$ as $n \to \infty$.

Indeed,
$$\frac{1}{\sqrt{1+\lambda_1^{-2}}}\begin{bmatrix} 1 & \lambda_1^{-1} \\ \operatorname{sign}(\lambda_1) & |\lambda_1|^{-1}\end{bmatrix} \frac{1}{\sqrt{1+\lambda_1^{-2}}}\begin{bmatrix} 1 & \lambda_1^{-1} \\ \operatorname{sign}(\lambda_1) & |\lambda_1|^{-1}\end{bmatrix}^\top = \frac{1}{1+\lambda_1^{-2}}\begin{bmatrix} 1 + \lambda_1^{-2} & \operatorname{sign}(\lambda_1) + (\lambda_1|\lambda_1|)^{-1} \\ \operatorname{sign}(\lambda_1) + (\lambda_1|\lambda_1|)^{-1} & 1 + \lambda_1^{-2}\end{bmatrix} = \begin{bmatrix} 1 & \operatorname{sign}(\lambda_1) \\ \operatorname{sign}(\lambda_1) & 1\end{bmatrix},$$
which coincides with the covariance matrix of $N[1, \operatorname{sign}(\lambda_1)]^\top$, as desired. Next, we prove (4.3). Recall that if $V = (v_{i,j})_{i,j=1,2} \in \mathbb{R}^{2\times 2}$ is a symmetric and positive definite matrix, then
$$V^{1/2} = \frac{1}{\sqrt{v_{1,1} + v_{2,2} + 2\sqrt{\det(V)}}} \left( V + \sqrt{\det(V)}\, I_2 \right),$$
and hence on the event $\Omega_n$,
$$\langle M\rangle_n^{1/2} = \frac{1}{\sqrt{\sum_{k=1}^n X_{k-1}^2 + \sum_{k=1}^n X_{k-2}^2 + 2\sqrt{\det(\langle M\rangle_n)}}} \left( \langle M\rangle_n + \sqrt{\det(\langle M\rangle_n)}\, I_2 \right).$$
Consequently,
$$A_n \langle M\rangle_n^{1/2} = \frac{1}{\sqrt{\sum_{k=1}^n X_{k-1}^2 + \sum_{k=1}^n X_{k-2}^2 + 2\sqrt{\det(\langle M\rangle_n)}}} \left( \begin{bmatrix} \big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} & \frac{\sum_{k=1}^n X_{k-1}X_{k-2}}{(\sum_{k=1}^n X_{k-1}^2)^{1/2}} \\ \frac{\sum_{k=1}^n X_{k-1}X_{k-2}}{(\sum_{k=1}^n X_{k-2}^2)^{1/2}} & \big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} \end{bmatrix} + \sqrt{\det(\langle M\rangle_n)}\, A_n \right)$$
$$= \frac{1}{\sqrt{\lambda_1^{-2n}\sum_{k=1}^n X_{k-1}^2 + \lambda_1^{-2n}\sum_{k=1}^n X_{k-2}^2 + 2\sqrt{\lambda_1^{-4n}\det(\langle M\rangle_n)}}} \left( \begin{bmatrix} \big(\lambda_1^{-2n}\sum_{k=1}^n X_{k-1}^2\big)^{1/2} & \frac{\lambda_1^{-2n}\sum_{k=1}^n X_{k-1}X_{k-2}}{(\lambda_1^{-2n}\sum_{k=1}^n X_{k-1}^2)^{1/2}} \\ \frac{\lambda_1^{-2n}\sum_{k=1}^n X_{k-1}X_{k-2}}{(\lambda_1^{-2n}\sum_{k=1}^n X_{k-2}^2)^{1/2}} & \big(\lambda_1^{-2n}\sum_{k=1}^n X_{k-2}^2\big)^{1/2} \end{bmatrix} + \sqrt{\lambda_1^{-4n}\det(\langle M\rangle_n)}\; \lambda_1^{n} A_n \right)$$
$$\overset{P\text{-a.s.}}{\longrightarrow} \frac{1}{\sqrt{\frac{Y^2}{\lambda_1^2-1} + \frac{Y^2}{\lambda_1^2(\lambda_1^2-1)}}} \begin{bmatrix} \frac{|Y|}{\sqrt{\lambda_1^2-1}} & \frac{Y^2/(\lambda_1(\lambda_1^2-1))}{|Y|/\sqrt{\lambda_1^2-1}} \\ \frac{Y^2/(\lambda_1(\lambda_1^2-1))}{|Y|/(|\lambda_1|\sqrt{\lambda_1^2-1})} & \frac{|Y|}{|\lambda_1|\sqrt{\lambda_1^2-1}} \end{bmatrix} = \frac{1}{\sqrt{1+\lambda_1^{-2}}} \begin{bmatrix} 1 & \lambda_1^{-1} \\ \operatorname{sign}(\lambda_1) & |\lambda_1|^{-1} \end{bmatrix} \quad\text{as } n \to \infty,$$
using (5.7), (5.8) and Corollary 3.4 (with $\sigma = 1$), and that $\lambda_1^{-4n}\det(\langle M\rangle_n) \to 0$ $P$-a.s., yielding (4.3), as desired.

Now let us suppose that the conditions of Theorem 3.1 hold together with $|\lambda_2| = 1$, i.e., $\lambda_1 \in \mathbb{R}$, $|\lambda_1| > 1$, and either $\lambda_2 = 1$ or $\lambda_2 = -1$. For simplicity, let us suppose again that $\sigma = 1$. Provided that the proof of part (c) of Theorem 3 in Monsour [8] is complete, in our case it should yield that

(4.4) $\qquad \begin{bmatrix} \sum_{k=1}^n X_{k-1}^2 & \sum_{k=1}^n X_{k-1}X_{k-2} \\ \sum_{k=1}^n X_{k-1}X_{k-2} & \sum_{k=1}^n X_{k-2}^2 \end{bmatrix}^{1/2} \begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2 \end{bmatrix} \overset{\mathcal{D}(P)}{\longrightarrow} O^{(\pm)} \begin{bmatrix} \zeta^{(\pm)} \\ \pm \int_0^1 W_u^{(\pm)}\,\mathrm{d}W_u^{(\pm)} \Big/ \big(\int_0^1 (W_u^{(\pm)})^2\,\mathrm{d}u\big)^{1/2} \end{bmatrix}$

as $n \to \infty$, where the $\pm$ sign is according to $\lambda_2 = \pm 1$, $O^{(\pm)}$ is some random $2\times 2$ orthogonal matrix, $\zeta^{(\pm)}$ is a one-dimensional standard normally distributed random variable, and $(W_u^{(\pm)})_{u\in[0,1]}$ is a standard Wiener process such that $\zeta^{(\pm)}$ and $\int_0^1 W_u^{(\pm)}\,\mathrm{d}W_u^{(\pm)}$ are uncorrelated. Here $O^{(\pm)}$ and $\big(\zeta^{(\pm)}, \int_0^1 W_u^{(\pm)}\,\mathrm{d}W_u^{(\pm)}/(\int_0^1 (W_u^{(\pm)})^2\,\mathrm{d}u)^{1/2}\big)$ are not necessarily independent. Using (4.3) and Slutsky's lemma, similarly as before, it should yield that
$$\begin{bmatrix} \big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} & \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-1}^2\big)^{1/2} \\ \sum_{k=1}^n X_{k-1}X_{k-2}\big/\big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} & \big(\sum_{k=1}^n X_{k-2}^2\big)^{1/2} \end{bmatrix}\begin{bmatrix} \widehat\vartheta_1^{(n)} - \vartheta_1 \\ \widehat\vartheta_2^{(n)} - \vartheta_2\end{bmatrix} = A_n M_n = A_n \langle M\rangle_n^{1/2}\,\langle M\rangle_n^{-1/2} M_n$$
$$\overset{\mathcal{D}(P)}{\longrightarrow} \frac{1}{\sqrt{1+\lambda_1^{-2}}}\begin{bmatrix} 1 & \lambda_1^{-1} \\ \operatorname{sign}(\lambda_1) & |\lambda_1|^{-1}\end{bmatrix} O^{(\pm)} \begin{bmatrix} \zeta^{(\pm)} \\ \pm \int_0^1 W_u^{(\pm)}\,\mathrm{d}W_u^{(\pm)} \big/ \big(\int_0^1 (W_u^{(\pm)})^2\,\mathrm{d}u\big)^{1/2}\end{bmatrix} \quad\text{as } n \to \infty.$$
Taking into account (3.1) and that mixing convergence yields convergence in distribution, under the conditions of Theorem 3.1 together with $|\lambda_2| = 1$, it should hold that
$$\frac{1}{\sqrt{1+\lambda_1^{-2}}}\begin{bmatrix} 1 & \lambda_1^{-1} \\ \operatorname{sign}(\lambda_1) & |\lambda_1|^{-1}\end{bmatrix} O^{(\pm)} \begin{bmatrix} \zeta^{(\pm)} \\ \pm \int_0^1 W_u^{(\pm)}\,\mathrm{d}W_u^{(\pm)} \big/ \big(\int_0^1 (W_u^{(\pm)})^2\,\mathrm{d}u\big)^{1/2}\end{bmatrix} \overset{\mathcal{D}}{=} N \begin{bmatrix} 1 \\ \operatorname{sign}(\lambda_1)\end{bmatrix}.$$
We were not able to check whether the previous equality in distribution holds or not (mainly due to the lack of an explicit form for the random matrix $O^{(\pm)}$), and in fact, we are not sure that it is true, since, as we detailed earlier, we are not convinced that the proof of part (c) of Theorem 3 in Monsour [8] is complete/correct.

In any case, we emphasize that both in (4.1) and in (4.2) the type of convergence is convergence in distribution, while in (3.1) we proved mixing convergence, which is stronger than convergence in distribution, so in general (4.1) (or (4.2)) would not yield (3.1) without additional work. Our proof technique is completely different from that of Monsour [8]. Finally, we note that we tried to prove mixing (or stable) convergence of $(\widehat\vartheta_1^{(n)}, \widehat\vartheta_2^{(n)})^\top$ as $n \to \infty$ in the supercritical case using a non-random normalization, but our attempts have not been successful so far. This is mainly due to the fact that the almost sure limit of $\lambda_1^{-2n}\langle M\rangle_n$ as $n \to \infty$ in Corollary 3.4 is an $\mathbb{R}^{2\times 2}$-valued random matrix having determinant zero $P$-almost surely.

Proof of Lemma 2.1.
For each $n \in \mathbb{N}$, $(\vartheta_1, \vartheta_2)^\top \in \mathbb{R}^2$ and $(x_{-1}, x_0, x_1, \ldots, x_n)^\top \in \mathbb{R}^{n+2}$, we have
$$Q_n(x_{-1}, x_0, \ldots, x_n; \vartheta_1, \vartheta_2) = \vartheta_1^2 \sum_{k=1}^n x_{k-1}^2 + 2\vartheta_1\vartheta_2 \sum_{k=1}^n x_{k-1}x_{k-2} + \vartheta_2^2 \sum_{k=1}^n x_{k-2}^2 - 2\vartheta_1 \sum_{k=1}^n x_k x_{k-1} - 2\vartheta_2 \sum_{k=1}^n x_k x_{k-2} + \sum_{k=1}^n x_k^2,$$
hence the function $\mathbb{R}^2 \ni (\vartheta_1, \vartheta_2)^\top \mapsto Q_n(x_{-1}, x_0, \ldots, x_n; \vartheta_1, \vartheta_2)$ is strictly convex if
$$\det \begin{bmatrix} \sum_{k=1}^n x_{k-1}^2 & \sum_{k=1}^n x_{k-1}x_{k-2} \\ \sum_{k=1}^n x_{k-1}x_{k-2} & \sum_{k=1}^n x_{k-2}^2 \end{bmatrix} = \det(G_n(x_{-1}, x_0, \ldots, x_n)) > 0.$$
Indeed, if $\det(G_n(x_{-1}, x_0, \ldots, x_n)) > 0$, then $\sum_{k=1}^n x_{k-1}^2 > 0$ and hence
$$G_n(x_{-1}, x_0, \ldots, x_n) = \begin{bmatrix} \sum_{k=1}^n x_{k-1}^2 & \sum_{k=1}^n x_{k-1}x_{k-2} \\ \sum_{k=1}^n x_{k-1}x_{k-2} & \sum_{k=1}^n x_{k-2}^2 \end{bmatrix}$$
is positive definite, yielding that the function $\mathbb{R}^2 \ni (\vartheta_1, \vartheta_2)^\top \mapsto Q_n(x_{-1}, x_0, \ldots, x_n; \vartheta_1, \vartheta_2)$ is strictly convex. Hence, for each $n \in \mathbb{N}$ and for any least squares estimator $F_n$ of $(\vartheta_1, \vartheta_2)^\top$, if $(x_{-1}, x_0, \ldots, x_n) \in D_n$, then $F_n(x_{-1}, x_0, \ldots, x_n)$ is the unique solution of the linear system of equations
$$\begin{cases} \vartheta_1 \sum_{k=1}^n x_{k-1}^2 + \vartheta_2 \sum_{k=1}^n x_{k-1}x_{k-2} = \sum_{k=1}^n x_k x_{k-1}, \\ \vartheta_1 \sum_{k=1}^n x_{k-1}x_{k-2} + \vartheta_2 \sum_{k=1}^n x_{k-2}^2 = \sum_{k=1}^n x_k x_{k-2}. \end{cases}$$
Consequently, on the set $D_n$, any least squares estimator $F_n$ of $(\vartheta_1, \vartheta_2)^\top$ has the form given in (2.1). ✷

Proof of Lemma 2.2.
It is enough to check that for each $n \in \mathbb{N}$ with $n \geq 3$, we have
$$P(\Omega_n) = P\left( \det\left( \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top \right) > 0 \right) = 1,$$
since then, by Lemma 2.1,
$$F_n(X_{-1}, X_0, X_1, \ldots, X_n) = \begin{bmatrix} \widehat\vartheta_1^{(n)} \\ \widehat\vartheta_2^{(n)}\end{bmatrix} \quad\text{on the event } \Omega_n,$$
where $F_n : \mathbb{R}^{n+2} \to \mathbb{R}^2$ is a measurable function satisfying (2.1) on the set $D_n$. Here for each $n \in \mathbb{N}$,
$$\det\left( \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top \right) = \det \begin{bmatrix} \sum_{k=1}^n X_{k-1}^2 & \sum_{k=1}^n X_{k-1}X_{k-2} \\ \sum_{k=1}^n X_{k-1}X_{k-2} & \sum_{k=1}^n X_{k-2}^2\end{bmatrix} = \sum_{k=1}^n X_{k-1}^2 \sum_{k=1}^n X_{k-2}^2 - \left( \sum_{k=1}^n X_{k-1}X_{k-2} \right)^2.$$
By the Cauchy-Schwarz inequality, we have
$$\sum_{k=1}^n X_{k-1}^2 \sum_{k=1}^n X_{k-2}^2 - \left( \sum_{k=1}^n X_{k-1}X_{k-2} \right)^2 \geq 0, \qquad n \in \mathbb{N},$$
and equality holds if and only if the (random) vectors $[X_0\; X_1\; \ldots\; X_{n-1}]^\top$ and $[X_{-1}\; X_0\; \ldots\; X_{n-2}]^\top$ are linearly dependent, i.e., there exist $K, L \in \mathbb{R}$ such that $K^2 + L^2 > 0$ and

(5.1) $\qquad K [X_0\; X_1\; \ldots\; X_{n-1}]^\top + L [X_{-1}\; X_0\; \ldots\; X_{n-2}]^\top = [0\; 0\; \ldots\; 0]^\top.$

Here $K$ and $L$ may depend on $\omega \in \Omega$ as well. In what follows, let us suppose that $n \in \mathbb{N}$ and $n \geq 3$. If (5.1) holds with $K = 0$ (yielding $L \neq 0$) or with $L = 0$ (yielding $K \neq 0$), then $X_1 = 0$, which can occur only with probability zero, since $X_1$ is absolutely continuous (following from the facts that $X_1 = \vartheta_1 X_0 + \vartheta_2 X_{-1} + Z_1$ and $Z_1$ is absolutely continuous). So if (5.1) holds, then $P(\{K = 0\} \cup \{L = 0\}) = 0$. Further, if (5.1) holds with $X_{-1} = 0$, then $X_1 = 0$, since if $L = 0$, then $K \neq 0$ and $X_0 = X_1 = \cdots = X_{n-1} = 0$; if $L \neq 0$ and $K = 0$, then $X_{-1} = X_0 = \cdots = X_{n-2} = 0$, yielding $X_1 = 0$; and if $L \neq 0$ and $K \neq 0$, then $X_0 = 0$, yielding $X_1 = 0$. Similarly, if (5.1) holds with $X_0 = 0$, then $X_1 = 0$. So, using again the absolute continuity of $X_1$, if (5.1) holds, then $P(\{X_{-1} = 0\} \cup \{X_0 = 0\}) = 0$. Consequently,
$$P(\Omega_n^c) = P\left( \det\left( \sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}\begin{bmatrix} X_{k-1} \\ X_{k-2}\end{bmatrix}^\top \right) = 0 \right) \leq P\left( \begin{bmatrix} X_0 \\ X_1 \\ \vdots \\ X_{n-1}\end{bmatrix} = \left(-\frac{L}{K}\right) \begin{bmatrix} X_{-1} \\ X_0 \\ \vdots \\ X_{n-2}\end{bmatrix},\; K \neq 0,\; L \neq 0,\; X_{-1} \neq 0,\; X_0 \neq 0 \right)$$
$$= P\left( X_{k-1} = \left(-\frac{L}{K}\right)^{k} X_{-1},\; k = 1, \ldots, n;\; K \neq 0,\; L \neq 0,\; X_{-1} \neq 0,\; X_0 \neq 0 \right)$$
$$= P\left( X_{k-1} = \left(-\frac{L}{K}\right)^{k} X_{-1},\; k = 1, \ldots, n;\; \frac{X_0}{X_{-1}} = -\frac{L}{K},\; K \neq 0,\; L \neq 0,\; X_{-1} \neq 0,\; X_0 \neq 0 \right)$$
$$\leq P\left( X_{k-1} = \left(\frac{X_0}{X_{-1}}\right)^{k} X_{-1},\; k = 1, \ldots, n,\; X_{-1} \neq 0,\; X_0 \neq 0 \right) \leq P\left( X_1 = \left(\frac{X_0}{X_{-1}}\right)^{2} X_{-1},\; X_{-1} \neq 0,\; X_0 \neq 0 \right)$$
$$= P\left( \vartheta_1 X_0 + \vartheta_2 X_{-1} + Z_1 = \frac{X_0^2}{X_{-1}},\; X_{-1} \neq 0,\; X_0 \neq 0 \right) = P\left( Z_1 = \frac{X_0^2}{X_{-1}} - \vartheta_1 X_0 - \vartheta_2 X_{-1},\; X_{-1} \neq 0,\; X_0 \neq 0 \right)$$
$$= \int_{\{(x_{-1},x_0)\,:\, x_{-1}\neq 0,\; x_0 \neq 0\}} P\left( Z_1 = \frac{x_0^2}{x_{-1}} - \vartheta_1 x_0 - \vartheta_2 x_{-1} \right) F_{X_{-1}, X_0}(\mathrm{d}x_{-1}, \mathrm{d}x_0) = 0,$$
since $Z_1$ and $(X_{-1}, X_0)$ are $P$-independent and $Z_1$ is absolutely continuous, where $F_{X_{-1}, X_0}$ denotes the distribution function of $(X_{-1}, X_0)$. ✷

Proof of Theorem 3.1.
We divide the proof into six steps.
Step 1 (a decomposition of the left-hand side of (3.1) ): For each n ∈ N with n >
3, by142.4) and (2.6), we have (cid:0)P nk =1 X k − (cid:1) / P nk =1 X k − X k − ( P nk =1 X k − ) / P nk =1 X k − X k − ( P nk =1 X k − ) / (cid:0)P nk =1 X k − (cid:1) / " b ϑ ( n )1 − ϑ b ϑ ( n )2 − ϑ = "(cid:0)P nk =1 X k − (cid:1) − / (cid:0)P nk =1 X k − (cid:1) − / nk =1 X k − P nk =1 X k − X k − P nk =1 X k − X k − P nk =1 X k − ϑ ( n )1 − ϑ b ϑ ( n )2 − ϑ = ( σ − A n )( σ h M i n )( h M i − n M n ) = σ A n M n , where A n := σ "(cid:0)P nk =1 X k − (cid:1) − / (cid:0)P nk =1 X k − (cid:1) − / , n ∈ N , (5.2)on an event having probability one. Indeed, by Lemma 2.2, for each n ∈ N with n > b ϑ ( n )1 , b ϑ ( n )2 ) ⊤ exists uniquely on the event Ω n (given in (2.2))having probability one, and, since X is absolutely continuous (following from the facts that X = ϑ X + ϑ X − + Z and Z is absolutely continuous), we have P ( P nk =1 X k − >
0) = P ( P nk =1 X k − >
0) = 1 for each n ∈ N with n >
3, yielding that A n is well-defined foreach n ∈ N with n > P -almost surely. Hence, by part (c) of Theorem 3.18 in H¨ausler andLuschgy [3], in order to prove (3.1), it is enough to show(5.3) A n M n → N " λ ) F ∞ -mixing under P { Y =0 } as n → ∞ .We are going to apply Theorem B.1 with d = 2, ( U n ) n ∈ Z + = ( M n ) n ∈ Z + , ( B n ) n ∈ N = ( A n ) n ∈ N ,( Q n ) n ∈ N = ( λ − n I ) n ∈ N , ( F n ) n ∈ Z + = ( σ ( X − , X , X , . . . , X n )) n ∈ Z + , and G = Ω. Note that A n is invertible for each n ∈ N with n > P -almost surely. Step 2 (asymptotic behaviour of A n given in (5.2) as n → ∞ ): In order to check theconditions (i)–(iv) of Theorem B.1 with the choices given at the end of Step 1, we need theasymptotic behavior of A n as n → ∞ . The vectors " λ + and " λ − are right eigenvectors of the matrix ϑ corresponding to the eigenvalues λ + and λ − , respec-tively. Due to our assumption | λ | > | λ | , we have λ + = λ − , hence the matrix ϑ can bewritten in a Jordan canonical form ϑ = " λ + λ − λ + λ − λ + λ − − . n ∈ Z + , we have(5.4) ϑ n = " λ + λ − λ + λ − n " λ + λ − − = 1 λ + − λ − " λ + λ − λ n + λ n − − λ − − λ + = λ n + λ + − λ − " λ + − λ − λ + − λ − + λ n − λ + − λ − " − λ − λ + λ − − λ + = λ n λ − λ " λ − λ λ − λ + λ n λ − λ " − λ λ λ − λ . For each n ∈ N , by (1.3) with k = n , we obtain(5.5) X n = " ⊤ " X n X n − = " ⊤ ϑ n " X X − + n X j =1 " ⊤ ϑ n − j " Z j . For each n ∈ Z + , by (5.4), we have " ⊤ ϑ n = λ n +11 λ − λ " − λ ⊤ + λ n +12 λ − λ " − λ ⊤ . Thus, by (5.5), for each n ∈ N , we get X n = λ n +11 λ − λ ( X − λ X − ) + λ n +12 λ − λ ( − X + λ X − )+ λ λ − λ n X j =1 λ n − j Z j + λ λ − λ n X j =1 λ n − j ( − Z j ) . Hence we obtain λ − n X n = λ λ − λ ( X − λ X − ) + λ λ − λ ( − X + λ X − ) (cid:18) λ λ (cid:19) n + λ λ − λ n X j =1 λ − j Z j − λ λ − λ (cid:18) λ λ (cid:19) n n X j =1 λ − j Z j P -a.s. 
$\xrightarrow{P\text{-a.s.}} \frac{\lambda_1}{\lambda_1-\lambda_2}(X_0-\lambda_2 X_{-1}) + \frac{\lambda_1}{\lambda_1-\lambda_2}\sum_{j=1}^{\infty}\lambda_1^{-j}Z_j = Y$ as $n\to\infty$, (5.6)

where the $P$-almost sure absolute convergence of the series $\sum_{j=1}^{\infty}\lambda_1^{-j}Z_j$ follows by Lemma 8.1 in Häusler and Luschgy [3]. Indeed, since $\mathbb{E}_P(\log^+(|Z_1|))<\infty$, $\lambda_1\in\mathbb{R}$ and $|\lambda_1|>1$, Lemma 8.1 in Häusler and Luschgy [3] yields the $P$-almost sure absolute convergence of $\sum_{j=1}^{\infty}\lambda_1^{-j}Z_j$, and we check that $\big(\frac{\lambda_2}{\lambda_1}\big)^n\sum_{j=1}^{n}\lambda_2^{-j}Z_j \xrightarrow{P\text{-a.s.}} 0$ as $n\to\infty$. For this it is enough to verify that
$$\sum_{n=1}^{\infty} P\left(\left|\Big(\frac{\lambda_2}{\lambda_1}\Big)^n\sum_{j=1}^{n}\lambda_2^{-j}Z_j\right|>\varepsilon\right)<\infty \qquad\text{for each } \varepsilon\in\mathbb{R}_{++}.$$
Using that $Z_j$, $j\in\mathbb{N}$, are independent and identically distributed random variables having zero mean and variance $\sigma^2$, for each $\varepsilon\in\mathbb{R}_{++}$, we have
$$\sum_{n=1}^{\infty} P\left(\left|\Big(\frac{\lambda_2}{\lambda_1}\Big)^n\sum_{j=1}^{n}\lambda_2^{-j}Z_j\right|>\varepsilon\right) \le \frac{1}{\varepsilon^2}\sum_{n=1}^{\infty}\mathbb{E}_P\left(\left|\Big(\frac{\lambda_2}{\lambda_1}\Big)^n\sum_{j=1}^{n}\lambda_2^{-j}Z_j\right|^2\right) = \frac{\sigma^2}{\varepsilon^2}\sum_{n=1}^{\infty}\Big(\frac{\lambda_2}{\lambda_1}\Big)^{2n}\sum_{j=1}^{n}\lambda_2^{-2j} < \infty,$$
since
$$\sum_{j=1}^{n}\lambda_2^{-2j} = \begin{cases} n & \text{if } |\lambda_2|=1, \\[1ex] \dfrac{\lambda_2^{-2}-\lambda_2^{-2(n+1)}}{1-\lambda_2^{-2}} & \text{if } |\lambda_2|\ne 1, \end{cases}$$
and if $|\lambda_2|=1$, then
$$\limsup_{n\to\infty}\left|\frac{(\lambda_2/\lambda_1)^{2(n+1)}(n+1)}{(\lambda_2/\lambda_1)^{2n}\,n}\right| = \left|\frac{\lambda_2}{\lambda_1}\right|^2 = |\lambda_1|^{-2} < 1,$$
and if $|\lambda_2|\ne 1$, then
$$\limsup_{n\to\infty}\left|\frac{(\lambda_2/\lambda_1)^{2(n+1)}\,\dfrac{\lambda_2^{-2}-\lambda_2^{-2(n+2)}}{1-\lambda_2^{-2}}}{(\lambda_2/\lambda_1)^{2n}\,\dfrac{\lambda_2^{-2}-\lambda_2^{-2(n+1)}}{1-\lambda_2^{-2}}}\right| = \left|\frac{\lambda_2}{\lambda_1}\right|^2\limsup_{n\to\infty}\left|\frac{\lambda_2^{-2}-\lambda_2^{-2(n+2)}}{\lambda_2^{-2}-\lambda_2^{-2(n+1)}}\right| = \begin{cases} \big|\frac{\lambda_2}{\lambda_1}\big|^2 < 1 & \text{if } |\lambda_2|>1, \\[1ex] \big|\frac{\lambda_2}{\lambda_1}\big|^2|\lambda_2|^{-2} = |\lambda_1|^{-2} < 1 & \text{if } |\lambda_2|<1, \end{cases}$$
hence, in each case, the ratio test yields that $\sum_{n=1}^{\infty}\big(\frac{\lambda_2}{\lambda_1}\big)^{2n}\sum_{j=1}^{n}\lambda_2^{-2j}$ is convergent.

We note that if, in addition, $(X_k)_{k\ge-1}$ is purely explosive as well (i.e., $\lambda_1,\lambda_2\in\mathbb{R}$ and $|\lambda_1|>|\lambda_2|>1$), then $\big(\frac{\lambda_2}{\lambda_1}\big)^n\sum_{j=1}^{n}\lambda_2^{-j}Z_j \xrightarrow{P\text{-a.s.}} 0$ as $n\to\infty$ follows more easily, since, by Lemma 8.1 in Häusler and Luschgy [3], $\sum_{j=1}^{\infty}\lambda_2^{-j}Z_j$ is absolutely convergent $P$-a.s., and, since $|\lambda_2|<|\lambda_1|$, we have $\big(\frac{\lambda_2}{\lambda_1}\big)^n\sum_{j=1}^{n}\lambda_2^{-j}Z_j \xrightarrow{P\text{-a.s.}} 0\cdot\sum_{j=1}^{\infty}\lambda_2^{-j}Z_j = 0$ as $n\to\infty$.

The random variable $Y$ given in (3.2) is absolutely continuous. Indeed,
$$Y = \frac{\lambda_1}{\lambda_1-\lambda_2}\big(X_0-\lambda_2X_{-1}+\lambda_1^{-1}Z_1\big) + \frac{\lambda_1}{\lambda_1-\lambda_2}\sum_{j=2}^{\infty}\lambda_1^{-j}Z_j,$$
where the absolute continuity of $Z_1$ and the independence of $Z_1$ and $(X_{-1},X_0)$ yield the absolute continuity of $X_0-\lambda_2X_{-1}+\lambda_1^{-1}Z_1$. Hence, using that $\sum_{j=2}^{\infty}\lambda_1^{-j}Z_j$ and $X_0-\lambda_2X_{-1}+\lambda_1^{-1}Z_1$ are independent, we have the absolute continuity of $Y$.

By (5.6), we get $\lambda_1^{-n}X_n \xrightarrow{P\text{-a.s.}} Y$ as $n\to\infty$, and, since $\sum_{k=1}^{\infty}\lambda_1^{2(k-1)}=\infty$ (due to $|\lambda_1|>1$, $\lambda_1\in\mathbb{R}$), applying the Toeplitz lemma (see, e.g., Häusler and Luschgy [3, Lemma 6.28]), we get
$$\frac{\sum_{k=1}^{n}X_{k-1}^2}{\sum_{k=1}^{n}\lambda_1^{2(k-1)}} \xrightarrow{P\text{-a.s.}} Y^2 \qquad\text{as } n\to\infty.$$
Consequently, using $\sum_{k=1}^{n}\lambda_1^{2(k-1)}=\frac{\lambda_1^{2n}-1}{\lambda_1^2-1}$, we conclude
$$\lambda_1^{-2n}\sum_{k=1}^{n}X_{k-1}^2 = \frac{\lambda_1^{-2n}(\lambda_1^{2n}-1)}{\lambda_1^2-1}\cdot\frac{\sum_{k=1}^{n}X_{k-1}^2}{\sum_{k=1}^{n}\lambda_1^{2(k-1)}} \xrightarrow{P\text{-a.s.}} \frac{Y^2}{\lambda_1^2-1} \qquad\text{as } n\to\infty. \qquad (5.7)$$
In a similar way, we have
$$\lambda_1^{-2n}\sum_{k=1}^{n}X_{k-2}^2 \xrightarrow{P\text{-a.s.}} \frac{Y^2}{\lambda_1^2(\lambda_1^2-1)} \qquad\text{as } n\to\infty. \qquad (5.8)$$
Indeed, by (5.6) and (5.7), we have
$$\lambda_1^{-2n}\sum_{k=1}^{n}X_{k-2}^2 = \lambda_1^{-2n}\big(X_{-1}^2-X_{n-1}^2\big) + \lambda_1^{-2n}\sum_{k=1}^{n}X_{k-1}^2 = \lambda_1^{-2n}X_{-1}^2 - \lambda_1^{-2}\big(\lambda_1^{-(n-1)}X_{n-1}\big)^2 + \lambda_1^{-2n}\sum_{k=1}^{n}X_{k-1}^2$$
$$\xrightarrow{P\text{-a.s.}} -\lambda_1^{-2}Y^2 + (\lambda_1^2-1)^{-1}Y^2 = \frac{Y^2}{\lambda_1^2(\lambda_1^2-1)} \qquad\text{as } n\to\infty,$$
as desired.

The absolute continuity of $Y$ implies $P(Y=0)=0$, hence, by (5.7) and (5.8), we obtain
$$\lambda_1^{n}A_n = \sigma\begin{bmatrix}\big(\lambda_1^{-2n}\sum_{k=1}^{n}X_{k-1}^2\big)^{-1/2} & 0 \\ 0 & \big(\lambda_1^{-2n}\sum_{k=1}^{n}X_{k-2}^2\big)^{-1/2}\end{bmatrix} \xrightarrow{P\text{-a.s.}} \sigma\begin{bmatrix}\Big(\frac{Y^2}{\lambda_1^2-1}\Big)^{-1/2} & 0 \\ 0 & \Big(\frac{Y^2}{\lambda_1^2(\lambda_1^2-1)}\Big)^{-1/2}\end{bmatrix} = \frac{\sigma\sqrt{\lambda_1^2-1}}{|Y|}\begin{bmatrix}1 & 0 \\ 0 & |\lambda_1|\end{bmatrix} \qquad\text{as } n\to\infty. \qquad (5.9)$$

Step 3 (checking conditions (i) and (iii) of Theorem B.1):
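Before turning to these conditions, the almost sure limits (5.6), (5.7) and (5.8) established in Step 2 can be illustrated numerically. This is our own sketch, not part of the proof; the roots $\lambda_1=2$, $\lambda_2=0.5$, the noise level $\sigma=1$ and the zero initial values are illustrative choices only, and the AR(2) recursion below is the one whose characteristic roots are $\lambda_1$ and $\lambda_2$:

```python
import numpy as np

# Simulate a supercritical Gaussian AR(2) process with characteristic roots
# lam1, lam2 (|lam1| > 1 > |lam2|); all parameter values are illustrative.
rng = np.random.default_rng(1)
lam1, lam2, sigma, n = 2.0, 0.5, 1.0, 60
a, b = lam1 + lam2, -lam1 * lam2              # X_k = a X_{k-1} + b X_{k-2} + Z_k
X = np.zeros(n + 2)                           # X[i] stores X_{i-1}; X_{-1} = X_0 = 0
Z = sigma * rng.standard_normal(n)
for k in range(1, n + 1):
    X[k + 1] = a * X[k] + b * X[k - 1] + Z[k - 1]
Y_hat = X[n + 1] / lam1**n                    # lam1^{-n} X_n, a proxy for Y in (5.6)
S1 = np.sum(X[1:n + 1]**2) / lam1**(2 * n)    # lam1^{-2n} sum_{k=1}^n X_{k-1}^2
S2 = np.sum(X[0:n]**2) / lam1**(2 * n)        # lam1^{-2n} sum_{k=1}^n X_{k-2}^2
print(S1 - Y_hat**2 / (lam1**2 - 1))          # close to 0, cf. (5.7)
print(S2 - Y_hat**2 / (lam1**2 * (lam1**2 - 1)))  # close to 0, cf. (5.8)
```

By $n=60$ the normalized sums agree with their limits essentially to machine precision, reflecting the geometric rates in the proof above.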
Recall that at the end of Step 1 we gave our choices for $(U_n)_{n\in\mathbb{Z}_+}$, $(B_n)_{n\in\mathbb{N}}$, $(Q_n)_{n\in\mathbb{N}}$, $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}$ and $G$ in Theorem B.1. Applying (5.9), we obtain
$$\lambda_1^{-n}A_n^{-1} = (\lambda_1^{n}A_n)^{-1} \xrightarrow{P\text{-a.s.}} \left(\frac{\sigma\sqrt{\lambda_1^2-1}}{|Y|}\begin{bmatrix}1 & 0\\ 0 & |\lambda_1|\end{bmatrix}\right)^{-1} = \frac{|Y|}{\sigma\sqrt{\lambda_1^2-1}}\begin{bmatrix}1 & 0\\ 0 & |\lambda_1|^{-1}\end{bmatrix} \qquad\text{as } n\to\infty. \qquad (5.10)$$
Hence, since $P$-almost sure convergence yields convergence in $P_{\{Y\ne0\}}$-probability, we obtain that condition (i) of Theorem B.1 holds with
$$\eta := \frac{|Y|}{\sigma\sqrt{\lambda_1^2-1}}\begin{bmatrix}1 & 0\\ 0 & |\lambda_1|^{-1}\end{bmatrix},$$
which is invertible if and only if $Y\ne0$, so $P(\exists\,\eta^{-1}) = P(Y\ne0) = 1$. Here $\eta$ is $\mathcal{F}_\infty$-measurable, since $Y$ is $\mathcal{F}_\infty$-measurable. Indeed, the series $\sum_{j=1}^{\infty}\lambda_1^{-j}Z_j$ converges $P$-a.s. (see Step 2), $\sum_{j=1}^{n}\lambda_1^{-j}Z_j$ is $\mathcal{F}_\infty$-measurable for all $n\in\mathbb{N}$, and the underlying probability space $(\Omega,\mathcal{F},P)$ is complete. Moreover, by (5.10), for every $r\in\mathbb{N}$, we have
$$A_nA_{n-r}^{-1} = \lambda_1^{-r}\big(\lambda_1^{n}A_n\big)\big(\lambda_1^{-(n-r)}A_{n-r}^{-1}\big) \xrightarrow{P\text{-a.s.}} \lambda_1^{-r}\eta^{-1}\eta = \lambda_1^{-r}I_2 \qquad\text{as } n\to\infty,$$
hence we obtain that condition (iii) of Theorem B.1 holds with $P = \lambda_1^{-1}I_2$.

Step 4 (checking condition (ii) of Theorem B.1):
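Step 4 rests on Lenglart's inequality in the form of Corollary C.2. As a quick plausibility check (our own sketch, with illustrative parameter values), the inequality can be verified by Monte Carlo for a standard Gaussian random walk, for which the quadratic characteristic is deterministic:

```python
import numpy as np

# Monte Carlo check of Corollary C.2 for eta_n = Z_1 + ... + Z_n with
# i.i.d. standard normal Z_k, so eta_0 = 0 and <eta>_n = n (deterministic).
# The values of n, a, b are illustrative.
rng = np.random.default_rng(0)
n, a, b, reps = 100, 200.0, 101.0, 20000
eta = np.cumsum(rng.standard_normal((reps, n)), axis=1)
lhs = np.mean(np.max(eta**2, axis=1) >= a)    # estimates P(max_{k<=n} eta_k^2 >= a)
rhs = b / a + float(n >= b)                   # b/a + P(eta_0^2 + <eta>_n >= b)
print(lhs <= rhs)                             # True
```

With these values the right-hand side is $b/a = 0.505$, comfortably above the simulated left-hand side; the bound is of course not sharp, which is all Step 4 needs.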
Condition (ii) of Theorem B.1 with the earlier given choices (see the end of Step 1) holds if and only if $\big(\lambda_1^{-n}M_n^{(j)}\big)_{n\in\mathbb{Z}_+}$ is stochastically bounded in $P_{\{Y\ne0\}}$-probability for each $j\in\{1,2\}$, where $\big(M_n^{(j)}\big)_{n\in\mathbb{Z}_+} := \big(e_j^\top M_n\big)_{n\in\mathbb{Z}_+}$ for $j\in\{1,2\}$. Indeed, for all $K>0$ and $n\in\mathbb{Z}_+$,
$$P_{\{Y\ne0\}}\big(\|\lambda_1^{-n}M_n\|>K\big) \le P_{\{Y\ne0\}}\Big(|\lambda_1^{-n}M_n^{(1)}|>\tfrac{K}{\sqrt{2}}\Big) + P_{\{Y\ne0\}}\Big(|\lambda_1^{-n}M_n^{(2)}|>\tfrac{K}{\sqrt{2}}\Big),$$
and
$$P_{\{Y\ne0\}}\big(|\lambda_1^{-n}M_n^{(j)}|>K\big) \le P_{\{Y\ne0\}}\big(\|\lambda_1^{-n}M_n\|>K\big), \qquad j=1,2.$$
By (2.6), the process $\big(M_n^{(1)}\big)_{n\in\mathbb{Z}_+}$ is a square integrable martingale with respect to the filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}$ and it has quadratic characteristic process
$$\langle M^{(1)}\rangle_n = \sigma^{-2}\sum_{k=1}^{n}X_{k-1}^2, \qquad n\in\mathbb{N},$$
with $\langle M^{(1)}\rangle_0 = 0$. For each $n\in\mathbb{N}$ and $K\in(0,\infty)$, by Lenglart's inequality (see Corollary C.2), we get
$$P\big(|\lambda_1^{-n}M_n^{(1)}|>K\big) = P\big(|M_n^{(1)}|>K|\lambda_1|^{n}\big) \le \frac{1}{K} + P\big(\langle M^{(1)}\rangle_n > K\lambda_1^{2n}\big),$$
so that for each $K\in(0,\infty)$, we have
$$\sup_{n\in\mathbb{N}}P\big(|\lambda_1^{-n}M_n^{(1)}|>K\big) \le \frac{1}{K} + \sup_{n\in\mathbb{N}}P\big(\lambda_1^{-2n}\langle M^{(1)}\rangle_n > K\big). \qquad (5.11)$$
By (5.7), we get
$$\lambda_1^{-2n}\langle M^{(1)}\rangle_n = \lambda_1^{-2n}\sigma^{-2}\sum_{k=1}^{n}X_{k-1}^2 \xrightarrow{P\text{-a.s.}} \frac{Y^2}{(\lambda_1^2-1)\sigma^2} \qquad\text{as } n\to\infty,$$
hence $\big(\lambda_1^{-2n}\langle M^{(1)}\rangle_n\big)_{n\in\mathbb{Z}_+}$ is stochastically bounded in $P$-probability, i.e.,
$$\lim_{K\to\infty}\sup_{n\in\mathbb{N}}P\big(|\lambda_1^{-2n}\langle M^{(1)}\rangle_n|>K\big) = 0.$$
Consequently, by (5.11), $\lim_{K\to\infty}\sup_{n\in\mathbb{N}}P\big(|\lambda_1^{-n}M_n^{(1)}|>K\big)=0$, i.e., $\big(\lambda_1^{-n}M_n^{(1)}\big)_{n\in\mathbb{Z}_+}$ is stochastically bounded in $P$-probability, and hence stochastically bounded in $P_{\{Y\ne0\}}$-probability as well. Using (5.8), in a similar way, one can check that the process $\big(\lambda_1^{-n}M_n^{(2)}\big)_{n\in\mathbb{Z}_+}$ is stochastically bounded in $P_{\{Y\ne0\}}$-probability, and we conclude that condition (ii) of Theorem B.1 holds.

Step 5 (checking condition (iv) of Theorem B.1):
In order to check condition (iv) of Theorem B.1, let us observe that the square integrable martingale $(M_n)_{n\in\mathbb{Z}_+}$ has conditional Gaussian increments with respect to the filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}$, since for each $n\in\mathbb{N}$, the conditional distribution of $\Delta M_n = M_n - M_{n-1} = \sigma^{-2}Z_n[X_{n-1},X_{n-2}]^\top$ given $\mathcal{F}_{n-1}$ is
$$\mathcal{N}\left(0,\ \sigma^{-4}\mathbb{E}_P(Z_n^2)\begin{bmatrix}X_{n-1}\\X_{n-2}\end{bmatrix}\begin{bmatrix}X_{n-1}\\X_{n-2}\end{bmatrix}^\top\right) = \mathcal{N}\left(0,\ \sigma^{-2}\begin{bmatrix}X_{n-1}\\X_{n-2}\end{bmatrix}\begin{bmatrix}X_{n-1}\\X_{n-2}\end{bmatrix}^\top\right) = \mathcal{N}\big(0,\Delta\langle M\rangle_n\big),$$
where the last equality follows by (2.6). More precisely, using the notations and results of Häusler and Luschgy [3], for each $n\in\mathbb{N}$, the conditional distribution $P^{\Delta M_n|\mathcal{F}_{n-1}}$ of $\Delta M_n$ given $\mathcal{F}_{n-1}$ can be calculated as follows:
$$P^{\Delta M_n|\mathcal{F}_{n-1}} = P^{\sigma^{-2}Z_n[X_{n-1},X_{n-2}]^\top|\mathcal{F}_{n-1}} = P^{g(X_{n-1},X_{n-2},Z_n)|\mathcal{F}_{n-1}} = \big(P^{(X_{n-1},X_{n-2},Z_n)|\mathcal{F}_{n-1}}\big)^g = \big(\delta_{(X_{n-1},X_{n-2})}\otimes P^{Z_n|\mathcal{F}_{n-1}}\big)^g = \big(\delta_{(X_{n-1},X_{n-2})}\otimes P^{Z_n}\big)^g,$$
where $g:\mathbb{R}^3\to\mathbb{R}^2$, $g(x_1,x_2,z):=\sigma^{-2}z[x_1,x_2]^\top$, $(x_1,x_2,z)\in\mathbb{R}^3$, $P^{Z_n}$ denotes the distribution of $Z_n$ under $P$, $\delta_{(X_{n-1},X_{n-2})}$ is the Dirac Markov kernel corresponding to $(X_{n-1},X_{n-2})$, and we used part (a) of Lemma A.5 in Häusler and Luschgy [3], parts (b) and (c) of Lemma A.4 in Häusler and Luschgy [3], the independence of $Z_n$ and $\mathcal{F}_{n-1}$, and the $\mathcal{F}_{n-1}$-measurability of $X_{n-1}$ and $X_{n-2}$. Hence for each $n\in\mathbb{N}$, $\omega\in\Omega$, and $B\in\mathcal{B}(\mathbb{R}^2)$, we have
$$P^{\Delta M_n|\mathcal{F}_{n-1}}(\omega,B) = \big(\delta_{(X_{n-1},X_{n-2})}\otimes P^{Z_n}\big)\big(\omega,\{(x_1,x_2,z)^\top\in\mathbb{R}^3 : g(x_1,x_2,z)\in B\}\big) = P^{Z_n}\big(\{z\in\mathbb{R} : \sigma^{-2}z[X_{n-1}(\omega),X_{n-2}(\omega)]^\top\in B\}\big)$$
$$= P\big(\{\widetilde\omega\in\Omega : \sigma^{-2}Z_n(\widetilde\omega)[X_{n-1}(\omega),X_{n-2}(\omega)]^\top\in B\}\big) = P^{\mathcal{N}\left(0,\,\sigma^{-2}\left[\begin{smallmatrix}X_{n-1}(\omega)\\X_{n-2}(\omega)\end{smallmatrix}\right]\left[\begin{smallmatrix}X_{n-1}(\omega)\\X_{n-2}(\omega)\end{smallmatrix}\right]^\top\right)}(B) = P^{\mathcal{N}(0,\,\Delta\langle M\rangle_n(\omega))}(B),$$
as desired. Consequently, for each $\theta\in\mathbb{R}^2$ and $n\in\mathbb{N}$, we obtain
$$\mathbb{E}_P\big(\exp\{\mathrm{i}\langle\theta,A_n\Delta M_n\rangle\}\,\big|\,\mathcal{F}_{n-1}\big) = \mathbb{E}_P\big(\exp\{\mathrm{i}\langle A_n^\top\theta,\Delta M_n\rangle\}\,\big|\,\mathcal{F}_{n-1}\big) = \exp\Big\{-\tfrac12\big\langle(\Delta\langle M\rangle_n)A_n^\top\theta,\,A_n^\top\theta\big\rangle\Big\} = \exp\Big\{-\tfrac12\,\theta^\top A_n(\Delta\langle M\rangle_n)A_n\theta\Big\},$$
where, at the last equality, we used that $A_n$ is symmetric. Using (2.6) and (5.6), we get
$$\lambda_1^{-2n}\Delta\langle M\rangle_n = \sigma^{-2}\lambda_1^{-2n}\begin{bmatrix}X_{n-1}\\X_{n-2}\end{bmatrix}\begin{bmatrix}X_{n-1}\\X_{n-2}\end{bmatrix}^\top = \sigma^{-2}\lambda_1^{-2n}\begin{bmatrix}X_{n-1}^2 & X_{n-1}X_{n-2}\\ X_{n-1}X_{n-2} & X_{n-2}^2\end{bmatrix} \xrightarrow{P\text{-a.s.}} \sigma^{-2}Y^2\begin{bmatrix}\lambda_1^{-2} & \lambda_1^{-3}\\ \lambda_1^{-3} & \lambda_1^{-4}\end{bmatrix} \qquad\text{as } n\to\infty. \qquad (5.12)$$
Applying (5.9) and (5.12), since $P(Y=0)=0$, we get
$$A_n(\Delta\langle M\rangle_n)A_n = \big(\lambda_1^{n}A_n\big)\big(\lambda_1^{-2n}\Delta\langle M\rangle_n\big)\big(\lambda_1^{n}A_n\big) \xrightarrow{P\text{-a.s.}} \frac{(\lambda_1^2-1)\sigma^2}{Y^2}\cdot\frac{Y^2}{\sigma^2}\begin{bmatrix}1 & 0\\ 0 & |\lambda_1|\end{bmatrix}\begin{bmatrix}\lambda_1^{-2} & \lambda_1^{-3}\\ \lambda_1^{-3} & \lambda_1^{-4}\end{bmatrix}\begin{bmatrix}1 & 0\\ 0 & |\lambda_1|\end{bmatrix} = \frac{\lambda_1^2-1}{\lambda_1^2}\begin{bmatrix}1 & \mathrm{sign}(\lambda_1)\\ \mathrm{sign}(\lambda_1) & 1\end{bmatrix} \qquad\text{as } n\to\infty.$$
Hence, for each $\theta\in\mathbb{R}^2$, we have
$$\mathbb{E}_P\big(\exp\{\mathrm{i}\langle\theta,A_n\Delta M_n\rangle\}\,\big|\,\mathcal{F}_{n-1}\big) \xrightarrow{P\text{-a.s.}} \exp\left\{-\frac{\lambda_1^2-1}{2\lambda_1^2}\,\theta^\top\begin{bmatrix}1 & \mathrm{sign}(\lambda_1)\\ \mathrm{sign}(\lambda_1) & 1\end{bmatrix}\theta\right\} = \int_{\mathbb{R}^2}\mathrm{e}^{\mathrm{i}\langle\theta,x\rangle}\,P^{\mathcal{N}\left(0,\,\frac{\lambda_1^2-1}{\lambda_1^2}\left[\begin{smallmatrix}1 & \mathrm{sign}(\lambda_1)\\ \mathrm{sign}(\lambda_1) & 1\end{smallmatrix}\right]\right)}(\mathrm{d}x)$$
as $n\to\infty$. Hence we obtain that condition (iv) of Theorem B.1 holds with
$$\mu := P^{\mathcal{N}\left(0,\,\frac{\lambda_1^2-1}{\lambda_1^2}\left[\begin{smallmatrix}1 & \mathrm{sign}(\lambda_1)\\ \mathrm{sign}(\lambda_1) & 1\end{smallmatrix}\right]\right)}, \qquad (5.13)$$
since
$$\int_{\mathbb{R}^2}\log^+(\|x\|)\,\mu(\mathrm{d}x) = \int_{\{x\in\mathbb{R}^2:\|x\|>1\}}\log(\|x\|)\,\mu(\mathrm{d}x) \le \int_{\{x\in\mathbb{R}^2:\|x\|>1\}}\|x\|\,\mu(\mathrm{d}x) \le \int_{\{x=(x_1,x_2)^\top\in\mathbb{R}^2:\|x\|>1\}}(|x_1|+|x_2|)\,\mu(\mathrm{d}x) < \infty,$$
since the first absolute moments of $\mu$ (being a 2-dimensional normal distribution) are finite.

Step 6 (application of Theorem B.1):
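The matrix algebra behind (5.13), and the geometric summation carried out in this step, can be checked numerically; this is our own sketch and the value of $\lambda_1$ below is an illustrative choice:

```python
import numpy as np

lam1 = -2.5                                   # illustrative real root with |lam1| > 1
s = np.sign(lam1)
R = np.array([[1.0, s], [s, 1.0]])
D = np.diag([1.0, abs(lam1)])                 # diag(1, |lam1|), cf. (5.9)
H = np.array([[lam1**-2, lam1**-3],           # limit matrix in (5.12), without sigma^{-2} Y^2
              [lam1**-3, lam1**-4]])
cov_mu = (lam1**2 - 1.0) * D @ H @ D          # covariance of mu in (5.13)
total = cov_mu / (1.0 - lam1**-2)             # multiplied by sum_{j>=0} lam1^{-2j}
print(np.allclose(cov_mu, (lam1**2 - 1) / lam1**2 * R))  # True
print(np.allclose(total, R))                             # True
print(np.linalg.matrix_rank(total))                      # 1
```

The summed covariance matrix has rank 1, so the limit law is indeed concentrated on the one-dimensional ray spanned by $(1,\mathrm{sign}(\lambda_1))^\top$, as stated in the abstract.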
Using Steps 1–5, we can apply Theorem B.1 with our choices given at the end of Step 1, and we obtain
$$A_nM_n \to \sum_{j=0}^{\infty}\begin{bmatrix}\lambda_1^{-j} & 0\\ 0 & \lambda_1^{-j}\end{bmatrix}N_j \qquad \mathcal{F}_\infty\text{-mixing under } P_{\{Y\ne0\}} \text{ as } n\to\infty, \qquad (5.14)$$
where $(N_j)_{j\in\mathbb{Z}_+}$ is a $P$-independent and identically distributed sequence of $\mathbb{R}^2$-valued random vectors being $P$-independent of $\mathcal{F}_\infty$ with $P(N_0\in B)=\mu(B)$ for all $B\in\mathcal{B}(\mathbb{R}^2)$, and the series in (5.14) converges $P$-almost surely. The distribution of the limit random variable in (5.14) can be written in the form
$$\sum_{j=0}^{\infty}\begin{bmatrix}\lambda_1^{-j} & 0\\ 0 & \lambda_1^{-j}\end{bmatrix}N_j = \sum_{j=0}^{\infty}\lambda_1^{-j}N_j \overset{\mathcal{D}}{=} \begin{bmatrix}N\\ \mathrm{sign}(\lambda_1)N\end{bmatrix} = N\begin{bmatrix}1\\ \mathrm{sign}(\lambda_1)\end{bmatrix},$$
where $N$ is $P$-independent of $\mathcal{F}_\infty$ and $N\overset{\mathcal{D}}{=}\mathcal{N}(0,1)$, since $\sum_{j=0}^{\infty}\lambda_1^{-j}N_j$ has a 2-dimensional normal distribution with covariance matrix
$$\left(\sum_{j=0}^{\infty}\lambda_1^{-2j}\right)\frac{\lambda_1^2-1}{\lambda_1^2}\begin{bmatrix}1 & \mathrm{sign}(\lambda_1)\\ \mathrm{sign}(\lambda_1) & 1\end{bmatrix} = \begin{bmatrix}1 & \mathrm{sign}(\lambda_1)\\ \mathrm{sign}(\lambda_1) & 1\end{bmatrix}.$$
Consequently, we conclude (5.3), hence, as it was explained, the convergence (3.1) follows from part (a) of Theorem 3.18 in Häusler and Luschgy [3]. ✷

Proof of Corollary 3.3.
In the proof of Theorem 3.1, we showed that Theorem B.1 can be applied with the choices $d=2$, $(U_n)_{n\in\mathbb{Z}_+}=(M_n)_{n\in\mathbb{Z}_+}$, $(B_n)_{n\in\mathbb{N}}=(A_n)_{n\in\mathbb{N}}$, $(Q_n)_{n\in\mathbb{N}}=(\lambda_1^{-n}I_2)_{n\in\mathbb{N}}$, $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}=(\sigma(X_{-1},X_0,X_1,\ldots,X_n))_{n\in\mathbb{Z}_+}$ and $G=\Omega$, where $A_n$, $n\in\mathbb{N}$, is given in (5.2). So, by (B.2), we have the statement. ✷

Proof of Corollary 3.4.
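The elementary polarization identity behind (5.15) in the proof below, namely $\frac12\big[(1+\lambda_1^{-1})^2-1-\lambda_1^{-2}\big]/(\lambda_1^2-1) = 1/\big(\lambda_1(\lambda_1^2-1)\big)$, can be spot-checked numerically; the root values in this sketch of ours are illustrative (any real $\lambda$ with $|\lambda|>1$ works):

```python
# Spot-check of the algebraic identity used for (5.15).
for lam in (2.0, -1.5, 3.7):
    lhs = 0.5 * ((1 + 1 / lam)**2 - 1 - lam**-2) / (lam**2 - 1)
    rhs = 1 / (lam * (lam**2 - 1))
    print(abs(lhs - rhs) < 1e-12)             # True
```

The identity holds exactly since $(1+\lambda^{-1})^2 - 1 - \lambda^{-2} = 2\lambda^{-1}$.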
First, we prove
$$\lambda_1^{-2n}\sum_{k=1}^{n}X_{k-1}X_{k-2} \xrightarrow{P\text{-a.s.}} \frac{Y^2}{\lambda_1(\lambda_1^2-1)} \qquad\text{as } n\to\infty. \qquad (5.15)$$
By (5.6), we have $\lambda_1^{-n}X_n \xrightarrow{P\text{-a.s.}} Y$ as $n\to\infty$, and $\lambda_1^{-(n-1)}X_{n-1} \xrightarrow{P\text{-a.s.}} Y$ as $n\to\infty$, so
$$\lambda_1^{-n}(X_n+X_{n-1}) = \lambda_1^{-n}X_n + \lambda_1^{-1}\lambda_1^{-(n-1)}X_{n-1} \xrightarrow{P\text{-a.s.}} \big(1+\lambda_1^{-1}\big)Y \qquad\text{as } n\to\infty.$$
Since $\sum_{k=1}^{\infty}\lambda_1^{2(k-1)}=\infty$ (due to $|\lambda_1|>1$, $\lambda_1\in\mathbb{R}$), applying the Toeplitz lemma (see, e.g., Häusler and Luschgy [3, Lemma 6.28]), we get
$$\frac{\sum_{k=1}^{n}(X_{k-1}+X_{k-2})^2}{\sum_{k=1}^{n}\lambda_1^{2(k-1)}} \xrightarrow{P\text{-a.s.}} \big(1+\lambda_1^{-1}\big)^2Y^2 \qquad\text{as } n\to\infty.$$
Since $\sum_{k=1}^{n}\lambda_1^{2(k-1)}=\frac{\lambda_1^{2n}-1}{\lambda_1^2-1}$, we have
$$\lambda_1^{-2n}\sum_{k=1}^{n}(X_{k-1}+X_{k-2})^2 \xrightarrow{P\text{-a.s.}} \frac{(\lambda_1^{-1}+1)^2Y^2}{\lambda_1^2-1} \qquad\text{as } n\to\infty,$$
and, by (5.7) and (5.8),
$$\lambda_1^{-2n}\sum_{k=1}^{n}X_{k-1}X_{k-2} = \frac12\,\lambda_1^{-2n}\left[\sum_{k=1}^{n}(X_{k-1}+X_{k-2})^2 - \sum_{k=1}^{n}X_{k-1}^2 - \sum_{k=1}^{n}X_{k-2}^2\right]$$
$$\xrightarrow{P\text{-a.s.}} \frac12\left[\frac{(\lambda_1^{-1}+1)^2}{\lambda_1^2-1} - \frac{1}{\lambda_1^2-1} - \frac{1}{\lambda_1^2(\lambda_1^2-1)}\right]Y^2 = \frac{Y^2}{\lambda_1(\lambda_1^2-1)} \qquad\text{as } n\to\infty,$$
yielding (5.15). Finally, (2.6), (5.7), (5.8) and (5.15) yield (3.3). ✷

Appendices

A Stable convergence
We recall the notions of stable and mixing convergence.
A.1 Definition.
Let $(\Omega,\mathcal{F},P)$ be a probability space and $\mathcal{G}\subset\mathcal{F}$ be a sub-$\sigma$-field. Let $(X_n)_{n\in\mathbb{N}}$ and $X$ be $\mathbb{R}^d$-valued random variables, where $d\in\mathbb{N}$.

(i) We say that $X_n$ converges $\mathcal{G}$-stably to $X$ as $n\to\infty$ if the conditional distribution $P^{X_n|\mathcal{G}}$ of $X_n$ given $\mathcal{G}$ converges $\mathcal{G}$-stably to the conditional distribution $P^{X|\mathcal{G}}$ of $X$ given $\mathcal{G}$ as $n\to\infty$, which equivalently means that
$$\lim_{n\to\infty}\mathbb{E}_P\big(f\,\mathbb{E}_P(h(X_n)\mid\mathcal{G})\big) = \mathbb{E}_P\big(f\,\mathbb{E}_P(h(X)\mid\mathcal{G})\big)$$
for all random variables $f:\Omega\to\mathbb{R}$ with $\mathbb{E}_P(|f|)<\infty$ and for all bounded and continuous functions $h:\mathbb{R}^d\to\mathbb{R}$.

(ii) We say that $X_n$ converges $\mathcal{G}$-mixing to $X$ as $n\to\infty$ if $X_n$ converges $\mathcal{G}$-stably to $X$ as $n\to\infty$ and $P^{X|\mathcal{G}} = P^X$ $P$-almost surely, where $P^X$ denotes the distribution of $X$ on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ under $P$. Equivalently, $X_n$ converges $\mathcal{G}$-mixing to $X$ as $n\to\infty$ if $X_n$ converges $\mathcal{G}$-stably to $X$ as $n\to\infty$ and $\sigma(X)$ and $\mathcal{G}$ are independent, which equivalently means that
$$\lim_{n\to\infty}\mathbb{E}_P\big(f\,\mathbb{E}_P(h(X_n)\mid\mathcal{G})\big) = \mathbb{E}_P(f)\,\mathbb{E}_P(h(X))$$
for all random variables $f:\Omega\to\mathbb{R}$ with $\mathbb{E}_P(|f|)<\infty$ and for all bounded and continuous functions $h:\mathbb{R}^d\to\mathbb{R}$.
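As a concrete illustration of part (ii) (our own sketch, not from the paper): for i.i.d. standard normal $(Z_k)_{k\in\mathbb{N}}$, the normalized sums $S_n/\sqrt{n}$ converge $\mathcal{F}_\infty$-mixing to $\mathcal{N}(0,1)$, so taking, e.g., $f=\mathbb{1}_{\{Z_1>0\}}$ and $h=\mathbb{1}_{(-\infty,0]}$, the defining limit should be $\mathbb{E}_P(f)\,\mathbb{E}_P(h(X)) = \tfrac14$. A Monte Carlo sketch with illustrative sample sizes:

```python
import numpy as np

# E_P(f h(S_n/sqrt(n))) with f = 1_{Z_1 > 0} and h = 1_{(-inf,0]} tends to
# E_P(f) E_P(h(N(0,1))) = 0.25 as n grows; reps and n are illustrative.
rng = np.random.default_rng(42)
reps, n = 200000, 10000
Z1 = rng.standard_normal(reps)
rest = np.sqrt(n - 1) * rng.standard_normal(reps)  # same law as Z_2 + ... + Z_n
f = (Z1 > 0).astype(float)
h = ((Z1 + rest) / np.sqrt(n) <= 0).astype(float)
est = np.mean(f * h)
print(est)                                    # close to 0.25
```

For small $n$ the estimate would be visibly below $1/4$ (for $n=1$ it is $0$), which is exactly the asymptotic decoupling of $\sigma(X)$ and $\mathcal{G}$ that mixing convergence expresses.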
In Definition A.1, $P^{X_n|\mathcal{G}}$, $n\in\mathbb{N}$, and $P^{X|\mathcal{G}}$ are the $P$-almost surely unique $\mathcal{G}$-measurable Markov kernels from $(\Omega,\mathcal{F})$ to $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ such that for each $n\in\mathbb{N}$,
$$\int_G P^{X_n|\mathcal{G}}(\omega,B)\,P(\mathrm{d}\omega) = P\big(X_n^{-1}(B)\cap G\big) \qquad\text{for every } G\in\mathcal{G},\ B\in\mathcal{B}(\mathbb{R}^d),$$
and
$$\int_G P^{X|\mathcal{G}}(\omega,B)\,P(\mathrm{d}\omega) = P\big(X^{-1}(B)\cap G\big) \qquad\text{for every } G\in\mathcal{G},\ B\in\mathcal{B}(\mathbb{R}^d),$$
respectively. For more details, see Häusler and Luschgy [3, Chapter 3 and Appendix A].

B A multidimensional stable limit theorem
Recall that $\log^+(x) := \log(x)\mathbb{1}_{\{x\ge1\}} + 0\cdot\mathbb{1}_{\{x<1\}}$ for $x\in\mathbb{R}_+$. For an $\mathbb{R}^d$-valued stochastic process $(U_n)_{n\in\mathbb{Z}_+}$, the increments $\Delta U_n$, $n\in\mathbb{Z}_+$, are defined by $\Delta U_0 := 0$ and $\Delta U_n := U_n - U_{n-1}$ for $n\in\mathbb{N}$.

We recall a multidimensional analogue of Theorem 8.2 in Häusler and Luschgy [3], which was proved in Barczy and Pap [2, Theorem 1.4].

B.1 Theorem. (Barczy and Pap [2, Theorem 1.4])
Let $(U_n)_{n\in\mathbb{Z}_+}$ and $(B_n)_{n\in\mathbb{Z}_+}$ be $\mathbb{R}^d$-valued and $\mathbb{R}^{d\times d}$-valued stochastic processes, respectively, adapted to a filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}$, where $B_n$ is invertible for sufficiently large $n\in\mathbb{N}$. Let $(Q_n)_{n\in\mathbb{N}}$ be a sequence in $\mathbb{R}^{d\times d}$ such that $Q_n\to0$ as $n\to\infty$ and $Q_n$ is invertible for sufficiently large $n\in\mathbb{N}$. Let $G\in\mathcal{F}_\infty := \sigma\big(\bigcup_{n=0}^{\infty}\mathcal{F}_n\big)$ with $P(G)>0$. Assume that the following conditions are satisfied:

(i) there exists an $\mathbb{R}^{d\times d}$-valued, $\mathcal{F}_\infty$-measurable random matrix $\eta:\Omega\to\mathbb{R}^{d\times d}$ such that $P(G\cap\{\exists\,\eta^{-1}\})>0$ and
$$Q_nB_n^{-1} \xrightarrow{P_G} \eta \qquad\text{as } n\to\infty,$$

(ii) $(Q_nU_n)_{n\in\mathbb{N}}$ is stochastically bounded in $P_{G\cap\{\exists\,\eta^{-1}\}}$-probability, i.e.,
$$\lim_{K\to\infty}\sup_{n\in\mathbb{N}}P_{G\cap\{\exists\,\eta^{-1}\}}\big(\|Q_nU_n\|>K\big) = 0,$$

(iii) there exists an invertible matrix $P\in\mathbb{R}^{d\times d}$ with $\varrho(P)<1$ such that
$$B_nB_{n-r}^{-1} \xrightarrow{P_G} P^r \qquad\text{as } n\to\infty \text{ for every } r\in\mathbb{N},$$

(iv) there exists a probability measure $\mu$ on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ with $\int_{\mathbb{R}^d}\log^+(\|x\|)\,\mu(\mathrm{d}x)<\infty$ such that
$$\mathbb{E}_P\big(\mathrm{e}^{\mathrm{i}\langle\theta,B_n\Delta U_n\rangle}\mid\mathcal{F}_{n-1}\big) \xrightarrow{P_{G\cap\{\exists\,\eta^{-1}\}}} \int_{\mathbb{R}^d}\mathrm{e}^{\mathrm{i}\langle\theta,x\rangle}\,\mu(\mathrm{d}x) \qquad\text{as } n\to\infty \text{ for every } \theta\in\mathbb{R}^d.$$

Then
$$B_nU_n \to \sum_{j=0}^{\infty}P^jZ_j \qquad \mathcal{F}_\infty\text{-mixing under } P_{G\cap\{\exists\,\eta^{-1}\}} \text{ as } n\to\infty, \qquad \mathrm{(B.1)}$$
and
$$Q_nU_n \to \eta\sum_{j=0}^{\infty}P^jZ_j \qquad \mathcal{F}_\infty\text{-stably under } P_{G\cap\{\exists\,\eta^{-1}\}} \text{ as } n\to\infty, \qquad \mathrm{(B.2)}$$
where $(Z_j)_{j\in\mathbb{Z}_+}$ denotes a $P$-independent and identically distributed sequence of $\mathbb{R}^d$-valued random vectors $P$-independent of $\mathcal{F}_\infty$ with $P(Z_0\in B)=\mu(B)$ for all $B\in\mathcal{B}(\mathbb{R}^d)$.

The series $\sum_{j=0}^{\infty}P^jZ_j$ in (B.1) and in (B.2) is absolutely convergent $P$-almost surely, since, by condition (iv) of Theorem B.1, $\mathbb{E}_P(\log^+(\|Z_0\|))<\infty$ and one can apply Lemma 1.3 in Barczy and Pap [2]. Further, the random variable $\eta$ and the sequence $(Z_j)_{j\in\mathbb{Z}_+}$ are $P$-independent in Theorem B.1, since $\eta$ is $\mathcal{F}_\infty$-measurable and the sequence $(Z_j)_{j\in\mathbb{Z}_+}$ is $P$-independent of $\mathcal{F}_\infty$.

C Lenglart's inequality
The following form of Lenglart's inequality can be found, e.g., in Häusler and Luschgy [3, Theorem A.8].
C.1 Theorem.
Let $(\xi_n)_{n\in\mathbb{Z}_+}$ be a nonnegative submartingale with respect to a filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}$ and with compensator $A_n := \sum_{k=1}^{n}\mathbb{E}_P(\xi_k-\xi_{k-1}\mid\mathcal{F}_{k-1})$, $n\in\mathbb{N}$, $A_0:=0$. Then for each $a,b\in\mathbb{R}_{++}$ and $n\in\mathbb{N}$,
$$P\Big(\max_{k\in\{0,1,\ldots,n\}}\xi_k\ge a\Big) \le \frac{b}{a} + P\big(\xi_0+A_n\ge b\big).$$

Applying Theorem C.1 to the square of a square integrable martingale, we obtain the following corollary.
C.2 Corollary.
Let $(\eta_n)_{n\in\mathbb{Z}_+}$ be a square integrable martingale with respect to a filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}_+}$ and with quadratic characteristic process $\langle\eta\rangle_n := \sum_{k=1}^{n}\mathbb{E}_P\big((\eta_k-\eta_{k-1})^2\mid\mathcal{F}_{k-1}\big)$, $n\in\mathbb{N}$, $\langle\eta\rangle_0:=0$. Then for each $a,b\in\mathbb{R}_{++}$ and $n\in\mathbb{N}$,
$$P\Big(\max_{k\in\{0,1,\ldots,n\}}\eta_k^2\ge a\Big) \le \frac{b}{a} + P\big(\eta_0^2+\langle\eta\rangle_n\ge b\big).$$

Note that under the conditions of Corollary C.2, the quadratic characteristic process $(\langle\eta\rangle_n)_{n\in\mathbb{Z}_+}$ of $(\eta_n)_{n\in\mathbb{Z}_+}$ coincides with the compensator of $(\eta_n^2)_{n\in\mathbb{Z}_+}$.

Acknowledgements

We are grateful to Michael Monsour for sending us his paper [6] and the paper of Venkataraman [16] about limiting distributions for LSE of AR parameters of AR(2) processes without a unit root.
References

[1] Aknouche, A. (2015). Explosive strong periodic autoregression with multiplicity one. Journal of Statistical Planning and Inference.

[2] Barczy, M. and Pap, G. (2020). A multidimensional stable limit theorem. ArXiv: https://arxiv.org/abs/2012.04541

[3] Häusler, E. and Luschgy, H. (2015). Stable Convergence and Stable Limit Theorems. Springer, Cham.

[4] Jeganathan, P. (1988). On the strong approximation of the distributions of estimators in linear stochastic models, I and II: Stationary and explosive AR models. The Annals of Statistics.

[5] Mann, H. B. and Wald, A. (1943). On the statistical treatment of linear stochastic difference equations. Econometrica.

[6] Mikulski, P. W. and Monsour, M. J. (1998). Limiting distributions in the second order autoregressive process without a unit root. Technical Report TR-5/98, Dept. of Math., American University of Beirut.

[7] Monsour, M. J. and Mikulski, P. W. (1998). On limiting distributions in explosive autoregressive processes. Statistics and Probability Letters.

[8] Monsour, M. J. (2016). Decomposition of an autoregressive process into first order processes. Journal of Multivariate Analysis.

[9] Narasimham, G. V. L. (1969). Some properties of estimators occurring in the theory of linear stochastic process. In: Economic Models, Estimation and Risk Programming: Essays in Honor of Gerhard Tintner, 375–389. Edited by K. A. Fox, J. K. Sengupta and G. V. L. Narasimham. Lecture Notes in Operations Research and Mathematical Economics, Springer-Verlag, Berlin-New York.

[10] Rényi, A. (1950). Contributions to the theory of independent random variables. Acta Mathematica Academiae Scientiarum Hungaricae.

[11] Rényi, A. (1958). On mixing sequences of sets. Acta Mathematica Academiae Scientiarum Hungaricae.

[12] Rényi, A. (1963). On stable sequences of events. Sankhyā. Series A.

[13] Rényi, A. and Révész, P. (1958). On mixing sequences of random variables. Acta Mathematica Academiae Scientiarum Hungaricae.

[14] Venkataraman, K. N. (1967). A note on the least squares estimators of the parameters of a second order linear stochastic difference equation. Calcutta Statistical Association Bulletin.

[15] Venkataraman, K. N. (1968). Some limit theorems on a linear explosive stochastic difference equation with a constant term, and their statistical applications. Sankhyā. Series A.

[16] Venkataraman, K. N. (1973). Some convergence theorems on a second order linear explosive stochastic difference equation with a constant term. Journal of the Indian Statistical Association.