Asymptotic Behavior of Least Squares Estimator for Nonlinear Autoregressive Models
arXiv preprint [math.PR]. Submitted to the Annals of Applied Probability.
By Zhaobo Liu and Chanying Li
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and University of Chinese Academy of Sciences

This paper is concerned with the least squares estimator for a basic class of nonlinear autoregressive models whose outputs are not necessarily ergodic. Several asymptotic properties of the least squares estimator are established under mild conditions. These properties imply the strong consistency of the least squares estimates for nonlinear autoregressive models that are not divergent.
1. Introduction.
When it comes to estimating nonlinear autoregressive (AR) models, a typical case in the literature is that the underlying series is ergodic. Based on this assumption, a series of asymptotic theory has been established accordingly (see [1], [2], [9], [12]). However, this good property is not always true. For example, consider

(1.1)  $y_{t+1} = \theta^\tau \phi(y_t,\dots,y_{t-n+1}) + w_{t+1}, \quad t \ge 0,$

where $\theta$ is the $m \times 1$ unknown parameter vector, $y_t$ and $w_t$ are the scalar observations and random noise signals, respectively, and $\phi : \mathbb{R}^n \to \mathbb{R}^m$ is a known Lebesgue measurable vector function. No doubt most functions $\phi$ produce non-ergodic sequences $\{y_t\}$. So this article is intended to identify the parameter $\theta$ in model (1.1), whose outputs are not necessarily ergodic.

It is well known that the least squares (LS) estimator is one of the most efficient algorithms in parameter estimation, and its strong consistency for model (1.1) depends crucially on the minimal eigenvalue $\lambda_{\min}(t+1)$ of the matrix

$P_{t+1}^{-1} = I_m + \sum_{i=0}^{t} \phi(y_i,\dots,y_{i-n+1})\,\phi^\tau(y_i,\dots,y_{i-n+1}).$

* This work was supported in part by the National Natural Science Foundation of China under grants 61422308 and 11688101.
MSC 2010 subject classifications:
Primary 62F12, 62M10; secondary 93E24
Keywords and phrases: nonlinear autoregressive models, least squares, strong consistency, Harris recurrent
Specifically, in the Bayesian framework, [4] and [11] showed

(1.2)  $\Bigl\{\lim_{t\to+\infty} \lambda_{\min}(t+1) = +\infty\Bigr\} = \Bigl\{\lim_{t\to+\infty} \hat\theta_t = \theta\Bigr\},$

while [6, Theorem 1] and [5, Lemma 3.1] found that in the non-Bayesian framework, where $\{w_t\}$ is an appropriate martingale difference sequence,

(1.3)  $\|\hat\theta_{t+1} - \theta\|^2 = O\Bigl(\frac{\log \lambda_{\max}(t+1)}{\lambda_{\min}(t+1)}\Bigr)$ a.s.,

where $\lambda_{\max}(t+1)$ denotes the maximal eigenvalue of $P_{t+1}^{-1}$. Moreover, [6] pointed out that

(1.4)  $\log \lambda_{\max}(t+1) = o(\lambda_{\min}(t+1))$

is in some sense the weakest condition for the strong consistency of $\hat\theta_t$ in the non-Bayesian framework.

The eigenvalues of $P_{t+1}^{-1}$ depend on the outputs $\{y_t\}$, which are produced automatically by the nonlinear random system (1.1). So checking $\lim_{t\to+\infty}\lambda_{\min}(t+1) = +\infty$ or (1.4) is not trivial in general. But for the linear AR model

(1.5)  $y_{t+1} = \sum_{i=1}^{n} \theta_i y_{t-i+1} + w_{t+1}, \quad t \ge 0,$

which is a special case of (1.1), [7] successfully verified

(1.6)  $\liminf_{t\to+\infty} t^{-1}\lambda_{\min}(t+1) > 0$ a.s.

and then completely solved the strong consistency of the LS estimator for this basic situation. The verification of (1.6) in [7] is, to some extent, attributable to the linear structure of model (1.5). As for the nonlinear model (1.1), we naturally wonder whether the LS estimator still has similar asymptotic behavior.

In the next section, we establish the asymptotic properties of the LS estimator for model (1.1). Under some mild conditions on $\phi$, the minimal eigenvalue of $P_{t+1}^{-1}$ is estimated in both the Bayesian and the non-Bayesian frameworks. We find that the LS estimates converge to the true parameter almost surely on the set where the vector $(y_t,\dots,y_{t-n+1})^\tau$ does not diverge to infinity. Since most real systems are not divergent, this means the LS estimator is very likely to be strongly consistent when applied to model (1.1) in practice. The proofs of the main results are included in Section 3.
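As a quick numerical aside (the AR coefficient and horizon below are assumptions made here for illustration, not values from the paper), one can simulate the scalar case of the linear model (1.5) and watch $t^{-1}\lambda_{\min}(t+1)$ stay bounded away from zero, as (1.6) asserts:

```python
import numpy as np

# Sketch: linear AR(1) case of (1.5).  Here P_{t+1}^{-1} = 1 + sum_i y_i^2 is
# scalar, so lambda_min(t+1) is that sum itself; theta1 = 0.5 is an assumed value.
rng = np.random.default_rng(0)
theta1, T = 0.5, 5000
y = np.zeros(T + 1)
for t in range(T):
    y[t + 1] = theta1 * y[t] + rng.normal()

lam_min = 1.0 + np.sum(y[:T] ** 2)          # minimal eigenvalue of P_T^{-1}
theta_hat = (y[:T] @ y[1:]) / lam_min       # LS estimate of theta1
print(lam_min / T, theta_hat)
```

For a stable coefficient, `lam_min / T` hovers near the stationary second moment $1/(1-\theta_1^2) \approx 1.33$, which is the scalar instance of (1.6), and the LS estimate settles near the true value.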
2. Main Results.
We first consider a simplified version of model (1.1) by restricting $\phi$ to

(2.1)  $\phi(z_1,\dots,z_n) = \mathrm{col}\{\phi^{(1)}(z_1),\dots,\phi^{(n)}(z_n)\},$

where $\phi^{(i)} = (f_{i1},\dots,f_{im_i})^\tau : \mathbb{R}\to\mathbb{R}^{m_i}$, $i=1,\dots,n$, are some known Lebesgue measurable vector functions and the integers $m_i \ge 1$ satisfy $\sum_{i=1}^n m_i = m$. Without loss of generality, let $y_t = 0$ for $t < 0$. We discuss the parameter estimation of models (1.1) and (2.1) in two cases: in Subsection 2.1 the parameter $\theta$ is treated as a random variable, while it is a fixed vector in Subsection 2.2. Next, we establish the asymptotic theory of the LS estimator for the general AR model (1.1) in Subsection 2.3.

2.1. Bayesian Framework. Consider model (1.1) and (2.1). Assume

A1 The noise $\{w_t\}$ is an i.i.d. random sequence with $w_1 \sim N(0,1)$, and the parameter $\theta \sim N(\theta_0, I_m)$ is independent of $\{w_t\}$.

A2 There are some open sets $\{E_i\}_{i=1}^n$ in $\mathbb{R}$ such that
(i) $f_{ij} \in C(\mathbb{R})$ and $f_{ij} \in C^{m_i}(E_i)$, $1 \le j \le m_i$, $1 \le i \le n$;
(ii) for every unit vector $x \in \mathbb{R}^m$, there is a point $y \in \prod_{i=1}^n E_i$ such that $|\phi^\tau(y)x| \neq 0$.

Remark 2.1. By Assumption A2(ii), for every unit vector $x \in \mathbb{R}^m$,

$\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n E_i : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) > 0,$

where $\ell$ denotes the Lebesgue measure.

When $n = 1$, Assumption A2 can be relaxed to

A2' $f_1,\dots,f_m \in C^m(E_1)$ are linearly independent in $E_1$, and $\phi$ is bounded on every compact set.

The LS estimate $\hat\theta_t$ of the parameter $\theta$ can be defined recursively by

(2.2)  $\hat\theta_{t+1} = \hat\theta_t + P_{t+1}\phi_t(y_{t+1} - \phi_t^\tau\hat\theta_t),$
       $P_{t+1} = P_t - (1+\phi_t^\tau P_t\phi_t)^{-1}P_t\phi_t\phi_t^\tau P_t, \quad P_0 = I_m,$
       $\phi_t = \phi(y_t,\dots,y_{t-n+1}), \quad t \ge 0,$

where $\hat\theta_0$ is the deterministic initial condition of the algorithm and $\phi_0$ is the random initial vector of system (1.1). Clearly, by (1.1) and (2.2),

(2.3)  $P_{t+1}^{-1} = I_m + \sum_{i=0}^{t}\phi_i\phi_i^\tau.$
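As a hedged illustration (the regressor map $\phi$ and all parameter values below are assumptions made here, not quantities from the paper), the recursion (2.2) can be run for $n = 1$, $m = 2$ and checked against the batch form implied by (2.3):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.8, 0.5])                    # assumed true parameter
phi_fn = lambda y: np.array([y, np.sin(y)])     # hypothetical regressor map

T = 500
y = 0.0
theta_hat = np.zeros(2)                         # deterministic initial estimate
P = np.eye(2)                                   # P_0 = I_m
Phi, Y = [], []

for _ in range(T):
    phi = phi_fn(y)
    y_next = theta @ phi + rng.normal()
    # recursive LS update (2.2); note P_{t+1} is formed before updating theta_hat
    P = P - np.outer(P @ phi, P @ phi) / (1.0 + phi @ P @ phi)
    theta_hat = theta_hat + (P @ phi) * (y_next - phi @ theta_hat)
    Phi.append(phi); Y.append(y_next)
    y = y_next

Phi, Y = np.array(Phi), np.array(Y)
P_inv = np.eye(2) + Phi.T @ Phi                     # identity (2.3)
theta_batch = np.linalg.solve(P_inv, Phi.T @ Y)     # batch LS with the same P_0 prior
print(np.max(np.abs(theta_hat - theta_batch)))
```

In exact arithmetic the recursive and batch estimates coincide, and the matrix $P_{t+1}$ maintained by the rank-one update remains the inverse of (2.3); the recursion is the matrix-inversion-lemma form of LS.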
We provide a simple way to estimate the minimal eigenvalue of $P_{t+1}^{-1}$, which is denoted by $\lambda_{\min}(t+1)$. Let

(2.4)  $N_t(M) := \sum_{i=1}^{t} I_{\{\|Y_i\| \le M\}},$

where $Y_t := (y_{t+n-1},\dots,y_t)^\tau$ and $M > 0$. With $N_t(M)$, our estimate of $\lambda_{\min}(t+1)$ is readily available:

Theorem 2.1. Under Assumptions A1–A2, for any constant $M > 0$,

(2.5)  $\liminf_{t\to+\infty} \frac{\lambda_{\min}(t+1)}{N_t(M)} > 0$ a.s. on $\Omega(M)$,

where $\Omega(M) := \{\lim_{t\to+\infty} N_t(M) = +\infty\}$.

Corollary 2.1. Let Assumptions A1–A2 hold. Then

(2.6)  $\lim_{t\to+\infty} \hat\theta_t = \theta$ a.s. on $\bigl\{\liminf_{t\to+\infty} \|Y_t\| < +\infty\bigr\}$.

Remark 2.2. If Assumption A2(ii) fails, then $\ell(\{y \in \mathbb{R}^n : |\phi^\tau(y)x| > 0\}) = 0$ for some unit vector $x \in \mathbb{R}^m$. Therefore, by (2.3), $\lambda_{\min}(t+1) = O(1)$ a.s. as $t \to \infty$. In view of (1.2), $\hat\theta_t$ cannot converge to the true parameter $\theta$. So Assumption A2(ii) is necessary for the strong consistency of the LS estimates $\{\hat\theta_t\}_{t\ge 0}$.

2.2. Constant Parameter. Consider model (1.1) and (2.1), where $\theta$ is a non-random parameter. Assume

A1' $\{w_t\}$ is an i.i.d. random sequence with $Ew_1 = 0$ and $E|w_1|^\beta < +\infty$ for some $\beta > 2$. Moreover, $w_1$ has a density $\rho(x)$ such that for every proper interval $I \subset \mathbb{R}$,

$\inf_{x\in I} \rho(x) > 0 \quad \text{and} \quad \sup_{x\in\mathbb{R}} \rho(x) < +\infty.$

In this case, the LS estimator is constructed from partial data. More specifically, for some constant $C_\phi > 0$, $\phi_t$ in (2.2) is modified to

$\phi_t := I_{\{\|Y_{t-n+1}\| \le C_\phi\}}\, \phi(y_t,\dots,y_{t-n+1}).$

Let $\lambda_{\min}(t+1)$ and $\lambda_{\max}(t+1)$ denote the minimal and maximal eigenvalues of $P_{t+1}^{-1}$ in (2.3). Define $r_t := \sum_{i=0}^{t} \|\phi_i\|^2 + 1$, which is comparable to the trace of $P_{t+1}^{-1}$. Note that $r_t/\lambda_{\max}(t+1) \in [1, m]$ and $r_t = O(N_t(C_\phi))$, where $N_t(\cdot)$ is defined by (2.4). Then an analogous version of Theorem 2.1 is deduced as follows:

Theorem 2.2. Under Assumptions A1' and A2, there is a constant $M_\phi > 0$ depending only on $\phi$ such that for any $C_\phi > M_\phi$ and $M > 0$,

$\liminf_{t\to+\infty} \frac{\lambda_{\min}(t+1)}{N_t(M)} > 0$ a.s. on $\Omega(M)$.

Furthermore, if $M \ge C_\phi$, then

$\|\hat\theta_t - \theta\|^2 = O\Bigl(\frac{\log N_t(M)}{N_t(M)}\Bigr)$ a.s. on the set $\Omega(M)$.

Remark 2.3. Theorem 2.2 indicates that (2.6) holds under Assumptions A1' and A2. In most practical situations,

(2.7)  $P\bigl\{\liminf_{t\to+\infty} \|Y_t\| < +\infty\bigr\} = 1,$

and the strong consistency of the LS estimates is thus guaranteed. Note that Assumption A1' and (2.7) imply that $\{y_t\}_{t\ge 0}$ in model (1.1) is in fact an aperiodic Harris recurrent Markov chain and hence admits an invariant measure. Some integrability assumptions on the invariant measure might also lead to the consistency of the LS estimates (e.g. [10]). However, it is not yet clear whether the invariant measure of such a nonlinear autoregressive model has the desired properties for estimation.

Example 2.1. Consider a parametric autoregressive model of the form

(2.8)  $y_{t+1} = \sum_{j=1}^{n} \theta_j g(y_t) I_{\{y_t \in D_j\}} + y_t I_{\{y_t \in D_{n+1}\}} + w_{t+1}, \quad y_0 = 0,$

where $g(\cdot)$ is bounded on every compact set, $\{D_j\}_{j=1}^n$ are some compact subsets of $\mathbb{R}$ with positive Lebesgue measure and $D_{n+1} = (\bigcup_{j=1}^n D_j)^c$.
Let the noises $\{w_t\}_{t\ge 1}$ satisfy Assumption A1' and the unknown parameters $\theta_1,\dots,\theta_n \in \mathbb{R}$. Considering the properties of random walks, $\{y_t\}_{t\ge 0}$ must fall into $\bigcup_{j=1}^n D_j$ infinitely many times. It then follows that $\{y_t\}_{t\ge 0}$ fulfills (2.7). Hence Theorems 2.1 and 2.2 can be applied and the strong consistency of the LS estimates is established. If $g(x) = x$, model (2.8) reduces to the familiar threshold autoregressive (TAR) model.
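A hedged simulation sketch of Example 2.1 with $g(x) = x$ (a TAR model): the sets $D_1 = [-2,0)$, $D_2 = [0,2]$ and the parameter values below are assumptions made here for illustration, not choices from the paper.

```python
import numpy as np

# Sketch of (2.8) with g(x) = x: two contracting regimes on D_1, D_2 and a
# random-walk regime on D_3 = (D_1 ∪ D_2)^c, which keeps returning.
rng = np.random.default_rng(2)
theta1, theta2, T = 0.5, -0.4, 20000
y = 0.0
num1 = den1 = num2 = den2 = 0.0
for _ in range(T):
    if -2.0 <= y < 0.0:
        y_next = theta1 * y + rng.normal()
        num1 += y * y_next; den1 += y * y
    elif 0.0 <= y <= 2.0:
        y_next = theta2 * y + rng.normal()
        num2 += y * y_next; den2 += y * y
    else:
        y_next = y + rng.normal()        # regime D_3: a recurrent random walk
    y = y_next

theta1_hat, theta2_hat = num1 / den1, num2 / den2   # per-regime LS estimates
print(theta1_hat, theta2_hat)
```

Because the random-walk regime keeps re-entering $D_1 \cup D_2$, the occupation count $N_t(M)$ diverges and the per-regime LS estimates settle near the true values, in line with Theorem 2.2.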
2.3. Asymptotic Theory for the General Model. Let us return to model (1.1) and rewrite

$\phi(z) = \mathrm{col}\{f_1(z),\dots,f_m(z)\},$

where $z = (z_1,\dots,z_n)^\tau$ and $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1,\dots,m$, are some known Lebesgue measurable functions. A natural question in this part is whether the asymptotic behavior of the LS estimator in Theorems 2.1 and 2.2 still holds for model (1.1). To this end, assume

A3 There is a bounded open set $E \subset \mathbb{R}^n$ and a number $\delta^* > 0$ such that
(i) $f_i \in C(\mathbb{R}^n)$, $1 \le i \le m$;
(ii) for every unit vector $x \in \mathbb{R}^m$,

(2.9)  $J\bigl(\{y \in E : |\phi^\tau(y)x| = \delta^*\}\bigr) = 0,$

where $J(\cdot)$ denotes the Jordan measure. In addition,

(2.10)  $\inf_{\|x\|=1} \ell\bigl(\{y \in E : |\phi^\tau(y)x| > \delta^*\}\bigr) > 0.$

With the proof placed in Appendix B, our problem is addressed by

Theorem 2.3. Theorems 2.1 and 2.2 hold for model (1.1) if Assumption A2 is replaced by A3.
Example 2.2. Consider the following exponential autoregressive (EXAR) model with noises $\{w_t\}_{t\ge 1}$ satisfying A1':

(2.11)  $y_{t+1} = \sum_{j=1}^{n} (\alpha_j + \beta_j e^{-\gamma y_t^2})\, y_{t-j+1} + w_{t+1},$

where $\gamma$ is known and $\alpha_j, \beta_j$, $j = 1,2,\dots,n$, are unknown parameters. It can be checked that Assumption A3 holds for model (2.11). Furthermore, in most practical cases, the outputs $\{y_t\}_{t\ge 0}$ produced by the above EXAR model fulfill (2.7). So the LS estimator is often effective for model (2.11) due to Theorem 2.3.
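A hedged simulation of the EXAR model (2.11) with $n = 1$; the value of $\gamma$ and the true $(\alpha, \beta)$ below are illustrative assumptions made here, not values from the paper.

```python
import numpy as np

# Sketch of (2.11) for n = 1: y_{t+1} = (alpha + beta*exp(-gamma*y_t^2))*y_t + w_{t+1}.
# The model is linear in (alpha, beta) given the regressor phi(y) below.
rng = np.random.default_rng(3)
alpha, beta, gamma, T = 0.4, 0.3, 1.0, 20000
y = 0.0
S = np.zeros((2, 2)); b = np.zeros(2)
for _ in range(T):
    phi = np.array([y, y * np.exp(-gamma * y * y)])
    y_next = alpha * phi[0] + beta * phi[1] + rng.normal()
    S += np.outer(phi, phi); b += phi * y_next
    y = y_next

alpha_hat, beta_hat = np.linalg.solve(np.eye(2) + S, b)  # LS with P_0 = I as in (2.3)
print(alpha_hat, beta_hat)
```

The two regressor components $y$ and $y e^{-\gamma y^2}$ are correlated but not collinear, so the minimal eigenvalue of (2.3) grows with the sample size and the estimates converge.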
3. Proofs of Theorems 2.1 and 2.2.
It is obvious that to show Theorems 2.1 and 2.2, it suffices to prove
Proposition 3.1. Under Assumptions A1' and A2, let $\theta$ be a random variable independent of $\{w_t\}_{t\ge 1}$. Then there is a constant $M_\phi > 0$ depending only on $\phi$ such that for any $C_\phi > M_\phi$ and $M, K > 0$,

(3.1)  $\liminf_{t\to+\infty} \frac{\lambda_{\min}(t+1)}{N_t(M)} > 0$ a.s. on $\Omega(M) \cap \{\|\theta\| \le K\}$.

Borrowing the idea of [8], the proof of Proposition 3.1 will be completed in the following three subsections.
Section 3.1: Observe that

$\lambda_{\min}(t+1) = \min_{\|x\|=1} x^\tau\Bigl(I_m + \sum_{i=1}^{t}\phi_i\phi_i^\tau\Bigr)x = 1 + \min_{\|x\|=1}\sum_{i=1}^{t}(\phi_i^\tau x)^2,$

so for any unit vector $x \in \mathbb{R}^m$, we shall construct a set $U_x \subset B(0, C_\phi) \subset \mathbb{R}^n$ such that $\inf_{y\in U_x}|\phi^\tau(y)x| \ge \delta$ for some $\delta > 0$.

Section 3.2: We analyze the properties of $U_x$ and derive a key technical result for our problem in Lemma 3.11.

Section 3.3: This section is intended to prove (3.1) by estimating the frequency of $\{Y_t\}_{t\ge 1}$ falling into $U_x$.

3.1. Construction of $U_x$. The important set $U_x$ is constructed from a finite family of disjoint open intervals $\{S_{ji}(q)\}$ defined below.

3.1.1. Open Intervals $S_{ji}(q)$. We claim that for each $i \in [1,n]$ there exists a finite family of disjoint open intervals $\{S_{ji}(q)\}_{j=1}^{p_i}$ for some $q \in \mathbb{N}_+$ fulfilling:
(i) $\phi^{(i)} \in C^{m_i}$ in $\bigcup_{j=1}^{p_i} S_{ji}(q)$;
(ii) $\bigcup_{j=1}^{p_i} S_{ji}(q)$ has no points in $Z_s^2(i)$ defined later in (3.13);
(iii) for every unit vector $x \in \mathbb{R}^m$,

(3.2)  $\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n\bigcup_{j=1}^{p_i} S_{ji}(q) : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) > 0.$

We preface the proof of the claim with several auxiliary lemmas.

Lemma 3.1. Let $\{U_j\}_{j\ge 1}$ be a sequence of open sets in $\prod_{i=1}^n E_i$ satisfying $U_1 \subset U_2 \subset \dots \subset U_j \subset \dots$ and

(3.3)  $\lim_{j\to+\infty} U_j = U,$

where $U$ is a non-empty open set such that

$\ell(\{y \in U : |\phi^\tau(y)x| > 0\}) > 0, \quad \forall x \in \mathbb{R}^m, \|x\| = 1.$

Then there is an integer $j_0$ such that

$\ell(\{y \in U_{j_0} : |\phi^\tau(y)x| > 0\}) > 0, \quad \forall x \in \mathbb{R}^m, \|x\| = 1.$
Proof.
If the assertion is not true, then by the continuity of $\phi$ in Assumption A2(i), for each $j \ge 1$ there is a vector $x_j \in \mathbb{R}^m$ with $\|x_j\| = 1$ such that

(3.4)  $\phi^\tau(y)x_j = 0, \quad \forall y \in U_j.$

It follows that there is a subsequence $\{x_{n_i}\}_{i\ge 1}$ of $\{x_j\}_{j\ge 1}$ satisfying

(3.5)  $\lim_{i\to+\infty} x_{n_i} = x_\infty,$

where $\|x_\infty\| = 1$. On the other hand, $\ell(\{y \in U : |\phi^\tau(y)x_\infty| > 0\}) > 0$, so there is a $y^* \in U$ such that

(3.6)  $|\phi^\tau(y^*)x_\infty| > 0.$

By (3.3), there is an integer $m' \ge 1$ such that $y^* \in U_j$ for all $j \ge m'$, and hence (3.4)–(3.6) yield

$0 < |\phi^\tau(y^*)x_\infty| = \lim_{i\to+\infty} |\phi^\tau(y^*)x_{n_i}| = 0,$

which leads to a contradiction.

Remark 3.1. Since every open set $E_i \subset \mathbb{R}$, $i \in [1,n]$, is a countable union of disjoint open intervals, Lemma 3.1 implies that there is an open set $E_i' \subset E_i$ such that $E_i'$ consists of a finite number of disjoint open intervals and

$\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n E_i' : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) > 0, \quad \forall x \in \mathbb{R}^m, \|x\| = 1.$

So, without loss of generality, assume in the sequel that each $E_i$ is a finite union of disjoint open intervals.

Now we introduce a series of operators. Denote by $D$ the differential operator. For any sufficiently smooth functions $\{g_l\}_{l\ge 1}$, recursively define

(3.7)  $\Lambda_1(g_1) := g_1, \qquad \Lambda_{l+1}(g_1,\dots,g_{l+1}) := \Lambda_l\Bigl(\frac{Dg_1}{Dg_{l+1}},\dots,\frac{Dg_l}{Dg_{l+1}}\Bigr), \quad l \ge 1.$

These operators $\{\Lambda_l\}_{l\ge 1}$ have the following property:

Lemma 3.2. Let the functions $\{g_i\}_{i=1}^{l+1}$, $l \in \mathbb{N}_+$, be sufficiently smooth. Then

(3.8)  $\Lambda_{l+1}(g_1,\dots,g_{l+1}) = \frac{D(\Lambda_l(g_1,g_3,\dots,g_{l+1}))}{D(\Lambda_l(g_2,g_3,\dots,g_{l+1}))}.$

Proof.
We use induction to show this lemma. By the definition of $\Lambda_2$, it is easy to check that

$\Lambda_2(g_1,g_2) = \Lambda_1\Bigl(\frac{Dg_1}{Dg_2}\Bigr) = \frac{Dg_1}{Dg_2} = \frac{D(\Lambda_1(g_1))}{D(\Lambda_1(g_2))}.$

Let $k \ge 2$ and suppose (3.8) holds for any functions $\{g_i\}_{i=1}^{l+1}$ with $l = k-1$. Then

$\Lambda_k\Bigl(\frac{Dg_1}{Dg_{k+1}},\dots,\frac{Dg_k}{Dg_{k+1}}\Bigr) = \frac{D\bigl(\Lambda_{k-1}\bigl(\frac{Dg_1}{Dg_{k+1}},\frac{Dg_3}{Dg_{k+1}},\dots,\frac{Dg_k}{Dg_{k+1}}\bigr)\bigr)}{D\bigl(\Lambda_{k-1}\bigl(\frac{Dg_2}{Dg_{k+1}},\frac{Dg_3}{Dg_{k+1}},\dots,\frac{Dg_k}{Dg_{k+1}}\bigr)\bigr)},$

and hence by (3.7),

$\Lambda_{k+1}(g_1,g_2,\dots,g_{k+1}) = \Lambda_k\Bigl(\frac{Dg_1}{Dg_{k+1}},\dots,\frac{Dg_k}{Dg_{k+1}}\Bigr) = \frac{D(\Lambda_k(g_1,g_3,\dots,g_{k+1}))}{D(\Lambda_k(g_2,g_3,\dots,g_{k+1}))},$

which completes the induction.

Before proceeding to the next lemma, we define some notation. Let $l_1 < \dots < l_s$ be $s$ positive integers. For each $k \in [1,s]$, denote by $H_k^{(l_1,\dots,l_s)}$ the set of $k$-permutations of $\{l_1,\dots,l_s\}$, that is,

$H_k^{(l_1,\dots,l_s)} := \{(i_1,\dots,i_k) : i_j \in \{l_1,\dots,l_s\},\ 1 \le j \le k;\ i_r \neq i_j \text{ if } r \neq j\}.$

Now let $i \in [1,n]$. For each $(i_1,\dots,i_k) \in H_k^{(1,\dots,m_i)}$, $k \in [1,m_i]$, define

(3.9)  $\Gamma_{(i_1,\dots,i_k)}^{(i)} := \Lambda_k(f_{ii_1},\dots,f_{ii_k}), \qquad \bar\Gamma_s^{(i)} := D\Gamma_s^{(i)},$

and for any $s \in H_i := \bigcup_{k=1}^{m_i} H_k^{(1,\dots,m_i)}$,

$W_s(i) := \{y : \bar\Gamma_s^{(i)}(y) \text{ is well-defined}\}.$
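As an aside (not part of the paper's argument), the recursion (3.7) can be mirrored numerically with central-difference derivatives. The test functions below are assumptions chosen so that $\Lambda_3(x^3, x^2, x) = D(3x^2)/D(2x) = 3x$ can be verified by hand, and identity (3.8) can be checked at a sample point:

```python
# Numerical sketch of the operators (3.7):
#   Lambda_1(g) = g,
#   Lambda_{l+1}(g_1,...,g_{l+1}) = Lambda_l(Dg_1/Dg_{l+1},...,Dg_l/Dg_{l+1}),
# with D approximated by central differences.

def D(f, h=1e-4):
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)

def Lam(*gs):
    if len(gs) == 1:
        return gs[0]
    last = gs[-1]
    ratios = [(lambda g=g: lambda x: D(g)(x) / D(last)(x))() for g in gs[:-1]]
    return Lam(*ratios)

g1, g2, g3 = (lambda x: x ** 3), (lambda x: x ** 2), (lambda x: x)
# Hand computation: Lambda_3(x^3, x^2, x) = 3x, so the value at 0.7 is 2.1.
val = Lam(g1, g2, g3)(0.7)
# Identity (3.8): Lambda_3(g1,g2,g3) = D(Lambda_2(g1,g3)) / D(Lambda_2(g2,g3)).
rhs = D(Lam(g1, g3))(0.7) / D(Lam(g2, g3))(0.7)
print(val, rhs)
```

Both evaluations reduce to the same difference quotients, so they agree to numerical precision, illustrating how (3.8) follows mechanically from (3.7).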
Given a function $g$, denote $A(g) := \{x : g(x) = 0\}$. In addition, for any two sets $X_1, X_2 \subset \mathbb{R}$, we say that $X_1$ is locally dense in $X_2$ if $X_1$ is not nowhere dense in $X_2$; that is, there exists a nonempty open interval $X_3 \subset X_2$ such that $X_3 \subset \overline{X_1}$. With the above definitions, we assert:

Lemma 3.3. Let integers $i \in [1,n]$, $k \in [2,m_i]$ and an array $s^* \in H_k^{(1,\dots,m_i)}$ be given. Under Assumption A2, there is a set $H_{ik} \subset \bigcup_{j=1}^{k-1} H_j^{(1,\dots,m_i)}$ such that

(3.10)  $W_{s^*}^c(i) \cap E_i = \bigcup_{s\in H_{ik}} \bigl(A(\bar\Gamma_s^{(i)}) \cap E_i\bigr).$

Moreover, if $U \subset E_i$ is a nonempty open interval such that

(3.11)  $W_{s^*}^c(i) \cap U$ is dense in $U$ and $\mathrm{int}\bigl(W_{s^*}^c(i) \cap U\bigr) = \emptyset$,

then for some $s' \in H_{ik}$, $A(\bar\Gamma_{s'}^{(i)})$ is locally dense in $U$.

Proof. We first prove (3.10) for the given $i$ and $k$. Let $s_{k,1} = s^*$. For each $j = k,\dots,2$, Lemma 3.2 and (3.9) indicate that there exist some indices $s_{j-1,1}, s_{j-1,2} \in H_i$ such that

(3.12)  $\Gamma_{s_{j,1}}^{(i)} = \frac{\bar\Gamma_{s_{j-1,1}}^{(i)}}{\bar\Gamma_{s_{j-1,2}}^{(i)}}.$

Denote $H_{ik} := \{s_{j,1}, s_{j,2} : j = 1,\dots,k-1\}$. Note that by (3.7), (3.9) and Assumption A2(i), it is easy to see that

$\{y \in E_i : \Gamma_{s^*}^{(i)}(y) \text{ is well-defined}\} = \{y \in E_i : D\Gamma_{s^*}^{(i)}(y) \text{ is well-defined}\}.$

In addition, Lemma 3.2 infers that for each $j = 2,\dots,k$,

$\{y \in E_i : \Gamma_{s_{j-1,1}}^{(i)}(y) \text{ is well-defined}\} = \{y \in E_i : \Gamma_{s_{j-1,2}}^{(i)}(y) \text{ is well-defined}\}.$

Then, by (3.12),

$W_{s^*}^c(i) \cap E_i = \{y \in E_i : \Gamma_{s^*}^{(i)}(y) \text{ is undefined}\} = \{y \in E_i : \bar\Gamma_{s_{k-1,1}}^{(i)}(y) \text{ is undefined}\} \cup A(\bar\Gamma_{s_{k-1,2}}^{(i)}) = \dots = \bigcup_{s\in H_{ik}} \bigl(A(\bar\Gamma_s^{(i)}) \cap E_i\bigr),$

which is exactly (3.10). So, if (3.11) holds, then for every $s \in H_{ik}$, $\mathrm{int}(A(\bar\Gamma_s^{(i)}) \cap U) = \emptyset$. Finally, we show that for some $s' \in H_{ik}$, $A(\bar\Gamma_{s'}^{(i)})$ is locally dense in $U$. Otherwise, $A(\bar\Gamma_s^{(i)})$ is nowhere dense in $U$ for every $s \in H_{ik}$. This means there are nonempty open intervals $U_1 \subset \dots \subset U_{k-1} \subset U$ such that $U_j \cap A(\bar\Gamma_{s_{l,1}}^{(i)}) = U_j \cap A(\bar\Gamma_{s_{l,2}}^{(i)}) = \emptyset$ for all $l = j,\dots,k-1$. As a consequence, by (3.10),

$U_1 \cap W_{s^*}^c(i) = U_1 \cap \Bigl(\bigcup_{j=1}^{k-1} \bigl(A(\bar\Gamma_{s_{j,1}}^{(i)}) \cup A(\bar\Gamma_{s_{j,2}}^{(i)})\bigr)\Bigr) = \emptyset,$

which contradicts (3.11) since $U_1 \subset U$.

Now we are ready to construct $\{S_{ji}(q)\}_{j=1}^{p_i}$.
For this, we classify the sets $A(\bar\Gamma_s^{(i)})$, $s \in H_i$, $i \in [1,n]$, into three types:

(3.13)  $Z_s^1(i) = \mathrm{int}(A(\bar\Gamma_s^{(i)})), \quad Z_s^2(i) = d(A(\bar\Gamma_s^{(i)})) \setminus Z_s^1(i), \quad Z_s^3(i) = A(\bar\Gamma_s^{(i)}) \setminus d(A(\bar\Gamma_s^{(i)})),$

where $d(A)$ denotes the derived set of $A$. Observe that $Z_s^1(i)$ can be expressed as a countable union of disjoint open intervals, and $Z_s^3(i)$ is in fact the set of the isolated points of $A(\bar\Gamma_s^{(i)})$. Both of these two sets have good topological properties. However, the structure of $Z_s^2(i)$ is not that clear. Therefore, we define the following sets to exclude $Z_s^2(i)$:

$S(i) := E_i \setminus \Bigl(\bigcup_{s\in H_i} Z_s^2(i)\Bigr), \quad i \in [1,n],$

which are clearly open sets.

The key idea of the construction of $\{S_{ji}(q)\}_{j=1}^{p_i}$ is to find a proper subset of $S(i)$ for each $i \in [1,n]$. To begin with, we prove an important lemma.

Lemma 3.4. Under Assumption A2, for any unit vector $x \in \mathbb{R}^m$,

(3.14)  $\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n S(i) : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) > 0.$

Proof. We show the lemma by reduction to absurdity. Suppose there exists some $x \in \mathbb{R}^m$ with $\|x\| = 1$ such that

(3.15)  $\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n S(i) : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) = 0.$

As $\phi(\cdot)$ is continuous on the open set $\prod_{i=1}^n S(i) \subset \prod_{i=1}^n E_i$, then

(3.16)  $\phi^\tau(y)x = 0, \quad \forall y \in \prod_{i=1}^n S(i).$

Note that Assumption A2(ii) yields

$\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n E_i : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) > 0,$

which together with (3.15) implies

$\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n E_i \setminus \prod_{i=1}^n S(i) : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) > 0.$

Consequently, there is a $y^* = (y_1^*,\dots,y_n^*) \in \prod_{i=1}^n E_i \setminus \prod_{i=1}^n S(i)$ such that $|\phi^\tau(y^*)x| > 0$. By the continuity of $\phi(\cdot)$ on $\prod_{i=1}^n E_i$, there is an $\varepsilon > 0$ such that

(3.17)  $|\phi^\tau(y)x| > 0, \quad \forall y \in \prod_{i=1}^n (y_i^* - \varepsilon, y_i^* + \varepsilon) \subset \prod_{i=1}^n E_i.$

On account of (3.16) and (3.17), we deduce

$\prod_{i=1}^n (y_i^* - \varepsilon, y_i^* + \varepsilon) \subset \prod_{i=1}^n E_i \setminus \prod_{i=1}^n S(i) = \bigcup_{i=1}^n \Bigl(\prod_{j=1}^{i-1} E_j \times \bigcup_{s\in H_i} Z_s^2(i) \times \prod_{j=i+1}^n E_j\Bigr),$

which immediately yields that for some index $i \in [1,n]$,

(3.18)  $V_i := (y_i^* - \varepsilon, y_i^* + \varepsilon) \subset \bigcup_{s\in H_i} Z_s^2(i).$

Next, we show that (3.18) is impossible. To this end, note that $Z_s^2(i)$ is closed for each $s \in H_i$, and hence (3.18) implies that there is an integer $k \in [1,m_i]$ and an array $s^* \in H_k^{(1,\dots,m_i)}$ such that $Z_{s^*}^2(i)$ is locally dense in $V_i$. Let $k$ be the smallest integer for such an $s^*$.

Now fix the above $i \in [1,n]$, $k \in [1,m_i]$ and $s^* \in H_k^{(1,\dots,m_i)}$. Since $Z_{s^*}^2(i)$ is locally dense in $V_i$, there is an open interval $V_i' \subset V_i$ such that $Z_{s^*}^2(i)$ is dense in $V_i'$. Moreover, $Z_{s^*}^2(i)$ is closed, so $V_i' \subset Z_{s^*}^2(i)$ and thus $V_i' \cap Z_{s^*}^1(i) = \emptyset$. In addition, $\bar\Gamma_{s^*}^{(i)}$ is continuous in $W_{s^*}(i) \cap E_i$, so by (3.13), $W_{s^*}(i) \cap Z_{s^*}^2(i) \cap E_i \subset A(\bar\Gamma_{s^*}^{(i)})$. Consequently,

(3.19)  $\mathrm{int}\bigl(W_{s^*}(i) \cap Z_{s^*}^2(i) \cap E_i\bigr) \subset \mathrm{int}\bigl(A(\bar\Gamma_{s^*}^{(i)}) \cap E_i\bigr) = Z_{s^*}^1(i) \cap E_i.$

Moreover, $V_i'$ is an open interval belonging to $E_i$ and $V_i' \cap Z_{s^*}^1(i) = \emptyset$, so

(3.20)  $V_i' = V_i' \setminus Z_{s^*}^1(i) \subset V_i' \setminus \mathrm{int}\bigl(W_{s^*}(i) \cap Z_{s^*}^2(i)\bigr) \subset \overline{V_i' \setminus \bigl(W_{s^*}(i) \cap Z_{s^*}^2(i)\bigr)} = \overline{V_i' \cap W_{s^*}^c(i)}.$

Note that $\bar\Gamma_s^{(i)}$ is well-defined in $\mathbb{R}$ for all $s \in H_1^{(1,\dots,m_i)}$ by Assumption A2(i), which shows $W_s^c(i) \cap E_i = \emptyset$. Then (3.20) implies $k \ge 2$. Furthermore, since $A(\bar\Gamma_{s^*}^{(i)}) \subset W_{s^*}(i)$ and $V_i' \subset Z_{s^*}^2(i) \subset d(A(\bar\Gamma_{s^*}^{(i)}))$, it yields

(3.21)  $\mathrm{int}\bigl(W_{s^*}^c(i) \cap V_i'\bigr) \subset \mathrm{int}\bigl(A^c(\bar\Gamma_{s^*}^{(i)}) \cap V_i'\bigr) = \Bigl(\overline{A(\bar\Gamma_{s^*}^{(i)}) \cup (V_i')^c}\Bigr)^c \subset \bigl(Z_{s^*}^2(i) \cup (V_i')^c\bigr)^c = \emptyset.$

Applying Lemma 3.3 with $U = V_i'$, (3.20) and (3.21) indicate that we can find some $j < k$ and $s' \in H_j^{(1,\dots,m_i)}$ such that $A(\bar\Gamma_{s'}^{(i)})$ is locally dense in $V_i'$ and $\mathrm{int}(A(\bar\Gamma_{s'}^{(i)}) \cap V_i') = \emptyset$. So there is an open interval $V_i'' \subset V_i'$ such that $V_i'' \subset d(A(\bar\Gamma_{s'}^{(i)}))$ and

$\mathrm{int}(A(\bar\Gamma_{s'}^{(i)})) \cap V_i'' = \mathrm{int}(A(\bar\Gamma_{s'}^{(i)}) \cap V_i'') = \emptyset,$

and then $V_i'' \subset Z_{s'}^2(i)$. That is, $Z_{s'}^2(i)$ is locally dense in $V_i$, which contradicts the definition of $k$. This completes the proof of Lemma 3.4.

Next, we consider the sequence of open sets $\{S(i) \cap (-j,j)\}_{j\ge 1}$ for $i = 1,\dots,n$. Clearly, $S(i) \cap (-j,j) \subset S(i) \cap (-(j+1), j+1)$ and $\lim_{j\to+\infty} S(i) \cap (-j,j) = S(i)$. Then, by Lemmas 3.1 and 3.4, there is an integer $d \ge 1$ such that for every unit vector $x \in \mathbb{R}^m$,

(3.22)  $\ell\Bigl(\Bigl\{y \in \prod_{i=1}^n \bigl(S(i) \cap (-d,d)\bigr) : |\phi^\tau(y)x| > 0\Bigr\}\Bigr) > 0.$

Since $S(i)$ is open, for each integer $i \in [1,n]$ there exist some disjoint open intervals $\{S_{ji}\}_{j\in\Theta_i}$, where $\Theta_i = \{1,\dots,k_i\}$ ($k_i$ can be taken infinite), such that $S(i) \cap (-d,d) = \bigcup_{j\in\Theta_i} S_{ji}$. Write $S_{ji} = (c_{ji}, d_{ji})$ and denote

(3.23)  $S_{ji}(q) := \Bigl(c_{ji} + \frac{d_{ji}-c_{ji}}{q+2},\ d_{ji} - \frac{d_{ji}-c_{ji}}{q+2}\Bigr), \quad j \in \Theta_i,\ q \in \mathbb{N}_+.$

Given (3.22), the following lemma is natural.

Lemma 3.5. If (3.22) holds, then there exist some integers $p_1,\dots,p_n$ and $q \ge 1$ such that for any unit $x \in \mathbb{R}^m$,

(3.24)  $\ell(\{y \in S : |\phi^\tau(y)x| > 0\}) > 0 \quad \text{and} \quad S \subset \prod_{i=1}^n E_i,$

where $S := \prod_{i=1}^n \bigcup_{j=1}^{p_i} S_{ji}(q)$.

Proof. Let $i \in [1,n]$. It is obvious that $\bigcup_{j\in\Theta_i} S_{ji}(q) \subset \bigcup_{j\in\Theta_i} S_{ji}(q+1)$, $q \in \mathbb{N}_+$. If $|\Theta_i| < +\infty$, then

$\lim_{q\to+\infty} \bigcup_{j\in\Theta_i} S_{ji}(q) = S(i) \cap (-d,d).$

As for the case where $\Theta_i = \mathbb{N}_+$, it infers

$\lim_{k\to+\infty} \bigcup_{j=1}^{k} S_{ji}(k) = S(i) \cap (-d,d).$

So, in view of the above two cases, by (3.22) and Lemma 3.1, there are some integers $p_1,\dots,p_n$ and $q \ge 1$ such that (3.24) holds.

3.1.2. Selection of $U_x$.
With the foregoing preliminaries in place, we can set out to construct $U_x$. First, for every $x \in \mathbb{R}^m$ with $\|x\| = 1$, define

$U_x(\delta) := \{y : |\phi^\tau(y)x| > \delta\} \cap S, \quad \delta > 0.$

The remaining task is to take a proper $\delta > 0$ such that $U_x = U_x(\delta)$ meets our requirement. To this end, let $\{d_k\}_{k=1}^{2n}$ be a sequence of numbers and, for $k \in [n+1, 2n]$, define

(3.25)  $\varsigma_k := d_k - x^\tau\phi(d_{k-1},\dots,d_{k-n}), \quad x \in \mathbb{R}^m.$

Denote $y = (d_n,\dots,d_1)^\tau$ and $\varsigma = (\varsigma_{2n},\dots,\varsigma_{n+1})^\tau$. Evidently, (3.25) implies that there is a function $g : \mathbb{R}^{2n+m} \to \mathbb{R}^n$ such that

(3.26)  $(d_{2n},\dots,d_{n+1})^\tau = g(\varsigma, y, x).$

We choose $\delta$ according to the lemma below.

Lemma 3.6. Under Assumption A2, the following two statements hold:
(i) given $y \in \mathbb{R}^n$, $x \in \mathbb{R}^m$ and a box $O = \prod_{i=1}^n I_i$ with $\{I_i\}_{i=1}^n$ some intervals, then

(3.27)  $\ell(\{\varsigma : g(\varsigma, y, x) \in O\}) = \ell(O);$

(ii) for any constants $M, K > 0$, there is a $\delta^* > 0$ such that

(3.28)  $\inf_{\|z\|=1,\,\|y\|\le M,\,\|x\|\le K} \ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y,x))z| > \delta^*,\ g(\varsigma,y,x) \in S\}\bigr) > 0.$

Proof. (i) Note that in view of (3.25),

$d_k = \varsigma_k + o_{k-1}, \quad k = n+1,\dots,2n,$

where $o_{k-1} \in \mathbb{R}$ is a point determined by $\varsigma_{n+1},\dots,\varsigma_{k-1}$, $y$ and $x$ (for $k = n+1$, $\varsigma_n$ does not exist and $o_n$ depends only on $y$ and $x$). So $\{\varsigma_k : \varsigma_k + o_{k-1} \in I_{k-n}\} = I_{k-n} - o_{k-1}$ is an interval of length $|I_{k-n}|$. By the definition of the Lebesgue measure on $\mathbb{R}^n$, it is straightforward that

$\ell(\{\varsigma : g(\varsigma,y,x) \in O\}) = \prod_{k=1}^n |I_k| = \ell(O).$

(ii) Arguing by contradiction, we assume that (3.28) is false. Then for each integer $k \ge 1$ there exists a point $(z(k), y(k), x(k))$ in the compact set $B(0,1)\times B(0,M)\times B(0,K) \subset \mathbb{R}^m\times\mathbb{R}^n\times\mathbb{R}^m$ with $\|z(k)\| = 1$ such that

(3.29)  $\ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y(k),x(k)))z(k)| > 1/k,\ g(\varsigma,y(k),x(k)) \in S\}\bigr) < 1/k.$
This sequence of points thus has a subsequence $\{z(k_r), y(k_r), x(k_r)\}_{r\ge 1}$ and an accumulation point $(z^*, y^*, x^*)$ such that

(3.30)  $\lim_{r\to+\infty} z(k_r) = z^*, \quad \lim_{r\to+\infty} y(k_r) = y^*, \quad \lim_{r\to+\infty} x(k_r) = x^*.$

So $\|z^*\| = 1$, $\|y^*\| \le M$, $\|x^*\| \le K$. If

$\ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y^*,x^*))z^*| > 0,\ g(\varsigma,y^*,x^*) \in S\}\bigr) = 0,$

then $\phi^\tau(y)z^* \equiv 0$ for $y \in S$ due to (3.25), (3.26) and the continuity of $\phi$. This clearly contradicts Lemma 3.5. Therefore, by (3.23),

$\lim_{k\to+\infty} \ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y^*,x^*))z^*| > 1/k,\ g(\varsigma,y^*,x^*) \in S_k\}\bigr) = \ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y^*,x^*))z^*| > 0,\ g(\varsigma,y^*,x^*) \in S\}\bigr) > 0,$

where $S_k := \prod_{i=1}^n \bigcup_{j=1}^{p_i}\Bigl(c_{ji} + \frac{d_{ji}-c_{ji}}{q+2} + \frac{1}{k},\ d_{ji} - \frac{d_{ji}-c_{ji}}{q+2} - \frac{1}{k}\Bigr)$. This implies that there exists an integer $h \ge 1$ such that

(3.31)  $\ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y^*,x^*))z^*| > 1/h,\ g(\varsigma,y^*,x^*) \in S_h\}\bigr) > 0.$

Note that all points $\{y(k_r), x(k_r)\}_{r\ge 1}$ are restricted to $B(0,M)\times B(0,K)$; (3.25) and (3.26) then indicate that there is a compact set $O'$ such that $\{\varsigma : g(\varsigma, y(k_r), x(k_r)) \in S\} \subset O'$. Further, $g$ and $\phi$ are continuous due to (3.25), (3.26) and Assumption A2(i), hence (3.30) shows

$\lim_{r\to\infty} \sup_{\varsigma\in O'} \|g(\varsigma,y^*,x^*) - g(\varsigma,y(k_r),x(k_r))\| = 0,$
$\lim_{r\to\infty} \sup_{\varsigma\in O'} \|\phi^\tau(g(\varsigma,y^*,x^*))z^* - \phi^\tau(g(\varsigma,y(k_r),x(k_r)))z(k_r)\| = 0.$

As a consequence, for all sufficiently large $r$,

$\ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y^*,x^*))z^*| > 1/h,\ g(\varsigma,y^*,x^*) \in S_h\}\bigr) < \ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y(k_r),x(k_r)))z(k_r)| > 1/k_r,\ g(\varsigma,y(k_r),x(k_r)) \in S\}\bigr) < 1/k_r,$

which contradicts (3.31) by letting $r\to+\infty$. Lemma 3.6 thus follows.

Remark 3.2. In Lemma 3.6, Assumption A2 can be weakened to Assumption A2' when $n = 1$. Statement (i) is trivial. For (ii), note that (3.24) still holds under Assumption A2'.
But (3.25), (3.29) and (3.31) yield that for all sufficiently large $r$,

$\frac{1}{k_r} > \ell\bigl(\{\varsigma : |\phi^\tau(g(\varsigma,y(k_r),x(k_r)))z(k_r)| > 1/k_r,\ g(\varsigma,y(k_r),x(k_r)) \in S\}\bigr) = \ell\bigl(\{y : |\phi^\tau(y)z(k_r)| > 1/k_r,\ y \in S\}\bigr) \ge \ell\Bigl(\Bigl\{y \in S : |\phi^\tau(y)z^*| > \frac{1}{k_r} + \frac{1}{h}\Bigr\}\Bigr),$

where $\{z(k_r), y(k_r), x(k_r)\}_{r\ge 1}$ is defined in the proof of Lemma 3.6. Letting $r\to+\infty$ in the above inequality infers

(3.32)  $0 \ge \lim_{r\to+\infty} \ell\Bigl(\Bigl\{y \in S : |\phi^\tau(y)z^*| > \frac{1}{k_r} + \frac{1}{h}\Bigr\}\Bigr) = \ell\Bigl(\Bigl\{y \in S : |\phi^\tau(y)z^*| > \frac{1}{h}\Bigr\}\Bigr),$

which contradicts (3.31).

At the end of this section, fix two numbers $M$ and $K$. According to Lemma 3.6(ii), we select a $\delta^*$ such that (3.28) holds. Now, for any unit vector $x \in \mathbb{R}^m$, define $U_x := U_x(\delta^*)$.

3.2. The Properties of $U_x$. To analyze the properties of $U_x$, we first prove a lemma.

Lemma 3.7. Fix an integer $i \in [1,n]$. Let $x_i = (x_{i1},\dots,x_{im_i})^\tau \in \mathbb{R}^{m_i}$ be a non-zero vector and $d > c$ be two numbers satisfying $[c,d] \subset E_i$. Also, let $\{r_l\}_{l=1}^{2^{m_i}-1}$ be a sequence of numbers with $d \ge r_1 > r_2 > \dots > r_{2^{m_i}-1} \ge c$ and

(3.33)  $\sum_{j=1}^{m_i} f_{ij}'(r_l)x_{ij} = 0, \quad 1 \le l \le 2^{m_i}-1,$

where $f_{ij}' = Df_{ij}$, $j = 1,\dots,m_i$. Then the following two statements hold:
(i) there exists an array $s \in H_i$ such that $A(\bar\Gamma_s^{(i)}) \cap [c,d] \neq \emptyset$;
(ii) if for every $s \in H_i$, $A(\bar\Gamma_s^{(i)}) \cap [c,d]$ is either $\emptyset$ or $[c,d]$, then

(3.34)  $\sum_{j=1}^{m_i} f_{ij}'(y)x_{ij} = 0, \quad \forall y \in [c,d].$

Proof. (i) Suppose $\bigcup_{s\in H_i} A(\bar\Gamma_s^{(i)}) \cap [c,d] = \emptyset$. Then for each integer $k \in [1,m_i]$ there exist $2^k-1$ numbers $\{\varepsilon_{k,l}\}_{l=1}^{2^k-1}$ satisfying $d \ge \varepsilon_{k,1} > \dots > \varepsilon_{k,2^k-1} \ge c$ and

(3.35)  $\sum_{j=1}^{k} \bar\Gamma_{(j,k+1,\dots,m_i)}^{(i)}(\varepsilon_{k,l})x_{ij} = 0, \quad 1 \le l \le 2^k-1,$

where $\bar\Gamma_{(j,m_i+1,\dots,m_i)}^{(i)} := f_{ij}'$. As a matter of fact, when $k = m_i$, (3.33) leads to (3.35) immediately. We now prove (3.35) by backward induction. Assume (3.35) holds for $k = m'$, where $m'$ is an integer in $[2, m_i]$.
Hence we can find $2^{m'}-1$ numbers $\{\varepsilon_{m',l}\}_{l=1}^{2^{m'}-1}$ such that $d \ge \varepsilon_{m',1} > \dots > \varepsilon_{m',2^{m'}-1} \ge c$ and

$\sum_{j=1}^{m'} \bar\Gamma_{(j,m'+1,\dots,m_i)}^{(i)}(\varepsilon_{m',l})x_{ij} = 0, \quad 1 \le l \le 2^{m'}-1.$

Since $\bigcup_{s\in H_i} A(\bar\Gamma_s^{(i)}) \cap [c,d] = \emptyset$, every $\Gamma_{(j,m',\dots,m_i)}^{(i)}$ is well-defined in $[c,d]$; then for $1 \le l \le 2^{m'}-1$,

$\sum_{j=1}^{m'-1} \Gamma_{(j,m',\dots,m_i)}^{(i)}(\varepsilon_{m',l})x_{ij} = \sum_{j=1}^{m'-1} \frac{\bar\Gamma_{(j,m'+1,\dots,m_i)}^{(i)}}{\bar\Gamma_{(m',m'+1,\dots,m_i)}^{(i)}}(\varepsilon_{m',l})x_{ij} = -x_{im'}.$

Taking account of Rolle's theorem, there are some $\{\varepsilon_{m'-1,l}\}_{l=1}^{2^{m'-1}-1}$ with $\varepsilon_{m'-1,l} \in (\varepsilon_{m',2l}, \varepsilon_{m',2l-1})$ such that

$\sum_{j=1}^{m'-1} \bar\Gamma_{(j,m',\dots,m_i)}^{(i)}(\varepsilon_{m'-1,l})x_{ij} = 0, \quad 1 \le l \le 2^{m'-1}-1.$

Therefore (3.35) holds for $k = m'-1$, which completes the induction. Now let $k = 1$ in (3.35): there is a number $\varepsilon_{1,1} \in [c,d]$ such that $\bar\Gamma_{(1,2,\dots,m_i)}^{(i)}(\varepsilon_{1,1})x_{i1} = 0$. Since $\bigcup_{s\in H_i} A(\bar\Gamma_s^{(i)}) \cap [c,d] = \emptyset$, $\bar\Gamma_{(1,2,\dots,m_i)}^{(i)}(\varepsilon_{1,1}) \neq 0$, and hence $x_{i1} = 0$. By the symmetry of $x_{i1},\dots,x_{im_i}$ in (3.35), we conclude that $x_{ij} = 0$ for all $j \in [1,m_i]$. But this is impossible due to $\|x_i\| \neq 0$, and thus $\bigcup_{s\in H_i} A(\bar\Gamma_s^{(i)}) \cap [c,d] \neq \emptyset$.

(ii) Let $I$ be an open interval containing $[c,d]$. It suffices to prove the claim that for every function sequence $f_{i1},\dots,f_{im_i} \in C^{m_i}(I)$ satisfying (3.33), if $A(\bar\Gamma_s^{(i)}) \cap [c,d]$ is either $\emptyset$ or $[c,d]$ for all $s \in H_i$, then (3.34) holds. We show it by induction. When $m_i = 1$, (3.33) reduces to $f_{i1}'(r_1)x_{i1} = 0$. Since $x_{i1} \neq 0$, $A(f_{i1}') \cap [c,d] \neq \emptyset$, which means $A(f_{i1}') \supset [c,d]$ by assumption. So $f_{i1}'(y)x_{i1} \equiv 0$ for $y \in [c,d]$. Suppose the claim holds for all $m_i \in [1, h-1]$, where $h \ge 2$, and consider $m_i = h$. In this case the non-zero vector is $x_i = (x_{i1},\dots,x_{ih})^\tau$. First, assume that there is an integer $j' \in [1,h]$ such that $|x_{ij'}| < \|x_i\|$ and

(3.36)  $A(f_{ij'}') \cap [c,d] = \emptyset.$

Without loss of generality, let $j' = h$. Define the following $h-1$ functions:

$F_j := \Gamma_{(j,h)}^{(i)}, \quad 1 \le j \le h-1.$
Owing to (3.36), $F_j \in C^{h-1}(I)$, $1 \le j \le h-1$, and by (3.33) with $m_i = h \ge 2$,

(3.37)  $\sum_{j=1}^{h-1} F_j(r_l)x_{ij} = -x_{ih}, \quad 1 \le l \le 2^h-1.$

Therefore, by applying Rolle's theorem, there exist $2^{h-1}-1$ numbers $\varepsilon_l \in [r_{2l}, r_{2l-1}]$, $l \in [1, 2^{h-1}-1]$, such that

(3.38)  $\sum_{j=1}^{h-1} DF_j(\varepsilon_l)x_{ij} = 0, \quad 1 \le l \le 2^{h-1}-1.$

Here, $(x_{i1},\dots,x_{i(h-1)})^\tau$ is non-zero by $|x_{ih}| < \|x_i\|$. Since for every $(i_1,\dots,i_l) \in \bigcup_{k=1}^{h-1} H_k^{(1,\dots,h-1)}$ we have $(i_1,\dots,i_l,h) \in \bigcup_{k=1}^{h} H_k^{(1,\dots,h)}$, it follows from (3.7) that

(3.39)  $\Lambda_l(F_{i_1},\dots,F_{i_l}) = \Lambda_{l+1}(f_{ii_1},\dots,f_{ii_l},f_{ih}) = \Gamma_{(i_1,\dots,i_l,h)}^{(i)}.$

Because $A(\bar\Gamma_s^{(i)}) \cap [c,d]$ is either $\emptyset$ or $[c,d]$ for all $s \in \bigcup_{k=1}^{h} H_k^{(1,\dots,h)}$, (3.39) yields

$A(D\Lambda_l(F_{i_1},\dots,F_{i_l})) \cap [c,d] = \emptyset \ \text{ or } \ [c,d].$

Consequently, by the induction hypothesis with $m_i = h-1$ and $f_{ii_j} = F_j$, $j \in [1,h-1]$, satisfying (3.38), we conclude

(3.40)  $\sum_{j=1}^{h-1} DF_j(y)x_{ij} = 0, \quad \forall y \in [c,d].$

In view of (3.37) and (3.40), we deduce that $\sum_{j=1}^{h-1} F_j(y)x_{ij} = -x_{ih}$ for any $y \in [c,d]$, and hence

(3.41)  $\sum_{j=1}^{h} f_{ij}'(y)x_{ij} \equiv 0, \quad y \in [c,d].$

Now it remains to consider the case that for each integer $j \in [1,h]$, either $|x_{ij}| = \|x_i\|$ or $[c,d] \subset A(f_{ij}')$. If $|x_{ij}| < \|x_i\|$ for all $j \in [1,h]$, then $f_{ij}' \equiv 0$ on $[c,d]$ for all $j \in [1,h]$, which leads to (3.41). So assume there is an integer $j' \in [1,h]$ with $|x_{ij'}| = \|x_i\|$. Without loss of generality, let $j' = h$; then $x_{ij} = 0$ for all $j \in [1,h-1]$ and (3.33) reduces to

$f_{ih}'(r_l) = 0, \quad 1 \le l \le 2^h-1.$

The induction hypothesis thus yields $[c,d] \subset A(f_{ih}')$, and hence

$\sum_{j=1}^{h} f_{ij}'(y)x_{ij} = f_{ih}'(y)x_{ih} = 0, \quad \forall y \in [c,d].$

Therefore the claim is true for $m_i = h$ and we complete the induction.

We now return to analyze $Z_s^1(i) \cap (\bigcup_{j=1}^{p_i} S_{ji}(q))$. Observe that for each array $s \in H_i$, if $Z_s^1(i) \cap (\bigcup_{j=1}^{p_i} S_{ji}(q)) \neq \emptyset$, it is a countable union of disjoint open intervals.
Denote the set of these intervals by $\mathcal{G}_s(i) \triangleq \{I_{js}(i)\}_{j\ge 1}$, where
(3.42) $I_{js}(i) = (a_{js}(i), b_{js}(i)), \quad j = 1, 2, \dots.$
Let $\mathcal{G}(i) \triangleq \bigcup_{s\in\mathcal{H}_i} \mathcal{G}_s(i)$ for each $i \in [1,n]$. Furthermore, define
$$H(i) \triangleq \bigcup_{s\in\mathcal{H}_i} \partial Z_s(i) \cap \Big(\bigcup_{j=1}^{p_i} S_{ji}(q)\Big), \quad i \in [1,n].$$

Lemma 3.8. For each $i \in [1,n]$,
(3.43) $|\mathcal{G}(i)| < +\infty \ \text{and} \ |H(i)| < +\infty.$

Proof. Suppose $|\mathcal{G}(i)| = +\infty$ for some $i \in [1,n]$; then there is an array $s^* \in \mathcal{H}_i$ such that $|\mathcal{G}_{s^*}(i)| = +\infty$. Let $y \in \bigcup_{j=1}^{p_i} S_{ji}(q)$ be an accumulation point of $\{b_{js^*}(i)\}_{j\ge 1}$. By the continuity of $\bar\Gamma^{(i)}_{s^*}$ on the set $\bigcup_{j=1}^{p_i} S_{ji}(q) \subset E_i$, $y \in \bigcup_{j=1}^{p_i} S_{ji}(q) \cap A(\bar\Gamma^{(i)}_{s^*})$. Moreover, it is evident that $y \notin Z_{s^*}(i)$, so
$$y \in \partial Z_{s^*}(i) \cap \bigcup_{j=1}^{p_i} S_{ji}(q) \subset E_i \cap \bigcup_{s\in\mathcal{H}_i} \partial Z_s(i).$$
However,
(3.44) $E_i \cap \bigcup_{s\in\mathcal{H}_i} \partial Z_s(i) = E_i \setminus S(i) \subset E_i \setminus \bigcup_{j=1}^{p_i} S_{ji},$
and
(3.45) $\Big(E_i \setminus \bigcup_{j=1}^{p_i} S_{ji}\Big) \cap \bigcup_{j=1}^{p_i} S_{ji}(q) = \emptyset.$
The contradiction is derived immediately by comparing (3.44), (3.45) and the fact that $y \in E_i \cap \big(\bigcup_{s\in\mathcal{H}_i} \partial Z_s(i)\big)$. Thus, $|\mathcal{G}(i)| < +\infty$.
As to $|H(i)| < +\infty$, the proof is quite similar to that given for $|\mathcal{G}(i)| < +\infty$ and is omitted. □

The following lemma is based on the above two lemmas.

Lemma 3.9. Given $i \in [1,n]$, let $x_i \in \mathbb{R}^{m_i}$ be a non-zero vector. Denote $(\phi^{(i)})' = (f'_{i1},\dots,f'_{im_i})^\tau$,
$$K_i = \mathrm{int}\big(A(x_i^\tau(\phi^{(i)})')\big) \cap \Big(\bigcup_{j=1}^{p_i} S_{ji}(q)\Big) \ \text{and} \ L_i = \Big(A(x_i^\tau(\phi^{(i)})') \cap \Big(\bigcup_{j=1}^{p_i} S_{ji}(q)\Big)\Big) \setminus K_i;$$
then $|L_i| \le m_i p_i (3|\mathcal{G}(i)| + |H(i)| + 2)$.

Proof. Let $Q_{ij} \triangleq (A(x_i^\tau(\phi^{(i)})') \cap S_{ji}(q)) \setminus K_i$, $j = 1,\dots,p_i$. We first show that the cardinality of each $Q_{ij}$ is finite.
Otherwise, for some $j \in [1,p_i]$, there is a monotone sequence $\{r_l\}_{l\ge 1}$ in $Q_{ij}$ such that $r_l \ne r_{l'}$ for each $l \ne l'$ and $\lim_{l\to+\infty} r_l = y^*$ for some $y^* \in S_{ji}(q)$. Without loss of generality, let $r_l < r_{l'}$ if $l > l'$. Divide this sequence into infinitely many groups $\{r_{m_ik+1}, r_{m_ik+2}, \dots, r_{m_i(k+1)}\}$, $k = 0,1,\dots$, and for each $k \ge 0$ define
(3.46) $D_k \triangleq [r_{m_i(k+1)}, r_{m_ik+1}].$
So, given $k \ge 0$, $D_k \subset S_{ji}(q) \subset E_i$ and $r_{m_ik+1} > \cdots > r_{m_i(k+1)}$ satisfy
(3.47) $x_i^\tau(\phi^{(i)})'(r_l) = 0, \quad m_ik+1 \le l \le m_i(k+1).$
Note that the definition of $Q_{ij}$ yields $x_i^\tau(\phi^{(i)})' \not\equiv 0$ on $D_k$; applying Lemma 3.7 with $[c,d] = D_k$ indicates that there is an array $s_k \in \mathcal{H}_i$ fulfilling $A(\bar\Gamma^{(i)}_{s_k}) \cap D_k \ne \emptyset$ and $D_k \not\subset A(\bar\Gamma^{(i)}_{s_k})$. Hence, at least one of the following three cases occurs:
Case 1: $\partial Z_{s_k}(i) \cap D_k \ne \emptyset$.
Case 2: There is an interval $I_{js_k}(i) \in \mathcal{G}_{s_k}(i)$ such that $I_{js_k}(i) \subset D_k$.
Case 3: There is an interval $I_{js_k}(i) \in \mathcal{G}_{s_k}(i)$ satisfying $I_{js_k}(i) \cap D_k \ne \emptyset$ and $D_k \not\subset I_{js_k}(i)$.
Let $O_l$, $l = 1,2,3$, be the number of indices $k \ge 0$ for which Case $l$ occurs. Since $D_k \cap D_{k'} = \emptyset$ for $k \ne k'$,
(3.48) $O_1 \le |H(i)| \ \text{and} \ O_2 \le |\mathcal{G}(i)|.$
Furthermore, for each interval $I_{js}(i) \in \mathcal{G}_s(i)$, there exist at most two distinct $D_k$ such that $I_{js}(i) \cap D_k \ne \emptyset$ and $D_k \not\subset I_{js}(i)$. Hence,
(3.49) $O_3 \le 2|\mathcal{G}(i)|.$
Combining (3.48) and (3.49) yields
(3.50) $O_1 + O_2 + O_3 \le 3|\mathcal{G}(i)| + |H(i)| < +\infty.$
However, $O_1 + O_2 + O_3 = +\infty$ because the number of the sets $D_k$ is infinite. So the cardinality of each $Q_{ij}$ is finite.
Now, suppose there were an index $j \in [1,p_i]$ such that $|Q_{ij}| > m_i(3|\mathcal{G}(i)| + |H(i)| + 2)$. Write the points of $Q_{ij}$ from left to right as $v_0, v_1, \dots, v_{|Q_{ij}|-1}$. Define
(3.51) $h \triangleq \Big\lfloor \frac{|Q_{ij}|-1}{m_i} \Big\rfloor > 3|\mathcal{G}(i)| + |H(i)|$
and $D'_k \triangleq [v_{m_i(k+1)}, v_{m_ik+1}]$, $k = 0,1,\dots,h-1$. By an argument analogous to (3.47)–(3.50), we arrive at $h \le 3|\mathcal{G}(i)| + |H(i)|$, which contradicts (3.51).
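The counting in (3.47)–(3.50) rests on repeated applications of Rolle's theorem: a smooth function with $m$ zeros on an interval has a derivative with at least $m-1$ zeros interlacing them. A minimal numerical illustration of this interlacing, using a polynomial as a stand-in for the functions involved (an illustrative choice, not the model's $\phi$):

```python
import numpy as np

# A polynomial vanishing at 5 points; Rolle's theorem guarantees its
# derivative vanishes at (at least) 4 points, one strictly between
# each pair of consecutive roots.
roots = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p = np.polynomial.Polynomial.fromroots(roots)
dp_roots = np.sort(np.real(p.deriv().roots()))

# One fewer zero than the original function.
assert len(dp_roots) == len(roots) - 1
# Interlacing: roots[l] < dp_roots[l] < roots[l+1] for each l.
for l in range(len(dp_roots)):
    assert roots[l] < dp_roots[l] < roots[l + 1]
```

Iterating this step, as the proof does group by group, trades one zero of the function for a zero of each successive derivative.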
Therefore, $|Q_{ij}| \le m_i(3|\mathcal{G}(i)| + |H(i)| + 2)$ and hence $|L_i| \le m_i p_i(3|\mathcal{G}(i)| + |H(i)| + 2)$. □

For any $i \in [1,n]$, $x_i \in \mathbb{R}^{m_i}$ and $\delta \in \mathbb{R}$, it is clear that the set $\{y : x_i^\tau\phi^{(i)}(y) > \delta\} \cap (\bigcup_{j=1}^{p_i} S_{ji}(q))$ is open. If this set is not empty, then it is a countable union of disjoint open intervals. Denote the set of these intervals by $\mathcal{U}_i(\delta)$.

Lemma 3.10. Let $i \in [1,n]$. Then, for any non-zero $x_i \in \mathbb{R}^{m_i}$ and $\delta \in \mathbb{R}$,
(3.52) $|\mathcal{U}_i(\delta)| \le p_i(|L_i| + 2).$

Proof. Denote $\mathcal{K}_{ij} \triangleq \{I \in \mathcal{U}_i(\delta) : I \cap S_{ji}(q) \ne \emptyset\}$, $\forall j \in [1,p_i]$; then
(3.53) $|\mathcal{U}_i(\delta)| \le \sum_{j=1}^{p_i} |\mathcal{K}_{ij}|.$
Fix an index $j \in [1,p_i]$ and $I \in \mathcal{K}_{ij}$. By the continuity of $\phi^{(i)}$ on $S_{ji}(q)$, each endpoint of $I$ either belongs to the zero set $A(x_i^\tau\phi^{(i)}(y) - \delta)$ or is an endpoint of $S_{ji}(q)$. If $\partial(I) \cap \partial(S_{ji}(q)) = \emptyset$, then $\partial(I) \subset A(x_i^\tau\phi^{(i)}(y) - \delta)$. By Rolle's theorem, it follows that $\{y : x_i^\tau(\phi^{(i)})'(y) = 0\} \cap I \ne \emptyset$, which together with $I \subset \{y : x_i^\tau\phi^{(i)}(y) > \delta\}$ leads to $L_i \cap I \ne \emptyset$. Note that there are at most two intervals $I \in \mathcal{K}_{ij}$ satisfying $\partial(I) \cap \partial(S_{ji}(q)) \ne \emptyset$, and any two intervals in $\mathcal{K}_{ij}$ are disjoint, so
(3.54) $|\mathcal{K}_{ij}| \le |L_i| + 2.$
Finally, (3.52) is an immediate result of (3.53) and (3.54). □

Given a closed box $O = \prod_{i=1}^{n} I_i \subset \mathbb{R}^n$ and a positive integer $r$, equally divide each $I_i$ into $r$ closed intervals $\{I_{i,j}\}_{j=1}^{r}$ such that $\mathrm{int}(I_{i,j}) \cap \mathrm{int}(I_{i,j'}) = \emptyset$ if $j \ne j'$. So, there are $r^n$ small closed boxes of the form $\prod_{i=1}^{n} I_{i,j_i}$. Let $\mathcal{T}(O,r)$ be the set of these $r^n$ small boxes. Clearly, for any distinct boxes $U, U' \in \mathcal{T}(O,r)$, $\mathrm{int}(U) \cap \mathrm{int}(U') = \emptyset$. Define
(3.55) $\mathcal{T}_\delta(O,r) \triangleq \{U \in \mathcal{T}(O,r) : B(\delta) \cap \mathcal{S} \cap U \ne \emptyset\},$
where $B(\delta) \triangleq \partial(\{y : \phi^\tau(y)x > \delta\})$ and $\mathcal{S}$ is defined in Lemma 3.5. Let $K_\delta(O,x,r) \triangleq |\mathcal{T}_\delta(O,r)|$. The following lemma is critical to our result.

Lemma 3.11. There is a constant $C > 0$ such that for any closed box $O = \prod_{i=1}^{n} I_i$, non-zero vector $x \in \mathbb{R}^m$, $\delta \in \mathbb{R}$ and integer $r \ge 1$,
(3.56) $K_\delta(O,x,r) \le Cr^{n-1}.$

Proof. We prove (3.56) by induction. For $n = 1$, let $O = I_1$ be a closed box. Since $B(\delta) \cap \bigcup_{j=1}^{p_1} S_{j1}(q)$ consists of endpoints of the intervals in $\mathcal{U}_1(\delta)$, by Lemma 3.10 with $n = 1$ it is easy to check that
(3.57) $\Big|B(\delta) \cap \bigcup_{j=1}^{p_1} S_{j1}(q)\Big| \le 2p_1(|L_1| + 2).$
Moreover, since
$$B(\delta) \cap \mathcal{S} \subset \Big(B(\delta) \cap \bigcup_{j=1}^{p_1} S_{j1}(q)\Big) \cup \partial\Big(\bigcup_{j=1}^{p_1} S_{j1}(q)\Big),$$
and every point belongs to at most two boxes in $\mathcal{T}(O,r)$, it follows that
$$K_\delta(O,x,r) \le 2\big(2p_1(|L_1|+2) + 2p_1\big) = 4p_1(|L_1| + 2) + 4p_1.$$
Hence, (3.56) is true for $n = 1$ by taking $C = 4p_1(|L_1|+2) + 4p_1$.
Now, suppose (3.56) holds for $n = k$ with some $k \ge 1$, and consider the case $n = k+1$. Take a closed box $O = \prod_{i=1}^{k+1} I_i \subset \mathbb{R}^{k+1}$, and let $\mathcal{T}(O,r)$ be the set of the $r^{k+1}$ disjoint refined boxes. These boxes correspond to the two sets
$$\mathcal{T}_1 = \prod_{i=1}^{k} \{I_{i,j}\}_{j=1}^{r} \ \text{and} \ \mathcal{T}_2 = \{I_{k+1,j}\}_{j=1}^{r}.$$
Write the vector $x = \mathrm{col}\{x_1,\dots,x_{k+1}\} \ne 0$. First, assume that there is an index $l \in [1,k+1]$ such that $x_l = 0$. Without loss of generality, let $l = k+1$; then
(3.58) $B(\delta) \cap \prod_{i=1}^{k+1} \bigcup_{j=1}^{p_i} S_{ji}(q) \cap O \subset \Big(\partial\Big(\Big\{z \in \mathbb{R}^k : \sum_{i=1}^{k} x_i^\tau\phi^{(i)}(z_i) > \delta\Big\}\Big) \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q) \cap \prod_{i=1}^{k} I_i\Big) \times I_{k+1},$
where
(3.59) $z = (z_1,\dots,z_k)^\tau \in \mathbb{R}^k.$
By applying the induction hypothesis for $n = k$ to the refined boxes in $\mathcal{T}_1$, there is a constant $C_1 > 0$ such that
$$K_\delta\Big(\prod_{i=1}^{k} I_i,\ \mathrm{col}\{x_1,\dots,x_k\},\ r\Big) \le C_1 r^{k-1},$$
which, together with (3.58) and $\mathcal{T}(O,r) = \mathcal{T}_1 \times \mathcal{T}_2$, yields $K_\delta(O,x,r) \le C_1 r^k$. This is exactly (3.56) for $n = k+1$.
So, let $x_i \ne 0$ for all $i \in [1,k+1]$. For any $B \in \mathcal{T}_1$, define the set
$$Z(B) \triangleq \Big\{z_{k+1} \in I_{k+1} : (B \times z_{k+1}) \cap B(\delta) \cap \prod_{i=1}^{k+1} \bigcup_{j=1}^{p_i} S_{ji}(q) \ne \emptyset\Big\}.$$
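The bound (3.56) asserts that only $O(r^{n-1})$ of the $r^n$ grid boxes can meet the boundary $B(\delta)$. A quick numerical sanity check in dimension $n = 2$, with the half-plane $\{z_1 + z_2 > 1\}$ on $[0,1]^2$ standing in for $\{y : \phi^\tau(y)x > \delta\}$ (an illustrative choice only, not the model's $\phi$):

```python
def boundary_boxes(r: int) -> int:
    """Count boxes of the r-by-r grid on [0,1]^2 meeting the line z1 + z2 = 1."""
    count = 0
    for i in range(r):
        for j in range(r):
            # The box [i/r,(i+1)/r] x [j/r,(j+1)/r] meets the line exactly
            # when its corner sums straddle 1.
            if (i + j) / r <= 1.0 <= (i + j + 2) / r:
                count += 1
    return count

# The count is 3r - 2 = O(r^{n-1}) with n = 2, while the grid has r^2 boxes.
for r in (10, 20, 40, 80):
    assert boundary_boxes(r) == 3 * r - 2
```

Here the constant $3$ plays the role of $C$ in (3.56); a curved boundary changes the constant but not the $r^{n-1}$ order.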
Observe that $Z(B)$ is a closed set, so $\partial Z(B) \subset Z(B)$. Define
$$\mathcal{Z}_1(B) \triangleq \{I_{k+1,j} \in \mathcal{T}_2 : Z(B) \cap I_{k+1,j} \ne \emptyset\}, \quad \mathcal{Z}_2(B) \triangleq \{I_{k+1,j} \in \mathcal{T}_2 : \partial Z(B) \cap I_{k+1,j} \ne \emptyset\}.$$
Since any interval in $\mathcal{Z}_1(B) \setminus \mathcal{Z}_2(B)$ must be contained in $Z(B)$,
$$|\mathcal{Z}_1(B)| - |\mathcal{Z}_2(B)| = |\mathcal{Z}_1(B) \setminus \mathcal{Z}_2(B)| \le \frac{r}{|I_{k+1}|}\,\ell(Z(B)).$$
At the same time,
$$\sum_{B\in\mathcal{T}_1} \ell(Z(B)) = \sum_{B\in\mathcal{T}_1} \int_{\mathbb{R}} I_{Z(B)}\,dz_{k+1} = \int_{I_{k+1}} \sum_{B\in\mathcal{T}_1} I_{Z(B)}\,dz_{k+1},$$
therefore
(3.60) $K_\delta(O,x,r) = \sum_{B\in\mathcal{T}_1} |\mathcal{Z}_1(B)| \le \frac{r}{|I_{k+1}|} \int_{I_{k+1}} \sum_{B\in\mathcal{T}_1} I_{Z(B)}\,dz_{k+1} + \sum_{B\in\mathcal{T}_1} |\mathcal{Z}_2(B)|.$
The last step is to estimate the two terms in (3.60). Since the argument is involved, it is included in Appendix A. In light of Lemmas A.2 and A.3, when $n = k+1$, there are two constants $C_2, C_3 > 0$ depending only on $\phi$ such that $K_\delta(O,x,r) \le (C_2 + C_3)r^k$. The proof is thus completed. □

Now, recall the definition of $U_x$ at the end of Section 3.1:
$$\partial(U_x) \subset \Big(\big(\partial(\{y : \phi^\tau(y)x > \delta^*\}) \cup \partial(\{y : -\phi^\tau(y)x > \delta^*\})\big) \cap \mathcal{S}\Big) \cup \partial(\mathcal{S}).$$
Given a closed box $O$ and an integer $r \ge 1$, observe that
$$|\{U \in \mathcal{T}(O,r) : \partial(\mathcal{S}) \cap U \ne \emptyset\}| \le 4r^{n-1}\sum_{i=1}^{n} p_i.$$
In addition, by applying Lemma 3.11, it follows that there is a constant $C_4 > 0$ depending only on $\phi$ such that
(3.61) $|\{U \in \mathcal{T}(O,r) : \partial(U_x) \cap U \ne \emptyset\}| \le C_4 r^{n-1}.$

3.3. The Estimation of the Minimal Eigenvalue. At the beginning of this subsection, we state a key lemma, which is modified from [8]. For the set $U_x$ we have constructed, define a random process $g_x$ by
$$g_x(i) \triangleq I_{\{Y_i \in U_x\}} - P(Y_i \in U_x \mid \mathcal{F}^y_{i-1}), \quad i \ge 1,$$
where $Y_i \triangleq (y_{i+n-1},\dots,y_i)^\tau$ and $\mathcal{F}^y_{i-1} \triangleq \sigma\{\theta, y_0,\dots,y_{i-1}\}$.

Lemma 3.12. For any $\epsilon > 0$, there is a class $G_\epsilon$ such that
(i) each element of $G_\epsilon$, denoted by $g_\epsilon$, is a random series $\{g_\epsilon(i)\}_{i\ge 1}$ of the form
(3.62) $g_\epsilon(i) = I_{\{Y_i \in U_\epsilon\}} - P(Y_i \in U_\epsilon \mid \mathcal{F}^y_{i-1}) - \epsilon, \quad i \ge 1,$
where $U_\epsilon$ is a set in $\mathbb{R}^n$;
(ii) $G_\epsilon$ contains a lower process $g_\epsilon$ for each $g_x$ in the sense that
(3.63) $g_\epsilon(i) \le g_x(i), \quad \forall i \ge 1.$

Proof. (i) Let $O$ be a closed box containing $\mathcal{S}$. Let $r$ be an integer such that
(3.64) $r > \epsilon^{-1}\rho^n C_4 \cdot \ell(O),$
where $C_4$ is defined in (3.61) and $\rho \triangleq \sup_{x\in\mathbb{R}} \rho(x)$. Let $U_\epsilon$ be a union of some boxes taken from $\mathcal{T}(O,r)$. Hence, for a fixed $U_\epsilon$, we can define a random process $g_\epsilon$ by (3.62). Denote by $G_\epsilon$ the class of all such $g_\epsilon$.
(ii) Note that for every $x \in \mathbb{R}^m$ with $\|x\| = 1$, $U_x$ is bounded. Then, there is a set $U_\epsilon \subset \mathbb{R}^n$ such that $U_\epsilon \subset U_x$ and $\Delta U_{\epsilon,x} \triangleq U_x - U_\epsilon$ falls into a union of finitely many boxes $J_1,\dots,J_l \in \{U \in \mathcal{T}(O,r) : \partial(U_x) \cap U \ne \emptyset\}$. By (3.61), we obtain
(3.65) $\sum_{k=1}^{l} \ell(J_k) = l \cdot \frac{\ell(O)}{r^n} \le \frac{C_4 \cdot \ell(O)}{r} < \rho^{-n}\epsilon.$
We now calculate $P(Y_t \in \Delta U_{\epsilon,x} \mid \mathcal{F}^y_{t-1})$. By (3.65), Lemma 3.6(i) and Assumption A1', it is easy to see that
$$P(Y_t \in \Delta U_{\epsilon,x} \mid \mathcal{F}^y_{t-1}) \le P\Big(Y_t \in \bigcup_{k=1}^{l} J_k \,\Big|\, \mathcal{F}^y_{t-1}\Big) \le \ell\Big(\Big\{(w_{t+n-1},\dots,w_t)^\tau : Y_t \in \bigcup_{k=1}^{l} J_k\Big\}\Big) \cdot \rho^n = \ell\Big(\bigcup_{k=1}^{l} J_k\Big) \cdot \rho^n < \epsilon.$$
So, for any $i \ge 1$,
$$g_x(i) = I_{\{Y_i\in U_x\}} - P(Y_i \in U_x \mid \mathcal{F}^y_{i-1}) = I_{\{Y_i\in U_x\}} - P(Y_i \in U_\epsilon \mid \mathcal{F}^y_{i-1}) - P(Y_i \in \Delta U_{\epsilon,x} \mid \mathcal{F}^y_{i-1}) \ge I_{\{Y_i\in U_\epsilon\}} - P(Y_i \in U_\epsilon \mid \mathcal{F}^y_{i-1}) - \epsilon = g_\epsilon(i),$$
which is exactly (3.63). □

Proof of Proposition 3.1. First, recalling the definition of $U_x$, for any $x \in \mathbb{R}^m$ with $\|x\| = 1$, Lemma 3.6(ii) and Assumption A1' yield
(3.66) $P(Y_i \in U_x \mid \mathcal{F}^y_{i-1})\,I_{\{\|Y_{i-n}\|\le M, \|\theta\|\le K\}} = P(Y_i \in \{y : |\phi^\tau(y)x| > \delta^*\} \cap \mathcal{S} \mid \mathcal{F}^y_{i-1})\,I_{\{\|Y_{i-n}\|\le M, \|\theta\|\le K\}} \ge \inf_{\|x\|=1,\|y\|\le M,\|z\|\le K} \ell\big(\{\varsigma : |\phi^\tau(g(\varsigma,y,z))x| > \delta^*,\ g(\varsigma,y,z) \in \mathcal{S}\}\big) \cdot \Big(\inf_{s\in[-S',S']} \rho(s)\Big)^n I_{\{\|Y_{i-n}\|\le M, \|\theta\|\le K\}} \triangleq C_P\,I_{\{\|Y_{i-n}\|\le M, \|\theta\|\le K\}},$
where $S' = K \sup_{\|y\|\le M+R'} \|\phi(y)\| + R'$ and $R' \triangleq \max_{1\le i\le n} \mathrm{dist}\big(0, \bigcup_{j=1}^{p_i} S_{ji}(q)\big)$.
Next, note that for any $\epsilon > 0$ and $g_\epsilon \in G_\epsilon$, $\{g_\epsilon(i)+\epsilon, \mathcal{F}^y_i\}_{i\ge 1}$ is a martingale difference sequence.
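The convergence invoked next — normalized partial sums of a bounded martingale difference sequence vanish almost surely — can be sanity-checked numerically. A minimal sketch with i.i.d. centered indicators, the simplest special case of such a sequence (the event probability below is illustrative, not the conditional construction of the proof):

```python
import random

random.seed(0)

# g(i) = I{U_i in A} - P(U_i in A): a bounded martingale difference
# sequence (here i.i.d., the simplest special case).
p = 0.3       # illustrative value of P(U_i in A)
t = 200_000   # number of terms
s = 0.0
for _ in range(t):
    s += (1.0 if random.random() < p else 0.0) - p

avg = s / t
# The average of the first t terms is close to zero for large t.
assert abs(avg) < 0.01
```

The same cancellation, applied conditionally on $\mathcal{F}^y_{i-1}$, is what drives the limit below.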
By [3, Theorem 2.8],
$$\lim_{t\to+\infty} \frac{\sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,(g_\epsilon(i)+\epsilon)}{N_t(M)} = 0, \quad \text{a.s. on } \Omega(M),$$
where $\Omega(M)$ is defined in Theorem 2.1. Since only finitely many sets $U_\epsilon$ are involved, it gives
$$\lim_{t\to+\infty} \inf_{U_\epsilon\subset\mathcal{S}} \frac{1}{N_t(M)} \sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,g_\epsilon(i) = -\epsilon, \quad \text{a.s. on } \Omega(M).$$
As a result, Lemma 3.12(ii) implies that for some $g^x_\epsilon \in G_\epsilon$,
$$\liminf_{t\to+\infty} \inf_{\|x\|=1} \frac{1}{N_t(M)} \sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,g_x(i) \ge \liminf_{t\to+\infty} \inf_{\|x\|=1} \frac{1}{N_t(M)} \sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,g^x_\epsilon(i) \ge \liminf_{t\to\infty} \inf_{U_\epsilon\subset\mathcal{S}} \frac{1}{N_t(M)} \sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,g_\epsilon(i) = -\epsilon, \quad \text{a.s. on } \Omega(M).$$
Further, by the arbitrariness of $\epsilon$, we obtain
(3.67) $\liminf_{t\to+\infty} \inf_{\|x\|=1} \frac{1}{N_t(M)} \sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,g_x(i) \ge 0, \quad \text{a.s. on } \Omega(M).$
Finally, by (3.66)–(3.67), if $\epsilon$ is sufficiently small, then there is a positive random integer $T$ such that for any unit vector $x \in \mathbb{R}^m$ and all $t > T$,
$$\frac{1}{N_t(M)} \sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,I_{\{Y_i\in U_x\}} > \frac{1}{N_t(M)} \sum_{i=1}^{t} I_{\{\|Y_{i-n}\|\le M\}}\,P(Y_i \in U_x \mid \mathcal{F}^y_{i-1}) - \frac{C_P}{2} \ge \frac{C_P}{2}, \quad \text{a.s. on } \Omega(M) \cap \{\|\theta\|\le K\}.$$
Hence, selecting $C_\phi > 0$ such that $U_x \subset B(0, C_\phi)$, for sufficiently large $t$,
$$\lambda_{\min}(t+1) = \inf_{\|x\|=1} x^\tau\Big(I_m + \sum_{i=0}^{t} \phi_i\phi_i^\tau\Big)x \ge \inf_{\|x\|=1} \sum_{i=1}^{t-n+1} I_{\{Y_i\in U_x\}}\,(\phi^\tau(Y_i)x)^2 \ge (\delta^*)^2 \inf_{\|x\|=1} \sum_{i=1}^{t-n+1} I_{\{Y_i\in U_x\}} \ge \frac{(\delta^*)^2 C_P}{2}\,(N_t(M) - n), \quad \text{a.s. on } \Omega(M) \cap \{\|\theta\|\le K\}.$$
Proposition 3.1 is thus proved. □

APPENDIX A
In this appendix, we follow the definitions and symbols in the proof of Lemma 3.11 and complete the estimation details of (3.60). To this end, define
$$I^*_{k+1} \triangleq \Big\{z_{k+1} : \Big(\prod_{i=1}^{k} I_i \times z_{k+1}\Big) \cap B(\delta) \cap \Big(\prod_{i=1}^{k} K_i \times z_{k+1}\Big) \ne \emptyset\Big\} \cap I_{k+1} \cap \bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q),$$
$$\mathcal{T}_3 \triangleq \{A \in \mathcal{T}_2 : A \cap I^*_{k+1} \ne \emptyset\}, \quad \mathcal{T}_4 \triangleq \Big\{B \in \mathcal{T}_1 : \bigcup_{i=1}^{k} \{z : z_i \in L_i\} \cap B \ne \emptyset\Big\},$$
where $\prod_{i=1}^{k+1} I_i = O$ is the given closed box in the proof of Lemma 3.11.

Lemma A.1. The cardinalities of $I^*_{k+1}$, $\mathcal{T}_3$ and $\mathcal{T}_4$ are bounded by
(A.1) $|I^*_{k+1}| \le \big(2p_{k+1}(|L_{k+1}|+2)+2\big)\prod_{i=1}^{k}(|L_i|+p_i),$
$$|\mathcal{T}_3| \le 2\big(2p_{k+1}(|L_{k+1}|+2)+2\big)\prod_{i=1}^{k}(|L_i|+p_i),$$
(A.2) $|\mathcal{T}_4| \le 2r^{k-1}\sum_{i=1}^{k}|L_i|.$

Proof. By the definitions of $\mathcal{T}_3$ and $\mathcal{T}_4$, $|\mathcal{T}_3| \le 2|I^*_{k+1}|$ and (A.2) is trivial. So, it suffices to show (A.1). For this, recall the definitions of $K_i$ and $L_i$; then for each $i \in [1,n]$, there is a set $\mathcal{P}_i$ consisting of some disjoint intervals such that $|\mathcal{P}_i| \le |L_i| + p_i$ and $\bigcup_{I\in\mathcal{P}_i} I = K_i$. As a result, $|\prod_{i=1}^{k}\mathcal{P}_i| \le \prod_{i=1}^{k}(|L_i|+p_i)$. For each box $B \in \prod_{i=1}^{k}\mathcal{P}_i$, denote
$$I^*_{k+1}(B) = \Big\{z_{k+1} : \Big(\prod_{i=1}^{k} I_i \times z_{k+1}\Big) \cap B(\delta) \cap (B \times z_{k+1}) \ne \emptyset\Big\} \cap I_{k+1} \cap \bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q).$$
Since $B \subset \prod_{i=1}^{k} K_i$, it is evident that
(A.3) $\sum_{i=1}^{k} x_i^\tau\phi^{(i)} \equiv \text{constant on } B.$
So, for any $z_{k+1} \in I^*_{k+1}(B)$, arbitrarily taking $(z_1,\dots,z_k)^\tau \in \mathrm{int}(B)$ infers $(z_1,\dots,z_{k+1})^\tau \in B(\delta)$. Let $\{(z_{1,j},\dots,z_{k+1,j})^\tau\}_{j=1}^{+\infty}$ be a sequence of points in $(\mathrm{int}(B) \times E_{k+1}) \cap \{y : \phi^\tau(y)x > \delta\}$ tending to $(z_1,\dots,z_{k+1})^\tau$. Then, $\lim_{j\to+\infty} \|z_{k+1,j} - z_{k+1}\| = 0$ and
(A.4) $x_{k+1}^\tau\phi^{(k+1)}(z_{k+1,j}) > \delta - \sum_{i=1}^{k} x_i^\tau\phi^{(i)}(z_{i,j}) = \delta - \sum_{i=1}^{k} x_i^\tau\phi^{(i)}(z_i).$
Denote
(A.5) $\bar\delta = \delta - \sum_{i=1}^{k} x_i^\tau\phi^{(i)}(z_i),$
so (A.4) implies
$$z_{k+1} \in \partial\big(\{z : x_{k+1}^\tau\phi^{(k+1)}(z) > \bar\delta\}\big) \cap \bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q).$$
Therefore, applying Lemma 3.10,
$$|I^*_{k+1}(B)| \le \Big|\partial\big(\{z : x_{k+1}^\tau\phi^{(k+1)}(z) > \bar\delta\}\big) \cap \bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q)\Big| \le 2p_{k+1}(|L_{k+1}|+2) + 2,$$
and thus
(A.6) $|I^*_{k+1}| \le \big(2p_{k+1}(|L_{k+1}|+2)+2\big)\Big|\prod_{i=1}^{k}\mathcal{P}_i\Big| \le \big(2p_{k+1}(|L_{k+1}|+2)+2\big)\prod_{i=1}^{k}(|L_i|+p_i),$
which completes the proof. □

Lemma A.2. Let Lemma 3.11 hold with $n = k$. Then, there is a constant $C_2 > 0$ depending only on $\phi$ such that
(A.7) $\frac{r}{|I_{k+1}|} \int_{I_{k+1}} \sum_{B\in\mathcal{T}_1} I_{Z(B)}\,dz_{k+1} \le C_2 r^k.$

Proof. Denote $\phi' = \mathrm{col}\{\phi^{(1)},\dots,\phi^{(k)}\}$, $x' = \mathrm{col}\{x_1,\dots,x_k\}$ and $z = (z_1,\dots,z_k)^\tau$. Given $z_{k+1} \in I_{k+1}$, define $\delta' \triangleq \delta - x_{k+1}^\tau\phi^{(k+1)}(z_{k+1})$. Then,
$$\{z : (z_1,\dots,z_{k+1})^\tau \in B(\delta)\} \cap \prod_{i=1}^{k} A^c(x_i^\tau(\phi^{(i)})') \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q) = \partial\big(\{z : (\phi')^\tau(z)x' > \delta'\}\big) \cap \prod_{i=1}^{k} A^c(x_i^\tau(\phi^{(i)})') \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q).$$
In addition, for $\{L_i, K_i\}_{i=1}^{n}$ defined in Lemma 3.9,
$$\Big(\prod_{i=1}^{k} A^c(x_i^\tau(\phi^{(i)})')\Big)^c = \Big(\bigcup_{i=1}^{k} \{z : z_i \in L_i\}\Big) \cup \prod_{i=1}^{k} K_i,$$
so we arrive at
(A.8) $\{z : (z_1,\dots,z_{k+1})^\tau \in B(\delta)\} \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q) \subset \Big(\partial\big(\{z : (\phi')^\tau(z)x' > \delta'\}\big) \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q)\Big) \cup \Big(\bigcup_{i=1}^{k} \{z : z_i \in L_i\}\Big) \cup \prod_{i=1}^{k} K_i.$
Consequently, for any $z_{k+1} \in A \in \mathcal{T}_2 \setminus \mathcal{T}_3$ and $B \in \mathcal{T}_1 \setminus \mathcal{T}_4$, (A.8) shows
$$\{z : (z_1,\dots,z_{k+1})^\tau \in B(\delta)\} \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q) \cap B \subset \partial\big(\{z : (\phi')^\tau(z)x' > \delta'\}\big) \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q) \cap B.$$
Now, for $\partial(\{z : (\phi')^\tau(z)x' > \delta'\}) \cap \prod_{i=1}^{k} \bigcup_{j=1}^{p_i} S_{ji}(q)$ and $\mathcal{T}_1$, applying Lemma 3.11 with $n = k$ leads to
(A.9) $\sum_{B\in\mathcal{T}_1\setminus\mathcal{T}_4} I_{Z(B)}(z_{k+1}) \le Cr^{k-1}.$
Based on (A.9), it is straightforward to compute
$$\int_{I_{k+1}} \sum_{B\in\mathcal{T}_1} I_{Z(B)}\,dz_{k+1} = \sum_{A\in\mathcal{T}_2} \int_{A} \sum_{B\in\mathcal{T}_1} I_{Z(B)}\,dz_{k+1} \le \sum_{A\in\mathcal{T}_2\setminus\mathcal{T}_3} \int_{A} \sum_{B\in\mathcal{T}_1} I_{Z(B)}\,dz_{k+1} + \sum_{A\in\mathcal{T}_3} \int_{A} r^k\,dz_{k+1}$$
$$= \sum_{A\in\mathcal{T}_2\setminus\mathcal{T}_3} \int_{A} \sum_{B\in\mathcal{T}_1\setminus\mathcal{T}_4} I_{Z(B)}\,dz_{k+1} + \sum_{A\in\mathcal{T}_2\setminus\mathcal{T}_3} \int_{A} \sum_{B\in\mathcal{T}_4} I_{Z(B)}\,dz_{k+1} + r^k \cdot \frac{|I_{k+1}|}{r} \cdot |\mathcal{T}_3|$$
$$\le \int_{I_{k+1}} Cr^{k-1}\,dz_{k+1} + \sum_{B\in\mathcal{T}_4} \int_{I_{k+1}} dz_{k+1} + r^{k-1}|I_{k+1}||\mathcal{T}_3| \le \big((C + |\mathcal{T}_3|)r^{k-1} + |\mathcal{T}_4|\big)|I_{k+1}|.$$
The result follows from Lemmas A.1 and 3.9. □

Lemma A.3. There is a constant $C_3 > 0$ depending only on $\phi$ such that
$$\sum_{B\in\mathcal{T}_1} |\mathcal{Z}_2(B)| \le C_3 r^k.$$

Proof. Let
$$\mathcal{T}_5 \triangleq \Big\{\prod_{i=1}^{k} I'_i \in \mathcal{T}_1 : \partial\Big(\bigcup_{j=1}^{p_i} S_{ji}(q)\Big) \cap I'_i \ne \emptyset \ \text{for some} \ i \in [1,k]\Big\}.$$
Clearly, $|\mathcal{T}_5| \le 4r^{k-1}\sum_{i=1}^{k} p_i$. Hence,
(A.10) $\sum_{B\in\mathcal{T}_1} |\mathcal{Z}_2(B)| \le \sum_{B\in\mathcal{T}_1\setminus(\mathcal{T}_4\cup\mathcal{T}_5)} |\mathcal{Z}_2(B)| + r|\mathcal{T}_4| + 4r^k\sum_{i=1}^{k} p_i.$
It suffices to estimate the first term on the right-hand side of (A.10). To this end, take a box $B = \prod_{i=1}^{k} I'_i \in \mathcal{T}_1 \setminus (\mathcal{T}_4\cup\mathcal{T}_5)$ and let $z_{k+1} \in \partial Z(B) \cap \mathrm{int}(I_{k+1})$. Select a point $(z_1,\dots,z_k)^\tau \in B$ such that
(A.11) $\mathrm{dist}\Big((z_1,\dots,z_{k+1})^\tau,\ \prod_{i=1}^{k}\partial(I'_i) \times z_{k+1}\Big) = \min_{y \,\in\, B(\delta)\,\cap\,\prod_{i=1}^{k+1}\bigcup_{j=1}^{p_i}S_{ji}(q)\,\cap\,(B\times z_{k+1})} \mathrm{dist}\Big(y,\ \prod_{i=1}^{k}\partial(I'_i) \times z_{k+1}\Big).$
Clearly, $B \in \mathcal{T}_1\setminus(\mathcal{T}_4\cup\mathcal{T}_5)$ implies that for each $i = 1,\dots,k$,
$$\mathrm{int}(I'_i) \subset \bigcup_{j=1}^{p_i} S_{ji}(q) \ \text{and} \ \mathrm{int}(I'_i) \cap L_i = \emptyset.$$
We consider the following two cases:
Case 1: $(z_1,\dots,z_k)^\tau \notin \prod_{i=1}^{k}\partial(I'_i)$. Then, there is an integer $i \in [1,k]$ such that $z_i \in \mathrm{int}(I'_i)$. By (A.11), $z_i \notin K_i \cap \mathrm{int}(I'_i)$. Otherwise, there is a $\rho_0 > 0$ such that $x_i^\tau(\phi^{(i)})' \equiv 0$ on $[z_i-\rho_0, z_i+\rho_0] \subset \mathrm{int}(I'_i)$. Similar to (A.3)–(A.4), for any $z'_i \in [z_i-\rho_0, z_i+\rho_0]$,
$$(z_1,\dots,z_{i-1},z'_i,z_{i+1},\dots,z_{k+1})^\tau \in B(\delta) \cap \prod_{i=1}^{k+1}\bigcup_{j=1}^{p_i} S_{ji}(q) \cap (B \times z_{k+1}).$$
Then,
$$\min\Big\{\mathrm{dist}\Big((z_1,\dots,z_{i-1},z_i-\rho_0,z_{i+1},\dots,z_{k+1})^\tau,\ \prod_{i=1}^{k}\partial(I'_i)\times z_{k+1}\Big),\ \mathrm{dist}\Big((z_1,\dots,z_{i-1},z_i+\rho_0,z_{i+1},\dots,z_{k+1})^\tau,\ \prod_{i=1}^{k}\partial(I'_i)\times z_{k+1}\Big)\Big\} < \mathrm{dist}\Big((z_1,\dots,z_{k+1})^\tau,\ \prod_{i=1}^{k}\partial(I'_i)\times z_{k+1}\Big),$$
which contradicts (A.11).
Now, since $z_i \notin K_i \cap \mathrm{int}(I'_i)$ and $B \notin \mathcal{T}_4$, it follows that $x_i^\tau(\phi^{(i)})'(z_i) \ne 0$. We claim
(A.12) $z_{k+1} \in \bigcup_{j=1}^{p_{k+1}} \partial(S_{j,k+1}(q)).$
Otherwise, $z_{k+1} \in \bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q)$. By the implicit function theorem, there is a sufficiently small $\eta > 0$ such that for every $z'_{k+1} \in (z_{k+1}-\eta, z_{k+1}+\eta)$, a point $z'_i \in \mathrm{int}(I'_i)$ exists with
$$(z_1,\dots,z_{i-1},z'_i,z_{i+1},\dots,z_k,z'_{k+1})^\tau \in B(\delta) \cap \prod_{i=1}^{k+1}\bigcup_{j=1}^{p_i} S_{ji}(q).$$
This means $z_{k+1} \in \mathrm{int}(Z(B))$, which is impossible due to $z_{k+1} \in \partial Z(B)$. Hence (A.12) holds.
Case 2: $(z_1,\dots,z_k)^\tau \in \prod_{i=1}^{k}\partial(I'_i)$. Since $z_{k+1} \in \partial(Z(B))$, $x_{k+1}^\tau\phi^{(k+1)}$ cannot be constant on any neighbourhood of $z_{k+1}$. So,
(A.13) $z_{k+1} \in \Big(\partial\big(\{z : x_{k+1}^\tau\phi^{(k+1)}(z) = \bar\delta\}\big) \cap \bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q)\Big) \cup \bigcup_{j=1}^{p_{k+1}} \partial(S_{j,k+1}(q)),$
where $\bar\delta$ is defined by (A.5).
Combining the above two cases, $z_{k+1} \in \partial(Z(B)) \cap \mathrm{int}(I_{k+1})$ implies (A.13). Taking the case $z_{k+1} \in \partial(I_{k+1})$ into consideration, we obtain
(A.14) $\partial(Z(B)) \subset \Big(\partial\big(\{y \in \mathbb{R} : x_{k+1}^\tau\phi^{(k+1)}(y) = \bar\delta\}\big) \cap \bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q)\Big) \cup \bigcup_{j=1}^{p_{k+1}} \partial(S_{j,k+1}(q)) \cup \partial(I_{k+1}),$
which, together with the fact that $|\partial(\{z : x_{k+1}^\tau\phi^{(k+1)}(z) = \bar\delta\}) \cap (\bigcup_{j=1}^{p_{k+1}} S_{j,k+1}(q))| \le 4p_{k+1}(|L_{k+1}|+2)$ from (3.57), leads to
$$|\mathcal{Z}_2(B)| \le 2|\partial(Z(B))| \le 8p_{k+1}(|L_{k+1}|+2) + 4p_{k+1} + 4.$$
Now, in view of (A.10), we derive
$$\sum_{B\in\mathcal{T}_1} |\mathcal{Z}_2(B)| \le \big(8p_{k+1}(|L_{k+1}|+2) + 4p_{k+1} + 4\big)r^k + |\mathcal{T}_4|r + 4r^k\sum_{i=1}^{k} p_i,$$
which yields the result by Lemma A.1. □

APPENDIX B
In this appendix, we provide the proof of Theorem 2.3 by showing

Proposition B.1. Proposition 3.1 holds for model (1.1) if Assumption A2 is replaced by A3.

Proof of Proposition B.1.
The proof is similar to that of Proposition 3.1 but more concise, thanks to Assumption A3. First, we need not construct $\mathcal{S}$ via Lemmas 3.1–3.5. As a matter of fact, taking $\delta^*$ from (2.10) in Assumption A3, Lemma 3.6 follows with $\mathcal{S}$ replaced by $E$. So, for every unit vector $x \in \mathbb{R}^m$, we can directly define
$$U_x \triangleq \{y : |\phi^\tau(y)x| > \delta^*\} \cap E.$$
Next, with the random process $g_x$ defined in Subsection 3.3, we proceed to Lemma 3.12. To show this lemma in the current case, we are not going to verify (3.65) by using Lemmas 3.7–3.11. Instead, we claim another formula. For this, select a box $O$ containing $E$ and define
(B.1) $\mathcal{T}(x,O,r) \triangleq \{U \in \mathcal{T}(O,r) : \partial U_x \cap U \ne \emptyset\},$
where $\mathcal{T}(O,r)$ is defined above (3.55). The remainder is mainly devoted to proving
(B.2) $\lim_{r\to+\infty} \sup_{\|x\|=1} \sum_{U\in\mathcal{T}(x,O,r)} \ell(U) = 0.$
To show (B.2), note that
(B.3) $\partial(U_x) \subset V_x \triangleq \{y \in E : |\phi^\tau(y)x| = \delta^*\}.$
Denote $W(x,r) \triangleq \bigcup_{U\in\mathcal{T}'(x,O,r)} U$, where
(B.4) $\mathcal{T}'(x,O,r) \triangleq \{U \in \mathcal{T}(O,r) : V_x \cap U \ne \emptyset\}.$
So, it suffices to show
(B.5) $\lim_{r\to+\infty} \sup_{\|x\|=1} \ell(W(x,r)) = 0.$
If (B.5) were false, then there would be a number $\varepsilon_0 > 0$ and unit vectors $\{x^{(k)}\}_{k=1}^{+\infty}$ such that $\lim_{k\to+\infty} x^{(k)} = x^*$ for some unit vector $x^*$ and
(B.6) $\ell(W(x^{(k)}, 2^k)) > \varepsilon_0, \quad \forall k \ge 1.$
Now, according to the definition of the Jordan measure, (2.9) in Assumption A3(ii) indicates that
$$\lim_{r\to+\infty} \ell(W(x^*, r)) = 0.$$
Moreover, since $\lim_{k\to+\infty} \sup_{y\in V_{x^{(k)}}} \mathrm{dist}(y, V_{x^*}) = 0$, for any $\varepsilon' > 0$ there exist $k', k$ with $k' < k$ such that
$$|\mathcal{T}'(x^*, O, 2^{k'})| < \frac{\varepsilon'\,2^{k'n}}{\ell(O)} \quad \text{and} \quad |\mathcal{T}'(x^{(k)}, O, 2^k)| < (1+2^{k-k'+1})^n\,2^{(k-k')n}\,|\mathcal{T}'(x^*, O, 2^{k'})|.$$
The above two inequalities immediately lead to
$$\ell(W(x^{(k)}, 2^k)) = \frac{\ell(O)}{2^{kn}} \cdot |\mathcal{T}'(x^{(k)}, O, 2^k)| < (1+2^{k-k'+1})^n\,\varepsilon',$$
which contradicts (B.6) on selecting $k' = k-1$ and $\varepsilon' < 5^{-n}\varepsilon_0$.
Finally, (3.65) follows from (B.2), and hence Lemma 3.12 holds.
The restof the procedures thus keep the same as those for Proposition 3.1.REFERENCES [1] Chan, K. S. (1993). Consistency and limiting distribution of the least squares estima-tor of a threshold autoregressive model. Ann. Statist. Chan, K. S. and Tsay, R. S. (1998). Limiting properties of the least squares estimatorof a continuous threshold autoregressive model. Biometrika Chen, H. F. and Guo, L. (1991). Identification and Stochastic Adaptive Control .Birkhauser: Boston, MA.[4] Eicker, F. (1963). Asymptotic Normality and Consistency of the Least Squares Esti-mators for Families of Linear Regressions, Ann. Math. Statist. Guo, L. (1995). Convergence and logarithm laws of self-tuning regulators. Automatica Z. B. LIU AND C. LI[6] Lai, T. L. and Wei, C. Z. (1982). Least Squares Estimates in Stochastic RegressionModels with Applications to Identification and Control of Dynamic Systems. Ann.Statist. Lai, T. L. and Wei, C. Z. (1983). Asymptotic properties of general autoregressivemodels and strong consistency of least-squares estimates of their parameters. J. Mul-tivariate Anal. Li, C. and Lam, J. (2013). Stabilization of discrete-time nonlinear uncertain systemsby feedback based on LS algorithm. SIAM J. Control Optim. Li, D. and Ling, S. (2012). On the least squares estimation of multiple-regime thresh-old autoregressive models. J. Econometrics Li, D. , Tjstheim, D. and Gao, J. (2016). Estimation in nonlinear regression withHarris recurrent Markov chains. Ann. Statist. Sternby, J. (1977). On consistency for the method of least squares using martingaletheory. IEEE Trans. Autom. Control Zhao, W. X. , Chen, H. F. and Zheng, W. X. (2010). Recursive identification fornonlinear ARX systems based on stochastic approximation algorithm.