Nonparametric least squares estimation in integer-valued GARCH models

Abstract

We consider a nonparametric version of the integer-valued GARCH(1,1) model for time series of counts. The link function in the recursion for the variances is not specified by finite-dimensional parameters, but we impose nonparametric smoothness conditions. We propose a least squares estimator for this function and show that it is consistent with a rate that we conjecture to be nearly optimal.


Submitted to Bernoulli


MAXIMILIAN WECHSUNG * and MICHAEL H. NEUMANN **

Friedrich-Schiller-Universität Jena, Institut für Mathematik, Ernst-Abbe-Platz 2, 07743 Jena, Germany. E-mail: * [email protected]; ** [email protected]

Keywords: Poisson autoregression, INGARCH, nonparametric estimation, empirical process, mixing.

MSC 2010 subject classiﬁcations:

1. Introduction

Time-dependent count data appear in many branches of empirical research. An instance of count data that is currently attracting considerable attention is given by epidemiological data, which count the number of reported cases of a certain disease in a series of successive time intervals.

A common model for the marginal conditional distributions of a time series of count variables $\{Y_t\}_{t \in \mathbb{Z}}$ is the Poisson distribution, i.e. $Y_t \mid \dots, Y_{t-2}, Y_{t-1} \sim \mathrm{Poiss}(\lambda_t)$ [15]. Here the intensities $\{\lambda_t\}_{t \in \mathbb{Z}}$ are mere theoretical quantities defined for modeling purposes. As such they are not observable. A temporal dynamic can be modeled with a recursive relation for the intensities, $\lambda_t = m(Y_{t-1}, \dots, Y_{t-p}; \lambda_{t-1}, \dots, \lambda_{t-q})$ for some $p, q \in \mathbb{N}_+$. This relation could as well include some exogenous variables on the right-hand side. However, in order to focus on the inherent dynamics of the process, we forgo the inclusion of such explanatory variables. The function $m$ is called the link function and is at the center of our interest.

The Poisson autoregression model described above is very similar to the GARCH($p$,$q$) model introduced by Bollerslev [4], which is constituted by a process $\{X_t\}$ with $X_t \mid \dots, X_{t-2}, X_{t-1} \sim N(0, \sigma_t^2)$ and $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_q \sigma_{t-q}^2$. Therefore, our Poisson model is often called integer-valued GARCH($p$,$q$) or INGARCH($p$,$q$) model. This term was introduced by Ferland et al. [9], who showed existence and stationarity of an INGARCH($p$,$q$) process with linear link function. Further analyses were contributed by Rydberg and Shephard [19] as well as Davis et al. [5], for the linear and log-linear INGARCH(1,1) model, respectively. Parameter estimation in a linear INGARCH(1,1) model was considered by Fokianos et al. [10], who proved consistency and asymptotic normality of the conditional maximum likelihood estimator. Fokianos and Tjøstheim [11] extended this result to INGARCH(1,1) models with link functions of the form $m(y, \lambda) = f(\lambda) + g(y)$ for rather general but still parametric functions $f$ and $g$.

imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020

To our knowledge, the first published result on a purely nonparametric INGARCH(1,1) model is due to Neumann [18]. He proved absolute regularity of the count process $\{Y_t\}$ under the assumption that the link function $m$ satisfies a strong contraction property. Doukhan and Neumann [8] proved absolute regularity of general INGARCH($p$,$q$) processes, assuming only a semi-contractive link function.

However, the problem of estimating a nonparametric link function has not been addressed yet. We propose and analyze an estimator for the link function in a nonparametric INGARCH(1,1) model, presupposing the same contraction property as Neumann [18]. Statistical inference in nonparametric GARCH(1,1) models has already been considered by Meister and Kreiß [17], and the similarity between the GARCH and INGARCH models motivates us to mimic their estimator. However, we pursue a different strategy in the asymptotic analysis, which is based on mixing properties.
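The recursion described in the introduction can be illustrated with a short simulation. The following is a minimal sketch, not taken from the paper: it assumes a linear link with made-up coefficients (0.5, 0.3, 0.4), and the helper names `sample_poisson`, `simulate_ingarch` and `linear_link` are hypothetical. Only the Python standard library is used.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_ingarch(link, n: int, lam0: float = 0.0, seed: int = 0):
    """Simulate n steps with Y_t | past ~ Poisson(lam_t) and lam_t = link(lam_{t-1}, Y_{t-1})."""
    rng = random.Random(seed)
    lam, y = lam0, 0
    counts, intensities = [], []
    for _ in range(n):
        lam = link(lam, y)            # hidden intensity of the current step
        y = sample_poisson(lam, rng)  # observed count
        intensities.append(lam)
        counts.append(y)
    return counts, intensities


def linear_link(lam: float, y: int) -> float:
    """Illustrative linear link omega + a*lam + b*y with a + b < 1 (coefficients are assumptions)."""
    return 0.5 + 0.3 * lam + 0.4 * y


counts, intensities = simulate_ingarch(linear_link, n=2000)
```

In an estimation setting only `counts` would be available; `intensities` is returned here purely to make the hidden part of the process visible.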

2. Assumptions and main result

Definition 1. For $T \in \{\mathbb{N}, \mathbb{Z}\}$, let $\{(\lambda_t, Y_t)\}_{t \in T}$ be a stochastic process that is defined on a probability space $(\Omega, \mathcal{F}, P)$ and assumes values in $\mathbb{R} \times \mathbb{N} = \mathbb{R} \times \{0, 1, 2, \dots\}$. Let $\mathcal{F}_t := \sigma\{\lambda_s, Y_s : s \le t\}$ be the $\sigma$-field generated by the process up to time $t$, and let $\mathcal{B}$ denote the Borel $\sigma$-field over $\mathbb{R}$. The bivariate process $\{(\lambda_t, Y_t)\}_{t \in T}$ is called an INGARCH(1,1) process if there exists a $(\mathcal{B} \otimes 2^{\mathbb{N}} - \mathcal{B})$-measurable function $m : [0, \infty) \times \mathbb{N} \to [0, \infty)$ such that
\[
  P^{Y_t \mid \mathcal{F}_{t-1}} = \mathrm{Poiss}(\lambda_t) \quad \text{and} \quad \lambda_t = m(\lambda_{t-1}, Y_{t-1}).
\]
The processes $\{Y_t\}$ and $\{\lambda_t\}$ are called count process and intensity process, respectively. The function $m$ is called link function.

Given a link function $m$, the corresponding one-sided INGARCH(1,1) process (i.e. with $T = \mathbb{N}$) is well defined. It is a Markov process and can be constructed explicitly by specifying the transition kernel. A two-sided version (i.e. $T = \mathbb{Z}$) is well defined if a stationary distribution of the one-sided process exists [8, 23].

Recall that the intensities are hidden variables. A prerequisite for estimating the link function without knowledge of the intensities is that a single intensity $\lambda_t$ does not exclusively carry substantial information about the whole process. In other words, we require the influence of $\lambda_t$ to fade with progressing time. This property can be enforced by stipulating that the link function be contractive.

Definition 2.

For a subset $D := D_1 \times D_2 \subset \mathbb{R} \times \mathbb{N}$ and numbers $L_1, L_2 \ge 0$ such that $L_1 + L_2 < 1$, the class $\mathcal{G} = \mathcal{G}(D, L_1, L_2)$ of contractive candidate link functions with domain $D$ and smoothness parameters $L_1$ and $L_2$ is defined as the set of all functions $g : D \to D_1$ with the property that
\[
  |g(\lambda_1, y_1) - g(\lambda_2, y_2)| \le L_1 |\lambda_1 - \lambda_2| + L_2 |y_1 - y_2| \tag{C}
\]
for all $(\lambda_1, y_1), (\lambda_2, y_2) \in D$.

In case of a contractive link function $m \in \mathcal{G}(D, L_1, L_2)$, the influence of a single intensity variable indeed vanishes with time: define inductively $m^{[0]}(\lambda_t, Y_t) := m(\lambda_t, Y_t)$ and $m^{[k]}(\lambda_{t-k}, Y_{t-k}, \dots, Y_t) := m\big(m^{[k-1]}(\lambda_{t-k}, Y_{t-k}, \dots, Y_{t-1}), Y_t\big)$, and observe that due to (C)
\[
  \big|\lambda_{t+1} - m^{[k]}(0, Y_{t-k}, \dots, Y_t)\big| = \big|m^{[k]}(\lambda_{t-k}, Y_{t-k}, \dots, Y_t) - m^{[k]}(0, Y_{t-k}, \dots, Y_t)\big| = O(L_1^k) \tag{1}
\]
for $k \to \infty$. This also ensures the existence of a stationary distribution $\pi$ because two one-sided INGARCH(1,1) processes with the same link function but different starting points would eventually behave alike, indicating that they have reached a stationary regime [8, 18]. Hence, the two-sided INGARCH(1,1) process is well defined [8, 23]. Furthermore, equation (1) implies $\mathcal{F}_t = \sigma\{Y_t, Y_{t-1}, \dots\}$, which means that all necessary information for statistical inference is carried by the count process [18]. Thus, from a statistical point of view, estimating the link function without observing any intensity should be a feasible task.

We propose a minimum contrast estimator for the link function. Recall that the functional $X \mapsto E(Y_{n+1} - X)^2$ is minimized on the set of all $\sigma\{Y_0, \dots, Y_n\}$-measurable random variables by $X = E[Y_{n+1} \mid Y_0, \dots, Y_n]$. Furthermore, the contraction property (C) implies for $n \to \infty$,
\[
  \big|E[Y_{n+1} \mid Y_0, \dots, Y_n] - m^{[n]}(0, Y_0, \dots, Y_n)\big| = O(L_1^n).
\]
Hence, $m$ is an approximate minimum of the contrast functional $\Phi : g \mapsto E\big(Y_{i+1} - g^{[i]}(0, Y_0, \dots, Y_i)\big)^2$ over $\mathcal{G}$, and a sensible estimator on the basis of observations $Y_0, \dots, Y_n$ might be obtained by minimizing an empirical analogue of that functional.

Definition 3.

Let $\mathcal{B}(\mathcal{G})$ be the Borel $\sigma$-field over the normed space $(\mathcal{G}, \|\cdot\|_\infty)$. A least squares estimator based on $n + 1$ consecutive observations of the count process is a random element $\hat{m}_n[Y_0, \dots, Y_n] : (\Omega, \mathcal{F}) \to (\mathcal{G}, \mathcal{B}(\mathcal{G}))$ that minimizes the empirical contrast functional
\[
  \Phi_n : g \mapsto \frac{1}{n} \sum_{i=0}^{n-1} \big(Y_{i+1} - g^{[i]}(0, Y_0, \dots, Y_i)\big)^2
\]
over $(\mathcal{G}, \|\cdot\|_\infty)$.

The set of least squares estimators is non-void. To prove this assertion, it has to be shown that, given realizations of the count process, the minimum of $\Phi_n(g)$ over $\mathcal{G}$ is attained and that there exists a $(\mathcal{B}(\mathbb{R}^{n+1}) - \mathcal{B}(\mathcal{G}))$-measurable selection function $T$ assuming values in the set of the functional's minimizers. The random element $T(Y_0, \dots, Y_n)$ would then qualify as a least squares estimator. The functional $\Phi_n$ attains its minimum over $(\mathcal{G}, \|\cdot\|_\infty)$ since it is continuous with respect to $\|\cdot\|_\infty$ and because the space $(\mathcal{G}, \|\cdot\|_\infty)$ is compact. The existence of a measurable selector can be proven by a successive application of Jennrich's Lemma [14] and the Kuratowski-Ryll-Nardzewski selection theorem [cf. 1]. The formal argument is rather technical and is omitted here. It can be found in full detail in [23].

Before we present our main result on the asymptotic behavior of the least squares estimator, we summarize our technical assumptions. Most importantly, we make special stipulations regarding the domain $D$ of the candidate link functions $g \in \mathcal{G}$.
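The empirical contrast of Definition 3 can be evaluated in a single pass over the data, because the iterate $g^{[i]}(0, Y_0, \dots, Y_i)$ is obtained from $g^{[i-1]}(0, Y_0, \dots, Y_{i-1})$ by one further application of $g$. The following sketch shows this under the assumption that `counts` holds $Y_0, \dots, Y_n$ as a Python list; the function names are hypothetical and the candidate `g` in the usage lines is made up for illustration.

```python
def iterated_prediction(g, counts, i):
    """Return g^[i](0, Y_0, ..., Y_i): start at intensity 0 and feed Y_0..Y_i through g."""
    lam = 0.0
    for y in counts[: i + 1]:
        lam = g(lam, y)
    return lam


def empirical_contrast(g, counts):
    """Phi_n(g) = (1/n) * sum_{i=0}^{n-1} (Y_{i+1} - g^[i](0, Y_0..Y_i))^2 for counts Y_0..Y_n."""
    n = len(counts) - 1
    lam, total = 0.0, 0.0
    for i in range(n):
        lam = g(lam, counts[i])              # running value of g^[i](0, Y_0, ..., Y_i)
        total += (counts[i + 1] - lam) ** 2  # squared one-step prediction error
    return total / n


def g(lam, y):
    """Made-up contractive candidate with L1 = 0.5 and L2 = 0.25."""
    return 0.5 * lam + 0.25 * y


value = empirical_contrast(g, [1, 0, 2])
```

Evaluating the contrast costs $O(n)$ applications of $g$ per candidate; recomputing each iterate from scratch with `iterated_prediction` would cost $O(n^2)$.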

Definition 4 (Technical assumptions).
(A1) Let $M > 0$, $B \in \mathbb{N}_+$ and $0 \le L_1, L_2 < 1$ be fixed constants. The set of candidate functions $\mathcal{G}(M, B, L_1, L_2)$ consists of all functions $g \in \mathcal{G}(D, L_1, L_2)$ with $D := [0, M] \times \mathbb{N}$ that satisfy $g(\lambda, y) = g(\lambda, B - 1)$ for all $y \ge B - 1$ and all $\lambda \in [0, M]$.
(A2) The data generating process $\{(Y_t, \lambda_t)\}_{t \in \mathbb{Z}}$ is an INGARCH(1,1) process with link function $m \in \mathcal{G}(M, B, L_1, L_2)$ and stationary distribution $P^{(\lambda_t, Y_t)} =: \pi$.
(A3) The $\mathcal{G}$-valued random element $\hat{m}_n$ is a least squares estimator according to Definition 3.

The following theorem is our main result. It states that the loss $L(\hat{m}_n, m) := \int \big(\hat{m}_n(\lambda, y) - m(\lambda, y)\big)^2 \, \pi(d\lambda, dy)$ converges in probability to zero if the number of observations grows to infinity, and it provides a rate of this convergence.

Theorem 1.

For $n \in \mathbb{N}_+$, let $\hat{m}_n$ be a least squares estimator for $m$ on the basis of observations $Y_0, \dots, Y_n$ from the data generating process. Let the sequence $\{\delta_n\}_{n \in \mathbb{N}_+}$ be given by $\delta_n = n^{-2/3} (\log n)^2$. Then $L(\hat{m}_n, m) = O_P(\delta_n)$, i.e.
\[
  \limsup_{n \to \infty} P\Big\{ \int \big(\hat{m}_n - m\big)^2 \, d\pi > \delta_n \Big\} = 0. \tag{2}
\]
The proof of this theorem is given in Section 4. It proceeds along a series of auxiliary results. For the sake of clarity, the proofs of most auxiliary results are deferred to Section 5.
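The loss in Theorem 1 is an integral with respect to the stationary distribution $\pi$, which is not available in closed form. For a known link function it can be approximated numerically by an average along a long simulated path. The sketch below is an illustration under assumptions, not part of the paper: the links `m_true` and `g_candidate`, the cutoff `B = 5`, the bound $M = 3$ and the burn-in length are all made up, and the time average is used as a stand-in for the integral on the grounds that the process is stationary and mixing.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's method for one Poisson(lam) draw; adequate for the small intensities used here."""
    threshold, count, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


B = 5  # hypothetical cutoff: both links are constant in y for y >= B - 1


def m_true(lam, y):
    """Hypothetical true link; maps [0, 3] x N into [0.5, 3.0], so M = 3 works."""
    return 0.5 + 0.3 * lam + 0.4 * min(y, B - 1)


def g_candidate(lam, y):
    """Hypothetical competing candidate with the same cutoff."""
    return 0.6 + 0.2 * lam + 0.4 * min(y, B - 1)


def approximate_loss(g, m, n=20000, burn_in=500, seed=1):
    """Time-average approximation of the integral of (g - m)^2 with respect to pi."""
    rng = random.Random(seed)
    lam, y, total = 0.0, 0, 0.0
    for step in range(n + burn_in):
        lam = m(lam, y)               # lambda_t = m(lambda_{t-1}, Y_{t-1})
        y = sample_poisson(lam, rng)  # Y_t | past ~ Poisson(lambda_t)
        if step >= burn_in:           # discard the start-up phase
            total += (g(lam, y) - m(lam, y)) ** 2
    return total / n


loss = approximate_loss(g_candidate, m_true)
```

Here the two links differ by $0.1\,(1 - \lambda)$, so the approximated loss is small but strictly positive, while `approximate_loss(m_true, m_true)` is exactly zero.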

3. Discussion

The rate $L(\hat{m}_n, m) = O_P\big(n^{-2/3} (\log n)^2\big)$ reflects the size of the function class $\mathcal{G}$ in terms of the smoothness of the candidate functions and the dimension of their domains. In nonparametric regression and density estimation with i.i.d. samples, lower bounds for the squared $L_2$-loss of estimators for functions with degree of smoothness $\beta$, in terms of a Sobolev parameter, and domain dimension $d$ are typically given by $n^{-2\beta/(2\beta + d)}$, up to a positive multiplicative constant [20]. In our model, the introduction of the threshold value $B$ essentially reduces the problem to a parametric one in the second component. Thus, the effective dimension of the nonparametric estimation problem is $d = 1$. As to the degree of smoothness, the contractive property is a tighter version of Lipschitz continuity. Consequently, $\int_{[0, M]} \big|\frac{\partial}{\partial\lambda} m(\lambda, y)\big|^2 \, d\lambda < \infty$ for any $y \in \{0, \dots, B - 1\}$ [cf. 2], i.e. $m(\cdot, y)$ is an element of the Sobolev class of functions with smoothness parameter $\beta = 1$. We therefore conjecture that a lower bound for the rate of convergence in our model is of order $n^{-2/3}$.

This conjecture is corroborated by the following result for a nonparametric GARCH(1,1) model. Suppose the corresponding link function $m$ belongs to a Hölder class of monotone functions with smoothness parameter $\beta$ and domains with dimension $d = 2$. Imposing some additional regularity conditions, Meister and Kreiß [17] proved that in this case
\[
  \inf_{\{\hat{m}_n\}} \liminf_{n \to \infty} n^{2\beta/(2\beta + d)} \sup_m E \int \big(m - \hat{m}_n\big)^2 \, d\pi > 0. \tag{3}
\]
There, $\pi$ denotes the stationary distribution of the data generating GARCH(1,1) process, and the infimum is taken over all sequences of estimators.

In conclusion, we strongly suspect that the rate we provide in Theorem 1 is optimal up to the logarithmic term.

We want to address the implications of the cutoff threshold $B$. Introducing this quantity confined the class of candidate functions to a size that is associated with one-dimensional rather than two-dimensional domains. If we forwent this cutoff by setting $B = \infty$, we would blow up the class $\mathcal{G}$ dramatically. The consequence would supposedly be a rate of convergence not faster than $n^{-1/2}$. On the other hand, assigning $B$ too small a value leads to a misspecified model, rendering the aforementioned advantage worthless. To find a compromise between a broad model and a fast rate of convergence, we suggest the following strategy.

Let us still suppose that $y \mapsto m(\lambda, y)$ is constant for all $y \ge B^* - 1$ and some $B^* \in \mathbb{N}_+$. Let $\{B_n\}_{n \in \mathbb{N}} \subset \mathbb{N}_+$ be a sequence with $B_n \le B_{n+1}$ for all $n$, and $B_n \to \infty$. We define the classes
\[
  \mathcal{G}_\infty := \mathcal{G}(M, \infty, L_1, L_2), \quad \mathcal{G}_n := \mathcal{G}(M, B_n, L_1, L_2), \quad \mathcal{G}^* := \mathcal{G}(M, B^*, L_1, L_2). \tag{4}
\]
There exists a number $n^* \in \mathbb{N}$ such that $B_{n^* - 1} \le B^* \le B_{n^*}$ and therefore $\mathcal{G}_0 \subset \dots \subset \mathcal{G}_{n^* - 1} \subset \mathcal{G}^* \subset \mathcal{G}_{n^*} \subset \dots \subset \mathcal{G}_\infty$. Thus, if $n$ is sufficiently large, the set $\mathcal{G}_n$ certainly contains the true link function. Following this idea, we define a sequence of modified least squares estimators and present a result regarding the asymptotic behavior of this sequence.

Definition 5.

Let $\{(\lambda_t, Y_t)\}_{t \in \mathbb{Z}}$ be a stationary version of a two-sided nonparametric INGARCH(1,1) process with link function $m \in \mathcal{G}^*$, and denote by $\mathcal{B}(\mathcal{G}_\infty)$ the Borel $\sigma$-field over $(\mathcal{G}_\infty, \|\cdot\|_\infty)$. The estimator $\tilde{m}_n[Y_0, \dots, Y_n]$ of $m$ on the basis of $n + 1$ successive observations of the count process is given by a $(\mathcal{F} - \mathcal{B}(\mathcal{G}_\infty))$-measurable function minimizing the empirical contrast $\Phi_n(g)$ over the class $\mathcal{G}_n$.

Theorem 2.

Suppose that $m \in \mathcal{G}(M, B^*, L_1, L_2)$, and let $\hat{m}_n$ be a least squares estimator of $m$ that minimizes the contrast functional $\Phi_n(g)$ over the correctly specified set of candidate functions $\mathcal{G}(M, B^*, L_1, L_2)$ (cf. Definition 3). Let the non-decreasing sequence $\{B_n\}_{n \in \mathbb{N}} \subset \mathbb{N}_+$ satisfy $B_n \le B \sqrt{\log n}$. Then the sequence of estimators $\{\tilde{m}_n\}_{n \in \mathbb{N}}$ chosen, according to Definition 5, from the growing classes of candidate functions $\{\mathcal{G}_n\}_{n \in \mathbb{N}}$, respectively, attains the same rate of convergence as $\{\hat{m}_n\}_{n \in \mathbb{N}}$, i.e. $L(\tilde{m}_n, m) = O_P\big(n^{-2/3} (\log n)^2\big)$.

The proof of this claim is completely analogous to the proof of Theorem 1, which can be copied with only minor modifications [23].
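The role of the effective dimension in the comparison between a finite and an infinite cutoff can be made explicit by evaluating the generic lower-bound rate $n^{-2\beta/(2\beta + d)}$ for Lipschitz-type smoothness. The values below are plain arithmetic under the assumption $\beta = 1$; they are heuristic benchmarks, not proven lower bounds for the INGARCH(1,1) model.

```latex
\[
  n^{-2\beta/(2\beta + d)} \Big|_{\beta = 1,\; d = 1} = n^{-2/3},
  \qquad
  n^{-2\beta/(2\beta + d)} \Big|_{\beta = 1,\; d = 2} = n^{-2/4} = n^{-1/2}.
\]
```

With the cutoff in place the benchmark corresponds to $d = 1$, whereas dropping the cutoff would move the problem towards the slower $d = 2$ benchmark.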


The task of minimizing $\Phi_n(g)$ over $\mathcal{G}(M, B, L_1, L_2)$ is computationally infeasible. Numerically approximating a realization of the proposed estimator requires a discretization of the approach. An approximation of $\mathcal{G}$ with a finite-dimensional subset $\mathcal{G}_n \subset \mathcal{G}$, e.g. based on splines, is one option [23]. However, with this approach several pitfalls remain. Even if finite-dimensional, the resulting optimization problem still includes a large number of variables, exposing us to the curse of dimensionality. Furthermore, the functional that we seek to optimize has a somewhat unclear structure, making it hard to assess the question of convexity. We suspect that the optimization problem is not convex, which would require the use of global optimization algorithms. These are often slow and especially sensitive to the problem's dimension. Thus, the question of how to compute realizations of the least squares estimator is certainly not yet settled to a satisfying degree, calling for more research in this direction.

For a nonparametric integer-valued GARCH model for count data with hidden intensities, we proposed as an estimator for the link function the minimizer of the empirical contrast functional $\Phi_n$ over the class of candidate functions. To guarantee good estimates, we stipulated that the candidate functions, including the true link function, satisfy the contractive condition (C). A rather restrictive additional assumption on the candidate functions is that they are constant in the count component from a threshold value $B$ onward. However, this restriction can be mitigated by letting the threshold $B$ grow gently with the sample size.

In conclusion, we have demonstrated that the principle of minimizing the empirical contrast $\Phi_n$ over sufficiently large sets of candidate functions leads to procedures with a favorable asymptotic behavior. As to efficient algorithms for the numerical implementation of these procedures, our theoretical results may serve as a motivation for future research. Furthermore, rigorously proven results on minimax lower bounds for the INGARCH(1,1) model are certainly a desirable supplement to our work.
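To make the idea of a finite-dimensional approximation concrete, the following sketch represents a candidate by its values on a grid in $\lambda$ and on $y \in \{0, \dots, B - 1\}$, interpolates linearly in $\lambda$, and picks the best admissible candidate from a small finite family by exhaustive search. This is a toy illustration under stated assumptions and not the spline approach of [23]: the table representation, all function names, and the exhaustive search are hypothetical choices, and the search does not scale beyond tiny families.

```python
def make_table_link(table, step):
    """Return g(lam, y) built from node values table[j][y] at lam = j * step.

    Linear interpolation in lam; y is cut off at B - 1 with B = len(table[0]).
    """
    last, B = len(table) - 1, len(table[0])

    def g(lam, y):
        lam = min(max(lam, 0.0), last * step)  # clamp to [0, M] with M = last * step
        y = min(y, B - 1)                      # constant in y beyond B - 1
        j = min(int(lam / step), last - 1)     # left node of the grid cell
        w = lam / step - j                     # interpolation weight in [0, 1]
        return (1.0 - w) * table[j][y] + w * table[j + 1][y]

    return g


def is_admissible(table, step, L1, L2, tol=1e-12):
    """Check the range [0, M] and condition (C) for the interpolated function via its nodes."""
    rows, B = len(table), len(table[0])
    in_range = all(0.0 <= v <= (rows - 1) * step for row in table for v in row)
    slopes_ok = all(abs(table[j + 1][y] - table[j][y]) <= L1 * step + tol
                    for j in range(rows - 1) for y in range(B))
    jumps_ok = all(abs(table[j][y + 1] - table[j][y]) <= L2 + tol
                   for j in range(rows) for y in range(B - 1))
    return in_range and slopes_ok and jumps_ok


def empirical_contrast(g, counts):
    """Phi_n(g) for counts Y_0..Y_n, using the running iterate g^[i](0, Y_0..Y_i)."""
    lam, total = 0.0, 0.0
    for i in range(len(counts) - 1):
        lam = g(lam, counts[i])
        total += (counts[i + 1] - lam) ** 2
    return total / (len(counts) - 1)


def best_candidate(tables, step, L1, L2, counts):
    """Minimize the empirical contrast over the admissible tables by exhaustive search."""
    feasible = [t for t in tables if is_admissible(t, step, L1, L2)]
    return min(feasible, key=lambda t: empirical_contrast(make_table_link(t, step), counts))
```

Checking the node values suffices here because, for fixed $y$, the interpolant is piecewise linear in $\lambda$, and, for fixed $\lambda$, differences in $y$ are convex combinations of node differences; condition (C) for general pairs then follows from the triangle inequality.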

4. Proof of the main result

Before we proceed with the first auxiliary lemma, we fix the notation that we use throughout the rest of the proof.

Definition 6.
(i) The true link function corresponding to the data generating process will be denoted $m$; candidate functions in $\mathcal{G}$ will typically be denoted $g$ or $h$;
(ii) the conditional expectation of a random variable $X$ given that $\hat{m}_n = g$ is denoted by $E_{|\hat{m}_n = g}[X]$;
(iii) $Y_k^l := (0, Y_k, \dots, Y_l)$;
(iv) for any $g \in \mathcal{G}$, $i \in \mathbb{Z}$ and $t \in \mathbb{N}_+$, let $g^{[0]}(\lambda_i, Y_i) := g(\lambda_i, Y_i)$ and $g^{[t]}(\lambda_{i-t}, Y_{i-t}, \dots, Y_i) := g\big(g^{[t-1]}(\lambda_{i-t}, Y_{i-t}, \dots, Y_{i-1}), Y_i\big)$;
(v) for any $g \in \mathcal{G}$, $i \in \mathbb{Z}$, $t \in \mathbb{N}$,
\[
  f_t(g; Y_{i-t}^{i+1}) := \big(Y_{i+1} - m^{[t]}(Y_{i-t}^{i})\big)^2 - \big(Y_{i+1} - g^{[t]}(Y_{i-t}^{i})\big)^2
  = \big(Y_{i+1} - m^{[t]}(0, Y_{i-t}, \dots, Y_i)\big)^2 - \big(Y_{i+1} - g^{[t]}(0, Y_{i-t}, \dots, Y_i)\big)^2.
\]

In the notation of part (ii), the loss can be written as a conditional expectation, $L(\hat{m}_n, m) = E_{|\hat{m}_n = g}\big[m(\lambda', Y') - g(\lambda', Y')\big]^2$, where $(\lambda', Y') \sim \pi$ is independent of the original data generating process.

For the proof of Theorem 1 we use some principles that are well known in the asymptotic analysis of least squares estimators for regression functions $r(x) := E_{|\xi_i = x}[\eta_i]$, $r \in \mathcal{R}$, with independent and identically distributed (i.i.d.) data pairs $(\eta_i, \xi_i) \sim P$ and some nonparametric class $\mathcal{R}$ of candidate functions. The estimator $\hat{r}$ is chosen so as to minimize the contrast functional $\varphi_n(g) = \frac{1}{n} \sum_{i=1}^{n} (\eta_i - g(\xi_i))^2$ over $\mathcal{R}$, and the central bound for the quadratic loss can be established,
\begin{align}
  \int [r(x) - \hat{r}(x)]^2 \, P(dy, dx)
  &= \int \big[(y - \hat{r}(x))^2 - (y - r(x))^2\big] \, P(dy, dx) \tag{5} \\
  &\le \big(\varphi_n(r) - \varphi_n(\hat{r})\big) - \int \big[(y - r(x))^2 - (y - \hat{r}(x))^2\big] \, P(dy, dx) \notag \\
  &\le \sup_{g \in \mathcal{R}} \Big\{ \big(\varphi_n(r) - \varphi_n(g)\big) - E\big(\varphi_n(r) - \varphi_n(g)\big) \Big\}. \notag
\end{align}
A bound for the last term can be found by means of classical empirical process theory.

We will adapt this line of argument to the nonparametric INGARCH(1,1) setting. However, due to the more complicated dependency structure of our data generating process, we will not be able to exploit as simple a relation as (5). As the univariate count process is easier to handle than the bivariate process, we seek as a starting point a relation similar to (5) but adapted to the contrast functional $\Phi_n$ (cf. Definition 3) and without the appearance of any $\lambda_i$. This is the subject of Lemma 1. Subsequently, we discuss the dependency structure of the count process and prove that it is uniformly mixing, which leads to the coupling of Corollary 1. The resulting empirical process is then bounded in probability in Lemma 7, after which the conclusion of the proof follows.

Lemma 1.

Let δ = δ ( n ) = n − log n and t = t ( n ) = − § L log n ¨ .(i) Suppose that Ω is the set of all ω ∈ Ω such that ε ¡ L ( ˆ m n , m ) ¢ ≥ M L t for some g ∈ G and ε > . Then E | ˆ m n = g £ − f t ( g ; Y i + i − t ) ¤ ≥ (1 − ε ) L ( ˆ m n , m ) − O ( L t ) for almost all ω ∈ Ω .(ii) There exists a positive constant γ > such that for the set G k : = © g ∈ G : E [ m [ t ] ( Y t ) − g [ t ] ( Y t )] ≤ k + γδ ª , almost all n ∈ N , and n → ∞ P n L ( ˆ m n , m ) > δ o ≤ P ∞ [ k = ½ sup g ∈ G k n − t n − X i = t ¡ f t ( g ; Y i + i − t ) − E f t ( g ; Y i + i − t ) ¢ > k − γδ ¾ + o (1). imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 M. Wechsung, M.H. Neumann

Note that the stochastic process $\{Y_{i-t}^{i+1}\}_{i \in \mathbb{Z}}$ is not an i.i.d. sequence. In order to bound the random functional $g \mapsto \frac{1}{n-t} \sum_{i=t}^{n-1} \big(f_t(g; Y_{i-t}^{i+1}) - E f_t(g; Y_{i-t}^{i+1})\big)$ in probability uniformly over the set $\mathcal{G}_k$, we want to use tools from the theory of empirical processes. These tools require the dependence in the data generating process to be sufficiently weak. We will show that this is indeed the case because the count process is uniformly mixing.

Definition 7.

A stationary sequence $\{\eta_i\}_{i \in \mathbb{Z}}$ of $(\mathbb{R}, \mathcal{B})$-valued random variables on $(\Omega, \mathcal{F}, P)$ is called uniformly mixing if the sequence of mixing coefficients $\{\phi(k)\}_{k \in \mathbb{N}}$,
\[
  \phi(k) := \operatorname{ess\,sup} \Big\{ \sup_{B \in \mathcal{B}^\infty} \big| P\{(\eta_k, \eta_{k+1}, \dots) \in B \mid \eta_0, \eta_{-1}, \dots\} - P\{(\eta_k, \eta_{k+1}, \dots) \in B\} \big| \Big\},
\]
converges to zero as $k \to \infty$.

The most striking feature of a uniformly mixing sequence $\{\eta_i\}_{i \in \mathbb{Z}}$ is that $\eta_0$ and $\eta_k$ are almost independent if $k$ is large enough. Due to the stationarity of the process, the same is true for any pair $\eta_{i+k}, \eta_i$ with $i \in \mathbb{Z}$. The sense in which this almost independence materializes is specified by the next lemma, which was used by Doukhan [6] to prove exponential inequalities for absolutely regular sequences. It is based on the classic coupling lemma by Berbee [3].

Lemma 2.

Let $\{\eta_i\}_{i \in \mathbb{Z}}$ be a uniformly mixing random sequence on $(\Omega, \mathcal{F}, P)$, with mixing coefficients $\{\phi(k)\}_{k \in \mathbb{N}}$. For $q \in \mathbb{N}_+$, there exist two random sequences $\{\eta'_i\}_{i \in \mathbb{Z}}$ and $\{\eta^*_i\}_{i \in \mathbb{N}}$ on a probability space $(S, \Sigma, \mathbb{P})$ such that
(i) the processes $\{\eta_i\}$ and $\{\eta'_i\}$ have the same distribution;
(ii) the process $\{\eta^*_i\}$ is $q$-dependent, i.e. the block sequences $\{(\eta^*_{2jq}, \dots, \eta^*_{(2j+1)q-1}) : j \in \mathbb{N}\}$ and $\{(\eta^*_{(2j+1)q}, \dots, \eta^*_{(2j+2)q-1}) : j \in \mathbb{N}\}$ are i.i.d., respectively;
(iii) $\mathbb{P}^{(\eta'_{jq}, \dots, \eta'_{(j+1)q-1})} = \mathbb{P}^{(\eta^*_{jq}, \dots, \eta^*_{(j+1)q-1})}$ for any $j \in \mathbb{N}$;
(iv) $\mathbb{P}\{\eta'_i \ne \eta^*_i \text{ for some } i \in \{0, \dots, n-1\}\} \le \frac{n}{q} \phi(q)$ for any $n \in \mathbb{N}_+$.

The count process $\{Y_t\}_{t \in \mathbb{Z}}$ is uniformly mixing because the contraction property (C) renders negligible the difference between the conditional distribution of $(Y_k, Y_{k+1}, \dots)$ given the past of the process and the unconditional distribution $P^{(Y_k, Y_{k+1}, \dots)}$ for large $k$. This property carries over to the process $\{Y_{i-t}^{i+1}\}_{i \in \mathbb{Z}}$. We state the formal result without a proof; the interested reader can find a detailed proof in [23]. Mixing properties of a general class of GARCH-type processes including our data generating process are discussed by Doukhan and Neumann [8].

Lemma 3.

For any $t \in \mathbb{N}$, the process $\{Y_{i-t}^{i+1}\}_{i \in \mathbb{Z}}$ is uniformly mixing, and the corresponding mixing coefficients $\phi_t(k)$ are geometrically decreasing, $\phi_t(k) \lesssim (L_1 + L_2)^{k-t}$.

Corollary 1.

Let $t \in \mathbb{N}$ be chosen as in Lemma 1. For $q \in \mathbb{N}_+$, there exist two sequences of $\mathbb{R}^{t+3}$-valued random vectors, $V' = \{V'_i\}_{i \in \mathbb{Z}}$ and $V^* = \{V^*_{t+i}\}_{i \in \mathbb{N}}$, on a probability space $(S, \Sigma, \mathbb{P})$ such that: $\mathbb{P}^{V'} = P^{\{Y_{i-t}^{i+1}\}}$, the sequence $V^*$ is $q$-dependent, $\mathbb{P}^{(V'_{jq}, \dots, V'_{(j+1)q-1})} = \mathbb{P}^{(V^*_{jq}, \dots, V^*_{(j+1)q-1})}$, and
\[
  \mathbb{P}\{V'_i \ne V^*_i \text{ for some } i \in \{t, \dots, n-1\}\} \le \frac{n-t}{q} \phi_t(q).
\]

Note that for any $i$, the first component of $V^*_i$ can be non-zero only on a $\mathbb{P}$-null set. Since such a set is negligible for the further argument, we stipulate without loss of generality that $V^*_i = (0, Y^*_{i-t}, \dots, Y^*_{i+1})$ with $\mathbb{P}^{Y^*_i} = P^{Y_i}$. At this point, we have to introduce some further notation.

Definition 8.
(i) For any $i \in \mathbb{Z}$, the random variable $Y^*_i$ on $(S, \Sigma, \mathbb{P})$ is defined as the last coordinate of $V^*_{i-1}$; the random vector $Z^*_i$ is defined as $Z^*_i := (0, Y^*_{i-t}, \dots, Y^*_i)$.
(ii) The expectation with respect to $\mathbb{P}$ is denoted by $\mathbb{E}$.
(iii) $\lambda^*_i := \mathbb{E}[Y^*_i \mid Y^*_{i-1}, Y^*_{i-2}, \dots]$.
(iv) For functions $g \in \mathcal{G}$ and some length $q$, we introduce
\begin{align}
  N &:= \begin{cases} \frac{1}{2} \big\lfloor \frac{n-t}{q} \big\rfloor & \text{if } \big\lfloor \frac{n-t}{q} \big\rfloor \text{ is even,} \\[2pt] \frac{1}{2} \big( \big\lfloor \frac{n-t}{q} \big\rfloor - 1 \big) & \text{if } \big\lfloor \frac{n-t}{q} \big\rfloor \text{ is odd,} \end{cases} \tag{6} \\
  X^*_r(g) &:= \frac{1}{q} \sum_{i=0}^{q-1} \big( f_t(g; V^*_{t+rq+i}) - \mathbb{E} f_t(g; V^*_{t+rq+i}) \big), \tag{7} \\
  R_n(g) &:= \frac{1}{n-t} \sum_{i=2Nq}^{n-1-t} \big( f_t(g; V^*_{t+i}) - \mathbb{E} f_t(g; V^*_{t+i}) \big). \tag{8}
\end{align}

Remark 1.

The coupling lemma implies that X ∗ r ( g ) and X ∗ r + ( g ) are independent for any r ∈ N and g ∈ G . Using the equality1 n − t n − X i = t ¡ f t ( g ; V ∗ i ) − E f t ( g ; V ∗ i ) ¢ = qn − t N − X j = X ∗ j ( g ) + qn − t N − X j = X ∗ j + ( g ) + R n ( g ) (9)and the triangle inequality for probabilities, we obtain on the basis of Lemma 1 (ii) andCorollary 1, for almost all n ∈ N P n L ( ˆ m n , m ) > δ o ≤ ∞ X k = P ½ sup g ∈ G k qn − t N − X j = X ∗ j ( g ) > k − γδ ¾ (10) + ∞ X k = P ½ sup g ∈ G k R n ( g ) > k − γδ ¾ + n − tq φ t ( q ) + o (1) (11)as n → ∞ .The functional in line (10), g [ t ] qn − t P N − j = X ∗ j ( g ), can be viewed as the trajectoryof an empirical process driven by i.i.d. random variables, indexed by the function class { g [ t ] : g ∈ G k } . A standard tool to ﬁnd uniform bounds for these trajectories is the so calledchaining technique [12, 13, 21, 22]. It is based on uniform bounds for the trajectories over imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 M. Wechsung, M.H. Neumann successively reﬁned ﬁnite approximations of the index set. Therefore, estimating the sizeof the index set in terms of covering numbers is of central importance. Since the functions g [ t ] have a high dimensional domain, which induces large covering numbers for the respec-tive function class, it is clearly desirable to have a link to the considerably less complexclass G k . In this context, the usual strategy, considering covering numbers with respect tothe L norm, is unfeasible since it is unclear to us how the L ( P ( λ ∗ i , Y ∗ i ) ) norm of a function g ∈ G and the L ( P Z ∗ i ) norm of the t -fold iterated function g [ t ] are related. However, quiteeasily we can ﬁnd a relation between the respective k · k ∞ -norms. This is done in Lemma4 (i), which will also enable us to bound the remainder term in line (11). Part (ii), in turn,provides a bound for the k · k ∞ -covering number of the index set G k . Lemma 4. 
(i) For any g , h ∈ G , ¯¯ f t ( g ; V ∗ i ) − f t ( h ; V ∗ i ) ¯¯ ≤ Y ∗ i + + M )1 − L k g − h k ∞ . (ii) For any k , s ∈ N , there exists a set G ( s ) k ⊂ G k with at most e MB s − k /( p γδ ) elements and aselection function π s , k : G k → G ( s ) k such that k g − π s , k g k ∞ ≤ − s k + p γδ for any g ∈ G k . Westipulate the notation π s , k g = : g s , k . Remark 2.

As a simple consequence of Lemma 4 (i), we obtain a bound for remainderterm R n , P ½ sup g ∈ G k R n ( g ) > k − γδ ¾ . − k δ − qn − t . (12)This follows from Markov’s inequality in combination with the inequality E £ sup g ∈ G | R n ( g ) | ¤ ≤ E h sup g ∈ G n − t n − X i = Nq ¯¯ f t ( g ; V ∗ t + i ) ¯¯ + E ¯¯ f t ( g ; V ∗ t + i ) ¯¯i ≤ n − t n − X i = Nq M − L E ¡ Y ∗ t + i + + M ¢ ≤ qn − t M − L , (13)which follows from Lemma 4 (i) and the fact that f t ( m ; V ∗ i ) = imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 onparametric INGARCH models Lemma 5.

On a probability space ( E , E , P ) , let Y ,... , Y n be R d -valued random variablesand ε ,... , ε n independent Rademacher variables. Let H be a class of continuous functionsh : R d → R such that R | h ( Y ) | dP < ∞ . If sup h ∈ H var ³ P ni = h ( Y i ) ´ ≤ t ,P ½ sup h ∈ H ¯¯ n X i = h ( Y i ) ¯¯ > t ¾ ≤ P ½ sup h ∈ H ¯¯ n X i = ε i h ( Y i ) ¯¯ > t /4 ¾ .In Lemma 6, Part (ii) applies Lemma 5 to the sums qn − t P N − j = X ∗ j ( g ), for which Part (i)supplies the requisite uniform variance bound. Lemma 6.

Assume that the quantities δ and t are given by δ ( n ) = n − log n and t ( n ) =− § L log n ⌉ > Let the length of the blocks X ∗ j ( g ) also depend on the sample size insuch a way that q ( n ) ≍ t ( n ) . Recall the constant γ introduced in Lemma 1 (ii). Then(i) sup g ∈ G k var ³ qn − t P N − j = X ∗ j ( g ) ´ ≤ ( M + M + M )2 k + γδ qn − t ,(ii) for all k ∈ N and almost all n ∈ N P ½ sup g ∈ G k qn − t N − X j = X ∗ j ( g ) > k − γδ ¾ ≤ P ½ sup g ∈ G k qn − t N − X j = ε j X ∗ j ( g ) > k − γδ ¾ .We have ﬁnally obtained a form of the empirical process that is well suited for theapplication of the chaining argument. The resulting bound is presented in the followinglemma. Lemma 7.

Suppose that the quantities δ , t, and q are deﬁned as in Lemma 6. In accor-dance with Lemma 4 (ii), for g ∈ G and ˇ S ( n ) : = min © s ∈ N : − L − s p γδ ≤ − γδ /(15 M ) ª ,the functions g k ,... , g ˇ S , k shall be given such that k g − g s , k k ∞ ≤ − s k + p γδ . Then thereexists a positive constant C and a natural number n such that for all n ≥ n and all k ∈ N P ½ sup g ∈ G k qn − t N − X j = ε j X ∗ j ( g ) > k − γδ ¾ . − k log nn + n − ( k + + exp ³ − C n k ´ + − k (log n ) − .Combining the estimate in Remark 1 with Lemma 7 and Remark 2, we obtain that forall but ﬁnitely many n and all k ∈ N P n L ( ˆ m n , m ) > δ n o . b n + ∞ X k = a n , k , (14)with a n , k : = − k log nn + n − k n − + exp ³ − C n k ´ + − k (log n ) − + − k δ − qn − t , (15) b n : = n − tq φ t ( q ) + o (1). (16)Recall that t ( n ) = − § L log n ¨ and q ( n ) ≍ t ( n ). These facts imply δ − n qn − t ≍ n − (log n ) − ,whence we infer that lim n →∞ a n , k = k ∈ N . Since there exists an absolute summable imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 M. Wechsung, M.H. Neumann sequence { η k } k ∈ N ⊂ R such that sup n ≥ | a n , k | ≤ η k for any k , we conclude that P ∞ k = a n , k → n → ∞ . Because φ t ( q ) . ( L + L ) q − t , the sequence { q ( n ) } can be speciﬁed such that n − tq ( φ t ( q )) q − t ≍ (log n ) − , implying that lim n →∞ b n =

5. Proofs of auxiliary results

Proof of Lemma 1. (i) The ﬁrst statement of the lemma is a consequence of the factthat E | ˆ m n = g ·³ Y ′ t + − g [ t ] (0, Y ′ ,... , Y ′ t ) ´ − ³ Y ′ t + − m [ t ] (0, Y ′ ,... , Y ′ t ) ´ ¸ ≥ E | ˆ m n = g h m [ t ] (0, Y ′ ,... , Y ′ t ) − g [ t ] (0, Y ′ ,... , Y ′ t ) i − M L t (17)almost surely, combined with the inequality E | ˆ m n = g h m [ t ] (0, Y ′ ,... , Y ′ t ) − g [ t ] (0, Y ′ ,... , Y ′ t ) i > (1 − ε ) L ( ˆ m n , m ) − M L t , (18)which holds for almost all ω ∈ Ω . Both statements are the result of straightforward calcu-lations exploiting the well known assumption that both m and ˆ m n possess the contractionproperty and are bounded by M . For a detailed exposition cf. [23].(ii) First of all, note that lim sup n →∞ L t / δ =

0. Hence, there exists a number n suchthat on the set Ω : = © ω ∈ Ω : L ( ˆ m n , m ) > δ ª the condition of part (i) is satisﬁed for all n ∈ N with n > n . We deﬁne the constant γ : = (1 − ε ) /24 and conclude that there exists anumber n ≥ n such that for almost all ω ∈ Ω and all n > n , E | ˆ m n = g h m [ t ] (0, Y ′ ,... , Y ′ t ) − g [ t ] (0, Y ′ ,... , Y ′ t ) i > (1 − ε ) δ − M L t ≥ γδ , (19) E | ˆ m n = g £ − f t ( g ; Y ′′′ t + ) ¤ > E | ˆ m n = g h m [ t ] (0, Y ′ ,... , Y ′ t ) − g [ t ] (0, Y ′ ,... , Y ′ t ) i − γ δ . (20)In other words, Ω ⊂ © ω ∈ Ω : (19) and (20) hold ª for n > n , up to a null set. Furthermore, n P n − i = f i ( ˆ m n ; Y i + ) = Φ n ( m ) − Φ n ( ˆ m n ) ≥

0, which follows from the deﬁnition of the leastsquares estimator. Let Y ′ ,... , Y ′ n be a ghost sample with the same distribution as Y ,... , Y n but independent from the data generating process, and deﬁne ¡ Y ′ ,... , Y ′ t ¢ = : Y ′′′ t . Then P n L ( ˆ m n , m ) > δ o ≤ P ½ E | ˆ m n = g h m [ t ] ( Y ′′′ t ) − g [ t ] ( Y ′′′ t ) i > γδ ; E | ˆ m n = g £ − f t ( g ; Y ′′′ t + ) ¤ > E | ˆ m n = g h m [ t ] ( Y ′′′ t ) − g [ t ] ( Y ′′′ t ) i − γ δ ¾ ≤ P ½ E | ˆ m n = g h m [ t ] ( Y ′′′ t ) − g [ t ] ( Y ′′′ t ) i > γδ ; imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 onparametric INGARCH models n n − X i = f i ( ˆ m n ; Y i + ) − E | ˆ m n = g £ f t ( g ; Y ′′′ t + ) ¤ > E | ˆ m n = g h m [ t ] ( Y ′′′ t ) − g [ t ] ( Y ′′′ t ) i − γ δ ¾ ≤ P ½ ∃ g ∈ G : E h m [ t ] ( Y t ) − g [ t ] ( Y t ) i > γδ ; (21)1 n n − X i = f i ( g ; Y i + ) − E f t ( g ; Y t + ) > E h m [ t ] ( Y t ) − g [ t ] ( Y t ) i − γ δ ¾ for n > n . Using the decomposition ½ E h m [ t ] ( Y t ) − g [ t ] ( Y t ) i > δ γ ¾ = ∞ [ k = ½ k + δ γ ≥ E h m [ t ] ( Y t ) − g [ t ] ( Y t ) i > k δ γ ¾ (22)and introducing G k : = © g ∈ G : E [ m [ t ] ( Y t ) − g [ t ] ( Y t )] ≤ k + γδ ª , we write (21) further as P ∞ [ k = ½ ∃ g ∈ G : 2 k + γδ ≥ E h m [ t ] ( Y t ) − g [ t ] ( Y t ) i > k γδ ;1 n n − X i = f i ( g ; Y i + ) − E f t ( g ; Y t + ) > E h m [ t ] ( Y t ) − g [ t ] ( Y t ) i − γ δ ¾ ≤ P ∞ [ k = ½ sup g ∈ G k n n − X i = f i ( g ; Y i + ) − E f t ( g ; Y t + ) > k − γδ ¾ . (23)This is almost the statement of the lemma. We just have to substitute n P n − i = f i ( g ; Y i + )with n − t P n − i = t f t ( g ; Y i + i − t ). 
To that end, we invoke the triangle inequality for probabilitiesand the fact that by stationarity E f t ( g ; Y t + ) = E f t ( g ; Y i + i − t ) for i ≥ t and conclude, P ∞ [ k = ½ sup g ∈ G k n n − X i = f i ( g ; Y i + ) − E f t ( g ; Y t + ) > k − γδ ¾ ≤ P ∞ [ k = ½ sup g ∈ G k n − t n − X i = t ¡ f t ( g ; Y i + i − t ) − E f t ( g ; Y i + i − t ) ¢ > k − γδ ¾ + P ½ sup g ∈ G ¯¯¯ n n − X i = f i ( g ; Y i + ) − n − t n − X i = t f t ( g ; Y i + i − t ) ¯¯¯| {z } = : ∆ n > γδ ¾ . (24)It can be shown that E ∆ n . tn + L t . This is again done by straightforward computationsusing the contraction property and the uniform boundedness of the candidate functions.Essentially, the term tn is owed to the different length of the sums ( n and n − t addends,respectively), and the term L t is the order of the difference ¯¯ f i ( g ; Y i + ) − f t ( g ; Y i + i − t ) ¯¯ . Thelatter fact is a consequence of the contraction property and the fact that all functions in G are bounded by M . A simple application of Markov’s inequality shows then that theprobability in line (24) converges to zero as n → ∞ . (cid:3) imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 M. Wechsung, M.H. Neumann
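The role of the $L^t$ term in the bound on $\mathbb{E}\Delta_n$ can be illustrated numerically. The following Python sketch is not taken from the paper: it assumes a hypothetical truncated linear link `link(lam, y) = min(M, a + b*lam + c*y)`, which is contractive in its first argument with constant $L = b$, and all names and parameter values are illustrative. Iterating this link from two different starting intensities along the same counts shows the gap shrinking at least as fast as $L^t$, which is the mechanism behind the difference between $f_i$ and $f_t$.

```python
import random

# Hypothetical link function for illustration only (not from the paper):
# a truncated linear INGARCH(1,1) link, contractive in lam with constant L = B_COEF.
M = 10.0
A_COEF, B_COEF, C_COEF = 0.5, 0.6, 0.3


def link(lam, y):
    """g(lam, y) = min(M, a + b*lam + c*y), so |g(l1, y) - g(l2, y)| <= b*|l1 - l2|."""
    return min(M, A_COEF + B_COEF * lam + C_COEF * y)


def iterate(g, lam0, ys):
    """t-fold iteration g^[t](lam0, y_1, ..., y_t) with t = len(ys)."""
    lam = lam0
    for y in ys:
        lam = g(lam, y)
    return lam


def forgetting_gaps(g, lam_a, lam_b, ys):
    """Gaps |g^[t](lam_a, ...) - g^[t](lam_b, ...)| for t = 0, ..., len(ys)."""
    gaps = [abs(lam_a - lam_b)]
    a, b = lam_a, lam_b
    for y in ys:
        a, b = g(a, y), g(b, y)
        gaps.append(abs(a - b))
    return gaps


random.seed(0)
counts = [random.randint(0, 8) for _ in range(30)]  # arbitrary toy count series
gaps = forgetting_gaps(link, 0.0, M, counts)
for t in (0, 5, 10, 20, 30):
    # second column: observed gap, third column: the bound L^t * (initial gap)
    print(t, gaps[t], B_COEF ** t * M)
```

The truncation at `M` is 1-Lipschitz, so it does not destroy the contraction; any other link with a contraction constant below one in its first argument could be substituted.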

Proof of Lemma 4. (i) Recall that all $g, h \in \mathcal{G}$ are bounded by $M$ to conclude
\[
\bigl| f_t(g; V^*_i) - f_t(h; V^*_i) \bigr|
= \Bigl| 2 Y^*_{i+1} \bigl( g^{[t]}(Z^*_i) - h^{[t]}(Z^*_i) \bigr) + \bigl[ h^{[t]}(Z^*_i) + g^{[t]}(Z^*_i) \bigr] \bigl[ h^{[t]}(Z^*_i) - g^{[t]}(Z^*_i) \bigr] \Bigr|
\le (2 Y^*_{i+1} + 2M) \bigl| g^{[t]}(Z^*_i) - h^{[t]}(Z^*_i) \bigr|. \tag{25}
\]
By the contractive property applied to the first component,
\[
\begin{aligned}
\bigl| g^{[t]}(0, Y_{i-t+1}, \ldots, Y_i) - h^{[t]}(0, Y_{i-t+1}, \ldots, Y_i) \bigr|
&\le \bigl| g\bigl( g^{[t-1]}(0, Y_{i-t+1}, \ldots, Y_{i-1}), Y_i \bigr) - g\bigl( h^{[t-1]}(0, Y_{i-t+1}, \ldots, Y_{i-1}), Y_i \bigr) \bigr| \\
&\quad + \bigl| g\bigl( h^{[t-1]}(0, Y_{i-t+1}, \ldots, Y_{i-1}), Y_i \bigr) - h\bigl( h^{[t-1]}(0, Y_{i-t+1}, \ldots, Y_{i-1}), Y_i \bigr) \bigr| \\
&\le L \bigl| g^{[t-1]}(0, Y_{i-t+1}, \ldots, Y_{i-1}) - h^{[t-1]}(0, Y_{i-t+1}, \ldots, Y_{i-1}) \bigr| + \| g - h \|_\infty,
\end{aligned} \tag{26}
\]
and by an iteration of this argument, we see that $\bigl| g^{[t]}(Z^*_i) - h^{[t]}(Z^*_i) \bigr| \le \| g - h \|_\infty \sum_{k=0}^{t} L^k$, which is at most $\| g - h \|_\infty / (1 - L)$ since $L < 1$.

(ii) We call the quantity
\[
N(\varepsilon, d, X) := \min \Bigl\{ N \in \mathbb{N} : \exists\, x_1, \ldots, x_N \in X \text{ such that } X \subset \bigcup_{i=1}^{N} B_d(x_i, \varepsilon) \Bigr\} \tag{27}
\]
the covering number of the metric space $(X, d)$ for the resolution level $\varepsilon$. For $B \in \mathbb{N}_+$ and $L > 0$, suppose that $\tilde{\mathcal{G}}$ is the class of functions defined by
\[
\tilde{\mathcal{G}} := \bigl\{ g = (g_0, \ldots, g_{B-1})' : [0, M]^B \to [0, M];\ | g_i(x) - g_i(y) | \le L | x - y | \text{ for all } x, y \in [0, M] \bigr\}. \tag{28}
\]
A straightforward extension of a result by Kolmogorov and Tikhomirov [16] regarding covering numbers of classes of Lipschitz functions yields $\log N(\varepsilon, \| \cdot \|_\infty, \tilde{\mathcal{G}}) \le MB / \varepsilon$ [cf. 23]. Thus, it takes at most $N := N(\varepsilon, \| \cdot \|_\infty, \mathcal{G}) \le e^{MB/\varepsilon}$ balls to cover the whole class $\mathcal{G}$ with $\| \cdot \|_\infty$-balls of radius $\varepsilon$.

Now let $\mathcal{G}' \subset \mathcal{G}$ be an arbitrary subset and assume that the elements $\{ h_1, \ldots, h_N \} \subset \mathcal{G}$ constitute a covering of $\mathcal{G}$ with such balls. We want to find a set $\{ h'_1, \ldots, h'_N \} \subset \mathcal{G}'$ that constitutes a covering of $\mathcal{G}'$ with balls of radius $2\varepsilon$. Since $\{ h_1, \ldots, h_N \}$ covers $\mathcal{G}$ with $\varepsilon$-balls, we can select a minimal subset $\{ h_{i_1}, \ldots, h_{i_{N'}} \} \subset \{ h_1, \ldots, h_N \}$ with $1 \le i_1 < \ldots < i_{N'} \le N$ that constitutes an $\varepsilon$-ball covering of $\mathcal{G}'$. If all $h_{i_j} \in \mathcal{G}'$, nothing remains to be shown, since the set $\{ h_{i_1}, \ldots, h_{i_{N'}} \} \subset \mathcal{G}'$ is then also a $2\varepsilon$-ball covering of $\mathcal{G}'$. Otherwise, assume that $h_{i_j} \in \mathcal{G} \setminus \mathcal{G}'$. Since the subset $\{ h_{i_1}, \ldots, h_{i_{N'}} \}$ is without loss of generality assumed to be minimal, the set $B(h_{i_j}, \varepsilon) \cap \mathcal{G}'$ is non-empty. Now pick an arbitrary $h'_{i_j} \in B(h_{i_j}, \varepsilon) \cap \mathcal{G}'$, and observe that $B(h_{i_j}, \varepsilon) \cap \mathcal{G}' \subset B(h'_{i_j}, 2\varepsilon)$. We can carry out this procedure for every $h_{i_j} \notin \mathcal{G}'$ and replace this element with the obtained $h'_{i_j}$. All $h_{i_j}$ that are elements of $\mathcal{G}'$ in the first place are simply relabeled $h'_{i_j}$. Hence, there exists a set $\{ h'_{i_1}, \ldots, h'_{i_{N'}} \} \subset \mathcal{G}'$ that constitutes a cover of $\mathcal{G}'$ with balls of radius $2\varepsilon$. This means that
\[
N(2\varepsilon, \| \cdot \|_\infty, \mathcal{G}') \le N' \le N = N(\varepsilon, \| \cdot \|_\infty, \mathcal{G}) \le e^{MB/\varepsilon} = e^{2MB/(2\varepsilon)}. \tag{29}
\]
Since this consideration was independent of the specification of $\mathcal{G}'$, we conclude that $N(\varepsilon, \| \cdot \|_\infty, \mathcal{G}_k) \le e^{2MB/\varepsilon}$ for all $k \in \mathbb{N}$.

Plugging in $\varepsilon = 2^{1+k-s} \sqrt{\gamma}\, \delta$ tells us that $N\bigl( 2^{-s} 2^{k+1} \sqrt{\gamma}\, \delta, \| \cdot \|_\infty, \mathcal{G}_k \bigr) \le \exp\bigl( MB\, 2^{s-k} / (\sqrt{\gamma}\, \delta) \bigr)$. Thus, there exists a set $\mathcal{G}^{(s)}_k \subset \mathcal{G}_k$ with at most $\exp\bigl( MB\, 2^{s-k} / (\sqrt{\gamma}\, \delta) \bigr)$ elements, and for any $g \in \mathcal{G}_k$ there exists an element $h \in \mathcal{G}^{(s)}_k$ with $\| g - h \|_\infty < 2^{-s} 2^{k+1} \sqrt{\gamma}\, \delta$. Let now $g \in \mathcal{G}_k$ be arbitrary. Since $\mathcal{G}^{(s)}_k$ is finite, the set
\[
\Pi_{s,k}(g) := \operatorname*{arg\,min}_{h \in \mathcal{G}^{(s)}_k} \| g - h \|_\infty = \bigl\{ h' \in \mathcal{G}^{(s)}_k : \| g - h' \|_\infty \le \| g - h \|_\infty \text{ for all } h \in \mathcal{G}^{(s)}_k \bigr\} \tag{30}
\]
is not empty. Choose a representative from the finite set $\Pi_{s,k}(g)$ and call it $g_{s,k}$. $\square$

Proof of Lemma 6.

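Recall that for any two random variables $\xi, \eta$ on $(S, \Sigma, P)$, $\operatorname{cov}(\xi, \eta) \le \| \eta \|_{L^2(P)} \| \xi \|_{L^2(P)}$. Thus, from $N < \frac{n-t}{q}$ and because the blocks $\{ X^*_j(g) : j = 0, \ldots, N-1 \}$ are i.i.d. and centered, we infer
\[
\begin{aligned}
\mathbb{E} \Bigl[ \frac{q}{n-t} \sum_{j=0}^{N-1} X^*_j(g) \Bigr]^2
&= \frac{q^2}{(n-t)^2} \sum_{j_1, j_2 = 0}^{N-1} \mathbb{E}\, X^*_{j_1}(g) X^*_{j_2}(g)
\le \frac{q}{n-t}\, \mathbb{E} \bigl[ X^*_0(g) \bigr]^2 \\
&= \frac{q}{n-t}\, \frac{1}{q^2} \sum_{i_1, i_2 = 0}^{q-1} \operatorname{cov}\bigl( f_t(g; V^*_{i_1+t}), f_t(g; V^*_{i_2+t}) \bigr)
\le \frac{q}{n-t}\, \mathbb{E} \bigl( f_t(g; V^*_t) \bigr)^2.
\end{aligned} \tag{31}
\]
Let us recall that $Z^*_t = ( Y^*_1, \ldots, Y^*_t )$ is measurable with respect to the $\sigma$-field $\mathcal{F}^*_t := \sigma\{ Y^*_s : s \le t \}$. Therefore,
\[
\begin{aligned}
&\mathbb{E} \Bigl[ \mathbb{E}_{|\mathcal{F}^*_t} \sup_{\alpha \in [0,M]} \bigl[ ( Y^*_{t+1} - \alpha )^2 \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr)^2 \bigr] \Bigr] \\
&\quad = \mathbb{E} \Bigl[ \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr)^2\, \mathbb{E}_{|\mathcal{F}^*_t} \Bigl[ \sup_{\alpha \in [0,M]} \bigl( Y^*_{t+1} - \mathbb{E}_{|\mathcal{F}^*_t} Y^*_{t+1} + \mathbb{E}_{|\mathcal{F}^*_t} Y^*_{t+1} - \alpha \bigr)^2 \Bigr] \Bigr] \\
&\quad \le \mathbb{E} \Bigl[ \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr)^2 \Bigl[ \mathbb{E}_{|\mathcal{F}^*_t} \bigl( Y^*_{t+1} - \mathbb{E}_{|\mathcal{F}^*_t} Y^*_{t+1} \bigr)^2 + \sup_{\alpha \in [0,M]} \bigl( \mathbb{E}_{|\mathcal{F}^*_t} Y^*_{t+1} - \alpha \bigr)^2 \\
&\qquad\qquad + 2 \sup_{\alpha \in [0,M]} \bigl( \bigl| \mathbb{E}_{|\mathcal{F}^*_t} Y^*_{t+1} - \alpha \bigr| \bigr) \bigl( \mathbb{E}_{|\mathcal{F}^*_t} \bigl| Y^*_{t+1} - \mathbb{E}_{|\mathcal{F}^*_t} Y^*_{t+1} \bigr| \bigr) \Bigr] \Bigr] \\
&\quad \le \mathbb{E} \Bigl[ \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr)^2 \Bigl[ \operatorname{var}_{|\mathcal{F}^*_t} ( Y^*_{t+1} ) + M^2 + 2M \sqrt{ \operatorname{var}_{|\mathcal{F}^*_t} ( Y^*_{t+1} ) } \Bigr] \Bigr] \\
&\quad \le ( M + M^2 + 2M^{3/2} )\, \mathbb{E} \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr)^2.
\end{aligned} \tag{32}
\]
Since $\tfrac{1}{2} \bigl( m^{[t]}(Z^*_t) + g^{[t]}(Z^*_t) \bigr) \in [0, M]$,
\[
\begin{aligned}
\mathbb{E} \bigl( f_t(g; V^*_t) \bigr)^2
&= 4\, \mathbb{E} \Bigl[ \Bigl( Y^*_{t+1} - \frac{ m^{[t]}(Z^*_t) + g^{[t]}(Z^*_t) }{2} \Bigr) \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr) \Bigr]^2 \\
&\le 4\, \mathbb{E} \sup_{\alpha \in [0,M]} \bigl[ ( Y^*_{t+1} - \alpha )^2 \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr)^2 \bigr] \\
&\le 4 ( M + M^2 + 2M^{3/2} )\, \mathbb{E} \bigl( m^{[t]}(Z^*_t) - g^{[t]}(Z^*_t) \bigr)^2.
\end{aligned} \tag{33}
\]
$\square$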
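The conditional moment bound behind (32) can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes a single Poisson variable $Y$ with intensity $\lambda \in [0, M]$, reads the constant in (32) as $M + M^2 + 2M^{3/2}$, and computes $\mathbb{E} \sup_{\alpha \in [0,M]} (Y - \alpha)^2$ from the Poisson probability mass function truncated at a hypothetical cutoff `kmax`. The function names are our own.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), computed on the log scale for stability."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def expected_sup_sq(lam, M, kmax=400):
    """E[sup_{alpha in [0, M]} (Y - alpha)^2], with the series truncated at kmax.

    (y - alpha)^2 is convex in alpha, so the supremum over [0, M] is attained
    at one of the endpoints alpha = 0 or alpha = M.
    """
    total = 0.0
    for k in range(kmax + 1):
        worst = max(k * k, (k - M) ** 2)
        total += worst * poisson_pmf(k, lam)
    return total


def lemma6_constant(M):
    """Constant M + M^2 + 2*M^(3/2), our reading of the bound in (32)."""
    return M + M ** 2 + 2.0 * M ** 1.5


for M in (0.5, 2.0, 10.0):
    worst_case = max(expected_sup_sq(M * i / 10, M) for i in range(11))
    print(M, worst_case, lemma6_constant(M))
```

For intensities up to 10 the truncation at `kmax=400` discards only a negligible tail; larger values of $M$ would need a larger cutoff.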

Proof of Lemma 7.

For every n ∈ N the maximal index ˇ S = ˇ S ( n ) is given by ˇ S = min n s ∈ N : − L − s k + p γδ ≤ k − γδ /(15 M ) o . Using a telescope representation of g ∈ G , linearityof g X ∗ j ( g ), and the triangle inequality for probabilities, we obtain P ½ sup g ∈ G k qn − t N − X j = ε j X ∗ j ( g ) > k − γδ ¾ = P ½ sup g ∈ G k qn − t N − X j = ε j h X ∗ j ( g ) − X ∗ j ( g ˇ S , k ) + X ∗ j ( g k ) + ˇ S − X s = ¡ X ∗ j ( g s + k ) − X ∗ j ( g s , k ) ¢i > k − γδ ¾ ≤ P ½ sup g ∈ G k qn − t N − X j = ε j £ X ∗ j ( g ) − X ∗ j ( g ˇ S , k ) ¤ > k − γδ /3 ¾ + P ½ sup g ∈ G k qn − t N − X j = ε j X ∗ j ( g k ) > k − γδ /3 ¾ + P ½ sup g ∈ G k ˇ S − X s = qn − t N − X j = ε j ¡ X ∗ j ( g s + k ) − X ∗ j ( g s , k ) ¢ > k − γδ /3 ¾ = : P + P + P . (34)We treat each of the three terms separately. As to the ﬁrst term, the index ˇ S was chosensuch that the approximation of g by g ˇ S , k is very accurate. Using the deﬁnition of X ∗ j ( g ),Lemma 4 (i), and the deﬁnition of the sequence { g s , k } s = S , we observe ¯¯ X ∗ j ( g ) − X ∗ j ( g ˇ S , k ) ¯¯ = q ¯¯¯ q − X i = ³ f t ( g ; V ∗ t + jq + i ) − f t ( g ˇ S , k ; V ∗ t + jq + i ) − E £ f t ( g ; V ∗ t + jq + i ) − f t ( g ˇ S , k ; V ∗ t + jq + i ) ¤´¯¯¯ ≤ q q − X i = ³¯¯ f t ( g ; V ∗ t + jq + i ) − f t ( g ˇ S , k ; V ∗ t + jq + i ) ¯¯| {z } ≤ k g − g ˇ S , k k ∞ ( Y ∗ t + jq + i + + M )/(1 − L ) + E ¯¯ f t ( g ; V ∗ t + jq + i ) − f t ( g ˇ S , k ; V ∗ t + jq + i ) ¯¯| {z } ≤ k g − g ˇ S , k k ∞ ( EY ∗ t + jq + i + + M )/(1 − L ) ´ ≤ q q − X i = ( Y ∗ t + jq + i + + M ) 21 − L − ˇ S k + p γδ ≤ q q − X i = ( Y ∗ t + jq + i + + M )2 k − γδ /(15 M ). (35) imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 onparametric INGARCH models S . 
We conclude, P ½ sup g ∈ G k qn − t ¯¯¯ N − X j = ε j £ X ∗ j ( g ) − X ∗ j ( g ˇ S , k ) ¤¯¯¯ > k − γδ /3 ¾ ≤ P ½ n − t ¯¯¯ n − X i = t ( Y ∗ i + + M )2 k − γδ /(15 M ) ¯¯¯ > k − γδ /3 ¾ ≤ P ½ n − t ¯¯¯ n − X i = t Y ∗ i + ¯¯¯ > M k ¾ ≤ P ½ n − t ¯¯¯ n − X i = t ( Y ∗ i + − E Y ∗ i + ) ¯¯¯ > M k ¾ ≤ − k M ( n − t ) var ¡ n − X i = t Y ∗ i ¢ , (36)for any k . We recall that the process { Y ∗ i } is q -dependent and stationary and conclude thatvar ¡ n − X i = Y ∗ i ¢ = n − X i , j = cov ¡ Y ∗ i , Y ∗ j ¢ = X ≤ i , j ≤ n − | i − j |≤ q ¡ E ( Y ∗ i Y ∗ j ) − EY ∗ i EY ∗ j ¢ ≤ q − X r = n − r X i = ³q EY ∗ i q EY ∗ i + r − EY ∗ i EY ∗ i + r ´ ≤ nq ³ E £ Y ∗ ¤ − £ EY ∗ ¤ ´ ≤ M nq . (37)This proves that there exists a constant C > n ( P ) such that for all n ∈ N with n > n ( P ) and all k ∈ N P = P ½ sup g ∈ G k qn − t ¯¯¯ N − X j = ε j £ X ∗ j ( g ) − X ∗ j ( g ˇ S , k ) ¤¯¯¯ > k − γδ /3 ¾ ≤ C − k qn − t . (38)We proceed by addressing the second term, P . First of all, since for any g ∈ G k the ﬁrstapproximation g k = π k g is selected from the ﬁnite set G (0) k , P ½ sup g ∈ G k qn − t N − X j = ε j X ∗ j ( g k ) > k − γδ /3 ¾ = P ½ max h k ∈ G (0) k qn − t N − X j = ε j X ∗ j ( h k ) > k − γδ /3 ¾ . (39)This exceedance probability will be bounded with the help of Bernstein’s inequality forsums of bounded random variables [12, p.118]. To that end, we introduce the Σ -measurableevents A k ,2 j = A k ,2 j ( n ) : = © ω ∈ S : max i = q − Y ∗ t + jq + i + ≤ k + n ª . (40) imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 M. Wechsung, M.H. Neumann

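In order to bound the probability of these events, we use an exponential tail bound for Poisson variables [12, p. 116] and the fact that $| \lambda_i |$ is uniformly bounded by $M$. Thence we conclude
\[
\begin{aligned}
P \Bigl\{ \max_{1 \le i \le n} Y_i > 2(k+2) \log n \Bigr\}
&\le \sum_{i=1}^{n} \mathbb{E} \Bigl[ P \bigl\{ Y_i - \lambda_i > 2(k+2) \log(n) - \lambda_i \,\big|\, \lambda_i \bigr\} \Bigr] \\
&\le \sum_{i=1}^{n} \mathbb{E} \Bigl[ \exp \Bigl( - \frac{ \bigl( 2(k+2) \log(n) - \lambda_i \bigr)^2 }{ 2 \lambda_i + \frac{2}{3} \bigl( 2(k+2) \log(n) - \lambda_i \bigr) } \Bigr) \Bigr] \\
&\le \sum_{i=1}^{n} \mathbb{E} \Bigl[ \exp \bigl( - (k+2) \log(n) + \lambda_i \bigr) \Bigr] \\
&\le \exp \bigl( \log(n) - (k+2) \log(n) + M \bigr) = e^{M} n^{-(k+1)}.
\end{aligned} \tag{41}
\]
Consequently, $P \bigl( \bigcup_{j=0}^{N-1} A^c_{k,2j} \bigr) \le e^{M} n^{-(k+1)}$, which implies
\[
\begin{aligned}
P \Bigl\{ \max_{h_k \in \mathcal{G}^{(0)}_k} \frac{q}{n-t} \sum_{j=0}^{N-1} \varepsilon_j X^*_j(h_k) > 2^{k-1} \gamma \delta^2 / 3 \Bigr\}
&\le e^{M} n^{-(k+1)} + P \Bigl\{ \max_{h_k \in \mathcal{G}^{(0)}_k} \frac{q}{n-t} \sum_{j=0}^{N-1} \varepsilon_j X^*_j(h_k) \mathbf{1}_{A_{k,2j}} > 2^{k-1} \gamma \delta^2 / 3 \Bigr\} \\
&\le e^{M} n^{-(k+1)} + \sum_{h_k \in \mathcal{G}^{(0)}_k} P \Bigl\{ \frac{q}{n-t} \sum_{j=0}^{N-1} \varepsilon_j X^*_j(h_k) \mathbf{1}_{A_{k,2j}} > 2^{k-1} \gamma \delta^2 / 3 \Bigr\}.
\end{aligned} \tag{42}
\]
Now all involved variables are bounded. In order to apply Bernstein's inequality, we need a bound on the variance of the sum $\sum_{j=0}^{N-1} \varepsilon_j X^*_j(h_k) \mathbf{1}_{A_{k,2j}}$ and a bound on the absolute values of the addends $\varepsilon_j X^*_j(h_k) \mathbf{1}_{A_{k,2j}}$. Furthermore, the addends have to be centered. As for the variance bound, recall that the sequences $\{ \varepsilon_j \}$ and $\{ Y^*_i \}$ are independent. Hence, $\bigl\{ \varepsilon_j X^*_j(h_k) \mathbf{1}_{A_{k,2j}} \bigr\}_{j \in \mathbb{N}}$ is a sequence of i.i.d. random variables, and $\mathbb{E} \bigl[ \varepsilon_j X^*_j(h_k) \mathbf{1}_{A_{k,2j}} \bigr] = \mathbb{E}\, \varepsilon_j\, \mathbb{E} \bigl[ X^*_j(h_k) \mathbf{1}_{A_{k,2j}} \bigr] =$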

0. Thus, for any h k ∈ G (0) k ⊂ G k , we can invoke Lemma 6 (i) toconclude that there exists a number n ∗ such that for all n > n ∗ and for all k ∈ N var ³ N − X j = ε j X ∗ j ( h k ) A k ,2 j ´ ≤ N − X j = E ¡ X ∗ j ( h k ) ¢ = ( n − t ) q var Ã qn − t N − X j = X ∗ j ( h k ) ! ≤ ( n − t ) q ( M + M + M )2 k + γδ qn − t = C ( n − t ) q k δ : = σ n , (43) imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 onparametric INGARCH models C : = M + M + M ) γ . Let us now bound the absolute values of the addends.Recall that f t ( m ; V ∗ t + jq + i ) = k g − m k ∞ ≤ M for all g ∈ G , and infer with Lemma 4 (i)that ¯¯ f t ( h k ; V ∗ t + jq + i ) ¯¯ ≤ k h k − m k ∞ − L ( M + Y ∗ t + jq + i + ) ≤ M − L ( M + Y ∗ t + jq + i + ). (44)In virtue of EY j ≤ M and the deﬁnition of the events A k ,2 j , we obtain ¯¯ ε j X ∗ j ( h k ) A k ,2 j ¯¯ ≤ A k ,2 j q q − X i = ³¯¯ f t ( h k ; V ∗ t + jq + i ) ¯¯ + E ¯¯ f t ( h k ; V ∗ t + jq + i ) ¯¯´ ≤ q q − X i = M − L ¡ Y ∗ t + jq + i + + M ¢ A k ,2 j ≤ M − L ¡ k + n ) + M ¢ ≤ C ( k + n = : b n (45)with C = M − L , for all n ≥ e M and k ∈ N . We are ready to apply Bernstein’s inequality.Introducing the variables η j : = ε j X ∗ j ( h k ) A k ,2 j and x n : = n − tq k − γδ /3, (46)we obtain the display P ½ N − X j = ε j X ∗ j ( h k ) A k ,2 j > n − tq k − γδ /3 ¾ = P ½ N − X j = η j > x n ¾ . (47)We have shown that the random variables η j are independent and centered, | η j | ≤ b n , andvar( η + ... + η N − ) ≤ σ n . Bernstein’s inequality yields, P ½ N − X j = η j > x n ¾ ≤ exp µ − x σ n + x n b n /3 ¶ . (48)In this case, σ n ≍ k n − tq δ is dominated by x n b n ≍ k n − tq δ ( k + n sincelim sup n →∞ sup k ∈ N σ n x n b n ≤ lim sup n →∞ sup k ∈ N Ck + n ≤ C lim sup n →∞ (log n ) − = C . Hence, there exists a number n ∗∗ , independent of k , suchthat σ n ≤ x n b n for all n > n ∗∗ and all k ∈ N . 
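Bernstein's inequality in the form (48) can be sanity-checked in a toy setting. The Python sketch below is only an illustration under simplifying assumptions that are not part of the proof: the addends are taken to be $\eta_j = b \varepsilon_j$ with i.i.d. random signs $\varepsilon_j$, so that $|\eta_j| \le b$ and $\operatorname{var}(\eta_0 + \ldots + \eta_{N-1}) = N b^2$, and the exact tail of the sum is available from the binomial distribution. The function names are hypothetical.

```python
import math


def rademacher_tail(N, b, x):
    """Exact P(sum_{j<N} b*eps_j > x) for i.i.d. signs eps_j = +1 or -1 with prob. 1/2.

    With K = number of +1 signs, the sum equals b*(2K - N) and K ~ Binomial(N, 1/2).
    """
    hits = sum(math.comb(N, k) for k in range(N + 1) if b * (2 * k - N) > x)
    return hits / 2 ** N


def bernstein_bound(x, sigma2, b):
    """Right-hand side exp(-x^2 / (2*sigma2 + 2*x*b/3)) of Bernstein's inequality."""
    return math.exp(-x * x / (2.0 * sigma2 + 2.0 * x * b / 3.0))


N, b = 200, 1.5
sigma2 = N * b * b  # variance of the sum in this toy setting
for x in (0.0, 5.0, 20.0, 40.0, 80.0, 150.0):
    # exact tail probability next to the Bernstein bound
    print(x, rademacher_tail(N, b, x), bernstein_bound(x, sigma2, b))
```

For small $x$ the variance term $2\sigma^2$ dominates the denominator, whereas for large $x$ the term $2xb/3$ takes over; the latter is the regime exploited in the proof, where $\sigma_n^2$ is dominated by $x_n b_n$.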
Consequently, under the assumptions δ n = n − log n and t ( n ) ≍ q ( n ) ≍ log n , we obtain for the exponent in (48),lim inf n →∞ x n σ n + x n b n /3 · n − ≥ lim inf n →∞ x n b n · n − = − γ C k k + n →∞ n − tq δ log n · n − imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 M. Wechsung, M.H. Neumann ≥ C k (50)for some positive constant C . Hence, there exists a number n ≥ max { e M , n ∗ , n ∗∗ } suchthat x σ n + x n b n /3 ≥ C k n for all n ≥ n and all k ∈ N . In conclusion, for all k and all n ≥ n , P ½ N − X j = ε j X ∗ j ( h k ) A k ,2 j > n − tq k − γδ /3 ¾ ≤ exp ³ − C k n ´ . (51)Note that n does not depend on k . Moreover, the previous bound is independent of the spe-ciﬁc function h k ∈ G (0) k . Since sup k log ¡ G (0) k ¢ ≤ sup k C − k / δ ≤ C / δ with C : = MB / p γ ,which follows from the bounds in Lemma 4 (ii) with s =

0, we conclude, X h k ∈ G (0) k P ½ qn − t N − X j = ε j X ∗ j ( h k ) A k ,2 j > k − γδ /3 ¾ ≤ G (0) k exp ¡ − C k n ¢ ≤ exp ³ C δ − C k n ´ ≤ exp ³ k ¡ C δ − C n ¢´ (52)for any n ≥ n . Subsequently, we observe that lim n →∞ n − δ − = n ∈ N such that C δ − − C n ≤ − C n for all n ≥ n . Hence, for all n ≥ n ∨ n and all k ∈ N X h k ∈ G (0) k P ½ qn − t N − X j = ε j X ∗ j ( h k ) A k ,2 j > k − γδ /3 ¾ ≤ exp ¡ − C k n ¢ . (53)In conclusion, there exists a natural number n ( P ) ≥ n ∨ n such that for all n ≥ n ( P ) andall k ∈ N the term P is bounded by P ½ sup g ∈ G k qn − t N − X j = ε j X ∗ j ( g k ) > k − γδ /3 ¾ < e M n − ( k + + exp ³ − C k n ´ , (54)with some positive constant C .It is left to ﬁnd a bound for the third term, P . For that sake, we deﬁne the sets M s , k by M s , k : = n ( g , g ): g ∈ G ( s ) k , g ∈ G ( s + k , k g − g k ∞ ≤ − s k + p γδ o . (55)By deﬁnition of { g s , k : s = S } , k g − g s , k k ∞ ≤ − s k + p γδ as well as k g − g s + k k ∞ ≤ − ( s + k + p γδ , and by the triangle inequality, k g s , k − g s + k k ∞ ≤ k g s , k − g k ∞ +k g − g s + k k ∞ ≤ − s k + p γδ . Hence, ( g s , k , g s + k ) ∈ M s , k , and we conclude,sup g ∈ G k ˇ S − X s = qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g s + k ) − X ∗ j ( g s , k ) ¢¯¯¯ = max ( g , g ) ∈ M k , s ˇ S − X s = qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g ) − X ∗ j ( g ) ¢¯¯¯ ≤ ˇ S − X s = · max ( g , g ) ∈ M k , s qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g ) − X ∗ j ( g ) ¢¯¯¯¸ . (56) imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 onparametric INGARCH models g , g ) ∈ M , ¯¯¯¡ X ∗ j ( g ) − X ∗ j ( g ) ¢¯¯¯ ≤ q q − X i = ¯¯ f t ( g ; V ∗ t + jq + i ) − f t ( g ; V ∗ t + jq + i ) ¯¯ + q q − X i = E ¯¯ f t ( g ; V ∗ t + jq + i ) − f t ( g ; V ∗ t + jq + i ) ¯¯ ≤ − s k + p γδ − L q q − X i = ( Y ∗ t + jq + i + + M ). (57)Since the variables ε j are independent and centered, we can apply Hoeffding’s inequality[12, p.114] conditionally on ¡ V ∗ t ,... 
, V ∗ n − ¢ ′ and obtain for ( g , g ) ∈ M k , s , P ½ qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g ) − X ∗ j ( g ) ¢¯¯¯ > x ¯¯¯¯ V ∗ t ,... , V ∗ n − ¾ ≤  − x q ( n − t ) P N − j = ¡ X ∗ j ( g ) − X ∗ j ( g ) ¢  ≤  − x P N − j = h − s k + p γδ ( n − t )(1 − L ) P q − i = ¡ Y ∗ t + jq + i + + M ¢i  . (58)We apply an integrated Bernstein-type inequality [7, p.408]. For random variables ξ ,... , ξ m that satisfy the tail bound P { | ξ i | > x } ≤ e − x b + ax for some a , b ≥ x ≥

0, the inequal-ity states, E £ max i ≤ m | ξ i | ¤ ≤ C ¡p b log m + a log m ¢ (59)for some universal constant C >

0. We apply this inequality with a = b = N − X j = h − s k + p γδ ( n − t )(1 − L ) q − X i = ¡ Y ∗ t + jq + i + + M ¢i . (60)Conditionally on V ∗ t ,... , V ∗ n − , this yields almost surely in P , E · max ( g , g ) ∈ M k , s qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g ) − X ∗ j ( g ) ¢¯¯¯ ¯¯¯¯ V ∗ t ,... , V ∗ n − ¸ ≤ C q log M k , s vuut N − X j = h − s k + p γδ ( n − t )(1 − L ) q − X i = ¡ Y ∗ t + jq + i + + M ¢i (61)for some positive constant C . Furthermore, as E | F ∗ i Y ∗ i + = var | F ∗ i Y ∗ i + + ( E | F ∗ i Y ∗ i + ) ≤ M + M , we conclude invoking the triangle inequality for the L ( P ) norm, µ E · q − X i = ¡ Y ∗ t + jq + i + + M ¢¸ ¶ ≤ q − X i = ³ E £ Y ∗ t + jq + i + + M ¤ ´ = q E ¡ Y ∗ + M ¢ imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 M. Wechsung, M.H. Neumann ≤ M + M ) q = : p C q . (62)Recall that N < n − tq and that x

7→ p x is a concave function. Deﬁne C : = C p C γ − L andintegrate inequality (61) with respect to P . This yields, E · E · max ( g , g ) ∈ M k , s qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g ) − X ∗ j ( g ) ¢¯¯¯ ¯¯¯¯ V ∗ t ,... , V ∗ n − ¸¸ ≤ C q log M k , s vuut γ (1 − L ) N − X j = · − s + k δ ( n − t ) ¸ E · q − X i = ¡ Y ∗ t + jq + i + + M ¢¸ ≤ C q log M k , s k − s δ p q p n − t (63)Concerning the cardinality of the set M k , s , we observe that M k , s ≤ G ( s ) k G ( s + k ≤ e MB (2 s − k + s + − k )/( p γδ ) ≤ e MB s + − k /( p γδ ) , (64)or, with C : = q MB p γ , q log M k , s ≤ C s /2 − k /2 δ − . (65)Applying Markov’s inequality, we ﬁnally arrive at a bound for P : P ½ sup g ∈ G k ˇ S − X s = qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g s + k ) − X ∗ j ( g s , k ) ¢¯¯¯ > k − γδ /3 ¾ ≤ · γ − k δ − E · sup g ∈ G k ˇ S − X s = qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g s + k ) − X ∗ j ( g s , k ) ¢¯¯¯¸ ≤ · γ − k δ − S − X s = E · max ( g , g ) ∈ M k , s qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g ) − X ∗ j ( g ) ¢¯¯¯¸ by (56) ≤ · C γ − k δ − ∞ X s = q log M k , s k − s δ p q p n − t by (63) ≤ · C γ C | {z } = : C − k δ − ∞ X s = s /2 − k /2 δ − k − s δ p q p n − t by (65) = C − k /2 δ − ( n − t ) − q ∞ X s = − s /2 . (66)There exists a constant C > n ∈ N such that C δ − ( n − t ) − q ∞ X s = − s /2 ≤ C (log n ) − . (67) imsart-bj ver. 2014/10/16 file: WechsungNeumann_v.06.28.2020.tex date: September 23, 2020 onparametric INGARCH models n ≥ n . Of course, 2 − k /2 ≤ − k for all k ∈ N . Thus, for n ≥ n ( P ) = n and all k ∈ N weobtain the following bound for P : P ½ sup g ∈ G k ˇ S − X s = qn − t ¯¯¯ N − X j = ε j ¡ X ∗ j ( g s + k ) − X ∗ j ( g s , k ) ¢¯¯¯ > k − γδ /3 ¾ ≤ C − k (log n ) − . (68)In conclusion, P + P + P . − k qn − t + n − ( k + + exp ³ − C k n ´ + − k (log n ) − (69)for all n ≥ max © n ( P ) , n ( P ) , n ( P ) ª and all k ∈ N . (cid:3) Acknowledgments

MW thanks Florian Wechsung for revising the manuscript and giving some valuable comments.

References

[1] Aliprantis, C. D. and Border, K. C. (1994). Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer, Berlin.

[2] Bass, R. F. (2013). Real Analysis for Graduate Students. CreateSpace.

[3] Berbee, H. C. P. (1979). Random Walks with stationary increments and renewal theory. Math. Cent. Tracts, Amsterdam.

[4] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics, 307-327.

[5] Davis, R. A., Dunsmuir, W. T. M. and Streett, S. B. (2003). Observation-Driven Models for Poisson Counts. Biometrika.

[6] Doukhan, P. (1994). Mixing: Properties and Examples. Springer, New York (NY).

[7] Doukhan, P., Massart, P. and Rio, E. (1995). Invariance principles for absolutely regular empirical processes. Ann. Inst. Henri Poincaré Probab. Stat., 393-427.

[8] Doukhan, P. and Neumann, M. H. (2019). Absolute regularity of semi-contractive GARCH-type processes. J. Appl. Prob.

[9] Ferland, R., Latour, A. and Oraichi, D. (2006). Integer-Valued GARCH Process. J. Time Series Anal.

[10] Fokianos, K., Rahbek, A. and Tjøstheim, D. (2009). Poisson Autoregression. J. Amer. Statist. Assoc.

[11] Fokianos, K. and Tjøstheim, D. (2012). Nonlinear Poisson autoregression. Ann. Inst. Statist. Math.

[12] Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, New York (NY).

[13] Györfi, L., Krzyzak, A., Kohler, M. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York (NY).

[14] Jennrich, R. I. (1969). Asymptotic Properties of Non-Linear Least Squares Estimators. Ann. Math. Statist.

[15] Kedem, B. and Fokianos, K. (2002). Regression Models for Time Series Analysis. Wiley, Hoboken (NJ).

[16] Kolmogorov, A. N. and Tikhomirov, V. M. (1993). ε-Entropy and ε-Capacity of Sets in Function Spaces. In Selected Works of A. N. Kolmogorov (A. N. Shiryayev, ed.) III.

[17] Meister, A. and Kreiss, J.-P. (2016). Statistical inference for nonparametric GARCH models. Stochastic Process. Appl.

[18] Neumann, M. H. (2011). Absolute regularity and ergodicity of Poisson count processes. Bernoulli.

[19] Rydberg, T. H. and Shephard, N. (2000). BIN Models for Trade-by-Trade Data. Modelling the Number of Trades in a Fixed Interval of Time. Econom. Soc. World Congr. 2000 contrib. Pap. No. 0740, Econom. Soc.

[20] Tsybakov, A. B. (2008). Introduction to Nonparametric Estimation. Springer, New York (NY).

[21] van de Geer, S. (1990). Estimating a Regression Function. Ann. Statist.

[22] van der Vaart, A. W. and Wellner, J. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York (NY).

[23] Wechsung, M. (2019). Nonparametric least squares estimation in integer-valued GARCH models, PhD thesis, Friedrich Schiller University Jena.