Uniform concentration inequality for ergodic diffusion processes observed at discrete times

Abstract

In this paper a concentration inequality is proved for the deviation in the ergodic theorem in the case of discrete time observations of diffusion processes. The proof is based on the geometric ergodicity property for diffusion processes. As an application we consider the nonparametric pointwise estimation problem for the drift coefficient under discrete time observations.


L. Galtchouk†, S. Pergamenshchikov‡

August 15, 2018

arXiv [math.PR]


Keywords:

Ergodic diffusion processes; Markov chains; Tail distribution; Upper exponential bound; Concentration inequality.

AMS 2000 Subject Classifications: 60F10

∗ The second author is partially supported by the RFFI-Grant 09-01-00172-a.
† Department of Mathematics, Strasbourg University, 7, rue René Descartes, 67084, Strasbourg, France, e-mail: [email protected]
‡ Laboratoire de Mathématiques Raphaël Salem, Avenue de l'Université, BP. 12, Université de Rouen, F76801, Saint Etienne du Rouvray, Cedex France, e-mail: [email protected]

1 Introduction

We consider the process $(y_t)_{t \ge 0}$ governed by the stochastic differential equation

$$\mathrm{d}y_t = S(y_t)\,\mathrm{d}t + \sigma(y_t)\,\mathrm{d}W_t, \quad 0 \le t \le T, \tag{1.1}$$

where $(W_t, \mathcal{F}_t)_{t \ge 0}$ is a standard Wiener process, $y_0$ is an initial condition and $\vartheta = (S, \sigma)$ are unknown functions. For this model we consider the pointwise estimation problem for the function $S$ at a fixed point $x \in \mathbb{R}$ (i.e. $S(x)$), on the basis of the discrete time observations of the process (1.1), i.e.

$$(y_{t_j})_{0 \le j \le N}, \tag{1.2}$$

where $t_j = j\delta$, $N = [T/\delta]$ and $\delta$ is some positive fixed observation frequency which will be specified later. Usually, for this problem one uses kernel estimators $\widehat{S}_N(x)$ defined as

$$\widehat{S}_N(x) = \frac{\sum_{k=1}^N \psi_{h,x}(y_{t_k})\,\Delta y_{t_k}}{\sum_{k=1}^N \psi_{h,x}(y_{t_k})\,\Delta t_k}, \quad \psi_{h,x}(y) = \frac{1}{h}\,\Psi\Big(\frac{y - x}{h}\Big), \tag{1.3}$$

where $\Psi(y)$ is a kernel function which equals zero for $|y| \ge 1$, $0 < h < 1$ is a bandwidth, $\Delta y_{t_k} = y_{t_k} - y_{t_{k-1}}$ and $\Delta t_k = \delta$.

The main difficulty with this estimator is that the denominator is random. Therefore, to obtain the convergence rate for this estimator we have to study the behavior of the denominator; more precisely, one needs to show that

$$\sum_{k=1}^N \psi_{h,x}(y_{t_k})\,\Delta t_k \approx \pi_\vartheta(\psi_{h,x})\,T \quad \text{as } T \to \infty, \quad \text{where} \quad \pi_\vartheta(\psi_{h,x}) = \int_{\mathbb{R}} \psi_{h,x}(y)\,q_\vartheta(y)\,\mathrm{d}y \tag{1.4}$$

and $q_\vartheta$ is the ergodic density defined in (2.2).

Unfortunately, the ergodic theorem does not allow one to obtain this kind of result because the times $t_k$ and the bandwidth $h$ depend on $T$. Usually one obtains such properties through concentration inequalities for the deviation in the ergodic theorem, i.e. one needs to study the limit behavior of the deviation

$$D_T(\phi) = \sum_{k=1}^N \big(\phi(y_{t_k}) - \pi_\vartheta(\phi)\big)\,\Delta t_k \tag{1.5}$$

for some functions $\phi$ which can depend on $T$, for example, $\phi(\cdot) = \psi_{h,x}(\cdot)$. More precisely, we need to show that for any $\varepsilon > 0$, $m > 0$ and $\vartheta$,

$$\lim_{T \to \infty} T^m\, \mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| > \varepsilon T\big) = 0, \tag{1.6}$$

where $\mathbf{P}_\vartheta$ is the law of the process $(y_t)_{t \ge 0}$ under the coefficients $\vartheta = (S, \sigma)$. Usually, to get properties of type (1.6) one needs to establish an exponential inequality for the deviations (1.5).

There are a number of papers devoted to concentration inequalities for functions of independent random variables (we refer the reader to [2] and references therein) and for functions of dependent random variables (see [4], [5], [14]). For Markov chains such inequalities were obtained in [1]. For continuous time Markov processes an exponential concentration inequality was obtained in [3] (see also references therein). Some applications of concentration inequalities to statistics are presented in [13]. Concentration inequalities for diffusion processes are given in [8], [16], [18].

For statistical applications, we need uniform upper bounds for the tail distribution over functions $\phi$, similar to the exponential bounds in [8]. We cannot apply the method from [8] directly, since there it is based on the continuous time version of the Itô formula. In this paper we apply this approach through uniform (over the functions $S$) geometric ergodicity. We recall (see [15]) that the geometric ergodicity yields a geometric rate in the convergence

$$\lim_{t \to \infty} \mathbf{E}_\vartheta\big(g(y_t) \mid y_0 = x\big) = \pi_\vartheta(g)$$

for any integrable function $g$ and any initial value $x \in \mathbb{R}$. Here $\mathbf{E}_\vartheta$ denotes the expectation with respect to the distribution $\mathbf{P}_\vartheta$. In [10] it is shown through the Lyapunov functions method that the process (1.1) is geometrically ergodic uniformly over functions $\vartheta = (S, \sigma)$ from the functional class $\Theta$ defined in (2.1).

The paper is organized as follows. In the next section we formulate the main results. In Section 3 we introduce all the necessary parameters. In Section 4 we show a concentration inequality in the ergodic theorem for continuous observations of the process (1.1). In Section 5 we announce the uniform geometric ergodicity property for the process (1.1). In Section 6 we give the Burkholder inequality for dependent random variables. In Section 7 we prove all main results. The Appendix contains the proofs of some auxiliary results.
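To make the role of the random denominator in (1.3) concrete, the following minimal sketch computes the deviation (1.5) for $\phi = \psi_{h,x}$ on a simulated path. Everything specific in it is an assumption made for illustration and is not taken from the paper: the process is an Ornstein-Uhlenbeck diffusion ($S(y) = -y$, $\sigma \equiv 1$), chosen only because its invariant density $q(y) = e^{-y^2}/\sqrt{\pi}$ is explicit; the kernel is $\Psi(u) = \frac{35}{32}(1 - u^2)^3$ on $[-1, 1]$; the function names and numerical values are hypothetical.

```python
import math
import random

def kernel(u):
    """Illustrative C^2 kernel on [-1, 1] that integrates to 1 (an assumed choice)."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) < 1.0 else 0.0

def psi(y, x, h):
    """psi_{h,x}(y) = (1/h) * Psi((y - x) / h), as in (1.3)."""
    return kernel((y - x) / h) / h

def ou_path(T, delta, y0=0.0, seed=0):
    """Exact sampling of dy = -y dt + dW on the grid t_j = j * delta, j = 0..N."""
    rng = random.Random(seed)
    n = int(T / delta + 1e-9)          # N = [T / delta], guarded against rounding
    a = math.exp(-delta)
    s = math.sqrt((1.0 - a * a) / 2.0)
    path = [y0]
    for _ in range(n):
        path.append(a * path[-1] + s * rng.gauss(0.0, 1.0))
    return path

def ergodic_mean(x, h, grid=4000):
    """pi(psi_{h,x}) for q(y) = exp(-y^2) / sqrt(pi), midpoint rule on [x - h, x + h]."""
    lo, step = x - h, 2.0 * h / grid
    total = 0.0
    for i in range(grid):
        y = lo + (i + 0.5) * step
        total += psi(y, x, h) * math.exp(-y * y) / math.sqrt(math.pi)
    return total * step

def deviation(path, delta, x, h):
    """D_T(psi_{h,x}) = sum_{k=1}^N (psi(y_{t_k}) - pi(psi)) * delta, as in (1.5)."""
    pi_psi = ergodic_mean(x, h)
    return sum((psi(y, x, h) - pi_psi) * delta for y in path[1:])

if __name__ == "__main__":
    T, delta = 200.0, 0.01
    path = ou_path(T, delta, seed=1)
    print("D_T / T =", deviation(path, delta, x=0.0, h=0.5) / T)
```

The ratio $D_T/T$ printed at the end is the quantity whose tail the concentration inequalities control; on a single path it only illustrates the order of magnitude and proves nothing about the uniform bounds.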

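2 Main results

First we describe the functional class $\Theta$ for functions $\vartheta = (S, \sigma)$ defined in [10]. We start with some real numbers $x_* \ge 1$, $M > 0$ and $L > 1$, for which we denote by $\Sigma_{L,M}$ the class of functions $S$ from $C^1(\mathbb{R})$ such that

$$\sup_{|x| \le x_*} \big(|S(x)| + |\dot S(x)|\big) \le M \quad \text{and} \quad -L \le \inf_{|x| \ge x_*} \dot S(x) \le \sup_{|x| \ge x_*} \dot S(x) \le -L^{-1}.$$

Furthermore, for some fixed numbers $0 < \sigma_{\min} \le \sigma_{\max} < \infty$, we denote by $\mathcal{V}$ the class of functions $\sigma$ from $C^2(\mathbb{R})$ such that

$$\sigma_{\min} \le \inf_{x \in \mathbb{R}} \min\big(|\sigma(x)|, |\dot\sigma(x)|, |\ddot\sigma(x)|\big) \le \sup_{x \in \mathbb{R}} \max\big(|\sigma(x)|, |\dot\sigma(x)|, |\ddot\sigma(x)|\big) \le \sigma_{\max}.$$

Finally, we set

$$\Theta = \Sigma_{L,M} \times \mathcal{V}. \tag{2.1}$$

It should be noted (see, for example, [11]) that for any $\vartheta = (S, \sigma) \in \Theta$, the equation (1.1) has a unique strong solution which is an ergodic process with the invariant density $q_\vartheta$ defined as

$$q_\vartheta(x) = \Big(\int_{\mathbb{R}} \sigma^{-2}(z)\, e^{\widetilde S(z)}\,\mathrm{d}z\Big)^{-1} \sigma^{-2}(x)\, e^{\widetilde S(x)}, \tag{2.2}$$

where $\widetilde S(x) = 2\int_0^x S_1(v)\,\mathrm{d}v$ and $S_1(x) = S(x)/\sigma^2(x)$.

Now we describe the functional classes for the functions $\phi$. First, for any parameters $\nu_0 > 0$ and $\nu_1 > 0$ we set

$$\mathcal{V}_{\nu_0,\nu_1} = \big\{\phi \in C^2(\mathbb{R}) : |\phi|_1 \le \nu_0, \ \|\phi\|_* \le \nu_1\big\}, \tag{2.3}$$

where $|\phi|_1 = \int_{\mathbb{R}} |\phi(y)|\,\mathrm{d}y$ and $\|\phi\|_* = \sup_{y \in \mathbb{R}} |\phi(y)|$.

For any function $\phi$ from $C^2(\mathbb{R})$ we denote by $\mathcal{L}_\vartheta(\phi)$ the generator operator for the process (1.1), i.e.

$$\mathcal{L}_\vartheta(\phi)(y) = S(y)\,\dot\phi(y) + \frac{\sigma^2(y)}{2}\,\ddot\phi(y).$$

Using this notation, we set

$$\mu(\phi) = \sup_{\vartheta \in \Theta} \|\mathcal{L}_\vartheta(\phi)\|_* \quad \text{and} \quad \widetilde\mu(\phi) = \sup_{\vartheta \in \Theta} |\widetilde\pi_\vartheta(\phi)|, \tag{2.4}$$

where $\widetilde\pi_\vartheta(\phi) = \pi_\vartheta(\mathcal{L}_\vartheta(\phi))$. Now for any vector $\nu = (\nu_0, \nu_1, \nu_2, \nu_3, \nu_4)$ from $\mathbb{R}^5_+$ we set

$$\mathcal{K}_\nu = \big\{\phi \in \mathcal{V}_{\nu_0,\nu_1} : \|\dot\phi\|_* \le \nu_2, \ \mu(\phi) \le \nu_3, \ \widetilde\mu(\phi) \le \nu_4\big\}. \tag{2.5}$$

Theorem 2.1. For any vector $\nu = (\nu_0, \nu_1, \nu_2, \nu_3, \nu_4)$ from $\mathbb{R}^5_+$ and any $0 < \delta \le 1$ there exist positive parameters $z_0 = z_0(\delta, \nu)$, $\gamma = \gamma(\delta, \nu)$ and $\kappa_1 = \kappa_1(\delta, \nu)$ such that

$$\sup_{T \ge 1}\, \sup_{z \ge z_0}\, \sup_{\phi \in \mathcal{K}_\nu}\, \sup_{\vartheta \in \Theta} e^{z \min(\kappa_1 z,\, \gamma)}\, \mathbf{P}_\vartheta\big(|D_T(\phi)| \ge z\sqrt{N}\big) \le [\ldots], \tag{2.6}$$

where the parameters $z_0$, $\gamma$ and $\kappa_1$ are defined in (3.5) – (3.6).

Now we apply this theorem to the pointwise estimation problem, i.e. for the functions $\psi_{h,x}$ defined in (1.3). To this end we assume that the frequency $\delta$ in the observations (1.2) is of the following form

$$\delta = \delta_T = \frac{1}{T\, l_T}, \tag{2.7}$$

where the function $l_T$ is such that for any $m > 0$

$$\lim_{T \to \infty} \frac{l_T}{T^m} = 0 \quad \text{and} \quad \lim_{T \to \infty} \frac{l_T}{\ln T} = +\infty. \tag{2.8}$$

Further, let $\epsilon = \epsilon_T$ be a positive function satisfying the following properties

$$\lim_{T \to \infty} \epsilon_T = 0, \quad \lim_{T \to \infty} \frac{l_T}{T \epsilon_T} = 0 \quad \text{and} \quad \lim_{T \to \infty} \frac{\epsilon_T^3\, l_T}{\ln T} = +\infty. \tag{2.9}$$

We can take, for example, for some $\iota > [\ldots]$,

$$l_T = \ln^{[\ldots]\iota}(T + 1) \quad \text{and} \quad \epsilon_T = \frac{1}{\ln^{[\ldots]\iota}(T + 1)}.$$

Theorem 2.2.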

Assume that the kernel function $\Psi$ in (1.3) is twice continuously differentiable. Moreover, assume that the functions $\delta_T$ and $l_T$ satisfy the properties (2.7) and (2.9). Then there exist coefficients $z^* = z^*(\Psi) > 0$ and $\gamma^* = \gamma^*(\Psi) > 0$ such that

$$\limsup_{T \to \infty}\, \sup_{a \ge a_*} e^{a \gamma^* l_T} \sup_{h \ge T^{-1/2}}\, \sup_{\vartheta \in \Theta} \mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| \ge aT\big) \le [\ldots], \tag{2.10}$$

where $a_* = z^*/l_T$; the parameters $z^*$ and $\gamma^*$ are given in Section 3.

This theorem immediately implies the following

Corollary 2.1.

Assume that all conditions of Theorem 2.2 hold. Then, for any $m > 0$,

$$\limsup_{T \to \infty} T^m \sup_{a \ge a_*}\, \sup_{h \ge T^{-1/2}}\, \sup_{\vartheta \in \Theta} \mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| \ge aT\big) = 0.$$

Now we set

$$\chi_{h,x}(y) = \frac{1}{h}\,\chi\Big(\frac{y - x}{h}\Big), \tag{2.11}$$

where $\chi(y) = \mathbf{1}_{\{|y| \le 1\}}$.

Theorem 2.3.

Assume that the parameter $\delta$ has the form (2.7). Then, for any $m > 0$ and for any function $\epsilon_T$ satisfying the conditions (2.8) and (2.9),

$$\lim_{T \to \infty} T^m \sup_{h \ge T^{-1/2}}\, \sup_{\vartheta \in \Theta} \mathbf{P}_\vartheta\big(|D_T(\chi_{h,x})| \ge \epsilon_T T\big) = 0. \tag{2.12}$$

Remark 2.1.

It is well known that to obtain the optimal rate in the estimation problem for a differentiable function $S$ in the process (1.1) one needs to choose the bandwidth $h$ as

$$h = T^{-1/(2\alpha + 1)}$$

with the regularity parameter $\alpha \ge 1$. This means that, for the pointwise estimation problem, $h \ge T^{-1/3}$. But in the quadratic risk one needs to choose the parameter $h$ as $h = T^{-1/2}$ (see [6]-[7], [9]).

3 Parameters

In this section we introduce all necessary constants and parameters. First, we set

$$\upsilon_1 = e^{\beta_1^2/(4\beta_2)} \quad \text{and} \quad \upsilon_2 = \sqrt{\pi/\beta_2}\; e^{\beta_1^2/(4\beta_2)}, \tag{3.1}$$

where $\beta_1 = 2M/\sigma^2_{\min}$ and $\beta_2 = 1/(L\sigma^2_{\max})$. Moreover, as we will see in the Appendix, the ergodic density (2.2) is uniformly bounded by $q^*$, where

$$q^* = \frac{\sigma^2_{\max}}{\sigma^2_{\min}}\, e^{\beta_1 x_* + \beta_1^2/(4\beta_2)}. \tag{3.2}$$

Now we set

$$r = r(\nu_0) = \frac{2\nu_0}{\sigma^2_{\min}}\,\big(1 + \upsilon_1 + q^*(x_* + \upsilon_2)\big)\, e^{x_* \beta_1}, \tag{3.3}$$

where the parameter $\nu_0$ is defined in (2.3). Using this function we set

$$\kappa_0 = \kappa_0(\nu_0) = \frac{1}{108\, r^2\, (3\rho^2 + y_0^2 + 2\sigma^2_{\max})}, \tag{3.4}$$

where $\rho = \max\big(|y_0|,\ \sigma_{\max}\sqrt{L},\ 2(x_* + ML)\big)$.

Now for any $\delta > 0$ and $\nu = (\nu_0, \nu_1, \nu_2, \nu_3, \nu_4)$ from $\mathbb{R}^5_+$ we set

$$z_0 = z_0(\delta, \nu) = \delta^{3/2} \max\big(c_1^* \nu_3,\ c_2^* \nu_2,\ \nu_4 T^{1/2},\ \nu_1 T^{-1/2}\big), \qquad \tau = \tau(\delta, \nu) = \delta^{3/2} \max\big(c_1^* \nu_3,\ c_2^* \nu_2\big), \tag{3.5}$$

where $c_1^* = 2e^{\kappa + 1}\sqrt{R(1 + \rho)/\kappa}$ and $c_2^* = \sqrt{e}\,\sigma_{\max}$. The parameters $R$ and $\kappa$ are defined in Theorem 5.1. Finally we set

$$\gamma = \frac{1}{4\tau} \quad \text{and} \quad \kappa_1 = \kappa_1(\delta, \nu) = \frac{9\kappa_0(1 - \delta)}{64\,\delta}. \tag{3.6}$$

Now we set

$$M_1 = M + L\,(x_* + |x| + 2). \tag{3.7}$$

Now for any integrable, twice continuously differentiable function $\Psi : \mathbb{R} \to \mathbb{R}$ we define

$$k^*(\Psi) = \max\big(|\dot\Psi|_1,\ |\ddot\Psi|_1,\ \|\Psi\|_*,\ \|\dot\Psi\|_*,\ \|\ddot\Psi\|_*\big). \tag{3.8}$$

Using this operator we define the parameters

$$z^* = \lambda_1 k^*(\Psi) \quad \text{and} \quad \tau^* = \lambda_2 k^*(\Psi), \tag{3.9}$$

where $\lambda_1 = \max\big(c_1^* M_1,\ c_2^*,\ M_1 q^*,\ 1\big)$ and $\lambda_2 = \max\big(c_1^* M_1,\ c_2^*\big)$. Finally, we set

$$\gamma^* = \frac{1}{4\tau^*}. \tag{3.10}$$

4 Concentration inequality for continuous observations

In this section we study the deviation in the ergodic theorem for the continuous observation case, which in this case is defined as

$$\Delta_T(\phi) = \frac{1}{\sqrt{T}} \int_0^T \big(\phi(y_t) - \pi_\vartheta(\phi)\big)\,\mathrm{d}t, \tag{4.1}$$

where $\phi$ is any integrable function, i.e. $|\phi|_1 < \infty$.

Proposition 4.1. For any $\nu_0 > 0$ and $\nu_1 > 0$

$$\sup_{z \ge 0} e^{\kappa_0 z^2} \sup_{T \ge 1}\, \sup_{\phi \in \mathcal{V}_{\nu_0,\nu_1}}\, \sup_{\vartheta \in \Theta} \mathbf{P}_\vartheta\big(|\Delta_T(\phi)| \ge z\big) \le 2, \tag{4.2}$$

where the parameter $\kappa_0$ is given in (3.4).

Proof. Similarly to [8], we first show that the deviation (4.1) has an exponential moment, i.e. we show that for the parameter $\kappa_0$

$$\sup_{T \ge 1}\, \sup_{\vartheta \in \Theta} \mathbf{E}_\vartheta\, e^{\kappa_0 \Delta_T^2(\phi)} \le 2. \tag{4.3}$$

Indeed, to show this inequality we need to estimate the expectation of any even power of the deviation $\Delta_T(\phi)$. To this end we have to represent this deviation as the sum of a continuous martingale and a negligible term. For this one needs to find a bounded solution of the following differential equation

$$\dot v_\vartheta(u) + 2\,\frac{S(u)}{\sigma^2(u)}\, v_\vartheta(u) = 2\,\frac{\widetilde\phi(u)}{\sigma^2(u)}, \qquad \widetilde\phi(u) = \phi(u) - \pi_\vartheta(\phi). \tag{4.4}$$

One can check directly that the function

$$v_\vartheta(u) = -2 \int_u^\infty \frac{\widetilde\phi(y)}{\sigma^2(y)} \exp\Big\{2\int_u^y S_1(z)\,\mathrm{d}z\Big\}\,\mathrm{d}y \tag{4.5}$$

yields such a solution. We recall that the function $S_1$ is defined in (2.2). Moreover, Lemma A.2 from the Appendix implies that this function is uniformly bounded. By applying the Itô formula to the function $V(y) = \int_0^y v_\vartheta(u)\,\mathrm{d}u$ we obtain the following representation

$$\int_0^T \widetilde\phi(y_s)\,\mathrm{d}s = V(y_T) - V(y_0) - \zeta_T, \tag{4.6}$$

where $\zeta_T = \int_0^T v_\vartheta(y_s)\,\sigma(y_s)\,\mathrm{d}W_s$. Therefore, for any $T \ge 1$ we can estimate the deviation $\Delta_T(\phi)$ from above as

$$|\Delta_T(\phi)| \le r\,|y_T| + r\,|y_0| + \frac{1}{\sqrt{T}}\,|\zeta_T|.$$

Moreover, taking into account (see [12], Lemma 4.11) that for any $m \ge 1$

$$\mathbf{E}_\vartheta (\zeta_T)^{2m} \le (2m - 1)!!\; r^{2m} \sigma_{\max}^{2m}\, T^m,$$

we obtain by Proposition A.1 that for any $m \ge 1$

$$\mathbf{E}_\vartheta |\Delta_T(\phi)|^{2m} \le 3^{2m-1}\Big(r^{2m}\big(\mathbf{E}_\vartheta |y_T|^{2m} + |y_0|^{2m}\big) + \frac{\mathbf{E}_\vartheta (\zeta_T)^{2m}}{T^m}\Big) \le (3r)^{2m}\Big((m + 1)(2m - 1)!!\,\rho^{2m} + y_0^{2m} + (2m - 1)!!\,\sigma_{\max}^{2m}\Big).$$

Therefore, for the parameter $\kappa_0$ defined in (3.4), we obtain

$$\mathbf{E}_\vartheta\, e^{\kappa_0 \Delta_T^2(\phi)} \le 1 + \sum_{m=1}^\infty \frac{\kappa_0^m}{m!}\,(3r)^{2m}\Big((m + 1)(2m - 1)!!\,\rho^{2m} + y_0^{2m} + (2m - 1)!!\,\sigma_{\max}^{2m}\Big) \le 1 + \sum_{m=1}^\infty \kappa_0^m (3r)^{2m}\Big((2\rho)^{2m} + y_0^{2m} + 2^m \sigma_{\max}^{2m}\Big) \le 1 + \sum_{m=1}^\infty (1/2)^m = 2.$$

From here we obtain the inequality (4.3) and by the Chebyshev inequality we come to the upper bound (4.2). Hence Proposition 4.1.
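The final Chebyshev step of this proof can be written out in one line. The display below is only a worked restatement of that step; $\kappa_0$ denotes here the parameter from (3.4), and the constant 2 is the value of the series bound obtained just above.

```latex
\begin{align*}
\mathbf{P}_\vartheta\big(|\Delta_T(\phi)| \ge z\big)
  &= \mathbf{P}_\vartheta\big(e^{\kappa_0 \Delta_T^2(\phi)} \ge e^{\kappa_0 z^2}\big) \\
  &\le e^{-\kappa_0 z^2}\, \mathbf{E}_\vartheta\, e^{\kappa_0 \Delta_T^2(\phi)}
   \le 2\, e^{-\kappa_0 z^2}, \qquad z \ge 0 .
\end{align*}
```

Since the right-hand side does not depend on $T$, $\phi$ or $\vartheta$, taking suprema gives the uniform form of the bound.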

Remark 4.1.

It should be noted that the inequality (4.2) is shown in [8] for the process (1.1) with $\sigma = 1$. Thus Proposition 4.1 extends the result from [8] to any diffusion function $\sigma$.

5 Geometric ergodicity

Here we announce a result on geometric ergodicity obtained in [10].

Theorem 5.1.

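There exist some constants $R \ge 1$ and $\kappa > 0$ such that

$$\sup_{t \ge 0} e^{\kappa t} \sup_{\|g\|_* \le 1}\, \sup_{x \in \mathbb{R}}\, \sup_{\vartheta \in \Theta} \frac{\big|\mathbf{E}_\vartheta\big(g(y_t) \mid y_0 = x\big) - \pi_\vartheta(g)\big|}{1 + |x|} \le R, \tag{5.1}$$

where the parameters $R$ and $\kappa$ are given in [10].

6 Burkholder inequality for dependent random variables

In this section we give the following inequality from [4], [17].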

Proposition 6.1.

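Let $(\Omega, \mathcal{F}, (\mathcal{F}_j)_{1 \le j \le n}, \mathbf{P})$ be a filtered probability space and $(X_j, \mathcal{F}_j)_{1 \le j \le n}$ be a sequence of random variables such that for some $p \ge 2$, $\mathbf{E}|X_j|^p < \infty$ for all $1 \le j \le n$. Define

$$b_{j,n}(p) = \Big(\mathbf{E}\Big(|X_j| \sum_{k=j}^n \big|\mathbf{E}(X_k \mid \mathcal{F}_j)\big|\Big)^{p/2}\Big)^{2/p}.$$

Then

$$\mathbf{E}\Big|\sum_{j=1}^n X_j\Big|^p \le (2p)^{p/2}\Big(\sum_{j=1}^n b_{j,n}(p)\Big)^{p/2}. \tag{6.1}$$

The proof of this proposition is given in the Appendix.

7 Proofs

7.1 Proof of Theorem 2.1

First note that by Proposition A.1 and the Hölder inequality we obtain for any $\alpha \ge 1$

$$\sup_{t \ge 0}\, \sup_{\vartheta \in \Theta} \mathbf{E}_\vartheta\big(|y_t|^\alpha \mid y_0 = x\big) \le (\alpha + 1)^{\alpha/2} \rho^\alpha. \tag{7.1}$$

Now we represent the deviation $D_T(\phi)$ as

$$D_T(\phi) = \int_0^T \big(\phi(y_t) - \pi_\vartheta(\phi)\big)\,\mathrm{d}t + A_{1,T} - A_{2,T} = \sqrt{T}\,\Delta_T(\phi) + A_{1,T} - A_{2,T}, \tag{7.2}$$

where

$$A_{1,T} = \sum_{j=1}^N \int_{t_{j-1}}^{t_j} \big(\phi(y_{t_j}) - \phi(y_t)\big)\,\mathrm{d}t \quad \text{and} \quad A_{2,T} = \int_{\delta N}^T \big(\phi(y_t) - \pi_\vartheta(\phi)\big)\,\mathrm{d}t.$$

To estimate the term $A_{1,T}$ we represent through the Itô formula the difference $\phi(y_{t_j}) - \phi(y_t)$ as

$$\phi(y_{t_j}) - \phi(y_t) = \int_t^{t_j} \mathcal{L}_\vartheta(\phi)(y_s)\,\mathrm{d}s + \int_t^{t_j} \dot\phi(y_s)\,\sigma(y_s)\,\mathrm{d}W_s = \widetilde\pi_\vartheta(\phi)(t_j - t) + \Psi_j(t) + \int_t^{t_j} \dot\phi(y_s)\,\sigma(y_s)\,\mathrm{d}W_s,$$

where

$$\Psi_j(t) = \int_t^{t_j} \psi(y_s)\,\mathrm{d}s, \qquad \omega_j(t) = \int_t^{t_j} \dot\phi(y_s)\,\sigma(y_s)\,\mathrm{d}W_s$$

and $\psi(y) = \mathcal{L}_\vartheta(\phi)(y) - \widetilde\pi_\vartheta(\phi)$. Now setting

$$X_j = \int_{t_{j-1}}^{t_j} \Psi_j(t)\,\mathrm{d}t \quad \text{and} \quad \eta_j = \int_{t_{j-1}}^{t_j} \omega_j(t)\,\mathrm{d}t,$$

we obtain

$$A_{1,T} = \widetilde\pi_\vartheta(\phi)\,\frac{N\delta^2}{2} + \sum_{j=1}^N X_j + \sum_{j=1}^N \eta_j. \tag{7.3}$$

To estimate the second term on the right-hand side of (7.3), we make use of Proposition 6.1. We start with verifying its conditions. Putting $\mathcal{F}_s = \sigma\{y_u,\ 0 \le u \le s\}$, we obtain by Theorem 5.1 that for any $t \ge s$ and for any $\phi$ from the functional class (2.5)

$$\big|\mathbf{E}_\vartheta\big(\psi(y_t) \mid \mathcal{F}_s\big)\big| \le \mu(\phi)\, R\,(1 + |y_s|)\, e^{-\kappa(t - s)} \le \nu_3 R\,(1 + |y_s|)\, e^{-\kappa(t - s)}.$$

Therefore, for any $k > j$,

$$\big|\mathbf{E}_\vartheta\big(X_k \mid \mathcal{F}_{t_j}\big)\big| \le R\, e^{\kappa}\,(1 + |y_{t_j}|)\,\nu_3\,\delta^2\, e^{-\kappa\delta(k - j)}. \tag{7.4}$$

It should be noted also that the random variables $X_j$ are bounded, i.e.

$$|X_j| \le \nu_3 \delta^2.$$

To estimate the probability tail for the sum $\sum_{j=1}^N X_j$ we will use the inequality (6.1). For this we need to estimate the coefficients $b_{j,N}(p)$ for any $p \ge 1$. From here, taking into account that $1 - e^{-\kappa\delta} \ge \kappa\delta e^{-\kappa}$ and that for $p \ge 2$

$$\Big(\mathbf{E}_\vartheta (1 + |y_{t_j}|)^{p/2}\Big)^{2/p} \le 1 + \Big(\mathbf{E}_\vartheta |y_{t_j}|^{p/2}\Big)^{2/p},$$

we can estimate the coefficient $b_{j,N}(p)$ as

$$b_{j,N}(p) \le \frac{1}{\kappa}\, R\, e^{2\kappa}\, \varsigma^2 \Big(1 + \big(\mathbf{E}_\vartheta |y_{t_j}|^{p/2}\big)^{2/p}\Big),$$

where $\varsigma = \nu_3 \delta^{3/2}$. Now the inequality (7.1) yields

$$b_{j,N}(p) \le R_1 \varsigma^2 \sqrt{p} \le 2 R_1 \varsigma^2 p,$$

where

$$R_1 = \frac{1}{\kappa}\, R\, e^{2\kappa}\,(1 + \rho).$$

Using this in (6.1) we obtain that for any $p \ge 2$

$$\mathbf{E}_\vartheta \Big|\sum_{k=1}^N X_k\Big|^p \le (2p)^{p/2} N^{p/2} R_1^{p/2} \varsigma^p (2p)^{p/2} = \big(2\sqrt{R_1}\,\varsigma\big)^p N^{p/2} p^p.$$

Therefore, by the Chebyshev inequality,

$$\mathbf{P}_\vartheta\Big(\Big|\sum_{k=1}^N X_k\Big| \ge z\sqrt{N}\Big) \le e^{p \ln(a) + p \ln p}$$

with $a = 2\sqrt{R_1}\,\varsigma/z$. Minimizing now the right-hand side over $p \ge$ […]

[…] Since $N/T \ge (1 - \delta)/\delta$ for any $0 < \delta < 1$ and $T \ge 1$, we have

$$\mathbf{P}_\vartheta\big(|D_T(\phi)| \ge z\sqrt{N}\big) \le \mathbf{P}_\vartheta\Big(|\Delta_T(\phi)| \ge \frac{3z\sqrt{1 - \delta}}{8\sqrt{\delta}}\Big) + \mathbf{P}_\vartheta\Big(|A_{1,T}| \ge \frac{z\sqrt{N}}{2}\Big).$$

Therefore, applying here the inequalities (4.2) and (7.7) we come to the upper bound (2.6) with the parameter $\kappa_1$ given in (3.6). Hence Theorem 2.1.

7.2 Proof of Theorem 2.2

Firstly, note that in this case

$$|\psi_{h,x}|_1 = |\Psi|_1, \qquad \|\psi_{h,x}\|_* = \frac{1}{h}\,\|\Psi\|_* \qquad \text{and} \qquad \|\dot\psi_{h,x}\|_* = \frac{1}{h^2}\,\|\dot\Psi\|_*.$$

Moreover, taking into account that

$$|S(y)| \le M + L x_* + L|y|,$$

we find that

$$\sup_{|y| \le |x| + 2} |S(y)| \le M_1, \tag{7.8}$$

where $M_1$ is given in (3.7). Therefore, in view of the fact that $0 < h < 1$, we can estimate from above the parameters (2.4) as

$$\mu(\psi_{h,x}) \le \mu^* h^{-3} \quad \text{and} \quad \widetilde\mu(\psi_{h,x}) \le \widetilde\mu^* h^{-2}, \tag{7.9}$$

where

$$\mu^* = \max\big(\|\dot\Psi\|_*,\ \|\ddot\Psi\|_*\big)\, M_1 \quad \text{and} \quad \widetilde\mu^* = \max\big(|\dot\Psi|_1,\ |\ddot\Psi|_1\big)\, M_1 q^*.$$

Therefore, the function $\psi_{h,x}$ belongs to the class (2.5) with the following parameters

$$\nu_0 = |\Psi|_1, \quad \nu_1 = \frac{\|\Psi\|_*}{h}, \quad \nu_2 = \frac{\|\dot\Psi\|_*}{h^2}, \quad \nu_3 = \frac{\mu^*}{h^3}, \quad \nu_4 = \frac{\widetilde\mu^*}{h^2}.$$

In this case the parameter (3.4) equals $\kappa_0(|\Psi|_1)$ and the parameters (3.5) can be represented as

$$z_0 = \frac{\delta^{3/2}}{h^3} \max\big(c_1^* \mu^*,\ c_2^* \|\dot\Psi\|_* h,\ \widetilde\mu^* h T^{1/2},\ \|\Psi\|_* h^2 T^{-1/2}\big), \qquad \tau = \frac{\delta^{3/2}}{h^3} \max\big(c_1^* \mu^*,\ c_2^* \|\dot\Psi\|_* h\big). \tag{7.10}$$

Therefore, thanks to the condition (2.8), for any $T^{-1/2} \le h \le 1$

$$z_0 \le l_T^{-3/2}\, z^* \quad \text{and} \quad \tau \le l_T^{-3/2}\, \tau^*, \tag{7.11}$$

where the parameters $z^*$ and $\tau^*$ are given in (3.9). Note now that, by the condition (2.7),

$$\mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| \ge aT\big) \le \mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| \ge z\sqrt{N}\big),$$

where $z = a/\sqrt{l_T}$. The first inequality in (7.11) implies that $z \ge z_0$ for all $a \ge a_* = z^*/l_T$. Moreover, from the last inequality in (7.11) it follows that for $a \ge a_*$

$$\min(\kappa_1 z,\ \gamma) = \min\Big(\kappa_1 z,\ \frac{1}{4\tau}\Big) \ge \min\Big(\frac{\kappa_1 z^*}{l_T\sqrt{l_T}},\ \frac{l_T\sqrt{l_T}}{4\tau^*}\Big).$$

Taking into account here the definition of $\kappa_1$ in (3.6) and the form of $\delta$ given by (2.7), we obtain that for sufficiently large $T$

$$\min\Big(\frac{\kappa_1 z^*}{l_T\sqrt{l_T}},\ \frac{l_T\sqrt{l_T}}{4\tau^*}\Big) = \frac{l_T\sqrt{l_T}}{4\tau^*}.$$

Thus, through Theorem 2.1 we come to the inequality (2.10). Hence Theorem 2.2.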
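The change of variables used in this proof is plain arithmetic, and it may help to see it spelled out. The worked display below is a sketch under the reading $\delta_T = 1/(T\,l_T)$ of (2.7) and $\gamma^* = 1/(4\tau^*)$ of (3.10); nothing in it goes beyond those two formulas and the choice $z = a/\sqrt{l_T}$.

```latex
\begin{align*}
N = [T/\delta_T] \le \frac{T}{\delta_T} = T^2\, l_T
  \quad &\Longrightarrow \quad
  z\sqrt{N} \le \frac{a}{\sqrt{l_T}} \cdot T\sqrt{l_T} = aT, \\
z \cdot \frac{l_T\sqrt{l_T}}{4\tau^*}
  = \frac{a}{\sqrt{l_T}} \cdot \frac{l_T\sqrt{l_T}}{4\tau^*}
  &= \frac{a\, l_T}{4\tau^*} = a\,\gamma^*\, l_T .
\end{align*}
```

The first line explains why the event $\{|D_T| \ge aT\}$ is contained in $\{|D_T| \ge z\sqrt{N}\}$; the second shows how the exponent $z \min(\cdot)$ of Theorem 2.1 turns into the exponent $a\gamma^* l_T$ of (2.10).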

7.3 Proof of Theorem 2.3

First we represent the tail probability as

$$\mathbf{P}_\vartheta\big(|D_T(\chi_{h,x})| \ge \epsilon_T T\big) = I_1 + I_2,$$

where

$$I_1 = \mathbf{P}_\vartheta\Big(\sum_{j=1}^N \chi_{h,x}(y_{t_j})\,\Delta t_j \le \big(\pi_\vartheta(\chi_{h,x}) - \epsilon_T\big) T\Big), \qquad I_2 = \mathbf{P}_\vartheta\Big(\sum_{j=1}^N \chi_{h,x}(y_{t_j})\,\Delta t_j \ge \big(\pi_\vartheta(\chi_{h,x}) + \epsilon_T\big) T\Big).$$

Let us now define the following smoothed indicator functions

$$\Psi_{1,\eta}(u) = \frac{1}{\eta} \int_{-\infty}^{+\infty} \mathbf{1}_{\{|z| \le 1 - \eta\}}\, V\Big(\frac{z - u}{\eta}\Big)\,\mathrm{d}z \quad \text{and} \quad \Psi_{2,\eta}(u) = \frac{1}{\eta} \int_{-\infty}^{+\infty} \mathbf{1}_{\{|z| \le 1 + \eta\}}\, V\Big(\frac{z - u}{\eta}\Big)\,\mathrm{d}z,$$

where $\eta$ is a positive smoothing parameter which will be specified later and $V$ is a twice continuously differentiable even function $\mathbb{R} \to \mathbb{R}$ such that

$$V(z) = 0 \ \text{ for } |z| \ge 1 \quad \text{and} \quad \int_{-1}^{1} V(z)\,\mathrm{d}z = 1.$$

It is easy to see that, for any $y \in \mathbb{R}$ and $0 < \eta \le 1/2$,

$$\Psi_{1,\eta}(y) \le \chi(y) \le \Psi_{2,\eta}(y)$$

and $\Psi_{2,\eta}(y) = 0$ for $|y| \ge 2$. Moreover, for the functions

$$\psi_{i,h}(y) = \frac{1}{h}\,\Psi_{i,\eta}\Big(\frac{y - x}{h}\Big),$$

using the inequality (A.4), we can estimate the difference between the corresponding ergodic integrals (1.4) as

$$|\pi_\vartheta(\chi_{h,x}) - \pi_\vartheta(\psi_{i,h})| \le 2\eta q^*.$$

Therefore, choosing here $\eta = \epsilon_T$ we obtain, for sufficiently large $T$,

$$I_i \le \mathbf{P}_\vartheta\big(|D_T(\psi_{i,h})| \ge \epsilon_T T/[\ldots]\big).$$

One can check directly that in this case the operator (3.8) has the following asymptotic ($T \to \infty$) form

$$k^*(\Psi_{i,\eta}) = O\big(\eta^{-2}\big).$$

Therefore, from (3.9) and (7.11) it follows that for $T \to \infty$ and $h \ge T^{-1/2}$

$$z_0(\psi_{i,h}) = O\big(\eta^{-2}\, l_T^{-3/2}\big) \quad \text{and} \quad \tau(\psi_{i,h}) = O\big(\eta^{-2}\, l_T^{-3/2}\big),$$

i.e.

$$z_0(\psi_{i,h}) = O\Big(\frac{1}{\epsilon_T^2\, l_T^{3/2}}\Big) \quad \text{and} \quad \tau(\psi_{i,h}) = O\Big(\frac{1}{\epsilon_T^2\, l_T^{3/2}}\Big).$$

Now we have

$$\mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| \ge \epsilon_T T\big) \le \mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| \ge z\sqrt{N}\big),$$

where $z = \epsilon_T/\sqrt{l_T}$. The last equality in (2.9) implies $z \ge z_0$ for sufficiently large $T$. Moreover, there exists a constant $c_* > 0$ such that for sufficiently large $T$

$$\kappa_1 z \ge c_*\, T\sqrt{l_T}\,\epsilon_T \quad \text{and} \quad \gamma \ge c_*\, l_T\sqrt{l_T}\,\epsilon_T^2,$$

i.e. for sufficiently large $T$

$$\min(\kappa_1 z,\ \gamma) \ge c_*\, l_T\sqrt{l_T}\,\epsilon_T^2.$$

Therefore, by Theorem 2.1, for sufficiently large $T$

$$\mathbf{P}_\vartheta\big(|D_T(\psi_{h,x})| \ge \epsilon_T T\big) \le e^{-c_* l_T \epsilon_T^3}.$$

Now the last condition in (2.9) yields the equality (2.12). Hence Theorem 2.3.
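The sandwich between the two smoothed indicators used in this proof is easy to check numerically. The sketch below does so under assumptions that are not taken from the paper: the mollifier is $V(z) = \frac{35}{32}(1 - z^2)^3$ on $[-1, 1]$ (nonnegative, even, twice continuously differentiable, integrating to 1), the two cutoffs are read as $1 - \eta$ and $1 + \eta$, and all function names are hypothetical. With the substitution $w = (z - u)/\eta$, each smoothed indicator is a difference of two values of the antiderivative of $V$.

```python
import math

def V_cdf(w):
    """Integral of V from -1 to w for the assumed V(z) = (35/32) * (1 - z^2)^3 on [-1, 1]."""
    if w <= -1.0:
        return 0.0
    if w >= 1.0:
        return 1.0
    return 0.5 + 35.0 / 32.0 * (w - w ** 3 + 0.6 * w ** 5 - w ** 7 / 7.0)

def smoothed_indicator(u, eta, cutoff):
    """(1/eta) * integral of 1{|z| <= cutoff} * V((z - u)/eta) dz, via w = (z - u)/eta."""
    return V_cdf((cutoff - u) / eta) - V_cdf((-cutoff - u) / eta)

def psi_lower(u, eta):
    """Smoothed indicator with cutoff 1 - eta (lower envelope of chi)."""
    return smoothed_indicator(u, eta, 1.0 - eta)

def psi_upper(u, eta):
    """Smoothed indicator with cutoff 1 + eta (upper envelope of chi)."""
    return smoothed_indicator(u, eta, 1.0 + eta)

def chi(u):
    """Indicator of the interval [-1, 1]."""
    return 1.0 if abs(u) <= 1.0 else 0.0

if __name__ == "__main__":
    eta, step = 0.25, 0.01
    grid = [-3.0 + step * i for i in range(601)]
    ok = all(psi_lower(u, eta) <= chi(u) + 1e-12 <= psi_upper(u, eta) + 2e-12 for u in grid)
    gap = sum(psi_upper(u, eta) - psi_lower(u, eta) for u in grid) * step
    print("sandwich holds:", ok, " L1 gap between envelopes:", gap)
```

For this mollifier the $L_1$ distance between the two envelopes is $4\eta$ (each envelope differs from $\chi$ by $2\eta$ in $L_1$), which is the mechanism behind the $O(\eta)$ bound on the difference of the ergodic integrals.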

A Appendix

A.1 Proof of Proposition 6.1

We set

$$h_n(t) = \mathbf{E}|S_{n-1} + tX_n|^p \quad \text{with} \quad S_n = \sum_{j=1}^n X_j.$$

Arguing by induction, we assume that for any $1 \le k \le n - 1$ and $0 \le t \le 1$

$$h_k(t) \le (2p)^{p/2} B_k^{p/2}(t), \tag{A.1}$$

where

$$B_k(t) = \sum_{j=1}^{k-1} b_{j,k}(p) + t\, b_{k,k}(p).$$

Note now that, as is shown in [17] (Theorem 2.3),

$$\mathbf{E}|S_n|^p = p(p - 1) \sum_{j=1}^n \int_0^1 \mathbf{E}|S_{j-1} + vX_j|^{p-2}\big(-vX_j^2 + \Upsilon(j, n)\big)\,\mathrm{d}v \tag{A.2}$$

with

$$\Upsilon(j, n) = X_j \sum_{k=j}^n \mathbf{E}(X_k \mid \mathcal{F}_j).$$

Therefore,

$$h_n(t) = p(p - 1) \sum_{j=1}^{n-1} \int_0^1 \mathbf{E}|S_{j-1} + vX_j|^{p-2}\big(-vX_j^2 + G(j, n, t)\big)\,\mathrm{d}v + p(p - 1) \int_0^1 \mathbf{E}|S_{n-1} + vtX_n|^{p-2}\, t^2 (1 - v) X_n^2\,\mathrm{d}v,$$

where

$$G(j, n, t) = \Upsilon(j, n - 1) + tX_j\, \mathbf{E}(X_n \mid \mathcal{F}_j).$$

Moreover, we can estimate $h_n(t)$ as

$$\frac{h_n(t)}{p^2} \le \sum_{j=1}^{n-1} \int_0^1 \mathbf{E}|S_{j-1} + vX_j|^{p-2}\, |G(j, n, t)|\,\mathrm{d}v + \int_0^t \mathbf{E}|S_{n-1} + sX_n|^{p-2} X_n^2\,\mathrm{d}s.$$

Now taking into account that for $0 \le t \le 1$

$$\big(\mathbf{E}|G(j, n, t)|^{p/2}\big)^{2/p} \le b_{j,n}(p),$$

we obtain by the Hölder inequality

$$\int_0^1 \mathbf{E}|S_{j-1} + vX_j|^{p-2}\, |G(j, n, t)|\,\mathrm{d}v \le \int_0^1 h_j^\alpha(v)\, b_{j,n}(p)\,\mathrm{d}v,$$

where $\alpha = 1 - 2/p$. Therefore,

$$\frac{h_n(t)}{p^2} \le \sum_{j=1}^{n-1} b_{j,n}(p) \int_0^1 h_j^\alpha(v)\,\mathrm{d}v + b_{n,n}(p) \int_0^t h_n^\alpha(s)\,\mathrm{d}s.$$

Now by the induction assumption, for any $1 \le j \le n - 1$,

$$b_{j,n}(p) \int_0^1 h_j^\alpha(v)\,\mathrm{d}v \le (2p)^{(p-2)/2} \int_0^1 B_j^{(p-2)/2}(v)\,\mathrm{d}v\; b_{j,n}(p).$$

Moreover, taking into account that

$$B_j(v) \le \sum_{i=1}^{j-1} b_{i,n} + v\, b_{j,n}(p),$$

we obtain that

$$\int_0^1 B_j^{(p-2)/2}(v)\,\mathrm{d}v\; b_{j,n}(p) \le \frac{2}{p}\Big(\Big(\sum_{i=1}^{j} b_{i,n}\Big)^{p/2} - \Big(\sum_{i=1}^{j-1} b_{i,n}\Big)^{p/2}\Big).$$

This implies for any $0 \le t \le 1$

$$h_n(t) \le k_n \int_0^t h_n^\alpha(v)\,\mathrm{d}v + f_n \tag{A.3}$$

with

$$k_n = p^2\, b_{n,n}(p) \quad \text{and} \quad f_n = \Big(2p \sum_{j=1}^{n-1} b_{j,n}(p)\Big)^{p/2}.$$

Now by setting

$$Z(t) = \int_0^t h_n^\alpha(s)\,\mathrm{d}s + \frac{f_n}{k_n},$$

we obtain from (A.3) that

$$\dot Z(t) \le k_n^\alpha Z^\alpha(t).$$

Now introducing

$$g(t) = \dot Z(t) - k_n^\alpha Z^\alpha(t),$$

we obtain the differential equation

$$\dot Z(t) = k_n^\alpha Z^\alpha(t) + g(t)$$

with $g(t) \le 0$. From here we obtain

$$Z^{2/p}(t) = Z^{2/p}(0) + \frac{2}{p}\, k_n^\alpha t + \frac{2}{p} \int_0^t \frac{g(u)}{Z^\alpha(u)}\,\mathrm{d}u \le Z^{2/p}(0) + \frac{2}{p}\, k_n^\alpha t,$$

i.e.

$$Z(t) \le \Big(Z^{2/p}(0) + \frac{2}{p}\, k_n^\alpha t\Big)^{p/2}.$$

Substituting this bound in (A.3) we obtain

$$h_n(t) \le k_n Z(t) \le k_n \Big(Z^{2/p}(0) + \frac{2}{p}\, k_n^\alpha t\Big)^{p/2} = \Big(2p \sum_{j=1}^{n-1} b_{j,n}(p) + 2pt\, b_{n,n}(p)\Big)^{p/2}.$$

Hence Proposition 6.1.

A.2 Uniform bound for the invariant density

Lemma A.1.

The invariant density (2.2) is uniformly bounded:

$$\sup_{x \in \mathbb{R}}\, \sup_{\vartheta \in \Theta} q_\vartheta(x) \le q^* < \infty, \tag{A.4}$$

where the upper bound $q^*$ is given in (3.2).

Proof.

First, note that through the definition of $\Theta$ we can check directly that for any $|x| \ge x_*$

$$2\int_0^x S_1(v)\,\mathrm{d}v \le \beta_1 |x| - \beta_2 (|x| - x_*)^2, \tag{A.5}$$

where the coefficients $\beta_1$ and $\beta_2$ are given in (3.1). Therefore, taking into account that for $|x| \le x_*$

$$2\int_0^x S_1(v)\,\mathrm{d}v \le \beta_1 x_*,$$

we obtain that

$$2\sup_{x \in \mathbb{R}} \int_0^x S_1(v)\,\mathrm{d}v \le \beta_1 x_* + \frac{\beta_1^2}{4\beta_2}.$$

Estimating now the denominator in (2.2) from below as

$$\int_{\mathbb{R}} \sigma^{-2}(z)\, e^{\widetilde S(z)}\,\mathrm{d}z \ge \frac{1}{\sigma^2_{\max}},$$

and taking into account the definition of $q^*$, we come to the upper uniform bound (A.4). Hence Lemma A.1.

A.3 Moment bound for the process $y_t$

Proposition A.1.

For any $m \ge 1$

$$\sup_{t \ge 0}\, \sup_{\vartheta \in \Theta} \mathbf{E}_\vartheta |y_t|^{2m} \le (m + 1)(2m - 1)!!\,\rho^{2m} \le (2m)^m \rho^{2m},$$

where $\rho$ is given in (3.4).

Proof.

First note that through the Itô formula we can write for the function $z_t(m) = \mathbf{E}_\vartheta\, y_t^{2m}$ the following integral equality

$$z_t(m) = z_0(m) + 2m \int_0^t \mathbf{E}_\vartheta\, y_s^{2m-1} S(y_s)\,\mathrm{d}s + m(2m - 1) \int_0^t \mathbf{E}_\vartheta\, y_s^{2m-2} \sigma^2(y_s)\,\mathrm{d}s,$$

i.e.

$$\dot z_t(m) = 2m\, \mathbf{E}_\vartheta\, y_t^{2m-1} S(y_t) + m(2m - 1)\, \mathbf{E}_\vartheta\, y_t^{2m-2} \sigma^2(y_t).$$

Taking into account here that $\sup_{x \in \mathbb{R}} \sigma^2(x) \le \sigma^2_{\max}$ we obtain that for any $m \ge 1$ and $t \ge 0$

$$\dot z_t(m) \le 2m\, \mathbf{E}_\vartheta\, y_t^{2m-1} S(y_t) + m(2m - 1)\, \sigma^2_{\max}\, z_t(m - 1).$$

Now we need to estimate from above the function $x^{2m-1} S(x)$. Obviously, for any $K > x_*$

$$x^{2m-1} S(x) \le K^{2m-1} \sup_{|x| \le K} |S(x)|\; \mathbf{1}_{\{|x| \le K\}} + x^{2m}\, \frac{S(x)}{x}\; \mathbf{1}_{\{|x| > K\}}.$$

Taking into account that $\sup_{|x| > x_*} |\dot S(x)| \le L$, we obtain, for any $x \in [x_*, K]$,

$$|S(x)| \le |S(x_*)| + L|x - x_*| \le M + L(K - x_*).$$

Similarly, we obtain the same upper bound for $x \in [-K, -x_*]$. Therefore,

$$\sup_{|x| \le K} |S(x)| \le M + L(K - x_*).$$

Consider now the case $|x| > K$. We recall that $\sup_{|x| \ge x_*} \dot S(x) \le -L^{-1}$. Therefore,

$$\frac{S(x)}{x} \le \frac{M}{K} - \frac{K - x_*}{LK}.$$

Choosing $K = 2(x_* + ML)$ yields

$$\frac{S(x)}{x} \le -\frac{1}{2L}.$$

Therefore,

$$x^{2m-1} S(x) \le K^{2m-1}\big(M + L(K - x_*)\big) - \frac{1}{2L}\, x^{2m}\, \mathbf{1}_{\{|x| > K\}} = K^{2m-1}\big(M + L(K - x_*)\big) + \frac{1}{2L}\, x^{2m}\, \mathbf{1}_{\{|x| \le K\}} - \frac{1}{2L}\, x^{2m} \le A_m - \frac{1}{2L}\, x^{2m},$$

where

$$A_m = \big(2(x_* + ML)\big)^{2m-1}\big(2M + x_*(L + L^{-1}) + 2L^2 M\big).$$

Therefore,

$$\dot z_t(m) \le 2m A_m - L^{-1} m\, z_t(m) + m(2m - 1)\, \sigma^2_{\max}\, z_t(m - 1).$$

We can rewrite this inequality as follows

$$\dot z_t(m) = -L^{-1} m\, z_t(m) + m(2m - 1)\, \sigma^2_{\max}\, z_t(m - 1) + \psi_t,$$

where $\sup_{t \ge 0} \psi_t \le 2m A_m$. This equality provides

$$z_t(m) = z_0(m)\, e^{-mL^{-1}t} + m(2m - 1)\, \sigma^2_{\max} \int_0^t e^{-mL^{-1}(t - s)} z_s(m - 1)\,\mathrm{d}s + \int_0^t e^{-mL^{-1}(t - s)} \psi_s\,\mathrm{d}s \le m(2m - 1)\, \sigma^2_{\max} \int_0^t e^{-mL^{-1}(t - s)} z_s(m - 1)\,\mathrm{d}s + B_m,$$

where $B_m = y_0^{2m} + 2A_m L$. Setting $B_0 = 1$ and resolving this inequality by recurrence yields

$$z_t(m) \le (2m - 1)!! \sum_{j=0}^m \big(\sigma^2_{\max} L\big)^{m-j} B_j.$$

It is easy to see that

$$B_m \le \big(\max\big(|y_0|,\ 2(x_* + ML)\big)\big)^{2m}.$$

Therefore

$$\sup_{t \ge 0} z_t(m) \le (m + 1)(2m - 1)!!\,\rho^{2m} \le (2m)^m \rho^{2m},$$

where $\rho$ is defined in (3.4). Hence Proposition A.1.

A.4 Properties of the function (4.5)

Lemma A.2.

For any integrable function $\phi$ the solution (4.5) is uniformly bounded, i.e.

$$\sup_{\vartheta \in \Theta}\, \sup_{y \in \mathbb{R}} |v_\vartheta(y)| \le r,$$

where the upper bound $r$ is introduced in (3.3).

Proof. Firstly we note that for any $\vartheta$ from $\Theta$ and any integrable function $\phi : \mathbb{R} \to \mathbb{R}$

$$|\pi_\vartheta(\phi)| \le q^* |\phi|_1.$$

Moreover, by the definition of the parameter $\beta_1$ we get

$$2\sup_{|u| \le x_*} |S_1(u)| \le \beta_1.$$

Therefore, for $0 \le u \le x_*$ we can estimate the function $v_\vartheta$ as

$$|v_\vartheta(u)| \le \frac{2 e^{x_* \beta_1}}{\sigma^2_{\min}}\,\big((1 + q^* x_*)\,|\phi|_1 + I(\phi)\big),$$

where $\beta_1$ is given in (3.1) and

$$I(\phi) = \int_{x_*}^\infty \big(|\phi(y)| + q^* |\phi|_1\big)\, e^{2\int_{x_*}^y S_1(z)\,\mathrm{d}z}\,\mathrm{d}y.$$

To estimate this term note that, similarly to (A.5), we can obtain that for any $y \ge a \ge x_*$

$$2\int_a^y S_1(z)\,\mathrm{d}z \le \beta_1 (y - a) - \beta_2 (y - a)^2. \tag{A.6}$$

Using this inequality for $a = x_*$, we get

$$I(\phi) \le \int_{x_*}^\infty |\phi(y)|\, e^{\beta_1 (y - x_*) - \beta_2 (y - x_*)^2}\,\mathrm{d}y + q^* |\phi|_1 \int_0^\infty e^{\beta_1 z - \beta_2 z^2}\,\mathrm{d}z \le |\phi|_1 \sup_{z \ge 0} e^{\beta_1 z - \beta_2 z^2} + q^* |\phi|_1 \int_0^\infty e^{\beta_1 z - \beta_2 z^2}\,\mathrm{d}z \le |\phi|_1 (\upsilon_1 + q^* \upsilon_2),$$

where the parameters $\upsilon_1$ and $\upsilon_2$ are introduced in (3.1). Therefore, taking into account the definition (3.3), the last inequality implies

$$\sup_{\vartheta \in \Theta}\, \sup_{0 \le u \le x_*} |v_\vartheta(u)| \le r. \tag{A.7}$$

If $u \ge x_*$, then through the inequality (A.6) we estimate the function $v_\vartheta(u)$ from above as

$$\sup_{\vartheta \in \Theta}\, \sup_{u \ge x_*} |v_\vartheta(u)| \le \frac{2|\phi|_1}{\sigma^2_{\min}}\,(\upsilon_1 + q^* \upsilon_2) \le r.$$

Let now $u \le 0$. Taking into account that

$$\int_{\mathbb{R}} \frac{\widetilde\phi(y)}{\sigma^2(y)} \exp\Big\{2\int_0^y S_1(z)\,\mathrm{d}z\Big\}\,\mathrm{d}y = 0,$$

we can represent the function $v_\vartheta$ as

$$v_\vartheta(u) = 2\int_{|u|}^\infty \frac{\widetilde\phi(-y)}{\sigma^2(-y)}\, e^{-2\int_{|u|}^y S_1(-z)\,\mathrm{d}z}\,\mathrm{d}y.$$

Similarly to (A.6), one can check directly that for any $y \ge a \ge x_*$

$$-2\int_a^y S_1(-z)\,\mathrm{d}z \le \beta_1 (y - a) - \beta_2 (y - a)^2.$$

Therefore, in the same way as in the proof of (A.7), we can estimate the function $v_\vartheta(u)$ as

$$\sup_{\vartheta \in \Theta}\, \sup_{u \le 0} |v_\vartheta(u)| \le r.$$

Hence Lemma A.2.
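As a sanity check on the function (4.5), the following sketch verifies numerically that it solves the differential equation (4.4) and stays bounded in one explicit case. The setting is an assumption made for illustration only: $S(y) = -y$ and $\sigma \equiv 1$ (so the drift-to-diffusion ratio is $-u$ and the invariant density is $e^{-y^2}/\sqrt{\pi}$), and the test function is $\phi(y) = e^{-y^2}$, for which $\pi(\phi) = 1/\sqrt{2}$. Function names, the truncation length and the step sizes are hypothetical choices.

```python
import math

# pi(phi) for phi(y) = exp(-y^2) under the density exp(-y^2)/sqrt(pi)
PI_PHI = 1.0 / math.sqrt(2.0)

def phi_tilde(y):
    """Centered test function phi - pi(phi)."""
    return math.exp(-y * y) - PI_PHI

def v(u, length=12.0, n=6000):
    """v(u) = -2 * int_u^inf phi_tilde(y) * exp(-(y^2 - u^2)) dy.

    Composite Simpson rule on [u, u + length]; n must be even. The exponent
    -(y^2 - u^2) is 2 * int_u^y (-z) dz, i.e. formula (4.5) for S(y) = -y, sigma = 1.
    """
    step = length / n

    def f(y):
        return phi_tilde(y) * math.exp(-(y * y - u * u))

    total = f(u) + f(u + length)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(u + i * step)
    return -2.0 * total * step / 3.0

def ode_residual(u, eps=1e-3):
    """Residual v'(u) + 2*(-u)*v(u) - 2*phi_tilde(u) of (4.4), central difference for v'."""
    dv = (v(u + eps) - v(u - eps)) / (2.0 * eps)
    return dv - 2.0 * u * v(u) - 2.0 * phi_tilde(u)

if __name__ == "__main__":
    for u in (-1.5, 0.0, 0.5, 2.0):
        print(u, v(u), ode_residual(u))
```

The residuals are at the level of the discretization error, and the values of $v$ remain of order one on both half-lines, in line with the statement of Lemma A.2; the sketch makes no claim about the size of the constant $r$.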

References

[1] P. Bertail, S. Clémençon, Sharp bounds for the tails of functionals of Markov chains. Theor. Veroyatnost. i Primenen. N3 (2009) 609-619.

[2] S. Boucheron, G. Lugosi, P. Massart, Concentration inequalities using the entropy method. The Ann. Probab. (2003) 1583-1614.

[3] D.A. Cattiaux, A. Guillin, Deviation bounds for additive functionals of Markov process. ESAIM PS, Vol 12 (2008) p. 12-29.

[4] J. Dedecker, P. Doukhan, A new covariance inequality and applications. Stochastic Proc. Appl. (2003) 63-80.

[5] J. Dedecker, C. Prieur, New dependence coefficients. Examples and applications to statistics. Probab. Theory and Related Fields (2005) 203-236.

[6] L. Galtchouk, S. Pergamenshchikov, Sequential nonparametric adaptive estimation of the drift coefficient in diffusion processes. Mathematical Methods of Statistics (2001) 316-330.

[7] L. Galtchouk, S. Pergamenshchikov, Asymptotically efficient sequential kernel estimates of the drift coefficient in ergodic diffusion processes. Statistical Inference for Stochastic Processes (2006) 1-16.

[8] L. Galtchouk, S. Pergamenshchikov, Uniform concentration inequality for ergodic diffusion processes. Stochastic Processes and their Applications (2007) 830-839.

[9] L. Galtchouk, S. Pergamenshchikov, Adaptive sequential estimation for ergodic diffusion processes in quadratic metric. Journal of Nonparametric Statistics (2) (2011) 255-285.

[10] L. Galtchouk, S. Pergamenshchikov, Geometric ergodicity for families of homogeneous Markov chains. http://hal.archives-ouvertes.fr/hal-00455976/fr (2011)

[11] I.I. Gihman, A.V. Skorohod, Stochastic differential equations. Springer, New York, 1972.

[12] R.Sh. Liptser, A.N. Shiryaev, Statistics of random processes, I. Springer, New York, 1977.

[13] P. Massart, Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. (2000) 245-303.

[14] V. Maume-Deschamps, Concentration inequalities and estimation of conditional probabilities. Université de Bourgogne, Décembre 2004, Prépublication n. 396.

[15] S. Meyn, R. Tweedie, Markov Chains and Stochastic Stability. Springer Verlag, 1993.

[16] S.M. Pergamenshchikov, On large deviation probabilities in ergodic theorem for singularly perturbed stochastic systems. Weierstrass Institut für Angewandte Analysis und Stochastik, Berlin, 1998, Preprint n. 414.

[17] E. Rio, Théorie asymptotique des processus faiblement dépendants. In Collection: Mathématiques & Applications, Springer, Berlin, 2000.

[18] A.Yu. Veretennikov, On large deviations for diffusion processes with measurable coefficients. Uspekhi Mat. Nauk, 50