Optimal convergence rates for the invariant density estimation of jump-diffusion processes
Chiara Amorino∗ and Eulalia Nualart†

January 27, 2021

∗ Université du Luxembourg, L-4364 Esch-sur-Alzette, Luxembourg. CA gratefully acknowledges financial support of ERC Consolidator Grant 815703 "STAMFORD: Statistical Methods for High Dimensional Diffusions".
† Universitat Pompeu Fabra and Barcelona Graduate School of Economics, Department of Economics and Business, Ramón Trias Fargas 25-27, 08005 Barcelona, Spain. EN acknowledges support from the Spanish MINECO grant PGC2018-101643-B-I00 and Ayudas Fundación BBVA a Equipos de Investigación Científica 2017.
Abstract
We aim at estimating the invariant density associated to a stochastic differential equation with jumps in low dimension, that is, for $d = 1$ and $d = 2$. We consider a class of jump diffusion processes whose invariant density belongs to some Hölder space. Firstly, in dimension one, we show that the kernel density estimator achieves the convergence rate $\frac{1}{T}$, which is the optimal rate in the absence of jumps. This improves the convergence rate obtained in [2], which depends on the Blumenthal–Getoor index for $d = 1$ and is equal to $\frac{\log T}{T}$ for $d = 2$. Secondly, we show that it is not possible to find an estimator with a faster rate of estimation. Indeed, we obtain lower bounds with the same rates $\{\frac{1}{T}, \frac{\log T}{T}\}$ in the mono- and bi-dimensional cases, respectively. Finally, we obtain the asymptotic normality of the estimator in the one-dimensional case.
Keywords: Minimax risk, convergence rate, non-parametric statistics, ergodic diffusion with jumps, Lévy driven SDE, invariant density estimation
1 Introduction

Solutions to Lévy-driven stochastic differential equations have recently attracted a lot of attention in the literature due to their many applications in various areas such as finance, physics, and neuroscience. Indeed, they include some important examples from finance such as the well-known Kou model in [29], the Barndorff-Nielsen–Shephard model ([8]), and the Merton model ([32]), to name just a few. An important example of application of jump processes in neuroscience is the stochastic Morris–Lecar neuron model presented in [25]. As a consequence, statistical inference for jump processes has recently become an active domain of research.
We consider the process $(X_t)_{t\ge 0}$ solution to the following stochastic differential equation with jumps:
$$X_t = X_0 + \int_0^t b(X_s)\,ds + \int_0^t a(X_s)\,dB_s + \int_0^t\int_{\mathbb{R}^d_0}\gamma(X_{s^-})\,z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \qquad (1)$$
where $(B_t)_{t\ge 0}$ is a $d$-dimensional Brownian motion and $\nu$ is a Poisson random measure on $\mathbb{R}_+\times\mathbb{R}^d$ associated to a Lévy process $(L_t)_{t\ge 0}$ with Lévy density function $F$. We focus on the estimation of the invariant density $\mu$ associated to the jump process solution to (1) in low dimension, that is, for $d = 1$ and $d = 2$. In particular, assuming that a continuous record of $(X_t)_{t\in[0,T]}$ is available, our goal is to propose a non-parametric kernel estimator of the stationary measure and to discuss its convergence rate for large $T$.

The same framework has been considered in some recent papers such as [2], [23] (Section 5.2), and [3]. In the first paper, it is shown that the kernel estimator achieves the following convergence rates for the pointwise estimation of the invariant density: $\frac{\log T}{T}$ for $d = 2$ and $\frac{(\log T)^{(2-\frac{2}{1+\alpha})\vee 1}}{T}$ for $d = 1$, where $\alpha$ is the Blumenthal–Getoor index. We recall that, in the absence of jumps, the optimal convergence rate in the one-dimensional case is $\frac{1}{T}$, while the one found in [2] depends on the jumps and belongs to the interval $\big(\frac{\log T}{T}, \frac{(\log T)^{4/3}}{T}\big)$.

In this paper, we wonder whether such a deterioration of the rate is due to the presence of jumps or to the approach that was used. Indeed, our purpose is to look for a new approach that recovers a better convergence rate in the one-dimensional case (hopefully the same as in the continuous case) and to discuss the optimality of such a rate. This new approach will also lead to the asymptotic normality of the proposed estimator. After that, we will discuss the optimality of the convergence rate in the bi-dimensional case. This will close the circle of the analysis of the convergence rates for the estimation of the invariant density of jump-diffusions, as the convergence rates and their optimality in the case $d \ge 3$ have already been established in [2] and [3].

In the classical framework of non-parametric density estimation from continuous-time observations, assuming that $\mu$ belongs to some Hölder class whose smoothness is $\beta$, it is proved under some mixing conditions that the pointwise $L^2$ risk achieves the standard rate of convergence $T^{-\frac{2\beta}{2\beta+1}}$, and that this rate is minimax in that framework. Castellana and Leadbetter proved in [15] that, under the following condition CL, the density can be estimated with the parametric rate $\frac{1}{T}$ by some non-parametric estimators (the kernel ones among them).

CL:
$u \mapsto \|g_u\|_\infty$ is integrable on $(0,\infty)$ and $g_u(\cdot,\cdot)$ is continuous for each $u > 0$, where $g_u(x,y) = \mu(x)p_u(x,y) - \mu(x)\mu(y)$ and $p_u(x,y)$ is the transition density.

More precisely, they shed light on the fact that local irregularities of the sample paths provide some additional information. Indeed, if the joint distribution of $(X_0, X_t)$ is not too close to a singular distribution for $|t|$ small, then it is possible to achieve the superoptimal rate $\frac{1}{T}$ for the pointwise quadratic risk of the kernel estimator. Condition CL can be verified for ergodic continuous diffusion processes (see [38] for sufficient conditions). The paper of Castellana and Leadbetter led to a lot of works regarding the estimation of the common marginal distribution of a continuous-time process; several related results and examples can be found in [9], [10], [14], [21], and [7].

An alternative to the kernel density estimator is given by the local time density estimator, which was proposed by Kutoyants in [22] in the case of diffusion processes and was extended by Bosq and Davydov in [12] to a more general context. The latter proved that, under a condition which is mildly weaker than CL, the mean squared error of the local time estimator reaches the full rate $\frac{1}{T}$. Leblanc built in [31] a wavelet estimator of a density belonging to some general Besov space and proved that, if the process is geometrically strong mixing and a condition like CL is satisfied, then its $L^p$-integrated risk converges at rate $\frac{1}{T}$ as well. In [18] the authors built a projection estimator and showed that its $L^2$-integrated risk achieves the parametric rate $\frac{1}{T}$ under a condition named WCL, which is mildly different from CL.

WCL:
There exists a positive integrable function $k$ (defined on $\mathbb{R}$) such that
$$\sup_{y\in\mathbb{R}}\int_0^\infty g_u(x,y)\,du \le k(x), \qquad \text{for all } x\in\mathbb{R}.$$

In this paper, we will show that our mono-dimensional jump process satisfies a local irregularity condition
WCL1 and an asymptotic independence condition
WCL2 (see Proposition 1), two conditions in which the original condition
WCL can be decomposed. In this way, it will be possible to show that the $L^2$ risk for the pointwise estimation of the invariant measure achieves the superoptimal rate $\frac{1}{T}$, using our kernel density estimator. Moreover, the same conditions will lead to the asymptotic normality of the proposed estimator. Indeed, as we will see in the proof of Theorem 2, the main challenge in this part is to justify the use of the dominated convergence theorem, which will be ensured by conditions WCL1 and
WCL2. We will find in particular that, for any collection $(x_i)_{1\le i\le m}$ of real numbers, we have
$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mu(x_i),\ 1\le i\le m\big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}^{(m)}(0,\Sigma^{(m)}) \qquad \text{as } T\to\infty,$$
where $\hat\mu_{h,T}$ is the kernel density estimator and
$$\Sigma^{(m)} := \big(\sigma(x_i,x_j)\big)_{1\le i,j\le m}, \qquad \sigma(x_i,x_j) := 2\int_0^\infty g_u(x_i,x_j)\,du.$$
We remark that the precise form of the equation above allows us to construct tests and confidence sets for the density.

We have found the convergence rates $\{\frac{1}{T}, \frac{\log T}{T}\}$ for the risk associated to our kernel density estimator for the estimation of the invariant density for $d = 1$ and $d = 2$. Then, some questions naturally arise: are these convergence rates the best possible, or is it possible to improve them by using other estimators? In order to answer, we consider a simpler model where both the volatility and the jump coefficient are constant and the intensity of the jumps is finite. Then, we look for a lower bound on the risk at a point $x\in\mathbb{R}^d$, defined as in equation (9) below. The first idea is to use the two hypotheses method (see Section 2.3 in [37]). To do that, the knowledge of the link between the drift $b$ and the invariant density $\mu_b$ is essential. If in the absence of jumps such a link is explicit, in our context it is more challenging. As shown in [19] and [3], it is possible to find the link knowing that the invariant measure has to satisfy $A^*\mu_b = 0$, where $A^*$ is the adjoint of the generator of the considered diffusion. This method allows us to show that the superoptimal rate $\frac{1}{T}$ is the best possible for the estimation of the invariant density in $d = 1$, but it fails in the bi-dimensional case (see Remark 1 below for details). Finally, we use a finite number of hypotheses to prove a lower bound in the bi-dimensional case. This requires a detailed analysis of the Kullback divergence between the probability laws associated to the different hypotheses. Thanks to that, it is possible to recover the optimal rate $\frac{\log T}{T}$ in the two-dimensional case.

The paper is organised as follows. In Section 2 we give the assumptions on our model and we provide our main results. Section 3 is devoted to stating and proving some preliminary results needed for the proofs of the main results. To conclude, in Section 4 we give the proofs of Theorems 1, 2, 3, and 4, where our main results are gathered.

2 Model assumptions and main results

We consider the following stochastic differential equation with jumps
$$X_t = X_0 + \int_0^t b(X_s)\,ds + \int_0^t a(X_s)\,dB_s + \int_0^t\int_{\mathbb{R}^d_0}\gamma(X_{s^-})\,z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \qquad (2)$$
where $t\ge 0$, $d\in\{1,2\}$, $\mathbb{R}^d_0 = \mathbb{R}^d\setminus\{0\}$, the initial condition $X_0$ is an $\mathbb{R}^d$-valued random variable, the coefficients $b:\mathbb{R}^d\to\mathbb{R}^d$, $a:\mathbb{R}^d\to\mathbb{R}^d\otimes\mathbb{R}^d$ and $\gamma:\mathbb{R}^d\to\mathbb{R}^d\otimes\mathbb{R}^d$ are measurable functions, $(B_t)_{t\ge0}$ is a $d$-dimensional Brownian motion, and $\nu$ is a Poisson random measure on $\mathbb{R}_+\times\mathbb{R}^d$ associated to a Lévy process $(L_t)_{t\ge0}$ with Lévy density function $F$. All sources of randomness are mutually independent.

We consider the following assumptions on the coefficients and on the Lévy density $F$:

A1: The functions $b(x)$, $\gamma(x)$ and $aa^T(x)$ are globally Lipschitz and bounded. Moreover, $\inf_{x\in\mathbb{R}^d} aa^T(x) \ge c\,\mathrm{Id}$ for some constant $c > 0$, where Id denotes the $d\times d$ identity matrix, and $\inf_{x\in\mathbb{R}^d}\det(\gamma(x)) > 0$.
A2: $\langle x, b(x)\rangle \le -c_1|x| + c_2$ for all $|x|\ge\rho_0$, for some $\rho_0, c_1, c_2 > 0$.

A3: $\mathrm{Supp}(F) = \mathbb{R}^d_0$ and, for all $z\in\mathbb{R}^d_0$, $F(z) \le \frac{c}{|z|^{d+\alpha}}$, for some $\alpha\in(0,2)$ and $c > 0$.

A4: There exist $\epsilon_0 > 0$ and $c > 0$ such that $\int_{\mathbb{R}^d_0}|z|^2 e^{\epsilon_0|z|}F(z)\,dz \le c$.

A5: If $\alpha = 1$, $\int_{r<|z|<R} z\,F(z)\,dz = 0$ for any $0 < r < R < \infty$.

Under these assumptions, the process $X$ admits a unique invariant probability measure $\mu$ and its transition density $p_t(x,y)$ satisfies the following bound (see [2] and [16]): for all $T > 0$, there exist constants $c > 0$ and $\lambda > 0$ such that, for all $t\in(0,T]$ and $x,y\in\mathbb{R}^d$,
$$p_t(x,y) \le c\left(t^{-d/2}\,e^{-\lambda\frac{|y-x|^2}{t}} + \frac{t}{(t^{1/2}+|y-x|)^{d+\alpha}}\right). \qquad (3)$$

We assume that the process is observed continuously, $X = (X_t)_{t\in[0,T]}$, over a time interval $[0,T]$ with $T$ tending to $\infty$. In the paper [2] cited above, the non-parametric estimation of $\mu$ is studied via the kernel estimator, which is defined as follows. We assume that $\mu$ belongs to the Hölder space $\mathcal{H}_d(\beta,\mathcal{L})$, where $\beta = (\beta_1,\dots,\beta_d)$, $\beta_i > 0$, $\mathcal{L} = (L_1,\dots,L_d)$, $L_i > 0$, which means that for all $i\in\{1,\dots,d\}$, $k = 0,1,\dots,\lfloor\beta_i\rfloor$ and $t\in\mathbb{R}$,
$$\big\|D_i^{(k)}\mu\big\|_\infty \le L_i \qquad\text{and}\qquad \big\|D_i^{(\lfloor\beta_i\rfloor)}\mu(\cdot + te_i) - D_i^{(\lfloor\beta_i\rfloor)}\mu(\cdot)\big\|_\infty \le L_i|t|^{\beta_i - \lfloor\beta_i\rfloor},$$
where $D_i^{(k)}$ denotes the $k$-th order partial derivative of $\mu$ with respect to the $i$-th component, $\lfloor\beta_i\rfloor$ is the integer part of $\beta_i$, and $e_1,\dots,e_d$ is the canonical basis of $\mathbb{R}^d$. We set
$$\hat\mu_{h,T}(x) = \frac{1}{T\prod_{i=1}^d h_i}\int_0^T\prod_{i=1}^d K\!\left(\frac{x_i - X_t^i}{h_i}\right) dt =: \frac{1}{T}\int_0^T \mathbb{K}_h(x - X_t)\,dt,$$
where $x = (x_1,\dots,x_d)\in\mathbb{R}^d$, $h = (h_1,\dots,h_d)$ is a bandwidth and $K:\mathbb{R}\to\mathbb{R}$ is a kernel function satisfying
$$\int_{\mathbb{R}} K(x)\,dx = 1,\qquad \|K\|_\infty < \infty,\qquad \mathrm{supp}(K)\subset[-1,1],\qquad \int_{\mathbb{R}} K(x)\,x^i\,dx = 0$$
for all $i\in\{1,\dots,M\}$ with $M \ge \max_i\beta_i$.

We first consider equation (2) with $d = 1$ and show that the kernel estimator reaches the optimal rate $T^{-1}$, as for the stochastic differential equation (2) without jumps. For this, we need the following additional assumption on $F$.

A6: $F$ belongs to $C^1(\mathbb{R}_0)$ and, for all $z\in\mathbb{R}_0$, $|F'(z)| \le \frac{c}{|z|^{2+\alpha}}$, for some $c > 0$.

Theorem 1. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold and $\mu\in\mathcal{H}_1(\beta,L)$. Then there exists a constant $c > 0$, independent of $T$ and $h$, such that for all $x\in\mathbb{R}$,
$$\mathbb{E}\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le c\Big(h^{2\beta} + \frac{1}{T}\Big). \qquad (4)$$
In particular, choosing $h(T) = T^{-a}$ with $a > \frac{1}{2\beta}$, we conclude that $\mathbb{E}\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le \frac{c}{T}$.

Theorem 1 improves the upper bound obtained in [2], which was of the form $\frac{(\log T)^{(2-\frac{2}{1+\alpha})\vee 1}}{T}$. As in that paper, we will use the bias-variance decomposition (see [17, Proposition 1])
$$\mathbb{E}\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le c\left(h^{2\beta} + T^{-2}\,\mathrm{Var}\Big(\int_0^T \mathbb{K}_h(x - X_t)\,dt\Big)\right). \qquad (5)$$
Then, in [2], bounds on the transition semigroup and on the transition density (see (3) above) give an upper bound for the variance depending on the bandwidth. Here, we use the same approach as in [15] and [18] to obtain a bandwidth-free rate for the variance of smoothed density estimators (which include the kernel estimator). For Markov diffusions, the sufficient conditions can be decomposed into a local irregularity condition WCL1 plus an asymptotic independence condition WCL2:
$$\textbf{WCL1:}\quad \int_{\mathbb{R}}\int_0^2\sup_{y\in\mathbb{R}}|g_u(x,y)|\,du\,dx < \infty, \qquad \textbf{WCL2:}\quad \int_{\mathbb{R}}\int_2^\infty\sup_{y\in\mathbb{R}}|g_u(x,y)|\,du\,dx < \infty,$$
where $g_u(x,y) := \mu(x)p_u(x,y) - \mu(x)\mu(y)$.
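For concreteness, the following is a minimal numerical sketch of the estimator $\hat\mu_{h,T}$ defined above; it is an illustration only and not part of the theoretical analysis. The simulated model is an assumption made for the sketch (a one-dimensional equation of type (2) with $b(x) = -x$, $a \equiv 1$, $\gamma \equiv 1$ and centred compound-Poisson jumps, i.e. a finite-activity simplification of A3, discretised by an Euler scheme), and the continuous-time integral in $\hat\mu_{h,T}$ is replaced by a Riemann sum over the simulation grid. The Epanechnikov kernel used here satisfies the normalisation, boundedness and support constraints listed above; its symmetry gives the vanishing first moment, while higher-order kernels would be needed for larger $\beta$.

```python
# Illustration only: kernel estimator of the invariant density from a
# discretised trajectory.  Model and parameters below are assumptions made
# for this sketch, not quantities taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def simulate_path(T=1_000.0, dt=1e-2, jump_rate=1.0, jump_scale=0.5):
    """Euler scheme for dX = -X dt + dB + compound-Poisson jumps (centred,
    so the compensator drift in equation (2) vanishes)."""
    n = int(T / dt)
    X = np.empty(n + 1)
    X[0] = 0.0
    for k in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))
        n_jumps = rng.poisson(jump_rate * dt)
        dJ = rng.normal(0.0, jump_scale, size=n_jumps).sum()
        X[k + 1] = X[k] - X[k] * dt + dB + dJ
    return X, dt

def kernel_K(u):
    """Epanechnikov kernel: integrates to 1, bounded, supported on [-1, 1]."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def mu_hat(x_grid, X, dt, h):
    """Riemann-sum approximation of (1/(T h)) * int_0^T K((x - X_t)/h) dt."""
    T = dt * (len(X) - 1)
    out = np.empty_like(x_grid)
    for i, x in enumerate(x_grid):
        out[i] = kernel_K((x - X) / h).sum() * dt / (T * h)
    return out

X, dt = simulate_path()
x_grid = np.linspace(-3.0, 3.0, 61)
est = mu_hat(x_grid, X, dt, h=0.3)
dx = x_grid[1] - x_grid[0]
print("estimated invariant density at 0:", est[len(x_grid) // 2])
print("total mass on the grid (should be close to 1):", est.sum() * dx)
```

In practice the bandwidth would be chosen as in Theorem 1 below, $h(T) = T^{-a}$ with $a > \frac{1}{2\beta}$; the value $h = 0.3$ above is purely illustrative.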
In order to show conditions WCL1 and WCL2, an upper bound on the second derivative of the transition density $p_t(x,y)$ is obtained (see Lemma 1 below), for which the additional condition A6 is needed.

As shown in [13], conditions WCL1 and WCL2 are also useful to show the asymptotic normality of the kernel density estimator, as proved in the next theorem.

Theorem 2. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold and $\mu\in\mathcal{H}_1(\beta,L)$. Then, for any collection $(x_i)_{1\le i\le m}$ of distinct real numbers,
$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mathbb{E}[\hat\mu_{h,T}(x_i)],\ 1\le i\le m\big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}^{(m)}(0,\Sigma^{(m)}) \qquad \text{as } T\to\infty, \qquad (6)$$
where $\Sigma^{(m)} := (\sigma(x_i,x_j))_{1\le i,j\le m}$ and $\sigma(x_i,x_j) := 2\int_0^\infty g_u(x_i,x_j)\,du$. Furthermore,
$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mu(x_i),\ 1\le i\le m\big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}^{(m)}(0,\Sigma^{(m)}) \qquad \text{as } T\to\infty. \qquad (7)$$

We are also interested in obtaining lower bounds in dimension $d\in\{1,2\}$. For this, we consider the following particular case of equation (2):
$$X_t = X_0 + \int_0^t b(X_s)\,ds + aB_t + \int_0^t\int_{\mathbb{R}^d_0}\gamma z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \qquad (8)$$
where $a$ and $\gamma$ are $d\times d$ invertible matrices and $b$ is a Lipschitz and bounded function satisfying Assumption A2. We assume that $F$ satisfies Assumptions A3–A5 and $\int_{\mathbb{R}^d_0}F(z)\,dz < \infty$. Then, the unique solution to equation (8) admits a unique invariant measure $\pi_b$, which we assume has a density $\mu_b$ with respect to the Lebesgue measure. We denote by $\mathbb{P}_b^{(T)}$ and $\mathbb{E}_b^{(T)}$ the law and the expectation of the solution $(X_t)_{t\in[0,T]}$.

We say that a bounded and Lipschitz function $b$ belongs to $\Sigma(\beta,\mathcal{L})$ if the unique invariant density $\mu_b$ belongs to $\mathcal{H}_d(\beta,\mathcal{L})$, with $\beta_i > 0$ and $L_i > 0$. We define the minimax risk at a point $x\in\mathbb{R}^d$ by
$$\mathcal{R}_T^x(\beta,\mathcal{L}) := \inf_{\tilde\mu_T}\mathcal{R}(\tilde\mu_T(x)) := \inf_{\tilde\mu_T}\sup_{b\in\Sigma(\beta,\mathcal{L})}\mathbb{E}_b^{(T)}\big[(\tilde\mu_T(x) - \mu_b(x))^2\big], \qquad (9)$$
where the infimum is taken over all possible estimators of the invariant density. The following lower bounds hold true.

Theorem 3. Let $X$ be the solution to (8) on $[0,T]$ with $d = 1$. We assume that $a$ and $\gamma$ are non-zero constants. There exist $T_0 > 0$ and $c > 0$ such that, for all $T \ge T_0$,
$$\inf_{x\in\mathbb{R}}\mathcal{R}_T^x(\beta,L) \ge \frac{c}{T}.$$

Theorem 4. Let $X$ be the solution to (8) on $[0,T]$ with $d = 2$. Assume that for all $i\in\{1,2\}$ and $j\neq i$,
$$\big|(aa^T)_{ij}\,(aa^T)_{jj}^{-1}\big| \le \frac12. \qquad (10)$$
There exist $T_0 > 0$ and $c > 0$ such that, for $T \ge T_0$,
$$\inf_{\tilde\mu_T}\sup_{b\in\Sigma(\beta,\mathcal{L})}\mathbb{E}_b^{(T)}\Big[\sup_{x\in\mathbb{R}^2}(\tilde\mu_T(x) - \mu_b(x))^2\Big] \ge c\,\frac{\log T}{T}.$$

Comparing these lower bounds with the upper bound of Theorem 1 for the case $d = 1$ and with Proposition 4 in [2] for the two-dimensional case, we conclude that the convergence rates $\{\frac{1}{T},\frac{\log T}{T}\}$ are the best possible for the kernel estimator of the invariant density in dimension $d\in\{1,2\}$.

The proof of Theorem 3 follows along the same lines as that of Theorem 2 in [3], where a lower bound for the estimation of the invariant density of the solution to (8) is obtained. Note that Theorem 4 provides a lower bound with the supremum over $x$ inside the expectation, while the method in [3] provides an $\inf_x$ outside the expectation.

3 Preliminary results

The proofs of Theorems 1 and 2 will use the following upper bound on the second partial derivative of the transition density.

Lemma 1. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold. For all $T > 0$, there exist two constants $\lambda > 0$ and $c > 0$ such that, for any $x,y\in\mathbb{R}$ and $t\in(0,T]$,
$$\Big|\frac{\partial^2}{\partial x^2}p_t(x,y)\Big| \le c\left(t^{-3/2}\,e^{-\lambda\frac{|y-x|^2}{t}} + \frac{1}{(t^{1/2}+|x-y|)^{1+\alpha}}\right).$$

Proof. We apply the estimate in Theorem 3.5(v) of [16]. We remark that, in [16], the authors assumed $d \ge 2$.
After inspection of the proof, it is possible to see that the result can be extended to the case $d = 1$: it was stated for $d \ge 2$ under assumptions which correspond to our A1–A5, together with the following additional condition: there exist $c > 0$ and $\delta\in(0,1)$ such that for all $x,y,z\in\mathbb{R}$,
$$|b(x) - b(y)| + |k(x,z) - k(y,z)| \le c\,|x-y|^\delta, \qquad (11)$$
where $k(x,z) = \frac{|z|^{1+\alpha}}{\gamma(x)}\,F\big(\frac{z}{\gamma(x)}\big)$. Thus, we only need to show (11). As $b$ is bounded and Lipschitz, it satisfies (11). In fact, when $x$ and $y$ are such that $|x-y| > 1$, thanks to the boundedness of $b$ we have, for each $\delta\in(0,1)$,
$$|b(x) - b(y)| \le |b(x)| + |b(y)| \le c \le c\,|x-y|^\delta.$$
Instead, when $x$ and $y$ are such that $|x-y| \le 1$, the Lipschitz continuity gives
$$|b(x) - b(y)| \le L|x-y| = L|x-y|^{1-\delta}|x-y|^\delta \le L|x-y|^\delta.$$
Concerning $k$, we write
$$|k(x,z) - k(y,z)| = |z|^{1+\alpha}\left|\frac{1}{\gamma(x)}F\Big(\frac{z}{\gamma(x)}\Big) - \frac{1}{\gamma(y)}F\Big(\frac{z}{\gamma(y)}\Big)\right| \le \frac{|z|^{1+\alpha}}{|\gamma(x)|}\left|F\Big(\frac{z}{\gamma(x)}\Big) - F\Big(\frac{z}{\gamma(y)}\Big)\right| + |z|^{1+\alpha}\Big|F\Big(\frac{z}{\gamma(y)}\Big)\Big|\left|\frac{1}{\gamma(x)} - \frac{1}{\gamma(y)}\right|. \qquad (12)$$
From the intermediate value theorem, there exists $\tilde z\in[\frac{z}{\gamma(x)},\frac{z}{\gamma(y)}]$ (assuming w.l.o.g. that $\gamma(x) > \gamma(y)$, otherwise $\tilde z\in[\frac{z}{\gamma(y)},\frac{z}{\gamma(x)}]$) such that the first term on the r.h.s. above is bounded by
$$\frac{|z|^{1+\alpha}}{|\gamma(x)|}\,|F'(\tilde z)|\left|\frac{z}{\gamma(x)} - \frac{z}{\gamma(y)}\right| \le \frac{|z|^{1+\alpha}}{|\gamma(x)|}\,\frac{c}{|\tilde z|^{2+\alpha}}\,\frac{|z|}{|\gamma(x)\gamma(y)|}\,|\gamma(y) - \gamma(x)| \le c\,\frac{\gamma_{\max}^{2+\alpha}}{\gamma_{\min}^{3}}\,|\gamma(y) - \gamma(x)|,$$
where we have used A6 in the first inequality and we have set $\gamma_{\max} := \sup_x|\gamma(x)|$ and $\gamma_{\min} := \inf_x|\gamma(x)|$. Moreover, by A3, the second term on the r.h.s. of (12) is bounded by
$$|z|^{1+\alpha}\,\frac{c\,|\gamma(y)|^{1+\alpha}}{|z|^{1+\alpha}}\,\frac{|\gamma(y) - \gamma(x)|}{|\gamma(x)\gamma(y)|} \le c\,\frac{\gamma_{\max}^{1+\alpha}}{\gamma_{\min}^{2}}\,|\gamma(y) - \gamma(x)|.$$
It follows that
$$|k(x,z) - k(y,z)| \le c\left(\frac{\gamma_{\max}^{2+\alpha}}{\gamma_{\min}^{3}} + \frac{\gamma_{\max}^{1+\alpha}}{\gamma_{\min}^{2}}\right)|\gamma(y) - \gamma(x)|.$$
Finally, as $\gamma$ is Lipschitz and bounded, we conclude that (11) holds. This concludes the proof of the lemma.

The key point of the proof of Theorem 1 consists in showing that conditions WCL1 and WCL2 hold true, which is proved in the next proposition.

Proposition 1. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold. Then, conditions WCL1 and WCL2 are satisfied.

Proof. We start by considering WCL1. The density estimate (3) yields
$$p_t(x,y) \le c\,t^{-1/2} + \tilde c\,t^{\frac{1-\alpha}{2}} \le \bar c\,t^{-1/2}, \qquad 0 < t \le 2, \qquad (13)$$
which, combined with $\sup_{y\in\mathbb{R}}\mu(y) < \infty$, gives WCL1. In order to show WCL2, we set $\varphi(\lambda) := \mathbb{E}[\exp(i\lambda X_0)]$ and $\varphi_x(\lambda,t) := \mathbb{E}[\exp(i\lambda X_t)\,|\,X_0 = x]$, and we claim that there exists $c > 0$ such that
$$|\varphi(\lambda)| \le \frac{c}{1+\lambda^2}. \qquad (14)$$
Moreover, for all $t \ge 2$, there exists $c > 0$ such that, for all $x\in\mathbb{R}$,
$$|\varphi_x(\lambda,t)| \le \frac{c}{1+\lambda^2}. \qquad (15)$$
Recall from Lemma 2 in [2] that the process $X$ is exponentially $\beta$-mixing, which implies that $\beta_X(u) \le ce^{-\gamma u}$, where $\beta_X(u)$ is the $\beta$-mixing coefficient defined in Section 1.3.2 of [26]. It follows that, for any $p > 0$, $\int_0^\infty \beta_X^{\,p}(u)\,du < \infty$. Thus, by Proposition 10 of [18], inequalities (14) and (15) and the integrability of the $\beta$-mixing coefficient imply WCL2. Therefore, we are left to show (14) and (15). We start by showing (15).
Integrating by parts twice, using Lemma 1 (whose constant we denote by $\lambda$ below) and the Chapman–Kolmogorov identity $p_t(x,y) = \int_{\mathbb{R}}p_{t-1}(x,z)p_1(z,y)\,dz$, we obtain
$$|\varphi_x(\lambda,t)| = \left|\int_{\mathbb{R}}e^{i\lambda y}p_t(x,y)\,dy\right| = |\lambda|^{-2}\left|\int_{\mathbb{R}}e^{i\lambda y}\,\frac{\partial^2}{\partial y^2}\Big(\int_{\mathbb{R}}p_{t-1}(x,z)p_1(z,y)\,dz\Big)\,dy\right|$$
$$\le |\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}}p_{t-1}(x,z)\left|\frac{\partial^2}{\partial y^2}p_1(z,y)\right|dz\,dy \le c\,|\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}}p_{t-1}(x,z)\left(e^{-\lambda|z-y|^2} + \frac{1}{(1+|z-y|)^{1+\alpha}}\right)dy\,dz.$$
As $\alpha\in(0,2)$, $\int_{\mathbb{R}}\big(e^{-\lambda|z-y|^2} + (1+|z-y|)^{-(1+\alpha)}\big)\,dy$ is finite. Since $\int_{\mathbb{R}}p_{t-1}(x,z)\,dz = 1$, we get $|\varphi_x(\lambda,t)| \le c\,|\lambda|^{-2}$, which, together with the trivial bound $|\varphi_x(\lambda,t)| \le 1$, proves (15). Similarly, using the invariance of $\mu$, i.e. $\mu(y) = \int_{\mathbb{R}}\mu(z)p_1(z,y)\,dz$,
$$|\varphi(\lambda)| = \left|\int_{\mathbb{R}}e^{i\lambda y}\mu(y)\,dy\right| = |\lambda|^{-2}\left|\int_{\mathbb{R}}e^{i\lambda y}\,\frac{\partial^2}{\partial y^2}\Big(\int_{\mathbb{R}}\mu(z)p_1(z,y)\,dz\Big)\,dy\right| \le |\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}}\mu(z)\left|\frac{\partial^2}{\partial y^2}p_1(z,y)\right|dz\,dy \le c\,|\lambda|^{-2},$$
which gives (14). The proof of the proposition is now completed.

Theorem 2 is an application of the following central limit theorem for discrete stationary sequences. Let $Y_n = (Y_{n,i},\ i\in\mathbb{Z})$, $n\ge 1$, be a sequence of $\mathbb{R}^m$-valued random processes. We define the $\alpha$-mixing coefficient of $Y_n$ by
$$\alpha_{n,k} := \sup_{A\in\sigma(Y_{n,i},\,i\le 0),\ B\in\sigma(Y_{n,i},\,i\ge k)}\big|\mathbb{P}(A\cap B) - \mathbb{P}(A)\mathbb{P}(B)\big|$$
and we set $\alpha_k := \sup_{n\ge 1}\alpha_{n,k}$ (see also Section 1 in [26]). We denote by $Y^{(r)}$ the $r$-th component of an $m$-dimensional random vector $Y$.

Theorem 5 (Theorem 1.1 in [13]). Assume that:
(i) $\mathbb{E}[Y_{n,i}^{(r)}] = 0$ and $|Y_{n,i}^{(r)}| \le M_n$ for every $n\ge 1$, $i\ge 1$ and $1\le r\le m$, where $M_n$ is a constant depending only on $n$;
(ii) $\sup_{i\ge 1,\,1\le r\le m}\mathbb{E}[(Y_{n,i}^{(r)})^2] < \infty$;
(iii) for every $1\le r,s\le m$ and every sequence $b_n\to\infty$ such that $b_n\le n$ for every $n\ge 1$,
$$\lim_{n\to\infty}\frac{1}{b_n}\,\mathbb{E}\Big[\sum_{i=1}^{b_n}Y^{(r)}_{n,i}\sum_{j=1}^{b_n}Y^{(s)}_{n,j}\Big] = \sigma_{r,s};$$
(iv) there exists $a\in(1,\infty)$ such that $\sum_{k\ge 1}k\,\alpha_k^{\frac{a-1}{a}} < \infty$;
(v) for some constant $c > 0$ and every $n\ge 1$, $M_n \le c\,n^{f(a)}$, where $f(a) > 0$ is the explicit exponent, depending only on $a$, appearing in [13].
Then
$$\frac{\sum_{i=1}^nY_{n,i}}{\sqrt n}\ \xrightarrow{\ \mathcal{D}\ }\ \mathcal{N}(0,\Sigma) \qquad \text{as } n\to\infty, \qquad \text{where } \Sigma = (\sigma_{r,s})_{1\le r,s\le m}.$$

The proof of Theorem 4 is based on the following Kullback version of the main theorem on lower bounds in [37]; see Lemma C.1 of [36].

Lemma 2. Fix $\beta, L\in(0,\infty)$ and assume that there exist $f_0\in\mathcal{H}(\beta,L)$ and a finite set $J_T$ such that one can find $\{f_j,\ j\in J_T\}\subset\mathcal{H}(\beta,L)$ satisfying
$$\|f_j - f_k\|_\infty \ge \psi_T > 0 \qquad \forall\, j\neq k\in J_T. \qquad (16)$$
Moreover, denoting by $\mathbb{P}_j^{(T)}$ the probability measure associated with $f_j$, assume that, for all $j\in J_T$, $\mathbb{P}_j^{(T)}\ll\mathbb{P}_0^{(T)}$ and
$$\frac{1}{|J_T|}\sum_{j\in J_T}KL\big(\mathbb{P}_j^{(T)},\mathbb{P}_0^{(T)}\big) = \frac{1}{|J_T|}\sum_{j\in J_T}\mathbb{E}_j^{(T)}\Big[\log\Big(\frac{d\mathbb{P}_j^{(T)}}{d\mathbb{P}_0^{(T)}}(X^T)\Big)\Big] \le \gamma\,\log(|J_T|) \qquad (17)$$
for some $\gamma\in(0,\frac18)$. Then, for $q > 0$, we have
$$\inf_{\tilde\mu_T}\sup_{\mu_b\in\mathcal{H}(\beta,L)}\big(\mathbb{E}_b^{(T)}\big[\psi_T^{-q}\,\|\tilde\mu_T - \mu_b\|_\infty^q\big]\big)^{1/q} \ge c(\gamma) > 0,$$
where the infimum is taken over all the possible estimators $\tilde\mu_T$ of $\mu_b$.

4 Proof of the main results

4.1 Proof of Theorem 1

By the symmetry of the covariance operator and the stationarity of the process,
$$T\,\mathrm{Var}(\hat\mu_{h,T}(x)) = \frac{1}{T}\int_0^T\int_0^T\mathrm{Cov}\big(\mathbb{K}_h(x-X_t),\mathbb{K}_h(x-X_s)\big)\,ds\,dt = \frac{2}{T}\int_0^T(T-u)\,\mathrm{Cov}\big(\mathbb{K}_h(x-X_u),\mathbb{K}_h(x-X_0)\big)\,du$$
$$= 2\int_0^T\Big(1-\frac uT\Big)\int_{\mathbb{R}}\int_{\mathbb{R}}\mathbb{K}_h(x-y)\,\mathbb{K}_h(x-z)\,g_u(y,z)\,dy\,dz\,du \le 2\int_{\mathbb{R}}\int_{\mathbb{R}}\big|\mathbb{K}_h(x-y)\,\mathbb{K}_h(x-z)\big|\int_0^\infty|g_u(y,z)|\,du\,dy\,dz \le c,$$
where in the last inequality we have used Proposition 1. Then, from the bias-variance decomposition (5), we obtain (4), which concludes the desired proof.
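Before turning to the proof of Theorem 2, the following sketch illustrates numerically the limiting variance $\sigma(x,x) = 2\int_0^\infty g_u(x,x)\,du$ appearing in Theorem 2. Since $g_u$ is not available in closed form for the jump model (2), the sketch uses, as an assumption, a continuous benchmark with an explicit transition density (a one-dimensional Ornstein–Uhlenbeck process without jumps); the integrable singularity of $g_u(x,x)$ at $u = 0$ is exactly the local irregularity discussed after condition CL.

```python
# Illustration only: numerical evaluation of sigma(x, x) = 2 * int_0^inf g_u(x, x) du
# for a benchmark process assumed for this sketch (OU process dX = -theta X dt + sig dB,
# no jumps), whose transition and invariant densities are Gaussian and explicit.
import numpy as np
from scipy.integrate import quad

theta, sig = 1.0, 1.0
stat_var = sig ** 2 / (2.0 * theta)          # variance of the invariant law

def normal_pdf(y, mean, var):
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mu(x):
    """Invariant density of the OU benchmark."""
    return normal_pdf(x, 0.0, stat_var)

def p_u(u, x, y):
    """OU transition density: Gaussian with mean x e^{-theta u}."""
    mean = x * np.exp(-theta * u)
    var = stat_var * (1.0 - np.exp(-2.0 * theta * u))
    return normal_pdf(y, mean, var)

def g_u(u, x, y):
    return mu(x) * p_u(u, x, y) - mu(x) * mu(y)

def sigma_xx(x):
    # The integrand behaves like u^{-1/2} near 0 (integrable) and decays
    # exponentially at infinity, so the integral is split at u = 1.
    val1, _ = quad(lambda u: g_u(u, x, x), 0.0, 1.0)
    val2, _ = quad(lambda u: g_u(u, x, x), 1.0, np.inf)
    return 2.0 * (val1 + val2)

for x in (0.0, 0.5, 1.0):
    print(f"sigma({x}, {x}) ~= {sigma_xx(x):.4f}")
```

The finiteness of these integrals is the analogue, for the benchmark, of conditions WCL1 and WCL2 established in Proposition 1 for the jump model.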
4.2 Proof of Theorem 2

We aim to apply Theorem 5. First of all, we split the interval $[0,T]$ into $n$ small intervals of length $\Delta_n$ as follows: $[0,T] = \cup_{i=1}^n[t_{i-1},t_i)$, with $t_0 = 0$, $t_n = T$ and, for any $i\in\{1,\dots,n\}$, $t_i - t_{i-1} = \Delta_n$. By construction, it clearly holds that $n\Delta_n = T$.

For each $n\ge 1$ and $1\le r\le m$, we consider the sequence $(Y^{(r)}_{n,i})_{i\ge 1}$ defined as
$$Y^{(r)}_{n,i} := \frac{1}{\sqrt{\Delta_n}}\left(\int_{t_{i-1}}^{t_i}\mathbb{K}_h(x_r - X_u)\,du - \mathbb{E}\Big[\int_{t_{i-1}}^{t_i}\mathbb{K}_h(x_r - X_u)\,du\Big]\right),$$
for $x_r\in\mathbb{R}$. We denote by $Y_{n,i}$ the $\mathbb{R}^m$-valued random vector $Y_{n,i} = (Y^{(1)}_{n,i},\dots,Y^{(m)}_{n,i})$. By construction,
$$\frac{\sum_{i=1}^nY_{n,i}}{\sqrt n} = \sqrt{T}\,\big(\hat\mu_{h,T}(x) - \mathbb{E}[\hat\mu_{h,T}(x)]\big),$$
where $\hat\mu_{h,T}(x) - \mathbb{E}[\hat\mu_{h,T}(x)]$ denotes the vector $(\hat\mu_{h,T}(x_1) - \mathbb{E}[\hat\mu_{h,T}(x_1)],\dots,\hat\mu_{h,T}(x_m) - \mathbb{E}[\hat\mu_{h,T}(x_m)])$.

It is clear that $\mathbb{E}[Y_{n,i}] = 0$ for all $n\ge 1$ and $i\ge 1$. Moreover, for all $i\ge 1$, $1\le r\le m$ and $n\ge 1$,
$$|Y^{(r)}_{n,i}| \le \frac{2}{\sqrt{\Delta_n}}\,\|\mathbb{K}_h\|_\infty\,\Delta_n \le \frac{c\,\sqrt{\Delta_n}}{h(T)} =: M_n.$$
Hence, assumption (i) holds true. Concerning assumption (ii), we remark that, for any $i\ge 1$ and $1\le r\le m$,
$$\mathbb{E}[(Y^{(r)}_{n,i})^2] = \mathrm{Var}\Big(\frac{1}{\sqrt{\Delta_n}}\int_0^{\Delta_n}\mathbb{K}_h(x_r - X_u)\,du\Big) = \mathrm{Var}\big(\sqrt{\Delta_n}\,\hat\mu_{h,\Delta_n}(x_r)\big) = \Delta_n\,\mathrm{Var}(\hat\mu_{h,\Delta_n}(x_r)) \le \Delta_n\,\frac{c}{\Delta_n} = c,$$
where we have used the variance bound obtained in the proof of Theorem 1. Let $b_n$ be a sequence of integers such that $b_n\to\infty$ and $b_n\le n$ for every $n$. For every $1\le r\le m$ and $1\le s\le m$, we have
$$\frac{1}{b_n}\,\mathbb{E}\Big[\sum_{i=1}^{b_n}Y^{(r)}_{n,i}\sum_{j=1}^{b_n}Y^{(s)}_{n,j}\Big] = \frac{1}{b_n\Delta_n}\int_0^{b_n\Delta_n}\int_0^{b_n\Delta_n}\mathrm{Cov}\big(\mathbb{K}_h(x_r - X_u),\mathbb{K}_h(x_s - X_v)\big)\,du\,dv$$
$$= 2\int_0^{b_n\Delta_n}\Big(1-\frac{u}{b_n\Delta_n}\Big)\int_{\mathbb{R}}\int_{\mathbb{R}}\mathbb{K}_h(x_r - z_1)\,\mathbb{K}_h(x_s - z_2)\,g_u(z_1,z_2)\,dz_1\,dz_2\,du$$
$$= 2\int_{\mathbb{R}}\int_{\mathbb{R}}\int_0^{b_n\Delta_n}\Big(1-\frac{u}{b_n\Delta_n}\Big)K(w_1)K(w_2)\,g_u\big(x_r - h(T)w_1,\ x_s - h(T)w_2\big)\,du\,dw_1\,dw_2,$$
where we have used Fubini's theorem and the change of variables $w_1 := \frac{x_r - z_1}{h(T)}$, $w_2 := \frac{x_s - z_2}{h(T)}$. Using dominated convergence and the fact that $h(T)\to 0$ as $T\to\infty$, we obtain
$$\lim_{n\to\infty}\frac{1}{b_n}\,\mathbb{E}\Big[\sum_{i=1}^{b_n}Y^{(r)}_{n,i}\sum_{j=1}^{b_n}Y^{(s)}_{n,j}\Big] = 2\int_{\mathbb{R}}K(w_1)\int_{\mathbb{R}}K(w_2)\int_0^\infty g_u(x_r,x_s)\,du\,dw_2\,dw_1 = 2\int_0^\infty g_u(x_r,x_s)\,du =: \sigma(x_r,x_s),$$
which proves (iii). Remark that the dominated convergence theorem can be used because the estimates in the proof of Proposition 1 yield $\int_0^\infty\|g_u\|_\infty\,du < \infty$. In particular, we have
$$\Big|\Big(1-\frac{u}{b_n\Delta_n}\Big)K(w_1)K(w_2)\,g_u\big(x_r - h(T)w_1, x_s - h(T)w_2\big)\,\mathbb{1}_{[0,b_n\Delta_n]}(u)\,\mathbb{1}_{\mathbb{R}^2}(w_1,w_2)\Big| \le \|g_u\|_\infty\,|K(w_1)K(w_2)| \in L^1(\mathbb{R}_+\times\mathbb{R}^2).$$

We now check (iv). We remark that if a process is $\beta$-mixing, then it is also $\alpha$-mixing and the following estimate holds (see Theorem 3 in Section 1.2.2 of [26]):
$$\alpha_k \le \beta_{Y_n}(k) = \beta_X(k) \le c\,e^{-\gamma k}.$$
Therefore, it suffices to show that there exists $a\in(1,\infty)$ such that
$$c\sum_{k\ge 1}k\,e^{-\frac{k\gamma(a-1)}{a}} < \infty,$$
which is true for any $a > 1$, so (iv) is satisfied.

We are left to show (v). Let $f(a)$ be the exponent appearing in condition (v) of Theorem 5. We want to show that there exist $a > 1$ and $c > 0$ such that, for all $n\ge 1$,
$$\frac{\sqrt{\Delta_n}}{h(T)} \le c\,n^{f(a)}. \qquad (18)$$
For any $\epsilon > 0$, we can choose $h(T) = T^{-(\frac{1}{2\beta}+\epsilon)}$, which still achieves the rate-optimal choice of Theorem 1. Recalling that $T = n\Delta_n$ and replacing it in (18), we get
$$c\,\sqrt{\Delta_n}\,(n\Delta_n)^{\frac{1}{2\beta}+\epsilon} \le n^{f(a)}, \qquad\text{that is,}\qquad \Delta_n^{\frac{\beta+1+2\epsilon\beta}{2\beta}} \le c\,n^{\frac{2\beta f(a)-1-2\epsilon\beta}{2\beta}}.$$
This condition is satisfied if, for some $\epsilon' > 0$,
$$\Delta_n \le c\,n^{\frac{2\beta f(a)-1}{\beta+1}-\epsilon'}.$$
We can always find an $\epsilon' > 0$ and an $a > 1$ such that $f(a) > \frac{1}{2\beta}$, so that $\frac{2\beta f(a)-1}{\beta+1} > 0$. Thus, condition (v) is satisfied. We can then apply Theorem 5, which directly leads us to (6).

We next turn to the proof of (7). In the proof of Theorem 1 we have shown that $|\mathbb{E}[\hat\mu_{h,T}(x_i)] - \mu(x_i)| \le c\,h(T)^\beta$,
where $h(T) = T^{-\bar a}$ with $\bar a > \frac{1}{2\beta}$. Thus,
$$\sqrt{T}\,\big|\mathbb{E}[\hat\mu_{h,T}(x_i)] - \mu(x_i)\big| \le c\,T^{\frac12}\,T^{-\bar a\beta},$$
which converges to zero. This proves (7) and concludes the desired proof.

4.3 Proof of Theorem 3

The proof of Theorem 3 follows the proof of the lower bound for $d \ge 2$ given in [3]. It is divided into three steps.

Step 1. The first step consists in showing that, given a density function $f$, we can always find a drift function $b_f$ such that $f$ is the unique invariant density of equation (8) with drift coefficient $b = b_f$. We give the statement and proof in dimension $d = 1$, as in Propositions 2 and 3 of [3] this is only done for $d \ge 2$.

Proposition 2. Let $f:\mathbb{R}\to\mathbb{R}$ be a $C^2$ positive probability density satisfying the following conditions.
1. $\lim_{y\to\pm\infty}f(y) = 0$ and $\lim_{y\to\pm\infty}f'(y) = 0$.
2. There exists $0 < \epsilon_1 < \frac{\epsilon_0}{|\gamma|}$, where $\epsilon_0$ is as in Assumption A4, such that, for any $y,z\in\mathbb{R}$, $f(y\pm z) \le \hat c_1\,e^{\epsilon_1|z|}f(y)$.
3. For $\epsilon_1 > 0$ as in 2., there exists $\hat c_2(\epsilon_1) > 0$ such that
$$\sup_{y<0}\frac{1}{f(y)}\int_{-\infty}^yf(w)\,dw < \hat c_2 \qquad\text{and}\qquad \sup_{y>0}\frac{1}{f(y)}\int_y^\infty f(w)\,dw < \hat c_2.$$
4. There exist $0 < \tilde\epsilon < \frac{a^2}{\gamma^2\,c\,\hat c_1\,\hat c_2\,\hat c_3}$ and $R_0 > 0$ such that, for any $|y| > R_0$,
$$\frac{f'(y)}{f(y)} \le -\tilde\epsilon\,\mathrm{sgn}(y),$$
where $c$ is as in Assumption A4. Moreover, there exists $\hat c_3$ such that, for any $y\in\mathbb{R}$, $|f'(y)| \le \hat c_3\,f(y)$.
5. For any $y\in\mathbb{R}$ and $\tilde\epsilon$ as in 4., $|f''(y)| \le \hat c_4\,\tilde\epsilon\,f(y)$.
Then there exists a bounded Lipschitz function $b_f$ which satisfies A2 and such that $f$ is the unique invariant density of equation (8) with drift coefficient $b = b_f$.

Proof. Let $A_d$ be the discrete part of the generator of the diffusion process $X$ solution of (8) and let $A_d^*$ be its adjoint. We define $b_f$ as
$$b_f(x) = \begin{cases}\ \ \dfrac{1}{f(x)}\displaystyle\int_{-\infty}^x\Big(\frac{a^2}{2}f''(w) + A_d^*f(w)\Big)\,dw, & \text{if } x < 0,\\[2mm] -\dfrac{1}{f(x)}\displaystyle\int_x^\infty\Big(\frac{a^2}{2}f''(w) + A_d^*f(w)\Big)\,dw, & \text{if } x > 0,\end{cases}$$
where
$$A_d^*f(x) = \int_{\mathbb{R}_0}\big[f(x-\gamma z) - f(x) + \gamma z f'(x)\big]F(z)\,dz.$$
Then, following Proposition 3 in [3], one can check that $b_f$ is bounded, Lipschitz, and satisfies A2. Moreover, if we replace $b$ by $b_f$ in equation (8), then $f$ is the unique invariant density.

Step 2. The second step consists in defining two probability density functions $f_0$ and $f_1$ in $\mathcal{H}_1(\beta,L)$.

We first define $f_0(y) = c_\eta\,f(\eta|y|)$, where $\eta\in(0,1)$, $c_\eta$ is such that $\int f_0 = 1$, and
$$f(x) = \begin{cases}e^{-|x|}, & \text{if } |x|\ge 1,\\ \in[e^{-1},1], & \text{if } \tfrac12 < |x| < 1,\\ 1, & \text{if } |x|\le\tfrac12.\end{cases} \qquad (19)$$
Moreover, $f$ is a $C^2$ function such that, for any $x\in\mathbb{R}$, $\frac12 e^{-|x|} \le f(x) \le 2e^{-|x|}$, $|f'(|x|)| \le 2e^{-|x|}$, and $|f''(|x|)| \le 2e^{-|x|}$.

It is easy to see that $\eta$ can be chosen small enough so that $f_0\in\mathcal{H}_1(\beta,L)$. Moreover, $f_0$ satisfies the assumptions of Proposition 2 with $\hat c_1 = 4$, $\epsilon_1 = \eta$, $\hat c_3 = 4$, $\hat c_4 = 16$, and with $\hat c_2$ and $R_0$ depending only on $\eta$. In order for the condition on $\tilde\epsilon$ in assumption 4. to be satisfied we need $c < \frac{a^2}{\gamma^2}$ (up to the constants above). This means that the jumps have to integrate an exponential function; the bound depends on the coefficients $a$ and $\gamma$, and so it depends only on the model.

Therefore, $b_0 := b_{f_0}$ belongs to $\Sigma(\beta,L)$. Recall that $b$ belongs to $\Sigma(\beta,L)$ if and only if $f$ belongs to $\mathcal{H}_1(\beta,L)$ and $b$ is bounded, Lipschitz and satisfies the drift condition A2.

We next define
$$f_1(x) = f_0(x) + \frac{1}{M_T}\,\hat K\Big(\frac{x - x_0}{H(T)}\Big), \qquad (20)$$
where $x_0\in\mathbb{R}$ is fixed and $\hat K:\mathbb{R}\to\mathbb{R}$ is a $C^\infty$ function with support on $[-1,1]$ satisfying
$$\hat K(0) = 1, \qquad \int_{-1}^1\hat K(z)\,dz = 0.$$
$M_T$ and $H(T)$ will be calibrated later and satisfy $M_T\to\infty$ and $H(T)\to 0$ as $T\to\infty$.

Then it can be shown, as in [3, Lemma 3], that if there exists $\epsilon > 0$ such that, for $T$ sufficiently large,
$$\frac{1}{M_T} \le \epsilon\,H(T)^\beta \qquad\text{and}\qquad \frac{1}{H(T)} = o(M_T) \qquad (21)$$
as $T\to\infty$, then $b_1 := b_{f_1}$ belongs to $\Sigma(\beta,L)$ for $T$ sufficiently large.
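Before moving to Step 3, the following minimal sketch illustrates numerically the construction of Step 2. The bump $\hat K$ below is one concrete admissible choice (an assumption made for the sketch, not the function used in the paper): it is $C^\infty$, supported on $[-1,1]$, with $\hat K(0) = 1$ and $\int\hat K = 0$. Likewise, $f_0$ is replaced by a simple smooth positive density for illustration, whereas the paper uses the specific $C^2$ construction (19); the calibration $M_T = \sqrt{T}$ and $H(T)$ constant is the one chosen at the end of the proof below.

```python
# Illustration only: the two-hypothesis construction f1 = f0 + (1/M_T) K_hat((x - x0)/H).
# K_hat and f0 below are assumptions made for this sketch.
import numpy as np

def psi(z):
    """Standard C^infty bump supported on (-1, 1)."""
    z = np.asarray(z, dtype=float)
    inside = np.abs(z) < 1.0
    safe = np.where(inside, z, 0.0)          # avoid division by zero outside the support
    return np.where(inside, np.exp(-1.0 / (1.0 - safe ** 2)), 0.0)

def K_hat(z):
    """C^infty, supported on [-1,1], K_hat(0) = 1, integral 0: (e/2)(3 psi(3z) - psi(z))."""
    return 0.5 * np.e * (3.0 * psi(3.0 * z) - psi(z))

def f0(x):
    """Placeholder smooth positive density (standard Gaussian), for illustration only."""
    return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

T = 1e4
M_T = np.sqrt(T)            # calibration used at the end of the proof of Theorem 3
H = 1.0                     # H(T) constant, as in the proof of Theorem 3
x0 = 0.0

def f1(x):
    return f0(x) + K_hat((x - x0) / H) / M_T

x = np.linspace(-5.0, 5.0, 200_001)
dx = x[1] - x[0]
print("integral of K_hat (should be ~0):", (K_hat(x) * dx).sum())
print("K_hat(0) (should be 1):", float(K_hat(np.array(0.0))))
print("integral of f1 (should be ~1):", (f1(x) * dx).sum())
print("sup |f1 - f0| (should match 1/M_T):", np.abs(f1(x) - f0(x)).max(), 1.0 / M_T)
```

Because $\int\hat K = 0$, the perturbed function $f_1$ still integrates to one, while its sup-distance from $f_0$ is exactly of order $1/M_T$, which is what drives the lower bound obtained in Step 3.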
Step 3. As $b_0, b_1\in\Sigma(\beta,L)$, we can write
$$\mathcal{R}(\tilde\mu_T(x)) \ge \frac12\,\mathbb{E}_1^{(T)}\big[(\tilde\mu_T(x) - f_1(x))^2\big] + \frac12\,\mathbb{E}_0^{(T)}\big[(\tilde\mu_T(x) - f_0(x))^2\big],$$
where $\mathbb{E}_i^{(T)}$ denotes the expectation with respect to $b_i$. Then, following [3] and using Girsanov's formula, we can show that if
$$\sup_{T\ge T_0}\frac{T}{M_T^2\,H(T)} < \infty, \qquad (22)$$
then, for sufficiently large $T$,
$$\mathcal{R}(\tilde\mu_T(x)) \ge \frac{C_\lambda}{M_T^2}, \qquad (23)$$
where the constant $C_\lambda$ depends only on the constants $C$ and $\lambda$ of Lemma 4 of [3] and does not depend on the point $x$. We finally look for the largest choice of $M_T$ for which both (21) and (22) hold true. It suffices to choose $M_T = \sqrt{T}$ and $H(T)$ a constant to conclude the proof of Theorem 3.

Remark 1. The two-hypothesis method used above does not work to prove the two-dimensional lower bound of Theorem 4. Indeed, proceeding as above, we can define
$$f_1(x) = f_0(x) + \frac{1}{M_T}\,\hat K\Big(\frac{x_1 - x_{0,1}}{H_1(T)}\Big)\,\hat K\Big(\frac{x_2 - x_{0,2}}{H_2(T)}\Big).$$
Then, it is possible to show that (23) still holds and, therefore, we should take $M_T$ such that $M_T^2 = \frac{T}{\log T}$. On the other hand, condition (22) now becomes
$$\sup_{T\ge T_0}\frac{T}{M_T^2}\Big(\frac{H_1(T)}{H_2(T)} + \frac{H_2(T)}{H_1(T)}\Big) < \infty.$$
The optimal choice of the bandwidth is achieved for $H_1(T) = H_2(T)$, which yields $\sup_{T\ge T_0}\frac{T}{M_T^2} < \infty$; this is clearly not satisfied when $M_T^2 = \frac{T}{\log T}$.

4.4 Proof of Theorem 4

We will apply Lemma 2 with $\psi_T := v\sqrt{\frac{\log T}{T}}$, where $v > 0$ will be chosen later.

Step 1. As in the one-dimensional case, the first step consists in showing that, given a density function $f$, we can always find a drift function $b_f$ such that $f$ is the unique invariant density of equation (8) with drift coefficient $b = b_f$; this is proved in Propositions 2 and 3 of [3]. We remark that condition (10) is needed in Proposition 3 to ensure that the terms on the diagonal of the volatility coefficient $a$ dominate the others, which is crucial to get that $b_f$ satisfies the drift condition A2.

Step 2. We next define the probability density $f_0\in\mathcal{H}_2(\beta,\mathcal{L})$, the finite set $J_T$, and the set of probability densities $\{f_j,\ j\in J_T\}\subset\mathcal{H}_2(\beta,\mathcal{L})$ needed in order to apply Lemma 2.

We first define $f_0$ as $\pi_0$ in Section 7.2 of [3], which is the two-dimensional version of the $f_0$ defined in the proof of Theorem 3, that is,
$$f_0(x) = c_\eta\,f\big(\eta\,(aa^T)^{-1}_{11}|x_1|\big)\,f\big(\eta\,(aa^T)^{-1}_{22}|x_2|\big), \qquad x = (x_1,x_2)\in\mathbb{R}^2, \qquad (24)$$
where $f$ is as in (19). The density $f_0$ belongs to $\mathcal{H}_2(\beta,\mathcal{L})$ by construction. We then set
$$J_T := \Big\{1,\dots,\Big\lfloor\frac{1}{\sqrt{H_1}}\Big\rfloor\Big\}\times\Big\{1,\dots,\Big\lfloor\frac{1}{\sqrt{H_2}}\Big\rfloor\Big\}, \qquad (25)$$
where $H_1 := H_1(T)$ and $H_2 := H_2(T)$ are two quantities that converge to $0$ as $T\to\infty$ and need to be calibrated.

Finally, for $j := (j_1,j_2)\in J_T$, we define $x_j := (x_{j,1},x_{j,2}) = (j_1H_1, j_2H_2)$ and we set
$$f_j(x) := f_0(x) + v\,\sqrt{\frac{\log T}{T}}\,\hat K\Big(\frac{x_1 - x_{j,1}}{H_1}\Big)\,\hat K\Big(\frac{x_2 - x_{j,2}}{H_2}\Big),$$
where recall that $v > 0$ and $\hat K$ is as in (20).

Acting as in Lemma 3 of [3], recalling that the rate $\frac{1}{M_T}$ therein is now replaced by $v\sqrt{\frac{\log T}{T}}$ (see also points 1. and 3. in the proof of Proposition 3 below), it is easy to see that if there exists $\epsilon > 0$ such that, for large $T$,
$$v\sqrt{\frac{\log T}{T}} \le \epsilon\,H_1^{\beta_1}, \qquad v\sqrt{\frac{\log T}{T}} \le \epsilon\,H_2^{\beta_2}, \qquad (26)$$
then, for any $j\in J_T$ and large $T$, $b_j\in\Sigma(\beta,\mathcal{L})$; in particular, $f_j\in\mathcal{H}_2(\beta,\mathcal{L})$. Therefore, $\{f_j,\ j\in J_T\}\subset\mathcal{H}_2(\beta,\mathcal{L})$ and, by construction, for $j\neq k$,
$$\|f_j - f_k\|_\infty \ge v\sqrt{\frac{\log T}{T}}\,\hat K(0)^2 = v\sqrt{\frac{\log T}{T}} = \psi_T,$$
which proves the first condition of Lemma 2.

Step 3. We are left to show the remaining conditions of Lemma 2. The absolute continuity $\mathbb{P}_j^{(T)}\ll\mathbb{P}_0^{(T)}$ and the expression for $\frac{d\mathbb{P}_j^{(T)}}{d\mathbb{P}_0^{(T)}}(X^T)$ are both obtained by the Girsanov formula, as in Lemma 4 of [3].
We have
$$KL\big(\mathbb{P}_j^{(T)},\mathbb{P}_0^{(T)}\big) = \mathbb{E}_j^{(T)}\Big[\log\frac{f_j}{f_0}(X_0)\Big] + \frac12\,\mathbb{E}_j^{(T)}\Big[\int_0^T\big|a^{-1}\big(b_0(X_u) - b_j(X_u)\big)\big|^2\,du\Big],$$
where the law of $X^T = (X_t)_{t\in[0,T]}$ under $\mathbb{P}_j^{(T)}$ is the one of the solution to equation (8) with $b = b_j$.

By the definition of the $f_j$'s, it is easy to see that the first term is $o(1)$ as $T\to\infty$. In fact, as $\hat K$ is supported in $[-1,1]$,
$$\mathbb{E}_j^{(T)}\Big[\log\frac{f_j}{f_0}(X_0)\Big] = \int_{\mathbb{R}^2}\log\Big(1 + \frac{v\sqrt{\frac{\log T}{T}}\,\hat K\big(\frac{x_1-x_{j,1}}{H_1}\big)\hat K\big(\frac{x_2-x_{j,2}}{H_2}\big)}{f_0(x)}\Big)\,f_j(x)\,dx \le \Big|\log\Big(1 + c^*\,v\sqrt{\frac{\log T}{T}}\,\|\hat K\|_\infty^2\Big)\Big|,$$
which tends to zero as $T\to\infty$, where $c^* := \frac{4}{c_\eta}e^{4\eta k}$, $c_\eta$ is the normalization constant introduced in the definition of $f_0$, and $k := \max_{i=1,2}(aa^T)^{-1}_{ii}$. In fact, this follows from the definition of $f_0$ in (24): since $f(\cdot) \ge \frac12 e^{-|\cdot|}$, we obtain
$$\frac{1}{f_0(x)} \le \frac{4}{c_\eta}\,e^{\eta(aa^T)^{-1}_{11}|x_1|}\,e^{\eta(aa^T)^{-1}_{22}|x_2|} \le \frac{4}{c_\eta}\,e^{\eta k\,(H_1 + |x_{j,1}| + H_2 + |x_{j,2}|)},$$
where we have also used the fact that, as $\hat K$ is supported in $[-1,1]$, $x\in[x_{j,1}-H_1,x_{j,1}+H_1]\times[x_{j,2}-H_2,x_{j,2}+H_2]$. Finally, by the definition of $x_j$ and the fact that $H_i\to 0$ as $T\to\infty$ for $i=1,2$ (so that for $T$ large enough they are smaller than 1), we get
$$\frac{1}{f_0(x)} \le \frac{4}{c_\eta}\,e^{4\eta k} = c^* \qquad\text{for any } x\in[x_{j,1}-H_1,x_{j,1}+H_1]\times[x_{j,2}-H_2,x_{j,2}+H_2]. \qquad (27)$$

Regarding the second term, using the stationarity of the process $X^T$, we have
$$\mathbb{E}_j^{(T)}\Big[\int_0^T\big|a^{-1}(b_0(X_u) - b_j(X_u))\big|^2\,du\Big] = T\int_{\mathbb{R}^2}\big|a^{-1}(b_0(x) - b_j(x))\big|^2\,f_j(x)\,dx.$$
Since $f_j = f_0$ outside the rectangle around $x_j$ and, by (27), $f_j \le 2f_0$ on it for $T$ large, it suffices to bound the corresponding integral against $f_0$. The following asymptotic bound will be proved at the end of this section.

Proposition 3. For $T$ large enough,
$$\int_{\mathbb{R}^2}\big|a^{-1}(b_0(x) - b_j(x))\big|^2\,f_0(x)\,dx \le c\,(c^*)^2\,k^2\,v^2\,H_1H_2\Big(\frac{1}{H_1^2} + \frac{1}{H_2^2}\Big)\frac{\log T}{T},$$
for a numerical constant $c > 0$.

Taking the optimal choice of the bandwidth in Proposition 3, which is $H_1 = H_2$, we get that
$$\int_{\mathbb{R}^2}\big|a^{-1}(b_0(x) - b_j(x))\big|^2\,f_0(x)\,dx \le 2c\,(c^*)^2\,k^2\,v^2\,\frac{\log T}{T}.$$
In particular, assuming without loss of generality that $\beta_1\le\beta_2$, we choose $H_1 = H_2 = \big(\frac{\log T}{T}\big)^{a}$ with $a \le \frac{1}{2\beta_2}$, so that condition (26) is satisfied. We therefore get
$$KL\big(\mathbb{P}_j^{(T)},\mathbb{P}_0^{(T)}\big) \le 2c\,(c^*)^2\,k^2\,v^2\,\log T \le \frac{4c\,(c^*)^2\,k^2\,v^2}{a}\,\log(|J_T|),$$
the last estimate being a consequence of the fact that, by construction,
$$\log(|J_T|) \ge a\,\log\Big(\frac{T}{\log T}\Big) = a\,\log(T)\,(1+o(1)).$$
It is therefore enough to choose $v$ such that $\frac{4c\,(c^*)^2k^2v^2}{a} < \frac18$ (i.e. $v^2 < \frac{a}{32\,c\,(c^*)^2k^2}$) and apply Lemma 2 to conclude the proof of Theorem 4.

4.5 Proof of Proposition 3

The proof of Proposition 3 follows similarly to that of Proposition 4 of [3]. Indeed, we first define the set
$$K_T^j := [x_{j,1} - H_1,\ x_{j,1} + H_1]\times[x_{j,2} - H_2,\ x_{j,2} + H_2]$$
and then show the following points for $T$ large enough:
1. For any $x\in K_T^{j,c}$ and $i\in\{1,2\}$: $|b_j^i(x) - b_0^i(x)| \le c_1\,v\sqrt{\frac{\log T}{T}}$.
2. For any $i\in\{1,2\}$: $\int_{K_T^{j,c}}|b_j^i(x) - b_0^i(x)|^2\,f_0(x)\,dx \le c_1\,v^2\,\frac{\log T}{T}\,H_1H_2$.
3. For any $x\in K_T^j$ and $i\in\{1,2\}$: $|b_j^i(x) - b_0^i(x)| \le c_1\,c^*\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big)$.
The proof of the first two points follows exactly the one of Proposition 4 of [3], remarking that
$$d_T(x) := \pi_1(x) - \pi_0(x) = \frac{1}{M_T}\prod_{l=1}^dK\Big(\frac{x_l - x_{0,l}}{h_l(T)}\Big)$$
in [3] is now replaced by
$$d_T^j(x) := f_j(x) - f_0(x) = v\sqrt{\frac{\log T}{T}}\,\hat K\Big(\frac{x_1 - x_{j,1}}{H_1}\Big)\,\hat K\Big(\frac{x_2 - x_{j,2}}{H_2}\Big),$$
and that the set $K_T := [x_{0,1} - h_1(T),\ x_{0,1} + h_1(T)]\times\cdots\times[x_{0,d} - h_d(T),\ x_{0,d} + h_d(T)]$ introduced in the $d$-dimensional framework is now replaced by $K_T^j$.
We recall that the kernel $K$ appearing in the definition of $d_T$ in [3] and our $\hat K$ are exactly the same kernel function. The proof of Proposition 4 of [3] is based on the fact that $d_T(x)$ and its derivatives are null for $x\in K_T^c$. In the same way, $d_T^j(x)$ and its derivatives are null for $x\in K_T^{j,c}$. Then, acting as in [3], it is easy to see that the first two points above hold true.

Comparing the third point above with the third point of Proposition 4 of [3], it is clear that our goal is to make explicit the constant $c_1$. Keeping the notation of [3], we first introduce the quantities
$$\tilde I_1^i[f](x) := \frac12\sum_{l=1}^2(aa^T)_{il}\,\frac{\partial f}{\partial x_l}(x), \qquad \tilde I_2^i[f](x) := \int_{-\infty}^{x_i}A^*_{d,i}f(w_i)\,dw,$$
where $w_i$ denotes the vector $x$ with its $i$-th component replaced by $w$, and we set $\tilde I^i[f](x) := \tilde I_1^i[f](x) + \tilde I_2^i[f](x)$. According to the definition of $b_0$ and $b_j$, we have
$$b_0^i(x) = \frac{1}{f_0(x)}\,\tilde I^i[f_0](x), \qquad b_j^i(x) = \frac{1}{f_j(x)}\,\tilde I^i[f_j](x).$$
As $f\mapsto\tilde I^i[f]$ is linear, we deduce that
$$b_j^i(x) = \frac{1}{f_j(x)}\,\tilde I^i[f_j](x) = \frac{1}{f_j(x)}\,\tilde I^i[f_0](x) + \frac{1}{f_j(x)}\,\tilde I^i[d_T^j](x). \qquad (28)$$
Therefore,
$$b_j^i - b_0^i = \Big(\frac{1}{f_j} - \frac{1}{f_0}\Big)\tilde I^i[f_0] + \frac{1}{f_j}\,\tilde I^i[d_T^j] = \frac{f_0 - f_j}{f_jf_0}\,\tilde I^i[f_0] + \frac{1}{f_j}\,\tilde I^i[d_T^j] = -\frac{d_T^j}{f_j}\,b_0^i + \frac{1}{f_j}\,\tilde I^i[d_T^j].$$
We need to evaluate this difference on the compact set $K_T^j$. For this, we will use the fact that $f_j = f_0 + d_T^j$ and obtain a lower bound away from 0. Specifically, from the definition of $d_T^j$, we get
$$\|d_T^j\|_\infty \le v\sqrt{\frac{\log T}{T}}\,\|\hat K\|^2_\infty = v\sqrt{\frac{\log T}{T}}. \qquad (29)$$
In particular,
$$f_j \ge f_0 - |d_T^j| \ge f_0 - v\sqrt{\frac{\log T}{T}} \ge \frac{f_0}{2},$$
since $\sqrt{\frac{\log T}{T}}\to 0$ as $T\to\infty$ and, by (27), $f_0 \ge \frac{1}{c^*}$ on $K_T^j$, so that for $T$ large enough we have $v\sqrt{\frac{\log T}{T}} \le \frac{f_0}{2}$ on $K_T^j$. Then, for any $x\in K_T^j$, using (27) we have
$$\frac{1}{f_j(x)} \le \frac{2}{f_0(x)} \le 2c^*.$$
Moreover, as $b_0$ is bounded, we deduce that, for all $x\in K_T^j$,
$$|b_j^i(x) - b_0^i(x)| \le 2c^*\,v\,\|b_0^i\|_\infty\sqrt{\frac{\log T}{T}} + 2c^*\,\big|\tilde I^i[d_T^j](x)\big|. \qquad (30)$$
We therefore need to evaluate $\tilde I^i[d_T^j](x) = \tilde I_1^i[d_T^j](x) + \tilde I_2^i[d_T^j](x)$ on $K_T^j$. As
$$\Big\|\frac{\partial d_T^j}{\partial x_l}\Big\|_\infty \le \frac{c\,v}{H_l}\sqrt{\frac{\log T}{T}}, \qquad (31)$$
it clearly follows that
$$\big|\tilde I_1^i[d_T^j](x)\big| \le c\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big). \qquad (32)$$
Regarding $\tilde I_2^i[d_T^j](x)$, we can act exactly as in the third point of Proposition 4 of [3]. As $x\in K_T^j$, $x_i\in[x_{j,i} - H_i,\ x_{j,i} + H_i]$ for $i = 1,2$. Therefore, using also the definition of $d_T^j$, the integral in $\tilde I_2^i$ is between $x_{j,i} - H_i$ and $x_i$. We enlarge the domain of integration to $[x_{j,i} - H_i,\ x_{j,i} + H_i]$ and then, appealing to (29) and (31) and to the fact that the intensity of the jumps is finite, we get
$$\big|\tilde I_2^i[d_T^j](x)\big| \le \int_{x_{j,i}-H_i}^{x_{j,i}+H_i}\int_{\mathbb{R}^2_0}\Big|d_T^j(w_i-\gamma z) - d_T^j(w_i) + (\gamma\cdot z)_i\,\frac{\partial}{\partial x_i}d_T^j(w_i)\Big|\,F(z)\,dz\,dw$$
$$\le 2\Big(\int_{\mathbb{R}^2_0}F(z)\,dz\Big)\int_{x_{j,i}-H_i}^{x_{j,i}+H_i}\|d_T^j\|_\infty\,dw + \int_{x_{j,i}-H_i}^{x_{j,i}+H_i}\int_{\mathbb{R}^2_0}|(\gamma\cdot z)_i|\,\Big\|\frac{\partial d_T^j}{\partial x_i}\Big\|_\infty\,F(z)\,dz\,dw \le c\,H_i\sqrt{\frac{\log T}{T}} + c\,\sqrt{\frac{\log T}{T}},$$
for some $c > 0$ (the $z$-integrals are finite by A3, A4 and the finiteness of the jump intensity). Using this together with (30) and (32), it follows that, for any $x\in K_T^j$,
$$|b_j^i(x) - b_0^i(x)| \le 2c^*\,v\,\|b_0^i\|_\infty\sqrt{\frac{\log T}{T}} + 2c\,c^*\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big) + 2c\,c^*(1+H_i)\sqrt{\frac{\log T}{T}} \le c_1\,c^*\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big),$$
where the last inequality is a consequence of the fact that, for $i\in\{1,2\}$, $H_i\to 0$ as $T\to\infty$ and so, for $T$ large enough, all the other terms are negligible compared to the second one. Hence, the three points listed at the beginning of the proof hold true.
We deduce that
$$\int_{\mathbb{R}^2}\big|b_0(x) - b_j(x)\big|^2\,f_0(x)\,dx = \int_{K_T^j}\big|b_0(x) - b_j(x)\big|^2\,f_0(x)\,dx + \int_{K_T^{j,c}}\big|b_0(x) - b_j(x)\big|^2\,f_0(x)\,dx$$
$$\le c\,v^2\,\frac{\log T}{T}\,H_1H_2 + c\,(c^*)^2\,k^2\,v^2\,\frac{\log T}{T}\Big(\frac{1}{H_1^2} + \frac{1}{H_2^2}\Big)\big|K_T^j\big|.$$
We recall that $|K_T^j| = 4H_1H_2$ and that, as $T\to\infty$, $H_i\to 0$. Thus, the first term is negligible compared to the second one. The desired result follows.

References

[1] Applebaum, D. (2009). Lévy Processes and Stochastic Calculus. Cambridge University Press.
[2] Amorino, C., Gloter, A. (2021). Invariant density adaptive estimation for ergodic jump diffusion processes over anisotropic classes. Journal of Statistical Planning and Inference 213, 106-129.
[3] Amorino, C. (2020). Minimax rate of estimation for the stationary distribution of jump-processes over anisotropic Hölder classes. arXiv preprint arXiv:2011.11994.
[4] Amorino, C., Dion, C., Gloter, A., Lemler, S. (2020). On the nonparametric inference of coefficients of self-exciting jump-diffusion. arXiv preprint arXiv:2011.12387.
[5] Banon, G. (1978). Nonparametric identification for diffusion processes. SIAM J. Control Optim. 16, 380-395.
[6] Banon, G., Nguyen, H.T. (1981). Recursive estimation in diffusion model. SIAM J. Control Optim. 19, 676-685.
[7] Blanke, D. (1996). Estimation de la densité pour des trajectoires non directement observables. Publ. Inst. Statist. Univ. Paris 40, 21-36.
[8] Barndorff-Nielsen, O. E., Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J. R. Stat. Soc., Ser. B, Stat. Methodol. 63, 167-241.
[9] Bosq, D. (1997). Parametric rates of nonparametric estimators and predictors for continuous time processes. Ann. Statist. 25, 982-1000.
[10] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes (second edition). Lecture Notes in Statistics 110, New York: Springer-Verlag.
[11] Bosq, D. (1998). Minimax rates of density estimators for continuous time processes. Sankhya Ser. A 60, 18-28.
[12] Bosq, D., Davydov, Yu. (1999). Local time and density estimation in continuous time. Math. Methods Statist. 8, 22-45.
[13] Bosq, D., Merlevède, F., Peligrad, M. (1999). Asymptotic normality for density kernel estimators in discrete and continuous time. J. Multivar. Anal. 68, 78-95.
[14] Cheze-Payaud, N. (1994). Nonparametric regression and prediction for continuous-time processes. Publ. Inst. Statist. Univ. Paris 38, 37-58.
[15] Castellana, J. V., Leadbetter, M. R. (1986). On smoothed probability density estimation for stationary processes. Stoch. Process. Their Appl. 21(2), 179-193.
[16] Chen, Z. Q., Hu, E., Xie, L., Zhang, X. (2017). Heat kernels for non-symmetric diffusion operators with jumps. J. Differ. Equ. 263(10), 6576-6634.
[17] Comte, F., Lacour, C. (2013). Anisotropic adaptive kernel deconvolution. Annales de l'IHP Probabilités et Statistiques 49(2), 569-609.
[18] Comte, F., Merlevède, F. (2005). Super optimal rates for nonparametric density estimation via projection estimators. Stoch. Process. Their Appl. 115, 797-826.
[19] Delattre, S., Gloter, A., Yoshida, N. (2020). Rate of estimation for the stationary distribution of stochastic damping Hamiltonian systems with continuous observations. arXiv preprint arXiv:2001.10423.
[20] Delecroix, M. (1980). Sur l'estimation des densités d'un processus stationnaire.