Optimal convergence rates for the invariant density estimation of jump-diffusion processes
Chiara Amorino∗ and Eulalia Nualart†

January 27, 2021

∗ Université du Luxembourg, L-4364 Esch-sur-Alzette, Luxembourg. CA gratefully acknowledges financial support of ERC Consolidator Grant 815703 "STAMFORD: Statistical Methods for High Dimensional Diffusions".
† Universitat Pompeu Fabra and Barcelona Graduate School of Economics, Department of Economics and Business, Ramón Trias Fargas 25-27, 08005 Barcelona, Spain. EN acknowledges support from the Spanish MINECO grant PGC2018-101643-B-I00 and Ayudas Fundación BBVA a Equipos de Investigación Científica 2017.
Abstract
We aim at estimating the invariant density associated to a stochastic differential equation with jumps in low dimension, that is, for $d = 1$ and $d = 2$. We consider a class of jump diffusion processes whose invariant density belongs to some Hölder space. Firstly, in dimension one, we show that the kernel density estimator achieves the convergence rate $\frac{1}{T}$, which is the optimal rate in the absence of jumps. This improves the convergence rate obtained in [2], which depends on the Blumenthal–Getoor index for $d = 1$ and is equal to $\frac{\log T}{T}$ for $d = 2$. Secondly, we show that it is not possible to find an estimator with a faster rate of estimation. Indeed, we obtain lower bounds with the same rates $\{\frac{1}{T}, \frac{\log T}{T}\}$ in the mono- and bi-dimensional cases, respectively. Finally, we obtain the asymptotic normality of the estimator in the one-dimensional case.
Keywords: Minimax risk, convergence rate, non-parametric statistics, ergodic diffusion with jumps, Lévy driven SDE, invariant density estimation
1 Introduction

Solutions to Lévy-driven stochastic differential equations have recently attracted a lot of attention in the literature due to their many applications in various areas such as finance, physics, and neuroscience. Indeed, they include some important examples from finance such as the well-known Kou model in [29], the Barndorff-Nielsen–Shephard model ([8]), and the Merton model ([32]), to name just a few. An important example of application of jump processes in neuroscience is the stochastic Morris–Lecar neuron model presented in [25]. As a consequence, statistical inference for jump processes has recently become an active domain of research.
We consider the process $(X_t)_{t\ge 0}$ solution to the following stochastic differential equation with jumps:
$$X_t = X_0 + \int_0^t b(X_s)\,ds + \int_0^t a(X_s)\,dB_s + \int_0^t\int_{\mathbb{R}^d_0}\gamma(X_{s^-})\,z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \qquad (1)$$
where $(B_t)_{t\ge 0}$ is a $d$-dimensional Brownian motion and $\nu$ is a Poisson random measure on $\mathbb{R}_+\times\mathbb{R}^d$ associated to a Lévy process $(L_t)_{t\ge 0}$ with Lévy density function $F$. We focus on the estimation of the invariant density $\mu$ associated to the jump process solution to (1) in low dimension, that is, for $d = 1$ and $d = 2$. In particular, assuming that a continuous record of $(X_t)_{t\in[0,T]}$ is available, our goal is to propose a non-parametric kernel estimator of the stationary measure and to discuss its convergence rate for large $T$.

The same framework has been considered in some recent papers such as [2], [23] (Section 5.2), and [3]. In the first paper, it is shown that the kernel estimator achieves the following convergence rates for the pointwise estimation of the invariant density: $\frac{\log T}{T}$ for $d = 2$ and $\frac{(\log T)^{(2-\frac{2}{1+\alpha})\vee 1}}{T}$ for $d = 1$, where $\alpha$ is the Blumenthal–Getoor index. We recall that, in the absence of jumps, the optimal convergence rate in the one-dimensional case is $\frac{1}{T}$, while the one found in [2] depends on the jumps and belongs to the interval $\big(\frac{\log T}{T}, \frac{(\log T)^{4/3}}{T}\big)$.

In this paper, we wonder whether such a deterioration of the rate is due to the presence of jumps or to the approach that was used. Indeed, our purpose is to look for a new approach that recovers a better convergence rate in the one-dimensional case (hopefully the same as in the continuous case) and to discuss the optimality of such a rate. This new approach will also lead to the asymptotic normality of the proposed estimator. After that, we will discuss the optimality of the convergence rate in the bi-dimensional case. This will close the circle of the analysis of the convergence rates for the estimation of the invariant density of jump-diffusions, as the convergence rates and their optimality in the case $d \ge 3$ have already been established in [2] and [3].

In the classical framework of non-parametric density estimation from continuous-time observations, assuming that $\mu$ belongs to some Hölder class whose smoothness is $\beta$, it is proved under some mixing conditions that the pointwise $L^2$ risk achieves the standard rate of convergence $T^{-\frac{2\beta}{2\beta+1}}$, and that this rate is minimax in that framework. Castellana and Leadbetter proved in [15] that, under the following condition CL, the density can be estimated with the parametric rate $\frac{1}{T}$ by some non-parametric estimators (the kernel ones among them).

CL:
$u \mapsto \|g_u\|_\infty$ is integrable on $(0,\infty)$ and $g_u(\cdot,\cdot)$ is continuous for each $u > 0$, where $g_u(x,y) = \mu(x)p_u(x,y) - \mu(x)\mu(y)$ and $p_u(x,y)$ is the transition density.

More precisely, they shed light on the fact that local irregularities of the sample paths provide some additional information. Indeed, if the joint distribution of $(X_0, X_t)$ is not too close to a singular distribution for $|t|$ small, then it is possible to achieve the superoptimal rate $\frac{1}{T}$ for the pointwise quadratic risk of the kernel estimator. Condition CL can be verified for ergodic continuous diffusion processes (see [38] for sufficient conditions). The paper of Castellana and Leadbetter led to a lot of works regarding the estimation of the common marginal distribution of a continuous-time process; several related results and examples can be found in [9], [10], [14], [21], and [7].

An alternative to the kernel density estimator is given by the local time density estimator, which was proposed by Kutoyants in [22] in the case of diffusion processes and was extended by Bosq and Davydov in [12] to a more general context. The latter proved that, under a condition which is mildly weaker than CL, the mean squared error of the local time estimator reaches the full rate $\frac{1}{T}$. Leblanc built in [31] a wavelet estimator of a density belonging to some general Besov space and proved that, if the process is geometrically strong mixing and a condition like CL is satisfied, then its $L^p$-integrated risk converges at rate $\frac{1}{T}$ as well. In [18] the authors built a projection estimator and showed that its $L^2$-integrated risk achieves the parametric rate $\frac{1}{T}$ under a condition named WCL, which is mildly different from CL.

WCL:
There exists a positive integrable function $k$ (defined on $\mathbb{R}$) such that
$$\sup_{y\in\mathbb{R}}\int_0^\infty g_u(x,y)\,du \le k(x), \qquad \text{for all } x\in\mathbb{R}.$$

In this paper, we will show that our mono-dimensional jump process satisfies a local irregularity condition
WCL1 and an asymptotic independence condition
WCL2 (see Proposition 1), two conditions in which the original condition
WCL can be decomposed. In this way, it will be possible to show that the $L^2$ risk for the pointwise estimation of the invariant measure achieves the superoptimal rate $\frac{1}{T}$, using our kernel density estimator. Moreover, the same conditions will lead to the asymptotic normality of the proposed estimator. Indeed, as we will see in the proof of Theorem 2, the main challenge in this part is to justify the use of the dominated convergence theorem, which will be ensured by conditions WCL1 and
WCL2. We will find in particular that, for any collection $(x_i)_{1\le i\le m}$ of real numbers, we have
$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mu(x_i),\ 1\le i\le m\big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}^{(m)}(0,\Sigma^{(m)}) \qquad \text{as } T\to\infty,$$
where $\hat\mu_{h,T}$ is the kernel density estimator and
$$\Sigma^{(m)} := \big(\sigma(x_i,x_j)\big)_{1\le i,j\le m}, \qquad \sigma(x_i,x_j) := 2\int_0^\infty g_u(x_i,x_j)\,du.$$
We remark that the precise form of the equation above allows us to construct tests and confidence sets for the density.

We have found the convergence rates $\{\frac{1}{T}, \frac{\log T}{T}\}$ for the risk associated to our kernel density estimator for the estimation of the invariant density for $d = 1$ and $d = 2$. Then, some questions naturally arise: are these convergence rates the best possible, or is it possible to improve them by using other estimators? In order to answer, we consider a simpler model where both the volatility and the jump coefficient are constant and the intensity of the jumps is finite. Then, we look for a lower bound on the risk at a point $x\in\mathbb{R}^d$, defined as in equation (9) below. The first idea is to use the two hypotheses method (see Section 2.3 in [37]). To do that, the knowledge of the link between the drift $b$ and the invariant density $\mu_b$ is essential. If in the absence of jumps such a link is explicit, in our context it is more challenging. As shown in [19] and [3], it is possible to find the link knowing that the invariant measure has to satisfy $A^*\mu_b = 0$, where $A^*$ is the adjoint of the generator of the considered diffusion. This method allows us to show that the superoptimal rate $\frac{1}{T}$ is the best possible for the estimation of the invariant density in $d = 1$, but it fails in the bi-dimensional case (see Remark 1 below for details). Finally, we use a finite number of hypotheses to prove a lower bound in the bi-dimensional case. This requires a detailed analysis of the Kullback divergence between the probability laws associated to the different hypotheses. Thanks to that, it is possible to recover the optimal rate $\frac{\log T}{T}$ in the two-dimensional case.

The paper is organised as follows. In Section 2 we give the assumptions on our model and we provide our main results. Section 3 is devoted to stating and proving some preliminary results needed for the proofs of the main results. To conclude, in Section 4 we give the proofs of Theorems 1, 2, 3, and 4, where our main results are gathered.

2 Model assumptions and main results

We consider the following stochastic differential equation with jumps
$$X_t = X_0 + \int_0^t b(X_s)\,ds + \int_0^t a(X_s)\,dB_s + \int_0^t\int_{\mathbb{R}^d_0}\gamma(X_{s^-})\,z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \qquad (2)$$
where $t\ge 0$, $d\in\{1,2\}$, $\mathbb{R}^d_0 = \mathbb{R}^d\setminus\{0\}$, the initial condition $X_0$ is an $\mathbb{R}^d$-valued random variable, the coefficients $b:\mathbb{R}^d\to\mathbb{R}^d$, $a:\mathbb{R}^d\to\mathbb{R}^d\otimes\mathbb{R}^d$ and $\gamma:\mathbb{R}^d\to\mathbb{R}^d\otimes\mathbb{R}^d$ are measurable functions, $(B_t)_{t\ge0}$ is a $d$-dimensional Brownian motion, and $\nu$ is a Poisson random measure on $\mathbb{R}_+\times\mathbb{R}^d$ associated to a Lévy process $(L_t)_{t\ge0}$ with Lévy density function $F$. All sources of randomness are mutually independent.

We consider the following assumptions on the coefficients and on the Lévy density $F$:

A1: The functions $b(x)$, $\gamma(x)$ and $aa^T(x)$ are globally Lipschitz and bounded. Moreover, $\inf_{x\in\mathbb{R}^d} aa^T(x) \ge c\,\mathrm{Id}$ for some constant $c > 0$, where Id denotes the $d\times d$ identity matrix, and $\inf_{x\in\mathbb{R}^d}\det(\gamma(x)) > 0$.
A2: $\langle x, b(x)\rangle \le -c_1|x| + c_2$ for all $|x|\ge\rho_0$, for some $\rho_0, c_1, c_2 > 0$.

A3: $\mathrm{Supp}(F) = \mathbb{R}^d_0$ and, for all $z\in\mathbb{R}^d_0$, $F(z) \le \frac{c}{|z|^{d+\alpha}}$, for some $\alpha\in(0,2)$ and $c > 0$.

A4: There exist $\epsilon_0 > 0$ and $c > 0$ such that $\int_{\mathbb{R}^d_0}|z|^2 e^{\epsilon_0|z|}F(z)\,dz \le c$.

A5: If $\alpha = 1$, $\int_{r<|z|<R} z\,F(z)\,dz = 0$ for any $0 < r < R < \infty$.

Under these assumptions, the process $X$ admits a unique invariant probability measure $\mu$ and its transition density $p_t(x,y)$ satisfies the following bound (see [2] and [16]): for all $T > 0$, there exist constants $c > 0$ and $\lambda > 0$ such that, for all $t\in(0,T]$ and $x,y\in\mathbb{R}^d$,
$$p_t(x,y) \le c\left(t^{-d/2}\,e^{-\lambda\frac{|y-x|^2}{t}} + \frac{t}{(t^{1/2}+|y-x|)^{d+\alpha}}\right). \qquad (3)$$

We assume that the process is observed continuously, $X = (X_t)_{t\in[0,T]}$, over a time interval $[0,T]$ with $T$ tending to $\infty$. In the paper [2] cited above, the non-parametric estimation of $\mu$ is studied via the kernel estimator, which is defined as follows. We assume that $\mu$ belongs to the Hölder space $\mathcal{H}_d(\beta,\mathcal{L})$, where $\beta = (\beta_1,\dots,\beta_d)$, $\beta_i > 0$, $\mathcal{L} = (L_1,\dots,L_d)$, $L_i > 0$, which means that for all $i\in\{1,\dots,d\}$, $k = 0,1,\dots,\lfloor\beta_i\rfloor$ and $t\in\mathbb{R}$,
$$\big\|D_i^{(k)}\mu\big\|_\infty \le L_i \qquad\text{and}\qquad \big\|D_i^{(\lfloor\beta_i\rfloor)}\mu(\cdot + te_i) - D_i^{(\lfloor\beta_i\rfloor)}\mu(\cdot)\big\|_\infty \le L_i|t|^{\beta_i - \lfloor\beta_i\rfloor},$$
where $D_i^{(k)}$ denotes the $k$-th order partial derivative of $\mu$ with respect to the $i$-th component, $\lfloor\beta_i\rfloor$ is the integer part of $\beta_i$, and $e_1,\dots,e_d$ is the canonical basis of $\mathbb{R}^d$. We set
$$\hat\mu_{h,T}(x) = \frac{1}{T\prod_{i=1}^d h_i}\int_0^T\prod_{i=1}^d K\!\left(\frac{x_i - X_t^i}{h_i}\right) dt =: \frac{1}{T}\int_0^T \mathbb{K}_h(x - X_t)\,dt,$$
where $x = (x_1,\dots,x_d)\in\mathbb{R}^d$, $h = (h_1,\dots,h_d)$ is a bandwidth and $K:\mathbb{R}\to\mathbb{R}$ is a kernel function satisfying
$$\int_{\mathbb{R}} K(x)\,dx = 1,\qquad \|K\|_\infty < \infty,\qquad \mathrm{supp}(K)\subset[-1,1],\qquad \int_{\mathbb{R}} K(x)\,x^i\,dx = 0$$
for all $i\in\{1,\dots,M\}$ with $M \ge \max_i\beta_i$.

We first consider equation (2) with $d = 1$ and show that the kernel estimator reaches the optimal rate $T^{-1}$, as for the stochastic differential equation (2) without jumps. For this, we need the following additional assumption on $F$.

A6: $F$ belongs to $C^1(\mathbb{R}_0)$ and, for all $z\in\mathbb{R}_0$, $|F'(z)| \le \frac{c}{|z|^{2+\alpha}}$, for some $c > 0$.

Theorem 1. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold and $\mu\in\mathcal{H}_1(\beta,L)$. Then there exists a constant $c > 0$, independent of $T$ and $h$, such that for all $x\in\mathbb{R}$,
$$\mathbb{E}\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le c\Big(h^{2\beta} + \frac{1}{T}\Big). \qquad (4)$$
In particular, choosing $h(T) = T^{-a}$ with $a > \frac{1}{2\beta}$, we conclude that $\mathbb{E}\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le \frac{c}{T}$.

Theorem 1 improves the upper bound obtained in [2], which was of the form $\frac{(\log T)^{(2-\frac{2}{1+\alpha})\vee 1}}{T}$. As in that paper, we will use the bias-variance decomposition (see [17, Proposition 1])
$$\mathbb{E}\big[|\hat\mu_{h,T}(x) - \mu(x)|^2\big] \le c\left(h^{2\beta} + T^{-2}\,\mathrm{Var}\Big(\int_0^T \mathbb{K}_h(x - X_t)\,dt\Big)\right). \qquad (5)$$
Then, in [2], bounds on the transition semigroup and on the transition density (see (3) above) give an upper bound for the variance depending on the bandwidth. Here, we use the same approach as in [15] and [18] to obtain a bandwidth-free rate for the variance of smoothed density estimators (which include the kernel estimator). For Markov diffusions, the sufficient conditions can be decomposed into a local irregularity condition WCL1 plus an asymptotic independence condition WCL2:
$$\textbf{WCL1:}\quad \int_{\mathbb{R}}\int_0^2\sup_{y\in\mathbb{R}}|g_u(x,y)|\,du\,dx < \infty, \qquad \textbf{WCL2:}\quad \int_{\mathbb{R}}\int_2^\infty\sup_{y\in\mathbb{R}}|g_u(x,y)|\,du\,dx < \infty,$$
where $g_u(x,y) := \mu(x)p_u(x,y) - \mu(x)\mu(y)$.
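For concreteness, the following is a minimal numerical sketch of the estimator $\hat\mu_{h,T}$ defined above; it is an illustration only and not part of the theoretical analysis. The simulated model is an assumption made for the sketch (a one-dimensional equation of type (2) with $b(x) = -x$, $a \equiv 1$, $\gamma \equiv 1$ and centred compound-Poisson jumps, i.e. a finite-activity simplification of A3, discretised by an Euler scheme), and the continuous-time integral in $\hat\mu_{h,T}$ is replaced by a Riemann sum over the simulation grid. The Epanechnikov kernel used here satisfies the normalisation, boundedness and support constraints listed above; its symmetry gives the vanishing first moment, while higher-order kernels would be needed for larger $\beta$.

```python
# Illustration only: kernel estimator of the invariant density from a
# discretised trajectory.  Model and parameters below are assumptions made
# for this sketch, not quantities taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def simulate_path(T=1_000.0, dt=1e-2, jump_rate=1.0, jump_scale=0.5):
    """Euler scheme for dX = -X dt + dB + compound-Poisson jumps (centred,
    so the compensator drift in equation (2) vanishes)."""
    n = int(T / dt)
    X = np.empty(n + 1)
    X[0] = 0.0
    for k in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))
        n_jumps = rng.poisson(jump_rate * dt)
        dJ = rng.normal(0.0, jump_scale, size=n_jumps).sum()
        X[k + 1] = X[k] - X[k] * dt + dB + dJ
    return X, dt

def kernel_K(u):
    """Epanechnikov kernel: integrates to 1, bounded, supported on [-1, 1]."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def mu_hat(x_grid, X, dt, h):
    """Riemann-sum approximation of (1/(T h)) * int_0^T K((x - X_t)/h) dt."""
    T = dt * (len(X) - 1)
    out = np.empty_like(x_grid)
    for i, x in enumerate(x_grid):
        out[i] = kernel_K((x - X) / h).sum() * dt / (T * h)
    return out

X, dt = simulate_path()
x_grid = np.linspace(-3.0, 3.0, 61)
est = mu_hat(x_grid, X, dt, h=0.3)
dx = x_grid[1] - x_grid[0]
print("estimated invariant density at 0:", est[len(x_grid) // 2])
print("total mass on the grid (should be close to 1):", est.sum() * dx)
```

In practice the bandwidth would be chosen as in Theorem 1 below, $h(T) = T^{-a}$ with $a > \frac{1}{2\beta}$; the value $h = 0.3$ above is purely illustrative.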
In order to show conditions WCL1 and WCL2, an upper bound on the second derivative of the transition density $p_t(x,y)$ is obtained (see Lemma 1 below), for which the additional condition A6 is needed.

As shown in [13], conditions WCL1 and WCL2 are also useful to show the asymptotic normality of the kernel density estimator, as proved in the next theorem.

Theorem 2. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold and $\mu\in\mathcal{H}_1(\beta,L)$. Then, for any collection $(x_i)_{1\le i\le m}$ of distinct real numbers,
$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mathbb{E}[\hat\mu_{h,T}(x_i)],\ 1\le i\le m\big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}^{(m)}(0,\Sigma^{(m)}) \qquad \text{as } T\to\infty, \qquad (6)$$
where $\Sigma^{(m)} := (\sigma(x_i,x_j))_{1\le i,j\le m}$ and $\sigma(x_i,x_j) := 2\int_0^\infty g_u(x_i,x_j)\,du$. Furthermore,
$$\sqrt{T}\,\big(\hat\mu_{h,T}(x_i) - \mu(x_i),\ 1\le i\le m\big) \xrightarrow{\ \mathcal{D}\ } \mathcal{N}^{(m)}(0,\Sigma^{(m)}) \qquad \text{as } T\to\infty. \qquad (7)$$

We are also interested in obtaining lower bounds in dimension $d\in\{1,2\}$. For this, we consider the following particular case of equation (2):
$$X_t = X_0 + \int_0^t b(X_s)\,ds + aB_t + \int_0^t\int_{\mathbb{R}^d_0}\gamma z\,\big(\nu(ds,dz) - F(z)\,dz\,ds\big), \qquad (8)$$
where $a$ and $\gamma$ are $d\times d$ invertible matrices and $b$ is a Lipschitz and bounded function satisfying Assumption A2. We assume that $F$ satisfies Assumptions A3–A5 and $\int_{\mathbb{R}^d_0}F(z)\,dz < \infty$. Then, the unique solution to equation (8) admits a unique invariant measure $\pi_b$, which we assume has a density $\mu_b$ with respect to the Lebesgue measure. We denote by $\mathbb{P}_b^{(T)}$ and $\mathbb{E}_b^{(T)}$ the law and the expectation of the solution $(X_t)_{t\in[0,T]}$.

We say that a bounded and Lipschitz function $b$ belongs to $\Sigma(\beta,\mathcal{L})$ if the unique invariant density $\mu_b$ belongs to $\mathcal{H}_d(\beta,\mathcal{L})$, with $\beta_i > 0$ and $L_i > 0$. We define the minimax risk at a point $x\in\mathbb{R}^d$ by
$$\mathcal{R}_T^x(\beta,\mathcal{L}) := \inf_{\tilde\mu_T}\mathcal{R}(\tilde\mu_T(x)) := \inf_{\tilde\mu_T}\sup_{b\in\Sigma(\beta,\mathcal{L})}\mathbb{E}_b^{(T)}\big[(\tilde\mu_T(x) - \mu_b(x))^2\big], \qquad (9)$$
where the infimum is taken over all possible estimators of the invariant density. The following lower bounds hold true.

Theorem 3. Let $X$ be the solution to (8) on $[0,T]$ with $d = 1$. We assume that $a$ and $\gamma$ are non-zero constants. There exist $T_0 > 0$ and $c > 0$ such that, for all $T \ge T_0$,
$$\inf_{x\in\mathbb{R}}\mathcal{R}_T^x(\beta,L) \ge \frac{c}{T}.$$

Theorem 4. Let $X$ be the solution to (8) on $[0,T]$ with $d = 2$. Assume that for all $i\in\{1,2\}$ and $j\neq i$,
$$\big|(aa^T)_{ij}\,(aa^T)_{jj}^{-1}\big| \le \frac12. \qquad (10)$$
There exist $T_0 > 0$ and $c > 0$ such that, for $T \ge T_0$,
$$\inf_{\tilde\mu_T}\sup_{b\in\Sigma(\beta,\mathcal{L})}\mathbb{E}_b^{(T)}\Big[\sup_{x\in\mathbb{R}^2}(\tilde\mu_T(x) - \mu_b(x))^2\Big] \ge c\,\frac{\log T}{T}.$$

Comparing these lower bounds with the upper bound of Theorem 1 for the case $d = 1$ and with Proposition 4 in [2] for the two-dimensional case, we conclude that the convergence rates $\{\frac{1}{T},\frac{\log T}{T}\}$ are the best possible for the kernel estimator of the invariant density in dimension $d\in\{1,2\}$.

The proof of Theorem 3 follows along the same lines as that of Theorem 2 in [3], where a lower bound for the estimation of the invariant density of the solution to (8) is obtained. Note that Theorem 4 provides a lower bound with the supremum over $x$ inside the expectation, while the method in [3] provides an $\inf_x$ outside the expectation.

3 Preliminary results

The proofs of Theorems 1 and 2 will use the following upper bound on the second partial derivative of the transition density.

Lemma 1. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold. For all $T > 0$, there exist two constants $\lambda > 0$ and $c > 0$ such that, for any $x,y\in\mathbb{R}$ and $t\in(0,T]$,
$$\Big|\frac{\partial^2}{\partial x^2}p_t(x,y)\Big| \le c\left(t^{-3/2}\,e^{-\lambda\frac{|y-x|^2}{t}} + \frac{1}{(t^{1/2}+|x-y|)^{1+\alpha}}\right).$$

Proof. We apply the estimate in Theorem 3.5(v) of [16]. We remark that, in [16], the authors assumed $d \ge 2$.
After inspection of the proof, it is possible to see that the result can be extended to the case $d = 1$: it was stated for $d \ge 2$ under assumptions which correspond to our A1–A5, together with the following additional condition: there exist $c > 0$ and $\delta\in(0,1)$ such that for all $x,y,z\in\mathbb{R}$,
$$|b(x) - b(y)| + |k(x,z) - k(y,z)| \le c\,|x-y|^\delta, \qquad (11)$$
where $k(x,z) = \frac{|z|^{1+\alpha}}{\gamma(x)}\,F\big(\frac{z}{\gamma(x)}\big)$. Thus, we only need to show (11). As $b$ is bounded and Lipschitz, it satisfies (11). In fact, when $x$ and $y$ are such that $|x-y| > 1$, thanks to the boundedness of $b$ we have, for each $\delta\in(0,1)$,
$$|b(x) - b(y)| \le |b(x)| + |b(y)| \le c \le c\,|x-y|^\delta.$$
Instead, when $x$ and $y$ are such that $|x-y| \le 1$, the Lipschitz continuity gives
$$|b(x) - b(y)| \le L|x-y| = L|x-y|^{1-\delta}|x-y|^\delta \le L|x-y|^\delta.$$
Concerning $k$, we write
$$|k(x,z) - k(y,z)| = |z|^{1+\alpha}\left|\frac{1}{\gamma(x)}F\Big(\frac{z}{\gamma(x)}\Big) - \frac{1}{\gamma(y)}F\Big(\frac{z}{\gamma(y)}\Big)\right| \le \frac{|z|^{1+\alpha}}{|\gamma(x)|}\left|F\Big(\frac{z}{\gamma(x)}\Big) - F\Big(\frac{z}{\gamma(y)}\Big)\right| + |z|^{1+\alpha}\Big|F\Big(\frac{z}{\gamma(y)}\Big)\Big|\left|\frac{1}{\gamma(x)} - \frac{1}{\gamma(y)}\right|. \qquad (12)$$
From the intermediate value theorem, there exists $\tilde z\in[\frac{z}{\gamma(x)},\frac{z}{\gamma(y)}]$ (assuming w.l.o.g. that $\gamma(x) > \gamma(y)$, otherwise $\tilde z\in[\frac{z}{\gamma(y)},\frac{z}{\gamma(x)}]$) such that the first term on the r.h.s. above is bounded by
$$\frac{|z|^{1+\alpha}}{|\gamma(x)|}\,|F'(\tilde z)|\left|\frac{z}{\gamma(x)} - \frac{z}{\gamma(y)}\right| \le \frac{|z|^{1+\alpha}}{|\gamma(x)|}\,\frac{c}{|\tilde z|^{2+\alpha}}\,\frac{|z|}{|\gamma(x)\gamma(y)|}\,|\gamma(y) - \gamma(x)| \le c\,\frac{\gamma_{\max}^{2+\alpha}}{\gamma_{\min}^{3}}\,|\gamma(y) - \gamma(x)|,$$
where we have used A6 in the first inequality and we have set $\gamma_{\max} := \sup_x|\gamma(x)|$ and $\gamma_{\min} := \inf_x|\gamma(x)|$. Moreover, by A3, the second term on the r.h.s. of (12) is bounded by
$$|z|^{1+\alpha}\,\frac{c\,|\gamma(y)|^{1+\alpha}}{|z|^{1+\alpha}}\,\frac{|\gamma(y) - \gamma(x)|}{|\gamma(x)\gamma(y)|} \le c\,\frac{\gamma_{\max}^{1+\alpha}}{\gamma_{\min}^{2}}\,|\gamma(y) - \gamma(x)|.$$
It follows that
$$|k(x,z) - k(y,z)| \le c\left(\frac{\gamma_{\max}^{2+\alpha}}{\gamma_{\min}^{3}} + \frac{\gamma_{\max}^{1+\alpha}}{\gamma_{\min}^{2}}\right)|\gamma(y) - \gamma(x)|.$$
Finally, as $\gamma$ is Lipschitz and bounded, we conclude that (11) holds. This concludes the proof of the lemma.

The key point of the proof of Theorem 1 consists in showing that conditions WCL1 and WCL2 hold true, which is proved in the next proposition.

Proposition 1. Let $X$ be the solution to (2) on $[0,T]$ with $d = 1$. Suppose that Assumptions A1–A6 hold. Then, conditions WCL1 and WCL2 are satisfied.

Proof. We start by considering WCL1. The density estimate (3) yields
$$p_t(x,y) \le c\,t^{-1/2} + \tilde c\,t^{\frac{1-\alpha}{2}} \le \bar c\,t^{-1/2}, \qquad 0 < t \le 2, \qquad (13)$$
which, combined with $\sup_{y\in\mathbb{R}}\mu(y) < \infty$, gives WCL1. In order to show WCL2, we set $\varphi(\lambda) := \mathbb{E}[\exp(i\lambda X_0)]$ and $\varphi_x(\lambda,t) := \mathbb{E}[\exp(i\lambda X_t)\,|\,X_0 = x]$, and we claim that there exists $c > 0$ such that
$$|\varphi(\lambda)| \le \frac{c}{1+\lambda^2}. \qquad (14)$$
Moreover, for all $t \ge 2$, there exists $c > 0$ such that, for all $x\in\mathbb{R}$,
$$|\varphi_x(\lambda,t)| \le \frac{c}{1+\lambda^2}. \qquad (15)$$
Recall from Lemma 2 in [2] that the process $X$ is exponentially $\beta$-mixing, which implies that $\beta_X(u) \le ce^{-\gamma u}$, where $\beta_X(u)$ is the $\beta$-mixing coefficient defined in Section 1.3.2 of [26]. It follows that, for any $p > 0$, $\int_0^\infty \beta_X^{\,p}(u)\,du < \infty$. Thus, by Proposition 10 of [18], inequalities (14) and (15) and the integrability of the $\beta$-mixing coefficient imply WCL2. Therefore, we are left to show (14) and (15). We start by showing (15).
Integrating by parts twice, using Lemma 1 (whose constant we denote by $\lambda$ below) and the Chapman–Kolmogorov identity $p_t(x,y) = \int_{\mathbb{R}}p_{t-1}(x,z)p_1(z,y)\,dz$, we obtain
$$|\varphi_x(\lambda,t)| = \left|\int_{\mathbb{R}}e^{i\lambda y}p_t(x,y)\,dy\right| = |\lambda|^{-2}\left|\int_{\mathbb{R}}e^{i\lambda y}\,\frac{\partial^2}{\partial y^2}\Big(\int_{\mathbb{R}}p_{t-1}(x,z)p_1(z,y)\,dz\Big)\,dy\right|$$
$$\le |\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}}p_{t-1}(x,z)\left|\frac{\partial^2}{\partial y^2}p_1(z,y)\right|dz\,dy \le c\,|\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}}p_{t-1}(x,z)\left(e^{-\lambda|z-y|^2} + \frac{1}{(1+|z-y|)^{1+\alpha}}\right)dy\,dz.$$
As $\alpha\in(0,2)$, $\int_{\mathbb{R}}\big(e^{-\lambda|z-y|^2} + (1+|z-y|)^{-(1+\alpha)}\big)\,dy$ is finite. Since $\int_{\mathbb{R}}p_{t-1}(x,z)\,dz = 1$, we get $|\varphi_x(\lambda,t)| \le c\,|\lambda|^{-2}$, which, together with the trivial bound $|\varphi_x(\lambda,t)| \le 1$, proves (15). Similarly, using the invariance of $\mu$, i.e. $\mu(y) = \int_{\mathbb{R}}\mu(z)p_1(z,y)\,dz$,
$$|\varphi(\lambda)| = \left|\int_{\mathbb{R}}e^{i\lambda y}\mu(y)\,dy\right| = |\lambda|^{-2}\left|\int_{\mathbb{R}}e^{i\lambda y}\,\frac{\partial^2}{\partial y^2}\Big(\int_{\mathbb{R}}\mu(z)p_1(z,y)\,dz\Big)\,dy\right| \le |\lambda|^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}}\mu(z)\left|\frac{\partial^2}{\partial y^2}p_1(z,y)\right|dz\,dy \le c\,|\lambda|^{-2},$$
which gives (14). The proof of the proposition is now completed.

Theorem 2 is an application of the following central limit theorem for discrete stationary sequences. Let $Y_n = (Y_{n,i},\ i\in\mathbb{Z})$, $n\ge 1$, be a sequence of $\mathbb{R}^m$-valued random processes. We define the $\alpha$-mixing coefficient of $Y_n$ by
$$\alpha_{n,k} := \sup_{A\in\sigma(Y_{n,i},\,i\le 0),\ B\in\sigma(Y_{n,i},\,i\ge k)}\big|\mathbb{P}(A\cap B) - \mathbb{P}(A)\mathbb{P}(B)\big|$$
and we set $\alpha_k := \sup_{n\ge 1}\alpha_{n,k}$ (see also Section 1 in [26]). We denote by $Y^{(r)}$ the $r$-th component of an $m$-dimensional random vector $Y$.

Theorem 5 (Theorem 1.1 in [13]). Assume that:
(i) $\mathbb{E}[Y_{n,i}^{(r)}] = 0$ and $|Y_{n,i}^{(r)}| \le M_n$ for every $n\ge 1$, $i\ge 1$ and $1\le r\le m$, where $M_n$ is a constant depending only on $n$;
(ii) $\sup_{i\ge 1,\,1\le r\le m}\mathbb{E}[(Y_{n,i}^{(r)})^2] < \infty$;
(iii) for every $1\le r,s\le m$ and every sequence $b_n\to\infty$ such that $b_n\le n$ for every $n\ge 1$,
$$\lim_{n\to\infty}\frac{1}{b_n}\,\mathbb{E}\Big[\sum_{i=1}^{b_n}Y^{(r)}_{n,i}\sum_{j=1}^{b_n}Y^{(s)}_{n,j}\Big] = \sigma_{r,s};$$
(iv) there exists $a\in(1,\infty)$ such that $\sum_{k\ge 1}k\,\alpha_k^{\frac{a-1}{a}} < \infty$;
(v) for some constant $c > 0$ and every $n\ge 1$, $M_n \le c\,n^{f(a)}$, where $f(a) > 0$ is the explicit exponent, depending only on $a$, appearing in [13].
Then
$$\frac{\sum_{i=1}^nY_{n,i}}{\sqrt n}\ \xrightarrow{\ \mathcal{D}\ }\ \mathcal{N}(0,\Sigma) \qquad \text{as } n\to\infty, \qquad \text{where } \Sigma = (\sigma_{r,s})_{1\le r,s\le m}.$$

The proof of Theorem 4 is based on the following Kullback version of the main theorem on lower bounds in [37]; see Lemma C.1 of [36].

Lemma 2. Fix $\beta, L\in(0,\infty)$ and assume that there exist $f_0\in\mathcal{H}(\beta,L)$ and a finite set $J_T$ such that one can find $\{f_j,\ j\in J_T\}\subset\mathcal{H}(\beta,L)$ satisfying
$$\|f_j - f_k\|_\infty \ge \psi_T > 0 \qquad \forall\, j\neq k\in J_T. \qquad (16)$$
Moreover, denoting by $\mathbb{P}_j^{(T)}$ the probability measure associated with $f_j$, assume that, for all $j\in J_T$, $\mathbb{P}_j^{(T)}\ll\mathbb{P}_0^{(T)}$ and
$$\frac{1}{|J_T|}\sum_{j\in J_T}KL\big(\mathbb{P}_j^{(T)},\mathbb{P}_0^{(T)}\big) = \frac{1}{|J_T|}\sum_{j\in J_T}\mathbb{E}_j^{(T)}\Big[\log\Big(\frac{d\mathbb{P}_j^{(T)}}{d\mathbb{P}_0^{(T)}}(X^T)\Big)\Big] \le \gamma\,\log(|J_T|) \qquad (17)$$
for some $\gamma\in(0,\frac18)$. Then, for $q > 0$, we have
$$\inf_{\tilde\mu_T}\sup_{\mu_b\in\mathcal{H}(\beta,L)}\big(\mathbb{E}_b^{(T)}\big[\psi_T^{-q}\,\|\tilde\mu_T - \mu_b\|_\infty^q\big]\big)^{1/q} \ge c(\gamma) > 0,$$
where the infimum is taken over all the possible estimators $\tilde\mu_T$ of $\mu_b$.

4 Proof of the main results

4.1 Proof of Theorem 1

By the symmetry of the covariance operator and the stationarity of the process,
$$T\,\mathrm{Var}(\hat\mu_{h,T}(x)) = \frac{1}{T}\int_0^T\int_0^T\mathrm{Cov}\big(\mathbb{K}_h(x-X_t),\mathbb{K}_h(x-X_s)\big)\,ds\,dt = \frac{2}{T}\int_0^T(T-u)\,\mathrm{Cov}\big(\mathbb{K}_h(x-X_u),\mathbb{K}_h(x-X_0)\big)\,du$$
$$= 2\int_0^T\Big(1-\frac uT\Big)\int_{\mathbb{R}}\int_{\mathbb{R}}\mathbb{K}_h(x-y)\,\mathbb{K}_h(x-z)\,g_u(y,z)\,dy\,dz\,du \le 2\int_{\mathbb{R}}\int_{\mathbb{R}}\big|\mathbb{K}_h(x-y)\,\mathbb{K}_h(x-z)\big|\int_0^\infty|g_u(y,z)|\,du\,dy\,dz \le c,$$
where in the last inequality we have used Proposition 1. Then, from the bias-variance decomposition (5), we obtain (4), which concludes the desired proof.
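Before turning to the proof of Theorem 2, the following sketch illustrates numerically the limiting variance $\sigma(x,x) = 2\int_0^\infty g_u(x,x)\,du$ appearing in Theorem 2. Since $g_u$ is not available in closed form for the jump model (2), the sketch uses, as an assumption, a continuous benchmark with an explicit transition density (a one-dimensional Ornstein–Uhlenbeck process without jumps); the integrable singularity of $g_u(x,x)$ at $u = 0$ is exactly the local irregularity discussed after condition CL.

```python
# Illustration only: numerical evaluation of sigma(x, x) = 2 * int_0^inf g_u(x, x) du
# for a benchmark process assumed for this sketch (OU process dX = -theta X dt + sig dB,
# no jumps), whose transition and invariant densities are Gaussian and explicit.
import numpy as np
from scipy.integrate import quad

theta, sig = 1.0, 1.0
stat_var = sig ** 2 / (2.0 * theta)          # variance of the invariant law

def normal_pdf(y, mean, var):
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mu(x):
    """Invariant density of the OU benchmark."""
    return normal_pdf(x, 0.0, stat_var)

def p_u(u, x, y):
    """OU transition density: Gaussian with mean x e^{-theta u}."""
    mean = x * np.exp(-theta * u)
    var = stat_var * (1.0 - np.exp(-2.0 * theta * u))
    return normal_pdf(y, mean, var)

def g_u(u, x, y):
    return mu(x) * p_u(u, x, y) - mu(x) * mu(y)

def sigma_xx(x):
    # The integrand behaves like u^{-1/2} near 0 (integrable) and decays
    # exponentially at infinity, so the integral is split at u = 1.
    val1, _ = quad(lambda u: g_u(u, x, x), 0.0, 1.0)
    val2, _ = quad(lambda u: g_u(u, x, x), 1.0, np.inf)
    return 2.0 * (val1 + val2)

for x in (0.0, 0.5, 1.0):
    print(f"sigma({x}, {x}) ~= {sigma_xx(x):.4f}")
```

The finiteness of these integrals is the analogue, for the benchmark, of conditions WCL1 and WCL2 established in Proposition 1 for the jump model.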
4.2 Proof of Theorem 2

We aim to apply Theorem 5. First of all, we split the interval $[0,T]$ into $n$ small intervals of length $\Delta_n$ as follows: $[0,T] = \cup_{i=1}^n[t_{i-1},t_i)$, with $t_0 = 0$, $t_n = T$ and, for any $i\in\{1,\dots,n\}$, $t_i - t_{i-1} = \Delta_n$. By construction, it clearly holds that $n\Delta_n = T$.

For each $n\ge 1$ and $1\le r\le m$, we consider the sequence $(Y^{(r)}_{n,i})_{i\ge 1}$ defined as
$$Y^{(r)}_{n,i} := \frac{1}{\sqrt{\Delta_n}}\left(\int_{t_{i-1}}^{t_i}\mathbb{K}_h(x_r - X_u)\,du - \mathbb{E}\Big[\int_{t_{i-1}}^{t_i}\mathbb{K}_h(x_r - X_u)\,du\Big]\right),$$
for $x_r\in\mathbb{R}$. We denote by $Y_{n,i}$ the $\mathbb{R}^m$-valued random vector $Y_{n,i} = (Y^{(1)}_{n,i},\dots,Y^{(m)}_{n,i})$. By construction,
$$\frac{\sum_{i=1}^nY_{n,i}}{\sqrt n} = \sqrt{T}\,\big(\hat\mu_{h,T}(x) - \mathbb{E}[\hat\mu_{h,T}(x)]\big),$$
where $\hat\mu_{h,T}(x) - \mathbb{E}[\hat\mu_{h,T}(x)]$ denotes the vector $(\hat\mu_{h,T}(x_1) - \mathbb{E}[\hat\mu_{h,T}(x_1)],\dots,\hat\mu_{h,T}(x_m) - \mathbb{E}[\hat\mu_{h,T}(x_m)])$.

It is clear that $\mathbb{E}[Y_{n,i}] = 0$ for all $n\ge 1$ and $i\ge 1$. Moreover, for all $i\ge 1$, $1\le r\le m$ and $n\ge 1$,
$$|Y^{(r)}_{n,i}| \le \frac{2}{\sqrt{\Delta_n}}\,\|\mathbb{K}_h\|_\infty\,\Delta_n \le \frac{c\,\sqrt{\Delta_n}}{h(T)} =: M_n.$$
Hence, assumption (i) holds true. Concerning assumption (ii), we remark that, for any $i\ge 1$ and $1\le r\le m$,
$$\mathbb{E}[(Y^{(r)}_{n,i})^2] = \mathrm{Var}\Big(\frac{1}{\sqrt{\Delta_n}}\int_0^{\Delta_n}\mathbb{K}_h(x_r - X_u)\,du\Big) = \mathrm{Var}\big(\sqrt{\Delta_n}\,\hat\mu_{h,\Delta_n}(x_r)\big) = \Delta_n\,\mathrm{Var}(\hat\mu_{h,\Delta_n}(x_r)) \le \Delta_n\,\frac{c}{\Delta_n} = c,$$
where we have used the variance bound obtained in the proof of Theorem 1. Let $b_n$ be a sequence of integers such that $b_n\to\infty$ and $b_n\le n$ for every $n$. For every $1\le r\le m$ and $1\le s\le m$, we have
$$\frac{1}{b_n}\,\mathbb{E}\Big[\sum_{i=1}^{b_n}Y^{(r)}_{n,i}\sum_{j=1}^{b_n}Y^{(s)}_{n,j}\Big] = \frac{1}{b_n\Delta_n}\int_0^{b_n\Delta_n}\int_0^{b_n\Delta_n}\mathrm{Cov}\big(\mathbb{K}_h(x_r - X_u),\mathbb{K}_h(x_s - X_v)\big)\,du\,dv$$
$$= 2\int_0^{b_n\Delta_n}\Big(1-\frac{u}{b_n\Delta_n}\Big)\int_{\mathbb{R}}\int_{\mathbb{R}}\mathbb{K}_h(x_r - z_1)\,\mathbb{K}_h(x_s - z_2)\,g_u(z_1,z_2)\,dz_1\,dz_2\,du$$
$$= 2\int_{\mathbb{R}}\int_{\mathbb{R}}\int_0^{b_n\Delta_n}\Big(1-\frac{u}{b_n\Delta_n}\Big)K(w_1)K(w_2)\,g_u\big(x_r - h(T)w_1,\ x_s - h(T)w_2\big)\,du\,dw_1\,dw_2,$$
where we have used Fubini's theorem and the change of variables $w_1 := \frac{x_r - z_1}{h(T)}$, $w_2 := \frac{x_s - z_2}{h(T)}$. Using dominated convergence and the fact that $h(T)\to 0$ as $T\to\infty$, we obtain
$$\lim_{n\to\infty}\frac{1}{b_n}\,\mathbb{E}\Big[\sum_{i=1}^{b_n}Y^{(r)}_{n,i}\sum_{j=1}^{b_n}Y^{(s)}_{n,j}\Big] = 2\int_{\mathbb{R}}K(w_1)\int_{\mathbb{R}}K(w_2)\int_0^\infty g_u(x_r,x_s)\,du\,dw_2\,dw_1 = 2\int_0^\infty g_u(x_r,x_s)\,du =: \sigma(x_r,x_s),$$
which proves (iii). Remark that the dominated convergence theorem can be used because the estimates in the proof of Proposition 1 yield $\int_0^\infty\|g_u\|_\infty\,du < \infty$. In particular, we have
$$\Big|\Big(1-\frac{u}{b_n\Delta_n}\Big)K(w_1)K(w_2)\,g_u\big(x_r - h(T)w_1, x_s - h(T)w_2\big)\,\mathbb{1}_{[0,b_n\Delta_n]}(u)\,\mathbb{1}_{\mathbb{R}^2}(w_1,w_2)\Big| \le \|g_u\|_\infty\,|K(w_1)K(w_2)| \in L^1(\mathbb{R}_+\times\mathbb{R}^2).$$

We now check (iv). We remark that if a process is $\beta$-mixing, then it is also $\alpha$-mixing and the following estimate holds (see Theorem 3 in Section 1.2.2 of [26]):
$$\alpha_k \le \beta_{Y_n}(k) = \beta_X(k) \le c\,e^{-\gamma k}.$$
Therefore, it suffices to show that there exists $a\in(1,\infty)$ such that
$$c\sum_{k\ge 1}k\,e^{-\frac{k\gamma(a-1)}{a}} < \infty,$$
which is true for any $a > 1$, so (iv) is satisfied.

We are left to show (v). Let $f(a)$ be the exponent appearing in condition (v) of Theorem 5. We want to show that there exist $a > 1$ and $c > 0$ such that, for all $n\ge 1$,
$$\frac{\sqrt{\Delta_n}}{h(T)} \le c\,n^{f(a)}. \qquad (18)$$
For any $\epsilon > 0$, we can choose $h(T) = T^{-(\frac{1}{2\beta}+\epsilon)}$, which still achieves the rate-optimal choice of Theorem 1. Recalling that $T = n\Delta_n$ and replacing it in (18), we get
$$c\,\sqrt{\Delta_n}\,(n\Delta_n)^{\frac{1}{2\beta}+\epsilon} \le n^{f(a)}, \qquad\text{that is,}\qquad \Delta_n^{\frac{\beta+1+2\epsilon\beta}{2\beta}} \le c\,n^{\frac{2\beta f(a)-1-2\epsilon\beta}{2\beta}}.$$
This condition is satisfied if, for some $\epsilon' > 0$,
$$\Delta_n \le c\,n^{\frac{2\beta f(a)-1}{\beta+1}-\epsilon'}.$$
We can always find an $\epsilon' > 0$ and an $a > 1$ such that $f(a) > \frac{1}{2\beta}$, so that $\frac{2\beta f(a)-1}{\beta+1} > 0$. Thus, condition (v) is satisfied. We can then apply Theorem 5, which directly leads us to (6).

We next turn to the proof of (7). In the proof of Theorem 1 we have shown that $|\mathbb{E}[\hat\mu_{h,T}(x_i)] - \mu(x_i)| \le c\,h(T)^\beta$,
where $h(T) = T^{-\bar a}$ with $\bar a > \frac{1}{2\beta}$. Thus,
$$\sqrt{T}\,\big|\mathbb{E}[\hat\mu_{h,T}(x_i)] - \mu(x_i)\big| \le c\,T^{\frac12}\,T^{-\bar a\beta},$$
which converges to zero. This proves (7) and concludes the desired proof.

4.3 Proof of Theorem 3

The proof of Theorem 3 follows the proof of the lower bound for $d \ge 2$ given in [3]. It is divided into three steps.

Step 1. The first step consists in showing that, given a density function $f$, we can always find a drift function $b_f$ such that $f$ is the unique invariant density of equation (8) with drift coefficient $b = b_f$. We give the statement and proof in dimension $d = 1$, as in Propositions 2 and 3 of [3] this is only done for $d \ge 2$.

Proposition 2. Let $f:\mathbb{R}\to\mathbb{R}$ be a $C^2$ positive probability density satisfying the following conditions.
1. $\lim_{y\to\pm\infty}f(y) = 0$ and $\lim_{y\to\pm\infty}f'(y) = 0$.
2. There exists $0 < \epsilon_1 < \frac{\epsilon_0}{|\gamma|}$, where $\epsilon_0$ is as in Assumption A4, such that, for any $y,z\in\mathbb{R}$, $f(y\pm z) \le \hat c_1\,e^{\epsilon_1|z|}f(y)$.
3. For $\epsilon_1 > 0$ as in 2., there exists $\hat c_2(\epsilon_1) > 0$ such that
$$\sup_{y<0}\frac{1}{f(y)}\int_{-\infty}^yf(w)\,dw < \hat c_2 \qquad\text{and}\qquad \sup_{y>0}\frac{1}{f(y)}\int_y^\infty f(w)\,dw < \hat c_2.$$
4. There exist $0 < \tilde\epsilon < \frac{a^2}{\gamma^2\,c\,\hat c_1\,\hat c_2\,\hat c_3}$ and $R_0 > 0$ such that, for any $|y| > R_0$,
$$\frac{f'(y)}{f(y)} \le -\tilde\epsilon\,\mathrm{sgn}(y),$$
where $c$ is as in Assumption A4. Moreover, there exists $\hat c_3$ such that, for any $y\in\mathbb{R}$, $|f'(y)| \le \hat c_3\,f(y)$.
5. For any $y\in\mathbb{R}$ and $\tilde\epsilon$ as in 4., $|f''(y)| \le \hat c_4\,\tilde\epsilon\,f(y)$.
Then there exists a bounded Lipschitz function $b_f$ which satisfies A2 and such that $f$ is the unique invariant density of equation (8) with drift coefficient $b = b_f$.

Proof. Let $A_d$ be the discrete part of the generator of the diffusion process $X$ solution of (8) and let $A_d^*$ be its adjoint. We define $b_f$ as
$$b_f(x) = \begin{cases}\ \ \dfrac{1}{f(x)}\displaystyle\int_{-\infty}^x\Big(\frac{a^2}{2}f''(w) + A_d^*f(w)\Big)\,dw, & \text{if } x < 0,\\[2mm] -\dfrac{1}{f(x)}\displaystyle\int_x^\infty\Big(\frac{a^2}{2}f''(w) + A_d^*f(w)\Big)\,dw, & \text{if } x > 0,\end{cases}$$
where
$$A_d^*f(x) = \int_{\mathbb{R}_0}\big[f(x-\gamma z) - f(x) + \gamma z f'(x)\big]F(z)\,dz.$$
Then, following Proposition 3 in [3], one can check that $b_f$ is bounded, Lipschitz, and satisfies A2. Moreover, if we replace $b$ by $b_f$ in equation (8), then $f$ is the unique invariant density.

Step 2. The second step consists in defining two probability density functions $f_0$ and $f_1$ in $\mathcal{H}_1(\beta,L)$.

We first define $f_0(y) = c_\eta\,f(\eta|y|)$, where $\eta\in(0,1)$, $c_\eta$ is such that $\int f_0 = 1$, and
$$f(x) = \begin{cases}e^{-|x|}, & \text{if } |x|\ge 1,\\ \in[e^{-1},1], & \text{if } \tfrac12 < |x| < 1,\\ 1, & \text{if } |x|\le\tfrac12.\end{cases} \qquad (19)$$
Moreover, $f$ is a $C^2$ function such that, for any $x\in\mathbb{R}$, $\frac12 e^{-|x|} \le f(x) \le 2e^{-|x|}$, $|f'(|x|)| \le 2e^{-|x|}$, and $|f''(|x|)| \le 2e^{-|x|}$.

It is easy to see that $\eta$ can be chosen small enough so that $f_0\in\mathcal{H}_1(\beta,L)$. Moreover, $f_0$ satisfies the assumptions of Proposition 2 with $\hat c_1 = 4$, $\epsilon_1 = \eta$, $\hat c_3 = 4$, $\hat c_4 = 16$, and with $\hat c_2$ and $R_0$ depending only on $\eta$. In order for the condition on $\tilde\epsilon$ in assumption 4. to be satisfied we need $c < \frac{a^2}{\gamma^2}$ (up to the constants above). This means that the jumps have to integrate an exponential function; the bound depends on the coefficients $a$ and $\gamma$, and so it depends only on the model.

Therefore, $b_0 := b_{f_0}$ belongs to $\Sigma(\beta,L)$. Recall that $b$ belongs to $\Sigma(\beta,L)$ if and only if $f$ belongs to $\mathcal{H}_1(\beta,L)$ and $b$ is bounded, Lipschitz and satisfies the drift condition A2.

We next define
$$f_1(x) = f_0(x) + \frac{1}{M_T}\,\hat K\Big(\frac{x - x_0}{H(T)}\Big), \qquad (20)$$
where $x_0\in\mathbb{R}$ is fixed and $\hat K:\mathbb{R}\to\mathbb{R}$ is a $C^\infty$ function with support on $[-1,1]$ satisfying
$$\hat K(0) = 1, \qquad \int_{-1}^1\hat K(z)\,dz = 0.$$
$M_T$ and $H(T)$ will be calibrated later and satisfy $M_T\to\infty$ and $H(T)\to 0$ as $T\to\infty$.

Then it can be shown, as in [3, Lemma 3], that if there exists $\epsilon > 0$ such that, for $T$ sufficiently large,
$$\frac{1}{M_T} \le \epsilon\,H(T)^\beta \qquad\text{and}\qquad \frac{1}{H(T)} = o(M_T) \qquad (21)$$
as $T\to\infty$, then $b_1 := b_{f_1}$ belongs to $\Sigma(\beta,L)$ for $T$ sufficiently large.
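Before moving to Step 3, the following minimal sketch illustrates numerically the construction of Step 2. The bump $\hat K$ below is one concrete admissible choice (an assumption made for the sketch, not the function used in the paper): it is $C^\infty$, supported on $[-1,1]$, with $\hat K(0) = 1$ and $\int\hat K = 0$. Likewise, $f_0$ is replaced by a simple smooth positive density for illustration, whereas the paper uses the specific $C^2$ construction (19); the calibration $M_T = \sqrt{T}$ and $H(T)$ constant is the one chosen at the end of the proof below.

```python
# Illustration only: the two-hypothesis construction f1 = f0 + (1/M_T) K_hat((x - x0)/H).
# K_hat and f0 below are assumptions made for this sketch.
import numpy as np

def psi(z):
    """Standard C^infty bump supported on (-1, 1)."""
    z = np.asarray(z, dtype=float)
    inside = np.abs(z) < 1.0
    safe = np.where(inside, z, 0.0)          # avoid division by zero outside the support
    return np.where(inside, np.exp(-1.0 / (1.0 - safe ** 2)), 0.0)

def K_hat(z):
    """C^infty, supported on [-1,1], K_hat(0) = 1, integral 0: (e/2)(3 psi(3z) - psi(z))."""
    return 0.5 * np.e * (3.0 * psi(3.0 * z) - psi(z))

def f0(x):
    """Placeholder smooth positive density (standard Gaussian), for illustration only."""
    return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

T = 1e4
M_T = np.sqrt(T)            # calibration used at the end of the proof of Theorem 3
H = 1.0                     # H(T) constant, as in the proof of Theorem 3
x0 = 0.0

def f1(x):
    return f0(x) + K_hat((x - x0) / H) / M_T

x = np.linspace(-5.0, 5.0, 200_001)
dx = x[1] - x[0]
print("integral of K_hat (should be ~0):", (K_hat(x) * dx).sum())
print("K_hat(0) (should be 1):", float(K_hat(np.array(0.0))))
print("integral of f1 (should be ~1):", (f1(x) * dx).sum())
print("sup |f1 - f0| (should match 1/M_T):", np.abs(f1(x) - f0(x)).max(), 1.0 / M_T)
```

Because $\int\hat K = 0$, the perturbed function $f_1$ still integrates to one, while its sup-distance from $f_0$ is exactly of order $1/M_T$, which is what drives the lower bound obtained in Step 3.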
Step 3. As $b_0, b_1\in\Sigma(\beta,L)$, we can write
$$\mathcal{R}(\tilde\mu_T(x)) \ge \frac12\,\mathbb{E}_1^{(T)}\big[(\tilde\mu_T(x) - f_1(x))^2\big] + \frac12\,\mathbb{E}_0^{(T)}\big[(\tilde\mu_T(x) - f_0(x))^2\big],$$
where $\mathbb{E}_i^{(T)}$ denotes the expectation with respect to $b_i$. Then, following [3] and using Girsanov's formula, we can show that if
$$\sup_{T\ge T_0}\frac{T}{M_T^2\,H(T)} < \infty, \qquad (22)$$
then, for sufficiently large $T$,
$$\mathcal{R}(\tilde\mu_T(x)) \ge \frac{C_\lambda}{M_T^2}, \qquad (23)$$
where the constant $C_\lambda$ depends only on the constants $C$ and $\lambda$ of Lemma 4 of [3] and does not depend on the point $x$. We finally look for the largest choice of $M_T$ for which both (21) and (22) hold true. It suffices to choose $M_T = \sqrt{T}$ and $H(T)$ a constant to conclude the proof of Theorem 3.

Remark 1. The two-hypothesis method used above does not work to prove the two-dimensional lower bound of Theorem 4. Indeed, proceeding as above, we can define
$$f_1(x) = f_0(x) + \frac{1}{M_T}\,\hat K\Big(\frac{x_1 - x_{0,1}}{H_1(T)}\Big)\,\hat K\Big(\frac{x_2 - x_{0,2}}{H_2(T)}\Big).$$
Then, it is possible to show that (23) still holds and, therefore, we should take $M_T$ such that $M_T^2 = \frac{T}{\log T}$. On the other hand, condition (22) now becomes
$$\sup_{T\ge T_0}\frac{T}{M_T^2}\Big(\frac{H_1(T)}{H_2(T)} + \frac{H_2(T)}{H_1(T)}\Big) < \infty.$$
The optimal choice of the bandwidth is achieved for $H_1(T) = H_2(T)$, which yields $\sup_{T\ge T_0}\frac{T}{M_T^2} < \infty$; this is clearly not satisfied when $M_T^2 = \frac{T}{\log T}$.

4.4 Proof of Theorem 4

We will apply Lemma 2 with $\psi_T := v\sqrt{\frac{\log T}{T}}$, where $v > 0$ will be chosen later.

Step 1. As in the one-dimensional case, the first step consists in showing that, given a density function $f$, we can always find a drift function $b_f$ such that $f$ is the unique invariant density of equation (8) with drift coefficient $b = b_f$; this is proved in Propositions 2 and 3 of [3]. We remark that condition (10) is needed in Proposition 3 to ensure that the terms on the diagonal of the volatility coefficient $a$ dominate the others, which is crucial to get that $b_f$ satisfies the drift condition A2.

Step 2. We next define the probability density $f_0\in\mathcal{H}_2(\beta,\mathcal{L})$, the finite set $J_T$, and the set of probability densities $\{f_j,\ j\in J_T\}\subset\mathcal{H}_2(\beta,\mathcal{L})$ needed in order to apply Lemma 2.

We first define $f_0$ as $\pi_0$ in Section 7.2 of [3], which is the two-dimensional version of the $f_0$ defined in the proof of Theorem 3, that is,
$$f_0(x) = c_\eta\,f\big(\eta\,(aa^T)^{-1}_{11}|x_1|\big)\,f\big(\eta\,(aa^T)^{-1}_{22}|x_2|\big), \qquad x = (x_1,x_2)\in\mathbb{R}^2, \qquad (24)$$
where $f$ is as in (19). The density $f_0$ belongs to $\mathcal{H}_2(\beta,\mathcal{L})$ by construction. We then set
$$J_T := \Big\{1,\dots,\Big\lfloor\frac{1}{\sqrt{H_1}}\Big\rfloor\Big\}\times\Big\{1,\dots,\Big\lfloor\frac{1}{\sqrt{H_2}}\Big\rfloor\Big\}, \qquad (25)$$
where $H_1 := H_1(T)$ and $H_2 := H_2(T)$ are two quantities that converge to $0$ as $T\to\infty$ and need to be calibrated.

Finally, for $j := (j_1,j_2)\in J_T$, we define $x_j := (x_{j,1},x_{j,2}) = (j_1H_1, j_2H_2)$ and we set
$$f_j(x) := f_0(x) + v\,\sqrt{\frac{\log T}{T}}\,\hat K\Big(\frac{x_1 - x_{j,1}}{H_1}\Big)\,\hat K\Big(\frac{x_2 - x_{j,2}}{H_2}\Big),$$
where recall that $v > 0$ and $\hat K$ is as in (20).

Acting as in Lemma 3 of [3], recalling that the rate $\frac{1}{M_T}$ therein is now replaced by $v\sqrt{\frac{\log T}{T}}$ (see also points 1. and 3. in the proof of Proposition 3 below), it is easy to see that if there exists $\epsilon > 0$ such that, for large $T$,
$$v\sqrt{\frac{\log T}{T}} \le \epsilon\,H_1^{\beta_1}, \qquad v\sqrt{\frac{\log T}{T}} \le \epsilon\,H_2^{\beta_2}, \qquad (26)$$
then, for any $j\in J_T$ and large $T$, $b_j\in\Sigma(\beta,\mathcal{L})$; in particular, $f_j\in\mathcal{H}_2(\beta,\mathcal{L})$. Therefore, $\{f_j,\ j\in J_T\}\subset\mathcal{H}_2(\beta,\mathcal{L})$ and, by construction, for $j\neq k$,
$$\|f_j - f_k\|_\infty \ge v\sqrt{\frac{\log T}{T}}\,\hat K(0)^2 = v\sqrt{\frac{\log T}{T}} = \psi_T,$$
which proves the first condition of Lemma 2.

Step 3. We are left to show the remaining conditions of Lemma 2. The absolute continuity $\mathbb{P}_j^{(T)}\ll\mathbb{P}_0^{(T)}$ and the expression for $\frac{d\mathbb{P}_j^{(T)}}{d\mathbb{P}_0^{(T)}}(X^T)$ are both obtained by the Girsanov formula, as in Lemma 4 of [3].
We have
$$KL\big(\mathbb{P}_j^{(T)},\mathbb{P}_0^{(T)}\big) = \mathbb{E}_j^{(T)}\Big[\log\frac{f_j}{f_0}(X_0)\Big] + \frac12\,\mathbb{E}_j^{(T)}\Big[\int_0^T\big|a^{-1}\big(b_0(X_u) - b_j(X_u)\big)\big|^2\,du\Big],$$
where the law of $X^T = (X_t)_{t\in[0,T]}$ under $\mathbb{P}_j^{(T)}$ is the one of the solution to equation (8) with $b = b_j$.

By the definition of the $f_j$'s, it is easy to see that the first term is $o(1)$ as $T\to\infty$. In fact, as $\hat K$ is supported in $[-1,1]$,
$$\mathbb{E}_j^{(T)}\Big[\log\frac{f_j}{f_0}(X_0)\Big] = \int_{\mathbb{R}^2}\log\Big(1 + \frac{v\sqrt{\frac{\log T}{T}}\,\hat K\big(\frac{x_1-x_{j,1}}{H_1}\big)\hat K\big(\frac{x_2-x_{j,2}}{H_2}\big)}{f_0(x)}\Big)\,f_j(x)\,dx \le \Big|\log\Big(1 + c^*\,v\sqrt{\frac{\log T}{T}}\,\|\hat K\|_\infty^2\Big)\Big|,$$
which tends to zero as $T\to\infty$, where $c^* := \frac{4}{c_\eta}e^{4\eta k}$, $c_\eta$ is the normalization constant introduced in the definition of $f_0$, and $k := \max_{i=1,2}(aa^T)^{-1}_{ii}$. In fact, this follows from the definition of $f_0$ in (24): since $f(\cdot) \ge \frac12 e^{-|\cdot|}$, we obtain
$$\frac{1}{f_0(x)} \le \frac{4}{c_\eta}\,e^{\eta(aa^T)^{-1}_{11}|x_1|}\,e^{\eta(aa^T)^{-1}_{22}|x_2|} \le \frac{4}{c_\eta}\,e^{\eta k\,(H_1 + |x_{j,1}| + H_2 + |x_{j,2}|)},$$
where we have also used the fact that, as $\hat K$ is supported in $[-1,1]$, $x\in[x_{j,1}-H_1,x_{j,1}+H_1]\times[x_{j,2}-H_2,x_{j,2}+H_2]$. Finally, by the definition of $x_j$ and the fact that $H_i\to 0$ as $T\to\infty$ for $i=1,2$ (so that for $T$ large enough they are smaller than 1), we get
$$\frac{1}{f_0(x)} \le \frac{4}{c_\eta}\,e^{4\eta k} = c^* \qquad\text{for any } x\in[x_{j,1}-H_1,x_{j,1}+H_1]\times[x_{j,2}-H_2,x_{j,2}+H_2]. \qquad (27)$$

Regarding the second term, using the stationarity of the process $X^T$, we have
$$\mathbb{E}_j^{(T)}\Big[\int_0^T\big|a^{-1}(b_0(X_u) - b_j(X_u))\big|^2\,du\Big] = T\int_{\mathbb{R}^2}\big|a^{-1}(b_0(x) - b_j(x))\big|^2\,f_j(x)\,dx.$$
Since $f_j = f_0$ outside the rectangle around $x_j$ and, by (27), $f_j \le 2f_0$ on it for $T$ large, it suffices to bound the corresponding integral against $f_0$. The following asymptotic bound will be proved at the end of this section.

Proposition 3. For $T$ large enough,
$$\int_{\mathbb{R}^2}\big|a^{-1}(b_0(x) - b_j(x))\big|^2\,f_0(x)\,dx \le c\,(c^*)^2\,k^2\,v^2\,H_1H_2\Big(\frac{1}{H_1^2} + \frac{1}{H_2^2}\Big)\frac{\log T}{T},$$
for a numerical constant $c > 0$.

Taking the optimal choice of the bandwidth in Proposition 3, which is $H_1 = H_2$, we get that
$$\int_{\mathbb{R}^2}\big|a^{-1}(b_0(x) - b_j(x))\big|^2\,f_0(x)\,dx \le 2c\,(c^*)^2\,k^2\,v^2\,\frac{\log T}{T}.$$
In particular, assuming without loss of generality that $\beta_1\le\beta_2$, we choose $H_1 = H_2 = \big(\frac{\log T}{T}\big)^{a}$ with $a \le \frac{1}{2\beta_2}$, so that condition (26) is satisfied. We therefore get
$$KL\big(\mathbb{P}_j^{(T)},\mathbb{P}_0^{(T)}\big) \le 2c\,(c^*)^2\,k^2\,v^2\,\log T \le \frac{4c\,(c^*)^2\,k^2\,v^2}{a}\,\log(|J_T|),$$
the last estimate being a consequence of the fact that, by construction,
$$\log(|J_T|) \ge a\,\log\Big(\frac{T}{\log T}\Big) = a\,\log(T)\,(1+o(1)).$$
It is therefore enough to choose $v$ such that $\frac{4c\,(c^*)^2k^2v^2}{a} < \frac18$ (i.e. $v^2 < \frac{a}{32\,c\,(c^*)^2k^2}$) and apply Lemma 2 to conclude the proof of Theorem 4.

4.5 Proof of Proposition 3

The proof of Proposition 3 follows similarly to that of Proposition 4 of [3]. Indeed, we first define the set
$$K_T^j := [x_{j,1} - H_1,\ x_{j,1} + H_1]\times[x_{j,2} - H_2,\ x_{j,2} + H_2]$$
and then show the following points for $T$ large enough:
1. For any $x\in K_T^{j,c}$ and $i\in\{1,2\}$: $|b_j^i(x) - b_0^i(x)| \le c_1\,v\sqrt{\frac{\log T}{T}}$.
2. For any $i\in\{1,2\}$: $\int_{K_T^{j,c}}|b_j^i(x) - b_0^i(x)|^2\,f_0(x)\,dx \le c_1\,v^2\,\frac{\log T}{T}\,H_1H_2$.
3. For any $x\in K_T^j$ and $i\in\{1,2\}$: $|b_j^i(x) - b_0^i(x)| \le c_1\,c^*\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big)$.
The proof of the first two points follows exactly the one of Proposition 4 of [3], remarking that
$$d_T(x) := \pi_1(x) - \pi_0(x) = \frac{1}{M_T}\prod_{l=1}^dK\Big(\frac{x_l - x_{0,l}}{h_l(T)}\Big)$$
in [3] is now replaced by
$$d_T^j(x) := f_j(x) - f_0(x) = v\sqrt{\frac{\log T}{T}}\,\hat K\Big(\frac{x_1 - x_{j,1}}{H_1}\Big)\,\hat K\Big(\frac{x_2 - x_{j,2}}{H_2}\Big),$$
and that the set $K_T := [x_{0,1} - h_1(T),\ x_{0,1} + h_1(T)]\times\cdots\times[x_{0,d} - h_d(T),\ x_{0,d} + h_d(T)]$ introduced in the $d$-dimensional framework is now replaced by $K_T^j$.
We recall that the kernel $K$ appearing in the definition of $d_T$ in [3] and our $\hat K$ are exactly the same kernel function. The proof of Proposition 4 of [3] is based on the fact that $d_T(x)$ and its derivatives are null for $x\in K_T^c$. In the same way, $d_T^j(x)$ and its derivatives are null for $x\in K_T^{j,c}$. Then, acting as in [3], it is easy to see that the first two points above hold true.

Comparing the third point above with the third point of Proposition 4 of [3], it is clear that our goal is to make explicit the constant $c_1$. Keeping the notation of [3], we first introduce the quantities
$$\tilde I_1^i[f](x) := \frac12\sum_{l=1}^2(aa^T)_{il}\,\frac{\partial f}{\partial x_l}(x), \qquad \tilde I_2^i[f](x) := \int_{-\infty}^{x_i}A^*_{d,i}f(w_i)\,dw,$$
where $w_i$ denotes the vector $x$ with its $i$-th component replaced by $w$, and we set $\tilde I^i[f](x) := \tilde I_1^i[f](x) + \tilde I_2^i[f](x)$. According to the definition of $b_0$ and $b_j$, we have
$$b_0^i(x) = \frac{1}{f_0(x)}\,\tilde I^i[f_0](x), \qquad b_j^i(x) = \frac{1}{f_j(x)}\,\tilde I^i[f_j](x).$$
As $f\mapsto\tilde I^i[f]$ is linear, we deduce that
$$b_j^i(x) = \frac{1}{f_j(x)}\,\tilde I^i[f_j](x) = \frac{1}{f_j(x)}\,\tilde I^i[f_0](x) + \frac{1}{f_j(x)}\,\tilde I^i[d_T^j](x). \qquad (28)$$
Therefore,
$$b_j^i - b_0^i = \Big(\frac{1}{f_j} - \frac{1}{f_0}\Big)\tilde I^i[f_0] + \frac{1}{f_j}\,\tilde I^i[d_T^j] = \frac{f_0 - f_j}{f_jf_0}\,\tilde I^i[f_0] + \frac{1}{f_j}\,\tilde I^i[d_T^j] = -\frac{d_T^j}{f_j}\,b_0^i + \frac{1}{f_j}\,\tilde I^i[d_T^j].$$
We need to evaluate this difference on the compact set $K_T^j$. For this, we will use the fact that $f_j = f_0 + d_T^j$ and obtain a lower bound away from 0. Specifically, from the definition of $d_T^j$, we get
$$\|d_T^j\|_\infty \le v\sqrt{\frac{\log T}{T}}\,\|\hat K\|^2_\infty = v\sqrt{\frac{\log T}{T}}. \qquad (29)$$
In particular,
$$f_j \ge f_0 - |d_T^j| \ge f_0 - v\sqrt{\frac{\log T}{T}} \ge \frac{f_0}{2},$$
since $\sqrt{\frac{\log T}{T}}\to 0$ as $T\to\infty$ and, by (27), $f_0 \ge \frac{1}{c^*}$ on $K_T^j$, so that for $T$ large enough we have $v\sqrt{\frac{\log T}{T}} \le \frac{f_0}{2}$ on $K_T^j$. Then, for any $x\in K_T^j$, using (27) we have
$$\frac{1}{f_j(x)} \le \frac{2}{f_0(x)} \le 2c^*.$$
Moreover, as $b_0$ is bounded, we deduce that, for all $x\in K_T^j$,
$$|b_j^i(x) - b_0^i(x)| \le 2c^*\,v\,\|b_0^i\|_\infty\sqrt{\frac{\log T}{T}} + 2c^*\,\big|\tilde I^i[d_T^j](x)\big|. \qquad (30)$$
We therefore need to evaluate $\tilde I^i[d_T^j](x) = \tilde I_1^i[d_T^j](x) + \tilde I_2^i[d_T^j](x)$ on $K_T^j$. As
$$\Big\|\frac{\partial d_T^j}{\partial x_l}\Big\|_\infty \le \frac{c\,v}{H_l}\sqrt{\frac{\log T}{T}}, \qquad (31)$$
it clearly follows that
$$\big|\tilde I_1^i[d_T^j](x)\big| \le c\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big). \qquad (32)$$
Regarding $\tilde I_2^i[d_T^j](x)$, we can act exactly as in the third point of Proposition 4 of [3]. As $x\in K_T^j$, $x_i\in[x_{j,i} - H_i,\ x_{j,i} + H_i]$ for $i = 1,2$. Therefore, using also the definition of $d_T^j$, the integral in $\tilde I_2^i$ is between $x_{j,i} - H_i$ and $x_i$. We enlarge the domain of integration to $[x_{j,i} - H_i,\ x_{j,i} + H_i]$ and then, appealing to (29) and (31) and to the fact that the intensity of the jumps is finite, we get
$$\big|\tilde I_2^i[d_T^j](x)\big| \le \int_{x_{j,i}-H_i}^{x_{j,i}+H_i}\int_{\mathbb{R}^2_0}\Big|d_T^j(w_i-\gamma z) - d_T^j(w_i) + (\gamma\cdot z)_i\,\frac{\partial}{\partial x_i}d_T^j(w_i)\Big|\,F(z)\,dz\,dw$$
$$\le 2\Big(\int_{\mathbb{R}^2_0}F(z)\,dz\Big)\int_{x_{j,i}-H_i}^{x_{j,i}+H_i}\|d_T^j\|_\infty\,dw + \int_{x_{j,i}-H_i}^{x_{j,i}+H_i}\int_{\mathbb{R}^2_0}|(\gamma\cdot z)_i|\,\Big\|\frac{\partial d_T^j}{\partial x_i}\Big\|_\infty\,F(z)\,dz\,dw \le c\,H_i\sqrt{\frac{\log T}{T}} + c\,\sqrt{\frac{\log T}{T}},$$
for some $c > 0$ (the $z$-integrals are finite by A3, A4 and the finiteness of the jump intensity). Using this together with (30) and (32), it follows that, for any $x\in K_T^j$,
$$|b_j^i(x) - b_0^i(x)| \le 2c^*\,v\,\|b_0^i\|_\infty\sqrt{\frac{\log T}{T}} + 2c\,c^*\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big) + 2c\,c^*(1+H_i)\sqrt{\frac{\log T}{T}} \le c_1\,c^*\,k\,v\sqrt{\frac{\log T}{T}}\Big(\frac{1}{H_1} + \frac{1}{H_2}\Big),$$
where the last inequality is a consequence of the fact that, for $i\in\{1,2\}$, $H_i\to 0$ as $T\to\infty$ and so, for $T$ large enough, all the other terms are negligible compared to the second one. Hence, the three points listed at the beginning of the proof hold true.
We deduce that
$$\int_{\mathbb{R}^2}\big|b_0(x) - b_j(x)\big|^2\,f_0(x)\,dx = \int_{K_T^j}\big|b_0(x) - b_j(x)\big|^2\,f_0(x)\,dx + \int_{K_T^{j,c}}\big|b_0(x) - b_j(x)\big|^2\,f_0(x)\,dx$$
$$\le c\,v^2\,\frac{\log T}{T}\,H_1H_2 + c\,(c^*)^2\,k^2\,v^2\,\frac{\log T}{T}\Big(\frac{1}{H_1^2} + \frac{1}{H_2^2}\Big)\big|K_T^j\big|.$$
We recall that $|K_T^j| = 4H_1H_2$ and that, as $T\to\infty$, $H_i\to 0$. Thus, the first term is negligible compared to the second one. The desired result follows.

References

[1] Applebaum, D. (2009). Lévy Processes and Stochastic Calculus. Cambridge University Press.
[2] Amorino, C., Gloter, A. (2021). Invariant density adaptive estimation for ergodic jump diffusion processes over anisotropic classes. Journal of Statistical Planning and Inference 213, 106-129.
[3] Amorino, C. (2020). Minimax rate of estimation for the stationary distribution of jump-processes over anisotropic Hölder classes. arXiv preprint arXiv:2011.11994.
[4] Amorino, C., Dion, C., Gloter, A., Lemler, S. (2020). On the nonparametric inference of coefficients of self-exciting jump-diffusion. arXiv preprint arXiv:2011.12387.
[5] Banon, G. (1978). Nonparametric identification for diffusion processes. SIAM J. Control Optim. 16, 380-395.
[6] Banon, G., Nguyen, H.T. (1981). Recursive estimation in diffusion model. SIAM J. Control Optim. 19, 676-685.
[7] Blanke, D. (1996). Estimation de la densité pour des trajectoires non directement observables. Publ. Inst. Statist. Univ. Paris 40, 21-36.
[8] Barndorff-Nielsen, O. E., Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J. R. Stat. Soc., Ser. B, Stat. Methodol. 63, 167-241.
[9] Bosq, D. (1997). Parametric rates of nonparametric estimators and predictors for continuous time processes. Ann. Statist. 25, 982-1000.
[10] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes (second edition). Lecture Notes in Statistics 110, New York: Springer-Verlag.
[11] Bosq, D. (1998). Minimax rates of density estimators for continuous time processes. Sankhya Ser. A 60, 18-28.
[12] Bosq, D., Davydov, Yu. (1999). Local time and density estimation in continuous time. Math. Methods Statist. 8, 22-45.
[13] Bosq, D., Merlevède, F., Peligrad, M. (1999). Asymptotic normality for density kernel estimators in discrete and continuous time. J. Multivar. Anal. 68, 78-95.
[14] Cheze-Payaud, N. (1994). Nonparametric regression and prediction for continuous-time processes. Publ. Inst. Statist. Univ. Paris 38, 37-58.
[15] Castellana, J. V., Leadbetter, M. R. (1986). On smoothed probability density estimation for stationary processes. Stoch. Process. Their Appl. 21(2), 179-193.
[16] Chen, Z. Q., Hu, E., Xie, L., Zhang, X. (2017). Heat kernels for non-symmetric diffusion operators with jumps. J. Differ. Equ. 263(10), 6576-6634.
[17] Comte, F., Lacour, C. (2013). Anisotropic adaptive kernel deconvolution. Annales de l'IHP Probabilités et Statistiques 49(2), 569-609.
[18] Comte, F., Merlevède, F. (2005). Super optimal rates for nonparametric density estimation via projection estimators. Stoch. Process. Their Appl. 115, 797-826.
[19] Delattre, S., Gloter, A., Yoshida, N. (2020). Rate of estimation for the stationary distribution of stochastic damping Hamiltonian systems with continuous observations. arXiv preprint arXiv:2001.10423.
[20] Delecroix, M. (1980). Sur l'estimation des densités d'un processus stationnaire.