Nonparametric inference for discretely sampled Lévy processes
Shota Gugushvili
Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
[email protected]
October 30, 2018
Abstract
Given a sample from a discretely observed Lévy process X = (X_t)_{t≥0} of finite jump activity, the problem of nonparametric estimation of the Lévy density ρ corresponding to the process X is studied. An estimator of ρ is proposed that is based on a suitable inversion of the Lévy–Khintchine formula and a plug-in device. The main results of the paper deal with upper risk bounds for estimation of ρ over suitable classes of Lévy triplets. The corresponding lower bounds are also discussed.

Keywords: Empirical characteristic function; empirical process; Fourier inversion; Lévy density; Lévy process; maximal inequality; mean square error.
AMS subject classification:

1 Introduction
Recent years have witnessed a great revival of interest in L´evy processes,which is primarily due to the fact that they have found numerous applica-tions in various fields. The main interest has been in mathematical finance,see e.g. [28] for a detailed treatment and many references, however L´evy pro-cesses obtained due attention also in queueing, telecommunications, extremevalue theory, quantum theory and many others. A thorough exposition ofthe fundamental properties of L´evy processes can be found e.g. in [8], [44]and [52].It is well-known that L´evy processes have a close link with infinitely di-visible distributions: if X = ( X t ) t ≥ is a L´evy process, then its marginaldistributions are all infinitely divisible and are determined by the distri-bution of X ∆ , where ∆ > µ, one can construct a L´evy process X = ( X t ) t ≥ , such that P X ∆ = µ, cf. Theorem 7.10 in [52]. Hence the law ofthe process X can be uniquely characterised by the characteristic functionof X ∆ , where ∆ > φ X ∆ of X ∆ can be written as φ X ∆ ( t ) = e ψ ∆ ( t ) , where the exponent ψ ∆ , called the characteristic or L´evy exponent, is givenby ψ ∆ ( t ) = ∆ iγ t − ∆ 12 σ t + ∆ Z R \{ } ( e itx − − itx [ | x |≤ ) ν ( dx ) , (1)see Theorem 8.1 of [52]. Here γ ∈ R , σ ≥ , and ν is a measure concentratedon R \{ } , such that R R \{ } (1 ∧ x ) ν ( dx ) < ∞ . This measure is called the L´evymeasure, while the triple ( γ , σ , ν ) is referred to as the characteristic or L´evytriplet of X. The parameter γ is called a drift parameter and a constant σ isa diffusion parameter. The representation in (1) in terms of the L´evy tripletis unique. It then follows that the L´evy triplet determines uniquely the law ofany L´evy process. Therefore, many statistical inference problems for L´evyprocesses can be reduced to inference on the corresponding characteristictriplets.Until quite recently most of the existing literature dealt with parametricinference procedures for L´evy processes, see e.g. [2]–[5], [9]–[11], [20], [41],[49], [51] and [59]. However, a nonparametric approach is also possible andarises if one does not impose parametric assumptions on the L´evy measure,or its density, in case the latter exists. A nonparametric approach can givee.g. valuable indications about the shape of the L´evy density. Furthermore,parametric inference for L´evy processes is complicated by the fact that formany L´evy processes their marginal densities are often intractable or not2vailable in closed form. This makes the implementation of such a standardparameter estimation method as the maximum likelihood method difficult.We refer e.g. to [1], [13]–[15], [21], [23]–[26], [29], [34], [38], [42]–[43], [48], [58],as well as the proceedings [40] and references therein for a nonparametricapproach to inference for L´evy processes.In the present work we will assume that the L´evy measure ν has a finitetotal mass, i.e. ν ( R ) < ∞ , and that it has a density ρ. In essence thismeans that the L´evy process that we sample from is a sum of a linear drift,a rescaled Brownian motion and a compound Poisson process. Thus thismodel is related to Merton’s model of an asset price, see [46]. Nonparametricinference for a similar model was considered in [6], [21] and [38].Since in our case ν ( R ) < ∞ , the L´evy-Khintchine exponent can be rewrit-ten as ψ ∆ ( t ) = ∆ iγt − ∆ 12 σ t + ∆ Z ∞−∞ ( e itx − ρ ( x ) dx. (2)The triple ( γ, σ , ρ ) is again referred to as a L´evy triplet. 
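Because ν(R) < ∞, the observed process is, as noted above, a superposition of a linear drift, a rescaled Brownian motion and a compound Poisson process. The following minimal sketch simulates unit-time increments of such a process, which is the kind of data the estimators below are computed from; the parameter values and the choice of a standard normal jump size density f are purely illustrative and are not taken from the paper.

```python
import numpy as np

def levy_increments(n, gamma=0.1, sigma=0.3, nu_R=1.0, seed=0):
    """Simulate n unit-time increments Z_j = X_j - X_{j-1} of a finite-activity
    Levy process X_t = gamma*t + sigma*W_t + compound Poisson part.
    The jump size density f is taken standard normal here (illustrative only)."""
    rng = np.random.default_rng(seed)
    gauss_part = gamma + sigma * rng.standard_normal(n)   # drift plus diffusion
    n_jumps = rng.poisson(nu_R, size=n)                    # number of jumps per unit interval
    jump_part = np.array([rng.standard_normal(k).sum() for k in n_jumps])
    return gauss_part + jump_part

Z = levy_increments(5000)   # increments Z_1, ..., Z_n used by the estimators below
```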
Note that γ in (2)differs from γ in (1).Suppose that the L´evy process X = ( X t ) t ≥ is observed at discrete timeinstances ∆ , , . . . , n ∆ , with ∆ kept fixed. This sampling case is usuallyreferred to as the low frequency data case. For the case when ∆ is allowed todepend on n and ∆ → , n ∆ → ∞ as n → ∞ see e.g. [25], [26] or [37]. In thiscase it is customary to talk about high frequency data case. Returning tothe case with a fixed ∆ , by a rescaling argument, without loss of generality,we can take ∆ = 1 . Based on observations X , . . . , X n , our goal in this paperis to estimate nonparametrically the L´evy density ρ. Notice that this is aninverse problem in that ρ is associated with jump sizes of a L´evy processand their intensity, the jumps themselves are not directly observable underthe present sampling scheme, and consequently ρ has to be estimated fromindirect observations X , . . . , X n . We will base our estimator of ρ on a suitable inversion of φ X . The ideaof expressing the L´evy measure or the L´evy density in terms of φ X and thenreplacing φ X by its natural nonparametric estimator, the empirical charac-teristic function, to obtain a plug-in type estimator for the L´evy measure orthe L´evy density has been successfully applied e.g. in [21], [24], [34], [38], [48]and [58]. The logic behind this approach is that except of some particularcases, e.g. that of the compound Poisson process, see [14] and [15], findingan explicit relationship expressing the L´evy measure or its density directlyin terms of the distribution of X without referring to the Fourier trans-forms is difficult. This hampers the use of a plug-in device, which is one ofthe most popular and useful methods for obtaining estimators in statistics.On the other hand the Fourier approach allows one to cover a large class ofexamples, as shown in the above-mentioned papers.Observe that the model we consider in the present work shares many fea-tures characteristic of a convolution model with partially or totally unknown3rror distribution, see [17], [27], [45] and [47]. For instance, the Gaussiancomponents in X , . . . , X n in our case will play a role similar to the mea-surement error in those papers, in case the latter has a normal distribution.We proceed to the construction of an estimator of ρ. First by differ-entiating the L´evy-Khintchine formula we will derive a suitable inversionformula for ρ. Suppose that R R x ρ ( x ) dx < ∞ . Since ρ has a finite secondmoment, so does X by Corollary 25.8 in [52]. Also E [ | X | ] is finite by theCauchy-Schwarz inequality. Hence we can differentiate φ X with respect to t to obtain φ ′ X ( t ) = φ X ( t ) (cid:18) iγ − σ t + i Z ∞−∞ e itx xρ ( x ) dx (cid:19) . (3)Notice that differentiation of R R ( e itx − ρ ( x ) dx under the integral sign isjustified by the dominated convergence theorem, applicable because of ourassumptions on ρ. Next rewrite (3) as φ ′ X ( t ) φ X ( t ) = iγ − σ t + i Z R e itx xρ ( x ) dx, (4)which is possible, because φ X ( t ) = 0 for all t ∈ R , see e.g. Theorem 7 . . t, we get φ ′′ X ( t ) φ X ( t ) − ( φ ′ X ( t )) ( φ X ( t )) = − σ − Z ∞∞ e itx x ρ ( x ) dx, (5)where again we interchanged the differentiation and integration order in therighthand side of (4) to obtain the righthand side of (5). Thus by rearrangingthe terms we have Z ∞−∞ e itx x ρ ( x ) dx = ( φ ′ X ( t )) − φ ′′ X ( t ) φ X ( t )( φ X ( t )) − σ . (6)Suppose that the righthand side is integrable, which is implied by the as-sumption that φ ′′ ρ is integrable. 
Here φ ρ denotes the Fourier transform of ρ. Then by the Fourier inversion argument the relationship x ρ ( x ) = 12 π Z ∞−∞ e − itx ( φ ′ X ( t )) − φ ′′ X ( t ) φ X ( t )( φ X ( t )) − σ ! dt holds. If x = 0 , this yields ρ ( x ) = 12 πx Z ∞−∞ e − itx ( φ ′ X ( t )) − φ ′′ X ( t ) φ X ( t )( φ X ( t )) − σ ! dt, (7)4nd we obtain a desired inversion formula. This formula coincides with theone given in [16] . The formula has to be compared to related inversionformulae given in [24], [26], [48] and [58]. Notice that under stronger mo-ment conditions on X one can perform the differentiation step in the abovederivation not twice, but three times, thereby eliminating σ from (5), andone can obtain an inversion formula of the same type as in (7), but notinvolving σ explicitly, see e.g. [26]. We do not pursue this path, as a studyof asymptotic properties of an estimator of ρ of the same type as we proposebelow based on this different inversion formula would require stronger mo-ment conditions on X , cf. the discussion in the next section. It would alsoinvolve longer and more technical proofs of the asymptotic results. Finally,under certain smoothness assumptions on the L´evy density it would leadto an estimator with worse convergence rate than the one that we proposebelow. See Section 2 for an additional discussion.Denote Z j = X j − X j − and observe that Z , . . . , Z n are i.i.d., whichfollows from the stationary independent increments property of a L´evy pro-cess. Let ˆ φ ( t ) = n − P nj =1 e itZ j . By the strong law of large numbers, forevery fixed t, the empirical characteristic function ˆ φ ( t ) and its derivativeswith respect to t, ˆ φ ′ ( t ) and ˆ φ ′′ ( t ) , converge a.s. to φ X ( t ) , φ ′ X ( t ) and φ ′′ X ( t ) , respectively. Using a plug-in device, a possible estimator of ρ ( x ) could thenbe 12 πx Z ∞−∞ e − itx ( ˆ φ ′ ( t )) ( ˆ φ ( t )) − ˆ φ ′′ ( t )ˆ φ ( t ) − ˆ σ ! dt, (8)where ˆ σ is some estimator of σ . The problem with this ‘estimator’ of ρ isthat in general the integrand in (8) is not integrable. Furthermore, smallvalues of ˆ φ ( t ) might render the estimator numerically unstable, since ˆ φ ( t )appears in the denominator in (8). Therefore, as an estimator of ρ wepropose the following modification of (8),ˆ ρ ( x ) = 12 πx Z ∞−∞ e − itx ( ˆ φ ′ ( t )) ( ˆ φ ( t )) G t − ˆ φ ′′ ( t )ˆ φ ( t ) 1 G t − ˆ σ ! φ w ( ht ) dt. (9)Here φ w denotes the Fourier transform of a kernel function w, while a num-ber h > φ w has a compact support, for instance on [ − , . Wedefine the set G t in (9) by G t = n | ˆ φ ( t ) | ≥ κ n e − Σ / (2 h ) o . (10) [16] contains a more general result valid also for L´evy densities with infinite totalmass. However, the statement of the theorem in [16] mistakenly claims that the L´evydensity ρ is bounded under the assumptions given in [16]. In reality this can in general beascertained only for x ρ ( x ) . Examples (e) and (f) considered in [16] illustrate our point. G t depends on h, as well as a constant Σ and a sequence κ n → G t . A general reason for usingtruncation with 1 G t is a desire of numerical stability, but truncation in (9)will also help in proving the asymptotic results from Section 2. At this pointnotice that we could have also used a “diagonal-out” estimator2 n ( n − X ≤ j 0) instead of ˆ ρ ( x ) . For this modified estimatorwe have E [(ˆ ρ + ( x ) − ρ ( x )) ] ≤ E [(ˆ ρ ( x ) − ρ ( x )) ] and hence its performanceis at least as good as that of ˆ ρ, if the mean square error is used as the6erformance criterion. 
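For concreteness, a minimal numerical sketch of the estimator (9) with the truncation (10) is given below. It assumes the sinc kernel of Condition 2.5 further on, so that φ_w(ht) is simply the indicator of [−1/h, 1/h]; the estimate of σ² is supplied externally, and the bandwidth h, the threshold constant κ_n and the quadrature grid are illustrative choices rather than prescriptions of the paper.

```python
import numpy as np

def rho_hat(x, Z, h, sigma2_hat, Sigma, kappa_n, n_grid=1000):
    """Sketch of the plug-in estimator (9) of the Levy density at a point x != 0:
    rho_hat(x) = 1/(2*pi*x^2) * int e^{-itx} [ (phi'^2/phi^2 - phi''/phi) 1_{G_t}
                                               - sigma2_hat ] phi_w(h t) dt,
    where phi is the empirical characteristic function of the increments Z_j and,
    for the sinc kernel, phi_w(h t) = 1 on [-1/h, 1/h] and 0 outside.
    The grid and the Riemann-sum quadrature are illustrative."""
    t = np.linspace(-1.0 / h, 1.0 / h, n_grid)
    E = np.exp(1j * np.outer(t, Z))                  # e^{i t Z_j}, shape (n_grid, n)
    phi = E.mean(axis=1)                             # empirical characteristic function
    dphi = (1j * Z * E).mean(axis=1)                 # its first derivative in t
    ddphi = (-(Z ** 2) * E).mean(axis=1)             # its second derivative in t
    # truncation set (10): keep only t where |phi(t)| is not too small
    G = np.abs(phi) >= kappa_n * np.exp(-Sigma ** 2 / (2 * h ** 2))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(G, dphi ** 2 / phi ** 2 - ddphi / phi, 0.0)
    integrand = np.exp(-1j * t * x) * (ratio - sigma2_hat)
    dt = t[1] - t[0]
    return float((integrand.sum() * dt).real) / (2 * np.pi * x ** 2)
```

The constants Σ and κ_n enter only through the threshold in (10); their roles are made precise in the conditions of Section 2.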
We restrict our attention to studying the estimatorˆ ρ only.The structure of the paper is as follows: in the next section we willstudy the asymptotic behaviour of the mean square error of the proposedestimator of ρ. In particular we will derive convergence rates of our estimatorover appropriate classes of L´evy triplets and discuss the corresponding lowerbounds for estimation of ρ. The section is concluded with a discussion on theobtained results and possible extensions. The proofs of results from Section2 are collected in Section 3. We first formulate conditions that will be used to establish asymptotic prop-erties of the estimator ˆ ρ. We also supply some comments on these conditions.Introduce a jump size density f ( x ) := ρ ( x ) /ν ( R ) . Condition 2.1. Let the unknown L´evy density ρ belong to the class W ( β, L, L ′ , L ′′ , K, Λ) = n ρ : ρ ( x ) = ν ( R ) f ( x ) , f is a probability density , Z ∞−∞ | t | β | φ f ( t ) | dt ≤ L, | φ f ( t ) | ≤ L ′ | t | β , | φ ′ f ( t ) | ≤ L ′′ | t | β , Z ∞−∞ x f ( x ) dx ≤ K,φ ′′ f is integrable ,ν ( R ) ∈ (0 , Λ] o , where β, L, L ′ , L ′′ , K and Λ are strictly positive numbers. This condition is similar to the one given in [38] and we refer to the latterfor additional discussion. When β is an integer, the integrability conditionon φ f in Condition 2.1 is roughly equivalent to f having a derivative oforder β. The moment condition on f, and consequently on ρ, is admittedlystrong, but on the other hand in mathematical finance it is customary toassume that ρ has a finite exponential moment. The moment conditionin Condition 2.1 is used to prove an appropriate maximal inequality for ˆ φ and its derivatives, see Theorem 2.2, which constitutes one of the importantworking tools of the paper. Condition 2.2. Let σ be such that σ ∈ [0 , Σ] , where Σ is a strictly positivenumber. , that is to say when σ = 0 is known beforehand,we refer to [24] and [34]. Observe that in general σ determines how fast thecharacteristic function φ X decays at plus and minus infinity, because as itis easy to see, one has | φ X ( t ) | ≥ e − − Σ t / . (12)The knowledge of Σ , which we will assume, gives us a lower bound on therate of decay of φ X at plus and minus infinity (uniformly in σ ∈ [0 , Σ]). Condition 2.3. Let γ be such that | γ | ≤ Γ , where Γ is a positive number. This condition is the same as the one in [38], cf. also [6]. Condition 2.4. Let the bandwidth h = h n depend on n and be such that h n = ( η log n ) − / with < η < / (2Σ ) . This condition is similar to the one given in [38]. Notice that in orderto keep our notation compact, we will suppress the dependence of h n on n in the notation. The fact that the bandwidth h depends on Σ has aparallel in the condition on the smoothing parameter in [24], see Remark4 . h establishes a trade-off between the bias and the variance of theestimator: too small an h will result in an estimator with small bias butlarge variance, while too large an h results in the estimator with large biasbut small variance. From Theorems 2.3 and 2.4 it will follow that the choiceof ρ as in Condition 2.4 is optimal in one particular situation in a sensethat it asymptotically minimises the order of the mean square error of theestimator ˆ ρ at a fixed point x. Condition 2.5. Let the kernel w be the sinc kernel: w ( x ) = sin x/ ( πx ) . The sinc kernel has also been used in [38] when estimating the L´evydensity. Its use is frequent in deconvolution problems, see e.g. [17]. 
TheFourier transform of the sinc kernel is given by φ w ( t ) = 1 [ − , ( t ) . Condition 2.6. Let the sequence κ n be such that κ n = κ | log h | − for aconstant κ > . This is a technical condition used in the proofs. Other sufficiently slowlyvanishing sequences { κ n } can also be used, ours is just one concrete example.The intuition behind Condition 2.6 is that up to a constant e − , the term e − Σ / (2 h ) gives a lower bound for the absolute value of the characteristicfunction φ X ( t ) on the interval [ − h − , h − ] , cf. (12). For n large enough,with an indicator 1 G t in the definition of ˆ ρ we thus cut-off those frequencies t for which | ˆ φ ( t ) | becomes smaller than the lower bound for | φ X ( t ) | over8 ∈ [ − h − , h − ] . Of course different truncation methods are also possible andwe refer e.g. to [24] for an alternative truncation method in the definitionof an estimator of a L´evy density in a problem similar to ours. We thinkthat it is natural to incorporate the knowledge of Σ in the selection of thethreshold in (9), since the knowledge of Σ is required anyway when selectingthe bandwidth h. With our choice of h the set G t can also be characterisedin terms of the sample size n, because h is a function of n, see Condition2.4. Thus our truncation method is not dissimilar from the one in thedeconvolution problem studied in [47].Next we recall two conditions from [38], which were used to study theasymptotics of the estimator ˆ σ . For the convenience of a reader we alsostate a result on the asymptotic behaviour of its mean square error. Thelatter is used in the proof of Theorem 2.3 below. Condition 2.7. Let the kernel v h ( t ) = h v ( ht ) , where the function v iscontinuous and real-valued, has a support on [ − , and is such that Z − v ( t ) dt = 0 , Z − (cid:18) − t (cid:19) v ( t ) dt = 1 , v ( t ) = O ( t β ) as t → . Here β is the same as in Condition 2.1. It is for simplicity of the proofs that we assume that the smoothingparameter h in the definition of ˆ σ is the same as in Condition 2.2. Inpractice the two need not be equal, although they have to be of the sameorder. Condition 2.8. Let the truncating sequence M = ( M n ) n ≥ be such that M n = m n h − , where m n = | log h | − . Here we implicitly assume that n is large enough, so that m n is realand m n > . Other conditions are also possible, ours is just one concreteexample. The use of the truncation in the definition of ˆ σ in (11) is that itprevents the estimator from exploding: | ˆ φ ( t ) | can in general take arbitrarilysmall values and log( | ˆ φ ( t ) | ) consequently can become arbitrarily large.In the remainder of the paper we will often use the symbols . and & when comparing two sequences a n and b n , respectively meaning a n is less orequal than b n , or a n is greater or equal than b n up to a constant that doesnot depend on n. The symbol ≍ will be used to denote the fact that twosequences of real numbers are asymptotically of the same order. Theorem 2.1. Denote by T the collection of all L´evy triplets satisfyingConditions 2.1–2.3 and assume Conditions 2.4, 2.7 and 2.8. Let the esti-mator ˆ σ be defined by (11) . Then sup T E [(ˆ σ − σ ) ] . (log n ) − β − holds. σ is logarithmic, the contribution of ˆ σ to an upper boundon the mean square error of ˆ ρ ( x ) is asymptotically negligible compared toother terms, as can be seen from the proof of Theorem 2.3. 
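The defining formula (11) for the estimator of σ² is not reproduced in this excerpt; from Lemma 3.1 and the proof of Theorem 2.5 it is a truncated spectral estimator of the form ∫ max{min(M_n, log|φ̂(t)|), −M_n} v_h(t) dt. Under that reading, and assuming the scaling v_h(t) = h³ v(ht), which is the one consistent with the normalisation in Condition 2.7, a minimal sketch is as follows; the concrete polynomial kernel v and the quadrature are illustrative assumptions.

```python
import numpy as np

def sigma2_hat(Z, h, M_n, n_grid=1000):
    """Sketch of a truncated spectral estimator of sigma^2 (cf. (11) and Lemma 3.1):
    sigma2_hat = int max(min(M_n, log|phi_hat(t)|), -M_n) * v_h(t) dt,
    with v_h(t) = h^3 * v(h t), v supported on [-1, 1], int v = 0 and
    int (-t^2/2) v(t) dt = 1 (Condition 2.7). The polynomial kernel below
    satisfies these two moment conditions and is O(t^2) near the origin."""
    def v(s):
        inside = np.abs(s) <= 1.0
        return np.where(inside, (105.0 / 4.0) * s ** 2 - (175.0 / 4.0) * s ** 4, 0.0)
    t = np.linspace(-1.0 / h, 1.0 / h, n_grid)           # support of v_h
    phi = np.exp(1j * np.outer(t, Z)).mean(axis=1)       # empirical ch.f. of the Z_j
    with np.errstate(divide="ignore"):
        log_mod = np.clip(np.log(np.abs(phi)), -M_n, M_n)   # truncation at +-M_n
    v_h = h ** 3 * v(h * t)
    dt = t[1] - t[0]
    return float(np.sum(log_mod * v_h) * dt)
```

The value returned here can be passed as sigma2_hat to the sketch of the estimator (9) given earlier.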
By techniquessimilar to those used in [39] in a related deconvolution problem, it is expectedthat under the same conditions on the class of L´evy triplets as in Theorem2.1 one can prove that ˆ σ is rate-optimal, but since our emphasis in thepresent work is on estimation of a L´evy density, we refrain from studyingthis question. For additional discussion on the estimator ˆ σ see [38].Notice that had we not assumed ν ( R ) ≤ Λ < ∞ , there would not exist auniformly consistent estimator of σ , see Remark 3 . σ is not clear in that general setting.Together with the above theorem, an important tool in studying theestimator ˆ ρ is the following maximal inequality for the empirical charac-teristic function ˆ φ ( t ) and its derivatives. Set ˆ φ (0) ( t ) = ˆ φ ( t ) and likewise φ (0) X ( t ) = φ X ( t ) . Theorem 2.2. Let k ≥ and r ≥ be integers. Then we have E " sup t ∈ [ − h − ,h − ] | ˆ φ ( k ) ( t ) − φ ( k ) X ( t ) | ! r . ( k| x | k +1 k r L ∨ r (P) + k| x | k k r L ∨ r (P) ) 1 h r n r/ , (13) provided k| x | k +1 k L ∨ r (P) is finite. Here the probability P on the righthandside refers to the law of X , which is uniquely characterised by the triplet ( γ, σ , ρ ) . The theorem constitutes a generalisation of the corresponding result forˆ φ and r = 2 given in [38]. The theorem is of possible general interest as well.For related results on the empirical characteristic function see Theorem 1 in[31] and Theorem 4.1 in [48].Equipped with the above two theorems, we are now ready to formulatethe first main result of the paper, which concerns the mean square error ofthe estimator ˆ ρ at a fixed point x = 0 . Notice that we prefer to work withasymptotics uniform in L´evy triplets, since existence of the superefficiencyphenomenon in nonparametric estimation makes it difficult to interpret fixedparameter asymptotics, see e.g. [12] for a discussion. This also explains whywe imposed certain smoothness assumptions on the class of L´evy densities:too large a class of densities, e.g. of all continuous densities, usually cannotbe handled when dealing with uniform asymptotics, see e.g. Theorem 1 onp. 36 in [32] for an example from probability density estimation.10 heorem 2.3. Denote by T the collection of all L´evy triplets satisfyingConditions 2.1–2.3 and assume Conditions 2.4–2.8. Let the estimator ˆ ρ bedefined by (9) . Then we have sup T E[(ˆ ρ ( x ) − ρ ( x )) ] . (log n ) − β for every fixed x = 0 . Thus the convergence rate of our estimator turns out to be logarith-mic, just as for the estimator of ρ proposed in [38]. This result can beeasily understood on an intuitive level by comparison to a nonparametricdeconvolution problem: if the distribution of the measurement error in adeconvolution model is normal, and if the class of the target densities ismassive enough, e.g. some H¨older or Sobolev class (see Definitions 1.2 and1.11 in [54]), the minimax convergence rate for estimation of an unknowndensity will be logarithmic for both the mean squared error and mean in-tegrated squared error as measures of risk, see [35] and [36]. Of course thesame holds true also for deconvolution models with unknown error variance,see [17] and [45]. 
Exactly as kernel-type estimators in semiparametric de-convolution problems, our estimator ˆ ρ also involves division by an estimatorof a characteristic function (or to be more precise by its square), a slightdifference being that in semiparametric deconvolution problems we divideby an estimator of the characteristic function of the measurement error vari-able, while in the definition of ˆ ρ we divide by ˆ φ, an estimator of φ X . Forlarge enough n the empirical characteristic function ˆ φ should be close to thetrue characteristic function φ X on the interval [ − h − , h − ] . Since up to aconstant term, φ X behaves at plus and minus infinity as a normal char-acteristic function, the logarithmic convergence rate of the estimator ρ isthen no surprise. Exactly as in normal deconvolution problem over a H¨olderor Sobolev class of densities, cf. [35] and [36], it is due to the dominatingsquared bias of ˆ ρ, i.e. roughly speaking the term T in the proof of Theorem2.3. More formally, in the theorem given below we actually prove that ourestimator ˆ ρ attains the minimax convergence rate for estimation of the L´evydensity ρ at a fixed point x over a suitable class of L´evy triplets when therisk is measured by the mean square error. Theorem 2.4. Let T be a L´evy triplet ( γ, σ , ρ ) , such that | γ | ≤ Γ , σ ∈ [0 , Σ] , ν ( R ) ∈ (0 , Λ] , where Γ , Σ and Λ are strictly positive constants. As-sume furthermore that Z ∞−∞ | t | β | φ f ( t ) | dt ≤ L ; | φ f ( t ) | ≤ L ′ | t | β ; | φ ′ f ( t ) | ≤ L ′ | t | β (14) for strictly positive constants β, L, L ′ and L ′′ . Let T be a collection of allL´evy triplets satisfying these conditions. Then for every fixed x = 0 we have inf e ρ n sup T E[( e ρ n ( x ) − ρ ( x )) ] & (log n ) − β , (15)11 here the infimum on the lefthand side is taken over all estimators e ρ n basedon observations X , . . . , X n . The proof of the theorem is such that it also works for the case when σ > σ does notlead to some estimator of ρ with a better rate of convergence. This is unlikethe semiparametric deconvolution problem with unknown error variance, see[17], where the fact that the measurement error variance is unknown slowsdown even further the convergence rate. Disregarding the moment conditionin Condition 2.1, an easy consequence of Theorems 2.3 and 2.4 is that ˆ ρ israte-optimal.A slow, logarithmic convergence rate of ˆ ρ seems to indicate that samplesof very large size are needed to accurately estimate ρ. However, it is knownthat in deconvolution problems kernel-type density estimators perform wellfor reasonable sample sizes, provided the noise term variance is not too large,see e.g. [30], [33] or [57]. Likewise, a spectral cut-off method of [6] and [7]produces good results for small values of σ in the problem of calibrationof exponential L´evy models. Since in the financial setting it is perhapsunnatural to assume that σ is known and σ → n → ∞ , which constitutesthe mathematical formalisation of the statement that in the asymptoticsetting the noise level is low, and since in the present work we are mainlyconcerned with asymptotics, we will explore a different possibility, namelythat the L´evy density is much smoother than the H¨older or Sobolev classL´evy densities. 
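For orientation, two classical examples of densities whose characteristic functions decay exponentially at plus and minus infinity may be kept in mind; they are standard facts and are not taken from the paper. The standard Cauchy density has φ_f(t) = e^{−|t|}, i.e. decay exponent s = 1, and admits an analytic continuation into a strip of the complex plane, the case singled out as particularly interesting in Theorem 2.6 below. The standard normal density has φ_f(t) = e^{−t²/2}, i.e. s = 2, and thus belongs to the regime of larger s in which, as remarked after Theorem 2.5, the asymptotics change qualitatively.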
Our results will parallel those from [18], where it is shownin the deconvolution context that better than logarithmic convergence ratescan be obtained in case when the target density is supersmooth itself, i.e.essentially has a characteristic function that decays exponentially fast atplus and minus infinity.We first give a condition on the class of L´evy densities. Condition 2.9. Let the unknown L´evy density ρ belong to the class A ( α, s, L, L ′ , L ′′ , K, Λ) = n ρ : ρ ( x ) = ν ( R ) f ( x ) , f is a probability density , Z ∞−∞ | φ f ( t ) | exp(2 α | t | s ) dt ≤ L, | φ f ( t ) || t | (1 − s ) / e − α | t | s ≤ L ′ , | φ ′ f ( t ) || t | (1 − s ) / e − α | t | s ≤ L ′′ , Z ∞−∞ x f ( x ) dx ≤ K,φ ′′ f is integrable ,ν ( R ) ∈ (0 , Λ] o , here α, s, L, K and Λ are strictly positive numbers. The ‘size’ of the class A ( α, s, L, L ′ , L ′′ , K, Λ) is much smaller than the‘size’ of the class W ( β, L, L ′ , L ′′ , K, Λ) , and it is intuitively clear that betterconvergence rates can be expected for estimation of ρ over the former classthan over the latter class. We will refer to the class A ( α, s, L, L ′ , L ′′ , K, Λ)as the class of supersmooth L´evy densities.Since the estimator ˆ ρ depends on the estimator ˆ σ , we first need to studythe asymptotics of the latter. With a different class of L´evy densities thanin Theorem 2.1, the conditions on the bandwidth h and kernel v h have tobe modified accordingly. These are supplied below. Condition 2.10. Let the bandwidth h depend on n and be such that h is apositive solution of the equation αh s + 2Σ h = log n − (log log n ) . (16)Here we thus suppose that s is known. We also assume that n is largeenough, so that equation (16) indeed has a positive root. Condition 2.10is motivated by a similar condition on the bandwidth in the deconvolutionproblem studied in [18]. An optimal bandwidth, i.e. a bandwidth thatasymptotically minimises the risk of the estimator (or an upper bound onit), is typically computed in kernel estimation by differentiating an upperbound on the risk of the estimator with respect to h, setting the derivativeto zero and solving h from the obtained equation. However, in our case anoptimal h can also be computed from (16), cf. Section 3 in [18], and we givethe corresponding argument in the proof of Theorem 2.6. The two methodsof course yield the same asymptotic results. Condition 2.11. Let the kernel v h ( t ) = h v ( ht ) , where the function v iscontinuous and real-valued, has a support on [ −√ , − S [1 , √ and is suchthat Z R v ( t ) dt = 0 , Z R (cid:18) − t (cid:19) v ( t ) dt = 1 . Instead of defining the support of v by [ −√ , − S [1 , √ , we could havedefined it as [ − a, − S [1 , a ] for 1 < a ≤ √ , which would result in a betterconvergence rate for ˆ σ . However, a = √ ρ, as a contribution of ˆ σ to an upper bound on the risk of ˆ ρ will still be asymptotically of at most the same order as that of other terms,cf. the proof of Theorem 2.6. We do not address the problem of constructinga rate-optimal estimator of σ in the present paper.The following result holds true. 13 heorem 2.5. Denote by T the collection of all L´evy triplets satisfyingConditions 2.2, 2.3 and 2.9 and assume that Conditions 2.8, 2.10 and 2.11hold. Let s < and let the estimator ˆ σ be defined by (11) . Then sup T E [(ˆ σ − σ ) ] . h s +5 exp (cid:18) − αh s (cid:19) holds, where h is defined in Condition 2.10. The asymptotics of the estimator ˆ σ (and also those of ˆ ρ ) change qual-itatively when s > . 
In particular, the convergence rate of ˆ σ becomespolynomial. Although supersmooth densities with s > s < . With the above result we can finally study the asymptotics of ˆ ρ over theclass of supersmooth L´evy densities. Theorem 2.6. Suppose that conditions of Theorem 2.5 are satisfied and letin addition Condition 2.6 hold. Then we have sup T E[(ˆ ρ ( x ) − ρ ( x )) ] . h s − exp (cid:18) − αh s (cid:19) for every fixed x = 0 . In particular, for s = 1 an upper bound sup T E[(ˆ ρ ( x ) − ρ ( x )) ] . exp − α (cid:18) log n (cid:19) / ! (17) is valid. Since h ≍ (log n ) − / , which can be shown as formula (27) of [18], itis easy to see that the convergence rate of ˆ ρ is faster than any power oflog n and hence much better than that in Theorem 2.3. The case s = 1 isparticularly interesting, as it corresponds to the class of L´evy densities thatadmit an analytic continuation into a strip of the complex plane.A natural question is whether ˆ ρ is rate-optimal over a class of super-smooth L´evy densities. We will not provide a formal statement and itsproof, but instead will restrict ourselves to an intuitive discussion, whichwe hope is more enlightening. To answer the question of rate-optimalityof ˆ ρ, one has first to establish a lower bound for estimation of ρ ( x ) over aclass of supersmooth L´evy densities. Disregarding the moment condition inCondition 2.9, this can be done by following a general scheme of the proofof Theorem 2.4 combined with some of the techniques from [17], [19] or [39].This lower bound will be similar to the one given in Theorem 4 in [19] andin fact for s = 1 one will haveinf e ρ n sup T E[( e ρ n ( x ) − ρ ( x )) ] & exp − α (cid:18) log n Σ (cid:19) / ! , (18)14here the infimum is taken over the class of all estimators e ρ based on asample X , . . . , X n from the process X. Unfortunately, the lower bound in(18) is too small in comparison to the upper bound in (17). Although weare not completely sure, we still think that the lower and upper risk boundsthat we give in Theorem (17) and (18) are sharp as far as their rates ofdecay are concerned: we think that it is the estimator ˆ ρ that cannot attainthe minimax convergence rate. Given that this is true, an intuitive explana-tion of the suboptimality of ˆ ρ in the present setting might be the following:the construction of ˆ ρ in (9) involves division by ( ˆ φ ( t )) , which is close to( φ X ( t )) on [ − h − , h − ] for n large enough. Hence in essence we are dealingwith a kernel-type deconvolution density estimator which involves divisionby ( φ X ( t )) , whereas in conventional deconvolution problems the kernel es-timator involves division by the characteristic function of the measurementerror variable and not its square, see e.g. [35]. By a rough analogy, assum-ing that the Gaussian component in the L´evy process plays a role similar tothe measurement error in the deconvolution problems, one can see that thevariance of our estimator ˆ ρ of a L´evy density is larger than the variance of akernel-type deconvolution density estimator, compare p. 1266 in [35] and anupper bound on the term T in the proof of Theorem 2.6. In order to renderthe variance asymptotically negligible, a somewhat larger bandwidth wouldthus be required in the former case than in the latter case. 
However, unlikethe case when the L´evy density satisfies Condition 2.1, this has a dramaticeffect on the bias of the estimator (as far as its order is concerned) for theclass of supersmooth L´evy densities and the suboptimality of ˆ ρ results: it isthe squared bias, or roughly speaking the term T in the proof of Theorem2.6, that dominates the asymptotics of ˆ ρ. No such problem seems to arisein [24], where unlike our setting it is a priori assumed that γ = 0 , σ = 0 , and as a consequence one can derive a different inversion formula than (7),cf. formula (19) below, which involves only division by φ X and not by itssquare.In light of the above observations another natural question that arisesin this context is whether one has to use (4) instead of (7) as a basis ofconstruction of an estimator of ρ : under appropriate conditions with theformer formula one can express the L´evy density ρ as ρ ( x ) = − πx Z R e − itx (cid:18) i φ ′ X ( t ) φ X ( t ) + γ + iσ t (cid:19) dx, (19)which involves division by the first power of φ X only. By replacing φ X bythe empirical characteristic function ˆ φ and σ and γ by their estimators andby application of an appropriate amount of regularisation we would thusget an estimator of ρ that in its form is closer to a conventional kernel-typedeconvolution density estimator in that under the integral sign it involvesdivision by the first power of the (estimated) characteristic function only.It is nevertheless unclear whether this approach can lead to an estimator15f ρ with a better (optimal in the best case) convergence rate than the onewe are considering in the present work: one has to find estimators of γ and σ that converge at an optimal rate in the present context, which does notseem to be an easy task.Another interesting question that arises in the present context is thatof adaptation: construction of our estimator of ρ does rely on knowledge ofthe smoothness degree of a L´evy density, see in particular Conditions 2.7,2.10 and 2.11. In practice it might happen that this smoothness degree isunknown and it is desirable to have an estimator of ρ that automaticallyachieves the optimal rate of convergence without knowledge of the smooth-ness degree of a L´evy density. We view this as a separate problem and donot address it in the present work. Relevant results are available in thecontext of pure jump L´evy processes and we refer e.g. to [24] for additionaldetails. Note that the proofs of the adaptation results in that paper requirenontrivial amount of technical work. In any case, in our setting an adaptiveestimator and σ would be required.We conclude this section by a brief comparison of ˆ ρ to the estimator ρ n of ρ proposed in [38]. Up to some additional truncation, the latter estimatoris given by ρ n ( x ) = 12 π Z /h − /h e − itx Log ˆ φ ( t ) e i ˆ γt e − ˆ ν ( R ) e − ˆ σ t / ! dt, (20)where Log denotes the so-called distinguished logarithm, i.e. a ‘logarithm’that is a continuous and single-valued function of t, see Theorem 7.6.2 of[22] for its construction. Furthermore, ˆ γ, ˆ ν ( R ) and ˆ σ are estimators ofthe parameters γ, ν ( R ) and σ , respectively. Notice that in general thedistinguished logarithm Log( g ( t )) of some function g is not a composition ofa fixed branch of an ordinary logarithm with g. The estimator ρ n seems to begiven by a more complicated expression than ˆ ρ, because it depends explicitlyon estimators of γ and ν ( R ) in addition to the estimator of σ . 
The matteris furthermore complicated by the need to use the distinguished logarithm.The latter in (20) can be defined only for those ω ’s from the sample spaceΩ for which ˆ φ as a function of t does not hit zero on [ − h − , h − ] . For those ω ’s for which this is not satisfied, ρ n has to be assigned an arbitrary value,e.g. one can assume that ρ n is a standard normal density. It is shown in[38] that as n → ∞ , the probability of the event that ˆ φ hits zero for t in[ − h − , h − ] vanishes under appropriate conditions. However, an almost sureresult of a similar type remains to be unknown (it has been established onlyin the context of [6] in [53]). Also in practice the fact that ˆ φ does not vanishcan be checked for a discrete grid of points t only and it could happen thatone misses the fact that ˆ φ ( t ) is zero for some t ∈ [ − h − , h − ] . All this seemsto be a disadvantage of the estimator ρ n . On the other hand the estimatorˆ ρ is undefined for x = 0 and a study of its asymptotic properties requires16tronger moment conditions on the L´evy density ρ. Also, a division by x inthe vicinity of the origin might render it numerically unstable. In conclusion,both estimators are rate-optimal over an appropriate class of L´evy triplets,but each of them seems to have its own advantages over another. Proof of Theorem 2.2. The proof is similar in spirit to the one in [38], pp.334–335, which in turn mimicks the one in [17], pp. 326–327. Since both ofthe proofs are deficient, here we also seize an opportunity to rectify them.We haveE " sup t ∈ [ − h − ,h − ] | ˆ φ ( k ) ( t ) − φ ( k ) X ( t ) | ! r = 1 n r/ E " sup t ∈ [ − h − ,h − ] | G n v t,k | ! r , where G n v t,k denotes an empirical process G n v t,k = 1 √ n n X j =1 ( v t,k ( Z j ) − E[ v t,k ( Z j )])and the function v t,k is defined as v t,k : x ( ix ) k e itx . Introduce the functions v t,k, : x x k sin( tx ) and v t,k, : x x k cos( tx ) . Since | i k | = 1 and e itx =cos( tx ) + i sin( tx ) , the c r -inequality givesE " sup t ∈ [ − h − ,h − ] | G n v t,k | ! r . E " sup t ∈ [ − h − ,h − ] | G n v t,k, | ! r + E " sup t ∈ [ − h − ,h − ] | G n v t,k, | ! r . Furthermore, by differentiability of v t,k,j with respect to t and the mean-value theorem we have | v t,k,j ( x ) − v s,k,j ( x ) | ≤ | x | k +1 | t − s | (21)for j = 1 , . Consequently, for a fixed x the function v t,k,j is Lipschitz in t with a Lipschitz constant | x | k +1 . In what follows we will need some results from the theory of empiricalprocesses. For all the unexplained terminology and notation we refer e.g. toSection 19.2 of [55] or Section 2.1.1 of [56]. First of all, by the inequality(21) and by Theorem 2.7.11 of [56] the bracketing number N [] of the class offunctions F n,j (for j = 1 , v t,k,j for t ∈ [ − h − , h − ]) can be bounded by the covering number N of the interval I n = [ − h − , h − ] as follows N [] (2 ǫ k| x | k +1 k L ( Q ) ; F n,j ; L ( Q )) ≤ N ( ǫ ; I n ; | · | ) . Q is any probability measure. Since it is easily seen that for the coveringand bracketing numbers of the classes F n,j , j = 1 , , we have the inequality N ( ǫ k| x | k +1 k L ( Q ) ; F n,j ; L ( Q )) ≤ N [] (2 ǫ k| x | k +1 k L ( Q ) ; F n,j ; L ( Q )) , cf. p. 84 in [56], and since N ( ǫ ; I n ; | · | ) ≤ ǫ h + 1 , we obtain that N ( ǫ k| x | k +1 k L ( Q ) ; F n,j ; L ( Q )) ≤ ǫ h + 1 . 
(22)By taking s = 0 , it follows from the definition of v t,k,j and (21) that thefunction F h, ( x ) = | x | k +1 h − can be used as an envelope for the class F n, , while F h, ( x ) = | x | k +1 h − + | x | k can serve as an envelope for F n, . Nextdefine J (1 , F n,j ) , the entropy of the class F n,j , as J (1 , F n,j ) = sup Q Z { N ( ǫ k F h,j ( x ) k L ( Q ) ; F n,j ; L ( Q ))) } / dǫ, where j = 1 , , and the supremum is taken over all discrete probabilitymeasures Q, such that k F h,j ( x ) k L ( Q ) > . Notice that F n,j ’s are measurableclasses of functions with measurable envelopes. Theorem 2.14.1 in [56] thenimplies thatE " sup t ∈ [ − h − ,h − ] | G n v t,k,j | ! r . k F h,j ( x ) k r L ∨ r (P) ( J (1 , F n,j )) r . Here the probability measure P on the righthand side is associated with thedistribution of X . We next need to work out the quantities on the righthandside of the above display. Observe that k F h, ( x ) k r L ∨ r (P) = 1 h r k| x | k +1 k r L ∨ r (P) . Moreover, we have k F h, ( x ) k r L ∨ r (P) . h r ( k| x | k +1 k r L ∨ r (P) + k| x | k k r L ∨ r (P) ) , provided h ≤ . Here we also used the c ∨ r -inequality. It thus remains tobound the entropy J (1 , F n,j ) . By the fact that k F h, ( x ) k L ( Q ) = h − k| x | k +1 k L ( Q ) and by taking ǫ/h instead of ǫ in (22) we get N ( ǫ k F h, ( x ) k L ( Q ) ; F n,j ; L ( Q )) ≤ ǫ + 1 . (23)18urthermore, since k F h, ( x ) k L ( Q ) ≥ k| x | k +1 h − k L ( Q ) , by monotonicity ofthe covering number N in the size of the covering balls combined with (23)we obtain that N ( ǫ k F h, ( x ) k L ( Q ) ; F n,j ; L ( Q )) ≤ ǫ + 1 . (24)Inserting the bounds from (23) and (24) into the definition of J (1 , F n,j ), wesee that J (1 , F n,j ) ≤ Z (cid:26) (cid:18) ǫ + 1 (cid:19)(cid:27) / dǫ < ∞ . This yields the statement of the theorem. Proof of Theorem 2.3. By the c -inequality we haveE[(ˆ ρ ( x ) − ρ ( x )) ] . | ρ ( x ) − e ρ ( x ) | + E[ | ˆ ρ ( x ) − e ρ ( x ) | ] = T + T , where e ρ ( x ) = 12 πx Z /h − /h e − itx ( φ ′ X ( t )) − φ ′′ X ( t ) φ X ( t )( φ X ( t )) − σ ! dt. We will first work out the term T . By (6) we have − φ ′′ ρ ( t ) = ( φ ′ X ( t )) − φ ′′ X ( t ) φ X ( t )( φ X ( t )) − σ . Then by the Fourier inversion argument we can write ρ ( x ) − e ρ ( x ) = 12 π Z R e − itx φ ρ ( t ) dt + 12 πx Z /h − /h e − itx φ ′′ ρ ( t ) dt. Integrating by parts twice the second term on the righthand side of theabove display and using Condition 2.1, we obtain12 πx Z /h − /h e − itx φ ′′ ρ ( t ) dt = − π Z /h − /h e − itx φ ρ ( x ) dx + O ( h β ) , where the O ( h β ) term on the righthand side is uniform in ρ. With this inmind and by the fact that φ ρ ( t ) = ν ( R ) φ f ( t ) , we can bound T using the c -inequality as T . Λ π Z R \ [ − h − ,h − ] | φ f ( t ) | dt ! + h β . Z R \ [ − h − ,h − ] | t | β | t | − β | φ f ( t ) | dt ! + h β ≤ (cid:18)Z ∞−∞ | t | β | φ f ( t ) | dt (cid:19) h β + h β . h β , h ≤ . Hence by Condition 2.2 the term sup T T is of order(log n ) − β . This is the term that has the dominating contribution to the riskof ˆ ρ. The rest of the proof is dedicated to showing that T is negligible incomparison to T . This involves a long series of inequalities.By the c -inequality we have T . 
π x (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z /h − /h e − itx dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E [ | ˆ σ − σ | ]+ 14 π x E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z /h − /h e − itx (Φ( ˆ φ ( t ))1 G t − Φ( φ ( t ))) dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = T + T , where for a twice differentiable function ζ the mapping Φ is defined byΦ( ζ ( t )) = ( ζ ′ ( t )) − ζ ′′ ( t ) ζ ( t )( ζ ( t )) . By Theorem 2.1 in combination with Condition 2.4 we have sup T T . (log n ) − β − . Next notice that T ≤ π x h E sup t ∈ [ − h − ,h − ] | Φ( ˆ φ ( t ))1 G t − Φ( φ ( t )) | ! = T π x . Hence it remains to study T . This will be done via repeated applications ofTheorem 2.2. First of all, the c -inequality gives T . h E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ′′ ( t )ˆ φ ( t ) 1 G t − φ ′′ X ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! + 1 h E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( ˆ φ ′ ( t )) ( ˆ φ ( t )) G t − ( φ ′ X ( t )) ( φ X ( t )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! = T + T . By another application of the c -inequality we obtain T . h E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ′′ ( t )ˆ φ ( t ) 1 G t − φ ′′ X ( t ) φ X ( t ) 1 G t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! + 1 h E sup t ∈ [ − h − ,h − ] (cid:18)(cid:12)(cid:12)(cid:12)(cid:12) φ ′′ X ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12) G ct (cid:19)! = T + T . T in the last equality can be bounded as follows, T . h E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ′′ ( t )ˆ φ ( t ) 1 G t − ˆ φ ′′ ( t ) φ X ( t ) 1 G t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! + 1 h E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ′′ ( t ) φ X ( t ) 1 G t − φ ′′ X ( t ) φ X ( t ) 1 G t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! = T + T . Further bounding gives T ≤ h E sup t ∈ [ − h − ,h − ] | ˆ φ ′′ ( t ) | ! sup t ∈ [ − h − ,h − ] | ˆ φ ( t ) − φ X ( t ) || ˆ φ ( t ) || φ X ( t ) | G t !! . Now apply the Cauchy-Schwarz inequality to the righthand side to obtain T ≤ h E sup t ∈ [ − h − ,h − ] | ˆ φ ′′ ( t ) | ! / × E sup t ∈ [ − h − ,h − ] | ˆ φ ( t ) − φ X ( t ) || ˆ φ ( t ) || φ X ( t ) | G t !! / = 1 h p T p T . Observe that by the fact that | ˆ φ ′′ ( t ) | ≤ n − P nj =1 Z j and by the c -inequality T ≤ E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n X j =1 Z j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ c n E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X j =1 ( Z j − E [ Z j ]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + c (E [ Z ]) ≤ (3 √ / c n E [( Z − E [ Z ]) ] + c (E [ Z ]) , where the last inequality follows from the Marcinkiewicz-Zygmund inequal-ity as given in Theorem 2 of [50]. By the Lyapunov inequality (E [ Z ]) ≤ E [ Z ] . This in combination with the c -inequality gives E [( Z − E [ Z ]) ] . E [ Z ] . It remains to bound E [ Z ] uniformly in L´evy triplets. The mostdirect way of doing this is to notice thatE [ Z ] = E [( γ + σW + Y ) ] . Γ + Σ E [ W ] + E [ Y ] , W is a standard normal random variable, while Y has a compoundPoisson distribution with intensity ν ( R ) and jump size density f. Observethat E [ Y ] = φ (8) Y (0) and that under Condition 2.1 and with the Lyapunovinequality it is laborious, though straightforward to show that φ (8) Y (0) isbounded by a universal constant uniformly in L´evy triplets. Hence theterm sup T E [ Z ] is also bounded and then so is sup T √ T . As far as T isconcerned, we have T . 
e /h κ n E sup t ∈ [ − h − ,h − ] | ˆ φ ( t ) − φ X ( t ) | ! , which follows from Conditions 2.1 and 2.2. Inequality (13) with k = 0 and r = 4 then yields T . k| x |k L (P) e /h κ n h n . Since k| x |k L (P) is bounded by a constant uniformly in L´evy triplets (thiscan be proved by essentially the same argument as we used for sup T E [ Z ]above), it follows that sup T T is negligible in comparison to (log n ) − β . Thisis also true for h − sup T √ T and then also for sup T T . To complete thestudy of T , we need to study T . The latter can be bounded as follows: T . e Σ /h h E sup t ∈ [ − h − ,h − ] | ˆ φ ′′ ( t ) − φ ′′ X ( t ) | ! . By the same reasoning as above one can show that sup T T is negligiblecompared to (log n ) − β . Consequently, so is sup T T . Next we deal with T . Notice that by our conditions and the Lyapunov inequality (cid:12)(cid:12)(cid:12)(cid:12) φ ′′ X ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12) φ ′ X ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12) + σ + Z ∞−∞ x ρ ( x ) dx ≤ (cid:18) Γ + Σ h + Λ K / (cid:19) + Σ + Λ K / . h . Hence it holds that sup T sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12) φ ′′ X ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12) . h . (25)Consequently, we have T . h E sup t ∈ [ − h − ,h − ] G ct ! . 22e study the expectation on the righthand side. First of all, for t ∈ [ − h − , h − ] and all n large enough we have G ct = n | ˆ φ ( t ) | − | φ X ( t ) | < κ n e − Σ / (2 h ) − | φ X ( t ) | o = n | φ X ( t ) | − | ˆ φ ( t ) | > | φ X ( t ) | − κ n e − Σ / (2 h ) o ⊆ n | φ X ( t ) − ˆ φ ( t ) | > ( e − − κ n ) e − Σ / (2 h ) o ⊆ ( sup t ∈ [ − h − ,h − ] | φ X ( t ) − ˆ φ ( t ) | > ( e − − κ n ) e − Σ / (2 h ) ) = G ∗ . Therefore sup t ∈ [ − h − ,h − ] G ct ≤ G ∗ and then by Chebyshev’s inequality weobtain T . h P( G ∗ ) . e /h h E sup t ∈ [ − h − ,h − ] | φ X ( t ) − ˆ φ ( t ) | ! . (26)Next apply (13) with k = 0 and r = 4 to the expectation in the rightmostinequality to conclude that sup T T is negligible in comparison to (log n ) − β . This shows that also sup T T is negligible in comparison to (log n ) − β . Tocomplete bounding T and eventually T , we need to bound T . By the c -inequality T . E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( φ ′ X ( t )) ( φ X ( t )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) G ct !! + E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( ˆ φ ′ ( t )) ( ˆ φ ( t )) G t − ( φ ′ X ( t )) ( φ X ( t )) G t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! = T + T . Observe that for h → T sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( φ ′ X ( t )) ( φ X ( t )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . h , which can be shown by the same arguments that led to (25). We alsohave T ≤ h − P( G ∗ ) by the above display. It then follows from (26) thatsup T T is negligible in comparison to (log n ) − β . We turn to T . By the23 -inequality T . E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( ˆ φ ′ ( t )) ( ˆ φ ( t )) G t − ( ˆ φ ′ ( t )) ( φ X ( t )) G t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! + E sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( ˆ φ ′ ( t )) ( φ X ( t )) G t − ( φ ′ X ( t )) ( φ X ( t )) G t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! = T + T . Notice that by the Cauchy-Schwarz inequality T ≤ E sup t ∈ [ − h − ,h − ] | ( ˆ φ ′ ( t )) | sup t ∈ [ − h − ,h − ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) 1( ˆ φ ( t )) − φ X ( t )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) G t !! 
= E sup t ∈ [ − h − ,h − ] | ( ˆ φ ′ ( t )) | sup t ∈ [ − h − ,h − ] | ( φ X ( t )) − ( ˆ φ ( t )) || ˆ φ ( t ) | | φ X ( t ) | G t !! ≤ E sup t ∈ [ − h − ,h − ] | ( ˆ φ ′ ( t )) | ! / × E sup t ∈ [ − h − ,h − ] | ( φ X ( t )) − ( ˆ φ ( t )) || ˆ φ ( t ) | | φ X ( t ) | G t !! / = p T p T . Since | ˆ φ ′ ( t ) | ≤ n − P nj =1 | Z j | , it follows that the term T is bounded byE [( n − P nj =1 | Z j | ) ] . By the c -inequality we then getE n n X j =1 | Z j | . n E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X j =1 ( | Z j | − E [ | Z j | ]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (E [ | Z j | ]) . Hence sup T T is bounded by a constant, which can be proved by the sameargument as we used for sup T T . Finally, we consider T . We have T . e /h k n E sup t ∈ [ − h − ,h − ] | ˆ φ ( t ) − φ X ( t ) | ! , because | ( φ X ( t )) − ( ˆ φ ( t )) | ≤ | φ X ( t ) − ˆ φ ( t ) | , because | φ X ( t ) | is bounded from below by e − − Σ / (2 h ) for t ∈ [ − h − , h − ] , and because of the definition of G t . Using (13), we conclude that sup T T 24s negligible in comparison to (log n ) − β . Hence so is sup T T . It remains tostudy T . Since T . e /h E sup t ∈ [ − h − ,h − ] | ˆ φ ′ ( t ) − φ ′ X ( t ) | ! , it follows from (13) and Condition 2.4 that sup T T is negligible in compar-ison to (log n ) − β . Consequently, so are sup T T and sup T T . Combinationof all the above results completes the proof of the theorem. Proof of Theorem 2.4 . The statement of the theorem is for estimators basedon observations X , . . . , X n , but the relationship Z j = X j − X j − and thestationary independent increments property of a L´evy process allows us towork with Z , . . . , Z n instead. We adapt the proof of Theorem 4.1 in [38] tothe present case. A general idea of the proof is as follows: we will considertwo L´evy triplets T = (0 , σ , ρ ) and T = (0 , σ , ρ ) depending on n andsuch that the L´evy densities ρ and ρ are separated as much as possibleat a point x, while at the same time the corresponding product densities q ⊗ n and q ⊗ n of observations Z , . . . , Z n are close in the χ -divergence andhence cannot be distinguished well using the observations Z , . . . , Z n . Upto a constant, the squared distance between ρ ( x ) and ρ ( x ) will then givethe desired lower bound (15) for estimation of a L´evy density ρ at a fixedpoint x. This is a standard technique and we refer to Chapter 2 of [54] fora good exposition of methods for deriving lower bounds in nonparametriccurve estimation.Consider two L´evy triplets T = (0 , σ , ρ ) and T = (0 , σ , ρ ) , where ρ j ( u ) = ν ( R ) f j ( u ) for j = 1 , < ν ( R ) < Λ and 0 < σ < Σ . Let f ( u ) = 12 ( r ( u ) + r ( u )) , where two densities r and r are defined through their characteristic func-tions as follows: r ( u ) = 12 π Z ∞−∞ e − itu t /β ) ( β +1) / dt,r ( u ) = 12 π Z ∞−∞ e − itu e − α | t | α dt. With a proper selection of β , β , α and α one can achieve that f satisfies(14) with constants L/ , L ′ / L ′′ / L, L ′ and L ′′ . We alsoassume that 1 < α < . Next define f by f ( u ) = f ( u ) + δ βn H (( u − x ) /δ n ) , where δ n → n → ∞ , and the function H satisfies the following condi-tions: 25. H (0) > φ H ( t ) is twice continuously differentiable;3. R ∞−∞ | t | β | φ H ( t ) | dt ≤ L/ , | φ H ( t ) | ≤ L ′ / (2 | t | β ) , | φ ′ H ( t ) | ≤ L ′′ / (2 | t | β );4. R ∞−∞ H ( x ) dx = 0;5. R −∞ H ( x ) dx = 0;6. φ H ( t ) = 0 for t outside [1 , . 
Since f ( u ) decays as r ( u ) at infinity, and consequently as | u | − − α , seeformula (14.37) in [52], with a proper selection of H, e.g. by the reasoningsimilar to the one on p. 1268 in [35], the function f will be nonnegative, atleast for all small enough δ n . Consequently, f will be a probability densityand one can also achieve that it satisfies (14) for all small enough δ n . Now notice that | ρ ( x ) − ρ ( x ) | ≍ δ βn . (27)The statement of the theorem will follow from (27) and Lemma 8 of [19], ifwe prove that for δ n ≍ (log n ) − / we have nχ ( q , q ) = n Z ∞−∞ ( q ( u ) − q ( u )) q ( u ) du ≤ c, (28)where a positive constant c < n. Here χ ( · , · ) denotesthe χ -divergence, see p. 86 in [54] for the definition.Denote by p i a density of a Poisson sum Y = P N ( ν ( R )) j =1 W j conditionalon the fact that its number of summands N ( ν ( R )) > . Here W j are i.i.d.with W ∼ f i . Now rewrite the characteristic function of Y as φ Y ( t ) = e − ν ( R ) + (1 − e − ν ( R ) ) 1 e ν ( R ) − (cid:16) e ν ( R ) φ fi ( t ) − (cid:17) , (29)to see that φ p i ( t ) = 1 e ν ( R ) − (cid:16) e ν ( R ) φ fi ( t ) − (cid:17) . Furthermore, p i ( u ) = ∞ X n =1 f ∗ ni ( u ) P ( N ( ν ( R )) = n | N ( ν ( R )) > . (30)By convolving the law of Y with a normal density φ ,σ with mean zero andvariance σ and using (29), we obtain that q ( u ) ≥ (1 − e − ν ( R ) ) φ ,σ ∗ p ( u ) . A, such thatthe right-hand side of the above display is not less than (1 − e − ν ( R ) ) p ( | u | + A ) , we have nχ ( q , q ) . n Z ∞−∞ ( q ( u ) − q ( u )) p ( | u | + A ) dx . n Z ∞−∞ ( q ( u ) − q ( u )) f ( | u | + A ) dx. The last inequality is true because by (30) it holds that p ( | u | + A ) & f ( | u | + A ) . Splitting the integration region in the rightmost term of the lastdisplay into two parts, we get that nχ ( q , q ) . n Z | u |≤ A ( q ( u ) − q ( u )) du + n Z | u | >A u ( q ( u ) − q ( u )) dx = T + T . Here we used the facts that f ( u ) decays as | u | − − α at infinity and that1 < α < . Parseval’s identity then gives T ≤ n π Z ∞−∞ | φ q ( t ) − φ q ( t ) | dt = n (1 − e − ν ( R ) ) π Z ∞−∞ | φ p ( t ) − φ p ( t ) | e − σ t dt = n (1 − e − ν ( R ) ) ( e ν ( R ) − π Z ∞−∞ | e ν ( R ) φ f ( t ) − e ν ( R ) φ f ( t ) | e − σ t dt . n Z ∞−∞ | φ f ( t ) − φ f ( t ) | e − σ t dt, where the last inequality is a consequence of the mean-value theorem appliedto the function e x and the fact that | ν ( R ) φ f i ( t ) | ≤ Λ < ∞ . Now notice that Z ∞−∞ e itu δ βn H (( u − x ) /δ n ) dx = δ β +1 n e itx φ H ( δ n t ) . By definition of f and f it follows that T . nδ β +2 n Z ∞−∞ | φ H ( δ n t ) | e − σ t dt = nδ β +1 n Z ∞−∞ | φ H ( s ) | e − σ s /δ n ds = O (cid:16) nδ β +1 n e − σ /δ n (cid:17) . Hence a choice δ n ≍ (log n ) − / with an appropriate constant will implythat T → n → ∞ . To complete the proof, we need to show that T → δ n . To this end first notice that even though φ f and φ f are27ot twice differentiable at zero, the difference φ q ( t ) − φ q ( t ) still is, because φ H is identically zero outside the interval [1 , , and hence φ q ( t ) − φ q ( t ) iszero for t in a neighbourhood of zero. Then by Parseval’s identity we obtainthat T ≤ n π Z ∞−∞ | ( φ q ( t ) − φ q ( t )) ′′ | dt. By the same arguments as we used for T , one can show that T → n → ∞ , provided δ n ≍ (log n ) − / with an appropriate constant. Thisentails the statement of the theorem.The following technical lemma is used in the proof of Theorem 2.5. Lemma 3.1. 
Let the sets B n and B cn be defined as B n = ( sup t ∈ [ −√ h − , √ h − ] (cid:12)(cid:12)(cid:12) ˆ φ ( t ) − φ X ( t ) (cid:12)(cid:12)(cid:12) > δ ) ,B cn = ( sup t ∈ [ −√ h − , √ h − ] (cid:12)(cid:12)(cid:12) ˆ φ ( t ) − φ X ( t ) (cid:12)(cid:12)(cid:12) ≤ δ ) , (31) where δ = (1 / e − − Σ /h . Suppose that ν ( R ) ≤ Λ < ∞ and that Con-ditions 2.2, 2.3, 2.8 and 2.10 hold. Then there exists a universal n notdepending on the L´evy triplet ( γ, σ, ρ ) , such that for all n ≥ n on the set B cn we have max { min { M n , log( | ˆ φ ( t ) | ) } , − M n } = log( | ˆ φ ( t ) | ) for t restricted to the interval [ −√ h − , √ h − ] . Proof. The proof is similar to the proof of Lemma 5.1 in [38]. On the set B cn and for t restricted to the interval [ −√ h − , √ h − ] we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ( t ) φ X ( t ) − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < . (32)Furthermore, on the same set and for t ∈ [ −√ h − , √ h − ] the inequality | log( | ˆ φ ( t )) || ≤ | log( | φ X ( t ) | ) | + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) log (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)!(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | log( | φ X ( t ) | ) | + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ( t ) φ X ( t ) − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ( t ) φ X ( t ) − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | log( | φ X ( t ) | ) | + 34 ≤ 2Λ + Σ h + 34 28olds. Here in the second line we used an elementary inequality | log(1 + z ) − z | ≤ | z | valid for | z | < / , the third line follows from (32), while inthe last line we used the bound | log | φ X ( t ) || ≤ 2Λ + Σ /h which holds for t ∈ [ −√ h − , √ h − ] . The result is now immediate fromConditions 2.4 and 2.8, because on the set B cn an upper bound on | log( | ˆ φ ( t ) | ) | grows slower than M n . Proof of Theorem 2.5. A general line of the proof is similar to that of The-orem 2.1 in [34], although the details and actual computations are different.We haveE [(ˆ σ n − σ ) ] = E [(ˆ σ n − σ ) B n ] + E [(ˆ σ n − σ ) B cn ] = S + S , where the two sets B n and B cn are defined in (31) and δ in their definitionis given by δ = (1 / e − − Σ /h . The term S in the above display can bebounded as follows, S . M n (cid:18)Z R | v h ( t ) | dt (cid:19) + Σ ! P( B n ) . M n (cid:18)Z R | v h ( t ) | dt (cid:19) + Σ ! e /h nh = M n h (cid:18)Z R | v ( t ) | dt (cid:19) + Σ ! e /h nh . m n e /h nh , where we used Chebyshev’s inequality and Theorem 2.2 with r = 2 to seethe second line. Next we consider S . By Lemma 3.1 on the set B cn for alllarge enough n truncation in the definition of ˆ σ n becomes unimportant andwe have S = E "(cid:18)Z R log( | ˆ φ ( t ) | ) v h ( t ) dt − σ (cid:19) B cn = E Z R log (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! v h ( t ) dt + Z R log( | φ X ( t ) | ) v h ( t ) dt − σ ! B cn . Hence by equation (4) in [38], the c -inequality and Conditions 2.9 and 2.1129e obtain that S . Λ (cid:18)Z R ℜ ( φ f ( t )) v h ( t ) dt (cid:19) + E Z R log (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ φ ( t ) φ X ( t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! v h ( t ) dt ! B cn = S + S . To bound S , we proceed as follows, S . 
To bound $S_{21}$, we proceed as follows,
\[
S_{21}\lesssim h^{6}\int_{\mathbb R}|\phi_f(t)|^2e^{2\alpha|t|^s}\,dt\int_{\mathbb R\setminus[-h^{-1},h^{-1}]}e^{-2\alpha|t|^s}\,dt
\lesssim h^{6}\int_{1/h}^{\infty}e^{-2\alpha t^s}\,dt\lesssim h^{s+5}e^{-2\alpha/h^s},
\]
where we used the Cauchy-Schwarz inequality, the fact that $|\Re(\phi_f(t))|\le|\phi_f(t)|$ and Condition 2.9. As far as $S_{22}$ is concerned, we have
\[
S_{22}\lesssim\mathrm E\Bigl[\Bigl(\int_{\mathbb R}\Bigl|\frac{\hat\phi(t)}{\phi_{X_\Delta}(t)}-1\Bigr|\,|v_h(t)|\,dt\Bigr)^2\mathbf 1_{B_n^c}\Bigr]
+\mathrm E\Bigl[\Bigl(\int_{\mathbb R}\Bigl\{\log\Bigl|\frac{\hat\phi(t)}{\phi_{X_\Delta}(t)}\Bigr|-\Bigl(\Bigl|\frac{\hat\phi(t)}{\phi_{X_\Delta}(t)}\Bigr|-1\Bigr)\Bigr\}v_h(t)\,dt\Bigr)^2\mathbf 1_{B_n^c}\Bigr]=S_{221}+S_{222}.
\]
An application of the Cauchy-Schwarz inequality and Conditions 2.2 and 2.9 gives
\[
S_{221}\lesssim e^{2\Sigma^2/h^2}\int_{\mathbb R}(v_h(t))^2\,dt\;\mathrm E\Bigl[\int_{-\sqrt2/h}^{\sqrt2/h}|\hat\phi(t)-\phi_{X_\Delta}(t)|^2\,dt\Bigr], \tag{33}
\]
where we also used the fact that on the set $B_n^c$ the inequality (32) holds. Parseval's identity and Proposition 1.7 of [54] (notice that in the latter it is actually not necessary to have a positive kernel) applied to the sinc kernel then yield
\[
\mathrm E\Bigl[\int_{-\sqrt2/h}^{\sqrt2/h}|\hat\phi(t)-\phi_{X_\Delta}(t)|^2\,dt\Bigr]\lesssim\frac1{nh},
\]
from which and from (33) we obtain
\[
S_{221}\lesssim\frac{e^{2\Sigma^2/h^2}h^{4}}{n}.
\]
Recall that on the set $B_n^c$ the inequality (32) holds; combining it with the inequality $|\log(1+z)-z|\le|z|^2$, valid for $|z|<1/2$, one sees that $S_{222}\lesssim S_{221}$. Furthermore, by a standard argument, under Condition 2.10 the term $S_{21}$ dominates the other terms. For instance, we have
\[
\frac{e^{2\Sigma^2/h^2}h^{4}}{n}\,h^{-s-5}e^{2\alpha/h^s}\to 0,
\]
because
\[
\frac{2\Sigma^2}{h^2}+\frac{2\alpha}{h^s}-\log n+\log h^{-s-1}=-(\log\log n)^2-(s+1)\log h\to-\infty.
\]
This follows from (16) and the fact that under Condition 2.10 it holds that $h\asymp(\log n)^{-1/2}$. The latter can be shown as formula (27) in [18]. Hence $S_{21}$ dominates $S_{221}$ and $S_{222}$. Combination of all the above bounds on the terms $S_i$ completes the proof.

Proof of Theorem 2.6. The general line of the proof is the same as in the proof of Theorem 2.3. With the same notation for the individual terms $T_i$ as in the latter, by the same argument as for the term $T_1$ in the proof of Theorem 2.3 and the term $S_{21}$ in the proof of Theorem 2.5 we have
\[
T_1\lesssim\Bigl(\int_{\mathbb R\setminus[-h^{-1},h^{-1}]}|\phi_f(t)|\,dt\Bigr)^2+h^{s-1}e^{-2\alpha/h^s}
=\Bigl(\int_{\mathbb R\setminus[-h^{-1},h^{-1}]}e^{-\alpha|t|^s}e^{\alpha|t|^s}|\phi_f(t)|\,dt\Bigr)^2+h^{s-1}e^{-2\alpha/h^s}
\]
\[
\lesssim\int_{\mathbb R\setminus[-h^{-1},h^{-1}]}e^{-2\alpha|t|^s}\,dt\int_{\mathbb R\setminus[-h^{-1},h^{-1}]}e^{2\alpha|t|^s}|\phi_f(t)|^2\,dt+h^{s-1}e^{-2\alpha/h^s}
\lesssim\int_{\mathbb R\setminus[-h^{-1},h^{-1}]}e^{-2\alpha|t|^s}\,dt+h^{s-1}e^{-2\alpha/h^s}
\]
\[
\lesssim\int_{1/h}^{\infty}e^{-2\alpha t^s}\,dt+h^{s-1}e^{-2\alpha/h^s}\lesssim h^{s-1}e^{-2\alpha/h^s}.
\]
Denote by $\mathrm{MSE}[\hat\sigma^2]$ the mean square error of $\hat\sigma^2_n$. From the proof of Theorem 2.3 and by Theorem 2.5 we have
\[
T_3\lesssim h^{-6}\,\mathrm{MSE}[\hat\sigma^2]+T_4\lesssim h^{s-1}e^{-2\alpha/h^s}+\frac{e^{2\Sigma^2/h^2}}{nh}.
\]
We then obtain
\[
T_3+T_4\lesssim h^{s-1}e^{-2\alpha/h^s}+\frac{e^{2\Sigma^2/h^2}}{nh}\lesssim h^{s-1}e^{-2\alpha/h^s},
\]
since
\[
\frac{e^{2\Sigma^2/h^2}}{nh}\,h^{-s+1}e^{2\alpha/h^s}\to 0,
\]
which can be seen by taking the logarithm of the left-hand side and then using Condition 2.9 and the fact that $h\asymp(\log n)^{-1/2}$, cf. formula (27) in [18], to conclude that the left-hand side in the above display diverges to minus infinity. This entails the first statement of the theorem.

Before proving the second statement of the theorem, we will show that the choice of $h$ as in Condition 2.10 is optimal in the sense that it asymptotically minimises the order of the bound on the mean square error of $\hat\rho$. This follows in essence by arguments similar to those used in the proof of Lemma 4 in [18]: a minimiser $h_*$ with respect to $h$ of the expression
\[
h^{s-1}e^{-2\alpha/h^s}+\frac{e^{2\Sigma^2/h^2}}{nh},
\]
which up to a constant is an upper bound on the risk of the estimator $\hat\rho$, can be found from the equation
\[
\frac{d}{dh}\Bigl[h^{s-1}e^{-2\alpha/h^s}+\frac{e^{2\Sigma^2/h^2}}{nh}\Bigr]=0.
\]
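Explicitly, the differentiation gives
\[
\frac{d}{dh}\Bigl[h^{s-1}e^{-2\alpha/h^{s}}\Bigr]=\Bigl[(s-1)h^{s-2}+\frac{2\alpha s}{h^{2}}\Bigr]e^{-2\alpha/h^{s}},\qquad
\frac{d}{dh}\Bigl[\frac{e^{2\Sigma^{2}/h^{2}}}{nh}\Bigr]=-\Bigl[\frac{4\Sigma^{2}}{nh^{4}}+\frac{1}{nh^{2}}\Bigr]e^{2\Sigma^{2}/h^{2}},
\]
and as $h\to 0$ the terms $(s-1)h^{s-2}$ and $1/(nh^{2})$ are of smaller order than $2\alpha s/h^{2}$ and $4\Sigma^{2}/(nh^{4})$, respectively; equating the two dominant terms leads to the relation stated next.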
After neglecting lower order terms (here we assume that $h\to 0$ as $n\to\infty$), one can deduce that $h_*$ has to satisfy
\[
2\alpha s\,nh_*^{2}\,(1+o(1))=4\Sigma^{2}e^{2\Sigma^2/h_*^2+2\alpha/h_*^s}. \tag{34}
\]
Taking the logarithm of both sides of (34) yields that $h_*$ satisfies
\[
a\log h_*+\frac{2\alpha}{h_*^s}+\frac{2\Sigma^2}{h_*^2}=\log n+C(1+o(1)) \tag{35}
\]
for some constants $a$ and $C$, cf. equation (11) in [18]. With $h_*$ chosen as in (34) or (35), the term $h_*^{s-1}e^{-2\alpha/h_*^s}$ dominates the term $e^{2\Sigma^2/h_*^2}/(nh_*)$, cf. pp. 30-31 in [18] for a similar result for the kernel-type deconvolution density estimator in a particular deconvolution problem. Indeed, for $h_*$ satisfying (34) we have
\[
h_*^{s-1}e^{-2\alpha/h_*^s}\asymp\frac{e^{2\Sigma^2/h_*^2}}{nh_*}\,h_*^{s-2},
\]
and it suffices to observe that $s<2$. Now let $\tilde h$ be as in (16). For any $b\in\mathbb R$ the formula
\[
h_*^{b}e^{-2\alpha/h_*^s}=\tilde h^{b}e^{-2\alpha/\tilde h^s}(1+o(1))
\]
holds, which can be proved exactly as formula (28) of [18]. Furthermore,
\[
\frac{e^{2\Sigma^2/\tilde h^2}}{n\tilde h}=o\bigl(e^{-2\alpha/\tilde h^s}\bigr),
\]
which is a direct consequence of (16) and the fact that $\tilde h\asymp(\log n)^{-1/2}$, cf. formula (27) of [18]. Finally,
\[
\frac{e^{2\Sigma^2/\tilde h^2}}{n\tilde h}\le\frac{e^{2\Sigma^2/h_*^2}}{nh_*}
\]
for $n$ large enough, which can be shown as formula (30) of [18]. These facts together imply that $h$ as in (16) defines an optimal bandwidth, since an upper bound on the risk of $\hat\rho(x)$ computed with such an $h$ is of the same order as the one computed with $h_*$. Combination of the above results proves the first statement of the theorem.

To complete the proof of the theorem, it remains to prove (17). Assuming $n$ is large enough, by (16) it holds in the case $s=1$ that
\[
\frac1h=\frac{-\alpha+\sqrt{\alpha^2+2\Sigma^2(\log n-(\log\log n)^2)}}{2\Sigma^2}.
\]
From this it follows that
\[
\exp\Bigl(-\frac{2\alpha}{h}\Bigr)\lesssim\exp\Bigl(-2\alpha\sqrt{\frac{\log n-(\log\log n)^2}{2\Sigma^2}}\Bigr).
\]
The right-hand side is of order $\exp\bigl(-2\alpha\sqrt{(\log n)/(2\Sigma^2)}\bigr)$, as can be seen by some straightforward manipulations: we have
\[
\exp\Bigl(-2\alpha\sqrt{\frac{\log n-(\log\log n)^2}{2\Sigma^2}}\Bigr)
=\exp\Bigl(-2\alpha\sqrt{\frac{\log n}{2\Sigma^2}}\Bigr)\exp\Bigl(2\alpha\sqrt{\frac{\log n}{2\Sigma^2}}-2\alpha\sqrt{\frac{\log n-(\log\log n)^2}{2\Sigma^2}}\Bigr) \tag{36}
\]
and
\[
\sqrt{\frac{\log n}{2\Sigma^2}}-\sqrt{\frac{\log n-(\log\log n)^2}{2\Sigma^2}}\to 0,
\]
because the left-hand side of the latter can be rewritten as
\[
-\frac{1}{\sqrt{2\Sigma^2}}\,\frac{(\log\log n)^2}{\sqrt{\log n}}\Bigl[\Bigl(\sqrt{1-\frac{(\log\log n)^2}{\log n}}-1\Bigr)\frac{\log n}{(\log\log n)^2}\Bigr].
\]
The term in the square brackets converges to $-1/2$, because it converges to the derivative of the function $\sqrt{1-t}$ at $t=0$, while for the first factor we have
\[
\frac{1}{\sqrt{2\Sigma^2}}\,\frac{(\log\log n)^2}{\sqrt{\log n}}\to 0.
\]
Hence $\exp(-2\alpha/h)$ is of order $\exp\bigl(-2\alpha\sqrt{(\log n)/(2\Sigma^2)}\bigr)$. This concludes the proof of the theorem.

Acknowledgments

Part of the research reported in this paper was done while the author was at EURANDOM, Eindhoven, The Netherlands.

References

[1] Y. Aït-Sahalia and J. Jacod. Volatility estimators for discretely sampled Lévy processes. Ann. Statist., 35:355–392, 2007.
[2] M.G. Akritas and R.A. Johnson. Asymptotic inference in Lévy processes of the discontinuous type. Ann. Statist., 9:604–614, 1981.
[3] M.G. Akritas. Asymptotic theory for estimating the parameters of a Lévy process. Ann. Inst. Statist. Math., 34:259–280, 1982.
[4] I.V. Basawa and P.J. Brockwell. Inference for gamma and stable processes. Biometrika, 65:129–133, 1978.
[5] I.V. Basawa and P.J. Brockwell. A note on estimation for gamma and stable processes. Biometrika, 67:234–236, 1980.
[6] D. Belomestny and M. Reiß. Spectral calibration of exponential Lévy models. Finance Stoch., 10:449–474, 2006.
[7] D. Belomestny and M. Reiß. Spectral calibration of exponential Lévy models [2]. SFB 649 Discussion Paper 2006-035, 2006.
[8] J. Bertoin. Lévy Processes. Cambridge University Press, Cambridge, 1996.
[9] B.M. Bibby and M. Sørensen. A hyperbolic diffusion model for stock prices. Finance Stoch., 1:25–41, 1997.
[10] P. Blæsild and M. Sørensen.
HYP – a computer program for analyzing data by means of the hyperbolic distribution. Research Report No. 248, Department of Mathematical Statistics, Aarhus University, 1992.
[11] S. Borak, W. Härdle and R. Weron. Stable distributions. In P. Cizek, W. Härdle and R. Weron (editors), Statistical Tools for Finance and Insurance, 21–44, Springer, Berlin, 2005.
[12] L.D. Brown, M.G. Low and L.H. Zhao. Superefficiency in nonparametric function estimation. Ann. Statist., 25:2607–2625, 1997.
[13] B. Buchmann. Weighted empirical processes in the nonparametric inference for Lévy processes. Math. Methods Statist., 18:281–309, 2009.
[14] B. Buchmann and R. Grübel. Decompounding: an estimation problem for Poisson random sums. Ann. Statist., 31:1054–1074, 2003.
[15] B. Buchmann and R. Grübel. Decompounding Poisson random sums: recursively truncated estimates in the discrete case. Ann. Inst. Statist. Math., 56:743–756, 2004.
[16] E.V. Burnaev. Inversion formula for infinitely divisible distributions. Russ. Math. Surv., 61:772–774, 2006.
[17] C. Butucea and C. Matias. Minimax estimation of the noise level and of the deconvolution density in a semiparametric convolution model. Bernoulli, 11:309–340, 2005.
[18] C. Butucea and A.B. Tsybakov. Sharp optimality for density deconvolution with dominating bias, I. Theory Probab. Appl., 52:24–39, 2008.
[19] C. Butucea and A.B. Tsybakov. Sharp optimality for density deconvolution with dominating bias, II. Theory Probab. Appl., 52:237–249, 2008.
[20] P. Carr, H. Geman, D.B. Madan and M. Yor. The fine structure of asset returns: an empirical investigation. J. Bus., 75:305–332, 2002.
[21] S.X. Chen, A. Delaigle and P. Hall. Nonparametric estimation for a class of Lévy processes. J. Econometrics, 157:257–271, 2010.
[22] K.L. Chung. A Course in Probability Theory, 3rd edition. Academic Press, New York, 2001.
[23] F. Comte and V. Genon-Catalot. Nonparametric estimation for pure jump Lévy processes based on high frequency data. Stochastic Process. Appl., 119:4088–4123, 2009.
[24] F. Comte and V. Genon-Catalot. Nonparametric adaptive estimation for pure jump Lévy processes. Ann. Inst. H. Poincaré Probab. Statist., 46:595–617, 2010.
[25] F. Comte and V. Genon-Catalot. Non-parametric estimation for pure jump irregularly sampled or noisy Lévy processes. Stat. Neerl., 64:290–313, 2010.
[26] F. Comte and V. Genon-Catalot. Estimation for Lévy processes from high frequency data within a long time interval. Ann. Statist., 39:803–837, 2011.
[27] F. Comte and C. Lacour. Data driven density estimation in presence of unknown convolution operator. To appear in J. R. Stat. Soc. Ser. B Stat. Methodol., 2011.
[28] R. Cont and P. Tankov. Financial Modelling with Jump Processes. Chapman & Hall/CRC, Boca Raton, 2003.
[29] R. Cont and P. Tankov. Retrieving Lévy processes from option prices: regularization of an ill-posed inverse problem. SIAM J. Control Optim., 45:1–25, 2006.
[30] A. Delaigle. An alternative view of the deconvolution problem. Statist. Sinica, 18:1025–1045, 2008.
[31] L. Devroye. On the non-consistency of an estimate of Chiu. Stat. Probab. Lett., 20:183–188, 1994.
[32] L. Devroye and L. Györfi. Nonparametric Density Estimation: the $L_1$ View. John Wiley & Sons, New York, 1985.
[33] B. van Es and S. Gugushvili. Asymptotic normality of the deconvolution kernel density estimator under the vanishing error variance. J. Korean Statist. Soc., 39:102–115, 2010.
[34] B. van Es, S. Gugushvili and P. Spreij. A kernel type nonparametric density estimator for decompounding.
Bernoulli, 13:672–694, 2007.
[35] J. Fan. On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist., 19:1257–1272, 1991.
[36] J. Fan. Deconvolution with supersmooth distributions. Canad. J. Statist., 20:155–169, 1992.
[37] E. Figueroa-López. Sieve-based confidence intervals and bands for Lévy densities. Bernoulli, 17:643–670, 2011.
[38] S. Gugushvili. Nonparametric estimation of the characteristic triplet of a discretely observed Lévy process. J. Nonparametr. Stat., 21:321–343, 2009.
[39] S. Gugushvili, B. van Es and P. Spreij. Deconvolution for an atomic distribution: rates of convergence. To appear in J. Nonparametr. Stat., 2011.
[40] S. Gugushvili, C. Klaassen and P. Spreij (editors). Statistical Inference for Lévy Processes with Applications to Finance. Stat. Neerl., 64(3), 2010.
[41] G. Jongbloed and F.H. van der Meulen. Parametric estimation for subordinators and induced OU processes. Scand. J. Statist., 33:825–847, 2006.
[42] G. Jongbloed, F.H. van der Meulen and A.W. van der Vaart. Nonparametric inference for Lévy-driven Ornstein-Uhlenbeck processes. Bernoulli, 11:759–791, 2005.
[43] J. Kappus and M. Reiß. Estimation of the characteristics of a Lévy process observed at arbitrary frequency. Stat. Neerl., 64:314–328, 2010.
[44] A.E. Kyprianou. Introductory Lectures on Fluctuations of Lévy Processes with Applications. Springer, Berlin, 2006.
[45] A. Meister. Density estimation with normal measurement error with unknown variance. Statist. Sinica, 16:195–211, 2006.
[46] R.C. Merton. Option pricing when underlying stock returns are discontinuous. J. Financ. Econ., 3:125–144, 1976.
[47] M.H. Neumann. On the effect of estimating the error density in nonparametric deconvolution. J. Nonparametr. Statist., 7:307–330, 1997.
[48] M.H. Neumann and M. Reiß. Nonparametric estimation for Lévy processes from low-frequency observations. Bernoulli, 15:223–248, 2009.
[49] J.P. Nolan. Maximum likelihood estimation and diagnostics for stable distributions. In O.E. Barndorff-Nielsen, T. Mikosch and S.I. Resnick (editors), Lévy Processes: Theory and Applications, 379–400, Birkhäuser, Boston, 2001.
[50] Y.-F. Ren and H.-Y. Liang. On the best constant in Marcinkiewicz-Zygmund inequality. Statist. Probab. Lett., 53:227–233, 1999.
[51] T.H. Rydberg. The normal inverse Gaussian Lévy process: simulation and approximation. Stoch. Models, 13:887–910, 1997.
[52] K.-I. Sato. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge, 2004.
[53] J. Söhl. Polar sets for anisotropic Gaussian random fields. Statist. Probab. Lett., 80:840–847, 2010.
[54] A.B. Tsybakov. Introduction to Nonparametric Estimation. Springer, New York, 2009.
[55] A.W. van der Vaart. Asymptotic Statistics. Cambridge University Press, Cambridge, 1998.
[56] A.W. van der Vaart and J.A. Wellner. Weak Convergence and Empirical Processes with Applications to Statistics. Springer, New York, 1996.
[57] M.P. Wand. Finite sample performance of deconvolving density estimators. Statist. Probab. Lett., 37:131–139, 1998.
[58] R.N. Watteel and R.J. Kulperger. Nonparametric estimation of the canonical measure for infinitely divisible distributions. J. Stat. Comput. Simul., 73:525–542, 2003.
[59] V.M. Zolotarev.