Deconvolution for an atomic distribution
Electronic Journal of Statistics
Vol. 2 (2008) 265–297
ISSN: 1935-7524
Bert van Es
Korteweg-de Vries Institute for Mathematics, Universiteit van Amsterdam, Plantage Muidergracht 24, 1018 TV Amsterdam, The Netherlands
e-mail: [email protected]
Shota Gugushvili∗†
Eurandom, Technische Universiteit Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
e-mail: [email protected]
Peter Spreij
Korteweg-de Vries Institute for Mathematics, Universiteit van Amsterdam, Plantage Muidergracht 24, 1018 TV Amsterdam, The Netherlands
e-mail: [email protected]
Abstract:
Let $X_1, \ldots, X_n$ be i.i.d. observations, where $X_i = Y_i + \sigma Z_i$ and $Y_i$ and $Z_i$ are independent. Assume that the unobservable $Y_i$'s are distributed as a random variable $UV,$ where $U$ and $V$ are independent, $U$ has a Bernoulli distribution with probability of zero equal to $p$ and $V$ has a distribution function $F$ with density $f.$ Furthermore, let the random variables $Z_i$ have the standard normal distribution and let $\sigma > 0.$ Based on a sample $X_1, \ldots, X_n,$ we consider the problem of estimation of the density $f$ and the probability $p.$ We propose a kernel type deconvolution estimator for $f$ and derive its asymptotic normality at a fixed point. A consistent estimator for $p$ is given as well. Our results demonstrate that our estimator behaves very much like the kernel type deconvolution estimator in the classical deconvolution problem.

AMS 2000 subject classifications:
Primary 62G07; secondary 62G20.
Keywords and phrases:
Asymptotic normality, atomic distribution, deconvolution, kernel density estimator.

Received September 2007.

∗ The corresponding author.
† The research of this author was financially supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO). The research was conducted while this author was at the Korteweg-de Vries Institute for Mathematics in Amsterdam.
1. Introduction
Let $X_1, \ldots, X_n$ be i.i.d. copies of a random variable $X = Y + \sigma Z,$ where $X_i = Y_i + \sigma Z_i,$ and $Y_i$ and $Z_i$ are independent and have the same distributions as $Y$ and $Z,$ respectively. Assume that the $Y_i$'s are unobservable and that $Y = UV,$ where $U$ and $V$ are independent, $U$ has a Bernoulli distribution with probability of zero equal to $p$ (we assume that $0 \leq p < 1$) and $V$ has a distribution function $F$ with density $f.$ Furthermore, let the random variable $Z$ have a standard normal distribution and let $\sigma$ be a known positive number. The $X_i$'s will then have a density, which we denote by $q.$ The distribution of $Y$ is completely determined by $f$ and $p.$ Note that the distribution of $Y$ has an atom at zero. Based on a sample $X_1, \ldots, X_n,$ we consider the problem of (nonparametric) estimation of the density $f$ and the probability $p.$

Our estimation problem is closely related to the classical deconvolution problem, where the situation is as described above, except that in the classical case $p$ vanishes and $Y_i$ has a continuous distribution with density $f,$ which we want to estimate. The $Y_i$'s can for instance be interpreted as measurements of some characteristic of interest, contaminated by noise $\sigma Z_i.$ Some works on deconvolution include [3, 4, 6, 7, 9, 10, 11, 13, 14, 16, 19, 20, 21, 22, 23, 28, 30, 32, 35, 38, 39, 42, 43, 45, 46] and [50]. Practical problems related to deconvolution can be found e.g. in [31], which provides a general account of mixture models. The deconvolution problem is also related to empirical Bayes estimation of the prior distribution, see e.g. [2] and [33]. Yet another application field is nonparametric errors-in-variables regression, see [24].

Unlike the classical deconvolution problem, in our case $Y$ does not have a density, because the distribution of $Y$ has an atom at zero. Hence our results, apart from the direct applications below, will also provide insight into the robustness of the deconvolution estimator when the assumption of absolute continuity is violated.

One situation where atomic deconvolution can arise is the following: one might think of the $X_i$'s as increments $X_i - X_{i-1}$ of a stochastic process $X_t = Y_t + \sigma Z_t,$ where $Y = (Y_t)_{t \geq 0}$ is a compound Poisson process with intensity $\lambda$ and jump size density $\rho,$ and $Z = (Z_t)_{t \geq 0}$ is a Brownian motion independent of $Y.$
The distribution of $Y_i - Y_{i-1}$ then has an atom at zero with probability equal to $e^{-\lambda},$ while $Z_i - Z_{i-1}$ has a standard normal distribution. Notice that $X = (X_t)_{t \geq 0}$ is a Lévy process, see Example 8. $X$ can be used to model the evolution of a stock price, see [34]. The law of $X$ can be completely characterised by $f, \lambda$ and $\sigma.$ Furthermore, estimation of $f$ in the atomic deconvolution context is closely related to estimation of the jump size density of a compound Poisson process $Y,$ which is contaminated by noise coming from a Brownian motion, see [26].

Another practical situation might arise in missing data problems. Suppose for instance that a measurement device is used to measure some quantity of interest and that it has a fixed probability $p$ of failing to detect this quantity, in which case it renders zero. Repetitive measurements can be modelled by random variables $Y_i$ defined as above. Assume that our goal is to estimate the density $f$ and the probability $p.$ In practice measurements are often contaminated by an additive measurement error and, to account for this, we add the noise $\sigma Z_i$ to our measurements ($\sigma$ quantifies the noise level). If we could directly use the measurements $Y_i,$ then the zero measurements could be discarded and we would have observations with density $f$ to base our estimator on. However, due to the additional noise $\sigma Z_i,$ the zeroes cannot be distinguished from the nonzero $Y_i$'s. The use of deconvolution techniques is thus unavoidable. The same situation occurs for instance when the $Y_i$ are left truncated at zero. In the error-free case, i.e. when $\sigma = 0,$ estimation of the mean and variance of a positive random variable $V$ was considered in [1]. Our model appears to be more general.

In what follows, we first assume that $p$ is known and construct an estimator for $f.$ After this, in the model where $p$ is unknown, we will provide an estimator for $p$ and then propose a plug-in type estimator for $f.$
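A short simulation illustrates the compound Poisson example above. The intensity $\lambda = 0.7$ and the choice of a standard exponential jump density $\rho$ are hypothetical, purely for illustration; the point is that the unit-time increments of $Y$ carry an atom at zero of mass $e^{-\lambda}$:

```python
import math
import random

random.seed(1)

lam = 0.7        # Poisson intensity of Y (hypothetical value)
n = 100_000      # number of unit-time increments

def poisson(mean):
    # Knuth's method for sampling a Poisson random variable
    threshold, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= threshold:
            return k
        k += 1

zeros = 0
for _ in range(n):
    jumps = poisson(lam)
    # jump sizes drawn from rho; standard exponential here, purely for illustration
    y_incr = sum(random.expovariate(1.0) for _ in range(jumps))
    if y_incr == 0.0:
        zeros += 1

print(zeros / n, math.exp(-lam))   # empirical atom mass vs e^{-lam}
```

The empirical fraction of exactly-zero increments matches $e^{-\lambda}$ up to Monte Carlo error, which is precisely the atom that the noise $\sigma Z$ hides from the observer.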
An estimator for $f$ will be constructed via methods similar to those used in the classical deconvolution problem. In particular we will use Fourier inversion and kernel smoothing. Let $\phi_X, \phi_Y$ and $\phi_f$ denote the characteristic functions of the random variables $X, Y$ and $V,$ respectively. Notice that the characteristic function of $Y$ is given by
\[ \phi_Y(t) = p + (1-p)\phi_f(t). \tag{1.1} \]
Furthermore, since
\[ \phi_X(t) = \phi_Y(t)\, e^{-\sigma^2 t^2/2} = (p + (1-p)\phi_f(t))\, e^{-\sigma^2 t^2/2}, \]
the characteristic function of $V$ can be expressed as
\[ \phi_f(t) = \frac{\phi_X(t) - p\, e^{-\sigma^2 t^2/2}}{(1-p)\, e^{-\sigma^2 t^2/2}}. \]
Assuming that $\phi_f$ is integrable, by Fourier inversion we get
\[ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\phi_X(t) - p\, e^{-\sigma^2 t^2/2}}{(1-p)\, e^{-\sigma^2 t^2/2}}\, dt. \tag{1.2} \]
An obvious way to construct an estimator of $f(x)$ from this relation is to estimate the characteristic function $\phi_X(t)$ by its empirical counterpart,
\[ \phi_{emp}(t) = \frac{1}{n} \sum_{j=1}^{n} e^{itX_j}, \]
see e.g. [25] for a discussion of its applications in statistics, and then obtain the estimator of $f$ by a plug-in device. Alternatively, one can estimate the density $q$ of $X$ by a kernel estimator
\[ q_{nh}(x) = \frac{1}{nh} \sum_{j=1}^{n} w\left(\frac{x - X_j}{h}\right), \]
where $w$ denotes a kernel function and $h > 0$ a bandwidth. Denote by $\phi_w$ the Fourier transform of the kernel $w.$ The characteristic function of $q_{nh},$ which is equal to $\phi_{emp}(t)\phi_w(ht),$ will serve as an estimator of $\phi_q,$ the characteristic function of $q.$ A naive estimator of $f$ can then be obtained by a plug-in device, and would be
\[ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\phi_{emp}(t)\phi_w(ht) - p\, e^{-\sigma^2 t^2/2}}{(1-p)\, e^{-\sigma^2 t^2/2}}\, dt. \tag{1.3} \]
However, this procedure is not always meaningful, because the integrand in (1.3) is not integrable in general. Therefore, instead of (1.3), we define our estimator of $f$ as
\[ f_{nh}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\phi_{emp}(t) - p\, e^{-\sigma^2 t^2/2}}{(1-p)\, e^{-\sigma^2 t^2/2}}\, \phi_w(ht)\, dt, \tag{1.4} \]
where the integral is well-defined under the assumption that $\phi_w$ has compact support $[-1, 1].$
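A minimal numerical sketch of (1.4), not the authors' implementation: the integral is discretised over the support of $\phi_w(h\,\cdot),$ with $\phi_w$ as in (1.13); the sample size, bandwidth and model parameters below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate the model X = UV + sigma*Z (all parameter values are illustrative)
n, p, sigma = 2000, 0.1, 1.0
U = (rng.random(n) >= p).astype(float)    # U = 0 with probability p
V = rng.normal(3.0, 3.0, n)               # density f: N(3, 9)
X = U * V + sigma * rng.normal(size=n)

def f_nh(x, X, p, sigma, h, m=2000):
    # estimator (1.4); phi_w(ht) vanishes for |t| > 1/h, so truncate there
    t = np.linspace(-1.0 / h, 1.0 / h, m)
    phi_emp = np.exp(1j * np.outer(t, X)).mean(axis=1)   # empirical char. function
    phi_w = (1.0 - (h * t) ** 2) ** 2                    # Fourier transform (1.13)
    integrand = (np.exp(-1j * t * x)
                 * (phi_emp - p * np.exp(-sigma**2 * t**2 / 2))
                 / ((1.0 - p) * np.exp(-sigma**2 * t**2 / 2))
                 * phi_w)
    dt = t[1] - t[0]
    return (integrand.sum() * dt).real / (2.0 * np.pi)

h = 0.58   # bandwidth of logarithmic order, cf. Condition 1.4
est = [f_nh(x, X, p, sigma, h) for x in (0.0, 3.0, 6.0)]
print(np.round(est, 3))
```

With $p$ known, the estimate near the mode $x = 3$ should be close to $f(3) \approx 0.133$ up to smoothing bias and sampling noise.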
Notice that
\[ f_{nh}(x) = \frac{\hat f_{nh}(x)}{1-p} - \frac{p}{1-p}\, w_h(x), \tag{1.5} \]
where
\[ \hat f_{nh}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \phi_{emp}(t)\phi_w(ht)\, e^{\sigma^2 t^2/2}\, dt \tag{1.6} \]
and $w_h(x) = (1/h)\, w(x/h).$ Hence $\hat f_{nh}$ has the same form as an ordinary deconvolution kernel density estimator based on the sample $X_1, \ldots, X_n,$ see e.g. pp. 231–232 in [49].

Under the assumption of integrability of $\phi_f$ and some additional restrictions on $w,$ the bias of the estimator (1.4) will asymptotically vanish as $h \to 0.$ Indeed,
\[ \mathrm{E}[f_{nh}(x)] - f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \phi_f(t)(\phi_w(ht) - 1)\, dt. \tag{1.7} \]
The result follows via the dominated convergence theorem, once we know that $\phi_w$ is bounded and $\phi_w(0) = 1.$ Observe that (1.7) coincides with the bias of an ordinary kernel density estimator based on a sample from $f.$ In case we know that $f$ belongs to a specific Hölder class, it is possible to derive an order bound for (1.7) in terms of some power of $h,$ see Proposition 1.2 in [40]. Further properties of kernel density estimators can be found in [15, 17, 36, 40, 48] and [49].

Estimation of $p$ is not as easy as it might appear at first sight. Indeed, due to the convolution structure $X = Y + \sigma Z,$ the random variable $X$ has a density and the atom in the distribution of $Y$ is not inherited by the distribution of $X.$ On the other hand $p$ is identifiable, since
\[ \lim_{t \to \infty} \phi_X(t)\, e^{\sigma^2 t^2/2} = \lim_{t \to \infty} \phi_Y(t) = p, \]
because $\phi_f(t) \to 0$ as $t \to \infty$ by the Riemann-Lebesgue theorem. However, this relation cannot be used as a hint for the construction of a meaningful estimator of $p$ because of the oscillating behaviour of $\phi_{emp}(t),$ the obvious estimator of $\phi_X(t),$ as $t \to \infty.$ As an estimator of $p$ we propose
\[ p_{ng} = \frac{g}{2} \int_{-1/g}^{1/g} \phi_{emp}(t)\, \phi_k(gt)\, e^{\sigma^2 t^2/2}\, dt, \tag{1.8} \]
where the bandwidth $g > 0$ and $\phi_k$ denotes the Fourier transform of a kernel $k.$ We assume that $\phi_k$ has support $[-1, 1].$ The definition of $p_{ng}$ is motivated by the fact that
\[ \lim_{g \to 0} \frac{g}{2} \int_{-1/g}^{1/g} \phi_X(t)\, e^{\sigma^2 t^2/2}\, dt = \lim_{g \to 0} \frac{g}{2} \int_{-1/g}^{1/g} \phi_Y(t)\, dt = \lim_{g \to 0} \frac{g}{2} \int_{-1/g}^{1/g} (p + (1-p)\phi_f(t))\, dt = p. \]
Assuming the integrability of $\phi_f,$ the last equality follows from
\[ \int_{-1/g}^{1/g} |\phi_f(t)|\, dt \leq \int_{-\infty}^{\infty} |\phi_f(t)|\, dt < \infty. \]
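A numerical sketch of (1.8), again illustrative rather than the authors' code: with $\phi_k$ as in (1.17) and data simulated from the model, $p_{ng}$ reduces to a one-dimensional quadrature; the sample size, bandwidth and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# simulate X = UV + sigma*Z (illustrative parameters; f is N(3, 9))
n, p, sigma = 1000, 0.1, 1.0
X = (rng.random(n) >= p) * rng.normal(3.0, 3.0, n) + sigma * rng.normal(size=n)

def p_ng(X, sigma, g, m=4000):
    # estimator (1.8): (g/2) * int_{-1/g}^{1/g} phi_emp(t) phi_k(gt) e^{sigma^2 t^2/2} dt
    t = np.linspace(-1.0 / g, 1.0 / g, m)
    phi_emp = np.exp(1j * np.outer(t, X)).mean(axis=1)
    s = g * t
    phi_k = (693.0 / 8.0) * s**6 * (1.0 - s**2) ** 2      # Fourier transform (1.17)
    integrand = phi_emp * phi_k * np.exp(sigma**2 * t**2 / 2.0)
    return (g / 2.0) * (integrand.sum() * (t[1] - t[0])).real

print(round(p_ng(X, sigma, g=0.6), 3))   # should land in a neighbourhood of p = 0.1
```

Because $\phi_k$ downweights low frequencies (where the contribution of $f$ lives) and the factor $e^{\sigma^2 t^2/2}$ undoes the Gaussian smoothing, the statistic isolates the constant term $p$ of $\phi_Y$ in (1.1).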
Finally, let us consider the general case when both $p$ and $f$ are unknown. Plugging an estimator of $p$ into (1.4) leads to the following definition of an estimator of $f$:
\[ f^*_{nhg}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\phi_{emp}(t) - \hat p_{ng}\, e^{-\sigma^2 t^2/2}}{(1 - \hat p_{ng})\, e^{-\sigma^2 t^2/2}}\, \phi_w(ht)\, dt, \tag{1.9} \]
where
\[ \hat p_{ng} = \min(p_{ng}, 1 - \epsilon_n). \tag{1.10} \]
Here $0 < \epsilon_n < 1$ and $\epsilon_n \downarrow 0.$ The truncation of $p_{ng}$ in (1.10) is introduced for technical reasons, see formula (5.18), where we need that the random variable $1 - \hat p_{ng}$ is bounded away from zero.

In practice it might also happen that the error variance $\sigma^2$ is unknown and hence has to be estimated. This is a difficult problem in classical deconvolution density estimation if only the observations $X_1, \ldots, X_n$ are available, as the convergence rate for estimation of $\sigma$ is not the usual $\sqrt{n}$ rate, see e.g. [35]. Moreover, the convergence rate of an estimator of $\sigma$ would dominate the asymptotics. If additional measurements are available, then, as suggested for instance in [9], $\sigma$ can be estimated e.g. via the empirical variance of the difference of replicated observations or by the method of moments via instrumental variables. A recent paper on this subject is [14]. We do not pursue this question any further and assume that $\sigma$ is known.

Concluding this section, we introduce some technical conditions on the density $f,$ the kernels $w$ and $k,$ the bandwidths $h$ and $g$ and the sequence $\epsilon_n.$ These are needed in the proof of Theorem 2.5, the main theorem of the paper, and subsequent results. Weaker forms of these conditions are sufficient to prove other results from Section 2 and will be given directly in the corresponding statements.
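Putting (1.8)–(1.10) together, the fully data-driven estimator can be sketched as follows. Everything here is illustrative: parameter and bandwidth values are assumptions, and a fixed small truncation level replaces the asymptotic recipe for $\epsilon_n,$ which only makes sense for very large $n.$

```python
import numpy as np

rng = np.random.default_rng(7)

# simulate X = UV + sigma*Z (illustrative parameters; f is N(3, 9))
n, p, sigma = 1000, 0.1, 1.0
X = (rng.random(n) >= p) * rng.normal(3.0, 3.0, n) + sigma * rng.normal(size=n)

def phi_emp(t, X):
    return np.exp(1j * np.outer(t, X)).mean(axis=1)

# step 1: p_ng from (1.8), with phi_k as in (1.17)
g = 0.6
t = np.linspace(-1.0 / g, 1.0 / g, 4000)
phi_k = (693.0 / 8.0) * (g * t) ** 6 * (1.0 - (g * t) ** 2) ** 2
p_ng = (g / 2.0) * ((phi_emp(t, X) * phi_k
                     * np.exp(sigma**2 * t**2 / 2)).sum() * (t[1] - t[0])).real

# step 2: truncation (1.10); a fixed small level stands in for epsilon_n here
eps = 0.1
p_hat = min(p_ng, 1.0 - eps)

# step 3: plug p_hat into (1.9), with phi_w as in (1.13)
def f_star(x, h):
    t = np.linspace(-1.0 / h, 1.0 / h, 4000)
    phi_w = (1.0 - (h * t) ** 2) ** 2
    num = phi_emp(t, X) - p_hat * np.exp(-sigma**2 * t**2 / 2)
    den = (1.0 - p_hat) * np.exp(-sigma**2 * t**2 / 2)
    integrand = np.exp(-1j * t * x) * num / den * phi_w
    return (integrand.sum() * (t[1] - t[0])).real / (2.0 * np.pi)

print(round(p_hat, 3), round(f_star(3.0, h=0.58), 3))
```

The plug-in step is exactly (1.9) with $\hat p_{ng}$ in place of $p$; the truncation never bites unless $p_{ng}$ comes out close to one.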
Condition 1.1.
There exists a number $\gamma > 0$ such that $u^{\gamma} \phi_f(u)$ is integrable.

Condition 1.2.
Let $\phi_w$ be bounded, real valued, symmetric and have support $[-1, 1].$ Let $\phi_w(0) = 1$ and let
\[ \phi_w(1 - t) = A t^{\alpha} + o(t^{\alpha}), \quad \text{as } t \downarrow 0, \]
for some constants $A$ and $\alpha \geq 0.$ Moreover, we assume that $\gamma > \alpha.$

This condition is similar to the one used in [30] and [46] in the classical deconvolution problem. An example of a kernel that satisfies this condition is
\[ w(x) = \frac{(24 - 8x^2)\sin x - 24x\cos x}{\pi x^5}. \tag{1.12} \]
Its Fourier transform is given by
\[ \phi_w(t) = (1 - t^2)^2\, 1_{[-1,1]}(t). \tag{1.13} \]
In this case $\alpha = 2$ and $A = 4.$ The kernel (1.12) and its Fourier transform are plotted in Figures 1 and 2.
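A quick numerical check of Condition 1.2 for this kernel (grid sizes are arbitrary choices): $\phi_w(1-t)/t^2$ should tend to $A = 4$ as $t \downarrow 0,$ and the closed form (1.12) should agree with the inverse Fourier transform of (1.13).

```python
import numpy as np

phi_w = lambda t: np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 2, 0.0)

# alpha = 2, A = 4:  phi_w(1 - t) = (2t - t^2)^2 = 4 t^2 + o(t^2) as t -> 0
for eps in (1e-2, 1e-3, 1e-4):
    print(eps, float(phi_w(1.0 - eps)) / eps**2)     # approaches 4

def w_closed(x):
    # candidate closed form (1.12)
    return ((24.0 - 8.0 * x**2) * np.sin(x) - 24.0 * x * np.cos(x)) / (np.pi * x**5)

# inverse Fourier transform of (1.13), evaluated by a fine Riemann sum
t = np.linspace(-1.0, 1.0, 200_001)
dt = t[1] - t[0]
for x in (0.5, 1.0, 3.0):
    w_num = (phi_w(t) * np.exp(-1j * t * x)).sum() * dt / (2.0 * np.pi)
    print(x, w_num.real, w_closed(x))                # the two columns agree
```

The agreement of the two columns is a consistency check on the displayed formulas, not a substitute for the analytic statement.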
Fig 1. The kernel (1.12).

Fig 2. The Fourier transform (1.13) of the kernel (1.12).
Condition 1.3.
Let $\phi_k$ be real valued, symmetric and have support $[-1, 1].$ Let $\phi_k$ integrate to 2 and let
\[ \phi_k(t) = B t^{\gamma} + o(t^{\gamma}), \tag{1.14} \]
\[ \phi_k(1 - t) = C t^{\alpha} + o(t^{\alpha}), \tag{1.15} \]
as $t \downarrow 0.$ Here $B$ and $C$ are some constants, and $\gamma$ and $\alpha$ are the same as above.

An example of such a kernel is the kernel $k$ obtained by Fourier inversion of
\[ \phi_k(t) = \frac{693}{8}\, t^{6} (1 - t^{2})^{2}\, 1_{[-1,1]}(t); \tag{1.17} \]
the explicit expression (1.16) for $k$ itself is a combination of sine and cosine terms multiplied by rational functions of $x.$ In this case $B = 693/8,$ $\gamma = 6,$ $\alpha = 2$ and $C = 693/2.$ The kernel (1.16) and its Fourier transform (1.17) are plotted in Figures 3 and 4. Condition (1.14) is only needed when $p_{ng}$ is plugged into $f^*_{nhg},$ but not if $p_{ng}$ is used as an estimator of $p.$

Condition 1.4.
Let the bandwidths $h$ and $g$ depend on $n,$ $h = h_n$ and $g = g_n,$ and let
\[ h_n = \sigma ((1 + \eta_n) \log n)^{-1/2}, \qquad g_n = \sigma ((1 + \delta_n) \log n)^{-1/2}, \]
where $\eta_n$ and $\delta_n$ are such that $\eta_n \downarrow 0,$ $\delta_n \downarrow 0,$ $\eta_n - \delta_n > 0,$ and $(\eta_n - \delta_n) \log n \to \infty.$
Fig 3. The kernel (1.16).

Fig 4. The Fourier transform (1.17) of the kernel (1.16).
Furthermore, we assume that
\[ -\eta_n \log n + (1 + 2\alpha) \log\log n \to \infty, \qquad -\delta_n \log n + (1 + 2\alpha) \log\log n \to \infty. \tag{1.18} \]
An example of $\eta_n$ and $\delta_n$ in the definition above is
\[ \eta_n = \frac{2 \log\log\log n}{\log n}, \qquad \delta_n = \frac{\log\log\log n}{\log n}. \]
The conditions on the bandwidths $h_n$ and $g_n$ in Condition 1.4 are not the only possible ones and other restrictions are also possible. However, the logarithmic decay of $h_n$ and $g_n$ is unavoidable. Following the default convention in kernel density estimation and to keep the notation compact, we will suppress the index $n$ when writing $h_n$ and $g_n$ and will write $h$ and $g$ instead, since no ambiguity will arise.

Condition 1.5.
Let $\epsilon_n \downarrow 0$ be such that
\[ \frac{\log \epsilon_n}{(\eta_n - \delta_n) \log n} \to 0. \]
An example of such an $\epsilon_n$ for the $\eta_n$ and $\delta_n$ given above is $\epsilon_n = (\log\log\log n)^{-1}.$

The remainder of the paper is organised as follows: in Section 2 we derive the theorem establishing the asymptotic normality of $f_{nh}(x),$ the fact that the estimator $p_{ng}$ is weakly consistent, and finally that the estimator $f^*_{nhg}(x)$ is asymptotically normal. Section 3 contains simulation examples. Section 4 discusses a method for implementation of the estimator in practice. All the proofs are collected in Section 5.
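To get a feel for Condition 1.4, the sketch below evaluates $h_n$ and $g_n$ (with $\sigma = 1$ and the example sequences $\eta_n, \delta_n$ above, both assumed choices) for several sample sizes; note how slowly the bandwidths shrink.

```python
import math

sigma = 1.0

def bandwidths(n):
    # example sequences from Condition 1.4
    lll = math.log(math.log(math.log(n)))
    eta, delta = 2.0 * lll / math.log(n), lll / math.log(n)
    h = sigma * ((1.0 + eta) * math.log(n)) ** -0.5
    g = sigma * ((1.0 + delta) * math.log(n)) ** -0.5
    return h, g

for n in (10**3, 10**6, 10**12):
    h, g = bandwidths(n)
    print(n, round(h, 3), round(g, 3))   # h < g, both of order (log n)^{-1/2}
```

Even at $n = 10^{12}$ the bandwidths remain of the order $0.2,$ which anticipates the remark below Theorem 2.2 that $h$ must be selected fairly large even for large samples.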
2. Main results
We will first study the estimation of $f$ when $p$ is known, and then proceed to the general case with unknown $p.$ The reason for this is twofold. Firstly, it is interesting to compare the behaviour of the estimator of $f$ under the assumption of known and unknown $p,$ and secondly, the proofs of the results for the latter case rely heavily on the proofs for the former case.

The first result in this section deals with the nonrobustness of the estimator $\hat f_{nh}.$ In ordinary kernel deconvolution, when it is assumed that $Y$ is absolutely continuous, the estimator for its density is defined as
\[ \hat f_{nh}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \phi_{emp}(t)\phi_w(ht)\, e^{\sigma^2 t^2/2}\, dt. \tag{2.1} \]
Now suppose that the assumption of absolute continuity of $Y$ is violated. What will happen if we still use the estimator $\hat f_{nh}(x)$? The following result addresses this question.

Theorem 2.1.
Let $\hat f_{nh}(x)$ be defined as in (2.1). Assume that $\phi_w$ is bounded and has compact support $[-1, 1].$ Then
\[ \mathrm{E}[\hat f_{nh}(x)] = p\, w_h(x) + (1-p)\, f * w_h(x), \tag{2.2} \]
where $w_h(\cdot) = (1/h)\, w(\cdot/h),$ and $*$ denotes convolution.

From this theorem it follows that $\mathrm{E}[\hat f_{nh}(0)]$ diverges to infinity as $h \to 0,$ because so does $h^{-1} w(0),$ if $w(0) \neq 0$ (the latter is the case for the majority of conventional kernels). In practice this will also result in an equally undesirable behaviour of $\mathrm{E}[\hat f_{nh}(x)]$ in the neighbourhood of zero. When $x \neq 0,$ with a proper selection of the kernel $w,$ one can achieve that the first term in (2.2) asymptotically vanishes as $h \to 0.$ Indeed, it is sufficient to assume that $w$ is such that $\lim_{u \to \pm\infty} u w(u) = 0.$ The second term in (2.2) will converge to $(1-p) f(x)$ as $h \to 0,$ provided that $\phi_f$ is integrable, $\phi_w$ is bounded and $\phi_w(0) = 1.$ These facts address the issue of the nonrobustness of $\hat f_{nh}$: under a misspecified model, i.e. under the assumption that the distribution of $Y$ is absolutely continuous, while in fact it has an atom at zero, the classical deconvolution estimator will exhibit unsatisfactory behaviour near zero. This will happen despite the fact that $\hat f_{nh}(x)$ will be asymptotically normal when centred at its expectation and suitably normalised, see Corollary 5.1 in Section 5. The asymptotic normality follows from Lemmas 5.2 and 5.3 of Section 5, where only absolute continuity of the distribution of $X$ is required.

Our next goal is to establish the asymptotic normality of the estimator $f_{nh}(x).$ We formulate the corresponding theorem below.
Theorem 2.2.
Assume that $\phi_f$ is integrable. Let $\mathrm{E}[X^2] < \infty,$ and suppose that Condition 1.2 holds. Let $f_{nh}$ be defined as in (1.4). Then, as $n \to \infty$ and $h \to 0,$
\[ \sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)} \left( f_{nh}(x) - \mathrm{E}[f_{nh}(x)] \right) \stackrel{\mathcal{D}}{\to} N\!\left(0,\ \frac{A^2}{2\pi^2 (1-p)^2} \left(\frac{1}{\sigma^2}\right)^{2(\alpha+1)} (\Gamma(\alpha+1))^2\right), \tag{2.3} \]
where $\Gamma$ denotes the gamma function, $\Gamma(t) = \int_0^{\infty} v^{t-1} e^{-v}\, dv.$

Note that Theorem 2.2 establishes asymptotic normality of $f_{nh}$ under an atomic distribution, which constitutes a generalisation of a result in [46] (see also [45]) for the case of the classical deconvolution problem. The generalisation is possible, because the proof uses only the continuity of the density of $X,$ which still holds when $Y$ has a distribution with an atom. Furthermore, notice that in order to get a consistent estimator, from this theorem it follows that $\sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)}$ has to diverge to infinity. Therefore the bandwidth $h$ has to be at least of order $(\log n)^{-1/2},$ as is actually stated in Condition 1.4. In practice this implies that the bandwidth $h$ has to be selected fairly large, even for large sample sizes. This is the case for the classical deconvolution problem as well in the case of a supersmooth error distribution, cf. [46].

Observe that the asymptotic variance in (2.3) does not depend on the target density $f$ nor on the point $x.$ This phenomenon is quite peculiar, but is already known in classical deconvolution kernel density estimation, see for instance equation (6) in [3]. There, provided that $h$ is small enough, the asymptotic variance of the deconvolution kernel density estimator (or, strictly speaking, an upper bound for it) also depends neither on the target density $f$ nor on the point $x.$ In this respect see also [46].
Such results do not contradict the asymptotic normality result in [21], see Theorem 2.2 in that paper, as there the asymptotic variance of the deconvolution kernel density estimator is not evaluated. Now we state a theorem concerning the consistency of $p_{ng},$ the estimator of $p.$

Theorem 2.3.
Assume that $\phi_f$ is integrable, let $\mathrm{E}[X^2] < \infty,$ and let the kernel $k$ have a Fourier transform $\phi_k$ that is bounded by one and integrates to two. Let $p_{ng}$ be defined as in (1.8). If $g$ is such that
\[ \frac{g^{4+4\alpha}\, e^{\sigma^2/g^2}}{n} \to 0, \]
then $p_{ng}$ is a consistent estimator of $p,$ i.e.
\[ \mathrm{P}(|p_{ng} - p| > \epsilon) \to 0 \]
as $n \to \infty$ and $g \to 0.$ Here $\epsilon$ is an arbitrary positive number. Furthermore, under Conditions 1.1 and 1.3,
\[ \mathrm{E}[(p_{ng} - p)^2] = O\!\left( g^{2(1+\gamma)} + \frac{g^{4+4\alpha}\, e^{\sigma^2/g^2}}{n} \right). \]
One can also show that $p_{ng}$ is asymptotically normal, when centred and suitably normalised. We formulate the corresponding theorem below.

Theorem 2.4.
Assume that the conditions of Theorem 2.3 hold. Let $p_{ng}$ be defined as in (1.8) and let (1.15) hold. Then
\[ \sqrt{n}\, g^{-(2+2\alpha)} e^{-\sigma^2/(2g^2)} \left( p_{ng} - \mathrm{E}[p_{ng}] \right) \stackrel{\mathcal{D}}{\to} N\!\left(0,\ \frac{C^2 (\Gamma(1+\alpha))^2}{2} \left(\frac{1}{\sigma^2}\right)^{2(1+\alpha)}\right) \]
as $n \to \infty$ and $g \to 0.$

Finally, we consider the case when both $p$ and $f$ are unknown. We state the main theorem of the paper.

Theorem 2.5.
Let $f^*_{nhg}(x)$ be defined by (1.9), let $\mathrm{E}[X^2] < \infty$ and let Conditions 1.1–1.5 hold. Then, as $n \to \infty,$ we have
\[ \sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)} \left( f^*_{nhg}(x) - \mathrm{E}[f^*_{nhg}(x)] \right) \stackrel{\mathcal{D}}{\to} N\!\left(0,\ \frac{A^2}{2\pi^2 (1-p)^2} \left(\frac{1}{\sigma^2}\right)^{2(\alpha+1)} (\Gamma(\alpha+1))^2\right). \]
Notice that the asymptotic variance is the same as in (2.3), which justifies the plug-in approach to the construction of an estimator of $f$ when $p$ is unknown.

A natural question to consider is what happens when we centre $f^*_{nhg}(x)$ not at its expectation, but at $f(x).$ This has practical importance as well, e.g. for the construction of (asymptotic) confidence intervals. Writing
\[ \sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)} \left( f^*_{nhg}(x) - f(x) \right) = \sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)} \left( f^*_{nhg}(x) - \mathrm{E}[f^*_{nhg}(x)] \right) + \sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)} \left( \mathrm{E}[f^*_{nhg}(x)] - f(x) \right), \]
we see that we have to study the second term here, i.e. to compare the behaviour of the bias of $f^*_{nhg}(x)$ to the normalising factor $\sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)}.$ We will study the bias of $f^*_{nhg}(x)$ in two steps: first we will show that it asymptotically vanishes, which is itself of independent interest. After this we will provide conditions under which it asymptotically vanishes when multiplied by $\sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)}.$ Recall the definition of a Hölder class of functions $H(\beta, L).$

Definition 2.1.
A function $f$ is said to belong to the Hölder class $H(\beta, L)$ if its derivatives up to order $l = [\beta]$ exist and satisfy the condition
\[ |f^{(l)}(x + t) - f^{(l)}(x)| \leq L |t|^{\beta - l} \quad \text{for all } x, t \in \mathbb{R}. \]
Such a smoothness condition on a target density $f$ is standard in kernel density estimation, see e.g. p. 5 of [40]. Often one assumes that $\beta = 2.$ If $l = 0,$ then set $f^{(l)} = f.$ We also need the definition of a kernel of order $l.$ In particular, we will use the version given in Definition 1.3 of [40].
Definition 2.2.
A kernel $w$ is said to be a kernel of order $l,$ $l \geq 0,$ if the functions $x \mapsto x^j w(x),$ $j = 0, \ldots, l,$ are integrable and if
\[ \int_{-\infty}^{\infty} w(x)\, dx = 1, \qquad \int_{-\infty}^{\infty} x^j w(x)\, dx = 0 \quad \text{for } j = 1, \ldots, l. \]

Theorem 2.6.
Let $f^*_{nhg}(x)$ be defined by (1.9) and assume the conditions of Theorem 2.5. Then, as $n \to \infty,$ we have
\[ \mathrm{E}[f^*_{nhg}(x)] - f(x) \to 0. \]
If additionally $f \in H(\beta, L),$ $w$ is a kernel of order $l = [\beta]$ and $\beta > 1 + 2\alpha,$ then
\[ \sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)} \left( \mathrm{E}[f^*_{nhg}(x)] - f(x) \right) \to 0 \]
as $n \to \infty.$ Combination of this theorem with Theorem 2.5 leads to the following result.
Theorem 2.7.
Assume that the conditions of Theorem 2.6 hold. Then, as $n \to \infty,$ we have
\[ \sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)} \left( f^*_{nhg}(x) - f(x) \right) \stackrel{\mathcal{D}}{\to} N\!\left(0,\ \frac{A^2}{2\pi^2 (1-p)^2} \left(\frac{1}{\sigma^2}\right)^{2(\alpha+1)} (\Gamma(\alpha+1))^2\right). \]
One should keep in mind that these results deal only with asymptotics. In the next section we will study several simulation examples, which will provide some insight into the finite sample properties of the estimator.
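How slowly does the normalising factor $\sqrt{n}\, h^{-(1+2\alpha)} e^{-\sigma^2/(2h^2)}$ grow along the bandwidths of Condition 1.4? The small computation below tabulates it; $\sigma = 1,$ $\alpha = 2$ and the example sequence $\eta_n$ are assumed inputs.

```python
import math

sigma, alpha = 1.0, 2.0

def rate(n):
    # h_n from Condition 1.4 with the example eta_n = 2 logloglog n / log n
    eta = 2.0 * math.log(math.log(math.log(n))) / math.log(n)
    h = sigma * ((1.0 + eta) * math.log(n)) ** -0.5
    return math.sqrt(n) * h ** -(1.0 + 2.0 * alpha) * math.exp(-sigma**2 / (2.0 * h**2))

for n in (10**3, 10**6, 10**9, 10**12):
    print(n, rate(n))
```

While $n$ grows by nine orders of magnitude, the factor only grows by roughly one order of magnitude, which is another way of seeing the logarithmic nature of the attainable rates.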
3. Simulation examples
In this section we consider a number of simulation examples. We do not pretend to provide an exhaustive simulation study, but rather an illustration, which requires further verification.

Assume that $\sigma = 1,$ $p = 0.1$ and that $f$ is normal with mean 3 and variance 9. This results in a nontrivial deconvolution problem, because the ratio of 'noise' to 'signal' is reasonably high:
\[ \mathrm{NSR} = \frac{\mathrm{Var}[\sigma Z]}{\mathrm{Var}[Y]} \cdot 100\% \approx 11\%. \]
We have simulated a sample of size $n = 1000.$ As the kernels $w$ and $k$ we selected the kernels (1.12) and (1.16), respectively. The bandwidth $h$ was taken equal to $0.58,$ with $g$ of a comparable size (cf. Table 1); the estimator $p_{ng}$ produced a value close to the true value of $p.$ The estimate of $f$ resulting from the procedure described above, together with the target density $f$ (dashed line), is plotted in Figure 5. For comparison purposes we have also plotted the estimate $f_{nh}(x)$ (it can be obtained using (1.5) and the true value of the parameter $p$), see Figure 6. As can be seen from the comparison of these two figures, the estimates $f^*_{nhg}$ and $f_{nh}$ look rather similar.

Fig 5. The normal density $f$ (dashed line) and the estimate $f^*_{nhg}$ (solid line). The sample size $n = 1000.$

Fig 6. The normal density $f$ (dashed line) and the estimate $f_{nh}$ (solid line). The sample size $n = 1000.$

As the second example we consider the case when $f$ is a gamma density with parameters $\alpha = 8$ and $\beta = 1,$ i.e.
\[ f(x) = \frac{x^{7} e^{-x}}{\Gamma(8)}\, 1_{[x>0]}, \tag{3.1} \]
and $p$ as in the first example. We simulated a sample of size $n = 1000.$ The kernels were chosen as above, with bandwidths $g$ and $h$ of the same magnitude as before; $p_{ng}$ again took a value close to the true $p.$ The resulting estimate $f^*_{nhg}$ is plotted in Figure 7. As above, we also plotted the estimate $\hat f_{nh},$ see Figure 8 (notice that the estimate takes on negative values in the neighbourhood of zero). Again both figures look similar.

Fig 7. The gamma density (dashed line) and the estimate $f^*_{nhg}$ (solid line). The sample size $n = 1000.$

Fig 8. The gamma density (dashed line) and the estimate $\hat f_{nh}$ (solid line). The sample size $n = 1000.$

Examination of these figures leads us to two questions: how well does $p_{ng}$ estimate $p$ for moderate sample sizes? How sensitive is $f^*_{nhg}$ to under- or overestimation of $p$? To get at least a partial answer to the first question, we considered the same model as in our first example in this section (i.e. deconvolution of the normal density) and repeatedly, i.e. 1000 times, estimated $p$ for the bandwidth $g = 0.5$ and the sample size $n = 1000$ in each simulation run. Then the same procedure was repeated for the bandwidths $g = 0.55, 0.6$ and $0.65.$ The resulting histograms are plotted in Figures 9–12. They look quite satisfactory.

Fig 9. The histogram of estimates of $p$ for $g = 0.5$ and the sample size $n = 1000.$

Fig 10. The histogram of estimates of $p$ for $g = 0.55$ and the sample size $n = 1000.$

The sample means and sample standard deviations (SD) of the estimates of $p$ for different choices of the bandwidth $g,$ together with the theoretical standard deviations, are summarised in Table 1. One notices that the sample means in Table 1 are close to the true value $0.1$ of $p.$ The theoretical standard deviations in the same table were computed using Theorem 2.4, which predicts
Fig 11. The histogram of estimates of $p$ for $g = 0.6$ and the sample size $n = 1000.$

Fig 12. The histogram of estimates of $p$ for $g = 0.65$ and the sample size $n = 1000.$

Table 1
Sample and theoretical means and standard deviations (SD) of the estimates of $p$ for different choices of the bandwidth $g.$ The sample size $n = 1000.$

Bandwidth       0.5     0.55    0.6     0.65
Sample mean     0.0963  0.0975  0.0960  0.0927
Sample SD       0.0516  0.0436  0.0388  0.0349
Asymptotic SD   1.7891  2.2399  2.8994  3.8164
Theoretical SD  0.0700  0.0593  0.0487  0.0432

that they should be equal to (recall that in our case $\alpha = 2$)
\[ \frac{g^{6}\, e^{1/(2g^2)}}{\sqrt{n}}\, C\sqrt{2}. \]
From Table 1 one sees that there is a large discrepancy between the sample standard deviations and the standard deviations predicted by the theory. The explanation of this discrepancy lies in the fact that the proof of the asymptotic normality of $p_{ng}$ heavily relies on the asymptotic equivalence
\[ \int_0^1 \phi_k(s)\, e^{\sigma^2 s^2/(2g^2)}\, ds \sim C\, \Gamma(1+\alpha) \left(\frac{g^2}{\sigma^2}\right)^{1+\alpha} e^{\sigma^2/(2g^2)}, \tag{3.2} \]
see Lemma 5.1 and the proof of Lemma 5.2 in Section 5 below. However, by direct evaluation of the integral on the left-hand side of (3.2) for different values of $g,$ it can be seen that this relation does not provide an accurate approximation in those cases where the bandwidth is relatively large, as it actually is in our case. It then follows that the asymptotic standard deviation will not provide a good approximation of the sample standard deviation unless the bandwidth is very small. This in turn implies that the corresponding sample size must be extremely large. We can correct for this poor approximation of the integral in (3.2) by using the integral itself as a normalising factor instead of the right-hand side of (3.2). The results of this correction are represented in the last line of Table 1. As can be seen, the theoretical standard deviation and the sample standard deviation are then much closer to each other.

Fig 13. The kernel (3.3).

Fig 14. The Fourier transform of the kernel (3.3).

Since the kernel $k$ was
selected more or less arbitrarily, one is tempted to believe that an inaccurate approximation in (3.2) might be due to the kernel. This might be the case; however, to a certain degree this seems to be characteristic of all popular kernels employed in kernel deconvolution. Consider for instance the kernel
\[ w(x) = \frac{48x(x^2 - 15)\cos x - 144(2x^2 - 5)\sin x}{\pi x^7}. \tag{3.3} \]
Its Fourier transform is given by
\[ \phi_w(t) = (1 - t^2)^3\, 1_{[|t| \leq 1]}(t). \]
The kernel $w$ and its Fourier transform are plotted in Figures 13 and 14, respectively. This kernel was used for simulations in [22] and [47] and it was shown in [13] that it performs well in a deconvolution setting. Notice that this kernel cannot be used to estimate $p$ if we want to plug the resulting estimator $p_{ng}$ into $f^*_{nhg}.$ However, this kernel satisfies Condition 1.2 and can be used to estimate $f.$ Nevertheless, the ratio of the left- and right-hand sides in (3.2) for this kernel at moderate values of the bandwidth is still far from 1. This issue is further discussed in [42]. Another issue here is that often the error variance $\sigma^2$ is quite small and it is sensible to treat $\sigma$ as depending on the sample size $n$ (with $\sigma \to 0$ as $n \to \infty$), see [9]. However, this is a different model and this question is not addressed here. Notice also that a perfect match between the sample standard deviation and the theoretical standard deviation is impossible to obtain, because we neglect a remainder term when computing the latter. How large the contribution of the remainder term can be in general requires a separate simulation study.

We also considered the case when the error term variance and the sample size are smaller (the target density $f$ was again a normal density, $p$ was set to be $0.1,$ $\sigma$ was taken smaller than one and $n = 500$). The corresponding histograms are given in Figures 15–18, while the sample and theoretical characteristics for four different choices of the bandwidth, $g = 0.45, 0.5, 0.6$ and $0.65,$ are summarised in Table 2. Notice a particularly bad match between the asymptotic standard deviation and its empirical counterpart. Other conclusions are similar to those in the previous example.
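The two theoretical rows of Table 1 can be reproduced directly from (3.2). In the sketch below (an illustration, with $\sigma = 1,$ $\alpha = 2,$ $C = 693/2$ and $n = 1000$ as in the first example), the asymptotic SD uses the right-hand side of (3.2) and the corrected value uses the integral itself:

```python
import math

C, n, sigma = 693.0 / 2.0, 1000, 1.0

def sd_asymptotic(g):
    # right-hand side of (3.2) normalisation: Table 1, row "Asymptotic SD"
    return C * math.sqrt(2.0) * g**6 * math.exp(sigma**2 / (2.0 * g**2)) / math.sqrt(n)

def sd_corrected(g, m=100_000):
    # left-hand side of (3.2), int_0^1 phi_k(s) e^{sigma^2 s^2/(2 g^2)} ds,
    # evaluated by the trapezoidal rule: Table 1, row "Theoretical SD"
    ds = 1.0 / m
    total = 0.0
    for i in range(m + 1):
        s = i * ds
        phi_k = (693.0 / 8.0) * s**6 * (1.0 - s**2) ** 2
        weight = 0.5 if i in (0, m) else 1.0
        total += weight * phi_k * math.exp(sigma**2 * s**2 / (2.0 * g**2))
    return total * ds / math.sqrt(2.0 * n)

for g in (0.5, 0.55, 0.6, 0.65):
    print(g, round(sd_asymptotic(g), 4), round(sd_corrected(g), 4))
```

The printed values track the "Asymptotic SD" and "Theoretical SD" rows of Table 1, making the size of the Laplace-approximation error in (3.2) at these bandwidths explicit.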
Fig 15. The histogram of estimates of $p$ for $g = 0.45$ and the sample size $n = 500.$

Fig 16. The histogram of estimates of $p$ for $g = 0.5$ and the sample size $n = 500.$

Fig 17. The histogram of estimates of $p$ for $g = 0.6$ and the sample size $n = 500.$

Fig 18. The histogram of estimates of $p$ for $g = 0.65$ and the sample size $n = 500.$

To test the robustness of the estimator $f^*_{nhg}$ with respect to the estimated value of $p,$ we again turned to the model that was considered in the first example of this section. Instead of $\hat p_{ng},$ three different values $\hat p = 0.05,$ $\hat p = 0.1$ and $\hat p = 0.15$ were plugged into (4.2). The resulting estimates $f^*_{nhg}$ are plotted in Figure 19 (the true density is represented by the dashed line). As one can see from Figure 19, under- or overestimation of $p$ in the given range does not have a significant impact on the resulting estimate (of course one should keep in mind that $p$ is relatively small in this case). On the other hand, if the value of $\hat p$ were considerably larger, that would have a noticeable effect: e.g. it could suggest bimodality in a case where the density is actually unimodal, see Figure 20. At the same time, the simulated examples concerning the estimates $p_{ng}$ that we considered above seem to suggest that such instances of unsatisfactory estimates of $p$ are not too frequent, because most of the observed values of $p_{ng}$ are concentrated in the interval $[0.05, 0.15].$ We also considered the case when $f$ is an equal-weight mixture of two normal densities, one centred at a negative and one at a positive value (with $\phi_{x,y}$ denoting the normal density with mean $x$ and variance $y$); in this case $f$ is bimodal. The match is visually slightly worse for the largest of the three values of $\hat p,$ but it is still acceptable.

Table 2
Sample and theoretical means and standard deviations (SD) of estimates of p for different choices of bandwidth g; the sample size n = 500.

  Bandwidth        0.45     0.5      0.6      0.65
  Sample mean      0.0972   0.0977   0.0959   0.0930
  Sample SD        0.0277   0.0269   0.0283   0.0295
  Asymptotic SD    311.7    562.3    2247     2521
  Theoretical SD   0.0357   0.0349   0.0338   0.0335

Fig 19. The normal density f and the estimates $f^*_{nhg}$ evaluated for the three plug-in values of $\hat p$ and the sample size n = 1000.
Fig 20. The normal density f and the estimate $f^*_{nhg}$ evaluated for a larger plug-in value of $\hat p$ and the sample size n = 1000.

The simulation examples that we considered in this section suggest that, despite the slow (logarithmic) rate of convergence, the estimator $f^*_{nhg}$ works in practice (given that $p$ is estimated accurately). This is somewhat comparable to the classical deconvolution problem, where finite sample calculations in [47] showed that, for lower levels of noise, the kernel estimators perform well for reasonable sample sizes, in spite of the slow rates of convergence for the supersmooth deconvolution problem obtained e.g. in [21] and [22]. However, Condition 1.4 tells us that the bandwidths $h$ and $g$ have to be of order $(\log n)^{-1/2}$. In practice this implies that to obtain reasonable estimates, the bandwidths have to be selected fairly large, even for large samples.

One more practical issue concerning the implementation of the estimator $f^*_{nhg}$ (or $p_{ng}$) is the method of bandwidth selection, which is not addressed in this paper. We expect that techniques similar to those used in the classical deconvolution problem will produce comparable results in our problem. This requires a separate investigation of the behaviour of the mean integrated squared error of $f^*_{nhg}$. In the case of the classical deconvolution problem, papers that consider the issue of data-dependent bandwidth selection are [10, 11, 18, 28] and [39]. Yet another issue is the choice of the kernels $w$ and $k$. For the case of the classical deconvolution problem we refer to [13].
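The practical impact of the $(\log n)^{-1/2}$ bandwidth order required by Condition 1.4 is easy to quantify. The following sketch (with an arbitrary proportionality constant $c = 1$, not taken from the paper) shows how slowly such a bandwidth shrinks with the sample size:

```python
import math

# Bandwidth of order (log n)^(-1/2); the constant c = 1 is an
# arbitrary illustrative choice, not a recommendation from the paper.
def bandwidth(n, c=1.0):
    return c / math.sqrt(math.log(n))

for n in (10**2, 10**4, 10**6, 10**8):
    print(n, round(bandwidth(n), 3))
```

Since $\log 10^8 = 4\log 10^2$, increasing the sample size from a hundred to a hundred million observations only halves the bandwidth, which matches the remark that the bandwidths have to be selected fairly large even for large samples.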
In general, in kernel density estimation it is thought that the choice of a kernel is of less importance for the performance of an estimator than the choice of the bandwidth, see e.g. p. 31 in [48] or p. 132 in [49].

Fig 21. The mixture of normal densities f and the estimates $f^*_{nhg}$ evaluated for the three plug-in values of $\hat p$ and the sample size n = 1000.
Fig 22. The mixture of normal densities f and the estimate $f^*_{nhg}$ evaluated for a larger plug-in value of $\hat p$ and the sample size n = 1000.
4. Computational issues
To compute the estimator $f^*_{nhg}$ of Section 3, a method similar to the one used in [44] (in turn motivated by [5]) can be employed. Namely, notice that
\[
\hat f_{nh}(x) = \hat f^{(1)}_{nh}(x) + \hat f^{(2)}_{nh}(x),
\]
where
\[
\hat f^{(1)}_{nh}(x) = \frac{1}{2\pi}\int_0^\infty e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt, \qquad
\hat f^{(2)}_{nh}(x) = \frac{1}{2\pi}\int_0^\infty e^{itx}\phi_{emp}(-t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt.
\]
Using the trapezoid rule and setting $v_j = \eta(j-1)$, we have
\[
\hat f^{(1)}_{nh}(x) \approx \frac{1}{2\pi}\sum_{j=1}^N e^{-iv_jx}\psi(v_j)\,\eta, \tag{4.1}
\]
where $N$ is some power of 2 and $\psi(v_j) = \phi_{emp}(v_j)\phi_w(hv_j)e^{\sigma^2v_j^2/2}$. The Fast Fourier Transform is used to compute the values of $\hat f^{(1)}_{nh}$ at $N$ different points (concerning the application of the Fast Fourier Transform in kernel deconvolution see [12]). We employ a regular spacing of size $\delta$, so that the values of $x$ are $x_u = -N\delta/2 + \delta(u-1)$, where $u = 1, \ldots, N$. Therefore, we obtain
\[
\hat f^{(1)}_{nh}(x_u) \approx \frac{1}{2\pi}\sum_{j=1}^N e^{-i\delta\eta(j-1)(u-1)}e^{iv_jN\delta/2}\psi(v_j)\,\eta.
\]
In order to apply the Fast Fourier Transform, note that we must take $\delta\eta = 2\pi/N$. It follows that a small $\eta$, which is needed to achieve greater accuracy in the integration, will result in values of $x$ which are relatively far from each other. Therefore, to improve the integration precision, we apply Simpson's rule, i.e.
\[
\hat f^{(1)}_{nh}(x_u) \approx \frac{1}{2\pi}\sum_{j=1}^N e^{-i\frac{2\pi}{N}(j-1)(u-1)}e^{iv_jN\delta/2}\psi(v_j)\,\frac{\eta}{3}\left(3 + (-1)^j - \delta_{j-1}\right),
\]
where $\delta_j$ denotes the Kronecker symbol (recall that $\delta_j$ is 1 if $j = 0$ and is 0 otherwise). The same reasoning can be applied to $\hat f^{(2)}_{nh}(x)$. The estimate $f^*_{nhg}$ can then be computed by noticing that
\[
f^*_{nhg}(x) = \frac{\hat f_{nh}(x)}{1-\hat p_{ng}} - \frac{\hat p_{ng}}{1-\hat p_{ng}}\,w_h(x). \tag{4.2}
\]
One should keep in mind that even though $w_h$ can be evaluated directly, it is preferable to use the Fast Fourier Transform for its computation as well, thus avoiding possible numerical issues, see [12]. Also notice that the direct computation of $\phi_{emp}$ is rather time-demanding for large samples. One way to avoid this problem is to use WARPing, cf. [27]. However, for the purposes of the present study, we restricted ourselves to the direct evaluation of $\phi_{emp}$.
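The recipe above can be sketched in a few lines of code. The block below is an illustration only, not the authors' implementation: it assumes the kernel Fourier transform $\phi_w(t) = (1-t^2)^3$ on $[-1,1]$ (a standard choice in supersmooth deconvolution), arbitrary test data and grid sizes, and evaluates the sum in (4.1) with the Simpson weights via the FFT.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=200)          # observed sample (arbitrary test data)
sigma, h = 1.0, 0.8               # noise level and bandwidth (illustrative)

def phi_w(t):
    """Fourier transform of the kernel: (1 - t^2)^3 on [-1, 1]."""
    return np.where(np.abs(t) <= 1, (1 - np.clip(t, -1, 1) ** 2) ** 3, 0.0)

def phi_emp(t):
    """Empirical characteristic function of the sample."""
    return np.exp(1j * np.outer(t, X)).mean(axis=1)

N, eta = 256, 1.0 / 64            # FFT size and integration step
delta = 2 * np.pi / (N * eta)     # grid spacing forced by delta * eta = 2*pi/N
j = np.arange(N)                  # 0-based index, j here = (paper's j) - 1
v = eta * j                       # v_j = eta * (j - 1)
x = -N * delta / 2 + delta * j    # x_u = -N*delta/2 + delta*(u - 1)

psi = phi_emp(v) * phi_w(h * v) * np.exp(sigma**2 * v**2 / 2)
simpson = (eta / 3) * (3 + (-1.0) ** (j + 1) - (j == 0))  # Simpson weights
a = np.exp(1j * v * N * delta / 2) * psi * simpson
f1 = np.fft.fft(a) / (2 * np.pi)  # \hat f^{(1)}_{nh} on the x-grid
f_nh = 2 * np.real(f1)            # \hat f_{nh} = f^{(1)} + its conjugate
```

The last line uses the fact that $\phi_{emp}(-t) = \overline{\phi_{emp}(t)}$, so that $\hat f^{(2)}_{nh}$ is the complex conjugate of $\hat f^{(1)}_{nh}$ and their sum is twice the real part.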
5. Proofs
Proof of Theorem 2.1.
The proof is elementary and is based on the definition of $\hat f_{nh}(x)$. By Fubini's theorem we have
\[
\mathrm{E}\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]
= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_Y(t)\phi_w(ht)\,dt.
\]
Recalling that $\phi_Y(t) = p + (1-p)\phi_f(t)$, we obtain
\[
\mathrm{E}[\hat f_{nh}(x)] = p\,w_h(x) + (1-p)\,f*w_h(x). \tag{5.1}
\]
Here we used the facts that
\[
\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_w(ht)\,dt = w_h(x), \qquad
\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_f(t)\phi_w(ht)\,dt = f*w_h(x).
\]
This concludes the proof.

The proof of Theorem 2.2 is based on the following three lemmas, all of which are reformulations of results from [46].
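Before turning to the lemmas, the inversion identities used at the end of the proof of Theorem 2.1 can be checked numerically. The sketch below is an illustration under an assumed triangular Fourier transform $\phi_w(t) = (1-|t|)_+$, whose inverse Fourier transform is the Fejér-type kernel $w(x) = (1-\cos x)/(\pi x^2)$; this kernel choice is not prescribed by the paper.

```python
import numpy as np

h = 0.5
t = np.linspace(-1 / h, 1 / h, 200001)
phi_w_ht = np.maximum(1.0 - np.abs(h * t), 0.0)   # phi_w(h t)

def w(x):
    # inverse Fourier transform of (1 - |t|)_+, a Fejer-type kernel
    return (1.0 - np.cos(x)) / (np.pi * x ** 2)

def trapezoid(y, x):
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# check (1/2pi) * Int e^{-itx} phi_w(ht) dt = w_h(x) = (1/h) w(x/h);
# since phi_w is even, the integrand reduces to cos(t x) * phi_w(h t)
for x0 in (0.3, 1.0, 2.5):
    lhs = trapezoid(np.cos(t * x0) * phi_w_ht, t) / (2 * np.pi)
    rhs = w(x0 / h) / h
    print(x0, lhs, rhs)
```

The second identity, for $f * w_h$, follows in the same way once $\phi_w(ht)$ is multiplied by an integrable $\phi_f$.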
Lemma 5.1.
Assume Condition 1.2. For $h \to 0$ and $\delta \ge 0$ fixed we have
\[
\int_0^1 (1-s)^\delta \phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds
\sim A\,\Gamma(1+\alpha+\delta)\left(\frac{1}{\sigma^2}\right)^{1+\alpha+\delta} h^{2(1+\alpha+\delta)}\,e^{\sigma^2/(2h^2)}. \tag{5.2}
\]
Proof.
We follow the same line of thought as in [46]. Using the substitution $s = 1-h^2v$ and the dominated convergence theorem in the one but last step, we get
\[
\int_0^1 (1-s)^\delta \phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds
= h^2\int_0^{1/h^2} (h^2v)^\delta\,\phi_w(1-h^2v)\,e^{\sigma^2(1-h^2v)^2/(2h^2)}\,dv
\]
\[
= h^2\int_0^{1/h^2} \frac{\phi_w(1-h^2v)}{(h^2v)^\alpha}\,(h^2v)^{\alpha+\delta}\,e^{\sigma^2(1-h^2v)^2/(2h^2)}\,dv
\]
\[
= h^{2(1+\alpha+\delta)}\,e^{\sigma^2/(2h^2)}\int_0^{1/h^2} \frac{\phi_w(1-h^2v)}{(h^2v)^\alpha}\,v^{\alpha+\delta}\,e^{-\sigma^2v+\sigma^2h^2v^2/2}\,dv
\]
\[
\sim h^{2(1+\alpha+\delta)}\,e^{\sigma^2/(2h^2)}\,A\int_0^\infty v^{\alpha+\delta}\,e^{-\sigma^2v}\,dv
= h^{2(1+\alpha+\delta)}\,e^{\sigma^2/(2h^2)}\left(\frac{1}{\sigma^2}\right)^{1+\alpha+\delta} A\,\Gamma(\alpha+\delta+1).
\]
The lemma is proved.
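The asymptotic relation (5.2) can be checked numerically after the same substitution $s = 1-h^2v$, which removes the common factor $e^{\sigma^2/(2h^2)}$ from both sides and so avoids overflow. The sketch below assumes $\phi_w(s) = (1-s^2)^3$, for which $\phi_w(1-t) = 8t^3(1-t/2)^3$, i.e. $A = 8$ and $\alpha = 3$, and takes $\delta = 0$; these choices are illustrative only.

```python
import math
import numpy as np

sigma, A, alpha = 1.0, 8.0, 3

def ratio(h):
    # LHS * e^{-sigma^2/(2h^2)} = h^2 * Int_0^{1/h^2} phi_w(1 - h^2 v)
    #                             * exp(-sigma^2 v + sigma^2 h^2 v^2 / 2) dv
    v = np.linspace(0.0, 1.0 / h ** 2, 400001)
    s = 1.0 - h ** 2 * v
    y = (1.0 - s ** 2) ** 3 * np.exp(-sigma ** 2 * v + sigma ** 2 * h ** 2 * v ** 2 / 2)
    lhs = h ** 2 * np.sum(y[1:] + y[:-1]) / 2.0 * (v[1] - v[0])
    # RHS * e^{-sigma^2/(2h^2)} = A Gamma(1+alpha) sigma^{-2(1+alpha)} h^{2(1+alpha)}
    rhs = A * math.gamma(1 + alpha) * sigma ** (-2 * (1 + alpha)) * h ** (2 * (1 + alpha))
    return lhs / rhs

print(ratio(0.2), ratio(0.1))   # the ratio approaches 1 as h decreases
```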
Lemma 5.2.
Assume Condition 1.2 and let $\mathrm{E}[X_1^2] < \infty$. Furthermore, let $\hat f_{nh}$ be defined by (1.6). Then, as $n \to \infty$ and $h \to 0$,
\[
\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\bigl(\hat f_{nh}(x) - \mathrm{E}[\hat f_{nh}(x)]\bigr)
= \frac{A}{\pi}\left(\frac{1}{\sigma^2}\right)^{1+\alpha}(\Gamma(\alpha+1)+o(1))\,U_{nh}(x) + O_P(h),
\]
where
\[
U_{nh}(x) = \frac{1}{\sqrt n}\sum_{j=1}^n \left(\cos\left(\frac{X_j-x}{h}\right) - \mathrm{E}\left[\cos\left(\frac{X_j-x}{h}\right)\right]\right). \tag{5.3}
\]
Proof.
We have
\[
\hat f_{nh}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_w(ht)\,e^{\sigma^2t^2/2}\,\phi_{emp}(t)\,dt
= \frac{1}{2\pi h}\int_{-1}^{1} e^{-isx/h}\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,\phi_{emp}\left(\frac sh\right)ds
\]
\[
= \frac{1}{2\pi nh}\sum_{j=1}^n\int_{-1}^{1} e^{is(X_j-x)/h}\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds
= \frac{1}{\pi nh}\sum_{j=1}^n\int_0^1 \cos\left(s\,\frac{X_j-x}{h}\right)\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds.
\]
Notice that
\[
\cos\left(s\,\frac{X_j-x}{h}\right)
= \cos\left(\frac{X_j-x}{h}\right) + \left(\cos\left(s\,\frac{X_j-x}{h}\right) - \cos\left(\frac{X_j-x}{h}\right)\right)
\]
\[
= \cos\left(\frac{X_j-x}{h}\right) - 2\sin\left(\frac{(s+1)}{2}\,\frac{X_j-x}{h}\right)\sin\left(\frac{(s-1)}{2}\,\frac{X_j-x}{h}\right)
= \cos\left(\frac{X_j-x}{h}\right) + R_{n,j}(s),
\]
where $R_{n,j}(s)$ is a remainder term satisfying
\[
|R_{n,j}(s)| \le (|x|+|X_j|)\,\frac{1-s}{h}, \qquad 0 \le s \le 1. \tag{5.4}
\]
The bound follows from the inequality $|\sin x| \le |x|$. By Lemma 5.1, $\hat f_{nh}(x)$ equals
\[
\frac{1}{\pi h}\int_0^1 \phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds\;\frac1n\sum_{j=1}^n\cos\left(\frac{X_j-x}{h}\right) + \frac1n\sum_{j=1}^n\tilde R_{n,j}
\]
\[
= \frac{A}{\pi}(\Gamma(\alpha+1)+o(1))\left(\frac{1}{\sigma^2}\right)^{1+\alpha} h^{1+2\alpha}\,e^{\sigma^2/(2h^2)}\,\frac1n\sum_{j=1}^n\cos\left(\frac{X_j-x}{h}\right) + \frac1n\sum_{j=1}^n\tilde R_{n,j},
\]
where
\[
\tilde R_{n,j} = \frac{1}{\pi h}\int_0^1 R_{n,j}(s)\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds.
\]
For the remainder we have, by (5.4) and Lemma 5.1,
\[
|\tilde R_{n,j}| \le \frac1\pi(|x|+|X_j|)\,\frac{1}{h^2}\int_0^1(1-s)\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds
= \frac A\pi(|x|+|X_j|)(\Gamma(\alpha+2)+o(1))\,h^{2+2\alpha}\left(\frac{1}{\sigma^2}\right)^{2+\alpha} e^{\sigma^2/(2h^2)}.
\]
Consequently,
\[
\mathrm{Var}[\tilde R_{n,j}] \le \mathrm{E}[\tilde R_{n,j}^2] = O\left(h^{4+4\alpha}\,e^{\sigma^2/h^2}\right)
\quad\text{and}\quad
\frac1n\sum_{j=1}^n(\tilde R_{n,j} - \mathrm{E}[\tilde R_{n,j}]) = O_P\left(\frac{h^{2+2\alpha}}{\sqrt n}\,e^{\sigma^2/(2h^2)}\right),
\]
which follows from Chebyshev's inequality. Finally, we get
\[
\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\bigl(\hat f_{nh}(x) - \mathrm{E}[\hat f_{nh}(x)]\bigr)
= \frac A\pi(\Gamma(\alpha+1)+o(1))\left(\frac{1}{\sigma^2}\right)^{1+\alpha} U_{nh}(x) + O_P(h),
\]
and this completes the proof of the lemma.

The next lemma establishes the asymptotic normality.
Lemma 5.3.
Assume the conditions of Lemma 5.2 and let, for a fixed $x$, $U_{nh}(x)$ be defined by (5.3). Then, as $n \to \infty$ and $h \to 0$,
\[
U_{nh}(x) \stackrel{\mathcal D}{\to} N\left(0, \tfrac12\right).
\]
Proof.
Write $Y_j = \frac{X_j-x}{h} \bmod 2\pi$. For $0 \le y < 2\pi$ we have
\[
\mathrm{P}(Y_j \le y) = \sum_{k=-\infty}^{\infty}\mathrm{P}(2k\pi h + x \le X_j \le 2k\pi h + yh + x)
= \sum_{k=-\infty}^{\infty}\int_{2k\pi h+x}^{2k\pi h+yh+x} q(u)\,du
= \sum_{k=-\infty}^{\infty} yh\,q(\xi_{k,h})
\]
\[
= \frac{y}{2\pi}\sum_{k=-\infty}^{\infty} 2\pi h\,q(\xi_{k,h})
\sim \frac{y}{2\pi}\int_{-\infty}^{\infty} q(u)\,du = \frac{y}{2\pi},
\]
where $\xi_{k,h}$ is a point in the interval $[2k\pi h+x,\,2k\pi h+yh+x] \subset [2k\pi h+x,\,2(k+1)\pi h+x]$. Since $h \to 0$, the last equivalence follows from a Riemann sum approximation of the integral and continuity of the density $q$ of $X_1$. Consequently, as $h \to 0$, we have $Y_j \stackrel{\mathcal D}{\to} U$, where $U$ is uniformly distributed on the interval $[0, 2\pi]$. Since the cosine is bounded and continuous, it then follows by the dominated convergence theorem that $\mathrm{E}[|\cos Y_j|^a] \to \mathrm{E}[|\cos U|^a]$ for all $a > 0$. Therefore
\[
\mathrm{E}\left[\cos\left(\frac{X_j-x}{h}\right)\right] \to \mathrm{E}[\cos U] = 0
\quad\text{and}\quad
\mathrm{E}\left[\left(\cos\left(\frac{X_j-x}{h}\right)\right)^2\right] \to \mathrm{E}[(\cos U)^2] = \frac12.
\]
To prove the asymptotic normality of $U_{nh}(x)$, first note that it is a normalised sum of i.i.d. random variables. We verify that the conditions for asymptotic normality in the triangular array scheme of Theorem 7.1.2 in [8] hold (Lyapunov's condition). In our case this reduces to the verification of the fact that
\[
\frac{\sum_{j=1}^n \mathrm{E}[|\cos Y_j - \mathrm{E}[\cos Y_j]|^3]}{n^{3/2}\,(\mathrm{Var}[\cos Y_1])^{3/2}}
= \frac{\mathrm{E}[|\cos Y_1 - \mathrm{E}[\cos Y_1]|^3]}{n^{1/2}\,(\mathrm{Var}[\cos Y_1])^{3/2}} \to 0.
\]
Now notice that
\[
\frac{\mathrm{E}[|\cos Y_1 - \mathrm{E}[\cos Y_1]|^3]}{n^{1/2}\,(\mathrm{Var}[\cos Y_1])^{3/2}}
\sim \frac{\mathrm{E}[|\cos U|^3]}{n^{1/2}\,(\mathrm{Var}[\cos U])^{3/2}} \to 0
\]
as $n \to \infty$. Consequently, $U_{nh}$ is asymptotically normal, $U_{nh}(x) \stackrel{\mathcal D}{\to} N(0, \tfrac12)$.

The following corollary immediately follows from Lemmas 5.2 and 5.3.
Corollary 5.1.
Under the conditions of Lemma 5.2 we have that
\[
\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\bigl(\hat f_{nh}(x) - \mathrm{E}[\hat f_{nh}(x)]\bigr)
\stackrel{\mathcal D}{\to} N\left(0,\ \frac{A^2(\Gamma(\alpha+1))^2}{2\pi^2}\left(\frac{1}{\sigma^2}\right)^{2+2\alpha}\right).
\]
Now we prove Theorem 2.2.
Proof of Theorem 2.2.
From (1.4) we have that
\[
f_{nh}(x) - \mathrm{E}[f_{nh}(x)] = \frac{1}{1-p}\bigl(\hat f_{nh}(x) - \mathrm{E}[\hat f_{nh}(x)]\bigr).
\]
Hence the result follows from Corollary 5.1.

The following lemma gives the order of the variance of $f_{nh}(x)$.

Lemma 5.4.
Let Condition 1.2 hold and let $f_{nh}(x)$ be defined as in (1.4). Then, as $n \to \infty$ and $h \to 0$,
\[
\mathrm{Var}[f_{nh}(x)] = O\left(\frac{h^{2(1+2\alpha)}\,e^{\sigma^2/h^2}}{n}\right).
\]
Proof.
We have
\[
\mathrm{Var}[f_{nh}(x)] = \frac{1}{4\pi^2(1-p)^2nh^2}\,\mathrm{Var}\left[\int_{-1}^{1} e^{is(X_1-x)/h}\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds\right].
\]
Notice that
\[
\mathrm{Var}\left[\int_{-1}^{1} e^{is(X_1-x)/h}\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds\right]
\le \left(2\int_0^1 |\phi_w(s)|\,e^{\sigma^2s^2/(2h^2)}\,ds\right)^2.
\]
Recalling Lemma 5.1, we conclude that
\[
\mathrm{Var}[f_{nh}(x)] = O\left(\frac{h^{2(1+2\alpha)}\,e^{\sigma^2/h^2}}{n}\right).
\]
Next we deal with the consistency of $p_{ng}$ and prove Theorem 2.3.

Proof of Theorem 2.3.
We have
\[
p_{ng} - p = (p_{ng} - \mathrm{E}[p_{ng}]) + (\mathrm{E}[p_{ng}] - p).
\]
To prove that this expression converges to zero in probability, it is sufficient to prove that
\[
\mathrm{Var}[p_{ng}] \to 0 \quad\text{and}\quad \mathrm{E}[p_{ng}] - p \to 0, \qquad n \to \infty,\ g \to 0.
\]
We have
\[
\mathrm{Var}[p_{ng}] = \pi^2g^2\,\mathrm{Var}\left[\frac{1}{2\pi}\int_{-1/g}^{1/g}\phi_{emp}(t)\,\phi_k(gt)\,e^{\sigma^2t^2/2}\,dt\right] = \pi^2g^2\,\mathrm{Var}[\hat f_{ng}(0)].
\]
Here it is understood that replacing the subindex $h$ by $g$ entails replacement of the smoothing characteristic function $\phi_w$ by $\phi_k$. By Lemma 5.4,
\[
\mathrm{Var}[p_{ng}] = O\left(\frac{g^{4(1+\alpha)}\,e^{\sigma^2/g^2}}{n}\right). \tag{5.5}
\]
This converges to zero due to the condition on $g$. Furthermore,
\[
\mathrm{E}[p_{ng}] - p = p\left(\frac12\int_{-1}^{1}\phi_k(t)\,dt - 1\right) + (1-p)\,\frac g2\int_{-1/g}^{1/g}\phi_f(t)\,\phi_k(gt)\,dt. \tag{5.6}
\]
The first term here is zero, since $\phi_k$ integrates to 2, while the second term converges to zero, which can be seen upon noticing that $\phi_k$ is bounded, $\phi_f$ is integrable, and that this term is bounded by
\[
\frac{(1-p)\,g}{2}\,\sup_t|\phi_k(t)|\int_{-\infty}^{\infty}|\phi_f(t)|\,dt,
\]
which converges to zero as $g \to 0$. The last part of the theorem follows from the identity
\[
\int_{-1/g}^{1/g}\phi_f(t)\,\phi_k(gt)\,dt = g^\gamma\int_{-1/g}^{1/g} t^\gamma\phi_f(t)\,\frac{\phi_k(gt)}{(gt)^\gamma}\,dt
\]
and Conditions 1.1 and 1.3, because Condition 1.3 implies the existence of a constant $K$ such that $\sup_t|\phi_k(t)t^{-\gamma}| < K$.

Next we prove the asymptotic normality of $p_{ng}$.

Proof of Theorem 2.4.
The result follows from the definition of $p_{ng}$ and Corollary 5.1, because $p_{ng} = \pi g\,\hat f_{ng}(0)$ essentially is a rescaled version of $\hat f_{ng}(0)$. Now we are ready to prove Theorem 2.5.
Proof of Theorem 2.5.
Write
\[
\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\bigl(f^*_{nhg}(x) - \mathrm{E}[f^*_{nhg}(x)]\bigr)
\]
\[
= \sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\left(\frac{1}{1-\hat p_{ng}}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt
- \mathrm{E}\left[\frac{1}{1-\hat p_{ng}}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]\right)
\]
\[
- \sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\,\frac1h\,w\left(\frac xh\right)\left(\frac{\hat p_{ng}}{1-\hat p_{ng}} - \mathrm{E}\left[\frac{\hat p_{ng}}{1-\hat p_{ng}}\right]\right). \tag{5.7}
\]
We want to prove that the first term is asymptotically normal, while the second term converges to zero in probability. An application of Slutsky's lemma (see [41]) will then yield the theorem. Write
\[
\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\,\frac1h\,w\left(\frac xh\right)\left(\frac{\hat p_{ng}}{1-\hat p_{ng}} - \mathrm{E}\left[\frac{\hat p_{ng}}{1-\hat p_{ng}}\right]\right)
\]
\[
= w\left(\frac xh\right)\left(\frac{\sqrt n\,h^{-2-2\alpha}\,e^{-\sigma^2/(2h^2)}}{\sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}}\right)
\times \sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}\left(\frac{\hat p_{ng}}{1-\hat p_{ng}} - \mathrm{E}\left[\frac{\hat p_{ng}}{1-\hat p_{ng}}\right]\right). \tag{5.8}
\]
Note that Condition 1.4 implies
\[
\frac{\sqrt n\,h^{-2-2\alpha}\,e^{-\sigma^2/(2h^2)}}{\sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}}
= \left(\frac{1+\eta_n}{1+\delta_n}\right)^{1+\alpha}\exp\left(\frac12(\delta_n - \eta_n)\log n\right) \to 0.
\]
Next we prove that
\[
\sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}\left(\frac{\hat p_{ng}}{1-\hat p_{ng}} - \mathrm{E}\left[\frac{\hat p_{ng}}{1-\hat p_{ng}}\right]\right) \tag{5.9}
\]
is asymptotically normal. Then (5.8) will converge to zero in probability, since convergence in distribution to a constant is equivalent to convergence in probability to the same constant and because $w$ is bounded. We have
\[
\sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}(\hat p_{ng} - \mathrm{E}[\hat p_{ng}])
\stackrel{\mathcal D}{\to} N\left(0,\ \frac{C^2(\Gamma(1+\alpha))^2}{2}\left(\frac{1}{\sigma^2}\right)^{2+2\alpha}\right), \tag{5.10}
\]
which can be seen as follows:
\[
\sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}(\hat p_{ng} - \mathrm{E}[\hat p_{ng}])
= \sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}(p_{ng} - \mathrm{E}[p_{ng}])
+ \sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}(\hat p_{ng} - p_{ng} - \mathrm{E}[\hat p_{ng} - p_{ng}]).
\]
Due to Theorem 2.4 the first term here yields the asymptotic normality. We will prove that the second term converges to zero in probability. To this end it is sufficient to prove that
\[
\mathrm{Var}\left[\sqrt n\,g^{-2-2\alpha}\,e^{-\sigma^2/(2g^2)}(\hat p_{ng} - p_{ng})\right]
= n\,g^{-4-4\alpha}\,e^{-\sigma^2/g^2}\,\mathrm{Var}[\hat p_{ng} - p_{ng}] \to 0. \tag{5.11}
\]
It follows from the definition of $\hat p_{ng}$ and Lemma 5.1 that
\[
\mathrm{Var}[\hat p_{ng} - p_{ng}] \le \mathrm{E}\left[(1-\varepsilon_n - p_{ng})^2\,1_{[p_{ng} > 1-\varepsilon_n]}\right]
\le \left(2 + 2K^2 g^{4(1+\alpha)}\,e^{\sigma^2/g^2}\right)\mathrm{P}(p_{ng} > 1-\varepsilon_n), \tag{5.12}
\]
where $K$ is some constant. This and (5.11) imply that we have to prove
\[
n^2\,\mathrm{P}(p_{ng} > 1-\varepsilon_n) \to 0. \tag{5.13}
\]
Now
\[
\mathrm{P}(p_{ng} > 1-\varepsilon_n) = \mathrm{P}(p_{ng} - \mathrm{E}[p_{ng}] > 1-\varepsilon_n - \mathrm{E}[p_{ng}]). \tag{5.14}
\]
Denote $t_n \equiv 1-\varepsilon_n - \mathrm{E}[p_{ng}]$ and select $n_0$ so large that for $n \ge n_0$ we have $t_n > 0$. Notice that $t_n \to 1-p$, which follows from (5.6). The probability in (5.14) is bounded by $\mathrm{P}(|p_{ng} - \mathrm{E}[p_{ng}]| > t_n)$. Note that
\[
p_{ng} = \sum_{j=1}^n \frac1n\,\pi k_g\left(-\frac{X_j}{g}\right),
\quad\text{with}\quad
k_g(x) = \frac{1}{2\pi}\int_{-1}^{1} e^{-itx}\phi_k(t)\,e^{\sigma^2t^2/(2g^2)}\,dt.
\]
By Lemma 5.1, which is applicable in view of Condition 1.3, each summand
\[
\frac1n\,\pi k_g\left(-\frac xg\right) \tag{5.15}
\]
is bounded by a constant, $K$ say, times $g^{2(1+\alpha)}\,e^{\sigma^2/(2g^2)}\,n^{-1}$. Hoeffding's inequality, see [29], then yields
\[
\mathrm{P}(|p_{ng} - \mathrm{E}[p_{ng}]| > t_n) \le 2\exp\left(-\frac{t_n^2\,n}{2K^2 g^{4(1+\alpha)}\,e^{\sigma^2/g^2}}\right). \tag{5.16}
\]
Since now
\[
n^2\,\mathrm{P}(|p_{ng} - \mathrm{E}[p_{ng}]| > t_n) \le 2n^2\exp\left(-\frac{t_n^2\,n}{2K^2 g^{4(1+\alpha)}\,e^{\sigma^2/g^2}}\right),
\]
it is enough to prove that the term on the right-hand side converges to zero. Taking the logarithm yields
\[
\log 2 + 2\log n - \frac{t_n^2\,n}{2K^2 g^{4(1+\alpha)}\,e^{\sigma^2/g^2}}.
\]
This diverges to minus infinity, because the last term dominates $\log n$:
\[
\frac{n}{g^{4(1+\alpha)}\,e^{\sigma^2/g^2}\,\log n} \to \infty, \qquad n \to \infty.
\]
The latter fact can be seen by taking the logarithm of the left-hand side and using (1.18). We obtain
\[
\log n - 4(1+\alpha)\log g - \frac{\sigma^2}{g^2} - \log\log n
= -\delta_n\log n + (1+2\alpha)\log\log n - (4+4\alpha)\log\sigma + (2+2\alpha)\log(1+\delta_n) \to \infty,
\]
which follows from (1.18). This in turn proves that (5.10) holds. Since the derivative $(y/(1-y))' = 1/(1-y)^2$ does not vanish, a minor variation of the $\delta$-method then implies that (5.9) is also asymptotically normal (see Theorem 3.8 in [41] for the $\delta$-method). Consequently, the second term in (5.7) converges to zero in probability.

We now consider the first term in (5.7) and want to prove that it is asymptotically normal. Rewrite this term as
\[
\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\left(\frac{1}{1-p}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt
- \mathrm{E}\left[\frac{1}{1-p}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]\right)
\]
\[
+ \sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\left(\left\{\frac{1}{1-\hat p_{ng}} - \frac{1}{1-p}\right\}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right.
\]
\[
\left.- \mathrm{E}\left[\left\{\frac{1}{1-\hat p_{ng}} - \frac{1}{1-p}\right\}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]\right).
\]
Thanks to Corollary 5.1 the first summand here is asymptotically normal. We will prove that the second summand vanishes in probability. Due to Chebyshev's inequality, it is sufficient to study the behaviour of
\[
\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}\,\mathrm{E}\left[\left|\frac{\hat p_{ng}-p}{(1-\hat p_{ng})(1-p)}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right|\right].
\]
By the Cauchy-Schwarz inequality, after taking squares, we can instead consider
\[
n\,h^{-2(1+2\alpha)}\,e^{-\sigma^2/h^2}\,\mathrm{E}\left[\frac{(\hat p_{ng}-p)^2}{(1-\hat p_{ng})^2(1-p)^2}\right]
\times \mathrm{E}\left[\left(\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right)^2\right]. \tag{5.17}
\]
Notice that
\[
\mathrm{E}\left[\left(\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right)^2\right]
= \mathrm{Var}\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]
+ \left(\mathrm{E}\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]\right)^2.
\]
It is easy to see that this expression is of order $h^{-2}$. Indeed, due to Lemma 5.4 the first term in this expression is of order $n^{-1}h^{2(1+2\alpha)}e^{\sigma^2/h^2}$. The fact that this in turn is of lower order than $h^{-2}$ can be seen in the same way as we did with (5.6). For the second term we have
\[
\left(\mathrm{E}\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]\right)^2
= \left(\frac ph\,w\left(\frac xh\right) + (1-p)\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_f(t)\phi_w(ht)\,dt\right)^2,
\]
and this is of order $h^{-2}$, because
\[
\left|(1-p)\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_f(t)\phi_w(ht)\,dt\right| \le (1-p)\,\frac{1}{2\pi}\int_{-\infty}^{\infty}|\phi_f(t)|\,dt
\]
and because $w$ is bounded. Consequently, taking into account (5.17), we have to study
\[
n\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,\mathrm{E}\left[\frac{(\hat p_{ng}-p)^2}{(1-\hat p_{ng})^2(1-p)^2}\right],
\]
or
\[
\frac{1}{(1-p)^2}\,\frac{n}{\varepsilon_n^2}\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,\mathrm{E}[(\hat p_{ng}-p)^2], \tag{5.18}
\]
since $(1-\hat p_{ng})^{-2} \le \varepsilon_n^{-2}$. Now
\[
\mathrm{E}[(\hat p_{ng}-p)^2] \le 2\,\mathrm{E}[(\hat p_{ng}-p_{ng})^2] + 2\,\mathrm{E}[(p_{ng}-p)^2].
\]
Hence we have to prove that
\[
\frac{n}{\varepsilon_n^2}\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,\mathrm{E}[(\hat p_{ng}-p_{ng})^2] \to 0, \tag{5.19}
\]
\[
\frac{n}{\varepsilon_n^2}\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,\mathrm{E}[(p_{ng}-p)^2] \to 0. \tag{5.20}
\]
The first fact essentially follows from the arguments concerning (5.11), since the presence of the additional factor $\varepsilon_n^{-2}$, given Condition 1.5, does not affect the arguments used. Indeed, (5.19) will hold true if we prove that
\[
\frac{1}{\varepsilon_n^2}\,n\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,g^{4(1+\alpha)}\,e^{\sigma^2/g^2}\,\mathrm{P}(p_{ng} > 1-\varepsilon_n) \to 0.
\]
Here we used the definitions of $p_{ng}$ and $\hat p_{ng}$ and Lemma 5.1. Now notice that $(g/h)^{4(1+\alpha)}$ stays bounded, which follows from Condition 1.4, and that by the arguments concerning (5.13) we have $n^2\,\mathrm{P}(p_{ng} > 1-\varepsilon_n) \to 0$. Moreover, under Conditions 1.4 and 1.5 we have $\varepsilon_n^{-2}\,e^{\sigma^2(1/g^2 - 1/h^2)} \to 0$, which can again be seen by taking the logarithm and verifying that it diverges to minus infinity. This proves (5.19).

Next we will prove (5.20). Notice that the latter is in turn implied by
\[
\frac{n}{\varepsilon_n^2}\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,\mathrm{Var}[p_{ng}]
+ \frac{n}{\varepsilon_n^2}\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,(\mathrm{E}[p_{ng}-p])^2 \to 0.
\]
The first term here converges to zero by (5.5) and Conditions 1.4 and 1.5. Now we turn to the second term. Taking into account (5.6), we have to study the behaviour of
\[
\frac{\sqrt n}{\varepsilon_n}\,h^{-2(1+\alpha)}\,e^{-\sigma^2/(2h^2)}\,\frac{1-p}{2}\int_{-1}^{1}\phi_f\left(\frac tg\right)\phi_k(t)\,dt.
\]
This can be rewritten as
\[
\left(\frac{\sqrt n\,\varepsilon_n^{-1}\,h^{-2(1+\alpha)}\,e^{-\sigma^2/(2h^2)}}{\sqrt n\,g^{-2(1+\alpha)}\,e^{-\sigma^2/(2g^2)}}\right)
\sqrt n\,g^{-2(1+\alpha)}\,e^{-\sigma^2/(2g^2)}\,\frac{1-p}{2}\int_{-1}^{1}\phi_f\left(\frac tg\right)\phi_k(t)\,dt.
\]
The factor between the brackets in this expression converges to zero. Therefore it is sufficient to consider
\[
\sqrt n\,g^{-2(1+\alpha)}\,e^{-\sigma^2/(2g^2)}\int_{-1}^{1}\phi_f\left(\frac tg\right)\phi_k(t)\,dt.
\]
Rewrite this as
\[
\sqrt n\,g^{-2(1+\alpha)}\,e^{-\sigma^2/(2g^2)}\,g^{1+\gamma}\int_{-1/g}^{1/g} t^\gamma\phi_f(t)\,\frac{\phi_k(gt)}{(gt)^\gamma}\,dt.
\]
Conditions 1.1, 1.3, 1.4 and 1.5 imply that this expression converges to zero, because the integral converges to a constant by the dominated convergence theorem, while
\[
\sqrt n\,g^{\gamma-1-2\alpha}\,e^{-\sigma^2/(2g^2)} \to 0,
\]
which can be seen by taking the logarithm and noticing that it diverges to minus infinity. We obtain
\[
\frac12\log n + (\gamma-1-2\alpha)\log g - \frac{\sigma^2}{2g^2}
= -\frac{\delta_n}{2}\log n + (\gamma-1-2\alpha)\log\sigma + \frac{1+2\alpha-\gamma}{2}\log(1+\delta_n) + \frac{1+2\alpha-\gamma}{2}\log\log n
\]
\[
\le (\gamma-1-2\alpha)\log\sigma + \frac{1+2\alpha-\gamma}{2}\log(1+\delta_n) + \frac{1+2\alpha-\gamma}{2}\log\log n \to -\infty, \tag{5.21}
\]
which follows from the facts that $\delta_n > 0$ and $1+2\alpha-\gamma < 0$. Combination of all these intermediary results completes the proof of the theorem.
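The Hoeffding bound invoked in (5.16) is easy to illustrate by simulation. The sketch below uses arbitrary uniform summands on $[0,1]$, for which the inequality reads $\mathrm{P}(|\bar X_n - 1/2| > t) \le 2e^{-2nt^2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, t = 100, 20000, 0.1

# sample means of n i.i.d. Uniform[0, 1] variables, repeated `reps` times
means = rng.uniform(size=(reps, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) > t)
bound = 2 * np.exp(-2 * n * t ** 2)
print(empirical, bound)   # the empirical tail probability stays below the bound
```

As in the proof, the bound is crude but fully explicit, which is what makes the exponential estimate (5.16) usable.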
Proof of Theorem 2.6.
Write
\[
\mathrm{E}[f^*_{nhg}(x)] - f(x) = \{\mathrm{E}[f^*_{nhg}(x) - f_{nh}(x)]\} + \{\mathrm{E}[f_{nh}(x)] - f(x)\}. \tag{5.22}
\]
Because of (1.7), the second summand in this expression vanishes as $h \to 0$. Next we consider the first summand in (5.22). Using the definitions of $f^*_{nhg}(x)$ and $f_{nh}(x)$, we get
\[
\mathrm{E}[f^*_{nhg}(x) - f_{nh}(x)]
= \mathrm{E}\left[\frac{\hat p_{ng}-p}{(1-\hat p_{ng})(1-p)}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right]
- \frac1h\,w\left(\frac xh\right)\mathrm{E}\left[\frac{\hat p_{ng}-p}{(1-\hat p_{ng})(1-p)}\right]. \tag{5.23}
\]
By the Cauchy-Schwarz inequality the absolute value of the first summand in this expression is bounded by
\[
\left\{\mathrm{E}\left[\frac{(\hat p_{ng}-p)^2}{(1-\hat p_{ng})^2(1-p)^2}\right]\right\}^{1/2}
\times\left\{\mathrm{E}\left[\left(\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_{emp}(t)\phi_w(ht)e^{\sigma^2t^2/2}\,dt\right)^2\right]\right\}^{1/2}. \tag{5.24}
\]
The fact that this term converges to zero follows from (5.17) and the subsequent arguments in the proof of Theorem 2.5.

Now we have to study the second summand in (5.23). By the Cauchy-Schwarz inequality and the fact that $(1-\hat p_{ng})^{-1} \le \varepsilon_n^{-1}$, it suffices to consider
\[
\frac{1}{(1-p)^2\varepsilon_n^2}\left(\frac1h\,w\left(\frac xh\right)\right)^2\mathrm{E}[(\hat p_{ng}-p)^2]
\]
instead. The fact that this term converges to zero follows from the arguments concerning (5.18), which were given in the proof of Theorem 2.5. Indeed, the expression above can be rewritten as
\[
\frac{\left(w\left(\frac xh\right)\right)^2}{(1-p)^2}\,\frac{n}{\varepsilon_n^2}\,h^{-4(1+\alpha)}\,e^{-\sigma^2/h^2}\,\mathrm{E}[(\hat p_{ng}-p)^2]\,\frac{h^{4(1+\alpha)-2}\,e^{\sigma^2/h^2}}{n}.
\]
Now use the arguments concerning (5.19) and (5.20), the fact that $w$ is a bounded function, and the fact that under Condition 1.4 we have $h^{4(1+\alpha)-2}\,e^{\sigma^2/h^2}\,n^{-1} \to 0$. This concludes the proof of the first part of the theorem.

Now we prove the second part, an order expansion of the bias $\mathrm{E}[f^*_{nhg}(x)] - f(x)$ under the additional assumptions given in the statement of the theorem. The proof follows the same steps as the proof of the first part of the theorem. Notice that under the condition $f \in H(\beta, L)$ the second summand in (5.22) is of order $h^\beta$, see Proposition 1. Hence it suffices to show that $h^\beta$ times $\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}$ converges to zero. To this end it is sufficient to show that
\[
\log\left(h^{\beta-1-2\alpha}\,\sqrt n\,e^{-\sigma^2/(2h^2)}\right) \to -\infty.
\]
This essentially follows from the same argument as (5.21) (with $\gamma$ replaced by $\beta$). Now consider (5.23). Its first term is bounded by (5.24), and we have to show that this term multiplied by $\sqrt n\,h^{-1-2\alpha}\,e^{-\sigma^2/(2h^2)}$ tends to zero. The arguments from the proof of Theorem 2.5 lead us to (5.18) and hence to the desired result.

Proof of Theorem 2.7.
The result is a direct consequence of Theorems 2.5 and 2.6.
Acknowledgements
The authors would like to thank Chris Klaassen for his careful reading of the draft version of the paper and for suggestions that led to an improvement of its readability. Comments by an associate editor and two referees are gratefully acknowledged.
References

[1] J. Aitchison. On the distribution of a positive random variable having a discrete probability mass at the origin. J. Amer. Statist. Assoc., 50:901-908, 1955. MR0071685
[2] J.O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, New York, 2nd edition, 1985. MR0804611
[3] C. Butucea and A. Tsybakov. Sharp optimality for density deconvolution with dominating bias, I. Theor. Probab. Appl., 52:111-128, 2007. MR2354572
[4] C. Butucea and A. Tsybakov. Sharp optimality for density deconvolution with dominating bias, II. Theor. Probab. Appl., 52:336-349, 2007. MR2354572
[5] P. Carr and D.B. Madan. Option valuation using the Fast Fourier Transform. J. Comput. Finance, 2:61-73, 1998.
[6] R. Carroll and P. Hall. Optimal rates of convergence for deconvoluting a density. J. Amer. Statist. Assoc., 83:1184-1186, 1988. MR0997599
[7] E. Cator. Deconvolution with arbitrary smooth kernels. Statist. Probab. Lett., 54:205-215, 2001. MR1858635
[8] K.L. Chung. A Course in Probability Theory. Academic Press, New York, 3rd edition, 2003. MR1796326
[9] A. Delaigle. An alternative view of the deconvolution problem. To appear in Stat. Sinica, 2007.
[10] A. Delaigle and I. Gijbels. Bootstrap bandwidth selection in kernel density estimation from a contaminated sample. Ann. Inst. Statist. Math., 56:19-47, 2003. MR2053727
[11] A. Delaigle and I. Gijbels. Comparison of data-driven bandwidth selection procedures in deconvolution kernel density estimation. Comp. Statist. Data Anal., 45:249-267, 2004. MR2045631
[12] A. Delaigle and I. Gijbels. Frequent problems in calculating integrals and optimizing objective functions: a case study in density deconvolution. Stat. Comp., 17:349-355, 2007.
[13] A. Delaigle and P. Hall. On optimal kernel choice for deconvolution. Stat. Probab. Lett., 76:1594-1602, 2006. MR2248846
[14] A. Delaigle, P. Hall and A. Meister. On deconvolution with repeated measurements. Ann. Statist., 36:665-685, 2008.
[15] L. Devroye. A Course in Density Estimation. Birkhäuser, Boston, 1987. MR0891874
[16] L. Devroye. A note on consistent deconvolution in density estimation. Canad. J. Statist., 17:235-239, 1989. MR1033106
[17] L. Devroye and L. Györfi. Nonparametric Density Estimation: The L1 View. John Wiley & Sons, New York, 1985. MR0780746
[18] P.J. Diggle and P. Hall. Fourier approach to nonparametric deconvolution of a density estimate. J. R. Statist. Soc. B, 55:523-531, 1993. MR1224414
[19] J. Fan. Asymptotic normality for deconvolution kernel density estimators. Sankhyā Ser. A, 53:97-110, 1991. MR1177770
[20] J. Fan. Global behaviour of kernel deconvolution estimates. Statist. Sinica, 1:541-551, 1991. MR1130132
[21] J. Fan. On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist., 19:1257-1272, 1991. MR1126324
[22] J. Fan. Deconvolution for supersmooth distributions. Canad. J. Statist., 20:155-169, 1992. MR1183078
[23] J. Fan and Y. Liu. A note on asymptotic normality for deconvolution kernel density estimators. Sankhyā Ser. A, 59:138-141, 1997. MR1665140
[24] J. Fan and Y.K. Truong. Nonparametric regression with errors in variables. Ann. Statist., 21:1900-1925, 1993. MR1245773
[25] A. Feuerverger and R.A. Mureika. The empirical characteristic function and its applications. Ann. Statist., 5:88-97, 1977. MR0428584
[26] S. Gugushvili. Decompounding under Gaussian noise. arXiv:0711.0719 [math.ST], 2007.
[27] W. Härdle. Smoothing Techniques with Implementation in S. Springer, New York, 1991. MR1140190
[28] S.H. Hesse. Data-driven deconvolution. J. Nonparametr. Statist., 10:343-373, 1999. MR1717098
[29] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13-30, 1963. MR0144363
[30] H. Holzmann and L. Boysen. Integrated square error asymptotics for supersmooth deconvolution. Scand. J. Statist., 33:849-860, 2006. MR2300919
[31] B.G. Lindsay. Mixture Models: Theory, Geometry and Applications. IMS, Hayward, 1995.
[32] M.C. Liu and R.L. Taylor. A consistent nonparametric density estimator for the deconvolution problem. Canad. J. Statist., 17:427-438, 1989. MR1047309
[33] J.S. Maritz and T. Lwin. Empirical Bayes Methods. Chapman and Hall, London, 2nd edition, 1989. MR1019835
[34] R.C. Merton. Option pricing when underlying stock returns are discontinuous. J. Financ. Econ., 3:125-144, 1976.
[35] A. Meister. Density estimation with normal measurement error with unknown variance. Stat. Sinica.
[36] B.L.S. Prakasa Rao. Nonparametric Functional Estimation. Academic Press, Orlando, 1983. MR0740865
[37] K.-I. Sato. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge, 2004. MR1739520
[38] L.A. Stefanski. Rates of convergence of some estimators in a class of deconvolution problems. Statist. Probab. Lett., 9:229-235, 1990. MR1045189
[39] L.A. Stefanski and R.J. Carroll. Deconvoluting kernel density estimators. Statistics, 21:169-184, 1990. MR1054861
[40] A. Tsybakov. Introduction à l'estimation non-paramétrique. Springer, Berlin, 2004. MR2013911
[41] A.W. van der Vaart. Asymptotic Statistics. Cambridge University Press, Cambridge, 1998. MR1652247
[42] B. van Es and S. Gugushvili. Some thoughts on the asymptotics of the deconvolution kernel density estimator. arXiv:0801.2600 [stat.ME], 2008.
[43] B. van Es and S. Gugushvili. Weak convergence of the supremum distance for supersmooth kernel deconvolution. arXiv:0802.2186 [math.ST], 2008.
[44] B. van Es, S. Gugushvili, and P. Spreij. A kernel type nonparametric density estimator for decompounding. Bernoulli, 13:672-694, 2007. MR2348746
[45] A.J. van Es and H.-W. Uh. Asymptotic normality of nonparametric kernel type estimators: crossing the Cauchy boundary. J. Nonparametr. Statist., 16:261-277, 2004. MR2053074
[46] A.J. van Es and H.-W. Uh. Asymptotic normality of kernel-type deconvolution estimators. Scand. J. Statist., 32:467-483, 2005. MR2204630
[47] M.P. Wand. Finite sample performance of deconvolving density estimators. Statist. Probab. Lett., 37:131-139, 1998. MR1620450
[48] M.P. Wand and M.C. Jones. Kernel Smoothing. Chapman & Hall, London, 1995. MR1319818
[49] L. Wasserman. All of Nonparametric Statistics. Springer, Berlin, 2007. MR2172729
[50] C.H. Zhang. Fourier methods for estimating mixing densities and distributions.