Joint behaviour of semirecursive kernel estimators of the location and of the size of the mode of a probability density function
Abdelkader Mokkadem, Mariane Pelletier, Baba Thiam
(mokkadem, pelletier, thiam)@math.uvsq.fr
Université de Versailles-Saint-Quentin
Département de Mathématiques
45, Avenue des Etats-Unis
78035 Versailles Cedex
France
Abstract: Let $\theta$ and $\mu$ denote the location and the size of the mode of a probability density. We study the joint convergence rates of semirecursive kernel estimators of $\theta$ and $\mu$. We show how the estimation of the size of the mode allows one to measure the relevance of the estimation of its location. We also show that, beyond their computational advantage over nonrecursive estimators, the semirecursive estimators are preferable for the construction of confidence regions.

AMS Subj. Classification:
Key words: Location and size of the mode; semirecursive estimation; central limit theorem; law of the iterated logarithm
Introduction
Let $X_1, \dots, X_n$ be independent and identically distributed $\mathbb{R}^d$-valued random variables with unknown probability density $f$. The aim of this paper is to study the joint kernel estimation of the location $\theta$ and of the size $\mu = f(\theta)$ of the mode of $f$. The mode is assumed to be unique, that is, $f(x) < f(\theta)$ for any $x \neq \theta$, and nondegenerate, that is, the second order differential $D^2 f(\theta)$ at the point $\theta$ is nonsingular (in the sequel, $D^m g$ denotes the differential of order $m$ of a multivariate function $g$).

The problem of estimating the location of the mode of a probability density has been widely studied. Kernel methods were considered, among many others, by Parzen [18], Nadaraya [17], Van Ryzin [26], Rüschendorf [23], Konakov [10], Samanta [24], Eddy ([5], [6]), Romano [20], Tsybakov [25], Vieu [27], Mokkadem and Pelletier [13], and Abraham et al. ([1], [2]). To our knowledge, the behaviour of estimators of the size of the mode has not been investigated in detail, although there are at least two statistical motivations for estimating this parameter. First, an estimator of the size is necessary for the construction of confidence regions for the location of the mode (see, e.g., Romano [20]). As a more important motivation, let us underline that the height of the peak gives information on the shape of a density; from this point of view, as suggested by Vieu [27], the location of the mode is more related to the shape of the derivative of $f$, whereas the size of the mode is more related to the shape of the density itself.
Moreover, the knowledge of the size of the mode makes it possible to measure the relevance of the location of the mode as a parameter.

Let us mention that, even if the problem of estimating the size of the mode has not been investigated in the framework of density estimation, it has been studied in the framework of regression estimation. Müller [16] proves in particular the joint asymptotic normality and independence of kernel estimators of the location and of the size of the mode in nonparametric regression models with fixed design. In the framework of nonparametric regression with random design, a similar result is obtained by Ziegler ([32], [33]) for kernel estimators, and by Mokkadem and Pelletier [14] for estimators issued from stochastic approximation methods.

This paper is focused on semirecursive kernel estimators of $\theta$ and $f(\theta)$. To explain why we chose this semirecursive option, let us first recall that the well-known (nonrecursive) kernel estimator of the location of the mode introduced by Parzen [18] is defined as a random variable $\theta_n^*$ satisfying
\[
f_n^*(\theta_n^*) = \sup_{y \in \mathbb{R}^d} f_n^*(y),
\]
where $f_n^*$ is Rosenblatt's estimator of $f$; more precisely,
\[
f_n^*(x) = \frac{1}{n h_n^d} \sum_{i=1}^n K\left(\frac{x - X_i}{h_n}\right),
\]
where the bandwidth $(h_n)$ is a sequence of positive real numbers going to zero and the kernel $K$ is a continuous function satisfying $\lim_{\|x\| \to +\infty} K(x) = 0$ and $\int_{\mathbb{R}^d} K(x)\,dx = 1$. The asymptotic behaviour of $\theta_n^*$ has been widely studied (see, among others, [5], [6], [10], [13], [17], [18], [20], [23], [24], [26], [27]), but, from a computational point of view, the estimator $\theta_n^*$ has a main drawback: its update, from a sample of size $n$ to a sample of size $n+1$, is far from being immediate.
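As an illustration (ours, not from the paper), here is a minimal one-dimensional sketch of Rosenblatt's estimator and of Parzen's mode estimator, with the supremum over $\mathbb{R}^d$ replaced by a maximum over a finite grid; the Gaussian kernel, the bandwidth choice, and all names are our own assumptions:

```python
import numpy as np

def rosenblatt_density(x, sample, h):
    """Rosenblatt's (nonrecursive) estimator f*_n(x), d = 1, Gaussian kernel."""
    u = (x - sample) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h

def parzen_mode(sample, h, grid):
    """Parzen's mode estimator: maximize f*_n over a finite grid
    (a practical stand-in for the supremum over R^d)."""
    values = [rosenblatt_density(x, sample, h) for x in grid]
    return grid[int(np.argmax(values))]

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.0, size=2000)   # true mode at 2
grid = np.linspace(-2.0, 6.0, 801)
theta_star = parzen_mode(sample, h=2000 ** (-1.0 / 5.0), grid=grid)
```

The computational drawback discussed above is visible here: adding one observation changes every kernel weight (the bandwidth $h_n$ multiplies all terms), so the whole grid must be recomputed from scratch.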
Applying the stochastic approximation method, Tsybakov [25] introduced the recursive kernel estimator of $\theta$ defined as
\[
T_n = T_{n-1} + \frac{\gamma_n}{h_n^{d+1}} \nabla K\left(\frac{T_{n-1} - X_n}{h_n}\right),
\]
where $T_0 \in \mathbb{R}^d$ is arbitrarily chosen and the stepsize $(\gamma_n)$ is a sequence of positive real numbers going to zero. The great property of this estimator is that its update is very rapid. Unfortunately, for reasons inherent to the properties of stochastic approximation algorithms, very strong assumptions on the density $f$ must be required to ensure its consistency. A recursive version $f_n$ of Rosenblatt's density estimator was introduced by Wolverton and Wagner [30] (and discussed, among others, by Yamato [31], Davies [3], Devroye [4], Menon et al. [12], Wertz [29], Wegman and Davies [28], Roussas [22], and Mokkadem et al. [15]). Let us recall that $f_n$ is defined as
\[
f_n(x) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_i^d} K\left(\frac{x - X_i}{h_i}\right). \tag{1}
\]
Its update from a sample of size $n$ to one of size $n+1$ is immediate, since $f_n$ clearly satisfies the recursive relation
\[
f_n(x) = \left(1 - \frac{1}{n}\right) f_{n-1}(x) + \frac{1}{n h_n^d} K\left(\frac{x - X_n}{h_n}\right).
\]
This property of rapid update of the density estimator is particularly important in the framework of mode estimation, since the number of points where $f$ must be estimated is very large. We thus define a semirecursive version of Parzen's estimator of the location of the mode by using Wolverton and Wagner's recursive density estimator, rather than Rosenblatt's density estimator. More precisely, our estimator $\theta_n$ of the location $\theta$ of the mode is a random variable satisfying
\[
f_n(\theta_n) = \sup_{y \in \mathbb{R}^d} f_n(y). \tag{2}
\]
Let us mention that, in the same way as for Parzen's estimator, the fact that the kernel $K$ is continuous and vanishes at infinity ensures that the choice of $\theta_n$ as a random variable satisfying (2) can be made with the help of an order on $\mathbb{R}^d$. For example, one can consider the following lexicographic order: $x \leq y$ if the first nonzero coordinate of $x - y$ is negative.
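The $O(1)$-per-point update is easy to check numerically: unrolling the recursive relation gives back the batch form (1). A minimal sketch under our own assumptions ($d = 1$, Gaussian kernel, an illustrative bandwidth sequence):

```python
import numpy as np

def K(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def batch_ww(x, sample, h):
    """Wolverton-Wagner estimator in batch form (1), d = 1:
    f_n(x) = (1/n) * sum_i h_i^{-1} K((x - X_i) / h_i)."""
    return np.mean(K((x - sample) / h) / h)

rng = np.random.default_rng(1)
sample = rng.normal(size=500)
h = np.arange(1, 501) ** (-1.0 / 5.0)   # h_n = n^{-1/5} (illustrative choice)

# Recursive form: f_n(x) = (1 - 1/n) f_{n-1}(x) + (n h_n)^{-1} K((x - X_n)/h_n);
# updating with a new observation costs O(1) per evaluation point x.
x = 0.3
f = 0.0
for n, (xn, hn) in enumerate(zip(sample, h), start=1):
    f = (1.0 - 1.0 / n) * f + K((x - xn) / hn) / (n * hn)

batch = batch_ww(x, sample, h)   # both forms agree up to rounding
```

The agreement of `f` and `batch` reflects exactly why the semirecursive scheme is attractive: past kernel evaluations never have to be revisited, because each $X_i$ keeps its own bandwidth $h_i$.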
The definition
\[
\theta_n = \inf\left\{ y \in \mathbb{R}^d \ \text{such that} \ f_n(y) = \sup_{x \in \mathbb{R}^d} f_n(x) \right\},
\]
where the infimum is taken with respect to the lexicographic order on $\mathbb{R}^d$, ensures the measurability of the kernel mode estimator.

Let us also mention that, in order to speed up the computation of the kernel estimator of the location of the mode, Abraham et al. ([1], [2]) proposed the following alternative version of Parzen's estimator $\theta_n^*$:
\[
\hat\theta_n^* = \arg\max_{1 \leq i \leq n} f_n^*(X_i).
\]
Similarly, we could consider the following alternative version of our semirecursive estimator $\theta_n$:
\[
\hat\theta_n = \arg\max_{1 \leq i \leq n} f_n(X_i).
\]
However, to establish the asymptotic properties of $\hat\theta_n^*$, Abraham et al. [2] prove the asymptotic proximity between $\theta_n^*$ and $\hat\theta_n^*$, which allows them to deduce the asymptotic weak behaviour of $\hat\theta_n^*$ from that of $\theta_n^*$. In the same way, we can conjecture that the asymptotic weak behaviour of $\hat\theta_n$ could be deduced from that of $\theta_n$; in this paper, however, we limit ourselves to establishing the asymptotic properties of $\theta_n$.

Let us now come back to the problem of estimating the size $f(\theta)$ of the mode. The ordinarily used estimator is defined as $\mu_n^* = f_n^*(\theta_n^*)$ ($f_n^*$ being Rosenblatt's density estimator and $\theta_n^*$ Parzen's mode estimator); the consistency of $\mu_n^*$ is sufficient to allow the construction of confidence regions for $\theta$ (see, e.g., Romano [20]). Adapting the construction of $\mu_n^*$ to the semirecursive framework would lead us to estimate $f(\theta)$ by
\[
\mu_n = f_n(\theta_n). \tag{3}
\]
However, this estimator has two main drawbacks (as does $\mu_n^*$). First, the use of a higher order kernel $K$ is necessary for $(\mu_n - \mu)$ to satisfy a central limit theorem, and thus for the construction of confidence intervals for $\mu$ (and of confidence regions for $(\theta, \mu)$). Moreover, when a higher order kernel is used, it is not possible to choose a bandwidth for which both estimators $\theta_n$ and $\mu_n$ converge at the optimal rate.
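The two-bandwidth remedy introduced just below can be sketched as follows (our own illustration: $d = 1$, Gaussian kernel, grid maximization, and purely illustrative bandwidth exponents):

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def ww_density(points, sample, bandwidths):
    """Wolverton-Wagner recursive estimator evaluated at `points` (d = 1)."""
    vals = np.zeros_like(points, dtype=float)
    for xi, hi in zip(sample, bandwidths):
        vals += K((points - xi) / hi) / hi
    return vals / len(sample)

rng = np.random.default_rng(2)
n = 2000
sample = rng.normal(loc=0.0, scale=0.5, size=n)   # mode 0, size ~ 0.798
idx = np.arange(1, n + 1)
h = idx ** (-1.0 / 7.0)         # bandwidth (h_n) for the location estimate
h_tilde = idx ** (-1.0 / 5.0)   # second bandwidth for the size estimate

grid = np.linspace(-2.0, 2.0, 401)
theta_n = grid[int(np.argmax(ww_density(grid, sample, h)))]             # location
mu_tilde_n = float(ww_density(np.array([theta_n]), sample, h_tilde)[0]) # size
```

The location estimate uses the first bandwidth only; the size is then read off a second density estimate computed with its own bandwidth, which is the construction formalized in (4) below.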
These considerations lead us to use two different bandwidths, one for the estimation of $\theta$, the other for the estimation of $\mu$. More precisely, let $\tilde f_n$ be the recursive kernel density estimator defined as
\[
\tilde f_n(x) = \frac{1}{n} \sum_{i=1}^n \frac{1}{\tilde h_i^d} K\left(\frac{x - X_i}{\tilde h_i}\right),
\]
where the bandwidth $(\tilde h_n)$ may be different from the bandwidth $(h_n)$ used in the definition of $f_n$ (see (1)); we estimate the size of the mode by
\[
\tilde\mu_n = \tilde f_n(\theta_n), \tag{4}
\]
where $\theta_n$ is still defined by (2), and thus with the first bandwidth $(h_n)$.

The purpose of this paper is the study of the joint asymptotic behaviour of $\theta_n$ and $\tilde\mu_n$. We first prove the strong consistency of both estimators. We then establish their joint weak convergence rate. We prove in particular that adequate choices of the bandwidths lead to the asymptotic normality and independence of these estimators, and that the use of different bandwidths makes it possible to obtain simultaneously the optimal convergence rate of both estimators. We then apply our weak convergence rate result to the construction of confidence regions for $(\theta, \mu)$, and illustrate this application with a simulation study. This application highlights the advantage of using semirecursive estimators rather than nonrecursive estimators. It also shows how the estimation of the size of the mode gives information on the relevance of estimating its location. Finally, we establish the joint strong convergence rate of $\theta_n$ and $\tilde\mu_n$.

Throughout this paper, $(h_n)$ and $(\tilde h_n)$ are defined as $h_n = h(n)$ and $\tilde h_n = \tilde h(n)$ for all $n \geq 1$, where $h$ and $\tilde h$ are two positive functions. The conditions we require for the strong consistency of $\theta_n$ and $\tilde\mu_n$ are the following.

(A1) i) $K$ is an integrable, differentiable, and even function such that $\int_{\mathbb{R}^d} K(z)\,dz = 1$.
ii) There exists $\zeta > 0$ such that $\int_{\mathbb{R}^d} \|z\|^\zeta |K(z)|\,dz < \infty$.
iii) $K$ is Hölder continuous.
iv) There exists $\gamma > 0$ such that $z \mapsto \|z\|^\gamma |K(z)|$ is a bounded function.

(A2) i) $f$ is uniformly continuous on $\mathbb{R}^d$.
ii) There exists $\xi > 0$ such that $\int_{\mathbb{R}^d} \|x\|^\xi f(x)\,dx < \infty$.
iii) There exists $\eta > 0$ such that $z \mapsto \|z\|^\eta f(z)$ is a bounded function.
iv) There exists $\theta \in \mathbb{R}^d$ such that $f(x) < f(\theta)$ for all $x \neq \theta$.

(A3) The functions $h$ and $\tilde h$ are locally bounded and vary regularly with exponent $(-a)$ and $(-\tilde a)$ respectively, where $a \in \left]0, 1/(d+4)\right[$ and $\tilde a \in \left]0, 1/(d+2)\right[$.

Remark 1
Note that (A1)iv) implies that K is bounded. Remark 2
The assumptions required on the probability density to establish the strong consistency of the semirecursive estimator of the location of the mode are slightly stronger than those needed for the nonrecursive estimator (see, e.g., [13], [20]), but are much weaker than the ones needed for the recursive estimator (see [25]).
Remark 3
Let us recall that a positive function (not necessarily monotone) $L$ defined on $]0, \infty[$ is slowly varying if $\lim_{t \to \infty} L(tx)/L(t) = 1$, and that a function $G$ varies regularly with exponent $\rho$, $\rho \in \mathbb{R}$, if and only if it is of the form $G(x) = x^\rho L(x)$ with $L$ slowly varying (see, for example, Feller [8], page 275). Typical examples of regularly varying functions are $x^\rho$, $x^\rho \log x$, $x^\rho \log\log x$, $x^\rho \log x / \log\log x$, and so on.

Proposition 1
Let $\theta_n$ and $\tilde\mu_n$ be defined by (2) and (4), respectively. Under (A1)-(A3),
\[
\lim_{n \to \infty} \theta_n = \theta \ \text{a.s.} \quad \text{and} \quad \lim_{n \to \infty} \tilde\mu_n = \mu \ \text{a.s.}
\]

In order to state the weak convergence rate of $\theta_n$ and $\tilde\mu_n$, we need the following additional assumptions on $K$ and $f$.

(A4) i) $K$ is twice differentiable on $\mathbb{R}^d$.
ii) $z \mapsto z \nabla K(z)$ is integrable.
iii) For any $(i,j) \in \{1, \dots, d\}^2$, $\partial^2 K / \partial x_i \partial x_j$ is bounded, integrable, and Hölder continuous.
iv) $K$ is a kernel of order $q \geq 2$: $\forall s \in \{1, \dots, q-1\}$, $\forall j \in \{1, \dots, d\}$, $\int_{\mathbb{R}^d} y_j^s K(y)\,dy = 0$ and $\int_{\mathbb{R}^d} |y_j^q K(y)|\,dy < \infty$.

(A5) i) $D^2 f(\theta)$ is nonsingular.
ii) $D^2 f$ is $q$-times differentiable; $\nabla f$ and $D^q f$ are bounded.
iii) For any $(i,j) \in \{1, \dots, d\}^2$, $\sup_{x \in \mathbb{R}^d} \|D^q(\partial^2 f / \partial x_i \partial x_j)(x)\| < \infty$, and for any $k \in \{1, \dots, d\}$, $\sup_{x \in \mathbb{R}^d} \|D^q(\partial f / \partial x_k)(x)\| < \infty$.

Remark 4
Note that (A4)ii) and (A4)iii) imply that $\nabla K$ is Lipschitz-continuous and integrable; it is thus straightforward to see that $\lim_{\|x\| \to \infty} \|\nabla K(x)\| = 0$ (and in particular $\nabla K$ is bounded).

We also need to add conditions on the bandwidths. Let us set
\[
L_\theta(n) = n^a h_n \quad \text{and} \quad L_\mu(n) = n^{\tilde a} \tilde h_n.
\]
(In view of (A3), $L_\theta$ and $L_\mu$ are positive slowly varying functions; see Remark 3.) In the statement of the weak convergence rate of $\theta_n$ and $\tilde\mu_n$, we shall refer to the following conditions.

(C1) One of the following two conditions is fulfilled:
i) $\dfrac{1}{d+4} < \tilde a < \dfrac{q}{d+2q+2}$ and $\dfrac{\tilde a}{q} < a < \dfrac{1 - \tilde a d}{d+2}$;
ii) $\dfrac{1}{d+2q} < \tilde a \leq \dfrac{1}{d+4}$ and $\dfrac{1}{d+2q+2} < a < \dfrac{1 + \tilde a d}{d+2}$.

(C2) One of the following two conditions is fulfilled:
i) $0 < \tilde a < \dfrac{1}{d+2q}$ and $\tilde a < a < \dfrac{1}{d+2q+2}$;
ii) $\tilde a = \dfrac{1}{d+2q}$, $\lim_{n\to\infty} L_\mu(n) = \infty$, and $\dfrac{1}{2(d+2q)} < a < \dfrac{1}{d+2q+2}$.

Remark 5 (C1) implies that $\lim_{n\to\infty} n h_n^{d+2q+2} = 0$ and $\lim_{n\to\infty} n \tilde h_n^{d+2q} = 0$, whereas (C2) implies that $\lim_{n\to\infty} n h_n^{d+2q+2} = \infty$ and $\lim_{n\to\infty} n \tilde h_n^{d+2q} = \infty$.

We finally need to introduce the following notation:
\[
B_q(\theta) = \begin{pmatrix} \dfrac{(-1)^q}{q!\,(1-aq)} \nabla\left( \displaystyle\sum_{j=1}^d \beta_j^q \dfrac{\partial^q f}{\partial x_j^q}(\theta) \right) \\[2ex] \dfrac{(-1)^q}{q!\,(1-\tilde a q)} \displaystyle\sum_{j=1}^d \beta_j^q \dfrac{\partial^q f}{\partial x_j^q}(\theta) \end{pmatrix}
\quad \text{with} \quad \beta_j^q = \int_{\mathbb{R}^d} y_j^q K(y)\,dy, \ aq \neq 1 \ \text{and} \ \tilde a q \neq 1, \tag{5}
\]
\[
A = \begin{pmatrix} -\left[D^2 f(\theta)\right]^{-1} & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \dfrac{f(\theta)}{1 + a(d+2)}\, G & 0 \\[1ex] 0 & \dfrac{f(\theta)}{1 + \tilde a d} \displaystyle\int_{\mathbb{R}^d} K^2(z)\,dz \end{pmatrix}, \tag{6}
\]
where $G$ is the $d \times d$ matrix defined by
\[
G^{(i,j)} = \int_{\mathbb{R}^d} \frac{\partial K}{\partial x_i}(x)\, \frac{\partial K}{\partial x_j}(x)\,dx,
\]
and, for any $c, \tilde c \geq 0$,
\[
D(c, \tilde c) = \begin{pmatrix} \sqrt{c}\, I_d & 0 \\ 0 & \sqrt{\tilde c} \end{pmatrix},
\]
where $I_d$ is the $d \times d$ identity matrix.

Theorem 1
Let $\theta_n$ and $\tilde\mu_n$ be defined by (2) and (4), respectively, and assume that (A1)-(A5) hold.

i) If (C1) is satisfied, then
\[
\begin{pmatrix} \sqrt{n h_n^{d+2}}\,(\theta_n - \theta) \\ \sqrt{n \tilde h_n^d}\,(\tilde\mu_n - \mu) \end{pmatrix}
\xrightarrow{\mathcal{D}} \mathcal{N}\left(0,\, A \Sigma A\right).
\]

ii) If $a = (d+2q+2)^{-1}$, $\tilde a = (d+2q)^{-1}$, and if there exist $c, \tilde c \geq 0$ such that $\lim_{n\to\infty} n h_n^{d+2q+2} = c$ and $\lim_{n\to\infty} n \tilde h_n^{d+2q} = \tilde c$, then
\[
\begin{pmatrix} \sqrt{n h_n^{d+2}}\,(\theta_n - \theta) \\ \sqrt{n \tilde h_n^d}\,(\tilde\mu_n - \mu) \end{pmatrix}
\xrightarrow{\mathcal{D}} \mathcal{N}\left(D(c, \tilde c)\, A B_q(\theta),\, A \Sigma A\right).
\]

iii) If (C2) is satisfied, then
\[
\begin{pmatrix} h_n^{-q}\,(\theta_n - \theta) \\ \tilde h_n^{-q}\,(\tilde\mu_n - \mu) \end{pmatrix}
\xrightarrow{\mathbb{P}} A B_q(\theta).
\]

Remark 6 The simultaneous weak convergence rate of nonrecursive estimators of the location and size of the mode can be established by following the lines of the proof of Theorem 1. More precisely, set
\[
B_q^*(\theta) = \begin{pmatrix} \dfrac{(-1)^q}{q!} \nabla\left( \displaystyle\sum_{j=1}^d \beta_j^q \dfrac{\partial^q f}{\partial x_j^q}(\theta) \right) \\[2ex] \dfrac{(-1)^q}{q!} \displaystyle\sum_{j=1}^d \beta_j^q \dfrac{\partial^q f}{\partial x_j^q}(\theta) \end{pmatrix},
\qquad
\Sigma^* = \begin{pmatrix} f(\theta)\, G & 0 \\ 0 & f(\theta) \displaystyle\int_{\mathbb{R}^d} K^2(z)\,dz \end{pmatrix},
\]
let $\theta_n^*$ be Parzen's kernel estimator of the location of the mode and $\tilde\mu_n^* = \tilde f_n^*(\theta_n^*)$ be the kernel estimator of the size of the mode defined with the help of $\theta_n^*$ and of Rosenblatt's density estimator $\tilde f_n^*$ (the bandwidth $(\tilde h_n)$ defining $\tilde f_n^*$ being possibly different from the bandwidth $(h_n)$ used to define $\theta_n^*$); Theorem 1 holds when $\theta_n$, $\tilde\mu_n$, $B_q(\theta)$, $\Sigma$ are replaced by $\theta_n^*$, $\tilde\mu_n^*$, $B_q^*(\theta)$, $\Sigma^*$, respectively.

Part 1, and Part 2 in the case $c = \tilde c = 0$ (respectively Part 3), of Theorem 1 correspond to the case when the biases (respectively the variances) of both estimators $\theta_n$ and $\tilde\mu_n$ are negligible with respect to their respective variances (respectively biases). When $c, \tilde c >$
$0$, Part 2 of Theorem 1 corresponds to the case when the bias and the variance of each estimator $\theta_n$ and $\tilde\mu_n$ have the same convergence rate. Other possible conditions lead to different combinations; these have been omitted for the sake of simplicity.

Theorem 1 gives the joint weak convergence rate of $\theta_n$ and $\tilde\mu_n$. Of course, it is also possible to estimate the location and the size of the mode separately. Concerning the estimation of the location of the mode, let us point out that the advantage of the semirecursive estimator $\theta_n$ over its nonrecursive version $\theta_n^*$ is that its asymptotic variance $[1 + a(d+2)]^{-1} f(\theta)\, G$ is smaller than that of Parzen's estimator, which equals $f(\theta)\, G$ (see, e.g., Romano [20] for the case $d = 1$ and Mokkadem and Pelletier [13] for the case $d \geq 1$). Concerning the estimation of the size of the mode, recall that $\tilde\mu_n$ is constructed with the help of the estimator $\theta_n$. To get a good estimation of the size of the mode, it seems obvious that $\theta_n$ should be computed with a bandwidth $(h_n)$ leading to its optimal convergence rate (or, at least, to a convergence rate close to the optimal one). The main information given by Theorem 1 is that, for $\tilde\mu_n$ to converge at the optimal rate, the use of a second bandwidth $(\tilde h_n)$ is then necessary.

Let us point out that, in the case when $\theta_n$ and $\tilde\mu_n$ satisfy a central limit theorem (Parts 1 and 2 of Theorem 1), these estimators are asymptotically independent, although, in its definition, the estimator of the size of the mode is heavily connected to the one of the location of the mode. As pointed out by a referee, this property was expected. As a matter of fact (and as mentioned in the introduction), the location of the mode is a parameter which gives information on the shape of the density derivative, whereas the size of the mode gives information on the shape of the density itself.
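To make the variance comparison concrete, here is a small numerical illustration of our own (not from the paper): the reduction factor $[1 + a(d+2)]^{-1}$ of the semirecursive location estimator relative to Parzen's, evaluated at the exponent $a = (d+2q+2)^{-1}$ of Part ii) of Theorem 1, for a kernel of order $q = 2$:

```python
# Variance-reduction factor [1 + a(d+2)]^{-1} of the semirecursive location
# estimator with respect to Parzen's nonrecursive one (asymptotic variance
# f(theta) G), at the exponent a = 1/(d+2q+2) of Theorem 1 ii), for q = 2.
from fractions import Fraction

q = 2
factors = {}
for d in (1, 2, 3):
    a = Fraction(1, d + 2 * q + 2)
    factors[d] = 1 / (1 + a * (d + 2))   # exact rational arithmetic

print(factors)  # e.g. d = 1 gives 7/10
```

For $d = 1$ the semirecursive asymptotic variance is thus 70% of the nonrecursive one, and the factor decreases further with the dimension.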
This observation should be related to the fact that the weak (and strong) convergence rate of $\theta_n$ is given by that of the gradient of $f_n$, whereas the weak (and strong) convergence rate of $\tilde\mu_n$ is given by that of $\tilde f_n$ itself; since the variance of density estimators converges to zero faster than that of estimators of the density derivatives, the asymptotic independence of $\theta_n$ and $\tilde\mu_n$ is completely explained.

Let us finally say a word on our assumptions on the bandwidths. In the framework of nonrecursive estimation, there is no need to assume that $(h_n)$ and $(\tilde h_n)$ are regularly varying sequences. In the case of semirecursive estimation, this assumption can obviously not be omitted, since the exponents $a$ and $\tilde a$ appear in the expressions of the asymptotic bias $B_q(\theta)$ and variance $\Sigma$. This might be seen as a slight inconvenience of semirecursive estimation; however, as highlighted in the following section, it turns out to be an advantage, since the asymptotic variances of the semirecursive estimators are smaller than those of the nonrecursive estimators.

The application of Theorem 1 (and of Remark 6) allows the construction of confidence regions (simultaneous or not) for the location and the size of the mode, as well as confidence ellipsoids for the couple $(\theta, \mu)$. Hall [9] shows that, in order to construct confidence regions, avoiding bias estimation by a slight undersmoothing is more efficient than explicit bias correction. In the framework of undersmoothing, the asymptotic bias of the estimator is negligible with respect to its asymptotic variance; from the point of view of estimation by confidence regions, the parameter to minimize is thus the asymptotic variance. Now, note that
\[
\Sigma = \begin{pmatrix} [1 + a(d+2)]^{-1} I_d & 0 \\ 0 & [1 + \tilde a d]^{-1} \end{pmatrix} \Sigma^*,
\]
where $A \Sigma A$ (respectively $A \Sigma^* A$) is the asymptotic covariance matrix of the semirecursive estimators $(\theta_n, \tilde\mu_n)$ (respectively of the nonrecursive estimators $(\theta_n^*, \tilde\mu_n^*)$). In order to construct confidence regions for the location and/or the size of the mode, it is thus much preferable to use semirecursive estimators rather than nonrecursive estimators. Simulation studies confirm this theoretical conclusion, whatever the parameter ($\theta$, $\mu$, or $(\theta, \mu)$) for which confidence regions are constructed. For the sake of succinctness, we do not give all these simulation results here, but focus on the construction of confidence ellipsoids for $(\theta, \mu)$; the aim of this example is of course to highlight the advantage of using semirecursive estimators rather than nonrecursive estimators, but also to show how this confidence region gives information on the shape of the density and, consequently, makes it possible to measure the relevance of the location of the mode as a parameter.

To construct confidence regions for $(\theta, \mu)$, we consider the case $d = 1$. The following corollary is a straightforward consequence of Theorem 1.

Corollary 1
Let $\theta_n$ and $\tilde\mu_n$ be defined by (2) and (4), respectively, and assume that (A1)-(A5) hold. Moreover, let $(h_n)$ and $(\tilde h_n)$ either satisfy (C1) or be such that $\lim_{n\to\infty} n h_n^{2q+3} = 0$ and $\lim_{n\to\infty} n \tilde h_n^{2q+1} = 0$ with $a = (2q+3)^{-1}$ and $\tilde a = (2q+1)^{-1}$. We then have
\[
\frac{(1+3a)\, n h_n^3\, [f''(\theta)]^2}{f(\theta) \int_{\mathbb{R}} K'^2(x)\,dx}\, (\theta_n - \theta)^2 + \frac{(1+\tilde a)\, n \tilde h_n}{f(\theta) \int_{\mathbb{R}} K^2(x)\,dx}\, (\tilde\mu_n - \mu)^2 \xrightarrow{\mathcal{D}} \chi^2(2). \tag{7}
\]
Moreover, (7) still holds when the parameters $f(\theta)$ and $f''(\theta)$ are replaced by consistent estimators.

Remark 7
In view of Remark 6, in the case when the nonrecursive estimators $\theta_n^*$ and $\tilde\mu_n^*$ are used, (7) becomes
\[
\frac{n h_n^3\, [f''(\theta)]^2}{f(\theta) \int_{\mathbb{R}} K'^2(x)\,dx}\, (\theta_n^* - \theta)^2 + \frac{n \tilde h_n}{f(\theta) \int_{\mathbb{R}} K^2(x)\,dx}\, (\tilde\mu_n^* - \mu)^2 \xrightarrow{\mathcal{D}} \chi^2(2) \tag{8}
\]
(and, again, this convergence still holds when the parameters $f(\theta)$ and $f''(\theta)$ are replaced by consistent estimators).

Let $\check f_n''$ (respectively $\check f_n^{*\prime\prime}$) be the recursive estimator (respectively the nonrecursive Rosenblatt estimator) of $f''$ computed with the help of a bandwidth $(\check h_n)$, and set
\[
P_n = \frac{(1+3a)\, n h_n^3\, [\check f_n''(\theta_n)]^2}{\tilde f_n(\theta_n) \int_{\mathbb{R}} K'^2(x)\,dx}, \qquad
Q_n = \frac{(1+\tilde a)\, n \tilde h_n}{\tilde f_n(\theta_n) \int_{\mathbb{R}} K^2(x)\,dx},
\]
\[
P_n^* = \frac{n h_n^3\, [\check f_n^{*\prime\prime}(\theta_n^*)]^2}{\tilde f_n^*(\theta_n^*) \int_{\mathbb{R}} K'^2(x)\,dx}, \qquad
Q_n^* = \frac{n \tilde h_n}{\tilde f_n^*(\theta_n^*) \int_{\mathbb{R}} K^2(x)\,dx}.
\]
Moreover, let $c_\alpha$ be such that $P(Z \leq c_\alpha) = 1 - \alpha$, where $Z$ is $\chi^2(2)$-distributed; in view of Corollary 1 and Remark 7, the sets
\[
\mathcal{E}_\alpha = \left\{ (\theta, \mu) \,/\, P_n (\theta_n - \theta)^2 + Q_n (\tilde\mu_n - \mu)^2 \leq c_\alpha \right\},
\]
\[
\mathcal{E}_\alpha^* = \left\{ (\theta, \mu) \,/\, P_n^* (\theta_n^* - \theta)^2 + Q_n^* (\tilde\mu_n^* - \mu)^2 \leq c_\alpha \right\}
\]
are confidence ellipsoids for $(\theta, \mu)$ with asymptotic coverage level $1 - \alpha$. Let us dwell on the fact that both confidence regions have the same asymptotic level, but the lengths of the axes of the first one (constructed with the help of the semirecursive estimators $\theta_n$ and $\tilde\mu_n$) are smaller than those of the second one (constructed with the help of the nonrecursive estimators $\theta_n^*$ and $\tilde\mu_n^*$).

We now present simulation results. In order to see the relationship between the shape of the confidence ellipsoids and that of the density, the density $f$ we consider is that of the $\mathcal{N}(0, \sigma^2)$ distribution, the parameter $\sigma$ taking the values 0.3, 0.4, 0.5, 0.7, 0.75, 1, 1.5, 2, and 2.5. We take the sample size $n = 100$ and the coverage level $1 - \alpha = 95\%$ (and thus $c_\alpha = 5.99$), and the number of simulations is $N = 5000$. The kernel we use is the standard Gaussian density; the bandwidths are of the form $h_n = n^{-a}\log n$, $\tilde h_n = n^{-\tilde a}\log n$, and $\check h_n = n^{-\check a}$.

Table 1 below gives, for each value of $\sigma$, the empirical values of $\theta_n$, $\theta_n^*$, $\tilde\mu_n$, $\tilde\mu_n^*$ (with respect to the 5000 simulations), and:
- $b$, the empirical length of the $\theta$-axis of the confidence ellipsoid $\mathcal{E}_\alpha$;
- $b^*$, the empirical length of the $\theta$-axis of the confidence ellipsoid $\mathcal{E}_\alpha^*$;
- $a$, the empirical length of the $\mu$-axis of the confidence ellipsoid $\mathcal{E}_\alpha$;
- $a^*$, the empirical length of the $\mu$-axis of the confidence ellipsoid $\mathcal{E}_\alpha^*$;
- $p$, the empirical coverage level of the confidence ellipsoid $\mathcal{E}_\alpha$;
- $p^*$, the empirical coverage level of the confidence ellipsoid $\mathcal{E}_\alpha^*$.

Table 1
| σ              | 0.3    | 0.4   | 0.5   | 0.7   | 0.75   | 1     | 1.5    | 2      | 2.5 |
|----------------|--------|-------|-------|-------|--------|-------|--------|--------|-----|
| $\theta_n$     | −0.002 | 0.004 | 0.001 | 0.003 | 0.002  | —     | —      | −0.009 | —   |
| $\theta_n^*$   | 0.003  | 0.005 | 0.001 | —     | −0.008 | 0.016 | —      | —      | —   |
| $b$            | —      | 1.346 | 1.805 | 2.898 | 3.160  | 5.218 | 10.094 | 17.866 | —   |
| $b^*$          | —      | 1.458 | 1.968 | 3.300 | 3.582  | 5.925 | 12.943 | 21.946 | —   |
| $\tilde\mu_n$  | 1.335  | 0.989 | 0.782 | 0.564 | 0.522  | 0.401 | 0.263  | 0.196  | —   |
| $\tilde\mu_n^*$| 1.312  | 0.979 | 0.783 | 0.562 | 0.512  | 0.388 | 0.269  | 0.193  | —   |
| $a$            | 0.444  | 0.399 | 0.365 | 0.322 | 0.315  | 0.283 | 0.247  | 0.224  | —   |
| $a^*$          | 0.514  | 0.459 | 0.420 | 0.369 | 0.363  | 0.327 | 0.287  | 0.261  | —   |
| $p$            | —      | 97.8% | 98.2% | 98.4% | 97.7%  | 97.8% | 97.5%  | 97.2%  | —   |
| $p^*$          | —      | 98.1% | 98.4% | 98.2% | 96.8%  | 96.6% | 96.9%  | 97.7%  | —   |

The empirical coverage levels of $\mathcal{E}_\alpha^*$ and $\mathcal{E}_\alpha$ are similar, but the empirical areas of the ellipsoids $\mathcal{E}_\alpha$ (constructed with the help of the semirecursive estimators) are always smaller than those of the ellipsoids $\mathcal{E}_\alpha^*$ (constructed with the help of the nonrecursive estimators).

Let us now discuss the interest of the estimation of the size of the mode and of the joint estimation of the location and size of the mode. Both estimations give information on the shape of the probability density and, consequently, make it possible to measure the relevance of the location of the mode as a parameter. Of course, the parameter $\theta$ is significant only in the case when the height of the peak is large enough; since we consider here the example of the $\mathcal{N}(0, \sigma^2)$ distribution, this corresponds to the case when $\sigma$ is small enough. Estimating only the size of the mode gives a first idea of the shape of the density around the location of the mode (for instance, when the size is estimated around 0.
16, it is clear that the density is very flat). Now, the shape of the confidence ellipsoids makes it possible to get a more precise idea. As a matter of fact, for small values of $\sigma$, the length of the $\mu$-axis is larger than that of the $\theta$-axis; as $\sigma$ increases, the length of the $\mu$-axis decreases and that of the $\theta$-axis increases (for $\sigma = 2.5$, the length of the $\theta$-axis is larger than 20 times that of the $\mu$-axis). Let us underline that these variations of the lengths of the axes are not due to bad estimation results; Table 2 below gives the values of the lengths $b$ (respectively $b^*$) of the $\theta$-axis and $a$ (respectively $a^*$) of the $\mu$-axis of the ellipsoids computed with the semirecursive estimators $\theta_n$ and $\tilde\mu_n$ (respectively with the nonrecursive estimators $\theta_n^*$ and $\tilde\mu_n^*$) in the case when the true values of the parameters $f(\theta)$ and $f''(\theta)$ are used (that is, by straightforwardly applying (7) and (8)).

Table 2
| σ      | 0.3   | 0.4   | 0.5   | 0.7   | 0.75  | 1     | 1.5    | 2      | 2.5 |
|--------|-------|-------|-------|-------|-------|-------|--------|--------|-----|
| $b$    | 0.159 | 0.327 | 0.571 | 1.357 | 1.572 | 3.227 | 8.895  | 18.260 | —   |
| $b^*$  | 0.190 | 0.390 | 0.682 | 1.622 | 1.879 | 3.858 | 10.631 | 21.825 | —   |
| $\mu$  | 1.333 | 0.998 | 0.798 | 0.570 | 0.532 | 0.399 | 0.266  | 0.199  | —   |
| $a$    | 0.465 | 0.403 | 0.360 | 0.303 | 0.294 | 0.255 | 0.208  | 0.180  | —   |
| $a^*$  | 0.509 | 0.441 | 0.395 | 0.332 | 0.322 | 0.279 | 0.228  | 0.197  | —   |

To establish the joint strong convergence rate of $\theta_n$ and $\tilde\mu_n$, we need the following additional assumption.

(A6) i) $h$ and $\tilde h$ are differentiable, and their derivatives vary regularly with exponent $(-a-1)$ and $(-\tilde a - 1)$, respectively.
ii) There exists $n_0 \in \mathbb{N}$ such that, for all $n \geq m \geq n_0$,
\[
\max\left\{ \frac{m h_m^{-(d+2)}}{n h_n^{-(d+2)}};\ \frac{m \tilde h_m^{-d}}{n \tilde h_n^{-d}} \right\}
= \frac{\min\left\{ m h_m^{-(d+2)};\ m \tilde h_m^{-d} \right\}}{\min\left\{ n h_n^{-(d+2)};\ n \tilde h_n^{-d} \right\}}.
\]

Remark 8
Assumption (A6)ii) holds when $a = \tilde a$; in the case $a \neq \tilde a$, it is satisfied when $L_\theta(n) = (L_\mu(n))^{\frac{d}{d+2}}$ for $n$ large enough.

Moreover, condition (C2) is replaced by the following one.

(C'2) Either (C2)i) is fulfilled, or $\tilde a = \dfrac{1}{d+2q}$, $\lim_{n\to\infty} \dfrac{(L_\mu(n))^{d+2q}}{\log\log n} = \infty$, and $\dfrac{1}{2(d+2q)} < a < \dfrac{1}{d+2q+2}$.

Before stating the almost sure convergence rate of $(\theta_n^T, \tilde\mu_n)^T$, let us remark that Proposition 2.3 in Mokkadem and Pelletier [13] ensures that the matrix $G$ (and thus the matrix $\Sigma$) is nonsingular.

Theorem 2
Let $\theta_n$ and $\tilde\mu_n$ be defined by (2) and (4), respectively, and assume that (A1)-(A6) hold.

i) If (C1) is fulfilled, then, with probability one, the sequence
\[
\frac{1}{\sqrt{2\log\log n}} \begin{pmatrix} \sqrt{n h_n^{d+2}}\,(\theta_n - \theta) \\ \sqrt{n \tilde h_n^d}\,(\tilde\mu_n - \mu) \end{pmatrix}
\]
is relatively compact and its limit set is the ellipsoid
\[
\mathcal{E} = \left\{ \nu \in \mathbb{R}^{d+1} \ \text{such that} \ \nu^T A^{-1} \Sigma^{-1} A^{-1} \nu \leq 1 \right\}.
\]

ii) If $a = (d+2q+2)^{-1}$, $\tilde a = (d+2q)^{-1}$, and if there exist $c, \tilde c \geq 0$ such that $\lim_{n\to\infty} n h_n^{d+2q+2} / (2\log\log n) = c$ and $\lim_{n\to\infty} n \tilde h_n^{d+2q} / (2\log\log n) = \tilde c$, then, with probability one, the sequence
\[
\frac{1}{\sqrt{2\log\log n}} \begin{pmatrix} \sqrt{n h_n^{d+2}}\,(\theta_n - \theta) \\ \sqrt{n \tilde h_n^d}\,(\tilde\mu_n - \mu) \end{pmatrix}
\]
is relatively compact and its limit set is the ellipsoid
\[
\mathcal{E}' = \left\{ \nu \in \mathbb{R}^{d+1} \ \text{such that} \ \left( A^{-1}\nu - D(c, \tilde c) B_q(\theta) \right)^T \Sigma^{-1} \left( A^{-1}\nu - D(c, \tilde c) B_q(\theta) \right) \leq 1 \right\}.
\]

iii) If (C'2) is satisfied, then
\[
\begin{pmatrix} h_n^{-q}\,(\theta_n - \theta) \\ \tilde h_n^{-q}\,(\tilde\mu_n - \mu) \end{pmatrix} \xrightarrow{\text{a.s.}} A B_q(\theta).
\]

Remark 9 (C'1) implies that $\lim_{n\to\infty} n h_n^{d+2q+2} / \log\log n = 0$ and $\lim_{n\to\infty} n \tilde h_n^{d+2q} / \log\log n = 0$, whereas (C'2) implies that $\lim_{n\to\infty} n h_n^{d+2q+2} / \log\log n = \infty$ and $\lim_{n\to\infty} n \tilde h_n^{d+2q} / \log\log n = \infty$.

Laws of the iterated logarithm for Parzen's nonrecursive kernel mode estimator were established by Mokkadem and Pelletier [13]. The techniques of proof used in the framework of nonrecursive estimators are totally different from those employed to prove Theorem 2. This is due to the following fundamental difference between the nonrecursive estimator $\theta_n^*$ and the semirecursive estimator $\theta_n$: the study of the asymptotic behaviour of $\theta_n^*$ comes down to that of a triangular sum of independent variables, whereas the study of the asymptotic behaviour of $\theta_n$ reduces to that of a sum of independent variables. Of course, this difference is not very important for the study of the weak convergence rate. But, for the study of the strong convergence rate, it makes the case of semirecursive estimation much easier than the case of nonrecursive estimation. In particular, in contrast with the weak convergence rate, the joint strong convergence rate of the nonrecursive estimators $\theta_n^*$ and $\tilde\mu_n^*$ cannot be obtained by following the lines of the proof of Theorem 2, and remains an open question.

Proofs
Let us first note that an important consequence of (A3) which will be used throughout the proofsis that if βa < , then lim n →∞ nh βn n X i =1 h βi = 11 − aβ . (9)Moreover, for all ε > n n X i =1 h qi = O (cid:18) h q − εn + 1 n (cid:19) . (10)As a matter of fact: (i) if aq <
1, (10) follows easily from (9); (ii) if aq >
1, since P i h qi is summable,(10) holds; (iii) if aq = 1, since a ( q − ε ) <
1, using (9) again, we have n − P ni =1 h qi = O ( h q − εn ), andthus (10) follows. Of course (9) and (10) also hold when ( h n ) and a are replaced by (˜ h n ) and ˜ a ,respectively.Our proofs are now organized as follows. Section 3.1 is devoted to the proof of the strongconsistency of θ n and ˜ µ n . In Section 3.2, we give the convergence rate of the derivatives of f n .In Section 3.3, we show how the study of the joint weak and strong convergence rate of θ n and˜ µ n can be related to the one of ∇ f n ( θ ) and ˜ f n ( θ ). In Section 3.4 (respectively in Section 3.5), weestablish the joint weak convergence rate (respectively the joint strong convergence rate) of ∇ f n ( θ )and ˜ f n ( θ ). Finally, Section 3.6 is devoted to the proof of Theorems 1 and 2. Since θ n is the mode of f n and θ the mode of f , we have:0 ≤ f ( θ ) − f ( θ n ) = [ f ( θ ) − f n ( θ n )] + [ f n ( θ n ) − f ( θ n )] ≤ [ f ( θ ) − f n ( θ )] + [ f n ( θ n ) − f ( θ n )] ≤ (cid:12)(cid:12) f ( θ ) − f n ( θ ) (cid:12)(cid:12) + (cid:12)(cid:12) f n ( θ n ) − f ( θ n ) (cid:12)(cid:12) ≤ k f n − f k ∞ . (11)The application of Theorem 5 in Mokkadem et al. [15] with | α | = 0 and v n = log n ensures that forany δ >
0, there exists c ( δ ) > P [(log n ) k f n − E ( f n ) k ∞ ≥ δ ] ≤ exp( − c ( δ ) P ni =1 h di / (log n ) ).In view of (9), since ad <
1, we can write n exp (cid:18) − c ( δ ) P ni =1 h di (log n ) (cid:19) = n exp (cid:18) − c ( δ ) nh dn (log n ) P ni =1 h di nh dn (cid:19) = o (1) . Borell-Cantelli’s Lemma ensures that lim n →∞ k f n − E ( f n ) k ∞ = 0 a.s. Since lim n →∞ k E ( f n ) − f k ∞ =0, it follows from (11) that lim n →∞ f ( θ n ) = f ( θ ) a.s. Since f is continuous, lim k z k→∞ f ( z ) = 0 and θ is the unique mode of f , we deduce that lim n →∞ θ n = θ a.s. Now, we have | ˜ µ n − µ | ≤ | ˜ f n ( θ n ) − f ( θ n ) | + | f ( θ n ) − f ( θ ) | ≤ k ˜ f n − f k ∞ + 2 k f n − f k ∞ , where the last inequality follows from (11). As previously, one can show that lim n →∞ k ˜ f n − f k ∞ = 0and thus lim n →∞ ˜ µ n = µ a.s. For any d -uplet [ α ] = (cid:16) α , . . . , α d (cid:17) ∈ N d , we set | α | = α + · · · + α d and, for any function g , let ∂ [ α ] g ( x ) = ∂ | α | g/ ( ∂x α . . . ∂x α d d )( x ) denote the [ α ]-th partial derivative of g .12 emma 1 Assume (A3)-(A5) hold. Let ( g n ) and ( b n ) be defined as follows: (cid:26) g n = f n and b n = h n or g n = ˜ f n and b n = ˜ h n . (12) For | α | ∈ { , , } , we have lim n →∞ n P ni =1 b qi (cid:20) E (cid:2) ∂ [ α ] g n ( x ) (cid:3) − ∂ [ α ] f ( x ) (cid:21) = ( − q q ! ∂ [ α ] d X j =1 β qj ∂ q f∂x qj ( x ) where β qj is defined in (5). Moreover, if we set M q = sup x ∈ R d k D q ∂ [ α ] f ( x ) k , then lim n →∞ n P ni =1 b qi sup x ∈ R d (cid:12)(cid:12)(cid:12) E (cid:16) ∂ [ α ] g n ( x ) (cid:17) − ∂ [ α ] f ( x ) (cid:12)(cid:12)(cid:12) ≤ M q q ! Z R d k z k q | K ( z ) | dz. Lemma 2
Let U be a compact subset of R^d and assume that (A1)iii), (A3), (A4) and (A5)ii) hold. Let (g_n) and (b_n) be defined as in (12). Then, for all γ > 1 and |α| = 1, 2, we have

sup_{x∈U} |∂^[α] g_n(x) − E(∂^[α] g_n(x))| = O(√[(log n)^γ / Σ_{i=1}^n b_i^{d+2|α|}]) a.s.

Lemma 1 is proved in Mokkadem et al. [15]. We now prove Lemma 2. Set v_n = ([Σ_{i=1}^n b_i^{d+2|α|}] / [(log n)^γ])^{1/2}. Applying Proposition 3 in Mokkadem et al. [15], it holds that for any δ >
0, there exists c(δ) > 0 such that

P[sup_{x∈U} v_n |∂^[α] g_n(x) − E(∂^[α] g_n(x))| ≥ δ] ≤ exp(−c(δ) Σ_{i=1}^n b_i^{d+2|α|} / v_n^2).

Since lim_{n→∞} Σ_{i=1}^n b_i^{d+2|α|} / (v_n^2 log n) = ∞, we have, for n large enough, c(δ) Σ_{i=1}^n b_i^{d+2|α|} / v_n^2 ≥ 2 log n, and Lemma 2 follows from the application of Borel-Cantelli's Lemma.

3.3 Relation between ((θ_n − θ)^T, (μ̃_n − μ))^T and ([∇f_n(θ)]^T, f̃_n(θ) − f(θ))^T

By definition of θ_n, we have ∇f_n(θ_n) = 0, so that

∇f_n(θ_n) − ∇f_n(θ) = −∇f_n(θ). (13)

For each i ∈ {1, ..., d}, a Taylor expansion applied to the real-valued function ∂f_n/∂x_i implies the existence of ε_n(i) = (ε_n^{(1)}(i), ..., ε_n^{(d)}(i))^T such that

(∂f_n/∂x_i)(θ_n) − (∂f_n/∂x_i)(θ) = Σ_{j=1}^d (∂²f_n / ∂x_i ∂x_j)(ε_n(i)) (θ_n^{(j)} − θ^{(j)}),
|ε_n^{(j)}(i) − θ^{(j)}| ≤ |θ_n^{(j)} − θ^{(j)}| for all j ∈ {1, ..., d}.

Define the d×d matrix H_n = (H_n^{(i,j)})_{1≤i,j≤d} by setting H_n^{(i,j)} = (∂²f_n / ∂x_i ∂x_j)(ε_n(i)); Equation (13) can then be rewritten as H_n(θ_n − θ) = −∇f_n(θ). Now, set

R_n = f̃_n(θ_n) − f̃_n(θ). (14)

We can then write (with vectors stacked in columns):

( [D²f(θ)]^{−1} H_n (θ_n − θ) ; μ̃_n − μ ) = ( −[D²f(θ)]^{−1} ∇f_n(θ) ; f̃_n(θ) − f(θ) ) + ( 0 ; R_n ). (15)

Let U be a compact subset of R^d containing θ. The combination of Lemmas 1 and 2 with |α| = 2, g_n = f_n and b_n = h_n ensures that for any γ > 1 and ε > 0,

sup_{x∈U} |∂^[α] f_n(x) − ∂^[α] f(x)| = O(√[(log n)^γ / Σ_{i=1}^n h_i^{d+4}] + n^{-1} Σ_{i=1}^n h_i^q) a.s.
= O(√[(log n)^γ / (n h_n^{d+4})] + h_n^{q−ε} + n^{-1}) = o(1) a.s. (16)

Since D²f is continuous in a neighbourhood of θ and since lim_{n→∞} θ_n = θ a.s., (16) ensures that lim_{n→∞} H_n = D²f(θ) a.s. It follows that the weak and a.s.
behaviours of ((θ_n − θ)^T, (μ̃_n − μ))^T are given by those of the right-hand side of (15).

3.4 Weak convergence rate of ([∇f_n(θ)]^T, f̃_n(θ) − f(θ))^T

Let us first assume that the following lemma holds.
Lemma 3
Let Assumptions (A1)i), (A1)iv), (A3), (A4)i) and (A4)ii) hold. Then

W_n = ( √(n h_n^{d+2}) [∇f_n(θ) − E(∇f_n(θ))] ; √(n h̃_n^d) [f̃_n(θ) − E(f̃_n(θ))] ) →_D N(0, Σ).

The application of Lemma 1 gives

lim_{n→∞} ( [n / Σ_{i=1}^n h_i^q] E(∇f_n(θ)) ; [n / Σ_{i=1}^n h̃_i^q] [E(f̃_n(θ)) − f(θ)] ) = ( [(−1)^q / q!] ∇(Σ_{j=1}^d β_j^q (∂^q f / ∂x_j^q)(θ)) ; [(−1)^q / q!] Σ_{j=1}^d β_j^q (∂^q f / ∂x_j^q)(θ) ). (17)

1) If aq < 1 and ãq <
1, by using (9), it is straightforward to see that

lim_{n→∞} ( h_n^{−q} E(∇f_n(θ)) ; h̃_n^{−q} [E(f̃_n(θ)) − f(θ)] ) = B_q(θ). (18)

2) Let us now consider the case aq ≥ 1 or ãq ≥
1. We have

√(n h_n^{d+2}) E(∇f_n(θ)) = [√(n h_n^{d+2}) Σ_{i=1}^n h_i^q / n] [n / Σ_{i=1}^n h_i^q] E(∇f_n(θ)),

with, in view of (10), for all ε > 0,

√(n h_n^{d+2}) Σ_{i=1}^n h_i^q / n = O(n^{[1−(a−ε)(d+2)]/2} n^{−aq+aε}) = o(1).

Applying (17), it follows that lim_{n→∞} √(n h_n^{d+2}) E(∇f_n(θ)) = 0. Proceeding in the same way for E(f̃_n(θ)), we obtain

lim_{n→∞} ( √(n h_n^{d+2}) E(∇f_n(θ)) ; √(n h̃_n^d) [E(f̃_n(θ)) − f(θ)] ) = 0. (19)

The combination of either (18) or (19) and of Lemma 3 gives the weak convergence rate of ([∇f_n(θ)]^T, f̃_n(θ) − f(θ))^T:

• If (C1) holds, then

( √(n h_n^{d+2}) ∇f_n(θ) ; √(n h̃_n^d) (f̃_n(θ) − f(θ)) ) →_D N(0, Σ). (20)

• If a = (d + 2q + 2)^{−1}, ã = (d + 2q)^{−1}, and if there exist c, c̃ ≥ 0 such that lim_{n→∞} n h_n^{d+2q+2} = c and lim_{n→∞} n h̃_n^{d+2q} = c̃, then

( √(n h_n^{d+2}) ∇f_n(θ) ; √(n h̃_n^d) (f̃_n(θ) − f(θ)) ) →_D N(D(c, c̃) B_q(θ), Σ). (21)

• If (C2) holds, since aq < 1 and ãq <
1, (9) implies that

( h_n^{−q} ∇f_n(θ) ; h̃_n^{−q} (f̃_n(θ) − f(θ)) ) →_P B_q(θ). (22)

Proof of Lemma 3
To prove Lemma 3, we first prove that

lim_{n→∞} E(W_n W_n^T) = Σ, (23)

and then check that (W_n) satisfies Lyapounov's condition. Set

Y_{k,n} = [n h_n^{−d−2}]^{−1/2} h_k^{−d−1} [∇K((θ − X_k)/h_k) − E(∇K((θ − X_k)/h_k))],
Z_{k,n} = [n h̃_n^{−d}]^{−1/2} h̃_k^{−d} [K((θ − X_k)/h̃_k) − E(K((θ − X_k)/h̃_k))],

and note that E(W_n W_n^T) is the sum over k = 1, ..., n of the block matrices with diagonal blocks E(Y_{k,n} Y_{k,n}^T) and E(Z_{k,n}^2) and off-diagonal blocks E(Y_{k,n} Z_{k,n}) and E(Y_{k,n}^T Z_{k,n}). Now, for any s, t ∈ {1, ..., d}, we have

E[(∂K/∂x_s)((θ − X_k)/h_k) (∂K/∂x_t)((θ − X_k)/h_k)] = ∫_{R^d} (∂K/∂x_s)((θ − y)/h_k) (∂K/∂x_t)((θ − y)/h_k) f(y) dy = h_k^d f(θ) G_{s,t} + o(h_k^d),

and, since E[(∂K/∂x_s)((θ − X_k)/h_k)] = O(h_k^{d+1}), we deduce that

E[(∇K((θ − X_k)/h_k) − E(∇K((θ − X_k)/h_k))) (∇K((θ − X_k)/h_k) − E(∇K((θ − X_k)/h_k)))^T] = f(θ) G h_k^d [1 + o(1)], (24)

which implies that lim_{n→∞} Σ_{k=1}^n E(Y_{k,n} Y_{k,n}^T) = f(θ) [1 + a(d + 2)]^{−1} G. In the same way, we have

E[(K((θ − X_k)/h̃_k) − E(K((θ − X_k)/h̃_k)))^2] = h̃_k^d f(θ) ∫_{R^d} K^2(z) dz [1 + o(1)], (25)

and thus lim_{n→∞} Σ_{k=1}^n E(Z_{k,n}^2) = f(θ) [1 + ãd]^{−1} ∫_{R^d} K^2(z) dz. Moreover, set h*_k = min(h_k, h̃_k); we have

E[∇K((θ − X_k)/h_k) K((θ − X_k)/h̃_k)] = (h*_k)^d ∫_{R^d} ∇K((h*_k/h_k) z) K((h*_k/h̃_k) z) f(θ − h*_k z) dz.
Noting that f(θ − h*_k z) = f(θ) + h*_k R_k(θ, z) with |R_k(θ, z)| ≤ ‖∇f‖_∞ ‖z‖, we get

E[∇K((θ − X_k)/h_k) K((θ − X_k)/h̃_k)] = (h*_k)^d [f(θ) ∫_{R^d} ∇K((h*_k/h_k) z) K((h*_k/h̃_k) z) dz + h*_k ∫_{R^d} ∇K((h*_k/h_k) z) K((h*_k/h̃_k) z) R_k(θ, z) dz].

Since the function z ↦ [∇K((h*_k/h_k) z)] K((h*_k/h̃_k) z) is odd (in each coordinate), the first integral on the right-hand side is zero and, since h*_k equals either h_k or h̃_k, we get

|E[∇K((θ − X_k)/h_k) K((θ − X_k)/h̃_k)]| ≤ (h*_k)^{d+1} ‖∇f‖_∞ [‖K‖_∞ ∫_{R^d} ‖z‖ ‖∇K(z)‖ dz + ‖∇K‖_∞ ∫_{R^d} ‖z‖ |K(z)| dz] = O((h*_k)^{d+1}).

We then deduce that

E[(∇K((θ − X_k)/h_k) − E(∇K((θ − X_k)/h_k))) (K((θ − X_k)/h̃_k) − E(K((θ − X_k)/h̃_k)))] = O([min(h_k, h̃_k)]^{d+1}) + O(h_k^{d+1} h̃_k^d) = O(h_k^{(d+1)/2} h̃_k^{(d+1)/2}), (26)

and thus, in view of (9),

Σ_{k=1}^n E(Y_{k,n} Z_{k,n}) = O([(n h_n^{−d−2})(n h̃_n^{−d})]^{−1/2} Σ_{k=1}^n h_k^{−(d+1)/2} h̃_k^{−(d−1)/2}) = o(1),

which concludes the proof of (23). We now check that (W_n) satisfies Lyapounov's condition. Set p >
2. Since K and ∇K are bounded and integrable, we have ∫_{R^d} ‖∇K(z)‖^p dz < ∞ and ∫_{R^d} |K(z)|^p dz < ∞. It follows that

Σ_{k=1}^n E(‖Y_{k,n}‖^p) = O([n h_n^{−d−2}]^{−p/2} Σ_{k=1}^n h_k^{(−d−1)p} ∫_{R^d} ‖∇K((θ − y)/h_k)‖^p f(y) dy) = O([n h_n^{−d−2}]^{−p/2} Σ_{k=1}^n h_k^{(−d−1)p} h_k^d) = o(1),

Σ_{k=1}^n E(|Z_{k,n}|^p) = O([n h̃_n^{−d}]^{−p/2} Σ_{k=1}^n h̃_k^{−dp} ∫_{R^d} |K((θ − y)/h̃_k)|^p f(y) dy) = O([n h̃_n^{−d}]^{−p/2} Σ_{k=1}^n h̃_k^{−dp} h̃_k^d) = o(1),

so that (W_n) satisfies Lyapounov's condition, which concludes the proof of Lemma 3.

3.5 Strong convergence rate of ([∇f_n(θ)]^T, f̃_n(θ) − f(θ))^T

Let us first assume that the following lemma holds.
Lemma 4
Let Assumptions (A1)i), (A1)iv), (A3), (A4)i), (A4)ii) and (A6) hold. With probability one, the sequence

[1/√(2 log log n)] ( √(n h_n^{d+2}) [∇f_n(θ) − E(∇f_n(θ))] ; √(n h̃_n^d) [f̃_n(θ) − E(f̃_n(θ))] )

is relatively compact and its limit set is E = {ν ∈ R^{d+1} such that ν^T Σ^{−1} ν ≤ 1}.

The combination of either (18) or (19) and of Lemma 4 gives the almost sure convergence rate of ([∇f_n(θ)]^T, f̃_n(θ) − f(θ))^T:

• If (C1) holds, then, with probability one, the sequence

[1/√(2 log log n)] ( √(n h_n^{d+2}) ∇f_n(θ) ; √(n h̃_n^d) [f̃_n(θ) − f(θ)] ) (27)

is relatively compact and its limit set is E = {ν ∈ R^{d+1} such that ν^T Σ^{−1} ν ≤ 1}.

• If a = (d + 2q + 2)^{−1}, ã = (d + 2q)^{−1}, and if there exist c, c̃ ≥ 0 such that lim_{n→∞} n h_n^{d+2q+2} / (2 log log n) = c and lim_{n→∞} n h̃_n^{d+2q} / (2 log log n) = c̃, then, with probability one, the sequence

[1/√(2 log log n)] ( √(n h_n^{d+2}) ∇f_n(θ) ; √(n h̃_n^d) [f̃_n(θ) − f(θ)] ) (28)

is relatively compact and its limit set is {ν ∈ R^{d+1} such that (ν − D(c, c̃) B_q(θ))^T Σ^{−1} (ν − D(c, c̃) B_q(θ)) ≤ 1}.

• If (C'2) holds, then

( h_n^{−q} ∇f_n(θ) ; h̃_n^{−q} [f̃_n(θ) − f(θ)] ) → B_q(θ) a.s. (29)

We now prove Lemma 4. Set

Γ = f(θ) diag(G, ∫_{R^d} K^2(z) dz), Δ_n = diag([n h_n^{−d−2}]^{−1/2} I_d, [n h̃_n^{−d}]^{−1/2}), Q_n = diag(h_n^{−(d+2)/2} I_d, h̃_n^{−d/2}),

let (ε_n) be a sequence of R^{d+1}-valued, independent and N(0, Γ)-distributed random vectors, and set S_n = Σ_{k=1}^n Q_k ε_k. In order to prove Lemma 4, we first establish the following Lemma 5 in Section 3.5.1, and then show in Section 3.5.2 how Lemma 4 can be deduced from Lemma 5.

Lemma 5
Let Assumptions (A1)i), (A1)iv), (A3), (A4)i), (A4)ii) and (A6)ii) hold. With probability one, the sequence (T_n) ≡ (Σ^{−1/2} Δ_n S_n / √(2 log log n)) is relatively compact and its limit set is the unit ball B_{d+1}(0,
1) = {ν ∈ R^{d+1} such that ‖ν‖ ≤ 1}.

3.5.1 Proof of Lemma 5

Set B_n = E(S_n S_n^T), and let ‖x‖ (respectively |||A|||) denote the Euclidean norm of the vector x (respectively the spectral norm of the matrix A). The application of Theorem 2 in Koval [11] ensures that

lim sup_{n→∞} ‖Σ^{−1/2} Δ_n S_n‖ / √(2 |||Σ^{−1/2} Δ_n B_n Δ_n Σ^{−1/2}||| log log |||B_n|||) ≤ 1 a.s.

Since lim_{n→∞} Δ_n B_n Δ_n = Σ and log log |||B_n||| ∼ log log n, we deduce that lim sup_{n→∞} ‖T_n‖ ≤ 1 a.s.; hence (T_n) is relatively compact and its limit set U is included in B_{d+1}(0, 1). Set S_{d+1} = {w ∈ R^{d+1}, ‖w‖ = 1}, and let us at first assume that

∀ w ∈ S_{d+1}, lim sup_{n→∞} w^T T_n ≥ 1 a.s. (31)

Then, with probability one, for all ε > 0 and n_0 ≥ 1, there exists n ≥ n_0 such that w^T T_n > 1 − ε and ‖T_n‖^2 ≤ 1 + ε. Noting that ‖T_n − w‖^2 = ‖T_n‖^2 + ‖w‖^2 − 2 w^T T_n, it follows that, with probability one, for all ε > 0 and n_0 ≥ 1, there exists n ≥ n_0 such that ‖T_n − w‖^2 ≤ (1 + ε) + 1 − 2(1 − ε) = 3ε. Thus, with probability one, S_{d+1} ⊂ U. To deduce that B_{d+1}(0, 1) ⊂ U, we introduce (e_k), a sequence of real-valued, independent, and N(0, 1)-distributed random variables such that (e_k) is independent of (ε_k). Moreover, we set

Q̃_n = diag(h_n^{−(d+2)/2} I_{d+1}, h̃_n^{−d/2}), S̃_n = Σ_{k=1}^n Q̃_k (e_k ; ε_k), Δ̃_n = diag([n h_n^{−d−2}]^{−1/2} I_{d+1}, [n h̃_n^{−d}]^{−1/2}), and Σ̃ = diag(1, Σ).

We then note that the previous result applied to (T̃_n) ≡ (Σ̃^{−1/2} Δ̃_n S̃_n / √(2 log log n)) ensures that, with probability one, S_{d+2} = {w ∈ R^{d+2}, ‖w‖ = 1} is included in the limit set of (T̃_n). Now let π : R^{d+2} → R^{d+1} be the projection map defined by π((x_1, ..., x_{d+2})^T) = (x_2, ..., x_{d+2})^T. We clearly have π(S_{d+2}) = B_{d+1}(0,
1) and π(T̃_n) = T_n, and thus deduce that, with probability one, B_{d+1}(0,
1) is included in the limit set of (T_n). To conclude the proof of Lemma 5, it remains to prove (31). In fact, we shall prove that,

∀ w ≠ 0, lim sup_{n→∞} w^T Δ_n S_n / √(2 log log n) ≥ √(w^T Σ w) a.s. (32)

Set v_n = min{[n h_n^{−(d+2)}]^{1/2} ; [n h̃_n^{−d}]^{1/2}}, A_n = v_n w^T Δ_n and V_n = E(A_n S_n S_n^T A_n^T); we follow a method used by Petrov [19] in the proof of his Theorems 7.1 and 7.2. Since lim_{n→∞} V_n = ∞, for any τ > 0 there exists a sequence (n_k) such that n_k → ∞ as k → ∞ and V_{n_{k−1}} ≤ (1 + τ)^k ≤ V_{n_k} (k = 1, 2, ...). Since lim_{n→∞} V_{n−1}/V_n = 1, we obtain V_{n_k} ∼ (1 + τ)^k. Moreover, we have

V_{n_k} − V_{n_{k−1}} = V_{n_k} (1 − V_{n_{k−1}}/V_{n_k}) ∼ V_{n_k} τ/(τ + 1). (33)

Set χ(n) = √(2 V_n log log V_n) and ψ(n_k) = √(2 (V_{n_k} − V_{n_{k−1}}) log log(V_{n_k} − V_{n_{k−1}})). It follows from (33) that ψ(n_k) ∼ τ^{1/2} χ(n_{k−1}). Then for any γ ∈ ]0,
1[ and k sufficiently large, we have

P(A_{n_k} S_{n_k} − A_{n_k} S_{n_{k−1}} ≥ (1 − γ) ψ(n_k))
≥ P(A_{n_k} S_{n_k} ≥ (1 − γ/2) ψ(n_k)) − P(A_{n_k} S_{n_{k−1}} ≥ (γ/2) ψ(n_k))
≥ P(A_{n_k} S_{n_k} ≥ (1 − γ/2) χ(n_k)) − P(A_{n_k} S_{n_{k−1}} ≥ (γ√τ/3) χ(n_{k−1})). (34)

Since A_{n_k} S_{n_k} is N(0, V_{n_k})-distributed, we have

P(A_{n_k} S_{n_k} ≥ (1 − γ/2) χ(n_k)) = (1/√(2π)) ∫_{(1−γ/2)√(2 log log V_{n_k})}^∞ exp(−t^2/2) dt ≥ [log V_{n_k}]^{−(1+µ)(1−γ/2)^2} (35)

for every µ > 0 and sufficiently large k. Set Ṽ_{n_k} = v_{n_k}^2 w^T Δ_{n_k} B_{n_{k−1}} Δ_{n_k} w; since A_{n_k} S_{n_{k−1}} is N(0, Ṽ_{n_k})-distributed, we have

P(A_{n_k} S_{n_{k−1}} ≥ (γ√τ/3) χ(n_{k−1})) = (1/√(2π)) ∫_{(γ√τ/3)√(2 (V_{n_{k−1}}/Ṽ_{n_k}) log log V_{n_{k−1}})}^∞ exp(−t^2/2) dt.

Let ρ_min(A) (respectively ρ_max(A)) denote the smallest (respectively the largest) eigenvalue of a matrix A, set Σ_n = Δ_n B_n Δ_n, and note that

V_{n_{k−1}} / Ṽ_{n_k} ≥ [v_{n_{k−1}}^2 ρ_min(Σ_{n_{k−1}})] / [v_{n_k}^2 ρ_max(Δ_{n_k} Δ_{n_{k−1}}^{−1} Σ_{n_{k−1}} Δ_{n_{k−1}}^{−1} Δ_{n_k})] (36)

with

ρ_max(Δ_{n_k} Δ_{n_{k−1}}^{−1} Σ_{n_{k−1}} Δ_{n_{k−1}}^{−1} Δ_{n_k}) ≤ |||Σ_{n_{k−1}} Δ_{n_{k−1}}^{−1} Δ_{n_k} Δ_{n_k} Δ_{n_{k−1}}^{−1}||| ≤ |||Σ_{n_{k−1}}||| |||Δ_{n_{k−1}}^{−1} Δ_{n_k} Δ_{n_k} Δ_{n_{k−1}}^{−1}|||. (37)

It follows from (9) and Assumption (A6)ii) that

|||Δ_{n_{k−1}}^{−1} Δ_{n_k} Δ_{n_k} Δ_{n_{k−1}}^{−1}||| = max{ (n_{k−1} h_{n_{k−1}}^{−(d+2)}) / (n_k h_{n_k}^{−(d+2)}) ; (n_{k−1} h̃_{n_{k−1}}^{−d}) / (n_k h̃_{n_k}^{−d}) } ∼ v_{n_{k−1}}^2 / v_{n_k}^2.
(38)

From (36), (37) and (38), we deduce that, for sufficiently large k,

V_{n_{k−1}} / Ṽ_{n_k} ≥ ρ_min(Σ_{n_{k−1}}) / (2 ρ_max(Σ_{n_{k−1}})) ≥ ρ_min(Σ) / (4 ρ_max(Σ)),

and therefore, for sufficiently large k,

P(A_{n_k} S_{n_{k−1}} ≥ (γ√τ/3) χ(n_{k−1})) ≤ (1/√(2π)) ∫_{(γ√τ/6)√(ρ_min(Σ)/ρ_max(Σ))√(2 log log V_{n_{k−1}})}^∞ exp(−t^2/2) dt ≤ [log V_{n_{k−1}}]^{−γ^2 τ ρ_min(Σ) / (36 ρ_max(Σ))}. (39)

The inequalities (34), (35) and (39) imply that

P(A_{n_k} S_{n_k} − A_{n_k} S_{n_{k−1}} ≥ (1 − γ) ψ(n_k)) ≥ [log V_{n_k}]^{−(1+µ)(1−γ/2)^2} − [log V_{n_{k−1}}]^{−γ^2 τ ρ_min(Σ) / (36 ρ_max(Σ))}.

Thus, for sufficiently large k and τ (chosen so that γ^2 τ ρ_min(Σ) / (36 ρ_max(Σ)) ≥ 2), there exists c > 0, not depending on k, such that

P(A_{n_k} S_{n_k} − A_{n_k} S_{n_{k−1}} ≥ (1 − γ) ψ(n_k)) ≥ c [k^{−(1+µ)(1−γ/2)^2} − k^{−2}].

Choosing µ such that (1 + µ)(1 − γ/2)^2 <
1, we get

P(A_{n_k} S_{n_k} − A_{n_k} S_{n_{k−1}} ≥ (1 − γ) ψ(n_k)) ≥ c k^{−(1+µ)(1−γ/2)^2} / 2

and thus Σ_k P(A_{n_k} S_{n_k} − A_{n_k} S_{n_{k−1}} ≥ (1 − γ) ψ(n_k)) = ∞. Applying Borel-Cantelli's Lemma, we obtain

P(A_{n_k} S_{n_k} − A_{n_k} S_{n_{k−1}} ≥ (1 − γ) ψ(n_k) i.o.) = 1. (40)

Now,

lim sup_{k→∞} |A_{n_k} S_{n_{k−1}}| / χ(n_{k−1})
≤ lim sup_{k→∞} v_{n_k} ‖w‖ |||Δ_{n_k} Δ_{n_{k−1}}^{−1}||| ‖Δ_{n_{k−1}} S_{n_{k−1}}‖ / √(2 v_{n_{k−1}}^2 (w^T Δ_{n_{k−1}} B_{n_{k−1}} Δ_{n_{k−1}} w) log log V_{n_{k−1}})
≤ lim sup_{k→∞} v_{n_k} ‖w‖ |||Δ_{n_k} Δ_{n_{k−1}}^{−1}||| ‖Δ_{n_{k−1}} S_{n_{k−1}}‖ / √(2 v_{n_{k−1}}^2 (w^T Σ w) log log V_{n_{k−1}}).

Applying Theorem 2 in Koval [11] again, and using the fact that lim_{n→∞} Δ_n B_n Δ_n = Σ, we obtain lim sup_{n→∞} ‖Δ_n S_n‖ / √(2 |||Σ||| log log n) ≤ 1 a.s., so that

lim sup_{k→∞} |A_{n_k} S_{n_{k−1}}| / χ(n_{k−1}) ≤ lim sup_{k→∞} [v_{n_k} ‖w‖ |||Δ_{n_k} Δ_{n_{k−1}}^{−1}||| √(|||Σ|||)] / [v_{n_{k−1}} √(w^T Σ w)] a.s.

Since |||Δ_{n_k} Δ_{n_{k−1}}^{−1}||| = [ρ_max(Δ_{n_{k−1}}^{−1} Δ_{n_k} Δ_{n_k} Δ_{n_{k−1}}^{−1})]^{1/2} ≤ v_{n_{k−1}} / v_{n_k} for sufficiently large k, we obtain lim sup_{k→∞} |A_{n_k} S_{n_{k−1}}| / χ(n_{k−1}) ≤ ‖w‖ √(|||Σ|||) / √(w^T Σ w) a.s. Set ε ∈ ]0,
1[ and κ = 2 ‖w‖ √(|||Σ|||) / √(w^T Σ w). Noting that

(1 − γ) ψ(n_k) − κ χ(n_{k−1}) ∼ [(1 − γ) √τ (1 + τ)^{−1/2} − κ (1 + τ)^{−1/2}] χ(n_k),

and noting that γ can be chosen sufficiently small and τ sufficiently large so that

(1 − γ) √τ (1 + τ)^{−1/2} − κ (1 + τ)^{−1/2} > 1 − ε,

we obtain

P(A_{n_k} S_{n_k} > (1 − ε) χ(n_k) i.o.) ≥ P(A_{n_k} S_{n_k} > (1 − γ) ψ(n_k) − κ χ(n_{k−1}) i.o.).

Taking (40) into account, we then obtain P(A_{n_k} S_{n_k} > (1 − ε) χ(n_k) i.o.) = 1. We thus get lim sup_{n→∞} A_n S_n / χ(n) ≥ 1 a.s., which proves (32).

3.5.2 Proof of Lemma 4

Now, set

Ṽ_k = ( h_k^{−d/2} [∇K((θ − X_k)/h_k) − E(∇K((θ − X_k)/h_k))] ; h̃_k^{−d/2} [K((θ − X_k)/h̃_k) − E(K((θ − X_k)/h̃_k))] )

and Γ_k = E(Ṽ_k Ṽ_k^T). In view of (24), (25) and (26), we have lim_{k→∞} Γ_k = Γ. It follows that there exists k_0 ≥ 1 such that, for all k ≥ k_0, Γ_k is invertible; without loss of generality, we assume k_0 = 1, and set Ũ_k = Γ_k^{−1/2} Ṽ_k. Set p ∈ ]2,
4[ and let L be a slowly varying function; we have

E(‖Ũ_k‖^p) / (k log log k)^{p/2} = O([h_k^{−dp/2} E(‖∇K((θ − X_k)/h_k)‖^p) + h̃_k^{−dp/2} E(|K((θ − X_k)/h̃_k)|^p)] / (k log log k)^{p/2})
= O([h_k^{d−dp/2} + h̃_k^{d−dp/2}] / (k log log k)^{p/2})
= O(L(k) [k^{−[p/2 − (p/2−1)ad]} + k^{−[p/2 − (p/2−1)ãd]}]),

so that Σ_k (k log log k)^{−p/2} E(‖Ũ_k‖^p) < ∞. By application of Theorem 2 of Einmahl [7], we deduce that Σ_{k=1}^n Ũ_k − Σ_{k=1}^n η_k = o(√(n log log n)) a.s., where the η_k are independent, N(0, I_{d+1})-distributed random vectors. It follows that

Σ_{k=1}^n Γ^{1/2} Γ_k^{−1/2} Ṽ_k − Σ_{k=1}^n ε_k = o(√(n log log n)) a.s. (41)

Now,

Δ_n [Σ_{k=1}^n Q_k Γ^{1/2} Γ_k^{−1/2} Ṽ_k − Σ_{k=1}^n Q_k ε_k]
= Δ_n Σ_{k=1}^n Q_k [Γ^{1/2} Γ_k^{−1/2} Ṽ_k − ε_k]
= Δ_n Σ_{k=1}^n Q_k [Σ_{j=1}^k (Γ^{1/2} Γ_j^{−1/2} Ṽ_j − ε_j) − Σ_{j=1}^{k−1} (Γ^{1/2} Γ_j^{−1/2} Ṽ_j − ε_j)] (with Σ_{j=1}^0 = 0)
= Δ_n Σ_{k=1}^{n−1} (Q_k − Q_{k+1}) Σ_{j=1}^k (Γ^{1/2} Γ_j^{−1/2} Ṽ_j − ε_j) + Δ_n Q_n Σ_{j=1}^n (Γ^{1/2} Γ_j^{−1/2} Ṽ_j − ε_j)
= Δ_n Σ_{k=1}^{n−1} (Q_k − Q_{k+1}) [o(√(k log log k))] + Δ_n Q_n [o(√(n log log n))] a.s.

Moreover,

Δ_n Σ_{k=1}^{n−1} (Q_k − Q_{k+1}) [o(√(k log log k))]
= ( √(h_n^{d+2}/n) Σ_{k=1}^{n−1} (h_k^{−(d+2)/2} − h_{k+1}^{−(d+2)/2}) o(√(k log log k)) ; √(h̃_n^d/n) Σ_{k=1}^{n−1} (h̃_k^{−d/2} − h̃_{k+1}^{−d/2}) o(√(k log log k)) )
= ( o(√(h_n^{d+2} log log n)) Σ_{k=1}^{n−1} (h_k^{−(d+2)/2} − h_{k+1}^{−(d+2)/2}) ; o(√(h̃_n^d log log n)) Σ_{k=1}^{n−1} (h̃_k^{−d/2} − h̃_{k+1}^{−d/2}) ).

Set φ(s) = [h(s)]^{−(d+2)/2} and φ̃(s) = [h̃(s)]^{−d/2}, and let u_k ∈ [k, k + 1]; since φ′ and φ̃′ vary regularly with exponent (a(d + 2)/2 −
1) and (ãd/2 −
1) respectively, we have

Σ_{k=1}^{n−1} (h_k^{−(d+2)/2} − h_{k+1}^{−(d+2)/2}) = O(Σ_{k=1}^{n−1} φ′(u_k)) = O(∫_1^n φ′(s) ds) = O(h_n^{−(d+2)/2})

and

Σ_{k=1}^{n−1} (h̃_k^{−d/2} − h̃_{k+1}^{−d/2}) = O(Σ_{k=1}^{n−1} φ̃′(u_k)) = O(∫_1^n φ̃′(s) ds) = O(h̃_n^{−d/2}),

so that Δ_n Σ_{k=1}^{n−1} (Q_k − Q_{k+1}) [o(√(k log log k))] = o(√(log log n)). Since Δ_n Q_n [o(√(n log log n))] = o(√(log log n)), we deduce that

Δ_n Σ_{k=1}^n Q_k Γ^{1/2} Γ_k^{−1/2} Ṽ_k / √(2 log log n) − Δ_n Σ_{k=1}^n Q_k ε_k / √(2 log log n) = o(1) a.s.

The application of Lemma 5 then ensures that, with probability one, the sequence (Δ_n Σ_{k=1}^n Q_k Γ^{1/2} Γ_k^{−1/2} Ṽ_k / √(2 log log n)) is relatively compact and its limit set is E = {ν ∈ R^{d+1} such that ν^T Σ^{−1} ν ≤ 1}. Since

Δ_n Σ_{k=1}^n Q_k Ṽ_k / √(2 log log n) = Δ_n Σ_{k=1}^n Q_k Γ^{1/2} Γ_k^{−1/2} Ṽ_k / √(2 log log n) + Δ_n Σ_{k=1}^n Q_k (I_{d+1} − Γ^{1/2} Γ_k^{−1/2}) Ṽ_k / √(2 log log n)

with lim_{k→∞} (I_{d+1} − Γ^{1/2} Γ_k^{−1/2}) = 0, Lemma 4 follows.

3.6 Proof of Theorems 1 and 2

In view of (15) (and the comment below it), Theorem 1 (respectively Theorem 2) is a straightforward consequence of the combination of (20), (21) and (22) (respectively (27), (28) and (29)), together with the following lemma, which establishes that the residual term R_n (defined in (14)) is negligible.

Lemma 6
Let Assumptions (A1)-(A5) hold. If (C2) holds, then lim_{n→∞} h̃_n^{−q} R_n = 0 a.s. Otherwise, lim_{n→∞} √(n h̃_n^d) R_n = 0 a.s.

Proof of Lemma 6

We first note that a Taylor expansion implies the existence of ζ_n such that ‖ζ_n − θ_n‖ ≤ ‖θ_n − θ‖ and

R_n = (θ_n − θ)^T ∇f̃_n(ζ_n) = (θ_n − θ)^T [∇f̃_n(ζ_n) − ∇f(ζ_n) + ∇f(ζ_n) − ∇f(θ)].

Let V be a compact set that contains θ; for n large enough, we get

|R_n| = O(‖θ_n − θ‖ [sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ + ‖ζ_n − θ‖]) = O(‖θ_n − θ‖ sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ + ‖θ_n − θ‖^2).

On the one hand, let us recall that the a.s. convergence rate of (θ_n − θ) is given by that of [D²f(θ)]^{−1} ∇f_n(θ) (see (15) and the comment below it). One could apply (27), (28) and (29) to obtain the exact a.s. convergence rate of θ_n − θ. However, to avoid assuming (A6), we apply here Lemmas 1 and 2 (with |α| = 1 and (g_n, b_n) = (f_n, h_n)), and get the following upper bound on the a.s. convergence rate of θ_n − θ: for any γ > 1 and ε > 0,

‖θ_n − θ‖ = O(√[(log n)^γ / (n h_n^{d+2})] + n^{−1} Σ_{i=1}^n h_i^q) = O(√[(log n)^γ / (n h_n^{d+2})] + h_n^{q−ε}) a.s. (42)

On the other hand, we have

sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ ≤ sup_{x∈V} ‖∇f̃_n(x) − E(∇f̃_n(x))‖ + sup_{x∈V} ‖E(∇f̃_n(x)) − ∇f(x)‖.

The application of Lemmas 1 and 2 with |α| = 1 and (g_n, b_n) = (f̃_n, h̃_n) ensures that, for any γ > 1 and ε > 0,

sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ = O(√[(log n)^γ / (n h̃_n^{d+2})] + n^{−1} Σ_{i=1}^n h̃_i^q) = O(√[(log n)^γ / (n h̃_n^{d+2})] + h̃_n^{q−ε}) a.s. (43)

Let L denote a generic slowly varying function that may vary from line to line.

• Let us first assume that (C1) holds.
The application of (42) and (43) ensures that, for any ε > 0,

√(n h̃_n^d) ‖θ_n − θ‖ sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ = O(L(n) [n^{−(1 − a(d+2) − 2ã)/2} + n^{ã − a(q−ε)}]) + o(1) a.s.

Observe that, by (C1)i), we have 2ã + a(d + 2) < 1 and ã < a(q − ε) for any ε > 0 small enough, so that

√(n h̃_n^d) ‖θ_n − θ‖ sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ = o(1) a.s.

Moreover, the application of (42) ensures that

√(n h̃_n^d) ‖θ_n − θ‖^2 = O(L(n) [n^{−(1 − 2a(d+2) + ãd)/2} + n^{(1 − ãd − 4a(q−ε))/2}]) a.s.

Now, by (C1)ii), we have 2a(d + 2) − ãd < 1 and ãd + 4a(q − ε) > 1 for any ε > 0 small enough, so that √(n h̃_n^d) ‖θ_n − θ‖^2 = o(1) a.s., which ensures the first part of Lemma 6.

• We now assume that (C2) holds. Since ãq ≤ q/(d + 2q) <
1, using (9), (42) and (43), we have

h̃_n^{−q} ‖θ_n − θ‖ sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ = O(L(n) [n^{−1 + a(d+2)/2 + ã(d+2q+2)/2} + n^{−1/2 − a(q−ε) + ã(d+2q+2)/2}]) + o(1) a.s. (44)

On the one hand, for any ε > 0 small enough,

a(d + 2) + ã(d + 2q + 2) < 2 and ã(d + 2q + 2) < 1 + 2a(q − ε), (45)
ãq + a(d + 2) < 1 and ãq < 2a(q − ε). (46)

Therefore, it follows from (44) and (45) that

h̃_n^{−q} ‖θ_n − θ‖ sup_{x∈V} ‖∇f̃_n(x) − ∇f(x)‖ = o(1) a.s.

On the other hand, observe that, by (42) and (46), we have

h̃_n^{−q} ‖θ_n − θ‖^2 = O(L(n) [n^{−(1 − ãq − a(d+2))} + n^{ãq − 2a(q−ε)}]) = o(1) a.s.,

which concludes the proof of Lemma 6.

Acknowledgements
We deeply thank two anonymous Referees for their helpful suggestions and comments.
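Remark (numerical illustration). The following short Python sketch is not part of the paper's results; it only illustrates the semirecursive estimators in the case d = 1. The Gaussian kernel, the bandwidth exponents a = ã = 1/5, the sample size, and the maximization of f_n over a grid are all illustrative choices made here for concreteness. The sketch computes f_n(x) = n^{-1} Σ_{i=1}^n h_i^{-1} K((x − X_i)/h_i) and f̃_n (with bandwidths (h̃_i)), takes θ_n as a grid argmax of f_n, and sets μ̃_n = f̃_n(θ_n).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Sample from N(2, 1): true mode location theta = 2, true size mu = 1/sqrt(2*pi)
X = rng.normal(loc=2.0, scale=1.0, size=n)

def K(u):
    """Gaussian kernel (an illustrative choice of K)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

i = np.arange(1, n + 1)
h = i ** (-1.0 / 5.0)        # stepwise bandwidths (h_i) used by f_n
h_tilde = i ** (-1.0 / 5.0)  # bandwidths (h~_i) used by f~_n (taken equal here for simplicity)

grid = np.linspace(-1.0, 5.0, 601)

# Semirecursive kernel estimators evaluated on the grid:
#   f_n(x)  = n^{-1} sum_i h_i^{-1}  K((x - X_i)/h_i)
#   f~_n(x) = n^{-1} sum_i h~_i^{-1} K((x - X_i)/h~_i)
u = grid[:, None] - X[None, :]
f_n = np.mean(K(u / h) / h, axis=1)
f_n_tilde = np.mean(K(u / h_tilde) / h_tilde, axis=1)

# theta_n maximizes f_n (here: over the grid); the size estimator is mu~_n = f~_n(theta_n)
imax = np.argmax(f_n)
theta_n = grid[imax]
mu_n = f_n_tilde[imax]

print(f"theta_n = {theta_n:.3f}  (true mode location 2.0)")
print(f"mu_n    = {mu_n:.4f}  (true mode size {1.0 / np.sqrt(2.0 * np.pi):.4f})")
```

Updating the pair (θ_n, μ̃_n) when a new observation X_{n+1} arrives only requires adding one term to each running sum, which is the computational advantage of the semirecursive scheme mentioned in the introduction.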
References

[1] Abraham, C., Biau, G. and Cadre, B. (2003), Simple estimation of the mode of a multivariate density.
Canadian J. Statist., pp. 23-34.
[2] Abraham, C., Biau, G. and Cadre, B. (2004), On the asymptotic properties of a simple estimate of the mode. ESAIM Probab. Statist., pp. 1-11.
[3] Davies, H.I. (1973), Strong consistency of a sequential estimator of a probability density function. Bull. Math. Statist., pp. 49-54.
[4] Devroye, L. (1979), On the pointwise and integral convergence of recursive kernel estimates of probability densities. Utilitas Math., pp. 113-128.
[5] Eddy, W.F. (1980), Optimum kernel estimates of the mode. Ann. Statist., pp. 870-882.
[6] Eddy, W.F. (1982), The asymptotic distributions of kernel estimators of the mode. Z. Wahrsch. Verw. Gebiete, pp. 279-290.
[7] Einmahl, U. (1987), A useful estimate in the multidimensional invariance principle. Probability Theory and Related Fields, pp. 81-101.
[8] Feller, W. (1970), An introduction to probability theory and its applications. Second edition, Volume II, Wiley.
[9] Hall, P. (1992), Effect of bias estimation on coverage accuracy of bootstrap confidence intervals for a probability density. Ann. Statist., pp. 675-694.
[10] Konakov, V.D. (1973), On asymptotic normality of the sample mode of multivariate distributions. Theory Probab. Appl., pp. 836-842.
[11] Koval, V. (2002), A new law of the iterated logarithm in R^d with application to matrix-normalized sums of random vectors. Journal of Theoretical Probability, pp. 249-257.
[12] Menon, V.V., Prasad, B. and Singh, R.S. (1984), Non-parametric recursive estimates of a probability density function and its derivatives. Journal of Statistical Planning and Inference, pp. 73-82.
[13] Mokkadem, A. and Pelletier, M. (2003), The law of the iterated logarithm for the multivariate kernel mode estimator. ESAIM: Probab. Statist., pp. 1-21.
[14] Mokkadem, A. and Pelletier, M. (2007), A companion for the Kiefer-Wolfowitz-Blum stochastic approximation algorithm. Ann. Statist., pp. 1749-1772.
[15] Mokkadem, A., Pelletier, M. and Thiam, B.
(2006), Large and moderate deviations principles for recursive kernel estimators of a multivariate density and its partial derivatives. Serdica Math. J., pp. 323-354.
[16] Müller, H.G. (1989), Adaptive nonparametric peak estimation. Ann. Statist., pp. 1053-1069.
[17] Nadaraya, E.A. (1965), On non-parametric estimates of density functions and regression curves. Theory Probab. Appl., pp. 186-190.
[18] Parzen, E. (1962), On estimation of a probability density function and mode. Ann. Math. Statist., pp. 1065-1076.
[19] Petrov, V.V. (1995), Limit theorems in probability theory. Clarendon Press, Oxford.
[20] Romano, J. (1988), On weak convergence and optimality of kernel density estimates of the mode. Ann. Statist., pp. 629-647.
[21] Rosenblatt, M. (1956), Remarks on some non-parametric estimates of a density function. Ann. Math. Statist., pp. 832-837.
[22] Roussas, G. (1992), Exact rates of almost sure convergence of a recursive kernel estimate of a probability density function: Application to regression and hazard rate estimate. J. of Nonparam. Statist., pp. 171-195.
[23] Rüschendorf, L. (1977), Consistency of estimators for multivariate density functions and for the mode. Sankhya Ser. A, pp. 243-250.
[24] Samanta, M. (1973), Nonparametric estimation of the mode of a multivariate density. South African Statist. J., pp. 109-117.
[25] Tsybakov, A.B. (1990), Recurrent estimation of the mode of a multidimensional distribution. Problems Inform. Transmission, pp. 31-37.
[26] Van Ryzin, J. (1969), On strong consistency of density estimates. Ann. Math. Statist., pp. 1765-1772.
[27] Vieu, P. (1996), A note on density mode estimation. Statist. Probab. Lett., pp. 297-307.
[28] Wegman, E.J. and Davies, H.I. (1979), Remarks on some recursive estimators of a probability density. Ann. Statist., pp. 316-327.
[29] Wertz, W. (1985), Sequential and recursive estimators of the probability density. Statistics, pp. 277-295.
[30] Wolverton, C.T. and Wagner, T.J.
(1969), Asymptotically optimal discriminant functions for pattern classification. IEEE Trans. Inform. Theory, pp. 258-265.
[31] Yamato, H. (1971), Sequential estimation of a continuous probability density function and mode. Bull. Math. Statist., pp. 1-12.
[32] Ziegler, K. (2003), On the asymptotic normality of kernel regression estimators of the mode in the random design model. J. Statist. Plann. Inf., pp. 123-144.
[33] Ziegler, K. (2004), Adaptive kernel estimation of the mode in a nonparametric random design regression model.
Probab. Math. Statist.