Signed variable optimal kernel for non-parametric density estimation
M. R. Formica, E. Ostrovsky, and L. Sirota.
Università degli Studi di Napoli Parthenope, via Generale Parisi 13, Palazzo Pacanowsky, 80132, Napoli, Italy. e-mail: [email protected]

Department of Mathematics and Statistics, Bar-Ilan University, 59200, Ramat Gan, Israel. e-mail: [email protected]

Department of Mathematics and Statistics, Bar-Ilan University, 59200, Ramat Gan, Israel. e-mail: [email protected]
Abstract.
We derive, in the general case, the optimal signed-variable kernels for classical statistical density estimation; they generalize the famous Epanechnikov kernels.
Key words and phrases.
Probability, random variable and vector (r.v.), density of distribution, Hölder's and other functional classes of functions, kernel, optimization, Lagrange's multiplier method, Legendre's ordinary and dilated polynomials, Euler's equation, even function, fractional order, examples, Epanechnikov's kernel and generalized Epanechnikov's kernel (GEK), bandwidth, conditions of orthogonality, bias, variance, expectation, Parzen-Rosenblatt and recursive Wolverton-Wagner statistical density estimation.
Let (Ω, M, P) be a probability space with expectation E and variance Var. Let also {ξ_k}, k = 1, 2, ..., n, be a sequence of independent, identically distributed (i.i.d.) random variables (r.v.) taking values on the real axis R and having a certain unknown density of distribution f = f(x), x ∈ R.

We suppose further that this function belongs to the space C^{2m}(R), m = 1, 2, ..., of all numerical-valued, 2m-times continuously differentiable bounded functions having a finite norm

  ||f||_{2m} := max_{k = 0, 1, ..., 2m} sup_{x ∈ R} |f^{(k)}(x)| < ∞.  (1)

Let also K = K(x), x ∈ R, be a certain kernel, i.e. a measurable even function having finite support:

  ∃ θ ∈ (0, ∞)  ∀ x: |x| > θ ⇒ K(x) = 0,  (2)

for which

  ∫_R K(x) dx = 1.  (3)

We impose also the following conditions on this kernel:

  K(−x) = K(x);  V(K) := ∫_R K²(x) dx < ∞;  (4)

  K(·) ∈ C(R),  ∫_R |K(x)| dx < ∞.  (5)

The following conditions, which are also presumed, may be named conditions of orthogonality:

  ∀ l = 1, 2, ..., 2m − 1 ⇒ ∫_R x^l K(x) dx = 0.  (6)

Recall that the classical Parzen-Rosenblatt estimation f_n(x) = f_n^{PR}(x) of the density function f(x) has the form

  f_n(x) := (1/(nh)) Σ_{i=1}^{n} K( (x − ξ_i)/h ).  (7)

Here h = h(n) is a deterministic positive sequence such that lim_{n→∞} h(n) = 0 and lim_{n→∞} n h(n) = ∞; see [20], [21]. These and similar estimations were studied in many works, see e.g. [7], [9], [15], [16], [19], [20], [21], [23], [26], [27], [28], [29] etc. The works [5], [12], [17], [18] are devoted to the optimal choice of h = h(n) and of the kernel K(x). The case when the r.v.-s {ξ_i} are (weakly) dependent is investigated in [17], [22].

The conditions (6) may be used for the investigation of the bias δ_n(x) of these statistics. Namely, denote

  δ_n(x) := E f_n(x) − f(x);

then under our conditions

  |δ_n(x)| ≤ C(f) h^{2m}(n).  (8)
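As a purely numerical illustration (not part of the original text; the function names, the test sample and the bandwidth choice are our own), the estimator (7) with the classical Epanechnikov kernel may be sketched in Python as follows:

```python
import numpy as np

def epanechnikov(u):
    """Classical Epanechnikov kernel: K(u) = 3/(4*sqrt(5)) * (1 - u**2/5) for |u| <= sqrt(5)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= np.sqrt(5.0),
                    3.0 / (4.0 * np.sqrt(5.0)) * (1.0 - u ** 2 / 5.0), 0.0)

def parzen_rosenblatt(x, sample, h, kernel=epanechnikov):
    """Parzen-Rosenblatt estimator (7): f_n(x) = (1/(n*h)) * sum_i K((x - xi_i)/h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - sample[None, :]) / h          # pairwise arguments (x - xi_i)/h
    return kernel(u).sum(axis=1) / (len(sample) * h)

def integrate(vals, x):
    """Composite trapezoid rule, written out to stay independent of numpy versions."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(x)) / 2.0)

# Illustration on a standard normal sample; h ~ n**(-1/5) satisfies h -> 0 and n*h -> infinity.
rng = np.random.default_rng(0)
sample = rng.standard_normal(4_000)
h = len(sample) ** (-1.0 / 5.0)
```

Here the bandwidth h = n^{−1/5} is only one admissible deterministic choice; the rate that balances bias and variance depends on the smoothness class, as discussed in the sequel.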
The variance of f_n(x) may be estimated as follows:

  Var{ f_n(x) } ≤ C(f) (nh)^{−1} ∫_{−∞}^{∞} K²(y) dy.  (9)

Statement of an optimization problem.
The relation (9), combined with the limitations (2), (3), (4), (5), (6), leads, as was shown by V. A. Epanechnikov in [9], to the following setting of the constrained optimization problem:

  ∫_{−∞}^{∞} K²(y) dy → min,  (10)

under the limitations

  ∫_{−∞}^{∞} K(y) dy = 1;  ∀ l = 1, 2, ..., 2m − 1 ⇒ ∫_{−∞}^{∞} y^l K(y) dy = 0;  (11)

  ∫_{−∞}^{∞} y^{2m} K(y) dy = 1,  (12)

  ∃ θ ∈ (0, ∞)  ∀ y: |y| > θ ⇒ K(y) = 0.  (13)

This problem was solved in the case m = 1 by V. A. Epanechnikov in [9], in 1969:

  θ = √5;  |y| ≤ √5 ⇒ K(y) = (3/(4√5)) (1 − y²/5).  (14)

Our aim in this short report is to find the optimal kernel K for an arbitrary natural value m = 2, 3, .... The case when the number m is fractional will be considered in the fourth section.

Another motivation for this statement of the problem comes from the famous result belonging to W. Stute [24], [25]:

  lim_{n→∞} sqrt( n h(n) / (2 |ln h(n)|) ) · ||f_n − E f_n||_∞ = ||f||_∞^{1/2} · V^{1/2}(K),

where as ordinary ||g||_∞ = sup_{x ∈ R} |g(x)|.

Remark 1.1.
When m ≥ 2, we imposed in particular the following condition on the kernel K(·):

  ∫_{−θ}^{θ} y² K(y) dy = 0.

Therefore, the kernel K(·) cannot take only non-negative values.

Some facts about Legendre's polynomials.
The classical Legendre polynomials, with support on the closed interval X := [−1, 1] and denoted as usual by P_k(x), x ∈ [−1, 1], k = 0, 1, 2, ..., may be defined for instance by Rodrigues' formula:

  P_k(x) = (1/(2^k k!)) d^k/dx^k (x² − 1)^k.

The function P_k(x) is indeed a polynomial of degree k. These polynomials are orthogonal:

  ∫_{−1}^{1} P_k(x) P_l(x) dx = (2/(2k + 1)) δ_{k,l},  k, l = 0, 1, 2, ...,

where δ_{k,l} is Kronecker's symbol. It follows that

  ∀ k ≥ 1,  0 ≤ l < k ⇒ ∫_{−1}^{1} P_k(x) x^l dx = 0.  (15)

Many properties of these polynomials may be found, e.g., in the classical book [1]. We will use the following relations:

  ∀ k = 0, 1, 2, ...  P_k(1) = 1;  (16)

  μ(k) := ∫_{−1}^{1} x^k P_k(x) dx = 2^{k+1} (k!)² / (2k + 1)!.  (17)

Some examples:

  P_0(x) = 1,  P_1(x) = x,  P_2(x) = 1.5 x² − 0.5,  P_4(x) = (1/8)(35 x⁴ − 30 x² + 3).  (18)

Definition 2.1.
Let θ = const ∈ (0, ∞). The dilated Legendre polynomial L_k^θ(x), x ∈ [−θ, θ], of degree k = 0, 1, 2, ... with parameter θ is defined as follows:

  L_k^θ(x) := (1/θ) P_k(x/θ).  (19)

Of course, the properties of these polynomials follow from those of the Legendre polynomials. For instance, L_{2s}^θ(·), s = 0, 1, 2, ..., is an even function, and L_{2s+1}^θ(·), s = 0, 1, 2, ..., is odd; the relations of orthogonality read

  ∫_{−θ}^{θ} x^l L_k^θ(x) dx = 0,  k > l,  l = 0, 1, ..., k − 1;  (20)

as well as L_k^θ(θ) = 1/θ and

  ∫_{−θ}^{θ} y^k L_k^θ(y) dy = θ^k · 2^{k+1} (k!)² / (2k + 1)! = θ^k μ(k),  (21)

  ∫_{−θ}^{θ} L_k^θ(y) L_l^θ(y) dy = ( (2/θ) / (2k + 1) ) δ_{k,l}.  (22)

In particular,

  ∫_{−θ}^{θ} [L_k^θ(y)]² dy = (2/θ) / (2k + 1).  (23)

Let us return to the optimization problem (10), (11), (12), (13) formulated above.

Theorem 2.1.
The formulated optimization problem has a unique solution K₀(y), having the following form:

  K₀(y) = 1/(2θ₀) − (1/2) L_{2m}^{θ₀}(y) = 1/(2θ₀) − (1/(2θ₀)) P_{2m}(y/θ₀)  (24)

when |y| ≤ θ₀, and K₀(y) = 0 otherwise. Here

  θ₀ = [ 1/(2m + 1) − μ(2m)/2 ]^{−1/(2m)},  (25)

and wherein, under our restrictions on the kernel K(·),

  min ∫_R K²(y) dy = ∫_R K₀²(y) dy = (1/(2θ₀)) · (4m + 2)/(4m + 1).  (26)

Recall that μ(2m) = 2^{2m+1} ((2m)!)² / (4m + 1)!.

Proof.
We will follow V. A. Epanechnikov, the author of the article [9], who applied the famous Lagrange multiplier method. The Euler equation for this problem gives the following form for the optimal kernel K₀(·):

  K₀(y) = Σ_{s=0}^{2m} λ_s y^s,  |y| ≤ θ.

In other words, K₀ is a polynomial of degree ≤ 2m inside the interval [−θ, θ]. As long as the kernel is an even function, λ_{2r+1} = 0, r = 0, 1, ..., m − 1. Further, rewriting this polynomial in the basis of dilated Legendre polynomials, it follows from the conditions of orthogonality that the intermediate coefficients vanish as well. Therefore, the optimal kernel has the form

  K₀(y) = a − b L_{2m}^θ(y) = a − (b/θ) P_{2m}(y/θ),  a, b = const.

Substituting the value y = θ and taking into account the relations K₀(θ) = 0 and P_{2m}(1) = 1, we deduce

  a = b/θ.  (27)

Secondly,

  1 = ∫_{−θ}^{θ} K₀(y) dy = 2aθ,

so that a = 1/(2θ) and hence b = 1/2, whence

  K₀(y) = 1/(2θ) − (1/2) L_{2m}^θ(y) = 1/(2θ) − (1/(2θ)) P_{2m}(y/θ).

Thirdly,

  1 = ∫_{−θ}^{θ} y^{2m} K₀(y) dy = θ^{2m}/(2m + 1) − (θ^{2m}/2) ∫_{−1}^{1} z^{2m} P_{2m}(z) dz = θ^{2m}/(2m + 1) − (θ^{2m}/2) μ(2m).

Now the proposition (25) holds as well. Thus

  K₀(y) = 1/(2θ₀) − (1/(2θ₀)) P_{2m}(y/θ₀),  |y| ≤ θ₀.

Ultimately, the equality (26) follows immediately from (22) and (23). This completes the proof of our theorem.
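The moment identity (17) and the kernel of Theorem 2.1 are easy to check numerically. The following Python sketch is our own illustration (not from the paper; all helper names are ours): it verifies (17) exactly by polynomial integration, builds K₀ for a given m via (24)-(25), and checks the normalizations (3) and (12) together with the minimal value (26):

```python
import math
import numpy as np
from numpy.polynomial import Legendre, Polynomial

def mu(k):
    """mu(k) = integral_{-1}^{1} x^k P_k(x) dx = 2**(k+1) * (k!)**2 / (2k+1)!, cf. (17)."""
    return 2.0 ** (k + 1) * math.factorial(k) ** 2 / math.factorial(2 * k + 1)

def legendre_moment(k):
    """The same moment computed exactly by polynomial integration."""
    Pk = Legendre.basis(k).convert(kind=Polynomial)   # P_k in the monomial basis
    F = (Pk * Polynomial([0.0] * k + [1.0])).integ()  # antiderivative of x^k * P_k(x)
    return F(1.0) - F(-1.0)

def optimal_kernel(m):
    """Kernel of Theorem 2.1: theta0 per (25), K0(y) = (1 - P_{2m}(y/theta0)) / (2*theta0)."""
    theta0 = (1.0 / (2 * m + 1) - mu(2 * m) / 2.0) ** (-1.0 / (2 * m))
    P2m = Legendre.basis(2 * m)
    def K0(y):
        y = np.asarray(y, dtype=float)
        return np.where(np.abs(y) <= theta0, (1.0 - P2m(y / theta0)) / (2.0 * theta0), 0.0)
    return theta0, K0

def integrate(vals, x):
    """Composite trapezoid rule."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(x)) / 2.0)

def moment(K0, theta0, l, npts=200_001):
    """Numerical integral of y^l * K0(y) over [-theta0, theta0]."""
    y = np.linspace(-theta0, theta0, npts)
    return integrate(y ** l * K0(y), y)
```

For m = 1 this reproduces the Epanechnikov kernel with θ₀ = √5; for m = 2 it gives θ₀ = (63/11)^{1/4}, as in the examples below.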
Examples.
Of course, for the case m = 1 we obtain the classical Epanechnikov kernel. Let now m = 2. We deduce after some calculations

  θ₀ = (63/11)^{1/4},

and correspondingly

  K₀(y) = (1/(2θ₀)) { 1 − (1/8) ( 35 y⁴/θ₀⁴ − 30 y²/θ₀² + 3 ) },  |y| ≤ θ₀,

and of course K₀(y) = 0 for |y| > θ₀. Wherein

  min ∫_R K²(y) dy = ∫_R K₀²(y) dy = 5/(9 θ₀).  (28)

Let β be an arbitrary fractional positive number; denote by l = [β] its integer part. We suppose that the function f(·) belongs to the space Σ(β). This implies that all the derivatives f^{(k)}(x), k = 0, 1, ..., l, are continuous and bounded, and that the last continuous derivative f^{(l)}(x) satisfies the Hölder condition with power β − l.

We impose on the kernel K(·) conditions similar to those of the first section: K = K(x), x ∈ R, is a certain even function having finite support:

  ∃ θ ∈ (0, ∞)  ∀ x: |x| > θ ⇒ K(x) = 0,  (29)

for which

  ∫_R K(x) dx = 1,  (30)

  K(−x) = K(x);  ∫_R K²(x) dx < ∞;  (31)

  K(·) ∈ C(R),  ∫_R |K(x)| dx < ∞.  (32)

The following conditions may be named, as before, conditions of orthogonality:

  ∀ r = 1, 2, ..., l ⇒ ∫_R x^r K(x) dx = 0.  (33)

Ultimately, suppose

  ∫_R |x|^β K(x) dx < ∞.  (34)

It is known that under these conditions the bias δ_n(x) of the Parzen-Rosenblatt estimation f_n(x), as well as of the Wolverton-Wagner one, obeys the following property:

  sup_x |δ_n(x)| = sup_x |E f_n(x) − f(x)| ≤ C(β, f) h_n^β,

see e.g. [5], [15], [16]. Following again V. A. Epanechnikov [9], we arrive at the following variational statement of the problem, under the restrictions formulated above in this section:

  ∫_R K²(y) dy → min_K,  (35)

where in addition ∫_R |y|^β K(y) dy = 1.

Theorem 3.1.
The (unique) solution of this problem has the form

  θ₀ = (2β + 1)^{1/β};  (36)

  K₀(y) = λ₀ − μ₀ |y|^β,  |y| ≤ θ₀;  K₀(y) = 0, |y| > θ₀;  (37)

  λ₀ = ((β + 1)/(2β)) · (2β + 1)^{−1/β},  (38)

  μ₀ = ((β + 1)/(2β)) · (2β + 1)^{−(β+1)/β},  (39)

and wherein, under our conditions,

  min ∫_R K²(y) dy = ∫_R K₀²(y) dy = 2β μ₀ =  (40)

  (β + 1) · (2β + 1)^{−(β+1)/β}.  (41)

Proof.
The Euler equations for this problem give, as before, the following form for the optimal kernel:

  K₀(y) = λ − μ |y|^β + Σ_{j=1}^{l} ν_j y^j,  |y| ≤ θ.  (42)

Since the function K₀(·) is even, ν₁ = ν₃ = ... = 0. Further, it follows from the relations of orthogonality that all the other coefficients ν_j vanish as well; so the optimal kernel K₀ has the form, inside the closed interval y ∈ [−θ, θ],

  K₀(y) = λ − μ |y|^β.  (43)

As long as K₀(θ) = 0,

  λ = μ · θ^β.  (44)

Secondly,

  1 = ∫_{−θ}^{θ} K₀(y) dy = 2 [ λθ − μ θ^{β+1}/(β + 1) ],

following

  λθ − μ θ^{β+1}/(β + 1) = 1/2.  (45)

Thirdly,

  1 = ∫_{−θ}^{θ} |y|^β K₀(y) dy = 2 [ λ θ^{β+1}/(β + 1) − μ θ^{2β+1}/(2β + 1) ],

or equally

  λ θ^{β+1}/(β + 1) − μ θ^{2β+1}/(2β + 1) = 1/2.  (46)

Solving the system of equations (44), (45) and (46), we obtain the assertion of Theorem 3.1. The equalities (40), (41) may be obtained after simple calculations.

When for instance β = 3/2, then

  θ₀ = 4^{2/3},  λ₀ = (5/6) · 4^{−2/3},  μ₀ = (5/6) · 4^{−5/3},

and

  min ∫_R K²(y) dy = (5/2) · 4^{−5/3}.

Remark 3.0.
Note that in the case of a fractional value of β the optimal kernel K₀ is non-negative!

Remark 3.1.
It is interesting to note that if we choose for β an integer value, β := 2, we obtain the classical Epanechnikov kernel:

  θ₀ = √5,  λ₀ = 3/(4√5),  μ₀ = 3/(20√5).

In this case the minimal value of ∫_R K²(y) dy is equal to 3/(5√5).

Remark 3.2.

We do not use in this section, i.e. in the case of a fractional value of the parameter β, the theory of Legendre polynomials, in contradistinction to the foregoing sections.

We retain all the restrictions and conditions of the foregoing section. Denote in addition

  J_β = J_β(K) := ∫_R |y|^β K(y) dy;  V = V(K) := ∫_R K²(y) dy,

and suppose J_β(K) > 0. It is well known, see e.g. [5], [18], that

  |E f_n − f| ≤ C₁ h^β · J_β(K),  Var{ f_n } ≤ C₂ V(K)/(nh).

The mean square error of the considered statistics f_n allows the estimate

  Z_n(K) := E (f_n − f)² ≤ C [ V(K)/(nh) + h^{2β} J_β²(K) ].

The minimum W = W(K) of the right-hand side of the last inequality relative to the bandwidth h is the following:

  W = W(K) := min_{h>0} Z_n(K) ≍ C n^{−2β/(2β+1)} [J_β(K)]^{2/(2β+1)} · [V(K)]^{2β/(2β+1)}.

We come to the following extremal problem relative to the kernel K(·), under the limitations formulated before:

  Φ(K) := J_β(K) · [V(K)]^β → min_K.  (47)

Let us apply the famous calculus of variations, see e.g. [6], p. 169; [11], chapter 2, section 2.2. Namely, introduce the perturbed kernel

  K_δ(y) := K₀(y) + δ g(y),

where K₀ is the optimal kernel, δ is a "small" real constant, and g = g(y), |y| ≤ θ = θ₀, is a suitable perturbation function. Of course,

  ∫_{−θ₀}^{θ₀} g(y) dy = 0,  (48)

the "centering" condition. We obtain after some calculations

  Φ(K_δ) = Φ(K₀) + δ ∫_{−θ₀}^{θ₀} { C₁ K₀(y) + C₂ |y|^β } g(y) dy + O(δ²),  δ → 0.

Therefore, for some finite constants C₁, C₂ and for an arbitrary "centered" perturbation function g = g(y) as in (48),

  ∫_{−θ₀}^{θ₀} { C₁ K₀(y) + C₂ |y|^β } g(y) dy = 0.  (49)

It follows from (49), taking into account the "centering" condition (48), that

  C₁ K₀(y) + C₂ |y|^β = C₃.  (50)
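Since (50) shows that the optimal kernel of this section coincides with that of Theorem 3.1, both can be checked at once numerically. A Python sketch (our own, not the paper's; β = 3/2 as in the example above):

```python
import numpy as np

def fractional_kernel(beta):
    """Theorem 3.1: K0(y) = lam - mu*|y|**beta on [-theta0, theta0], zero outside; (36)-(39)."""
    theta0 = (2.0 * beta + 1.0) ** (1.0 / beta)
    lam = (beta + 1.0) / (2.0 * beta) * (2.0 * beta + 1.0) ** (-1.0 / beta)
    mu = (beta + 1.0) / (2.0 * beta) * (2.0 * beta + 1.0) ** (-(beta + 1.0) / beta)
    def K0(y):
        y = np.asarray(y, dtype=float)
        return np.where(np.abs(y) <= theta0, lam - mu * np.abs(y) ** beta, 0.0)
    return theta0, lam, mu, K0

def integrate(vals, x):
    """Composite trapezoid rule."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(x)) / 2.0)

beta = 1.5
theta0, lam, mu, K0 = fractional_kernel(beta)
y = np.linspace(-theta0, theta0, 400_001)
```

The checks below confirm the normalization (30), the side condition ∫|y|^β K₀ dy = 1, the minimal value (40)-(41), the vanishing at the endpoints, and the non-negativity claimed in Remark 3.0.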
We conclude that the optimization problem considered in this section completely coincides with the one considered in the third section! Thus, the optimal kernel K₀ for this statement of the problem is described completely in Theorem 3.1.

It is interesting, in our opinion, to find the optimal kernels for the problem of density derivative estimation, in the spirit of, for example, [13], pp. 12-16, where some applications are also described. Denote

  f^{(r)}(x) := d^r f(x)/dx^r,  r = 1, 2, ...;

we want to build the kernel estimation f_n^{(r)}(x) of the derivative f^{(r)}. We suppose that for some m = 1, 2, ... the density function f(·) is (r + 2m)-times continuously boundedly differentiable:

  sup_{x ∈ R} |f^{(r+2m)}(x)| < ∞.

As for the kernel K(·), we assume, in addition to the foregoing restrictions, that it belongs to the Sobolev space W^{2,r}(−θ, θ):

  V_{r,2}(K) := ∫_{−θ}^{θ} [K^{(r)}(y)]² dy < ∞,  (51)

and as ordinary

  |y| ≥ θ ⇒ K(y) = 0;  ∫_{−θ}^{θ} K(y) dy = 1;

  s = 1, 2, ..., 2m − 1 ⇒ ∫_{−θ}^{θ} y^s K(y) dy = 0;  ∫_{−θ}^{θ} y^{2m} K(y) dy = 1;  K(−y) = K(y).

The kernel estimate of the derivative f^{(r)}(x) has the form

  f_n^{(r)}(x) = (1/(n h^{r+1})) Σ_{i=1}^{n} K^{(r)}( (ξ_i − x)/h ),  (52)

where as before n → ∞ ⇒ h = h(n) → 0, n h^{2r+1} → ∞. It is known, see [13], pp. 11-15, that the bias of f_n^{(r)}(x) as n → ∞ under our conditions has the form

  E f_n^{(r)}(x) − f^{(r)}(x) ∼ C(f) h^{2m} ∫_{−θ}^{θ} y^{2m} K(y) dy,

and the variance may be evaluated as follows:

  Var[ f_n^{(r)}(x) ] ∼ C(f) (1/(n h^{2r+1})) V_{r,2}(K).

We come, as before, to the following extremal problem under our conditions:

  Φ(K) := ∫_{−θ}^{θ} [K^{(r)}(y)]² dy → min.  (53)

The solution of this problem is quite similar to the one in the second section.

Theorem 5.1.
The optimal kernel K_r(y) for the problem considered in this section is unique and has the form

  K_r(y) = 1/(2θ₀) − (1/(2θ₀)) P_{2m+2r}(y/θ₀),  |y| ≤ θ₀,  (54)

  K_r(y) = 0,  |y| > θ₀,

where

  θ₀ = [1 − μ(2m + 2r)]^{−1/(2m+2r)}.  (55)

A. The multivariate version of the kernel, in particular the optimal one, has a factorizable form:

  K(x₁, x₂, ..., x_d) = ∏_{j=1}^{d} K(x_j),  d = 2, 3, ...,

see [2], [3], [7], [9], [15], [16], [18] etc.

B. The same optimization problem for density estimation appears for the so-called recursive
Wolverton-Wagner density estimation, see [7], [17], [19], [23], [28], [29] and so on.

C. The method offered here may perhaps be generalized to the so-called regression problem, i.e. when

  η_i = f(x_i) + ε_i,  i = 1, 2, ..., n;

see [10], [14], p. 64, Theorem 3.1.

D. The case when the r.v.-s ξ_i are positive may be reduced to the one considered here by the transform η_i := ln ξ_i, so that

  f_η(x) = e^x f_ξ(e^x),  x ∈ (−∞, ∞),

see [4]. The case when ξ_i ∈ (a, b) may be considered quite analogously.

Acknowledgement.
The first author has been partially supported by the Gruppo Nazionale per l'Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM) and by Università degli Studi di Napoli Parthenope through the project "sostegno alla Ricerca individuale" (triennio 2015-2017). The second author is grateful to Yousri Slaoui for sending him a very interesting article [17].
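Remark D above can be illustrated numerically (our own sketch, not part of the paper; the lognormal sample and all names are illustrative): estimate the density of a positive sample by applying a kernel estimator to the logarithms and mapping back via f_ξ(t) = f_η(ln t)/t:

```python
import numpy as np

def epanechnikov(u):
    """Classical Epanechnikov kernel on |u| <= sqrt(5)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= np.sqrt(5.0),
                    3.0 / (4.0 * np.sqrt(5.0)) * (1.0 - u ** 2 / 5.0), 0.0)

def kde(x, sample, h):
    """Plain Parzen-Rosenblatt estimate (7) on the real line."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return epanechnikov((x[:, None] - sample[None, :]) / h).sum(axis=1) / (len(sample) * h)

def positive_density_estimate(t, positive_sample, h):
    """Remark D: eta := ln(xi) has density f_eta(x) = e**x * f_xi(e**x), so f_xi(t) = f_eta(ln t) / t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return kde(np.log(t), np.log(positive_sample), h) / t

# Illustrative check on a lognormal(0, 1) sample, whose true density at t = 1 is 1/sqrt(2*pi).
rng = np.random.default_rng(1)
sample = np.exp(rng.standard_normal(4_000))
h = len(sample) ** (-1.0 / 5.0)
```

The back-transformed estimate automatically vanishes for t ≤ 0 in the limit, which a kernel estimator applied directly to the positive sample would not guarantee.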
References

[1] H. Bateman, A. Erdelyi, W. Magnus, F. Oberhettinger, F. G. Tricomi. Higher Transcendental Functions, Volume II. California Institute of Technology; McGraw-Hill Book Company Inc., USA, 1953; renewed 1981.

[2] K. Bertin. Asymptotically exact minimax estimation in sup-norm for anisotropic Hölder classes. Bernoulli, (2004), 873-888. MR2093615

[3] K. Bertin. Estimation asymptotiquement exacte en norme sup de fonctions multidimensionnelles. Ph.D. thesis, Université Paris 6, 2004.

[4] A. Charpentier and E. Flachaire. Log-transform kernel density estimation of income distribution. L'Actualité économique, 141-159, 2015.

[5] Fabienne Comte and Nicolas Marie. Bandwidth selection for the Wolverton-Wagner estimator. arXiv:1902.00734v2 [math.ST], 12 Oct 2019.

[6] R. Courant and D. Hilbert. Methods of Mathematical Physics, Vol. I (first English ed.). New York: Interscience Publishers, Inc., 1953.

[7] L. Devroye. On the pointwise and integral convergence of recursive kernel estimates of probability densities. Util. Math., (1979), 113-128.

[8] L. Devroye and L. Györfi. Nonparametric Density Estimation: The L1 View. New York: John Wiley, 1985.

[9] V. A. Epanechnikov. Nonparametric estimation of a multidimensional probability density. Teor. Veroyatnost. i Primenen., 1969, Volume 14, Issue 1, 156-161 (in Russian).

[10] T. Gasser and H. G. Müller. Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics, (1984), 171-185.

[11] I. M. Gelfand and S. V. Fomin. Calculus of Variations. R. A. Silverman (ed.). Mineola, New York: Dover Publications, 2000. ISBN 978-0486414485.

[12] A. Goldenshluger and O. Lepski. Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. The Annals of Statistics.

[13] Bruce E. Hansen. Lecture Notes on Nonparametrics. University of Wisconsin, Spring 2009.

[14] Wolfgang Härdle. Applied Nonparametric Regression. Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, Institut für Statistik und Ökonometrie, Spandauer Str. 1, D-10178 Berlin, 1994.

[15] I. A. Ibragimov and R. Z. Khasminskii. An estimate of the density of a distribution. Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI), (1980), 61-85 (in Russian).

[16] I. A. Ibragimov and R. Z. Khasminskii. More on estimation of the density of a distribution. Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI), (1981), 72-88 (in Russian).

[17] Salah Khardani and Yousri Slaoui. Recursive kernel density estimation and optimal bandwidth selection under α-mixing data. Journal of Statistical Theory and Practice, (2019). https://doi.org/10.1007/s42519-018-0031-6

[18] M. Lerasle, N. Magalhães and P. Reynaud-Bouret. Optimal kernel selection for density estimation. In: High-Dimensional Probability VII: The Cargèse Volume, Prog. Probab., Birkhäuser, 425-460, 2016.

[19] Elizbar Nadaraya and Petre Babilua. On the Wolverton-Wagner estimate of a distribution density. Bulletin of the Georgian National Academy of Sciences.

[20] E. Parzen. On estimation of a probability density and mode. Ann. Math. Stat., 1962.

[21] M. Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat., (1956), 832-837.

[22] Andrea De Simone and Alessandro Morandini. Nonparametric density estimation from Markov chains. arXiv:2009.03937v1 [stat.ME], 8 Sep 2020.

[23] Y. Slaoui. Large and moderate principles for recursive kernel density estimators defined by stochastic approximation method. Serdica Math. J., (2013), 53-82.

[24] W. Stute. A law of the logarithm for kernel density estimators. Ann. Probab., (1982), 414-422. MR647513

[25] W. Stute. The oscillation behavior of empirical processes: the multivariate case. Ann. Probab., (1984), 361-379. MR735843

[26] A. B. Tsybakov. Recurrent estimation of the mode of a multidimensional distribution. Probl. of Inf. Transm., (1990), 119-126.

[27] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.

[28] E. J. Wegman and H. I. Davies. Remarks on some recursive estimators of a probability density. The Annals of Statistics, 316-327, 1979.

[29] C. Wolverton and T. J. Wagner. Asymptotically optimal discriminant functions for pattern classification. IEEE Trans. Inform. Theory, 15, 1969.