Tilted Nonparametric Regression Function Estimation
Farzaneh Boroumand, Mohammad T. Shakeri, Nino Kordzakhia, Mahdi Salehi, Hassan Doosti
Department of Mathematics and Statistics, Faculty of Science and Engineering, Macquarie University, NSW, Australia ([email protected])
Department of Biostatistics, Health School, Mashhad University of Medical Sciences, Mashhad, Iran ([email protected])
([email protected])
Department of Mathematics and Statistics, University of Neyshabur, Neyshabur, Iran ([email protected])
* Corresponding author: [email protected]
Abstract
This paper develops the convergence-rate theory for the tilted version of a linear smoother. We study the tilted linear smoother, a nonparametric regression function estimator obtained by minimizing the distance to an infinite-order flat-top trapezoidal kernel estimator. We prove that the proposed estimator achieves a high level of accuracy while preserving the attractive properties of the infinite-order flat-top kernel estimator. We also present an extensive numerical study of the finite-sample performance of two members of the tilted linear smoother class, the tilted Nadaraya-Watson and tilted local linear estimators. The simulation study shows that, under some conditions, the tilted Nadaraya-Watson and tilted local linear estimators outperform their classical analogues in terms of Mean Integrated Squared Error (MISE). Finally, the performance of these estimators, as well as of the conventional estimators, is illustrated by curve fitting to COVID-19 data for 12 countries and to a dose-response data set.
Keywords—
Tilted estimators; Nonparametric regression function estimation; Rate of convergence; Infinite-order flat-top kernels
1 Introduction

Let the regression model be

$$Y_i = r(X_i) + \epsilon_i, \quad 1 \le i \le n, \qquad (1.1)$$

where $(Y_1, X_1), \ldots, (Y_n, X_n)$ are the data pairs, the design variable $X \sim f_X$, $X$ and $\epsilon$ are independent, and the $\epsilon_i$'s are independent and identically distributed (iid) errors with zero mean, $E(\epsilon) = 0$, and variance $E(\epsilon^2) = \sigma^2$. The regression function $r$ and the design density $f_X$ are unknown. In this paper, we focus on a nonparametric approach to estimating $r$. The main subject of this study is the class of nonparametric estimators called linear smoothers; the Nadaraya-Watson estimator and the local linear estimator are two prevailing members of this class. An estimator $\breve r$ of $r$ is said to be a linear smoother if it can be written as a linear function of the weighted sample $Y$. Let the weight vector be $l(x) = (l_1(x), \ldots, l_n(x))^T$. Then the linear smoother $\breve r_n$ can be written as

$$\breve r_n(x) = l(x)^T Y = \sum_{i=1}^{n} l_i(x) Y_i, \qquad (1.2)$$

where $\sum_{i=1}^{n} l_i(x) = 1$; see Buja et al. [1]. The Nadaraya-Watson and local linear estimators are linear smoothers with the following weight functions. For the Nadaraya-Watson smoother, see Nadaraya [2] and Watson [3],

$$l_{i,NW}(x) = \frac{K\left(\frac{X_i - x}{h}\right)}{\sum_{j=1}^{n} K\left(\frac{X_j - x}{h}\right)}, \quad i = 1, \ldots, n. \qquad (1.3)$$

For the standard local linear smoother the weight functions are defined as

$$l_{i,ll}(x) = \frac{b_i(x)}{\sum_{j=1}^{n} b_j(x)}, \quad i = 1, \ldots, n, \qquad (1.4)$$

$$b_i(x) = K\left(\frac{X_i - x}{h}\right)\bigl(S_{n,2}(x) - (X_i - x) S_{n,1}(x)\bigr), \quad i = 1, \ldots, n,$$

$$S_{n,j}(x) = \sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right)(X_i - x)^j, \quad j = 1, 2,$$

where $K$ is a kernel function. The kernel function depends on the bandwidth, or smoothing, parameter $h$ and assigns weights to the observations according to their distance to the target point $x$; see McMurry and Politis [4].
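The weight constructions above can be made concrete with a short sketch in Python (an illustration only, not the authors' code; the Gaussian kernel and all function names are our own choices):

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel (one admissible choice of K)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw_weights(x, X, h, kernel=gaussian_kernel):
    """Nadaraya-Watson weights l_{i,NW}(x) from (1.3)."""
    k = kernel((X - x) / h)
    return k / k.sum()

def ll_weights(x, X, h, kernel=gaussian_kernel):
    """Local linear weights l_{i,ll}(x) from (1.4)."""
    k = kernel((X - x) / h)
    d = X - x
    s1 = np.sum(k * d)         # S_{n,1}(x)
    s2 = np.sum(k * d ** 2)    # S_{n,2}(x)
    b = k * (s2 - d * s1)      # b_i(x)
    return b / b.sum()

def linear_smoother(x, X, Y, h, weights=nw_weights):
    """Generic linear smoother (1.2): estimate r(x) by sum_i l_i(x) Y_i."""
    return float(np.dot(weights(x, X, h), Y))
```

Both weight vectors sum to one, so either smoother reproduces constant functions exactly; the local linear weights additionally reproduce linear functions.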
Small values of $h$ give the neighbouring points of $x$ a larger influence on the estimate, leading to curvature changes in the estimated curve. Larger values of $h$ imply that distant data points have the same effect on the local fit as neighbouring points, resulting in a smoother estimate. Thus finding an optimal $h$ is an essential task in the estimation procedure; see Wasserman [5]. One way of finding the optimal $h$ is by minimising the leave-one-out cross-validation score function [5], defined by

$$CV = \hat R(h) = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \breve r_{(-i)}(X_i)\right)^2, \qquad (1.5)$$

where $\breve r_{(-i)}(X_i)$ is obtained from (1.2) by omitting the $i$-th pair $(X_i, Y_i)$.

In this work, we present tilted versions of linear smoothers. A tilting technique applied to an empirical distribution replaces the uniform data weights $1/n$ by $p_i$, $1 \le i \le n$, from a general multinomial distribution over the data. Hall and Yao [6] studied asymptotic properties of the tilted regression estimator with autoregressive errors using the generalized empirical likelihood method, which typically involves solving a non-linear, high-dimensional optimization problem. Grenander [7] introduced a tilting method to impose restrictions on density estimates. There are two approaches to estimating the tilting parameters: the empirical likelihood approach and the distance measure approach. The empirical likelihood-based method is a semi-parametric method which provides the convenience of adding a parametric model through estimating equations. Owen [8] proposed the empirical likelihood as an alternative to likelihood ratio tests and derived its asymptotic distribution. Chen [9], Zhang [10], Schick et al. [11], and Müller et al. [12] further developed the empirical likelihood-based method for estimating the tilting parameters.
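The leave-one-out score (1.5) can be evaluated by refitting the smoother $n$ times, once with each pair withheld. A minimal sketch, assuming Gaussian-kernel Nadaraya-Watson weights and a finite candidate grid for $h$ (both our own choices, not prescribed by the paper):

```python
import numpy as np

def _nw_weights(x, X, h):
    """Gaussian-kernel Nadaraya-Watson weights (1.3); a stand-in for any
    linear-smoother weight function."""
    k = np.exp(-0.5 * ((X - x) / h) ** 2)
    return k / k.sum()

def loocv_score(h, X, Y, weights_fn=_nw_weights):
    """Leave-one-out CV score (1.5): each (X_i, Y_i) is predicted from a
    fit on the remaining n - 1 points."""
    n = X.size
    resid = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        resid[i] = Y[i] - np.dot(weights_fn(X[i], X[mask], h), Y[mask])
    return float(np.mean(resid ** 2))

def select_bandwidth(X, Y, grid, weights_fn=_nw_weights):
    """Pick the h in `grid` minimising the LOOCV score."""
    scores = [loocv_score(h, X, Y, weights_fn) for h in grid]
    return grid[int(np.argmin(scores))]
```

In practice the grid would be refined, or the score minimised by a one-dimensional optimiser, but the exhaustive search keeps the logic of (1.5) explicit.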
Chen [9] applied the empirical likelihood method to estimate the tilting parameters $p_i$, $1 \le i \le n$, under constraints on the shape of the distribution. In his kernel-based estimator, $n^{-1}$ was replaced by the weights obtained from the empirical likelihood method. In [9] it was proved that the proposed estimator has a smaller variance than the conventional kernel estimators. Schick et al. [11] used a similar approach to obtain a consistent tilted estimator with higher efficiency than conventional estimators in the autoregression framework. In contrast, in the distance measure approach the tilted estimators are defined by minimizing distances subject to various types of constraints. Hall and Presnell [13], Hall and Huang [14], Carroll et al. [15], Doosti and Hall [16], and Doosti et al. [17] used setup-specific distance measure approaches for estimating the tilting parameters. Carroll et al. [15] proposed a new approach to density function estimation, regression function estimation, and hypothesis testing under shape constraints in a model with measurement errors; the tilting method used in [15] led to curve estimators satisfying certain constraints. Doosti and Hall [16] introduced a new higher-order nonparametric density estimator using the tilting method, with an $L_2$-metric between the proposed estimator and a consistent 'sinc'-kernel-based estimator. Doosti et al. [17] introduced a new way of choosing the bandwidth and estimating the tilting parameters based on a cross-validation function, and showed that the resulting density estimator had improved efficiency and was more cost-effective than the conventional kernel-based estimators considered there.

In this work, we propose a new tilted version of a linear smoother, obtained by minimising the distance to a comparator estimator. The comparator estimator is selected to be an infinite-order flat-top kernel estimator.
This class of estimators is characterized by a Fourier transform which is flat near the origin and infinitely differentiable elsewhere; see [18]. We prove that the tilted estimators achieve a high level of accuracy while preserving the attractive properties of an infinite-order flat-top kernel estimator.

The rest of this paper contains four additional sections and an Appendix. In Section 2, we provide the notation, definitions, and preliminary results; Section 2 also includes the definition of an infinite-order estimator, used as the comparator. Section 3 contains the main results, formulated in Theorems 1-3. We present a simulation study in Section 4. The real data applications are provided in Section 5. The proof of the main theorem is given in the Appendix.

2 Preliminaries

Definition 2.1.
A general infinite-order flat-top kernel $K$ is defined by

$$K(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \lambda(s)\, e^{-isx}\, ds, \qquad (2.1)$$

where $\lambda(s)$ is the Fourier transform of the kernel $K$ and, for some $c > 0$,

$$\lambda(s) = \begin{cases} 1, & |s| \le c, \\ g(|s|), & |s| > c, \end{cases}$$

where $g$ is not unique and should be chosen so that $\lambda(s)$, $\lambda^2(s)$, and $s\lambda(s)$ are integrable [4].

2.1 Infinite-order flat-top kernel regression estimator

Let $\check r$ be the linear smoother

$$\check r(x) = \sum_{i=1}^{n} \check l_i(x)\, Y_i, \qquad (2.2)$$

where

$$\check l_i(x) = \frac{K\left(\frac{X_i - x}{h}\right)}{\sum_{j=1}^{n} K\left(\frac{X_j - x}{h}\right)}$$

and $K$ is an infinite-order flat-top kernel from (2.1); see also McMurry and Politis [4]. The trapezoidal kernel

$$K(x) = \frac{2\left(\cos(x/2) - \cos(x)\right)}{\pi x^2}$$

is an infinite-order flat-top kernel satisfying Definition 2.1, since its Fourier transform is

$$\lambda(s) = \begin{cases} 1, & |s| \le 1/2, \\ 2(1 - |s|), & 1/2 < |s| \le 1, \\ 0, & |s| > 1. \end{cases}$$

2.2 Tilted linear smoother

We define the tilted linear smoother as

$$\hat r_n(x \mid h, p) = \sum_{i=1}^{n} p_i\, l_i(x)\, Y_i, \qquad (2.3)$$

where the $p_i$'s are tilting parameters, $p_i \ge 0$, $\sum_{i=1}^{n} p_i = 1$. The bandwidth parameter $h$ and the vector of tilting parameters $p = (p_1, \ldots, p_n)$ are to be estimated. In Section 4, we evaluate the finite-sample performance of the tilted versions of the Nadaraya-Watson (1.3) and standard local linear (1.4) estimators.

3 Main results

Let $\hat r_n(\cdot \mid \theta)$ be the tilted linear smoother from (2.3) for the regression function $r$, where $\theta = (h, p)$ is the vector of unknown parameters. Further, $\check r$ from (2.2) will be used as a comparator estimator of $r$; $\check r$ can be any estimator with an optimal convergence rate [18]. We estimate $\theta$ by minimising the $L_2$-distance between $\hat r_n(\cdot \mid \theta)$ and $\check r$, preserving the convergence rate of $\check r$, provided the following assumptions hold:

(a) $\|\check r - r\| = O_p(\delta_n)$;

(b) there exists $\tilde\theta$ such that $\hat r_n(\cdot \mid \tilde\theta)$ and $\check r$ possess the same convergence rate, i.e. $\|\hat r_n(\cdot \mid \tilde\theta) - r\| = O_p(\delta_n)$,

where $\delta_n \to 0$ as $n \to \infty$, e.g. $\delta_n = n^{-c}$ for some $c \in (0, 1/2)$. We define $\hat\theta$ as the solution of the optimisation problem

$$\hat\theta = \arg\min_{\theta} \|\hat r_n(\cdot \mid \theta) - \check r\|, \qquad (3.1)$$

subject to the constraint $h > 0$ and the constraints on $p$ introduced in Section 2.2. In Theorem 1, we show that the rate of convergence of $\hat r_n(\cdot \mid \hat\theta)$ to $r$ is $O_p(\delta_n)$.

Theorem 1.
If assumptions (a)-(b) hold, then for any $\hat\theta$ which fulfils (3.1) we have

$$\|\hat r_n(\cdot \mid \hat\theta) - r\| = O_p(\delta_n).$$

Proof.
By assumption (b), there exists $\tilde\theta$ such that

$$\|\hat r_n(\cdot \mid \tilde\theta) - \check r\| \le \|\hat r_n(\cdot \mid \tilde\theta) - r\| + \|r - \check r\| = O_p(\delta_n),$$

where the inequality follows from the triangle inequality and the rate follows from assumption (a), namely

$$\|r - \check r\| = O_p(\delta_n). \qquad (3.2)$$

Since $\hat\theta$ minimises (3.1), with $\tilde\theta$ as in assumption (b),

$$\|\hat r_n(\cdot \mid \hat\theta) - \check r\| \le \|\hat r_n(\cdot \mid \tilde\theta) - \check r\| = O_p(\delta_n). \qquad (3.3)$$

Together, results (3.2) and (3.3) imply Theorem 1.

Theorem 1 implies that the convergence rate of the estimator $\hat r_n(\cdot \mid \hat\theta)$ coincides with that of $\check r$, with the bandwidth parameter $h$ replaced by its 'plug-in' type estimate, similar to those in [4] and [18]. The regression function $r$ belongs to the class $\mathcal C$ of regression functions if

$$\lim_{C\to\infty} \limsup_{n\to\infty} \sup_{r\in\mathcal C}\left[ P\left\{\|\hat r_n(\cdot \mid \tilde\theta) - r\| \ge C\delta_n\right\} + P\left\{\|\check r - r\| \ge C\delta_n\right\} \right] = 0, \qquad (3.4)$$

subject to the existence of $\tilde\theta$.

Theorem 2.
If (3.4) holds for regression functions from $\mathcal C$, then

$$\lim_{C\to\infty} \limsup_{n\to\infty} \sup_{r\in\mathcal C} P\left\{\|\hat r_n(\cdot \mid \hat\theta) - r\| \ge C\delta_n\right\} = 0. \qquad (3.5)$$

Theorem 2 states that $\hat r_n(\cdot \mid \hat\theta)$ and $\check r$ converge to $r$ uniformly over $\mathcal C$.

Let $X_1, X_2, \ldots, X_n$ be iid random variables with probability density function (pdf) $f(x)$, let

$$\hat g(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

be its kernel-based density estimator, and let $g(x) = E_f\, \hat g(x)$. Suppose that (c)-(d) hold for $\phi_K$ and $\phi_q$, the Fourier transforms of $K$ and of $q = r \cdot g$, respectively:

(c) $\phi_K(t)^{-1} = 1 + \sum_{j=1}^{k} c_j t^{2j}$, where $c_1, \ldots, c_k$ are real numbers;

(d) $\int |\phi_q(t)|\, |t|^{2k}\, dt < \infty$;

(e) for constants $C_1, \ldots, C_6 > 0$ and $j = 1, \ldots, k$, the derivatives $q^{(2j)}$ exist, $|q^{(2j)}(x)| \le C_1$ and $\int |q^{(2j)}(x)|\, dx \le C_1$, and either

(1) $|q^{(2j)}(x)/(r \cdot f)(x)| \le C_2$ for all $x$, or

(2) $|q^{(2j)}(x)/(r \cdot f)(x)| \le C_2(1 + |x|)^{C_3}$ for all $x$, and $P(|X| \ge x) \le C_4\exp(-C_5 x^{C_6})$ for $x > 0$;

(f) $\delta_n \to 0$ as $n \to \infty$ such that, under assumption (e)-(1), $n^{1/2}\delta_n \to \infty$, and, under assumption (e)-(2), $n^{1/2}(\log n)^{-C_3/C_6}\delta_n \to \infty$, where $C_3$ and $C_6$ are defined in (e)-(2).

Assumptions (c)-(f) are reasonable; they were considered for tilted density function estimation in [16]. It is anticipated that $\|\hat r_n(\cdot \mid \hat\theta) - r\| = O_p(\delta_n)$, where $\delta_n$ converges to 0 more slowly than $n^{-1/2}$, as shown in Theorem 3. Next, we formulate an assumption using the first term of the expression on the left-hand side of (3.4):

$$\lim_{C\to\infty} \limsup_{n\to\infty} \sup_{r\in\mathcal C} P\left\{\|\check r - r\| > C\delta_n\right\} = 0. \qquad (3.6)$$

Theorem 3.
Let (a)-(f) be valid and let $\hat\theta$ be defined by (3.1).

I. If, in addition, $\|\check r - r\| = O_p(\delta_n)$, then $\|\hat r_n(\cdot \mid \hat\theta) - r\| = O_p(\delta_n)$.

II.
If the assumptions in (e) hold uniformly for $q \in \mathcal C$ and (3.6) holds, then (3.5) is valid.

The proof of Theorem 3 is given in the Appendix.

4 Simulation study

We present the results of a simulation study of the performance of the tilted estimators in various settings. Data were generated using exponential and sine regression functions with normal and uniform design distributions. Four sample sizes, $n = 60, 200, \ldots, 1000$, and several error standard deviations, $\sigma = 0.3, 0.5, 0.7, \ldots$, were considered; for each pair $(\sigma, n)$ we generated 500 data sets. The Mean Integrated Squared Error (MISE) was estimated using the Monte Carlo method. The leave-one-out cross-validation score from (1.5) was employed to choose the optimal bandwidths for the Nadaraya-Watson and local linear estimators [5]. For the infinite-order flat-top kernel estimator, the bandwidth was selected using the rule of thumb introduced by McMurry and Politis [4] as part of 'iosmooth'. The bandwidth parameters for the tilted estimators were estimated using our suggested procedure. The exponential regression function was $r(x) = x + 4\exp(-2x^2)/\sqrt{2\pi}$, with design densities uniform on $[-2, 2]$ and $N(0, 1)$. The sine regression function $r(x) = \sin(4\pi x)$ was paired with the uniform design density on $[0, 1]$.

Table 1 presents the results for the exponential regression function with normal design density and normally distributed error terms. It is evident that for the moderate sample size ($n = 200$), the large sample size ($n = 1000$), and the medium standard deviations (0.5 and 0.7), the tilted estimators outperformed the other estimators. Moreover, for larger sample sizes, as the standard deviation of the error terms increases, the MISE of the tilted estimators decreases.

Table 1: MISE for the Infinite Order (IO) estimator with the trapezoidal kernel, the Nadaraya-Watson (NW) estimator, the standard local linear (LL) estimator, the tilted NW estimator with 4 (NW p4) and 10 (NW p10) weighting nodes, and the tilted LL estimator with 4 (LL p4) and 10 (LL p10) weighting nodes; exponential regression function and normal design density. In each row the minimum MISE is highlighted in bold.

Table 2 presents the results for the exponential regression function with uniform design density and normally distributed error terms. For fixed sample size, as the standard deviation increases, the tilted estimators outperform the others, although for large sample sizes the conventional estimators tend to perform better than the tilted estimators. For smaller sample sizes and moderate standard deviation levels, the tilted NW estimator remains somewhat superior to the conventional estimators.

Table 2: MISE for the estimators listed in Table 1; exponential regression function and uniform design density. In each row the minimum MISE is highlighted in bold.

Table 3: MISE for the estimators listed in Table 1; sine regression function. In each row the minimum MISE is highlighted in bold.

Table 4: MISE for the estimators listed in Table 1; sine regression function. In each row the minimum MISE is highlighted in bold.

For several $(n, \sigma)$ combinations, including $(n = 200, \sigma = 0.7)$ and the larger values of $\sigma$ at $n = 1000$, the tilted Nadaraya-Watson estimator outperformed its classical counterpart. From the boxplots in Figure 1 it is evident that the tilted estimators have smaller median ISEs, and the extreme ISE values for the tilted estimators are smaller than those of the conventional estimators. The similarity between the ISE distributions, and between their spreads, of the IO and tilted estimators can also be seen in Figure 1.

Figure 1: Boxplots of Integrated Squared Errors (ISE) for the Infinite Order (IO) estimator with the trapezoidal kernel, the Nadaraya-Watson (NW) estimator, the standard local linear (LL) estimator, the tilted NW estimator with 4 (NW p4) and 10 (NW p10) weighting nodes, and the tilted LL estimator with 4 (LL p4) and 10 (LL p10) weighting nodes; sine regression function, edges excluded, $n = 1000$.

5 Real data applications

In this section, we study the performance of the tilted estimators in the real data environment.
5.1 COVID-19 data

Table 5: COVID-19 cases: MSE for Infinite Order (IO), Nadaraya-Watson (NW), and tilted (NW p4) estimators. In each row the minimum MSE is highlighted in bold.

Country IO NW NW p4
Iran 324 326
Australia 5.48 5.41
Italy 445 1590
Belgium 2357 3071
Germany 641 1029
Spain 41621 41754
Brazil 108590 108246
United Kingdom 1789 2083
Canada 175 183
Chile 6040 6685
South Africa 1158 1403
United States of America 43910 46512
Table 6: COVID-19 deaths: MSE for Infinite Order (IO), Nadaraya-Watson (NW), and tilted (NW p4) estimators. In each row the minimum MSE is highlighted in bold.
Country IO NW NW p4
Iran 1.36 1.39
Australia 0.0225 0.0223
Italy 1.97 3.66
Belgium 0.0699 0.3077
Germany 0.71 0.82
Spain 37.74 37.92
Brazil 63.99 64.21
United Kingdom 13.57 14.50
Canada 0.40 0.47
Chile 9.10 9.73
South Africa 3.32 3.50
United States of America 201.39 205.98

5.2 Dose-response data

The dose-response data refer to a study of phenylephrine effects on rat corpus cavernosum strips. These data first appeared in Boroumand et al. [21], where the dose-response curves to phenylephrine (0.1 µM to 300 µM) were obtained by applying the robust four-parameter logistic (4PL) regression. Here we have used the tilted smoother approach for dose-response curve fitting. In terms of Mean Squared Error (MSE), the tilted local linear estimator performed better than the local linear and infinite-order flat-top kernel estimators, as well as the robust 4PL model. The fitted dose-response curves using the tilted local linear, local linear, infinite-order flat-top kernel, and 4PL estimators are plotted in Figure 4; the corresponding MSEs are listed in its caption.

Figure 4: Dose-response curves. MSEs for the tilted local linear, local linear, infinite-order flat-top kernel estimator, and 4PL model are 95.1023, 95.3267, 95.53077, and 110.7539, respectively.

The original dose-response data contained outliers, and the standard 4PL model had a poor fit; for this reason we compared the performance of the tilted estimator with the robust 4PL model. The tilted local linear estimator outperformed the robust 4PL model in terms of MSE.

Acknowledgement
This research was undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government. This research forms part of the first author's PhD thesis, approved by the ethics committee of Mashhad University of Medical Sciences with project code 971017.

References

[1] Andreas Buja, Trevor Hastie, and Robert Tibshirani. "Linear smoothers and additive models". In: The Annals of Statistics 17.2 (1989), pp. 453–510.
[2] Elizbar A. Nadaraya. "On estimating regression". In: Theory of Probability & Its Applications 9.1 (1964), pp. 141–142.
[3] Geoffrey S. Watson. "Smooth regression analysis". In: Sankhyā: The Indian Journal of Statistics, Series A 26.4 (1964), pp. 359–372.
[4] Timothy L. McMurry and Dimitris N. Politis. "Nonparametric regression with infinite order flat-top kernels". In: Journal of Nonparametric Statistics 16.3-4 (2004), pp. 549–562.
[5] Larry Wasserman. All of Nonparametric Statistics. Springer Science & Business Media, 2006.
[6] Peter Hall and Qiwei Yao. "Data tilting for time series". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65.2 (2003), pp. 425–442.
[7] Ulf Grenander. "On the theory of mortality measurement, Part II". In: Scandinavian Actuarial Journal 1956.2 (1956), pp. 125–153.
[8] Art B. Owen. "Empirical likelihood ratio confidence intervals for a single functional". In: Biometrika 75.2 (1988), pp. 237–249.
[9] Song Xi Chen. "Empirical likelihood-based kernel density estimation". In: Australian Journal of Statistics 39.1 (1997), pp. 47–56.
[10] Biao Zhang. In: Communications in Statistics—Theory and Methods.
[11] Anton Schick et al. In: Communications in Statistics—Theory and Methods.
[12] Ursula U. Müller, Anton Schick, and Wolfgang Wefelmeyer. In: Statistica Sinica 15 (2005), pp. 177–195.
[13] Peter Hall and Brett Presnell. "Intentionally biased bootstrap methods". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61.1 (1999), pp. 143–158.
[14] Peter Hall and Li-Shan Huang. "Nonparametric kernel regression subject to monotonicity constraints". In: The Annals of Statistics 29.3 (2001), pp. 624–647.
[15] Raymond J. Carroll, Aurore Delaigle, and Peter Hall. "Testing and estimating shape-constrained nonparametric density and regression in the presence of measurement error". In: Journal of the American Statistical Association 106.493 (2011).
[16] Hassan Doosti and Peter Hall. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[17] Hassan Doosti et al. In: Journal of Statistical Planning and Inference.
[18] In: REVSTAT–Statistical Journal.
[19] Wendy L. Martinez and Angel R. Martinez. Computational Statistics Handbook with MATLAB. Vol. 22. 2007.
[20] Peter Hall and Thomas E. Wehrly. "A geometrical method for removing edge effects from kernel-type nonparametric regression estimators". In: Journal of the American Statistical Association 86.415 (1991), pp. 665–672.
[21] Farzaneh Boroumand et al. In: Iranian Journal of Pharmaceutical Sciences.

Appendices
A Proof of Theorem 3
In this section we provide the proof of Theorem 3 for the tilted Nadaraya-Watson estimator, as a member of the class of tilted linear smoothers (2.3). The result for the tilted local linear smoother can be proved analogously.
Proof.
Let $p_i = n^{-1}\pi(X_i)$, where $\pi \ge 0$ and $\sum_{i=1}^{n} p_i = 1$, which for continuous $X$ is equivalent to $\int \pi(x) f_X(x)\, dx = 1$. For simplicity, we write $f$ for $f_X(x)$. Then

$$\sum_{i=1}^{n} p_i = \frac{1}{n}\sum_{i=1}^{n}\pi(X_i) = E\pi(X) + O_p(n^{-1/2}) = \int \pi(x) f\, dx + O_p(n^{-1/2}) = 1 + O_p(n^{-1/2}),$$

thus, to normalise the $p_i$'s so that $\sum_i p_i = 1$, we need to multiply the estimator (2.3) by $1 + O_p(n^{-1/2})$; the factor $O_p(n^{-1/2})$ is negligibly small. We choose $\pi$ such that $\hat r_n(x \mid h, p)$ in (2.3) is an unbiased estimator of $r$, i.e.

$$E\, \hat r(x \mid h, p) = r(x). \qquad (A.1)$$

From (A.1) we have

$$E\, \hat r(x)\hat g(x) = \frac{1}{h}\sum_{i=1}^{n} E\, p_i Y_i K\left(\frac{x - X_i}{h}\right) = \frac{1}{nh}\sum_{i=1}^{n} E\, E\left\{Y_i \pi(X_i) K\left(\frac{x - X_i}{h}\right) \Bigm| X_i\right\} = \frac{1}{h}\int_{-\infty}^{\infty} r(t)\pi(t) K\left(\frac{x - t}{h}\right) f_X(t)\, dt, \qquad (A.2)$$

where

$$\hat g(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

and $g(x) = E\,\hat g(x)$. It can be shown that the left-hand side of (A.2) converges to $r(x) g(x)$. We have

$$r(x) g(x) = \frac{1}{h}\int_{-\infty}^{\infty} r(t)\pi(t) K\left(\frac{x - t}{h}\right) f_X(t)\, dt;$$

multiplying both sides by $e^{-itx}$ and integrating over $x$, we deduce

$$\Phi_{rg}(t) = \frac{1}{h}\int_{-\infty}^{\infty}\int e^{-itx}\, (r\pi f)(s)\, K\left(\frac{x - s}{h}\right) dx\, ds.$$

By the change of variable $u = (x - s)/h$, we have

$$\Phi_{rg}(t) = \Phi_K(ht)\, \Phi_{r\pi f}(t),$$

hence

$$\Phi_{r\pi f}(t) = \frac{\Phi_{rg}(t)}{\Phi_K(ht)}, \qquad (\pi r f)(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\, \frac{\Phi_{rg}(t)}{\Phi_K(ht)}\, dt,$$

$$\pi(x) = \frac{1}{2\pi\, r(x) f(x)}\int_{-\infty}^{\infty} e^{-itx}\, \frac{\Phi_{rg}(t)}{\Phi_K(ht)}\, dt. \qquad (A.3)$$

If the kernel $K$ satisfies assumption (c) and $q = r \cdot g$ satisfies assumption (d), then

$$\pi = 1 + \sum_{j=1}^{k} c_j (-h^2)^j\, \frac{(rg)^{(2j)}}{rf}, \qquad (A.4)$$

with $\pi$ from (A.3), and $\hat r_n$ is unbiased. Next, we show that $\pi$ satisfies $0 < \pi(X) < C$. There exists $h_0 \ge 0$ such that $\pi > 0$ for all $h$ with $0 \le h \le h_0$. If $\sup \pi \le C < \infty$, then for the unbiased $\hat r_n$

$$\int \mathrm{var}\{\hat r_n(x \mid h, p)\}\, dx \le \frac{1}{nh^2}\int E\left\{\pi(X)^2 K^2\left(\frac{x - X}{h}\right)\right\} dx \le \frac{1}{nh}\,(\sup \pi)^2 \int K^2\, dx = O\{(nh)^{-1}\}. \qquad (A.5)$$

So the MSE can be written as

$$MSE\{\hat r_n(x \mid h, p)\} = \int E\{\hat r_n(x \mid h, p) - r(x)\}^2\, dx = O\{(nh)^{-1}\}.$$

We recall that

(f) under assumption (e)-(1), $\delta_n \to 0$ as $n \to \infty$ so that $n^{1/2}\delta_n \to \infty$, and, under assumption (e)-(2), $n^{1/2}(\log n)^{-C_3/C_6}\delta_n \to \infty$, where $C_3$ and $C_6$ are defined in (e)-(2).

Then, under assumptions (e)-(1) and (f), we have $n^{1/2}\delta_n \to \infty$, and thus $n^{-1/2} = o(\delta_n)$. Consequently, there exists $h(n) \downarrow 0$ as $n \to \infty$ such that $(nh)^{-1} = O(\delta_n^2)$ for all large $n$, with $h < h_0$ since $0 \le h \le h_0$. Replacing the right-hand side of (A.5) by $O(\delta_n^2)$, which holds for the specific choice of $\pi$ defined in (A.3), and taking $\tilde\theta = (h, p)$ in the case of (2.3), we obtain

$$\lim_{C\to\infty}\limsup_{n\to\infty} P\left\{\|\hat r_n(\cdot \mid \tilde\theta) - r\| \ge C\delta_n\right\} = 0. \qquad (A.6)$$

For this version of $p_i = n^{-1}\pi(X_i)$, $\sum_i p_i = 1$ is not satisfied exactly; however, this can be fixed by normalisation, as in the first paragraph of the proof. Property (A.6) implies part I of Theorem 3, and part II follows from the uniformity of (A.6) over $\mathcal C$.

Under assumption (e)-(2) and (A.4), $|(rg)^{(2j)}(x)/(rf)(x)| \le C_2(1 + |x|)^{C_3}$ for $1 \le j \le k$. Defining $C = \max(|c_1|, \ldots, |c_k|)$, we have, for $0 \le h \le 1$,

$$|\pi(X) - 1| = \left|\sum_{j=1}^{k} c_j (-h^2)^j\, \frac{(rg)^{(2j)}}{rf}\right| \le k\, C\, C_2\, (1 + |x|)^{C_3}\, h^2, \qquad (A.7)$$

so, if $\lambda_n \to \infty$ and $\lambda_n^{C_3} h^2 \to 0$, then $\sup_{|X| \le \lambda_n} |\pi(X) - 1| \to 0$ and, in particular, $0 < \pi(X) < 2$ for $|X| \le \lambda_n$. Thus we have found an upper bound for $\pi(X)$ in (A.7) when $X \in [-\lambda_n, \lambda_n]$. Now we show that the probability of $X$ falling outside this interval is negligible, so that $0 < \pi(X) < C$ for all $X$:

$$P(|X| \ge \lambda_n) \le C_4 \exp(-C_5 \lambda_n^{C_6}). \qquad (A.8)$$

Using (f), $n^{-1/2}(\log n)^{C_3/C_6}\delta_n^{-1} \to 0$, or equivalently $\delta_n = \lambda_n n^{-1/2}(\log n)^{C_3/C_6}$ with $\lambda_n \to \infty$, where such a $\lambda_n$ exists. We choose $h$ so that $(nh)^{-1} = O(\delta_n^2)$, or, for simplicity, $(nh)^{-1} = \delta_n^2$; then $h = \lambda_n^{-2}(\log n)^{-2C_3/C_6}$. Taking $\tilde\lambda_n = \{\lambda_n^{2-\eta}\log n\}^{1/C_6}$ in (A.8), with $\eta \in (0, 2)$, gives

$$C_4\exp(-C_5\tilde\lambda_n^{C_6}) = C_4\exp(-C_5\lambda_n^{2-\eta}\log n) = O(n^{-C})$$

for all $C > 0$, which completes the proof.
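As a numerical sanity check of Definition 2.1 and of the trapezoidal kernel used throughout, one can invert the claimed Fourier transform $\lambda$, which is supported on $[-1, 1]$, and compare the result with the closed form $K(x) = 2(\cos(x/2) - \cos x)/(\pi x^2)$. A small sketch (grid size and tolerance are arbitrary choices of ours):

```python
import numpy as np

def lam(s):
    """Fourier transform of the trapezoidal kernel: equal to 1 on |s| <= 1/2
    (the flat top), 2(1 - |s|) on 1/2 < |s| <= 1, and 0 for |s| > 1."""
    a = np.abs(np.asarray(s, dtype=float))
    return np.where(a <= 0.5, 1.0, np.where(a <= 1.0, 2.0 * (1.0 - a), 0.0))

def K_closed(x):
    """Closed form K(x) = 2(cos(x/2) - cos x)/(pi x^2), with K(0) = 3/(4 pi)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.full(x.shape, 3.0 / (4.0 * np.pi))
    nz = np.abs(x) > 1e-8
    out[nz] = 2.0 * (np.cos(x[nz] / 2.0) - np.cos(x[nz])) / (np.pi * x[nz] ** 2)
    return out

def K_from_ft(x, m=20001):
    """Numerical inversion of (2.1) on [-1, 1], outside of which lambda vanishes;
    e^{-isx} reduces to cos(sx) because lambda is even."""
    s = np.linspace(-1.0, 1.0, m)
    f = lam(s) * np.cos(s * x)
    ds = s[1] - s[0]
    # composite trapezoidal rule
    return ds * (f.sum() - 0.5 * (f[0] + f[-1])) / (2.0 * np.pi)
```

The transform equals 1 on $|s| \le 1/2$, which is exactly the flat-top property, and the quadrature agrees with the closed form to high accuracy, including the value $K(0) = 3/(4\pi)$.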