# Tilted Nonparametric Regression Function Estimation

Farzaneh Boroumand, Mohammad T. Shakeri, Nino Kordzakhia, Mahdi Salehi, Hassan Doosti

Department of Mathematics and Statistics, Faculty of Science and Engineering, Macquarie University, NSW, Australia ([email protected]); Department of Biostatistics, Health School, Mashhad University of Medical Sciences, Mashhad, Iran ([email protected]; [email protected]); Department of Mathematics and Statistics, University of Neyshabur, Neyshabur, Iran ([email protected]).

Corresponding author: [email protected]

Abstract

This paper develops the convergence-rate theory for the tilted version of a linear smoother. We study the tilted linear smoother, a nonparametric regression function estimator obtained by minimizing the distance to an infinite order flat-top trapezoidal kernel estimator. We prove that the proposed estimator achieves a high level of accuracy while preserving the attractive properties of the infinite order flat-top kernel estimator. We also present an extensive numerical study of the finite-sample performance of two members of the tilted linear smoother class, the tilted Nadaraya-Watson and tilted local linear estimators. The simulation study shows that, under some conditions, the tilted Nadaraya-Watson and tilted local linear estimators perform better than their classical analogues in terms of Mean Integrated Squared Error (MISE). Finally, the performance of these estimators, as well as of the conventional estimators, is illustrated by curve fitting to COVID-19 data for 12 countries and to a dose-response data set.

**Keywords:** Tilted estimators; Nonparametric regression function estimation; Rate of convergence; Infinite order flat-top kernels

## 1 Introduction

Let the regression model be

$$Y_i = r(X_i) + \epsilon_i, \qquad 1 \le i \le n, \tag{1.1}$$

where $(Y_1, X_1), (Y_2, X_2), \ldots, (Y_n, X_n)$ are the data pairs, the design variable $X \sim f_X$, $X$ and $\epsilon$ are independent, and the $\epsilon_i$'s are independent and identically distributed (iid) errors with zero mean, $E(\epsilon) = 0$, and variance $E(\epsilon^2) = \sigma^2$. The regression function $r$ and the density $f_X$ are unknown. In this paper we focus on a nonparametric approach to estimating $r$. The main subject of this study is the class of nonparametric estimators called linear smoothers; the Nadaraya-Watson estimator and the local linear estimator are two prevailing members of this class. An estimator $\breve r$ of $r$ is said to be a linear smoother if it can be written as a linear function of the weighted sample $Y$. Let the weight vector be $l(x) = (l_1(x), \ldots, l_n(x))^T$. Then the linear smoother $\breve r$ can be written as

$$\breve r_n(x) = l(x)^T Y = \sum_{i=1}^n l_i(x) Y_i, \tag{1.2}$$

where $\sum_{i=1}^n l_i(x) = 1$; see Buja et al. [1]. The Nadaraya-Watson and local linear estimators can be written as linear smoothers with the following weight functions. For the Nadaraya-Watson smoother (see Nadaraya [2], Watson [3]),

$$l_{i,NW}(x) = \frac{K\big(\frac{X_i - x}{h}\big)}{\sum_{j=1}^n K\big(\frac{X_j - x}{h}\big)}, \qquad i = 1, \ldots, n. \tag{1.3}$$

For the standard local linear smoother the weight functions are defined as

$$l_{i,ll}(x) = \frac{b_i(x)}{\sum_{j=1}^n b_j(x)}, \qquad i = 1, \ldots, n, \tag{1.4}$$

$$b_i(x) = K\Big(\frac{X_i - x}{h}\Big)\big(S_{n,2}(x) - (X_i - x)\, S_{n,1}(x)\big), \qquad S_{n,j}(x) = \sum_{i=1}^n K\Big(\frac{X_i - x}{h}\Big)(X_i - x)^j, \quad j = 1, 2,$$

where $K$ is a kernel function. The kernel function depends on the bandwidth, or smoothing, parameter $h$ and assigns weights to the observations according to their distance to the target point $x$; see McMurry and Politis [4].
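For concreteness, these weight constructions can be sketched in a few lines (a minimal illustration only; the Gaussian kernel and all function names are our own choices, not from the paper):

```python
import numpy as np

def gauss(u):
    """Gaussian kernel, used here purely for illustration."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_weights(x, X, h, K):
    """Nadaraya-Watson weights l_{i,NW}(x) of (1.3)."""
    k = K((X - x) / h)
    return k / k.sum()

def ll_weights(x, X, h, K):
    """Standard local linear weights l_{i,ll}(x) of (1.4)."""
    k = K((X - x) / h)
    d = X - x
    S1 = np.sum(k * d)        # S_{n,1}(x)
    S2 = np.sum(k * d**2)     # S_{n,2}(x)
    b = k * (S2 - d * S1)     # b_i(x)
    return b / b.sum()

def linear_smoother(x, X, Y, h, K, weights=nw_weights):
    """Linear smoother (1.2): a weighted sum of the responses."""
    return np.dot(weights(x, X, h, K), Y)
```

Both weight vectors sum to one, so a constant response is reproduced exactly.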
Small values of $h$ give the neighbouring points of $x$ a larger influence on the estimate, leading to curvature changes in the estimated curve. Larger values of $h$ imply that distant data points have the same effect on the local fit as neighbouring points, resulting in a smoother estimate. Thus finding an optimal $h$ is an essential task in the estimation procedure; see Wasserman [5]. One way of finding the optimal $h$ is by minimising the leave-one-out cross-validation score function [5], defined by

$$CV = \hat R(h) = \frac{1}{n}\sum_{i=1}^n \big(Y_i - \breve r_{(-i)}(X_i)\big)^2, \tag{1.5}$$

where $\breve r_{(-i)}(X_i)$ is obtained from (1.2) by omitting the $i$-th pair $(X_i, Y_i)$.

In this work we present tilted versions of linear smoothers. A tilting technique applied to an empirical distribution replaces the uniform data weights $1/n$ by $p_i$, $1 \le i \le n$, from a general multinomial distribution over the data. Hall and Yao [6] studied asymptotic properties of the tilted regression estimator with autoregressive errors using the generalized empirical likelihood method, which typically involves solving a non-linear, high-dimensional optimization problem. Grenander [7] introduced a tilting method to impose restrictions on density estimates. There are two approaches to estimating the tilting parameters: the empirical likelihood approach and the distance measure based approach. The empirical likelihood-based method is a semi-parametric method which offers the convenience of adding a parametric model through estimating equations. Owen [8] proposed empirical likelihood as an alternative to likelihood ratio tests and derived its asymptotic distribution. Chen [9], Zhang [10], Schick et al. [11], and Müller et al. [12] further developed the empirical likelihood-based method for estimating the tilting parameters.
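The leave-one-out selection of $h$ in (1.5) can be sketched as a simple grid search for a Nadaraya-Watson smoother (a hedged illustration; the kernel, grid, and function names are our own choices):

```python
import numpy as np

def loocv_score(h, X, Y, K):
    """Leave-one-out CV score (1.5): average squared error when each
    (X_i, Y_i) is predicted from the remaining n-1 pairs."""
    n = len(X)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        w = K((X[mask] - X[i]) / h)
        r_minus_i = np.dot(w, Y[mask]) / w.sum()   # r_{(-i)}(X_i)
        total += (Y[i] - r_minus_i) ** 2
    return total / n

def select_bandwidth(X, Y, K, grid):
    """Return the bandwidth in `grid` minimising the LOOCV score."""
    scores = [loocv_score(h, X, Y, K) for h in grid]
    return grid[int(np.argmin(scores))]
```

In practice the grid should bracket the plausible bandwidth range; too small an $h$ can leave some leave-one-out neighbourhoods nearly empty.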
Chen [9] applied the empirical likelihood method to estimate the tilting parameters $p_i$, $1 \le i \le n$, under constraints on the shape of the distribution. In his kernel-based estimator, $n^{-1}$ was replaced by the weights obtained from the empirical likelihood method, and it was proved that the proposed estimator has a smaller variance than the conventional kernel estimators. Schick et al. [11] used a similar approach, obtaining a consistent tilted estimator with higher efficiency than that of conventional estimators in the autoregression framework. In contrast, in the distance measure approach the tilted estimators are defined by minimizing distances subject to various types of constraints. Hall and Presnell [13], Hall and Huang [14], Carroll et al. [15], Doosti and Hall [16], and Doosti et al. [17] used setup-specific distance measure approaches for estimating the tilting parameters. Carroll et al. [15] proposed a new approach to density function estimation, regression function estimation, and hypothesis testing under shape constraints in models with measurement errors; the tilting method used in [15] led to curve estimators satisfying those constraints. Doosti and Hall [16] introduced a new higher-order nonparametric density estimator using a tilting method, based on the $L_2$-metric between the proposed estimator and a consistent 'Sinc' kernel based estimator. Doosti et al. [17] introduced a new way of choosing the bandwidth and estimating the tilting parameters based on a cross-validation function, and showed that the proposed density function estimator had improved efficiency and was more cost-effective than the conventional kernel-based estimators studied there.

In this work we propose a new tilted version of a linear smoother, obtained by minimising the distance to a comparator estimator. The comparator estimator is selected to be an infinite order flat-top kernel estimator.
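As a rough sketch of this distance-minimisation idea (not the paper's actual algorithm): given a linear smoother's weight matrix evaluated on a grid, the tilting parameters $p$ can be fitted by minimising the squared $L_2$ distance to a comparator curve over the probability simplex. The exponentiated-gradient loop below is our own illustrative choice for enforcing the simplex constraint:

```python
import numpy as np

def fit_tilting(L, Y, r_check, n_iter=3000, lr=0.05):
    """Fit tilting weights p (p_i >= 0, sum_i p_i = 1) by minimising
        sum_j (sum_i p_i * L[j, i] * Y[i] - r_check[j])**2,
    i.e. the squared distance between the tilted smoother and a
    comparator curve r_check evaluated on a grid of x-values.
    Multiplicative (exponentiated-gradient) updates keep p on the simplex."""
    n = L.shape[1]
    A = L * Y                      # A[j, i] = l_i(x_j) * Y_i
    p = np.full(n, 1.0 / n)        # start from the untilted weights 1/n
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ p - r_check)
        p = p * np.exp(-lr * grad)
        p /= p.sum()               # re-project onto the simplex
    return p
```

Starting from $p_i = 1/n$ recovers the untilted smoother; the loop then tilts the weights toward the comparator curve.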
This class of estimators is characterized by a Fourier transform which is flat near the origin and infinitely differentiable elsewhere; see [18]. We prove that the tilted estimators achieve a high level of accuracy while preserving the attractive properties of an infinite-order flat-top kernel estimator.

The rest of this paper contains four additional sections and an Appendix. In Section 2 we provide the notation, definitions, and preliminary results; Section 2 also includes the definition of an infinite-order estimator used as the comparator estimator. Section 3 contains the main results, formulated in Theorems 1-3. We present a simulation study in Section 4. Real data applications are provided in Section 5. The proof of the main theorem is given in the Appendix.

## 2 Preliminaries

**Definition 2.1.**

A general infinite order flat-top kernel $K$ is defined by

$$K(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \lambda(s)\, e^{-isx}\, ds, \tag{2.1}$$

where $\lambda(s)$ is the Fourier transform of the kernel $K$ and, for some $c > 0$,

$$\lambda(s) = \begin{cases} 1, & |s| \le c, \\ g(|s|), & |s| > c; \end{cases}$$

$g$ is not unique and should be chosen to make $\lambda(s)$, $\lambda^2(s)$, and $s^2\lambda(s)$ integrable [4].

### 2.1 Infinite order flat-top kernel regression estimator

Let $\check r$ be the linear smoother

$$\check r(x) = \sum_{i=1}^n \check l_i(x)\, Y_i, \tag{2.2}$$

where

$$\check l_i(x) = \frac{K\big(\frac{X_i - x}{h}\big)}{\sum_{j=1}^n K\big(\frac{X_j - x}{h}\big)}$$

and $K$ is an infinite order flat-top kernel from (2.1); see also McMurry and Politis [4]. The trapezoidal kernel

$$K(x) = \frac{2\big(\cos(x/2) - \cos(x)\big)}{\pi x^2}$$

is an infinite order flat-top kernel satisfying Definition 2.1, since the Fourier transform of $K(x)$ is

$$\lambda(s) = \begin{cases} 1, & |s| \le 1/2, \\ 2(1 - |s|), & 1/2 < |s| \le 1, \\ 0, & |s| > 1. \end{cases}$$

### 2.2 Tilted linear smoother

We define the tilted linear smoother as

$$\hat r_n(x \mid h, p) = \sum_{i=1}^n p_i\, l_i(x)\, Y_i, \tag{2.3}$$

where the $p_i$'s are tilting parameters, $p_i \ge 0$ and $\sum_{i=1}^n p_i = 1$. The bandwidth parameter $h$ and the vector of tilting parameters $p = (p_1, \ldots, p_n)$ are to be estimated. In Section 4 we evaluate the finite-sample performance of the tilted versions of the Nadaraya-Watson (1.3) and standard local linear (1.4) estimators.

## 3 Main results

Let $\hat r_n(\cdot \mid \theta)$ be the tilted linear smoother from (2.3) for the regression function $r$, where $\theta = (h, p)$ is the vector of unknown parameters. Further, $\check r$ from (2.2) will be used as a comparator estimator of $r$; $\check r$ can be any estimator with an optimal convergence rate [18]. We estimate $\theta$ by minimising the $L_2$-distance between $\hat r_n(\cdot \mid \theta)$ and $\check r$, preserving the convergence rate of $\check r$, provided the following assumptions hold:

(a) $\|\check r - r\|_2 = O_p(\delta_n)$;

(b) there exists $\tilde\theta$ such that $\hat r_n(\cdot \mid \tilde\theta)$ and $\check r$ possess the same convergence rate, i.e. $\|\hat r_n(\cdot \mid \tilde\theta) - r\|_2 = O_p(\delta_n)$;

where $\delta_n \to 0$ as $n \to \infty$, e.g. $\delta_n = n^{-c}$ for some $c \in (0, 1/2)$. We define $\hat\theta$ as the solution to the optimisation problem

$$\hat\theta = \arg\min_\theta \|\hat r_n(\cdot \mid \theta) - \check r\|_2, \tag{3.1}$$

subject to the constraint $h > 0$ and the constraints on $p$ introduced in Section 2.2. In Theorem 1 we show that the convergence rate of $\hat r_n(\cdot \mid \hat\theta)$, like that of $\check r$, is $O_p(\delta_n)$.

**Theorem 1.**

*If assumptions (a)-(b) hold, then for any $\hat\theta$ which fulfils (3.1) we have*

$$\|\hat r_n(\cdot \mid \hat\theta) - r\|_2 = O_p(\delta_n).$$

*Proof.*

By assumptions (a) and (b), there exists $\tilde\theta$ such that

$$\|\hat r_n(\cdot \mid \tilde\theta) - \check r\|_2 \le \|\hat r_n(\cdot \mid \tilde\theta) - r\|_2 + \|r - \check r\|_2 = O_p(\delta_n),$$

in which the first inequality is a result of the triangle inequality, and the rate follows, in particular, from the fact that

$$\|r - \check r\|_2 = O_p(\delta_n); \tag{3.2}$$

see assumption (a). With $\tilde\theta$ as in assumption (b),

$$\|\hat r_n(\cdot \mid \hat\theta) - \check r\|_2 \le \|\hat r_n(\cdot \mid \tilde\theta) - \check r\|_2 = O_p(\delta_n), \tag{3.3}$$

since $\hat\theta$ minimises (3.1). Together, results (3.2) and (3.3) imply Theorem 1. □

Theorem 1 implies that the convergence rate of the estimator $\hat r_n(\cdot \mid \hat\theta)$ coincides with that of $\check r$, with the bandwidth parameter $h$ replaced by its 'plug-in' type estimate similar to that of [4] and [18]. The regression function $r$ belongs to $\mathcal C$, a class of regression functions, if

$$\lim_{C\to\infty} \limsup_{n\to\infty} \sup_{r\in\mathcal C}\Big[ P\big\{\|\hat r_n(\cdot \mid \tilde\theta) - r\|_2 \ge C\delta_n\big\} + P\big\{\|\check r - r\|_2 \ge C\delta_n\big\}\Big] = 0, \tag{3.4}$$

subject to the existence of $\tilde\theta$.

**Theorem 2.**

*If (3.4) holds for regression functions from $\mathcal C$, then*

$$\lim_{C\to\infty} \limsup_{n\to\infty} \sup_{r\in\mathcal C} P\big\{\|\hat r_n(\cdot \mid \hat\theta) - r\|_2 \ge C\delta_n\big\} = 0. \tag{3.5}$$

Theorem 2 states that $\hat r_n(\cdot \mid \hat\theta)$ and $\check r$ converge to $r$ uniformly in $\mathcal C$.

Let $X_1, X_2, \ldots, X_n$ be iid random variables with probability density function (pdf) $f(x)$, and let $\hat g(x)$ be its kernel-based density function estimator,

$$\hat g(x) = \frac{1}{nh}\sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big),$$

with $g(x) = E_f\, \hat g(x)$. Suppose that (c)-(d) hold for $\phi_K$ and $\phi_q$, the Fourier transforms of $K$ and of $q = r \cdot g$, respectively:

(c) $\phi_K(t)^{-1} = 1 + \sum_{j=1}^k c_j t^{2j}$, where $c_1, \ldots, c_k$ are real numbers;

(d) $\int |\phi_q(t)|\, |t|^{2k}\, dt < \infty$;

(e) for constants $C_1, \ldots, C_5 > 0$ and $j = 1, \ldots, k$ the derivatives $q^{(2j)}$ exist, $|q^{(2j)}(x)| \le C_1$ and $\int |q^{(2j)}(x)|\, dx \le C_1$, and either

&nbsp;&nbsp;&nbsp;&nbsp;(1) $|q^{(2j)}(x)/(r \cdot f)(x)| \le C_2$ for all $x$, or

&nbsp;&nbsp;&nbsp;&nbsp;(2) $|q^{(2j)}(x)/(r \cdot f)(x)| \le C_2(1 + |x|)^{C_3}$ for all $x$, and $P(|X| \ge x) \le C_4 \exp(-C_5 x^{C_3})$ for all $x > 0$;

(f) $\delta_n \to 0$ as $n \to \infty$ so that $n^{1/2}\delta_n \to \infty$ and, under assumption (e)-(2), $n^{1/2}(\log n)^{-C_3/C_5}\delta_n \to \infty$, where $C_3$ and $C_5$ are defined in (e)-(2).

Assumptions (c)-(f) are reasonable and were used for tilted density function estimation in [16]. It is anticipated that $\|\hat r_n(\cdot \mid \hat\theta) - r\|_2 = O_p(\delta_n)$, where $\delta_n$ converges to 0 more slowly than $n^{-1/2}$, as shown in Theorem 3. Next we formulate an assumption using the first term of the expression on the left-hand side of (3.4):

$$\lim_{C\to\infty} \limsup_{n\to\infty} \sup_{r\in\mathcal C} P\big(\|\check r - r\|_2 > C\delta_n\big) = 0. \tag{3.6}$$

**Theorem 3.**

*Let (a)-(f) be valid and let $\hat\theta$ be defined by (3.1).*

*I. If, in addition, $\|\check r - r\|_2 = O_p(\delta_n)$, then $\|\hat r_n(\cdot \mid \hat\theta) - r\|_2 = O_p(\delta_n)$.*

*II.*

*If the assumptions in (e) hold uniformly for $q \in \mathcal C$ and (3.6) holds, then (3.5) is valid.*

The proof of Theorem 3 is given in the Appendix.

## 4 Simulation study

We present the results of a simulation study of the performance of the tilted estimators in various settings. Data were generated using exponential and sin regression functions with normal and uniform design distributions. Samples of sizes $n = (60, 200, 1000, \ldots)$ and error standard deviations $\sigma = (0.3, 0.5, 0.7, \ldots)$ were considered; for each pair $(\sigma, n)$ we generated 500 data sets. The Median Integrated Squared Error (MISE) was estimated using the Monte Carlo method. The leave-one-out cross-validation score from (1.5) was employed to choose the optimal bandwidths for the Nadaraya-Watson and local linear estimators [5]. For the infinite order flat-top kernel estimator, the bandwidth was selected using the rule of thumb introduced by McMurry and Politis [4], as implemented in 'iosmooth'. The bandwidth parameters for the tilted estimators were estimated using our suggested procedure. The exponential regression function is $r(x) = x + 4\exp(-2x^2)/\sqrt{2\pi}$; its design densities were taken to be uniform on $[-2, 2]$ and $N(0, 1)$. The sin regression function $r(x) = \sin(4\pi x)$ was paired with the uniform design density on $[0, 1]$ and on a sub-interval with the edges excluded.

Table 1 reports the results for the exponential $r(x)$ regression function along with the normal design density and the normal distribution for the error term. It is evident that for a moderate sample size ($n = 200$), a large sample size ($n = 1000$), and medium standard deviations (0.5 and 0.7), the tilted estimators outperformed the other estimators. Moreover, for larger sample sizes, as the standard deviation of the error terms increases, the MISE of the tilted estimators decreases.

Table 1: MISE for Infinite Order (IO) estimator with the trapezoidal kernel, Nadaraya-Watson (NW) estimator, standard local linear (LL) estimator, tilted NW estimator with 4 (NW p4) and 10 (NW p10) weighting nodes, and tilted LL estimator with 4 (LL p4) and 10 (LL p10) weighting nodes; exponential regression function and normal design density. In each row the minimum MISE is highlighted in bold.

Table 2 reports the results for the exponential $r(x)$ with the uniform design density and the random normal error term. For fixed sample size, as the standard deviation increases, the tilted estimators outperform the others, although for large sample sizes the conventional estimators tend to perform better than the tilted estimators. For smaller sample sizes and moderate standard deviation levels, the tilted NW estimator remains superior, to some extent, to the conventional estimators.

Table 2: MISE for Infinite Order (IO) estimator with the trapezoidal kernel, Nadaraya-Watson (NW) estimator, standard local linear (LL) estimator, tilted NW estimator with 4 (NW p4) and 10 (NW p10) weighting nodes, and tilted LL estimator with 4 (LL p4) and 10 (LL p10) weighting nodes; exponential regression function and uniform design density. In each row the minimum MISE is highlighted in bold.

The corresponding results for the sin regression function are reported in Tables 3 and 4. For $(n = 60, \sigma = 0.\ldots)$, $(n = 200, \sigma = 0.7)$, and $(n = 1000, \sigma \in \{\ldots\})$ the tilted Nadaraya-Watson estimator outperformed its classical counterpart. From the boxplots in Figure 1 it is evident that the tilted estimators have smaller median ISEs. The extreme values of the ISEs for the tilted estimators are smaller than those of the conventional estimators. The similarity between the ISE distributions, and their spreads, of the IO and tilted estimators can also be seen in Figure 1.

Figure 1: Boxplots of Integrated Squared Errors (ISE) for the Infinite Order (IO) estimator with the trapezoidal kernel, Nadaraya-Watson (NW) estimator, standard local linear (LL) estimator, tilted NW estimator with 4 (NW p4) and 10 (NW p10) weighting nodes, and tilted LL estimator with 4 (LL p4) and 10 (LL p10) weighting nodes; sin regression function, edges excluded, $n = 1000$ and $\sigma = 0.\ldots$

## 5 Real data applications

In this section we study the performance of the tilted estimators in a real data environment.
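The IO comparator fits used throughout rely on the trapezoidal flat-top kernel of Definition 2.1; a minimal sketch (our own implementation, with the $x = 0$ limit $K(0) = 3/(4\pi)$ handled explicitly):

```python
import numpy as np

def trapezoidal_kernel(x):
    """Infinite order flat-top trapezoidal kernel
    K(x) = 2(cos(x/2) - cos(x)) / (pi x^2), with K(0) = 3/(4 pi)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    small = np.abs(x) < 1e-6
    out[small] = 3.0 / (4.0 * np.pi)   # limit as x -> 0
    xs = x[~small]
    out[~small] = 2.0 * (np.cos(xs / 2.0) - np.cos(xs)) / (np.pi * xs**2)
    return out

def io_smoother(x, X, Y, h):
    """Infinite order flat-top kernel regression estimate (2.2) at x."""
    w = trapezoidal_kernel((X - x) / h)
    return np.dot(w, Y) / w.sum()
```

The weights may be negative away from $x$, since the kernel oscillates, but after normalisation they still sum to one, so a constant response is reproduced exactly.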

### 5.1 COVID-19 data

Table 5: COVID-19 data: MSE for Nadaraya-Watson (NW), Infinite Order (IO), and tilted (NW p4) estimators. In each row the minimum MSE is highlighted in bold.

| Country | IO | NW | NW p4 |
|---|---|---|---|
| Iran | 324 | 326 | … |
| Australia | 5.48 | 5.41 | … |
| Italy | 445 | 1590 | … |
| Belgium | 2357 | 3071 | … |
| Germany | 641 | 1029 | … |
| Spain | 41621 | 41754 | … |
| Brazil | 108590 | 108246 | … |
| United Kingdom | 1789 | 2083 | … |
| Canada | 175 | 183 | … |
| Chile | 6040 | 6685 | … |
| South Africa | 1158 | 1403 | … |
| United States of America | 43910 | 46512 | … |

Table 6: COVID-19 deaths: MSE for Nadaraya-Watson (NW), Infinite Order (IO), and tilted (NW p4) estimators. In each row the minimum MSE is highlighted in bold.

| Country | IO | NW | NW p4 |
|---|---|---|---|
| Iran | 1.36 | 1.39 | … |
| Australia | 0.0225 | 0.0223 | … |
| Italy | 1.97 | 3.66 | … |
| Belgium | 0.0699 | 0.3077 | … |
| Germany | 0.71 | 0.82 | … |
| Spain | 37.74 | 37.92 | … |
| Brazil | 63.99 | 64.21 | … |
| United Kingdom | 13.57 | 14.50 | … |
| Canada | 0.40 | 0.47 | … |
| Chile | 9.10 | 9.73 | … |
| South Africa | 3.32 | 3.50 | … |
| United States of America | 201.39 | 205.98 | … |

### 5.2 Dose-response data

The dose-response data come from a study of phenylephrine effects on rat corpus cavernosum strips. The data first appeared in Boroumand et al. [21], where the dose-response curves to phenylephrine (0.1 µM to 300 µM) were obtained by applying the robust four-parameter logistic (4PL) regression. Here we have used the tilted smoother approach for dose-response curve fitting. In terms of Mean Squared Error (MSE), the tilted local linear estimator performed better than the local linear and infinite order flat-top kernel estimators, as well as the robust 4PL model. The fitted dose-response curves using the tilted local linear, local linear, infinite order flat-top kernel estimator, and 4PL model are plotted in Figure 4; the corresponding MSEs are listed in the caption of Figure 4.

Figure 4: Dose-response curves: the MSEs for the tilted local linear, local linear, infinite order flat-top kernel estimator, and 4PL model are 95.1023, 95.3267, 95.53077, and 110.7539, respectively.

The original dose-response data contained outliers and the standard 4PL model had a poor fit; for this reason we compared the performance of the tilted estimator with the robust 4PL model. The tilted local linear estimator outperformed the robust 4PL model in terms of MSE.
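For reference, the four-parameter logistic family used as the parametric benchmark has the standard form below (a sketch; the parameter names and this particular parametrisation are illustrative, and the robust fitting procedure of [21] is not reproduced here):

```python
import numpy as np

def four_pl(dose, bottom, top, ec50, hill):
    """Four-parameter logistic (4PL) dose-response curve:
    response = bottom + (top - bottom) / (1 + (ec50 / dose)**hill).
    At dose = ec50 the response is halfway between bottom and top."""
    dose = np.asarray(dose, dtype=float)
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill)
```

Fitting `bottom`, `top`, `ec50`, and `hill` by (robust) least squares yields the parametric comparator; the nonparametric smoothers require no such functional form.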

## Acknowledgement

This research was undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government. This research forms part of the first author's PhD thesis, approved by the ethics committee of Mashhad University of Medical Sciences with the project code 971017.

## References

[1] Andreas Buja, Trevor Hastie, and Robert Tibshirani. "Linear smoothers and additive models". In: The Annals of Statistics (1989), pp. 453–510.

[2] Elizbar A. Nadaraya. "On estimating regression". In: Theory of Probability & Its Applications 9.1 (1964), pp. 141–142.

[3] Geoffrey S. Watson. "Smooth regression analysis". In: Sankhyā: The Indian Journal of Statistics, Series A (1964), pp. 359–372.

[4] Timothy L. McMurry and Dimitris N. Politis. "Nonparametric regression with infinite order flat-top kernels". In: Journal of Nonparametric Statistics 16.3-4 (2004), pp. 549–562.

[5] Larry Wasserman. All of Nonparametric Statistics. Springer Science & Business Media, 2006.

[6] Peter Hall and Qiwei Yao. "Data tilting for time series". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65.2 (2003), pp. 425–442.

[7] Ulf Grenander. "On the theory of mortality measurement". In: Scandinavian Actuarial Journal (1956).

[8] Art B. Owen. "Empirical likelihood ratio confidence intervals for a single functional". In: Biometrika 75.2 (1988), pp. 237–249.

[9] Song Xi Chen. "Empirical likelihood-based kernel density estimation". In: Australian Journal of Statistics 39.1 (1997), pp. 47–56.

[10] Zhang. In: Communications in Statistics–Theory and Methods.

[11] Schick et al. In: Communications in Statistics–Theory and Methods.

[12] Ursula U. Müller, Anton Schick, and Wolfgang Wefelmeyer. In: Statistica Sinica (2005), pp. 177–195.

[13] Peter Hall and Brett Presnell. "Intentionally biased bootstrap methods". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61.1 (1999), pp. 143–158.

[14] Peter Hall and Li-Shan Huang. In: Annals of Statistics (2001), pp. 624–647.

[15] Raymond J. Carroll, Aurore Delaigle, and Peter Hall. "Testing and estimating shape-constrained nonparametric density and regression in the presence of measurement error". In: Journal of the American Statistical Association (2011).

[16] Hassan Doosti and Peter Hall. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[17] Hassan Doosti et al. In: Journal of Statistical Planning and Inference.

[18] In: REVSTAT–Statistical Journal.

[19] Wendy L. Martinez and Angel R. Martinez. Computational Statistics Handbook with MATLAB. Vol. 22. 2007.

[20] Peter Hall and Thomas E. Wehrly. "A geometrical method for removing edge effects from kernel-type nonparametric regression estimators". In: Journal of the American Statistical Association (1991).

[21] Farzaneh Boroumand et al. In: Iranian Journal of Pharmaceutical Sciences.

## Appendices

### A Proof of Theorem 3

In this section we provide the proof of Theorem 3 for the tilted Nadaraya-Watson estimator, as a form of the tilted linear smoother (2.3). The result for the tilted local linear smoother can be proved analogously.

Proof.

Let $p_i = n^{-1}\pi(X_i)$, where $\pi \ge 0$ and $\sum_{i=1}^n p_i = 1$, the latter being equivalent to $\int \pi(x) f_X(x)\,dx = 1$ for continuous $X$. For simplicity we write $f$ for $f_X(x)$. Then

$$\sum_{i=1}^n p_i = \frac{1}{n}\sum_{i=1}^n \pi(X_i) = E\pi(X) + O_p(n^{-1/2}) = \int \pi(x) f\,dx + O_p(n^{-1/2}) = 1 + O_p(n^{-1/2}),$$

thus, to normalise the $p_i$'s so that $\sum_i p_i = 1$, we need to multiply the estimator (2.3) by $1 + O_p(n^{-1/2})$; the factor $O_p(n^{-1/2})$ is negligibly small. We choose $\pi$ such that $\hat r_n(x \mid h, p)$ in (2.3) is an unbiased estimator of $r$, i.e.

$$E\,\hat r(x \mid h, p) = r(x). \tag{A.1}$$

From (A.1) we have

$$E\,\hat r(x)\hat g(x) = \frac{1}{h}\sum_{i=1}^n E\,p_i Y_i K\Big(\frac{x - X_i}{h}\Big) = \frac{1}{nh}\sum_{i=1}^n E\,E\Big\{Y_i \pi(X_i) K\Big(\frac{x - X_i}{h}\Big)\,\Big|\,X_i\Big\} = \frac{1}{h}\int_{-\infty}^{\infty} r(t)\pi(t) K\Big(\frac{x - t}{h}\Big) f_X(t)\,dt, \tag{A.2}$$

where

$$\hat g(x) = \frac{1}{nh}\sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big)$$

and $g(x) = E\,\hat g(x)$. It can be shown that the left-hand side of (A.2) converges to $r(x)g(x)$. We have

$$r(x)g(x) = \frac{1}{h}\int_{-\infty}^{\infty} r(t)\pi(t) K\Big(\frac{x - t}{h}\Big) f_X(t)\,dt;$$

multiplying both sides by $e^{-itx}$ and integrating over $x$, we deduce

$$\Phi_{rg}(t) = \frac{1}{h}\int_{-\infty}^{\infty}\!\!\int e^{-itx}\, (r\pi f)(s)\, K\Big(\frac{x - s}{h}\Big)\,dx\,ds.$$

By changing the variable, $u = (x - s)/h$, we have

$$\Phi_{rg}(t) = \int_{-\infty}^{\infty} e^{-its}\, (r\pi f)(s)\,\Phi_K(ht)\,ds = \Phi_K(ht)\,\Phi_{r\pi f}(t),$$

so that

$$\Phi_{r\pi f}(t) = \frac{\Phi_{rg}(t)}{\Phi_K(ht)}, \qquad (\pi r f)(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\Phi_{rg}(t)}{\Phi_K(ht)}\,dt,$$

$$\pi(x) = \frac{1}{2\pi\, r(x) f(x)}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\Phi_{rg}(t)}{\Phi_K(ht)}\,dt. \tag{A.3}$$

If the kernel $K$ satisfies assumption (c) and $q = r \cdot g$ satisfies assumption (d), then

$$\pi = 1 + \sum_{j=1}^k c_j(-h^2)^j\,\frac{(rg)^{(2j)}}{rf}, \tag{A.4}$$

with $\pi$ from (A.3), and $\hat r_n$ is unbiased. Next we show that $\pi$ satisfies $0 < \pi(X) < C$ for all $h$, $0 \le h \le h_0$. If $\pi > 0$ and $\sup \pi \le C < \infty$, then for the unbiased $\hat r_n$

$$\int \operatorname{var}\{\hat r_n(x \mid h, p)\}\,dx \le \frac{1}{nh^2}\int E\Big\{\pi^2(X)\, K^2\Big(\frac{x - X}{h}\Big)\Big\}\,dx \le \frac{1}{nh}\,(\sup \pi^2)\int K^2(u)\,du = O\{(nh)^{-1}\}. \tag{A.5}$$

So the MISE can be written as

$$\mathrm{MISE}\{\hat r_n(\cdot \mid h, p)\} = \int E\{\hat r_n(x \mid h, p) - r(x)\}^2\,dx = O\{(nh)^{-1}\}.$$

We recall that

(f) under assumption (e)-(1), $\delta_n \to 0$ as $n \to \infty$ so that $n^{1/2}\delta_n \to \infty$, and under assumption (e)-(2), $n^{1/2}(\log n)^{-C_3/C_5}\delta_n \to \infty$, where $C_3$ and $C_5$ are defined in (e)-(2).

Then, under assumptions (e)-(1) and (f), we have $n^{1/2}\delta_n \to \infty$ and thus $n^{-1/2} = o(\delta_n)$. Consequently, there exists $h = h(n) \downarrow 0$ as $n \to \infty$ such that $(nh)^{-1} = O(\delta_n^2)$ for all large $n$, with $h < h_0$ since $0 \le h \le h_0$. Replacing the right-hand side of (A.5) by $O(\delta_n^2)$, for the specific choice of $\pi$ defined in (A.3), and taking $\tilde\theta = (h, p)$ in (2.3), we obtain

$$\lim_{C\to\infty}\limsup_{n\to\infty} P\big\{\|\hat r_n(\cdot \mid \tilde\theta) - r\|_2 \ge C\delta_n\big\} = 0. \tag{A.6}$$

For this version of $p_i = n^{-1}\pi(X_i)$, $\sum_i p_i = 1$ is not satisfied exactly; however, this can be fixed by the normalisation described in the first paragraph of the proof. Property (A.6) implies part I of Theorem 3, and part II follows from the uniformity of (A.6) over $\mathcal C$.

Under assumption (e)-(2) and (A.4), $|(rg)^{(2j)}(x)/(rf)(x)| \le C_2(1 + |x|)^{C_3}$ for $1 \le j \le k$. Defining $C = \max(|c_1|, \ldots, |c_k|)$, we have, for $0 \le h \le 1$,

$$|\pi(X) - 1| = \Big|\sum_{j=1}^k c_j(-h^2)^j\,\frac{(rg)^{(2j)}}{rf}\Big| \le C\,k\,C_2\,(1 + |x|)^{C_3}\, h^2, \tag{A.7}$$

so, if $\lambda_n \to \infty$ and $\lambda_n^{C_3} h^2 \to 0$, then $\sup_{|X| \le \lambda_n}|\pi(X) - 1| \to 0$, and hence $0 < \pi(X) < C$ with $\pi(X) \ge 0$. Thus (A.7) gives an upper bound for $\pi(X)$ when $X \in [-\lambda_n, \lambda_n]$. Now we show that the probability of $X$ falling outside this interval is negligible, so that $0 < \pi(X) < C$ for all $X$:

$$P(|X| \ge \lambda_n) \le C_4 \exp(-C_5 \lambda_n^{C_3}). \tag{A.8}$$

Using (f), $n^{-1/2}(\log n)^{C_3/C_5}/\delta_n \to 0$; equivalently, $\delta_n = \lambda'_n\, n^{-1/2}(\log n)^{C_3/C_5}$, where $\lambda'_n \to \infty$ and such a $\lambda'_n$ exists. We choose $h$ so that $(nh)^{-1} = O(\delta_n^2)$, or, for simplicity, $(nh)^{-1} = \delta_n^2$; then $h = \lambda'^{-2}_n(\log n)^{-2C_3/C_5}$. Let $\lambda_n = \{\lambda'^{\,2-\eta}_n(\log n)/C_5\}^{1/C_3}$, where $\eta \in (0, 2)$. Then

$$C_4\exp(-C_5\lambda_n^{C_3}) = C_4\exp(-\lambda'^{\,2-\eta}_n\log n) = O(n^{-C}),$$

for all $C > 0$, which completes the proof. □