Nonparametric estimation of the conditional distribution at regression boundary points
Srinjoy Das
Department of Electrical and Computer Engineering
University of California—San Diego
La Jolla, CA 92093, USA
email: [email protected]
Dimitris N. Politis
Department of Mathematics
University of California—San Diego
La Jolla, CA 92093-0112, USA
email: [email protected]
Abstract
Nonparametric regression is a standard statistical tool with increased importance in the Big Data era. Boundary points pose additional difficulties but local polynomial regression can be used to alleviate them. Local linear regression, for example, is easy to implement and performs quite well both at interior as well as boundary points. Estimating the conditional distribution function and/or the quantile function at a given regressor point is immediate via standard kernel methods but problems ensue if local linear methods are to be used. In particular, the distribution function estimator is not guaranteed to be monotone increasing, and the quantile curves can "cross". In the paper at hand, a simple method of correcting the local linear distribution estimator for monotonicity is proposed, and its good performance is demonstrated via simulations and real data examples.
Keywords:
Model-Free prediction, local linear regression, kernel smoothing, local polynomial fitting, point prediction.

1 Introduction
Nonparametric regression via kernel smoothing is a standard statistical tool with increased importance in the Big Data era; see e.g. (Wand & Jones, 1994), (Yu & Jones, 1998), (Yu, Lu, & Stander, 2003), (Koenker, 2005) or (Schucany, 2004) for reviews. The fundamental nonparametric regression problem is estimating the regression function $\mu(x) = E(Y|X=x)$ from data $(Y_1, x_1), \ldots, (Y_n, x_n)$ under the sole assumption that the function $\mu(\cdot)$ belongs to some smoothness class, e.g., that it possesses a certain number of continuous derivatives. Here, $Y_i$ is the real-valued response associated with the regressor $X$ taking a value of $x_i$. Either by design or via the conditioning, the regressor values $x_1, \ldots, x_n$ are treated as nonrandom. For simplicity of exposition, we will assume that the regressor $X$ is univariate but extension to the multivariate case is straightforward.

A common approach to nonparametric regression starts with assuming that the data were generated by an additive model such as

$$Y_i = \mu(x_i) + \sigma(x_i)\,\epsilon_i \quad \text{for } i = 1, 2, \ldots, n \qquad (1)$$

where the errors $\epsilon_i$ are assumed to be independent, identically distributed (i.i.d.) with mean zero and variance one, and $\sigma(\cdot)$ is another unknown function.

Nevertheless, standard kernel smoothing methods are applicable in a Model-Free context as well, i.e., without assuming an equation such as (1). An important example is the Nadaraya-Watson kernel estimator defined as

$$\hat\mu(x) = \frac{\sum_{i=1}^n \tilde K_{i,x}\, Y_i}{\sum_{i=1}^n \tilde K_{i,x}} \qquad (2)$$

where $b > 0$ is a bandwidth, $K(x)$ is a nonnegative kernel function satisfying $\int K(x)\,dx = 1$, and $\tilde K_{i,x} = \frac{1}{b} K\!\left(\frac{x - x_i}{b}\right)$. Estimator $\hat\mu(x)$ enjoys favorable properties such as consistency and asymptotic normality under standard regularity conditions in a Model-Free context, e.g. assuming the pairs $(Y_1, X_1), \ldots, (Y_n, X_n)$ are i.i.d. (Li & Racine, 2007).

The rationale behind the Nadaraya-Watson estimator (2) is approximating the unknown function $\mu(x)$ by a constant over a window of "width" $b$; this is made clearer if a rectangular function is chosen as the kernel $K$, e.g. letting $K(x) = 1\{|x| < 1/2\}$ where $1_A$ is the indicator of set $A$, in which case $\hat\mu(x)$ is just the average of the $Y$'s whose $x$ value fell in the window. Going from a local constant to a local linear approximation for $\mu(x)$, i.e., a first-order Taylor expansion, motivates the local linear estimator

$$\hat\mu_{LL}(x) = \frac{\sum_{i=1}^n w_i\, Y_i}{\sum_{i=1}^n w_i} \qquad (3)$$

where

$$w_i = \tilde K_{i,x}\left(1 - \hat\beta\,(x - x_i)\right) \quad \text{and} \quad \hat\beta = \frac{\sum_{i=1}^n \tilde K_{i,x}\,(x - x_i)}{\sum_{i=1}^n \tilde K_{i,x}\,(x - x_i)^2}. \qquad (4)$$

If the design points $x_j$ are (approximately) uniformly distributed over an interval $[a_1, a_2]$, then $\hat\mu_{LL}(x)$ is typically indistinguishable from the Nadaraya-Watson estimator $\hat\mu(x)$ when $x$ is in the 'interior', i.e., when $x \in [a_1 + b/2,\ a_2 - b/2]$. However, $\hat\mu_{LL}(x)$ offers an advantage when the design points $x_j$ are non-uniformly distributed, e.g., when there are gaps in the design points, and/or when $x$ is a boundary point, i.e., when $x = a_1$ or $x = a_2$ (plus or minus $b/2$).

In addition to the conditional mean $\mu(x) = E(Y|X=x)$, one may consider estimating the conditional distribution $F_x(y) = P(Y \le y \,|\, X = x)$ at some fixed point $y$. Note that $F_x(y) = E(W|X=x)$ where $W = 1\{Y \le y\}$. Hence, estimating $F_x(y)$ can be easily done via local constant or local linear estimation of the conditional mean from the new dataset $(W_1, x_1), \ldots, (W_n, x_n)$ where $W_i = 1\{Y_i \le y\}$.
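For concreteness, the estimators (2)–(4) can be sketched in R as follows; the function names, the Gaussian kernel, and the toy data are our own illustrative choices rather than part of the paper. Later sketches reuse these helper functions.

```r
# Minimal sketch of the estimators in eqs. (2)-(4); Gaussian kernel assumed.
nw_estimate <- function(x0, x, y, b) {
  K <- dnorm((x0 - x) / b) / b            # kernel weights K~_{i,x0}
  sum(K * y) / sum(K)                     # Nadaraya-Watson estimate, eq. (2)
}

ll_weights <- function(x0, x, b) {
  K    <- dnorm((x0 - x) / b) / b         # K~_{i,x0}
  beta <- sum(K * (x0 - x)) / sum(K * (x0 - x)^2)   # beta-hat of eq. (4)
  K * (1 - beta * (x0 - x))               # local linear weights w_i; may be negative
}

ll_estimate <- function(x0, x, y, b) {
  w <- ll_weights(x0, x, b)
  sum(w * y) / sum(w)                     # local linear estimate, eq. (3)
}

# Toy example: behaviour at the right boundary of an equispaced design
set.seed(1)
x <- seq(0, 1, length.out = 101)
y <- sin(x) + 0.1 * rnorm(101)
c(NW = nw_estimate(1, x, y, b = 0.1), LL = ll_estimate(1, x, y, b = 0.1))
```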
To elaborate, the local constant and the local linear estimators of $F_x(y)$ are respectively given by

$$\hat F_x(y) = \frac{\sum_{i=1}^n \tilde K_{i,x}\, 1\{Y_i \le y\}}{\sum_{i=1}^n \tilde K_{i,x}}, \quad \text{and} \quad \hat F^{LL}_x(y) = \frac{\sum_{j=1}^n w_j\, 1\{Y_j \le y\}}{\sum_{j=1}^n w_j} \qquad (5)$$

where the local linear weights $w_j$ are given by eq. (4).

Viewed as a function of $y$, $\hat F_x(y)$ is a well-defined distribution function; however, being a local constant estimator, it often has poor performance at boundary points. As already discussed, $\hat F^{LL}_x(y)$ has better performance at boundary points. Unfortunately, $\hat F^{LL}_x(y)$ is neither guaranteed to be in $[0,1]$
nor is it guaranteed to be nondecreasing as a function of $y$; this is due to some of the weights $w_j$ potentially being negative.

The problem with non-monotonicity of $\hat F^{LL}_x(y)$ and the associated quantile curves potentially "crossing" is well-known; see (Hall, Wolff, & Yao, 1999) for the former issue, and (Yu & Jones, 1998) for the latter, as well as the reviews on quantile regression by (Yu et al., 2003) and (Koenker, 2005). In the next section, a simple method of correcting the local linear distribution estimator for monotonicity is proposed; its good performance is demonstrated via simulations and real data examples in Section 3. It should be noted here that while the paper at hand focuses on the monotonicity correction for local linear estimators of the conditional distribution, the monotonicity correction idea can equally be applied to other distribution estimators constructed via different nonparametric methods, e.g. wavelets.

2 Local Linear Estimation of smooth conditional distributions

2.1 Smooth distribution estimators
The good performance of the local constant and local linear estimators (5) hinges on $F_x(\cdot)$ being smooth, e.g. continuous, as a function of $x$. In all that follows, we will further assume that $F_x(y)$ is also continuous in $y$ for all $x$. Since the estimators (5) are discontinuous (step functions) in $y$, it is customary to smooth them, i.e., define

$$\bar F_x(y) = \frac{\sum_{i=1}^n \tilde K_{i,x}\, \Lambda\!\left(\frac{y - Y_i}{h}\right)}{\sum_{i=1}^n \tilde K_{i,x}}, \quad \text{and} \quad \bar F^{LL}_x(y) = \frac{\sum_{j=1}^n w_j\, \Lambda\!\left(\frac{y - Y_j}{h}\right)}{\sum_{j=1}^n w_j} \qquad (6)$$

where $\Lambda(y)$ is some smooth distribution function which is strictly increasing with density $\lambda(y) > 0$, i.e., $\Lambda(y) = \int_{-\infty}^{y} \lambda(s)\, ds$. Here again the local linear weights $w_j$ are given by eq. (4), and $h > 0$ is a second bandwidth; see Section 2.5 for some concrete suggestions on picking $b$ and $h$ in practice.
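A minimal R sketch of the smoothed estimators in eq. (6) follows, taking $\Lambda$ to be the standard normal distribution function; the function names and the choice of $\Lambda$ are illustrative assumptions of ours, not prescriptions from the paper.

```r
# Sketch of the smoothed local constant and local linear distribution
# estimators of eq. (6), with Lambda = pnorm (standard normal CDF).
# Reuses ll_weights() from the earlier sketch.
F_lc <- function(y0, x0, x, y, b, h) {
  K <- dnorm((x0 - x) / b) / b
  sum(K * pnorm((y0 - y) / h)) / sum(K)       # bar F_x(y0)
}

F_ll <- function(y0, x0, x, y, b, h) {
  w <- ll_weights(x0, x, b)                   # may contain negative weights
  sum(w * pnorm((y0 - y) / h)) / sum(w)       # bar F^LL_x(y0); not necessarily monotone in y0
}
```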
Under standard conditions, both estimators appearing in eq. (6) are asymptotically consistent, and preferable to the respective estimators appearing in eq. (5), i.e., replacing $1\{Y_j \le y\}$ by $\Lambda\!\left(\frac{y - Y_j}{h}\right)$ is advantageous; see Ch. 6 of (Li & Racine, 2007). Furthermore, as discussed in the Introduction, we expect that $\bar F^{LL}_x(y)$ would be a better estimator than $\bar F_x(y)$ when $x$ is a boundary point and/or the design is not uniform, while $\bar F^{LL}_x(y)$ and $\bar F_x(y)$ would be practically equivalent when $x$ is an interior point and the design is (approximately) uniform. Hence, all in all, $\bar F^{LL}_x(y)$ would be preferable to $\bar F_x(y)$ as an estimator of $F_x(y)$ for any fixed $y$. The problem again is that $\bar F^{LL}_x(y)$ is not guaranteed to be a proper distribution viewed as a function of $y$, by analogy to $\hat F^{LL}_x(y)$.

There have been several proposals in the literature to address this issue. An interesting one is the adjusted Nadaraya-Watson estimator of (Hall et al., 1999) that is a linear function of the $Y$'s with weights being selected by an appropriate optimization procedure. The adjusted Nadaraya-Watson estimator is much like a local linear estimator in that it has reduced bias (by an order of magnitude) compared to the regular Nadaraya-Watson local constant estimator. Unfortunately, the adjusted Nadaraya-Watson estimator does not work well when $x$ is a boundary point as the required optimization procedure typically does not admit a solution.

Noting that the problems with $\bar F^{LL}_x(y)$ and $\hat F^{LL}_x(y)$ arise due to potentially negative weights $w_j$ computed by eq. (4), Hansen proposed a straightforward adjustment to the local linear estimator that maintains its favorable asymptotic properties (Hansen, 2004). The local linear versions of $\hat F_x(y)$ and $\bar F_x(y)$ adjusted via Hansen's proposal are given as follows:

$$\hat F^{LLH}_x(y) = \frac{\sum_{i=1}^n w^\diamond_i\, 1\{Y_i \le y\}}{\sum_{i=1}^n w^\diamond_i} \quad \text{and} \quad \bar F^{LLH}_x(y) = \frac{\sum_{i=1}^n w^\diamond_i\, \Lambda\!\left(\frac{y - Y_i}{h}\right)}{\sum_{i=1}^n w^\diamond_i} \qquad (7)$$

where

$$w^\diamond_i = \begin{cases} 0 & \text{when } \hat\beta\,(x - x_i) > 1 \\ \tilde K_{i,x}\left(1 - \hat\beta\,(x - x_i)\right) & \text{when } \hat\beta\,(x - x_i) \le 1. \end{cases} \qquad (8)$$

Essentially, Hansen's proposal replaces negative weights by zeros, and then renormalizes the nonzero weights. The problem here is that if $x$ is on the boundary, negative weights are crucially needed in order to ensure an extrapolation takes place with minimal bias; this is further elaborated upon in the following subsection.

2.2 The need for negative weights

In order to illustrate the need for negative weights consider the simple case of $n = 2$, i.e., two data points $(Y_1, x_1)$ and $(Y_2, x_2)$. The question is to predict a future response $Y$ associated with a regressor value of $x$; assuming finite second moments, the $L_2$–optimal predictor of $Y$ is $\mu(x)$ where $\mu(x) = E(Y|X=x)$ as before.

If $x$ is an interior point as depicted in Figure 1, the problem is one of interpolation. If $x$ is a boundary point, and in particular if $x$ is outside the convex hull of the design points as in Figure 2, the problem is one of extrapolation. Let $\hat\mu_{LL}(x)$ denote the local linear estimator of $\mu(x)$ as before. With $n = 2$, $\hat\mu_{LL}(x)$ reduces to just finding the line that passes through the two data points $(Y_1, x_1)$ and $(Y_2, x_2)$.
In other words, $\hat\mu_{LL}(x)$ reduces to a linear combination of $Y_1$ and $Y_2$ with weights summing to one, i.e., $\hat\mu_{LL}(x) = \omega_x Y_1 + (1 - \omega_x) Y_2$ where

$$\omega_x = \frac{x_2 - x}{x_2 - x_1},$$

with $x_1 < x < x_2$ for interior points and $x_1 < x_2 < x$ for boundary points. Note that $\omega_x \in [0,1]$ if $x$ is an interior point, whereas $\omega_x \notin [0,1]$ if $x$ is outside the convex hull of the design points. Hence, negative weights are a sine qua non for effective linear extrapolation.

For example, assume we are in the setup of Figure 2 where $x_1 < x_2 < x$. In this case, $\omega_x$ is negative. Hansen's proposal (Hansen, 2004) would replace $\omega_x$ by zero and renormalize the coefficients, leading to $\hat\mu_{LLH}(x) = Y_2$; it is apparent that this does not give the desired linear extrapolation effect.

Figure 1: Interpolation: prediction of $Y$ when $x$ is an interior point; $\hat Y$ is a convex combination of $Y_1$ and $Y_2$ with nonnegative weights.

Figure 2: Extrapolation: prediction of $Y$ when $x$ is outside the convex hull of the design points; $\hat Y$ is a linear combination of $Y_1$ and $Y_2$ with one positive and one negative weight.

To generalize the above setup, suppose that now $n$ is an arbitrary even number, and $Y_i$ represents the average of $n/2$ responses all taken at the design point $x_i$, for $i = 1$ or 2. Thus, we have a bona fide $n$–dimensional scatterplot that is supported on two design points. Interestingly, the formula for $\hat\mu_{LL}(x)$ is exactly as given above, and so is the argument requiring negative weights for linear extrapolation. Of course, we cannot expect a general scatterplot to be supported on just two design points. Nonetheless, in a nonparametric situation one performs a linear regression locally, i.e., using a local subset of the data. Typically, there is a scarcity of design points near the boundary, and the general situation is qualitatively similar to the case of two design points.
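To make the two-point example concrete, here is a small numerical illustration in R (the numbers are made up) of the weights at a boundary point and of the effect of zeroing out the negative weight as in Hansen's adjustment:

```r
# Two design points and a prediction point outside their convex hull.
x1 <- 0; x2 <- 1; x0 <- 1.5          # extrapolation case: x1 < x2 < x0
Y1 <- 2; Y2 <- 3

omega  <- (x2 - x0) / (x2 - x1)      # omega_x = -0.5: a negative weight is required
mu_LL  <- omega * Y1 + (1 - omega) * Y2   # = 3.5, the straight-line extrapolation
mu_LLH <- 0 * Y1 + 1 * Y2                 # zeroing the negative weight and renormalizing gives Y2 = 3
c(omega = omega, LL = mu_LL, LLH = mu_LLH)
```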
2.3 Monotone Local Linear Distribution Estimation

The estimator $\hat F^{LL}_x(y)$ from eq. (5) is discontinuous as a function of $y$; therefore we will focus our attention on $\bar F^{LL}_x(y)$ described in eq. (6) from here on. It seems that with this double-smoothed estimator $\bar F^{LL}_x(y)$ we can "have our cake and eat it too", i.e., modify it towards monotonicity while retaining (some of) the negative weights that are helpful in the extrapolation problem as discussed in the last subsection. We are thus led to define a new estimator denoted by $\bar F^{LLM}_x(y)$ which is a monotone version of $\bar F^{LL}_x(y)$; we will refer to $\bar F^{LLM}_x(y)$ as the Monotone Local Linear Distribution Estimator.

One way to define $\bar F^{LLM}_x(y)$ is as follows.

Algorithm 1.

1. Compute $\bar F^{LL}_x(y)$, and denote $l = \lim_{y \to -\infty} \bar F^{LL}_x(y)$.
2. Define a function $G_1(y) = \bar F^{LL}_x(y) - l$.
3. Define a second function $G_2$ with the property that $G_2(y + \epsilon) = \max\left(G_1(y + \epsilon),\, G_2(y)\right)$ for all $y$ and all $\epsilon > 0$.
4. Finally, define $\bar F^{LLM}_x(y) = G_2(y)/L$ where $L = \lim_{y \to \infty} G_2(y)$.
The above algorithm could be approximately implemented in practice by selecting a small enough $\epsilon > 0$, dividing the range of the $y$ variable using a grid of size $\epsilon$, and running step 3 of Algorithm 1 sequentially. To elaborate, one would compute $G_2$ at grid point $j+1$ from the values of $G_2$ at previous, i.e., smaller, grid points.
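A minimal R sketch of Algorithm 1 on a grid of $y$ values is given below; the running maximum in step 3 is conveniently computed with cummax(). The function name and the grid construction are our own illustrative choices, and the sketch reuses ll_weights() and the Lambda = pnorm convention from above.

```r
# Sketch of Algorithm 1: monotone correction of bar F^LL_x on a y-grid.
F_llm <- function(y_grid, x0, x, y, b, h) {
  w   <- ll_weights(x0, x, b)
  Fll <- sapply(y_grid, function(y0) sum(w * pnorm((y0 - y) / h)) / sum(w))
  l   <- Fll[1]                       # proxy for lim_{y -> -infty}; grid should extend far enough
  G1  <- Fll - l                      # step 2
  G2  <- cummax(G1)                   # step 3: running maximum over the grid
  G2 / G2[length(G2)]                 # step 4: normalize by L = lim_{y -> +infty}
}

# Example at the right boundary point of the toy data used earlier:
y_grid <- seq(min(y) - 1, max(y) + 1, by = 0.01)
Fhat   <- F_llm(y_grid, x0 = 1, x = x, y = y, b = 0.1, h = 0.05)
all(diff(Fhat) >= 0)                  # TRUE: the corrected estimator is nondecreasing
```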
A different—albeit equivalent—way of constructing the estimator $\bar F^{LLM}_x(y)$ is by first constructing its associated density function, denoted by $\bar f^{LLM}_x(y)$, which will be called the Monotone Local Linear Density Estimator. The alternative algorithm goes as follows.

Algorithm 2.
1. Recall that the derivative of $\bar F^{LL}_x(y)$ with respect to $y$ is given by

$$\bar f^{LL}_x(y) = \frac{\frac{1}{h}\sum_{j=1}^n w_j\, \lambda\!\left(\frac{y - Y_j}{h}\right)}{\sum_{j=1}^n w_j}$$

where $\lambda(y)$ is the derivative of $\Lambda(y)$.
2. Define a nonnegative version of $\bar f^{LL}_x(y)$ as $\bar f^{LL+}_x(y) = \max\left(\bar f^{LL}_x(y),\, 0\right)$.
3. Define

$$\bar f^{LLM}_x(y) = \frac{\bar f^{LL+}_x(y)}{\int_{-\infty}^{\infty} \bar f^{LL+}_x(s)\, ds}. \qquad (9)$$

4. Finally, define $\bar F^{LLM}_x(y) = \int_{-\infty}^{y} \bar f^{LLM}_x(s)\, ds$.

To implement the above one would again need to divide the range of the $y$ variable using a grid of size $\epsilon$ in order to construct the maximum function in step 2 of Algorithm 2. The same grid can be used to provide Riemann-sum approximations to the two integrals appearing in steps 3 and 4. All in all, the implementation of Algorithm 1 is a bit easier, and will be employed in the sequel.
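For completeness, a grid-based R sketch of Algorithm 2 (again assuming $\Lambda$ = pnorm, hence $\lambda$ = dnorm; the function name and grid handling are our own):

```r
# Sketch of Algorithm 2: density-based monotone correction on a y-grid.
F_llm_density <- function(y_grid, x0, x, y, b, h) {
  w    <- ll_weights(x0, x, b)
  eps  <- diff(y_grid[1:2])                                 # grid spacing
  fll  <- sapply(y_grid, function(y0)
            sum(w * dnorm((y0 - y) / h) / h) / sum(w))      # step 1: bar f^LL_x on the grid
  fpos <- pmax(fll, 0)                                      # step 2: nonnegative version
  fllm <- fpos / sum(fpos * eps)                            # step 3: Riemann-sum normalization, eq. (9)
  cumsum(fllm * eps)                                        # step 4: integrate up to each grid point
}
```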
2.4 Asymptotic properties and the bootstrap

Under standard conditions, the local linear estimator $\bar F^{LL}_x(y)$ is asymptotically normal, with $\sqrt{nb}\,\left(\bar F^{LL}_x(y) - F_x(y)\right)$ having a limiting variance $V_{x,y}$ that depends on the design; for details, see Ch. 6 of (Li & Racine, 2007). In addition, the bias of $\sqrt{nb}\,\bar F^{LL}_x(y)$ is asymptotically vanishing if $b = o(n^{-1/5})$. Hence, letting $b \sim n^{-\alpha}$ for some $\alpha \in (1/5, 1)$, $\bar F^{LL}_x(y)$ will be consistent for $F_x(y)$, and approximate 95% confidence intervals for $F_x(y)$ can be constructed as $\bar F^{LL}_x(y) \pm 1.96\sqrt{V_{x,y}/(nb)}$.

The consistency of $\bar F^{LL}_x(\cdot)$ towards $F_x(\cdot)$ implies that the monotonicity corrections described in the previous subsection will be asymptotically negligible. To see why, fix a point $x$ of interest, and assume that $F_x(y)$ is absolutely continuous with density $f_x(y)$ that is strictly positive over its support. The consistency of $\bar f^{LL}_x(y)$ to a positive target implies that $\bar f^{LL}_x(y)$ will eventually become (and stay) positive as $n$ increases. Hence, the monotonicity correction eventually vanishes, and $\bar F^{LLM}_x(y)$ is asymptotically equivalent to $\bar F^{LL}_x(y)$.

Regardless, it is not advisable to use the aforementioned asymptotic distribution and variance of $\bar F^{LL}_x(y)$ to approximate those of $\bar F^{LLM}_x(y)$ for practical work since, in finite samples, $\bar F^{LLM}_x(y)$ and $\bar F^{LL}_x(y)$ can be quite different. Our recommendation is to use some form of bootstrap in order to approximate the distribution and/or standard error of $\bar F^{LLM}_x(y)$ directly. In particular, the Model-Free bootstrap (Politis, 2015) in its many forms is immediately applicable in the present context. For instance, the "Limit Model-Free" (LMF) bootstrap would go as follows:

LMF Bootstrap Algorithm

1. Generate $U_1, \ldots, U_n$ i.i.d. Uniform(0,1).
2. Define $Y^*_i = G^{-1}_{x_i}(U_i)$ for $i = 1, \ldots, n$ where $G^{-1}_{x_i}(\cdot)$ is the quantile inverse of $\bar F^{LLM}_{x_i}(\cdot)$, i.e., $G^{-1}_{x_i}(u) = \inf\{y : \bar F^{LLM}_{x_i}(y) \ge u\}$.
3. For the points $x$ and $y$ of interest, construct the pseudo-statistic $\bar F^{LLM*}_x(y)$ which is computed by applying estimator $\bar F^{LLM}_x(y)$ to the bootstrap dataset $(Y^*_1, x_1), \ldots, (Y^*_n, x_n)$.
4. Repeat steps 1–3 a large number (say $B$) times. Plot the $B$ pseudo-replicates $\bar F^{LLM*}_x(y)$ in a histogram that will serve as an approximation of the distribution of $\bar F^{LLM}_x(y)$. In addition, the sample variance of the $B$ pseudo-replicates $\bar F^{LLM*}_x(y)$ is the bootstrap estimator of the variance of $\bar F^{LLM}_x(y)$.

Our focus is on point estimation of $F_x(y)$ so we will not elaborate further on the construction of interval estimates.
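To illustrate the LMF bootstrap steps above, here is a minimal R sketch for approximating the standard error of $\bar F^{LLM}_x(y)$ at a single pair $(x, y)$ of interest; it reuses the F_llm() sketch of Algorithm 1 and inverts the estimated distribution on the grid. All names and defaults are our own assumptions.

```r
# Sketch of the LMF bootstrap for the standard error of bar F^LLM_x(y).
lmf_bootstrap <- function(x0, y0, x, y, b, h, B = 500,
                          y_grid = seq(min(y) - 1, max(y) + 1, by = 0.01)) {
  # Pre-compute the monotone estimator at every design point (for quantile inversion).
  Fmat <- sapply(x, function(xi) F_llm(y_grid, xi, x, y, b, h))   # columns: bar F^LLM_{x_i}
  reps <- replicate(B, {
    U      <- runif(length(x))                                    # step 1
    Y_star <- sapply(seq_along(x), function(i)
                y_grid[which(Fmat[, i] >= U[i])[1]])              # step 2: quantile inverse
    F_llm(y_grid, x0, x, Y_star, b, h)[which(y_grid >= y0)[1]]    # step 3: pseudo-statistic
  })
  c(boot_se = sd(reps))                                           # step 4: bootstrap standard error
}
```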
2.5 Bandwidth choice

There are two bandwidths, $b$ and $h$, required to construct estimator $\bar F^{LLM}_x(y)$ and its relatives $\bar F_x(y)$ and $\bar F^{LLH}_x(y)$. We will now focus on the choice of the bandwidth $b$, which is the most crucial of the two and is often picked via leave-one-out cross-validation.

In the paper at hand we are mostly concerned with estimation and prediction at boundary points. Since boundary problems often present their own peculiarities, we strongly recommend carrying out the cross-validation procedure 'locally', i.e., over a neighborhood of the point of interest. One needs, however, to ensure that there are enough points nearby to perform the leave-one-out experiment. Hence, our concrete recommendation goes as follows.

• Choose a positive integer $m$, which can be a fixed number or a small fraction of the sample size at hand.
• Then, identify $m$ among the regression points $(Y_1, x_1), \ldots, (Y_n, x_n)$ with the property that their respective $x_i$'s are the $m$ closest neighbors of the point $x$ under consideration.
• Denote this set of $m$ points by $(Y_{g(1)}, x_{g(1)}), \ldots, (Y_{g(m)}, x_{g(m)})$ where the function $g(\cdot)$ gives the index numbers of the selected points.
• For $k = 1, \ldots, m$, compute $\hat Y_{g(k)}$ which is the $L_2$–optimal predictor of $Y_{g(k)}$ using leave-one-out data. In other words, $\hat Y_{g(k)}$ is the mean, i.e., center of location, of one of the aforementioned distribution estimators based on the delete-one dataset, i.e. pretending that $Y_{g(k)}$ is unavailable.
• Thus, for a range of values of bandwidth $b$, we can calculate the following:

$$\mathrm{Err} = \sum_{k=1}^{m} \left(\hat Y_{g(k)} - Y_{g(k)}\right)^2. \qquad (10)$$

• Finally, the optimal bandwidth is given by the value of $b$ that minimizes Err over the range of bandwidths considered.

Coming back to the problem of selecting $h$: in an analogous regression problem, optimal rates for the two bandwidths were suggested in connection with the nonnegative kernel $K$; see (Li & Racine, 2007). As in (Politis, 2013), this leads to the practical recommendation of letting $h$ be of smaller order than $b$. We will adopt the same rule-of-thumb here as well, computing $h$ from the bandwidth $b$ that has been chosen previously via local cross-validation. Note that the initial choice of $h$ (before performing the cross-validation to determine the optimal bandwidth $b$) can be set by a plug-in rule as available in standard statistical software such as R.
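The local cross-validation recipe above can be sketched in R as follows; for simplicity the sketch uses the local linear point predictor (3)–(4) in place of the mean of a distribution estimator, which is a simplification of ours.

```r
# Sketch of local leave-one-out cross-validation for the bandwidth b, eq. (10).
local_cv_err <- function(b, x0, x, y, m = 20) {
  nn <- order(abs(x - x0))[1:m]                 # indices g(1), ..., g(m): m nearest design points
  errs <- sapply(nn, function(k) {
    yhat <- ll_estimate(x[k], x[-k], y[-k], b)  # delete-one prediction of Y_{g(k)}
    (yhat - y[k])^2
  })
  sum(errs)                                     # Err of eq. (10)
}

# Pick b minimizing Err over a candidate range at the boundary point x0 = 1:
b_grid <- seq(0.05, 0.5, by = 0.05)
b_opt  <- b_grid[which.min(sapply(b_grid, local_cv_err, x0 = 1, x = x, y = y))]
```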
3 Numerical results

The performance of the three distribution estimators $\bar F_x(y)$, $\bar F^{LLH}_x(y)$ and $\bar F^{LLM}_x(y)$ described above is empirically compared using both simulated and real-life datasets according to the following metrics.

1. Divergence between the local distribution $F_x(\cdot)$ estimated by all three methods and the corresponding local (empirical) distribution calculated from the actual data; this is determined using the mean value of the Kolmogorov-Smirnov (KS) test statistic. The measurement is performed on simulated datasets where multiple realizations of data at both boundary and internal points are available; therefore the empirical distribution at any point can be calculated and compared versus the estimated values. Our notation is KS-LC, KS-LLH and KS-LLM for the distribution estimators $\bar F_x(y)$, $\bar F^{LLH}_x(y)$ and $\bar F^{LLM}_x(y)$ respectively.
2. Comparison of estimated quantiles of $F_x(\cdot)$ at specified points using all three methods versus the corresponding empirical values calculated using simulated datasets.
3. Point prediction performance as indicated by bias and Mean Squared Error (MSE) on simulated and real-life datasets using all three methods. The MSE values of point prediction are denoted MSE-LC, MSE-LLH and MSE-LLM for the distribution estimators $\bar F_x(y)$, $\bar F^{LLH}_x(y)$ and $\bar F^{LLM}_x(y)$ respectively; the corresponding bias values are denoted Bias-LC, Bias-LLH and Bias-LLM. For comparison purposes the point-prediction performance is also measured using the local linear conditional moment estimator as given by equations (3) and (4); in this case bias and MSE are indicated as Bias-LL and MSE-LL respectively.
On simulated datasets the performance metrics for all three distribution estimators are calculated both at boundary and internal points, to illustrate how performance varies between $\bar F_x(y)$, $\bar F^{LLH}_x(y)$ and $\bar F^{LLM}_x(y)$ in the two cases.

3.1 Simulations with i.i.d. errors

Data $Y_i$ for $i = 1, \ldots, n$ were generated as in model (1) with $\mu(x_i) = \sin(x_i)$, $\sigma(x_i) = \tau$ and the errors $\epsilon_i$ i.i.d. $N(0,1)$. Sample size $n$ was set to 1001. A total of 500 such realizations were generated for this study.
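For reproducibility, one realization of this design might be generated along the following lines; the choice of design points $x_i$ is not fully spelled out in the text, so the equispaced grid below is an assumption of ours, as is the value of $\tau$.

```r
# One simulated realization (illustrative; tau = 0.1 and an equispaced design assumed).
set.seed(123)
n   <- 1001
tau <- 0.1
x   <- seq_len(n) / n             # assumed design; the paper does not spell this out
y   <- sin(x) + tau * rnorm(n)    # Y_i = mu(x_i) + sigma(x_i) * eps_i with mu = sin, sigma = tau
```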
Results for the mean value of the Kolmogorov-Smirnov test statistic between the LC, LLH and LLM estimated distributions and the empirical distribution calculated using available values of the simulated data are given in Tables 1, 2, 3 and 4 for the boundary point $n = 1001$ and the internal point $n = 200$, for values of $\tau = 0.1$ and $0.3$, and for the bandwidth $b$ taking values $10, 20, \ldots, 140$. In addition, the $\alpha$–quantiles at specific values of $\alpha$ are calculated using all three distribution estimators and compared with corresponding quantiles calculated from the available data. Plots for selected quantile values ($\alpha = 0.1$ and $\alpha = 0.9$) are shown in Figures 3, 4, 5 and 6 for both 1- and 2-sided cases ($\tau = 0.3$) at $n = 1001$ and $n = 200$ over 500 realizations, for the case of boundary and internal points respectively. The bandwidths used for estimating the quantiles for LC, LLH and LLM are based on the bandwidth values where the best performance for these estimators was obtained using the Kolmogorov-Smirnov test (refer to Tables 3 and 4).

Note that the point $n = 1001$ is excluded from the data used for LC, LLH and LLM estimation at the boundary point. Similarly the point $n = 200$ is excluded for the case of estimation at the internal point.

From the results on these i.i.d. regression datasets it can be seen that for boundary value estimation the estimator based on $\bar F^{LLM}_x(y)$ has superior performance as compared to both $\bar F_x(y)$ and $\bar F^{LLH}_x(y)$. The improvement is seen over a wide range of selected bandwidths using both the mean values of the Kolmogorov-Smirnov test statistic (Tables 1 and 3) and the mean-square error of point prediction (Tables 5 and 7).
Moreover, the overall best performance over the selected bandwidth range from 10 to 140 is obtained using the Monotone Local Linear Estimator $\bar F^{LLM}_x(y)$. In addition, it can be seen from the plots of the estimated quantiles at $\alpha = 0.1$ and $\alpha = 0.9$ that the quantile distribution for LLM is aligned more closely to the 'true' quantile value calculated from the simulated data, as shown by the dotted line (Figures 3 and 4).

Table 1: Mean values of KS test statistic over i.i.d. data at boundary point (n = 1001, τ = 0.1)
Bandwidth KS-LC KS-LLH KS-LLM
10 0.23508 0.252884 0.275132
20 0.241992 0.233996 0.23606
30 0.2767 0.232064 0.218948
40 0.31528 0.240476 0.20744
50 0.349924 0.2554 0.2009
60 0.38438 0.273648 0.204404
70 0.418316 0.288032 0.21502
80 0.448772 0.307672 0.231588
90 0.474796 0.326224 0.253472
100 0.502768 0.342884 0.275936
110 0.5264 0.360888 0.2993
120 0.54664 0.37786 0.320348
130 0.56692 0.393392 0.34248
140 0.58646 0.407108 0.359404

For the case of estimation at internal points no appreciable differences in performance are noticeable between the three estimators, using both the mean values of the Kolmogorov-Smirnov test statistic (Tables 2 and 4) and also the mean-square error of point prediction (Tables 6 and 8). Similar trends are noticeable in the quantile plots, where the estimated quantiles using LC, LLH and LLM nearly overlap for the internal case (Figures 5 and 6). It can also be seen from Tables 5, 6, 7 and 8 that, across the range of bandwidths considered, there is negligible loss in best point prediction performance of LLM versus that of LL.

Table 2: Mean values of KS test statistic over i.i.d. data at internal point (n = 200, τ = 0.1)

Table 3: Mean values of KS test statistic over i.i.d. data at boundary point (n = 1001, τ = 0.
3) Bandwidth KS-LC KS-LLH KS-LLM10 0.207104 0.303696 0.35291220 0.148964 0.210324 0.25085630 0.125284 0.171268 0.205840 0.112412 0.15016 0.18217650 0.107232 0.136612 0.1670260 0.107764 0.127176 0.15494470 0.111144 0.121408 0.14562480 0.119836 0.115008 0.13696890 0.126996 0.110716 0.128792100 0.137376 0.108468 0.121452110 0.14676 0.105504 0.1165120 0.157364 0.107432 0.111452130 0.165528 0.108692 0.107532140 0.175852 0.110228 0.10377213able 4: Mean values of KS test statistic over i.i.d. data at internal point ( n = 200 , τ = 0 . n = 1001 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL10 -0.01887676 0.01265856 -0.0087034 0.01453471 0.0004694887 0.01667712 0.00279478 0.0171324320 -0.03782673 0.01261435 -0.01818502 0.0126929 0.0005444976 0.01323652 0.003247646 0.0134041830 -0.05753609 0.01418224 -0.02725602 0.01232877 -0.001022256 0.01200918 0.0039133 0.0121962840 -0.07724901 0.01672728 -0.03718728 0.01259729 -0.005397138 0.01148354 0.00354838 0.0116749650 -0.09692561 0.0200906 -0.04758345 0.01327841 -0.01222596 0.01130622 0.002834568 0.0113909560 -0.116533 0.02423279 -0.05831195 0.01431087 -0.02106315 0.01142789 0.002008806 0.0112032770 -0.1359991 0.02911512 -0.06918129 0.0156254 -0.03138586 0.01185914 0.001102312 0.0110682180 -0.1555938 0.03480583 -0.08021998 0.01722284 -0.04274234 0.01263368 8.912064e-05 0.0109694790 -0.1752324 0.04128715 -0.09144259 0.01910772 -0.05473059 0.01375585 -0.001070282 0.01089842100 -0.1947342 0.04848954 -0.1027918 0.02127558 -0.0670785 0.01521865 -0.002416635 0.01084951110 -0.2145001 0.05656322 -0.1142845 0.02374615 -0.07967838 0.01704094 -0.003988081 0.01081946120 -0.2343967 0.06548142 -0.1259372 0.02651703 -0.09236019 0.01919461 -0.005818943 0.01080699130 -0.2543523 0.07522469 -0.1377167 0.02960364 -0.1050934 0.02168698 -0.007939144 0.01081259140 -0.2740635 0.08563245 -0.1496325 0.03301117 -0.1178388 0.02451228 -0.01037417 0.01083832 n = 200 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL10 0.005693694 0.01026982 0.005815108 0.01027252 0.005811741 0.01027231 0.005672309 0.0102734120 0.004548762 0.009868812 0.004644668 0.009871222 0.004640743 0.009871005 0.004547984 0.00988325730 0.003077572 0.009736622 0.003193559 0.009739295 0.003189924 0.009738919 0.003108078 0.00975492740 0.001168265 0.009684642 0.001329604 0.009685997 0.001325573 0.009685696 0.001205735 0.00970349250 -0.001163283 0.009671566 -0.0009392514 0.009670138 -0.0009440976 0.009670008 -0.001162398 0.00968921460 -0.003874557 0.009682447 -0.00359328 0.009680945 -0.003598744 0.009680969 -0.003997042 0.00970370 -0.006944759 0.009723612 -0.006615935 0.009717111 -0.006621406 0.009717225 -0.007307346 0.00974567580 -0.01035534 0.009789969 -0.009987875 0.009781065 -0.009992804 0.009781194 -0.01109961 0.00982269590 -0.01407319 0.009888265 -0.01368629 0.009877023 -0.01369037 0.009877157 -0.01537421 0.009942768100 -0.01808254 0.01002258 -0.01768867 0.01001026 -0.01769184 0.01001041 -0.02012788 0.01011708110 -0.02234318 0.01020278 -0.02197526 0.01018668 -0.02197765 0.01018686 -0.02535515 0.01035866120 -0.02686568 0.01042781 -0.02652964 0.01041258 -0.02653147 0.0104128 -0.03104801 0.0106819130 -0.03163397 0.01071166 -0.03133849 0.01069454 -0.03133999 0.01069479 -0.03719388 0.01110199140 -0.03662567 0.01105637 -0.03639079 0.01103926 -0.03639212 0.01103955 -0.04377252 0.0116341
Table 7: Point Prediction for Boundary Value over i.i.d. data ( n = 1001 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL10 0.04888178 0.301925 0.07073083 0.3540897 0.07920865 0.384878 0.0808868 0.403557920 0.02525561 0.2802656 0.05074344 0.3037839 0.06735271 0.3233827 0.07335949 0.327623430 0.00374298 0.2731737 0.038811 0.2892723 0.06222332 0.3013529 0.07053577 0.302794240 -0.01695169 0.270537 0.02715055 0.2805281 0.05849931 0.2922475 0.06822172 0.29355250 -0.03718522 0.2696291 0.01614152 0.2761872 0.05612087 0.2867147 0.06656515 0.289217960 -0.05753523 0.2699832 0.005048478 0.2739688 0.05384922 0.2829767 0.0649322 0.286019270 -0.07760465 0.271603 -0.005574987 0.2723361 0.0513094 0.2798923 0.06320544 0.283079380 -0.09765073 0.2742877 -0.01633413 0.271128 0.04834131 0.2770397 0.06143642 0.280324290 -0.1176859 0.2780296 -0.02722356 0.2704552 0.04514186 0.2748099 0.05960562 0.2778554100 -0.1373472 0.2827116 -0.0383895 0.2701542 0.04137961 0.2727937 0.05763437 0.2757286110 -0.1572939 0.2883236 -0.04971082 0.2703248 0.03701344 0.2709994 0.05544761 0.2739321120 -0.1769863 0.294608 -0.0611495 0.2709176 0.03212707 0.2695289 0.0530012 0.2724221130 -0.1965911 0.3018083 -0.07255455 0.2717088 0.02680826 0.2683285 0.05027668 0.2711495140 -0.2158054 0.3097015 -0.08401642 0.2728317 0.02098977 0.2673724 0.04726651 0.2700701 n = 200 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL10 0.009184716 0.2520511 0.01220923 0.2521932 0.01220434 0.2521952 0.007409901 0.251658220 0.01372525 0.2431836 0.01526585 0.2435718 0.01526117 0.2435718 0.01263826 0.242690330 0.0148307 0.2398743 0.01582708 0.2401292 0.01582349 0.2401341 0.01395436 0.239570140 0.0135934 0.2381523 0.01432564 0.2382689 0.01432314 0.2382728 0.01288775 0.237928450 0.01125721 0.236852 0.011912 0.2369737 0.01190766 0.2369759 0.01078182 0.236742860 0.008293956 0.2359636 0.008883749 0.2359976 0.008879099 0.2360007 0.007971824 0.235822570 0.004809638 0.2352631 0.005346559 0.2352719 0.005342764 0.235277 0.004580992 0.23512180 0.0009735356 0.2347759 0.001361408 0.2347516 0.001357999 0.2347585 0.0006901448 0.234611890 -0.003467449 0.234453 -0.003042608 0.2344041 -0.003046705 0.2344117 -0.00365963 0.2342717100 -0.008232451 0.2342181 -0.007859816 0.2342051 -0.007864671 0.2342125 -0.008456445 0.2340811110 -0.01347908 0.2341583 -0.01309377 0.2341384 -0.01309954 0.234145 -0.01370081 0.2340256120 -0.01912791 0.2342317 -0.01874779 0.2341951 -0.01875384 0.2342009 -0.01939379 0.2340969130 -0.02516629 0.2344631 -0.0248178 0.2343727 -0.02482374 0.234378 -0.02553028 0.2342927140 -0.0316367 0.2347946 -0.0312908 0.2346738 -0.03129606 0.2346788 -0.03209508 0.2346152
Figure 3: Estimated versus true quantile values (α = 0.1) for 1-sided estimation, i.i.d. errors (τ = 0.3).

Figure 4: Estimated versus true quantile values (α = 0.9) for 1-sided estimation, i.i.d. errors (τ = 0.3).

Figure 5: Estimated versus true quantile values (α = 0.1) for 2-sided estimation, i.i.d. errors (τ = 0.3).

Figure 6: Estimated versus true quantile values (α = 0.9) for 2-sided estimation, i.i.d. errors (τ = 0.3).

3.2 Simulations with heteroskedastic errors

Data $Y_i$ for $i = 1, \ldots, n$ were generated as in model (1) with $\mu(x_i) = \sin(x_i)$, $\sigma(x_i) = \tau x_i$ where $x_i = i/n$, and the errors $\epsilon_i$ i.i.d. centered chi-squared, i.e., $\chi^2 - 1$.
Sample size $n$ was set to 1001. A total of 500 such realizations were generated for this study.

Results for the mean value of the Kolmogorov-Smirnov test statistic between the LC, LLH and LLM estimated distributions and the empirical distribution calculated using available values of the simulated data are given in Tables 9, 10, 11 and 12 for the boundary point $n = 1001$ and the internal point $n = 200$, for values of $\tau = 0.1$ and $0.3$, and for the bandwidth $b$ taking values $10, 20, \ldots, 140$. Note that the point $n = 1001$ is excluded from the data used for LC, LLH and LLM estimation at the boundary point. Similarly the point $n = 200$ is excluded for the case of estimation at the internal point.

From the results on these heteroskedastic regression datasets it can be seen that for boundary value estimation the estimator based on $\bar F^{LLM}_x(y)$ has superior performance as compared to both $\bar F_x(y)$ and $\bar F^{LLH}_x(y)$.
The improvement is seen over a wide range of selected bandwidths using both the mean values of the Kolmogorov-Smirnov test statistic (Tables 9 and 11) and the mean-square error of point prediction (Tables 13 and 15).
Moreover, the overall best performance over the selected bandwidth range from 10 to 140 is obtained using the Monotone Local Linear Estimator $\bar F^{LLM}_x(y)$.

For the case of estimation at internal points no appreciable differences in performance are noticeable between the three estimators, using both the mean values of the Kolmogorov-Smirnov test statistic (Tables 10 and 12) and also the mean-square error of point prediction (Tables 14 and 16).

It can also be seen from Tables 13, 14, 15 and 16 that—across the range of bandwidths considered—there is negligible loss in best point prediction performance of LLM versus that of LL. This finding is unexpected since it has been widely believed that the LL method gives optimal point estimators and/or predictors. It appears that the monotonicity correction does not hurt the resulting point estimators/predictors, which is encouraging.

Table 9: Mean values of KS test statistic over heteroskedastic data at boundary point (n = 1001, τ = 0.1)
Bandwidth KS-LC KS-LLH KS-LLM
10 0.361228 0.3619 0.368288
20 0.39358 0.3606 0.336436
30 0.43216 0.371372 0.326076
40 0.470316 0.388952 0.325116
50 0.506436 0.408316 0.335152
60 0.53998 0.42548 0.350864
70 0.572256 0.44356 0.371324
80 0.599836 0.462808 0.393896
90 0.6269 0.47816 0.415468
100 0.651132 0.499376 0.44184
110 0.670604 0.516304 0.462756
120 0.69 0.529796 0.485004
130 0.706968 0.545344 0.505352
140 0.72394 0.562432 0.5257
3.3 Real data example: the Wage dataset

The Wage dataset from the ISLR package (James, Witten, Hastie, & Tibshirani, 2013) was selected as a real-life example to demonstrate the differences in the local densities estimated using the LC, LLH and LLM methods. The full dataset has 3000 points and has been constructed from the Current Population Survey (CPS) data for year 2011. Point prediction is used as the criterion for demonstrating performance differences between the three distribution estimators. This dataset is an example of regression data distributed non-uniformly, and hence the local linear estimator (LL) based on equations (3) and (4) is expected to give the best performance in such cases. However our study involves point prediction using the three distribution estimators $\bar F_x(y)$, $\bar F^{LLH}_x(y)$ and $\bar F^{LLM}_x(y)$. Among these three estimators LLM gives the best point prediction performance, and we show that using this estimator causes negligible loss in performance compared to using LL.

From the plot of the dataset in Figure 7 with superimposed smoother (obtained using loess fitting from the R package lattice) it can be noted that the regression function is sloping upwards at the left boundary whereas it flattens out at the right boundary. Hence, at the right boundary, local constant methods suffice and should be practically equivalent to local linear methods. The left boundary is more interesting, and this is where our numerical work will focus.
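The exploratory plot described above can be reproduced along the following lines (a sketch; it assumes the ISLR and lattice packages are installed):

```r
# Sketch of the exploratory plot of the Wage data with a loess smoother superimposed.
library(ISLR)      # provides the Wage data frame (3000 rows, incl. age and logwage)
library(lattice)
xyplot(logwage ~ age, data = Wage, type = c("p", "smooth"),
       xlab = "age", ylab = "logwage")   # "smooth" adds a loess fit
```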
Table 10: Mean values of KS test statistic over heteroskedastic data at internal point (n = 200, τ = 0.1)

Bandwidth KS-LC KS-LLH KS-LLM
10 0.459776 0.461528 0.461176
20 0.461872 0.462716 0.4603
30 0.46576 0.467308 0.464956
40 0.468904 0.471824 0.470172
50 0.47436 0.475916 0.474864
60 0.482716 0.482476 0.47912
70 0.488952 0.488444 0.486656
80 0.495916 0.495736 0.495056
90 0.503672 0.503052 0.502708
100 0.5105 0.513116 0.51026
110 0.519052 0.518104 0.518928
120 0.528456 0.528444 0.527104
130 0.537336 0.536916 0.535632
140 0.545264 0.545496 0.543776

Table 11: Mean values of KS test statistic over heteroskedastic data at boundary point (n = 1001, τ = 0.3)
Bandwidth KS-LC KS-LLH KS-LLM
10 0.208708 0.28022 0.323664
20 0.176304 0.210876 0.241228
30 0.178416 0.189656 0.206996
40 0.189136 0.17842 0.186628
50 0.204484 0.175508 0.173096
60 0.220652 0.177144 0.163916
70 0.240692 0.181092 0.158476
80 0.25784 0.186648 0.15736
90 0.277888 0.191396 0.156008
100 0.295264 0.20092 0.159028
110 0.312968 0.20922 0.163296
120 0.330008 0.216464 0.167872
130 0.345432 0.22344 0.17522
140 0.36082 0.234392 0.181376

Table 12: Mean values of KS test statistic over heteroskedastic data at internal point (n = 200, τ = 0.
3) Bandwidth KS-LC KS-LLH KS-LLM10 0.3289 0.329088 0.32911220 0.327172 0.326072 0.326830 0.327236 0.32788 0.327540 0.331784 0.3309 0.3318650 0.337856 0.337888 0.33769260 0.343504 0.344328 0.34336870 0.350048 0.351444 0.34959280 0.3588 0.359188 0.35894490 0.36826 0.368708 0.368008100 0.378308 0.376472 0.377692110 0.386636 0.3864 0.388256120 0.39642 0.395744 0.39754130 0.4055 0.408072 0.40714140 0.418516 0.4171 0.41794Table 13: Point Prediction for Boundary Value over heteroskedastic data ( n = 1001 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL10 -0.01646515 0.0110308 -0.008415339 0.01301928 0.003362834 0.01521503 0.002122231 0.0153291120 -0.03418985 0.01113183 -0.01803682 0.01111592 0.001465109 0.01164382 0.003045892 0.0119658730 -0.05291763 0.01251871 -0.02791687 0.0110065 -0.001493594 0.01066538 0.003162759 0.0110203940 -0.07217657 0.01484132 -0.03844108 0.01144334 -0.007355843 0.01025364 0.003252626 0.0105149450 -0.09186859 0.0180368 -0.0493472 0.01222871 -0.01604004 0.01020275 0.003291589 0.0102067860 -0.1116673 0.02205503 -0.06052473 0.01337097 -0.0266163 0.0105145 0.003183081 0.0100204970 -0.1312554 0.02681084 -0.07204081 0.01484635 -0.03845618 0.01120131 0.002843088 0.00991557680 -0.1512692 0.03246252 -0.08373385 0.01662921 -0.05099656 0.01226805 0.002239256 0.00985814990 -0.1714417 0.03896746 -0.09557852 0.01872622 -0.06394962 0.01372077 0.00136753 0.009824624100 -0.1916003 0.04627765 -0.1075785 0.02114855 -0.07708492 0.01554708 0.0002256174 0.009802568110 -0.2119687 0.05448537 -0.1197012 0.02389638 -0.09028337 0.01774215 -0.001196002 0.009787441120 -0.2326798 0.06368262 -0.1320067 0.02699023 -0.1035047 0.0202921 -0.002912961 0.009779257130 -0.2535364 0.07381161 -0.1444581 0.03043434 -0.1167033 0.02319127 -0.004943721 0.009780505140 -0.2740579 0.08462823 -0.1570383 0.03422973 -0.1299138 0.02644559 -0.007307095 0.009795173 n = 200 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL10 -0.00078446 0.0004397085 -0.001282314 0.0004403816 -0.001281847 0.0004403461 -0.001460641 0.000441750620 -0.001122367 0.0004306633 -0.001431476 0.0004311207 -0.001431238 0.0004311334 -0.001977922 0.000433542730 -0.002288569 0.0004309951 -0.002426097 0.0004313394 -0.002424798 0.0004313337 -0.003182182 0.000436019540 -0.00405804 0.0004390668 -0.004017686 0.0004388123 -0.004015654 0.0004387818 -0.004960882 0.000447617550 -0.006300199 0.0004597561 -0.006090049 0.0004573097 -0.006086971 0.0004572689 -0.007269184 0.000473254560 -0.008952297 0.0004986956 -0.00857106 0.0004917471 -0.008566653 0.0004916759 -0.01008903 0.000520117570 -0.01195461 0.0005599192 -0.0114063 0.0005469568 -0.01140055 0.0005468245 -0.01341156 0.000596653780 -0.01524307 0.000648231 -0.01455151 0.0006275842 -0.01454456 0.0006273696 -0.01723004 0.00071257790 -0.0188042 0.0007686332 -0.0179713 0.0007381116 -0.01796344 0.0007378019 -0.02153766 0.0008788525100 -0.02260511 0.0009254938 -0.02163909 0.0008829351 -0.02163065 0.000882528 -0.02632699 0.00110763110 -0.02662906 0.001123084 -0.02553604 0.001066478 -0.02552742 0.001065985 -0.03158974 0.001412141120 -0.03085926 0.001365925 -0.02964955 0.001293297 -0.02964117 0.00129274 -0.03731584 0.001806523130 -0.03531386 0.001660546 -0.03397167 0.001568158 -0.03396393 0.00156757 -0.0434914 0.002305438140 -0.03995794 0.002010171 -0.0384976 0.001896071 -0.03849081 0.00189549 -0.05009551 0.002923419
Table 15: Point Prediction for Boundary Value over heteroskedastic data ( n = 1001 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL0 -0.01641585 0.273216 -0.01371259 0.3278422 0.01500573 0.3662851 0.01063269 0.383228120 -0.02085331 0.2520507 -0.0253276 0.274159 0.002731055 0.2896534 0.01538866 0.299151630 -0.02981426 0.2462187 -0.03060796 0.2589025 0.003270715 0.2685365 0.0163369 0.275526640 -0.04068759 0.2442488 -0.03742699 0.2526514 0.002433103 0.2586642 0.01748551 0.262914750 -0.05443176 0.2442541 -0.04586018 0.2488821 0.0005299281 0.2526573 0.01882287 0.255252960 -0.06977487 0.245767 -0.05475483 0.2474683 -0.001728694 0.248843 0.0199724 0.250657970 -0.08589639 0.2481108 -0.06470145 0.2471712 -0.005360827 0.2463975 0.02061821 0.248112480 -0.1036357 0.25121 -0.07550857 0.2474051 -0.01066184 0.2448518 0.02070028 0.246756990 -0.1221155 0.2551902 -0.08684923 0.2482818 -0.01739231 0.2440367 0.02029725 0.2459808100 -0.1410488 0.2599296 -0.09877418 0.2499431 -0.02522804 0.2438554 0.01949336 0.2454429110 -0.1599352 0.2653016 -0.1111362 0.252298 -0.03400529 0.2440469 0.0183332 0.2449864120 -0.1798873 0.2718873 -0.1241105 0.2552008 -0.04372748 0.2444396 0.01682687 0.2445524130 -0.2001088 0.2793124 -0.1376482 0.2586551 -0.05435831 0.2450597 0.01496614 0.2441256140 -0.2196351 0.2872558 -0.1514669 0.2625969 -0.06555938 0.2460242 0.01273652 0.2437067 n = 200 , τ = 0 . Ban Bias-LC MSE-LC Bias-LLH MSE-LLH Bias-LLM MSE-LLM Bias-LL MSE-LL10 -0.005989017 0.01091506 -0.009105295 0.01090718 -0.009100397 0.01090687 -0.006151798 0.0110282820 -0.004317512 0.01067232 -0.006852094 0.01066366 -0.006845515 0.01066378 -0.005549238 0.0107715630 -0.004591794 0.01059617 -0.006678386 0.0105944 -0.006665835 0.01059435 -0.006333703 0.0106874540 -0.005937551 0.01054613 -0.007674744 0.01055486 -0.007656429 0.01055456 -0.00795147 0.0106384150 -0.007974463 0.01051967 -0.009442124 0.01053534 -0.009416436 0.01053501 -0.01019012 0.0106141860 -0.01058953 0.01053145 -0.01180495 0.01054554 -0.01176999 0.01054509 -0.01297439 0.0106265670 -0.01373489 0.01057533 -0.01467215 0.01059193 -0.01462675 0.01059104 -0.01627809 0.0106845780 -0.01725266 0.01066128 -0.01798338 0.01067947 -0.01792693 0.01067781 -0.02008891 0.0107961490 -0.02118215 0.0107964 -0.02169107 0.01081295 -0.02162338 0.01081019 -0.0243973 0.01096978100 -0.02546816 0.01098609 -0.02575577 0.01099723 -0.02567729 0.01099311 -0.0291937 0.01121525110 -0.03007643 0.01123397 -0.0301445 0.01123745 -0.03005627 0.01123178 -0.03446792 0.01154379120 -0.03496193 0.01154587 -0.03483024 0.01153901 -0.03473374 0.01153169 -0.04020819 0.01196797130 -0.04015664 0.01193249 -0.03979071 0.01190758 -0.03968792 0.01189862 -0.04639914 0.0125013140 -0.04561132 0.01240015 -0.04500812 0.01234911 -0.04490124 0.01233864 -0.05301874 0.01315745
Table 17: Point Prediction for ISLR Wage Dataset
Method Bias MSE
LC 0.0004954944 0.08236025
LLH -0.001962329 0.0808793
LLM -6.005305e-05 0.08044857
LL 0.0002608775 0.08055141

To carry this out, we created a second version of the data where logwage is tabulated versus decreasing age, and point prediction was performed over the last 231 values of this backward dataset, i.e., the first 231 values of the original. Since this is a regression dataset with non-uniformly distributed design points, we determine bandwidths for LC, LLH and LLM using the 2-sided predictive cross-validation procedure outlined in Section 2.5. We predict the value of logwage at index $i$ and compare it with the known value at that point, where $i = 2770, \ldots, 3000$. The resulting bias and MSE values of point prediction are given in Table 17.

4 Conclusions

Improved estimation of conditional distributions at boundary points is possible via local linear smoothing and other methods that, however, do not guarantee that the resulting estimator is a proper distribution function. In the paper at hand we propose a simple monotonicity correction procedure that is immediately applicable, easy to implement, and performs well with simulated and real data.

To elaborate, it has been shown using boundary points on simulated datasets that the LLM distribution estimator outperforms that of LLH and LC as seen by the values of the Kolmogorov-Smirnov test statistic, the accuracy of estimated quantiles, and also by its performance in point prediction—the latter finding being entirely unexpected. In contrast, for internal points on these datasets there seem to be no significant differences between the three estimators using these performance metrics.

In addition, among all three methods over a wide range of selected bandwidths the overall best performance is obtained using Monotone Local Linear Estimation. As can be seen from the point prediction tables, the predictor based on $\bar F^{LLM}_x(y)$ has lower bias compared to $\bar F_x(y)$ and $\bar F^{LLH}_x(y)$; this is consistent with the discussion in Section 2, i.e. that $\bar F^{LLM}_x(y)$ has improved performance because of reduced bias in extrapolation for the boundary case. No such differences in bias are noticed for the case of internal points.

As in the case of simulated data, in the real data example as well the point prediction performance of LLM closely matches that of LL, which implies that the LLM distribution estimator can be used for all practical applications, including point prediction.

Acknowledgements

This research was partially supported by NSF grants DMS 12-23137 and DMS 16-13026. The authors would like to acknowledge the Pacific Research Platform, NSF Project ACI-1541349 and Larry Smarr (PI, Calit2 at UCSD) for providing the computing infrastructure used in this project.
References
Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications (Monographs on Statistics and Applied Probability 66). CRC Press, Boca Raton.

Hall, P., Wolff, R. C., & Yao, Q. (1999). Methods for estimating a conditional distribution function. Journal of the American Statistical Association, 94(445), 154–163.

Hansen, B. E. (2004). Nonparametric estimation of smooth conditional distributions. Unpublished paper, Department of Economics, University of Wisconsin.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). ISLR: Data for an Introduction to Statistical Learning with Applications in R [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=ISLR (R package version 1.0).

Koenker, R. (2005). Quantile regression (No. 38). Cambridge University Press, Cambridge.

Li, Q., & Racine, J. S. (2007). Nonparametric econometrics: Theory and practice. Princeton University Press, Princeton.

Politis, D. N. (2013). Model-free model-fitting and predictive distributions. Test, 22(2), 183–221.

Politis, D. N. (2015). Model-free prediction and regression. Springer, New York.

Schucany, W. R. (2004). Kernel smoothers: An overview of curve estimators for the first graduate course in nonparametric statistics. Statistical Science, 19, 663–675.

Wand, M. P., & Jones, M. C. (1994). Kernel smoothing. CRC Press, Boca Raton.

Yu, K., & Jones, M. C. (1998). Local linear quantile regression. Journal of the American Statistical Association, 93(441), 228–237.

Yu, K., Lu, Z., & Stander, J. (2003). Quantile regression: Applications and current research areas. Journal of the Royal Statistical Society: Series D (The Statistician), 52.