COMPARISONS OF TWO QUANTILE REGRESSION SMOOTHERS
Rand R. Wilcox
Dept. of Psychology
University of Southern California
October 11, 2018

ABSTRACT

The paper compares the small-sample properties of two nonparametric quantile regression estimators. The first is based on constrained B-spline smoothing (COBS) and the other is based on a variation and slight extension of a running interval smoother, which apparently has not been studied via simulations. The motivation for this paper stems from the Well Elderly 2 study, a portion of which was aimed at understanding the association between the cortisol awakening response and two measures of stress. COBS indicated what appeared to be an unusual form of curvature. The modified running interval smoother gave a strikingly different estimate, which raised the issue of how it compares to COBS in terms of mean squared error and bias as well as its ability to avoid a spurious indication of curvature. R functions for applying the methods were used in conjunction with default settings for the various optional arguments. The results indicate that the modified running interval smoother has practical value. Manipulation of the optional arguments might impact the relative merits of the two methods, but the extent to which this is the case remains unknown.

Keywords: running interval smoother, COBS, Harrell-Davis estimator, LOWESS, Well Elderly 2 study, depressive symptoms, perceived control.
1 Introduction

The paper deals with the problem of estimating and plotting a regression line when the goal is to determine the conditional quantile of some random variable Y given X. Quantile regression methods have been studied extensively and plots of the regression line can provide a useful perspective regarding the association between two variables. One approach is to assume that the conditional qth quantile of Y, given X, is given by

Y_q = β_0 + β_1 X, (1)

where β_0 and β_1 are unknown parameters. For the special case where the goal is to estimate the median of Y, given X, least absolute value regression can be used, which predates least squares regression by about a half century. A generalization aimed at dealing with any quantile was derived by Koenker and Bassett (1978). While the assumption of a straight regression line appears to provide a good approximation of the true regression line in various situations, this is not always the case. One strategy for dealing with any possible curvature is to use some obvious parametric model, for example, adding a quadratic term. But generally this can be unsatisfactory, which has led to the development of nonparametric regression lines, often called smoothers (e.g., Härdle, 1990; Efromovich, 1999; Eubank, 1999; Györfi et al., 2002).

For the particular case where the goal is to model the conditional quantile of Y given X, one way of dealing with curvature in a reasonably flexible manner is to use constrained B-spline smoothing (COBS). The many computational details are summarized in Koenker and Ng (2005); see in particular section 4 of their paper. The Koenker–Ng method improves on a computational method studied by He and Ng (1999) and builds upon results in Koenker, Ng and Portnoy (1994). Briefly, let

ρ_q(u) = u(q − I(u < 0)),

where the indicator function I(u < 0) = 1 if u < 0; otherwise I(u < 0) = 0. The goal is to estimate the qth quantile of Y given X by finding a function ω(X) that minimizes

Σ ρ_q(Y_i − ω(X_i)) (2)

based on the random sample (X_1, Y_1), ..., (X_n, Y_n). The estimate is based on quadratic B-splines with the number of knots chosen via a Schwarz-type information criterion. Here, COBS is applied via the R package cobs.

The motivation for this paper stems from the use of COBS when analyzing data from the Well Elderly 2 study (Jackson et al., 2009; Clark et al., 2012). A general goal was to assess the efficacy of an intervention strategy aimed at improving the physical and emotional health of older adults. A portion of the study dealt with understanding the association between cortisol and various measures of stress and well-being. Before and six months following the intervention, participants were asked to provide, within one week, four saliva samples over the course of a single day, to be obtained on rising, 30 min after rising but before taking anything by mouth, before lunch, and before dinner. Extant studies (e.g., Clow et al., 2004; Chida & Steptoe, 2009) indicate that measures of stress are associated with the cortisol awakening response (CAR), which is defined as the change in cortisol concentration that occurs during the first hour after waking from sleep. CAR is taken to be the cortisol level upon awakening minus the level of cortisol after the participants were awake for about an hour.

Figure 1: COBS regression line for predicting the .5 quantile of CESD, for males, based on the cortisol awakening response after intervention.

After intervention (with a sample size of 328), COBS indicated some seemingly unusually shaped regression lines. One of these had to do with the association between CAR and a measure of depressive symptoms using the Center for Epidemiologic Studies Depressive Scale (CESD).
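Before turning to the data, the check function ρ_q underlying (2) can be made concrete with a short Python sketch. The function names here are illustrative only and are not part of the cobs package; the sketch simply evaluates the loss for a candidate fit ω(X_i):

```python
def check_loss(u, q):
    # rho_q(u) = u * (q - I(u < 0)): an asymmetric absolute loss whose
    # expected value is minimized at the q-th conditional quantile.
    return u * (q - (1 if u < 0 else 0))

def objective(y, fitted, q):
    # The quantity minimized in (2) for a candidate fit omega(X_i).
    return sum(check_loss(yi - fi, q) for yi, fi in zip(y, fitted))
```

For q = .5 the objective reduces to half the sum of absolute residuals, so minimizing it recovers least absolute value regression at the median; for other q the two slopes of the loss are unequal, which is what pushes the fit toward the qth quantile.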
The CESD (Radloff, 1977) is sensitive to change in depressive status over time and has been successfully used to assess ethnically diverse older people (Lewinsohn et al., 1988; Foley et al., 2002). Higher scores indicate a higher level of depressive symptoms. Figure 1 shows the estimated regression line for males when q = .5. (There were 157 males.) The estimated regression line for q = .75 had a shape very similar to the one shown in Figure 1.

Another portion of the study dealt with the association between CAR and a measure
of perceived control. (Perceived control was measured with the instrument in Eizenman et al., 1997. The scores ranged between 16 and 32 and consisted of a sum of Likert scales.)

Figure 2: COBS regression line for predicting the .75 quantile of perceived control based on the cortisol awakening response.

Now the .75 quantile regression line appears as shown in Figure 2. Again, there was concern about the shape of the regression line.

One possibility is that the regression lines in Figures 1 and 2 are a reasonable approximation of the true regression line. But another possibility is that they reflect a type of curvature that poorly approximates the true regression line. Suppose that

Y = β_0 + β_1 X + λ(X)ε, (3)

where λ(X) is some function used to model heteroscedasticity and ε is a random variable having mean zero and variance σ². Some preliminary simulation results suggested that if β_0 = 0, β_1 = 1, and both X and ε have standard normal distributions, reasonably straight regression lines are obtained using COBS. However, if ε has a skewed light-tailed distribution (a g-and-h distribution, details of which are described in section 3) and if, for example, λ(X) = |X| + 1, instances are encountered where the estimate exhibits a relatively high degree of curvature. An example is given in Figure 3 with n = 100.

These results motivated consideration of an alternative quantile regression estimator. A few checks suggested that the problems just illustrated are reduced considerably, but there are no systematic simulation results providing some sense of how this alternative estimator compares to COBS. Consequently, the goal in this paper is to compare these estimators in terms of bias and mean squared error. Two additional criteria are used.
The first is the maximum absolute error between the predicted and actual quantile being estimated. The other is aimed at characterizing how the estimators compare in terms of indicating a monotonic association when in fact one exists. This is done via Kendall's tau between the predicted and true quantiles.

It is noted that COBS is being applied using the R package cobs in conjunction with default settings for the various arguments. The argument lambda alters how the regression line is estimated and might possibly improve the fit to data via visual inspection. But obviously this strategy is difficult to study via simulations. The alternative estimator used here is applied with an R function (qhdsm), again using default settings for all of the arguments. The performance of the method is impacted by the choice for the span (the constant f in section 2). The simulations reported here provide information about the relative merits of the two estimators with the understanding that their relative merits might be altered based on a judgmental process that goes beyond the scope of this paper.

Section 2 provides the details of the alternative estimator. Section 3 reports simulation results comparing COBS to the alternative estimator and section 4 illustrates the difference between the two estimators for the data used in Figures 1-3.

Figure 3: COBS regression line for predicting the .5 quantile using generated data, n = 100.

2 Description of the Alternative Estimator
The alternative estimator consists of a blend of two smoothers: the running interval smoother (e.g., Wilcox, 2012, section 11.5.4) and the smoother derived by Cleveland (1979), typically known as LOWESS. The running interval smoother has appeal because it is readily adapted to any robust estimator. In particular, it is easily applied when the goal is to estimate the conditional quantile of Y given X. However, often this smoother gives a somewhat jagged looking plot of the regression line. Primarily for aesthetic reasons, this issue is addressed by further smoothing the regression line via LOWESS.

The version of the running interval smoother used here is based in part on the quantile estimator derived by Harrell and Davis (1982). The Harrell–Davis estimate of the qth quantile uses a weighted average of all the order statistics. Let Z_1, ..., Z_n be a random sample, let U be a random variable having a beta distribution with parameters a = (n + 1)q and b = (n + 1)(1 − q), and let

w_i = P((i − 1)/n ≤ U ≤ i/n).

The estimate of the qth quantile is

ˆθ_q = Σ_{i=1}^n w_i Z_(i), (4)

where Z_(1) ≤ ... ≤ Z_(n) are the Z_1, ..., Z_n written in ascending order. Here the focus is on estimating the median and the .75 quantile; that is, q = .5 and q = .75. Comparisons with other quantile estimators are reported by Parrish (1990), Sheather and Marron (1990), as well as Dielman, Lowry and Pfaffenberger (1994). The only certainty is that no single estimator dominates in terms of efficiency. For example, the Harrell–Davis estimator has a smaller standard error than the usual sample median when sampling from a normal distribution or a distribution that has relatively light tails, but for sufficiently heavy-tailed distributions, the reverse is true (Wilcox, 2012, p. 87).

The running interval smoother is applied as follows. Let (X_1, Y_1), ..., (X_n, Y_n) be a random sample from some unknown bivariate distribution and let f be some constant to be determined.
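As a sketch of (4), the following standard-library Python code computes the Harrell–Davis weights and estimate. The numerical beta CDF is an assumption made here to avoid external dependencies; in practice a library routine would be used:

```python
import math

def beta_cdf(x, a, b, steps=4000):
    # Regularized incomplete beta function I_x(a, b), approximated by
    # trapezoidal integration of the Beta(a, b) density. Adequate when
    # a, b > 1, which holds for the Harrell-Davis weights below.
    if x <= 0.0:
        return 0.0
    x = min(x, 1.0)
    lnB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = x / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        if 0.0 < t < 1.0:
            f = math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - lnB)
        else:
            f = 0.0  # density vanishes at the endpoints when a, b > 1
        total += f if 0 < k < steps else 0.5 * f
    return total * h

def harrell_davis(z, q):
    # hat-theta_q = sum_i w_i Z_(i), with w_i = P((i-1)/n <= U <= i/n)
    # and U ~ Beta((n+1)q, (n+1)(1-q)).
    z = sorted(z)
    n = len(z)
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    cdf = [beta_cdf(i / n, a, b) for i in range(n + 1)]
    return sum((cdf[i] - cdf[i - 1]) * z[i - 1] for i in range(1, n + 1))
```

Because every order statistic receives positive weight, the estimate varies smoothly in q, which is part of what makes it attractive inside a running interval smoother.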
Then the point x is said to be close to X_i if

|X_i − x| ≤ f × MADN,

where MADN is MAD/.6745 and MAD is the median of |X_1 − M|, ..., |X_n − M|, where M is the usual sample median based on X_1, ..., X_n. For normal distributions, MADN estimates the standard deviation, in which case x is close to X_i if x is within f standard deviations of X_i. Let

N(X_i) = {j : |X_j − X_i| ≤ f × MADN}.

That is, N(X_i) indexes the set of all X_j values that are close to X_i. Let ˆθ_i be the Harrell–Davis estimate based on the Y_j values such that j ∈ N(X_i). To get a graphical representation of the regression line, compute ˆθ_i, the estimated value of Y given that X = X_i, i = 1, ..., n, and then plot the points (X_1, ˆθ_1), ..., (X_n, ˆθ_n). Typically f = .8 is used, which is the default here.

To smooth the plot further via LOWESS, fix X_j and let

δ_i = |X_i − X_j|, i = 1, ..., n.

Sort the δ_i values and retain the ξn pairs of points that have the smallest δ_i values, where ξ is a number between 0 and 1 and plays the role of a span. Here, ξ = .75 is used. Let δ_m be the largest δ_i value among the retained points. Let

Q_i = |X_j − X_i| / δ_m,

and if 0 ≤ Q_i < 1, set

w_i = (1 − Q_i³)³;

otherwise set w_i = 0. Next, compute the weighted least squares estimate of θ_j corresponding to X_j using the w_i values as weights. That is, determine the values b_0 and b_1 that minimize

Σ w_i (ˆθ_i − b_0 − b_1 X_i)²

and estimate θ_j with ˜θ_j = b_0 + b_1 X_j. The final plot of the quantile regression line is taken to be the line connecting the points (X_j, ˜θ_j), j = 1, ..., n. This will be called method R henceforth.

3 Simulation Results

Simulations were used to compare the small-sample properties of COBS and the modified running interval smoother based on K = 4000 replications and sample size n = 50. The data were generated according to the model

Y = X + λ(X)ε, (5)

where X is taken to have a standard normal distribution and ε has one of four distributions: normal, symmetric and heavy-tailed, asymmetric and light-tailed, and asymmetric and heavy-tailed. More precisely, the distribution for the error term was taken to be one of four g-and-h distributions (Hoaglin, 1985) that contain the standard normal distribution as a special case. If Z has a standard normal distribution, then

W = ((exp(gZ) − 1)/g) exp(hZ²/2), if g > 0,
W = Z exp(hZ²/2), if g = 0,

has a g-and-h distribution, where g and h are parameters that determine the first four moments. The four distributions used here were the standard normal (g = h = 0), a symmetric heavy-tailed distribution (g = 0, h = 0.2), an asymmetric light-tailed distribution (g = 0.2, h = 0), and an asymmetric heavy-tailed distribution (g = h = 0.2). Table 1 shows the skewness (κ_1) and kurtosis (κ_2) for each distribution. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

Three choices for λ(X) were considered: λ(X) ≡ 1, λ(X) = |X| + 1, and λ(X) = 1/(|X| + 1). These three choices are called VP 1, 2, and 3 henceforth.

Table 1: Some properties of the g-and-h distribution.

 g     h     κ_1     κ_2
0.0   0.0   0.00    3.00
0.0   0.2   0.00   21.46
0.2   0.0   0.61    3.68
0.2   0.2   2.81  155.98

An additional goal was to determine how well the estimators indicate a monotonic association between X_1, ..., X_n and ˘θ_q1, ..., ˘θ_qn, where ˘θ_qi is the estimate of the qth quantile of Y given that X = X_i based on either COBS or method R. The degree to which this goal was accomplished was measured with Kendall's tau.

Details about the four criteria used to compare COBS and method R are as follows.
The first criterion was mean squared error, which was estimated with

(1/(nK)) Σ_{k=1}^K Σ_{i=1}^n (θ_qik − ˘θ_qik)², (6)

where, for the kth replication, θ_qik is the true conditional qth quantile of Y given X = X_i and again ˘θ_qik is the estimate of θ_qik based on either COBS or method R. Bias was estimated with

(1/(nK)) Σ_{k=1}^K Σ_{i=1}^n (θ_qik − ˘θ_qik). (7)

The third criterion was the mean maximum absolute error:

(1/K) Σ_{k=1}^K max{|θ_q1k − ˘θ_q1k|, ..., |θ_qnk − ˘θ_qnk|}. (8)

The fourth criterion was

(1/K) Σ_{k=1}^K τ_k, (9)

where, for the kth replication, τ_k is Kendall's tau between X_1, ..., X_n and ˘θ_q1, ..., ˘θ_qn. It is noted that the θ_qik values are readily determined because the transformation used to generate observations from a g-and-h distribution is monotonic and quantiles are location and scale equivariant.

Simulation results are reported in Table 2, where RMSE is the mean squared error of COBS divided by the mean squared error of method R, and RMAX is the mean maximum absolute error of COBS divided by the mean maximum absolute error of method R. As can be seen, generally method R competes well with COBS in terms of RMSE and RMAX, but neither method dominates. For q = .5, R is uniformly better in terms of RMSE, but for q = .75 and VP 3, COBS performs better than R. As for RMAX, R performs best for VP 1 and 2, while for VP 3 the reverse is true. Bias for both methods is typically low, with COBS seeming to have an advantage over method R. In terms of τ, method R dominates. That is, the simulations indicate that method R is better at avoiding an indication of curvature that does not reflect the true regression line, as was the case in Figure 3.

Table 2: Simulation Results

                                  BIAS              τ
 g    h   VP   RMSE   RMAX    COBS      R      COBS      R
q = .5
0.0  0.0   1      ?      ?       ?      ?         ?      ?
0.0  0.0   2      ?      ?       ?  −.002     0.773  0.927
0.0  0.0   3  1.281  0.726       ?  −.001     0.989  1.000
0.0  0.2   1  1.160  1.333       ?  −.002     0.955  0.994
0.0  0.2   2  1.395  2.076   0.005  0.009     0.794  0.917
0.0  0.2   3  1.104  0.753       ?  −.002     0.991  1.000
0.2  0.0   1  1.247  1.292   0.015  0.031     0.954  0.996
0.2  0.0   2  1.400  2.048   0.023  0.035     0.786  0.930
0.2  0.0   3  1.220  0.732   0.004  0.022     0.989  1.000
0.2  0.2   1  1.178  1.384   0.014  0.027     0.956  0.993
0.2  0.2   2  1.455  2.155   0.034  0.042     0.794  0.914
0.2  0.2   3  1.040  0.765   0.005  0.023     0.990  1.000
q = .75
0.0  0.0   1      ?      ?   −.017  0.077     0.938  0.994
0.0  0.0   2  1.328  2.020   −.052  0.027     0.709  0.862
0.0  0.0   3  0.644  0.807   −.014  0.105     0.973  0.998
0.0  0.2   1  0.847  1.459   0.010  0.137     0.911  0.978
0.0  0.2   2  1.124  2.140   0.022  0.136     0.666  0.794
0.0  0.2   3  0.544  0.858   0.001  0.145     0.969  0.995
0.2  0.0   1  0.964  1.423   0.000  0.110     0.907  0.985
0.2  0.0   2  1.284  2.057   −.026  0.074     0.655  0.803
0.2  0.0   3  0.642  0.860   −.006  0.126     0.962  0.997
0.2  0.2   1  0.849  1.505   0.042  0.181     0.880  0.953
0.2  0.2   2  1.195  2.214   0.084  0.202     0.614  0.740
0.2  0.2   3  0.552  0.945   0.013  0.167     0.955  0.993

R = modified running interval smoother; ? = entry not recoverable from the source.

4 Illustrations

The data in Figures 1-3 are used to illustrate method R. The left panel of Figure 4 shows the .5 quantile regression line for CAR and CESD. Notice that for CAR positive (cortisol decreases after awakening), the plot suggests a positive association with depressive symptoms, which is consistent with Figure 1. But for CAR negative, method R suggests that there is little or no association with CESD and clearly provides a different sense regarding the nature of the association. A criticism might be that if method R were to use a smaller choice for the span, perhaps an association similar to Figure 1 would be revealed. But even with a span f = .5, the plot of the regression line is very similar to the one shown in Figure 4.

The right panel of Figure 4 shows the .75 quantile regression line for predicting perceived control based on CAR, which differs in an obvious way from the regression line based on COBS shown in Figure 2. Figure 4 indicates that there is little or no indication of an association with CAR when CAR is negative, but for CAR positive, a negative association is indicated. The main point is that the choice between COBS and method R can make a substantial difference.

Figure 4: The quantile regression lines using method R and the data in Figures 1 and 2.

Figure 5 shows the .5 quantile regression line based on the data used in Figure 3. In contrast to COBS, method R provides a very good approximation of the true regression line. Again, this illustrates the extent to which the two methods can give strikingly different results. As is evident, in this particular case, method R provides a much more accurate indication of the true regression line.
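The data-generating model (5) and the running-interval step of method R can be sketched as follows. This is a simplified illustration, not the qhdsm implementation: a plain order-statistic quantile stands in for the Harrell–Davis estimator and the LOWESS re-smoothing step is omitted.

```python
import math
import random
import statistics

def g_and_h(g, h, z):
    # g-and-h transform of a standard normal deviate Z (Hoaglin, 1985):
    # W = ((exp(gZ) - 1)/g) exp(h Z^2 / 2) if g > 0, else Z exp(h Z^2 / 2).
    if g > 0:
        return (math.exp(g * z) - 1.0) / g * math.exp(h * z * z / 2.0)
    return z * math.exp(h * z * z / 2.0)

def running_interval_quantile(x, y, q=0.5, f=0.8):
    # For each x_i, pool the y_j whose x_j lies within f * MADN of x_i and
    # estimate the q-th conditional quantile from the pool. A simple
    # interpolated order statistic replaces the Harrell-Davis estimator.
    m = statistics.median(x)
    madn = statistics.median([abs(xi - m) for xi in x]) / 0.6745
    fitted = []
    for xi in x:
        pool = sorted(yj for xj, yj in zip(x, y) if abs(xj - xi) <= f * madn)
        pos = q * (len(pool) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pool) - 1)
        fitted.append(pool[lo] + (pos - lo) * (pool[hi] - pool[lo]))
    return fitted

# Example under VP 2: Y = X + (|X| + 1) * eps, eps a g-and-h variate
# with g = h = 0.2, mirroring the simulation design.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [xi + (abs(xi) + 1.0) * g_and_h(0.2, 0.2, rng.gauss(0.0, 1.0)) for xi in xs]
fit = running_interval_quantile(xs, ys, q=0.5)
```

Since the median of the g-and-h error is zero, the true .5 quantile line in this example is simply Y = X, and the fitted values should roughly track the identity line rather than exhibiting the spurious curvature seen in Figure 3.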
5 Concluding Remarks

For the situations considered in the simulations, method R does not dominate COBS based on the four criteria used here. COBS seems to have an advantage in terms of minimizing bias. But otherwise method R competes well with COBS, particularly in terms of Kendall's tau, which suggests that typically method R is better able to avoid an indication of spurious curvature.

Figure 5: The quantile regression line using method R and the data in Figure 3.

Moreover, the illustrations demonstrate that the choice between the two methods can make a substantial difference even with a sample size of n = 328. So in summary, method R would seem to deserve serious consideration.

Another possible appeal of method R is that it is readily extended to the situation where there is more than one independent variable. That is, a generalization of the running interval smoother already exists (e.g., Wilcox, 2012). Moreover, additional smoothing can be accomplished, if desired, using the smoother derived by Cleveland and Devlin (1988), which generalizes the technique derived by Cleveland (1979). Evidently, a generalization of COBS to more than one independent variable has not been derived.

Finally, an R function for applying method R, called qhdsm, is available in the R-Forge package WRS. A version is also stored on the first author's web page (in the file labeled Rallfun-v26).

References

Chida, Y. & Steptoe, A. (2009). Cortisol awakening response and psychosocial factors: A systematic review and meta-analysis. Biological Psychology, 80, 265–278.

Clark, F., Jackson, J., Carlson, M., et al. (2012). Effectiveness of a lifestyle intervention in promoting the well-being of independently living older people: results of the Well Elderly 2 Randomised Controlled Trial.
Journal of Epidemiology and Community Health, 66, 782–790. doi:10.1136/jech.2009.099754

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.

Cleveland, W. S. & Devlin, S. J. (1988). Locally-weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, 596–610.

Clow, A., Thorn, L., Evans, P. & Hucklebridge, F. (2004). The awakening cortisol response: Methodological issues and significance. Stress, 7, 29–37.

Dielman, T., Lowry, C. & Pfaffenberger, R. (1994). A comparison of quantile estimators. Communications in Statistics–Simulation and Computation, 23, 355–371.

Efromovich, S. (1999). Nonparametric Curve Estimation: Methods, Theory and Applications. New York: Springer-Verlag.

Eizenman, D. R., Nesselroade, J. R., Featherman, D. L. & Rowe, J. W. (1997). Intraindividual variability in perceived control in an older sample: The MacArthur Successful Aging Studies. Psychology and Aging, 12, 489–502.

Eubank, R. L. (1999). Nonparametric Regression and Spline Smoothing. New York: Marcel Dekker.

Foley, K., Reed, P., Mutran, E. & DeVellis, R. F. (2002). Measurement adequacy of the CESD among a sample of older African Americans. Psychiatric Research, 109, 61–69.

Györfi, L., Kohler, M., Krzyzak, A. & Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. New York: Springer-Verlag.

Härdle, W. (1990). Applied Nonparametric Regression. Econometric Society Monographs No. 19. Cambridge, UK: Cambridge University Press.

Harrell, F. E. & Davis, C. E. (1982). A new distribution-free quantile estimator. Biometrika, 69, 635–640.

He, X. & Ng, P. (1999). COBS: Qualitative constrained smoothing via linear programming. Computational Statistics, 14, 315–337.

Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distribution. In D. Hoaglin, F. Mosteller & J. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. New York: Wiley.

Jackson, J., Mandel, D., Blanchard, J., Carlson, M., Cherry, B., Azen, S., Chou, C.-P., Jordan-Marsh, M., Forman, T., White, B., Granger, D., Knight, B. & Clark, F. (2009). Confronting challenges in intervention research with ethnically diverse older adults: the USC Well Elderly II trial. Clinical Trials, 6, 90–101.

Koenker, R. & Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.

Koenker, R. & Ng, P. (2005). Inequality constrained quantile regression. Sankhya, The Indian Journal of Statistics, 67, 418–440.

Koenker, R., Ng, P. & Portnoy, S. (1994). Quantile smoothing splines. Biometrika, 81, 673–680.

Lewinsohn, P. M., Hoberman, H. M. & Rosenbaum, M. (1988). A prospective study of risk factors for unipolar depression. Journal of Abnormal Psychology, 97, 251–264.

Parrish, R. S. (1990). Comparison of quantile estimators in normal sampling. Biometrics, 46, 247–257.

Radloff, L. (1977). The CESD scale: a self report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401.

Sfakianakis, M. E. & Verginis, D. G. (2006). A new family of nonparametric quantile estimators. Communications in Statistics–Simulation and Computation, 37, 337–345.

Sheather, S. J. & Marron, J. S. (1990). Kernel quantile estimators. Journal of the American Statistical Association, 85, 410–416.

Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd Ed.). San Diego, CA: Academic Press.