Nearly root-n approximation for regression quantile processes
The Annals of Statistics ©
Institute of Mathematical Statistics, 2012
NEARLY ROOT-n APPROXIMATION FOR REGRESSION QUANTILE PROCESSES
By Stephen Portnoy
University of Illinois at Urbana-Champaign
Traditionally, assessing the accuracy of inference based on regression quantiles has relied on the Bahadur representation. This provides an error of order n^{−1/4} in normal approximations, and suggests that inference based on regression quantiles may not be as reliable as that based on other (smoother) approaches, whose errors are generally of order n^{−1/2} (or better in special symmetric cases). Fortunately, extensive simulations and empirical applications show that inference for regression quantiles shares the smaller error rates of other procedures. In fact, the "Hungarian" construction of Komlós, Major and Tusnády [Z. Wahrsch. Verw. Gebiete (1975) 111–131, Z. Wahrsch. Verw. Gebiete (1976) 33–58] provides an alternative expansion for the one-sample quantile process with nearly the root-n error rate (specifically, to within a factor of log n). Such an expansion is developed here to provide a theoretical foundation for more accurate approximations for inference in regression quantile models. One specific application of independent interest is a result establishing that for conditional inference, the error rate for coverage probabilities using the Hall and Sheather [J. R. Stat. Soc. Ser. B Stat. Methodol. (1988) 381–391] method of sparsity estimation matches their one-sample rate.
1. Introduction.
Consider the classical regression quantile model: given independent observations {(x_i, Y_i): i = 1, ..., n}, with x_i ∈ R^p fixed (for fixed p), the conditional quantile of the response Y_i given x_i is Q_{Y_i}(τ | x_i) = x'_i β(τ). Let β̂(τ) be the Koenker–Bassett regression quantile estimator of β(τ). Koenker (2005) provides definitions and basic properties, and describes the traditional approach to asymptotics for β̂(τ) using a Bahadur representation:

B_n(τ) ≡ n^{1/2}(β̂(τ) − β(τ)) = D(x)W(τ) + R_n,

where W(τ) is a Brownian Bridge and R_n is an error term.

Received August 2011; revised May 2012. Supported in part by NSF Grant DMS-10-07396.

AMS 2000 subject classifications. Primary 62E20, 62J99; secondary 60F17.

Key words and phrases. Regression quantiles, asymptotic approximation, Hungarian construction.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2012, Vol. 40, No. 3, 1714–1736. This reprint differs from the original in pagination and typographic detail.

Unfortunately, R_n is of order n^{−1/4} [see, e.g., Jurečková and Sen (1996) and Knight (2002)]. This might suggest that asymptotic results are accurate only to this order. However, both simulations in regression cases and one-dimensional results [Komlós, Major and Tusnády (1975, 1976)] justify a belief that regression quantile methods should share (nearly) the O(n^{−1/2}) accuracy of smooth statistical procedures (uniformly in τ). In fact, as shown in Knight (2002), n^{1/4}R_n has a limit with zero mean that is independent of W(τ). Thus, in any smooth inferential procedure (say, confidence interval lengths or coverages), this error term should enter only through E R_n² = O(n^{−1/2}). Nonetheless, this expansion would still leave an error of o(n^{−1/4}) (coming from the error beyond the R_n term in the Bahadur representation), and so would still fail to reflect root-n behavior. Furthermore, previous results only provide such a second-order expansion for fixed τ.

It must be noted that the slower O(n^{−1/4}) error rate arises from the discreteness introduced by indicator functions appearing in the gradient conditions. In fact, expansions can be carried out when the design is assumed to be random; see De Angelis, Hall and Young (1993) and Horowitz (1998), where the focus is on analysis of the (x, Y) bootstrap. Specifically, the assumption of a smooth distribution for the design vectors together with a separate treatment of the lattice contribution of the intercept does permit appropriate expansions. Unfortunately, the randomness in X means that all inference must be in terms of the average asymptotic distribution (averaged over X), and so fails to apply to the generally more desirable conditional forms of inference.
Specifically, unconditional methods may be quite poor in the heteroscedastic and nonsymmetric cases for which regression quantile analysis is especially appropriate. The main goal of this paper is to reclaim increased accuracy for conditional inference beyond that provided by the traditional Bahadur representation. Specifically, the aim is to provide a theoretical justification for an error bound of nearly root-n order uniformly in τ. Define

δ̂_n(τ) = √n(β̂(τ) − β(τ)).

We first develop a normal approximation for the density of δ̂ with the following form:

f_δ̂(δ) = φ_Σ(δ)(1 + O(L_n n^{−1/2}))  for ||δ|| ≤ D√(log n),

where L_n = (log n)^{3/2}. We then extend this result to the densities of a pair of regression quantiles in order to obtain a "Hungarian" construction [Komlós, Major and Tusnády (1975, 1976)] that approximates the process B_n(τ) by a Gaussian process to order O(L*_n n^{−1/2}), where L*_n = (log n)^{5/2} (uniformly for ε ≤ τ ≤ 1 − ε).

Section 2 provides some applications of the results here to conditional inference methods in regression quantile models. Specifically, an expansion is developed for coverage probabilities of confidence intervals based on the Hall and Sheather (1988) difference quotient estimator of the sparsity function. The coverage error rate is shown to achieve the rate O(n^{−2/3} log n) for conditional inference, which is nearly the known "optimal" rate obtained for a single sample and for unconditional inference. Section 3 lists the conditions and main results, and offers some remarks. Section 4 provides a description of the basic ingredients of the proof (since this proof is rather long and complicated). Section 5 proves the density approximation for a fixed τ (with multiplicative error).
Section 6 extends the result to pairs of regression quantiles (Theorem 1), and Section 7 provides the "Hungarian" construction (Theorem 2) with what appears to be a somewhat innovative induction along dyadic rationals.
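For readers who wish to experiment numerically, the Koenker–Bassett estimator β̂(τ) can be computed by linear programming. The sketch below is our own minimal Python implementation (the helper name `rq_fit` is ours, not from the paper or from quantreg): it minimizes the standard check-function LP and illustrates the fact, used later in Section 4, that β̂(τ) interpolates exactly p observations when the design is in general position.

```python
import numpy as np
from scipy.optimize import linprog

def rq_fit(X, y, tau):
    """Koenker-Bassett regression quantile via the standard LP formulation:
    minimize sum(tau*u_i + (1 - tau)*v_i) subject to y = X b + u - v, u, v >= 0,
    which is equivalent to minimizing the check-function objective."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0.0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(0)
n, p, tau = 25, 2, 0.35
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
beta_hat = rq_fit(X, y, tau)
resid = y - X @ beta_hat
h = np.flatnonzero(np.abs(resid) < 1e-7)   # indices of interpolated observations
```

With continuous responses and nτ not an integer, the LP optimum is a vertex, so `h` has exactly p elements and solving X_h b = Y_h reproduces `beta_hat`.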
2. Implications for applications.
As the impetus for this work was the need to provide some theoretical foundation for empirical results on the accuracy of regression quantile inference, some remarks on implications are in order.
Remark 1.
Clearly, whenever published work assesses the accuracy of an inferential method using the error term from the Bahadur representation, the present results will immediately provide an improvement from O(n^{−1/4}) to the nearly root-n rate here. One area of such results is methods based directly on regression quantiles and not requiring estimation of the sparsity function [1/f(F^{−1}(τ))]. There are several papers giving such results, although at present it appears that their methods have theoretical justification only under location-scale forms of quantile regression models.

Specifically, Zhou and Portnoy (1996) introduced confidence intervals (especially for fitted values) based on using pairs of regression quantiles in a way analogous to confidence intervals for one-sample quantiles. They showed that the method was consistent, but the accuracy depended on the Bahadur error term. Thus, results here now provide accuracy to the nearly root-n rate of Theorem 2.

A second approach directly using the dual quantile process is based on the regression ranks of Gutenbrunner et al. (1993). Again, the error terms in the theoretical results there can be improved using Theorem 1 here, though the development is not so direct.

For a third application, Neocleous and Portnoy (2008) showed that the regression quantile process interpolated along a grid of mesh strictly larger than n^{−1/2} is asymptotically equivalent to the full regression quantile process to first order, but (because of additional smoothness) will yield monotonic quantile functions with probability tending to 1. However, their development used the Bahadur representation, which indicated that a mesh of order n^{−3/8} balanced the bias and accuracy and bounded the difference between β̂(τ) and its linear interpolate by nearly O(n^{−3/4}). With some work, use of the results here would permit a mesh only slightly larger than n^{−1/2} while obtaining an approximation of nearly root-n order.

Remark 2.
Inference under completely general regression quantile models appears to require either estimation of the sparsity function or use of resampling methods. The most general methods in the quantreg package [Koenker (2012)] use the "difference quotient" method with the Hall and Sheather (1988) bandwidth of order n^{−1/3}, which is known to be optimal for coverage probabilities in the one-sample problem. As noted above, expansions using the randomness of the regressors can be developed to provide analogous results for unconditional inference. The results here (with some elaboration) can be used to show that the Hall–Sheather estimates provide (nearly) the same rates of accuracy for coverage probabilities under the conditional form of the regression quantile model.

To be specific, consider the problem of confidence interval estimation for a fixed linear combination of regression parameters: a'β(τ). The asymptotic variance is the well-known sandwich formula

s²_a(δ) = τ(1 − τ) a'(X'DX)^{−1}(X'X)(X'DX)^{−1}a,  D ≡ [diag(x'_i δ)]^{−1},  (2.1)

where δ is the sparsity vector, δ = β'(τ) (with β' being the derivative with respect to τ), and where X is the design matrix. Note that differentiating Q_{Y_i}(τ | x_i) = x'_i β(τ) gives x'_i β'(τ) = 1/f_i(x'_i β(τ)), so D estimates the diagonal matrix of conditional densities.

Following Hall and Sheather (1988), the sparsity may be approximated by the difference quotient δ̃ = (β(τ + h) − β(τ − h))/(2h). Standard approximation theory (using the Taylor series) shows that δ = δ̃ + O(h²). The sparsity may be estimated by

δ̂ ≡ ∆(h)/(2h) ≡ (β̂(τ + h) − β̂(τ − h))/(2h),  (2.2)

and the variance (2.1) may be estimated by inserting δ̂ in D. Then, as shown in the Appendix, the confidence interval

a'β(τ) ∈ a'β̂(τ) ± z_α s_a(δ̂)  (2.3)

has coverage probability 1 − α + O((log n) n^{−2/3}), which is within a factor of log n of the optimal Hall–Sheather rate in a single sample. Furthermore, this rate is achieved at the (optimal) h-value h*_n = c √(log n) n^{−1/3}, which is the optimal Hall–Sheather bandwidth except for the √(log n) term.
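Assembling (2.1)–(2.3) in code is mechanical once the two regression quantile fits are available. The sketch below (Python/numpy; the function name `sandwich_ci` is ours) takes coefficient vectors at τ ± h as given and forms the difference-quotient sparsity, the sandwich variance and the interval. For the demonstration we plug in the true β(τ ± h) of a Gaussian location model, β(t) = (Φ^{−1}(t), 2), for which the true sparsity 1/φ(Φ^{−1}(τ)) is known; in practice the β̂(τ ± h) would come from fitted regression quantiles.

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()

def sandwich_ci(X, beta_tau, beta_lo, beta_hi, h, tau, a, alpha=0.05):
    """CI (2.3) for a'beta(tau): difference-quotient sparsity (2.2)
    plugged into the sandwich variance (2.1)."""
    delta_hat = (beta_hi - beta_lo) / (2.0 * h)   # sparsity vector estimate
    D = np.diag(1.0 / (X @ delta_hat))            # estimates diag f_i(x_i' beta(tau))
    XDX_inv = np.linalg.inv(X.T @ D @ X)
    s2 = tau * (1 - tau) * a @ XDX_inv @ (X.T @ X) @ XDX_inv @ a
    z = nd.inv_cdf(1 - alpha / 2)
    center = a @ beta_tau
    return center - z * np.sqrt(s2), center + z * np.sqrt(s2)

rng = np.random.default_rng(1)
n, tau = 1000, 0.5
h = 0.5 * np.sqrt(np.log(n)) * n ** (-1.0 / 3.0)  # bandwidth h*_n with c = 0.5
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
beta = lambda t: np.array([nd.inv_cdf(t), 2.0])   # true beta(t), location model
lo, hi = sandwich_ci(X, beta(tau), beta(tau - h), beta(tau + h),
                     h, tau, a=np.array([1.0, 0.0]))
```

In this location model x'_i δ is constant, so (2.1) collapses to τ(1 − τ) δ₀² a'(X'X)^{−1}a, which provides a direct check of the assembly.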
Since the optimal bandwidth depends on R*_n, the optimal constant for h*_n cannot be determined, as it can be when X is allowed to be random [and for which the O(1/(nh_n)) term is explicit]. This appears to be an inherent shortcoming of using inference conditional on the design.

Note also that it is possible to obtain better error rates for the coverage probability by using higher order differences. Specifically, using the notation of (2.2),

(8∆(h) − ∆(2h))/(12h) = β'(τ) + O(h⁴).

As a consequence, the optimal bandwidth for this estimator is of order n^{−1/5}, and the coverage probability is accurate to order n^{−4/5} (except for logarithmic factors).

Remark 3.
A third approach to inference applies resampling methods. As noted in the Introduction, while the (x, Y) bootstrap is available for unconditional inference, the practicing statistician will generally prefer to use inference conditional on the design. There are some resampling approaches that can obtain such inference. One method is that of Parzen, Wei and Ying (1994), which simulates the binomial variables appearing in the gradient condition. Another is the "Markov Chain Marginal Bootstrap" of He and Hu (2002) [see also Kocherginsky, He and Mu (2005)]. However, this method also involves sampling from the gradient condition. The discreteness in the gradient condition would seem to require the error term from the Bahadur representation, and thus leads to poorer inferential approximation: the error would be no better than order n^{−1/2} even if it were the square of the Bahadur error term. While some evidence for decent performance of these methods comes from (rather limited) simulations, it is often noticed that these methods perform perhaps somewhat more poorly than the other methods in the quantreg package of Koenker (2012). Clearly, a more complete analysis of inference for regression quantiles based on the more accurate stochastic expansions here would be useful.
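Returning to the higher order differences of Remark 2: the bias reduction is easy to verify numerically in the one-sample case. The sketch below (Python; our own illustration, not code from the paper) applies the standard Richardson-type combination (8∆(h) − ∆(2h))/(12h), whose Taylor expansion cancels the h² term, to the N(0, 1) quantile function, where the true sparsity 1/φ(Φ^{−1}(τ)) is known exactly.

```python
import math
from statistics import NormalDist

nd = NormalDist()
Q = nd.inv_cdf                                    # one-sample quantile function
tau = 0.3
q = Q(tau)
true_sparsity = math.sqrt(2 * math.pi) * math.exp(q * q / 2.0)  # 1/phi(Q(tau))

def Delta(h):                                     # Delta(h) as in (2.2)
    return Q(tau + h) - Q(tau - h)

errs = []
for h in (0.08, 0.04, 0.02):
    e1 = abs(Delta(h) / (2 * h) - true_sparsity)                        # bias O(h^2)
    e2 = abs((8 * Delta(h) - Delta(2 * h)) / (12 * h) - true_sparsity)  # bias O(h^4)
    errs.append((h, e1, e2))
```

Halving h should divide the first-order error by about 4 and the extrapolated error by about 16, matching the O(h²) and O(h⁴) bias orders.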
3. Conditions, fundamental theorems and remarks.
Under the regression quantile model of Section 1, the following conditions will be imposed. Let ẋ_i denote the coordinates of x_i except for the intercept (i.e., the last p − 1 coordinates), and let φ̇_i(t) denote the conditional characteristic function of the random variable ẋ_i(I(Y_i ≤ x'_i β(τ) + x'_i δ/√n) − τ), given x_i. Let f_i(y) and F_i(y) denote the conditional density and c.d.f. of Y_i given x_i.

Condition X1. For any ε > 0, there is η ∈ (0, 1) such that

sup_{||t|| ≥ ε} ∏_i |φ̇_i(t)| ≤ η^n  (3.1)

uniformly in ε ≤ τ ≤ 1 − ε.

Condition X2. The ||x_i|| are uniformly bounded, and there are positive definite p × p matrices G = G(τ) and H such that for any ε > 0 (as n → ∞),

G_n(τ) ≡ n^{−1} ∑_{i=1}^n f_i(x'_i β(τ)) x_i x'_i = G(τ)(1 + O(n^{−1/2})),  (3.2)

H_n ≡ n^{−1} ∑_{i=1}^n x_i x'_i = H(1 + O(n^{−1/2}))  (3.3)

uniformly in ε ≤ τ ≤ 1 − ε.

Condition F.
The derivative of log(f_i(y)) is uniformly bounded on the interval {y: ε ≤ F_i(y) ≤ 1 − ε}.

Two fundamental results will be developed here. The first result provides a density approximation with multiplicative error of nearly root-n rate. A result for fixed τ is given in Theorem 5, but the result needed here is a bivariate approximation for the joint density of one regression quantile and the difference between this one and a second regression quantile (properly normalized for the difference in τ-values).

Let ε ≤ τ₁ ≤ 1 − ε for some ε > 0, and let τ₂ = τ₁ + a_n with a_n ≥ c n^{−b} for some b < 1. Here, one may want to take b near 1 [see remark (1) below], though the basic result will often be useful for b = 1/2, or even smaller. Define

B_n = B_n(τ₁) ≡ n^{1/2}(β̂(τ₁) − β(τ₁)),  (3.4)

R_n = R_n(τ₁, τ₂) ≡ (na_n)^{1/2}[(β̂(τ₂) − β(τ₂)) − (β̂(τ₁) − β(τ₁))].  (3.5)
Under Conditions X1, X2 and F, there is a constant D such that for |B_n| ≤ D(log n)^{1/2} and |R_n| ≤ D(log n)^{1/2}, the joint density of R_n and B_n at δ and s, respectively, satisfies

f_{R_n,B_n}(δ, s) = φ_{Γ_n}(δ, s)(1 + O((na_n)^{−1/2}(log n)^{3/2})),

where φ_{Γ_n} is a normal density with covariance matrix Γ_n having the form given in (7.3).

The second result provides the desired "Hungarian" construction:
Theorem 2.
Assume Conditions X1, X2 and F. Fix a_n = n^{−b} with b < 1, and let {τ_j} be dyadic rationals with denominator less than n^b. Define B*_n(τ) to be the piecewise linear interpolant of {B_n(τ_j)} [as defined in (3.4)]. Then for any ε > 0, there is a (zero-mean) Gaussian process, {Z_n(τ_j)}, defined along the dyadic rationals {τ_j} and with the same covariance structure as B*_n(τ) (along {τ_j}) such that its piecewise linear interpolant Z*_n(τ) satisfies

sup_{ε ≤ τ ≤ 1−ε} |B*_n(τ) − Z*_n(τ)| = O((log n)^{5/2}/√n)

almost surely.

Some remarks on the conditions and ramifications are in order:

(1) The usual construction approximates B_n(τ) by a "Brownian Bridge" process. Theorem 2 really only provides an approximation for the discrete processes at a sufficiently sparse grid of dyadic rationals. That the piecewise linear interpolants converge to the usual Brownian Bridge follows as in Neocleous and Portnoy (2008). The critical impediment to getting a Brownian Bridge approximation to B_n(τ) with the error in Theorem 2 is the square root behavior of the modulus of continuity. This prevents approximating the piecewise linear interpolant within an interval of length greater than (roughly) order 1/n if a root-n error is desired. In order to approximate the density of the difference in B_n(τ) over an interval between dyadic rationals, the length of the interval must be at least of order n^{−b} (for b < 1). The modulus of continuity over such an interval is of order √(n^{−b}) = n^{−b/2}, and thus b must be taken near 1 to get arbitrarily close to the value of 1/2 for the exponent of n. For most purposes, it might be better to state the final result as

sup_{ε ≤ τ ≤ 1−ε} ||B_n(τ) − Z(τ)|| = O(n^{−a})  for any a < 1/2

(where Z is the appropriate Brownian Bridge); but the stronger error bound of Theorem 2 does provide a much closer analog of the result for the one-sample (one-dimensional) quantile process.

(2) The one-sample result requires only the first power of log n, which is known to give the best rate for a general result. The extra power of (log n)^{3/2} here appears to be the price of the p-dimensional argument used in the proofs.

(3) Condition X1 is designed to mimic the effect of a smooth design distribution (see Section 4), and will hold with probability tending to one if ẋ_i is sampled from such a smooth distribution. Nonetheless, the condition that ||x|| be bounded seems rather strong in the case of random x. It seems clear that this can be weakened, though probably at the cost of getting a poorer approximation. For example, ||x|| having exponentially small tails might increase the bound in Theorem 2 by an additional factor of log n, and algebraic tails are likely worse. However, details of such results remain to be developed.

(4) Similarly, it should be possible to let ε, which defines the compact subinterval of τ-values, tend to zero. Clearly, letting ε_n be of order 1/n would lead to extreme value theory and very different approximations. For slower rates of convergence of ε_n, Bahadur expansions have been developed [e.g., see Gutenbrunner et al. (1993)] and extension to the approximation result in Theorem 2 should be possible. Again, however, this would most likely be at the cost of a larger error term.

(5) The assumption that the conditional density of the response (given x) be continuous is required even for the usual first order asymptotics. However, one might hope to avoid Condition F, which requires a bounded derivative at all points. For example, the double exponential distribution does not satisfy this condition. It is likely that the proofs here can be extended to the case where the derivative does not exist on a finite set (or even on a set of measure zero), but dropping differentiability entirely would require a rather different approach.
Furthermore, the apparent need for bounded derivatives in providing uniformity over τ in Bahadur expansions suggests the possibility that some differentiability is required.

(6) Theorem 1 provides a bivariate normal density approximation with error rate (nearly) n^{−1/2} when τ₁ and τ₂ are fixed. When a_n ≡ τ₂ − τ₁ → 0, the error rate becomes (nearly) (na_n)^{−1/2}, reflecting the fact that D_n = β̂(τ₂) − β̂(τ₁) is of order (na_n)^{−1/2}.
4. Ingredients and outline of proof.
The development of the fundamental results (Theorems 1 and 2) will be presented in three phases. The first phase provides the density approximation for a fixed τ, since some of the more complicated features are more transparent in this case. The second phase extends this result to the bivariate approximation of Theorem 1. The final phase provides the "Hungarian" construction of Theorem 2. To clarify the development, the basic ingredients and some preliminary results will be presented first.

Ingredient 1.
Begin with the finite sample density for a regression quantile [Koenker (2005), Koenker and Bassett (1978)]: assume Y_i has a density, f_i(y), and let τ be fixed. Note that β̂(τ) is defined by having p zero residuals (if the design is in general position). Specifically, there is a subset, h, of p integers such that β̂(τ) = X_h^{−1} Y_h, where X_h has rows x'_i for i ∈ h and Y_h has coordinates Y_i for i ∈ h. Let H denote the set of all such p-element subsets. Define

δ̂ = √n(β̂(τ) − β(τ)).

As described in Koenker (2005), the density of δ̂ evaluated at the argument δ = √n(b − β(τ)) is given by

f_δ̂(δ) = n^{−p/2} ∑_{h∈H} |det(X_h)| P{S_n ∈ A_h} ∏_{i∈h} f_i(x'_i β(τ) + n^{−1/2} x'_i δ).  (4.1)

Here, the event in the probability above is the event that the gradient condition holds for a fixed subset, h: S_n ∈ A_h, where A_h = X_h R, with R the rectangle that is the product of intervals (τ − 1, τ) [see Theorem 2.1 of Koenker (2005)], and where

S_n = S_n(h, β, δ) ≡ ∑_{i∉h} x_i (I(Y_i ≤ x'_i β + n^{−1/2} x'_i δ) − τ).  (4.2)

Ingredient 2.
Since n^{−1/2}S_n is approximately normal, and A_h is bounded, the probability in (4.1) is approximately a normal density evaluated at δ. To get a multiplicative bound, we may apply a "Cramér" expansion (or a saddlepoint approximation). If S_n had a smooth distribution (i.e., satisfied Cramér's condition), then standard results would apply. Unfortunately, S_n is discrete. The first coordinate of S_n is nearly binomial, and so a multiplicative bound can be obtained by applying a known saddlepoint formula for lattice variables [see Daniels (1987)]. Equivalently, approximate by an exact binomial and (more directly, but with some rather tedious computation) expand the logarithm of the Gamma function in Stirling's formula. Using either approach, one can show the following result:

Theorem 3.
Let W ~ binomial(n, p), let J be any interval of length O(√n) containing EW = np, and let w = O(√(n log n)). Then

P{W ∈ J + w} = P{Z ∈ J + w}(1 + O(n^{−1/2}(log n)^{3/2})),  (4.3)

where Z ~ N(np, np(1 − p)).

A proof based on multinomial expansions is given for the bivariate generalization in Theorem 1. Note that this result includes an extra factor of √(log n) in the range allowed for w. This will allow the bounds to hold except with probability bounded by an arbitrarily large negative power of n. This is clear for the limiting normal case (by standard asymptotic expansions of the normal c.d.f.). To obtain such bounds for the distribution of S_n will require some form of Bernstein's inequality. Such inequalities date to Bernstein's original publication in 1924 [see Bernstein (1964)], but a version due to Hoeffding (1963) may be easier to apply.

Ingredient 3.
Using Theorem 3, it can be shown (see Section 5) that the probability in (4.1) may be approximated as

P{S̃_n ∈ A_h}(1 + O(L_n/√n)),

where the first coordinate of S̃_n is a sum of n i.i.d. N(0, τ(1 − τ)) random variables, the last (p − 1) coordinates are those of S_n, and L_n = (log n)^{3/2}.
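The content of Theorem 3 is easy to see numerically: for an interval of length O(√n) shifted by up to a few standard deviations, exact binomial and normal interval probabilities agree to high relative accuracy. The check below (Python/scipy; our own illustration) places the interval endpoints on the half-integer lattice so that the comparison is not dominated by the lattice (continuity) effect that the saddlepoint argument handles.

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 4000, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))

def interval_probs(w):
    """P{W in J + w} for J = [mu - 2 sd, mu + 2 sd], endpoints on the
    half-integer lattice, versus the N(np, np(1-p)) approximation."""
    lo = np.floor(mu - 2 * sd + w) + 0.5
    hi = np.floor(mu + 2 * sd + w) + 0.5
    pb = binom.cdf(hi, n, p) - binom.cdf(lo, n, p)   # exact binomial
    pn = norm.cdf(hi, mu, sd) - norm.cdf(lo, mu, sd)  # normal approximation
    return pb, pn

rel_errs = []
for w in (0.0, 1.5 * sd, 3.0 * sd):   # shifts of the kind allowed in Theorem 3
    pb, pn = interval_probs(w)
    rel_errs.append(abs(pb - pn) / pb)
```

The relative (not just absolute) errors remain small even for the shifted intervals, which is the multiplicative form of the bound needed in the proof.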
Since we seek a normal approximation for this probability with multiplicative error, at this point one might hope that a known (multidimensional) "Cramér" expansion or saddlepoint approximation would allow S̃_n to be replaced by a normal vector (thus providing the desired result). However, this will require that the summands be smooth, or (at least) satisfy a form of Cramér's condition. Let ẋ_i denote the last (p − 1) coordinates of x_i. One approach would be to assume ẋ_i has a smooth distribution satisfying the classical form of Cramér's condition. However, to maintain a conditional form of the analysis, it suffices to impose a condition on ẋ_i, which is designed to mimic the effect of a smooth distribution and will hold with probability tending to one if ẋ_i has such a smooth distribution. Condition X1 specifies just such an assumption.

Note that the characteristic functions of the summands of S̃_n, say, {φ̇_i(t)}, will also satisfy Condition X1 [equation (3.1)] and so should allow application of known results on normal approximations. Unfortunately, I have been unable to find a published result providing this, and so Section 5 will present an independent proof.

Clearly, some additional conditions will be required. Specifically, we will need conditions that the empirical moments of {x_i} converge appropriately, as specified in Condition X2.

Finally, the approach using characteristic functions is greatly simplified when the sums, S̃_n, have densities. Again, to avoid using smoothness of the distribution of {ẋ_i} (and thus to maintain a conditional approach), introduce a random perturbation V_n which is small and has a bounded smooth density (the bound may depend on n). Section 5 will then prove the following:

Theorem 4.
Assume Conditions X1 and X2 and the regression quantile model of Section 1. Let δ be the argument of the density of n^{1/2}(β̂ − β), and suppose ||δ|| ≤ d₁√(log n) for some constant d₁. Then a constant d₂ can be chosen so that

P{S_n + V_n ∈ A_h} = P{(Z_n + V_n)/√n ∈ A_h/√n}(1 + O((log³(n)/n)^{1/2})) + O(n^{−d₂}),

where Z_n/√n has mean −G_n δ and covariance τ(1 − τ)H_n, d₂ can be arbitrarily large, and V_n is a small perturbation [see (5.1)].

Following the proof of this theorem, it will be shown that the effect of V_n can be ignored if V_n is bounded by n^{−d₄}, where d₄ may depend on d₂ (but not on d₁).

Ingredient 4.
Expanding the densities in (4.1) is trivial if the densities are sufficiently smooth. The assumption of a bounded first derivative in Condition F appears to be required to analyze second order terms (beyond the first order normal approximation).
Ingredient 5.
Finally, summing terms involving det(X_h) in (4.1) over the (n choose p) summands will require Vinograd's theorem and related results from matrix theory concerning adjoint matrices [see Gantmacher (1960)].

The remaining ingredients provide the desired "Hungarian" construction.

Ingredient 6.
Extend the density approximation to the joint density for β̂(τ₁) and β̂(τ₂) (when standardized). A major complication is that one needs a_n ≡ |τ₂ − τ₁| → 0, making the covariance matrix tend to singularity. Thus, we focus on the joint density for standardized versions of β̂(τ₁) and D_n ≡ β̂(τ₂) − β̂(τ₁). Clearly, this requires modification of the proof for the univariate case to treat the fact that D_n converges at a rate depending on a_n. The result is given in Theorem 1.

Ingredient 7.
Extend the density result to obtain an approximation for the quantile transform for the conditional distribution of differences D_n (between successive dyadic rationals). This will provide (independent) normal approximations to the differences whose sums will have the same covariance structure as the regression quantile process (at least along a sufficiently sparse grid of dyadic rationals).

Ingredient 8.
Finally, the Hungarian construction is applied inductively along the sparse grid of dyadic rationals. This inductive step requires some innovative development, mainly because the regression quantile process is not directly expressible in terms of sums of random variables (as are the empirical one-sample distribution function and quantile function).
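The induction along dyadic rationals in Ingredient 8 parallels the classical Lévy midpoint construction of a Brownian bridge, in which the process is refined recursively at dyadic points. A minimal sketch of that dyadic scheme (Python/numpy; our own illustration of the general technique, not of the paper's construction itself):

```python
import numpy as np

def dyadic_bridge(depth, rng):
    """Brownian bridge on [0,1], built by refining at dyadic rationals:
    given values at s < t, the midpoint value is their average plus an
    independent N(0, (t - s)/4) perturbation (Levy's construction)."""
    t = np.array([0.0, 1.0])
    b = np.array([0.0, 0.0])   # bridge is pinned to 0 at both endpoints
    for _ in range(depth):
        mid_t = 0.5 * (t[:-1] + t[1:])
        mid_b = (0.5 * (b[:-1] + b[1:])
                 + rng.standard_normal(mid_t.size) * np.sqrt((t[1:] - t[:-1]) / 4.0))
        t2 = np.empty(t.size + mid_t.size); t2[0::2] = t; t2[1::2] = mid_t
        b2 = np.empty_like(t2); b2[0::2] = b; b2[1::2] = mid_b
        t, b = t2, b2
    return t, b

rng = np.random.default_rng(2)
# grid 0, 1/4, 1/2, 3/4, 1 after two refinement levels
paths = np.array([dyadic_bridge(2, rng)[1] for _ in range(20000)])
```

Monte Carlo moments match the Brownian bridge covariance Cov(B(s), B(t)) = s(1 − t) for s ≤ t, which is what the inductive step must preserve at every dyadic level.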
5. Proof of Theorem 4.
Let Ṡ_n be the last p − 1 coordinates of S_n and let A^{(1)}(Ṡ_n, h) be the interval {a: (a, Ṡ_n) ∈ A_h}. Then,

P{S_n ∈ A_h} = P{∑_{i∉h}(I(Y_i ≤ x'_i β + x'_i δ/√n) − τ) ∈ A^{(1)}(Ṡ_n, h)}
= P{∑_{i∉h}(I(Y_i ≤ x'_i β) − τ) ∈ A^{(1)}(Ṡ_n, h) − ∑_{i∉h}(I(Y_i ≤ x'_i β + x'_i δ/√n) − I(Y_i ≤ x'_i β))}
= ∑_{k∈A*} f_binomial(k; τ),

where A* is the set A^{(1)} shifted as indicated above. Note that by Hoeffding's inequality [Hoeffding (1963)], for any fixed d₃, the shift satisfies

|∑_{i∉h}(I(Y_i ≤ x'_i β + x'_i δ/√n) − I(Y_i ≤ x'_i β))| ≤ d₃ √(n log n)

except with probability bounded by 2n^{−d₃}. Thus, we may apply Theorem 3 [equation (4.3)] with w equal to the shift above to obtain the following bound (to within an additional additive error of 2n^{−d₃}):

P{S_n ∈ A_h} = P{√(nτ(1 − τ)) Z ∈ A^{(1)}(Ṡ_n, h)}(1 + O((log n)^{3/2}/√n)),

where Z ~ N(0, 1); here Ṡ_n/√n may be bounded by a term of the form B√(log n) (by Hoeffding's inequality). Finally, we obtain

P{S_n ∈ A_h} = P{S̃_n ∈ A_h}(1 + O((log n)^{3/2}/√n)) + 2n^{−d₃},

where the first coordinate of S̃_n is a sum of n i.i.d. N(0, τ(1 − τ)) random variables and the last p − 1 coordinates are those of S_n.

To treat the probability involving S̃_n, standard approaches using characteristic functions can be employed. In theory, exponential tilting (or saddlepoint methods) should provide better approximations, but since we require only the order of the leading error term, we can proceed more directly. As in Einmahl (1989), the first step is to add an independent perturbation so that the sum has an integrable density: specifically, for fixed h ∈ H let V_n be a random variable (independent of all observations) with a smooth bounded density and for which (for each h ∈ H)

||V_n|| ≤ n^{−d₄},  (5.1)

where d₄ will be chosen later. Define S*_n = S̃_n + V_n. We now allow A_h to be any (arbitrary) set, say, A. Thus, S*_n has a density and we can write [with c_π = (2π)^{−p}]

P{S*_n/√n ∈ A} = c_π ∫ Vol(A) φ_{Unif(A)}(t) φ_{S̃_n}(t/√n) φ_{V_n}(t/√n) dt,

where φ_U denotes the characteristic function of the random variable U. Break the domain of integration into three sets: ||t|| ≤ d₅√(log n), d₅√(log n) ≤ ||t|| ≤ ε√n, and ||t|| ≥ ε√n.

On ||t|| ≤ d₅√(log n), expand log φ_{S̃_n}(t/√n). For this, compute

µ_i ≡ E x_i(τ − I(Y_i ≤ x'_i β + x'_i δ/√n)) = −f_i(F_i^{−1}(τ)) x_i x'_i δ/√n + O(||x_i||³ ||δ||²/n),

Σ_i ≡ Cov[x_i(τ − I(Y_i ≤ x'_i β + x'_i δ/√n))] = x_i x'_i τ(1 − τ) + O(||x_i||³ ||δ||/√n).
Hence, using the boundedness of ||x_i||, ||δ|| and ||t|| (on this first interval),

φ_{S̃_n}(t/√n) = exp{ι ∑_{i∉h} t'µ_i/√n − (1/(2n)) ∑_{i∉h} t'Σ_i t + O((||δ||³ + ||t||³)/√n)}
= exp{−ι t'G_n δ − ½ τ(1 − τ) t'H_n t + O((log³(n)/n)^{1/2})},

where G_n and H_n are defined in Condition X2 [see (3.2) and (3.3)].

For the other two intervals on the t-axis, the integrands will be bounded by an additive error times

∫ |φ_{V_n}(t/√n)| dt = O(n^{p(d₄+1/2)}),

since ||V_n|| ≤ n^{−d₄}. On ||t|| ≤ ε√n, the summands are bounded and so their characteristic functions satisfy |φ_i(s)| ≤ 1 − b||s||² for some constant b. Thus, on d₅√(log n) ≤ ||t|| ≤ ε√n,

|φ_{S̃_n}(t/√n)| ≤ (1 − b d₅² log(n)/n)^{n−p} ≤ c₁ n^{−b d₅²}

for some constant c₁. Therefore, integrating against φ_{V_n}(t/√n) provides an additive bound of order n^{−d*}, where d* = b d₅² − p(d₄ + 1/2), and (for any d₂) d₅ can be chosen sufficiently large so that d* > d₂.

Finally, on ||t|| ≥ ε√n, Condition X1 [see (3.1)] gives an additive bound of η^n directly and, again (as on the previous interval), an additive error bounded by n^{−d₂} can be obtained.

Therefore, it now follows that we can choose d₂ (depending on d₃, d₄, d₅ and d*) so that

P{(S_n + V_n)/√n ∈ A} = c_π ∫ Vol(A) φ_{Unif(A)}(t) φ_{N(−G_n δ, τ(1−τ)H_n)}(t) φ_{V_n}(t/√n) dt × (1 + O((log³(n)/n)^{1/2})) + O(n^{−d₂}),

from which Theorem 4 follows.

Finally, we show that the contribution of V_n can be ignored:

|P{S̃_n ∈ A_h} − P{S*_n ∈ A_h}| = |P{S̃_n ∈ A_h} − P{S̃_n + V_n ∈ A_h + V_n}| ≤ P{S̃_n + V_n ∈ A_h △ (A_h + V_n)},

where △ denotes the symmetric difference of the sets. Since V_n is bounded and A_h = X_h R, this symmetric difference is contained in a set, D, which is the union of 2p (boundary) parallelepipeds, each of the form X_h R_j, where R_j is a rectangle one of whose coordinates has width 2n^{−d₄} and all other coordinates have length 1. Thus, applying Theorem 4 (as proved for the set A = D),

|P{S̃_n ∈ A_h} − P{S*_n ∈ A_h}| ≤ P{S̃_n + V_n ∈ D} ≤ c Vol(D) + O(n^{−d₂}) ≤ c' n^{−d₄},

where c and c' are constants, and d₄ may be chosen arbitrarily large.
6. Normal approximation with nearly root-n multiplicative error.

Theorem 5.
Assume Conditions X1, X2, F and the regression quantile model of Section 1. Let δ be the argument of the density of δ̂_n ≡ n^{1/2}(β̂(τ) − β(τ)), and suppose ||δ|| ≤ d₁√(log n) for some constant d₁. Then, uniformly in ε ≤ τ ≤ 1 − ε (for ε > 0),

f_{δ̂_n}(δ) = φ_Σ(δ)(1 + O((log³(n)/n)^{1/2})),

where φ_Σ denotes the normal density with covariance Σ_n = τ(1 − τ) G_n^{−1} H_n G_n^{−1}, with G_n and H_n given by (3.2) and (3.3).
Recall the basic formula for the density (4.1):
\[
f_{\hat\delta}(\delta)=n^{-p/2}\sum_{h\in\mathcal H}\det(X_h)\,P\{S_n\in A_h\}\prod_{i\in h}f_i\big(x_i'\beta+n^{-1/2}\delta\big).
\]
By Theorem 4, ignoring the multiplicative and additive error terms given in this result and setting $c'_\pi=(2\pi)^{-p/2}$,
\begin{align*}
P\{S_n\in A_h\}&=P\{Z_n\in A_h/\sqrt n\}\\
&=c'_\pi\,|H_n|^{-1/2}\int_{A_h/\sqrt n}\exp\Big\{-\frac{(z+G_n\delta)'H_n^{-1}(z+G_n\delta)}{2\tau(1-\tau)}\Big\}\,dz\\
&=c'_\pi\,|H_n|^{-1/2}\exp\Big\{-\tfrac12\,\delta'\Sigma_n^{-1}\delta\Big\}\int_{A_h/\sqrt n}dz\,\big(1+O(n^{-1/2})\big)\\
&=c'_\pi\,n^{-p/2}\,|X_h|\,|H_n|^{-1/2}\exp\Big\{-\tfrac12\,\delta'\Sigma_n^{-1}\delta\Big\}\big(1+O(n^{-1/2})\big),
\end{align*}
since $z$ is bounded by a constant times $n^{-1/2}$ on $A_h/\sqrt n$ and the last integral equals $\mathrm{Vol}(A_h/\sqrt n)=n^{-p/2}|X_h|$.

By Ingredient 4, the product is
\[
\prod_{i\in h}f_i(x_i'\beta)\big(1+O(\|\delta\|\,n^{-1/2})\big).
\]
This gives the main term of the approximation as
\[
\sum_{h\in\mathcal H}n^{-p}\,|X_h|^2\prod_{i\in h}f_i(x_i'\beta)\,|H_n|^{-1/2}\exp\Big\{-\tfrac12\,\delta'\Sigma_n^{-1}\delta\Big\}.
\]
The penultimate step is to apply results from matrix theory on adjoint matrices [specifically, the Cauchy–Binet theorem and the "trace" theorem; see, e.g., Gantmacher (1960), pages 9 and 87]: the sum above is just the trace of the $p$th adjoint of $(X'D_fX)$, which equals $\det(X'D_fX)$. The various determinants combine (with the factor $n^{-p}$) to give $\det(\Sigma_n)^{-1/2}$, which provides the asymptotic normal density we want.

Finally, we need to combine the multiplicative and additive errors into a single multiplicative error. So consider $\|\delta\|\le d\sqrt{p\log n}$ (for some constant $d$). Then the asymptotic normal density is bounded below by $n^{-cd^2}$ for some constant $c$. Thus, since the constant $d_1$ (which depends on $d$, $d_2$, $d^*$ and $\eta$) can be chosen so that the additive errors are smaller than $O(n^{-cd^2-1/2})$, the error is entirely subsumed in the multiplicative factor: $\big(1+O\big((\log^3 n/n)^{1/2}\big)\big)$. $\square$
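The normal approximation of Theorem 5 can be checked by simulation. The sketch below (assuming NumPy and SciPy are available; the design, sample size, quantile level and error law are arbitrary illustrative choices, not from the paper) computes Koenker–Bassett regression quantiles via the standard linear-programming formulation and compares the Monte Carlo covariance of $\sqrt n(\hat\beta(\tau)-\beta(\tau))$ with $\tau(1-\tau)G_n^{-1}H_nG_n^{-1}$; for i.i.d. $N(0,1)$ errors, $G_n=f(0)\,X'X/n$ and $H_n=X'X/n$.

```python
import numpy as np
from scipy.optimize import linprog

def rq(X, y, tau):
    """Koenker-Bassett regression quantile via the standard LP:
    minimize tau*1'u + (1-tau)*1'v  subject to  X b + u - v = y, u, v >= 0,
    with the free vector b written as b_pos - b_neg."""
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p + 2 * n))
    return res.x[:p] - res.x[p:2 * p]

rng = np.random.default_rng(0)
n, tau, reps = 200, 0.5, 200
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
f0 = 1 / np.sqrt(2 * np.pi)               # N(0,1) error density at its median
Sigma = tau * (1 - tau) / f0 ** 2 * np.linalg.inv(X.T @ X / n)

# Monte Carlo draws of sqrt(n)*(beta_hat(tau) - beta(tau))
draws = np.array([np.sqrt(n) * (rq(X, X @ beta + rng.standard_normal(n), tau) - beta)
                  for _ in range(reps)])
print(np.cov(draws.T))   # should be close to the sandwich covariance Sigma
print(Sigma)
```

The agreement is only up to Monte Carlo and finite-sample error, of course; the theorem's point is that the density itself is matched to within a nearly root-$n$ multiplicative factor.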
7. The Hungarian construction.
We first prove Theorem 1, which provides the bivariate normal approximation.
Proof of Theorem 1.
The proof follows the development in Theorem 5. The first step treats the first (intercept) coordinate. Since the binomial expansions were omitted in the proof of Theorem 3, details for the trinomial expansion needed for the bivariate case here will be presented. The binomial sum in the first coordinate of (4.2) will be split into the sums of observations in the intervals $[x_i'\hat\beta(0),x_i'\hat\beta(\tau_1))$, $[x_i'\hat\beta(\tau_1),x_i'\hat\beta(\tau_1+a_n))$ and $[x_i'\hat\beta(\tau_1+a_n),x_i'\hat\beta(1))$. The expected number of observations in each interval is within $p$ of $n$ times the length of the corresponding interval. Thus, ignoring an error of order $1/n$, we expand a trinomial with $n$ observations and $p_1=\tau_1$ and $p_2=a_n$. Let $(N_1,N_2,N_3)$ be the (trinomially distributed) numbers of observations in the respective intervals, and consider
\[
P^*\equiv P\{N_1=np_1+k_1,\;N_2=np_2+k_2,\;N_3=n(1-p_1-p_2)-k_1-k_2\}.
\]
We may take
\[
k_1=O\big((n\log n)^{1/2}\big),\qquad k_2=O\big((na_n\log n)^{1/2}\big),\tag{7.1}
\]
since these bounds are exceeded with probability bounded by $n^{-d}$ for any (sufficiently large) $d$. So $P^*\equiv A\times B$, where
\[
A=\frac{n!}{(np_1+k_1)!\,(np_2+k_2)!\,\big(n(1-p_1-p_2)-k_1-k_2\big)!},
\]
\[
B=p_1^{\,np_1+k_1}\,p_2^{\,np_2+k_2}\,(1-p_1-p_2)^{\,n(1-p_1-p_2)-k_1-k_2}.
\]
Expanding (using Stirling's formula and some computation),
\begin{align*}
A&=\frac1{2\pi}\exp\Big\{\Big(n+\tfrac12\Big)\log n-\Big(np_1+k_1+\tfrac12\Big)\log(np_1+k_1)
-\Big(np_2+k_2+\tfrac12\Big)\log(np_2+k_2)\\
&\qquad-\Big(n(1-p_1-p_2)-k_1-k_2+\tfrac12\Big)\log\big(n(1-p_1-p_2)-k_1-k_2\big)+O\Big(\frac1{np_2}\Big)\Big\}\\
&=\frac1{2\pi}\exp\Big\{-\log n-\Big(np_1+k_1+\tfrac12\Big)\log p_1-\Big(np_2+k_2+\tfrac12\Big)\log p_2\\
&\qquad-\Big(n(1-p_1-p_2)-k_1-k_2+\tfrac12\Big)\log(1-p_1-p_2)\\
&\qquad-\frac{k_1^2}{2np_1}-\frac{k_2^2}{2np_2}-\frac{(k_1+k_2)^2}{2n(1-p_1-p_2)}+O\Big(\frac{(\log n)^{3/2}}{\sqrt{na_n}}\Big)\Big\},\\
B&=\exp\big\{(np_1+k_1)\log p_1+(np_2+k_2)\log p_2+\big(n(1-p_1-p_2)-k_1-k_2\big)\log(1-p_1-p_2)\big\}.
\end{align*}
Therefore,
\[
A\times B=\frac1{2\pi n\sqrt{p_1p_2(1-p_1-p_2)}}
\exp\Big\{-\frac{k_1^2}{2np_1}-\frac{k_2^2}{2np_2}-\frac{(k_1+k_2)^2}{2n(1-p_1-p_2)}+O\Big(\frac{(\log n)^{3/2}}{\sqrt{na_n}}\Big)\Big\}.
\]
Some further simplification shows that $A\times B$ gives the usual (bivariate) normal approximation to the trinomial with a multiplicative error of $\big(1+O\big((\log n)^{3/2}/\sqrt{na_n}\big)\big)$ [when $k_1$ and $k_2$ satisfy (7.1)].

The next step of the proof follows that of Theorem 4 (see Ingredient 3). Since the proof is based on expanding characteristic functions (which do not involve the inverses of the covariance matrices), all uniform error bounds continue to hold. This extends the result of Theorem 4 to the bivariate case:
\[
P\{S_n(\tau_1)\in A_{h_1},\,S_n(\tau_2)\in A_{h_2}\}
=P\{Z_1\in A_{h_1}/\sqrt n,\;Z_2\in A_{h_2}/\sqrt n\}\tag{7.2}
\]
\[
=P\{Z_1\in A_{h_1}/\sqrt n\}\times P\big\{(Z_2-Z_1)/\sqrt n\in(A_{h_2}-Z_1)/\sqrt n\,\big|\,Z_1\big\}
\]
for appropriate normally distributed $(Z_1,Z_2)$ (depending on $n$). This last equation is needed to extend the argument of Theorem 5, which involves integrating normal densities. The joint covariance matrix for $(S_n(\tau_1),S_n(\tau_2))$ is nearly singular (for $\tau_2-\tau_1$ small) and complicates the bounds for the integrals of the densities. The first factor above can be treated exactly as in the proof of Theorem 5, while the conditional densities involved in the second factor can be handled by simple rescaling. This provides the desired generalization of Theorem 5.

Thus, the next step is to develop the parameters of the normal distribution for $(B_n(\tau_1),R_n)$ [see (3.4), (3.5)] in a usable form. The covariance matrix for $(B_n(\tau_1),B_n(\tau_2))$ has blocks of the form
\[
\mathrm{Cov}(B_n(\tau_1),B_n(\tau_2))=\begin{pmatrix}\tau_1(1-\tau_1)\Lambda_{11}&\tau_1(1-\tau_2)\Lambda_{12}\\ \tau_1(1-\tau_2)\Lambda_{21}&\tau_2(1-\tau_2)\Lambda_{22}\end{pmatrix},
\]
where $\Lambda_{ij}=G_n^{-1}(\tau_i)H_nG_n^{-1}(\tau_j)$ with $G_n$ and $H_n$ given in Condition X2 [see (3.2) and (3.3)].

Expanding $G_n(\tau)$ about $\tau=\tau_1$ (using the differentiability of the densities from Condition F),
\[
\Lambda_{ij}=\Lambda_{11}+(\tau_2-\tau_1)\Delta_{ij}+o(|\tau_2-\tau_1|),
\]
where the $\Delta_{ij}$ are derivatives of $G_n$ at $\tau_1$ (note that $\Delta_{11}=0$).
Straightforward matrix computation now yields the joint covariance for $(B_n(\tau_1),R_n)$:
\[
\mathrm{Cov}(B_n(\tau_1),R_n)=\begin{pmatrix}\tau_1(1-\tau_1)\Lambda_{11}&(\tau_2-\tau_1)^{1/2}\Delta^*_{12}\\ (\tau_2-\tau_1)^{1/2}\Delta^*_{21}&\Delta^*_{22}\end{pmatrix}+o(|\tau_2-\tau_1|),\tag{7.3}
\]
where the $\Delta^*_{ij}$ are uniformly bounded matrices.

Thus, the conditional distribution of $R_n=(\tau_2-\tau_1)^{-1/2}\big(B_n(\tau_2)-B_n(\tau_1)\big)$ given $B_n(\tau_1)$ has moments
\[
E[R_n\mid B_n(\tau_1)]=(\tau_2-\tau_1)^{1/2}\,\Delta^*_{21}\Lambda_{11}^{-1}B_n(\tau_1)/\big(\tau_1(1-\tau_1)\big),\tag{7.4}
\]
\[
\mathrm{Cov}[R_n\mid B_n(\tau_1)]=\Delta^*_{22}-\frac{\tau_2-\tau_1}{\tau_1(1-\tau_1)}\,\Delta^*_{21}\Lambda_{11}^{-1}\Delta^*_{12},\tag{7.5}
\]
and analogous equations also hold for $\{Z_2-Z_1\mid Z_1\}$.
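The trinomial step earlier in this proof can be checked numerically: the Stirling-based expansion says the exact trinomial probability and the corresponding bivariate normal density agree to a small multiplicative error when the deviations satisfy (7.1). A minimal sketch, using only the standard library (the values of $n$, $p_1$, $p_2$ and the deviations are arbitrary illustrative choices, not from the paper):

```python
import math

def trinomial_pmf(n, p1, p2, k1, k2):
    """Exact trinomial log-pmf via lgamma, exponentiated."""
    k3 = n - k1 - k2
    logp = (math.lgamma(n + 1) - math.lgamma(k1 + 1)
            - math.lgamma(k2 + 1) - math.lgamma(k3 + 1)
            + k1 * math.log(p1) + k2 * math.log(p2)
            + k3 * math.log(1 - p1 - p2))
    return math.exp(logp)

def normal_approx(n, p1, p2, k1, k2):
    """Bivariate normal density at (k1, k2): the local-CLT approximation."""
    # covariance of (N1, N2): n * [[p1(1-p1), -p1 p2], [-p1 p2, p2(1-p2)]]
    s11 = n * p1 * (1 - p1)
    s22 = n * p2 * (1 - p2)
    s12 = -n * p1 * p2
    det = s11 * s22 - s12 ** 2
    d1, d2 = k1 - n * p1, k2 - n * p2
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

n, p1, p2 = 100_000, 0.5, 0.01       # p2 plays the role of a_n
k1, k2 = 50_100, 1_010               # deviations within the window (7.1)
ratio = trinomial_pmf(n, p1, p2, k1, k2) / normal_approx(n, p1, p2, k1, k2)
print(ratio)                         # multiplicative error: ratio close to 1
```

Equivalently, the quadratic form here matches the exponent $k_1^2/(2np_1)+k_2^2/(2np_2)+(k_1+k_2)^2/(2n(1-p_1-p_2))$ obtained from the expansion of $A\times B$.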
Finally, recalling that $\tau_2-\tau_1=a_n$, the second term in (7.2) can be written
\[
P\Big\{\frac{Z_2-Z_1}{\sqrt n}\in\frac{A_{h_2}-Z_1}{\sqrt n}\,\Big|\,Z_1\Big\}
=P\Big\{\frac{Z_2-Z_1}{\sqrt{n(\tau_2-\tau_1)}}\in\frac{A_{h_2}-Z_1}{\sqrt{na_n}}\,\Big|\,Z_1\Big\}.
\]
Thus, since the conditional covariance matrix is uniformly bounded except for the $a_n=(\tau_2-\tau_1)$ factor, the argument of Theorem 5 also applies directly to this conditional probability. $\square$

Finally, the above results are used to apply the quantile transform for increments between dyadic rationals inductively in order to obtain the desired "Hungarian" construction. The proof of Theorem 2 is as follows:
Proof of Theorem 2. (i) Following the approach in Einmahl (1989), the first step is to provide the result of Theorem 1 for conditional densities one coordinate at a time. Using the notation of Theorem 1, let $\tau_1=k/2^\ell$ and $\tau_2=(k+1)/2^\ell$ be successive dyadic rationals (between $\varepsilon$ and $1-\varepsilon$) with denominator $2^\ell$. So $a_n=2^{-\ell}$. Let $R_m$ be the $m$th coordinate of $R_n(\tau_1,\tau_2)$ [see (3.5)], let $\dot R_m$ be the vector of coordinates before the $m$th one, and let $S=B_n(\tau_1)$. Then the conditional density of $R_m\mid(\dot R_m,S)$ satisfies
\[
f_{R_m\mid(\dot R_m,S)}(r\mid r_0,s)=\varphi_{\mu,\Sigma}(r\mid r_0,s)\Big(1+O\Big(\frac{(\log n)^{3/2}}{\sqrt n}\Big)\Big)\tag{7.6}
\]
for $\|r\|<D\sqrt{\log n}$, $\|r_0\|<D\sqrt{\log n}$ and $\|s\|<D\sqrt{\log n}$, and where $\mu$ and $\Sigma$ are easily derived from (7.4) and (7.5). Note that $\mu$ has the form
\[
\mu=\sqrt{a_n}\,\alpha'S,\tag{7.7}
\]
where $\|\alpha\|$ can be bounded (independent of $n$) and $\Sigma$ can be bounded away from zero and infinity (independent of $n$).

This follows since the conditional densities are ratios of marginal densities of the form $f_Y(y)=\int f_{X,Y}\,dx$ (with $f_{X,Y}$ satisfying Theorem 1). The integral over $\|x\|\le D\sqrt{\log n}$ has the multiplicative error bound directly. The remainder of the integral is bounded by $n^{-d}$, which is smaller than the normal integral over $\|x\|\le D\sqrt{\log n}$ (see the end of the proof of Theorem 5).

(ii) The second step is to develop a bound on the (conditional) quantile transform in order to approximate an asymptotically normal random variable by a normal one. The basic idea appears in Einmahl (1989). Clearly, from (7.6),
\[
\int_{-\infty}^{r}f_{R_m\mid(\dot R_m,S)}(u\mid r_0,s)\,du
=\int_{-\infty}^{r}\varphi_{\mu,\Sigma}(u\mid r_0,s)\,du\,\Big(1+O\Big(\frac{(\log n)^{3/2}}{\sqrt n}\Big)\Big)
\]
for $|u|<D\sqrt{\log n}$, $\|r_0\|<D\sqrt{\log n}$ and $\|s\|<D\sqrt{\log n}$. By Condition F, the conditional densities (of the response given $x$) are bounded above zero on $\varepsilon\le\tau\le1-\varepsilon$. Hence, the inverses of the above versions of the c.d.f.'s also satisfy this multiplicative error bound, at least for the variables bounded by $D\sqrt{\log n}$.
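The quantile-transform coupling used in step (ii) can be illustrated in the simplest one-sample setting: feeding the same uniform variate through a binomial quantile function and through the matching normal quantile function couples the two variables to within a nearly constant error, even though both have standard deviation of order $\sqrt n$. A minimal standard-library sketch (the sample size, success probability and number of draws are arbitrary choices, not from the paper):

```python
import bisect
import itertools
import math
import random
import statistics

n, p = 10_000, 0.3
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# Binomial(n, p) log-pmf via lgamma, then the cdf by accumulation.
logpmf = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
          + k * math.log(p) + (n - k) * math.log(1 - p) for k in range(n + 1)]
cdf = list(itertools.accumulate(math.exp(v) for v in logpmf))

def binom_ppf(u):
    """Smallest k with P(Bin(n, p) <= k) >= u (the quantile transform)."""
    return min(bisect.bisect_left(cdf, u), n)

nd = statistics.NormalDist()
random.seed(0)
errs = []
for _ in range(500):
    u = min(max(random.random(), 1e-12), 1 - 1e-12)
    x = binom_ppf(u)                 # binomial draw
    z = mu + sd * nd.inv_cdf(u)      # normal draw coupled through the same u
    errs.append(abs(x - z))
print(max(errs))                     # stays O(1) even though sd is about 46
```

The proof applies this device coordinate by coordinate to the conditional laws in (7.6), where the multiplicative density bound translates into the $O((\log n)^{3/2}/\sqrt n)$ coupling error of (7.8).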
Thus, the quantile transform can be applied to show that there is a normal random variable $Z^*$ such that $R_m-Z^*=O\big((\log n)^{3/2}/\sqrt n\big)$ so long as $R_m$ and the quantile transform of $R_m$ are bounded by $D\sqrt{\log n}$. Using the conditional mean and variance [see (7.7)], and the fact that the random variables exceed $D\sqrt{\log n}$ with probability bounded by $n^{-d}$ (where $d$ can be made large by choosing $D$ large enough), there is a random variable $Z_m$ that can be chosen independently so that
\[
R_m=\sqrt{a_n}\,\alpha'S+Z_m+O\Big(\frac{(\log n)^{3/2}}{\sqrt n}\Big)\tag{7.8}
\]
except with probability bounded by $n^{-d}$.

(iii) Finally, the "Hungarian" construction will be developed inductively. Let $\tau(k,\ell)=k/2^\ell$ and consider induction on $\ell$. First consider the case $\tau\ge\frac12$; the argument for $\tau<\frac12$ is entirely analogous.

Define $\varepsilon_n^*=c(\log n)^{3/2}/\sqrt n$, where $c$ bounds the big-$O$ term in any equation of the form (7.8). Let $A$ be a bound [uniform over $\tau\in(\varepsilon,1-\varepsilon)$] on $\|\alpha\|$ in (7.8). The induction hypothesis is as follows: there are normal random vectors $Z_n(k,\ell)$ such that
\[
\Big\|B_n\Big(\frac{k}{2^\ell}\Big)-Z_n(k,\ell)\Big\|\le\varepsilon(\ell)\tag{7.9}
\]
except with probability $2^\ell n^{-d}$, where for each $\ell$, $Z_n(\cdot,\ell)$ has the same covariance structure as $B_n(\cdot/2^\ell)$, and where
\[
\varepsilon(\ell)=\ell\,\varepsilon_n^*\prod_{j=1}^{\ell}\big(1+A\,2^{-j/2}\big).\tag{7.10}
\]
Note: since the earlier bounds apply only for intervals whose lengths exceed $n^{-a}$ (for some positive $a$), $\ell$ must be taken smaller than $a\log_2(n)=O(\log n)$. Thus, the bound in (7.10) becomes $O\big((\log n)^{5/2}/\sqrt n\big)$, as stated in Theorem 2.

To prove the induction result, note first that Theorem 1 (or Theorem 5) provides the normal approximation for $B_n(\frac12)$ for $\ell=1$. The induction step is proved as follows: following Einmahl (1989), take two consecutive dyadic rationals $\tau(k,\ell)$ and $\tau(k-1,\ell)$ with $k$ odd. So $\tau(k-1,\ell)=[k/2]/2^{\ell-1}=\tau([k/2],\ell-1)$.
Condition each coordinate of $B_n(\tau(k,\ell))$ on the previous coordinates and on $B_n(\tau([k/2],\ell-1))$, and let $b_n(\tau(k,\ell))=b_n(k/2^\ell)$ be one such coordinate. Now, as above, define $R(k,\ell)$ by
\[
b_n(\tau(k,\ell))=b_n(\tau([k/2],\ell-1))+R(k,\ell).
\]
From (7.8), there is a normal random variable $Z_n(k,\ell)$ such that
\[
\big|R(k,\ell)-\sqrt{2^{-\ell}}\,\alpha'B_n(\tau([k/2],\ell-1))-Z_n(k,\ell)\big|\le\varepsilon_n^*.
\]
By the induction hypothesis for $(\ell-1)$, $B_n(\tau([k/2],\ell-1))$ is approximable by normal random variables to within $\varepsilon(\ell-1)$ (except with probability $2^{\ell-1}n^{-d}$). Thus, a coordinate $b_n(\tau([k/2],\ell-1))$ is also approximable with this error, and the error in approximating $\sqrt{a_n}\,\alpha'B_n(\tau([k/2],\ell-1))$ is bounded by $\varepsilon(\ell-1)\,A\sqrt{a_n}=\varepsilon(\ell-1)\,A\,2^{-\ell/2}$. Finally, since $Z_n(k,\ell)$ is independent of these normal variables, the errors can be added to obtain
\[
\big(1+A\,2^{-\ell/2}\big)\,\varepsilon(\ell-1)+\varepsilon_n^*.
\]
Therefore, except with probability less than $2^{\ell-1}n^{-d}+2^{\ell-1}n^{-d}=2^\ell n^{-d}$, the induction hypothesis (7.9) holds with error
\[
(\ell-1)\,\varepsilon_n^*\prod_{j=1}^{\ell-1}\big(1+A\,2^{-j/2}\big)\times\big(1+A\,2^{-\ell/2}\big)+\varepsilon_n^*
\le\ell\prod_{j=1}^{\ell}\big(1+A\,2^{-j/2}\big)\,\varepsilon_n^*=\varepsilon(\ell),
\]
and the induction is proven.

The theorem now follows since the piecewise linear interpolants satisfy the same error bound [see Neocleous and Portnoy (2008)]. $\square$

APPENDIX
Result 1.
Under the conditions for the theorems here, the coverage probability for the confidence interval (2.3) is $1-\alpha+O\big((\log n)\,n^{-2/3}\big)$, which is achieved at $h_n=c\sqrt{\log n}\,n^{-1/3}$ (where $c$ is a constant).

Sketch of proof.
Recall the notation of Remark 2 in Section 2. Using Theorem 1 and the quantile transform as described in the first steps of Theorem 2 (and not needing the dyadic expansion argument), it can be shown that there is a bivariate normal pair $(W,Z)$ such that
\[
\sqrt n\big(\hat\beta(\tau)-\beta(\tau)\big)=W+R_n,\qquad R_n=O_p\big(n^{-1/2}(\log n)^{3/2}\big),\tag{A.1}
\]
\[
\sqrt n\big(\hat\Delta(h_n)-\Delta(h_n)\big)=Z+R_n^*,\qquad R_n^*=O_p\big(n^{-1/2}(\log n)^{3/2}\big).
\]
Note that from the proofs of Theorems 1 and 2, the $O_p$ terms above are actually $O$ terms except with probability $n^{-d}$, where $d$ is an arbitrary fixed constant. The "almost sure" results above take $d>1$, but $d=1$ will suffice for the bounds on the coverage probability here.

Incorporating the approximation error in (A.1),
\[
\sqrt n(\hat\delta-\delta)=Z/h_n+R_n^*/h_n+O\big(n^{1/2}h_n^2\big).
\]
Now consider expanding $s_a(\delta)$. First, note that under the design conditions here, $s_a$ will be of exact order $n^{-1/2}$; specifically, if $X$ is replaced by $\sqrt n\,\tilde X$, all terms involving $\tilde X'\tilde X$ will remain bounded, and we may focus on $\sqrt n\,s_a(\delta)$. Note also that for $h_n=O(n^{-1/3})$, the terms in the expansion of $(\hat\delta-\delta)$ tend to zero [specifically, $1/(\sqrt n\,h_n)=O(n^{-1/6})$]. So the sparsity, $s_a(\delta)$, may be expanded in a Taylor series as follows:
\[
\sqrt n\,s_a(\hat\delta)=\sqrt n\,s_a(\delta)+b_1'(\hat\delta-\delta)+b_2(\hat\delta-\delta)^2+b_3(\hat\delta-\delta)^3+O(n^{-2/3})
\equiv\sqrt n\,s_a(\delta)+K,
\]
where $b_1$ is a (gradient) vector that can be defined in terms of $\tilde X$ and $\beta(\tau)$ (and its derivatives), $b_2$ is a quadratic function (of its vector argument) and $b_3$ is a cubic function. Note that under the design conditions, all the coefficients in $b_1$, $b_2$ and $b_3$ are bounded, and so it is not hard to show that all the terms in $K$ tend to zero as long as $h_n\sqrt n\to\infty$. Specifically, if $h_n$ is of order $n^{-1/3}$, then all the terms in $K$ tend to zero. Also, $R_n^*$ is within a $\log n$ factor of $O(n^{-1/2})$, and $h_n^2$ is even smaller. Finally, $Z$ is a difference of two quantiles separated by $2h_n$, and so $b_1'Z$ has variance proportional to $h_n$. Thus, $E\big(b_1'Z/(\sqrt n\,h_n)\big)^2=O(1/(nh_n))$. Thus, not only does $b_1'Z/(\sqrt n\,h_n)\to_p0$, but powers of this term greater than 2 will also be $O_p(n^{-1})$.

It follows that the coverage probability may be computed using only two terms of the Taylor series expansion for the normal c.d.f. (absorbing the constant $z_\alpha$ into $K$):
\begin{align*}
P\big\{\sqrt n\,a'(\hat\beta(\tau)-\beta(\tau))\le z_\alpha\sqrt n\,s_a(\hat\delta)\big\}
&=P\big\{a'(W+R_n)\le z_\alpha\sqrt n\,s_a(\delta)+K\big\}\\
&=E\,\Phi_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)+K-a'R_n\big)\\
&=E\big\{\Phi_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)\big)
+\varphi_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)\big)(K-a'R_n)\\
&\qquad+\tfrac12\varphi'_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)\big)(K-a'R_n)^2+O\big((\log n)^3/n\big)\big\}\\
&\equiv1-\alpha+T_1+T_2+O\big((\log n)^3/n\big).
\end{align*}
Note that the (normal) conditional distribution of $W$ given $Z$ is straightforward to compute (using the usual asymptotic covariance matrix for quantiles): the conditional mean is a small constant (of the order of $h_n$) times $Z$, and the conditional variance is bounded.

Expanding the lower probability in the same way and subtracting provides some cancelation. The contribution of $R_n$ will cancel in the $T_1$ differences, and is negligible in subsequent terms since $R_n^2=O\big((\log n)^3/n\big)$. Similarly, the $R_n^*/(\sqrt n\,h_n)$ term will appear only in the $T_1$ difference, where it contributes a term that is $(\log n)^{3/2}$ times a term of order $1/(nh_n)$, and will also be negligible in subsequent terms. Also, the $h_n^2$ term will appear only in $T_1$, as higher powers will be negligible. The only remaining terms involve $Z/(\sqrt n\,h_n)$. For the first power (appearing in $T_1$), $EZ=0$. For the squared $Z$-terms in $T_2$, since $\mathrm{Var}(b_1'Z)$ is proportional to $h_n$, $E(b_1'Z)^2/(nh_n^2)=c_1/(nh_n)$, and all other terms involving $Z$ have smaller order.
Therefore, one can obtain the following error for the coverage probability: for some constants $c_1$ and $c_2$, the error is
\[
\frac{b_1'R_n^*}{\sqrt n\,h_n}+\frac{c_1}{nh_n}+c_2h_n^2
\]
(plus terms of smaller order). Since $R_n^*$ is of order nearly $n^{-1/2}$, the first two terms have nearly the same order. Using $b_1'R_n^*/(\sqrt n\,h_n)=O\big((\log n)^{3/2}/(nh_n)\big)$, it is straightforward to find the optimal $h_n$ to be a constant times $\sqrt{\log n}\,n^{-1/3}$, which bounds the error in the coverage probability by $O\big(\log n\cdot n^{-2/3}\big)$. $\square$
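The bandwidth trade-off in the last display can be checked numerically. The sketch below sets the unknown constants to 1 (an arbitrary normalization) and balances the two dominant terms, $(\log n)^{3/2}/(nh)$ and $h^2$; the exact minimizer of $a/h+h^2$ is $h=(a/2)^{1/3}$, so the optimal bandwidth scales like $\sqrt{\log n}\,n^{-1/3}$:

```python
import math

def coverage_error(h, n, c1=1.0, c2=1.0):
    """Leading terms of the coverage-error bound from the sketch:
    c1 * (log n)^{3/2} / (n h)  +  c2 * h^2  (constants normalized to 1)."""
    return c1 * math.log(n) ** 1.5 / (n * h) + c2 * h ** 2

def optimal_h(n, c1=1.0, c2=1.0):
    # minimizing a/h + c2*h^2 with a = c1*(log n)^{3/2}/n gives h = (a/(2 c2))^{1/3}
    a = c1 * math.log(n) ** 1.5 / n
    return (a / (2 * c2)) ** (1 / 3)

rates = []
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    h = optimal_h(n)
    rates.append(h / (math.sqrt(math.log(n)) * n ** (-1 / 3)))
print(rates)   # essentially constant: h scales like sqrt(log n) * n^(-1/3)
```

At this $h$ the bound itself is of order $h^2=\log n\cdot n^{-2/3}$, matching the rate stated in Result 1.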
Bernstein, S. N. (1964). On a modification of Chebyshev's inequality and of the error formula of Laplace. In Sobranie Sochineniĭ [originally published in Ann. Sci. Inst. Sav. Ukraine, Sect. Math. (1924)].
Daniels, H. E. (1987). Tail probability approximations. Internat. Statist. Rev.
De Angelis, D., Hall, P. and Young, G. A. (1993). Analytical and bootstrap approximations to estimator distributions in $L_1$ regression. J. Amer. Statist. Assoc.
Einmahl, U. (1989). Extensions of results of Komlós, Major, and Tusnády to the multivariate case. J. Multivariate Anal.
Gantmacher, F. R. (1960). Matrix Theory. Amer. Math. Soc., Providence, RI.
Gutenbrunner, C., Jurečková, J., Koenker, R. and Portnoy, S. (1993). Tests of linear hypotheses based on regression rank scores. J. Nonparametr. Stat.
Hall, P. and Sheather, S. J. (1988). On the distribution of a Studentized quantile. J. R. Stat. Soc. Ser. B Stat. Methodol.
He, X. and Hu, F. (2002). Markov chain marginal bootstrap. J. Amer. Statist. Assoc.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc.
Horowitz, J. L. (1998). Bootstrap methods for median regression models. Econometrica.
Jurečková, J. and Sen, P. K. (1996). Robust Statistical Procedures: Asymptotics and Interrelations. Wiley, New York. MR1387346
Knight, K. (2002). Comparing conditional quantile estimators: First and second order considerations. Technical report, Univ. Toronto.
Kocherginsky, M., He, X. and Mu, Y. (2005). Practical confidence intervals for regression quantiles. J. Comput. Graph. Statist.
Koenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge Univ. Press, Cambridge. MR2268657
Koenker, R. (2012). quantreg: Quantile regression. R package, Version 4.79. Available at cran.r-project.org.
Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica.
Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent RV's and the sample DF. I. Z. Wahrsch. Verw. Gebiete.
Komlós, J., Major, P. and Tusnády, G. (1976). An approximation of partial sums of independent RV's, and the sample DF. II. Z. Wahrsch. Verw. Gebiete.
Neocleous, T. and Portnoy, S. (2008). On monotonicity of regression quantile functions. Statist. Probab. Lett.
Parzen, M. I., Wei, L. J. and Ying, Z. (1994). A resampling method based on pivotal estimating functions. Biometrika.
Zhou, K. Q. and Portnoy, S. L. (1996). Direct use of regression quantiles to construct confidence sets in linear models.