Nearly root-n approximation for regression quantile processes
The Annals of Statistics ©
Institute of Mathematical Statistics, 2012
NEARLY ROOT-n APPROXIMATION FOR REGRESSION QUANTILE PROCESSES
By Stephen Portnoy
University of Illinois at Urbana-Champaign
Traditionally, assessing the accuracy of inference based on regression quantiles has relied on the Bahadur representation. This provides an error of order n^{−1/4} in normal approximations, and suggests that inference based on regression quantiles may not be as reliable as that based on other (smoother) approaches, whose errors are generally of order n^{−1/2} (or better in special symmetric cases). Fortunately, extensive simulations and empirical applications show that inference for regression quantiles shares the smaller error rates of other procedures. In fact, the "Hungarian" construction of Komlós, Major and Tusnády [Z. Wahrsch. Verw. Gebiete (1975) 111–131, Z. Wahrsch. Verw. Gebiete (1976) 33–58] provides an alternative expansion for the one-sample quantile process with nearly the root-n error rate (specifically, to within a factor of log n). Such an expansion is developed here to provide a theoretical foundation for more accurate approximations for inference in regression quantile models. One specific application of independent interest is a result establishing that for conditional inference, the error rate for coverage probabilities using the Hall and Sheather [J. R. Stat. Soc. Ser. B Stat. Methodol. (1988) 381–391] method of sparsity estimation matches their one-sample rate.
1. Introduction.
Consider the classical regression quantile model: given independent observations {(x_i, Y_i): i = 1, ..., n}, with x_i ∈ R^p fixed (for fixed p), the conditional quantile of the response Y_i given x_i is Q_{Y_i}(τ | x_i) = x'_i β(τ). Let β̂(τ) be the Koenker–Bassett regression quantile estimator of β(τ). Koenker (2005) provides definitions and basic properties, and describes the traditional approach to asymptotics for β̂(τ) using a Bahadur representation:

B_n(τ) ≡ n^{1/2}(β̂(τ) − β(τ)) = D(x)W(τ) + R_n,

where W(τ) is a Brownian Bridge and R_n is an error term.

Received August 2011; revised May 2012. Supported in part by NSF Grant DMS-10-07396.

AMS 2000 subject classifications. Primary 62E20, 62J99; secondary 60F17.

Key words and phrases. Regression quantiles, asymptotic approximation, Hungarian construction.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2012, Vol. 40, No. 3, 1714–1736. This reprint differs from the original in pagination and typographic detail.

Unfortunately, R_n is of order n^{−1/4} [see, e.g., Jurečková and Sen (1996) and Knight (2002)]. This might suggest that asymptotic results are accurate only to this order. However, both simulations in regression cases and one-dimensional results [Komlós, Major and Tusnády (1975, 1976)] justify a belief that regression quantile methods should share (nearly) the O(n^{−1/2}) accuracy of smooth statistical procedures (uniformly in τ). In fact, as shown in Knight (2002), n^{1/4}R_n has a limit with zero mean that is independent of W(τ). Thus, in any smooth inferential procedure (say, confidence interval lengths or coverages), this error term should enter only through E R_n² = O(n^{−1/2}). Nonetheless, this expansion would still leave an error of o(n^{−1/4}) (coming from the error beyond the R_n term in the Bahadur representation), and so would still fail to reflect root-n behavior. Furthermore, previous results only provide such a second-order expansion for fixed τ.

It must be noted that the slower O(n^{−1/4}) error rate arises from the discreteness introduced by indicator functions appearing in the gradient conditions. In fact, expansions can be carried out when the design is assumed to be random; see De Angelis, Hall and Young (1993) and Horowitz (1998), where the focus is on analysis of the (x, Y) bootstrap. Specifically, the assumption of a smooth distribution for the design vectors together with a separate treatment of the lattice contribution of the intercept does permit appropriate expansions. Unfortunately, the randomness in X means that all inference must be in terms of the average asymptotic distribution (averaged over X), and so fails to apply to the generally more desirable conditional forms of inference.
Specifically, unconditional methods may be quite poor in the heteroscedastic and nonsymmetric cases for which regression quantile analysis is especially appropriate. The main goal of this paper is to reclaim increased accuracy for conditional inference beyond that provided by the traditional Bahadur representation. Specifically, the aim is to provide a theoretical justification for an error bound of nearly root-n order uniformly in τ. Define

δ̂_n(τ) = √n(β̂(τ) − β(τ)).

We first develop a normal approximation for the density of δ̂ with the following form:

f_δ̂(δ) = φ_Σ(δ)(1 + O(L_n n^{−1/2}))  for ||δ|| ≤ D√(log n),

where L_n = (log n)^{3/2}. We then extend this result to the densities of a pair of regression quantiles in order to obtain a "Hungarian" construction [Komlós, Major and Tusnády (1975, 1976)] that approximates the process B_n(τ) by a Gaussian process to order O(L*_n n^{−1/2}), where L*_n = (log n)^{5/2} (uniformly for ε ≤ τ ≤ 1 − ε).

Section 2 provides some applications of the results here to conditional inference methods in regression quantile models. Specifically, an expansion is developed for coverage probabilities of confidence intervals based on the Hall and Sheather (1988) difference quotient estimator of the sparsity function. The coverage error rate is shown to achieve the rate O(n^{−2/3} log n) for conditional inference, which is nearly the known "optimal" rate obtained for a single sample and for unconditional inference. Section 3 lists the conditions and main results, and offers some remarks. Section 4 provides a description of the basic ingredients of the proof (since this proof is rather long and complicated). Section 5 proves the density approximation for a fixed τ (with multiplicative error).
Section 6 extends the result to pairs of regression quantiles (Theorem 1), and Section 7 provides the "Hungarian" construction (Theorem 2) with what appears to be a somewhat innovative induction along dyadic rationals.
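For readers who wish to experiment numerically, the Koenker–Bassett estimator β̂(τ) can be computed by linear programming. The sketch below is our own minimal Python implementation (the helper name `rq_fit` is ours, not from the paper or from quantreg): it minimizes the standard check-function LP and illustrates the fact, used later in Section 4, that β̂(τ) interpolates exactly p observations when the design is in general position.

```python
import numpy as np
from scipy.optimize import linprog

def rq_fit(X, y, tau):
    """Koenker-Bassett regression quantile via the standard LP formulation:
    minimize sum(tau*u_i + (1 - tau)*v_i) subject to y = X b + u - v, u, v >= 0,
    which is equivalent to minimizing the check-function objective."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0.0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(0)
n, p, tau = 25, 2, 0.35
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
beta_hat = rq_fit(X, y, tau)
resid = y - X @ beta_hat
h = np.flatnonzero(np.abs(resid) < 1e-7)   # indices of interpolated observations
```

With continuous responses and nτ not an integer, the LP optimum is a vertex, so `h` has exactly p elements and solving X_h b = Y_h reproduces `beta_hat`.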
2. Implications for applications.
As the impetus for this work was the need to provide some theoretical foundation for empirical results on the accuracy of regression quantile inference, some remarks on implications are in order.
Remark 1.
Clearly, whenever published work assesses the accuracy of an inferential method using the error term from the Bahadur representation, the present results will immediately provide an improvement from O(n^{−1/4}) to the nearly root-n rate here. One area of such results is methods based directly on regression quantiles and not requiring estimation of the sparsity function [1/f(F^{−1}(τ))]. There are several papers giving such results, although at present it appears that their methods have theoretical justification only under location-scale forms of quantile regression models.

Specifically, Zhou and Portnoy (1996) introduced confidence intervals (especially for fitted values) based on using pairs of regression quantiles in a way analogous to confidence intervals for one-sample quantiles. They showed that the method was consistent, but the accuracy depended on the Bahadur error term. Thus, results here now provide accuracy to the nearly root-n rate of Theorem 2.

A second approach directly using the dual quantile process is based on the regression ranks of Gutenbrunner et al. (1993). Again, the error terms in the theoretical results there can be improved using Theorem 1 here, though the development is not so direct.

For a third application, Neocleous and Portnoy (2008) showed that the regression quantile process interpolated along a grid of mesh strictly larger than n^{−1/2} is asymptotically equivalent to the full regression quantile process to first order, but (because of additional smoothness) will yield monotonic quantile functions with probability tending to 1. However, their development used the Bahadur representation, which indicated that a mesh of order n^{−3/8} balanced the bias and accuracy and bounded the difference between β̂(τ) and its linear interpolate by nearly O(n^{−3/4}). With some work, use of the results here would permit a mesh only slightly larger than n^{−1/2} while obtaining an approximation of nearly root-n order.

Remark 2.
Inference under completely general regression quantile models appears to require either estimation of the sparsity function or use of resampling methods. The most general methods in the quantreg package [Koenker (2012)] use the "difference quotient" method with the Hall and Sheather (1988) bandwidth of order n^{−1/3}, which is known to be optimal for coverage probabilities in the one-sample problem. As noted above, expansions using the randomness of the regressors can be developed to provide analogous results for unconditional inference. The results here (with some elaboration) can be used to show that the Hall–Sheather estimates provide (nearly) the same rates of accuracy for coverage probabilities under the conditional form of the regression quantile model.

To be specific, consider the problem of confidence interval estimation for a fixed linear combination of regression parameters: a'β(τ). The asymptotic variance is the well-known sandwich formula

s²_a(δ) = τ(1 − τ) a'(X'DX)^{−1}(X'X)(X'DX)^{−1}a,  D ≡ [diag(x'_i δ)]^{−1},  (2.1)

where δ is the sparsity vector, δ = β'(τ) (with β' being the derivative with respect to τ), and where X is the design matrix. Note that differentiating Q_{Y_i}(τ | x_i) = x'_i β(τ) gives x'_i β'(τ) = 1/f_i(x'_i β(τ)), so D estimates the diagonal matrix of conditional densities.

Following Hall and Sheather (1988), the sparsity may be approximated by the difference quotient δ̃ = (β(τ + h) − β(τ − h))/(2h). Standard approximation theory (using the Taylor series) shows that δ = δ̃ + O(h²). The sparsity may be estimated by

δ̂ ≡ ∆(h)/(2h) ≡ (β̂(τ + h) − β̂(τ − h))/(2h),  (2.2)

and the variance (2.1) may be estimated by inserting δ̂ in D. Then, as shown in the Appendix, the confidence interval

a'β(τ) ∈ a'β̂(τ) ± z_α s_a(δ̂)  (2.3)

has coverage probability 1 − α + O((log n) n^{−2/3}), which is within a factor of log n of the optimal Hall–Sheather rate in a single sample. Furthermore, this rate is achieved at the (optimal) h-value h*_n = c √(log n) n^{−1/3}, which is the optimal Hall–Sheather bandwidth except for the √(log n) term.
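Assembling (2.1)–(2.3) in code is mechanical once the two regression quantile fits are available. The sketch below (Python/numpy; the function name `sandwich_ci` is ours) takes coefficient vectors at τ ± h as given and forms the difference-quotient sparsity, the sandwich variance and the interval. For the demonstration we plug in the true β(τ ± h) of a Gaussian location model, β(t) = (Φ^{−1}(t), 2), for which the true sparsity 1/φ(Φ^{−1}(τ)) is known; in practice the β̂(τ ± h) would come from fitted regression quantiles.

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()

def sandwich_ci(X, beta_tau, beta_lo, beta_hi, h, tau, a, alpha=0.05):
    """CI (2.3) for a'beta(tau): difference-quotient sparsity (2.2)
    plugged into the sandwich variance (2.1)."""
    delta_hat = (beta_hi - beta_lo) / (2.0 * h)   # sparsity vector estimate
    D = np.diag(1.0 / (X @ delta_hat))            # estimates diag f_i(x_i' beta(tau))
    XDX_inv = np.linalg.inv(X.T @ D @ X)
    s2 = tau * (1 - tau) * a @ XDX_inv @ (X.T @ X) @ XDX_inv @ a
    z = nd.inv_cdf(1 - alpha / 2)
    center = a @ beta_tau
    return center - z * np.sqrt(s2), center + z * np.sqrt(s2)

rng = np.random.default_rng(1)
n, tau = 1000, 0.5
h = 0.5 * np.sqrt(np.log(n)) * n ** (-1.0 / 3.0)  # bandwidth h*_n with c = 0.5
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
beta = lambda t: np.array([nd.inv_cdf(t), 2.0])   # true beta(t), location model
lo, hi = sandwich_ci(X, beta(tau), beta(tau - h), beta(tau + h),
                     h, tau, a=np.array([1.0, 0.0]))
```

In this location model x'_i δ is constant, so (2.1) collapses to τ(1 − τ) δ₀² a'(X'X)^{−1}a, which provides a direct check of the assembly.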
Since the optimal bandwidth depends on R*_n, the optimal constant for h*_n cannot be determined, as it can be when X is allowed to be random [and for which the O(1/(nh_n)) term is explicit]. This appears to be an inherent shortcoming of using inference conditional on the design.

Note also that it is possible to obtain better error rates for the coverage probability by using higher order differences. Specifically, using the notation of (2.2),

(8∆(h) − ∆(2h))/(12h) = β'(τ) + O(h⁴).

As a consequence, the optimal bandwidth for this estimator is of order n^{−1/5}, and the coverage probability is accurate to order n^{−4/5} (except for logarithmic factors).

Remark 3.
A third approach to inference applies resampling methods. As noted in the Introduction, while the (x, Y) bootstrap is available for unconditional inference, the practicing statistician will generally prefer to use inference conditional on the design. There are some resampling approaches that can obtain such inference. One method is that of Parzen, Wei and Ying (1994), which simulates the binomial variables appearing in the gradient condition. Another is the "Markov Chain Marginal Bootstrap" of He and Hu (2002) [see also Kocherginsky, He and Mu (2005)]. However, this method also involves sampling from the gradient condition. The discreteness in the gradient condition would seem to require the error term from the Bahadur representation, and thus leads to poorer inferential approximation: the error would be no better than order n^{−1/2} even if it were the square of the Bahadur error term. While some evidence for decent performance of these methods comes from (rather limited) simulations, it is often noticed that these methods perform perhaps somewhat more poorly than the other methods in the quantreg package of Koenker (2012). Clearly, a more complete analysis of inference for regression quantiles based on the more accurate stochastic expansions here would be useful.
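Returning to the higher order differences of Remark 2: the bias reduction is easy to verify numerically in the one-sample case. The sketch below (Python; our own illustration, not code from the paper) applies the standard Richardson-type combination (8∆(h) − ∆(2h))/(12h), whose Taylor expansion cancels the h² term, to the N(0, 1) quantile function, where the true sparsity 1/φ(Φ^{−1}(τ)) is known exactly.

```python
import math
from statistics import NormalDist

nd = NormalDist()
Q = nd.inv_cdf                                    # one-sample quantile function
tau = 0.3
q = Q(tau)
true_sparsity = math.sqrt(2 * math.pi) * math.exp(q * q / 2.0)  # 1/phi(Q(tau))

def Delta(h):                                     # Delta(h) as in (2.2)
    return Q(tau + h) - Q(tau - h)

errs = []
for h in (0.08, 0.04, 0.02):
    e1 = abs(Delta(h) / (2 * h) - true_sparsity)                        # bias O(h^2)
    e2 = abs((8 * Delta(h) - Delta(2 * h)) / (12 * h) - true_sparsity)  # bias O(h^4)
    errs.append((h, e1, e2))
```

Halving h should divide the first-order error by about 4 and the extrapolated error by about 16, matching the O(h²) and O(h⁴) bias orders.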
3. Conditions, fundamental theorems and remarks.
Under the regression quantile model of Section 1, the following conditions will be imposed. Let ẋ_i denote the coordinates of x_i except for the intercept (i.e., the last p − 1 coordinates), and let φ̇_i(t) denote the conditional characteristic function of the random variable ẋ_i(I(Y_i ≤ x'_i β(τ) + x'_i δ/√n) − τ), given x_i. Let f_i(y) and F_i(y) denote the conditional density and c.d.f. of Y_i given x_i.

Condition X1. For any ε > 0, there is η ∈ (0, 1) such that

sup_{||t|| ≥ ε} ∏_i |φ̇_i(t)| ≤ η^n  (3.1)

uniformly in ε ≤ τ ≤ 1 − ε.

Condition X2. The ||x_i|| are uniformly bounded, and there are positive definite p × p matrices G = G(τ) and H such that for any ε > 0 (as n → ∞),

G_n(τ) ≡ n^{−1} ∑_{i=1}^n f_i(x'_i β(τ)) x_i x'_i = G(τ)(1 + O(n^{−1/2})),  (3.2)

H_n ≡ n^{−1} ∑_{i=1}^n x_i x'_i = H(1 + O(n^{−1/2}))  (3.3)

uniformly in ε ≤ τ ≤ 1 − ε.

Condition F.
The derivative of log(f_i(y)) is uniformly bounded on the interval {y: ε ≤ F_i(y) ≤ 1 − ε}.

Two fundamental results will be developed here. The first result provides a density approximation with multiplicative error of nearly root-n rate. A result for fixed τ is given in Theorem 5, but the result needed here is a bivariate approximation for the joint density of one regression quantile and the difference between this one and a second regression quantile (properly normalized for the difference in τ-values).

Let ε ≤ τ₁ ≤ 1 − ε for some ε > 0, and let τ₂ = τ₁ + a_n with a_n ≥ c n^{−b} for some b < 1. Here, one may want to take b near 1 [see remark (1) below], though the basic result will often be useful for b = 1/2, or even smaller. Define

B_n = B_n(τ₁) ≡ n^{1/2}(β̂(τ₁) − β(τ₁)),  (3.4)

R_n = R_n(τ₁, τ₂) ≡ (na_n)^{1/2}[(β̂(τ₂) − β(τ₂)) − (β̂(τ₁) − β(τ₁))].  (3.5)
Under Conditions X1, X2 and F, there is a constant D such that for |B_n| ≤ D(log n)^{1/2} and |R_n| ≤ D(log n)^{1/2}, the joint density of R_n and B_n at δ and s, respectively, satisfies

f_{R_n,B_n}(δ, s) = φ_{Γ_n}(δ, s)(1 + O((na_n)^{−1/2}(log n)^{3/2})),

where φ_{Γ_n} is a normal density with covariance matrix Γ_n having the form given in (7.3).

The second result provides the desired "Hungarian" construction:
Theorem 2.
Assume Conditions X1, X2 and F. Fix a_n = n^{−b} with b < 1, and let {τ_j} be dyadic rationals with denominator less than n^b. Define B*_n(τ) to be the piecewise linear interpolant of {B_n(τ_j)} [as defined in (3.4)]. Then for any ε > 0, there is a (zero-mean) Gaussian process, {Z_n(τ_j)}, defined along the dyadic rationals {τ_j} and with the same covariance structure as B*_n(τ) (along {τ_j}) such that its piecewise linear interpolant Z*_n(τ) satisfies

sup_{ε ≤ τ ≤ 1−ε} |B*_n(τ) − Z*_n(τ)| = O((log n)^{5/2}/√n)

almost surely.

Some remarks on the conditions and ramifications are in order:

(1) The usual construction approximates B_n(τ) by a "Brownian Bridge" process. Theorem 2 really only provides an approximation for the discrete processes at a sufficiently sparse grid of dyadic rationals. That the piecewise linear interpolants converge to the usual Brownian Bridge follows as in Neocleous and Portnoy (2008). The critical impediment to getting a Brownian Bridge approximation to B_n(τ) with the error in Theorem 2 is the square root behavior of the modulus of continuity. This prevents approximating the piecewise linear interpolant within an interval of length greater than (roughly) order 1/n if a root-n error is desired. In order to approximate the density of the difference in B_n(τ) over an interval between dyadic rationals, the length of the interval must be at least of order n^{−b} (for b < 1). The modulus of continuity over such an interval is of order √(n^{−b}) = n^{−b/2}, and thus b must be taken near 1 to get arbitrarily close to the value of 1/2 for the exponent of n. For most purposes, it might be better to state the final result as

sup_{ε ≤ τ ≤ 1−ε} ||B_n(τ) − Z(τ)|| = O(n^{−a})  for any a < 1/2

(where Z is the appropriate Brownian Bridge); but the stronger error bound of Theorem 2 does provide a much closer analog of the result for the one-sample (one-dimensional) quantile process.

(2) The one-sample result requires only the first power of log n, which is known to give the best rate for a general result. The extra power of (log n)^{3/2} here appears to be the price of the p-dimensional argument used in the proofs.

(3) Condition X1 is designed to mimic the effect of a smooth design distribution (see Section 4), and will hold with probability tending to one if ẋ_i is sampled from such a smooth distribution. Nonetheless, the condition that ||x|| be bounded seems rather strong in the case of random x. It seems clear that this can be weakened, though probably at the cost of getting a poorer approximation. For example, ||x|| having exponentially small tails might increase the bound in Theorem 2 by an additional factor of log n, and algebraic tails are likely worse. However, details of such results remain to be developed.

(4) Similarly, it should be possible to let ε, which defines the compact subinterval of τ-values, tend to zero. Clearly, letting ε_n be of order 1/n would lead to extreme value theory and very different approximations. For slower rates of convergence of ε_n, Bahadur expansions have been developed [e.g., see Gutenbrunner et al. (1993)] and extension to the approximation result in Theorem 2 should be possible. Again, however, this would most likely be at the cost of a larger error term.

(5) The assumption that the conditional density of the response (given x) be continuous is required even for the usual first order asymptotics. However, one might hope to avoid Condition F, which requires a bounded derivative at all points. For example, the double exponential distribution does not satisfy this condition. It is likely that the proofs here can be extended to the case where the derivative does not exist on a finite set (or even on a set of measure zero), but dropping differentiability entirely would require a rather different approach.
Furthermore, the apparent need for bounded derivatives in providing uniformity over τ in Bahadur expansions suggests the possibility that some differentiability is required.

(6) Theorem 1 provides a bivariate normal density approximation with error rate (nearly) n^{−1/2} when τ₁ and τ₂ are fixed. When a_n ≡ τ₂ − τ₁ → 0, the error rate becomes (nearly) (na_n)^{−1/2}, reflecting the fact that D_n = β̂(τ₂) − β̂(τ₁) is of order (na_n)^{−1/2}.
4. Ingredients and outline of proof.
The development of the fundamental results (Theorems 1 and 2) will be presented in three phases. The first phase provides the density approximation for a fixed τ, since some of the more complicated features are more transparent in this case. The second phase extends this result to the bivariate approximation of Theorem 1. The final phase provides the "Hungarian" construction of Theorem 2. To clarify the development, the basic ingredients and some preliminary results will be presented first.

Ingredient 1.
Begin with the finite sample density for a regression quantile [Koenker (2005), Koenker and Bassett (1978)]: assume Y_i has a density, f_i(y), and let τ be fixed. Note that β̂(τ) is defined by having p zero residuals (if the design is in general position). Specifically, there is a subset, h, of p integers such that β̂(τ) = X_h^{−1} Y_h, where X_h has rows x'_i for i ∈ h and Y_h has coordinates Y_i for i ∈ h. Let H denote the set of all such p-element subsets. Define

δ̂ = √n(β̂(τ) − β(τ)).

As described in Koenker (2005), the density of δ̂ evaluated at the argument δ = √n(b − β(τ)) is given by

f_δ̂(δ) = n^{−p/2} ∑_{h∈H} |det(X_h)| P{S_n ∈ A_h} ∏_{i∈h} f_i(x'_i β(τ) + n^{−1/2} x'_i δ).  (4.1)

Here, the event in the probability above is the event that the gradient condition holds for a fixed subset, h: S_n ∈ A_h, where A_h = X_h R, with R the rectangle that is the product of intervals (τ − 1, τ) [see Theorem 2.1 of Koenker (2005)], and where

S_n = S_n(h, β, δ) ≡ ∑_{i∉h} x_i (I(Y_i ≤ x'_i β + n^{−1/2} x'_i δ) − τ).  (4.2)

Ingredient 2.
Since n^{−1/2}S_n is approximately normal, and A_h is bounded, the probability in (4.1) is approximately a normal density evaluated at δ. To get a multiplicative bound, we may apply a "Cramér" expansion (or a saddlepoint approximation). If S_n had a smooth distribution (i.e., satisfied Cramér's condition), then standard results would apply. Unfortunately, S_n is discrete. The first coordinate of S_n is nearly binomial, and so a multiplicative bound can be obtained by applying a known saddlepoint formula for lattice variables [see Daniels (1987)]. Equivalently, approximate by an exact binomial and (more directly, but with some rather tedious computation) expand the logarithm of the Gamma function in Stirling's formula. Using either approach, one can show the following result:

Theorem 3.
Let W ~ binomial(n, p), let J be any interval of length O(√n) containing EW = np, and let w = O(√(n log n)). Then

P{W ∈ J + w} = P{Z ∈ J + w}(1 + O(n^{−1/2}(log n)^{3/2})),  (4.3)

where Z ~ N(np, np(1 − p)).

A proof based on multinomial expansions is given for the bivariate generalization in Theorem 1. Note that this result includes an extra factor of √(log n) in the range allowed for w. This will allow the bounds to hold except with probability bounded by an arbitrarily large negative power of n. This is clear for the limiting normal case (by standard asymptotic expansions of the normal c.d.f.). To obtain such bounds for the distribution of S_n will require some form of Bernstein's inequality. Such inequalities date to Bernstein's original publication in 1924 [see Bernstein (1964)], but a version due to Hoeffding (1963) may be easier to apply.

Ingredient 3.
Using Theorem 3, it can be shown (see Section 5) that the probability in (4.1) may be approximated as

P{S̃_n ∈ A_h}(1 + O(L_n/√n)),

where the first coordinate of S̃_n is a sum of n i.i.d. N(0, τ(1 − τ)) random variables, the last (p − 1) coordinates are those of S_n, and L_n = (log n)^{3/2}.
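The content of Theorem 3 is easy to see numerically: for an interval of length O(√n) shifted by up to a few standard deviations, exact binomial and normal interval probabilities agree to high relative accuracy. The check below (Python/scipy; our own illustration) places the interval endpoints on the half-integer lattice so that the comparison is not dominated by the lattice (continuity) effect that the saddlepoint argument handles.

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 4000, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))

def interval_probs(w):
    """P{W in J + w} for J = [mu - 2 sd, mu + 2 sd], endpoints on the
    half-integer lattice, versus the N(np, np(1-p)) approximation."""
    lo = np.floor(mu - 2 * sd + w) + 0.5
    hi = np.floor(mu + 2 * sd + w) + 0.5
    pb = binom.cdf(hi, n, p) - binom.cdf(lo, n, p)   # exact binomial
    pn = norm.cdf(hi, mu, sd) - norm.cdf(lo, mu, sd)  # normal approximation
    return pb, pn

rel_errs = []
for w in (0.0, 1.5 * sd, 3.0 * sd):   # shifts of the kind allowed in Theorem 3
    pb, pn = interval_probs(w)
    rel_errs.append(abs(pb - pn) / pb)
```

The relative (not just absolute) errors remain small even for the shifted intervals, which is the multiplicative form of the bound needed in the proof.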
Since we seek a normal approximation for this probability with multiplicative error, at this point one might hope that a known (multidimensional) "Cramér" expansion or saddlepoint approximation would allow S̃_n to be replaced by a normal vector (thus providing the desired result). However, this will require that the summands be smooth, or (at least) satisfy a form of Cramér's condition. Let ẋ_i denote the last (p − 1) coordinates of x_i. One approach would be to assume ẋ_i has a smooth distribution satisfying the classical form of Cramér's condition. However, to maintain a conditional form of the analysis, it suffices to impose a condition on ẋ_i, which is designed to mimic the effect of a smooth distribution and will hold with probability tending to one if ẋ_i has such a smooth distribution. Condition X1 specifies just such an assumption.

Note that the characteristic functions of the summands of S̃_n, say, {φ̇_i(t)}, will also satisfy Condition X1 [equation (3.1)] and so should allow application of known results on normal approximations. Unfortunately, I have been unable to find a published result providing this, and so Section 5 will present an independent proof.

Clearly, some additional conditions will be required. Specifically, we will need conditions that the empirical moments of {x_i} converge appropriately, as specified in Condition X2.

Finally, the approach using characteristic functions is greatly simplified when the sums, S̃_n, have densities. Again, to avoid using smoothness of the distribution of {ẋ_i} (and thus to maintain a conditional approach), introduce a random perturbation V_n which is small and has a bounded smooth density (the bound may depend on n). Section 5 will then prove the following:

Theorem 4.
Assume Conditions X1 and X2 and the regression quantile model of Section 1. Let δ be the argument of the density of n^{1/2}(β̂ − β), and suppose ||δ|| ≤ d₁√(log n) for some constant d₁. Then a constant d₂ can be chosen so that

P{S_n + V_n ∈ A_h} = P{(Z_n + V_n)/√n ∈ A_h/√n}(1 + O((log³(n)/n)^{1/2})) + O(n^{−d₂}),

where Z_n/√n has mean −G_n δ and covariance τ(1 − τ)H_n, d₂ can be arbitrarily large, and V_n is a small perturbation [see (5.1)].

Following the proof of this theorem, it will be shown that the effect of V_n can be ignored if V_n is bounded by n^{−d₄}, where d₄ may depend on d₂ (but not on d₁).

Ingredient 4.
Expanding the densities in (4.1) is trivial if the densities are sufficiently smooth. The assumption of a bounded first derivative in Condition F appears to be required to analyze second order terms (beyond the first order normal approximation).
Ingredient 5.
Finally, summing terms involving det(X_h) in (4.1) over the (n choose p) summands will require Vinograd's theorem and related results from matrix theory concerning adjoint matrices [see Gantmacher (1960)].

The remaining ingredients provide the desired "Hungarian" construction.

Ingredient 6.
Extend the density approximation to the joint density for β̂(τ₁) and β̂(τ₂) (when standardized). A major complication is that one needs a_n ≡ |τ₂ − τ₁| → 0, making the covariance matrix tend to singularity. Thus, we focus on the joint density for standardized versions of β̂(τ₁) and D_n ≡ β̂(τ₂) − β̂(τ₁). Clearly, this requires modification of the proof for the univariate case to treat the fact that D_n converges at a rate depending on a_n. The result is given in Theorem 1.

Ingredient 7.
Extend the density result to obtain an approximation for the quantile transform for the conditional distribution of differences D_n (between successive dyadic rationals). This will provide (independent) normal approximations to the differences whose sums will have the same covariance structure as the regression quantile process (at least along a sufficiently sparse grid of dyadic rationals).

Ingredient 8.
Finally, the Hungarian construction is applied inductively along the sparse grid of dyadic rationals. This inductive step requires some innovative development, mainly because the regression quantile process is not directly expressible in terms of sums of random variables (as are the empirical one-sample distribution function and quantile function).
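The induction along dyadic rationals in Ingredient 8 parallels the classical Lévy midpoint construction of a Brownian bridge, in which the process is refined recursively at dyadic points. A minimal sketch of that dyadic scheme (Python/numpy; our own illustration of the general technique, not of the paper's construction itself):

```python
import numpy as np

def dyadic_bridge(depth, rng):
    """Brownian bridge on [0,1], built by refining at dyadic rationals:
    given values at s < t, the midpoint value is their average plus an
    independent N(0, (t - s)/4) perturbation (Levy's construction)."""
    t = np.array([0.0, 1.0])
    b = np.array([0.0, 0.0])   # bridge is pinned to 0 at both endpoints
    for _ in range(depth):
        mid_t = 0.5 * (t[:-1] + t[1:])
        mid_b = (0.5 * (b[:-1] + b[1:])
                 + rng.standard_normal(mid_t.size) * np.sqrt((t[1:] - t[:-1]) / 4.0))
        t2 = np.empty(t.size + mid_t.size); t2[0::2] = t; t2[1::2] = mid_t
        b2 = np.empty_like(t2); b2[0::2] = b; b2[1::2] = mid_b
        t, b = t2, b2
    return t, b

rng = np.random.default_rng(2)
# grid 0, 1/4, 1/2, 3/4, 1 after two refinement levels
paths = np.array([dyadic_bridge(2, rng)[1] for _ in range(20000)])
```

Monte Carlo moments match the Brownian bridge covariance Cov(B(s), B(t)) = s(1 − t) for s ≤ t, which is what the inductive step must preserve at every dyadic level.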
5. Proof of Theorem 4.
Let Ṡ_n be the last p − 1 coordinates of S_n and let A^{(1)}(Ṡ_n, h) be the interval {a: (a, Ṡ_n) ∈ A_h}. Then,

P{S_n ∈ A_h} = P{∑_{i∉h}(I(Y_i ≤ x'_i β + x'_i δ/√n) − τ) ∈ A^{(1)}(Ṡ_n, h)}
= P{∑_{i∉h}(I(Y_i ≤ x'_i β) − τ) ∈ A^{(1)}(Ṡ_n, h) − ∑_{i∉h}(I(Y_i ≤ x'_i β + x'_i δ/√n) − I(Y_i ≤ x'_i β))}
= ∑_{k∈A*} f_binomial(k; τ),

where A* is the set A^{(1)} shifted as indicated above. Note that by Hoeffding's inequality [Hoeffding (1963)], for any fixed d₃, the shift satisfies

|∑_{i∉h}(I(Y_i ≤ x'_i β + x'_i δ/√n) − I(Y_i ≤ x'_i β))| ≤ d₃ √(n log n)

except with probability bounded by 2n^{−d₃}. Thus, we may apply Theorem 3 [equation (4.3)] with w equal to the shift above to obtain the following bound (to within an additional additive error of 2n^{−d₃}):

P{S_n ∈ A_h} = P{√(nτ(1 − τ)) Z ∈ A^{(1)}(Ṡ_n, h)}(1 + O((log n)^{3/2}/√n)),

where Z ~ N(0, 1); here Ṡ_n/√n may be bounded by a term of the form B√(log n) (by Hoeffding's inequality). Finally, we obtain

P{S_n ∈ A_h} = P{S̃_n ∈ A_h}(1 + O((log n)^{3/2}/√n)) + 2n^{−d₃},

where the first coordinate of S̃_n is a sum of n i.i.d. N(0, τ(1 − τ)) random variables and the last p − 1 coordinates are those of S_n.

To treat the probability involving S̃_n, standard approaches using characteristic functions can be employed. In theory, exponential tilting (or saddlepoint methods) should provide better approximations, but since we require only the order of the leading error term, we can proceed more directly. As in Einmahl (1989), the first step is to add an independent perturbation so that the sum has an integrable density: specifically, for fixed h ∈ H let V_n be a random variable (independent of all observations) with a smooth bounded density and for which (for each h ∈ H)

||V_n|| ≤ n^{−d₄},  (5.1)

where d₄ will be chosen later. Define S*_n = S̃_n + V_n. We now allow A_h to be any (arbitrary) set, say, A. Thus, S*_n has a density and we can write [with c_π = (2π)^{−p}]

P{S*_n/√n ∈ A} = c_π ∫ Vol(A) φ_{Unif(A)}(t) φ_{S̃_n}(t/√n) φ_{V_n}(t/√n) dt,

where φ_U denotes the characteristic function of the random variable U. Break the domain of integration into three sets: ||t|| ≤ d₅√(log n), d₅√(log n) ≤ ||t|| ≤ ε√n, and ||t|| ≥ ε√n.

On ||t|| ≤ d₅√(log n), expand log φ_{S̃_n}(t/√n). For this, compute

µ_i ≡ E x_i(τ − I(Y_i ≤ x'_i β + x'_i δ/√n)) = −f_i(F_i^{−1}(τ)) x_i x'_i δ/√n + O(||x_i||³ ||δ||²/n),

Σ_i ≡ Cov[x_i(τ − I(Y_i ≤ x'_i β + x'_i δ/√n))] = x_i x'_i τ(1 − τ) + O(||x_i||³ ||δ||/√n).
Hence, using the boundedness of ||x_i||, ||δ|| and ||t|| (on this first interval),

φ_{S̃_n}(t/√n) = exp{ι ∑_{i∉h} t'µ_i/√n − (1/(2n)) ∑_{i∉h} t'Σ_i t + O((||δ||³ + ||t||³)/√n)}
= exp{−ι t'G_n δ − ½ τ(1 − τ) t'H_n t + O((log³(n)/n)^{1/2})},

where G_n and H_n are defined in Condition X2 [see (3.2) and (3.3)].

For the other two intervals on the t-axis, the integrands will be bounded by an additive error times

∫ |φ_{V_n}(t/√n)| dt = O(n^{p(d₄+1/2)}),

since ||V_n|| ≤ n^{−d₄}. On ||t|| ≤ ε√n, the summands are bounded and so their characteristic functions satisfy |φ_i(s)| ≤ 1 − b||s||² for some constant b. Thus, on d₅√(log n) ≤ ||t|| ≤ ε√n,

|φ_{S̃_n}(t/√n)| ≤ (1 − b d₅² log(n)/n)^{n−p} ≤ c₁ n^{−b d₅²}

for some constant c₁. Therefore, integrating against φ_{V_n}(t/√n) provides an additive bound of order n^{−d*}, where d* = b d₅² − p(d₄ + 1/2), and (for any d₂) d₅ can be chosen sufficiently large so that d* > d₂.

Finally, on ||t|| ≥ ε√n, Condition X1 [see (3.1)] gives an additive bound of η^n directly and, again (as on the previous interval), an additive error bounded by n^{−d₂} can be obtained.

Therefore, it now follows that we can choose d₂ (depending on d₃, d₄, d₅ and d*) so that

P{(S_n + V_n)/√n ∈ A} = c_π ∫ Vol(A) φ_{Unif(A)}(t) φ_{N(−G_n δ, τ(1−τ)H_n)}(t) φ_{V_n}(t/√n) dt × (1 + O((log³(n)/n)^{1/2})) + O(n^{−d₂}),

from which Theorem 4 follows.

Finally, we show that the contribution of V_n can be ignored:

|P{S̃_n ∈ A_h} − P{S*_n ∈ A_h}| = |P{S̃_n ∈ A_h} − P{S̃_n + V_n ∈ A_h + V_n}| ≤ P{S̃_n + V_n ∈ A_h △ (A_h + V_n)},

where △ denotes the symmetric difference of the sets. Since V_n is bounded and A_h = X_h R, this symmetric difference is contained in a set, D, which is the union of 2p (boundary) parallelepipeds, each of the form X_h R_j, where R_j is a rectangle one of whose coordinates has width 2n^{−d₄} and all other coordinates have length 1. Thus, applying Theorem 4 (as proved for the set A = D),

|P{S̃_n ∈ A_h} − P{S*_n ∈ A_h}| ≤ P{S̃_n + V_n ∈ D} ≤ c Vol(D) + O(n^{−d₂}) ≤ c' n^{−d₄},

where c and c' are constants, and d₄ may be chosen arbitrarily large.
6. Normal approximation with nearly root-n multiplicative error.

Theorem 5.
Assume Conditions X1, X2, F and the regression quantile model of Section 1. Let δ be the argument of the density of δ̂_n ≡ n^{1/2}(β̂(τ) − β(τ)), and suppose ||δ|| ≤ d₁√(log n) for some constant d₁. Then, uniformly in ε ≤ τ ≤ 1 − ε (for ε > 0),

f_{δ̂_n}(δ) = φ_Σ(δ)(1 + O((log³(n)/n)^{1/2})),

where φ_Σ denotes the normal density with covariance Σ_n = τ(1 − τ) G_n^{−1} H_n G_n^{−1}, with G_n and H_n given by (3.2) and (3.3).
Recall the basic formula for the density (4.1):
\[
f_{\hat\delta}(\delta)=n^{-p/2}\sum_{h\in\mathcal H}\det(X_h)\,P\{S_n\in A_h\}\prod_{i\in h}f_i\big(x_i'\beta+n^{-1/2}\delta\big).
\]
By Theorem 4, ignoring the multiplicative and additive error terms given in this result and setting $c'_\pi=(2\pi)^{-p/2}$,
\begin{align*}
P\{S_n\in A_h\}&=P\{Z_n\in A_h/\sqrt n\}\\
&=c'_\pi\,|H_n|^{-1/2}\int_{A_h/\sqrt n}\exp\Big\{-\frac{(z+G_n\delta)'H_n^{-1}(z+G_n\delta)}{2\tau(1-\tau)}\Big\}\,dz\\
&=c'_\pi\,|H_n|^{-1/2}\exp\Big\{-\tfrac12\,\delta'\Sigma_n^{-1}\delta\Big\}\int_{A_h/\sqrt n}dz\,\big(1+O(n^{-1/2})\big)\\
&=c'_\pi\,n^{-p/2}\,|X_h|\,|H_n|^{-1/2}\exp\Big\{-\tfrac12\,\delta'\Sigma_n^{-1}\delta\Big\}\big(1+O(n^{-1/2})\big),
\end{align*}
since $z$ is bounded by a constant times $n^{-1/2}$ on $A_h/\sqrt n$ and the last integral equals $\mathrm{Vol}(A_h/\sqrt n)=n^{-p/2}|X_h|$.

By Ingredient 4, the product is
\[
\prod_{i\in h}f_i(x_i'\beta)\big(1+O(\|\delta\|\,n^{-1/2})\big).
\]
This gives the main term of the approximation as
\[
\sum_{h\in\mathcal H}n^{-p}\,|X_h|^2\prod_{i\in h}f_i(x_i'\beta)\,|H_n|^{-1/2}\exp\Big\{-\tfrac12\,\delta'\Sigma_n^{-1}\delta\Big\}.
\]
The penultimate step is to apply results from matrix theory on adjoint matrices [specifically, the Cauchy–Binet theorem and the "trace" theorem; see, e.g., Gantmacher (1960), pages 9 and 87]: the sum above is just the trace of the $p$th adjoint of $(X'D_fX)$, which equals $\det(X'D_fX)$. The various determinants combine (with the factor $n^{-p}$) to give $\det(\Sigma_n)^{-1/2}$, which provides the asymptotic normal density we want.

Finally, we need to combine the multiplicative and additive errors into a single multiplicative error. So consider $\|\delta\|\le d\sqrt{p\log n}$ (for some constant $d$). Then the asymptotic normal density is bounded below by $n^{-cd^2}$ for some constant $c$. Thus, since the constant $d_1$ (which depends on $d$, $d_2$, $d^*$ and $\eta$) can be chosen so that the additive errors are smaller than $O(n^{-cd^2-1/2})$, the error is entirely subsumed in the multiplicative factor: $\big(1+O\big((\log^3 n/n)^{1/2}\big)\big)$. $\square$
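The normal approximation of Theorem 5 can be checked by simulation. The sketch below (assuming NumPy and SciPy are available; the design, sample size, quantile level and error law are arbitrary illustrative choices, not from the paper) computes Koenker–Bassett regression quantiles via the standard linear-programming formulation and compares the Monte Carlo covariance of $\sqrt n(\hat\beta(\tau)-\beta(\tau))$ with $\tau(1-\tau)G_n^{-1}H_nG_n^{-1}$; for i.i.d. $N(0,1)$ errors, $G_n=f(0)\,X'X/n$ and $H_n=X'X/n$.

```python
import numpy as np
from scipy.optimize import linprog

def rq(X, y, tau):
    """Koenker-Bassett regression quantile via the standard LP:
    minimize tau*1'u + (1-tau)*1'v  subject to  X b + u - v = y, u, v >= 0,
    with the free vector b written as b_pos - b_neg."""
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p + 2 * n))
    return res.x[:p] - res.x[p:2 * p]

rng = np.random.default_rng(0)
n, tau, reps = 200, 0.5, 200
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
f0 = 1 / np.sqrt(2 * np.pi)               # N(0,1) error density at its median
Sigma = tau * (1 - tau) / f0 ** 2 * np.linalg.inv(X.T @ X / n)

# Monte Carlo draws of sqrt(n)*(beta_hat(tau) - beta(tau))
draws = np.array([np.sqrt(n) * (rq(X, X @ beta + rng.standard_normal(n), tau) - beta)
                  for _ in range(reps)])
print(np.cov(draws.T))   # should be close to the sandwich covariance Sigma
print(Sigma)
```

The agreement is only up to Monte Carlo and finite-sample error, of course; the theorem's point is that the density itself is matched to within a nearly root-$n$ multiplicative factor.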
7. The Hungarian construction.
We first prove Theorem 1, which provides the bivariate normal approximation.
Proof of Theorem 1.
The proof follows the development in Theorem 5. The first step treats the first (intercept) coordinate. Since the binomial expansions were omitted in the proof of Theorem 3, details for the trinomial expansion needed for the bivariate case here will be presented. The binomial sum in the first coordinate of (4.2) will be split into the sums of observations in the intervals $[x_i'\hat\beta(0),x_i'\hat\beta(\tau_1))$, $[x_i'\hat\beta(\tau_1),x_i'\hat\beta(\tau_1+a_n))$ and $[x_i'\hat\beta(\tau_1+a_n),x_i'\hat\beta(1))$. The expected number of observations in each interval is within $p$ of $n$ times the length of the corresponding interval. Thus, ignoring an error of order $1/n$, we expand a trinomial with $n$ observations and $p_1=\tau_1$ and $p_2=a_n$. Let $(N_1,N_2,N_3)$ be the (trinomially distributed) numbers of observations in the respective intervals, and consider
\[
P^*\equiv P\{N_1=np_1+k_1,\;N_2=np_2+k_2,\;N_3=n(1-p_1-p_2)-k_1-k_2\}.
\]
We may take
\[
k_1=O\big((n\log n)^{1/2}\big),\qquad k_2=O\big((na_n\log n)^{1/2}\big),\tag{7.1}
\]
since these bounds are exceeded with probability bounded by $n^{-d}$ for any (sufficiently large) $d$. So $P^*\equiv A\times B$, where
\[
A=\frac{n!}{(np_1+k_1)!\,(np_2+k_2)!\,\big(n(1-p_1-p_2)-k_1-k_2\big)!},
\]
\[
B=p_1^{\,np_1+k_1}\,p_2^{\,np_2+k_2}\,(1-p_1-p_2)^{\,n(1-p_1-p_2)-k_1-k_2}.
\]
Expanding (using Stirling's formula and some computation),
\begin{align*}
A&=\frac1{2\pi}\exp\Big\{\Big(n+\tfrac12\Big)\log n-\Big(np_1+k_1+\tfrac12\Big)\log(np_1+k_1)
-\Big(np_2+k_2+\tfrac12\Big)\log(np_2+k_2)\\
&\qquad-\Big(n(1-p_1-p_2)-k_1-k_2+\tfrac12\Big)\log\big(n(1-p_1-p_2)-k_1-k_2\big)+O\Big(\frac1{np_2}\Big)\Big\}\\
&=\frac1{2\pi}\exp\Big\{-\log n-\Big(np_1+k_1+\tfrac12\Big)\log p_1-\Big(np_2+k_2+\tfrac12\Big)\log p_2\\
&\qquad-\Big(n(1-p_1-p_2)-k_1-k_2+\tfrac12\Big)\log(1-p_1-p_2)\\
&\qquad-\frac{k_1^2}{2np_1}-\frac{k_2^2}{2np_2}-\frac{(k_1+k_2)^2}{2n(1-p_1-p_2)}+O\Big(\frac{(\log n)^{3/2}}{\sqrt{na_n}}\Big)\Big\},\\
B&=\exp\big\{(np_1+k_1)\log p_1+(np_2+k_2)\log p_2+\big(n(1-p_1-p_2)-k_1-k_2\big)\log(1-p_1-p_2)\big\}.
\end{align*}
Therefore,
\[
A\times B=\frac1{2\pi n\sqrt{p_1p_2(1-p_1-p_2)}}
\exp\Big\{-\frac{k_1^2}{2np_1}-\frac{k_2^2}{2np_2}-\frac{(k_1+k_2)^2}{2n(1-p_1-p_2)}+O\Big(\frac{(\log n)^{3/2}}{\sqrt{na_n}}\Big)\Big\}.
\]
Some further simplification shows that $A\times B$ gives the usual (bivariate) normal approximation to the trinomial with a multiplicative error of $\big(1+O\big((\log n)^{3/2}/\sqrt{na_n}\big)\big)$ [when $k_1$ and $k_2$ satisfy (7.1)].

The next step of the proof follows that of Theorem 4 (see Ingredient 3). Since the proof is based on expanding characteristic functions (which do not involve the inverses of the covariance matrices), all uniform error bounds continue to hold. This extends the result of Theorem 4 to the bivariate case:
\[
P\{S_n(\tau_1)\in A_{h_1},\,S_n(\tau_2)\in A_{h_2}\}
=P\{Z_1\in A_{h_1}/\sqrt n,\;Z_2\in A_{h_2}/\sqrt n\}\tag{7.2}
\]
\[
=P\{Z_1\in A_{h_1}/\sqrt n\}\times P\big\{(Z_2-Z_1)/\sqrt n\in(A_{h_2}-Z_1)/\sqrt n\,\big|\,Z_1\big\}
\]
for appropriate normally distributed $(Z_1,Z_2)$ (depending on $n$). This last equation is needed to extend the argument of Theorem 5, which involves integrating normal densities. The joint covariance matrix for $(S_n(\tau_1),S_n(\tau_2))$ is nearly singular (for $\tau_2-\tau_1$ small) and complicates the bounds for the integrals of the densities. The first factor above can be treated exactly as in the proof of Theorem 5, while the conditional densities involved in the second factor can be handled by simple rescaling. This provides the desired generalization of Theorem 5.

Thus, the next step is to develop the parameters of the normal distribution for $(B_n(\tau_1),R_n)$ [see (3.4), (3.5)] in a usable form. The covariance matrix for $(B_n(\tau_1),B_n(\tau_2))$ has blocks of the form
\[
\mathrm{Cov}(B_n(\tau_1),B_n(\tau_2))=\begin{pmatrix}\tau_1(1-\tau_1)\Lambda_{11}&\tau_1(1-\tau_2)\Lambda_{12}\\ \tau_1(1-\tau_2)\Lambda_{21}&\tau_2(1-\tau_2)\Lambda_{22}\end{pmatrix},
\]
where $\Lambda_{ij}=G_n^{-1}(\tau_i)H_nG_n^{-1}(\tau_j)$ with $G_n$ and $H_n$ given in Condition X2 [see (3.2) and (3.3)].

Expanding $G_n(\tau)$ about $\tau=\tau_1$ (using the differentiability of the densities from Condition F),
\[
\Lambda_{ij}=\Lambda_{11}+(\tau_2-\tau_1)\Delta_{ij}+o(|\tau_2-\tau_1|),
\]
where the $\Delta_{ij}$ are derivatives of $G_n$ at $\tau_1$ (note that $\Delta_{11}=0$).
Straightforward matrix computation now yields the joint covariance for $(B_n(\tau_1),R_n)$:
\[
\mathrm{Cov}(B_n(\tau_1),R_n)=\begin{pmatrix}\tau_1(1-\tau_1)\Lambda_{11}&(\tau_2-\tau_1)^{1/2}\Delta^*_{12}\\ (\tau_2-\tau_1)^{1/2}\Delta^*_{21}&\Delta^*_{22}\end{pmatrix}+o(|\tau_2-\tau_1|),\tag{7.3}
\]
where the $\Delta^*_{ij}$ are uniformly bounded matrices.

Thus, the conditional distribution of $R_n=(\tau_2-\tau_1)^{-1/2}\big(B_n(\tau_2)-B_n(\tau_1)\big)$ given $B_n(\tau_1)$ has moments
\[
E[R_n\mid B_n(\tau_1)]=(\tau_2-\tau_1)^{1/2}\,\Delta^*_{21}\Lambda_{11}^{-1}B_n(\tau_1)/\big(\tau_1(1-\tau_1)\big),\tag{7.4}
\]
\[
\mathrm{Cov}[R_n\mid B_n(\tau_1)]=\Delta^*_{22}-\frac{\tau_2-\tau_1}{\tau_1(1-\tau_1)}\,\Delta^*_{21}\Lambda_{11}^{-1}\Delta^*_{12},\tag{7.5}
\]
and analogous equations also hold for $\{Z_2-Z_1\mid Z_1\}$.
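The trinomial step earlier in this proof can be checked numerically: the Stirling-based expansion says the exact trinomial probability and the corresponding bivariate normal density agree to a small multiplicative error when the deviations satisfy (7.1). A minimal sketch, using only the standard library (the values of $n$, $p_1$, $p_2$ and the deviations are arbitrary illustrative choices, not from the paper):

```python
import math

def trinomial_pmf(n, p1, p2, k1, k2):
    """Exact trinomial log-pmf via lgamma, exponentiated."""
    k3 = n - k1 - k2
    logp = (math.lgamma(n + 1) - math.lgamma(k1 + 1)
            - math.lgamma(k2 + 1) - math.lgamma(k3 + 1)
            + k1 * math.log(p1) + k2 * math.log(p2)
            + k3 * math.log(1 - p1 - p2))
    return math.exp(logp)

def normal_approx(n, p1, p2, k1, k2):
    """Bivariate normal density at (k1, k2): the local-CLT approximation."""
    # covariance of (N1, N2): n * [[p1(1-p1), -p1 p2], [-p1 p2, p2(1-p2)]]
    s11 = n * p1 * (1 - p1)
    s22 = n * p2 * (1 - p2)
    s12 = -n * p1 * p2
    det = s11 * s22 - s12 ** 2
    d1, d2 = k1 - n * p1, k2 - n * p2
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

n, p1, p2 = 100_000, 0.5, 0.01       # p2 plays the role of a_n
k1, k2 = 50_100, 1_010               # deviations within the window (7.1)
ratio = trinomial_pmf(n, p1, p2, k1, k2) / normal_approx(n, p1, p2, k1, k2)
print(ratio)                         # multiplicative error: ratio close to 1
```

Equivalently, the quadratic form here matches the exponent $k_1^2/(2np_1)+k_2^2/(2np_2)+(k_1+k_2)^2/(2n(1-p_1-p_2))$ obtained from the expansion of $A\times B$.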
Finally, recalling that $\tau_2-\tau_1=a_n$, the second term in (7.2) can be written
\[
P\Big\{\frac{Z_2-Z_1}{\sqrt n}\in\frac{A_{h_2}-Z_1}{\sqrt n}\,\Big|\,Z_1\Big\}
=P\Big\{\frac{Z_2-Z_1}{\sqrt{n(\tau_2-\tau_1)}}\in\frac{A_{h_2}-Z_1}{\sqrt{na_n}}\,\Big|\,Z_1\Big\}.
\]
Thus, since the conditional covariance matrix is uniformly bounded except for the $a_n=(\tau_2-\tau_1)$ factor, the argument of Theorem 5 also applies directly to this conditional probability. $\square$

Finally, the above results are used to apply the quantile transform for increments between dyadic rationals inductively in order to obtain the desired "Hungarian" construction. The proof of Theorem 2 is as follows:
Proof of Theorem 2. (i) Following the approach in Einmahl (1989), the first step is to provide the result of Theorem 1 for conditional densities one coordinate at a time. Using the notation of Theorem 1, let $\tau_1=k/2^\ell$ and $\tau_2=(k+1)/2^\ell$ be successive dyadic rationals (between $\varepsilon$ and $1-\varepsilon$) with denominator $2^\ell$. So $a_n=2^{-\ell}$. Let $R_m$ be the $m$th coordinate of $R_n(\tau_1,\tau_2)$ [see (3.5)], let $\dot R_m$ be the vector of coordinates before the $m$th one, and let $S=B_n(\tau_1)$. Then the conditional density of $R_m\mid(\dot R_m,S)$ satisfies
\[
f_{R_m\mid(\dot R_m,S)}(r\mid r_0,s)=\varphi_{\mu,\Sigma}(r\mid r_0,s)\Big(1+O\Big(\frac{(\log n)^{3/2}}{\sqrt n}\Big)\Big)\tag{7.6}
\]
for $\|r\|<D\sqrt{\log n}$, $\|r_0\|<D\sqrt{\log n}$ and $\|s\|<D\sqrt{\log n}$, and where $\mu$ and $\Sigma$ are easily derived from (7.4) and (7.5). Note that $\mu$ has the form
\[
\mu=\sqrt{a_n}\,\alpha'S,\tag{7.7}
\]
where $\|\alpha\|$ can be bounded (independent of $n$) and $\Sigma$ can be bounded away from zero and infinity (independent of $n$).

This follows since the conditional densities are ratios of marginal densities of the form $f_Y(y)=\int f_{X,Y}\,dx$ (with $f_{X,Y}$ satisfying Theorem 1). The integral over $\|x\|\le D\sqrt{\log n}$ has the multiplicative error bound directly. The remainder of the integral is bounded by $n^{-d}$, which is smaller than the normal integral over $\|x\|\le D\sqrt{\log n}$ (see the end of the proof of Theorem 5).

(ii) The second step is to develop a bound on the (conditional) quantile transform in order to approximate an asymptotically normal random variable by a normal one. The basic idea appears in Einmahl (1989). Clearly, from (7.6),
\[
\int_{-\infty}^{r}f_{R_m\mid(\dot R_m,S)}(u\mid r_0,s)\,du
=\int_{-\infty}^{r}\varphi_{\mu,\Sigma}(u\mid r_0,s)\,du\,\Big(1+O\Big(\frac{(\log n)^{3/2}}{\sqrt n}\Big)\Big)
\]
for $|u|<D\sqrt{\log n}$, $\|r_0\|<D\sqrt{\log n}$ and $\|s\|<D\sqrt{\log n}$. By Condition F, the conditional densities (of the response given $x$) are bounded above zero on $\varepsilon\le\tau\le1-\varepsilon$. Hence, the inverses of the above versions of the c.d.f.'s also satisfy this multiplicative error bound, at least for the variables bounded by $D\sqrt{\log n}$.
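The quantile-transform coupling used in step (ii) can be illustrated in the simplest one-sample setting: feeding the same uniform variate through a binomial quantile function and through the matching normal quantile function couples the two variables to within a nearly constant error, even though both have standard deviation of order $\sqrt n$. A minimal standard-library sketch (the sample size, success probability and number of draws are arbitrary choices, not from the paper):

```python
import bisect
import itertools
import math
import random
import statistics

n, p = 10_000, 0.3
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# Binomial(n, p) log-pmf via lgamma, then the cdf by accumulation.
logpmf = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
          + k * math.log(p) + (n - k) * math.log(1 - p) for k in range(n + 1)]
cdf = list(itertools.accumulate(math.exp(v) for v in logpmf))

def binom_ppf(u):
    """Smallest k with P(Bin(n, p) <= k) >= u (the quantile transform)."""
    return min(bisect.bisect_left(cdf, u), n)

nd = statistics.NormalDist()
random.seed(0)
errs = []
for _ in range(500):
    u = min(max(random.random(), 1e-12), 1 - 1e-12)
    x = binom_ppf(u)                 # binomial draw
    z = mu + sd * nd.inv_cdf(u)      # normal draw coupled through the same u
    errs.append(abs(x - z))
print(max(errs))                     # stays O(1) even though sd is about 46
```

The proof applies this device coordinate by coordinate to the conditional laws in (7.6), where the multiplicative density bound translates into the $O((\log n)^{3/2}/\sqrt n)$ coupling error of (7.8).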
Thus, the quantile transform can be applied to show that there is a normal random variable $Z^*$ such that $R_m-Z^*=O\big((\log n)^{3/2}/\sqrt n\big)$ so long as $R_m$ and the quantile transform of $R_m$ are bounded by $D\sqrt{\log n}$. Using the conditional mean and variance [see (7.7)], and the fact that the random variables exceed $D\sqrt{\log n}$ with probability bounded by $n^{-d}$ (where $d$ can be made large by choosing $D$ large enough), there is a random variable $Z_m$ that can be chosen independently so that
\[
R_m=\sqrt{a_n}\,\alpha'S+Z_m+O\Big(\frac{(\log n)^{3/2}}{\sqrt n}\Big)\tag{7.8}
\]
except with probability bounded by $n^{-d}$.

(iii) Finally, the "Hungarian" construction will be developed inductively. Let $\tau(k,\ell)=k/2^\ell$ and consider induction on $\ell$. First consider the case $\tau\ge\frac12$; the argument for $\tau<\frac12$ is entirely analogous.

Define $\varepsilon_n^*=c(\log n)^{3/2}/\sqrt n$, where $c$ bounds the big-$O$ term in any equation of the form (7.8). Let $A$ be a bound [uniform over $\tau\in(\varepsilon,1-\varepsilon)$] on $\|\alpha\|$ in (7.8). The induction hypothesis is as follows: there are normal random vectors $Z_n(k,\ell)$ such that
\[
\Big\|B_n\Big(\frac{k}{2^\ell}\Big)-Z_n(k,\ell)\Big\|\le\varepsilon(\ell)\tag{7.9}
\]
except with probability $2^\ell n^{-d}$, where for each $\ell$, $Z_n(\cdot,\ell)$ has the same covariance structure as $B_n(\cdot/2^\ell)$, and where
\[
\varepsilon(\ell)=\ell\,\varepsilon_n^*\prod_{j=1}^{\ell}\big(1+A\,2^{-j/2}\big).\tag{7.10}
\]
Note: since the earlier bounds apply only for intervals whose lengths exceed $n^{-a}$ (for some positive $a$), $\ell$ must be taken smaller than $a\log_2(n)=O(\log n)$. Thus, the bound in (7.10) becomes $O\big((\log n)^{5/2}/\sqrt n\big)$, as stated in Theorem 2.

To prove the induction result, note first that Theorem 1 (or Theorem 5) provides the normal approximation for $B_n(\frac12)$ for $\ell=1$. The induction step is proved as follows: following Einmahl (1989), take two consecutive dyadic rationals $\tau(k,\ell)$ and $\tau(k-1,\ell)$ with $k$ odd. So $\tau(k-1,\ell)=[k/2]/2^{\ell-1}=\tau([k/2],\ell-1)$.
Condition each coordinate of $B_n(\tau(k,\ell))$ on the previous coordinates and on $B_n(\tau([k/2],\ell-1))$, and let $b_n(\tau(k,\ell))=b_n(k/2^\ell)$ be one such coordinate. Now, as above, define $R(k,\ell)$ by
\[
b_n(\tau(k,\ell))=b_n(\tau([k/2],\ell-1))+R(k,\ell).
\]
From (7.8), there is a normal random variable $Z_n(k,\ell)$ such that
\[
\big|R(k,\ell)-\sqrt{2^{-\ell}}\,\alpha'B_n(\tau([k/2],\ell-1))-Z_n(k,\ell)\big|\le\varepsilon_n^*.
\]
By the induction hypothesis for $(\ell-1)$, $B_n(\tau([k/2],\ell-1))$ is approximable by normal random variables to within $\varepsilon(\ell-1)$ (except with probability $2^{\ell-1}n^{-d}$). Thus, a coordinate $b_n(\tau([k/2],\ell-1))$ is also approximable with this error, and the error in approximating $\sqrt{a_n}\,\alpha'B_n(\tau([k/2],\ell-1))$ is bounded by $\varepsilon(\ell-1)\,A\sqrt{a_n}=\varepsilon(\ell-1)\,A\,2^{-\ell/2}$. Finally, since $Z_n(k,\ell)$ is independent of these normal variables, the errors can be added to obtain
\[
\big(1+A\,2^{-\ell/2}\big)\,\varepsilon(\ell-1)+\varepsilon_n^*.
\]
Therefore, except with probability less than $2^{\ell-1}n^{-d}+2^{\ell-1}n^{-d}=2^\ell n^{-d}$, the induction hypothesis (7.9) holds with error
\[
(\ell-1)\,\varepsilon_n^*\prod_{j=1}^{\ell-1}\big(1+A\,2^{-j/2}\big)\times\big(1+A\,2^{-\ell/2}\big)+\varepsilon_n^*
\le\ell\prod_{j=1}^{\ell}\big(1+A\,2^{-j/2}\big)\,\varepsilon_n^*=\varepsilon(\ell),
\]
and the induction is proven.

The theorem now follows since the piecewise linear interpolants satisfy the same error bound [see Neocleous and Portnoy (2008)]. $\square$

APPENDIX
Result 1.
Under the conditions for the theorems here, the coverage probability for the confidence interval (2.3) is $1-\alpha+O\big((\log n)\,n^{-2/3}\big)$, which is achieved at $h_n=c\sqrt{\log n}\,n^{-1/3}$ (where $c$ is a constant).

Sketch of proof.
Recall the notation of Remark 2 in Section 2. Using Theorem 1 and the quantile transform as described in the first steps of Theorem 2 (and not needing the dyadic expansion argument), it can be shown that there is a bivariate normal pair $(W,Z)$ such that
\[
\sqrt n\big(\hat\beta(\tau)-\beta(\tau)\big)=W+R_n,\qquad R_n=O_p\big(n^{-1/2}(\log n)^{3/2}\big),\tag{A.1}
\]
\[
\sqrt n\big(\hat\Delta(h_n)-\Delta(h_n)\big)=Z+R_n^*,\qquad R_n^*=O_p\big(n^{-1/2}(\log n)^{3/2}\big).
\]
Note that from the proofs of Theorems 1 and 2, the $O_p$ terms above are actually $O$ terms except with probability $n^{-d}$, where $d$ is an arbitrary fixed constant. The "almost sure" results above take $d>1$, but $d=1$ will suffice for the bounds on the coverage probability here.

Incorporating the approximation error in (A.1),
\[
\sqrt n(\hat\delta-\delta)=Z/h_n+R_n^*/h_n+O\big(n^{1/2}h_n^2\big).
\]
Now consider expanding $s_a(\delta)$. First, note that under the design conditions here, $s_a$ will be of exact order $n^{-1/2}$; specifically, if $X$ is replaced by $\sqrt n\,\tilde X$, all terms involving $\tilde X'\tilde X$ will remain bounded, and we may focus on $\sqrt n\,s_a(\delta)$. Note also that for $h_n=O(n^{-1/3})$, the terms in the expansion of $(\hat\delta-\delta)$ tend to zero [specifically, $1/(\sqrt n\,h_n)=O(n^{-1/6})$]. So the sparsity, $s_a(\delta)$, may be expanded in a Taylor series as follows:
\[
\sqrt n\,s_a(\hat\delta)=\sqrt n\,s_a(\delta)+b_1'(\hat\delta-\delta)+b_2(\hat\delta-\delta)^2+b_3(\hat\delta-\delta)^3+O(n^{-2/3})
\equiv\sqrt n\,s_a(\delta)+K,
\]
where $b_1$ is a (gradient) vector that can be defined in terms of $\tilde X$ and $\beta(\tau)$ (and its derivatives), $b_2$ is a quadratic function (of its vector argument) and $b_3$ is a cubic function. Note that under the design conditions, all the coefficients in $b_1$, $b_2$ and $b_3$ are bounded, and so it is not hard to show that all the terms in $K$ tend to zero as long as $h_n\sqrt n\to\infty$. Specifically, if $h_n$ is of order $n^{-1/3}$, then all the terms in $K$ tend to zero. Also, $R_n^*$ is within a $\log n$ factor of $O(n^{-1/2})$, and $h_n^2$ is even smaller. Finally, $Z$ is a difference of two quantiles separated by $2h_n$, and so $b_1'Z$ has variance proportional to $h_n$. Thus, $E\big(b_1'Z/(\sqrt n\,h_n)\big)^2=O(1/(nh_n))$. Thus, not only does $b_1'Z/(\sqrt n\,h_n)\to_p0$, but powers of this term greater than 2 will also be $O_p(n^{-1})$.

It follows that the coverage probability may be computed using only two terms of the Taylor series expansion for the normal c.d.f. (absorbing the constant $z_\alpha$ into $K$):
\begin{align*}
P\big\{\sqrt n\,a'(\hat\beta(\tau)-\beta(\tau))\le z_\alpha\sqrt n\,s_a(\hat\delta)\big\}
&=P\big\{a'(W+R_n)\le z_\alpha\sqrt n\,s_a(\delta)+K\big\}\\
&=E\,\Phi_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)+K-a'R_n\big)\\
&=E\big\{\Phi_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)\big)
+\varphi_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)\big)(K-a'R_n)\\
&\qquad+\tfrac12\varphi'_{a'W\mid Z}\big(z_\alpha\sqrt n\,s_a(\delta)\big)(K-a'R_n)^2+O\big((\log n)^3/n\big)\big\}\\
&\equiv1-\alpha+T_1+T_2+O\big((\log n)^3/n\big).
\end{align*}
Note that the (normal) conditional distribution of $W$ given $Z$ is straightforward to compute (using the usual asymptotic covariance matrix for quantiles): the conditional mean is a small constant (of the order of $h_n$) times $Z$, and the conditional variance is bounded.

Expanding the lower probability in the same way and subtracting provides some cancelation. The contribution of $R_n$ will cancel in the $T_1$ differences, and is negligible in subsequent terms since $R_n^2=O\big((\log n)^3/n\big)$. Similarly, the $R_n^*/(\sqrt n\,h_n)$ term will appear only in the $T_1$ difference, where it contributes a term that is $(\log n)^{3/2}$ times a term of order $1/(nh_n)$, and will also be negligible in subsequent terms. Also, the $h_n^2$ term will appear only in $T_1$, as higher powers will be negligible. The only remaining terms involve $Z/(\sqrt n\,h_n)$. For the first power (appearing in $T_1$), $EZ=0$. For the squared $Z$-terms in $T_2$, since $\mathrm{Var}(b_1'Z)$ is proportional to $h_n$, $E(b_1'Z)^2/(nh_n^2)=c_1/(nh_n)$, and all other terms involving $Z$ have smaller order.
Therefore, one can obtain the following error for the coverage probability: for some constants $c_1$ and $c_2$, the error is
\[
\frac{b_1'R_n^*}{\sqrt n\,h_n}+\frac{c_1}{nh_n}+c_2h_n^2
\]
(plus terms of smaller order). Since $R_n^*$ is of order nearly $n^{-1/2}$, the first two terms have nearly the same order. Using $b_1'R_n^*/(\sqrt n\,h_n)=O\big((\log n)^{3/2}/(nh_n)\big)$, it is straightforward to find the optimal $h_n$ to be a constant times $\sqrt{\log n}\,n^{-1/3}$, which bounds the error in the coverage probability by $O\big(\log n\cdot n^{-2/3}\big)$. $\square$
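The bandwidth trade-off in the last display can be checked numerically. The sketch below sets the unknown constants to 1 (an arbitrary normalization) and balances the two dominant terms, $(\log n)^{3/2}/(nh)$ and $h^2$; the exact minimizer of $a/h+h^2$ is $h=(a/2)^{1/3}$, so the optimal bandwidth scales like $\sqrt{\log n}\,n^{-1/3}$:

```python
import math

def coverage_error(h, n, c1=1.0, c2=1.0):
    """Leading terms of the coverage-error bound from the sketch:
    c1 * (log n)^{3/2} / (n h)  +  c2 * h^2  (constants normalized to 1)."""
    return c1 * math.log(n) ** 1.5 / (n * h) + c2 * h ** 2

def optimal_h(n, c1=1.0, c2=1.0):
    # minimizing a/h + c2*h^2 with a = c1*(log n)^{3/2}/n gives h = (a/(2 c2))^{1/3}
    a = c1 * math.log(n) ** 1.5 / n
    return (a / (2 * c2)) ** (1 / 3)

rates = []
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    h = optimal_h(n)
    rates.append(h / (math.sqrt(math.log(n)) * n ** (-1 / 3)))
print(rates)   # essentially constant: h scales like sqrt(log n) * n^(-1/3)
```

At this $h$ the bound itself is of order $h^2=\log n\cdot n^{-2/3}$, matching the rate stated in Result 1.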
Bernstein, S. N. (1964). On a modification of Chebyshev's inequality and of the error formula of Laplace. In Sobranie Sochineniĭ [originally published in Ann. Sci. Inst. Sav. Ukraine, Sect. Math. (1924)].
Daniels, H. E. (1987). Tail probability approximations. Internat. Statist. Rev.
De Angelis, D., Hall, P. and Young, G. A. (1993). Analytical and bootstrap approximations to estimator distributions in $L_1$ regression. J. Amer. Statist. Assoc.
Einmahl, U. (1989). Extensions of results of Komlós, Major, and Tusnády to the multivariate case. J. Multivariate Anal.
Gantmacher, F. R. (1960). Matrix Theory. Amer. Math. Soc., Providence, RI.
Gutenbrunner, C., Jurečková, J., Koenker, R. and Portnoy, S. (1993). Tests of linear hypotheses based on regression rank scores. J. Nonparametr. Stat.
Hall, P. and Sheather, S. J. (1988). On the distribution of a Studentized quantile. J. R. Stat. Soc. Ser. B Stat. Methodol.
He, X. and Hu, F. (2002). Markov chain marginal bootstrap. J. Amer. Statist. Assoc.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc.
Horowitz, J. L. (1998). Bootstrap methods for median regression models. Econometrica.
Jurečková, J. and Sen, P. K. (1996). Robust Statistical Procedures: Asymptotics and Interrelations. Wiley, New York. MR1387346
Knight, K. (2002). Comparing conditional quantile estimators: First and second order considerations. Technical report, Univ. Toronto.
Kocherginsky, M., He, X. and Mu, Y. (2005). Practical confidence intervals for regression quantiles. J. Comput. Graph. Statist.
Koenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge Univ. Press, Cambridge. MR2268657
Koenker, R. (2012). quantreg: Quantile regression. R package, Version 4.79. Available at cran.r-project.org.
Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica.
Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent RV's and the sample DF. I. Z. Wahrsch. Verw. Gebiete.
Komlós, J., Major, P. and Tusnády, G. (1976). An approximation of partial sums of independent RV's, and the sample DF. II. Z. Wahrsch. Verw. Gebiete.
Neocleous, T. and Portnoy, S. (2008). On monotonicity of regression quantile functions. Statist. Probab. Lett.
Parzen, M. I., Wei, L. J. and Ying, Z. (1994). A resampling method based on pivotal estimating functions. Biometrika.
Zhou, K. Q. and Portnoy, S. L. (1996). Direct use of regression quantiles to construct confidence sets in linear models.