Fast algorithms for the quantile regression process
Victor Chernozhukov · Iván Fernández-Val · Blaise Melly
Received: date / Accepted: date
Abstract
The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, the computation time is still non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for the estimation of a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step allows for a reduction of the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton-Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time without significant (if any) cost in the quality of the estimates. For instance, we divide by 100 the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations.
Keywords
Quantile regression · Quantile regression process · Preprocessing · One-step estimator · Bootstrap · Uniform inference
V. Chernozhukov
Department of Economics, Massachusetts Institute of Technology, Cambridge, USA
E-mail: [email protected]

I. Fernández-Val
Department of Economics, Boston University, Boston MA, USA
E-mail: [email protected]

B. Melly
Department of Economics, University of Bern, Bern, Switzerland
E-mail: [email protected]
1 Introduction

The idea of estimating regression parameters by minimizing the sum of absolute errors (i.e. median regression) predates the least squares method by about 50 years. Roger Joseph Boscovich suggested in 1760 to fit a straight line to observational data by minimizing the sum of absolute residuals under the restriction that the sum of residuals be zero. Laplace found in 1789 that the solution to this problem is a weighted median. Gauss later suggested the least squares criterion instead. For him, the choice of the loss function is an arbitrary convention and "it cannot be denied that Laplace's convention violates continuity and hence resists analytic treatment, while the results that my convention leads to are distinguished by their wonderful simplicity and generality." Thus, it was mostly due to the simplicity of the analysis and of the computation that least squares methods have dominated the statistical and econometric literature for two hundred years. (Among others, see Chapter 1 in Stigler (1986). See Koenker (2000) and Koenker (2017) for a more detailed historical account of the computation of median regression.)

Only the development of the simplex algorithm during the 20th century and its adaptation to median regression by Barrodale and Roberts (1974) made least absolute error methods applicable to multivariate regressions for problems of moderate size. Portnoy and Koenker (1997) developed an interior point algorithm and a preprocessing step that significantly reduce the computation time for large samples with many covariates. These algorithms are implemented in different programming languages and certainly explain the surge of applications of quantile regression methods during the last 20 years.

Despite these improvements, the computation time is still non-negligible in two cases: when we are interested in the whole quantile regression process and when we want to use the bootstrap for inference. Researchers are rarely interested in only one quantile regression but often estimate many quantile regressions. The most common motivation for using quantile regression is, indeed, to analyze heterogeneity. Some estimation and inferential procedures even require the preliminary estimation of the whole quantile regression process. For instance, Koenker and Portnoy (1987) integrate the trimmed quantile regression process to get conditional trimmed means. Koenker and Xiao (2002), Chernozhukov and Fernández-Val (2005) and Chernozhukov and Hansen (2006) test functional null hypotheses such as the absence of any effect of a covariate or the location-scale model, which requires estimating the whole quantile regression process. Machado and Mata (2005) and Chernozhukov et al. (2013) must compute the whole quantile regression process to estimate the conditional distribution function by inverting the estimated conditional quantile function. In a second step, they integrate the conditional distribution function over the distribution of the covariates to obtain unconditional distributions.

In addition, researchers often use the bootstrap to estimate the standard errors of quantile regression. Many simulation studies show that it often provides the best estimate of the variance of the point estimates. It also has the advantage of avoiding the (explicit) choice of a smoothing parameter, which is required for the analytical estimators of the variance.
Finally, it is difficult to avoid the bootstrap (or other simulation methods) to test some types of functional hypotheses.

In these two cases, when the number of observations is large, the computational burden of quantile regression may still be a reason not to use this method. In their survey of the decomposition literature, Fortin et al. (2011) mention the following main limitation of the quantile regression approach: "This decomposition method is computationally demanding, and becomes quite cumbersome for data sets numbering more than a few thousand observations. Bootstrapping quantile regressions for sizeable number of quantiles τ (100 would be a minimum) is computationally tedious with large data sets." Further improvements are clearly needed to enable more widespread applications of quantile regression estimators.

In this paper we suggest two new algorithms to estimate the quantile regression process. The first algorithm is a natural extension of the preprocessing idea of Portnoy and Koenker (1997). The intuition is fairly simple. The fitted quantile regression line interpolates k data points, where k is the number of regressors. Only the sign of the residuals of the other observations matters for the determination of the quantile regression coefficients. If one can guess the sign of some residuals, then these observations do not need to enter the optimization problem, thus reducing the effective sample size. When many quantile regressions are estimated, we can start with a conventional algorithm for the first quantile regression and then progressively climb over the conditional quantile function, using recursively the previous quantile regression as a guess for the next one. This algorithm provides numerically the same estimates as the traditional algorithms because we can check whether the guesses were correct and stop only if this is the case.

This algorithm seriously reduces the computation time when we estimate many quantile regressions. When these improvements are still insufficient, we suggest a new estimator for the quantile regression process. The idea consists in approximating the difference between two quantile regression coefficient vectors at two close quantile indexes using a first-order Taylor approximation. The one-step algorithm starts from one quantile regression estimated using one of the existing algorithms and then obtains sequentially all remaining quantile regressions using one-step approximations. This algorithm is extremely quick, but the estimates are numerically different from the estimates obtained using the other algorithms. Nevertheless, the one-step estimator is asymptotically equivalent to the standard quantile regression estimator if the distance between the quantile indexes decreases sufficiently fast with the sample size.

We also apply the preprocessing idea to compute the bootstrap estimates. When we bootstrap a quantile regression, we can use the sample estimates to guess the sign of the residuals in the bootstrap sample. With this algorithm, bootstrapping the quantile regression estimator is actually quicker than bootstrapping the OLS estimator. We cannot apply a similar approach to least squares estimators because of their global nature.
Instead of bootstrapping the quantile regression estimates, it is possible to bootstrap the score (or estimating equation) of the estimator. This approach amounts, in fact, to using the one-step estimator to compute the bootstrap estimate when we take the sample estimate as a preliminary guess. This inference procedure, which has been suggested for quantile regression by Chernozhukov and Hansen (2006) and Belloni et al. (2017), is extremely fast and can also be used to perform uniform inference. Its drawback is the necessity to choose a smoothing parameter to estimate the conditional density of the response variable given the covariates.

The simulations we report in Section 6 show that the preprocessing algorithm is about 30 times faster than Stata's built-in algorithm in large samples. In the empirical application, we estimate 91 quantile regressions with 2,194,021 observations and 14 regressors and bootstrap the estimates 100 times in about 30 minutes on a laptop. The same estimation with the built-in commands of Stata would take over two months. The codes in Stata and some rudimentary codes in R are available from the authors.

We organize the rest of the paper as follows. Section 2 briefly defines the quantile regression model and estimator, describes the existing algorithms and provides the limiting distribution of the estimator. In Section 3, we adapt the preprocessing step of Portnoy and Koenker (1997) to estimate the whole quantile regression process. In Section 4, we suggest the new one-step estimator. Section 5 uses the same strategies to develop fast algorithms for the bootstrap. Sections 6 and 7 provide the results of the simulations and of the application, respectively. Finally, in Section 8 we point out some directions of research that may be fruitful.
2 Quantile regression

This section gives a very brief introduction to the linear quantile regression model. For a more thorough discussion we recommend the book by Koenker (2005) and the recent Handbook of Quantile Regression (Koenker et al., 2017).

2.1 The quantile regression model

We are often interested in learning the effect of a k × 1 vector of covariates X on the distribution of a scalar response variable Y. Let Q_Y(τ|x) be the τ-th conditional quantile of Y given X = x, with 0 < τ < 1. The conditional quantile function of Y given X, τ ↦ Q_Y(τ|X), completely describes the dependence between Y and X. For computational convenience and simplicity of interpretation, we assume that the conditional quantile functions are linear in x:

Assumption 1 (Linearity) Q_Y(τ|x) = x′β(τ) for all τ ∈ T, where T is a closed subset of [ε, 1 − ε] for some 0 < ε < 1/2.

In this paper we focus on central quantile regressions and suggest inference procedures justified by asymptotic normality arguments. Therefore, we must exclude the tail regions. For extremal quantile regressions, Chernozhukov and Fernández-Val (2011) have suggested alternative inference procedures based on extreme value approximations. As a rule of thumb for continuous regressors, they suggest that normal inference performs satisfactorily if we set ε = 15·k/n.

Most existing sampling properties of the quantile regression estimator have been derived for continuous response variables. Beyond the technical reasons for this assumption, the linearity assumption for the conditional quantile functions is highly implausible for discrete or mixed discrete-continuous response variables. Therefore, we impose the following standard assumption:

Assumption 2 (Continuity)
The conditional density function f_Y(y|x) exists, is uniformly continuous in (y, x), and is bounded on the support of (Y, X).

For identification reasons, we require that the (density-weighted) covariates are not linearly dependent:
Assumption 3 (Rank condition)
The minimal eigenvalue of the Jacobian matrix J(τ) := E[f_Y(X′β(τ)|X) XX′] is bounded away from zero uniformly over τ ∈ T.

Finally, we must impose a condition on the distribution of the covariates:
Assumption 4 (Distribution of the covariates) E‖X‖^{2+ε} < ∞ for some ε > 0.

Assumptions 3 and 4 impose that the derivatives of the coefficient function τ ↦ β(τ) are bounded uniformly on T because, by simple algebra (e.g., the proof of Theorem 3 in Angrist et al. (2006)),

dβ(τ)/dτ = J(τ)^{-1} E(X).   (1)

Under the linearity and continuity assumptions, by the definition of the quantile function, the parameter β(τ) satisfies the following conditional moment restriction:

P(Y ≤ x′β(τ) | X = x) = τ.   (2)
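The algebra behind (1) is short and worth spelling out. Differentiating the identity Q_Y(τ|x) = x′β(τ) implied by Assumption 1 with respect to τ, and using that ∂Q_Y(τ|x)/∂τ = 1/f_Y(Q_Y(τ|x)|x), gives

x′ dβ(τ)/dτ = 1 / f_Y(x′β(τ)|x).

Multiplying both sides by f_Y(x′β(τ)|x)·x and taking expectations over X yields J(τ)·dβ(τ)/dτ = E(X), i.e., dβ(τ)/dτ = J(τ)^{-1}E(X), which is (1).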
It can be shown that β(τ) is the unique solution to the following optimization problem:

β(τ) = arg min_{b ∈ R^k} E[ρ_τ(Y − X′b)],   (3)

where ρ_τ is the check function defined as ρ_τ(u) = (τ − 1(u ≤ 0))·u and 1(·) is the indicator function. The objective function is globally convex and its first derivative with respect to b is −E[(τ − 1(Y ≤ X′b))X]. Thus, β(τ) solves the unconditional moment condition

M(τ, β(τ)) := E[(τ − 1(Y ≤ X′β(τ))) X] = 0,   (4)

which follows from the original conditional moment (2).

2.2 The quantile regression estimator

Let {y_i, x_i}_{i=1}^n be a random sample from (Y, X). The quantile regression estimator of Koenker and Bassett (1978) is the M-estimator that solves the sample analog of (3):

β̂(τ) ∈ arg min_{b ∈ R^k} Σ_{i=1}^n ρ_τ(y_i − x_i′b).   (5)

This estimator is not an exact Z-estimator because the check function is not differentiable at 0. However, for continuous response variables, this affects only the k observations for which y_i = x_i′β̂(τ). Thus, the remainder term vanishes asymptotically and the estimator can be interpreted as an approximate Z-estimator:

M̂(τ, β̂(τ)) := (1/n) Σ_{i=1}^n (τ − 1(y_i ≤ x_i′β̂(τ))) x_i = o_p(1/√n).   (6)

The minimization problem (5) that β̂(τ) solves can be written as a convex linear program (see Koenker and Bassett (1978)). This kind of problem can be solved relatively efficiently with some modifications of the simplex algorithm; see Barrodale and Roberts (1974), Koenker and D'Orey (1987) and Koenker and D'Orey (1994). The modified simplex algorithm performs extremely well for problems of moderate size but becomes relatively slow in larger samples: worst-case theoretical results indicate that the number of iterations required can increase exponentially with the sample size. For this reason, Portnoy and Koenker (1997) developed an interior point algorithm. Unlike the simplex, interior point algorithms start in the interior of the feasible region of the linear program and travel on a path towards the boundary, converging to the optimum. The inequality constraints are replaced by a barrier that penalizes points that are close to the boundary of the feasible set. Since this barrier idea was pioneered by Ragnar Frisch and each iteration corresponds to a Newton step, Portnoy and Koenker (1997) call their application of the interior point method to quantile regression the Newton-Frisch algorithm. Portnoy and Koenker (1997) have also suggested a preprocessing step that we will discuss in Section 3.
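As a concrete illustration, the short R sketch below evaluates the check-function criterion in (5) on simulated data and fits the same quantile regression with the three algorithms discussed in this section, using the quantreg package (methods "br", "fn" and "pfn"); the data-generating step and the sample size are purely illustrative choices.

    # Illustration of the quantile regression estimator (5) with the quantreg package.
    library(quantreg)

    set.seed(1)
    n <- 50000
    x <- rnorm(n)
    y <- 1 + x + (1 + 0.5 * abs(x)) * rnorm(n)      # illustrative location-scale model

    rho <- function(u, tau) (tau - (u <= 0)) * u    # check function rho_tau(u)
    objective <- function(b, tau) sum(rho(y - cbind(1, x) %*% b, tau))

    tau <- 0.75
    fit_br  <- rq(y ~ x, tau = tau, method = "br")  # Barrodale-Roberts simplex
    fit_fn  <- rq(y ~ x, tau = tau, method = "fn")  # Frisch-Newton interior point
    fit_pfn <- rq(y ~ x, tau = tau, method = "pfn") # Frisch-Newton with preprocessing
    # The three methods solve the same convex program and return the same coefficients.
    cbind(br = coef(fit_br), fn = coef(fit_fn), pfn = coef(fit_pfn))
    objective(coef(fit_fn), tau)                    # criterion value at the optimum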
2.3 Sampling properties

For the sake of completeness we provide distribution theory for the quantile regression coefficient process. Following Angrist et al. (2006), we do not impose Assumption 1, therefore allowing for the possibility of model misspecification.

Proposition 1 (Asymptotic distribution theory)
Under Assumptions 2 to 4, the quantile regression estimator defined in (5) is consistent for the parameter defined in (3) uniformly over T. Moreover, the quantile regression process J(·)√n(β̂(·) − β(·)) converges weakly to a zero-mean Gaussian process z(·) in ℓ^∞(T)^k, where ℓ^∞(T) is the set of bounded functions on T and z(·) is defined by its covariance function

Σ(τ, τ′) ≡ E[(τ − 1(Y ≤ X′β(τ))) (τ′ − 1(Y ≤ X′β(τ′))) XX′].

If the model is correctly specified, i.e. under Assumption 1, then Σ(τ, τ′) simplifies to (min(τ, τ′) − ττ′)·E[XX′].

Uniformly consistent estimators of J(τ) and Σ(τ, τ′) are useful for analytical inference and will be required to implement the one-step estimator and the score bootstrap. We use the sample analog of Σ(τ, τ′) and Powell's (1991) kernel estimator of J(τ):

Σ̂(τ, τ′) = (1/n) Σ_{i=1}^n (τ − 1(y_i ≤ x_i′β̂(τ))) (τ′ − 1(y_i ≤ x_i′β̂(τ′))) x_i x_i′,   (7)

Ĵ(τ) = (1/(n·h_n)) Σ_{i=1}^n K((y_i − x_i′β̂(τ))/h_n) x_i x_i′,   (8)

where K(·) is a kernel function and h_n is a bandwidth. We use the standard normal density as the kernel function and the Hall and Sheather (1988) bandwidth

h_n = n^{-1/3} · Φ^{-1}(1 − α/2)^{2/3} · [1.5 · φ(Φ^{-1}(τ))² / (2·Φ^{-1}(τ)² + 1)]^{1/3},   (9)

where φ(·) and Φ^{-1}(·) are the density and quantile functions of the standard normal distribution and α is the targeted level of the test. These estimators are uniformly consistent under an additional moment condition on ‖X‖. The pointwise variance can be consistently estimated by

Ĵ(τ)^{-1} Σ̂(τ, τ) Ĵ(τ)^{-1}.   (10)
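For completeness, a minimal R sketch of the estimators (7)-(10) follows; it assumes a design matrix X whose first column is the intercept, a response vector y, and estimated coefficient vectors at the relevant quantile indexes, and it uses the Gaussian kernel and the bandwidth (9). The function names are ours, not part of any package.

    # Sample analog (7), Powell kernel estimator (8) with the Hall-Sheather bandwidth (9),
    # and the pointwise variance (10). X: n x k design matrix; y: response vector.
    Sigma_hat <- function(X, y, beta1, beta2, tau1, tau2) {
      a <- tau1 - (y <= X %*% beta1)
      b <- tau2 - (y <= X %*% beta2)
      crossprod(X * as.numeric(a * b), X) / nrow(X)          # (1/n) sum a_i b_i x_i x_i'
    }

    hall_sheather <- function(tau, n, alpha = 0.05) {
      n^(-1/3) * qnorm(1 - alpha / 2)^(2/3) *
        (1.5 * dnorm(qnorm(tau))^2 / (2 * qnorm(tau)^2 + 1))^(1/3)
    }

    J_hat <- function(X, y, beta, tau, alpha = 0.05) {
      h <- hall_sheather(tau, nrow(X), alpha)
      w <- dnorm(as.numeric(y - X %*% beta) / h)             # Gaussian kernel weights
      crossprod(X * w, X) / (nrow(X) * h)
    }

    pointwise_var <- function(X, y, beta, tau) {
      J <- J_hat(X, y, beta, tau)
      S <- Sigma_hat(X, y, beta, beta, tau, tau)
      solve(J, t(solve(J, S)))                               # J^{-1} Sigma J^{-1}
    }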
3 Preprocessing

As noted in the introduction, the fitted quantile regression line interpolates k data points. It can easily be shown from the moment condition (6) that the quantile regression estimates are numerically identical if we change the values of the observations that are not interpolated, as long as they remain on the same side of the regression line. The sign of the residuals y_i − x_i′β̂(τ) is the only thing that matters in the determination of the estimates. This explains why the quantile regression estimator is influenced only by the local behavior of the conditional distribution of the response near the specified quantile and is robust to outliers in the response variable if we do not go too far into the tails of the distribution. (On the other hand, note that quantile regression is not robust to outliers in the x-direction.)

Portnoy and Koenker (1997) exploit this property to design a quicker algorithm. Suppose for the moment that we knew that a certain subset J_H of the observations have positive residuals and another subset J_L have negative residuals. Then the solution to the original problem (5) is exactly the same as the solution to the following revised problem:

min_{b ∈ R^k} Σ_{i ∉ (J_H ∪ J_L)} ρ_τ(y_i − x_i′b) + ρ_τ(y_L − x_L′b) + ρ_τ(y_H − x_H′b),   (11)

where x_G = Σ_{i ∈ J_G} x_i for G ∈ {H, L}, and y_L, y_H are chosen small and large enough, respectively, to ensure that the corresponding residuals remain negative and positive. Solving this new problem gives numerically the same estimates but is computationally cheaper because the effective sample size is reduced by the number of observations in J_H and J_L.

In order to implement this idea we need some preliminary information about the sign of some residuals. Portnoy and Koenker (1997) suggest using only a subsample to estimate an initial quantile regression that is then used to guess the sign of the residuals in the whole sample. More formally, their algorithm works as follows:

Algorithm 1 (Portnoy and Koenker (1997))

1. Solve the quantile regression problem (5) using only a subsample of size (k·n)^{2/3} from the original sample. This delivers a preliminary estimate β̃(τ).
2. Calculate the residuals r_i = y_i − x_i′β̃(τ) and z_i, a quickly computed conservative estimate of the standard error of r_i. Calculate the τ − M/(2n) and τ + M/(2n) quantiles of r_i/z_i. The observations below this first quantile are included in J_L; the observations above this second quantile are included in J_H; the M = m·(k·n)^{2/3} observations between these quantiles are kept for the next step. m is a parameter that can be chosen by the user; by default it is set to 0.8.
3. Solve the modified problem (11) and obtain β̂(τ).
4. Check the residual signs of the observations in J_L and J_H:
(a) If there are no bad signs (or the number of bad signs is below the number of allowed mispredicted signs), β̂(τ) is the solution.
(b) If there are fewer than 0.1·M bad signs: take the observations with mispredicted signs out of J_L and J_H and go back to step 3.
(c) If there are more than 0.1·M bad signs: go back to step 1 with a doubled subsample size.

Formal computational complexity results in Portnoy and Koenker (1997) indicate that the number of operations required by the simplex grows quadratically in n, that the interior point method grows (near-)linearly in n, and that preprocessing reduces the growth in n below the linear rate, up to a term of order nk coming from the residual sign checks.
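The R sketch below implements the core of the preprocessing step just described for a given preliminary coefficient vector: it collapses the observations with confidently predicted residual signs into the two globbed pseudo-observations of problem (11), solves the reduced problem with quantreg, and reinserts any observation whose sign was mispredicted. The function name, the crude residual-scale proxy z_i, the extreme pseudo-responses, and the fallback when too many signs are wrong (the paper's algorithms instead double m, respectively the subsample size, and retry) are simplifications for illustration.

    # Preprocessing with a given sign guess (core of Algorithms 1, 2 and 4), simplified.
    library(quantreg)

    qr_preprocess <- function(X, y, tau, beta_guess, M) {
      n <- nrow(X)
      r <- as.numeric(y - X %*% beta_guess)
      z <- rep(mad(r), n)                            # crude conservative scale proxy for r_i
      p <- pmin(pmax(c(tau - M / (2 * n), tau + M / (2 * n)), 0), 1)
      qs <- quantile(r / z, p)
      JL <- which(r / z < qs[1])                     # residuals predicted to be negative
      JH <- which(r / z > qs[2])                     # residuals predicted to be positive
      keep <- setdiff(seq_len(n), c(JL, JH))
      repeat {
        xL <- colSums(X[JL, , drop = FALSE]); yL <- -1e10   # extreme pseudo-responses keep
        xH <- colSums(X[JH, , drop = FALSE]); yH <-  1e10   # the globbed residual signs fixed
        Xr <- rbind(X[keep, , drop = FALSE], xL, xH)
        yr <- c(y[keep], yL, yH)
        beta <- rq.fit(Xr, yr, tau = tau, method = "fn")$coefficients
        res  <- as.numeric(y - X %*% beta)
        bad  <- c(JL[res[JL] > 0], JH[res[JH] < 0])         # mispredicted signs
        if (length(bad) == 0) return(beta)                  # all guesses confirmed: done
        if (length(bad) > 0.1 * M)                          # too many bad signs: give up and
          return(rq.fit(X, y, tau = tau, method = "fn")$coefficients)  # solve the full problem
        keep <- c(keep, bad)                                # reinsert bad observations, re-solve
        JL <- setdiff(JL, bad); JH <- setdiff(JH, bad)
      }
    }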
3.2 Preprocessing for the quantile regression process

In many cases, the researcher is interested in the quantile regression coefficients at several quantiles. This allows analyzing the heterogeneity of behavior, which is the main motivation for using quantile regression. For instance, it is common to estimate 99 quantile regressions, one at each percentile, and plot the estimated coefficients. Tests of functional hypotheses such as those suggested in Koenker and Xiao (2002), Chernozhukov and Fernández-Val (2005) and Chernozhukov and Hansen (2006) necessitate the estimation of a large number of quantile regressions. Some estimators also require the preliminary estimation of the whole quantile regression process, such as the estimators suggested in Koenker and Portnoy (1987), Machado and Mata (2005) and Chernozhukov et al. (2013).

In the population we can let the quantile index τ increase continuously from 0 to 1 to obtain a continuous quantile regression coefficient process. In finite samples, however, only a finite number of distinct quantile regressions exists. In the univariate case, there are obviously only n different sample quantiles. In the general multivariate case, the number of distinct solutions depends on the specific sample, but Portnoy (1991) showed that the number of distinct quantile regressions is of order O_p(n log n) when the number of covariates is fixed. Koenker and D'Orey (1987) provide an efficient parametric linear programming algorithm that computes the whole quantile regression process. (Parametric programming is a technique for investigating the effects of a change in a parameter of the objective function, here the quantile index τ.) Given β̂(τ), a single simplex pivot is required to obtain the next quantile regression estimate. This is a feasible solution when n and k are not too large, but it is not feasible to compute O_p(n log n) different quantile regressions in the type of applications that we want to cover. Therefore, we do not try to estimate the whole quantile regression process. Instead, we discretize the quantile regression process and estimate quantile regressions only on a grid of quantile indexes such as τ = 0.01, 0.02, ..., 0.98, 0.99.

In order to approximate the conditional distribution well enough, several estimators require that the mesh width of the grid converges to 0 at the n^{-1/2} rate or faster (see e.g. Chernozhukov et al. (2013)). Neocleous and Portnoy (2008) show that it is enough to estimate quantile regressions on a grid with a mesh width of order n^{-1/4} if we linearly interpolate between the estimated conditional quantiles and an additional smoothness condition is satisfied (the derivative of the conditional density must be bounded). Thus, interpolation may be used in a second step to improve the estimated conditional quantile function, but even in this case a significant number of quantile regressions must be estimated.

At the moment, when a researcher wants to estimate 99 quantile regressions, she must estimate them separately, and the computation time will be 99 times longer than the time needed for a single quantile regression. (It is actually possible to use the estimates from the previous quantile regression as starting values for the next quantile regression. These better starting values reduce the computing time and are, therefore, used by all our algorithms.) We build on the preprocessing idea of Portnoy and Koenker (1997) and suggest an algorithm that exploits recursively the quantile regressions that have already been estimated in order to estimate the next one. (In his comment on Portnoy and Koenker (1997), Thisted (1997) suggests this idea, which, to the best of our knowledge, has never been implemented.) Assume we want to estimate J different quantile regressions at the quantile indexes τ_1 < τ_2 < ... < τ_J.

Algorithm 2 (Preprocessing for the quantile regression process)
To initialize the algorithm, β̂(τ_1) is estimated using one of the traditional algorithms described above. Then, iteratively for j = 2, ..., J:
1. Use β̂(τ_{j−1}) as a preliminary estimate.
2. Calculate the residuals r_i = y_i − x_i′β̂(τ_{j−1}) and z_i, a quickly computed conservative estimate of the standard error of r_i. Calculate the τ_j − M/(2n) and τ_j + M/(2n) quantiles of r_i/z_i. The observations below this first quantile are included in J_L; the observations above this second quantile are included in J_H; the M = m·(k·n)^{1/2} observations between these quantiles are kept for the next step. m is a parameter that can be chosen by the user.
3. Solve the modified problem (11) and obtain β̂(τ_j).
4. Check the residual signs of the observations in J_L and J_H:
(a) If there are no bad signs (or the number of bad signs is below the number of allowed mispredicted signs), β̂(τ_j) is the solution.
(b) If there are fewer than 0.1·M bad signs: take the observations with mispredicted signs out of J_L and J_H and go back to step 3.
(c) If there are more than 0.1·M bad signs: go back to step 2 with a doubled m.

Compared to Algorithm 1, the preliminary estimate does not need to be computed because we can take the already computed β̂(τ_{j−1}). This will naturally provide a good guess of the sign of the residuals only if τ_j and τ_{j−1} are close. This can be formalized by assuming that √n(τ_j − τ_{j−1}) = O(1). When this condition is satisfied, √n(β̂(τ_j) − β̂(τ_{j−1})) = O_p(1) by the stochastic equicontinuity of the quantile regression coefficient process. This justifies keeping a smaller sample in step 2: while we kept a sample proportional to n^{2/3} in Algorithm 1, we can keep a sample proportional to n^{1/2} in Algorithm 2.

These two differences (no need for a preliminary estimate and a smaller sample size in the final regression) both imply that Algorithm 2 is faster than Algorithm 1 and should be preferred when a large number of quantile regressions is estimated. Finally, note that step 4 of both algorithms makes sure that the estimates are numerically equal (or very close if we allow for a few mispredicted signs of the residuals) to the estimates that we would obtain using the simplex or interior point algorithms. Thus, there is no statistical trade-off to consider when deciding which algorithm should be used; this decision can be based purely on the computing time of the algorithms. In the next section, we consider an even faster algorithm, but it will not provide numerically identical estimates. This new estimator will be only asymptotically equivalent to the traditional quantile regression estimator that solves the optimization problem (5).
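With the qr_preprocess() function sketched above, Algorithm 2 is essentially a loop over the grid that recycles the previous fit as the sign guess for the next quantile index; the data-generating step and the constant in M are illustrative choices.

    # Sketch of Algorithm 2: the quantile regression process on a grid of quantile indexes.
    library(quantreg)
    set.seed(2)
    n <- 10000
    x <- rnorm(n)
    X <- cbind(1, x)
    y <- 1 + x + (1 + 0.5 * abs(x)) * rnorm(n)      # illustrative location-scale model
    taus <- seq(0.01, 0.99, by = 0.01)
    M <- 2 * sqrt(ncol(X) * n)                      # sample kept is proportional to (k*n)^(1/2)

    beta_path <- matrix(NA_real_, length(taus), ncol(X))
    beta_path[1, ] <- rq.fit(X, y, tau = taus[1], method = "fn")$coefficients  # initial fit
    for (j in 2:length(taus)) {
      # the previous quantile regression provides the sign guess for the next one
      beta_path[j, ] <- qr_preprocess(X, y, taus[j], beta_path[j - 1, ], M)
    }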
4 One-step estimator

One-step estimators were introduced for their asymptotic efficiency by Le Cam (1956). A textbook treatment of this topic can be found in Section 5.7 of van der Vaart (1998). The one-step method builds on and improves a preliminary estimator. If this preliminary estimator is already √n-consistent, then the estimator that solves a linear approximation to the estimating equation is asymptotically equivalent to the estimator that solves the original estimating equation. In other words, when we start from a √n-consistent starting value, a single Newton-Raphson iteration is enough; further iterations do not improve the first-order asymptotic distribution.

In this section we suggest a new estimator for the quantile regression coefficient process. This estimator is asymptotically equivalent to the estimators presented in the two preceding sections but may differ in finite samples. The idea consists in starting from one quantile regression computed using a standard algorithm and then obtaining sequentially the remaining regression coefficients using a single Newton-Raphson iteration for each quantile regression. In other words, this is a one-step estimator that uses a previously estimated quantile regression at a similar quantile index τ as the preliminary estimator. (In the simulations and the application in Sections 6 and 7, we start from the median regression.)

Assume that we have already obtained β̂(τ), a √n-consistent estimator of β(τ), and we would like to obtain β̂(τ + ε) for a small ε. Formally, we assume that ε = O(n^{-1/2}) such that β̂(τ) is √n-consistent for β(τ + ε) because, by the triangle inequality,

‖β̂(τ) − β(τ + ε)‖ ≤ ‖β̂(τ) − β(τ)‖ + sup_{τ ≤ τ′ ≤ τ+ε} ‖dβ(τ′)/dτ‖ · ε = O_P(n^{-1/2}),

where the last equality follows from (1) together with Assumptions 3 and 4. Then, by Theorem 5.45 in van der Vaart (1998), the one-step estimator

β̂(τ + ε) = β̂(τ) + Ĵ(τ)^{-1} M̂(τ + ε, β̂(τ))

has the same first-order asymptotic distribution as the quantile regression estimator of β(τ + ε). Here we use that

‖Ĵ(τ) − J(τ + ε)‖ ≤ ‖Ĵ(τ) − J(τ)‖ + ‖J(τ) − J(τ + ε)‖ →_P 0,

by the triangle inequality, Assumptions 2 and 4, standard regularity conditions for Ĵ(τ) →_P J(τ), and ε → 0.
Note that the previous argument holds for any τ, τ + ε ∈ T such that ε = O(n^{-1/2}).

The one-step estimator corresponds to a single Newton-Raphson iteration. It is possible to use the resulting value as the new starting value for a second iteration and so on, but the first-order asymptotic distribution does not change. If we iterate until convergence, then

Ĵ(τ)^{-1} M̂(τ + ε, β̂(τ + ε)) = 0.

Since Ĵ(τ) is asymptotically of full rank by Assumption 3, this implies that the moment condition (6) must be satisfied at τ + ε, such that we obtain numerically the same values as the traditional quantile regression estimator of β(τ + ε). This property, together with the fact that the quantile regression estimate is constant over a small range of quantile indexes, also implies that we can get numerically identical estimates to the traditional quantile regression estimator by choosing ε small enough. This is, however, not the objective, because the computation time of such a procedure would necessarily be higher than that of the parametric linear programming algorithm.

(Schmidt and Zhu (2016) have suggested a different iterative estimation strategy. They also start from one quantile regression, but they add or subtract sums of nonnegative functions to it to calculate other quantiles. Their procedure has a different objective, namely monotonicity of the estimated conditional quantile function, and their estimator is not asymptotically equivalent to the traditional quantile regression estimator.)

The algorithm is summarized below:
Algorithm 3 (One-step estimator for the quantile regression process)
To initialize the algorithm, β̂(τ_1) is estimated using one of the traditional algorithms described above. Then, iteratively for j = 2, ..., J:
1. Use β̂(τ_{j−1}) as a preliminary estimate.
2. Estimate the Jacobian matrix with Powell's (1991) estimator and the Hall and Sheather (1988) bandwidth to obtain Ĵ(τ_{j−1}).
3. Update the quantile regression coefficient:

β̂(τ_j) = β̂(τ_{j−1}) + Ĵ(τ_{j−1})^{-1} (1/n) Σ_{i=1}^n (τ_j − 1(y_i ≤ x_i′β̂(τ_{j−1}))) x_i.   (12)
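A minimal R sketch of Algorithm 3 follows. It reuses J_hat() from the sketch in Section 2.3 for step 2 and the toy data (X, y, taus) of the previous sketch, and it climbs up and down from the median as done in Sections 6 and 7; all helper names are ours.

    # One-step update (12) and the resulting estimator of the whole process.
    one_step_update <- function(X, y, beta_prev, tau_prev, tau_new) {
      J  <- J_hat(X, y, beta_prev, tau_prev)                          # Jacobian at tau_{j-1}
      sc <- crossprod(X, tau_new - (y <= X %*% beta_prev)) / nrow(X)  # M_hat(tau_j, beta_prev)
      beta_prev + solve(J, sc)                                        # single Newton-Raphson step
    }

    beta_os <- matrix(NA_real_, length(taus), ncol(X))
    j0 <- which.min(abs(taus - 0.5))
    beta_os[j0, ] <- rq.fit(X, y, tau = taus[j0], method = "fn")$coefficients  # median regression
    for (j in (j0 + 1):length(taus))                                  # climb up from the median
      beta_os[j, ] <- one_step_update(X, y, beta_os[j - 1, ], taus[j - 1], taus[j])
    for (j in (j0 - 1):1)                                             # and down from the median
      beta_os[j, ] <- one_step_update(X, y, beta_os[j + 1, ], taus[j + 1], taus[j])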
The one-step estimator is much faster to compute when we do not choose a too fine grid of quantiles. Our simulations reported in Section 6 show that a grid with ε = 0.01 works well even for large samples. This result naturally depends on the data generating process: a finer grid may be needed when the conditional density of Y changes quickly (i.e. when the first-order approximation error is large). In practice, it is possible to estimate a few quantile regressions with the traditional estimator and check that the difference between both estimators is small.

5 Bootstrap

Researchers have often found that the bootstrap is the most reliable method for both pointwise and uniform inference. Naturally, this method is also the most demanding computationally. The time needed to compute the estimates can be a binding constraint when researchers are considering bootstrapping strategies. Fortunately, the same approaches that we have presented in the preceding sections (preprocessing and linear approximations) can also fruitfully reduce the computing time of the bootstrap.

5.1 Preprocessing for the bootstrap

A very simple modification of the preprocessing algorithm of Portnoy and Koenker (1997) leads to significant improvements. The advantage when we compute a quantile regression for a bootstrap sample is that we can use the already computed estimate in the whole sample to guess the sign of the residuals. This means that we can skip step 1 of the preprocessing Algorithm 1. In addition, this preliminary estimate is more precise because it was computed using a sample of size n instead of (k·n)^{2/3} in the original preprocessing algorithm. Thus, we only need to keep a smaller number of observations in step 2. We choose M = m·(k·n)^{1/2}, where m is set by default to 3 but can be modified by the user. The multiplying constant 3 was chosen because it worked well in our simulations. We do not have a theoretical justification for this choice, and further improvements should be possible. In particular, it should be possible to adjust this constant during the process (that is, after the estimates in a few bootstrap samples have been computed) by increasing it when the sign of too many residuals is mispredicted in step 4 or decreasing it when the sign of the residuals is almost never mispredicted. (A similar idea could be applied to adjust the constant m in Algorithm 2. The additional difficulty is that the optimal constant probably depends on the quantile index τ, which is not the case for the bootstrap.)

We now define the algorithm formally for the empirical bootstrap, but a similar algorithm can be applied to all types of exchangeable bootstrap.
Algorithm 4 (Preprocessing for the bootstrap)
For each bootstrap iteration b = 1, ..., B, denote by {y*_{bi}, x*_{bi}}_{i=1}^n the bootstrap sample:
1. Use β̂(τ) as a preliminary estimate.
2. Calculate the residuals r*_{bi} = y*_{bi} − x*_{bi}′β̂(τ) and z*_{bi}, a quickly computed conservative estimate of the standard error of r*_{bi}. Calculate the τ − M/(2n) and τ + M/(2n) quantiles of r*_{bi}/z*_{bi}. The observations below this first quantile are included in J_L; the observations above this second quantile are included in J_H; the M = m·(k·n)^{1/2} observations between these quantiles are kept for the next step. m is a parameter that can be chosen by the user; by default it is set to 3.
3. Solve the modified problem (11) for the sample {y*_{bi}, x*_{bi}}_{i=1}^n and obtain β̂*_b(τ).
4. Check the residual signs of the observations in J_L and J_H:
(a) If there are no bad signs (or the number of bad signs is below the number of allowed mispredicted signs), β̂*_b(τ) is the solution.
(b) If there are fewer than 0.1·M bad signs: take the observations with mispredicted signs out of J_L and J_H and go back to step 3.
(c) If there are more than 0.1·M bad signs: go back to step 2 with a doubled m.

With this algorithm, bootstrapping a single quantile regression becomes faster than bootstrapping the least squares estimator. The preprocessing strategy does not apply to least squares estimators because of the global nature of these estimators.

When the whole quantile regression process is bootstrapped, either Algorithm 2 or Algorithm 4 can be applied. (Algorithm 2 can be slightly improved by using preprocessing with β̂(τ) as a preliminary estimate of β̂*_b(τ) instead of computing it completely from scratch.) In our simulations we found that the computing times were similar for these two algorithms; in our implementation, Algorithm 4 is used in this case. Even shorter computing times can be obtained either by using the one-step estimator in each bootstrap sample or by bootstrapping the linear representation of the estimator, which we present in the next subsection. These two algorithms are not numerically identical to the traditional quantile regression estimator and require the choice of a smoothing parameter to estimate the Jacobian matrix.

5.2 Score resampling (or one-step bootstrap)

Even with the preprocessing of Algorithm 4, we must recompute the estimates in each bootstrap draw, which is naturally computationally demanding. Instead of resampling the covariates and the response to compute the coefficients, we can resample the asymptotic linear (Bahadur) representation of the estimator. Belloni et al. (2017) suggest and prove the validity of the multiplier score bootstrap. For b = 1, ..., B we obtain the corresponding bootstrap draw of β̂(τ) via

β̂*_b(τ) = β̂(τ) + Ĵ(τ)^{-1} (1/n) Σ_{i=1}^n ξ*_{bi} (τ − 1(y_i ≤ x_i′β̂(τ))) x_i,   (13)

where (ξ*_{bi})_{i=1}^n are iid random variables that are independent of the data and must satisfy E[ξ*_{bi}] = 0 and Var(ξ*_{bi}) = 1. We have implemented this score multiplier bootstrap for three different distributions of ξ*_{bi}: (i) ξ*_{bi} = E_i − 1, where E_i is a standard exponential random variable, which corresponds to the Bayesian bootstrap (see e.g. Hahn (1997)); (ii) ξ*_{bi} ∼ N(0, 1); (iii) ξ*_{bi} = N_{1i}/√2 + (N_{2i}² − 1)/2, where N_{1i} and N_{2i} are mutually independent standard normal random variables, which corresponds to the wild bootstrap (see e.g. Mammen et al. (1993)). In addition to the multiplier bootstrap, we have also implemented the score bootstrap suggested in Chernozhukov and Hansen (2006), which corresponds to (13) with multinomial weights. Since these different distributions give very similar results, we report only the performance of the wild score bootstrap in the simulations in Section 6.
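The following R sketch draws from the score multiplier bootstrap (13) with the wild weights in (iii); it reuses J_hat() from the sketch in Section 2.3, and the function name and default number of draws are illustrative.

    # Score multiplier bootstrap (13) with wild (Mammen-type) weights.
    score_bootstrap <- function(X, y, beta, tau, B = 100) {
      n <- nrow(X)
      J <- J_hat(X, y, beta, tau)                        # Jacobian, estimated once
      s <- as.numeric(tau - (y <= X %*% beta))           # score components tau - 1(y <= x'beta)
      draws <- matrix(NA_real_, B, ncol(X))
      for (b in 1:B) {
        xi <- rnorm(n) / sqrt(2) + (rnorm(n)^2 - 1) / 2  # E[xi] = 0, Var(xi) = 1
        sc <- crossprod(X, xi * s) / n                   # (1/n) sum_i xi_i s_i x_i
        draws[b, ] <- beta + solve(J, sc)                # bootstrap draw of beta(tau)
      }
      draws
    }
    # Pointwise standard errors: apply(score_bootstrap(X, y, beta_hat, 0.5), 2, sd)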
The score resampling procedure can be interpreted as a one-step estimator of the bootstrap value where we use the sample estimate as the preliminary value. This interpretation can be seen by comparing equation (12) with equation (13). Kline and Santos (2012) notice this connection between the score bootstrap and one-step estimators in their Remark 3.1.

While the score bootstrap is much faster to compute than the exchangeable bootstrap, it requires the preliminary estimation of the matrix J(τ). This matrix is a function of f_Y(X′β(τ)|X), which is relatively difficult to estimate and necessitates the choice of a smoothing parameter. Thus, the multiplier bootstrap has no advantage in this respect compared to analytical estimators of the pointwise standard errors. On the other hand, the score bootstrap can be used to test functional hypotheses within relatively limited computing time (see Chernozhukov and Hansen (2006) for many examples).

To summarize, for bootstrapping the quantile regression process we can recompute the estimate in each bootstrap draw and at each quantile using preprocessing as introduced in Section 5.1. Alternatively, we can use a first-order approximation starting from another quantile regression in the same bootstrap sample (one-step estimator) or we can use a first-order approximation starting from the sample estimate at the same quantile (score resampling). The simulations in the following section compare these different approaches both in terms of computing time and in terms of the quality of the inference.

6 Simulations

The results in Sections 6 and 7 were obtained using Stata 14.2. We compare the official Stata command (qreg), which implements a version of the simplex algorithm according to the documentation, with our own implementation of the other algorithms directly in Stata. The codes and the data to replicate these results are available from the authors. The reference machine used for all the simulations is an AMD Ryzen Threadripper 1950X with 16 cores at 3.4 GHz and 32 GB of RAM (but for each simulation each algorithm exploits only one processor, without any parallelism).

6.1 Computation times

We use empirical data to simulate realistic samples. We calibrate the data generating processes to match many characteristics of a Mincerian wage regression. In particular, we draw covariates (such as education in years and as a set of indicator variables, a polynomial in experience, regional indicator variables, etc.) from the Current Population Survey data set from 2013. We consider different sets of regressors with dimensions ranging from 8 to 76. Then, we simulate the response variable by drawing observations from the estimated conditional log wage distribution.

Table 1 provides the average computing times (over 200 replications) for three different cases. In the first panel, we compare different algorithms to estimate a single quantile regression. (We provide the results for the median regression, but the ranking was similar at other quantile indexes.) It appears that the interior point algorithm (denoted by fn in the table) is always faster than the simplex algorithm as implemented by Stata. Preprocessing (denoted by pfn and pqreg) becomes an attractive alternative when the number of observations is 'large enough', where 'large enough' corresponds to 5,000 observations when there are 8 regressors but 500,000 observations when there are 76 regressors. In this last configuration (n = 500,000 and k = 76), the computing time needed by the interior point algorithm with the preprocessing step is 35 times lower than the computing time of Stata's built-in command. The improvements in this first panel are only due to the implementation of the algorithms suggested by Portnoy and Koenker (1997); all these estimators provide numerically identical estimates.

The second panel of Table 1 compares the algorithms when the objective is to estimate 99 different quantile regressions, one at each percentile. For all sample sizes considered, the algorithm with the preprocessing step suggested in Section 3.2 computes the same estimates at a fraction of the computing time of the official Stata command. When n = 50,000, it needs about one minute instead of the 18 minutes required by the built-in command, and the one-step estimator further reduces the computing time to below ten seconds.
Table 1. Comparison of the algorithms
                                     Number of observations
Algorithm                       500    1000    5000     10k     50k    100k    500k

Panel 1: Estimation of a single quantile regression
A. Computing time in seconds for τ = 0.5, k = 8
qreg (simplex)                 0.05    0.06    0.13    0.24    1.35    2.92   16.84
fn                             0.02    0.03    0.12    0.21    0.95    1.98   13.04
pqreg                          0.02    0.02    0.07    0.12    0.62    1.23    6.36
pfn                            0.04    0.06    0.10    0.16    0.61    1.08    5.08

B. Computing time in seconds for τ = 0.5, k = 76
qreg (simplex)                 0.26    0.43    2.14    4.80   37.75   85.69   622.4
fn                             0.06    0.08    0.25    0.41    1.93    3.72   20.78
pqreg                          0.17    0.23    0.80    2.51   11.36   22.08   89.44
pfn                            0.09    0.11    0.30    0.73    2.47    4.30   17.61

Panel 2: Estimation of the whole QR process (99 quantile regressions, k = 20)
Computing time in seconds
qreg (simplex)                 5.93    9.01   48.86     119   1,084
preprocessing, qreg            0.64    0.85    2.93    6.96   57.70
preprocessing, fn              2.02    1.90    3.95    6.55   30.67
one-step estimator             0.40    0.45    1.00    1.77    8.29

Panel 3: Bootstrap of the median regression (50 bootstrap replications, k = 20)
Computing time in seconds
bsqreg (simplex)               2.63    3.14    8.19   15.06   99.89   163.3
preprocessing                  0.42    0.63    1.89    3.50   13.55   18.52
multiplier                     0.08    0.10    0.25    0.60    1.36    1.90

Note: Average computing times over 200 replications.

Since the one-step estimator is only asymptotically equivalent to the other algorithms, we also measure its performance relative to the traditional quantile regression estimator in Subsection 6.2.

In the third panel of Table 1 we compare the computing time needed to bootstrap the median regression 50 times with the official Stata command, with the preprocessing step introduced in Section 5.1, and with the score bootstrap described in Section 5.2. The results show that the preprocessing step divides the computing time by about 4 for small sample sizes and by 9 for the largest sample size that we have considered. Using the multiplier bootstrap divides the computing time by a further factor of 10. However, this type of bootstrap is not numerically identical to the previous ones; the simulations in Subsections 6.3 and 6.4 show that it tends to perform slightly worse in small samples.

6.2 Performance of the one-step estimator

Table 2 and Figure 1 measure the performance of the one-step estimator. We use the same data generating process as in the previous subsection with k = 20 regressors. We perform 10,000 replications to get precise results.
We first note that the one-step algorithm sometimes does not converge. The reason is that Ĵ(τ) is occasionally singular or close to singular in small samples. As a consequence, the one-step estimate β̂(τ + ε), which is linear in Ĵ(τ)^{-1}, takes a very large value. Once the estimated quantile regression coefficients at one quantile index are very far away from the true values, the Jacobian Ĵ(τ) and the coefficients at the remaining quantile indices will also be very far from the true values. It would be possible to detect convergence issues by checking that the moment conditions are approximately satisfied or by estimating a few quantile regressions with a traditional algorithm and checking that the estimates are close. We have not yet implemented this possibility because the convergence problem appears only in small samples.

We estimate the quantile regression process at the 99 percentiles τ = 0.01, 0.02, ..., 0.99.
Thus, in principle, there are 1980 different parameters of interest and we could analyze each of them separately. Instead, we summarize the results and report averages (over quantile indices and regressors) of several measures of performance in Table 2. In Figure 1 we plot the main measure separately for each quantile (but averaged over the regressors, since the results are similar across regressors). The second part of Table 2 reports the standard measures of the performance of an estimator: squared (mean) bias, variance, and mean squared error (MSE). Since these measures based on squared errors are very sensitive to outliers, in the third part of the table we also report measures based on absolute deviations: the absolute median bias, the median absolute deviation (MAD) around the median, and the median absolute error (MAE) around the true value.

The bias of the one-step estimator is slightly larger than the bias of the traditional estimator in small and moderate sample sizes, but it contributes only marginally to the MSE and MAE because it is small. When n = 500, the MAE (MSE) of the one-step estimator is on average 3% (8.7%) larger than that of the traditional estimator.
These differences decrease quickly with the sample size and are negligible from about 5,000 to 10,000 observations. Thus, the one-step estimator performs well exactly when we need it, i.e. when the sample size is large and computing the traditional quantile regression estimator may take too much time.

In the last column of Table 2, we can assess the role of ε, the distance between quantile indices. In the first 6 columns, we use ε = 0.01
and estimate 99 quantile regressions for τ = 0.01, 0.02, ..., 0.99. (To make the estimates comparable across quantiles and regressors, we normalize them such that they have unit variance in the specification with n = 50,000.)
Table 2. Performance of the one-step estimator
Number of obs.                          500     1000     5000      10k      50k      50k
Quantile step ε for the 1-step est.    0.01     0.01     0.01     0.01     0.01    0.001

Convergence of the one-step estimator:
proportion converged                  0.581    0.754    0.999    1.000    1.000    1.000

Measures based on squared errors (averages over 99 quantiles and 20 coefficients):
squared bias, QR                      0.383    0.174    0.055    0.037    0.020
squared bias, 1-step                  0.364    0.178    0.070    0.054    0.040    0.019
variance, QR                          141.6    62.68    10.70    5.182    1.000
variance, 1-step                      156.9    66.99    10.76    5.119    0.996    0.991
MSE, QR                               142.0    62.85    10.76    5.219    1.020
MSE, 1-step                           157.2    67.17    10.83    5.173    1.035    1.010
relative MSE of 1-step                1.087    1.052    1.001    0.988    1.015    0.997

Measures based on absolute errors (averages over 99 quantiles and 20 coefficients):
absolute median bias, QR              0.416    0.281    0.166    0.138    0.100
absolute median bias, 1-step          0.476    0.317    0.201    0.175    0.151    0.100
MAD around the median, QR             7.719    5.190    2.173    1.518    0.666
MAD around the median, 1-step         8.004    5.290    2.161    1.507    0.669    0.665
MAE, QR                               7.731    5.197    2.177    1.523    0.672
MAE, 1-step                           8.021    5.303    2.172    1.518    0.685    0.671
relative MAE of 1-step                1.030    1.015    0.995    0.996    1.019    1.001

Note: Statistics computed over 10,000 replications.
To satisfy the conditions required to prove the asymptotic equivalence of the one-step and the traditional estimators, we should decrease ε when the sample size increases. The fixed difference between quantile indices does not seem to prevent the relative MAE and MSE of the one-step estimator from converging to 1 up to n = 10,000. When n = 50,000, however,
we see a (slight) increase in the relative MSE and MAE of the one-step estimator. Therefore, in the last column of the table, we reduce the quantile step to ε = 0.001
and we see that the relative MAE and MSE go back to 1, as predicted by the theory. Of course, the computing time increases when the quantile step decreases, such that the researcher may prefer to pay a (small) price in terms of efficiency to compute the estimates more quickly.

Figure 1 plots the relative MAE of the one-step estimator as a function of the quantile index. Remember that we initialize the algorithm at the median, such that both estimators are numerically identical at that quantile. With 500 or 1,000 observations,
we nevertheless observe the largest relative MAE at quantile indices close to the median. On the other hand, the one-step estimator is more efficient than the traditional estimator in the tails of the distribution. Given that extreme quantile regressions are notoriously hard to estimate, we speculate that the one-step estimator may be able to reduce the estimation error by using implicitly some information from less extreme quantile regressions, similar to the extrapolation estimators for extreme quantile regression of Wang et al. (2012) and He et al. (2016). When the sample size increases, the curve becomes flatter and the relative MAE is very close to 1 at all quantiles between the first and the ninth decile.
Fig. 1. Relative MAE as a function of the quantile and the number of observations

6.3 Pointwise inference

There exist several inference methods for quantile regression with very different computational burdens. In this subsection we compare the methods for pointwise inference and in the next subsection the methods for uniform inference. In both cases we simulate data from the same data generating process as Hagemann (2017):

y_i = x_i + (0.… x_i) · u_i,   (14)

where x_i ∼ N(0, 1)
and u_i ∼ N(0, …) is independent of x_i. We test the correct null hypothesis about the value of the coefficient on x_i at a single quantile index. The intended statistical level of the tests is 5%. We consider separately two quantile indexes, τ = 0.5 and τ = 0.85.

Table 3. Pointwise inference
Median regression 0.85 quantile regression
We compare the following methods for inference: (i) the kernel estimator of the pointwise variance (10), which can be computed very quickly; (ii) the empirical bootstrap of the quantile regression coefficients, which solves the full optimization problem (5) and is the most demanding in terms of computation time; (iii) the empirical bootstrap of the one-step estimator (Algorithm 3), which uses a linear approximation in the quantile direction but not in the bootstrap direction; (iv) the score multiplier bootstrap based on the quantile regression estimator, which uses a linear approximation in the bootstrap direction but not in the quantile direction; (v) the score multiplier bootstrap based on the one-step estimator, which uses two linear approximations and is the fastest bootstrap implementation. Note that we initialize the one-step estimator at the median, such that there is no difference at the median but there can be a difference at the
0.85 quantile between the one-step and the original quantile regression estimators.

In small samples all methods over-reject the correct null hypothesis, but they all perform satisfactorily in large samples, with empirical sizes that are very close to the theoretical size. The empirical bootstrap exhibits the lowest size distortion, which may warrant its higher computing burden. The analytic kernel-based method and the score multiplier bootstrap perform very similarly; this is not a surprise, because they should provide the same standard errors when the number of bootstrap replications goes to infinity. Thus, there is no reason to use the score bootstrap when the goal is only to perform pointwise inference. The tests based on the one-step estimator display a poor performance in very small samples. This is simply a consequence of the poor quality of the point estimates in very small samples that was shown in Table 2. As for the point estimates, the performance improves quickly and becomes almost as good as that of the tests based on the original quantile regression estimator.
Table 4. Uniform inference
                    Kolmogorov-Smirnov        Cramér-von Mises

6.4 Uniform inference

We now test the functional null hypotheses that the coefficient on X is uniformly equal to 1 (this is true) and that the coefficient on X is uniformly equal to 0 (this is false). We test these hypotheses with Kolmogorov-Smirnov statistics (the supremum of the deviations from the null hypothesis over the quantile range T) and Cramér-von Mises statistics (the average deviation from the null hypothesis over the quantile range T), both with Anderson-Darling weights. We estimate the critical values using either the empirical bootstrap or the score bootstrap. (See Chernozhukov and Fernández-Val (2005), Angrist et al. (2006), Chernozhukov and Hansen (2006) and Belloni et al. (2017) for more details and proofs of the validity of these procedures.) We estimate the discretized quantile regression process on a grid of equally spaced quantile indexes. The results are reported in Table 4; the good performance of these procedures is probably due in part to the small number of regressors (k = 3) and to the smoothness of the conditional distribution of the response variable.
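For concreteness, the short R sketch below computes Kolmogorov-Smirnov and Cramér-von Mises type statistics for the functional hypothesis that one coefficient equals a constant b0 over the whole grid, with a simple Studentization playing the role of the weighting; it is a stylized version of the tests described above, not the exact statistics used in the paper.

    # Stylized KS and CvM statistics for H0: beta_j(tau) = b0 for all tau in the grid.
    # beta_path, se_path: matrices of estimates and pointwise standard errors
    # (rows = quantile indexes); j: column of the coefficient being tested.
    functional_stats <- function(beta_path, se_path, j, b0) {
      dev <- abs(beta_path[, j] - b0) / se_path[, j]   # weighted deviations from the null
      c(KS = max(dev), CvM = mean(dev^2))
    }
    # Critical values are obtained by applying the same functionals to bootstrap draws
    # of the process centered at the estimates (empirical or score bootstrap).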
and perform 100 bootstrap replications. Given the large number of observations, regressors, quantile regressions and bootstrap replications, we use the fastest procedure: the one-step quantile regression estimator combined with the score multiplier bootstrap. The computation of all these quantile regressions and bootstrap simulations took about 30 minutes on a 4-core processor clocked at 3.7 GHz, i.e., a relatively standard notebook. With the built-in Stata commands, the computation of the same quantile regressions and bootstrap would take more than two months. (Due to computational limitations, Abrevaya (2001) and Koenker and Hallock (2001) used only the June subsample to estimate 5 and 15 different quantile regressions, respectively, and they did not bootstrap their results; of course, computers have become more powerful in the meantime.) The new algorithms clearly open new opportunities for quantile regression.

Figures 2 and 3 present the results for this example. For each coefficient, the line shows the point estimates, the dark grey area shows the 95% pointwise confidence intervals, and the light grey area shows the 95% uniform confidence bands, which are wider by construction; see the supplementary appendix SA to Chernozhukov et al. (2013) for the construction and the validity of the uniform bands. Any functional null hypothesis that lies outside of the uniform confidence band at even a single quantile can be rejected at the 5% level. For instance, none of the bands contains the value 0 at all quantiles. We can therefore reject, for all included covariates, the null hypothesis that the regressor is irrelevant in explaining birth weight. For all variables we can also reject the null hypothesis that the quantile regression coefficient function is constant (the location-shift model), because no constant value lies inside the uniform bands at all quantiles (the largest p-value is 0.02, for high school). On the other hand, we cannot reject the location-scale null hypothesis for some covariates, because there exists a linear function of the quantile index that is covered by the band. This is the case, for instance, for all variables measuring the education of the mother.
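To make the band construction concrete, the following sketch shows one standard way, in the spirit of the procedure referenced above (but not the authors' exact implementation), to turn bootstrap draws of a quantile regression coefficient evaluated on the quantile grid into pointwise intervals and uniform bands: the uniform band rescales the pointwise standard errors by a bootstrap critical value for the maximal absolute t-statistic over the grid. The function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def pointwise_and_uniform_bands(beta_hat, beta_boot, alpha=0.05):
    """Confidence bands for one coefficient of the quantile regression process.

    beta_hat  : (T,)   point estimates on the quantile grid tau_1, ..., tau_T
    beta_boot : (B, T) bootstrap (or score-bootstrap) draws of the same coefficient
    Returns (pointwise_lo, pointwise_hi), (uniform_lo, uniform_hi) at level 1 - alpha.
    """
    # Standard error at each quantile index, estimated from the bootstrap draws
    se = beta_boot.std(axis=0, ddof=1)

    # Pointwise bands: normal critical value applied quantile index by quantile index
    z = norm.ppf(1 - alpha / 2)
    pw = (beta_hat - z * se, beta_hat + z * se)

    # Uniform bands: critical value is the (1 - alpha) bootstrap quantile of the
    # maximal absolute t-statistic over the whole grid, so the band covers the
    # entire coefficient function with probability approximately 1 - alpha
    t_max = np.max(np.abs(beta_boot - beta_hat) / se, axis=1)
    c = np.quantile(t_max, 1 - alpha)
    un = (beta_hat - c * se, beta_hat + c * se)
    return pw, un
```

Applied coefficient by coefficient, this reproduces the distinction drawn above: a functional null hypothesis is rejected at the 5% level only if it leaves the wider, uniform band at some quantile index.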
Fig. 2. Quantile regression estimates of the birth weight model. Panels: Mother age; Mother age squared; Boy; Black; Married; High school; Some college; College.

Fig. 3. Quantile regression estimates of the birth weight model (cont.). Panels: Intercept; No prenatal doctor visit; Medical care, 2nd trimester; Medical care, 3rd trimester; Mother smoked; Cigarettes per day.
Birth weight is often used as a measure of infant health outcomes at birth. Clearly, a low birth weight is associated in the short run with higher one-year mortality rates and, in the longer run, with worse educational attainment and earnings; see, e.g., Black et al. (2007). On the other hand, the goal of policy makers is clearly not to maximize birth weights: in a systematic review, Baidal et al. (2016) find that higher birth weight is consistently associated with a higher risk of childhood obesity. Thus, we do not want to report only the average effects of the potential determinants of birth weight; we need to analyze their effects separately in both tails of the conditional birth weight distribution. In the location-shift model, the effect of the covariates is restricted to be the same over the whole distribution, and it can be estimated by least-squares methods. But the homogeneity of the effects is rejected for all regressors, so that quantile regression is needed.
In light of this discussion, it is interesting to see that the effect of higher maternal education (especially of a college degree) is much larger at the lower end of the distribution, where we would like to increase birth weight. Similarly, not seeing a doctor during the whole pregnancy has a dramatic effect at the lower end of the distribution; while certainly not recommended, it has only a mild effect at the top of the distribution. In general, the results are surprisingly similar to those reported by Koenker and Hallock (2001) in their Figure 4. Twenty years later, the shape, the sign and even the magnitude of most coefficients are still similar.
The advance of technology is leading to the collection of data sets so large that they cannot be stored or processed on a single computer. When this is the case, Volgushev et al. (2019) suggest estimating J quantile regressions separately on each computer and averaging these estimates. In a second step, they obtain the whole quantile regression process by a B-spline interpolation of the estimated quantile regressions. We think that the new algorithms, in particular the one-step estimator, can be useful for the distributed computation of the quantile regression process. The computation of the one-step estimator requires only the communication of a k × k matrix and a k × 1 vector of sample means, which can be averaged by the master computer (see the sketch below). Thus, the one-step estimator can be implemented even when the data set must be distributed over several computers.

The optimization problem that defines the linear quantile regression estimator is actually easy to solve because it is convex. The optimization problems defining, for instance, censored quantile regression, instrumental variable quantile regression, and binary quantile regression are several orders of magnitude more difficult to solve because they are not convex; see Powell (1987) for the censored quantile regression estimator, Chernozhukov and Hansen (2006) for the instrumental variable quantile regression estimator, and Kordas (2006) for the binary quantile regression estimator, which is a generalization of the maximum score estimator of Manski (1975). Estimating the whole quantile regression process for these models may not be computationally feasible for the moment. It should be possible to adapt the suggested preprocessing and one-step algorithms to estimate the quantile regression processes also in these models.
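As a minimal sketch of the distributed idea mentioned above: assume each worker holds a block of (X, y) and a common preliminary coefficient vector beta0 (for example, the estimate at a nearby quantile index), and that the Jacobian is estimated by a simple Powell-type kernel average with a uniform kernel and bandwidth h. The function names, the kernel, and the bandwidth choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def worker_moments(X, y, beta0, tau, h):
    """Computed on each machine: one k x k matrix and one k-vector of sample means."""
    n, k = X.shape
    u = y - X @ beta0                                  # residuals at the preliminary fit
    score = X.T @ (tau - (u <= 0)) / n                 # k-vector of sample means (subgradient)
    kern = (np.abs(u) <= h).astype(float) / (2 * h)    # uniform-kernel density weights (illustrative)
    J = (X * kern[:, None]).T @ X / n                  # k x k Powell-type Jacobian estimate
    return n, J, score

def master_one_step(moments, beta0):
    """Averages the workers' moments (weighted by block size) and takes one Newton-Raphson step."""
    n_tot = sum(n for n, _, _ in moments)
    J = sum(n * J_j for n, J_j, _ in moments) / n_tot
    s = sum(n * s_j for n, _, s_j in moments) / n_tot
    return beta0 + np.linalg.solve(J, s)
```

Each worker ships only its block size, a k × k matrix and a k-vector to the master, regardless of how many observations it holds, which is what makes the one-step update attractive when the data cannot reside on a single machine.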
Acknowledgements
We would like to thank the associate editor Roger Koenker, two anonymous referees, and the participants of the conference "Economic Applications of Quantile Regressions 2.0", which took place at the Nova School of Business and Economics, for useful comments.
References
Abrevaya J (2001) The effects of demographics and maternal behavior on the distribution of birth outcomes. Empirical Economics 26(1):247–257
Angrist J, Chernozhukov V, Fernández-Val I (2006) Quantile regression under misspecification, with an application to the US wage structure. Econometrica 74:539–563
Baidal JAW, Locks LM, Cheng ER, Blake-Lamb TL, Perkins ME, Taveras EM (2016) Risk factors for childhood obesity in the first 1,000 days: a systematic review. American Journal of Preventive Medicine 50(6):761–779
Barrodale I, Roberts F (1974) Solution of an overdetermined system of equations in the l1 norm [F4]. Communications of the ACM 17(6):319–320
Belloni A, Chernozhukov V, Fernández-Val I, Hansen C (2017) Program evaluation and causal inference with high-dimensional data. Econometrica 85(1):233–298
Black SE, Devereux PJ, Salvanes KG (2007) From the cradle to the labor market? The effect of birth weight on adult outcomes. The Quarterly Journal of Economics 122(1):409–439
Chernozhukov V, Fernández-Val I (2005) Subsampling inference on quantile regression processes. Sankhya: The Indian Journal of Statistics 67:253–276
Chernozhukov V, Fernández-Val I (2011) Inference for extremal conditional quantile models, with an application to market and birthweight risks. The Review of Economic Studies 78(2):559–589
Chernozhukov V, Hansen C (2006) Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics 132:491–525
Chernozhukov V, Fernández-Val I, Melly B (2013) Inference on counterfactual distributions. Econometrica 81(6):2205–2268
Fortin N, Lemieux T, Firpo S (2011) Decomposition methods in economics. In: Handbook of Labor Economics, vol 4, Elsevier, pp 1–102
Giné E, Zinn J (1984) Some limit theorems for empirical processes. The Annals of Probability 12(4):929–989
Hagemann A (2017) Cluster-robust bootstrap inference in quantile regression models. Journal of the American Statistical Association 112(517):446–456
Hahn J (1997) Bayesian bootstrap of the quantile regression estimator: a large sample study. International Economic Review pp 795–808
Hall P, Sheather SJ (1988) On the distribution of a studentized quantile. Journal of the Royal Statistical Society, Series B 50:381–391
He F, Cheng Y, Tong T (2016) Estimation of extreme conditional quantiles through an extrapolation of intermediate regression quantiles. Statistics and Probability Letters 113:30–37
Kline P, Santos A (2012) A score based approach to wild bootstrap inference. Journal of Econometric Methods 1(1):23–41
Koenker R (2000) Galton, Edgeworth, Frisch, and prospects for quantile regression in econometrics. Journal of Econometrics 95(2):347–374
Koenker R (2005) Quantile Regression. Cambridge University Press
Koenker R (2017) Computational methods for quantile regression. In: Handbook of Quantile Regression, Chapman and Hall/CRC, pp 55–67
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, d'Orey V (1994) Remark AS R92: A remark on Algorithm AS 229: Computing dual regression quantiles and regression rank scores. Journal of the Royal Statistical Society Series C (Applied Statistics) 43(2):410–414
Koenker R, Hallock KF (2001) Quantile regression. Journal of Economic Perspectives 15:143–156
Koenker R, Portnoy S (1987) L-estimation for linear models. Journal of the American Statistical Association 82(399):851–857
Koenker R, Xiao Z (2002) Inference on the quantile regression process. Econometrica 70:1583–1612
Koenker R, Chernozhukov V, He X, Peng L (2017) Handbook of Quantile Regression. CRC Press
Koenker RW, D'Orey V (1987) Algorithm AS 229: Computing regression quantiles. Journal of the Royal Statistical Society Series C (Applied Statistics) 36(3):383–393
Kordas G (2006) Smoothed binary regression quantiles. Journal of Applied Econometrics 21(3):387–407
Le Cam L (1956) On the asymptotic theory of estimation and testing hypotheses. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley: University of California Press, pp 129–156
Machado J, Mata J (2005) Counterfactual decomposition of changes in wage distributions using quantile regression. Journal of Applied Econometrics 20:445–465
Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics 21(1):255–285
Manski CF (1975) Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3:205–228
Neocleous T, Portnoy S (2008) On monotonicity of regression quantile functions. Statistics & Probability Letters 78(10):1226–1229
Portnoy S (1991) Asymptotic behavior of the number of regression quantile breakpoints. SIAM Journal on Scientific and Statistical Computing 12(4):867–883
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science 12(4):279–300
Powell JL (1987) Semiparametric estimation of bivariate latent variable models. Unpublished manuscript, University of Wisconsin-Madison
Powell JL (1991) Estimation of monotonic regression models under quantile restrictions. In: Nonparametric and Semiparametric Methods in Econometrics, Cambridge University Press, New York, NY, pp 357–384
Schmidt L, Zhu Y (2016) Quantile spacings: A simple method for the joint estimation of multiple quantiles without crossing. Available at SSRN 2220901