[PDF] Quantile regression with generated dependent variable and covariates

Abstract

We study linear quantile regression models when regressors and/or dependent variable are not directly observed but estimated in an initial first step and used in the second step quantile regression for estimating the quantile parameters. This general class of generated quantile regression (GQR) covers various statistical applications, for instance, estimation of endogenous quantile regression models and triangular structural equation models, and some new relevant applications are discussed. We study the asymptotic distribution of the two-step estimator, which is challenging because of the presence of generated covariates and/or dependent variable in the non-smooth quantile regression estimator. We employ techniques from empirical process theory to find uniform Bahadur expansion for the two step estimator, which is used to establish the asymptotic results. We illustrate the performance of the GQR estimator through simulations and an empirical application based on auctions.

Full PDF

aa r X i v : . [ ec on . E M ] D ec Quantile regression with generated dependent variable and covariates

Jayeeta Bhattacharya ∗ December 2020

Abstract

We study linear quantile regression models when regressors and/or dependent variable are notdirectly observed but estimated in an initial ﬁrst step and used in the second step quantile regres-sion for estimating the quantile parameters. This general class of generated quantile regression(GQR) covers various statistical applications, for instance, estimation of endogenous quantile re-gression models and triangular structural equation models, and some new relevant applicationsare discussed. We study the asymptotic distribution of the two-step estimator, which is challeng-ing because of the presence of generated covariates and/or dependent variable in the non-smoothquantile regression estimator. We employ techniques from empirical process theory to ﬁnd uniformBahadur expansion for the two step estimator, which is used to establish the asymptotic results.We illustrate the performance of the GQR estimator through simulations and an empirical appli-cation based on auctions.

Keywords:

Two-stage estimation, generated regressors, generated dependent variable, quantileregression, asymptotic variance, Bahadur expansion ∗ Lecturer, Department of Economics, University of Southampton. Correspondence email:[email protected]. This work was completed during my PhD at Queen Mary University of London(QMUL) and I am deeply thankful to Emmanuel Guerre for his valuable comments and supervision. I also thankparticipants at various conferences for insightful comments. Generous funding from the School of Economics andFinance, QMUL, is also gratefully acknowledged.

Introduction

Econometric analysis often requires the use of regressors that are not directly observed but have beenestimated in a preliminary ﬁrst step. A rich literature exists on estimation and inference in modelswith generated regressors. Pagan (1984) and Oxley & McAleer (1993) provide surveys for parametricmodels with generated regressors while Mammen, Rothe & Schienle (2012) study and illustrate variousexamples for non-parametric regression with generated covariates. Studying the asymptotic propertiesof two step estimators in a parametric context, Murphy & Topel (1985) points out that ignoring theeﬀect of ﬁrst step estimation leads to incorrect asymptotic standard errors.While these models are concerned with the characterization of the conditional mean, a more com-plete picture of the conditional distribution of a dependent variable is provided by quantile regression(QR) models. Since the seminal work of Koenker & Bassett (1978), quantile regression is widely usedin both empirical studies and theoretical statistics for analysing conditional quantile functions in linearand nonlinear response models. Quantile regression applications using generated regressors abound inliterature, most prominently related to models with endogenous covariates. Chernozhukov & Hansen(2005, 2006, 2008) develop identiﬁcation and estimation for QR models in the presence of endogeneity.Another popular approach to deal with endogeneity uses the estimated reduced form residuals as con-trol variables in quantile regression. This technique has been applied in endogenous censored quantileregression models by Blundell & Powell (2007) and Chernozhukov, Fern´andez-Val & Kowalski (2015).Estimation of quantile treatment eﬀects or quantile parameters in triangular simultaneous equationmodels using the control variable approach have been considered in Chesher (2003), Koenker & Ma(2006), Lee (2007), Imbens & Newey (2009), and Chernozhukov, Fern´andez-Val, Newey, Stouli & Vella(2017). There are, however, few references that develop a general theory for quantile models with gen-erated covariates and systematically study its statistical properties. The only related work seems tobe Chen, Galvao & Song (2018), who consider estimation and inference of quantile regression whenregressors are generated. However, they study two step quantile estimation when the second stepestimator is diﬀerentiable with respect to the ﬁrst stage, which may not hold true for some relevantapplications since the quantile regression objective function is not smooth. They also do not considertransformation for the dependent variable as permitted here.This paper considers the general framework of QR models when either the regressors or the de-pendent variable (or both) are generated and studies the asymptotic behaviour of the two-step QRestimator, called the generated quantile regression (GQR) estimator, without being tailored to anyspeciﬁc application. An example giving rise to generated dependent variable is quantile speciﬁcationswith some constant slope parameters, as in the setup of Zou & Yuan (2008). Their composite quan-tile regression (CQR) method can be used to estimate the constant and quantile-varying parameterstogether, but its asymptotic properties have been studied for the estimation of constant parametersonly. To focus on quantile estimation, the constant slope parameters can ﬁrst be estimated by linearregression (or any other suitable method) in a ﬁrst step. Estimation of the quantile-varying slopeparameters, thereafter, involves quantile regression with the dependent variable generated as a func-tion of the constant slope parameters and the corresponding covariates. Removing some parameters1hrough the ﬁrst step estimation may alleviate the computational burden of the CQR method causedby a large number of variables. Moreover, covariates make the QR estimators non-monotonic, evenif the quantile function is increasing. So, the expectation is that reducing the dimensionality of theregressors in the QR stage by removing some covariates through the ﬁrst stage of estimation allowsto get closer to monotonicity, a desirable property for quantile estimation. The two-step procedurecan also simplify estimation of complex models like random coeﬃcient models, popularised in demandanalysis by Berry, Levinsohn & Pakes (1995). Hoderlein, Klemel¨a & Mammen (2010) propose non-parametric estimation of the distribution of the random coeﬃcients. Studying the econometrics ofauctions, Gimenes & Guerre (2020) consider quantile speciﬁcations arising from elliptically distributedrandom coeﬃcients, which includes multivariate normal, lognormal or Student distribution. Its two-step estimation involves a median normalisation to identify and estimate the elliptical distributionlocation and dispersion parameters in the ﬁrst step, which can generate the dependent variable forthe second stage quantile regression for estimating sample quantiles. Another example for generateddependent variable arises in quantile models where the dependent variable is transformed based onsome transformation parameter, like Box-Cox transformation, to induce some desirable properties forstatistical inference. The joint estimation of quantile varying transformation and slope parametersthrough non-linear quantile regression is computationally diﬃcult, in addition to a numerical problembeing that the objective function is not deﬁned for all parameter values and observations (meaning es-timation occurs by omitting such values). Estimating the transformation parameter in a ﬁrst step willavoid such numerical problem and involve a linear quantile regression, ensuring a better performanceof the numerical algorithm used to compute the estimator.It is well known that the ﬁrst step estimates impact the overall asymptotic behaviour of the ﬁnalestimator, understanding which is crucial for obtaining consistent standard errors which can be usedfor constructing correct conﬁdence intervals. The wide range of quantile regression applications thatgive rise to generated regressors or dependent variable obtained from estimation in a preliminarystep suggest the need for a systematic analysis of their impact on the statistical properties of theQR estimator. The classical way in which asymptotic analysis is carried out for two step estimatorswith smooth objective functions relies on a Taylor expansion based technique for the second stageestimates, as applied in Murphy & Topel (1985). However, such methods are not applicable for theQR estimator, since it is diﬃcult to diﬀerentiate the QR estimator . Finding the asymptotic varianceof the non-smooth two-step GQR estimator is not a trivial task and requires alternative techniques.Chen, Linton & Van Keilegom (2003) develop the asymptotic theory for semi-parametric GMM-typeestimators with non-smooth criterion function and non-parametric ﬁrst-stage; closely related papersare Ichimura & Lee (2010), Hahn & Ridder (2013), and Mammen, Rothe & Schienle (2016) (the lattertwo also involve generated regressors in the non-parametric component, such that estimation occurs inthree steps). As a consequence of generality, they have a standard two step proof approach, where theyﬁrst give conditions for consistency and then establish asymptotic normality. However, since the QRobjective function is convex, it allows bypassing the tedious task of checking conditions for consistency This could be done in principle by applying the Implicit Function Theorem to the ﬁrst-order condition that deﬁnesthe estimator. However, the QR estimator is not always unique and the QR objective function is not twice diﬀerentiable,preventing the use of this approach.

2s in Chen et al. (2003) and to establish asymptotic normality in one step instead, by applying theconvexity trick of Hjort & Pollard (2011). Also, a Bahadur representation of the non-smooth two-stepestimator with its rate is not present in these works.This paper systematically handles the associated issues for the asymptotic analysis of the gen-eralised two-step GQR estimator using techniques from asymptotic analysis for quantile regressionand empirical process theory. We derive the Bahadur expansion of the GQR estimator, with precisestochastic order of the remainder term, which holds uniformly with respect to the ﬁrst step param-eter and the quantile levels. This involves establishing a stochastic equicontinuity result that allowsapproximating the score evaluated at the estimated ﬁrst stage parameter by that taken at the trueparameter, which is interesting in evaluating the eﬀect of the ﬁrst stage. Using the Bahadur expansionapproach, under the assumption that the ﬁrst stage estimation is asymptotically normal and someother regularity conditions, we establish asymptotic normality and obtain explicit expression for theasymptotic variance of the GQR estimator.Several applications ﬁt the generated QR framework and four motivating examples are discussed- quantile regression involving constant slope parameters, an ellipticly distributed random coeﬃcientmodel, a Box-Cox power transformed quantile regression, and a variant for endogenous quantile re-gression model. The example of QR model with constant slope also forms the basis for simulationexperiments and an empirical application; the analysis suggests potential beneﬁts of the two stageGQR procedure over standard QR. The simulation exercises illustrate the validity of the GQR asymp-totic normality result and the eﬀect of the ﬁrst stage estimation; further analysis of the asymptoticvariance suggests that the GQR estimator produces eﬃciency improvements over standard QR es-timator for central quantiles. Finally, an empirical application based on auction models in quantileframework conﬁrms that the GQR estimator improves the monotonicity and accuracy of quantileslopes as compared to an unconstrained estimation using standard quantile regression.The rest of the paper is organised as follows. Section 2 introduces the baseline model and theGQR estimator, and presents four applications to motivate the framework. Section 3 carries out theasymptotic analysis and presents the Bahadur expansion results and the central limit theorem for theGQR estimator. The asymptotic results are applied to the motivating examples in Section 4. Section5 presents simulation results while Section 6 reports results of the empirical application to ﬁrst priceauctions. Proofs of the main results are given in the Appendices.

We consider the following linear quantile speciﬁcation. Y ( θ ) = X ( θ ) ′ β ( U ); U | X ( θ ) ∼ U [0 , , (2.1)where, provided that τ X ( θ ) ′ β ( τ ) is strictly increasing and continuous in τ , X ( θ ) ′ β ( τ ) is the τ -quantile of Y ( θ ) conditional on X ( θ ). Here, Y ( θ ) and X ( θ ) are functions of a vector of parameters θ , which includes elements that generate the dependent variable Y , or the regressor X , or both. Thetrue value of the parameter θ in (2.1), denoted by θ , is not known but estimated. Hence, estimation3roceeds in two steps. First step: Estimation of θ . It is assumed that a consistent estimator b θ is available. For the sakeof generality, any estimation method is allowed at this stage, provided it satisﬁes an expansion typicalof regular estimators, see for example Newey & McFadden (1994). As discussed for the examples, asuitable choice of b θ can be done on a case-by-case basis. Second step: Estimation of quantile parameter.

The quantile parameter estimate b β ( τ ) in (2.1)is given by b β ( τ ) = b β ( τ ; b θ ) = arg min β n n X i =1 ρ τ (cid:16) Y i ( b θ ) − X i ( b θ ) ′ β (cid:17) , (2.2)where ρ τ ( u ) = ( τ − I ( u < u is the check function of Koenker & Bassett (1978). The general framework of quantile regression with dependent variable and/or covariates obtained asa function of parameters estimated in a ﬁrst step ﬁnds wide application in economics and statistics.We present four applications here, which we revisit later to derive their asymptotic results.

Consider the quantile regression (QR) model Q Y ( τ | X ) = β ( τ ) + β ( τ ) X + β ( τ ) X (2.3)and assume that β ( τ ) = β for all τ , ie β ( · ) is constant. This model can be estimated usingZou & Yuan (2008)’s composite quantile regression (CQR) method as follows: (cid:16) b β , b β ( τ ) , b β ( τ ) , · · · , b β ( τ K ) , b β ( τ K ) (cid:17) = arg min b ,b k ,b k ; k =1 , ··· ,K K X k =1 n X i =1 ρ τ k ( Y i − X i b − b k − X i b k ) , for 0 < τ < τ < · · · < τ K <

1. This could lead to an intractable system due to very large number ofvariables, especially with more quantile parameters and quantile levels. Moreover, Zou & Yuan (2008)studies the asymptotic properties of the CQR estimator for estimation of constant slope parametersand compares eﬃciency with least squares, while the asymptotic behaviour for quantile varying slopeparameters remains unstudied. As an alternative to Zou & Yuan (2008), consider a two step estimationof this model as described below.As there exist uniform variables U i independent of X i such that Y i = Q Y ( U i | X i ), it holds Y i = β + β X i + β X i + ε i where β k = E [ β k ( U i )], k = { , , } , and ε i = β ( U i ) − β + (cid:0) β ( U i ) − β (cid:1) X i (since β = β =4 ( U i )). It follows that the β k ’s can be estimated using OLS, that is, (cid:16)b β , b β , b β (cid:17) = arg min b ,b ,b n X i =1 ( Y i − b − b X i − b X i ) . (2.4)Set b β = b β . A two step estimator of ( β ( τ ) , β ( τ )) is then (cid:16) b β ( τ ) , b β ( τ ) (cid:17) = arg min b ,b n X i =1 ρ τ (cid:16) Y i − b β X i − b − b X i (cid:17) . (2.5)Hence, in this example, the ﬁrst step parameter is θ ≡ β , and the dependent variable is generated as Y i ( β ) = Y i − β X i . Consider the model Y i = X ′ i β i , i = 1 , . . . , n (2.6)where, β i is a ( K + 1)-dimensional vector of random coeﬃcients, independent from the ( K + 1)-vectorof covariates X i whose ﬁrst element is 1 (such that the ﬁrst element of β i represents the error in thismodel). Note that linear regression is a special case of (2.6) where β i = β for all i .Suppose β i is drawn from elliptical distribution with location parameter µ and symmetric non-negative dispersion matrix Σ, which includes distributions like multivariate normal, log-normal andt-distribution, as considered in Gimenes & Guerre (2020)’s auctions based application. Let R i denotea random vector distributed uniformly on the unit sphere in R ( K +1) and consider the Euclidean norm E i = (cid:12)(cid:12)(cid:12)(cid:12) Σ − / ( β i − µ ) (cid:12)(cid:12)(cid:12)(cid:12) , independent from R i . Then, following Fang, Kotz & Ng (1990, pg-32), β i hasthe same distribution as µ + E i Σ / R i . Let r i denote the ﬁrst coordinate of R i such that t ′ R i = || t || r i (see Fang et al. (1990), Theorem 2.4). Hence, using the symbol d = for denoting identical distributionof random variables, we have from (2.6) Y i d = X ′ i µ + (cid:16) Σ / X i (cid:17) ′ E i R i d = X ′ i µ + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Σ / X i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E i r i . Hence, the quantile speciﬁcation for (2.6) is given by Q Y ( τ | X ) = X ′ µ + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Σ / X (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ξ ( τ ) (2.7)where ξ ( τ ) is the τ -th quantile of E i r i . The above model can be estimated in two steps as follows.Under the normalisation ξ (1 /

2) = 1, the parameters µ and Σ are identiﬁed by conditional medianregression: ( b µ, b Σ) = arg min µ, Σ n X i =1 (cid:12)(cid:12)(cid:12) Y i − X ′ i µ − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Σ / X i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . (2.8)5he second step involves quantile regression using the generated dependent variable Y i ( b µ, b Σ) = Y i − X ′ i b µ || b Σ / X || to obtain the sample quantiles: b ξ ( τ ) = arg min ξ n X i =1 ρ τ (cid:16) Y i ( b µ, b Σ) − ξ (cid:17) . (2.9) Box & Cox (1964) proposes ﬁnding a transformation parameter λ such that with the following trans-formation on the original observations Y , Y ( λ ) =  Y λ − λ , if λ = 0 , log Y, if λ = 0 , (2.10) Y ( λ ) is normally distributed with conditional variance σ , and E [ Y ( λ ) | X ] = X ′ β . The desirousproperty for quantile regression is linearity, that is, Q Y ( λ ) ( τ | X ) = X ′ β ( τ ) . The Box-Cox quantile regression literature has mostly focussed on ﬁnding a quantile dependenttransformation parameter (see, for instance, Powell (1991), Chamberlain (1994), Buchinsky (1995),Machado & Mata (2000) and Fitzenberger, Wilke & Zhang (2009)). Owing to the equivariance prop-erty of quantiles, this leads to minimization of the non-linear function P ni =1 ρ τ (cid:16) Y i − ( λX ′ i β + 1) /λ (cid:17) .Quantile varying λ adds ﬂexibility to the model, but joint estimation of ( λ ( τ ) , β ( τ )) requires eﬀort,see Koenker (2017). Also, a basic numerical problem is that ( λX ′ i β + 1) needs to be positive for all λ and all observations.A constrained estimation with a constant λ has obvious computational and numerical beneﬁts.Mu & He (2007) considers constancy of λ ( τ ). In the empirical application of Buchinsky (1995) study-ing transformation of log wages over 25 years, λ ( τ ) seems to be constant for all quantiles except thehighest. A simpler approach would, therefore, involve estimating b λ separately in a ﬁrst step andthereafter performing linear quantile regression using the transformed Y for estimating β ( τ ). b λ canbe estimated from the linear regression Y ( λ ) = X ′ β + ε . A consistent estimator for b λ is Amemiya(1974)’s nonlinear IV (NIV) estimator, (cid:16)b λ NIV , b β NIV (cid:17) = arg min ℓ,b n X i =1 (cid:0) Y i ( ℓ ) − X ′ i b (cid:1) W ′ i ! Ω n X i =1 W i (cid:0) Y i ( ℓ ) − X ′ i b (cid:1)! , (2.11)where W i always contains X i as well as additional instruments (Amemiya & Powell (1981) recommendsusing squares and cross-products of X i ’s). Set b λ = b λ NIV . The dependent variables Y i ( b λ ) is, then,6enerated using equation (2.10). β ( τ ) is estimated from quantile regression of Y ( b λ ) on X , b β ( τ ) = arg min b n X i =1 ρ τ (cid:16) Y i ( b λ ) − X ′ i b (cid:17) . (2.12) Control variable approach views endogeneity bias as an omitted variable bias and proceeds by esti-mating the ‘control variable’, which is the residual of the regression of the endogenous regressor on theinstruments, conditional on which error becomes independent of the regressors (see Blundell & Powell(2003)).Consider the following system of equations Y = W ′ α + Xβ + ε, (2.13) X = Z ′ γ + η (2.14)where W is a vector of exogenous covariates and X is the endogenous regressor of interest generatedby (2.14) in which Z is the vector of instruments uncorrelated with η and ε , η being centered with aﬁnite variance. Hence, endogeneity in X arises due to the unobserved latent variable η , adding whichas regressor in the ﬁrst equation ‘corrects’ for endogeneity, as in the following quantile speciﬁcation: Q Y | W,X,η ( τ | W, X, η ) = W ′ α ( τ ) + Xβ ( τ ) + ηλ ( τ ) . (2.15)The above model can be estimated in two steps as follows. The ﬁrst stage least squares estimates thecontrol variable η , b η i = X i − Z ′ i b γ, b γ = N X i =1 Z i Z ′ i ! − N X i =1 Z i X i . (2.16)The second stage estimator for the quantile coeﬃcients is hb α ′ ( τ ) , b β ( τ ) , b λ ( τ ) i ′ = arg min α,β,λ N X i =1 ρ τ (cid:0) Y i − W ′ i α − X i β − b ηλ (cid:1) = arg min α,β,λ N X i =1 ρ τ (cid:0) Y i − W ′ i α − X i β − ( X i − Z ′ i b γ ) λ (cid:1) . (2.17)Hence, in this example, the ﬁrst step estimator is θ ≡ γ , and the second stage involves quantileregression of Y i on generated regressors, X i ( θ ) ≡ [ W ′ i , X i , ( X i − Z ′ i γ )] ′ . Our main assumptions are as follows: 7 ssumption 1 (First step estimator)

There exists a function ψ ( z ) such that the estimator of thetrue θ is asymptotically linear: √ n (cid:16)b θ − θ (cid:17) = 1 √ n n X i =1 ψ ( z i ) + o P (1) , E [ ψ ( z )] = 0 , E (cid:2) ψ ( z ) ψ ( z ) ′ (cid:3) < ∞ . Assumption 2 (Model) ( X i , Y i ) are i.i.d. There exists a compact set Θ with a non empty interiorcontaining θ such that X i ( θ ) = h ( X i , θ ) and Y i ( θ ) = g ( Y i , X i , θ ) are continuous and diﬀerentiablewith respect to θ in Θ for all ( Y i , X i ) . Denoting k·k as the Euclidean norm, it holds moreover that sup θ ∈ Θ (cid:13)(cid:13)(cid:13)(cid:13) ∂g ( Y, X, θ ) ∂θ (cid:13)(cid:13)(cid:13)(cid:13) < ∞ . In the next Assumption F ( y | x, θ ) and f ( y | x, θ ) stands for the c.d.f. and p.d.f. of Y ( θ ) given X ( θ ), f X ( ·| θ ) being the p.d.f. of X ( θ ). The set X ( θ ) is the support of X ( θ ). All p.d.f. are deﬁnedwith respect to the Lebesgue measure. The set Θ is as in Assumption 2. Assumption 3 (Smoothness) (i) X ( θ ) lies in R d for each θ and X ( θ ) is a compact subset of R d with non empty interior. f X ( x | θ ) > over the interior of X ( θ ) and vanishes at its boundaries. f X ( x | θ ) is continuously diﬀerentiable with respect to θ . (ii) the p.d.f. f ( y | x, θ ) of Y ( θ ) given X is con-tinuously diﬀerentiable in ( y, x, θ ) with f ( y | x, θ ) > for all ( y, x, θ ) such that ( x, θ ) ∈ S θ ∈ Θ {X ( θ ) × θ } and y is in the interior of the support of F ( ·| x, θ ) . Asymptotically linear estimators in Assumption 1 refer to the class of extremum estimators asconsidered in Newey & McFadden (1994). Examples include MLE, NLS, and the GMM class. Itimplies √ n -consistency of the ﬁrst step estimator and is key to the derivation of the asymptoticnormality result for the second-step estimator. The triangular structure imposed by Assumption 2ensures that X ( θ ) is not a function of Y and therefore remains exogenous; it is useful in the example ofSection 2.1.1. Assumption 3-(ii) is a high level assumption that can be derived from Assumption 2 andthe quantile regression slope β ( · ) since g ( Y, X, θ ) = X ( θ ) ′ β ( U ). It implicitly requests a monotone g ( · , X, θ ) with non zero derivatives, as f ( ·| x, θ ) may diverge otherwise. Indeed, if ∂g ( y, x, θ ) /∂y > f ( y | x ) is the p.d.f. of Y given X (assuming X ( θ ) = X for the sake of the brevity of thisdiscussion), it holds f ( y | x, θ ) = 1 ∂g∂y [ g − ( y, x, θ ) , x, θ ] f (cid:2) g − ( y, x, θ ) | x (cid:3) which may not be bounded if ∂g ( y, x, θ ) /∂y vanishes. Assumption 3-(ii) then holds if f ( y | x ) iscontinuously diﬀerentiable in ( x, y ) and g ( y, x, θ ) twice diﬀerentiable with respect to y and θ withbounded partial derivatives. Assumption 3-(i) is similar, but note that the transformation X ( θ ) = h ( X, θ ) does not need to be one to one, as X ( θ ) may have a smaller dimension than X .The QR estimator of the slope coeﬃcient is an estimator of β (cid:16) τ ; b θ (cid:17) where β ( τ ; θ ) = arg min β E (cid:2) ρ τ (cid:0) Y ( θ ) − X ′ ( θ ) β (cid:1)(cid:3) . (3.1)8ssumption 3 ensures that the objective function is strictly convex for all θ , so that β ( τ ; θ ) is theunique solution of the ﬁrst order condition0 = E (cid:2)(cid:8) I (cid:0) Y ( θ ) ≤ X ′ ( θ ) β (cid:1) − τ (cid:9) X ( θ ) (cid:3) = E (cid:2)(cid:8) F (cid:0) X ′ ( θ ) β (cid:12)(cid:12) X, θ (cid:1) − τ (cid:9) X ( θ ) (cid:3) This together with the Implicit Function Theorem implies that β ( τ ; θ ) is diﬀerentiable with respectto θ , as established in the following Proposition. Proposition 1

Under Assumptions 2 and 3, β ( τ ; θ ) is continuously diﬀerentiable with respect to θ for any θ ∈ Θ and < τ < . It holds moreover ∂β ( τ ; θ ) ∂θ = H ( τ ; θ ) − D ( τ ; θ ) where H ( τ ; θ ) = E (cid:2) f (cid:0) X ′ ( θ ) β ( τ ; θ ) (cid:12)(cid:12) X, θ (cid:1) X ( θ ) X ′ ( θ ) (cid:3) D ( τ ; θ ) = − ∂∂θ (cid:2) E (cid:2)(cid:8) F (cid:0) X ′ ( θ ) β (cid:12)(cid:12) X, θ (cid:1) − τ (cid:9) X ( θ ) (cid:3)(cid:3)(cid:12)(cid:12)(cid:12)(cid:12) θ = θ ,β = β ( τ ; θ ) . Proof of Proposition 1 : See proof section in Appendix 1.The matrix H ( τ ; θ ) plays an important role in the asymptotic distribution of standard QR estima-tors, see below and Koenker (2005). The existence of its inverse is established in Lemma 1 of theproof section in Appendix 1. The matrix D ( τ ; θ ) is speciﬁc to two stage estimation. With known θ , a linear representation for √ n (cid:16) b β ( τ ; θ ) − β ( τ, θ ) (cid:17) can be found in Koenker (2005) Section 4 . θ in-duces some important changes compared to a known θ and requires ﬁnding an approximation for √ n (cid:16) b β (cid:16) τ ; b θ (cid:17) − β (cid:16) τ, b θ (cid:17)(cid:17) . The approach used here builds on a Bahadur expansion which holds uni-formly in θ and τ . While detailed proofs are in Appendix 1-2, a heuristic description of the Bahadurexpansion proof is as below. Heuristics.

Deﬁne b S ( τ ; θ ) = 1 √ n n X i =1 (cid:2) I (cid:0) Y i ( θ ) ≤ X ′ i ( θ ) β ( τ ; θ ) (cid:1) − τ (cid:3) X i ( θ ) , (3.2) J ( τ ; θ ) = τ (1 − τ ) E (cid:2) X i ( θ ) X ′ i ( θ ) (cid:3) (3.3) b E ( τ ; θ ) = √ n (cid:16) b β ( τ ; θ ) − β ( τ ; θ ) (cid:17) − (cid:16) − H − ( τ ; θ ) b S ( τ ; θ ) (cid:17) . (3.4)Note that for a given θ , b S ( τ ; θ ) / √ n is the score of the objective function in (3.1) and b S ( τ ; θ ) is centeredfor 0 < τ < J ( τ ; θ ). The basic idea is to approximate √ n (cid:16) b β (cid:16) τ ; b θ (cid:17) − β (cid:16) τ, b θ (cid:17)(cid:17) with − H − ( τ ; θ ) b S ( τ ; θ ), assuming that the approximation error is of the right order. If so, the asymptoticnormality of the two-step QR estimator follows from that of its score evaluated at the true ﬁrst stage.9rucially, it needs to be shown that the approximation error is small. The approach here is basedon two main results: showing that the Bahadur error term b E ( τ ; θ ) is small by ﬁnding its uniformbound for all θ and τ , and proving the stochastic equicontinuity of H − ( τ ; θ ) ˆ S ( τ ; θ ) at the true θ .The outline of the proof is as follows. Let b E ( τ ; θ ) = arg min ǫ L n ( ǫ, τ ; θ ), where L n is so deﬁnedto be a linear combination of ρ τ ( · ) and, thus, is convex. Consider the decomposition L n ( ǫ, τ ; θ ) = L n ( ǫ, τ ; θ ) + R n ( ǫ, τ ; θ ), where L n is the quadratic approximation of L n and R n is the remainder term.Finding uniform order for b E means ﬁnding bounds for the probability of || b E || ≥ t n , for a small number t n such that lim n →∞ t n →

0, for all θ and τ . This involves ﬁnding bounds on inf || ǫ ||≥ t n L n ( ǫ, τ ; θ ),which, in turn, requests placing bounds on inf || ǫ || = t n L n ( ǫ, τ ; θ ) and sup || ǫ || = t n R n ( ǫ, τ ; θ ). Note thatconvexity allows us to make inference for the non-compact set || ǫ || ≥ t n by considering a compact set || ǫ || = t n (as detailed under heading ‘Uniform order for b E ( τ ; θ )’ in Appendix 1). Obtaining bounds for L n is straightforward. Uniform order of R n for all θ and τ , as obtained in Appendix 1 equation (A1.12),relies on establishing Bernstein type maximal inequality for the empirical process R n using Theorem6 . H − ˆ S also follows fromsimilar arguments of maximal inequality under bracketing entropy. The next Proposition presentsthe Bahadur error bound and stochastic equicontinuity result to establish linearisation of the GQRestimator, uniformly in τ and θ . Proposition 2

Under Assumptions 1-3 it holds for any compact parameter set Θ and C > (i) sup ( τ,θ ) ∈ [ τ,τ ] × Θ (cid:13)(cid:13)(cid:13) b E ( τ ; θ ) (cid:13)(cid:13)(cid:13) = O P log / nn / ! , (3.5) (ii) sup ( τ,θ ) ∈ [ τ,τ ] ×B ( θ ,Cn − / ) (cid:13)(cid:13)(cid:13) H − ( τ ; θ ) b S ( τ ; θ ) − H − ( τ ; θ ) b S ( τ ; θ ) (cid:13)(cid:13)(cid:13) = O P log / nn / ! (3.6) where < τ ≤ τ < and B ( θ , ̺ ) = { θ ; k θ − θ k ≤ ̺ } . Proof of Proposition 2 : See proof section in Appendix 1.Propositions 1 and 2 give the next Theorem, which states a Central Limit Theorem for the two stepestimator of the slope coeﬃcient. Note that in absence of the parameter θ , the asymptotic normalityresult is the same as that for usual quantile regression estimator, as derived in Theorem 4 . Theorem 1

Under Assumptions 1-3, it holds for any τ in (0 , √ n (cid:16) b β ( τ ) − β ( τ ) (cid:17) d → N (0 , V ( τ ))10 here V ( τ ) = H ( τ ; θ ) − [ J ( τ ; θ ) + D ( τ ; θ ) C Ψ S ( τ ) + C ′ Ψ S ( τ ) D ′ ( τ ; θ )+ D ( τ ; θ ) C ΨΨ D ′ ( τ ; θ )] H ( τ ; θ ) − ,C ΨΨ = E [Ψ ( Z ) Ψ ′ ( Z )] , C Ψ S ( τ ) = E [Ψ ( Z ) X ′ ( θ ) { I [ Y ( θ ) ≤ X ′ ( θ ) β ( τ ; θ )] − τ } ] . Proof of Theorem 1.

Proposition 1 yields that √ n (cid:16) b β (cid:16) τ ; b θ (cid:17) − β ( τ ; θ ) (cid:17) = √ n (cid:16) b β (cid:16) τ ; b θ (cid:17) − β (cid:16) τ, b θ (cid:17)(cid:17) + √ n (cid:16) β (cid:16) τ ; b θ (cid:17) − β ( τ ; θ ) (cid:17) = √ n (cid:16) b β (cid:16) τ ; b θ (cid:17) − β (cid:16) τ, b θ (cid:17)(cid:17) + (cid:18) ∂β ( τ ; θ ) ∂θ + o P (1) (cid:19) √ n (cid:16)b θ − θ (cid:17) = √ n (cid:16) b β (cid:16) τ ; b θ (cid:17) − β (cid:16) τ, b θ (cid:17)(cid:17) + (cid:18) ∂β ( τ ; θ ) ∂θ (cid:19) ′ √ n n X i =1 Ψ ( Z i ) + o P (1) (3.7)where the last line holds thanks to Assumption 1. Equation (3.7) and Proposition 2 give √ n (cid:16) b β ( τ ) − β ( τ ) (cid:17) = H − (cid:16) τ ; b θ (cid:17) b S (cid:16) τ ; b θ (cid:17) + (cid:18) ∂β ( τ ; θ ) ∂θ (cid:19) ′ √ n n X i =1 Ψ ( Z i ) + o P (1)= H − ( τ ; θ ) b S ( τ ; θ ) + (cid:18) ∂β ( τ ; θ ) ∂θ (cid:19) ′ √ n n X i =1 Ψ ( Z i ) + o P (1) , (3.8)where the last line results from (3.6) since Assumption 1 and taking C large enough ensure that b θ belongs to B (cid:0) θ , Cn − / (cid:1) with high probability. Since ∂β ( τ ; θ ) ∂θ = H ( τ ; θ ) − D ( τ ; θ ) from Proposition1, the Limit distribution of Theorem 1 follows from the Multivariate CLT. (cid:3) Remark 1.

As Propositions 1 and 2 hold uniformly in τ , the expansion (3.8) also does. SinceFunctional Central Limit Theorems for b S ( τ ; θ ) can be applied, (3.8) can be used to obtain a Func-tional Central Limit Theorem for the two step quantile regression estimator. Remark 2.

The order of the o P (1) remainder term in (3.8) can be made more precise, strength-ening the smoothness Assumptions 2 and 3 to ensure that β ( τ ; θ ) is twice continuously diﬀerentiableusing the Implicit Function Theorem as in Proposition 1. Indeed, if β ( τ ; θ ) is twice continuouslydiﬀerentiable with respect to θ , the o P (1) remainder term in (3.7) is an O P (cid:0) n − / (cid:1) and the order ofthe o P (1) remainder term in (3.8) follows from (3.5) and is O P (cid:16) n − / log / n (cid:17) . Remark 3.

The proof can be easily modiﬁed for the case where θ depends upon τ . Remark 4.

For estimating the GQR asymptotic variance, a kernel-based approach can be em-ployed with numerical derivatives. But bootstrap may be preferable and, indeed, is more suitablefor quantile regression (see Koenker (2005) and the references therein). The validity of bootstrapfor obtaining asymptotic conﬁdence interval of two-step semiparametric estimators with non-smoothobjective function has been proven by Chen et al. (2003), implying its correctness for the GQR esti-mator. 11

Examples revisited

In this section, we apply the asymptotic theory results of Section 3 to the motivating examplesintroduced in Section 2.1.

For the quantile regression model (2.3), recall that the constant paramater β ( · ) is estimated usingleast squares regression, and the quantile parameters ( β ( · ) , β ( · )) are estimated using the generateddependent variable Y i ( b β ) = Y i − b β X i via the two-step quantile regression estimator of (2.5). Asymp-totic normality of the ﬁrst step OLS estimator is well established. Denote X = [1 , X , X ] ′ . Assumethat E [ ε XX ′ ] is ﬁnite and E [ XX ′ ] is full rank and ﬁnite. The OLS estimator is asymptotically linear: √ n (cid:16) b β − β (cid:17) = n X i =1 (cid:2) E − (cid:2) XX ′ (cid:3) X i ε i (cid:3) / √ n + o P (1) . Denoting i = [0 , , b β is given by V ( β ) = i (cid:0) E − [ XX ′ ] E [ ε XX ′ ] E − [ XX ′ ] (cid:1) i ′ . (4.1)For the second step quantile regression, the dependent variable is generated as Y ( b β ) = Y − b β X ,and the regressors are denoted as e X = [1 , X ] ′ . Asymptotic normality of the quantile parameters β ( τ ) = ( β ( τ ) , β ( τ )) ′ follows directly from Theorem 1: √ n " b β ( τ ) − β ( τ ) b β ( τ ) − β ( τ ) d −→ N (0 , V ( τ )) . The terms of V are obtained from Theorem 1 by replacing θ ≡ β , β ( τ ) ≡ ( β ( τ ) , β ( τ )) ′ , X ( θ ) ≡ e X = [1 , X ] ′ and Y ( θ ) ≡ Y ( β ) = Y − β X . Denoting the ﬁrst τ -derivative of β ( τ ) as β (1) ( τ ), V ( τ )comes as follows: V ( τ ) = H ( τ ) − (cid:8) J ( τ ) + D ( τ ) V ( β ) D ( τ ) ′ + C ( τ ) ′ D ( τ ) + C ( τ ) D ( τ ) ′ (cid:9) H ( τ ) − , where H ( τ ) = E " e X e X ′ β (1)0 ( τ ) + β (1)2 ( τ ) X , J ( τ ) = τ (1 − τ ) E h e X e X ′ i ,D ( τ ) = − E " X e Xβ (1)0 ( τ ) + β (1)2 ( τ ) X and C ( τ ) = E (cid:20) g ( X ) (cid:26)Z τ ( β ( t ) + β ( t ) X ) dt − τ ( β ( τ ) + β ( τ ) X ) (cid:27)(cid:21) , (4.2)with g ( X ) = e X [0 , , E − [ XX ′ ] X . 12 .2 Random coeﬃcient model For the random coeﬃcient model in (2.6), recall that for identiﬁcation of the quantile speciﬁcation in(2.7), we normalise ξ (1 /

2) = 1. Denote the τ -derivative of ξ ( τ ) by ξ (1) ( τ ). The ﬁrst step parameters θ ≡ ( µ, Σ) are estimated by (2.8); denote G ( · ) = X ′ i µ + (cid:12)(cid:12)(cid:12)(cid:12) Σ / X i (cid:12)(cid:12)(cid:12)(cid:12) . The θ -derivative of G ( · ) is given by G θ = " X ∂ || Σ / X || ∂σ where σ is a ( K + 1) × / . The non-linear median regressionestimator of (2.8) is asymptotically linear (see Section 4.4 of Koenker (2005)): √ n ( b θ − θ ) = H − n X i =1 G θi [1 / − I ( Y i ≤ G i ( · ))] / √ n + o P (1)where H = E (cid:20) G θ G θ ′ || Σ / X || ξ (1) (1 / (cid:21) . The asymptotic variance of b θ is given by V ( θ ) = H − E h G θ G θ ′ i H − / Y ( b θ ) ≡ Y ( b µ, b Σ) = Y i − X ′ i b µ || b Σ / X || by (2.9). The asymptotic normality of b ξ ( τ ) follows from Theorem 1: √ n hb ξ ( τ ) − ξ ( τ ) i d −→ N (0 , V ( τ )) , where V ( τ ) = H ( τ ) − (cid:8) J ( τ ) + D ( τ ) V ( θ ) D ( τ ) ′ + C ( τ ) ′ D ( τ ) + C ( τ ) D ( τ ) ′ (cid:9) H ( τ ) − . The terms of V ( τ ) are: H ( τ ) = E h ξ (1) ( τ ) i , J ( τ ) = τ (1 − τ ) , D ( τ ) = − E " ξ (1) ( τ ) " X ∂ || Σ / X || ∂σ ξ ( τ ) ,C ( τ ) = E (cid:20) Ψ( · ) (cid:26) I (cid:18) Y − X ′ µ || Σ / X || ≤ ξ ( τ ) (cid:19) − τ (cid:27)(cid:21) where Ψ( · ) = H − G θ [1 / − I ( Y ≤ G ( · ))] . The box-cox transformation parameter of (2.10) is estimated using the nonlinear IV (NIV) estimatorof (2.11). The conditional quantile model for the generated dependent variable Y ( b λ ) is assumed linearin parameters, which are estimated using the QR estimator of (2.12). Amemiya (1974) establishes thelimiting behaviour of the NIV estimator. Assume that E h ( Y ( λ ) − X ′ β ) W W ′ i is ﬁnite and Ω is fullrank and ﬁnite.Note that if β is a K -dimension vector, then the NIV estimator estimates ( K + 1) parameters,denoted by θ = [ λ, β ′ ] ′ . Denote the ( K + 1) order square matrix, G = E (cid:20) W ∂Y ( λ ) ∂λ , − W X ′ (cid:21) . √ n (cid:16)b θ − θ (cid:17) = n X i =1 h − (cid:0) G ′ Ω G (cid:1) − G ′ Ω W i ( Y i ( λ ) − X ′ i β ) i / √ n + o P (1) . The asymptotic variance of b λ , denoted by V ( λ ), is the ﬁrst term of the asymptotic variance-covariancematrix for b θ . Denoting i = [1 , K × ], where K × is a K -dimension row vector of zeros, V ( λ ) = i (cid:16)(cid:0) G ′ Ω G (cid:1) − G ′ Ω E (cid:2)(cid:0) Y ( λ ) − X ′ β (cid:1) W W ′ (cid:3) Ω G (cid:0) G ′ Ω G (cid:1) − (cid:17) i ′ . Asymptotic normality for the quantile estimates obtained from QR of Y ( b λ ) on X follows directly fromTheorem 1. √ n (cid:16) b β ( τ ) − β ( τ ) (cid:17) d −→ N (0 , V ( τ )) , where V ( τ ) = H ( τ ) − (cid:8) J ( τ ) + D ( τ ) V ( λ ) D ( τ ) ′ + C ( τ ) ′ D ( τ ) + C ( τ ) D ( τ ) ′ (cid:9) H ( τ ) − . The terms of V ( τ ) are given by H ( τ ) = E h XX ′ X ′ β (1) ( τ ) i , J ( τ ) = τ (1 − τ ) E [ XX ′ ] , D ( τ ) = − E h XX ′ β (1) ( τ ) ∂ ( X ′ β ( τ ) λ +1) /λ ∂λ i C ( τ ) = E (cid:2) g ( X ) (cid:8)R τ X ′ β ( t ) dt − τ X ′ β ( τ ) (cid:9)(cid:3) , where g ( X ) = X [1 , K × ] (cid:16) − ( G ′ Ω G ) − G ′ Ω W (cid:17) . The quantile regression model in (2.15) is estimated in two steps. The ﬁrst step uses OLS estimator of(2.16) to estimate b γ . This is used to generate the control variable b η = ( X i − Z ′ i b γ ), which is includedas a regressor in the quantile regression estimator of (2.17) for estimating the quantile parameters δ ( τ ) ≡ ( α ( τ ) ′ , β ( τ ) , λ ( τ )) ′ . Denote the generated regressors as X ( γ ) = [ W ′ , X, ( X − Z ′ γ )] ′ . Weassume that E (cid:2) η | Z (cid:3) = σ and E ( ZZ ′ ) is ﬁnite. The OLS estimator is asymptotically linear: √ n ( b γ − γ ) = n X i =1 (cid:2) E − (cid:2) ZZ ′ (cid:3) Z i η i (cid:3) / √ n + o P (1) . The asymptotic normality of the quantile parameters δ ( τ ) follows directly from Theorem 1, √ n hb δ ( τ ) − δ ( τ ) i d −→ N (0 , V ( τ )) , where V ( τ ) = H ( τ ) − (cid:8) J ( τ ) + D ( τ ) σ E − [ ZZ ′ ] D ( τ ) ′ + C ( τ ) ′ D ( τ ) + C ( τ ) D ( τ ) ′ (cid:9) H ( τ ) − . V ( τ ) are given by J ( τ ) = τ (1 − τ ) E [ X ( γ ) X ( γ ) ′ ] , D ( τ ) = − ∂∂γ [ E [ { F ( X ′ ( γ ) δ | W, X, γ ) − τ } X ( γ )]] (cid:12)(cid:12)(cid:12) δ = δ ( τ ) C ( τ ) = E (cid:2) X ( γ ) ( I ( Y ≤ X ( γ ) ′ δ ( τ )) − τ ) ηZ ′ E − [ ZZ ′ ] (cid:3) , H ( τ ) = E h X ( γ ) X ( γ ) ′ X ( γ ) ′ δ (1) ( τ ) i . This section reports results of simulation exercises to illustrate the performance of the two-step GQRestimator and validate the asymptotic normality result of Theorem 1. The simulations are based onthe quantile regression with constant slope model of Section 2.1.1 Q Y ( τ | X ) = β ( τ ) + β ( τ ) X + β ( τ ) X , with true parameters as, β ( τ ) = e τ , β ( τ ) = β = 1 ∀ τ, β ( τ ) = 2 τ . (5.1)Data are generated as Y i = β ( U i ) + β X i + β ( U i ) X i , where ( X i , X i ) are uniform random variablesbetween [1 ,

5] and [3 ,

10] respectively, U i is a [0 , i = 1 , · · · , n . Sample sizesof n = 100 and n = 1000 are considered. The number of simulation replications is set to 1000.GQR estimation of the above model proceeds as in Section 2.1.1. We also compare GQR withstandard quantile regression, where all parameters - both constant and quantile-varying ones - areestimated together by quantile regression of Y on X ’s. Also, to clearly see the eﬀect of ﬁrst stageestimation on overall variance, the GQR estimator is compared with an infeasible quantile regression(i-QR) estimator which uses the true value of the ﬁrst step parameter instead of its estimate, that is,the unknown dependent variable Y ∗ i ( β ) = Y i − β X i , for QR based estimation of quantile parameters.Following Remark 4, asymptotic variance estimation for validating the asymptotic normality resultand for ﬁnding conﬁdence intervals follows Buchinsky (1994)’s design matrix bootstrap. Design matrixbootstrap is extensively used in empirical applications for quantile regression involving large samples,see, for instance, Buchinsky (1994) and Abrevaya (2002) . The approach is as follows. For B bootstrapreplications, each of size of m (drawn with replacement from an overall sample size of n ), b = 1 , · · · , B bootstrap quantile estimates are obtained at each quantile level. This follows the so-called m -out-of- n bootstrap technique which provides signiﬁcant computational advantage when sample size islarge. Following Buchinsky (1994), the sample covariance of these estimates, rescaled by ( m/n ),constitutes a valid estimator of the covariance matrix of the QR estimator. Hence, the estimate forasymptotic covariance V ( τ ) with quantile parameters β ( · ) and the bootstrap estimates denoted by See Buchinsky (1995) and Koenker & Hallock (2001) for a comparison of various QR variance estimators; theyconclude in favour of design matrix bootstrap. β b ( τ ), b = 1 , · · · , B , is given by b V ( τ ) = n (cid:16) mn (cid:17) B B X b =1 (cid:16) b β b ( τ ) − b β b A ( τ ) (cid:17) (cid:16) b β b ( τ ) − b β b A ( τ ) (cid:17) ′ , (5.2)where b β b A ( τ ) is the average of the B bootstrap estimates. We set B = 1000; for n = 1000, thebootstrap sample size is m = 300, while for n = 100, we have m = n . The choice of bootstrapreplications and sample size are consistent with Buchinsky (1995) and Andrews & Buchinsky (2000).We estimate b V ( τ ) from (5.2) for each of the 1000 simulations and report the average. For GQR in Tables 1-4, the ﬁrst step least squares regression gives the mean of ˆ β as 1 .

007 (withaverage standard deviation = 0 . .

001 (with average standarddeviation = 0 . b β ( · ) for GQR, standard QR andi-QR estimation methods, with varying n . All methods of estimation have low biases and the RMSEfalls with increasing sample size. We note that while all estimation procedures have similar biases,the RMSE with GQR is greater than that of QR for the ﬁrst quantile, and the opposite is true for therest of the quantiles, an observation we investigate further in Section 5.2. As expected, the RMSEwith GQR is greater than that of i-QR at each quantile, with substantial diﬀerence in some, due tothe added variance contribution from ﬁrst step estimation in the former.Table 2 reports the Bias-RMSE results for the slope parameter b β ( · ). The bias and RMSE aresimilar for all three methods of estimation and the RMSE falls with increase in sample size. Thefollowing remark explains this. Remark.

In the GQR asymptotic variance for the QR with constant slope model as given by (4.2),if the covariates X and X are independent, as considered here, it holds that(i) The covariance between ﬁrst and second step estimates is zero: C ( τ ) = 0.(ii) The ﬁrst step estimation has an eﬀect on the second-step variance for the intercept, b β ( τ ), butnot for the slope parameter b β ( τ ), as H ( τ ) − D ( τ ) in (4.2) evaluates to [ − E [ X ] , ′ .Proofs are straightforward using basic matrix algebra and its outline is presented in Appendix 3.As a means for validating the asymptotic normality result, Tables 3-4 compare the empirical90% ,

95% and 99% GQR conﬁdence intervals with that in theory for normal approximation, for n = 100and n = 1000. For τ = { . , . , . , . } , t-stat of the quantile parameters is computed usingbootstrapped standard error (SE) from (5.2) and its absolute value is compared with the criticalvalues for (1 − α ) conﬁdence level of the normal approximation, (1 − α ) = 0 . , .

95 and 0 .

99, to ﬁndif the true quantile parameter is inside the corresponding conﬁdence interval. Repeating the exercise1000 times, we ﬁnd the percentage of times when the true parameter lies inside the (1 − α ) conﬁdence16able 1: Bias and RMSE of ˆ β ( · ) for n = 100 and 1000 n = 100 n = 1000 τ Bias RMSE Bias RMSE0 . . . − . . . . . . . . − . . . . . − . . . . . . . . − . . . . . − . . . . . . . . . . . . . − . . − . . . . . . . . β ( · ) for n = 100 and 1000 n = 100 n = 1000 τ Bias RMSE Bias RMSE0 . . . . . . . − . . . . − . . . − . . . . − . . − . . − . . − . . . − . . . . − . . − . . − . . − . . . − . . . . − . . − . . − . . − . . τ = 0 . n = 100, but it improves for n = 1000 in Table 4. Overall, the empirical levels forconﬁdence intervals are close to (1 − α ) and improves with increasing sample size, which suggests thatthe estimation procedure gives accurate central limit theorem based conﬁdence intervals.Table 3: Conﬁdence intervals: nominal vs. empirical, n = 100, simulations = 1000CI for β ( · ) CI for β ( · )Nominal level 0 .

90 0 .

95 0 .

99 0 .

90 0 .

95 0 . τ = 0 . .

940 0 .

977 0 .

997 0 .

960 0 .

982 0 . .

949 0 .

978 0 .

996 0 .

923 0 .

970 0 . .

928 0 .

972 0 .

994 0 .

907 0 .

952 0 . τ = 0 . .

897 0 .

951 0 .

991 0 .

888 0 .

942 0 . .

899 0 .

948 0 .

987 0 .

888 0 .

940 0 . .

885 0 .

940 0 .

986 0 .

878 0 .

933 0 . τ = 0 . .

895 0 .

944 0 .

985 0 .

882 0 .

934 0 . .

893 0 .

936 0 .

981 0 .

883 0 .

940 0 . .

878 0 .

929 0 .

979 0 .

877 0 .

934 0 . τ = 0 . .

900 0 .

940 0 .

989 0 .

895 0 .

942 0 . .

905 0 .

950 0 .

984 0 .

890 0 .

942 0 . .

867 0 .

928 0 .

978 0 .

875 0 .

932 0 . n = 1000, simulations = 1000CI for β ( · ) CI for β ( · )Nominal level 0 .

90 0 .

95 0 .

99 0 .

90 0 .

95 0 . τ = 0 . .

897 0 .

952 0 .

992 0 .

906 0 .

954 0 . .

903 0 .

953 0 .

989 0 .

911 0 .

947 0 . .

884 0 .

940 0 .

985 0 .

893 0 .

939 0 . τ = 0 . .

885 0 .

933 0 .

987 0 .

882 0 .

944 0 . .

886 0 .

942 0 .

987 0 .

907 0 .

954 0 . .

901 0 .

947 0 .

988 0 .

895 0 .

946 0 . τ = 0 . .

886 0 .

952 0 .

985 0 .

891 0 .

937 0 . .

874 0 .

945 0 .

980 0 .

887 0 .

935 0 . .

893 0 .

946 0 .

985 0 .

893 0 .

942 0 . τ = 0 . .

902 0 .

952 0 .

991 0 .

899 0 .

941 0 . .

897 0 .

946 0 .

983 0 .

876 0 .

935 0 . .

892 0 .

946 0 .

990 0 .

883 0 .

942 0 . .2 Asymptotic variance: Further analysis In this section, we compare the GQR and QR asymptotic variances both analytically and throughsimulation, following the RMSE pattern observed in Table 1 which hints at their relative eﬃciencybeing quantile dependent. The data distribution and true parameter values, as assumed in the datagenerating process, allows comparison based on obtaining explicit asymptotic variance expressions forboth GQR and QR. Although the discussion here is speciﬁc to the assumed QR with constant slopemodel, it provides interesting insights, in particular, to the role of ﬁrst stage estimator in overall GQRvariance. Note that while asymptotic variance of GQR, which estimates the constant parameter β and quantile dependent ones ( β ( τ ) , β ( τ )) separately, is given by (4.2), that for standard QR whereall parameters are estimated together is given by V ( τ ) QR = H ( τ ) − QR J ( τ ) QR H ( τ ) − QR (5.3)where denoting X = [1 , X , X ] ′ , H ( τ ) QR = E " XX ′ β (1)0 ( τ ) + β (1)1 ( τ ) X + β (1)2 ( τ ) X , J ( τ ) QR = τ (1 − τ ) E (cid:2) XX ′ (cid:3) . Asymptotic variance for b β ( · ) . Under the remark noted in Section 5.1, the asymptotic varianceof b β ( · ) for GQR is obtained using (4.2) as follows: V ( τ ) GQR, = [1 , H ( τ ) − J ( τ ) H ( τ ) − [1 , ′ + E [ X ] V ( β ) (5.4)where H ( τ ) , J ( τ ) are given by (4.2). For the true model parameters and distribution considered here,this evaluates to V ( τ ) GQR, = τ (1 − τ )( ac − b ) (cid:0) c − bc E [ X ] + b E [ X ] (cid:1) + E [ X ] V ( β ) (5.5)where a = E " β (1)0 ( τ ) + β (1)2 ( τ ) X = 128 τ ln (cid:18) e τ + 40 τe τ + 12 τ (cid:19) ,b = E " X β (1)0 ( τ ) + β (1)2 ( τ ) X = 17 × τ (cid:18) τ − e τ ln (cid:18) e τ + 40 τe τ + 12 τ (cid:19)(cid:19) ,c = E " X β (1)0 ( τ ) + β (1)2 ( τ ) X = 1448 τ (cid:18) τ − τ e τ + e τ ln (cid:18) e τ + 40 τe τ + 12 τ (cid:19)(cid:19) E [ X ] = 13 / , E [ X ] = 139 / , E [ X ] = 3 . V ( β ) is given by (4.1), and for the model parameters consideredhere, evaluates to V ( β ) = i (cid:0) E [ XX ′ ] (cid:1) i ′ E (cid:2) ε (cid:3) + i (cid:0) E − [ XX ′ ] E [ X XX ′ ] E − [ XX ′ ] (cid:1) i ′ E (cid:2) ε (cid:3) = 34 E (cid:2) ε (cid:3) + 1394 E (cid:2) ε (cid:3) = 34 × (cid:18)

12 ( e − − ( e − (cid:19) + 1394 × − (cid:18) (cid:19) ! where ε i = (cid:0) β i ( U ) − β i (cid:1) , i = 0 , b β ( · ) for the standard QR is given by the ﬁrst element of (5.3), which,for the model parameters and distribution assumed in this exercise, evaluates to V ( τ ) QR, = τ (1 − τ )( ac − b ) (cid:0) c − bc E [ X ] + b E [ X ] (cid:1) + τ (1 − τ )( bf − dc ) ( ac − b ) a Var( X ) (5.6)where Var( X ) = 4 /

3, ( a , b , c ) are as in (5.5) and d = E " X β (1)0 ( τ ) + β (1)2 ( τ ) X = 328 τ ln (cid:18) e τ + 40 τe τ + 12 τ (cid:19) f = E " X X β (1)0 ( τ ) + β (1)2 ( τ ) X = 37 × τ (cid:18) τ − e τ ln (cid:18) e τ + 40 τe τ + 12 τ (cid:19)(cid:19) . It can be seen from (5.5) and (5.6) that the GQR and QR asymptotic variances have a commonquantile varying component; GQR has a constant additional component that depends on the ﬁrst stepasymptotic variance, while the additional part for QR is again quantile-dependent. The i-QR varianceis given by (5.5) by setting V ( β ) = 0. Figure 1 plots the asymptotic variance comparison for GQRand QR, as well as i-QR. Generated QR (GQR)Infeasible QR (i-QR)Standard QR

Figure 1: Asymptotic variance: GQR vs i-QR vs QR20s can be seen from Figure 1, the variance of the QR and i-QR estimators, being a function of τ (1 − τ ), is close to 0 at the very tails. The tail variance of both the QR and i-QR is less thanthat of GQR because the two step GQR procedure adds an additional constant variance contributionfrom the ﬁrst step estimation irrespective of the quantile level. But the opposite is true for all otherquantile levels, with GQR asymptotic variance being lesser than QR for most quantiles and especiallyprominent in the higher quantiles . While this exercise considered X and X to be independent,the empirical application suggests that for the general case as well, the pattern for GQR versus QRasymptotic variances remains similar. However, we note that there isn’t a clear eﬃciency gain of onemethod over the other - it depends on the quantile level and the choice of the ﬁrst stage estimatorwhich impacts the tail behaviour. Asymptotic variance for b β ( · ) . Under the remark noted in Section 5.1, the asymptotic varianceof b β ( · ) is same for GQR, QR and i-QR, given by, V ( τ ) GQR, = V ( τ ) QR, = [0 , H ( τ ) − J ( τ ) H ( τ ) − [0 , ′ = [0 , , H ( τ ) − QR J ( τ ) QR H ( τ ) − QR [0 , , ′ where ( H ( τ ) , J ( τ )) and ( H ( τ ) QR , J ( τ ) QR ) are obtained from (4.2) and (5.3), respectively. For thetrue model parameters and distribution considered here, this evaluates to V ( τ ) = τ (1 − τ )( ac − b ) (cid:0) b − ab E [ X ] + a E [ X ] (cid:1) (5.7)where ( a , b , c ) are as in (5.5).Tables 5-6 compare the bootstrapped asymptotic SE of b β ( · ), b β ( · ) obtained from (5.2) (meanof b V ( τ ) over the 1000 simulations is reported) with the true values obtained from their analyticalexpressions as derived above. It can be seen in Table 5 that the true asymptotic SE of b β ( · ) is greaterfor GQR than QR for τ = 0 . τ = 0 . b V ( τ ), which is the ratio of the standard deviation to the mean of b V ( τ ) overthe 1000 simulations. CoV measures the precision in estimation of the asymptotic SE (or variabilityamong the estimated values in each run of the simulation). Looking at CoV, it is interesting to notethat for GQR the estimates of asymptotic SE have lesser variation across simulations relative to theirmean values, and CoV is very similar across quantiles, than that of i-QR or QR. This suggests thatthe GQR asymptotic SE estimates are less dispersed around the mean than that of i-QR or QR. TheCoV falls for all methods with sample size; for n = 1000, it is well within 10% for GQR and slightly The relatively high GQR variance at the boundaries is driven by Assumption 3 that the density of X ( θ ) is boundedaway from 0. If it is relaxed, then the two-step estimator can become as good as one-step. When it is not relaxed, thesupport of X ( θ ) depends on θ . In that case, we can have faster than OLS estimation, like Smith (1994)’s nonregularregression which converges at rate 1 /n , so that the two-step estimator is asymptotically unaﬀected by the ﬁrst stage.The drawback of the latter is the requirement that error distribution is bounded away from 0 in its compact support.OLS, although slower, does not suﬀer from such restrictions. β ( · ) is unaﬀected by the twostep procedure, as QR, i-QR and GQR yield identical true values, and similar bootstrapped estimatesas well as CoV. Also, in Table 5 and, to a lesser degree in Table 6, we ﬁnd a slight overestimation ofthe GQR variance and underestimation of the QR one.Table 5: Asymptotic standard error for ˆ β ( · ), B = 1000, simulations = 1000 n = 100 n = 1000 τ True Estimated CoV Estimated CoV0 . . . .

43% 13 . . . . .

22% 9 . . . . .

51% 7 . . . . . .

03% 19 . . . . .

49% 20 . . . . .

46% 15 . . . . . .

03% 25 . . . . . . . . .

53% 22 . . . . . .

08% 27 . . . . .

83% 32 . . . . .

82% 24 . . The true asymptotic SE for GQR and QR are computed using (5.5) and (5.6), respectively, whilefor i-QR, the ﬁrst step variance=0 in the formula for GQR. Mean over 1000 simulations of thebootstrapped asymptotic SE (5.2) is reported. CoV denotes coeﬃcient of variation and indicates theextent of variability in the estimates for each run of the simulation.

As mentioned earlier, the GQR asymptotic variance is impacted by the choice of the ﬁrst stageestimator. While OLS is a natural choice for estimating the constant ﬁrst stage slope, as we haveconsidered till now, linearity in OLS is a restriction and choosing from a wider class including non-linear estimators is likely to produce variance improvement. In Table 7, we report bootstraped GQRasymptotic SE for ˆ β ( · ) using two QR-based ﬁrst stage estimators of β , apart from OLS: the mean ofthe quantile estimates ˆ β ( τ ) over 19 equidistant quantile levels { . , . , . . . , . } , and the weightedaverage of ˆ β ( τ ) at τ = (1 / , / , , 2 /

3) with weights (0 . , . , . β ( · ) is not reported, since its variance is unaﬀected by ﬁrst stage, as noted earlier.Comparison of bootstrapped GQR asymptotic SE using diﬀerent ﬁrst stage estimators shows that QR-based estimators yield more eﬃcient results than OLS, which is not surprising as quantile regressioncan be more eﬃcient than least squares in absence of i.i.d and Gaussian error assumption. QRmean, which assigns equal weights to all quantiles, results in poorer GQR eﬃciency than Gastwirth’sweighted QR which gives more weight to the median and lesser to the tails - this estimator is knownto have higher eﬃciency in a large class of distributions (see, for example, Koenker & Bassett (1978)).Optimal ﬁrst stage estimator for the constant slopes in QR models and semi-parametric eﬃciency ofthe two-step GQR is an interesting study left for future research.22able 6: Asymptotic standard error for ˆ β ( · ), B = 1000, simulations = 1000 n = 100 n = 1000 τ True Estimated CoV Estimated CoV0 . . . .

13% 1 . . . . .

06% 1 . . . . .

32% 1 . . . . . .

43% 2 . . . . .

07% 2 . . . . .

82% 2 . . . . . .

58% 4 . . . . .

50% 4 . . . . .

66% 4 . . . . . .

36% 4 . . . . .

47% 4 . . . . .

14% 4 . . The true asymptotic SE for all methods is given by (5.7). The rest of explanation is as in Table 5.

Table 7: GQR bootstrapped asymptotic SE for ˆ β ( · ): Varying ﬁrst stage estimators τ β estimator n = 100 n = 1000(First stage)0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . QR mean averages quantile estimates ˆ β ( τ ) over τ = { . , . , . . . , . } . Weighted QR assigns the weights(0 . , . , .

3) to ( ˆ β (1 / , ˆ β (1 / , ˆ β (2 / Empirical Application

The two step estimation procedure of Section 2.1.1 can be useful in estimating auction models asin the quantile regression approach of Gimenes (2017). In ﬁrst price auctions, a quantile regres-sion speciﬁcation for the private value generates a quantile regression speciﬁcation for the bid, seeGimenes & Guerre (2020). The linear regression approach of Haile, Hong & Shum (2003) for esti-mating ﬁrst price auction models uses the ‘homogenized bid’ technique, which implies constant slopeparameters in the bid quantile regression model. It is shown here that the two approaches can becombined, as in the example of Section 2.1.1. We apply the GQR estimator for the estimation of bidquantile speciﬁcation containing both quantile-constant and quantile-dependent slope parameters. Inthe ﬁrst step, following Haile et al. (2003), the constant slope parameter is estimated by regressingthe bids on the observed covariates. This is then used to generate the dependent variable for thequantile regression for estimating the quantile parameters The aim of our empirical exercise is to seehow imposing a constant slope for a given set of variables can improve the estimation of the otherslope functions.We illustrate our proposed methodology using data from ﬁrst price timber auctions conducted bythe US Forest Services (USFS) covering the western half of US in the year 1979. This is the samedata used by Lu & Perrigne (2008). The data consists of 214 ﬁrst price auctions with 2 bidders, andthe covariates listed are appraisal value and timber volume (in log).

Bid homogenization.

Figure 2 shows the bid quantile parameter estimates obtained from quantileregression of bids on the covariates along with the corresponding OLS estimates and its 95% conﬁdenceinterval. Intercept and appraisal value quantile slope coeﬃcients seem to satisfy the assumption ofconstancy across quantiles. However, the volume quantile parameter does not seem to be constant.

Intercept

OLSQR95% CI OLS

Appraisal Value

OLSQR95% CI OLS

Volume

OLSQR95% CI OLS

Figure 2: Bid quantile parameter estimates

Bid quantile estimation using GQR.

The GQR estimator involves constrained estimation as-suming the intercept and appraisal value slope to be constant across quantiles, while the volumeparameter is considered to be varying with quantile levels. Table 8 reports the result of linear re-gression of bids on the covariates. The ﬁrst step estimates constitute the intercept and appraisalvalue slope regression estimates, while the quantile estimates for slope of volume is obtained throughquantile regression of the generated dependent variable ( bids i − ( − . − . × appraisal value i )on volume i . The second step GQR bid quantile estimate for slope of volume is shown in Figure 3.24or comparison purpose, we also plot the results of unconstrained estimation of quantile parametersof volume. Table 9 also reports the bootstrapped standard error (SE) of the constrained and theunconstrained estimator obtained from 10 ,

000 bootstrap replications.Table 8: First step - bid regressionIntercept Appraisal value Volume R − .

07 1 .

01 4 .

07 0 . .

72) (0 .

04) (1 .

95% bootstrapped conﬁdence interval for QR shown by red dotted lines, and for GQR by blue shaded region

As can be seen in Figure 3 and Table 9, the GQR slope estimate is more regular than that of theunconstrained estimation; the GQR estimates increase with quantile level, which is consistent withan increasing bid conditional quantile function. The reason is likely that the ﬁrst stage estimationremoves some of the regressors in the GQR stage along with the associated variation, thereby improvingmonotonicity and smoothness of the quantile estimates. An accuracy improvement, especially in thehigher quantiles, is also evident as the GQR bootstrap conﬁdence interval is much smaller in higherquantiles. The SE pattern observed in Table 9 is as expected from the analysis in Section 5.2 althoughthe covariates are no longer independent: constrained SE is lesser than that obtained by unconstrainedestimation except for the ﬁrst three quantile levels. Also note that the SE for the unconstrainedestimator varies quite a lot across quantiles and is quite high for the higher quantiles, which areparticularly important for auction models as winners reside here. An intuitive explanation for theSE pattern observed here is as follows. The asymptotic variance of the unconstrained estimator willhave the form given by (5.3): in the tails, while τ (1 − τ ) tends to make the quantile estimate moreprecise, the derivative of an increasing quantile slope parameter has an opposite eﬀect. In higherquantiles, as is typical with quantile regression, the latter eﬀect dominates, reducing the precision of25able 9: Bootstrapped SE for constrained vs unconstrained quantile estimation of volumeConstrained (GQR) Unconstrained τ Estimate SE Estimate SE0 . . . − . . . . . . . . . . − . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H − J H − term which increases with τ for increasing slope parameter,there is a negative quantile eﬀect due to the covariance term being negative in τ for an increasingslope parameter. So, the net quantile-dependent eﬀect is reduced and the SE is more uniform acrossquantiles. Hence, at lower quantile levels, the SE of the GQR estimator is greater than that of theunconstrained one because of the constant contribution of ﬁrst step variance. But at higher quantiles,SE of the unconstrained estimator is much greater.In general, the unconstrained quantile regression involves ﬁtting the model at each quantile level,for estimating both the constant and the quantile-dependent model parameters, and thus loses out onthe information that some covariate eﬀects are common across quantiles. It is well noted in literaturethat for estimating quantile models that have some common covariate eﬀects, eﬃciency gain canbe achieved by aggregating the information across multiple quantiles, as in the combined quantileregression approach of Zou & Yuan (2008). The GQR estimator utilizes the commonality informationand improves upon eﬃciency - the overall eﬃciency gain and tail behaviour will, as noted earlier,depend on the choice of ﬁrst step estimator. This paper studies two step estimation of quantile regression models with generated covariates and/ordependent variable. The asymptotic normality of this generated quantile regression (GQR) estimatoris derived using the Bahadur expansion approach. The results are veriﬁed using simulation and anapplication based on auctions is carried out. We mention some relevant areas of application. Inparticluar, analysis of QR models with some constant slopes suggests potential beneﬁts of the twostage procedure in terms of improvements in monotonicity, smoothness and estimation accuracy. Akey technical contribution of the paper is to provide Bahadur expansion which holds uniformly with26espect to ﬁrst step parameter and quantile levels, which can be utilised for developing speciﬁcationtests (like those developed in Koenker & Machado (1999) and Koenker & Xiao (2002)) as well as toobtain functional central limit theorem for the two step quantile regression estimator.A slightly diﬀerent problem that can be studied using techniques developed here relates to quantilespeciﬁcations where a ﬁrst step estimation impacts the quantile level for the second stage quantileregression. Such speciﬁcations arise in Arellano & Bonhomme (2017)’s method of quantile regressionwith “rotated” check function to correct for sample selection in quantile regression models. A morechallenging problem open for future research is to consider the case where the ﬁrst stage converges ata slower rate, like in quantile regression models for panel data where the ﬁrst step within estimator isusually √ n -consistent and the quantile estimator is √ nT -consistent.27 ppendix 1. Proof section Notations.

The notation ≍ is deﬁned as: sequences { x n } and { y n } satisfy x n ≍ y n if | x n | /C ≤| y n | ≤ C | x n | , for some C > n large enough. ||·|| is the Euclidean norm. The largest eigenvaluein absolute value for a symmetric matrix A is || A || = sup u ∈B (0 , || Au || = sup u ∈B (0 , | u ′ Au | . Also, forany matrix or vector B , || AB || ≤ || A || || B || . We denote || f ( ·|· ) || ∞ = sup y,x | f ( y | X ) | . And the notation ≻ denotes that, for two symmetric matrices A , A , A ≻ A if and only if A − A is a positivedeﬁnite symmetric matrix.Deﬁne Q ( β ; τ, θ ) = E (cid:2) ρ τ (cid:0) Y ( θ ) − X ′ ( θ ) β (cid:1)(cid:3) − E [ ρ τ ( Y ( θ ))] . As ρ τ ( · ) is almost everywhere diﬀerentiable with bounded derivatives, β Q ( β ; τ, θ ) is diﬀerentiablewith ﬁrst derivative Q (1) ( β ; τ, θ ) = E [ { I ( Y ( θ ) ≤ X ′ ( θ ) β ) − τ } X ( θ )] = E [ { F ( X ′ ( θ ) β | X, θ ) − τ } X ( θ )] . Hence Q ( · ; τ, θ ) is twice continuously diﬀerentiable with respect to β , with a second derivative Q (2) ( β ; τ, θ ) = E (cid:2) f (cid:0) X ′ ( θ ) β | X, θ (cid:1) X ( θ ) X ′ ( θ ) (cid:3) = Z f (cid:0) x ′ β | x, θ (cid:1) xx ′ f X ( x | θ ) dx. Let B ( θ ) be the set of β ′ such that 0 < F ( x ′ ( θ ) β | x, θ ) < x of X ( θ ), B ( θ ) = (cid:8) β ; there is an inner x of X ( θ ) such that y ( θ | x ) < x ′ β < y ( θ | x ) (cid:9) where y ( θ | x ) = F − (0 | x ) and y ( θ | x ) = F − (1 | x ). The next Lemma describes some key properties of Q (2) ( β ; τ, θ ) and Q ( β ; τ, θ ). Lemma 1

Under Assumption 3 it holds(i) Q (2) ( β ; τ, θ ) is continuous with respect to its three arguments, with (cid:13)(cid:13)(cid:13) Q (2) ( β ; τ, θ ) − Q (2) ( β ; τ, θ ) (cid:13)(cid:13)(cid:13) ≤ C k β − β k for all β and β , θ ∈ Θ and τ ∈ [0 , .(ii) Q (2) ( β ; τ, θ ) is strictly positive for all β ∈ B( θ ) , θ ∈ Θ and τ ∈ [0 , .(iii) For θ ∈ Θ and τ ∈ (0 , , Q ( β ; τ, θ ) has a unique minimizer β ( τ ; θ ) which is continuouslydiﬀerentiable in θ and τ with ∂β ( τ ; θ ) ∂θ ′ = H ( τ ; θ ) − D ( τ ; θ ) ,∂β ( τ ; θ ) ∂τ = H ( τ ; θ ) − E [ X ( θ )] , where H ( τ ; θ ) and D ( τ ; θ ) are as in Proposition 1. Proof of Lemma 1. (i) directly follows from Assumption 3 and the Lebesgue Dominated Con-vergence Theorem. For (ii), Assumption 3 gives that, for each β in B ( θ ), there is an open subset O = O β,θ of X ( θ ) such that Q (2) ( β ; τ, θ ) (cid:23) R O xx ′ dx. Hence, H ( τ ; θ ) = Q (2) ( β ( τ ; θ ); τ, θ ) has aninverse. For (iii), observe that Q ( β ; τ, θ ) is bounded away from −∞ , so that it has local minimizers28hich must satisfy the ﬁrst order condition0 = Q (1) ( β ; τ, θ ) = E (cid:2)(cid:8) F (cid:0) X ′ ( θ ) β | X, θ (cid:1) − τ (cid:9) X ( θ ) (cid:3) . (A1.1)Hence these minimizers must lie in B ( θ ) as outside this set it holds F ( X ′ ( θ ) β | X, θ ) = 1 a.s, or F ( X ′ ( θ ) β | X, θ ) = 0 a.s. Now, if there are two such local minimizers β ( τ ; θ ) and β ( τ ; θ ), convexityimplies that all β π ( τ ; θ ) = (1 − π ) β ( τ ; θ ) + πβ ( τ ; θ ), 0 ≤ π ≤

1, must be global minimizers,contradicting that Q (2) ( β π ( τ ; θ ) ; τ, θ ) is strictly positive as Q (1) ( β π ( τ ; θ ) ; τ, θ ) = 0 for all π in [0 , (cid:3) Proof of Proposition 1.

Follows from Lemma 1-(iii). (cid:3)

Proof of Proposition 2-(i).

This proof conducts a uniform order study of the Bahadur errorterm (3.4). Deﬁne the following L n ( γ, τ ; θ ) = n X i =1 (cid:26) ρ τ (cid:18) Y i ( θ ) − X i ( θ ) ′ (cid:18) γ √ n + β ( τ ; θ ) (cid:19)(cid:19) − ρ τ (cid:0) Y i ( θ ) − X i ( θ ) ′ β ( τ ; θ ) (cid:1)(cid:27) , such that √ n (cid:16) b β ( τ ; θ ) − β ( τ ; θ ) (cid:17) = arg min γ L n ( γ, τ ; θ ) . In what follows, we write b α ( τ ; θ ) ≡ − H − ( τ ; θ ) b S ( τ ; θ ) (A1.2) b S ( τ ; θ ) = 1 √ n n X i =1 s i ( τ ; θ ) . (A1.3)It follows from (3.4) that b E ( τ ; θ ) = arg min ǫ L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) , where L n ( γ, ǫ, τ ; θ ) = L n ( γ + ǫ, τ ; θ ) − L n ( γ, τ ; θ ) . (A1.4)Consider the following decomposition of L n ( γ, ǫ, τ ; θ ). L n ( γ, ǫ, τ ; θ ) = L n ( γ, ǫ, τ ; θ ) + R n ( γ, ǫ, τ ; θ ) , where L n ( γ, ǫ, τ ; θ ) = b S ( τ ; θ ) ′ ( γ + ǫ ) + 12 ( γ + ǫ ) ′ H ( τ ; θ )( γ + ǫ ) − b S ( τ ; θ ) ′ γ − γ ′ H ( τ ; θ ) γ = b S ( τ ; θ ) ′ ǫ + 12 ǫ ′ H ( τ ; θ )( ǫ + 2 γ ) . (A1.5) L n ( γ, ǫ, τ ; θ ) is the quadratic approximation of L n ( γ, ǫ, τ ; θ ) and R n ( γ, ǫ, τ ; θ ) is the remainder term. Asmentioned under ‘Heuristics’ in Section 3, a uniform order for b E ( τ ; θ ) relies on a uniform order study forthe remainder term R n ( γ, ǫ, τ ; θ ), using concepts of maximal inequality under bracketing conditionsgiven in Massart (2007), and on linearization techniques to study b E ( τ ; θ ) given in Hjort & Pollard(2011). 29 niform order for R n ( γ, ǫ, τ ; θ ) . The remainder term is R n ( γ, ǫ, τ ; θ ) = L n ( γ, ǫ, τ ; θ ) − L n ( γ, ǫ, τ ; θ ) = P ni =1 R i ( γ, ǫ, τ ; θ ), where R i ( γ, ǫ, τ ; θ ) = (cid:26) ρ τ (cid:18) Y i ( θ ) − X i ( θ ) ′ (cid:18) γ + ǫ √ n + β ( τ ; θ ) (cid:19)(cid:19) − ρ τ (cid:18) Y i ( θ ) − X i ( θ ) ′ (cid:18) γ √ n + β ( τ ; θ ) (cid:19)(cid:19)(cid:27) − s i ( τ ; θ ) √ n ′ ǫ − ǫ ′ H ( τ ; θ ) n ( ǫ + 2 γ ) . (A1.6)Deﬁne also R i ( γ, ǫ, τ ; θ ) = R i ( γ, ǫ, τ ; θ ) + 12 ǫ ′ H ( τ ; θ ) n ( ǫ + 2 γ ) , (A1.7) R i ( γ, ǫ, τ ; θ ) = R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ ) | X i ( θ )] , (A1.8) R i ( γ, ǫ, τ ; θ ) = E [ R i ( γ, ǫ, τ ; θ ) | X i ( θ )] − ǫ ′ H ( τ ; θ ) n ( ǫ + 2 γ ) , (A1.9)such that R n ( γ, ǫ, τ ; θ ) = R n ( γ, ǫ, τ ; θ ) + R n ( γ, ǫ, τ ; θ ) , with , R jn ( γ, ǫ, τ ; θ ) = n X i =1 R ji ( γ, ǫ, τ ; θ ) , j = 1 , . (A1.10)The following Lemmas provide uniform bounds for the suprema of the constituents of the remainderterm R n and for ˆ S (see Appendix 2 for their proofs). Lemma 2

Under Assumption 3, for real numbers t γ , t ǫ > with t γ ≍ log / n , t γ ≥ , t ǫ = (cid:16) t log / n (cid:17) /n / for some t > , such that ( t γ + t ǫ ) / /t ǫ ≤ O (cid:16) n / / log / n (cid:17) , for large n , E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≤ C log / nn / t ǫ ( t γ + t ǫ ) / . Lemma 3

Under Assumption 3, for real numbers t γ , t ǫ > deﬁned as in Lemma 2, such that t γ /t ǫ = O (cid:16) n/ log / n (cid:17) , for large n , E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≤ C t ǫ ( t γ + t ǫ ) n / . Lemma 4

Under Assumption 3, sup ( τ,θ ) ∈ [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b S ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O P (log / n ) . In what follows, t n = t log / nn / , t > , such that t n = o (cid:16) log / n (cid:17) . t n plays the role of t ǫ in theLemmas, while t γ is chosen such that t γ ≍ log / n . Hence,( t γ + t ǫ ) / t ǫ ≍ n / log / nt log / n = 1 t O n / log / n ! t γ t ǫ ≤ C n / log / nt log / n ≤ C n / log / n ≤ C n / n / log / n = O (cid:18) n log / n (cid:19) , n . These choices for t γ and t ǫ satisfy the requirements for the Lemmas. Lemma 1-(ii),which proves existence of H − for all τ ∈ [ τ , τ ] and θ ∈ Θ, implies that b α ( τ ; θ ) is well deﬁned with aprobability tending to 1. Lemma 4 impliessup ( τ,θ ) ∈ [ τ,τ ] × Θ || b α ( τ ; θ ) || = O P (cid:16) log / n (cid:17) . (A1.11)Consider ξ > C ξ such that, for large n and some ϕ > P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ ϕt n ! ≤ P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ ϕt n , sup τ,θ ∈ [ τ,τ ] × Θ || b α ( τ ; θ ) || ≤ C ξ log / n ! + P sup τ,θ ∈ [ τ,τ ] × Θ || b α ( τ ; θ ) || > C ξ log / n ! ≤ P  sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t n ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≥ ϕt n  + ξ. Since R n = R n + R n , Lemmas 2-3, and Markov inequality give P sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t n ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≥ ϕt n ! ≤ Ct n  E  sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) + E  sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) ≤ Ct n log / nn / (cid:18) C ξ + t n log / n (cid:19) / + (cid:18) log nn (cid:19) / (cid:18) C ξ + t n log / n (cid:19) ! . Using t n = (cid:16) t log / n (cid:17) /n / and since (log n ) /n = o (1), we getlim n →∞ P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ ϕt n ! = ξ + O C ξ / t ! . (A1.12) Uniform order for b E ( τ ; θ ) . Consider T n ≥ t n and ǫ = T n e , || e || = 1 so that || ǫ || ≥ t n . Since ρ τ ( · ) isconvex, L n ( β ( τ ; θ ) , ǫ, τ ; θ ) is convex. Recall that from (A1.4) and (A1.5), L n ( β ( τ ; θ ) , , τ ; θ ) = 0 and L n = L n + R n . Then, using convexity property, t n T n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) = t n T n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) + (cid:18) − t n T n (cid:19) L n ( b α ( τ ; θ ) , , τ ; θ ) ≥ L n (cid:18)b α ( τ ; θ ) , t n ǫ T n , τ ; θ (cid:19) = L n ( b α ( τ ; θ ) , t n e, τ ; θ ) = L n ( b α ( τ ; θ ) , t n e, τ ; θ ) + R n ( b α ( τ ; θ ) , t n e, τ ; θ ) . Since b E ( τ ; θ ) = arg min ǫ L n ( b α ( τ ; θ ) , ǫ, τ ; θ ), we have n(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b E ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ t n o ⊂ (cid:26) inf ǫ ; || ǫ ||≥ t n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) ≤ inf ǫ ; || ǫ || τ ∈ [ τ , τ ] and θ ∈ Θ, H ( τ ; θ ) ≻ CM where M = E [ X ( θ ) X ( θ ) ′ ] , and for all u in R P , u ′ M u = E (cid:2) u ′ X ( θ ) X ( θ ) ′ u (cid:3) = E h(cid:0) u ′ X ( θ ) (cid:1) i = Z (cid:0) u ′ x (cid:1) f X ( x | θ ) dx ≥ C Z H (cid:0) u ′ x (cid:1) dx ≥ C || u || , where the last bound uses the fact that u (cid:16)R ( u ′ x ) dx (cid:17) / is a norm and norm over R P areequivalent. Hence, for any non-zero u ∈ R P , M is a positive deﬁnite matrix. This implies that if φ M is the smallest eigenvalue of M , then, φ M >

0. Since H ( τ ; θ ) ≻ CM , it follows for the smallesteigenvalue of the positive deﬁnite symmetric matrix H ( τ ; θ ), denoted by φ ( τ ; θ ), thatinf ( τ,θ ) ∈ [ τ,τ ] × Θ φ ( τ ; θ ) ≥ Cφ M + o P (1); for some φ M > . (A1.14)Consider inf ( τ,θ ) ∈ [ τ,τ ] × Θ inf || ǫ || = t n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ). The above result gives, for any ǫ with || ǫ || ≥ t n , L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) = ǫ ′ H ( τ ; θ ) ǫ ≥ φt n . Hence, from (A1.12) and (A1.13), we havelim n →∞ P  sup ( τ,θ ) ∈ [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b E ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ t n  ≤ lim n →∞ P  sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ φt n  ≤ lim n →∞ P  sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ φt n  = ξ + O C ξ / t ! . The latter can be made arbitrarily small by choosing ξ arbitrarily small and t large enough. Recalling t n = ( t log / n ) /n / proves Proposition 2-(i). Note that O P (cid:16) log / nn / (cid:17) = (cid:16) log / nn / (cid:17) O P (1) = o P (1). (cid:3) roof of Proposition 2-(ii). Setting Z i ( θ ) = Y i ( θ ) − X ′ i ( θ ) β ( τ ; θ ), b S ( τ ; θ ) − b S ( τ ; θ ) = 1 √ n n X i =1 e s i ( τ ; θ ) , where e s i ( τ ; θ ) = [ X i ( θ ) { I ( Z i ( θ ) ≤ − τ } − X i ( θ ) { I ( Z i ( θ ) ≤ − τ } ] ≤ X i ( θ ) + X i ( θ ))Denoting e s iℓ ( τ ; θ ) as the ℓ -th coordinate of the vector e s i ( τ ; θ ) implies (cid:12)(cid:12)(cid:12)(cid:12) e s iℓ ( τ ; θ ) √ n (cid:12)(cid:12)(cid:12)(cid:12) ≤ C √ n ≍ n − / ≡ ν ′′′ By Assumption 2, for C (1) < ∞ such that sup θ ∈ Θ (cid:13)(cid:13) ∂∂θ ′ [ Z i ( θ )] (cid:13)(cid:13) ≤ C (1) , Taylor inequality gives | Z i ( θ ) − Z i ( θ ) | ≤ C (1) || θ − θ || . Then, under Assumptions 1 and 3, and removing the subscript i to denote random variables, we have V ar (cid:18) e s ℓ ( τ ; θ ) √ n (cid:19) = 1 n E h (( X ℓ ( θ ) τ − X ℓ ( θ ) τ ) + ( X ℓ ( θ ) I [ Z ( θ ) ≤ − X ℓ ( θ ) I [ Z ( θ ) ≤ i ≤ n E h ( X ℓ ( θ ) τ − X ℓ ( θ ) τ ) + ( X ℓ ( θ ) ( I [ Z ( θ ) ≤ − I [ Z ( θ ) ≤ I [ Z ( θ ) ≤

0] ( X ℓ ( θ ) − X ℓ ( θ ))) i ≤ Cn k θ − θ k + Cn E (cid:20) I (cid:18) − C √ n ≤ Z ( θ ) ≤ C √ n (cid:19)(cid:21) ≤ Cn k θ − θ k = O (cid:16) n − / (cid:17) . Hence, the standard deviation of e s ℓ ( τ ; θ ) / √ n is σ ′′′ ≍ n − / . Then arguing as in Steps 1-2 of Lemma2 (see Appendix 2), E " sup τ ∈ [ τ,τ ] , || θ − θ ||≤ C/ √ n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b S ℓ ( τ ; θ ) − b S ℓ ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O (cid:16) n / σ ′′′ log / n + (cid:0) σ ′′′ + ν ′′′ (cid:1) log n (cid:17) = O log / nn / ! . Note that by Lemma 1 we have sup ( τ,θ ) ∈ [ τ,τ ] ×B ( θ ,Cn − / ) (cid:13)(cid:13) H − ( θ ; τ ) − H − ( θ ; τ ) (cid:13)(cid:13) = O (cid:0) n − / (cid:1) andsup ( τ,θ ) ∈ [ τ,τ ] ×B ( θ ,Cn − / ) (cid:13)(cid:13) H − ( θ ; τ ) (cid:13)(cid:13) ≤ C . Markov inequality and Lemma 4, then, explain the orderin (3.6). (cid:3) Appendix 2. Proofs of intermediary Lemmas for Proposition 2

Proof of Lemma 2.

Bound for R n ( γ, ǫ, τ ; θ ) is based on Massart’s maximal inequality underbracketing entropy Theorem 6 .

8, the conditions for which are proven in Step 1. This ﬁrst requiresstudying variance of R ( γ, ǫ, τ ; θ ). Variance of R ( γ, ǫ, τ ; θ ) . Note that ρ a ( b ) = ( a − I ( b < b = R b ( a − I ( t < dt . Denoting δ ( γ ; θ ) = X ( θ ) ′ γ/ √ n, and Z ( τ ; θ ) = Y ( θ ) − X ( θ ) ′ β ( τ ; θ ) , (A2.1)and using deﬁnitions in (A1.6) and (A1.7), for a given θ ∈ Θ, R ( γ, ǫ, τ ; θ ) = ρ τ ( Z ( τ ; θ ) − δ ( γ + ǫ ; θ )) − ρ τ ( Z ( τ ; θ ) − δ ( γ ; θ )) − δ ( ǫ ; θ ) ( I ( Z ( τ ; θ ) ≤ − τ )33 Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) ( I ( Z ( τ ; θ ) ≤ t ) − I ( Z ( τ ; θ ) ≤ dt. (A2.2)Using Cauchy-Schwarz inequality, R ( γ, ǫ, τ ; θ ) ≤ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12) R δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) ( I ( Z ( τ ; θ ) ≤ t ) − I ( Z ( τ ; θ ) ≤ dt (cid:12)(cid:12)(cid:12)(cid:12) ≤| δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12) R δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) I ( | Z ( τ ; θ ) | ≤ | t | ) dt (cid:12)(cid:12)(cid:12)(cid:12) . Under Assumption 3, E [ R ( γ, ǫ, τ ; θ ) | X ( θ )] ≤ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) (cid:26)Z I (cid:0) | y − X ( θ ) ′ β ( τ ; θ ) | ≤ | t | (cid:1) f ( y | X, θ ) dy (cid:27) dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || f ( ·|· , · ) || ∞ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) | t | dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = || f ( ·|· , · ) || ∞ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z δ ( ǫ ; θ )0 | δ ( γ ; θ ) + u | du (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || f ( ·|· , · ) || ∞ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z | δ ( ǫ ; θ ) | ( | δ ( γ ; θ ) | + | u | ) du (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C || X ( θ ) || n / || ǫ || ( || γ || + || ǫ || ) . Therefore, Var( R ( γ, ǫ, τ ; θ )) ≤ E [ R ( γ, ǫ, τ ; θ )] = E [ E [ R ( γ, ǫ, τ ; θ ) | X ( θ )]] ≤ E h C || X ( θ ) || n / || ǫ || ( || γ || + || ǫ || ) i = C || ǫ || ( || γ || + || ǫ || ) n / R || x || f X ( x | θ ) dx ≤ C || ǫ || ( || γ || + || ǫ || ) n / . Step 1. Brackets of { R ( γ, ǫ, τ ; θ ) } . Let F = { R ( γ, ǫ, τ ; θ ); ( γ, ǫ, τ ; θ ) ∈ B (0 , t γ ) ×B (0 , t ǫ ) × [ τ , τ ] × Θ } .This step ﬁnds coverings of F with brackets [ R, R ], where the bracket [

R, R ] is the set of all R j such that R ≤ R j ≤ R almost surely. Deﬁne for γ in R P e R ( γ, τ ; θ ) = R δ ( γ ; θ )0 ( I ( Z ( τ ; θ ) ≤ t ) − I ( Z ( τ ; θ ) ≤ dt, which is such that, from (A2.2), R ( γ, ǫ, τ ; θ ) = e R ( γ + ǫ, τ ; θ ) − e R ( γ, τ ; θ ) (A2.3)Let sgn( t ) = I ( t ≥ − I ( t < u = t/ sgn( δ ( γ ; θ )), we have e R ( γ, τ ; θ ) = Z | δ ( γ ; θ ) | ( I ( Z ( τ ; θ ) ≤ sgn( δ ( γ ; θ )) u ) − I ( Z ( τ ; θ ) ≤ δ ( γ ; θ )) du = Z | δ ( γ ; θ ) | | I ( Z ( τ ; θ ) ≤ sgn( δ ( γ ; θ )) u ) − I ( Z ( τ ; θ ) ≤ | du = | δ ( γ ; θ ) | Z | I ( Z ( τ ; θ ) ≤ δ ( γ ; θ ) v ) − I ( Z ( τ ; θ ) ≤ | dv, = | δ ( γ ; θ ) | Z | I ( Z ( τ ; θ ) lies between 0 and δ ( γ ; θ ) v ) | dv, (A2.4)where the second last line is obatined using change of variable v = u/ | δ ( γ ; θ ) | . Hence, 0 ≤ e R ( γ, τ ; θ ) ≤| δ ( γ ; θ ) | . Then, using the deﬁnition of δ ( γ ; θ ) in (A2.1), we get for all γ ∈ B (0 , t γ + t ǫ ), | e R ( γ, τ ; θ ) | ≤ || X ( θ ) || || γ ||√ n ≤ ν , where ν ≍ t γ + t ǫ √ n . (A2.5)It follows from (A2.3) and the variance bound obtained earlier that E h | R ( γ, ǫ, τ ; θ ) − E [ R ( γ, ǫ, τ ; θ )] | k i = E (cid:20)(cid:12)(cid:12)(cid:12) e R ( γ + ǫ, τ ; θ ) − E h e R ( γ + ǫ, τ ; θ ) i − n e R ( γ, τ ; θ ) − E h e R ( γ, τ ; θ ) io(cid:12)(cid:12)(cid:12) k − | R ( γ, ǫ, τ ; θ ) − E [ R ( γ, ǫ, τ ; θ )] | (cid:21) (cid:18) × ν (cid:19) k − Var( R ( γ, ǫ, τ ; θ )) ≤ k !2 ν k − σ , where σ ≍ t ǫ ( t ǫ + t γ ) n / . (A2.6)In order to ﬁnd covering for F , we ﬁrst deﬁne e F t = { e R ( γ, τ ; θ ); ( γ, τ, θ ) ∈ B (0 , t ) × [ τ , τ ] × Θ } andshow that it is suﬃcient to ﬁnd covering of e F t , with set of brackets { [ R j , R j ] , ≤ j ≤ e h ( t b ; t ) } , where t b ∈ (0 ,

1) denotes length of a bracket, satisfying, E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) k i ≤ k !8 (cid:18) ν (cid:19) k − t b , (A2.7) h ( t b ; t ) ≤ C log (cid:18) ntt b (cid:19) . (A2.8)Consider the following two coverings of e F t γ and e F t γ + t ǫ e F t γ ⊂ [ ≤ j ≤ e h ( tb ; tγ ) h R j , R j i , e F t γ + t ǫ ⊂ [ ≤ j ≤ e h ( tb ; tγ + tǫ ) h R j , R j i If such coverings of e F t γ and e F t γ + t ǫ exist, then for every ( γ, ǫ, τ ; θ ), e R ( γ, τ ; θ ) ∈ h R j , R j i , e R ( γ + ǫ, τ ; θ ) ∈ h R j , R j i , for some j and j , and from (A2.3), we have R ( γ, ǫ, τ ; θ ) ∈ h R j − R j , R j − R j i .Hence, F can be covered by e h ′ ( t b ; t ) brackets such that, using (A2.7) and (A2.8), h ′ ( t b ; t ) = h ( t b ; t γ ) + h ( t b ; t γ + t ǫ ) ≤ C log (cid:18) n ( t γ + t ǫ ) t b (cid:19) , and E (cid:20)(cid:12)(cid:12)(cid:12) R j − R j − (cid:16) R j − R j (cid:17)(cid:12)(cid:12)(cid:12) k (cid:21) = E (cid:20)(cid:12)(cid:12)(cid:12)(cid:16) R j − R j (cid:17) + (cid:16) R j − R j (cid:17)(cid:12)(cid:12)(cid:12) k (cid:21) ≤ k − (cid:18) E (cid:20)(cid:12)(cid:12)(cid:12) R j − R j (cid:12)(cid:12)(cid:12) k (cid:21) + E (cid:20)(cid:12)(cid:12)(cid:12) R j − R j (cid:12)(cid:12)(cid:12) k (cid:21)(cid:19) ≤ k − k !8 (cid:18) ν (cid:19) k − t b = k !2 ν k − t b ., where the inequality in the second line of the above equation follows because, for a > b > a + b ) k ≤ k − ( a k + b k ).We now construct covering for e F t . Lemma 1 proves that β ( τ ; θ ) is continuously diﬀerentiable in µ = ( τ, θ ) over [ τ , τ ] × Θ with bounded derivative. Then from from Taylor’s inequality we get, for all µ , µ in [ τ , τ ] × Θ, (cid:12)(cid:12) x ( θ ) ′ β ( µ ) − x ( θ ) ′ β ( µ ) (cid:12)(cid:12) ≤ C || µ − µ || . (A2.9)Also, given θ ∈ Θ, for all γ , γ in R P , we have | δ ( γ ; θ ) − δ ( γ ; θ ) | ≤ C √ n || γ − γ || . (A2.10)Deﬁne r ( q, δ ) = R ρ ( q, δv ) dv , ρ ( q, δ ) = | I ( q ≤ δ ) − I ( q ≤ | = I ( q ∈ (0 , δ ]) I ( δ ≥ I ( q ∈ [ δ, I ( δ < . From (A2.4), e R ( γ, τ ; θ ) = | δ ( γ ; θ ) | r ( Z ( τ ; θ ) , δ ( γ ; θ )) . Note that ρ ( q, δ ) is a step function which is 1 for q between 0 and δ , and 0 elsewhere, for a given δ . Let ρ ( q, δ ) and ρ ( q, δ ) be smooth approximationsof ρ ( q, δ ), constructed using Friedrichs molliﬁer of the form35( x ) = C  e − / ( −| x | ) , if | x | < , if | x | ≥ , where C > R − Φ( x ) dx = 1 (see Stroock (2011), chapter 6 for details). As such,for η >

0, the convolution procedure yields that there exist smooth approximation functions ρ ( q, δ ), ρ ( q, δ ), and an open set D η ⊂ R such that:(i) 0 ≤ ρ ( q, δ ) ≤ ρ ( q, δ ) ≤ ρ ( q, δ ) ≤ q, δ ) ∈ D η , with ρ ( q, δ ) = ρ ( q, δ ) = ρ ( q, δ ) if( q, δ ) ∈ R \ D η ,(ii) sup ( q,δ ) ∈ D η (cid:16)(cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂q (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂δ (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂q (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂δ (cid:12)(cid:12)(cid:12)(cid:17) ≤ Cη − / , and, ∂ρ ( q,δ ) ∂q = ∂ρ ( q,δ ) ∂δ = ∂ρ ( q,δ ) ∂q = ∂ρ ( q,δ ) ∂δ = ∂ρ ( q,δ ) ∂q = ∂ρ ( q,δ ) ∂δ = 0, when ( q, δ ) ∈ R \ D η ,(iii) D η ⊂ D ′ η = (cid:8) ( q, δ ) ∈ R ; | q | ≤ Cη / or | q − δ | ≤ Cη / (cid:9) Deﬁne r ( q, δ ) = R ρ ( q, vδ ) dv , r ( q, δ ) = R ρ ( q, vδ ) dv , and R ( γ, τ ; θ ) = | δ ( γ ; θ ) | r ( Z ( τ ; θ ) , δ ( γ ; θ )), R ( γ, τ ; θ ) = | δ ( γ ; θ ) | r ( Z ( τ ; θ ) , δ ( γ ; θ )) such that condition (i) implies R ( γ, τ ; θ ) ≤ e R ( γ, τ ; θ ) ≤ R ( γ, τ ; θ ) . (A2.11)We now bound R ( γ , µ ) − R ( γ , µ ) and R ( γ , µ ) − R ( γ , µ ). | R ( γ , µ ) − R ( γ , µ ) | = || δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) − | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) | = || δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) − | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ ))+ | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) − | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) |≤ | | δ ( γ ; θ ) − δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) + | δ ( γ ; θ ) | | r ( Z ( µ ) , δ ( γ ; θ )) − r ( Z ( µ ) , δ ( γ ; θ )) | | . Using the deﬁnitions of Z ( τ ; θ ) and δ ( γ ; θ ) given in (A2.1), the bounds on increments of x ( θ ) ′ β ( τ ; θ )and δ ( γ ; θ ) obtained in (A2.9) and (A2.10), respectively, conditions (i, ii) and Taylor’s inequality, wehave, for all ( γ , µ ), ( γ , µ ) in B (0 , t ) × [ τ , τ ] × Θ, where t = t γ + t ǫ ≥ | R ( γ , µ ) − R ( γ , µ ) | ≤ C || γ − γ ||√ n + C tη − / √ n (cid:18) || µ − µ || + || γ − γ ||√ n (cid:19) ≤ C √ n (cid:16) tη − / (cid:17) ( || µ − µ || + || γ − γ || ) . Arguing similarly gives (cid:12)(cid:12) R ( γ , µ ) − R ( γ , µ ) (cid:12)(cid:12) ≤ C √ n (cid:16) tη − / (cid:17) ( || µ − µ || + || γ − γ || ) . From van de Geer (2000) there exists a covering of B (0 , t ) × [ τ , τ ] × Θ by L balls B (( γ j , µ j ) , η ) withcentre ( γ j , µ j ) and radius η such that L ≤ max (cid:18) , Ct P η P + d +1 (cid:19) , where γ ∈ R P , µ = ( τ ; θ ) ∈ [ τ , τ ] × R d . (A2.12)Note that for a ball of radius η with centre ( γ j , µ j ) and ( γ , µ ) inside this ball, | R ( γ j , µ j ) − R ( γ , µ ) | ≤ C √ n (cid:0) tη − / (cid:1) η , (cid:12)(cid:12) R ( γ j , µ j ) − R ( γ , µ ) (cid:12)(cid:12) ≤ C √ n (cid:0) tη − / (cid:1) η . Deﬁne R ′ j = R ( γ j , µ j ) − C √ n (cid:0) tη − / (cid:1) η ,36 ′ j = R ( γ j , µ j ) + C √ n (cid:0) tη − / (cid:1) η, and R j = max (0 , R ′ j ) , R j = min (cid:18) ν , R ′ j (cid:19) . (A2.13)Then, from (A2.11), for ( γ, θ ) in B (( γ j , µ j ) , η ), we have R ′ j ≤ R j ≤ e R ( γ, θ ) ≤ R j ≤ R ′ j (A2.14)This implies that { (cid:2) R j , R j (cid:3) , j = 1 , · · · , L } is a covering of e F t , with, (cid:12)(cid:12) R j − R j (cid:12)(cid:12) ≤ ν ≍ t √ n , (A2.15)since 0 ≤ R j ≤ R j ≤ ν/

2. We now bound E h(cid:0) R j − R j (cid:1) i and E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) k i . The deﬁnitions of δ ( γ ; θ ), Z ( τ ; θ ) in (A2.1), conditions (i, iii), Assumption 3, (A2.14) and the inequality ( a + b ) ≤ a + b ) give E h(cid:0) R j − R j (cid:1) i ≤ E (cid:20)(cid:16) R ′ j − R ′ j (cid:17) (cid:21) = E "(cid:18)(cid:0) R ( γ j , µ j ) − R ( γ j , µ j ) (cid:1) + 2 C √ n (cid:16) tη − / (cid:17) η (cid:19) ≤ E h(cid:0) R ( γ j , µ j ) − R ( γ j , µ j ) (cid:1) i + Cn (cid:16) tη − / (cid:17) η ≤ E h(cid:0) R ( γ j , µ j ) − R ( γ j , µ j ) (cid:1) i + C (1 + t ) ( η + η ) n = 2 E h δ ( γ j ; θ j ) ( r ( Z ( µ j ) , δ ( γ j ; θ j )) − r ( Z ( µ j ) , δ ( γ j ; θ j ))) i + C (1 + t ) ( η + η ) n ≤ || γ j || n Z || x || (cid:26)Z (cid:26)Z I (( Z ( µ j ) , δ ( γ j ; θ j ) v ) ∈ D η ) dv (cid:27) f ( y | x, θ ) dy (cid:27) f X ( x | θ ) dx + C (1 + t ) ( η + η ) n ≤ C (1 + t ) n ( η + η + η / ) , where the last inequality follows from Assumption 3 and condition (iii), since Z Z I (( Z ( µ j ) , δ ( γ j ; θ j ) v ) ∈ D η ) dvf ( y | x, θ ) dy ≤ Z I (cid:0) y ∈ D η + x ( θ ) ′ β ( µ j ) (cid:1) f ( y | x, θ ) dy ≤ C Z I (cid:0) y ∈ D η + x ( θ ) ′ β ( µ j ) (cid:1) dy = C (length of D η ) ≤ Cη / . The above bound, together with (A2.15), gives for any integer k ≥ E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) k i = E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) (cid:12)(cid:12) R j − R j (cid:12)(cid:12) k − i ≤ (cid:18) ν (cid:19) k − E h(cid:0) R j − R j (cid:1) i ≤ k !8 (cid:18) ν (cid:19) k − C (1 + t ) n ( η + η + η / ) . Hence, (A2.7) holds if η = C min (cid:18)(cid:16) n (1+ t ) (cid:17) / t b , (cid:16) n (1+ t ) (cid:17) t b , (cid:16) n (1+ t ) (cid:17) t b (cid:19) . Recall that t ≥ t b ∈ (0 , L = e h ( t b ; t ) ≤ max  , Ct P min (cid:18)(cid:16) n (1+ t )2 (cid:17) / t b , (cid:16) n (1+ t )2 (cid:17) t b , (cid:16) n (1+ t )2 (cid:17) t b (cid:19) P + d +1  ≤ max (cid:16) , Cnt t b (cid:17) P + d +1 , such that for large n, h ( t b ; t ) ≤ max (cid:16) , ( P + d + 1) log (cid:16) Cnt t b (cid:17)(cid:17) = C (log n +5 log t − t b ) ≤ C (log n + log t − log t b )+ C log t ≤ C log (cid:16) ntt b (cid:17) + C log (cid:16) ntt b (cid:17) ≤ C log (cid:16) ntt b (cid:17) , which37roves (A2.8). This completes our task of constructing covering for e F t . Step 2. Bound for E (cid:16) sup ( γ,ǫ,τ ; θ ) (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12)(cid:17) . E  sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) = E  sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ ) | X ( θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ E " sup ( γ,ǫ,τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + E " sup ( γ,ǫ,τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E " n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) | X ( θ ) ≤ E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . Let ν , σ , and h ( · ; · ) be as deﬁned in Step 1 by equations (A2.5), (A2.6) and (A2.8). Recall that t = t γ + t ǫ ≥ σ < ≤ n ( t γ + t ǫ ). Let us use the notation h ( u ; t ) = h ( u ). Applying Theorem 6 . E  sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C (cid:18) n Z σ h / ( u ) du + ( ν + σ ) h ( σ ) (cid:19) . From the discussion in Step 1 equation (A2.8), since σ <

1, for all u ∈ (0 , σ ], h ( u ; t ) = h ( u ) ≤ C log ( n ( t γ + t ǫ ) /u ). Therefore, by Cauchy-Schwarz inequality, we have n / Z σ h / ( u ) du ≤ ( nσ ) / (cid:18)Z σ h ( u ) du (cid:19) / ≤ C ( nσ ) / (cid:18)Z σ log (cid:18) n ( t γ + t ǫ ) u (cid:19) du (cid:19) / = C ( nσ ) / (cid:18) σ (cid:18) log (cid:18) n ( t γ + t ǫ ) σ (cid:19) + 1 (cid:19)(cid:19) / ≤ Cn / σ log / (cid:18) n ( t γ + t ǫ ) σ (cid:19) . With the assumptions on the order of t γ and t ǫ as stated in the statement of Lemma 2 and the orderof σ obtained in (A2.6), it followslog (cid:18) n ( t γ + t ǫ ) σ (cid:19) ≤ C log n / ( t γ + t ǫ ) / t ǫ ! ≤ C log n / n / log / n ! ≤ C log n. Hence, on substituting, we get E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) ≤ C (cid:16) n / σ log / n + ( ν + σ ) log n (cid:17) ≤ C t ǫ ( t ǫ + t γ ) / log / nn /

1+ log / n n / + ( t ǫ + t γ ) / t ǫ n / !! ≤ C log / nn / t ǫ ( t ǫ + t γ ) / , which proves Lemma 2. (cid:3) roof of Lemma 3. The proof of Lemma 3 follows the same steps as in Lemma 2 and, hence,a sketch of the proof is provided here. Treating quantities varying with i as random variables, theexpressions for R ( γ, ǫ, τ ; θ ) given in (A2.2), R ( γ, ǫ, τ ; θ ) from (A1.9) and H ( τ ; θ ) gives R ( γ, ǫ, τ ; θ ) = δ ( γ ; θ )+ δ ( ǫ ; θ ) Z δ ( γ ; θ ) (cid:0) F (cid:0) X ( θ ) ′ β ( τ ; θ )+ t | X, θ (cid:1) − F (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1)(cid:1) dt − ǫ ′ H ( τ ; θ )( ǫ + 2 γ )= δ ( γ ; θ ) + δ ( ǫ ; θ ) Z δ ( γ ; θ ) (cid:0) F (cid:0) X ( θ ) ′ β ( τ ; θ )+ t | X, θ (cid:1) - F (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1) - tf (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1)(cid:1) dt = δ ( γ ; θ ) + δ ( ǫ ; θ ) Z δ ( γ ; θ ) t (cid:26)Z (cid:0) f (cid:0) X ( θ ) ′ β ( τ ; θ ) + vt | X, θ (cid:1) − f (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1)(cid:1) dv (cid:27) dt. Deﬁne r ( γ, τ ; θ ) = R δ ( γ ; θ )0 t nR ( f ( X ( θ ) ′ β ( τ ; θ ) + vt | X, θ ) − f ( X ( θ ) ′ β ( τ ; θ ) | X, θ )) dv o dt which im-plies that R ( γ, ǫ, τ ; θ ) = r ( γ + ǫ, τ ; θ ) − r ( γ, τ ; θ ). Using the deﬁnition of δ ( γ ; θ ) in (A2.1) and becauseunder Assumption 3 we have n > | f ( a + b | x, θ ) − f ( a | x, θ ) | ≤ n | b | , from Lemma 1, wehave (cid:12)(cid:12) R ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) ≤ n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) t dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = C (cid:12)(cid:12) δ ( ǫ ; θ ) (cid:0) δ ( γ ; θ ) + 3 δ ( γ ; θ ) δ ( ǫ ; θ ) + δ ( ǫ ; θ ) (cid:1)(cid:12)(cid:12) ≤ C | δ ( ǫ ; θ ) | (cid:16) | δ ( γ ; θ ) | +3 | δ ( γ ; θ ) | | δ ( ǫ ; θ ) | + | δ ( ǫ ; θ ) | (cid:17) ≤ C | δ ( ǫ ; θ ) | ( | δ ( γ ; θ ) | + | δ ( ǫ ; θ ) | ) ≤ C || X ( θ ) || || ǫ || ( || γ || + || ǫ || ) n / . (A2.16) | r ( γ, τ ; θ ) | ≤ C | δ ( γ ; θ ) | ≤ C || X ( θ ) || || γ || n / . Thus, for all γ ∈ B (0 , t γ + t ǫ ) and all ( τ, θ ) ∈ [ τ , τ ] × Θ, | r ( γ, τ ; θ ) | ≤ ν ′ ν ′ ≍ ( t γ + t ǫ ) n / . From (A2.16), under Assumption 3,Var (cid:0) R ( γ, ǫ, τ ; θ ) (cid:1) ≤ E (cid:2) R ( γ, ǫ, τ ; θ ) (cid:3) ≤ C || ǫ || ( || γ || + || ǫ || ) n / ! Z || x ( θ ) || f X ( x | θ ) dx ≤ C || ǫ || ( || γ || + || ǫ || ) n ≤ (cid:0) σ ′ (cid:1) ; σ ′ ≍ t ǫ ( t γ + t ǫ ) n / . Then arguing as in step 1 of Lemma 2 to construct brackets, E  sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) − E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) ≤ Cn / σ ′ log / (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) + ( σ ′ + ν ′ ) log (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) It follows from (A2.16) and Assumption 3 that for all ( γ, ǫ, τ, θ ) in B (0 , t γ ) × B (0 , t ǫ ) × [ τ , τ ] × Θ, (cid:12)(cid:12) E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) = (cid:12)(cid:12) n E (cid:2) R i ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) ≤ n E (cid:2)(cid:12)(cid:12) R i ( γ, ǫ, τ ; θ ) (cid:12)(cid:12)(cid:3) ≤ Cn / E h || X ( θ ) || || ǫ || ( || γ || + || ǫ || ) i Cn / || ǫ || ( || γ || + || ǫ || ) Z || x ( θ ) || f X ( x | θ ) dx ≤ Cn / t ǫ ( t γ + t ǫ ) , and using the conditions on orders of t γ and t ǫ as speciﬁed in Lemma 2, such that t γ ≥ t γ /t ǫ = O (cid:16) n/ log / n (cid:17) , we have E " sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≤ E " sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:0)(cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) − E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) + (cid:12)(cid:12) E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12)(cid:1) ≤ Cn / σ ′ log / (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) + (cid:0) σ ′ + ν ′ (cid:1) log (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) + Cn / t ǫ ( t γ + t ǫ ) ≤ C t ǫ ( t γ + t ǫ ) n log n t ǫ ( t γ + t ǫ ) ! t γ + t ǫ ) t ǫ n / log / n / t ǫ ( t γ + t ǫ ) !! + Cn / t ǫ ( t γ + t ǫ ) . Recall t ǫ = t log / n/n / = o (log / n ) and t γ ≍ log / n , such that t γ + t ǫ ≍ log / n . It follows,for large n , n / ( t ǫ ( t γ + t ǫ )) ≤ ( C/t )( n / / log / n ) ≤ Cn / /t , such that log ( n / / ( t ǫ ( t γ + t ǫ ))) ≤ C log n . Similarly, ( t γ + t ǫ ) / ( t ǫ n / ) ≤ C/ ( n log n ) / .Thus, (( t γ + t ǫ ) / ( t ǫ n / )) × log / ( n / / ( t ǫ ( t γ + t ǫ ))) ≤ C (log n/n ) / = o (1) and(1 + ( t γ + t ǫ ) / ( t ǫ n / ) × log / ( n / / ( t ǫ ( t γ + t ǫ )))) = 1 + o (1) = 1. Therefore, it follows, E  sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) |  ≤ C t ǫ ( t γ + t ǫ ) n log / n + Cn / t ǫ ( t γ + t ǫ ) ≤ C t ǫ ( t γ + t ǫ ) n / , for large n , which proves Lemma 3. (cid:3) Proof of Lemma 4.

The ﬁrst order condition for Q ( β, τ ; θ ) gives E [ X ( θ ) { F ( X ′ ( θ ) β ( τ ; θ ) | X, θ ) − τ } ] =0 . Let s iℓ ( τ ; θ ) denote the ℓ th entry of the vector s i ( τ ; θ ) √ n in (A1.3). Assumption 3 gives, uniformly in( τ, θ ) ∈ [ τ , τ ] × Θ for all i , | s iℓ ( τ ; θ ) | ≤ ν ′′ , where ν ′′ ≍ n − / Var( s ℓ ( τ ; θ )) ≤ E h ( s ℓ ( τ ; θ )) i ≤ E (cid:20) n X ℓ ( θ ) (cid:21) = 1 n Z x f X ( x | θ ) dx ≤ (cid:0) σ ′′ (cid:1) , where σ ′′ ≍ n − / . Hence, arguing as in Steps 1-2 of Lemma 2, E " sup ( τ,θ ) ∈ [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b S ℓ ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O (cid:16) n / σ ′′ log / n + (cid:0) σ ′′ + ν ′′ (cid:1) log n (cid:17) ≤ C log / n + (cid:18) log nn (cid:19) / log / n ! = O (cid:16) log / n (cid:17) . Markov inequality, then, proves Lemma 4. (cid:3) ppendix 3. Proof of remark in Section 5.1 (i) In the expression for C ( τ ) in (4.2), (cid:8)R τ ( β ( t ) + β ( t ) X ) dt − τ ( β ( τ ) + β ( τ ) X ) (cid:9) will have theform p ( τ )+ q ( τ ) X . Recall that e X = [1 , X ] ′ , X = [1 , X , X ] ′ and g ( X ) = e X [0 , , E − [ XX ′ ] X ,then, C ( τ ) = E " (cid:0) [0 , , E − [ XX ′ ] X (cid:1) ( p ( τ ) + q ( τ ) X ) (cid:0) [0 , , E − [ XX ′ ] XX (cid:1) ( p ( τ ) + q ( τ ) X ) . If X and X are independent, elementary matrix algebra gives that[0 , , E − (cid:2) XX ′ (cid:3) X = 1 D (cid:8)(cid:0) E [ X ] E [ X ] − E [ X ] E [ X ] (cid:1) + (cid:0) E [ X ] − E [ X ] (cid:1) X (cid:9) , where D is the determinant of the matrix E [ XX ′ ]. Plugging in this expression in C ( τ ) andsimplifying using independence of X and X proves the result.(ii) Given the result in (i), the increase in variance of the quantile estimates due to ﬁrst step es-timation, over standard quantile regression had the ﬁrst step been known, is given by (4.2) as H ( τ ) − D ( τ ) V ( β ) D ( τ ) ′ H ( τ ) − . Using H ( τ ) and D ( τ ) as given in (4.2), under independence of X and X , the vector H ( τ ) − D ( τ ) evaluates to [ − E [ X ] , ′ . Therefore, the additional variancedue to two-step estimation is given by H ( τ ) − D ( τ ) V ( β ) D ( τ ) ′ H ( τ ) − = " E [ X ] V ( β ) 00 0 . This proves (ii). (cid:3) eferences Abrevaya, J. (2002). The eﬀects of demographics and maternal behavior on the distribution of birth outcomes.In

Economic applications of quantile regression (pp. 247–257). Springer.Amemiya, T. (1974). The nonlinear two-stage least-squares estimator.

Journal of econometrics , 2(2), 105–110.Amemiya, T. & Powell, J. L. (1981). A comparison of the Box–Cox maximum likelihood estimator and thenon-linear two-stage least squares estimator.

Journal of Econometrics , 17(3), 351–381.Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., & Tukey, J. W. (1972).

RobustEstimates of Location: Survey and Advances . Princeton University Press.Andrews, D. W. & Buchinsky, M. (2000). A three-step method for choosing the number of bootstrap repetitions.

Econometrica , 68(1), 23–51.Arellano, M. & Bonhomme, S. (2017). Quantile selection models with an application to understanding changesin wage inequality.

Econometrica , 85(1), 1–28.Berry, S., Levinsohn, J., & Pakes, A. (1995). Automobile prices in market equilibrium.

Econometrica: Journalof the Econometric Society , (pp. 841–890).Blundell, R. & Powell, J. L. (2003). Endogeneity in nonparametric and semiparametric regression models. In

Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress (pp. 312–357).Cambridge University Press.Blundell, R. & Powell, J. L. (2007). Censored regression quantiles with endogenous regressors.

Journal ofEconometrics , 141(1), 65–83.Box, G. E. & Cox, D. R. (1964). An analysis of transformations.

Journal of the Royal Statistical Society. SeriesB (Methodological) , 26(2), 211–252.Buchinsky, M. (1994). Changes in the US wage structure 1963-1987: Application of quantile regression.

Econo-metrica , (pp. 405–458).Buchinsky, M. (1995). Quantile regression, Box–Cox transformation model, and the US wage structure, 1963–1987.

Journal of Econometrics , 65(1), 109–154.Chamberlain, G. (1994). Quantile regression, censoring, and the structure of wages. In C. A. Sims (Ed.),

Ad-vances in Econometrics: Sixth World Congress , Econometric Society Monographs (pp. 171––210). CambridgeUniversity Press.Chen, L., Galvao, A. F., & Song, S. (2018). Quantile regression with generated regressors.

Available at SSRN3039602 .Chen, X., Linton, O., & Van Keilegom, I. (2003). Estimation of semiparametric models when the criterionfunction is not smooth.

Econometrica , 71(5), 1591–1608.Chernozhukov, V., Fern´andez-Val, I., & Kowalski, A. E. (2015). Quantile regression with censoring and endo-geneity.

Journal of Econometrics , 186(1), 201–221.Chernozhukov, V., Fern´andez-Val, I., Newey, W., Stouli, S., & Vella, F. (2017). Semiparametric estimation ofstructural functions in nonseparable triangular models. arXiv preprint arXiv:1711.02184 . hernozhukov, V. & Hansen, C. (2005). An IV model of quantile treatment eﬀects. Econometrica , 73(1),245–261.Chernozhukov, V. & Hansen, C. (2006). Instrumental quantile regression inference for structural and treatmenteﬀect models.

Journal of Econometrics , 132(2), 491–525.Chernozhukov, V. & Hansen, C. (2008). Instrumental variable quantile regression: A robust inference approach.

Journal of Econometrics , 142(1), 379–398.Chesher, A. (2003). Identiﬁcation in nonseparable models.

Econometrica , 71(5), 1405–1441.Fang, K. W., Kotz, S., & Ng, K. W. (1990).

Symmetric multivariate and related distributions . London: Chapmanand Hall/CRC Press.Fitzenberger, B., Wilke, R. A., & Zhang, X. (2009). Implementing Box–Cox quantile regression.

EconometricReviews , 29(2), 158–181.Gastwirth, J. L. (1966). On robust procedures.

Journal of the American Statistical Association , 61(316),929–948.Gimenes, N. (2017). Econometrics of ascending auctions by quantile regression.

Review of Economics andStatistics , 99(5), 944–953.Gimenes, N. & Guerre, E. (2020). Quantile regression methods for ﬁrst-price auctions. arXiv preprintarXiv:1909.05542 .Hahn, J. & Ridder, G. (2013). Asymptotic variance of semiparametric estimators with generated regressors.

Econometrica , 81(1), 315–340.Haile, P. A., Hong, H., & Shum, M. (2003). Nonparametric tests for common values at ﬁrst-price sealed-bidauctions. Technical report , National Bureau of Economic Research .Hjort, N. L. & Pollard, D. (2011). Asymptotics for minimisers of convex processes. arXiv preprintarXiv:1107.3806 .Hoderlein, S., Klemel¨a, J., & Mammen, E. (2010). Analyzing the random coeﬃcient model nonparametrically.

Econometric Theory , (pp. 804–837).Ichimura, H. & Lee, S. (2010). Characterization of the asymptotic distribution of semiparametric m-estimators.

Journal of Econometrics , 159(2), 252–266.Imbens, G. W. & Newey, W. K. (2009). Identiﬁcation and estimation of triangular simultaneous equationsmodels without additivity.

Econometrica , 77(5), 1481–1512.Koenker, R. (2005).

Quantile Regression . Cambridge University Press.Koenker, R. (2017). Quantile regression: 40 years on.

Annual Review of Economics , 9, 155–176.Koenker, R. & Bassett, G. (1978). Regression quantiles.

Econometrica , 46(1), 33–50.Koenker, R. & Hallock, K. (2001). Quantile regression: An introduction.

Journal of Economic Perspectives ,15(4), 43–56. oenker, R. & Ma, L. (2006). Quantile regression methods for recursive structural equation models. Journalof Econometrics , 134(2), 471–506.Koenker, R. & Machado, J. A. (1999). Goodness of ﬁt and related inference processes for quantile regression.

Journal of the american statistical association , 94(448), 1296–1310.Koenker, R. & Xiao, Z. (2002). Inference on the quantile regression process.

Econometrica , 70(4), 1583–1612.Lee, S. (2007). Endogeneity in quantile regression models: A control function approach.

Journal of Economet-rics , 141(2), 1131–1158.Lu, J. & Perrigne, I. (2008). Estimating risk aversion from ascending and sealed-bid auctions: the case of timberauction data.

Journal of Applied Econometrics , 23(7), 871–896.Machado, J. A. & Mata, J. (2000). Box–Cox quantile regression and the distribution of ﬁrm sizes.

Journal ofApplied Econometrics , 15(3), 253–274.Mammen, E., Rothe, C., & Schienle, M. (2012). Nonparametric regression with nonparametrically generatedcovariates.

The Annals of Statistics , 40(2), 1132–1170.Mammen, E., Rothe, C., & Schienle, M. (2016). Semiparametric estimation with generated covariates.

Econo-metric Theory , 32(5), 1140–1177.Massart, P. (2007).

Concentration inequalities and model selection , volume 6. Springer.Mu, Y. & He, X. (2007). Power transformation toward a linear regression quantile.

Journal of the AmericanStatistical Association , 102(477), 269–279.Murphy, K. M. & Topel, R. H. (1985). Estimation and inference in two-step econometric models.

Journal ofBusiness & Economic Statistics , 3(4), 370–379.Newey, W. K. & McFadden, D. (1994). Large sample estimation and hypothesis testing.

Handbook of econo-metrics , 4, 2111–2245.Oxley, L. & McAleer, M. (1993). Econometric issues in macroeconomic models with generated regressors.

Journal of Economic Surveys , 7(1), 1–40.Pagan, A. (1984). Econometric issues in the analysis of regressions with generated regressors.

InternationalEconomic Review , (pp. 221–247).Powell, J. L. (1991). Estimation of monotonic regression models under quantile restrictions. In W. A. Barnett,J. Powell, & G. Tauchen (Eds.),

Nonparametric and Semiparametric Methods in Econometrics and Statistics chapter 14, (pp. 357–384). Cambridge University Press.Smith, R. L. (1994). Nonregular regression.

Biometrika , 81(1), 173–183.Stroock, D. W. (2011).

Essentials of integration theory for analysis , volume 262. Springer.van de Geer, S. (2000).

Empirical Processes in M-Estimation . Cambridge: University Press Cambridge.Zou, H. & Yuan, M. (2008). Composite quantile regression and the oracle model selection theory.

The Annalsof Statistics , 36(3), 1108–1126., 36(3), 1108–1126.