Quantile regression with generated dependent variable and covariates
Jayeeta Bhattacharya ∗ December 2020
Abstract
We study linear quantile regression models when the regressors and/or the dependent variable are not directly observed but estimated in an initial first step and used in the second step quantile regression for estimating the quantile parameters. This general class of generated quantile regression (GQR) covers various statistical applications, for instance, estimation of endogenous quantile regression models and triangular structural equation models, and some new relevant applications are discussed. We study the asymptotic distribution of the two-step estimator, which is challenging because of the presence of generated covariates and/or dependent variable in the non-smooth quantile regression estimator. We employ techniques from empirical process theory to find a uniform Bahadur expansion for the two-step estimator, which is used to establish the asymptotic results. We illustrate the performance of the GQR estimator through simulations and an empirical application based on auctions.
Keywords: Two-stage estimation, generated regressors, generated dependent variable, quantile regression, asymptotic variance, Bahadur expansion

* Lecturer, Department of Economics, University of Southampton. Correspondence email: [email protected]. This work was completed during my PhD at Queen Mary University of London (QMUL) and I am deeply thankful to Emmanuel Guerre for his valuable comments and supervision. I also thank participants at various conferences for insightful comments. Generous funding from the School of Economics and Finance, QMUL, is also gratefully acknowledged.
Introduction
Econometric analysis often requires the use of regressors that are not directly observed but have been estimated in a preliminary first step. A rich literature exists on estimation and inference in models with generated regressors. Pagan (1984) and Oxley & McAleer (1993) provide surveys for parametric models with generated regressors, while Mammen, Rothe & Schienle (2012) study and illustrate various examples for non-parametric regression with generated covariates. Studying the asymptotic properties of two-step estimators in a parametric context, Murphy & Topel (1985) point out that ignoring the effect of first-step estimation leads to incorrect asymptotic standard errors.

While these models are concerned with the characterization of the conditional mean, a more complete picture of the conditional distribution of a dependent variable is provided by quantile regression (QR) models. Since the seminal work of Koenker & Bassett (1978), quantile regression is widely used in both empirical studies and theoretical statistics for analysing conditional quantile functions in linear and nonlinear response models. Quantile regression applications using generated regressors abound in the literature, most prominently related to models with endogenous covariates. Chernozhukov & Hansen (2005, 2006, 2008) develop identification and estimation for QR models in the presence of endogeneity. Another popular approach to deal with endogeneity uses the estimated reduced form residuals as control variables in quantile regression. This technique has been applied in endogenous censored quantile regression models by Blundell & Powell (2007) and Chernozhukov, Fernández-Val & Kowalski (2015). Estimation of quantile treatment effects or quantile parameters in triangular simultaneous equation models using the control variable approach has been considered in Chesher (2003), Koenker & Ma (2006), Lee (2007), Imbens & Newey (2009), and Chernozhukov, Fernández-Val, Newey, Stouli & Vella (2017). There are, however, few references that develop a general theory for quantile models with generated covariates and systematically study its statistical properties. The only related work seems to be Chen, Galvao & Song (2018), who consider estimation and inference of quantile regression when regressors are generated. However, they study two-step quantile estimation when the second-step estimator is differentiable with respect to the first stage, which may not hold true for some relevant applications since the quantile regression objective function is not smooth. They also do not consider a transformation of the dependent variable as permitted here.

This paper considers the general framework of QR models when either the regressors or the dependent variable (or both) are generated and studies the asymptotic behaviour of the two-step QR estimator, called the generated quantile regression (GQR) estimator, without being tailored to any specific application. An example giving rise to a generated dependent variable is quantile specifications with some constant slope parameters, as in the setup of Zou & Yuan (2008). Their composite quantile regression (CQR) method can be used to estimate the constant and quantile-varying parameters together, but its asymptotic properties have been studied for the estimation of the constant parameters only. To focus on quantile estimation, the constant slope parameters can first be estimated by linear regression (or any other suitable method) in a first step.
Estimation of the quantile-varying slope parameters, thereafter, involves quantile regression with the dependent variable generated as a function of the constant slope parameters and the corresponding covariates. Removing some parameters through the first-step estimation may alleviate the computational burden of the CQR method caused by a large number of variables. Moreover, covariates make the QR estimators non-monotonic, even if the quantile function is increasing. So, the expectation is that reducing the dimensionality of the regressors in the QR stage, by removing some covariates through the first stage of estimation, allows one to get closer to monotonicity, a desirable property for quantile estimation. The two-step procedure can also simplify estimation of complex models like random coefficient models, popularised in demand analysis by Berry, Levinsohn & Pakes (1995). Hoderlein, Klemelä & Mammen (2010) propose non-parametric estimation of the distribution of the random coefficients. Studying the econometrics of auctions, Gimenes & Guerre (2020) consider quantile specifications arising from elliptically distributed random coefficients, which include the multivariate normal, lognormal or Student distributions. Its two-step estimation involves a median normalisation to identify and estimate the elliptical distribution location and dispersion parameters in the first step, which can generate the dependent variable for the second stage quantile regression estimating sample quantiles. Another example of a generated dependent variable arises in quantile models where the dependent variable is transformed based on some transformation parameter, like the Box-Cox transformation, to induce desirable properties for statistical inference. The joint estimation of quantile-varying transformation and slope parameters through non-linear quantile regression is computationally difficult, in addition to the numerical problem that the objective function is not defined for all parameter values and observations (meaning estimation occurs by omitting such values). Estimating the transformation parameter in a first step avoids such numerical problems and leads to a linear quantile regression, ensuring a better performance of the numerical algorithm used to compute the estimator.

It is well known that the first-step estimates impact the overall asymptotic behaviour of the final estimator, understanding which is crucial for obtaining consistent standard errors that can be used to construct correct confidence intervals. The wide range of quantile regression applications that give rise to generated regressors or dependent variables obtained from estimation in a preliminary step suggests the need for a systematic analysis of their impact on the statistical properties of the QR estimator. The classical way in which asymptotic analysis is carried out for two-step estimators with smooth objective functions relies on a Taylor expansion based technique for the second stage estimates, as applied in Murphy & Topel (1985). However, such methods are not applicable for the QR estimator, since it is difficult to differentiate the QR estimator. (In principle this could be done by applying the Implicit Function Theorem to the first-order condition that defines the estimator. However, the QR estimator is not always unique and the QR objective function is not twice differentiable, preventing the use of this approach.)
Finding the asymptotic variance of the non-smooth two-step GQR estimator is not a trivial task and requires alternative techniques. Chen, Linton & Van Keilegom (2003) develop the asymptotic theory for semi-parametric GMM-type estimators with a non-smooth criterion function and a non-parametric first stage; closely related papers are Ichimura & Lee (2010), Hahn & Ridder (2013), and Mammen, Rothe & Schienle (2016) (the latter two also involve generated regressors in the non-parametric component, such that estimation occurs in three steps). As a consequence of their generality, these papers follow a standard two-step proof approach, where conditions for consistency are given first and asymptotic normality is then established. However, since the QR objective function is convex, it allows bypassing the tedious task of checking conditions for consistency
as in Chen et al. (2003), and establishing asymptotic normality in one step instead, by applying the convexity trick of Hjort & Pollard (2011). Also, a Bahadur representation of the non-smooth two-step estimator with its rate is not present in these works.

This paper systematically handles the associated issues for the asymptotic analysis of the generalised two-step GQR estimator using techniques from the asymptotic analysis of quantile regression and empirical process theory. We derive the Bahadur expansion of the GQR estimator, with a precise stochastic order of the remainder term, which holds uniformly with respect to the first step parameter and the quantile levels. This involves establishing a stochastic equicontinuity result that allows approximating the score evaluated at the estimated first stage parameter by that taken at the true parameter, which is of interest in evaluating the effect of the first stage. Using the Bahadur expansion approach, under the assumption that the first stage estimation is asymptotically normal and some other regularity conditions, we establish asymptotic normality and obtain an explicit expression for the asymptotic variance of the GQR estimator.

Several applications fit the generated QR framework and four motivating examples are discussed: quantile regression involving constant slope parameters, an elliptically distributed random coefficient model, a Box-Cox power transformed quantile regression, and a variant of the endogenous quantile regression model. The example of the QR model with constant slope also forms the basis for simulation experiments and an empirical application; the analysis suggests potential benefits of the two-stage GQR procedure over standard QR. The simulation exercises illustrate the validity of the GQR asymptotic normality result and the effect of the first stage estimation; further analysis of the asymptotic variance suggests that the GQR estimator produces efficiency improvements over the standard QR estimator for central quantiles. Finally, an empirical application based on auction models in a quantile framework confirms that the GQR estimator improves the monotonicity and accuracy of quantile slopes as compared to an unconstrained estimation using standard quantile regression.

The rest of the paper is organised as follows. Section 2 introduces the baseline model and the GQR estimator, and presents four applications to motivate the framework. Section 3 carries out the asymptotic analysis and presents the Bahadur expansion results and the central limit theorem for the GQR estimator. The asymptotic results are applied to the motivating examples in Section 4. Section 5 presents simulation results while Section 6 reports results of the empirical application to first price auctions. Proofs of the main results are given in the Appendices.
The model and GQR estimator

We consider the following linear quantile specification:
$$ Y(\theta) = X(\theta)'\beta(U); \qquad U \mid X(\theta) \sim U[0,1], \qquad (2.1) $$
where, provided that $\tau \mapsto X(\theta)'\beta(\tau)$ is strictly increasing and continuous in $\tau$, $X(\theta)'\beta(\tau)$ is the $\tau$-quantile of $Y(\theta)$ conditional on $X(\theta)$. Here, $Y(\theta)$ and $X(\theta)$ are functions of a vector of parameters $\theta$, which includes elements that generate the dependent variable $Y$, or the regressor $X$, or both. The true value of the parameter $\theta$ in (2.1), denoted by $\theta_0$, is not known but estimated. Hence, estimation proceeds in two steps.

First step: Estimation of $\theta_0$. It is assumed that a consistent estimator $\hat{\theta}$ is available. For the sake of generality, any estimation method is allowed at this stage, provided it satisfies an expansion typical of regular estimators; see, for example, Newey & McFadden (1994). As discussed for the examples, a suitable choice of $\hat{\theta}$ can be made on a case-by-case basis.

Second step: Estimation of the quantile parameter. The quantile parameter estimate $\hat{\beta}(\tau)$ in (2.1) is given by
$$ \hat{\beta}(\tau) = \hat{\beta}(\tau;\hat{\theta}) = \arg\min_{\beta} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau\big( Y_i(\hat{\theta}) - X_i(\hat{\theta})'\beta \big), \qquad (2.2) $$
where $\rho_\tau(u) = (\tau - I(u < 0))u$ is the check function of Koenker & Bassett (1978).
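To fix ideas, the following is a minimal sketch of the generic two-step procedure (2.2) in Python with statsmodels. The helper names (first_stage, gen_y, gen_x) are hypothetical placeholders for the application-specific first step and the transformations g and h; this is an illustration under those assumptions, not a definitive implementation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def gqr_fit(y, x, tau, first_stage, gen_y, gen_x):
    """Two-step GQR sketch: estimate theta, generate Y(theta) and X(theta),
    then run a standard quantile regression at level tau.

    first_stage : callable returning theta_hat from the raw data
    gen_y, gen_x : callables playing the role of g(Y, X, theta) and h(X, theta)
    """
    theta_hat = first_stage(y, x)            # first step: any root-n consistent estimator
    y_gen = gen_y(y, x, theta_hat)           # generated dependent variable
    x_gen = gen_x(x, theta_hat)              # generated regressors (including a constant)
    res = QuantReg(y_gen, x_gen).fit(q=tau)  # second step: standard linear QR
    return theta_hat, res.params
```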
Examples

The general framework of quantile regression with the dependent variable and/or covariates obtained as a function of parameters estimated in a first step finds wide application in economics and statistics. We present four applications here, which we revisit later to derive their asymptotic results.

Quantile regression with constant slope parameters

Consider the quantile regression (QR) model
$$ Q_Y(\tau \mid X) = \beta_0(\tau) + \beta_1(\tau) X_1 + \beta_2(\tau) X_2 \qquad (2.3) $$
and assume that $\beta_1(\tau) = \beta_1$ for all $\tau$, i.e. $\beta_1(\cdot)$ is constant. This model can be estimated using Zou & Yuan (2008)'s composite quantile regression (CQR) method as follows:
$$ \big( \hat{\beta}_1, \hat{\beta}_0(\tau_1), \hat{\beta}_2(\tau_1), \ldots, \hat{\beta}_0(\tau_K), \hat{\beta}_2(\tau_K) \big) = \arg\min_{b_1, b_{0k}, b_{2k};\ k=1,\ldots,K}\ \sum_{k=1}^{K}\sum_{i=1}^{n} \rho_{\tau_k}\big( Y_i - X_{1i} b_1 - b_{0k} - X_{2i} b_{2k} \big), $$
for $0 < \tau_1 < \tau_2 < \cdots < \tau_K < 1$. This could lead to an intractable system due to a very large number of variables, especially with more quantile parameters and quantile levels. Moreover, Zou & Yuan (2008) study the asymptotic properties of the CQR estimator for the estimation of the constant slope parameters and compare its efficiency with least squares, while the asymptotic behaviour of the quantile-varying slope parameters remains unstudied. As an alternative to Zou & Yuan (2008), consider a two-step estimation of this model as described below.

As there exist uniform variables $U_i$ independent of $X_i$ such that $Y_i = Q_Y(U_i \mid X_i)$, it holds that
$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, $$
where $\beta_k = E[\beta_k(U_i)]$, $k \in \{0,1,2\}$, and $\varepsilon_i = \beta_0(U_i) - \beta_0 + \big(\beta_2(U_i) - \beta_2\big) X_{2i}$ (since $\beta_1 = \beta_1(U_i)$). It follows that the $\beta_k$'s can be estimated using OLS, that is,
$$ \big( \hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2 \big) = \arg\min_{b_0,b_1,b_2} \sum_{i=1}^{n} \big( Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i} \big)^2. \qquad (2.4) $$
Set $\hat{\beta}_1$ equal to this OLS estimate. A two-step estimator of $(\beta_0(\tau), \beta_2(\tau))$ is then
$$ \big( \hat{\beta}_0(\tau), \hat{\beta}_2(\tau) \big) = \arg\min_{b_0,b_2} \sum_{i=1}^{n} \rho_\tau\big( Y_i - \hat{\beta}_1 X_{1i} - b_0 - b_2 X_{2i} \big). \qquad (2.5) $$
Hence, in this example, the first-step parameter is $\theta \equiv \beta_1$, and the dependent variable is generated as $Y_i(\beta_1) = Y_i - \beta_1 X_{1i}$.
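A minimal sketch of the two-step estimator (2.4)-(2.5), assuming the data are held in numpy arrays; statsmodels is used for both steps.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def constant_slope_gqr(y, x1, x2, tau):
    """Two-step estimator (2.4)-(2.5): OLS for the constant slope beta_1,
    then QR of the generated dependent variable Y - beta1_hat * X1 on (1, X2)."""
    X_full = sm.add_constant(np.column_stack([x1, x2]))
    beta1_hat = sm.OLS(y, X_full).fit().params[1]       # first step (2.4)
    y_gen = y - beta1_hat * x1                           # generated dependent variable
    X_tilde = sm.add_constant(x2)
    qr = QuantReg(y_gen, X_tilde).fit(q=tau)             # second step (2.5)
    return beta1_hat, qr.params                          # (beta0_hat(tau), beta2_hat(tau))
```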
Random coefficient model

Consider the model
$$ Y_i = X_i'\beta_i, \qquad i = 1,\ldots,n, \qquad (2.6) $$
where $\beta_i$ is a $(K+1)$-dimensional vector of random coefficients, independent of the $(K+1)$-vector of covariates $X_i$ whose first element is 1 (such that the first element of $\beta_i$ represents the error in this model). Note that linear regression is a special case of (2.6) where $\beta_i = \beta$ for all $i$.

Suppose $\beta_i$ is drawn from an elliptical distribution with location parameter $\mu$ and symmetric non-negative dispersion matrix $\Sigma$, a class which includes the multivariate normal, log-normal and t-distributions, as considered in Gimenes & Guerre (2020)'s auction application. Let $R_i$ denote a random vector distributed uniformly on the unit sphere in $\mathbb{R}^{K+1}$ and consider the Euclidean norm $E_i = \big\|\Sigma^{-1/2}(\beta_i - \mu)\big\|$, independent of $R_i$. Then, following Fang, Kotz & Ng (1990, p. 32), $\beta_i$ has the same distribution as $\mu + E_i \Sigma^{1/2} R_i$. Let $r_i$ denote the first coordinate of $R_i$, such that $t'R_i$ has the same distribution as $\|t\| r_i$ (see Fang et al. (1990), Theorem 2.4). Hence, using the symbol $\overset{d}{=}$ to denote equality in distribution, we have from (2.6)
$$ Y_i \overset{d}{=} X_i'\mu + \big(\Sigma^{1/2} X_i\big)' E_i R_i \overset{d}{=} X_i'\mu + \big\|\Sigma^{1/2} X_i\big\|\, E_i r_i. $$
Hence, the quantile specification for (2.6) is given by
$$ Q_Y(\tau \mid X) = X'\mu + \big\|\Sigma^{1/2} X\big\|\, \xi(\tau), \qquad (2.7) $$
where $\xi(\tau)$ is the $\tau$-th quantile of $E_i r_i$. The above model can be estimated in two steps as follows. Under the normalisation $\xi(1/2) = 1$, the parameters $\mu$ and $\Sigma$ are identified by conditional median regression:
$$ (\hat{\mu}, \hat{\Sigma}) = \arg\min_{\mu,\Sigma} \sum_{i=1}^{n} \Big| Y_i - X_i'\mu - \big\|\Sigma^{1/2} X_i\big\| \Big|. \qquad (2.8) $$
The second step involves quantile regression using the generated dependent variable
$$ Y_i(\hat{\mu}, \hat{\Sigma}) = \frac{Y_i - X_i'\hat{\mu}}{\|\hat{\Sigma}^{1/2} X_i\|} $$
to obtain the sample quantiles:
$$ \hat{\xi}(\tau) = \arg\min_{\xi} \sum_{i=1}^{n} \rho_\tau\big( Y_i(\hat{\mu}, \hat{\Sigma}) - \xi \big). \qquad (2.9) $$
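A rough sketch of (2.8)-(2.9). The median regression over (µ, Σ) is nonsmooth, so the Nelder-Mead call below is purely illustrative, Σ is parametrised through its Cholesky factor, and the starting values are an assumption; a serious implementation would use a more robust optimiser.

```python
import numpy as np
from scipy.optimize import minimize

def elliptical_rc_gqr(y, X, tau):
    """Sketch of (2.8)-(2.9): median regression identifying (mu, Sigma) under
    xi(1/2) = 1, then the tau-quantile of the standardised generated variable.
    Sigma is parametrised through its lower-triangular Cholesky factor L."""
    n, k = X.shape

    def unpack(par):
        mu = par[:k]
        L = np.zeros((k, k))
        L[np.tril_indices(k)] = par[k:]
        return mu, L

    def lad_objective(par):                        # sum of absolute deviations, eq. (2.8)
        mu, L = unpack(par)
        scale = np.linalg.norm(X @ L, axis=1)      # ||Sigma^{1/2} X_i|| with Sigma = L L'
        return np.abs(y - X @ mu - scale).sum()

    start = np.concatenate([np.zeros(k), np.eye(k)[np.tril_indices(k)]])
    opt = minimize(lad_objective, start, method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-6})
    mu_hat, L_hat = unpack(opt.x)
    scale = np.linalg.norm(X @ L_hat, axis=1)
    y_gen = (y - X @ mu_hat) / scale               # generated dependent variable
    xi_hat = np.quantile(y_gen, tau)               # QR on a constant = sample quantile, eq. (2.9)
    return mu_hat, L_hat @ L_hat.T, xi_hat
```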
Box-Cox power transformed quantile regression

Box & Cox (1964) propose finding a transformation parameter $\lambda$ such that, with the following transformation of the original observations $Y$,
$$ Y(\lambda) = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0, \\[4pt] \log Y, & \text{if } \lambda = 0, \end{cases} \qquad (2.10) $$
$Y(\lambda)$ is normally distributed with conditional variance $\sigma^2$ and $E[Y(\lambda) \mid X] = X'\beta$. The desired property for quantile regression is linearity, that is,
$$ Q_{Y(\lambda)}(\tau \mid X) = X'\beta(\tau). $$
The Box-Cox quantile regression literature has mostly focussed on finding a quantile-dependent transformation parameter (see, for instance, Powell (1991), Chamberlain (1994), Buchinsky (1995), Machado & Mata (2000) and Fitzenberger, Wilke & Zhang (2009)). Owing to the equivariance property of quantiles, this leads to minimization of the non-linear function $\sum_{i=1}^{n} \rho_\tau\big( Y_i - (\lambda X_i'\beta + 1)^{1/\lambda} \big)$. A quantile-varying $\lambda$ adds flexibility to the model, but joint estimation of $(\lambda(\tau), \beta(\tau))$ requires effort, see Koenker (2017). Also, a basic numerical problem is that $(\lambda X_i'\beta + 1)$ needs to be positive for all $\lambda$ and all observations.

A constrained estimation with a constant $\lambda$ has obvious computational and numerical benefits. Mu & He (2007) consider constancy of $\lambda(\tau)$. In the empirical application of Buchinsky (1995) studying the transformation of log wages over 25 years, $\lambda(\tau)$ seems to be constant for all quantiles except the highest. A simpler approach would, therefore, involve estimating $\hat{\lambda}$ separately in a first step and thereafter performing linear quantile regression using the transformed $Y$ for estimating $\beta(\tau)$. The parameter $\lambda$ can be estimated from the linear regression $Y(\lambda) = X'\beta + \varepsilon$. A consistent estimator is Amemiya (1974)'s nonlinear IV (NIV) estimator,
$$ \big( \hat{\lambda}_{NIV}, \hat{\beta}_{NIV} \big) = \arg\min_{\ell, b} \Bigg( \sum_{i=1}^{n} \big( Y_i(\ell) - X_i'b \big) W_i' \Bigg) \Omega \Bigg( \sum_{i=1}^{n} W_i \big( Y_i(\ell) - X_i'b \big) \Bigg), \qquad (2.11) $$
where $W_i$ always contains $X_i$ as well as additional instruments (Amemiya & Powell (1981) recommend using squares and cross-products of the $X_i$'s). Set $\hat{\lambda} = \hat{\lambda}_{NIV}$. The dependent variable $Y_i(\hat{\lambda})$ is then generated using equation (2.10), and $\beta(\tau)$ is estimated from quantile regression of $Y(\hat{\lambda})$ on $X$:
$$ \hat{\beta}(\tau) = \arg\min_{b} \sum_{i=1}^{n} \rho_\tau\big( Y_i(\hat{\lambda}) - X_i'b \big). \qquad (2.12) $$
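A rough sketch of (2.11)-(2.12), with the weighting matrix Ω set to the identity for simplicity; the optimiser choice and starting values are illustrative assumptions, not part of the original method.

```python
import numpy as np
from scipy.optimize import minimize
from statsmodels.regression.quantile_regression import QuantReg

def boxcox(y, lam):
    """Box-Cox transformation (2.10); y must be positive."""
    return np.log(y) if abs(lam) < 1e-8 else (y**lam - 1.0) / lam

def boxcox_gqr(y, X, W, tau, lam_start=0.5):
    """Sketch of the two-step Box-Cox QR: NIV estimation (2.11) of a constant
    lambda (with Omega = I), then linear QR (2.12) of Y(lambda_hat) on X."""
    def niv_objective(par):                     # (sum_i W_i e_i)' Omega (sum_i W_i e_i)
        lam, b = par[0], par[1:]
        e = boxcox(y, lam) - X @ b
        m = W.T @ e                             # moment vector sum_i W_i e_i
        return m @ m

    b_start = np.linalg.lstsq(X, boxcox(y, lam_start), rcond=None)[0]
    opt = minimize(niv_objective, np.concatenate([[lam_start], b_start]),
                   method="Nelder-Mead", options={"maxiter": 20000})
    lam_hat = opt.x[0]
    y_gen = boxcox(y, lam_hat)                  # generated dependent variable
    beta_tau = QuantReg(y_gen, X).fit(q=tau).params
    return lam_hat, beta_tau
```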
Endogenous quantile regression with a control variable

The control variable approach views endogeneity bias as an omitted variable bias and proceeds by estimating the 'control variable', the residual of the regression of the endogenous regressor on the instruments, conditional on which the error becomes independent of the regressors (see Blundell & Powell (2003)). Consider the following system of equations
$$ Y = W'\alpha + X\beta + \varepsilon, \qquad (2.13) $$
$$ X = Z'\gamma + \eta, \qquad (2.14) $$
where $W$ is a vector of exogenous covariates and $X$ is the endogenous regressor of interest, generated by (2.14) in which $Z$ is the vector of instruments uncorrelated with $\eta$ and $\varepsilon$, $\eta$ being centered with a finite variance. Hence, endogeneity in $X$ arises due to the unobserved latent variable $\eta$, adding which as a regressor in the first equation 'corrects' for endogeneity, as in the following quantile specification:
$$ Q_{Y \mid W,X,\eta}(\tau \mid W, X, \eta) = W'\alpha(\tau) + X\beta(\tau) + \eta\lambda(\tau). \qquad (2.15) $$
The above model can be estimated in two steps as follows. The first stage least squares estimates the control variable $\eta$,
$$ \hat{\eta}_i = X_i - Z_i'\hat{\gamma}, \qquad \hat{\gamma} = \Bigg( \sum_{i=1}^{N} Z_i Z_i' \Bigg)^{-1} \sum_{i=1}^{N} Z_i X_i. \qquad (2.16) $$
The second stage estimator of the quantile coefficients is
$$ \big[ \hat{\alpha}'(\tau), \hat{\beta}(\tau), \hat{\lambda}(\tau) \big]' = \arg\min_{\alpha,\beta,\lambda} \sum_{i=1}^{N} \rho_\tau\big( Y_i - W_i'\alpha - X_i\beta - \hat{\eta}_i\lambda \big) = \arg\min_{\alpha,\beta,\lambda} \sum_{i=1}^{N} \rho_\tau\big( Y_i - W_i'\alpha - X_i\beta - (X_i - Z_i'\hat{\gamma})\lambda \big). \qquad (2.17) $$
Hence, in this example, the first-step estimator is $\theta \equiv \gamma$, and the second stage involves quantile regression of $Y_i$ on the generated regressors $X_i(\theta) \equiv [W_i', X_i, (X_i - Z_i'\gamma)]'$.
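A minimal sketch of (2.16)-(2.17); the arrays W and Z are assumed to already contain any constant terms the specification requires.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def control_variable_qr(y, W, x, Z, tau):
    """Sketch of (2.16)-(2.17): OLS of the endogenous regressor x on the
    instruments Z gives the control variable eta_hat, which enters the
    second-stage quantile regression as an additional generated regressor."""
    gamma_hat = sm.OLS(x, Z).fit().params             # first stage (2.16)
    eta_hat = x - Z @ gamma_hat                       # estimated control variable
    regressors = np.column_stack([W, x, eta_hat])     # generated regressors X_i(gamma_hat)
    fit = QuantReg(y, regressors).fit(q=tau)          # second stage (2.17)
    return gamma_hat, fit.params                      # (alpha(tau), beta(tau), lambda(tau))
```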
Asymptotic analysis

Our main assumptions are as follows.

Assumption 1 (First step estimator). There exists a function $\psi(z)$ such that the estimator of the true $\theta_0$ is asymptotically linear:
$$ \sqrt{n}\big( \hat{\theta} - \theta_0 \big) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi(z_i) + o_P(1), \qquad E[\psi(z)] = 0, \quad E\big[\psi(z)\psi(z)'\big] < \infty. $$

Assumption 2 (Model). $(X_i, Y_i)$ are i.i.d. There exists a compact set $\Theta$ with a non-empty interior containing $\theta_0$ such that $X_i(\theta) = h(X_i, \theta)$ and $Y_i(\theta) = g(Y_i, X_i, \theta)$ are continuous and differentiable with respect to $\theta$ in $\Theta$ for all $(Y_i, X_i)$. Denoting $\|\cdot\|$ the Euclidean norm, it holds moreover that
$$ \sup_{\theta \in \Theta} \Big\| \frac{\partial g(Y, X, \theta)}{\partial \theta} \Big\| < \infty. $$

In the next Assumption, $F(y \mid x, \theta)$ and $f(y \mid x, \theta)$ stand for the c.d.f. and p.d.f. of $Y(\theta)$ given $X(\theta)$, $f_X(\cdot \mid \theta)$ being the p.d.f. of $X(\theta)$. The set $\mathcal{X}(\theta)$ is the support of $X(\theta)$. All p.d.f.'s are defined with respect to the Lebesgue measure. The set $\Theta$ is as in Assumption 2.

Assumption 3 (Smoothness). (i) $X(\theta)$ lies in $\mathbb{R}^d$ for each $\theta$ and $\mathcal{X}(\theta)$ is a compact subset of $\mathbb{R}^d$ with non-empty interior. $f_X(x \mid \theta) > 0$ over the interior of $\mathcal{X}(\theta)$ and vanishes at its boundaries. $f_X(x \mid \theta)$ is continuously differentiable with respect to $\theta$. (ii) The p.d.f. $f(y \mid x, \theta)$ of $Y(\theta)$ given $X(\theta)$ is continuously differentiable in $(y, x, \theta)$ with $f(y \mid x, \theta) > 0$ for all $(y, x, \theta)$ such that $(x, \theta) \in \bigcup_{\theta \in \Theta} \{\mathcal{X}(\theta) \times \{\theta\}\}$ and $y$ is in the interior of the support of $F(\cdot \mid x, \theta)$.

Asymptotically linear estimators in Assumption 1 refer to the class of extremum estimators as considered in Newey & McFadden (1994). Examples include MLE, NLS, and the GMM class. The assumption implies $\sqrt{n}$-consistency of the first-step estimator and is key to the derivation of the asymptotic normality result for the second-step estimator. The triangular structure imposed by Assumption 2 ensures that $X(\theta)$ is not a function of $Y$ and therefore remains exogenous; it is useful in the example of Section 2.1.1. Assumption 3-(ii) is a high-level assumption that can be derived from Assumption 2 and the quantile regression slope $\beta(\cdot)$ since $g(Y, X, \theta_0) = X(\theta_0)'\beta(U)$. It implicitly requires a monotone $g(\cdot, X, \theta)$ with non-zero derivatives, as $f(\cdot \mid x, \theta)$ may diverge otherwise. Indeed, if $\partial g(y, x, \theta)/\partial y > 0$ and $f(y \mid x)$ is the p.d.f. of $Y$ given $X$ (assuming $X(\theta) = X$ for the sake of brevity), it holds that
$$ f(y \mid x, \theta) = \frac{1}{\frac{\partial g}{\partial y}\big[ g^{-1}(y, x, \theta), x, \theta \big]}\, f\big[ g^{-1}(y, x, \theta) \mid x \big], $$
which may not be bounded if $\partial g(y, x, \theta)/\partial y$ vanishes. Assumption 3-(ii) then holds if $f(y \mid x)$ is continuously differentiable in $(x, y)$ and $g(y, x, \theta)$ is twice differentiable with respect to $y$ and $\theta$ with bounded partial derivatives. Assumption 3-(i) is similar, but note that the transformation $X(\theta) = h(X, \theta)$ does not need to be one-to-one, as $X(\theta)$ may have a smaller dimension than $X$.

The QR estimator of the slope coefficient is an estimator of $\beta(\tau; \hat{\theta})$, where
$$ \beta(\tau; \theta) = \arg\min_{\beta} E\big[ \rho_\tau\big( Y(\theta) - X'(\theta)\beta \big) \big]. \qquad (3.1) $$
Assumption 3 ensures that the objective function is strictly convex for all $\theta$, so that $\beta(\tau; \theta)$ is the unique solution of the first-order condition
$$ 0 = E\big[ \big\{ I\big( Y(\theta) \le X'(\theta)\beta \big) - \tau \big\} X(\theta) \big] = E\big[ \big\{ F\big( X'(\theta)\beta \mid X, \theta \big) - \tau \big\} X(\theta) \big]. $$
This together with the Implicit Function Theorem implies that $\beta(\tau; \theta)$ is differentiable with respect to $\theta$, as established in the following Proposition.
Proposition 1
Under Assumptions 2 and 3, $\beta(\tau; \theta)$ is continuously differentiable with respect to $\theta$ for any $\theta \in \Theta$ and $0 < \tau < 1$. It holds moreover that
$$ \frac{\partial \beta(\tau; \theta)}{\partial \theta} = H(\tau; \theta)^{-1} D(\tau; \theta), $$
where
$$ H(\tau; \theta) = E\big[ f\big( X'(\theta)\beta(\tau; \theta) \mid X, \theta \big)\, X(\theta) X'(\theta) \big], \qquad D(\tau; \theta) = -\frac{\partial}{\partial \theta}\, E\big[ \big\{ F\big( X'(\theta)\beta \mid X, \theta \big) - \tau \big\} X(\theta) \big] \Big|_{\beta = \beta(\tau; \theta)}. $$

Proof of Proposition 1: See the proof section in Appendix 1.

The matrix $H(\tau; \theta)$ plays an important role in the asymptotic distribution of standard QR estimators; see below and Koenker (2005). The existence of its inverse is established in Lemma 1 of the proof section in Appendix 1. The matrix $D(\tau; \theta)$ is specific to two-stage estimation. With known $\theta$, a linear representation for $\sqrt{n}\big( \hat{\beta}(\tau; \theta) - \beta(\tau; \theta) \big)$ can be found in Koenker (2005), Section 4. An estimated $\hat{\theta}$ induces some important changes compared to a known $\theta$ and requires finding an approximation for $\sqrt{n}\big( \hat{\beta}(\tau; \hat{\theta}) - \beta(\tau; \hat{\theta}) \big)$. The approach used here builds on a Bahadur expansion which holds uniformly in $\theta$ and $\tau$. While detailed proofs are in Appendices 1-2, a heuristic description of the Bahadur expansion proof is as follows.

Heuristics.
Define
$$ \hat{S}(\tau; \theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big[ I\big( Y_i(\theta) \le X_i'(\theta)\beta(\tau; \theta) \big) - \tau \big] X_i(\theta), \qquad (3.2) $$
$$ J(\tau; \theta) = \tau(1-\tau)\, E\big[ X_i(\theta) X_i'(\theta) \big], \qquad (3.3) $$
$$ \hat{E}(\tau; \theta) = \sqrt{n}\big( \hat{\beta}(\tau; \theta) - \beta(\tau; \theta) \big) - \big( -H^{-1}(\tau; \theta)\hat{S}(\tau; \theta) \big). \qquad (3.4) $$
Note that for a given $\theta$, $\hat{S}(\tau; \theta)/\sqrt{n}$ is the score of the objective function in (3.1), and $\hat{S}(\tau; \theta)$ is centered for $0 < \tau < 1$ with variance $J(\tau; \theta)$. The basic idea is to approximate $\sqrt{n}\big( \hat{\beta}(\tau; \hat{\theta}) - \beta(\tau; \hat{\theta}) \big)$ with $-H^{-1}(\tau; \theta_0)\hat{S}(\tau; \theta_0)$, assuming that the approximation error is of the right order. If so, the asymptotic normality of the two-step QR estimator follows from that of its score evaluated at the true first stage. Crucially, it needs to be shown that the approximation error is small. The approach here is based on two main results: showing that the Bahadur error term $\hat{E}(\tau; \theta)$ is small by finding its uniform bound for all $\theta$ and $\tau$, and proving the stochastic equicontinuity of $H^{-1}(\tau; \theta)\hat{S}(\tau; \theta)$ at the true $\theta_0$.

The outline of the proof is as follows. Let $\hat{E}(\tau; \theta) = \arg\min_{\epsilon} L_n(\epsilon, \tau; \theta)$, where $L_n$ is so defined as to be a linear combination of $\rho_\tau(\cdot)$ and, thus, is convex. Consider the decomposition $L_n(\epsilon, \tau; \theta) = \bar{L}_n(\epsilon, \tau; \theta) + R_n(\epsilon, \tau; \theta)$, where $\bar{L}_n$ is the quadratic approximation of $L_n$ and $R_n$ is the remainder term. Finding the uniform order of $\hat{E}$ means finding bounds for the probability of $\|\hat{E}\| \ge t_n$, for a small number $t_n$ such that $t_n \to 0$ as $n \to \infty$, for all $\theta$ and $\tau$. This involves finding bounds on $\inf_{\|\epsilon\| \ge t_n} L_n(\epsilon, \tau; \theta)$, which, in turn, requires placing bounds on $\inf_{\|\epsilon\| = t_n} \bar{L}_n(\epsilon, \tau; \theta)$ and $\sup_{\|\epsilon\| = t_n} R_n(\epsilon, \tau; \theta)$. Note that convexity allows us to make inference for the non-compact set $\|\epsilon\| \ge t_n$ by considering the compact set $\|\epsilon\| = t_n$ (as detailed under the heading 'Uniform order for $\hat{E}(\tau; \theta)$' in Appendix 1). Obtaining bounds for $\bar{L}_n$ is straightforward. The uniform order of $R_n$ over all $\theta$ and $\tau$, as obtained in Appendix 1, equation (A1.12), relies on establishing a Bernstein-type maximal inequality for the empirical process $R_n$. The stochastic equicontinuity of $H^{-1}\hat{S}$ also follows from similar maximal inequality arguments under bracketing entropy. The next Proposition presents the Bahadur error bound and the stochastic equicontinuity result used to establish the linearisation of the GQR estimator, uniformly in $\tau$ and $\theta$.

Proposition 2
Under Assumptions 1-3, it holds for any compact parameter set $\Theta$ and $C > 0$ that
$$ \text{(i)} \quad \sup_{(\tau,\theta) \in [\underline{\tau}, \overline{\tau}] \times \Theta} \big\| \hat{E}(\tau; \theta) \big\| = O_P\Big( \frac{\log^{3/4} n}{n^{1/4}} \Big), \qquad (3.5) $$
$$ \text{(ii)} \quad \sup_{(\tau,\theta) \in [\underline{\tau}, \overline{\tau}] \times \mathcal{B}(\theta_0, Cn^{-1/2})} \big\| H^{-1}(\tau; \theta)\hat{S}(\tau; \theta) - H^{-1}(\tau; \theta_0)\hat{S}(\tau; \theta_0) \big\| = O_P\Big( \frac{\log^{3/4} n}{n^{1/4}} \Big), \qquad (3.6) $$
where $0 < \underline{\tau} \le \overline{\tau} < 1$ and $\mathcal{B}(\theta_0, \varrho) = \{\theta;\ \|\theta - \theta_0\| \le \varrho\}$.

Proof of Proposition 2: See the proof section in Appendix 1.

Propositions 1 and 2 give the next Theorem, which states a Central Limit Theorem for the two-step estimator of the slope coefficient. Note that in the absence of the parameter $\theta$, the asymptotic normality result is the same as that of the usual quantile regression estimator, as derived in Koenker (2005).

Theorem 1
Under Assumptions 1-3, it holds for any $\tau$ in $(0,1)$ that
$$ \sqrt{n}\big( \hat{\beta}(\tau) - \beta(\tau) \big) \overset{d}{\to} N\big(0, V(\tau)\big), $$
where
$$ V(\tau) = H(\tau; \theta_0)^{-1} \big[ J(\tau; \theta_0) + D(\tau; \theta_0) C_{\Psi S}(\tau) + C_{\Psi S}'(\tau) D'(\tau; \theta_0) + D(\tau; \theta_0) C_{\Psi\Psi} D'(\tau; \theta_0) \big] H(\tau; \theta_0)^{-1}, $$
$$ C_{\Psi\Psi} = E\big[\Psi(Z)\Psi'(Z)\big], \qquad C_{\Psi S}(\tau) = E\big[ \Psi(Z)\, X'(\theta_0)\, \big\{ I\big[ Y(\theta_0) \le X'(\theta_0)\beta(\tau; \theta_0) \big] - \tau \big\} \big]. $$

Proof of Theorem 1.
Proposition 1 yields that
$$ \sqrt{n}\big( \hat{\beta}(\tau; \hat{\theta}) - \beta(\tau; \theta_0) \big) = \sqrt{n}\big( \hat{\beta}(\tau; \hat{\theta}) - \beta(\tau; \hat{\theta}) \big) + \sqrt{n}\big( \beta(\tau; \hat{\theta}) - \beta(\tau; \theta_0) \big) $$
$$ = \sqrt{n}\big( \hat{\beta}(\tau; \hat{\theta}) - \beta(\tau; \hat{\theta}) \big) + \Big( \frac{\partial \beta(\tau; \theta_0)}{\partial \theta} + o_P(1) \Big) \sqrt{n}\big( \hat{\theta} - \theta_0 \big) = \sqrt{n}\big( \hat{\beta}(\tau; \hat{\theta}) - \beta(\tau; \hat{\theta}) \big) + \Big( \frac{\partial \beta(\tau; \theta_0)}{\partial \theta} \Big)' \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Psi(Z_i) + o_P(1), \qquad (3.7) $$
where the last line holds thanks to Assumption 1. Equation (3.7) and Proposition 2 give
$$ \sqrt{n}\big( \hat{\beta}(\tau) - \beta(\tau) \big) = -H^{-1}(\tau; \hat{\theta})\hat{S}(\tau; \hat{\theta}) + \Big( \frac{\partial \beta(\tau; \theta_0)}{\partial \theta} \Big)' \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Psi(Z_i) + o_P(1) = -H^{-1}(\tau; \theta_0)\hat{S}(\tau; \theta_0) + \Big( \frac{\partial \beta(\tau; \theta_0)}{\partial \theta} \Big)' \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Psi(Z_i) + o_P(1), \qquad (3.8) $$
where the last line results from (3.6), since Assumption 1 and taking $C$ large enough ensure that $\hat{\theta}$ belongs to $\mathcal{B}(\theta_0, Cn^{-1/2})$ with high probability. Since $\partial \beta(\tau; \theta_0)/\partial \theta = H(\tau; \theta_0)^{-1} D(\tau; \theta_0)$ from Proposition 1, the limit distribution of Theorem 1 follows from the multivariate CLT. □

Remark 1.
As Propositions 1 and 2 hold uniformly in $\tau$, the expansion (3.8) also does. Since Functional Central Limit Theorems for $\hat{S}(\tau; \theta_0)$ can be applied, (3.8) can be used to obtain a Functional Central Limit Theorem for the two-step quantile regression estimator.

Remark 2.
The order of the $o_P(1)$ remainder term in (3.8) can be made more precise by strengthening the smoothness Assumptions 2 and 3 to ensure that $\beta(\tau; \theta)$ is twice continuously differentiable, using the Implicit Function Theorem as in Proposition 1. Indeed, if $\beta(\tau; \theta)$ is twice continuously differentiable with respect to $\theta$, the $o_P(1)$ remainder term in (3.7) is an $O_P(n^{-1/2})$, and the order of the $o_P(1)$ remainder term in (3.8) follows from (3.5) and is $O_P\big(n^{-1/4}\log^{3/4} n\big)$.

Remark 3.
The proof can be easily modified for the case where θ depends upon τ . Remark 4.
For estimating the GQR asymptotic variance, a kernel-based approach can be employed with numerical derivatives. But the bootstrap may be preferable and, indeed, is more suitable for quantile regression (see Koenker (2005) and the references therein). The validity of the bootstrap for obtaining asymptotic confidence intervals of two-step semiparametric estimators with non-smooth objective functions has been proven by Chen et al. (2003), implying its correctness for the GQR estimator.
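In this spirit, the following is a minimal sketch of a pairs bootstrap that re-runs both estimation steps on each resample, so that the first-stage estimation error is reflected in the standard errors; fit_two_step stands for any of the two-step routines sketched above and is an assumption of the sketch.

```python
import numpy as np

def bootstrap_se(y, x, tau, fit_two_step, B=500, seed=0):
    """Bootstrap standard errors for a two-step GQR estimator.

    fit_two_step : callable (y, x, tau) -> parameter vector; it must re-run
    BOTH steps so that the first-stage estimation error is carried through
    to the resampled estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # resample (Y_i, X_i) pairs with replacement
        draws.append(fit_two_step(y[idx], x[idx], tau))
    draws = np.asarray(draws)
    return draws.std(axis=0, ddof=1)            # bootstrap SE of each coefficient
```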
Examples revisited
In this section, we apply the asymptotic theory results of Section 3 to the motivating examples introduced in Section 2.1.
Quantile regression with constant slope parameters

For the quantile regression model (2.3), recall that the constant parameter $\beta_1(\cdot)$ is estimated using least squares regression, and the quantile parameters $(\beta_0(\cdot), \beta_2(\cdot))$ are estimated using the generated dependent variable $Y_i(\hat{\beta}_1) = Y_i - \hat{\beta}_1 X_{1i}$ via the two-step quantile regression estimator of (2.5). Asymptotic normality of the first-step OLS estimator is well established. Denote $X = [1, X_1, X_2]'$. Assume that $E[\varepsilon^2 XX']$ is finite and $E[XX']$ is full rank and finite. The OLS estimator is asymptotically linear:
$$ \sqrt{n}\big( \hat{\beta} - \beta \big) = \sum_{i=1}^{n} E^{-1}[XX']\, X_i \varepsilon_i / \sqrt{n} + o_P(1). $$
Denoting $i_1 = [0, 1, 0]$, the asymptotic variance of $\hat{\beta}_1$ is given by
$$ V(\beta_1) = i_1 \big( E^{-1}[XX']\, E[\varepsilon^2 XX']\, E^{-1}[XX'] \big) i_1'. \qquad (4.1) $$
For the second-step quantile regression, the dependent variable is generated as $Y(\hat{\beta}_1) = Y - \hat{\beta}_1 X_1$, and the regressors are denoted $\tilde{X} = [1, X_2]'$. Asymptotic normality of the quantile parameters $\beta(\tau) = (\beta_0(\tau), \beta_2(\tau))'$ follows directly from Theorem 1:
$$ \sqrt{n} \begin{bmatrix} \hat{\beta}_0(\tau) - \beta_0(\tau) \\ \hat{\beta}_2(\tau) - \beta_2(\tau) \end{bmatrix} \overset{d}{\to} N\big(0, V(\tau)\big). $$
The terms of $V(\tau)$ are obtained from Theorem 1 by replacing $\theta \equiv \beta_1$, $\beta(\tau) \equiv (\beta_0(\tau), \beta_2(\tau))'$, $X(\theta) \equiv \tilde{X} = [1, X_2]'$ and $Y(\theta) \equiv Y(\beta_1) = Y - \beta_1 X_1$. Denoting the first $\tau$-derivative of $\beta(\tau)$ by $\beta^{(1)}(\tau)$, $V(\tau)$ comes out as
$$ V(\tau) = H(\tau)^{-1} \big\{ J(\tau) + D(\tau) V(\beta_1) D(\tau)' + D(\tau)C(\tau)' + C(\tau)D(\tau)' \big\} H(\tau)^{-1}, $$
where
$$ H(\tau) = E\Bigg[ \frac{\tilde{X}\tilde{X}'}{\beta_0^{(1)}(\tau) + \beta_2^{(1)}(\tau) X_2} \Bigg], \qquad J(\tau) = \tau(1-\tau)\, E\big[ \tilde{X}\tilde{X}' \big], \qquad D(\tau) = -E\Bigg[ \frac{X_1 \tilde{X}}{\beta_0^{(1)}(\tau) + \beta_2^{(1)}(\tau) X_2} \Bigg], $$
and
$$ C(\tau) = E\Bigg[ g(X) \Big\{ \int_0^\tau \big( \beta_0(t) + \beta_2(t) X_2 \big)\, dt - \tau\big( \beta_0(\tau) + \beta_2(\tau) X_2 \big) \Big\} \Bigg], \qquad (4.2) $$
with $g(X) = \tilde{X}\, [0, 1, 0]\, E^{-1}[XX']\, X$.

Random coefficient model
For the random coefficient model in (2.6), recall that for identification of the quantile specification in (2.7) we normalise $\xi(1/2) = 1$. Denote the $\tau$-derivative of $\xi(\tau)$ by $\xi^{(1)}(\tau)$. The first step parameters $\theta \equiv (\mu, \Sigma)$ are estimated by (2.8); denote $G(\cdot) = X_i'\mu + \|\Sigma^{1/2}X_i\|$. The $\theta$-derivative of $G(\cdot)$ is
$$ G^\theta = \begin{bmatrix} X \\ \dfrac{\partial \|\Sigma^{1/2}X\|}{\partial \sigma} \end{bmatrix}, $$
where $\sigma$ stacks the distinct elements of $\Sigma^{1/2}$. The non-linear median regression estimator of (2.8) is asymptotically linear (see Section 4.4 of Koenker (2005)):
$$ \sqrt{n}\big(\hat{\theta} - \theta\big) = H_0^{-1} \sum_{i=1}^{n} G^\theta_i \big[ 1/2 - I\big( Y_i \le G_i(\cdot) \big) \big] / \sqrt{n} + o_P(1), \qquad \text{where } H_0 = E\Bigg[ \frac{G^\theta G^{\theta\prime}}{\|\Sigma^{1/2}X\|\, \xi^{(1)}(1/2)} \Bigg]. $$
The asymptotic variance of $\hat{\theta}$ is given by $V(\theta) = H_0^{-1} E\big[G^\theta G^{\theta\prime}\big] H_0^{-1} / 4$. The dependent variable is generated as $Y(\hat{\theta}) \equiv Y(\hat{\mu}, \hat{\Sigma}) = (Y_i - X_i'\hat{\mu})/\|\hat{\Sigma}^{1/2}X_i\|$ and used in (2.9). The asymptotic normality of $\hat{\xi}(\tau)$ follows from Theorem 1:
$$ \sqrt{n}\big[ \hat{\xi}(\tau) - \xi(\tau) \big] \overset{d}{\to} N\big(0, V(\tau)\big), \qquad V(\tau) = H(\tau)^{-1}\big\{ J(\tau) + D(\tau)V(\theta)D(\tau)' + D(\tau)C(\tau)' + C(\tau)D(\tau)' \big\}H(\tau)^{-1}. $$
The terms of $V(\tau)$ are
$$ H(\tau) = E\Big[ \frac{1}{\xi^{(1)}(\tau)} \Big], \qquad J(\tau) = \tau(1-\tau), \qquad D(\tau) = -E\Bigg[ \frac{1}{\xi^{(1)}(\tau)\,\|\Sigma^{1/2}X\|} \begin{bmatrix} X \\ \dfrac{\partial \|\Sigma^{1/2}X\|}{\partial \sigma}\, \xi(\tau) \end{bmatrix} \Bigg], $$
$$ C(\tau) = E\Bigg[ \Psi(\cdot) \Big\{ I\Big( \frac{Y - X'\mu}{\|\Sigma^{1/2}X\|} \le \xi(\tau) \Big) - \tau \Big\} \Bigg], \qquad \text{where } \Psi(\cdot) = H_0^{-1} G^\theta \big[ 1/2 - I\big( Y \le G(\cdot) \big) \big]. $$

Box-Cox transformed quantile regression

The Box-Cox transformation parameter of (2.10) is estimated using the nonlinear IV (NIV) estimator of (2.11). The conditional quantile model for the generated dependent variable $Y(\hat{\lambda})$ is assumed linear in the parameters, which are estimated using the QR estimator of (2.12). Amemiya (1974) establishes the limiting behaviour of the NIV estimator. Assume that $E\big[(Y(\lambda) - X'\beta)^2 W W'\big]$ is finite and $\Omega$ is full rank and finite. Note that if $\beta$ is a $K$-dimensional vector, then the NIV estimator estimates $(K+1)$ parameters, denoted $\theta = [\lambda, \beta']'$. Denote the $(K+1)$-order square matrix
$$ G = E\Big[ W \frac{\partial Y(\lambda)}{\partial \lambda},\ -W X' \Big]. $$
Then
$$ \sqrt{n}\big( \hat{\theta} - \theta \big) = \sum_{i=1}^{n} \Big[ -\big( G'\Omega G \big)^{-1} G'\Omega\, W_i \big( Y_i(\lambda) - X_i'\beta \big) \Big] / \sqrt{n} + o_P(1). $$
The asymptotic variance of $\hat{\lambda}$, denoted $V(\lambda)$, is the first diagonal term of the asymptotic variance-covariance matrix of $\hat{\theta}$. Denoting $i_1 = [1, 0_{1\times K}]$, where $0_{1\times K}$ is a $K$-dimensional row vector of zeros,
$$ V(\lambda) = i_1 \Big( \big( G'\Omega G \big)^{-1} G'\Omega\, E\big[ \big( Y(\lambda) - X'\beta \big)^2 W W' \big]\, \Omega G \big( G'\Omega G \big)^{-1} \Big) i_1'. $$
Asymptotic normality for the quantile estimates obtained from QR of $Y(\hat{\lambda})$ on $X$ follows directly from Theorem 1:
$$ \sqrt{n}\big( \hat{\beta}(\tau) - \beta(\tau) \big) \overset{d}{\to} N\big(0, V(\tau)\big), \qquad V(\tau) = H(\tau)^{-1}\big\{ J(\tau) + D(\tau)V(\lambda)D(\tau)' + D(\tau)C(\tau)' + C(\tau)D(\tau)' \big\}H(\tau)^{-1}. $$
The terms of $V(\tau)$ are given by
$$ H(\tau) = E\Bigg[ \frac{XX'}{X'\beta^{(1)}(\tau)} \Bigg], \qquad J(\tau) = \tau(1-\tau)\, E[XX'], \qquad D(\tau) = -E\Bigg[ \frac{X}{X'\beta^{(1)}(\tau)} \cdot \frac{\partial \big( \lambda X'\beta(\tau) + 1 \big)^{1/\lambda}}{\partial \lambda} \Bigg], $$
$$ C(\tau) = E\Big[ g(X) \Big\{ \int_0^\tau X'\beta(t)\, dt - \tau X'\beta(\tau) \Big\} \Big], \qquad \text{where } g(X) = X\, [1, 0_{1\times K}] \Big( -\big( G'\Omega G \big)^{-1} G'\Omega W \Big). $$

Endogenous quantile regression with a control variable

The quantile regression model in (2.15) is estimated in two steps. The first step uses the OLS estimator of (2.16) to estimate $\hat{\gamma}$. This is used to generate the control variable $\hat{\eta}_i = (X_i - Z_i'\hat{\gamma})$, which is included as a regressor in the quantile regression estimator of (2.17) for estimating the quantile parameters $\delta(\tau) \equiv (\alpha(\tau)', \beta(\tau), \lambda(\tau))'$. Denote the generated regressors by $X(\gamma) = [W', X, (X - Z'\gamma)]'$. We assume that $E[\eta^2 \mid Z] = \sigma^2$ and $E[ZZ']$ is finite.
The OLS estimator is asymptotically linear:
$$ \sqrt{n}\big(\hat{\gamma} - \gamma\big) = \sum_{i=1}^{n} E^{-1}[ZZ']\, Z_i \eta_i / \sqrt{n} + o_P(1). $$
The asymptotic normality of the quantile parameter estimates $\hat{\delta}(\tau)$ follows directly from Theorem 1:
$$ \sqrt{n}\big[ \hat{\delta}(\tau) - \delta(\tau) \big] \overset{d}{\to} N\big(0, V(\tau)\big), \qquad V(\tau) = H(\tau)^{-1}\big\{ J(\tau) + D(\tau)\,\sigma^2 E^{-1}[ZZ']\,D(\tau)' + D(\tau)C(\tau)' + C(\tau)D(\tau)' \big\}H(\tau)^{-1}. $$
The terms of $V(\tau)$ are given by
$$ J(\tau) = \tau(1-\tau)\, E\big[X(\gamma)X(\gamma)'\big], \qquad D(\tau) = -\frac{\partial}{\partial \gamma}\, E\big[ \big\{ F\big( X'(\gamma)\delta \mid W, X, \gamma \big) - \tau \big\} X(\gamma) \big] \Big|_{\delta = \delta(\tau)}, $$
$$ C(\tau) = E\big[ X(\gamma)\big( I\big( Y \le X(\gamma)'\delta(\tau) \big) - \tau \big)\, \eta\, Z'\, E^{-1}[ZZ'] \big], \qquad H(\tau) = E\Bigg[ \frac{X(\gamma)X(\gamma)'}{X(\gamma)'\delta^{(1)}(\tau)} \Bigg]. $$

Simulations

This section reports the results of simulation exercises to illustrate the performance of the two-step GQR estimator and validate the asymptotic normality result of Theorem 1. The simulations are based on the quantile regression with constant slope model of Section 2.1.1,
$$ Q_Y(\tau \mid X) = \beta_0(\tau) + \beta_1(\tau)X_1 + \beta_2(\tau)X_2, $$
with true parameters
$$ \beta_0(\tau) = e^\tau, \qquad \beta_1(\tau) = \beta_1 = 1 \ \forall \tau, \qquad \beta_2(\tau) = 2\tau^2. \qquad (5.1) $$
Data are generated as $Y_i = \beta_0(U_i) + \beta_1 X_{1i} + \beta_2(U_i) X_{2i}$, where $(X_{1i}, X_{2i})$ are uniform random variables on $[1, 5]$ and $[3, 10]$ respectively, and $U_i$ is a uniform $[0, 1]$ random variable, $i = 1, \ldots, n$.
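A minimal sketch of one draw from the design (5.1) and of the GQR, standard QR and infeasible i-QR estimates at a given quantile level; statsmodels is assumed for the fits.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def simulate(n, rng):
    """One sample from design (5.1):
    Y = beta0(U) + beta1*X1 + beta2(U)*X2 with beta0(u)=exp(u), beta1=1, beta2(u)=2u^2."""
    x1 = rng.uniform(1, 5, n)
    x2 = rng.uniform(3, 10, n)
    u = rng.uniform(0, 1, n)
    y = np.exp(u) + 1.0 * x1 + 2.0 * u**2 * x2
    return y, x1, x2

rng = np.random.default_rng(123)
y, x1, x2 = simulate(1000, rng)
tau = 0.5

# GQR: OLS for the constant slope, then QR of Y - beta1_hat*X1 on (1, X2)
beta1_hat = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1]
gqr = QuantReg(y - beta1_hat * x1, sm.add_constant(x2)).fit(q=tau).params

# Standard QR: all three coefficients estimated jointly at each tau
qr = QuantReg(y, sm.add_constant(np.column_stack([x1, x2]))).fit(q=tau).params

# Infeasible i-QR: uses the true beta1 = 1 instead of its estimate
iqr = QuantReg(y - 1.0 * x1, sm.add_constant(x2)).fit(q=tau).params
```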
Sample sizes of n = 100 and n = 1000 are considered. The number of simulation replications is set to 1000. GQR estimation of the above model proceeds as in Section 2.1.1. We also compare GQR with standard quantile regression, where all parameters, both constant and quantile-varying, are estimated together by quantile regression of Y on the X's. Also, to clearly see the effect of first-stage estimation on the overall variance, the GQR estimator is compared with an infeasible quantile regression (i-QR) estimator which uses the true value of the first-step parameter instead of its estimate, that is, the unknown dependent variable Y*_i(β1) = Y_i − β1 X_{1i}, for QR-based estimation of the quantile parameters.

Following Remark 4, asymptotic variance estimation for validating the asymptotic normality result and for finding confidence intervals follows Buchinsky (1994)'s design matrix bootstrap. The design matrix bootstrap is extensively used in empirical applications of quantile regression involving large samples; see, for instance, Buchinsky (1994) and Abrevaya (2002). (See Buchinsky (1995) and Koenker & Hallock (2001) for a comparison of various QR variance estimators; they conclude in favour of the design matrix bootstrap.) The approach is as follows. For B bootstrap replications, each of size m (drawn with replacement from an overall sample of size n), b = 1, ..., B bootstrap quantile estimates are obtained at each quantile level. This follows the so-called m-out-of-n bootstrap technique, which provides a significant computational advantage when the sample size is large. Following Buchinsky (1994), the sample covariance of these estimates, rescaled by (m/n), constitutes a valid estimator of the covariance matrix of the QR estimator. Hence, the estimate of the asymptotic covariance V(τ), with quantile parameters β(·) and the bootstrap estimates denoted by β̂^b(τ), b = 1, ..., B, is given by
$$ \hat{V}(\tau) = n \Big( \frac{m}{n} \Big) \frac{1}{B} \sum_{b=1}^{B} \big( \hat{\beta}^b(\tau) - \hat{\beta}^b_A(\tau) \big)\big( \hat{\beta}^b(\tau) - \hat{\beta}^b_A(\tau) \big)', \qquad (5.2) $$
where β̂^b_A(τ) is the average of the B bootstrap estimates. We set B = 1000; for n = 1000, the bootstrap sample size is m = 300, while for n = 100, we have m = n. The choices of bootstrap replications and sample size are consistent with Buchinsky (1995) and Andrews & Buchinsky (2000). We estimate V̂(τ) from (5.2) for each of the 1000 simulations and report the average.
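A minimal sketch of the m-out-of-n design matrix bootstrap variance (5.2); fit stands for a routine that re-runs both GQR steps on the resample and returns the quantile coefficient vector, for example a wrapper around the constant-slope sketch above.

```python
import numpy as np

def design_matrix_bootstrap_V(y, x1, x2, tau, fit, m, B=1000, seed=0):
    """m-out-of-n design matrix bootstrap estimate of V(tau), eq. (5.2):
    resample m observations with replacement, re-estimate, and rescale the
    sample covariance of the B bootstrap estimates by n*(m/n) = m."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = []
    for _ in range(B):
        idx = rng.integers(0, n, size=m)                  # m-out-of-n resample
        betas.append(fit(y[idx], x1[idx], x2[idx], tau))  # re-run BOTH estimation steps
    betas = np.asarray(betas)
    dev = betas - betas.mean(axis=0)
    return m * (dev.T @ dev) / B                          # eq. (5.2)
```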
007 (withaverage standard deviation = 0 . .
001 (with average standarddeviation = 0 . b β ( · ) for GQR, standard QR andi-QR estimation methods, with varying n . All methods of estimation have low biases and the RMSEfalls with increasing sample size. We note that while all estimation procedures have similar biases,the RMSE with GQR is greater than that of QR for the first quantile, and the opposite is true for therest of the quantiles, an observation we investigate further in Section 5.2. As expected, the RMSEwith GQR is greater than that of i-QR at each quantile, with substantial difference in some, due tothe added variance contribution from first step estimation in the former.Table 2 reports the Bias-RMSE results for the slope parameter b β ( · ). The bias and RMSE aresimilar for all three methods of estimation and the RMSE falls with increase in sample size. Thefollowing remark explains this. Remark.
In the GQR asymptotic variance for the QR with constant slope model as given by (4.2),if the covariates X and X are independent, as considered here, it holds that(i) The covariance between first and second step estimates is zero: C ( τ ) = 0.(ii) The first step estimation has an effect on the second-step variance for the intercept, b β ( τ ), butnot for the slope parameter b β ( τ ), as H ( τ ) − D ( τ ) in (4.2) evaluates to [ − E [ X ] , ′ .Proofs are straightforward using basic matrix algebra and its outline is presented in Appendix 3.As a means for validating the asymptotic normality result, Tables 3-4 compare the empirical90% ,
95% and 99% GQR confidence intervals with that in theory for normal approximation, for n = 100and n = 1000. For τ = { . , . , . , . } , t-stat of the quantile parameters is computed usingbootstrapped standard error (SE) from (5.2) and its absolute value is compared with the criticalvalues for (1 − α ) confidence level of the normal approximation, (1 − α ) = 0 . , .
95 and 0 .
99, to findif the true quantile parameter is inside the corresponding confidence interval. Repeating the exercise1000 times, we find the percentage of times when the true parameter lies inside the (1 − α ) confidence16able 1: Bias and RMSE of ˆ β ( · ) for n = 100 and 1000 n = 100 n = 1000 τ Bias RMSE Bias RMSE0 . . . − . . . . . . . . − . . . . . − . . . . . . . . − . . . . . − . . . . . . . . . . . . . − . . − . . . . . . . . β ( · ) for n = 100 and 1000 n = 100 n = 1000 τ Bias RMSE Bias RMSE0 . . . . . . . − . . . . − . . . − . . . . − . . − . . − . . − . . . − . . . . − . . − . . − . . − . . . − . . . . − . . − . . − . . − . . τ = 0 . n = 100, but it improves for n = 1000 in Table 4. Overall, the empirical levels forconfidence intervals are close to (1 − α ) and improves with increasing sample size, which suggests thatthe estimation procedure gives accurate central limit theorem based confidence intervals.Table 3: Confidence intervals: nominal vs. empirical, n = 100, simulations = 1000CI for β ( · ) CI for β ( · )Nominal level 0 .
Table 4: Confidence intervals: nominal vs. empirical, n = 1000, simulations = 1000
Asymptotic variance: Further analysis

In this section, we compare the GQR and QR asymptotic variances both analytically and through simulation, following the RMSE pattern observed in Table 1, which hints at their relative efficiency being quantile dependent. The data distribution and true parameter values assumed in the data generating process allow a comparison based on explicit asymptotic variance expressions for both GQR and QR. Although the discussion here is specific to the assumed QR with constant slope model, it provides interesting insights, in particular into the role of the first stage estimator in the overall GQR variance. Note that while the asymptotic variance of GQR, which estimates the constant parameter β1 and the quantile-dependent ones (β0(τ), β2(τ)) separately, is given by (4.2), that of standard QR, where all parameters are estimated together, is given by
$$ V(\tau)_{QR} = H(\tau)_{QR}^{-1}\, J(\tau)_{QR}\, H(\tau)_{QR}^{-1}, \qquad (5.3) $$
where, denoting $X = [1, X_1, X_2]'$,
$$ H(\tau)_{QR} = E\Bigg[ \frac{XX'}{\beta_0^{(1)}(\tau) + \beta_1^{(1)}(\tau)X_1 + \beta_2^{(1)}(\tau)X_2} \Bigg], \qquad J(\tau)_{QR} = \tau(1-\tau)\, E[XX']. $$

Asymptotic variance for β̂0(·). Under the remark noted in Section 5.1, the asymptotic variance of β̂0(·) for GQR is obtained using (4.2) as follows:
$$ V(\tau)_{GQR,0} = [1, 0]\, H(\tau)^{-1} J(\tau) H(\tau)^{-1} [1, 0]' + E[X_1]^2\, V(\beta_1), \qquad (5.4) $$
where H(τ), J(τ) are given by (4.2). For the true model parameters and distribution considered here, this evaluates to
$$ V(\tau)_{GQR,0} = \frac{\tau(1-\tau)}{(ac - b^2)^2}\big( c^2 - 2bc\,E[X_2] + b^2 E[X_2^2] \big) + E[X_1]^2\, V(\beta_1), \qquad (5.5) $$
where
$$ a = E\Bigg[\frac{1}{\beta_0^{(1)}(\tau) + \beta_2^{(1)}(\tau)X_2}\Bigg] = \frac{1}{28\tau}\ln\Big(\frac{e^\tau + 40\tau}{e^\tau + 12\tau}\Big), \qquad b = E\Bigg[\frac{X_2}{\beta_0^{(1)}(\tau) + \beta_2^{(1)}(\tau)X_2}\Bigg] = \frac{1}{112\tau^2}\Big( 28\tau - e^\tau \ln\Big(\frac{e^\tau + 40\tau}{e^\tau + 12\tau}\Big) \Big), $$
$$ c = E\Bigg[\frac{X_2^2}{\beta_0^{(1)}(\tau) + \beta_2^{(1)}(\tau)X_2}\Bigg] = \frac{1}{448\tau^3}\Big( 728\tau^2 - 28\tau e^\tau + e^{2\tau} \ln\Big(\frac{e^\tau + 40\tau}{e^\tau + 12\tau}\Big) \Big), $$
and E[X2] = 13/2, E[X2²] = 139/3, E[X1] = 3. V(β1) is given by (4.1) and, for the model parameters considered here, can be evaluated explicitly in terms of the moments of ε0 = β0(U) − β0 and ε2 = β2(U) − β2; it is a constant that does not vary with τ.

The asymptotic variance of β̂0(·) for the standard QR is given by the first element of (5.3), which, for the model parameters and distribution assumed in this exercise, evaluates to
$$ V(\tau)_{QR,0} = \frac{\tau(1-\tau)}{(ac - b^2)^2}\big( c^2 - 2bc\,E[X_2] + b^2 E[X_2^2] \big) + \frac{\tau(1-\tau)\,(bf - dc)^2}{(ac - b^2)^2\, a^2\, \mathrm{Var}(X_1)}, \qquad (5.6) $$
where Var(X1) = 4/3, (a, b, c) are as in (5.5), and
$$ d = E\Bigg[\frac{X_1}{\beta_0^{(1)}(\tau) + \beta_2^{(1)}(\tau)X_2}\Bigg] = \frac{3}{28\tau}\ln\Big(\frac{e^\tau + 40\tau}{e^\tau + 12\tau}\Big), \qquad f = E\Bigg[\frac{X_1 X_2}{\beta_0^{(1)}(\tau) + \beta_2^{(1)}(\tau)X_2}\Bigg] = \frac{3}{112\tau^2}\Big( 28\tau - e^\tau \ln\Big(\frac{e^\tau + 40\tau}{e^\tau + 12\tau}\Big) \Big). $$
It can be seen from (5.5) and (5.6) that the GQR and QR asymptotic variances have a common quantile-varying component; GQR has a constant additional component that depends on the first step asymptotic variance, while the additional part for QR is again quantile-dependent. The i-QR variance is given by (5.5) with V(β1) = 0. Figure 1 plots the asymptotic variance comparison for GQR and QR, as well as i-QR.
Figure 1: Asymptotic variance: GQR vs i-QR vs QR

As can be seen from Figure 1, the variance of the QR and i-QR estimators, being a function of τ(1−τ), is close to 0 at the very tails. The tail variance of both QR and i-QR is less than that of GQR because the two-step GQR procedure adds an additional constant variance contribution from the first-step estimation irrespective of the quantile level. But the opposite is true for all other quantile levels, with the GQR asymptotic variance being smaller than that of QR for most quantiles, and especially prominently so in the higher quantiles. While this exercise considered X1 and X2 to be independent, the empirical application suggests that for the general case as well, the pattern for GQR versus QR asymptotic variances remains similar. However, we note that there is no clear efficiency gain of one method over the other: it depends on the quantile level and on the choice of the first stage estimator, which impacts the tail behaviour.

The relatively high GQR variance at the boundaries is driven by Assumption 3, that the density of X(θ) is bounded away from 0. If it is relaxed, then the two-step estimator can become as good as a one-step one. When it is not relaxed, the support of X(θ) depends on θ. In that case, we can have faster than OLS estimation, like Smith (1994)'s nonregular regression which converges at rate 1/n, so that the two-step estimator is asymptotically unaffected by the first stage. The drawback of the latter is the requirement that the error distribution is bounded away from 0 on its compact support. OLS, although slower, does not suffer from such restrictions.

Asymptotic variance for β̂2(·). Under the remark noted in Section 5.1, the asymptotic variance of β̂2(·) is the same for GQR, QR and i-QR, given by
$$ V(\tau)_{GQR,2} = V(\tau)_{QR,2} = [0, 1]\, H(\tau)^{-1} J(\tau) H(\tau)^{-1} [0, 1]' = [0, 0, 1]\, H(\tau)_{QR}^{-1} J(\tau)_{QR} H(\tau)_{QR}^{-1} [0, 0, 1]', $$
where (H(τ), J(τ)) and (H(τ)_QR, J(τ)_QR) are obtained from (4.2) and (5.3), respectively. For the true model parameters and distribution considered here, this evaluates to
$$ V(\tau)_2 = \frac{\tau(1-\tau)}{(ac - b^2)^2}\big( b^2 - 2ab\,E[X_2] + a^2 E[X_2^2] \big), \qquad (5.7) $$
where (a, b, c) are as in (5.5).

Tables 5-6 compare the bootstrapped asymptotic SE of β̂0(·) and β̂2(·) obtained from (5.2) (the mean of V̂(τ) over the 1000 simulations is reported) with the true values obtained from the analytical expressions derived above. It can be seen in Table 5 that the true asymptotic SE of β̂0(·) is greater for GQR than for QR at the lowest quantile level considered and smaller at the higher ones, in line with Figure 1. The tables also report the coefficient of variation (CoV) of V̂(τ), which is the ratio of the standard deviation to the mean of V̂(τ) over the 1000 simulations. The CoV measures the precision in estimation of the asymptotic SE (the variability among the estimated values across simulation runs). Looking at the CoV, it is interesting to note that for GQR the estimates of the asymptotic SE have less variation across simulations relative to their mean values, and the CoV is more similar across quantiles, than for i-QR or QR. This suggests that the GQR asymptotic SE estimates are less dispersed around the mean than those of i-QR or QR. The CoV falls for all methods with the sample size; for n = 1000, it is well within 10% for GQR and slightly higher for i-QR and QR. Table 6 shows that β̂2(·) is unaffected by the two-step procedure, as QR, i-QR and GQR yield identical true values, and similar bootstrapped estimates as well as CoV.
Also, in Table 5 and, to a lesser degree, in Table 6, we find a slight overestimation of the GQR variance and an underestimation of the QR one.

Table 5: Asymptotic standard error for β̂0(·), B = 1000, simulations = 1000. The true asymptotic SE for GQR and QR are computed using (5.5) and (5.6), respectively, while for i-QR the first step variance is set to 0 in the formula for GQR. The mean over 1000 simulations of the bootstrapped asymptotic SE (5.2) is reported. CoV denotes the coefficient of variation and indicates the extent of variability in the estimates across simulation runs.
As mentioned earlier, the GQR asymptotic variance is impacted by the choice of the first stage estimator. While OLS is a natural choice for estimating the constant first stage slope, as we have considered till now, linearity in OLS is a restriction, and choosing from a wider class including non-linear estimators is likely to produce a variance improvement. In Table 7, we report the bootstrapped GQR asymptotic SE for β̂0(·) using two QR-based first stage estimators of β1, apart from OLS: the mean of the quantile estimates β̂1(τ) over 19 equidistant quantile levels {0.05, 0.10, ..., 0.95}, and the weighted average of β̂1(τ) at τ = (1/3, 1/2, 2/3) with weights (0.3, 0.4, 0.3), i.e. Gastwirth's weighted QR. β̂2(·) is not reported, since its variance is unaffected by the first stage, as noted earlier. A comparison of the bootstrapped GQR asymptotic SE using different first stage estimators shows that the QR-based estimators yield more efficient results than OLS, which is not surprising as quantile regression can be more efficient than least squares in the absence of i.i.d. Gaussian errors. The QR mean, which assigns equal weights to all quantiles, results in poorer GQR efficiency than Gastwirth's weighted QR, which gives more weight to the median and less to the tails; this estimator is known to have higher efficiency in a large class of distributions (see, for example, Koenker & Bassett (1978)). The optimal first stage estimator for the constant slopes in QR models and the semi-parametric efficiency of the two-step GQR are interesting questions left for future research.
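A minimal sketch of the three first-stage estimators of β1 compared in Table 7 (OLS, the QR mean over 19 equidistant levels, and Gastwirth's weighted QR), using statsmodels' QuantReg for the quantile fits.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def first_stage_beta1(y, x1, x2):
    """Three first-stage estimators of the constant slope beta_1:
    OLS, the mean of QR slopes over 19 equidistant quantiles, and
    Gastwirth's weighted average of the QR slopes at (1/3, 1/2, 2/3)."""
    X = sm.add_constant(np.column_stack([x1, x2]))
    ols = sm.OLS(y, X).fit().params[1]

    def qr_slope(tau):
        return QuantReg(y, X).fit(q=tau).params[1]

    grid = np.arange(0.05, 0.951, 0.05)                      # 19 equidistant levels
    qr_mean = np.mean([qr_slope(t) for t in grid])
    gastwirth = 0.3 * qr_slope(1/3) + 0.4 * qr_slope(1/2) + 0.3 * qr_slope(2/3)
    return ols, qr_mean, gastwirth
```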
Table 6: Asymptotic standard error for β̂2(·), B = 1000, simulations = 1000. The true asymptotic SE for all methods is given by (5.7); the rest of the explanation is as for Table 5.
Table 7: GQR bootstrapped asymptotic SE for β̂0(·): varying first stage estimators. QR mean averages the quantile estimates β̂1(τ) over τ = {0.05, 0.10, ..., 0.95}. Weighted QR assigns the weights (0.3, 0.4, 0.3) to (β̂1(1/3), β̂1(1/2), β̂1(2/3)).

Empirical Application
The two-step estimation procedure of Section 2.1.1 can be useful in estimating auction models, as in the quantile regression approach of Gimenes (2017). In first price auctions, a quantile regression specification for the private value generates a quantile regression specification for the bid, see Gimenes & Guerre (2020). The linear regression approach of Haile, Hong & Shum (2003) for estimating first price auction models uses the 'homogenized bid' technique, which implies constant slope parameters in the bid quantile regression model. It is shown here that the two approaches can be combined, as in the example of Section 2.1.1. We apply the GQR estimator to the estimation of a bid quantile specification containing both quantile-constant and quantile-dependent slope parameters. In the first step, following Haile et al. (2003), the constant slope parameters are estimated by regressing the bids on the observed covariates. This is then used to generate the dependent variable for the quantile regression estimating the quantile parameters. The aim of our empirical exercise is to see how imposing a constant slope for a given set of variables can improve the estimation of the other slope functions.

We illustrate our proposed methodology using data from first price timber auctions conducted by the US Forest Service (USFS) covering the western half of the US in the year 1979. This is the same data used by Lu & Perrigne (2008). The data consist of 214 first price auctions with 2 bidders, and the covariates are the appraisal value and the timber volume (in log).
Bid homogenization.
Figure 2 shows the bid quantile parameter estimates obtained from the quantile regression of bids on the covariates, along with the corresponding OLS estimates and their 95% confidence intervals. The intercept and appraisal value quantile slope coefficients seem to satisfy the assumption of constancy across quantiles. However, the volume quantile parameter does not seem to be constant.
Figure 2: Bid quantile parameter estimates. Panels: Intercept, Appraisal Value, Volume; each panel shows the QR estimates together with the OLS estimate and its 95% confidence interval.
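A comparison of this kind is easy to reproduce: run the quantile regression at a grid of quantile levels and overlay the OLS estimate and its confidence interval for each coefficient. A minimal sketch in Python (statsmodels and matplotlib) follows; the function name, quantile grid and panel labels are illustrative assumptions rather than the code behind Figure 2.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def plot_qr_vs_ols(y, X, names, taus=np.arange(0.1, 0.91, 0.05)):
    """For each coefficient, plot the QR estimate path in tau against the
    OLS estimate and its 95% confidence interval (an informal constancy check)."""
    ols = sm.OLS(y, X).fit()
    params = np.asarray(ols.params)
    ci = np.asarray(ols.conf_int())                      # (k, 2) array of 95% CIs
    qr_coef = np.array([QuantReg(y, X).fit(q=t).params for t in taus])
    fig, axes = plt.subplots(1, len(names), figsize=(4 * len(names), 3), squeeze=False)
    for j, (ax, name) in enumerate(zip(axes[0], names)):
        ax.plot(taus, qr_coef[:, j], marker="o", label="QR")
        ax.axhline(params[j], color="black", label="OLS")
        ax.fill_between(taus, ci[j, 0], ci[j, 1], alpha=0.2, label="95% CI OLS")
        ax.set_title(name)
        ax.set_xlabel("quantile level")
    axes[0, 0].legend()
    fig.tight_layout()
    return fig

# e.g. plot_qr_vs_ols(bids, sm.add_constant(covariates), ["Intercept", "Appraisal Value", "Volume"])
```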
Bid quantile estimation using GQR.
The GQR estimator involves constrained estimation, assuming the intercept and the appraisal value slope to be constant across quantiles, while the volume parameter is allowed to vary with the quantile level. Table 8 reports the result of the linear regression of bids on the covariates. The first step estimates are the intercept and appraisal value slope from this regression; the quantile estimates for the slope of volume are then obtained through quantile regression of the generated dependent variable (the bid net of the estimated first step intercept and appraisal value contribution) on volume. The second step GQR bid quantile estimate for the slope of volume is shown in Figure 3. For comparison purposes, we also plot the results of unconstrained estimation of the quantile parameters of volume. Table 9 also reports the bootstrapped standard error (SE) of the constrained and the unconstrained estimators, obtained from 10,000 bootstrap replications.

Table 8: First step - bid regression. Columns: Intercept, Appraisal value, Volume, R²; standard errors in parentheses.
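The bootstrapped standard errors reported in Table 9 and the confidence bands in Figure 3 require redoing both estimation steps on each bootstrap sample, so that the first step estimation error is reflected in the second step. A minimal pairs-bootstrap sketch is given below in Python; it reuses the illustrative gqr_auction routine sketched earlier, and the number of replications is kept small purely for the example (the paper uses 10,000).

```python
import numpy as np

def bootstrap_gqr_se(df, taus, n_boot=200, seed=0):
    """Pairs bootstrap SE of the two-step GQR volume-slope estimates:
    resample auctions with replacement and redo BOTH estimation steps."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(df), size=len(df))
        est = gqr_auction(df.iloc[idx].reset_index(drop=True), taus)
        draws.append([est[t] for t in taus])
    draws = np.asarray(draws)
    return {t: draws[:, j].std(ddof=1) for j, t in enumerate(taus)}
```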
Figure 3: Second step GQR estimate of the volume quantile slope, together with the unconstrained QR estimate. The 95% bootstrapped confidence interval for QR is shown by red dotted lines, and for GQR by the blue shaded region.
As can be seen in Figure 3 and Table 9, the GQR slope estimate is more regular than that of the unconstrained estimation; the GQR estimates increase with the quantile level, which is consistent with an increasing bid conditional quantile function. The likely reason is that the first stage estimation removes some of the regressors from the GQR stage, along with the associated variation, thereby improving the monotonicity and smoothness of the quantile estimates. An accuracy improvement, especially in the higher quantiles, is also evident, as the GQR bootstrap confidence interval is much narrower there. The SE pattern observed in Table 9 is as expected from the analysis in Section 5.2, although the covariates are no longer independent: the constrained SE is smaller than that obtained by unconstrained estimation except for the first three quantile levels. Also note that the SE for the unconstrained estimator varies considerably across quantiles and is quite high at the higher quantiles, which are particularly important for auction models as winners reside there.

Table 9: Bootstrapped SE for constrained vs unconstrained quantile estimation of volume. Columns: τ; Estimate and SE for the constrained (GQR) estimator; Estimate and SE for the unconstrained estimator.

An intuitive explanation for the SE pattern observed here is as follows. The asymptotic variance of the unconstrained estimator has the form given by (5.3): in the tails, while τ(1 − τ) tends to make the quantile estimate more precise, the derivative of an increasing quantile slope parameter has the opposite effect. In higher quantiles, as is typical with quantile regression, the latter effect dominates, reducing the precision of the unconstrained estimates. For the GQR estimator, in addition to the H^{-1} J H^{-1} term, which increases with τ for an increasing slope parameter, there is a negative quantile effect due to the covariance term being negative in τ for an increasing slope parameter. So, the net quantile-dependent effect is reduced and the SE is more uniform across quantiles. Hence, at lower quantile levels, the SE of the GQR estimator is greater than that of the unconstrained one because of the constant contribution of the first step variance; but at higher quantiles, the SE of the unconstrained estimator is much greater.

In general, the unconstrained quantile regression fits the model at each quantile level, estimating both the constant and the quantile-dependent parameters, and thus loses the information that some covariate effects are common across quantiles. It is well noted in the literature that, for estimating quantile models with some common covariate effects, an efficiency gain can be achieved by aggregating information across multiple quantiles, as in the composite quantile regression approach of Zou & Yuan (2008). The GQR estimator utilizes this commonality information and improves upon efficiency; the overall efficiency gain and the tail behaviour will, as noted earlier, depend on the choice of the first step estimator.

Conclusion

This paper studies two step estimation of quantile regression models with generated covariates and/or dependent variable. The asymptotic normality of this generated quantile regression (GQR) estimator is derived using the Bahadur expansion approach. The results are verified using simulations, and an application based on auctions is carried out. We also mention some relevant areas of application.
In particular, the analysis of QR models with some constant slopes suggests potential benefits of the two stage procedure in terms of improvements in monotonicity, smoothness and estimation accuracy. A key technical contribution of the paper is to provide a Bahadur expansion which holds uniformly with respect to the first step parameter and the quantile levels, which can be utilised for developing specification tests (like those developed in Koenker & Machado (1999) and Koenker & Xiao (2002)) as well as to obtain a functional central limit theorem for the two step quantile regression estimator. A slightly different problem that can be studied using the techniques developed here relates to quantile specifications where a first step estimation impacts the quantile level for the second stage quantile regression. Such specifications arise in Arellano & Bonhomme (2017)'s method of quantile regression with a "rotated" check function to correct for sample selection in quantile regression models. A more challenging problem open for future research is to consider the case where the first stage converges at a slower rate, as in quantile regression models for panel data where the first step within estimator is usually √n-consistent and the quantile estimator is √(nT)-consistent.

Appendix 1. Proof section

Notations.
The notation ≍ is defined as follows: sequences {x_n} and {y_n} satisfy x_n ≍ y_n if |x_n|/C ≤ |y_n| ≤ C |x_n| for some C > 1 and n large enough. ||·|| is the Euclidean norm. The largest eigenvalue in absolute value of a symmetric matrix A is ||A|| = sup_{u ∈ B(0,1)} ||Au|| = sup_{u ∈ B(0,1)} |u′Au|. Also, for any matrix or vector B, ||AB|| ≤ ||A|| ||B||. We denote ||f(·|·)||_∞ = sup_{y,x} |f(y|x)|. The notation ≻ denotes that, for two symmetric matrices A_1 and A_2, A_1 ≻ A_2 if and only if A_1 − A_2 is a positive definite symmetric matrix.

Define

Q(β; τ, θ) = E[ρ_τ(Y(θ) − X′(θ)β)] − E[ρ_τ(Y(θ))].

As ρ_τ(·) is almost everywhere differentiable with bounded derivatives, β ↦ Q(β; τ, θ) is differentiable with first derivative

Q^{(1)}(β; τ, θ) = E[{I(Y(θ) ≤ X′(θ)β) − τ} X(θ)] = E[{F(X′(θ)β | X, θ) − τ} X(θ)].

Hence Q(·; τ, θ) is twice continuously differentiable with respect to β, with second derivative

Q^{(2)}(β; τ, θ) = E[f(X′(θ)β | X, θ) X(θ)X′(θ)] = ∫ f(x′β | x, θ) xx′ f_X(x | θ) dx.

Let B(θ) be the set of β such that 0 < F(x′(θ)β | x, θ) < 1 for some inner point x of X(θ),

B(θ) = {β : there is an inner point x of X(θ) such that y̲(θ | x) < x′β < ȳ(θ | x)},

where y̲(θ | x) = F^{-1}(0 | x) and ȳ(θ | x) = F^{-1}(1 | x). The next Lemma describes some key properties of Q^{(2)}(β; τ, θ) and Q(β; τ, θ).

Lemma 1 Under Assumption 3 it holds that:
(i) Q^{(2)}(β; τ, θ) is continuous with respect to its three arguments, with ||Q^{(2)}(β_1; τ, θ) − Q^{(2)}(β_2; τ, θ)|| ≤ C ||β_1 − β_2|| for all β_1 and β_2, θ ∈ Θ and τ ∈ [0, 1].
(ii) Q^{(2)}(β; τ, θ) is strictly positive for all β ∈ B(θ), θ ∈ Θ and τ ∈ [0, 1].
(iii) For θ ∈ Θ and τ ∈ (0, 1), Q(β; τ, θ) has a unique minimizer β(τ; θ), which is continuously differentiable in θ and τ with

∂β(τ; θ)/∂θ′ = H(τ; θ)^{-1} D(τ; θ),   ∂β(τ; θ)/∂τ = H(τ; θ)^{-1} E[X(θ)],

where H(τ; θ) and D(τ; θ) are as in Proposition 1.

Proof of Lemma 1. (i) directly follows from Assumption 3 and the Lebesgue Dominated Convergence Theorem. For (ii), Assumption 3 gives that, for each β in B(θ), there is an open subset O = O_{β,θ} of X(θ) such that Q^{(2)}(β; τ, θ) ≽ ∫_O xx′ dx. Hence, H(τ; θ) = Q^{(2)}(β(τ; θ); τ, θ) has an inverse. For (iii), observe that Q(β; τ, θ) is bounded away from −∞, so that it has local minimizers which must satisfy the first order condition

0 = Q^{(1)}(β; τ, θ) = E[{F(X′(θ)β | X, θ) − τ} X(θ)].   (A1.1)

Hence these minimizers must lie in B(θ), as outside this set it holds that F(X′(θ)β | X, θ) = 1 a.s. or F(X′(θ)β | X, θ) = 0 a.s. Now, if there are two such local minimizers β_1(τ; θ) and β_2(τ; θ), convexity implies that all β_π(τ; θ) = (1 − π)β_1(τ; θ) + πβ_2(τ; θ), 0 ≤ π ≤ 1, must be global minimizers, contradicting that Q^{(2)}(β_π(τ; θ); τ, θ) is strictly positive as Q^{(1)}(β_π(τ; θ); τ, θ) = 0 for all π in [0, 1]. □

Proof of Proposition 1.
Follows from Lemma 1-(iii). □
Proof of Proposition 2-(i).
This proof conducts a uniform order study of the Bahadur errorterm (3.4). Define the following L n ( γ, τ ; θ ) = n X i =1 (cid:26) ρ τ (cid:18) Y i ( θ ) − X i ( θ ) ′ (cid:18) γ √ n + β ( τ ; θ ) (cid:19)(cid:19) − ρ τ (cid:0) Y i ( θ ) − X i ( θ ) ′ β ( τ ; θ ) (cid:1)(cid:27) , such that √ n (cid:16) b β ( τ ; θ ) − β ( τ ; θ ) (cid:17) = arg min γ L n ( γ, τ ; θ ) . In what follows, we write b α ( τ ; θ ) ≡ − H − ( τ ; θ ) b S ( τ ; θ ) (A1.2) b S ( τ ; θ ) = 1 √ n n X i =1 s i ( τ ; θ ) . (A1.3)It follows from (3.4) that b E ( τ ; θ ) = arg min ǫ L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) , where L n ( γ, ǫ, τ ; θ ) = L n ( γ + ǫ, τ ; θ ) − L n ( γ, τ ; θ ) . (A1.4)Consider the following decomposition of L n ( γ, ǫ, τ ; θ ). L n ( γ, ǫ, τ ; θ ) = L n ( γ, ǫ, τ ; θ ) + R n ( γ, ǫ, τ ; θ ) , where L n ( γ, ǫ, τ ; θ ) = b S ( τ ; θ ) ′ ( γ + ǫ ) + 12 ( γ + ǫ ) ′ H ( τ ; θ )( γ + ǫ ) − b S ( τ ; θ ) ′ γ − γ ′ H ( τ ; θ ) γ = b S ( τ ; θ ) ′ ǫ + 12 ǫ ′ H ( τ ; θ )( ǫ + 2 γ ) . (A1.5) L n ( γ, ǫ, τ ; θ ) is the quadratic approximation of L n ( γ, ǫ, τ ; θ ) and R n ( γ, ǫ, τ ; θ ) is the remainder term. Asmentioned under ‘Heuristics’ in Section 3, a uniform order for b E ( τ ; θ ) relies on a uniform order study forthe remainder term R n ( γ, ǫ, τ ; θ ), using concepts of maximal inequality under bracketing conditionsgiven in Massart (2007), and on linearization techniques to study b E ( τ ; θ ) given in Hjort & Pollard(2011). 29 niform order for R n ( γ, ǫ, τ ; θ ) . The remainder term is R n ( γ, ǫ, τ ; θ ) = L n ( γ, ǫ, τ ; θ ) − L n ( γ, ǫ, τ ; θ ) = P ni =1 R i ( γ, ǫ, τ ; θ ), where R i ( γ, ǫ, τ ; θ ) = (cid:26) ρ τ (cid:18) Y i ( θ ) − X i ( θ ) ′ (cid:18) γ + ǫ √ n + β ( τ ; θ ) (cid:19)(cid:19) − ρ τ (cid:18) Y i ( θ ) − X i ( θ ) ′ (cid:18) γ √ n + β ( τ ; θ ) (cid:19)(cid:19)(cid:27) − s i ( τ ; θ ) √ n ′ ǫ − ǫ ′ H ( τ ; θ ) n ( ǫ + 2 γ ) . (A1.6)Define also R i ( γ, ǫ, τ ; θ ) = R i ( γ, ǫ, τ ; θ ) + 12 ǫ ′ H ( τ ; θ ) n ( ǫ + 2 γ ) , (A1.7) R i ( γ, ǫ, τ ; θ ) = R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ ) | X i ( θ )] , (A1.8) R i ( γ, ǫ, τ ; θ ) = E [ R i ( γ, ǫ, τ ; θ ) | X i ( θ )] − ǫ ′ H ( τ ; θ ) n ( ǫ + 2 γ ) , (A1.9)such that R n ( γ, ǫ, τ ; θ ) = R n ( γ, ǫ, τ ; θ ) + R n ( γ, ǫ, τ ; θ ) , with , R jn ( γ, ǫ, τ ; θ ) = n X i =1 R ji ( γ, ǫ, τ ; θ ) , j = 1 , . (A1.10)The following Lemmas provide uniform bounds for the suprema of the constituents of the remainderterm R n and for ˆ S (see Appendix 2 for their proofs). Lemma 2
Under Assumption 3, for real numbers t γ , t ǫ > with t γ ≍ log / n , t γ ≥ , t ǫ = (cid:16) t log / n (cid:17) /n / for some t > , such that ( t γ + t ǫ ) / /t ǫ ≤ O (cid:16) n / / log / n (cid:17) , for large n , E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≤ C log / nn / t ǫ ( t γ + t ǫ ) / . Lemma 3
Under Assumption 3, for real numbers t γ , t ǫ > defined as in Lemma 2, such that t γ /t ǫ = O (cid:16) n/ log / n (cid:17) , for large n , E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≤ C t ǫ ( t γ + t ǫ ) n / . Lemma 4
Under Assumption 3, sup ( τ,θ ) ∈ [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b S ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O P (log / n ) . In what follows, t n = t log / nn / , t > , such that t n = o (cid:16) log / n (cid:17) . t n plays the role of t ǫ in theLemmas, while t γ is chosen such that t γ ≍ log / n . Hence,( t γ + t ǫ ) / t ǫ ≍ n / log / nt log / n = 1 t O n / log / n ! t γ t ǫ ≤ C n / log / nt log / n ≤ C n / log / n ≤ C n / n / log / n = O (cid:18) n log / n (cid:19) , n . These choices for t γ and t ǫ satisfy the requirements for the Lemmas. Lemma 1-(ii),which proves existence of H − for all τ ∈ [ τ , τ ] and θ ∈ Θ, implies that b α ( τ ; θ ) is well defined with aprobability tending to 1. Lemma 4 impliessup ( τ,θ ) ∈ [ τ,τ ] × Θ || b α ( τ ; θ ) || = O P (cid:16) log / n (cid:17) . (A1.11)Consider ξ > C ξ such that, for large n and some ϕ > P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ ϕt n ! ≤ P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ ϕt n , sup τ,θ ∈ [ τ,τ ] × Θ || b α ( τ ; θ ) || ≤ C ξ log / n ! + P sup τ,θ ∈ [ τ,τ ] × Θ || b α ( τ ; θ ) || > C ξ log / n ! ≤ P sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t n ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≥ ϕt n + ξ. Since R n = R n + R n , Lemmas 2-3, and Markov inequality give P sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t n ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≥ ϕt n ! ≤ Ct n E sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) + E sup ( γ,ǫ,τ,θ ) ∈B (0 ,C ξ log / n ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) ≤ Ct n log / nn / (cid:18) C ξ + t n log / n (cid:19) / + (cid:18) log nn (cid:19) / (cid:18) C ξ + t n log / n (cid:19) ! . Using t n = (cid:16) t log / n (cid:17) /n / and since (log n ) /n = o (1), we getlim n →∞ P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ ϕt n ! = ξ + O C ξ / t ! . (A1.12) Uniform order for b E ( τ ; θ ) . Consider T n ≥ t n and ǫ = T n e , || e || = 1 so that || ǫ || ≥ t n . Since ρ τ ( · ) isconvex, L n ( β ( τ ; θ ) , ǫ, τ ; θ ) is convex. Recall that from (A1.4) and (A1.5), L n ( β ( τ ; θ ) , , τ ; θ ) = 0 and L n = L n + R n . Then, using convexity property, t n T n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) = t n T n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) + (cid:18) − t n T n (cid:19) L n ( b α ( τ ; θ ) , , τ ; θ ) ≥ L n (cid:18)b α ( τ ; θ ) , t n ǫ T n , τ ; θ (cid:19) = L n ( b α ( τ ; θ ) , t n e, τ ; θ ) = L n ( b α ( τ ; θ ) , t n e, τ ; θ ) + R n ( b α ( τ ; θ ) , t n e, τ ; θ ) . Since b E ( τ ; θ ) = arg min ǫ L n ( b α ( τ ; θ ) , ǫ, τ ; θ ), we have n(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b E ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ t n o ⊂ (cid:26) inf ǫ ; || ǫ ||≥ t n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) ≤ inf ǫ ; || ǫ ||
0. Since H ( τ ; θ ) ≻ CM , it follows for the smallesteigenvalue of the positive definite symmetric matrix H ( τ ; θ ), denoted by φ ( τ ; θ ), thatinf ( τ,θ ) ∈ [ τ,τ ] × Θ φ ( τ ; θ ) ≥ Cφ M + o P (1); for some φ M > . (A1.14)Consider inf ( τ,θ ) ∈ [ τ,τ ] × Θ inf || ǫ || = t n L n ( b α ( τ ; θ ) , ǫ, τ ; θ ). The above result gives, for any ǫ with || ǫ || ≥ t n , L n ( b α ( τ ; θ ) , ǫ, τ ; θ ) = ǫ ′ H ( τ ; θ ) ǫ ≥ φt n . Hence, from (A1.12) and (A1.13), we havelim n →∞ P sup ( τ,θ ) ∈ [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b E ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ t n ≤ lim n →∞ P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ φt n ≤ lim n →∞ P sup ( ǫ,τ,θ ) ∈B (0 ,t n ) × [ τ,τ ] × Θ | R n ( b α ( τ ; θ ) , ǫ, τ ; θ ) | ≥ φt n = ξ + O C ξ / t ! . The latter can be made arbitrarily small by choosing ξ arbitrarily small and t large enough. Recalling t n = ( t log / n ) /n / proves Proposition 2-(i). Note that O P (cid:16) log / nn / (cid:17) = (cid:16) log / nn / (cid:17) O P (1) = o P (1). (cid:3) roof of Proposition 2-(ii). Setting Z i ( θ ) = Y i ( θ ) − X ′ i ( θ ) β ( τ ; θ ), b S ( τ ; θ ) − b S ( τ ; θ ) = 1 √ n n X i =1 e s i ( τ ; θ ) , where e s i ( τ ; θ ) = [ X i ( θ ) { I ( Z i ( θ ) ≤ − τ } − X i ( θ ) { I ( Z i ( θ ) ≤ − τ } ] ≤ X i ( θ ) + X i ( θ ))Denoting e s iℓ ( τ ; θ ) as the ℓ -th coordinate of the vector e s i ( τ ; θ ) implies (cid:12)(cid:12)(cid:12)(cid:12) e s iℓ ( τ ; θ ) √ n (cid:12)(cid:12)(cid:12)(cid:12) ≤ C √ n ≍ n − / ≡ ν ′′′ By Assumption 2, for C (1) < ∞ such that sup θ ∈ Θ (cid:13)(cid:13) ∂∂θ ′ [ Z i ( θ )] (cid:13)(cid:13) ≤ C (1) , Taylor inequality gives | Z i ( θ ) − Z i ( θ ) | ≤ C (1) || θ − θ || . Then, under Assumptions 1 and 3, and removing the subscript i to denote random variables, we have V ar (cid:18) e s ℓ ( τ ; θ ) √ n (cid:19) = 1 n E h (( X ℓ ( θ ) τ − X ℓ ( θ ) τ ) + ( X ℓ ( θ ) I [ Z ( θ ) ≤ − X ℓ ( θ ) I [ Z ( θ ) ≤ i ≤ n E h ( X ℓ ( θ ) τ − X ℓ ( θ ) τ ) + ( X ℓ ( θ ) ( I [ Z ( θ ) ≤ − I [ Z ( θ ) ≤ I [ Z ( θ ) ≤
0] ( X ℓ ( θ ) − X ℓ ( θ ))) i ≤ Cn k θ − θ k + Cn E (cid:20) I (cid:18) − C √ n ≤ Z ( θ ) ≤ C √ n (cid:19)(cid:21) ≤ Cn k θ − θ k = O (cid:16) n − / (cid:17) . Hence, the standard deviation of e s ℓ ( τ ; θ ) / √ n is σ ′′′ ≍ n − / . Then arguing as in Steps 1-2 of Lemma2 (see Appendix 2), E " sup τ ∈ [ τ,τ ] , || θ − θ ||≤ C/ √ n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b S ℓ ( τ ; θ ) − b S ℓ ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O (cid:16) n / σ ′′′ log / n + (cid:0) σ ′′′ + ν ′′′ (cid:1) log n (cid:17) = O log / nn / ! . Note that by Lemma 1 we have sup ( τ,θ ) ∈ [ τ,τ ] ×B ( θ ,Cn − / ) (cid:13)(cid:13) H − ( θ ; τ ) − H − ( θ ; τ ) (cid:13)(cid:13) = O (cid:0) n − / (cid:1) andsup ( τ,θ ) ∈ [ τ,τ ] ×B ( θ ,Cn − / ) (cid:13)(cid:13) H − ( θ ; τ ) (cid:13)(cid:13) ≤ C . Markov inequality and Lemma 4, then, explain the orderin (3.6). (cid:3) Appendix 2. Proofs of intermediary Lemmas for Proposition 2
Proof of Lemma 2.
Bound for R n ( γ, ǫ, τ ; θ ) is based on Massart’s maximal inequality underbracketing entropy Theorem 6 .
8, the conditions for which are proven in Step 1. This first requiresstudying variance of R ( γ, ǫ, τ ; θ ). Variance of R ( γ, ǫ, τ ; θ ) . Note that ρ a ( b ) = ( a − I ( b < b = R b ( a − I ( t < dt . Denoting δ ( γ ; θ ) = X ( θ ) ′ γ/ √ n, and Z ( τ ; θ ) = Y ( θ ) − X ( θ ) ′ β ( τ ; θ ) , (A2.1)and using definitions in (A1.6) and (A1.7), for a given θ ∈ Θ, R ( γ, ǫ, τ ; θ ) = ρ τ ( Z ( τ ; θ ) − δ ( γ + ǫ ; θ )) − ρ τ ( Z ( τ ; θ ) − δ ( γ ; θ )) − δ ( ǫ ; θ ) ( I ( Z ( τ ; θ ) ≤ − τ )33 Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) ( I ( Z ( τ ; θ ) ≤ t ) − I ( Z ( τ ; θ ) ≤ dt. (A2.2)Using Cauchy-Schwarz inequality, R ( γ, ǫ, τ ; θ ) ≤ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12) R δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) ( I ( Z ( τ ; θ ) ≤ t ) − I ( Z ( τ ; θ ) ≤ dt (cid:12)(cid:12)(cid:12)(cid:12) ≤| δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12) R δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) I ( | Z ( τ ; θ ) | ≤ | t | ) dt (cid:12)(cid:12)(cid:12)(cid:12) . Under Assumption 3, E [ R ( γ, ǫ, τ ; θ ) | X ( θ )] ≤ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) (cid:26)Z I (cid:0) | y − X ( θ ) ′ β ( τ ; θ ) | ≤ | t | (cid:1) f ( y | X, θ ) dy (cid:27) dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || f ( ·|· , · ) || ∞ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) | t | dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = || f ( ·|· , · ) || ∞ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z δ ( ǫ ; θ )0 | δ ( γ ; θ ) + u | du (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ || f ( ·|· , · ) || ∞ | δ ( ǫ ; θ ) | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z | δ ( ǫ ; θ ) | ( | δ ( γ ; θ ) | + | u | ) du (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C || X ( θ ) || n / || ǫ || ( || γ || + || ǫ || ) . Therefore, Var( R ( γ, ǫ, τ ; θ )) ≤ E [ R ( γ, ǫ, τ ; θ )] = E [ E [ R ( γ, ǫ, τ ; θ ) | X ( θ )]] ≤ E h C || X ( θ ) || n / || ǫ || ( || γ || + || ǫ || ) i = C || ǫ || ( || γ || + || ǫ || ) n / R || x || f X ( x | θ ) dx ≤ C || ǫ || ( || γ || + || ǫ || ) n / . Step 1. Brackets of { R ( γ, ǫ, τ ; θ ) } . Let F = { R ( γ, ǫ, τ ; θ ); ( γ, ǫ, τ ; θ ) ∈ B (0 , t γ ) ×B (0 , t ǫ ) × [ τ , τ ] × Θ } .This step finds coverings of F with brackets [ R, R ], where the bracket [
R, R ] is the set of all R j such that R ≤ R j ≤ R almost surely. Define for γ in R P e R ( γ, τ ; θ ) = R δ ( γ ; θ )0 ( I ( Z ( τ ; θ ) ≤ t ) − I ( Z ( τ ; θ ) ≤ dt, which is such that, from (A2.2), R ( γ, ǫ, τ ; θ ) = e R ( γ + ǫ, τ ; θ ) − e R ( γ, τ ; θ ) (A2.3)Let sgn( t ) = I ( t ≥ − I ( t < u = t/ sgn( δ ( γ ; θ )), we have e R ( γ, τ ; θ ) = Z | δ ( γ ; θ ) | ( I ( Z ( τ ; θ ) ≤ sgn( δ ( γ ; θ )) u ) − I ( Z ( τ ; θ ) ≤ δ ( γ ; θ )) du = Z | δ ( γ ; θ ) | | I ( Z ( τ ; θ ) ≤ sgn( δ ( γ ; θ )) u ) − I ( Z ( τ ; θ ) ≤ | du = | δ ( γ ; θ ) | Z | I ( Z ( τ ; θ ) ≤ δ ( γ ; θ ) v ) − I ( Z ( τ ; θ ) ≤ | dv, = | δ ( γ ; θ ) | Z | I ( Z ( τ ; θ ) lies between 0 and δ ( γ ; θ ) v ) | dv, (A2.4)where the second last line is obatined using change of variable v = u/ | δ ( γ ; θ ) | . Hence, 0 ≤ e R ( γ, τ ; θ ) ≤| δ ( γ ; θ ) | . Then, using the definition of δ ( γ ; θ ) in (A2.1), we get for all γ ∈ B (0 , t γ + t ǫ ), | e R ( γ, τ ; θ ) | ≤ || X ( θ ) || || γ ||√ n ≤ ν , where ν ≍ t γ + t ǫ √ n . (A2.5)It follows from (A2.3) and the variance bound obtained earlier that E h | R ( γ, ǫ, τ ; θ ) − E [ R ( γ, ǫ, τ ; θ )] | k i = E (cid:20)(cid:12)(cid:12)(cid:12) e R ( γ + ǫ, τ ; θ ) − E h e R ( γ + ǫ, τ ; θ ) i − n e R ( γ, τ ; θ ) − E h e R ( γ, τ ; θ ) io(cid:12)(cid:12)(cid:12) k − | R ( γ, ǫ, τ ; θ ) − E [ R ( γ, ǫ, τ ; θ )] | (cid:21) (cid:18) × ν (cid:19) k − Var( R ( γ, ǫ, τ ; θ )) ≤ k !2 ν k − σ , where σ ≍ t ǫ ( t ǫ + t γ ) n / . (A2.6)In order to find covering for F , we first define e F t = { e R ( γ, τ ; θ ); ( γ, τ, θ ) ∈ B (0 , t ) × [ τ , τ ] × Θ } andshow that it is sufficient to find covering of e F t , with set of brackets { [ R j , R j ] , ≤ j ≤ e h ( t b ; t ) } , where t b ∈ (0 ,
1) denotes length of a bracket, satisfying, E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) k i ≤ k !8 (cid:18) ν (cid:19) k − t b , (A2.7) h ( t b ; t ) ≤ C log (cid:18) ntt b (cid:19) . (A2.8)Consider the following two coverings of e F t γ and e F t γ + t ǫ e F t γ ⊂ [ ≤ j ≤ e h ( tb ; tγ ) h R j , R j i , e F t γ + t ǫ ⊂ [ ≤ j ≤ e h ( tb ; tγ + tǫ ) h R j , R j i If such coverings of e F t γ and e F t γ + t ǫ exist, then for every ( γ, ǫ, τ ; θ ), e R ( γ, τ ; θ ) ∈ h R j , R j i , e R ( γ + ǫ, τ ; θ ) ∈ h R j , R j i , for some j and j , and from (A2.3), we have R ( γ, ǫ, τ ; θ ) ∈ h R j − R j , R j − R j i .Hence, F can be covered by e h ′ ( t b ; t ) brackets such that, using (A2.7) and (A2.8), h ′ ( t b ; t ) = h ( t b ; t γ ) + h ( t b ; t γ + t ǫ ) ≤ C log (cid:18) n ( t γ + t ǫ ) t b (cid:19) , and E (cid:20)(cid:12)(cid:12)(cid:12) R j − R j − (cid:16) R j − R j (cid:17)(cid:12)(cid:12)(cid:12) k (cid:21) = E (cid:20)(cid:12)(cid:12)(cid:12)(cid:16) R j − R j (cid:17) + (cid:16) R j − R j (cid:17)(cid:12)(cid:12)(cid:12) k (cid:21) ≤ k − (cid:18) E (cid:20)(cid:12)(cid:12)(cid:12) R j − R j (cid:12)(cid:12)(cid:12) k (cid:21) + E (cid:20)(cid:12)(cid:12)(cid:12) R j − R j (cid:12)(cid:12)(cid:12) k (cid:21)(cid:19) ≤ k − k !8 (cid:18) ν (cid:19) k − t b = k !2 ν k − t b ., where the inequality in the second line of the above equation follows because, for a > b > a + b ) k ≤ k − ( a k + b k ).We now construct covering for e F t . Lemma 1 proves that β ( τ ; θ ) is continuously differentiable in µ = ( τ, θ ) over [ τ , τ ] × Θ with bounded derivative. Then from from Taylor’s inequality we get, for all µ , µ in [ τ , τ ] × Θ, (cid:12)(cid:12) x ( θ ) ′ β ( µ ) − x ( θ ) ′ β ( µ ) (cid:12)(cid:12) ≤ C || µ − µ || . (A2.9)Also, given θ ∈ Θ, for all γ , γ in R P , we have | δ ( γ ; θ ) − δ ( γ ; θ ) | ≤ C √ n || γ − γ || . (A2.10)Define r ( q, δ ) = R ρ ( q, δv ) dv , ρ ( q, δ ) = | I ( q ≤ δ ) − I ( q ≤ | = I ( q ∈ (0 , δ ]) I ( δ ≥ I ( q ∈ [ δ, I ( δ < . From (A2.4), e R ( γ, τ ; θ ) = | δ ( γ ; θ ) | r ( Z ( τ ; θ ) , δ ( γ ; θ )) . Note that ρ ( q, δ ) is a step function which is 1 for q between 0 and δ , and 0 elsewhere, for a given δ . Let ρ ( q, δ ) and ρ ( q, δ ) be smooth approximationsof ρ ( q, δ ), constructed using Friedrichs mollifier of the form35( x ) = C e − / ( −| x | ) , if | x | < , if | x | ≥ , where C > R − Φ( x ) dx = 1 (see Stroock (2011), chapter 6 for details). As such,for η >
0, the convolution procedure yields that there exist smooth approximation functions ρ ( q, δ ), ρ ( q, δ ), and an open set D η ⊂ R such that:(i) 0 ≤ ρ ( q, δ ) ≤ ρ ( q, δ ) ≤ ρ ( q, δ ) ≤ q, δ ) ∈ D η , with ρ ( q, δ ) = ρ ( q, δ ) = ρ ( q, δ ) if( q, δ ) ∈ R \ D η ,(ii) sup ( q,δ ) ∈ D η (cid:16)(cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂q (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂δ (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂q (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) ∂ρ ( q,δ ) ∂δ (cid:12)(cid:12)(cid:12)(cid:17) ≤ Cη − / , and, ∂ρ ( q,δ ) ∂q = ∂ρ ( q,δ ) ∂δ = ∂ρ ( q,δ ) ∂q = ∂ρ ( q,δ ) ∂δ = ∂ρ ( q,δ ) ∂q = ∂ρ ( q,δ ) ∂δ = 0, when ( q, δ ) ∈ R \ D η ,(iii) D η ⊂ D ′ η = (cid:8) ( q, δ ) ∈ R ; | q | ≤ Cη / or | q − δ | ≤ Cη / (cid:9) Define r ( q, δ ) = R ρ ( q, vδ ) dv , r ( q, δ ) = R ρ ( q, vδ ) dv , and R ( γ, τ ; θ ) = | δ ( γ ; θ ) | r ( Z ( τ ; θ ) , δ ( γ ; θ )), R ( γ, τ ; θ ) = | δ ( γ ; θ ) | r ( Z ( τ ; θ ) , δ ( γ ; θ )) such that condition (i) implies R ( γ, τ ; θ ) ≤ e R ( γ, τ ; θ ) ≤ R ( γ, τ ; θ ) . (A2.11)We now bound R ( γ , µ ) − R ( γ , µ ) and R ( γ , µ ) − R ( γ , µ ). | R ( γ , µ ) − R ( γ , µ ) | = || δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) − | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) | = || δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) − | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ ))+ | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) − | δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) |≤ | | δ ( γ ; θ ) − δ ( γ ; θ ) | r ( Z ( µ ) , δ ( γ ; θ )) + | δ ( γ ; θ ) | | r ( Z ( µ ) , δ ( γ ; θ )) − r ( Z ( µ ) , δ ( γ ; θ )) | | . Using the definitions of Z ( τ ; θ ) and δ ( γ ; θ ) given in (A2.1), the bounds on increments of x ( θ ) ′ β ( τ ; θ )and δ ( γ ; θ ) obtained in (A2.9) and (A2.10), respectively, conditions (i, ii) and Taylor’s inequality, wehave, for all ( γ , µ ), ( γ , µ ) in B (0 , t ) × [ τ , τ ] × Θ, where t = t γ + t ǫ ≥ | R ( γ , µ ) − R ( γ , µ ) | ≤ C || γ − γ ||√ n + C tη − / √ n (cid:18) || µ − µ || + || γ − γ ||√ n (cid:19) ≤ C √ n (cid:16) tη − / (cid:17) ( || µ − µ || + || γ − γ || ) . Arguing similarly gives (cid:12)(cid:12) R ( γ , µ ) − R ( γ , µ ) (cid:12)(cid:12) ≤ C √ n (cid:16) tη − / (cid:17) ( || µ − µ || + || γ − γ || ) . From van de Geer (2000) there exists a covering of B (0 , t ) × [ τ , τ ] × Θ by L balls B (( γ j , µ j ) , η ) withcentre ( γ j , µ j ) and radius η such that L ≤ max (cid:18) , Ct P η P + d +1 (cid:19) , where γ ∈ R P , µ = ( τ ; θ ) ∈ [ τ , τ ] × R d . (A2.12)Note that for a ball of radius η with centre ( γ j , µ j ) and ( γ , µ ) inside this ball, | R ( γ j , µ j ) − R ( γ , µ ) | ≤ C √ n (cid:0) tη − / (cid:1) η , (cid:12)(cid:12) R ( γ j , µ j ) − R ( γ , µ ) (cid:12)(cid:12) ≤ C √ n (cid:0) tη − / (cid:1) η . Define R ′ j = R ( γ j , µ j ) − C √ n (cid:0) tη − / (cid:1) η ,36 ′ j = R ( γ j , µ j ) + C √ n (cid:0) tη − / (cid:1) η, and R j = max (0 , R ′ j ) , R j = min (cid:18) ν , R ′ j (cid:19) . (A2.13)Then, from (A2.11), for ( γ, θ ) in B (( γ j , µ j ) , η ), we have R ′ j ≤ R j ≤ e R ( γ, θ ) ≤ R j ≤ R ′ j (A2.14)This implies that { (cid:2) R j , R j (cid:3) , j = 1 , · · · , L } is a covering of e F t , with, (cid:12)(cid:12) R j − R j (cid:12)(cid:12) ≤ ν ≍ t √ n , (A2.15)since 0 ≤ R j ≤ R j ≤ ν/
2. We now bound E h(cid:0) R j − R j (cid:1) i and E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) k i . The definitions of δ ( γ ; θ ), Z ( τ ; θ ) in (A2.1), conditions (i, iii), Assumption 3, (A2.14) and the inequality ( a + b ) ≤ a + b ) give E h(cid:0) R j − R j (cid:1) i ≤ E (cid:20)(cid:16) R ′ j − R ′ j (cid:17) (cid:21) = E "(cid:18)(cid:0) R ( γ j , µ j ) − R ( γ j , µ j ) (cid:1) + 2 C √ n (cid:16) tη − / (cid:17) η (cid:19) ≤ E h(cid:0) R ( γ j , µ j ) − R ( γ j , µ j ) (cid:1) i + Cn (cid:16) tη − / (cid:17) η ≤ E h(cid:0) R ( γ j , µ j ) − R ( γ j , µ j ) (cid:1) i + C (1 + t ) ( η + η ) n = 2 E h δ ( γ j ; θ j ) ( r ( Z ( µ j ) , δ ( γ j ; θ j )) − r ( Z ( µ j ) , δ ( γ j ; θ j ))) i + C (1 + t ) ( η + η ) n ≤ || γ j || n Z || x || (cid:26)Z (cid:26)Z I (( Z ( µ j ) , δ ( γ j ; θ j ) v ) ∈ D η ) dv (cid:27) f ( y | x, θ ) dy (cid:27) f X ( x | θ ) dx + C (1 + t ) ( η + η ) n ≤ C (1 + t ) n ( η + η + η / ) , where the last inequality follows from Assumption 3 and condition (iii), since Z Z I (( Z ( µ j ) , δ ( γ j ; θ j ) v ) ∈ D η ) dvf ( y | x, θ ) dy ≤ Z I (cid:0) y ∈ D η + x ( θ ) ′ β ( µ j ) (cid:1) f ( y | x, θ ) dy ≤ C Z I (cid:0) y ∈ D η + x ( θ ) ′ β ( µ j ) (cid:1) dy = C (length of D η ) ≤ Cη / . The above bound, together with (A2.15), gives for any integer k ≥ E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) k i = E h(cid:12)(cid:12) R j − R j (cid:12)(cid:12) (cid:12)(cid:12) R j − R j (cid:12)(cid:12) k − i ≤ (cid:18) ν (cid:19) k − E h(cid:0) R j − R j (cid:1) i ≤ k !8 (cid:18) ν (cid:19) k − C (1 + t ) n ( η + η + η / ) . Hence, (A2.7) holds if η = C min (cid:18)(cid:16) n (1+ t ) (cid:17) / t b , (cid:16) n (1+ t ) (cid:17) t b , (cid:16) n (1+ t ) (cid:17) t b (cid:19) . Recall that t ≥ t b ∈ (0 , L = e h ( t b ; t ) ≤ max , Ct P min (cid:18)(cid:16) n (1+ t )2 (cid:17) / t b , (cid:16) n (1+ t )2 (cid:17) t b , (cid:16) n (1+ t )2 (cid:17) t b (cid:19) P + d +1 ≤ max (cid:16) , Cnt t b (cid:17) P + d +1 , such that for large n, h ( t b ; t ) ≤ max (cid:16) , ( P + d + 1) log (cid:16) Cnt t b (cid:17)(cid:17) = C (log n +5 log t − t b ) ≤ C (log n + log t − log t b )+ C log t ≤ C log (cid:16) ntt b (cid:17) + C log (cid:16) ntt b (cid:17) ≤ C log (cid:16) ntt b (cid:17) , which37roves (A2.8). This completes our task of constructing covering for e F t . Step 2. Bound for E (cid:16) sup ( γ,ǫ,τ ; θ ) (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12)(cid:17) . E sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) = E sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ ) | X ( θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ E " sup ( γ,ǫ,τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + E " sup ( γ,ǫ,τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E " n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) | X ( θ ) ≤ E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . Let ν , σ , and h ( · ; · ) be as defined in Step 1 by equations (A2.5), (A2.6) and (A2.8). Recall that t = t γ + t ǫ ≥ σ < ≤ n ( t γ + t ǫ ). Let us use the notation h ( u ; t ) = h ( u ). Applying Theorem 6 . 
E sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( R i ( γ, ǫ, τ ; θ ) − E [ R i ( γ, ǫ, τ ; θ )]) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C (cid:18) n Z σ h / ( u ) du + ( ν + σ ) h ( σ ) (cid:19) . From the discussion in Step 1 equation (A2.8), since σ <
1, for all u ∈ (0 , σ ], h ( u ; t ) = h ( u ) ≤ C log ( n ( t γ + t ǫ ) /u ). Therefore, by Cauchy-Schwarz inequality, we have n / Z σ h / ( u ) du ≤ ( nσ ) / (cid:18)Z σ h ( u ) du (cid:19) / ≤ C ( nσ ) / (cid:18)Z σ log (cid:18) n ( t γ + t ǫ ) u (cid:19) du (cid:19) / = C ( nσ ) / (cid:18) σ (cid:18) log (cid:18) n ( t γ + t ǫ ) σ (cid:19) + 1 (cid:19)(cid:19) / ≤ Cn / σ log / (cid:18) n ( t γ + t ǫ ) σ (cid:19) . With the assumptions on the order of t γ and t ǫ as stated in the statement of Lemma 2 and the orderof σ obtained in (A2.6), it followslog (cid:18) n ( t γ + t ǫ ) σ (cid:19) ≤ C log n / ( t γ + t ǫ ) / t ǫ ! ≤ C log n / n / log / n ! ≤ C log n. Hence, on substituting, we get E " sup ( γ,ǫ,τ ; θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) ≤ C (cid:16) n / σ log / n + ( ν + σ ) log n (cid:17) ≤ C t ǫ ( t ǫ + t γ ) / log / nn /
1+ log / n n / + ( t ǫ + t γ ) / t ǫ n / !! ≤ C log / nn / t ǫ ( t ǫ + t γ ) / , which proves Lemma 2. (cid:3) roof of Lemma 3. The proof of Lemma 3 follows the same steps as in Lemma 2 and, hence,a sketch of the proof is provided here. Treating quantities varying with i as random variables, theexpressions for R ( γ, ǫ, τ ; θ ) given in (A2.2), R ( γ, ǫ, τ ; θ ) from (A1.9) and H ( τ ; θ ) gives R ( γ, ǫ, τ ; θ ) = δ ( γ ; θ )+ δ ( ǫ ; θ ) Z δ ( γ ; θ ) (cid:0) F (cid:0) X ( θ ) ′ β ( τ ; θ )+ t | X, θ (cid:1) − F (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1)(cid:1) dt − ǫ ′ H ( τ ; θ )( ǫ + 2 γ )= δ ( γ ; θ ) + δ ( ǫ ; θ ) Z δ ( γ ; θ ) (cid:0) F (cid:0) X ( θ ) ′ β ( τ ; θ )+ t | X, θ (cid:1) - F (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1) - tf (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1)(cid:1) dt = δ ( γ ; θ ) + δ ( ǫ ; θ ) Z δ ( γ ; θ ) t (cid:26)Z (cid:0) f (cid:0) X ( θ ) ′ β ( τ ; θ ) + vt | X, θ (cid:1) − f (cid:0) X ( θ ) ′ β ( τ ; θ ) | X, θ (cid:1)(cid:1) dv (cid:27) dt. Define r ( γ, τ ; θ ) = R δ ( γ ; θ )0 t nR ( f ( X ( θ ) ′ β ( τ ; θ ) + vt | X, θ ) − f ( X ( θ ) ′ β ( τ ; θ ) | X, θ )) dv o dt which im-plies that R ( γ, ǫ, τ ; θ ) = r ( γ + ǫ, τ ; θ ) − r ( γ, τ ; θ ). Using the definition of δ ( γ ; θ ) in (A2.1) and becauseunder Assumption 3 we have n > | f ( a + b | x, θ ) − f ( a | x, θ ) | ≤ n | b | , from Lemma 1, wehave (cid:12)(cid:12) R ( γ, ǫ, τ ; θ ) (cid:12)(cid:12) ≤ n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z δ ( γ ; θ )+ δ ( ǫ ; θ ) δ ( γ ; θ ) t dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = C (cid:12)(cid:12) δ ( ǫ ; θ ) (cid:0) δ ( γ ; θ ) + 3 δ ( γ ; θ ) δ ( ǫ ; θ ) + δ ( ǫ ; θ ) (cid:1)(cid:12)(cid:12) ≤ C | δ ( ǫ ; θ ) | (cid:16) | δ ( γ ; θ ) | +3 | δ ( γ ; θ ) | | δ ( ǫ ; θ ) | + | δ ( ǫ ; θ ) | (cid:17) ≤ C | δ ( ǫ ; θ ) | ( | δ ( γ ; θ ) | + | δ ( ǫ ; θ ) | ) ≤ C || X ( θ ) || || ǫ || ( || γ || + || ǫ || ) n / . (A2.16) | r ( γ, τ ; θ ) | ≤ C | δ ( γ ; θ ) | ≤ C || X ( θ ) || || γ || n / . Thus, for all γ ∈ B (0 , t γ + t ǫ ) and all ( τ, θ ) ∈ [ τ , τ ] × Θ, | r ( γ, τ ; θ ) | ≤ ν ′ ν ′ ≍ ( t γ + t ǫ ) n / . From (A2.16), under Assumption 3,Var (cid:0) R ( γ, ǫ, τ ; θ ) (cid:1) ≤ E (cid:2) R ( γ, ǫ, τ ; θ ) (cid:3) ≤ C || ǫ || ( || γ || + || ǫ || ) n / ! Z || x ( θ ) || f X ( x | θ ) dx ≤ C || ǫ || ( || γ || + || ǫ || ) n ≤ (cid:0) σ ′ (cid:1) ; σ ′ ≍ t ǫ ( t γ + t ǫ ) n / . 
Then arguing as in step 1 of Lemma 2 to construct brackets, E sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) − E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) ≤ Cn / σ ′ log / (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) + ( σ ′ + ν ′ ) log (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) It follows from (A2.16) and Assumption 3 that for all ( γ, ǫ, τ, θ ) in B (0 , t γ ) × B (0 , t ǫ ) × [ τ , τ ] × Θ, (cid:12)(cid:12) E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) = (cid:12)(cid:12) n E (cid:2) R i ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) ≤ n E (cid:2)(cid:12)(cid:12) R i ( γ, ǫ, τ ; θ ) (cid:12)(cid:12)(cid:3) ≤ Cn / E h || X ( θ ) || || ǫ || ( || γ || + || ǫ || ) i Cn / || ǫ || ( || γ || + || ǫ || ) Z || x ( θ ) || f X ( x | θ ) dx ≤ Cn / t ǫ ( t γ + t ǫ ) , and using the conditions on orders of t γ and t ǫ as specified in Lemma 2, such that t γ ≥ t γ /t ǫ = O (cid:16) n/ log / n (cid:17) , we have E " sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≤ E " sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ (cid:0)(cid:12)(cid:12) R n ( γ, ǫ, τ ; θ ) − E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12) + (cid:12)(cid:12) E (cid:2) R n ( γ, ǫ, τ ; θ ) (cid:3)(cid:12)(cid:12)(cid:1) ≤ Cn / σ ′ log / (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) + (cid:0) σ ′ + ν ′ (cid:1) log (cid:18) n ( t γ + t ǫ ) σ ′ (cid:19) + Cn / t ǫ ( t γ + t ǫ ) ≤ C t ǫ ( t γ + t ǫ ) n log n t ǫ ( t γ + t ǫ ) ! t γ + t ǫ ) t ǫ n / log / n / t ǫ ( t γ + t ǫ ) !! + Cn / t ǫ ( t γ + t ǫ ) . Recall t ǫ = t log / n/n / = o (log / n ) and t γ ≍ log / n , such that t γ + t ǫ ≍ log / n . It follows,for large n , n / ( t ǫ ( t γ + t ǫ )) ≤ ( C/t )( n / / log / n ) ≤ Cn / /t , such that log ( n / / ( t ǫ ( t γ + t ǫ ))) ≤ C log n . Similarly, ( t γ + t ǫ ) / ( t ǫ n / ) ≤ C/ ( n log n ) / .Thus, (( t γ + t ǫ ) / ( t ǫ n / )) × log / ( n / / ( t ǫ ( t γ + t ǫ ))) ≤ C (log n/n ) / = o (1) and(1 + ( t γ + t ǫ ) / ( t ǫ n / ) × log / ( n / / ( t ǫ ( t γ + t ǫ )))) = 1 + o (1) = 1. Therefore, it follows, E sup ( γ,ǫ,τ,θ ) ∈B (0 ,t γ ) ×B (0 ,t ǫ ) × [ τ,τ ] × Θ | R n ( γ, ǫ, τ ; θ ) | ≤ C t ǫ ( t γ + t ǫ ) n log / n + Cn / t ǫ ( t γ + t ǫ ) ≤ C t ǫ ( t γ + t ǫ ) n / , for large n , which proves Lemma 3. (cid:3) Proof of Lemma 4.
The first order condition for Q ( β, τ ; θ ) gives E [ X ( θ ) { F ( X ′ ( θ ) β ( τ ; θ ) | X, θ ) − τ } ] =0 . Let s iℓ ( τ ; θ ) denote the ℓ th entry of the vector s i ( τ ; θ ) √ n in (A1.3). Assumption 3 gives, uniformly in( τ, θ ) ∈ [ τ , τ ] × Θ for all i , | s iℓ ( τ ; θ ) | ≤ ν ′′ , where ν ′′ ≍ n − / Var( s ℓ ( τ ; θ )) ≤ E h ( s ℓ ( τ ; θ )) i ≤ E (cid:20) n X ℓ ( θ ) (cid:21) = 1 n Z x f X ( x | θ ) dx ≤ (cid:0) σ ′′ (cid:1) , where σ ′′ ≍ n − / . Hence, arguing as in Steps 1-2 of Lemma 2, E " sup ( τ,θ ) ∈ [ τ,τ ] × Θ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) b S ℓ ( τ ; θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = O (cid:16) n / σ ′′ log / n + (cid:0) σ ′′ + ν ′′ (cid:1) log n (cid:17) ≤ C log / n + (cid:18) log nn (cid:19) / log / n ! = O (cid:16) log / n (cid:17) . Markov inequality, then, proves Lemma 4. (cid:3) ppendix 3. Proof of remark in Section 5.1 (i) In the expression for C ( τ ) in (4.2), (cid:8)R τ ( β ( t ) + β ( t ) X ) dt − τ ( β ( τ ) + β ( τ ) X ) (cid:9) will have theform p ( τ )+ q ( τ ) X . Recall that e X = [1 , X ] ′ , X = [1 , X , X ] ′ and g ( X ) = e X [0 , , E − [ XX ′ ] X ,then, C ( τ ) = E " (cid:0) [0 , , E − [ XX ′ ] X (cid:1) ( p ( τ ) + q ( τ ) X ) (cid:0) [0 , , E − [ XX ′ ] XX (cid:1) ( p ( τ ) + q ( τ ) X ) . If X and X are independent, elementary matrix algebra gives that[0 , , E − (cid:2) XX ′ (cid:3) X = 1 D (cid:8)(cid:0) E [ X ] E [ X ] − E [ X ] E [ X ] (cid:1) + (cid:0) E [ X ] − E [ X ] (cid:1) X (cid:9) , where D is the determinant of the matrix E [ XX ′ ]. Plugging in this expression in C ( τ ) andsimplifying using independence of X and X proves the result.(ii) Given the result in (i), the increase in variance of the quantile estimates due to first step es-timation, over standard quantile regression had the first step been known, is given by (4.2) as H ( τ ) − D ( τ ) V ( β ) D ( τ ) ′ H ( τ ) − . Using H ( τ ) and D ( τ ) as given in (4.2), under independence of X and X , the vector H ( τ ) − D ( τ ) evaluates to [ − E [ X ] , ′ . Therefore, the additional variancedue to two-step estimation is given by H ( τ ) − D ( τ ) V ( β ) D ( τ ) ′ H ( τ ) − = " E [ X ] V ( β ) 00 0 . This proves (ii). (cid:3) eferences Abrevaya, J. (2002). The effects of demographics and maternal behavior on the distribution of birth outcomes.In
Economic Applications of Quantile Regression (pp. 247–257). Springer.
Amemiya, T. (1974). The nonlinear two-stage least-squares estimator. Journal of Econometrics, 2(2), 105–110.
Amemiya, T. & Powell, J. L. (1981). A comparison of the Box–Cox maximum likelihood estimator and the non-linear two-stage least squares estimator. Journal of Econometrics, 17(3), 351–381.
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., & Tukey, J. W. (1972). Robust Estimates of Location: Survey and Advances. Princeton University Press.
Andrews, D. W. & Buchinsky, M. (2000). A three-step method for choosing the number of bootstrap repetitions. Econometrica, 68(1), 23–51.
Arellano, M. & Bonhomme, S. (2017). Quantile selection models with an application to understanding changes in wage inequality. Econometrica, 85(1), 1–28.
Berry, S., Levinsohn, J., & Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 841–890.
Blundell, R. & Powell, J. L. (2003). Endogeneity in nonparametric and semiparametric regression models. In Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress (pp. 312–357). Cambridge University Press.
Blundell, R. & Powell, J. L. (2007). Censored regression quantiles with endogenous regressors. Journal of Econometrics, 141(1), 65–83.
Box, G. E. & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological), 26(2), 211–252.
Buchinsky, M. (1994). Changes in the US wage structure 1963–1987: Application of quantile regression. Econometrica, 405–458.
Buchinsky, M. (1995). Quantile regression, Box–Cox transformation model, and the US wage structure, 1963–1987. Journal of Econometrics, 65(1), 109–154.
Chamberlain, G. (1994). Quantile regression, censoring, and the structure of wages. In C. A. Sims (Ed.), Advances in Econometrics: Sixth World Congress, Econometric Society Monographs (pp. 171–210). Cambridge University Press.
Chen, L., Galvao, A. F., & Song, S. (2018). Quantile regression with generated regressors. Available at SSRN 3039602.
Chen, X., Linton, O., & Van Keilegom, I. (2003). Estimation of semiparametric models when the criterion function is not smooth. Econometrica, 71(5), 1591–1608.
Chernozhukov, V., Fernández-Val, I., & Kowalski, A. E. (2015). Quantile regression with censoring and endogeneity. Journal of Econometrics, 186(1), 201–221.
Chernozhukov, V., Fernández-Val, I., Newey, W., Stouli, S., & Vella, F. (2017). Semiparametric estimation of structural functions in nonseparable triangular models. arXiv preprint arXiv:1711.02184.
Chernozhukov, V. & Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73(1), 245–261.
Chernozhukov, V. & Hansen, C. (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics, 132(2), 491–525.
Chernozhukov, V. & Hansen, C. (2008). Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics, 142(1), 379–398.
Chesher, A. (2003). Identification in nonseparable models. Econometrica, 71(5), 1405–1441.
Fang, K. W., Kotz, S., & Ng, K. W. (1990). Symmetric Multivariate and Related Distributions. London: Chapman and Hall/CRC Press.
Fitzenberger, B., Wilke, R. A., & Zhang, X. (2009). Implementing Box–Cox quantile regression. Econometric Reviews, 29(2), 158–181.
Gastwirth, J. L. (1966). On robust procedures. Journal of the American Statistical Association, 61(316), 929–948.
Gimenes, N. (2017). Econometrics of ascending auctions by quantile regression. Review of Economics and Statistics, 99(5), 944–953.
Gimenes, N. & Guerre, E. (2020). Quantile regression methods for first-price auctions. arXiv preprint arXiv:1909.05542.
Hahn, J. & Ridder, G. (2013). Asymptotic variance of semiparametric estimators with generated regressors. Econometrica, 81(1), 315–340.
Haile, P. A., Hong, H., & Shum, M. (2003). Nonparametric tests for common values at first-price sealed-bid auctions. Technical report, National Bureau of Economic Research.
Hjort, N. L. & Pollard, D. (2011). Asymptotics for minimisers of convex processes. arXiv preprint arXiv:1107.3806.
Hoderlein, S., Klemelä, J., & Mammen, E. (2010). Analyzing the random coefficient model nonparametrically. Econometric Theory, 804–837.
Ichimura, H. & Lee, S. (2010). Characterization of the asymptotic distribution of semiparametric M-estimators. Journal of Econometrics, 159(2), 252–266.
Imbens, G. W. & Newey, W. K. (2009). Identification and estimation of triangular simultaneous equations models without additivity. Econometrica, 77(5), 1481–1512.
Koenker, R. (2005). Quantile Regression. Cambridge University Press.
Koenker, R. (2017). Quantile regression: 40 years on. Annual Review of Economics, 9, 155–176.
Koenker, R. & Bassett, G. (1978). Regression quantiles. Econometrica, 46(1), 33–50.
Koenker, R. & Hallock, K. (2001). Quantile regression: An introduction. Journal of Economic Perspectives, 15(4), 43–56.
Koenker, R. & Ma, L. (2006). Quantile regression methods for recursive structural equation models. Journal of Econometrics, 134(2), 471–506.
Koenker, R. & Machado, J. A. (1999). Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association, 94(448), 1296–1310.
Koenker, R. & Xiao, Z. (2002). Inference on the quantile regression process. Econometrica, 70(4), 1583–1612.
Lee, S. (2007). Endogeneity in quantile regression models: A control function approach. Journal of Econometrics, 141(2), 1131–1158.
Lu, J. & Perrigne, I. (2008). Estimating risk aversion from ascending and sealed-bid auctions: the case of timber auction data. Journal of Applied Econometrics, 23(7), 871–896.
Machado, J. A. & Mata, J. (2000). Box–Cox quantile regression and the distribution of firm sizes. Journal of Applied Econometrics, 15(3), 253–274.
Mammen, E., Rothe, C., & Schienle, M. (2012). Nonparametric regression with nonparametrically generated covariates. The Annals of Statistics, 40(2), 1132–1170.
Mammen, E., Rothe, C., & Schienle, M. (2016). Semiparametric estimation with generated covariates. Econometric Theory, 32(5), 1140–1177.
Massart, P. (2007). Concentration Inequalities and Model Selection, volume 6. Springer.
Mu, Y. & He, X. (2007). Power transformation toward a linear regression quantile. Journal of the American Statistical Association, 102(477), 269–279.
Murphy, K. M. & Topel, R. H. (1985). Estimation and inference in two-step econometric models. Journal of Business & Economic Statistics, 3(4), 370–379.
Newey, W. K. & McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4, 2111–2245.
Oxley, L. & McAleer, M. (1993). Econometric issues in macroeconomic models with generated regressors. Journal of Economic Surveys, 7(1), 1–40.
Pagan, A. (1984). Econometric issues in the analysis of regressions with generated regressors. International Economic Review, 221–247.
Powell, J. L. (1991). Estimation of monotonic regression models under quantile restrictions. In W. A. Barnett, J. Powell, & G. Tauchen (Eds.), Nonparametric and Semiparametric Methods in Econometrics and Statistics, chapter 14 (pp. 357–384). Cambridge University Press.
Smith, R. L. (1994). Nonregular regression. Biometrika, 81(1), 173–183.
Stroock, D. W. (2011). Essentials of Integration Theory for Analysis, volume 262. Springer.
van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge University Press.
Zou, H. & Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. The Annals of Statistics, 36(3), 1108–1126.