A first-stage representation for instrumental variables quantile regression

Abstract

This paper develops a first-stage linear regression representation for the instrumental variables (IV) quantile regression (QR) model. The quantile first-stage is analogous to the least squares case, i.e., a conditional mean regression of the endogenous variables on the instruments, with the difference that the QR case is a weighted regression. The weights are given by the conditional density function of the innovation term in the QR structural model, conditional on the endogenous and exogenous covariates, as well as the instruments, at a given quantile. In addition, we show that the required Jacobian identification conditions for IVQR models are embedded in the quantile first-stage. The first-stage regression is a natural framework to evaluate the validity of instruments and, in particular, the validity of the Jacobian identification conditions. Hence, we suggest testing procedures to evaluate the adequacy of instruments by evaluating their statistical significance using the first-stage result. This procedure may be especially useful in QR since the instruments may be relevant at some quantiles but not at others, which motivates the use of weak-identification robust inference. Monte Carlo experiments provide numerical evidence that the proposed tests work as expected in terms of empirical size and power in finite samples. An empirical application illustrates that checking for the statistical significance of the instruments at different quantiles is important.



Javier Alejo ∗ Antonio F. Galvao † Gabriel Montes-Rojas ‡ February 3, 2021

Abstract

This paper develops a first-stage linear regression representation for the instrumental variables (IV) quantile regression (QR) model. The first-stage is analogous to the least squares case, i.e., a conditional mean regression of the endogenous variables on the instruments, with the difference that the QR case is a weighted regression. The weights are given by the conditional density function of the innovation term in the QR structural model, conditional on the endogenous and exogenous covariates, and the instruments as well, at a given quantile. The first-stage regression is a natural framework to evaluate the validity of instruments. Thus, we are able to use the first-stage result and suggest testing procedures to evaluate the adequacy of instruments in IVQR models by evaluating their statistical significance. In the QR case, the instruments may be relevant at some quantiles but not at others or at the mean. Monte Carlo experiments provide numerical evidence that the proposed tests work as expected in terms of empirical size and power in finite samples. An empirical application illustrates that checking for the statistical significance of the instruments at different quantiles is important.

Keywords:

Quantile regression, instrumental variables, first-stage.

JEL:

C13, C23.

∗ IECON-Universidad de la República, Gonzalo Ramirez 1926, C.P. 11200, Montevideo, Uruguay. E-mail: [email protected]
† University of Arizona, Tucson, USA. E-mail: [email protected]
‡ Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina. E-mail: [email protected]

Introduction

Instrumental variables (IV) methods are one of the main workhorses for estimating causal relationships in empirical analysis. Standard IV regression methods stress that for instruments to be valid they must be exogenous. It is also important, however, that a second condition for a valid instrument, instrument relevance, holds, for if the instruments are only marginally relevant, or “weak,” then first-order asymptotics can be a poor guide to the actual sampling distributions of conventional IV regression statistics. Several testing procedures have been proposed to evaluate the presence of weak instruments, as well as alternative robust inference methods. The most popular test to evaluate the weak instruments problem looks at the first-stage (i.e. a linear regression of the endogenous variable on the IV and other exogenous covariates) F-statistics following the rule-of-thumb of Staiger and Stock (1997) and subsequent variants such as Sanderson and Windmeijer (2016), Lee et al. (2020) and others. See Stock and Yogo (2005) for an extensive discussion.

Quantile regression (QR) is an important method of modeling heterogeneous effects. Several IV methods have been proposed in QR to solve endogeneity when the covariates are correlated with the error term in a regression model. Chernozhukov and Hansen (2005, 2006, 2008) (CH hereafter) develop an instrumental variables quantile regression (IVQR) procedure that has been applied in several contexts. It is one of the most prolific approaches in terms of subsequent work, as it provides a general procedure to use IV for endogeneity of regressors (see, e.g., Angrist et al., 2006; Chernozhukov et al., 2009; Galvao, 2011). Other work based on this idea develops the GMM counterpart constructed using moment conditions directly; see, for instance, Kaplan and Sun (2017) and de Castro et al. (2019). We refer to Chernozhukov et al. (2020) for an overview of IVQR.

CH comment that their method is a simple solution to a two-stage least-squares (2SLS) analog, which has been formally established in Galvao and Montes-Rojas (2015). However, the first-stage of the IVQR estimator has not been explicitly considered, as it is implemented as an inverse QR estimator. The IVQR estimator contrasts with alternative procedures where the first-stage is implemented in alternative frameworks. For instance, Amemiya (1982), Powell (1983), Chen and Portnoy (1996), and Kim and Muller (2004) use an explicit first-stage that fits the endogenous variable(s) as a function of exogenous covariates and IV, and this is then plugged into a second-stage. Lee (2007) also adopts a two-step control-function approach where the first step consists of estimating the residuals of the reduced-form equation for the endogenous explanatory variable. Ma and Koenker (2006) present an estimator for a recursive structural equation model.

This paper builds on the IVQR estimator and shows that a first-stage regression model can be explicitly recovered from the CH IVQR estimator. The first-stage IVQR (FS-IVQR) is a conditional mean regression of the endogenous variables on the instruments, with the difference that the QR case is a weighted regression, that is, it has the representation of a weighted least squares (WLS) regression of the endogenous variable(s) on the IV and the exogenous regressors. The weights are given by the conditional density function of the innovation term in the QR structural model, conditional on the endogenous and exogenous covariates together with the instruments, at a given quantile. The derivation of the result is simple. We write the IVQR estimator as a constrained Lagrangian optimization problem and show that one of the restrictions that must be satisfied is the analogue of the first-stage.

The practical implementation of the FS-IVQR is as follows.
First, from the IVQR one estimates the conditional density function at a selected quantile, which produces an estimate of the weights. The weighting factor can be estimated, for instance, using the sparsity method (see, e.g., Koenker (2005)). Second, a standard WLS model is implemented; this is analogous to the first-stage model used in 2SLS. We derive the asymptotic distribution of the two-step estimator.

The first-stage regression is a natural framework to evaluate the validity of instruments since one can test for their statistical significance, that is, how the IV affect the endogenous variable(s). Furthermore, it can also be used for testing procedures to assess the validity of the IV for given quantiles. A Wald-type test on the coefficients of interest can be used for testing. Provided that weights are consistently estimated, the Wald test is asymptotically Chi-squared with the number of degrees of freedom equal to the number of coefficients tested. We highlight that when testing for one instrument being invalid, the empirical applications are restricted to the case with at least two instruments, with one being valid. This is because the practical implementation of the first-stage relies on a consistent estimation of the weights in the first step. In sum, the proposed method evaluates the relevance condition for the validity of the IV in the IVQR framework. The procedure, thus, allows the empirical researcher to evaluate the individual quality of the IV.

The proposed inference allows for a procedure in empirical work that is parallel to the standard first-stage in two-stage least squares (FS-2SLS), to evaluate the degree of association of the IV to the endogenous variable. Nevertheless, the derivation of the FS-IVQR result illustrates that the rejection of the null hypothesis considered here is a necessary condition for the validity of the IVQR specification to be used in practice for identification.
Our procedure should thus be evaluated in a framework where one is concerned with the first-stage relevance of the IV, in a similar vein to 2SLS for estimating mean effects. It provides a clear link between the IV evaluation in an explicit first-stage in least-squares and QR models.

One important feature of the procedure developed here is that instruments could be statistically insignificant in FS-2SLS, but they could still be related to the endogenous variable in the IVQR set-up. The reason is that the FS-2SLS test only evaluates a mean effect, but

There is a literature on weak identification robust inference for QR models. Without imposing additional conditions, statistical inference for the structural quantile function can be performed using weak-identification robust inference as described in Chernozhukov and Hansen (2008), Jun (2008), or Chernozhukov et al. (2009). In this paper we do not pursue this avenue and leave it for future research.

Let $(y, d, x, z)$ be random variables, where $y$ is a scalar outcome of interest, $d$ is a $1 \times r$ vector of endogenous control variables, $x$ is a $1 \times k$ vector of exogenous control variables, and $z$ is a $1 \times p$ vector of exogenous instrumental variables, with $p \geq r$. Define $w = (x, z)$ and $s = (d, x, z)$.

Chernozhukov and Hansen (2006) developed estimation and inference for a generalization of the QR model with endogenous regressors. A linear representation of the model takes the following form

$$y = d\alpha(u_d) + x\beta(u_d), \qquad u_d \mid x, z \sim \text{Uniform}(0, 1), \tag{1}$$

where $u_d$ is the nonseparable error or rank. Under some regularity conditions, CH establish the following IV identification function

$$P[y \leq d\alpha(\tau) + x\beta(\tau) \mid x, z] = P[u_d \leq \tau \mid x, z] = \tau. \tag{2}$$

Although each parameter and estimator is indexed by the quantile $\tau \in (0, 1)$, this dependence is suppressed in the notation and the analysis is for a given $\tau$.

The restriction in (2) can be used to estimate the parameters of interest. For a given quantile $\tau$, the population IVQR estimator for the model in (1) is given by

$$\min_{\alpha} \|\gamma(\alpha)\|_A, \quad \text{where} \quad (\beta(\alpha), \gamma(\alpha)) = \arg\min_{\beta, \gamma} \mathrm{E}[\rho_\tau(y - d\alpha - x\beta - z\gamma)],$$

$\rho_\tau(u) = u(\tau - \mathbf{1}(u < 0))$ is the check function, and $\|\cdot\|_A = \cdot' A \, \cdot$ is the Euclidean distance for any positive definite matrix $A$ of dimension $p \times p$.

As noted by Chernozhukov and Hansen (2006, p. 501), the IVQR estimator is asymptotically equivalent to a particular GMM estimator where the QR first order conditions are used as moment conditions. In particular, it would involve a Z-estimator solving

$$\mathrm{E}\left[x'(\mathbf{1}[y - d\alpha - x\beta < 0] - \tau)\right] = \mathbf{0}_k, \tag{3}$$

$$\mathrm{E}\left[z'(\mathbf{1}[y - d\alpha - x\beta < 0] - \tau)\right] = \mathbf{0}_p, \tag{4}$$

where $\mathbf{1}(\cdot)$ is the indicator function. Here $\mathbf{0}_k$ and $\mathbf{0}_p$ are null vectors with dimensions $k \times 1$ and $p \times 1$, respectively.

Different estimators have been proposed in the GMM framework based on identifying the structural parameters from equations (3)–(4). Kaplan and Sun (2017) and de Castro et al. (2019) provide general estimation procedures based on smoothing techniques of the non-differentiable indicator function. However, the constructed estimator differs from the IVQR one. This can be seen in the fact that the term $z\gamma$ is omitted altogether from the regression model.

The IVQR estimator proposed by Chernozhukov and Hansen (2006), for a given quantile $\tau$, can be written as a constrained minimization problem, where the constraints are the moment conditions, that is,

$$\min_{(\alpha, \beta, \gamma)} \|\gamma\|_A, \tag{5}$$

subject to

$$\mathrm{E}\left[x'(\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right] = \mathbf{0}_k, \tag{6}$$

$$\mathrm{E}\left[z'(\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right] = \mathbf{0}_p. \tag{7}$$

Now we write this constrained optimization as a Lagrangian problem as

$$\mathcal{L}(\alpha, \beta, \gamma, \lambda_x, \lambda_z) = \|\gamma\|_A + \lambda_x \mathrm{E}\left[x'(\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right] + \lambda_z \mathrm{E}\left[z'(\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right], \tag{8}$$

where $\lambda_x$ is a $1 \times k$ vector and $\lambda_z$ is a $1 \times p$ vector.
Therefore, the IVQR estimator is given by the empirical counterpart of $\arg\min_{(\theta, \lambda_x, \lambda_z)} \mathcal{L}(\theta, \lambda_x, \lambda_z)$, where $\theta = (\alpha', \beta', \gamma')'$.

The first derivatives of the Lagrangian in equation (8) are

$$\partial \mathcal{L} / \partial \alpha = -\left\{\lambda_x \mathrm{E}[f \cdot x'd] + \lambda_z \mathrm{E}[f \cdot z'd]\right\}' \tag{9}$$

$$\partial \mathcal{L} / \partial \beta = -\left\{\lambda_x \mathrm{E}[f \cdot x'x] + \lambda_z \mathrm{E}[f \cdot z'x]\right\}' \tag{10}$$

$$\partial \mathcal{L} / \partial \gamma = \left\{2\gamma' A - \lambda_x \mathrm{E}[f \cdot x'z] - \lambda_z \mathrm{E}[f \cdot z'z]\right\}' \tag{11}$$

$$\partial \mathcal{L} / \partial \lambda_x = \mathrm{E}\left[x'(\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right]' \tag{12}$$

$$\partial \mathcal{L} / \partial \lambda_z = \mathrm{E}\left[z'(\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right]', \tag{13}$$

where $f := f_{u_\tau}(0 \mid d, x, z)$ denotes the density function of $u_\tau := y - d\alpha(\tau) - x\beta(\tau)$ conditional on $s = (d, x, z)$, evaluated at the $\tau$-th conditional quantile, which is zero. Note that $f$ is specific to each quantile $\tau$. This density function plays a central role in what follows.

At an interior solution, all of the equations above equal zero. Thus, from equation (10),

$$\lambda_x' = -\left(\mathrm{E}[f \cdot x'x]\right)^{-1} \mathrm{E}[f \cdot x'z] \, \lambda_z'. \tag{14}$$

Then, replacing (14) in (11),

$$\left(\mathrm{E}[f \cdot z'z] - \mathrm{E}[f \cdot z'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'z]\right)' \lambda_z' = 2A\gamma,$$

such that

$$\lambda_z' = 2\left(\mathrm{E}[f \cdot z'z] - \mathrm{E}[f \cdot z'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'z]\right)^{-1} A\gamma. \tag{15}$$

Finally, replacing (14) and (15) in (9),

$$\mathrm{E}[f \cdot d'x] \lambda_x' + \mathrm{E}[f \cdot d'z] \lambda_z' = 2\left\{\mathrm{E}[f \cdot d'z] - \mathrm{E}[f \cdot d'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'z]\right\} \times \left\{\mathrm{E}[f \cdot z'z] - \mathrm{E}[f \cdot z'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'z]\right\}^{-1} A\gamma = \mathbf{0}_r,$$

where $\mathbf{0}_r$ is an $r \times 1$ vector of zeros. The solution for $(\alpha', \beta', \gamma')'$ can thus be written as a system of three equations given by

$$\left\{\mathrm{E}[f \cdot d'z] - \mathrm{E}[f \cdot d'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'z]\right\} \times \left\{\mathrm{E}[f \cdot z'z] - \mathrm{E}[f \cdot z'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'z]\right\}^{-1} A\gamma = \mathbf{0}_r \tag{16}$$

$$\mathrm{E}\left[x \cdot (\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right] = \mathbf{0}_k \tag{17}$$

$$\mathrm{E}\left[z \cdot (\mathbf{1}[y - d\alpha - x\beta - z\gamma < 0] - \tau)\right] = \mathbf{0}_p. \tag{18}$$

Given equations (16)–(18) above, we can see that (16) provides a first-stage representation of the IVQR model. This can be written as

$$\delta' A \gamma = \mathbf{0}_r, \tag{19}$$

where

$$\delta := \left\{\mathrm{E}[f \cdot z'z] - \mathrm{E}[f \cdot z'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'z]\right\}^{-1} \left\{\mathrm{E}[f \cdot z'd] - \mathrm{E}[f \cdot z'x](\mathrm{E}[f \cdot x'x])^{-1} \mathrm{E}[f \cdot x'd]\right\}. \tag{20}$$

Here $\delta$ is a $p \times r$ matrix. Notice that equation (20) is a least-squares projection. In particular, the representation in (20) is a weighted conditional mean regression, where the endogenous variable(s), $d$, is (are) regressed on the IV, $z$, and the exogenous variables, $x$. This is the analogue of the first-stage in the 2SLS case, with the difference that the QR case is a weighted regression.
The weights are given by the conditional density function of the innovation term in the QR structural model, conditional on the endogenous and exogenous covariates together with the instruments.

Hence, for each endogenous variable, say $d_j$ for $j = 1, 2, \ldots, r$, $\delta_j$ in equation (20) can be recovered as the solution to the following optimization problem

$$\mu_j := (\psi_j, \delta_j) = \arg\min_{\psi, \delta} \mathrm{E}\left[f \cdot (d_j - x\psi - z\delta)^2\right]. \tag{21}$$

Note that the parameter $\delta$ also depends on $\theta = (\alpha', \beta', \gamma')'$, through the conditional density function $f$ at quantile $\tau$. Thus, this first-stage representation depends on the structural (second-stage) parameters, and as such, it is different from the 2SLS case in mean regression models.

We notice that the first-stage in equation (21) is different from those in the existing literature using two-stage regressions for conditional quantile models. Amemiya (1982), Powell (1983), Chen and Portnoy (1996), and Kim and Muller (2004) propose different two-step procedures in which the first step fits the endogenous variable(s) as a function of exogenous covariates and IV, and this is then plugged into a second-stage. Nevertheless, these papers use least squares without weighting or standard quantile regression in the first-stage. Our procedure derives the first-stage from the IVQR set-up, thus confirming that a first-stage (albeit different) is part of the model.

In this section we consider the empirical implementation of the first-stage instrumental variables quantile regression (FS-IVQR) and derive the estimator's asymptotic distribution. We propose a two-step estimation procedure, where in the first step we estimate the density using the IVQR model, and in the second step we use a weighted least squares (WLS) regression. For simplicity of exposition, we develop the case of $r = 1$, i.e. one endogenous variable.
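The claim that the closed form in (20) and the weighted least-squares problem in (21) deliver the same $\delta$ can be checked numerically. The following is a minimal sketch, not the authors' code: the simulated data, the positive weights `f` standing in for the conditional density, and all variable names are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 500, 2, 2

# Hypothetical data: x contains a constant, z holds p instruments, and f is
# an arbitrary positive weight standing in for f_{u_tau}(0 | d, x, z).
x = np.column_stack([np.ones(n), rng.uniform(size=n)])
z = rng.uniform(size=(n, p))
d = x @ np.array([0.5, 1.0]) + z @ np.array([1.0, -0.5]) + rng.normal(size=n)
f = 0.5 + rng.uniform(size=n)


def wmean(a, b):
    """Sample analogue of E[f * a'b] for row-stacked a and b."""
    return (a * f[:, None]).T @ b / n


d_col = d[:, None]
sxx_inv = np.linalg.inv(wmean(x, x))

# Closed form (20): weighted partialling-out of x from both z and d.
lhs = wmean(z, z) - wmean(z, x) @ sxx_inv @ wmean(x, z)
rhs = wmean(z, d_col) - wmean(z, x) @ sxx_inv @ wmean(x, d_col)
delta_closed = np.linalg.solve(lhs, rhs).ravel()

# Optimization form (21): WLS of d on w = (x, z) with weights f.
w = np.column_stack([x, z])
mu = np.linalg.solve(wmean(w, w), wmean(w, d_col)).ravel()
delta_wls = mu[k:]  # the last p entries are the coefficients on z

print(delta_closed, delta_wls)
```

The two vectors agree up to floating-point error, which is the weighted version of the usual partialling-out (Frisch-Waugh-Lovell) argument.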
The estimator requires a consistent estimator of $\mu$ in (21), which will be a WLS estimator based on an estimate of $f$, at a given quantile of interest $\tau$. The estimator has two steps, as follows.

In the first step we obtain $\hat{\theta} = (\hat{\alpha}, \hat{\beta}', \hat{\gamma}')'$ from the CH estimator,

$$\hat{\alpha} = \arg\min_{\alpha} \|\hat{\gamma}(\alpha)\|_A, \quad \text{where} \quad (\hat{\beta}(\alpha), \hat{\gamma}(\alpha)) = \arg\min_{\beta, \gamma} \frac{1}{n} \sum_{i=1}^{n} \left[\rho_\tau(y_i - d_i\alpha - x_i\beta - z_i\gamma)\right].$$

Provided that the $\tau$th conditional quantile function of $y \mid s$ is linear, as in (1), then for $h_n \to 0$ we can consistently estimate the parameters of the $\tau \pm h_n$ conditional quantile functions by $\hat{\theta}(\tau \pm h_n)$. The density $f_i := f_{u_\tau}(0 \mid d = d_i, x = x_i, z = z_i)$ can thus be estimated by the difference quotient

$$\hat{f}_i = \frac{2h_n}{s_i\left(\hat{\theta}(\tau + h_n) - \hat{\theta}(\tau - h_n)\right)}. \tag{22}$$

The estimation in (22) is a natural extension of sparsity estimation methods, suggested by Hendricks and Koenker (1992). The estimator is discussed in further detail in Zhou and Portnoy (1996) and Koenker (2005). We introduce the simplifying notation $\hat{f}_i := \hat{f}_{u_\tau}(0 \mid s = s_i)$. The bandwidth for the density estimation can be chosen heuristically as a scaled version of Hall and Sheather (1988):

$$h_n = 2 n^{-1/3} \, \Phi^{-1}(0.975)^{2/3} \left[\frac{1.5 \cdot \phi\{\Phi^{-1}(\tau)\}^2}{2\,\Phi^{-1}(\tau)^2 + 1}\right]^{1/3}.$$

In the second step the parameters of interest $\delta$ can be obtained from a feasible WLS as

$$\hat{\mu} := (\hat{\psi}, \hat{\delta}) = \arg\min_{\psi, \delta} \frac{1}{n} \sum_{i=1}^{n} \left[\hat{f}_i \cdot (d_i - x_i\psi - z_i\delta)^2\right]. \tag{23}$$

Equation (23) produces $\hat{\delta}$, which is the main object of interest.

Define $Y$, $X$, $D$ and $Z$ as the matrices formed from a random sample of $\{y_i, d_i, x_i, z_i\}_{i=1}^{n}$. Similarly define $W = [X, Z]$. Define the weighting diagonal matrix

$$\hat{V} = \operatorname{diag}(\hat{f}_1, \ldots, \hat{f}_n).$$

Then, the estimator in (23) above can be written in simple matrix notation as

$$\hat{\mu} = (W'\hat{V}W)^{-1} W'\hat{V}D. \tag{24}$$

Notice that if $f_i$ is a constant for all $i$, then the proposed FS-IVQR method should deliver the same estimates as FS-2SLS for the mean. This would happen, for example, in the case of i.i.d. innovations in the second-stage structural model. Thus, there will be differences between the two estimators only when $f_i$ varies across $i$, that is, when the weighting factor is not a constant. Example 1 (location model) in Appendix B shows a case where the density function is a constant. A typical example where the weights are not constant across individuals is the location-scale model; see Examples 2 and 3 in Appendix B.

In this subsection, we derive the asymptotic distribution of the proposed estimator. The asymptotic properties of the IVQR estimator can be found in Chernozhukov and Hansen (2006) and the assumptions therein are those required for inference. We consider Assumption 2 in Chernozhukov and Hansen (2006, pp. 501–502), which we reproduce here for convenience. It imposes conditions for $\theta$ to be identified and estimated.

We are assuming that there is only one endogenous variable, $r = 1$. Otherwise the analysis below should be repeated separately for each endogenous variable as there will be a different first-stage for each one.

Assumption 1.

R1. Sampling. $\{y_i, x_i, d_i, z_i\}$ are iid defined on a probability space and take values in a compact set.

R2. Compactness and convexity. For all $\tau \in (0, 1)$, $(\alpha, \beta, \gamma) \in \operatorname{int}(\mathcal{A} \times \mathcal{B} \times \mathcal{G})$, where $\mathcal{A} \times \mathcal{B} \times \mathcal{G}$ is compact and convex.

R3. Full rank and continuity. $y$ has bounded conditional density (conditional on $w$), and for $\theta = (\alpha, \beta, \gamma)$, $\pi = (\alpha, \beta)$ and $\Pi(\theta, \tau) := \mathrm{E}\left[(\tau - \mathbf{1}(y < d\alpha + x\beta + z\gamma)) \cdot [x, z]\right]$, the Jacobian matrices $\frac{\partial}{\partial(\alpha', \beta')} \Pi(\theta, \tau)$ and $\frac{\partial}{\partial(\beta', \gamma')} \Pi(\theta, \tau)$ are continuous and have full rank, uniformly over $\mathcal{A} \times \mathcal{B} \times \mathcal{G}$, and the image of $\mathcal{A} \times \mathcal{B} \times \mathcal{G}$ under the mapping $(\alpha, \beta) \mapsto \Pi(\theta, \tau)$ is simply connected.

Assume that $\theta = (\alpha, \beta', \gamma')'$ is the unique solution to the CH problem. We impose additional conditions for deriving the limiting properties of the feasible first-stage estimator in (23) using the sparsity estimation in (22).
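The two-step procedure behind (22)–(24) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names are hypothetical, the IVQR estimates $\hat{\theta}(\tau \pm h_n)$ are taken as given inputs computed elsewhere, the constants `level=0.975` and `scale=2.0` in the bandwidth follow the usual Hall-Sheather convention, and setting the weight to zero when the estimated quantile lines cross is a common safeguard rather than something stated in the text.

```python
import numpy as np
from statistics import NormalDist


def hall_sheather_bandwidth(n, tau, level=0.975, scale=2.0):
    """Scaled Hall-Sheather bandwidth; `level` and `scale` are assumptions."""
    nd = NormalDist()
    q = nd.inv_cdf(tau)
    core = 1.5 * nd.pdf(q) ** 2 / (2.0 * q ** 2 + 1.0)
    return scale * n ** (-1 / 3) * nd.inv_cdf(level) ** (2 / 3) * core ** (1 / 3)


def sparsity_weights(s, theta_hi, theta_lo, h):
    """Difference-quotient density estimate in the spirit of (22).

    s        : (n, 1 + k + p) array with rows s_i = (d_i, x_i, z_i)
    theta_hi : IVQR estimate at tau + h (assumed to be computed elsewhere)
    theta_lo : IVQR estimate at tau - h (assumed to be computed elsewhere)
    Rows where the two fitted quantiles cross get weight zero.
    """
    gap = s @ (theta_hi - theta_lo)
    f_hat = np.zeros_like(gap, dtype=float)
    np.divide(2.0 * h, gap, out=f_hat, where=gap > 0)
    return f_hat


def first_stage_wls(d, x, z, f_hat):
    """Feasible WLS in (23)-(24): mu_hat = (W'VW)^{-1} W'VD."""
    w = np.column_stack([x, z])
    wf = w * f_hat[:, None]
    # The first columns of w belong to x (psi), the remaining p to z (delta).
    return np.linalg.solve(wf.T @ w, wf.T @ d)
```

With constant weights the last function reduces to ordinary least squares, which mirrors the remark that FS-IVQR and FS-2SLS coincide when $f_i$ does not vary across $i$.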

Assumption 2.

Let $\varepsilon_i := d_i - x_i\psi - z_i\delta$, with $\mathrm{E}[\varepsilon_i \mid w_i] = 0$, and $\mathrm{E}[\varepsilon_i^2 \mid w_i] = \sigma_i^2$. Also, let $f_i := f_\theta(y - s\theta \mid s = s_i)$ and assume that $\mathrm{E}[|f_i^{-1} w_i \varepsilon_i|] < \infty$. Let $\Omega_{f\sigma} := \mathrm{E}[f_i^2 \sigma_i^2 w_i w_i']$ and $\Omega_f := \mathrm{E}[f_i w_i w_i']$. The limits $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f_i^2 \sigma_i^2 w_i w_i' = \Omega_{f\sigma}$ and $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f_i w_i w_i' = \Omega_f$ exist and are nonsingular (and hence finite).

Assumption 2 contains conditions for establishing consistency and asymptotic normality of the proposed estimator. The next lemma is an intermediate result.

Lemma 1.

Under Assumptions 1–2, as $n \to \infty$, $h_n \to 0$ and $n h_n \to \infty$,

$$\sqrt{n}(\hat{\mu} - \mu) \overset{d}{\to} N(\mathbf{0}_{k+p}, V(\mu)), \tag{25}$$

where $\mu := (\psi, \delta) = \arg\min_{\psi, \delta} \mathrm{E}\left[f \cdot (d - x\psi - z\delta)^2\right]$ and $V(\mu) = \Omega_f^{-1} \Omega_{f\sigma} \Omega_f^{-1}$ is the asymptotic covariance matrix.

Proof. In Appendix A.
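A plug-in version of the sandwich form $V(\mu) = \Omega_f^{-1} \Omega_{f\sigma} \Omega_f^{-1}$ might look like the sketch below. It is only an illustration: the function name is hypothetical, and replacing $\sigma_i^2$ by the squared WLS residual (a heteroskedasticity-robust plug-in) is an assumption of this sketch, since the text states only that a consistent estimator can be obtained from the WLS procedure.

```python
import numpy as np


def fs_ivqr_covariance(d, x, z, f_hat, mu_hat):
    """Plug-in estimate of V(mu) = Omega_f^{-1} Omega_fsigma Omega_f^{-1}.

    Assumption of this sketch: sigma_i^2 is replaced by the squared
    residual from the weighted first-stage regression.
    """
    w = np.column_stack([x, z])
    n = w.shape[0]
    resid = d - w @ mu_hat

    # Omega_f: sample analogue of E[f_i w_i w_i'].
    omega_f = (w * f_hat[:, None]).T @ w / n

    # Omega_fsigma: sample analogue of E[f_i^2 sigma_i^2 w_i w_i'].
    scale = (f_hat * resid) ** 2
    omega_fs = (w * scale[:, None]).T @ w / n

    omega_f_inv = np.linalg.inv(omega_f)
    return omega_f_inv @ omega_fs @ omega_f_inv
```

Dividing the returned matrix by $n$ gives an approximate covariance for $\hat{\mu}$ itself, and the block corresponding to $z$ is what a test on $\delta$ would use.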

In this section we derive tests for the validity of the IV using the first-stage representation.

The restriction in equation (19) provides a natural framework to evaluate the relevance of the instruments in QR models.

First, notice that the parameter $\delta$ captures the strength of the instrument in the sense that it measures the correlation between the instrument $z$ and the endogenous variable $d$ weighted by the density function $f$. This is the QR counterpart of the first-stage effect of $z$ on the endogenous variables $d$ for the 2SLS. When the instrument is valid, $\delta \neq \mathbf{0}_{p \times r}$.

Second, note that the instrument $z$ does not belong in the structural quantile model (1), hence when $z$ is valid, $\gamma = \mathbf{0}_{p \times r}$ can be used for identification, a key feature of the CH IVQR estimator. Another way to see this is the following. Equation (19) also shows that when $\delta = \mathbf{0}_{p \times r}$, the value of $\gamma$ is irrelevant, and therefore it cannot be used in the IVQR procedure to solve endogeneity. As such, $\delta \neq \mathbf{0}_{p \times r}$ is a necessary condition for the IV to have a purpose in the CH set-up. Therefore, a test for the validity of the instruments can be based on inference on $\delta$.

Another way of gaining intuition on the test is the following. Assume that $r = 1$ (i.e. only one endogenous variable), then (19) is in fact equal to 0, a scalar. If we further assume that $A = I_p$, then

$$\sum_{q=1}^{p} \delta_q \gamma_q = 0, \tag{26}$$

where $\delta = [\delta_1, \ldots, \delta_p]'$ is the column vector that has the first-stage effect of all IV on $d$. Note again that if $\delta = \mathbf{0}_{p \times 1}$, then the vector $\gamma$ could have any value and its implied restrictions would be irrelevant.

The formulation of the test proposed in this paper is based on the condition given in equation (16) together with the first-stage IVQR representation in equation (20). A test for validity of the instruments for $p$ instruments can be based on the null hypothesis

$$H_0: \delta = \mathbf{0}_{p \times r}, \tag{27}$$

against the alternative

$$H_A: \delta \neq \mathbf{0}_{p \times r}. \tag{28}$$

We highlight that, differently from the 2SLS, the first-stage IVQR in (21) is for a given quantile $\tau$.
Thus, for the same variables $d$ and instruments $z$, the strength of the instruments may vary across different quantiles. This variation is captured by the weights $f$.

Note that the procedure works for $r \geq 1$, that is, for one or more than one endogenous variable. In this case, separate tests could be applied as in 2SLS analysis where there may be a different first-stage for each endogenous variable. To simplify the procedures below we assume that $r = 1$, that is, there is only one endogenous variable.

The expressions of the null and the alternative hypotheses in (27) and (28), respectively, lead to the following testing procedure.

When $H_0$ is true, under suitable regularity conditions, $\hat{\delta}$ converges in probability to $\mathbf{0}_{p \times r}$ for a given $\tau$. On the other hand, when $H_A$ is true, $\hat{\delta}$ converges in probability to $\delta \neq \mathbf{0}_{p \times r}$. Therefore, it is reasonable to reject $H_0$ if the magnitude of $\hat{\delta}$ is suitably large.

As noted by Galvao and Montes-Rojas (2015) the CH set-up is equivalent to the 2SLS in least-squares models. In fact the CH estimator is the QR counterpart of a 2SLS estimator. The expression above also shows that there is an implicit first-stage, similar to that in 2SLS problems. As such, this provides an analytical expression to evaluate the relevance of the IV.

A natural choice to test $H_0$ against $H_A$ for the case of $r = 1$ is the Wald statistic

$$T_n = n \hat{\delta}' \{V_\delta\}^{-1} \hat{\delta}, \tag{29}$$

where $V_\delta$ is the asymptotic covariance matrix of $\sqrt{n}\hat{\delta}$ under $H_0$. In practice, $V_\delta$ is replaced by a suitable consistent estimate.

Consider a subset of the instruments, $p_1 < p$, and consider a partition $\delta = [\delta_1', \delta_2']'$ of the corresponding first-stage parameters of interest, with dimensions $p_1$ and $p_2$ (with $p = p_1 + p_2$), respectively. Consider a $p_1 \times (k + p)$ matrix $R = [\mathbf{0}_{p_1 \times k}, I_{p_1}, \mathbf{0}_{p_1 \times p_2}]$ where $I_{p_1}$ is an identity matrix of dimension $p_1 \times p_1$. Thus, $R\mu = \delta_1$ is the subvector of interest. Let $\hat{V}(\hat{\mu})$ be a consistent estimator of $V(\mu)$, which can be obtained from the WLS procedure. The next result derives the limiting distribution of the test statistic in eq. (29).

Proposition 1.

Consider Assumptions 1–2, $n \to \infty$, $h_n \to 0$ and $n h_n \to \infty$. Furthermore, assume that $\dim(z) = p > p_1 \geq 1$. Then, under $H_0: \delta_1 = \mathbf{0}_{p_1}$ and local alternatives $H_A: \delta_1 = a_{p_1} / \sqrt{n}$,

$$T_n = n (R\hat{\mu})' \{R \hat{V}(\hat{\mu}) R'\}^{-1} (R\hat{\mu}) \overset{d}{\to} \chi^2_{p_1}(a_{p_1}). \tag{30}$$

Proof.

In Appendix A.

Computation of the test statistic (29) requires a non-parametric estimator of $f$, the conditional density of $u_\tau \mid d, x, z$ evaluated at the specific quantile of interest $\tau$. Given that the weights need to be estimated, the proposed FS-IVQR has specific properties when testing under the null hypothesis of an invalid instrument. The condition that the number of IV be larger than the number of parameters tested in the null hypothesis is required for consistent estimation of $\theta$ under the null, which in turn is used for the consistent estimation of $f$.

We analyze in this section the finite-sample performance of the proposed test through a series of Monte Carlo simulation exercises. The data generating process (DGP) is the following model:

$$y_i = d_i + x_i + (1 + c d_i) u_i, \tag{31}$$

$$d_i = a z_{1i} + \phi z_{2i} + (1 + b z_{1i}) v_i, \tag{32}$$

where $x_i$, $z_{1i}$ and $z_{2i}$ are three independent variables with distribution $U(0, 1)$, and $u_i$ and $v_i$ have a standard bivariate normal distribution with correlation 0.50. Equations (31)–(32) specify a model where there could be pure location or location-scale specifications in either the first- and/or the second-stages. Note that the parameters $a$ and $b$ determine the type of effect that the instrument $z_1$ has on the endogenous covariate $d$. For example, if $a \neq 0$ and $b = 0$ the instrument $z_1$ has a pure location effect on $d$ (pure location shift model), while if $a = 0$ and $b \neq 0$ the effect is only on the variance of the endogenous covariate (pure scale shift model). Next, the parameter $c$ determines whether the structural second-stage model is a location or location-scale model.

In all cases we consider tests for $H_0: \delta_1 = 0$, where this is the first-stage parameter associated with the $z_1$ instrument defined in the previous sections. We consider two different cases to investigate the numerical properties of the tests. In the first case, $\phi = 1$, there is a second instrument, $z_2$, such that the model correctly identifies the parameters in the structural equation (31) for all possible values of $a$ and $b$, even when $a = b = 0$. In the second case, we set $\phi = 0$, and therefore, under the null hypothesis the consistent estimation of the weights $f$ is problematic. Also, in this case, when $a = b = 0$, there is no valid available instrument.

We will consider three different test statistics from different estimators. First, for comparison purposes, we present a Wald test for the coefficient on $z_1$ using a simple regression model of $d$ on $(x, z_1, z_2)$ in a standard 2SLS framework, denoted FS-2SLS. Second, we test for $H_0: \delta_1 = 0$ using the true density function, $f$, as weights, that is, using the true $\theta$, denoted FS-IVQR (true density). We note that this is not observed in practice, and we include these results for comparison purposes. Our proposed test studied in the previous section is the third one, denoted FS-IVQR (sparsity), where we use the sparsity function estimation described above.
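To make the design concrete, the DGP in (31)–(32) and the FS-2SLS benchmark test can be simulated along the following lines. This is a rough sketch rather than the authors' simulation code: the function names are hypothetical, the $U(0, 1)$ support of the covariates, the inclusion of an intercept, and the heteroskedasticity-robust (HC0) variance are assumptions of the sketch, and 3.841 is the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
import numpy as np


def simulate(n, a, b, c, phi, rng, rho=0.5):
    """Draw one sample from (31)-(32); U(0, 1) covariates are assumed."""
    x, z1, z2 = rng.uniform(size=(3, n))
    u = rng.standard_normal(n)
    # v is standard normal with corr(u, v) = rho.
    v = rho * u + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    d = a * z1 + phi * z2 + (1.0 + b * z1) * v
    y = d + x + (1.0 + c * d) * u
    return y, d, x, z1, z2


def wald_z1(d, x, z1, z2, f=None):
    """Wald statistic for H0: delta_1 = 0 in a (weighted) first stage.

    f=None gives unit weights, i.e. the FS-2SLS benchmark; passing density
    weights would give the FS-IVQR analogue. The intercept is an assumption.
    """
    n = d.size
    f = np.ones(n) if f is None else f
    w = np.column_stack([np.ones(n), x, z1, z2])
    wf = w * f[:, None]
    mu = np.linalg.solve(wf.T @ w, wf.T @ d)
    score = wf * (d - w @ mu)[:, None]
    bread = np.linalg.inv(wf.T @ w / n)
    cov = bread @ (score.T @ score / n) @ bread  # sandwich covariance
    return n * mu[2] ** 2 / cov[2, 2]  # index 2 is the z1 coefficient


def rejection_rate(reps, n, crit=3.841, seed=0, **dgp):
    """Share of replications in which the Wald test rejects at the 5% level."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        _, d, x, z1, z2 = simulate(n, rng=rng, **dgp)
        hits += int(wald_z1(d, x, z1, z2) > crit)
    return hits / reps


size = rejection_rate(400, 500, a=0.0, b=0.0, c=0.0, phi=1.0)
power = rejection_rate(100, 500, a=1.0, b=0.0, c=0.0, phi=1.0)
print(size, power)
```

Under $a = b = 0$ the rejection rate should be close to the nominal level, while a pure location shift such as $a = 1$ should be detected with high probability. The FS-IVQR variants would differ only in the weights passed as `f`.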
Note that the three tests differ only in the weighting procedure used in the regression of $d$ on $(x, z_1, z_2)$.

Tables 1–4 show the empirical size (i.e. $a = b = 0$) of the computed test with 2000 simulations for $n = \{500, 1000\}$ and for three values of the quantile $\tau$. The simulations show correct empirical size performance in most but not all cases.

Consider first the case where there is a second instrument, $\phi = 1$, in Tables 1 and 2. The tests have approximately correct empirical size. As such they clearly evaluate whether the instrument $z_1$ exerts an effect on the endogenous variable $d$. Note that the empirical size is improved when we consider a location-scale model, $c = 1$.

Now consider the case where there is no available second instrument, $\phi = 0$, in Tables 3 and 4. In this case, the weights in the structural model cannot be estimated consistently under the null. Since the proposed test evaluates the relationship between $z_1$ and $d$, the main issue is whether this relationship can be evaluated in other than the OLS model. Note that for the location-only model, Table 3, the test based on the sparsity estimator is oversized. This is mostly due to the implicitly estimated sparsity function, as the test with the true density function has correct size. However, when we use a location-scale model, Table 4, the size is correct for the sparsity estimator.
This result suggests that the test can be used if there is a location-scale structure in the second-stage, even when the structural parameters cannot be estimated under the null (because $z_1$ does not solve the endogeneity problem).

Table 1: Rejection rate of the null hypothesis using $a = b = 0$, model with $c = 0$ and $\phi = 1$. Columns: $\tau$; size for $n = 500$ and $n = 1000$ under FS-2SLS, True $f$ and Sparsity $f$.

Table 2: Rejection rate of the null hypothesis using $a = b = 0$, model with $c = 1$ and $\phi = 1$. Columns: $\tau$; size for $n = 500$ and $n = 1000$ under FS-2SLS, True $f$ and Sparsity $f$.

Table 3: Rejection rate of the null hypothesis using $a = b = 0$, model with $c = 0$ and $\phi = 0$. Columns: $\tau$; size for $n = 500$ and $n = 1000$ under FS-2SLS, True $f$ and Sparsity $f$.

Table 4: Rejection rate of the null hypothesis using $a = b = 0$, model with $c = 1$ and $\phi = 0$. Columns: $\tau$; size for $n = 500$ and $n = 1000$ under FS-2SLS, True $f$ and Sparsity $f$.

When a second valid instrument $z_2$ is available we should be estimating the correct structural parameters and $f_{u_\tau}(0 \mid d, x, z)$, where $u_\tau = y - Q_\tau(y \mid d, x, z)$. However, the case where, under the null, $z_1$ is invalid would be equivalent to the case where there is no instrument, with $u^*_\tau = y - Q_\tau(y \mid d, x)$. The examples in Appendix B compare $f_{u^*_\tau}(0 \mid d, x)$ with $f_{u_\tau}(0 \mid d, x, z)$. The partial results suggest that if $f_{u_\tau}(0 \mid d, x, z)$ and $f_{u^*_\tau}(0 \mid d, x)$ are proportional to each other when they vary with $d$, we could implement the first-stage test under the null of all IV being invalid.

To analyze the empirical power of the test, we performed 2000 simulations only for the case with $n = 1000$ and we calculated the rejection rates of the proposed procedure for three quantiles $\tau$. As a benchmark we also use the test rejection rates obtained with the FS-2SLS method, i.e., the Wald test of an OLS regression of $d$ on $z$. The results appear in Figures 1–4. For each figure we have two blocks, (i) and (ii), where in (i) we evaluate a pure location first-stage model of $z_1$ on $d$, varying $a$ over a grid with $b = 0$, and in (ii) we set $a = 0$ and vary $b$ over a grid such that $z_1$ has only a scale effect on $d$.

We first consider the case where the relation $y \mid (d, x)$ is a pure location model, that is, $c = 0$, and there is a second valid instrument, $\phi = 1$. Figure 1, block (i), pure location first-stage, shows that the FS-IVQR power computed with true and estimated densities behaves similarly to FS-2SLS. That is, they correctly reject as $a$ increases. The estimated density model has slightly less power than the one with the true density. For block (ii), scale-only first-stage, however, FS-2SLS and FS-IVQR (true density) have no power in detecting the effect of $z_1$ on $d$. The test with the estimated sparsity function rejects the null as $b$ increases.

We now analyze the case of location-scale in $y \mid (d, x)$, i.e. $c = 1$, also with $\phi = 1$. The results of Figure 2 show that the FS-2SLS test is similar to the IVQR-based tests under the pure location model for $d \mid z$ (block (i) of Figure 2). In particular, the tests using both the true and the estimated sparsity function correctly reject when $a$ increases. However, the results of the FS-IVQR differ when we are in the presence of a pure-scale model for $d \mid z$ (block (ii) of Figure 2). In that case $z_1$ does not affect $d$ at the mean (FS-2SLS), but it does affect the other points of the conditional distribution. Therefore, the first-stage of 2SLS does not find any relationship between the endogenous variable and the instrument, while the FS-IVQR estimators (both theoretical and estimated) are able to correctly detect it.

Consider now the case where we have $c = 0$ and $\phi = 0$. The problems noted in the size tables are exacerbated here since it is not possible to identify the parameters of $Q_\tau(y \mid (d, x))$ via the IVQR, as shown in Figure 3. Interestingly, the density estimation using the sparsity function introduces some misspecification that allows us to evaluate the effect of $z_1$. Note however that, as noted in the size evaluation, this test is oversized, and therefore it cannot be used for valid inference.

Finally, consider the last case when $c = 1$ and $\phi = 0$ (Figure 4).
The FS-IVQR tests work in this case. In both (i) and (ii) the tests detect an association between the instrument and the endogenous variable. In case (ii) the FS-IVQR rejects as b increases while FS-2SLS does not. As noted in Table 4, the test works even for the case where a = b = 0 and the endogeneity problem in the structural estimators cannot be solved.

Footnote: Let $\tilde\alpha$ and $\tilde\beta$ be the parameters that result from the estimation of the biased structural model without instruments, $Q_\tau(y \mid d, x) = d\tilde\alpha + x\tilde\beta$. Note that $u_\tau = y - Q_\tau(y \mid d, x, z) = y - d\alpha - x\beta$ can be written as $y - d\tilde\alpha - x\tilde\beta - bias(d, x)$, where $bias(d, x) = d(\alpha - \tilde\alpha) + x(\beta - \tilde\beta)$, such that $u_\tau = u^*_\tau - bias(d, x)$.

Figure 1: Power for $H_0: \delta = 0$ (model with c = 0 and φ = 1)

Figure 2: Power for $H_0: \delta = 0$ (model with c = 1 and φ = 1)

Figure 3: Power for $H_0: \delta = 0$ (model with c = 0 and φ = 0)

Figure 4: Power for $H_0: \delta = 0$ (model with c = 1 and φ = 0)

In this section we show an application of the proposed test in the estimation of a Mincer equation to estimate returns to schooling. The data used are from Card (1995) and correspond to 3010 individuals of the US National Longitudinal Survey of Young Men. Following the same specification as that paper, the model describes wages as a function of years of education and other exogenous controls such as work experience, race and a set of geographic and regional variables. A classic problem with this model is that ability is unobservable and therefore its omission induces a potential bias due to endogeneity of the OLS estimator. Specification errors have analogous consequences on QR estimators, as analyzed by Angrist et al. (2006). Card (1995) proposes to implement an IV strategy using two measures of proximity to the university as external variables to the wage equation.

Table 5 shows the results of the first-stage to check if the IV are valid, together with the estimated second-stage results.
The first column corresponds to the conditional mean model and the next ones are the regressions proposed for IVQR for τ ∈ { . , . , . }. The results show that the first instrument (lived near 2-year college in 1966) is not relevant for the low quantiles and the mean, but it is significant for middle and high quantiles. Also, note that although the second instrument (lived near 4-year college in 1966) rejects the null hypothesis for the conditional mean, this variable has different degrees of significance across quantiles. In particular, at τ = 0.75 the instrument is relevant only at 10% significance. These results are very important since, although proximity to the university seems to be a strong instrument to identify the causal effect of education on the conditional mean, our test also indicates a certain limitation when the object of study is to evaluate the impact on the lower part of the conditional distribution of wages. Therefore, this raises a warning about the quality of the asymptotic properties of the IVQR estimates in the presence of invalid instruments.

Table 5: First-stage and second-stage estimates

| | Mean | τ = 0.… | τ = 0.… | τ = 0.75 |
|---|---|---|---|---|
| First-stage estimates | | | | |
| Lived Near 2-year College in 1966 | 0.123 (0.0774) | 0.0644 (0.129) | 0.471*** (0.0704) | 0.154** (0.0709) |
| Lived Near 4-year College in 1966 | 0.321*** (0.0878) | 0.380*** (0.146) | 0.298*** (0.101) | 0.140* (0.0737) |
| Experience | -0.412*** (0.0337) | -0.450*** (0.0871) | -0.489*** (0.0247) | -0.494*** (0.0344) |
| Experience-Squared | 0.000848 (0.00165) | -0.000681 (0.00496) | 0.00457*** (0.00122) | 0.00449** (0.00192) |
| Black indicator | -0.945*** (0.0939) | -0.926*** (0.162) | -0.886*** (0.113) | -0.753*** (0.0701) |
| Constant | 16.60*** (0.242) | 16.42*** (0.393) | 17.00*** (0.173) | 16.68*** (0.211) |
| Second-stage estimates | | | | |
| Education | 0.157*** (0.0524) | 0.176*** (0.0521) | 0.268*** (0.0271) | 0.104 (0.0662) |
| Experience | 0.119*** (0.0227) | 0.120*** (0.0248) | 0.180*** (0.0140) | 0.0932*** (0.0341) |
| Experience-Squared | -0.00236*** (0.000347) | -0.00201*** (0.000347) | -0.00337*** (0.000352) | -0.00221*** (0.000438) |
| Black indicator | -0.123** (0.0520) | -0.110** (0.0519) | -0.00925 (0.0342) | -0.148*** (0.0469) |
| Constant | 3.237*** (0.883) | 2.698*** (0.870) | 1.400*** (0.466) | 4.360*** (1.119) |
| Observations | 3,010 | 3,010 | 3,010 | 3,010 |

Source: Card (1995). Data downloaded from http://davidcard.berkeley.edu/data_sets/proximity.zip. Notes: Standard errors in parentheses. SE robust for OLS estimates. *** p < .01, ** p < .05, * p < .1. Regional and geographic dummies are used but omitted.
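As a rough cross-check of the significance pattern discussed above, the following sketch recomputes t-ratios and two-sided p-values from the first-stage coefficients and standard errors reported in Table 5. It assumes a standard normal reference distribution, which is an assumption of this sketch and not necessarily the inference procedure used for the table. The column labels and function names are illustrative only.

```python
from statistics import NormalDist

# First-stage coefficients and standard errors for the two instruments,
# copied from Table 5. Column order: conditional mean, then the three
# quantile columns (labels are shorthand chosen for this sketch).
COLUMNS = ["mean", "low tau", "middle tau", "tau = 0.75"]
FIRST_STAGE = {
    "near 2-year college": ([0.123, 0.0644, 0.471, 0.154],
                            [0.0774, 0.129, 0.0704, 0.0709]),
    "near 4-year college": ([0.321, 0.380, 0.298, 0.140],
                            [0.0878, 0.146, 0.101, 0.0737]),
}


def two_sided_p(coef, se):
    """Two-sided p-value of coef/se against a standard normal (assumed)."""
    z = abs(coef / se)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def stars(p):
    """Map a p-value to the star convention stated in the table notes."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


def significance_table(panel):
    """Return {instrument: [(t-ratio, stars), ...]} for every column."""
    return {
        name: [(round(c / s, 2), stars(two_sided_p(c, s)))
               for c, s in zip(coefs, ses)]
        for name, (coefs, ses) in panel.items()
    }


if __name__ == "__main__":
    for name, row in significance_table(FIRST_STAGE).items():
        for label, (t_ratio, mark) in zip(COLUMNS, row):
            print(f"{name:22s} {label:12s} t = {t_ratio:5.2f} {mark}")
```

Under this normal approximation the recomputed stars agree with those printed in the first-stage panel: the 2-year instrument is insignificant in the first two columns, and the 4-year instrument reaches only the 10% level at τ = 0.75.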

This paper proposes a first-stage model and inference procedures to evaluate the degree of association between the IV and the endogenous regressor(s) in the IVQR estimator. The procedure developed here allows one to evaluate instruments in a similar vein to that in 2SLS models for the conditional average, that is, by looking at the statistical significance of the instruments in the first-stage regression. In turn, this allows one to investigate IV validity for specific quantiles. Monte Carlo experiments clearly illustrate that one may encounter cases where the IV are not valid for the mean, but are still valid for some quantiles. The same issue appears in the empirical application.

The analysis may be extended in the following two directions. First, this approach can be used to identify local treatment effects, where an IV estimate being significant at some quantiles corresponds to a particular effect of a treatment. Second, the procedure outlined here could be combined with the second-stage inference to produce statistics similar to the Staiger and Stock (1997) F-statistic rule of thumb, in particular to study weak-instrument issues in QR models.

Appendix A: Proofs

Proof of Lemma 1.

First, consider an estimator of the parameter $\mu$ using the true weighting matrix $V$,

$$V = \begin{pmatrix} f_1 & & \\ & \ddots & \\ & & f_n \end{pmatrix}, \tag{33}$$

that is given by

$$\tilde\mu = (W' V W)^{-1} W' V D,$$

where $W = [X, Z]$. Replacing $D$ by $(W\mu + \varepsilon)$ in the definition of $\tilde\mu$ we have that

$$\sqrt{n}(\tilde\mu - \mu) = \left(\frac{W' V W}{n}\right)^{-1} \frac{W' V \varepsilon}{\sqrt{n}}.$$

By Slutsky's theorem, the proof of the lemma requires showing that

$$\frac{W' V W}{n} \xrightarrow{p} \Omega_f, \tag{34}$$

and

$$\frac{W' V \varepsilon}{\sqrt{n}} \xrightarrow{d} N(0, \Omega_{f\sigma}). \tag{35}$$

To show (34), note that its left side has the $(j, k)$ element given by

$$\frac{1}{n}\sum_{i=1}^{n} f_i w_{ij} w_{ik} \xrightarrow{p} E[f_i w_{ij} w_{ik}],$$

by the Law of Large Numbers and Assumption 2. To show (35), first note that

$$E[W' V \varepsilon] = E\big[W' V E[\varepsilon \mid W]\big] = 0,$$

by Assumption 2. Furthermore, $W' V \varepsilon$ is a sum of i.i.d. random vectors $f_\theta(s_i) \cdot w_i \cdot \varepsilon_i$ with common covariance matrix having the $(j, k)$ element

$$Cov(f_i w_{ij} \varepsilon_i, f_i w_{ik} \varepsilon_i) = E\big[f_i^2 w_{ij} w_{ik} \varepsilon_i^2\big] = E\big[f_i^2 w_{ij} w_{ik} E[\varepsilon_i^2 \mid w_i]\big] = E\big[f_i^2 w_{ij} w_{ik} \sigma_i^2\big].$$

Thus, each vector $f_i \cdot w_i \cdot \varepsilon_i$ has covariance matrix $\Omega_{f\sigma}$. Therefore, by the Multivariate Central Limit Theorem, (35) holds.

Finally, we have to show that using estimated weights does not affect the limiting distribution. To establish that, consider the estimator with the estimated weights,

$$\hat\mu = (W' \hat V W)^{-1} W' \hat V D,$$

such that

$$\sqrt{n}(\hat\mu - \mu) = \left(\frac{W' \hat V W}{n}\right)^{-1} \frac{W' \hat V \varepsilon}{\sqrt{n}}. \tag{36}$$

First, we show that

$$\frac{W' \hat V \varepsilon}{\sqrt{n}} - \frac{W' V \varepsilon}{\sqrt{n}} \xrightarrow{p} 0. \tag{37}$$

Note that

$$\frac{W' (\hat V - V) \varepsilon}{\sqrt{n}} = n^{-1/2} \sum_{i=1}^{n} w_i \varepsilon_i \big(\hat f_i - f_i\big). \tag{38}$$

We want to show that the right-hand side of (38) is $o_p(1)$. Using the sparsity function estimator in (22) along with some calculations, we have that

$$\hat f_i = f_i + 2 h_n f_i s_i (\hat\theta - \theta_0) + o_p\big((nh)^{-1/2}\big).$$

We refer the reader to Ota, Kato, and Hara (2019) for details on the remainder term. Hence, using the previous equation, the $j$th component of the right-hand side of equation (38) can be written as

$$\sqrt{n}(\hat\theta_j - \theta_{0,j})\, 2 h_n\, \frac{1}{n} \sum_{i=1}^{n} f_i w_{ij} \varepsilon_i.$$

The first factor $\sqrt{n}(\hat\theta_j - \theta_{0,j}) = o_p(1)$ by Assumption 1 and CH. Moreover, note that the average of the i.i.d. variables $f_i^{-1} w_i \varepsilon_i$ obeys the Law of Large Numbers by the moment restrictions in Assumption 2, and the result follows.

Next, we show that

$$\frac{W' \hat V W}{n} - \frac{W' V W}{n} \xrightarrow{p} 0, \tag{39}$$

which follows from the same argument as above.

The convergences (37) and (39) are enough to show that the right-hand side of (36) satisfies

$$\left(\frac{W' \hat V W}{n}\right)^{-1} \frac{W' \hat V \varepsilon}{\sqrt{n}} - \left(\frac{W' V W}{n}\right)^{-1} \frac{W' V \varepsilon}{\sqrt{n}} \xrightarrow{p} 0,$$

using the identity $\hat a \hat b - ab = \hat a(\hat b - b) + (\hat a - a) b$. Finally, Slutsky's theorem yields the result.

Proof of Proposition 1.

The proof of this result is simple. It follows from observing that by Lemma 1,

$$\sqrt{n}(\hat\mu - \mu) \xrightarrow{d} N(0, V(\mu)).$$

Notice that $R\mu = \delta$, hence under the null hypothesis,

$$\sqrt{n}(R\hat\mu - 0) \xrightarrow{d} N\big(0, R V(\mu) R'\big).$$

Let $\hat V(\hat\mu)$ be a consistent estimator of $V(\mu)$, and $V_\delta := R V(\mu) R'$, then by Slutsky's theorem,

$$T_n = n\, \hat\delta' \{V_\delta\}^{-1} \hat\delta \xrightarrow{d} \chi^2_p(a_p).$$

Appendix B: Examples of weighting factors

1. Location model

Consider a pure location model, using two equations

$$y = d + u,$$
$$d = az + v,$$

with $(u, v) \sim N(0, 0, 1, 1, \rho)$ a bivariate normal with zero mean, unit variance and correlation parameter $\rho$, and $z \sim N(0, 1)$. Then $d \sim N(0, 1 + a^2)$ and $y \sim N(0, 2 + a^2 + 2\rho)$.

Consider now the model where we condition on both $(d, z)$. For this case, $u \mid d, z \sim N(\rho v, (1 - \rho^2))$ by the conditional distribution of the bivariate normal density. Then,

$$Q_\tau(u \mid d, z) = \rho v + \sqrt{1 - \rho^2}\, \Phi^{-1}(\tau).$$

Then, $u_\tau = y - Q_\tau(y \mid d, z) = u - Q_\tau(u \mid d, z)$. Note that $E(u_\tau \mid d, z) = E(u \mid d, z) - Q_\tau(u \mid d, z) = -\sqrt{1 - \rho^2}\, \Phi^{-1}(\tau)$. Thus, the density is

$$f_{u_\tau}(U \mid d, z) = \frac{1}{\sqrt{1 - \rho^2}}\, \phi\!\left(\frac{U + \sqrt{1 - \rho^2}\, \Phi^{-1}(\tau)}{\sqrt{1 - \rho^2}}\right),$$

where $\phi()$ is the density function of a standard normal. If we evaluate it at 0,

$$f_{u_\tau}(0 \mid d, z) = \frac{1}{\sqrt{1 - \rho^2}}\, \phi\big(\Phi^{-1}(\tau)\big).$$

Now consider the distribution of $u \mid d$. The joint distribution is $(u, d) \sim N(0, 0, 1, 1 + a^2, \kappa)$, where $\kappa = \rho / \sqrt{1 + a^2}$. Then, it follows that $u \mid d \sim N(E(u \mid d), Var(u \mid d))$, where $E(u \mid d) = \kappa d$ and $Var(u \mid d) = (1 - \kappa^2)$. As such, we can obtain the quantiles of interest,

$$Q_\tau(y \mid d) = d + \kappa d + \Phi^{-1}(\tau)(1 - \kappa^2)^{1/2}.$$

Note that without endogeneity, i.e. $\rho = 0$, then $\kappa = 0$, and the correct $\tau$-quantile model should be

$$Q_\tau(y \mid d, \rho = 0) = d + \Phi^{-1}(\tau).$$

Now, $u^*_\tau = y - Q_\tau(y \mid d) = d + u - (d + \kappa d + \Phi^{-1}(\tau)(1 - \kappa^2)^{1/2}) = u - \kappa d - \Phi^{-1}(\tau)(1 - \kappa^2)^{1/2}$. Then, $E(u^*_\tau \mid d) = -\Phi^{-1}(\tau)(1 - \kappa^2)^{1/2}$, and $Var(u^*_\tau \mid d) = Var(u \mid d) = (1 - \kappa^2)$. Then,

$$f_{u^*_\tau}(U \mid d) = \frac{1}{\sqrt{1 - \kappa^2}}\, \phi\!\left(\frac{U - E(u^*_\tau \mid d)}{\sqrt{Var(u^*_\tau \mid d)}}\right),$$

such that

$$f_{u^*_\tau}(0 \mid d) = \frac{1}{\sqrt{1 - \kappa^2}}\, \phi\big(\Phi^{-1}(\tau)\big).$$

In all cases, $f_{u^*_\tau}(0 \mid d)$ and $f_{u_\tau}(0 \mid d, z)$ are constants that do not change with $d$ or $z$. It is interesting to evaluate the case $a = 0$, such that $(1 - \kappa^2) = (1 - \rho^2)$. Note that in this case, $f_{u^*_\tau}(0 \mid d) = f_{u_\tau}(0 \mid d, z)$.
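The two closed-form weights of the pure location example can be evaluated numerically. The sketch below is only an illustration under the assumptions of this example: it takes κ = ρ/√(1 + a²), the correlation between u and d, and evaluates φ(Φ⁻¹(τ)) divided by √(1 − ρ²) or √(1 − κ²). The function names are hypothetical and not part of the paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # pdf plays the role of phi, inv_cdf of Phi^{-1}


def weight_with_instrument(tau, rho):
    """f_{u_tau}(0 | d, z) = phi(Phi^{-1}(tau)) / sqrt(1 - rho^2)."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(tau)) / sqrt(1.0 - rho ** 2)


def weight_without_instrument(tau, rho, a):
    """f_{u*_tau}(0 | d) = phi(Phi^{-1}(tau)) / sqrt(1 - kappa^2)."""
    kappa = rho / sqrt(1.0 + a ** 2)  # assumed: correlation of u and d
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(tau)) / sqrt(1.0 - kappa ** 2)


if __name__ == "__main__":
    # Neither weight depends on d or z; they coincide only when a = 0.
    for a in (0.0, 0.5, 1.0):
        w_z = weight_with_instrument(0.25, rho=0.5)
        w_d = weight_without_instrument(0.25, rho=0.5, a=a)
        print(f"a = {a:.1f}: with z = {w_z:.4f}, without z = {w_d:.4f}")
```

Because both weights are constants, a weighted first-stage regression in this design is equivalent to an unweighted one up to scale.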

2. Location-scale model 1

Now consider a location-scale model of the form

$$y = d + (1 + cd)u,$$
$$d = az + v,$$

where $a$ and $c$ are parameters. As in the previous case, $(u, v) \sim N(0, 0, 1, 1, \rho)$. Then, $u \mid (d, z) \sim u \mid v \sim N(\rho v, 1 - \rho^2)$. Thus, $Q_\tau(u \mid d, z) = \rho v + \sqrt{1 - \rho^2}\, \Phi^{-1}(\tau)$. Note that it does not depend on $z$.

In this case, $Q_\tau(y \mid d, z) = d + (1 + cd) Q_\tau(u \mid d, z)$, and then, $u_\tau = y - Q_\tau(y \mid d, z) = (1 + cd)(u - Q_\tau(u \mid d, z))$. As such, we can obtain

$$f_{u_\tau}(U \mid d, z) = \frac{1}{|1 + cd| \sqrt{1 - \rho^2}}\, \phi\!\left(\frac{U + (1 + cd)\sqrt{1 - \rho^2}\, \Phi^{-1}(\tau)}{(1 + cd)\sqrt{1 - \rho^2}}\right).$$

If we evaluate it at 0,

$$f_{u_\tau}(0 \mid d, z) = \frac{1}{|1 + cd| \sqrt{1 - \rho^2}}\, \phi\big(\Phi^{-1}(\tau)\big).$$

Note that this depends on $d$, and then, the weights are not uniform.

Now, consider the distribution of $u \mid d$. Consider first the joint distribution of $(u, d) \sim N(0, 0, 1, 1 + a^2, \kappa)$, where $\kappa = \rho / \sqrt{1 + a^2}$. Now, $u \mid d \sim N(\kappa d, (1 - \kappa^2))$, then $E(u \mid d) = \kappa d$ and $Var(u \mid d) = (1 - \kappa^2)$.

For this case let $u^*_\tau = y - Q_\tau(y \mid d) = d + (1 + cd)u - d - (1 + cd) Q_\tau(u \mid d) = (1 + cd)(u - Q_\tau(u \mid d))$. Since $u \mid d$ is Gaussian, then $u^*_\tau = (1 + cd)(u - \kappa d - \Phi^{-1}(\tau)(1 - \kappa^2)^{1/2})$. Then, $E(u^*_\tau \mid d) = (1 + cd)(-\Phi^{-1}(\tau)(1 - \kappa^2)^{1/2})$ and $Var(u^*_\tau \mid d) = (1 + cd)^2 (1 - \kappa^2)$. As such, we can obtain

$$f_{u^*_\tau}(U \mid d) = \frac{1}{|1 + cd| \sqrt{1 - \kappa^2}}\, \phi\!\left(\frac{U + (1 + cd)(1 - \kappa^2)^{1/2}\, \Phi^{-1}(\tau)}{(1 + cd)(1 - \kappa^2)^{1/2}}\right).$$

If we evaluate it at 0,

$$f_{u^*_\tau}(0 \mid d) = \frac{1}{|1 + cd| \sqrt{1 - \kappa^2}}\, \phi\big(\Phi^{-1}(\tau)\big).$$

Note that both $f_{u_\tau}(0 \mid d, z)$ and $f_{u^*_\tau}(0 \mid d)$ share the same relationship with $d$. In fact, the weighting procedure will be equivalent, as they are proportional to each other.
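The proportionality claim can be checked numerically. The sketch below, under the assumptions of this example and with κ = ρ/√(1 + a²), evaluates both weights at several values of d and confirms that their ratio is the constant √(1 − ρ²)/√(1 − κ²). Function names are hypothetical, and values of d with 1 + cd = 0 are excluded because the weights are undefined there.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # pdf plays the role of phi, inv_cdf of Phi^{-1}


def weight_with_instrument(d, tau, rho, c):
    """f_{u_tau}(0 | d, z) for y = d + (1 + c d) u, d = a z + v."""
    scale = abs(1.0 + c * d) * sqrt(1.0 - rho ** 2)
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(tau)) / scale


def weight_without_instrument(d, tau, rho, a, c):
    """f_{u*_tau}(0 | d) for the same model, conditioning on d only."""
    kappa = rho / sqrt(1.0 + a ** 2)  # assumed: correlation of u and d
    scale = abs(1.0 + c * d) * sqrt(1.0 - kappa ** 2)
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(tau)) / scale


def weight_ratio(d, tau, rho, a, c):
    """Ratio of the two weights; the |1 + c d| factor cancels."""
    return (weight_without_instrument(d, tau, rho, a, c)
            / weight_with_instrument(d, tau, rho, c))


if __name__ == "__main__":
    for d in (0.0, 0.5, 2.0):
        w = weight_with_instrument(d, tau=0.25, rho=0.5, c=1.0)
        r = weight_ratio(d, tau=0.25, rho=0.5, a=1.0, c=1.0)
        print(f"d = {d:.1f}: weight = {w:.4f}, ratio = {r:.6f}")
```

Since weighted least squares is invariant to rescaling all weights by a common constant, either set of weights gives the same first-stage estimates in this design.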

3. Location-scale model 2

Now consider a location-scale model where both the first and second stage are affected in the variance component,

$$y = d + (1 + cd)u,$$
$$d = az + (1 + bz)v,$$

where $a$, $b$, and $c$ are parameters. As in the previous case, $(u, v) \sim N(0, 0, 1, 1, \rho)$. Define $w = (1 + bz)v$ and note that $(u, w \mid z) \sim N(0, 0, 1, (1 + bz)^2, \rho)$. Then, $u \mid d, z \sim u \mid w, z \sim N(\rho v, 1 - \rho^2)$. Thus, $Q_\tau(u \mid d, z) = \rho v + \sqrt{1 - \rho^2}\, \Phi^{-1}(\tau)$. Note that it does not depend on $b$.

In this case, $Q_\tau(y \mid d, z) = d + (1 + cd) Q_\tau(u \mid d, z)$, and then, $u_\tau = y - Q_\tau(y \mid d, z) = (1 + cd)(u - Q_\tau(u \mid d, z))$. As such, we can obtain

$$f_{u_\tau}(U \mid d, z) = \frac{1}{|1 + cd| \sqrt{1 - \rho^2}}\, \phi\!\left(\frac{U + (1 + cd)\sqrt{1 - \rho^2}\, \Phi^{-1}(\tau)}{(1 + cd)\sqrt{1 - \rho^2}}\right).$$

If we evaluate it at 0,

$$f_{u_\tau}(0 \mid d, z) = \frac{1}{|1 + cd| \sqrt{1 - \rho^2}}\, \phi\big(\Phi^{-1}(\tau)\big).$$

Note that this depends on $d$, and then, the weights are not uniform.

Now, it is not standard to obtain the distribution of $u \mid d$. To exemplify this, suppose $z = \{0, 1\}$ is a simple binary variable with $p = Pr(z = 1)$ and independent of $(u, v)$. Then, the joint density is $f(u, v, z) = \phi_\rho(u, v)\, p^z (1 - p)^{1 - z}$ and using the Jacobian transformation we obtain:

$$f(u, d, z) = \frac{1}{|1 + bz|}\, \phi_\rho\!\left(u, \frac{d - az}{1 + bz}\right) p^z (1 - p)^{1 - z}.$$

Therefore,

$$f(u, d) = \phi_\rho(u, d)(1 - p) + \frac{1}{|1 + b|}\, \phi_\rho\!\left(u, \frac{d - a}{1 + b}\right) p$$

and

$$f(d) = \phi(d)(1 - p) + \frac{1}{|1 + b|}\, \phi\!\left(\frac{d - a}{1 + b}\right) p.$$

Putting all that together, the conditional density is

$$f(u \mid d) = \frac{\phi_\rho(u, d)(1 - p) + \frac{1}{|1 + b|}\, \phi_\rho\!\left(u, \frac{d - a}{1 + b}\right) p}{\phi(d)(1 - p) + \frac{1}{|1 + b|}\, \phi\!\left(\frac{d - a}{1 + b}\right) p}.$$

If we assume that $p = \frac{|1 + b|}{1 + |1 + b|}$ this expression simplifies to

$$f(u \mid d) = \frac{\phi_\rho(u, d) + \phi_\rho\!\left(u, \frac{d - a}{1 + b}\right)}{\phi(d) + \phi\!\left(\frac{d - a}{1 + b}\right)}.$$
We can rewrite this as a function of standard normal densities noting that $\phi_\rho(u, d) = \phi_\rho(u \mid d)\, \phi(d)$ with $\phi_\rho(u \mid d) = \frac{1}{\sqrt{1 - \rho^2}}\, \phi\!\left(\frac{u - \rho d}{\sqrt{1 - \rho^2}}\right)$, then

$$f(u \mid d) = \frac{1}{\sqrt{1 - \rho^2}}\, \phi\!\left(\frac{u - \rho d}{\sqrt{1 - \rho^2}}\right) \omega(d) + \frac{1}{\sqrt{1 - \rho^2}}\, \phi\!\left(\frac{u - \rho \frac{d - a}{1 + b}}{\sqrt{1 - \rho^2}}\right) (1 - \omega(d)),$$

where $\omega(d) = \frac{\phi(d)}{\phi(d) + \phi\left(\frac{d - a}{1 + b}\right)}$. Therefore, conditional on $d$ this density is a Gaussian mixture of two distributions with different means. Two particular cases are: (i) $\rho = 0$ (exogeneity) where $f(u \mid d) = \phi(u)$; (ii) $a = b = 0$ ($d$ and $z$ unrelated) which reduces to $f(u \mid d) = \phi_\rho(u \mid d)$. Obviously, in the rest of the cases $Q_\tau(u \mid d)$ does not have an explicit analytical solution and therefore neither $u^*_\tau = y - Q_\tau(y \mid d) = (1 + cd)(u - Q_\tau(u \mid d))$.

The interesting feature to notice is that in all cases, the distribution of $u^*_\tau$ depends basically on $d$, and $(1 + cd)$ should be used to standardize its density function in a similar way to $u_\tau$.

References

Amemiya, T. (1982). Two stage least absolute deviations estimators. Econometrica, 50, 689–711.

Angrist, J., Chernozhukov, V. and Fernández-Val, I. (2006). Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica, 74, 539–563.

Card, D. (1995). Using Geographic Variation in College Proximity to Estimate the Return to Schooling. In Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, ed. by Louis N. Christofides, E. Kenneth Grant, and Robert Swidinsky. Toronto: University of Toronto Press, 201–222.

Chen, L-A. and Portnoy, S. Two-stage regression quantiles and two-stage trimmed least squares estimators for structural equation models. Communication in Statistics, Theory Methods, 25, 1005–1032.

Chernozhukov, V. and Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73, 245–261.

Chernozhukov, V. and Hansen, C. (2006). Instrumental quantile regression inference for structural and treatment effects models. Journal of Econometrics, 132, 491–525.

Chernozhukov, V. and Hansen, C. (2008). Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics, 142, 379–398.

Chernozhukov, V., Hansen, C. and Jansson, M. (2009). Finite sample inference for quantile regression models. Journal of Econometrics, 152, 93–103.

Chernozhukov, V., Hansen, C. and Wuthrich, C. (2020). Instrumental variable quantile regression. Handbook of Quantile Regression, https://arxiv.org/abs/2009.00436

Chesher, A. (2003). Identification in nonseparable models. Econometrica, 71, 1405–1441.

de Castro, L., Galvao, A.F., Kaplan, D.M. and Liu, X. (2019). Smoothed GMM for quantile models. Journal of Econometrics, 213, 121–144.

Galvao, A. (2011). Quantile regression for dynamic panel data with fixed effects. Journal of Econometrics, 164, 142–157.

Galvao, A. and Montes-Rojas, G. (2015). On the equivalence of instrumental variables estimators for linear models. Economics Letters, 134, 13–15.

Hall, P. and Sheather, S.J. (1988). On the distribution of a studentized quantile. Journal of the Royal Statistical Society: Series B, 50(3), 381–391.

Hendricks, W. and Koenker, R. (1992). Hierarchical spline models for conditional quantiles and the demand for electricity. Journal of the American Statistical Association, 87, 58–68.

Jun, S.J. (2008). Weak identification robust tests in an instrumental quantile model. Journal of Econometrics, 144, 118–138.

Kaplan, D.M. and Sun, Y. (2017). Smoothed estimating equations for instrumental variables quantile regression. Econometric Theory, 33, 105–157.

Kim, T-H and Muller, C. (2004). Two-stage quantile regression when the first stage is based on quantile regression. Econometrics Journal, 7, 218–231.

Koenker, R. (2005). Quantile Regression. New York: Cambridge University Press.

Lee, S. (2007). Endogeneity in quantile regression models: A control function approach. Journal of Econometrics, 101, 1131–1158.

Lee, D. S., McCrary, J., Moreira, M. J. and Porter, J. (2020). Valid t-ratio inference for IV. Draft version. https://arxiv.org/abs/2010.05058

Ma, L. and Koenker, R. (2006). Quantile regression methods for recursive structural equation models. Journal of Econometrics, 134, 471–506.

Ota, H., Kato, K. and Hara, S. (2019). Quantile regression approach to conditional mode estimation. Electronic Journal of Statistics, 13, 3120–3160.

Powell, J. (1983). The asymptotic normality of two-stage least absolute deviations estimators. Econometrica, 51, 1569–1576.

Sanderson, E. and Windmeijer, F. (2016). A weak instrument F-test in linear IV models with multiple endogenous variables. Journal of Econometrics, 190, 212–221.

Staiger, D. and Stock, J.H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65, 557–566.

Stock, J. and Yogo, M. (2005). Testing for weak instruments in linear IV regression. In D.W.K. Andrews (Ed.), Identification and Inference for Econometric Models, 80–108. New York: Cambridge University Press.

Zhou, K. Q. and Portnoy, S. L. (1996). Direct use of regression quantiles to construct confidence sets in linear Models.