Instrumental Variable Quantile Regression
Victor Chernozhukov∗, Christian Hansen†, Kaspar Wüthrich‡
September 2, 2020
Abstract
This chapter reviews the instrumental variable quantile regression model of Chernozhukov and Hansen (2005). We discuss the key conditions used for identification of structural quantile effects within this model, which include the availability of instruments and a restriction on the ranks of structural disturbances. We outline several approaches to obtaining point estimates and performing statistical inference for model parameters. Finally, we point to possible directions for future research.
Keywords: instrumental variables, ranks, $C(\alpha)$-statistic, treatment effects, causal effects
JEL classification: C21, C26
Empirical analyses often focus on understanding the structural (causal) relationship between an outcome, $Y$, and variables of interest, $D$. In many cases, interest is not just in how $D$ affects measures of the center of the distribution of $Y$ but also in other features of the distribution. For example, in understanding the effect of a government subsidized saving program, one might be more interested in the effect of the program on the lower tail of the savings distribution conditional on individual characteristics than in the effect of the program on the mean of the savings distribution. Quantile regression, as introduced by Koenker and Bassett (1978), offers one useful way to estimate such effects and to summarize the impact of changes in $D$ on the conditional distribution of $Y$.

Of course, variables of interest are often endogenous or self-selected in observational data. For example, individuals choose whether to participate in government subsidized savings plans. Similarly, in trying to understand the demand relationship between quantity and price, one must face that prices and quantities are jointly determined. Endogeneity of covariates renders conventional quantile regression inconsistent for estimating the causal effects of variables on the quantiles of outcomes of interest. Instrumental variables (IV) provide a powerful tool for learning about structural effects in the presence of endogenous right-hand-side variables.

∗ MIT. † University of Chicago. ‡ University of California San Diego.
Model Overview
The instrumental variables quantile regression (IVQR) model is developed within the conventional potential outcome framework. Potential real-valued outcomes, which vary among observational units, are indexed against potential treatment states $d \in \mathcal{D}$ and denoted $Y_d$. The potential outcomes $\{Y_d\}$ are latent because, given the observed treatment $D$, the observed outcome for each observational unit is only one component,
\[ Y := Y_D, \]
of the potential outcomes vector $\{Y_d\}$. Note that we use capital letters to denote random variables and lower case letters to denote the potential values the random variables may take throughout this review. We also do not explicitly state technical measurability assumptions as these can be deduced from the context.

The objective of causal or structural analysis is to learn about features of the distributions of potential outcomes $Y_d$. Of primary interest to us are the $\tau$th quantiles of potential outcomes under various potential treatment states $d$, conditional on observed characteristics $X = x$, denoted as
\[ Q_{Y_d}(\tau \mid x) = q(\tau, d, x). \]
We note that, after conditioning on observed characteristics $X = x$, each potential outcome $Y_d$ can be related to its quantile function $q(\tau, d, x)$ as
\[ Y_d = q(U_d, d, x), \quad \text{where } U_d \sim U(0,1) \tag{1} \]
is the structural error term and (1) follows from the Fisher-Skorohod representation of random variables.

Given the conditional quantiles of the potential outcomes, we are then interested in quantile treatment effects (QTE), which are given by the difference in the $\tau$th quantiles of two different conditional potential outcomes $Y_{d_1}$ and $Y_{d_0}$:
\[ q(\tau, d_1, x) - q(\tau, d_0, x). \]
These QTE may then be used to summarize the impact of variables of interest $D$ on the quantiles of potential outcomes as suggested in Doksum (1974) and Lehmann (1975).

It is important to note that the structural error $U_d$ in (1) is responsible for heterogeneity of potential outcomes among individuals with the same observed characteristics $x$. This error term determines the relative ranking of observationally equivalent individuals in the distribution of potential outcomes given the individuals' observed characteristics, and thus we refer to $U_d$ as the rank variable. Because $U_d$ drives differences between observationally equivalent individuals, one may think of $U_d$ as representing some unobserved characteristic, e.g. ability or "proneness," where we adopt the term proneness from Doksum (1974), who uses the term as in "prone to learn fast" or "prone to grow taller". This interpretation of the structural error makes quantile analysis an interesting tool for describing and learning the structure of heterogeneous treatment effects while accounting for unobserved heterogeneity; see Doksum (1974), Heckman et al. (1997), and Koenker (2005). For example, consider a returns-to-training model, where the $Y_d$ are potential earnings under different training levels $d$, and $q(\tau, d, x)$ is the conditional earnings function which describes how an individual with training $d$, characteristics $x$, and latent "ability" $\tau$ is rewarded by the labor market. The earnings function may differ for different levels of $\tau$, implying heterogeneous effects of training on earnings of people that have different levels of "ability".
For example, it may be that the largest returns to training accrue to those in the upper tail of the conditional distribution, that is, to the "high-ability" workers.

In observational data, the realized treatment $D$ is often selected in relation to potential outcomes, inducing endogeneity. This endogeneity makes the conventional quantile regression of $Y$ on $D$ and $X$, which relies upon the restriction
\[ P[Y \le Q_Y(\tau \mid D, X) \mid D, X] = \tau \quad \text{a.s.}, \]
inappropriate for measuring the structural quantile function $q(\tau, d, x)$ and thus for learning about QTE. Indeed, the conditional quantile function, $Q_Y(\tau \mid d, x)$, solving these equations will generally differ from the structural quantile function of latent potential outcomes, $q(\tau, d, x)$, under endogeneity. The IVQR model presented below provides conditions under which we can identify and estimate the quantiles of the latent potential outcomes through the use of instruments $Z$ that affect $D$ but are independent of potential outcomes by making use of the nonlinear quantile-type conditional moment restrictions
\[ P[Y \le q(\tau, D, X) \mid X, Z] = \tau \quad \text{a.s.} \]
Formally, the IVQR model consists of five key conditions (some are representations).

Assumption 1 (IVQR Model)
Consider a common probability space $(\Omega, F, P)$ and the set of potential outcome variables $(Y_d, d \in \mathcal{D})$, endogenous variables $D$, exogenous covariates $X$, and instrumental variables $Z$. The following conditions hold jointly with probability one:

A1 Potential Outcomes. Conditional on $X$ and for each $d$, $Y_d = q(U_d, d, X)$, where $\tau \mapsto q(\tau, d, X)$ is non-decreasing on $[0,1]$ and left-continuous and $U_d \sim U(0,1)$.

A2 Independence. Conditional on $X$ and for each $d$, $U_d$ is independent of instrumental variables $Z$.

A3 Selection. $D := \delta(Z, X, \nu)$ for some unknown function $\delta$ and random vector $\nu$.

A4 Rank Similarity. Conditional on $(X, Z, \nu)$, $\{U_d\}$ are identically distributed.

A5 Observables. The observed random vector consists of $Y := Y_D$, $D$, $X$ and $Z$.

The following theorem summarizes the main econometric implications of the model.
Theorem 1 (Main Implications of the IVQR Model)
Suppose conditions A1-A5 hold.

(i) Then we have for $U := U_D$, with probability one,
\[ Y = q(U, D, X), \quad U \sim U(0,1) \mid X, Z. \tag{2} \]

(ii) If (2) holds and $\tau \mapsto q(\tau, d, X)$ is strictly increasing for each $d$, then for each $\tau \in (0,1)$, a.s.
\[ P[Y \le q(\tau, D, X) \mid X, Z] = \tau. \tag{3} \]

(iii) If (2) holds, then for any closed subset $I$ of $[0,1]$, a.s.
\[ P(U \in I) \le P[Y \in q(I, D, X) \mid X, Z], \tag{4} \]
where $q(I, d, x)$ is the image of $I$ under the mapping $\tau \mapsto q(\tau, d, x)$.

The first result states that the main consequence of A1-A5 is a simultaneous equation model (2) with non-separable error $U$ that is independent of $Z$ and $X$ and normalized so that $U \sim U(0,1)$. The second result covers the case where $\tau \mapsto q(\tau, D, X)$ is strictly increasing, which requires that $Y$ is non-atomic conditional on $X$ and $Z$. In this case, we obtain the conditional moment restriction (3). This implication follows from the first result and the fact that $\{Y \le q(\tau, D, X)\}$ is equivalent to $\{U \le \tau\}$ when $q(\tau, D, X)$ is strictly increasing in $\tau$. The final result deals with the case where $Y$ may have atoms conditional on $X$ and $Z$, e.g. when $Y$ is a count or discrete response variable. The first two results were obtained in Chernozhukov and Hansen (2005), and the third result is in the spirit of results given in Chesher et al. (2013), Chesher (2005), and Chesher and Smolinski (2010).

The model and the results of Theorem 1 are useful for two reasons. First, Theorem 1 serves as a means of identifying QTE in a reasonably general heterogeneous effects model. Second, by demonstrating that the IVQR model leads to the conditional moment restrictions (3) and (4), Theorem 1 provides an economic and causal foundation for estimation based on these restrictions.

Equations (3) and (4) implicitly define the identification region for the structural quantile function $(\tau, d, x) \mapsto q(\tau, d, x)$.
The identification region for the case of strictly increasing $\tau \mapsto q(\tau, d, x)$ can be stated as the set $\mathcal{M}$ of functions $(\tau, d, x) \mapsto m(\tau, d, x)$ that satisfy the following relations, for all $\tau \in (0,1)$:
\[ P[Y < m(\tau, D, X) \mid X, Z] = \tau \quad \text{a.s.} \tag{5} \]
This representation of the identification region $\mathcal{M}$ is implicit. Without imposing additional conditions, statistical inference about $q \in \mathcal{M}$ from (5) can be performed using weak-identification robust inference as described in Chernozhukov and Hansen (2008), Jun (2008), Santos (2012), or Chernozhukov et al. (2009). Section 2.2 discusses conditions under which point identification is obtained, and we mainly focus on the point-identified case in discussing estimation and inference in this review.

The identification region for the case of weakly increasing $\tau \mapsto q(\tau, d, x)$ can be stated as the set $\mathcal{M}$ of functions $(\tau, d, x) \mapsto m(\tau, d, x)$ that satisfy the following relations: for any closed subset $I$ of $(0,1)$,
\[ P(U \in I) \le P[Y \in m(I, D, X) \mid X, Z] \quad \text{a.s.}, \]
where $m(I, D, X)$ is the image of $I$ under the mapping $\tau \mapsto m(\tau, D, X)$. The inference problem here falls in the class of conditional moment inequalities, and approaches such as those described in Andrews and Shi (2013) or Chernozhukov et al. (2013b) can be used.

2.2 Conditions for Point Identification

Here we briefly discuss the key conditions under which the moment equations (3) point identify the structural quantile function $q(\tau, d, x)$. We focus on the simplest case where $D \in \{0,1\}$ and $Z \in \{0,1\}$ and refer to Chernozhukov and Hansen (2013) for more details and extensions to multivalued and continuous $D$ and to Chernozhukov and Hansen (2006) for a discussion of identification in linear-in-parameters models. The following analysis is conditional on $X = x$, but we suppress this dependence for ease of notation.

It follows from Theorem 1 that there is at least one function $q(\tau, d)$ that solves $P[Y \le q(\tau, D) \mid Z] = \tau$ a.s.
The function $q(\tau, d)$ can be equivalently represented by a vector of its values $q = (q(\tau, 0), q(\tau, 1))'$. Therefore, for vectors of the form $y = (y_0, y_1)'$, we have a vector of moment equations
\[ \Pi(y) := \big( P[Y \le y_D \mid Z = 0] - \tau,\; P[Y \le y_D \mid Z = 1] - \tau \big)', \]
where $y_D := (1 - D) \cdot y_0 + D \cdot y_1$. We say that $q(\tau, d)$ is identified in some parameter space, $\mathcal{L}$, if $y = q$ is the only solution to $\Pi(y) = 0$ among all $y \in \mathcal{L}$. Define the Jacobian $\partial \Pi(y)$ of $\Pi(y)$ with respect to $y = (y_0, y_1)'$ as
\[
\partial \Pi(y) :=
\begin{bmatrix}
f_Y(y_0 \mid D = 0, Z = 0)\, P[D = 0 \mid Z = 0] & f_Y(y_1 \mid D = 1, Z = 0)\, P[D = 1 \mid Z = 0] \\
f_Y(y_0 \mid D = 0, Z = 1)\, P[D = 0 \mid Z = 1] & f_Y(y_1 \mid D = 1, Z = 1)\, P[D = 1 \mid Z = 1]
\end{bmatrix}
=:
\begin{bmatrix}
f_{Y,D}(y_0, 0 \mid Z = 0) & f_{Y,D}(y_1, 1 \mid Z = 0) \\
f_{Y,D}(y_0, 0 \mid Z = 1) & f_{Y,D}(y_1, 1 \mid Z = 1)
\end{bmatrix}.
\tag{6}
\]
The key condition for point identification is full rank of $\partial \Pi(y)$ at $y = q$. This local identification condition can be extended to a global condition; see Chernozhukov and Hansen (2005, 2013).

Full rank of $\partial \Pi(y)$ requires the impact of $Z$ on the joint distribution of $(Y, D)$ to be rich enough. To illustrate, note that full rank of $\partial \Pi(y)$ is equivalent to $\det(\partial \Pi(y)) \ne 0$, which implies that
\[
\frac{f_{Y,D}(y_1, 1 \mid Z = 1)}{f_{Y,D}(y_0, 0 \mid Z = 1)} > \frac{f_{Y,D}(y_1, 1 \mid Z = 0)}{f_{Y,D}(y_0, 0 \mid Z = 0)}
\tag{7}
\]
(or the same condition with $>$ replaced by $<$). Inequality (7) may be interpreted as a monotone likelihood ratio condition. That is, the instrument $Z$ should have a monotonic impact on the likelihood ratio in (7), which is generally stronger than the usual condition that $D$ is correlated with $Z$. Nevertheless, the full rank condition will be trivially satisfied in many useful contexts. For instance, if the instrument satisfies one-sided non-compliance (e.g., those not offered the treatment cannot receive that treatment), then $P[D = 1 \mid Z = 0] = 0$, so that the right-hand side of (7) equals 0, which makes (7) hold trivially.
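The moment equations $\Pi(y) = 0$ and the full rank condition on $\partial \Pi(y)$ can be checked numerically in a toy design. The sketch below is only an illustration under assumed ingredients that do not come from the chapter: the structural function `q`, the selection rule that generates `D`, the sample size, and the finite-difference step `h` are all hypothetical choices. It evaluates the sample analog of $\Pi$ at the true $q$, contrasts it with the conventional restriction that conditions on $D$, and approximates the Jacobian by central differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau, h = 200_000, 0.5, 0.05  # sample size, quantile index, difference step (all assumed)

def q(t, d):
    # Hypothetical structural quantile function: q(t, 0) = t and q(t, 1) = 1 + 2t.
    return t + d * (1.0 + t)

Z = rng.integers(0, 2, size=n)             # binary instrument, independent of U
U = rng.uniform(size=n)                    # rank variable (rank invariance imposed)
V = rng.uniform(size=n)                    # extra selection noise
D = (U + V > 1.2 - 0.8 * Z).astype(int)    # selection depends on U: D is endogenous
Y = q(U, D)                                # observed outcome

def Pi(y):
    # Sample analog of Pi(y) = (P[Y <= y_D | Z = z] - tau) for z = 0, 1.
    y_D = np.where(D == 1, y[1], y[0])
    hit = Y <= y_D
    return np.array([hit[Z == 0].mean() - tau, hit[Z == 1].mean() - tau])

q_true = np.array([q(tau, 0), q(tau, 1)])
moments_at_truth = Pi(q_true)              # should be close to (0, 0)

# Conventional restriction conditions on D instead of Z and fails under endogeneity.
naive = np.array([(Y[D == d] <= q(tau, d)).mean() for d in (0, 1)])

def jacobian(y):
    # Central finite differences; rows index Z = 0, 1 and columns index y_0, y_1.
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = h
        J[:, j] = (Pi(y + e) - Pi(y - e)) / (2 * h)
    return J

det_J = np.linalg.det(jacobian(q_true))    # nonzero determinant indicates full rank

print("Pi(q):", moments_at_truth)
print("P[Y <= q(tau, D) | D = d]:", naive)
print("det of Jacobian:", det_J)
```

In this design the instrument shifts participation strongly, so the determinant stays well away from zero; weakening the dependence of `D` on `Z` would push it toward zero, which is the weak-identification situation mentioned earlier.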
Condition A1 imposes monotonicity on the structural function of interest, which makes its relation to $q(\tau, d, x)$ apparent. Condition A2 states that potential outcomes are independent of $Z$, given $X$, which is a conventional independence restriction employed in nonlinear IV models. Condition A3 provides a convenient representation of a treatment selection mechanism, stated for the purpose of discussion. In A3, the unobserved random vector $\nu$ is responsible for the difference in treatment choices $D$ across observationally identical individuals. Dependence between $\nu$ and $\{U_d\}$ is the source of endogeneity that makes the conventional exogeneity assumption $U \sim U(0,1) \mid X, D$ break down. This failure leads to inconsistency of exogenous quantile methods for estimating the structural quantile function. Within the model outlined above, this breakdown is resolved through the use of instrumental variables.

The independence imposed in A2 and A3 is weaker than the assumption that both the disturbances $\{U_d\}$ in the outcome equation and the disturbances $\nu$ in the selection equation are jointly independent of the instrument $Z$, which is maintained, for example, in Abadie et al. (2002). The assumption that structural errors $\{U_d\}$ and first-stage unobservables $\nu$ are jointly independent of instruments may be violated in practical examples. For example, this condition would not hold when the instrument is measured with error as discussed in Hausman (1977) or when the instrument is not assigned exogenously relative to the selection equation as in Example 2 in Imbens and Angrist (1994).

Condition A4 is the key restriction of the IVQR model. This assumption restricts the variation in ranks across potential outcomes and is key for identifying the structural quantile function $q(\tau, d, x)$ and the associated QTE. The simplest, though strongest, version of this condition is rank invariance, which imposes that ranks $U_d$ do not vary with potential treatment states $d$:
\[ U_d = U \quad \text{for each } d \in \mathcal{D}. \tag{8} \]
Rank invariance is a strong condition that has been used in many interesting models without endogeneity such as Doksum (1974), Heckman et al. (1997), and Koenker and Geling (2001). Rank invariance implies that a common unobserved factor $U$, such as innate ability, determines the ranking of a given person across treatment states. For example, under rank invariance, people who are strong (highly ranked) earners without a training program ($d = 0$) remain strong earners having done the training ($d = 1$). Indeed, the earnings of a person with characteristics $x$ and rank $U = \tau$ are $Y_0 = q(\tau, 0, x)$ in the training state "0" and $Y_1 = q(\tau, 1, x)$ in the state "1"; that is, the individual's rank, $\tau$, in the earnings distribution is exactly the same whether or not the person receives training. Finally, note that Condition A3 is a pure representation under rank invariance as nothing restricts the unobserved component $\nu$ in this case.

While convenient, rank invariance seems too strong a condition for many applications as discussed, for example, in Heckman et al. (1997). Rank invariance maintains that an individual's rank in the outcome distribution under every possible state of the endogenous variables is exactly the same. Thus, the potential outcomes $\{Y_d\}$ are jointly degenerate, which allows identification of individual treatment effects even though no individual is ever observed in more than one state of the endogenous variable. Rank invariance also rules out the possibility that there may be many unobserved factors that determine individual ranks which may be differentially relevant under different states of the endogenous variables.

Rank similarity A4 relaxes these undesirable features of rank invariance by allowing the rank variables $\{U_d\}$ to change across $d$ in a way that reflects unobserved, asystematic variation in ranks across states of the endogenous variables while also providing sufficient structure to allow identification of QTE via the moment restrictions in Theorem 1.
More specifically, rank similarity A4 relaxes exact rank invariance by allowing "slippages", in the terminology of Heckman et al. (1997), in an individual's rank away from some common level $U$. Conditional on $U$, which may enter disturbance $\nu$ in the selection equation, and any other components of $\nu$ from the selection equation A3, rank similarity yields that the slippages of ranks away from the common level $U$ under different potential states of the endogenous variable, $U_d - U$, are identically distributed across $d \in \mathcal{D}$. In this formulation, we implicitly assume that any selection of the state of the endogenous variables occurs without knowing the exact potential outcomes. That is, selection may depend on $U$ and even the distribution of slippages, but does not depend on the exact slippage $U_d - U$. This assumption is consistent with many empirical situations where the exact latent outcomes are not known before receipt of treatment. We also note that conditioning on appropriate covariates $X$ may be important to achieve rank similarity. Finally, we note that rank similarity has testable implications. Dong and Shen (2015) and Frandsen and Lefgren (2015) exploit these conditions to develop tests of unconditional rank similarity, and their approaches could be extended to test some forms of conditional rank similarity.

We present two examples that highlight the nature of the model, its strengths, and its limitations.
Example 1 (Demand with Non-Separable Error). The following is a generalization of the classic supply-demand example taken from Chernozhukov and Hansen (2006). Consider the model
\[
\begin{aligned}
Y_p &= q(U, p), \\
\tilde{Y}_p &= \rho(\mathcal{U}, p, z), \\
P &\in \{ p : \rho(\mathcal{U}, p, Z) = q(U, p) \},
\end{aligned}
\tag{9}
\]
where functions $q$ and $\rho$ are increasing in their first argument. The function $p \mapsto Y_p$ is the random demand function, and $p \mapsto \tilde{Y}_p$ is the random supply function. Additionally, functions $q$ and $\rho$ may depend on covariates $X$, but this dependence is suppressed.

Random variable $U$ is the level of demand and describes the demand curve at different states of the world. Demand is maximal when $U = 1$ and minimal when $U = 0$, holding $p$ fixed. Note that we imposed rank invariance (8), as is typical in classic supply-demand models, by making $U$ invariant to $p$.

Model (9) incorporates traditional additive error models for demand which have $Y_p = q(p) + \epsilon$ where $\epsilon = Q_\epsilon(U)$. The model is much more general in that the price can affect the entire distribution of the demand curve, while in traditional models it only affects the location of the distribution of the demand curve.

The $\tau$-quantile of the demand curve $p \mapsto Y_p$ is given by $p \mapsto q(\tau, p)$. Thus, the curve $p \mapsto Y_p$ lies below the curve $p \mapsto q(\tau, p)$ with probability $\tau$. Therefore, the various quantiles of the potential outcomes play an important role in describing the distribution and heterogeneity of the stochastic demand curve. The QTE may be characterized by $\partial q(\tau, p)/\partial p$ or by an elasticity $\partial \ln q(\tau, p)/\partial \ln p$. For example, consider the model $q(\tau, p) = \exp(\beta(\tau) + \alpha(\tau) \ln p)$, which corresponds to a Cobb-Douglas model for demand with non-separable error $Y_p = \exp(\beta(U) + \alpha(U) \ln p)$. The log transformation gives $\ln Y_p = \beta(U) + \alpha(U) \ln p$, and the QTE for the log-demand equation is given by the elasticity of the original $\tau$-demand curve
\[ \alpha(\tau) = \partial Q_{\ln Y_p}(\tau)/\partial \ln p = \partial \ln q(\tau, p)/\partial \ln p. \]
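A small simulation can make the link between the random elasticity $\alpha(U)$ and the quantile-specific elasticity $\alpha(\tau)$ concrete. The following is a minimal sketch, not taken from the chapter: the functions `alpha` and `beta` are hypothetical, chosen so that $\beta(u) + \alpha(u) \ln p$ is increasing in $u$ for the two prices used, and prices are set exogenously because the code works directly with the potential outcomes $Y_p$.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=200_000)      # demand level (rank variable), shared across prices

def alpha(u):
    # Hypothetical elasticity function: more elastic demand at low demand levels.
    return -1.5 + u

def beta(u):
    # Hypothetical intercept function; together with alpha it keeps the
    # log-demand index increasing in u whenever ln p > -2.
    return 1.0 + 2.0 * u

def demand(u, p):
    # Cobb-Douglas demand with non-separable error: Y_p = exp(beta(U) + alpha(U) ln p).
    return np.exp(beta(u) + alpha(u) * np.log(p))

def quantile_elasticity(tau, p_lo=1.0, p_hi=2.0):
    # Elasticity of the tau-quantile demand curve between two fixed prices.
    q_lo = np.quantile(demand(U, p_lo), tau)
    q_hi = np.quantile(demand(U, p_hi), tau)
    return (np.log(q_hi) - np.log(q_lo)) / (np.log(p_hi) - np.log(p_lo))

for tau in (0.1, 0.5, 0.9):
    print(tau, quantile_elasticity(tau), alpha(tau))
```

Because the log-demand index is linear in $\ln p$ for each demand level, the arc elasticity computed between two prices coincides with the point elasticity $\alpha(\tau)$ up to simulation error.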
The elasticity $\alpha(U)$ is random, depends on the state of demand $U$, and may vary considerably with $U$. For example, this variation could arise when the number of buyers varies and aggregation induces a non-constant elasticity across the demand levels. Chernozhukov and Hansen (2008) estimate a simple demand model based on data from a New York fish market that was first collected and used by Graddy (1995). They find point estimates of the demand elasticity, $\alpha(\tau)$, that vary quite substantially across quantiles of the demand curve.

The condition $P \in \{ p : \rho(\mathcal{U}, p, Z) = q(U, p) \}$ in (9) is the equilibrium condition that generates endogeneity; the selection of the clearing price $P$ by the market depends on the potential demand and supply outcomes. As a result, we have a representation that is consistent with A3,
\[ P = \delta(Z, \nu), \]
where $\nu$ consists of $U$ and $\mathcal{U}$ and may include "sunspot" variables if the equilibrium price is not unique. Thus what we observe can be written as
\[ Y := q(U, P), \quad P := \delta(Z, \nu), \quad U \text{ is independent of } Z. \tag{10} \]
Identification of the $\tau$th quantile of the demand function, $p \mapsto q(\tau, p)$, is obtained through the use of instrumental variables $Z$, like weather conditions or factor prices, that shift the supply curve and do not affect the level of the demand curve, $U$, so that independence assumption A2 is met. Furthermore, the IVQR model allows arbitrary correlation between $Z$ and $\nu$. This property is important as it allows, for example, $Z$ to be measured with error or to be exogenous relative to the demand equation but endogenous relative to the supply equation.

Example 2 (Savings). Chernozhukov and Hansen (2004) use the framework of the IVQR model to examine the effects of participating in a 401(k) plan on an individual's accumulated wealth. Since wealth is continuous, wealth, $Y_d$, in the participation state $d \in \{0, 1\}$ can be represented as
\[ Y_d = q(U_d, d, X), \quad U_d \sim U(0,1), \]
where $\tau \mapsto q(\tau, d, X)$ is the conditional quantile function of $Y_d$ and $U_d$ is an unobserved random variable.
$U_d$ is an unobservable that drives differences in accumulated wealth conditional on $X$ under participation state $d$. Thus, one might think of $U_d$ as the preference for saving and interpret the quantile index $\tau$ as indexing rank in the preference for saving distribution. One could also model the individual as selecting the 401(k) participation state to maximize expected utility:
\[
D = \arg\max_{d \in \mathcal{D}} E\big[ W\{Y_d, d\} \,\big|\, X, Z, \nu \big]
  = \arg\max_{d \in \mathcal{D}} E\big[ W\{q(U_d, d, X), d\} \,\big|\, X, Z, \nu \big],
\tag{11}
\]
where $W\{Y_d, d\}$ is the random indirect utility derived under participation state $d$. Of course, utility may depend on both observables in $X$ as well as realized and unrealized unobservables. Only dependence on $Y_d$ and $d$ is highlighted. As a result, the participation decision is represented by
\[ D = \delta(Z, X, \nu), \]
where $Z$ and $X$ are observed, $\nu$ is an unobserved information component that may be related to ranks $U_d$ and includes other unobserved variables that affect the participation state, and function $\delta$ is unknown. This model fits into the IVQR model with the independence condition A2 requiring that $U_d$ is independent of $Z$, conditional on $X$.

Under rank invariance (8), the preference for saving vector $U_d$ may be collapsed to a single random variable $U = U_0 = U_1$. In this case, a single preference for saving is responsible for an individual's ranking across both treatment states. The more general rank similarity condition A4 relaxes the exact invariance of ranks $U_d$ across $d$ by allowing noisy, asystematic variations of $U_d$ across $d$, conditional on $(\nu, X, Z)$. This relaxation allows for variation in rank across the treatment states, requiring only an "expectational rank invariance." Similarity implies that given the information in $(\nu, X, Z)$ employed to make the selection of treatment $D$, the expectation of any function of rank $U_d$ does not vary across the treatment states.
That is, ex-ante, conditional on $(\nu, X, Z)$, the ranks may be considered to be the same across potential treatments, but the realized, ex-post, rank may be different across treatment states.

From an econometric perspective, the similarity assumption is nothing but a restriction on the unobserved heterogeneity component which precludes systematic variation of $U_d$ across the treatment states. To be more concrete, consider the following simple example where
\[ U_d = F_{\nu + \eta_d}(\nu + \eta_d), \]
where $F_{\nu + \eta_d}(\cdot)$ is the distribution function of $\nu + \eta_d$ and $\{\eta_d\}$ are mutually i.i.d. conditional on $\nu$, $X$, and $Z$. The variable $\nu$ represents an individual's "mean" saving preference, while $\eta_d$ is a noisy adjustment. Clearly similarity holds in this case: $U_d \overset{d}{=} U_{d'}$ given $\nu$, $X$, and $Z$. This more general assumption leaves the individual optimization problem (11) unaffected, while allowing variation in an individual's rank across different potential outcomes.

While we feel that rank similarity may be a reasonable assumption in many contexts, imposing rank similarity is not innocuous. In the context of 401(k) participation, matching practices of employers could jeopardize the validity of the similarity assumption. To be more concrete, let $U_d = F_{\nu + \eta_d}(\nu + \eta_d)$ as before but let $\eta_d = dM$ for a random variable $M$ that depends on the match rate and is independent of $\nu$, $X$, and $Z$. Then conditional on $\nu = v$, $X$, and $Z$, $U_0 = F_\nu(v)$ is degenerate but $U_1 = F_{\nu + M}(v + M)$ is not. Therefore, $U_0$ is not equal to $U_1$ in distribution. Similarity may still hold in the presence of the employer match if the rank, $U_d$, in the asset distribution is insensitive to the match rate. The rank may be insensitive if, for example, individuals follow simple rules of thumb such as target saving when they make their savings decisions.
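The two constructions of $U_d$ just described can be compared in a short simulation. The sketch below assumes, purely for illustration, that $\nu$, $\eta_d$ and $M$ are all uniform on $(0,1)$, so that $F_{\nu + \eta_d}$ is the distribution function of a sum of two independent uniforms and $F_\nu$ is the identity on $(0,1)$; none of these distributional choices come from the chapter. It conditions on a fixed value $\nu = v$ and compares the conditional distributions of $U_0$ and $U_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, v = 100_000, 0.5   # number of draws and the conditioning value nu = v (both assumed)

def F_sum(s):
    # CDF of the sum of two independent U(0,1) variables (triangular on [0, 2]).
    s = np.clip(s, 0.0, 2.0)
    return np.where(s <= 1.0, 0.5 * s**2, 1.0 - 0.5 * (2.0 - s) ** 2)

# Case 1: eta_0 and eta_1 are i.i.d. given nu, so rank similarity holds.
eta0 = rng.uniform(size=n)
eta1 = rng.uniform(size=n)
U0 = F_sum(v + eta0)
U1 = F_sum(v + eta1)
probs = np.linspace(0.1, 0.9, 9)
max_quantile_gap = np.max(np.abs(np.quantile(U0, probs) - np.quantile(U1, probs)))
mean_rank_slippage = np.mean(np.abs(U0 - U1))   # ranks differ ex post

# Case 2: eta_d = d * M (employer match), so rank similarity fails.
M = rng.uniform(size=n)
U0_match = np.full(n, v)        # F_nu(v) = v when nu ~ U(0,1): degenerate given nu = v
U1_match = F_sum(v + M)         # still random given nu = v

print("similarity case, max quantile gap:", max_quantile_gap)
print("similarity case, mean |U0 - U1|:", mean_rank_slippage)
print("match case, std of U0 and U1:", U0_match.std(), U1_match.std())
```

In the first case the conditional quantiles of $U_0$ and $U_1$ agree up to simulation noise even though individual ranks slip across states, which is the sense in which similarity is weaker than invariance.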
Similarity may also be satisfied approximately if the variation of match rates is small relative to the variation of individual heterogeneity or if the covariates capture most of the variation in match rates.

2.5 Comparison to Other Approaches

There are, of course, other assumptions that one could employ to build a quantile model with endogeneity. In this section, we briefly compare the IVQR framework to triangular models as in Imbens and Newey (2009); see Chesher (2003), Koenker and Ma (2006), Lee (2007) and Chernozhukov et al. (2015a) for related models and results. We also note that triangular models are related to the Rosenblatt transform; see for example the chapter by Hallin and Šiman (2016) in this handbook. A comparison between the IVQR model and the popular Abadie et al. (2002) approach is provided in Melly and Wüthrich (2016) in this handbook.

The triangular model takes the form of a triangular system of equations
\[
\begin{aligned}
Y &= g(D, \epsilon), \\
D &= h(Z, \eta),
\end{aligned}
\]
where $Y$ is the outcome, $D$ is a continuous scalar endogenous variable, $\epsilon$ is a vector of disturbances, $Z$ is a vector of instruments with a continuous component, $\eta$ is a scalar reduced form error, and we ignore other covariates $X$ for simplicity. It is important to note that the triangular system generally rules out simultaneous equations, which typically have that the reduced form relating $D$ to $Z$ depends on a vector of disturbances. For example, in a supply and demand system, the reduced form for both price and quantity will generally depend on the unobservables from both the supply equation and the demand equation; see Example 1 in Section 2.4.

Outside of $\eta$ being a scalar, the key conditions that allow identification of quantile effects in the triangular system are (a) the function $\eta \mapsto h(Z, \eta)$ is strictly increasing in $\eta$ and (b) $D$ and $\epsilon$ are independent conditional on $V$ for some observable or estimable $V$.
The variable $V$ is thus the "control function" conditional on which changes in $D$ may be taken as causal. Imbens and Newey (2009) use $V = F_{D \mid Z}(D, Z) = F_\eta(\eta)$ as a control variable and show that this variable satisfies condition (b) under the additional condition that $(\epsilon, \eta)$ is independent of $Z$. Identification then proceeds as follows. Under the assumed monotonicity of $h(Z, \eta)$ in $\eta$, $D = h(Z, \eta)$ can be used to identify $V$. Using $V$ obtained in this first step, one may then construct the distribution of $Y \mid D, V$. Integrating over the distribution of $V$ and using iterated expectations, one has
\[
\int F_{Y \mid D, V}(y \mid d, v)\, F_V(dv) = \int 1\big(g(d, \epsilon) \le y\big)\, F_\epsilon(d\epsilon) = \Pr\big(g(d, \epsilon) \le y\big) =: G(y, d),
\]
and the structural quantile function of $Y_d$ can be obtained as $G^{-1}(\tau, d)$.

It should be emphasized that the triangular model is neither more nor less general than the IVQR model reviewed here. The key difference between the approaches is that the IVQR model uses an essentially unrestricted selection equation ($\nu$ may be vector valued) but requires monotonicity and a scalar disturbance ($U$) in the structural equation. The triangular system, on the other hand, relies on monotonicity of the selection mechanism in a scalar disturbance ($\eta$) but does not restrict the unobserved heterogeneity in the outcome equation ($\epsilon$ may be a vector of disturbances). In addition, the triangular system, as developed in Imbens and Newey (2009), requires a more stringent independence condition in that the instruments $Z$ need to be independent of both the structural disturbances, $\epsilon$, and the reduced form disturbance, $\eta$.
That the approaches impose structure on different parts of the model makes them complementary, with a researcher's choice between the two being dictated by whether it is more natural to impose restrictions on the structural function or the reduced form in a given application.

Finally, we note that the triangular model and the IVQR model can be made compatible by imposing the conditions from the triangular model on the selection equation and the conditions from the IVQR model on the structural model. Torgovitsky (2015) studies identification when both sets of conditions are imposed and shows that the requirements on the instruments may be substantially relaxed relative to the IVQR model or Imbens and Newey (2009) in this case.

In this section, we present various approaches to estimating and doing inference for the parameters of the IVQR model under the leading case where $\tau \mapsto q(\tau, d, X)$ is strictly increasing. We focus on linear-in-parameters structural quantile models at a single quantile of interest $\tau$:
\[ q(\tau, d, x) = d'\alpha(\tau) + x'\beta(\tau). \tag{12} \]
In (12), $\alpha(\tau)$ captures the causal effect of the endogenous variables $D$ on the $\tau$th quantile of the conditional distribution of potential outcomes $Y_d$ given $X = x$. Similarly, $\beta(\tau)$ provides the causal effect of controls $X$ on the $\tau$th quantile of the conditional potential outcome distributions. We note that $D$ may also contain interactions of endogenous variables and covariates. Because $\alpha(\tau)$ is the chief object of interest in many studies, we focus most of our discussion on estimating and doing inference for $\alpha(\tau)$, treating $\beta(\tau)$ as a nuisance parameter. Note that in what follows we will often suppress the dependence of $\alpha(\tau)$ and $\beta(\tau)$ on the quantile level $\tau$.

In interpreting the parameters in (12), it is important to note that the quantile index, $\tau$, refers to the quantile of potential outcome $Y_d$ given that exogenous variables are set to $X = x$ and not to the unconditional quantile of $Y_d$.
For example, suppose that one of the control variables in the savings example in Section 2.4 is income. An individual at the 10th percentile of the distribution of $Y_d$ given an income of $200,000, which is far above the median income, may not necessarily be at the low tail of the unconditional distribution of $Y_d$, as even a relatively low saver with a high level of income may still save substantially more than the median saver in the overall population, i.e., without conditioning on income; see Frölich and Melly (2013) for a further discussion of this point. In some applications, features of the conditional distribution are not the chief objects of interest and researchers are interested in effects of treatments on unconditional quantiles. Unconditional QTE can be obtained from the conditional quantile functions in three steps. First, obtain the conditional potential outcome distribution functions, $F_{Y_d}(y \mid x)$, as
\[ F_{Y_d}(y \mid x) = \int_0^1 1\big( d'\alpha(\tau) + x'\beta(\tau) \le y \big)\, d\tau, \]
where $1(\cdot)$ is the indicator function that returns one when the expression inside the parentheses is true and zero otherwise. Second, the unconditional potential outcome distributions, $F_{Y_d}(y)$, are obtained by integrating $F_{Y_d}(y \mid x)$ with respect to the marginal distribution of covariates, $F_X(x)$:
\[ F_{Y_d}(y) = \int F_{Y_d}(y \mid x)\, dF_X(x). \]
Finally, the unconditional $\tau$-QTE is given by $F^{-1}_{Y_{d_1}}(\tau) - F^{-1}_{Y_{d_0}}(\tau)$. This discussion suggests that given estimators of the parameters $\alpha(\tau)$ and $\beta(\tau)$ and the distribution of covariates $F_X(x)$, unconditional QTE can be estimated based on the plug-in principle; see for instance Machado and Mata (2005), Melly (2005) or Chernozhukov et al. (2013a).

Model (12) provides a simple and widely used baseline for discussion of estimation and inference.
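Returning to the three-step construction of unconditional QTE described above, the plug-in idea can be sketched as follows. This is only an illustration under assumptions that are not part of the chapter: the coefficient functions `alpha` and `beta` are treated as known (in practice they would be estimates on a grid of quantile indices), the covariate sample `x_sample` is a toy design with an intercept and one binary covariate, and the integrals are approximated on the grids `taus` and `y_grid`.

```python
import numpy as np

def alpha(t):
    # Hypothetical alpha(tau): treatment coefficient in q = d*alpha(tau) + x'beta(tau).
    return 1.0 + t

def beta(t):
    # Hypothetical beta(tau) = (intercept(tau), slope) for x = (1, x1).
    return np.stack([t, np.ones_like(t)], axis=-1)

taus = (np.arange(1000) + 0.5) / 1000                 # midpoint grid on (0, 1)
x_sample = np.column_stack([np.ones(100),              # intercept
                            np.tile([0.0, 1.0], 50)])  # binary covariate, half zeros
y_grid = np.linspace(-1.0, 5.0, 6001)                  # grid used to invert the CDF

def unconditional_cdf(d):
    # Steps 1 and 2: average 1(d*alpha(tau) + x'beta(tau) <= y) over the tau grid
    # (conditional CDF) and over the covariate sample (marginal distribution of X).
    q_vals = d * alpha(taus)[None, :] + x_sample @ beta(taus).T   # shape (n_x, n_tau)
    sorted_q = np.sort(q_vals.ravel())
    return np.searchsorted(sorted_q, y_grid, side="right") / sorted_q.size

def unconditional_quantile(d, tau):
    # Step 3 (inversion): smallest grid point y with F_{Y_d}(y) >= tau.
    F = unconditional_cdf(d)
    return y_grid[np.searchsorted(F, tau, side="left")]

def unconditional_qte(tau, d1=1, d0=0):
    return unconditional_quantile(d1, tau) - unconditional_quantile(d0, tau)

print("unconditional QTE at 0.25:", unconditional_qte(0.25))
print("conditional QTE at 0.25:  ", alpha(0.25))
```

In this toy design the conditional QTE at $\tau = 0.25$ is $\alpha(0.25) = 1.25$ for every $x$, while the unconditional QTE at the same index is about 1.5, which illustrates that the two objects need not coincide when effects are heterogeneous across $\tau$.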
Extending the discussion to allow for nonlinear parametric specifications of the potential outcome quantile functions or to estimation at a small number of quantile indices that are widely spaced is straightforward. In some applications, we may be interested in understanding QTE across a range of quantile indices, say $\tau \in [\delta, 1 - \delta]$ for some $\delta > 0$. Chernozhukov and Hansen (2006) explicitly consider this case and provide uniform convergence results which allow for inference about a variety of hypotheses surrounding the behavior of QTE viewed as a function of $\tau$, such as tests of monotonicity of treatment effects or tests that treatment effects are uniformly 0 across a range of $\tau$. Finally, we note that Chernozhukov et al. (2007), Horowitz and Lee (2007), Chen and Pouzo (2009), Chen and Pouzo (2012), and Gagliardini and Scaillet (2012) consider fully nonparametric approaches to estimating structural quantile models.

The most direct way to estimate the parameters of the linear IVQR model is to note that the main implication of the model, equation (3), implies unconditional moment conditions

$$E\left[\left(\tau - 1(Y - D'\alpha - X'\beta \le 0)\right)\Psi\right] = 0, \tag{13}$$

where $\Psi := \Psi(X, Z)$ is a vector of functions of the instruments and exogenous variables. Supposing that $\alpha$ is an $s \times 1$ vector and $\beta$ is a $k \times 1$ vector, $\Psi$ is an $r \times 1$ vector with $r \ge k + s$. Let, for $\theta := (\alpha, \beta)$ and $V := (Y, D, X, Z)$,

$$g_\tau(V, \theta) = \left(\tau - 1(Y - D'\alpha - X'\beta \le 0)\right)\Psi.$$

With a given set of instruments, $\Psi$, and observables $\{V_i\}_{i=1}^N = \{Y_i, D_i, X_i, Z_i\}_{i=1}^N$, one may then form the sample analog of the moment in equation (13),

$$\hat{g}_N(\theta) = \frac{1}{N}\sum_{i=1}^N g_\tau(V_i, \theta), \tag{14}$$

and estimate the parameters $\theta = (\alpha', \beta')'$ by generalized method of moments (GMM) as

$$\hat\theta = (\hat\alpha', \hat\beta')' = \arg\min_{\theta \in \Theta} m_N(\theta) \tag{15}$$

for

$$m_N(\theta) := N\, \hat{g}_N(\theta)'\, \Omega_N\, \hat{g}_N(\theta),$$

where $\Omega_N$ is the GMM weighting matrix that will typically be set as

$$\Omega_N = \left(\tau(1 - \tau)\, \frac{1}{N}\sum_{i=1}^N \Psi_i \Psi_i'\right)^{-1}.$$

Footnote: A natural choice of instruments would be $\Psi = (Z', X')'$, though the instruments and GMM weighting matrix could be chosen to produce a pointwise efficient procedure following Chamberlain (1987).
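As a rough illustration, the criterion in (15) can be evaluated numerically as sketched below. This is a minimal sketch in Python with NumPy rather than code from the chapter: the helper name `gmm_criterion`, the simulated design, and the choice $\Psi = (Z', X')'$ are assumptions made purely for the example.

```python
import numpy as np


def gmm_criterion(theta, y, d, x, z, tau):
    """Evaluate m_N(theta) = N * g_N(theta)' Omega_N g_N(theta).

    Hypothetical helper: d is N x s, x is N x k, z is N x (r - k),
    and the instruments are taken to be Psi = (Z', X')'.
    """
    n, s = d.shape
    alpha, beta = theta[:s], theta[s:]
    psi = np.column_stack([z, x])
    resid = y - d @ alpha - x @ beta
    # Row i holds g_tau(V_i, theta) = (tau - 1(resid_i <= 0)) * Psi_i.
    g = (tau - (resid <= 0.0).astype(float))[:, None] * psi
    g_bar = g.mean(axis=0)
    # Inverse of the weighting matrix: tau (1 - tau) * (1/N) sum Psi_i Psi_i'.
    omega_inv = tau * (1.0 - tau) * (psi.T @ psi) / n
    return n * float(g_bar @ np.linalg.solve(omega_inv, g_bar))


# Simulated data (assumed design): D is endogenous because its error v is
# correlated with the structural error u, while Z is independent of u.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 1))
u = rng.normal(size=n)
v = 0.7 * u + np.sqrt(1.0 - 0.7**2) * rng.normal(size=n)
d = z + v[:, None]
x = np.ones((n, 1))
y = 1.0 + d[:, 0] + u  # structural median: q(0.5, d, x) = d * 1 + x * 1

theta_true = np.array([1.0, 1.0])  # (alpha, beta) at tau = 0.5
theta_far = np.array([2.0, 1.0])
m_true = gmm_criterion(theta_true, y, d, x, z, tau=0.5)
m_far = gmm_criterion(theta_far, y, d, x, z, tau=0.5)
```

In this design the criterion is small at the true parameter value and large at a value far from it. Because the indicator makes the criterion a step function of $\theta$, gradient-based optimizers are not directly applicable to it.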
Maintaining sufficient conditions for point identification as in Chernozhukov and Hansen (2005, 2006, 2013) and assuming that a suitable solution to the GMM optimization problem (15) can be found, asymptotic properties of $\hat\theta(\tau)$ would then follow from standard results for GMM with non-smooth moment conditions as in Newey and McFadden (1994); see Abadie (1995) and Chernozhukov and Hong (2003). We note that if the GMM problem (13) is overidentified, overidentification-type tests can be used to assess the joint validity of the underlying assumptions.

The chief difficulty in implementing estimation based on (15) is that the function being minimized is both non-smooth and non-convex in general. We also note that in many applications, $s$ will be small, often one, but $k$ may be quite large. Solving (15) then involves optimizing a non-smooth, non-convex function over $s + k$ arguments where $s + k$ may be quite large. Directly solving this problem thus poses a substantial computational challenge and has led to the adoption of different approaches to estimating the parameters of the IVQR model.

Within the conventional GMM framework, one option is to take the quasi-Bayesian approach of Chernozhukov and Hong (2003); see also Wang and Yang (2016) in this handbook for a review of subsequent work on related methods. The Chernozhukov and Hong (2003) approach uses the GMM criterion function to form a "quasi-likelihood",

$$L_N(\theta) = \exp\left(-\frac{1}{2}\, N\, \hat{g}_N(\theta)'\, \Omega_N\, \hat{g}_N(\theta)\right),$$

which, when coupled with a prior density $\pi(\theta)$ over model parameters $\theta$, defines a "quasi-posterior" density for $\theta$:

$$\pi_N(\theta) = \frac{L_N(\theta)\, \pi(\theta)}{\int L_N(\tilde\theta)\, d\pi(\tilde\theta)} \propto L_N(\theta)\, \pi(\theta).$$

Rather than try to solve the optimization problem (15), one can then use MCMC sampling to attempt to explore the implied quasi-posterior distribution.
Chernozhukov and Hong (2003) show that measures of central tendency from the quasi-posterior, such as the quasi-posterior mean,

$$\hat\theta = (\hat\alpha', \hat\beta')' = \int \theta\, d\pi_N(\theta),$$

and the quasi-posterior median, are consistent for model parameters with the same asymptotic distribution as the solution to (15). Chernozhukov and Hong (2003) also demonstrate that valid frequentist confidence intervals may be obtained by taking quasi-posterior quantiles. For example, a frequentist 95% confidence interval may be constructed by taking the 2.5th and 97.5th percentiles of the quasi-posterior distribution. This approach bypasses the need to optimize a non-convex and non-smooth criterion at the cost of needing to design a sampler that adequately explores the quasi-posterior in a reasonable amount of computation time.

A second option is to directly smooth the GMM criterion function as in Kaplan and Sun (2016), building upon ideas in Amemiya (1982) and Horowitz (1998). Specifically, one modifies the moment condition (14) to

$$\hat{g}^{h_N}_N(\theta) = \frac{1}{N}\sum_{i=1}^N \left(\tau - G_{h_N}(Y_i - D_i'\alpha - X_i'\beta)\right)\Psi_i, \tag{16}$$

by smoothing the indicator function, where $G_h(\cdot)$ denotes a smoothing function with smoothing parameter $h$. $G_h(\cdot)$ can be defined as the survival function associated with any kernel function $K_h(\cdot)$, i.e. $G_h(u) = \int_u^\infty K_h(v)\, dv$, that satisfies regularity conditions provided in Kaplan and Sun (2016). One can then proceed to estimate model parameters by replacing $\hat{g}_N(\theta)$ in (15) with $\hat{g}^{h_N}_N(\theta)$ and applying any optimizer which is appropriate for smooth, non-convex optimization problems or the quasi-Bayesian approach described above. Solving the smoothed problem can offer some computational gains relative to attempting to solve the original problem, though non-convexities remain after smoothing. The resulting estimator is first-order-equivalent to the GMM estimator for the original problem. The estimator can, however, enjoy higher-order improved performance.
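To make the smoothing step in (16) concrete, the sketch below replaces the indicator with the survival function of a Gaussian kernel. This is an illustration only: the Gaussian kernel, the function names, and the simulated data are assumptions made here, and a Gaussian kernel is not claimed to satisfy the regularity conditions of Kaplan and Sun (2016).

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)


def smoothed_indicator(u, h):
    """G_h(u) = int_u^inf K_h(v) dv for a Gaussian kernel with bandwidth h.

    Equals 1 - Phi(u / h) and approaches the indicator 1(u <= 0) as h -> 0.
    """
    return 0.5 * _erfc(np.asarray(u, dtype=float) / (h * math.sqrt(2.0)))


def smoothed_moments(theta, y, d, x, psi, tau, h):
    """Sample moments as in (16): mean of (tau - G_h(residual_i)) * Psi_i."""
    s = d.shape[1]
    resid = y - d @ theta[:s] - x @ theta[s:]
    return ((tau - smoothed_indicator(resid, h))[:, None] * psi).mean(axis=0)


def raw_moments(theta, y, d, x, psi, tau):
    """Unsmoothed sample moments as in (14), for comparison."""
    s = d.shape[1]
    resid = y - d @ theta[:s] - x @ theta[s:]
    return ((tau - (resid <= 0.0))[:, None] * psi).mean(axis=0)


# Simulated data (assumed design) and an arbitrary trial value of theta.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=(n, 1))
x = np.ones((n, 1))
d = z + rng.normal(size=(n, 1))
y = 1.0 + d[:, 0] + rng.normal(size=n)
psi = np.column_stack([z, x])
theta = np.array([0.8, 1.1])

g_smooth = smoothed_moments(theta, y, d, x, psi, tau=0.5, h=0.1)
g_tiny_h = smoothed_moments(theta, y, d, x, psi, tau=0.5, h=1e-10)
g_raw = raw_moments(theta, y, d, x, psi, tau=0.5)
```

With a very small bandwidth the smoothed moments coincide numerically with the unsmoothed ones, while a moderate bandwidth yields moments that are differentiable in $\theta$.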
Kaplan and Sun (2016) provide a plug-in approach to choosing the smoothing parameter $h_N$ and also demonstrate that the estimated parameters obtained from solving the smoothed problem may perform better in small samples than those from solving the unsmoothed problem or the inverse quantile regression discussed in Section 3.2.

Rather than work directly with moment condition (13), Chernozhukov and Hansen (2006) and Chernozhukov and Hansen (2008) take a different approach which they label the inverse quantile regression (IQR). The IQR is based on the observation that (3) coupled with the linear quantile model (12) implies that the $\tau$th quantile of $Y - D'\alpha$ conditional on covariates $X$ and instruments $Z$ is equal to $X'\beta(\tau)$:

$$Q_{Y - D'\alpha}(\tau \mid X, Z) = X'\beta + Z'\gamma \quad \text{with } \gamma \equiv 0. \tag{17}$$

That is, at the true value of the coefficient vector on the endogenous variables $\alpha$, the conventional linear $\tau$-quantile regression of $Y - D'\alpha$ onto $X$ and $Z$ would yield coefficients on the instruments of exactly 0 in the population. This observation then suggests an estimation approach based on concentrating $X$ out of the problem using conventional quantile regression, which is convex and can be solved very quickly, and then solving a lower dimensional non-convex optimization problem over only the dimension of $D$ to find $\hat\alpha$.

Specifically, the IQR procedure works as follows. Let $a$ denote an arbitrary hypothesized value for $\alpha$. Using the hypothesized value $a$, estimate coefficients $\beta(a)$ and $\gamma(a)$ from the model $Q_{Y - D'a}(\tau \mid X, Z) = X'\beta(a) + Z'\gamma(a)$ by running the ordinary linear $\tau$-quantile regression of $Y - D'a$ onto $X$ and $Z$. Let $\hat\beta(a)$ and $\hat\gamma(a)$ denote the resulting estimators of $\beta(a)$ and $\gamma(a)$. Also, let $\hat\Omega_N(a)$ denote the estimated covariance matrix of $\sqrt{N}(\hat\gamma(a) - \gamma(a))$, and note that this covariance matrix is available in any common implementation of the ordinary quantile regression.
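The steps just described can be sketched as a grid search over hypothesized values $a$, using the Wald-type criterion in (19) for testing $\gamma(a) = 0$. The sketch assumes a scalar endogenous variable and uses the `QuantReg` class from statsmodels as one common quantile regression implementation; the function names, grid, and simulated data are hypothetical choices made for illustration and are not taken from the chapter.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg


def iqr_wald(y, d, x, z, a, tau):
    """Wald statistic for gamma(a) = 0 in the tau-quantile regression of
    y - d * a on (x, z); also returns beta_hat(a). Assumes scalar d."""
    exog = np.column_stack([x, z])
    res = QuantReg(y - d * a, exog).fit(q=tau)
    k = x.shape[1]
    gamma = res.params[k:]
    # cov_params() is the estimated covariance of the coefficient estimates,
    # i.e. Omega_hat_N(a) / N, so no extra factor N is needed below.
    cov_gamma = res.cov_params()[k:, k:]
    wald = float(gamma @ np.linalg.solve(cov_gamma, gamma))
    return wald, res.params[:k]


def iqr_grid(y, d, x, z, grid, tau):
    """Minimize the Wald statistic over a grid of hypothesized values."""
    stats = np.array([iqr_wald(y, d, x, z, a, tau)[0] for a in grid])
    a_hat = float(grid[int(np.argmin(stats))])
    beta_hat = iqr_wald(y, d, x, z, a_hat, tau)[1]
    return a_hat, beta_hat, stats


# Simulated data (assumed design): v is correlated with u, so d is
# endogenous, while the instrument z is independent of u.
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)
u = rng.normal(size=n)
v = 0.7 * u + np.sqrt(1.0 - 0.7**2) * rng.normal(size=n)
d = z + v
y = 1.0 + 1.0 * d + u  # alpha = 1 and intercept 1 at the median
x = np.ones((n, 1))

grid = np.linspace(0.0, 2.0, 41)
a_hat, beta_hat, stats = iqr_grid(y, d, x, z, grid, tau=0.5)
```

Only the low-dimensional grid over $a$ involves a non-convex search; each evaluation is an ordinary quantile regression. The vector `stats` can be retained, since the same statistics reappear in the weak-identification robust inference discussed in Section 3.3.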
We can then define the IQR estimator of $\alpha$ as

$$\hat\alpha = \arg\min_{a \in \mathcal{A}} W_N(a), \tag{18}$$

where

$$W_N(a) := N\, \hat\gamma(a)'\, \hat\Omega_N(a)^{-1}\, \hat\gamma(a). \tag{19}$$

Given $\hat\alpha$, we can then estimate $\beta$ as $\hat\beta(\hat\alpha)$.

In terms of point estimation, the main virtue of the IQR is that, by concentrating out the coefficients on exogenous variables $X$, it produces a non-convex optimization problem over only the parameters $\alpha$. In many applications, the dimension of $D$ is small, so one can approach the non-convex optimization problem using highly robust optimization procedures that deal effectively with objectives with many local optima. Chernozhukov and Hansen (2006) recommend using a grid-search to solve (18) though other approaches are certainly available. Using a grid-search is particularly appealing when coupled with weak-identification robust inference as discussed in Section 3.3.

Chernozhukov and Hansen (2006) analyze the properties of $(\hat\alpha(\tau)', \hat\beta(\tau)')'$ under assumptions that guarantee strong identification. They verify asymptotic normality of the estimator, provide a consistent estimator of the asymptotic variance, and show how instruments and observation weights can be chosen to produce an efficient estimator of the coefficients for a single quantile following Chamberlain (1987). Chernozhukov and Hansen (2006) also analyze the behavior of the process $(\hat\alpha(\tau)', \hat\beta(\tau)')'$ not just at a point but viewed as a function of $\tau$, providing uniform convergence results and discussing in detail applications of these convergence results to testing hypotheses about the behavior of $(\alpha(\tau)', \beta(\tau)')'$ across the index $\tau$.

It is useful to interpret IQR as first-order-equivalent to a particular GMM estimator, where we first profile out the coefficients on exogenous variables. To this end, let us define

$$g_\tau(V, \alpha; \beta, \delta) = \left(\tau - 1(Y \le D'\alpha + X'\beta)\right)\Psi(\alpha, \delta(\alpha)), \tag{20}$$

with "instrument"

$$\Psi(\alpha, \delta(\alpha)) := (Z - \delta(\alpha) X). \tag{21}$$

In (21),

$$\delta(\alpha) = M(\alpha)\, J^{-1}(\alpha),$$

where $\delta$ is a matrix parameter,

$$M(\alpha) = E\left[Z X' f_\varepsilon(0 \mid X, Z)\right], \qquad J(\alpha) = E\left[X X' f_\varepsilon(0 \mid X, Z)\right],$$

and $f_\varepsilon(0 \mid X, Z)$ is the conditional density of $\varepsilon = Y - D'\alpha - X'\beta(\alpha)$, where $\beta(\alpha)$ is defined by

$$E\left[\left(\tau - 1(Y \le D'\alpha + X'\beta(\alpha))\right) X\right] = 0.$$

To proceed with estimation, for a hypothesized value $a$, we first profile out the coefficients on the exogenous variables as in IQR,

$$\hat\beta(a) = \arg\min_{b \in \mathcal{B}} \frac{1}{N}\sum_{i=1}^N \rho_\tau(Y_i - D_i'a - X_i'b). \tag{22}$$

We may then plug the solution of (22) into (20) to form

$$\hat{g}_N(a) = \frac{1}{N}\sum_{i=1}^N g_\tau\left(V_i, a; \hat\beta(a), \hat\delta(a)\right), \tag{23}$$

where $\hat\delta(a) = \hat{M}(a)\, \hat{J}^{-1}(a)$, for

$$\hat{M}(a) = \frac{1}{N h_N}\sum_{i=1}^N Z_i X_i'\, K_{h_N}\left(Y_i - D_i'a - X_i'\hat\beta(a)\right), \qquad \hat{J}(a) = \frac{1}{N h_N}\sum_{i=1}^N X_i X_i'\, K_{h_N}\left(Y_i - D_i'a - X_i'\hat\beta(a)\right),$$

and $K_{h_N}(\cdot)$ a kernel function with bandwidth $h_N$. Then, we consider the GMM estimator based on the concentrated moments (23):

$$\hat\alpha(\tau) = \arg\min_{a \in \mathcal{A}} m_N(a), \quad \text{for } m_N(a) := N\, \hat{g}_N(a)'\, \hat\Sigma(a, a)^{-1}\, \hat{g}_N(a). \tag{24}$$

$\hat\Sigma(a, a)$ in $m_N(a)$ is an estimator of the covariance function of the sample concentrated moment functions (23), such as

$$\hat\Sigma(a_1, a_2) = \frac{1}{N}\sum_{i=1}^N g_\tau\left(V_i, a_1; \hat\beta(a_1), \hat\delta(a_1)\right) g_\tau\left(V_i, a_2; \hat\beta(a_2), \hat\delta(a_2)\right)'. \tag{25}$$

The estimator $\hat\alpha$ is first-order equivalent to the estimator $\tilde\alpha$ which employs the moment function

$$g^*_\tau(V, \alpha) = \left(\tau - 1(Y \le D'\alpha + X'\beta_0)\right)\Psi(\alpha_0, \delta_0(\alpha_0)),$$

in which the nuisance parameters are fixed at their true values. The estimator $\tilde\alpha$ uses

$$\hat{g}_N(\alpha) = \frac{1}{N}\sum_{i=1}^N g^*_\tau(V_i, \alpha) \tag{26}$$

where $\hat\Sigma(\alpha, \alpha) = E\left[g^*_\tau(V_i, \alpha)\, g^*_\tau(V_i, \alpha)'\right]$. This equivalence holds because the moments possess the Neyman orthogonality property that we discuss later. Moreover, by examining the first-order properties of the IQR estimator we can conclude that $\tilde\alpha$ and IQR are first-order equivalent.

The good behavior of asymptotic approximation results for the point estimators provided in Sections 3.1-3.2 relies on strong identification of the model parameters as discussed in Section 2.2. Because checking these conditions may be difficult, it is useful to have inference procedures that are robust to weak- or non-identification.

Chernozhukov and Hansen (2008) present a simple weak-identification robust inference procedure that results naturally from the IQR estimator. The basic idea underlying this procedure is exactly the relation (17), which states that the instruments $Z$ should have no explanatory power in the conventional $\tau$-quantile regression of $Y - D'\alpha$ on $X$ and $Z$ at the true value of the structural parameter $\alpha$.
Thus, a valid test of the hypothesis that $\alpha = a$ for some hypothesized $a$ can be obtained by considering a test of the hypothesis that $\gamma(a) = 0$ for $\gamma(a)$ denoting the population value of the $\tau$-quantile regression coefficients defined in Section 3.2. Also, note that $W_N(a)$ in (19) is simply the standard Wald statistic for testing $\gamma(a) = 0$ and that, at the true value $\alpha_0$, $W_N(\alpha_0)$ converges in distribution to a $\chi^2_{\dim(Z)}$ regardless of the strength of identification of $\alpha$; see Chernozhukov and Hansen (2008) for details. It then follows that a valid $100(1 - p)\%$ confidence region for $\alpha$ may be constructed as the set

$$\left\{a \in \mathcal{A} : W_N(a) \le c_{1-p}\right\} \tag{27}$$

where $c_{1-p}$ is such that $P\left[\chi^2_{\dim(Z)} > c_{1-p}\right] = p$, and the set may be approximated numerically by considering $a$'s in the grid $\{a_j, j = 1, ..., J\}$. Thus, a natural byproduct of solving (18) through a grid search is a confidence set for the structural parameter $\alpha$ that is valid regardless of the strength of identification of the parameter. (Footnote: The same statement would also hold for the GMM objective function based on (23) discussed in Section 3.2.1.) We note that this procedure could also be adapted to be used with the orthogonal scores defined in Section 4.1 to provide weak-identification robust inference in settings with high-dimensional $X$ or other settings where robustness to estimation of the nuisance parameter $\beta$ is a major concern.

The approach of Chernozhukov and Hansen (2008) outlined above is in the spirit of the weak identification robust procedure of Anderson and Rubin (1949). The procedure is relatively simple to implement, but suffers from the same well-known lack of power as other [...]
$$QLR_N(a) = m_N(a) - \inf_{\tilde{a} \in \mathcal{A}} m_N(\tilde{a}) \tag{28}$$

where $m_N(a)$ is the GMM objective function (24).

Under weak identification, the distribution of $QLR_N(a)$ is non-standard and depends on a nuisance function that is not consistently estimable. Andrews and Mikusheva (2016) provide a sufficient statistic (in LeCam's Gaussian limit experiment)

$$S(a) = \sqrt{N}\left(\hat{g}_N(a) - \hat\Sigma(a, \alpha_0)\, \hat\Sigma(\alpha_0, \alpha_0)^{-1}\, \hat{g}_N(\alpha_0)\right)$$

for this functional nuisance parameter, where $\hat{g}_N(a)$ and $\hat\Sigma(a_1, a_2)$ are defined in (23) and (25). Andrews and Mikusheva (2016) also outline a procedure to simulate the distribution of $QLR_N(a)$ conditional on $S(a)$ that proceeds as follows. First, draw $\zeta^*_b \sim N\left(0, \hat\Sigma(\alpha_0, \alpha_0)\right)$ for $b = 1, ..., B$ for a large number $B$. For each $\zeta^*_b$, the QLR statistic for that draw is then calculated as

$$QLR^*_{N,b}(a) = m^*_{N,b}(a) - \inf_{\tilde{a} \in \mathcal{A}} m^*_{N,b}(\tilde{a})$$

where

$$m^*_{N,b}(a) = N\, \hat{g}^*_{N,b}(a)'\, \hat\Sigma(a, a)^{-1}\, \hat{g}^*_{N,b}(a)$$

for

$$\hat{g}^*_{N,b}(a) = S(a) + \hat\Sigma(a, \alpha_0)\, \hat\Sigma(\alpha_0, \alpha_0)^{-1}\, \zeta^*_b.$$

The simulated distribution then provides an appropriate critical value, $c_{1-p}(S(a))$, for performing a valid $p$-level test of the null hypothesis that $\alpha_0 = a$ by rejecting when $QLR_N(a) > c_{1-p}(S(a))$. It then follows that a valid $100(1 - p)\%$ confidence region for $\alpha_0$ is given by $\left\{a \in \mathcal{A} : QLR_N(a) \le c_{1-p}(S(a))\right\}$.

The inference procedures reviewed in the previous sections all rely on asymptotic approximations. Chernozhukov et al. (2009) provide a finite sample inference approach which can also be used if the validity of the assumptions necessary to justify these approximations is questionable and is valid in setups with weak or set identification.

Their approach makes use of the fact that, under the assumptions of the IVQR model, the event $\{Y \le q(\tau, D, X)\}$ conditional on $(Z, X)$ is distributed exactly as a Bernoulli($\tau$) random variable regardless of the sample size.
This random variable depends only on $\tau$, which is known, and so is pivotal in finite samples. For the GMM objective function $m_N(\theta)$ defined in (15), this implies that $m_N(\theta_0) \overset{d}{=} \tilde{m}_N$ conditional on $\{X_i, Z_i\}_{i=1}^N$, where

$$\tilde{m}_N := \left(\frac{1}{\sqrt{N}}\sum_{i=1}^N (\tau - B_i) \cdot \Psi_i\right)' \Omega_N \left(\frac{1}{\sqrt{N}}\sum_{i=1}^N (\tau - B_i) \cdot \Psi_i\right)$$

and $\{B_i\}_{i=1}^N$ are i.i.d. Bernoulli random variables that are independent of $\{X_i, Z_i\}_{i=1}^N$ and have $E[B_i] = \tau$. This result provides the finite sample distribution of the GMM function $m_N(\theta)$ at $\theta = \theta_0$, which does not depend on any unknown parameters. Given the finite sample distribution of $m_N(\theta_0)$, a $p$-level test of the null hypothesis that $\theta_0 = \theta$ is given by the rule that rejects the null if $m_N(\theta) > c_{1-p}$, where the critical value $c_{1-p}$ is the $(1 - p)$th quantile of $\tilde{m}_N$. It then follows that a valid $100(1 - p)\%$ joint confidence set for $\theta_0$ is given by $\left\{\theta \in \Theta : m_N(\theta) \le c_{1-p}\right\}$.

We note that inference is simultaneous on all components of $\theta$ and that for joint inference the approach is not conservative. Inference about subcomponents of $\theta$ such as $\alpha$ may be made by projections and may be conservative.

The chief difficulty with the finite sample approach is computational. Implementing the approach requires inversion of the function $m_N(\theta)$, which may be quite difficult if the number of parameters is large. To alleviate this problem, Chernozhukov et al. (2009) develop suitable MCMC algorithms.

Here we deal with the case where we have high-dimensional covariates. Such cases are common in current high-dimensional data sets where one may see very many potential control variables. High-dimensional covariates also arise in semiparametric problems; for example, we may be interested in a partially linear structural quantile model

$$q(\tau, d, w) = \alpha(\tau) d + g(\tau, w)$$

where $W$ is a low-dimensional set of variables and we approximate $g(\tau, w) \approx x'\beta(\tau)$ using a collection of approximating functions $x = h(w)$.
In settings with high-dimensional $X$, estimation of $\beta(\tau)$ may contaminate estimation of the parameters of interest, $\alpha(\tau)$, leading to a breakdown of estimation and inference based directly on (13). The potential for contamination is especially acute in high-dimensional settings where some form of regularization will be used to make informative estimation feasible but may arise more generally.

Due to the potentially poor finite sample performance of estimators based directly on (13), one might prefer to base estimation and inference on "orthogonal" moment conditions that are relatively insensitive to estimation of the nuisance parameters $\beta$. Specifically, we may prefer to base estimation and inference for $\alpha$ on moment functions

$$g(V, \alpha; \eta), \quad \text{where } V = (Y, D, Z, X)$$

and $\eta$ denotes nuisance parameters with true values $\eta_0$ that include $\beta_0$ as a sub-component, that identify $\alpha_0$ via

$$E[g(V, \alpha_0; \eta_0)] = 0 \tag{29}$$

and obey the Neyman orthogonality condition:

$$\partial_\eta E[g(V, \alpha_0; \eta)]\Big|_{\eta = \eta_0} = 0 \tag{30}$$

where $\partial_\eta$ denotes a functional derivative operator. Condition (30) is the key orthogonality condition that ensures that the moment conditions defining $\alpha_0$ are locally insensitive to perturbations in the nuisance parameters. This property results in the first-order properties of estimation and inference of $\alpha_0$ based on sample analogs to (29) being insensitive to estimation of nuisance functions as long as sufficiently high-quality estimators of the nuisance functions are available.

The idea of using orthogonal estimating equations goes back at least to Neyman (1959) and Neyman (1979) where they were used in construction of Neyman's celebrated $C(\alpha)$-statistic. The use of moment conditions satisfying the orthogonality condition (30) is crucial for establishing good properties of semi-parametric estimators in modern, high-dimensional estimation settings when regularized estimation or other machine learning tools are used in estimation of nuisance functions; see, e.g., Belloni et al.
(2016), Chernozhukov et al. (2015b), and Chernozhukov et al. (2016).

The orthogonal moment functions for the IVQR setting are given by

$$g_\tau(V, \alpha, \eta) = \left(\tau - 1(Y \le D'\alpha + X'\beta)\right)\Psi(\alpha, \delta(\alpha)),$$

where $\Psi(\alpha, \delta(\alpha))$ and $\delta(\alpha)$ are defined in Section 3.2.1. The nuisance parameter and its true value are then given by

$$\eta := (\beta, \delta(\alpha)), \quad \text{and} \quad \eta_0 := (\beta_0, \delta_0(\alpha_0)).$$

Observe that the Neyman orthogonality condition holds for these moment conditions because, under appropriate smoothness conditions,

$$\partial_\beta E[g(V, \alpha_0; \eta)]\Big|_{\eta = \eta_0} = M(\alpha_0) - M(\alpha_0)\, J^{-1}(\alpha_0)\, J(\alpha_0) = 0,$$

$$\partial_\delta E[g(V, \alpha_0; \eta)]\Big|_{\eta = \eta_0} = E\left[\left(\tau - 1(Y \le D'\alpha_0 + X'\beta_0)\right) X\right] = 0.$$

We start similarly to the IQR estimator by first profiling out the coefficients on exogenous variables using an $\ell_1$-penalized quantile regression estimator to define

$$\hat\beta(a) = \arg\min_{b \in \mathcal{B}} \frac{1}{N}\sum_{i=1}^N \rho_\tau(Y_i - D_i'a - X_i'b) + \lambda \sum_{j=1}^{\dim(b)} \psi_j |b_j| \tag{31}$$

for a hypothesized value $a$. We then estimate

$$\hat{M}(a) = \frac{1}{N h_N}\sum_{i=1}^N Z_i X_i'\, K_{h_N}\left(Y_i - D_i'a - X_i'\hat\beta(a)\right), \qquad \hat{J}(a) = \frac{1}{N h_N}\sum_{i=1}^N X_i X_i'\, K_{h_N}\left(Y_i - D_i'a - X_i'\hat\beta(a)\right),$$

for $K_{h_N}(\cdot)$ a kernel function with bandwidth $h_N$ as before. Since $\hat{J}(a)$ is high-dimensional and is not invertible, we may estimate row-components $\delta_j(a)$ of matrix $\delta(a)$ by solving the $\ell_1$-regularized problem

$$\hat\delta_j(a) = \arg\min_{\delta}\ \frac{1}{2}\, \delta' \hat{J}(a)\, \delta - \hat{M}_j(a)\, \delta + \vartheta \|\delta\|_1,$$

where $\hat{M}_j(a)$ is the $j$-th row of $\hat{M}(a)$, interpreted as a row vector itself, and $\vartheta$ is a penalty level. The solution $\hat\delta_j(a)$ obeys the Karush-Kuhn-Tucker condition

$$\left\|\hat\delta_j(a)'\hat{J}(a) - \hat{M}_j(a)\right\|_\infty \le \vartheta, \quad \forall j, \tag{32}$$

so we may think of $\hat\delta_j(a)$ as a regularized estimator of $M_j(a)\, J^{-1}(a)$.

Alternatively, we can obtain the regularized estimator via the Dantzig form of Lasso by minimizing the $\ell_1$ norm of $\hat\delta(a)$ subject to the above constraints (32).

We may then plug in the solution of (31) to form a concentrated sample moment function analogous to (14) as

$$\hat{g}_N(a) = \frac{1}{N}\sum_{i=1}^N \left(\tau - 1\left(Y_i - D_i'a - X_i'\hat\beta(a) \le 0\right)\right)\Psi_i(a, \hat\delta(a)), \tag{33}$$

where $\Psi_i(a, \hat\delta(a)) = Z_i - \hat\delta(a) X_i$. These concentrated moments can be used to set up the continuously-updated GMM estimator:

$$\hat\alpha = \arg\min_{a \in \mathcal{A}} N\, \hat{g}_N(a)'\, \hat\Sigma(a, a)^{-1}\, \hat{g}_N(a),$$

where again $\hat\Sigma(a, a)$ is an estimator of the covariance function of the sample concentrated moment functions (33). The estimator $\hat\alpha$ would then follow standard properties of the infeasible GMM estimator that replaced the estimators $\hat\beta(a)$ and $\hat\delta(a)$ with their true values $\beta_0$ and $\delta_0(\alpha_0)$ as long as instruments are low dimensional and identification is strong. If the set of instruments were also high-dimensional, further regularization would be called for to make reliable estimation and inference feasible.

We can also directly use the concentrated moments to set up standard Anderson-Rubin-type inference for $\alpha$ under weak or partial identification as in Section 3.3.
Similarly, we could base inference from more refined approaches, such as Andrews and Mikusheva (2016), on the concentrated moments. Indeed, we can use these concentrated moments to form a quasi-likelihood ratio (QLR) statistic as

$$QLR_N(a) = N\, \hat{g}_N(a)'\, \hat\Sigma(a, a)^{-1}\, \hat{g}_N(a) - \inf_{\tilde{a} \in \mathcal{A}} N\, \hat{g}_N(\tilde{a})'\, \hat\Sigma(\tilde{a}, \tilde{a})^{-1}\, \hat{g}_N(\tilde{a}). \tag{34}$$

Because of the orthogonality property, estimation of the nuisance parameters does not affect the first-order behavior of the empirical moments, so inference based on (34) falls back exactly in the setting of Andrews and Mikusheva (2016). One could then employ their approach to compute the critical values for $QLR_N(a)$ conditional on a sufficient statistic, $c_{1-p}(QLR_N(a))$. It then follows that a valid $100(1 - p)\%$ confidence region for $\alpha$ may be constructed by considering $a$'s in the grid $\{a_j, j = 1, ..., J\}$ exactly as in approximating (27).

In this chapter, we have reviewed the structural IVQR model developed in Chernozhukov and Hansen (2005) which can be used to estimate causal quantile effects in the presence of endogeneity. The model makes use of instrumental variables that satisfy conventional independence and relevance conditions from the nonlinear instrumental variables literature. Specifically, instruments are assumed to be independent of unobservables associated with potential outcomes but related to endogenous right-hand-side variables in the model. The presence of instruments alone is insufficient to identify QTE, and the IVQR model imposes an additional condition on structural unobservables, termed rank similarity, that restricts the distribution of unobservables in potential outcomes across different potential states of the endogenous variables. Under these conditions, an IV-style moment condition can be derived which then provides a basis for identification and estimation of QTE.
We provided two concrete examples of economic models that fall within the IVQR framework.

We then reviewed leading approaches to estimating model parameters and performing inference for QTE within the IVQR model based on the moment conditions implied by the model. Estimation and inference are complicated by the non-smooth and non-convex nature of the IVQR moment conditions. We discussed estimation and inference approaches that attempt to alleviate this issue. We also reviewed approaches to inference which remain valid under weak or even non-identification.

There are, of course, many open areas for research in quantile models with endogeneity. As discussed in Section 2.5, Abadie et al. (2002) and Imbens and Newey (2009) offer alternative approaches to identifying QTE by imposing alternate sets of assumptions to those used in the IVQR model. These approaches and the IVQR model are non-nested, and further understanding their connections may be interesting. Wüthrich (2014) provides a contribution in this direction by showing the connection between the estimands of both models within the structure of the Abadie et al. (2002) framework. It would also be interesting to analyze the properties of the IVQR estimands when some of the underlying assumptions are violated. Towards this end, Wüthrich (2014) provides a characterization of QTE estimands based on the IVQR model with binary treatments in the absence of rank similarity. Another topic that may deserve further consideration is the systematic analysis of estimation and inference based on the orthogonal moment equations sketched in Section 4.1, especially in high-dimensional settings. We also note that the IVQR model may be useful for uncovering structural objects even if quantile effects are not the chief objects of interest; see, for example, Berry and Haile (2014). It may be interesting to further explore application of the IVQR model and related estimation methods in structural economic applications. Finally, a potentially interesting but less explored area may be to think about quantile-like quantities for multivariate outcomes with endogenous covariates.
References
Abadie, A., October 1995. Changes in Spanish labor income structure during the 1980s: A quantile regression approach, CEMFI Working Paper No. 9521.
Abadie, A., Angrist, J., Imbens, G., 2002. Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70 (1), 91–117.
Amemiya, T., 1982. Two stage least absolute deviations estimators. Econometrica 50, 689–711.
Anderson, T. W., Rubin, H., 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20, 46–63.
Andrews, D. W. K., Shi, X., 2013. Inference based on conditional moment inequalities. Econometrica 81 (2), 609–666.
Andrews, I., Mikusheva, A., 2016. Conditional inference with a functional nuisance parameter. Econometrica 84 (4), 1571–1612.
Belloni, A., Chernozhukov, V., Fernández-Val, I., Hansen, C., 2016. Program evaluation with high-dimensional data, forthcoming Econometrica. URL https://arxiv.org/abs/1311.2645
Berry, S. T., Haile, P. A., 2014. Identification in differentiated products markets using market level data. Econometrica 82 (5), 1749–1797.
Chamberlain, G., 1987. Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics 34 (3), 305–334.
Chen, X., Pouzo, D., 2009. Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics 152 (1), 46–60.
Chen, X., Pouzo, D., 2012. Estimation of nonparametric conditional moment models with possibly nonsmooth moments. Econometrica 80 (1), 277–322.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., 2016. Double machine learning for treatment and causal parameters. arXiv:1608.00060. URL https://arxiv.org/abs/1608.00060
Chernozhukov, V., Fernández-Val, I., Melly, B., 2013a. Inference on counterfactual distributions. Econometrica 81 (6), 2205–2268.
Chernozhukov, V., Fernández-Val, I., Kowalski, A. E., 2015a. Quantile regression with censoring and endogeneity. Journal of Econometrics 186 (1), 201–221.
Chernozhukov, V., Hansen, C., 2004. The effects of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis. Review of Economics and Statistics 86 (3), 735–751.
Chernozhukov, V., Hansen, C., 2005. An IV model of quantile treatment effects. Econometrica 73 (1), 245–262.
Chernozhukov, V., Hansen, C., 2006. Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics 132 (2), 491–525.
Chernozhukov, V., Hansen, C., 2008. Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics 142 (1), 379–398.
Chernozhukov, V., Hansen, C., 2013. Quantile models with endogeneity. Annual Review of Economics 5, 57–81.
Chernozhukov, V., Hansen, C., Jansson, M., 2009. Finite sample inference for quantile regression models. Journal of Econometrics 152 (2), 93–103.
Chernozhukov, V., Hansen, C., Spindler, M., 2015b. Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7, 649–688.
Chernozhukov, V., Hong, H., 2003. An MCMC approach to classical estimation. Journal of Econometrics 115 (2), 293–346.
Chernozhukov, V., Imbens, G. W., Newey, W. K., 2007. Instrumental variable estimation of nonseparable models. Journal of Econometrics 139 (1), 4–14.
Chernozhukov, V., Lee, S., Rosen, A., 2013b. Intersection bounds: Estimation and inference. Econometrica 81 (2), 667–737.
Chesher, A., 2003. Identification in nonseparable models. Econometrica 71 (5), 1405–1441.
Chesher, A., 2005. Nonparametric identification under discrete variation. Econometrica 73 (5), 1525–1550.
Chesher, A., Rosen, A., Smolinski, K., 2013. An instrumental variable model of multiple discrete choice. Quantitative Economics 4 (2), 157–196.
Chesher, A., Smolinski, K., 2010. Sharp identified sets for discrete variable IV models, CeMMAP Working Paper CWP11/10. URL
Doksum, K., 1974. Empirical probability plots and statistical inference for nonlinear models in the two-sample case. Annals of Statistics 2, 267–277.
Dong, Y., Shen, S., 2015. Testing for rank invariance or similarity in program evaluation, working paper. URL https://economics.byu.edu/frandsen/Documents/testingranksimilarity20151111.pdf
Frölich, M., Melly, B., 2013. Unconditional quantile treatment effects under endogeneity. Journal of Business and Economic Statistics 31 (3), 346–357.
Gagliardini, P., Scaillet, O., 2012. Nonparametric instrumental variable estimation of structural quantile effects. Econometrica 80 (4), 1533–1562.
Graddy, K., 1995. Testing for imperfect competition at the Fulton fish market. Rand Journal of Economics 26 (1), 75–92.
Hallin, M., Šiman, M., 2016. Multiple-output quantile regression. In: Chernozhukov, V., He, X., Koenker, R., Peng, L. (Eds.), Handbook of Quantile Regression. CRC Chapman-Hall, forthcoming.
Hausman, J. A., 1977. Errors in variables in simultaneous equation models. Journal of Econometrics 5 (3), 389–401.
Heckman, J. J., Smith, J., Clements, N., 1997. Making the most out of programme evaluations and social experiments: Accounting for heterogeneity in programme impacts. The Review of Economic Studies 64 (4), 487–535.
Horowitz, J. L., 1998. Bootstrap methods for median regression models. Econometrica 66 (6), 1327–1351.
Horowitz, J. L., Lee, S., 2007. Nonparametric instrumental variables estimation of a quantile regression model. Econometrica 75 (4), 1191–1208.
Imbens, G. W., Angrist, J. D., 1994. Identification and estimation of local average treatment effects. Econometrica 62 (2), 467–475.
Imbens, G. W., Newey, W. K., 2009. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica 77 (5), 1481–1512.
Jun, S. J., 2008. Weak identification robust tests in an instrumental quantile model. Journal of Econometrics 144, 118–138.
Kaplan, D. M., Sun, Y., 2016. Smoothed estimating equations for instrumental variables quantile regression, forthcoming Econometric Theory.
Kleibergen, F., 2005. Testing parameters in GMM without assuming that they are identified. Econometrica 73 (4), 1103–1124.
Koenker, R., 2005. Quantile Regression. Cambridge University Press.
Koenker, R., Bassett, G. S., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Geling, O., 2001. Reappraising medfly longevity: A quantile regression survival analysis. Journal of the American Statistical Association 96, 458–468.
Koenker, R., Ma, L., 2006. Quantile regression methods for recursive structural equation models. Journal of Econometrics 134 (2), 471–506.
Lee, S., 2007. Endogeneity in quantile regression models: A control function approach. Journal of Econometrics 141 (2), 1131–1158.
Lehmann, E. L., 1975. Nonparametrics: statistical methods based on ranks. Holden-Day Inc., San Francisco, Calif.
Machado, J. A. F., Mata, J., 2005. Counterfactual decomposition of changes in wage distributions using quantile regression. Journal of Applied Econometrics 20 (4), 445–465.
Melly, B., 2005. Decomposition of differences in distribution using quantile regression. Labour Economics 12 (4), 577–590, European Association of Labour Economists 16th Annual Conference, Universidade Nova de Lisboa, Lisbon, 9th-11th September, 2004.
Melly, B., Wüthrich, K., 2016. Local quantile treatment effects. In: Chernozhukov, V., He, X., Koenker, R., Peng, L. (Eds.), Handbook of Quantile Regression. CRC Chapman-Hall, forthcoming.
Moreira, M. J., 2003. A conditional likelihood ratio test for structural models. Econometrica 71, 1027–1048.
Newey, W., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R. F., McFadden, D. (Eds.), Handbook of Econometrics, Vol. IV. Elsevier B. V., pp. 2111–2245.
Neyman, J., 1959. Optimal asymptotic tests of composite statistical hypotheses. In: Grenander, U. (Ed.), Probability and Statistics, the Harald Cramér Volume. New York, Wiley.
Neyman, J., 1979. C(α) tests and their use. Sankhya 41, 1–21.
Santos, A., 2012. Inference in nonparametric instrumental variables with partial identification. Econometrica 80 (1), 213–275.
Torgovitsky, A., 2015. Identification of nonseparable models using instruments with small support. Econometrica 83 (3), 1185–1197.
Wang, H. J., Yang, Y., 2016. Bayesian quantile regression. In: Chernozhukov, V., He, X., Koenker, R., Peng, L. (Eds.), Handbook of Quantile Regression. CRC Chapman-Hall, forthcoming.
Wüthrich, K., 2014. A comparison of two quantile models with endogeneity, Working Paper, Universität Bern, Department of Economics. URL