Interpreting Unconditional Quantile Regression with Conditional Independence
David M. Kaplan ∗ October 9, 2020
Abstract
This note provides additional interpretation for the counterfactual outcome distribution and corresponding unconditional quantile “effects” defined and estimated by Firpo, Fortin, and Lemieux (2009) and Chernozhukov, Fernández-Val, and Melly (2013). With conditional independence of the policy variable of interest, these methods estimate the policy effect for certain types of policies, but not others. In particular, they estimate the effect of a policy change that itself satisfies conditional independence.
JEL classification: C21
Keywords: counterfactual, policy, unconfoundedness
∗ Department of Economics, University of Missouri. Email: [email protected]

Firpo, Fortin, and Lemieux (2009) and Chernozhukov, Fernández-Val, and Melly (2013), among others, consider a counterfactual distribution of an outcome variable (scalar $Y$) constructed by replacing the marginal distribution of covariates (vector $X$) with a new distribution (CDF $G_X(\cdot)$) while maintaining the same conditional distribution (conditional CDF $F_{Y|X}(\cdot)$). Using the notation from equations (1) and (2) of Firpo, Fortin, and Lemieux (2009), the actual and counterfactual distributions are, respectively,

\begin{align*}
\text{actual:}\quad F_Y(y) &= \int F_{Y|X}(y \mid X = x)\, dF_X(x), \tag{1} \\
\text{counterfactual:}\quad G^*_Y(y) &= \int F_{Y|X}(y \mid X = x)\, dG_X(x). \tag{2}
\end{align*}

Both papers use the phrase “unconditional quantile regression” to mean the change in the quantiles of the unconditional distribution of $Y$ associated with a change in the distribution of $X$. Firpo, Fortin, and Lemieux (2009) consider an infinitesimal change in the direction of $G_X$, starting at the actual $F_X$. Chernozhukov, Fernández-Val, and Melly (2013) consider [...] $G_X$, and thus the difference between quantiles of the distributions in (1) and (2); see their discussion on page 2213 (including footnote 8). Chernozhukov, Fernández-Val, and Melly (2013, §[...]) [...] $X$ from $F_X$ to $G_X$. In certain settings, it may be plausible that $F_{Y|X}$ is policy-invariant, but usually it is not, due to the usual sources of endogeneity like selection. For example, if individuals sort into low and high education ($X$) based on unobserved ability that affects wages ($Y$), then a policy increasing education would move low-ability individuals into the high-education group, affecting the distribution of wages for that group, i.e., affecting $F_{Y|X}$.

Firpo, Fortin, and Lemieux (2009) and Chernozhukov, Fernández-Val, and Melly (2013) both mention that (2) is useful for policy analysis given policy-invariant $F_{Y|X}$.
Chernozhukov, Fernández-Val, and Melly (2013) write, “changing the covariate distribution... has a causal interpretation as the policy effect... under the assumption that the policy does not affect the conditional distribution” (p. 2215, §[...]) [...] $F_{Y|X}(\cdot)$ is unaffected by this small manipulation of the distribution of $X$” (p. 955, §[...]) [...] $F_{Y|X}$ to be “structural” in the sense of “invariant to a class of modifications” (Heckman and Vytlacil, 2007, p. 4848), even though it is not explicitly estimated by Firpo, Fortin, and Lemieux (2009).

My contribution is to characterize which policies indeed do not affect the conditional distribution, given a conditional independence assumption. This is true if the changes in policy [...]

Footnote: Additionally, in their Section 2.3 (“When Counterfactual Effects Have a Causal Interpretation”), Chernozhukov, Fernández-Val, and Melly (2013) consider implications of the same conditional independence assumption I consider below (their (2.8)), but only for “type 1 counterfactual effects” where the conditional distribution changes but the covariate distribution remains fixed (Lemma 2.1), not for unconditional quantile regression.

Footnote: The need to define a particular class of policies is not specific to unconditional quantile regression; e.g., Heckman and Vytlacil (2007, p. 4848) write generally, “A system structural for one class of policy modifications may not be structural for another.”

[...] $Y$ distribution of a policy that changes the unconditional distribution of one particular variable of interest. Assuming conditional independence of the policy variable and unobservables (given a vector of controls), and assuming the policy changes the unconditional distribution of the policy variable in a deterministic way that maintains rank invariance, the distributional effect is point identified for a continuous policy variable and set identified for a discrete variable. Instead, in Section 3, I allow the policy to affect a vector of variables, replacing the deterministic rank-invariant policy change with a conditionally independent policy change, which achieves point identification even for discrete policy variables. I also allow stochastic policy changes. I further connect our results by showing that the policy change in Rothe (2012) satisfies conditional independence when the policy variable is continuous but not discrete. With a discrete policy variable, my results show point identification is recovered by strengthening conditional independence of the observed policy variable to conditional independence of Rothe’s (2012) underlying rank variable.

Section 2 has results in a potential outcomes framework. Section 3 has results for a general structural model with multiple discrete and/or continuous policy variables.

I use the following notation and definitions. Let vector $X$ be partitioned into $X = (X_1, X_2)$, where $X_1 \in \{0, 1\}$ is the binary treatment variable of policy interest and vector $X_2 \in \mathcal{X}$ contains control variables. Potential untreated and treated outcomes are $Y_0$ and $Y_1$, respectively. The observed outcome is

\begin{align*}
Y = Y_0 (1 - X_1) + Y_1 X_1. \tag{3}
\end{align*}

The conditional independence assumption is

\begin{align*}
(Y_0, Y_1) \perp\!\!\!\perp X_1 \mid X_2, \tag{4}
\end{align*}

i.e., conditional on the control variables in $X_2$, there is independence between the treatment $X_1$ and the potential outcome pair $(Y_0, Y_1)$. This condition has many other names (conditional exogeneity, unconfoundedness, strong ignorability, etc.).

Assumption A1.
Outcome $Y$ is generated depending on potential outcomes as in (3), and conditional independence holds in the sense of (4).

Assumption A2.
The policy changes $(X_1, X_2)$ to $(X_1 + \Delta, X_2)$, where $\Delta$ is a random variable satisfying the conditional independence assumption

\begin{align*}
(Y_0, Y_1) \perp\!\!\!\perp \Delta \mid X_2. \tag{5}
\end{align*}

The policy change $\Delta$ may depend on $X_1$, $X_2$, unobservables, and/or randomization, as long as (5) is satisfied. For example, $\Delta = 1 - 2X_1$ switches all $X_1 = 0$ to $X_1 = 1$ and vice-versa, and it satisfies (5) since $\Delta$ only depends on $X_1$ and $X_1$ satisfies conditional independence. Similarly, (5) is satisfied by setting all $X_1 = 0$ with $\Delta = -X_1$, or setting all $X_1 = 1$ with $\Delta = 1 - X_1$. Randomization like $P(\Delta = 0) = P(\Delta = 1 - X_1) = 1/2$ [...] $X_2$, like setting $\Delta = 0$ for certain ranges of $X_2$, as long as (5) holds. The policy could affect $X_1$ indirectly, like by changing incentives to participate as in Heckman and Vytlacil (2001), unless the incentives affect individuals differently depending on their potential outcomes (as is often true), thus violating (5). Other than the restriction of changing only $X_1$ (and not $X_2$), the main restriction is that the policy may not (explicitly or implicitly) target individuals based on their potential outcomes.

Theorem 1.
Given Assumption A1, a policy satisfying Assumption A2 does not change the conditional distribution $F_{Y|X}$.

Proof. Given A1, the actual conditional distribution can be simplified using (3) and (4). Evaluating the conditional CDF at value $y$ conditional on $X_1 = x_1$ and $X_2 = x_2$,

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(\underbrace{Y_{x_1}}_{\text{by (3)}} \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(Y_{x_1} \le y \mid \underbrace{X_2 = x_2}_{\text{by (4)}}). \tag{6}
\end{align*}

Under the new policy, the first element of $X$ becomes $X_1 + \Delta$, so

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(\underbrace{Y_{x_1}}_{\text{by (3)}} \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(Y_{x_1} \le y \mid \underbrace{X_2 = x_2}_{\text{by (4) and (5)}}). \tag{7}
\end{align*}

That is, after conditioning on $X_2 = x_2$, further conditioning on $X_1 + \Delta$ has no effect on the distribution of $Y_{x_1}$, since both $X_1$ and $\Delta$ are conditionally independent of potential outcomes. Thus, $F_{Y|X}$ remains unchanged.

Theorem 1 readily extends to multi-valued treatment, i.e., when the support of $X_1$ is $\{0, 1, \ldots, J\}$ instead of just $\{0, 1\}$.

Theorem 1 could be applied to the empirical example from Firpo, Fortin, and Lemieux (2009, §[...]) [...] $Y$ is log wage (for U.S. males); $X_1$ is a dummy for union membership; and $X_2$ includes dummies for non-white, married, education categories, and ranges of experience. Footnote 18 (p. 962) says, “For simplicity, we maintain the assumption that union coverage status is exogenous. Studies that have used selection models or longitudinal methods [to treat endogeneity] suggest that the exogeneity assumption only introduces small biases.” That is, they suggest Assumption A1 is at least a good approximation. However, Assumption A2 must also hold to interpret the UQR estimates as policy effects.

In their setting, Assumption A2 would arguably hold for some policies but not others. It holds for the extreme case of outlawing unionization, but possibly not for marginal policy changes that operate by changing incentives or information sets. For example, if a policy
For example, if a policypopularizes empirical results that union membership benefits lower-skilled workers morethan higher-skilled workers, consequent changes in union membership would likely dependon potential outcomes (via unobserved skill) even conditional on X . It is difficult to guesswhether a right-to-work law would (approximately) satisfy Assumption A2. If the law detersworkers from union membership independently of their potential outcomes conditional ontheir X , then Assumption A2 would hold. However, if low-wage workers are deterred morethan high-wage workers by union membership becoming relatively more costly than non-membership, then Assumption A2 may not hold.5 Structural model with general policy variable
The intuition of Theorem 1 applies to a general structural model with a general vector $X_1$ affected by the policy. This $X_1$ can have continuous, discrete, and/or categorical components. The structural model is

\begin{align*}
Y = h(X_1, X_2, U), \tag{8}
\end{align*}

where $Y$ is the scalar outcome, $X_1$ is now allowed to be a vector affected by the policy, $X_2$ is a vector of control variables, and $U$ is a vector of unobserved determinants of $Y$. The structural function $h(\cdot)$ is unknown and unrestricted (i.e., nonparametric and nonseparable) but assumed invariant to the policy considered.

The conditional independence assumption is

\begin{align*}
U \perp\!\!\!\perp X_1 \mid X_2, \tag{9}
\end{align*}

i.e., conditional on the control variables in $X_2$, there is independence between $X_1$ and $U$.

Assumption A3.
Given the policy-invariant structural model in (8), conditional independence holds in the sense of (9).
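As a rough numerical illustration of what (9) delivers in model (8), the sketch below simulates a hypothetical data-generating process and compares two estimated probabilities: $P(Y \le y \mid X_1 = x_1, X_2 = x_2)$ and $P(h(x_1, x_2, U) \le y \mid X_2 = x_2)$. Everything in it is an assumption made for illustration and none of it comes from the papers cited here: the function `h`, the binary $X_1$ and $X_2$, the normal $U$, the evaluation point, and the names `simulate` and `cdf_gap`. When $X_1$ depends only on $X_2$ and independent noise, the two probabilities should agree up to simulation error; when $X_1$ is also driven by $U$, they should not.

```python
import numpy as np


def simulate(n=200_000, seed=0, confounded=False):
    """Draw (X1, X2, U) from a hypothetical design (an assumption, not from the paper)."""
    rng = np.random.default_rng(seed)
    x2 = rng.integers(0, 2, size=n)        # binary control variable X2
    u = rng.normal(loc=x2, scale=1.0)      # U may depend on X2
    if confounded:
        # X1 depends on U directly, so (9) fails
        x1 = (u + rng.normal(size=n) > 1.0).astype(int)
    else:
        # X1 depends on X2 and independent noise only, so (9) holds
        x1 = (rng.random(n) < 0.3 + 0.4 * x2).astype(int)
    return x1, x2, u


def h(x1, x2, u):
    """Hypothetical nonseparable structural function standing in for h in (8)."""
    return (1.0 + 0.5 * x1) * u + x1 + 0.5 * x2


def cdf_gap(y0=3.0, x1_val=1, x2_val=1, confounded=False):
    """Absolute gap between the two estimated conditional probabilities at y0."""
    x1, x2, u = simulate(confounded=confounded)
    y = h(x1, x2, u)
    cell = (x1 == x1_val) & (x2 == x2_val)
    lhs = np.mean(y[cell] <= y0)                       # P(Y <= y0 | X1 = x1, X2 = x2)
    ctrl = x2 == x2_val
    rhs = np.mean(h(x1_val, x2_val, u[ctrl]) <= y0)    # P(h(x1, x2, U) <= y0 | X2 = x2)
    return abs(lhs - rhs)


if __name__ == "__main__":
    print("gap when (9) holds:", cdf_gap(confounded=False))
    print("gap when (9) fails:", cdf_gap(confounded=True))
```

Calling `cdf_gap()` at other evaluation points or cells works the same way. The comparison is only a sanity check of the conditioning step under this particular simulated design, not an estimator of any policy effect.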
Assumption A4.
The policy changes $(X_1, X_2)$ to $(X_1 + \Delta, X_2)$, where $\Delta$ is a random vector satisfying the conditional independence assumption

\begin{align*}
U \perp\!\!\!\perp \Delta \mid X_2. \tag{10}
\end{align*}

Theorem 2.
Given Assumption A3, a policy satisfying Assumption A4 does not change the conditional distribution $F_{Y|X}$.

Proof. Given A3, the actual (initial) conditional distribution $F_{Y|X}$ simplifies to

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(\underbrace{h(X_1, X_2, U)}_{\text{by (8)}} \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid \underbrace{X_2 = x_2}_{\text{by (9)}}). \tag{11}
\end{align*}

Under the new policy, the first elements of $X$ change from $X_1$ to $X_1 + \Delta$, so

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(\underbrace{h(X_1 + \Delta, X_2, U)}_{\text{by (8) and A4}} \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid \underbrace{X_2 = x_2}_{\text{by (9) and (10)}}). \tag{12}
\end{align*}

That is, after conditioning on $X_2 = x_2$, further conditioning on $X_1 + \Delta$ has no effect on the distribution of $U$ and thus no effect on the distribution of $h(x_1, x_2, U)$ (given values $x_1$ and $x_2$), since both $X_1$ and $\Delta$ are conditionally independent of the unobserved $U$. Thus, $F_{Y|X}$ remains unchanged.

As noted in the introduction, Rothe (2012) considers a related setting; details follow. Assumption A3 is the same, but Assumption A4 differs. Rothe (2012) has scalar $X_1$ that is (without loss of generality) written in terms of a latent rank variable $W \sim \mathrm{Unif}(0, 1)$ as $X_1 = F^{-1}(W)$, where $F^{-1}(\cdot)$ is the generalized inverse of CDF $F(\cdot)$. The policy then changes $X_1$ from $F^{-1}(W)$ to $F_p^{-1}(W)$, where $F_p^{-1}(\cdot)$ is the generalized inverse of CDF $F_p(\cdot)$, the counterfactual CDF of $X_1$. This imposes rank invariance because $w \ge w' \iff F^{-1}(w) \ge F^{-1}(w') \iff F_p^{-1}(w) \ge F_p^{-1}(w')$, and it is deterministic in the sense that $F_p(\cdot)$ is non-random. If $X_1$ is continuous, then $W = F(X_1)$ and

\begin{align*}
\Delta = F_p^{-1}(W) - X_1 = F_p^{-1}(F(X_1)) - X_1. \tag{13}
\end{align*}

This satisfies Assumption A4, suggesting the policy effect is point identified, and indeed Rothe (2012) provides point identification (by a different argument).

However, with discrete $X_1$, (13) fails. With discrete $X_1$, $W \ne F(X_1)$ because $W \sim \mathrm{Unif}(0, 1)$ but $F(X_1)$ has a discrete distribution. Consequently, $\Delta$ cannot be written solely in terms of $X_1$ because $F_p^{-1}(W) \ne F_p^{-1}(F(X_1))$; instead, $\Delta = F_p^{-1}(W) - X_1$ also depends on $W$. With the additional assumption of conditional independence of $W$, point identification would result. However, $W$ may violate conditional independence even if $X_1$ satisfies it. Thus, generally, policy effects are only partially identified with discrete $X_1$, and Rothe (2012) derives the identified sets.

References
Chernozhukov, Victor, Iván Fernández-Val, and Blaise Melly. 2013. “Inference on Counterfactual Distributions.” Econometrica 81 (6):2205–2268.

Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. 2009. “Unconditional Quantile Regression.” Econometrica 77 (3):953–973.

Heckman, James J. and Edward Vytlacil. 2001. “Policy-Relevant Treatment Effects.” American Economic Review 91 (2):107–111.

Heckman, James J. and Edward J. Vytlacil. 2007. “Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation.” In Handbook of Econometrics, vol. 6B, edited by James J. Heckman and Edward E. Leamer, chap. 70. Elsevier, 4779–4874. URL https://doi.org/10.1016/S1573-4412(07)06070-9.

Rothe, Christoph. 2012. “Partial Distributional Policy Effects.” Econometrica 80 (5):2269–2301. URL https://doi.org/10.3982/ECTA9671.