Interpreting Unconditional Quantile Regression with Conditional Independence

David M. Kaplan∗

October 9, 2020

Abstract

This note provides additional interpretation for the counterfactual outcome distribution and corresponding unconditional quantile “effects” defined and estimated by Firpo, Fortin, and Lemieux (2009) and Chernozhukov, Fernández-Val, and Melly (2013). With conditional independence of the policy variable of interest, these methods estimate the policy effect for certain types of policies, but not others. In particular, they estimate the effect of a policy change that itself satisfies conditional independence.

JEL classification: C21

Keywords: counterfactual, policy, unconfoundedness

∗ Department of Economics, University of Missouri. Email: [email protected]

Firpo, Fortin, and Lemieux (2009) and Chernozhukov, Fernández-Val, and Melly (2013), among others, consider a counterfactual distribution of an outcome variable (scalar $Y$) constructed by replacing the marginal distribution of covariates (vector $X$) with a new distribution (CDF $G_X(\cdot)$) while maintaining the same conditional distribution (conditional CDF $F_{Y|X}(\cdot)$). Using the notation from equations (1) and (2) of Firpo, Fortin, and Lemieux (2009), the actual and counterfactual distributions are, respectively,

\begin{align*}
\text{actual:} \quad & F_Y(y) = \int F_{Y|X}(y \mid X = x) \, dF_X(x), \tag{1} \\
\text{counterfactual:} \quad & G_Y^*(y) = \int F_{Y|X}(y \mid X = x) \, dG_X(x). \tag{2}
\end{align*}

Both papers use the phrase “unconditional quantile regression” to mean the change in the quantiles of the unconditional distribution of $Y$ associated with a change in the distribution of $X$. Firpo, Fortin, and Lemieux (2009) consider an infinitesimal change in the direction of $G_X$, starting at the actual $F_X$. Chernozhukov, Fernández-Val, and Melly (2013) consider [...] $G_X$, and thus the difference between quantiles of the distributions in (1) and (2); see their discussion on page 2213 (including footnote 8). Chernozhukov, Fernández-Val, and Melly (2013, § [...]) [...] $X$ from $F_X$ to $G_X$.

In certain settings, it may be plausible that $F_{Y|X}$ is policy-invariant, but usually it is not, due to the usual sources of endogeneity like selection. For example, if individuals sort into low and high education ($X$) based on unobserved ability that affects wages ($Y$), then a policy increasing education would move low-ability individuals into the high-education group, affecting the distribution of wages for that group, i.e., affecting $F_{Y|X}$.

Firpo, Fortin, and Lemieux (2009) and Chernozhukov, Fernández-Val, and Melly (2013) both mention that (2) is useful for policy analysis given policy-invariant $F_{Y|X}$.
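The education sorting example can be made concrete with a small simulation. The sketch below is illustrative only: the wage equation, the normal distributions, the threshold rule for schooling, and all names (`draw_population`, `outcomes`, `conditional_cdf`, `threshold`) are assumptions made for this example and do not come from the papers cited. It holds the structural wage equation fixed, lowers the schooling threshold so that lower-ability individuals enter the high-education group, and compares the empirical $F_{Y|X}(y \mid X = 1)$ before and after.

```python
import random

random.seed(0)

def draw_population(n):
    """Draw (ability U, other schooling determinant V); both assumed N(0, 1)."""
    return [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(n)]

def outcomes(population, threshold):
    """Return (X, Y) pairs when individuals choose high education iff U + V > threshold."""
    data = []
    for u, v in population:
        x = 1 if u + v > threshold else 0   # sorting on unobserved ability
        y = 1.0 * x + u                     # hypothetical log-wage equation, identical under both regimes
        data.append((x, y))
    return data

def conditional_cdf(data, x, y):
    """Empirical P(Y <= y | X = x)."""
    ys = [yi for xi, yi in data if xi == x]
    return sum(yi <= y for yi in ys) / len(ys)

population = draw_population(100_000)
before = outcomes(population, threshold=0.0)   # status quo
after = outcomes(population, threshold=-0.5)   # assumed policy that expands education

share_before = sum(x for x, _ in before) / len(before)
share_after = sum(x for x, _ in after) / len(after)
cdf_before = conditional_cdf(before, x=1, y=1.5)
cdf_after = conditional_cdf(after, x=1, y=1.5)

print(f"share with X = 1:     {share_before:.3f} -> {share_after:.3f}")
print(f"P(Y <= 1.5 | X = 1): {cdf_before:.3f} -> {cdf_after:.3f}")
```

Under these assumed parameters, the conditional CDF of wages in the high-education group rises by several percentage points after the policy even though the wage equation itself never changes, because the composition of the group shifts toward lower ability. Plugging the pre-policy $F_{Y|X}$ into (2) would therefore misstate the post-policy distribution in this design.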
Chernozhukov, Fernández-Val, and Melly (2013) write, “changing the covariate distribution... has a causal interpretation as the policy effect... under the assumption that the policy does not affect the conditional distribution” (p. 2215, § [...]). Firpo, Fortin, and Lemieux (2009) [...] “[...] $F_{Y|X}(\cdot)$ is unaffected by this small manipulation of the distribution of $X$” (p. 955, § [...]). [...] $F_{Y|X}$ to be “structural” in the sense of “invariant to a class of modifications” (Heckman and Vytlacil, 2007, p. 4848), even though it is not explicitly estimated by Firpo, Fortin, and Lemieux (2009).

[Footnote: Additionally, in their Section 2.3 (“When Counterfactual Effects Have a Causal Interpretation”), Chernozhukov, Fernández-Val, and Melly (2013) consider implications of the same conditional independence assumption I consider below (their (2.8)), but only for “type 1 counterfactual effects” where the conditional distribution changes but the covariate distribution remains fixed (Lemma 2.1), not for unconditional quantile regression.]

[Footnote: The need to define a particular class of policies is not specific to unconditional quantile regression; e.g., Heckman and Vytlacil (2007, p. 4848) write generally, “A system structural for one class of policy modifications may not be structural for another.”]

My contribution is to characterize which policies indeed do not affect the conditional distribution, given a conditional independence assumption. This is true if the changes in policy [...]

[...] Rothe (2012) [...] $Y$ distribution of a policy that changes the unconditional distribution of one particular variable of interest. Assuming conditional independence of the policy variable and unobservables (given a vector of controls), and assuming the policy changes the unconditional distribution of the policy variable in a deterministic way that maintains rank invariance, the distributional effect is point identified for a continuous policy variable and set identified for a discrete variable. Instead, in Section 3, I allow the policy to affect a vector of variables, replacing the deterministic rank-invariant policy change with a conditionally independent policy change, which achieves point identification even for discrete policy variables. I also allow stochastic policy changes. I further connect our results by showing that the policy change in Rothe (2012) satisfies conditional independence when the policy variable is continuous but not when it is discrete. With a discrete policy variable, my results show point identification is recovered by strengthening conditional independence of the observed policy variable to conditional independence of Rothe’s (2012) underlying rank variable.

Section 2 has results in a potential outcomes framework. Section 3 has results for a general structural model with multiple discrete and/or continuous policy variables.

I use the following notation and definitions. Let vector $X$ be partitioned into $X = (X_1, X_2)$, where $X_1 \in \{0, 1\}$ is the binary treatment variable of policy interest and vector $X_2 \in \mathcal{X}_2$ contains control variables. Potential untreated and treated outcomes are $Y_0$ and $Y_1$, respectively. The observed outcome is

$$Y = Y_0 (1 - X_1) + Y_1 X_1. \tag{3}$$

The conditional independence assumption is

$$(Y_0, Y_1) \perp\!\!\!\perp X_1 \mid X_2, \tag{4}$$

i.e., conditional on the control variables in $X_2$, there is independence between the treatment $X_1$ and the potential outcome pair $(Y_0, Y_1)$. This condition has many other names (conditional exogeneity, unconfoundedness, strong ignorability, etc.).

Assumption A1.

Outcome $Y$ is generated depending on potential outcomes as in (3), and conditional independence holds in the sense of (4).

Assumption A2.

The policy changes $(X_1, X_2)$ to $(X_1 + \Delta, X_2)$, where $\Delta$ is a random variable satisfying the conditional independence assumption

$$(Y_0, Y_1) \perp\!\!\!\perp \Delta \mid X_2. \tag{5}$$

The policy change $\Delta$ may depend on $X_1$, $X_2$, unobservables, and/or randomization, as long as (5) is satisfied. For example, $\Delta = 1 - 2X_1$ switches all $X_1 = 0$ to $X_1 = 1$ and vice-versa, and it satisfies (5) since $\Delta$ only depends on $X_1$ and $X_1$ satisfies conditional independence. Similarly, (5) is satisfied by setting all $X_1 = 0$ with $\Delta = -X_1$, or setting all $X_1 = 1$ with $\Delta = 1 - X_1$. Randomization like $P(\Delta = 0) = P(\Delta = 1 - X_1) = 1/2$ is also allowed, as is dependence on $X_2$, like setting $\Delta = 0$ for certain ranges of $X_2$, as long as (5) holds. The policy could affect $X_1$ indirectly, like by changing incentives to participate as in Heckman and Vytlacil (2001), unless the incentives affect individuals differently depending on their potential outcomes (as is often true), thus violating (5). Other than the restriction of changing only $X_1$ (and not $X_2$), the main restriction is that the policy may not (explicitly or implicitly) target individuals based on their potential outcomes.

Theorem 1.

Given Assumption A1, a policy satisfying Assumption A2 does not change the conditional distribution $F_{Y|X}$.

Proof. Given A1, the actual conditional distribution can be simplified using (3) and (4). Evaluating the conditional CDF at value $y$ conditional on $X_1 = x_1$ and $X_2 = x_2$,

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(\overbrace{Y_{x_1}}^{\text{by (3)}} \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(Y_{x_1} \le y \mid \overbrace{X_2 = x_2}^{\text{by (4)}}). \tag{6}
\end{align*}

Under the new policy, the first element of $X$ becomes $X_1 + \Delta$, so

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(\overbrace{Y_{x_1}}^{\text{by (3)}} \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(Y_{x_1} \le y \mid \overbrace{X_2 = x_2}^{\text{by (4) and (5)}}). \tag{7}
\end{align*}

That is, after conditioning on $X_2 = x_2$, further conditioning on $X_1 + \Delta$ has no effect on the distribution of $Y_{x_1}$, since both $X_1$ and $\Delta$ are conditionally independent of potential outcomes. Thus, $F_{Y|X}$ remains unchanged.

Theorem 1 readily extends to multi-valued treatment, i.e., when the support of $X_1$ is $\{0, 1, \ldots, J\}$ instead of just $\{0, 1\}$.

Theorem 1 could be applied to the empirical example from Firpo, Fortin, and Lemieux (2009, § [...]), in which $Y$ is log wage (for U.S. males); $X_1$ is a dummy for union membership; and $X_2$ includes dummies for non-white, married, education categories, and ranges of experience. Footnote 18 (p. 962) says, “For simplicity, we maintain the assumption that union coverage status is exogenous. Studies that have used selection models or longitudinal methods [to treat endogeneity] suggest that the exogeneity assumption only introduces small biases.” That is, they suggest Assumption A1 is at least a good approximation. However, Assumption A2 must also hold to interpret the unconditional quantile regression (UQR) estimates as policy effects.

In their setting, Assumption A2 would arguably hold for some policies but not others. It holds for the extreme case of outlawing unionization, but possibly not for marginal policy changes that operate by changing incentives or information sets.
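The contrast between policies that do and do not satisfy Assumption A2 can be checked numerically. The following is a minimal sketch under assumed distributions, not a reproduction of any empirical design: the binary control, the skill variable, the potential-outcome equations, and the function names (`random_switch`, `targeted`, `conditional_cdf`) are all hypothetical. It compares the empirical $P(Y \le 1 \mid X_1 = 1, X_2 = 0)$ under no policy, under a randomized switch $\Delta = 1 - 2X_1$ applied with probability 1/2, and under a policy that targets units by their unobserved gains.

```python
import random

random.seed(1)

def draw_population(n):
    """Each unit is (x1, x2, skill, y0, y1, coin); all distributions are illustrative assumptions."""
    units = []
    for _ in range(n):
        x2 = 1 if random.random() < 0.5 else 0             # control variable
        skill = random.gauss(0.0, 1.0)                      # unobserved
        y0 = x2 + skill                                     # untreated potential outcome
        y1 = x2 + 1.0 + 0.5 * skill                         # treated: low skill gains more
        x1 = 1 if random.random() < 0.3 + 0.4 * x2 else 0   # depends on x2 only, so (4) holds
        coin = random.random() < 0.5                        # randomization device for the policy
        units.append((x1, x2, skill, y0, y1, coin))
    return units

def no_policy(x1, skill, coin):
    return 0

def random_switch(x1, skill, coin):
    """Delta = 1 - 2*x1 with probability 1/2, else 0: satisfies (5)."""
    return 1 - 2 * x1 if coin else 0

def targeted(x1, skill, coin):
    """Untreated low-skill units (largest gains y1 - y0) join: violates (5)."""
    return 1 if (x1 == 0 and skill < 0) else 0

def conditional_cdf(units, delta, y, x1_value, x2_value):
    """Empirical P(Y <= y | X1 + Delta = x1_value, X2 = x2_value), with Y built as in (3)."""
    ys = []
    for x1, x2, skill, y0, y1, coin in units:
        new_x1 = x1 + delta(x1, skill, coin)
        if new_x1 == x1_value and x2 == x2_value:
            ys.append(y0 * (1 - new_x1) + y1 * new_x1)
    return sum(v <= y for v in ys) / len(ys)

units = draw_population(100_000)
cdf_actual = conditional_cdf(units, no_policy, 1.0, 1, 0)
cdf_random = conditional_cdf(units, random_switch, 1.0, 1, 0)
cdf_targeted = conditional_cdf(units, targeted, 1.0, 1, 0)

print(f"no policy:       {cdf_actual:.3f}")
print(f"random switch:   {cdf_random:.3f}")
print(f"targeted policy: {cdf_targeted:.3f}")
```

In this assumed design the population value is 0.5 both with no policy and with the randomized switch, consistent with Theorem 1, while the targeted policy raises it to $0.5/0.65 \approx 0.77$ because the treated group becomes selected on low skill. Sample estimates should be close to these values but will vary with the seed and sample size.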
As an example of a marginal policy change that may violate Assumption A2, suppose a policy popularizes empirical results that union membership benefits lower-skilled workers more than higher-skilled workers. Consequent changes in union membership would likely depend on potential outcomes (via unobserved skill) even conditional on $X_2$. It is difficult to guess whether a right-to-work law would (approximately) satisfy Assumption A2. If the law deters workers from union membership independently of their potential outcomes conditional on their $X_2$, then Assumption A2 would hold. However, if low-wage workers are deterred more than high-wage workers by union membership becoming relatively more costly than non-membership, then Assumption A2 may not hold.

3 Structural model with general policy variable

The intuition of Theorem 1 applies to a general structural model with a general vector $X_1$ affected by the policy. This $X_1$ can have continuous, discrete, and/or categorical components. The structural model is

$$Y = h(X_1, X_2, U), \tag{8}$$

where $Y$ is the scalar outcome, $X_1$ is now allowed to be a vector affected by the policy, $X_2$ is a vector of control variables, and $U$ is a vector of unobserved determinants of $Y$. The structural function $h(\cdot)$ is unknown and unrestricted (i.e., nonparametric and nonseparable) but assumed invariant to the policy considered.

The conditional independence assumption is

$$U \perp\!\!\!\perp X_1 \mid X_2, \tag{9}$$

i.e., conditional on the control variables in $X_2$, there is independence between $X_1$ and $U$.

Assumption A3.

Given the policy-invariant structural model in (8), conditional independence holds in the sense of (9).
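Condition (9) restricts dependence between $X_1$ and $U$ only after conditioning on $X_2$; the two may still be dependent unconditionally. The sketch below illustrates this with an assumed data-generating process in which both $U$ and $X_1$ depend on a binary $X_2$ but $X_1$ has no further dependence on $U$. The distributions and the helper names (`draw`, `mean_u`) are hypothetical. Comparing means of $U$ is only a partial check, since equal conditional means are necessary but not sufficient for (9).

```python
import random

random.seed(2)

def draw(n):
    """Draw (x1, x2, u) with U depending on X2, and X1 depending on X2 only."""
    rows = []
    for _ in range(n):
        x2 = 1 if random.random() < 0.5 else 0
        u = random.gauss(float(x2), 1.0)                    # U shifts with the control X2
        x1 = 1 if random.random() < 0.2 + 0.6 * x2 else 0   # no dependence on U given X2
        rows.append((x1, x2, u))
    return rows

def mean_u(rows, x1, x2=None):
    """Sample mean of U given X1 = x1 (and X2 = x2 if supplied)."""
    us = [u for a, b, u in rows if a == x1 and (x2 is None or b == x2)]
    return sum(us) / len(us)

rows = draw(200_000)

# Without conditioning on X2, U differs across X1 groups.
gap_unconditional = mean_u(rows, 1) - mean_u(rows, 0)

# Within each X2 cell, the X1 groups have the same mean of U (up to sampling error).
gap_given_x2 = {x2: mean_u(rows, 1, x2) - mean_u(rows, 0, x2) for x2 in (0, 1)}

print(f"E[U | X1=1] - E[U | X1=0]: {gap_unconditional:.3f}")
for x2, gap in gap_given_x2.items():
    print(f"same gap within X2={x2}: {gap:.3f}")
```

In this design the population gap is 0.6 unconditionally and 0 within each $X_2$ cell, which is why the control vector $X_2$ matters for Assumption A3.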

Assumption A4.

The policy changes $(X_1, X_2)$ to $(X_1 + \Delta, X_2)$, where $\Delta$ is a random vector satisfying the conditional independence assumption

$$U \perp\!\!\!\perp \Delta \mid X_2. \tag{10}$$

Theorem 2.

Given Assumption A3, a policy satisfying Assumption A4 does not change the conditional distribution $F_{Y|X}$.

Proof. Given A3, the actual (initial) conditional distribution $F_{Y|X}$ simplifies to

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(\overbrace{h(X_1, X_2, U)}^{\text{by (8)}} \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid X_1 = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid \overbrace{X_2 = x_2}^{\text{by (9)}}). \tag{11}
\end{align*}

Under the new policy, the first elements of $X$ change from $X_1$ to $X_1 + \Delta$, so

\begin{align*}
F_{Y|X}(y \mid x_1, x_2) &\equiv P(Y \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(\overbrace{h(X_1 + \Delta, X_2, U)}^{\text{by (8) and A4}} \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid X_1 + \Delta = x_1, X_2 = x_2) \\
&= P(h(x_1, x_2, U) \le y \mid \overbrace{X_2 = x_2}^{\text{by (9) and (10)}}). \tag{12}
\end{align*}

That is, after conditioning on $X_2 = x_2$, further conditioning on $X_1 + \Delta$ has no effect on the distribution of $U$ and thus no effect on the distribution of $h(x_1, x_2, U)$ (given values $x_1$ and $x_2$), since both $X_1$ and $\Delta$ are conditionally independent of the unobserved $U$. Thus, $F_{Y|X}$ remains unchanged.

As noted in the introduction, Rothe (2012) considers a related setting; details follow. Assumption A3 is the same, but Assumption A4 differs. Rothe (2012) has scalar $X_1$ that is (without loss of generality) written in terms of a latent rank variable $W \sim \mathrm{Unif}(0, 1)$ as $X_1 = F^{-1}(W)$, where $F^{-1}(\cdot)$ is the generalized inverse of CDF $F(\cdot)$. The policy then changes $X_1$ from $F^{-1}(W)$ to $F_p^{-1}(W)$, where $F_p^{-1}(\cdot)$ is the generalized inverse of CDF $F_p(\cdot)$, the counterfactual CDF of $X_1$. This imposes rank invariance because $w \ge w' \iff F^{-1}(w) \ge F^{-1}(w') \iff F_p^{-1}(w) \ge F_p^{-1}(w')$, and it is deterministic in the sense that $F_p(\cdot)$ is non-random. If $X_1$ is continuous, then $W = F(X_1)$ and

$$\Delta = F_p^{-1}(W) - X_1 = F_p^{-1}(F(X_1)) - X_1. \tag{13}$$

This satisfies Assumption A4 because $\Delta$ is then a deterministic function of $X_1$, which is conditionally independent of $U$ by (9). This suggests the policy effect is point identified, and indeed Rothe (2012) provides point identification (by a different argument).

However, with discrete $X_1$, (13) fails. With discrete $X_1$, $W \ne F(X_1)$ because $W \sim \mathrm{Unif}(0, 1)$ but $F(X_1)$ has a discrete distribution. Consequently, $\Delta$ cannot be written solely in terms of $X_1$ because $F_p^{-1}(W) \ne F_p^{-1}(F(X_1))$; instead, $\Delta = F_p^{-1}(W) - X_1$ also depends on $W$. With the additional assumption of conditional independence of $W$, point identification would result. However, $W$ may violate conditional independence even if $X_1$ satisfies it. Thus, generally, policy effects are only partially identified with discrete $X_1$, and Rothe (2012) derives the identified sets.

References

Chernozhukov, Victor, Iván Fernández-Val, and Blaise Melly. 2013. “Inference on Counterfactual Distributions.” Econometrica 81 (6):2205–2268.

Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. 2009. “Unconditional Quantile Regression.” Econometrica 77 (3):953–973.

Heckman, James J. and Edward Vytlacil. 2001. “Policy-Relevant Treatment Effects.” American Economic Review 91 (2):107–111.

Heckman, James J. and Edward J. Vytlacil. 2007. “Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation.” In Handbook of Econometrics, vol. 6B, edited by James J. Heckman and Edward E. Leamer, chap. 70. Elsevier, 4779–4874. URL https://doi.org/10.1016/S1573-4412(07)06070-9.

Rothe, Christoph. 2012. “Partial Distributional Policy Effects.” Econometrica 80 (5):2269–2301. URL https://doi.org/10.3982/ECTA9671.