Instrumental Variables with Treatment-Induced Selection: Exact Bias Results
Felix Elwert
University of Wisconsin–Madison
Elan Segarra
University of Wisconsin–Madison
Instrumental variables (IV) estimation suffers selection bias when the analysis conditions on the treatment. Judea Pearl's [2000:248] early graphical definition of instrumental variables explicitly prohibited conditioning on the treatment. Nonetheless, the practice remains common. In this paper, we derive exact analytic expressions for IV selection bias across a range of data-generating models, and for various selection-inducing procedures. We present four sets of results for linear models. First, IV selection bias depends on the conditioning procedure (covariate adjustment vs. sample truncation). Second, IV selection bias due to covariate adjustment is the limiting case of IV selection bias due to sample truncation. Third, in certain models, the IV and OLS estimators under selection bound the true causal effect in large samples. Fourth, we characterize situations where IV remains preferred to OLS despite selection on the treatment. These results broaden the notion of IV selection bias beyond sample truncation, replace prior simulation findings with exact analytic formulas, and enable formal sensitivity analyses.

Introduction
Instrumental variables (IV) analysis is a popular approach for identifying causal effects when the treatment is confounded by omitted variables. IV analysis rests on two main assumptions: that the instrument is associated with the treatment ("relevance"), and that the instrument is associated with the outcome only via the effect of treatment on the outcome ("exclusion"). The exclusion assumption is the sticking point of many empirical applications, because it requires theoretical justification and is testable only to a very limited degree (e.g., Balke and Pearl 1997; Richardson and Robins 2010).

One type of exclusion violation that has recently gained attention is selection bias (e.g., Swanson et al. 2015; Engberg et al. 2014; Ertefaie et al. 2016; Hughes et al. 2019; Canan et al. 2017; Gkatzionis and Burgess 2018; Mogstad and Wiswall 2012). We say that IV analysis suffers selection bias when conditioning (rather than not conditioning) on some variable violates the exclusion assumption. One particularly important case is treatment-induced IV selection bias: whenever treatment is confounded by unobservables, conditioning on a variable that has been affected by treatment induces bias. Judea Pearl [2000, p. 248] recognized this problem and presented the first definition of instrumental variables that outright prohibits conditioning on variables affected by treatment. Despite Pearl's warning, however, conditioning on such "descendants" of treatment remains common in IV analysis.

Past research on treatment-induced IV selection bias (Swanson et al. 2015; Hughes et al. 2019; Canan et al. 2017; Gkatzionis and Burgess 2018) is limited in two respects. First, it has focused on IV selection bias induced by sample truncation, which occurs when observations are excluded from the sample. This focus neglects that other conditioning procedures, such as covariate adjustment, can also induce selection bias. Second, in situations where consistent estimators are not readily available, the literature characterizes the size and sign of IV selection bias by simulation.¹ Without analytic bias expressions, however, it is unclear which stylized facts resulting from simulation studies hold generically.

This paper makes two main contributions. First, we derive analytic expressions for treatment-induced IV selection bias for a range of different data-generating models. Second, we compare the biases resulting from two different selection-inducing conditioning procedures: sample truncation and covariate adjustment. For tractability, we focus on linear models with homogeneous (constant) effects and normal errors.

We highlight several results. First, the selection procedure matters. Within a given data-generating model, selection by truncation and selection by covariate adjustment introduce quantitatively different biases into IV analysis. Second, selection bias by adjustment is the limiting case of selection bias by truncation. Third, in certain models, the IV and OLS estimators with selection bound the true causal effect in large samples. Fourth, our analytic bias expressions characterize the models in which IV is less biased than OLS, which obtains when treatment does not exert an extreme effect on selection.

The rest of the paper proceeds as follows. Section 2 reviews basic facts about directed acyclic graphs for linear models. Section 3 defines instrumental variables in econometric and graphical notation. Section 4 describes conditions under which selection violates the IV exclusion assumption and defines IV estimation under selection by truncation and covariate adjustment. Section 5 presents analytic expressions for the bias in IV and OLS estimators over a range of models with treatment-induced selection by truncation and by covariate adjustment. Section 6 concludes.

¹ Some studies have proposed corrections, bounds, or sensitivity analyses for IV selection bias in certain truncation scenarios (e.g., Mogstad and Wiswall 2012; Engberg et al. 2014; Canan et al. 2017; Vansteelandt et al. 2018; Gkatzionis and Burgess 2018; Hughes et al. 2019). These approaches often rely on knowing the selection probability of both the observed and the truncated observations.

Causal Graphs

Figure 1: IV scenario where the selection variable is a function of treatment alone, equivalently displayed as a causal graph (a) and as a linear structural equations model (b): Z = ε_Z; U = ε_U; T = πZ + δ₁U + ε_T; S = γT + ε_S; Y = βT + δ₂U + ε_Y.
The challenge of selection bias in IV analysis is transparently communicated with graphical causal models (Pearl 2009; Maathuis et al. 2018). Here, we review the basics. A causal graph represents the structural equations of the data-generating model. Causal graphs consist of nodes representing variables and directed edges representing direct causal effects. Causal graphs are assumed to display explicitly the observed and unobserved common causes of all variables. By convention, causal graphs do not explicitly display the idiosyncratic shocks that affect individual variables.

Throughout, we assume that the causal graphs represent linear data-generating models with homogeneous effects and normally distributed errors.² Without loss of generality, we further assume that all variables are standardized to have mean zero and unit variance. The direct causal effect of one variable on another variable in such models is given by its path parameter, which is bounded by [−1, 1]. For example, the causal graph in Figure 1a represents the linear structural equations model given in Figure 1b, with path parameters π, β, γ, δ₁, and δ₂. For each variable V ∈ {Z, U, T, S, Y}, the idiosyncratic shocks are marginally independent and normally distributed, ε_V ∼ N(0, σ_V²), with variance σ_V² scaled so that each V ∼ N(0, 1). Since U is unobserved, the structural error term on Y in econometric terminology is ω_Y = δ₂U + ε_Y.

² Some results do not rest on the joint normality assumption, but our results on IV selection bias with truncation do.
Notice that T is correlated with the structural error, Cov(T, ω_Y) ≠ 0, because both depend on the unobserved confounder, U.

Under mild conditions to avoid knife-edge cases, simple rules determine the covariance structure of data generated by a model (Pearl 2009). The notions of paths, collider variables, and descendants play a central role in these rules. A path is an acyclic sequence of adjacent arrows between two variables, regardless of the direction of the arrows. In a causal path from treatment to outcome, all arrows point toward the outcome. In a non-causal, or spurious, path between treatment and outcome, at least one arrow points away from the outcome. A variable is called a collider with respect to a specific path if it receives two inbound arrows on the path. For example, T is a collider on the path Z → T ← U → Y. The descendant set of a variable contains all variables directly and indirectly caused by it, e.g., desc(T) = {S, Y} in Figure 1a.

Two variables are statistically independent if all paths between them are closed; and two variables are statistically associated if there is at least one open path between them (Verma and Pearl 1988). A path is closed (does not transmit association) if either (a) it contains a collider and neither the collider nor any of its descendants are conditioned on, or (b) it contains a non-collider that is conditioned on by exact stratification. A path is open (does transmit association) iff it is not closed (Pearl 1988). Importantly, when a path contains only one collider, then conditioning on this collider, or any of its descendants, opens this path.

The marginal covariance between two variables in a linear model with standardized variables is given by Wright's [1934] rule as the sum of the products of the path parameters on the open paths that connect the variables. For example, the marginal covariance between Z and Y in Figure 1a is Cov(Y, Z) = πβ, because the path Z → T → Y is the only open path (the other path, Z → T ← U → Y, is closed by the unconditioned collider T). The conditional covariance between variables A and B, after adjusting for some covariate C, is Cov(A, B | C) = Cov(A, B) − Cov(A, C)Cov(B, C). The novel bias results in this paper hinge on deriving conditional covariances when truncating the sample as a function of C.
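To make these rules concrete, the following sketch simulates the structural equations of Figure 1b and checks Wright's rule and the conditional-covariance formula numerically. The sketch is ours, not the authors' code, and the parameter values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
pi, beta, gamma, d1, d2 = 0.5, 0.4, 0.6, 0.5, 0.5  # illustrative path parameters

# Structural equations of Figure 1b, with shock variances scaled so that
# every variable is standardized (mean zero, unit variance).
Z = rng.normal(size=n)
U = rng.normal(size=n)
T = pi * Z + d1 * U + rng.normal(0, np.sqrt(1 - pi**2 - d1**2), n)
S = gamma * T + rng.normal(0, np.sqrt(1 - gamma**2), n)
Y = beta * T + d2 * U + rng.normal(0, np.sqrt(1 - beta**2 - d2**2 - 2*beta*d1*d2), n)

cov = lambda a, b: np.mean(a * b)  # covariance of mean-zero variables

# Wright's rule: Cov(Z, Y) = pi * beta, the product of the path parameters
# on the single open path Z -> T -> Y.
print(cov(Z, Y), pi * beta)

# Conditional covariance after adjusting for the standardized covariate S:
# Cov(Z, Y | S) = Cov(Z, Y) - Cov(Z, S) * Cov(Y, S).
print(cov(Z - cov(Z, S) * S, Y - cov(Y, S) * S),
      cov(Z, Y) - cov(Z, S) * cov(Y, S))
```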
Instrumental Variables

Let T be the treatment variable of interest, Y be the outcome, Z be the candidate instrumental variable, and X be a set of covariates. Econometrically, an instrumental variable is defined by two assumptions.

Definition 1.
A variable, Z, is called an instrumental variable for the causal effect of T on Y, β, if, conditional on the set of covariates X (which may be empty),

E1: Z is associated with T: Cov(Z, T | X) ≠ 0;

E2: Z is not associated with the structural error term, ω_Y, on Y: Cov(Z, ω_Y | X) = 0.

Assumption E1 is called relevance, and assumption E2 is called exclusion. Pearl [2001] provides a graphical definition.

Definition 2.
A variable, Z, is called an instrumental variable for the causal effect of T on Y, β, if, conditional on the set of covariates X (which may be empty),

G1: There is at least one open path from Z to T conditional on X;

G2: X does not contain descendants of Y, X ∩ desc(Y) = ∅;

G3: There is no open path from Z to Y conditional on X, other than those paths that terminate in a causal path from T to Y.

Assumption G1 defines relevance, and assumptions G2 and G3 together define exclusion. We say that a candidate instrumental variable is "valid" if it is relevant and excluded, and "invalid" otherwise. For example, in Figure 1a, Z is a valid instrument without conditioning on S, since Z is relevant (associated with T) by the open path Z → T, and Z is excluded (unassociated with the structural error term on Y) since the only open path from Z to Y, Z → T → Y, terminates in the causal effect of T on Y. When Z is a valid instrumental variable, the standard IV estimator, given by the sample analog of

β_IV = Cov(Y, Z | X) / Cov(T, Z | X),

is consistent for the causal effect of T on Y in linear and homogeneous models. The numerator of this estimator is called the reduced form and the denominator is called the first stage. The behavior of this IV estimator is the focus of this paper. For simplicity, we will henceforth write β_IV and β_OLS to refer to the probability limits (as the sample size tends to infinity) of the standard IV and OLS estimators, respectively.
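Continuing the simulation sketch from the previous section (again our illustration), the sample analog of β_IV recovers β in Figure 1a when the analysis does not condition on S, while OLS absorbs the confounding along T ← U → Y:

```python
# Reuses Z, T, Y, cov, and the parameters from the previous sketch.
beta_iv = cov(Y, Z) / cov(T, Z)   # reduced form over first stage
beta_ols = cov(T, Y) / cov(T, T)  # bivariate OLS slope
print(beta_iv)   # ~0.40 = beta
print(beta_ols)  # ~0.65 = beta + d1 * d2 (confounding bias)
```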
Selection Bias in IV: Qualitative Analysis

We say that the IV estimator suffers selection bias when conditioning on some variable violates the exclusion assumption. For example, conditioning on a variable that opens a path between Z and Y that does not terminate in the causal effect of T on Y violates exclusion both in the sense of G3 and E2. Hughes et al. [2019] catalogue several models in which selection violates exclusion.

We focus on the IV selection bias that results from conditioning on a descendant of T, S ∈ desc(T). For example, in Figure 1a, conditioning on S invalidates the use of Z as an instrumental variable, because T is the only collider variable on the path Z → T ← U → Y, and conditioning on S as the descendant of the collider T opens this path. The association "transmitted" by this open path overtly violates the exclusion condition G3 and similarly violates the exclusion condition E2, since ω_Y is a function of U. This rationalizes why Pearl's [2000:248] early graphical IV definition rules out conditioning on descendants of treatment outright.

Since conditioning on a variable can result from many different procedures during data collection or data analysis, selection bias in IV analysis can result from many different procedures as well. Analysts should be aware, however, that different ways of conditioning on a variable may induce quantitatively different selection biases. In this paper, we contrast selection bias resulting from two empirically common conditioning procedures: sample truncation and covariate adjustment.

Truncation occurs when observations are preferentially excluded from the sample (Bareinboim et al. 2014), e.g., due to attrition or listwise deletion of missing data. Let R ∈ {0, 1} indicate selection into the sample, and let S be the (possibly latent) continuous variable that determines truncation. We distinguish between interval truncation and point truncation. Interval truncation restricts the sample to observations within a range of values of S, for example, R = 1(S ≥ s₀) or R = 1(s₁ ≥ S ≥ s₀), where 1(·) is the indicator function. A limiting case of interval truncation is point truncation, where the sample is restricted to units with a single value of S, R = 1(S = s). The truncated IV estimator is given by

β_IV|Tr = Cov(Z, Y | R = 1) / Cov(Z, T | R = 1).

With truncation (as opposed to censoring) the analyst does not have access to the truncated observations, cannot estimate the probability of truncation, and hence cannot use inverse-probability weights to correct for truncation (Canan et al. 2017; Gkatzionis and Burgess 2018). In Figure 1a, a truncated sample would involve the empiricist observing {Z, T, Y} only for units with R = 1.

Although empirical practice routinely involves covariate adjustment for S, this procedure has received less attention in the literature on IV selection bias. With covariate adjustment the analyst observes {Z, T, S, Y} for all units. Adjustment involves first exactly stratifying on S, computing the estimator within each stratum, and then averaging across the marginal distribution of S. Thus the IV estimator under adjustment on S is given by

β_IV|Adj = ∫ [Cov(Z, Y | S = s) / Cov(Z, T | S = s)] f_S(s) ds,

where f_S(s) is the marginal distribution of S. In linear models, controlling for a variable as a main effect in OLS or 2SLS amounts to covariate adjustment on the variable (Angrist and Pischke 2008).

Next, we analytically characterize selection bias in IV analysis and OLS regression for various data-generating models and provide intuition.
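Both conditioning procedures can be mimicked in the running simulation. The sketch below (ours; the truncation threshold and the stratification grid are arbitrary choices) truncates the sample at S ≥ 0 for the truncated estimator and approximates the stratify-then-average definition of β_IV|Adj with narrow bins on S:

```python
# Reuses Z, T, S, Y from the earlier sketch. With truncation the retained
# variables are no longer mean zero, so use the general covariance.
c2 = lambda a, b: np.cov(a, b)[0, 1]

# Truncation: drop all units with S < 0 and form the IV ratio on the rest.
keep = S >= 0.0
iv_trunc = c2(Z[keep], Y[keep]) / c2(Z[keep], T[keep])

# Adjustment: stratify (approximately) on S, form the stratum-specific IV
# ratio, and average across the marginal distribution of S.
bins = np.digitize(S, np.linspace(-2.5, 2.5, 51))
ratios, weights = [], []
for b in np.unique(bins):
    m = bins == b
    if m.sum() > 10_000:  # skip sparse strata for numerical stability
        ratios.append(c2(Z[m], Y[m]) / c2(Z[m], T[m]))
        weights.append(m.mean())
iv_adj = np.average(ratios, weights=weights)

print(iv_trunc, iv_adj)
```

Both estimates move away from β = 0.4, and by different amounts, previewing the quantitative results of the next section.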
Selection Bias in IV: Quantitative Analysis

This section derives exact analytic expressions for selection bias across a range of common data-generating models. For each model, we contrast the selection bias for the IV and the OLS estimators resulting from two different conditioning strategies. First, we present the selection bias resulting from covariate adjustment on S. Next, we newly derive the selection bias from interval truncation on S, R = 1(S ≥ s). We assume a probit link between S and the binary selection indicator, R. Since IV analysis suffers small-sample bias regardless of selection, we study its large-sample behavior (asymptotic bias).
Consider the most basic scenario of IV selection bias in Figure 1a. As stated above, Z in this model is a valid instrumental variable for the causal effect of T on Y, β, if the analysis does not condition on S. Conditioning on S, however, invalidates Z as an instrumental variable, because S is a descendant of T, and T is a collider on the path Z → T ← U → Y. Conditioning on S opens this path, which induces an association between Z and Y via U and hence violates the exclusion condition.

Proposition 1 gives the selection bias in the standard IV estimator when the analysis adjusts for S.

Proposition 1.
In a linear and homogeneous model with normal errors represented by Figure 1a and covariate adjustment on S, the standard instrumental variables estimator converges in probability to

β_IV|Adj = β − δ₁δ₂γ² / (1 − γ²).

The proof follows from regression algebra and Wright's rule (Wright 1934). The magnitude of selection bias due to covariate adjustment in the IV estimator depends on two components. First, selection bias increases with the strength of unobserved confounding between T and Y via U, δ₁δ₂ (which corresponds to the path Z → T ← U → Y that is opened by conditioning on S, less the first stage Z → T). Second, selection bias increases with the effect of the treatment T on the selection variable, S, γ. When γ = 0, S contains no information about the collider T, conditioning on S does not open the path Z → T ← U → Y, and selection bias is zero. By contrast, as |γ| → 1, the magnitude of the bias increases without bound because adjusting for S increasingly amounts to adjusting for the collider T itself, while at the same time reducing the first stage. (If the analysis directly adjusted for T, then the first stage would go to zero and the IV estimator would not be defined.)
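As a quick numerical check (ours), the stratified estimate iv_adj from the earlier sketch matches the closed form of Proposition 1 for the illustrative parameters:

```python
# Proposition 1: beta_IV|Adj = beta - d1*d2*gamma^2 / (1 - gamma^2).
prop1 = beta - d1 * d2 * gamma**2 / (1 - gamma**2)
print(iv_adj, prop1)  # both approximately 0.26, versus the true beta = 0.4
```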
Proposition 2 derives the IV selection bias due to interval truncation on S.

Proposition 2. In a linear and homogeneous model with normal errors represented by Figure 1a and truncation on S, R = 1(S ≥ s), the standard instrumental variables estimator converges in probability to

β_IV|Tr = β − δ₁δ₂ψγ² / (1 − ψγ²),

where

ψ = [φ(s) / (1 − Φ(s))] · ([φ(s) / (1 − Φ(s))] − s),

and φ(·) and Φ(·) are the standard normal pdf and cdf, respectively.³

³ Numerical simulations in prior work have assumed logit selection (Canan et al. 2017; Hughes et al. 2019; Gkatzionis and Burgess 2018). Switching to probit selection captures the same intuition, but gains analytic tractability.

Figure 2: (a) "Truncation severity versus ψ": ψ monotonically increases with truncation severity. (b) "Least biased estimator": whether OLS or IV is less biased under selection depends on truncation severity and the effect of T on S, |γ|.

Proposition 2 (proved in Appendix 7.1) illustrates that IV selection bias due to truncation (Proposition 2) differs from IV selection bias due to adjustment (Proposition 1) only in that truncation deflates the contribution of the effect of T on S, γ, by the factor ψ ∈ (0, 1). Since ψ is the derivative of the standard normal hazard function, it monotonically increases with the severity of truncation, Pr(R = 0) = Φ(s), as shown in Figure 2a. Hence, interval truncation leads to less IV selection bias than covariate adjustment in Figure 1a, as stated in Corollary 1 below.
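Before stating the corollary, a self-contained sketch (ours) evaluates ψ with SciPy and verifies Proposition 2 against a truncated simulation of Figure 1a:

```python
import numpy as np
from scipy.stats import norm

def psi(s):
    lam = norm.pdf(s) / (1 - norm.cdf(s))  # standard normal hazard at s
    return lam * (lam - s)                 # its derivative, in (0, 1)

pi, beta, gamma, d1, d2, s = 0.5, 0.4, 0.6, 0.5, 0.5, 0.0  # illustrative

rng = np.random.default_rng(1)
n = 1_000_000
Z = rng.normal(size=n)
U = rng.normal(size=n)
T = pi * Z + d1 * U + rng.normal(0, np.sqrt(1 - pi**2 - d1**2), n)
S = gamma * T + rng.normal(0, np.sqrt(1 - gamma**2), n)
Y = beta * T + d2 * U + rng.normal(0, np.sqrt(1 - beta**2 - d2**2 - 2*beta*d1*d2), n)

keep = S >= s
c2 = lambda a, b: np.cov(a, b)[0, 1]
iv_sim = c2(Z[keep], Y[keep]) / c2(Z[keep], T[keep])

# Proposition 2: beta_IV|Tr = beta - d1*d2*psi*gamma^2 / (1 - psi*gamma^2).
iv_formula = beta - d1 * d2 * psi(s) * gamma**2 / (1 - psi(s) * gamma**2)
print(iv_sim, iv_formula)  # agree up to simulation noise

# psi increases monotonically with the truncation threshold (severity).
print([round(psi(x), 3) for x in (-2, -1, 0, 1, 2)])
```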
Corollary 1. In a linear and homogeneous model with normal errors represented by Figure 1a, the magnitude of IV-adjustment bias is weakly larger than that of IV-truncation bias:

|β_IV|Adj − β| ≥ |β_IV|Tr − β|.

Corollary 1 makes intuitive sense. Adjustment involves first exactly stratifying and then averaging across strata defined by S = s. Exact stratification on S uses all information about T that is contained in S, hence opening the biasing path as much as conditioning on S possibly can. By contrast, interval truncation amounts to imprecise stratification on S (retaining observations across a range of values on S, but not exactly stratifying on any particular value), hence "less opening" the biasing path.

Of some methodological interest, we further note, in Figure 1a, that IV selection bias by truncation converges on IV selection bias by covariate adjustment as the severity of truncation increases to shrink the remaining sample to a single point. Proposition 3 states that this observation is true for all models, not only for Figure 1a.

Proposition 3. In a linear and homogeneous model with normal errors, selection bias in the standard instrumental variables estimator due to covariate adjustment is the limiting case of selection bias due to point truncation,

lim_{s→∞} β_IV|Tr = β_IV|Adj.

This proposition makes intuitive sense. Covariate adjustment involves exact stratification on S = s, which defines point truncation. Since the probability limits of all s-stratum-specific estimators are identical in linear Gaussian models, selection bias by adjustment equals selection bias by point truncation. The proof in Appendix 7.2 formalizes this intuition.

Proposition 2 helps inform empirical choices in practice. When selection is unavoidable (e.g., because the data were truncated during data collection), should analysts choose IV or OLS? Figure 2b shows that the IV estimator is preferred to OLS, with respect to bias, for most combinations of γ and truncation severity. Since OLS bias (with or without truncation) only depends on unobserved confounding, i.e., β_OLS|Tr − β = δ₁δ₂, the difference in magnitude between the OLS and IV biases with truncation is given by

|β_OLS|Tr − β| − |β_IV|Tr − β| = |δ₁δ₂| · (1 − 2ψγ²) / (1 − ψγ²).

Hence, the IV estimator is preferred when ψγ² ≤ 1/2. Specifically, when fewer than 29.1% of observations are truncated (corresponding to ψ ≤ 0.5), IV is less biased than OLS regardless of the effect of T on S, γ. Conversely, when |γ| < √0.5 ≈ 0.71, IV is less biased than OLS at any level of truncation. Since γ cannot exceed 1 in magnitude, the selection variable S would have to be an extraordinarily strong proxy for T to make IV more biased than OLS at any level of truncation.

Perhaps most useful for practice, we note that selection bias (by truncation or adjustment) in Figure 1a is proportional to the negative of OLS confounding bias. Therefore, the OLS and IV estimators under selection bound the true causal effect.
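Before turning to the bounding result, note that this preference rule is easy to operationalize: given the truncation share Pr(R = 0), invert the probit link to recover s, compute ψ, and compare ψγ² with 1/2. A sketch (ours; the helper name and the example shares are our choices):

```python
from scipy.stats import norm

def psi(s):
    lam = norm.pdf(s) / (1 - norm.cdf(s))
    return lam * (lam - s)

def psi_from_truncation_share(share):
    # Pr(R = 0) = Phi(s) is the share of observations truncated away.
    return psi(norm.ppf(share))

# IV is less biased than OLS whenever psi * gamma^2 <= 1/2.
for share in (0.10, 0.291, 0.50, 0.90):
    p = psi_from_truncation_share(share)
    rule = ("IV preferred for any gamma" if p <= 0.5
            else f"IV preferred iff gamma^2 <= {0.5 / p:.3f}")
    print(f"share={share:.3f}  psi={p:.3f}  {rule}")
```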
Corollary 2. In a linear and homogeneous model with normal errors represented by Figure 1a, the OLS estimator and the instrumental variables estimator with selection bound the causal effect of T on Y, β:

β_IV|Tr ≤ β ≤ β_OLS when δ₁δ₂ > 0,
β_IV|Tr ≥ β ≥ β_OLS when δ₁δ₂ < 0.

The fact that the IV selection bias has the opposite sign of the OLS bias in Figure 1a is owed to linearity and homogeneity: in linear and homogeneous models, conditioning on a collider or its descendant reverses the sign of the product of the path parameters for the associated path. For example, if all path parameters along the biasing path Z → T ← U → Y are positive, then conditioning on S ∈ desc(T) will induce a negative association along this path. Since the IV bias hinges on conditioning on S, the selection bias would be negative. By contrast, OLS bias in Figure 1a does not hinge on conditioning on S and instead results from confounding along T ← U → Y. Therefore, OLS bias would be positive.

Next, consider models in which the selection variable, S, is a mediator of the effect of treatment on the outcome, as in the causal graphs in Figures 3a and 3b. These situations are worth investigating for two reasons: first, empiricists are often interested in the direct causal effect of T on Y, which necessitates conditioning on S; second, they result in qualitatively different bias representations.

Suppose that the analyst is interested in the direct causal effect of T on Y, β, in the model of Figure 3a. The bias in the IV and OLS estimators under interval truncation and adjustment for S is given in Proposition 4.

Proposition 4.
In a linear and homogeneous model with normal errors represented by Figure 3a, the standard instrumental variables estimator with selection on S converges in probability to

β_IV|S = β − δ₁δ₂ψγ² / (1 − ψγ²) + γτ(1 − ψ) / (1 − ψγ²),

and the OLS estimator with selection on S converges in probability to

β_OLS|S = β + δ₁δ₂ + γτ(1 − ψ) / (1 − ψγ²),

where ψ = [φ(s) / (1 − Φ(s))] · ([φ(s) / (1 − Φ(s))] − s) with truncation on S, R = 1(S ≥ s), and ψ = 1 with adjustment on S.

Figure 3: IV scenarios where the selection variable is both a descendant of treatment and a mediator: (a) S mediates part of the effect of T on Y, with path parameters π, β, τ, γ, δ₁, and δ₂; (b) additionally, an unobserved variable W confounds the effect of S on Y, with path parameters δ₃ (W on S) and δ₄ (W on Y).

All bias expressions in Proposition 4 have a straightforward graphical interpretation. With adjustment on S, the indirect causal path T → S → Y is completely blocked, because S is a non-collider on this path. Hence, the bias in the IV and OLS estimators with adjustment on S equals the IV and OLS adjustment biases in Figure 1a, where S was not a mediator. With adjustment on S, IV is biased by selection, whereas OLS is biased by confounding; IV selection bias will generally be smaller in magnitude than OLS confounding bias (unless the effect of T on S is very large); and IV and OLS with adjustment bound the true direct causal effect.

With truncation on S, however, the indirect path T → S → Y is not completely blocked and hence contributes a new term to both IV and OLS bias. For both IV and OLS, this term equals the strength of the partially blocked indirect path, γτ, deflated by the multiplier 0 ≤ (1 − ψ)/(1 − ψγ²) ≤ 1. The size of the multiplier depends both on the truncation severity, ψ, and on the effect of T on S, γ, but in opposite directions. As γ is fixed and truncation increases, ψ → 1, the analysis conditions ever more precisely on an ever smaller range of values of S; hence the indirect path is increasingly blocked, and both the multiplier and the bias term tend to 0. By contrast, when ψ is fixed and the effect of T on S increases, |γ| → 1, the information about T contained in S increases, the multiplier tends to 1, and the path is increasingly opened.

By Proposition 3, it remains true in Figure 3a that IV selection bias due to adjustment is the limiting case of IV selection bias due to point truncation. However, it is no longer necessarily true that IV with adjustment is more biased than IV with truncation. The bias ordering now depends on the signs and relative sizes of the two additive bias terms (representing the biasing paths T ← U → Y and T → S → Y), and on how well the indirect path T → S → Y is closed by truncation. Hence, when selection is made on a mediator of the treatment effect, selection bias by adjustment could be larger or smaller in magnitude than selection bias by truncation. Bounding the true causal effect also becomes more difficult. With truncation on S, IV and OLS with selection do not necessarily bound the true direct causal effect.

The analysis is further complicated when the effect of S on Y is confounded by some unobserved variable, W, as in Figure 3b. This situation is arguably more realistic than the model in Figure 3a, because mediators in observational studies are expected to be confounded. Here, conditioning on S (by adjustment or truncation) in IV analysis opens a new path, Z → T → S ← W → Y, which violates the exclusion assumption; and in OLS it opens T → S ← W → Y, which biases OLS regression. The resulting bias expressions are the same as those in Proposition 4 with an additional bias term,

−γδ₃δ₄ψ / (1 − ψγ²).

Once more, IV selection bias due to adjustment is the limiting case of IV selection bias due to point truncation. However, no pair of estimators (among β_IV|Tr, β_IV|Adj, β_OLS|Tr, β_OLS|Adj) can be relied on to bound the true direct causal effect in the model of Figure 3b.
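The mediator-case expressions can be checked the same way. A self-contained sketch (ours; parameters illustrative) simulates Figure 3a, truncates at S ≥ 0, and compares the simulated IV and OLS estimates with the formulas of Proposition 4:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 1_000_000
pi, beta, tau, gamma, d1, d2, s = 0.5, 0.3, 0.3, 0.5, 0.4, 0.4, 0.0

Z = rng.normal(size=n)
U = rng.normal(size=n)
T = pi * Z + d1 * U + rng.normal(0, np.sqrt(1 - pi**2 - d1**2), n)
S = gamma * T + rng.normal(0, np.sqrt(1 - gamma**2), n)
# Scale the shock on Y so that Var(Y) = 1 given all incoming paths.
vY = (beta**2 + tau**2 + d2**2 + 2*beta*tau*gamma
      + 2*beta*d1*d2 + 2*tau*gamma*d1*d2)
Y = beta * T + tau * S + d2 * U + rng.normal(0, np.sqrt(1 - vY), n)

lam = norm.pdf(s) / (1 - norm.cdf(s))
ps = lam * (lam - s)

keep = S >= s
c2 = lambda a, b: np.cov(a, b)[0, 1]
iv_sim = c2(Z[keep], Y[keep]) / c2(Z[keep], T[keep])
ols_sim = c2(T[keep], Y[keep]) / c2(T[keep], T[keep])

iv_form = (beta - d1*d2*ps*gamma**2 / (1 - ps*gamma**2)
           + gamma*tau*(1 - ps) / (1 - ps*gamma**2))
ols_form = beta + d1*d2 + gamma*tau*(1 - ps) / (1 - ps*gamma**2)
print(iv_sim, iv_form)    # direct effect beta = 0.3 plus the IV bias terms
print(ols_sim, ols_form)  # confounding bias plus the mediator term
```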
Figure 4: IV scenario where the selection variable is both a descendant of the treatment and the unobserved confounder: S = γT + δ₃U + ε_S, with path parameters π, β, γ, δ₁, δ₂, and δ₃.

Finally, we consider situations where the selection variable, S, is also a descendant of the unobserved U that confounds the effect of treatment on the outcome (Figure 4).

Proposition 5.
In a linear and homogeneous model with normal errors represented by Figure 4, the standard instrumental variables estimator with selection on S converges in probability to

β_IV|S = β − δ₁δ₂ψγ² / (1 − ψγ(γ + δ₁δ₃)) − γδ₂δ₃ψ / (1 − ψγ(γ + δ₁δ₃)),

and the OLS estimator with selection on S converges in probability to

β_OLS|S = β + δ₁δ₂(1 − ψ(γ² + γδ₁δ₃ + δ₃²)) / (1 − ψ(γ + δ₁δ₃)²) − γδ₂δ₃ψ / (1 − ψ(γ + δ₁δ₃)²),

where ψ = [φ(s) / (1 − Φ(s))] · ([φ(s) / (1 − Φ(s))] − s) with truncation on S, R = 1(S ≥ s), and ψ = 1 with adjustment on S.

Three points stand out about selection bias in Figure 4. First, when S is a descendant of both T and U, conditioning on S opens a new path, T → S ← U → Y, which biases IV and OLS with adjustment or truncation on S.

Second, in contrast to models considered previously, the bias term associated with each biasing path (T ← U → Y and T → S ← U → Y) is now a function of the path parameters of both paths. In other words, the path-specific biases interact. Pearl's graphical causal models provide intuition for this interaction. Consider, for example, the second bias term. First, conditioning on S opens the path T → S ← U → Y. Hence, the bias term depends on γδ₂δ₃. Second, conditioning on S also absorbs variance from U (a non-collider on T → S ← U → Y), because S is a descendant of U along the path U → T → S. Hence, the bias term also depends on δ₁.

Third, the direction of the interaction, and hence the overall bias, depends on the specific parameter values. This makes the bias order of these estimators fairly unpredictable and prevents generic recommendations for or against any one estimator. This ambiguity provides additional motivation for using exact bias formulas for sensitivity analysis.
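Proposition 5 admits the same kind of numerical check. A self-contained sketch (ours), simulating Figure 4 with S = γT + δ₃U + ε_S and comparing truncated estimates with the closed forms:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 1_000_000
pi, beta, gamma, d1, d2, d3, s = 0.5, 0.4, 0.5, 0.4, 0.4, 0.3, 0.0

Z = rng.normal(size=n)
U = rng.normal(size=n)
T = pi * Z + d1 * U + rng.normal(0, np.sqrt(1 - pi**2 - d1**2), n)
S = gamma * T + d3 * U + rng.normal(0, np.sqrt(1 - gamma**2 - d3**2 - 2*gamma*d1*d3), n)
Y = beta * T + d2 * U + rng.normal(0, np.sqrt(1 - beta**2 - d2**2 - 2*beta*d1*d2), n)

lam = norm.pdf(s) / (1 - norm.cdf(s))
ps = lam * (lam - s)
a = gamma + d1 * d3  # Cov(T, S): path T -> S plus path T <- U -> S

keep = S >= s
c2 = lambda x, y: np.cov(x, y)[0, 1]
iv_sim = c2(Z[keep], Y[keep]) / c2(Z[keep], T[keep])
ols_sim = c2(T[keep], Y[keep]) / c2(T[keep], T[keep])

iv_form = beta - (d1*d2*ps*gamma**2 + gamma*d2*d3*ps) / (1 - ps*gamma*a)
ols_form = beta + (d1*d2*(1 - ps*(gamma**2 + gamma*d1*d3 + d3**2))
                   - gamma*d2*d3*ps) / (1 - ps*a**2)
print(iv_sim, iv_form)
print(ols_sim, ols_form)
```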
Conclusion

Conditioning on the wrong variable can induce selection bias in IV analysis. When consistent estimators are not available, analysts should gauge the bias in their estimators by principled speculation or formal sensitivity analysis. To enable this work, we have derived analytic expressions for IV selection biases that have previously been characterized only by simulation.

Our analysis specifically focused on scenarios where selection is a function of a confounded treatment. Judea Pearl's [2000] graphical IV criterion specifically prohibited conditioning on a descendant of treatment. But the practice appears to remain common, calling for formal analysis. Our analytic expressions present asymptotic IV selection bias in terms of substantively interpretable standardized path parameters for Gaussian models. Empowered by Pearl's graphical causal models, we further provided intuition by decomposing the bias into terms that map onto the paths in the data-generating model that are opened (or closed) by selection. Leveraging prior knowledge or principled theory, analysts may use our bias expressions to conduct formal sensitivity analyses by populating the free parameters to derive the size of the bias. Even with partial information our expressions may provide informative bounds on the bias.

We present three broad conclusions. First, in the models we investigated, IV selection bias depends on three ingredients: (i) the strength of each biasing path in the model; (ii) the effect of treatment on the selection variable, |γ|; and (iii) truncation severity, ψ, i.e., the share of the full sample excluded from the analysis by truncation. The magnitude of the bias term associated with each biasing path increases with the strength of the path, with |γ|, and with truncation severity, ψ, if selection is made on a collider or descendant of a collider on the path; and the magnitude of the bias term increases with the strength of the path and with |γ|, but decreases with ψ, if selection is made on a non-collider on the biasing path.

Second, sign and magnitude of IV selection bias depend on the selection procedure: in all linear Gaussian IV models, the bias induced by covariate adjustment is the limiting case of bias induced by point truncation. This does not mean that adjustment bias is always larger than truncation bias, only that adjustment bias equals truncation bias if truncation had reduced the sample to a single point.

Third, rather usefully, in some models (where selection is only a function of treatment and the selection variable is not a mediator), IV and OLS suffer selection biases of opposite signs, such that these estimators bound the true causal effect. In the same models, unless the effect of treatment on selection is very large, IV with selection suffers less bias than OLS with or without selection.
Bibliography

J. D. Angrist and J.-S. Pischke. 2008. Mostly Harmless Econometrics. Princeton University Press.

A. Balke and J. Pearl. 1997. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439): 1171–1176.

E. Bareinboim, J. Tian, and J. Pearl. 2014. Recovering from selection bias in causal and statistical inference. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, pp. 2410–2416.

C. Canan, C. Lesko, and B. Lau. 2017. Instrumental variable analyses and selection bias. Epidemiology, 28(3): 396–398.

J. Engberg, D. Epple, J. Imbrogno, H. Sieg, and R. Zimmer. 2014. Evaluating education programs that have lotteried admission and selective attrition. Journal of Labor Economics, 32(1): 27–63.

A. Ertefaie, D. Small, J. Flory, and S. Hennessy. 2016. Selection bias when using instrumental variable methods to compare two treatments but more than two treatments are available. The International Journal of Biostatistics, 12(1): 219–232.

A. Gkatzionis and S. Burgess. 2018. Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? International Journal of Epidemiology.

R. A. Hughes, N. M. Davies, G. Davey Smith, and K. Tilling. 2019. Selection bias when estimating average treatment effects using one-sample instrumental variable analysis. Epidemiology, 30(3): 350–357.

M. Maathuis, M. Drton, S. Lauritzen, and M. Wainwright, eds. 2018. Handbook of Graphical Models. CRC Press.

M. Mogstad and M. Wiswall. 2012. Instrumental variables estimation with partially missing instruments. Economics Letters, 114(2): 186–189.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

J. Pearl. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, NY, USA. ISBN 0-521-77362-8.

J. Pearl. 2001. Parameter identification: A new perspective (second draft). Technical Report R-276, UCLA Cognitive Systems Laboratory.

J. Pearl. 2009. Causality, second edition. Cambridge University Press, New York, NY, USA.

T. S. Richardson and J. M. Robins. 2010. Analysis of the binary instrumental variable model. In Heuristics, Probability and Causality: A Tribute to Judea Pearl, pp. 415–444. College Publications.

S. A. Swanson, J. M. Robins, M. Miller, and M. A. Hernán. 2015. Selecting on treatment: A pervasive form of bias in instrumental variable analyses. American Journal of Epidemiology, 284: 1–7.

G. M. Tallis. 1965. Plane truncation in normal populations. Journal of the Royal Statistical Society, Series B, 27(2): 301–307.

S. Vansteelandt, S. Walter, and E. Tchetgen Tchetgen. 2018. Eliminating survivor bias in two-stage instrumental variable estimators. Epidemiology, 29(4): 536–541.

T. Verma and J. Pearl. 1988. Causal networks: Semantics and expressiveness. In Proceedings of the Fourth Annual Conference on Uncertainty in Artificial Intelligence, pp. 69–78. North-Holland Publishing Co.

S. Wright. 1934. The method of path coefficients. The Annals of Mathematical Statistics, 5(3): 161–215.

Appendix
We derive the bias under truncation by leveraging a result from Tallis [1965].
Lemma 1.
Let V ∈ ℝᵏ follow a multivariate normal distribution, V ∼ N(0, Σ), and define the truncated random vector Ṽ = {v ∈ V : c′v ≥ p} with p ∈ ℝ, c ∈ ℝᵏ, and |c| = 1. Then the expectation and variance of the truncated random vector are given by

E[Ṽ] = Σc κ λ(pκ),
Var(Ṽ) = Σ − Σcc′Σ κ² ψ,

where κ = (c′Σc)^(−1/2), λ(x) = φ(x)/(1 − Φ(x)) is the hazard function of the standard normal distribution, and ψ = λ(pκ)(λ(pκ) − pκ).

Using properties of the standard normal hazard function, it can be shown that ψ is in fact the derivative of the hazard function.
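A quick Monte Carlo check of Lemma 1 (our sketch; the covariance matrix, truncation direction, and tolerances are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
k, n, p = 3, 2_000_000, 0.4
A = rng.normal(size=(k, k))
Sigma = A @ A.T + np.eye(k)      # an arbitrary positive-definite covariance
c = rng.normal(size=k)
c /= np.linalg.norm(c)           # normalize so that |c| = 1

V = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
keep = V @ c >= p                # truncate on the halfspace c'v >= p

kappa = (c @ Sigma @ c) ** -0.5
lam = lambda x: norm.pdf(x) / (1 - norm.cdf(x))   # standard normal hazard
psi = lam(p * kappa) * (lam(p * kappa) - p * kappa)

mean_formula = Sigma @ c * kappa * lam(p * kappa)
var_formula = Sigma - Sigma @ np.outer(c, c) @ Sigma * kappa**2 * psi

print(np.allclose(V[keep].mean(axis=0), mean_formula, atol=0.02))
print(np.allclose(np.cov(V[keep].T), var_formula, atol=0.02))
```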
Proof of Proposition 2. Consider the model described by Figure 1a. Since the idiosyncratic shocks are all normally distributed, all variables in the model are normally distributed. Specifically, for vectors V = [Z U T S Y]′ and ε = [ε_Z ε_U ε_T ε_S ε_Y]′, the standardized model has the reduced form V = Γε, where ε ∼ N(0, Σ_ε),

Γ =
[ 1    0          0   0   0
  0    1          0   0   0
  π    δ₁         1   0   0
  γπ   γδ₁        γ   1   0
  βπ   βδ₁ + δ₂   β   0   1 ],

and

Σ_ε = diag(1, 1, 1 − π² − δ₁², 1 − γ², 1 − β² − δ₂² − 2βδ₁δ₂).⁴

Since this implies that V ∼ N(0, ΓΣ_εΓ′), our truncation scenario, R = 1(S ≥ s), allows for direct application of Lemma 1 to derive the covariance matrix of the truncated distribution, Ṽ = V | S ≥ s. For Lemma 1, c = [0 0 0 1 0]′, p = s, and Σ = ΓΣ_εΓ′. This implies κ = 1 and

Var(Ṽ) = ΓΣ_εΓ′ − ΓΣ_εΓ′cc′ΓΣ_εΓ′ ψ,

where ψ = λ(s)(λ(s) − s). Finally, the IV estimand with truncation is given by the ratio of the truncated covariance between instrument and outcome and the truncated covariance between instrument and treatment. After some enjoyable algebra, we evaluate Var(Ṽ), extract the relevant covariances, and obtain

β_IV|Tr = Cov(Z, Y | S ≥ s) / Cov(Z, T | S ≥ s)
        = [βπ − ψγπ(βγ + γδ₁δ₂)] / [π − ψγ²π]
        = β − δ₁δ₂ψγ² / (1 − ψγ²).

The proofs of Propositions 4 and 5 proceed analogously, using the appropriate reduced-form matrix, Γ, for each scenario.

⁴ Standardization implies non-unit variance for some of the shocks. For example, when Var(T) = 1, the variance of ε_T is Var(ε_T) = 1 − π² − δ₁².