Constructing valid instrumental variables in generalized linear causal models from directed acyclic graphs

Abstract

Unlike other techniques of causal inference, the use of valid instrumental variables can deal with unobserved sources of variable errors, variable omissions, and sampling bias, and still arrive at consistent estimates of average treatment effects. The only problem is to find the valid instruments. Using the definition of Pearl (2009) of valid instrumental variables, a formal condition for validity can be stated for variables in generalized linear causal models. The condition can be applied in two different ways: as a tool for constructing valid instruments, or as a foundation for testing whether an instrument is valid. When perfectly valid instruments are not found, the squared bias of the IV-estimator induced by an imperfectly valid instrument, estimated with bootstrapping, can be added to its empirical variance in a mean-square-error-like reliability measure.


arXiv [econ.EM], Feb 2021

Øyvind [email protected] 17, 2021


Causal graphs and the theoretical results of Pearl (2009) have so far not found many applications within empirical econometrics. Based on this fact, Imbens (2020) suggests that causal graphs are relatively unproductive. An alternative explanation, though, is that causal graph theory has never been accepted as part of the foundations of econometrics. Without taking a final stand in this controversy, I will contribute improvements to instrumental variables (IV) estimators based on Pearl's theoretical concepts.

The origins of instrumental variables date back almost one hundred years. The geneticist Sewall Wright introduced causal graphs (Wright 1921), and his economist father Philip introduced IV, the IV-estimator, and the equivalent two-step least squares estimator (Wright 1928). Later contributions include Haavelmo (1943). Based on Stock & Trebbi (2003): "A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable."

However, a statistical test of their validity has so far been lacking within econometrics. This seems due to the fact that validity of an instrument has not yet found a definition with testable implications within this branch of science. So far, IV-based causal analysis survives on the idea that valid instruments are rare and should be selected based on verbal criteria: "detailed institutional knowledge and the careful investigation and quantification of the forces at work in the particular setting" (Angrist & Krueger 2001).

At least for linear models, and for a substantial class of non-linear ones, it is possible to improve on this situation by applying the definition of Pearl (2009): "A variable $z$ is instrumental with respect to a treatment $x$ and an outcome $y$, iff in the causal graph involving $(z, x, y)$, $z$ is a cause of $x$ and $z$ and $y$ are conditionally independent given $x$". The definition needs some translation to point out the testable condition, and this will be provided in this article.

A formal test will increase the precision of IV techniques, as the selection of instruments may be more precise than when based on the verbal criterion. In addition, a test may increase the applicability of IV in two different ways. First, instruments that do not pass the verbal criterion may pass the test. Secondly, even almost valid instruments may provide informative causal inference. The trick is to add the squared bias due to an almost valid instrument to the variability of the estimated causal effect in a mean-squared-error-like measure.

Causal modelling with experimental data has been well understood since Fisher (1960). Basically, when treatments are all under control, the causal effects on responses are found by means of regression models. The main problem with causal analysis and non-experimental data is that treatments have not been controlled.
Of course, control variables can be added to the extent they are observed, but there is always a possibility that some unobserved variable makes the regression coefficients biased or inconsistent as measures of causal effects. Simpson's paradox (that an average effect over the non-stratified population has a different sign from the corresponding average effects in all population strata) may even arise in such situations. One will never know if it did, though, since the clarifying explanation is unobserved. Under ideal circumstances, the IV-estimator is expected to solve the problem of unobserved variables affecting regressions.

Actually, perfectly valid instruments are not expected to be found. Instrument validity is a property of the population or super-population. Test statistics for a sample of the population, or a population of the super-population, are always affected by sampling errors. A probability then needs to be assigned to a statement that some variable is a valid instrument in some setting.

This paper continues in section 2 with a clarification of the close connection between a simple linear causal graph and the corresponding econometric terms. In section 3, Pearl's definition of a valid instrumental relationship between $z$ and $(x, y)$ is translated to econometric equations which should hold at the (super-)population level. Construction of valid instruments, possible test design and estimation of bias by means of bootstrapping are described in section 4. Finally, section 5 concludes.

A simple five-noded causal graph as in figure 1 is sufficient to illustrate the relationships between causal graph theory and econometrics. It is emphasised that the graph is a hypothesis of the causal relationships, or equivalently of the data generating process. Alternative hypotheses with arrows pointing in other directions are possible as long as no cycles arise. The graph needs to be a directed acyclic graph (DAG).
Statistics cannot in general say whether $x$ is a cause of $y$ or the opposite, or whether one hypothesis of causality is more likely than another, as long as there are no missing edges in the graph. Support for the causal hypothesis needs to be found in theory or common sense. For example, with reference to figure 1, $w$ might be the external determinants of an economic agent, $x$ might be his capital stocks, and $y$ his short-run production decisions.

Useful terminology distinguishes the starting and ending nodes of an arrow. When nodes $a$ and $b$ are connected with an arrow, $a \to b$, $b$ is named a child of $a$, and $a$ is named a parent of $b$. Causality runs from parents to children, not the other way. In contrast, correlation runs both ways. In relation to parents and children, the meanings of ancestors and descendants are the obvious ones.

This graph assumes that $x$ is a parent of $y$, while $w$ is a parent of both $x$ and $y$. In addition, $\epsilon_x$, representing all unknown parents of $x$, is also a parent of $x$, and $\epsilon_y$ is likewise a parent of $y$. Usually, error terms like $\epsilon_x$ and $\epsilon_y$ are dropped from the graph, but for the sake of explanation they are kept here.

Each node represents independent observations of a single variable, or a block of variables, with the name of the node. All variables have the same number of observations. If the node has parents, the $i$-th row of a node is a function of the $i$-th rows of the parent nodes. In this case only linear functions are considered, and the functions are

$$x_i = w_i \beta_{w,x} + \epsilon_{i,x} \quad \text{and} \quad y_i = w_i \beta_{w,y} + x_i \beta_{x,y} + \epsilon_{i,y}.$$

With parameter estimates known, the causal effect on $y$ of changing $x$ to $x'$ is, in Pearl's terms, found with the "do-operator", $y(\mathrm{do}(x')) = x' \beta_{x,y}$. The effect is

$$y(\mathrm{do}(x')) - y(\mathrm{do}(x)) = (x' - x) \beta_{x,y}.$$

In more complex cases, the do-operator follows all directed paths from $x$ to $y$.

In general, Pearl assumes that a probability, $p_i$, is associated with each realisation
$(w_i, x_i, y_i, \epsilon_{i,x}, \epsilon_{i,y})$, and that the error term of a node is independent of the other parent nodes. Here: $\epsilon_x \perp\!\!\!\perp w$ and $\epsilon_y \perp\!\!\!\perp (w, x)$. Independence is in turn a property of probability distributions $P$, more precisely: $a \perp\!\!\!\perp b$ means $P(a \mid b) = P(a)$.

[Figure 1: Causal graph of $(w, x, y)$]

Econometrics with linear models tends to skip probabilities and rely on first and second order moments only. The same trick is applicable for causal graphs. Independence is then replaced with the weaker condition of orthogonality. As is well known, equivalence of orthogonality and independence requires jointly normally distributed variables.

The relevance of the toy graph of figure 1 follows from the following lemma.

Lemma 1.

The nodes of an arbitrary directed acyclic graph can be sorted so that each node depends only on parents of lower order.

Proof.

First take the nodes without parents, in arbitrary order. If none exist, the graph contains a cycle. Then take the nodes whose parents are all ordered, in arbitrary order, and continue until no remaining node depends only on ordered ones. Either there are no more nodes at all, and all nodes are ordered, or there remains a subset of nodes each depending on at least one non-ordered node. In the latter case the subset contains a cycle, contradicting acyclicity.

Such sorting is helpful in complex graphs. For the situation depicted in figure 1, the following lemma will also be proved.
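Before turning to that lemma, the sorting procedure used in the proof of Lemma 1 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original argument: the function name `sort_dag`, the dictionary representation of the graph, and the node labels `eps_x` and `eps_y` are assumptions made for the example, and every node is assumed to appear as a key of the dictionary.

```python
def sort_dag(parents):
    """Order nodes so that every node comes after all of its parents.

    `parents` maps each node name to the collection of its parent nodes.
    Every node must appear as a key. Raises ValueError on a cycle.
    """
    ordered = []
    placed = set()
    remaining = set(parents)
    while remaining:
        # Nodes whose parents have all been ordered already
        # (in the first pass: the nodes without parents).
        ready = sorted(n for n in remaining if set(parents[n]) <= placed)
        if not ready:
            # Every remaining node depends on a non-ordered node: a cycle.
            raise ValueError("graph contains a cycle among %s" % sorted(remaining))
        ordered.extend(ready)
        placed.update(ready)
        remaining.difference_update(ready)
    return ordered


# The five-node graph of figure 1: w -> x, w -> y, x -> y, plus error nodes.
figure_1 = {
    "w": [],
    "eps_x": [],
    "eps_y": [],
    "x": ["w", "eps_x"],
    "y": ["w", "x", "eps_y"],
}
order = sort_dag(figure_1)
print(order)
```

For the graph of figure 1 the parentless nodes come first, then x, then y. Ties within a pass are broken alphabetically only to make the result reproducible; the lemma allows any order within a pass.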

Lemma 2.

Let $(w, x, y, \epsilon_x, \epsilon_y)$ be blocks of variables of length $n$ with $(w, x, y)$ non-singular and with zero means, linearly related as

$$(w, x, y) \begin{pmatrix} I & -\beta_{w,x} & -\beta_{w,y} \\ 0 & I & -\beta_{x,y} \\ 0 & 0 & I \end{pmatrix} = (w, \epsilon_x, \epsilon_y) \qquad (1)$$

with error terms, $(\epsilon_x, \epsilon_y)$, orthogonal to predictors: $w \perp \epsilon_x$, $(w, x) \perp \epsilon_y$. All relevant variables affecting $x$ and $y$ are observed. Then:

• all nodes with no parents are orthogonal: $w \perp \epsilon_x$, $(w, \epsilon_x) \perp \epsilon_y$
• all parameters are equivalent to those of OLS
• all linear causal effects are given by OLS-parameters

Proof. By assumption $\epsilon_x \perp w$ and $\epsilon_y \perp w$. Since $\epsilon_y \perp x$ is equivalent to $\epsilon_y \perp (\epsilon_x + w \beta_{w,x})$, and $\epsilon_y \perp w$, it follows that $\epsilon_y \perp \epsilon_x$. Orthogonality of the nodes without parents is then proved.

With regard to OLS-parameters, it is well known that OLS parameters imply orthogonality between errors and predictors. To show the converse, by the assumed orthogonality:

$$0 = (w, x)^T \epsilon_y = (w, x)^T \left( y - (w, x) \begin{pmatrix} \beta_{w,y} \\ \beta_{x,y} \end{pmatrix} \right) = (w, x)^T y - (w, x)^T (w, x) \begin{pmatrix} \beta_{w,y} \\ \beta_{x,y} \end{pmatrix}$$

Multiplication with $\left( (w, x)^T (w, x) \right)^{-1}$ on both sides, possible by non-singularity, shows that the parameters are OLS:

$$\left( (w, x)^T (w, x) \right)^{-1} (w, x)^T y = \begin{pmatrix} \beta_{w,y} \\ \beta_{x,y} \end{pmatrix}$$

The same argument applied to $w^T \epsilon_x = 0$ gives $\beta_{w,x} = (w^T w)^{-1} w^T x$.

There are three blocks of direct causal effects, $w \to x$, $w \to y$ and $x \to y$, given by $\beta_{w,x}$, $\beta_{w,y}$ and $\beta_{x,y}$. There is also a total effect of $w$ on $y$, given by $\beta_{w,x} \beta_{x,y} + \beta_{w,y}$.

This simple lemma suggests that there is nothing mysterious about the causal graph. Everything is related to standard econometric terms. A slight contrast is that the graph is a system of variables where $x$ plays a double role as both dependent and independent. Such systems are not alien to econometrics either.

On the other hand, econometrics has concerns with respect to efficiency. If errors are heteroscedastic, econometrics prefers generalized least squares (GLS) as opposed to OLS.
Clearly, for the sake of efficiency, GLS can and should be applied also for causal graphs, provided the same weighting matrix is applied for every first and second order moment.

Econometricians often express concerns over residuals being correlated with observed variables. In the causal graph, global correlation between error and ancestor nodes arises only when an arrow between two nodes is wrongly omitted. In the current simple case that does not occur. Another aspect is local correlation. Both in econometrics and in causal analysis, estimation could be made more efficient by transforming the observations to make them closer to normally distributed, as in generalised linear models. A particularly important case is that of binary variables, which can be transformed to normals via the log-odds ratio.

The more important contrast between econometrics and causal graph analysis is related to unobserved confounders, to be illustrated with another causal diagram in the next section.

In the case portrayed in figure 2, two blocks of unobserved variables $(u, v)$ are added to figure 1. These blocks may contain omitted variables affecting one observed variable at a time, and confounders affecting several simultaneously. There is no loss of generality in this relatively simple structure of unobserved variables.

The presence of unobserved variables means regression coefficients may be biased. Measurement errors come from single omitted variables; selection bias comes from confounders.

Both econometrics and causal graph analysis recognise these two reasons why regressions may go wrong and instrumental variables may be helpful. In addition, econometrics has a third one, simultaneity. As shown in lemma 1, a model over an acyclic graph has a recursivity that avoids simultaneous equations in the econometric sense. Simultaneity will be further commented on in the final section.

Regressions can be done with regard to observable variables as for figure 1, with equivalent orthogonality conditions and identical outcome.
There is no reason to expect that this outcome provides causal information. However, it will bring information on suitable weighting matrices, $\Omega$, leading to less heteroscedasticity of error terms. It will also bring information on suitable transformations of variables, so that error terms become closer to normally distributed. Both model modifications will make later use of IV more efficient.

Clearly, $w$ is not a set of valid instruments, as the requirement that $w$ affects $y$ only through $x$ is not satisfied.

Theorem 1.

With reference to figure 2, when $(u, v, w, x, y)$ are distributed as multivariate normal with $\mathrm{Ex}(u, v, w, x, y) = 0$, then the following are equivalent:

• $w$ and $y$ are independent conditional on $x$
• $w$ is orthogonal to the residuals of $y$ OLS-regressed on $x$
• $\beta_{w,y} = 0$

Proof.

When $(u, v, w, x, y)$ are multivariate normal with $\mathrm{Ex}(u, v, w, x, y) = 0$, so are $(w, x, y)$. Let $\mathrm{Var}(w, x, y) = \Sigma$. The distribution of $(w, y) \mid x$ is also multivariate normal with expectation $x \Sigma_{x^T x}^{-1} \Sigma_{x^T (w,y)}$ and variance $\Sigma_{(w,y)^T (w,y)} - \Sigma_{(w,y)^T x} \Sigma_{x^T x}^{-1} \Sigma_{x^T (w,y)}$. The independence of $y$ and $w$ conditional on $x$ means the latter matrix is block diagonal. That is, writing $\beta_{x,y} = \Sigma_{x^T x}^{-1} \Sigma_{x^T y}$ for the coefficient of $y$ OLS-regressed on $x$ alone:

$$\begin{aligned}
0 &= \Sigma_{w^T y} - \Sigma_{w^T x} \Sigma_{x^T x}^{-1} \Sigma_{x^T y} \\
  &= \mathrm{Ex}\left( w^T y \right) - \mathrm{Ex}\left( w^T x \right) \beta_{x,y} \\
  &= \mathrm{Ex}\left( w^T y \right) - \mathrm{Ex}\left( w^T x \beta_{x,y} \right) \\
  &= \mathrm{Ex}\left( w^T y - w^T x \beta_{x,y} \right) \\
  &= \mathrm{Ex}\left( w^T (y - x \beta_{x,y}) \right)
\end{aligned}$$

so $w$ is orthogonal to the residuals of $y$ OLS-regressed on $x$. By the regression anatomy formula, $\beta_{w,y} = 0$. Equivalence follows from the reverse statements.

[Figure 2: Causal graph of $(u, v, w, x, y)$]

Theorem 1 suggests that a modification of $w$ to $z$ such that $\beta_{z,y} = 0$ makes $z$ a set of valid instruments. It is not the case that the unobserved $u$ vanishes, nor that the direct effects of $z$ on $y$ vanish; it is the combination of a direct effect of $z$ on $y$ plus a non-causal correlation through $u$ that vanishes, the two cancelling each other. A relevant diagram is then figure 3, where node $w$ is replaced by node $z$ with no direct effect on $y$ and no non-causal correlation between $z$ and $y$.

Parameters $\beta_{z,x}$ and $\beta_{x,y}$ are still without causal content because of expected omissions and confounding, but the IV-estimator, $\beta^{IV}_{xy}$, is expected to be consistent. Actually, it will be proved below that it is, provided variables are normally distributed.
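The link between the second and third statements of Theorem 1 can be checked numerically. The following minimal sketch is not part of the original paper: it simulates scalar variables from a hypothetical data generating process (all coefficients are assumptions chosen for illustration) and verifies the regression anatomy identity in the sample. The inner product of w with the residuals of y regressed on x equals the OLS coefficient of w in the regression of y on (w, x), scaled by the squared length of w after x has been partialled out, so one is zero exactly when the other is. Regressions are run without intercept, in line with the zero-mean assumption.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ols_two(y, w, x):
    """OLS coefficients (b_w, b_x) of y on the regressors w and x, no intercept."""
    sww, swx, sxx = dot(w, w), dot(w, x), dot(x, x)
    swy, sxy = dot(w, y), dot(x, y)
    det = sww * sxx - swx * swx
    return (sxx * swy - swx * sxy) / det, (sww * sxy - swx * swy) / det


random.seed(0)
n = 2000
# Hypothetical data generating process, assumed only for illustration.
u = [random.gauss(0, 1) for _ in range(n)]          # unobserved confounder
w = [random.gauss(0, 1) + 0.5 * ui for ui in u]
x = [wi + ui + random.gauss(0, 1) for wi, ui in zip(w, u)]
y = [0.7 * xi + 0.3 * wi + ui + random.gauss(0, 1)
     for xi, wi, ui in zip(x, w, u)]

# Second statement: w against the residuals of y OLS-regressed on x alone.
b = dot(x, y) / dot(x, x)
resid = [yi - b * xi for yi, xi in zip(y, x)]
w_dot_resid = dot(w, resid)

# Third statement: coefficient of w when y is OLS-regressed on (w, x).
b_wy, b_xy = ols_two(y, w, x)

# Regression anatomy: w'resid = (w~'w~) * b_wy, with w~ the residual of w on x.
c = dot(x, w) / dot(x, x)
w_tilde = [wi - c * xi for wi, xi in zip(w, x)]
print(w_dot_resid, dot(w_tilde, w_tilde) * b_wy)
```

The two printed numbers agree up to rounding. In this simulated setting neither is exactly zero, which corresponds to w not being a valid instrument in the sense of Theorem 1.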

Theorem 2.

With reference to figure 3, when $(z, x, y)$ are distributed as multivariate normal, and $z$ and $y$ are independent conditional on $x$, then the IV-estimator is consistent as a measure of the causal effect of $x$ on $y$.

Proof. By lemma 2 the variables $(u, v, \epsilon_z, \epsilon_x, \epsilon_y)$ are all orthogonal. Thus $z$, as a linear function of $v$ and $\epsilon_z$, is orthogonal to $u$, and this is also the case with the predictor of $x$, $\hat{x}(z) = z (z^T z)^{-1} z^T x$.

[Figure 3: Causal graph of $(u, v, z, x, y)$ with $z$ as valid instrument]

We now have $x$ separated in three orthogonal parts:

$$x = \hat{x}(z) + \epsilon_x + u$$

which affects $y$:

$$y = \hat{x}(z) \beta_{\hat{x},y} + \epsilon_x \beta_{\epsilon_x,y} + u \beta_{u,y} + \epsilon_y$$

By their orthogonality, none of these components changes when others are omitted. The latter two are unobserved and will be omitted. The coefficient $\beta_{\hat{x},y}$ is then the causal effect of $\hat{x}$ on $y$ and also the causal effect of $x$ on $y$ for $(u, \epsilon_x)$ kept fixed.

The effect can be specified as:

$$\beta_{\hat{x},y} = \left( \hat{x}^T \hat{x} \right)^{-1} \hat{x}^T y = \left( x^T z (z^T z)^{-1} z^T x \right)^{-1} x^T z (z^T z)^{-1} z^T y$$

When $z^T x$ has an inverse, this expression simplifies to:

$$\beta_{\hat{x},y} = \left( z^T x \right)^{-1} z^T y$$

which is the well-known IV-estimator.

It should be observed that normal distributions seem necessary in this situation. With a normal distribution the assumption of conditional independence is equivalent to an orthogonality constraint. Orthogonality without normality is a weaker assumption without direct connection to Pearl's definition of valid instruments. Possibly some consistency results will still hold, but the efficiency of estimation is expected to be better the closer one gets to normality.

Construction of valid instrumental variables, and other tricks
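As a starting point for the constructions in this section, the following minimal sketch checks numerically that the two-step expression and the simplified expression $(z^T x)^{-1} z^T y$ from the proof of Theorem 2 agree. It is an illustration under stated assumptions, not the paper's own computation: the variables are scalar, the data generating process and the effect size `beta` are hypothetical, and the instrument is drawn independently of the confounder, which is the natural-experiment case rather than a constructed instrument.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


random.seed(2)
n = 5000
beta = 0.5  # hypothetical causal effect of x on y

u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
z = [random.gauss(0, 1) for _ in range(n)]   # instrument, independent of u
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Two-step form: predict x from z, then regress y on the prediction.
first_stage = dot(z, x) / dot(z, z)
x_hat = [first_stage * zi for zi in z]
beta_two_step = dot(x_hat, y) / dot(x_hat, x_hat)

# Simplified form (z'x)^{-1} z'y.
beta_iv = dot(z, y) / dot(z, x)

# Plain OLS of y on x, for comparison; it absorbs the confounding through u.
beta_ols = dot(x, y) / dot(x, x)

print(beta_two_step, beta_iv, beta_ols)
```

The first two numbers coincide up to rounding, since the first-stage coefficient cancels. In this simulated setting the IV-estimate lies close to the assumed effect, while the OLS coefficient is pulled away from it by the confounder.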

Obviously, theorem 1 can be applied to test the validity of some instrumental variable. There is, though, a more constructive application. Valid instruments can be constructed.

Start out from some variables $w$ as in figure 2. Compute also the residuals of $x$ OLS-regressed on $w$, $\eta_x = x - w (w^T w)^{-1} w^T x$. Both blocks of variables have a causal effect on $x$. A linear combination $z = (w, \eta_x) \lambda$ also has this property. Valid instruments should have the property

$$\lambda^T (w, \eta_x)^T \left( y - x (x^T x)^{-1} x^T y \right) = 0$$

When the row dimension of $w$ is at least as large as that of $x$, and that again at least as large as that of $y$, a sufficient number of instruments will most likely be found.

When there is a space of valid instruments, one may even look for those having $\| z_k \| = 1$ and $| z_k^T x |$ large.

When sufficient numbers of valid instruments are not found, one might proceed with the least invalid instruments, $z$, and compute $z'$ satisfying the orthogonality constraints with minimum deviations, $(z - z')^T_k$. Some small deviations are not devastating. After all, the orthogonality constraints should hold at the (super-)population level, not for finite samples.

The bootstrapping technique amounts to making a number of samples, $M$, of size $n$ by random draws with replacement from $(z', z, x, y)$. All estimation routines are repeated for each sample $(z, z', x, y)_m$. The IV-estimators are computed as $\beta^{IV}_m$ with $z$ as instrument and $\beta^{IV'}_m$ with $z'$.

Estimates of the expectation and variance of IV-estimators, given that $z$ is a valid instrument, are found as $\mathrm{Ex}\left( \beta^{IV} \right)$ and $\mathrm{Var}\left( \beta^{IV} \right)$. If validity does not hold, some bias will be involved. An estimate of the bias is $\mathrm{Ex}\left( \beta^{IV'} - \beta^{IV} \right)$. The squared bias should be added to the sampling variance, $\mathrm{Var}\left( \beta^{IV} \right)$, for a mean-square-error-like reliability measure.
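The construction and the bootstrap described in this section can be sketched for scalar variables as follows. This is a minimal illustration under assumptions that are not in the original text: the data generating process is hypothetical, the candidate instrument is simply z = w, and "minimum deviations" is read as the smallest Euclidean shift of z that makes it orthogonal to the residuals of y regressed on x. The names `z_lambda`, `z_prime` and `reliability` are chosen for the example.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def residual(a, b):
    """Residuals of a OLS-regressed on the single regressor b (no intercept)."""
    coef = dot(b, a) / dot(b, b)
    return [ai - coef * bi for ai, bi in zip(a, b)]


def iv(z, x, y):
    """Scalar IV-estimator (z'x)^{-1} z'y."""
    return dot(z, y) / dot(z, x)


random.seed(1)
n = 500
# Hypothetical data generating process, assumed only for illustration.
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
w = [random.gauss(0, 1) for _ in range(n)]   # observed cause of x
x = [wi + ui + random.gauss(0, 1) for wi, ui in zip(w, u)]
y = [0.5 * xi + 0.2 * wi + ui + random.gauss(0, 1)
     for xi, wi, ui in zip(x, w, u)]

r = residual(y, x)        # y - x (x'x)^{-1} x'y
eta_x = residual(x, w)    # x - w (w'w)^{-1} w'x

# Construction: z = (w, eta_x) lambda with lambda'(w, eta_x)'r = 0.
a, c = dot(w, r), dot(eta_x, r)
lam = (c, -a)
z_lambda = [lam[0] * wi + lam[1] * ei for wi, ei in zip(w, eta_x)]

# Imperfectly valid candidate z = w, and z_prime: z moved the smallest
# Euclidean distance needed to be orthogonal to r (an assumed reading
# of "minimum deviations").
z = w
shift = dot(z, r) / dot(r, r)
z_prime = [zi - shift * ri for zi, ri in zip(z, r)]

# Bootstrap: M samples of size n drawn with replacement from (z', z, x, y).
M = 200
iv_z, iv_zp = [], []
for _ in range(M):
    idx = [random.randrange(n) for _ in range(n)]
    xs = [x[i] for i in idx]
    ys = [y[i] for i in idx]
    iv_z.append(iv([z[i] for i in idx], xs, ys))
    iv_zp.append(iv([z_prime[i] for i in idx], xs, ys))

mean_iv = sum(iv_z) / M
var_iv = sum((b - mean_iv) ** 2 for b in iv_z) / (M - 1)
bias = sum(bp - b for bp, b in zip(iv_zp, iv_z)) / M
reliability = bias ** 2 + var_iv   # mean-square-error-like measure
print(mean_iv, var_iv, bias, reliability)
```

One algebraic property is worth keeping in mind when reading the output. In the full sample, exact orthogonality of `z_prime` to the residuals `r` implies that the IV-estimate computed with `z_prime` coincides with the OLS coefficient of y on x. The bootstrap draws of that estimate differ from the OLS coefficient only through resampling.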
With an econometric approach to causal analysis with instrumental variables, one hopes for some natural experiment where $z$ is random and therefore independent of both $u$ and $v$ in figure 3. In that case both $u$ and $v$ and their arrows disappear from the model. In addition one needs to argue that $z$ has no direct effect on $y$.

As shown in this article, the causal graph analyst has more options. He may start out from a vector of other observed variables, $v$, with a causal effect on $x$, and find a set of valid instruments satisfying Pearl's definition, which the econometrician would not think of.

For some reason the econometric community, and to a considerable extent also the statistical one, has not embraced causal graphs. An inherent conceptual problem with most causal graphs is their non-uniqueness. Statistical methods cannot in general decide the direction of causal arrows. In the social sciences, as opposed to physics, precise theories are not known, and it is not at all obvious how causal graphs should be drawn. Therefore several causal graphs or data generating processes (DGPs) may be equally valid for a set of observations.

This should be no problem for the case of instruments and causality. The hypothetical direction of causality from $z$ to $x$ and from $x$ to $y$ is based on evidence other than statistics. At least, the block sorting of variables here in $w$, $x$ and $y$ may be less controversial than a complete sorting of single variables.

Heckman & Pinto (2013) see causal graphs based on DAGs as a straitjacket. They would like to see simultaneity, with causality going both ways, also treated with graphs. After all, it was with simultaneous equations that instrumental variables first entered econometrics as a tool of identification (Haavelmo 1943).

The obvious counterargument is that tools tailored to specific situations may be more productive than universal ones.
It has actually been shown in this article that the restriction to acyclic models opens up a much larger space of valid instrumental variables than what econometricians are able to find with their informal analysis of each case.

To some extent, simultaneity has already been considered in the analysis here. The variable block $y$ possibly consists of a set of simultaneous variables with no predefined causal order between members. Assumed causality comes only from the blocks $x$ or $w$. Taking that into account, there is a conditional distribution, $P(y \mid x, w)$, which in some sense also defines a sort of simultaneous causality. With a partition of $y$ into $(y_1, y_2)$, conditional distributions $P(y_2 \mid y_1, x, w)$ can also be formed, and an effect of $y_1$ on $y_2$ conditional on $x$ and $w$ can be defined as:

$$\partial_{y_1} \mathrm{Ex}\left( y_2 P(y_2 \mid y_1, x, w) \right)$$

Simultaneous causality is not directional and is a different concept from graphical causality, though.

In addition, it should be remembered that acyclic graphical models actually can cope with simultaneity issues within temporal contexts. With an auto-regressive model

$$y_t = A y_{t-1} + \epsilon_t$$

and a matrix of auto-regressive parameters, $A$, causality may flow both ways at the same time.

Time will show whether such solutions will satisfy grumpy econometricians.

References

Angrist, J. D. & Krueger, A. B. (2001), 'Instrumental variables and the search for identification: From supply and demand to natural experiments', Journal of Economic Perspectives (4), 69–85.

Fisher, R. A. (1960), The design of experiments, Oliver and Boyd, London and Edinburgh. 7th Edition.

Haavelmo, T. (1943), 'The statistical implications of a system of simultaneous equations', Econometrica, Journal of the Econometric Society (1), 1–12.

Heckman, J. J. & Pinto, R. (2013), Causal analysis after Haavelmo, Technical report, National Bureau of Economic Research.

Imbens, G. W. (2020), 'Potential outcome and directed acyclic graph approaches to causality: Relevance for empirical practice in economics', Journal of Economic Literature (4), 1129–79.

Pearl, J. (2009), Causality, Cambridge University Press.

Reiersøl, O. (1950), 'Identifiability of a linear relation between variables which are subject to error', Econometrica: Journal of the Econometric Society pp. 375–389.

Stock, J. H. & Trebbi, F. (2003), 'Retrospectives: Who invented instrumental variable regression?', Journal of Economic Perspectives (3), 177–194.

Wikipedia (2020), 'Instrumental variables estimation'. Downloaded Dec 18th 2020. URL: https://en.wikipedia.org/wiki/Instrumental_variables_estimation

Wright, P. G. (1928), Tariff on animal and vegetable oils, Macmillan Company, New York.

Wright, S. (1921), 'Correlation and causation', Journal of Agricultural Research 20