Constructing valid instrumental variables in generalized linear causal models from directed acyclic graphs
arXiv [econ.EM], Feb
Øyvind [email protected] 17, 2021
Abstract
Unlike other techniques of causal inference, the use of valid instrumental variables can deal with unobserved sources of variable errors, variable omissions, and sampling bias, and still arrive at consistent estimates of average treatment effects. The only problem is to find the valid instruments. Using the definition of valid instrumental variables of Pearl (2009), a formal condition for validity can be stated for variables in generalised linear causal models. The condition can be applied in two different ways: as a tool for constructing valid instruments, or as a foundation for testing whether an instrument is valid. When perfectly valid instruments are not found, the squared bias of the IV-estimator induced by an imperfectly valid instrument (estimated with bootstrapping) can be added to its empirical variance in a mean-square-error-like reliability measure.
Causal graphs and theoretical results of Pearl (2009) have so far not found many applications within empirical econometrics. Based on this fact, Imbens (2020) suggests that causal graphs are relatively unproductive. An alternative explanation, though, is that causal graph theory has never been accepted as part of the foundations of econometrics. Without taking a final stand in this controversy, I will contribute improvements to instrumental variables (IV) estimators based on Pearl's theoretical concepts.

The origins of instrumental variables date back almost one hundred years. The geneticist Sewall Wright introduced causal graphs (Wright 1921), and his economist father Philip introduced IV, the IV-estimator, and the equivalent two-stage least squares estimator (Wright 1928). Later is Haavelmo (1943) and Based on Stock & Trebbi (2003) "A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable."
However, a statistical test of their validity has so far been lacking within econometrics. This seems due to the fact that validity of an instrument has not yet found a definition with testable implications within this branch of science. So far, IV-based causal analysis survives with the idea that valid instruments are rare and should be selected based on verbal criteria: "detailed institutional knowledge and the careful investigation and quantification of the forces at work in the particular setting" (Angrist & Krueger 2001).

At least for linear models, and for a substantial class of non-linear ones, it is possible to improve on this situation by applying the definition of Pearl (2009): "A variable $z$ is instrumental with respect to a treatment $x$ and an outcome $y$, iff in the causal graph involving $(z, x, y)$, $z$ is a cause of $x$ and $z$ and $y$ are conditionally independent given $x$". The definition needs some translation to point out the testable condition, and this translation is provided in this article.

A formal test will increase the precision of IV techniques, as the selection of instruments may be more precise than when based on the verbal criterion. In addition, a test may increase the applicability of IV in two different ways. First, instruments that do not pass the verbal criterion may pass the test. Secondly, even almost valid instruments may provide informative causal inference. The trick is to add the squared bias due to an almost valid instrument to the variability of the estimated causal effect in a mean-squared-error-like measure.

Causal modelling with experimental data has been well understood since Fisher (1960). Basically, when treatments are all under control, the causal effects on responses are found by means of regression models. The main problem with causal analysis and non-experimental data is that treatments have not been controlled.
Of course, control variables can be added to the extent they are observed, but there is always a possibility that some unobserved variable makes the regression coefficients biased or inconsistent as measures of causal effects. Simpson's paradox (an average effect over the non-stratified population having a different sign from the corresponding average effects in all population strata) may even arise in such situations. One will never know if it did, though, since the clarifying explanation is unobserved. Under ideal circumstances, the IV-estimator is expected to solve the problem of unobserved variables affecting regressions.

Actually, perfectly valid instruments are not expected to be found. Instrument validity is a property of the population or super-population. Test statistics for a sample of the population, or a population of the super-population, are always affected by sampling errors. A probability then needs to be assigned to a statement that some variable is a valid instrument in some setting.

This paper continues in section 2 with a clarification of the close connection between a simple linear causal graph and the corresponding econometric terms. In section 3, Pearl's definition of a valid instrumental relationship between $z$ and $(x, y)$ is translated into econometric equations which should hold at the (super)population level.

Construction of valid instruments, eventual test design and estimation of bias by means of bootstrapping are described in section 4. Finally, section 5 concludes.

A simple five-noded causal graph as in figure 1 is sufficient to illustrate the relationships between causal graph theory and econometrics. It is emphasised that the graph is a hypothesis of the causal relationships or, equivalently, of the data generating process. Alternative hypotheses with arrows pointing in other directions are possible as long as no cycles arise. The graph needs to be a directed acyclic graph (DAG).
Statistics cannot in general say whether $x$ is a cause of $y$ or the opposite, or if one hypothesis of causality is more likely than another, as long as there are no missing edges in the graph. Support for the causal hypothesis needs to be found in theory or common sense. E.g., with reference to figure 1, $w$ might be the external determinants of an economic agent, $x$ might be his capital stocks, and $y$ his short-run production decisions.

Useful terminology distinguishes the starting and ending nodes of an arrow. When nodes $a$ and $b$ are connected with an arrow, $a \to b$, $b$ is named a child of $a$, and $a$ is named a parent of $b$. Causality runs from parents to children, not the other way. In contrast, correlation runs both ways. In relation to parents and children, the meanings of ancestors and descendants are the obvious ones.

This graph assumes that $x$ is a parent of $y$, while $w$ is a parent of both $x$ and $y$. In addition, $\epsilon_x$, representing all unknown parents of $x$, is also a parent of $x$, and $\epsilon_y$ is likewise a parent of $y$. Usually error terms like $\epsilon_x$ and $\epsilon_y$ are dropped from the graph, but for the sake of explanation they are kept here.

Each node represents independent observations of a single variable, or a block of variables, with the name of the node. All variables have the same number of observations. If the node has parents, the $i$-th row of a node is a function of the $i$-th rows of the parent nodes. In this case only linear functions are considered, and the functions are $x_i = w_i \beta_{w,x} + \epsilon_{i,x}$ and $y_i = w_i \beta_{w,y} + x_i \beta_{x,y} + \epsilon_{i,y}$.

With parameter estimates known, the causal effect on $y$ of changing $x$ to $x'$ is in Pearl's terms found with the "do-operator", $y(do(x')) = x' \beta_{x,y}$. The effect is

$$y(do(x')) - y(do(x)) = (x' - x)\,\beta_{x,y}$$

In more complex cases, the do-operator follows all directed paths from $x$ to $y$.

In general Pearl assumes that a probability, $p_i$, is associated with each realisation $(w_i, x_i, y_i, \epsilon_{i,x}, \epsilon_{i,y})$, and that the error term of a node is independent of the other parent nodes. Here: $\epsilon_x \perp\!\!\!\perp w$ and $\epsilon_y \perp\!\!\!\perp (w, x)$. Independence is in turn a property of probability distributions $P$; more precisely, $a \perp\!\!\!\perp b$ means $P(a \mid b) = P(a)$.

Figure 1: Causal graph of $(w, x, y)$

Econometrics with linear models tends to skip probabilities and rely on first and second order moments only. The same trick is applicable for causal graphs. Independence is then replaced with the weaker condition of orthogonality. As is well known, equivalence of orthogonality and independence requires normally distributed variables.

The relevance of the toy graph of figure 1 follows from the following lemma.

Lemma 1.
The nodes of an arbitrary directed acyclic graph can be sorted so that each node depends only on parents of lower order.

Proof.
Take first the nodes without parents, in arbitrary ordering. If none exists, the graph contains a cycle. Then take the nodes whose parents are all ordered, in arbitrary ordering, and continue until there are no more nodes dependent only on the ordered ones. Either there are no more nodes at all, and all nodes are ordered, or there is a subset of nodes each depending on at least one non-ordered node. In the latter case the subset contains a cycle.

Such sorting is helpful in complex graphs.

For the situation depicted in figure 1 the following lemma will also be proved.
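The sorting procedure used in the proof of Lemma 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sort_dag`, the dictionary representation of the graph and the node labels `eps_x`, `eps_y` are hypothetical choices, not part of the article. Every node, including nodes without parents, must appear as a key.

```python
def sort_dag(parents):
    """Order nodes so that every node comes after all of its parents.

    `parents` maps each node name to the set of its parent nodes.
    Raises ValueError if no such ordering exists (the graph has a cycle).
    """
    ordered = []
    placed = set()
    remaining = dict(parents)
    while remaining:
        # Nodes whose parents have all been ordered already.
        ready = sorted(n for n, ps in remaining.items() if placed.issuperset(ps))
        if not ready:
            # Every remaining node waits for a non-ordered node: a cycle.
            raise ValueError("graph contains a cycle")
        for n in ready:
            ordered.append(n)
            placed.add(n)
            del remaining[n]
    return ordered


# The five-noded graph of figure 1 (hypothetical labels for the error nodes).
figure1 = {
    "w": set(),
    "eps_x": set(),
    "eps_y": set(),
    "x": {"w", "eps_x"},
    "y": {"w", "x", "eps_y"},
}
order = sort_dag(figure1)
print(order)  # ['eps_x', 'eps_y', 'w', 'x', 'y']
```

Nodes that become available in the same pass are sorted alphabetically only to make the output reproducible; any ordering within a pass satisfies the lemma.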
Lemma 2.
Let $(w, x, y, \epsilon_x, \epsilon_y)$ be blocks of variables of length $n$ with $(w, x, y)$ non-singular and with zero means, linearly related as

$$(w, x, y) \begin{pmatrix} I & -\beta_{w,x} & -\beta_{w,y} \\ 0 & I & -\beta_{x,y} \\ 0 & 0 & I \end{pmatrix} = (w, \epsilon_x, \epsilon_y) \qquad (1)$$

with error terms, $(\epsilon_x, \epsilon_y)$, orthogonal to predictors: $w \perp \epsilon_x$, $(w, x) \perp \epsilon_y$. All relevant variables affecting $x$ and $y$ are observed.

Then:
• all nodes with no parents are orthogonal: $w \perp \epsilon_x$, $(w, \epsilon_x) \perp \epsilon_y$
• all parameters are equivalent to those of OLS
• all linear causal effects are given by OLS-parameters

Proof. By assumption $\epsilon_x \perp w$ and $\epsilon_y \perp w$. Since $\epsilon_y \perp x$ is equivalent to $\epsilon_y \perp (\epsilon_x + w\beta_{w,x})$, and $\epsilon_y \perp w$, it follows that $\epsilon_y \perp \epsilon_x$. Orthogonality of the nodes without parents is then proved.

With regard to OLS-parameters, it is well known that OLS parameters imply orthogonality between errors and predictors. To show the opposite, use the assumed orthogonality:

$$0 = (w, x)^T \epsilon_y = (w, x)^T \left( y - (w, x) \begin{pmatrix} \beta_{w,y} \\ \beta_{x,y} \end{pmatrix} \right) = (w, x)^T y - (w, x)^T (w, x) \begin{pmatrix} \beta_{w,y} \\ \beta_{x,y} \end{pmatrix}$$

Multiplication with $\left( (w, x)^T (w, x) \right)^{-1}$ on both sides, possible by non-singularity, shows that the parameters are OLS:

$$\left( (w, x)^T (w, x) \right)^{-1} (w, x)^T y = \begin{pmatrix} \beta_{w,y} \\ \beta_{x,y} \end{pmatrix}$$

There are three blocks of direct causal effects, $w \to x$, $w \to y$ and $x \to y$, given by $\beta_{w,x}$, $\beta_{w,y}$ and $\beta_{x,y}$. There is also a total effect of $w$ on $y$, given by $\beta_{w,x}\beta_{x,y} + \beta_{w,y}$.

This simple lemma suggests that there is nothing mysterious about the causal graph. Everything is related to standard econometric terms. A slight contrast is that the graph is a system of variables where $x$ plays a double role as both dependent and independent. Such systems are not alien to econometrics either.

On the other hand, econometrics has concerns with respect to efficiency. If errors are heteroscedastic, econometrics prefers generalised least squares (GLS) to OLS.
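Lemma 2 can be checked numerically on simulated data. The following is a minimal sketch, assuming scalar nodes, standard normal errors and hypothetical parameter values (`b_wx`, `b_wy`, `b_xy` are illustrative choices, not taken from the article). It generates data from the graph of figure 1 and compares the OLS coefficients with the structural parameters and with the total effect $\beta_{w,x}\beta_{x,y} + \beta_{w,y}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b_wx, b_wy, b_xy = 0.8, -0.5, 1.5  # hypothetical structural parameters

# Data generating process of figure 1: w -> x, w -> y, x -> y.
w = rng.normal(size=n)
eps_x = rng.normal(size=n)
eps_y = rng.normal(size=n)
x = w * b_wx + eps_x
y = w * b_wy + x * b_xy + eps_y


def ols(X, target):
    """OLS coefficients of target regressed on the columns of X (no intercept)."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


hat_wx = ols(w[:, None], x)[0]                    # direct effect w -> x
hat_wy, hat_xy = ols(np.column_stack([w, x]), y)  # direct effects w -> y, x -> y
total = ols(w[:, None], y)[0]                     # total effect of w on y

print(hat_wx, hat_wy, hat_xy)
print(total, hat_wx * hat_xy + hat_wy)
```

In this sketch the regression of $y$ on $w$ alone reproduces the sum of the direct path and the path through $x$ exactly in-sample, while the individual coefficients agree with the structural parameters only up to sampling error.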
Clearly, for the sake of efficiency, GLS can and should be applied also for causal graphs, provided the same weighting matrix is applied for every first and second order moment.

Econometricians often express concerns over residuals being correlated with observed variables. In the causal graph, global correlation between error and ancestor nodes arises only when an arrow between two nodes is wrongly omitted. In the current simple case that does not happen. Another aspect is local correlation. Both in econometrics and in causal analysis, estimation could be made more efficient by transforming the observations to make them closer to normally distributed, as in generalised linear models. A particularly important case is that of binary variables, which can be transformed to normals via the log-odds ratio.

The more important contrast between econometrics and causal graph analysis is related to unobserved confounders, to be illustrated with another causal diagram in the next section.

In the case portrayed in figure 2, two blocks of unobserved variables $(u, v)$ are added to figure 1. These blocks may contain omitted variables affecting one observed variable at a time, and confounders affecting several simultaneously. There is no loss of generality in this relatively simple structure of unobserved variables.

The presence of unobserved variables means regression coefficients may be biased. Measurement errors come from single omitted variables; selection bias comes from confounders.

Both econometrics and causal graph analysis recognise these two reasons why regressions may go wrong and instrumental variables may be helpful. In addition, econometrics has a third one, simultaneity. As shown in lemma 1, a model over an acyclic graph has a recursivity that avoids simultaneous equations in the econometric sense.
Simultaneity will be further commented on in the final section.

Regressions can be done with regard to observable variables as for figure 1, with equivalent orthogonality conditions and identical outcome. There is no reason to expect that this outcome provides causal information. However, it will bring information on suitable weighting matrices, $\Omega$, leading to less heteroscedasticity of error terms. It will also bring information on suitable transformations of variables, so that error terms become closer to normally distributed. Both model modifications will make later use of IV more efficient.

Clearly, $w$ is not a set of valid instruments, as the requirement that $w$ affects $y$ only through $x$ is not satisfied.

Theorem 1.
With reference to figure 2, when $(u, v, w, x, y)$ are distributed as multivariate normal with $\operatorname{Ex}(u, v, w, x, y) = 0$, then the following are equivalent:
• $w$ and $y$ are independent conditional on $x$
• $w$ is orthogonal to the residuals of $y$ OLS-regressed on $x$
• $\beta_{w,y} = 0$

Proof.
When $(u, v, w, x, y)$ are multivariate normal with $\operatorname{Ex}(u, v, w, x, y) = 0$, so are $(w, x, y)$. Let $\operatorname{Var}(w, x, y) = \Sigma$. The distribution of $(w, y) \mid x$ is also multivariate normal with expectation $x \Sigma_{x^T x}^{-1} \Sigma_{x^T (w,y)}$ and variance $\Sigma_{(w,y)^T (w,y)} - \Sigma_{(w,y)^T x} \Sigma_{x^T x}^{-1} \Sigma_{x^T (w,y)}$. The independence of $y$ and $w$ conditional on $x$ means the latter matrix is block diagonal. That is, with $\beta_{x,y} = \Sigma_{x^T x}^{-1} \Sigma_{x^T y}$ the coefficient of $y$ OLS-regressed on $x$:

$$\begin{aligned}
0 &= \Sigma_{w^T y} - \Sigma_{w^T x} \Sigma_{x^T x}^{-1} \Sigma_{x^T y} \\
  &= \operatorname{Ex}\left( w^T y \right) - \operatorname{Ex}\left( w^T x \right) \beta_{x,y} \\
  &= \operatorname{Ex}\left( w^T y \right) - \operatorname{Ex}\left( w^T x \beta_{x,y} \right) \\
  &= \operatorname{Ex}\left( w^T y - w^T x \beta_{x,y} \right) \\
  &= \operatorname{Ex}\left( w^T (y - x \beta_{x,y}) \right)
\end{aligned}$$

and $w$ should be orthogonal to the residuals of $y$ OLS-regressed on $x$. By the regression anatomy formula, $\beta_{w,y} = 0$.

Equivalence follows from reverse statements.

Figure 2: Causal graph of $(u, v, w, x, y)$

Theorem 1 suggests that a modification of $w$ to $z$, so that $\beta_{z,y} = 0$, makes $z$ a set of valid instruments. It is not that the unobserved $u$ vanishes, nor that the direct effects of $z$ on $y$ vanish; it is the combination of a direct effect of $z$ on $y$ plus a non-causal correlation through $u$ that vanishes, the two cancelling each other. A relevant diagram is then figure 3, where node $w$ is replaced by node $z$ with no direct effect on $y$ and no non-causal correlation between $z$ and $y$.

Parameters $\beta_{z,x}$ and $\beta_{x,y}$ are still without causal content because of expected omissions and confounding, but the IV-estimator, $\beta^{IV}_{x,y}$, is expected to be consistent. Actually, it will be proved below that it is, provided variables are normally distributed.
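The equivalence between the last two conditions of Theorem 1 also holds in-sample and can be illustrated numerically. The sketch below is an assumption-laden toy: scalar variables, a single hypothetical confounder `u`, and illustrative coefficients that are not taken from the article. It checks the regression anatomy identity for $\beta_{w,y}$ and then modifies $w$ to a $z$ that is orthogonal to the residuals of $y$ OLS-regressed on $x$.

```python
import numpy as np


def residual(a, b):
    """Residual of vector a after OLS regression on the columns of b (no intercept)."""
    return a - b @ np.linalg.lstsq(b, a, rcond=None)[0]


def moment_and_beta(w, x, y):
    """Return (w^T r, beta_wy), with r the residual of y on x and
    beta_wy the coefficient of w when y is regressed on (w, x)."""
    r = residual(y, x[:, None])
    beta_wy = np.linalg.lstsq(np.column_stack([w, x]), y, rcond=None)[0][0]
    return float(w @ r), float(beta_wy)


rng = np.random.default_rng(1)
n = 5_000
u = rng.normal(size=n)                          # hypothetical unobserved confounder
w = rng.normal(size=n) + 0.5 * u
x = 0.8 * w + u + rng.normal(size=n)
y = 1.5 * x + 0.3 * w - u + rng.normal(size=n)

moment_w, beta_wy = moment_and_beta(w, x, y)

# Regression anatomy: beta_wy = w^T r / (w_tilde^T w_tilde),
# where w_tilde is the residual of w regressed on x.
w_tilde = residual(w, x[:, None])
anatomy = moment_w / float(w_tilde @ w_tilde)

# Modify w to z by removing its component along r, so that z^T r = 0.
r = residual(y, x[:, None])
z = w - r * (r @ w) / (r @ r)
moment_z, beta_zy = moment_and_beta(z, x, y)

print(beta_wy, anatomy)    # the two numbers agree
print(moment_z, beta_zy)   # both are zero up to rounding error
```

Because the denominator in the anatomy formula is positive, the coefficient of $w$ is zero exactly when the orthogonality moment is zero, which is the in-sample counterpart of the second and third conditions of the theorem.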
Theorem 2.
With reference to figure 3, when $(z, x, y)$ are distributed as multivariate normal, and $z$ and $y$ are independent conditional on $x$, then the IV-estimator is consistent as a measure of the causal effect of $x$ on $y$.

Proof. By lemma 2 the variables $(u, v, \epsilon_z, \epsilon_x, \epsilon_y)$ are all orthogonal. Thus $z$, as a linear function of $v$ and $\epsilon_z$, is orthogonal to $u$, and this is also the case with the predictor of $x$, $\hat{x}(z) = z (z^T z)^{-1} z^T x$. We now have $x$ separated in three orthogonal parts:

$$x = \hat{x}(z) + \epsilon_x + u$$

which affects $y$:

$$y = \hat{x}(z)\,\beta_{\hat{x},y} + \epsilon_x \beta_{\epsilon_x,y} + u \beta_{u,y} + \epsilon_y$$

By their orthogonality, none of the coefficients of these components changes when the others are omitted. The two latter ones are unobserved and will be omitted. The coefficient $\beta_{\hat{x},y}$ is then the causal effect of $\hat{x}$ on $y$ and also the causal effect of $x$ on $y$ for $(u, \epsilon_x)$ kept fixed.

The effect can be specified as:

$$\beta_{\hat{x},y} = \left( \hat{x}^T \hat{x} \right)^{-1} \hat{x}^T y = \left( x^T z (z^T z)^{-1} z^T x \right)^{-1} x^T z (z^T z)^{-1} z^T y$$

When $z^T x$ has an inverse, this expression simplifies to:

$$\beta_{\hat{x},y} = \left( z^T x \right)^{-1} z^T y$$

which is the well-known IV-estimator.

Figure 3: Causal graph of $(u, v, z, x, y)$ with $z$ as valid instrument

It should be observed that normal distributions seem necessary in this situation. With a normal distribution the assumption of conditional independence is equivalent to an orthogonality constraint. Orthogonality without normality is a weaker assumption without direct connection to Pearl's definition of valid instruments. Possibly some consistency results will still hold, but the efficiency of estimation is expected to be better the closer one gets to normality.

Construction of valid instrumental variables, and other tricks
Obviously, theorem 1 can be applied to test the validity of some instrumental variable. There is, though, a more constructive application. Valid instruments can be constructed.

Start out from some variables $w$ as in figure 2. Compute also the residuals of $x$ OLS-regressed on $w$, $\eta_x = x - w (w^T w)^{-1} w^T x$. Both blocks of variables have a causal effect on $x$. A linear combination $z = (w, \eta_x)\lambda$ also has this property. Valid instruments should have the property

$$\lambda^T (w, \eta_x)^T \left( y - x (x^T x)^{-1} x^T y \right) = 0$$

When the row dimension of $w$ is at least as large as that of $x$, and that again at least as large as that of $y$, a sufficient number of instruments will most likely be found.

When there is a space of valid instruments, one may even look for those having $\| z_k \| = 1$ and $| z_k^T x |$ large.

When sufficient numbers of valid instruments are not found, one might proceed with the least invalid instruments, $z$, and compute $z'$ satisfying the orthogonality constraints with minimum deviations, $(z - z')_k^T$. Some small deviations are not devastating. After all, the orthogonality constraints should hold at the (super-)population level, not for finite samples.

The bootstrapping technique amounts to making a number of samples, $M$, of size $n$ by random draws with replacement from $(z', z, x, y)$. All estimation routines are repeated for each sample $(z, z', x, y)_m$. The IV-estimators are computed as $\beta^{IV}_m$ with $z$ as instrument and $\beta^{IV'}_m$ with $z'$.

Estimates of the expectation and variance of IV-estimators, given that $z$ is a valid instrument, are found as $\operatorname{Ex}\left( \beta^{IV} \right)$ and $\operatorname{Var}\left( \beta^{IV} \right)$. If it does not hold, some bias will be involved. An estimate of the bias is $\operatorname{Ex}\left( \beta^{IV'} - \beta^{IV} \right)$. The squared bias should be added to the sampling variance, $\operatorname{Var}\left( \beta^{IV} \right)$, for a mean-square-error-like reliability measure.
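The steps of this section can be sketched in Python as follows. This is one possible reading of the procedure, not a definitive implementation: all function names, the simulated data and its coefficients are hypothetical, the space of valid $\lambda$ is obtained as a null space via a singular value decomposition, "minimum deviations" is interpreted as a least-squares projection, and the bootstrap resamples rows of $(z', z, x, y)$ without re-imposing the constraint within each replicate.

```python
import numpy as np


def ols_residual(a, b):
    """Residuals of the columns of a after OLS regression on b (no intercept)."""
    return a - b @ np.linalg.lstsq(b, a, rcond=None)[0]


def construct_instruments(w, x, y, tol=1e-10):
    """Unit-norm columns z = (w, eta_x) @ lam with lam^T (w, eta_x)^T r = 0,
    where r is the residual of y OLS-regressed on x."""
    cand = np.hstack([w, ols_residual(x, w)])      # candidate block (w, eta_x)
    c = cand.T @ ols_residual(y, x)                # constraint matrix
    _, s, vt = np.linalg.svd(c.T)                  # null space of c^T via SVD
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    z = cand @ vt[rank:].T
    return z / np.linalg.norm(z, axis=0)


def project_to_valid(z, x, y):
    """Least-squares closest z' to z with z'^T r = 0 (assumed reading of
    'minimum deviations')."""
    return ols_residual(z, ols_residual(y, x))


def iv_estimate(z, x, y):
    """Just-identified IV estimator (z^T x)^{-1} z^T y."""
    return np.linalg.solve(z.T @ x, z.T @ y)


def bootstrap_reliability(z, z_prime, x, y, M=200, seed=0):
    """Return (variance + bias^2, variance, bias) from M bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    beta, beta_prime = [], []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)           # draw rows with replacement
        beta.append(iv_estimate(z[idx], x[idx], y[idx]))
        beta_prime.append(iv_estimate(z_prime[idx], x[idx], y[idx]))
    beta, beta_prime = np.array(beta), np.array(beta_prime)
    variance = beta.var(axis=0, ddof=1)            # Var(beta_IV)
    bias = (beta_prime - beta).mean(axis=0)        # Ex(beta_IV' - beta_IV)
    return variance + bias**2, variance, bias


# Hypothetical data: two observed variables w, one treatment x, one outcome y.
rng = np.random.default_rng(2)
n = 2_000
u = rng.normal(size=(n, 1))                        # unobserved confounder
w = rng.normal(size=(n, 2)) + 0.3 * u
x = w @ np.array([[1.0], [0.5]]) + u + rng.normal(size=(n, 1))
y = 2.0 * x + w @ np.array([[0.2], [0.0]]) - u + rng.normal(size=(n, 1))

z_valid = construct_instruments(w, x, y)           # block of valid candidates
z = w[:, [0]]                                      # hypothetical imperfect instrument
z_prime = project_to_valid(z, x, y)
mse_like, variance, bias = bootstrap_reliability(z, z_prime, x, y)
print(mse_like, variance, bias)
```

One property of this particular sketch is worth keeping in mind when reading its output: because the orthogonality constraint is imposed exactly on the full sample, the full-sample estimate `iv_estimate(z_prime, x, y)` coincides with the OLS coefficient of $y$ regressed on $x$, and the bootstrap replicates differ from it only because the constraint is not re-imposed on each resample.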
With an econometric approach to causal analysis with instrumental variables, one hopes for some natural experiment where $z$ is random and therefore independent of both $u$ and $v$ in figure 3. In that case both $u$ and $v$ and their arrows disappear from the model. In addition, one needs to argue that $z$ has no direct effect on $y$.

As shown in this article, the causal graph analyst has more options. He may start out from a vector of other observed variables, $v$, with a causal effect on $x$, and find a set of valid instruments satisfying Pearl's definition, which the econometrician would not think of.

For some reason, the econometric community, and to a considerable extent also the statistical one, has not embraced causal graphs. An inherent conceptual problem with most causal graphs is their non-uniqueness. Statistical methods cannot in general decide the direction of causal arrows. In the social sciences, as opposed to physics, precise theories are not known, and it is not at all obvious how causal graphs should be drawn. Therefore several causal graphs or DGPs may be equally valid for a set of observations.

This should be no problem for the case of instruments and causality. The hypothetical direction of causality from $z$ to $x$ and from $x$ to $y$ is based on other evidence than statistics. At least, the block sorting of variables here in $w$, $x$ and $y$ may be less controversial than a complete sorting of single variables.

Heckman & Pinto (2013) see causal graphs based on DAGs as a straitjacket. They would like to see simultaneity, with causality going both ways, also treated with graphs. After all, it was with simultaneous equations that instrumental variables first entered econometrics as a tool of identification (Haavelmo 1943).

The obvious counterargument is that tools tailored to specific situations may be more productive than universal ones.
It has actually been shown in this article that the restriction to acyclic models opens up a much larger space of valid instrumental variables than what econometricians are able to find with their informal analysis of each case.

To some extent, simultaneity has already been considered in the analysis here. Variable block $y$ possibly consists of a set of simultaneous variables with no predefined causal order between members. Assumed causality only comes from the blocks $x$ or $w$. Taking that into account, there is a conditional distribution, $P(y \mid x, w)$, which in some sense also defines a sort of simultaneous causality. With a partition of $y$ into $(y_1, y_2)$, conditional distributions $P(y_2 \mid y_1, x, w)$ can also be formed, and an effect of $y_1$ on $y_2$ conditional on $x$ and $w$ can be defined as:

$$\partial_{y_1} \operatorname{Ex}\left( y_2\, P(y_2 \mid y_1, x, w) \right)$$

Simultaneous causality is not directional and is a different concept from graphical causality, though.

In addition, it should be remembered that acyclic graphical models can actually cope with simultaneity issues within temporal contexts. With an auto-regressive model

$$y_t = A y_{t-1} + \epsilon_t$$

and a matrix of auto-regressive parameters, $A$, causality may flow both ways at the same time.

Time will show whether such solutions will satisfy grumpy econometricians.

References
Angrist, J. D. & Krueger, A. B. (2001), 'Instrumental variables and the search for identification: From supply and demand to natural experiments', Journal of Economic Perspectives (4), 69–85.

Fisher, R. A. (1960), The design of experiments, Oliver and Boyd, London and Edinburgh. 7th Edition.

Haavelmo, T. (1943), 'The statistical implications of a system of simultaneous equations', Econometrica, Journal of the Econometric Society (1), 1–12.

Heckman, J. J. & Pinto, R. (2013), Causal analysis after Haavelmo, Technical report, National Bureau of Economic Research.

Imbens, G. W. (2020), 'Potential outcome and directed acyclic graph approaches to causality: Relevance for empirical practice in economics', Journal of Economic Literature (4), 1129–79.

Pearl, J. (2009), Causality, Cambridge University Press.

Reiersøl, O. (1950), 'Identifiability of a linear relation between variables which are subject to error', Econometrica: Journal of the Econometric Society pp. 375–389.

Stock, J. H. & Trebbi, F. (2003), 'Retrospectives: Who invented instrumental variable regression?', Journal of Economic Perspectives (3), 177–194.

Wikipedia (2020), 'Instrumental variables estimation'. Downloaded Dec 18th 2020. URL: https://en.wikipedia.org/wiki/Instrumental_variables_estimation

Wright, P. G. (1928), Tariff on animal and vegetable oils, Macmillan Company, New York.

Wright, S. (1921), 'Correlation and causation', Journal of Agricultural Research 20