arXiv [math.ST]
The Annals of Statistics
© Institute of Mathematical Statistics, 2009
MINIMAL SUFFICIENT CAUSATION AND DIRECTED ACYCLIC GRAPHS
By Tyler J. VanderWeele and James M. Robins
University of Chicago and Harvard University
Notions of minimal sufficient causation are incorporated within the directed acyclic graph causal framework. Doing so allows for the graphical representation of sufficient causes and minimal sufficient causes on causal directed acyclic graphs while maintaining all of the properties of causal directed acyclic graphs. This in turn provides a clear theoretical link between two major conceptualizations of causality: one counterfactual-based and the other based on a more mechanistic understanding of causation. The theory developed can be used to draw conclusions about the sign of the conditional covariances among variables.
1. Introduction.
Two broad conceptualizations of causality can be discerned in the literature, both within philosophy and within statistics and epidemiology. The first conceptualization may be characterized as giving an account of the effects of certain causes; the approach addresses the question, "Given a particular cause or intervention, what are its effects?" In the contemporary philosophical literature, this approach is most closely associated with Lewis' work [17, 18] on counterfactuals. In the contemporary statistics literature, this first approach is closely associated with the work of Rubin [30, 31] on potential outcomes, of Robins [25, 26] on the use of counterfactual variables in the context of time-varying treatment and of Pearl [21] on the graphical representation of various counterfactual relations on directed acyclic graphs. This counterfactual approach has been used extensively in statistics, both in the development of theory and in application. The second conceptualization of causality may be characterized as giving an account of the causes of particular effects; this approach attempts to address the question, "Given a particular effect, what are the various events which might have
Received December 2007.
AMS 2000 subject classifications.
Primary 62A01, 62M45; secondary 62G99, 68T30, 68R10, 05C20.
Key words and phrases.
Causal inference, conditional independence, directed acyclic graphs, graphical models, interactions, sufficient causation, synergism.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2009, Vol. 37, No. 3, 1437–1465. This reprint differs from the original in pagination and typographic detail.
been its cause?" In the contemporary philosophical literature, this second approach is most notably associated with Mackie's work [19] on insufficient but necessary components of unnecessary but sufficient conditions (INUS conditions) for an effect. In the epidemiologic literature, this approach is most closely associated with Rothman's work [29] on sufficient-component causes. This work is more closely related to the various mechanisms for a particular effect than is the counterfactual approach. Rothman's work on sufficient-component causes has, however, seen relatively little development, extension or application, though the basic framework is routinely taught in introductory epidemiology courses. Perhaps the only major attempt in the statistics literature to extend and apply Rothman's theory has been the work of Aickin [1] (comments relating Aickin's work to the present work are available from the authors upon request).

In this paper, we incorporate notions of minimal sufficient causes, corresponding to Rothman's sufficient-component causes, within the directed acyclic graph causal framework [21]. Doing so essentially unites the mechanistic and the counterfactual approaches into a single framework. As will be seen in Section 5, we can use the framework developed to draw conclusions about the sign of the conditional covariances among variables. Without the theory developed concerning minimal sufficient causes, such conclusions cannot be drawn from causal directed acyclic graphs. In a related paper [35] we have discussed how these ideas relate to epidemiologic research. The present paper develops the theory upon which this epidemiologic discussion relies.

The theory developed in this paper is motivated by several other considerations.
As will be seen below, the incorporation of minimal sufficient cause nodes allows for the identification of certain conditional independencies which hold only within a particular stratum of the conditioning variable (i.e., "asymmetric conditional independencies" [7]) which were not evident without the minimal sufficient causation structures. We note that these asymmetric conditional independencies have been represented elsewhere by Bayesian multinets [7] or by trees [3]. Another motivation for the development of the theory in this paper concerns the notion of interaction. Product terms are frequently included in regression models to assess interactions among variables; these statistical interactions, however, even if present, need not imply the existence of an actual mechanism in which two distinct causes both participate. Interactions which do concern the actual mechanisms are sometimes referred to as instances of "synergism" [29], "biologic interactions" [32] or "conjunctive causes" [20], and the development of minimal sufficient cause theory provides a useful framework to characterize mechanistic interactions. In related work [37] we have derived empirical tests for interactions in this sufficient cause sense.

As yet further motivation, we conclude this Introduction by describing how the methods we develop in this paper clarified and helped resolve an
Fig. 1. Causal directed acyclic graph under the alternative hypothesis of familial coaggregation.

analytic puzzle faced by psychiatric epidemiologists. Consider the following somewhat simplified version of a study reported in Hudson et al. [10]. Three hundred pairs of obese siblings living in an ethnically homogeneous upper-middle class suburb of Boston are recruited and cross-classified by the presence or absence of two psychiatric disorders: manic-depressive disorder P and binge eating disorder B. The question of scientific interest is whether these two disorders have a common genetic cause because, if so, studies to search for a gene or genes that cause both disorders would be useful. Consider two analyses. The first analysis estimates the covariance β between P_1i and B_2i, while the second analysis estimates the conditional covariance α between P_1i and B_2i among subjects with P_2i = 1, where B_ki is 1 if the kth sibling in the ith family has disorder B and is zero otherwise, with P_ki defined analogously. It was found that the estimates of β and α were both positive, with 95% confidence intervals that excluded zero.

Hudson et al.'s [10] substantive prior knowledge is summarized in the directed acyclic graph of Figure 1, in which the i index denoting family is suppressed. In what follows, we will make reference to some standard results concerning directed acyclic graphs; these results are reviewed in detail in the following section.

In Figure 1, G_B and G_P represent the genetic causes of B and P, respectively, that are not common causes of both B and P. The variables E_1 and E_2 represent the environmental exposures of siblings 1 and 2, respectively, that are common causes of both diseases, for example, exposure to a particularly stressful school environment. The variables G_B and G_P are assumed independent, as would typically be the case if, as is highly likely, they are not genetically linked.
Furthermore, as is common in genetic epidemiology, the environmental exposures E_1 and E_2 are assumed independent of the genetic factors. The causal arrows from P_1 to B_1 and P_2 to B_2 represent the investigators' beliefs that manic-depressive disorder may be a cause of binge eating disorder but not vice-versa. The node F represents the common genetic causes of both P and B, as well as any environmental causes of both P and B that are correlated within families. There are no data available for G_B, G_P, E_1, E_2 or F. The reason for grouping the common genetic causes with the correlated environmental causes in F is that, based on the available data {P_ki, B_ki; i = 1, ..., 300, k = 1, 2}, we can only hope to test the null hypothesis that F so defined is absent, which is referred to as the hypothesis of no familial coaggregation. If this null hypothesis is rejected, we cannot determine from the available data whether F is present due to a common genetic cause or a correlated common environmental cause. Thus E_1 and E_2 are independent on the graph because, by definition, they represent the environmental common causes of B and P that are independently distributed between siblings.

Now, under the null hypothesis that F is absent, we note that P_1 and B_2 are still correlated due to the unblocked path P_1 − G_P − P_2 − B_2, so we would expect β ≠ 0, as found. Furthermore, P_1 and B_2 are still expected to be correlated given P_2 = 1 due to the unblocked path P_1 − G_P − P_2 − E_2 − B_2, so we would expect α ≠ 0, as found. Thus, we cannot test the null hypothesis that F is absent without further substantive assumptions beyond those encoded in the causal directed acyclic graph of Figure 1.

Now Hudson et al.
[10] were also willing to assume that for no subset of the population did the genetic causes G_P and G_B of P and B prevent disease. Similarly, they assumed there was no subset of the population for whom the environmental causes E_1 and E_2 of B and P prevented either disease. We will show in Section 5 that, under these additional assumptions, the null hypothesis that F is absent implies that the conditional covariance α must be less than or equal to zero, provided that there is no interaction, in the sufficient cause sense, between E_2 and G_P. If it is plausible that no sufficient cause interaction between E_2 and G_P exists, then the null hypothesis that F is absent is rejected, because the estimate of α is positive with a 95% confidence interval that does not include zero.

Thus, the conclusion in the argument above that familial coaggregation of diseases B and P was present depended critically on the existence of (i) a formal definition of a sufficient cause interaction, (ii) a substantive understanding of what the assumption of no sufficient cause interaction entailed, and (iii) a sound mathematical theory that related assumptions about the absence of sufficient cause interactions to testable restrictions on the distribution of the observed data, specifically on the sign of a particular conditional covariance. In this paper, we provide a theory that offers (i)–(iii).

The remainder of the paper is organized as follows.
The second section reviews the directed acyclic graph causal framework and provides some basic definitions; the third section presents the theory which allows for the graphical representation of minimal sufficient causes within the directed acyclic graph causal framework; the fourth section gives an additional preliminary result concerning monotonicity; the fifth section develops results relating minimal sufficient causation and the sign of conditional covariances; the sixth section provides some discussion concerning possible extensions to the present work.
2. Basic definitions and concepts.
In this section, we review the directed acyclic graph causal framework and give a number of definitions regarding sufficient conjunctions and related concepts. Following Pearl [21], a causal directed acyclic graph is a set of nodes (X_1, ..., X_n), corresponding to variables, and directed edges among nodes, such that the graph has no cycles and such that, for each node X_i on the graph, the corresponding variable is given by its nonparametric structural equation X_i = f_i(pa_i, ε_i), where pa_i are the parents of X_i on the graph and the ε_i are mutually independent random variables. These nonparametric structural equations can be seen as a generalization of the path analysis and linear structural equation models [21, 22] developed by Wright [43] in the genetics literature and Haavelmo [9] in the econometrics literature. Robins [27, 28] discusses the close relationship between these nonparametric structural equation models and fully randomized, causally interpreted structured tree graphs [25, 26]. Spirtes, Glymour and Scheines [33] present a causal interpretation of directed acyclic graphs outside the context of nonparametric structural equations and counterfactual variables. It is easily seen from the structural equations that (X_1, ..., X_n) admits the following factorization: p(X_1, ..., X_n) = ∏_{i=1}^{n} p(X_i | pa_i). The nonparametric structural equations encode counterfactual relationships among the variables represented on the graph. The equations themselves represent one-step-ahead counterfactuals, with other counterfactuals given by recursive substitution. The requirement that the ε_i be mutually independent is essentially a requirement that there is no variable absent from the graph which, if included on the graph, would be a parent of two or more variables [21, 22].

A path is a sequence of nodes connected by edges regardless of arrowhead direction; a directed path is a path which follows the edges in the direction indicated by the graph's arrows.
A node C is said to be a common cause of A and B if there exists a directed path from C to B not through A and a directed path from C to A not through B. A collider is a particular node on a path such that both the preceding and subsequent nodes on the path have directed edges going into that node. A backdoor path from A to B is a path that begins with a directed edge going into A. A path between A and B is said to be blocked given some set of variables Z if either there is a variable in Z on the path that is not a collider, or if there is a collider on the path such that neither the collider itself nor any of its descendants are in Z. If all paths between A and B are blocked given Z, then A and B are said to be d-separated given Z. It has been shown that if all paths between A and B are blocked given Z, then A and B are conditionally independent given Z [8, 13, 40].
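For small graphs, the blocking criterion just stated can be checked mechanically. The sketch below is our own illustration, not part of the paper; the function names are hypothetical. It enumerates every undirected path between two nodes and applies the blocking rules exactly as defined above:

```python
def all_paths(edges, a, b):
    """Enumerate simple undirected paths from a to b.
    `edges` is a set of directed edges (u, v) meaning u -> v."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    paths, stack = [], [[a]]
    while stack:
        path = stack.pop()
        if path[-1] == b:
            paths.append(path)
            continue
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                stack.append(path + [n])
    return paths

def descendants(edges, x):
    out, frontier = set(), {x}
    while frontier:
        nxt = {v for (u, v) in edges if u in frontier} - out
        out |= nxt
        frontier = nxt
    return out

def d_separated(edges, a, b, z):
    """True if every path between a and b is blocked given the set z."""
    z = set(z)
    for path in all_paths(edges, a, b):
        blocked = False
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            collider = (prev, node) in edges and (nxt, node) in edges
            if not collider and node in z:
                blocked = True   # non-collider in Z blocks the path
            if collider and node not in z and not (descendants(edges, node) & z):
                blocked = True   # collider with no conditioned descendant blocks
        if not blocked:
            return False
    return True

# Toy check: in A -> C <- B, A and B are d-separated marginally
# but d-connected once the collider C is conditioned on.
edges = {("A", "C"), ("B", "C")}
print(d_separated(edges, "A", "B", set()))   # True
print(d_separated(edges, "A", "B", {"C"}))   # False
```

This brute-force enumeration is exponential in graph size and is meant only to make the definition concrete on the small examples that follow.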
Suppose that a set of nonparametric structural equations represented by a directed acyclic graph H is such that its variables X are partitioned into two sets X = V ∪ W. If, in the nonparametric structural equations for the variables in V, by replacing each occurrence of X_i ∈ W by f_i(pa_i, ε_i), the nonparametric structural equations for V can be written so as to correspond to some causal directed acyclic graph G, then G is said to be the marginalization of H over the set of variables W. A causal directed acyclic graph with variables X = V ∪ W can be marginalized over W if no variable in W is a common cause of any two variables in V.

In giving definitions for a sufficient conjunction and related concepts, we will use the following notation. An event is a binary variable taking values in {0, 1}. The complement of some event E we will denote by Ē. A conjunction or product of the events X_1, ..., X_n will be written as X_1 ⋯ X_n. The associative OR operator, ∨, is defined by A ∨ B = A + B − AB. For a random variable A with sample space Ω, we will use the notation A ≡ 0 to denote that A(ω) = 0 for all ω ∈ Ω. We will use the notation 1_{A=a} to denote the indicator function for the random variable A taking the value a; for some subset S of the sample space Ω, we will use 1_S to denote the indicator that ω ∈ S. We will use the notation A ⊥ B | C to denote that A is conditionally independent of B given C. We begin with the definitions of a sufficient conjunction and a minimal sufficient conjunction. These basic definitions make no reference to directed acyclic graphs or causation.

Definition 1.
A set of events X_1, ..., X_n is said to constitute a sufficient conjunction for an event D if X_1 ⋯ X_n = 1 ⇒ D = 1.

Definition 2.
A set of events X_1, ..., X_n which constitutes a sufficient conjunction for D is said to constitute a minimal sufficient conjunction for D if no proper subset of X_1, ..., X_n constitutes a sufficient conjunction for D.

Sufficient conjunctions for a particular event need not be causes of that event. Suppose a particular sound is produced when and only when an individual blows a whistle. This particular sound the whistle makes is a sufficient conjunction for the whistle's having been blown, but the sound does not cause the blowing of the whistle. The converse, rather, is true; the blowing of the whistle causes the sound to be produced. Corresponding to these notions of a sufficient conjunction and a minimal sufficient conjunction are those of a sufficient cause and a minimal sufficient cause, which will be defined in Section 3.

Definition 3.
A set of events M_1, ..., M_n, each of which may be some product of events, is said to be determinative for some event D if D = M_1 ∨ M_2 ∨ ⋯ ∨ M_n.

Fig. 2.
Causal directed acyclic graphs with sufficient causation structures.
Definition 4.
A determinative set M_1, ..., M_n of (minimal) sufficient conjunctions for D is nonredundant if no proper subset of M_1, ..., M_n is determinative for D.

Example 1.
Suppose A = B ∨ CE and D = EF. If we consider all the minimal sufficient conjunctions for A among the events {B, C, D}, we can see that B and CD are the only minimal sufficient conjunctions, but it is not the case that A = B ∨ CD. Clearly then, a complete list of minimal sufficient conjunctions for A generated by a particular collection of events may not be a determinative set of sufficient conjunctions for A. If we consider all minimal sufficient conjunctions for A among the events {B, C, D, E}, we see that B, CD and CE are all minimal sufficient conjunctions. In this example, B ∨ CD ∨ CE is a determinative set of minimal sufficient conjunctions for A but is not nonredundant. We see then that even when a complete list of minimal sufficient conjunctions generated by a particular collection of events constitutes a determinative set of minimal sufficient conjunctions, it may not be a nonredundant determinative set of minimal sufficient conjunctions.
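The claims of Example 1 can be verified by brute-force enumeration over the free events. The sketch below is our own illustration (the helper names `sufficient` and `minimal` are hypothetical); it follows the example's definitions exactly, with B, C, E, F free and A, D determined:

```python
from itertools import product, combinations

def OR(a, b):
    """The associative OR operator: A ∨ B = A + B − AB."""
    return a + b - a * b

# Enumerate all worlds over the free events; D and A are determined by them.
worlds = []
for B, C, E, F in product((0, 1), repeat=4):
    D = E * F                 # D = EF
    A = OR(B, C * E)          # A = B ∨ CE
    worlds.append({"A": A, "B": B, "C": C, "D": D, "E": E, "F": F})

def sufficient(conj, target="A"):
    """conj (a tuple of event names) is a sufficient conjunction for target
    if the conjunction being 1 forces target = 1 in every world."""
    return all(w[target] == 1 for w in worlds if all(w[x] == 1 for x in conj))

def minimal(conj):
    return sufficient(conj) and not any(
        sufficient(sub)
        for r in range(1, len(conj))
        for sub in combinations(conj, r))

# Among {B, C, D}: only B and CD are minimal sufficient conjunctions for A ...
msc = [c for r in (1, 2, 3) for c in combinations(("B", "C", "D"), r) if minimal(c)]
print(msc)    # [('B',), ('C', 'D')]

# ... yet B ∨ CD is not determinative for A: it can be 0 while A = 1
# (take B = 0, C = 1, E = 1, F = 0).
print(any(w["A"] == 1 and OR(w["B"], w["C"] * w["D"]) == 0 for w in worlds))  # True
```

The same enumeration confirms that, among {B, C, D, E}, the conjunctions B, CD and CE are all minimal sufficient, with CD redundant given B and CE.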
3. Minimal sufficient causation and directed acyclic graphs.
In this section, we develop theory which allows for the representation of sufficient conjunctions and minimal sufficient conjunctions on causal directed acyclic graphs. We begin with a motivating example.
Example 2.
Consider the causal directed acyclic graph given in Figure 2(i). Suppose E_1E_2 and E_3E_4 constitute a determinative set of sufficient conjunctions for D. We will show in Theorem 1 below that it follows that the diagram in Figure 2(ii) is also a causal directed acyclic graph, where E_iE_j is simply the product or conjunction of E_i and E_j; because the sufficient conjunctions E_1E_2 and E_3E_4 are determinative, it follows that D = E_1E_2 ∨ E_3E_4. An ellipse is put around the sufficient conjunctions E_1E_2 and E_3E_4 to indicate that the set is determinative. As will be seen below, in order to add sufficient conjunctions it is important that a determinative set of sufficient conjunctions is known or can be constructed. Consider the causal directed acyclic graph given in Figure 2(iii). Suppose that no determinative set of sufficient conjunctions can be constructed from E_1 and E_2 alone; suppose further, however, that there exists some other cause of D, say A, independent of E_1 and E_2, such that E_1E_2 and AE_2 form a determinative set of sufficient conjunctions. Then, Theorem 1 below can again be used to show that Figure 2(iv) is a causal directed acyclic graph. Furthermore, it will be shown in Theorem 2 that for any causal directed acyclic graph with a binary node which has only binary parents, a set of variables {A_i}_{i=0}^{n} always exists such that a determinative set of sufficient causes can be formed from the original parents on the graph and the variables {A_i}_{i=0}^{n}.

Theorem 1 provides the formal result required for the previous example.

Theorem 1.
Consider a causal directed acyclic graph G with some node D such that D and all its parents are binary. Suppose that there exists a set of binary variables A_1, ..., A_u such that a determinative set of sufficient conjunctions for D, say M_1, ..., M_S, can be formed from conjunctions of A_1, ..., A_u along with the parents of D on G and the complements of these variables. Suppose further that there exists a causal directed acyclic graph H such that the parents of D on H that are not on G consist of the nodes A_1, ..., A_u, and such that G is the marginalization of H over the set of variables which are on the graph for H but not G. Then, the directed acyclic graph J formed by adding to H the nodes M_1, ..., M_S, removing the directed edges into D from the parents of D on H, adding directed edges from each M_i into D and adding directed edges into each M_i from every parent of D on H which appears in the conjunction for M_i is itself a causal directed acyclic graph.

Proof.
To prove that the directed acyclic graph J is a causal directed acyclic graph, it is necessary to show that each of the nodes on the directed acyclic graph can be represented by a nonparametric structural equation involving only the parents on J of that node and a random term ε_i which is independent of all other random terms ε_j in the nonparametric structural equations for the other variables on the graph. The nonparametric structural equation for M_i may be defined as the product of events in the conjunction for M_i. The nonparametric structural equation for D can be given by D = M_1 ∨ ⋯ ∨ M_S. The nonparametric structural equations for all other nodes on J can be taken to be the same as those defining the causal directed acyclic graph H. Because the nonparametric structural equations for D and for each M_i on J are deterministic, they have no random-error term. Thus, for the nonparametric structural equations defining D and each M_i on J, the requirement that the nonparametric structural equation's random term ε_i is independent of all the other random terms ε_j in the nonparametric structural equations for the other variables on the graph is trivially satisfied. That this requirement is satisfied for the nonparametric structural equations for the other variables on J follows from the fact that it is satisfied on H. □

In Theorem 1, sufficient conjunctions for D are constructed from some set of variables that, on some causal directed acyclic graph H, are all parents of D and thus, within the directed acyclic graph causal framework, it makes sense to speak of sufficient causes and minimal sufficient causes.

Definition 5.
If, on a causal directed acyclic graph, some node D with nonparametric structural equation D = f_D(pa_D, ε_D) is such that D and all its parents are binary, then X_1, ..., X_n is said to constitute a sufficient cause for D if X_1, ..., X_n are all parents of D or complements of the parents of D and are such that f_D(pa_D, ε_D) = 1 for all ε_D whenever pa_D is such that X_1 ⋯ X_n = 1; if no proper subset of X_1, ..., X_n also constitutes a sufficient cause for D, then X_1, ..., X_n is said to constitute a minimal sufficient cause for D. A set of (minimal) sufficient causes, M_1, ..., M_n, each of which is a product of the parents of D and their complements, is said to be determinative for some event D if, for all ε_D, f_D(pa_D, ε_D) = 1 if and only if pa_D is such that M_1 ∨ M_2 ∨ ⋯ ∨ M_n = 1; if no proper subset of M_1, ..., M_n is also determinative for D, then M_1, ..., M_n is said to constitute a nonredundant determinative set of (minimal) sufficient causes for D.

If, for some directed acyclic graph G, there exist A_1, ..., A_u which satisfy the conditions of Theorem 1 for some node D on G, so that a determinative set of sufficient causes for D can be constructed from A_1, ..., A_u along with the parents of D on G and their complements, then D will be said to admit a sufficient causation structure. As in Example 2, we will, in general, replace the M_i nodes with the conjunctions that constitute them. The node D with directed edges from the M_i nodes is effectively an OR node. The M_i nodes with the directed edges from the A_i nodes and the parents of D on G are effectively AND nodes. We call this resulting diagram a causal directed acyclic graph with a sufficient causation structure (or a minimal sufficient causation structure if the determinative set of sufficient conjunctions for D are each minimal sufficient conjunctions).
Because a causal directed acyclic graph with a sufficient causation structure is itself a causal directed acyclic graph, the d-separation criterion applies and allows one to determine independencies and conditional independencies. A minimal sufficient causation structure will often make apparent conditional independencies within a particular stratum of the conditioning variable which were not apparent on the original causal directed acyclic graph. The following corollary is useful in this regard.

Corollary 1.
If some node D on a causal directed acyclic graph admits a sufficient causation structure, then conditioning on D = 0 also conditions on all sufficient cause nodes for D on the causal directed acyclic graph with the sufficient causation structure.

Example 2 (Continued). Consider the causal directed acyclic graph with the minimal sufficient causation structure given in Figure 2(ii). Conditioning on D = 0 also conditions on E_1E_2 = 0 and E_3E_4 = 0, and thus, by the d-separation criterion, E_i is conditionally independent of E_j given D = 0 for i ∈ {1, 2}, j ∈ {3, 4}. In the causal directed acyclic graph with the minimal sufficient causation structure in Figure 2(iv), no similar conditional independence relations within the D = 0 stratum hold. Although conditioning on D = 0 conditions also on E_1E_2 = 0 and AE_2 = 0, there still remains an unblocked path E_1 − E_1E_2 − E_2 − AE_2 − A between E_1 and A, and so E_1 and A are not conditionally independent given D = 0; similarly, there are unblocked paths between E_1 and E_2 given D = 0 and also between E_2 and A given D = 0.

The additional variables A_1, ..., A_u needed to form a set of sufficient causes for D we will refer to as the co-causes of D. The co-causes A_1, ..., A_u required to form a determinative set of sufficient conjunctions for D will generally not be unique. For example, if D = A_1 ∨ A_2E, then it is also the case that D = B_1 ∨ B_2E, where B_1 = A_1 and B_2 = Ā_1A_2. Similarly, there will, in general, be no unique set of sufficient causes that is determinative for D. For example, if E_1 and E_2 constitute a set of sufficient causes for D, so that D = E_1 ∨ E_2, then E_1E_2, E_1Ē_2 and Ē_1E_2 also constitute a set of sufficient causes for D, and so we could also write D = E_1E_2 ∨ E_1Ē_2 ∨ Ē_1E_2. It can be shown that not even nonredundant determinative sets of minimal sufficient causes are unique.

Corresponding to the definition of a sufficient cause is the more philosophical notion of a causal mechanism.
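Both non-uniqueness claims about co-causes and determinative sets are easy to confirm by truth table. A small sketch (our own illustration; the variable names follow the text, with Ā₁ written as `1 - A1`):

```python
from itertools import product

def OR(*xs):
    """Iterated associative OR: A ∨ B = A + B − AB, applied left to right."""
    out = 0
    for x in xs:
        out = out + x - out * x
    return out

# Co-causes are not unique:
# D = A1 ∨ A2·E equals B1 ∨ B2·E with B1 = A1 and B2 = Ā1·A2.
for A1, A2, E in product((0, 1), repeat=3):
    B1, B2 = A1, (1 - A1) * A2
    assert OR(A1, A2 * E) == OR(B1, B2 * E)

# Determinative sets are not unique either:
# E1 ∨ E2 equals E1E2 ∨ E1·Ē2 ∨ Ē1·E2 for every value of (E1, E2).
for E1, E2 in product((0, 1), repeat=2):
    assert OR(E1, E2) == OR(E1 * E2, E1 * (1 - E2), (1 - E1) * E2)

print("both identities hold")
```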
A causal mechanism can be conceived of as a set of events or conditions which, if all present, bring about the outcome under consideration through a particular pathway. A causal mechanism thus provides a particular description of how the outcome comes about. Suppose, for instance, that an individual were exposed to two poisons, E_1 and E_2, such that in the absence of E_2, the poison E_1 would lead to heart failure resulting in death; and that in the absence of E_1, the poison E_2 would lead to respiratory failure resulting in death; but such that when E_1 and E_2 are both present, they interact and lead to a failure of the nervous system, again resulting in death. In this case, there are three distinct causal mechanisms for death, each corresponding to a sufficient cause for D: death by heart failure corresponding to E_1Ē_2, death by respiratory failure corresponding to Ē_1E_2 and death due to a failure of the nervous system corresponding to E_1E_2. It is interesting to note that in this case none of the sufficient causes corresponding to the causal mechanisms is minimally sufficient. Each of E_1Ē_2, Ē_1E_2 and E_1E_2 is sufficient for D, but none is minimally sufficient, as either E_1 or E_2 alone is sufficient for death. We will refer to a sufficient cause for D as a causal mechanism for D if the node for the sufficient cause corresponds to a variable, potentially subject to intervention, such that, whenever the variable takes the value 1, the outcome D inevitably results.

The last example shows that the existence of a particular set of determinative sufficient causes does not guarantee that there are actual causal mechanisms corresponding to these sufficient causes; it only implies that a set of causal mechanisms corresponding to these sufficient causes cannot be ruled out by a complete knowledge of counterfactual outcomes. In particular, in the previous example, the set {E_1, E_2} is a determinative set of sufficient causes that does not correspond to the actual set of causal mechanisms {E_1Ē_2, Ē_1E_2, E_1E_2}. If there are two or more sets of sufficient causes that are determinative for some outcome D, then although the two sets of determinative sufficient causes are logically equivalent for prediction, we nevertheless view them as distinct. In such cases, some knowledge of the subject matter in question will, in general, be needed to discern which of the sets of determinative sufficient causes actually corresponds to the true causal mechanisms. For instance, in the previous example, we needed biological knowledge of how poisons brought about death in the various scenarios. We will, in the interpretation of our results, assume that there always exists some set of true causal mechanisms which forms a determinative set of sufficient causes for the outcome. The concept of synergism is closely related to that of a causal mechanism and is often found in the epidemiologic literature [11, 29, 32]. We will say that there is synergism between the effects of E_1 and E_2 on D if there exists a sufficient cause for D which represents some causal mechanism and such that this sufficient cause has E_1 and E_2 in its conjunction. In related work, we have developed tests for synergism, that is, tests for the joint presence of two or more causes in a single sufficient cause [36, 37]. In some of our examples and in our discussion of the various results in the paper, we will sometimes make reference to the concepts of a causal mechanism and synergism. However, all definitions, propositions, lemmas, theorems and corollaries will be given in terms of sufficient causes, for which we have a precise definition.

The graphical representation of sufficient causes on a causal directed acyclic graph does not require that the determinative set of sufficient causes for D be minimally sufficient, nor does it require that the set of determinative sufficient causes for D be nonredundant. To expand a directed acyclic graph into another directed acyclic graph with sufficient cause nodes, all that is required is that the set of sufficient causes constitutes a determinative set of sufficient causes for D. However, a set of events that constitutes a sufficient cause can be reduced to a set of events that constitutes a minimal sufficient cause by iteratively excluding unnecessary events from the set until a minimal sufficient cause is obtained. Also, a set of determinative sufficient causes that is redundant can be reduced to one that is nonredundant by excluding those sufficient causes or minimal sufficient causes that are redundant. It is sometimes an advantage to reduce a redundant set of sufficient causes to a nonredundant set of minimal sufficient causes. This is so because allowing sufficient causes that are not minimally sufficient, or allowing redundant sufficient causes or redundant minimal sufficient causes, can obscure the conditional independence relations implied by the structure of the causal directed acyclic graph. This is made evident in Example 3.

Example 3.
Consider the causal directed acyclic graph with the minimal sufficient causation structure given in Figure 3(i). Conditioning on D = 0 conditions also on AB = 0 and EF = 0, and by the d-separation criterion, A and E are conditionally independent given D = 0. But now consider an expanded structure for this causal directed acyclic graph which involves only minimal sufficient causes but which allows redundant minimal sufficient causes. Define Q = BE; then AQ is a minimal sufficient cause for D since AQ = 1 ⇒ AB = 1 ⇒ D = 1, but A = 1 ⇏ D = 1 and Q = 1 ⇏ D = 1. Now {AB, AQ, EF} is a determinative but redundant set of minimal sufficient causes for D. Figure 3(ii) gives an alternative causal directed acyclic graph with a minimal sufficient causation structure for the causal relationships indicated in Figure 3(i). In Figure 3(ii), conditioning on D = 0 conditions also on AB = 0, AQ = 0 and EF = 0, but the d-separation criteria no longer imply that A and E are conditionally independent given D = 0; because of conditioning on D = 0, there is an unblocked path between A and E, namely A − AQ − Q − E (recall that Q = BE). Allowing the redundant minimal sufficient cause AQ in the minimal sufficient causation structure obscures the conditional independence relation. Similar examples can be constructed to show that allowing sufficient causes that are not minimally sufficient can also obscure conditional independence relations [35].

Fig. 3.
Example illustrating that redundant sufficient causes can obscure conditional independence relations.
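The reduction described before Example 3 (iteratively excluding events from a sufficient cause until none can be removed without losing sufficiency) can be sketched mechanically. The outcome function `f`, the variable names and the sufficiency oracle `forces` below are hypothetical illustrations, not part of the paper's formalism:

```python
from itertools import product

# Sketch: reduce a sufficient cause (a set of binary literals) to a
# minimal sufficient cause by iteratively dropping unnecessary events.
def forces(cause, f, variables):
    """Does fixing the literals in `cause` guarantee D = 1?"""
    fixed = dict(cause)
    free = [v for v in variables if v not in fixed]
    return all(f({**fixed, **dict(zip(free, vals))}) == 1
               for vals in product((0, 1), repeat=len(free)))

def minimize(cause, f, variables):
    """Drop literals from a sufficient cause until it is minimal."""
    cause = list(cause)
    changed = True
    while changed:
        changed = False
        for lit in list(cause):
            smaller = [l for l in cause if l != lit]
            if forces(smaller, f, variables):   # `lit` was unnecessary
                cause = smaller
                changed = True
    return cause

# Hypothetical example: D = 1 iff AB = 1 or E = 1.  The sufficient cause
# {A = 1, B = 1, E = 1} reduces to the minimal sufficient cause {E = 1}.
f = lambda s: int(s['A'] * s['B'] == 1 or s['E'] == 1)
assert minimize([('A', 1), ('B', 1), ('E', 1)], f, ['A', 'B', 'E']) == [('E', 1)]
```

Note that the scan order matters: starting from {A = 1, B = 1, E = 1} and dropping E first would instead leave the distinct minimal sufficient cause {A = 1, B = 1}, which is one way in which several determinative sets of minimal sufficient causes can arise.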
Although allowing sufficient causes that are not minimally sufficient or allowing redundant sufficient causes or redundant minimal sufficient causes can obscure the conditional independence relations implied by the structure of the causal directed acyclic graph, it may sometimes be desirable to include nonminimal sufficient causes or redundant sufficient causes. For example, as noted above, nonminimal sufficient cause nodes or redundant sufficient cause nodes may represent separate causal mechanisms upon which it might be possible to intervene. Further discussion of conditional independence relations in sufficient causation structures with nonminimally sufficient causes and redundant sufficient causes is given in Section 6.

Note that a sufficient cause need only involve one co-cause A_i in its conjunction, because if it involved A_{i_1}, . . . , A_{i_k}, then A_{i_1}, . . . , A_{i_k} could be replaced by the product A′_i = A_{i_1} · · · A_{i_k}. In certain cases, though, it may be desirable to include more than one A_i in a sufficient cause if this corresponds to the actual causal mechanisms. If a set of variables A_0, . . . , A_u satisfying Theorem 1 can be constructed from functions of the random term U = ε_D of the nonparametric structural equation for D on G and their complements so that A_i = f_i(U), then H can be chosen to be the graph G with the additional nodes U, A_0, . . . , A_u and with directed edges from U into each A_i and from each A_i into D. This gives rise to the definition, given below, of a representation for D.

Definition 6. If D and all of its parents on the causal directed acyclic graph G are binary, and there exists some set {A_i, P_i} such that each P_i is some conjunction of the parents of D and their complements, such that there exist functions f_i for which A_i = f_i(ε_D), where ε_D is the random term in the nonparametric structural equation for D on G, and such that D = ⋁_i A_i P_i, then {A_i, P_i} is said to constitute a representation for D.
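Definition 6 can be checked mechanically in small cases. The sketch below uses a hypothetical single-parent structural function f(e, ε_D) whose random term takes three values (the same three-valued construction reappears in Example 4 below), together with proposed co-causes A_i that are functions of ε_D alone, and verifies D = ⋁_i A_i P_i by enumeration:

```python
# Hypothetical structural function: eps_D = 0 forces D = 1, eps_D = 1
# makes D equal to E, and eps_D = 2 makes D equal to 1 - E.
def f(e, eps):
    return (1, e, 1 - e)[eps]

# A proposed representation {A_i, P_i}: each A_i is a function of eps_D
# alone, each P_i a conjunction over the single parent E (encoded as a
# 0/1 function of e; the empty conjunction is identically 1).
rep = [
    (lambda eps: int(eps == 0), lambda e: 1),      # A_0, empty conjunction
    (lambda eps: int(eps == 1), lambda e: e),      # A_1, conjunction E
    (lambda eps: int(eps == 2), lambda e: 1 - e),  # A_2, conjunction E-bar
]

def is_representation(f, rep, eps_vals):
    # Definition 6 requires D = OR_i A_i(eps) P_i(e) for every (e, eps)
    return all(f(e, eps) == max(Ai(eps) * Pi(e) for Ai, Pi in rep)
               for e in (0, 1) for eps in eps_vals)

assert is_representation(f, rep, (0, 1, 2))
# Dropping the A_2 E-bar term breaks the representation:
assert not is_representation(f, rep[:2], (0, 1, 2))
```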
If the A_i variables are constructed from functions of the random term ε_D in the nonparametric structural equation for D on G, then these A_i variables may or may not allow for interpretation, and they may or may not be such that an intervention on these A_i variables is conceivable. In certain cases, the A_i variables may simply be logical constructs for which no intervention is conceivable. Although in certain cases it may not be possible to intervene on the A_i variables, we will still refer to conjunctions of the form A_i P_i as sufficient causes for D, as it is assumed that it is possible to intervene on the parents of D which constitute the conjunction P_i.

Suppose that for some node D on a causal directed acyclic graph G, a set of variables A_0, . . . , A_u satisfying Theorem 1 can be constructed from functions of the random term U = ε_D in the nonparametric structural equation for D on G, so that a representation for D is given by D = ⋁_i A_i P_i. Then, in order to simplify the diagram, instead of adding to G the variable U and directed edges from U into each A_i so as to form the minimal sufficient causation structure, we will sometimes suppress U and simply add an asterisk next to each A_i indicating that the A_i variables have a common cause.

Proposition 1.
For any representation for D, the co-causes A_i will be independent of the parents of D on the original directed acyclic graph G.

Proof.
This follows immediately from the fact that for any representation for D, the co-causes are functions of the random term in the nonparametric structural equation for D. □

If some of the sufficient causes for D are unknown, then it is not obvious how one might make use of Theorem 1. The theorem allowed for a sufficient causation structure on a causal directed acyclic graph, provided there existed some set of co-causes A_0, . . . , A_u. Theorem 2 complements Theorem 1 in that it essentially states that when D and all of its parents are binary, such a set of co-causes always exists. The variables A_0, . . . , A_u are constructed from functions of the random term ε_D in the nonparametric structural equation for D on G. Before stating and proving Theorem 2, we illustrate how the co-causes can be constructed by a simple example.

Example 4.
Suppose E is the only parent of D; then the structural equation for D is given by D = f(E, ε_D). Define A_0, A_1 and A_2 as follows: let A_0(ω) = 1 if f(1, ε_D(ω)) = f(0, ε_D(ω)) = 1 and A_0(ω) = 0 otherwise; let A_1(ω) = 1 if f(1, ε_D(ω)) = 1 and f(0, ε_D(ω)) = 0, and A_1(ω) = 0 otherwise; and let A_2(ω) = 1 if f(1, ε_D(ω)) = 0 and f(0, ε_D(ω)) = 1, and A_2(ω) = 0 otherwise. It is easily verified that D = A_0 ∨ A_1E ∨ A_2Ē and that A_0, A_1E and A_2Ē constitute a determinative set of minimal sufficient causes for D. Note that this construction will give a determinative set of minimal sufficient causes for D regardless of the form of f and the distribution of ε_D.

Theorem 2.
Consider a causal directed acyclic graph G on which there exists some node D such that D and all its parents are binary; then there exist variables A_0, . . . , A_u that satisfy the conditions of Theorem 1 and such that the sufficient causes constructed from A_0, . . . , A_u along with the parents of D on G and their complements are, in fact, minimal sufficient causes.

Proof.
The nonparametric structural equation for D is given by D = f(pa_D, ε_D). Suppose D has m parents on the original causal directed acyclic graph G. Since these parents are binary, there are 2^m values which pa_D can take. Since f maps (pa_D, ε_D) to {0, 1}, each value of ε_D assigns to every possible realization of pa_D either 0 or 1 through f. There are 2^{2^m} such assignments. Thus, without loss of generality, we may assume that ε_D takes on some finite number of distinct values N ≤ 2^{2^m}; and so, we may write the sample space for ε_D as Ω_D = {ω_1, . . . , ω_N}, and we may use ω = ω_i and ε_D = ε_D(ω_i) interchangeably. The co-causes can be constructed as follows. Let W_i be the indicator 1_{ε_D = ε_D(ω_i)}. Let P_i be some conjunction of the parents of D and their complements, that is, P_i = F_{i_1} · · · F_{i_{n_i}}, where each F_{i_k} is either a parent of D, say E_j, or its complement Ē_j. For each P_i, let A_i ≡ 1 if F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D, and let

A_i = ⋁_j {W_j : W_j F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D}

otherwise. Let M_i = P_i if A_i ≡ 1, and M_i = A_i P_i otherwise. It must be shown that each M_i = A_i F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause and that the set of M_i's constitutes a minimal sufficient cause representation for D (or, more precisely, the set of M_i's for which A_i is not identically 0 constitutes a minimal sufficient cause representation for D). We first show that each M_i = A_i F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D. Clearly, this is the case if A_i ≡
1. Now consider those A_i such that A_i is not identically 0 and not identically 1, and suppose A_i = W_{i_1} ∨ · · · ∨ W_{i_{v_i}}, where each W_{i_j} is such that W_{i_j} F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D. If A_i F_{i_1} · · · F_{i_{n_i}} is not a minimal sufficient cause, then either F_{i_1} · · · F_{i_{n_i}} = 1 ⇒ D = 1, or there exists j such that

A_i F_{i_1} · · · F_{i_{j−1}} F_{i_{j+1}} · · · F_{i_{n_i}} = 1 ⇒ D = 1.

Suppose first that F_{i_1} · · · F_{i_{n_i}} = 1 ⇒ D = 1; then there does not exist a W_j such that W_j F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D, but this contradicts the fact that A_i is not identically 1. On the other hand, if there exists j such that A_i F_{i_1} · · · F_{i_{j−1}} F_{i_{j+1}} · · · F_{i_{n_i}} = 1 ⇒ D = 1, then it is also the case that

W_{i_1} F_{i_1} · · · F_{i_{j−1}} F_{i_{j+1}} · · · F_{i_{n_i}} = 1 ⇒ D = 1,

since A_i is simply a disjunction of the W_{i_j}'s. However, it would then follow that W_{i_1} F_{i_1} · · · F_{i_{n_i}} is not a minimal sufficient cause for D. But this contradicts the definition of W_{i_1}. Thus, A_i F_{i_1} · · · F_{i_{n_i}} must be a minimal sufficient cause for D. It remains to be shown that the set of M_i's for which A_i is not identically 0 constitutes a minimal sufficient cause representation for D. We must show that if D = 1, then there exists an M_i = A_i P_i for which M_i = 1. Now D is a function of (ε_D, E_1, . . . , E_m), so let (ε*_D, E*_1, . . . , E*_m) be any particular value of (ε_D, E_1, . . . , E_m) for which D = 1. Consider the set {E_1, . . . , E_m}. If, for any j,

ε_D = ε*_D, E_1 = E*_1, . . . , E_{j−1} = E*_{j−1}, E_{j+1} = E*_{j+1}, . . . , E_m = E*_m ⇒ D = 1,

remove E_j from {E_1, . . . , E_m}. Continue to remove those E_j from this set which are not needed to maintain the implication D = 1. Suppose the set that remains is {E_{h_1}, . . . , E_{h_S}}; then either we have E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ D = 1, or we have E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇏ D = 1 and ε_D = ε*_D, E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ D = 1. If E_{h_1} = E*_{h_1}, . . .
, E_{h_S} = E*_{h_S} ⇒ D = 1, then, defining F_j as the indicator F_j = 1(E_{h_j} = E*_{h_j}), F_1 · · · F_S is a minimal sufficient cause for D, and there thus exists an i such that P_i = F_1 · · · F_S and M_i = P_i; and when E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S}, we have M_i = 1. If E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇏ D = 1 but ε_D = ε*_D, E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ D = 1, then, defining F_j as the indicator 1(E_{h_j} = E*_{h_j}), 1_{ε_D = ε*_D} F_1 · · · F_S is a minimal sufficient cause for D; and there exists an i such that M_i = A_i P_i with P_i = F_1 · · · F_S and ε_D = ε*_D ⇒ A_i = 1, so that ε_D = ε*_D, E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ M_i = 1. We have thus shown that when D = 1, there exists an M_i such that M_i = 1, and so the M_i's constitute a minimal sufficient cause representation for D. □

The variables A_i constructed in Theorem 2, along with their corresponding conjunctions P_i of the parents of D and their complements, we define below as the canonical representation for D. It is easily verified that the co-causes and representation constructed in Example 4 are the canonical representation for D in that example.

Definition 7.
Consider a causal directed acyclic graph G such that some node D and all of its parents are binary. Let Ω_D be the sample space for the random term ε_D in the nonparametric structural equation for D on G. The conjunctions P_i = F_{i_1} · · · F_{i_{n_i}}, where each F_{i_k} is either a parent of D or the complement of a parent of D, along with the variables A_i constructed by A_i ≡ 1 if F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D, and

A_i = ⋁_{ω_j ∈ Ω_D} {1_{ε_D = ε_D(ω_j)} : 1_{ε_D = ε_D(ω_j)} F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D}

otherwise, is said to be the canonical representation for D.

As noted above, there will in general exist more than one set of co-causes A_0, . . . , A_u which, together with the parents of D and their complements, can be used to construct a sufficient cause representation for D. The set of A_i's in the canonical representation constitutes only one particular set of variables which can be used to construct a sufficient cause representation. If D has three or more parents, examples can be constructed in which the canonical representation is redundant. Examples can also be constructed to show that when the canonical representation is redundant, it is not always uniquely reducible to a nonredundant minimal sufficient cause representation. Although the canonical representation will not always be nonredundant, it does, however, guarantee that for a binary variable with binary parents, a determinative set of minimal sufficient causes always exists. The canonical representation in a sense "favors" conjunctions with fewer terms. As can be seen in the simple illustration given in Example 4, the canonical representation will never have A_i = 1 for some conjunction P_i when there is a conjunction P_j with A_j = 1 and such that the components of P_j are a subset of those in the conjunction for P_i.
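For small graphs the canonical representation of Definition 7 can be computed by brute force. The sketch below enumerates conjunctions of parent literals, tests minimal sufficiency directly against a hypothetical structural function f, and attaches either A_i ≡ 1 or the set of ε-values whose indicators make up A_i:

```python
from itertools import combinations, product

# Brute-force sketch of the canonical representation (Definition 7).
# f(pa, eps) -> {0, 1} is a hypothetical structural function with m
# binary parents; eps ranges over the finite support eps_vals.
def canonical_representation(f, m, eps_vals):
    def sufficient(conj, eps_set):
        # the literals in conj (pairs (parent index, required value)),
        # together with eps restricted to eps_set, force D = 1
        return all(f(pa, e) == 1
                   for e in eps_set
                   for pa in product((0, 1), repeat=m)
                   if all(pa[j] == v for j, v in conj))

    def minimal(conj, eps_set):
        if not sufficient(conj, eps_set):
            return False
        for k in range(len(conj)):          # no parent literal droppable
            if sufficient(conj[:k] + conj[k + 1:], eps_set):
                return False
        # nor may the eps indicator itself be droppable
        return eps_set == eps_vals or not sufficient(conj, eps_vals)

    rep = []
    for r in range(m + 1):
        for idx in combinations(range(m), r):
            for vals in product((0, 1), repeat=r):
                conj = tuple(zip(idx, vals))
                if minimal(conj, eps_vals):
                    rep.append((1, conj))   # A_i identically 1
                else:
                    co = tuple(e for e in eps_vals if minimal(conj, (e,)))
                    if co:                  # A_i = OR of eps indicators
                        rep.append((co, conj))
    return rep

# D = E_1 or E_2 (eps irrelevant): the minimal sufficient causes are
# E_1 and E_2, each with a co-cause identically equal to 1.
assert canonical_representation(lambda pa, e: pa[0] | pa[1], 2, (0,)) == \
    [(1, ((0, 1),)), (1, ((1, 1),))]
```

Running the same sketch on the single-parent function used in Example 4 recovers the three co-causes A_0, A_1, A_2 constructed there.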
4. Monotonic effects and minimal sufficient causation.
Minimal sufficient causes for a particular event D may have present in their conjunction the parents of D or the complements of these parents. In certain cases, no minimal sufficient cause will involve the complement of a particular parent of D. Such cases closely correspond to what will be defined below as a positive monotonic effect. Essentially, a positive monotonic effect will be said to be present when a function in a nonparametric structural equation is nondecreasing in a particular argument for all values of the other arguments of the function. In this section, we develop the relationship between minimal sufficient causation and monotonic effects.

Definition 8.
The nonparametric structural equation for some node D on a causal directed acyclic graph with parent E can be expressed as D = f(p̃a_D, E, ε_D), where p̃a_D denotes the parents of D other than E; E is said to have a positive monotonic effect on D if, for all p̃a_D and ε_D, f(p̃a_D, e_1, ε_D) ≥ f(p̃a_D, e_0, ε_D) whenever e_1 ≥ e_0. Similarly, E is said to have a negative monotonic effect on D if, for all p̃a_D and ε_D, f(p̃a_D, e_1, ε_D) ≤ f(p̃a_D, e_0, ε_D) whenever e_1 ≥ e_0.

Note that this notion of a monotonic effect is somewhat stronger than Wellman's qualitative probabilistic influence [41]. See [38, 39] for further discussion.

Theorem 3. If E is a parent of D and if D and all of its parents are binary, then the following are equivalent: (i) E has a positive monotonic effect on D; (ii) there is some representation for D which is such that none of the representation's conjunctions contain Ē; (iii) the canonical representation of D, ⋁_i A_i P_i, is such that no conjunction P_i contains Ē.

Proof.
We see that (iii) implies (ii) because the representation required by (ii) is met by the canonical representation of D, as constructed in Theorem 2. To show that (ii) implies (i), we assume that we have a representation for D such that D = ⋁_i A_i P_i, where each P_i is some conjunction of the parents of D and their complements but does not contain Ē. If f(p̃a_D, 0, ε_D) = 1, then f(p̃a_D, 1, ε_D) = 1, because D = ⋁_i A_i P_i and none of the P_i involve Ē; from this, (i) follows. To show that (i) implies (iii), we prove the contrapositive. Suppose that the canonical representation of D, {A_i, P_i}, is such that there exists a P_i which contains Ē in its conjunction. Then there exists some value ε*_D of ε_D and some conjunction of the parents of D and their complements, say F_1 · · · F_n, such that W_i F_1 · · · F_n Ē constitutes a minimal sufficient cause for D, where W_i = 1(ε_D = ε*_D). Let p̃a*_D take the values given by F_1 · · · F_n. This may not suffice to fix p̃a*_D, but there must exist some value of the remaining parents of D other than E which, in conjunction with W_i F_1 · · · F_n and E = 1, gives D = 0; for if there were no such value of the other parents, then W_i F_1 · · · F_n itself would be sufficient for D, and W_i F_1 · · · F_n Ē would not be a minimal sufficient cause for D. Let p̃a*_D be such that p̃a*_D and E = 0 together with ε*_D give D = 1, but p̃a*_D and E = 1 with ε*_D give D = 0. Then f(p̃a*_D, 0, ε*_D) = 1, but f(p̃a*_D, 1, ε*_D) = 0, and thus (i) does not hold. This completes the proof. □
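When D's parents are binary and the random term has finite support, Definition 8 can be verified by exhaustive enumeration; the structural functions below are hypothetical illustrations:

```python
from itertools import product

# Check whether E has a positive monotonic effect on D (Definition 8):
# f(other_parents, e, eps) must be nondecreasing in e for every value
# of the other parents and of the random term eps.
def has_positive_monotonic_effect(f, n_other, eps_vals):
    return all(f(pa, 1, eps) >= f(pa, 0, eps)
               for pa in product((0, 1), repeat=n_other)
               for eps in eps_vals)

# Hypothetical examples with one other parent C = pa[0]:
f_or = lambda pa, e, eps: int(e or (pa[0] and eps))   # D = E or C*eps
f_xor = lambda pa, e, eps: int(e != pa[0])            # D = E xor C
assert has_positive_monotonic_effect(f_or, 1, (0, 1))
assert not has_positive_monotonic_effect(f_xor, 1, (0, 1))
```

By Theorem 3, the first function therefore admits a representation whose conjunctions never contain Ē, while the second does not.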
5. Conditional covariance and minimal sufficient causation.
When two binary parents of some event D have positive monotonic effects on D, it is in some cases possible to determine the sign of the conditional covariance of these two parents. In general, even in the setting of monotonic effects, the conditional covariance may be of either positive or negative sign; however, when additional knowledge is available concerning the minimal sufficient causation structure of D, it is often possible to determine the sign of the conditional covariance of two parents of D. Theorem 4 gives conditions under which the sign of the conditional covariance can be determined. Theorems 5 and 6 extend the conclusions of Theorem 4 to certain cases concerning the conditional covariance of two variables that may not be parents of the conditioning variable. The proof of Theorem 4 is suppressed; the proof involves extensive but routine algebraic manipulation and factoring (details are available from the authors upon request).

Theorem 4.
Suppose that E_1 and E_2 are the only parents of D on some causal directed acyclic graph, that E_1, E_2 and D are all binary and that both E_1 and E_2 have a positive monotonic effect on D. Then, for any representation for D such that D = A_0 ∨ A_1E_1 ∨ A_2E_2 ∨ A_3E_1E_2, the following hold:

(i) If A ≡ , then Cov(E_1, E_2 | D) ≤ 0.
(ii) If A ≡ , A_1 and A_2 are independent and E_1 and E_2 are independent, then Cov(E_1, E_2 | D) ≤ 0.
(iii) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) ≤ 0 provided Cov(E_1, E_2) ≤ 0.
(iv) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) = 0.
(v) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) ≥ 0 provided Cov(E_1, E_2) ≥ 0.
(vi) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) ≤ 0 provided Cov(E_1, E_2) ≤ 0.
(vii) If A_3 ≡ 0, then Cov(E_1, E_2 | D) ≤ 0 provided Cov(E_1, E_2) ≤ 0.
(viii) If A_3 ≡ 0, A_1 and A_2 are independent, E_1 and E_2 are independent and also A_0 is independent of either A_1 or A_2, then Cov(E_1, E_2 | D) = 0.

Note that parts (i)–(viii) of Theorem 4 all require some knowledge of a sufficient cause representation for D, that is, knowledge that certain of the co-causes A_0, A_1, A_2, A_3 are identically 0 or identically 1, or knowledge of the independence of certain co-causes. As can be seen from Theorem 4, if no knowledge of the sufficient causes is available, the conditional covariances Cov(E_1, E_2 | D = 0) and Cov(E_1, E_2 | D = 1) may be of either sign, even if E_1 and E_2 have positive monotonic effects on D. For example, if E_1 and E_2 have positive monotonic effects on D and (v) holds, then Cov(E_1, E_2 | D) ≥
0; but if E_1 and E_2 have positive monotonic effects on D and (i) holds, then Cov(E_1, E_2 | D) ≤ 0. If E_1 and E_2 are the only parents of D, possibly correlated due to some common cause C, and have positive monotonic effects on D, then the minimal sufficient causation structure for the causal directed acyclic graph is that given in Figure 4.
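The role of the synergistic co-cause can be seen in a small exact computation. The sketch below is one hypothetical instance, not Theorem 4 itself: all co-causes and parents are independent Bernoulli variables, A_0 ≡ 0, and D = A_1E_1 ∨ A_2E_2 ∨ A_3E_1E_2. Conditioning on D = 0, the covariance of E_1 and E_2 is exactly zero when A_3 ≡ 0 and strictly negative when A_3 is nondegenerate:

```python
from itertools import product

# Exact computation for one hypothetical instance: independent Bernoulli
# co-causes and parents, A_0 identically 0, and
# D = A_1 E_1 v A_2 E_2 v A_3 E_1 E_2.
def cond_cov_given_d0(p_e1, p_e2, p_a1, p_a2, p_a3):
    pe = lambda p, v: p if v else 1 - p
    tab, z = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}, 0.0
    for e1, e2, a1, a2, a3 in product((0, 1), repeat=5):
        if a1 * e1 or a2 * e2 or a3 * e1 * e2:
            continue                        # these worlds have D = 1
        w = (pe(p_e1, e1) * pe(p_e2, e2) *
             pe(p_a1, a1) * pe(p_a2, a2) * pe(p_a3, a3))
        tab[(e1, e2)] += w
        z += w
    m1 = (tab[(1, 0)] + tab[(1, 1)]) / z    # E[E_1 | D = 0]
    m2 = (tab[(0, 1)] + tab[(1, 1)]) / z    # E[E_2 | D = 0]
    return tab[(1, 1)] / z - m1 * m2        # Cov(E_1, E_2 | D = 0)

no_synergism = cond_cov_given_d0(0.4, 0.6, 0.3, 0.5, 0.0)  # A_3 = 0
synergism = cond_cov_given_d0(0.4, 0.6, 0.3, 0.5, 0.7)
assert abs(no_synergism) < 1e-12   # exactly zero in this instance
assert synergism < 0               # synergistic term induces a negative sign
```

With A_3 ≡ 0 the probability of D = 0 factorizes into a term depending on E_1 alone and a term depending on E_2 alone, so the conditional covariance vanishes; the A_3 term breaks this factorization in the (1, 1) cell only, pushing the covariance negative.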
Fig. 4.
Minimal sufficient causation structure when E_1 and E_2 have positive monotonic effects on D.

Recall that the asterisk is used to indicate that A_0, A_1, A_2 or A_3 may have a common cause U. If one of A_0, A_1, A_2 or A_3 is identically 0 or 1, then Theorem 4 may be used to draw conclusions about the sign of the conditional covariance Cov(E_1, E_2 | D). For example, if one believes that there is no synergism between E_1 and E_2 in the actual causal mechanisms for D, then A_3 ≡
0; if this holds, then parts (vii) and (viii) of Theorem 4 can be used to determine the sign of the conditional covariance. Theorem 4 has an obvious analogue if one or both of E_1 or E_2 have a negative monotonic effect on D. If D has more than two parents, but the two parents E_1 and E_2 are independent of all other parents of D, then the causal directed acyclic graph can be marginalized over these other parents, and Theorem 4 can be applied to the resulting causal directed acyclic subgraph.

Some of the conclusions of Theorem 4 require knowing the sign of Cov(E_1, E_2), and Proposition 2 below (proved elsewhere [39]) relates the sign of Cov(E_1, E_2) to the presence of monotonic effects. In order to state this proposition and to allow for the development of extensions to Theorem 4, we need a few additional definitions.

Definition 9.
An edge on a causal directed acyclic graph from X to Y is said to be of positive (negative) sign if X has a positive (negative) monotonic effect on Y. If X has neither a positive monotonic effect nor a negative monotonic effect on Y, then the edge from X to Y is said to be without a sign.

Definition 10.
The sign of a path on a causal directed acyclic graph is the product of the signs of the edges that constitute that path. If one of the edges on a path is without a sign, then the sign of the path is said to be undefined.

Fig. 5. Examples requiring extensions to Theorem 4.
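Definitions 9 and 10 amount to a signed product with an absorbing "undefined" element; a minimal sketch, encoding edge signs as +1, -1, or None for an unsigned edge:

```python
# Sign of a path (Definition 10): the product of its edge signs,
# undefined (None) if any edge on the path is without a sign.
def path_sign(edge_signs):
    sign = 1
    for s in edge_signs:
        if s is None:
            return None          # the sign of the path is undefined
        sign *= s
    return sign

assert path_sign([+1, +1, -1]) == -1
assert path_sign([-1, -1]) == +1
assert path_sign([+1, None, -1]) is None
```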
Definition 11.
Two variables X and Y are said to be positively monotonically associated if all directed paths between X and Y are of positive sign, and all common causes C_i of X and Y are such that all directed paths from C_i to X not through Y are of the same sign as all directed paths from C_i to Y not through X; the variables X and Y are said to be negatively monotonically associated if all directed paths between X and Y are of negative sign, and all common causes C_i of X and Y are such that all directed paths from C_i to X not through Y are of the opposite sign to all directed paths from C_i to Y not through X.

Proposition 2. If X and Y are positively monotonically associated, then Cov(
X, Y) ≥ 0. If X and Y are negatively monotonically associated, then Cov(
X, Y) ≤ 0.

Rules for the propagation of signs have been developed elsewhere [38, 39, 41] and, as seen from Proposition 2, are useful for determining the sign of covariances; however, as will be seen below, rules for deriving the sign of conditional covariances are more subtle. Theorem 4 concerns the conditional covariance of two parents of the node D. However, often what will be desired is the sign of the conditional covariance of two variables which are not parents of the conditioning node. For example, in the coaggregation problem discussed in the Introduction, we wanted to draw conclusions about Cov(P, B | P = 1), but neither P nor B are parents of P in Figure 1. In the remainder of the paper we will thus extend Theorem 4 so as to allow for application to two variables, say F and G, which are not parents of the conditioning node D. The variables F and G might be ancestors of, descendants of, or share common causes with the parents, E_1 and E_2, of D. Consider, for example, the causal directed acyclic graphs in Figure 5. If we were interested in the sign of Cov(F, G | D) in Figures 5(i)–(iii), then clearly Theorem 4 is insufficient. Theorems 5 and 6 below will allow us to extend the conclusions of Theorem 4 to examples such as those in Figure 5 and to certain other cases involving two variables that may not be parents of the conditioning variable. Lemmas 1–5 below will be needed in the proofs and application of Theorems 5 and 6. Lemmas 1 and 2 are consequences of Theorems 1 and 2 in the work of Esary, Proschan and Walkup [5]. Lemmas 3–5 are proved elsewhere in related work concerning the properties of monotonic effects [38].
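The proofs of Theorems 5 and 6 below repeatedly apply the decomposition Cov(F, G | D) = E[Cov(F, G | Z, D) | D] + Cov(E[F | Z, D], E[G | Z, D] | D), that is, the law of total covariance. Since the identity holds for any joint distribution, it can be spot-checked numerically; the unconditional form is verified below on an arbitrary randomly generated distribution of three binary variables:

```python
from itertools import product
import random

# Numeric check of the law of total covariance,
#   Cov(F, G) = E[Cov(F, G | Z)] + Cov(E[F | Z], E[G | Z]),
# on an arbitrary randomly generated joint distribution of (F, G, Z).
random.seed(1)
states = list(product((0, 1), repeat=3))            # (f, g, z)
raw = [random.random() for _ in states]
pr = {s: w / sum(raw) for s, w in zip(states, raw)}

def mean(h):
    return sum(p * h(f, g, z) for (f, g, z), p in pr.items())

total = (mean(lambda f, g, z: f * g)
         - mean(lambda f, g, z: f) * mean(lambda f, g, z: g))

within = 0.0                                        # E[Cov(F, G | Z)]
pz, ef_z, eg_z = {}, {}, {}
for zv in (0, 1):
    pz[zv] = sum(p for (f, g, z), p in pr.items() if z == zv)
    ef_z[zv] = sum(p * f for (f, g, z), p in pr.items() if z == zv) / pz[zv]
    eg_z[zv] = sum(p * g for (f, g, z), p in pr.items() if z == zv) / pz[zv]
    efg = sum(p * f * g for (f, g, z), p in pr.items() if z == zv) / pz[zv]
    within += pz[zv] * (efg - ef_z[zv] * eg_z[zv])

# Cov(E[F | Z], E[G | Z]) over the marginal of Z
between = (sum(pz[z] * ef_z[z] * eg_z[z] for z in (0, 1))
           - sum(pz[z] * ef_z[z] for z in (0, 1))
           * sum(pz[z] * eg_z[z] for z in (0, 1)))

assert abs(total - (within + between)) < 1e-12
```

In the proofs below, the within term vanishes by a d-separation assumption, so the sign analysis reduces to the between term.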
Lemma 1.
Let f and g be functions with n real-valued arguments, such that both f and g are nondecreasing in each of their arguments. If X = (X_1, . . . , X_n) is a multivariate random variable with n components, such that each component is independent of the other components, then Cov(f(X), g(X)) ≥ 0.

Lemma 2. If F and G are binary and u_1 and u_2 are nondecreasing functions, then sign(Cov(u_1(F), u_2(G))) = sign(Cov(F, G)).

Lemma 3.
Let X denote some set of nondescendants of A that blocks all backdoor paths from A to Y. If all directed paths between A and Y are positive, then P(Y > y | a, x) and E[Y | a, x] are nondecreasing in a.

Lemma 4.
Suppose that E is binary. Let Q be some set of variables which are not descendants of F nor of E, and let C be the common causes of E and F not in Q. If all directed paths from E to F (or from F to E) are of positive sign, and all directed paths from C to E not through {Q, F} are of the same sign as all directed paths from C to F not through {Q, E}, then E[F | E, Q] is nondecreasing in E.

Lemma 5.
Suppose that E is not a descendant of F. Let Q be some set of nondescendants of E that block all backdoor paths from E to F, and let D be a node on a directed path from E to F such that all backdoor paths from D to F are blocked by {E, Q}. If all directed paths from E to F, except possibly those through D, are of positive sign, then E[F | D, Q, E] is nondecreasing in E.

Obvious analogues concerning negative signs hold for all of the lemmas above. Theorem 5 below will allow us to determine the sign of the conditional covariance of F and G on graphs like those in Figure 5, provided there are appropriate signs on the edges. The conclusion of Theorem 5 concerns the equality of the sign of two conditional covariances, Cov(F, G | D) and Cov(E_1, E_2 | D). The theorem itself does not require knowledge of a sufficient causation representation and thus applies to general causal directed acyclic graphs. However, to draw conclusions about the sign of Cov(E_1, E_2 | D), one must still appeal to Theorem 4, which does require some knowledge of a sufficient causation representation.

Theorem 5.
Suppose that E_1, E_2 and D are binary variables, that E_1 and E_2 are parents of D, that F and G are d-separated given {E_1, E_2, D}, that F and {E_2, D} are d-separated given E_1, and that G and {E_1, D} are d-separated given E_2. If Cov(
F, E_1) ≥ 0 and Cov(
G, E_2) ≥ 0, then sign(Cov(F, G | D)) = sign(Cov(E_1, E_2 | D)).

Proof.
Conditioning on E_1 and E_2, we have

Cov(F, G | D) = E[Cov(F, G | D, E_1, E_2) | D] + Cov(E[F | D, E_1, E_2], E[G | D, E_1, E_2] | D).

The first term is 0, since F and G are d-separated given {E_1, E_2, D}. Furthermore, since F and {E_2, D} are d-separated given E_1, and G and {E_1, D} are d-separated given E_2, the second term can be reduced to Cov(E[F | E_1], E[G | E_2] | D). Thus,

Cov(F, G | D) = Cov(E[F | E_1], E[G | E_2] | D).

If Cov(
F, E_1) ≥ 0 and Cov(G, E_2) ≥ 0, then, since E_1 and E_2 are binary, we have that E[F | E_1] is nondecreasing in E_1 and E[G | E_2] is nondecreasing in E_2, and so, by Lemma 2, sign(Cov(E[F | E_1], E[G | E_2] | D)) = sign(Cov(E_1, E_2 | D)). We thus have

sign(Cov(F, G | D)) = sign(Cov(E_1, E_2 | D)),

and this completes the proof. □

Note that Theorem 5 requires that Cov(
F, E_1) ≥ 0 and Cov(G, E_2) ≥
0; Proposition 2 can be used to check whether these covariances are nonnegative; that is, the covariances will be nonnegative if F and E_1 are positively monotonically associated and if G and E_2 are positively monotonically associated.

Example 5.
Note that the graphs in Figures 5(i) and (ii) satisfy the d-separation restrictions of Theorem 5. In Figure 5(i), G is an ancestor of E_2, whereas F is related to E_1 as a descendant and by a common cause. In Figure 5(ii), F is a descendant of E_1, and G is related to E_2 both as an ancestor and by a common cause. The d-separation restrictions of Theorem 5 would still hold in Figures 5(i) and (ii) if F and E_1 or G and E_2 had multiple common causes, or if there were several intermediate variables between E_1 and F and between G and E_2.
Note, however, that Theorem 5 requires that F be d-separated from {E_2, D} given E_1 and that G be d-separated from {E_1, D} given E_2. Thus, if F or G were a descendant of D, these assumptions would be violated. Consequently, Theorem 5 could not be applied to the diagram in Figure 5(iii). Nor could Theorem 5 be applied to the paper's introductory motivation to draw conclusions about the sign of Cov(P, B | P = 1) for the graph in Figure 1, since B is a descendant of the conditioning variable P.

Theorem 6 below gives a result that allows for F and G to be descendants of D. Before stating this result we note, however, that Theorem 5 is restricted in yet another way. Theorem 5 required that F and G be d-separated given {E_1, E_2, D}. If F and G have common causes, then the d-separation restrictions required by Theorem 5 will again, in general, not hold. Theorem 5 would thus not apply to the graphs given in Figure 6.

Fig. 6. Examples in which F and G have a common cause.

Theorem 6 gives a result similar to Theorem 5 which allows for F or G to be descendants of D and allows also for F and G to have common causes. As with Theorem 5, the conclusion of Theorem 6 concerns the equality of the sign of two conditional covariances, and the theorem itself does not require knowledge of a sufficient causation representation. But once again, to draw conclusions about the sign of Cov(F, G | D) using Theorem 6, one must know the sign of Cov(E_1, E_2 | D), and thus appeal must again be made to Theorem 4, which does require some knowledge of a sufficient causation representation.

Theorem 6. Suppose that E_1, E_2 and D are binary variables, that E_1 and E_2 are parents of D, that F and G are d-separated given {E_1, E_2, D, Q}, where Q is some set of common causes of F and G (each component of which is univariate and independent of the other components in Q), that F and E_2 are d-separated given {E_1, D, Q}, that G and E_1 are d-separated given {E_2, Q, D}, that Q and {E_1, E_2} are d-separated given D, and that Q and D are d-separated. Suppose also that E[F | E_1, D, Q] is nondecreasing in E_1 and that E[G | E_2, D, Q] is nondecreasing in E_2. If Cov(E_1, E_2 | D) ≥ 0, and for each element Q_i of Q, every directed path from Q_i to F is of the same sign as every directed path from Q_i to G, then Cov(
F, G | D) ≥ 0. If Cov(E_1, E_2 | D) ≤ 0, and for each element Q_i of Q, every directed path from Q_i to F is of the opposite sign to every directed path from Q_i to G, then Cov(
F, G | D) ≤ 0.

Proof.
We will prove the first of the results above; the proof of the second is similar. Conditioning on {E_1, E_2, Q}, we have

Cov(F, G | D) = E[Cov(F, G | D, Q, E_1, E_2) | D] + Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D).

The first term is 0, since F and G are d-separated given {E_1, E_2, Q, D}. We can furthermore rewrite the second term as follows:

Cov(F, G | D) = Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D)
= E[Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | Q, D) | D]
+ Cov(E[E[F | D, Q, E_1, E_2] | Q, D], E[E[G | D, Q, E_1, E_2] | Q, D] | D).

We will show that each of these two terms is nonnegative. Since F and E_2 are d-separated given {E_1, D, Q}, E[F | D, Q, E_1, E_2] = E[F | E_1, D, Q]; and since G and E_1 are d-separated given {E_2, D, Q}, E[G | D, Q, E_1, E_2] = E[G | E_2, D, Q]. By assumption, we have that E[F | E_1, D, Q] is nondecreasing in E_1 and that E[G | E_2, D, Q] is nondecreasing in E_2. For fixed q,

Cov(E[F | D, Q = q, E_1, E_2], E[G | D, Q = q, E_1, E_2] | Q = q, D)
= Cov(E[F | E_1, D, Q = q], E[G | E_2, D, Q = q] | Q = q, D)
= Cov(E[F | E_1, D, Q = q], E[G | E_2, D, Q = q] | D),

since Q and {E_1, E_2} are d-separated given D. And since E[F | E_1, D, Q = q] is nondecreasing in E_1 and E[G | E_2, D, Q = q] is nondecreasing in E_2, by Lemma 2, sign(Cov(E[F | E_1, D, Q = q], E[G | E_2, D, Q = q] | D)) = sign(Cov(E_1, E_2 | D)), where by hypothesis Cov(E_1, E_2 | D) ≥
0. Thus, we have that Cov( E [ F | D, Q = q, E , E ] , E [ G | D, Q = q, E , E ] | Q = q, D ) ≥ q and taking expectations over Q we have E [Cov( E [ F | D, Q,E , E ] , E [ G | D, Q, E , E ] | Q, D ) | D ] ≥
0. We have shown that the first of thetwo expressions above is nonnegative. We now show that the second expres-sion Cov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ] | D ) T. J. VANDERWEELE AND J. M. ROBINS is also nonnegative. As before, E [ F | D, Q, E , E ] = E [ F | E , D, Q ] and E [ G | D,Q, E , E ] = E [ G | E , D, Q ]. By hypothesis, for each element of Q i of Q ev-ery directed path from Q i to F is the same sign as every directed pathfrom Q i to G ; without loss of generality, we may assume that the sign ofall of these directed paths are positive. By Lemma 3 with X = { E , D } and X = { E , D } , respectively, E [ F | E , D, Q = q ] and E [ G | E , D, Q = q ] are bothnondecreasing in each dimension of q . Note that we may apply Lemma 3because if there were any backdoor paths from Q to F or to G , then Q would have some parent which would also be a common cause of F and G and thus also a member of the set Q , but this would violate the assumptionthat the members of Q were independent of one another. Furthermore, E [ E [ F | D, Q = q, E , E ] | Q = q, D ] = E [ E [ F | E , D, Q = q ] | Q = q, D ]= E [ E [ F | E , D, Q = q ] | D ]and similarly, E [ E [ G | D, Q = q, E , E ] | Q = q, D ] = E [ E [ G | E , Q = q ] | D ] = E [ E [ G | E , Q = q ] | Q = q, D ] since Q and { E , E } are d -separated given D .Thus, E [ E [ F | D, Q = q, E , E ] | Q = q, D ] = E [ E [ F | E , D, Q = q ] | D ]and E [ E [ G | D, Q = q, E , E ] | Q = q, D ] = E [ E [ G | E , D, Q = q ] | D ]are both nondecreasing in each dimension of q from which it follows byLemma 1 that Cov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ]) ≥
0. Since Q and D are d -separated we also haveCov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ] | D )= Cov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ]) ≥ (cid:3) Note the application of Theorem 6 requires that E [ F | E , D, Q ] is nonde-creasing in E and that E [ G | E , D, Q ] is nondecreasing in E . Either of thefollowing will suffice for E [ F | E , D, Q ] to be nondecreasing in E (similarremarks hold for E [ G | E , D, Q ]): (i) F and D are d -separated given { Q, E } and F and E are positively monotonically associated or (ii) if F is a descen-dant of E and D , F and E do not have common causes and all directedpaths from E to F not through D are of positive sign. Condition (i) sufficesby Lemma 4; condition (ii) suffices by Lemma 5. Example 6.
Although the graphs in Figure 5(iii) and in Figure 6 do not satisfy the d-separation restrictions of Theorem 5, it can be verified that these graphs do satisfy the d-separation restrictions of Theorem 6.

At first glance, the d-separation restrictions of Theorems 5 and 6 appear to severely limit the settings in which conclusions about conditional covariances can be drawn. The d-separation requirements are, in fact, somewhat less restrictive than they may first seem. We argue that the d-separation restrictions of either Theorem 5 or Theorem 6 will apply to most graphs in which neither F nor G is a cause of the other (though the restrictions on the set of common causes Q, if any, of F and G in Theorem 6 are more substantial). Theorem 5 requires (i) that F and G are d-separated given {E_1, E_2, D} and (ii) that F and {E_2, D} are d-separated given E_1 and that G and {E_1, D} are d-separated given E_2. In Theorems 5 and 6 (and Figures 5 and 6), F was either an ancestor or descendant of, or shared a common cause with, E_1; and G was either an ancestor or descendant of, or shared a common cause with, E_2. The d-separation restrictions essentially just require that F and G are sufficiently structurally separated so that (i) F and G are associated only because of {E_1, E_2, D} and (ii) F is associated with {E_2, D} only through E_1, and G is associated with {E_1, D} only through E_2. If neither F nor G is a descendant of D, then the conditions will, in general, only be violated if one of F or G is a cause of the other or if they share a common cause. Theorem 6, however, allowed for F and G to have common causes Q. The restrictions on Q in Theorem 6 were somewhat substantial, but the restrictions on F and G are very similar to those of Theorem 5 except that they were made conditional on Q.
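To illustrate the kind of graph Theorem 6 covers, the following Monte Carlo sketch builds a hypothetical linear-Gaussian structure (not one of the graphs in Figures 5 and 6; all parameter values are arbitrary) satisfying the theorem's d-separation restrictions, with a single common cause Q of F and G whose paths to both are positive, and checks that the sample conditional covariance Cov(F, G | D) is nonnegative:

```python
import numpy as np

# Hypothetical linear-Gaussian DAG satisfying Theorem 6's restrictions:
# D affects E1 and E2; a shared cause U makes Cov(E1, E2 | D) > 0;
# Q is an independent common cause of F and G with positive paths to
# both; F depends on (E1, D, Q) and G on (E2, D, Q), monotonically.
rng = np.random.default_rng(0)
n = 200_000
D = rng.integers(0, 2, n)                  # binary conditioning variable
U = rng.normal(size=n)                     # shared cause of E1 and E2
E1 = D + U + 0.5 * rng.normal(size=n)
E2 = D + U + 0.5 * rng.normal(size=n)
Q = rng.normal(size=n)                     # common cause of F and G
F = 2.0 * E1 + D + Q + rng.normal(size=n)  # nondecreasing in E1 and in Q
G = E2 + D + 0.5 * Q + rng.normal(size=n)  # nondecreasing in E2 and in Q

mask = D == 1                              # condition on a stratum of D
cov_FG_given_D = np.cov(F[mask], G[mask])[0, 1]
print(cov_FG_given_D)   # theory: 2*Cov(E1,E2|D) + 0.5*Var(Q) = 2.5 > 0
```

Flipping the sign of one of the paths from Q (say, using -0.5*Q in G) violates the same-sign hypothesis on Q and can make the conditional covariance negative, which is why that restriction appears in the theorem.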
Theorems 5 and 6 will thus apply to a wide range of graphs, as can also be seen from the variety of graphs in Figures 5 and 6, in which neither F nor G is a cause of the other.

As is clear from Proposition 2, rules concerning the propagation of signs were sufficient to determine the sign of the covariance between two variables. For conditional covariances, the principles guiding such a determination are more subtle. The principle behind the proofs of Theorems 5 and 6 was to partition the conditional covariance into two components,

  Cov(F, G | D) = E[Cov(F, G | D, Q, E_1, E_2) | D]
    + Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D),

with Q = ∅ in the proof of Theorem 5. The d-separation restrictions allowed for the conclusion that Cov(F, G | D, Q, E_1, E_2) = 0. Additional d-separation restrictions were needed so that the second term, Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D), could be reduced to a form in which the sign of this conditional covariance could be determined from signed edges and an appeal to Theorem 4.

Having stated Theorem 6, we can now return to the motivating example presented in the paper's Introduction.
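The partition of the conditional covariance used in these proofs is simply the law of total covariance. The following minimal numerical sketch, using an arbitrary hypothetical joint distribution over a conditioning variable Z and outcomes F and G (Z standing in for the set {Q, E_1, E_2} within a fixed stratum of D), verifies that the within-stratum and between-stratum pieces sum exactly to the overall covariance:

```python
import numpy as np

# Hypothetical joint pmf over (z, f, g); Z plays the role of the
# conditioning set {Q, E1, E2} inside a fixed stratum of D.
table = np.array([
    # z, f, g, probability
    [0, 0, 0, 0.20],
    [0, 1, 1, 0.20],
    [0, 1, 0, 0.10],
    [1, 0, 1, 0.15],
    [1, 1, 1, 0.25],
    [1, 0, 0, 0.10],
])
z, f, g, p = table.T

def cov(f, g, w):
    """Covariance of f and g under (possibly unnormalized) weights w."""
    w = w / w.sum()
    return np.sum(w * (f - np.sum(w * f)) * (g - np.sum(w * g)))

lhs = cov(f, g, p)                      # Cov(F, G)

# E[Cov(F, G | Z)] + Cov(E[F | Z], E[G | Z])
mf, mg = np.sum(p * f), np.sum(p * g)
rhs = 0.0
for zval in np.unique(z):
    m = z == zval
    pz = p[m].sum()
    wz = p[m] / pz
    mfz, mgz = np.sum(wz * f[m]), np.sum(wz * g[m])
    rhs += pz * cov(f[m], g[m], p[m])   # within-stratum component
    rhs += pz * (mfz - mf) * (mgz - mg) # between-stratum component

assert np.isclose(lhs, rhs)             # the two sides agree
```

The identity holds for any joint distribution; the d-separation restrictions of Theorems 5 and 6 are what allow each of the two components on the right-hand side to be signed.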
Fig. 7. Causal directed acyclic graph with signed edges, under the null hypothesis of no familial coaggregation.
Example 7.

In the motivating example described in Figure 1, with data available only on P_1, P_2, B_1, B_2, we wish to test the null hypothesis of no familial coaggregation (i.e., the null hypothesis that there are no directed edges emanating from F). Note that Hudson et al. [10] consider an alternative approach using a threshold model with additive multivariate normal latent factors. Here we use a sufficient causation approach. Given the substantive knowledge that for no subset of the population do the genetic causes G_P and G_B of P and B prevent disease, and that for no subset of the population do the environmental causes E_1 and E_2 of B and P prevent either disease, we have that E_1 and E_2 have positive monotonic effects on P_1 and B_1 and on P_2 and B_2, respectively, that G_P has a positive monotonic effect on P_1 and on P_2, and that G_B has a positive monotonic effect on B_1 and on B_2. The null hypothesis of no familial coaggregation can then be represented by the signed causal directed acyclic graph given in Figure 7.

If, in addition, using prior biological knowledge, it is assumed that there is no synergism between E_1 and G_P in the sufficient cause sense, then we can apply part (vii) of Theorem 4 and, under the null hypothesis of no familial coaggregation, we have that Cov(E_1, G_P | P_1 = 1) ≤ 0. By Theorem 6 with Q = ∅, we have that sign(Cov(B_1, P_2 | P_1 = 1)) = sign(Cov(E_1, G_P | P_1 = 1)). Under the null hypothesis of no familial coaggregation, we thus have sign(Cov(B_1, P_2 | P_1 = 1)) = sign(Cov(E_1, G_P | P_1 = 1)) ≤ 0. Thus, as claimed in the Introduction, a test of the null Cov(B_1, P_2 | P_1 = 1) ≤ 0 constitutes a joint test of the null hypothesis of no familial coaggregation and of the assumption of no synergism between E_1 and G_P. Note that, by the symmetry of this example, a test of the null Cov(B_2, P_1 | P_2 = 1) ≤ 0 constitutes a joint test of the same null hypothesis and of the assumption of no synergism between E_2 and G_P. The development of a theory of minimal sufficient causation on directed acyclic graphs provided the concepts necessary to derive these results.
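The sign constraint in this example can be illustrated by simulation. The generating model below is a hypothetical one chosen only to satisfy the stated assumptions (independent, binary, never-preventive causes; no synergism, so each outcome is a plain OR of its own causes; no edges corresponding to familial coaggregation); the parameter values are arbitrary, and this is not the threshold model of Hudson et al. [10]:

```python
import numpy as np

# Hypothetical generating model under the null of no familial
# coaggregation; all prevalence parameters are arbitrary.
rng = np.random.default_rng(1)
n = 200_000
G_P = rng.random(n) < 0.4  # genetic causes of disorder P (shared by siblings)
G_B = rng.random(n) < 0.3  # genetic causes of disorder B (shared by siblings)
E1 = rng.random(n) < 0.3   # environmental causes, sibling 1
E2 = rng.random(n) < 0.3   # environmental causes, sibling 2

# No sufficient-cause synergism: each outcome is a simple OR of its own
# causes, with no sufficient cause containing both an E and a G component
# and no cross ("coaggregation") edges between the P and B systems.
P1 = G_P | E1
B1 = G_B | E1
P2 = G_P | E2
B2 = G_B | E2

mask = P1                  # condition on the common effect P1 = 1
cov_B1_P2 = np.cov(B1[mask], P2[mask])[0, 1]
print(cov_B1_P2)           # negative: consistent with Cov(B1, P2 | P1 = 1) <= 0
```

Conditioning on the common effect P1 = 1 induces a negative association between E1 and G_P, which propagates to the observable pair (B1, P2); a sample covariance significantly above zero would therefore be evidence against the conjunction of the null hypothesis and the no-synergism assumption, which is exactly the test described above.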
6. Discussion.
In this paper we have incorporated notions of minimal sufficient causation into the directed acyclic graph causal framework. Doing
so has provided a clear theoretical link between two major conceptualizations of causality. Causal directed acyclic graphs with minimal sufficient causation structures have furthermore allowed for the development of rules governing the sign of conditional covariances and of rules governing the presence of conditional independencies which hold only in a particular stratum of the conditioning variable.

The present work could be extended in a number of directions. Theory could be developed concerning cases in which a sufficient causation structure involves redundant sufficient causes or sufficient causes that are not minimally sufficient. Specifically, it might be possible to develop a system of axiomatic rules which govern conditional independencies within strata of variables on a causal directed acyclic graph with a sufficient causation structure, to furthermore demonstrate the soundness and completeness of this axiomatic system, and to construct algorithms for applying the rules to identify all conditional independencies inherent in the graph's structure. Another direction of further research might involve the incorporation of the AND and OR nodes that arise from sufficient causation structures into other graphical models such as summary graphs [4], MC-graphs [12], chain graph models [2, 6, 14, 15, 16, 23, 34, 42] and ancestral graph models [24]. Finally, further work could be done extending the results of Theorem 4 to yet more general settings than those of Theorems 5 and 6.

REFERENCES

[1] Aickin, M. (2002). Causal Analysis in Biomedicine and Epidemiology Based on Minimal Sufficient Causation. Dekker, New York.
[2] Andersson, S. A., Madigan, D. and Perlman, M. D. (2001). Alternative Markov properties for chain graphs. Scand. J. Statist.
[3] Boutilier, C., Friedman, N., Goldszmidt, M. and Koller, D. (1996). Context-specific independence in Bayesian networks. In Uncertainty in Artificial Intelligence.
[4] Cox, D. R. and Wermuth, N. (1996). Multivariate Dependencies: Models, Analysis and Interpretation. Chapman and Hall, London. MR1456990
[5] Esary, J. D., Proschan, F. and Walkup, D. W. (1967). Association of random variables, with applications. Ann. Math. Statist.
[6] Frydenberg, M. (1990). The chain graph Markov property. Scand. J. Statist.
[7] Geiger, D. and Heckerman, D. (1996). Knowledge representation and inference in similarity networks and Bayesian multinets. Artificial Intelligence.
[8] Geiger, D., Verma, T. S. and Pearl, J. (1990). Identifying independence in Bayesian networks. Networks.
[9] Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica.
[10] Hudson, J. I., Javaras, K. N., Laird, N. M., VanderWeele, T. J., Pope, H. G. and Hernán, M. A. (2008). A structural approach to the familial coaggregation of disorders. Epidemiology.
[11] Koopman, J. S. (1981). Interaction between discrete causes. American J. Epidemiology.
[12] Koster, J. T. A. (2002). Marginalizing and conditioning in graphical models. Bernoulli.
[13] Lauritzen, S. L., Dawid, A. P., Larsen, B. N. and Leimer, H. G. (1990). Independence properties of directed Markov fields. Networks.
[14] Lauritzen, S. L. and Richardson, T. S. (2002). Chain graph models and their causal interpretations (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol.
[15] Lauritzen, S. L. and Wermuth, N. (1989). Graphical models for associations between variables, some of which are qualitative and some quantitative. Ann. Statist.
[16] Levitz, M., Perlman, M. D. and Madigan, D. (2001). Separation and completeness properties for AMP chain graph Markov models. Ann. Statist.
[17] Lewis, D. (1973). Causation. J. Philosophy.
[18] Lewis, D. (1973). Counterfactuals. Harvard Univ. Press, Cambridge. MR0421986
[19] Mackie, J. L. (1965). Causes and conditions. American Philosophical Quarterly.
[20] Novick, L. R. and Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review.
[21] Pearl, J. (1995). Causal diagrams for empirical research. Biometrika.
[22] Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge Univ. Press, Cambridge. MR1744773
[23] Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Statist.
[24] Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist.
[25] Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure period—application to control of the healthy worker survivor effect. Math. Modelling.
[26] Robins, J. M. (1987). Addendum to "A new approach to causal inference in mortality studies with sustained exposure period—application to control of the healthy worker survivor effect." Comput. Math. Appl.
[27] Robins, J. M. (1995). Discussion of "Causal diagrams for empirical research" by J. Pearl. Biometrika.
[28] Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct and indirect effects. In Highly Structured Stochastic Systems (P. Green, N. Hjort and S. Richardson, eds.) 70–81. Oxford Univ. Press, New York. MR2082403
[29] Rothman, K. J. (1976). Causes. American J. Epidemiology.
[30] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol.
[31] Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist.
[32] Saracci, R. (1980). Interaction and synergism. American J. Epidemiology.
[33] Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction and Search. Springer, New York. MR1227558
[34] Studený, M. and Bouckaert, R. (1998). On chain graph models for description of conditional independence structures. Ann. Statist.
[35] VanderWeele, T. J. and Robins, J. M. (2007). Directed acyclic graphs, sufficient causes and the properties of conditioning on a common effect. American J. Epidemiology.
[36] VanderWeele, T. J. and Robins, J. M. (2007). The identification of synergism in the sufficient-component cause framework. Epidemiology.
[37] VanderWeele, T. J. and Robins, J. M. (2008). Empirical and counterfactual conditions for sufficient cause interactions. Biometrika.
[38] VanderWeele, T. J. and Robins, J. M. (2009). Properties of monotonic effects on directed acyclic graphs. J. Machine Learning Research. To appear. Available at http://biostats.bepress.com/cobra/ps/art35.
[39] VanderWeele, T. J. and Robins, J. M. (2009). Signed directed acyclic graphs for causal inference. J. Roy. Statist. Soc. Ser. B. To appear.
[40] Verma, T. and Pearl, J. (1988). Causal networks: Semantics and expressiveness. In Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence.
[41] Wellman, M. P. (1990). Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence.
[42] Wermuth, N. and Cox, D. R. (2004). Joint response graphs and separation induced by triangular systems. J. R. Stat. Soc. Ser. B Stat. Methodol.
[43] Wright, S. (1921). Correlation and causation. J. Agric. Res.

Department of Health Studies
University of Chicago
5841 South Maryland Avenue
MC 2007
Chicago, Illinois 60637
USA
E-mail: [email protected]