arXiv [math.ST]
The Annals of Statistics
© Institute of Mathematical Statistics, 2009
MINIMAL SUFFICIENT CAUSATION AND DIRECTED ACYCLIC GRAPHS
By Tyler J. VanderWeele and James M. Robins
University of Chicago and Harvard University
Notions of minimal sufficient causation are incorporated within the directed acyclic graph causal framework. Doing so allows for the graphical representation of sufficient causes and minimal sufficient causes on causal directed acyclic graphs while maintaining all of the properties of causal directed acyclic graphs. This in turn provides a clear theoretical link between two major conceptualizations of causality: one counterfactual-based and the other based on a more mechanistic understanding of causation. The theory developed can be used to draw conclusions about the sign of the conditional covariances among variables.
1. Introduction.
Two broad conceptualizations of causality can be discerned in the literature, both within philosophy and within statistics and epidemiology. The first conceptualization may be characterized as giving an account of the effects of certain causes; the approach addresses the question, "Given a particular cause or intervention, what are its effects?" In the contemporary philosophical literature, this approach is most closely associated with Lewis' work [17, 18] on counterfactuals. In the contemporary statistics literature, this first approach is closely associated with the work of Rubin [30, 31] on potential outcomes, of Robins [25, 26] on the use of counterfactual variables in the context of time-varying treatment and of Pearl [21] on the graphical representation of various counterfactual relations on directed acyclic graphs. This counterfactual approach has been used extensively in statistics, both in the development of theory and in application. The second conceptualization of causality may be characterized as giving an account of the causes of particular effects; this approach attempts to address the question, "Given a particular effect, what are the various events which might have
Received December 2007.
AMS 2000 subject classifications.
Primary 62A01, 62M45; secondary 62G99, 68T30, 68R10, 05C20.
Key words and phrases.
Causal inference, conditional independence, directed acyclic graphs, graphical models, interactions, sufficient causation, synergism.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2009, Vol. 37, No. 3, 1437–1465. This reprint differs from the original in pagination and typographic detail.
been its cause?" In the contemporary philosophical literature, this second approach is most notably associated with Mackie's work [19] on insufficient but necessary components of unnecessary but sufficient conditions (INUS conditions) for an effect. In the epidemiologic literature, this approach is most closely associated with Rothman's work [29] on sufficient-component causes. This work is more closely related to the various mechanisms for a particular effect than is the counterfactual approach. Rothman's work on sufficient-component causes has, however, seen relatively little development, extension or application, though the basic framework is routinely taught in introductory epidemiology courses. Perhaps the only major attempt in the statistics literature to extend and apply Rothman's theory has been the work of Aickin [1] (comments relating Aickin's work to the present work are available from the authors upon request).

In this paper, we incorporate notions of minimal sufficient causes, corresponding to Rothman's sufficient-component causes, within the directed acyclic graph causal framework [21]. Doing so essentially unites the mechanistic and the counterfactual approaches into a single framework. As will be seen in Section 5, we can use the framework developed to draw conclusions about the sign of the conditional covariances among variables. Without the theory developed concerning minimal sufficient causes, such conclusions cannot be drawn from causal directed acyclic graphs. In a related paper [35] we have discussed how these ideas relate to epidemiologic research. The present paper develops the theory upon which this epidemiologic discussion relies.

The theory developed in this paper is motivated by several other considerations.
As will be seen below, the incorporation of minimal sufficient cause nodes allows for the identification of certain conditional independencies which hold only within a particular stratum of the conditioning variable (i.e., "asymmetric conditional independencies" [7]) which were not evident without the minimal sufficient causation structures. We note that these asymmetric conditional independencies have been represented elsewhere by Bayesian multinets [7] or by trees [3]. Another motivation for the development of the theory in this paper concerns the notion of interaction. Product terms are frequently included in regression models to assess interactions among variables; these statistical interactions, however, even if present, need not imply the existence of an actual mechanism in which two distinct causes both participate. Interactions which do concern the actual mechanisms are sometimes referred to as instances of "synergism" [29], "biologic interactions" [32] or "conjunctive causes" [20], and the development of minimal sufficient cause theory provides a useful framework to characterize mechanistic interactions. In related work [37] we have derived empirical tests for interactions in this sufficient cause sense.

As yet further motivation, we conclude this Introduction by describing how the methods we develop in this paper clarified and helped resolve an
Fig. 1. Causal directed acyclic graph under the alternative hypothesis of familial coaggregation.

analytic puzzle faced by psychiatric epidemiologists. Consider the following somewhat simplified version of a study reported in Hudson et al. [10]. Three hundred pairs of obese siblings living in an ethnically homogeneous upper-middle class suburb of Boston are recruited and cross-classified by the presence or absence of two psychiatric disorders: manic-depressive disorder P and binge eating disorder B. The question of scientific interest is whether these two disorders have a common genetic cause because, if so, studies to search for a gene or genes that cause both disorders would be useful. Consider two analyses. The first analysis estimates the covariance β between P_1i and B_2i, while the second analysis estimates the conditional covariance α between P_1i and B_2i among subjects with P_2i = 1, where B_ki is 1 if the kth sibling in the ith family has disorder B and is zero otherwise, with P_ki defined analogously. It was found that the estimates of β and α were both positive, with 95% confidence intervals that excluded zero.

Hudson et al.'s [10] substantive prior knowledge is summarized in the directed acyclic graph of Figure 1, in which the i index denoting family is suppressed. In what follows, we will make reference to some standard results concerning directed acyclic graphs; these results are reviewed in detail in the following section.

In Figure 1, G_B and G_P represent the genetic causes of B and P, respectively, that are not common causes of both B and P. The variables E_1 and E_2 represent the environmental exposures of siblings 1 and 2, respectively, that are common causes of both diseases, for example, exposure to a particularly stressful school environment. The variables G_B and G_P are assumed independent, as would typically be the case if, as is highly likely, they are not genetically linked.
Furthermore, as is common in genetic epidemiology, the environmental exposures E_1 and E_2 are assumed independent of the genetic factors. The causal arrows from P_1 to B_1 and P_2 to B_2 represent the investigators' beliefs that manic-depressive disorder may be a cause of binge eating disorder but not vice-versa. The node F represents the common genetic causes of both P and B, as well as any environmental causes of both P and B that are correlated within families. There are no data available for G_B, G_P, E_1, E_2 or F. The reason for grouping the common genetic causes with the correlated environmental causes in F is that, based on the available data {P_ki, B_ki; i = 1, ..., 300, k = 1, 2}, we can only hope to test the null hypothesis that F so defined is absent, which is referred to as the hypothesis of no familial coaggregation. If this null hypothesis is rejected, we cannot determine from the available data whether F is present due to a common genetic cause or a correlated common environmental cause. Thus E_1 and E_2 are independent on the graph because, by definition, they represent the environmental common causes of B and P that are independently distributed between siblings.

Now, under the null hypothesis that F is absent, we note that P_1 and B_2 are still correlated due to the unblocked path P_1 − G_P − P_2 − B_2, so we would expect β ≠ 0, as found. Furthermore, P_1 and B_2 are still expected to be correlated given P_2 = 1 due to the unblocked path P_1 − G_P − P_2 − E_2 − B_2, so we would expect α ≠ 0, as found. Thus, we cannot test the null hypothesis that F is absent without further substantive assumptions beyond those encoded in the causal directed acyclic graph of Figure 1.

Now Hudson et al.
[10] were also willing to assume that for no subset of the population did the genetic causes G_P and G_B of P and B prevent disease. Similarly, they assumed there was no subset of the population for whom the environmental causes E_1 and E_2 of B and P prevented either disease. We will show in Section 5 that, under these additional assumptions, the null hypothesis that F is absent implies that the conditional covariance α must be less than or equal to zero, provided that there is no interaction, in the sufficient cause sense, between E_2 and G_P. If it is plausible that no sufficient cause interaction between E_2 and G_P exists, then the null hypothesis that F is absent is rejected, because the estimate of α is positive with a 95% confidence interval that does not include zero.

Thus, the conclusion in the argument above that familial coaggregation of diseases B and P was present depended critically on the existence of (i) a formal definition of a sufficient cause interaction, (ii) a substantive understanding of what the assumption of no sufficient cause interaction entailed, and (iii) a sound mathematical theory that related assumptions about the absence of sufficient cause interactions to testable restrictions on the distribution of the observed data, specifically on the sign of a particular conditional covariance. In this paper, we provide a theory that offers (i)–(iii).

The remainder of the paper is organized as follows.
The second section reviews the directed acyclic graph causal framework and provides some basic definitions; the third section presents the theory which allows for the graphical representation of minimal sufficient causes within the directed acyclic graph causal framework; the fourth section gives an additional preliminary result concerning monotonicity; the fifth section develops results relating minimal sufficient causation and the sign of conditional covariances; the sixth section provides some discussion concerning possible extensions to the present work.
2. Basic definitions and concepts.
In this section, we review the directed acyclic graph causal framework and give a number of definitions regarding sufficient conjunctions and related concepts. Following Pearl [21], a causal directed acyclic graph is a set of nodes (X_1, ..., X_n), corresponding to variables, and directed edges among nodes, such that the graph has no cycles and such that, for each node X_i on the graph, the corresponding variable is given by its nonparametric structural equation X_i = f_i(pa_i, ε_i), where pa_i are the parents of X_i on the graph and the ε_i are mutually independent random variables. These nonparametric structural equations can be seen as a generalization of the path analysis and linear structural equation models [21, 22] developed by Wright [43] in the genetics literature and Haavelmo [9] in the econometrics literature. Robins [27, 28] discusses the close relationship between these nonparametric structural equation models and fully randomized, causally interpreted structured tree graphs [25, 26]. Spirtes, Glymour and Scheines [33] present a causal interpretation of directed acyclic graphs outside the context of nonparametric structural equations and counterfactual variables. It is easily seen from the structural equations that (X_1, ..., X_n) admits the following factorization: p(X_1, ..., X_n) = ∏_{i=1}^{n} p(X_i | pa_i). The nonparametric structural equations encode counterfactual relationships among the variables represented on the graph. The equations themselves represent one-step-ahead counterfactuals, with other counterfactuals given by recursive substitution. The requirement that the ε_i be mutually independent is essentially a requirement that there is no variable absent from the graph which, if included on the graph, would be a parent of two or more variables [21, 22].

A path is a sequence of nodes connected by edges regardless of arrowhead direction; a directed path is a path which follows the edges in the direction indicated by the graph's arrows.
A node C is said to be a common cause of A and B if there exists a directed path from C to B not through A and a directed path from C to A not through B. A collider is a particular node on a path such that both the preceding and subsequent nodes on the path have directed edges going into that node. A backdoor path from A to B is a path that begins with a directed edge going into A. A path between A and B is said to be blocked given some set of variables Z if either there is a variable in Z on the path that is not a collider, or if there is a collider on the path such that neither the collider itself nor any of its descendants are in Z. If all paths between A and B are blocked given Z, then A and B are said to be d-separated given Z. It has been shown that if all paths between A and B are blocked given Z, then A and B are conditionally independent given Z [8, 13, 40].
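For small graphs, the blocking criterion just stated can be checked mechanically. The sketch below is our own illustration, not part of the paper; the function names are hypothetical. It enumerates every undirected path between two nodes and applies the blocking rules exactly as defined above:

```python
def all_paths(edges, a, b):
    """Enumerate simple undirected paths from a to b.
    `edges` is a set of directed edges (u, v) meaning u -> v."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    paths, stack = [], [[a]]
    while stack:
        path = stack.pop()
        if path[-1] == b:
            paths.append(path)
            continue
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                stack.append(path + [n])
    return paths

def descendants(edges, x):
    out, frontier = set(), {x}
    while frontier:
        nxt = {v for (u, v) in edges if u in frontier} - out
        out |= nxt
        frontier = nxt
    return out

def d_separated(edges, a, b, z):
    """True if every path between a and b is blocked given the set z."""
    z = set(z)
    for path in all_paths(edges, a, b):
        blocked = False
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            collider = (prev, node) in edges and (nxt, node) in edges
            if not collider and node in z:
                blocked = True   # non-collider in Z blocks the path
            if collider and node not in z and not (descendants(edges, node) & z):
                blocked = True   # collider with no conditioned descendant blocks
        if not blocked:
            return False
    return True

# Toy check: in A -> C <- B, A and B are d-separated marginally
# but d-connected once the collider C is conditioned on.
edges = {("A", "C"), ("B", "C")}
print(d_separated(edges, "A", "B", set()))   # True
print(d_separated(edges, "A", "B", {"C"}))   # False
```

This brute-force enumeration is exponential in graph size and is meant only to make the definition concrete on the small examples that follow.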
Suppose that a set of nonparametric structural equations represented by a directed acyclic graph H is such that its variables X are partitioned into two sets X = V ∪ W. If, in the nonparametric structural equations for the variables in V, by replacing each occurrence of X_i ∈ W by f_i(pa_i, ε_i), the nonparametric structural equations for V can be written so as to correspond to some causal directed acyclic graph G, then G is said to be the marginalization of H over the set of variables W. A causal directed acyclic graph with variables X = V ∪ W can be marginalized over W if no variable in W is a common cause of any two variables in V.

In giving definitions for a sufficient conjunction and related concepts, we will use the following notation. An event is a binary variable taking values in {0, 1}. The complement of some event E we will denote by Ē. A conjunction or product of the events X_1, ..., X_n will be written as X_1 ⋯ X_n. The associative OR operator, ∨, is defined by A ∨ B = A + B − AB. For a random variable A with sample space Ω, we will use the notation A ≡ 0 to denote that A(ω) = 0 for all ω ∈ Ω. We will use the notation 1_{A=a} to denote the indicator function for the random variable A taking the value a; for some subset S of the sample space Ω, we will use 1_S to denote the indicator that ω ∈ S. We will use the notation A ⊥ B | C to denote that A is conditionally independent of B given C. We begin with the definitions of a sufficient conjunction and a minimal sufficient conjunction. These basic definitions make no reference to directed acyclic graphs or causation.

Definition 1.
A set of events X_1, ..., X_n is said to constitute a sufficient conjunction for an event D if X_1 ⋯ X_n = 1 ⇒ D = 1.

Definition 2.
A set of events X_1, ..., X_n which constitutes a sufficient conjunction for D is said to constitute a minimal sufficient conjunction for D if no proper subset of X_1, ..., X_n constitutes a sufficient conjunction for D.

Sufficient conjunctions for a particular event need not be causes of that event. Suppose a particular sound is produced when and only when an individual blows a whistle. This particular sound the whistle makes is a sufficient conjunction for the whistle's having been blown, but the sound does not cause the blowing of the whistle. The converse, rather, is true; the blowing of the whistle causes the sound to be produced. Corresponding to these notions of a sufficient conjunction and a minimal sufficient conjunction are those of a sufficient cause and a minimal sufficient cause, which will be defined in Section 3.

Definition 3.
A set of events M_1, ..., M_n, each of which may be some product of events, is said to be determinative for some event D if D = M_1 ∨ M_2 ∨ ⋯ ∨ M_n.

Fig. 2.
Causal directed acyclic graphs with sufficient causation structures.
Definition 4.
A determinative set M_1, ..., M_n of (minimal) sufficient conjunctions for D is nonredundant if no proper subset of M_1, ..., M_n is determinative for D.

Example 1.
Suppose A = B ∨ CE and D = EF. If we consider all the minimal sufficient conjunctions for A among the events {B, C, D}, we can see that B and CD are the only minimal sufficient conjunctions, but it is not the case that A = B ∨ CD. Clearly then, a complete list of minimal sufficient conjunctions for A generated by a particular collection of events may not be a determinative set of sufficient conjunctions for A. If we consider all minimal sufficient conjunctions for A among the events {B, C, D, E}, we see that B, CD and CE are all minimal sufficient conjunctions. In this example, B ∨ CD ∨ CE is a determinative set of minimal sufficient conjunctions for A but is not nonredundant. We see then that even when a complete list of minimal sufficient conjunctions generated by a particular collection of events constitutes a determinative set of minimal sufficient conjunctions, it may not be a nonredundant determinative set of minimal sufficient conjunctions.
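The claims of Example 1 can be verified by brute-force enumeration over the free events. The sketch below is our own illustration (the helper names `sufficient` and `minimal` are hypothetical); it follows the example's definitions exactly, with B, C, E, F free and A, D determined:

```python
from itertools import product, combinations

def OR(a, b):
    """The associative OR operator: A ∨ B = A + B − AB."""
    return a + b - a * b

# Enumerate all worlds over the free events; D and A are determined by them.
worlds = []
for B, C, E, F in product((0, 1), repeat=4):
    D = E * F                 # D = EF
    A = OR(B, C * E)          # A = B ∨ CE
    worlds.append({"A": A, "B": B, "C": C, "D": D, "E": E, "F": F})

def sufficient(conj, target="A"):
    """conj (a tuple of event names) is a sufficient conjunction for target
    if the conjunction being 1 forces target = 1 in every world."""
    return all(w[target] == 1 for w in worlds if all(w[x] == 1 for x in conj))

def minimal(conj):
    return sufficient(conj) and not any(
        sufficient(sub)
        for r in range(1, len(conj))
        for sub in combinations(conj, r))

# Among {B, C, D}: only B and CD are minimal sufficient conjunctions for A ...
msc = [c for r in (1, 2, 3) for c in combinations(("B", "C", "D"), r) if minimal(c)]
print(msc)    # [('B',), ('C', 'D')]

# ... yet B ∨ CD is not determinative for A: it can be 0 while A = 1
# (take B = 0, C = 1, E = 1, F = 0).
print(any(w["A"] == 1 and OR(w["B"], w["C"] * w["D"]) == 0 for w in worlds))  # True
```

The same enumeration confirms that, among {B, C, D, E}, the conjunctions B, CD and CE are all minimal sufficient, with CD redundant given B and CE.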
3. Minimal sufficient causation and directed acyclic graphs.
In this section, we develop theory which allows for the representation of sufficient conjunctions and minimal sufficient conjunctions on causal directed acyclic graphs. We begin with a motivating example.
Example 2.
Consider the causal directed acyclic graph given in Figure 2(i). Suppose E_1E_2 and E_3E_4 constitute a determinative set of sufficient conjunctions for D. We will show in Theorem 1 below that it follows that the diagram in Figure 2(ii) is also a causal directed acyclic graph, where E_iE_j is simply the product or conjunction of E_i and E_j; because the sufficient conjunctions E_1E_2 and E_3E_4 are determinative, it follows that D = E_1E_2 ∨ E_3E_4. An ellipse is put around the sufficient conjunctions E_1E_2 and E_3E_4 to indicate that the set is determinative. As will be seen below, in order to add sufficient conjunctions it is important that a determinative set of sufficient conjunctions is known or can be constructed. Consider the causal directed acyclic graph given in Figure 2(iii). Suppose that no determinative set of sufficient conjunctions can be constructed from E_1 and E_2 alone; suppose further, however, that there exists some other cause of D, say A, independent of E_1 and E_2, such that E_1E_2 and AE_2 form a determinative set of sufficient conjunctions. Then, Theorem 1 below can again be used to show that Figure 2(iv) is a causal directed acyclic graph. Furthermore, it will be shown in Theorem 2 that for any causal directed acyclic graph with a binary node which has only binary parents, a set of variables {A_i}_{i=0}^{n} always exists such that a determinative set of sufficient causes can be formed from the original parents on the graph and the variables {A_i}_{i=0}^{n}.

Theorem 1 provides the formal result required for the previous example.

Theorem 1.
Consider a causal directed acyclic graph G with some node D such that D and all its parents are binary. Suppose that there exists a set of binary variables A_1, ..., A_u such that a determinative set of sufficient conjunctions for D, say M_1, ..., M_S, can be formed from conjunctions of A_1, ..., A_u along with the parents of D on G and the complements of these variables. Suppose further that there exists a causal directed acyclic graph H such that the parents of D on H that are not on G consist of the nodes A_1, ..., A_u, and such that G is the marginalization of H over the set of variables which are on the graph for H but not G. Then, the directed acyclic graph J formed by adding to H the nodes M_1, ..., M_S, removing the directed edges into D from the parents of D on H, adding directed edges from each M_i into D and adding directed edges into each M_i from every parent of D on H which appears in the conjunction for M_i is itself a causal directed acyclic graph.

Proof.
To prove that the directed acyclic graph J is a causal directed acyclic graph, it is necessary to show that each of the nodes on the directed acyclic graph can be represented by a nonparametric structural equation involving only the parents on J of that node and a random term ε_i which is independent of all other random terms ε_j in the nonparametric structural equations for the other variables on the graph. The nonparametric structural equation for M_i may be defined as the product of events in the conjunction for M_i. The nonparametric structural equation for D can be given by D = M_1 ∨ ⋯ ∨ M_S. The nonparametric structural equations for all other nodes on J can be taken to be the same as those defining the causal directed acyclic graph H. Because the nonparametric structural equations for D and for each M_i on J are deterministic, they have no random-error term. Thus, for the nonparametric structural equations defining D and each M_i on J, the requirement that the nonparametric structural equation's random term ε_i is independent of all the other random terms ε_j in the nonparametric structural equations for the other variables on the graph is trivially satisfied. That this requirement is satisfied for the nonparametric structural equations for the other variables on J follows from the fact that it is satisfied on H. □

In Theorem 1, sufficient conjunctions for D are constructed from some set of variables that, on some causal directed acyclic graph H, are all parents of D and thus, within the directed acyclic graph causal framework, it makes sense to speak of sufficient causes and minimal sufficient causes.

Definition 5.
If, on a causal directed acyclic graph, some node D with nonparametric structural equation D = f_D(pa_D, ε_D) is such that D and all its parents are binary, then X_1, ..., X_n is said to constitute a sufficient cause for D if X_1, ..., X_n are all parents of D or complements of the parents of D and are such that f_D(pa_D, ε_D) = 1 for all ε_D whenever pa_D is such that X_1 ⋯ X_n = 1; if no proper subset of X_1, ..., X_n also constitutes a sufficient cause for D, then X_1, ..., X_n is said to constitute a minimal sufficient cause for D. A set of (minimal) sufficient causes, M_1, ..., M_n, each of which is a product of the parents of D and their complements, is said to be determinative for some event D if, for all ε_D, f_D(pa_D, ε_D) = 1 if and only if pa_D is such that M_1 ∨ M_2 ∨ ⋯ ∨ M_n = 1; if no proper subset of M_1, ..., M_n is also determinative for D, then M_1, ..., M_n is said to constitute a nonredundant determinative set of (minimal) sufficient causes for D.

If, for some directed acyclic graph G, there exist A_1, ..., A_u which satisfy the conditions of Theorem 1 for some node D on G, so that a determinative set of sufficient causes for D can be constructed from A_1, ..., A_u along with the parents of D on G and their complements, then D will be said to admit a sufficient causation structure. As in Example 2, we will, in general, replace the M_i nodes with the conjunctions that constitute them. The node D with directed edges from the M_i nodes is effectively an OR node. The M_i nodes with the directed edges from the A_i nodes and the parents of D on G are effectively AND nodes. We call this resulting diagram a causal directed acyclic graph with a sufficient causation structure (or a minimal sufficient causation structure if the determinative set of sufficient conjunctions for D are each minimal sufficient conjunctions).
Because a causal directed acyclic graph with a sufficient causation structure is itself a causal directed acyclic graph, the d-separation criterion applies and allows one to determine independencies and conditional independencies. A minimal sufficient causation structure will often make apparent conditional independencies within a particular stratum of the conditioning variable which were not apparent on the original causal directed acyclic graph. The following corollary is useful in this regard.

Corollary 1.
If some node D on a causal directed acyclic graph admits a sufficient causation structure, then conditioning on D = 0 also conditions on all sufficient cause nodes for D on the causal directed acyclic graph with the sufficient causation structure.

Example 2 (Continued). Consider the causal directed acyclic graph with the minimal sufficient causation structure given in Figure 2(ii). Conditioning on D = 0 also conditions on E_1E_2 = 0 and E_3E_4 = 0, and thus, by the d-separation criterion, E_i is conditionally independent of E_j given D = 0 for i ∈ {1, 2}, j ∈ {3, 4}. In the causal directed acyclic graph with the minimal sufficient causation structure in Figure 2(iv), no similar conditional independence relations within the D = 0 stratum hold. Although conditioning on D = 0 conditions also on E_1E_2 = 0 and AE_2 = 0, there still remains an unblocked path E_1 − E_1E_2 − E_2 − AE_2 − A between E_1 and A, and so E_1 and A are not conditionally independent given D = 0; similarly, there are unblocked paths between E_1 and E_2 given D = 0 and also between E_2 and A given D = 0.

The additional variables A_1, ..., A_u needed to form a set of sufficient causes for D we will refer to as the co-causes of D. The co-causes A_1, ..., A_u required to form a determinative set of sufficient conjunctions for D will generally not be unique. For example, if D = A_1 ∨ A_2E, then it is also the case that D = B_1 ∨ B_2E, where B_1 = A_1 and B_2 = Ā_1A_2. Similarly, there will, in general, be no unique set of sufficient causes that is determinative for D. For example, if E_1 and E_2 constitute a set of sufficient causes for D, so that D = E_1 ∨ E_2, then E_1E_2, E_1Ē_2 and Ē_1E_2 also constitute a set of sufficient causes for D, and so we could also write D = E_1E_2 ∨ E_1Ē_2 ∨ Ē_1E_2. It can be shown that not even nonredundant determinative sets of minimal sufficient causes are unique.

Corresponding to the definition of a sufficient cause is the more philosophical notion of a causal mechanism.
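Both non-uniqueness claims about co-causes and determinative sets are easy to confirm by truth table. A small sketch (our own illustration; the variable names follow the text, with Ā₁ written as `1 - A1`):

```python
from itertools import product

def OR(*xs):
    """Iterated associative OR: A ∨ B = A + B − AB, applied left to right."""
    out = 0
    for x in xs:
        out = out + x - out * x
    return out

# Co-causes are not unique:
# D = A1 ∨ A2·E equals B1 ∨ B2·E with B1 = A1 and B2 = Ā1·A2.
for A1, A2, E in product((0, 1), repeat=3):
    B1, B2 = A1, (1 - A1) * A2
    assert OR(A1, A2 * E) == OR(B1, B2 * E)

# Determinative sets are not unique either:
# E1 ∨ E2 equals E1E2 ∨ E1·Ē2 ∨ Ē1·E2 for every value of (E1, E2).
for E1, E2 in product((0, 1), repeat=2):
    assert OR(E1, E2) == OR(E1 * E2, E1 * (1 - E2), (1 - E1) * E2)

print("both identities hold")
```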
A causal mechanism can be conceived of as a set of events or conditions which, if all present, bring about the outcome under consideration through a particular pathway. A causal mechanism thus provides a particular description of how the outcome comes about. Suppose, for instance, that an individual were exposed to two poisons, E_1 and E_2, such that in the absence of E_2, the poison E_1 would lead to heart failure resulting in death; and that in the absence of E_1, the poison E_2 would lead to respiratory failure resulting in death; but such that when E_1 and E_2 are both present, they interact and lead to a failure of the nervous system, again resulting in death. In this case, there are three distinct causal mechanisms for death, each corresponding to a sufficient cause for D: death by heart failure corresponding to E_1Ē_2, death by respiratory failure corresponding to Ē_1E_2 and death due to a failure of the nervous system corresponding to E_1E_2. It is interesting to note that in this case none of the sufficient causes corresponding to the causal mechanisms is minimally sufficient. Each of E_1Ē_2, Ē_1E_2 and E_1E_2 is sufficient for D, but none is minimally sufficient, as either E_1 or E_2 alone is sufficient for death. We will refer to a sufficient cause for D as a causal mechanism for D if the node for the sufficient cause corresponds to a variable, potentially subject to intervention, such that, whenever the variable takes the value 1, the outcome D inevitably results.

The last example shows that the existence of a particular set of determinative sufficient causes does not guarantee that there are actual causal mechanisms corresponding to these sufficient causes; it only implies that a set of causal mechanisms corresponding to these sufficient causes cannot be ruled out by a complete knowledge of counterfactual outcomes. In particular, in the previous example, the set {E_1, E_2} is a determinative set of sufficient causes that does not correspond to the actual set of causal mechanisms {E_1Ē_2, Ē_1E_2, E_1E_2}. If there are two or more sets of sufficient causes that are determinative for some outcome D, then although the two sets of determinative sufficient causes are logically equivalent for prediction, we nevertheless view them as distinct. In such cases, some knowledge of the subject matter in question will, in general, be needed to discern which of the sets of determinative sufficient causes actually corresponds to the true causal mechanisms. For instance, in the previous example, we needed biological knowledge of how poisons brought about death in the various scenarios. We will, in the interpretation of our results, assume that there always exists some set of true causal mechanisms which forms a determinative set of sufficient causes for the outcome. The concept of synergism is closely related to that of a causal mechanism and is often found in the epidemiologic literature [11, 29, 32]. We will say that there is synergism between the effects of E_1 and E_2 on D if there exists a sufficient cause for D which represents some causal mechanism and such that this sufficient cause has E_1 and E_2 in its conjunction. In related work, we have developed tests for synergism, that is, tests for the joint presence of two or more causes in a single sufficient cause [36, 37]. In some of our examples and in our discussion of the various results in the paper, we will sometimes make reference to the concepts of a causal mechanism and synergism. However, all definitions, propositions, lemmas, theorems and corollaries will be given in terms of sufficient causes, for which we have a precise definition.

The graphical representation of sufficient causes on a causal directed acyclic graph does not require that the determinative set of sufficient causes for D be minimally sufficient, nor does it require that the set of determinative sufficient causes for D be nonredundant. To expand a directed acyclic graph into another directed acyclic graph with sufficient cause nodes, all that is required is that the set of sufficient causes constitutes a determinative set of sufficient causes for D. However, a set of events that constitutes a sufficient cause can be reduced to a set of events that constitutes a minimal sufficient cause by iteratively excluding unnecessary events from the set until a minimal sufficient cause is obtained. Also, a set of determinative sufficient causes that is redundant can be reduced to one that is nonredundant by excluding those sufficient causes or minimal sufficient causes that are redundant. It is sometimes an advantage to reduce a redundant set of sufficient causes to a nonredundant set of minimal sufficient causes. This is so because allowing sufficient causes that are not minimally sufficient, or allowing redundant sufficient causes or redundant minimal sufficient causes, can obscure the conditional independence relations implied by the structure of the causal directed acyclic graph. This is made evident in Example 3.

Example 3.
Consider the causal directed acyclic graph with the minimal sufficient causation structure given in Figure 3(i). Conditioning on D = 0 conditions also on AB = 0 and EF = 0, and by the d-separation criterion, A and E are conditionally independent given D = 0. But now consider an expanded structure for this causal directed acyclic graph which involves only minimal sufficient causes but which allows redundant minimal sufficient causes. Define Q = BE; then AQ is a minimal sufficient cause for D since AQ = 1 ⇒ AB = 1 ⇒ D = 1, but A = 1 ⇏ D = 1 and Q = 1 ⇏ D = 1. Now {AB, AQ, EF} is a determinative but redundant set of minimal sufficient causes for D. Figure 3(ii) gives an alternative causal directed acyclic graph with a minimal sufficient causation structure for the causal relationships indicated in Figure 3(i). In Figure 3(ii), conditioning on D = 0 conditions also on AB = 0, AQ = 0 and EF = 0, but the d-separation criteria no longer imply that A and E are conditionally independent given D = 0; because of conditioning on D = 0, there is an unblocked path between A and E, namely A − AQ − Q − E (recall that Q = BE). Allowing the redundant minimal sufficient cause AQ in the minimal sufficient causation structure obscures the conditional independence relation. Similar examples can be constructed to show that allowing sufficient causes that are not minimally sufficient can also obscure conditional independence relations [35].

Fig. 3.
Example illustrating that redundant sufficient causes can obscure conditional independence relations.
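The reduction described before Example 3 (iteratively excluding events from a sufficient cause until none can be removed without losing sufficiency) can be sketched mechanically. The outcome function `f`, the variable names and the sufficiency oracle `forces` below are hypothetical illustrations, not part of the paper's formalism:

```python
from itertools import product

# Sketch: reduce a sufficient cause (a set of binary literals) to a
# minimal sufficient cause by iteratively dropping unnecessary events.
def forces(cause, f, variables):
    """Does fixing the literals in `cause` guarantee D = 1?"""
    fixed = dict(cause)
    free = [v for v in variables if v not in fixed]
    return all(f({**fixed, **dict(zip(free, vals))}) == 1
               for vals in product((0, 1), repeat=len(free)))

def minimize(cause, f, variables):
    """Drop literals from a sufficient cause until it is minimal."""
    cause = list(cause)
    changed = True
    while changed:
        changed = False
        for lit in list(cause):
            smaller = [l for l in cause if l != lit]
            if forces(smaller, f, variables):   # `lit` was unnecessary
                cause = smaller
                changed = True
    return cause

# Hypothetical example: D = 1 iff AB = 1 or E = 1.  The sufficient cause
# {A = 1, B = 1, E = 1} reduces to the minimal sufficient cause {E = 1}.
f = lambda s: int(s['A'] * s['B'] == 1 or s['E'] == 1)
assert minimize([('A', 1), ('B', 1), ('E', 1)], f, ['A', 'B', 'E']) == [('E', 1)]
```

Note that the scan order matters: starting from {A = 1, B = 1, E = 1} and dropping E first would instead leave the distinct minimal sufficient cause {A = 1, B = 1}, which is one way in which several determinative sets of minimal sufficient causes can arise.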
Although allowing sufficient causes that are not minimally sufficient or allowing redundant sufficient causes or redundant minimal sufficient causes can obscure the conditional independence relations implied by the structure of the causal directed acyclic graph, it may sometimes be desirable to include nonminimal sufficient causes or redundant sufficient causes. For example, as noted above, nonminimal sufficient cause nodes or redundant sufficient cause nodes may represent separate causal mechanisms upon which it might be possible to intervene. Further discussion of conditional independence relations in sufficient causation structures with nonminimally sufficient causes and redundant sufficient causes is given in Section 6.

Note that a sufficient cause need only involve one co-cause A_i in its conjunction, because if it involved A_{i_1}, . . . , A_{i_k}, then A_{i_1}, . . . , A_{i_k} could be replaced by the product A′_i = A_{i_1} · · · A_{i_k}. In certain cases, though, it may be desirable to include more than one A_i in a sufficient cause if this corresponds to the actual causal mechanisms. If a set of variables A_0, . . . , A_u satisfying Theorem 1 can be constructed from functions of the random term U = ε_D of the nonparametric structural equation for D on G and their complements so that A_i = f_i(U), then H can be chosen to be the graph G with the additional nodes U, A_0, . . . , A_u and with directed edges from U into each A_i and from each A_i into D. This gives rise to the definition, given below, of a representation for D.

Definition 6. If D and all of its parents on the causal directed acyclic graph G are binary, and there exists some set {A_i, P_i} such that each P_i is some conjunction of the parents of D and their complements, such that there exist functions f_i for which A_i = f_i(ε_D), where ε_D is the random term in the nonparametric structural equation for D on G, and such that D = ⋁_i A_i P_i, then {A_i, P_i} is said to constitute a representation for D.
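Definition 6 can be checked mechanically in small cases. The sketch below uses a hypothetical single-parent structural function f(e, ε_D) whose random term takes three values (the same three-valued construction reappears in Example 4 below), together with proposed co-causes A_i that are functions of ε_D alone, and verifies D = ⋁_i A_i P_i by enumeration:

```python
# Hypothetical structural function: eps_D = 0 forces D = 1, eps_D = 1
# makes D equal to E, and eps_D = 2 makes D equal to 1 - E.
def f(e, eps):
    return (1, e, 1 - e)[eps]

# A proposed representation {A_i, P_i}: each A_i is a function of eps_D
# alone, each P_i a conjunction over the single parent E (encoded as a
# 0/1 function of e; the empty conjunction is identically 1).
rep = [
    (lambda eps: int(eps == 0), lambda e: 1),      # A_0, empty conjunction
    (lambda eps: int(eps == 1), lambda e: e),      # A_1, conjunction E
    (lambda eps: int(eps == 2), lambda e: 1 - e),  # A_2, conjunction E-bar
]

def is_representation(f, rep, eps_vals):
    # Definition 6 requires D = OR_i A_i(eps) P_i(e) for every (e, eps)
    return all(f(e, eps) == max(Ai(eps) * Pi(e) for Ai, Pi in rep)
               for e in (0, 1) for eps in eps_vals)

assert is_representation(f, rep, (0, 1, 2))
# Dropping the A_2 E-bar term breaks the representation:
assert not is_representation(f, rep[:2], (0, 1, 2))
```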
If the A_i variables are constructed from functions of the random term ε_D in the nonparametric structural equation for D on G, then these A_i variables may or may not allow for interpretation, and they may or may not be such that an intervention on these A_i variables is conceivable. In certain cases, the A_i variables may simply be logical constructs for which no intervention is conceivable. Although in certain cases it may not be possible to intervene on the A_i variables, we will still refer to conjunctions of the form A_i P_i as sufficient causes for D, as it is assumed that it is possible to intervene on the parents of D which constitute the conjunction P_i.

Suppose that for some node D on a causal directed acyclic graph G, a set of variables A_0, . . . , A_u satisfying Theorem 1 can be constructed from functions of the random term U = ε_D in the nonparametric structural equation for D on G, so that a representation for D is given by D = ⋁_i A_i P_i. Then, in order to simplify the diagram, instead of adding to G the variable U and directed edges from U into each A_i so as to form the minimal sufficient causation structure, we will sometimes suppress U and simply add an asterisk next to each A_i indicating that the A_i variables have a common cause.

Proposition 1.
For any representation for D, the co-causes A_i will be independent of the parents of D on the original directed acyclic graph G.

Proof.
This follows immediately from the fact that for any representation for D, the co-causes are functions of the random term in the nonparametric structural equation for D. □

If some of the sufficient causes for D are unknown, then it is not obvious how one might make use of Theorem 1. The theorem allowed for a sufficient causation structure on a causal directed acyclic graph, provided there existed some set of co-causes A_0, . . . , A_u. Theorem 2 complements Theorem 1 in that it essentially states that when D and all of its parents are binary, such a set of co-causes always exists. The variables A_0, . . . , A_u are constructed from functions of the random term ε_D in the nonparametric structural equation for D on G. Before stating and proving Theorem 2, we illustrate how the co-causes can be constructed by a simple example.

Example 4.
Suppose E is the only parent of D; then the structural equation for D is given by D = f(E, ε_D). Define A_0, A_1 and A_2 as follows: let A_0(ω) = 1 if f(1, ε_D(ω)) = f(0, ε_D(ω)) = 1 and A_0(ω) = 0 otherwise; let A_1(ω) = 1 if f(1, ε_D(ω)) = 1 and f(0, ε_D(ω)) = 0, and A_1(ω) = 0 otherwise; and let A_2(ω) = 1 if f(1, ε_D(ω)) = 0 and f(0, ε_D(ω)) = 1, and A_2(ω) = 0 otherwise. It is easily verified that D = A_0 ∨ A_1E ∨ A_2Ē and that A_0, A_1E and A_2Ē constitute a determinative set of minimal sufficient causes for D. Note that this construction will give a determinative set of minimal sufficient causes for D regardless of the form of f and the distribution of ε_D.

Theorem 2.
Consider a causal directed acyclic graph G on which there exists some node D such that D and all its parents are binary; then there exist variables A_0, . . . , A_u that satisfy the conditions of Theorem 1 and such that the sufficient causes constructed from A_0, . . . , A_u along with the parents of D on G and their complements are, in fact, minimal sufficient causes.

Proof.
The nonparametric structural equation for D is given by D = f(pa_D, ε_D). Suppose D has m parents on the original causal directed acyclic graph G. Since these parents are binary, there are 2^m values which pa_D can take. Since f maps (pa_D, ε_D) to {0, 1}, each value of ε_D assigns to every possible realization of pa_D either 0 or 1 through f. There are 2^{2^m} such assignments. Thus, without loss of generality, we may assume that ε_D takes on some finite number of distinct values N ≤ 2^{2^m}; and so, we may write the sample space for ε_D as Ω_D = {ω_1, . . . , ω_N}, and we may use ω = ω_i and ε_D = ε_D(ω_i) interchangeably. The co-causes can be constructed as follows. Let W_i be the indicator 1_{ε_D = ε_D(ω_i)}. Let P_i be some conjunction of the parents of D and their complements, that is, P_i = F_{i_1} · · · F_{i_{n_i}}, where each F_{i_k} is either a parent of D, say E_j, or its complement Ē_j. For each P_i, let A_i ≡ 1 if F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D, and let

A_i = ⋁_j {W_j : W_j F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D}

otherwise. Let M_i = P_i if A_i ≡ 1, and M_i = A_i P_i otherwise. It must be shown that each M_i = A_i F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause and that the set of M_i's constitutes a minimal sufficient cause representation for D (or, more precisely, the set of M_i's for which A_i is not identically 0 constitutes a minimal sufficient cause representation for D). We first show that each M_i = A_i F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D. Clearly, this is the case if A_i ≡
1. Now consider those A_i such that A_i is not identically 0 and not identically 1, and suppose A_i = W_{i_1} ∨ · · · ∨ W_{i_{v_i}}, where each W_{i_j} is such that W_{i_j} F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D. If A_i F_{i_1} · · · F_{i_{n_i}} is not a minimal sufficient cause, then either F_{i_1} · · · F_{i_{n_i}} = 1 ⇒ D = 1, or there exists j such that

A_i F_{i_1} · · · F_{i_{j−1}} F_{i_{j+1}} · · · F_{i_{n_i}} = 1 ⇒ D = 1.

Suppose first that F_{i_1} · · · F_{i_{n_i}} = 1 ⇒ D = 1; then there does not exist a W_j such that W_j F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D, but this contradicts the fact that A_i is not identically 1. On the other hand, if there exists j such that A_i F_{i_1} · · · F_{i_{j−1}} F_{i_{j+1}} · · · F_{i_{n_i}} = 1 ⇒ D = 1, then it is also the case that

W_{i_1} F_{i_1} · · · F_{i_{j−1}} F_{i_{j+1}} · · · F_{i_{n_i}} = 1 ⇒ D = 1,

since A_i is simply a disjunction of the W_{i_j}'s. However, it would then follow that W_{i_1} F_{i_1} · · · F_{i_{n_i}} is not a minimal sufficient cause for D. But this contradicts the definition of W_{i_1}. Thus, A_i F_{i_1} · · · F_{i_{n_i}} must be a minimal sufficient cause for D. It remains to be shown that the set of M_i's for which A_i is not identically 0 constitutes a minimal sufficient cause representation for D. We must show that if D = 1, then there exists an M_i = A_i P_i for which M_i = 1. Now D is a function of (ε_D, E_1, . . . , E_m), so let (ε*_D, E*_1, . . . , E*_m) be any particular value of (ε_D, E_1, . . . , E_m) for which D = 1. Consider the set {E_1, . . . , E_m}. If, for any j,

ε_D = ε*_D, E_1 = E*_1, . . . , E_{j−1} = E*_{j−1}, E_{j+1} = E*_{j+1}, . . . , E_m = E*_m ⇒ D = 1,

remove E_j from {E_1, . . . , E_m}. Continue to remove those E_j from this set which are not needed to maintain the implication D = 1. Suppose the set that remains is {E_{h_1}, . . . , E_{h_S}}; then either we have E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ D = 1, or we have E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇏ D = 1 and ε_D = ε*_D, E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ D = 1. If E_{h_1} = E*_{h_1}, . . .
, E_{h_S} = E*_{h_S} ⇒ D = 1, then, defining F_j as the indicator F_j = 1(E_{h_j} = E*_{h_j}), F_1 · · · F_S is a minimal sufficient cause for D, and there thus exists an i such that P_i = F_1 · · · F_S and M_i = P_i; and when E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S}, we have M_i = 1. If E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇏ D = 1 but ε_D = ε*_D, E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ D = 1, then, defining F_j as the indicator 1(E_{h_j} = E*_{h_j}), 1_{ε_D = ε*_D} F_1 · · · F_S is a minimal sufficient cause for D; and there exists an i such that M_i = A_i P_i with P_i = F_1 · · · F_S and ε_D = ε*_D ⇒ A_i = 1, so that ε_D = ε*_D, E_{h_1} = E*_{h_1}, . . . , E_{h_S} = E*_{h_S} ⇒ M_i = 1. We have thus shown that when D = 1, there exists an M_i such that M_i = 1, and so the M_i's constitute a minimal sufficient cause representation for D. □

The variables A_i constructed in Theorem 2, along with their corresponding conjunctions P_i of the parents of D and their complements, we define below as the canonical representation for D. It is easily verified that the co-causes and representation constructed in Example 4 are the canonical representation for D in that example.

Definition 7.
Consider a causal directed acyclic graph G such that some node D and all of its parents are binary. Let Ω_D be the sample space for the random term ε_D in the nonparametric structural equation for D on G. The conjunctions P_i = F_{i_1} · · · F_{i_{n_i}}, where each F_{i_k} is either a parent of D or the complement of a parent of D, along with the variables A_i constructed by A_i ≡ 1 if F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D, and

A_i = ⋁_{ω_j ∈ Ω_D} {1_{ε_D = ε_D(ω_j)} : 1_{ε_D = ε_D(ω_j)} F_{i_1} · · · F_{i_{n_i}} is a minimal sufficient cause for D}

otherwise, is said to be the canonical representation for D.

As noted above, there will in general exist more than one set of co-causes A_0, . . . , A_u which, together with the parents of D and their complements, can be used to construct a sufficient cause representation for D. The set of A_i's in the canonical representation constitutes only one particular set of variables which can be used to construct a sufficient cause representation. If D has three or more parents, examples can be constructed in which the canonical representation is redundant. Examples can also be constructed to show that when the canonical representation is redundant, it is not always uniquely reducible to a nonredundant minimal sufficient cause representation. Although the canonical representation will not always be nonredundant, it does, however, guarantee that for a binary variable with binary parents, a determinative set of minimal sufficient causes always exists. The canonical representation in a sense "favors" conjunctions with fewer terms. As can be seen in the simple illustration given in Example 4, the canonical representation will never have A_i = 1 for some conjunction P_i when there is a conjunction P_j with A_j = 1 and such that the components of P_j are a subset of those in the conjunction for P_i.
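For small graphs the canonical representation of Definition 7 can be computed by brute force. The sketch below enumerates conjunctions of parent literals, tests minimal sufficiency directly against a hypothetical structural function f, and attaches either A_i ≡ 1 or the set of ε-values whose indicators make up A_i:

```python
from itertools import combinations, product

# Brute-force sketch of the canonical representation (Definition 7).
# f(pa, eps) -> {0, 1} is a hypothetical structural function with m
# binary parents; eps ranges over the finite support eps_vals.
def canonical_representation(f, m, eps_vals):
    def sufficient(conj, eps_set):
        # the literals in conj (pairs (parent index, required value)),
        # together with eps restricted to eps_set, force D = 1
        return all(f(pa, e) == 1
                   for e in eps_set
                   for pa in product((0, 1), repeat=m)
                   if all(pa[j] == v for j, v in conj))

    def minimal(conj, eps_set):
        if not sufficient(conj, eps_set):
            return False
        for k in range(len(conj)):          # no parent literal droppable
            if sufficient(conj[:k] + conj[k + 1:], eps_set):
                return False
        # nor may the eps indicator itself be droppable
        return eps_set == eps_vals or not sufficient(conj, eps_vals)

    rep = []
    for r in range(m + 1):
        for idx in combinations(range(m), r):
            for vals in product((0, 1), repeat=r):
                conj = tuple(zip(idx, vals))
                if minimal(conj, eps_vals):
                    rep.append((1, conj))   # A_i identically 1
                else:
                    co = tuple(e for e in eps_vals if minimal(conj, (e,)))
                    if co:                  # A_i = OR of eps indicators
                        rep.append((co, conj))
    return rep

# D = E_1 or E_2 (eps irrelevant): the minimal sufficient causes are
# E_1 and E_2, each with a co-cause identically equal to 1.
assert canonical_representation(lambda pa, e: pa[0] | pa[1], 2, (0,)) == \
    [(1, ((0, 1),)), (1, ((1, 1),))]
```

Running the same sketch on the single-parent function used in Example 4 recovers the three co-causes A_0, A_1, A_2 constructed there.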
4. Monotonic effects and minimal sufficient causation.
Minimal sufficient causes for a particular event D may have present in their conjunction the parents of D or the complements of these parents. In certain cases, no minimal sufficient cause will involve the complement of a particular parent of D. Such cases closely correspond to what will be defined below as a positive monotonic effect. Essentially, a positive monotonic effect will be said to be present when a function in a nonparametric structural equation is nondecreasing in a particular argument for all values of the other arguments of the function. In this section, we develop the relationship between minimal sufficient causation and monotonic effects.

Definition 8.
The nonparametric structural equation for some node D on a causal directed acyclic graph with parent E can be expressed as D = f(p̃a_D, E, ε_D), where p̃a_D denotes the parents of D other than E; E is said to have a positive monotonic effect on D if, for all p̃a_D and ε_D, f(p̃a_D, e_1, ε_D) ≥ f(p̃a_D, e_0, ε_D) whenever e_1 ≥ e_0. Similarly, E is said to have a negative monotonic effect on D if, for all p̃a_D and ε_D, f(p̃a_D, e_1, ε_D) ≤ f(p̃a_D, e_0, ε_D) whenever e_1 ≥ e_0.

Note that this notion of a monotonic effect is somewhat stronger than Wellman's qualitative probabilistic influence [41]. See [38, 39] for further discussion.

Theorem 3. If E is a parent of D and if D and all of its parents are binary, then the following are equivalent: (i) E has a positive monotonic effect on D; (ii) there is some representation for D which is such that none of the representation's conjunctions contain Ē; (iii) the canonical representation of D, ⋁_i A_i P_i, is such that no conjunction P_i contains Ē.

Proof.
We see that (iii) implies (ii) because the representation required by (ii) is met by the canonical representation of D, as constructed in Theorem 2. To show that (ii) implies (i), we assume that we have a representation for D such that D = ⋁_i A_i P_i, where each P_i is some conjunction of the parents of D and their complements but does not contain Ē. If f(p̃a_D, 0, ε_D) = 1, then f(p̃a_D, 1, ε_D) = 1, because D = ⋁_i A_i P_i and none of the P_i involve Ē; from this, (i) follows. To show that (i) implies (iii), we prove the contrapositive. Suppose that the canonical representation of D, {A_i, P_i}, is such that there exists a P_i which contains Ē in its conjunction. Then there exists some value ε*_D of ε_D and some conjunction of the parents of D and their complements, say F_1 · · · F_n, such that W_i F_1 · · · F_n Ē constitutes a minimal sufficient cause for D, where W_i = 1(ε_D = ε*_D). Let p̃a*_D take the values given by F_1 · · · F_n. This may not suffice to fix p̃a*_D, but there must exist some value of the remaining parents of D other than E which, in conjunction with W_i F_1 · · · F_n and E = 1, gives D = 0; for if there were no such value of the other parents, then W_i F_1 · · · F_n itself would be sufficient for D, and W_i F_1 · · · F_n Ē would not be a minimal sufficient cause for D. Let p̃a*_D be such that p̃a*_D and E = 0 together with ε*_D give D = 1, but p̃a*_D and E = 1 with ε*_D give D = 0. Then f(p̃a*_D, 0, ε*_D) = 1, but f(p̃a*_D, 1, ε*_D) = 0, and thus (i) does not hold. This completes the proof. □
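When D's parents are binary and the random term has finite support, Definition 8 can be verified by exhaustive enumeration; the structural functions below are hypothetical illustrations:

```python
from itertools import product

# Check whether E has a positive monotonic effect on D (Definition 8):
# f(other_parents, e, eps) must be nondecreasing in e for every value
# of the other parents and of the random term eps.
def has_positive_monotonic_effect(f, n_other, eps_vals):
    return all(f(pa, 1, eps) >= f(pa, 0, eps)
               for pa in product((0, 1), repeat=n_other)
               for eps in eps_vals)

# Hypothetical examples with one other parent C = pa[0]:
f_or = lambda pa, e, eps: int(e or (pa[0] and eps))   # D = E or C*eps
f_xor = lambda pa, e, eps: int(e != pa[0])            # D = E xor C
assert has_positive_monotonic_effect(f_or, 1, (0, 1))
assert not has_positive_monotonic_effect(f_xor, 1, (0, 1))
```

By Theorem 3, the first function therefore admits a representation whose conjunctions never contain Ē, while the second does not.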
5. Conditional covariance and minimal sufficient causation.
When two binary parents of some event D have positive monotonic effects on D, it is in some cases possible to determine the sign of the conditional covariance of these two parents. In general, even in the setting of monotonic effects, the conditional covariance may be of either positive or negative sign; however, when additional knowledge is available concerning the minimal sufficient causation structure of D, it is often possible to determine the sign of the conditional covariance of two parents of D. Theorem 4 gives conditions under which the sign of the conditional covariance can be determined. Theorems 5 and 6 extend the conclusions of Theorem 4 to certain cases concerning the conditional covariance of two variables that may not be parents of the conditioning variable. The proof of Theorem 4 is suppressed; the proof involves extensive but routine algebraic manipulation and factoring (details are available from the authors upon request).

Theorem 4.
Suppose that E_1 and E_2 are the only parents of D on some causal directed acyclic graph, that E_1, E_2 and D are all binary and that both E_1 and E_2 have a positive monotonic effect on D. Then, for any representation for D such that D = A_0 ∨ A_1E_1 ∨ A_2E_2 ∨ A_3E_1E_2, the following hold:

(i) If A ≡ , then Cov(E_1, E_2 | D) ≤ 0.
(ii) If A ≡ , A_1 and A_2 are independent and E_1 and E_2 are independent, then Cov(E_1, E_2 | D) ≤ 0.
(iii) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) ≤ 0 provided Cov(E_1, E_2) ≤ 0.
(iv) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) = 0.
(v) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) ≥ 0 provided Cov(E_1, E_2) ≥ 0.
(vi) If A ≡ or A ≡ , then Cov(E_1, E_2 | D) ≤ 0 provided Cov(E_1, E_2) ≤ 0.
(vii) If A_3 ≡ 0, then Cov(E_1, E_2 | D) ≤ 0 provided Cov(E_1, E_2) ≤ 0.
(viii) If A_3 ≡ 0, A_1 and A_2 are independent, E_1 and E_2 are independent and also A_0 is independent of either A_1 or A_2, then Cov(E_1, E_2 | D) = 0.

Note that parts (i)–(viii) of Theorem 4 all require some knowledge of a sufficient cause representation for D, that is, knowledge that certain of the co-causes A_0, A_1, A_2, A_3 are identically 0 or identically 1, or knowledge of the independence of certain co-causes. As can be seen from Theorem 4, if no knowledge of the sufficient causes is available, the conditional covariances Cov(E_1, E_2 | D = 0) and Cov(E_1, E_2 | D = 1) may be of either sign, even if E_1 and E_2 have positive monotonic effects on D. For example, if E_1 and E_2 have positive monotonic effects on D and (v) holds, then Cov(E_1, E_2 | D) ≥
0; but if E_1 and E_2 have positive monotonic effects on D and (i) holds, then Cov(E_1, E_2 | D) ≤ 0. If E_1 and E_2 are the only parents of D, possibly correlated due to some common cause C, and have positive monotonic effects on D, then the minimal sufficient causation structure for the causal directed acyclic graph is that given in Figure 4.
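The role of the synergistic co-cause can be seen in a small exact computation. The sketch below is one hypothetical instance, not Theorem 4 itself: all co-causes and parents are independent Bernoulli variables, A_0 ≡ 0, and D = A_1E_1 ∨ A_2E_2 ∨ A_3E_1E_2. Conditioning on D = 0, the covariance of E_1 and E_2 is exactly zero when A_3 ≡ 0 and strictly negative when A_3 is nondegenerate:

```python
from itertools import product

# Exact computation for one hypothetical instance: independent Bernoulli
# co-causes and parents, A_0 identically 0, and
# D = A_1 E_1 v A_2 E_2 v A_3 E_1 E_2.
def cond_cov_given_d0(p_e1, p_e2, p_a1, p_a2, p_a3):
    pe = lambda p, v: p if v else 1 - p
    tab, z = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}, 0.0
    for e1, e2, a1, a2, a3 in product((0, 1), repeat=5):
        if a1 * e1 or a2 * e2 or a3 * e1 * e2:
            continue                        # these worlds have D = 1
        w = (pe(p_e1, e1) * pe(p_e2, e2) *
             pe(p_a1, a1) * pe(p_a2, a2) * pe(p_a3, a3))
        tab[(e1, e2)] += w
        z += w
    m1 = (tab[(1, 0)] + tab[(1, 1)]) / z    # E[E_1 | D = 0]
    m2 = (tab[(0, 1)] + tab[(1, 1)]) / z    # E[E_2 | D = 0]
    return tab[(1, 1)] / z - m1 * m2        # Cov(E_1, E_2 | D = 0)

no_synergism = cond_cov_given_d0(0.4, 0.6, 0.3, 0.5, 0.0)  # A_3 = 0
synergism = cond_cov_given_d0(0.4, 0.6, 0.3, 0.5, 0.7)
assert abs(no_synergism) < 1e-12   # exactly zero in this instance
assert synergism < 0               # synergistic term induces a negative sign
```

With A_3 ≡ 0 the probability of D = 0 factorizes into a term depending on E_1 alone and a term depending on E_2 alone, so the conditional covariance vanishes; the A_3 term breaks this factorization in the (1, 1) cell only, pushing the covariance negative.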
Fig. 4.
Minimal sufficient causation structure when E_1 and E_2 have positive monotonic effects on D.

Recall that the asterisk is used to indicate that A_0, A_1, A_2 or A_3 may have a common cause U. If one of A_0, A_1, A_2 or A_3 is identically 0 or 1, then Theorem 4 may be used to draw conclusions about the sign of the conditional covariance Cov(E_1, E_2 | D). For example, if one believes that there is no synergism between E_1 and E_2 in the actual causal mechanisms for D, then A_3 ≡
0; if this holds, then parts (vii) and (viii) of Theorem 4 can be used to determine the sign of the conditional covariance. Theorem 4 has an obvious analogue if one or both of E_1 or E_2 have a negative monotonic effect on D. If D has more than two parents, but the two parents E_1 and E_2 are independent of all other parents of D, then the causal directed acyclic graph can be marginalized over these other parents, and Theorem 4 can be applied to the resulting causal directed acyclic subgraph.

Some of the conclusions of Theorem 4 require knowing the sign of Cov(E_1, E_2), and Proposition 2 below (proved elsewhere [39]) relates the sign of Cov(E_1, E_2) to the presence of monotonic effects. In order to state this proposition and to allow for the development of extensions to Theorem 4, we need a few additional definitions.

Definition 9.
An edge on a causal directed acyclic graph from X to Y is said to be of positive (negative) sign if X has a positive (negative) monotonic effect on Y. If X has neither a positive monotonic effect nor a negative monotonic effect on Y, then the edge from X to Y is said to be without a sign.

Definition 10.
The sign of a path on a causal directed acyclic graph is the product of the signs of the edges that constitute that path. If one of the edges on a path is without a sign, then the sign of the path is said to be undefined.

Fig. 5. Examples requiring extensions to Theorem 4.
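Definitions 9 and 10 amount to a signed product with an absorbing "undefined" element; a minimal sketch, encoding edge signs as +1, -1, or None for an unsigned edge:

```python
# Sign of a path (Definition 10): the product of its edge signs,
# undefined (None) if any edge on the path is without a sign.
def path_sign(edge_signs):
    sign = 1
    for s in edge_signs:
        if s is None:
            return None          # the sign of the path is undefined
        sign *= s
    return sign

assert path_sign([+1, +1, -1]) == -1
assert path_sign([-1, -1]) == +1
assert path_sign([+1, None, -1]) is None
```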
Definition 11.
Two variables X and Y are said to be positively monotonically associated if all directed paths between X and Y are of positive sign, and all common causes C_i of X and Y are such that all directed paths from C_i to X not through Y are of the same sign as all directed paths from C_i to Y not through X; the variables X and Y are said to be negatively monotonically associated if all directed paths between X and Y are of negative sign, and all common causes C_i of X and Y are such that all directed paths from C_i to X not through Y are of the opposite sign to all directed paths from C_i to Y not through X.

Proposition 2. If X and Y are positively monotonically associated, then Cov(
X, Y) ≥ 0. If X and Y are negatively monotonically associated, then Cov(
X, Y) ≤ 0.

Rules for the propagation of signs have been developed elsewhere [38, 39, 41] and, as seen from Proposition 2, are useful for determining the sign of covariances; however, as will be seen below, rules for deriving the sign of conditional covariances are more subtle. Theorem 4 concerns the conditional covariance of two parents of the node D. However, often what will be desired is the sign of the conditional covariance of two variables which are not parents of the conditioning node. For example, in the coaggregation problem discussed in the Introduction, we wanted to draw conclusions about Cov(P, B | P = 1), but neither P nor B are parents of P in Figure 1. In the remainder of the paper we will thus extend Theorem 4 so as to allow for application to two variables, say F and G, which are not parents of the conditioning node D. The variables F and G might be ancestors of, descendants of, or share common causes with the parents, E_1 and E_2, of D. Consider, for example, the causal directed acyclic graphs in Figure 5. If we were interested in the sign of Cov(F, G | D) in Figures 5(i)–(iii), then clearly Theorem 4 is insufficient. Theorems 5 and 6 below will allow us to extend the conclusions of Theorem 4 to examples such as those in Figure 5 and to certain other cases involving two variables that may not be parents of the conditioning variable. Lemmas 1–5 below will be needed in the proofs and application of Theorems 5 and 6. Lemmas 1 and 2 are consequences of Theorems 1 and 2 in the work of Esary, Proschan and Walkup [5]. Lemmas 3–5 are proved elsewhere in related work concerning the properties of monotonic effects [38].
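The proofs of Theorems 5 and 6 below repeatedly apply the decomposition Cov(F, G | D) = E[Cov(F, G | Z, D) | D] + Cov(E[F | Z, D], E[G | Z, D] | D), that is, the law of total covariance. Since the identity holds for any joint distribution, it can be spot-checked numerically; the unconditional form is verified below on an arbitrary randomly generated distribution of three binary variables:

```python
from itertools import product
import random

# Numeric check of the law of total covariance,
#   Cov(F, G) = E[Cov(F, G | Z)] + Cov(E[F | Z], E[G | Z]),
# on an arbitrary randomly generated joint distribution of (F, G, Z).
random.seed(1)
states = list(product((0, 1), repeat=3))            # (f, g, z)
raw = [random.random() for _ in states]
pr = {s: w / sum(raw) for s, w in zip(states, raw)}

def mean(h):
    return sum(p * h(f, g, z) for (f, g, z), p in pr.items())

total = (mean(lambda f, g, z: f * g)
         - mean(lambda f, g, z: f) * mean(lambda f, g, z: g))

within = 0.0                                        # E[Cov(F, G | Z)]
pz, ef_z, eg_z = {}, {}, {}
for zv in (0, 1):
    pz[zv] = sum(p for (f, g, z), p in pr.items() if z == zv)
    ef_z[zv] = sum(p * f for (f, g, z), p in pr.items() if z == zv) / pz[zv]
    eg_z[zv] = sum(p * g for (f, g, z), p in pr.items() if z == zv) / pz[zv]
    efg = sum(p * f * g for (f, g, z), p in pr.items() if z == zv) / pz[zv]
    within += pz[zv] * (efg - ef_z[zv] * eg_z[zv])

# Cov(E[F | Z], E[G | Z]) over the marginal of Z
between = (sum(pz[z] * ef_z[z] * eg_z[z] for z in (0, 1))
           - sum(pz[z] * ef_z[z] for z in (0, 1))
           * sum(pz[z] * eg_z[z] for z in (0, 1)))

assert abs(total - (within + between)) < 1e-12
```

In the proofs below, the within term vanishes by a d-separation assumption, so the sign analysis reduces to the between term.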
Lemma 1.
Let f and g be functions with n real-valued arguments, such that both f and g are nondecreasing in each of their arguments. If X = (X_1, . . . , X_n) is a multivariate random variable with n components, such that each component is independent of the other components, then Cov(f(X), g(X)) ≥ 0.

Lemma 2. If F and G are binary and u_1 and u_2 are nondecreasing functions, then sign(Cov(u_1(F), u_2(G))) = sign(Cov(F, G)).

Lemma 3.
Let X denote some set of nondescendants of A that blocks all backdoor paths from A to Y. If all directed paths between A and Y are positive, then P(Y > y | a, x) and E[Y | a, x] are nondecreasing in a.

Lemma 4.
Suppose that E is binary. Let Q be some set of variables which are not descendants of F nor of E, and let C be the common causes of E and F not in Q. If all directed paths from E to F (or from F to E) are of positive sign, and all directed paths from C to E not through {Q, F} are of the same sign as all directed paths from C to F not through {Q, E}, then E[F | E, Q] is nondecreasing in E.

Lemma 5.
Suppose that E is not a descendant of F. Let Q be some set of nondescendants of E that block all backdoor paths from E to F, and let D be a node on a directed path from E to F such that all backdoor paths from D to F are blocked by {E, Q}. If all directed paths from E to F, except possibly those through D, are of positive sign, then E[F | D, Q, E] is nondecreasing in E.

Obvious analogues concerning negative signs hold for all of the lemmas above. Theorem 5 below will allow us to determine the sign of the conditional covariance of F and G on graphs like those in Figure 5, provided there are appropriate signs on the edges. The conclusion of Theorem 5 concerns the equality of the sign of two conditional covariances, Cov(F, G | D) and Cov(E_1, E_2 | D). The theorem itself does not require knowledge of a sufficient causation representation and thus applies to general causal directed acyclic graphs. However, to draw conclusions about the sign of Cov(E_1, E_2 | D), one must still appeal to Theorem 4, which does require some knowledge of a sufficient causation representation.

Theorem 5.
Suppose that E_1, E_2 and D are binary variables, that E_1 and E_2 are parents of D, that F and G are d-separated given {E_1, E_2, D}, that F and {E_2, D} are d-separated given E_1, and that G and {E_1, D} are d-separated given E_2. If Cov(
F, E_1) ≥ 0 and Cov(
G, E_2) ≥ 0, then sign(Cov(F, G | D)) = sign(Cov(E_1, E_2 | D)).

Proof.
Conditioning on E_1 and E_2, we have

Cov(F, G | D) = E[Cov(F, G | D, E_1, E_2) | D] + Cov(E[F | D, E_1, E_2], E[G | D, E_1, E_2] | D).

The first term is 0, since F and G are d-separated given {E_1, E_2, D}. Furthermore, since F and {E_2, D} are d-separated given E_1, and G and {E_1, D} are d-separated given E_2, the second term can be reduced to Cov(E[F | E_1], E[G | E_2] | D). Thus,

Cov(F, G | D) = Cov(E[F | E_1], E[G | E_2] | D).

If Cov(
F, E_1) ≥ 0 and Cov(G, E_2) ≥ 0, then, since E_1 and E_2 are binary, we have that E[F | E_1] is nondecreasing in E_1 and E[G | E_2] is nondecreasing in E_2, and so, by Lemma 2, sign(Cov(E[F | E_1], E[G | E_2] | D)) = sign(Cov(E_1, E_2 | D)). We thus have

sign(Cov(F, G | D)) = sign(Cov(E_1, E_2 | D)),

and this completes the proof. □

Note that Theorem 5 requires that Cov(
F, E_1) ≥ 0 and Cov(G, E_2) ≥
0; Proposition 2 can be used to check whether these covariances are nonnegative; that is, the covariances will be nonnegative if F and E_1 are positively monotonically associated and if G and E_2 are positively monotonically associated.

Example 5.
Note that the graphs in Figures 5(i) and (ii) satisfy the d-separation restrictions of Theorem 5. In Figure 5(i), G is an ancestor of E_2, whereas F is related to E_1 as a descendant and by a common cause. In Figure 5(ii), F is a descendant of E_1, and G is related to E_2 both as an ancestor and by a common cause. The d-separation restrictions of Theorem 5 would still hold in Figures 5(i) and (ii) if F and E_1 or G and E_2 had multiple common causes, or if there were several intermediate variables between E_1 and F and between G and E_2.
Note, however, that Theorem 5 requires that F be d-separated from {E_2, D} given E_1 and that G be d-separated from {E_1, D} given E_2. Thus, if F or G were a descendant of D, these assumptions would be violated. Consequently, Theorem 5 could not be applied to the diagram in Figure 5(iii). Nor could Theorem 5 be applied to the paper's introductory motivation to draw conclusions about the sign of Cov(P, B | P = 1) for the graph in Figure 1, since B is a descendant of the conditioning variable P.

Theorem 6 below gives a result that allows for F and G to be descendants of D. Before stating this result we note, however, that Theorem 5 is restricted in yet another way. Theorem 5 required that F and G be d-separated given {E_1, E_2, D}. If F and G have common causes, then the d-separation restrictions required by Theorem 5 will again, in general, not hold. Theorem 5 would thus not apply to the graphs given in Figure 6.

Fig. 6. Examples in which F and G have a common cause.

Theorem 6 gives a result similar to Theorem 5 which allows for F or G to be descendants of D and allows also for F and G to have common causes. As with Theorem 5, the conclusion of Theorem 6 concerns the equality of the sign of two conditional covariances, and the theorem itself does not require knowledge of a sufficient causation representation. But once again, to draw conclusions about the sign of Cov(F, G | D) using Theorem 6, one must know the sign of Cov(E_1, E_2 | D), and thus appeal must again be made to Theorem 4, which does require some knowledge of a sufficient causation representation.

Theorem 6. Suppose that E_1, E_2 and D are binary variables, that E_1 and E_2 are parents of D, that F and G are d-separated given {E_1, E_2, D, Q}, where Q is some set of common causes of F and G (each component of which is univariate and independent of the other components in Q), that F and E_2 are d-separated given {E_1, D, Q}, that G and E_1 are d-separated given {E_2, Q, D}, that Q and {E_1, E_2} are d-separated given D, and that Q and D are d-separated. Suppose also that E[F | E_1, D, Q] is nondecreasing in E_1 and that E[G | E_2, D, Q] is nondecreasing in E_2. If Cov(E_1, E_2 | D) ≥ 0, and for each element Q_i of Q, every directed path from Q_i to F is of the same sign as every directed path from Q_i to G, then Cov(
F, G | D) ≥ 0. If Cov(E_1, E_2 | D) ≤ 0, and for each element Q_i of Q, every directed path from Q_i to F is of the opposite sign to every directed path from Q_i to G, then Cov(
F, G | D) ≤ 0.

Proof.
We will prove the first of the results above; the proof of the second is similar. Conditioning on {E_1, E_2, Q}, we have

Cov(F, G | D) = E[Cov(F, G | D, Q, E_1, E_2) | D] + Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D).

The first term is 0, since F and G are d-separated given {E_1, E_2, Q, D}. We can furthermore rewrite the second term as follows:

Cov(F, G | D) = Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D)
= E[Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | Q, D) | D]
+ Cov(E[E[F | D, Q, E_1, E_2] | Q, D], E[E[G | D, Q, E_1, E_2] | Q, D] | D).

We will show that each of these two terms is nonnegative. Since F and E_2 are d-separated given {E_1, D, Q}, E[F | D, Q, E_1, E_2] = E[F | E_1, D, Q]; and since G and E_1 are d-separated given {E_2, D, Q}, E[G | D, Q, E_1, E_2] = E[G | E_2, D, Q]. By assumption, we have that E[F | E_1, D, Q] is nondecreasing in E_1 and that E[G | E_2, D, Q] is nondecreasing in E_2. For fixed q,

Cov(E[F | D, Q = q, E_1, E_2], E[G | D, Q = q, E_1, E_2] | Q = q, D)
= Cov(E[F | E_1, D, Q = q], E[G | E_2, D, Q = q] | Q = q, D)
= Cov(E[F | E_1, D, Q = q], E[G | E_2, D, Q = q] | D),

since Q and {E_1, E_2} are d-separated given D. And since E[F | E_1, D, Q = q] is nondecreasing in E_1 and E[G | E_2, D, Q = q] is nondecreasing in E_2, by Lemma 2, sign(Cov(E[F | E_1, D, Q = q], E[G | E_2, D, Q = q] | D)) = sign(Cov(E_1, E_2 | D)), where by hypothesis Cov(E_1, E_2 | D) ≥
0. Thus, we have that Cov( E [ F | D, Q = q, E , E ] , E [ G | D, Q = q, E , E ] | Q = q, D ) ≥ q and taking expectations over Q we have E [Cov( E [ F | D, Q,E , E ] , E [ G | D, Q, E , E ] | Q, D ) | D ] ≥
0. We have shown that the first of thetwo expressions above is nonnegative. We now show that the second expres-sion Cov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ] | D ) T. J. VANDERWEELE AND J. M. ROBINS is also nonnegative. As before, E [ F | D, Q, E , E ] = E [ F | E , D, Q ] and E [ G | D,Q, E , E ] = E [ G | E , D, Q ]. By hypothesis, for each element of Q i of Q ev-ery directed path from Q i to F is the same sign as every directed pathfrom Q i to G ; without loss of generality, we may assume that the sign ofall of these directed paths are positive. By Lemma 3 with X = { E , D } and X = { E , D } , respectively, E [ F | E , D, Q = q ] and E [ G | E , D, Q = q ] are bothnondecreasing in each dimension of q . Note that we may apply Lemma 3because if there were any backdoor paths from Q to F or to G , then Q would have some parent which would also be a common cause of F and G and thus also a member of the set Q , but this would violate the assumptionthat the members of Q were independent of one another. Furthermore, E [ E [ F | D, Q = q, E , E ] | Q = q, D ] = E [ E [ F | E , D, Q = q ] | Q = q, D ]= E [ E [ F | E , D, Q = q ] | D ]and similarly, E [ E [ G | D, Q = q, E , E ] | Q = q, D ] = E [ E [ G | E , Q = q ] | D ] = E [ E [ G | E , Q = q ] | Q = q, D ] since Q and { E , E } are d -separated given D .Thus, E [ E [ F | D, Q = q, E , E ] | Q = q, D ] = E [ E [ F | E , D, Q = q ] | D ]and E [ E [ G | D, Q = q, E , E ] | Q = q, D ] = E [ E [ G | E , D, Q = q ] | D ]are both nondecreasing in each dimension of q from which it follows byLemma 1 that Cov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ]) ≥
0. Since Q and D are d -separated we also haveCov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ] | D )= Cov( E [ E [ F | D, Q, E , E ] | Q, D ] , E [ E [ G | D, Q, E , E ] | Q, D ]) ≥ (cid:3) Note the application of Theorem 6 requires that E [ F | E , D, Q ] is nonde-creasing in E and that E [ G | E , D, Q ] is nondecreasing in E . Either of thefollowing will suffice for E [ F | E , D, Q ] to be nondecreasing in E (similarremarks hold for E [ G | E , D, Q ]): (i) F and D are d -separated given { Q, E } and F and E are positively monotonically associated or (ii) if F is a descen-dant of E and D , F and E do not have common causes and all directedpaths from E to F not through D are of positive sign. Condition (i) sufficesby Lemma 4; condition (ii) suffices by Lemma 5. Example 6.
Although the graphs in Figure 5(iii) and in Figure 6 do not satisfy the d-separation restrictions of Theorem 5, it can be verified that these graphs do satisfy the d-separation restrictions of Theorem 6.

At first glance, the d-separation restrictions of Theorems 5 and 6 appear to severely limit the settings in which conclusions about conditional covariances can be drawn. The d-separation requirements are, in fact, somewhat less restrictive than they may first seem. We argue that the d-separation restrictions of either Theorem 5 or Theorem 6 will apply to most graphs in which neither F nor G is a cause of the other (though the restrictions on the set of common causes Q, if any, of F and G in Theorem 6 are more substantial). Theorem 5 requires (i) that F and G are d-separated given {E_1, E_2, D} and (ii) that F and {E_2, D} are d-separated given E_1 and that G and {E_1, D} are d-separated given E_2. In Theorems 5 and 6 (and Figures 5 and 6), F was either an ancestor or descendant of, or shared a common cause with, E_1; and G was either an ancestor or descendant of, or shared a common cause with, E_2. The d-separation restrictions essentially just require that F and G are sufficiently structurally separated so that (i) F and G are associated only because of {E_1, E_2, D} and (ii) F is associated with {E_2, D} only through E_1, and G is associated with {E_1, D} only through E_2. If neither F nor G is a descendant of D, then the conditions will, in general, only be violated if one of F or G is a cause of the other or if they share a common cause. Theorem 6, however, allowed for F and G to have common causes Q. The restrictions on Q in Theorem 6 were somewhat substantial, but the restrictions on F and G are very similar to those of Theorem 5 except that they were made conditional on Q.
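To illustrate the kind of graph Theorem 6 covers, the following Monte Carlo sketch builds a hypothetical linear-Gaussian structure (not one of the graphs in Figures 5 and 6; all parameter values are arbitrary) satisfying the theorem's d-separation restrictions, with a single common cause Q of F and G whose paths to both are positive, and checks that the sample conditional covariance Cov(F, G | D) is nonnegative:

```python
import numpy as np

# Hypothetical linear-Gaussian DAG satisfying Theorem 6's restrictions:
# D affects E1 and E2; a shared cause U makes Cov(E1, E2 | D) > 0;
# Q is an independent common cause of F and G with positive paths to
# both; F depends on (E1, D, Q) and G on (E2, D, Q), monotonically.
rng = np.random.default_rng(0)
n = 200_000
D = rng.integers(0, 2, n)                  # binary conditioning variable
U = rng.normal(size=n)                     # shared cause of E1 and E2
E1 = D + U + 0.5 * rng.normal(size=n)
E2 = D + U + 0.5 * rng.normal(size=n)
Q = rng.normal(size=n)                     # common cause of F and G
F = 2.0 * E1 + D + Q + rng.normal(size=n)  # nondecreasing in E1 and in Q
G = E2 + D + 0.5 * Q + rng.normal(size=n)  # nondecreasing in E2 and in Q

mask = D == 1                              # condition on a stratum of D
cov_FG_given_D = np.cov(F[mask], G[mask])[0, 1]
print(cov_FG_given_D)   # theory: 2*Cov(E1,E2|D) + 0.5*Var(Q) = 2.5 > 0
```

Flipping the sign of one of the paths from Q (say, using -0.5*Q in G) violates the same-sign hypothesis on Q and can make the conditional covariance negative, which is why that restriction appears in the theorem.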
Theorems 5 and 6 will thus apply to a wide range of graphs, as can also be seen from the variety of graphs in Figures 5 and 6, in which neither F nor G is a cause of the other.

As is clear from Proposition 2, rules concerning the propagation of signs were sufficient to determine the sign of the covariance between two variables. For conditional covariances, the principles guiding such a determination are more subtle. The principle behind the proofs of Theorems 5 and 6 was to partition the conditional covariance into two components,

  Cov(F, G | D) = E[Cov(F, G | D, Q, E_1, E_2) | D]
    + Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D),

with Q = ∅ in the proof of Theorem 5. The d-separation restrictions allowed for the conclusion that Cov(F, G | D, Q, E_1, E_2) = 0. Additional d-separation restrictions were needed so that the second term, Cov(E[F | D, Q, E_1, E_2], E[G | D, Q, E_1, E_2] | D), could be reduced to a form in which the sign of this conditional covariance could be determined from signed edges and an appeal to Theorem 4.

Having stated Theorem 6, we can now return to the motivating example presented in the paper's Introduction.
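The partition of the conditional covariance used in these proofs is simply the law of total covariance. The following minimal numerical sketch, using an arbitrary hypothetical joint distribution over a conditioning variable Z and outcomes F and G (Z standing in for the set {Q, E_1, E_2} within a fixed stratum of D), verifies that the within-stratum and between-stratum pieces sum exactly to the overall covariance:

```python
import numpy as np

# Hypothetical joint pmf over (z, f, g); Z plays the role of the
# conditioning set {Q, E1, E2} inside a fixed stratum of D.
table = np.array([
    # z, f, g, probability
    [0, 0, 0, 0.20],
    [0, 1, 1, 0.20],
    [0, 1, 0, 0.10],
    [1, 0, 1, 0.15],
    [1, 1, 1, 0.25],
    [1, 0, 0, 0.10],
])
z, f, g, p = table.T

def cov(f, g, w):
    """Covariance of f and g under (possibly unnormalized) weights w."""
    w = w / w.sum()
    return np.sum(w * (f - np.sum(w * f)) * (g - np.sum(w * g)))

lhs = cov(f, g, p)                      # Cov(F, G)

# E[Cov(F, G | Z)] + Cov(E[F | Z], E[G | Z])
mf, mg = np.sum(p * f), np.sum(p * g)
rhs = 0.0
for zval in np.unique(z):
    m = z == zval
    pz = p[m].sum()
    wz = p[m] / pz
    mfz, mgz = np.sum(wz * f[m]), np.sum(wz * g[m])
    rhs += pz * cov(f[m], g[m], p[m])   # within-stratum component
    rhs += pz * (mfz - mf) * (mgz - mg) # between-stratum component

assert np.isclose(lhs, rhs)             # the two sides agree
```

The identity holds for any joint distribution; the d-separation restrictions of Theorems 5 and 6 are what allow each of the two components on the right-hand side to be signed.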
Fig. 7. Causal directed acyclic graph with signed edges, under the null hypothesis of no familial coaggregation.
Example 7.

In the motivating example described in Figure 1, with data available only on P_1, P_2, B_1, B_2, we wish to test the null hypothesis of no familial coaggregation (i.e., the null hypothesis that there are no directed edges emanating from F). Note that Hudson et al. [10] consider an alternative approach using a threshold model with additive multivariate normal latent factors. Here we use a sufficient causation approach. Given the substantive knowledge that for no subset of the population do the genetic causes G_P and G_B of P and B prevent disease, and that for no subset of the population do the environmental causes E_1 and E_2 of B and P prevent either disease, we have that E_1 and E_2 have positive monotonic effects on P_1 and B_1 and on P_2 and B_2, respectively, that G_P has a positive monotonic effect on P_1 and on P_2, and that G_B has a positive monotonic effect on B_1 and on B_2. The null hypothesis of no familial coaggregation can then be represented by the signed causal directed acyclic graph given in Figure 7.

If, in addition, using prior biological knowledge, it is assumed that there is no synergism between E_1 and G_P in the sufficient cause sense, then we can apply part (vii) of Theorem 4 and, under the null hypothesis of no familial coaggregation, we have that Cov(E_1, G_P | P_1 = 1) ≤ 0. By Theorem 6 with Q = ∅, we have that sign(Cov(B_1, P_2 | P_1 = 1)) = sign(Cov(E_1, G_P | P_1 = 1)). Under the null hypothesis of no familial coaggregation, we thus have sign(Cov(B_1, P_2 | P_1 = 1)) = sign(Cov(E_1, G_P | P_1 = 1)) ≤ 0. Thus, as claimed in the Introduction, a test of the null Cov(B_1, P_2 | P_1 = 1) ≤ 0 constitutes a joint test of the null hypothesis of no familial coaggregation and of the assumption of no synergism between E_1 and G_P. Note that, by the symmetry of this example, a test of the null Cov(B_2, P_1 | P_2 = 1) ≤ 0 constitutes a joint test of the same null hypothesis and of the assumption of no synergism between E_2 and G_P. The development of a theory of minimal sufficient causation on directed acyclic graphs provided the concepts necessary to derive these results.
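The sign constraint in this example can be illustrated by simulation. The generating model below is a hypothetical one chosen only to satisfy the stated assumptions (independent, binary, never-preventive causes; no synergism, so each outcome is a plain OR of its own causes; no edges corresponding to familial coaggregation); the parameter values are arbitrary, and this is not the threshold model of Hudson et al. [10]:

```python
import numpy as np

# Hypothetical generating model under the null of no familial
# coaggregation; all prevalence parameters are arbitrary.
rng = np.random.default_rng(1)
n = 200_000
G_P = rng.random(n) < 0.4  # genetic causes of disorder P (shared by siblings)
G_B = rng.random(n) < 0.3  # genetic causes of disorder B (shared by siblings)
E1 = rng.random(n) < 0.3   # environmental causes, sibling 1
E2 = rng.random(n) < 0.3   # environmental causes, sibling 2

# No sufficient-cause synergism: each outcome is a simple OR of its own
# causes, with no sufficient cause containing both an E and a G component
# and no cross ("coaggregation") edges between the P and B systems.
P1 = G_P | E1
B1 = G_B | E1
P2 = G_P | E2
B2 = G_B | E2

mask = P1                  # condition on the common effect P1 = 1
cov_B1_P2 = np.cov(B1[mask], P2[mask])[0, 1]
print(cov_B1_P2)           # negative: consistent with Cov(B1, P2 | P1 = 1) <= 0
```

Conditioning on the common effect P1 = 1 induces a negative association between E1 and G_P, which propagates to the observable pair (B1, P2); a sample covariance significantly above zero would therefore be evidence against the conjunction of the null hypothesis and the no-synergism assumption, which is exactly the test described above.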
6. Discussion.
In this paper we have incorporated notions of minimal sufficient causation into the directed acyclic graph causal framework. Doing
so has provided a clear theoretical link between two major conceptualizations of causality. Causal directed acyclic graphs with minimal sufficient causation structures have furthermore allowed for the development of rules governing the sign of conditional covariances and of rules governing the presence of conditional independencies which hold only in a particular stratum of the conditioning variable.

The present work could be extended in a number of directions. Theory could be developed concerning cases in which a sufficient causation structure involves redundant sufficient causes or sufficient causes that are not minimally sufficient. Specifically, it might be possible to develop a system of axiomatic rules which govern conditional independencies within strata of variables on a causal directed acyclic graph with a sufficient causation structure, to furthermore demonstrate the soundness and completeness of this axiomatic system, and to construct algorithms for applying the rules to identify all conditional independencies inherent in the graph's structure. Another direction of further research might involve the incorporation of the AND and OR nodes that arise from sufficient causation structures into other graphical models such as summary graphs [4], MC-graphs [12], chain graph models [2, 6, 14, 15, 16, 23, 34, 42] and ancestral graph models [24]. Finally, further work could be done extending the results of Theorem 4 to yet more general settings than those of Theorems 5 and 6.

REFERENCES

[1] Aickin, M. (2002). Causal Analysis in Biomedicine and Epidemiology Based on Minimal Sufficient Causation. Dekker, New York.
[2] Andersson, S. A., Madigan, D. and Perlman, M. D. (2001). Alternative Markov properties for chain graphs. Scand. J. Statist.
[3] Boutilier, C., Friedman, N., Goldszmidt, M. and Koller, D. (1996). Context-specific independence in Bayesian networks. In Uncertainty in Artificial Intelligence.
[4] Cox, D. R. and Wermuth, N. (1996). Multivariate Dependencies: Models, Analysis and Interpretation. Chapman and Hall, London. MR1456990
[5] Esary, J. D., Proschan, F. and Walkup, D. W. (1967). Association of random variables, with applications. Ann. Math. Statist.
[6] Frydenberg, M. (1990). The chain graph Markov property. Scand. J. Statist.
[7] Geiger, D. and Heckerman, D. (1996). Knowledge representation and inference in similarity networks and Bayesian multinets. Artificial Intelligence.
[8] Geiger, D., Verma, T. S. and Pearl, J. (1990). Identifying independence in Bayesian networks. Networks.
[9] Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica.
[10] Hudson, J. I., Javaras, K. N., Laird, N. M., VanderWeele, T. J., Pope, H. G. and Hernán, M. A. (2008). A structural approach to the familial coaggregation of disorders. Epidemiology.
[11] Koopman, J. S. (1981). Interaction between discrete causes. American J. Epidemiology.
[12] Koster, J. T. A. (2002). Marginalizing and conditioning in graphical models. Bernoulli.
[13] Lauritzen, S. L., Dawid, A. P., Larsen, B. N. and Leimer, H. G. (1990). Independence properties of directed Markov fields. Networks.
[14] Lauritzen, S. L. and Richardson, T. S. (2002). Chain graph models and their causal interpretations (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol.
[15] Lauritzen, S. L. and Wermuth, N. (1989). Graphical models for associations between variables, some of which are qualitative and some quantitative. Ann. Statist.
[16] Levitz, M., Perlman, M. D. and Madigan, D. (2001). Separation and completeness properties for AMP chain graph Markov models. Ann. Statist.
[17] Lewis, D. (1973). Causation. J. Philosophy.
[18] Lewis, D. (1973). Counterfactuals. Harvard Univ. Press, Cambridge. MR0421986
[19] Mackie, J. L. (1965). Causes and conditions. American Philosophical Quarterly.
[20] Novick, L. R. and Cheng, P. W. (2004). Assessing interactive causal influence. Psychological Review.
[21] Pearl, J. (1995). Causal diagrams for empirical research. Biometrika.
[22] Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge Univ. Press, Cambridge. MR1744773
[23] Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Statist.
[24] Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist.
[25] Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure period—application to control of the healthy worker survivor effect. Math. Modelling.
[26] Robins, J. M. (1987). Addendum to "A new approach to causal inference in mortality studies with sustained exposure period—application to control of the healthy worker survivor effect." Comput. Math. Appl.
[27] Robins, J. M. (1995). Discussion of "Causal diagrams for empirical research" by J. Pearl. Biometrika.
[28] Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct and indirect effects. In Highly Structured Stochastic Systems (P. Green, N. Hjort and S. Richardson, eds.) 70–81. Oxford Univ. Press, New York. MR2082403
[29] Rothman, K. J. (1976). Causes. American J. Epidemiology.
[30] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol.
[31] Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist.
[32] Saracci, R. (1980). Interaction and synergism. American J. Epidemiology.
[33] Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction and Search. Springer, New York. MR1227558
[34] Studený, M. and Bouckaert, R. (1998). On chain graph models for description of conditional independence structures. Ann. Statist.
[35] VanderWeele, T. J. and Robins, J. M. (2007). Directed acyclic graphs, sufficient causes and the properties of conditioning on a common effect. American J. Epidemiology.
[36] VanderWeele, T. J. and Robins, J. M. (2007). The identification of synergism in the sufficient-component cause framework. Epidemiology.
[37] VanderWeele, T. J. and Robins, J. M. (2008). Empirical and counterfactual conditions for sufficient cause interactions. Biometrika.
[38] VanderWeele, T. J. and Robins, J. M. (2009). Properties of monotonic effects on directed acyclic graphs. J. Machine Learning Research. To appear. Available at http://biostats.bepress.com/cobra/ps/art35.
[39] VanderWeele, T. J. and Robins, J. M. (2009). Signed directed acyclic graphs for causal inference. J. Roy. Statist. Soc. Ser. B. To appear.
[40] Verma, T. and Pearl, J. (1988). Causal networks: Semantics and expressiveness. In Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence.
[41] Wellman, M. P. (1990). Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence.
[42] Wermuth, N. and Cox, D. R. (2004). Joint response graphs and separation induced by triangular systems. J. R. Stat. Soc. Ser. B Stat. Methodol.
[43] Wright, S. (1921). Correlation and causation. J. Agric. Res.

Department of Health Studies
University of Chicago
5841 South Maryland Avenue
MC 2007
Chicago, Illinois 60637
USA
E-mail: [email protected]