Causal Sufficiency and Actual Causation

Abstract

Pearl opened the door to formally defining actual causation using causal models. His approach rests on two strategies: first, capturing the widespread intuition that X=x causes Y=y iff X=x is a Necessary Element of a Sufficient Set for Y=y, and second, showing that his definition gives intuitive answers on a wide set of problem cases. This inspired dozens of variations of his definition of actual causation, the most prominent of which are due to Halpern & Pearl. Yet all of them ignore Pearl's first strategy, and the second strategy taken by itself is unable to deliver a consensus. This paper offers a way out by going back to the first strategy: it offers six formal definitions of causal sufficiency and two interpretations of necessity. Combining the two gives twelve new definitions of actual causation. Several interesting results about these definitions and their relation to the various Halpern & Pearl definitions are presented. Afterwards the second strategy is evaluated as well. In order to maximize neutrality, the paper relies mostly on the examples and intuitions of Halpern & Pearl. One definition comes out as being superior to all others, and is therefore suggested as a new definition of actual causation.


(Preprint of paper to appear in the Journal of Philosophical Logic)

Sander Beckers
Munich Center for Mathematical Philosophy, LMU

Keywords: Actual Causation; Causal Sufficiency; NESS; Counterfactuals

Two decades have passed since Judea Pearl's groundbreaking book on causality was published (Pearl, 2000). It offers a formal account of causal models that led causal modeling to become a central part of Artificial Intelligence. One of the book's most important applications for philosophy is its formal definition of actual causation, i.e., causation of particular events.

Pearl defends his account of actual causation using two strategies. The first strategy starts with the widely shared intuition that $X = x$ causes $Y = y$ iff $X = x$ is a Necessary Element of a Sufficient Set for $Y = y$ (the NESS intuition, from now on). Pearl claims that using causal models allows one to make this intuition formally precise, whereas existing logical notions of necessity and sufficiency lack the resources to do so. The second strategy is to demonstrate that his formal account offers intuitive verdicts for a number of problematic examples.

Footnote: This acronym was coined by Wright (1988), but Pearl does not intend to formalize the specific manner in which Wright understood it, nor do I in the current paper. I have formalized Wright's interpretation of the NESS definition elsewhere, in the process of developing another definition of causation (Beckers, 2021). The latter definition is in many ways a simplification of the definition that I defend here. The precise relation between these two definitions is the subject of future work.

Footnote: Mackie (1965) formulates the same intuition differently, resulting in the equally famous INUS acronym. See Wright (2011) for a detailed discussion of the subtle differences between them.

Ever since, Pearl's account has come under severe criticism. By now there are dozens of papers, both from philosophers and from researchers in AI, attempting to improve upon his account. Most prominently, Pearl himself has offered several revisions of his account in collaboration with Halpern, culminating in the most recent revision by Halpern individually (Pearl, 2009; Halpern and Pearl, 2001, 2005; Halpern, 2015, 2016). Together these accounts of causation are referred to as the Halpern & Pearl definitions, or HP definitions for short, and they are by far the most influential accounts of causation out there.

Footnote: Just to name some of the most influential ones: Hitchcock (2001, 2007); Woodward (2003); Hall (2007); Weslake (2015).

The problem with all of these attempts at revising Pearl's initial account is that they completely ignore the first strategy and focus almost exclusively on the second strategy. Roughly put, the typical setup is to go over some examples for which existing definitions give counterintuitive answers, and then to construct a new definition that does not do so. It is unrealistic to expect that this second strategy in and of itself can deliver a satisfactory account of causation, because there are too many examples and even more intuitions (Glymour et al., 2010; Beckers and Vennekens, 2018).

To solve this problem, this paper starts out with an explicit focus on the first strategy. It is striking that immediately after discussing the NESS intuition, Pearl diverges into complicated technical notions like “sustenance” and “causal beams” and never looks back, be it in his book or in the subsequent work on the HP definitions. Instead I offer what is the most natural route down the first strategy, namely to look at formalizations of causal sufficiency (as opposed to logical sufficiency) and combine them with two interpretations of necessity. Taken together this results in twelve distinct formal definitions of actual causation.

These definitions are compared to each other and to the HP definitions, leading to several interesting results. For one, it turns out that one of these twelve definitions is equivalent to the most recent HP definition (Halpern, 2015, 2016). Therefore this paper is the first to show that one of the HP definitions succeeds in delivering Pearl's promise. At the same time, it also shows that the other HP definitions do not.

$X = x$ causes $Y = y$ iff there is a set $\vec{W} = \vec{w}$ so that $(X = x, \vec{W} = \vec{w})$ is sufficient for $Y = y$ along a causal network $\vec{N}$ and there exists some value $x'$ so that $(X = x', \vec{W} = \vec{w})$ is not sufficient for $Y = y$ along any causal subnetwork of $\vec{N}$.

This paper is laid out as follows. The next section introduces structural equations models, the formal causal models that are used to express all the definitions. Then I state the three most recent HP definitions in Section 3. Section 4 presents six notions of causal sufficiency and shows how they relate to each other. We then use these six notions to formalize actual causation along the NESS intuition in Section 5, and discuss several interesting results. After this theoretical groundwork, we start looking for the best definition. Two definitions are discarded in Section 6 by showing that they have certain unacceptable properties. Finally, Section 7 compares the remaining definitions to each other and to the HP definitions by considering examples from Halpern & Pearl and a few additional ones.

This section reviews the definition of causal models as they were introduced by Pearl (2000). Much of the discussion and notation is taken from Halpern (2016) with little change.

Definition 2.1: A signature $\mathcal{S}$ is a tuple $(\mathcal{U}, \mathcal{V}, \mathcal{R})$, where $\mathcal{U}$ is a set of exogenous variables, $\mathcal{V}$ is a set of endogenous variables, and $\mathcal{R}$ a function that associates with every variable $Y \in \mathcal{U} \cup \mathcal{V}$ a nonempty set $\mathcal{R}(Y)$ of possible values for $Y$ (i.e., the set of values over which $Y$ ranges). If $\vec{X} = (X_1, \ldots, X_n)$, $\mathcal{R}(\vec{X})$ denotes the crossproduct $\mathcal{R}(X_1) \times \cdots \times \mathcal{R}(X_n)$.

Exogenous variables represent factors whose causal origins are outside the scope of the causal model, such as background conditions and noise. The values of the endogenous variables, on the other hand, are causally determined by other variables within the model (both endogenous and exogenous).

Definition 2.2: A causal model $M$ is a pair $(\mathcal{S}, \mathcal{F})$, where $\mathcal{S}$ is a signature and $\mathcal{F}$ defines a function that associates with each endogenous variable $X$ a structural equation $F_X$ giving the value of $X$ in terms of the values of other endogenous and exogenous variables. Formally, the equation $F_X$ maps $\mathcal{R}(\mathcal{U} \cup \mathcal{V} - \{X\})$ to $\mathcal{R}(X)$, so $F_X$ determines the value of $X$, given the values of all the other variables in $\mathcal{U} \cup \mathcal{V}$.

Note that there are no functions associated with exogenous variables; their values are determined outside the model. We call a setting $\vec{u} \in \mathcal{R}(\mathcal{U})$ of values of exogenous variables a context.

The value of $X$ may depend on the values of only a few other variables. $X$ depends on $Y$ in context $\vec{u}$ if there is some setting of the endogenous variables other than $X$ and $Y$ such that if the exogenous variables have value $\vec{u}$, then varying the value of $Y$ in that context results in a variation in the value of $X$; that is, there is a setting $\vec{z}$ of the endogenous variables other than $X$ and $Y$ and values $y$ and $y'$ of $Y$ such that $F_X(y, \vec{z}, \vec{u}) \neq F_X(y', \vec{z}, \vec{u})$.
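As a rough illustration of the dependence test just stated, the following Python sketch encodes a small signature and checks whether $F_X(y, \vec{z}, \vec{u}) \neq F_X(y', \vec{z}, \vec{u})$ for some setting $\vec{z}$. The model itself (variables `U`, `X`, `Y`, `Z` and their equations) and the helper name `depends_on` are assumptions made purely for illustration; they do not come from the paper.

```python
from itertools import product

# Hypothetical signature (an assumption, not from the paper):
# one exogenous variable U and three endogenous variables, all binary.
RANGES = {"U": [0, 1], "X": [0, 1], "Y": [0, 1], "Z": [0, 1]}
ENDOGENOUS = ["X", "Y", "Z"]

# Each structural equation F_V receives the values of all other variables
# (as a dict) and returns the value of V.
EQUATIONS = {
    "Y": lambda v: v["U"],
    "Z": lambda v: 1 - v["U"],
    "X": lambda v: v["Y"] & v["Z"],
}


def depends_on(x, y, context):
    """Return True if, in the given context, varying y changes the output
    of F_x for at least one setting of the remaining endogenous variables."""
    others = [v for v in ENDOGENOUS if v not in (x, y)]
    for combo in product(*(RANGES[v] for v in others)):
        base = {**context, **dict(zip(others, combo))}
        # Collect the outputs of F_x over all values of y, everything else fixed.
        outputs = {EQUATIONS[x]({**base, y: value}) for value in RANGES[y]}
        if len(outputs) > 1:
            return True
    return False


print(depends_on("X", "Y", {"U": 0}))  # X = Y AND Z reacts to Y when Z = 1
print(depends_on("Y", "X", {"U": 0}))  # Y = U ignores X entirely
```

Note that the test quantifies over settings of the other endogenous variables, so `X` counts as depending on `Y` even though `Y` and `Z` never both take value 1 in any actual solution of this toy model.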
When $X$ depends on $Y$ in this way, we say that $Y$ is a parent of $X$. We extend this genealogical terminology in the usual manner, by taking the ancestor relation to be the transitive closure of the parent relation (i.e., $Y$ is an ancestor of $X$ iff there exist variables $V_1, \ldots, V_n$ so that $Y$ is a parent of $V_1$, $V_1$ is a parent of $V_2$, ..., and $V_n$ is a parent of $X$). The descendant relation is simply the reversal of the ancestor relation (i.e., $X$ is a descendant of $Y$ iff $Y$ is an ancestor of $X$). A path is a sequence of variables in which each element is a child of the previous element.

In this paper we restrict attention to strongly recursive (or strongly acyclic) models, that is, models where there is a partial order $\preceq$ on variables such that if $Y$ depends on $X$, then $X \prec Y$. In a strongly recursive model, given a context $\vec{u}$, the values of all the remaining variables are determined (we can just solve for the value of the variables in the order given by $\preceq$). We often write the equation for an endogenous variable as $X = f(\vec{Y})$; this denotes that the value of $X$ depends only on the values of the variables in $\vec{Y}$, and the connection is given by the function $f$. For example, we might have $X = Y + [\ldots]$.

An intervention has the form $\vec{X} \leftarrow \vec{x}$, where $\vec{X}$ is a set of endogenous variables. Intuitively, this means that the values of the variables in $\vec{X}$ are set to the values $\vec{x}$. The structural equations define what happens in the presence of interventions. Setting the value of some variables $\vec{X}$ to $\vec{x}$ in a causal model $M = (\mathcal{S}, \mathcal{F})$ results in a new causal model, denoted $M_{\vec{X} \leftarrow \vec{x}}$, which is identical to $M$, except that $\mathcal{F}$ is replaced by $\mathcal{F}^{\vec{X} \leftarrow \vec{x}}$: for each variable $Y \notin \vec{X}$, $F_Y^{\vec{X} \leftarrow \vec{x}} = F_Y$ (i.e., the equation for $Y$ is unchanged), while for each $X'$ in $\vec{X}$, the equation $F_{X'}$ for $X'$ is replaced by $X' = x'$ (where $x'$ is the value in $\vec{x}$ corresponding to $X'$).

Given a signature $\mathcal{S} = (\mathcal{U}, \mathcal{V}, \mathcal{R})$, an atomic formula is a formula of the form $X = x$, for $X \in \mathcal{V}$ and $x \in \mathcal{R}(X)$. A causal formula (over $\mathcal{S}$) is one of the form $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\varphi$, where
• $\varphi$ is a Boolean combination of atomic formulas,
• $Y_1, \ldots, Y_k$ are distinct variables in $\mathcal{V}$, and
• $y_i \in \mathcal{R}(Y_i)$ for each $1 \leq i \leq k$.

Such a formula is abbreviated as $[\vec{Y} \leftarrow \vec{y}]\varphi$. The special case where $k = 0$ is abbreviated as $\varphi$. Intuitively, $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\varphi$ says that $\varphi$ would hold if $Y_i$ were set to $y_i$, for $i = 1, \ldots, k$.

A causal formula $\psi$ is true or false in a causal setting, which is a causal model given a context. As usual, we write $(M, \vec{u}) \models \psi$ if the causal formula $\psi$ is true in the causal setting $(M, \vec{u})$. The $\models$ relation is defined inductively. $(M, \vec{u}) \models X = x$ if the variable $X$ has value $x$ in the unique (since we are dealing with recursive models) solution to the equations in $M$ in context $\vec{u}$ (i.e., the unique vector of values that simultaneously satisfies all equations in $M$ with the variables in $\mathcal{U}$ set to $\vec{u}$). The truth of conjunctions and negations is defined in the standard way. Finally, $(M, \vec{u}) \models [\vec{Y} \leftarrow \vec{y}]\varphi$ if $(M_{\vec{Y} \leftarrow \vec{y}}, \vec{u}) \models \varphi$ (i.e., the intervention $\vec{Y} \leftarrow \vec{y}$ transforms $M$ into a new model $M_{\vec{Y} \leftarrow \vec{y}}$, in which we assess the truth of $\varphi$).

Now on to the HP definitions. As Pearl (2000)'s initial definition is a precursor to the HP definitions that gives less intuitive results and is far more complicated, I do not discuss it. (It is safe to say that by now it has been unanimously rejected.) Two of the HP definitions are developed by both Halpern and Pearl, whereas the third one is solely due to Halpern. The relations between them are extensively discussed by Halpern (2016).

The general form of all three definitions is as follows (where $\varphi$ is a Boolean combination of atomic formulas):

Definition 3.1: $\vec{X} = \vec{x}$ is an actual cause of $\varphi$ in $(M, \vec{u})$ if the following three conditions hold:
AC1. $(M, \vec{u}) \models (\vec{X} = \vec{x}) \land \varphi$.
AC2. See below.
AC3. $\vec{X}$ is minimal; there is no strict subset $\vec{X}''$ of $\vec{X}$ such that $\vec{X}'' = \vec{x}''$ satisfies AC2, where $\vec{x}''$ is the restriction of $\vec{x}$ to the variables in $\vec{X}''$.

Questions of actual causation are posed relative to an actual context $\vec{u}$, because as we know from the previous section a context completely determines which events actually took place. So AC1 represents the trivial requirement that the candidate cause and effect are among the events which took place. AC3 is also fairly straightforward: we should not consider redundant elements to be parts of causes. The real content of the definition lies with AC2.

Throughout the rest of the paper, settings of variables $\vec{V}$ with superscript $*$ (i.e., $\vec{v}^*$) indicate that $(M, \vec{u}) \models (\vec{V} = \vec{v}^*)$. Settings of variables $\vec{V}$ with superscript $'$ (i.e., $\vec{v}'$) indicate that $(M, \vec{u}) \models (V \neq v')$ for each $V \in \vec{V}$. Settings of variables without any superscript can refer to any setting.

In line with the NESS intuition, we should expect AC2 to consist of formal variants of these two conditions:
AC2(b). There is a set $\vec{W}$ so that $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$ is causally sufficient for $\varphi$.
AC2(a). $\vec{X} = \vec{x}$ is necessary for the sufficiency of $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$.

At first glance, the first two HP definitions seem to meet this expectation: they consist of conditions AC2(a) and AC2(b), and Halpern refers to these as a “necessity condition” and a “sufficiency condition” (2015, p. 3). Upon closer examination, however, it is hard to see how either version of AC2(b) can sensibly be interpreted as capturing causal sufficiency.

We start with Original HP (Halpern and Pearl, 2001):

Definition 3.2: [Original HP]
AC2(a). There is a partition of $\mathcal{V}$ into two sets $\vec{Z}$ and $\vec{W}$ with $\vec{X} \subseteq \vec{Z}$ and a setting $\vec{x}'$ and $\vec{w}$ of the variables in $\vec{X}$ and $\vec{W}$, respectively, such that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}]\neg\varphi$.
AC2(b). For all subsets $\vec{Y}$ of $\vec{Z} - \vec{X}$, we have $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}, \vec{Y} \leftarrow \vec{y}^*]\varphi$.

We call $\vec{W} = \vec{w}$ a witness of $\vec{X} = \vec{x}$ causing $Y = y$.

Note that one choice of $\vec{Y}$ for which the condition in AC2(b) is required to hold is $\vec{Y} = \emptyset$. For that choice, AC2 states that the effect counterfactually depends on the cause when holding fixed the witness $\vec{W} = \vec{w}$: $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}]\varphi$ and $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}]\neg\varphi$. Therefore AC2(a) can easily be interpreted as expressing a (contrastive) necessity condition: there exist contrast values $\vec{x}'$ such that if those values were to obtain, then AC2(b) no longer holds.

The problem lies with interpreting AC2(b) as expressing causal sufficiency. The main obstacle lies in the absence of the requirement that $\vec{w} = \vec{w}^*$, i.e., it is not required that the supposedly sufficient set of events $(\vec{X} = \vec{x}, \vec{W} = \vec{w})$ actually took place. Therefore we cannot simply view $(\vec{X} = \vec{x}, \vec{W} = \vec{w})$ itself as the causally sufficient set we are looking for. Although it cannot be excluded that the conditions imposed by invoking $\vec{Z}$ (and $\vec{Y}$) somehow ensure the existence of some other set that can be interpreted as a causally sufficient set, it is far from obvious that this is the case. This is confirmed by the fact that Halpern & Pearl do not even offer an attempt at giving an interpretation of AC2(b) as expressing causal sufficiency.

Matters get worse when we turn our attention to Updated HP (Halpern and Pearl, 2005):

Definition 3.3: [Updated HP]
AC2(a). Identical to the previous one.
AC2(b). For all subsets $\vec{V}$ of $\vec{W}$ and subsets $\vec{Y}$ of $\vec{Z} - \vec{X}$, we have $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{V} \leftarrow \vec{v}, \vec{Y} \leftarrow \vec{y}^*]\varphi$ (where $\vec{v}$ is the restriction of $\vec{w}$ to $\vec{V}$).

Footnote: I list the two expected conditions unalphabetically (AC2(b) before AC2(a)) for consistency with the HP definitions.

We see that AC2(b) has become even more complicated, and yet no argument is given as to how this condition formalizes causal sufficiency, despite Halpern explicitly claiming that this is what it aims to do. Instead, the updated version is justified on the basis of examples for which the previous version gave counterintuitive answers.

As a sidenote, Halpern and Pearl (2005) also define strong causation by demanding that the following condition holds in addition to the other two:
AC2(c). For all $\vec{w} \in \mathcal{R}(\vec{W})$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}]\varphi$.

This definition has received almost no attention in the literature, because according to Halpern & Pearl it is too strong. As we shall see, this is unfortunate, because AC2(c) does adequately capture a variant of causal sufficiency.

Finally we have Modified HP, which is far simpler than the previous two (Halpern, 2015).

Definition 3.4: [Modified HP]
AC2. There is a set $\vec{W}$ of variables in $\mathcal{V} - \vec{X}$, and a setting $\vec{x}'$ of the variables in $\vec{X}$ such that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*]\neg\varphi$.

The crucial difference here is that Modified HP does require the witness to consist solely of events which actually took place, i.e., $\vec{w} = \vec{w}^*$. It is straightforward to show that simply adding this requirement ensures that both versions of AC2(b) are satisfied automatically, and therefore an explicit sufficiency condition is not required. Halpern considers this definition to be an improvement over the other two, and I agree with him. However, Halpern arrives at this conclusion based on the many examples in which it better agrees with intuition. As will become clear, another (and arguably more compelling) justification is to be found in the fact that it is the only definition of the three which has a natural interpretation as formalizing the NESS intuition with which we started. To get there, we need to step away from the HP definitions and start afresh.

Footnote: Concretely, when discussing sufficient causality we find the following (Halpern, 2016, p. 53): “The key intuition behind the definition of sufficient causality is that not only does $\vec{X} = \vec{x}$ suffice to bring about $\varphi$ in the actual context (which is the intuition that AC2(b) [from Original HP] and AC2(b) [from Updated HP] are trying to capture)...”

Footnote: In retrospect, there is little basis for this judgment. They only discuss two examples in which strong causation diverges from Updated HP. In the first of those (Ex. 3.2), it fails to call the lighting of each of two matches ($ML_1 = 1$ and $ML_2 = 1$) causes of a forest fire, whereas Updated HP does call them causes. However, their conjunction ($ML_1 = 1$, $ML_2 = 1$) is a strong cause, and thus each of them is part of a strong cause. As we will see, Halpern later suggests treating “part of a cause” as being synonymous to “cause”, so the point would be moot. In the second example (Ex. 5.5), discussed as Example 7.4 later on, $S = [\ldots]$ Updated HP. This is an example of trumping causation, for which the majority opinion is that $S = [\ldots]$ Modified HP also does not consider it a cause.

4 Causal Sufficiency

1: Halpern (2016) suggests treating “part of a cause” (i.e., any $X = x$ that appears in $\vec{X} = \vec{x}$) as synonymous with “cause” when talking about Modified HP. I will follow this suggestion throughout whenever discussing the judgment of Modified HP in particular examples, unless stated otherwise. In stating theorems, however, the two are kept apart.

2: The HP definitions allow the effect to be any propositional formula $\varphi$, whereas the other definitions of causation will require effects to be of the form $Y = y$. A thorough discussion of complex effects is beyond the scope of this paper. I here limit myself to two observations.
• Although the definitions of causation here developed can be generalized to allow for conjunctive effects (i.e., effects of the form $\vec{Y} = \vec{y}$), it is not at all clear that we should want to do so. The reason is that we can easily include variables into the effect that have nothing whatsoever to do with the causes. Say we have a variable $Y$ with equation $Y = U$, where $U$ is an exogenous variable, and we are considering a context where $U = 1$. Then for any cause-effect pair $\vec{X} = \vec{x}$ and $\varphi$, we automatically get that $\vec{X} = \vec{x}$ also causes $\varphi \land Y = 1$, which is not a sensible result. Therefore we choose to simply exclude conjunctive effects.
• In the few examples in the literature where the HP definitions actually consider an effect $\varphi$ that is not of the form $Y = y$, $\varphi$ takes on the form $Y = y_1 \lor Y = y_2 \lor \ldots \lor Y = y_n$ for some $n$. The definitions here developed can easily be generalized to also allow for such effects. For reasons of simplicity I choose not to do so in general and limit the discussion of this generalization to one example for which it is required.

3: The definitions of sufficiency below (and the definitions of actual causation that follow in their wake) could be extended to also allow for exogenous variables as members of a sufficient set, so that exogenous and endogenous variables are treated alike. Since our goal is to make comparisons with the HP definitions, those would also have to be extended. Concretely, the HP definitions restrict causes to being endogenous variables, and they do not allow exogenous variables to be parts of a “witness” (the set $\vec{W}$ above). For example, if we have $Y = X \lor U$ where $U \in \mathcal{U}$ and we consider a context where $U = X = 1$, the HP definitions are unable to identify $X = 1$ as a cause of $Y = 1$ $[\ldots]$ $U = 0$. The simplest way to sidestep this issue is to restrict ourselves to models where exogenous variables only appear in equations of the form $V = U$. In that manner, all influence of the exogenous variables can be overridden by interventions, reducing their role to simply providing us with the actual values of all variables. For any model which does not conform to this restriction, we can easily construct a very similar model that does: simply replace any exogenous variable $U$ which appears in some equation that is not of this form with a new endogenous variable $V_U$, and add the equation $V_U = U$. For the previous example this results in the model with equations $Y = X \lor V_U$, $V_U = U$. (Note that now the HP definitions do consider $X = 1$ a cause of $Y = 1$.)

Throughout the rest of the paper, we take $\vec{X}$ and $\vec{Y}$ to be non-identical subsets of the endogenous variables $\mathcal{V}$ that appear in a causal model $M$.

Informally, to say that some setting $\vec{X} = \vec{x}$ is sufficient for another setting $\vec{Y} = \vec{y}$ is to say that the latter follows from the former. To formalize this requires making explicit what it means for one setting to “follow” from another. In the context of causal sufficiency, an obvious minimal demand is that this meaning captures the causal directionality. In the framework of causal models this comes down to treating $\vec{X} = \vec{x}$ as an intervention and $\vec{Y} = \vec{y}$ as a consequence of that intervention: if we set $\vec{X}$ to the values $\vec{x}$, then $\vec{Y}$ takes on the values $\vec{y}$. At least this much is clear.

Yet by saying this, we have said nothing at all about the other endogenous variables and their values, nor about the contexts in which we are evaluating the intervention. The difficulty lies in deciding what conditions we choose to impose on the other variables, both endogenous and exogenous.
I consider six possible ways in which this decision can be made that are fairly natural, but this is by no means an exhaustive list.

We start with the strongest conditions possible: in all contexts, if we set $\vec{X}$ to the values $\vec{x}$, then $\vec{Y}$ takes on the values $\vec{y}$, independent of the values of all other variables.

Definition 4.1: We say that $\vec{X} = \vec{x}$ is directly sufficient for $\vec{Y} = \vec{y}$ in $M$ if for all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{Y}))$ and all $\vec{u} \in \mathcal{R}(\mathcal{U})$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}]\vec{Y} = \vec{y}$ (where $\vec{C} = \mathcal{V} - (\vec{X} \cup \vec{Y})$).

The strength of this definition is also its weakness: by putting such strong demands on the sufficient set, many interesting sets are excluded. This restrictiveness becomes apparent later on when we add a necessity condition (Proposition 6.1): only parents can ever be part of a minimal directly sufficient set. A trivial example illustrates this point. Say the equation for $Y$ is $Y = A$, the equation for $A$ is $A = X$, and we are looking at a context in which $X = 1$. Then $X = 1$ is not directly sufficient for $Y = 1$, because intervening on $A$ overrides any influence of $X$ on $Y$. Still, there is clearly a sense in which $X = 1$ is causally sufficient for $Y = 1$. In particular, $X = 1$ is directly sufficient for $(A = 1, Y = 1)$.

Footnote: We take them to be non-identical to exclude calling a setting $\vec{X} = \vec{x}$ causally sufficient for itself, and a fortiori to exclude calling it a cause of itself.

Footnote: Note that in this paper we are interested in the causal sufficiency of settings of variables for other settings of variables. This is quite distinct from how the term “causal sufficiency” is sometimes used in the causal modelling literature, namely as a property of a set of variables in a causal graph.

Footnote: Weslake (2015) also offers this definition of causal sufficiency to develop a definition of actual causation. He mistakenly claims that Halpern & Pearl call this condition strong causation. As we have seen, strong causation does not require $\vec{C}$ to contain all other variables.

Footnote: In all examples the variables are binary unless indicated otherwise. A binary variable is a variable that has range $\{0, 1\}$.

Generalizing this intuition provides us with the second form of sufficiency: there is some setting $\vec{N} = \vec{n}$ that includes $\vec{Y} = \vec{y}$, so that in all contexts, if we set $\vec{X}$ to the values $\vec{x}$, then $\vec{N}$ takes on the values $\vec{n}$, independent of the values of all other variables. This can be formulated more succinctly as: $\vec{X} = \vec{x}$ is directly sufficient for some set to which $\vec{Y} = \vec{y}$ belongs.

Definition 4.2: We say that $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$ if there exists an $\vec{N} = \vec{n}$ so that $\vec{Y} \subseteq \vec{N}$, $\vec{y}$ is the restriction of $\vec{n}$ to $\vec{Y}$, and $\vec{X} = \vec{x}$ is directly sufficient for $\vec{N} = \vec{n}$.

Footnote: As with the definition of direct sufficiency, this one also appears in Weslake (2015)'s construction of actual causation, with the added requirement that $\vec{N}$ is minimal. This demand becomes redundant once we add our necessity condition. The other conditions Weslake invokes are quite complicated and do not have a counterpart in our story, which is why his definition also fails at the first strategy.

Observe that another intuitive way of viewing $X = 1$ as being sufficient for $Y = 1$ is by noting that $X = 1$ is directly sufficient for $A = 1$ and $A = 1$ is directly sufficient for $Y = 1$. This intuition can also be generalized to define a form of sufficiency. Concretely, we can define strong sufficiency along a network as the transitive closure of direct sufficiency.

Definition 4.3: We say that $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$ along a network $\vec{N}$ if there are (possibly overlapping) sets $\vec{N}_i$ such that $\vec{N} = \vec{Y} \cup \bigcup_{i \in \{1, \ldots, k\}} \vec{N}_i$ and there exist values $\vec{n}_i \in \mathcal{R}(\vec{N}_i)$ for each $i$ such that $\vec{X} = \vec{x}$ is directly sufficient for $\vec{N}_1 = \vec{n}_1$, $\vec{N}_1 = \vec{n}_1$ is directly sufficient for $\vec{N}_2 = \vec{n}_2$, ..., and $\vec{N}_k = \vec{n}_k$ is directly sufficient for $\vec{Y} = \vec{y}$.

The following result shows that both forms of strong sufficiency are merely different ways of expressing the same notion of sufficiency (and hence the term is appropriately chosen). Keeping in mind the earlier observation (to appear later as Proposition 6.1) that direct sufficiency combined with necessity is a relation between parents and children, we can safely think of a network as consisting of variables that lie on some path between $\vec{X}$ and $\vec{Y}$. Doing so will make it easier to apply the definitions of causation to examples.

Proposition 4.4: $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$ along a network $\vec{N}$ iff $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$.

(Proofs of all Theorems are to be found in the Appendix.)

Another obvious way to weaken the conditions on the values of the endogenous variables compared to direct sufficiency is to only consider the setting in which we leave the other variables alone, giving: in all contexts, if we set $\vec{X}$ to the values $\vec{x}$ and do not intervene on any other variable, then $\vec{Y}$ takes on the values $\vec{y}$.

Footnote: The resulting definition appears as just one condition in Halpern (2016)'s definition of sufficient causality. One of the other conditions is in fact actual causation.
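The chain example used above (equations $Y = A$ and $A = X$) can be checked mechanically. The following Python sketch brute-forces Definitions 4.1 and 4.2 for binary variables. The exogenous variable `U`, the equation `X = U`, and all function names are assumptions introduced only to make the sketch self-contained; this is an illustration, not the paper's own implementation.

```python
from itertools import product

# Hypothetical toy encoding (an assumption): binary variables, with the
# equations listed in an order compatible with the causal order.
EXOGENOUS = ["U"]
EQUATIONS = {
    "X": lambda v: v["U"],
    "A": lambda v: v["X"],
    "Y": lambda v: v["A"],
}
ENDOGENOUS = list(EQUATIONS)


def solve(context, intervention):
    """Return the unique solution of the model in a context under an intervention."""
    values = dict(context)
    for var, equation in EQUATIONS.items():
        # An intervened variable ignores its structural equation.
        values[var] = intervention[var] if var in intervention else equation(values)
    return values


def settings(variables):
    """Enumerate all binary settings of the given variables as dicts."""
    for combo in product([0, 1], repeat=len(variables)):
        yield dict(zip(variables, combo))


def directly_sufficient(cause, effect):
    """Definition 4.1: the effect holds in all contexts and for all settings
    of the endogenous variables outside the cause and the effect."""
    others = [v for v in ENDOGENOUS if v not in cause and v not in effect]
    for context in settings(EXOGENOUS):
        for c in settings(others):
            result = solve(context, {**cause, **c})
            if any(result[var] != val for var, val in effect.items()):
                return False
    return True


def strongly_sufficient(cause, effect):
    """Definition 4.2: the cause is directly sufficient for some setting
    N = n that extends the effect."""
    rest = [v for v in ENDOGENOUS if v not in cause and v not in effect]
    for mask in settings(rest):  # choose which extra variables join N
        extra = [v for v in rest if mask[v]]
        for n in settings(extra):
            if directly_sufficient(cause, {**effect, **n}):
                return True
    return False


print(directly_sufficient({"X": 1}, {"Y": 1}))          # intervening on A breaks it
print(directly_sufficient({"X": 1}, {"A": 1, "Y": 1}))  # no variables left to vary
print(strongly_sufficient({"X": 1}, {"Y": 1}))          # witnessed by N = {A, Y}
```

In this toy model the two links of the chain, `directly_sufficient({"X": 1}, {"A": 1})` and `directly_sufficient({"A": 1}, {"Y": 1})`, both hold as well, which is the network reading of Definition 4.3 and agrees with Proposition 4.4.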
Definition 4.5: We say that $\vec{X} = \vec{x}$ is weakly sufficient for $\vec{Y} = \vec{y}$ in $M$ if for all $\vec{u} \in \mathcal{R}(\mathcal{U})$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}]\vec{Y} = \vec{y}$.

The following straightforward result shows the relative strengths of the above three notions of sufficiency.

Proposition 4.6: If $\vec{X} = \vec{x}$ is directly sufficient for $\vec{Y} = \vec{y}$ then $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$, and if $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ then $\vec{X} = \vec{x}$ is weakly sufficient for $\vec{Y} = \vec{y}$.

So far we have considered three definitions that differ only with regard to the conditions they impose on the values of the endogenous variables: they all agreed on requiring their respective conditions to hold in all contexts. Yet questions of actual causation are posed relative to an actual context $\vec{u}$, and thus it is only natural that we should consider doing the same for questions of causal sufficiency. This adds three more definitions of sufficiency, which are simply the result of replacing the universal quantifier over contexts with a particular context that is assumed to be given.

Definition 4.7: We say that $\vec{X} = \vec{x}$ is actually directly sufficient for $\vec{Y} = \vec{y}$ in $(M, \vec{u})$ if for all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{Y}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}]\vec{Y} = \vec{y}$.

Definition 4.8: We say that $\vec{X} = \vec{x}$ is actually strongly sufficient for $\vec{Y} = \vec{y}$ in $(M, \vec{u})$ if there exists an $\vec{N} = \vec{n}$ so that $\vec{Y} \subseteq \vec{N}$, $\vec{y}$ is the restriction of $\vec{n}$ to $\vec{Y}$, and $\vec{X} = \vec{x}$ is actually directly sufficient for $\vec{N} = \vec{n}$.

Definition 4.9: We say that $\vec{X} = \vec{x}$ is actually weakly sufficient for $\vec{Y} = \vec{y}$ in $(M, \vec{u})$ if $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}]\vec{Y} = \vec{y}$.

Obviously the counterpart of Proposition 4.6 holds as well for these notions of actual sufficiency.

We can formalize and generalize the intuitions behind the definitions in the preceding section by showing that all six definitions of sufficiency can be interpreted as simply putting different constraints on the parameters that occur in the following general definition of sufficiency. (We only explicitly discuss the three definitions of “non-actual” sufficiency, but the same analysis trivially applies to the three definitions of actual sufficiency.)
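To see how the six notions introduced so far come apart, the following Python sketch evaluates Definitions 4.1, 4.2, and 4.5 and their actual counterparts (Definitions 4.7 to 4.9) by brute force, and checks the implications of Proposition 4.6 on one model. The conjunctive model (`Y = X AND A`), the exogenous variables `U1` and `U2`, and the function names are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy model (an assumption): binary variables, equations listed
# in causal order, exogenous variables only in equations of the form V = U.
EXOGENOUS = ["U1", "U2"]
EQUATIONS = {
    "X": lambda v: v["U1"],
    "A": lambda v: v["U2"],
    "Y": lambda v: v["X"] & v["A"],
}
ENDOGENOUS = list(EQUATIONS)


def solve(context, intervention):
    """Return the unique solution in a context under an intervention."""
    values = dict(context)
    for var, equation in EQUATIONS.items():
        values[var] = intervention[var] if var in intervention else equation(values)
    return values


def settings(variables):
    """Enumerate all binary settings of the given variables as dicts."""
    for combo in product([0, 1], repeat=len(variables)):
        yield dict(zip(variables, combo))


def holds(context, cause, free, target):
    """Check that target holds after setting the cause, for every setting of free."""
    return all(
        all(solve(context, {**cause, **c})[var] == val for var, val in target.items())
        for c in settings(free)
    )


def sufficient(kind, cause, effect, context=None):
    """kind is 'direct', 'strong' or 'weak'. With context=None the condition
    must hold in all contexts; passing a context gives the 'actual' variant."""
    contexts = [context] if context is not None else list(settings(EXOGENOUS))
    rest = [v for v in ENDOGENOUS if v not in cause and v not in effect]
    if kind == "weak":      # no intervention on other variables
        candidates = [(effect, [])]
    elif kind == "direct":  # all other variables may be set arbitrarily
        candidates = [(effect, rest)]
    else:                   # strong: some extension N = n of the effect
        candidates = []
        for mask in settings(rest):
            extra = [v for v in rest if mask[v]]
            free = [v for v in rest if not mask[v]]
            candidates += [({**effect, **n}, free) for n in settings(extra)]
    return any(all(holds(u, cause, free, target) for u in contexts)
               for target, free in candidates)


def check_proposition_4_6(context=None):
    """Brute-force check that direct implies strong implies weak on this model."""
    for effect_var in ENDOGENOUS:
        others = [v for v in ENDOGENOUS if v != effect_var]
        for mask in settings(others):
            cause_vars = [v for v in others if mask[v]]
            for cause in settings(cause_vars) if cause_vars else []:
                for y in (0, 1):
                    d, s, w = (sufficient(k, cause, {effect_var: y}, context)
                               for k in ("direct", "strong", "weak"))
                    if (d and not s) or (s and not w):
                        return False
    return True


actual = {"U1": 1, "U2": 1}
print(sufficient("weak", {"X": 1}, {"Y": 1}))            # fails when U2 = 0
print(sufficient("weak", {"X": 1}, {"Y": 1}, actual))    # holds in the actual context
print(sufficient("direct", {"X": 1}, {"Y": 1}, actual))  # setting A to 0 breaks it
print(check_proposition_4_6(), check_proposition_4_6(actual))
```

The sketch only confirms the implications on this single model; it is no substitute for the proofs in the Appendix.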

Deﬁnition 4.10: [ General Deﬁnition of Suﬃciency ] We say that ⃗ X = ⃗ x is suﬃcient for ⃗ Y = ⃗ y in M if there exist sets ⃗ C ⊆ V − ( ⃗ X ∪ ⃗ Y ) , ⃗ N ⊆ V − ( ⃗ X ∪ ⃗ C ) with ⃗ Y ⊆ ⃗ N , and a setting ⃗ n ∈ R ( ⃗ N ) where the restriction of ⃗ n to ⃗ Y is ⃗ y , such that forall ⃗ c ∈ R ( ⃗ C ) and for all ⃗ u ∈ R ( U ) we have that ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c ] ⃗ N = ⃗ n .We say that ⃗ X = ⃗ x is suﬃcient for ⃗ Y = ⃗ y in M along ⃗ N independent of ⃗ C .11his deﬁnition is more complicated than Deﬁnitions 4.1, 4.2, and 4.5. Itsuse lies in the fact that it allows us to see exactly how the three deﬁnitionsrelate to each other, and how one can construct other deﬁnitions of suﬃciency,by invoking the following trivial result. Proposition 4.11:

Definitions 4.1, 4.2, and 4.5 are equivalent to Definition 4.10 when making respectively the following choices for $\vec{N}$ and $\vec{C}$:

Weak Sufficiency. Choose both $\vec{C}$ and $\vec{N}$ to be minimal, i.e., $\vec{C} = \emptyset$, $\vec{N} = \vec{Y}$.

Strong Sufficiency. Choose $\vec{N}$ to be maximal given $\vec{C}$, i.e., $\vec{N} = \mathcal{V} - (\vec{X} \cup \vec{C})$.

Direct Sufficiency. Choose $\vec{C}$ to be maximal, i.e., $\vec{C} = \mathcal{V} - (\vec{X} \cup \vec{Y})$ and thus $\vec{N} = \vec{Y}$.

Proposition 4.11 could inspire even more variants of sufficiency. In fact, we have already come across the most obvious one: AC2(c). It is easy to see that it consists of choosing $\vec{N}$ to be minimal given $\vec{C}$, i.e., $\vec{N} = \vec{Y}$, meaning it sits in between Weak and Strong Sufficiency. The condition also appears as a sufficiency condition in Pearl’s notion of sustenance, which is the first step he takes towards formalizing the NESS intuition (2009, p. 317). Unfortunately it is also the last step, because the subsequent notions he introduces are far more complicated and bear no resemblance to NESS. The added complexity is introduced precisely because taken by itself sustenance fails to provide a sensible definition of causation, which is why I leave the exploration of this and other possible variants of sufficiency for another occasion.

We are finally ready to take up the main challenge: defining actual causation as the formal expression of the NESS intuition. In order to do so, several questions need to be answered:

• Should we use actual sufficiency or not?
• Which of the three definitions of (actual) causal sufficiency should we use?
• Does necessity mean that there exist contrast values of $\vec{X}$ so that the set would not be sufficient if those values obtained, or does it mean that the set is no longer sufficient when we remove the subset $\vec{X}$?

I have introduced six definitions of causal sufficiency in the previous section. For each definition, we can define causation using either of the two interpretations of necessity, giving twelve definitions of actual causation altogether. However, I will show that several of these are equivalent to each other, and one will be impossible to satisfy, leaving us with six definitions in the end. One of those will be Modified HP.

5.1 A Family of Definitions

As with the HP definitions, Definition 3.1 gives the general form of all definitions, except that $\varphi$ is restricted to $Y = y$. (This restriction is assumed whenever comparisons are made with the HP definitions.) As before, the only difference lies with the content of AC2. Using the first interpretation of necessity, which we shall call contrastive necessity, the general form of AC2 is as follows:

Definition 5.1: [General Definition of Causation] There exist sets $\vec{W}$, $\vec{N}$ such that

AC2(a$^c$). There exist values $\vec{x}'$ such that for all $\vec{S} \subseteq \vec{N}$, $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along $\vec{S}$.

AC2(b). $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$.

We call $\vec{W}$ a witness of $\vec{X} = \vec{x}$ causing $Y = y$.

By replacing sufficiency in the General Definition of Causation with any of the six definitions of sufficiency from Section 4, we obtain six specific definitions of actual causation. AC2(b) simply expresses causal sufficiency, whatever form it may take. AC2(a$^c$) offers a somewhat nuanced expression of necessity because it also focusses on subsets of $\vec{N}$. (Note that this nuance matters only for Strong Sufficiency, since for Weak and Direct Sufficiency $\vec{N} = \{Y\}$ anyway.) The reason is that our interest lies with the sufficiency for $Y = y$, and the network $\vec{N}$ is merely a means to that end. If $\vec{X} = \vec{x}'$ accomplishes the same end using less means, then $\vec{X} = \vec{x}$ was not necessary for achieving it.

Under the second interpretation of necessity, which we shall call minimal necessity, AC2(a$^c$) is replaced with:

AC2(a$^m$). For all $\vec{S} \subseteq \vec{N}$, $\vec{W} = \vec{w}^*$ is not sufficient for $Y = y$ along $\vec{S}$.

Both interpretations of necessity are prima facie plausible. The contrastive interpretation is explicitly counterfactual in nature, whereas the minimal interpretation is more neutral.
Our analysis will settle which one of them is to be preferred.

Filling in each of the six definitions of causal sufficiency into both versions of the General Definition of Causation gives twelve specific definitions of actual causation. I refer to each of these as Def x for $x \in \{1, \ldots, 12\}$ along the following convention:

• Def 1: Contrastive actual weak sufficiency
• Def 2: Contrastive actual strong sufficiency
• Def 3: Contrastive actual direct sufficiency
• Def 4: Contrastive weak sufficiency
• Def 5: Contrastive strong sufficiency
• Def 6: Contrastive direct sufficiency
• Def 7: Minimal actual weak sufficiency
• Def 8: Minimal actual strong sufficiency
• Def 9: Minimal actual direct sufficiency
• Def 10: Minimal weak sufficiency
• Def 11: Minimal strong sufficiency
• Def 12: Minimal direct sufficiency

(Footnote to Definition 5.1: Definition 5.1 can be made even more general by also incorporating $\vec{C}$ from Definition 4.10. Since we are only considering notions of sufficiency for which $\vec{C}$ is determined entirely by the other sets, there is no need to do so for our purposes. But it is important to keep this additional generality in mind if one wants to use alternative definitions of sufficiency.)

So to be clear, each

Def x is constructed by taking the respective definition of sufficiency (i.e., Definition 4.1, 4.2, 4.5, 4.7, 4.8, or 4.9), filling that into the General Definition of Causation where AC2(a) takes on AC2(a$^c$) or AC2(a$^m$) depending on whether $x < 7$. As an illustration, here is the full statement of Def 2.

Definition 5.2: [Def 2] $\vec{X} = \vec{x}$ is an actual cause of $Y = y$ according to Def 2 in $(M, \vec{u})$ if the following three conditions hold:

AC1. $(M, \vec{u}) \models (\vec{X} = \vec{x}) \land Y = y$.

AC2(a$^c$). There exist sets $\vec{W}$, $\vec{N}$ with $Y \in \vec{N}$, and values $\vec{x}'$, such that for all $\vec{S} \subseteq \vec{N}$ with $Y \in \vec{S}$, and for all $\vec{s} \in \mathcal{R}(\vec{S})$ such that $y \in \vec{s}$, there exists a $\vec{t} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{T} \leftarrow \vec{t}] \vec{S} \neq \vec{s}$.

AC2(b). For all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{N}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}] \vec{N} = \vec{n}^*$.

AC3. $\vec{X}$ is minimal.

Admittedly, Def 2 looks even more complicated than

Updated HP. Further on I provide some results that allow us in many cases to use simpler definitions as stand-ins for Def 2. More importantly, although the notation of Definition 5.2 is complicated, its meaning can be spelled out intuitively by stating that $\vec{X} = \vec{x}$ causes $Y = y$ iff $\vec{X} = \vec{x}$ is a Minimal Contrastively Necessary Subset of a Strongly Sufficient Set for $Y = y$ (or MCNS). (Footnote: Strictly speaking it should say “Actually Strongly Sufficient”, but that makes for a less elegant acronym. I am cheating a bit by anticipating Theorem 5.3.)

5.2 Analysis

Let us now turn to investigating the relations between these definitions. (Knowing these relations before getting into the discussion of examples makes life a lot easier.) A first remark is that

Def 7 is impossible to satisfy, as it requires that both $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}^*, \vec{W} \leftarrow \vec{w}^*] Y = y$ and $(M, \vec{u}) \not\models [\vec{W} \leftarrow \vec{w}^*] Y = y$ hold, implying that $(M, \vec{u}) \models Y = y \land Y \neq y$.

A second remark is that Def 3 is equivalent to a condition that appears in Pearl’s first definition of actual causation (1998). Ignoring

Def 7, we are still left with eleven candidate definitions of actual causation (fourteen candidates if we count the three HP definitions), whereas we would like to settle on just one. The rest of the paper is concerned with selecting the best definition out of the lot. As a first step, we can reduce the number of definitions by six.

Theorem 5.3:

The following are all equivalences among the twelve definitions and the three HP definitions:

• Modified HP iff Def 1
• Def 2 iff Def 5
• Def 8 iff Def 11
• Def 3 iff Def 6 iff Def 9 iff Def 12
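The bookkeeping behind the count of remaining definitions can be checked mechanically. The sketch below is not a proof of the theorem: it simply takes the stated equivalences as given, leaves out the unsatisfiable Def 7, and groups the candidates into classes with a small union-find helper (the function name `equivalence_classes` is hypothetical).

```python
def equivalence_classes(items, pairs):
    """Group `items` into classes induced by the given equivalent pairs."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the class representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for x in items:
        classes.setdefault(find(x), []).append(x)
    return sorted(classes.values())


# Def 7 is left out because it is impossible to satisfy.
candidates = [f"Def {i}" for i in range(1, 13) if i != 7] + ["Modified HP"]
pairs = [
    ("Modified HP", "Def 1"),
    ("Def 2", "Def 5"),
    ("Def 8", "Def 11"),
    ("Def 3", "Def 6"), ("Def 6", "Def 9"), ("Def 9", "Def 12"),
]
classes = equivalence_classes(candidates, pairs)
for cls in classes:
    print(cls)
print(len(classes))  # 6
```

The six classes are represented by Modified HP (alias Def 1), Def 2, Def 3, Def 4, Def 8, and Def 10, which matches the claim that six definitions are left in the end, one of them being Modified HP.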

Theorem 5.3 offers our first interesting result: it shows that Modified HP succeeds in formalizing the NESS intuition, whereas the other two HP definitions do not. From now on I will ignore the definitions appearing on the right-hand side in Theorem 5.3. The following is a helpful result for applying some of the definitions going forward. (As is well known, the same result holds for Original HP (Halpern, 2016).)

Proposition 5.4: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to a definition that uses minimal necessity, then $\vec{X}$ is a singleton.

The following result offers important insights into the relations between the remaining definitions.

Theorem 5.5 :

The only implications (involving either causes or parts of causes) between the remaining five definitions (Def 2, Def 3, Def 4, Def 8, and Def 10) and the three HP definitions are the following ones (and their immediate consequences, of course):

• If part of Modified HP then Updated HP;
• If part of Updated HP then Original HP;
• If Def 3 then Def 2;
• If part of Def 2 then Def 8;
• If Def 3 then Original HP;
• If Def 10 then Def 4.

(Footnote on Pearl’s condition: It re-appears in his second definition of actual causation in the notion of a causal beam, but without the necessity condition (Pearl, 2009, p. 318). To see the equivalence, one needs to invoke Proposition 6.1.)

(Footnote on the first implication: This is shorthand for: If $X = x$ is part of a cause of $Y = y$ according to the Modified HP definition then it is a cause of $Y = y$ according to the Updated HP definition.)

Two definitions can be excluded quickly. The following result shows why

Def 3 is not a sensible candidate as a general definition of causation, since causation is obviously not restricted to parent-children pairs.

Proposition 6.1: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to Def 3, then $\vec{X}$ is a singleton, and $X$ is a parent of $Y$.

Although we can dismiss Def 3 as a general definition of causation, it is still a useful stand-in for the arguably more complicated Def 2 and Def 8 in case $X$ is a parent of $Y$ and $X$ is not an ancestor of $Y$ along any path that is longer than a single edge (which in fact covers a surprisingly large number of cases discussed in the literature). In such cases we say that $X$ is only a parent of $Y$.

Proposition 6.2: If $X$ is only a parent of $Y$, then Def 2, Def 3, and Def 8 are all equivalent for causes $X = x$.

A cornerstone of the counterfactual approach to causation is that counterfactual dependence is sufficient for causation. More formally, there is widespread consensus that causation should satisfy the following principle:

Principle 1 (Dependence) Say $(M, \vec{u}) \models X = x \land Y = y$. If there exists a value $x'$ such that $(M, \vec{u}) \models [X \leftarrow x'] Y \neq y$ then $X = x$ causes $Y = y$ in $(M, \vec{u})$.

Accepting this principle means that

Def 10 is excluded as well.
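The antecedent of Dependence is a plain but-for test and is easy to check by enumeration. The sketch below assumes a hypothetical toy model ($A = U_A$, $X = U_X$, $Y = X \lor A$) that is not taken from the paper, and the helper name `dependence_contrasts` is likewise an assumption. It lists the contrast values $x'$ for which the effect changes under $[X \leftarrow x']$.

```python
# Hypothetical toy model (an assumption, not from the paper):
# A = U_A, X = U_X, Y = X or A, all variables binary.
RANGES = {"A": [0, 1], "X": [0, 1], "Y": [0, 1]}
EQUATIONS = {  # listed in an order compatible with the causal graph
    "A": lambda v: v["U_A"],
    "X": lambda v: v["U_X"],
    "Y": lambda v: int(v["X"] or v["A"]),
}


def solve(context, intervention=None):
    """Values of all variables in (M, u) under the given intervention."""
    intervention = intervention or {}
    values = dict(context)
    for var, equation in EQUATIONS.items():
        values[var] = intervention[var] if var in intervention else equation(values)
    return values


def dependence_contrasts(cause_var, effect_var, context):
    """Contrast values x' with (M, u) |= [X <- x'] Y != y, for the actual x and y."""
    actual = solve(context)
    return [
        x_alt
        for x_alt in RANGES[cause_var]
        if x_alt != actual[cause_var]
        and solve(context, {cause_var: x_alt})[effect_var] != actual[effect_var]
    ]


print(dependence_contrasts("X", "Y", {"U_A": 0, "U_X": 1}))  # [0]: Y depends on X
print(dependence_contrasts("X", "Y", {"U_A": 1, "U_X": 1}))  # []: A = 1 keeps Y = 1
```

When the list is non-empty, the principle demands that $X = x$ count as a cause. When it is empty, as in the second context, the principle is silent, and the definitions have to settle the question by other means.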

Proposition 6.3:

Out of all definitions we have considered, Def 10 and Def 3 are the only ones which do not satisfy Dependence.

That leaves us with Def 2, Def 4, and Def 8 as possible alternatives to the HP definitions.

(Footnote: As does Halpern, I here restrict myself to counterfactual dependence on a single conjunct (2016, p. 26).)

7 Def 2, Def 4, and Def 8, vs the HP definitions

We have shown that all twelve definitions we developed (including Modified HP) are instantiations of the General Definition of Causation (Def. 5.1), and thereby they improve upon Original HP and Updated HP as far as the first strategy goes. We now show that Def 2 also improves upon all three HP definitions as far as the second strategy goes, whereas Def 4 and Def 8 do not. In order to remain as neutral as possible, we go over Halpern & Pearl’s own examples, compare the verdicts of our definitions to theirs, and stick as close as possible to their intuitions.

The Updated HP definition is by far the most well-known. It was developed as an improvement of Original HP, which sometimes gives unreasonable answers. Halpern and Pearl (2005) offer many examples to illustrate how it works and how it successfully deals with paradigm cases of causation.

Their first example is one of those few cases (recall the beginning of Section 4) in which the effect is of the form $Y = y \lor Y = y'$, and therefore allows us to illustrate how we can generalize the General Definition of Causation to such effects. It is also an example for which Def 8 gives the wrong answer, but the subsequent example is far simpler and more convincing in this respect.

Example 7.1: “Suppose that there was a heavy rain in April and electricalstorms in the following two months; and in June the lightning took hold. If ithadn’t been for the heavy rain in April, the forest would have caught ﬁre inMay.” (Halpern and Pearl, 2005, p. 15) I agree with Halpern and Pearl’s judg-ment that it would be very counterintuitive to say that the April rain causedthe forest ﬁre, since all it did was delay the ﬁre. As they indicate, it is neverthe-less perfectly sensible to say that the April rain caused the forest ﬁre in June ,as opposed to May. In order to capture this distinction, we need to invoke adisjunctive eﬀect.Let F represent there being a ﬁre or not, with three possible values: 0 (noﬁre), 1 (ﬁre in May), or 2 (ﬁre in June). ES is a four-valued variable thatcaptures whether there are electric storms: ( , ) (no electric storms in eitherMay or June), ( , ) (electric storms in May but not in June), ( , ) (storms inJune but not May), and ( , ) (storms in both May and June). Lastly, AS is abinary variable expressing whether or not there was April rain.The equation for F is then given by: F = ( AS = ∧ ES = ( , )) ∨ ES = ( , ) , F = AS = ∧ ( ES = ( , ) ∨ ES = ( , )) , and F = F = AS =

1, all deﬁnitions we are consideringagree that AS = F =

2. The question is whether AS = F = ∨ F = ⃗ X = ⃗ x is suﬃcientfor Y = y ∨ Y = y ′ iﬀ ⃗ X = ⃗ x is suﬃcient for Y = y or ⃗ X = ⃗ x is suﬃcient for Y = y ′ .When integrated into our General Deﬁnition of Causation , this results in17plitting up AC2(a) so that there is one instance for each disjunct. AC2(b) neednot be split up, since it can only ever be satisﬁed for the actual value of Y . Let us apply this idea to our example. To satisfy AC2(b), we have to add ES to the witness: ( AS = , ES = ( , )) is directly suﬃcient for F = AS = AS is only a parentof F . We cannot invoke Proposition 6.2 though, since that requires an eﬀect Y = y .) We then see that one of the two conditions that now make up AC2(a)is not satisﬁed for Def 2 and

Def 4 , because ( AS = , ES = ( , )) is directlysuﬃcient for F =

1. Therefore

Def 2 and

Def 4 agree with the HP deﬁnitionsthat the April rain did not cause the forest ﬁre. But

Def 8 does not reach thisverdict, because ES = ( , ) is not directly suﬃcient for either F =

1, nor is itfor F =

2. This means AC2(a) is fulfilled for Def 8, which leads to a mistaken conclusion.

Although one counterexample need not disqualify a definition, the following example is indicative of a deeper problem with Def 8: whenever $X = x$ strongly suffices for $Y = y$, it is automatically a cause according to Def 8, since $\emptyset$ is never strongly sufficient for $Y = y$. The following example is but one of many paradigm cases in the literature for which this property leads to a counterintuitive verdict. Therefore Def 8 is also excluded as a definition of causation.

Example 7.2: “The engineer is standing by a switch in the railroad tracks. A train approaches in the distance. She flips the switch, so that the train travels down the right-hand track, instead of the left. Since the tracks reconverge up ahead, the train arrives at its destination all the same...

Again, our causal model gets this right. Suppose we have three random variables:

• $F$ for “flip”, with values 0 (the engineer doesn’t flip the switch) and 1 (she does);
• $T$ for “track”, with values 0 (the train goes on the left-hand track) and 1 (it goes on the right-hand track); and
• $A$ for “arrival”, with values 0 (the train does not arrive at the point of reconvergence) and 1 (it does).” (Halpern and Pearl, 2005, p. 26)

Note that this means generalizing to disjunctions across different variables (i.e., something like $Y = y \lor Z = z$) is more complicated.

McDermott (1995) offers an almost identical example involving a dog biting a terrorist. Another famous case is that involving a boulder rolling towards a hiker (Hitchcock, 2001). All of these examples are counterexamples to the transitivity of causation. The failure of transitivity has become broadly accepted by now (Beckers and Vennekens, 2017). Despite what

Def 8 ’s behavior in these examples might suggest, it is also not transitive. A simplecounterexample consists of equations Z = Y ∨ W , and Y = X ∧ W . If X = W = Def 8 considers X = Y = Y = Z =

1, yet it does not consider X = Z = A is given by A = T ∨ ¬ T , which can be rewritten as A =

1. Thiscan be ﬁxed by extending the range of T with a value 2, representing the trainnot going down any track (because it breaks down, for example). Then theequations become A = ( T ≠ ) and T = F . The context is such that F = F = A = A = { T } , but so is F =

0. Therefore

Def 2 and Def 4 agree with Updated HP (and with intuition) that flipping the switch is not a cause of the train’s arrival.
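The sufficiency claims behind this verdict can be checked by enumeration. The following sketch implements the General Definition of Sufficiency (Definition 4.10) for the extended train model with equations $A = (T \neq 2)$ and $T = F$. It rests on assumptions that are not in the text: $F$ is taken to be set directly by a binary exogenous variable `U_F`, the network is taken to be $\{T, A\}$, and the function names are hypothetical.

```python
from itertools import product

RANGES = {"F": [0, 1], "T": [0, 1, 2], "A": [0, 1]}
EQUATIONS = {  # listed in an order compatible with the causal graph
    "F": lambda v: v["U_F"],          # assumption: F is set directly by the context
    "T": lambda v: v["F"],            # T = F
    "A": lambda v: int(v["T"] != 2),  # A = (T != 2)
}
CONTEXTS = [{"U_F": 0}, {"U_F": 1}]


def solve(context, intervention):
    """Values of all variables in (M, u) under the given intervention."""
    values = dict(context)
    for var, equation in EQUATIONS.items():
        values[var] = intervention[var] if var in intervention else equation(values)
    return values


def sufficient(cause, independent_of, network):
    """Definition 4.10: the cause yields N = n for every setting of C and every context."""
    for combo in product(*(RANGES[c] for c in independent_of)):
        intervention = {**cause, **dict(zip(independent_of, combo))}
        for context in CONTEXTS:
            values = solve(context, intervention)
            if any(values[var] != val for var, val in network.items()):
                return False
    return True


# Strong sufficiency: C is empty and the network contains T and A.
print(sufficient({"F": 1}, [], {"T": 1, "A": 1}))  # True
print(sufficient({"F": 0}, [], {"T": 0, "A": 1}))  # True
# Direct sufficiency for A = 1 fails: setting T to 2 stops the train.
print(sufficient({"F": 1}, ["T"], {"A": 1}))       # False
```

Since the contrast value $F = 0$ is just as sufficient for the arrival as $F = 1$, no contrast value breaks sufficiency, so contrastive necessity fails.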

Def 8 fails to reach this verdict, because $\emptyset$ is not strongly sufficient for $A = 1$.

Def 4 suffers from an even bigger defect than Def 8: it fails to distinguish preempted causes from preempting causes. Since preemption cases are the bread and butter of the literature on actual causation, this means that Def 4 is immediately disqualified. The following is a famous example of late preemption discussed by Halpern and Pearl (2005) (and originally by Hall (2004)).

Example 7.3:

Suzy and Billy both throw a rock at a bottle. Suzy’s rock gets there first, shattering the bottle. However Billy’s throw was also accurate, and would have shattered the bottle had it not been preempted by Suzy’s throw.

Halpern and Pearl (2005) use the following variables for this example, which capture the fact that Billy’s throw was preempted by Suzy’s rock hitting the bottle: $BS$ for the bottle shattering, $BH$, $SH$ for Billy’s (resp. Suzy’s) rock hitting the bottle, and two more variables ($BT$, $ST$) for either of them throwing their rock. The equations are then as follows: $BS = BH \lor SH$, $SH = ST$, $BH = BT \land \lnot SH$. None of the definitions has any problem arriving at the obvious result that Suzy’s throw ( ST =

1) causes the bottle to shatter ( BS = Def 4 is the only deﬁnition under consideration that mistakenly alsojudges Billy’s throw to be a cause of the bottle’s shattering: in all contexts BT = BS =

1, whereas BT = BS = ST = Def 2 as the last potential alternative to the HP deﬁni-tions. Going through the many remaining examples, there is only one in which

Def 2 disagrees with

Updated HP . I leave it to the reader to verify this claim,and restrict the discussion to that single example.
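Before turning to that example, the asymmetry between the two throws in Example 7.3 can be reproduced by brute force. The sketch below searches for witness sets $\vec{W}$ using the contrastive condition as it is spelled out for Modified HP (equivalently Def 1) in the proof of Theorem 5.3: flip the candidate cause, hold $\vec{W}$ at its actual values, and check whether the effect fails. The exogenous variables `U_ST` and `U_BT`, the context in which both throw, and the function names are assumptions made for illustration.

```python
from itertools import combinations

EQUATIONS = {  # listed in an order compatible with the causal graph
    "ST": lambda v: v["U_ST"],                     # assumption: set by the context
    "BT": lambda v: v["U_BT"],                     # assumption: set by the context
    "SH": lambda v: v["ST"],                       # SH = ST
    "BH": lambda v: int(v["BT"] and not v["SH"]),  # BH = BT and not SH
    "BS": lambda v: int(v["BH"] or v["SH"]),       # BS = BH or SH
}
CONTEXT = {"U_ST": 1, "U_BT": 1}  # both Suzy and Billy throw


def solve(context, intervention=None):
    """Values of all variables in (M, u) under the given intervention."""
    intervention = intervention or {}
    values = dict(context)
    for var, equation in EQUATIONS.items():
        values[var] = intervention[var] if var in intervention else equation(values)
    return values


def contrastive_witnesses(cause_var, effect_var, context):
    """Witness sets W for which flipping the cause, with W held at w*, changes the effect."""
    actual = solve(context)
    others = [v for v in EQUATIONS if v not in (cause_var, effect_var)]
    found = []
    for size in range(len(others) + 1):
        for witness in combinations(others, size):
            intervention = {w: actual[w] for w in witness}
            intervention[cause_var] = 1 - actual[cause_var]  # binary contrast value
            if solve(context, intervention)[effect_var] != actual[effect_var]:
                found.append(set(witness))
    return found


print(contrastive_witnesses("ST", "BS", CONTEXT))  # includes {'BH'}
print(contrastive_witnesses("BT", "BS", CONTEXT))  # []
```

Holding $BH$ at its actual value 0 makes the shattering depend on Suzy’s throw, whereas no witness does the same for Billy’s throw, because $SH = 1$ keeps the bottle shattering whatever Billy does.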

Example 7.4:

Major ( M ) and sergeant ( S ) stand before corporal, and bothshout ‘Charge!’ ( M = S = C = M =

0) the corporal would not have charged. If the major remains quiet( M = − The equation for C is thus: C = M if M ≠ − C = S otherwise. The majority intuition is that the This formulation is due to Weslake (2015), but the example was ﬁrst discussed by Schaﬀer(2000) (who attributes it to van Fraassen). Def 2 agrees, as it does not consider S = C =

1. The reasonis that M = S = M = S = C = Original HP and

Updated HP. Halpern & Pearl do not consider this to be problematic, but they do go through the trouble of showing how Original HP and Updated HP change their verdict if one adds extra variables to the model. Moreover, Modified HP also agrees with Def 2 here. Given Halpern’s later preference for Modified HP, it is fair to say that Def 2 does at least as well as Updated HP on this example.

Dissatisfied with Updated HP due to the many counterexamples that were presented in the literature, Halpern (2015) develops Modified HP. First of all, despite Theorem 5.5, there do exist interesting connections between the three definitions we have considered and Modified HP.

Proposition 7.5: If Modified HP with $\vec{X}$ a singleton, then Def 2, Def 4, and Def 8.

Halpern (2015) goes over several counterexamples to Updated HP and shows that Modified HP offers sensible verdicts. Taking into account Halpern’s suggestion that “part of cause” is synonymous with “cause” for Modified HP, there are in fact only three examples in which Modified HP disagrees with Updated HP (Examples 3.5, 3.8, and 3.11). In all three of those cases, Def 2 sides with Modified HP.

There is only one example in which Def 2 disagrees with Modified HP. Crucially, it is an example for which Halpern agrees that Modified HP reaches the wrong verdict.

Example 7.6:

A ranch has ﬁve individuals: a , . . . , a . They have to vote ontwo possible outcomes: staying at the campﬁre ( O =

0) or going on a round-up( O = A i be the random variable denoting a i ’s vote, so A i = j if a i votes for outcome j . There is a complicated rule for deciding on the outcome.If a and a agree (i.e., if A = A ), then that is the outcome. If a , . . . , a agree, and a votes diﬀerently, then the outcome is given by a ’s vote (i.e., See Weslake (2015) for a discussion. When discussing Example 3.8 again in (Halpern, 2016), he mistakenly claims that

Mod-iﬁed HP agrees with

Updated HP when treating parts of causes as causes. In response,Halpern has suggested a small variation on the example in which

Modiﬁed HP indeed doesagree with

Updated HP (personal communication). For that variation,

Def 2 also agreeswith the HP deﬁnitions. Halpern (2016) discusses far more cases, but none of them reveal any further disagreementsbetween these deﬁnitions. = A ). Otherwise, majority rules. In the actual situation, A = A = A = A = A =

0, so by the ﬁrst mechanism, O = Halpern states, and I agree, that intuitively one should expect only A = A = O =

1. After all, a , . . . , a voted against O = Def2 gives that result, whereas

Modified HP considers every vote to be a cause. Halpern argues for adding more variables to the model in order to get the right outcome, but it speaks in favor of Def 2 that it is able to give the right answer with just these variables.

We conclude that judged by the second strategy and Halpern & Pearl’s own examples, Def 2 does better than Updated HP and at least as well as Modified HP. Lastly we consider a very simple example that was offered as a counterexample to Modified HP by Rosenberg and Glymour (2018).

Example 7.7:

We have equations Y = X ∨ D and X = D , and we consider acontext such that D =

1. This looks very much like a standard case of overde-termination in which X = D = Modiﬁed HP : it does not consider X = Y =

1. The reason forthis is that Y = D = X = D = Def 2 over

Modiﬁed HP . Finally I will argue that

Def 2 does better than all of the other deﬁnitionson a few more examples according to two metrics: it oﬀers verdicts that areboth intuitively plausible and consistent across minor changes of the examples.Before doing so, I present an example that illustrates a special property of

Def2 . This is the formulation of the example found in (Halpern, 2016, p. 109), but the examplewas ﬁrst presented by Glymour et al. (2010). [ ⃗ W ← ⃗ w ] such that Y = y counterfactually dependson ⃗ X = ⃗ x under that intervention. The same is true for the most well-knowndeﬁnitions out there that have been inspired by the HP deﬁnitions (see Weslake(2015) for an overview), as well as for Def 3 , Def 4 , and

Def 10 . Let us calldeﬁnitions with this property strongly counterfactual . Although

Def 2 clearlyalso relies on counterfactuals, and thus falls within the counterfactual approachto causation, it is not strongly counterfactual, as the following example shows. Example 7.8:

The equation for a binary variable Y is such that Y = N ≠ N is { , , , } . The equation for N is as follows: N = A = N = ( A = ∧ X = ) , N = ( A = ∧ X = ∧ W = ) , and N = ( A = ∧ X = ∧ W = ) . In a context where A = W = X =

1, we get that X = Y = Def 2 . Yet there is no intervention such that Y = X = X = Y = Def 2 reaches its verdict because of the asymmetry between ( A = , X = ) and ( A = , X = ) : only the former is by itself causally suﬃcientfor a network that results in Y =

1, whereas the latter also needs the assistanceof W = W = Y : Y = ( X ∧ D ) ∨ A . Moreover,they all share a context such that X = A =

1. The only diﬀerence betweenthem lies with the value of D (0 or 1) and with the relation between A and D .(Concretely, there could be no relation, or it can be given by A = D , A = ¬ D , D = A , and D = ¬ A .) In all examples, all deﬁnitions agree that A = Y =

1. The disagreement arises over whether X = X = D =

0, regardless of the relation between A and D . The disjunct in which X appears is false, and therefore it played no positive part whatsoever in causing Y =

1. Perhaps others are more tolerant. But even if that is the case, one shouldexpect one’s verdicts to exhibit some consistency. As we will see,

Def 2 and

Original HP are the only deﬁnitions which can meet this demand.The situation is simplest for

Original HP : it considers X = Y = ( D = , A = ) . Holdingﬁxed that witness, Y = X =

1. Since ⃗ Z = { X } ,the former is equivalent to AC2 for Original HP . So we gain consistency, butat the price of extreme tolerance. In fact, Halpern and Pearl use precisely thisexample to argue against

Original HP and in favor of

Updated HP (2005,p. 35): It is not so clear that

Def 8 also relies on counterfactuals, since it does not explicitlyinvoke counterfactual values of the candidate cause. Exploring this topic further lies beyondthe scope of this paper. xample 7.9: “Suppose that a prisoner dies either if X loads D ’s gun and D shoots, or if A loads and shoots his gun. Taking Y to represent the prisoner’sdeath and making the obvious assumptions about the meaning of the variables,... [we can use the equation described above]. Suppose that X loads D ’s gun( X = D does not shoot ( D = A does load and shoot his gun ( A = A = Y = We would not wantto say that X = is a cause of Y = , given that D did not shoot (i.e., giventhat D = ). ” [emphasis added]If we agree with Halpern and Pearl here – which I do – then Original HP can be discarded on the basis of this example (and on the basis of the manyothers we discussed previously, of course). I leave it to the reader to verify thatnone of the other deﬁnitions consider X = D = Def 2 . Moreover, it is the only remainingdeﬁnition that oﬀers a simple consistent answer in all cases: X = Y = D =

1. To see why this is the case, we go over the possible directlysuﬃcient sets. (Since X is only a parent of Y , we can invoke Proposition 6.2 anduse Def 3 instead of

Def 2 .) Clearly X = Y = A = A = Y = D asour witness. If D =

0, this gives ( X = , D = ) , which is not directly suﬃcientfor Y = X = D =

1, we get ( X = , D = ) , whichis directly suﬃcient for Y =

1. Since the same does not hold for ( X = , D = ) , X = Y = Updated HP and

Modiﬁed HP ﬂip-ﬂop between calling X = D .Of course I cannot exclude the possibility that some consistent argumentationcan be oﬀered to explain the results of one of these deﬁnitions, but in its absenceall of this speaks in favor of Def 2 . We start with the three possible ways inwhich it can arise that D = Example 7.10:

First consider the case where D is determined by the context,and we have a context such that D =

1. Here all four deﬁnitions agree that X = Y = Example 7.11:

Second consider the case where the equation for D is given by D = A and thus again D = UpdatedHP and

Modiﬁed HP ﬂip their verdict, as they no longer consider X = Y = Example 7.12:

Third, we simply ﬂip the relation between A and D so that A = D , and again D = UpdatedHP and

Modiﬁed HP go back to considering X = Y = D = xample 7.13: Consider the case where the equation for D is D = ¬ A . As withExample 7.9, we have that D =

0, and yet

Updated HP changes its verdict,calling X = Y = Example 7.14: Lastly, consider the case where the equation for D is A = ¬ D ,and thus we again have that D =

0. Now both

Modiﬁed HP and

UpdatedHP ﬂip their verdicts as compared to Example 7.9. To see why, it suﬃcesto consider

Modiﬁed HP . The result for

Updated HP then follows fromTheorem 5.5. D = Y = Y = D =

0. Since Y = ( X = , D = ) , X = Y =

I have developed twelve definitions of actual causation that formalize the NESS intuition with which Pearl started, and have shown that the most recent of the HP definitions is among them. Although these definitions vary widely in terms of the verdicts they reach, they all resemble each other as being instantiations of the same general definition. Each definition is made up of two elements: a definition of causal sufficiency, and a definition of necessity. Other definitions can easily be developed by playing around with these elements.

After studying various properties of these definitions and the relations between them, I moved on to the process of selecting the definition that does best in practice. In the majority of the many examples that we have considered, Def 2 agrees with Modified HP. However, in Section 7.2 we came across two examples for which Def 2 disagreed with Modified HP and where Modified HP gave the wrong verdict. Moreover, contrary to Modified HP, Def 2 manages to give consistent (and intuitive) answers to the group of cases considered in the previous section. Therefore I conclude by suggesting that we should adopt Def 2 as a definition of actual causation. This definition is made up of strong sufficiency and contrastive necessity. It states that $\vec{X} = \vec{x}$ causes $Y = y$ iff $\vec{X} = \vec{x}$ is a Minimal Contrastively Necessary Subset of a Strongly Sufficient Set for $Y = y$, or MCNS.

A Appendix

Causal Sufficiency

Proposition 4.4: ⃗ X = ⃗ x is strongly suﬃcient for ⃗ Y = ⃗ y in M along a network ⃗ N iﬀ ⃗ X = ⃗ x is strongly suﬃcient for ⃗ Y = ⃗ y in M . Proof:

First assume ⃗ X = ⃗ x is strongly suﬃcient for ⃗ Y = ⃗ y in M and ⃗ N can beused to show this. Then the result follows immediately from the observation The attentive reader will remember this example from the proof of Theorem 5.3. ⃗ X = ⃗ x is directly suﬃcient for ⃗ N = ⃗ n and either ⃗ N = ⃗ n is directly suﬃcientfor ⃗ Y = ⃗ y or ⃗ N = ⃗ Y and ⃗ n = ⃗ y .Second assume ⃗ X = ⃗ x is strongly suﬃcient for Y = y in M along a network ⃗ N . Deﬁne ⃗ A = V − ( ⃗ X ∪ ⃗ N ) . We need to show that for all ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) we have that ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ A ← ⃗ a ] ⃗ N = ⃗ n .We know that ⃗ X = ⃗ x is directly suﬃcient for ⃗ N = ⃗ n . Deﬁne ⃗ C = V− ( ⃗ X ∪ ⃗ N ) and ⃗ D = ⃗ N − ⃗ N . Note that ⃗ C = ⃗ A ∪ ⃗ D . We have that for all ⃗ c ∈ R ( ⃗ C ) andall ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c ] ⃗ N = ⃗ n . In particular, we have that forall ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ A ← ⃗ a ] ⃗ N = ⃗ n .Deﬁne ⃗ C = V − ( ⃗ N ∪ ⃗ N ) and ⃗ D = ⃗ N − ( ⃗ N ∪ ⃗ N ) . Note that ⃗ C = ⃗ A ∪ ⃗ D ∪ ⃗ X .We have that for all ⃗ c ∈ R ( ⃗ C ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ N ← ⃗ n , ⃗ C ← ⃗ c ] ⃗ N = ⃗ n . In particular, we have that for all ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ N ← ⃗ n , ⃗ A ← ⃗ a ] ⃗ N = ⃗ n . Combined with the conclusionfrom the previous paragraph, it follows that for all ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ A ← ⃗ a ] ⃗ N = ⃗ n ∧ ⃗ N = ⃗ n .Deﬁning ⃗ N k + = ⃗ Y , we can generalize this reasoning for all consecutive i ∈ { , . . . , k + } to get the desired outcome. Deﬁning Causation using Suﬃciency

Theorem 5.3:

The following are all equivalences among the twelve definitions and the three HP definitions:

• Modified HP iff Def 1
• Def 2 iff Def 5
• Def 8 iff Def 11
• Def 3 iff Def 6 iff Def 9 iff Def 12

Proof:

First we consider the equivalences that do hold.

We start with the first equivalence: Modified HP iff Def 1. This is simply a matter of explicitly writing out the definitions, starting with actual weak sufficiency: $\vec{X} = \vec{x}$ is actually weakly sufficient for $Y = y$ in $(M, \vec{u})$ iff $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}] Y = y$. Next we note that the following condition is trivially satisfied for any $\vec{W} \subseteq \mathcal{V}$: $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*] Y = y$.

Combining both claims, we can rewrite Modified HP as follows, which gives the desired result:

AC2(a). There is a set $\vec{W} \subseteq (\mathcal{V} - (\vec{X} \cup \{Y\}))$ and a setting $\vec{x}'$ of the variables in $\vec{X}$ such that $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ is not actually weakly sufficient for $Y = y$ in $(M, \vec{u})$.

AC2(b). $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$ is actually weakly sufficient for $Y = y$ in $(M, \vec{u})$.

Next we consider all of the following equivalences: Def 2 iff Def 5, Def 8 iff Def 11, Def 3 iff Def 6, Def 9 iff Def 12. We can group these together because all of them can be proved by invoking the following observation and two subsequent lemmas.

Observation 1

Recall our restriction on causal models that exogenous variables only appear in equations of the form $V = U$. Say $\vec{R} \subseteq \mathcal{V}$ are all variables which have such an equation, and call these the root variables. It is clear that if we intervene on all of the root variables, they take over the role of the exogenous variables. Concretely, given strong recursivity, for any setting $\vec{r} \in \mathcal{R}(\vec{R})$ there exists a unique setting $\vec{v} \in \mathcal{R}(\mathcal{V})$ so that for all contexts $\vec{u} \in \mathcal{R}(\mathcal{U})$ we have that $(M, \vec{u}) \models [\vec{R} \leftarrow \vec{r}] \mathcal{V} = \vec{v}$.

Lemma A.1:

Given a setting $\vec{X} = \vec{x}$, a setting $\vec{N} = \vec{n}$ that includes $Y = y$ and such that $\vec{N} \cap \vec{R} = \emptyset$, and a context $\vec{u}$, the following holds:
• $\vec{X} = \vec{x}$ is actually directly sufficient for $Y = y$ in $(M, \vec{u})$ iff $\vec{X} = \vec{x}$ is directly sufficient for $Y = y$ in $M$;
• $\vec{X} = \vec{x}$ is actually strongly sufficient for $Y = y$ in $(M, \vec{u})$ along $\vec{N} = \vec{n}$ iff $\vec{X} = \vec{x}$ is strongly sufficient for $Y = y$ in $M$ along $\vec{N} = \vec{n}$.
Proof:

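Filling in the definitions of direct and actually direct sufficiency, the first equivalence reduces to the following: for all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \{Y\}))$, it holds that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}] Y = y$ iff for all $\vec{u}'' \in \mathcal{R}(\mathcal{U})$, $(M, \vec{u}'') \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}] Y = y$.

Because of Observation 1, we have that for any setting $\vec{v} \in \mathcal{R}(\mathcal{V})$ and any setting $\vec{r} \in \mathcal{R}(\vec{R})$, it holds that $(M, \vec{u}) \models [\vec{R} \leftarrow \vec{r}] \mathcal{V} = \vec{v}$ iff for all contexts $\vec{u}'' \in \mathcal{R}(\mathcal{U})$, $(M, \vec{u}'') \models [\vec{R} \leftarrow \vec{r}] \mathcal{V} = \vec{v}$. Combining this with the fact that $\vec{R} \subseteq (\vec{C} \cup \vec{X})$ gives the desired result.

The second equivalence can be reformulated as follows: $\vec{X} = \vec{x}$ is actually directly sufficient for $\vec{N} = \vec{n}$ in $(M, \vec{u})$ iff $\vec{X} = \vec{x}$ is directly sufficient for $\vec{N} = \vec{n}$ in $M$. In turn, this reduces to: for all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{N}))$, it holds that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}] \vec{N} = \vec{n}$ iff for all $\vec{u}'' \in \mathcal{R}(\mathcal{U})$, $(M, \vec{u}'') \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}] \vec{N} = \vec{n}$. Given that $\vec{N} \cap \vec{R} = \emptyset$, we still have that $\vec{R} \subseteq (\vec{C} \cup \vec{X})$, and therefore we can apply the same reasoning as before.

Lemma A.2: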

For all twelve instances of the General Definition of Causation we can restrict ourselves to sets $\vec{N}$ so that $(\vec{N} - \{Y\}) \cap \vec{R} = \emptyset$.
Proof:

Let $\vec{A}$ denote $(\vec{N} - \{Y\}) \cap \vec{R}$ (where $\vec{R}$ is defined in Observation 1). For all definitions using either variants of direct or weak sufficiency the result follows immediately from the fact that $\vec{N} - \{Y\} = \emptyset$.

First consider the case where we use non-actual strong sufficiency (Def 5 or Def 11). In that case, AC2(b) can never be satisfied unless $\vec{A} = \emptyset$. To see why, note that for all $\vec{u}'' \in \mathcal{R}(\mathcal{U})$, it has to hold that $(M, \vec{u}'') \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*] \vec{A} = \vec{a}$. Since $\vec{A} \cap (\vec{X} \cup \vec{W}) = \emptyset$ and the equation for each element $A_i \in \vec{A}$ is of the form $A_i = U$ for some exogenous variable $U$, this is impossible. (Strictly speaking it is possible, namely if the range of $U$ consists only of the single value $a_i^*$. Although I did not make this explicit in Section 2, it is standard to assume that all variables have a range that contains at least two elements.)

Second consider the case where we use actual strong sufficiency and contrastive necessity (Def 2). (The case of Def 8 is entirely analogous.) Say we are considering a candidate cause $\vec{X} = \vec{x}$, a candidate witness $\vec{W} = \vec{w}^*$, contrast values $\vec{x}'$, and a setting $\vec{N} = \vec{n}$ that includes $Y = y$. Given AC1, we can safely assume that $\vec{n} = \vec{n}^*$.

I claim that the following holds, from which the result follows: $\vec{X} = \vec{x}$ satisfies AC2 using contrast values $\vec{x}'$, witness $\vec{W} = \vec{w}^*$, and network $\vec{N}$ iff $\vec{X} = \vec{x}$ satisfies AC2 using contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*, \vec{A} = \vec{a}^*)$, and network $\vec{N} - \vec{A}$.

Because $\vec{A} \subseteq \vec{R}$, we have that for any set $\vec{B} \subseteq (\mathcal{V} - \vec{A})$, and any setting $\vec{b} \in \mathcal{R}(\vec{B})$, $(M, \vec{u}) \models [\vec{B} \leftarrow \vec{b}] \vec{A} = \vec{a}^*$. Moreover, since $(M, \vec{u}) \models \vec{A} = \vec{a}^*$, for each setting $\vec{v} \in \mathcal{R}(\mathcal{V} - \vec{A})$ we also have that $(M, \vec{u}) \models [\vec{B} \leftarrow \vec{b}] (\mathcal{V} - \vec{A}) = \vec{v}$ iff $(M, \vec{u}) \models [\vec{B} \leftarrow \vec{b}, \vec{A} \leftarrow \vec{a}^*] (\mathcal{V} - \vec{A}) = \vec{v}$.

Using these observations and the fact that $\vec{A} \subseteq \vec{N}$, we get that the following two conditions are equivalent, from which the result follows as far as AC2(b) is concerned:

AC2(b). For all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{N}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}] \vec{N} = \vec{n}^*$.

AC2(b). For all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{N}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{C} \leftarrow \vec{c}] (\vec{N} - \vec{A}) = \vec{n}'$ (where $\vec{n}'$ is the restriction of $\vec{n}^*$ to $(\vec{N} - \vec{A})$).

Now we focus on AC2(a$^c$).

Let us first assume AC2(a$^c$) holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*, \vec{A} = \vec{a}^*)$, and network $\vec{N} - \vec{A}$. We need to show that it holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*)$, and network $\vec{N}$.

Consider some $\vec{S} \subseteq \vec{N}$ with $Y \in \vec{S}$. We need to find a $\vec{t} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{T} \leftarrow \vec{t}] \vec{S} \neq \vec{s}^*$. Define $\vec{S}_1 = \vec{S} - \vec{A}$, $\vec{S}_2 = \vec{S} \cap \vec{A}$, and $\vec{A}_1 = \vec{A} - \vec{S}_2$.

Since $\vec{S}_1 \subseteq (\vec{N} - \vec{A})$ with $Y \in \vec{S}_1$, we know that there exists some $\vec{t}_1 \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}_1))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T}_1 \leftarrow \vec{t}_1] \vec{S}_1 \neq \vec{s}_1^*$. Since $\vec{S}_1 \subseteq \vec{S}$, it also holds that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T}_1 \leftarrow \vec{t}_1] \vec{S} \neq \vec{s}^*$. Also, given our observations about $\vec{A}$, it follows that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A}_1 \leftarrow \vec{a}_1^*, \vec{T}_1 \leftarrow \vec{t}_1] \vec{S} \neq \vec{s}^*$. Lastly, note that $[\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}_1)] \cup \vec{A}_1 = \mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{S})$. Therefore we can choose $\vec{t} = (\vec{a}_1^*, \vec{t}_1)$.

Next we consider the other direction: assume AC2(a$^c$) holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $\vec{W} = \vec{w}^*$, and network $\vec{N}$. We need to show that it holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*, \vec{A} = \vec{a}^*)$, and network $\vec{N} - \vec{A}$.

Consider some $\vec{S} \subseteq (\vec{N} - \vec{A})$ with $Y \in \vec{S}$. We need to find a $\vec{t} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T} \leftarrow \vec{t}] \vec{S} \neq \vec{s}^*$.

Note that $(\vec{S} \cup \vec{A}) \subseteq \vec{N}$, and also $Y \in (\vec{S} \cup \vec{A})$. Therefore there exists some $\vec{t}_1 \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T}_1 \leftarrow \vec{t}_1] (\vec{S} \neq \vec{s}^* \vee \vec{A} \neq \vec{a}^*)$. It follows that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T}_1 \leftarrow \vec{t}_1] \vec{S} \neq \vec{s}^*$. Choosing $\vec{t} = \vec{t}_1$ gives the desired result.

Because of the above lemmas, all that remains is to show that the above equivalences hold also when $Y \in \vec{R}$.
This is accomplished by showing that settings of such variables do not have any cause, regardless of the definition one uses.

AC2(a) requires us to look at all subsets of $\vec{N} = \vec{n}$ that include $Y = y$, and verify that the candidate cause and witness $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ (or candidate witness $\vec{W} = \vec{w}^*$ in case we use AC2(a$^m$)) is not sufficient for that subset. One such subset is the one containing just $Y = y$. By AC1, we have that $(M, \vec{u}) \models Y = y$. Since $Y \in \vec{R}$, there is no intervention on the other endogenous variables so that $Y \neq y$ under that intervention in $\vec{u}$. Therefore any definition of causation using a version of actual sufficiency (i.e., Def 2, Def 3, Def 8, and Def 9) considers all sets that do not include $Y$ to be sufficient for $Y = y$ in $(M, \vec{u})$. In particular, they consider $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ to be sufficient for $Y = y$ in $(M, \vec{u})$, and thus fail to meet condition AC2(a).

For the definitions using non-actual variants of sufficiency (Def 5, Def 6, Def 11, and Def 12), it is condition AC2(b) that can never be satisfied. Analogous to what we saw in the proof of Lemma A.2, this follows from the fact that whatever version of sufficiency we use, $Y = y$ has to hold in all contexts, which is impossible given that $Y \notin (\vec{X} \cup \vec{W})$. From this the result follows.

Now we prove the only remaining equivalence: Def 6 iff Def 12. (Given the previous equivalences, other choices are possible too.) We need to show that the following two statements are equivalent:
• $\vec{W} = \vec{w}^*$ is not directly sufficient for $Y = y$.
• There exist values $\vec{x}'$ of $\vec{X}$ such that $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$.
Filling in Definition 4.1, the result follows immediately:
• There exists a $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{W} \cup \vec{X} \cup \{Y\}))$, a $\vec{x}' \in \mathcal{R}(\vec{X})$, and a $\vec{u}' \in \mathcal{R}(\mathcal{U})$ so that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*, \vec{X} \leftarrow \vec{x}', \vec{C} \leftarrow \vec{c}] Y \neq y$.
• There exist values $\vec{x}'$ of $\vec{X}$, a $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{W} \cup \vec{X} \cup \{Y\}))$ and a $\vec{u}' \in \mathcal{R}(\mathcal{U})$ so that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*, \vec{X} \leftarrow \vec{x}', \vec{C} \leftarrow \vec{c}] Y \neq y$.

Second, we go over some examples to show that none of the other equivalences hold. (Obviously, from now on we may ignore Def 1, Def 5, Def 6, Def 7, Def 9, Def 11, and Def 12.)

Example A.3:

Equations: Y = ( X ∧ A ) ∨ D , D = A . Context: A =

1. Then X = Y = Modiﬁed HP : We can always consider choosing ⃗ W = ∅ , in which casewe simply get counterfactual dependence: ( M, ⃗ u ) ⊧ ⃗ X = ⃗ x ∧ Y = ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x ′ ] Y ≠ y . Doing so in this example, we see that Y = ( X = , D = ) . There is clearly also nowitness ⃗ W = ⃗ w ∗ to show that X = D = X = • Updated HP and

Original HP : taking ( A = , D = ) as a witnessmeets the conditions. • Def 3 : again take ( A = , D = ) as a witness. • Def 2 : follows from the previous item and Theorem 5.5. • Def 8 : follows from the previous item and Theorem 5.5. X = Y = • Def 10 : X = Y = A = A or D to the witness. Butboth A = D = Y = • Def 4 : ( X = , A = ) and ( X = , D = ) also weakly suﬃce for Y = Def 4 and

Def 10 are not equivalent to any of the other definitions. We give an example to show that

Def 4 and

Def 10 are not equivalent to each other either.

Example A.4:

Equations: Y = X ∧ A , X = A . Context: A =

1. Since X = Y =

1, we need to include A = ( X = , A = ) is weakly suﬃcient for Y =

1. However, so is A =

1, and therefore X = Y = Def 10 . Yet ( X = , A = ) is notweakly suﬃcient for Y =

1, and therefore X = Y = Def4 . This leaves us with the HP deﬁnitions,

Def 2 , Def 3 , and

Def 8. The next example shows that the former are not equivalent to the latter.

Example A.5:

Equations: Y = ( X ∧ ¬ A ) ∨ D , D = A . Context: A =

1. Then X = Y = • Modiﬁed HP : Y = ( X = , A = ) , andnot on either X = A =

1. So X = • Updated HP and

Original : take A = X = Y = • Def 3 : X = Y = [ A ← , D ← ] ), so we need to add A or D to the witness. Since theactual value of A is 1, it is of no use, which leaves us with D . But D = Y = ( X = , D = ) .29 Def 2 : follows from the previous item and Proposition 6.2. • Def 8 : follows from the previous item and Proposition 6.2.That none of the HP deﬁnitions are equivalent is of course a well-establishedfact, and also follows from the examples we consider in Section 7. Therefore weare left with showing that

Def 2 , Def 3 , and

Def 8 are not equivalent. That

Def 3 differs from the other two is a direct consequence of some of our later results, but a simple example illustrates this as well.

Example A.6:

Equations: Y = A , A = X . Context: A =

1. Then it is easy tosee that X = Y = Def 3 .Lastly, I refer the reader to Example 7.2 in Sections 7 for an example thatshows

Def 2 and

Def 8 are not equivalent.

Proposition 5.4: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to a definition that uses minimal necessity, then $\vec{X}$ is a singleton.
Proof:

Since we know that Def 7 is unsatisfiable and we have Theorem 5.3, we only need to consider Def 3, Def 8, and Def 10. The following applies to both weak and direct sufficiency (i.e., Def 3 and Def 10).

Assume $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$, and $\vec{W} = \vec{w}^*$ is not sufficient for $Y = y$. If either $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ or $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is also sufficient for $Y = y$, then $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2)$ is not minimal.

So let us assume that neither $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ nor $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$. This means we can move $\vec{X}_2$ to the witness to show that $\vec{X}_1 = \vec{x}_1$ satisfies AC2 by itself, and likewise for $\vec{X}_1$ and $\vec{X}_2$ reversed. From this the result follows.

Now we prove that it also holds for strong sufficiency, i.e., for Def 8. Assume $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$, and $\vec{W} = \vec{w}^*$ is not sufficient for $Y = y$ along any network $\vec{S} \subseteq \vec{N}$. If either $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ or $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is also sufficient for $Y = y$ along $\vec{N}$, then $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2)$ is not minimal.

So let us assume that neither $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ nor $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$. If the same is true for all subnetworks $\vec{S} \subseteq \vec{N}$, then as before, we can move either one of $\vec{X}_1$ and $\vec{X}_2$ to the witness to show that the other satisfies AC2 by itself.

So let us assume that there is some subnetwork $\vec{S}' \subseteq \vec{N}$ such that $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{S}'$. (Obviously the same reasoning applies to $\vec{X}_2$.) Since all subnetworks $\vec{S}''$ of $\vec{S}'$ are also subnetworks of $\vec{N}$, it follows from the above that $(\vec{X}_1 = \vec{x}_1)$ satisfies AC2 by itself when taking $\vec{W}$ as witness and $\vec{S}'$ as network. From this the result follows.

Theorem 5.5:

The only implications, involving either causes or parts of causes, between the remaining five definitions (Def 2, Def 3, Def 4, Def 8, and Def 10) and the three HP definitions are the following ones (and their immediate consequences, of course):
• If part of Modified HP then Updated HP;
• If part of Updated HP then Original HP;
• If Def 3 then Def 2;
• If part of Def 2 then Def 8;
• If Def 3 then Original HP;
• If Def 10 then Def 4.
Proof:

The first two implications are proven in (Halpern, 2016).

First we prove the third implication. Assume $\vec{X} = \vec{x}$ causes $Y = y$ with witness $\vec{W}$ according to Def 3. It follows from Proposition 5.4 that $\vec{X}$ is a single conjunct $X$. Note that this immediately implies minimality of $\vec{X}$. In other words, $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, and there exists some $x'$ such that $(X = x', \vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$. From the former it follows that $(X = x, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\emptyset$. From the latter it follows that $(X = x', \vec{W} = \vec{w}^*)$ is not strongly sufficient for $Y = y$ along $\emptyset$, from which the result follows.

Second we prove the fourth implication. Assume $(X = x, \vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$, and $(X = x', \vec{X}_2 = \vec{x}_2', \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along any network $\vec{S} \subseteq \vec{N}$, for some $\vec{N}$, $x'$ and $\vec{x}_2'$. We show that $X = x$ causes $Y = y$ according to Def 8.

Taking $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ as our witness and using $\vec{N}$, AC2(b) remains unchanged. If $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along any network $\vec{S} \subseteq \vec{N}$, then the result follows. We proceed by a reductio.

Let us assume that $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along some $\vec{S} \subseteq \vec{N}$. If $(\vec{X}_2 = \vec{x}_2', \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along any $\vec{S}'' \subseteq \vec{S}$, we have a violation of minimality (since $X$ is redundant). Therefore we know that $(\vec{X}_2 = \vec{x}_2', \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along some network $\vec{S}'' \subseteq \vec{S}$.

This means that there exist values $\vec{s}'' \in \mathcal{R}(\vec{S}'')$ so that for all settings $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{S}'' \cup \vec{X}_2 \cup \{X, Y\}))$, and for all $x'' \in \mathcal{R}(X)$, it holds that $(M, \vec{u}) \models [\vec{X}_2 \leftarrow \vec{x}_2', \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}, X \leftarrow x''] \vec{S}'' = \vec{s}''$ and $(M, \vec{u}) \models [\vec{X}_2 \leftarrow \vec{x}_2', \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}, X \leftarrow x'', \vec{S}'' \leftarrow \vec{s}''] Y = y$. In particular, this holds if we choose $X = x'$. But that means that $(X = x', \vec{X}_2 = \vec{x}_2', \vec{W} = \vec{w}^*)$ is also sufficient for $Y = y$ along $\vec{S}''$, which contradicts our starting assumption.

Third we prove the fifth implication. As with the third implication, assume that $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, and there exists some $x'$ such that $(X = x', \vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$. From the latter it follows that there exists a setting $\vec{d}$ of $\mathcal{V} - (\vec{X} \cup \vec{W} \cup \{Y\})$ such that $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*, \vec{D} \leftarrow \vec{d}] Y \neq y$. This means that if we take $(\vec{W} = \vec{w}^*, \vec{D} = \vec{d})$ as witness, AC2(a) is satisfied for Original HP. Since $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, we know that $(M, \vec{u}) \models [X \leftarrow x, \vec{W} \leftarrow \vec{w}^*, \vec{D} \leftarrow \vec{d}] Y = y$. Also, we have that $\vec{Z} = \vec{X}$, and thus the former means that also AC2(b) is satisfied for Original HP.

Fourth we prove the last implication. Assume $X = x$ causes $Y = y$ with witness $\vec{W}$ according to Def 10. (We know because of Proposition 5.4 that $\vec{X}$ is a singleton.) In other words, $(X = x, \vec{W} = \vec{w}^*)$ is weakly sufficient for $Y = y$, and $\vec{W} = \vec{w}^*$ is not weakly sufficient for $Y = y$. It remains to be shown that there exists a value $x'$ so that $(X = x', \vec{W} = \vec{w}^*)$ is not weakly sufficient for $Y = y$. Say $\vec{u}'$ is a context such that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*] Y \neq y$, and say $x'$ is the unique value such that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*] X = x'$. Then also $(M, \vec{u}') \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*] Y \neq y$, which is what remained to be shown.

Fifth, we show that none of the remaining implications hold. (Again, we do not consider the relations amongst the HP definitions explicitly and refer the reader to the examples in Section 7. We also do not explicitly consider the remaining implications for parts of causes, but the reader can verify that the following examples suffice to falsify all those implications as well.
For the left-hand side of all implications this follows immediately from the fact that the causes in all the following examples are singletons. For the right-hand side of implications, Propositions 5.4, 6.1, and 6.2 come in handy.)

Example A.4 shows that Def 4 does not imply Def 10.

Example A.3 shows that none of the other definitions imply either Def 4 or Def 10. So there are no remaining implications with either Def 4 or Def 10 on the right-hand side.

Example A.6 shows that Def 3 is not implied by any definition.

Example A.5 shows that none of the HP definitions imply

Def 2 or Def8 . Note that

Def 4 and

Def 10 also consider X = Y = X = Y =

1, whereas X = Def 8 does not imply

Def 2. Therefore there are no remaining implications with Def 2 or Def 8 on the right-hand side.

That leaves us to consider implications with one of the HP definitions on the right-hand side. Given the first two implications of Theorem 5.5, it suffices to show that none of Def 4, Def 2, Def 8, or Def 10 imply Original HP, and that Def 3 does not imply Updated HP.

I refer the reader to Example 7.8 in Section 7 for an example where Def 2, and thus also Def 8, hold and Original HP does not.

The following example shows that neither Def 4 nor Def 10 implies Original HP.

Example A.7:

Equations: Y = Z ∨ Z ∨ A , Z = X ∧ A , Z = X ∧ ¬ A . Context: A = X =

1. Then X = Y = • Def 10 : X = Y = ∅ is not. • Def 4 : follows from the previous one.Yet X = Y = Original HP . To see why,note that we need to include A = Z . Also, we clearly cannot add Z =

1. Therefore the witnesshas to be A =

0. The actual value of Z is 0. Since we have ( M, ⃗ u ) ⊧ [ X ← , A ← , Z ← ] Y =

0, AC2(b) is not satisﬁed.32astly, an example to show that

Def 3 does not imply

Updated HP . Example A.8:

Equations: Y = ( X ∧ D ) ∨ A , D = A . Context: A = X = X = Y = Def 3 : ( X = , D = ) is directlysuﬃcient for Y =

1, and ( X = , D = ) is not. But X = Y = Updated HP . To see why, note that we need to include A = ( M, ⃗ u ) ⊧ [ X ← , A ← ] Y = Updated HP . Excluding Def 3 and Def 10

Proposition 6.1: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to Def 3, then $\vec{X}$ is a singleton, and $X$ is a parent of $Y$.
Proof:

That $\vec{X}$ is always a singleton is a direct consequence of the combination of Proposition 5.4 and Theorem 5.3.

Recall that $X$ is a parent of $Y$ iff there exists a context $\vec{u}''$, a setting $\vec{z} \in \mathcal{R}(\mathcal{V} - \{X, Y\})$, and values $x, x''$ of $X$ so that $F_Y(\vec{u}'', \vec{z}, x) \neq F_Y(\vec{u}'', \vec{z}, x'')$. This means precisely that for some $y \in \mathcal{R}(Y)$, $(M, \vec{u}'') \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x] Y = y$ and $(M, \vec{u}'') \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x''] Y \neq y$. If $X = x$ causes $Y = y$ according to Def 3, the existence of values such that the previous holds follows immediately.
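The parent condition recalled in this proof is a finite search when all ranges are finite, which the following minimal Python sketch illustrates. The function name `is_parent`, the two example equations, and the binary ranges are assumptions made for illustration, not part of the paper; the argument `others` stands for all remaining inputs of $F_Y$ (the context $\vec{u}''$ together with the setting $\vec{z}$).

```python
from itertools import product
from typing import Callable, Dict, Sequence

Values = Dict[str, int]


def is_parent(f_y: Callable[[Values], int], x: str,
              others: Sequence[str],
              ranges: Dict[str, Sequence[int]]) -> bool:
    """Return True iff some setting of the other inputs makes F_Y
    take different values for two different values of x."""
    for combo in product(*(ranges[name] for name in others)):
        fixed = dict(zip(others, combo))
        outputs = {f_y({**fixed, x: value}) for value in ranges[x]}
        if len(outputs) > 1:  # F_Y(u'', z, x) != F_Y(u'', z, x'')
            return True
    return False


def f_uses_x(v: Values) -> int:
    # Hypothetical equation Y = (X and A) or D.
    return int((v["X"] and v["A"]) or v["D"])


def f_ignores_x(v: Values) -> int:
    # Hypothetical equation Y = A or D, which never reads X.
    return int(v["A"] or v["D"])


RANGES = {"X": (0, 1), "A": (0, 1), "D": (0, 1)}

x_is_parent = is_parent(f_uses_x, "X", ["A", "D"], RANGES)
x_not_parent = is_parent(f_ignores_x, "X", ["A", "D"], RANGES)
```

For the first equation the setting $A = 1$, $D = 0$ makes the output follow $X$, so `x_is_parent` is true; for the second equation no setting does, so `x_not_parent` is false.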

Proposition 6.2: If $X$ is only a parent of $Y$, then Def 3, Def 2, and Def 8 are all equivalent for causes $X = x$.
Proof:

Given Theorem 5.5, we only need to prove the implication from Def 8 to Def 3.

Assume $X$ is only a parent of $Y$, and $X = x$ causes $Y = y$ according to Def 8. Thus, there is a witness $\vec{W}$ and some network $\vec{N}$ such that $(X = x, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\vec{N}$, and $(\vec{W} = \vec{w}^*)$ is not strongly sufficient for $Y = y$ along any subnetwork of $\vec{N}$.

First consider the case where $\vec{N} = \emptyset$. This means that $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, and $(\vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$. That means precisely that $X = x$ causes $Y = y$ according to Def 12. The result now follows from Theorem 5.3.

Second consider the case where there exists some $N \in \vec{N}$. If $N$ is not an ancestor of $Y$, it can be removed from $\vec{N}$ without consequence. If $N$ is an ancestor of $Y$, then it cannot be a descendant of $X$. But in that case it does not depend on $X$, and thus we can remove it from $\vec{N}$ and add it to the witness $\vec{W}$ without consequence. Therefore there always exists a choice of witness so that $\vec{N} = \emptyset$, and thus the result follows.

Proposition 6.3:

Out of all definitions we have considered, Def 10 and Def 3 are the only ones which do not satisfy Dependence.
Proof:

For the HP definitions this is proven in (Halpern, 2016, p. 26).

Example A.6 shows the result for Def 3.

Example A.4 shows the result for Def 10.

Therefore it remains to be shown that Dependence implies Def 2, Def 4, and Def 8. This is a direct consequence of the fact that Dependence implies Modified HP, combined with Proposition 7.5.
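As a small illustration of the premise of Dependence, the sketch below checks counterfactual dependence in the sense used in the proof of Theorem 5.3 with an empty witness: $(M, \vec{u}) \models X = x \wedge Y = y$ and $(M, \vec{u}) \models [X \leftarrow x'] Y \neq y$ for some $x'$. The toy model ($X = U_X$, $A = U_A$, $Y = X \vee A$), the contexts, and the helper names `solve` and `counterfactually_depends` are assumptions for illustration only, and the model is assumed to be recursive with the endogenous variables given in topological order.

```python
from typing import Callable, Dict, List, Optional, Sequence

Values = Dict[str, int]
Equations = Dict[str, Callable[[Values], int]]


def solve(equations: Equations, order: List[str], context: Values,
          intervention: Optional[Values] = None) -> Values:
    """Solve a recursive model; intervened variables keep the forced value."""
    intervention = intervention or {}
    values = dict(context)
    for var in order:
        values[var] = (intervention[var] if var in intervention
                       else equations[var](values))
    return values


def counterfactually_depends(equations: Equations, order: List[str],
                             context: Values, x_var: str,
                             x_range: Sequence[int], y_var: str) -> bool:
    """True iff changing x_var alone to some other value changes y_var."""
    actual = solve(equations, order, context)
    return any(
        solve(equations, order, context, {x_var: alt})[y_var] != actual[y_var]
        for alt in x_range
        if alt != actual[x_var]
    )


# Hypothetical toy model: Y = X or A.
EQS: Equations = {
    "X": lambda v: v["U_X"],
    "A": lambda v: v["U_A"],
    "Y": lambda v: int(v["X"] or v["A"]),
}
ORDER = ["X", "A", "Y"]

# With A = 0 the outcome Y = 1 hinges on X = 1.
depends = counterfactually_depends(
    EQS, ORDER, {"U_X": 1, "U_A": 0}, "X", (0, 1), "Y")
# With A = 1 the outcome is overdetermined and dependence fails.
overdetermined = counterfactually_depends(
    EQS, ORDER, {"U_X": 1, "U_A": 1}, "X", (0, 1), "Y")
```

The second context is the kind of overdetermined case in which plain dependence fails, which is why the definitions compared here allow a witness in addition to the candidate cause.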

Def 2, Def 4, and Def 8, vs the HP definitions

Proposition 7.5: If Modified HP with $\vec{X}$ a singleton, then Def 2, Def 4, and Def 8.
Proof:

Recall the root variables $\vec{R}$ from Observation 1. Note that for any setting $\vec{r} \in \mathcal{R}(\vec{R})$, for any set $\vec{Y} \subseteq (\mathcal{V} - \vec{R})$, there exists some $\vec{y}$ so that $\vec{R} = \vec{r}$ is weakly, actually weakly, and strongly sufficient for $\vec{Y} = \vec{y}$.

Assume $X = x$ causes $Y = y$ according to Modified HP with witness $\vec{W}$. This means there exists a $x'$ so that $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*] Y \neq y$. Let $\vec{S} = \vec{R} - (\vec{W} \cup \{X\})$.

First we focus on Def 4. Note that $(X = x, \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is weakly sufficient for $Y = y$. Furthermore, changing $X$ from $x$ to $x'$ obviously has no effect on any of the values in $\vec{R}$. Therefore $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*] \vec{S} = \vec{s}^*$, and thus we get that $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*, \vec{S} \leftarrow \vec{s}^*] Y \neq y$. (Also, we may assume that $\vec{W} \cap \vec{R} = \emptyset$.) From this it follows that $(X = x', \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is not weakly sufficient for $Y = y$. So taking $(\vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ as witness gives the desired result.

Second we focus on Def 2 (from which Def 8 follows due to Theorem 5.5). Combining the previous statement about $(X = x', \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ with Proposition 4.6 it follows immediately that there does not exist any network $\vec{N}$ so that $(X = x', \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\vec{N}$.

Clearly there exists some $\vec{N}$ so that $\vec{R} = \vec{r}^*$ is strongly sufficient for $Y = y$ along $\vec{N}$. (We can start by picking parents $\vec{A}$ of $Y = y$ such that $\vec{A} = \vec{a}^*$ is directly sufficient for $Y = y$. Then we can take parents of all elements in $\vec{A}$, to get a set $\vec{B}$ so that $\vec{B} = \vec{b}^*$ is directly sufficient for $\vec{A} = \vec{a}^*$, etc.) But then also $(X = x, \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\vec{N}$, from which the result follows.

Acknowledgements

Many thanks to Joe Halpern and Naftali Weinberger for helpful comments on earlier versions of this paper. This research was made possible by funding from the Alexander von Humboldt Foundation.

References