Causal Sufficiency and Actual Causation

(Preprint of paper to appear in the Journal of Philosophical Logic)

Sander Beckers
Munich Center for Mathematical Philosophy, LMU
Abstract
Pearl opened the door to formally defining actual causation using causal models. His approach rests on two strategies: first, capturing the widespread intuition that $X = x$ causes $Y = y$ iff $X = x$ is a Necessary Element of a Sufficient Set for $Y = y$, and second, showing that his definition gives intuitive answers on a wide set of problem cases. This inspired dozens of variations of his definition of actual causation, the most prominent of which are due to Halpern & Pearl. Yet all of them ignore Pearl's first strategy, and the second strategy taken by itself is unable to deliver a consensus. This paper offers a way out by going back to the first strategy: it offers six formal definitions of causal sufficiency and two interpretations of necessity. Combining the two gives twelve new definitions of actual causation. Several interesting results about these definitions and their relation to the various Halpern & Pearl definitions are presented. Afterwards the second strategy is evaluated as well. In order to maximize neutrality, the paper relies mostly on the examples and intuitions of Halpern & Pearl. One definition comes out as being superior to all others, and is therefore suggested as a new definition of actual causation.

Keywords: Actual Causation; Causal Sufficiency; NESS; Counterfactuals
Two decades have passed since Judea Pearl's groundbreaking book on causality was published (Pearl, 2000). It offers a formal account of causal models that led causal modeling to become a central part of Artificial Intelligence. One of the book's most important applications for philosophy is its formal definition of actual causation, i.e., causation of particular events.

Pearl defends his account of actual causation using two strategies. The first strategy starts with the widely shared intuition that $X = x$ causes $Y = y$ iff $X = x$ is a Necessary Element of a Sufficient Set for $Y = y$ (the NESS intuition, from now on). Pearl claims that using causal models allows one to make this intuition formally precise, whereas existing logical notions of necessity and sufficiency lack the resources to do so. The second strategy is to demonstrate that his formal account offers intuitive verdicts for a number of problematic examples.

(Footnote: This acronym was coined by Wright (1988), but Pearl does not intend to formalize the specific manner in which Wright understood it, nor do I in the current paper. I have formalized Wright's interpretation of the NESS definition elsewhere, in the process of developing another definition of causation (Beckers, 2021). The latter definition is in many ways a simplification of the definition that I defend here. The precise relation between these two definitions is the subject of future work.)

(Footnote: Mackie (1965) formulates the same intuition differently, resulting in the equally famous INUS acronym. See Wright (2011) for a detailed discussion of the subtle differences between them.)

Ever since, Pearl's account has come under severe criticism. By now there are dozens of papers, both from philosophers and from researchers in AI, attempting to improve upon his account. Most prominently, Pearl himself has offered several revisions of his account in collaboration with Halpern, culminating in the most recent revision by Halpern individually (Pearl, 2009; Halpern and Pearl, 2001, 2005; Halpern, 2015, 2016). Together these accounts of causation are referred to as the Halpern & Pearl definitions, or HP definitions for short, and they are by far the most influential accounts of causation out there.

(Footnote: Just to name some of the most influential ones: Hitchcock (2001, 2007); Woodward (2003); Hall (2007); Weslake (2015).)

The problem with all of these attempts at revising Pearl's initial account is that they completely ignore the first strategy and focus almost exclusively on the second strategy. Roughly put, the typical setup is to go over some examples for which existing definitions give counterintuitive answers, and then to construct a new definition that does not do so. It is unrealistic to expect that this second strategy in and of itself can deliver a satisfactory account of causation, because there are too many examples and even more intuitions (Glymour et al., 2010; Beckers and Vennekens, 2018).

To solve this problem, this paper starts out with an explicit focus on the first strategy. It is striking that immediately after discussing the NESS intuition, Pearl diverges into complicated technical notions like "sustenance" and "causal beams" and never looks back, be it in his book or in the subsequent work on the HP definitions. Instead I offer what is the most natural route down the first strategy, namely to look at formalizations of causal sufficiency (as opposed to logical sufficiency) and combine them with two interpretations of necessity. Taken together this results in twelve distinct formal definitions of actual causation.

These definitions are compared to each other and to the HP definitions, leading to several interesting results. For one, it turns out that one of these twelve definitions is equivalent to the most recent HP definition (Halpern, 2015, 2016). Therefore this paper is the first to show that one of the HP definitions succeeds in delivering Pearl's promise. At the same time, it also shows that the other HP definitions do not.

$X = x$ causes $Y = y$ iff there is a set $\vec{W} = \vec{w}$ so that $(X = x, \vec{W} = \vec{w})$ is sufficient for $Y = y$ along a causal network $\vec{N}$ and there exists some value $x'$ so that $(X = x', \vec{W} = \vec{w})$ is not sufficient for $Y = y$ along any causal subnetwork of $\vec{N}$.

This paper is laid out as follows. The next section introduces structural equations models, the formal causal models that are used to express all the definitions. Then I state the three most recent HP definitions in Section 3. Section 4 presents six notions of causal sufficiency and shows how they relate to each other. We then use these six notions to formalize actual causation along the NESS intuition in Section 5, and discuss several interesting results. After this theoretical groundwork, we start looking for the best definition. Two definitions are discarded by showing that they have certain unacceptable properties in Section 6. Finally, Section 7 compares the remaining definitions to each other and to the HP definitions by considering examples from Halpern & Pearl and a few additional ones.

This section reviews the definition of causal models as they were introduced by Pearl (2000). Much of the discussion and notation is taken from Halpern (2016) with little change.
Definition 2.1: A signature $\mathcal{S}$ is a tuple $(\mathcal{U}, \mathcal{V}, \mathcal{R})$, where $\mathcal{U}$ is a set of exogenous variables, $\mathcal{V}$ is a set of endogenous variables, and $\mathcal{R}$ a function that associates with every variable $Y \in \mathcal{U} \cup \mathcal{V}$ a nonempty set $\mathcal{R}(Y)$ of possible values for $Y$ (i.e., the set of values over which $Y$ ranges). If $\vec{X} = (X_1, \ldots, X_n)$, $\mathcal{R}(\vec{X})$ denotes the crossproduct $\mathcal{R}(X_1) \times \cdots \times \mathcal{R}(X_n)$.

Exogenous variables represent factors whose causal origins are outside the scope of the causal model, such as background conditions and noise. The values of the endogenous variables, on the other hand, are causally determined by other variables within the model (both endogenous and exogenous).

Definition 2.2: A causal model $M$ is a pair $(\mathcal{S}, \mathcal{F})$, where $\mathcal{S}$ is a signature and $\mathcal{F}$ defines a function that associates with each endogenous variable $X$ a structural equation $F_X$ giving the value of $X$ in terms of the values of other endogenous and exogenous variables. Formally, the equation $F_X$ maps $\mathcal{R}(\mathcal{U} \cup \mathcal{V} - \{X\})$ to $\mathcal{R}(X)$, so $F_X$ determines the value of $X$, given the values of all the other variables in $\mathcal{U} \cup \mathcal{V}$.

Note that there are no functions associated with exogenous variables; their values are determined outside the model. We call a setting $\vec{u} \in \mathcal{R}(\mathcal{U})$ of values of exogenous variables a context.

The value of $X$ may depend on the values of only a few other variables. $X$ depends on $Y$ in context $\vec{u}$ if there is some setting of the endogenous variables other than $X$ and $Y$ such that if the exogenous variables have value $\vec{u}$, then varying the value of $Y$ in that context results in a variation in the value of $X$; that is, there is a setting $\vec{z}$ of the endogenous variables other than $X$ and $Y$ and values $y$ and $y'$ of $Y$ such that $F_X(y, \vec{z}, \vec{u}) \neq F_X(y', \vec{z}, \vec{u})$. We then say that $Y$ is a parent of $X$.

We extend this genealogical terminology in the usual manner, by taking the ancestor relation to be the transitive closure of the parent relation (i.e., $Y$ is an ancestor of $X$ iff there exist variables $V_1, \ldots, V_n$ so that $Y$ is a parent of $V_1$, $V_1$ is a parent of $V_2$, ..., and $V_n$ is a parent of $X$). The descendant relation is simply the reversal of the ancestor relation (i.e., $X$ is a descendant of $Y$ iff $Y$ is an ancestor of $X$). A path is a sequence of variables in which each element is a child of the previous element.

In this paper we restrict attention to strongly recursive (or strongly acyclic) models, that is, models where there is a partial order $\preceq$ on variables such that if $Y$ depends on $X$, then $X \prec Y$. In a strongly recursive model, given a context $\vec{u}$, the values of all the remaining variables are determined (we can just solve for the value of the variables in the order given by $\preceq$). We often write the equation for an endogenous variable as $X = f(\vec{Y})$; this denotes that the value of $X$ depends only on the values of the variables in $\vec{Y}$, and the connection is given by the function $f$. For example, we might have $X = Y + \ldots$

An intervention has the form $\vec{X} \leftarrow \vec{x}$, where $\vec{X}$ is a set of endogenous variables. Intuitively, this means that the values of the variables in $\vec{X}$ are set to the values $\vec{x}$. The structural equations define what happens in the presence of interventions. Setting the value of some variables $\vec{X}$ to $\vec{x}$ in a causal model $M = (\mathcal{S}, \mathcal{F})$ results in a new causal model, denoted $M_{\vec{X} \leftarrow \vec{x}}$, which is identical to $M$, except that $\mathcal{F}$ is replaced by $\mathcal{F}^{\vec{X} \leftarrow \vec{x}}$: for each variable $Y \notin \vec{X}$, $F^{\vec{X} \leftarrow \vec{x}}_Y = F_Y$ (i.e., the equation for $Y$ is unchanged), while for each $X'$ in $\vec{X}$, the equation $F_{X'}$ for $X'$ is replaced by $X' = x'$ (where $x'$ is the value in $\vec{x}$ corresponding to $X'$).

Given a signature $\mathcal{S} = (\mathcal{U}, \mathcal{V}, \mathcal{R})$, an atomic formula is a formula of the form $X = x$, for $X \in \mathcal{V}$ and $x \in \mathcal{R}(X)$. A causal formula (over $\mathcal{S}$) is one of the form $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\phi$, where
• $\phi$ is a Boolean combination of atomic formulas,
• $Y_1, \ldots, Y_k$ are distinct variables in $\mathcal{V}$, and
• $y_i \in \mathcal{R}(Y_i)$ for each $1 \leq i \leq k$.

Such a formula is abbreviated as $[\vec{Y} \leftarrow \vec{y}]\phi$. The special case where $k = 0$ is abbreviated as $\phi$. Intuitively, $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\phi$ says that $\phi$ would hold if $Y_i$ were set to $y_i$, for $i = 1, \ldots, k$.

A causal formula $\psi$ is true or false in a causal setting, which is a causal model given a context. As usual, we write $(M, \vec{u}) \models \psi$ if the causal formula $\psi$ is true in the causal setting $(M, \vec{u})$. The $\models$ relation is defined inductively. $(M, \vec{u}) \models X = x$ if the variable $X$ has value $x$ in the unique (since we are dealing with recursive models) solution to the equations in $M$ in context $\vec{u}$ (i.e., the unique vector of values that simultaneously satisfies all equations in $M$ with the variables in $\mathcal{U}$ set to $\vec{u}$). The truth of conjunctions and negations is defined in the standard way. Finally, $(M, \vec{u}) \models [\vec{Y} \leftarrow \vec{y}]\phi$ if $(M_{\vec{Y} \leftarrow \vec{y}}, \vec{u}) \models \phi$ (i.e., the intervention $\vec{Y} \leftarrow \vec{y}$ transforms $M$ into a new model $M_{\vec{Y} \leftarrow \vec{y}}$, in which we assess the truth of $\phi$).

Now on to the HP definitions. As Pearl's (2000) initial definition is a precursor to the HP definitions that gives less intuitive results and is far more complicated, I do not discuss it. (It is safe to say that by now it has been unanimously rejected.) Two of the HP definitions are developed by both Halpern and Pearl, whereas the third one is solely due to Halpern. The relations between them are extensively discussed by Halpern (2016).

The general form of all three definitions is as follows (where $\phi$ is a Boolean combination of atomic formulas):

Definition 3.1: $\vec{X} = \vec{x}$ is an actual cause of $\phi$ in $(M, \vec{u})$ if the following three conditions hold:
AC1. $(M, \vec{u}) \models (\vec{X} = \vec{x}) \wedge \phi$.
AC2. See below.
AC3. $\vec{X}$ is minimal; there is no strict subset $\vec{X}''$ of $\vec{X}$ such that $\vec{X}'' = \vec{x}''$ satisfies AC2, where $\vec{x}''$ is the restriction of $\vec{x}$ to the variables in $\vec{X}''$.

Questions of actual causation are posed relative to an actual context $\vec{u}$, because as we know from the previous section a context completely determines which events actually took place. So AC1 represents the trivial requirement that the candidate cause and effect are among the events which took place. AC3 is also fairly straightforward: we should not consider redundant elements to be parts of causes. The real content of the definition lies with AC2.

Throughout the rest of the paper, settings of variables $\vec{V}$ with superscript $*$ (i.e., $\vec{v}^*$) indicate that $(M, \vec{u}) \models (\vec{V} = \vec{v}^*)$. Settings of variables $\vec{V}$ with superscript $'$ (i.e., $\vec{v}'$) indicate that $(M, \vec{u}) \models (V \neq v')$ for each $V \in \vec{V}$. Settings of variables without any superscript can refer to any setting.

In line with the NESS intuition, we should expect AC2 to consist of formal variants of these two conditions:
AC2(b). There is a set $\vec{W}$ so that $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$ is causally sufficient for $\phi$.
AC2(a). $\vec{X} = \vec{x}$ is necessary for the sufficiency of $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$.

At first glance, the first two HP definitions seem to meet this expectation: they consist of conditions AC2(a) and AC2(b), and Halpern refers to these as a "necessity condition" and a "sufficiency condition" (2015, p. 3). Upon closer examination, however, it is hard to see how either version of AC2(b) can sensibly be interpreted as capturing causal sufficiency.

We start with Original HP (Halpern and Pearl, 2001):
Definition 3.2: [Original HP]
AC2(a). There is a partition of $\mathcal{V}$ into two sets $\vec{Z}$ and $\vec{W}$ with $\vec{X} \subseteq \vec{Z}$ and a setting $\vec{x}'$ and $\vec{w}$ of the variables in $\vec{X}$ and $\vec{W}$, respectively, such that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}] \neg\phi$.
AC2(b). For all subsets $\vec{Y}$ of $\vec{Z} - \vec{X}$, we have $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}, \vec{Y} \leftarrow \vec{y}^*] \phi$.
We call $\vec{W} = \vec{w}$ a witness of $\vec{X} = \vec{x}$ causing $Y = y$.

Note that one choice of $\vec{Y}$ for which the condition in AC2(b) is required to hold is $\vec{Y} = \emptyset$. For that choice, AC2 states that the effect counterfactually depends on the cause when holding fixed the witness $\vec{W} = \vec{w}$: $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}] \phi$ and $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}] \neg\phi$. Therefore AC2(a) can easily be interpreted as expressing a contrastive necessity condition: there exist contrast values $\vec{x}'$ such that if those values were to obtain, then AC2(b) no longer holds.

The problem lies with interpreting AC2(b) as expressing causal sufficiency. The main obstacle lies in the absence of the requirement that $\vec{w} = \vec{w}^*$, i.e., it is not required that the supposedly sufficient set of events $(\vec{X} = \vec{x}, \vec{W} = \vec{w})$ actually took place. Therefore we cannot simply view $(\vec{X} = \vec{x}, \vec{W} = \vec{w})$ itself as the causally sufficient set we are looking for. Although it cannot be excluded that the conditions imposed by invoking $\vec{Z}$ (and $\vec{Y}$) somehow ensure the existence of some other set that can be interpreted as a causally sufficient set, it is far from obvious that this is the case. This is confirmed by the fact that Halpern & Pearl do not even offer an attempt at giving an interpretation of AC2(b) as expressing causal sufficiency.

Matters get worse when we turn our attention to Updated HP (Halpern and Pearl, 2005):
Definition 3.3: [Updated HP]
AC2(a). Identical to the previous one.
AC2(b). For all subsets $\vec{V}$ of $\vec{W}$ and subsets $\vec{Y}$ of $\vec{Z} - \vec{X}$, we have $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{V} \leftarrow \vec{v}, \vec{Y} \leftarrow \vec{y}^*] \phi$ (where $\vec{v}$ is the restriction of $\vec{w}$ to $\vec{V}$).

(Footnote on the ordering of AC2(b) and AC2(a) in the two expected conditions stated before Definition 3.2: I list them unalphabetically for consistency with the HP definitions.)

We see that AC2(b) has become even more complicated, and yet no argument is given as to how this condition formalizes causal sufficiency, despite Halpern explicitly claiming that this is what it aims to do. Instead, the updated version is justified on the basis of examples for which the previous version gave counterintuitive answers.

As a sidenote, Halpern and Pearl (2005) also define strong causation by demanding that the following condition holds in addition to the other two:
AC2(c). For all $\vec{w} \in \mathcal{R}(\vec{W})$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}] \phi$.

This definition has received almost no attention in the literature, because according to Halpern & Pearl it is too strong. As we shall see, this is unfortunate, because AC2(c) does adequately capture a variant of causal sufficiency.

Finally we have Modified HP, which is far simpler than the previous two (Halpern, 2015).
Definition 3.4: [Modified HP]
AC2. There is a set $\vec{W}$ of variables in $\mathcal{V} - \vec{X}$, and a setting $\vec{x}'$ of the variables in $\vec{X}$ such that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*] \neg\phi$.

The crucial difference here is that Modified HP does require the witness to consist solely of events which actually took place, i.e., $\vec{w} = \vec{w}^*$. It is straightforward to show that simply adding this requirement ensures that both versions of AC2(b) are satisfied automatically, and therefore an explicit sufficiency condition is not required. Halpern considers this definition to be an improvement over the other two, and I agree with him. However, Halpern arrives at this conclusion based on the many examples in which it better agrees with intuition. As will become clear, another, and arguably more compelling, justification is to be found in the fact that it is the only definition of the three which has a natural interpretation as formalizing the NESS intuition with which we started. To get there, we need to step away from the HP definitions and start afresh.

(Footnote on Halpern's stated aim for AC2(b): Concretely, when discussing sufficient causality we find the following (Halpern, 2016, p. 53): "The key intuition behind the definition of sufficient causality is that not only does $\vec{X} = \vec{x}$ suffice to bring about $\phi$ in the actual context (which is the intuition that AC2(b) [from Original HP] and AC2(b) [from Updated HP] are trying to capture)...")

(Footnote on the judgment that strong causation is too strong: In retrospect, there is little basis for this judgment. They only discuss two examples in which strong causation diverges from Updated HP. In the first of those (Ex. 3.2), it fails to call the lighting of each of two matches ($ML_1 = ML_2 = 1$) a cause of a forest fire, whereas Updated HP does not. However, their conjunction ($ML_1 = 1, ML_2 = 1$) is a strong cause, and thus each of them is part of a strong cause. As we will see, Halpern later suggests treating "part of a cause" as being synonymous with "cause", so the point would be moot. In the second example (Ex. 5.5), discussed as Example 7.4 later on, $S =$ [...] Updated HP. This is an example of trumping causation, for which the majority opinion is that $S =$ [...] Modified HP also does not consider it a cause.)

4 Causal Sufficiency
1: Halpern (2016) suggests treating "part of a cause" (i.e., any $X = x$ that appears in $\vec{X} = \vec{x}$) as synonymous with "cause" when talking about Modified HP. I will follow this suggestion throughout whenever discussing the judgment of Modified HP in particular examples, unless stated otherwise. In stating theorems, however, the two are kept apart.

2: The HP definitions allow the effect to be any propositional formula $\phi$, whereas the other definitions of causation will require effects to be of the form $Y = y$. A thorough discussion of complex effects is beyond the scope of this paper. I here limit myself to two observations.
• Although the definitions of causation here developed can be generalized to allow for conjunctive effects (i.e., effects of the form $\vec{Y} = \vec{y}$), it is not at all clear that we should want to do so. The reason is that we can easily include variables into the effect that have nothing whatsoever to do with the causes. Say we have a variable $Y$ with equation $Y = U$, where $U$ is an exogenous variable, and we are considering a context where $U = 1$. Then for any cause-effect pair $\vec{X} = \vec{x}$ and $\phi$, we automatically get that $\vec{X} = \vec{x}$ also causes $\phi \wedge Y = 1$, which is not a sensible result. Therefore we choose to simply exclude conjunctive effects.
• In the few examples in the literature where the HP definitions actually consider an effect $\phi$ that is not of the form $Y = y$, $\phi$ takes on the form $Y = y_1 \vee Y = y_2 \vee \ldots \vee Y = y_n$ for some $n$. The definitions here developed can easily be generalized to also allow for such effects. For reasons of simplicity I choose not to do so in general and limit the discussion of this generalization to one example for which it is required.

3: The definitions of sufficiency below (and the definitions of actual causation that follow in their wake) could be extended to also allow for exogenous variables as members of a sufficient set, so that exogenous and endogenous variables are treated alike. Since our goal is to make comparisons with the HP definitions, those would also have to be extended. Concretely, the HP definitions restrict causes to being endogenous variables, and they do not allow exogenous variables to be parts of a "witness" (the set $\vec{W}$ above). For example, if we have $Y = X \vee U$ where $U \in \mathcal{U}$ and we consider a context where $U = X = 1$, the HP definitions are unable to identify $X = 1$ as a cause of $Y = 1$ [...] $U = 0$. The simplest way to sidestep this issue is to restrict ourselves to models where exogenous variables only appear in equations of the form $V = U$. In that manner, all influence of the exogenous variables can be overridden by interventions, reducing their role to simply providing us with the actual values of all variables. For any model which does not conform to this restriction, we can easily construct a very similar model that does: simply replace any exogenous variable $U$ which appears in some equation that is not of this form with a new endogenous variable $V_U$, and add the equation $V_U = U$. For the previous example this results in the model with equations $Y = X \vee V_U$, $V_U = U$. (Note that now the HP definitions do consider $X = 1$ a cause of $Y = 1$.)

Throughout the rest of the paper, we take $\vec{X}$ and $\vec{Y}$ to be non-identical subsets of the endogenous variables $\mathcal{V}$ that appear in a causal model $M$.

Informally, to say that some setting $\vec{X} = \vec{x}$ is sufficient for another setting $\vec{Y} = \vec{y}$ is to say that the latter follows from the former. To formalize this requires making explicit what it means for one setting to "follow" from another. In the context of causal sufficiency, an obvious minimal demand is that this meaning captures the causal directionality. In the framework of causal models this comes down to treating $\vec{X} = \vec{x}$ as an intervention and $\vec{Y} = \vec{y}$ as a consequence of that intervention: if we set $\vec{X}$ to the values $\vec{x}$, then $\vec{Y}$ takes on the values $\vec{y}$. At least this much is clear.

Yet by saying this, we have said nothing at all about the other endogenous variables and their values, nor about the contexts in which we are evaluating the intervention. The difficulty lies in deciding what conditions we choose to impose on the other variables, both endogenous and exogenous. I consider six possible ways in which this decision can be made that are fairly natural, but this is by no means an exhaustive list.

We start with the strongest conditions possible: in all contexts, if we set $\vec{X}$ to the values $\vec{x}$, then $\vec{Y}$ takes on the values $\vec{y}$, independent of the values of all other variables.

Definition 4.1:
We say that $\vec{X} = \vec{x}$ is directly sufficient for $\vec{Y} = \vec{y}$ in $M$ if for all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{Y}))$ and all $\vec{u} \in \mathcal{R}(\mathcal{U})$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}] \vec{Y} = \vec{y}$ (where $\vec{C} = \mathcal{V} - (\vec{X} \cup \vec{Y})$).

The strength of this definition is also its weakness: by putting such strong demands on the sufficient set, many interesting sets are excluded. This restrictiveness becomes apparent later on when we add a necessity condition (Proposition 6.1): only parents can ever be part of a minimal directly sufficient set. A trivial example illustrates this point. Say the equation for $Y$ is $Y = A$, the equation for $A$ is $A = X$, and we are looking at a context in which $X = 1$. Then $X = 1$ is not directly sufficient for $Y = 1$, because intervening on $A$ overrides any influence of $X$ on $Y$. Still, there is clearly a sense in which $X = 1$ is causally sufficient for $Y = 1$. In particular, $X = 1$ is directly sufficient for $(A = 1, Y = 1)$.

(Footnote: We take them to be non-identical to exclude calling a setting $\vec{X} = \vec{x}$ causally sufficient for itself, and a fortiori to exclude calling it a cause of itself.)

(Footnote: Note that in this paper we are interested in the causal sufficiency of settings of variables for other settings of variables. This is quite distinct from how the term "causal sufficiency" is sometimes used in the causal modelling literature, namely as a property of a set of variables in a causal graph.)

(Footnote: Weslake (2015) also offers this definition of causal sufficiency to develop a definition of actual causation. He mistakenly claims that Halpern & Pearl call this condition strong causation. As we have seen, strong causation does not require $\vec{C}$ to contain all other variables.)

(Footnote: In all examples the variables are binary unless indicated otherwise. A binary variable is a variable that has range $\{0, 1\}$.)

Generalizing this intuition provides us with the second form of sufficiency: there is some setting $\vec{N} = \vec{n}$ that includes $\vec{Y} = \vec{y}$, so that in all contexts, if we set $\vec{X}$ to the values $\vec{x}$, then $\vec{N}$ takes on the values $\vec{n}$, independent of the values of all other variables. This can be formulated more succinctly as: $\vec{X} = \vec{x}$ is directly sufficient for some set to which $\vec{Y} = \vec{y}$ belongs.

Definition 4.2:
We say that $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$ if there exists a $\vec{N} = \vec{n}$ so that $\vec{Y} \subseteq \vec{N}$, $\vec{y}$ is the restriction of $\vec{n}$ to $\vec{Y}$, and $\vec{X} = \vec{x}$ is directly sufficient for $\vec{N} = \vec{n}$.

Observe that another intuitive way of viewing $X = 1$ as sufficient for $Y = 1$ is that $X = 1$ is directly sufficient for $A = 1$, and $A = 1$ is directly sufficient for $Y = 1$. This intuition can also be generalized to define a form of sufficiency. Concretely, we can define strong sufficiency along a network as the transitive closure of direct sufficiency.

Definition 4.3:
We say that $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$ along a network $\vec{N}$ if there are (possibly overlapping) sets $\vec{N}_i$ such that $\vec{N} = \vec{Y} \cup \bigcup_{i \in \{1, \ldots, k\}} \vec{N}_i$ and there exist values $\vec{n}_i \in \mathcal{R}(\vec{N}_i)$ for each $i$ such that $\vec{X} = \vec{x}$ is directly sufficient for $\vec{N}_1 = \vec{n}_1$, $\vec{N}_1 = \vec{n}_1$ is directly sufficient for $\vec{N}_2 = \vec{n}_2$, ..., and $\vec{N}_k = \vec{n}_k$ is directly sufficient for $\vec{Y} = \vec{y}$.

The following result shows that both forms of strong sufficiency are merely different ways of expressing the same notion of sufficiency (and hence the term is appropriately chosen). Keeping in mind the earlier observation (to appear later as Proposition 6.1) that direct sufficiency combined with necessity is a relation between parents and children, we can safely think of a network as consisting of variables that lie on some path between $\vec{X}$ and $\vec{Y}$. Doing so will make it easier to apply the definitions of causation to examples.

Proposition 4.4: $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$ along a network $\vec{N}$ iff $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ in $M$.

(Proofs of all Theorems are to be found in the Appendix.)

Another obvious way to weaken the conditions on the values of the endogenous variables compared to direct sufficiency is to only consider the setting in which we leave the other variables alone, giving: in all contexts, if we set $\vec{X}$ to the values $\vec{x}$ and do not intervene on any other variable, then $\vec{Y}$ takes on the values $\vec{y}$.

(Footnote on strong sufficiency: As with the definition of direct sufficiency, this one also appears in Weslake's (2015) construction of actual causation, with the added requirement that $\vec{N}$ is minimal. This demand becomes redundant once we add our necessity condition. The other conditions Weslake invokes are quite complicated and do not have a counterpart in our story, which is why his definition also fails at the first strategy.)

(Footnote on Definition 4.5: This definition appears as just one condition in Halpern's (2016) definition of sufficient causality. One of the other conditions is in fact actual causation.)

Definition 4.5: We say that $\vec{X} = \vec{x}$ is weakly sufficient for $\vec{Y} = \vec{y}$ in $M$ if for all $\vec{u} \in \mathcal{R}(\mathcal{U})$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}] \vec{Y} = \vec{y}$.

The following straightforward result shows the relative strengths of the above three notions of sufficiency.

Proposition 4.6: If $\vec{X} = \vec{x}$ is directly sufficient for $\vec{Y} = \vec{y}$ then $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$, and if $\vec{X} = \vec{x}$ is strongly sufficient for $\vec{Y} = \vec{y}$ then $\vec{X} = \vec{x}$ is weakly sufficient for $\vec{Y} = \vec{y}$.

So far we have considered three definitions that differ only with regard to the conditions they impose on the values of the endogenous variables: they all agreed on requiring their respective conditions to hold in all contexts. Yet questions of actual causation are posed relative to an actual context $\vec{u}$, and thus it is only natural that we should consider doing the same for questions of causal sufficiency. This adds three more definitions of sufficiency, which are simply the result of replacing the universal quantifier over contexts with a particular context that is assumed to be given.

Definition 4.7:
We say that $\vec{X} = \vec{x}$ is actually directly sufficient for $\vec{Y} = \vec{y}$ in $(M, \vec{u})$ if for all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{Y}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}] \vec{Y} = \vec{y}$.

Definition 4.8: We say that $\vec{X} = \vec{x}$ is actually strongly sufficient for $\vec{Y} = \vec{y}$ in $(M, \vec{u})$ if there exist $\vec{N} = \vec{n}$ so that $\vec{Y} \subseteq \vec{N}$, $\vec{y}$ is the restriction of $\vec{n}$ to $\vec{Y}$, and $\vec{X} = \vec{x}$ is actually directly sufficient for $\vec{N} = \vec{n}$.

Definition 4.9: We say that $\vec{X} = \vec{x}$ is actually weakly sufficient for $\vec{Y} = \vec{y}$ in $(M, \vec{u})$ if $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}] \vec{Y} = \vec{y}$.

Obviously the counterpart of Proposition 4.6 holds as well for these notions of actual sufficiency.

We can formalize and generalize the intuitions behind the definitions in the preceding section by showing that all six definitions of sufficiency can be interpreted as simply putting different constraints on the parameters that occur in the following general definition of sufficiency. (We only explicitly discuss the three definitions of "non-actual" sufficiency, but the same analysis trivially applies to the three definitions of actual sufficiency.)
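Before stating the general definition, it may help to see the six notions defined so far at work on the chain example $Y = A$, $A = X$ used earlier. The following is a minimal brute-force sketch in Python, not part of the formal development: the dictionary-based encoding of the model, the extra equation $X = U$ (added so that exogenous variables only occur in equations of the form $V = U$), and all function names are assumptions made purely for illustration.

```python
from itertools import combinations, product

# Hypothetical encoding of the chain example: X = U, A = X, Y = A (all binary).
# The equation X = U is an assumption, following the restriction that exogenous
# variables only appear in equations of the form V = U.
ORDER = ["X", "A", "Y"]          # endogenous variables in a topological order
EXOGENOUS = ["U"]
RANGE = {"U": (0, 1), "X": (0, 1), "A": (0, 1), "Y": (0, 1)}
EQUATIONS = {
    "X": lambda v, u: u["U"],
    "A": lambda v, u: v["X"],
    "Y": lambda v, u: v["A"],
}


def settings(variables):
    """Yield every assignment of values to the given variables."""
    variables = list(variables)
    for values in product(*(RANGE[var] for var in variables)):
        yield dict(zip(variables, values))


def solve(context, intervention):
    """Solve the recursive model in a context, under an intervention."""
    values = {}
    for var in ORDER:
        if var in intervention:
            values[var] = intervention[var]       # equation replaced by a constant
        else:
            values[var] = EQUATIONS[var](values, context)
    return values


def holds(context, intervention, target):
    """Check (M, u) |= [intervention] target."""
    solution = solve(context, intervention)
    return all(solution[var] == val for var, val in target.items())


def _contexts(contexts):
    # None means "all contexts"; a one-element list gives the "actual" variants.
    return list(settings(EXOGENOUS)) if contexts is None else contexts


def weakly_sufficient(xs, ys, contexts=None):
    """Definition 4.5 (or 4.9 when a single context is passed)."""
    return all(holds(u, xs, ys) for u in _contexts(contexts))


def directly_sufficient(xs, ys, contexts=None):
    """Definition 4.1 (or 4.7): quantify over all settings of the other variables."""
    others = [v for v in ORDER if v not in xs and v not in ys]
    return all(
        holds(u, {**xs, **c}, ys)
        for u in _contexts(contexts)
        for c in settings(others)
    )


def strongly_sufficient(xs, ys, contexts=None):
    """Definition 4.2 (or 4.8): direct sufficiency for some N = n extending ys."""
    rest = [v for v in ORDER if v not in xs and v not in ys]
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            for n_extra in settings(extra):
                if directly_sufficient(xs, {**ys, **n_extra}, contexts):
                    return True
    return False


print(directly_sufficient({"X": 1}, {"Y": 1}))   # False: setting A = 0 blocks X
print(strongly_sufficient({"X": 1}, {"Y": 1}))   # True: via N = (A = 1, Y = 1)
print(weakly_sufficient({"X": 1}, {"Y": 1}))     # True
```

Passing a single context, for instance `contexts=[{"U": 1}]`, turns each check into its "actual" counterpart (Definitions 4.7 to 4.9). Exhaustive enumeration is only feasible for very small models; it is used here because it mirrors the quantifiers in the definitions one to one.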
Definition 4.10: [General Definition of Sufficiency] We say that $\vec{X} = \vec{x}$ is sufficient for $\vec{Y} = \vec{y}$ in $M$ if there exist sets $\vec{C} \subseteq \mathcal{V} - (\vec{X} \cup \vec{Y})$, $\vec{N} \subseteq \mathcal{V} - (\vec{X} \cup \vec{C})$ with $\vec{Y} \subseteq \vec{N}$, and a setting $\vec{n} \in \mathcal{R}(\vec{N})$ where the restriction of $\vec{n}$ to $\vec{Y}$ is $\vec{y}$, such that for all $\vec{c} \in \mathcal{R}(\vec{C})$ and for all $\vec{u} \in \mathcal{R}(\mathcal{U})$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{C} \leftarrow \vec{c}] \vec{N} = \vec{n}$. We say that $\vec{X} = \vec{x}$ is sufficient for $\vec{Y} = \vec{y}$ in $M$ along $\vec{N}$ independent of $\vec{C}$.

This definition is more complicated than Definitions 4.1, 4.2, and 4.5. Its use lies in the fact that it allows us to see exactly how the three definitions relate to each other, and how one can construct other definitions of sufficiency, by invoking the following trivial result.

Proposition 4.11: Definitions 4.1, 4.2, and 4.5 are equivalent to Definition 4.10 when making respectively the following choices for $\vec{N}$ and $\vec{C}$:
Weak Sufficiency. Choose both $\vec{C}$ and $\vec{N}$ to be minimal, i.e., $\vec{C} = \emptyset$, $\vec{N} = \vec{Y}$.
Strong Sufficiency. Choose $\vec{N}$ to be maximal given $\vec{C}$, i.e., $\vec{N} = \mathcal{V} - (\vec{X} \cup \vec{C})$.
Direct Sufficiency. Choose $\vec{C}$ to be maximal, i.e., $\vec{C} = \mathcal{V} - (\vec{X} \cup \vec{Y})$ and thus $\vec{N} = \vec{Y}$.

Proposition 4.11 could inspire even more variants of sufficiency. In fact, we have already come across the most obvious one: AC2(c). It is easy to see that it consists of choosing $\vec{N}$ to be minimal given $\vec{C}$, i.e., $\vec{N} = \vec{Y}$, meaning it sits in between Weak and Strong Sufficiency. The condition also appears as a sufficiency condition in Pearl's notion of sustenance, which is the first step he takes towards formalizing the NESS intuition (2009, p. 317). Unfortunately it is also the last step, because the subsequent notions he introduces are far more complicated and bear no resemblance to NESS. The added complexity is introduced precisely because taken by itself sustenance fails to provide a sensible definition of causation, which is why I leave the exploration of this and other possible variants of sufficiency for another occasion.

We are finally ready to take up the main challenge: defining actual causation as the formal expression of the NESS intuition. In order to do so, several questions need to be answered:
• Should we use actual sufficiency or not?
• Which of the three definitions of (actual) causal sufficiency should we use?
• Does necessity mean that there exist contrast values of $\vec{X}$ so that the set would not be sufficient if those values obtained, or does it mean that the set is no longer sufficient when we remove the subset $\vec{X}$?

I have introduced six definitions of causal sufficiency in the previous section. For each definition, we can define causation using either of the two interpretations of necessity, giving twelve definitions of actual causation altogether. However, I will show that several of these are equivalent to each other, and one will be impossible to satisfy, leaving us with six definitions in the end. One of those will be Modified HP.

5.1 A Family of Definitions
As with the HP definitions, Definition 3.1 gives the general form of all definitions, except that $\phi$ is restricted to $Y = y$. (This restriction is assumed whenever comparisons are made with the HP definitions.) As before, the only difference lies with the content of AC2. Using the first interpretation of necessity, which we shall call contrastive necessity, the general form of AC2 is as follows:

Definition 5.1: [General Definition of Causation] There exist sets $\vec{W}$, $\vec{N}$ such that
AC2($a^c$). There exist values $\vec{x}'$ such that for all $\vec{S} \subseteq \vec{N}$, $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along $\vec{S}$.
AC2(b). $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$.
We call $\vec{W}$ a witness of $\vec{X} = \vec{x}$ causing $Y = y$.

By replacing sufficiency in the General Definition of Causation with any of the six definitions of sufficiency from Section 4, we obtain six specific definitions of actual causation. AC2(b) simply expresses causal sufficiency, whatever form it may take. AC2($a^c$) offers a somewhat nuanced expression of necessity because it also focusses on subsets of $\vec{N}$. (Note that this nuance matters only for Strong Sufficiency, since for Weak and Direct Sufficiency $\vec{N} = \{Y\}$ anyway.) The reason is that our interest lies with the sufficiency for $Y = y$, and the network $\vec{N}$ is merely a means to that end. If $\vec{X} = \vec{x}'$ accomplishes the same end using fewer means, then $\vec{X} = \vec{x}$ was not necessary for achieving it.

Under the second interpretation of necessity, which we shall call minimal necessity, AC2($a^c$) is replaced with:
AC2($a^m$). For all $\vec{S} \subseteq \vec{N}$, $\vec{W} = \vec{w}^*$ is not sufficient for $Y = y$ along $\vec{S}$.

Both interpretations of necessity are prima facie plausible. The contrastive interpretation is explicitly counterfactual in nature, whereas the minimal interpretation is more neutral.
Our analysis will settle which one of them is to be preferred.

Filling in each of the six definitions of causal sufficiency into both versions of the General Definition of Causation gives twelve specific definitions of actual causation. I refer to each of these as Def x for $x \in \{1, \ldots, 12\}$ along the following convention:

• Def 1: Contrastive actual weak sufficiency
• Def 2: Contrastive actual strong sufficiency
• Def 3: Contrastive actual direct sufficiency
• Def 4: Contrastive weak sufficiency
• Def 5: Contrastive strong sufficiency
• Def 6: Contrastive direct sufficiency
• Def 7: Minimal actual weak sufficiency
• Def 8: Minimal actual strong sufficiency
• Def 9: Minimal actual direct sufficiency
• Def 10: Minimal weak sufficiency
• Def 11: Minimal strong sufficiency
• Def 12: Minimal direct sufficiency

Footnote: Definition 5.1 can be made even more general by also incorporating $\vec{C}$ from Definition 4.10. Since we are only considering notions of sufficiency for which $\vec{C}$ is determined entirely by the other sets, there is no need to do so for our purposes. But it is important to keep this additional generality in mind if one wants to use alternative definitions of sufficiency.

So to be clear, each
Def x is constructed by taking the respective definition of sufficiency (i.e., Definition 4.1, 4.2, 4.5, 4.7, 4.8, or 4.9), filling that into the General Definition of Causation where AC2(a) takes on AC2($a^c$) or AC2($a^m$) depending on whether $x < 7$. By way of example, we spell out Def 2.

Definition 5.2: [Def 2] $\vec{X} = \vec{x}$ is an actual cause of $Y = y$ according to Def 2 in $(M, \vec{u})$ if the following three conditions hold:
AC1. $(M, \vec{u}) \models (\vec{X} = \vec{x}) \land Y = y$.
AC2($a^c$). There exist sets $\vec{W}$, $\vec{N}$ with $Y \in \vec{N}$, and values $\vec{x}'$, such that for all $\vec{S} \subseteq \vec{N}$ with $Y \in \vec{S}$, and for all $\vec{s} \in \mathcal{R}(\vec{S})$ such that $y \in \vec{s}$, there exists a $\vec{t} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{T} \leftarrow \vec{t}]\, \vec{S} \neq \vec{s}$.
AC2(b). For all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{N}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}]\, \vec{N} = \vec{n}^*$.
AC3. $\vec{X}$ is minimal.

Admittedly, Def 2 looks even more complicated than Updated HP. Further on I provide some results that allow us in many cases to use simpler definitions as stand-ins for Def 2. More importantly, although the notation of Definition 5.2 is complicated, its meaning can be spelled out intuitively by stating that $\vec{X} = \vec{x}$ causes $Y = y$ iff $\vec{X} = \vec{x}$ is a Minimal Contrastively Necessary Subset of a Strongly Sufficient Set for $Y = y$ (or MCNS).

Footnote: Strictly speaking it should say “Actually Strongly Sufficient”, but that makes for a less elegant acronym. I am cheating a bit by anticipating Theorem 5.3.

5.2 Analysis

Let us now turn to investigating the relations between these definitions. (Knowing these relations before getting into the discussion of examples makes life a lot easier.) A first remark is that
Def 7 is impossible to satisfy, as it requires that both $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}^*, \vec{W} \leftarrow \vec{w}^*]\, Y = y$ and $(M, \vec{u}) \not\models [\vec{W} \leftarrow \vec{w}^*]\, Y = y$ hold, implying that $(M, \vec{u}) \models Y = y \land Y \neq y$.

A second remark is that Def 3 is equivalent to a condition that appears in Pearl’s first definition of actual causation (1998). Ignoring
Def 7 , we are still left with eleven candidate definitions of actualcausation (fourteen candidates if we count the three HP definitions), whereaswe would like to settle on just one. The rest of the paper is concerned withselecting the best definition out of the lot. As a first step, we can reduce thenumber of definitions by six.
Theorem 5.3: The following are all equivalences among the twelve definitions and the three HP definitions:
• Modified HP iff Def 1
• Def 2 iff Def 5
• Def 8 iff Def 11
• Def 3 iff Def 6 iff Def 9 iff Def 12
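The numbering convention and the equivalences just stated can be tabulated mechanically. The following Python sketch is only a bookkeeping aid: the variable names are illustrative assumptions, and both the equivalence classes and the unsatisfiability of Def 7 are copied from the text, not derived by the code.

```python
from itertools import product

# Naming convention: necessity varies slowest, then actual vs. non-actual,
# then the kind of sufficiency. (Illustrative names, not from the paper.)
NECESSITY = ["Contrastive", "Minimal"]
ACTUALITY = ["actual ", ""]
SUFFICIENCY = ["weak", "strong", "direct"]

names = {
    f"Def {i}": f"{nec} {act}{suf} sufficiency"
    for i, (nec, act, suf) in enumerate(
        product(NECESSITY, ACTUALITY, SUFFICIENCY), start=1
    )
}

# Copied from the text: Def 7 cannot be satisfied, and Theorem 5.3 lists
# these equivalences. The left-hand member is kept as the representative.
UNSATISFIABLE = {"Def 7"}
EQUIVALENCES = [
    ["Modified HP", "Def 1"],
    ["Def 2", "Def 5"],
    ["Def 8", "Def 11"],
    ["Def 3", "Def 6", "Def 9", "Def 12"],
]

dropped = {name for group in EQUIVALENCES for name in group[1:]}
remaining = [d for d in names if d not in (dropped | UNSATISFIABLE)]

for d in remaining:
    print(d, "=", names[d])
```

Under these assumptions the script leaves Def 2, Def 3, Def 4, Def 8, and Def 10 as the definitions still to be compared with the three HP definitions.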
Theorem 5.3 offers our first interesting result: it shows that Modified HP succeeds in formalizing the NESS intuition, whereas the other two HP definitions do not. From now on I will ignore the definitions appearing on the right-hand side in Theorem 5.3. The following is a helpful result for applying some of the definitions going forward. (As is well known, the same result holds for Original HP (Halpern, 2016).)

Proposition 5.4: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to a definition that uses minimal necessity, then $\vec{X}$ is a singleton.

The following result offers important insights into the relations between the remaining definitions.
Theorem 5.5: The only implications (involving either causes or parts of causes) between the remaining five definitions (Def 2, Def 3, Def 4, Def 8, and Def 10) and the three HP definitions are the following ones (and their immediate consequences, of course):
• If part of Modified HP then Updated HP;
• If part of Updated HP then Original HP;
• If Def 3 then Def 2;
• If part of Def 2 then Def 8;
• If Def 3 then Original HP;
• If Def 10 then Def 4.

Footnote (on the condition in Pearl’s first definition): It re-appears in his second definition of actual causation in the notion of a causal beam, but without the necessity condition (Pearl, 2009, p. 318). To see the equivalence, one needs to invoke Proposition 6.1.

Footnote (on the first implication): This is shorthand for: If $X = x$ is part of a cause of $Y = y$ according to the Modified HP definition then it is a cause of $Y = y$ according to the Updated HP definition.

Two definitions can be excluded quickly. The following result shows why
Def 3 is not a sensible candidate as a general definition of causation, since causation is obviously not restricted to parent-children pairs.

Proposition 6.1: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to Def 3, then $\vec{X}$ is a singleton, and $X$ is a parent of $Y$.

Although we can dismiss Def 3 as a general definition of causation, it is still a useful stand-in for the arguably more complicated Def 2 and Def 8 in case $X$ is a parent of $Y$ and $X$ is not an ancestor of $Y$ along any path that is longer than a single edge (which in fact covers a surprisingly large number of cases discussed in the literature). In such cases we say that $X$ is only a parent of $Y$.

Proposition 6.2: If $X$ is only a parent of $Y$, then Def 2, Def 3, and
Def 8 are all equivalent for causes $X = x$.

A cornerstone of the counterfactual approach to causation is that counterfactual dependence is sufficient for causation. More formally, there is widespread consensus that causation should satisfy the following principle:

Principle 1 (Dependence) Say $(M, \vec{u}) \models X = x \land Y = y$. If there exists a value $x'$ such that $(M, \vec{u}) \models [X \leftarrow x']\, Y \neq y$ then $X = x$ causes $Y = y$ in $(M, \vec{u})$.

Accepting this principle means that Def 10 is excluded as well.
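The antecedent of the Dependence principle is easy to test by brute force on a finite acyclic model. The sketch below is a minimal illustration under stated assumptions: the toy model (Y = X or Z), the exogenous names `U_X` and `U_Z`, and the helper names `solve` and `depends` are all hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Optional

Values = Dict[str, int]

def solve(equations: Dict[str, Callable[[Values], int]],
          order: List[str],
          context: Values,
          do: Optional[Values] = None) -> Values:
    """Evaluate an acyclic model; variables listed in `do` ignore their equation."""
    do = do or {}
    values = dict(context)  # start from the exogenous values
    for var in order:       # `order` must list parents before children
        values[var] = do[var] if var in do else equations[var](values)
    return values

def depends(equations, order, ranges, context, x, y) -> bool:
    """True iff some alternative value x' of X changes the actual value of Y."""
    actual = solve(equations, order, context)
    return any(
        solve(equations, order, context, {x: alt})[y] != actual[y]
        for alt in ranges[x] if alt != actual[x]
    )

# Hypothetical toy model: Y = X or Z, with X and Z fixed by the context.
equations = {
    "X": lambda v: v["U_X"],
    "Z": lambda v: v["U_Z"],
    "Y": lambda v: int(v["X"] or v["Z"]),
}
order = ["X", "Z", "Y"]
ranges = {"X": (0, 1), "Z": (0, 1), "Y": (0, 1)}

print(depends(equations, order, ranges, {"U_X": 1, "U_Z": 0}, "X", "Y"))  # True
print(depends(equations, order, ranges, {"U_X": 1, "U_Z": 1}, "X", "Y"))  # False
```

In the first context the principle applies and requires X = 1 to count as a cause of Y = 1. In the second context there is no dependence, so the principle is silent: it states a sufficient condition only.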
Proposition 6.3: Out of all definitions we have considered, Def 10 and Def 3 are the only ones which do not satisfy Dependence.

That leaves us with Def 2, Def 4, and Def 8 as possible alternatives to the HP definitions.

Footnote: As does Halpern, I here restrict myself to counterfactual dependence on a single conjunct (2016, p. 26).

Def 2, Def 4, and Def 8, vs the HP definitions
We have shown that all twelve definitions we developed (including Modified HP) are instantiations of the General Definition of Causation (Def. 5.1), and thereby they improve upon Original HP and Updated HP as far as the first strategy goes. We now show that Def 2 also improves upon all three HP definitions as far as the second strategy goes, whereas Def 4 and Def 8 do not. In order to remain as neutral as possible, we go over Halpern & Pearl’s own examples, compare the verdicts of our definitions to theirs, and stick as close as possible to their intuitions.
The Updated HP definition is by far the best known. It was developed as an improvement of Original HP, which sometimes gives unreasonable answers. Halpern and Pearl (2005) offer many examples to illustrate how it works and how it successfully deals with paradigm cases of causation.

Their first example is one of those few cases (recall the beginning of Section 4) in which the effect is of the form $Y = y \lor Y = y'$, and therefore allows us to illustrate how we can generalize the General Definition of Causation to such effects. It is also an example for which Def 8 gives the wrong answer, but the subsequent example is far simpler and more convincing in this respect.
Example 7.1: “Suppose that there was a heavy rain in April and electricalstorms in the following two months; and in June the lightning took hold. If ithadn’t been for the heavy rain in April, the forest would have caught fire inMay.” (Halpern and Pearl, 2005, p. 15) I agree with Halpern and Pearl’s judg-ment that it would be very counterintuitive to say that the April rain causedthe forest fire, since all it did was delay the fire. As they indicate, it is neverthe-less perfectly sensible to say that the April rain caused the forest fire in June ,as opposed to May. In order to capture this distinction, we need to invoke adisjunctive effect.Let F represent there being a fire or not, with three possible values: 0 (nofire), 1 (fire in May), or 2 (fire in June). ES is a four-valued variable thatcaptures whether there are electric storms: ( , ) (no electric storms in eitherMay or June), ( , ) (electric storms in May but not in June), ( , ) (storms inJune but not May), and ( , ) (storms in both May and June). Lastly, AS is abinary variable expressing whether or not there was April rain.The equation for F is then given by: F = ( AS = ∧ ES = ( , )) ∨ ES = ( , ) , F = AS = ∧ ( ES = ( , ) ∨ ES = ( , )) , and F = F = AS =
1, all definitions we are consideringagree that AS = F =
2. The question is whether AS = F = ∨ F = ⃗ X = ⃗ x is sufficient for Y = y ∨ Y = y ′ iff ⃗ X = ⃗ x is sufficient for Y = y or ⃗ X = ⃗ x is sufficient for Y = y ′ . When integrated into our General Definition of Causation, this results in splitting up AC2(a) so that there is one instance for each disjunct. AC2(b) need not be split up, since it can only ever be satisfied for the actual value of Y .

Let us apply this idea to our example. To satisfy AC2(b), we have to add ES to the witness: ( AS = , ES = ( , )) is directly sufficient for F = AS = AS is only a parent of F . We cannot invoke Proposition 6.2 though, since that requires an effect Y = y .) We then see that one of the two conditions that now make up AC2(a) is not satisfied for Def 2 and
Def 4 , because ( AS = , ES = ( , )) is directlysufficient for F =
1. Therefore
Def 2 and
Def 4 agree with the HP definitionsthat the April rain did not cause the forest fire. But
Def 8 does not reach thisverdict, because ES = ( , ) is not directly sufficient for either F =
1, nor is itfor F =
2. This means AC2(a) is fulfilled for
Def 8, which leads to a mistaken conclusion.

Although one counterexample need not disqualify a definition, the following example is indicative of a deeper problem with Def 8: whenever $X = x$ strongly suffices for $Y = y$, it is automatically a cause according to Def 8, since $\emptyset$ is never strongly sufficient for $Y = y$. The following example is but one of many paradigm cases in the literature for which this property leads to a counterintuitive verdict. Therefore Def 8 is also excluded as a definition of causation.
Example 7.2: “The engineer is standing by a switch in the railroad tracks. A train approaches in the distance. She flips the switch, so that the train travels down the right-hand track, instead of the left. Since the tracks reconverge up ahead, the train arrives at its destination all the same...

Again, our causal model gets this right. Suppose we have three random variables:
• F for “flip”, with values 0 (the engineer doesn’t flip the switch) and 1 (she does);
• T for “track”, with values 0 (the train goes on the left-hand track) and 1 (it goes on the right-hand track); and
• A for “arrival”, with values 0 (the train does not arrive at the point of reconvergence) and 1 (it does).” (Halpern and Pearl, 2005, p. 26)

Footnote: Note that this means generalizing to disjunctions across different variables (i.e., something like $Y = y \lor Z = z$) is more complicated.

Footnote: McDermott (1995) offers an almost identical example involving a dog biting a terrorist. Another famous case is that involving a boulder rolling towards a hiker (Hitchcock, 2001). All of these examples are counterexamples to the transitivity of causation. The failure of transitivity has become broadly accepted by now (Beckers and Vennekens, 2017). Despite what
Def 8 ’s behavior in these examples might suggest, it is also not transitive. A simplecounterexample consists of equations Z = Y ∨ W , and Y = X ∧ W . If X = W = Def 8 considers X = Y = Y = Z =
1, yet it does not consider X = Z = A is given by A = T ∨ ¬ T , which can be rewritten as A =
1. Thiscan be fixed by extending the range of T with a value 2, representing the trainnot going down any track (because it breaks down, for example). Then theequations become A = ( T ≠ ) and T = F . The context is such that F = F = A = A = { T } , but so is F =
0. Therefore
Def 2 and Def 4 agree with Updated HP (and with intuition) that flipping the switch is not a cause of the train’s arrival. Def 8 fails to reach this verdict, because $\emptyset$ is not strongly sufficient for $A = 1$.

Def 4 suffers from an even bigger defect than Def 8: it fails to distinguish preempted causes from preempting causes. Since preemption cases are the bread and butter of the literature on actual causation, this means that Def 4 is immediately disqualified. The following is a famous example of late preemption discussed by Halpern and Pearl (2005) (and originally by Hall (2004)).
Example 7.3:
Suzy and Billy both throw a rock at a bottle. Suzy’s rock gets there first, shattering the bottle. However Billy’s throw was also accurate, and would have shattered the bottle had it not been preempted by Suzy’s throw.

Halpern and Pearl (2005) use the following variables for this example, which capture the fact that Billy’s throw was preempted by Suzy’s rock hitting the bottle: $BS$ for the bottle shattering, $BH$, $SH$ for Billy’s (resp. Suzy’s) rock hitting the bottle, and two more variables ($BT$, $ST$) for either of them throwing their rock. The equations are then as follows: $BS = BH \lor SH$, $SH = ST$, $BH = BT \land \neg SH$. None of the definitions has any problem arriving at the obvious result that Suzy’s throw ( ST =
1) causes the bottle to shatter ( BS = Def 4 is the only definition under consideration that mistakenly alsojudges Billy’s throw to be a cause of the bottle’s shattering: in all contexts BT = BS =
1, whereas BT = BS = ST = Def 2 as the last potential alternative to the HP defini-tions. Going through the many remaining examples, there is only one in which
Def 2 disagrees with Updated HP. I leave it to the reader to verify this claim, and restrict the discussion to that single example.
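The asymmetry between Suzy and Billy in Example 7.3 can be checked by simulating the three equations. The sketch below is an illustration under stated assumptions: it performs a contrastive check in the style of Modified HP (flip the candidate cause while holding a witness set at its actual values), and it does not implement Def 2 or Def 4. The exogenous names `U_ST` and `U_BT` and the helper names are hypothetical.

```python
from itertools import combinations

ORDER = ["ST", "BT", "SH", "BH", "BS"]  # parents before children
EQUATIONS = {
    "ST": lambda v: v["U_ST"],                     # Suzy throws
    "BT": lambda v: v["U_BT"],                     # Billy throws
    "SH": lambda v: v["ST"],                       # SH = ST
    "BH": lambda v: int(v["BT"] and not v["SH"]),  # BH = BT and not SH
    "BS": lambda v: int(v["BH"] or v["SH"]),       # BS = BH or SH
}

def solve(context, do=None):
    """Evaluate the model; variables listed in `do` ignore their equation."""
    do = do or {}
    values = dict(context)
    for var in ORDER:
        values[var] = do[var] if var in do else EQUATIONS[var](values)
    return values

def find_witness(context, cause, effect):
    """Return a smallest set W such that flipping `cause` while holding W at
    its actual values changes `effect`, or None if no such set exists."""
    actual = solve(context)
    others = [v for v in ORDER if v not in (cause, effect)]
    for size in range(len(others) + 1):
        for witness in combinations(others, size):
            do = {w: actual[w] for w in witness}
            do[cause] = 1 - actual[cause]  # all variables are binary
            if solve(context, do)[effect] != actual[effect]:
                return witness
    return None

context = {"U_ST": 1, "U_BT": 1}          # both throw
print(solve(context, {"ST": 0})["BS"])    # 1
print(find_witness(context, "ST", "BS"))  # ('BH',)
print(find_witness(context, "BT", "BS"))  # None
```

The first line shows that the shattering does not depend on Suzy's throw by itself, since Billy's rock would then hit. Dependence appears once BH is held at its actual value 0, whereas no witness set produces dependence on Billy's throw.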
Example 7.4:
Major ( M ) and sergeant ( S ) stand before corporal, and bothshout ‘Charge!’ ( M = S = C = M =
0) the corporal would not have charged. If the major remains quiet( M = − The equation for C is thus: C = M if M ≠ − C = S otherwise. The majority intuition is that the This formulation is due to Weslake (2015), but the example was first discussed by Schaffer(2000) (who attributes it to van Fraassen). Def 2 agrees, as it does not consider S = C =
1. The reasonis that M = S = M = S = C = Original HP and
Updated HP. Halpern & Pearl do not consider this to be problematic, but they do go through the trouble of showing how Original HP and Updated HP change their verdict if one adds extra variables to the model. Moreover, Modified HP also agrees with Def 2 here. Given Halpern’s later preference for Modified HP, it is fair to say that Def 2 does at least as well as Updated HP on this example.
Dissatisfied with Updated HP due to the many counterexamples that were presented in the literature, Halpern (2015) develops Modified HP. First of all, despite Theorem 5.5, there do exist interesting connections between the three definitions we have considered and Modified HP.

Proposition 7.5: If Modified HP with $\vec{X}$ a singleton, then Def 2, Def 4, and Def 8.

Halpern (2015) goes over several counterexamples to Updated HP and shows that Modified HP offers sensible verdicts. Taking into account Halpern’s suggestion that “part of cause” is synonymous with “cause” for Modified HP, there are in fact only three examples in which Modified HP disagrees with Updated HP (Examples 3.5, 3.8, and 3.11). In all three of those cases, Def 2 sides with Modified HP.

There is only one example in which Def 2 disagrees with Modified HP. Crucially, it is an example for which Halpern agrees that Modified HP reaches the wrong verdict.
Example 7.6:
A ranch has five individuals: a , . . . , a . They have to vote ontwo possible outcomes: staying at the campfire ( O =
0) or going on a round-up( O = A i be the random variable denoting a i ’s vote, so A i = j if a i votes for outcome j . There is a complicated rule for deciding on the outcome.If a and a agree (i.e., if A = A ), then that is the outcome. If a , . . . , a agree, and a votes differently, then the outcome is given by a ’s vote (i.e., See Weslake (2015) for a discussion. When discussing Example 3.8 again in (Halpern, 2016), he mistakenly claims that
Mod-ified HP agrees with
Updated HP when treating parts of causes as causes. In response,Halpern has suggested a small variation on the example in which
Modified HP indeed doesagree with
Updated HP (personal communication). For that variation,
Def 2 also agreeswith the HP definitions. Halpern (2016) discusses far more cases, but none of them reveal any further disagreementsbetween these definitions. = A ). Otherwise, majority rules. In the actual situation, A = A = A = A = A =
0, so by the first mechanism, O = Halpern states, and I agree, that intuitively one should expect only A = A = O =
1. After all, a , . . . , a voted against O = Def2 gives that result, whereas
Modified HP considers every vote to be a cause. Halpern argues for adding more variables to the model in order to get the right outcome, but it speaks in favor of Def 2 that it is able to give the right answer with just these variables.

We conclude that judged by the second strategy and Halpern & Pearl’s own examples, Def 2 does better than Updated HP and at least as well as Modified HP. Lastly we consider a very simple example that was offered as a counterexample to Modified HP by Rosenberg and Glymour (2018).
Example 7.7:
We have equations Y = X ∨ D and X = D , and we consider acontext such that D =
1. This looks very much like a standard case of overde-termination in which X = D = Modified HP : it does not consider X = Y =
1. The reason forthis is that Y = D = X = D = Def 2 over
Modified HP . Finally I will argue that
Def 2 does better than all of the other definitionson a few more examples according to two metrics: it offers verdicts that areboth intuitively plausible and consistent across minor changes of the examples.Before doing so, I present an example that illustrates a special property of
Def2 . This is the formulation of the example found in (Halpern, 2016, p. 109), but the examplewas first presented by Glymour et al. (2010). [ ⃗ W ← ⃗ w ] such that Y = y counterfactually dependson ⃗ X = ⃗ x under that intervention. The same is true for the most well-knowndefinitions out there that have been inspired by the HP definitions (see Weslake(2015) for an overview), as well as for Def 3 , Def 4 , and
Def 10 . Let us calldefinitions with this property strongly counterfactual . Although
Def 2 clearlyalso relies on counterfactuals, and thus falls within the counterfactual approachto causation, it is not strongly counterfactual, as the following example shows. Example 7.8:
The equation for a binary variable Y is such that Y = N ≠ N is { , , , } . The equation for N is as follows: N = A = N = ( A = ∧ X = ) , N = ( A = ∧ X = ∧ W = ) , and N = ( A = ∧ X = ∧ W = ) . In a context where A = W = X =
1, we get that X = Y = Def 2 . Yet there is no intervention such that Y = X = X = Y = Def 2 reaches its verdict because of the asymmetry between ( A = , X = ) and ( A = , X = ) : only the former is by itself causally sufficientfor a network that results in Y =
1, whereas the latter also needs the assistanceof W = W = Y : Y = ( X ∧ D ) ∨ A . Moreover,they all share a context such that X = A =
1. The only difference betweenthem lies with the value of D (0 or 1) and with the relation between A and D .(Concretely, there could be no relation, or it can be given by A = D , A = ¬ D , D = A , and D = ¬ A .) In all examples, all definitions agree that A = Y =
1. The disagreement arises over whether X = X = D =
0, regardless of the relation between A and D . The disjunct in which X appears is false, and therefore it played no positive part whatsoever in causing Y =
1. Perhaps others are more tolerant. But even if that is the case, one shouldexpect one’s verdicts to exhibit some consistency. As we will see,
Def 2 and
Original HP are the only definitions which can meet this demand.The situation is simplest for
Original HP : it considers X = Y = ( D = , A = ) . Holdingfixed that witness, Y = X =
1. Since ⃗ Z = { X } ,the former is equivalent to AC2 for Original HP . So we gain consistency, butat the price of extreme tolerance. In fact, Halpern and Pearl use precisely thisexample to argue against
Original HP and in favor of
Updated HP (2005,p. 35): It is not so clear that
Def 8 also relies on counterfactuals, since it does not explicitlyinvoke counterfactual values of the candidate cause. Exploring this topic further lies beyondthe scope of this paper. xample 7.9: “Suppose that a prisoner dies either if X loads D ’s gun and D shoots, or if A loads and shoots his gun. Taking Y to represent the prisoner’sdeath and making the obvious assumptions about the meaning of the variables,... [we can use the equation described above]. Suppose that X loads D ’s gun( X = D does not shoot ( D = A does load and shoot his gun ( A = A = Y = We would not wantto say that X = is a cause of Y = , given that D did not shoot (i.e., giventhat D = ). ” [emphasis added]If we agree with Halpern and Pearl here – which I do – then Original HP can be discarded on the basis of this example (and on the basis of the manyothers we discussed previously, of course). I leave it to the reader to verify thatnone of the other definitions consider X = D = Def 2 . Moreover, it is the only remainingdefinition that offers a simple consistent answer in all cases: X = Y = D =
1. To see why this is the case, we go over the possible directlysufficient sets. (Since X is only a parent of Y , we can invoke Proposition 6.2 anduse Def 3 instead of
Def 2 .) Clearly X = Y = A = A = Y = D asour witness. If D =
0, this gives ( X = , D = ) , which is not directly sufficientfor Y = X = D =
1, we get ( X = , D = ) , whichis directly sufficient for Y =
1. Since the same does not hold for ( X = , D = ) , X = Y = Updated HP and
Modified HP flip-flop between calling X = D .Of course I cannot exclude the possibility that some consistent argumentationcan be offered to explain the results of one of these definitions, but in its absenceall of this speaks in favor of Def 2 . We start with the three possible ways inwhich it can arise that D = Example 7.10:
First consider the case where D is determined by the context,and we have a context such that D =
1. Here all four definitions agree that X = Y = Example 7.11:
Second consider the case where the equation for D is given by D = A and thus again D = UpdatedHP and
Modified HP flip their verdict, as they no longer consider X = Y = Example 7.12:
Third, we simply flip the relation between A and D so that A = D , and again D = UpdatedHP and
Modified HP go back to considering X = Y = D = xample 7.13: Consider the case where the equation for D is D = ¬ A . As withExample 7.9, we have that D =
0, and yet
Updated HP changes its verdict,calling X = Y = Example 7.14: Lastly, consider the case where the equation for D is A = ¬ D ,and thus we again have that D =
0. Now both
Modified HP and
UpdatedHP flip their verdicts as compared to Example 7.9. To see why, it sufficesto consider
Modified HP . The result for
Updated HP then follows fromTheorem 5.5. D = Y = Y = D =
0. Since Y = ( X = , D = ) , X = Y =

I have developed twelve definitions of actual causation that formalize the NESS intuition with which Pearl started, and have shown that the most recent of the HP definitions is among them. Although these definitions vary widely in terms of the verdicts they reach, they all resemble each other as being instantiations of the same general definition. Each definition is made up of two elements: a definition of causal sufficiency, and a definition of necessity. Other definitions can easily be developed by playing around with these elements.

After studying various properties of these definitions and the relations between them, I moved on to the process of selecting the definition that does best in practice. In the majority of the many examples that we have considered, Def 2 agrees with Modified HP. However, in Section 7.2 we came across two examples for which Def 2 disagreed with Modified HP and where Modified HP gave the wrong verdict. Moreover, contrary to Modified HP, Def 2 manages to give consistent (and intuitive) answers to the group of cases considered in the previous section. Therefore I conclude by suggesting that we should adopt Def 2 as a definition of actual causation. This definition is made up of strong sufficiency and contrastive necessity. It states that $\vec{X} = \vec{x}$ causes $Y = y$ iff $\vec{X} = \vec{x}$ is a Minimal Contrastively Necessary Subset of a Strongly Sufficient Set for $Y = y$, or MCNS.

A Appendix

Causal Sufficiency
Proposition 4.4: ⃗ X = ⃗ x is strongly sufficient for ⃗ Y = ⃗ y in M along a network ⃗ N iff ⃗ X = ⃗ x is strongly sufficient for ⃗ Y = ⃗ y in M . Proof:
First assume ⃗ X = ⃗ x is strongly sufficient for ⃗ Y = ⃗ y in M and ⃗ N can beused to show this. Then the result follows immediately from the observation The attentive reader will remember this example from the proof of Theorem 5.3. ⃗ X = ⃗ x is directly sufficient for ⃗ N = ⃗ n and either ⃗ N = ⃗ n is directly sufficientfor ⃗ Y = ⃗ y or ⃗ N = ⃗ Y and ⃗ n = ⃗ y .Second assume ⃗ X = ⃗ x is strongly sufficient for Y = y in M along a network ⃗ N . Define ⃗ A = V − ( ⃗ X ∪ ⃗ N ) . We need to show that for all ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) we have that ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ A ← ⃗ a ] ⃗ N = ⃗ n .We know that ⃗ X = ⃗ x is directly sufficient for ⃗ N = ⃗ n . Define ⃗ C = V− ( ⃗ X ∪ ⃗ N ) and ⃗ D = ⃗ N − ⃗ N . Note that ⃗ C = ⃗ A ∪ ⃗ D . We have that for all ⃗ c ∈ R ( ⃗ C ) andall ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c ] ⃗ N = ⃗ n . In particular, we have that forall ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ A ← ⃗ a ] ⃗ N = ⃗ n .Define ⃗ C = V − ( ⃗ N ∪ ⃗ N ) and ⃗ D = ⃗ N − ( ⃗ N ∪ ⃗ N ) . Note that ⃗ C = ⃗ A ∪ ⃗ D ∪ ⃗ X .We have that for all ⃗ c ∈ R ( ⃗ C ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ N ← ⃗ n , ⃗ C ← ⃗ c ] ⃗ N = ⃗ n . In particular, we have that for all ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ N ← ⃗ n , ⃗ A ← ⃗ a ] ⃗ N = ⃗ n . Combined with the conclusionfrom the previous paragraph, it follows that for all ⃗ a ∈ R ( ⃗ A ) and all ⃗ u ∈ R ( U ) , ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ A ← ⃗ a ] ⃗ N = ⃗ n ∧ ⃗ N = ⃗ n .Defining ⃗ N k + = ⃗ Y , we can generalize this reasoning for all consecutive i ∈ { , . . . , k + } to get the desired outcome. Defining Causation using Sufficiency
Theorem 5.3: The following are all equivalences among the twelve definitions and the three HP definitions:
• Modified HP iff Def 1
• Def 2 iff Def 5
• Def 8 iff Def 11
• Def 3 iff Def 6 iff Def 9 iff Def 12

Proof: First we consider the equivalences that do hold.

We start with the first equivalence: Modified HP iff Def 1. This is simply a matter of explicitly writing out the definitions, starting with actual weak sufficiency: $\vec{X} = \vec{x}$ is actually weakly sufficient for $Y = y$ in $(M, \vec{u})$ iff $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}]\, Y = y$. Next we note that the following condition is trivially satisfied for any $\vec{W} \subseteq \mathcal{V}$: $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*]\, Y = y$.

Combining both claims, we can rewrite Modified HP as follows, which gives the desired result:
AC2(a). There is a set $\vec{W} \subseteq (\mathcal{V} - (\vec{X} \cup \{Y\}))$ and a setting $\vec{x}'$ of the variables in $\vec{X}$ such that $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ is not actually weakly sufficient for $Y = y$ in $(M, \vec{u})$.
AC2(b). $(\vec{X} = \vec{x}, \vec{W} = \vec{w}^*)$ is actually weakly sufficient for $Y = y$ in $(M, \vec{u})$.

Next we consider all of the following equivalences: Def 2 iff Def 5, Def 8 iff Def 11, Def 3 iff Def 6, Def 9 iff Def 12. We can group these together because all of them can be proved by invoking the following observation and two subsequent lemmas.
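The rewritten form of Modified HP above lends itself to a brute-force check on small finite models. The following sketch is an illustration under stated assumptions, not the paper's own procedure: it enumerates every witness set held at its actual values together with every contrast setting, and it tests minimality (AC3) over proper subsets. The dictionary layout, the exogenous name `U_D`, and the function names are hypothetical. The model uses the equations of Example 7.7 (Y = X or D, X = D, in a context where D = 1).

```python
from itertools import combinations, product

def solve(model, context, do=None):
    """Evaluate an acyclic model; variables listed in `do` ignore their equation."""
    do = do or {}
    values = dict(context)
    for var in model["order"]:  # parents must precede children
        values[var] = do[var] if var in do else model["equations"][var](values)
    return values

def ac2(model, context, cause, effect):
    """AC2(a): some witness W held at actual values and some contrast setting x'
    make [X <- x', W <- w*] Y = y fail. AC2(b) is trivially satisfied."""
    actual = solve(model, context)
    others = [v for v in model["order"] if v not in cause and v != effect]
    for size in range(len(others) + 1):
        for witness in combinations(others, size):
            held = {w: actual[w] for w in witness}
            for alt in product(*(model["ranges"][x] for x in cause)):
                do = {**held, **dict(zip(cause, alt))}
                if solve(model, context, do)[effect] != actual[effect]:
                    return True
    return False

def is_cause(model, context, cause, effect):
    """AC1 holds by construction (actual values are used); AC3 is minimality."""
    if not ac2(model, context, cause, effect):
        return False
    return not any(
        ac2(model, context, subset, effect)
        for size in range(1, len(cause))
        for subset in combinations(cause, size)
    )

# Equations of Example 7.7, with D set by a hypothetical exogenous U_D.
model = {
    "order": ["D", "X", "Y"],
    "ranges": {"D": (0, 1), "X": (0, 1), "Y": (0, 1)},
    "equations": {
        "D": lambda v: v["U_D"],
        "X": lambda v: v["D"],
        "Y": lambda v: int(v["X"] or v["D"]),
    },
}
context = {"U_D": 1}
print(is_cause(model, context, ("X",), "Y"))      # False
print(is_cause(model, context, ("D",), "Y"))      # True
print(is_cause(model, context, ("X", "D"), "Y"))  # False
```

The conjunction of X and D satisfies AC2 but fails minimality, because D alone already does. The search is exponential in the number of variables, so this approach is only practical for toy models.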
Observation 1
Recall our restriction on causal models that exogenous variablesonly appear in equations of the form V = U . Say ⃗ R ⊆ V are all variables whichhave such an equation, and call these the root variables. It is clear that if weintervene on all of the root variables, they take over the role of the exogenousvariables. Concretely, given strong recursivity, for any setting ⃗ r ∈ R ( ⃗ R ) thereexists a unique setting ⃗ v ∈ R ( V ) so that for all contexts ⃗ u ∈ R ( U ) we have that ( M, ⃗ u ) ⊧ [ ⃗ R ← ⃗ r ] V = ⃗ v . Lemma A.1:
Given a setting ⃗ X = ⃗ x , a setting ⃗ N = ⃗ n that includes Y = y andsuch that ⃗ N ∩ ⃗ R = ∅ , a context ⃗ u , the following holds: • ⃗ X = ⃗ x is actually directly sufficient for Y = y in ( M, ⃗ u ) iff ⃗ X = ⃗ x is directlysufficient for Y = y in M ; • ⃗ X = ⃗ x is actually strongly sufficient for Y = y in ( M, ⃗ u ) along ⃗ N = ⃗ n iff ⃗ X = ⃗ x is strongly sufficient for Y = y in M along ⃗ N = ⃗ n . Proof:
Filling in the definitions of direct and actually direct sufficiency, thefirst equivalence reduces to the following: for all ⃗ c ∈ R ( V − ( ⃗ X ∪ { Y })) , it holdsthat ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c ] Y = y iff for all ⃗ u ′′ ∈ R ( U ) , ( M, ⃗ u ′′ ) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c ] Y = y .Because of Observation 1, we have that for any setting ⃗ v ∈ V and any setting ⃗ r ∈ R ( ⃗ R ) , it holds that ( M, ⃗ u ) ⊧ [ ⃗ R ← ⃗ r ] V = ⃗ v iff for all contexts ⃗ u ′′ ∈ R ( U ) , ( M, ⃗ u ′′ ) ⊧ [ ⃗ R ← ⃗ r ] V = ⃗ v . Combining this with the fact that ⃗ R ⊆ ( ⃗ C ∪ ⃗ X ) givesthe desired result.The second equivalence can be reformulated as follows: ⃗ X = ⃗ x is actuallydirectly sufficient for ⃗ N = ⃗ n in ( M, ⃗ u ) iff ⃗ X = ⃗ x is directly sufficient for ⃗ N = ⃗ n in M . In turn, this reduces to: for all ⃗ c ∈ R ( V − ( ⃗ X ∪ ⃗ N )) , it holds that ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c ] ⃗ N = ⃗ n iff for all ⃗ u ′′ ∈ R ( U ) , ( M, ⃗ u ′′ ) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c ] ⃗ N = ⃗ n .Given that ⃗ N ∩ ⃗ R = ∅ , we still have that ⃗ R ⊆ ( ⃗ C ∪ ⃗ X ) , and therefore we canapply the same reasoning as before. Lemma A.2:
For all twelve instances of the General Definition of Causation we can restrict ourselves to sets $\vec{N}$ so that $(\vec{N} - \{Y\}) \cap \vec{R} = \emptyset$.

Proof:
Let $\vec{A}$ denote $(\vec{N} - \{Y\}) \cap \vec{R}$. (Recall that $\vec{R}$ is defined in Observation 1.) For all definitions using a variant of direct or weak sufficiency the result follows immediately from the fact that $\vec{N} - \{Y\} = \emptyset$.

First consider the case where we use non-actual strong sufficiency (Def 5 or Def 11). In that case, AC2(b) can never be satisfied unless $\vec{A} = \emptyset$. To see why, note that for all contexts $\vec{u}'' \in \mathcal{R}(\mathcal{U})$, it has to hold that $(M, \vec{u}'') \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*]\vec{A} = \vec{a}^*$. Since $\vec{A} \cap (\vec{X} \cup \vec{W}) = \emptyset$ and the equation for each element $A_i \in \vec{A}$ is of the form $A_i = U$ for some exogenous variable $U$, this is impossible. (Strictly speaking it is possible, namely if the range of $U$ consists only of the single value $a_i^*$. Although I did not make this explicit in Section 2, it is standard to assume that all variables have a range that contains at least two elements.)

Second consider the case where we use actual strong sufficiency and contrastive necessity (Def 2). (The case of
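Def 8 is entirely analogous.) Say we are considering a candidate cause $\vec{X} = \vec{x}$, a candidate witness $\vec{W} = \vec{w}^*$, contrast values $\vec{x}'$, and a setting $\vec{N} = \vec{n}$ that includes $Y = y$. Given AC1, we can safely assume that $\vec{n} = \vec{n}^*$.

I claim that the following holds, from which the result follows: $\vec{X} = \vec{x}$ satisfies AC2 using contrast values $\vec{x}'$, witness $\vec{W} = \vec{w}^*$, and network $\vec{N}$ iff $\vec{X} = \vec{x}$ satisfies AC2 using contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*, \vec{A} = \vec{a}^*)$, and network $\vec{N} - \vec{A}$.

Because $\vec{A} \subseteq \vec{R}$, we have that for any set $\vec{B} \subseteq (\mathcal{V} - \vec{A})$, and any setting $\vec{b} \in \mathcal{R}(\vec{B})$, $(M, \vec{u}) \models [\vec{B} \leftarrow \vec{b}]\vec{A} = \vec{a}^*$. Moreover, since $(M, \vec{u}) \models \vec{A} = \vec{a}^*$, for each setting $\vec{v} \in \mathcal{R}(\mathcal{V} - \vec{A})$ we also have that $(M, \vec{u}) \models [\vec{B} \leftarrow \vec{b}](\mathcal{V} - \vec{A}) = \vec{v}$ iff $(M, \vec{u}) \models [\vec{B} \leftarrow \vec{b}, \vec{A} \leftarrow \vec{a}^*](\mathcal{V} - \vec{A}) = \vec{v}$.

Using these observations and the fact that $\vec{A} \subseteq \vec{N}$, we get that the following two conditions are equivalent, from which the result follows as far as AC2(b) is concerned:

AC2(b). For all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{N}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}]\vec{N} = \vec{n}^*$.

AC2(b). For all $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{N}))$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{C} \leftarrow \vec{c}](\vec{N} - \vec{A}) = \vec{n}_1^*$ (where $\vec{n}_1^*$ is the restriction of $\vec{n}^*$ to $(\vec{N} - \vec{A})$).

Now we focus on AC2(a$^c$).

Let us first assume AC2(a$^c$) holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*, \vec{A} = \vec{a}^*)$, and network $\vec{N} - \vec{A}$. We need to show that it holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*)$, and network $\vec{N}$.

Consider some $\vec{S} \subseteq \vec{N}$ with $Y \in \vec{S}$. We need to find a $\vec{t} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{T} \leftarrow \vec{t}]\vec{S} \neq \vec{s}^*$.

Define $\vec{S}_1 = \vec{S} - \vec{A}$, $\vec{S}_2 = \vec{S} \cap \vec{A}$, and $\vec{A}_1 = \vec{A} - \vec{S}_2$.

Since $\vec{S}_1 \subseteq (\vec{N} - \vec{A})$ with $Y \in \vec{S}_1$, we know that there exists some $\vec{t}_1 \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}_1))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T}_1 \leftarrow \vec{t}_1]\vec{S}_1 \neq \vec{s}_1^*$. Since $\vec{S}_1 \subseteq \vec{S}$, it also holds that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T}_1 \leftarrow \vec{t}_1]\vec{S} \neq \vec{s}^*$. Also, given our observations about $\vec{A}$, it follows that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A}_1 \leftarrow \vec{a}_1^*, \vec{T}_1 \leftarrow \vec{t}_1]\vec{S} \neq \vec{s}^*$. Lastly, note that $[\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}_1)] \cup \vec{A}_1 = \mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{S})$. Therefore we can choose $\vec{t} = (\vec{a}_1^*, \vec{t}_1)$.

Next we consider the other direction: assume AC2(a$^c$) holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $\vec{W} = \vec{w}^*$, and network $\vec{N}$. We need to show that it holds for $\vec{X} = \vec{x}$, contrast values $\vec{x}'$, witness $(\vec{W} = \vec{w}^*, \vec{A} = \vec{a}^*)$, and network $\vec{N} - \vec{A}$.

Consider some $\vec{S} \subseteq (\vec{N} - \vec{A})$ with $Y \in \vec{S}$. We need to find a $\vec{t} \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T} \leftarrow \vec{t}]\vec{S} \neq \vec{s}^*$.

Note that $(\vec{S} \cup \vec{A}) \subseteq \vec{N}$, and also $Y \in (\vec{S} \cup \vec{A})$. Therefore there exists some $\vec{t}_2 \in \mathcal{R}(\mathcal{V} - (\vec{X} \cup \vec{W} \cup \vec{A} \cup \vec{S}))$ so that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T} \leftarrow \vec{t}_2](\vec{S} \neq \vec{s}^* \lor \vec{A} \neq \vec{a}^*)$. It follows that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}^*, \vec{A} \leftarrow \vec{a}^*, \vec{T} \leftarrow \vec{t}_2]\vec{S} \neq \vec{s}^*$. Choosing $\vec{t} = \vec{t}_2$ gives the desired result.

Because of the above lemmas, all that remains is to show that the above equivalences hold also when $Y \in \vec{R}$.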
This is accomplished by showing that settings of such variables do not have any cause, regardless of the definition one uses.

AC2(a) requires us to look at all subsets of $\vec{N} = \vec{n}$ that include $Y = y$, and verify that the candidate cause and witness $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ (or candidate witness $\vec{W} = \vec{w}^*$ in case we use AC2(a$^m$)) is not sufficient for that subset. One such subset is the one containing just $Y = y$. By AC1, we have that $(M, \vec{u}) \models Y = y$. Since $Y \in \vec{R}$, there is no intervention on the other endogenous variables so that $Y \neq y$ under that intervention in $\vec{u}$. Therefore any definition of causation using a version of actual sufficiency (i.e., Def 2, Def 3, Def 8, and Def 9) considers all sets that do not include $Y$ to be sufficient for $Y = y$ in $(M, \vec{u})$. In particular, they consider $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ to be sufficient for $Y = y$ in $(M, \vec{u})$, and thus fail to meet condition AC2(a).

For the definitions using non-actual variants of sufficiency (Def 5, Def 6, Def 11, and Def 12), it is condition AC2(b) that can never be satisfied. Analogous to what we saw in the proof of Lemma A.2, this follows from the fact that whatever version of sufficiency we use, $Y = y$ has to hold in all contexts, which is impossible given that $Y \notin (\vec{X} \cup \vec{W})$. From this the result follows.

Now we prove the only remaining equivalence: Def 6 iff Def 12. (Given the previous equivalences, other choices are possible too.) We need to show that the following two statements are equivalent:
• $\vec{W} = \vec{w}^*$ is not directly sufficient for $Y = y$.
• There exist values $\vec{x}'$ of $\vec{X}$ such that $(\vec{X} = \vec{x}', \vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$.

Filling in Definition 4.1, the result follows immediately:
• There exists a $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{W} \cup \vec{X} \cup \{Y\}))$, a $\vec{x}' \in \mathcal{R}(\vec{X})$, and a $\vec{u}' \in \mathcal{R}(\mathcal{U})$ so that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*, \vec{X} \leftarrow \vec{x}', \vec{C} \leftarrow \vec{c}]Y \neq y$.
• There exist values $\vec{x}'$ of $\vec{X}$, a $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{W} \cup \vec{X} \cup \{Y\}))$ and a $\vec{u}' \in \mathcal{R}(\mathcal{U})$ so that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*, \vec{X} \leftarrow \vec{x}', \vec{C} \leftarrow \vec{c}]Y \neq y$.

Second, we go over some examples to show that none of the other equivalences hold. (Obviously, from now on we may ignore Def 1, Def 5, Def 6, Def 7, Def 9, Def 11, and Def 12.)

Example A.3:
Equations: Y = ( X ∧ A ) ∨ D , D = A . Context: A =
1. Then X = Y = Modified HP : We can always consider choosing ⃗ W = ∅ , in which casewe simply get counterfactual dependence: ( M, ⃗ u ) ⊧ ⃗ X = ⃗ x ∧ Y = ( M, ⃗ u ) ⊧ [ ⃗ X ← ⃗ x ′ ] Y ≠ y . Doing so in this example, we see that Y = ( X = , D = ) . There is clearly also nowitness ⃗ W = ⃗ w ∗ to show that X = D = X = • Updated HP and
Original HP : taking ( A = , D = ) as a witnessmeets the conditions. • Def 3 : again take ( A = , D = ) as a witness. • Def 2 : follows from the previous item and Theorem 5.5. • Def 8 : follows from the previous item and Theorem 5.5. X = Y = • Def 10 : X = Y = A = A or D to the witness. Butboth A = D = Y = • Def 4 : ( X = , A = ) and ( X = , D = ) also weakly suffice for Y = Def 4 and
Def 10 are not equivalent to any of the other definitions. We give an example to show that Def 4 and Def 10 are not equivalent to each other either.
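The weak-sufficiency claims in the example that follows can be checked by brute force. The sketch below is only an illustration under stated assumptions: it reads Example A.4 as having a single root variable A (written here as A = U_A, where the name U_A is hypothetical), binary ranges, and it takes weak sufficiency of a setting to mean that the target value holds under that intervention in every context, as the term is used in the proof of Theorem 5.5. The helper names `solve` and `weakly_sufficient` are not from the paper.

```python
# Example A.4 as read here (assumption): A = U_A is the only root variable,
# X = A, and Y = X AND A. Equations are listed in causal order so that a
# single pass over the dictionary solves the model.
EQUATIONS = {
    "A": lambda v, u: u["U_A"],
    "X": lambda v, u: v["A"],
    "Y": lambda v, u: v["X"] & v["A"],
}

# All contexts: every value of the (hypothetical) exogenous variable U_A.
CONTEXTS = [{"U_A": a} for a in (0, 1)]


def solve(context, intervention):
    """Solve the model in `context` after forcing the intervened variables."""
    values = {}
    for var, equation in EQUATIONS.items():
        if var in intervention:
            values[var] = intervention[var]
        else:
            values[var] = equation(values, context)
    return values


def weakly_sufficient(setting, target, value):
    """True if `target` equals `value` under [setting] in every context."""
    return all(solve(u, setting)[target] == value for u in CONTEXTS)


checks = {
    "(X=1, A=1)": weakly_sufficient({"X": 1, "A": 1}, "Y", 1),
    "A=1": weakly_sufficient({"A": 1}, "Y", 1),
    "(X=0, A=1)": weakly_sufficient({"X": 0, "A": 1}, "Y", 1),
    "X=1": weakly_sufficient({"X": 1}, "Y", 1),
}
for name, result in checks.items():
    print(name, "weakly sufficient for Y=1:", result)
```

Under this reading, the witness A = 1 is already weakly sufficient on its own, which is what blocks minimal necessity, while the contrast setting (X = 0, A = 1) is not, which is what contrastive necessity needs.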
Example A.4:
Equations: Y = X ∧ A , X = A . Context: A =
1. Since X = Y =
1, we need to include A = ( X = , A = ) is weakly sufficient for Y =
1. However, so is A =
1, and therefore X = Y = Def 10 . Yet ( X = , A = ) is notweakly sufficient for Y =
1, and therefore X = Y = Def4 . This leaves us with the HP definitions,
Def 2, Def 3, and Def 8. The next example shows that the former are not equivalent to the latter.
Example A.5:
Equations: Y = ( X ∧ ¬ A ) ∨ D , D = A . Context: A =
1. Then X = Y = • Modified HP : Y = ( X = , A = ) , andnot on either X = A =
1. So X = • Updated HP and
Original : take A = X = Y = • Def 3 : X = Y = [ A ← , D ← ] ), so we need to add A or D to the witness. Since theactual value of A is 1, it is of no use, which leaves us with D . But D = Y = ( X = , D = ) .29 Def 2 : follows from the previous item and Proposition 6.2. • Def 8 : follows from the previous item and Proposition 6.2.That none of the HP definitions are equivalent is of course a well-establishedfact, and also follows from the examples we consider in Section 7. Therefore weare left with showing that
Def 2, Def 3, and Def 8 are not equivalent. That Def 3 differs from the other two is a direct consequence of some of our later results, but a simple example illustrates this as well.
Example A.6:
Equations: Y = A , A = X . Context: A =
1. Then it is easy to see that X = 1 causes Y = 1 according to all of the remaining definitions except Def 3.

Lastly, I refer the reader to Example 7.2 in Section 7 for an example that shows that Def 2 and Def 8 are not equivalent.
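To make the role of direct sufficiency in Example A.6 concrete, here is a minimal brute-force sketch. It assumes binary ranges and that X is the root variable of the chain (written X = U_X, a hypothetical name), since the equation for A is A = X. Direct sufficiency is implemented as in the proof of Lemma A.1: the target value must hold for every setting of all remaining endogenous variables and in every context. The function names are illustrative only.

```python
from itertools import product

RANGE = (0, 1)

# Example A.6 as read here (assumption): X = U_X is the root, A = X, Y = A.
EQUATIONS = {
    "X": lambda v, u: u["U_X"],
    "A": lambda v, u: v["X"],
    "Y": lambda v, u: v["A"],
}
CONTEXTS = [{"U_X": x} for x in RANGE]


def solve(context, intervention):
    """Solve the model in `context` after forcing the intervened variables."""
    values = {}
    for var, equation in EQUATIONS.items():
        if var in intervention:
            values[var] = intervention[var]
        else:
            values[var] = equation(values, context)
    return values


def directly_sufficient(setting, target, value):
    """True if `target` = `value` under [setting, C <- c] for every setting c
    of the remaining endogenous variables C and for every context."""
    others = [v for v in EQUATIONS if v not in setting and v != target]
    for combo in product(RANGE, repeat=len(others)):
        intervention = {**setting, **dict(zip(others, combo))}
        for u in CONTEXTS:
            if solve(u, intervention)[target] != value:
                return False
    return True


print(directly_sufficient({"X": 1}, "Y", 1))          # X=1 alone
print(directly_sufficient({"A": 1}, "Y", 1))          # candidate witness A=1
print(directly_sufficient({"X": 1, "A": 1}, "Y", 1))  # cause plus witness
```

In this sketch X = 1 is not directly sufficient for Y = 1 because an intervention on A breaks the chain, and once A = 1 is added as a witness the witness is sufficient by itself, so X = 1 can never be a necessary element.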
Proposition 5.4: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to a definition that uses minimal necessity, then $\vec{X}$ is a singleton.

Proof: Since we know that Def 7 is unsatisfiable and we have Theorem 5.3, we only need to consider Def 3, Def 8, and Def 10. The following applies to both weak and direct sufficiency (i.e., Def 3 and Def 10).

Assume $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$, and $\vec{W} = \vec{w}^*$ is not sufficient for $Y = y$. If either $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ or $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is also sufficient for $Y = y$, then $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2)$ is not minimal.

So let us assume that neither $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ nor $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$. This means we can move $\vec{X}_2$ to the witness to show that $\vec{X}_1 = \vec{x}_1$ satisfies AC2 by itself, and likewise for $\vec{X}_1$ and $\vec{X}_2$ reversed. From this the result follows.

Now we prove that it also holds for strong sufficiency, i.e., for Def 8. Assume $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$, and $\vec{W} = \vec{w}^*$ is not sufficient for $Y = y$ along any network $\vec{S} \subseteq \vec{N}$. If either $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ or $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is also sufficient for $Y = y$ along $\vec{N}$, then $(\vec{X}_1 = \vec{x}_1, \vec{X}_2 = \vec{x}_2)$ is not minimal.

So let us assume that neither $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ nor $(\vec{X}_2 = \vec{x}_2, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$. If the same is true for all subnetworks $\vec{S} \subseteq \vec{N}$, then as before, we can move either one of $\vec{X}_1$ and $\vec{X}_2$ to the witness to show that the other satisfies AC2 by itself.

So let us assume that there is some subnetwork $\vec{S}' \subseteq \vec{N}$ such that $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{S}'$. (Obviously the same reasoning applies to $\vec{X}_2$.) Since all subnetworks $\vec{S}''$ of $\vec{S}'$ are also subnetworks of $\vec{N}$, it follows from the above that $(\vec{X}_1 = \vec{x}_1)$ satisfies AC2 by itself when taking $\vec{W}$ as witness and $\vec{S}'$ as network. From this the result follows.

Theorem 5.5:
The only implications, involving either causes or parts of causes, between the remaining five definitions (Def 2, Def 3, Def 4, Def 8, and Def 10) and the three HP definitions are the following ones (and their immediate consequences, of course):
• If part of Modified HP then Updated HP;
• If part of Updated HP then Original HP;
• If Def 3 then Def 2;
• If part of Def 2 then Def 8;
• If Def 3 then Original HP;
• If Def 10 then Def 4.

Proof:
The first two implications are proven in (Halpern, 2016).

First we prove the third implication. Assume $\vec{X} = \vec{x}$ causes $Y = y$ with witness $\vec{W}$ according to Def 3. It follows from Proposition 5.4 that $\vec{X}$ is a single conjunct $X$. Note that this immediately implies minimality of $\vec{X}$. In other words, $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, and there exists some $x'$ such that $(X = x', \vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$. From the former it follows that $(X = x, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\emptyset$. From the latter it follows that $(X = x', \vec{W} = \vec{w}^*)$ is not strongly sufficient for $Y = y$ along $\emptyset$, from which the result follows.

Second we prove the fourth implication. Assume $(X = x, \vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along $\vec{N}$, and $(X = x', \vec{X}_1 = \vec{x}_1', \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along any network $\vec{S} \subseteq \vec{N}$, for some $\vec{N}$, $x'$ and $\vec{x}_1'$. We show that $X = x$ causes $Y = y$ according to Def 8.

Taking $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ as our witness and using $\vec{N}$, AC2(b) remains unchanged. If $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along any network $\vec{S} \subseteq \vec{N}$, then the result follows. We proceed by a reductio.

Let us assume that $(\vec{X}_1 = \vec{x}_1, \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along some $\vec{S} \subseteq \vec{N}$. If $(\vec{X}_1 = \vec{x}_1', \vec{W} = \vec{w}^*)$ is not sufficient for $Y = y$ along any $\vec{S}'' \subseteq \vec{S}$, we have a violation of minimality (since $X$ is redundant). Therefore we know that $(\vec{X}_1 = \vec{x}_1', \vec{W} = \vec{w}^*)$ is sufficient for $Y = y$ along some network $\vec{S}'' \subseteq \vec{S}$.

This means that there exist values $\vec{s}'' \in \mathcal{R}(\vec{S}'')$ so that for all settings $\vec{c} \in \mathcal{R}(\mathcal{V} - (\vec{S}'' \cup \vec{X}_1 \cup \{X, Y\}))$, and for all $x'' \in \mathcal{R}(X)$, it holds that $(M, \vec{u}) \models [\vec{X}_1 \leftarrow \vec{x}_1', \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}, X \leftarrow x'']\vec{S}'' = \vec{s}''$ and $(M, \vec{u}) \models [\vec{X}_1 \leftarrow \vec{x}_1', \vec{W} \leftarrow \vec{w}^*, \vec{C} \leftarrow \vec{c}, X \leftarrow x'', \vec{S}'' \leftarrow \vec{s}'']Y = y$. In particular, this holds if we choose $x'' = x'$. But that means that $(X = x', \vec{X}_1 = \vec{x}_1', \vec{W} = \vec{w}^*)$ is also sufficient for $Y = y$ along $\vec{S}''$, which contradicts our starting assumption.

Third we prove the fifth implication. As with the third implication, assume that $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, and there exists some $x'$ such that $(X = x', \vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$. From the latter it follows that there exists a setting $\vec{d}$ of $\mathcal{V} - (\vec{X} \cup \vec{W} \cup \{Y\})$ such that $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*, \vec{D} \leftarrow \vec{d}]Y \neq y$. This means that if we take $(\vec{W} = \vec{w}^*, \vec{D} = \vec{d})$ as witness, AC2(a) is satisfied for Original HP. Since $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, we know that $(M, \vec{u}) \models [X \leftarrow x, \vec{W} \leftarrow \vec{w}^*, \vec{D} \leftarrow \vec{d}]Y = y$. Also, we have that $\vec{Z} = \vec{X}$, and thus the former means that also AC2(b) is satisfied for Original HP.

Fourth we prove the last implication. Assume $X = x$ causes $Y = y$ with witness $\vec{W}$ according to Def 10. (We know because of Proposition 5.4 that $\vec{X}$ is a singleton.) In other words, $(X = x, \vec{W} = \vec{w}^*)$ is weakly sufficient for $Y = y$, and $\vec{W} = \vec{w}^*$ is not weakly sufficient for $Y = y$. It remains to be shown that there exists a value $x'$ so that $(X = x', \vec{W} = \vec{w}^*)$ is not weakly sufficient for $Y = y$.

Say $\vec{u}'$ is a context such that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*]Y \neq y$, and say $x'$ is the unique value such that $(M, \vec{u}') \models [\vec{W} \leftarrow \vec{w}^*]X = x'$. Then also $(M, \vec{u}') \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*]Y \neq y$, which is what remained to be shown.

Fifth, we show that none of the remaining implications hold. (Again, we do not consider the relations amongst the HP definitions explicitly and refer the reader to the examples in Section 7. We also do not explicitly consider the remaining implications for parts of causes, but the reader can verify that the following examples suffice to falsify all those implications as well.
For the left-hand side of all implications this follows immediately from the fact that the causes in all the following examples are singletons. For the right-hand side of implications, Propositions 5.4, 6.1, and 6.2 come in handy.)

Example A.4 shows that Def 4 does not imply Def 10.

Example A.3 shows that none of the other definitions imply either Def 4 or Def 10. So there are no remaining implications with either Def 4 or Def 10 on the right-hand side.

Example A.6 shows that Def 3 is not implied by any definition.

Example A.5 shows that none of the HP definitions imply Def 2 or Def 8. Note that
Def 4 and
Def 10 also consider X = Y = X = Y =
1, whereas X = Def 8 does not imply
Def 2. Therefore there are no remaining implications with Def 2 or Def 8 on the right-hand side.

That leaves us to consider implications with one of the HP definitions on the right-hand side. Given the first two implications of Theorem 5.5, it suffices to show that none of Def 4, Def 2, Def 8, or Def 10 imply Original HP, and that Def 3 does not imply Updated HP.

I refer the reader to Example 7.8 in Section 7 for an example where Def 2 (and thus also Def 8) holds and Original HP does not.

The following example shows that neither Def 4 nor Def 10 implies Original HP.

Example A.7:
Equations: Y = Z ∨ Z ∨ A , Z = X ∧ A , Z = X ∧ ¬ A . Context: A = X =
1. Then X = Y = • Def 10 : X = Y = ∅ is not. • Def 4 : follows from the previous one.Yet X = Y = Original HP . To see why,note that we need to include A = Z . Also, we clearly cannot add Z =
1. Therefore the witnesshas to be A =
0. The actual value of Z is 0. Since we have ( M, ⃗ u ) ⊧ [ X ← , A ← , Z ← ] Y =
0, AC2(b) is not satisfied.

Lastly, an example to show that Def 3 does not imply Updated HP.

Example A.8:
Equations: Y = ( X ∧ D ) ∨ A , D = A . Context: A = X = X = Y = Def 3 : ( X = , D = ) is directlysufficient for Y =
1, and ( X = , D = ) is not. But X = Y = Updated HP . To see why, note that we need to include A = ( M, ⃗ u ) ⊧ [ X ← , A ← ] Y = Updated HP . Excluding Def 3 and Def 10
Proposition 6.1: If $\vec{X} = \vec{x}$ causes $Y = y$ in $(M, \vec{u})$ according to Def 3, then $\vec{X}$ is a singleton, and $X$ is a parent of $Y$.

Proof: That $\vec{X}$ is always a singleton is a direct consequence of the combination of Proposition 5.4 and Theorem 5.3.

Recall that $X$ is a parent of $Y$ iff there exists a context $\vec{u}''$, a setting $\vec{z} \in \mathcal{R}(\mathcal{V} - \{X, Y\})$, and values $x, x''$ of $X$ so that $F_Y(\vec{u}'', \vec{z}, x) \neq F_Y(\vec{u}'', \vec{z}, x'')$. This means precisely that for some $y \in \mathcal{R}(Y)$, $(M, \vec{u}'') \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x]Y = y$ and $(M, \vec{u}'') \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x'']Y \neq y$. If $X = x$ causes $Y = y$ according to Def 3, the existence of values such that the previous holds follows immediately.
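The parenthood condition recalled in this proof is a finite search when all ranges are finite, so it can be sketched directly. The code below is an illustration only, assuming binary ranges and reusing the chain Y = A, A = X of Example A.6 with a hypothetical exogenous variable U_X. Here `f_y` stands in for $F_Y$ and takes a context and a setting of the other endogenous variables; `is_parent` is not a name from the paper.

```python
from itertools import product

RANGE = (0, 1)
VARIABLES = ("X", "A", "Y")
CONTEXTS = [{"U_X": x} for x in RANGE]


def f_y(context, values):
    """Structural equation for Y in the chain of Example A.6: Y = A."""
    return values["A"]


def is_parent(candidate, child, f_child):
    """True if some context and some setting z of the variables other than
    `candidate` and `child` make f_child take different outputs for two
    different values of `candidate`."""
    others = [v for v in VARIABLES if v not in (candidate, child)]
    for context in CONTEXTS:
        for combo in product(RANGE, repeat=len(others)):
            z = dict(zip(others, combo))
            outputs = {f_child(context, {**z, candidate: x}) for x in RANGE}
            if len(outputs) > 1:
                return True
    return False


print(is_parent("A", "Y", f_y))  # A appears in the equation for Y
print(is_parent("X", "Y", f_y))  # X influences Y only through A
```

In this toy model only A passes the test, which matches the observation that X = 1 cannot be a cause of Y = 1 under Def 3 there, since X is not a parent of Y.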
Proposition 6.2: If $X$ is only a parent of $Y$, then Def 3, Def 2, and Def 8 are all equivalent for causes $X = x$.

Proof: Given Theorem 5.5, we only need to prove the implication from Def 8 to Def 3.

Assume $X$ is only a parent of $Y$, and $X = x$ causes $Y = y$ according to Def 8. Thus, there is a witness $\vec{W}$ and some network $\vec{N}$ such that $(X = x, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\vec{N}$, and $(\vec{W} = \vec{w}^*)$ is not strongly sufficient for $Y = y$ along any subnetwork of $\vec{N}$.

First consider the case where $\vec{N} = \emptyset$. This means that $(X = x, \vec{W} = \vec{w}^*)$ is directly sufficient for $Y = y$, and $(\vec{W} = \vec{w}^*)$ is not directly sufficient for $Y = y$. That means precisely that $X = x$ causes $Y = y$ according to Def 12. The result now follows from Theorem 5.3.

Second consider the case where there exists some $N \in \vec{N}$. If $N$ is not an ancestor of $Y$, it can be removed from $\vec{N}$ without consequence. If $N$ is an ancestor of $Y$, then it cannot be a descendant of $X$. But in that case it does not depend on $X$, and thus we can remove it from $\vec{N}$ and add it to the witness $\vec{W}$ without consequence. Therefore there always exists a choice of witness so that $\vec{N} = \emptyset$, and thus the result follows.

Proposition 6.3:
Out of all definitions we have considered, Def 10 and Def 3 are the only ones which do not satisfy Dependence.

Proof: For the HP definitions this is proven in (Halpern, 2016, p. 26).

Example A.6 shows the result for Def 3.

Example A.4 shows the result for Def 10.

Therefore it remains to be shown that Dependence implies Def 2, Def 4, and Def 8. This is a direct consequence of the fact that Dependence implies Modified HP, combined with Proposition 7.5.
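The two counterexamples cited in this proof both rely on plain counterfactual dependence holding, that is, $(M, \vec{u}) \models X = x \land Y = y$ together with $(M, \vec{u}) \models [X \leftarrow x']Y \neq y$. The following sketch checks that condition for the equations of Examples A.4 and A.6. It assumes binary ranges, a single hypothetical exogenous variable U feeding the root of each model, and a context in which that root is 1; the function names are illustrative.

```python
def solve(equations, context, intervention=None):
    """Solve a recursive model whose equations are listed in causal order."""
    intervention = intervention or {}
    values = {}
    for var, equation in equations.items():
        if var in intervention:
            values[var] = intervention[var]
        else:
            values[var] = equation(values, context)
    return values


def depends(equations, context, cause, x, x_alt, effect, y):
    """Counterfactual dependence: X = x and Y = y actually hold, and
    Y differs from y once X is set to the contrast value x_alt."""
    actual = solve(equations, context)
    if actual[cause] != x or actual[effect] != y:
        return False
    return solve(equations, context, {cause: x_alt})[effect] != y


# Example A.4 (as read here): A is the root, X = A, Y = X AND A.
EXAMPLE_A4 = {
    "A": lambda v, u: u["U"],
    "X": lambda v, u: v["A"],
    "Y": lambda v, u: v["X"] & v["A"],
}

# Example A.6 (as read here): X is the root, A = X, Y = A.
EXAMPLE_A6 = {
    "X": lambda v, u: u["U"],
    "A": lambda v, u: v["X"],
    "Y": lambda v, u: v["A"],
}

context = {"U": 1}
print(depends(EXAMPLE_A4, context, "X", 1, 0, "Y", 1))
print(depends(EXAMPLE_A6, context, "X", 1, 0, "Y", 1))
```

Under these assumptions Y = 1 depends on X = 1 in both models, so any definition that denies causation in one of them fails Dependence.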
Def 2, Def 4, and Def 8, vs the HP definitions
Proposition 7.5: If Modified HP with $\vec{X}$ a singleton, then Def 2, Def 4, and Def 8.

Proof: Recall the root variables $\vec{R}$ from Observation 1. Note that for any setting $\vec{r} \in \mathcal{R}(\vec{R})$, for any set $\vec{Y} \subseteq (\mathcal{V} - \vec{R})$, there exists some $\vec{y}$ so that $\vec{R} = \vec{r}$ is weakly, actually weakly, and strongly sufficient for $\vec{Y} = \vec{y}$.

Assume $X = x$ causes $Y = y$ according to Modified HP with witness $\vec{W}$. This means there exists a $x'$ so that $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*]Y \neq y$. Let $\vec{S} = \vec{R} - (\vec{W} \cup \{X\})$.

First we focus on Def 4. Note that $(X = x, \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is weakly sufficient for $Y = y$. Furthermore, changing $X$ from $x$ to $x'$ obviously has no effect on any of the values in $\vec{R}$. Therefore $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*]\vec{S} = \vec{s}^*$, and thus we get that $(M, \vec{u}) \models [X \leftarrow x', \vec{W} \leftarrow \vec{w}^*, \vec{S} \leftarrow \vec{s}^*]Y \neq y$. (Also, we may assume that $\vec{W} \cap \vec{R} = \emptyset$.) From this it follows that $(X = x', \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is not weakly sufficient for $Y = y$. So taking $(\vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ as witness gives the desired result.

Second we focus on Def 2 (from which Def 8 follows due to Theorem 5.5). Combining the previous statement about $(X = x', \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ with Proposition 4.6 it follows immediately that there does not exist any network $\vec{N}$ so that $(X = x', \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\vec{N}$.

Clearly there exists some $\vec{N}$ so that $\vec{R} = \vec{r}^*$ is strongly sufficient for $Y = y$ along $\vec{N}$. (We can start by picking parents $\vec{A}$ of $Y = y$ such that $\vec{A} = \vec{a}^*$ is directly sufficient for $Y = y$. Then we can take parents of all elements in $\vec{A}$, to get a set $\vec{B}$ so that $\vec{B} = \vec{b}^*$ is directly sufficient for $\vec{A} = \vec{a}^*$, etc.) But then also $(X = x, \vec{S} = \vec{s}^*, \vec{W} = \vec{w}^*)$ is strongly sufficient for $Y = y$ along $\vec{N}$, from which the result follows.

Acknowledgements
Many thanks to Joe Halpern and Naftali Weinberger for helpful comments on earlier versions of this paper. This research was made possible by funding from the Alexander von Humboldt Foundation.

References