Agent Incentives: A Causal Perspective
Tom Everitt,* Ryan Carey,* Eric Langlois,* Pedro A. Ortega, Shane Legg
DeepMind, University of Oxford, University of Toronto, Vector Institute (*equal contribution)
[email protected], [email protected], [email protected], [email protected]
Abstract
We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.
Introduction
A recurring question in AI research is how to choose an objective to induce safe and fair behaviour (O'Neil 2016; Russell 2019). In a given setup, will an optimal policy depend on a sensitive attribute, or seek to influence an important variable? For example, consider the following two incentive design problems, to which we will return throughout the paper:
Example 1 (Grade prediction). To decide which applicants to admit, a university uses a model to predict the grades of new students. The university would like the system to predict accurately, without treating students differently based on their gender or race (see Figure 1a).
Example 2 (Content recommendation). An AI algorithm has the task of recommending a series of posts to a user. The designers want the algorithm to present content adapted to each user's interests to optimize clicks. However, they do not want the algorithm to use polarising content to manipulate the user into clicking more predictably (Figure 1b).
Contributions
This paper provides a common language for incentive analysis, based on influence diagrams (Howard 1990) and causal models (Pearl 2009). Traditionally, influence diagrams have been used to help decision-makers make better decisions. Here, we invert the perspective, and use the diagrams to understand and predict the behaviour of machine learning systems trained to optimize an objective in a given environment. To facilitate this analysis, we prove a number of relevant theorems and introduce two new concepts:
• Value of Information (VoI): First defined by Howard (1966), a graphical criterion for detecting positive VoI in influence diagrams was proposed and proven sound by Fagiuoli and Zaffalon (1998), Lauritzen and Nilsson (2001), and Shachter (2016). Here we offer the first correct completeness proof, showing that the graphical criterion is unique and cannot be further improved upon.
• Value of Control (VoC): Defined by Shachter (1986), Matheson (1990), and Shachter and Heckerman (2010); an incomplete graphical criterion was discussed by Shachter (1986). Here we provide a complete graphical criterion, along with both soundness and completeness proofs.
• Instrumental control incentive (ICI): We propose a refinement of VoC to nodes the agent can influence with its decision. Conceptually, this is a hybrid of VoC and responsiveness (Shachter 2016). We offer a formal definition of instrumental control incentives based on nested counterfactuals, and establish a sound and complete graphical criterion.
• Response incentive (RI): Which changes in the environment does an optimal policy respond to? This is a central problem in fairness and AI safety (e.g. Kusner et al. 2017; Hadfield-Menell et al. 2017). Again, we give a formal definition, and a sound and complete graphical criterion.

Our analysis focuses on influence diagrams with a single decision. This single-decision setting is adequate to model supervised learning, (contextual) bandits, and the choice of a policy in an MDP. Previous work has also discussed ways to transform a multi-decision setting into a single-decision setting by imputing policies to later decisions (Shachter 2016).
Applicability
This paper combines material from two preprints (Everitt et al. 2019c; Carey et al. 2020). Since the release of these preprints, the unified language of causal influence diagrams has already aided in the understanding of incentive problems such as an agent's redirectability, ambition, tendency to tamper with reward, and other properties (Armstrong et al. 2020; Holtman 2020; Cohen, Vellambi, and Hutter 2020; Everitt et al. 2019a,b; Langlois and Everitt 2021).
Figure 1: Two examples of decision problems represented as causal influence diagrams. In (a) a predictor at a hypothetical university aims to estimate a student's grade, using as inputs their gender and the high school they attended. We ask whether the predictor is incentivised to behave in a discriminatory manner with respect to the students' gender and race. In this hypothetical cohort of students, performance is assumed to be a function of the quality of the high-school education they received. A student's high school is assumed to be impacted by their race, and can affect the quality of their education. Gender, however, is assumed not to have an effect. In (b) the goal of a content recommendation system is to choose posts that will maximise the user's click rate. However, the system's designers prefer the system not to manipulate the user's opinions in order to obtain more clicks.
Setup
To analyse agents' incentives, we will need a graphical framework with the causal properties of a structural causal model and the node categories of an influence diagram. This section will define such a model after reviewing structural causal models and influence diagrams.
Structural Causal Models
Structural causal models (SCMs; Pearl 2009) are a type of causal model where all randomness is consigned to exogenous variables, while deterministic structural functions relate the endogenous variables to each other and to the exogenous ones. As demonstrated by Pearl (2009), this structural approach has significant benefits over traditional causal Bayesian networks for analysing (nested) counterfactuals and "individual-level" effects.
Definition 1 (Structural causal model; Pearl 2009, Chapter 7). A structural causal model (with independent errors) is a tuple ⟨E, V, F, P⟩, where E is a set of exogenous variables; V is a set of endogenous variables; and F = {f_V}_{V∈V} is a collection of functions, one for each V. Each function f_V : dom(Pa_V ∪ {E_V}) → dom(V) specifies the value of V in terms of the values of the corresponding exogenous variable E_V and endogenous parents Pa_V ⊂ V, where these functional dependencies are acyclic. The domain of a variable V is dom(V), and for a set of variables, dom(W) := ×_{W∈W} dom(W). The uncertainty is encoded through a probability distribution P(ε) such that the exogenous variables are mutually independent.

For example, Figure 2b shows an SCM that models how posts (D) can influence a user's opinion (O) and clicks (U).

The exogenous variables E of an SCM represent factors that are not modelled. For any value E = ε of the exogenous variables, the value of any set of variables W ⊆ V is given by recursive application of the structural functions F and is denoted by W(ε). Together with the distribution P(ε) over exogenous variables, this induces a joint distribution Pr(W = w) = Σ_{ε : W(ε) = w} P(ε).

SCMs model causal interventions that set variables to particular values. These are defined via submodels:

Definition 2 (Submodel; Pearl 2009, Chapter 7). Let M = ⟨E, V, F, P⟩ be an SCM, X a set of variables in V, and x a particular realization of X. The submodel M_x represents the effects of an intervention do(X = x), and is formally defined as the SCM ⟨E, V, F_x, P⟩ where F_x = {f_V | V ∉ X} ∪ {X = x}. That is to say, the original functional relationships of X ∈ X are replaced with the constant functions X = x.

More generally, a soft intervention on a variable X in an SCM M replaces f_X with a function g_X : dom(Pa_X ∪ {E_X}) → dom(X) (Eberhardt and Scheines 2007; Tian and Pearl 2001).
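Definitions 1 and 2 can be made concrete with a few lines of code. The sketch below encodes a toy SCM in the shape of the content-recommendation example (posts D, opinion O, clicks U) and evaluates it under hard interventions do(X = x), i.e. in the submodel M_x; it also computes the nested counterfactual U_{O_d} discussed below. The binary domains and the particular structural functions are illustrative assumptions, not the paper's own specification.

```python
# A toy SCM for the content-recommendation graph of Figure 2. The
# binary domains and structural functions are illustrative assumptions.

def f_D(eps):                 # posts shown: 0 = apolitical, 1 = political
    return eps["eps_D"]

def f_O(d, eps):              # user opinion, influenced by the posts
    return d ^ eps["eps_O"]

def f_U(d, o, eps):           # clicks, depending on opinion and noise
    return o + eps["eps_U"]

def evaluate(eps, do=None):
    """Recursively apply the structural functions under a setting eps of
    the exogenous variables; `do` maps variables to intervened values,
    giving the submodel M_x of Definition 2."""
    do = do or {}
    d = do.get("D", f_D(eps))
    o = do.get("O", f_O(d, eps))
    u = do.get("U", f_U(d, o, eps))
    return {"D": d, "O": o, "U": u}

def U_nested(eps, d):
    """The nested counterfactual U_{O_d}(eps) = U_o(eps) with o = O_d(eps):
    the effect of do(D = d) reaches U only through O."""
    o = f_O(d, eps)                # O under the hypothetical do(D = d)
    return f_U(f_D(eps), o, eps)   # U sees the default D, counterfactual O

eps = {"eps_D": 1, "eps_O": 0, "eps_U": 1}
print(evaluate(eps))                 # default: D=1, O=1, U=2
print(evaluate(eps, do={"D": 0}))    # do(D=0): O=0, U=1
print(U_nested(eps, d=0))            # U_{O_0} = 1: default D, but O as if d=0
```

Note how the intervention overrides f_D exactly as the submodel construction prescribes: the exogenous ε_D is simply never consulted for an intervened variable.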
The probability distribution Pr(W_{g_X}) on any W ⊆ V is defined as the value of Pr(W) in the submodel M_{g_X}, where M_{g_X} is M modified by replacing f_X with g_X.

If W is a variable in an SCM M, then W_x refers to the same variable in the submodel M_x and is called a potential response variable. In Figure 2b, the random variable O represents user opinion under "default" circumstances, while O_d in Figure 2c represents the user's opinion given an intervention do(D = d) on the content posted. Note also how the intervention on D severs the link from ε_D to d in Figure 2c, as the intervention on D overrides the causal effect from D's parents. Throughout this paper we use subscripts to indicate submodels or interventions, and superscripts for indexing.

More elaborate hypotheticals can be described with a nested counterfactual, in which the intervention is itself a potential response variable. In Figure 2c, the click probability U depends on both the chosen posts D and the user opinion O, which is in turn also influenced by D. The nested potential response variable U_{O_d}, defined by U_{O_d}(ε) := U_o(ε) where o = O_d(ε), represents the probability that a user clicks on a "default" post D given that their opinion has been influenced by a hypothetical post d. In other words, the effect of the intervention do(D = d) is propagated to U only through O.

Causal Influence Diagrams
Influence diagrams are graphical models with special decision and utility nodes, developed to model decision-making problems (Howard 1990; Lauritzen and Nilsson 2001). Influence diagrams do not in general have causal semantics, although some causal structure can be inferred (Heckerman and Shachter 1995). We will assume that the edges of the influence diagram reflect the causal structure of the environment, so we use the term "causal influence diagram".

Figure 2: An example of a SCIM and interventions. In the SCIM, either political or apolitical posts D are displayed. These affect the user's opinion O. D and O influence the user's clicks U (a). Given a policy, the SCIM becomes an SCM (b). Interventions and counterfactuals may be defined in terms of this SCM. For example, the nested counterfactual U_{O_d} represents the number of clicks if the user has the opinions that they would arrive at after viewing apolitical content (c).

Definition 3 (Causal influence diagram). A causal influence diagram (CID) is a directed acyclic graph G where the vertex set V is partitioned into structure nodes X, decision nodes D, and utility nodes U. Utility nodes have no children.

We use Pa_V and Desc_V to denote the parents and descendants of a node V ∈ V. The parents of the decision, Pa_D, are also called observations. An edge from node V to node Y is denoted V → Y. Edges into decisions are called information links, as they indicate what information is available at the time of the decision. A directed path (of length at least zero) is denoted V ⇢ Y. For sets of variables, V ⇢ Y means that V ⇢ Y holds for some V ∈ V, Y ∈ Y.

Structural Causal Influence Models
For our new incentive concepts, we define a hybrid of the influence diagram and the SCM. Such a model, originally proposed by Dawid (2002), has structure and utility nodes with associated functions, exogenous variables with an associated probability distribution, and decision nodes without any function at all, until one is selected by an agent. This can be formalised as the structural causal influence model (SCIM, pronounced 'skim').
Definition 4 (Structural causal influence model). A structural causal influence model (SCIM) is a tuple M = ⟨G, E, F, P⟩ where:

• G is a CID with finite-domain variables V (partitioned into X, D, and U) where utility variable domains are a subset of ℝ. We say that M is compatible with G.
• E = {E_V}_{V∈V} is a set of finite-domain exogenous variables, one for each endogenous variable.
• F = {f_V}_{V∈V∖D} is a set of structural functions f_V : dom(Pa_V ∪ {E_V}) → dom(V) that specify how each non-decision endogenous variable depends on its parents in G and its associated exogenous variable.
• P is a probability distribution for E such that the individual exogenous variables E_V are mutually independent.

(Dawid called this a "functional influence diagram". We favour the term SCIM, because the corresponding term SCM is more prevalent than "functional model".)

We will restrict our attention to single-decision settings with D = {D}. An example of such a SCIM for the content recommendation example is shown in Figure 2a. In single-decision SCIMs, the decision-making task is to maximize expected utility by selecting a decision d ∈ dom(D) based on the observations Pa_D. More formally, the task is to select a structural function for D in the form of a policy π : dom(Pa_D ∪ {E_D}) → dom(D). The exogenous variable E_D provides randomness to allow the policy to be a stochastic function of its endogenous parents Pa_D. The specification of a policy turns a SCIM M into an SCM M_π := ⟨E, V, F ∪ {π}, P⟩; see Figure 2b. With the resulting SCM, the standard definitions of causal interventions apply. Note that what determines whether a node is observed or not at the time of decision-making is whether the node is a parent of the decision. Commonly, some structure nodes represent latent variables that are unobserved.

We use Pr_π and E_π to denote probabilities and expectations with respect to M_π.
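The decision-making task above can be solved exactly for small finite SCIMs by enumeration. The sketch below, a toy assumption of our own (one structure node X observed by D, and utility 1 when the decision matches X), fixes a deterministic policy π to obtain M_π, computes E_π[U] by summing over exogenous outcomes, and searches the finite policy space for an optimum. It also previews the role of information links: cutting X → D lowers the attainable utility.

```python
import itertools

# A minimal single-decision SCIM (illustrative assumption): X = eps_X,
# D observes X, and U = 1 iff D matches X. Fixing a policy pi turns the
# SCIM into an SCM M_pi, in which E_pi[U] is a finite sum over eps.

P_EPS_X = {0: 0.5, 1: 0.5}            # P(eps_X); structural function: X = eps_X

def expected_utility(pi, observe_x=True):
    """E_pi[U] in M_pi; with observe_x=False the information link X -> D
    is cut, so the policy can only act as a constant."""
    total = 0.0
    for eps_x, p in P_EPS_X.items():
        x = eps_x                               # f_X
        d = pi[x] if observe_x else pi[0]       # policy: dom(Pa_D) -> dom(D)
        total += p * (1.0 if d == x else 0.0)   # f_U
    return total

# Exhaustive search over the four deterministic policies {0,1} -> {0,1}.
policies = [dict(enumerate(vals)) for vals in itertools.product((0, 1), repeat=2)]
best = max(policies, key=expected_utility)
print(expected_utility(best))    # 1.0: the identity policy is optimal
print(max(expected_utility(pi, observe_x=False) for pi in policies))  # 0.5
```

The drop from 1.0 to 0.5 when the information link is removed is exactly the kind of comparison the materiality definition below formalises.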
For a set of variables X not in Desc_D, Pr_π(x) is independent of π and we simply write Pr(x). An optimal policy for a SCIM is defined as any policy π that maximises E_π[U], where U := Σ_{U∈U} U. A potential response U_x is defined as U_x := Σ_{U∈U} U_x.

Materiality

Next, we review a characterization of which observations are material for optimal performance, as this will be a fundamental building block for most of our theory.

Definition 5 (Materiality; Shachter 2016). For any given SCIM M, let V*(M) = max_π E_π[U] be the maximum attainable utility in M, and let M_{X↛D} be M modified by removing any information link X → D. The observation X ∈ Pa_D is material if V*(M_{X↛D}) < V*(M).

Nodes may often be identified as immaterial based on the graphical structure alone (Fagiuoli and Zaffalon 1998; Lauritzen and Nilsson 2001; Shachter 2016). (In contrast to subsequent sections, the results in this section and the VoI section do not require the influence diagrams to be causal.) The graphical criterion uses the notion of d-separation.
Definition 6 (d-separation; Verma and Pearl 1988). A path p is said to be d-separated by a set of nodes Z if and only if:
1. p contains a collider X → W ← Y, such that the middle node W is not in Z and no descendants of W are in Z, or
2. p contains a chain X → W → Y or fork X ← W → Y where W is in Z, or
3. one or both of the endpoints of p is in Z.
A set Z is said to d-separate X from Y, written (X ⊥ Y | Z), if and only if Z d-separates every path from a node in X to a node in Y. Sets that are not d-separated are called d-connected.

According to the graphical criterion of Fagiuoli and Zaffalon (1998), an observation cannot provide useful information if it is d-separated from utility, conditional on other observations. This condition is called nonrequisiteness.

Definition 7 (Nonrequisite observation; Lauritzen and Nilsson 2001). Let U^D := U ∩ Desc_D be the utility nodes downstream of D. An observation X ∈ Pa_D in a single-decision CID G is nonrequisite if:

X ⊥ U^D | (Pa_D ∪ {D}) ∖ {X}   (1)

In this case, the edge X → D is also called nonrequisite. Otherwise, X and X → D are requisite.

For example, in Figure 3a, high school is a requisite observation while gender is not.

Value of Information
Materiality can be generalized to nodes that are not observed, to assess which variables a decision-maker would benefit from knowing before making a decision, i.e. which variables have VoI (Howard 1966; Matheson 1990). To assess VoI for a variable X, we first make X an observation by adding a link X → D, and then test whether X is material in the updated model (Shachter 2016).

Definition 8 (Value of information). A node X ∈ V ∖ Desc_D in a single-decision SCIM M has VoI if it is material in the model M_{X→D} obtained by adding the edge X → D to M. A CID G admits VoI for X if X has VoI in a SCIM M compatible with G.

Since Definition 8 adds an information link, it can only be applied to non-descendants of the decision, lest cycles be created in the graph. Fortunately, the structural functions need not be adapted for the added link, since there is no structural function associated with D.

We prove that the graphical criterion of Definition 7 is tight for both materiality and VoI, in that it identifies every zero-VoI node that can be identified from the graphical structure (in a single-decision setting).

Theorem 9 (Value of information criterion). A single-decision CID G admits VoI for X ∈ V ∖ Desc_D if and only if X is a requisite observation in G_{X→D}, the graph obtained by adding X → D to G.

The soundness direction (i.e. the only-if direction) follows from d-separation (Fagiuoli and Zaffalon 1998; Lauritzen and Nilsson 2001; Shachter 2016). In contrast, the completeness direction does not follow from the completeness property of d-separation. The d-connectedness of X to U implies that U may be conditionally dependent on X. It does not imply, however, that the expectation of U or the utility attainable under an optimal policy will change.
Instead, our proof (Appendix C.1) constructs a SCIM such that X is material. This differs from a previous attempt by Nielsen and Jensen (1999), as discussed in Related Work.

We apply the graphical criterion to the grade prediction example in Figure 3a. One can see that the predictor has an incentive to use the incoming student's high school but not gender. This makes intuitive sense, given that gender provides no information useful for predicting the university grade in this example.

Response Incentives
There are two ways to understand a material observation. One is that it provides useful information. From this perspective, a natural generalisation is VoI, as described in the previous section. An alternative perspective is that a material observation is one that influences optimal decisions. Under this interpretation, the natural generalisation is the set of all (observed and unobserved) variables that influence the decision. We say that these variables have a response incentive.

Definition 10 (Response incentive). Let M be a single-decision SCIM. A policy π responds to a variable X ∈ X if there exists some intervention do(X = x) and some setting E = ε such that D_x(ε) ≠ D(ε). The variable X has a response incentive if all optimal policies respond to X. A CID admits a response incentive on X if it is compatible with a SCIM that has a response incentive on X.

For a response incentive on X to be possible, there must be: i) a directed path X ⇢ D, and ii) an incentive for D to use information from that path. For example, in Figure 3a, gender has a directed path to the decision but it does not provide any information about the likely grade, so there is no response incentive. The graphical criterion for RI builds on a modified graph with nonrequisite information links removed.

Definition 11 (Minimal reduction; Lauritzen and Nilsson 2001). The minimal reduction G^min of a single-decision CID G is the result of removing from G all information links from nonrequisite observations.

The presence (or absence) of a path X ⇢ D in the minimal reduction tells us whether a response incentive can occur.

Theorem 12 (Response incentive criterion). A single-decision CID G admits a response incentive on X ∈ X if and only if the minimal reduction G^min has a directed path X ⇢ D.
(The term responsiveness (Heckerman and Shachter 1995; Shachter 2016) has a related but not identical meaning: it refers to whether a decision D affects a variable X, rather than whether X affects D.)

Figure 3: In (a), the admissible incentives of the grade prediction example from Figure 1a are shown, including a response incentive on race. In (b), the predictor no longer has access to the students' high school, and hence there can no longer be any response incentive on race.
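The criterion of Theorem 12 is fully mechanical. The sketch below, assuming the grade-prediction graph of Figure 3a encoded as parent lists, tests d-separation with the standard ancestral moral-graph construction, derives the requisite observations (Definition 7), forms the minimal reduction (Definition 11), and then checks for a directed path X ⇢ D. The node names are our own encoding of the example.

```python
from collections import deque

# Checking Theorem 12 on the grade-prediction CID of Figure 3a.

PARENTS = {
    "race": [], "gender": [],
    "high_school": ["race"],
    "education": ["high_school"],
    "grade": ["education"],
    "predicted_grade": ["gender", "high_school"],   # the decision D
    "accuracy": ["grade", "predicted_grade"],       # the utility U
}
D, UTILS = "predicted_grade", {"accuracy"}

def ancestors(nodes):
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(xs, ys, zs):
    """Moralize the ancestral graph of xs|ys|zs, delete zs, and test
    undirected reachability (the standard d-separation algorithm)."""
    keep = ancestors(xs | ys | zs)
    adj = {v: set() for v in keep}
    for v in keep:
        for p in PARENTS[v]:
            adj[v].add(p)
            adj[p].add(v)          # drop edge directions
        for p in PARENTS[v]:
            for q in PARENTS[v]:
                if p != q:
                    adj[p].add(q)  # marry co-parents
    reach, queue = xs - zs, deque(xs - zs)
    while queue:
        for w in adj[queue.popleft()]:
            if w not in reach and w not in zs:
                reach.add(w)
                queue.append(w)
    return not (reach & ys)

def requisite(x):
    """Definition 7: X in Pa_D is requisite iff d-connected to the
    downstream utilities given (Pa_D plus D) minus X."""
    return not d_separated({x}, UTILS, (set(PARENTS[D]) | {D}) - {x})

# Minimal reduction: drop information links from nonrequisite observations.
MIN_PARENTS = dict(PARENTS)
MIN_PARENTS[D] = [x for x in PARENTS[D] if requisite(x)]

def has_directed_path(src, dst):   # length >= 0, in the minimal reduction
    seen, stack = set(), [src]
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        seen.add(v)
        stack.extend(c for c in MIN_PARENTS if v in MIN_PARENTS[c] and c not in seen)
    return False

print(requisite("gender"))             # False: gender -> D is nonrequisite
print(has_directed_path("race", D))    # True: response incentive admitted
print(has_directed_path("gender", D))  # False: none admitted for gender
```

Matching the text: the gender → D link is dropped in the minimal reduction, so only race retains a directed path to the decision.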
Proof.
The if (completeness) direction is proved in Lemma 28 in Appendix C.2. For the soundness direction, assume that for G, the minimal reduction G^min does not contain a directed path X ⇢ D. Let M = ⟨G, E, F, P⟩ be any SCIM compatible with G. Let M^min = ⟨G^min, E, F, P⟩ be M, but with the minimal reduction G^min. By Lemma 25 in Appendix C, there exists a G^min-respecting policy π̃ that is optimal in M. In M^min_π̃, X is causally irrelevant for D, so D(ε) = D_x(ε). Furthermore, M_π̃ and M^min_π̃ are the same SCM, with the functions F ∪ {π̃}. So D(ε) = D_x(ε) also in M_π̃, which means that there is an optimal policy in M that does not respond to interventions on X for any ε.

The intuition behind the proof is that an optimal decision only responds to effects that propagate to one of its requisite observations. For the completeness direction, we show in Appendix C.2 that if X ⇢ D is present in the minimal reduction G^min, then we can select a SCIM M compatible with G such that D receives useful information along that path, that any optimal policy must respond to.

In a safety setting, it may be desirable for an AI system to have an incentive to respond to its shutdown button, so that when asked to shut down, it does so (Hadfield-Menell et al. 2017). In a fairness setting, on the other hand, a response incentive may be a cause for concern, as illustrated next.

Incentivised unfairness
Response incentives are closely related to counterfactual fairness (Kusner et al. 2017; Kilbertus et al. 2017). A prediction — or more generally, a decision — is considered counterfactually unfair if a change to a sensitive attribute like race or gender would change the decision.
Definition 13 (Counterfactual fairness; Kusner et al. 2017). A policy π is counterfactually fair with respect to a sensitive attribute A if

Pr_π(D_{a′} = d | pa_D, a) = Pr_π(D = d | pa_D, a)

for every decision d ∈ dom(D), every context pa_D ∈ dom(Pa_D), and every pair of attributes a, a′ ∈ dom(A) with Pr(pa_D, a) > 0.

A response incentive on a sensitive attribute indicates that counterfactual unfairness is incentivised, as it implies that all optimal policies are counterfactually unfair:

Theorem 14 (Counterfactual fairness and response incentives). In a single-decision SCIM M with a sensitive attribute A ∈ X, all optimal policies π* are counterfactually unfair with respect to A if and only if A has a response incentive.

The proof is given in Appendix C.5.

A response incentive on a sensitive attribute means that counterfactual unfairness is not just possible, but incentivised. As a result, it has a more restrictive graphical criterion. The graphical criterion for counterfactual fairness states that a decision can only be counterfactually unfair with respect to a sensitive attribute if that attribute is an ancestor of the decision (Kusner et al. 2017, Lemma 1). For example, in the grade prediction example of Figure 3a, it is possible for a predictor to be counterfactually unfair with respect to either gender or race, because both are ancestors of the decision. The response incentive criterion can tell us in which case counterfactual unfairness is actually incentivised. In this example, the minimal reduction includes the edge from high school to predicted grade and hence the directed path from race to predicted grade. However, it excludes the edge from gender to predicted grade.
This means that the agent is incentivised to be counterfactually unfair with respect to race but not to gender.

Based on this, how should the system be redesigned? According to the response incentive criterion, the most important change is to remove the path from race to predicted grade in the minimal reduction. This can be done by removing the agent's access to high school. This change is implemented in Figure 3b, where there is no response incentive on either sensitive variable.

Value of information is also related to fairness. For a sensitive variable that is not a parent of the decision, positive VoI means that if the predictor gained access to its value, then the predictor would use it. For example, if in Figure 3b an edge is added from race to predicted grade, then unfair behaviour will result. In practice, such access can result from unanticipated correlations between the sensitive attribute and parents of the decision, rather than the system being given direct access to the attribute. Analysing VoI may help detect such problems at an early stage. However, VoI is less closely related to counterfactual fairness than response incentives. In particular, race lacks VoI in Figure 3a, but counterfactual unfairness is incentivised. On the other hand, Figure 3b admits positive VoI for race, but counterfactual unfairness is not incentivised.

The incentive approach is not restricted to counterfactual fairness. For any fairness definition, one could assess whether that kind of unfairness is incentivised by checking whether it is present under all optimal policies.

Value of Control

A variable has VoC if a decision-maker could benefit from setting its value (Shachter 1986; Matheson 1990; Shachter and Heckerman 2010). Concretely, we ask whether the attainable utility can be increased by letting the agent decide the structural function for the variable.
Definition 15 (Value of control). In a single-decision SCIM M, a non-decision node X has positive value of control if

max_π E_π[U] < max_{π, g_X} E_π[U_{g_X}],

where g_X : dom(Pa_X ∪ {E_X}) → dom(X) is a soft intervention at X, i.e. a new structural function for X that respects the graph. A CID G admits positive value of control for X if there exists a SCIM M compatible with G where X has positive value of control.

This can be deduced from the graph, using again the minimal reduction (Definition 11) to rule out effects through observations that an optimal policy can ignore.

Theorem 16 (Value of control criterion). A single-decision CID G admits positive value of control for a node X ∈ V ∖ {D} if and only if there is a directed path X ⇢ U in the minimal reduction G^min.

Proof. The if (completeness) direction is proved in Lemma 29. The proof of only if (soundness) is as follows. Let M = ⟨G, E, F, P⟩ be a single-decision SCIM. Let M_{g_X} be M, but with the structural function f_X replaced with g_X. Let M^min and M^min_{g_X} be the same SCIMs, respectively, but replacing each graph with the minimal reduction G^min.

Recall that E_π[U_{g_X}] is defined by applying the soft intervention g_X to the (policy-completed) SCM M_π. However, this is equivalent to applying the policy π to the modified SCIM M_{g_X}, as the resulting SCMs are identical. Since M_{g_X} is a SCIM, Lemma 25 can be applied to find a G^min-respecting optimal policy π̃ for M_{g_X}. Consider now the expected utility under an arbitrary intervention g_X for a policy π optimal for M_{g_X}:

E_π[U_{g_X}] in M
  = E_π[U] in M_{g_X}          (by SCM equivalence)
  = E_π̃[U] in M_{g_X}          (by Lemma 25)
  = E_π̃[U] in M^min_{g_X}      (since π̃ is G^min-respecting)
  = E_π̃[U] in M^min            (by Lemma 23)
  = E_π̃[U] in M                (only increasing the policy set)
  ≤ max_{π*} E_{π*}[U] in M    (max dominates all elements).

This shows that X must lack value of control.

The proof of the completeness direction (Appendix C.3) establishes that if a path exists, then a SCIM can be selected where the intervention on X can either directly control U or increase the useful information available at D.

To apply this criterion to the content recommendation example (Figure 4a), we first obtain the minimal reduction, which is identical to the original graph. Since all non-decision nodes are upstream of the utility in the minimal reduction, they all admit positive VoC. Notably, this includes nodes like original user opinions and model of user opinions that the decision has no ability to control according to the graphical structure. In the next section, we propose instrumental control incentives, which incorporate the agent's limitations.

Instrumental Control Incentive
Would an agent use its decision to control a variable X? This question has two parts: whether X is useful to control (VoC), and whether X is possible to control (responsiveness). As described in the previous section, VoC uses U_{g_X} to consider the utility attainable from arbitrary control of X. Meanwhile, X_d describes the way X can be controlled by D. These notions can be combined with a nested counterfactual U_{X_d}, which expresses the effect that D can have on U by controlling X.

Definition 17 (Instrumental control incentive). In a single-decision SCIM M, there is an instrumental control incentive on a variable X in decision context pa_D if, for all optimal policies π*,

E_{π*}[U_{X_d} | pa_D] ≠ E_{π*}[U | pa_D].   (2)

Conceptually, an instrumental control incentive can be interpreted as follows. If the agent got to choose D to influence X independently of how D influences other aspects of the environment, would that choice matter? We call it an instrumental control incentive, as the control of X is a tool for achieving utility (cf. instrumental goals; Omohundro 2008; Bostrom 2014). ICIs do not consider side-effects of the optimal policy: for instance, it may be that all optimal policies affect X in a particular way, even if X is not an ancestor of any utility node — in such cases, no ICI is present. Finally, in Pearl's (2001) terminology, an instrumental control incentive corresponds to a natural indirect effect from D to U via X in M_{π*}, for all optimal policies π*.

A CID G admits an instrumental control incentive on X if G is compatible with a SCIM M with an instrumental control incentive on X for some decision context pa_D. The following theorem gives a sound and complete graphical criterion for which CIDs admit instrumental control incentives.

Theorem 18 (Instrumental control incentive criterion). A single-decision CID G admits an instrumental control incentive on X ∈ V if and only if G has a directed path from the decision D to a utility node U ∈ U that passes through X, i.e. a directed path D ⇢ X ⇢ U.

Proof. Completeness (the if direction) is proved in Appendix C.4. The proof of soundness is as follows. Let M be any SCIM compatible with G and π any policy for M. We consider variables in the SCM M_π. If there is no directed path D ⇢ X ⇢ U in G, then either there is no directed path D ⇢ X or none X ⇢ U. If there is no path D ⇢ X, then X_d(ε) = X(ε) for any setting ε ∈ dom(E) and decision d (Lemma 20). Therefore, U(ε) = U_{X_d}(ε). Similarly, if there is no path X ⇢ U, then U(ε) = U_x(ε) for every setting ε ∈ dom(E), x ∈ dom(X), and U ∈ U, so U(ε) = U_{X_d}(ε). In either case, E_π[U | pa_D] = E_π[U_{X_d} | pa_D] and there is no instrumental control incentive on X.

The logic behind the soundness proof above is that if there is no path from D to X to U, then D cannot have any effect on U via X. For the completeness direction proved in Appendix C.4, we show how to construct a SCIM so that U_{X_d} differs from the non-intervened U for any diagram with a path D ⇢ X ⇢ U.

Figure 4: In (a), the content recommendation example from Figure 1b is shown to admit an instrumental control incentive on user opinion. This is avoided in (b) with a change to the objective.

Let us apply this criterion to the content recommendation example in Figure 4a. The only nodes X in this graph that lie on a path D ⇢ X ⇢ U are clicks and influenced user opinions. Since influenced user opinions has an instrumental control incentive, the agent may seek to influence that variable in order to attain utility.
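The contrast between Theorem 16 and Theorem 18 on this example comes down to two reachability checks. The sketch below encodes the content-recommendation graph of Figure 4a as an edge list (our own encoding of the example; here the minimal reduction equals the original graph) and tests each node for VoC (X ⇢ U) and for an admitted ICI (D ⇢ X ⇢ U, with X other than the decision itself).

```python
# Contrasting Theorem 16 (VoC) and Theorem 18 (ICI) on the
# content-recommendation CID of Figure 4a.

CHILDREN = {
    "original_opinions": ["model_of_opinions", "influenced_opinions"],
    "model_of_opinions": ["posts"],              # observation -> decision
    "posts": ["influenced_opinions", "clicks"],  # posts = the decision D
    "influenced_opinions": ["clicks"],
    "clicks": [],                                # clicks = the utility U
}
D, U = "posts", "clicks"

def reachable(src, dst):
    """Directed path of length >= 0 from src to dst."""
    seen, stack = set(), [src]
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        seen.add(v)
        stack.extend(c for c in CHILDREN[v] if c not in seen)
    return False

def admits_voc(x):   # Theorem 16: X ⇢ U (here the minimal reduction = G)
    return x != D and reachable(x, U)

def admits_ici(x):   # Theorem 18: D ⇢ X ⇢ U, for non-decision X
    return x != D and reachable(D, x) and reachable(x, U)

for x in CHILDREN:
    print(x, admits_voc(x), admits_ici(x))
# Every non-decision node admits VoC, but only influenced_opinions and
# clicks admit an ICI: the decision cannot reach the upstream nodes.
```

This reproduces the observation in the text: original user opinions and the opinion model admit VoC yet no ICI, because no directed path leads from the decision to them.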
For example, it may be easier to predict what content a more emotional user will click on, and therefore a recommender may achieve a higher click rate by introducing posts that induce strong emotions. How could we instead design the agent to maximise clicks without manipulating the user's opinions (i.e. without an instrumental control incentive on influenced user opinions)? As shown in Figure 4b, we could redesign the system so that instead of being rewarded for the true click rate, it is rewarded for the clicks it would be predicted to have, based on a separately trained model of the user's preferences. An agent trained in this way would view any modification of user opinions as irrelevant for improving its performance; however, it would still have an instrumental control incentive for predicted clicks, so it would still deliver desired content. To avoid undesirable behaviour in practice, the click prediction must truly predict whether the original user would click the content, rather than baking in the effect of changes to the user's opinion from reading earlier posts. This could be accomplished, for instance, by training a model to predict how many clicks each post would receive if it was offered individually.

This dynamic is related to concerns about the long-term safety of AI systems. For example, Russell (2019) has hypothesised that an advanced AI system would seek to manipulate its objective function (or human overseer) to obtain reward. This can be understood as an instrumental control incentive on the objective function (or the overseer's behaviour). A better understanding of incentives could therefore be relevant for designing safe systems in both the short and long term.

Related Work
Causal influence diagrams
Jern and Kemp (2011) and Kleiman-Weiner et al. (2015) define influence diagrams with causal edges, and similarly use them to model decision-making of rational agents (although they are less formal than us, and focus on human decision-making). An informal precursor of the SCIM that also used structural functions (as opposed to conditional probability distributions) was the "functional influence diagram" (Dawid 2002). The most similar alternative model is the Howard canonical form influence diagram (Howard 1990; Heckerman and Shachter 1995). However, this only permits counterfactual reasoning downstream of decisions, which is inadequate for defining the response incentive. Similarly, the causality property for influence diagrams introduced by Heckerman and Shachter (1994) and Shachter and Heckerman (2010) only constrains the relationships to be partially causal downstream of the decision (though adding new decision-node parents to all nodes makes the diagram fully causal). Appendix A shows by example why the stronger causality property is necessary for most of our incentive concepts.

An open-source Python implementation of CIDs has recently been developed (Fox et al. 2021).

Value of information and control
Theorems 9 and 16 for value of information and value of control build on previous work. The concepts were first introduced by Howard (1966) and Shachter (1986), respectively. The VoI soundness proof follows previous proofs (Shachter 1998; Lauritzen and Nilsson 2001), while the VoI completeness proof is most similar to an attempted proof by Nielsen and Jensen (1999). They propose the criterion X ⊥̸ U_D | Pa_D for requisite nodes (Def. 6 defines d-separation for potentially overlapping sets), which differs from (1) in the conditioned set. Taken literally, their criterion is unsound for requisite nodes and positive VoI. For example, in Figure 3a, High school is d-separated from accuracy given Pa_D, so their criterion would fail to detect that High school is requisite and admits VoI. Furthermore, to prove that nodes meeting the d-connectedness property are requisite, Nielsen and Jensen claim that "X is [requisite] for D if Pr(dom(U) | D, Pa_D) is a function of X and U is a utility function relevant for D". However, U being a function of X only proves that U is conditionally dependent on X, not that it changes the expected utility, or is requisite or material. Additional argumentation is needed to show that conditioning on X can actually change the expected utility; our proof provides such an argument. Since a preprint of this paper was placed online (Everitt et al. 2019c), the VoI completeness result was independently discovered by Zhang, Kumor, and Bareinboim (2020, Thm. 2) and Lee and Bareinboim (2020, Thm. 1). Theorem 2 in the latter also provides a criterion for material observations in a multi-decision setting.

To have positive VoC, it is known that a node must be an ancestor of a value node (Shachter 1986), but the authors know of no more-specific criterion. The concept of a relevant node introduced by Nielsen and Jensen (1999) also bears some resemblance to VoC. The open-source CID implementation mentioned above is available at https://github.com/causalincentives/pycid.
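The d-separation tests that these criteria rely on can be computed with the standard ancestral moral graph construction: restrict to ancestors of the three sets, marry co-parents, drop edge directions, delete the conditioning set, and test connectivity. A self-contained sketch (helper names are our own; for simplicity it assumes the three sets are disjoint, unlike the overlapping-sets d-separation of the paper's Def. 6):

```python
def ancestors(parents, nodes):
    """All ancestors of `nodes` (inclusive) in a DAG given as child -> parents."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents.get(n, ()))
    return result

def d_separated(parents, xs, ys, zs):
    """Ancestral moral graph test for (xs ⊥ ys | zs); sets must be disjoint."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            adj[child].add(p); adj[p].add(child)              # drop direction
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])  # marry co-parents
    stack, seen = list(xs), set()
    while stack:
        n = stack.pop()
        if n in ys:
            return False
        if n not in seen:
            seen.add(n)
            stack.extend(m for m in adj[n] if m not in zs)
    return True

# Chain A -> B -> C: blocked by conditioning on B.
chain = {"B": ["A"], "C": ["B"]}
assert d_separated(chain, {"A"}, {"C"}, {"B"})
assert not d_separated(chain, {"A"}, {"C"}, set())

# Collider A -> C <- B: marginally separated, opened by conditioning on C.
collider = {"C": ["A", "B"]}
assert d_separated(collider, {"A"}, {"B"}, set())
assert not d_separated(collider, {"A"}, {"B"}, {"C"})
```

The VoI criterion (1), for instance, then amounts to a call of the form d_separated(parents, {X}, U_D, Pa_D ∪ {D} − {X}).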
The relation of the current technical results to prior work is summarised in Table S1 in the Appendix.
Instrumental control incentives
Kleiman-Weiner et al. (2015) use (causal) influence diagrams to define a notion of intention, which captures which nodes an optimal policy seeks to influence. Intention is conceptually similar to instrumental control incentives and uses hypothetical node deletions to ask which nodes the agent intends to control. Their concept is more refined than ICI in the sense that it includes only the nodes that determine optimal policy behaviour, but the definition is not properly formalized and it is not clear that it can be applied to all influence diagram structures.
AI fairness
Another application of this work is to evaluate when an AI system is incentivised to behave unfairly, on some definition of fairness. Response incentives address this question for counterfactual fairness (Kusner et al. 2017; Kilbertus et al. 2017). An incentive criterion corresponding to path-specific effects (Zhang, Wu, and Wu 2017; Nabi and Shpitser 2018) is deferred to future work. Nabi, Malinsky, and Shpitser (2019) have shown how a policy may be chosen subject to path-specific effect constraints. However, they assume recall of all past events, whereas the response incentive criterion applies to any CID.
Mechanism design
The aim of mechanism design is to understand how objectives and environments can be designed, in order to shape the behavior of rational agents (e.g. Nisan et al. 2007, Part II). At this high level, mechanism design is closely related to the incentive design results we have developed in this paper. In practice, the strands of research look rather different. The core challenge of mechanism design is that agents have private information or preferences. As we take the perspective of an agent designer, private information is only relevant for us to the extent that some types of agents or objectives may be harder to implement than others. Instead, our core challenge comes from causal relationships in agent environments, a consideration of little interest to most of mechanism design.
Discussion and Conclusion
We have proved sound and complete graphical criteria for two existing concepts (VoI and VoC) and two new concepts: response incentive and instrumental control incentive. The results have all focused on the (causal) structure of the interaction between agent and environment. This is both a strength and a weakness. On the one hand, it means that formal conclusions can be made about a system's incentives, even when details about the quantitative relationships between variables are unknown. On the other hand, it also means that these results will not help with subtler comparisons, such as the relative strength of different incentives. It also means that the causal relationships between variables must be known. This challenge is common to causal models in general. In the context of incentive design, it is partially alleviated by the fact that causal relationships often follow directly from the design choices for an agent and its objective. Finally, causal diagrams struggle to express dynamically changing causal relationships.

While important to be aware of, these limitations do not prevent causal influence diagrams from providing a clear, useful, and unified perspective on agent incentives. It has seen applications including value learning (Armstrong et al. 2020; Holtman 2020), interruptibility (Langlois and Everitt 2021), conservatism (Cohen, Vellambi, and Hutter 2020), modeling of agent frameworks (Everitt et al. 2019b), and reward tampering (Everitt et al. 2019a). Through such applications, we hope that the incentive analysis described in this paper will ultimately contribute to more fair and safe AI systems.
Acknowledgements
Thanks to Michael Cohen, Ramana Kumar, Chris van Merwijk, Carolyn Ashurst, Michiel Bakker, Silvia Chiappa, and Koen Holtman for their invaluable feedback. We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [funding reference number CGSD3-534795-2019]. Cette recherche a été financée par le Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG), [numéro de référence CGSD3-534795-2019].
References
Armstrong, S.; Orseau, L.; Leike, J.; and Legg, S. 2020. Pitfalls in learning a reward function online. In International Joint Conference on Artificial Intelligence (IJCAI).
Bostrom, N. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Carey, R.; Langlois, E.; Everitt, T.; and Legg, S. 2020. The Incentives that Shape Behaviour. In SafeAI AAAI workshop.
Cohen, M. K.; Vellambi, B. N.; and Hutter, M. 2020. Asymptotically Unambitious Artificial General Intelligence. In AAAI Conference on Artificial Intelligence.
Correa, J.; and Bareinboim, E. 2020. A calculus for stochastic interventions: Causal effect identification and surrogate experiments. In AAAI Conference on Artificial Intelligence.
Dawid, A. P. 2002. Influence diagrams for causal modelling and inference. International Statistical Review.
Eberhardt, F.; and Scheines, R. 2007. Interventions and causal inference. Philosophy of Science.
Everitt, T.; Hutter, M.; Kumar, R.; and Krakovna, V. 2019a. Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective. CoRR.
Everitt, T.; Kumar, R.; Krakovna, V.; and Legg, S. 2019b. Modeling AGI Safety Frameworks with Causal Influence Diagrams. In Workshop on Artificial Intelligence Safety, volume 2419 of CEUR Workshop Proceedings.
Everitt, T.; Ortega, P. A.; Barnes, E.; and Legg, S. 2019c. Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings. CoRR.
Fagiuoli, E.; and Zaffalon, M. 1998. A note about redundancy in influence diagrams. International Journal of Approximate Reasoning.
Fox, J.; Hammond, L.; Everitt, T.; Abate, A.; and Wooldridge, M. 2021. Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice. In AAMAS.
Galles, D.; and Pearl, J. 1997. Axioms of Causal Relevance. Artificial Intelligence.
Hadfield-Menell, D.; Dragan, A.; Abbeel, P.; and Russell, S. J. 2017. The Off-Switch Game. In International Joint Conference on Artificial Intelligence (IJCAI).
Heckerman, D.; and Shachter, R. 1994. A Decision-Based View of Causality. In Uncertainty in Artificial Intelligence (UAI), 302–310.
Heckerman, D.; and Shachter, R. D. 1995. Decision-Theoretic Foundations for Causal Reasoning. Journal of Artificial Intelligence Research 3: 405–430. doi:10.1613/jair.202.
Holtman, K. 2020. AGI Agent Safety by Iteratively Improving the Utility Function. International Conference on Artificial General Intelligence.
Howard, R. A. 1966. Information Value Theory. IEEE Transactions on Systems Science and Cybernetics.
Howard, R. A. 1990. From influence to relevance to knowledge. In Influence Diagrams, Belief Nets and Decision Analysis.
Jern, A.; and Kemp, C. 2011. Capturing mental state reasoning with influence diagrams. In Proceedings of the 2011 Cognitive Science Conference, 2498–2503.
Kilbertus, N.; Rojas-Carulla, M.; Parascandolo, G.; Hardt, M.; Janzing, D.; and Schölkopf, B. 2017. Avoiding Discrimination through Causal Reasoning. In Advances in Neural Information Processing Systems, 656–666.
Kleiman-Weiner, M.; Gerstenberg, T.; Levine, S.; and Tenenbaum, J. B. 2015. Inference of intention and permissibility in moral decision making. In Proceedings of the 37th Annual Conference of the Cognitive Science Society, 1123–1128.
Kusner, M. J.; Loftus, J. R.; Russell, C.; and Silva, R. 2017. Counterfactual Fairness. In Advances in Neural Information Processing Systems.
Langlois, E.; and Everitt, T. 2021. How RL Agents Behave when their Actions are Modified. In AAAI.
Lauritzen, S. L.; and Nilsson, D. 2001. Representing and Solving Decision Problems with Limited Information. Management Science.
Lee, S.; and Bareinboim, E. 2020. Characterizing optimal mixed policies: Where to intervene and what to observe. Advances in Neural Information Processing Systems.
Matheson, J. E. 1990. Using Influence Diagrams to Value Information and Control. In Influence Diagrams, Belief Nets, and Decision Analysis. Wiley and Sons.
Nabi, R.; Malinsky, D.; and Shpitser, I. 2019. Learning optimal fair policies. Proceedings of Machine Learning Research.
Nabi, R.; and Shpitser, I. 2018. Fair Inference on Outcomes. In AAAI Conference on Artificial Intelligence.
Nielsen, T. D.; and Jensen, F. V. 1999. Welldefined Decision Scenarios. In Uncertainty in Artificial Intelligence (UAI).
Nisan, N.; Roughgarden, T.; Tardos, É.; and Vazirani, V. V., eds. 2007. Algorithmic Game Theory. Cambridge University Press.
Omohundro, S. M. 2008. The Basic AI Drives. In Wang, P.; Goertzel, B.; and Franklin, S., eds., Artificial General Intelligence, volume 171. IOS Press.
O'Neil, C. 2016. Weapons of Math Destruction. Crown Books.
Pearl, J. 2001. Direct and Indirect Effects. In Uncertainty in Artificial Intelligence (UAI).
Pearl, J. 2009. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition. ISBN 9780521895606.
Russell, S. J. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Viking.
Shachter, R. D. 1986. Evaluating Influence Diagrams. Operations Research.
Shachter, R. D. 1998. Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams). Uncertainty in Artificial Intelligence (UAI).
Shachter, R. D. 2016. Decisions and Dependence in Influence Diagrams. In International Conference on Probabilistic Graphical Models.
Shachter, R. D.; and Heckerman, D. 2010. Pearl Causality and the Value of Control. In Dechter, R.; Geffner, H.; and Halpern, J. Y., eds., Heuristics, Probability and Causality: A Tribute to Judea Pearl. College Publications.
Tian, J.; and Pearl, J. 2001. Causal discovery from changes. In Uncertainty in Artificial Intelligence (UAI).
Verma, T.; and Pearl, J. 1988. Causal Networks: Semantics and Expressiveness. In Uncertainty in Artificial Intelligence (UAI).
Zhang, J.; Kumor, D.; and Bareinboim, E. 2020. Causal imitation learning with unobserved confounders. Advances in Neural Information Processing Systems.
Zhang, L.; Wu, Y.; and Wu, X. 2017. A Causal Framework for Discovering and Removing Direct and Indirect Discrimination. In International Joint Conference on Artificial Intelligence (IJCAI).

Concept | Definition | Criterion | Soundness | Completeness
VoI | Howard 1966; Matheson 1990 | Fagiuoli and Zaffalon 1998; Lauritzen and Nilsson 2001; Shachter 2016 | Fagiuoli and Zaffalon 1998; Lauritzen and Nilsson 2001; Shachter 2016 | First correct proof to our knowledge (see Related Work)
VoC | Shachter 1986; Matheson 1990; Shachter and Heckerman 2010 | Incomplete version by Shachter (1986) (see Related Work) | New; proved using do-calculus and VoI | New; proved constructively (cf. "relevant utility nodes", Nielsen and Jensen 1999)
RI | New | New | New; proved using do-calculus and VoI | New; proved constructively
ICI | New | New | New; proved using do-calculus | New; proved constructively

Table S1: Comparison with related work. The concepts of positive value of information (VoI) and positive value of control (VoC) are well-known. For VoI, a new, corrected proof is provided. For VoC, the present work offers a new criterion, proving it sound and complete. For response incentive (RI) and instrumental control incentive (ICI), the criterion and all proofs are new.
Figure 5: Two different influence diagram representations of the same situation, with different VoC and ICI. (a) A causal influence diagram reflecting the causal structure of the environment: X = D, U = X + D, D ∈ {0, 1}. (b) An influence diagram that is causal in the sense of Heckerman and Shachter (1994, 1995): X = D, U = 2·D, D ∈ {0, 1}.
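The contrast that Figure 5 is drawn to make can be checked numerically: the two structural models are observationally equivalent under every policy, but only (a) transmits an intervention on X to U. A minimal sketch (the function names and the {0, 1} domain are our own assumptions):

```python
def scm_a(d, x_do=None):
    """Figure 5a: X = D, U = X + D."""
    x = d if x_do is None else x_do
    return x + d

def scm_b(d, x_do=None):
    """Figure 5b: X = D, U = 2*D (the edge X -> U is missing)."""
    x = d if x_do is None else x_do  # X responds to D but never reaches U
    return 2 * d

# Identical under every policy, i.e. observationally indistinguishable:
assert all(scm_a(d) == scm_b(d) for d in (0, 1))

# Intervening on X changes U only in the fully causal diagram (a),
# so only (a) exhibits positive VoC / an ICI on X:
assert scm_a(1, x_do=0) == 1   # U drops from 2 to 1
assert scm_b(1, x_do=0) == 2   # U is unaffected
```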
A Causality Examples
Causal influence diagrams that reflect the full causal structure of the environment are needed to correctly capture response incentives, value of control, and instrumental control incentives. We begin by showing this for instrumental control incentives and value of control, leaving response incentives to the end of this section. Consider the two influence diagrams in Figure 5. If we assume that X really affects U, only the diagram in Figure 5a correctly represents this causal structure, whereas Figure 5b lacks the edge X → U. According to Definitions 15 and 17, X has positive value of control and an instrumental control incentive. Only Figure 5a gets this right.

The influence diagram literature has discussed weaker notions of causality, under which Figure 5b is considered a valid alternative representation of the situation described by Figure 5a. For example, if we only consider their joint distributions conditional on various policies, then Figures 5a and 5b are identical. Both diagrams are also in the canonical form of Heckerman and Shachter (1995), as every variable responsive to the decision is a descendant of the decision. For the same reason, both diagrams are also causal influence diagrams in the terminology of Heckerman and Shachter (1994) and Shachter and Heckerman (2010). Since only Figure 5a gets the incentives right, we see that the stronger notion of causal influence diagram introduced in this paper is necessary to correctly model instrumental control incentives and value of control.

Figure 6: Two different influence diagram representations of the same situation, with different RI and VoC. (a) A causal influence diagram reflecting the causal structure of the environment: Y ∼ {0, 1}, X = Y, U = X + D, D ∈ {0, 1}. (b) An influence diagram that is causal in the sense of Heckerman and Shachter (1994, 1995): X ∼ {0, 1}, Y = X, U = X + D, D ∈ {0, 1}. In Figure 6a, Y is sampled from some arbitrary distribution on {0, 1}, for example a Bernoulli distribution with p = 0.5. In Figure 6b, X is sampled in the same way.

To show that response incentives also rely on fully causal influence diagrams, consider the diagrams in Figure 6. Again, we assume that Figure 6a accurately depicts the environment, while Figure 6b has the edge Y → X reversed. Again, both diagrams have identical joint distributions given any policy. Both diagrams are also causal in the weaker sense of Heckerman and Shachter (1994) and Shachter and Heckerman (2010). Yet only the fully causal influence diagram in Figure 6a exhibits that Y can have a response incentive or positive value of control.

Proof Preliminaries
Our proofs will rely on the following fundamental results about causal models from Galles and Pearl (1997) and Pearl (2009).
Definition 19 (Causal Irrelevance). X is causally irrelevant to Y, given Z, written (X ↛ Y | Z), if for every set W disjoint from X ∪ Y ∪ Z, we have ∀ ε, z, x, x′, w: Y_{xzw}(ε) = Y_{x′zw}(ε).

Lemma 20.
For every SCM M compatible with a DAG G, (X ↛ Y | Z)_G ⇒ (X ↛ Y | Z).

Proof.
By induction over variables, as in Galles and Pearl (1997, Lemma 12).
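As a toy numerical illustration of Lemma 20 (the model and names are our own invention): in any SCM compatible with a graph in which X has no directed path to Y, an intervention on X leaves Y unchanged for every exogenous setting.

```python
import itertools

# DAG: X -> W and Z -> Y, with no directed path from X to Y.
def y_value(eps_x, eps_z, x_do=None):
    x = eps_x if x_do is None else x_do
    w = x + 1          # W responds to X ...
    return 2 * eps_z   # ... but Y is a function of Z alone, as the graph requires

# Y is invariant to arbitrary interventions on X:
for eps_x, eps_z in itertools.product((0, 1), repeat=2):
    assert y_value(eps_x, eps_z) == y_value(eps_x, eps_z, x_do=5)
```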
Lemma 21 (Pearl 2009, Thm. 3.4.1, Rule 1). For any disjoint subsets of variables W, X, Y, Z in the DAG G, E(Y_x | z, w) = E(Y_x | w) if Y ⊥ Z | (X, W) in the graph G′ formed by deleting all incoming edges to X.

Lemma 22 (Pearl 2009, Thm. 1.2.4). For any three disjoint subsets of nodes (X, Y, Z) in a DAG G, (X ⊥_G Y | Z) if and only if (X ⊥⊥ Y | Z)_P for every probability function P compatible with G.

Lemma 23 (Correa and Bareinboim 2020, Sigma Calculus Rule 3). For any disjoint subsets of nodes (X, Y) ⊆ V and Z ⊆ V in a DAG G, Pr(X | Z; g_Y) = Pr(X | Z; g′_Y) if X ⊥ Y | Z in G_{Y(Z)}, where Y(Z) ⊆ Y is the set of elements in Y that are not ancestors of Z in G, and G_{W} denotes G with the edges incoming to variables in W removed.

C Proofs
C.1 Value of Information Criterion
First, we introduce the notion of a G_min-respecting optimal policy. Our proof of its optimality is similar to Theorem 3 of Lauritzen and Nilsson (2001). It builds on the following intersection property of d-separation.

Lemma 24 (d-separation intersection property). For all disjoint sets of variables W, X, Y, and Z,

(W ⊥ X | Y, Z) ∧ (W ⊥ Y | X, Z) ⇒ (W ⊥ (X ∪ Y) | Z).

Proof.
Suppose that the RHS is false, so there is a path from W to X ∪ Y conditional on Z. This path must have a sub-path that passes from W to X ∈ X without passing through Y, or to Y ∈ Y without passing through X (it must traverse one set first). But this implies that W is d-connected to X given Y, Z, or to Y given X, Z, meaning the LHS is false. So if the LHS is true, then the RHS must be true.

Lemma 25 (G_min-respecting optimal policy). Every single-decision SCIM M = ⟨G, E, F, P⟩ has an optimal policy π̃ that depends only on requisite observations. In other words, π̃ is also a policy for the minimal model M_min = ⟨G_min, E, F, P⟩. We call π̃ a G_min-respecting optimal policy.

Proof. First partition Pa_D^G into the requisite parents Pa_D^min = {W ∈ Pa_D : W ⊥̸ U_D | {D} ∪ Pa_D \ {W}} and the non-requisite parents Pa_D^− = Pa_D^G \ Pa_D^min. Let π* be an optimal policy in M. To construct a G_min-respecting version π̃, select any value p̃a_D^− ∈ dom(Pa_D^−) for which Pr_{π*}(Pa_D^− = p̃a_D^−) > 0. For all pa_D^min ∈ dom(Pa_D^min) and ε_D ∈ dom(E_D), let

π̃(pa_D^min, pa_D^−, ε_D) := π*(pa_D^min, p̃a_D^−, ε_D).

The policy π̃ is permitted in M_min because it does not vary with Pa_D^−. Now let us prove that π̃ is optimal in M. Partition U into U_D = U ∩ Desc_D and U_{\D} = U \ Desc_D. D is causally irrelevant for every U ∈ U_{\D}, so every policy π (in particular, π̃) is optimal with respect to U_{\D} := Σ_{U ∈ U_{\D}} U.

We now consider U_D. By definition, W ⊥ U_D | {D} ∪ Pa_D \ {W} for every W ∈ Pa_D^−. By inductively applying the intersection property of d-separation (Lemma 24) over elements of Pa_D^−, we obtain

Pa_D^− ⊥ U_D | {D} ∪ Pa_D^min.    (3)

Next, we establish that E_π̃[U_D] = E_{π*}[U_D] by showing that E_π̃[U_D | pa_D] = E_{π*}[U_D | pa_D] for every pa_D ∈ dom(Pa_D) with Pr(pa_D) > 0.
First, the expected utility of π̃ given any (pa_D^min, pa_D^−) with Pr(Pa_D^min = pa_D^min, Pa_D^− = pa_D^−) > 0 is equal to the expected utility of π* on input (pa_D^min, p̃a_D^−):

E_π̃[U_D | pa_D^min, pa_D^−]
  = Σ_{u,d} u · Pr(U_D = u | d, pa_D^min, pa_D^−) · Pr_π̃(D = d | pa_D^min, pa_D^−)
  = Σ_{u,d} u · Pr(U_D = u | d, pa_D^min, p̃a_D^−) · Pr_{π*}(D = d | pa_D^min, p̃a_D^−)
  = E_{π*}[U_D | pa_D^min, p̃a_D^−],

where the middle equality follows from (3) and the definition of π̃. Second, the expected utility of π* given input p̃a_D^− is the same as its expected utility on any input pa_D^−:

E_{π*}[U_D | pa_D^min, p̃a_D^−]
  = max_d E_{π*}[(U_D)_d | pa_D^min, p̃a_D^−]
  = max_d E_{π*}[(U_D)_d | pa_D^min, pa_D^−]
  = E_{π*}[U_D | pa_D^min, pa_D^−],

where the first equality follows from the optimality of π* and the second from Lemma 21. (The expression E_{π*}[(U_D)_d | ···] means that we first assign the policy π* and then intervene to set D = d, which renders π* effectively irrelevant but formally necessary for creating an SCM.) This result shows that π̃ is optimal for U_D and has E_π̃[U_D] = E_{π*}[U_D]. Since π̃ is optimal for both U_D and U_{\D}, π̃ is optimal in M.

We now prove Theorem 9 by establishing the soundness and completeness of the value of information criterion.

Lemma 26 (VoI criterion soundness). If, in the single-decision CID G, X ∈ V \ Desc_D has X ⊥ U_D | (Pa_D ∪ {D} \ {X}), where U_D := U ∩ Desc_D, then X does not have positive value of information in any SCIM M compatible with G.

The result is already known from Lauritzen and Nilsson (2001) and Fagiuoli and Zaffalon (1998), but we prove it here to make the paper more self-contained.
Proof.
Let M = ⟨G, E, F, P⟩ be any SCIM compatible with G. Let G_{X→D} and G_{X↛D} be versions of G modified by adding and removing the edge X → D, respectively. Let G_min^{X→D} be the minimal reduction of G_{X→D}. Let M_{X↛D} := ⟨G_{X↛D}, E, F, P⟩ and M_min^{X→D} := ⟨G_min^{X→D}, E, F, P⟩ be SCIMs with the same domains and structural functions.

By Lemma 25, there is a G_min-respecting policy π̃ admissible in M_min^{X→D} and optimal in M_{X→D}. We prove that G_min^{X→D} is a subgraph of G_{X↛D}, meaning that π̃ is also admissible in M_{X↛D}. By assumption, G has X ⊥ U_D | (Pa_D ∪ {D} \ {X}). Adding X → D to G cannot cause X to be d-connected to U_D given Pa_D ∪ {D}, because any new path along X → D is blocked by D and Pa_D \ {X}. So G_min^{X→D} is a version of G with X → D (and possibly other nodes) removed. This makes it a subgraph of G_{X↛D}, implying that π̃ is admissible in M_{X↛D}.

Since π̃ is admissible in M_{X↛D} and optimal in M_{X→D}, V*(M_{X↛D}) ≥ V*(M_{X→D}), so X does not have positive value of information.

Lemma 27 (VoI criterion completeness). If, in the single-decision CID G, X ∈ V \ Desc_D is d-connected to a utility node that is a descendant of D conditional on the decision and other parents:

X ⊥̸ U_D | (Pa_D ∪ {D} \ {X}),    (4)

where U_D := U ∩ Desc_D, then X has VoI in at least one SCIM M compatible with G.

This follows from the response incentive completeness Lemma 28 in Appendix C.2, so we defer the proof to that section.
C.2 Response Incentive Criterion
The Response Incentives section contains a proof of the soundness of the response incentive criterion. We now prove its completeness in order to finish the proof of Theorem 12. Figure 7 illustrates the model constructed in the proof.
Lemma 28 (Response Incentive Criterion Completeness). If X ⤳ D in the minimal reduction G_min of a single-decision CID G, then there is a response incentive on X in at least one SCIM M compatible with G.

Proof. Starting from the assumption that X ⤳ D in G_min, we explicitly construct a compatible model for G for which the decision of every optimal policy causally depends on the value of X. Let X⤳D be a directed path from X to D that only contains a single requisite observation, which we label W (if X is itself a requisite observation, then W and X are the same node). Since W is a requisite observation for D, there exists some utility node U descending from D that is d-connected to W in G when conditioning on Pa_D ∪ {D} \ {W}. Let D⤳U be a directed path from D to U, and let W—U be a path between W and U that is active when conditioning on Pa_D ∪ {D} \ {W}. By the definition of d-connecting paths, W—U has the following structure (m ≥ 0):

W ⇠ S_0 ⇢ C_1 ⇠ S_1 ⇢ ··· ⇢ C_m ⇠ S_m ⇢ U

consisting of directed sub-paths leaving source nodes S_i and entering collider nodes C_i, where there is a directed path from each collider to Pa_D ∪ {D} \ {W} and no non-collider node is in Pa_D ∪ {D} \ {W}. It may be the case that W and S_0 are the same node. For each i ∈ {1, …, m}, let C_i⤳O_i be a directed path from C_i to some O_i ∈ Pa_D such that no other node along C_i⤳O_i is in Pa_D.

We make the following assumptions without loss of generality:
• W—U first intersects D⤳U at some variable Y (possibly Y is U) and thereafter both W—U and D⤳U follow the same directed path from Y to U (otherwise, let Y be the first intersection point and replace the Y—U sub-path of W—U with the Y⤳U sub-path of D⤳U).
• The S_0⤳W sub-path of reversed W—U first intersects X⤳D at some node Z and thereafter both follow the same directed path from Z to W (same argument as for Y).
• The paths C_i⤳O_i are mutually non-intersecting (if there is an intersection between C_i⤳O_i and C_j⤳O_j with j ≠ i, then replace the part of W—U between C_i and C_j with the path through the intersection point, which becomes the new collider; this can only happen finitely many times as it reduces the number of collider nodes).

The resulting structure is shown in Figure 7. We now formally define the model represented in the figure. The domains of all endogenous variables are set to {−1, 0, 1}. All exogenous variables are given independent discrete uniform distributions over {−1, 1}. Unless otherwise specified, we set B = A for each edge A → B within the directed paths shown in Figure 7, i.e. f_B(pa_B, ε_B) = a. Nodes at the heads of directed paths can therefore be defined in terms of nodes at the tails. We begin by describing functions for the "default" case depicted by Figure 7, and discuss adaptations for various special cases below.
• S_i = E_{S_i}, giving S_i a uniform distribution over −1 and 1.
• U = Y, and
• Y = S_m · D, so D must match S_m to optimize utility.
• C_i = S_{i−1} · S_i, and
Figure 7: Outline of the variables involved in the response incentive construction. Every graph that satisfies the response incentive graphical criterion contains this structure (allowing all dashed paths except those to C_i or Y to have length zero). An optimal policy for the given model is D = W · ∏_i O_i = S_m, yielding utility U = Y = 1, and all optimal policies must depend on the value of W.

• O_i = C_i, so the collider C_i reveals (only) whether S_{i−1} and S_i have the same sign or not.
• X = 1,
• Z = X · S_0, and
• W = Z, so W reflects the value of S_0, unless X is intervened upon.

All other variables not part of any named path are set to 0. Special cases arise when two or more of the labeled nodes in Figure 7 refer to the same variable. When W, Y, or O_i is the same node as one of its parents, then it simply takes the function of this parent (instead of copying its value). Meanwhile, the S_i, C_i, and Y nodes must be distinct by construction, so no special-case treatment is required. Finally, the functions for X, S_0, and Z are adapted per the following cases:

Case 1: X, S_0, and Z are all the same node. Let X = Z = S_0 = E_{S_0}, i.e. the node takes a uniform distribution over {−1, 1}.

Case 2: Z is the same node as S_0, but different from X. In this case, let Z = S_0 = X · E_{S_0}.

Case 3: X is the same node as Z, but different from S_0. In this case, let X = Z = S_0.

The final combination, of X and S_0 being the same while different from Z, cannot happen by the definition of Z. Regardless of which case applies, an optimal policy is D = W · ∏_{i=1}^{m} O_i, which yields a utility of 1.

Now consider the intervention that sets X = 0, and consequently W_{X=0} = Z_{X=0} = 0. Without the information in W, S_m is independent of (Pa_D)_{X=0} and hence independent of D_{X=0} regardless of the selected policy. Therefore, E_π[U_{X=0}] = E_π[S_m · D_{X=0}] = E_π[S_m] · E_π[D_{X=0}] = 0 for every policy π. In particular, for any optimal policy π*, E_{π*}[U_{X=0}] ≠ E_{π*}[U] = 1, so there must be some ε such that D_{X=0}(ε) ≠ D(ε). Therefore, there is a response incentive on X.

With this result we can now prove the completeness of the value of information criterion.

Proof of Lemma 27 (VoI criterion completeness). If X ⊥̸ U_D | (Pa_D ∪ {D} \ {X}), then X is a requisite observation in G_{X→D} (where G_{X→D} is G modified to include the edge X → D, if the edge does not exist already) and X → D is a path in the minimal reduction G_min^{X→D}. By Lemma 28, there exists a model M_{X→D} compatible with G_{X→D} that has a response incentive on X. If every optimal policy for M_{X→D} depends on X, then it must be the case that V*(M_{X↛D}) < V*(M_{X→D}).

C.3 Value of Control Criterion
The Value of Control section contains a proof of the soundness of the value of control criterion. We complete the proof of Theorem 16 by showing that the criterion is also complete.

(Footnote: if $m = 0$ and $S_0$ is $Z$, then $(S_m)_{X=0} = 0$; the fact that this is predictable is irrelevant, because we compare $D_{X=0}$ against the pre-intervention variable $S_m$, which remains independent of $(\mathrm{Pa}_D)_{X=0}$.)

Lemma 29 (VoC criterion completeness). If there is a directed path from $X$ to a utility node $U$ in the minimal reduction $\mathcal{G}^{\min}$ of a single-decision CID $\mathcal{G}$, and $X$
$\notin \{D\}$, then $X$ has positive value of control in at least one SCIM $\mathcal{M}$ compatible with $\mathcal{G}$.

Proof. Assume that there is such a directed path for $X \notin \{D\}$, and fix a particular directed path $\rho$ from $X$ to some utility $U \in \boldsymbol{U}$. We consider two cases, depending on whether $D$ is in $\rho$, and construct a SCIM for each:

Case 1: $\rho$ does not contain $D$. Let the domain of all variables be $\{0, 1\}$. Set all exogenous variable distributions arbitrarily. Set $\mathcal{F}$ such that $X = 0$, with every other variable along $\rho$ copying the value of $X$ forward. All remaining variables are set to the constant $0$. With this model, an intervention $g_X$ that sets $X$ to $1$ instead of $0$ increases the total expected utility by $1$, which means that $X$ has positive value of control.

Case 2: $\rho$ contains $D$. This implies that a directed path from $X$ to $D$ is present in $\mathcal{G}^{\min}$, so we can construct (a modified version of) the response incentive construction used in the proof of Lemma 28. We make one change: instead of starting with $f_X(\cdot) = 1$, we start with $f_X(\cdot) = 0$. As noted in the response incentive completeness proof, this means that $S_m$ is independent of $\mathrm{Pa}_D$, so regardless of the policy, the optimal attainable utility is $0$. If we perform the intervention $g_X(\cdot) = 1$, then the attainable expected utility is once again $1$, so the intervention $g_X$ strictly increases the optimal expected utility.

C.4 Instrumental Control Incentive Criterion
The Instrumental Control Incentive section contains a proof of the soundness of the instrumental control incentive criterion. We prove its completeness to finish the proof of Theorem 18.
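The completeness proof below rests on a copy-chain construction along a directed path through $D$ and $X$. The following sketch (our own illustration; the path length and the position of $X$ on it are arbitrary) makes the construction concrete:

```python
def run_path(n_nodes, d, x_index=None, x_value=None):
    """Propagate values along the directed path Z_1 -> ... -> Z_n.

    Z_1 is the decision D; every later node copies its predecessor.
    Optionally, the node at position x_index (playing the role of X)
    is overwritten by an intervention X := x_value.
    Returns Z_n, which is the utility U.
    """
    z = d  # Z_1 = D has no structural function; it is set by the policy
    for i in range(2, n_nodes + 1):
        # Z_i copies Z_{i-1} ...
        if x_index == i and x_value is not None:
            z = x_value  # ... unless the intervention severs the chain here
    return z

# The only optimal policy sets D = 1, giving U = D = 1.
assert run_path(5, d=1) == 1
# Intervening on X with value d makes U = d regardless of the decision:
assert run_path(5, d=1, x_index=3, x_value=0) == 0
assert run_path(4, d=0, x_index=2, x_value=1) == 1
```

Because $U$ equals whatever value is injected at $X$, the agent's achievable utility flows entirely through $X$, which is what the instrumental control incentive formalises.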
Lemma 30 (ICI criterion completeness). If a single-decision CID $\mathcal{G}$ contains a path of the form $D \dashrightarrow X \dashrightarrow U$, then there is an instrumental control incentive on $X$ in at least one SCIM $\mathcal{M}$ compatible with $\mathcal{G}$.

Proof. Assume that $\mathcal{G}$ contains a directed path $D = Z_1 \to Z_2 \to \cdots \to Z_n = U$ where $U \in \boldsymbol{U}$ and $Z_i = X$ for some $i \in \{1, \dots, n\}$. We construct a compatible SCIM for which there is an instrumental control incentive on $X$. Let all variables along the path $Z_1 \to \cdots \to Z_n$ be equal to their predecessor, except $Z_1 = D$, which has no structural function. All other variables are set to $0$. In this model, $U = D \in \{0, 1\}$ and all other utility variables are always $0$, so the only optimal policy is $\pi^*(\mathrm{pa}_D) = 1$, which gives $\mathbb{E}_{\pi^*}[U \mid \mathrm{Pa}_D = \mathrm{pa}_D] = 1$. Meanwhile, $U_{X=d} = d$, so for $d = 0$ we have $\mathbb{E}_{\pi^*}[U_{X=d} \mid \mathrm{Pa}_D = \mathrm{pa}_D] = 0$.

C.5 Counterfactual Fairness
Theorem 14 (Counterfactual fairness and response incentives). In a single-decision SCIM $\mathcal{M}$ with a sensitive attribute $A \in \boldsymbol{X}$, all optimal policies $\pi^*$ are counterfactually unfair with respect to $A$ if and only if $A$ has a response incentive.

Proof. We begin by showing that if there exists an optimal policy $\pi$ that is counterfactually fair, then there is no response incentive on $A$. To this end, let
$$\mathrm{supp}_\pi(D \mid \mathrm{pa}_D) = \{d \mid \Pr{}_\pi(D = d \mid \mathrm{pa}_D) > 0\}$$
$$\forall a, \quad \mathrm{supp}_\pi(D_a \mid \mathrm{pa}_D) = \{d \mid \Pr{}_\pi(D_a = d \mid \mathrm{pa}_D) > 0\}$$
be the sets of decisions taken by $\pi$ with positive probability without and with an intervention on $A$. As a first step, we will show that for any $\varepsilon \in \mathrm{dom}(\boldsymbol{E})$ and any intervention $a$ on $A$,
$$\mathrm{supp}_\pi\bigl(D \mid \mathrm{Pa}_D(\varepsilon)\bigr) = \mathrm{supp}_\pi\bigl(D_a \mid \mathrm{Pa}_D(\varepsilon)\bigr). \quad (5)$$
By way of contradiction, suppose there exists a decision
$$d \in \mathrm{supp}_\pi\bigl(D \mid \mathrm{Pa}_D(\varepsilon)\bigr) \setminus \mathrm{supp}_\pi\bigl(D_a \mid \mathrm{Pa}_D(\varepsilon)\bigr). \quad (6)$$
Since $d \in \mathrm{supp}_\pi\bigl(D \mid \mathrm{Pa}_D(\varepsilon)\bigr)$, we have
$$\Pr{}_\pi\bigl(D = d \mid \mathrm{Pa}_D(\varepsilon), A(\varepsilon)\bigr) > 0. \quad (7)$$
And since $d \notin \mathrm{supp}_\pi\bigl(D_a \mid \mathrm{Pa}_D(\varepsilon)\bigr)$, there exists no $\varepsilon'$ with positive probability such that $\mathrm{Pa}_D(\varepsilon') = \mathrm{Pa}_D(\varepsilon)$, $A(\varepsilon') = A(\varepsilon)$, and $D_a(\varepsilon') = d$. This gives
$$\Pr{}_\pi\bigl(D_a = d \mid \mathrm{Pa}_D(\varepsilon), A(\varepsilon)\bigr) = 0. \quad (8)$$
Equations (7) and (8) violate the counterfactual fairness property, Definition 13, which shows that (6) is impossible. An analogous argument shows that $d \in \mathrm{supp}_\pi\bigl(D_a \mid \mathrm{Pa}_D(\varepsilon)\bigr) \setminus \mathrm{supp}_\pi\bigl(D \mid \mathrm{Pa}_D(\varepsilon)\bigr)$ also violates the counterfactual fairness property, Definition 13. We have thereby established (5).

Now select an arbitrary ordering of the elements of $\mathrm{dom}(D)$ and define a new policy $\pi^*$ such that $\pi^*(\mathrm{pa}_D)$ is the minimal element of $\mathrm{supp}_\pi(D \mid \mathrm{pa}_D)$. Then $\pi^*$ is optimal because $\pi$ is optimal.
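This derandomization step can be sketched as follows (our own illustration; the decision contexts and probabilities are hypothetical):

```python
def derandomize(pi):
    """Map a stochastic policy {context: {decision: prob}} to the
    deterministic policy choosing the minimal decision in the support
    at each decision context."""
    return {pa: min(d for d, p in dist.items() if p > 0)
            for pa, dist in pi.items()}

# Hypothetical stochastic policy over two decision contexts:
pi = {"pa0": {0: 0.5, 1: 0.5}, "pa1": {1: 0.0, 2: 1.0}}
print(derandomize(pi))  # {'pa0': 0, 'pa1': 2}
```

Since the deterministic policy only takes decisions that $\pi$ already took with positive probability, it inherits the optimality of $\pi$, and equation (5) guarantees that its decision is unchanged by interventions on $A$.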
Further, $\pi^*$ will make the same decision in decision contexts $\mathrm{Pa}_D(\varepsilon)$ and $(\mathrm{Pa}_D)_a(\varepsilon)$ because of (5). In other words, $D_a(\varepsilon) = D(\varepsilon)$ in $\mathcal{M}_{\pi^*}$ for the optimal policy $\pi^*$, which means that there is no response incentive on $A$.

Now we prove the reverse direction: that if there is no response incentive, then some optimal $\pi^*$ is counterfactually fair. Choose any optimal policy $\pi^*$ where $D_a(\varepsilon) = D(\varepsilon)$ for all $\varepsilon$. Since an intervention $a$ cannot change $D$ in any setting, $\Pr(D_a = d \mid \cdot) = \Pr(D = d \mid \cdot)$ for any condition and any decision $d$, hence $\pi^*$ is counterfactually fair.