A General Counterexample to Any Decision Theory and Some Responses
Joar Skalse
Oxford University [email protected]
Abstract.
In this paper I present an argument and a general schema which can be used to construct a problem case for any decision theory, in a way that could be taken to show that one cannot formulate a decision theory that is never outperformed by any other decision theory. I also present and discuss a number of possible responses to this argument. One of these responses raises the question of what it means for two decision problems to be "equivalent" in the relevant sense, and gives an answer to this question which would invalidate the first argument. However, this position would have further consequences for how we compare different decision theories in decision problems already discussed in the literature (including e.g. Newcomb's problem).

Suppose there exists a decision theory XDT such that an agent who follows the recommendations of XDT is at least as well-off as any other agent in any "fair" decision problem. I will present an argument for why no such decision theory XDT can exist. Let DP(XDT) be the following decision problem:

There are two boxes in front of you. One box is transparent and can be observed to contain $1,000, and one box is opaque. You can choose to take either of the two boxes, but not both. The boxes have been set up in such a way that if XDT permits taking the opaque box then the opaque box is empty, and otherwise the opaque box contains $2,000. The decision maker knows this fact.

I will argue that whatever XDT recommends in DP(XDT), we can construct an alternative decision theory YDT such that followers of YDT are better off than followers of XDT in DP(XDT). I will, for the time being, assume that DP(XDT) is a "fair" decision problem in the relevant sense. I will also assume that a satisfactory decision theory must always permit at least one action in any decision problem.
This means that we have three options: either XDT only permits taking the transparent box, or XDT only permits taking the opaque box, or XDT permits taking either the transparent box or the opaque box. In the first two cases we could simply let YDT invert the recommendation of XDT. In this case, a decision maker who follows the recommendation of XDT will end up with $1,000 less than a decision maker who instead follows the recommendation of YDT, which contradicts the assumption that followers of XDT are at least as well-off as all other agents in all "fair" decision problems. (It is worth noting that the assumption that a satisfactory decision theory must always permit at least one action would mean that e.g. ratificationism (Jeffrey, 1965) is not a "satisfactory" decision theory, since there are decision problems in which there are no ratifiable choices (see e.g. Egan, 2007). However, refinements such as lexical ratificationism (Egan, 2007) do satisfy this condition.)

What if XDT permits taking either of the two boxes? Formally, there is then some set of probabilities S ⊆ [0, 1] such that for each p ∈ S, XDT permits taking the opaque box with probability p, and where S ≠ {0} and S ≠ {1}. In this case, the opaque box will be empty (since XDT permits taking the opaque box). Let YDT recommend taking the transparent box (with probability 1). Let X be an agent that follows XDT and in DP(XDT) takes the opaque box with probability p for some p ∈ S with p ≠ 0, and let Y be an agent that follows YDT. In this case, Y is guaranteed to get at least as much money as X, and might get more money than X, which again contradicts the assumption that followers of XDT are always at least as well-off as any other agent in any "fair" decision problem. This exhausts all the available options, and so it seems like no such decision theory XDT can exist.

Before moving on, I want to make a few clarifying remarks. What I have presented above is a general schema which can be used to construct a seemingly problematic case for any decision theory. For example, we can consider what YDT does in DP(YDT), or what causal decision theory (CDT) and evidential decision theory (EDT) do in DP(CDT) and DP(EDT) respectively. If the recommendation of a given decision theory ZDT depends on the Bayesian prior D of the decision maker then we could simply modify the schema to take the prior into account as well – i.e., we construct a decision problem DP(ZDT, D) where the opaque box contains $2,000 if ZDT does not permit agents with the prior D to take the opaque box. Therefore, whatever the structure of a given decision theory is, this schema can be used to construct a problem case for that decision theory. I should also comment on the case where XDT permits taking either of the two boxes, but allows the agent to take the opaque box with probability p = 1 (i.e. 1 ∈ S). In that case the opaque box will contain $2,000, and an agent A could follow XDT and take the opaque box with probability 1 in DP(XDT), in which case no agent is better off than A in DP(XDT). However, in this case an XDT agent is still allowed to not take the opaque box with probability 1, so merely following XDT is not sufficient to ensure that you are at least as well-off as any other agent in any decision problem.
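As an illustrative aside (not part of the original argument), the three cases above can be sketched in a short simulation. Here a decision theory is modeled, for simplicity, as a function returning the set of pure actions it permits, and the opaque box is taken to be empty exactly when that set includes the opaque box; the function names and payoff encoding are my own.

```python
# Illustrative sketch only: decision theories as functions returning the
# set of permitted actions, drawn from {"opaque", "transparent"}.

def make_dp(xdt):
    """Construct DP(XDT): the opaque box is empty iff XDT permits taking it."""
    permitted = xdt()
    opaque_value = 0 if "opaque" in permitted else 2000
    payoffs = {"transparent": 1000, "opaque": opaque_value}
    return payoffs, permitted

def ydt_action(xdt):
    """YDT: invert XDT's recommendation when XDT is decisive; otherwise take
    the transparent box, which is then guaranteed to pay at least as much."""
    permitted = xdt()
    if permitted == {"opaque"}:
        return "transparent"
    if permitted == {"transparent"}:
        return "opaque"
    return "transparent"

def outcomes(xdt):
    """Payoff of the worst-off XDT-permitted action vs. YDT's payoff."""
    payoffs, permitted = make_dp(xdt)
    worst_xdt = min(payoffs[a] for a in permitted)
    return worst_xdt, payoffs[ydt_action(xdt)]

# Whatever XDT permits, some XDT-follower does no better than a YDT-follower.
for permitted in [{"opaque"}, {"transparent"}, {"opaque", "transparent"}]:
    worst_xdt, ydt = outcomes(lambda p=permitted: p)
    assert worst_xdt <= ydt
```

In all three cases the worst-off XDT-follower ends up with at most as much money as the YDT-follower, mirroring the case analysis above.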
And, of course, if we construct a refined decision theory XDT+ which says "follow XDT, and if you face DP(XDT) then take the opaque box with probability 1", then we could use the schema to construct a decision problem DP(XDT+), which would be problematic for this refined decision theory.

One possible response to this argument is to say that DP(XDT) is in some sense "unfair", and that we should exclude it from the set of situations we consider. However, we must then provide some principle on the grounds of which DP(XDT) can be excluded, and it is not immediately obvious how to do this. We can note that it is not the case that DP(XDT) is an "unwinnable" decision problem. In DP(XDT) there is a choice available to the decision maker that is better than the choice that XDT in fact recommends, and an agent that follows YDT will actually get more money than an XDT agent. Moreover, it is also not the case that DP(XDT) rewards or punishes decision makers based directly on what decision theory they follow – it is possible to set up DP(XDT) without knowing what decision theory the decision maker follows, and the outcome depends only on what action the decision maker takes. This means that DP(XDT) is different from "decision" problems in which the decision maker is directly punished for following some particular decision theory, for example.

We should also note that this decision problem is different from Newcomb's problem (Nozick, 1969) in a number of important ways. In particular, many of the objections that are sometimes raised against Newcomb's problem do not apply here. A causal decision theorist can argue that an evidential decision theorist and a causal decision theorist do not actually have the same options available to them when they are faced with Newcomb's problem (see e.g. pp. 151–154 in Joyce, 1999).
The reason that evidential decision theorists seem to do better is that they are given better options, but given the options that they have, the decision that they make is irrational (or so one might argue). However, this argument does not apply to DP(XDT), since both the state of the environment and the causal consequences of any action are perfectly identical for any agent facing DP(XDT). We therefore cannot exclude DP(XDT) on the grounds that it offers different options to different types of decision makers. Moreover, Newcomb's problem requires that it is possible to accurately predict the actions of the decision maker, and one could object to it on this ground. In contrast, all that is required to set up DP(XDT) is that one can compute the output of XDT, and this must presumably be possible if the decision maker is able to use XDT.

Another possible response to this argument is to say that there is a relevant sense in which agents who follow XDT and YDT are not facing the same decision problem when they are faced with DP(XDT), even though they are in identical environments and have access to the same information. We can note that agents following XDT and YDT will compute different values of P(state_i | action_j) in DP(XDT), at least provided that they know which decision theory they themselves follow. The agents will therefore be in different epistemic states when they make their respective decisions, and one could argue that this means that they are not facing the "same" decision problem in the relevant sense. If this is the case then DP(XDT) fails to demonstrate that there is a decision problem in which XDT is outperformed by another decision theory.

This response could plausibly be satisfactory, but it calls for a more precise account of what it takes for two decision situations to be "equivalent" in the relevant sense. There are at least three considerations that seem relevant.
First of all, we can say that two decision situations are "physically equivalent" if they take place in environments with identical dynamics – that is, the environments have the same states and the same actions, the actions have the same consequences, and corresponding outcomes are associated with the same utilities. (Stated differently, the situations correspond to identical Markov decision processes (Bellman, 1957).) Furthermore, we can say that two decision situations are "experientially equivalent" if the agents facing them make identical observations before they make their decisions. Finally, we can say that two decision situations are "epistemically equivalent" if the decision makers are in the same epistemic state when they make their respective decisions – that is, they assign the same probability to every proposition. (I am here of course assuming a broadly Bayesian epistemology, but the argument does not rely much on this assumption.) In most cases physical equivalence would imply experiential equivalence, and experiential equivalence would imply epistemic equivalence, but there are cases in which this relationship does not hold. In particular, if an XDT agent and a YDT agent both face DP(XDT) then their situations are physically and experientially equivalent, but not epistemically equivalent. If we consider epistemic equivalence between decision situations to be the most important kind of equivalence then the argument above does not work, since XDT and YDT agents would then not be facing the "same" decision problem (in the relevant sense) when they are faced with DP(XDT).

It should be noted that this position would also disqualify some Newcomb-like decision problems. For example, say that an evidential decision theorist and a causal decision theorist face Newcomb's problem, and that they know which decision theory they themselves follow. They could then predict which action they will themselves take, from which they could infer what the opaque box contains.
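To make the three notions concrete, here is a toy formalization (my own, and only a sketch: the `Situation` record and its fields are simplified stand-ins for the full environment dynamics, observation history, and credence function):

```python
# Toy formalization of the three equivalence notions. The fields are
# simplified stand-ins: `dynamics` for the underlying Markov decision
# process, `observations` for what the agent sees before deciding, and
# `credences` for the agent's probability assignments to propositions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Situation:
    dynamics: str
    observations: str
    credences: tuple  # pairs of (proposition, probability)

def physically_equivalent(a: Situation, b: Situation) -> bool:
    return a.dynamics == b.dynamics

def experientially_equivalent(a: Situation, b: Situation) -> bool:
    return a.observations == b.observations

def epistemically_equivalent(a: Situation, b: Situation) -> bool:
    return a.credences == b.credences

# An XDT agent and a YDT agent facing DP(XDT): same environment and same
# observations, but different credences about their own action (the 0.5 is
# an arbitrary illustrative value for an agent whose theory permits mixing).
xdt_agent = Situation("DP(XDT)", "transparent box contains $1,000",
                      (("I take the opaque box", 0.5),))
ydt_agent = Situation("DP(XDT)", "transparent box contains $1,000",
                      (("I take the opaque box", 0.0),))

assert physically_equivalent(xdt_agent, ydt_agent)
assert experientially_equivalent(xdt_agent, ydt_agent)
assert not epistemically_equivalent(xdt_agent, ydt_agent)
```

On this encoding the two agents come out physically and experientially equivalent but not epistemically equivalent, which is exactly the situation the argument turns on.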
This means that they would be in different epistemic states when they make their decisions, and hence not be facing the "same" decision problem (according to the outlined position). However, if they are not able to predict their own actions then they could be facing the "same" decision problem. This may be seen as undesirable.

Another possible response to the argument is to say that XDT could be making the "right" decision in DP(XDT) even though the recommendation of YDT yields more utility than the recommendation of XDT. In situations where the decision maker has incomplete information, the rational action may of course be different from the action that yields the greatest amount of utility – for example, it is not rational to buy a lottery ticket, even if that lottery ticket happens to be a winning ticket. This principle does not apply directly to DP(XDT), since in DP(XDT) the decision maker has information that logically entails the precise state of the environment he is in. However, one could reasonably maintain that XDT should recommend taking the transparent box, because if XDT recommends taking the transparent box then agents following XDT will get $1,000 in DP(XDT), whereas if it recommends taking the opaque box then agents following XDT will get $0. If this is the case then DP(XDT) fails to demonstrate that there is any decision problem in which XDT does not make the right decision.

This response could, like the previous response, plausibly be satisfactory. However, it is not entirely unproblematic. First of all, even if we believe that XDT is making the right decision if it takes the transparent box in DP(XDT), we would presumably not want to say that YDT is making the wrong decision if it then takes the opaque box.
To make sense of this (without arguing that XDT and YDT are facing different decision problems) it seems as though we would have to argue that the rational course of action in a given situation in some peculiar way depends on what decision theory the decision maker follows. It is not clear to me how exactly this position could be explicated, and I will not attempt to do so here. I will however note that I suspect this approach would end up being effectively equivalent to arguing that XDT and YDT are facing different decision problems.

Acknowledgements: With many thanks to Caspar Oesterheld for giving feedback on the ideas in this paper.

References

[1] Richard Bellman. "A Markovian Decision Process". In: Indiana Univ. Math. J. (1957). ISSN: 0022-2518.
[2] Andy Egan. "Some Counterexamples to Causal Decision Theory". In: The Philosophical Review (2007). ISSN: 0031-8108, 1558-1470.
[3] Richard C. Jeffrey. The Logic of Decision. University of Chicago Press, 1965.
[4] James M. Joyce. The Foundations of Causal Decision Theory. Cambridge University Press, 1999.
[5] Robert Nozick. "Newcomb's Problem and Two Principles of Choice". In: Essays in Honor of Carl G. Hempel. Ed. by Nicholas Rescher et al. Springer, 1969, pp. 114–146. URL: http://faculty.arts.ubc.ca/rjohns/nozick_newcomb.pdf