NORMAL FORM BACKWARD INDUCTION FOR DECISION TREES WITH COHERENT LOWER PREVISIONS

Nathan Huntley and Matthias C. M. Troffaes. Normal Form Backward Induction for Decision Trees with Coherent Lower Previsions. Annals of Operations Research, 195(1):111-134, 2012. http://dx.doi.org/10.1007/s10479-011-0968-2
NATHAN HUNTLEY AND MATTHIAS C. M. TROFFAES
Abstract.
We examine normal form solutions of decision trees under typical choice functions induced by lower previsions. For large trees, finding such solutions is hard, as very many strategies must be considered. In an earlier paper, we extended backward induction to arbitrary choice functions, yielding far more efficient solutions, and we identified simple necessary and sufficient conditions for this to work. In this paper, we show that backward induction works for maximality and E-admissibility, but not for interval dominance and Γ-maximin. We also show that, in some situations, a computationally cheap approximation of a choice function can be used, even if the approximation violates the conditions for backward induction; for instance, interval dominance with backward induction will yield at least all maximal normal form solutions.

Key words and phrases: backward induction; decision tree; lower prevision; sequential decision making; choice function; maximality; E-admissibility; interval dominance; maximin; imprecise probability.

1. Introduction
In classical decision theory, one aims to maximize expected utility. Such an approach requires probabilities for all relevant events. However, when information and knowledge are limited, sadly, the decision maker may not be able to specify or elicit probabilities exactly. To handle this, various theories have been suggested, including lower previsions [35], which essentially amount to sets of probabilities.

In non-sequential problems, given a lower prevision, various generalizations of maximizing expected utility exist [34]. Sequential extensions of some of these alternatives have been suggested [29, 9, 13, 1, 30, 5, 14, 31], yet not systematically studied. In this paper, we study systematically, using lower previsions, which decision criteria admit efficient solutions to sequential decision problems by backward induction, even if probabilities are not exactly known. Our main contribution is that we prove for which criteria backward induction coincides with the usual normal form.

We study very general sequential decision problems: a subject can choose from a set of options, where each option has uncertain consequences, leading to either rewards or more options. Based on her beliefs and preferences, the subject seeks an optimal strategy. Such problems are represented by a decision tree [25, 18, 3].

When maximizing expected utility, one can solve a decision tree by the usual normal form method, or by backward induction. First, note that the subject can specify, in advance, her actions in all eventualities. In the normal form, she simply chooses a specification which maximizes her expected utility. However, in larger problems, the number of specifications is gargantuan, and the normal form is not feasible.

Fortunately, backward induction is far more efficient. We find the expected utility at the final decision nodes, and then replace these nodes with the maximum expected utility. The previously penultimate decision nodes are now ultimate, and the process repeats until the root is reached.
Backward induction is guaranteed to coincide with the normal form [25] if probabilities are non-zero [8, p. 44].

The usual normal form method works easily with decision criteria for lower previsions: apply it to the set of all strategies. Generalizing backward induction is harder, as no single expectation summarizes all relevant information about substrategies, unlike with expected utility. We follow Kikuti et al. [14], and instead replace nodes with sets of optimal substrategies, moving from right to left in the tree, eliminating strategies as we go. De Cooman and Troffaes [5] presented a similar idea for dynamic programming.

In this general setting, normal form and backward induction can differ, as noted by many [29, 20, 30, 14, 5, 13, 1]. However, for some decision criteria the methods always coincide. In [11], we found conditions for coincidence. In this paper, we expand the work begun in [12], and investigate what works for lower previsions, finding that maximality and
E-admissibility work, but the others do not.

This coincidence is of interest for at least two reasons. First, as mentioned, the normal form is not feasible for larger trees, whereas backward induction can eliminate many strategies early on, hence being far more efficient. Secondly, one might argue that a solution where the two methods differ is philosophically flawed [11, 8, 29, 21].

The paper is organized as follows. Section 2 explains decision trees and introduces notation. Section 3 presents lower previsions and their decision criteria, and demonstrates normal form backward induction on a simple example. Section 4 formally defines the two methods, and characterizes their equivalence, which is applied in Section 5 to lower previsions. Section 6 discusses a larger example. Section 7 concludes. Readers familiar with decision trees and lower previsions can start with Sections 3.3 and 6.2.
2. Decision Trees
2.1. Definition and Example.
Informally, a decision tree [18, 3] is a graphical causal representation of decisions, events, and rewards. Decision trees consist of a rooted tree [7, p. 92, Sec. 3.2] of decision nodes, chance nodes, and reward leaves, growing from left to right. The left hand side corresponds to what happens first, and the right side to what happens last.

Consider the following example. Tomorrow, a subject is going for a walk in the lake district. It may rain ($E$), or not ($\overline{E}$). The subject can either take a waterproof ($d_1$), or not ($d_2$). But the subject may also choose to buy today's newspaper, at cost $c$, to learn about tomorrow's weather forecast ($d_S$), or not ($\overline{d_S}$), before leaving for the lake district. The forecast has two possible outcomes: predicting rain ($S$), or not ($\overline{S}$).

The corresponding decision tree is depicted in Figure 1. Decision nodes are depicted by squares, and chance nodes by circles. From each node, a number of branches emerge, representing decisions at decision nodes and events at chance nodes. The events from a node form a partition of the possibility space: exactly one of the events will take place. Each path in a decision tree corresponds to a sequence of decisions and events. The reward from each such sequence appears at the right hand end of the branch.

2.2. Notation.
A particular decision tree can be seen as a combination of smaller decision trees: for example, one could draw the subtree corresponding to buying the newspaper, and also draw the subtree corresponding to making an immediate decision. The decision tree for the full problem is then formed by joining these two subtrees at a decision node.

So, we can represent a decision tree by its subtrees and the type of its root node. Let $T_1, \dots, T_n$ be decision trees. If $T$ combines the trees at a decision node, we write
$$T = \bigsqcup_{i=1}^n T_i.$$

Figure 1. A decision tree for walking in the lake district.

If $T$ combines the trees at a chance node, with subtree $T_i$ being connected by event $E_i$ ($E_1, \dots, E_n$ is a partition of the possibility space), we write
$$T = \bigodot_{i=1}^n E_i T_i.$$
For instance, for the tree of Fig. 1 with $c = 1$, we write
$$(S(T_1 \sqcup T_2) \odot \overline{S}(T_1 \sqcup T_2)) \sqcup (U_1 \sqcup U_2)$$
where, denoting the reward nodes by their utility,
$$T_1 = 9E \odot 14\overline{E}, \quad T_2 = 4E \odot 19\overline{E}, \quad U_1 = 10E \odot 15\overline{E}, \quad U_2 = 5E \odot 20\overline{E}.$$
We associate with each decision tree $T$ an event $\mathrm{ev}(T)$, representing the intersection of all the events on chance arcs that have preceded $T$.
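The combination operators are easy to mirror in code. The following is a minimal Python sketch, assuming a naive recursive representation of trees; the class names and the encoding of the lake district tree (with $c = 1$) are our own illustration, not constructs from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# A tree is a decision node, a chance node, or a leaf holding a utility.
Tree = Union["Decision", "Chance", float]

@dataclass
class Decision:
    children: Tuple[Tree, ...]      # T = T_1 ⊔ ... ⊔ T_n

@dataclass
class Chance:
    branches: Dict[str, Tree]       # T = E_1 T_1 ⊙ ... ⊙ E_n T_n

# The lake district tree of Fig. 1 with c = 1, mirroring
# (S(T1 ⊔ T2) ⊙ S̄(T1 ⊔ T2)) ⊔ (U1 ⊔ U2):
T1 = Chance({"E": 9.0, "notE": 14.0})    # bought paper, waterproof
T2 = Chance({"E": 4.0, "notE": 19.0})    # bought paper, no waterproof
U1 = Chance({"E": 10.0, "notE": 15.0})   # no paper, waterproof
U2 = Chance({"E": 5.0, "notE": 20.0})    # no paper, no waterproof
buy = Chance({"S": Decision((T1, T2)), "notS": Decision((T1, T2))})
lake_district = Decision((buy, Decision((U1, U2))))
```

Representing trees this way makes $\sqcup$ and $\odot$ literal constructors, which will be convenient for the normal form and backward induction sketches later in the paper.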
A subtree of a tree $T$ obtained by removal of all non-descendants of a particular node $N$ is called the subtree of $T$ at $N$ and is denoted by $\mathrm{st}_N(T)$. These subtrees are called 'continuation trees' by Hammond [8].
Consider all possible ways that sets of decision trees $\mathcal{T}_1, \dots, \mathcal{T}_n$ can be combined. Our notation easily extends. For any partition $E_1, \dots, E_n$,
$$\bigodot_{i=1}^n E_i \mathcal{T}_i = \left\{ \bigodot_{i=1}^n E_i T_i : T_i \in \mathcal{T}_i \right\}.$$
For any sets of consistent decision trees $\mathcal{T}_1, \dots, \mathcal{T}_n$,
$$\bigsqcup_{i=1}^n \mathcal{T}_i = \left\{ \bigsqcup_{i=1}^n T_i : T_i \in \mathcal{T}_i \right\}.$$
For convenience, we only work with decision trees in which no event arc is impossible given the preceding events.
Definition 2.
A decision tree $T$ is called consistent if for every node $N$ of $T$, $\mathrm{ev}(\mathrm{st}_N(T)) \neq \emptyset$.

Clearly, if a decision tree $T$ is consistent, then for any node $N$ in $T$, $\mathrm{st}_N(T)$ is also consistent. Considering only consistent trees is not really a restriction, since inconsistent trees would only be drawn due to an oversight, and could easily be made consistent.

2.3. Solving Decision Trees with Probabilities and Utilities.
We give a brief overview of the standard method of solving a decision tree when the probabilities of events are known. Suppose in Fig. 1 we have $p(S) = 0.6$, $p(E|S) = 0.7$, and $p(E|\overline{S}) = 0.2$, so $p(E) = 0.5$. We first calculate the expected utility of the final chance nodes. For example, after buying the newspaper and a forecast of rain, the expected utility of taking the waterproof ($d_1$) is $0.7(10 - c) + 0.3(15 - c) = 11.5 - c$, and the expected utility of not taking it ($d_2$) is $9.5 - c$. We now see that, after a forecast of rain, it is better to choose decision $d_1$. We then replace that decision node and its subtree with the expected utility of its best option: $11.5 - c$. Following the same procedure at the remaining penultimate decision nodes reduces the tree by a stage: $d_2$ is optimal after a forecast of no rain, with value $17 - c$, and without the newspaper both decisions are optimal, with value $12.5$. Finally, the expected utility of buying the newspaper is $0.6(11.5 - c) + 0.4(17 - c) = 13.7 - c$. At the root, we therefore take decision $d_S$ if $c \leq 1.2$, and $\overline{d_S}$ if $c \geq 1.2$. This procedure is illustrated in Fig. 2, where the dashed lines indicate decision arcs that are rejected because their expected utility is too low (for any specific $c \neq 1.2$, either $d_S$ or $\overline{d_S}$ would be dashed).

This method can only be carried out if the subject has assessed precise probabilities and utilities, and wishes to maximize expected utility. It may be that the subject is unable or unwilling to comply with these requirements. The next section considers a possible solution, and demonstrates how backward induction can be generalized.
3. Credal Sets and Lower Previsions

First, we outline a straightforward generalization of the theory of probability, allowing the subject to model uncertainty in cases where too little information is available to identify a unique probability distribution (see for instance [2, 26, 6, 33, 36, 37, 35]).
3.1. Gambles, Credal Sets and Lower Previsions.
The possibility space
$\Omega$ is the set of all possible states of the world. Elements of $\Omega$ are denoted by $\omega$. Subsets of $\Omega$ are called events, and are denoted by capital letters $A$, $B$, etc. The arcs emerging from chance nodes in a decision tree correspond to events.

A gamble is a function $X\colon \Omega \to \mathbb{R}$, interpreted as an uncertain reward: should $\omega \in \Omega$ be the true state of the world, the gamble $X$ will yield the reward $X(\omega)$.

A probability mass function is a non-negative real-valued function $p\colon \Omega \to \mathbb{R}^+$ whose values sum to one [28, p. 138, Sec. 4.2]. For convenience, we will write $p(A)$ for $\sum_{\omega \in A} p(\omega)$.

Figure 2. Solving the lake district problem with expected utility.
For the purpose of this paper, we assume that
• $\Omega$ is finite,
• rewards are expressed in utiles,
• the subject can express her beliefs by means of a closed convex set $\mathcal{M}$ of probability mass functions $p$ ($\mathcal{M}$ is called the credal set), and
• each probability mass function $p \in \mathcal{M}$ satisfies $p(\omega) > 0$, for all $\omega \in \Omega$.

Under the above assumptions, each $p$ in $\mathcal{M}$ determines a conditional expectation
$$E_p(X|A) = \frac{\sum_{\omega \in A} X(\omega) p(\omega)}{p(A)},$$
and the whole set $\mathcal{M}$ determines a conditional lower and upper expectation
$$\underline{P}(X|A) = \min_{p \in \mathcal{M}} E_p(X|A), \qquad \overline{P}(X|A) = \max_{p \in \mathcal{M}} E_p(X|A),$$
and this for every gamble $X$ and every non-empty event $A$.

The functional $\underline{P}$ is called a coherent conditional lower prevision, and similarly, $\overline{P}$ is called a coherent conditional upper prevision. Although here we have defined these by means of a set of probability measures, there are different ways of obtaining and interpreting lower and upper previsions (see for instance Miranda [22] for a survey).

Following de Finetti [6], where it is convenient we denote the indicator gamble
$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$
of an event $A$ also simply by $A$: for instance, if $X$ is a gamble, and $A$ is an event, then $AX$ is just a shorthand notation for $I_A X$.
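Computationally, when the credal set $\mathcal{M}$ is a polytope described by finitely many extreme points, the minimum and maximum above are attained at extreme points (a conditional expectation is a linear-fractional function of $p$), so lower and upper previsions can be evaluated by enumeration. A minimal sketch, with our own function names, where gambles and mass functions are dictionaries indexed by elements of $\Omega$:

```python
def expectation(X, p, A=None):
    """Conditional expectation E_p(X|A); A is a set of states (None means Ω)."""
    states = A if A is not None else set(X)
    pA = sum(p[w] for w in states)
    return sum(X[w] * p[w] for w in states) / pA

def lower(X, extreme_points, A=None):
    """Lower prevision of X given A, over a finitely generated credal set."""
    return min(expectation(X, p, A) for p in extreme_points)

def upper(X, extreme_points, A=None):
    """Upper prevision of X given A; equals -lower(-X|A), cf. (iv) below."""
    return max(expectation(X, p, A) for p in extreme_points)
```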
Below are some properties of coherent conditional lower and upper previsions that we require later (see Williams [36, 37] or Walley [35] for proofs).

Proposition 3. For all non-empty events $A$, $B$, gambles $X$, $Y$, and constants $\lambda > 0$:
(i) If $AX = AY$ then $\underline{P}(X|A) = \underline{P}(Y|A)$.
(ii) $\underline{P}(X|A) + \underline{P}(Y|A) \leq \underline{P}(X+Y|A) \leq \underline{P}(X|A) + \overline{P}(Y|A)$.
(iii) $\underline{P}(\lambda X|A) = \lambda \underline{P}(X|A)$ and $\overline{P}(\lambda X|A) = \lambda \overline{P}(X|A)$.
(iv) $\overline{P}(X|A) = -\underline{P}(-X|A)$.
(v) $\underline{P}(A(X - \underline{P}(X|A \cap B))|B) = 0$.

Property (v) generalizes the generalized Bayes rule, which describes how conditional lower previsions are linked to unconditional lower previsions. For instance, if $\underline{P}(A|B) > 0$, then $\underline{P}(\cdot|A \cap B)$ is uniquely determined by $\underline{P}(\cdot|B)$ via property (v) [35, p. 297, Thm. 6.4.1].

3.2. Choice Functions and Optimality.
Suppose a subject must choose a gamble from a set $\mathcal{X}$. In classical decision theory, a gamble is optimal if its expectation is maximal. More generally, given a credal set $\mathcal{M}$, the subject might consider as optimal, for example, any gamble whose expectation is maximal for at least one $p \in \mathcal{M}$: in fact, most criteria determine optimal decisions by comparison of gambles. So, we suppose that the subject has a way of determining an optimal subset of any set of gambles, conditional on any event $A$ (think of $A$ as $\mathrm{ev}(T)$):

Definition 4. A choice function $\mathrm{opt}$ is an operator that, for each non-empty event $A$, maps each non-empty finite set $\mathcal{X}$ of gambles to a non-empty subset of this set: $\emptyset \neq \mathrm{opt}(\mathcal{X}|A) \subseteq \mathcal{X}$.

Note that common uses of choice functions in social choice theory, such as by Sen [32, p. 63, ll. 19-21], do not consider conditioning on events, and define choice functions for arbitrary sets of options, rather than for gambles only.

The interpretation of a choice function is that when the subject can choose among the elements of $\mathcal{X}$, having observed $A$, she would only choose from $\mathrm{opt}(\mathcal{X}|A)$. Therefore, we say the elements of $\mathrm{opt}(\mathcal{X}|A)$ are optimal (relative to $\mathcal{X}$ and $A$). Note that the subject may not consider them equivalent: adding a small incentive to choose a particular optimal option would not necessarily make it the single preferred option.

We now consider four popular choice functions that have been proposed for choosing between gambles given a coherent lower prevision. Further discussion of the criteria presented here can be found in Troffaes [34].

3.2.1. Maximality.
Maximality is based on the following strict partial preference order $>_{\underline{P}|A}$.

Definition 5.
Given a coherent lower prevision $\underline{P}$, for any two gambles $X$ and $Y$ we write $X >_{\underline{P}|A} Y$ whenever $\underline{P}(X - Y|A) > 0$.

The partial order $>_{\underline{P}|A}$ gives rise to the choice function maximality, proposed by Condorcet [4, pp. lvj-lxix, 4.e Exemple], Sen [32], and Walley [35], among others.

Definition 6.
For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,
$$\mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A) = \{ X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not>_{\underline{P}|A} X) \}.$$
Because all probabilities in $\mathcal{M}$ are assumed to be strictly positive, Walley's admissibility condition is implied, and hence omitted in Definition 6.
3.2.2. E-admissibility.
Another criterion is E-admissibility, proposed by Levi [17]. Recall that $\underline{P}(\cdot|A)$ is the lower envelope of $\mathcal{M}$. For each $p \in \mathcal{M}$ we can maximize expected utility:
$$\mathrm{opt}_p(\mathcal{X}|A) = \{ X \in \mathcal{X} : (\forall Y \in \mathcal{X})(E_p(Y|A) \leq E_p(X|A)) \}.$$
Then the set of E-admissible options is defined by:
Definition 7.
For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,
$$\mathrm{opt}_{\mathcal{M}}(\mathcal{X}|A) = \bigcup_{p \in \mathcal{M}} \mathrm{opt}_p(\mathcal{X}|A).$$
A gamble $X$ is therefore E-admissible when it maximizes expected utility under at least one $p \in \mathcal{M}$. Any E-admissible gamble is maximal [35, p. 162, ll. 26-28].

3.2.3. Interval Dominance.
Interval dominance is based on the strict partial preference order $\sqsupset_{\underline{P}|A}$.
Given a coherent lower prevision P , for any non-empty event A and any two A -consistent gambles X and Y we write X ⊐ P | A Y whenever P ( X | A ) > P ( Y | A ) . This ordering induces a choice function usually called interval dominance [38, 34]:
Definition 9.
For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,
$$\mathrm{opt}_{\sqsupset_{\underline{P}|\cdot}}(\mathcal{X}|A) = \{ X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not\sqsupset_{\underline{P}|A} X) \}.$$
The above criterion was apparently first introduced by Kyburg [15], and was originally called stochastic dominance.

3.2.4. Γ-maximin.
Γ-maximin selects the gambles that maximize the minimum expected reward.
Definition 10.
For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,
$$\mathrm{opt}_{\underline{P}}(\mathcal{X}|A) = \{ X \in \mathcal{X} : (\forall Y \in \mathcal{X})(\underline{P}(X|A) \geq \underline{P}(Y|A)) \}.$$
Γ-maximin is induced by a total preorder, and so usually selects a single gamble, regardless of the degree of uncertainty in $\underline{P}$. Γ-maximin can be criticized for being too conservative (see Walley [35, p. 164]), as it only takes into account the worst possible scenario.
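All four choice functions can be implemented on top of the previous sketch. One caveat: enumerating only extreme points under-approximates E-admissibility in general, since a gamble may maximize expected utility only for a non-extreme $p \in \mathcal{M}$; an exact implementation would use linear programming, as done by Kikuti et al. [14]. The function names are ours.

```python
def maximal(gambles, ext, A=None):
    """Maximality: keep X unless some Y has lower(Y - X | A) > 0."""
    def beats(Y, X):
        return lower({w: Y[w] - X[w] for w in X}, ext, A) > 0
    return [X for X in gambles if not any(beats(Y, X) for Y in gambles)]

def e_admissible(gambles, ext, A=None):
    """E-admissibility, approximated over the extreme points only."""
    chosen = []
    for p in ext:
        best = max(expectation(X, p, A) for X in gambles)
        chosen += [X for X in gambles
                   if expectation(X, p, A) == best and X not in chosen]
    return chosen

def interval_dominant(gambles, ext, A=None):
    """Interval dominance: keep X unless some Y has lower(Y) > upper(X)."""
    return [X for X in gambles
            if not any(lower(Y, ext, A) > upper(X, ext, A) for Y in gambles)]

def gamma_maximin(gambles, ext, A=None):
    """Γ-maximin: maximize the lower prevision."""
    best = max(lower(X, ext, A) for X in gambles)
    return [X for X in gambles if lower(X, ext, A) == best]
```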
3.3. Sequential Problems Using Lower Previsions. Consider again the lake district problem depicted in Fig. 1, but now suppose that the subject has specified a coherent lower prevision, instead of a single probability measure. For this example, we consider an $\epsilon$-contamination model: with probability $1 - \epsilon$, observations follow a given probability mass function $p$, and with probability $\epsilon$, observations follow an unknown arbitrary distribution. One can easily check that, under this model, the lower expectation of a gamble $X$ is
$$\underline{P}(X) = (1 - \epsilon) E_p(X) + \epsilon \inf X.$$
The conditional lower expectation is [35, p. 309]
$$\underline{P}(X|A) = \frac{(1 - \epsilon) E_p(AX) + \epsilon \inf_{\omega \in A} X(\omega)}{(1 - \epsilon) p(A) + \epsilon}.$$
As before, let $p(S) = 0.6$, $p(E|S) = 0.7$, and $p(E|\overline{S}) = 0.2$, so $p(E) = 0.5$. Let $\epsilon = 0.1$.

The subject can solve this problem by the usual normal form method: she lists all possible strategies (actions to take in all eventualities), finds the corresponding gambles, and applies a suitable choice function, say maximality.

Table 1 lists all strategies and their gambles. Each strategy gives a reward determined entirely by $\omega$, and hence has a corresponding gamble.

Table 1. Strategies and gambles for the lake district problem.
  $\overline{d_S}$, then $d_1$: $X$
  $\overline{d_S}$, then $d_2$: $Y$
  $d_S$, then $d_1$ if $S$ and $d_1$ if $\overline{S}$: $X - c$
  $d_S$, then $d_2$ if $S$ and $d_2$ if $\overline{S}$: $Y - c$
  $d_S$, then $d_1$ if $S$ and $d_2$ if $\overline{S}$: $SX + \overline{S}Y - c$
  $d_S$, then $d_2$ if $S$ and $d_1$ if $\overline{S}$: $SY + \overline{S}X - c$

For example, the gamble for the last strategy is
$$(5 - c)SE + (20 - c)S\overline{E} + (10 - c)\overline{S}E + (15 - c)\overline{S}\,\overline{E} = SY + \overline{S}X - c,$$
Figure 3. Solving the lake district example by normal form backward induction.
with $X = 10E + 15\overline{E}$ and $Y = 5E + 20\overline{E}$. Recall that $(5 - c)SE$ is just a shorthand notation for $(5 - c)I_S I_E$, and similarly for all other terms.

Maximality can then be applied to find the optimal gambles: this requires comparison of all six gambles at once. Skipping the details of this calculation, for instance with $c = 0.5$,
we find that we should buy the newspaper and follow its advice.

However, could we think of a backward induction scheme which might not require comparison of all six gambles at once? Obviously, any such scheme will not work as easily as in Section 2.3, because we are not maximizing a single number (i.e., expected utility). Instead, we retain the optimal strategies in subtrees, as illustrated in Fig. 3.

(i) First, write down the gambles at the final chance nodes. For example, at the node reached by buying the newspaper, observing a forecast of rain, and taking the waterproof, the gamble is $(10 - c)E + (15 - c)\overline{E} = X - c$, and similarly for all others.

(ii) Let us first deal with the branch corresponding to refusing the newspaper. At its decision node, we have a choice between two strategies that correspond to the gambles $X$ and $Y$, and the event of this subtree is $\Omega$. So, to determine the optimal strategies in this subtree, we must compare these two gambles unconditionally:
$$\underline{P}(X - Y) = \underline{P}(Y - X) = -5\epsilon = -1/2,$$
so here the strategies $d_1$ and $d_2$ are both optimal.

Now we move to the branch corresponding to buying the newspaper. After a forecast of rain, we need to compare $X - c$ and $Y - c$, conditional on $S$:
$$\underline{P}((X - c) - (Y - c)|S) = \frac{(1-\epsilon)E_p(S(X - Y)) - 5\epsilon}{(1-\epsilon)p(S) + \epsilon} = \frac{0.58}{0.64} > 0,$$
so $X - c >_{\underline{P}|S} Y - c$, and the uniquely optimal strategy is $d_1$. Next, after a forecast of no rain,
$$\underline{P}((Y - c) - (X - c)|\overline{S}) = \frac{0.58}{0.46} > 0,$$
so the optimal strategy there is $d_2$.

(iii) Moving to the chance node following $d_S$, we see that only one of the original four strategies remains: "$d_1$ if $S$ and $d_2$ if $\overline{S}$", corresponding to the gamble $SX + \overline{S}Y - c$.

(iv) Finally, considering the entire tree $T$, three strategies are left: "$d_S$, then $d_1$ if $S$ and $d_2$ if $\overline{S}$"; "$\overline{d_S}$, then $d_1$"; "$\overline{d_S}$, then $d_2$". Therefore we need to find
$$\mathrm{opt}_{>_{\underline{P}|\cdot}}(\{SX + \overline{S}Y - c, X, Y\}).$$
We have
$$\underline{P}(X - (SX + \overline{S}Y - c)) = c - (6 + 19\epsilon)/5 = c - 79/50,$$
$$\underline{P}((SX + \overline{S}Y - c) - X) = (6 - 31\epsilon)/5 - c = 29/50 - c,$$
$$\underline{P}(Y - (SX + \overline{S}Y - c)) = c - (6 + 19\epsilon)/5 = c - 79/50,$$
$$\underline{P}((SX + \overline{S}Y - c) - Y) = (6 - 31\epsilon)/5 - c = 29/50 - c.$$
Concluding (see Fig. 3(iv)):
• if the newspaper costs less than $29/50$, we should buy it and follow its advice;
• if it costs more than $79/50$, we do not buy it, but have insufficient information to decide whether to take the waterproof or not;
• if the newspaper costs between $29/50$ and $79/50$, we can take any of the three remaining options.

Comparing this with the solution calculated in Section 2.3, we observe that the imprecision has created a range of $c$ for which it is unclear whether buying the newspaper is better than not, rather than the single threshold value of $c$ in the precise case. Despite this, should the subject decide to buy the newspaper, she will follow the same policy in both cases: take the waterproof only if the newspaper predicts rain. Finally, it should be noted that, although in both cases both $\overline{d_S}d_1$ and $\overline{d_S}d_2$ are involved in optimal normal form decisions for some values of $c$, in the precise case this is because they are equivalent, and in the imprecise case it is because they are incomparable. A tiny increase in the value for, say, not taking the waterproof and no rain, would make $\overline{d_S}d_1$ always non-optimal under $E_p$, but still optimal under $\underline{P}$ for $c \geq 29/50$.

In this example, backward induction yielded exactly the optimal normal form strategies. It is easy to find choice functions and decision trees where this does not work [16, 13, 30]. We want to know for which of the choice functions for coherent lower previsions the two methods agree. To answer this, we invoke a theorem relating to general choice functions, outlined in the next section.
4. Normal Form Solutions for Decision Trees

We now introduce the necessary terminology for examining our two methods of solution in detail, and provide theorems stating when they coincide. These two methods yield normal form solutions, that is, sets of optimal strategies at the root node.
4.1. Normal Form Decisions, Solutions, Operators, and Gambles.
Suppose the subject specifies a decision for each eventuality, and resolves to follow this policy. She now has no choice at decision nodes, and her reward is entirely determined by the state of nature. This corresponds to a reduced decision tree obtained by taking the initial tree and removing all but one of the arcs at each decision node. Such a reduced tree is called a normal form decision, and represents what we called a "strategy" in Section 3.3. We denote the set of all normal form decisions of $T$ by $\mathrm{nfd}(T)$.

It is unlikely that the subject can specify a single optimal normal form decision for all problems. Nevertheless, she might be able to eliminate some unacceptable ones: a normal form solution of a decision tree $T$ is simply a non-empty subset of $\mathrm{nfd}(T)$. A normal form operator is then a function mapping every decision tree to a normal form solution of that tree. The two methods we investigate are normal form operators.

As we saw in Section 3.3, the reward from a normal form decision is determined entirely by the events that take place. That is, a normal form decision has a corresponding gamble, which we call a normal form gamble. The set of all normal form gambles associated with a decision tree $T$ is denoted by $\mathrm{gamb}(T)$, so $\mathrm{gamb}$ is an operator on trees that yields the set of all gambles induced by normal form decisions of the tree.

We will need to know when a set of gambles can be represented by a consistent decision tree (as defined earlier in Section 2.2):

Definition 11.
Let $A$ be any non-empty event, and let $\mathcal{X}$ be a set of gambles. Then the following conditions are equivalent; if they are satisfied, we say that $\mathcal{X}$ is $A$-consistent.
(A) There is a consistent decision tree $T$ with $\mathrm{ev}(T) = A$ and $\mathrm{gamb}(T) = \mathcal{X}$.
(B) For all $r \in \mathbb{R}$ and all $X \in \mathcal{X}$ such that $X^{-1}(r) \neq \emptyset$, it holds that $X^{-1}(r) \cap A \neq \emptyset$.

Proof of the equivalence of (A) and (B) is fairly straightforward, whence omitted here. The following notation proves convenient for normal form gambles at chance nodes.
Definition 12.
For any events $E_1, \dots, E_n$ which form a partition, and any finite family of sets of gambles $\mathcal{X}_1, \dots, \mathcal{X}_n$, we define the following set of gambles:
$$(1) \qquad \sum_{i=1}^n E_i \mathcal{X}_i = \left\{ \sum_{i=1}^n E_i X_i : X_i \in \mathcal{X}_i \right\}.$$

4.2. Normal Form Operator Induced by a Choice Function.
We can now formalize the simple normal form method described at the start of Section 3.3. Listing all strategies corresponds to finding $\mathrm{nfd}(T)$. Listing their corresponding gambles corresponds to finding $\mathrm{gamb}(T)$. Then calculate $\mathrm{opt}(\mathrm{gamb}(T)|\mathrm{ev}(T))$, and find all elements of $\mathrm{nfd}(T)$ that induce these optimal gambles. The solution is then the set of all these normal form decisions. Formally:

Definition 13.
Given any choice function $\mathrm{opt}$, and any decision tree $T$ with $\mathrm{ev}(T) \neq \emptyset$,
$$\mathrm{norm}_{\mathrm{opt}}(T) = \{ U \in \mathrm{nfd}(T) : \mathrm{gamb}(U) \subseteq \mathrm{opt}(\mathrm{gamb}(T)|\mathrm{ev}(T)) \}.$$
The following important equality follows immediately:
$$(2) \qquad \mathrm{gamb}(\mathrm{norm}_{\mathrm{opt}}(T)) = \mathrm{opt}(\mathrm{gamb}(T)|\mathrm{ev}(T)).$$
Let us demonstrate this definition on the lake district problem. For any particular strategy $U$, say for instance "buy the newspaper and take the waterproof only if the newspaper predicts rain", we can calculate its associated gamble, which in our instance is
$$X = (10 - c)ES + (15 - c)\overline{E}S + (5 - c)E\overline{S} + (20 - c)\overline{E}\,\overline{S}.$$
We check whether $X$ is optimal in the set of all gambles associated with $T$, that is, whether $X \in \mathrm{opt}(\mathrm{gamb}(T)|\mathrm{ev}(T))$. If so, then $\mathrm{gamb}(U) = \{X\} \subseteq \mathrm{opt}(\mathrm{gamb}(T)|\mathrm{ev}(T))$, and so $U \in \mathrm{norm}_{\mathrm{opt}}(T)$. Otherwise, $\mathrm{gamb}(U) = \{X\} \not\subseteq \mathrm{opt}(\mathrm{gamb}(T)|\mathrm{ev}(T))$, and so $U \notin \mathrm{norm}_{\mathrm{opt}}(T)$. Repeating this procedure for each strategy in $T$ will determine $\mathrm{norm}_{\mathrm{opt}}(T)$.
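The operators $\mathrm{nfd}$, $\mathrm{gamb}$, and $\mathrm{norm}_{\mathrm{opt}}$ translate almost literally into code, reusing the tree classes sketched in Section 2.2. This is our own schematic rendering: a gamble is kept as a map from tuples of branch labels to rewards; a full implementation would map these label paths to events of $\Omega$ so that the choice functions of Section 3.2 apply, and would condition on $\mathrm{ev}(T)$.

```python
from itertools import product

def nfd(tree):
    """All normal form decisions: reduced trees, one arc per decision node."""
    if isinstance(tree, Decision):
        return [s for child in tree.children for s in nfd(child)]
    if isinstance(tree, Chance):
        labels = list(tree.branches)
        subs = [nfd(tree.branches[k]) for k in labels]
        return [Chance(dict(zip(labels, combo))) for combo in product(*subs)]
    return [tree]                       # a leaf is already a strategy

def gamb(strategy, path=()):
    """Normal form gamble: reward as a function of the chance branches taken."""
    if isinstance(strategy, Chance):
        out = {}
        for label, sub in strategy.branches.items():
            out.update(gamb(sub, path + (label,)))
        return out
    return {path: strategy}             # leaf: this reward on this event

def norm_opt(tree, opt):
    """Definition 13: strategies whose gambles are optimal among gamb(T)."""
    strategies = nfd(tree)
    gambles = [gamb(s) for s in strategies]
    optimal = opt(gambles)
    return [s for s, g in zip(strategies, gambles) if g in optimal]
```

On the lake district tree from the earlier sketch, `nfd` yields exactly the six strategies of Table 1.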
4.3. Normal Form Backward Induction. The operator $\mathrm{norm}_{\mathrm{opt}}$ is a natural and popular choice, but for practical or philosophical reasons one may wish to be able to find it by backward induction. Even for an almost trivial problem such as Fig. 1 there are already six normal form decisions. If a tree $T$ has at least $n$ decision nodes in every path from the root to any leaf, and each decision node has at least two children, then there will be at least $2^n$ normal form decisions associated with $T$ (and often a lot more). Working with sets of $2^n$ gambles may be impractical for large $n$, particularly for maximality and E-admissibility, so a method that can avoid applying the choice function to a large set is necessary.

Implementation of backward induction is easy when there is a unique choice at every node, but a choice function may not have this property, so we need to adapt the traditional approach. The technique informally introduced in Section 3.3 is a generalization of the method of Kikuti et al. [14], where the only difference is that we apply our choice function at all nodes, not just decision nodes. Although the focus of Kikuti et al. is also on uncertainty models represented by coherent lower previsions, their approach, and so our generalization, can be used for any choice function on gambles.

The goal of our backward induction algorithm is to reach a normal form solution of $T$ by finding normal form solutions of subtrees of $T$, and using these to remove some elements of $\mathrm{nfd}(T)$ before applying $\mathrm{opt}$. A formal definition requires many definitions that hinder clarity, so in this paper we prefer a more intuitive, informal approach. A rigorous treatment of our backward induction operator can be found in [10, 11].

The algorithm moves from right to left in the tree as follows. At a subtree $\mathrm{st}_N(T)$, find its set of normal form decisions, but remove any strategies that contain substrategies judged non-optimal at any descendant node. For example, in Fig. 1, at the chance node following $d_S$ there are four normal form decisions, but in the example the substrategy $d_2$ was removed after a forecast of rain, and $d_1$ was removed after a forecast of no rain, so the only strategy retained there is "$d_1$ if $S$, $d_2$ if $\overline{S}$". Next, find the corresponding gambles of all surviving normal form decisions, apply $\mathrm{opt}$, and transform back to optimal normal form decisions. Move to the next layer of nodes, and continue until the root node is reached. This yields a set of normal form decisions at the root node; that is, the algorithm corresponds to a normal form operator, which we call $\mathrm{back}_{\mathrm{opt}}$.

A further example using maximality, but this time using the notation of this section, can be found in Section 6, if further clarification is required.

In [10, 11], we found four necessary and sufficient properties on $\mathrm{opt}$ for $\mathrm{back}_{\mathrm{opt}}$ and $\mathrm{norm}_{\mathrm{opt}}$ to coincide for any consistent decision tree.

Property 1 (Backward conditioning property). Let $A$ and $B$ be events such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, and let $\mathcal{X}$ be a non-empty finite $A \cap B$-consistent set of gambles, with $\{X, Y\} \subseteq \mathcal{X}$ such that $AX = AY$. Then $X \in \mathrm{opt}(\mathcal{X}|A \cap B)$ implies $Y \in \mathrm{opt}(\mathcal{X}|A \cap B)$ whenever there is a non-empty finite $\overline{A} \cap B$-consistent set of gambles $\mathcal{Z}$ such that, for at least one $Z \in \mathcal{Z}$,
$$AX + \overline{A}Z \in \mathrm{opt}(A\mathcal{X} + \overline{A}\mathcal{Z}\,|\,B).$$
This property requires that, if two gambles agree on $A$, it is not possible for exactly one of them to be optimal, conditional on any subset of $A$, unless there is no suitable $\mathcal{Z}$.

Property 2 (Insensitivity of optimality to the omission of non-optimal elements).
For any event $A \neq \emptyset$, and any non-empty finite $A$-consistent sets of gambles $\mathcal{X}$ and $\mathcal{Y}$,
$$\mathrm{opt}(\mathcal{X}|A) \subseteq \mathcal{Y} \subseteq \mathcal{X} \implies \mathrm{opt}(\mathcal{Y}|A) = \mathrm{opt}(\mathcal{X}|A).$$
If $\mathrm{opt}$ satisfies this property, then removing non-optimal elements from a set does not affect whether or not each of the remaining elements is optimal. The property is called 'insensitivity to the omission of non-optimal elements' by De Cooman and Troffaes [5], and 'property $\epsilon$' by Sen [32], who attributes this designation to Douglas Blair.

Property 3 (Preservation of non-optimality under the addition of elements). For any event $A \neq \emptyset$, and any non-empty finite $A$-consistent sets of gambles $\mathcal{X}$ and $\mathcal{Y}$,
$$\mathcal{Y} \subseteq \mathcal{X} \implies \mathrm{opt}(\mathcal{Y}|A) \supseteq \mathrm{opt}(\mathcal{X}|A) \cap \mathcal{Y}.$$
This is called 'property $\alpha$' by Sen [32], Axiom 7 by Luce and Raiffa [19, p. 288], and 'independence of irrelevant alternatives' by Radner and Marschak [24] (this name is also used for several quite different properties, such as that in Arrow's impossibility theorem; for further discussion, see Ray [27]). It states that any gamble that is non-optimal in a set of gambles $\mathcal{Y}$ is non-optimal in any set of gambles containing $\mathcal{Y}$.

Properties 2 and 3 together are equivalent to the well-known property of path independence, which can often be checked more conveniently.

Property 4.
A choice function $\mathrm{opt}$ is path independent if, for any non-empty event $A$, and for any finite family of non-empty finite $A$-consistent sets of gambles $\mathcal{X}_1, \dots, \mathcal{X}_n$,
$$\mathrm{opt}\left( \bigcup_{i=1}^n \mathcal{X}_i \,\middle|\, A \right) = \mathrm{opt}\left( \bigcup_{i=1}^n \mathrm{opt}(\mathcal{X}_i|A) \,\middle|\, A \right).$$
Path independence appears frequently in the social choice literature. Plott [23] gives a detailed investigation of path independence and its possible justifications. Path independence is also equivalent to Axiom 7' of Luce and Raiffa [19, p. 289].

Lemma 14 (Sen [32, Proposition 19]). A choice function $\mathrm{opt}$ satisfies Properties 2 and 3 if and only if $\mathrm{opt}$ satisfies Property 4.
Property 5 (Backward mixture property). For any events $A$ and $B$ such that $B \cap A \neq \emptyset$ and $B \cap \overline{A} \neq \emptyset$, any $B \cap \overline{A}$-consistent gamble $Z$, and any non-empty finite $B \cap A$-consistent set of gambles $\mathcal{X}$,
$$\mathrm{opt}(A\mathcal{X} + \overline{A}Z\,|\,B) \subseteq A\,\mathrm{opt}(\mathcal{X}|A \cap B) + \overline{A}Z.$$
Theorem 15 (Backward induction theorem). Let $\mathrm{opt}$ be any choice function. The following conditions are equivalent.
(A) For any consistent decision tree $T$, it holds that $\mathrm{back}_{\mathrm{opt}}(T) = \mathrm{norm}_{\mathrm{opt}}(T)$.
(B) $\mathrm{opt}$ satisfies Properties 1, 2, 3, and 5.
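For comparison with the $\mathrm{norm}_{\mathrm{opt}}$ sketch after Definition 13, the backward induction operator admits an equally short schematic rendering: sets of surviving gambles are propagated from the leaves to the root, and the choice function is applied at every node, as in Section 3.3. As before, this is our own sketch; conditioning on the event of each subtree is suppressed, and would have to be passed to `opt` at every call in a faithful implementation.

```python
def back_opt_gambles(tree, opt):
    """gamb ∘ back_opt: surviving normal form gambles, built right to left."""
    if isinstance(tree, Decision):
        # pool the survivors of every child, then eliminate locally
        pooled = [g for child in tree.children
                  for g in back_opt_gambles(child, opt)]
        return opt(pooled)
    if isinstance(tree, Chance):
        # combine one surviving substrategy per branch (cf. Definition 12),
        # then eliminate locally
        combined = [{}]
        for label, sub in tree.branches.items():
            survivors = back_opt_gambles(sub, opt)
            combined = [{**acc, **{(label,) + path: r
                                   for path, r in g.items()}}
                        for acc in combined for g in survivors]
        return opt(combined)
    return [{(): tree}]                 # leaf: a sure reward

# Theorem 15: when opt satisfies Properties 1, 2, 3, and 5, the result equals
# [gamb(s) for s in norm_opt(tree, opt)], up to ordering.
```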
5. Application to Coherent Lower Previsions

In this section we investigate which of the choice functions for coherent lower previsions satisfy the conditions of Theorem 15(B). Some of the results are based on proofs for more general choice functions, which can be found in the Appendix.
5.1. Maximality. Maximality is induced by a strict partial order, so Properties 2 and 3 hold by Proposition A.1.
Proposition 16.
Maximality satisfies Property 1.

Proof. We prove a stronger result. Let $A$ be a non-empty event, $\mathcal{X}$ be a non-empty finite set of $A$-consistent gambles, and $\{X, Y\} \subseteq \mathcal{X}$ with $AX = AY$. We show that, for any event $B$ such that $A \cap B \neq \emptyset$, $X \in \mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A \cap B)$ implies $Y \in \mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A \cap B)$.

If $X \in \mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A \cap B)$, then for every $Z \in \mathcal{X}$, $\underline{P}(Z - X|A \cap B) \leq 0$. But $AX = AY$ implies $(A \cap B)X = (A \cap B)Y$, and, by Proposition 3(i), $\underline{P}(Z - X|A \cap B) = \underline{P}(Z - Y|A \cap B)$, and so it immediately follows that $Y \in \mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A \cap B)$. □

Proposition 17.
Maximality satisfies Property 5.

Proof. Consider events $A$ and $B$ such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, a non-empty finite set of $A \cap B$-consistent gambles $\mathcal{X}$, and an $\overline{A} \cap B$-consistent gamble $Z$. To establish Property 5, it suffices to show that for any $Y \in \mathcal{X}$,
$$Y \notin \mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A \cap B) \implies AY + \overline{A}Z \notin \mathrm{opt}_{>_{\underline{P}|\cdot}}(A\mathcal{X} + \overline{A}Z\,|\,B).$$
If $Y \notin \mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A \cap B)$, then there is an $X \in \mathcal{X}$ with $\underline{P}(X - Y|A \cap B) > 0$. The result follows if we show that $\underline{P}(AX + \overline{A}Z - (AY + \overline{A}Z)|B) = \underline{P}(A(X - Y)|B) > 0$. By Proposition 3(ii)-(v),
$$0 = \underline{P}(A(X - Y - \underline{P}(X - Y|A \cap B))|B) \leq \underline{P}(A(X - Y)|B) + \overline{P}(-A\underline{P}(X - Y|A \cap B)|B)$$
$$= \underline{P}(A(X - Y)|B) - \underline{P}(A\underline{P}(X - Y|A \cap B)|B) = \underline{P}(A(X - Y)|B) - \underline{P}(A|B)\,\underline{P}(X - Y|A \cap B),$$
where we relied on $\underline{P}(X - Y|A \cap B) > 0$ in the last step. Hence
$$\underline{P}(A(X - Y)|B) \geq \underline{P}(A|B)\,\underline{P}(X - Y|A \cap B) > 0. \qquad \square$$

Corollary 18.
For any consistent decision tree $T$, it holds that $\mathrm{back}_{\mathrm{opt}_{>_{\underline{P}|\cdot}}}(T) = \mathrm{norm}_{\mathrm{opt}_{>_{\underline{P}|\cdot}}}(T)$.

Proof.
Immediate, from Propositions A.1, 16, and 17, and Theorem 15. □
5.2. E-admissibility.
Since E-admissibility is a union of maximality choice functions, we have:
Corollary 19.
For any consistent decision tree $T$, it holds that $\mathrm{back}_{\mathrm{opt}_{\mathcal{M}}}(T) = \mathrm{norm}_{\mathrm{opt}_{\mathcal{M}}}(T)$.

Proof.
Immediate, from Proposition A.2, Corollary 18, and Theorem 15. □
Further, from Theorem A.3, we have:
Corollary 20.
For any consistent decision tree $T$,
$$\mathrm{norm}_{\mathrm{opt}_{\mathcal{M}}}(T) = \mathrm{norm}_{\mathrm{opt}_{\mathcal{M}}}(\mathrm{back}_{\mathrm{opt}_{>_{\underline{P}|\cdot}}}(T)).$$

Table 2. Gambles and their lower and upper previsions for Example 21.
5.3. Interval Dominance.
By Proposition A.1, interval dominance satisfies Properties 2 and 3, and it satisfies Property 1 because $AX = AY$ implies $\underline{P}(X|A) = \underline{P}(Y|A)$ and $\overline{P}(X|A) = \overline{P}(Y|A)$. We now show that interval dominance fails Property 5.

Example 21.
Suppose $A$ and $B$ are events, and $X$, $Y$, and $Z$ are the gambles given in Table 2. Let $\mathcal{M}$ contain all mass functions $P$ such that $A$ and $B$ are independent, $P(A)$ ranges over a fixed non-degenerate interval, and $P(B) = 1/2$. Let $\underline{P}$ be the lower envelope of $\mathcal{M}$.

Lower and upper previsions of the relevant gambles are given in Table 2; for example,
$$\overline{P}(BY + \overline{B}Z) = \max_{P \in \mathcal{M}} \left( P(B)E_P(Y|B) + P(\overline{B})E_P(Z|\overline{B}) \right) = \frac{1}{2} \max_{P \in \mathcal{M}} \left( E_P(Y|B) + E_P(Z|\overline{B}) \right),$$
and similarly for all other gambles. Clearly, $Y$ interval dominates $X$ conditional on $B$; however, $BY + \overline{B}Z$ does not interval dominate $BX + \overline{B}Z$, violating Property 5.

Even though interval dominance violates Property 5, it can still be of use in backward induction. It is easily shown that (see for instance Troffaes [34])
$$\mathrm{opt}_{>_{\underline{P}|\cdot}}(\mathcal{X}|A) \subseteq \mathrm{opt}_{\sqsupset_{\underline{P}|\cdot}}(\mathcal{X}|A).$$
By Theorem A.3, we therefore have:
Corollary 22.
For any consistent decision tree $T$,
$$\mathrm{norm}_{\mathrm{opt}_{>_{\underline{P}|\cdot}}}(T) = \mathrm{norm}_{\mathrm{opt}_{>_{\underline{P}|\cdot}}}(\mathrm{back}_{\mathrm{opt}_{\sqsupset_{\underline{P}|\cdot}}}(T)),$$
$$\mathrm{norm}_{\mathrm{opt}_{\mathcal{M}}}(T) = \mathrm{norm}_{\mathrm{opt}_{\mathcal{M}}}(\mathrm{back}_{\mathrm{opt}_{\sqsupset_{\underline{P}|\cdot}}}(T)).$$
It can also be shown that $\mathrm{back}_{\mathrm{opt}_{\sqsupset_{\underline{P}|\cdot}}}(T) \subseteq \mathrm{norm}_{\mathrm{opt}_{\sqsupset_{\underline{P}|\cdot}}}(T)$ for all $T$, so all strategies found by backward induction will be optimal with respect to $\mathrm{opt}_{\sqsupset_{\underline{P}|\cdot}}$; a set-level sketch of this pruning idea appears at the end of this section.

5.4. Γ-maximin. Γ-maximin fails Theorem 15(A): see for example Seidenfeld [30, Sequential Example 1, pp. 75-77]. Since Γ-maximin is induced by an ordering, it satisfies Properties 2 and 3 by Proposition A.1. As for interval dominance, Γ-maximin satisfies Property 1. Hence, Γ-maximin must fail Property 5. Indeed, backward induction can fail in a particularly serious way: it can select a single gamble that is inferior to another normal form gamble. Hence, backward induction may not find any Γ-maximin gambles.
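At the level of a single set of gambles, the pruning idea behind Corollary 22 looks as follows, reusing the choice function sketches from Section 3.2: interval dominance needs only one lower and one upper prevision per gamble, while maximality needs a lower prevision per ordered pair, so pruning first can be much cheaper. Since interval dominance only removes non-maximal gambles, the exact maximal set is preserved.

```python
def maximal_via_interval_pruning(gambles, ext, A=None):
    """Corollary 22 in miniature: prune with the cheap interval dominance
    test first, then run the pairwise maximality test on the survivors.
    Interval dominance never removes a maximal gamble, so the result is
    exactly the maximal set of the original collection."""
    survivors = interval_dominant(gambles, ext, A)  # cheap pre-filter
    return maximal(survivors, ext, A)               # exact answer preserved
```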
6. The Oil Wildcatter Example
We now illustrate our algorithm using the same example as Kikuti et al. [14, Fig. 2]. Fig. 4 depicts the decision tree, with utiles in units of $10000. The subject must decide whether to drill for oil ($d_1$) or not ($d_2$). Drilling costs 7, and provides a return of 0, 12, or 27, depending on the richness of the site. The events $S_1$ to $S_3$ represent the different yields, with $S_1$ being the least profitable and $S_3$ the most. The subject may pay 1 to test the site before deciding whether to drill; this gives one of three results $T_1$ to $T_3$, where $T_1$ is the most pessimistic and $T_3$ the most optimistic.

Figure 4. Decision tree for the oil wildcatter.

Lower and upper probabilities are given for each $T_i$ (Table 3), and for each $S_i$ conditional on $T_i$ (Table 4). (Some intervals are tighter than those in Kikuti et al., since their values are incoherent; we corrected these by natural extension [35].) The unconditional lower prevision of a gamble $Z$ is then obtained by marginal extension [35]:
$$\underline{P}(Z) = \underline{P}(T_1\underline{P}(Z|T_1) + T_2\underline{P}(Z|T_2) + T_3\underline{P}(Z|T_3)).$$

Table 3. Unconditional lower and upper probabilities $\underline{P}(T_i)$ and $\overline{P}(T_i)$ for the oil example.

Table 4. Conditional lower and upper probabilities $\underline{P}(S_i|T_i)$ and $\overline{P}(S_i|T_i)$ for the oil example.

Let $X = -7S_1 + 5S_2 + 20S_3$, and again let $T_{**} = \mathrm{st}_{N_{**}}(T)$. Since we will only be concerned with maximality, and normal form decisions in this problem are uniquely identified by their gambles,
Solving the oil wildcatter example by normal form backward induction.we can conveniently work with gambles in this example. Therefore, we use the following notation:opt = opt > P |· back = gamb ◦ back opt > P |· norm = gamb ◦ norm opt > P |· Fig. 5 depicts the process of backward induction described next.(i) Of course, back( · ) at the final chance nodes simply reports the gamble: back( T ) =back( T ) = back( T ) = { X − } , and back( T ) = { X } .(ii) For T , we must find P (( X − − ( − | T ) and P ( − − ( X − | T ). These lowerprevisions can be computed using Table 4 as follows: X will have lowest expected valuewhen the worst outcome S is most likely (probability 0 . S is least likely (probability 0 . S is 0 . P ( X | T ) = − × .
653 + 5 × .
222 + 20 × .
125 = − . P ( − X | T ) = − . T ) = { X − , − } .For T , P (( X − − ( − | T ) = 4 . d dominates d , so back( T ) = { X − } . Similarly, P (( X − − ( − | T ) = 10 . T ) = { X − } .For T , we need to find P ( X − P ( X ) = P ( T P ( X | T ) + T P ( X | T ) + T P ( X | T ))= P ( − . T + 4 . T + 10 . T )= 0 . × − .
961 + 0 . × .
754 + 0 . × .
073 = 5 . . This is greater than zero, so back( T ) = { X } .(iii) At T there are two potentially optimal gambles: T ( X −
1) + T ( X −
1) + T ( X −
1) = X − T ( −
1) + T ( X −
1) + T ( X −
1) = ( T + T ) X −
1. We must find P (( X − − (( T + T ) X − P ( T X ) and P ((( T + T ) X − − ( X − P ( − T X ). ORMAL FORM BACKWARD INDUCTION FOR DECISION TREES WITH COH. LOWER PREVISIONS 17
Using marginal extension, P ( T X ) = P ( T P ( X | T )) = P ( − . T ) = 0 . × − .
961 = − . < ,P ( − XT ) = P ( T P ( − X | T )) = P ( − . T ) = 0 . × − .
151 = − . < , so back( T ) = { X − , ( T + T ) X − } .(iv) Finally, for T , we must consider { X, X − , ( T + T ) X − } . It is clear that P ( X − ( X − >
0, so X − X − X , so by our calculation at T we knowthat X is maximal. We finally have P ( X − (( T + T ) X − P ( T X + 1) = P ( T X ) + 1 = − . > , so back( T ) = { X } . So, the optimal strategy is: do not test and just drill.We found a single maximal strategy. By Corollary 20, it is also the unique E-admissiblestrategy. (Our solution differs from Kikuti et al. [14]; since they do not detail their calculations,we could not identify why.) Of course, if the imprecision was larger, we would have foundmore, but it does show that non-trivial sequential problems can give unique solutions even whenprobabilities are imprecise.In this example, the usual normal form method requires comparing 10 gambles at once. Bynormal form backward induction, we only had to compare 2 gambles at once at each stage (exceptat the end, where we had 3), leading us much quicker to the solution: the computational benefitof normal form backward induction is obvious.7. Conclusion
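The marginal extension computations of this section are easy to replay numerically. The sketch below takes the conditional lower previsions computed in step (ii) as inputs and reproduces steps (ii) to (iv); it is a check of the arithmetic, not an implementation of the general algorithm, and the variable names are ours.

```python
# conditional lower previsions of X given each test result, from step (ii)
LX = {"T1": -0.961, "T2": 4.754, "T3": 10.073}
P_UPPER_T1, P_UPPER_T2 = 0.222, 0.363          # upper probabilities, Table 3
P_T3_RESIDUAL = 1 - P_UPPER_T1 - P_UPPER_T2    # 0.415

# marginal extension: P̲(X) = P̲(Σ_i T_i P̲(X|T_i)); for these coefficients
# the outer minimum puts maximal mass on T1 and T2, the residual on T3
p_X = (P_UPPER_T1 * LX["T1"] + P_UPPER_T2 * LX["T2"]
       + P_T3_RESIDUAL * LX["T3"])
print(p_X)                                     # ≈ 5.69 > 0: drilling beats 0

# step (iii): X - 1 versus (T2 + T3)X - 1 reduces to P̲(±T1·X)
p_T1X = P_UPPER_T1 * LX["T1"]
print(p_T1X)                                   # ≈ -0.213 < 0: both survive

# step (iv): X versus (T2 + T3)X - 1
print(p_T1X + 1)                               # ≈ 0.787 > 0, so back(T) = {X}
```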
7. Conclusion

When solving sequential decision problems with limited knowledge, it may be impossible to assign a probability to every event. An alternative is to use a coherent lower prevision, or, equivalently, a closed convex set of probability mass functions. Under such a model there are several plausible generalizations of maximizing expected utility.

Given any criterion, we considered two methods of solving decision trees: the usual normal form, where the subject applies the criterion to the list of all strategies, and normal form backward induction, adapted from [14]. If they coincide, backward induction helps to solve trees efficiently. If they differ, doubt is cast on the criterion's suitability.

In Theorem 15 we identified when the two methods coincide. We then applied these results to the choice functions for coherent lower previsions. As was already known, Γ-maximin fails Property 5. Interval dominance fails the same condition. However, and perhaps surprisingly, maximality and E-admissibility satisfy all conditions, in the case where lower probabilities are non-zero. If any lower probabilities are zero, then Property 5 fails (unsurprisingly, as in this case it already fails with precise probabilities).

When analysing choice functions, whether for lower previsions or not, usually Property 5 is the most troublesome. Failing Property 1 would involve an unnatural form of conditioning, and path independence (Property 4) is a very natural consistency condition that one usually wants to satisfy before even considering decision trees.

We have not argued that a normal form operator, and $\mathrm{norm}_{\mathrm{opt}}$ in particular, gives the best solution to a decision tree. A normal form solution requires a policy to be specified and adhered to. The subject adheres to this policy only by her own resolution: she can of course change her policy upon reaching a decision node [29]. One might argue that a normal form solution is only acceptable when the subject cannot change her mind (for example, if she instructs, in advance, others to carry out the actions).

Further, many choice functions cause $\mathrm{norm}_{\mathrm{opt}}$ to have undesirable properties, even when they satisfy Properties 1, 2, 3, and 5. For example, using maximality or E-admissibility with a lower prevision allows the subject to choose to pay for apparently irrelevant information instead of making an immediate decision [31], and a gamble that is optimal in a subtree may become non-optimal in the full tree [8, 10, 11].

Moreover, normal form backward induction does not always help with computations. First, the need to store, at every stage, all optimal gambles could be a burden. Secondly, if imprecision is large, causing only few gambles to be deleted, the set of optimal gambles at each stage will still eventually become too large. In such situations, a form of approximation may be necessary. Even so, we have shown that, perhaps surprisingly, the normal form can be solved exactly with backward induction, and when neither trees nor imprecision are too large, the method will be computationally feasible.

References
[1] T. Augustin, On decision making under ambiguous prior and sampling information, ISIPTA '01: Proceedings of the Second International Symposium on Imprecise Probabilities (G. de Cooman, T. L. Fine, S. Moral, and T. Seidenfeld, eds.), 2001.
[2] George Boole, An investigation of the laws of thought on which are founded the mathematical theories of logic and probabilities, Walton and Maberly, London, 1854.
[3] Robert T. Clemen and Terence Reilly, Making hard decisions, Duxbury, Belmont CA, 2001.
[4] Marquis de Condorcet, Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix, L'Imprimerie Royale, Paris, 1785.
[5] G. de Cooman and Matthias C. M. Troffaes, Dynamic programming for deterministic discrete-time systems with uncertain gain, International Journal of Approximate Reasoning (2005), no. 2-3, 257-278, doi:10.1016/j.ijar.2004.10.004.
[6] Bruno de Finetti, Theory of probability: A critical introductory treatment, Wiley, New York, 1974-5, two volumes.
[7] Jonathan Gross and Jay Yellen, Graph theory and its applications, CRC Press, London, 1999.
[8] P. Hammond, Consequentialist foundations for expected utility, Theory and Decision (1988), no. 1, 25-78.
[9] David Harmanec, A generalization of the concept of Markov decision process to imprecise probabilities, 1st International Symposium on Imprecise Probabilities and Their Applications, 1999, pp. 175-182.
[10] N. Huntley and Matthias C. M. Troffaes, Characterizing factuality in normal form sequential decision making, ISIPTA '09: Proceedings of the Sixth International Symposium on Imprecise Probability: Theories and Applications (Thomas Augustin, Frank P. A. Coolen, Serafin Moral, and Matthias C. M. Troffaes, eds.), Jul 2009, pp. 239-248.
[11] N. Huntley and Matthias C. M. Troffaes, Subtree perfectness, backward induction, and normal-extensive form equivalence for single agent sequential decision making under arbitrary choice functions, arXiv:1109.3607v1 [math.ST], September 2011.
[12] Nathan Huntley and Matthias C. M. Troffaes, An efficient normal form solution to decision trees with lower previsions, Soft Methods for Handling Variability and Imprecision (Berlin) (Didier Dubois, María Asunción Lubiano, Henri Prade, María Ángeles Gil, Przemysław Grzegorzewski, and Olgierd Hryniewicz, eds.), Advances in Soft Computing, Springer, Sep 2008, pp. 419-426.
[13] J. Jaffray, Rational decision making with imprecise probabilities, 1st International Symposium on Imprecise Probabilities and Their Applications, 1999, pp. 183-188.
[14] D. Kikuti, F. Cozman, and C. P. de Campos, Partially ordered preferences in decision trees: Computing strategies with imprecision in probabilities, IJCAI-05 Multidisciplinary Workshop on Advances in Preference Handling (R. Brafman and U. Junker, eds.), 2005, pp. 118-123.
[15] H. E. Kyburg, Rational belief, Behavioral and Brain Sciences (1983), no. 2, 231-273.
[16] I. LaValle and K. Wapman, Rolling back decision trees requires the independence axiom!, Management Science (1986), no. 3, 382-385.
[17] I. Levi, The enterprise of knowledge, MIT Press, London, 1980.
[18] D. V. Lindley, Making decisions, 2nd ed., Wiley, London, 1985.
[19] R. D. Luce and H. Raiffa, Games and decisions: introduction and critical survey, Wiley, 1957.
[20] M. J. Machina, Dynamic consistency and non-expected utility models of choice under uncertainty, Journal of Economic Literature (1989), 1622-1688.
[21] E. F. McClennen, Rationality and dynamic choice: Foundational explorations, Cambridge University Press, Cambridge, 1990.
[22] Enrique Miranda, A survey of the theory of coherent lower previsions, International Journal of Approximate Reasoning (2008), no. 2, 628-658.
[23] C. R. Plott, Path independence, rationality, and social choice, Econometrica (1973), no. 6, 1075-1091.
[24] R. Radner and J. Marschak, Note on some proposed decision criteria, Decision Processes (R. M. Thrall, C. H. Coombs, and R. L. Davies, eds.), John Wiley, New York, 1954, pp. 61-68.
[25] H. Raiffa and R. Schlaifer, Applied statistical decision theory, Harvard University Press, Boston, 1961.
[26] Frank P. Ramsey, Truth and probability, Foundations of Mathematics and other Logical Essays (R. B. Braithwaite, ed.), Routledge and Kegan Paul, London, 1931, published posthumously, pp. 156-198.
[27] P. Ray, Independence of irrelevant alternatives, Econometrica (1973), no. 5, 987-991.
[28] Sheldon Ross, A first course in probability, 7th ed., Pearson Prentice Hall, 2006.
[29] T. Seidenfeld, Decision theory without 'independence' or without 'ordering': What is the difference?, Economics and Philosophy (1988), 267-290.
[30] T. Seidenfeld, A contrast between two decision rules for use with (convex) sets of probabilities: Γ-maximin versus E-admissibility, Synthese (2004), 69-88.
[31] T. Seidenfeld, M. J. Schervish, and J. B. Kadane, Coherent choice functions under uncertainty, 5th International Symposium on Imprecise Probability: Theories and Applications, 2007, pp. 385-394.
[32] A. K. Sen, Social choice theory: A re-examination, Econometrica (1977), no. 1, 53-89.
[33] Cedric A. B. Smith, Consistency in statistical inference and decision, Journal of the Royal Statistical Society B (1961), no. 23, 1-37.
[34] Matthias C. M. Troffaes, Decision making under uncertainty using imprecise probabilities, International Journal of Approximate Reasoning (2007), 17-29.
[35] P. Walley, Statistical reasoning with imprecise probabilities, Chapman and Hall, London, 1991.
[36] Peter M. Williams, Notes on conditional previsions, Tech. report, School of Math. and Phys. Sci., Univ. of Sussex, 1975.
[37] Peter M. Williams, Notes on conditional previsions, International Journal of Approximate Reasoning (2007), no. 3, 366-383.
[38] Marco Zaffalon, Keith Wesnes, and Orlando Petrini, Reliable diagnoses of dementia by the naive credal classifier inferred from incomplete cognitive data, Artificial Intelligence in Medicine (2003), no. 1-2, 61-79.

Appendix A. Results for General Choice Functions
This appendix details intermediate results required for the proofs in Section 5. Since the results apply to choice functions that need have nothing to do with coherent lower previsions, and so may be useful for investigating other uncertainty models, we present them separately.
Proposition A.1.
For each non-empty event $A$, let $\succ_A$ be any strict partial order on $A$-consistent gambles. The choice function induced by these strict partial orders, that is,
$$\mathrm{opt}_{\succ|\cdot}(\mathcal{X}|A) = \{ X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not\succ_A X) \},$$
satisfies Properties 2 and 3.

Proof. By Lemma 14, it suffices to show that $\mathrm{opt}_{\succ|\cdot}$ is path independent. Let $\mathcal{X}_1, \dots, \mathcal{X}_n$ be non-empty finite sets of $A$-consistent gambles, and let $A$ be a non-empty event. Let $\mathcal{X} = \bigcup_{i=1}^n \mathcal{X}_i$ and $\mathcal{Z} = \bigcup_{i=1}^n \mathrm{opt}_{\succ|\cdot}(\mathcal{X}_i|A)$. We must show that
$$(3) \qquad \mathrm{opt}_{\succ|\cdot}(\mathcal{X}|A) = \mathrm{opt}_{\succ|\cdot}(\mathcal{Z}|A).$$
By definition,
$$\mathrm{opt}_{\succ|\cdot}(\mathcal{Z}|A) = \{ Z \in \mathcal{Z} : (\forall Y \in \mathcal{Z})(Y \not\succ_A Z) \},$$
and observe that, if $X \in \mathcal{X}$ but $X \notin \mathcal{Z}$, then by transitivity of $\succ_A$ and finiteness of $\mathcal{X}$, there is a $Y \in \mathcal{Z}$ such that $Y \succ_A X$. Therefore, again by transitivity of $\succ_A$, for any $Z \in \mathcal{Z}$ such that $X \succ_A Z$, we have $Y \succ_A Z$. So,
$$\mathrm{opt}_{\succ|\cdot}(\mathcal{Z}|A) = \{ Z \in \mathcal{Z} : (\forall Y \in \mathcal{X})(Y \not\succ_A Z) \},$$
and once again, by definition of $\mathcal{Z}$, if $X \in \mathcal{X}$ but $X \notin \mathcal{Z}$ there is a $Y \in \mathcal{X}$ such that $Y \succ_A X$, so we have
$$\mathrm{opt}_{\succ|\cdot}(\mathcal{Z}|A) = \{ X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not\succ_A X) \} = \mathrm{opt}_{\succ|\cdot}(\mathcal{X}|A). \qquad \square$$
Let $\{\mathrm{opt}_i : i \in I\}$ be a family of choice functions. For any non-empty event $A$ and any non-empty finite set of $A$-consistent gambles $\mathcal{X}$, let
$$\mathrm{opt}(\mathcal{X}|A) = \bigcup_{i \in I} \mathrm{opt}_i(\mathcal{X}|A).$$
(i) If each $\mathrm{opt}_i$ satisfies Property 2, then so does $\mathrm{opt}$.
(ii) If each $\mathrm{opt}_i$ satisfies Property 3, then so does $\mathrm{opt}$.
(iii) If each $\mathrm{opt}_i$ satisfies Property 5, then so does $\mathrm{opt}$.
(iv) If each $\mathrm{opt}_i$ satisfies Properties 1, 3, and 5, then $\mathrm{opt}$ satisfies Property 1.

Proof. (i). By definition of $\mathrm{opt}$ and by assumption, for any finite non-empty sets of gambles $\mathcal{X}$ and $\mathcal{Y}$ such that $\mathrm{opt}(\mathcal{X}|A) \subseteq \mathcal{Y} \subseteq \mathcal{X}$, and for any $i \in I$, $\mathrm{opt}_i(\mathcal{X}|A) \subseteq \mathrm{opt}(\mathcal{X}|A) \subseteq \mathcal{Y}$, and therefore, by Property 2, $\mathrm{opt}_i(\mathcal{X}|A) = \mathrm{opt}_i(\mathcal{Y}|A)$. Whence,
$$\mathrm{opt}(\mathcal{Y}|A) = \bigcup_{i \in I} \mathrm{opt}_i(\mathcal{Y}|A) = \bigcup_{i \in I} \mathrm{opt}_i(\mathcal{X}|A) = \mathrm{opt}(\mathcal{X}|A).$$
(ii). By assumption, for any finite non-empty sets of gambles $\mathcal{X}$ and $\mathcal{Y}$ such that $\mathcal{Y} \subseteq \mathcal{X}$, and for any $i \in I$, $\mathrm{opt}_i(\mathcal{Y}|A) \supseteq \mathrm{opt}_i(\mathcal{X}|A) \cap \mathcal{Y}$. Therefore,
$$\mathrm{opt}(\mathcal{Y}|A) = \bigcup_{i \in I} \mathrm{opt}_i(\mathcal{Y}|A) \supseteq \bigcup_{i \in I} (\mathrm{opt}_i(\mathcal{X}|A) \cap \mathcal{Y}) = \mathcal{Y} \cap \bigcup_{i \in I} \mathrm{opt}_i(\mathcal{X}|A) = \mathrm{opt}(\mathcal{X}|A) \cap \mathcal{Y}.$$
(iii). By assumption, for any non-empty finite set of gambles $\mathcal{X}$, any gamble $Z$, any events $A$ and $B$ such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, and for any $i \in I$,
$$\mathrm{opt}_i(A\mathcal{X} + \overline{A}Z|B) \subseteq A\,\mathrm{opt}_i(\mathcal{X}|A \cap B) + \overline{A}Z,$$
whence
$$\mathrm{opt}(A\mathcal{X} + \overline{A}Z|B) = \bigcup_{i \in I} \mathrm{opt}_i(A\mathcal{X} + \overline{A}Z|B) \subseteq \bigcup_{i \in I} (A\,\mathrm{opt}_i(\mathcal{X}|A \cap B) + \overline{A}Z) = \overline{A}Z + A \bigcup_{i \in I} \mathrm{opt}_i(\mathcal{X}|A \cap B) = \overline{A}Z + A\,\mathrm{opt}(\mathcal{X}|A \cap B).$$
(iv). Let $A$ and $B$ be events such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, let $\mathcal{Z}$ be a non-empty finite set of $\overline{A} \cap B$-consistent gambles, and let $\mathcal{X}$ be a non-empty finite set of $A \cap B$-consistent gambles such that there are $\{X, Y\} \subseteq \mathcal{X}$ with $AX = AY$. Suppose that there is a $Z \in \mathcal{Z}$ such that $AX + \overline{A}Z \in \mathrm{opt}(A\mathcal{X} + \overline{A}\mathcal{Z}|B)$.

By definition of $\mathrm{opt}$, there is a $j$ such that $AX + \overline{A}Z \in \mathrm{opt}_j(A\mathcal{X} + \overline{A}\mathcal{Z}|B)$. We show that both $X$ and $Y$ are in $\mathrm{opt}_j(\mathcal{X}|A \cap B)$, and therefore both are in $\mathrm{opt}(\mathcal{X}|A \cap B)$. It follows from Properties 3 and 5 that
$$\mathrm{opt}_j(A\mathcal{X} + \overline{A}\mathcal{Z}|B) \subseteq A\,\mathrm{opt}_j(\mathcal{X}|A \cap B) + \overline{A}\,\mathrm{opt}_j(\mathcal{Z}|\overline{A} \cap B).$$
Therefore, there is a $V \in \mathrm{opt}_j(\mathcal{X}|A \cap B)$ with $AV = AX$. Finally, $\mathrm{opt}_j$ satisfies Property 1, and therefore both $X$ and $Y$ must be in $\mathrm{opt}_j(\mathcal{X}|A \cap B)$. This establishes Property 1 for $\mathrm{opt}$. □

The final result is the following: if $\mathrm{opt}_1$ satisfies the necessary properties, $\mathrm{opt}_2$ does not, but $\mathrm{opt}_1 \subseteq \mathrm{opt}_2$, then we can use $\mathrm{norm}_{\mathrm{opt}_1}(\mathrm{back}_{\mathrm{opt}_2}(\cdot))$ to find $\mathrm{norm}_{\mathrm{opt}_1}$. This could be of interest in situations where $\mathrm{opt}_2$ is much more computationally efficient than $\mathrm{opt}_1$, and still eliminates enough gambles to be useful.

Theorem A.3.
Let $\mathrm{opt}_1$ and $\mathrm{opt}_2$ be choice functions such that $\mathrm{opt}_1$ satisfies Properties 1, 2, 3, and 5, and for any non-empty event $A$ and any non-empty finite set of $A$-consistent gambles $\mathcal{X}$, $\mathrm{opt}_1(\mathcal{X}|A) \subseteq \mathrm{opt}_2(\mathcal{X}|A)$. Then, for any consistent decision tree $T$,
$$(4) \qquad \mathrm{norm}_{\mathrm{opt}_1}(T) = \mathrm{norm}_{\mathrm{opt}_1}(\mathrm{back}_{\mathrm{opt}_2}(T)).$$

Durham University, Department of Mathematical Sciences, Science Laboratories, South Road, Durham DH1 3LE, United Kingdom
E-mail address: [email protected]

Durham University, Department of Mathematical Sciences, Science Laboratories, South Road, Durham DH1 3LE, United Kingdom