Conditional Utility, Utility Independence, and Utility Networks
Yoav Shoham*
Department of Computer Science Stanford University Stanford, CA [email protected]
Abstract
We introduce a new interpretation of two related notions: conditional utility and utility independence. Unlike the traditional interpretation, the new interpretation renders the notions the direct analogues of their probabilistic counterparts. To capture these notions formally, we appeal to the notion of utility distribution, introduced in a previous paper. We show that utility distributions, which have a structure that is identical to that of probability distributions, can be viewed as a special case of additive multiattribute utility functions, and show how this special case permits us to capture the novel senses of conditional utility and utility independence. Finally, we present the notion of utility networks, which do for utilities what Bayesian networks do for probabilities. Specifically, utility networks exploit the new interpretation of conditional utility and utility independence to compactly represent a utility distribution.
1 Introduction
There has recently been a growing interest within AI in representing and reasoning about utility. There are several reasons for this. First, while probabilistic methods have gained much influence, probability is only one ingredient of decision theory; the foundations of decision theory are based on utility functions as much as they are on probability distributions. Second, just as there exist applications which call for reasoning purely about probabilities, there exist applications that call for reasoning purely about utilities. Examples include a software agent that needs to reason about the utility functions of other agents in a bargaining situation, and a meal-planning program that needs to understand the gastronomic preferences of the user.

As we argue in a previous paper [7]¹, it would be quite convenient if we had a mechanism analogous to Bayesian networks to reason purely about utilities. As we further note there, at the heart of Bayesian networks lie three concepts: probability distribution, conditional probability, and probabilistic independence. If we manage to mirror those notions in the case of utilities, we will have potentially availed ourselves of a ready-made mechanism for reasoning about utilities. In [7] we introduce the notion of utility distribution.² Here we concentrate on the notions of conditional utility and utility independence, and the derived notion of utility networks. While not the main focus here, as we shall see, this paper does shed some new light on the notion of utility distribution itself. Specifically, while the treatment in [7] derives the notion from scratch, as a side effect of considering notions such as utility independence we will end up re-deriving the notion of utility distribution as an extension of standard decision theoretic notions, in particular those encountered in multiattribute utility theory (MAUT) [5, 3].

* This work was supported in part by NSF grant IRI-9503109.
¹ Despite the publication dates, [7] describes work carried out almost a year before the work described here.
² In [7] I furthermore define the notion of a bi-distribution, which contains both a probability and a utility distribution, but this will not play a major role in this paper.

Indeed, most papers in AI that attempt to do something interesting with utilities appeal to MAUT, and to notions of conditionalization and independence therein. This is true of earlier work on influence diagrams that introduces multiple (additive or multiplicative) value nodes [6], and of more recent papers by Bacchus and Grove [1] and by Doyle and Wellman [2]. I think that there are two reasons why researchers have concentrated on the classical notions of conditional utility and utility independence. First, the decision theoretic literature itself (most notably,
Keeney and Raiffa's [5]) presents compelling arguments in favor of these notions. Second, the terms themselves suggest an analogy with conditional probability and probabilistic independence, leading to a vague hope that the utility-based notions will yield similar computational advantages. The decision theory literature reinforces the analogy between the probabilistic and utilitarian notions. Here is a quote from [5]:

"One of the fundamental concepts of multiattribute utility theory is that of utility independence. Its role in multiattribute utility theory is similar to that of probabilistic independence in multivariate probability theory." (p. 224)

I argue that this analogy is misleading. The classical sense of utility independence has fundamentally different properties from those of probabilistic independence. However, there exists a different sense of utility independence for which this analogy holds in a precise sense. The same is true of conditional utility. Let me illustrate these two senses of conditional utility through an example. Referring to the hypothetical meal-planning program mentioned above, consider two conditional utilities:

• The utility for John of having beef for the main course, given that the appetizer is salmon mousse.
• The utility for John of having beef for the main course, given that John is vegetarian.

These are fundamentally different senses of conditional utility. The first conditions on an objective fact; loosely speaking, it can be thought of as updating the utility based on information learned about the state of the world. The second conditions on a mental fact; loosely speaking, it can be thought of as updating the utility based on information learned about the preference structure of the agent. We might call the first 'objective' conditional utility, and the second 'subjective' conditional utility.

An analogy may be instructive here. The KR and database communities have learned to distinguish between updating a knowledge base and revising it [4]; the first reflects changes in the world, the second changes in information about the world. A similar distinction must be drawn with respect to conditional utilities.

The standard notion of conditional utility (and derived notions of utility independence) in decision theory is of the first variety, and it is the one commonly discussed in AI. However, this version of conditional utility is the one least similar to conditional probability. Perhaps for this reason, and despite great ingenuity on the part of the various authors, this notion has not yielded a computational device similar in nature and power to Bayesian networks.

In the rest of this paper we do the following. First, we briefly review the basics of MAUT and the standard notion of utility independence (and conditional utility). Next we extend those notions to include the notion of utility distribution. We then formally present the alternative notions of utility independence and conditional utility, which are directly analogous to their probabilistic counterparts. We conclude with a computational application of these ideas, and introduce the notion of utility networks.

2 Multiattribute utility theory (review)
Every utility function is defined over a set of states, and maps each state to a real value (its utility). In multiattribute utility theory (MAUT) [3, 5] one posits a set of n attributes with corresponding domains $D_1, D_2, \ldots, D_n$, and the set of states is defined to be the Cartesian product $D_1 \times D_2 \times \cdots \times D_n$.

In general, specifying a MAU function can be expensive: exponential in the number of attributes. However, under special conditions the representation can be more compact. The general scheme for specifying these conditions goes like this: one defines certain "independence conditions" on the n attributes, and then provides a "representation theorem," stating that under these independence conditions the utility function can be specified in a certain compact form. The remainder of this section summarizes these conditions and the corresponding specialized representations.

We first note that a utility function u over a set of states S induces a preference ordering $\succeq_u$ on lotteries (or probability distributions) over S via the expected utility construct:

$$p_1 \succeq_u p_2 \iff \sum_{s \in S} p_1(s)\,u(s) \ge \sum_{s \in S} p_2(s)\,u(s)$$

where $p_1$ and $p_2$ are any two lotteries over S. More generally, in the case of MAU functions, a utility function defines preferences on lotteries conditional on a certain attribute X having a particular value x:

$$p_1 \succeq_u p_2 \ (\text{given } X = x) \iff \sum_{s \in S} p_1(s \mid X = x)\,u(s) \ge \sum_{s \in S} p_2(s \mid X = x)\,u(s)$$
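The expected utility construct and its conditional variant can be sketched in a few lines of Python. This is only an illustration: the function names, the two-attribute meal states, and all numeric values are assumptions made for the example, not part of the formal development.

```python
from fractions import Fraction as F

def expected_utility(lottery, u):
    """Expected utility of a lottery (state -> probability) under u (state -> utility)."""
    return sum(p * u[s] for s, p in lottery.items())

def weakly_prefers(p1, p2, u):
    """True iff p1 is weakly preferred to p2 under the expected utility construct."""
    return expected_utility(p1, u) >= expected_utility(p2, u)

def condition(lottery, attr, value):
    """Condition a lottery on attribute number `attr` taking `value` (states are tuples)."""
    mass = sum(p for s, p in lottery.items() if s[attr] == value)
    if mass == 0:
        raise ValueError("conditioning event has probability zero")
    return {s: p / mass for s, p in lottery.items() if s[attr] == value}

# Hypothetical states (appetizer, main course) with made-up utilities.
u = {("soup", "beef"): F(3), ("soup", "fish"): F(1),
     ("salad", "beef"): F(2), ("salad", "fish"): F(4)}
p1 = {s: F(1, 4) for s in u}  # uniform lottery over the four states
p2 = {("soup", "beef"): F(1, 2), ("salad", "fish"): F(1, 2)}

print(expected_utility(p1, u), expected_utility(p2, u))
print(weakly_prefers(p2, p1, u))
# Preference between the same lotteries, conditional on appetizer = "soup".
print(weakly_prefers(condition(p2, 0, "soup"), condition(p1, 0, "soup"), u))
```

With these made-up numbers the unconditional expected utilities are 5/2 and 7/2, so p2 is preferred, and it remains preferred after conditioning on the appetizer being soup. Exact rational arithmetic is used so that comparisons are not disturbed by floating-point rounding.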
Given this, we can define the general notion of utility independence. In the following definition, Y and Z are sets of attributes (in most applications, Z will be the complement of Y):

Definition 1 (based on [5], p. 226) Y is utility independent of Z when conditional preferences on lotteries on Y given Z = z do not depend on the particular value of z.

Using the notion of utility independence we define two independence conditions on a set of attributes. The first does not appear in the literature as a definition, or under this name, but it will be convenient:
Definition 2 (based on [5], p. 293) Attributes $X_1, \ldots, X_n$ are singulary utility independent if every $X_i$ is utility independent of its complement $\bar{X}_i$.

The next definition involves stronger independence conditions:
Definition 3 ([5], p. 289) Attributes $X_1, \ldots, X_n$ are mutually utility independent if every subset of $\{X_1, \ldots, X_n\}$ is utility independent of its complement.

The third and final type of independence involves a stronger condition than either of the first two:
Definition 4 ([5], p. 295) Attributes $X_1, \ldots, X_n$ are additive independent if preferences over lotteries on $X_1, \ldots, X_n$ depend only on their marginal probability distributions and not on their joint probability distribution.

The relative strength of these three properties is reflected in the different representation theorems they allow. Starting off somewhat qualitatively, here is a rough description of four special forms of MAU functions that are based on these three independence conditions. The first column specifies the independence condition on attributes, the second column names the special form in which the utility function can be represented, and the third gives some properties of this special form:

ind. cond. | special form | some properties
singular | multilinear | n simple utility functions; exponential number of constants; exponential number of additions and multiplications
mutual | no standard name | n simple utility functions; n constants; exponential number of additions and multiplications. However, this form always collapses to one of the following two special cases
none stated | multiplicative | n simple utility functions; n constants; n additions and multiplications
additive | additive | n simple utility functions; n constants; n additions and multiplications

Although the above table omits details, it does show that, as can be expected, stronger forms of utility independence admit more economical representations of the utility function. Indeed, additive independence (and even the more specific notion of conditional additive independence, in which additive independence holds given certain attributes) is the one that has attracted some interest recently in AI, and the one that is most relevant to us here. Given this, let us define the additive form of MAU functions precisely:
Definition 5 (adapted from [5], p. 295) A MAU function u over variables $x_1, \ldots, x_n$ is additive iff there exist functions $u_1, \ldots, u_n$ and constants $k_1, \ldots, k_n$ such that $u(x_1, \ldots, x_n) = \sum_{i=1}^{n} k_i u_i(x_i)$. (In fact, in the current context the constants $k_i$ play no role and could be omitted, but they will be useful in the sequel.)

3 From additive utility functions to utility distributions
One way in which to view our alternative perspective on utility independence is through a specialization of additive MAU functions. Our starting point is an example used by Bacchus and Grove. Consider two variables, H (healthy) and W (wealthy), and the following utility function over the four states: $u(HW)$ = […], $u(H\bar{W})$ = […], $u(\bar{H}W)$ = […], $u(\bar{H}\bar{W})$ = […].

It is easy to verify that H is utility independent of W, and vice versa (intuitively, one prefers being healthy to being sick regardless of whether one is wealthy or poor, and vice versa). However, H and W are not additive independent. Consider two lotteries $p_1$ and $p_2$ defined by $p_1(HW)$ = […] = 1/4, $p_1$([…]) = […], and $p_2(HW)$ = […] = 0, $p_2$([…]) = 0, $p_2$([…]) = […]. $p_1$ and $p_2$ have identical marginals on H and W, but while the expected utility of $p_1$ is […], the expected utility of $p_2$ is […]. Intuitively, a person with this utility function prefers the synergy of health and wealth more than is predicted by their individual utilities.

Now consider a modified example, in which $u(HW)$ = […] (and the other values remain unchanged). It is not hard to see that in this case H and W are additive independent. This modified example is both instructive and misleading. It is instructive because it demonstrates that, unlike the notion of utility independence, which is qualitative in nature (it merely compares various numbers), additive independence is arithmetic in nature. However, it is also misleading because this example has more properties than are required by the notion of additive independence. Specifically, the attributes in this example are binary in nature; you are either healthy or not. The more general case would allow for multiple values of health and of wealth. However, while too specialized to be a neutral representative of additive utility functions, this example is representative of a new, more specific class, which we define next.

Definition 6
Given a vector of boolean attributes $X = x_1, \ldots, x_n$ (that is, each with domain {0,1}), a utility function u over X is TIOLI ("take it or leave it") iff there exist constants $k_1, \ldots, k_n$ such that $u(x_1, \ldots, x_n) = \sum_{i=1}^{n} k_i x_i$.
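The contrast between the two versions of the health/wealth example can be checked mechanically. The sketch below is illustrative only: the utility values and the two lotteries are assumptions chosen to exhibit the behaviour described in the text, and the additivity test relies on the fact that, over boolean attributes, the additive form of Definition 5 reduces to a constant plus a weighted sum of the attribute values.

```python
from fractions import Fraction as F
from itertools import product

def tioli(weights):
    """TIOLI utility function u(x) = sum_i k_i * x_i over boolean tuples x."""
    return lambda x: sum(k * xi for k, xi in zip(weights, x))

def is_additive(u, n):
    """Over boolean attributes, u is additive iff u(x) = u(0) + sum_i d_i * x_i,
    where d_i is the gain from switching attribute i on from the all-zero state."""
    base = u((0,) * n)
    gains = [u(tuple(int(j == i) for j in range(n))) - base for i in range(n)]
    return all(u(x) == base + sum(d * xi for d, xi in zip(gains, x))
               for x in product((0, 1), repeat=n))

def expected_utility(lottery, u):
    return sum(p * u(x) for x, p in lottery.items())

def marginal(lottery, i):
    """Probability that attribute i holds under the lottery."""
    return sum(p for x, p in lottery.items() if x[i] == 1)

# States are (H, W). All numbers below are illustrative assumptions.
synergy = {(1, 1): F(5), (1, 0): F(2), (0, 1): F(1), (0, 0): F(0)}
u_synergy = lambda x: synergy[x]      # health and wealth reinforce each other
u_modified = tioli([F(2), F(1)])      # same values except u(HW) = 2 + 1 = 3

p1 = {x: F(1, 4) for x in product((0, 1), repeat=2)}   # uniform lottery
p2 = {(1, 1): F(1, 2), (0, 0): F(1, 2)}                # same marginals as p1

print(is_additive(u_synergy, 2), is_additive(u_modified, 2))
print(expected_utility(p1, u_synergy), expected_utility(p2, u_synergy))
print(expected_utility(p1, u_modified), expected_utility(p2, u_modified))
```

Under the assumed synergy values the two lotteries share their marginals yet receive different expected utilities, so preferences cannot depend on marginals alone; under the TIOLI version the two expected utilities coincide.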
The interpretation of a TIOLI utility function is best explained through our modified example. Health contributes a fixed amount to one's joy (or utility or satisfaction), Wealth contributes another, and the total utility experienced in any given state is simply the sum of the joys supplied by the elements present in the state: $u(HW)$ = […], $u(H\bar{W})$ = […], and so on. I will call the attributes of a TIOLI MAU function utility factors, or simply factors, to denote the fact that they are thought of as representing the basic ingredients of one's mental state of satisfaction.

It is worthwhile to mention here that MAUT is completely agnostic about the interpretation of attributes. In some examples each attribute is some good such as sugar or flour, its value denotes the quantity of the good consumed in a state, and a state then becomes interpreted as a bundle of goods (on which one might bid in an auction, for example). In a different interpretation, the different attributes are days of the week, their values are the rewards experienced in each day, and their combination (for example, via addition or weighted addition) describes the overall utility experienced during the week. TIOLI functions admit the more psychological interpretation discussed above.

Finally, we note that the various $k_i$ can be translated and scaled to lie in the interval [0, 1] and to sum to 1. This yields the special form of TIOLI utility functions called utility distributions:
Definition 7 Given a vector of boolean attributes $X = x_1, \ldots, x_n$, a utility function u over X is a utility distribution iff there exist constants $k_1, \ldots, k_n$ such that (a) $0 \le k_i \le 1$, (b) $\sum_{i=1}^{n} k_i = 1$, and (c) $u(x_1, \ldots, x_n) = \sum_{i=1}^{n} k_i x_i$.
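A minimal sketch of this definition follows, assuming the TIOLI weights are already non-negative so that scaling alone suffices (the translation step is not modeled). The function names and the health/wealth weights are illustrative assumptions. The second function evaluates a set of factors by summing the weights of its members, which is how a state is scored when it is identified with the set of factors present in it.

```python
from fractions import Fraction as F

def to_utility_distribution(weights):
    """Scale non-negative TIOLI weights (factor -> k) so that they sum to 1.
    Assumption: weights are non-negative and not all zero."""
    if any(k < 0 for k in weights.values()):
        raise ValueError("this sketch handles non-negative weights only")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {f: F(k) / total for f, k in weights.items()}

def utility(dist, factors):
    """Utility of a set of factors: the sum of their individual weights."""
    return sum(dist[f] for f in set(factors))

# Hypothetical weights for the two factors of the health/wealth example.
dist = to_utility_distribution({"health": 2, "wealth": 1})
print(dist)
print(utility(dist, {"health"}), utility(dist, {"health", "wealth"}))
```

Dividing every weight by the same positive constant rescales all expected utilities by that constant, so the induced preference ordering over lotteries is unchanged.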
Clearly, the structure of a utility distribution is essentially that of a probability distribution, except that the measure is applied to utility factors rather than to events. In fact, all that remains in order to make the two measures identical in structure is to lift the domain of the utility distribution to sets of factors. This is done in the obvious way; the utility of a set of factors is the sum of the individual utilities.⁴

⁴ MAU functions, in particular the additive form, come with a representation theorem, stating necessary and sufficient conditions on the preference relation over lotteries that permit the special form in question. One might ask if similar necessary and sufficient conditions exist for TIOLI functions or utility distributions. This is a question that I haven't looked into closely thus far, but it seems that there do not exist simple, compact conditions. Obviously, a necessary condition is that the preference among lotteries depend only on the marginals, but it seems that the additional requirements needed to make this also a sufficient condition are not as neat. However, given my preliminary state of understanding, and the fact that this question will not play a role in the sequel, I won't pursue this further here.

The notion of utility distribution was introduced already in a previous paper [7], where it was defined independently of any pre-existing notion. I've re-derived the notion here in the context of multiattribute utility theory in order to be able to contrast different senses of, e.g., utility independence in the next section, but let me briefly mention here a few more elements discussed in [7].

The reader might be concerned about the applicability of these notions. We have discussed a progression of increasingly strong constraints on the structure of the utility function, and one might worry that the strongest constraint, the utility distribution form, is too rare to be of interest. It turns out that this is not so, at least in principle. For any utility function, even one not in MAU form, we can find a set of factors and a utility distribution over them, such that the original utility function can be reconstructed from these factors. The only hitch is that the set of factors might not be as small as one would like. In the examples given above (the health/wealth, and the cars examples) the number of factors was logarithmic in the number of states. However, in general it will range between logarithmic and linear.

I have not discussed here how one can combine probability distributions and utility distributions into one framework. In [7] I define the notion of a bi-distribution. Briefly, a bi-distribution is a structure consisting of a probability distribution and a utility distribution, with undirected arcs connecting some factors with some states. What is important to emphasize here is that one cannot in general define a probability distribution and a utility distribution on the same set. In effect, when one carves up the world into a set of elements, one usually makes a choice: these elements can be additive in probability (in which case they are called 'states') or additive in utility (in which case they are called 'factors'), but not both. However, given two such different sets of elements, one can define a third set by taking the Cartesian product of the first two. The elements in this induced set are additive in both probability and utility, but do not typically correspond to an intuitive concept.

4 Defining subjective senses of conditional utility and utility independence
Let us now reconsider the notions of conditional utility and utility independence, in the context of utility distributions. Since utility distributions are MAU functions, we can apply (any of) the standard notions of conditioning and independence to them. However, unlike arbitrary MAU functions, utility distributions also allow us to define the subjective versions of these two notions, discussed in the introduction.

Let's start with conditional utility. In the subjective version of the notion, we interpret $u(x \mid y)$ as "the utility of x, given that u(y) = 1, or, that all the utility is derived from the factor set y." This can be explained intuitively through an example. Consider a person whose entire value system is based on owning any or all of three cars, a Rolls Royce, a Maserati, and a Ford. We define three corresponding utility factors, with the $k_i$ defined by u(r) = […], u(m) = […], and u(f) = […]. Thus, for example, if the person owned all three cars he would derive […] of his utility from the Ford; this can be thought of as the prior utility of a Ford. But what would the contribution of the Ford to utility be if it is learned that the person derives no pleasure from British-made cars? By direct analogy with probabilities, we define subjective conditional utility by

$$u(x \mid y) = \frac{u(x \cap y)}{u(y)}$$

This gives us, in particular, that the utility of a Ford for a British-car-hater is $u(f \mid fm) = u(f \cap fm)/u(fm) = u(f)/u(fm)$ = […].

Subjective utility independence is defined similarly; factor set x is said to be utility independent of y iff $u(x \mid y) = u(x)$. The intuition behind this property is that the relative contribution of the factor set x does not change if we learn that the entire contribution to joy lies within the set y. Thus in the car example, a Ford is not utility independent of non-British cars, since $u(f)$ = […] $\ne$ […] = $u(f \mid fm)$. However, if we add a Toyota and modify the $k_i$ as follows (factor, utility): […], then it's easy to verify that the utility of the class of cars made in English-speaking countries is independent of the utility of European-made cars: $u(fr \mid rm) = u(fr)$ = […].

5 Utility networks
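Before turning to networks, the subjective notions of conditional utility and utility independence over factor sets can be sketched as follows. The weights are illustrative assumptions chosen only to exhibit one dependent and one independent case, and the function names are hypothetical.

```python
from fractions import Fraction as F

def set_utility(dist, factors):
    """Utility of a set of factors: the sum of the individual weights."""
    return sum(dist[f] for f in factors)

def conditional_utility(dist, x, y):
    """Subjective conditional utility u(x | y) = u(x & y) / u(y)."""
    x, y = frozenset(x), frozenset(y)
    denom = set_utility(dist, y)
    if denom == 0:
        raise ZeroDivisionError("cannot condition on a factor set of utility zero")
    return set_utility(dist, x & y) / denom

def is_independent(dist, x, y):
    """x is (subjectively) utility independent of y iff u(x | y) == u(x)."""
    return conditional_utility(dist, x, y) == set_utility(dist, x)

# Three-car example with assumed weights summing to 1.
cars3 = {"rolls": F(1, 2), "maserati": F(3, 10), "ford": F(1, 5)}
non_british = {"ford", "maserati"}
print(conditional_utility(cars3, {"ford"}, non_british))
print(is_independent(cars3, {"ford"}, non_british))

# Four-car example with assumed equal weights.
cars4 = {c: F(1, 4) for c in ("rolls", "maserati", "ford", "toyota")}
english_speaking = {"ford", "rolls"}
european = {"rolls", "maserati"}
print(is_independent(cars4, english_speaking, european))
```

With the assumed three-car weights the Ford's share rises from 1/5 to 2/5 once all utility is known to come from non-British cars, so the Ford is not independent of that set. With four equally weighted cars, the English-speaking set has utility 1/2 both before and after conditioning on the European set, so the two are independent.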
Our goal at the outset was to investigate the possibility of endowing utilities with the properties of probabilities, so that the benefit of Bayesian networks can transfer to them. We've now achieved this, so it might be argued that the rest of the story is anti-climactic. Since we now have notions of utility distribution, conditional utility, and utility independence that have exactly the mathematical properties of their probabilistic counterparts, we can, it might be argued, go ahead and use Bayesian networks to represent and reason about utilities. (Of course, the term Bayesian networks should now be replaced by something more appropriate. We might use the terms utility network, or u-net, when we are using the Bayesian-network-like structure for utilities, and p-net in the case of probabilities.)

Left at this, however, this might be deemed little more than a formal exercise. There are at least two sources of potential complaint. The first is that utility factors might not correspond to anything intuitive, and hence could not be used in practice. The second is that Bayesian networks have proven useful because the structure of such networks reflects causality; it is our intuitive grasp of causal relations in the world that allows us to construct and understand Bayesian networks. What's to help us make intuitive sense of utility networks?

Let me address these two potential complaints in turn. I don't know whether utility factors will in general turn out to be intuitive or not; I think we don't have enough experience to pronounce judgement on this. However, we do not need to reason about factors directly. This is exactly analogous to probabilistic reasoning. Bayesian networks do not represent individual world states; in any realistic domain, these would be impossibly complex for any human to comprehend. Rather, each variable represents an event, a set of states, which abstracts away from all but a few aspects of the world. The events "it rained," "the lawn was watered," and "the pavement is wet," are examples of such abstractions. We will do the same with factors, and reason only about sets of factors, such as "having money," "being admired by a loved one," and "owning a motorcycle."

If these sets of factors seem indistinguishable from events, this is no accident. While in general the set of individual states and the set of individual factors are disjoint, certain sets of states and sets of factors might be co-extensive. In [7] I discuss how for any utility function defined on states one can construct a dual factor space such that the utility of each original state is the sum of some subset of factors in the dual space. In general, every event (set of states) defines a set of factors, and every set of factors defines an event. When a given set of factors and a given event define each other, they are co-extensive. Detailed discussion of this is beyond the scope of the paper. Hopefully the examples to come will give more intuition, and in particular clarify the general point about not having to reason about individual factors.

But here's the most important point. We shouldn't fret over whether nodes in a network are events, sets of factors, or both. The critical question is the interpretation we give to the links in the network, that is, to the relationships between the nodes.

Which brings us to the second issue, causality. Although I'm in general suspicious about cavalier uses of causal terms in AI, and in particular about the purely causal interpretation of Bayesian networks, there is no denying that most of the Bayesian networks actually produced in fact correspond to intuitive notions of causality, typified by the standard example:

[Figure: the standard Bayesian network example, with nodes for the earthquake, the burglary, the house alarm, John calling, and Mary calling.]

The intuitive interpretation of this example is that either an earthquake or a burglary can trigger your house alarm, which in turn can cause both of your neighbors, John and Mary, to call you at work. Indeed, it is hard to imagine how one would come up with this network without appealing to causality. Similarly, given the causal interpretation, it is intuitively clear why John's calling you is probabilistically independent of there being an earthquake given that the house alarm went off.

Can any concept play the role of causality in utility networks? I think the answer is yes, and that the concept is teleology, a form of "mental causality."
Consider two factor sets, "love of art" and "visiting the San Francisco Museum of Modern Art (SFMOMA)." The utility one places on Art determines, at least in part, the utility of SFMOMA. In other words, SFMOMA is desired inasmuch as it contributes to satisfying the Art desire. It seems to me that this familiar utilitarian notion serves just the purpose needed here. Consider a more elaborate example:

[Figure: utility network for the example; legible node labels include GS-PD, Art, SFMOMA, and Dirt Bike Mag.]
The way to interpret this picture is as follows. The person whose utility is modeled has two basic motivators: love of BMW GS-PD motorcycles, and love of art. In service of the GS-PD motivator, the person places a certain value on owning one of these bikes, and on reading Dirt Bike magazine, which covers dual-purpose motorcycles such as the GS-PD. The Art motivator leads him to desire to go to SFMOMA, as well as to own an original de Kooning. Both the consideration of owning a GS-PD and that of owning the de Kooning lead the person to place a certain value on money. The graph structure induces various independence conditions.
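Independence conditions of this kind can be read off a graph mechanically. The sketch below rests on two assumptions that the text suggests but does not spell out: that links run from each motivator to the desires it gives rise to, and that u-nets inherit the d-separation criterion of Bayesian networks. The node names and the parent lists are my reading of the description above, and the test used is the standard one on the moralized ancestral graph.

```python
from itertools import combinations

# Assumed structure: each node maps to the list of its parents.
PARENTS = {
    "GS-PD": [],
    "Art": [],
    "Own GS-PD": ["GS-PD"],
    "Dirt Bike Mag": ["GS-PD"],
    "SFMOMA": ["Art"],
    "de Kooning": ["Art"],
    "Money": ["Own GS-PD", "de Kooning"],
}

def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def d_separated(parents, x, y, given):
    """True iff x and y are d-separated by `given` in the DAG."""
    keep = ancestors(parents, {x, y} | set(given))
    # Moralize the ancestral subgraph: link each child to its parents,
    # link co-parents to each other, and drop edge directions.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Search from x without passing through conditioned nodes.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        n = stack.pop()
        if n in seen or n in blocked:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return y not in seen

print(d_separated(PARENTS, "Money", "GS-PD", {"Own GS-PD"}))
print(d_separated(PARENTS, "Money", "GS-PD", set()))
print(d_separated(PARENTS, "GS-PD", "Art", {"Money"}))
```

Under these assumptions, Money is separated from GS-PD once Own GS-PD is given, but not otherwise. The two motivators are separated a priori and become connected when Money is given, the analogue of "explaining away" in a Bayesian network.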
For example, the utility of money given that the person wishes to purchase a GS-PD is independent of his love of motorcycles. This is presumably a natural and familiar pattern. Indeed, it is a causal pattern, but one must be careful about the nature of this causal relation. Consider again the link between Art and SFMOMA. There are at least three causal connections one might be tempted to identify here. First, satisfying the SFMOMA desire will cause higher satisfaction of the desire for Art. Second, the desire for Art will cause the person to desire to visit SFMOMA. Third, the desire for Art will cause the person to actually visit SFMOMA. In utility networks, the links capture the first two kinds of causality, but not the third. Utility networks do not speak about what will be the case, only about a person's mental state. Indeed, whether the person will actually visit the museum might be determined by factors that are unrelated to the person's preference structure, just as even the most intense interest in the GS-PD will not necessarily result in the person buying one. This is not to say that mental state doesn't impact reality, only that utility networks don't capture this fact.

Although this lies beyond the scope of this paper, let me add a few words on reasoning simultaneously about probabilities and utilities, in a structure we might call a bi-network.
I've mentioned already that in [7] I define a bi-distribution, which couples a probability distribution (or p-distribution) with a utility distribution (or u-distribution). We can represent a bi-distribution by a pair of networks, a Bayesian network (or p-net) and a utility network (or u-net). If the two were unrelated that wouldn't be interesting, but in fact a bi-distribution includes a set of undirected edges between nodes in the two distributions, which can be used to induce utilities on the p-net and probabilities on the u-net. In a simple version of bi-networks, computation in each net will proceed independently; in particular, we can condition the two nets independently of one another. Just as influence diagrams use utility nodes merely to compute values resulting from the probabilistic conditioning, in simple bi-networks links will be used merely to compute expected-utility values resulting from both probabilistic and utilitarian conditioning. A more ambitious version of bi-networks will allow "hybrid" forms of conditioning, in which the p-net and the u-net share nodes, and, more interestingly, one can condition probabilities on utilities and vice versa. However, this remains an avenue for future investigation.

6 Summary
The ideas described in this paper are part of a continuing inquiry into the role of choice theory in AI, and the questioning of some established assumptions in choice theory. The main messages synthesized so far as a result of this inquiry are as follows:

• There is no reason for the traditional asymmetries between probabilities and utilities. In particular, utilities too can enjoy distributions. This is the main focus of [7], and was discussed here only partially.
• There exists a sense of conditional utility that is different from the classical one, and utility distributions provide a way to define it. The same is true of utility independence. This has been the primary focus of this paper. For this reason, the related work discussed throughout the paper is mostly that which pertains to utility independence.
• The interpretation of classical results in decision theory, and in particular the von Neumann and Morgenstern representation theorem, is open to question. This is discussed in [7] but not here.
• Utility networks can do for utilities what Bayesian networks do for probabilities, with the concept of teleology replacing that of causality. This was discussed in the previous section.
• The new perspective suggests a structure, called a bi-distribution, in which probability and utility distributions live side by side, and which can be used to compute expected utilities. This too is discussed in [7] but only mentioned here, with a brief discussion of how it suggests the notion of bi-networks as a generalization of both Bayesian networks and utility networks.

Acknowledgements
I have discussed my ideas on utilities with many people. For the material in this article, I thank in particular participants in the AAAI Spring Symposium on Qualitative Decision Theory, Adam Grove, and several anonymous and very careful referees. Which is not to suggest that any of the above endorse the ideas expressed herein.

References
[1] F. Bacchus and A. Grove. Graphical models for preference and utility. In Proc. Eleventh Conference on Uncertainty in Artificial Intelligence.
[2] J. Doyle and M. P. Wellman. Defining preferences as ceteris paribus comparatives. In Proc. AAAI Spring Symp. on Qualitative Decision Making, pages 69-75, 1995.
[3] P. C. Fishburn. Utility Theory for Decision Making. John Wiley & Sons, Inc.
[4] H. Katsuno and A. O. Mendelzon. On the difference between updating a knowledge base and revising it. In Proc. Second Conference on Knowledge Representation and Reasoning, Boston, MA.
[5] R. H. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley & Sons, Inc.
[6] R. D. Shachter. Evaluating influence diagrams. In G. Shafer and J. Pearl, editors, Readings in Uncertain Reasoning, pages 79-90. Morgan Kaufmann Publishers, 1990.
[7] Y. Shoham. A symmetric view of probabilities and utilities. In Proc. of IJCAI-97 (to appear).