Choice with Endogenous Categorization∗

ANDREW ELLIS† AND YUSUFCAN MASATLIOGLU§

Abstract.
We propose a novel categorical thinking model (CTM) where the framing of the decision problem affects how the agent categorizes each product, and the product’s category affects her evaluation of the product. We show that a number of prominent models of salience, status quo bias, loss-aversion, inequality aversion, and present bias all fit under the umbrella of CTM. This suggests categorization as an underlying mechanism for key departures from the neoclassical model of choice and an account for diverse sets of evidence that are anomalous from its perspective. We specialize CTM to provide a behavioral foundation for the salient thinking model of Bordalo et al. [2013], highlighting its strong predictions and distinctions from other existing models.
Date: March, 2020.

∗ We thank David Dillenberger, Erik Eyster, Nicola Gennaioli, Matt Levy, Collin Raymond, the anonymous referees, Andrei Shleifer, Rani Spiegler, Tomasz Strzalecki, and conference/seminar participants at BRIC 2017, CETC 2017, SAET 2017, Lisbon Meetings 2017, ESSET 2019, Brown, UPenn, Pompeu Fabra, and Harvard for helpful comments and discussions. This project began at ESSET Gerzensee, whose hospitality is gratefully acknowledged.
† Department of Economics, London School of Economics, Houghton Street, London, WC2A 2AE. Email: [email protected].
§ University of Maryland, 3147E Tydings Hall, 7343 Preinkert Dr., College Park, MD 20742. E-mail: [email protected].

1. Introduction
Psychologists have long held that knowledge about our environment is organized into categories, and that this categorization plays a key role in decision making. Categorization has been used by both humans and animals for thousands of years. As Ashby & Maddox [2005] write, “All organisms assign objects and events in the environment to separate classes or categories... Any species lacking this ability would quickly become extinct.”
Categorization plays a key role in a number of important anomalies for the neoclassical model of choice. Attributes categorized as losses get higher weight relative to those categorized as gains [Tversky & Kahneman, 1991]. An object’s most salient attribute plays a disproportionate role in the agent’s subsequent evaluation [Bordalo et al., 2013]. Subjects avoid objects they categorize as not-obviously-better-than the status quo [Masatlioglu & Ok, 2005]. Agents are less patient when deciding between dated rewards in the short term than in the long term [Strotz, 1955]. Allocations among members of society are evaluated according to whether inequities are advantageous or disadvantageous [Fehr & Schmidt, 1999].

This paper proposes and axiomatizes a simple model of the role that categorization plays in economic decisions. In the Categorical Thinking Model (CTM), a decision maker (DM) first groups objects together into categories, consciously or unconsciously, then evaluates each object through the lens of the category to which it belongs. The model has two key features motivated by psychological evidence. First, categorization is context-dependent, as summarized by a reference point that may depend on the choice set. Second, how an object is categorized affects its valuation. Prominent models of loss-aversion, salience, status quo bias, present bias, and inequality aversion all fit under the umbrella of CTM. Hence, CTM suggests categorization as an underlying explanation for many key departures from the neoclassical model in many different decision-making environments.
To make our results comparable with previous work, we begin by assuming that a family of reference-dependent preference relations describes the DM’s choices for each reference point. Each alternative has a pair of observed attributes, such as price and quality, height and weight, or size and timing of a reward. In CTM, the context in which the decision takes place determines a reference point, which in turn divides the alternatives into categories. Each category has its own utility function, and within a given category, the DM evaluates the options according to it. Hence, the DM makes different trade-offs between the attributes when they are differentially categorized. We show that the DM conforms to CTM if and only if she behaves as a standard DM when comparing objects categorized the same way. That is, her choices satisfy some standard axioms, such as acyclicity, and do not depend on the reference point when restricted to alternatives that belong to the same category.

CTM is a parsimonious approach to incorporating psychological evidence into economics. Psychological factors determine how each alternative is perceived, which CTM captures through different categories. Moreover, they predict how being categorized in a particular way affects the DM’s choice, which CTM captures through the category’s utility function. For instance, salience and loss-aversion make distinct predictions about when a DM puts higher weight on a dimension. The most salient attribute gets more weight, as does an attribute classified as a loss. Our result shows that CTM closes the model by requiring that the DM acts consistently within the alternatives categorized the same way.

Despite its generality, CTM makes testable predictions and excludes certain types of modeling choices. For instance, a number of models capture salience effects, including the salient thinking model [Bordalo et al., 2013] (BGS), Kőszegi & Szeidl [2013], Bhatia & Golman [2013], Gabaix [2014], and Bushong et al. [2015]. Of these models, only BGS is a CTM. In other words, even the most general version of BGS excludes these models, so BGS offers a different method of modeling salience. Our results highlight trade-offs between the different modeling approaches. For instance, BGS maintains a stronger consistency condition across reference points than does the constant loss aversion model of Tversky & Kahneman [1991], but the latter, unlike BGS, satisfies Monotonicity across regions.

We then provide the first complete characterization of the observable choice behavior equivalent to the BGS model, clarifying and identifying the nature of the assumptions used in the model. The first crucial step towards understanding the model is getting a handle on its novel salience function, which determines which attribute stands out for a given reference point. We study the salience function based on a simple observation: while it influences which attribute is salient, the weight given to each attribute is independent of its magnitude. This makes BGS a special case of CTM, so our earlier results allow a characterization.

One key feature of BGS is that the reference point is endogenously determined by the set of available options. Since the salience of each alternative depends on the reference point, varying the budget set affects the salience of, and so the DM’s evaluation of, a given alternative. Our final contribution addresses this challenge by extending our characterization of CTM to the setting where the reference point is endogenous. Our primitive is a choice correspondence describing the DM’s choices. The menu maps to a reference point, such as the average level of each attribute over alternatives in the set. As long as the reference point varies systematically with the choice problem, we characterize the properties of the choice correspondence equivalent to CTM. Specifically, we show that if the DM’s choices obey the natural analogs of our earlier axioms, then CTM rationalizes her behavior.
We apply this result to provide a completely endogenous characterization of the BGS function.

The paper proceeds as follows. The next subsection provides a brief overview of the relevant psychology literature on categorization. Section 2 introduces CTM and discusses the models covered under its umbrella. Section 3 axiomatizes CTM and compares and contrasts the models of riskless choice discussed in Section 2. Section 4 contains our analysis of the salient thinking model. Section 5 introduces the endogenous reference point setting, and applies our axiomatizations of CTM to it. Section 6 concludes with a discussion of related literature.

1.1. Psychology of Categorization.
There is a long literature in psychology and marketing discussing categorization. Recent review articles include Ashby & Maddox [2005], Loken [2006], Loken et al. [2008] and Cosmides & Tooby [2013]. Much of the literature focuses on how categories are formed, and when new alternatives are added into existing categories. CTM relies on several properties documented by this literature.

First, categories are context dependent. Tversky [1977] and Tversky & Gati [1978] present evidence that replacing one item in a set of objects can drastically alter how people categorize the remaining objects. Tversky & Gati [1978] argue that categorization “is generally not invariant with respect to changes in context or frame of reference.” For example, they show that subjects put East Germany and West Germany into the same category when the salient feature is geography or cultural background, but categorize the two differently if political system is salient. Similarly, Choi & Kim [2016] posit that, depending on the context, an Apple Watch can be categorized as a tech product, a fashion product, a fitness product, or a simple watch. Ratneshwar & Shocker [1991] show that subjects categorize ice cream and cookies together in terms of similarity (e.g. they are both desserts), but categorize ice cream and hot dogs together in terms of usage benefit (e.g. both are good snacks to have at the pool). Stewart et al. [2002] present evidence that relative magnitude information, derived from a comparison with the reference point, is used in the categorization of sounds.

Second, how an object is categorized affects its final valuation. In a classic series of experiments, Rosch [1975] shows that differently categorized but physically identical stimuli are perceptually encoded as distinct objects. Wanke et al. [1999] demonstrate that “wine” is evaluated more positively when categorized with “lobster” than with “cigarettes.” Mogilner et al. [2008] show that categorizing goods differently resulted in different reported satisfaction. Chernev [2011] shows that bundling a healthy food item with a junk food item reduced the reported caloric content beyond that of the junk food alone.

Finally, categories take the form of regions in the alternative space. This tracks very closely with decision bound theory in psychology. As Ashby & Maddox [2005, p. 152] describe, it posits that the subject “partition[s] the stimulus space into response regions... determines which region the percept is in, and then emits the associated response.” Ashby & Gott [1988] show it can accommodate examples incompatible with other theories of category formation, such as prototype theory.
Moreover, there is substantial experimental support for it, including Ashby & Waldron [1999], Anderson [1991], and Love et al. [2004].

2. Model
To aid in comparison with the existing literature and to separate the effects of reference point formation, we follow Tversky & Kahneman [1991] by taking as given a family of reference-dependent preference relations. We assume that the space of alternatives is $X = \mathbb{R}^n_{++}$, focusing on $n = 2$ when not otherwise noted. We often use the convention of writing $x$ as $(x_i, x_{-i})$, with $x_{-i}$ denoting the components of $x$ other than $i$. The next subsections explore three different interpretations of $X$ in different contexts: as a riskless object with different attributes, as a dated reward or consumption stream, and as an allocation of consumption across individuals. For each reference point $r \in X$, the DM maximizes a complete and transitive preference relation, denoted by $\succsim_r$, over $X$. As usual, $\succ_r$ denotes strict preference and $\sim_r$ indifference. The primitive of the model is a family of such preferences indexed by the set of reference points, $\{\succsim_r\}_{r \in X}$. In this section, we assume that the reference point is exogenously given. We relax this assumption in Section 5 to allow endogenous reference point formation.

[Footnote: We note when there is a distinction between general $n$ and $n = 2$. Theorem 5 and the results that rely on it use the full structure of $\mathbb{R}^n_{++}$. The remaining results all generalize to any $X$ that is a finite Cartesian product of open, linearly ordered, separable, connected sets endowed with the order topology, where $X$ itself has the product topology.]

2.1. Categorical Thinking Model.
The first ingredient of the model is a mapping from the reference $r$ to categories. Each category corresponds to a different psychological treatment and changes as the reference changes. We allow the categories to have a very general structure.

Definition 1. A vector-valued function $K = (K^1, K^2, \dots, K^m)$ is a category function if each $K^k : X \rightrightarrows X$ satisfies the following properties:
(1) $K^k(r)$ is a non-empty, regular open set, and $\mathrm{cl}(K^k(r))$ is connected,
(2) $\bigcup_{k=1}^{m} K^k(r)$ is dense,
(3) $K^k(r) \cap K^l(r) = \emptyset$ for all $k \neq l$, and
(4) $K^k(\cdot)$ is continuous.

[Footnote to (1): Recall that a set $A$ is regular open if $A = \mathrm{int}(\mathrm{cl}(A))$.]
[Footnote to (4): That is, each $K^k$ is both upper and lower hemicontinuous when viewed as a correspondence.]

Categories arise from the psychology of the phenomenon to be modeled. For CTM to be applicable, the psychology must make an unambiguous prediction about which alternatives are affected. For instance, with gain-loss utility, alternatives that dominate the reference point are treated differently than those better in only one dimension. Similarly, with present bias, alternatives that pay off sooner than the reference are categorized together. While we take the categories as given, if the psychology only makes partial predictions, then the categorization of other alternatives can often be inferred from choice. Proposition 1 does so for the salient thinking model.

We interpret the properties as follows. Every category contains some alternative for every reference point. If a particular product, say $x$, belongs to the category $k$, then so do all products that are close enough to $x$. There is a path that stays within the category between any two points, so categories cannot be the union of “islands.” Almost every alternative is in at least one category, and none are in two categories. Finally, if the reference point does not change too much, then neither do the categories.

The consumer values each good in a way that depends not only on the product itself, as in the standard neoclassical model, but also on the category to which the product belongs. When alternatives $x$ and $y$ are both categorized in category $k$, the category utility function $U^k : X \to \mathbb{R}$ represents the DM’s choices.
That is, she prefers $x$ to $y$ if and only if $U^k(x) \ge U^k(y)$. We focus on the effect of categorization on distorting trade-offs, so we require that a category utility function is additively separable and monotonic: $U^k(x) = \sum_{i=1}^{n} U^k_i(x_i)$, where each $U^k_i(\cdot)$ is strictly monotone and continuous. The utility index $U^k_i$ represents the DM’s preferences over dimension $i$ when an alternative belongs to the category $k$.

[Footnote on monotonicity: That is, $U^k_i$ is either strictly increasing on $\mathbb{R}_+$ or strictly decreasing on $\mathbb{R}_+$.]

When alternatives belong to different categories, the reference point may affect the DM’s choice. If the alternative $x$ lies in the category $k$ when the reference is $r$, that is, $x \in K^k(r)$, then the value of consumption $x$ is represented by $U^k(x|r)$. However, the reference does not affect the utility trade-off within a category. To capture this, we require that $U^k(\cdot|r)$ agrees with $U^k$, in the sense that it is an increasing transformation thereof. Then, $U^k(x|r) \ge U^k(y|r)$ if and only if $U^k(x|r') \ge U^k(y|r')$ for any references $r, r' \in X$. We can now formally define the model as follows.

Definition 2. The family $\{\succsim_r\}_{r \in X}$ conforms to the Categorical Thinking Model (CTM) under category function $K = (K^1, K^2, \dots, K^m)$ if for each category $k$ there is a category utility function $U^k$ so that when $x \in K^k(r)$ and $y \in K^l(r)$ for some $r$,
\[
x \succsim_r y \iff U^k(x|r) \ge U^l(y|r),
\]
and $U^k(\cdot|r)$ is an increasing transformation of $U^k(\cdot)$ for each $r \in X$ and category $k$.

A CTM is increasing if $U^k_i$ is increasing in $x_i$ for every category $k$ and dimension $i$. We also consider two sub-classes. A CTM is affine if $U^k(\cdot|r)$ is an affine transformation of $U^k$ for each $r$. A CTM is strong if $U^k(\cdot|r) = U^k(\cdot)$ for each $r$. Most of the models we discuss below are affine CTM, and those of riskless consumer choice are all increasing.

2.2. Riskless Consumer Choice.
In this subsection, we consider our primary application: riskless consumer choice. The four models are introduced formally, and each is shown to be CTM. Figure 1 plots their indifference curves and categories, with darker lines indicating higher utility.

[Figure 1. CTM for Riskless Choice. Panels: BGS (upper left), TK (upper right), MO (lower left), Prototype Model (lower right). Axes: Attribute 1 (horizontal), Attribute 2 (vertical). Each panel marks the reference point r; the Prototype Model panel marks the prototypes.]
Salient Thinking Model (BGS): Bordalo et al. [2013] propose an intuitive and descriptive behavioral model based on salience. In the model, an attribute receives more weight when it is salient than when it is not. The magnitude of salience is determined by a salience function, $\sigma : \mathbb{R}_{++} \times \mathbb{R}_{++} \to \mathbb{R}_+$. Given a reference $(r_1, r_2)$, attribute 1 is salient for good $x$ if $\sigma(x_1, r_1) > \sigma(x_2, r_2)$, and attribute 2 is salient for good $x$ if $\sigma(x_1, r_1) < \sigma(x_2, r_2)$. That is, the salient attribute is the one that differs the most from the reference according to the salience function.

[Footnote: We describe the properties of $\sigma$ more fully in Section 4.]

Definition 3. The family $\{\succsim_r\}_{r \in X}$ has a BGS $(\sigma; w^1, w^2, u_1, u_2)$ representation if each $\succsim_r$ is represented by
\[
V_{BGS}(x|r) =
\begin{cases}
w^1_1 u_1(x_1) + w^1_2 u_2(x_2) & \text{if } \sigma(x_1, r_1) > \sigma(x_2, r_2) \\
w^2_1 u_1(x_1) + w^2_2 u_2(x_2) & \text{if } \sigma(x_1, r_1) < \sigma(x_2, r_2)
\end{cases}
\tag{1}
\]
for a salience function $\sigma$, strictly positive weights with $w^1_1 / w^1_2 > w^2_1 / w^2_2$, and each $u_i$ strictly increasing.

To illustrate this model, consider the salience function proposed by BGS:
\[
\sigma(x_k, r_k) = \frac{|x_k - r_k|}{x_k + r_k}.
\]
Based on it, the upper-left panel in Figure 1 illustrates BGS. There are two categories: alternatives that are 1-salient, i.e. $\sigma(x_1, r_1) > \sigma(x_2, r_2)$, and alternatives that are 2-salient, i.e. $\sigma(x_2, r_2) > \sigma(x_1, r_1)$. To visualize them, note that the entire product space is divided into four distinct areas by the two dashed curves that intersect at the reference point. The areas lying to the north and south of the reference point contain the 2-salient products. Similarly, 1-salient products lie east and west of the reference point. The figure incorporates indifference curves as well, holding fixed the reference point. There are two potential sets of indifference curves, illustrated by dotted lines. Depending on the category, one of the two is utilized to determine the DM’s choice. When attribute 1 is salient, the steeper one becomes the indifference curve since it puts higher weight on the first attribute. Conversely, the flatter one is the indifference curve when attribute 2 is salient. We draw two different indifference curves, where the darker color corresponds to higher utility.

Constant Loss Aversion Model (TK):
Tversky & Kahneman [1991] provide foundations for a reference-dependent model that extends Prospect Theory to riskless consumption bundles. Each bundle is evaluated relative to a reference point $r$, and losses loom larger than gains. In the absence of losses, the DM values each alternative with an additive utility function, $u(x_1) - u(r_1) + v(x_2) - v(r_2)$, which attaches equal weight to each attribute. If she experiences a loss in attribute $i$, then she inflates the weight attached to that attribute by $\lambda_i > 1$. There are four different categories in the TK formulation: (i) gain in both dimensions, (ii) gain in the first dimension and loss in the second dimension, (iii) loss in the first dimension and gain in the second dimension, and (iv) loss in both dimensions (see the upper-right panel in Figure 1). We model this as $K^{GL} = (K^1, K^2, K^3, K^4)$ where, following the order of (i) to (iv), $K^1(r) = \{x : x \gg r\}$, $K^2(r) = \{x : x_1 > r_1, x_2 < r_2\}$, $K^3(r) = \{x : x_1 < r_1, x_2 > r_2\}$, and $K^4(r) = \{x : x \ll r\}$.
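As a rough illustration of how the two category functions above partition the attribute space, the following Python sketch assigns an alternative to its BGS category using the salience function $\sigma(x_k, r_k) = |x_k - r_k| / (x_k + r_k)$, and to its TK gain-loss category by comparing it with the reference coordinate by coordinate. The function names, the linear utility indices, and the numerical weights are assumptions made purely for illustration; they are not taken from either paper.

```python
def salience(x_k, r_k):
    """BGS salience of attribute level x_k against reference level r_k."""
    return abs(x_k - r_k) / (x_k + r_k)


def bgs_category(x, r):
    """Return 1 if attribute 1 is salient, 2 if attribute 2 is, None on the boundary."""
    s1, s2 = salience(x[0], r[0]), salience(x[1], r[1])
    if s1 > s2:
        return 1
    if s2 > s1:
        return 2
    return None  # equal salience: outside every (open) category


def bgs_value(x, r, weights=None):
    """Salience-weighted value, assuming linear indices u_i(x_i) = x_i."""
    # Hypothetical weights: weights[k] = (w_1, w_2) used when attribute k is salient.
    # They satisfy w^1_1 / w^1_2 > w^2_1 / w^2_2, as the representation requires.
    if weights is None:
        weights = {1: (0.7, 0.3), 2: (0.3, 0.7)}
    k = bgs_category(x, r)
    if k is None:
        raise ValueError("x lies on a category boundary")
    w1, w2 = weights[k]
    return w1 * x[0] + w2 * x[1]


def tk_category(x, r):
    """Gain-loss category: 1 gain/gain, 2 gain/loss, 3 loss/gain, 4 loss/loss."""
    if x[0] == r[0] or x[1] == r[1]:
        return None  # on a boundary between categories
    gains = (x[0] > r[0], x[1] > r[1])
    return {(True, True): 1, (True, False): 2,
            (False, True): 3, (False, False): 4}[gains]


if __name__ == "__main__":
    r = (1.0, 1.0)
    for x in [(4.0, 1.1), (1.1, 4.0), (0.5, 3.0)]:
        print(x, "BGS category:", bgs_category(x, r), "TK category:", tk_category(x, r))
```

The same alternative can fall into different categories under the two models: with reference (1, 1), the bundle (1.1, 4.0) is 2-salient under this salience function but a gain in both dimensions under the gain-loss partition.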
Status Quo Bias (MO): Masatlioglu & Ok [2005] model individuals who experience some form of psychological discomfort when they have to abandon their status quo option. This discomfort imposes an additional utility cost. Of course, if an alternative is unambiguously superior to the status quo, the DM does not feel any psychological discomfort in forgoing the status quo; in such cases there is no cost. Formally, $Q(r)$ is a closed set denoting the alternatives that are unambiguously superior to the default option $r$ (see the lower-left panel of Figure 1). If an alternative does not belong to this set, then the DM pays a cost $c(r) > 0$, which may depend on the reference point, to move away from the status quo. In this model, there are two categories, $K^{MO} = (K^1, K^2)$, where $K^1(r) = \{x \mid x \in \mathrm{int}(Q(r))\}$ and $K^2(r) = \{x \mid x \notin Q(r)\}$. For any $x \neq r$, we have
\[
V_{MO}(x|r) =
\begin{cases}
u_1(x_1) + u_2(x_2) & \text{if } x \in K^1(r) \\
u_1(x_1) + u_2(x_2) - c(r) & \text{if } x \in K^2(r).
\end{cases}
\]
This is an example of an affine CTM for general $c$, and a strong CTM when $c(r)$ is constant.

Prototype Theory (PT):
Prototype theory was first proposed by Posner & Keele [1970]. According to it, each category is associated with a prototype, its “most typical” member. Initial categorization is determined by comparing each product to each prototype. We now formalize this idea and show that this is CTM.

There are $m$ prototypes, $p^1, \dots, p^m$. The DM categorizes alternatives according to how similar they are to a given prototype. Then, category $K^i(r)$ is the set of alternatives categorized as most similar to prototype $p^i$. Similarity may depend on the reference. There is a family of metrics indexed by $r$ so that $d_r(x, y)$ indicates how far away the DM perceives $x$ to be from $y$ given reference $r$. Formally, $K^P = (K^1, \dots, K^m)$ where
\[
K^i(r) = \{x : i = \arg\min_j d_r(p^j, x)\},
\]
and the DM evaluates alternatives in category $i$ according to
\[
V^i_{PT}(x|r) = U(p^i) + \lambda^i_1 (x_1 - p^i_1) + \lambda^i_2 (x_2 - p^i_2) \quad \text{if } x \in K^i(r),
\]
where $U(\cdot)$ is a hedonic utility function and $\lambda^i_j > 0$. A particularly interesting specification is where $\lambda^i_j = \frac{\partial}{\partial p^i_j} U(p^i)$. Then, the DM approximates the utility of $x$ according to a first-order Taylor expansion around the prototype most similar to it (see the lower-right panel of Figure 1). This is an example of a strong CTM.

2.3. Time Preferences.
We apply our model to choices of dated rewards. The pair $(x, t)$ represents a payment of $x$ at time $t$. Motivated by present bias, we propose a model where the DM divides time periods into the short term and the long term. Given a reference $r = (r_x, r_t)$, rewards arriving before $r_t$ are perceived as short-term and those arriving after $r_t$ as long-term. Hence $K^{short}(r) = \{(x, t) \mid t < r_t\}$ and $K^{long}(r) = \{(x, t) \mid t > r_t\}$. The utility function is
\[
V_{QH}((x, t) \mid (r_x, r_t)) =
\begin{cases}
(\beta\delta)^{t} u(x) & \text{if } (x, t) \in K^{short}(r) \\
\beta^{r_t} \delta^{t} u(x) & \text{if } (x, t) \in K^{long}(r)
\end{cases}
\]
where $0 < \delta < 1$ and $0 < \beta \le 1$. The model is additively separable after taking logs, so it is a special case of CTM. It exhibits present bias when $\beta < 1$: there exist values $y > x > 0$ such that $(x, \tau) \succsim_r (y, \tau + 1)$ if and only if $\tau < r_t - 1$. Figure 2 plots its indifference curves.

[Footnote to the Prototype Theory panel of Figure 1: In the figure, we use $d_r(p^j, x) = d(p^j, x) / d(p^j, r)$, where $d$ is the Euclidean metric.]

[Figure 2. CTM for Dated Rewards. Axes: Money and Time. The reference point r separates the regions $K^{short}(r)$ and $K^{long}(r)$.]

2.4. Social preferences.
Our final application is to consumption allocations. Alternatives assign consumption to each of $n$ agents, labeled $1, \dots, n$. Dimension 1 corresponds to the DM’s own consumption. We consider inequality-averse and social-welfare concerned DMs.

Inequality Aversion: Fehr & Schmidt [1999] introduce a model of inequality aversion. The DM experiences envy, i.e. she dislikes having a lower allocation than another, and guilt, i.e. she prefers others not to have less consumption than her. We present a generalization of their model where a reference point affects how much envy or guilt the DM feels. Envy and guilt are generated by the difference between how much better off agent $i$ is relative to $i$’s reference consumption and how much better off the DM is relative to her own reference. Some reference-dependence makes sense when considering decisions that impact the social alternative: the DM may not experience much guilt if agent 2’s consumption is low in every feasible allocation.

[Footnote to the present-bias example in Section 2.3: For instance, $u(x) = 1$ and $u(y) = (\beta\delta)^{-1}$.]

In the Relative Inequality Aversion (RIA) model, the DM $i$ feels guilty if her own relative gain $(x_i - r_i)$ is higher than the relative gain of individual $j$, $(x_j - r_j)$. Otherwise, individual $i$ is jealous of individual $j$ since $x_j - r_j > x_i - r_i$. Given that the reference point is $r$, the value function of the DM equals
\[
V^{RIA}_i(x|r) = x_i - \frac{\alpha}{n-1} \sum_{j \neq i} \max\{(x_j - r_j) - (x_i - r_i), 0\} - \frac{\beta}{n-1} \sum_{j \neq i} \max\{(x_i - r_i) - (x_j - r_j), 0\}.
\]
Observe that when $r_i = r_j$ for all $i$ and $j$, the utility function reduces to that of Fehr & Schmidt [1999]. Throughout, we follow them in assuming that $\alpha \ge \beta \ge 0$ and $\beta < 1$. With two individuals and the DM as individual 1, the categories are $K^{RIA} = (K^1_R, K^2_R)$ where
\[
K^1_R(r) = \{x \in X : x_1 - r_1 > x_2 - r_2\} \quad \text{and} \quad K^2_R(r) = \{x \in X : x_1 - r_1 < x_2 - r_2\}.
\]
The set $K^j_R(r)$ contains allocations where individual $j$ gets a relatively better deal than the other. The relative inequality aversion model can be written as
\[
V^{RIA}(x|r) =
\begin{cases}
x_1 - \alpha\,[(x_2 - r_2) - (x_1 - r_1)] & \text{if } x \in K^2_R(r) \\
x_1 - \beta\,[(x_1 - r_1) - (x_2 - r_2)] & \text{if } x \in K^1_R(r)
\end{cases}
\]
which is an affine CTM.

Distributional Preferences:
Charness & Rabin [2002] propose a model of social preferences where utility is increasing with the minimum of all individuals’ payoffs and the total of all individuals’ payoffs. The DM $i$ maximizes
\[
V_i(x) = (1 - \lambda)\, x_i + \lambda \Big[ \delta \min\{x_1, x_2, \dots, x_n\} + (1 - \delta) \sum_k x_k \Big].
\]
The parameter $\delta \in (0, 1)$ measures the degree of concern for helping the worst-off individual (Rawlsian) versus maximizing the total social payoffs (Utilitarian). The parameter $\lambda \in (0, 1)$ measures how the DM balances social welfare with her own utility, with the limiting case $\lambda = 0$ capturing pure self-interest.

[Figure 3. Left: Relative Inequality Aversion. Right: Reference-Dependent Distributional Preferences. Axes: Individual 1 (horizontal), Individual 2 (vertical). Each panel marks the reference point r and the two categories.]

We propose a natural extension of their model with an exogenously given reference point. We call this model Reference-Dependent Distributional Preferences. That is,
\[
V^{CR}_i(x|r) = (1 - \lambda)\, x_i + \lambda \Big[ \delta \min\{x_1 - r_1, x_2 - r_2, \dots, x_n - r_n\} + (1 - \delta) \sum_k x_k \Big].
\]
According to this model, each individual cares about maximizing the minimum relative payoff $x_j - r_j$. Note that if $r_i = r_j$ for all $i$ and $j$, the model of Charness & Rabin [2002] obtains as a special case.

We show that this model is CTM. To do that, we first define categories for this model. Each category corresponds to the individual who has the worst relative payoff. In this case, $K^{CR} = (K^1, \dots, K^n)$ where
\[
K^j(r) = \{x \in X : (x_j - r_j) \text{ is the minimum of } \{x_1 - r_1, x_2 - r_2, \dots, x_n - r_n\}\},
\]
and
\[
V^{CR}_i(x|r) = (1 - \lambda)\, x_i + \lambda \Big[ \delta (x_j - r_j) + (1 - \delta) \sum_k x_k \Big] \quad \text{if } x \in K^j(r),
\]
showing that the model is an affine CTM.

3. Behavioral Foundation for CTM
In this section, we provide a set of behavioral postulates characterizing increasing CTM. These postulates represent the key features of the model. We show that they hold if and only if the data is representable by increasing CTM, rendering the model behaviorally testable. In subsequent subsections, we explore the various strengthenings of the model and provide axiomatizations of these as well.

For each category $k$, define the revealed ranking within that category, $\succsim^k$, so that $x \succsim^k y$ if and only if there exists $r$ such that $x, y \in K^k(r)$ and $x \succsim_r y$. The sub-relations $\succ^k$ and $\sim^k$ are defined in the usual way. The ranking $\succsim^k$ captures preference within category $k$. The following axiom states that the within-category revealed preference has no cycles.

Axiom 1 (Weak Reference Irrelevance). The relation $\succsim^k$ is acyclic. That is, if $x^1 \succsim^k x^2 \succsim^k \cdots \succsim^k x^m$, then $x^m \not\succ^k x^1$.

Weak Reference Irrelevance ensures that the DM reacts consistently to alternatives when they are categorized the same way. That is, the categories reflect the DM’s psychological treatment of the alternative. Although she may have choice cycles, these cycles occur only when the context changes how the DM categorizes alternatives. Since $\succsim^k$ is acyclic, we can take its transitive closure to derive full comparisons. Let $\succsim^{k*}$ be its transitive closure, with $\succ^{k*}$ and $\sim^{k*}$ the asymmetric and symmetric parts.

Within a category, preference has an additive structure. The next axiom implies that each $\succsim_r$ satisfies Cancellation when restricted to a given category.

Axiom 2 (Category Cancellation). For all $x_1, y_1, z_1, x_2, y_2, z_2 \in \mathbb{R}_+$, $r \in X$, and category $j$ so that $(x_1, z_2), (z_1, y_2), (z_1, x_2), (y_1, z_2), (x_1, x_2), (y_1, y_2) \in K^j(r)$: if $(x_1, z_2) \succsim_r (z_1, y_2)$ and $(z_1, x_2) \succsim_r (y_1, z_2)$, then $(x_1, x_2) \succsim_r (y_1, y_2)$.

Category Cancellation adapts the well-known Cancellation axiom to our setting, differing in its requirement that the alternatives belong to the same category. Without the qualifiers on how alternatives are categorized, the axiom is a well-known necessary condition for an additive representation that appears in Krantz et al. [1971] and Tversky & Kahneman [1991], among others. If $X$ has strictly more than two dimensions, then we can replace it with the analog of P2 [Savage, 1954]; see Debreu [1959].

The next axiom requires that Monotonicity holds between objects categorized the same way.
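Returning briefly to Category Cancellation, the following short derivation may help show why it is necessary for an additive within-category representation. It is a sketch under the assumption that all six alternatives lie in $K^j(r)$ and that $\succsim_r$ is represented on $K^j(r)$ by $U^j(x) = U^j_1(x_1) + U^j_2(x_2)$, or by an increasing transformation of it, as in an increasing CTM.

```latex
\begin{align*}
(x_1, z_2) \succsim_r (z_1, y_2)
  &\iff U^j_1(x_1) + U^j_2(z_2) \ge U^j_1(z_1) + U^j_2(y_2), \\
(z_1, x_2) \succsim_r (y_1, z_2)
  &\iff U^j_1(z_1) + U^j_2(x_2) \ge U^j_1(y_1) + U^j_2(z_2).
\end{align*}
% Add the two inequalities and cancel U^j_1(z_1) + U^j_2(z_2) from both sides:
\[
U^j_1(x_1) + U^j_2(x_2) \ge U^j_1(y_1) + U^j_2(y_2)
\iff (x_1, x_2) \succsim_r (y_1, y_2).
\]
```

The common terms in $z_1$ and $z_2$ cancel only because the same category utility applies to all six alternatives, which is why the axiom restricts attention to a single category.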
Axiom 3 (Category Monotonicity (CM)). For any $x, y, r \in X$: if $x \ge y$ and $x \neq y$, then it is not the case that $y \succsim^{k*} x$ for any category $k$; in particular, if $x, y \in K^k(r)$, then $x \succ_r y$.

Since both attributes are “goods” as opposed to “bads,” Monotonicity means that if a product $x$ contains more of some or all attributes, but no less of any, than another product $y$, then $x$ is preferred to $y$. The postulate requires that choice respects Monotonicity for alternatives within the same category. However, it does not require that this comparison holds when the goods belong to different categories, and we shall see later that salience can distort comparisons enough to cause Monotonicity violations.

[Footnote on the analog of P2 mentioned after Axiom 2: Formally, for any $x, y, x', y' \in K^k(r)$ and subset of indexes $E$, if $x_i = x'_i$ and $y_i = y'_i$ for $i \in E$, $x_i = y_i$ and $x'_i = y'_i$ for all $i \notin E$, and $x \succsim_r y$, then $x' \succsim_r y'$. This is implied by Category Monotonicity when $n = 2$, so a stronger condition is necessary.]

Finally, the family of preference relations is suitably continuous.

Axiom 4 (Category Continuity). For any $r \in X$ and any $x \in \bigcup_i K^i(r)$, the sets
\[
UC_j(x) = \{y \in K^j(r) : y \succ_r x\} \quad \text{and} \quad LC_j(x) = \{y \in K^j(r) : x \succ_r y\}
\]
are open. Moreover, the set
\[
\Big\{ x \in \bigcup_i K^i(r) : UC_j(x) \cup LC_j(x) = K^j(r) \text{ and } UC_j(x) \neq K^j(r) \text{ and } LC_j(x) \neq K^j(r) \Big\}
\]
has an empty interior.

Category Continuity adapts the usual continuity condition to apply only within a category. It says that when $y$ is preferred to $x$ in a given context and $y'$ is close enough to $y$, then $y'$ is also preferred to $x$, provided that $y'$ belongs to the same category as $y$. The final condition requires that if an alternative $x$ is neither better than everything within category $j$ nor worse than everything within category $j$, then there exists something in category $j$ that is as good as $x$, or as good as something arbitrarily close to $x$.
For such an x , the category must intersect almost all indifferencecurves close to x ’s since each category is almost connected.Finally, we make a structural assumption. Assumption (Structure) . The category function K is such that for any category k , thefollowing sets are connected: E k = S r ∈ X K k ( r ), { x ∈ E k : x i = s } for all dimensions i and scalars s , and { y ∈ E k : x ∼ k ∗ y } for all x ∈ E k .The Structure Assumption is satisfied all the models we discussed in the previoussection. Indeed, E k = R n ++ for every category in these models. These conditionsestablish that the objects categorized in the same way have enough topological structureso that “local” properties can be extended to global ones. Chateauneuf & Wakker [1993]show that the structure assumption, applied to a single preference relation and domain,is needed to guarantee that a local additive representation implies a global one. Theorem 1.
Assume the Structure Assumption holds. The family $\{\succsim_r\}_{r \in X}$ satisfies Weak Reference Irrelevance, Category Cancellation, Category Monotonicity, and Category Continuity for $K$ if and only if it conforms to increasing CTM under $K$.

Increasing CTM captures the behavior implied by the axioms, so we call Axioms 1-4 the CTM axioms. Taken together, they establish that the DM acts rationally when restricting attention to alternatives categorized in the same way for a given reference point. That is, CTM captures a DM who differs from the neoclassical model only when alternatives are categorized differently. The theorem reveals that a number of other reference-dependent models that have been studied in the literature fall outside the scope of our analysis. For instance, Bhatia & Golman [2013], Munro & Sugden [2003], the non-constant loss averse version of Tversky & Kahneman [1991], and the continuous version of the salient thinking model (see online appendix of Bordalo et al. [2013]) all violate Weak Reference Irrelevance for any specification of the category function. We provide the details in Appendix A.6.

We provide a brief outline of how the proof works; all omitted proofs can be found in the appendix. The axioms are sufficient for a “local” additive representation of $\succsim_r$ (and thus $\succsim_k$) on an open ball around each alternative within category $k$. The Structure Assumption allows us to apply Theorem 2.2 of Chateauneuf & Wakker [1993] to aggregate the local additive representation of $\succsim_k$ into a global one. To do so, we must establish that the global preference is complete, transitive, monotone, and continuous. We establish these properties for preference within each category by showing that the transitive closure of each $\succsim_k$ is complete and suitably continuous. The remainder of the proof shows that Category Continuity allows us to stitch the different within-category representations together into an overall utility function.

3.1. Reweighting.
In all of the models discussed in Section 2.2, the DM evaluates the difference between alternatives categorized in the same way similarly. That is, regardless of the category, the DM agrees on how much better one value is than another in dimension $i$. Categorization affects only how much weight she puts on each dimension. This is captured by the following axiom.

Axiom 5 (Reference Interlocking). For any $a, b, a', b', x', y', x, y \in X$ and categories $k, j$ with $x_{-i} = a_{-i}$, $y_{-i} = b_{-i}$, $x'_{-i} = a'_{-i}$, $y'_{-i} = b'_{-i}$, $x_i = x'_i$, $y_i = y'_i$, $a_i = a'_i$, $b_i = b'_i$: if $x \sim_k y$, $a \succsim_k b$, and $x' \sim_j y'$, then it does not hold that $b' \succ_j a'$.

The term “Reference Interlocking” comes from Tversky & Kahneman [1991]. If each $\succsim_k$ is complete, then their statement of it is equivalent given the other axioms. Roughly, the DM agrees on the difference in utilities along a given dimension regardless of how an alternative is categorized. To interpret, observe that the first pair of comparisons reveals that the difference between $a_i$ and $b_i$ weakly exceeds that between $x_i$ and $y_i$ when the alternatives belong to category $k$. For alternatives categorized in $j$, the DM should not reveal the opposite ranking. We defer to the above paper for a detailed discussion.

Theorem 2.
Suppose that $\{\succsim_r\}_{r \in X}$ conforms to increasing CTM under $K$ and each $E_k$ is connected. For each dimension $i$, there exist a utility index $u_i$ and a weight $w_i^k > 0$ for each category $k$ so that each category utility $U^k$ is cardinally equivalent to one that maps each $x \in E_k$ to $\sum_i w_i^k u_i(x_i)$ if and only if Reference Interlocking holds.

All of the models in Section 2.2 satisfy the axiom, and are thus special cases of increasing CTM satisfying Reference Interlocking. For instance, differences in the salient dimension of BGS receive higher weight, but the relative size of two given differences in the same dimension is the same regardless of whether both are salient or both are not. The axiom implies that the utility index within each category must be the same, up to an increasing, affine transformation.

3.2.
Behavioral Foundation for Affine CTM.
In this section, we explore when an affine CTM exists. That is, when is $U^k(\cdot | r)$ a positive affine transformation of $U^k(\cdot | r')$ for any $r, r'$? All of the models from Section 2.2 fall into this class.

Footnote: For MO, this is true only when $c(r) < \infty$.

Unsurprisingly, the key restriction relative to CTM is that tradeoffs across categories are affine. As is usual, this is captured by a form of linearity, or the “Independence Axiom.” We require it to hold only when the alternatives combined belong to the same category, and adjust for the curvature of the utility index.

To state the key axiom, we define an operation $\oplus^k$ along similar lines as Ghirardato et al. [2003]. For $x, y \in \mathbb{R}$ and a category $k$, $\tfrac{1}{2} x \oplus_i^k \tfrac{1}{2} y = z$ when there exist $a, b$ such that $(x_i, a_{-i}) \sim_{k^*} (z_i, b_{-i})$ and $(z_i, a_{-i}) \sim_{k^*} (y_i, b_{-i})$. If $\succsim_k$ has an additive representation, then the two indifferences give $U_i^k(z) - U_i^k(x) = U_i^k(y) - U_i^k(z)$, that is, $\tfrac{1}{2} U_i^k(x) + \tfrac{1}{2} U_i^k(y) = U_i^k(z)$. Define $\oplus^k$ similarly for alternatives: $\tfrac{1}{2} x \oplus^k \tfrac{1}{2} y = z$ if and only if $z_i = \tfrac{1}{2} x_i \oplus_i^k \tfrac{1}{2} y_i$ for each dimension $i$. Finally, define $\alpha x \oplus^k (1 - \alpha) y$ by taking limits. We note that if $U_i^k$ is linear, then $\alpha x \oplus_i^k (1 - \alpha) y = \alpha x + (1 - \alpha) y$.

Axiom 6 (Affine Across Categories (AAC)). For any $r \in X$, $x, x', \alpha x \oplus^j (1 - \alpha) x' \in K_j(r)$, and $y, y', \alpha y \oplus^k (1 - \alpha) y' \in K_k(r)$: if $x \succsim_r y$ and $x' \succsim_r y'$, then $\alpha x \oplus^j (1 - \alpha) x' \succsim_r \alpha y \oplus^k (1 - \alpha) y'$.

This axiom is a natural adaptation of the linearity axiom, a close relative of the independence axiom. If we strengthened Affine Across Categories to be stated using the traditional linearity condition, then we would obtain a representation where each $U^k(\cdot | r)$ is itself an affine function. Otherwise, it requires that the $\oplus^k$ operation preserves indifference.

The second axiom deals with a technical issue.

Axiom 7 (Unbounded).
For any $r \in X$: if $K_k(r)$ contains a sequence $x_n$ so that $U^k(x_n) \to \infty$ ($-\infty$), then for any $x \in X$ there exists $x^* \in K_k(r)$ so that $x^* \succ_r x$ ($x \succ_r x^*$).

We note that $U^k$ is unique up to a positive affine transformation. Hence whenever the utility of some sequence goes to infinity for some representation of $\succsim_k$, it must also go to infinity for any other representation. While the axiom can be stated in terms of primitives, we instead state it in terms of the $U^k$. It ensures that a category containing alternatives whose utility goes to positive (negative) infinity must contain an alternative better (worse) than any other given alternative. If it failed, then no affine transformation of the category utility would represent the preference.

Footnote: In general, $\alpha x \oplus^k (1 - \alpha) y$ need not exist. However, it does exist “locally,” which is all we require in the proof. That is, if $x \in K_k(r)$, then there exists an open set $O$ with $x \in O$ on which $\alpha y \oplus^k (1 - \alpha) z$ exists for every $\alpha \in [0, 1]$ and $y, z \in O$.
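The $\oplus^k$ operation used in Affine Across Categories can be illustrated numerically. The following is a minimal sketch, assuming a single dimension with a known, strictly increasing utility index; the function name `oplus` and the logarithmic index are hypothetical choices for illustration, not objects from the paper. It reads the operation as a mixture in utility space, which is what the two defining indifference conditions imply under an additive representation.

```python
import math


def oplus(x, y, u, u_inv, alpha=0.5):
    """Utility-space mixture: the z with u(z) = alpha*u(x) + (1 - alpha)*u(y).

    u and u_inv are an assumed utility index and its inverse (hypothetical inputs).
    """
    return u_inv(alpha * u(x) + (1 - alpha) * u(y))


# Hypothetical curved index: u = log, so the midpoint is the geometric mean.
x, y = 2.0, 8.0
z = oplus(x, y, math.log, math.exp)

# The two indifference conditions require equal utility gaps on either side of z.
gap_below = math.log(z) - math.log(x)
gap_above = math.log(y) - math.log(z)

# With a linear index the operation reduces to the ordinary convex combination.
z_linear = oplus(x, y, lambda t: t, lambda t: t, alpha=0.25)

print(z, gap_below, gap_above, z_linear)
```

With the logarithmic index the midpoint of 2 and 8 is close to 4 rather than 5, which is the sense in which the operation adjusts for the curvature of the utility index.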
Theorem 3.
Assume the Structure Assumption holds. Then, $\{\succsim_r\}_{r \in X}$ satisfies the CTM axioms, Affine Across Categories, and Unbounded for $K$ if and only if it conforms to Affine Increasing CTM under $K$.

All the models discussed in Section 2 fall into the class of Affine CTM, so the result reveals the behavior all have in common. Relative to CTM, Affine Across Categories imposes stronger requirements on how the DM relates alternatives in different categories. Not only does the DM evaluate utility within a category using an additive function, but the additive structure persists across categories. Moreover, this aids with interpreting utility differences. If every pair of categories contains alternatives indifferent to one another, the entire representation is unique up to a common positive affine transformation. We call the combination of Axioms 1-4 and 6-7 the Affine CTM axioms.

3.3.
Behavioral Foundation for Strong CTM.
For a strong CTM, changing the reference point does not reverse the ranking of two products unless it also changes their categorization. The following axiom imposes this.
Axiom 8 (Reference Irrelevance). For any $x, y, r, r' \in X$: if $x \in K_k(r) \cap K_k(r')$ and $y \in K_l(r) \cap K_l(r')$, then $x \succsim_r y$ if and only if $x \succsim_{r'} y$.

Footnote (on the Unbounded axiom): The statement in terms of primitives involves standard sequences and does not reveal key aspects of behavior, so we instead present the simpler and easier to interpret one above. In special cases, this is easy to do. For instance, if $U^k$ is linear, then the axiom simply states that if $K_k(r)$ is an unbounded set, then the conclusion of the above axiom holds.

For the general CTM, the reference point influences choice through two channels: the category to which an alternative belongs and its valuation within that category. The axiom eliminates the latter. When comparing two alternatives across different reference points, the DM’s relative ranking does not change when neither’s category changes. This property greatly limits the effect of the reference point. In fact, a sufficiently small change in the reference never leads to a preference reversal.
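Reference Irrelevance can be checked directly in a toy two-attribute model. The sketch below is illustrative only: the salience function $|a - b| / (a + b)$, the linear utility indices, the weight `w = 0.7`, and all numerical bundles are assumptions chosen for the example, not values taken from the paper. The reference point enters the valuation only through the category, so the ranking of two bundles can change only when a category changes.

```python
def sigma(a, b):
    """Hypothetical salience function: symmetric and homogeneous of degree zero."""
    return abs(a - b) / (a + b)


def category(x, r):
    """Return 1 or 2 for the more salient attribute, or None when salience ties."""
    s1, s2 = sigma(x[0], r[0]), sigma(x[1], r[1])
    if s1 > s2:
        return 1
    if s2 > s1:
        return 2
    return None


def value(x, r, w=0.7):
    """Category-dependent value: r matters only through the category of x."""
    k = category(x, r)
    if k is None:
        raise ValueError("x is uncategorized at r")
    w1, w2 = (w, 1 - w) if k == 1 else (1 - w, w)
    return w1 * x[0] + w2 * x[1]


def weakly_prefers(x, y, r):
    return value(x, r) >= value(y, r)


x, y = (6.0, 2.0), (3.0, 4.0)
r_a, r_b = (3.0, 2.5), (3.2, 2.4)   # small change: categories unchanged
r_c = (6.0, 1.0)                    # large change: x switches category

same_categories = (category(x, r_a) == category(x, r_b)
                   and category(y, r_a) == category(y, r_b))
print(same_categories, weakly_prefers(x, y, r_a), weakly_prefers(x, y, r_b))
print(category(x, r_c), weakly_prefers(x, y, r_c))
```

Moving the reference from `r_a` to `r_b` leaves both categories, and hence the ranking, unchanged, while `r_c` recategorizes `x` and reverses the ranking. This is the only channel for reversals that the axiom permits.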
Theorem 4.
Assume the Structure Assumption holds and for any categories $i, j$ and any $r \in X$, there exist $x \in K_i(r)$ and $y \in K_j(r)$ with $x \sim_r y$. Then, $\{\succsim_r\}_{r \in X}$ satisfies the Affine CTM axioms and Reference Irrelevance for $K$ if and only if it conforms to Strong, Increasing CTM under $K$.

Since BGS, MO, and PT are Strong CTM, Theorem 4 characterizes the behavior they have in common. While the reference plays a role in categorization, it plays no role in choice after categorization is taken into account. TK, which belongs to affine CTM but not strong CTM, must therefore violate Reference Irrelevance.

3.4.
Comparing Models of Riskless Choice.
TK, BGS, MO, PT, and the neoclassical model all conform to Affine CTM, so Theorems 1 and 3 describe the behavior that they have in common. However, the analysis so far, as well as the functional forms of the models, leaves open the question of what behavior distinguishes them. Of course, they differ in how alternatives are categorized, but the models also reflect distinct behavior within and across categories.

In addition to Reference Irrelevance, they are distinguished by whether they satisfy two classic axioms: Monotonicity and Cancellation, the unrestricted versions of Category Monotonicity and Category Cancellation. The first requires that a dominant bundle is chosen, and the latter that an additive structure obtains. The representation theorem of Tversky & Kahneman [1991] imposes those two axioms in addition to continuity. In Appendix A.8, we show that an affine CTM with a Gain-Loss category function satisfies the two classic axioms and continuity if and only if it has a TK representation. We provide a detailed examination of the BGS model in Section 4.

Footnote: The formal statements are obtained by dropping the requirement in those two axioms that the alternatives belong to the same category.

Table 1 compares the four models in terms of Reference Irrelevance, Monotonicity, and Cancellation, when BGS, TK, MO, and PT do not coincide with the neoclassical model. Only the neoclassical model satisfies all conditions; none of the other four do. On the one hand, BGS and PT satisfy Reference Irrelevance but violate Monotonicity and Cancellation. On the other, TK maintains Monotonicity and Cancellation but violates Reference Irrelevance. Finally, MO satisfies all but Cancellation.

                         Neoclassical   BGS   TK   MO   PT
CTM                           ✓          ✓    ✓    ✓    ✓
Monotonicity                  ✓          ✗    ✓    ✓    ✗
Reference Irrelevance         ✓          ✓    ✗    ✓    ✓
Cancellation                  ✓          ✗    ✓    ✗    ✗

Table 1. Comparisons of Models

We provide a plausible example violating the Cancellation axiom, and hence behavior inconsistent with TK. Then, we illustrate that BGS can accommodate this example even without requiring a shift in the reference point. While the example is one simple test to distinguish BGS from TK, it is also powerful as it works for a fixed reference point.
Example 1.
Consider a consumer who visits the same wine bar regularly. The bartender occasionally offers promotions. The customer prefers to pay $8 for a glass of French Syrah rather than $2 for a glass of Australian Shiraz. At the same time, she prefers to pay $2 for a bottle of water rather than $10 for the glass of French Syrah. However, without any promotion in the store, she prefers paying $10 for Australian Shiraz to paying $8 for water. These choices violate Cancellation: under any additive representation of quality and price, adding up the three strict preferences gives the same total utility on both sides, which is impossible.

Footnote: Propositions 2 and 5 give the ✓’s of the table for BGS and TK. It is routine to verify that MO satisfies Monotonicity and Reference Irrelevance and that PT satisfies Reference Irrelevance. We provide examples showing the other properties are violated in Appendix A.5.

Footnote: Whenever $c(r) = c(r')$ for every $r, r' \in X$.

The behavior in this example is both intuitively and formally consistent with the salient thinking model of BGS. Without any promotion, the consumer expects to pay a high price for a relatively low quality selection. When choosing between Syrah and Shiraz, the consumer focuses on the French wine’s sublime quality, and she is willing to pay at least $6 more for it. When choosing between water and Syrah, the low price of water stands out and she reveals that the gap between wine and water is less than $8. However, when there is no promotion, she focuses again on the quality, and she is willing to pay an additional $2 for even her less-preferred Australian Shiraz over water. Notice that this explanation does not require that the reference points are different. Since the consumer visits this bar regularly, intuitively, her reference point should be fixed and stable.

3.5.
Non-increasing CTM.
For simplicity, we have so far focused on increasing CTM. This is a desirable feature in consumer choice, but models of social preference often violate this property. For instance, an inequality-averse individual 1 prefers to increase the allocation to individual 2 from $x$ to $y$ when she feels guilty but not when she is envious. However, she always prefers increasing the allocation to 2 in an allocation categorized as guilty, and decreasing it in any categorized as envious. This contradicts Category Monotonicity, suggesting the following weakening.

Axiom (Consistent Preference within Category, CPC). For each category $k$, there exists a set of attributes $P_k$ so that if $x_j \geq y_j$ for all $j \in P_k$, $y_i \geq x_i$ for all $i \notin P_k$, and $x \neq y$, then $y \not\succsim_{k^*} x$.

The set $P_k$ contains the attributes for which an increase positively affects the DM’s evaluation. CPC requires that the set of positive attributes in a category does not depend on the reference point. For the two-person-RIA model, the set for the “guilty” category is $\{1, 2\}$ since she strictly prefers increasing everyone’s allocation, but the set for the “envious” one is $\{1\}$: she prefers more for herself but dislikes others having even more. Note that CM is the special case of CPC where $P_k$ includes every dimension for every category.

Footnote (to Example 1): Implicitly, the example reveals that the quality of French Syrah is higher than that of Australian Shiraz, which is in turn higher than that of water. The numerical value of quality assigned to each beverage is irrelevant to the violation of Cancellation. For examples of qualities so that choice can be represented by the BGS model, one can calculate that $(-8, q_{fs}) \succ_r (-2, q_{as})$, $(-2, q_w) \succ_r (-10, q_{fs})$, and $(-10, q_{as}) \succ_r (-8, q_w)$ for $q_{fs} = 8$, $q_{as} = 6.$…, $q_w = 5.1$, and the reference point $r = \big(\tfrac{1}{2}(-10 + -8), \tfrac{1}{2}(q_w + q_{as})\big)$ when $w = 0.$…

A CTM is characterized by all the properties of an increasing CTM, except that CM is replaced by CPC. The proof is a straightforward generalization of the earlier one, so it is omitted.

4. BGS Model and Categories
The BGS model is intuitive, tractable, and accounts for a number of empirical anomalies for the neoclassical model of choice. Despite its popularity, it can be difficult to understand all of the implications of the BGS model. Its new components are unobservable, and its functional form rather involved.

The first crucial step towards understanding the model is getting a handle on the novel salience function that determines which attribute stands out for a given reference point. While one can work out the implications of a particular salience function, this exercise is not fruitful since the particular function that applies to a given agent is unobservable. Moreover, it is not clear how the model changes when the underlying salience function changes.

CTM provides a lens through which we can study the salience function. While it influences which attribute is salient, the weight given to each attribute is independent of its magnitude. Therefore, its role is simply to divide the domain into distinct categories, each associated with a particular attribute being most salient. We study the salience function by focusing on the properties of the categories it generates.

Categories are generated by a function $s : \mathbb{R}_{++} \times \mathbb{R}_{++} \to \mathbb{R}_+$ if $x \in K_i(r)$ if and only if $s(x_i, r_i) > s(x_j, r_j)$ for all $j \neq i$. In the BGS model, categories are generated by a salience function $\sigma$ that must satisfy the following properties. First, it increases in contrast, i.e. for $\epsilon > 0$ and $a > b$, $\sigma(a + \epsilon, b) > \sigma(a, b)$ and $\sigma(a, b - \epsilon) > \sigma(a, b)$. Second, it is continuous in both arguments. Third, it is symmetric, i.e. $\sigma(a, b) = \sigma(b, a)$. Two other properties are sometimes assumed: $\sigma$ is Homogeneous of Degree Zero (HOD) if for all $\alpha > 0$, $\sigma(\alpha a, \alpha b) = \sigma(a, b)$, and $\sigma$ has diminishing sensitivity if for all $\epsilon > 0$ and $a, b > 0$, $\sigma(a + \epsilon, b + \epsilon) \leq \sigma(a, b)$. Finally, we always impose that the salience function is grounded: $\sigma(r, r) = \sigma(r', r')$ for all $r, r' \in X$.
This is an implication of HOD satisfied by all of the specifications of which we are aware in the literature, and is a necessary condition for an attribute to be salient only if it differs from the reference.

Footnote (on diminishing sensitivity): BGS require this inequality to hold strictly. However, this is not a desirable property. If $\sigma$ is HOD as they assume, then $\sigma(r, r) = \sigma(\alpha r, \alpha r) = \sigma(r + \epsilon, r + \epsilon)$ for $\alpha > 1$ and $\epsilon = (\alpha - 1) r$, violating their definition of diminishing sensitivity.

Consider the following properties of categories.

S0: (Basic) For any $r \in X$: $K_1(r) \cap K_2(r) = \emptyset$, $K_1(r) \cup K_2(r)$ is dense in $X$, $K_1, K_2$ are continuous at $r$, and $K_1(r), K_2(r)$ are regular open sets.

S1: (Moderation) For any $\lambda \in [0, 1]$ and $r \in X$: if $x \in K_k(r)$, $y_k = x_k$, and $y_{-k} = \lambda x_{-k} + (1 - \lambda) r_{-k}$, then $y \in K_k(r)$.

S2: (Symmetry) If $(a, b) \in K_k(c, d)$, then $(c, d) \in K_k(a, b)$ and $(b, a) \in K_{-k}(d, c)$.

S3: (Transitivity) If $(a_1, a_2) \notin K_2(r_1, r_2)$ and $(a_2, a_3) \notin K_2(r_2, r_3)$, then $(a_1, a_3) \notin K_2(r_1, r_3)$.

S4: (Difference) For any $x, y, z$ with $y \neq z$, $(x, y) \in K_2(x, z)$ and $(y, x) \in K_1(z, x)$.

S5: (Diminishing Sensitivity) For any $x, y, r_1, r_2$, and $\epsilon > 0$, if $(x, y) \notin K_1(r_1, r_2)$, then $(x + \epsilon, y) \notin K_1(r_1 + \epsilon, r_2)$.

S6: (Equal Salience) For any $x, r \in X$: if $\frac{x_1}{r_1} = \frac{x_2}{r_2}$ or $\frac{x_1}{r_1} = \frac{r_2}{x_2}$, then $x \notin K_k(r)$ for $k = 1, 2$.

S0 holds by definition; we include it for completeness. S1 indicates that making a bundle’s less salient attribute closer to the reference point does not change the salience of the bundle. That is, when $x$ and $y$ differ only in attribute $l$, and $y$ is closer to the reference in that attribute, if $x$ is $k$-salient, then so is $y$. S2 requires that the same ranking is used for each attribute. S3 adapts transitivity to the salience ranking. It says that if $a_1$ stands out more relative to $r_1$ than $a_2$ does to $r_2$, and $a_2$ stands out more relative to $r_2$ than $a_3$ does to $r_3$, then $a_1$ stands out more relative to $r_1$ than $a_3$ does to $r_3$. S4 says simply that any difference stands out more than no difference. S5 implies that increasing both the good and the reference by the same amount in the same dimension does not move the good from one category to another. S6 reads that if every attribute of $x$ differs from the reference point by the same percentage, then none of the attributes stands out. More formally, if the percentage difference between $x_k$ and $r_k$ is the same across attributes, then $x$ is not $k$-salient for any $k \in \{1, 2\}$.

Figure 4. Properties S0-S6 Illustrated. Panels: S0-S3 but not S4-S6; S0-S4 but not S5-S6; S0-S5 but not S6; S0-S6.

Figure 4 provides examples satisfying some but not all of the properties. The functions that generate them, as well as a verification that they satisfy the claimed properties, can be found in Example 4 in the Appendix.

Theorem 5.
The category function satisfies:
(1) S0-S4 if and only if there exists a salience function $\sigma$ that generates it;
(2) S0-S5 if and only if the $\sigma$ that generates it has diminishing sensitivity; and
(3) S0, S1, and S6 if and only if it satisfies S0-S6 if and only if the $\sigma$ that generates it is HOD.
Any HOD salience function generates the same categories.

This theorem provides a characterization for BGS’s salience function. It translates the functional form assumptions on the salience function into properties of the salience categories. The most common specification of the salience function, HOD, satisfies all of the above properties. Surprisingly, the result shows that there is a unique category function satisfying these properties. Hence, any two HOD salience functions lead to exactly the same behavior.

Footnote: Theorem 5 relies on the full structure of $\mathbb{R}$ for the last two results, as noted in Footnote 1. Diminishing sensitivity and Homogeneity are both cardinal properties, and so are undefined without the cardinal structure on $X$. Properties S0-S4 are defined. Subsequent results that rely on Theorem 5, such as Propositions 2 and 3, remain true when imposing only S0-S4 in this setting.

We now turn to the question of identifying the salience function from choice behavior alone. That is, given that we observe a family $\{\succsim_r\}_{r \in X}$, can we identify which alternatives have what salience?

Proposition 1. Suppose that $\{\succsim_r\}_{r \in X}$ has a BGS representation. Then, the weights, utility indices, and salience function are uniquely identified from $\{\succsim_r\}_{r \in X}$.

The proof provides an algorithm for this in general. We illustrate for the case where $u_1$ and $u_2$ are linear. Fixing a reference point $r$, any alternative that differs only in dimension $i$ from $r$ must be $i$-salient. Hence, we can identify the weights on dimensions within each category from the slope of the indifference curve passing through that alternative. Now, we can test whether $y$ is 1-salient by seeing if the indifference curves close to it are those generated by the weights for 1-salient alternatives. Varying $y$ and $r$ allows identification of the salience function, and hence the categories.

In addition to the particular form of categories, BGS satisfies several properties that distinguish it from other CTMs. The most general of these is Reference Irrelevance, above, making BGS a strong CTM. The other follows.

Axiom 9 (Salient Dimension Overweighted, SDO). For any $x, y, r, r' \in X$: if $x, y \in K_k(r) \cap K_l(r')$, $x \succsim_r y$, $x_l > y_l$, and $y_k > x_k$, then $x \succ_{r'} y$.

This axiom requires that categories correspond to the dimension that gets the most weight. That is, the DM is more willing to choose an alternative whose “best” attribute is $k$ when it is $k$-salient. To illustrate, consider alternatives $x, y$ with $x_1 > y_1$ and $y_2 > x_2$. Because $x$ is relatively strong in attribute 1, $x$ should benefit more than $y$ from a focus on it. If $x$ is chosen over $y$ when attribute 2 stands out for both, then this advantage in the first dimension is so strong that even a focus on the other one does not offset it. Hence, the DM should surely choose $x$ over $y$ when attribute 1 stands out for both.

Proposition 2.
Assume that there exist $x \in K_k(r)$ and $y \in K_j(r)$ with $x \sim_r y$ for any categories $k, j$ and any $r \in X$. Then, the family $\{\succsim_r\}_{r \in X}$ satisfies the Affine CTM axioms, Reference Interlocking, Reference Irrelevance, and Salient Dimension Overweighted for a category function $K$ satisfying S0-S5 if and only if it has a BGS representation where $\sigma$ has diminishing sensitivity.

Footnote: The assumption that alternatives indifferent to each other exist in each category for each reference point is not strictly necessary. A sufficient condition for the representation to imply it is that the utility indexes are both unbounded above (or below).

This result characterizes the BGS model. It also provides guidance for comparing it with other models in the CTM class (see Figure 1 and Table 1). By outlining the model’s testable implications, the result provides guidance on how to design experiments to test it. In their 2013 paper, BGS focus on a special case where the model is linear: $w_1^1 = w_2^2 = 1 - w_2^1 = 1 - w_1^2 > \tfrac{1}{2}$ and $u_1(x) = u_2(x) = x$. In an earlier version of this paper, we show this model is characterized by strengthening Affine Across Categories to require linearity and imposing a reflection axiom that requires permuting two alternatives and the reference point in the same way not to reverse the DM’s choice between the two.

Footnote: Formally, the first is that Affine Across Categories holds with $\oplus^k$ replaced by the usual $+$ operation. The second is that $(a, b) \succsim_{(r_1, r_2)} (c, d)$ if and only if $(b, a) \succsim_{(r_2, r_1)} (d, c)$. One can verify that these additional assumptions imply that the ancillary assumption about indifference holds.

Taken together, Propositions 1 and 2 provide an outline for a fully subjective axiomatization of a family of preferences with a BGS representation. Proposition 1 shows that we can reveal a category function from the family of preferences, provided they have a representation. We check whether these revealed categories exist and satisfy S0-S5. If so, then the axioms shown necessary by the second result apply with this revealed category function.

5. Choice Correspondence
In this section, the modeler observes only the DM’s choice from a finite subset of choices and nothing more. A model consists of both a theory of reference formation and a theory of choice given categorization. In this setting, we can jointly test the theory of choice given categorization, categorization given reference, and reference formation.

We model reference formation via a reference generator, a map from finite subsets of alternatives to reference points. We denote the reference generator $A : 2^S \setminus \emptyset \to X$, with the interpretation that $A(S)$ is the reference point when the menu is $S$. Examples include the BGS theory that $A(S)$ is the average alternative, that $A(S)$ is the median bundle, that $A(S)$ is the upper (or lower) bound of $S$, and the Köszegi & Rabin [2006] theory that $A(S) = c(S)$. If additional observable data on the choice context is provided, then it is easy to extend our results to $A$ being a function of that as well. For instance, MO theorize that the initial endowment $e$ is observable and that $A(S, e) = e$, and Bordalo et al. [2019] theorize that past histories $h$ of consumption are available and that $A(S, h)$ is the average between the bundles in $S$ and those in $h$.

Fixing a categorization function $K$ and a reference generator $A$, let $\mathcal{X}$ be the set of finite and non-empty subsets of $X$ such that every alternative is categorized. Formally, $S \in \mathcal{X}$ only if $S \subset \bigcup_{i=1}^m K_i(A(S))$. We call these categorized menus, or menus for short. The requirement ensures that each alternative in the choice set belongs to a category given the reference point $A(S)$. We leave open how the DM chooses when alternatives that are uncategorized belong to the choice set. By leaving the choice from this small set of menus ambiguous, we can more clearly state the properties of choice implied by the model.

Footnote: One can, of course, extend the model to account for these choices. For instance, BGS hypothesize that these alternatives are evaluated according to their sum. Complications arise because the set of uncategorized alternatives is “small”: its complement is open and dense, and moreover it has zero measure.

We summarize the DM’s choices by a choice correspondence $c : \mathcal{X} \rightrightarrows X$ with $c(S) \subseteq S$ and $c(S) \neq \emptyset$ for each $S \in \mathcal{X}$. Adapted to this setting, the model has the following representation.

Definition 4.
The choice correspondence $c$ conforms to Strong-CTM under $(K, A)$ if there exists a family of preference relations $\{\succsim_r\}_{r \in X}$ that conforms to Increasing Strong CTM under $K$ so that
$$c(S) = \{ x \in S : x \succsim_{A(S)} y \text{ for all } y \in S \}$$
for every $S \in \mathcal{X}$.

5.1. Reference point formation.
Provided that the reference generator is responsive enough to changes in the menu, it is possible to test the properties that categorization requires of $\succsim_r$. One reference generator with enough structure is the average bundle. A more general sufficient condition is as follows.

Assumption.
A function $A$ is a generalized average if for any $S = \{x^1, \ldots, x^m\} \in \mathcal{X}$:
(i) the function $x \mapsto A([S \setminus \{x^i\}] \cup \{x\})$ is continuous at $x^i$, and
(ii) for any $\epsilon > 0$ and $S' \in \bigcup_i K_i(A(S))$, there exists $S^* \in \mathcal{X}$ so that $S^* \supset S \cup S'$, $d(A(S^*), A(S)) < \epsilon$, and for any $x \in S^* \setminus S'$, $\min_{x' \in S} d(x, x') < \epsilon$.

Examples of generalized average reference include the average bundle
$$A^a(S) = \left( \frac{\sum_{x \in S} x_1}{|S|}, \frac{\sum_{x \in S} x_2}{|S|} \right),$$
the median value of each attribute, and a weighted average
$$A^{wa}(S) = \left( \frac{\sum_{x \in S} w(x) x_1}{\sum_{x \in S} w(x)}, \frac{\sum_{x \in S} w(x) x_2}{\sum_{x \in S} w(x)} \right)$$
for any continuous weight function $w : X \to [a, b]$ with $b > a > 0$. We sometimes impose the additional requirement that $A(S) \in co(S) \setminus ext(S)$ for all non-singleton $S$; if so, we call $A$ a strong generalized average. The first and last of these examples satisfy this property. The supremum and infimum on their own are not weighted averages, nor (necessarily) is the choice acclimating reference generator, $c(S) = A(S)$.

5.2. Behavioral Foundations for Strong-CTM.
We now consider the behavior of a DM who conforms to Strong-CTM for a given category function and reference generator. To do so, we make use of our earlier analysis by revealing how the DM evaluates alternatives categorized in a given way. When $A(S)$ is a generalized average, this provides enough structure to identify enough of the family to apply our earlier analysis.

The main behavioral content comes from the choice correspondence equivalent of Reference Irrelevance. To state it, we introduce the following definition and notation.

Definition 5.
The alternative x in category k is indirectly revealed preferred to al-ternative y in category j , written ( x, k ) (cid:37) R ( y, j ), if there exists finite sequences ofpairs ( x i , S i ) ni =1 such that x = x ∈ K k ( A ( S )), y ∈ K j ( A ( S n )) T S n , and for each i : x i ∈ c ( S i ), x i +1 ∈ S i , and x i +1 ∈ K k i ( A ( S i )) ∩ K k i ( A ( S i +1 )) for some k i .We replace Reference Irrelevance with the following weakening of the Strong Axiomof Revealed Preference (SARP). Axiom (Category SARP) . For any S ∈ X , if ( x, k ) (cid:37) R ( y, j ) , x ∈ K k ( A ( S )) T S , y ∈ K j ( A ( S )) T S , and y ∈ c ( S ) , then x ∈ c ( S ) . We first illustrate in a simple two menu setting, analogous to a test case for theWeak Axiom of Revealed Preference (WARP). Consider two menus S and S and twochosen products x ∈ c ( S ) and x ∈ c ( S ) where both products are categorized in thesame way for both menus. For example, x is in category 1 for both menus, and x is in category 2 for both. The observation x ∈ c ( S ) reveals that the valuation of x is at least as high as that of x when x belongs to the first category and x to thesecond. Since the categorization of products does not change when the menu changes Recall sup S = (max x ∈ S x , max x ∈ S x ) and inf S is defined analogously. from S to S , their relative valuation stays the same as well. Hence, if x is chosenfrom S , then x must be chosen too. Since neither products’ category has changed,the DM should obey WARP for these two menus. However, the axiom leaves open thepossibility of a WARP violation when either is differentially categorized.The axiom extends this logic to sequences of choices in much the same way thatSARP does to WARP. A finite sequence of choices, where the choice from the nextmenu is available in the current one and has the same salience in both, does not leadto a choice reversal. 
Since salience does not change along the sequence of choices, the choices do not exhibit a reversal.

Category SARP limits the effect of unchosen alternatives. Modifying them can alter the DM's choice, but only insofar as changing them changes the reference point and thus the salience of alternatives. It states that these unchosen options do not alter the relative ranking of two alternatives, unless they change the region to which the alternatives belong. That is, when comparing the same two alternatives in different menus, the DM's relative ranking does not change when neither's salience changes. This property greatly limits the effect of the reference point. In fact, a sufficiently small change in the reference never leads to a preference reversal.

The remaining axioms are the natural generalizations to the choice correspondence of Category Cancellation, Category Monotonicity, Category Continuity, Reference Interlocking, and Affine Across Categories. We denote these by appending a “*” to distinguish them from their reference-dependent-preference formulation. Appendix B.1 contains their formal statement.

As before, we require some additional topological structure on the categories. For a category $k$, let
$$E^{R,k} = \{ x \in X : x \in K^k(A(S)),\ \{x\} = c(S) \} \quad \text{and} \quad D^k = \bigcup_{S \in \mathcal{X}} \left\{ K^k(A(S)) \cap S \right\}.$$
The generalization of the structure assumption is as follows.
Assumption (Revealed Structure). For any category $k$, $E^{R,k}$ is open, $E^{R,k}$ is dense in $D^k$, and the following sets are connected: $E^{R,k}$, $\{x \in E^{R,k} : x_j = s\}$ for all dimensions $j$ and scalars $s \in \mathbb{R}$, and $\{y \in E^{R,k} : (x,k) \sim^R (y,k)\}$ for all $x \in E^{R,k}$.

In addition to what was imposed by the Structure Assumption, we require that almost all objects categorized in a category are chosen in some menu. This can be weakened, but is typically satisfied by the models in which we are interested, such as BGS.

We require one last assumption.
Axiom (Comparability Across Regions, CAR). If $x \in E^{R,k}$, then for any $j$ there exists $y \in E^{R,j}$ so that $(x,k) \sim^R (y,j)$.

This is a version of the assumption we made for Strong CTM. It requires that every alternative chosen when it belongs to category $k$ is revealed to be exactly as good as some other alternative categorized in category $j$. With it, we can now state the result.

Theorem 6.
Assume that Revealed Structure and CAR hold and that $A$ is a generalized average. A choice correspondence $c$ conforms to Strong-CTM under $(K, A)$ if and only if $c$ satisfies Category-SARP, Category Monotonicity*, Category Cancellation*, Category Continuity*, and Affine Across Categories*.

The result is the counterpart of Theorem 4 with an endogenous reference point. The behavior corresponding to categorization does not fundamentally change across settings. As long as the DM reacts consistently when alternatives are categorized in the same way, we can represent her choices as categorical thinking where the reference point only affects how she categorizes each alternative. The key challenge in the proof is to establish that the arguments we used to establish our earlier results still hold. We adapt our earlier arguments to show that revealed preference within category $k$ is complete on $E^{R,k}$. This relies on small changes in alternatives not changing choice, a property implied by generalized average. Then, the remaining axioms establish that this within-category preference has an additive representation. CAR allows us to extend across categories.

5.3. Behavioral Foundations for BGS.
In this subsection, we provide a behavioral foundation for BGS. The first step is to show that the Revealed Structure assumption holds.
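As a concrete point of reference for this subsection, here is a minimal sketch of how salience-based categorization can be computed in a two-attribute setting. It assumes the illustrative salience function $\sigma(a,b) = |a-b|/(a+b)$ for positive attribute values and a reference equal to the menu average. Both are assumptions made for this example, as are the function names and the tie-handling rule. The example menu holds attribute 2 fixed across all options, so every option's second attribute coincides with the reference.

```python
def salience(a, b):
    """Illustrative salience function sigma(a, b) = |a - b| / (a + b), for a, b > 0.
    It is homogeneous of degree zero: scaling a and b together leaves it unchanged."""
    return abs(a - b) / (a + b)

def average_reference(menu):
    """Assumed reference generator A(S): coordinate-wise mean of the menu."""
    n = len(menu)
    return tuple(sum(x[d] for x in menu) / n for d in range(2))

def salient_dimension(x, menu):
    """Return 1 or 2 for the more salient attribute of x in the menu.
    Returning 0 on an exact tie is a convention chosen for this sketch."""
    r = average_reference(menu)
    s1, s2 = salience(x[0], r[0]), salience(x[1], r[1])
    if s1 > s2:
        return 1
    if s2 > s1:
        return 2
    return 0

# Every option has the same value in attribute 2, so that attribute equals
# the reference and has zero salience for each option.
menu = [(3.0, 2.0), (1.0, 2.0), (2.5, 2.0)]
print([salient_dimension(x, menu) for x in menu])  # [1, 1, 1]
```

In this menu each option is 1-salient because its second attribute has zero salience while its first attribute differs from the average.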
Lemma 1. If $A$ is a strong generalized average, $K$ satisfies S0, S1, and S4, and $c$ satisfies Category Monotonicity*, then $E^{R,k} = \mathbb{R}^2_{++}$ for $k = 1, 2$.

Given the assumptions we have made so far, every alternative is chosen in some menu when it is $k$-salient. Consequently, the Revealed Structure assumption must hold. The result relies on the observation that the DM categorizes $x$ as 1-salient when all other available options have the same value in dimension 2 as $x$. If $x$ has the highest value in attribute 1 in such a choice set, then it must be chosen.

Now, we can apply Theorem 6 in combination with the insights gained from Proposition 2 to understand the behavioral foundation of the BGS model.

Proposition 3.
Assume that $A$ is a strong generalized average and that CAR holds. The choice correspondence $c$ satisfies Category-SARP, Category Monotonicity*, Category Cancellation*, Category Continuity*, Affine Across Categories*, Reference Interlocking*, and Salient Dimension Overweighted* for a category function $K$ satisfying S0-S5 if and only if $c$ conforms to BGS where $\sigma$ has diminishing sensitivity.

This result lays out the behavioral postulates that characterize the BGS model with endogenous reference point formation. Most importantly, it connects the (unobserved) components of the model to observed choice behavior. Fundamentally, the properties that, by Proposition 2, characterized the model in our first setting still characterize it. To prove this, we note that Theorems 5 and 6 imply that there exists a Strong CTM with categories generated by a salience function. We then establish that choice within the $k$-salient alternatives overweights dimension $k$ by using SDO and the structure of regions.

Finally, we ask whether the choice correspondence with an endogenous reference point provides enough leverage to identify salience.

Proposition 4.
Given that $c$ conforms to BGS with a strong generalized average, the categories are uniquely identified.

As with Propositions 1 and 2, Propositions 3 and 4 provide a roadmap for testing BGS without a known salience function. However, it still requires that the reference generator is a strong generalized average. Consequently, the axioms capture the full testable implication of the model and allow for tight comparisons with other existing work.

6. Related Literature
This paper provides a choice theoretic analysis of categorization. We apply this model to highlight similarities and differences between a number of behavioral models in the literature. As such, it is closely related to the literature which studies how a reference point affects choices (e.g. Tversky & Kahneman [1991], Munro & Sugden [2003], Sugden [2003], Masatlioglu & Ok [2005], Sagi [2006], Salant & Rubinstein [2008], Apesteguia & Ballester [2009], Masatlioglu & Nakajima [2013], Masatlioglu & Ok [2014], Dean, Kıbrıs, & Masatlioglu [2017]). These papers focus on an exogenous reference point, as in Section 3. While TK and MO are examples of CTM, the others are not. Nonetheless, our analysis puts the models on an equal footing so their implications can be compared.

We then extend the model to consider endogenous reference point formation. This adopts the approach of a number of recent papers, e.g. Bodner & Prelec [1994], Kivetz, Netzer, & Srinivasan [2004], Orhun [2009], Bordalo, Gennaioli, & Shleifer [2012], Tserenjigmid [2015]. As in Section 5, the reference point is a function of the context, and is identical for all feasible alternatives. Finally, Köszegi & Rabin [2006], Ok, Ortoleva, & Riella [2015], Freeman [2017] and Kıbrıs et al. [2018] study models where the endogenous reference point is determined by what the agent chooses, but is otherwise independent of the choice set. This represents a very different approach to reference formation, and our approach does not easily generalize to accommodate it.

One of our key contributions is to provide an axiomatization of the salient thinking model. Interpreting salience as arising from differential attention to attributes, CTM has a close relationship with the literature studying how limited attention affects decision making. Masatlioglu et al. [2012] and Manzini & Mariotti [2014] study a DM who has limited attention to the alternatives available. The DM maximizes a fixed preference relation over the consideration set, a subset of the alternatives actually available. In contrast, in CTM the DM considers all available alternatives but maximizes a preference relation distorted by her attention. Caplin & Dean [2015], de Oliveira et al. [2017] and Ellis [2018] study a DM who has limited attention to information.
In contrast to CTM, attention is chosen rationally to maximize ex ante utility, rather than determined by the framing of the decision, and choice varies across states of the world. The most related interpretation considers attributes as payoffs in a fixed state. In addition to choices varying across states, each alternative has the same weights on each attribute, similar to Kőszegi & Szeidl [2013]. Taken together, these results highlight the effects on behavior of different types of attention.

Footnote: Maltz [2017] is the only model of which we are aware that combines an exogenous reference point with endogenous reference-point formation.

While we argue in this paper that a number of prominent behavioral economic models can be thought of as resulting from categorization, few papers in economics explicitly address categorization. Mullainathan [2002] provides a model of belief updating and shows how categorization can generate non-Bayesian effects. Fryer & Jackson [2008] introduce a categorical model of cognition where a decision maker categorizes her past experiences. Since the number of categories is limited, the decision maker must group distinct experiences in the same category. In this model, prediction is based on the prototype from the category which most closely matches the current situation. Finally, Manzini & Mariotti [2012] introduce a two-stage decision-making model. In the first stage, a decision maker eliminates some of the alternatives based on their categories, and in the second stage she maximizes her preference among the alternatives surviving the first stage. Bordalo et al. [2019] provide a model of memory and attention, where the context’s similarity to past consumption opportunities affects the salience of the alternatives currently available.
They show this leads to endogenous categorization of the current opportunity set, and discuss the resulting implications for choice.

The evolutionary psychology literature on categorization suggests a common explanation for the effects shown to be captured by our model of categorization. That literature stresses that categories evolved as cues to apply a particular mental process in a given situation (see e.g. the review by Cosmides & Tooby [2013]). However, these processes are often applied to situations different from their evolutionary purpose. Boyer & Barrett [2015] explain, “The fact that some cognitive system is specialized for a domain D does not entail that it invariably or exclusively handles D, nor does it mean that the specialization cannot be co-opted for evolutionarily novel activities.”
This implies that systems used to evaluate categorized objects are miscalibrated relative to how they would be most useful. For instance, New et al. [2007] documented that subjects were quicker and more accurate in noticing changes involving animals than those involving vehicles, despite the latter’s much greater importance in modern life.

References
Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, (3), 409.

Apesteguia, J., & Ballester, M. A. (2009). A theory of reference-dependent behavior. Economic Theory, (3), 427–455.

Ashby, F., & Gott, R. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33–53.

Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology, 149–178.

Ashby, F. G., & Waldron, E. M. (1999). On the nature of implicit categorization. Psychonomic Bulletin and Review, 363–378.

Bhatia, S., & Golman, R. (2013). Attention and reference dependence. Working paper.

Bodner, R., & Prelec, D. (1994). The centroid model of context dependent choice. Unpublished Manuscript, MIT.

Bordalo, P., Gennaioli, N., & Shleifer, A. (2012). Salience theory of choice under risk. The Quarterly Journal of Economics, (3), 1243–1285.

Bordalo, P., Gennaioli, N., & Shleifer, A. (2013). Salience and consumer choice. Journal of Political Economy, (5), 803–843.

Bordalo, P., Gennaioli, N., & Shleifer, A. (2019). Memory, attention, and choice. Working Paper, Harvard.

Boyer, P., & Barrett, H. C. (2015). Intuitive ontologies and domain specificity. The Handbook of Evolutionary Psychology, (pp. 1–19).

Bushong, B., Rabin, M., & Schwartzstein, J. (2015). A model of relative thinking. Unpublished manuscript, Harvard University, Cambridge, MA.

Caplin, A., & Dean, M. (2015). Revealed preference, rational inattention, and costly information acquisition. American Economic Review, (7), 2183–2203.

Charness, G., & Rabin, M. (2002). Understanding social preferences with simple tests. The Quarterly Journal of Economics, (3), 817–869.

Chateauneuf, A., & Wakker, P. (1993). From local to global additive representation. Journal of Mathematical Economics, (6), 523–545.

Chernev, A. (2011). The dieter’s paradox. Journal of Consumer Psychology, (2), 178–183.

Choi, J., & Kim, S. (2016). Is the smartwatch an IT product or a fashion product? A study on factors affecting the intention to use smartwatches. Computers in Human Behavior, 777–786.

Cosmides, L., & Tooby, J. (2013). Evolutionary psychology: New perspectives on cognition and motivation. Annual Review of Psychology, 201–229.

de Oliveira, H., Denti, T., Mihm, M., & Ozbek, K. (2017). Rationally inattentive preferences and hidden information costs. Theoretical Economics, (2), 621–654.

Dean, M., Kıbrıs, Ö., & Masatlioglu, Y. (2017). Limited attention and status quo bias. Journal of Economic Theory, 93–127.

Debreu, G. (1959). Topological methods in cardinal utility theory. Tech. rep., Cowles Foundation for Research in Economics, Yale University.

Ellis, A. (2018). Foundations for optimal inattention. Journal of Economic Theory, 56–94.

Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. Quarterly Journal of Economics, (3), 817–868.

Freeman, D. J. (2017). Preferred personal equilibrium and simple choices. Journal of Economic Behavior & Organization, 165–172.

Fryer, R., & Jackson, M. O. (2008). A categorical model of cognition and biased decision making. The BE Journal of Theoretical Economics, (1).

Gabaix, X. (2014). A sparsity-based model of bounded rationality, applied to basic consumer and equilibrium theory. Quarterly Journal of Economics, 1369–1420.

Ghirardato, P., Maccheroni, F., Marinacci, M., & Siniscalchi, M. (2003). A subjective spin on roulette wheels. Econometrica, (6), 1897–1908.

Kıbrıs, Ö., Masatlioglu, Y., & Suleymanov, E. (2018). A theory of reference point formation. Working papers.

Kivetz, R., Netzer, O., & Srinivasan, V. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, (3), 237–257.

Köszegi, B., & Rabin, M. (2006). A model of reference-dependent preferences. Quarterly Journal of Economics, (4), 1133–1165.

Kőszegi, B., & Szeidl, A. (2013). A model of focusing in economic choice. The Quarterly Journal of Economics, (1), 53–104.

Krantz, D., Luce, D., Suppes, P., & Tversky, A. (1971). Foundations of Measurement, Vol. I: Additive and Polynomial Representations. New York: Academic Press.

Loken, B. (2006). Consumer psychology: categorization, inferences, affect, and persuasion. Annual Review of Psychology, 453–485.

Loken, B., Barsalou, L. W., & Joiner, C. (2008). Categorization theory and research in consumer psychology. Handbook of Consumer Psychology, (pp. 133–165).

Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: a network model of category learning. Psychological Review, (2), 309.

Maltz, A. (2017). Exogenous endowment - endogenous reference point. Working Papers WP2016/5, University of Haifa, Department of Economics.

Manzini, P., & Mariotti, M. (2012). Categorize then choose: Boundedly rational choice and welfare. Journal of the European Economic Association, (5), 1141–1165.

Manzini, P., & Mariotti, M. (2014). Stochastic choice and consideration sets. Econometrica, (3), 1153–1176.

Masatlioglu, Y., & Nakajima, D. (2013). Choice by iterative search. Theoretical Economics, (3), 701–728.

Masatlioglu, Y., Nakajima, D., & Ozbay, E. Y. (2012). Revealed attention. American Economic Review, (5), 2183–2205.

Masatlioglu, Y., & Ok, E. A. (2005). Rational choice with status quo bias. Journal of Economic Theory, 1–29.

Masatlioglu, Y., & Ok, E. A. (2014). A canonical model of choice with initial endowments. The Review of Economic Studies, (2), 851–883.

Mogilner, C., Rudnick, T., & Iyengar, S. S. (2008). The mere categorization effect: How the presence of categories increases choosers’ perceptions of assortment variety and outcome satisfaction. Journal of Consumer Research, (2), 202–215.

Mullainathan, S. (2002). Thinking through categories. NBER working paper.

Munro, A., & Sugden, R. (2003). On the theory of reference-dependent preferences. Journal of Economic Behavior & Organization, (4), 407–428.

New, J., Cosmides, L., & Tooby, J. (2007). Category-specific attention for animals reflects ancestral priorities, not expertise. Proceedings of the National Academy of Sciences, (42), 16598–16603.

Ok, E. A., Ortoleva, P., & Riella, G. (2015). Revealed (p)reference theory. American Economic Review, (1), 299–321.

Orhun, A. Y. (2009). Optimal product line design when consumers exhibit choice set-dependent preferences. Marketing Science, (5), 868–886.

Posner, M. I., & Keele, S. W. (1970). Retention of abstract ideas. Journal of Experimental Psychology, (2p1), 304.

Ratneshwar, S., & Shocker, A. D. (1991). Substitution in use and the role of usage context in product category structures. Journal of Marketing Research, (3), 281–295.

Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, (3), 192–233.

Sagi, J. S. (2006). Anchored preference relations. Journal of Economic Theory, (1), 283–295.

Salant, Y., & Rubinstein, A. (2008). (A, f): Choice with frames. The Review of Economic Studies, (4), 1287–1296.

Savage, L. J. (1954). The Foundations of Statistics. Dover.

Stewart, N., Brown, G. D., & Chater, N. (2002). Sequence effects in categorization of simple perceptual stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, (1), 3.

Strotz, R. (1955). Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies, (3), 165–180.

Sugden, R. (2003). Reference-dependent subjective expected utility. Journal of Economic Theory, (2), 172–191.

Tserenjigmid, G. (2015). Choosing with the worst in mind: A reference-dependent model. Working papers.

Tversky, A. (1977). Features of similarity. Psychological Review, (4), 327.

Tversky, A., & Gati, I. (1978). Studies of similarity. Cognition and Categorization, (1978), 79–98.

Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. The Quarterly Journal of Economics, (4), 1039–1061.

Wanke, M., Bless, H., & Schwarz, N. (1999). Lobster, wine and cigarettes: Ad hoc categorisations and the emergence of context effects. Marketing Bulletin, 52–56.

Appendix A. Proofs and Extras from Sections 2 - 4
A.1.
Proof of Theorem 1.Lemma 2. (cid:31) k ∗ has open upper and lower contour sets in E k .Proof. Suppose x (cid:31) k ∗ y . Then, there are x , x , . . . , x M ∈ E k and r , . . . , r M with x = x and x M = y so that x j (cid:37) r j x j +1 and x j , x j +1 ∈ K k ( r j ). Let (cid:15) j > B (cid:15) j ( x j ) , B (cid:15) j ( x j +1 ) ⊂ K k ( r j ). Set (cid:15) = min { (cid:15) j } j
For categories K i ( r ) and K j ( r ) , either (i) there exists x i ∈ K i ( r ) and x j ∈ K j ( r ) so that x i ∼ r x j ; or (ii) x i (cid:31) r x j for all x i ∈ K i ( r ) and x j ∈ K j ( r ) ; or(iii) x j (cid:31) r x i for all x i ∈ K i ( r ) and x j ∈ K j ( r ) .Proof. If neither (ii) nor (iii) holds, then after relabeling categories if necessary, thereexist x ∈ K i ( r ) and y, z ∈ K j ( r ) such that y (cid:31) r x (cid:31) r z . Let U C j ( x ) and LC j ( x )be the strict upper and lower contour sets of x in category j for reference r . Anypoint in K j ( r ) \ [ U C j ( x ) S LC j ( x )] is indifferent to x , so either (i) holds or the setis empty. There exists an (cid:15) > x ∈ B (cid:15) ( x ), y (cid:31) r x (cid:31) r z byCategory Continuity and hence K j ( r ) = U j ( x ) and K j ( r ) = L j ( x ). By CategoryContinuity, there exists x ∈ B (cid:15) ( x ) such that K j ( r ) \ [ U C j ( x ) S LC j ( x )] = ∅ (otherwise, B (cid:15) ( x ) is contained in the interior of the set considered), so we can take y ∈ K j ( r ) \ [ U C j ( x ) S LC j ( x )] and conclude y ∼ r x . (cid:3) Definition 6.
A finite sequence ( Q , . . . , Q m +1 ) with each Q i ∈ { K ( r ) , . . . , K n ( r ) } isan indifference sequence for r (IS) if there exists x , . . . , x m , y , . . . , y m with x k ∈ Q k , y k ∈ Q k +1 and x k ∼ r y k .We omit the dependence on r when clear from context.Define the relation ./ r by x ./ r y if there exists an indifference sequence of cate-gories ( Q , . . . , Q m ) with x ∈ Q and y ∈ Q m . It is easy to see that ./ r is an equivalencerelation (reflexive, symmetric, and transitive). Let [ x ] r denote the ./ r equivalence classof x . Lemma 5. If y / ∈ [ x ] r and x (cid:31) r y , then x (cid:31) r y for all x ∈ [ x ] r and y ∈ [ y ] r .Proof. Fix x, y, r ∈ X with y / ∈ [ x ] r and x (cid:31) r y , and assume x ∈ K k . Pick any y ∈ [ y ] r .By definition, there is an IS ( Q , . . . , Q m ) with y ∈ Q m and y ∈ Q . Let i = 1 and y = y . If there exists y ∈ Q i with y (cid:37) r x , then y (cid:37) r x (cid:31) r y i , so by Lemma 4, we can find z ∈ Q i and x ∈ K k with z ∼ r x . If that occurs, then ( K k , Q i , . . . , Q ) isan IS and y ∈ [ x ] r , a contradiction. Thus x (cid:31) r y for all y ∈ Q i . Now, there exists y i +1 ∈ Q i +1 with x (cid:31) r y i +1 by transitivity and definition of IS. Hence, we can applyabove logic to Q i +1 as well: x (cid:31) r y for all y ∈ Q i +1 . Inductively, this extends all theway to Q m , so x (cid:31) r y in particular. Since y is arbitrary, this extends to any y ∈ [ y ] r .Similar arguments show that x (cid:31) r y for any x ∈ [ x ] r . Combining, x (cid:31) r y whenever x ∈ [ x ] r and y ∈ [ y ] r . (cid:3) Fix a reference point r . Let A , . . . , A n be the distinct equivalence classes of ./ r .By Lemma 5, these sets can be completely ordered by (cid:31) r , i.e. A i (cid:31) r A j ⇐⇒ x (cid:31) r y for all x ∈ A i and y ∈ A j . Label so that A (cid:31) r A (cid:31) r · · · (cid:31) r A n .Pick an indifference class A i and an IS Q , . . . , Q M that contains points in everyregion in A i . 
We define V i ( · ) on A i as follows. Define V i ( x ) on Q so that V i ( x ) = U j ( x )for all x ∈ K j ( r ) where K j ( r ) = Q . Clearly V i represents (cid:31) r when restricted to Q .There is no loss in assuming that V i is bounded, and the closure of its range is aninterval. Now, assume inductively that, for a given m ≤ k , V i represents (cid:31) r when restrictedto S m − j =1 Q j ≡ Q m − , is bounded, is continuous on Q m − , and is an increasing trans-formation of U k within Q j when Q j = K k ( r ). Then, extend V i to Q m as follows. ByLemma 5, it is impossible that y (cid:31) r x for every x ∈ Q m − and every y ∈ Q m . It willbe convenient to relabel regions so that Q m = K m ( r ).Pick a bounded, strictly increasing, continuous h : R → R . For any x ∈ K m ( r ) sothat x (cid:31) r y for all y ∈ Q m − , set V i ( x ) = h ( U m ( x )) + β + where β + = sup { V i ( x ) : x ∈ Q m − } − inf { h ( U m ( x )) : x ∈ K m ( r ) , x (cid:31) r y for all y ∈ Q m − } . For any x ∈ K m ( r ) for which there exists y, y ∈ Q m − so that y (cid:31) r x (cid:31) r y , let V i ( x ) = inf { V i ( y ) : y ∈ Q m − and y (cid:37) r x } . For all other x ∈ K m ( r ), let V i ( x ) = h ( U m ( x )) + β − where β − = inf { V i ( x ) : x ∈ Q m − } − sup { h ( U m ( x )) : x ∈ K m ( r ) , y (cid:31) r x for all y ∈ Q m − } . This V i is bounded and continuous. We can define V ( x ) = h ( V ( x )) for h ( v ) = − / (1 + v ) when v ≥ h ( v ) = − / (1 − v ) when v < We now show that it represents (cid:31) r on Q m . Pick x, y ∈ Q m . There are four cases: Case 1: x, y ∈ Q m − : then the claim follows by hypothesis. Case 2: x ∈ K m ( r ) and either x (cid:31) r y for all y ∈ Q m − or y (cid:31) r x for all y ∈ Q m − :the claim is immediate. Case 3: x ∈ K m ( r ) and y ∈ Q m − : If y (cid:31) r x , then y − (cid:15) (cid:31) r x for some (cid:15) > y − (cid:15) belongs to the same region as y . If y ∼ r x , then V i ( y ) ≥ V i ( x ). 
If this doesnot hold with equality, then there is a y ∈ Q m − so that y (cid:37) r x and y (cid:31) r y (since y (cid:37) r y ). But then y (cid:31) r x , a contradiction. If x (cid:31) r y but V i ( y ) ≥ V i ( x ), there exists z ∈ Q m − so that V i ( z ) ≤ V i ( y ) and z (cid:37) r x . But then by transitivity and hypothesis, y (cid:37) r z (cid:37) r x . Case 4: x, y ∈ K m ( r ) and Case 2 does not hold for either x or y : Suppose x (cid:37) r y .If not, then V i ( y ) > V i ( x ) so there exists a z ∈ Q m − so that z (cid:37) r x and z (cid:37) r y . Byweak order, y (cid:31) r z and so y (cid:31) r x , a contradiction.Since it represents (cid:37) r on K m ( r ), it also agrees with (cid:37) m on K m ( r ). Hence it is anincreasing transformation of U i within K i ( r ) for each i ≤ m . Renormalize V i so thatits range is a subset of [ − − i, − i ].For any x, y ∈ A i , the above establishes that V i ( x ) ≥ V i ( y ) ⇐⇒ x (cid:37) r y . Forany x ∈ A i and y ∈ A j where i < j , x (cid:31) r y by Lemma 5 and construction. Since V i ( x ) > − − i , V j ( y ) < − j , and − − i > − j , we have V i ( x ) > V j ( y ). Define U k ( ·| r ) toagree with the appropriate restriction of V i , and conclude {(cid:31) r } r ∈ X conforms to CTMunder K . Since r was arbitrary, this completes the proof. (cid:3) A.2.
Proof for Theorem 2.
Sufficiency is easy to verify. Suppose that U k ( x ) = P ni =1 U ki ( x i ). We show that for every category j there exists a vector w (cid:29) U j ( x ) = P ni =1 w i U ki ( x i ) represents (cid:31) j on E k T E j .Consider dimension 1, and the rest follow the same arguments. The goal is toshow that U k ( x ) − U k ( y ) ≥ U k ( a ) − U k ( b ) if and only if U j ( x ) − U j ( y ) ≥ U j ( a ) − U j ( b )for any x, y, a, b ∈ E k T E j . If this is the case, then standard uniqueness results givethat U j ( x ) = αU k ( x ) + β . The β can be dropped completing the claim.Let π i be the projection onto the i -coordinate. Then, E k = π ( E k ) is open andconnected for any category k . This follows from E k connected and open and π i con-tinuous. In R , connected implies convex. Claim 1.
For any z ∈ E k T E j , there exists a neighborhood O z = B (cid:15) z ( z ) so that U k ( x ) − U k ( y ) ≥ U k ( a ) − U k ( b ) if and only if U j ( x ) − U j ( y ) ≥ U j ( a ) − U j ( b ) for any x, y, a, b ∈ O z . To see it is true, pick x ∈ E k T E j . Then there is an a l ∈ E l with a l = x for l = k, j . Let U k − i ( y ) = P j = i U kj ( y j ) for any y ∈ X . Since each a l ∈ K l ( r l ) for some r l ∈ X , there exists an (cid:15) l > B (cid:15) l ( a l ) ⊂ K l ( r l ) ⊂ E l , where the distance isgiven by the supnorm. Pick (cid:15) ∈ (0 , (cid:15) l ) so that U l ( x + (cid:15) ) − U l ( x − (cid:15) ) < U l − ( a l + (cid:15) l ) − U l − ( a l − (cid:15) l )for l = k, j . Then, for any a, b ∈ [ x − (cid:15), x + (cid:15) ] there exists y a − , y b − so that ( a, y a − ) , ( b, y b − ) ∈ B (cid:15) k ( a k ) and ( a, y a − ) ∼ r k ( b, y b − ) by Category Continuity and CM. In particular, U k ( a ) − U k ( b ) = U k − ( y b − ) − U k − ( y a − ). For any a , b ∈ [ x − (cid:15), x + (cid:15) ], it holds that U k ( a ) − U k ( b ) ≥ U k ( a ) − U k ( b ) if and only if ( b , y a − ) (cid:37) r k ( a , y b − ). Similarly,there exist z a − , z b − so that ( a, z a − ) , ( b, z b − ) ∈ B (cid:15) j ( a j ) and ( a, z a − ) ∼ r j ( b, z b − ). Now,( b , z b − ) (cid:37) r j ( a , z a − ) if and only if U j ( a ) − U j ( b ) ≥ U j ( a ) − U j ( b ). By ReferenceInterlocking and weak order, ( b , z b − ) (cid:37) r j ( a , z a − ) if and only if ( b , y a − ) (cid:37) r k ( a , y b − ),so we conclude that the claim holds with (cid:15) x = (cid:15) .We now extend to the entire domain (this follows similar arguments in CW). Pickan arbitrary x ∗ < x ∗ ∈ E k T E j and consider Z = ( x ∗ , x ∗ ]. If the claim is true, thenstandard uniqueness results give that U j ( x ) = αU k ( x ) + β for all x ∈ O z for some α >
0. Let α ∗ , β ∗ be the constants so that U j ( x ) = α ∗ U k ( x ) + β ∗ for all x in theneighborhood of x ∗ , as guaranteed to exist by the claim.Let Z = n s ∈ Z : U j ( x ) = α ∗ U k ( x ) + β ∗ for all x ∈ ( x ∗ , s ] o .Z is not empty by the claim. We show that it is both open and closed by pickingany s ∈ cl ( Z ) and showing s ∈ int ( Z ). Since [ x ∗ , s ] is compact and O = { O z : z ∈ [ x ∗ , s ] } is an open covering, there exists { O , . . . , O n } ⊂ O with x ∗ ∈ O , s ∈ O n and O m T O m = ∅ for all m ≥ m + 2. On each O m , there exists α m , β m so that theutility indexes agree by the claim. Also, O m and O m +1 have non-empty intersectionswith more than two points, so ( α m +1 , β m +1 ) = ( α m , β m ). In particular, O intersects O x ∗ so α m = α ∗ for all m . Then O n T Z ⊂ Z , i.e. s ∈ int ( Z ), so cl ( Z ) ⊂ int ( Z ) ⊂ Z ⊂ cl ( Z ), i.e. Z is both closed and open relative to Z . Conclude Z = Z since Z connected.Since U j ( x ) = αU k ( x ) + β for all x ∈ ( x ∗ , x ∗ ] for any interval in the domain, itholds for the whole domain as well. Extend to other categories that intersect E i S E j inductively. If there is no intersecting category, we can start again and obtain a (dis-joint) interval, the values of U i (and U j ) on which have no bearing on the DM’s choices.Similar arguments obtain for the other dimensions. Moreover, there is no loss in settingeach β = 0. This completes the proof. (cid:3) A.3.
Proof of Theorem 3.
To save notation, until after Lemma 10, we fix $r$ and write $K^k$ instead of $K^k(r)$ and $\succsim$ instead of $\succsim_r$. We also identify $x \alpha_k y$ with the alternative $\alpha x \oplus_k (1-\alpha) y$. Let $(U^1, \dots, U^n)$ be the additive functions that represent $\succsim^1, \dots, \succsim^n$. Observe that $U^k(x \alpha_k y) = \alpha U^k(x) + (1-\alpha) U^k(y)$ for any $\alpha$, provided that $x, y, x \alpha_k y \in E^k$.

Recall from Definition 6 that an indifference sequence is a finite sequence of categories with indifference between each pair of succeeding members.

Definition 7.
The function v is a utility for the indifference sequence ( Q , . . . , Q m ) if v is an increasing additive utility function on each Q k and for all k , x, y ∈ Q k S Q k +1 : x (cid:37) y ⇐⇒ v ( x ) ≥ v ( y ). Lemma 6. If x k ∈ K k , x l ∈ K l , and x k ∼ x l , then there is a > , b ∈ R such that for x ∈ K k and y ∈ K l , x (cid:37) y ⇐⇒ U k ( x ) ≥ αU l ( y ) + β .Proof. W.L.O.G., take U k ( x k ) = 0. There is (cid:15) k > B (cid:15) k ( x k ) ⊂ K k . By CMand Category Continuity, there is (cid:15) l > B (cid:15) l ( x l ) ⊂ K l and for all y ∈ B (cid:15) l ( x l ), x ∗ = x k + (cid:15) k (cid:31) y (cid:31) x k − (cid:15) k = x ∗ . For any y ∈ K l and α such that yα l x l ∈ B (cid:15) l ( x l ), thereexists β ∈ (0 ,
1) such that x ∗ β k x ∗ ∼ yα l x l by Category Continuity, CM, and that (cid:37) isa weak order. Let V l ( y ) = α − U k ( x ∗ β k x ∗ ). This is well defined, additive, increasing,and ranks alternatives in the same way as U l . Thus, V l ( y ) = aU l ( y ) + b for some a > b ∈ R .For any x ∈ K k and y ∈ K l , pick α ∈ [0 ,
1] such that xα k x k ∈ B (cid:15) k ( x k ) and yα l x l ∈ B (cid:15) l ( x l ). By construction, yα l x l ∼ y when y ∈ B (cid:15) k ( x k ) and U k ( y ) = αV l ( y ). Thus, xα k x k (cid:37) y ∼ yα l x l holds if and only if U k ( x ) ≥ V l ( y ) and x (cid:37) y ⇐⇒ xα k x k (cid:37) yα l x l by AAC since x k ∼ x l , completing the proof. (cid:3) For an indifference sequence ( Q , . . . , Q m ) with utility v , we label the range ofutilities as cl ( v ( Q k )) = [ l k , u k ] where l k ≤ u k . Note that we allow Q k = Q l for k = l . Lemma 7.
For an indifference sequence $(Q_1, \dots, Q_m)$, there is an affine, increasing utility $v$ for it.

Proof. The proof is by induction. We claim that, for any $k$, there is a function $v_k : X \to \mathbb{R}$ that is a utility for the IS $(Q_1, \dots, Q_k)$. When $k = 1$ or $k = 2$, this is true by the above lemmas. The induction hypothesis (IH) is that the claim is true for $k = N$. Consider $k = N + 1$. Let $v_N$ be the utility for $(Q_1, \dots, Q_N)$ that exists by the IH. If $Q_{N+1} \subseteq \bigcup_{i=1}^{N} Q_i$, then we are done. If not, then for $Q_N = K_l$, there is no loss in normalizing $v_N$ so that it equals $U_l$ on $K_l(r)$. Suppose $Q_{N+1} = K_j(r)$, and let $\alpha, \beta$ be the scalars claimed to exist by Lemma 6, so that $U_j(x) \ge \alpha U_l(y) + \beta \iff x \succsim_r y$ for $x \in K_j(r)$ and $y \in K_l(r)$. Restricted to $Q_N$, $v_N = U_l$, so we can define $v_{N+1}(x) = \alpha v_N(x) + \beta$ if $x \in \bigcup_{i=1}^{N} Q_i$ and $v_{N+1}(x) = U_j(x)$ if $x \in Q_{N+1}$. Then, if $i < N$ and $x, y \in Q_i \cup Q_{i+1}$, we are done by the IH, since $v_{N+1}(x) \ge v_{N+1}(y) \iff v_N(x) \ge v_N(y)$. If $x, y \in Q_N \cup Q_{N+1}$, then Lemma 6 and the construction imply the result. The claim then holds by induction. $\square$

Lemma 8.
Fix an indifference sequence $(Q_1, \dots, Q_n)$ with utility $v$. If $x_k \in Q_k$ for $k = i, i+1, i+2$ with $x_i \sim x_{i+1} \sim x_{i+2}$, then $(Q_1, \dots, Q_i, Q_{i+2}, \dots, Q_n)$ is an indifference sequence (after relabeling) with utility $v$.

Proof. The Lemma is vacuously true for any 1- or 2-element IS. Fix an IS $(Q_1, \dots, Q_n)$ with $n \ge 3$ and $v$ as above, and suppose $x_k \in Q_k$ for $k = i, i+1, i+2$ with $x_i \sim x_{i+1} \sim x_{i+2}$. By transitivity $x_i \sim x_{i+2}$, so $(Q_1, \dots, Q_i, Q_{i+2}, \dots, Q_n)$ is an IS; it remains to be shown that $v$ is a utility for it. There is an $\epsilon > 0$ such that $B = B_\epsilon(v(x_i)) \subset (l_k, u_k)$ for $k = i, i+1, i+2$. Let $v^{-1} : B \to Q_{i+1}$ select, for each $u \in B$, an arbitrary point in $Q_{i+1}$ such that $v[v^{-1}(u)] = u$. Now, fix $x \in Q_i$ and $y \in Q_{i+2}$. For $\alpha$ small enough, $v(x \alpha_i x_i), v(y \alpha_{i+2} x_{i+2}) \in B$. Then $x \alpha_i x_i \sim v^{-1}(v(x \alpha_i x_i))$ and $y \alpha_{i+2} x_{i+2} \sim v^{-1}(v(y \alpha_{i+2} x_{i+2}))$. So

$$
\begin{aligned}
x \succsim y &\iff x \alpha_i x_i \succsim y \alpha_{i+2} x_{i+2} \\
&\iff v^{-1}(v(x \alpha_i x_i)) \succsim v^{-1}(v(y \alpha_{i+2} x_{i+2})) \\
&\iff v[v^{-1}(v(x \alpha_i x_i))] \ge v[v^{-1}(v(y \alpha_{i+2} x_{i+2}))] \\
&\iff \alpha v(x) + (1-\alpha) v(x_i) \ge \alpha v(y) + (1-\alpha) v(x_{i+2}) \\
&\iff v(x) \ge v(y).
\end{aligned}
$$

This establishes the Lemma. $\square$

Lemma 9.
Fix an indifference sequence $(Q_1, \dots, Q_n)$ with utility $v$. If $(l_1, u_1) \cap (l_n, u_n) \neq \emptyset$, then there exist $i$ and $x_k \in Q_k$ for $k = i, i+1, i+2$ with $x_i \sim x_{i+1} \sim x_{i+2}$.

Proof. If there is $i$ with $(l_i, u_i) \cap (l_{i+2}, u_{i+2}) \neq \emptyset$, then there is $u \in \bigcap_{j = i, i+1, i+2} (l_j, u_j)$, so there exists $x_j \in Q_j$ with $v(x_j) = u$ for $j = i, i+1, i+2$ and thus, by the hypothesis, $x_i \sim x_{i+1} \sim x_{i+2}$. We show there exists such an $i$ by contradiction. If $l_{i+2} > u_i$ for all $i$ or $l_i > u_{i+2}$ for all $i$, then $(l_1, u_1) \cap (l_n, u_n) = \emptyset$, a contradiction. So there must exist $i$ such that [$l_{i+2} > u_i$ and $l_{i+2} > u_{i+4}$] or [$u_{i+2} < l_i$ and $u_{i+2} < l_{i+4}$]. In the first case, $l_{i+2} \in (l_{i+1}, u_{i+1}) \cap (l_{i+3}, u_{i+3})$; in the second, $u_{i+2} \in (l_{i+1}, u_{i+1}) \cap (l_{i+3}, u_{i+3})$. In either case, we have a contradiction. $\square$

Lemma 10.
Fix an indifference sequence $(Q_1, \dots, Q_n)$ with utility $v$. Then for all $x, y \in \bigcup_i Q_i$, $x \succsim y \iff v(x) \ge v(y)$.

Proof. This is clearly true if $n = 1$. (IH) Suppose the claim is true for any IS with $m < n$ elements. Fix an IS $(Q_1, \dots, Q_n)$ with utility $v$. If $x \notin Q_1 \cup Q_n$ or $y \notin Q_1 \cup Q_n$, then the claim immediately follows from the IH, and it clearly holds if $x, y \in Q_i$ for some $i$. So it suffices to consider arbitrary $x \in Q_1$ and $y \in Q_n$. By Lemmas 8 and 9, if $(l_1, u_1) \cap (l_n, u_n) \neq \emptyset$, we can form a shorter IS from $Q_1$ to $Q_n$ and the claim then follows from the IH.

Otherwise, there are two cases to consider: $l_n > u_1$ and $u_n < l_1$. Consider $l_n > u_1$. The range of $v$ restricted to $\bigcup_{i=1}^{n-1} Q_i$ is dense in $\bigcup_{i=1}^{n-1} (l_i, u_i) = (\bar{l}, \bar{u})$. Note $l_n \in (\bar{l}, \bar{u})$ since $x_{n-1} \sim y_n$, so $(l_{n-1}, u_{n-1}) \cap (l_n, u_n) \neq \emptyset$. Then $(l_n, v(y))$ is an open interval having a non-empty intersection with $(\bar{l}, \bar{u})$. Since the range of $v$ is dense in $(\bar{l}, \bar{u})$, there exists $y' \in Q_{n'}$ with $l_n < v(y') < v(y)$. Since $l_n > u_1$, $n' > 1$. Then $(Q_1, \dots, Q_{n'})$ and $(Q_{n'}, \dots, Q_n)$ are both ISes with strictly fewer than $n$ elements. Applying the IH, $y' \succ x$ and $y \succ y'$. Conclude using transitivity that $y \succ x$. Similar arguments obtain the desired conclusion when $u_n < l_1$. $\square$

Define $\bowtie_r$ as in the proof of Theorem 1, and let $A_1, \dots, A_n$ be the distinct indifference classes of $\bowtie_r$. Again using Lemma 5, we can relabel so that $x \in A_i$ and $y \in A_{i+1}$ implies $x \succ_r y$. By Lemma 10, there is $v_i$ on $A_i$ so that $v_i$ is additive and increasing within categories and $x \succsim y \iff v_i(x) \ge v_i(y)$ for all $x, y \in A_i$.

By Unbounded and Lemma 5, every positive unbounded region (if any) is a subset of $A_1$, and every negative unbounded region (if any) is a subset of $A_n$. If one region is both positive and negative unbounded, then $n = 1$. Therefore, $v_i(A_i)$ is bounded for all $i \in (1, n)$, and $v_n(A_n)$ is bounded above whenever $n > 1$. Define $V(x) = v_1(x)$ for all $x \in A_1$. For $x \in A_i$ with $i > 1$, define $V(x)$ recursively by

$$V(x) = v_i(x) - \sup_{y \in A_i} v_i(y) + \inf_{y \in A_{i-1}} V(y) - 1.$$

Observe that $V(\cdot)$ is a positive affine transformation of $v_i(\cdot)$ when restricted to $A_i$, and if $x \in A_i$, $y \in A_j$ and $i < j$, then $V(x) > V(y)$. Thus $V$ represents $\succsim_r$ and, when restricted to any given region, is affine and increasing.

Defining $U_k(\cdot \mid r)$ as the (unique) affine transformation of $U_k$ that agrees with $V$ on $K_k(r)$ establishes that $\succsim_r$ is an affine CTM. Since $r$ was arbitrary, this establishes that each $\succsim_r$ has such a representation. Conclude that $\{\succsim_r\}$ conforms to Affine CTM, completing the proof. $\square$

A.4.
Proof of Theorem 4.
Without loss of generality, normalize so that U ( ·| r ) = U ( ·| r ) for all r, r . Suppose U k ( ·| r ) = U k ( ·| r ) for some r, r and some k . Then, let¯ (cid:15) = d ( r, r ) and pick a sequence ˆ r n → ˆ r such that: U k ( ·| ˆ r n ) = U k ( ·| r ), ˆ r n ∈ B ¯ (cid:15) ( r ) forall n , and d (ˆ r n , r ) → inf { d ( r , r ) : U k ( ·| r ) = U k ( ·| r ) } . Since ˆ r n ∈ cl ( B ¯ (cid:15) ( r )), there is noloss in assuming this sequence converges. Similarly, let r n be a sequence in B ¯ (cid:15) ( r ) suchthat r n → ˆ r and U k ( ·| r ) = U k ( ·| r n ).By hypothesis and that each K k ( r ) is open, there exists (cid:15) > x k and x such that B (cid:15) ( x k ) ⊂ K k (ˆ r ), B (cid:15) ( x ) ⊂ K (ˆ r ), and x k ∼ ˆ r x . By continuity of the region functions, B (cid:15) ( x k ) ⊆ K i (ˆ r n ) ∩ K i ( r n ) and B (cid:15) ( x ) ⊆ K (ˆ r n ) ∩ K ( r n ) for n large enough. For z close enough to x k , there exists y ( z ) ∈ B (cid:15) ( x ) such that z ∼ ˆ r y ( z ). But then by SC, z ∼ r n y ( z ) and z ∼ ˆ r n y ( z ). Thus U k ( z | r n ) = U ( y ( z ) | r n ) = U ( y ( z ) | ˆ r n ) = U k ( z | ˆ r n ) for all z close enough to x k , implying that U k ( ·| r n ) = U k ( ·| ˆ r n ), a contradiction. Conclude U k ( ·| r ) = U k ( ·| r ) for all r, r . (cid:3) A.5.
Examples from Table 1.
Example 1 shows that BGS violates Cancellation, and inspecting Figure 1 shows it violates Monotonicity. It remains to show that TK violates Reference Irrelevance and that MO violates Cancellation. This is established by the following two examples.
Example 2 (TK violates Reference Irrelevance) . Consider a TK model with λ = λ = 2. Then, for r = (10 , x = (12 ,
12) and y = (9 , y (cid:37) r x since (12 −
10) +(12 −
10) = 2(9 −
10) + (16 − r = (11 , x (cid:31) r y since (12 −
11) + (12 − > −
11) + (16 − x ∈ R GL ( r ) T R GL ( r ) and r ∈ R GL ( r ) T R GL ( r ), so thefamily violates Reference Irrelevance. Example 3 (MO violates Cancellation) . Let Q ( r ) = { x ∈ X : x / x > r / r } and c ( r ) = 1. Then, let x = (2 , y = (1 , z = (4 , r = (0 . , . x , z ) = (2 , (cid:37) r (4 ,
2) = ( z , y ) and ( z , x ) = (4 , (cid:37) r (1 ,
4) = ( y , z ) because allfour points belong to Q ( r ), cancellation requires that x (cid:37) r y . However, x / ∈ Q ( r ), so y (cid:31) r x , so cancellation does not hold.A.6. Other models and CTM.
In this subsection, we present the functional forms of the other models of salience we discussed, and show that they are not CTM.

• Gabaix [2014] assumes a rational DM would maximize $u(a, w)$ but actually maximizes $u(a, (w_1 m^*_1, \dots, w_n m^*_n))$, where $m^* \in \arg\min_{m \in [0,1]^n} \sum_{i,j} (1 - m_i) \Lambda_{ij} (1 - m_j) + \kappa \sum_i m_i^{\alpha}$ and $\Lambda_{ij}$ incorporates the "variance" in the marginal utility of dimensions $i$ and $j$. When $n$ is large, $m^*_i$ is often zero, so $(w_1 m^*_1, \dots, w_n m^*_n)$ is a "sparse" vector.

• Tversky & Kahneman [1991] refer in general to $V^{CTK}(x \mid r) = \sum_i v_i(u_i(x_i) - u_i(r_i))$, where $v_i$ is concave above 0 and convex below.

• Bordalo et al. [2019] and the continuous form of the salient thinking model have $V^{CBGS}(x \mid r) = w(x_1, r_1) x_1 + w(x_2, r_2) x_2$, where $w$ has the same properties as a salience function.

• Munro & Sugden [2003] use the functional form $V^{MS}(x \mid r) = A(r) \left( \sum_i \gamma_i r_i^{\rho - \beta} x_i^{\beta} \right)^{\beta}$.

• Bhatia & Golman [2013] assume that the DM chooses the bundle $x$ that maximizes $U(x \mid r) = \alpha_1(r_1)[V_1(x_1) - V_1(r_1)] + \alpha_2(r_2)[V_2(x_2) - V_2(r_2)]$ given a reference point $r$, where each $\alpha_i$ is increasing and positive.

The first fails to be CTM, as the indifference curves have the same slope everywhere for a fixed context. If it were a CTM, then it would necessarily have only a single region, and single-region CTM coincides with the neoclassical model. The final four explicitly take into account a reference point. In all four, it is easy to see that the reference point affects the marginal rate of substitution between attributes. This implies a violation of weak reference irrelevance for any given category function: under weak reference irrelevance, any two points in the same category that are indifferent to each other necessarily remain so for a sufficiently small change in the reference point.

A.7. Proof of Proposition 2. $K$ satisfying S0-S4 implies that $E_1 = E_2 = \mathbb{R}^n_{++}$, so the structure assumption is satisfied.
Moreover, Theorem 5 gives that the categoriesare generated by a salience function. The axioms allow us to apply Theorems 2 and4 to get a strong CTM representation of the family with reweighted utility indexes.Hence, U k ( x ) = w k u ( x ) + w k u ( x ) + β k for each x ∈ X .There is no loss in normalizing so that β = 0. Pick x ∈ X with x > x , and byS4 observe that x ∈ K ( r ) for r = ( x , x /
2) and x ∈ K ( r ) for r = ( x / , x ). Since K ( r ) and K ( r ) are open, there exists (cid:15) > B (cid:15) ( x ) ⊂ K ( r ) T K ( r ). Since U is continuous and increasing, there is y ∈ B (cid:15) ( x ) with y < x so that U ( y ) = U ( x ),i.e. y ∼ r x ; this y necessarily has y > x by CM. Then, SDO implies y (cid:31) r x , i.e. U ( y ) > U ( x ), which requires w /w < w > w . We can incorporate β into u byreplacing it with u + β / ( w − w ) or into u by replacing β into u by replacing itwith u + β / ( w − w ). At least one does not involve dividing by zero, as otherwise w i = w i for i = 1 , (cid:3) A.8.
TK.
This subsection states and proves a characterization theorem for TK.
Proposition 5.
A family of preferences $\{\succsim_r\}_{r \in X}$ has a TK representation if and only if it is an affine CTM with a gain-loss regional function that satisfies Reference Interlocking, Monotonicity, Cancellation, and continuity of each $\succsim_r$.

Tversky & Kahneman [1991, p. 1053] provide an alternative axiomatic characterization of the model, and our result makes heavy use of their theorem.
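To make the object in Proposition 5 concrete, the following minimal sketch evaluates a gain-loss utility of the additive form $\sum_i v_i(u_i(x_i) - u_i(r_i))$. It assumes linear $u_i$ and a piecewise-linear $v_i$ with loss-aversion coefficient $\lambda_i$; these functional forms, the function name `tk_value`, and the second reference point `r_alt` are illustrative assumptions rather than part of the characterization. The bundles and the first reference point reuse the numbers of Example 2.

```python
from typing import Sequence


def tk_value(x: Sequence[float], r: Sequence[float], lam: Sequence[float]) -> float:
    """Additive gain-loss value of bundle x at reference point r.

    Illustrative assumption: u_i is the identity and v_i(d) = d for gains,
    lam_i * d for losses, so losses are weighted by lam_i.
    """
    total = 0.0
    for xi, ri, li in zip(x, r, lam):
        diff = xi - ri
        total += diff if diff >= 0 else li * diff
    return total


lam = (2.0, 2.0)           # lambda_1 = lambda_2 = 2, as in Example 2
x, y = (12.0, 12.0), (9.0, 16.0)
r = (10.0, 10.0)           # reference point from Example 2
r_alt = (11.0, 10.0)       # hypothetical second reference point (assumption)

print(tk_value(x, r, lam), tk_value(y, r, lam))          # 4.0 4.0
print(tk_value(x, r_alt, lam), tk_value(y, r_alt, lam))  # 3.0 2.0
```

Under these assumptions the two bundles are indifferent at `r` but strictly ranked at `r_alt`, even though each bundle stays in the same gain-loss quadrant, which is the kind of reference dependence that the proposition's axioms discipline.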
Proof.
Necessity follows from the discussion above and TK's theorem. To show sufficiency, we rely on TK's theorem, which states that any monotone, continuous family of preference relations that satisfies cancellation, sign-dependence and reference interlocking has a TK representation. Given our assumptions, we need to show that $\{\succsim_r\}$ satisfies sign-dependence and reference interlocking.

TK say that $\{\succsim_r\}$ satisfies sign-dependence if "for any $x, y, r, s \in X$, $x \succsim_r y \iff x \succsim_s y$ whenever $x$ and $y$ belong to the same quadrant with respect to $r$ and with respect to $s$, and $r$ and $s$ belong to the same quadrant with respect to $x$ and with respect to $y$." This happens if and only if $x \in K_k(r) \cap K_k(s)$ and $y \in K_k(r) \cap K_k(s)$ for some $k \in \{1, 2, 3, 4\}$. Then, sign-dependence is exactly an implication of affine CTM, since $U_k(\cdot \mid r) = \alpha U_k(\cdot \mid s) + \beta$ for $\alpha > 0$.

The family $\{\succsim_r\}$ satisfies reference interlocking if "for any $w, w', x, x', y, y', z, z'$ that belong to the same quadrant with respect to $r$ as well as with respect to $s$, $w_1 = w'_1$, $x_1 = x'_1$, $y_1 = y'_1$, $z_1 = z'_1$ and $x_2 = z_2$, $w_2 = y_2$, $x'_2 = z'_2$, $w'_2 = y'_2$, if $w \sim_r x$, $y \sim_r z$, and $w' \sim_s x'$ then $y' \sim_s z'$." The assumptions on quadrants imply that $w, w', x, x', y, y', z, z' \in K_k(r) \cap K_l(s)$ for some $k, l \in \{1, 2, 3, 4\}$. Since $y', z' \in K_l(s)$, the conclusion follows immediately from RI. $\square$

A.9.
Example 4.

Example 4.
The following salience functions generates regions all satisfy S0-S3, butonly satisfy a subset of the other properties.(1) The function σ ( x, r ) = max { x,r } min { x,r } generates regions that violate S4-S6. Note σ ( a, a ) = a for a >
0. Then ( a, b + (cid:15) ) , ( a, b ) ∈ K ( a, b ) for all a > b andsmall enough (cid:15) >
0, contradicting S4 and S6, respectively. Also note σ ( a, a ) = σ ( √ a,
1) for a >
0. Hence, ( a, √ a ) / ∈ K ( a,
1) but ( a + (cid:15), √ a ) ∈ K ( a + (cid:15),
1) forevery (cid:15) >
0, violating S5.(2) The function σ ( x, r ) = | x − r | generates regions that satisfy S0-S4 but violateS5 and S6. Observe that (2 , √ / ∈ K (1 , √
2) since σ (2 ,
1) = σ ( √ , √
2) = 3,but (2 + (cid:15), √ ∈ K (1 + (cid:15), √
2) for any (cid:15) > σ (2 + (cid:15), (cid:15) ) = 3 + 2 (cid:15) > x = (2 , r = (4 ,
1) have x x = r r , but σ (2 , > σ (2 ,
1) =, so x ∈ K ( r ),contradicting S6.(3) The function σ ( x, r ) = |√ x −√ r | generates regions that satisfy S0-S5 but violateS6. Also, x = (2 ,
2) and r = (4 ,
1) have x x = r r , but σ (2 , > σ (2 , x ∈ K ( r ), contradicting S6. Differentiating establishes S4 and S5. (4) The function σ ( x, r ) = max { x,r } min { x,r } generates regions that satisfy S0-S6.A.10. Proof of Theorem 5.
We first prove the following lemma.
Lemma 11. If K is a category function, then for any (cid:15) > and x so that B (cid:15) ( x ) ⊂ K i ( r ) , there exists δ > so that B (cid:15)/ ( x ) ⊂ K i ( r ) for all r ∈ B δ ( r ) .Proof. Let B (cid:15)/ ( x ). For each j = i , d ( K j ( r ) , B ) > (cid:15)/
2, where d ( · ) is the Hausdorffmetric, and continuity of K j implies that there exists a neighborhood O j of r sothat d ( K j ( r ) , B ) > (cid:15)/ r ∈ O j . Let O = T j = i O j . Then, for any r ∈ O , B / ∈ cl ( S j = i K j ( r )). Since cl ( S i K i ( r )) = X , B ⊂ cl ( K i ( r )). But since B is open, B ⊂ int ( cl ( K i ( r ))) = K i ( r ) since K i ( r ) is regular open. (cid:3) For sufficiency, define a binary relation S by ( a, b ) S ( c, d ) if and only if ( a, c ) / ∈ K ( b, d ). S is clearly complete. It is also transitive by S3. We show it has an opencontour sets. Let S ∗ be the strict part of S . If ( a, b ) S ∗ ( c, d ), then x ∈ K ( r ) for x = ( a, c ) and r = ( b, d ). K ( r ) is open by S0 so there exists (cid:15) > B (cid:15) ( x ) ⊂ K ( r ). By Lemma 11, x ∈ K ( r ) for all r in a neighborhood O of r . Conclude( a , b ) S ∗ ( c , d ) for all ( a , b ) , ( c , d ) ∈ B (cid:15) ( x ) × O . Standard results then show existenceof a continuous function σ so that ( a, b ) S ( c, d ) if and only if σ ( a, b ) ≥ σ ( c, d ). σ issymmetric by S2 and increasing in contrast by S1 and S4. Hence x ∈ K ( y ) if and onlyif σ ( x , y ) > σ ( x , y ), and by S2, x ∈ K ( y ) if and only if y ∈ K ( x ) where x , y arethe reflections of x, y . Hence, x ∈ K ( y ) if and only if σ ( x , y ) < σ ( x , y ).Pick any a, b . By S3, σ ( a, b ) = σ ( b, a ) so ( a, b ) / ∈ K ( b, a ) for any a, b . By S5,( a + (cid:15), b ) / ∈ K ( b + (cid:15), a ). Then, ( b, a ) S ( a + (cid:15), b + (cid:15) ) so σ ( a, b ) = σ ( b, a ) ≥ σ ( a + (cid:15), b + (cid:15) ).Since a, b were arbitrary, diminishing sensitivity holds.For necessity, verifying that S0-S5 hold are trivial, except that each K i ( r ) is regularopen. To see this, pick r and x ∈ int ( cl ( K ( r ))) (symmetric arguments hold for K ).Suppose x (cid:29) r (the other cases follow by changing the signs). Then, there is (cid:15) > x = ( x − (cid:15), x + (cid:15) ) ∈ cl ( K ( r )). 
Then, there exists x ∈ K ( r ) that is arbitrarilyclose to ¯ x , and we can take it so that r < x < x and r < x < x . Then, σ ( x , r ) >σ ( x , r ) > σ ( x , > σ ( x , r ) since σ is increasing in contrast. Hence x ∈ K ( r ) andsince x was arbitrary, int ( cl ( K ( r ))) ⊂ K ( r ). Clearly, K ( r ) ⊂ int ( cl ( K ( r ))).Now we show the following are equivalent:(i) The functions K and K satisfy S0, S1, and S6,(ii) There exists a salience function σ s.t. x ∈ K k ( r ) ⇐⇒ σ ( x k , r k ) > σ ( x − k , r − k ) In this case it is actually a pseudo metric. That (ii) implies (i) follows from the first part, and that S6 is implied by sym-metry and homogeneity of degree zero. Now, we show (i) implies (ii). Set σ ( a, b ) =max { a/b, b/a } . Clearly σ is a salience function, and we show that σ generates K and K . Fix r ∈ X and set A = { x : σ ( x , r ) > σ ( x , r ) } . We show A = K ( r ).Claim A T K ( r ) = ∅ . If not, pick x ∈ A T K ( r ). x ∈ A implies either (a) x /r > x /r and x /r > r /x or (b) r /x > x /r and r /x > r /x . If (a) and x ≤ r , then x /r > r /x ≥ x /r implies x > r r /x ≥ r , so there exists λ ∈ [0 ,
1) such that ( λx + (1 − λ ) r , x ) = ( r r /x , x ) = x . If (a) and x > r , then x > r x /r > r , so there exists λ ∈ (0 ,
1) such that ( λx + (1 − λ ) r , x ) = ( r x /r , x ) = x . By S1and x ∈ K ( r ), x ∈ K ( r ). However, we have either x x = r r or x /x = r /r so x / ∈ K ( r ) by S6, a contradiction. A similar contradiction obtains if (b) holds.Now, since A T K ( r ) = ∅ and K ( r ) S K ( r ) is dense, A ⊂ cl ( K ( r )). By S0, K ( r ) = int ( cl ( K ( r )). Since A is an open set contained in cl ( K ( r )), A ⊆ K ( r ).Similarly, for B = { x : σ ( x , r ) < σ ( x , r ) } , B ⊆ K ( r ). But( A [ B ) c = { x : x x = r r or x /x = r /r } , and by S6, ( A S B ) c T K k ( r ) = ∅ for k = 1 ,
2. Thus A = K ( r ) and B = K ( r ),completing the proof.Finally, fix any HOD salience function s . Observe s ( a, b ) > s ( c, d ) if and only if s ( a/b, > s ( c/d,
1) by homogeneity if and only if s (max( a/b, b/a ) , > s (max( c/d, d/c ) , a/b, b/a ) > max( c/d, d/c ) by ordering. Thus if onesalience function generates the regions, every other salience function does as well. (cid:3) A.11.
Proof of Proposition 1.
Pick any r ∈ X . Observe that x = ( r + k, r ) neces-sarily belongs to K ( r ) by S4, as does an open set O x . This set O can be identifiedby looking at whether Cancellation and Monotonicity hold on the set. By varying r and k , we obtain a covering of the entirety of R by points that necessarily belongto the 1-salient region. This allows one to identify (cid:37) and obtain a representation U .Repeating with x = ( r , r + k ) obtains a representation U of (cid:37) .Fix any r ∈ X . Consider y (cid:29) r and let I ∗ ( y ) = { y : U k ( y ) = U k ( y ) and U − k ( y ) = U − k ( y ) } \ { y } . If y ∈ K k ( r ), then there exists y ∈ I ∗ ( y ) arbitrarily close to y so that y, y ∈ K k ( r ); forany such y , y ∼ r y . If y r y for every y ∈ I ∗ ( y ) T B (cid:15) ( y ) for some (cid:15) > y / ∈ K k ( r ). Conclude y ∈ K k ( r ) if and only if there exists y ∈ I ∗ ( y ) \ { y } arbitrarily close to y sothat y ∼ r y .We now infer whether σ ( x, a ) > σ ( y, b ) for any x, y, a, b by considering whetheran alternative x is in K ( r ) for appropriately chosen bundles so that x (cid:29) r . Thisis impossible if x = a and always true if y = b and x = a . For any other values,we have that σ ( x, a ) > σ ( y, b ) if and only if either ( x, y ) ∈ K ( a, b ), x > a and y > b ; ( x, b ) ∈ K ( a, y ), x > a , and b > y ; ( a, y ) ∈ K ( x, b ), x < a , and y > b ; or( a, b ) ∈ K ( x, y ), x < a , and b > y . In this way we can reveal the σ function and thusthe regions. (cid:3) Appendix B. Proofs and Extras from Section 5
B.1.
Axioms for c . This subsection formally states the adaptations of the axiomsfor reference dependent preferences { (cid:37) r } r ∈ X in terms of the choice correspondence c .Interpretation is identical to that of those axioms. Axiom (Category Cancellation*) . For all x , y , z , x , y , z ∈ R ++ and category k : if ( x , z ) ∈ c ( S ) , ( z , y ) ∈ S , ( z , x ) ∈ c ( S ) , ( y , z ) ∈ S , ( x , x ) , ( y , y ) ∈ S and S i ⊂ K k ( A ( S i )) for i ∈ { , , } , then ( x , x ) ∈ c ( S ) whenever ( y , y ) ∈ c ( S ) . Axiom (Category Monotonicity*) . For any x, y ∈ X : if x ≥ y and x = y , then ( y, k ) (cid:37) R ( x, k ) for any category k . Axiom (Category Continuity*) . For any S ∈ X and any (cid:15) > so that E T S \ c ( S ) = ∅ where E ≡ S x ∈ c ( S ) B (cid:15) ( x ) there exists δ > so that if S ∈ X , d ( A ( S ) , A ( S )) < δ , andfor any y ∈ S , there is y ∈ S so that y ∈ B δ ( y ) , then c ( S ) ⊂ E whenever S T E = ∅ . Define (cid:37)
R,k by x (cid:37) R,k y if and only if ( x, k ) (cid:37) R ( y, k ). Using this relation, we candefine ⊕ k for each category as we did with preference relations. Axiom (Affine Across Categories*) . For any S , S , S ∈ X , x i ∈ K j ( A ( S i )) , y i ∈ K k ( A ( S i )) for i = 1 , , , and any α ∈ (0 , so that ( x , j ) (cid:37) R ( αx ⊕ j (1 − α ) x , j ) and ( αy ⊕ k (1 − α ) y , k ) (cid:37) R ( y , k ) :if x ∈ c ( S ) and x ∈ c ( S ) , then y / ∈ c ( S ) . Axiom (Salient Dimension Overvalued*) . For x, y ∈ S T S with x k > y k and y − k >x − k , if x, y ∈ K k ( A ( S )) , x, y ∈ R − k ( A ( S )) , and y ∈ c ( S ) , then x / ∈ c ( S ) . Axiom (Reference Interlocking*) . For any a, b, a , b , x , y , x, y ∈ X with x − i = a − i , y − i = b − i , x i = a i , y i = b i , x i = x i , y i = y i , a i = a i , b i = b i :if x ∼ R ∗ k y , a (cid:37) R ∗ k b , and x ∼ R ∗ j y , then it does not hold that b (cid:31) R ∗ j a . B.2.
Proof of Theorem 6.

Lemma 12. Assume that Revealed Structure holds, and that $A$ is a generalized average. If Category-SARP, Category Monotonicity*, Category Cancellation*, and Category Continuity* hold, then for any category $k$ there exists a category utility $U_k$ so that for any $x, y \in E^{R,k}$, $(x, k) \succsim^{R} (y, k) \iff U_k(x) \ge U_k(y)$.

Proof.
Fix a category i and pick any x, y ∈ E R,i . Let E ∗ = E R,i T B d ( x,y )+1 ( x ). As inproof of Lemma 3, there is a continuous path θ : [0 , → E ∗ so that θ (0) = x and θ (1) = y that crosses each (cid:37) R,i indifference curve at most once, and Y = θ − ([0 , z ∈ Y , there exists an open set z ∈ B z ⊂ E ∗ sothat (cid:37) R,i is complete on B z . If this is the case, we can mimic the rest of the proof ofLemma 3 to show that either x (cid:37) R,i y or y (cid:37) R,i x .By definition of E ∗ , for any z ∈ E ∗ , there exists S ∈ X with A ( S ) = r so that c ( S ) = z . Since K i ( r ) is open, there exists (cid:15) > B (cid:15) ( z ) ⊂ K i ( r ). ByLemma 11, there exists (cid:15) > r ∈ B (cid:15) ( r ) implies B (cid:15) ( z ) ⊂ K i ( r ). Pick ζ ∈ (0 , ) so that B ζ ( z ) ∩ S = z . By Category Continuity*, there exists (cid:15) > S ∈ X with d ( A ( S ) , A ( S )) < (cid:15) , for any y ∈ S , there is y ∈ S so that y ∈ B (cid:15) ( y ), and S T B ζ ( x ) = ∅ , then c ( S ) ⊂ B ζ ( x ). By Generalized Average, thereexists (cid:15) > z ∈ B (cid:15) ( z ) implies d ( A ( S \ { z } ∪ { z } ) , A ( S )) < min { (cid:15) , (cid:15) } / (cid:15) ∗ = min { (cid:15) , (cid:15) , (cid:15) , (cid:15) , ζ } .Pick any x , y ∈ B (cid:15) ∗ / ( z ) and let z ∗ = z − (cid:15) ∗ . Set S = S \ { z } ∪ { z ∗ } , not-ing d ( r, A ( S )) < (cid:15) /
2. By Generalized Average, there exists S ∗ with { x , y } S S ⊂ S ∗ so that d ( A ( S ∗ ) , A ( S )) < (cid:15) ∗ / d ( S , S ∗ \ [ { x , y } S S ]) < ( (cid:15) ∗ / . Since d ( A ( S ∗ ) , r ) ≤ d ( A ( S ∗ ) , A ( S )) + d ( A ( S ) , r ) < (cid:15) , x , y ∈ K i ( A ( S ∗ )). Since everymember of S ∗ is no more than (cid:15) ∗ away from a member of S , Category Continuity* im-plies that c ( S ∗ ) ⊂ B ζ ( z ). CM* gives that either x ∈ c ( S ∗ ) or y ∈ c ( S ∗ ), so x (cid:37) R,i y or y (cid:37) R,i x .Continuity follows along the same lines as Lemma 2. CM* gives that it is alsomonotone, and Category Cancellation* that it is locally additive. Apply Theorem 2.2of Chateauneuf & Wakker [1993] to get a globally additive representation U k . (cid:3) By Lemma 12, there exists a category utility U k for each category. Since E R,k isdense in D k , we can extend U k to D k uniquely. By Generalized Average and CategoryContinuity*, for any S ∈ X with z ∈ [ D k \ E R k ] ∩ S , there is a z ∈ E R,k arbitrarily closeto z so that c ( S ) = c ([ S \ { z } ] ∪ { z } ), so it is sufficient to establish a representationwhen all alternatives categorized as k in S belong E R,k for each k and S . Fix two regions k and j . By CAR, for any x ∈ E R,k there exists x ∈ E R,k , y ∈ E R,j , and S ∈ X so that x , y ∈ c ( S ) and x ∼ R,k x . This implies there exists astrictly increasing function H so that V ( x | r ) = U k ( x ) when x ∈ K k ( r ) and V ( x | r ) = H ( U j ( x )) when x ∈ K j ( r ) represents choice (when S ⊂ K k S K j ). This is well-definedand represents choice by Category SARP. By AAC*, H is an affine function. Theargument are readily seen to extend inductively to all regions, which complete theproof. (cid:3) B.3.
Proof of Lemma 1.
Pick any x ∈ X and set S = { x, x } where x = ( x , x ).Then, A ( S ) = x by strong generalized average, so both x and x are 1-salient by S4.By CM*, x ∈ c ( S ), and so x ⊂ E R, . x was arbitrary, so X = E R, . Similar for K . (cid:3) B.4.
Proof of Proposition 3.
By Lemma 1, the structure assumption is satisfied. By Theorem 5, the category function is generated by a salience function. By Theorem 6, $c$ conforms to Strong CTM. Mimicking the arguments of Theorem 2, Reference Interlocking implies $U_k(x) = w^k_1 u_1(x_1) + w^k_2 u_2(x_2) + \beta_k$. The rest follows from the arguments that establish Proposition 2. $\square$

B.5.
Proof of Proposition 4.
Pick any $r$, and suppose $U_k(r) \ge U_{-k}(r)$. Since $A$ is a generalized average, for any $y \gg r$ there exists a menu $S$ so that $A(S)$ is arbitrarily close to $r$ and $y \gg x$ for all $x \in S$ (pick $S$ so that its convex hull is in a small enough neighborhood of $r$ that does not include $y$). By making that neighborhood smaller if necessary, either $y$ belongs to $K_k(A(S))$ or $y \notin K_k(r)$. There exists a $y'$ arbitrarily close to, but not equal to, $y$, so that $U_k(y') = U_k(y)$ and $U_{-k}(y') \neq U_{-k}(y)$. In the former case, either $y$ or $y'$ is chosen from the menu $S'$ by Category Monotonicity*, where $S'$ is a menu (assumed to exist by generalized average) with $A(S')$ sufficiently close to $A(S)$ and $y, y' \in S'$. Moreover, $y' \in c(S')$ if it is close enough that it too belongs to $K_k$. Conclude that $y \in K_k(r)$ implies that there exists $y'$ arbitrarily close to, but not equal to, $y$ with $U_k(y') = U_k(y)$ and $\{y, y'\} = c(S')$. If $y \notin K_k(r)$, then $y$ and $y'$ cannot both be chosen: either $y'$ is not chosen because it is in $K_{-k}$, or the DM will not be indifferent between $y$ and $y'$. The rest follows from Proposition 1.
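The identification logic above can be illustrated with a small numerical sketch. It replaces choice from menus with direct utility comparisons and assumes a specific environment that is not taken from the paper's results: the salience function $\sigma(a, b) = \max\{a/b, b/a\}$, linear category utilities with hypothetical weights `W`, and the helper names `region`, `value`, `nearby_on_level_curve`, and `looks_k_salient`. The test asks whether a nearby point on the $U_k$ level curve through $y$ is indifferent to $y$ at reference $r$.

```python
def sigma(a: float, b: float) -> float:
    """Salience of attribute level a against reference level b (assumed form)."""
    return max(a / b, b / a)


def region(x, r):
    """Return 1 or 2 for the more salient attribute, None on the boundary."""
    s1, s2 = sigma(x[0], r[0]), sigma(x[1], r[1])
    if s1 > s2:
        return 1
    if s2 > s1:
        return 2
    return None


# Hypothetical weights: the salient attribute is overweighted in each category.
W = {1: (2.0, 1.0), 2: (1.0, 2.0)}


def u(k, x):
    """Linear category utility U_k (illustrative assumption)."""
    w1, w2 = W[k]
    return w1 * x[0] + w2 * x[1]


def value(x, r):
    """Utility of x at reference r: the category utility of x's region."""
    k = region(x, r)
    if k is None:
        raise ValueError("x lies on the boundary between regions")
    return u(k, x)


def nearby_on_level_curve(k, y, step):
    """A point close to y with the same U_k value but a different U_{-k} value."""
    w1, w2 = W[k]
    return (y[0] + step * w2, y[1] - step * w1)


def looks_k_salient(k, y, r, step=1e-3, tol=1e-9):
    """True if a nearby point on the U_k level curve is indifferent to y at r."""
    y_near = nearby_on_level_curve(k, y, step)
    return abs(value(y_near, r) - value(y, r)) < tol


r = (1.0, 1.0)
print(looks_k_salient(1, (3.0, 2.0), r), looks_k_salient(2, (3.0, 2.0), r))
print(looks_k_salient(1, (2.0, 3.0), r), looks_k_salient(2, (2.0, 3.0), r))
```

In this toy environment the local indifference test succeeds exactly for the category that contains the bundle, mirroring the "if and only if" step of the argument; with choice data, the comparison of values would be replaced by checking whether both bundles are chosen from a suitable menu.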