Choice with Endogenous Categorization

Abstract

We propose and axiomatize the categorical thinking model (CTM), in which the framing of the decision problem affects how agents categorize alternatives, which in turn affects their evaluation of those alternatives. Prominent models of salience, status quo bias, loss-aversion, inequality aversion, and present bias all fit under the umbrella of CTM. This suggests categorization is an underlying mechanism of key departures from the neoclassical model of choice. We specialize CTM to provide a behavioral foundation for the salient thinking model of Bordalo et al. (2013) that highlights its strong predictions and distinctions from other models.


CHOICE WITH ENDOGENOUS CATEGORIZATION ∗

ANDREW ELLIS † AND YUSUFCAN MASATLIOGLU §

Abstract.

We propose a novel categorical thinking model (CTM) where the framing of the decision problem affects how the agent categorizes each product, and the product's category affects her evaluation of the product. We show that a number of prominent models of salience, status quo bias, loss-aversion, inequality aversion, and present bias all fit under the umbrella of CTM. This suggests categorization as an underlying mechanism for key departures from the neoclassical model of choice and an account for diverse sets of evidence that are anomalous from its perspective. We specialize CTM to provide a behavioral foundation for the salient thinking model of Bordalo et al. [2013], highlighting its strong predictions and distinctions from other existing models.

Date: March, 2020.

∗ We thank David Dillenberger, Erik Eyster, Nicola Gennaioli, Matt Levy, Collin Raymond, the anonymous referees, Andrei Shleifer, Rani Spiegler, Tomasz Strzalecki, and conference/seminar participants at BRIC 2017, CETC 2017, SAET 2017, Lisbon Meetings 2017, ESSET 2019, Brown, UPenn, Pompeu Fabra, and Harvard for helpful comments and discussions. This project began at ESSET Gerzensee, whose hospitality is gratefully acknowledged.

† Department of Economics, London School of Economics, Houghton Street, London, WC2A 2AE. Email: [email protected].

§ University of Maryland, 3147E Tydings Hall, 7343 Preinkert Dr., College Park, MD 20742. E-mail: [email protected].

1. Introduction

Psychologists have long held that knowledge about our environment is organized into categories, and that this categorization plays a key role in decision making. Categorization has been used by both humans and animals for thousands of years. As Ashby & Maddox [2005] write, “All organisms assign objects and events in the environment to separate classes or categories... Any species lacking this ability would quickly become extinct.”

Categorization plays a key role in a number of important anomalies for the neoclassical model of choice. Attributes categorized as losses get higher weight relative to those categorized as gains [Tversky & Kahneman, 1991]. An object's most salient attribute plays a disproportionate role in the agent's subsequent evaluation [Bordalo et al., 2013]. Subjects avoid objects they categorize as not-obviously-better-than the status-quo [Masatlioglu & Ok, 2005]. Agents are less patient when deciding between dated rewards in the short-term than in the long-term [Strotz, 1955]. Allocations among members of society are evaluated according to whether inequities are advantageous or disadvantageous [Fehr & Schmidt, 1999].

This paper proposes and axiomatizes a simple model of the role that categorization plays in economic decisions. In the Categorical Thinking Model (CTM), a decision maker (DM) first groups objects together into categories, consciously or unconsciously, then evaluates each object through the lens of the category to which it belongs. The model has two key features motivated by psychological evidence. First, categorization is context-dependent, as summarized by a reference point that may depend on the choice set. Second, how an object is categorized affects its valuation. Prominent models of loss-aversion, salience, status quo bias, present bias, and inequality aversion all fit under the umbrella of CTM. Hence, CTM suggests categorization as an underlying explanation for many key departures from the neoclassical model in many different decision-making environments.

To make our results comparable with previous work, we begin by assuming that a family of reference-dependent preference relations describes the DM's choices for each reference point. Each alternative has a pair of observed attributes, such as price and quality, height and weight, or size and timing of a reward. In CTM, the context in which the decision takes place determines a reference point, which in turn divides the alternatives into categories. Each category has its own utility function, and within a given category, the DM evaluates the options according to it. Hence, the DM makes different trade-offs between the attributes when they are differentially categorized. We show that the DM conforms to CTM if and only if she behaves as a standard DM when comparing objects categorized the same way. That is, her choices satisfy some standard axioms, such as acyclicity, and do not depend on the reference point when restricted to alternatives that belong to the same category.

CTM is a parsimonious approach to incorporating psychological evidence into economics. Psychological factors determine how each alternative is perceived, which CTM captures through different categories. Moreover, they predict how being categorized in a particular way affects the DM's choice, which CTM captures through the category's utility function. For instance, salience and loss-aversion make distinct predictions about when a DM puts higher weight on a dimension. The most salient attribute gets more weight, as does an attribute classified as a loss. Our result shows that CTM closes the model by requiring that the DM acts consistently within the alternatives categorized the same way.

Despite its generality, CTM makes testable predictions and excludes certain types of modeling choices. For instance, a number of models capture salience effects, including the salient thinking model [Bordalo et al., 2013] (BGS), Kőszegi & Szeidl [2013], Bhatia & Golman [2013], Gabaix [2014], and Bushong et al. [2015]. Of these models, only BGS is a CTM. In other words, even the most general version of BGS excludes these models, so BGS offers a different method of modeling salience. Our results highlight trade-offs between the different modeling approaches. For instance, BGS maintains a stronger consistency condition across reference points than does the constant loss aversion model of Tversky & Kahneman [1991], but the latter, unlike BGS, satisfies Monotonicity across regions.

We then provide the first complete characterization of the observable choice behavior equivalent to the BGS model, clarifying and identifying the nature of the assumptions used in the model. The first crucial step towards understanding the model is getting a handle on its novel salience function that determines which attribute stands out for a given reference point. We study the salience function based on a simple observation: while it influences which attribute is salient, the weight given to each attribute is independent of its magnitude. This makes BGS a special case of CTM, so our earlier results allow a characterization.

One key feature of BGS is that the reference point is endogenously determined by the set of available options. Since the salience of each alternative depends on the reference point, varying the budget set affects the salience of, and so the DM's evaluation of, a given alternative. Our final contribution addresses this challenge by extending our characterization of CTM to the setting where the reference point is endogenous. Our primitive is a choice correspondence describing the DM's choices. The menu maps to a reference point, such as the average level of each attribute over alternatives in the set. As long as the reference point varies systematically with the choice problem, we characterize the properties of the choice correspondence equivalent to CTM. Specifically, we show that if the DM's choices obey the natural analogs of our earlier axioms, then CTM rationalizes her behavior. We apply it to provide a completely endogenous characterization of the BGS function.

The paper proceeds as follows. The next subsection provides a brief overview of the relevant psychology literature on categorization. Section 2 introduces CTM and discusses the models covered under its umbrella. Section 3 axiomatizes CTM and compares and contrasts the models of riskless choice discussed in Section 2. Section 4 contains our analysis of the salient thinking model. Section 5 introduces the endogenous reference point setting, and applies our axiomatizations of CTM to it. Section 6 concludes with a discussion of related literature.

1.1. Psychology of Categorization.

There is a long literature in psychology and marketing discussing categorization. Recent review articles include Ashby & Maddox [2005], Loken [2006], Loken et al. [2008] and Cosmides & Tooby [2013]. Much of the literature focuses on how categories are formed, and when new alternatives are added into existing categories. CTM relies on several properties documented by this literature.

First, categories are context dependent. Tversky [1977] and Tversky & Gati [1978] present evidence that replacing one item in a set of objects can drastically alter how people categorize the remaining objects. Tversky & Gati [1978] argue that categorization “is generally not invariant with respect to changes in context or frame of reference.” For example, they show that subjects put East Germany and West Germany into the same category when the salient feature is geography or cultural background, but categorize the two differently if political system is salient. Similarly, Choi & Kim [2016] posit that depending on the context an Apple Watch can be categorized as a tech product, a fashion product, a fitness product, or a simple watch. Ratneshwar & Shocker [1991] show that subjects categorize ice cream and cookies together in terms of similarity (e.g. they are both desserts), but categorize ice cream and hot dogs together in terms of usage benefit (e.g. both are good snacks to have at the pool). Stewart et al. [2002] present evidence that relative magnitude information, derived from a comparison with the reference point, is used in categorization of sounds.

Second, how an object is categorized affects its final valuation. In a classic series of experiments, Rosch [1975] shows that differently categorized but physically identical stimuli are perceptually encoded as distinct objects. Wanke et al. [1999] demonstrate that “wine” is evaluated more positively when categorized with “lobster” than with “cigarettes.” Mogilner et al. [2008] show that categorizing goods differently resulted in different reported satisfaction. Chernev [2011] shows that bundling a healthy food item with a junk food item reduced the reported caloric content beyond that of the junk food alone.

Finally, categories take the form of regions in the alternative space. This tracks very closely with decision bound theory in psychology. As Ashby & Maddox [2005, p. 152] describe, it posits that the subject “partition[s] the stimulus space into response regions... determines which region the percept is in, and then emits the associated response.” Ashby & Gott [1988] show it can accommodate examples incompatible with other theories of category formation, such as prototype theory. Moreover, there is substantial experimental support for it, including Ashby & Waldron [1999], Anderson [1991], Love et al. [2004].

2. Model

To aid in comparison with the existing literature and to separate the effects of reference point formation, we follow Tversky & Kahneman [1991] by taking as given a family of reference-dependent preference relations. We assume that the space of alternatives is $X = \mathbb{R}^n_{++}$, focusing on $n = 2$ when not otherwise noted. We often use the convention of writing $x$ as $(x_i, x_{-i})$ with $x_{-i}$ denoting the components of $x$ other than $i$. The next subsections explore three different interpretations of $X$ in different contexts: as a riskless object with different attributes, as a dated reward or consumption stream, and as an allocation of consumption across individuals. For each reference point $r \in X$, the DM maximizes a complete and transitive preference relation, denoted by $\succsim_r$, over $X$. As usual, $\succ_r$ denotes strict preference and $\sim_r$ indifference. The primitive of the model is a family of such preferences indexed by the set of reference points, $\{\succsim_r\}_{r \in X}$. In this section, we assume that the reference point is exogenously given. We relax this assumption in Section 5 to allow endogenous reference point formation.

Footnote: We note when there is a distinction between general $n$ and $n = 2$. Theorem 5 and the results that rely on it use the full structure of $\mathbb{R}^n_{++}$. The remaining results all generalize to any $X$ that is a finite Cartesian product of open, linearly ordered, separable, connected sets endowed with the order topology, where $X$ itself has the product topology.

2.1. Categorical Thinking Model.

The first ingredient of the model is a mapping from the reference $r$ to categories. Each category corresponds to a different psychological treatment and changes as the reference changes. We allow the categories to have a very general structure.

Definition 1.

A vector-valued function $K = (K^1, K^2, \ldots, K^m)$ is a category function if each $K^k : X \to X$ satisfies the following properties:
(1) $K^k(r)$ is a non-empty, regular open set, and $\mathrm{cl}(K^k(r))$ is connected,
(2) $\bigcup_{k=1}^{m} K^k(r)$ is dense,
(3) $K^k(r) \cap K^l(r) = \emptyset$ for all $k \neq l$, and
(4) $K^k(\cdot)$ is continuous.

Footnotes: Recall that a set $A$ is regular open if $A = \mathrm{int}(\mathrm{cl}(A))$. Continuity in (4) means that each $K^k$ is both upper and lower hemicontinuous when viewed as a correspondence.

Categories arise from the psychology of the phenomenon to be modeled. For CTM to be applicable, the psychology must make an unambiguous prediction about which alternatives are affected. For instance, with gain-loss utility, alternatives that dominate the reference point are treated differently than those better in only one dimension. Similarly, with present-bias, alternatives that pay off sooner than the reference are categorized together. While we take the categories as given, if the psychology only makes partial predictions, then the categorization of other alternatives can often be inferred from choice. Proposition 1 does so for the salient thinking model.

We interpret the properties as follows. Every category contains some alternative for every reference point. If a particular product, say $x$, belongs to the category $k$, then so do all products that are close enough to $x$. There is a path that stays within the category between any two points, so categories cannot be the union of “islands.” Almost every alternative is in at least one category, and none are in two categories. Finally, if the reference point does not change too much, then neither do the categories.

The consumer values each good in a way that depends not only on the product itself, as in the standard neoclassical model, but also on the category to which the product belongs. When alternatives $x$ and $y$ are both categorized in category $k$, the category utility function $U^k : X \to \mathbb{R}$ represents the DM's choices. That is, she prefers $x$ to $y$ if and only if $U^k(x) \geq U^k(y)$. We focus on the effect of categorization on distorting trade-offs, so we require that a category utility function is additively separable and monotonic: $U^k(x) = \sum_{i=1}^{n} U^k_i(x_i)$ where each $U^k_i(\cdot)$ is strictly monotone and continuous. The utility index $U^k_i$ represents the DM's preferences over dimension $i$ when an alternative belongs to the category $k$.

When alternatives belong to different categories, the reference point may affect the DM's choice. If the alternative $x$ lies in the category $k$ when the reference is $r$, that is, $x \in K^k(r)$, then the value of consumption $x$ is represented by $U^k(x|r)$. However, the reference does not affect the utility trade-off within a category. To capture this, we require that $U^k(\cdot|r)$ agrees with $U^k$, in the sense that it is an increasing transformation thereof. Then, $U^k(x|r) \geq U^k(y|r)$ if and only if $U^k(x|r') \geq U^k(y|r')$ for any references $r, r' \in X$. We can now formally define the model as follows.

Definition 2.

The family $\{\succsim_r\}_{r \in X}$ conforms to the Categorical Thinking Model (CTM) under category function $K = (K^1, K^2, \ldots, K^m)$ if for each category $k$ there is a category utility function $U^k$ so that when $x \in K^k(r)$ and $y \in K^l(r)$ for some $r$,
$$x \succsim_r y \iff U^k(x|r) \geq U^l(y|r)$$
and $U^k(\cdot|r)$ is an increasing transformation of $U^k(\cdot)$ for each $r \in X$ and category $k$.

A CTM is increasing if $U^k_i$ is increasing in $x_i$ for every category $k$ and dimension $i$. We also consider two sub-classes: A CTM is affine if $U^k(\cdot|r)$ is an affine transformation of $U^k$ for each $r$. A CTM is strong if $U^k(\cdot|r) = U^k(\cdot)$ for each $r$. Most of the models we discuss below are affine CTM, and those of riskless consumer choice are all increasing.

Footnote (on strict monotonicity of $U^k_i$): That is, $U^k_i$ is either strictly increasing on $\mathbb{R}_+$ or strictly decreasing on $\mathbb{R}_+$.

2.2. Riskless Consumer Choice.

In this subsection, we consider our primary application: riskless consumer choice. The four models are introduced formally, and each is shown to be CTM. Figure 1 plots their indifference curves and categories, with darker lines indicating higher utility.

[Figure 1. CTM for Riskless Choice. Panels: BGS, TK, MO, Prototype Model; axes: Attribute 1 and Attribute 2; each panel is drawn around a reference point $r$.]
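As a rough illustration of Definition 2, the following minimal Python sketch evaluates alternatives under a toy affine CTM with two attributes. Everything in it is an assumption made for illustration and is not taken from the paper: the two-category function (split by whether attribute 1 is above or below the reference), the linear category utilities, the constant `LOSS_WEIGHT`, and all function names are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]  # an alternative or reference point with two attributes

# Hypothetical extra weight on attribute 1 in the "below reference" category.
LOSS_WEIGHT = 3.0


def category(x: Point, r: Point) -> Optional[int]:
    """Toy category function: 0 if x_1 > r_1, 1 if x_1 < r_1.

    Alternatives with x_1 == r_1 lie on the boundary and belong to no
    category (the union of the categories only needs to be dense in X).
    """
    if x[0] > r[0]:
        return 0
    if x[0] < r[0]:
        return 1
    return None


def category_utility(k: int, x: Point) -> float:
    """Additively separable, increasing category utility U^k(x) (assumed linear)."""
    weight_1 = 1.0 if k == 0 else LOSS_WEIGHT
    return weight_1 * x[0] + x[1]


def ctm_value(x: Point, r: Point) -> float:
    """U^k(x | r) = U^k(x) - U^k(r), an affine transformation of U^k."""
    k = category(x, r)
    if k is None:
        raise ValueError("x lies on a category boundary for this reference")
    return category_utility(k, x) - category_utility(k, r)


def weakly_prefers(x: Point, y: Point, r: Point) -> bool:
    """True if x is weakly preferred to y at reference r under the toy CTM."""
    return ctm_value(x, r) >= ctm_value(y, r)


x, y = (2.0, 6.0), (4.0, 3.0)
print(weakly_prefers(x, y, (1.0, 1.0)))  # both alternatives fall in category 0
print(weakly_prefers(x, y, (5.0, 1.0)))  # both alternatives fall in category 1
```

In this sketch the ranking of `x` and `y` flips between the two references only because both alternatives move from one category to the other; for any reference that leaves them in the same category, their ranking is unchanged, which is the within-category consistency that CTM requires.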

Salient Thinking Model (BGS):

Bordalo et al. [2013] propose an intuitive and descriptive behavioral model based on salience. In the model, an attribute receives more weight when it is salient than when it is not. The magnitude of salience is determined by a salience function, $\sigma : \mathbb{R}_{++} \times \mathbb{R}_{++} \to \mathbb{R}_+$. Given a reference $(r_1, r_2)$, attribute 1 is salient for good $x$ if $\sigma(x_1, r_1) > \sigma(x_2, r_2)$, and attribute 2 is salient for good $x$ if $\sigma(x_1, r_1) < \sigma(x_2, r_2)$. That is, the salient attribute is the one that differs the most from the reference according to the salience function. We describe the properties of $\sigma$ more fully in Section 4.

Definition 3.

The family $\{\succsim_r\}_{r \in X}$ has a BGS$(\sigma; w^1, w^2, u_1, u_2)$ representation if each $\succsim_r$ is represented by
$$V_{BGS}(x|r) = \begin{cases} w^1_1 u_1(x_1) + w^1_2 u_2(x_2) & \text{if } \sigma(x_1, r_1) > \sigma(x_2, r_2) \\ w^2_1 u_1(x_1) + w^2_2 u_2(x_2) & \text{if } \sigma(x_2, r_2) > \sigma(x_1, r_1) \end{cases} \qquad (1)$$
for a salience function $\sigma$, strictly positive weights with $\frac{w^1_1}{w^1_2} > \frac{w^2_1}{w^2_2}$, and each $u_i$ strictly increasing.

To illustrate this model, consider the salience function proposed by BGS:
$$\sigma(x_k, r_k) = \frac{|x_k - r_k|}{x_k + r_k}.$$
Based on it, the left-upper panel in Figure 1 illustrates BGS. There are two categories: those that are 1-salient, i.e. $\sigma(x_1, r_1) > \sigma(x_2, r_2)$, and those that are 2-salient, i.e. $\sigma(x_2, r_2) > \sigma(x_1, r_1)$. To visualize them, note that the entire product space is divided into four distinct areas by the two dashed curves that intersect at the reference point. The areas lying to the north and south of the reference point are categorized as the 2-salient products. Similarly, 1-salient products lie east and west of the reference point. The figure incorporates indifference curves as well, holding fixed the reference point. There are two potential sets of indifference curves, illustrated by dotted lines. Depending on the category, one of the two is utilized to determine the DM's choice. When attribute 1 is salient, the steeper one becomes the indifference curve since it puts higher weight on the first attribute. Conversely, the flatter one is the indifference curve when attribute 2 is salient. We draw two different indifference curves, where the darker color corresponds to higher utility.

Constant Loss Aversion Model (TK):

Tversky & Kahneman [1991] provide foundations for a reference-dependent model that extends Prospect Theory to riskless consumption bundles. Each is evaluated relative to reference point $r$, and losses loom larger than gains. In the absence of losses, the DM values each alternative with an additive utility function, $u_1(x_1) - u_1(r_1) + u_2(x_2) - u_2(r_2)$, which attaches equal weight to each attribute. If she experiences a loss in attribute $i$, then she inflates the weight attached to that attribute by $\lambda_i > 1$. There are four different categories in the TK formulation: (i) gain in both dimensions, (ii) gain in the first dimension and loss in the second dimension, (iii) loss in the first dimension and gain in the second dimension, and (iv) loss in both dimensions (see the right-upper panel in Figure 1). We model this as $K^{GL} = (K^1, K^2, K^3, K^4)$ where $K^1(r) = \{x : x \gg r\}$, $K^2(r) = \{x : x_1 < r_1 \text{ and } x_2 > r_2\}$, $K^3(r) = \{x : x_1 > r_1 \text{ and } x_2 < r_2\}$, and $K^4(r) = \{x : x \ll r\}$; call this the gain-loss category function. Then, the utility function is
$$V_{TK}(x|r) = \begin{cases} u_1(x_1) - u_1(r_1) + u_2(x_2) - u_2(r_2) & \text{if } x \in K^1(r) \\ \lambda_1 (u_1(x_1) - u_1(r_1)) + u_2(x_2) - u_2(r_2) & \text{if } x \in K^2(r) \\ u_1(x_1) - u_1(r_1) + \lambda_2 (u_2(x_2) - u_2(r_2)) & \text{if } x \in K^3(r) \\ \lambda_1 (u_1(x_1) - u_1(r_1)) + \lambda_2 (u_2(x_2) - u_2(r_2)) & \text{if } x \in K^4(r) \end{cases}$$
where $\lambda_1, \lambda_2 > 1$ and each $u_i$ is strictly increasing. TK is a special case of affine CTM with four categories defined by a gain-loss category function.

Status Quo Bias Model (MO):

Masatlioglu & Ok [2005] model individuals who experience some form of psychological discomfort when they have to abandon their status quo option. This discomfort imposes an additional utility cost. Of course, if an alternative is unambiguously superior to the status quo, the DM does not feel any psychological discomfort to forgo the status quo; in such cases there will be no cost. Formally, $Q(r)$ is a closed set denoting the alternatives that are unambiguously superior to the default option $r$ (see the left-bottom panel of Figure 1). If an alternative does not belong to this set, then the DM pays a cost $c(r) > 0$, which may depend on the reference point, to move away from the status quo. In this model, there are two categories $K^{MO} = (K^1, K^2)$ where $K^1(r) = \{x \mid x \in \mathrm{int}(Q(r))\}$ and $K^2(r) = \{x \mid x \notin Q(r)\}$. For any $x \neq r$, we have
$$V_{MO}(x|r) = \begin{cases} u_1(x_1) + u_2(x_2) & \text{if } x \in K^1(r) \\ u_1(x_1) + u_2(x_2) - c(r) & \text{if } x \in K^2(r). \end{cases}$$
This is an example of an affine CTM for general $c$, and a strong CTM when $c(r)$ is constant.

Prototype Theory (PT):

Prototype theory was first proposed by Posner & Keele [1970]. According to it, each category is associated with a prototype, its “most typical” member. Initial categorization is determined by comparing each product to each prototype. We now formalize this idea and show that this is CTM.

There are $m$ prototypes, $p^1, \ldots, p^m$. The DM categorizes alternatives according to how similar they are to a given prototype. Then, category $K^i(r)$ is the set of alternatives categorized as most similar to prototype $p^i$. Similarity may depend on the reference. There is a family of metrics indexed by $r$ so that $d_r(x, y)$ indicates how far away the DM perceives $x$ to be from $y$ given reference $r$. Formally, $K^P = (K^1, \ldots, K^m)$ where
$$K^i(r) = \{x : i = \arg\min_j d_r(p^j, x)\},$$
and the DM evaluates alternatives in category $i$ according to
$$V^i_{PT}(x|r) = U(p^i) + \lambda^i_1 (x_1 - p^i_1) + \lambda^i_2 (x_2 - p^i_2) \quad \text{if } x \in K^i(r)$$
where $U(\cdot)$ is a hedonic utility function and $\lambda^i_j > 0$. A particularly interesting specification is where $\lambda^i_j = \frac{\partial}{\partial p^i_j} U(p^i)$. Then, the DM approximates the utility of $x$ according to a first-order Taylor expansion around the prototype most similar to it (see the right-bottom panel of Figure 1). This is an example of a strong CTM.

2.3. Time Preferences.

We apply our model to choices of dated rewards. The pair $(x, t)$ represents a payment of $x$ at time $t$. Motivated by present bias, we propose a model where the DM divides time periods according to short term and long term. Given a reference $r = (r_x, r_t)$, rewards arriving before $r_t$ are perceived as short-term and those arriving after $r_t$ as long-term. Hence $K^{short}(r) = \{(x, t) \mid t < r_t\}$ and $K^{long}(r) = \{(x, t) \mid t > r_t\}$. The utility function is
$$V_{QH}((x, t)|(r_x, r_t)) = \begin{cases} (\beta\delta)^t u(x) & \text{if } (x, t) \in K^{short}(r) \\ \beta^{r_t} \delta^t u(x) & \text{if } (x, t) \in K^{long}(r) \end{cases}$$
where $0 < \delta < 1$ and $0 < \beta \leq 1$. The model is additively separable after taking logs, so it is a special case of CTM. It exhibits present bias when $\beta < 1$: there exist values $y > x > 0$ such that $(x, \tau) \succsim_r (y, \tau + 1)$ if and only if $\tau < r_t - 1$. Figure 2 plots its indifference curves.

Footnote (to the prototype panel of Figure 1): In the figure, we use $d_r(p^j, x) = \frac{d(p^j, x)}{d(p^j, r)}$ where $d$ is the Euclidean metric.

[Figure 2. CTM for Dated Rewards. Axes: Money and Time; regions $K^{short}(r)$ and $K^{long}(r)$ around the reference $r$.]

2.4. Social preferences.

Our final application is to consumption allocations. Alternatives assign consumption to each of $n$ agents, labeled $1, \ldots, n$. Dimension 1 corresponds to the DM's own consumption. We consider inequality-averse and social-welfare concerned DMs.

Inequality Aversion:

Fehr & Schmidt [1999] introduce a model of inequality aversion. The DM experiences envy, i.e. she dislikes having a lower allocation than another, and guilt, i.e. she prefers others not to have less consumption than her. We present a generalization of their model where a reference point affects how much envy or guilt the DM feels. Envy and guilt are generated by the difference between how much better agent $i$ is relative to $i$'s reference consumption and how much better the DM is relative to her own reference. Some reference-dependence makes sense when considering decisions that impact the social alternative: the DM may not experience much guilt if agent 2's consumption is low in every feasible allocation.

Footnote (to the present-bias example in Section 2.3): For instance, $u(x) = 1$ and $u(y) = (\beta\delta)^{-1}$.

In the Relative Inequality Aversion (RIA) model, the DM $i$ feels guilty if her own relative gain $(x_i - r_i)$ is higher than the relative gain of individual $j$, $(x_j - r_j)$. Otherwise, individual $i$ is jealous of individual $j$ since $x_j - r_j > x_i - r_i$. Given that the reference point is $r$, the value function of the DM equals
$$V^{RIA}_i(x|r) = x_i - \frac{\alpha}{n-1} \sum_{j \neq i} \max\{(x_j - r_j) - (x_i - r_i), 0\} - \frac{\beta}{n-1} \sum_{j \neq i} \max\{(x_i - r_i) - (x_j - r_j), 0\}.$$
Observe that when $r_i = r_j$ for all $i$ and $j$, the utility function reduces to that of Fehr & Schmidt [1999]. Throughout, we follow them in assuming that $\alpha \geq \beta \geq 0$ and $\beta < 1$. For $n = 2$, the category function is $K^{RIA} = (K^1_R, K^2_R)$ where
$$K^1_R(r) = \{x \in X : x_1 - r_1 > x_2 - r_2\} \quad \text{and} \quad K^2_R(r) = \{x \in X : x_1 - r_1 < x_2 - r_2\}.$$
The set $K^j_R(r)$ contains allocations where individual $j$ gets a relatively better deal than the other. The relative inequality aversion model can be written as
$$V^{RIA}(x|r) = \begin{cases} x_1 - \alpha [(x_2 - r_2) - (x_1 - r_1)] & \text{if } x \in K^2_R(r) \\ x_1 - \beta [(x_1 - r_1) - (x_2 - r_2)] & \text{if } x \in K^1_R(r) \end{cases}$$
which is an affine CTM.

Distributional Preferences:

Charness & Rabin [2002] propose a model of social preferences where utility is increasing with the minimum of all individuals' payoffs and the total of all individuals' payoffs. The DM $i$ maximizes
$$V_i(x) = (1 - \lambda) x_i + \lambda \Big[\delta \min\{x_1, x_2, \ldots, x_n\} + (1 - \delta) \sum_k x_k\Big].$$
The parameter $\delta \in (0, 1)$ measures the degree of concern for helping the worst-off individual (Rawlsian) versus maximizing the total social payoffs (Utilitarian). The parameter $\lambda \in (0, 1)$ measures how the DM balances social welfare with her own utility, where $\lambda = 0$ captures pure self-interest.

[Figure 3. Left: Relative Inequality Aversion. Right: Reference-Dependent Distributional Preferences. Axes: Individual 1 and Individual 2; each panel shows the two category regions around the reference $r$.]

We propose a natural extension of their model with an exogenously given reference point. We call this model Reference-Dependent Distributional Preferences. That is,
$$V^{CR}_i(x|r) = (1 - \lambda) x_i + \lambda \Big[\delta \min\{x_1 - r_1, x_2 - r_2, \ldots, x_n - r_n\} + (1 - \delta) \sum_k x_k\Big].$$
According to this model, each individual cares to maximize the minimum possible relative payoff $x_j - r_j$. Note that if $r_i = r_j$ for all $i$ and $j$, this model encompasses the model of Charness & Rabin [2002] as a special case.

We show that this model is CTM. To do that, we first define categories for this model. Each category corresponds to the individual who has the worst relative payoff. In this case, $K^{CR} = (K^1, \ldots, K^n)$ where
$$K^j(r) = \{x \in X : (x_j - r_j) \text{ is the minimum of } \{x_1 - r_1, x_2 - r_2, \ldots, x_n - r_n\}\},$$
and
$$V^{CR}_i(x|r) = (1 - \lambda) x_i + \lambda \Big[\delta (x_j - r_j) + (1 - \delta) \sum_k x_k\Big] \quad \text{if } x \in K^j(r),$$
showing that the model is an affine CTM.

3. Behavioral Foundation for CTM

In this section, we provide a set of behavioral postulates characterizing increasing CTM. These postulates represent the key features of the model. We show that they hold if and only if the data is representable by increasing CTM, rendering the model behaviorally testable. In subsequent subsections, we explore the various strengthenings of the model and provide axiomatizations of these as well.

For each category $k$, define the revealed ranking within that category $\succsim^k$ so that $x \succsim^k y$ if and only if there exists $r$ such that $x, y \in K^k(r)$ and $x \succsim_r y$. The sub-relations $\succ^k$ and $\sim^k$ are defined in the usual way. The ranking $\succsim^k$ captures preference within category $k$. The following axiom states that the within-category revealed preference has no cycles.

Axiom 1 (Weak Reference Irrelevance). The relation $\succsim^k$ is acyclic. That is, if $x^1 \succsim^k x^2 \succsim^k \cdots \succsim^k x^m$, then $x^m \not\succ^k x^1$.

Weak Reference Irrelevance ensures that the DM reacts consistently to alternatives when they are categorized the same way. That is, the categories reflect the DM's psychological treatment of the alternative. Although she may have choice cycles, these cycles occur only when the context changes how the DM categorizes alternatives. Since $\succsim^k$ is acyclic, we can take its transitive closure to derive full comparisons. Let $\succsim^{k*}$ be its transitive closure, with $\succ^{k*}$ and $\sim^{k*}$ the asymmetric and symmetric parts.

Within a category, preference has an additive structure. The next axiom implies that each $\succsim_r$ satisfies Cancellation when restricted to a given category.

Axiom 2 (Category Cancellation). For all $x_1, y_1, z_1, x_2, y_2, z_2 \in \mathbb{R}_+$, $r \in X$, and category $j$ so that $(x_1, z_2), (z_1, y_2), (z_1, x_2), (y_1, z_2), (x_1, x_2), (y_1, y_2) \in K^j(r)$: If $(x_1, z_2) \succsim_r (z_1, y_2)$ and $(z_1, x_2) \succsim_r (y_1, z_2)$, then $(x_1, x_2) \succsim_r (y_1, y_2)$.

Category Cancellation adapts the well-known Cancellation axiom to our setting, differing in its requirement that the alternatives belong to the same category. Without the qualifiers on how alternatives are categorized, the axiom is a well-known necessary condition for an additive representation that appears in Krantz et al. [1971] and Tversky & Kahneman [1991], among others. If $X$ has strictly more than two dimensions, then we can replace it with the analog of P2 [Savage, 1954]; see Debreu [1959].

The next axiom requires that Monotonicity holds between objects categorized the same way.

Axiom 3 (Category Monotonicity (CM)) . For any x, y, r ∈ X : if x ≥ y and x = y ,then y (cid:37) k ∗ x for any category k ; in particular, if x, y ∈ K k ( r ), then x (cid:31) r y .Since both attributes are “goods” as opposed to “bads,” Monotonicity means thatif a product x contains more of some or all attributes, but no less of any, than anotherproduct y , then x is preferred to y . The postulate requires that choice respects Mono-tonicity for alternatives within the same category. However, it does not require thatthis comparison holds when the goods belong to diﬀerent categories, and we shall seelater that salience can distort comparisons enough to cause Monotonicity violations.Finally, the family of preference relations is suitably continuous. Axiom 4 (Category Continuity) . For any r ∈ X and any x ∈ S i K i ( r ), the sets U C j ( x ) = { y ∈ K j ( r ) : y (cid:31) r x } and LC j ( x ) = { y ∈ K j ( r ) : x (cid:31) r y } are open. Formally, for any x, y, x , y ∈ K k ( r ) and subset of indexes E , if x i = x i and y i = y i for i ∈ E , x i = y i and x i = y i for all i / ∈ E , and x (cid:37) r y , then x (cid:37) r y . This is implied by Category Monotonicity when n = 2, so a stronger condition is necessary. Moreover, the set ( x ∈ [ i K i ( r ) : U C j ( x ) [ LC j ( x ) = K j ( r ) and U C j ( x ) = K j ( r ) and LC j ( x ) = K j ( r ) ) has an empty interior.Category continuity adapts the usual continuity condition to apply only withina category. It says that when y is preferred to x in a given context and y is closeenough to y , then y is also preferred to x , provided that y belongs to the samecategory as y . The ﬁnal condition requires that if an alternative x is neither betterthan everything within category j nor worse than everything within category j , thenthere exists something in category j that is as good as x , or as good as somethingarbitrarily close to x . 
For such an $x$, the category must intersect almost all indifference curves close to $x$'s since each category is almost connected.

Finally, we make a structural assumption.

Assumption (Structure). The category function $K$ is such that for any category $k$, the following sets are connected: $E^k = \bigcup_{r \in X} K_k(r)$, $\{x \in E^k : x_i = s\}$ for all dimensions $i$ and scalars $s$, and $\{y \in E^k : x \sim_k^* y\}$ for all $x \in E^k$.

The Structure Assumption is satisfied by all the models we discussed in the previous section. Indeed, $E^k = \mathbb{R}^n_{++}$ for every category in these models. These conditions establish that the objects categorized in the same way have enough topological structure so that “local” properties can be extended to global ones. Chateauneuf & Wakker [1993] show that the structure assumption, applied to a single preference relation and domain, is needed to guarantee that a local additive representation implies a global one.

Theorem 1.

Assume the Structure Assumption holds. The family $\{\succsim_r\}_{r \in X}$ satisfies Weak Reference Irrelevance, Category Cancellation, Category Monotonicity, and Category Continuity for $K$ if and only if it conforms to increasing CTM under $K$.

Increasing CTM captures the behavior implied by the axioms, so we call Axioms 1-4 the CTM axioms. Taken together, they establish that the DM acts rationally when restricting attention to alternatives categorized in the same way for a given reference point. That is, CTM captures a DM who differs from the neoclassical model only when alternatives are categorized differently. The theorem reveals that a number of other reference-dependent models that have been studied in the literature fall outside the scope of our analysis. For instance, Bhatia & Golman [2013], Munro & Sugden [2003], the non-constant loss averse version of Tversky & Kahneman [1991], and the continuous version of the salient thinking model (see online appendix of Bordalo et al. [2013]) all violate Weak Reference Irrelevance for any specification of the category function. We provide the details in Appendix A.6.

We provide a brief outline of how the proof works; all omitted proofs can be found in the appendix. The axioms are sufficient for a “local” additive representation of $\succsim_r$ (and thus $\succsim_k$) on an open ball around each alternative within category $k$. The Structure Assumption allows us to apply Theorem 2.2 of Chateauneuf & Wakker [1993] to aggregate the local additive representation of $\succsim_k$ into a global one. To do so, we must establish that the global preference is complete, transitive, monotone, and continuous. We establish these properties for preference within each category by showing that the transitive closure of each $\succsim_k$ is complete and suitably continuous. The remainder of the proof shows that Category Continuity allows us to stitch the different within-category representations together into an overall utility function.

3.1. Reweighting.

In all of the models discussed in Section 2.2, the DM evaluates the difference between alternatives categorized in the same way similarly. That is, regardless of the category, the DM agrees on how much better a value of $x$ versus $y$ is in dimension $i$. Categorization affects only how much weight she puts on each dimension. This is captured by the following axiom.

Axiom 5 (Reference Interlocking). For any $a, b, a', b', x', y', x, y \in X$ and categories $k, j$ with $x_{-i} = a_{-i}$, $y_{-i} = b_{-i}$, $x'_{-i} = a'_{-i}$, $y'_{-i} = b'_{-i}$, $x_i = x'_i$, $y_i = y'_i$, $a_i = a'_i$, $b_i = b'_i$: if $x \sim_k y$, $a \succsim_k b$, and $x' \sim_j y'$, then it does not hold that $b' \succ_j a'$.

The term “Reference Interlocking” comes from Tversky & Kahneman [1991]. If each $\succsim_k$ is complete, then their statement of it is equivalent given the other axioms. Roughly, the DM agrees on the difference in utilities along a given dimension regardless of how an alternative is categorized. To interpret, observe that the first pair of comparisons reveals that the difference between $a_i$ and $b_i$ exceeds that between $x_i$ and $y_i$ when the alternatives belong to category $k$. For alternatives categorized in $j$, the DM should not reveal the opposite ranking. We defer to the above paper for a detailed discussion.

Theorem 2.

Suppose that $\{\succsim_r\}_{r \in X}$ conforms to increasing CTM under $K$ and each $E^k$ is connected. For each dimension $i$, there exist a utility index $u_i$ and a weight $w_i^k > 0$ for each category $k$ so that each category utility $U^k$ is cardinally equivalent to one that maps each $x \in E^k$ to $\sum_i w_i^k u_i(x_i)$ if and only if Reference Interlocking holds.

All of the models in Section 2.2 satisfy the axiom, and are thus special cases of increasing CTM satisfying Reference Interlocking. For instance, differences in the salient dimension of BGS receive higher weight, but the relative size of two given differences in the same dimension is the same regardless of whether both are salient or both are not. The axiom implies that the utility index within each category must be the same, up to an increasing, affine transformation.

3.2. Behavioral Foundation for Affine CTM.

In this section, we explore when an affine CTM exists. That is, when is $U^k(\cdot|r)$ a positive affine transformation of $U^k(\cdot|r')$ for any $r, r'$? All of the models from Section 2.2 fall into this class. [Footnote: For MO, this is true only when $c(r) < \infty$.]

Unsurprisingly, the key restriction relative to CTM is that tradeoffs across categories are affine. As is usual, this is captured by a form of linearity, or the “Independence Axiom.” We require it to hold only when alternatives combined belong to the same category, and adjust for the curvature of the utility index.

To state the key axiom, we define an operation $\oplus^k$ along similar lines as Ghirardato et al. [2003]. For $x, y \in \mathbb{R}$ and a category $k$, $\tfrac12 x \oplus_i^k \tfrac12 y = z$ when there exist $a, b$ such that $(x_i, a_{-i}) \sim_k^* (z_i, b_{-i})$ and $(z_i, a_{-i}) \sim_k^* (y_i, b_{-i})$. If $\succsim_k$ has an additive representation, then $\tfrac12 U_i^k(x) + \tfrac12 U_i^k(y) = U_i^k(z)$. Define $\oplus^k$ similarly for alternatives: $\tfrac12 x \oplus^k \tfrac12 y = z$ if and only if $z_i = \tfrac12 x_i \oplus_i^k \tfrac12 y_i$ for each dimension $i$. Finally, define $\alpha x \oplus^k (1-\alpha) y$ by taking limits. We note that if $U_i^k$ is linear, then $\alpha x \oplus_i^k (1-\alpha) y = \alpha x + (1-\alpha) y$.

Axiom 6 (Affine Across Categories (AAC)). For any $r \in X$, $x, x', \alpha x \oplus^j (1-\alpha) x' \in K_j(r)$, and $y, y', \alpha y \oplus^k (1-\alpha) y' \in K_k(r)$: if $x \succsim_r y$ and $x' \succsim_r y'$, then $\alpha x \oplus^j (1-\alpha) x' \succsim_r \alpha y \oplus^k (1-\alpha) y'$.

This axiom is a natural adaptation of the linearity axiom, a close relative of the independence axiom. If we strengthened Affine Across Categories to be stated using the traditional linearity condition, then we would obtain a representation where each $U^k(\cdot|r)$ is itself an affine function. Otherwise, it requires that the $\oplus^k$ operation preserves indifference.

The second axiom deals with a technical issue.

Axiom 7 (Unbounded).
For any $r \in X$: if $K_k(r)$ contains a sequence $x_n$ so that $U^k(x_n) \to \infty$ ($-\infty$), then for any $x \in X$ there exists $x^* \in K_k(r)$ so that $x^* \succ_r x$ ($x \succ_r x^*$).

We note that $U^k$ is unique up to a positive affine transformation. Hence whenever the utility of some sequence goes to infinity for some representation of $\succsim_k$, it must converge to infinity for any other representation as well. While the axiom can be stated in terms of primitives, we instead state it in terms of the $U^k$. It ensures that a category containing alternatives whose utility goes to positive (negative) infinity must contain an alternative better (worse) than any other given alternative. If it failed, then no affine transformation of the category utility would represent the preference.

[Footnote, on the $\oplus^k$ operation: In general, $\alpha x \oplus^k (1-\alpha) y$ need not exist. However, it does exist “locally,” which is all we require in the proof. That is, if $x \in K_k(r)$, then there exists an open set $O$ with $x \in O$ on which $\alpha y \oplus^k (1-\alpha) z$ exists for every $\alpha \in [0, 1]$ and $y, z \in O$.]
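To make the mixture operation used in Affine Across Categories concrete, the following minimal sketch computes mixtures through an additive utility index: under an additive representation, the mixture of two attribute values is the value whose utility equals the corresponding weighted average of utilities. The helper name `utility_mixture`, the logarithmic index, and the numerical values are illustrative assumptions, not part of the model.

```python
import math

def utility_mixture(alpha, x, y, u, u_inv):
    """Return z with u(z) = alpha*u(x) + (1-alpha)*u(y).

    Hypothetical helper: u is an assumed strictly increasing utility index
    for one dimension and u_inv is its inverse.
    """
    return u_inv(alpha * u(x) + (1 - alpha) * u(y))

# Assumed (illustrative) concave index u(x) = log(x): the half-mixture of
# 1 and 4 is their geometric mean.
z_half = utility_mixture(0.5, 1.0, 4.0, math.log, math.exp)

# With a linear index the mixture reduces to the usual convex combination.
identity = lambda t: t
z_lin = utility_mixture(0.25, 2.0, 10.0, identity, identity)

# Check the two indifference conditions that define the half-mixture, using
# an assumed additive utility U(v1, v2) = log(v1) + log(v2).
U = lambda v1, v2: math.log(v1) + math.log(v2)
a = 3.0
# Choose b so that (x, a) is indifferent to (z_half, b).
b = math.exp(math.log(1.0) + math.log(a) - math.log(z_half))
first_gap = U(1.0, a) - U(z_half, b)    # (x, a) versus (z, b)
second_gap = U(z_half, a) - U(4.0, b)   # (z, a) versus (y, b)
```

Both gaps vanish up to rounding, so the value returned for the half-mixture is the utility midpoint that the two indifference conditions pin down.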

Theorem 3.

Assume the Structure Assumption holds. Then, $\{\succsim_r\}_{r \in X}$ satisfies the CTM axioms, Affine Across Categories, and Unbounded for $K$ if and only if it conforms to Affine Increasing CTM under $K$.

All the models discussed in Section 2 fall into the class of Affine CTM, so the result reveals the behavior all have in common. Relative to CTM, Affine Across Categories imposes stronger requirements on how the DM relates alternatives in different categories. Not only does the DM evaluate utility within a category using an additive function, but the additive structure persists across categories. Moreover, this aids with interpreting utility differences. If every pair of categories contains alternatives indifferent to one another, the entire representation is unique up to a common positive affine transformation. We call the combination of Axioms 1-4 and 6-7 the Affine CTM axioms.

3.3. Behavioral Foundation for Strong CTM. For a strong CTM, changing the reference point does not reverse the ranking of two products unless it also changes their categorization. The following axiom imposes this.

Axiom 8 (Reference Irrelevance). For any $x, y, r, r' \in X$: if $x \in K_k(r) \cap K_k(r')$ and $y \in K_l(r) \cap K_l(r')$, then $x \succsim_r y$ if and only if $x \succsim_{r'} y$.

For the general CTM, the reference point influences choice through two channels: the category to which an alternative belongs and its valuation. The axiom eliminates the latter. When comparing two alternatives across different reference points, the DM's relative ranking does not change when neither's category changes. This property greatly limits the effect of the reference point. In fact, a sufficiently small change in the reference never leads to a preference reversal.

[Footnote, on Axiom 7 (Unbounded): The statement in terms of primitives involves standard sequences and does not reveal key aspects of behavior, so we instead present the simpler and easier to interpret one above. In special cases, this is easy to do. For instance, if $U^k$ is linear, then the axiom simply states that if $K_k(r)$ is an unbounded set, then the conclusion of the above axiom holds.]
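As a rough illustration of Reference Irrelevance, the sketch below builds a toy strong CTM with two attributes. It assumes the salience function $\sigma(a, b) = |a - b| / (a + b)$, a common homogeneous-of-degree-zero specification in the salient-thinking literature, together with linear utility indices and an assumed weight of 0.7 on the salient attribute. The names `salience`, `category`, `value`, and `prefers`, and all numbers, are hypothetical.

```python
def salience(a, b):
    """Assumed HOD salience function on positive reals: |a - b| / (a + b)."""
    return abs(a - b) / (a + b)

def category(x, r):
    """Index (0 or 1) of the salient attribute of x given reference r; None on ties."""
    s0, s1 = salience(x[0], r[0]), salience(x[1], r[1])
    if s0 == s1:
        return None  # uncategorized alternative
    return 0 if s0 > s1 else 1

def value(x, k, w_salient=0.7):
    """Category utility: weight w_salient on attribute k, with no direct role for r."""
    w = [1 - w_salient, 1 - w_salient]
    w[k] = w_salient
    return w[0] * x[0] + w[1] * x[1]

def prefers(x, y, r):
    """True if x is weakly preferred to y under reference r (both categorized)."""
    return value(x, category(x, r)) >= value(y, category(y, r))

x, y = (6.0, 2.0), (3.0, 4.0)
r1, r2 = (3.0, 2.5), (3.2, 2.4)   # nearby references: categories unchanged
r3 = (6.0, 4.0)                   # distant reference: both categories flip
```

In this toy example the ranking of `x` over `y` is the same under `r1` and `r2`, because neither alternative changes category, while `r3` changes both categories and reverses the ranking. The reference matters only through categorization, which is what the axiom requires.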

Theorem 4.

Assume the Structure Assumption holds and for any categories $i, j$ and any $r \in X$, there exist $x \in K_i(r)$ and $y \in K_j(r)$ with $x \sim_r y$. Then, $\{\succsim_r\}_{r \in X}$ satisfies the Affine CTM axioms and Reference Irrelevance for $K$ if and only if it conforms to Strong, Increasing CTM under $K$.

Since BGS, MO, and PT are Strong CTM, Theorem 4 characterizes the behavior they have in common. While the reference plays a role in categorization, it plays no role in choice after categorization is taken into account. TK, which belongs to affine CTM but not strong CTM, must therefore violate Reference Irrelevance.

3.4. Comparing Models of Riskless Choice.

TK, BGS, MO, PT, and the neoclassical model all conform to Affine CTM, so Theorems 1 and 3 describe the behavior that they have in common. However, the analysis so far, as well as the functional forms of the models, leaves open the question of what behavior distinguishes them. Of course, they differ in how alternatives are categorized, but the models also reflect distinct behavior within and across categories.

In addition to Reference Irrelevance, they are distinguished by whether they satisfy two classic axioms: Monotonicity and Cancellation, the unrestricted versions of Category Monotonicity and Category Cancellation. The first requires that a dominant bundle is chosen, and the latter that an additive structure obtains. The representation theorem of Tversky & Kahneman [1991] imposes those two axioms in addition to continuity. In Appendix A.8, we show that an affine CTM with a Gain-Loss category function satisfies the two classic axioms and continuity if and only if it has a TK representation. We provide a detailed examination of the BGS model in Section 4.

[Footnote: The formal statements are obtained by dropping the requirement in those two axioms that the alternatives belong to the same category.]

Table 1 compares the four models in terms of Reference Irrelevance, Monotonicity and Cancellation, when BGS, TK, MO, and PT do not coincide with the neoclassical model. Only the neoclassical model satisfies all conditions; none of the other four do. On the one hand, BGS and PT satisfy Reference Irrelevance but violate Monotonicity and Cancellation. On the other, TK maintains Monotonicity and Cancellation but violates Reference Irrelevance. Finally, MO satisfies all but Cancellation.

                        Neoclassical   BGS   TK   MO   PT
CTM                          ✓          ✓    ✓    ✓    ✓
Monotonicity                 ✓          ✗    ✓    ✓    ✗
Reference Irrelevance        ✓          ✓    ✗    ✓    ✓
Cancellation                 ✓          ✗    ✓    ✗    ✗

Table 1. Comparisons of Models

We provide a plausible example violating the Cancellation axiom, and hence behavior inconsistent with TK. Then, we illustrate that BGS can accommodate this example even without requiring a shift in the reference point. While the example is one simple test to distinguish BGS from TK, it is also powerful as it works for a fixed reference point.
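To see why choices of the kind described in the example that follows conflict with Cancellation, the sketch below writes each option as a pair (negative of the price paid, quality), with $q_{fs}$, $q_{as}$, and $q_w$ denoting the qualities of French Syrah, Australian Shiraz, and water, all under one reference point $r$. The additive indices $u_1$ and $u_2$ are an assumption made only for the sake of the contradiction.

```latex
% The three strict preferences in the wine-bar example:
\begin{align*}
(-8, q_{fs}) &\succ_r (-2, q_{as}), &
(-2, q_{w}) &\succ_r (-10, q_{fs}), &
(-10, q_{as}) &\succ_r (-8, q_{w}).
\end{align*}
% Cancellation with x_1 = -8, z_1 = -2, y_1 = -10, x_2 = q_w, y_2 = q_{as}, z_2 = q_{fs}:
% the first two preferences are its premises, so it requires
% (-8, q_w) \succsim_r (-10, q_{as}), which the third preference contradicts.
%
% Equivalently, suppose an additive representation u_1 + u_2 existed:
\begin{align*}
u_1(-8) + u_2(q_{fs}) &> u_1(-2) + u_2(q_{as}),\\
u_1(-2) + u_2(q_{w}) &> u_1(-10) + u_2(q_{fs}),\\
u_1(-10) + u_2(q_{as}) &> u_1(-8) + u_2(q_{w}).
\end{align*}
% Summing the three inequalities, both sides equal
% u_1(-8) + u_1(-2) + u_1(-10) + u_2(q_{fs}) + u_2(q_{as}) + u_2(q_{w}),
% so the sum reads s > s, which is impossible.
```

The argument does not use the numerical values of the qualities, so the violation of Cancellation depends only on the pattern of the three choices.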

Example 1. Consider a consumer who visits the same wine bar regularly. The bartender occasionally offers promotions. The customer prefers to pay \$8 for a glass of French Syrah rather than \$2 for a glass of Australian Shiraz. At the same time, she prefers to pay \$2 for a bottle of water rather than \$10 for the glass of French Syrah. However, without any promotion in the store, she prefers paying \$10 for Australian Shiraz to paying \$8 for water.

The behavior in this example is both intuitively and formally consistent with the salient thinking model of BGS. Without any promotion, the consumer expects to pay a high price for a relatively low quality selection. When choosing between Syrah or Shiraz, the consumer focuses on the French wine's sublime quality, and she is willing to pay at least \$6 more for it. When choosing between water and Syrah, the low price of water stands out and she reveals that the gap between wine and water is less than \$8. However, when there is no promotion, she focuses again on the quality, and she is willing to pay an additional \$2 for even her less-preferred Australian Shiraz over water. Notice that this explanation does not require that the reference points are different. Since the consumer visits this bar regularly, intuitively, her reference point should be fixed and stable.

[Footnote, on Table 1: Propositions 2 and 5 give the ✓'s of the table for BGS and TK. It is routine to verify that MO satisfies Monotonicity and Reference Irrelevance and that PT satisfies RI. We provide examples showing the other properties are violated in Appendix A.5.]

[Footnote, on Table 1: Whenever $c(r) = c(r')$ for every $r, r' \in X$.]

3.5. Non-increasing CTM.

For simplicity, we have so far focused on increasing CTM. This is a desirable feature in consumer choice, but models of social preference often violate this property. For instance, inequality-averse individual 1 prefers to increase the allocation to individual 2 from $x$ to $y$ when she feels guilty but not when she is envious. However, she always prefers increasing the allocation to 2 in an allocation categorized as guilty, and to decrease in any categorized as envious. This contradicts Category Monotonicity, suggesting the following weakening.

Axiom (Consistent Preference within Category, CPC). For each category $k$, there exists a set of attributes $P^k$ so that if $x_j \geq y_j$ for all $j \in P^k$, $y_i \geq x_i$ for all $i \notin P^k$, and $x \neq y$, then $y \not\succsim_k^* x$.

The set $P^k$ contains the attributes for which an increase positively affects the DM's evaluation. CPC requires that the set of positive attributes in a category does not depend on the reference point. For the two-person-RIA model, the set for the “guilty” category is $\{1, 2\}$ since she strictly prefers increasing everyone's allocation, but the set for the “envious” one is $\{1\}$: she prefers more for herself but dislikes others having even more. Note that CM is the special case of CPC where $P^k$ includes every dimension for every category.

A CTM is characterized by all the properties of an increasing CTM, except where CM is replaced by CPC. The proof is a straightforward generalization of the earlier one, so it is omitted.

[Footnote, on Example 1: Implicitly, the example reveals that the quality of French Syrah is higher than Australian Shiraz, which is in turn higher than water. The numerical value of quality assigned to each beverage is irrelevant to the violation of Cancellation. For examples of qualities so that choice can be represented by the BGS model, one can calculate that $(-8, q_{fs}) \succ_r (-2, q_{as})$, $(-2, q_w) \succ_r (-10, q_{fs})$ and $(-10, q_{as}) \succ_r (-8, q_w)$ for $q_{fs} = 8$, $q_{as} = 6.$[digit illegible in the source], $q_w = 5.1$, and a reference point $r$ built from $-10$, $q_w$, and $q_{as}$ [expression partly illegible in the source] when $w = 0.$[digits illegible in the source].]

4. BGS Model and Categories

The BGS model is intuitive, tractable, and accounts for a number of empirical anomalies for the neoclassical model of choice. Despite its popularity, it can be difficult to understand all of the implications of the BGS model. Its new components are unobservable, and its functional form rather involved.

The first crucial step towards understanding the model is getting a handle on the novel salience function that determines which attribute stands out for a given reference point. While one can work out the implications of a particular salience function, this exercise is not fruitful since the particular function that applies to a given agent is unobservable. Moreover, it is not clear how the model changes when the underlying salience function changes.

CTM provides a lens through which we can study the salience function. While it influences which attribute is salient, the weight given to each attribute is independent of its magnitude. Therefore, its role is simply to divide the domain into distinct categories, each associated with a particular attribute being most salient. We study the salience function by focusing on the properties of the categories it generates.

Categories are generated by a function $s : \mathbb{R}_{++} \times \mathbb{R}_{++} \to \mathbb{R}_+$ if $x \in K_i(r)$ if and only if $s(x_i, r_i) > s(x_j, r_j)$ for all $j \neq i$. In the BGS model, categories are generated by a salience function $\sigma$ that must satisfy the following properties. First, it increases in contrast, i.e. for $\epsilon > 0$ and $a > b$, $\sigma(a + \epsilon, b) > \sigma(a, b)$ and $\sigma(a, b - \epsilon) > \sigma(a, b)$. Second, it is continuous in both arguments. Third, it is symmetric, i.e. $\sigma(a, b) = \sigma(b, a)$. Two other properties are sometimes assumed: $\sigma$ is Homogeneous of Degree Zero (HOD) if for all $\alpha > 0$, $\sigma(\alpha a, \alpha b) = \sigma(a, b)$, and $\sigma$ has diminishing sensitivity if for all $\epsilon > 0$ and $a, b > 0$, $\sigma(a + \epsilon, b + \epsilon) \leq \sigma(a, b)$. Finally, we always impose that the salience function is grounded: $\sigma(r, r) = \sigma(r', r')$ for all $r, r' \in X$. This is an implication of HOD satisfied by all of the specifications of which we are aware in the literature, and is a necessary condition for an attribute to be salient only if it differs from the reference.

[Footnote, on diminishing sensitivity: BGS require this inequality to hold strictly. However, this is not a desirable property. If $\sigma$ is HOD as they assume, then $\sigma(r, r) = \sigma(\alpha r, \alpha r) = \sigma(r + \epsilon, r + \epsilon)$ for $\alpha > 1$ and $\epsilon = (\alpha - 1) r$, violating their definition of diminishing sensitivity.]

Consider the following properties of categories.

S0: (Basic) For any $r \in X$: $K_1(r) \cap K_2(r) = \emptyset$, $K_1(r) \cup K_2(r)$ is dense in $X$, $K_1, K_2$ are continuous at $r$, and $K_1(r), K_2(r)$ are regular open sets.
S1: (Moderation) For any $\lambda \in [0, 1]$ and $r \in X$: if $x \in K_k(r)$, $y_k = x_k$, and $y_{-k} = \lambda x_{-k} + (1 - \lambda) r_{-k}$, then $y \in K_k(r)$.
S2: (Symmetry) If $(a, b) \in K_k(c, d)$, then $(c, d) \in K_k(a, b)$ and $(b, a) \in K_{-k}(d, c)$.
S3: (Transitivity) If $(a_1, a_2) \notin K_2(r_1, r_2)$ and $(a_2, a_3) \notin K_2(r_2, r_3)$, then $(a_1, a_3) \notin K_2(r_1, r_3)$.
S4: (Difference) For any $x, y, z$ with $y \neq z$, $(x, y) \in K_2(x, z)$ and $(y, x) \in K_1(z, x)$.
S5: (Diminishing Sensitivity) For any $x, y, r_1, r_2$ and $\epsilon > 0$, if $(x, y) \notin K_1(r_1, r_2)$, then $(x + \epsilon, y) \notin K_1(r_1 + \epsilon, r_2)$.
S6: (Equal Salience) For any $x, r \in X$: if $\frac{x_1}{r_1} = \frac{x_2}{r_2}$ or $\frac{x_1}{r_1} = \frac{r_2}{x_2}$, then $x \notin K_k(r)$ for $k = 1, 2$.

S0 holds by definition; we include it for completeness. S1 indicates that making a bundle's less salient attribute closer to the reference point does not change the salience of the bundle. That is, when $x$ and $y$ differ only in attribute $l \neq k$, and $y$ is closer to the reference in that attribute, if $x$ is $k$-salient, then so is $y$. S2 requires that the same ranking is used for each attribute. S3 adapts transitivity to the salience ranking. It says that if $a_1$ stands out more relative to $r_1$ than $a_2$ does to $r_2$, and $a_2$ stands out more relative to $r_2$ than $a_3$ does to $r_3$, then $a_1$ stands out more relative to $r_1$ than $a_3$ does to $r_3$. S4 says simply that any difference stands out more than no difference. S5 implies that increasing both the good and the reference by the same amount in the same dimension does not move the good from one category to another. S6 reads that if every attribute of $x$ differs from the reference point by the same percentage, then none of the attributes stands out. More formally, if the percentage difference between $x_k$ and $r_k$ is the same across attributes, then $x$ is not $k$-salient for any $k \in \{1, 2\}$.

[Figure 4. Properties S0-S6 Illustrated. Panels: S0-S3 but not S4-S6; S0-S4 but not S5-S6; S0-S5 but not S6; S0-S6.]

Figure 4 provides examples satisfying some but not all of the properties. The functions that generate them, as well as a verification that they satisfy the claimed properties, can be found in Example 4 in the Appendix.

Theorem 5.

The category function satisfies:
(1) S0-S4 if and only if there exists a salience function $\sigma$ that generates it;
(2) S0-S5 if and only if the $\sigma$ that generates it has diminishing sensitivity; and
(3) S0, S1, and S6 if and only if it satisfies S0-S6 if and only if the $\sigma$ that generates it is HOD.
Any HOD salience function generates the same categories.

This theorem provides a characterization for BGS's salience function. It translates the functional form assumptions on the salience function into properties of the salience categories. The most common specification of the salience function, HOD, satisfies all of the above properties. Surprisingly, the result shows that there is a unique category function satisfying these properties. Hence, any two HOD salience functions lead to exactly the same behavior.

[Footnote: Theorem 5 relies on the full structure of $\mathbb{R}$ for the last two results, as noted in Footnote 1. Diminishing sensitivity and Homogeneity are both cardinal properties, and so are undefined without the cardinal structure on $X$. Properties S0-S4 are defined. Subsequent results that rely on Theorem 5, such as Propositions 2 and 3, remain true when imposing only S0-S4 in this setting.]

We now turn to the question of identifying the salience function from choice behavior alone. That is, given that we observe a family $\{\succsim_r\}_{r \in X}$, can we identify which alternatives have what salience?

Proposition 1. Suppose that $\{\succsim_r\}_{r \in X}$ has a BGS representation. Then, the weights, utility indices, and salience function are uniquely identified from $\{\succsim_r\}_{r \in X}$.

The proof provides an algorithm for this in general. We illustrate for the case where $u_1$ and $u_2$ are linear. Fixing a reference point $r$, any alternative that differs only in dimension $i$ from $r$ must be $i$-salient. Hence, we can identify the weights on dimensions within each category from the slope of the indifference curve passing through that alternative. Now, we can test whether $y$ is 1-salient by seeing if the indifference curves close to it are those generated by the weights for 1-salient alternatives. Varying $y$ and $r$ allows identification of the salience function, and hence the categories.

In addition to the particular form of categories, BGS satisfies several properties that distinguish it from other CTMs. The most general of these is Reference Irrelevance, above, making BGS a strong CTM. The other follows.

Axiom 9 (Salient Dimension Overweighted, SDO). For any $x, y, r, r' \in X$: if $x, y \in K_k(r) \cap K_l(r')$, $x \succsim_r y$, $x_l > y_l$, and $y_k > x_k$, then $x \succ_{r'} y$.

This axiom requires that categories correspond to the dimension that gets the most weight. That is, the DM is more willing to choose an alternative whose “best” attribute is $k$ when it is $k$-salient. To illustrate, consider alternatives $x, y$ with $x_1 > y_1$ and $y_2 > x_2$. Because $x$ is relatively strong in attribute 1, $x$ should benefit more than $y$ from a focus on it. If $x$ is chosen over $y$ when attribute 2 stands out for both, then this advantage in the first dimension is so strong that even a focus on the other one does not offset it. Hence, the DM should surely choose $x$ over $y$ when attribute 1 stands out for both.

Proposition 2.

Assume that there exist $x \in K_k(r)$ and $y \in K_j(r)$ with $x \sim_r y$ for any categories $k, j$ and any $r \in X$. Then, the family $\{\succsim_r\}_{r \in X}$ satisfies the Affine CTM axioms, Reference Interlocking, Reference Irrelevance, and Salient Dimension Overweighted for a category function $K$ satisfying S0-S5 if and only if it has a BGS representation where $\sigma$ has diminishing sensitivity.

This result characterizes the BGS model. It also provides guidance for comparing it with other models in the CTM class (see Figure 1 and Table 1). By outlining the model's testable implications, the result provides guidance on how to design experiments to test it. In their 2013 paper, BGS focus on a special case where the model is linear: $w_1^1 = w_2^2 = 1 - w_2^1 = 1 - w_1^2 > \tfrac12$ and $u_1(x) = u_2(x) = x$. In an earlier version of this paper, we show this model is characterized by strengthening Affine Across Categories to require linearity and imposing a reflection axiom that requires permuting two alternatives and the reference point in the same way not to reverse the DM's choice between the two.

Taken together, Propositions 1 and 2 provide an outline for a fully subjective axiomatization of a family of preferences with a BGS representation. Proposition 1 shows that we can reveal a category function from the family of preferences, provided they have a representation. We check whether these revealed categories exist and satisfy S0-S5. If so, then the axioms shown necessary by the second result apply with this revealed category function.

[Footnote: The assumption that alternatives indifferent to each other exist in each category for each reference point is not strictly necessary. A sufficient condition for it to be necessary is that the utility indexes are both unbounded above (or below).]

[Footnote: Formally, the first is that Affine Across Categories holds with $\oplus^k$ replaced by the usual $+$ operation. The second is that $(a, b) \succsim_{(r_1, r_2)} (c, d)$ if and only if $(b, a) \succsim_{(r_2, r_1)} (d, c)$. One can verify that these additional assumptions imply that the ancillary assumption about indifference holds.]

5. Choice Correspondence

In this section, the modeler observes only the DM's choice from a finite subset of choices and nothing more. A model consists of both a theory of reference formation and a theory of choice given categorization. In this setting, we can jointly test the theory of choice given categorization, categorization given reference, and reference formation.

We model reference formation via a reference generator, a map from finite subsets of alternatives to reference points. We denote the reference generator $A : 2^S \setminus \emptyset \to X$, with the interpretation that $A(S)$ is the reference point when the menu is $S$. Examples include the BGS theory that $A(S)$ is the average alternative, that $A(S)$ is the median bundle, that $A(S)$ is the upper (or lower) bound of $S$, and the Köszegi & Rabin [2006] theory that $A(S) = c(S)$. If additional observable data on the choice context is provided, then it is easy to extend our results to $A$ being a function of that as well. For instance, MO theorize that the initial endowment $e$ is observable and that $A(S, e) = e$, and Bordalo et al. [2019] theorize that past histories $h$ of consumption are available and that $A(S, h)$ is the average between the bundles in $S$ and those in $h$.

Fixing a categorization function $K$ and a reference generator $A$, let $\mathcal{X}$ be the set of finite and non-empty subsets of $X$ such that every alternative is categorized. Formally, $S \in \mathcal{X}$ only if $S \subset \bigcup_{i=1}^m K_i(A(S))$. We call these categorized menus, or menus for short. The requirement ensures that each alternative in the choice set belongs to a category given the reference point $A(S)$. We leave open how the DM chooses when alternatives that are uncategorized belong to the choice set. By leaving the choice from this small set of menus ambiguous, we can more clearly state the properties of choice implied by the model.

[Footnote: One can, of course, extend the model to account for these choices. For instance, BGS hypothesize that these alternatives are evaluated according to their sum. Complications arise because the set of uncategorized alternatives is “small”: its complement is open and dense, and moreover it has zero measure.]

We summarize the DM's choices by a choice correspondence $c : \mathcal{X} \rightrightarrows X$ with $c(S) \subseteq S$ and $c(S) \neq \emptyset$ for each $S \in \mathcal{X}$. Adapted to this setting, the model has the following representation.

Definition 4. The choice correspondence $c$ conforms to Strong-CTM under $(K, A)$ if there exists a family of preference relations $\{\succsim_r\}_{r \in X}$ that conforms to Increasing Strong CTM under $K$ so that
$$c(S) = \{x \in S : x \succsim_{A(S)} y \text{ for all } y \in S\}$$
for every $S \in \mathcal{X}$.

5.1. Reference point formation.

Provided that the reference generator is responsive enough to changes in the menu, there is the possibility of testing the properties required by categorization on $\succsim_r$. One example of enough structure is that the reference point is the average bundle. However, this is just one example. An even more general sufficient condition is as follows.

Assumption. A function $A$ is a generalized average if for any $S = \{x^1, \dots, x^m\} \in \mathcal{X}$:
(i) the function $x \mapsto A([S \setminus \{x^i\}] \cup \{x\})$ is continuous at $x^i$, and
(ii) for any $\epsilon > 0$ and $S' \subset \bigcup_i K_i(A(S))$, there exists $S^* \in \mathcal{X}$ so that $S^* \supset S \cup S'$, $d(A(S^*), A(S)) < \epsilon$, and for any $x' \in S^* \setminus S'$, $\min_{x \in S} d(x', x) < \epsilon$.

Examples of generalized average reference include the average bundle
$$A^a(S) = \left(\frac{\sum_{x \in S} x_1}{|S|}, \frac{\sum_{x \in S} x_2}{|S|}\right),$$
the median value of each attribute, and a weighted average
$$A^{wa}(S) = \left(\frac{\sum_{x \in S} w(x) x_1}{\sum_{x \in S} w(x)}, \frac{\sum_{x \in S} w(x) x_2}{\sum_{x \in S} w(x)}\right)$$
for any continuous weight function $w : X \to [a, b]$ with $b > a > 0$. We sometimes impose the additional requirement that $A(S) \in \mathrm{co}(S) \setminus \mathrm{ext}(S)$ for all non-singleton $S$; if so, we call $A$ a strong generalized average. The first and last of these examples satisfy this property. The supremum and infimum on their own are not weighted averages, nor (necessarily) is the choice acclimating reference generator, $c(S) = A(S)$.

5.2. Behavioral Foundations for Strong-CTM.

We now consider the behaviorby a DM who conforms to Strong-CTM for a given category function and referencegenerator. To do so, we make use of our earlier analysis by revealing how the DMevaluates alternatives categorized in a given way. When A ( S ) is a generalized average,this provides enough structure to identify enough of the family to apply our earlieranalysis.The main behavioral content comes from the choice correspondence equivalent ofReference Irrelevance. To state it, we introduce the following deﬁnition and notation. Deﬁnition 5.

The alternative x in category k is indirectly revealed preferred to al-ternative y in category j , written ( x, k ) (cid:37) R ( y, j ), if there exists ﬁnite sequences ofpairs ( x i , S i ) ni =1 such that x = x ∈ K k ( A ( S )), y ∈ K j ( A ( S n )) T S n , and for each i : x i ∈ c ( S i ), x i +1 ∈ S i , and x i +1 ∈ K k i ( A ( S i )) ∩ K k i ( A ( S i +1 )) for some k i .We replace Reference Irrelevance with the following weakening of the Strong Axiomof Revealed Preference (SARP). Axiom (Category SARP) . For any S ∈ X , if ( x, k ) (cid:37) R ( y, j ) , x ∈ K k ( A ( S )) T S , y ∈ K j ( A ( S )) T S , and y ∈ c ( S ) , then x ∈ c ( S ) . We ﬁrst illustrate in a simple two menu setting, analogous to a test case for theWeak Axiom of Revealed Preference (WARP). Consider two menus S and S and twochosen products x ∈ c ( S ) and x ∈ c ( S ) where both products are categorized in thesame way for both menus. For example, x is in category 1 for both menus, and x is in category 2 for both. The observation x ∈ c ( S ) reveals that the valuation of x is at least as high as that of x when x belongs to the ﬁrst category and x to thesecond. Since the categorization of products does not change when the menu changes Recall sup S = (max x ∈ S x , max x ∈ S x ) and inf S is deﬁned analogously. from S to S , their relative valuation stays the same as well. Hence, if x is chosenfrom S , then x must be chosen too. Since neither products’ category has changed,the DM should obey WARP for these two menus. However, the axiom leaves open thepossibility of a WARP violation when either is diﬀerentially categorized.The axiom extends this logic to sequences of choices in much the same way thatSARP does to WARP. A ﬁnite sequence of choices, where the choice from the nextmenu is available in the current one and has the same salience in both, does not leadto a choice reversal. 
Since salience does not change along the sequence of choices, the choices do not exhibit a reversal.

Category SARP limits the effect of unchosen alternatives. Modifying them can alter the DM's choice, but only insofar as changing them changes the reference point and thus the salience of alternatives. The axiom states that these unchosen options do not alter the relative ranking of two alternatives, unless they change the region to which the alternatives belong. That is, when comparing the same two alternatives in different menus, the DM's relative ranking does not change when neither's salience changes. This property greatly limits the effect of the reference point. In fact, a sufficiently small change in the reference never leads to a preference reversal.

The remaining axioms are the natural generalizations to the choice correspondence of Category Cancellation, Category Monotonicity, Category Continuity, Reference Interlocking, and Affine Across Categories. We denote these by appending a "*" to distinguish them from their reference-dependent-preference formulation. Appendix B.1 contains their formal statements.

As before, we require some additional topological structure on the categories. For a category $k$, let
$E^{R,k} = \{ x \in X : x \in K_k(A(S)), \{x\} = c(S) \}$ and $D_k = \bigcup_{S \in \mathcal{X}} \{ K_k(A(S)) \cap S \}$.
The generalization of the structure assumption is as follows.

Assumption (Revealed Structure). For any category $k$, $E^{R,k}$ is open, $E^{R,k}$ is dense in $D_k$, and the following sets are connected: $E^{R,k}$, $\{x \in E^{R,k} : x_j = s\}$ for all dimensions $j$ and scalars $s \in \mathbb{R}$, and $\{y \in E^{R,k} : (x,k) \sim^R (y,k)\}$ for all $x \in E^{R,k}$.

In addition to what was imposed by the Structure Assumption, we require that almost all objects categorized in a category are chosen in some menu. This can be weakened, but is typically satisfied by the models in which we are interested, such as BGS.

We require one last assumption.

Axiom (Comparability Across Regions, CAR). If $x \in E^{R,k}$, then for any $j$ there exists $y \in E^{R,j}$ so that $(x,k) \sim^R (y,j)$.

This is a version of the assumption we made for Strong CTM. It requires that every alternative chosen when it belongs to category $k$ is revealed to be equally good as some other alternative when the latter is categorized in category $j$. With it, we can now state the result.

Theorem 6.

Assume that Revealed Structure and CAR hold and that $A$ is a generalized average. A choice correspondence $c$ conforms to strong-CTM under $(K, A)$ if and only if $c$ satisfies Category-SARP, Category Monotonicity*, Category Cancellation*, Category Continuity*, and Affine Across Categories*.

The result is the counterpart of Theorem 4 with an endogenous reference point. The behavior corresponding to categorization does not fundamentally change across settings. As long as the DM reacts consistently when alternatives are categorized in the same way, we can represent her choices as categorical thinking where the reference point only affects how she categorizes each alternative. The key challenge in the proof is to establish that the arguments we used to establish our earlier results still hold. We adapt our earlier arguments to show that revealed preference within category $k$ is complete on $E^{R,k}$. This relies on small changes in alternatives not changing choice, a property implied by generalized average. Then, the remaining axioms establish that this within-category preference has an additive representation. CAR allows us to extend across categories.

5.3. Behavioral Foundations for BGS.

In this subsection, we provide a behavioral foundation for BGS. The first step is to show that the Revealed Structure assumption holds.

Lemma 1. If $A$ is a strong generalized average, $K$ satisfies S0, S1, and S4, and $c$ satisfies Category Monotonicity*, then $E^{R,k} = R$ for $k = 1, 2$.

Given the assumptions we have made so far, every alternative is chosen in some menu when it is $k$-salient. Consequently, the Revealed Structure assumption must hold. The result relies on the observation that the DM categorizes $x$ as 1-salient when all other available options have the same value in dimension 2 as $x$. If $x$ has the highest value in attribute 1 in such a choice set, then it must be chosen.

Now, we can apply Theorem 6 in combination with the insights gained from Proposition 2 to understand the behavioral foundation of the BGS model.

Proposition 3.

Assume that $A$ is a strong generalized average and that CAR holds. The choice correspondence $c$ satisfies Category-SARP, Category Monotonicity*, Category Cancellation*, Category Continuity*, Affine Across Categories*, Reference Interlocking*, and Salient Dimension Overweighted* for a category function $K$ satisfying S0-S5 if and only if $c$ conforms to BGS where $\sigma$ has diminishing sensitivity.

This theorem lays out the behavioral postulates that characterize the BGS model with endogenous reference point formation. Most importantly, it connects the (unobserved) components of the model to observed choice behavior. Fundamentally, the properties that Proposition 2 showed to characterize the model in our first setting still characterize it. To prove this, we note that Theorems 5 and 6 imply that there exists a Strong CTM with categories generated by a salience function. We then establish that choice within the $k$-salient alternatives overweights dimension $k$ by using SDO and the structure of regions.

Finally, we ask whether the choice correspondence with an endogenous reference point provides enough leverage to identify salience.

Proposition 4.

Given that $c$ conforms to BGS with a strong generalized average, the categories are uniquely identified.

As with Propositions 1 and 2, Propositions 3 and 4 provide a roadmap for testing BGS without a known salience function. However, it still requires that the reference generator is a strong generalized average. Consequently, the axioms capture the full testable implication of the model and allow for tight comparisons with other existing work.

6. Related Literature

This paper provides a choice-theoretic analysis of categorization. We apply this model to highlight similarities and differences between a number of behavioral models in the literature. As such, it is closely related to the literature that studies how a reference point affects choices (e.g. Tversky & Kahneman [1991], Munro & Sugden [2003], Sugden [2003], Masatlioglu & Ok [2005], Sagi [2006], Salant & Rubinstein [2008], Apesteguia & Ballester [2009], Masatlioglu & Nakajima [2013], Masatlioglu & Ok [2014], Dean, Kıbrıs, & Masatlioglu [2017]). These papers focus on an exogenous reference point, as in Section 3. While TK and MO are examples of CTM, the others are not. Nonetheless, our analysis puts the models on an equal footing so their implications can be compared.

We then extend the model to consider endogenous reference point formation. This adopts the approach of a number of recent papers, e.g. Bodner & Prelec [1994], Kivetz, Netzer, & Srinivasan [2004], Orhun [2009], Bordalo, Gennaioli, & Shleifer [2012], Tserenjigmid [2015]. As in Section 5, the reference point is a function of the context, and is identical for all feasible alternatives. Finally, Kőszegi & Rabin [2006], Ok, Ortoleva, & Riella [2015], Freeman [2017] and Kıbrıs et al. [2018] study models where the endogenous reference point is determined by what the agent chooses, but is otherwise independent of the choice set. This represents a very different approach to reference formation, and our approach does not easily generalize to accommodate it.

Footnote: Maltz [2017] is the only model of which we are aware that combines an exogenous reference point with endogenous reference-point formation.

One of our key contributions is to provide an axiomatization of the salient thinking model. Interpreting salience as arising from differential attention to attributes, CTM has a close relationship with the literature studying how limited attention affects decision making. Masatlioglu et al. [2012] and Manzini & Mariotti [2014] study a DM who has limited attention to the alternatives available. The DM maximizes a fixed preference relation over the consideration set, a subset of the alternatives actually available. In contrast, in CTM the DM considers all available alternatives but maximizes a preference relation distorted by her attention. Caplin & Dean [2015], de Oliveira et al. [2017] and Ellis [2018] study a DM who has limited attention to information. In contrast to CTM, attention is chosen rationally to maximize ex ante utility, rather than determined by the framing of the decision, and choice varies across states of the world. The most related interpretation considers attributes as payoffs in a fixed state. In addition to choices varying across states, each alternative has the same weights on each attribute, similar to Kőszegi & Szeidl [2013]. Taken together, these results highlight the effects on behavior of different types of attention.

While we argue in this paper that a number of prominent behavioral economic models can be thought of as resulting from categorization, few papers in economics explicitly address categorization. Mullainathan [2002] provides a model of belief updating and shows how categorization can generate non-Bayesian effects. Fryer & Jackson [2008] introduce a categorical model of cognition where a decision maker categorizes her past experiences. Since the number of categories is limited, the decision maker must group distinct experiences in the same category. In this model, prediction is based on the prototype from the category that most closely matches the current situation. Finally, Manzini & Mariotti [2012] introduce a two-stage decision-making model. In the first stage, a decision maker eliminates some of the alternatives based on their categories, and in the second stage she maximizes her preference among the alternatives surviving the first stage. Bordalo et al. [2019] provide a model of memory and attention, where the context's similarity to past consumption opportunities affects the salience of the alternatives currently available. They show this leads to endogenous categorization of the current opportunity set, and discuss the resulting implications for choice.

The evolutionary psychology literature on categorization suggests a common explanation for the effects shown to be captured by our model of categorization. That literature stresses that categories evolved as cues to apply a particular mental process in a given situation (see e.g. the review by Cosmides & Tooby [2013]). However, these processes are often applied to situations different from their evolutionary purpose. Boyer & Barrett [2015] explain, "The fact that some cognitive system is specialized for a domain D does not entail that it invariably or exclusively handles D, nor does it mean that the specialization cannot be co-opted for evolutionarily novel activities." This implies that the systems used to evaluate categorized objects are miscalibrated relative to how they would be most useful. For instance, New et al. [2007] documented that subjects were quicker and more accurate in noticing changes involving animals than those involving vehicles, despite the latter's much greater importance in modern life.

References

Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, (3), 409.
Apesteguia, J., & Ballester, M. A. (2009). A theory of reference-dependent behavior. Economic Theory, (3), 427–455.
Ashby, F., & Gott, R. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33–53.
Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology, 149–178.
Ashby, F. G., & Waldron, E. M. (1999). On the nature of implicit categorization. Psychonomic Bulletin and Review, 363–378.
Bhatia, S., & Golman, R. (2013). Attention and reference dependence. Working paper.
Bodner, R., & Prelec, D. (1994). The centroid model of context dependent choice. Unpublished manuscript, MIT.
Bordalo, P., Gennaioli, N., & Shleifer, A. (2012). Salience theory of choice under risk. The Quarterly Journal of Economics, (3), 1243–1285.
Bordalo, P., Gennaioli, N., & Shleifer, A. (2013). Salience and consumer choice. Journal of Political Economy, (5), 803–843.
Bordalo, P., Gennaioli, N., & Shleifer, A. (2019). Memory, attention, and choice. Working paper, Harvard.
Boyer, P., & Barrett, H. C. (2015). Intuitive ontologies and domain specificity. The Handbook of Evolutionary Psychology, (pp. 1–19).
Bushong, B., Rabin, M., & Schwartzstein, J. (2015). A model of relative thinking. Unpublished manuscript, Harvard University, Cambridge, MA.
Caplin, A., & Dean, M. (2015). Revealed preference, rational inattention, and costly information acquisition. American Economic Review, (7), 2183–2203.
Charness, G., & Rabin, M. (2002). Understanding social preferences with simple tests. The Quarterly Journal of Economics, (3), 817–869.
Chateauneuf, A., & Wakker, P. (1993). From local to global additive representation. Journal of Mathematical Economics, (6), 523–545.
Chernev, A. (2011). The dieter's paradox. Journal of Consumer Psychology, (2), 178–183.
Choi, J., & Kim, S. (2016). Is the smartwatch an IT product or a fashion product? A study on factors affecting the intention to use smartwatches. Computers in Human Behavior, 777–786.
Cosmides, L., & Tooby, J. (2013). Evolutionary psychology: New perspectives on cognition and motivation. Annual Review of Psychology, 201–229.
de Oliveira, H., Denti, T., Mihm, M., & Ozbek, K. (2017). Rationally inattentive preferences and hidden information costs. Theoretical Economics, (2), 621–654.
Dean, M., Kıbrıs, Ö., & Masatlioglu, Y. (2017). Limited attention and status quo bias. Journal of Economic Theory, 93–127.
Debreu, G. (1959). Topological methods in cardinal utility theory. Tech. rep., Cowles Foundation for Research in Economics, Yale University.
Ellis, A. (2018). Foundations for optimal inattention. Journal of Economic Theory, 56–94.
Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. Quarterly Journal of Economics, (3), 817–868.
Freeman, D. J. (2017). Preferred personal equilibrium and simple choices. Journal of Economic Behavior & Organization, 165–172.
Fryer, R., & Jackson, M. O. (2008). A categorical model of cognition and biased decision making. The BE Journal of Theoretical Economics, (1).
Gabaix, X. (2014). A sparsity-based model of bounded rationality, applied to basic consumer and equilibrium theory. Quarterly Journal of Economics, 1369–1420.
Ghirardato, P., Maccheroni, F., Marinacci, M., & Siniscalchi, M. (2003). A subjective spin on roulette wheels. Econometrica, (6), 1897–1908.
Kıbrıs, Ö., Masatlioglu, Y., & Suleymanov, E. (2018). A theory of reference point formation. Working papers.
Kivetz, R., Netzer, O., & Srinivasan, V. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, (3), 237–257.
Kőszegi, B., & Rabin, M. (2006). A model of reference-dependent preferences. Quarterly Journal of Economics, (4), 1133–1165.
Kőszegi, B., & Szeidl, A. (2013). A model of focusing in economic choice. The Quarterly Journal of Economics, (1), 53–104.
Krantz, D., Luce, D., Suppes, P., & Tversky, A. (1971). Foundations of Measurement, Vol. I: Additive and Polynomial Representations. New York: Academic Press.
Loken, B. (2006). Consumer psychology: categorization, inferences, affect, and persuasion. Annu. Rev. Psychol., 453–485.
Loken, B., Barsalou, L. W., & Joiner, C. (2008). Categorization theory and research in consumer psychology. Handbook of Consumer Psychology, (pp. 133–165).
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). Sustain: a network model of category learning. Psychological Review, (2), 309.
Maltz, A. (2017). Exogenous endowment - endogenous reference point. Working Papers WP2016/5, University of Haifa, Department of Economics.
Manzini, P., & Mariotti, M. (2012). Categorize then choose: Boundedly rational choice and welfare. Journal of the European Economic Association, (5), 1141–1165.
Manzini, P., & Mariotti, M. (2014). Stochastic choice and consideration sets. Econometrica, (3), 1153–1176.
Masatlioglu, Y., & Nakajima, D. (2013). Choice by iterative search. Theoretical Economics, (3), 701–728.
Masatlioglu, Y., Nakajima, D., & Ozbay, E. Y. (2012). Revealed attention. American Economic Review, (5), 2183–2205.
Masatlioglu, Y., & Ok, E. A. (2005). Rational choice with status quo bias. Journal of Economic Theory, 1–29.
Masatlioglu, Y., & Ok, E. A. (2014). A canonical model of choice with initial endowments. The Review of Economic Studies, (2), 851–883.
Mogilner, C., Rudnick, T., & Iyengar, S. S. (2008). The mere categorization effect: How the presence of categories increases choosers' perceptions of assortment variety and outcome satisfaction. Journal of Consumer Research, (2), 202–215.
Mullainathan, S. (2002). Thinking through categories. NBER working paper.
Munro, A., & Sugden, R. (2003). On the theory of reference-dependent preferences. Journal of Economic Behavior & Organization, (4), 407–428.
New, J., Cosmides, L., & Tooby, J. (2007). Category-specific attention for animals reflects ancestral priorities, not expertise. Proceedings of the National Academy of Sciences, (42), 16598–16603.
Ok, E. A., Ortoleva, P., & Riella, G. (2015). Revealed (p)reference theory. American Economic Review, (1), 299–321.
Orhun, A. Y. (2009). Optimal product line design when consumers exhibit choice set-dependent preferences. Marketing Science, (5), 868–886.
Posner, M. I., & Keele, S. W. (1970). Retention of abstract ideas. Journal of Experimental Psychology, (2p1), 304.
Ratneshwar, S., & Shocker, A. D. (1991). Substitution in use and the role of usage context in product category structures. Journal of Marketing Research, (3), 281–295.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, (3), 192–233.
Sagi, J. S. (2006). Anchored preference relations. Journal of Economic Theory, (1), 283–295.
Salant, Y., & Rubinstein, A. (2008). (A, f): Choice with frames. The Review of Economic Studies, (4), 1287–1296.
Savage, L. J. (1954). The Foundations of Statistics. Dover.
Stewart, N., Brown, G. D., & Chater, N. (2002). Sequence effects in categorization of simple perceptual stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, (1), 3.
Strotz, R. (1955). Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies, (3), 165–180.
Sugden, R. (2003). Reference-dependent subjective expected utility. Journal of Economic Theory, (2), 172–191.
Tserenjigmid, G. (2015). Choosing with the worst in mind: A reference-dependent model. Working papers.
Tversky, A. (1977). Features of similarity. Psychological Review, (4), 327.
Tversky, A., & Gati, I. (1978). Studies of similarity. Cognition and Categorization, (1978), 79–98.
Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. The Quarterly Journal of Economics, (4), 1039–1061.
Wanke, M., Bless, H., & Schwarz, N. (1999). Lobster, wine and cigarettes: Ad hoc categorisations and the emergence of context effects. Marketing Bulletin, 52–56.

Appendix A. Proofs and Extras from Sections 2 - 4

A.1. Proof of Theorem 1.

Lemma 2. $\succ_k^*$ has open upper and lower contour sets in $E^k$.

Proof. Suppose $x \succ_k^* y$. Then, there are $x_1, x_2, \dots, x_M \in E^k$ and $r_1, \dots, r_M$ with $x_1 = x$ and $x_M = y$ so that $x_j \succsim_{r_j} x_{j+1}$ and $x_j, x_{j+1} \in K_k(r_j)$. Let $\epsilon_j > 0$ be such that $B_{\epsilon_j}(x_j), B_{\epsilon_j}(x_{j+1}) \subset K_k(r_j)$. Set $\epsilon = \min\{\epsilon_j\}_j$ [...]

[...] $a$. Since $IC = \{ b \in E^* : b \sim_k^* \theta(a) \}$ is path-connected, there is another continuous path $\theta' : [0,1] \to IC$ with $\theta'(0) = \theta(a)$ and $\theta'(1) = \theta(b)$. Then the path $\theta^*$ given by $\theta^*(x) = \theta(x)$ for $x \notin [a,b]$ and $\theta^*(x) = \theta'\left(\frac{x-a}{b-a}\right)$ for $x \in [a,b]$ is also a continuous path from $x$ to $y$. Constructing this for $a^* = \min\{ a' : \theta(a') \sim_k^* \theta(a) \}$ and $b^* = \max\{ a' : \theta(a') \sim_k^* \theta(a) \}$ gives a path that crosses $IC$ at most once. These are well-defined since $\theta$ is continuous.

Now, let $Y = \theta^{-1}([0,1])$. $Y$ is closed since $\theta$ is continuous and so compact as a subset of $\mathrm{cl}(B_{d(x,y)+1}(x))$. For any $z \in Y$, there exists $r_z \in X$ and $\epsilon_z > 0$ so that $B_z = B_{\epsilon_z}(z) \subset K_k(r_z)$. Since $B_z \subset K_k(r_z)$ and $\succsim_k$ is a subrelation of $\succsim_k^*$, $\succsim_k^*$ is complete and transitive when restricted to $B_z$. Then, the collection $\{B_z : z \in Y\}$ is an open cover of $Y$ and hence has a finite subcover $B_{z_1}, B_{z_2}, \dots, B_{z_m}$. W.L.O.G., $B_{z_j}$ is not a subset of $B_{z_{j'}}$ for any $j, j'$ and $\theta(z_j) < \theta(z_{j+1})$, so $x \in B_{z_1}$ and $y \in B_{z_m}$. Moreover, since $\theta$ crosses each indifference curve only once, if $z_k \succ_k^* z_{k+1}$ ($z_k \prec_k^* z_{k+1}$) for any $k$, then $z_j \succsim_k^* z_{j'}$ ($z_k \precsim_k^* z_{k+1}$) for any $j' > j$. W.L.O.G. consider the former. Pick $a_1 \in B_{z_1} \cap B_{z_2} \cap Y$ so that $x \succsim_k a_1$ and then pick $a_j \in B_{z_j} \cap B_{z_{j+1}} \cap Y$ so that $a_{j-1} \succsim_k a_j$. Then,
$x \succsim_k^* a_1 \succsim_k^* a_2 \succsim^* \cdots \succsim_k^* a_m \succsim_k^* y.$
Since $\succsim_k^*$ is transitive, we conclude $x \succsim_k^* y$. Since $x, y$ were arbitrary, $\succsim_k^*$ is complete. □

Apply CW Theorem 2.2 to get an additive representation $U^i(x)$ on $E^i$. For any $x, y \in K_i(r)$, $x \succsim_r y$ if and only if $U^i(x) \geq U^i(y)$ and $U^i(x) = \sum_j U^i_j(x_j)$.

Lemma 4.

For categories $K_i(r)$ and $K_j(r)$, either (i) there exists $x_i \in K_i(r)$ and $x_j \in K_j(r)$ so that $x_i \sim_r x_j$; or (ii) $x_i \succ_r x_j$ for all $x_i \in K_i(r)$ and $x_j \in K_j(r)$; or (iii) $x_j \succ_r x_i$ for all $x_i \in K_i(r)$ and $x_j \in K_j(r)$.

Proof. If neither (ii) nor (iii) holds, then after relabeling categories if necessary, there exist $x \in K_i(r)$ and $y, z \in K_j(r)$ such that $y \succ_r x \succ_r z$. Let $UC_j(x)$ and $LC_j(x)$ be the strict upper and lower contour sets of $x$ in category $j$ for reference $r$. Any point in $K_j(r) \setminus [UC_j(x) \cup LC_j(x)]$ is indifferent to $x$, so either (i) holds or the set is empty. There exists an $\epsilon > 0$ such that for all $x' \in B_\epsilon(x)$, $y \succ_r x' \succ_r z$ by Category Continuity and hence $K_j(r) \neq UC_j(x')$ and $K_j(r) \neq LC_j(x')$. By Category Continuity, there exists $x' \in B_\epsilon(x)$ such that $K_j(r) \setminus [UC_j(x') \cup LC_j(x')] \neq \emptyset$ (otherwise, $B_\epsilon(x)$ is contained in the interior of the set considered), so we can take $y' \in K_j(r) \setminus [UC_j(x') \cup LC_j(x')]$ and conclude $y' \sim_r x'$. □

Definition 6.

A finite sequence $(Q_1, \dots, Q_{m+1})$ with each $Q_i \in \{K_1(r), \dots, K_n(r)\}$ is an indifference sequence for $r$ (IS) if there exists $x_1, \dots, x_m, y_1, \dots, y_m$ with $x_k \in Q_k$, $y_k \in Q_{k+1}$ and $x_k \sim_r y_k$.

We omit the dependence on $r$ when clear from context.

Define the relation $\bowtie_r$ by $x \bowtie_r y$ if there exists an indifference sequence of categories $(Q_1, \dots, Q_m)$ with $x \in Q_1$ and $y \in Q_m$. It is easy to see that $\bowtie_r$ is an equivalence relation (reflexive, symmetric, and transitive). Let $[x]_r$ denote the $\bowtie_r$ equivalence class of $x$.

Lemma 5. If $y \notin [x]_r$ and $x \succ_r y$, then $x' \succ_r y'$ for all $x' \in [x]_r$ and $y' \in [y]_r$.

Proof. Fix $x, y, r \in X$ with $y \notin [x]_r$ and $x \succ_r y$, and assume $x \in K_k$. Pick any $y' \in [y]_r$. By definition, there is an IS $(Q_1, \dots, Q_m)$ with $y' \in Q_m$ and $y \in Q_1$. Let $i = 1$ and $y_1 = y$. If there exists $y'' \in Q_i$ with $y'' \succsim_r x$, then $y'' \succsim_r x \succ_r y_i$, so by Lemma 4, we can find $z \in Q_i$ and $x' \in K_k$ with $z \sim_r x'$. If that occurs, then $(K_k, Q_i, \dots, Q_1)$ is an IS and $y \in [x]_r$, a contradiction. Thus $x \succ_r y''$ for all $y'' \in Q_i$. Now, there exists $y_{i+1} \in Q_{i+1}$ with $x \succ_r y_{i+1}$ by transitivity and the definition of IS. Hence, we can apply the above logic to $Q_{i+1}$ as well: $x \succ_r y''$ for all $y'' \in Q_{i+1}$. Inductively, this extends all the way to $Q_m$, so $x \succ_r y'$ in particular. Since $y'$ is arbitrary, this extends to any $y' \in [y]_r$. Similar arguments show that $x' \succ_r y$ for any $x' \in [x]_r$. Combining, $x' \succ_r y'$ whenever $x' \in [x]_r$ and $y' \in [y]_r$. □

Fix a reference point $r$. Let $A_1, \dots, A_n$ be the distinct equivalence classes of $\bowtie_r$. By Lemma 5, these sets can be completely ordered by $\succ_r$, i.e. $A_i \succ_r A_j \iff x \succ_r y$ for all $x \in A_i$ and $y \in A_j$. Label so that $A_1 \succ_r A_2 \succ_r \cdots \succ_r A_n$.

Pick an indifference class $A_i$ and an IS $Q_1, \dots, Q_M$ that contains points in every region in $A_i$. We define $V_i(\cdot)$ on $A_i$ as follows. Define $V_i(x)$ on $Q_1$ so that $V_i(x) = U^j(x)$ for all $x \in K_j(r)$ where $K_j(r) = Q_1$. Clearly $V_i$ represents $\succ_r$ when restricted to $Q_1$. There is no loss in assuming that $V_i$ is bounded, and the closure of its range is an interval.

Now, assume inductively that, for a given $m \leq k$, $V_i$ represents $\succ_r$ when restricted to $\bigcup_{j=1}^{m-1} Q_j \equiv Q^{m-1}$, is bounded, is continuous on $Q^{m-1}$, and is an increasing transformation of $U^k$ within $Q_j$ when $Q_j = K_k(r)$. Then, extend $V_i$ to $Q_m$ as follows. By Lemma 5, it is impossible that $y \succ_r x$ for every $x \in Q^{m-1}$ and every $y \in Q_m$. It will be convenient to relabel regions so that $Q_m = K_m(r)$.

Pick a bounded, strictly increasing, continuous $h : \mathbb{R} \to \mathbb{R}$. For any $x \in K_m(r)$ so that $x \succ_r y$ for all $y \in Q^{m-1}$, set $V_i(x) = h(U^m(x)) + \beta^+$ where
$\beta^+ = \sup\{ V_i(x) : x \in Q^{m-1} \} - \inf\{ h(U^m(x)) : x \in K_m(r), x \succ_r y \text{ for all } y \in Q^{m-1} \}.$
For any $x \in K_m(r)$ for which there exists $y, y' \in Q^{m-1}$ so that $y \succ_r x \succ_r y'$, let
$V_i(x) = \inf\{ V_i(y) : y \in Q^{m-1} \text{ and } y \succsim_r x \}.$
For all other $x \in K_m(r)$, let $V_i(x) = h(U^m(x)) + \beta^-$ where
$\beta^- = \inf\{ V_i(x) : x \in Q^{m-1} \} - \sup\{ h(U^m(x)) : x \in K_m(r), y \succ_r x \text{ for all } y \in Q^{m-1} \}.$
This $V_i$ is bounded and continuous.

Footnote: We can define V ( x ) = h ( V ( x )) for h ( v ) = − / (1 + v ) when v ≥ h ( v ) = − / (1 − v ) when v <

We now show that it represents $\succ_r$ on $Q^m$. Pick $x, y \in Q^m$. There are four cases:

Case 1: $x, y \in Q^{m-1}$: then the claim follows by hypothesis.

Case 2: $x \in K_m(r)$ and either $x \succ_r y$ for all $y \in Q^{m-1}$ or $y \succ_r x$ for all $y \in Q^{m-1}$: the claim is immediate.

Case 3: $x \in K_m(r)$ and $y \in Q^{m-1}$: If $y \succ_r x$, then $y - \epsilon \succ_r x$ for some $\epsilon > 0$ such that $y - \epsilon$ belongs to the same region as $y$. If $y \sim_r x$, then $V_i(y) \geq V_i(x)$. If this does not hold with equality, then there is a $y' \in Q^{m-1}$ so that $y' \succsim_r x$ and $y \succ_r y'$ (since $y \succsim_r y'$). But then $y \succ_r x$, a contradiction. If $x \succ_r y$ but $V_i(y) \geq V_i(x)$, there exists $z \in Q^{m-1}$ so that $V_i(z) \leq V_i(y)$ and $z \succsim_r x$. But then by transitivity and hypothesis, $y \succsim_r z \succsim_r x$.

Case 4: $x, y \in K_m(r)$ and Case 2 does not hold for either $x$ or $y$: Suppose $x \succsim_r y$. If not, then $V_i(y) > V_i(x)$ so there exists a $z \in Q^{m-1}$ so that $z \succsim_r x$ and $z \succsim_r y$. By weak order, $y \succ_r z$ and so $y \succ_r x$, a contradiction.

Since it represents $\succsim_r$ on $K_m(r)$, it also agrees with $\succsim_m$ on $K_m(r)$. Hence it is an increasing transformation of $U^i$ within $K_i(r)$ for each $i \leq m$. Renormalize $V_i$ so that its range is a subset of [ − − i, − i ].

For any $x, y \in A_i$, the above establishes that $V_i(x) \geq V_i(y) \iff x \succsim_r y$. For any $x \in A_i$ and $y \in A_j$ where $i < j$, $x \succ_r y$ by Lemma 5 and construction. Since V_i(x) > − − i, V_j(y) < − j, and − − i > − j, we have $V_i(x) > V_j(y)$. Define $U^k(\cdot \mid r)$ to agree with the appropriate restriction of $V_i$, and conclude $\{\succ_r\}_{r \in X}$ conforms to CTM under $K$. Since $r$ was arbitrary, this completes the proof. □

A.2. Proof for Theorem 2.

Sufficiency is easy to verify. Suppose that $U^k(x) = \sum_{i=1}^n U^k_i(x_i)$. We show that for every category $j$ there exists a vector $w \gg 0$ so that $U^j(x) = \sum_{i=1}^n w_i U^k_i(x_i)$ represents $\succ_j$ on $E^k \cap E^j$.

Consider dimension 1, and the rest follow the same arguments. The goal is to show that $U^k_1(x) - U^k_1(y) \geq U^k_1(a) - U^k_1(b)$ if and only if $U^j_1(x) - U^j_1(y) \geq U^j_1(a) - U^j_1(b)$ for any $x, y, a, b \in E^k_1 \cap E^j_1$. If this is the case, then standard uniqueness results give that $U^j_1(x) = \alpha U^k_1(x) + \beta$. The $\beta$ can be dropped, completing the claim.

Let $\pi_i$ be the projection onto the $i$-coordinate. Then, $E^k_1 = \pi_1(E^k)$ is open and connected for any category $k$. This follows from $E^k$ connected and open and $\pi_i$ continuous. In $\mathbb{R}$, connected implies convex.

Claim 1.

For any $z \in E^k_1 \cap E^j_1$, there exists a neighborhood $O_z = B_{\epsilon_z}(z)$ so that $U^k_1(x) - U^k_1(y) \geq U^k_1(a) - U^k_1(b)$ if and only if $U^j_1(x) - U^j_1(y) \geq U^j_1(a) - U^j_1(b)$ for any $x, y, a, b \in O_z$.

To see it is true, pick $x \in E^k_1 \cap E^j_1$. Then there is an $a^l \in E^l$ with $a^l_1 = x$ for $l = k, j$. Let $U^k_{-i}(y) = \sum_{j \neq i} U^k_j(y_j)$ for any $y \in X$. Since each $a^l \in K_l(r^l)$ for some $r^l \in X$, there exists an $\epsilon_l > 0$ so that $B_{\epsilon_l}(a^l) \subset K_l(r^l) \subset E^l$, where the distance is given by the supnorm. Pick $\epsilon \in (0, \epsilon_l)$ so that
$U^l_1(x + \epsilon) - U^l_1(x - \epsilon) < U^l_{-1}(a^l_{-1} + \epsilon_l) - U^l_{-1}(a^l_{-1} - \epsilon_l)$
for $l = k, j$. Then, for any $a, b \in [x - \epsilon, x + \epsilon]$ there exists $y^a_{-1}, y^b_{-1}$ so that $(a, y^a_{-1}), (b, y^b_{-1}) \in B_{\epsilon_k}(a^k)$ and $(a, y^a_{-1}) \sim_{r^k} (b, y^b_{-1})$ by Category Continuity and CM. In particular, $U^k_1(a) - U^k_1(b) = U^k_{-1}(y^b_{-1}) - U^k_{-1}(y^a_{-1})$. For any $a', b' \in [x - \epsilon, x + \epsilon]$, it holds that $U^k_1(a) - U^k_1(b) \geq U^k_1(a') - U^k_1(b')$ if and only if $(b', y^a_{-1}) \succsim_{r^k} (a', y^b_{-1})$. Similarly, there exist $z^a_{-1}, z^b_{-1}$ so that $(a, z^a_{-1}), (b, z^b_{-1}) \in B_{\epsilon_j}(a^j)$ and $(a, z^a_{-1}) \sim_{r^j} (b, z^b_{-1})$. Now, $(b', z^b_{-1}) \succsim_{r^j} (a', z^a_{-1})$ if and only if $U^j_1(a) - U^j_1(b) \geq U^j_1(a') - U^j_1(b')$. By Reference Interlocking and weak order, $(b', z^b_{-1}) \succsim_{r^j} (a', z^a_{-1})$ if and only if $(b', y^a_{-1}) \succsim_{r^k} (a', y^b_{-1})$, so we conclude that the claim holds with $\epsilon_x = \epsilon$.

We now extend to the entire domain (this follows similar arguments in CW). Pick an arbitrary $x_* < x^* \in E^k_1 \cap E^j_1$ and consider $Z = (x_*, x^*]$. If the claim is true, then standard uniqueness results give that $U^j_1(x) = \alpha U^k_1(x) + \beta$ for all $x \in O_z$ for some $\alpha > 0$. Let $\alpha^*, \beta^*$ be the constants so that $U^j_1(x) = \alpha^* U^k_1(x) + \beta^*$ for all $x$ in the neighborhood of $x_*$, as guaranteed to exist by the claim.

Let
$Z' = \{ s \in Z : U^j_1(x) = \alpha^* U^k_1(x) + \beta^* \text{ for all } x \in (x_*, s] \}.$
$Z'$ is not empty by the claim. We show that it is both open and closed by picking any $s \in \mathrm{cl}(Z')$ and showing $s \in \mathrm{int}(Z')$. Since $[x_*, s]$ is compact and $O = \{O_z : z \in [x_*, s]\}$ is an open covering, there exists $\{O_1, \dots, O_n\} \subset O$ with $x_* \in O_1$, $s \in O_n$ and $O_m \cap O_{m'} = \emptyset$ for all $m' \geq m + 2$. On each $O_m$, there exists $\alpha_m, \beta_m$ so that the utility indexes agree by the claim. Also, $O_m$ and $O_{m+1}$ have non-empty intersections with more than two points, so $(\alpha_{m+1}, \beta_{m+1}) = (\alpha_m, \beta_m)$. In particular, $O_1$ intersects $O_{x_*}$ so $\alpha_m = \alpha^*$ for all $m$. Then $O_n \cap Z \subset Z'$, i.e. $s \in \mathrm{int}(Z')$, so $\mathrm{cl}(Z') \subset \mathrm{int}(Z') \subset Z' \subset \mathrm{cl}(Z')$, i.e. $Z'$ is both closed and open relative to $Z$. Conclude $Z' = Z$ since $Z$ is connected.

Since $U^j_1(x) = \alpha U^k_1(x) + \beta$ for all $x \in (x_*, x^*]$ for any interval in the domain, it holds for the whole domain as well. Extend to other categories that intersect $E^i \cup E^j$ inductively. If there is no intersecting category, we can start again and obtain a (disjoint) interval, the values of $U^i$ (and $U^j$) on which have no bearing on the DM's choices. Similar arguments obtain for the other dimensions. Moreover, there is no loss in setting each $\beta = 0$. This completes the proof. □

A.3. Proof of Theorem 3.

To save notation, until after Lemma 10, we fix $r$ and write $K_k$ instead of $K_k(r)$ and $\succsim$ instead of $\succsim_r$. We also identify $x \alpha_k y$ with the alternative $\alpha x \oplus_k (1 - \alpha) y$. Let $(U^1, \dots, U^n)$ be the additive functions that represent $\succsim_1, \dots, \succsim_n$. Observe that $U^k(x \alpha_k y) = \alpha U^k(x) + (1 - \alpha) U^k(y)$ for any $\alpha$, provided that $x, y, x \alpha_k y \in E^k$.

Recall from Definition 6 that an indifference sequence is a finite sequence of categories with indifference between each pair of succeeding members.

Definition 7.

The function $v$ is a utility for the indifference sequence $(Q_1, \dots, Q_m)$ if $v$ is an increasing additive utility function on each $Q_k$ and for all $k$, $x, y \in Q_k \cup Q_{k+1}$: $x \succsim y \iff v(x) \geq v(y)$.

Lemma 6. If $x_k \in K_k$, $x_l \in K_l$, and $x_k \sim x_l$, then there is $\alpha > 0$, $\beta \in \mathbb{R}$ such that for $x \in K_k$ and $y \in K_l$, $x \succsim y \iff U_k(x) \geq \alpha U_l(y) + \beta$.

Proof. W.L.O.G., take $U_k(x_k) = 0$. There is $\epsilon_k > 0$ such that $B_{\epsilon_k}(x_k) \subset K_k$. By CM and Category Continuity, there is $\epsilon_l > 0$ such that $B_{\epsilon_l}(x_l) \subset K_l$ and for all $y \in B_{\epsilon_l}(x_l)$, $x^* = x_k + \epsilon_k \succ y \succ x_k - \epsilon_k = x_*$. For any $y \in K_l$ and $\alpha$ such that $y \alpha_l x_l \in B_{\epsilon_l}(x_l)$, there exists $\beta \in (0, 1)$ such that $x^* \beta_k x_* \sim y \alpha_l x_l$ by Category Continuity, CM, and that $\succsim$ is a weak order. Let $V_l(y) = \alpha^{-1} U_k(x^* \beta_k x_*)$. This is well defined, additive, increasing, and ranks alternatives in the same way as $U_l$. Thus, $V_l(y) = a U_l(y) + b$ for some $a > 0$, $b \in \mathbb{R}$.

For any $x \in K_k$ and $y \in K_l$, pick $\alpha \in [0, 1]$ such that $x \alpha_k x_k \in B_{\epsilon_k}(x_k)$ and $y \alpha_l x_l \in B_{\epsilon_l}(x_l)$. By construction, $y \alpha_l x_l \sim y'$ when $y' \in B_{\epsilon_k}(x_k)$ and $U_k(y') = \alpha V_l(y)$. Thus, $x \alpha_k x_k \succsim y' \sim y \alpha_l x_l$ holds if and only if $U_k(x) \geq V_l(y)$, and $x \succsim y \iff x \alpha_k x_k \succsim y \alpha_l x_l$ by AAC since $x_k \sim x_l$, completing the proof. □

For an indifference sequence $(Q_1, \dots, Q_m)$ with utility $v$, we label the range of utilities as $cl(v(Q_k)) = [l_k, u_k]$ where $l_k \leq u_k$. Note that we allow $Q_k = Q_l$ for $k \neq l$. Lemma 7.

For an indifference sequence $(Q_1, \dots, Q_m)$, there is an affine, increasing utility $v$ for it.

Proof. The proof is by induction. We claim that there is a utility $v_k : X \to \mathbb{R}$ that is a utility for the IS $(Q_1, \dots, Q_k)$ for any $k$. When $k = 1$ or $k = 2$, this is true by the above lemmas. The induction hypothesis (IH) is that the claim is true for $k = N$. Consider $k = N + 1$. Let $v_N$ be the utility for $(Q_1, \dots, Q_N)$ that exists by the IH. If $Q_{N+1} \subseteq \bigcup_{i=1}^{N} Q_i$, then we are done. If not, then for $Q_N = K_l$, there is no loss in normalizing $v_N$ so that it equals $U_l$ on $K_l(r)$. Suppose $Q_{N+1} = K_j(r)$, and let $\alpha, \beta$ be the scalars claimed to exist by Lemma 6, so that $U_j(x) \geq \alpha U_l(y) + \beta \iff x \succsim_r y$ for $x \in K_j(r)$ and $y \in K_l(r)$. Restricted to $Q_N$, $v_N = U_l$, so we can define $v_{N+1}(x) = \alpha v_N(x) + \beta$ if $x \in \bigcup_{i=1}^{N} Q_i$ and $v_{N+1}(x) = U_j(x)$ if $x \in Q_{N+1}$. Then, if $l < N$ and $x, y \in Q_l \cup Q_{l+1}$, we are done by the IH, since $v_{N+1}(x) \geq v_{N+1}(y) \iff v_N(x) \geq v_N(y)$. If $x, y \in Q_N \cup Q_{N+1}$, then Lemma 6 and the construction imply the result. The claim then holds by induction. □ Lemma 8.

Fix an indifference sequence $(Q_1, \dots, Q_n)$ with utility $v$. If $x_k \in Q_k$ for $k = i, i+1, i+2$ with $x_i \sim x_{i+1} \sim x_{i+2}$, then $(Q_1, \dots, Q_i, Q_{i+2}, \dots, Q_n)$ is an indifference sequence (after relabeling) with utility $v$.

Proof. The Lemma is vacuously true for any 1 or 2-element IS. Fix an IS $(Q_1, \dots, Q_n)$ with $n \geq 3$ and $v$ as above, and suppose $x_k \in Q_k$ for $k = i, i+1, i+2$ with $x_i \sim x_{i+1} \sim x_{i+2}$. By transitivity $x_i \sim x_{i+2}$, so $(Q_1, \dots, Q_i, Q_{i+2}, \dots, Q_n)$ is an IS; it remains to be shown that $v$ is a utility for it. There is an $\epsilon > 0$ such that $B = B_\epsilon(v(x_i)) \subset (l_k, u_k)$ for $k = i, i+1, i+2$. Let $v^{-1}(u) : B \to Q_{i+1}$ be an arbitrary point in $Q_{i+1}$ such that $v[v^{-1}(u)] = u$. Now, fix $x \in Q_i$ and $y \in Q_{i+2}$. For $\alpha$ small enough, $v(x \alpha_i x_i), v(y \alpha_{i+2} x_{i+2}) \in B$. Then $x \alpha_i x_i \sim v^{-1}(v(x \alpha_i x_i))$ and $y \alpha_{i+2} x_{i+2} \sim v^{-1}(v(y \alpha_{i+2} x_{i+2}))$. So

$$
\begin{aligned}
x \succsim y &\iff x \alpha_i x_i \succsim y \alpha_{i+2} x_{i+2} \\
&\iff v^{-1}(v(x \alpha_i x_i)) \succsim v^{-1}(v(y \alpha_{i+2} x_{i+2})) \\
&\iff v[v^{-1}(v(x \alpha_i x_i))] \geq v[v^{-1}(v(y \alpha_{i+2} x_{i+2}))] \\
&\iff \alpha v(x) + (1 - \alpha) v(x_i) \geq \alpha v(y) + (1 - \alpha) v(x_{i+2}) \\
&\iff v(x) \geq v(y).
\end{aligned}
$$

This establishes the Lemma. □ Lemma 9.

Fix an indifference sequence $(Q_1, \dots, Q_n)$ with utility $v$. If $(l_1, u_1) \cap (l_n, u_n) \neq \emptyset$, then there exists $i$ and $x_k \in Q_k$ for $k = i, i+1, i+2$ with $x_i \sim x_{i+1} \sim x_{i+2}$.

Proof. If there is $i$ with $(l_i, u_i) \cap (l_{i+2}, u_{i+2}) \neq \emptyset$, then there is $u \in \bigcap_{j = i, i+1, i+2} (l_j, u_j)$, so there exists $x_j \in Q_j$ with $v(x_j) = u$ for $j = i, i+1, i+2$ and thus by the hypothesis, $x_i \sim x_{i+1} \sim x_{i+2}$. We show there exists such an $i$ by contradiction. If $l_{i+2} > u_i$ for all $i$ or $l_i > u_{i+2}$ for all $i$, then $(l_1, u_1) \cap (l_n, u_n) = \emptyset$, a contradiction. So there must exist $i$ such that [$l_{i+2} > u_i$ and $l_{i+2} > u_{i+4}$] or [$u_{i+2} < l_i$ and $u_{i+2} < l_{i+4}$]. In the first case, $l_{i+2} \in (l_{i+1}, u_{i+1}) \cap (l_{i+3}, u_{i+3})$; in the second, $u_{i+2} \in (l_{i+1}, u_{i+1}) \cap (l_{i+3}, u_{i+3})$. In either case, we have a contradiction. □ Lemma 10.

Fix an indifference sequence $(Q_1, \dots, Q_n)$ with utility $v$. Then for all $x, y \in \bigcup_i Q_i$, $x \succsim y \iff v(x) \geq v(y)$.

Proof. This is clearly true if $n = 1$. (IH) Suppose the claim is true for any IS with $m < n$ elements. Fix an IS $(Q_1, \dots, Q_n)$ with utility $v$. If $x \notin Q_1 \cup Q_n$ or $y \notin Q_1 \cup Q_n$, then the claim immediately follows from the IH, and it clearly holds if $x, y \in Q_i$ for some $i$. So it suffices to consider arbitrary $x \in Q_1$ and $y \in Q_n$. By Lemmas 8 and 9, if $(l_1, u_1) \cap (l_n, u_n) \neq \emptyset$, we can form a shorter IS from $Q_1$ to $Q_n$ and the claim then follows from the IH. Otherwise, there are two cases to consider: $l_n > u_1$ and $u_n < l_1$.

Consider $l_n > u_1$. The range of $v$ restricted to $\bigcup_{i=1}^{n-1} Q_i$ is dense in $\bigcup_{i=1}^{n-1} (l_i, u_i) = (\bar{l}, \bar{u})$. Note $l_n \in (\bar{l}, \bar{u})$ since $x_{n-1} \sim y_n$, so $(l_{n-1}, u_{n-1}) \cap (l_n, u_n) \neq \emptyset$. Then $(l_n, v(y))$ is an open interval having a non-empty intersection with $(\bar{l}, \bar{u})$. Since the range of $v$ is dense in $(\bar{l}, \bar{u})$, there exists $y' \in Q_{n'}$ with $l_n < v(y') < v(y)$. Since $l_n > u_1$, $n' > 1$. Then $(Q_1, \dots, Q_{n'})$ and $(Q_{n'}, \dots, Q_n)$ are both ISes with strictly fewer than $n$ elements. Applying the IH, $y' \succ x$ and $y \succ y'$. Conclude using transitivity that $y \succ x$. Similar arguments obtain the desired conclusion when $u_n < l_1$. □

Define $\bowtie_r$ as in the proof of Theorem 1, and let $A_1, \dots, A_n$ be the distinct indifference classes of $\bowtie_r$. Again using Lemma 5, we can relabel so that $x \in A_i$ and $y \in A_{i+1}$ implies $x \succ_r y$. By Lemma 10, there is $v_i$ on $A_i$ so that $v_i$ is additive and increasing within categories and $x \succsim y \iff v_i(x) \geq v_i(y)$ for all $x, y \in A_i$.

By Unbounded and Lemma 5, every positive unbounded region (if any) is a subset of $A_1$, and every negative unbounded region (if any) is a subset of $A_n$. If one region is both positive and negative unbounded, then $n = 1$. Therefore, $v_i(A_i)$ is bounded for all $i \in (1, n)$, and $v_n(A_n)$ is bounded above whenever $n > 1$. Define $V(x) = v_1(x)$ for all $x \in A_1$. For $x \in A_i$ with $i > 1$, define $V(x)$ recursively by

$$V(x) = v_i(x) - \sup_{y \in A_i} v_i(y) + \inf_{y \in A_{i-1}} V(y) - 1.$$

Observe $V(\cdot)$ is a positive affine transformation of $v_i(\cdot)$ when restricted to $A_i$, and if $x \in A_i$, $y \in A_j$ and $i < j$, then $V(x) > V(y)$, since each class is placed at least one unit below the infimum of the class preceding it. Thus $V$ represents $\succsim_r$ and, when restricted to any given region, is affine and increasing.

Defining $U_k(\cdot | r)$ as the (unique) affine transformation of $U_k$ so it agrees with $V$ on $K_k(r)$ establishes that $\succsim_r$ is an affine CTM. Since $r$ was arbitrary, this establishes that each $\succsim_r$ has such a representation. Conclude that $\{\succsim_r\}$ conforms to Affine CTM, completing the proof. □

A.4. Proof of Theorem 4.

Without loss of generality, normalize so that $U_1(\cdot | r) = U_1(\cdot | r')$ for all $r, r'$. Suppose $U_k(\cdot | r) \neq U_k(\cdot | r')$ for some $r, r'$ and some $k$. Then, let $\bar{\epsilon} = d(r, r')$ and pick a sequence $\hat{r}_n \to \hat{r}$ such that: $U_k(\cdot | \hat{r}_n) \neq U_k(\cdot | r)$, $\hat{r}_n \in B_{\bar{\epsilon}}(r)$ for all $n$, and $d(\hat{r}_n, r) \to \inf \{ d(r'', r) : U_k(\cdot | r'') \neq U_k(\cdot | r) \}$. Since $\hat{r}_n \in cl(B_{\bar{\epsilon}}(r))$, there is no loss in assuming this sequence converges. Similarly, let $r_n$ be a sequence in $B_{\bar{\epsilon}}(r)$ such that $r_n \to \hat{r}$ and $U_k(\cdot | r) = U_k(\cdot | r_n)$.

By hypothesis and that each $K_k(r)$ is open, there exist $\epsilon > 0$, $x_k$ and $x_1$ such that $B_\epsilon(x_k) \subset K_k(\hat{r})$, $B_\epsilon(x_1) \subset K_1(\hat{r})$, and $x_k \sim_{\hat{r}} x_1$. By continuity of the region functions, $B_\epsilon(x_k) \subseteq K_k(\hat{r}_n) \cap K_k(r_n)$ and $B_\epsilon(x_1) \subseteq K_1(\hat{r}_n) \cap K_1(r_n)$ for $n$ large enough. For $z$ close enough to $x_k$, there exists $y(z) \in B_\epsilon(x_1)$ such that $z \sim_{\hat{r}} y(z)$. But then by SC, $z \sim_{r_n} y(z)$ and $z \sim_{\hat{r}_n} y(z)$. Thus $U_k(z | r_n) = U_1(y(z) | r_n) = U_1(y(z) | \hat{r}_n) = U_k(z | \hat{r}_n)$ for all $z$ close enough to $x_k$, implying that $U_k(\cdot | r_n) = U_k(\cdot | \hat{r}_n)$, a contradiction. Conclude $U_k(\cdot | r) = U_k(\cdot | r')$ for all $r, r'$. □

A.5. Examples from Table 1.

Example 1 shows that BGS violates Cancellation and inspecting Figure 1 shows it violates Monotonicity. It remains to show that TK violates Reference Irrelevance and that MO violates Cancellation. This is established by the following two examples.

Example 2 (TK violates Reference Irrelevance) . Consider a TK model with λ = λ = 2. Then, for r = (10 , x = (12 ,

12) and y = (9 , y (cid:37) r x since (12 −

10) +(12 −

10) = 2(9 −

10) + (16 − r = (11 , x (cid:31) r y since (12 −

11) + (12 − > −

11) + (16 − x ∈ R GL ( r ) T R GL ( r ) and r ∈ R GL ( r ) T R GL ( r ), so thefamily violates Reference Irrelevance. Example 3 (MO violates Cancellation) . Let Q ( r ) = { x ∈ X : x / x > r / r } and c ( r ) = 1. Then, let x = (2 , y = (1 , z = (4 , r = (0 . , . x , z ) = (2 , (cid:37) r (4 ,

2) = ( z , y ) and ( z , x ) = (4 , (cid:37) r (1 ,

4) = ( y , z ) because allfour points belong to Q ( r ), cancellation requires that x (cid:37) r y . However, x / ∈ Q ( r ), so y (cid:31) r x , so cancellation does not hold.A.6. Other models and CTM.

In this subsection, we present the functional forms of the other models of salience we discussed, and show that they are not CTM.

• Gabaix [2014] assumes a rational DM would maximize $u(a, w)$ but actually maximizes $u(a, (w_1 m_1^*, \dots, w_n m_n^*))$, where $m^* \in \arg\min_{m \in [0,1]^n} \sum_{i,j} (1 - m_i) \Lambda_{ij} (1 - m_j) + \kappa \sum_i m_i^{\alpha}$ and $\Lambda_{ij}$ incorporates the "variance" in the marginal utility of dimensions $i$ and $j$. When $n$ is large, $m_i^*$ is often zero, so $(w_1 m_1^*, \dots, w_n m_n^*)$ is a "sparse" vector.
• Tversky & Kahneman [1991] refer in general to $V^{CTK}(x | r) = \sum_i v_i(u_i(x_i) - u_i(r_i))$, where $v_i$ is concave above 0 and convex below.
• Bordalo et al. [2019] and the continuous form of the salient thinking model have $V^{CBGS}(x | r) = w(x_1, r_1) x_1 + w(x_2, r_2) x_2$, where $w$ has the same properties as a salience function.
• Munro & Sugden [2003] use the functional form $V^{MS}(x | r) = A(r) \left( \sum_i \gamma_i r_i^{\rho - \beta} x_i^{\beta} \right)^{1/\beta}$.
• Bhatia & Golman [2013] assume that the DM chooses the bundle $x$ that maximizes $U(x | r) = \alpha_1(r_1)[V_1(x_1) - V_1(r_1)] + \alpha_2(r_2)[V_2(x_2) - V_2(r_2)]$ given a reference point $r$, where each $\alpha_i$ is increasing and positive.

The first fails to be CTM, as the indifference curves have the same slope everywhere for a fixed context. If they were CTM, then they would necessarily have only a single region. Single region CTM coincides with the neoclassical model. The final four explicitly take into account a reference point. In all four, it is easy to see that the reference point affects the marginal rate of substitution between attributes. This implies a violation of weak reference irrelevance for any given category function: any two points in the same category that are indifferent to each other necessarily remain so for a sufficiently small change in the reference point.

A.7. Proof of Proposition 2. $K$ satisfying S0-S4 implies that $E_1 = E_2 = \mathbb{R}^n_{++}$, so the structure assumption is satisfied.
Moreover, Theorem 5 gives that the categoriesare generated by a salience function. The axioms allow us to apply Theorems 2 and4 to get a strong CTM representation of the family with reweighted utility indexes.Hence, U k ( x ) = w k u ( x ) + w k u ( x ) + β k for each x ∈ X .There is no loss in normalizing so that β = 0. Pick x ∈ X with x > x , and byS4 observe that x ∈ K ( r ) for r = ( x , x /

2) and x ∈ K ( r ) for r = ( x / , x ). Since K ( r ) and K ( r ) are open, there exists (cid:15) > B (cid:15) ( x ) ⊂ K ( r ) T K ( r ). Since U is continuous and increasing, there is y ∈ B (cid:15) ( x ) with y < x so that U ( y ) = U ( x ),i.e. y ∼ r x ; this y necessarily has y > x by CM. Then, SDO implies y (cid:31) r x , i.e. U ( y ) > U ( x ), which requires w /w < w > w . We can incorporate β into u byreplacing it with u + β / ( w − w ) or into u by replacing β into u by replacing itwith u + β / ( w − w ). At least one does not involve dividing by zero, as otherwise w i = w i for i = 1 , (cid:3) A.8.

TK.

This subsection states and proves a characterization theorem for TK.

Proposition 5.

A family of preferences $\{\succsim_r\}_{r \in X}$ has a TK representation if and only if it is an affine CTM with a gain-loss regional function that satisfies Reference Interlocking, Monotonicity, Cancellation, and continuity of each $\succsim_r$.

Tversky & Kahneman [1991, p. 1053] provide an alternative axiomatic characterization of the model, and our result makes heavy use of their theorem.
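To make the statement concrete, here is a minimal Python sketch of a piecewise-linear TK value with gain-loss regions. It is an illustration under stated assumptions, not the paper's construction: the attribute utilities are taken to be linear, gains carry weight 1, losses carry weight `lam[i]`, and the function names and the numerical bundles are hypothetical choices made here.

```python
from typing import Tuple

Point = Tuple[float, float]


def gain_loss_region(x: Point, r: Point) -> Tuple[bool, bool]:
    """Quadrant of x relative to r: (gain in dim 1?, gain in dim 2?)."""
    return (x[0] >= r[0], x[1] >= r[1])


def region_weights(region: Tuple[bool, bool], lam: Point = (2.0, 2.0)) -> Point:
    """Slope of the value in each dimension inside one quadrant (assumed form)."""
    return tuple(1.0 if gain else l for gain, l in zip(region, lam))


def tk_value(x: Point, r: Point, lam: Point = (2.0, 2.0)) -> float:
    """Assumed piecewise-linear TK value: losses are scaled by lam[i]."""
    total = 0.0
    for xi, ri, li in zip(x, r, lam):
        diff = xi - ri
        total += diff if diff >= 0 else li * diff
    return total


def affine_in_region(x: Point, r: Point, lam: Point = (2.0, 2.0)) -> float:
    """Same value, written as region-specific weights applied to x - r."""
    w = region_weights(gain_loss_region(x, r), lam)
    return sum(wi * (xi - ri) for wi, xi, ri in zip(w, x, r))


x, y = (12.0, 12.0), (9.0, 16.0)
for r in [(10.0, 10.0), (11.0, 11.0)]:
    print(r, gain_loss_region(x, r), gain_loss_region(y, r),
          tk_value(x, r), tk_value(y, r))
```

In this sketch the value is affine within each quadrant, with weights that depend only on the quadrant. For the illustrative bundles, moving the reference point from (10, 10) to (11, 11) leaves both bundles in the same quadrants, yet the two values go from equal (4 and 4) to strictly ranked (2 and 1), because the region-specific intercepts move with the reference point.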

Proof.

Necessity follows from the discussion above and TK's theorem. To show sufficiency, we rely on TK's theorem, which states that any monotone, continuous family of preference relations that satisfies cancellation, sign-dependence and reference interlocking has a TK representation. Given our assumptions, we need to show that $\{\succsim_r\}$ satisfies sign-dependence and reference interlocking.

TK say that $\{\succsim_r\}$ satisfies sign-dependence if "for any $x, y, r, s \in X$, $x \succsim_r y \iff x \succsim_s y$ whenever $x$ and $y$ belong to the same quadrant with respect to $r$ and with respect to $s$, and $r$ and $s$ belong to the same quadrant with respect to $x$ and with respect to $y$." This happens if and only if $x \in K_k(r) \cap K_k(s)$ and $y \in K_k(r) \cap K_k(s)$ for some $k \in \{1, 2, 3, 4\}$. Then, sign-dependence is exactly an implication of affine CTM, since $U_k(\cdot | r) = \alpha U_k(\cdot | s) + \beta$ for $\alpha > 0$.

$\{\succsim_r\}$ satisfies reference interlocking if "for any $w, w', x, x', y, y', z, z'$ that belong to the same quadrant with respect to $r$ as well as with respect to $s$, $w_1 = w'_1$, $x_1 = x'_1$, $y_1 = y'_1$, $z_1 = z'_1$ and $x_2 = z_2$, $w_2 = y_2$, $x'_2 = z'_2$, $w'_2 = y'_2$, if $w \sim_r x$, $y \sim_r z$, and $w' \sim_s x'$ then $y' \sim_s z'$." The assumptions on quadrants imply that $w, w', x, x', y, y', z, z' \in K_k(r) \cap K_l(s)$ for some $k, l \in \{1, 2, 3, 4\}$. Since $y', z' \in K_l(s)$, the conclusion follows immediately from RI. □

A.9. Example 4.

Example 4.

The following salience functions generate regions that all satisfy S0-S3, but only satisfy a subset of the other properties.

(1) The function $\sigma(x, r) = \frac{\max\{x, r\}^2}{\min\{x, r\}}$ generates regions that violate S4-S6. Note $\sigma(a, a) = a$ for $a > 0$. Then $(a, b + \epsilon), (a, b) \in K_1(a, b)$ for all $a > b$ and small enough $\epsilon > 0$, contradicting S4 and S6, respectively. Also note $\sigma(a, a) = \sigma(\sqrt{a}, 1)$ for $a > 0$. Hence, $(a, \sqrt{a}) \notin K_1(a, 1)$ but $(a + \epsilon, \sqrt{a}) \in K_1(a + \epsilon, 1)$ for every $\epsilon > 0$, violating S5.

(2) The function $\sigma(x, r) = |x^2 - r^2|$ generates regions that satisfy S0-S4 but violate S5 and S6. Observe that $(2, \sqrt{5}) \notin K_1(1, \sqrt{2})$ since $\sigma(2, 1) = \sigma(\sqrt{5}, \sqrt{2}) = 3$, but $(2 + \epsilon, \sqrt{5}) \in K_1(1 + \epsilon, \sqrt{2})$ for any $\epsilon > 0$ since $\sigma(2 + \epsilon, 1 + \epsilon) = 3 + 2\epsilon > 3$. Also, $x = (2, 2)$ and $r = (4, 1)$ have $x_1 x_2 = r_1 r_2$, but $\sigma(2, 4) = 12 > \sigma(2, 1) = 3$, so $x \in K_1(r)$, contradicting S6.

(3) The function $\sigma(x, r) = |\sqrt{x} - \sqrt{r}|$ generates regions that satisfy S0-S5 but violate S6. Here, $x = (2, 2)$ and $r = (4, 1)$ have $x_1 x_2 = r_1 r_2$, but $\sigma(2, 4) > \sigma(2, 1)$, so $x \in K_1(r)$, contradicting S6. Differentiating establishes S4 and S5.

(4) The function $\sigma(x, r) = \frac{\max\{x, r\}}{\min\{x, r\}}$ generates regions that satisfy S0-S6.

A.10. Proof of Theorem 5.

We ﬁrst prove the following lemma.

Lemma 11. If K is a category function, then for any (cid:15) > and x so that B (cid:15) ( x ) ⊂ K i ( r ) , there exists δ > so that B (cid:15)/ ( x ) ⊂ K i ( r ) for all r ∈ B δ ( r ) .Proof. Let B (cid:15)/ ( x ). For each j = i , d ( K j ( r ) , B ) > (cid:15)/

2, where d ( · ) is the Hausdorﬀmetric, and continuity of K j implies that there exists a neighborhood O j of r sothat d ( K j ( r ) , B ) > (cid:15)/ r ∈ O j . Let O = T j = i O j . Then, for any r ∈ O , B / ∈ cl ( S j = i K j ( r )). Since cl ( S i K i ( r )) = X , B ⊂ cl ( K i ( r )). But since B is open, B ⊂ int ( cl ( K i ( r ))) = K i ( r ) since K i ( r ) is regular open. (cid:3) For suﬃciency, deﬁne a binary relation S by ( a, b ) S ( c, d ) if and only if ( a, c ) / ∈ K ( b, d ). S is clearly complete. It is also transitive by S3. We show it has an opencontour sets. Let S ∗ be the strict part of S . If ( a, b ) S ∗ ( c, d ), then x ∈ K ( r ) for x = ( a, c ) and r = ( b, d ). K ( r ) is open by S0 so there exists (cid:15) > B (cid:15) ( x ) ⊂ K ( r ). By Lemma 11, x ∈ K ( r ) for all r in a neighborhood O of r . Conclude( a , b ) S ∗ ( c , d ) for all ( a , b ) , ( c , d ) ∈ B (cid:15) ( x ) × O . Standard results then show existenceof a continuous function σ so that ( a, b ) S ( c, d ) if and only if σ ( a, b ) ≥ σ ( c, d ). σ issymmetric by S2 and increasing in contrast by S1 and S4. Hence x ∈ K ( y ) if and onlyif σ ( x , y ) > σ ( x , y ), and by S2, x ∈ K ( y ) if and only if y ∈ K ( x ) where x , y arethe reﬂections of x, y . Hence, x ∈ K ( y ) if and only if σ ( x , y ) < σ ( x , y ).Pick any a, b . By S3, σ ( a, b ) = σ ( b, a ) so ( a, b ) / ∈ K ( b, a ) for any a, b . By S5,( a + (cid:15), b ) / ∈ K ( b + (cid:15), a ). Then, ( b, a ) S ( a + (cid:15), b + (cid:15) ) so σ ( a, b ) = σ ( b, a ) ≥ σ ( a + (cid:15), b + (cid:15) ).Since a, b were arbitrary, diminishing sensitivity holds.For necessity, verifying that S0-S5 hold are trivial, except that each K i ( r ) is regularopen. To see this, pick r and x ∈ int ( cl ( K ( r ))) (symmetric arguments hold for K ).Suppose x (cid:29) r (the other cases follow by changing the signs). Then, there is (cid:15) > x = ( x − (cid:15), x + (cid:15) ) ∈ cl ( K ( r )). 
Then, there exists x ∈ K ( r ) that is arbitrarilyclose to ¯ x , and we can take it so that r < x < x and r < x < x . Then, σ ( x , r ) >σ ( x , r ) > σ ( x , > σ ( x , r ) since σ is increasing in contrast. Hence x ∈ K ( r ) andsince x was arbitrary, int ( cl ( K ( r ))) ⊂ K ( r ). Clearly, K ( r ) ⊂ int ( cl ( K ( r ))).Now we show the following are equivalent:(i) The functions K and K satisfy S0, S1, and S6,(ii) There exists a salience function σ s.t. x ∈ K k ( r ) ⇐⇒ σ ( x k , r k ) > σ ( x − k , r − k ) In this case it is actually a pseudo metric. That (ii) implies (i) follows from the ﬁrst part, and that S6 is implied by sym-metry and homogeneity of degree zero. Now, we show (i) implies (ii). Set σ ( a, b ) =max { a/b, b/a } . Clearly σ is a salience function, and we show that σ generates K and K . Fix r ∈ X and set A = { x : σ ( x , r ) > σ ( x , r ) } . We show A = K ( r ).Claim A T K ( r ) = ∅ . If not, pick x ∈ A T K ( r ). x ∈ A implies either (a) x /r > x /r and x /r > r /x or (b) r /x > x /r and r /x > r /x . If (a) and x ≤ r , then x /r > r /x ≥ x /r implies x > r r /x ≥ r , so there exists λ ∈ [0 ,

1) such that ( λx + (1 − λ ) r , x ) = ( r r /x , x ) = x . If (a) and x > r , then x > r x /r > r , so there exists λ ∈ (0 ,

1) such that ( λx + (1 − λ ) r , x ) = ( r x /r , x ) = x . By S1and x ∈ K ( r ), x ∈ K ( r ). However, we have either x x = r r or x /x = r /r so x / ∈ K ( r ) by S6, a contradiction. A similar contradiction obtains if (b) holds.Now, since A T K ( r ) = ∅ and K ( r ) S K ( r ) is dense, A ⊂ cl ( K ( r )). By S0, K ( r ) = int ( cl ( K ( r )). Since A is an open set contained in cl ( K ( r )), A ⊆ K ( r ).Similarly, for B = { x : σ ( x , r ) < σ ( x , r ) } , B ⊆ K ( r ). But( A [ B ) c = { x : x x = r r or x /x = r /r } , and by S6, ( A S B ) c T K k ( r ) = ∅ for k = 1 ,

2. Thus $A = K_1(r)$ and $B = K_2(r)$, completing the proof.

Finally, fix any HOD salience function $s$. Observe $s(a, b) > s(c, d)$ if and only if $s(a/b, 1) > s(c/d, 1)$ by homogeneity, if and only if $s(\max(a/b, b/a), 1) > s(\max(c/d, d/c), 1)$ by symmetry, if and only if $\max(a/b, b/a) > \max(c/d, d/c)$ by ordering. Thus if one salience function generates the regions, every other salience function does as well. □

A.11. Proof of Proposition 1.

Pick any r ∈ X . Observe that x = ( r + k, r ) neces-sarily belongs to K ( r ) by S4, as does an open set O x . This set O can be identiﬁedby looking at whether Cancellation and Monotonicity hold on the set. By varying r and k , we obtain a covering of the entirety of R by points that necessarily belongto the 1-salient region. This allows one to identify (cid:37) and obtain a representation U .Repeating with x = ( r , r + k ) obtains a representation U of (cid:37) .Fix any r ∈ X . Consider y (cid:29) r and let I ∗ ( y ) = { y : U k ( y ) = U k ( y ) and U − k ( y ) = U − k ( y ) } \ { y } . If y ∈ K k ( r ), then there exists y ∈ I ∗ ( y ) arbitrarily close to y so that y, y ∈ K k ( r ); forany such y , y ∼ r y . If y r y for every y ∈ I ∗ ( y ) T B (cid:15) ( y ) for some (cid:15) > y / ∈ K k ( r ). Conclude y ∈ K k ( r ) if and only if there exists y ∈ I ∗ ( y ) \ { y } arbitrarily close to y sothat y ∼ r y .We now infer whether σ ( x, a ) > σ ( y, b ) for any x, y, a, b by considering whetheran alternative x is in K ( r ) for appropriately chosen bundles so that x (cid:29) r . Thisis impossible if x = a and always true if y = b and x = a . For any other values,we have that σ ( x, a ) > σ ( y, b ) if and only if either ( x, y ) ∈ K ( a, b ), x > a and y > b ; ( x, b ) ∈ K ( a, y ), x > a , and b > y ; ( a, y ) ∈ K ( x, b ), x < a , and y > b ; or( a, b ) ∈ K ( x, y ), x < a , and b > y . In this way we can reveal the σ function and thusthe regions. (cid:3) Appendix B. Proofs and Extras from Section 5

B.1.

Axioms for c . This subsection formally states the adaptations of the axiomsfor reference dependent preferences { (cid:37) r } r ∈ X in terms of the choice correspondence c .Interpretation is identical to that of those axioms. Axiom (Category Cancellation*) . For all x , y , z , x , y , z ∈ R ++ and category k : if ( x , z ) ∈ c ( S ) , ( z , y ) ∈ S , ( z , x ) ∈ c ( S ) , ( y , z ) ∈ S , ( x , x ) , ( y , y ) ∈ S and S i ⊂ K k ( A ( S i )) for i ∈ { , , } , then ( x , x ) ∈ c ( S ) whenever ( y , y ) ∈ c ( S ) . Axiom (Category Monotonicity*) . For any x, y ∈ X : if x ≥ y and x = y , then ( y, k ) (cid:37) R ( x, k ) for any category k . Axiom (Category Continuity*) . For any S ∈ X and any (cid:15) > so that E T S \ c ( S ) = ∅ where E ≡ S x ∈ c ( S ) B (cid:15) ( x ) there exists δ > so that if S ∈ X , d ( A ( S ) , A ( S )) < δ , andfor any y ∈ S , there is y ∈ S so that y ∈ B δ ( y ) , then c ( S ) ⊂ E whenever S T E = ∅ . Deﬁne (cid:37)

R,k by x (cid:37) R,k y if and only if ( x, k ) (cid:37) R ( y, k ). Using this relation, we candeﬁne ⊕ k for each category as we did with preference relations. Axiom (Aﬃne Across Categories*) . For any S , S , S ∈ X , x i ∈ K j ( A ( S i )) , y i ∈ K k ( A ( S i )) for i = 1 , , , and any α ∈ (0 , so that ( x , j ) (cid:37) R ( αx ⊕ j (1 − α ) x , j ) and ( αy ⊕ k (1 − α ) y , k ) (cid:37) R ( y , k ) :if x ∈ c ( S ) and x ∈ c ( S ) , then y / ∈ c ( S ) . Axiom (Salient Dimension Overvalued*) . For x, y ∈ S T S with x k > y k and y − k >x − k , if x, y ∈ K k ( A ( S )) , x, y ∈ R − k ( A ( S )) , and y ∈ c ( S ) , then x / ∈ c ( S ) . Axiom (Reference Interlocking*) . For any a, b, a , b , x , y , x, y ∈ X with x − i = a − i , y − i = b − i , x i = a i , y i = b i , x i = x i , y i = y i , a i = a i , b i = b i :if x ∼ R ∗ k y , a (cid:37) R ∗ k b , and x ∼ R ∗ j y , then it does not hold that b (cid:31) R ∗ j a . B.2.

Proof of Theorem 6.Lemma 12.

Assume that Revealed Structure holds, and that A is a generalized average.If Category-SARP, Category Monotonicity*, Category Cancellation*, and CategoryContinuity* hold, then for any category k there exists a Category utility U k so that forany x, y ∈ E R,k , ( x, k ) (cid:37) R ( y, k ) ⇐⇒ U k ( x ) ≥ U k ( y ) . Proof.

Fix a category i and pick any x, y ∈ E R,i . Let E ∗ = E R,i T B d ( x,y )+1 ( x ). As inproof of Lemma 3, there is a continuous path θ : [0 , → E ∗ so that θ (0) = x and θ (1) = y that crosses each (cid:37) R,i indiﬀerence curve at most once, and Y = θ − ([0 , z ∈ Y , there exists an open set z ∈ B z ⊂ E ∗ sothat (cid:37) R,i is complete on B z . If this is the case, we can mimic the rest of the proof ofLemma 3 to show that either x (cid:37) R,i y or y (cid:37) R,i x .By deﬁnition of E ∗ , for any z ∈ E ∗ , there exists S ∈ X with A ( S ) = r so that c ( S ) = z . Since K i ( r ) is open, there exists (cid:15) > B (cid:15) ( z ) ⊂ K i ( r ). ByLemma 11, there exists (cid:15) > r ∈ B (cid:15) ( r ) implies B (cid:15) ( z ) ⊂ K i ( r ). Pick ζ ∈ (0 , ) so that B ζ ( z ) ∩ S = z . By Category Continuity*, there exists (cid:15) > S ∈ X with d ( A ( S ) , A ( S )) < (cid:15) , for any y ∈ S , there is y ∈ S so that y ∈ B (cid:15) ( y ), and S T B ζ ( x ) = ∅ , then c ( S ) ⊂ B ζ ( x ). By Generalized Average, thereexists (cid:15) > z ∈ B (cid:15) ( z ) implies d ( A ( S \ { z } ∪ { z } ) , A ( S )) < min { (cid:15) , (cid:15) } / (cid:15) ∗ = min { (cid:15) , (cid:15) , (cid:15) , (cid:15) , ζ } .Pick any x , y ∈ B (cid:15) ∗ / ( z ) and let z ∗ = z − (cid:15) ∗ . Set S = S \ { z } ∪ { z ∗ } , not-ing d ( r, A ( S )) < (cid:15) /

2. By Generalized Average, there exists S ∗ with { x , y } S S ⊂ S ∗ so that d ( A ( S ∗ ) , A ( S )) < (cid:15) ∗ / d ( S , S ∗ \ [ { x , y } S S ]) < ( (cid:15) ∗ / . Since d ( A ( S ∗ ) , r ) ≤ d ( A ( S ∗ ) , A ( S )) + d ( A ( S ) , r ) < (cid:15) , x , y ∈ K i ( A ( S ∗ )). Since everymember of S ∗ is no more than (cid:15) ∗ away from a member of S , Category Continuity* im-plies that c ( S ∗ ) ⊂ B ζ ( z ). CM* gives that either x ∈ c ( S ∗ ) or y ∈ c ( S ∗ ), so x (cid:37) R,i y or y (cid:37) R,i x .Continuity follows along the same lines as Lemma 2. CM* gives that it is alsomonotone, and Category Cancellation* that it is locally additive. Apply Theorem 2.2of Chateauneuf & Wakker [1993] to get a globally additive representation U k . (cid:3) By Lemma 12, there exists a category utility U k for each category. Since E R,k isdense in D k , we can extend U k to D k uniquely. By Generalized Average and CategoryContinuity*, for any S ∈ X with z ∈ [ D k \ E R k ] ∩ S , there is a z ∈ E R,k arbitrarily closeto z so that c ( S ) = c ([ S \ { z } ] ∪ { z } ), so it is suﬃcient to establish a representationwhen all alternatives categorized as k in S belong E R,k for each k and S . Fix two regions k and j . By CAR, for any x ∈ E R,k there exists x ∈ E R,k , y ∈ E R,j , and S ∈ X so that x , y ∈ c ( S ) and x ∼ R,k x . This implies there exists astrictly increasing function H so that V ( x | r ) = U k ( x ) when x ∈ K k ( r ) and V ( x | r ) = H ( U j ( x )) when x ∈ K j ( r ) represents choice (when S ⊂ K k S K j ). This is well-deﬁnedand represents choice by Category SARP. By AAC*, H is an aﬃne function. Theargument are readily seen to extend inductively to all regions, which complete theproof. (cid:3) B.3.

Proof of Lemma 1.

Pick any x ∈ X and set S = { x, x } where x = ( x , x ).Then, A ( S ) = x by strong generalized average, so both x and x are 1-salient by S4.By CM*, x ∈ c ( S ), and so x ⊂ E R, . x was arbitrary, so X = E R, . Similar for K . (cid:3) B.4.

Proof of Proposition 3.

By Lemma 1, the structure assumption is satisfied. By Theorem 5, the category function is generated by a salience function. By Theorem 6, $c$ conforms to Strong CTM. Mimicking the arguments of Theorem 2, Reference Interlocking implies $U_k(x) = w_1^k u_1(x_1) + w_2^k u_2(x_2) + \beta_k$. The rest follows from the arguments that establish Proposition 2. □

B.5. Proof of Proposition 4.

Pick any $r$, and suppose $U_k(r) \geq U_{-k}(r)$. Since $A$ is a generalized average, for any $y \gg r$ there exists a menu $S$ so that $A(S)$ is arbitrarily close to $r$ and $y \gg x$ for all $x \in S$ (pick $S$ so its convex hull is in a small enough neighborhood of $r$ that doesn't include $y$). By making that neighborhood smaller if necessary, either $y$ belongs to $K_k(A(S))$ or $y \notin K_k(r)$. There exists a $y'$ arbitrarily close to, but not equal to, $y$, so that $U_k(y') = U_k(y)$ and $U_{-k}(y') \neq U_{-k}(y)$. In the former case either $y$ or $y'$ is chosen from the menu $S'$ by categorical monotonicity*, where $S'$ is a menu (assumed to exist by generalized average) with $A(S')$ sufficiently close to $A(S)$ and $y, y' \in S'$. Moreover, $y' \in c(S')$ if it is close enough that it too belongs to $K_k$. Conclude that $y \in K_k(r)$ implies that there exists $y'$ arbitrarily close to, but not equal to, $y$ with $U_k(y') = U_k(y)$ and $\{y, y'\} = c(S')$. If $y \notin K_k(r)$, then both $y$ and $y'$ cannot be chosen. Either $y'$ is not chosen because it is in $K_{-k}$, or the DM will not be indifferent between $y$ and $y'$. The rest follows from Proposition 1.