Multinomial logit processes and preference discovery: inside and outside the black box

Simone Cerreia-Vioglio, Fabio Maccheroni, Massimo Marinacci
Università Bocconi

Aldo Rustichini
University of Minnesota

April 28, 2020
Abstract
We provide two characterizations, one axiomatic and the other neuro-computational, of the dependence of choice probabilities on deadlines, within the widely used softmax representation

p_t(a, A) = e^{u(a)/λ(t) + α(a)} / Σ_{b∈A} e^{u(b)/λ(t) + α(b)}

where p_t(a, A) is the probability that alternative a is selected from the set A of feasible alternatives if t is the time available to decide, λ is a time-dependent noise parameter measuring the unit cost of information, u is a time-independent utility function, and α is an alternative-specific bias that determines the initial choice probabilities and possibly reflects prior information.

Our axiomatic analysis provides a behavioral foundation of softmax (also known as the Multinomial Logit Model when α is constant). Our neuro-computational derivation provides a biologically inspired algorithm that may explain the emergence of softmax in choice behavior. Jointly, the two approaches provide a thorough understanding of soft-maximization in terms of internal causes (neurophysiological mechanisms) and external effects (testable implications).

Keywords: Discrete Choice Analysis, Drift Diffusion Model, Heteroscedastic Extreme Value Models, Luce Model, Metropolis Algorithm, Multinomial Logit Model, Quantal Response Equilibrium, Rational Inattention
Human decisions are often made under pressing deadlines that substantially affect decision processes. Think of a trader deciding among alternative investments in fast-moving financial markets, a triage nurse screening patients in life-threatening conditions, a soccer player under pressure choosing an action in a split second. In all of these examples, the decision maker is given a constrained deliberation time to gather and process noisy information about alternatives, whose nature he typically only imperfectly knows. This binding constraint, with deliberation typically lasting until the deadline, prevents the decision maker from fully learning the nature of the alternatives, and so from discovering his preference over them and selecting the best one. For this reason, noise in information acquisition translates into stochastic choice behavior: when facing the same set of alternatives on different occasions, the decision maker might well end up choosing differently.

In this paper we study stochastic choice behavior caused by time-constrained information processing. We focus on softmax probabilistic choices, the most classic stochastic choice specification, in which the probability of choosing alternative a from a menu A is:

p_t(a, A) = e^{u(a)/λ(t) + α(a)} / Σ_{b∈A} e^{u(b)/λ(t) + α(b)}    (1)

Here u(a) is the true, but unknown to the decision maker and to the analyst, subjective value of alternative a; λ(t) is the cost of processing one unit of information in t seconds; and α(a) is a behavioral initial bias for alternative a, possibly due to past information. When the initial bias is absent, (1) reduces to a multinomial logit specification. The initial bias determines choice behavior when there is no deliberation time:

p_0(a, A) = lim_{t→0} p_t(a, A) = e^{α(a)} / Σ_{b∈A} e^{α(b)}

At the opposite extreme, under unconstrained deliberation time the best alternatives are selected:

p_∞(a, A) = lim_{t→∞} p_t(a, A) > 0 ⟺ a ∈ argmax_A u

as prescribed by standard ordinal utility analysis. In general, under constrained but non-zero deliberation time, an intermediate stochastic behavior results, which gives, as deliberation time increases, a higher chance (in the sense of stochastic dominance) of choosing better alternatives.

Matejka and McKay (2015) have shown that softmax stochastic choice behavior arises when the decision maker optimally processes information about u (the unknown "state of nature") under an entropic cost of information. Their study provides an important optimal information acquisition foundation for softmax behavior. In this paper we study such behavior from two different, yet complementary, viewpoints that integrate their analysis. First, we provide a framework for the external, behavioral, study of an analyst who observes the choices of the decision maker and interprets them in the "as if" mode of revealed preference analysis, through behavioral axioms that characterize softmax stochastic behavior. Second, we pursue an internal, neural, approach that provides a causal analysis of the decision maker's choices through a biologically inspired algorithmic decision process that may explain softmax emergence in intelligent behavior and that naturally links multi-alternative choice with the classical diffusion model paradigm of binary choice.

These two complementary approaches provide, along with the Matejka and McKay (2015) optimality analysis, a complete perspective on soft-maximization as a model of preference discovery, in terms of both internal (neural) causes and external (behavioral) effects.
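To fix ideas, the deadline dependence in (1) can be sketched numerically. In the snippet below the utilities, the biases, and the noise schedule λ(t) = 1/t are illustrative assumptions chosen only to exhibit the two limit cases, not quantities taken from the paper.

```python
import math

def softmax_choice(t, u, alpha, menu, lam=lambda t: 1.0 / t):
    """Choice probabilities p_t(a, A) of specification (1).

    u, alpha: dicts of utilities and initial biases.
    lam: noise schedule, with lam(t) -> infinity as t -> 0 and
         lam(t) -> 0 as t -> infinity (1/t is an illustrative choice).
    """
    logits = {a: u[a] / lam(t) + alpha[a] for a in menu}
    m = max(logits.values())          # log-sum-exp shift for numerical stability
    scores = {a: math.exp(x - m) for a, x in logits.items()}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}

u = {"a": 2.0, "b": 1.0, "c": 0.0}      # hypothetical subjective values
alpha = {"a": 0.0, "b": 0.5, "c": 0.0}  # hypothetical initial biases
menu = ["a", "b", "c"]

p_fast = softmax_choice(0.001, u, alpha, menu)   # near t = 0: biases dominate
p_slow = softmax_choice(1000.0, u, alpha, menu)  # large t: best alternative wins
```

With almost no deliberation time the probabilities track e^{α}, so the biased alternative b is the modal choice; with ample deliberation time the probability of the u-maximal alternative a approaches one.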
In particular, we address two key questions:

(i) Is the softmax model empirically testable, and can its parameters be identified from behavioral data?

(ii) Is this model plausible from a neural viewpoint?

This paper presents positive answers to both questions. More specifically, in the first part of the paper (Sections 2 and 3) we address the first question by carrying out an "outside the black box" revealed preference analysis that leads to a representation theorem, Theorem 5, which axiomatizes the softmax model (1). Our axioms form a set of necessary and sufficient testable implications of the model that allow the analyst to falsify the model and, when it is not falsified, to elicit its parameters from behavioral data, as detailed in Proposition 6. Estimation methods for this model are, instead, well established, as the multinomial logit model is widely used in discrete choice analysis.

Formally, λ(t) → ∞ as t → 0 and λ(t) → 0 as t → ∞. Bias is absent when α is constant. Preference discovery here means the learning of the nature, and so of the subjective value, of alternatives when information is costly.
We complete our external analysis by showing that longer deliberation times first-order stochastically improve the chances of selecting better alternatives (Proposition 7). As deliberation time becomes infinite, best alternatives get selected (Proposition 8), thus recovering standard ordinal analysis as a limit case.

In the second part of the paper (Section 4) we address the second question by going "inside the black box" through a computational neuroscience approach. We develop an algorithmic decision process that, when implemented by the neural system, generates the softmax stochastic choice behavior described in equation (1). This process is inspired by eye-tracking evidence and combines Markov exploration, as in Metropolis et al. (1953), with the drift diffusion model of binary choice of Ratcliff (1978), in the value-based version proposed by Krajbich, Armel and Rangel (2010) and Milosavljevic et al. (2010). Moreover, it approximately generates softmax stochastic choice as observable exterior output (Proposition 12), thus providing a neural foundation for softmax choice behavior. We also present physiologically calibrated simulations that support the biological plausibility of this neural foundation (Section 4.6).

The first two parts of the paper show that, jointly, the inner and outer approaches provide a thorough understanding of soft-maximization in terms of internal causes (neurophysiological mechanisms) and external effects (testable implications). In the final part of the paper (Section 5), we show that their cause-effect nexus actually permits us to identify and cross-validate the components of the behavioral and neural softmax specifications. This empirical dividend of our inner-outer analysis concludes our exercise.
Discrete choice analysis
A by-product of our analysis is an axiomatic foundation of the heteroscedastic multinomial logit model, the workhorse of discrete choice analysis. Indeed, (1) can be rewritten in terms of random utility (see Luce and Suppes, 1965, and McFadden, 1973) as

p_t(a, A) = Pr{ u(a) + λ(t)ε(a) > u(b) + λ(t)ε(b) for all b ∈ A∖{a} }

where {ε(a)}_{a∈A} is a collection of independent errors with type I extreme value distribution, alternative-specific mean determined by α(a), and common variance π²/6. Here p_t(a, A) describes the stochastic behavior of a decision maker who is trying to maximize u but, because of time pressure, makes mistakes in evaluating the various alternatives. The standard deviation of mistakes is proportional to λ(t) and their bias is captured by α.

In discrete choice analysis, t may be time or, more generally, an index describing the experimental conditions under which data have been collected (that is, the different data sets available to the analyst). Heteroscedasticity, i.e., the dependence of λ on t, and the presence of α were introduced because, while the decision makers' utility u is a stable trait to be learned, disturbances are affected by experimental conditions and alternative-specific biases.

The present paper makes it possible to test for mis-specification of the heteroscedastic multinomial logit model and provides simple techniques to directly identify its parameters from data. In return, as previously mentioned, the discrete choice analysis literature provides a number of methods to estimate the parameters of the softmax specification (1).

Related literature
This paper considers exogenous deliberation times; thus we focus our discussion on the literature dealing with this issue. To the best of our knowledge, there is only one other axiomatic study of this kind; it presumes that the analyst knows the relevant state, while we consider the general case in which the analyst may possibly ignore this state, or even the state space. Outside the laboratory, presuming such knowledge is a quite strong assumption. For example, what is the relevant state in the following simple vending machine value-based task?

In a general Random Expected Utility perspective, Lu (2016) axiomatically captures preference learning through increasingly informative priors on the set of probabilistic beliefs of the decision maker. Fudenberg and Strzalecki (2015) axiomatize a discounted adjusted logit model. Differently from the present work, their paper studies stochastic choice in a dynamic setting where choices made today can influence the possible choices available tomorrow, and consumption may occur in multiple periods. Frick, Iijima and Strzalecki (2017) characterize the general random utility counterpart. Saito (2017) obtains several characterizations of the Mixed Logit Model.

See, e.g., the textbooks of Louviere, Hensher and Swait (2000) and Train (2009). For example, T is a set of locations in Train (2009, pp. 24-25), and it is a doubleton distinguishing between stated intentions and market choices in Ben-Akiva and Morikawa (1990). In Appendix D, we extend our axiomatic analysis to allow for completely general choice and index sets. The econometric study of the heteroscedastic multinomial logit model, now textbook material, dates back to Ben-Akiva and Morikawa (1991), Swait and Louviere (1993), Hensher and Bradley (1993) and Bhat (1995). Models where decision time is endogenously (say, optimally) chosen are the subject of active research; we refer readers to Woodford (2014), Steiner, Stewart and Matejka (2017), Fudenberg, Strack and Strzalecki (2018) and Webb (2019) for updated perspectives.
Finally, Baldassi et al. (2019) and Fudenberg, Newey, Strack and Strzalecki (2019) axiomatize the value-based DDM.

As to algorithmic random choice theory, the vast majority of the multi-alternative extensions of the DDM to choice tasks with N > 2 alternatives consider simultaneous evidence accumulation for all the N alternatives in the menu. In these models, the choice task is assumed to simultaneously activate N accumulators, each of which is primarily sensitive to one of the alternatives and integrates the evidence relative to that alternative. Choices are then made based on absolute or relative evidence levels, with endogenous or exogenous stopping times. See, e.g., Roe, Busemeyer and Townsend (2001), Anderson, Goeree and Holt (2004), McMillen and Holmes (2006), Bogacz, Usher, Zhang and McClelland (2007), Ditterich (2010) and Krajbich and Rangel (2011). Natenzon (2019) also belongs to this family and proposes a Multinomial Bayesian Probit model to jointly accommodate similarity, attraction and compromise effects in a preference learning perspective. According to Natenzon's model, when facing a menu of alternatives the decision maker (who has a priori i.i.d. standard normally distributed beliefs on the possible utilities of alternatives) receives a random vector of jointly normally distributed signals that represents how much he is able to learn about the ranking of alternatives before making a choice (say, within time t). The decision maker updates the prior according to Bayes' rule and chooses the option with the highest posterior mean utility.

Alternatively, Reutskaja, Nagel, Camerer and Rangel (2011) propose three two-stage models in which subjects randomly search through the feasible set during an initial search phase and, when this phase is concluded, select the best item that was encountered during the search, up to some noise. This approach involves what may be called a quasi-exhaustive search, in that the presence of a deadline may terminate the search phase before all alternatives have been evaluated, which introduces an error probability. In contrast, this paper focuses on sequential pairwise comparison, as advocated by Russo and Rosen (1975) in a seminal eye fixation study. Although different from the models considered by Krajbich and Rangel (2011) and Reutskaja, Nagel, Camerer and Rangel (2011), our model is consistent with some of their experimental findings about the menu-exploration process and shares the reliance on the classical choice theory approach in which multi-alternative choice proceeds through binary comparison and elimination. Rustichini and Padoa-Schioppa (2015) extend the DDM in a biologically realistic model by adapting models developed in visual perception to economic choices. Rustichini et al. (2017) use this model to explain optimality properties of adaptive coding in choice.

Choice situations of this kind have been studied since Saltzman and Garner (1948) and Kaufman, Lord, Reese and Volkmann (1949). More recent contributions are Gabaix, Laibson, Moloche and Weinberg (2006), Caplin and Dean (2014), Dean and Neligh (2019) and Dewan and Neligh (2020).
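The paper's neural process is developed in Section 4; purely as an illustration of how Metropolis-style menu exploration combined with logistic binary comparisons (the choice probabilities a value-based DDM produces) can generate softmax frequencies, here is a minimal sketch. The utilities, noise level, exploration length, and the Barker-style acceptance rule are all assumptions of this sketch, not the paper's calibrated model.

```python
import math, random

random.seed(3)

u = {"a": 1.0, "b": 0.5, "c": 0.0}   # hypothetical utilities
lam = 0.8                            # hypothetical noise level
menu = list(u)

def compare(current, challenger):
    # Binary comparison won with logistic probability in the value
    # difference (a Barker-style acceptance rule)
    p_win = 1.0 / (1.0 + math.exp(-(u[challenger] - u[current]) / lam))
    return challenger if random.random() < p_win else current

def explore(steps=100):
    # Markov exploration of the menu via sequential pairwise comparisons
    state = random.choice(menu)
    for _ in range(steps):
        challenger = random.choice([x for x in menu if x != state])
        state = compare(state, challenger)
    return state

# Long-run choice frequencies approach softmax(u / lam)
n = 10_000
counts = {x: 0 for x in menu}
for _ in range(n):
    counts[explore()] += 1
freqs = {x: c / n for x, c in counts.items()}
```

Because the logistic acceptance rule satisfies detailed balance with respect to the weights e^{u/λ}, the chain's stationary distribution is exactly the softmax of u/λ over the menu.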
Let 𝒜 be the collection of all nonempty finite subsets A of a universal set X of possible alternatives, called menus. We denote by Δ(X) the set of all finitely supported probability measures on X and, for each A ⊆ X, by Δ(A) the subset of Δ(X) consisting of the measures assigning mass 1 to A.

Definition 1 A random choice rule is a function

p : 𝒜 → Δ(X), A ↦ p_A

such that p_A ∈ Δ(A) for all A ∈ 𝒜.

Given any alternative a in A, we interpret p_A({a}), also denoted by p(a, A), as the probability that a decision maker chooses a when the set of available alternatives is A. More generally, if B is a subset of A, we denote by p_A(B) or p(B, A) the probability Σ_{b∈B} p(b, A) that the selected element lies in B. This probability can be viewed as the frequency with which an element in B is chosen.

As usual, given any a and b in X, we set

p(a, b) = p(a, {a, b}),  r(a, b) = p(a, b)/p(b, a),  ℓ(a, b) = ln r(a, b)    (2)

Thus r(a, b) denotes the odds for a against b, that is, the ratio between the number of episodes in which a is chosen and the number of episodes in which b is. Its logarithm ℓ(a, b) denotes the log-odds, which is analytically convenient because it is positive if and only if the odds are favorable to a.

Luce (1959) proposes the most classical random choice model. Its assumptions on p are:

Positivity p(a, b) > 0 for all a, b ∈ X.

Choice Axiom p(a, A) = p(a, B) p(B, A) for all B ⊆ A in 𝒜 and all a ∈ B.

Menus are also called choice sets or choice problems. We also assume that X has at least three elements, since the two remaining cases are simple exercises. Formally, x ↦ p(x, A), for all x in X, is the discrete density of p_A, but notation will be abused and p_A(·) identified with p(·, A). Indeed, p(a, b) ≥ p(b, a) ⟺ r(a, b) ≥ 1 ⟺ ℓ(a, b) ≥ 0.
In words, the Choice Axiom requires that the probability of choosing a from menu A be that of first selecting B from A and then a from B. As observed by Luce, this amounts to requiring that {p_A : A ∈ 𝒜} is a conditional probability system in the sense of Renyi (1955).

As is well known, both axioms can be expressed in terms of odds. In particular, the Choice Axiom is equivalent to the odds independence condition

p(a, b)/p(b, a) = p(a, A)/p(b, A)

when p(a, A)/p(b, A) is well defined; that is, the odds for a against b are independent of the other alternatives available in the menu.

Next we state Luce's classic representation theorem.
Theorem 1 (Luce)
The following conditions are equivalent for a random choice rule p : 𝒜 → Δ(X):

1. p satisfies Positivity and the Choice Axiom;

2. there exists v : X → ℝ such that

p(a, A) = e^{v(a)} / Σ_{b∈A} e^{v(b)}    (3)

for all A ∈ 𝒜 and all a ∈ A.

In this case, v is unique up to location (i.e., up to an additive constant).

Moreover, when X is a topological space, it is easy to see that the next axiom characterizes the continuity of the function v that appears in (3).

Continuity
The function (a, b) ↦ p(a, b) is continuous on the set of all pairs of distinct alternatives in X.

This topological setting is standard in applications, where typically Continuity is either implicitly assumed or automatically satisfied. Finally, observe that Theorem 1 shows that Positivity is equivalent, under the Choice Axiom, to the stronger assumption that p_A has full support for each A in 𝒜.

In our preamble we have considered a single random choice rule: this is the setup of classical stochastic choice as in, for example, Debreu (1958). We now provide the intuition for the extension of p to a family of such rules, p_t, where the index t models the effects of the discovery process. We will illustrate three stages in this process.

The first stage, corresponding to t = 0, is illustrated by a decision maker who has to choose one of the following alternatives, each identified by a QR code:

[figure: four QR codes]

If the decision maker chooses alternative a, he receives a number of euros (or apple juice drops) equal to the number of black squares n(a) present in QR code a. Our decision maker is "greedy" and so prefers more euros (or apple juice) to less:

a ≻ b ⟺ n(a) > n(b)

A utility function that represents ≻ thus has the form u = φ ∘ n, where φ : ℕ → ℝ is strictly increasing.

If the decision maker has deliberated long enough (so t = ∞) we may suppose that he knows the correct number of black squares n(a) of each alternative, as indicated in the figure:

[figure: the four QR codes with 229, 242, 232 and 248 black squares]

This knowledgeable decision maker selects the best alternative, which in the figure has 248 black squares. His choice behavior is thus non-stochastic and reveals only his preference order, his ranking of alternatives. Even if the decision maker experienced different intensities of preference over different pairs of alternatives, his choice behavior would not reveal anything about them to the analyst, an external observer. In other words, intensities are irrelevant to model his choice behavior. This is, for instance, the standard setting of consumer theory since the ordinal revolution started by Vilfredo Pareto (who first understood this irrelevance).

Matters are different in the third stage, intermediate to the first two we have considered. Suppose that, perhaps because of time pressure (or cognitive limitations), the decision maker does not know n(a), so he is unable to properly evaluate alternatives, and only receives a (possibly costly) noisy signal about it. For instance, knowing that he can deliberate for t seconds only, the decision maker might use this time to extract squares at random from the codes:

[figure: a sample sequence of black and white squares extracted during deliberation]

After deliberation, the decision maker has to choose an alternative. Because of the signal's noise, his choice is now stochastic and he might well end up selecting a sub-optimal alternative. Interestingly, the stochastic choice behavior that emerges in this informationally poorer setting may reveal information on preference intensities.

Intuitively, this happens because the intensity of preference affects the error probability, that is, the chance of selecting a sub-optimal alternative after deliberation. Indeed, the stronger the preference for a over b, the easier their comparison, so the smaller the probability of choosing the inferior b. The intensity of preference, which plays no role in the choice behavior of a decision maker who knows his subjective value of alternatives, becomes important to understand his stochastic choice behavior when he ignores such value and receives only noisy evidence about it. In turn, by affecting choice probabilities, preference intensities leave a trace in the decision maker's choice behavior that an analyst may exploit to elicit them.

This preference discovery intuition is an information acquisition elaboration of a classic discrimination principle of psychophysics, discussed for example in Davidson and Marschak (1959, p. 237). Some recent work of Alos-Ferrer, Fehr and Netzer (2018) and de Palma, Fosgerau, Melo and Shum (2019) may help to better understand this elaboration. The latter paper, for instance, shows that, when the information cost belongs to a large class of entropies, choice probabilities implied by optimal information acquisition take the form of an additive random utility.

See Lemma 2 of Luce (1959) for the case in which Positivity holds and our Lemma 14 in Appendix A for the general case. This odds independence condition is often called independence from irrelevant alternatives; see Lemma 3 of Luce (1959) for the case in which Positivity holds and our Lemma 14 in Appendix A for the general case. Also this continuity axiom can be expressed in terms of odds (see Lemma 13 in Appendix A). In Cerreia-Vioglio, Maccheroni, Marinacci, and Rustichini (2016), we drop the full support assumption and characterize general random choice rules in terms of "optimality" of their support (see Theorem 15 in Appendix A). Each code has 441 = 21 × 21 white or black squares. Throughout the paper we take seconds as the units of time.
In this case, after acquiring information optimally for t seconds, the decision maker chooses a over b with probability

p_t(a, b) = Pr{ u(a) + ε_at > u(b) + ε_bt } = Pr{ ε_bt − ε_at < u(a) − u(b) }

The error probability (that is, the probability of choosing b when u(a) > u(b)) decreases as the difference u(a) − u(b), interpreted as preference intensity, increases.

The previous discussion motivates us to go beyond the traditional ordinal setting, where preferences only rank alternatives, and to introduce a richer setting in which we can also talk about preference intensities and their utility representations.

The outcome of the extraction procedure is stochastic, but depends on the correct state (e.g., the probability of extracting a black square from the North-Western QR code is 229/441). In the example, our experiment leads to a mistake: the sub-optimal South-Western code is chosen, with a material loss of 16 euros (or apple juice drops).
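For the type I extreme value errors of the heteroscedastic logit form, the binary probability above has a closed logistic form, which a short Monte Carlo check confirms. The utilities and noise level below are illustrative assumptions, and the initial bias is omitted (a priori homogeneous alternatives).

```python
import math, random

random.seed(11)

def gumbel():
    # Standard type I extreme value (Gumbel) draw via inverse transform
    return -math.log(-math.log(random.random()))

ua, ub = 1.0, 0.4   # hypothetical utilities u(a), u(b)
lam = 0.5           # hypothetical noise level lambda(t)

# The difference of two i.i.d. Gumbel errors is logistic, so
# p_t(a, b) = 1 / (1 + exp(-(u(a) - u(b)) / lambda))
p_closed = 1.0 / (1.0 + math.exp(-(ua - ub) / lam))

# Monte Carlo over the errors eps_at, eps_bt
n = 200_000
wins = sum(ua + lam * gumbel() > ub + lam * gumbel() for _ in range(n))
p_sim = wins / n
```

As the text notes, the larger the utility difference, the smaller the error probability 1 − p_t(a, b).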
To this end, we consider three strict preference relations ≻, ≻^♮ and ≻*. In particular, ≻ is defined on the set of alternatives X, and ranks them as in

a ≻ b

while ≻^♮ is defined on the set of distinct pairs of alternatives X^≠ = {(a, b) : a ≠ b in X}, and ranks them as in

(a, b) ≻^♮ (c, d)

Finally, ≻* is defined on the set of binary choice sets {{a, b} : a ≠ b in X}, and ranks them as in

{a, b} ≻* {c, d}

In terms of interpretation, ≻ is a standard preference relation that ranks alternatives à la Debreu (1954, 1964); ≻^♮ ranks pairs of alternatives in terms of intensity of preference, à la Shapley (1975); and ≻* ranks choice problems in terms of ease of comparison, à la Suppes and Winet (1955). Indeed, a decision maker might well regard some comparisons as easier to make than others.

Next we introduce a joint numerical representation of these three binary relations that extends the traditional ordinal representation.

Definition 2
A function u : X → ℝ is a psychometric utility (function) for the triplet (≻, ≻^♮, ≻*) if, for each pair of alternatives a, b ∈ X,

a ≻ b ⟺ u(a) > u(b)    (4)

and if, for each quadruple of alternatives a ≠ b and c ≠ d in X,

(a, b) ≻^♮ (c, d) ⟺ u(a) − u(b) > u(c) − u(d)    (5)

as well as

{a, b} ≻* {c, d} ⟺ |u(a) − u(b)| > |u(c) − u(d)|    (6)

A psychometric utility does not only represent the basic preference ≻ in the standard ordinal fashion, but also accounts for the intensity of preferences, quantified via utility differences, as well as for the ease of comparison, quantified via absolute values of utility differences.

Psychometric utilities are cardinal, as the next routine lemma shows.

Lemma 2
Continuous psychometric utilities, de(cid:133)ned on connected topological spaces, are cardinally unique.
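Conditions (4)-(6) are directly operational: given a candidate psychometric utility, the three relations are read off from utility levels, utility differences, and absolute utility differences. A minimal sketch, with made-up utility values:

```python
u = {"a": 3.0, "b": 1.5, "c": 1.0, "d": 0.0}  # hypothetical psychometric utility

def pref(a, b):
    # (4): a is preferred to b iff u(a) > u(b)
    return u[a] > u[b]

def stronger(pair1, pair2):
    # (5): preference intensity compared via utility differences
    (a, b), (c, d) = pair1, pair2
    return u[a] - u[b] > u[c] - u[d]

def easier(m1, m2):
    # (6): ease of comparison via absolute utility differences
    (a, b), (c, d) = m1, m2
    return abs(u[a] - u[b]) > abs(u[c] - u[d])
```

Note that easier is insensitive to the order within each pair, while stronger is not: ease of comparison discards exactly the sign information that intensity of preference retains.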
The basic intuition previously outlined suggests that stochastic choice behavior might be understood in terms of psychometric utilities. Our softmax representation theorem (Theorem 5) will show that, indeed, this is the case. In turn, stochastic choice behavior can then be used to elicit psychometric utilities. The next section introduces the measurement concepts needed to accomplish this elicitation.

Before doing this, we make a final important remark. Conceptually, the representations (4)-(6) capture different important features of the decision maker's subjective evaluations of alternatives. Yet, though distinct, they are not independent. Intuitively, the decision maker should find it easier (say, in terms of mental effort, for instance to retrieve past memories) to rank alternatives over which he feels a stronger, more intense, preference, with one alternative being clearly more desirable than the other. Proposition 17 in Appendix B clarifies this by establishing, when representations (4)-(6) hold, the existence of a duality map that associates to each pair (≻, ≻*) a relation ≻^♮ and vice versa. We can diagram the duality as

D : R × R* → R^♮, (≻, ≻*) ↦ ≻^♮    and    D⁻¹ : R^♮ → R × R*, ≻^♮ ↦ (≻, ≻*)    (7)

where R, R* and R^♮ denote the sets of strict preferences on X, on binary choice sets, and on X^≠, respectively. Strict preference relations are asymmetric and negatively transitive (see, e.g., Definition 2.2 of Kreps, 1988).

In view of this duality, in principle one can focus on either (≻, ≻*) or ≻^♮ and derive the properties of the other via the duality.
We nevertheless consider them together, as a triplet (≻, ≻^♮, ≻*), because, as previously remarked, they shed light on different features of decision makers' subjective evaluations of alternatives, albeit logically connected when they admit utility representations (4)-(6). Yet, this duality is an important structural property that will later emerge in our analysis, in particular in the structure of the softmax representation theorem (Theorem 5).

How can an analyst detect and measure the "intensity" traces left by the stochastic choice behavior of the decision maker? The new element that we have introduced is a family of random choice rules, rather than a single one, and the crucial assumptions that permit us to address this question concern the way in which these rules move as t changes. To this end, observe that a random choice rule can represent both the outcome of deliberation and the initial bias of a decision maker. Suppose he is comparing two distinct alternatives a and b. The initial probability p_0(a, b) describes the frequency with which a is chosen over b before any evidence-based deliberation. Alternatives a and b are a priori homogeneous if p_0(a, b) = 1/2, that is, if there is no initial bias for one over the other.

Now assume that, after presentation of the choice problem {a, b}, the decision maker is (exogenously) given the possibility to deliberate for t seconds by acquiring and processing information about the alternatives. Depending on the evidence that he is able to gather, be it from the environment or memory (or both), the choice probability p_t(a, b) at deliberation time t may well be different from the initial one p_0(a, b). We interpret this change in light of the following basic principle.
Measurement Principle
Prior behavior gets transformed to posterior behavior through consideration of evidence, and the transformation itself represents the amount of evidence processed during deliberation.
This principle is best formalized through a change in odds:

r_t(a, b) = f × r_0(a, b)    (8)

where r_t(a, b) are the posterior odds, r_0(a, b) the prior odds, and the ratio

f = f_t(a, b) = r_t(a, b) / r_0(a, b)

represents the strength of evidence, gathered in t seconds, in favor of the hypothesis "a is preferable to b."

That said, in both statistics and neuroscience additive measurements are preferred, here routinely achieved by taking logarithms on both sides of (8):

ℓ_t(a, b) = ln f_t(a, b) + ℓ_0(a, b)

where ℓ_t(a, b) are the posterior log-odds, ℓ_0(a, b) the prior log-odds, and

w_t(a, b) = ln f_t(a, b) = ℓ_t(a, b) − ℓ_0(a, b)

is the additive version of f_t(a, b), called the weight of evidence, a convenient logarithmic rescaling of the strength of evidence.

Summing up, the strength of evidence is the change in odds for a against b induced by evidence accumulation for t seconds. This important notion permits us to introduce three revealed preferences that correspond to the three strict preferences (≻, ≻^♮, ≻*), which capture, as previously argued, key features of the decision maker's subjective evaluations of alternatives.

We begin with the traditional ordinal notion. As usual, in the following "revealed" is short for "revealed to an analyst."

See, e.g., Bogacz et al. (2006), Gold and Shadlen (2007), and Shadlen and Shohamy (2016). See, e.g., Huseynov, Krajbich, and Palma (2018). See, again, Gold and Shadlen (2007).
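In practice, the decomposition (8) can be computed directly from observed binary choice frequencies; the two frequencies below are made-up numbers for illustration.

```python
import math

def odds(p):
    # r(a, b) = p / (1 - p) for a binary problem {a, b}
    return p / (1.0 - p)

def weight_of_evidence(p0, pt):
    # w_t(a, b) = posterior log-odds minus prior log-odds = ln f_t(a, b)
    return math.log(odds(pt)) - math.log(odds(p0))

p0 = 0.50   # a priori homogeneous alternatives: no initial bias
pt = 0.80   # frequency with which a is chosen after t seconds

w = weight_of_evidence(p0, pt)   # > 0: deliberation favors a over b
f = math.exp(w)                  # strength of evidence f_t = r_t / r_0
```

Here the odds for a move from 1 to 4, so the strength of evidence is 4 and the weight of evidence is ln 4 ≈ 1.39.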
After a deliberation time t, an alternative a is revealed preferred to b, written a ≻_t b, if p_t(a, b) > p_0(a, b).

In words, a is revealed preferred to b if deliberation favors a over b. In particular, when alternatives are a priori homogeneous, i.e., p_0(a, b) = 1/2, this definition coincides with the standard notion of stochastically revealed preference

a ≻_t b ⟺ p_t(a, b) > p_t(b, a)

which has informed economics and psychology since the 1950s.

In general, when alternatives are not necessarily a priori homogeneous, preference for a over b is equivalently revealed by an increase in the odds for a against b after deliberation; in fact,

a ≻_t b ⟺ w_t(a, b) > 0 ⟺ f_t(a, b) > 1

Starting from this observation, Luce (1957, pp. 17-19) notes that, while the preference order is determined by the sign of w_t(a, b), the preference intensity is determined by its value. This motivates the next definition.
After a deliberation time $t$, the preference for $a$ over $b$ is revealed to be stronger than that for $c$ over $d$, written $(a,b) \succ_t^{\natural} (c,d)$, if $w_t(a,b) > w_t(c,d)$.
In words, the preference for $a$ over $b$ is stronger than that for $c$ over $d$ if deliberation provides stronger evidence in favor of $a$ against $b$ than in favor of $c$ against $d$. This definition thus equates strength of preference and strength of evidence, a key revelation assumption. Formally,
$$(a,b) \succ_t^{\natural} (c,d) \iff w_t(a,b) > w_t(c,d) \iff f_t(a,b) > f_t(c,d)$$
While the preference order $\succ_t$ is a relation between single alternatives, preference intensity $\succ_t^{\natural}$ is a relation between pairs of alternatives.
The next and final relation, $\succ_t^{*}$, is defined over binary decision problems $\{a,b\}$ and is meant to represent their relative difficulty. It relies upon the following classic principle of psychophysics.

Psychometric Principle
Easier choice problems are more likely to elicit correct responses than harder ones.
[Footnotes: See, e.g., Georgescu-Roegen (1936, 1958), Mosteller and Nogee (1951), Papandreou (1953, 1957), Quandt (1956), Debreu (1958), and Davidson and Marschak (1959). This principle is often discussed under the name "psychometric function." See, e.g., Alos-Ferrer, Fehr, and Netzer (2018).]
In a series of important works, Georg Rasch formalizes this principle through the concept of degree of easiness of a decision problem $\{a,b\}$, given by
$$e_t(a,b) = |w_t(a,b)|$$
The reason why this quantity captures the psychometric principle is immediately seen by plotting the error rate – the probability of choosing the inferior alternative – in the decision problem $\{a,b\}$ as a function of the degree of easiness:
[Figure: error rate as a function of the degree of easiness]
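The shape of this curve can be sketched numerically. A minimal illustration (the logistic form below follows from the log-odds accounting above; the prior log-odds argument and the specific values are assumptions of the sketch):

```python
import math

def error_rate(easiness, prior_log_odds=0.0):
    """Probability of choosing the inferior alternative in {a, b}, as a function of
    the degree of easiness e_t(a, b) = w_t(a, b) >= 0, with a the superior alternative.

    Posterior log-odds for a = easiness + prior log-odds, so the error rate is the
    logistic tail 1 / (1 + exp(easiness + prior_log_odds))."""
    return 1.0 / (1.0 + math.exp(easiness + prior_log_odds))
```

At zero easiness the error rate equals the initial probability of choosing the inferior alternative ($1/2$ under a priori homogeneity), and it decays exponentially toward zero as easiness grows.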
When the degree of easiness is zero, the error rate is maximal and coincides with the initial probability of choosing the inferior alternative. It then decreases exponentially as the degree of easiness increases, and eventually vanishes.
Since $w_t(a,b) = -w_t(b,a)$, the evidence in favor of $a$ coincides with that against $b$. The degree of easiness thus represents the total amount of evidence $|w_t(a,b)|$ that can be obtained by comparing $a$ and $b$ for $t$ seconds. A decision problem is difficult when this quantity is small, say because sensory evidence or memory do not provide information to the decision maker about the alternatives. All this leads to the following definition.

Definition 5
After a deliberation time $t$, a decision problem $\{a,b\}$ is revealed to be easier than a decision problem $\{c,d\}$, written $\{a,b\} \succ_t^{*} \{c,d\}$, if $e_t(a,b) > e_t(c,d)$.
This definition equates ease of comparison with the absolute amount of evidence that can be obtained through deliberation; in fact,
$$\{a,b\} \succ_t^{*} \{c,d\} \iff |w_t(a,b)| > |w_t(c,d)| \qquad (9)$$
Summing up, strength of evidence – or, equivalently, weight of evidence – can be elicited from choice data by looking at the variation of choice probabilities before and after deliberation. It reveals three relations: preference order, preference intensity, and ease of comparison.
Let $X$ be a topological space and $T \subseteq (0,\infty)$ a – discrete or continuous – nonempty set of points of time. We set $\bar{T} = T \cup \{0\}$.

Definition 6 A random choice process is a collection $\{p_t\}_{t \in \bar{T}}$ of random choice rules.
[Footnote: See Rasch (1960, 1961, 1980).]
For each $t$, we interpret $p_t(a,A)$ as the probability that a decision maker chooses alternative $a$ from menu $A$ if $t$ is the deliberation time, that is, the maximum amount of time he is (exogenously) given to decide. A random choice process thus describes the decision maker's stochastic choice behavior under different deliberation times.
Each component $p_t$ of a random choice process stochastically reveals (to an analyst), via Definitions 3-5, a triplet $(\succ_t, \succ_t^{\natural}, \succ_t^{*})$ for each $t \in T$. We say that $u : X \to$
$\mathbb{R}$ is a psychometric utility for the process $\{p_t\}$ if it is a psychometric utility for all the triplets $(\succ_t, \succ_t^{\natural}, \succ_t^{*})$ that the process reveals over different deliberation times, that is, if for any $a,b \in X$,
$$a \succ_t b \iff u(a) > u(b)$$
and, for any $a \neq b$ and $c \neq d$ in $X$,
$$(a,b) \succ_t^{\natural} (c,d) \iff u(a) - u(b) > u(c) - u(d)$$
as well as
$$\{a,b\} \succ_t^{*} \{c,d\} \iff |u(a) - u(b)| > |u(c) - u(d)|$$
for all $t \in T$.
We adopt a preference discovery interpretation. As previously outlined, the psychometric utility $u$ represents the correct value that alternatives have for the decision maker, a trait of his tastes which is stable (so, independent of $t$) and as yet unknown to him. During deliberation, the decision maker processes noisy evidence about $u$. Evidence may be costly (say, in subjective terms, like fatigue), so the decision maker confronts an information acquisition problem. After deliberation, he has to choose an alternative. Noise in information gathering and processing makes the ensuing choice behavior stochastic: the psychometric utility determines it only probabilistically.
The most important class of random choice processes describing a stochastic choice behavior which is consistent, as Matejka and McKay (2015) have shown, with the preference discovery interpretation we have maintained so far is that of softmax random choice processes:
Definition 7
A random choice process $\{p_t\}$ is softmax if there exist a payoff $u : X \to \mathbb{R}$, an (initial behavioral) bias $\alpha : X \to \mathbb{R}$, and a noise $\lambda : T \to (0,\infty)$, extended to $\bar{T}$ by $\lambda(0) = \infty$, such that
$$p_t(a,A) = \frac{e^{\frac{u(a)}{\lambda(t)} + \alpha(a)}}{\sum_{b \in A} e^{\frac{u(b)}{\lambda(t)} + \alpha(b)}} \qquad (10)$$
for all $A$, all $a \in A$, and all $t \in \bar{T}$.
Next we clarify the utility nature of the payoff.
Proposition 3 If $\{p_t\}$ is a softmax random choice process, then the payoff (function) $u$ in (10) is a psychometric utility for $\{p_t\}$.
[Footnotes: Say, by an experimenter, a script, or a spouse (see Agranov, Caplin, and Tergiman, 2015, for a simple protocol that allows these probabilities to be observed for human agents). An alternative interpretation of $t$, especially relevant when $T$ is discrete and panel data are considered, is the number of times that the decision maker has been facing choice problem $A$, called experience level by McKelvey and Palfrey (1995). On this, see also Luce and Suppes (1965, p. 332). To illustrate, in the QR code example (Section 2.2) the agent knows that he prefers more (money or apple juice) to less, but does not know the correct vector $(n(a), n(b), n(c), n(d)) = (229, \ldots)$ of physical payoffs (in euros or drops) associated with the alternatives. This vector determines the correct subjective value $u = \phi \circ n$ of the alternatives, the unknown "state" that determines the distribution of signals that the decision maker obtains through experimentation, both when he tries to count the number of black squares and when he randomly extracts four squares from each code. In both cases the state is revealed only stochastically (three research assistants, asked to count the squares within ten minutes, obtained three different vectors).]
Besides the payoff $u$, the softmax specification features two other key elements, bias $\alpha$ and noise $\lambda$. Before discussing the roles of these functions within a preference discovery interpretation of stochastic choice, we report their uniqueness properties.

Proposition 4 If $\{p_t\}$ is a softmax random choice process, then the psychometric utility $u$ in (10) is cardinally unique, the bias $\alpha$ is unique up to location and, unless $\{p_t\}$ is constant, the noise $\lambda$ is unique given $u$.
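Before examining these roles in detail, the specification (10) can be sketched in a few lines of code (the utilities, biases, and noise values below are illustrative assumptions, not estimates):

```python
import math

def softmax_choice(a, menu, u, alpha, lam_t):
    """Softmax choice probability (10); lam_t = math.inf plays the role of lambda(0),
    so the initial rule p_0 depends on the bias alpha only."""
    def score(x):
        drift = 0.0 if math.isinf(lam_t) else u[x] / lam_t
        return math.exp(drift + alpha[x])
    return score(a) / sum(score(b) for b in menu)

# Illustrative primitives (assumptions for the sketch):
u = {'a': 2.0, 'b': 1.0}
alpha = {'a': 0.0, 'b': 0.0}        # a priori homogeneous: p(a, b) = 1/2

p0 = softmax_choice('a', ['a', 'b'], u, alpha, math.inf)    # initial rule
err_slow = softmax_choice('b', ['a', 'b'], u, alpha, 1.0)   # error prob, lambda(t) = 1
err_fast = softmax_choice('b', ['a', 'b'], u, alpha, 0.25)  # smaller noise
```

With $\lambda(0) = \infty$ the initial rule depends only on the bias, while shrinking $\lambda(t)$ drives the error probability toward zero.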
Since $p(a,b) = e^{\alpha(a)} / \left[ e^{\alpha(a)} + e^{\alpha(b)} \right]$, a nonconstant function $\alpha$ accounts for the existence of initial, pre-deliberation, biases in the stochastic choice behavior of the decision maker. In particular, $\alpha$ is constant (so, irrelevant) if and only if alternatives are a priori homogeneous, with no initial bias in favor of any alternative over another, that is, $p(a,b) = 1/2$ for all distinct alternatives $a$ and $b$. An unbiased softmax process is called multinomial logit and has the form
$$p_t(a,A) = \frac{e^{\frac{u(a)}{\lambda(t)}}}{\sum_{b \in A} e^{\frac{u(b)}{\lambda(t)}}}$$
The value $\lambda(t)$ of the function $\lambda$ accounts for the error rate when $t$ is the deliberation time. Without loss of generality, assume $a \succ_t b$, that is, $u(a) > u(b)$. The error probability is then $p_t(b,a)$. Simple algebra shows that
$$p_t(b,a) = \frac{1}{1 + e^{\frac{u(a)-u(b)}{\lambda(t)} + \alpha(a) - \alpha(b)}} \qquad (11)$$
The higher $\lambda(t)$, the higher the error probability (so, the "noise"). In particular, when $\lambda(t)$ vanishes the error rate goes to $0$, while when $\lambda(t)$ diverges to $\infty$ it goes to $p(b,a)$ – the error rate implied by the initial bias.
To understand the nature of softmax random choice processes, we aim to establish a representation theorem that identifies the properties of random choice processes that make them softmax. Next we group the deliberative versions of a first set of assumptions that, in view of Luce's Theorem, are necessary for the softmax representation.
Deliberative Luce Axioms:
Positivity: $p_t$ satisfies Positivity for all $t \in \bar{T}$.
Choice Axiom: $p_t$ satisfies the Choice Axiom for all $t \in \bar{T}$.
Continuity: $p_t$ satisfies Continuity for all $t \in \bar{T}$.
By Luce's Theorem, these conditions imply that, for each $t$ in $\bar{T}$, there exists a continuous function $v_t : X \to \mathbb{R}$, unique up to location, such that
$$p_t(a,A) = \frac{e^{v_t(a)}}{\sum_{b \in A} e^{v_t(b)}}$$
[Footnote: A softmax process is constant – i.e., $p_t = p_s$ for all $s,t \in \bar{T}$ – if and only if $p_t(a,A) = e^{\alpha(a)} / \sum_{b \in A} e^{\alpha(b)}$ for all $A$, all $a \in A$, and all $t \in \bar{T}$. In this case, $u$ must be constant (in particular, cardinally unique), $\alpha$ is unique up to location, and $\lambda$ is undetermined (see Lemma 20 in Appendix C).]
To attain the softmax representation, we need to express all the functions $v_t$ by means of two time-independent functions, utility $u$ and bias $\alpha$, and one time-dependent function, noise $\lambda$, such that
$$v_t(a) = \frac{u(a)}{\lambda(t)} + \alpha(a)$$
This is achieved through the next axiom, which requires that, over deliberation times, there are no ordinal reversals in the weight of evidence.
Intensity Consistency
Given any $s > t$ in $T$,
$$w_t(a,b) > w_t(c,d) \iff w_s(a,b) > w_s(c,d)$$
for all $a \neq b$ and $c \neq d$ in $X$.
In words, this axiom says that if the weight of evidence in favor of the hypothesis "$a$ is preferable to $b$" is, after a given deliberation time, greater than that in favor of the hypothesis "$c$ is preferable to $d$", the same happens after a longer deliberation time.
In terms of revealed preference intensity, we can equivalently write this axiom as
$$(a,b) \succ_t^{\natural} (c,d) \iff (a,b) \succ_s^{\natural} (c,d)$$
for all $s > t$. This form justifies the name of the axiom, which requires preference intensities to be time invariant.
The next representation theorem will show that Intensity Consistency characterizes softmax processes. Yet, as the duality (7) suggests, an alternative characterization is attained by using, together, analogous non-reversal conditions for $\succ_t$ and $\succ_t^{*}$. Interestingly, they have a one-way form, weaker than the two-way form of Intensity Consistency.

Preference Consistency
Given any $s > t$ in $T$,
$$p_t(a,b) > p(a,b) \implies p_s(a,b) > p(a,b)$$
for all $a,b \in X$.

Ease (of Comparison) Consistency
Given any $s > t$ in $T$,
$$e_t(a,b) \leq e_t(c,d) \implies e_s(a,b) \leq e_s(c,d)$$
for all $a \neq b$ and $c \neq d$ in $X$.
In terms of the revealed preference order, Preference Consistency is equivalent to
$$a \succ_t b \implies a \succ_s b$$
for all $s > t$. Preferences are thus stable: as time passes, they are not reversed. This is in accord with the idea that during deliberation correct (yet noisy) evidence is gathered and analyzed by the decision maker to inform his choice between the two alternatives.
Ease Consistency, instead, says that the difficulty of decision problem $\{a,b\}$ relative to decision problem $\{c,d\}$ is inherent to the alternatives involved and independent of deliberation times. If the comparison between $a$ and $b$ is not easier than that between $c$ and $d$, given deliberation time $t$, then the passage of time does not make $a$ and $b$ easier to compare than $c$ and $d$. In terms of revealed ease of comparison, Ease Consistency is equivalent to
$$\{a,b\} \succ_s^{*} \{c,d\} \implies \{a,b\} \succ_t^{*} \{c,d\}$$
for all $s > t$.
We can now state the softmax representation theorem.

Theorem 5 Let $X$ be a connected topological space and $\{p_t\}$ a random choice process. The following conditions are equivalent:
1. $\{p_t\}$ satisfies the Deliberative Luce Axioms and Intensity Consistency;
2. $\{p_t\}$ satisfies the Deliberative Luce Axioms, Preference Consistency, and Ease Consistency;
3. $\{p_t\}$ is a softmax process with continuous $u, \alpha : X \to \mathbb{R}$ and $\lambda : T \to (0,\infty)$, that is,
$$p_t(a,A) = \frac{e^{\frac{u(a)}{\lambda(t)} + \alpha(a)}}{\sum_{b \in A} e^{\frac{u(b)}{\lambda(t)} + \alpha(b)}}$$
for all $A$, all $a \in A$, and all $t \in \bar{T}$.
In this case, $u$ is cardinally unique, $\alpha$ is unique up to location and, unless $\{p_t\}$ is constant, $\lambda$ is unique given $u$. Moreover, the process $\{p_t\}$ is multinomial logit if and only if alternatives are a priori homogeneous.
An analyst who observes that the stochastic choices of the decision maker satisfy the axioms of this theorem can thus understand his behavior in terms of preference discovery, that is, as if carried out by a decision maker who is trying to learn the value that alternatives have for him.
More is true: our analyst can actually identify, from the probabilistic choices of the decision maker, the softmax components $u$, $\alpha$, and $\lambda$. In fact, since $u$ is a psychometric utility for $\{p_t\}$, if the process is constant, then $u$ must be constant, $\alpha(a) - \alpha(b) = \ell(a,b)$ for all $a,b \in X$, and $\lambda$ is undefined; otherwise, there exist at least a pair of alternatives $\hat{a}$ and $\hat{b}$ and a deliberation time $\hat{t}$ such that the preference $\hat{a} \succ_{\hat{t}} \hat{b}$ is revealed, and the next proposition provides the explicit expression of the parameters.

Proposition 6
Let $\{p_t\}$ be a softmax random choice process. If there exist $\hat{a}, \hat{b} \in X$ and $\hat{t} \in T$ such that $p_{\hat{t}}(\hat{a},\hat{b}) > p(\hat{a},\hat{b})$, then the functions $\hat{u}, \hat{\alpha} : X \to \mathbb{R}$ and $\hat{\lambda} : T \to (0,\infty)$ defined by
$$\hat{u}(x) = \frac{w_{\hat{t}}(x,\hat{b})}{w_{\hat{t}}(\hat{a},\hat{b})}, \qquad \hat{\alpha}(x) = \ell(x,\hat{b}), \qquad \hat{\lambda}(t) = \frac{1}{w_t(\hat{a},\hat{b})} \qquad (12)$$
are well defined, with
$$p_t(a,A) = \frac{e^{\frac{\hat{u}(a)}{\hat{\lambda}(t)} + \hat{\alpha}(a)}}{\sum_{b \in A} e^{\frac{\hat{u}(b)}{\hat{\lambda}(t)} + \hat{\alpha}(b)}} \qquad (13)$$
for all $A$, all $a \in A$, and all $t \in \bar{T}$.
Summing up, the last two results enable the analyst to interpret the stochastic choice behavior of the decision maker in terms of softmax preference discovery and to empirically identify the softmax components.
Finally, we can extend the results of this section to a general choice set $X$, without a topology, and to a general index set $T$, without an order. This is the subject matter of Appendix D.
[Footnote: Their estimation is standard, typically carried out by maximum likelihood. See, e.g., Ben-Akiva and Lerman (1985) on the econometric side and McKelvey and Palfrey (1995) on the game-theoretic one.]

3.4 Ordinality and learning

In this section we study how, as deliberation time increases, the stochastic choice behavior of a decision maker improves, so that he becomes less prone to errors. Clearly, the study of time-increasing error rate situations mirrors the one we consider here.
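As a concrete check of Proposition 6, the identification formulas (12) can be computed directly from binary choice probabilities. A minimal sketch with illustrative primitives (the dictionaries u, alpha, and lam below are assumptions of the example, not empirical estimates):

```python
import math

# Illustrative softmax primitives (assumptions for the sketch):
u = {'a': 1.0, 'b': 0.0, 'c': -0.5}
alpha = {'a': 0.3, 'b': 0.0, 'c': 0.1}
lam = {1.0: 2.0, 2.0: 1.0, 4.0: 0.5}      # noise lambda(t), decreasing in t

def p(t, x, menu):
    """Softmax choice probability (10); t = 0 gives the initial rule p."""
    s = lambda y: math.exp((0.0 if t == 0 else u[y] / lam[t]) + alpha[y])
    return s(x) / sum(s(y) for y in menu)

def w(t, x, y):
    """Weight of evidence: posterior minus prior log-odds."""
    ratio = lambda s: p(s, x, [x, y]) / p(s, y, [x, y])
    return math.log(ratio(t)) - math.log(ratio(0))

# Identification (12), anchored at a revealed preference a_hat over b_hat at t_hat:
a_hat, b_hat, t_hat = 'a', 'b', 1.0
u_hat = {x: w(t_hat, x, b_hat) / w(t_hat, a_hat, b_hat) for x in u}
alpha_hat = {x: math.log(p(0, x, [x, b_hat]) / p(0, b_hat, [x, b_hat])) for x in u}
lam_hat = {t: 1.0 / w(t, a_hat, b_hat) for t in lam}

# The recovered components reproduce the process, as in (13):
menu = list(u)
max_err = max(abs(p(t, x, menu)
                  - math.exp(u_hat[x] / lam_hat[t] + alpha_hat[x])
                  / sum(math.exp(u_hat[y] / lam_hat[t] + alpha_hat[y]) for y in menu))
              for t in lam for x in menu)
```

Because the example normalizes $u(\hat{b}) = 0$, $\alpha(\hat{b}) = 0$, and $u(\hat{a}) - u(\hat{b}) = 1$, the recovered components coincide with the originals; in general they agree up to the uniqueness properties of Proposition 4.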
Decreasing Error Rate
Given any $s > t$ in $T$,
$$p_t(a,b) > p(a,b) \implies p_s(a,b) \geq p_t(a,b)$$
for all $a,b \in X$.
This axiom requires the frequency of mistakes to decrease over deliberation time. Indeed, if $u$ is a psychometric utility for $\{p_t\}$, according to this axiom we have
$$u(a) > u(b) \implies p_s(b,a) \leq p_t(b,a)$$
In words, longer deliberation times decrease the chance of selecting an inferior alternative. To appreciate the consequences of this axiom, we need an additional one.
Payoff Stochastic Dominance
Given any $s > t$ in $T$,
$$p_s(\{a \in A : u(a) > \bar{u}\}, A) \geq p_t(\{a \in A : u(a) > \bar{u}\}, A) \qquad \forall \bar{u} \in \mathbb{R} \qquad (14)$$
for all $A$.
Payoff Stochastic Dominance requires that, for any given utility level $\bar{u}$, the probability of obtaining a payoff greater than $\bar{u}$ is higher after deliberating for a longer amount of time. This notion thus records a probabilistic improvement, in payoff terms, of the decision maker's stochastic choice behavior as deliberation times increase. It is an improvement in the sharp sense of stochastic dominance: the distribution $p_{s,A} \circ u^{-1}$ (first-order) stochastically dominates the distribution $p_{t,A} \circ u^{-1}$.
The next proposition shows that Decreasing Error Rate and Payoff Stochastic Dominance are equivalent axioms for softmax processes. So, the former axiom characterizes the stochastic choice behavior of a softmax decision maker who, according to stochastic dominance, takes better and better decisions as deliberation times increase. In terms of the softmax specification, it corresponds to a noise $\lambda$ decreasing on $T$. In terms of rational inattention, it corresponds to a time-decreasing unit cost of information processing (e.g., the attention cost of reading and understanding a given paragraph decreases with the time available to do so).

Proposition 7
Let $\{p_t\}$ be a nonconstant softmax process with utility $u$, bias $\alpha$, and noise $\lambda$. The following conditions are equivalent:
1. $\{p_t\}$ satisfies Decreasing Error Rate;
2. $p_s(\{a \in A : a \succ_s b\}, A) \geq p_t(\{a \in A : a \succ_t b\}, A)$ for all $b \in A$ and all $s > t$ in $T$;
3. $\{p_t\}$ satisfies Payoff Stochastic Dominance;
4. $\lambda$ is decreasing on $T$.
[Footnotes: For instance, in medical decision making under severe time pressure, the longer the time for a doctor to process information, the lower the chance of selecting a suboptimal treatment seems to be (see, e.g., ALQuathani et al., 2016). Yet, other psychological evidence suggests that overly slack deadlines leave room for procrastination, distractions, and fatigue that may deteriorate choice performance (see, e.g., Ariely and Wertenbroch, 2002). Indeed, $u(a) > u(b) \iff a \succ_t b \iff p_t(a,b) > p(a,b) \implies p_s(b,a) \leq p_t(b,a)$.]
In view of this result, it is natural to wonder whether, for longer and longer deliberation times, the decision maker eventually learns his ranking over alternatives, that is, his preference over them. In other words, is the preference discovery interpretation of softmax processes true to its name?
To address this question, assume for simplicity that $T = (0,\infty)$. By the last result, under Decreasing Error Rate the noise $\lambda$ is decreasing on $(0,\infty)$. This permits us to define a limit random choice rule $p_\infty : \mathcal{A} \to \Delta(X)$ by
$$p_\infty(a,A) = \lim_{t \to \infty} p_t(a,A)$$
for all $A$ and all $a \in A$. On this limit rule we consider the following axiom.

Asymptotic Tie-breaking
Given any $a,b \in X$,
$$p_\infty(a,b) \neq 0,1 \implies p_\infty(a,b) = p(a,b)$$
This axiom postulates that, if the decision maker is unable to make up his mind between alternatives $a$ and $b$ irrespective of deliberation time, then he will choose by flipping a biased coin. The coin's load is determined by the initial bias $\alpha$, so the coin is fair if and only if alternatives are a priori homogeneous.

Proposition 8
Let $\{p_t\}$ be a nonconstant softmax process with utility $u$, bias $\alpha$, and noise $\lambda$. If $\{p_t\}$ satisfies Decreasing Error Rate and Asymptotic Tie-breaking, then
$$p_\infty(a,A) = \delta_a(\arg\max_A u)\, \frac{e^{\alpha(a)}}{\sum_{b \in \arg\max_A u} e^{\alpha(b)}}$$
for all $A$ and all $a \in A$. In particular, $u(a) > u(b) \iff p_\infty(a,b) = 1$ for all $a \neq b$ in $X$.
According to this proposition, the choice rule $p_\infty$ reveals a preference $\succ$ on $X$ defined by
$$a \succ b \iff p_\infty(a,b) = 1$$
This preference permits us to interpret the non-stochastic limit choice behavior in a traditional ordinal way, as if carried out by a decision maker who learned his preference – so, his psychometric utility $u$ up to an ordinal transformation – and accordingly selects the best alternatives.
Standard ordinal analysis thus emerges as the limit version, as deliberation time becomes arbitrarily large, of our cardinal analysis. Alternatively, one can regard standard theory as assuming deliberation time to be virtual: in real time, decision makers act as if they knew their preferences.
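The limit formula of Proposition 8 can be illustrated by driving $\lambda(t)$ toward zero (a very small $\lambda$ proxies $t \to \infty$ here; the primitives are illustrative assumptions):

```python
import math

u = {'a': 1.0, 'b': 1.0, 'c': 0.0}                 # 'a' and 'b' tie at the top
alpha = {'a': math.log(2.0), 'b': 0.0, 'c': 5.0}   # strong bias toward inferior 'c'

def p(lam_t, x, menu):
    """Numerically stable softmax in u / lambda(t) + alpha."""
    score = {y: u[y] / lam_t + alpha[y] for y in menu}
    m = max(score.values())
    e = {y: math.exp(score[y] - m) for y in menu}
    return e[x] / sum(e.values())

menu = list(u)
p_limit = {x: p(1e-4, x, menu) for x in menu}      # lambda(t) -> 0 proxies t -> infinity
```

Probability mass concentrates on $\arg\max u$ and is split there in proportion to $e^{\alpha}$ (here 2:1 between the tied alternatives), while the heavily biased but inferior alternative is chosen with vanishing probability.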
According to the preference discovery interpretation, a softmax random choice process represents the choice probabilities induced by the solution of a rather complex problem of optimal information acquisition. How can it be implemented by a simple system, say a stylized neural system? Does it have a neurophysiological foundation? To address these questions, in this section we move from outside to inside the black box, from an "as if" revealed preference analysis based on behavioral data to a causal computational neuroscience analysis calibrated with physiological (in particular, eye-tracking) data.
Specifically, we combine Markovian search inside a menu with DDM (Drift Diffusion Model) pairwise comparison of its alternatives. Both assumptions are inspired by the seminal eye-tracking study of Russo and Rosen (1975), find support in recent theories about memory, and are consistent with some of the available eye-tracking evidence.
[Footnotes: Otherwise, the role of $\infty$ is played by the supremum of $T$. In contrast, an analyst learns $u$ up to a cardinal transformation by observing the decision maker's softmax stochastic behavior (Proposition 6). See, e.g., Luck and Vogel (1997), Vogel and Machizawa (2004), and Shadlen and Shohamy (2016).]
As in the previous behavioral part, we consider a decision maker who has to select an alternative from a finite menu $A$, within an exogenously given deliberation time $t > 0$. Here time represents a constrained resource upon which the decision maker's decision process relies. In what follows, we first introduce the different parts of the decision procedure and then assemble them. Notation is eased by assuming, unless explicitly stated otherwise, that $A = \{0, 1, \ldots, |A|-1\}$, with $|A| \geq 2$, and by identifying elements of $\Delta(A)$ with vectors in the simplex of $\mathbb{R}^{|A|}$.
Exploration of menu $A$ has a classic Markovian format à la Metropolis et al. (1953).
The decision maker starts with a first, automatically accepted, candidate solution $b$ drawn from an initial distribution $\mu \in \Delta(A)$. Then, given an incumbent solution $b$, he considers an alternative candidate solution $a \neq b$ with probability $Q(a|b)$. The only requirements we place on the probability transition matrix $Q$, called exploration matrix, are symmetry and irreducibility.
These requirements are both satisfied when the decision maker perceives a distance between alternatives that can be described by a metric $d$ on $A$, and the probability $Q(a|b)$ is a strictly positive function of the distance $d(a,b)$. For example, if $A$ is a connected graph (like a wine rack or a vending machine), then $d$ may be the shortest-path distance and $Q$ may have the form
$$Q(a|b) = \frac{k(A)}{d(a,b)^{\rho}}$$
for all $a \neq b$. Here $k(A)$ is a proportionality factor (independent of $a$ and $b$) and $\rho \in (0,\infty)$ is an exploration aversion parameter: for large $\rho$ only the nearest neighbors of the incumbent solution are considered, while for small $\rho$ exploration is essentially uniform across alternatives.
Once proposed, say at time $\tau_i$, alternative $a$ is compared with incumbent $b$ via a value-based DDM. According to this model, an alternative is selected as soon as the net neural evidence in its favor reaches a posited decision threshold $\beta > 0$. Specifically, the comparison of distinct $a$ and $b$ is assumed to activate two neuronal populations whose activities (firing rates) provide evidence for the two alternatives.
[Footnotes: In particular, with the high number of refixations to alternatives previously contemplated by the decision maker, which is not predicted by the existing models. See, e.g., Krajbich and Rangel (2011) and Reutskaja et al. (2011). Along with menu $A$, time $t$ is thus kept fixed throughout this section.]
[Footnotes, continued: For extra clarity, it is the time given to the decision maker to think through a single decision episode involving the choice problem, so it is the moment at which the deliberation process is externally terminated and a choice must be made. Other interpretations of $t$ can be analyzed in this setup, but some modifications are required. See, e.g., Madras (2002). While symmetry is crucial, irreducibility is not (see Baldassi et al., 2019). See Russo and Rosen (1975) and Roe, Busemeyer and Townsend (2001). If the shortest-path distance is replaced with the discrete distance (or, equivalently, the graph is complete), exploration becomes genuinely uniform. The value-based version of the DDM of Ratcliff (1978) was introduced by Krajbich, Armel, and Rangel (2010) and Milosavljevic et al. (2010). See also Fehr and Rangel (2011). See Bogacz et al. (2006) and Shadlen and Shohamy (2016) for neurophysiological and neuropsychological analyses of this mechanism; Roe, Busemeyer and Townsend (2001), Krajbich, Armel, and Rangel (2010), Milosavljevic et al. (2010), Krajbich, Lu, Camerer and Rangel (2012), Rangel and Clithero (2014), Clithero (2018), and Chiong, Shum, Webb and Chen (2019) for applications of this model to the choice of consumption goods; as well as Ratcliff, Smith, Brown and McKoon (2016) for a recent review.]
If the mean activities of the two populations are $v(a)$ and $v(b)$, the cumulated difference between firing rates is assumed to have the Brownian motion form
$$dZ_{a,b} = [v(a) - v(b)]\, d\tau + \sqrt{2}\, dW$$
The random variable $Z_{a,b}(\tau_i + \tau)$ is interpreted as the net neural evidence in favor of $a$ against $b$ gathered within $\tau$ seconds after the proposal, at time $\tau_i$, of alternative $a$. We adopt the standard interpretation of the mean activity $v(a)$ as a neural index of value of alternative $a$, and call $v : A \to$
$\mathbb{R}$ the neural utility (function) of the decision maker.
A common assumption is that the process is unbiased, that is, $Z_{a,b}(\tau_i) = 0$. When this is not the case, we are in the presence of starting point bias, and $Z_{a,b}(\tau_i) = \zeta_{a,b}$ is a nonzero initial condition in $(-\beta,\beta)$. The DDM literature interprets starting point bias as the effect of past information about the hypothesis "$v(a) > v(b)$."
In both the unbiased and biased cases, comparison ends when $Z_{a,b}(\tau_i + \tau)$ reaches either the threshold $\beta$ or $-\beta$. So, the response time is the random variable
$$RT_{a,b} = \min\{\tau \in (0,\infty) : |Z_{a,b}(\tau_i + \tau)| = \beta\}$$
At time $\tau_i + RT_{a,b}$, if the upper bound $\beta$ has been reached, the decision maker accepts proposal $a$. Otherwise, if the lower bound $-\beta$ has been reached, proposal $a$ is rejected and the decision maker maintains the incumbent $b$. The resulting comparison outcome is the random variable
$$CO_{a,b} = \begin{cases} a & \text{if } Z_{a,b}(\tau_i + RT_{a,b}) = \beta \\ b & \text{if } Z_{a,b}(\tau_i + RT_{a,b}) = -\beta \end{cases}$$
Therefore, the probability of accepting proposal $a$ is
$$P(CO_{a,b} = a) = \frac{1 - e^{-(\zeta_{a,b} + \beta)[v(a) - v(b)]}}{1 - e^{-2\beta[v(a) - v(b)]}}$$
called acceptance probability; that of rejecting $a$ is
$$P(CO_{a,b} = b) = \frac{e^{-(\zeta_{a,b} + \beta)[v(a) - v(b)]} - e^{-2\beta[v(a) - v(b)]}}{1 - e^{-2\beta[v(a) - v(b)]}}$$
called rejection probability.
In line with a neural utility discovery interpretation, the DDM does not assume that the decision maker knows the utility difference $v(a) - v(b)$. Instead, he "discovers" this difference by accumulating (noisy) evidence, from either the external environment or memory, until the decision threshold $\beta$ is reached.
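The acceptance probability can be checked against a direct simulation of the diffusion. A sketch with an Euler–Maruyama discretization (the step size, parameters, seed, and sample size below are illustrative choices):

```python
import math
import random

def ddm_accept_prob(dv, beta, zeta=0.0):
    """Closed-form P(CO = a): probability the process hits +beta before -beta.
    dv = v(a) - v(b); dynamics dZ = dv dtau + sqrt(2) dW, start zeta in (-beta, beta)."""
    if dv == 0.0:
        return (zeta + beta) / (2.0 * beta)
    return (1.0 - math.exp(-(zeta + beta) * dv)) / (1.0 - math.exp(-2.0 * beta * dv))

def ddm_sample(dv, beta, zeta, dt, rng):
    """One Euler-Maruyama sample path; returns True if proposal a is accepted."""
    z = zeta
    sd = math.sqrt(2.0 * dt)        # per-step noise std for variance-2 diffusion
    while abs(z) < beta:
        z += dv * dt + rng.gauss(0.0, sd)
    return z >= beta

rng = random.Random(0)
dv, beta, zeta = 0.8, 1.5, 0.0
n = 1000
empirical = sum(ddm_sample(dv, beta, zeta, 1e-3, rng) for _ in range(n)) / n
theoretical = ddm_accept_prob(dv, beta, zeta)
```

The Monte Carlo frequency matches the closed form up to sampling noise (and a small discretization bias from the finite step size); with zero drift the formula reduces to the linear hitting probability $(\zeta + \beta)/2\beta$.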
The presence of noise in evidence accumulation is what makes utility discovery time consuming and subject to error.
We denote this model by DDM$(v,\beta,\zeta)$, where $\zeta : A \times A \to (-\beta,\beta)$ is a function, with $\zeta_{a,b} = -\zeta_{b,a}$, which specifies an initial condition for the comparison of any two distinct alternatives in $A$. The DDM is unbiased when $\zeta$ is null. The unbiased case, often called simple or original in the neuroscience literature, is the most popular value-based DDM.
[Footnotes: See Krajbich, Armel and Rangel (2010) and Milosavljevic et al. (2010). See, e.g., Bogacz et al. (2006), Bornstein, Khaw, Shohamy and Daw (2017), Gold and Shadlen (2007), Hanks et al. (2011) and Mulder et al. (2012). See, e.g., Ratcliff (1978). See Shadlen and Shohamy (2016) and Fudenberg, Strack and Strzalecki (2018). Baldassi et al. (2019) provide an axiomatization for it.]

4.3 Decision procedure

We now combine Metropolis exploration and value-based DDM comparisons. The resulting procedure describes a decision maker who, given time $t$ to deliberate, explores menu $A$ in a Markovian way and sequentially compares alternatives according to the DDM. The algorithm starts at time $0$ and terminates at time $t$, when the incumbent solution is chosen.

Metropolis-DDM Algorithm
Input:
Given $t > 0$.
Start:
Draw $a_0$ from $A$ according to $\mu$ and
• set $\tau_0 = 0$,
• set $b_0 = a_0$.
Repeat:
Draw $a_{n+1}$ from $A$ according to $Q(\cdot\,|\,b_n)$ and compare it to $b_n$ via DDM$(v,\beta,\zeta)$:
• set $\tau_{n+1} = \tau_n + RT_{a_{n+1},b_n}$,
• set $b_{n+1} = CO_{a_{n+1},b_n}$,
until $\tau_{n+1} > t$.
Stop:
Set $b^* = b_n$.
Output:
Choose $b^*$ from $A$.
This algorithm is consistent with a neural utility discovery interpretation. At each iteration of the "repeat-until" loop, the evaluation of the sign of the utility difference $v(a) - v(b)$ is performed according to the DDM. In particular, after comparing incumbent $b$ with proposal $a$ and selecting $CO_{a,b}$ as the new incumbent, the decision maker has not learned $v(a)$ and $v(b)$, but rather has performed a test of the hypothesis that $a$ is more valuable than $b$.
The fact that this test is time consuming and subject to error represents the main difference between the Metropolis-DDM algorithm and the standard brute-force comparison-and-elimination algorithm of classical optimization, sometimes called standard revision (especially in marketing). According to standard revision, multiple alternatives are pairwise compared and one alternative is permanently eliminated after each binary comparison. With this, after $|A| - 1$ comparisons, the incumbent solution is an optimal choice. The implicit assumption upon which this brute-force procedure rests is that pairwise comparisons are instantaneous and exact. In the time-constrained Metropolis-DDM algorithm, instead, the fact that comparisons are time consuming may lead to incomplete exploration of the menu, while the fact that comparisons may be erroneous makes it inadvisable to permanently eliminate an alternative that was judged inferior at a previous stage.
Next we list some of the main features of the Metropolis-DDM algorithm:
• the termination time $t$ is exogenous and deterministic;
• the duration $RT_{a,b}$ of each pairwise comparison is endogenous and random, with expectation not greater than $\beta^2/2$;
[Footnote: See Gold and Shadlen (2002, 2007).]
[…] $\pi$, the ex post probability given by its
Gibbs transition P_ζ. It can be equivalently expressed in terms of odds as follows:

    P_ζ(a,b) / P_ζ(b,a) = e^{β[v(a) − v(b)]} × π(a,b) / π(b,a)    (16)

where the left side is the odds post DDM, the exponential factor is the strength of evidence, and the right factor is the odds ante DDM.

This rule can be interpreted according to the measurement principle of Section 3.1, with one caveat: in Equation (8) of that section, the analyst observes the ex ante and the ex post odds, and aims to measure the unknown strength of evidence. Here, in contrast, the analyst observes the ex post odds and the evidence threshold, and aims to measure the unknown ex ante odds that are implied by a posited bias ζ. Yet, the underlying measurement principle is the same: the change in odds for a against b resulting from the DDM is proportional, via an exponential factor, to the accumulated neural evidence β weighted by the neural utility difference v(a) − v(b). The quantity β[v(a) − v(b)] is thus the weight of evidence for a against b that makes the neural system move from the ex ante to the ex post probability of choosing a over b, according to the Gibbs transition rule.

Observe that, for ζ_{a,b} = 0, the solution of (16) is easily seen to be π(a,b) = 1/2, as the intuition for the unbiased DDM suggests.
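The odds rule (16) is easy to exercise numerically. The following sketch (the function name and parameter values are ours, chosen for illustration) updates an ex ante probability into the ex post probability by multiplying the odds by the strength of evidence e^{β[v(a) − v(b)]}:

```python
import math

def gibbs_transition(beta, v_a, v_b, pi_ab):
    """Ex post probability of a over b implied by the odds rule (16):
    post odds = exp(beta * (v_a - v_b)) * ante odds."""
    ante_odds = pi_ab / (1.0 - pi_ab)
    post_odds = math.exp(beta * (v_a - v_b)) * ante_odds
    return post_odds / (1.0 + post_odds)

# Null initial condition: uniform ex ante probability pi = 1/2.
p = gibbs_transition(beta=0.5, v_a=1.0, v_b=0.0, pi_ab=0.5)
# With pi = 1/2 this reduces to the logistic form 1/(1 + exp(-beta*(v_a - v_b))).
assert abs(p - 1.0 / (1.0 + math.exp(-0.5))) < 1e-12
```

With a uniform prior the update collapses to a binary softmax in the utility difference, which is the sense in which (16) nests the unbiased case.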
In words, a null initial condition corresponds to a uniform ex ante probability.

In terms of optimality, the Gibbs transition rule can be justified – via a routine variational analysis – through the unique solution of the optimization problem

    max_{ξ ∈ Δ({a,b})}  [v(a)ξ(a) + v(b)ξ(b) − (v(a)π(a,b) + v(b)π(b,a))] − R(ξ‖π)/β

Here, the relative entropy term R(ξ‖π)/β is the cost – in terms of required information elaboration – of the change from the ex ante probability π to a candidate ex post probability ξ, assumed to be directly proportional to their entropic distance and inversely proportional to the accumulated neural evidence β. The expected utility difference

    v(a)ξ(a) + v(b)ξ(b) − (v(a)π(a,b) + v(b)π(b,a))

is, instead, the expected benefit of such a change. With this, the objective function becomes the net expected benefit of the change from π to ξ. The Gibbs transition P_ζ is the ex post probability that maximizes such benefit.

Denote by π_ζ the Gibbs ex ante (binary) probability that has P_ζ as its Gibbs transition. Simple manipulation of formula (15) gives the explicit expression:

    π_ζ(a,b) = e^{−βv(a)} P_ζ(a,b) / [e^{−βv(a)} P_ζ(a,b) + e^{−βv(b)} P_ζ(b,a)]    (17)

Back to our opening question, we claim that π_ζ is the probabilistic rendering of the initial condition ζ. As argued before, this claim can be understood in terms of both measurement and optimality.
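The variational justification can be checked numerically: a grid search over candidate ex post probabilities ξ recovers the Gibbs form ξ*(a) ∝ π(a,b) e^{βv(a)}, which is exactly the odds update in (16). A minimal sketch, with illustrative parameter values:

```python
import math

def objective(xi_a, pi_ab, v_a, v_b, beta):
    """Net expected benefit of moving from the ex ante pi to a candidate xi:
    expected utility gain minus the entropic cost R(xi || pi) / beta."""
    xi_b, pi_ba = 1.0 - xi_a, 1.0 - pi_ab
    gain = v_a * xi_a + v_b * xi_b - (v_a * pi_ab + v_b * pi_ba)
    cost = (xi_a * math.log(xi_a / pi_ab) + xi_b * math.log(xi_b / pi_ba)) / beta
    return gain - cost

beta, v_a, v_b, pi_ab = 2.0, 1.0, 0.3, 0.4
# Closed-form maximizer: the Gibbs transition, xi*(a) proportional to pi(a,b) e^{beta v(a)}.
num = pi_ab * math.exp(beta * v_a)
gibbs = num / (num + (1.0 - pi_ab) * math.exp(beta * v_b))
# Grid search over candidate ex post probabilities xi(a).
best = max((k / 10000.0 for k in range(1, 10000)),
           key=lambda x: objective(x, pi_ab, v_a, v_b, beta))
assert abs(best - gibbs) < 1e-3
```

The first-order condition equates the log odds of ξ to the log odds of π plus β[v(a) − v(b)], so the numerical maximizer and the odds rule necessarily agree.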
More importantly, perhaps, next we show that the Gibbs binary bijection

    (−β, β) ∋ ζ_{a,b} ↦ π_ζ(a,b) ∈ (0, 1)    (18)

between initial conditions and ex ante probabilities, defined via (17), features some remarkable properties.

(This transition is, mutatis mutandis, the analogue of the Gibbs posterior of Zhang (2006a, 2006b). When A = {a, b}, with say v(a) > v(b), we can normalize the weight to β by setting v(a) = 1 and v(b) = 0. See, e.g., Dupuis and Ellis (1997, p. 27).)

Proposition 9  Given a neural utility v and a threshold β, for distinct alternatives a and b in A the Gibbs binary bijection is such that:

    ζ_{a,b} ≥ 0  ⟺  π_ζ(a,b) ≥ 1/2    (19)

and

    |P_ζ(a,b) − π_ζ(a,b)| ≤ β |v(a) − v(b)|    (20)

The monotonicity formula (19) ensures that a positive bias ζ_{a,b} in favor of a against b corresponds to a higher ex ante probability π_ζ(a,b) of selecting a over b. It implies, inter alia, that a null ζ corresponds to a uniform π, as we previously checked in a direct way. The monotonicity formula thus substantiates the claim that π_ζ is the ex ante probability naturally associated to the initial condition ζ. Inequality (20) further corroborates this role of π_ζ by showing that it actually governs the DDM's probabilistic choices when the accumulated evidence β is small or alternatives have similar neural utilities.
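Inequality (20) can be probed numerically. The sketch below builds the ex post probability from an ex ante probability via the odds rule (16) and checks the bound over a grid of priors; the parameter values are illustrative:

```python
import math

def post(pi, beta, dv):
    """Ex post probability from ex ante pi via the Gibbs odds update (16)."""
    odds = math.exp(beta * dv) * pi / (1.0 - pi)
    return odds / (1.0 + odds)

beta, dv = 0.8, 1.5
# Inequality (20): |P_zeta(a,b) - pi_zeta(a,b)| <= beta * |v(a) - v(b)|,
# checked over a grid of ex ante probabilities.
gap = max(abs(post(k / 1000.0, beta, dv) - k / 1000.0) for k in range(1, 1000))
assert gap <= beta * abs(dv)
```

Since the logistic map is 1/4-Lipschitz in the log odds, the realized gap is in fact well below the stated bound, which is what makes π_ζ a good proxy for choice probabilities when β|v(a) − v(b)| is small.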
As shown in Figure 3, a Metropolis-DDM algorithm randomly produces a sequence

    (b_0, a_1, τ_1, b_1, ..., b_n, a_{n+1}, τ_{n+1}, b_{n+1}, ...)

of incumbents b_n, proposals a_{n+1} and elapsed (response) times τ_{n+1}, which is truncated at time t when the incumbent is chosen.

At each iteration of the "repeat-until" loop, proposal a is accepted as the new incumbent with probability P_ζ(a,b) while, with the complementary probability, a is rejected and the old incumbent b is maintained. Therefore, the probability of selecting a as a new incumbent given the old incumbent b is

    M(a|b) = Q(a|b) P_ζ(a,b)    for all a ≠ b

This Markovian transition probability combines the stochasticity of the Metropolis exploration mechanism and that of the DDM acceptance/rejection rule. The transition matrix M = [M(a|b)]_{a,b∈A} is called the incumbents' transition matrix of the Metropolis-DDM algorithm.

To study how the Metropolis-DDM algorithm proceeds according to this transition matrix, we introduce a class of DDMs that will play an important role in our analysis.

Definition 8
A DDM(v, β, ζ) is transitive if

    P_ζ(b,a) P_ζ(c,b) P_ζ(a,c) = P_ζ(c,a) P_ζ(b,c) P_ζ(a,b)    (21)

for all distinct alternatives a, b, c ∈ A.

In words, a DDM is transitive when violations of transitivity in the choices that it determines are due only to the presence of noise. Indeed, condition (21) amounts to requiring that the intransitive cycles a → b → c → a and a → c → b → a be equally likely. Since in most cases we consider v and β as fixed, we will sometimes say that ζ is a transitive initial condition if DDM(v, β, ζ) is transitive.

Unbiased value-based DDMs are an important example of transitive DDMs. Biased value-based DDMs, instead, might well not be transitive, and so may result in choices between alternatives that feature systematic intransitivities, thus violating a basic rationality tenet. DDM transitivity ensures that this is not the case.

The next result, which builds upon Kolmogorov (1936) and Luce and Suppes (1965), shows the importance of transitive DDMs in our setting.

Proposition 10
Given a neural utility v, a threshold β and an initial condition ζ, the following conditions are equivalent:

1. the incumbent transition matrix M is reversible for every exploration matrix Q;
2. DDM(v, β, ζ) is transitive;
3. there exists π_ζ ∈ Δ(A) such that, for all a ≠ b in A, the Gibbs ex ante binary probability π_ζ(a,b) is given by

    π_ζ(a,b) = π_ζ(a,A) / [π_ζ(a,A) + π_ζ(b,A)]

Remarkably, this proposition connects properties of altogether different nature:

1. reversibility of M, an algorithmic property which is an important sufficient condition for the existence of a stationary distribution of a Markov chain, especially in computational analyses;
2. transitivity of the DDM, a behavioral property which ensures that violations of transitivity in the probabilistic choices that it determines are due only to the presence of noise;
3. existence of a universal
Gibbs ex ante probability π_ζ ∈ Δ(A) of DDM(v, β, ζ) that, via conditioning, determines all Gibbs ex ante binary probabilities π_ζ: for all a ≠ b in A, now π_ζ(a,b) is the conditional probability π_ζ(a | {a,b}).

These connections allow us to study both the stationarity of the Metropolis-DDM algorithm and the extension of the Gibbs binary bijection to a multi-alternative setting. We first study the latter extension. To this end, observe that, for point 3 to hold, π_ζ must be the unique fully supported probability π on A that solves the equation

    P_ζ(a,b) / P_ζ(b,a) = e^{β[v(a) − v(b)]} × π(a,A) / π(b,A)    (22)

for all a ≠ b in A. The next result, based on the equivalence of points 2 and 3 of Proposition 10, uses this equation – with unknown π and parameter ζ – to define a general, multi-alternative, Gibbs bijection as its solution function.

Proposition 11
Given a neural utility v and a threshold β, Equation (22) defines a bijection

    Γ_{v,β} ∋ ζ ↦ π_ζ ∈ Δ₊(A)

between the set Γ_{v,β} ⊆ (−β, β)^{A×A} of transitive initial conditions and the set Δ₊(A) of fully supported distributions π on A. In particular, ζ_{a,b} ≥ 0 if and only if π_ζ(a,A) ≥ π_ζ(b,A), for all a ≠ b in A.

(On reversibility, see, e.g., Kelly (2011). As Geyer (2011, p. 6) writes: "All known methods for constructing transition probability mechanisms that preserve a specified equilibrium distribution in non-toy problems are ... reversible." A solution function of an equation associates, to each value of the parameter (here ζ), the corresponding solution of the equation (here π_ζ).)
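Both directions of this correspondence are easy to illustrate. In the sketch below (illustrative values; the variable names are ours), binary ex post probabilities are built by conditioning a single distribution π on A through the Gibbs form, the transitivity condition (21) is verified, and Equation (22) is then solved for π using one alternative as numeraire:

```python
import math

A = ["a", "b", "c"]
beta = 1.0
v = {"a": 0.2, "b": 0.9, "c": 0.5}
pi_true = {"a": 0.2, "b": 0.3, "c": 0.5}    # ex ante distribution on A

def P(x, y):
    """Binary ex post probabilities in the Gibbs form P(x,y) ∝ pi(x) e^{beta v(x)}."""
    wx = pi_true[x] * math.exp(beta * v[x])
    wy = pi_true[y] * math.exp(beta * v[y])
    return wx / (wx + wy)

# Such a family is transitive: both intransitive cycles in (21) are equally likely.
left = P("b", "a") * P("c", "b") * P("a", "c")
right = P("c", "a") * P("b", "c") * P("a", "b")
assert abs(left - right) < 1e-12

# Equation (22) recovers pi from the binary data, using "a" as numeraire.
w = {x: (math.exp(-beta * (v[x] - v["a"])) * P(x, "a") / P("a", x) if x != "a" else 1.0)
     for x in A}
Z = sum(w.values())
pi_hat = {x: w[x] / Z for x in A}
assert all(abs(pi_hat[x] - pi_true[x]) < 1e-12 for x in A)
```

The recovery step is just (22) read as π(a,A)/π(b,A) = e^{−β[v(a) − v(b)]} P_ζ(a,b)/P_ζ(b,a), followed by normalization.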
We turn now to the study of the stationary distribution of the Metropolis-DDM algorithm. The next result, based on the equivalence of points 1 and 2 of Proposition 10, shows that this stationary distribution has a softmax form determined only by the components v, β and ζ of the DDM. In contrast, the initial distribution μ and the exploration matrix Q of the algorithm do not play any role in the stationary distribution, because they are swamped by iterations.

Proposition 12
Given a neural utility v, a threshold β and an initial condition ζ, if DDM(v, β, ζ) is transitive, then the stationary distribution of the incumbent transition matrix M is

    m(a,A) = π_ζ(a,A) e^{βv(a)} / Σ_{b∈A} π_ζ(b,A) e^{βv(b)}    ∀a ∈ A    (23)

The probability (M^n μ)(a) that, after n iterations of the repeat-until loop, alternative a is the incumbent solution thus converges, as n diverges to ∞, to the stationary probability m(a,A) of the Metropolis-DDM algorithm. Formally, lim_{n→∞} (M^n μ)(a) = m(a,A).

This result shows that a Metropolis-DDM algorithm featuring a transitive DDM has a softmax stationary distribution with components v, β and π_ζ, which thus turn out to be the neural counterparts of the behavioral softmax components u, λ and α. A natural identification assumption is μ = π_ζ, that is, to assume that the initial distribution of the Metropolis-DDM algorithm be equal to the Gibbs ex ante probability of its DDM. In this case, the stationary distribution (23) takes the sharp form

    m(a,A) = μ(a) e^{βv(a)} / Σ_{b∈A} μ(b) e^{βv(b)}    ∀a ∈ A

The consequences of this identification assumption will be explored in the next section.

That said, what does this last proposition say about the output of the Metropolis-DDM algorithm? Since the average duration of each iteration is bounded by β², the answer depends on whether the evidence threshold β is small or large relative to the time t when the algorithm is stopped.
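Proposition 12 can be illustrated with a small numerical sketch. We take a uniform exploration matrix Q (an assumption made for concreteness) and acceptance probabilities in the Gibbs form, build the incumbents' transition matrix M, and iterate it from an arbitrary initial distribution until it matches the stationary softmax distribution (23); all numbers are illustrative:

```python
import math

A = [0, 1, 2, 3]
beta = 0.5
v = {a: float(a) for a in A}
pi = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}       # Gibbs ex ante probability on A

def P(a, b):
    """Acceptance probability of proposal a against incumbent b (Gibbs form)."""
    wa = pi[a] * math.exp(beta * v[a])
    wb = pi[b] * math.exp(beta * v[b])
    return wa / (wa + wb)

q = 1.0 / (len(A) - 1)                       # uniform exploration matrix Q (assumption)
# Incumbents' transition matrix M(a|b) = Q(a|b) P(a,b); diagonal = stay probability.
M = [[q * P(a, b) if a != b else 0.0 for b in A] for a in A]
for b in A:
    M[b][b] = 1.0 - sum(M[a][b] for a in A if a != b)

# Iterate mu -> M mu from an arbitrary initial distribution mu.
mu = [0.25, 0.25, 0.25, 0.25]
for _ in range(2000):
    mu = [sum(M[a][b] * mu[b] for b in A) for a in A]

# Stationary softmax distribution (23).
Z = sum(pi[b] * math.exp(beta * v[b]) for b in A)
m = [pi[a] * math.exp(beta * v[a]) / Z for a in A]
assert all(abs(mu[a] - m[a]) < 1e-9 for a in A)
```

Reversibility is visible in the construction: M(a|b) m(b,A) is symmetric in a and b, which is why the softmax distribution is stationary regardless of the exploration matrix.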
If β is large relative to t, then the algorithm might even stop before the first DDM comparison is finished, and so the probability of choosing a is circa μ(a). In contrast, if β is small relative to t, then the algorithm performs many iterations and the softmax stationary distribution m(a,A) becomes a good approximation of the probability of choosing a among the alternatives in A. This is the way in which softmax enters our neural analysis.

This last discussion raises a natural question: how physiologically plausible is the convergence of the Metropolis-DDM algorithm to its softmax stationary distribution? To address this key question, we first calibrate the algorithm with physiological data and then run simulations. To calibrate, observe that the components v, β and π_ζ of the stationary softmax distribution (23) either coincide with (like v and β) or are uniquely determined by (like π_ζ) the components of the underlying DDM(v, β, ζ), which can be easily estimated by eye-tracking techniques. For instance, Milosavljevic et al. (2010) elicit these DDM components and observe that augmenting time pressure (that is, reducing t) decreases β. Their estimates for unbiased binary DDM comparisons pin down max_{a,b∈A} |v(a) − v(b)| and β, with a lower β under high time pressure (t = 4 seconds) than under low time pressure (t = 12 seconds).

Using these physiological data, we simulate the Metropolis-DDM algorithm and test its softmax convergence. In the plots below we report the output of some simulations with a menu A of consecutive integer alternatives. In all cases, the empirical distribution of the Metropolis-DDM algorithm is simulated by running it a large number of times. (See also Karsilar, Simen, Papadakis and Balci, 2014.)
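In the same spirit, a stripped-down Monte Carlo version of the repeat-until loop can be run with a deadline. The per-comparison durations below are a stand-in exponential model, not the paper's DDM response times, and all numbers are illustrative; still, when β is small relative to t the empirical frequencies approach the stationary distribution (23):

```python
import math, random

random.seed(0)
A = [0, 1, 2]
beta = 0.5
v = {a: float(a) for a in A}
pi = {a: 1.0 / len(A) for a in A}            # null bias: uniform ex ante probability

def accept(a, b):
    """Acceptance of proposal a against incumbent b, in Gibbs form."""
    wa = pi[a] * math.exp(beta * v[a])
    wb = pi[b] * math.exp(beta * v[b])
    return random.random() < wa / (wa + wb)

def run(t, mean_rt=0.05):
    """One run of the repeat-until loop with uniform proposals; pairwise
    durations follow an assumed exponential model."""
    b, tau = random.choice(A), 0.0
    while True:
        a = random.choice([x for x in A if x != b])
        tau += random.expovariate(1.0 / mean_rt)
        if accept(a, b):
            b = a
        if tau > t:
            return b

runs = 10000
freq = [0.0] * len(A)
for _ in range(runs):
    freq[run(t=4.0)] += 1.0 / runs

# Empirical frequencies versus the stationary softmax distribution (23).
Z = sum(pi[b] * math.exp(beta * v[b]) for b in A)
m = [pi[a] * math.exp(beta * v[a]) / Z for a in A]
assert all(abs(freq[a] - m[a]) < 0.03 for a in A)
```

With a 4-second deadline and 0.05-second average comparisons, each run performs roughly eighty iterations, which is ample for a three-alternative chain to mix.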
Simulation 1  Choose in 4 seconds with v(a) = a − c, for a constant c, and a threshold β below 1.

Simulation 2  Choose in 4 seconds with v(a) = |a − c| and a threshold β below 1.

Simulation 3  Choose in 12 seconds with v(a) = a − c and a threshold β above 1.

Simulation 4  Choose in 12 seconds with v(a) = |a − c| and a threshold β above 1.

[The four simulation plots are omitted.] In all four simulations the algorithm performs a relatively small (and random) number of iterations, yet the empirical distribution that it generates matches (23) almost perfectly.

Summing up, our simulations show that, when calibrated with physiological data, the Metropolis-DDM algorithm is able to converge to its softmax stationary distribution. Softmax thus solidly enters our analysis.

Causes and effects
Our external, behavioral, analysis identifies the behavioral conditions characterizing softmax stochastic choice and permits the empirical elicitation of its components, thus providing a behavioral foundation for an "as if" interpretation of the decision maker's stochastic choice in terms of preference discovery. Our internal, causal, analysis provides a biologically inspired algorithm that may explain the emergence of softmax in stochastic choice.

Conceptually, these complementary approaches provide a complete perspective on softmaximization as a model of preference discovery, both in terms of internal (neuropsychological) causes and external (behavioral) effects.

Empirically, the cause-effect nexus between the two analyses permits the identification and cross-validation of the components of the behavioral and neural softmax specifications. This empirical interplay is the subject matter of this section.

We first introduce neural random choice processes, the inner counterpart of the behavioral random choice processes of the first part of the paper, and then propose an identification and cross-validation procedure.
Neural random choice processes
We extend the neural analysis of the last section from a fixed pair A and t of menus and deliberation times to any such pair. To abstract from both context and deliberation time effects, we define:

(i) a neural utility v : X → ℝ on the set X of all alternatives, which has the neural utility v of the last section as its restriction to menu A;
(ii) a strictly positive initial distribution μ, which has the initial distribution μ of the last section as its conditional on menu A;
(iii) a threshold function β : T → (0, ∞), which has the quantity β of the last section as its value at deliberation time t;
(iv) a family {ζ_t}_{t∈T} of initial conditions ζ_t : X × X → (−β(t), β(t)), one for each t, which has the function ζ of the last section as its restriction to A × A;
(v) a strictly positive exploration matrix Q on X × X, which has the matrix Q of the last section as its conditional version on A × A.

(For convenience, we assume X to be finite, the extension to the infinite case being straightforward.)

With these "universal" versions, the value-based DDM of the last section now takes the form DDM(v, β(t), ζ_t). Our analysis relies on the following identification assumption:

    μ = π_{ζ_t}    ∀t > 0

It equates the initial distribution of the Metropolis-DDM algorithm – a free parameter in our exercise – with the Gibbs ex ante probability of the DDM for each deliberation time. Because of this assumption, we may call μ the (initial) neural bias of the algorithm.

In view of the previous simulations, we also assume that for each deliberation time t the Metropolis-DDM algorithm converges to its stationary distribution.
The algorithm then induces a neural random choice process {m_t}_{t∈T} given, for each menu A, by

    m_t(a,A) = μ(a) e^{β(t)v(a)} / Σ_{b∈A} μ(b) e^{β(t)v(b)}    ∀t ∈ T

and m_0(a,A) = μ(a) / Σ_{b∈A} μ(b) because of our identification assumption.

The process {m_t}_{t∈T} summarizes the stochastic choice behavior caused by neural decision processes that occur inside the black box.
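The two limiting regimes of the process {m_t} are easy to display. In the sketch below, the threshold schedule β(t) = t and all numbers are assumptions made for illustration: at t = 0 choice is driven purely by the neural bias μ, while for large t probability concentrates on the v-maximal alternative:

```python
import math

X = ["a", "b", "c"]
mu = {"a": 0.5, "b": 0.3, "c": 0.2}          # strictly positive neural bias (sums to 1)
v = {"a": 0.0, "b": 1.0, "c": 2.0}

def beta(t):
    """An assumed increasing threshold schedule."""
    return t

def m(t, A):
    """Neural random choice process m_t on menu A."""
    w = {a: mu[a] * math.exp(beta(t) * v[a]) for a in A}
    Z = sum(w.values())
    return {a: w[a] / Z for a in A}

m0 = m(0.0, X)
# t = 0: choice is driven purely by the neural bias mu.
assert all(abs(m0[a] - mu[a]) < 1e-9 for a in X)
# Large t: probability concentrates on the v-maximal alternative.
assert m(50.0, X)["c"] > 0.999
```

As the deadline relaxes, the process thus interpolates between the initial bias and (approximate) utility maximization, the simulated-annealing reading discussed below.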
In the softmax representation theorem (Theorem 5) we axiomatize a behavioral random choice process {p_t}_{t∈T} given, for each menu A, by

    p_t(a,A) = e^{u(a)/λ(t) + α(a)} / Σ_{b∈A} e^{u(b)/λ(t) + α(b)}    ∀t ∈ T

and p_0(a,A) = e^{α(a)} / Σ_{b∈A} e^{α(b)}.

Inside the black box, each Metropolis-DDM algorithm generates choices described by the process {m_t}_{t∈T}. Outside the black box, an analyst observes the process {p_t}_{t∈T}, and so the effects of the neural system's decision processes. The next procedure shows how the analyst can combine the inside and outside perspectives to identify and cross-validate the values of the components of both processes.

Identification and cross-validation procedure

Neural softmax hypothesis
The Metropolis-DDM algorithm approximates its softmax stationary distribution {m_t} given by (23), that is, for all t ∈ T and all a ∈ A,

    m_t(a,A) = μ(a) e^{β(t)v(a)} / Σ_{b∈A} μ(b) e^{β(t)v(b)}

with neural components v, μ and β.

Behavioral data
The analyst observes a random choice process {p_t}, describing the frequencies of choice.

Behavioral test
The analyst checks whether {p_t} satisfies the axioms of Theorem 5. If this is the case, so that the neural softmax hypothesis is not rejected, the analyst posits that m = p, that is, for all t ∈ T and all a ∈ A,

    e^{u(a)/λ(t) + α(a)} / Σ_{b∈A} e^{u(b)/λ(t) + α(b)} = p_t(a,A) = m_t(a,A) = μ(a) e^{β(t)v(a)} / Σ_{b∈A} μ(b) e^{β(t)v(b)}

with unknown behavioral components u, α and λ.

Identification
If not constant, the softmax choice process {p_t} reveals to the analyst, by Proposition 6, the values û, α̂ and λ̂ of the behavioral components of {p_t}. By the uniqueness property of softmax (Proposition 4), we have

    v = û,    μ = e^{α̂},    β = λ̂^{−1}

up to a cardinal transformation. The neural components are thus identified.
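For binary menus, the identification step amounts to solving a linear system in the log odds, since ln[p_t(a,A)/p_t(b,A)] = [u(a) − u(b)]/λ(t) + [α(a) − α(b)]. A minimal sketch, which assumes (for illustration only) that the noise values λ(t) at two deadlines are known; in practice all components are elicited jointly via Proposition 6:

```python
import math

u = {"a": 1.0, "b": 0.0}
alpha = {"a": 0.3, "b": 0.8}
lam = {4.0: 2.0, 12.0: 0.5}                  # assumed noise values at two deadlines

def p(t, x, y):
    """Binary softmax choice probability at deliberation time t."""
    wx = math.exp(u[x] / lam[t] + alpha[x])
    wy = math.exp(u[y] / lam[t] + alpha[y])
    return wx / (wx + wy)

# Observed log odds at the two deadlines.
L1 = math.log(p(4.0, "a", "b") / p(4.0, "b", "a"))
L2 = math.log(p(12.0, "a", "b") / p(12.0, "b", "a"))
# Solve  L_i = du/lam_i + dalpha  for the utility and bias differences.
du = (L1 - L2) / (1.0 / lam[4.0] - 1.0 / lam[12.0])
dalpha = L1 - du / lam[4.0]
assert abs(du - (u["a"] - u["b"])) < 1e-9
assert abs(dalpha - (alpha["a"] - alpha["b"])) < 1e-9
```

Because the α term does not vary with t while the u term does, varying the deadline is precisely what separates the initial bias from the utility.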
Cross-validation
If available physiological data permit the identification of the neural components v, μ and β, the analyst can cross-validate the values previously obtained.

This procedure shows the significant interplay between the inner and outer perspectives on stochastic choice studied in this paper. Far from being disconnected, these two perspectives complement each other conceptually – by providing external (behavioral) and internal (causal) explanations of softmax stochastic choice behavior – as well as empirically – by permitting the identification and cross-validation of the components of the softmax specifications. In particular, we have the following inner/outer counterparts:

    Inner                 Outer
    Neural utility v      Psychometric utility u
    Neural bias μ         Behavioral bias α
    Threshold β           Noise λ
We close by observing that the interplay discussed in this section gives a simulated annealing flavor to the convergence result established by Proposition 8 (Section 3.4). As the pressure of time diminishes, the DDM threshold can increase, so that errors become less frequent and the Metropolis-DDM algorithm approximates standard optimization.
The previous section concluded our analysis by showing how it complements that of Matejka and McKay (2015) by providing an external verification and an internal justification of the softmax model, for which they establish an optimal information acquisition foundation. Our two approaches complement each other, leaving no room for "free parameters." In these concluding remarks, we explore limitations and possible future extensions.

(Up to a cardinal transformation: there exist j, k > 0 and h ∈ ℝ such that v = ku + h, μ = je^α and β = 1/(kλ).)

Beyond softmax.  To put our softmax revealed preference analysis (Section 3) in a wider perspective, we briefly discuss a general specification of a random choice process. In particular, in the next definition we generalize to our deliberation context the definition of utility for random choice rules introduced by Debreu (1958) and Davidson and Marschak (1959).
Definition 9
A psychometric utility function u and an initial bias α on X rationalize a random choice process {p_t} if, at each deliberation time t and for all alternatives a, b, c, d ∈ X:

    u(a) − u(b) ≤ u(c) − u(d) and α(a) − α(b) ≤ α(c) − α(d)  ⟹  p_t(a,b) ≤ p_t(c,d)

It is easy to see that this amounts to requiring the existence, at each deliberation time t, of a time dependent function φ_t, increasing in both arguments, such that

    p_t(a,b) = φ_t(u(a) − u(b), α(a) − α(b))

for all distinct alternatives a, b ∈ X. Softmax is the special case

    φ_t(x,y) = 1 / (1 + e^{−x/λ(t) − y})

so that (11) holds, i.e.,

    p_t(a,b) = 1 / (1 + e^{−[u(a) − u(b)]/λ(t) − [α(a) − α(b)]})

A pair (u, α) is thus needed to understand random choice processes; u alone is no longer enough – as it was, instead, in the analyses of Debreu, Davidson and Marschak of random choice rules. The noise λ is peculiar to the softmax case, where it parametrizes the exponential form of φ_t.

A natural question, which may be explored in future research, is how the analysis of Debreu (1958) may generalize in this setup, determining which conditions on a random choice process ensure the existence of the pair (u, α). A result along these lines would be, in relation to Debreu (1958), what our softmax representation theorem (Theorem 5) is to Luce (1959).

Adaptive exploration in the Metropolis-DDM algorithm.
A more sophisticated version of the Metropolis-DDM algorithm should perhaps take into account the fact that, although the decision maker does not experience the utility of alternatives during exploration, he might realize that some alternatives must have similar utility when comparing them takes a long time. For this reason, rather than exploring the menu in a homogeneous Markovian way, he might start with uniform exploration at the first iteration and then, at each subsequent iteration, penalize alternatives whose comparison is particularly time consuming.

This variation, introduced by Baldassi et al. (2020), reduces the frequency of "useless" comparisons (those between alternatives with similar utility), increases the number of iterations before the deadline, performs very well numerically, and poses some mathematical questions.
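The following sketch is only a hypothetical rendering of such an adaptive scheme, not the actual rule of Baldassi et al. (2020): exploration weights start uniform and are exponentially downweighted when a comparison is slow, on the logic that slow comparisons signal near-ties in utility. The logistic acceptance step and the response-time model are stand-ins:

```python
import math, random

random.seed(1)
v = {a: 0.1 * a for a in range(5)}
weights = {a: 1.0 for a in v}                # start with uniform exploration

def draw(exclude):
    """Sample a proposal from the current (renormalized) exploration weights."""
    pool = [a for a in v if a != exclude]
    total = sum(weights[a] for a in pool)
    r, acc = random.random() * total, 0.0
    for a in pool:
        acc += weights[a]
        if r <= acc:
            return a
    return pool[-1]

def penalize(a, rt, scale=1.0):
    """Hypothetical penalization: downweight alternatives whose comparisons are slow."""
    weights[a] *= math.exp(-rt / scale)

b = 0
for _ in range(100):
    a = draw(b)
    # Stand-in response-time model: near-ties in utility take longer on average.
    rt = random.expovariate(1.0 + abs(v[a] - v[b]))
    penalize(a, rt)
    # Stand-in logistic acceptance of the proposal.
    if random.random() < 1.0 / (1.0 + math.exp(-(v[a] - v[b]))):
        b = a
```

The effect is that alternatives repeatedly involved in slow (hence uninformative) comparisons are proposed less often, freeing iterations for more decisive ones.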
Quantal response equilibrium.
The softmax functional form can be regarded as a formalization of the Discovered Preference Hypothesis outlined in Plott (1996). According to this hypothesis, decision makers learn how their basic needs are satisfied by the different alternatives in the choice environment through a process of reflection and practice that, in the long run, leads to optimizing behavior.

Reflection is readily captured in our model by the deliberation time. If one considers applications to data analysis, this extension points to a different natural interpretation of the set T. Instead of a deadline, t ∈ T may represent the number of times that the decision maker has faced choice problem A. Under this interpretation, softmax can be seen as capturing preference discovery through practice.

(The real-valued function φ_t has domain (Im u − Im u) × (Im α − Im α).)

Softmaximization is the form that preference discovery takes in the Quantal Response Equilibrium (QRE) theory of McKelvey and Palfrey (1995). In their theory, t is the number of times an agent has played the game, and thus measures his experience level, u(a) is the expected payoff of action a, and λ(t) indexes the agent's degree of rationality. From the original data analysis of McKelvey and Palfrey (1995) to the more recent Agranov, Caplin and Tergiman (2015), Goeree, Holt and Palfrey (2016) and Ortega and Stocker (2016), evidence seems to suggest that, for sophisticated players, the function λ increases as time passes and the decision making environment is better understood.

Our axiomatic and neuropsychological characterizations of softmax can thus be seen as alternative foundations of QRE.
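A minimal logit-response computation illustrates the QRE connection. In matching pennies, iterating logit responses (with an assumed precision parameter standing in for the reciprocal of λ(t)) converges to the unique quantal response equilibrium, which is uniform play:

```python
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

precision = 0.5           # a stand-in for 1/lambda(t); grows with experience t

# Logit quantal responses in matching pennies: player 1 wants to match,
# player 2 wants to mismatch; p, q are the probabilities of playing Heads.
p = q = 0.9
for _ in range(200):
    p, q = sigma(precision * (4 * q - 2)), sigma(precision * (2 - 4 * p))

# The unique QRE of matching pennies is uniform play.
assert abs(p - 0.5) < 1e-9
assert abs(q - 0.5) < 1e-9
```

As the precision grows with experience, the quantal responses sharpen toward best responses, mirroring the preference-discovery reading of λ(t).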
The first identifies the discovery outcome, the second explains the discovery process. QRE is thus the equilibrium concept that corresponds to the decision theory developed in this paper. Goeree, Holt and Palfrey (2016) give a broad perspective on its different applications.
A first draft of this paper was circulated under the title "Law of Demand and Forced Choice" as IGIER WP 593, 2016. We thank Jerome Adda, Pierpaolo Battigalli, Patrick Beissner, Renato Berlinghieri, Andrei Borodin, Roberto Corrao, Federico Echenique, Agata Farina, Marcelo Fernandez, Loic Grenie, Philip Holmes, Ryota Iijima, Michael Konig, Giacomo Lanzani, Jay Lu, Laura Maspero, Thomas Palfrey, Charles Plott, Antonio Rangel, Roger Ratcliff, Giorgia Romagnoli, Kota Saito, Larry Samuelson, Vaibhav Srivastava, Jakub Steiner, Tomasz Strzalecki, the Editors (Bart Lipman and Joel Sobel) and four Anonymous Referees, as well as several seminar audiences. We especially thank Carlo Baldassi and Giulio Principi for very useful discussions and Marco Pirazzini for coding assistance. Simone Cerreia-Vioglio and Massimo Marinacci gratefully acknowledge the financial support of ERC (grants SDDM-TEA and INDIMACRO, respectively).
A Proofs of the results of Section 2.1
Recall that X^≠ = {(a,b) : a ≠ b in X} denotes the set of all pairs of distinct alternatives in X, and that |X| ≥ 3.

Lemma 13
Let X be a topological space. Under the hypotheses of Theorem 1, the following conditions are equivalent:

1. the function p : (a,b) ↦ p(a,b) is continuous on X^≠;
2. the function r : (a,b) ↦ r(a,b) is continuous on X²;
3. the function v : X → ℝ is continuous.

Proof of Lemma 13
(1 ⇒ 3). Let {x_η} be a net in X with index set N directed by ≥. Assume x_η → x in X and take y ≠ x. If there exists κ ∈ N such that x_η ≠ y for all η ≥ κ, then the net {(x_η, y)}_{η≥κ} is contained in X^≠ and converges to (x,y). Point 1 guarantees that {p(x_η, y)}_{η≥κ} converges to p(x,y). Then

    lim_{η≥κ} 1 / (1 + e^{−[v(x_η) − v(y)]}) = 1 / (1 + e^{−[v(x) − v(y)]})

and so

    lim_{η∈N} 1 / (1 + e^{−[v(x_η) − v(y)]}) = 1 / (1 + e^{−[v(x) − v(y)]})

which implies lim_{η∈N} [v(x_η) − v(y)] = v(x) − v(y), and so lim_{η∈N} v(x_η) = v(x).

Else, for all κ ∈ N there exists η ≥ κ such that x_η = y. But then y belongs to all neighborhoods of x, because x_η → x. Take z distinct from x and y (this is possible because X has at least three elements). Then the net {(y,z)}_{η∈N} converges to (x,z) in X^≠. Point 1 guarantees that v(y) − v(z) converges to v(x) − v(z), so that v(x) = v(y). Now consider the net

    x̃_η = x_η if x_η ≠ y,    x̃_η = x if x_η = y

Note that x̃_η → x, v(x_η) = v(x̃_η) for all η ∈ N, and x̃_η ≠ y for all η ∈ N. But then {(x̃_η, y)}_{η∈N} converges to (x,y) in X^≠, so that {v(x̃_η)}_{η∈N} converges to v(x), and so does {v(x_η)}_{η∈N} = {v(x̃_η)}_{η∈N}. Summing up, v is continuous.

(3 ⇒ 2). Observe that, for all (a,b) ∈ X² (also when a = b), r(a,b) = e^{v(a) − v(b)}, and continuity follows.

(2 ⇒ 1). For all (a,b) ∈ X^≠,

    p(a,b) = 1 / (1 + e^{−[v(a) − v(b)]}) = r(a,b) / (r(a,b) + 1)

and continuity follows immediately. ∎

(Interestingly, in Agranov, Caplin, and Tergiman (2015) and Ortega and Stocker (2016), t is not the experience level, but the time the player had to contemplate the alternatives in A before choosing.)
Let p : A → Δ(X) be a random choice rule. The following conditions are equivalent:

1. p is such that p_A(C) = p_B(C) p_A(B) for all C ⊆ B ⊆ A in A;
2. p satisfies the Choice Axiom;
3. p is such that p(b,B) p(a,A) = p(a,B) p(b,A) for all B ⊆ A in A and all a, b ∈ B;
4. p is such that

    p(a,b) / p(b,a) = p(a,A) / p(b,A)    (Independence from Irrelevant Alternatives)

for all A ∈ A and all a, b ∈ A such that p(a,A)/p(b,A) is well defined;
5. p is such that p(Y ∩ B, A) = p(Y,B) p(B,A) for all B ⊆ A in A and all Y ⊆ X.

Moreover, in this case, p satisfies Positivity if and only if p_A has full support for all A in A (that is, p(a,A) > 0 for all a ∈ A).

Proof
(In the proof of Lemma 13, {(y,z)}_{η∈N} denotes any net {(y_κ, z_κ)}_{κ∈N} in X^≠ such that y_κ ≡ y and z_κ ≡ z.)

1 ⇒ 2. Choose as C the singleton {a} appearing in the statement of the axiom.

2 ⇒ 3. Given any B ⊆ A in A and any a, b ∈ B, by the Choice Axiom, p(a,A) = p(a,B) p(B,A), but then

    p(b,B) p(a,A) = p(a,B) p(b,B) p(B,A) = p(a,B) p(b,A)

where the second equality follows from another application of the Choice Axiom.

3 ⇒ 4. Let A ∈ A and arbitrarily choose a, b ∈ A such that p(a,A)/p(b,A) is well defined, that is, different from 0/0. By point 3,

    p(b,a) p(a,A) = p(b,{a,b}) p(a,A) = p(a,{a,b}) p(b,A) = p(a,b) p(b,A)

Three cases have to be considered:

• p(b,a) ≠ 0 and p(b,A) ≠ 0: then p(a,A)/p(b,A) = p(a,b)/p(b,a);
• p(b,a) = 0: then p(a,b) p(b,A) = 0, but p(a,b) ≠ 0 (because p(a,b)/p(b,a) ≠ 0/0), thus p(b,A) = 0 and p(a,A) ≠ 0 (because p(a,A)/p(b,A) ≠ 0/0); therefore p(a,b)/p(b,a) = ∞ = p(a,A)/p(b,A);
• p(b,A) = 0: then p(b,a) p(a,A) = 0, but p(a,A) ≠ 0 (because p(a,A)/p(b,A) ≠ 0/0), thus p(b,a) = 0 and p(a,b) ≠ 0 (because p(a,b)/p(b,a) ≠ 0/0); therefore p(a,A)/p(b,A) = ∞ = p(a,b)/p(b,a).

4 ⇒ 3. Given any B ⊆ A in A and any a, b ∈ B:

• If p(a,A)/p(b,A) ≠ 0/0 and p(a,B)/p(b,B) ≠ 0/0, then by Independence from Irrelevant Alternatives

    p(a,A)/p(b,A) = p(a,b)/p(b,a) = p(a,B)/p(b,B)

  ∘ If p(b,A) ≠ 0, then p(b,B) ≠ 0 and p(b,B) p(a,A) = p(a,B) p(b,A).
  ∘ Else p(b,A) = 0, then p(b,B) = 0 and again p(b,B) p(a,A) = p(a,B) p(b,A).
- Else, either $p(a,A)/p(b,A) = 0/0$ or $p(a,B)/p(b,B) = 0/0$, and in both cases $p(b,B)\,p(a,A) = p(a,B)\,p(b,A)$.

(3 implies 5) Given any $B \subseteq A$ in $\mathcal{A}$ and any $Y \subseteq X$, since $p(B,B) = 1$, it follows that $p(Y,B) = p(Y \cap B, B)$. Therefore
\[
\begin{aligned}
p(Y \cap B, A) &= \sum_{y \in Y \cap B} p(y,A) = \sum_{y \in Y \cap B} \Big( \sum_{x \in B} p(x,B) \Big)\, p(y,A) = \sum_{y \in Y \cap B} \Big( \sum_{x \in B} p(x,B)\,p(y,A) \Big) \\
&= \sum_{y \in Y \cap B} \Big( \sum_{x \in B} p(y,B)\,p(x,A) \Big) \qquad \text{[by point 3]} \\
&= \sum_{y \in Y \cap B} p(y,B) \Big( \sum_{x \in B} p(x,A) \Big) = \sum_{y \in Y \cap B} p(y,B)\,p(B,A) = p(Y \cap B, B)\,p(B,A) = p(Y,B)\,p(B,A)
\end{aligned}
\]

(5 implies 1) Take $Y = C$.

Finally, let $p$ satisfy the Choice Axiom. Assume, per contra, that Positivity holds and $p(a,A) = 0$ for some $A$ and some $a \in A$. Then $A \neq \{a\}$ and, for all $b \in A \setminus \{a\}$, the Choice Axiom implies
\[
p(a,A) = p(a,\{a,b\})\,p(\{a,b\},A) = p(a,b)\,\big(p(a,A) + p(b,A)\big) = p(a,b)\,p(b,A)
\]
whence $p(b,A) = 0$ (because $p(a,b) \neq 0$), contradicting $p(A,A) = 1$. Therefore Positivity implies that $p_A$ has full support for all $A$ in $\mathcal{A}$. The converse is trivial. $\blacksquare$

The next result characterizes a special case of the general Luce's model of Echenique and Saito (2018). Specifically, Theorem 15 extends Theorem 1 by maintaining the assumption of Independence from Irrelevant Alternatives while removing that of full support. In the subsequent analysis, this theorem will allow us to distill the utility function $u$ starting from choice frequencies. In reading it, recall that a choice correspondence is a map $\Gamma : \mathcal{A} \to \mathcal{A}$ such that $\Gamma(A) \subseteq A$ for all $A$. A choice correspondence is called rational if it satisfies the Weak Axiom of Revealed Preference of Arrow (1959), that is,
\[
B \subseteq A \ \text{and}\ \Gamma(A) \cap B \neq \varnothing \implies \Gamma(B) = \Gamma(A) \cap B
\]

Theorem 15 A random choice rule $p : \mathcal{A} \to \Delta(X)$ satisfies the Choice Axiom if and only if there exist a function $v : X \to \mathbb{R}$ and a rational choice correspondence $\Gamma : \mathcal{A} \to \mathcal{A}$ such that
\[
p(a,A) =
\begin{cases}
\dfrac{e^{v(a)}}{\sum_{b \in \Gamma(A)} e^{v(b)}} & \text{if } a \in \Gamma(A) \\[2ex]
0 & \text{else}
\end{cases}
\]
for all $A$ and all $a \in A$. In this case, $\Gamma$ is unique and $\Gamma(A) = \operatorname{supp} p_A$ for all $A$.

Proof
See Cerreia-Vioglio, Maccheroni, Marinacci, and Rustichini (2016). $\blacksquare$
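To see the representation of Theorem 15 at work, here is a minimal numerical sketch in Python (the alternatives, the ranking `r`, and the utilities `v` are hypothetical, not part of the paper). The correspondence $\Gamma$ maximizes a fixed coarse ranking, which makes it rational in the sense of Arrow's WARP, and $v$ drives the odds inside the support; the script then checks the Choice Axiom $p(a,A) = p(a,B)\,p(B,A)$ on every menu.

```python
import math
from itertools import combinations

def luce_probs(A, v, gamma):
    """Choice probabilities of Theorem 15: a softmax of v restricted to the
    support Gamma(A); alternatives outside Gamma(A) get probability zero."""
    G = gamma(A)
    Z = sum(math.exp(v[b]) for b in G)
    return {a: (math.exp(v[a]) / Z if a in G else 0.0) for a in A}

# Hypothetical toy instance: Gamma maximizes the coarse ranking r (hence it
# satisfies Arrow's WARP), while v breaks ties inside the support.
r = {'a': 2, 'b': 2, 'c': 1, 'd': 0}
v = {'a': 1.0, 'b': 0.2, 'c': 3.0, 'd': -1.0}
gamma = lambda A: frozenset(x for x in A if r[x] == max(r[y] for y in A))

# Check the Choice Axiom p(a, A) = p(a, B) * p(B, A) for all B contained in A.
X = frozenset(r)
for n in range(1, len(X) + 1):
    for B in map(frozenset, combinations(X, n)):
        pA, pB = luce_probs(X, v, gamma), luce_probs(B, v, gamma)
        pBA = sum(pA[x] for x in B)  # p(B, A)
        assert all(abs(pA[a] - pB[a] * pBA) < 1e-12 for a in B)
print("Choice Axiom verified on all menus")
```

With $\Gamma(A) = A$ (full support) the rule reduces to an ordinary softmax and the same check goes through; the point of Theorem 15 is that the axiom survives zero-probability alternatives precisely because $\Gamma$ is rational.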
B Proofs of the results of Section 2.3
Lemma 16
Let $X$ be a connected topological space, and $v$ and $w$ be two continuous functions from $X$ to $\mathbb{R}$ such that, for all $x$ and $y$ in $X$, $v(x) \geq v(y)$ implies $w(x) \geq w(y)$. The following conditions are equivalent:

1. for all $x,y,z \in X$,
\[
\frac{v(x)+v(y)}{2} = v(z) \iff \frac{w(x)+w(y)}{2} = w(z)
\]
2. for all $x,y,a,b \in X$, $v(x) - v(a) = v(b) - v(y) \iff w(x) - w(a) = w(b) - w(y)$;
3. for all $x,y,a,b \in X$, $|v(x) - v(a)| = |v(b) - v(y)| \iff |w(x) - w(a)| = |w(b) - w(y)|$;
4. there exist $\kappa > 0$ and $\xi \in \mathbb{R}$ such that $w = \kappa v + \xi$.

Proof If $v$ is constant, the lemma is trivial. Assume then that $v(X)$ is a nondegenerate interval. By Lemma A.1.5 of Wakker (1989), there exists a (weakly) increasing and continuous $\phi : v(X) \to w(X)$ such that $w = \phi \circ v$.

(4 implies 3) For all $x,y,a,b \in X$,
\[
\begin{aligned}
|v(x)-v(a)| = |v(b)-v(y)| &\iff \kappa|v(x)-v(a)| = \kappa|v(b)-v(y)| \\
&\iff \big|[\kappa v(x)+\xi]-[\kappa v(a)+\xi]\big| = \big|[\kappa v(b)+\xi]-[\kappa v(y)+\xi]\big| \\
&\iff |w(x)-w(a)| = |w(b)-w(y)|
\end{aligned}
\]

(3 implies 2) For all $x,y,a,b \in X$,
\[
v(x)-v(a) = v(b)-v(y) \implies |v(x)-v(a)| = |v(b)-v(y)| \iff |w(x)-w(a)| = |w(b)-w(y)|
\]
If $v(x)-v(a) \geq 0$, then $v(b)-v(y) \geq 0$, and $w(x) \geq w(a)$ and $w(b) \geq w(y)$, hence
\[
w(x)-w(a) = |w(x)-w(a)| = |w(b)-w(y)| = w(b)-w(y)
\]
else, if $v(x)-v(a) < 0$, then $v(b)-v(y) < 0$, and $w(x) \leq w(a)$ and $w(b) \leq w(y)$, hence
\[
w(x)-w(a) = -|w(x)-w(a)| = -|w(b)-w(y)| = w(b)-w(y)
\]
In both cases, $v(x)-v(a) = v(b)-v(y)$ implies $w(x)-w(a) = w(b)-w(y)$. To prove the converse implication, first notice that $\phi$ must be injective.
In fact, for all $v(x), v(a) \in v(X)$,
\[
\phi(v(x)) = \phi(v(a)) \implies w(x) = w(a) \implies |w(x)-w(a)| = |w(x)-w(x)| = 0 \implies |v(x)-v(a)| = |v(x)-v(x)| = 0
\]
Then $\phi$ is bijective and strictly increasing, hence, for all $x,y,a,b \in X$,
\[
w(x)-w(a) = w(b)-w(y) \implies |w(x)-w(a)| = |w(b)-w(y)| \implies |v(x)-v(a)| = |v(b)-v(y)|
\]
If $w(x)-w(a) \geq 0$, then $w(b)-w(y) \geq 0$, and $v(x) = \phi^{-1}(w(x)) \geq \phi^{-1}(w(a)) = v(a)$ and $v(b) = \phi^{-1}(w(b)) \geq \phi^{-1}(w(y)) = v(y)$, hence
\[
v(x)-v(a) = |v(x)-v(a)| = |v(b)-v(y)| = v(b)-v(y)
\]
else, if $w(x)-w(a) < 0$, then $w(b)-w(y) < 0$, and $v(x) = \phi^{-1}(w(x)) \leq \phi^{-1}(w(a)) = v(a)$ and $v(b) = \phi^{-1}(w(b)) \leq \phi^{-1}(w(y)) = v(y)$, hence
\[
v(x)-v(a) = -|v(x)-v(a)| = -|v(b)-v(y)| = v(b)-v(y)
\]
Thus $w(x)-w(a) = w(b)-w(y)$ implies $v(x)-v(a) = v(b)-v(y)$.

(2 implies 1) For all $x,y,z \in X$,
\[
\frac{v(x)+v(y)}{2} = v(z) \iff v(x)+v(y) = 2v(z) \iff v(x)-v(z) = v(z)-v(y) \iff w(x)-w(z) = w(z)-w(y) \iff \frac{w(x)+w(y)}{2} = w(z)
\]

(1 implies 4) First notice that $\phi$ must be injective. In fact, for all $v(x), v(y) \in v(X)$,
\[
\phi(v(x)) = \phi(v(y)) \implies w(x) = w(y) \implies \frac{w(x)+w(y)}{2} = w(x) \implies \frac{v(x)+v(y)}{2} = v(x) \implies v(x)+v(y) = 2v(x)
\]
that is, $v(x) = v(y)$. Then $\phi$ is bijective and strictly increasing.
Let $[\eta, \theta]$ be a closed nondegenerate interval in $v(X)$, and notice that, for all $v(x), v(y), v(z) \in [\eta, \theta]$,
\[
\frac{w(x)+w(y)}{2} = w(z) \iff \frac{v(x)+v(y)}{2} = v(z)
\]
thus
\[
\frac{\phi(v(x))+\phi(v(y))}{2} = \phi(v(z)) \iff \frac{v(x)+v(y)}{2} = v(z)
\]
Denoting by $\iota$ the identity function $\iota(\gamma) = \gamma$ for all $\gamma \in [\eta, \theta]$, and using the notation of Hardy, Littlewood, and Polya (1934), we have
\[
\mathfrak{M}_{\phi}(v(x), v(y)) = v(z) \iff \mathfrak{M}_{\iota}(v(x), v(y)) = v(z)
\]
and by their Statements 83 and 89 it follows that $\phi = \kappa \iota + \xi$ with $\kappa \neq 0$ and $\xi \in \mathbb{R}$. Since $\phi$ is strictly increasing, $\kappa > 0$. Finally, since $v(X)$ is a nondegenerate interval, there exists an increasing sequence $[\eta_n, \theta_n]$ of closed nondegenerate intervals in $v(X)$ such that $[\eta_n, \theta_n] \nearrow v(X)$. For each $n$,
\[
\phi(\gamma) = \kappa_n \iota(\gamma) + \xi_n \quad \forall \gamma \in [\eta_n, \theta_n] \qquad \text{and} \qquad \phi(\gamma) = \kappa_{n+1} \iota(\gamma) + \xi_{n+1} \quad \forall \gamma \in [\eta_{n+1}, \theta_{n+1}]
\]
but then $\xi_n = \xi_{n+1} = \xi$ and $\kappa_n = \kappa_{n+1} = \kappa$ for all $n$. $\blacksquare$

Definition 10
Let $v : X \to \mathbb{R}$ be a function. The relations defined on $X$ and on $\mathcal{A}$ by
\[
a \succ b \iff v(a) > v(b)
\]
\[
\{a,b\} \succ^* \{c,d\} \iff |v(a)-v(b)| > |v(c)-v(d)|
\]
are called psychometric preferences represented by $v$. The relation defined on $X^2_{\neq}$ by
\[
(a,b) \succ^{\natural} (c,d) \iff v(a)-v(b) > v(c)-v(d)
\]
is called preference intensity represented by $v$.

Proposition 17
Let $v : X \to \mathbb{R}$ be a function.

1. If $(\succ, \succ^*)$ are psychometric preferences represented by $v$, then the relation defined by
\[
(a,b) \mathrel{\dot\succ^{\natural}} (c,d) \iff
\begin{cases}
\text{either (i) } a \succsim b \text{ and } c \succsim d \text{ and } \{a,b\} \succ^* \{c,d\} \\
\text{or (ii) } a \succ b \text{ and } c \prec d \\
\text{or (iii) } a \precsim b \text{ and } c \precsim d \text{ and } \{c,d\} \succ^* \{a,b\}
\end{cases}
\]
is a preference intensity represented by $v$.

2. If $\succ^{\natural}$ is a preference intensity represented by $v$, then the relations defined by
\[
a \mathrel{\dot\succ} b \iff (a,c) \succ^{\natural} (b,c) \text{ for all } c \neq a,b \text{ in } X
\]
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (a \vee b, a \wedge b) \succ^{\natural} (c \vee d, c \wedge d)
\]
are psychometric preferences represented by $v$.
3. In both cases, the function $v$ is such that
\[
a \succ b \iff v(a) > v(b)
\]
\[
\{a,b\} \succ^* \{c,d\} \iff |v(a)-v(b)| > |v(c)-v(d)|
\]
\[
(a,b) \succ^{\natural} (c,d) \iff v(a)-v(b) > v(c)-v(d)
\]
where the decorations are omitted.

Moreover, if $(\succ, \succ^*)$ are psychometric preferences and $\dot\succ^{\natural}$ is the derived preference intensity, then $(\ddot\succ, \ddot\succ^*) = (\succ, \succ^*)$; dually, if $\succ^{\natural}$ is a preference intensity and $(\dot\succ, \dot\succ^*)$ are the derived psychometric preferences, then $\ddot\succ^{\natural} = \succ^{\natural}$.

Because of the possibility that $a \mathrel{\dot\sim} b$ or $c \mathrel{\dot\sim} d$, or both, the clause "$(a \vee b, a \wedge b) \succ^{\natural} (c \vee d, c \wedge d)$" must be pedantically read as "$(x,y) \succ^{\natural} (z,w)$ whenever $(x,y)$ is a pair consisting of distinct maximal and minimal elements of $\{a,b\}$ with respect to $\dot\succ$ and $(z,w)$ is a pair consisting of distinct maximal and minimal elements of $\{c,d\}$ with respect to $\dot\succ$".

Proof
1) By definition of psychometric preferences represented by $v$:
\[
a \succ b \iff v(a) > v(b) \qquad\qquad \{a,b\} \succ^* \{c,d\} \iff |v(a)-v(b)| > |v(c)-v(d)|
\]
Moreover, given any $a \neq b$ and $c \neq d$ in $X$,
\[
\begin{aligned}
(a,b) \mathrel{\dot\succ^{\natural}} (c,d) \iff {} & \text{either (i) } a \succsim b \text{ and } c \succsim d \text{ and } \{a,b\} \succ^* \{c,d\}, \text{ or (ii) } a \succ b \text{ and } c \prec d, \\
& \text{or (iii) } a \precsim b \text{ and } c \precsim d \text{ and } \{c,d\} \succ^* \{a,b\} \\
\iff {} & \text{either } v(a)-v(b) \geq 0,\ v(c)-v(d) \geq 0,\ |v(a)-v(b)| > |v(c)-v(d)|, \\
& \text{or } v(a)-v(b) > 0,\ v(c)-v(d) < 0, \\
& \text{or } v(a)-v(b) \leq 0,\ v(c)-v(d) \leq 0,\ |v(c)-v(d)| > |v(a)-v(b)| \\
\iff {} & \text{either } v(a)-v(b) > v(c)-v(d) \geq 0, \text{ or } v(a)-v(b) > 0 > v(c)-v(d), \\
& \text{or } v(a)-v(b) \leq 0,\ v(c)-v(d) \leq 0,\ -[v(c)-v(d)] > -[v(a)-v(b)] \\
\iff {} & \text{either } v(a)-v(b) > v(c)-v(d) \geq 0, \text{ or } v(a)-v(b) > 0 > v(c)-v(d), \\
& \text{or } 0 \geq v(a)-v(b) > v(c)-v(d) \\
\iff {} & v(a)-v(b) > v(c)-v(d)
\end{aligned}
\]
hence $\dot\succ^{\natural}$ is a preference intensity represented by $v$.

2) If $\succ^{\natural}$ is a preference intensity represented by $v$, then, given any $a,b \in X$,
\[
a \mathrel{\dot\succ} b \iff (a,c) \succ^{\natural} (b,c) \ \forall c \neq a,b \text{ in } X \iff v(a)-v(c) > v(b)-v(c) \ \forall c \neq a,b \text{ in } X \iff v(a) > v(b)
\]
Moreover, given any $\{a,b\}$ and $\{c,d\}$ in $\mathcal{A}$, there are the following nine possibilities:

(i) $v(a) > v(b)$ and $v(c) > v(d)$: then $a \vee b = a$, $a \wedge b = b$, $c \vee d = c$, $c \wedge d = d$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (a,b) \succ^{\natural} (c,d) \iff v(a)-v(b) > v(c)-v(d) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]

(ii) $v(a) > v(b)$ and $v(c) = v(d)$: then $a \vee b = a$, $a \wedge b = b$ and either $c \vee d = c$, $c \wedge d = d$ or $c \vee d = d$, $c \wedge d = c$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (a,b) \succ^{\natural} (c,d) \text{ and } (a,b) \succ^{\natural} (d,c) \iff v(a)-v(b) > v(c)-v(d) \text{ and } v(a)-v(b) > v(d)-v(c) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]

(iii) $v(a) > v(b)$ and $v(c) < v(d)$: then $a \vee b = a$, $a \wedge b = b$, $c \vee d = d$, $c \wedge d = c$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (a,b) \succ^{\natural} (d,c) \iff v(a)-v(b) > v(d)-v(c) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]

(iv) $v(a) = v(b)$ and $v(c) > v(d)$: then $c \vee d = c$, $c \wedge d = d$ and either $a \vee b = a$, $a \wedge b = b$ or $a \vee b = b$, $a \wedge b = a$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (a,b) \succ^{\natural} (c,d) \text{ and } (b,a) \succ^{\natural} (c,d) \iff v(a)-v(b) > v(c)-v(d) \text{ and } v(b)-v(a) > v(c)-v(d) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]
Note that $v(a)-v(b) = 0$, so $|v(a)-v(b)| > |v(c)-v(d)|$ is impossible, but this fact is formally irrelevant; it only means that, if $v(a) = v(b)$ and $v(c) > v(d)$, then it cannot be the case that $\{a,b\} \mathrel{\dot\succ^*} \{c,d\}$.

(v) $v(a) = v(b)$ and $v(c) = v(d)$: then, for all $(x,y), (z,w) \in X^2_{\neq}$ such that $(x,y)$ is a pair consisting of distinct maximal and minimal elements of $\{a,b\}$ with respect to $\dot\succ$ and $(z,w)$ is a pair consisting of distinct maximal and minimal elements of $\{c,d\}$ with respect to $\dot\succ$, we have $v(x) = v(y) = v(a) = v(b)$ and $v(z) = v(w) = v(c) = v(d)$, thus
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (x,y) \succ^{\natural} (z,w) \text{ for all these pairs} \iff 0 > 0 \iff |v(a)-v(b)| > |v(c)-v(d)|
\]
(An observation analogous to that of case (iv) applies.)

(vi) $v(a) = v(b)$ and $v(c) < v(d)$: then $c \vee d = d$, $c \wedge d = c$ and either $a \vee b = a$, $a \wedge b = b$ or $a \vee b = b$, $a \wedge b = a$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (a,b) \succ^{\natural} (d,c) \text{ and } (b,a) \succ^{\natural} (d,c) \iff v(a)-v(b) > v(d)-v(c) \text{ and } v(b)-v(a) > v(d)-v(c) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]
(An observation analogous to that of case (iv) applies.)
(vii) $v(a) < v(b)$ and $v(c) > v(d)$: then $a \vee b = b$, $a \wedge b = a$, $c \vee d = c$, $c \wedge d = d$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (b,a) \succ^{\natural} (c,d) \iff v(b)-v(a) > v(c)-v(d) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]

(viii) $v(a) < v(b)$ and $v(c) = v(d)$: then $a \vee b = b$, $a \wedge b = a$ and either $c \vee d = c$, $c \wedge d = d$ or $c \vee d = d$, $c \wedge d = c$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (b,a) \succ^{\natural} (c,d) \text{ and } (b,a) \succ^{\natural} (d,c) \iff v(b)-v(a) > v(c)-v(d) \text{ and } v(b)-v(a) > v(d)-v(c) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]

(ix) $v(a) < v(b)$ and $v(c) < v(d)$: then $a \vee b = b$, $a \wedge b = a$, $c \vee d = d$, $c \wedge d = c$, hence
\[
\{a,b\} \mathrel{\dot\succ^*} \{c,d\} \iff (b,a) \succ^{\natural} (d,c) \iff v(b)-v(a) > v(d)-v(c) \iff |v(a)-v(b)| > |v(c)-v(d)|
\]

Summing up, $(\dot\succ, \dot\succ^*)$ are psychometric preferences represented by $v$. The rest is a routine verification. $\blacksquare$

Lemma 18
Let $X$ be a connected topological space and $v, w : X \to \mathbb{R}$ be continuous functions. The following conditions are equivalent:

1. $v$ and $w$ represent the same psychometric preferences;
2. $v$ and $w$ represent the same preference intensity;
3. there exist $\kappa > 0$ and $\xi \in \mathbb{R}$ such that $w = \kappa v + \xi$.

Proof
Assume that $v : X \to \mathbb{R}$ and $w : X \to \mathbb{R}$ are continuous functions such that, given any $a \neq b$ and $c \neq d$ in $X$,
\[
v(a)-v(b) > v(c)-v(d) \iff (a,b) \succ^{\natural} (c,d) \iff w(a)-w(b) > w(c)-w(d) \tag{24}
\]
Taking, for any $a \neq b$, an element $c = c_{a,b}$ such that $c \neq a,b$, we have
\[
v(a)-v(c) > v(b)-v(c) \iff w(a)-w(c) > w(b)-w(c)
\]
that is,
\[
v(a) \leq v(b) \iff w(a) \leq w(b) \tag{25}
\]
and the same is obviously true if $a = b$. In turn, (24) and (25) imply that
\[
v(a)-v(b) > v(c)-v(d) \iff w(a)-w(b) > w(c)-w(d)
\]
for all $a,b,c,d \in X$.

Next we show that this implies point 2 of Lemma 16. Given any $a,b,c,d \in X$, if $v(a)-v(b) = v(c)-v(d)$, since $w(a)-w(b) \gtrless w(c)-w(d)$ would imply $v(a)-v(b) \gtrless v(c)-v(d)$, it must be the case that $w(a)-w(b) = w(c)-w(d)$. That is, $v(a)-v(b) = v(c)-v(d)$ implies $w(a)-w(b) = w(c)-w(d)$, and the converse implication is obtained by exchanging the roles of $v$ and $w$. Lemma 16 then allows us to conclude that there exist $\kappa > 0$ and $\xi \in \mathbb{R}$ such that $w = \kappa v + \xi$.

This shows that, if $v$ and $w$ represent the same preference intensity, then there exist $\kappa > 0$ and $\xi \in \mathbb{R}$ such that $w = \kappa v + \xi$. The converse is trivial. Finally, Proposition 17 shows that $v$ and $w$ represent the same preference intensity if and only if they represent the same psychometric preferences. $\blacksquare$

Not only do these results yield Lemma 2 and completely characterize the duality described by diagram (7), but they also lead to the following corollary, which will be key in the proof of Theorem 5 below.
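Before the corollary, a quick finite sanity check of Lemma 18 may help. The Python sketch below (the three-point space and its utilities are hypothetical, and finiteness means the lemma's topological hypotheses are only being illustrated, not invoked) builds the preference intensity of Definition 10 and confirms that a positive affine transform of $v$ represents the same relation, while a strictly increasing but non-affine transform does not.

```python
import itertools

def intensity(u, pairs):
    """The preference-intensity relation of Definition 10 represented by u:
    all ordered comparisons ((a,b),(c,d)) with u(a) - u(b) > u(c) - u(d)."""
    return {(p, q) for p in pairs for q in pairs
            if u[p[0]] - u[p[1]] > u[q[0]] - u[q[1]]}

# Hypothetical utilities on a three-point space.
v = {'x': 0.0, 'y': 1.0, 'z': 1.5}
pairs = list(itertools.permutations(v, 2))

w_affine = {a: 2.5 * t + 7.0 for a, t in v.items()}  # w = kv + xi with k > 0
w_cubed  = {a: t ** 3 for a, t in v.items()}         # strictly increasing, not affine

# Lemma 18: exactly the positive affine transforms preserve preference intensity.
assert intensity(w_affine, pairs) == intensity(v, pairs)
assert intensity(w_cubed, pairs) != intensity(v, pairs)
```

The cubic transform still represents the same ordinal preference (it is strictly increasing), so it is the comparison of utility differences, not of utility levels, that pins $v$ down cardinally.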
Corollary 19
Let $X$ be a connected topological space, $(\succ_t, \succ^*_t)$ and $(\succ_s, \succ^*_s)$ be psychometric preferences represented by continuous $u_t$ and $u_s$, and $\succ^{\natural}_t$ and $\succ^{\natural}_s$ be the corresponding preference intensities. The following conditions are equivalent:

1. given any $a \neq b$ and $c \neq d$ in $X$,
\[
a \succ_t b \implies a \succ_s b \qquad \text{(Preference Consistency)}
\]
\[
\{a,b\} \succ^*_s \{c,d\} \implies \{a,b\} \succ^*_t \{c,d\} \qquad \text{(Ease Consistency)}
\]
2. given any $a \neq b$ and $c \neq d$ in $X$,
\[
(a,b) \succ^{\natural}_t (c,d) \iff (a,b) \succ^{\natural}_s (c,d) \qquad \text{(Intensity Consistency)}
\]
3. there exist $\kappa > 0$ and $\xi \in \mathbb{R}$ such that $u_s = \kappa u_t + \xi$.

Proof
It suffices to prove that point 1 implies $(\succ_s, \succ^*_s) = (\succ_t, \succ^*_t)$, because then $\succ^{\natural}_s$ and $\succ^{\natural}_t$ coincide (by Proposition 17). At the risk of being pedantic, let us observe that the implication required by Preference Consistency also holds when $a = b$ (in a vacuous way, because the antecedent $a \succ_t b$ is false).

First, we show that, given any $a,b \in X$, $a \succ_s b \implies a \succ_t b$. Assume, per contra, that $a \succ_s b$ and not $a \succ_t b$. It cannot be the case that $b \succ_t a$, because Preference Consistency would imply $b \succ_s a$. Therefore $b \sim_t a$ holds and $u_t(b) = u_t(a)$. Moreover, $a \succ_s b$ implies $u_s(a) > u_s(b)$, and $u_s(X)$ is a nondegenerate interval. This implies that there exists $c \in X$ such that
\[
u_s(c) = u_s(b) + \frac{1}{3}\big(u_s(a) - u_s(b)\big)
\]
so that
\[
|u_s(a) - u_s(c)| > |u_s(c) - u_s(b)| = |u_s(b) - u_s(c)| > 0
\]
and simultaneously $|u_t(a) - u_t(c)| = |u_t(b) - u_t(c)|$, which leads to
\[
\{a,c\} \succ^*_s \{b,c\} \quad \text{and not} \quad \{a,c\} \succ^*_t \{b,c\}
\]
a contradiction of Ease Consistency. Summing up, given any $a,b \in X$, $a \succ_s b \iff a \succ_t b$.

Second, we show that, given any $a \neq b$ and $c \neq d$ in $X$, $\{a,b\} \succ^*_t \{c,d\} \implies \{a,b\} \succ^*_s \{c,d\}$. Assume, per contra, that $\{a,b\} \succ^*_t \{c,d\}$ but not $\{a,b\} \succ^*_s \{c,d\}$. It cannot be the case that $\{c,d\} \succ^*_s \{a,b\}$, because by Ease Consistency that would imply $|u_t(c) - u_t(d)| > |u_t(a) - u_t(b)|$, while $\{a,b\} \succ^*_t \{c,d\}$ implies $|u_t(a) - u_t(b)| > |u_t(c) - u_t(d)|$.
Then we have
\[
|u_t(a) - u_t(b)| > |u_t(c) - u_t(d)| \quad \text{and} \quad |u_s(a) - u_s(b)| = |u_s(c) - u_s(d)|
\]
Therefore $|u_t(a) - u_t(b)| > 0$ and, w.l.o.g., $u_t(a) > u_t(b)$ (else exchange the roles of $a$ and $b$). Since $\succ_s$ and $\succ_t$ coincide, $u_s(a) > u_s(b)$, and so
\[
|u_s(c) - u_s(d)| = u_s(a) - u_s(b) > 0 \quad \text{and} \quad u_t(a) - u_t(b) > |u_t(c) - u_t(d)| > 0
\]
where $|u_t(c) - u_t(d)| > 0$ is true because $u_t(c) = u_t(d)$ would imply $u_s(c) = u_s(d)$. But then
\[
u_t(a) > u_t(a) - |u_t(c) - u_t(d)| > u_t(b)
\]
and, since $u_t(X)$ is an interval, there exists $x \in X$ such that
\[
u_t(a) > u_t(x) = u_t(a) - |u_t(c) - u_t(d)| > u_t(b)
\]
Therefore $u_t(a) - u_t(x) = |u_t(c) - u_t(d)|$, but also $u_s(a) > u_s(x) > u_s(b)$, thus $-u_s(a) < -u_s(x) < -u_s(b)$, whence
\[
0 < u_s(a) - u_s(x) < u_s(a) - u_s(b) = |u_s(c) - u_s(d)|
\]
It follows that $|u_s(a) - u_s(x)| < |u_s(c) - u_s(d)|$ and $|u_t(a) - u_t(x)| = |u_t(c) - u_t(d)|$, that is,
\[
\{c,d\} \succ^*_s \{a,x\} \quad \text{and not} \quad \{c,d\} \succ^*_t \{a,x\}
\]
a contradiction of Ease Consistency. Summing up, given any $a \neq b$ and $c \neq d$ in $X$, $\{a,b\} \succ^*_t \{c,d\} \iff \{a,b\} \succ^*_s \{c,d\}$, as wanted.

That point 2 implies point 3 follows because point 2 requires the coincidence of $\succ^{\natural}_s$ and $\succ^{\natural}_t$, which, by Lemma 18, implies the cardinal equivalence of $u_s$ and $u_t$. The remaining implication, from point 3 back to points 1 and 2, is trivial. $\blacksquare$
C Proofs of the results of Section 3
Given any function $\lambda : T \to (0, \infty]$, it is convenient for notational purposes to set
\[
\beta(t) = \frac{1}{\lambda(t)}
\]
and, given any $\beta : T \to [0, \infty)$, to set
\[
\lambda(t) = \frac{1}{\beta(t)}
\]
The convention $\lambda(0) = \infty$ here corresponds to $\beta(0) = 0$.

Proof of Proposition 3
Set $\beta = 1/\lambda$, with $\beta(0) = 0$. Given any $a,b \in X$ and any $t$ in $T$,
\[
r_t(a,b) = \frac{e^{\beta(t)u(a)+\alpha(a)}}{e^{\beta(t)u(b)+\alpha(b)}} = e^{\beta(t)[u(a)-u(b)]+\alpha(a)-\alpha(b)}
\]
\[
\ell_t(a,b) = \beta(t)\big[u(a)-u(b)\big] + \alpha(a) - \alpha(b)
\]
\[
f_t(a,b) = \frac{r_t(a,b)}{r_0(a,b)} = e^{\beta(t)[u(a)-u(b)]}
\]
\[
w_t(a,b) = \ln f_t(a,b) = \beta(t)\big[u(a)-u(b)\big]
\]
also if $a = b$ and if $t = 0$. Then, for each $t \in T$,
\[
a \succ_t b \overset{\text{def}}{\iff} w_t(a,b) > 0 \iff u(a) > u(b)
\]
\[
(a,b) \succ^{\natural}_t (c,d) \overset{\text{def}}{\iff} w_t(a,b) > w_t(c,d) \iff u(a)-u(b) > u(c)-u(d)
\]
\[
\{a,b\} \succ^*_t \{c,d\} \overset{\text{def}}{\iff} |w_t(a,b)| > |w_t(c,d)| \iff |u(a)-u(b)| > |u(c)-u(d)|
\]
because $\beta(t) > 0$. $\blacksquare$

Lemma 20
If a random choice process $\{p_t\}$ is such that there exist $u, \alpha : X \to \mathbb{R}$ and $\beta : T \to (0, \infty)$ for which
\[
p_t(a,A) = \frac{e^{\beta(t)u(a)+\alpha(a)}}{\sum_{b \in A} e^{\beta(t)u(b)+\alpha(b)}} \tag{26}
\]
for all $A$, all $a \in A$, and all $t \in T$, then $\{p_t\}$ is constant if and only if $u$ is constant. Moreover,

- if $\{p_t\}$ is constant, then $\bar u, \bar\alpha : X \to \mathbb{R}$ and $\bar\beta : T \to (0, \infty)$ represent $\{p_t\}$ in the sense of (26) if and only if there exist $k > 0$ and $h, l \in \mathbb{R}$ such that $\bar u = k u + h$ and $\bar\alpha = \alpha + l$ (there are no constraints on $\bar\beta$);
- else, $\bar u, \bar\alpha : X \to \mathbb{R}$ and $\bar\beta : T \to (0, \infty)$ represent $\{p_t\}$ in the sense of (26) if and only if there exist $k > 0$ and $h, l \in \mathbb{R}$ such that $\bar u = k u + h$, $\bar\alpha = \alpha + l$, and $\bar\beta = \beta/k$.

Briefly, $u$ is cardinally unique, $\alpha$ is unique up to location, and $\beta$ is unique given $u$, unless $\{p_t\}$ is constant. In particular, when the process is not constant, $\beta$ is unique up to scale: we can multiply $\beta$ by a strictly positive constant provided we divide $u$ by the same constant.

Proof If $u$ is constant, say $u \equiv \nu \in \mathbb{R}$, then
\[
p_t(a,A) = \frac{e^{\beta(t)\nu+\alpha(a)}}{\sum_{b \in A} e^{\beta(t)\nu+\alpha(b)}} = \frac{e^{\alpha(a)}}{\sum_{b \in A} e^{\alpha(b)}} = p_0(a,A)
\]
for all $A$, all $a \in A$, and all $t \in T$.
Conversely, if
\[
\frac{e^{\beta(t)u(a)+\alpha(a)}}{\sum_{b \in A} e^{\beta(t)u(b)+\alpha(b)}} = p_t(a,A) = p_0(a,A) = \frac{e^{\alpha(a)}}{\sum_{b \in A} e^{\alpha(b)}}
\]
for all $A$, all $a \in A$, and all $t \in T$, then
\[
\beta(t)\big[u(a) - u(b)\big] + \alpha(a) - \alpha(b) = \ell_t(a,b) = \ell_0(a,b) = \alpha(a) - \alpha(b)
\]
for all $a,b \in X$ and all $t \in T$, and, since $\beta(t) > 0$, $u(a) - u(b) = 0$ follows.

As to the uniqueness of $u$, $\alpha$, and $\beta$, notice that, if also $\bar u$, $\bar\alpha$, and $\bar\beta$ represent $\{p_t\}$ in the sense of (26), then
\[
e^{\alpha(a)-\alpha(b)} = r_0(a,b) = e^{\bar\alpha(a)-\bar\alpha(b)}
\]
and
\[
e^{\beta(t)[u(a)-u(b)]+\alpha(a)-\alpha(b)} = r_t(a,b) = e^{\bar\beta(t)[\bar u(a)-\bar u(b)]+\bar\alpha(a)-\bar\alpha(b)}
\]
for all $a,b \in X$ and all $t \in T$. Therefore, arbitrarily choosing $c^* \in X$, it follows that $\bar\alpha(a) = \alpha(a) + [\bar\alpha(c^*) - \alpha(c^*)]$ for all $a \in X$, whence $\bar\alpha = \alpha + l$, where $l = \bar\alpha(c^*) - \alpha(c^*)$ is a constant. Hence
\[
\beta(t)\big[u(a)-u(b)\big] + \alpha(a) - \alpha(b) = \bar\beta(t)\big[\bar u(a)-\bar u(b)\big] + \bar\alpha(a) - \bar\alpha(b) = \bar\beta(t)\big[\bar u(a)-\bar u(b)\big] + \alpha(a) - \alpha(b)
\]
and
\[
\beta(t)\big[u(a)-u(b)\big] = \bar\beta(t)\big[\bar u(a)-\bar u(b)\big]
\]
for all $a,b \in X$ and all $t \in T$.
Arbitrarily choosing $t^* \in T$ and $b^* \in X$, it follows that
\[
\bar u(a) = \frac{\beta(t^*)}{\bar\beta(t^*)}\big[u(a) - u(b^*)\big] + \bar u(b^*) = k\,u(a) + h \qquad \forall a \in X
\]
with $k > 0$ and $h \in \mathbb{R}$. Cardinal uniqueness of $u$ and uniqueness of $\alpha$ up to location follow. Moreover, if $\{p_t\}$ is not constant, then $u$ is not constant either. Choosing $a,b \in X$ with $u(a) \neq u(b)$, by what we have just proved, it must be the case that
\[
\beta(t)\big[u(a) - u(b)\big] = \bar\beta(t)\big[\bar u(a) - \bar u(b)\big] = \bar\beta(t)\big[k u(a) - k u(b)\big]
\]
for all $t \in T$; so that $\bar\beta = \beta/k$ if $\bar u = k u + h$. This yields uniqueness of $\beta$ given $u$, because if $u = \bar u$, then $k = 1$.

The converse is also true. In fact, if $\bar u = k u + h$ and $\bar\alpha = \alpha + l$, with $k > 0$ and $h, l \in \mathbb{R}$, and we set $\bar\beta = \beta/k$ (positivity of $k$ guarantees positivity of $\bar\beta$), it follows that
\[
\frac{e^{\frac{\beta(t)}{k}[k u(a)+h]+[\alpha(a)+l]}}{\sum_{b \in A} e^{\frac{\beta(t)}{k}[k u(b)+h]+[\alpha(b)+l]}}
= \frac{e^{\beta(t)u(a)+\alpha(a)}\, e^{l+\frac{\beta(t)}{k}h}}{\sum_{b \in A} e^{\beta(t)u(b)+\alpha(b)}\, e^{l+\frac{\beta(t)}{k}h}}
= p_t(a,A)
\]
for all $A$, all $a \in A$, and all $t \in T$; thus $\bar u$, $\bar\alpha$, and $\bar\beta$ represent $\{p_t\}$ in the sense of (26). If, in addition, $\{p_t\}$ is constant, then $u$ is constant and
\[
\frac{e^{\tilde\beta(t)[k u(a)+h]+[\alpha(a)+l]}}{\sum_{b \in A} e^{\tilde\beta(t)[k u(b)+h]+[\alpha(b)+l]}} = \frac{e^{\alpha(a)}}{\sum_{b \in A} e^{\alpha(b)}} = p_0(a,A) = p_t(a,A)
\]
for all $A$, all $a \in A$, all $t \in T$, and any $\tilde\beta : T \to (0, \infty)$. $\blacksquare$

Both Proposition 4 and a characterization of constant processes follow immediately.
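The invariance half of Lemma 20 is easy to check numerically: with $\bar u = ku + h$, $\bar\alpha = \alpha + l$, and $\bar\beta = \beta/k$, the score $\bar\beta(t)\bar u(a) + \bar\alpha(a)$ equals $\beta(t)u(a) + \alpha(a) + \big(\beta(t)h/k + l\big)$, and the additive term is common to all alternatives, so it cancels in the softmax. A minimal Python sketch with hypothetical parameter values:

```python
import math

def softmax_process(u, alpha, beta_t, A):
    """p_t(a, A) of display (26): softmax of beta(t)*u(a) + alpha(a) over A."""
    scores = {a: beta_t * u[a] + alpha[a] for a in A}
    Z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / Z for a, s in scores.items()}

# Hypothetical primitives.
u     = {'a': 1.0, 'b': 0.0, 'c': -0.5}
alpha = {'a': 0.3, 'b': 0.3, 'c': 1.0}
beta  = lambda t: t                    # any beta with beta(t) > 0 for t > 0

k, h, l   = 2.0, 5.0, -1.0             # the transformations allowed by Lemma 20
u_bar     = {a: k * x + h for a, x in u.items()}
alpha_bar = {a: x + l for a, x in alpha.items()}
beta_bar  = lambda t: beta(t) / k      # positivity of k keeps beta_bar positive

A = ('a', 'b', 'c')
for t in (0.5, 1.0, 7.0):
    p_orig = softmax_process(u, alpha, beta(t), A)
    p_tran = softmax_process(u_bar, alpha_bar, beta_bar(t), A)
    assert all(abs(p_orig[a] - p_tran[a]) < 1e-9 for a in A)
print("the transformed parameters represent the same process")
```

Dropping the rescaling of $\beta$ (keeping `beta` instead of `beta_bar` for the transformed utilities) breaks the equality for every $t > 0$, which is the uniqueness half of the lemma at work.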
Proposition 21
Let $\{p_t\}$ be a softmax process with utility $u$. The following conditions are equivalent:

1. $\{p_t\}$ is nonconstant;
2. $u$ is nonconstant;
3. there exist $\hat a, \hat b \in X$ and $\hat t \in T$ such that $p_{\hat t}(\hat a, \hat b) > p_0(\hat a, \hat b)$.

Proof of Theorem 5
Since $\{p_t\}$ satisfies Positivity, the Choice Axiom, and Continuity, by Theorem 1, for each $t \in T$ there exists a continuous $v_t : X \to \mathbb{R}$ such that
\[
p_t(a,A) = \frac{e^{v_t(a)}}{\sum_{b \in A} e^{v_t(b)}} \qquad \forall a \in A \tag{27}
\]
Arbitrarily choose $\bar c \in X$ and replace each $v_t$ with $v_t - v_t(\bar c)$. With this, $v_t(\bar c) = 0$ for all $t \in T$ and (27) still holds. Set $\alpha = v_0$ and $u_t = v_t - \alpha = v_t - v_0$ for all $t \in T$. Clearly, the new $v_t$'s, the $u_t$'s, and $\alpha$ are continuous, and
\[
u_t(\bar c) = v_t(\bar c) - v_0(\bar c) = 0 \qquad \forall t \in T
\]
(also $\alpha(\bar c) = 0$). As in Section 3.1,
\[
a \succ_t b \iff w_t(a,b) > 0 \qquad\qquad \{a,b\} \succ^*_t \{c,d\} \iff e_t(a,b) > e_t(c,d)
\]
for all $t \in T$, $a,b \in X$, and $\{a,b\}, \{c,d\} \in \mathcal{A}$. By (27), for all $t \in T$ and $a,b \in X$,
\[
w_t(a,b) = \ell_t(a,b) - \ell_0(a,b) = v_t(a) - v_t(b) - v_0(a) + v_0(b) = u_t(a) - u_t(b)
\]
thus
\[
a \succ_t b \iff u_t(a) > u_t(b) \qquad\qquad \{a,b\} \succ^*_t \{c,d\} \iff |u_t(a)-u_t(b)| > |u_t(c)-u_t(d)|
\]
But then $X$ is a connected topological space, and $(\succ_t, \succ^*_t)$ and $(\succ_s, \succ^*_s)$ are psychometric preferences represented by $u_t$ and $u_s$ for all $s,t \in T$. As observed in the main text, Preference Consistency and Ease Consistency imply that
\[
a \succ_t b \implies a \succ_s b \qquad\qquad \{a,b\} \succ^*_s \{c,d\} \implies \{a,b\} \succ^*_t \{c,d\}
\]
for all $s > t$ in $T$, $a,b \in X$, and $\{a,b\}, \{c,d\} \in \mathcal{A}$; but then Corollary 19 guarantees that there exist $\kappa_{s,t} > 0$ and $\xi_{s,t} \in \mathbb{R}$ such that
\[
u_s = \kappa_{s,t}\, u_t + \xi_{s,t}
\]
In particular, all the $u_t$'s are cardinally equivalent.
Thus, arbitrarily choosing $\hat t \in T$ and setting $u = u_{\hat t}$, it follows that, for every $t \in T$, there exist $\lambda(t) > 0$ and $\eta(t) \in \mathbb{R}$ such that
\[
u_t = \frac{u_{\hat t}}{\lambda(t)} + \eta(t) = \frac{u}{\lambda(t)} + \eta(t)
\]
Moreover, for all $t \in T$,
\[
0 = u_t(\bar c) = \frac{u_{\hat t}(\bar c)}{\lambda(t)} + \eta(t) = \frac{0}{\lambda(t)} + \eta(t) = \eta(t)
\]
and
\[
v_t = u_t + \alpha = \frac{u}{\lambda(t)} + \alpha
\]
so that point 3 follows from (27), because the case $t = 0$ follows suit.

2 implies 3. Since $\{p_t\}$ satisfies Positivity, the Choice Axiom, and Continuity, by Theorem 1, for each $t \in T$ there exists a continuous $v_t : X \to \mathbb{R}$ such that
\[
p_t(a,A) = \frac{e^{v_t(a)}}{\sum_{b \in A} e^{v_t(b)}} \qquad \forall a \in A \tag{28}
\]
Arbitrarily choose $\bar c \in X$ and replace each $v_t$ with $v_t - v_t(\bar c)$. With this, $v_t(\bar c) = 0$ for all $t \in T$ and (28) still holds. Set $\alpha = v_0$ and $u_t = v_t - \alpha = v_t - v_0$ for all $t \in T$. Clearly, the new $v_t$'s, the $u_t$'s, and $\alpha$ are continuous, and
\[
u_t(\bar c) = v_t(\bar c) - v_0(\bar c) = 0 \qquad \forall t \in T
\]
(also $\alpha(\bar c) = 0$). Define, as in Section 3.1,
\[
(a,b) \succ^{\natural}_t (c,d) \iff w_t(a,b) > w_t(c,d)
\]
for all $t \in T$ and $(a,b), (c,d) \in X^2_{\neq}$. By (28), for all $t \in T$ and $a,b \in X$,
\[
w_t(a,b) = \ell_t(a,b) - \ell_0(a,b) = v_t(a) - v_t(b) - v_0(a) + v_0(b) = u_t(a) - u_t(b)
\]
thus
\[
(a,b) \succ^{\natural}_t (c,d) \iff u_t(a) - u_t(b) > u_t(c) - u_t(d)
\]
But then $X$ is a connected topological space, and $\succ^{\natural}_t$ and $\succ^{\natural}_s$ are preference intensities represented by $u_t$ and $u_s$ for all $s,t \in T$.
As observed in the main text, Intensity Consistency is equivalent to
$$(a,b) \succ_t^\natural (c,d) \iff (a,b) \succ_s^\natural (c,d)$$
for all $s > t$ in $T$ and $(a,b),(c,d) \in X^2$; but then Corollary 19 guarantees that there exist $\kappa_{s,t} > 0$ and $\xi_{s,t} \in \mathbb{R}$ such that
$$u_s = \kappa_{s,t} u_t + \xi_{s,t}$$
Point 3 follows by the argument we used above. The rest of the proof is routine (for uniqueness see Lemma 20). $\blacksquare$
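To make the representation just derived concrete, here is a minimal numerical sketch (the values of $u$, $\alpha$, and $\lambda(t)$ are illustrative, not from the paper) of the soft-max probabilities $p_t(a,A) = e^{u(a)/\lambda(t)+\alpha(a)}/\sum_{b} e^{u(b)/\lambda(t)+\alpha(b)}$: as $\lambda(t) \to \infty$ the probabilities approach the initial-bias soft-max $e^{\alpha(a)}/\sum_b e^{\alpha(b)}$, and as $\lambda(t) \to 0^+$ they concentrate on $\arg\max_A u$.

```python
import math

def softmax_probs(u, alpha, lam):
    # p_t(a, A) = exp(u(a)/lam + alpha(a)) / sum_b exp(u(b)/lam + alpha(b))
    scores = [ui / lam + ai for ui, ai in zip(u, alpha)]
    m = max(scores)                       # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [x / z for x in w]

u = [1.0, 0.5, 0.0]      # hypothetical utilities
alpha = [0.0, 0.0, 1.0]  # hypothetical initial biases

print(softmax_probs(u, alpha, 1e6))   # large lam: close to the e^alpha soft-max (initial bias)
print(softmax_probs(u, alpha, 1e-2))  # small lam: mass concentrates on argmax u
```

The max-subtraction trick leaves the probabilities unchanged (it cancels in the ratio) while avoiding overflow when $1/\lambda(t)$ is large.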
Proof of Proposition 6
Setting $\beta = 1/\lambda$, with $\beta(0) = 0$,
$$p_t(a,A) = \frac{e^{\beta(t)u(a) + \alpha(a)}}{\sum_{b \in A} e^{\beta(t)u(b) + \alpha(b)}}$$
for all $A$, all $a \in A$, and all $t \in T$. If $p_{\hat t}(\hat a, \hat b) > p_0(\hat a, \hat b)$ for some $\hat a, \hat b \in X$ and $\hat t \in T$, then $\hat a \succ_{\hat t} \hat b$, and
$$0 < w_{\hat t}(\hat a, \hat b) = \beta(\hat t)\left[u(\hat a) - u(\hat b)\right]$$
thus $\Delta = u(\hat a) - u(\hat b) > 0$, because $\beta(\hat t) > 0$. Therefore, if we set
$$\hat\beta(t) = w_t(\hat a, \hat b) = \beta(t)\Delta \qquad \forall t \in T$$
we obtain a function $\hat\beta : T \to (0,\infty)$. Analogously,
$$\hat u(x) = \frac{w_{\hat t}(x, \hat b)}{w_{\hat t}(\hat a, \hat b)} = \frac{\beta(\hat t)\left[u(x) - u(\hat b)\right]}{\beta(\hat t)\Delta} = \frac{u(x) - u(\hat b)}{\Delta} \qquad \forall x \in X$$
and
$$\hat\alpha(x) = \ell_0(x, \hat b) = \alpha(x) - \alpha(\hat b) \qquad \forall x \in X$$
define functions $\hat u, \hat\alpha : X \to \mathbb{R}$. Finally, for all $A$, all $a \in A$, and all $t \in T$,
$$\frac{e^{\hat\beta(t)\hat u(a) + \hat\alpha(a)}}{\sum_{b \in A} e^{\hat\beta(t)\hat u(b) + \hat\alpha(b)}} = \frac{\exp\left[\beta(t)\left(u(a) - u(\hat b)\right) + \alpha(a) - \alpha(\hat b)\right]}{\sum_{b \in A} \exp\left[\beta(t)\left(u(b) - u(\hat b)\right) + \alpha(b) - \alpha(\hat b)\right]} = \frac{e^{\beta(t)u(a) + \alpha(a)}\, e^{-\left[\beta(t)u(\hat b) + \alpha(\hat b)\right]}}{\sum_{b \in A} e^{\beta(t)u(b) + \alpha(b)}\, e^{-\left[\beta(t)u(\hat b) + \alpha(\hat b)\right]}} = p_t(a,A)$$
and the same is true for $t = 0$. $\blacksquare$

Proposition 22
Let $u, \alpha : X \to \mathbb{R}$ and
$$p_t(a,A) = \frac{e^{tu(a) + \alpha(a)}}{\sum_{b \in A} e^{tu(b) + \alpha(b)}}$$
for all $A$, all $a \in A$, and all $t \in [0,\infty)$. Then
$$p_s(\{a \in A : u(a) > h\}, A) \ge p_t(\{a \in A : u(a) > h\}, A) \qquad \forall h \in \mathbb{R}$$
for all $s > t$ in $(0,\infty)$ and all $A$.

Proof
Arbitrarily choose $A$, $h \in \mathbb{R}$, and set $[u > h] = \{c \in A : u(c) > h\}$. If $[u > h] = \varnothing$, then
$$p_t(\{a \in A : u(a) > h\}, A) = p_t(\varnothing, A) = 0 \qquad \forall t \in (0,\infty)$$
hence $p_s(\{a \in A : u(a) > h\}, A) = p_t(\{a \in A : u(a) > h\}, A)$ for all $s > t$ in $(0,\infty)$. Analogously, if $[u > h] = A$, then
$$p_t(\{a \in A : u(a) > h\}, A) = p_t(A, A) = 1 \qquad \forall t \in (0,\infty)$$
hence $p_s(\{a \in A : u(a) > h\}, A) = p_t(\{a \in A : u(a) > h\}, A)$ for all $s > t$ in $(0,\infty)$. Else $\varnothing \subsetneq [u > h] \subsetneq A$. If we prove that, in this case, it holds that
$$\frac{d}{dt}\, p_t([u > h], A) > 0 \qquad \forall t \in (0,\infty) \tag{29}$$
then the statement follows. In fact, (29) implies that the function
$$p_\cdot([u > h], A) : T \to [0,1], \qquad t \mapsto p_t([u > h], A)$$
is strictly increasing on $(0,\infty)$.

Next we show that (29) holds. Notice that $\varnothing \subsetneq [u > h] \subsetneq A$ implies that $[u \le h]$ is not empty. Given any $t \in (0,\infty)$, with the abbreviation $\sum_{u(c) > h} = \sum_{c \in A : u(c) > h}$, we have
$$0 < \frac{d}{dt}\,\frac{\sum_{u(c)>h} e^{tu(c)+\alpha(c)}}{\sum_{b \in A} e^{tu(b)+\alpha(b)}}$$
$$\iff \left(\sum_{b \in A} e^{tu(b)+\alpha(b)}\right)\sum_{u(c)>h} u(c)e^{tu(c)+\alpha(c)} - \left(\sum_{u(c)>h} e^{tu(c)+\alpha(c)}\right)\sum_{b \in A} u(b)e^{tu(b)+\alpha(b)} > 0$$
$$\iff \sum_{u(c)>h} e^{tu(c)+\alpha(c)}\left(\sum_{u(b)>h} u(b)e^{tu(b)+\alpha(b)} + \sum_{u(b)\le h} u(b)e^{tu(b)+\alpha(b)}\right) < \left(\sum_{u(b)>h} e^{tu(b)+\alpha(b)} + \sum_{u(b)\le h} e^{tu(b)+\alpha(b)}\right)\sum_{u(c)>h} u(c)e^{tu(c)+\alpha(c)}$$
$$\iff \sum_{u(c)>h} e^{tu(c)+\alpha(c)} \sum_{u(b)\le h} u(b)e^{tu(b)+\alpha(b)} < \sum_{u(b)\le h} e^{tu(b)+\alpha(b)} \sum_{u(c)>h} u(c)e^{tu(c)+\alpha(c)}$$
after re-lettering, this is equivalent to
$$\sum_{u(c)\le h} u(c)e^{tu(c)+\alpha(c)} \sum_{u(b)>h} e^{tu(b)+\alpha(b)} < \sum_{u(c)>h} u(c)e^{tu(c)+\alpha(c)} \sum_{u(b)\le h} e^{tu(b)+\alpha(b)}$$
$$\iff \frac{\sum_{u(c)\le h} u(c)e^{tu(c)+\alpha(c)}}{\sum_{u(b)\le h} e^{tu(b)+\alpha(b)}} < \frac{\sum_{u(c)>h} u(c)e^{tu(c)+\alpha(c)}}{\sum_{u(b)>h} e^{tu(b)+\alpha(b)}}$$
$$\iff \sum_{c \in [u\le h]} u(c)\, p_t(c, [u \le h]) < \sum_{c \in [u>h]} u(c)\, p_t(c, [u > h])$$
and this concludes the proof, because the l.h.s. is an average (i.e., a convex combination) of values $u(c) \le h$, so it is not greater than $h$ itself, while the r.h.s. is an average of values $u(c) > h$, so it is strictly greater than $h$ itself. $\blacksquare$

Proof of Proposition 7
By Proposition 21, $u$ is nonconstant. Given any $s,t \in T$ and $a,b \in X$, we have
$$p_s(a,b) \ge p_t(a,b) \iff r_s(a,b) \ge r_t(a,b) \iff e^{\frac{u(a)-u(b)}{\lambda(s)} + \alpha(a) - \alpha(b)} \ge e^{\frac{u(a)-u(b)}{\lambda(t)} + \alpha(a) - \alpha(b)} \iff \lambda(t)\left[u(a)-u(b)\right] \ge \lambda(s)\left[u(a)-u(b)\right]$$
Now given $s > t$ in $T$, arbitrarily choose $a,b \in X$ such that $u(a) > u(b)$. Direct computation of $p_t(a,b)$ yields
$$p_t(a,b) > p_0(a,b)$$
Decreasing Error Rate then implies $p_s(a,b) \ge p_t(a,b)$ and
$$\lambda(t)\left[u(a)-u(b)\right] \ge \lambda(s)\left[u(a)-u(b)\right]$$
that is, $\lambda(t) \ge \lambda(s)$.

Let $s > t$ in $T$ and observe that $\lambda(s) \le \lambda(t)$. Consider
$$q_l(a,A) = \frac{e^{lu(a)+\alpha(a)}}{\sum_{b \in A} e^{lu(b)+\alpha(b)}} \qquad \forall a \in A, \ \forall l \in [0,\infty)$$
By Proposition 22, it follows that, for every $A$, if $l' \ge l > 0$, then
$$q_{l'}(\{a \in A : u(a) > h\}, A) \ge q_l(\{a \in A : u(a) > h\}, A) \qquad \forall h \in \mathbb{R}$$
Now, taking $l' = 1/\lambda(s)$ and $l = 1/\lambda(t)$, decreasing monotonicity of $\lambda$ guarantees that $l' \ge l$, and we have
$$q_{1/\lambda(s)}(\{a \in A : u(a) > h\}, A) \ge q_{1/\lambda(t)}(\{a \in A : u(a) > h\}, A) \qquad \forall h \in \mathbb{R}$$
that is,
$$p_s(\{a \in A : u(a) > h\}, A) \ge p_t(\{a \in A : u(a) > h\}, A) \qquad \forall h \in \mathbb{R}$$
that is, Payoff Stochastic Dominance holds.
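The dominance property of Proposition 22 just invoked can also be checked numerically before continuing with the converse. A small sketch (with made-up values of $u$ and $\alpha$): for $s > t > 0$, the mass that $p_t(\cdot, A) \propto e^{tu + \alpha}$ assigns to each upper level set $\{u > h\}$ is nondecreasing in $t$.

```python
import math

def p_t(u, alpha, t):
    # p_t(a, A) proportional to exp(t*u(a) + alpha(a))
    scores = [t * ui + ai for ui, ai in zip(u, alpha)]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [x / z for x in w]

def tail_mass(u, alpha, t, h):
    # p_t({a in A : u(a) > h}, A)
    return sum(p for p, ui in zip(p_t(u, alpha, t), u) if ui > h)

u = [2.0, 1.0, 0.0, -1.0]      # hypothetical utilities
alpha = [0.3, -0.2, 0.5, 0.1]  # hypothetical biases

# Proposition 22: tail mass is nondecreasing in t, for every cutoff h
for h in (-2.0, -0.5, 0.5, 1.5):
    assert tail_mass(u, alpha, 2.0, h) >= tail_mass(u, alpha, 0.5, h) - 1e-12
```

Note that the inequality holds for every bias vector $\alpha$; only the ordering induced by $u$ matters.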
Given any $A$ and any $s > t$ in $T$, by Payoff Stochastic Dominance,
$$p_s(\{a \in A : u(a) > h\}, A) \ge p_t(\{a \in A : u(a) > h\}, A) \qquad \forall h \in \mathbb{R}$$
but then, for all $b \in A$, taking $h = u(b)$, it follows that
$$p_s(\{a \in A : u(a) > u(b)\}, A) \ge p_t(\{a \in A : u(a) > u(b)\}, A) \qquad \forall b \in A$$
but
$$w_\tau(c,d) = \ln\frac{r_\tau(c,d)}{r_0(c,d)} = \frac{u(c)-u(d)}{\lambda(\tau)} \qquad \forall c,d \in A, \ \forall \tau \in T \tag{30}$$
and since $c \succ_\tau d$ if and only if $w_\tau(c,d) > 0$, it follows that $c \succ_\tau d$ if and only if $u(c) > u(d)$; therefore
$$p_s(\{a \in A : a \succ_s b\}, A) \ge p_t(\{a \in A : a \succ_t b\}, A) \qquad \forall b \in A$$
By (30), given any $c,d \in X$ and $\tau \in T$,
$$p_\tau(c,d) > p_0(c,d) \iff c \succ_\tau d \iff w_\tau(c,d) > 0 \iff u(c) > u(d)$$
Let $s > t$ and $a,b \in X$ be such that $p_t(a,b) \ge p_0(a,b)$. If $p_t(a,b) = p_0(a,b)$, then $u(a) = u(b)$, hence $p_s(a,b) = p_0(a,b) = p_t(a,b)$. Else $p_t(a,b) > p_0(a,b)$ and $u(a) > u(b)$. Point 2 guarantees that
$$p_s(\{x \in \{a,b\} : x \succ_s b\}, \{a,b\}) \ge p_t(\{x \in \{a,b\} : x \succ_t b\}, \{a,b\})$$
and, since $u$ represents both $\succ_s$ and $\succ_t$, it follows that $\{x \in \{a,b\} : x \succ_s b\} = \{x \in \{a,b\} : x \succ_t b\} = \{a\}$, therefore $p_s(a,b) \ge p_t(a,b)$ and Decreasing Error Rate holds. $\blacksquare$

Proof of Proposition 8
By Proposition 7, as $t \to \infty$, $\lambda(t)$ decreases to some $\lambda^* \ge 0$. Let $a \ne b$ in $X$. If $\lambda^* > 0$, then
$$p_\infty(a,b) = \lim_{t \to \infty} p_t(a,b) = \frac{1}{1 + e^{\frac{u(b)-u(a)}{\lambda^*} + \alpha(b) - \alpha(a)}} \in (0,1)$$
By Asymptotic Tie-breaking,
$$\frac{1}{1 + e^{\frac{u(b)-u(a)}{\lambda^*} + \alpha(b) - \alpha(a)}} = p_\infty(a,b) = p_0(a,b) = \frac{1}{1 + e^{\alpha(b)-\alpha(a)}}$$
which, since $u$ is nonconstant, contradicts $\lambda^* > 0$. We conclude that $\lambda^* = 0$. In turn, given any $a \in A$, this implies
$$p_\infty(a,A) = \lim_{\lambda(t) \to 0} \frac{e^{\frac{u(a)}{\lambda(t)} + \alpha(a)}}{\sum_{b \in A} e^{\frac{u(b)}{\lambda(t)} + \alpha(b)}} = \lim_{\beta \to \infty} \frac{e^{\beta u(a) + \alpha(a)}}{\sum_{b \in A} e^{\beta u(b) + \alpha(b)}} = \lim_{\beta \to \infty} \frac{1}{\underbrace{\sum_{\{b \in A : u(b) > u(a)\}} e^{\beta[u(b)-u(a)] + \alpha(b) - \alpha(a)}}_{\to \infty \text{ if there exists } b \in A \text{ such that } u(b) > u(a)} + \underbrace{\sum_{\{b \in A : u(b) \le u(a)\}} e^{\beta[u(b)-u(a)] + \alpha(b) - \alpha(a)}}_{\to \sum_{\{b \in A : u(b) = u(a)\}} e^{\alpha(b)-\alpha(a)}}}$$
so:
• if $a \notin \arg\max_A u$, then there exists $b \in A$ such that $u(b) > u(a)$, and so
$$p_\infty(a,A) = \frac{1}{\infty + \sum_{\{b \in A : u(b) = u(a)\}} e^{\alpha(b)-\alpha(a)}} = 0 = \delta_{\arg\max_A u}(a)\,\frac{e^{\alpha(a)}}{\sum_{b \in \arg\max_A u} e^{\alpha(b)}}$$
• else, there does not exist $b \in A$ such that $u(b) > u(a)$, $u(a) = \max_A u$, and
$$p_\infty(a,A) = \frac{1}{\sum_{\{b \in A : u(b) = u(a)\}} e^{\alpha(b)-\alpha(a)}} = \frac{1}{\sum_{b \in \arg\max_A u} e^{\alpha(b)-\alpha(a)}} = \frac{e^{\alpha(a)}}{\sum_{b \in \arg\max_A u} e^{\alpha(b)}} = \delta_{\arg\max_A u}(a)\,\frac{e^{\alpha(a)}}{\sum_{b \in \arg\max_A u} e^{\alpha(b)}}$$
as desired.

Let $a \ne b$ in $X$.
If $u(a) > u(b)$, then $\arg\max_{\{a,b\}} u = \{a\}$, hence
$$p_\infty(a,b) = \delta_{\{a\}}(a)\,\frac{e^{\alpha(a)}}{e^{\alpha(a)}} = 1$$
Conversely, if $u(a) \le u(b)$, there are two possibilities:
• either $u(b) > u(a)$; then $\arg\max_{\{a,b\}} u = \{b\}$, hence
$$p_\infty(a,b) = \delta_{\{b\}}(a)\,\frac{e^{\alpha(a)}}{e^{\alpha(b)}} = 0 \ne 1$$
• or $u(a) = u(b)$; then $\arg\max_{\{a,b\}} u = \{a,b\}$, hence
$$p_\infty(a,b) = \delta_{\{a,b\}}(a)\,\frac{e^{\alpha(a)}}{e^{\alpha(a)} + e^{\alpha(b)}} = \frac{e^{\alpha(a)}}{e^{\alpha(a)} + e^{\alpha(b)}} \in (0,1)$$
In any case, $p_\infty(a,b) \ne 1$; and so if $p_\infty(a,b) = 1$ it must be the case that $u(a) > u(b)$. $\blacksquare$

D Discrete choice analysis
Recall that Ease Consistency requires, in an ordinal way, that the difficulty of decision problem $\{a,b\}$ relative to decision problem $\{c,d\}$ is inherent to the alternatives involved and independent of deliberation times. The same requirement can be made cardinal:

Constant Relative Ease of Comparison
Given any $s > t$ in $T$,
$$\frac{e_t(a,b)}{e_t(c,d)} = \frac{e_s(a,b)}{e_s(c,d)}$$
for all $a,b,c,d \in X$ such that either ratio is well defined.

Theorem 23
A random choice process $\{p_t\}$ satisfies Positivity, the Choice Axiom, Preference Consistency, and Constant Relative Ease of Comparison if and only if there exist $u, \alpha : X \to \mathbb{R}$ and $\lambda : T \to (0,\infty)$ such that
$$p_t(a,A) = \frac{e^{\frac{u(a)}{\lambda(t)} + \alpha(a)}}{\sum_{b \in A} e^{\frac{u(b)}{\lambda(t)} + \alpha(b)}} \tag{31}$$
for all $A$, all $a \in A$, and all $t \in T$.

In this case, $u$ is cardinally unique, $\alpha$ is unique up to location, and $\lambda$ is unique given $u$ unless $\{p_t\}$ is constant.

Constant Relative Weight of Evidence
Given any $s > t$ in $T$,
$$\frac{w_t(a,b)}{w_t(c,d)} = \frac{w_s(a,b)}{w_s(c,d)}$$
for all $a,b,c,d \in X$ such that either ratio is well defined.

This axiom has a very similar interpretation. Inspection of the following proofs shows that, in order to apply these results to any index set $T$, not necessarily a subset of $(0,\infty)$, it is sufficient to replace the inequality $>$ appearing in the axioms with the weaker inequality $\ne$. Actually, this replacement makes the axioms easier to test on the empirical side. Finally, Proposition 6 holds unchanged.

D.1 Proofs
Recall that $\ell_t(a,c) - \ell_0(a,c) = w_t(a,c)$ is the weight of evidence, and note that Constant Relative Weight of Evidence implies:

Log-odds Ratio Invariance
Given any $s > t$ in $T$,
$$\frac{\ell_t(a,c) - \ell_0(a,c)}{\ell_t(b,c) - \ell_0(b,c)} = \frac{\ell_s(a,c) - \ell_0(a,c)}{\ell_s(b,c) - \ell_0(b,c)}$$
for all $a,b,c \in X$ such that either ratio is well defined.

Lemma 24
A random choice process $\{p_t\}$ satisfies Positivity, the Choice Axiom, Preference Consistency, and Log-odds Ratio Invariance if and only if there exist $u, \alpha : X \to \mathbb{R}$ and $\beta : T \to (0,\infty)$ such that
$$p_t(a,A) = \frac{e^{\beta(t)u(a) + \alpha(a)}}{\sum_{b \in A} e^{\beta(t)u(b) + \alpha(b)}} \tag{32}$$
for all $A$, all $a \in A$, and all $t \in T$.

Proof
Only if.
Since $\{p_t\}$ satisfies Positivity and the Choice Axiom, by Theorem 1, for each $t \in T$ there exists $v_t : X \to \mathbb{R}$ such that
$$p_t(a,A) = \frac{e^{v_t(a)}}{\sum_{b \in A} e^{v_t(b)}} \qquad \forall a \in A \tag{33}$$
Arbitrarily choose $\bar c \in X$ and replace each $v_t$ with $v_t - v_t(\bar c)$. With this, $v_t(\bar c) = 0$ for all $t \in T$ and (33) still holds. Set $\alpha = v_0$ and $u_t = v_t - \alpha = v_t - v_0$ for all $t \in T$. Clearly, $u_t(\bar c) = v_t(\bar c) - v_0(\bar c) = 0$ for all $t \in T$ (also $\alpha(\bar c) = 0$).

Note that, for all $t \in T$ and all $x \in X$,
$$w_t(x,\bar c) = \ell_t(x,\bar c) - \ell_0(x,\bar c) = v_t(x) - v_t(\bar c) - v_0(x) + v_0(\bar c) = v_t(x) - \alpha(x) = u_t(x) \tag{34}$$
If $u_t$ is constant for all $t \in T$, then $u_t \equiv u_t(\bar c) = 0$, and
$$p_t(a,A) = \frac{e^{v_t(a)}}{\sum_{b \in A} e^{v_t(b)}} = \frac{e^{u_t(a) + \alpha(a)}}{\sum_{b \in A} e^{u_t(b) + \alpha(b)}} = \frac{e^{\alpha(a)}}{\sum_{b \in A} e^{\alpha(b)}} = p_0(a,A) \qquad \forall a \in A$$
so that representation (32) holds with $u \equiv 0$. Otherwise, there exists $\bar t \in T$ such that $u_{\bar t}$ is not constant, so that $u_{\bar t}(\bar b) \ne 0 = u_{\bar t}(\bar c)$ for some $\bar b \in X$. This implies that
$$\frac{\ell_{\bar t}(a,\bar c) - \ell_0(a,\bar c)}{\ell_{\bar t}(\bar b,\bar c) - \ell_0(\bar b,\bar c)} = \frac{u_{\bar t}(a)}{u_{\bar t}(\bar b)}$$
is a well defined real number for all $a \in X$.
By Log-odds Ratio Invariance,
$$\frac{\ell_t(a,\bar c) - \ell_0(a,\bar c)}{\ell_t(\bar b,\bar c) - \ell_0(\bar b,\bar c)}$$
is well defined too for all $t \in T$, and
$$\frac{u_t(a)}{u_t(\bar b)} = \frac{\ell_t(a,\bar c) - \ell_0(a,\bar c)}{\ell_t(\bar b,\bar c) - \ell_0(\bar b,\bar c)} = \frac{\ell_{\bar t}(a,\bar c) - \ell_0(a,\bar c)}{\ell_{\bar t}(\bar b,\bar c) - \ell_0(\bar b,\bar c)} = \frac{u_{\bar t}(a)}{u_{\bar t}(\bar b)} \in \mathbb{R} \qquad \forall (a,t) \in X \times T$$
Therefore, $u_t(\bar b) \ne 0 = u_t(\bar c)$ for all $t \in T$, and
$$u_t(a) = \frac{u_t(\bar b)}{u_{\bar t}(\bar b)}\, u_{\bar t}(a) \qquad \forall (a,t) \in X \times T \tag{35}$$
Consider the case in which $u_{\bar t}(\bar b) > u_{\bar t}(\bar c)$. If $t > \bar t$, then, by (34) and Preference Consistency, we have
$$u_{\bar t}(\bar b) > 0 \implies w_{\bar t}(\bar b,\bar c) > 0 \implies w_t(\bar b,\bar c) > 0 \implies u_t(\bar b) > 0$$
thus $u_t(\bar b)/u_{\bar t}(\bar b) > 0$. This is clearly true also if $t = \bar t$. Else $t < \bar t$; assume per contra $u_t(\bar b) < 0$; then, by (34) and Preference Consistency, we have
$$u_t(\bar b) < 0 \implies w_t(\bar b,\bar c) < 0 \implies w_t(\bar c,\bar b) > 0 \implies w_{\bar t}(\bar c,\bar b) > 0 \implies w_{\bar t}(\bar b,\bar c) < 0 \implies u_{\bar t}(\bar b) < 0$$
a contradiction.
Thus $u_t(\bar b)/u_{\bar t}(\bar b) > 0$ holds for all $t \in T$ provided $u_{\bar t}(\bar b) > 0$.

Consider the case in which $u_{\bar t}(\bar b) < u_{\bar t}(\bar c)$. If $t > \bar t$, then, by (34) and Preference Consistency, we have
$$u_{\bar t}(\bar b) < 0 \implies w_{\bar t}(\bar b,\bar c) < 0 \implies w_{\bar t}(\bar c,\bar b) > 0 \implies w_t(\bar c,\bar b) > 0 \implies w_t(\bar b,\bar c) < 0 \implies u_t(\bar b) < 0$$
thus $u_t(\bar b)/u_{\bar t}(\bar b) > 0$. This is clearly true also if $t = \bar t$. Else $t < \bar t$; assume per contra $u_t(\bar b) > 0$; then, by (34) and Preference Consistency, we have
$$u_t(\bar b) > 0 \implies w_t(\bar b,\bar c) > 0 \implies w_{\bar t}(\bar b,\bar c) > 0 \implies u_{\bar t}(\bar b) > 0$$
a contradiction. Thus $u_t(\bar b)/u_{\bar t}(\bar b) > 0$ holds for all $t \in T$ provided $u_{\bar t}(\bar b) < 0$.

This shows that
$$\beta : T \to (0,\infty), \qquad t \mapsto \frac{u_t(\bar b)}{u_{\bar t}(\bar b)}$$
is well defined. Moreover, the function $u = u_{\bar t} : X \to \mathbb{R}$ is nonconstant and relation (35) implies
$$u_t(a) = \frac{u_t(\bar b)}{u_{\bar t}(\bar b)}\, u_{\bar t}(a) = \beta(t)u(a) \qquad \forall (a,t) \in X \times T$$
Recalling that $v_t = u_t + \alpha$ (for all $t \in T$) shows that the axioms imply representation (32); the case $t = 0$ follows suit.

If.
It is easy to verify that the converse implication holds too. For the sake of completeness, we check that representation (32) implies Log-odds Ratio Invariance. Let $t,s \in T$ and $a,b,c,x,y \in X$. Notice that
$$w_t(x,y) = \ell_t(x,y) - \ell_0(x,y) = \ln\frac{e^{\beta(t)u(x)+\alpha(x)}}{e^{\beta(t)u(y)+\alpha(y)}} - \ln\frac{e^{\alpha(x)}}{e^{\alpha(y)}} = \beta(t)u(x) + \alpha(x) - \beta(t)u(y) - \alpha(y) - \alpha(x) + \alpha(y) = \beta(t)\left[u(x) - u(y)\right]$$
so that
$$w_t(x,y) = 0 \iff \beta(t)\left[u(x)-u(y)\right] = 0 \iff u(x) = u(y)$$
because $\beta(t) > 0$. The same considerations hold with $s$ in place of $t$. Assume $w_s(a,c)/w_s(b,c)$ is well defined:
• If $w_s(b,c) = 0$, then $w_s(a,c) \ne 0$, $u(b) = u(c)$, and $u(a) \ne u(c)$; therefore:
◦ $\dfrac{w_s(a,c)}{w_s(b,c)} = \dfrac{\beta(s)[u(a)-u(c)]}{0} = \dfrac{u(a)-u(c)}{0}$, because $\beta(s) > 0$,
◦ $w_t(b,c) = \beta(t)[u(b)-u(c)] = 0$, because $u(b) = u(c)$,
◦ $w_t(a,c) = \beta(t)[u(a)-u(c)] \ne 0$, because $u(a) \ne u(c)$,
and since $\beta(t) > 0$, then
$$\frac{w_t(a,c)}{w_t(b,c)} = \frac{\beta(t)[u(a)-u(c)]}{0} = \frac{u(a)-u(c)}{0} = \frac{w_s(a,c)}{w_s(b,c)}$$
• Else $w_s(b,c) \ne 0$; then $u(b) \ne u(c)$ and $w_t(b,c) = \beta(t)[u(b)-u(c)] \ne 0$, so that
$$\frac{w_s(a,c)}{w_s(b,c)} = \frac{\beta(s)[u(a)-u(c)]}{\beta(s)[u(b)-u(c)]} = \frac{u(a)-u(c)}{u(b)-u(c)} = \frac{\beta(t)[u(a)-u(c)]}{\beta(t)[u(b)-u(c)]} = \frac{w_t(a,c)}{w_t(b,c)}$$
The case in which $w_t(a,c)/w_t(b,c)$ is well defined is analogous. $\blacksquare$

Lemma 25
If a random choice process $\{p_t\}$ satisfies Positivity, the Choice Axiom, and Preference Consistency, then it satisfies Log-odds Ratio Invariance if and only if it satisfies Constant Relative Ease of Comparison.

Proof
Since $\{p_t\}$ satisfies Positivity and the Choice Axiom, by Theorem 1, for each $t \in T$ there exists $v_t : X \to \mathbb{R}$ such that
$$p_t(a,A) = \frac{e^{v_t(a)}}{\sum_{b \in A} e^{v_t(b)}} \qquad \forall a \in A \tag{36}$$
Arbitrarily choose $\bar c \in X$ and replace each $v_t$ with $v_t - v_t(\bar c)$. With this, $v_t(\bar c) = 0$ for all $t \in T$ and (36) still holds. Set $\alpha = v_0$ and $u_t = v_t - \alpha = v_t - v_0$ for all $t \in T$. Clearly, $u_t(\bar c) = v_t(\bar c) - v_0(\bar c) = 0$ for all $t \in T$ (also $\alpha(\bar c) = 0$). By (36), for all $t \in T$ and $a,b \in X$,
$$w_t(a,b) = \ell_t(a,b) - \ell_0(a,b) = v_t(a) - v_t(b) - v_0(a) + v_0(b) = u_t(a) - u_t(b) = -w_t(b,a)$$
thus $a \succ_t b \iff u_t(a) > u_t(b)$, and
$$e_t(a,b) = |u_t(a) - u_t(b)|$$
These relations will be repeatedly used during the proof.

If $\{p_t\}$ satisfies Log-odds Ratio Invariance, by Lemma 24 there exist $u, \alpha : X \to \mathbb{R}$ and $\beta : T \to (0,\infty)$ such that $v_t(a) = \beta(t)u(a) + \alpha(a)$ for all $a \in X$ and all $t \in T$.

Let $t,s \in T$ and $a,b,c,d,x,y \in X$. Notice that, since $\beta(t) > 0$,
$$e_t(x,y) = \left|v_t(x) - v_t(y) - \left[v_0(x) - v_0(y)\right]\right| = \left|\beta(t)u(x) + \alpha(x) - \beta(t)u(y) - \alpha(y) - \alpha(x) + \alpha(y)\right| = \beta(t)\left|u(x) - u(y)\right|$$
and so $e_t(x,y) = 0$ if and only if $u(x) = u(y)$. The same considerations hold with $s$ in place of $t$.

Assume $e_s(a,b)/e_s(c,d)$ is well defined. Then it cannot be the case that both $u(a) - u(b)$ and $u(c) - u(d)$ are simultaneously zero.
If $u(c) = u(d)$, then $u(a) \ne u(b)$ and
$$\frac{e_s(a,b)}{e_s(c,d)} = \frac{\beta(s)|u(a)-u(b)|}{0} = \frac{\beta(t)|u(a)-u(b)|}{0} = \frac{e_t(a,b)}{e_t(c,d)}$$
Else $u(c) \ne u(d)$ and
$$\frac{e_s(a,b)}{e_s(c,d)} = \frac{\beta(s)|u(a)-u(b)|}{\beta(s)|u(c)-u(d)|} = \frac{|u(a)-u(b)|}{|u(c)-u(d)|} = \frac{\beta(t)|u(a)-u(b)|}{\beta(t)|u(c)-u(d)|} = \frac{e_t(a,b)}{e_t(c,d)}$$
The case in which $e_t(a,b)/e_t(c,d)$ is well defined is analogous. Therefore, Log-odds Ratio Invariance implies Constant Relative Ease of Comparison.

Conversely, assume that $\{p_t\}$ satisfies Constant Relative Ease of Comparison and that one of the ratios
$$\frac{\ell_t(a,c) - \ell_0(a,c)}{\ell_t(b,c) - \ell_0(b,c)} = \frac{w_t(a,c)}{w_t(b,c)} \qquad \text{or} \qquad \frac{\ell_s(a,c) - \ell_0(a,c)}{\ell_s(b,c) - \ell_0(b,c)} = \frac{w_s(a,c)}{w_s(b,c)}$$
is well defined for some $a,b,c \in X$ and some $s > t$ in $T$. If $w_s(a,c)/w_s(b,c)$ is well defined, then it cannot be the case that both $u_s(a) - u_s(c)$ and $u_s(b) - u_s(c)$ are simultaneously zero. If $u_s(b) - u_s(c) = 0$, then either $u_s(a) > u_s(c)$ or $u_s(a) < u_s(c)$.
Moreover, by Preference Consistency, it must be the case that $u_t(b) - u_t(c) = 0$. In fact, by Preference Consistency, if $s > t$, then, given any $x,y \in X$, $w_t(x,y) > 0 \implies w_s(x,y) > 0$, that is,
$$u_t(x) - u_t(y) > 0 \implies u_s(x) - u_s(y) > 0 \qquad \text{and} \qquad u_t(y) - u_t(x) < 0 \implies u_s(y) - u_s(x) < 0$$
so that $u_s(b) - u_s(c) = 0$ forces $u_t(b) - u_t(c) = 0$. Since
$$\frac{e_s(a,c)}{e_s(b,c)} = \frac{|u_s(a)-u_s(c)|}{|u_s(b)-u_s(c)|}$$
is well defined, by Constant Relative Ease of Comparison also
$$\frac{e_t(a,c)}{e_t(b,c)} = \frac{|u_t(a)-u_t(c)|}{|u_t(b)-u_t(c)|}$$
is well defined and it must hold that
$$\frac{|u_s(a)-u_s(c)|}{|u_s(b)-u_s(c)|} = \frac{|u_t(a)-u_t(c)|}{|u_t(b)-u_t(c)|}$$
With $u_s(b) - u_s(c) = u_t(b) - u_t(c) = 0$, then either $u_t(a) > u_t(c)$ or $u_t(c) > u_t(a)$; by Preference Consistency, in the former case we have $u_s(a) > u_s(c)$ and
$$\frac{w_t(a,c)}{w_t(b,c)} = \frac{u_t(a)-u_t(c)}{0} = +\infty = \frac{u_s(a)-u_s(c)}{0} = \frac{w_s(a,c)}{w_s(b,c)}$$
in the latter case we have $u_s(c) > u_s(a)$ and
$$\frac{w_t(a,c)}{w_t(b,c)} = \frac{u_t(a)-u_t(c)}{0} = -\infty = \frac{u_s(a)-u_s(c)}{0} = \frac{w_s(a,c)}{w_s(b,c)}$$
Else if $u_s(b) - u_s(c) \ne 0$, then
$$\frac{e_s(a,c)}{e_s(b,c)} = \frac{|u_s(a)-u_s(c)|}{|u_s(b)-u_s(c)|}$$
is well defined and finite, so is
$$\frac{e_t(a,c)}{e_t(b,c)} = \frac{|u_t(a)-u_t(c)|}{|u_t(b)-u_t(c)|}$$
and it must hold that
$$\frac{|u_s(a)-u_s(c)|}{|u_s(b)-u_s(c)|} = \frac{|u_t(a)-u_t(c)|}{|u_t(b)-u_t(c)|}$$
But then $u_t(b) - u_t(c) \ne 0$, and if $u_t(b) - u_t(c) \gtrless 0$, by Preference Consistency, $u_s(b) - u_s(c) \gtrless 0$
correspondingly. Therefore
$$\frac{|u_s(a)-u_s(c)|}{\pm\left(u_s(b)-u_s(c)\right)} = \frac{|u_t(a)-u_t(c)|}{\pm\left(u_t(b)-u_t(c)\right)}$$
Now, if $u_t(a) - u_t(c) = 0$, then $u_s(a) - u_s(c) = 0$ and
$$\frac{u_s(a)-u_s(c)}{\pm\left(u_s(b)-u_s(c)\right)} = \frac{u_t(a)-u_t(c)}{\pm\left(u_t(b)-u_t(c)\right)}$$
else if $u_t(a) - u_t(c) > 0$, by Preference Consistency, $u_s(a) - u_s(c) > 0$ and
$$\frac{u_s(a)-u_s(c)}{\pm\left(u_s(b)-u_s(c)\right)} = \frac{u_t(a)-u_t(c)}{\pm\left(u_t(b)-u_t(c)\right)}$$
else $u_t(a) - u_t(c) < 0$ and, by Preference Consistency, $u_s(a) - u_s(c) < 0$ and
$$\frac{-\left(u_s(a)-u_s(c)\right)}{\pm\left(u_s(b)-u_s(c)\right)} = \frac{-\left(u_t(a)-u_t(c)\right)}{\pm\left(u_t(b)-u_t(c)\right)}$$
In any case,
$$\frac{w_s(a,c)}{w_s(b,c)} = \frac{u_s(a)-u_s(c)}{u_s(b)-u_s(c)} = \frac{u_t(a)-u_t(c)}{u_t(b)-u_t(c)} = \frac{w_t(a,c)}{w_t(b,c)}$$
So far we proved that, under Constant Relative Ease of Comparison, if $w_s(a,c)/w_s(b,c)$ is well defined, then $w_t(a,c)/w_t(b,c)$ is also well defined, and the two ratios coincide. Now assume that $w_t(a,c)/w_t(b,c)$ is well defined. Then $e_t(a,c)/e_t(b,c) = |w_t(a,c)|/|w_t(b,c)|$ is well defined as well; by Constant Relative Ease of Comparison, $e_s(a,c)/e_s(b,c) = |w_s(a,c)|/|w_s(b,c)|$ is well defined too, so $w_s(a,c)/w_s(b,c)$ is not $0/0$. By the previous argument, we have
$$\frac{w_s(a,c)}{w_s(b,c)} = \frac{w_t(a,c)}{w_t(b,c)}$$
In conclusion, Log-odds Ratio Invariance holds. $\blacksquare$

Proofs of the results of Section 4
Let $v : A \to \mathbb{R}$, $\beta > 0$, and $a \ne b$ in $A$ be given and fixed; set $\delta = v(a) - v(b)$. In this way, when $\mathrm{DDM}(v,\beta,\zeta)$ is considered, the ex post (binary) probability of accepting proposal $a$ over incumbent $b$ is
$$P_\zeta(a,b) = P(\mathrm{CO}_{a,b} = a) = \frac{1 - e^{-(\zeta_{a,b}+\beta)\left[v(a)-v(b)\right]}}{1 - e^{-2\beta\left[v(a)-v(b)\right]}} \tag{37}$$
with the limit convention
$$\frac{1 - e^{-(\zeta_{a,b}+\beta)\left[v(a)-v(b)\right]}}{1 - e^{-2\beta\left[v(a)-v(b)\right]}} = \frac{\zeta_{a,b}+\beta}{2\beta} \tag{38}$$
if $v(a) = v(b)$. This number is uniquely determined by the value $\zeta_{a,b}$ of $\zeta$ at $(a,b)$.

Fact 1 $P_\zeta(a,b) \in (0,1)$ for all $\zeta_{a,b} \in (-\beta,\beta)$.

Proof If $\delta \ne 0$, then
• $P(\mathrm{CO}_{a,b} = a) = 0 \iff \dfrac{1 - e^{-\delta(\zeta_{a,b}+\beta)}}{1 - e^{-2\delta\beta}} = 0 \iff e^{-\delta(\zeta_{a,b}+\beta)} = 1 \iff \zeta_{a,b} + \beta = 0 \iff \zeta_{a,b} = -\beta$, which is excluded by $\zeta_{a,b} \in (-\beta,\beta)$;
• $P(\mathrm{CO}_{a,b} = a) = 1 \iff 1 - e^{-\delta(\zeta_{a,b}+\beta)} = 1 - e^{-2\delta\beta} \iff e^{-\delta(\zeta_{a,b}+\beta)} = e^{-2\delta\beta} \iff \zeta_{a,b} + \beta = 2\beta \iff \zeta_{a,b} = \beta$, which is excluded by $\zeta_{a,b} \in (-\beta,\beta)$.
Else $\delta = 0$ and $P(\mathrm{CO}_{a,b} = a) \in \{0,1\}$ if and only if
$$\frac{\zeta_{a,b}+\beta}{2\beta} \in \{0,1\} \iff \zeta_{a,b} + \beta \in \{0, 2\beta\} \iff \zeta_{a,b} \in \{-\beta,\beta\}$$
which, again, is excluded by $\zeta_{a,b} \in (-\beta,\beta)$. $\blacksquare$

Fact 2 If $\zeta_{a,b}, \zeta_{b,a} \in (-\beta,\beta)$ are such that $\zeta_{a,b} = -\zeta_{b,a}$, then $P_\zeta(a,b) = 1 - P_\zeta(b,a)$.

Proof
First we show that $\zeta_{a,b} = -\zeta_{b,a}$ implies that $P(\mathrm{CO}_{a,b} = a) = P(\mathrm{CO}_{b,a} = a)$. That is, the DDM-induced probability of accepting proposal $a$ over incumbent $b$ coincides with the DDM-induced probability of rejecting proposal $b$ over incumbent $a$, when $\zeta_{a,b} = -\zeta_{b,a}$.

If $\delta \ne 0$, then (noting that the drift for the pair $(b,a)$ is $v(b) - v(a) = -\delta$)
$$P(\mathrm{CO}_{a,b} = a) = P(\mathrm{CO}_{b,a} = a) \iff P(\mathrm{CO}_{a,b} = a) = 1 - P(\mathrm{CO}_{b,a} = b)$$
$$\iff \frac{1 - e^{-\delta(\zeta_{a,b}+\beta)}}{1 - e^{-2\delta\beta}} = 1 - \frac{1 - e^{\delta(\zeta_{b,a}+\beta)}}{1 - e^{2\delta\beta}} \iff \frac{1 - e^{-\delta(\zeta_{a,b}+\beta)}}{1 - e^{-2\delta\beta}} = \frac{e^{\delta(\zeta_{b,a}+\beta)} - e^{2\delta\beta}}{1 - e^{2\delta\beta}} = \frac{1 - e^{\delta(\zeta_{b,a}+\beta) - 2\delta\beta}}{1 - e^{-2\delta\beta}}$$
(multiplying numerator and denominator by $e^{-2\delta\beta}$)
$$\iff 1 - e^{-\delta(\zeta_{a,b}+\beta)} = 1 - e^{\delta(\zeta_{b,a}+\beta) - 2\delta\beta} \iff e^{-\delta(\zeta_{a,b}+\beta)} = e^{\delta(\zeta_{b,a}+\beta) - 2\delta\beta} \iff -\delta\zeta_{a,b} - \delta\beta = \delta\zeta_{b,a} - \delta\beta \iff \zeta_{a,b} = -\zeta_{b,a}$$
See, e.g., Pinsky and Karlin (2011, Theorem 8.1) for formula (37).
If $\delta = 0$, then
$$P(\mathrm{CO}_{a,b} = a) = P(\mathrm{CO}_{b,a} = a) \iff P(\mathrm{CO}_{a,b} = a) = 1 - P(\mathrm{CO}_{b,a} = b)$$
$$\iff \frac{\zeta_{a,b}+\beta}{2\beta} = 1 - \frac{\zeta_{b,a}+\beta}{2\beta} \iff \zeta_{a,b}+\beta = 2\beta - (\zeta_{b,a}+\beta) \iff \zeta_{a,b}+\beta = -\zeta_{b,a}+\beta \iff \zeta_{a,b} = -\zeta_{b,a}$$
which is the hypothesis. But then
$$P_\zeta(a,b) = P(\mathrm{CO}_{a,b} = a) = P(\mathrm{CO}_{b,a} = a) = 1 - P(\mathrm{CO}_{b,a} = b) = 1 - P_\zeta(b,a)$$
as wanted. $\blacksquare$

Fact 3
For all scalars $x,y > 0$ and $z,w \ne 0$,
$$\frac{x}{y} = \frac{z}{w} \iff \frac{x}{x+y} = \frac{z}{z+w} \iff \frac{y}{x+y} = \frac{w}{z+w}$$

Proof
Notice that $x,y > 0$ excludes $y/x = -1$; then
$$\frac{x}{y} = \frac{z}{w} \iff \frac{y}{x} = \frac{w}{z} \iff \frac{1}{1+\frac{y}{x}} = \frac{1}{1+\frac{w}{z}} \iff \frac{x}{x+y} = \frac{z}{z+w} \iff 1 - \frac{x}{x+y} = 1 - \frac{z}{z+w} \iff \frac{x+y-x}{x+y} = \frac{z+w-z}{z+w} \iff \frac{y}{x+y} = \frac{w}{z+w}$$
as wanted. $\blacksquare$

Proposition 26 If $\zeta_{a,b}, \zeta_{b,a} \in (-\beta,\beta)$ are such that $\zeta_{a,b} = -\zeta_{b,a}$, then the following conditions are equivalent for $\xi \in \mathbb{R}$:
1. $P_\zeta(a,b) = \dfrac{\xi e^{\beta v(a)}}{\xi e^{\beta v(a)} + (1-\xi)e^{\beta v(b)}}$;
2. $\dfrac{P_\zeta(a,b)}{P_\zeta(b,a)} = e^{\beta[v(a)-v(b)]}\dfrac{\xi}{1-\xi}$;
3. $\xi = \dfrac{e^{-\beta v(a)}P_\zeta(a,b)}{e^{-\beta v(a)}P_\zeta(a,b) + e^{-\beta v(b)}P_\zeta(b,a)}$.

Proof If $\xi$ satisfies the second equation, then neither $\xi$ nor $1-\xi$ can be zero because Fact 1 requires that $P_\zeta(a,b), P_\zeta(b,a) \in (0,1)$, and
$$\frac{P_\zeta(a,b)}{P_\zeta(b,a)} = \frac{\xi e^{\beta v(a)}}{(1-\xi)e^{\beta v(b)}}$$
But $P_\zeta(a,b), P_\zeta(b,a) > 0$ and $\xi e^{\beta v(a)}, (1-\xi)e^{\beta v(b)} \ne 0$; then, by Fact 3,
$$\frac{P_\zeta(a,b)}{P_\zeta(a,b) + P_\zeta(b,a)} = \frac{\xi e^{\beta v(a)}}{\xi e^{\beta v(a)} + (1-\xi)e^{\beta v(b)}}$$
which, since $P_\zeta(a,b) + P_\zeta(b,a) = 1$ by Fact 2, is equivalent to
$$P_\zeta(a,b) = \frac{\xi e^{\beta v(a)}}{\xi e^{\beta v(a)} + (1-\xi)e^{\beta v(b)}}$$
the first equation.

Conversely, if $\xi$ satisfies the first equation, then neither $\xi$ nor $1-\xi$ can be zero because Fact 1 requires that $P_\zeta(a,b) \in (0,1)$, and since, by Fact 2, $P_\zeta(a,b) + P_\zeta(b,a) = 1$, it follows that
$$\frac{P_\zeta(a,b)}{P_\zeta(a,b) + P_\zeta(b,a)} = \frac{\xi e^{\beta v(a)}}{\xi e^{\beta v(a)} + (1-\xi)e^{\beta v(b)}}$$
But $P_\zeta(a,b), P_\zeta(b,a) > 0$ and $\xi e^{\beta v(a)}, (1-\xi)e^{\beta v(b)} \ne 0$; then, by Fact 3, $\xi$ satisfies the second equation.

If $\xi$ satisfies the second equation (and so neither $\xi$ nor $1-\xi$ is zero), then
$$\frac{e^{-\beta v(a)}P_\zeta(a,b)}{e^{-\beta v(b)}P_\zeta(b,a)} = \frac{\xi}{1-\xi}$$
But $e^{-\beta v(a)}P_\zeta(a,b), e^{-\beta v(b)}P_\zeta(b,a) > 0$ and $\xi, 1-\xi \ne 0$; then, by Fact 3,
$$\frac{e^{-\beta v(a)}P_\zeta(a,b)}{e^{-\beta v(a)}P_\zeta(a,b) + e^{-\beta v(b)}P_\zeta(b,a)} = \frac{\xi}{\xi + (1-\xi)} = \xi$$
the third equation.

Conversely, if $\xi$ satisfies the third equation, then $\xi$ and $1-\xi$ cannot be zero because
$$\xi = \frac{e^{-\beta v(a)}P_\zeta(a,b)}{e^{-\beta v(a)}P_\zeta(a,b) + e^{-\beta v(b)}P_\zeta(b,a)} \in (0,1)$$
and the third equation can be written as
$$\frac{e^{-\beta v(a)}P_\zeta(a,b)}{e^{-\beta v(a)}P_\zeta(a,b) + e^{-\beta v(b)}P_\zeta(b,a)} = \frac{\xi}{\xi + (1-\xi)}$$
But $e^{-\beta v(a)}P_\zeta(a,b), e^{-\beta v(b)}P_\zeta(b,a) > 0$ and $\xi, 1-\xi \ne 0$; then, by Fact 3,
$$\frac{e^{-\beta v(a)}P_\zeta(a,b)}{e^{-\beta v(b)}P_\zeta(b,a)} = \frac{\xi}{1-\xi}$$
and so
$$\frac{P_\zeta(a,b)}{P_\zeta(b,a)} = \frac{\xi e^{\beta v(a)}}{(1-\xi)e^{\beta v(b)}}$$
that is, $\xi$ satisfies the second equation. $\blacksquare$
■

This shows the equivalence of Equations (15), (16), and (17).

The Gibbs ex ante (binary) probability is defined by (17) as
\[
\pi_{\zeta}(a,b)=\frac{e^{-\beta v(a)}P_{\zeta}(a,b)}{e^{-\beta v(a)}P_{\zeta}(a,b)+e^{-\beta v(b)}P_{\zeta}(b,a)}
\]
and it is such that $\pi_{\zeta}(a,b)+\pi_{\zeta}(b,a)=1$. Moreover, the maintained assumption $\zeta_{a,b}=-\zeta_{b,a}$ on the initial condition makes (17) equivalent to
\[
\pi_{\zeta}(a,b)=\frac{e^{-\beta v(a)}P_{\zeta}(a,b)}{e^{-\beta v(a)}P_{\zeta}(a,b)+e^{-\beta v(b)}\left(1-P_{\zeta}(a,b)\right)}
\]
which expresses $\pi_{\zeta}(a,b)$ as a function of $\zeta_{a,b}$ only. This means that
\[
G_{a,b}:(-\beta,\beta)\to(0,1),\qquad \zeta_{a,b}\mapsto\pi_{\zeta}(a,b)
\]
is well defined and explicitly given by
\[
G_{a,b}(\zeta_{a,b})=\frac{e^{-\beta v(a)}P_{\zeta}(a,b)}{e^{-\beta v(a)}P_{\zeta}(a,b)+e^{-\beta v(b)}\left(1-P_{\zeta}(a,b)\right)}
=\frac{1}{1+e^{\beta[v(a)-v(b)]}\dfrac{1-P_{\zeta}(a,b)}{P_{\zeta}(a,b)}}
=\frac{1}{1+e^{\beta[v(a)-v(b)]}\dfrac{e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}-e^{-2\beta[v(a)-v(b)]}}{1-e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}}}
=\frac{1}{1+\dfrac{e^{-\zeta_{a,b}[v(a)-v(b)]}-e^{-\beta[v(a)-v(b)]}}{1-e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}}}
\]
Next we prove Proposition 9 and the fact that $G_{a,b}$ is a bona fide bijection.

Proof of Proposition 9
Arbitrarily choose $\pi(a,b)\in(0,1)$.

If $v(a)\neq v(b)$, then
\[
G_{a,b}(\zeta_{a,b})=\pi(a,b)
\iff \pi(a,b)=\frac{1}{1+\dfrac{e^{-\zeta_{a,b}[v(a)-v(b)]}-e^{-\beta[v(a)-v(b)]}}{1-e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}}}
\]
(setting $\pi(b,a)=1-\pi(a,b)$)
\[
\iff \frac{\pi(a,b)}{\pi(a,b)+\pi(b,a)}=\frac{1}{1+\dfrac{e^{-\zeta_{a,b}[v(a)-v(b)]}-e^{-\beta[v(a)-v(b)]}}{1-e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}}}
\iff \frac{1}{1+\dfrac{\pi(b,a)}{\pi(a,b)}}=\frac{1}{1+\dfrac{e^{-\zeta_{a,b}[v(a)-v(b)]}-e^{-\beta[v(a)-v(b)]}}{1-e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}}}
\iff \frac{\pi(a,b)}{\pi(b,a)}=\frac{1-e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}}{e^{-\zeta_{a,b}[v(a)-v(b)]}-e^{-\beta[v(a)-v(b)]}}
\]
Recalling that $\delta=v(a)-v(b)\neq 0$ and setting $\gamma=\ln\left(\pi(a,b)/\pi(b,a)\right)$, we have:
\[
G_{a,b}(\zeta_{a,b})=\pi(a,b)
\iff \frac{1-e^{-(\zeta_{a,b}+\beta)\delta}}{e^{-\zeta_{a,b}\delta}-e^{-\beta\delta}}=e^{\gamma}
\iff 1-e^{-(\zeta_{a,b}+\beta)\delta}=e^{\gamma-\zeta_{a,b}\delta}-e^{\gamma-\beta\delta}
\iff e^{\gamma-\beta\delta}+1=e^{\gamma-\zeta_{a,b}\delta}+e^{-\zeta_{a,b}\delta-\beta\delta}
\]
\[
\iff e^{-\zeta_{a,b}\delta}=\frac{e^{\gamma-\beta\delta}+1}{e^{\gamma}+e^{-\beta\delta}}
\iff \zeta_{a,b}=-\frac{1}{\delta}\ln\frac{e^{\gamma-\beta\delta}+1}{e^{\gamma}+e^{-\beta\delta}}
\iff \zeta_{a,b}=-\frac{1}{\delta}\ln\left(\frac{1}{e^{-\beta\delta}}\cdot\frac{e^{\gamma-\beta\delta}+1}{e^{\gamma+\beta\delta}+1}\right)
\iff \zeta_{a,b}=-\frac{1}{\delta}\left(\beta\delta+\ln\frac{e^{\gamma-\beta\delta}+1}{e^{\gamma+\beta\delta}+1}\right)
\iff \zeta_{a,b}=-\beta+\frac{1}{\delta}\ln\frac{e^{\gamma+\beta\delta}+1}{e^{\gamma-\beta\delta}+1}
\]
That is,
\[
\zeta_{a,b}=-\beta+\frac{1}{v(a)-v(b)}\ln\frac{\frac{\pi(a,b)}{\pi(b,a)}e^{\beta[v(a)-v(b)]}+1}{\frac{\pi(a,b)}{\pi(b,a)}e^{-\beta[v(a)-v(b)]}+1}
\tag{39}
\]
which gives $\zeta_{a,b}$ as a function of $\pi(a,b)$, instead of $\pi(a,b)$ as a function of $\zeta$ (for $v$, $\beta$, and $(a,b)$ given).
Equivalently,
\[
\frac{\zeta_{a,b}}{\beta}=\frac{1}{\beta[v(a)-v(b)]}\ln\left(\frac{\frac{\pi(a,b)}{\pi(b,a)}e^{\beta[v(a)-v(b)]}+1}{\frac{\pi(a,b)}{\pi(b,a)}e^{-\beta[v(a)-v(b)]}+1}\right)-1
\tag{40}
\]
Define $g:\mathbb{R}^{2}\to\mathbb{R}$ by
\[
g(x,y)=\frac{1}{x}\ln\left(\frac{e^{y}e^{x}+1}{e^{y}e^{-x}+1}\right)-1
\]
with the limit convention
\[
g(0,y)=\lim_{x\to 0}\left(\frac{1}{x}\ln\left(\frac{e^{y}e^{x}+1}{e^{y}e^{-x}+1}\right)-1\right)=\frac{2e^{y}}{e^{y}+1}-1
\]
By (40),
\[
G_{a,b}(\zeta_{a,b})=\pi(a,b)\iff \zeta_{a,b}=\beta g\left(\beta[v(a)-v(b)],\ln\frac{\pi(a,b)}{1-\pi(a,b)}\right)
\tag{41}
\]
if $v(a)\neq v(b)$.

Else if $v(a)=v(b)$, then
\[
G_{a,b}(\zeta_{a,b})=\pi(a,b)\iff \pi(a,b)=\frac{e^{-\beta v(a)}P_{\zeta}(a,b)}{e^{-\beta v(a)}P_{\zeta}(a,b)+e^{-\beta v(b)}\left(1-P_{\zeta}(a,b)\right)}=P_{\zeta}(a,b)
\]
\[
\iff \pi(a,b)=\frac{\zeta_{a,b}+\beta}{2\beta}
\iff \zeta_{a,b}=2\beta\pi(a,b)-\beta
\iff \zeta_{a,b}=\beta\left(2\pi(a,b)-1\right)
\iff \zeta_{a,b}=\beta\left(\frac{2e^{\ln\pi(a,b)/\pi(b,a)}}{e^{\ln\pi(a,b)/\pi(b,a)}+1}-1\right)
\iff \zeta_{a,b}=\beta g\left(\beta[v(a)-v(b)],\ln\frac{\pi(a,b)}{1-\pi(a,b)}\right)
\]
Hence (41) holds also if $v(a)=v(b)$.

So far we have shown that, given any $\pi(a,b)\in(0,1)$,
\[
G_{a,b}(\zeta_{a,b})=\pi(a,b)\iff \zeta_{a,b}=\beta g\left(\beta[v(a)-v(b)],\ln\frac{\pi(a,b)}{1-\pi(a,b)}\right)
\]
Since $g$ takes values in $(-1,1)$, for every $\pi(a,b)\in(0,1)$ there exists one and only one $\zeta_{a,b}\in(-\beta,\beta)$ such that $G_{a,b}(\zeta_{a,b})=\pi(a,b)$.
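The bijection can be checked numerically. The sketch below is illustrative (not from the paper's code); it assumes the standard two-barrier DDM hitting probability $P_\zeta(a,b)=\bigl(1-e^{-(\zeta_{a,b}+\beta)\delta}\bigr)/\bigl(1-e^{-2\beta\delta}\bigr)$, which is the expression consistent with the explicit form of $G_{a,b}$ above, and verifies that the map of (41) inverts $G_{a,b}$.

```python
import math

def P(zeta, beta, delta):
    """DDM probability that a beats b (drift delta = v(a)-v(b), starting bias zeta)."""
    if delta == 0:
        return (zeta + beta) / (2 * beta)  # zero-drift limit
    return (1 - math.exp(-(zeta + beta) * delta)) / (1 - math.exp(-2 * beta * delta))

def G(zeta, beta, delta):
    """Gibbs ex ante probability pi_zeta(a,b) as a function of zeta alone."""
    p = P(zeta, beta, delta)
    return 1 / (1 + math.exp(beta * delta) * (1 - p) / p)

def g(x, y):
    """Auxiliary map g(x, y), with the stated limit convention at x = 0."""
    if x == 0:
        return 2 * math.exp(y) / (math.exp(y) + 1) - 1
    return math.log((math.exp(y + x) + 1) / (math.exp(y - x) + 1)) / x - 1

def zeta_of_pi(pi, beta, delta):
    """Inverse map (41): the unique zeta in (-beta, beta) with G(zeta) = pi."""
    return beta * g(beta * delta, math.log(pi / (1 - pi)))

beta, delta = 1.0, 0.7
for zeta in (-0.9, -0.3, 0.0, 0.3, 0.9):
    pi = G(zeta, beta, delta)
    assert abs(zeta_of_pi(pi, beta, delta) - zeta) < 1e-9
    assert -beta < zeta_of_pi(pi, beta, delta) < beta
# zero-drift case uses the limit convention g(0, .)
assert abs(zeta_of_pi(G(0.25, beta, 0.0), beta, 0.0) - 0.25) < 1e-9
```

The check also confirms the sign property used below: $\zeta_{a,b}=0$ gives $G_{a,b}(0)=1/2$ exactly, since $e^{\beta\delta}(1-P_0)/P_0=1$.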
Then the Gibbs binary bijection is a genuine bijection.

Moreover, for each $x\in\mathbb{R}$, the section $g_{x}:\mathbb{R}\to(-1,1)$, $y\mapsto g(x,y)$, is such that $g_{x}(y)\gtrless 0$ if and only if $y\gtrless 0$. Then, given any $\zeta_{a,b}\in(-\beta,\beta)$,
\[
G_{a,b}(\zeta_{a,b})\gtrless \frac{1}{2}
\iff \ln\frac{G_{a,b}(\zeta_{a,b})}{1-G_{a,b}(\zeta_{a,b})}\gtrless 0
\iff \beta g\left(\beta[v(a)-v(b)],\ln\frac{G_{a,b}(\zeta_{a,b})}{1-G_{a,b}(\zeta_{a,b})}\right)\gtrless 0
\iff \zeta_{a,b}\gtrless 0
\]
The proof is concluded by observing that
\[
\left|P_{\zeta}(a,b)-\pi_{\zeta}(a,b)\right|
=\left|\frac{\pi_{\zeta}(a,b)e^{\beta v(a)}}{\pi_{\zeta}(a,b)e^{\beta v(a)}+\pi_{\zeta}(b,a)e^{\beta v(b)}}-\frac{\pi_{\zeta}(a,b)}{\pi_{\zeta}(a,b)+\pi_{\zeta}(b,a)}\right|
=\left|\frac{1}{1+\frac{\pi_{\zeta}(b,a)}{\pi_{\zeta}(a,b)}e^{\beta[v(b)-v(a)]}}-\frac{1}{1+\frac{\pi_{\zeta}(b,a)}{\pi_{\zeta}(a,b)}}\right|
=\left|\frac{1}{1+e^{\beta[v(b)-v(a)]-\gamma}}-\frac{1}{1+e^{-\gamma}}\right|
\]
where $\gamma=\ln\left(\pi_{\zeta}(a,b)/\pi_{\zeta}(b,a)\right)$, and the derivative of $\frac{1}{1+e^{x}}$ has modulus bounded above by $1/4$. ■

Proposition 27
Let $v:A\to\mathbb{R}$ and $\beta>0$. The following conditions are equivalent for a function $\zeta:A^{\neq}\to(-\beta,\beta)$ such that $\zeta_{a,b}=-\zeta_{b,a}$ for all $a\neq b$ in $A$:

1. the incumbent transition matrix $M$ is reversible for every exploration matrix $Q$;

2. $\mathrm{DDM}(v,\beta,\zeta)$ is transitive;

3. there exists $\pi_{\zeta}\in\Delta(A)$ such that, for all $a\neq b$ in $A$,
\[
\pi_{\zeta}(a,b)=\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(a)+\pi_{\zeta}(b)}
\]

In this case:

(i) $\pi_{\zeta}$ is the unique fully supported distribution $\pi$ on $A$ that solves the equation
\[
\frac{P_{\zeta}(a,b)}{P_{\zeta}(b,a)}=e^{\beta[v(a)-v(b)]}\times\frac{\pi(a)}{\pi(b)}
\]
that is,
\[
\frac{\pi(a)}{\pi(b)}=\frac{1-e^{-(\zeta_{a,b}+\beta)[v(a)-v(b)]}}{e^{-\zeta_{a,b}[v(a)-v(b)]}-e^{-\beta[v(a)-v(b)]}}
\tag{42}
\]

(ii) the stationary distribution of $M$ is
\[
m(a)=\frac{\pi_{\zeta}(a)e^{\beta v(a)}}{\sum_{b\in A}\pi_{\zeta}(b)e^{\beta v(b)}}\qquad \forall a\in A
\]

(iii) if $\zeta\equiv 0$, then $\pi_{\zeta}$ is the uniform distribution on $A$ and
\[
m(a)=\frac{e^{\beta v(a)}}{\sum_{b\in A}e^{\beta v(b)}}\qquad \forall a\in A
\]

Moreover, for each fully supported $\pi\in\Delta(A)$, there exists a unique $\zeta:A^{\neq}\to(-\beta,\beta)$ such that $\zeta_{a,b}=-\zeta_{b,a}$ for which $\mathrm{DDM}(v,\beta,\zeta)$ is transitive and such that $\pi_{\zeta}=\pi$.

Summing up, given the neural utility $v$ and the threshold $\beta$, starting-bias specifications $\zeta$ of transitive DDMs bijectively correspond to fully supported initial distributions $\pi$ on $A$. These distributions in turn characterize the (softmax) stationary distribution of the resulting Metropolis-DDM algorithms (irrespective of the exploration parameters $\mu$ and $Q$).

Proof of Proposition 27
Since $\zeta:A^{\neq}\to(-\beta,\beta)$ is such that $\zeta_{a,b}=-\zeta_{b,a}$, then, given any $a\neq b$ in $A$:

• by Fact 1, $P_{\zeta}(a,b)\in(0,1)$;

• by Fact 2, $P_{\zeta}(a,b)=1-P_{\zeta}(b,a)$.

If moreover $\mathrm{DDM}(v,\beta,\zeta)$ is transitive, then
\[
P_{\zeta}(b,a)P_{\zeta}(c,b)P_{\zeta}(a,c)=P_{\zeta}(c,a)P_{\zeta}(b,c)P_{\zeta}(a,b)
\]
for all distinct $a,b,c\in A$. By Luce and Suppes (1965, Theorem 48, p. 350), there exists $\hat{v}:A\to\mathbb{R}$ such that
\[
P_{\zeta}(a,b)=\mathbb{P}\left(\mathrm{CO}_{a,b}=a\right)=\frac{e^{\hat{v}(a)}}{e^{\hat{v}(a)}+e^{\hat{v}(b)}}
\tag{43}
\]
for all $a\neq b$ in $A$. Define
\[
\gamma(a)=\hat{v}(a)-\beta v(a)\qquad \forall a\in A
\]
so that $\hat{v}=\beta v+\gamma$. Notice that $\gamma$ is unique up to location because $\hat{v}$ is.

With this, the explicit form of $M$ is
\[
M(a\mid b)=
\begin{cases}
Q(a\mid b)\dfrac{e^{\hat{v}(a)}}{e^{\hat{v}(a)}+e^{\hat{v}(b)}} & \text{if } a\neq b\\[2ex]
1-\sum_{c\in A\setminus\{b\}}Q(c\mid b)\dfrac{e^{\hat{v}(c)}}{e^{\hat{v}(c)}+e^{\hat{v}(b)}} & \text{if } a=b
\end{cases}
\]
and so $M$ is irreducible for every irreducible $Q$. Moreover, again by the irreducibility of $Q$,
\[
\sum_{c\in A\setminus\{b\}}Q(c\mid b)>0\qquad \forall b\in A
\]
Otherwise it would follow that $Q(b\mid b)=1$ for some $b\in A$, violating irreducibility. But then $M(b\mid b)>0$ for all $b\in A$, which implies aperiodicity of $M$. Thus $M$ admits a unique stationary distribution (see, e.g., Madras, 2002, Theorem 4.2, p. 35).

Next we show that the stationary distribution is
\[
m(a)=\frac{e^{\beta v(a)+\gamma(a)}}{\sum_{b\in A}e^{\beta v(b)+\gamma(b)}}=\frac{e^{\hat{v}(a)}}{\sum_{b\in A}e^{\hat{v}(b)}}\qquad \forall a\in A
\]
Notice that, for all $a\neq b$ in $A$,
\[
M(a\mid b)m(b)=Q(a\mid b)\frac{e^{\hat{v}(a)}}{e^{\hat{v}(a)}+e^{\hat{v}(b)}}\frac{e^{\hat{v}(b)}}{\sum_{x\in A}e^{\hat{v}(x)}}
=\frac{Q(a\mid b)}{\sum_{x\in A}e^{\hat{v}(x)}}\frac{e^{\hat{v}(a)+\hat{v}(b)}}{e^{\hat{v}(a)}+e^{\hat{v}(b)}}
=Q(b\mid a)\frac{e^{\hat{v}(b)}}{e^{\hat{v}(b)}+e^{\hat{v}(a)}}\frac{e^{\hat{v}(a)}}{\sum_{x\in A}e^{\hat{v}(x)}}
=M(b\mid a)m(a)
\]
If $a=b$, then $M(a\mid b)m(b)=M(b\mid a)m(a)$ is obvious, thus
\[
M(a\mid b)m(b)=M(b\mid a)m(a)\qquad \forall a,b\in A
\]
Therefore, $M$ is reversible with respect to $m$, and a fortiori $m$ is stationary for $M$ (see, e.g., Madras, 2002, Proposition 4.4, p. 36).

So far we have proved that point 2 implies point 1 and that, under point 2, irrespective of $\mu$ and $Q$, the stationary distribution of $M$ is
\[
m(a)=\frac{e^{\beta v(a)+\gamma(a)}}{\sum_{b\in A}e^{\beta v(b)+\gamma(b)}}=\frac{e^{\hat{v}(a)}}{\sum_{b\in A}e^{\hat{v}(b)}}\qquad \forall a\in A
\]
Now assume point 1 holds. Taking any $\mu$ and the uniform exploration matrix $\bar{Q}(x\mid y)=1/|A|$ for all $x,y\in A$, we have that the matrix
\[
\bar{M}(a\mid b)=\frac{1}{|A|}P_{\zeta}(a,b)\qquad \forall(a,b)\in A^{\neq}
\]
is reversible with respect to some $\bar{m}\in\Delta(A)$, that is,
\[
\frac{1}{|A|}P_{\zeta}(a,b)\bar{m}(b)=\frac{1}{|A|}P_{\zeta}(b,a)\bar{m}(a)\qquad \forall(a,b)\in A^{\neq}
\]
Since $\bar{M}$ is irreducible and aperiodic, $\bar{m}$ is the unique stationary distribution of $\bar{M}$. If $\bar{m}$ were not fully supported, say $\bar{m}(a^{*})=0$ for some $a^{*}\in A$, then it would follow that
\[
\bar{m}(b)=\frac{P_{\zeta}(b,a^{*})}{P_{\zeta}(a^{*},b)}\bar{m}(a^{*})=0\qquad \forall b\neq a^{*}
\]
which is absurd.
Then, for all $a\neq b$ in $A$,
\[
\frac{P_{\zeta}(a,b)}{P_{\zeta}(b,a)}=\frac{\bar{m}(a)}{\bar{m}(b)}
\]
Set
\[
\pi_{\zeta}(a)=\frac{\bar{m}(a)e^{-\beta v(a)}}{\sum_{b\in A}\bar{m}(b)e^{-\beta v(b)}}\qquad \forall a\in A
\]
This distribution is fully supported on $A$ and, for all $a\neq b$ in $A$,
\[
\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(b)}=\frac{\bar{m}(a)e^{-\beta v(a)}}{\bar{m}(b)e^{-\beta v(b)}}=\frac{e^{-\beta v(a)}P_{\zeta}(a,b)}{e^{-\beta v(b)}P_{\zeta}(b,a)}=\frac{\pi_{\zeta}(a,b)}{\pi_{\zeta}(b,a)}
\]
Since $\pi_{\zeta}(a,b)+\pi_{\zeta}(b,a)=1$ for all $a\neq b$, it follows that
\[
\pi_{\zeta}(a,b)=\frac{1}{1+\frac{\pi_{\zeta}(b,a)}{\pi_{\zeta}(a,b)}}=\frac{1}{1+\frac{\pi_{\zeta}(b)}{\pi_{\zeta}(a)}}=\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(a)+\pi_{\zeta}(b)}
\]
So far we have proved that point 1 implies point 3 and that, under point 1, irrespective of $\mu$ and with a uniform $Q$, there exists a fully supported $\pi_{\zeta}\in\Delta(A)$ such that:

• the stationary distribution of $\bar{M}$ (where the bar recalls that $Q$ is uniform) is given, for all $a\in A$, by
\[
\bar{m}(a)=\frac{\bar{m}(a)}{\sum_{b\in A}\bar{m}(b)}=\frac{1}{\sum_{b\in A}\frac{\bar{m}(b)}{\bar{m}(a)}}=\frac{1}{\sum_{b\in A}\frac{\bar{m}(b)e^{-\beta v(b)}e^{\beta v(b)}}{\bar{m}(a)e^{-\beta v(a)}e^{\beta v(a)}}}=\frac{1}{\sum_{b\in A}\frac{\pi_{\zeta}(b)e^{\beta v(b)}}{\pi_{\zeta}(a)e^{\beta v(a)}}}=\frac{\pi_{\zeta}(a)e^{\beta v(a)}}{\sum_{b\in A}\pi_{\zeta}(b)e^{\beta v(b)}}
\]

• for all $a\neq b$ in $A$,
\[
\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(b)}=\frac{\pi_{\zeta}(a,b)}{\pi_{\zeta}(b,a)}
\]

Now assume point 3 holds; then there exists $\pi_{\zeta}\in\Delta(A)$ such that, for all $a\neq b$ in $A$,
\[
\pi_{\zeta}(a,b)=\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(a)+\pi_{\zeta}(b)}
\]
In particular, $\pi_{\zeta}$ is fully supported (because $\pi_{\zeta}(a,b)\in(0,1)$ for all $a\neq b$ in $A$), and for all distinct alternatives $a,b,c\in A$,
\[
\frac{P_{\zeta}(b,a)P_{\zeta}(c,b)P_{\zeta}(a,c)}{P_{\zeta}(c,a)P_{\zeta}(b,c)P_{\zeta}(a,b)}
=\frac{P_{\zeta}(b,a)}{P_{\zeta}(a,b)}\frac{P_{\zeta}(c,b)}{P_{\zeta}(b,c)}\frac{P_{\zeta}(a,c)}{P_{\zeta}(c,a)}
=\frac{e^{\beta v(b)}}{e^{\beta v(a)}}\frac{\pi_{\zeta}(b,a)}{\pi_{\zeta}(a,b)}\cdot\frac{e^{\beta v(c)}}{e^{\beta v(b)}}\frac{\pi_{\zeta}(c,b)}{\pi_{\zeta}(b,c)}\cdot\frac{e^{\beta v(a)}}{e^{\beta v(c)}}\frac{\pi_{\zeta}(a,c)}{\pi_{\zeta}(c,a)}
=\frac{\pi_{\zeta}(b,a)}{\pi_{\zeta}(a,b)}\frac{\pi_{\zeta}(c,b)}{\pi_{\zeta}(b,c)}\frac{\pi_{\zeta}(a,c)}{\pi_{\zeta}(c,a)}
=\frac{\pi_{\zeta}(b)}{\pi_{\zeta}(a)}\frac{\pi_{\zeta}(c)}{\pi_{\zeta}(b)}\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(c)}=1
\]
that is, $\mathrm{DDM}(v,\beta,\zeta)$ is transitive.

So far we have shown the equivalence of points 1, 2, and 3. Moreover, if 1 holds and we denote by $\bar{m}$ the stationary distribution of $\bar{M}$ (which is unique because $\bar{M}$ is aperiodic and irreducible), then $\bar{m}$ is fully supported and
\[
\pi_{\zeta}(a)=\frac{\bar{m}(a)e^{-\beta v(a)}}{\sum_{b\in A}\bar{m}(b)e^{-\beta v(b)}}
\]
is such that
\[
\bar{m}(a)=\frac{\pi_{\zeta}(a)e^{\beta v(a)}}{\sum_{b\in A}\pi_{\zeta}(b)e^{\beta v(b)}}
\tag{44}
\]
but since point 2 also holds, all stationary distributions of all $M=M(\mu,Q)$ coincide (irrespective of $\mu$ and $Q$) and are given by (44), so (ii) holds.
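The reversibility computation can also be checked numerically. The sketch below is purely illustrative (the utilities, biases, and exploration matrix are hypothetical values, not from the paper): it builds the incumbent matrix $M$ from a symmetric exploration matrix $Q$ and the logit acceptance rule with neural utility $\hat v=\beta v+\gamma$, and verifies detailed balance with respect to $m(a)\propto\pi(a)e^{\beta v(a)}$, where $\pi(a)\propto e^{\gamma(a)}$.

```python
import math
import itertools

A = [0, 1, 2]
beta = 1.2
v = [0.0, 0.5, -0.3]       # hypothetical utilities
gamma = [0.0, -0.4, 0.7]   # hypothetical bias term (unique up to location)
vhat = [beta * v[a] + gamma[a] for a in A]

# off-diagonal entries of a symmetric exploration matrix Q
# (the diagonal mass of Q is implicit; only off-diagonals enter M off-diagonally)
Q = [[0.0, 0.3, 0.5],
     [0.3, 0.0, 0.2],
     [0.5, 0.2, 0.0]]

def accept(a, b):
    """Logit probability that challenger a displaces incumbent b."""
    return math.exp(vhat[a]) / (math.exp(vhat[a]) + math.exp(vhat[b]))

# incumbent transition matrix: M[a][b] = probability of moving b -> a
M = [[Q[a][b] * accept(a, b) if a != b else 0.0 for b in A] for a in A]
for b in A:
    M[b][b] = 1.0 - sum(M[a][b] for a in A if a != b)

# candidate stationary distribution m(a) ∝ pi(a) e^{beta v(a)} with pi(a) ∝ e^{gamma(a)}
pi = [math.exp(gam) for gam in gamma]
m = [pi[a] * math.exp(beta * v[a]) for a in A]
Z = sum(m)
m = [x / Z for x in m]

# detailed balance: M(a|b) m(b) = M(b|a) m(a)
for a, b in itertools.combinations(A, 2):
    assert abs(M[a][b] * m[b] - M[b][a] * m[a]) < 1e-12

# hence m is stationary: sum_b M(a|b) m(b) = m(a)
for a in A:
    assert abs(sum(M[a][b] * m[b] for b in A) - m[a]) < 1e-12
```

The symmetry of $Q$ is what makes the Metropolis-style cancellation in the detailed-balance step go through, exactly as in the chain of equalities above.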
Moreover, for all $a\neq b$ in $A$,
\[
\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(b)}=\frac{\pi_{\zeta}(a,b)}{\pi_{\zeta}(b,a)}=e^{-\beta[v(a)-v(b)]}\frac{P_{\zeta}(a,b)}{P_{\zeta}(b,a)}
\]
and so $\pi_{\zeta}$ is a fully supported probability $\pi$ on $A$ that solves the equation
\[
\frac{P_{\zeta}(a,b)}{P_{\zeta}(b,a)}=e^{\beta[v(a)-v(b)]}\times\frac{\pi(a)}{\pi(b)}
\]
for all $a\neq b$ in $A$. At the same time, such a solution is unique because fully supported probability measures on $A$ are uniquely determined by their odds. Thus (i) holds.

Moreover, if $\zeta\equiv 0$, then
\[
\frac{P_{\zeta}(a,b)}{P_{\zeta}(b,a)}=e^{\beta[v(a)-v(b)]}
\]
thus $\mathrm{DDM}(v,\beta,0)$ is transitive and $\pi_{0}$ is uniform. This proves the first part of (iii). To prove the second part, we will show that, for each fully supported $\phi$ in $\Delta(A)$, there exists one and only one $\zeta:A^{\neq}\to(-\beta,\beta)$ such that $\zeta_{a,b}=-\zeta_{b,a}$ for which $\mathrm{DDM}(v,\beta,\zeta)$ is transitive and such that $\pi_{\zeta}=\phi$.

Assume that $\pi_{\zeta}=\pi_{\xi}=\pi$. Then, for all $a\neq b$ in $A$,
\[
\frac{P_{\zeta}(a,b)}{P_{\zeta}(b,a)}=e^{\beta[v(a)-v(b)]}\times\frac{\pi_{\zeta}(a)}{\pi_{\zeta}(b)}=e^{\beta[v(a)-v(b)]}\times\frac{\pi_{\xi}(a)}{\pi_{\xi}(b)}=\frac{P_{\xi}(a,b)}{P_{\xi}(b,a)}
\]
and by (39)
\[
\zeta(a,b)=-\beta+\frac{1}{v(a)-v(b)}\ln\frac{\frac{\pi(a)}{\pi(b)}e^{\beta[v(a)-v(b)]}+1}{\frac{\pi(a)}{\pi(b)}e^{-\beta[v(a)-v(b)]}+1}=\xi(a,b)
\]
so that $\zeta=\xi$. This proves that the map $\zeta\mapsto\pi_{\zeta}$, from the set of all $\zeta:A^{\neq}\to(-\beta,\beta)$ such that $\zeta_{a,b}=-\zeta_{b,a}$ for which $\mathrm{DDM}(v,\beta,\zeta)$ is transitive to the set of all fully supported probabilities on $A$, is injective.

As to surjectivity, set
\[
\zeta(a,b)=-\beta+\frac{1}{v(a)-v(b)}\ln\frac{\frac{\pi(a)}{\pi(b)}e^{\beta[v(a)-v(b)]}+1}{\frac{\pi(a)}{\pi(b)}e^{-\beta[v(a)-v(b)]}+1}\qquad \text{if } v(a)\neq v(b)
\]
and $\zeta(a,b)=-\beta+2\beta\,\pi(a)/\left(\pi(a)+\pi(b)\right)$ otherwise. Tedious verification shows that $\zeta:A^{\neq}\to(-\beta,\beta)$ is such that $\zeta_{a,b}=-\zeta_{b,a}$ and
\[
\frac{P_{\zeta}(a,b)}{P_{\zeta}(b,a)}=e^{\beta[v(a)-v(b)]}\times\frac{\pi(a)}{\pi(b)}
\]
for all $a\neq b$ in $A$, so that $\mathrm{DDM}(v,\beta,\zeta)$ is transitive and $\pi_{\zeta}=\pi$, as wanted. ■

References
Aczel, J. (1966). Lectures on functional equations and their applications. Academic Press.
Agranov, M., Caplin, A., and Tergiman, C. (2015). Naive play and the process of choice in guessing games. Journal of the Economic Science Association, 1, 146-157.
Alos-Ferrer, C., Fehr, E., and Netzer, N. (2018). Time will tell: recovering preferences when choices are noisy. Mimeo.
ALQahtani, D. A., Rotgans, J. I., Mamede, S., ALAlwan, I., Magzoub, M. E. M., Altayeb, F. M., Mohamedani, M. A., and Schmidt, H. G. (2016). Does time pressure have a negative effect on diagnostic accuracy? Academic Medicine, 91, 710-716.
Anderson, S. P., Goeree, J. K., and Holt, C. A. (2004). Noisy directional learning and the logit equilibrium. The Scandinavian Journal of Economics, 106, 581-602.
Ariely, D., and Wertenbroch, K. (2002). Procrastination, deadlines, and performance: self-control by precommitment. Psychological Science, 13, 219-224.
Arrow, K. J. (1959). Rational choice functions and orderings. Economica, 26, 121-127.
Baldassi, C., Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., and Pirazzini, M. (2019). A behavioral characterization of the Drift Diffusion Model and its multi-alternative extension for choice under time pressure. Management Science, forthcoming.
Baldassi, C., Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., and Pirazzini, M. (2019). Time stopped MCMC algorithms. Mimeo.
Ben-Akiva, M. E., and Lerman, S. R. (1985). Discrete choice analysis: theory and application to travel demand. MIT Press.
Ben-Akiva, M., and Morikawa, T. (1990). Estimation of switching models from revealed preferences and stated intentions. Transportation Research, 24A, 485-495.
Ben-Akiva, M., and Morikawa, T. (1991). Estimation of travel demand models from multiple data sources. In Koshi, M. (Ed.). Transportation and Traffic Theory, Proceedings of the 11th ISTTT (pp. 461-476). Elsevier.
Bhat, C. R. (1995). A heteroscedastic extreme value model of intercity travel mode choice. Transportation Research, 29B, 471-483.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., and Cohen, J. D. (2006). The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113, 700-765.
Bogacz, R., Usher, M., Zhang, J., and McClelland, J. L. (2007). Extending a biologically inspired model of choice: multi-alternatives, nonlinearity and value-based multidimensional choice. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 362, 1655-1670.
Bornstein, A. M., Khaw, M. W., Shohamy, D., and Daw, N. D. (2017). Reminders of past choices bias decisions for reward in humans. Nature Communications, 8, 15958.
Caplin, A., and Dean, M. (2014). Revealed preference, rational inattention, and costly information acquisition. NBER Working Paper 19876.
Caplin, A., Dean, M., and Leahy, J. (2019). Rational inattention, optimal consideration sets, and stochastic choice. The Review of Economic Studies, 86, 1061-1094.
Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., and Rustichini, A. (2016). Law of demand and forced choice. IGIER Working Paper 593.
Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., and Rustichini, A. (2018). Law of demand and stochastic choice. Mimeo.
Chen, C., Chorus, C., Molin, E., and van Wee, B. (2016). Effects of task complexity and time pressure on activity-travel choices: heteroscedastic logit model and activity-travel simulator experiment. Transportation, 43, 455-472.
Chiong, K., Shum, M., Webb, R., and Chen, R. (2019). Combining choices and response times in the field: a drift-diffusion model of mobile advertisement. Mimeo.
Clithero, J. A. (2018). Improving out-of-sample predictions using response times and a model of the decision process. Journal of Economic Behavior & Organization, 148, 344-375.
Davidson, D., and Marschak, J. (1959). Experimental tests of a stochastic decision theory. In Measurement: Definitions and Theories (C. W. Churchman, ed.). Wiley.
Dean, M., and Neligh, N. L. (2017). Experimental tests of rational inattention. Mimeo.
Dewan, A., and Neligh, N. (2020). Estimating information cost functions in models of rational inattention. Journal of Economic Theory, 187, 105011.
Debreu, G. (1954). Representation of a preference ordering by a numerical function. In Decision Processes (R. M. Thrall, C. H. Coombs, and R. L. Davis, eds.). Wiley.
Debreu, G. (1958). Stochastic choice and cardinal utility. Econometrica, 26, 440-444.
Debreu, G. (1964). Continuity properties of Paretian utility. International Economic Review, 5, 285-293.
de Palma, A., Fosgerau, M., Melo, E., and Shum, M. (2019). Discrete choice and rational inattention: a general equivalence result. Mimeo.
Ditterich, J. (2010). A comparison between mechanisms of multi-alternative perceptual decision making: ability to explain human behavior, predictions for neurophysiology, and relationship with decision theory. Frontiers in Neuroscience, 4.
Dupuis, P., and Ellis, R. E. (1997). A weak convergence approach to the theory of large deviations. Wiley.
Echenique, F., and Saito, K. (2018). General Luce model. Economic Theory, forthcoming.
Fehr, E., and Rangel, A. (2011). Neuroeconomic foundations of economic choice: recent advances. The Journal of Economic Perspectives, 25, 3-30.
Frick, M., Iijima, R., and Strzalecki, T. (2017). Dynamic random utility. Econometrica, 87, 1941-2002.
Fudenberg, D., Newey, W. K., Strack, P., and Strzalecki, T. (2019). Testing the Drift-Diffusion Model. Mimeo.
Fudenberg, D., Strack, P., and Strzalecki, T. (2018). Speed, accuracy, and the optimal timing of choices. American Economic Review, 108, 3651-3684.
Fudenberg, D., and Strzalecki, T. (2015). Dynamic logit with choice aversion. Econometrica, 83, 651-691.
Gabaix, X., Laibson, D., Moloche, G., and Weinberg, S. (2006). Costly information acquisition: experimental analysis of a boundedly rational model. American Economic Review, 96, 1043-1068.
Geyer, C. (2011). Introduction to Markov Chain Monte Carlo. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones, and X. Meng, eds.). CRC Press.
Georgescu-Roegen, N. (1936). The pure theory of consumers behavior. Quarterly Journal of Economics, 50, 545-593.
Georgescu-Roegen, N. (1958). Threshold in choice and the theory of demand. Econometrica, 26, 157-168.
Gold, J. I., and Shadlen, M. N. (2002). Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron, 36, 299-308.
Gold, J. I., and Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535-574.
Goeree, J. K., Holt, C. A., and Palfrey, T. R. (2016). Quantal response equilibrium: a stochastic theory of games. Princeton University Press.
Hanks, T. D., Mazurek, M. E., Kiani, R., Hopp, E., and Shadlen, M. N. (2011). Elapsed decision time affects the weighting of prior probability in a perceptual decision task. Journal of Neuroscience, 31, 6339-6352.
Hardy, G. H., Littlewood, J. E., and Polya, G. (1934). Inequalities. Cambridge University Press.
Hensher, D. A., and Bradley, M. (1993). Using stated response choice data to enrich revealed preference discrete choice models. Marketing Letters, 4, 139-151.
Hensher, D., Louviere, J., and Swait, J. (1999). Combining sources of preference data. Journal of Econometrics, 89, 197-221.
Huseynov, S., Krajbich, I., and Palma, M. A. (2018). No time to think: food decision-making under time pressure. Mimeo.
Karsilar, H., Simen, P., Papadakis, S., and Balci, F. (2014). Speed accuracy trade-off under response deadlines. Frontiers in Neuroscience, 8, Article 248.
Kaufman, E. L., Lord, M. W., Reese, T. W., and Volkmann, J. (1949). The discrimination of visual number. American Journal of Psychology, 62, 498-525.
Kelly, F. P. (2011). Reversibility and stochastic networks. Cambridge University Press.
Kolmogorov, A. (1936). Zur Theorie der Markoffschen Ketten. Mathematische Annalen, 112, 155-160.
Krajbich, I., Armel, C., and Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13, 1292-1298.
Krajbich, I., and Rangel, A. (2011). Multialternative drift-diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proceedings of the National Academy of Sciences, 108, 13852-13857.
Krajbich, I., Lu, D., Camerer, C., and Rangel, A. (2012). The attentional drift-diffusion model extends to simple purchasing decisions. Frontiers in Psychology, 3, Article 193.
Kreps, D. M. (1988). Notes on the theory of choice. Westview.
Louviere, J. J., Hensher, D. A., and Swait, J. D. (2000). Stated choice methods: analysis and applications. Cambridge University Press.
Lu, J. (2016). Random choice and private information. Econometrica, 84, 1983-2027.
Luce, R. D. (1957). A theory of individual choice behavior. Mimeo.
Luce, R. D. (1959). Individual choice behavior: a theoretical analysis. Wiley.
Luce, R. D., and Suppes, P. (1965). Preference, utility and subjective probability. In Luce, R. D., Bush, R. R., and Galanter, E. (Eds.). Handbook of Mathematical Psychology, vol. 3 (pp. 249-410). Wiley.
Luce, R. D., and Suppes, P. (2002). Representational measurement theory. In Yantis, S. (Ed.). Stevens' Handbook of Experimental Psychology (pp. 1-41). Wiley.
Luck, S. J., and Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279-281.
Madras, N. N. (2002). Lectures on Monte Carlo Methods. American Mathematical Society.
Matejka, F., and McKay, A. (2015). Rational inattention to discrete choices: a new foundation for the multinomial logit model. American Economic Review, 105, 272-298.
McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. In Zarembka, P. (Ed.). Frontiers in Econometrics (pp. 105-142). Academic Press.
McKelvey, R. D., and Palfrey, T. R. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10, 6-38.
McMillen, T., and Holmes, P. (2006). The dynamics of choice among multiple alternatives. Journal of Mathematical Psychology, 50, 30-57.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087-1092.
Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., and Rangel, A. (2010). The drift diffusion model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgment and Decision Making, 5, 437-449.
Mosteller, F., and Nogee, P. (1951). An experimental measurement of utility. Journal of Political Economy, 59, 371-404.
Mulder, M. J., Wagenmakers, E. J., Ratcliff, R., Boekel, W., and Forstmann, B. U. (2012). Bias in the brain: a diffusion model analysis of prior probability and potential payoff. Journal of Neuroscience, 32, 2335-2343.
Natenzon, P. (2019). Random choice and learning. Journal of Political Economy, 127, 419-457.
Ortega, P., and Stocker, A. A. (2016). Human decision-making under limited time. Proceedings of the NIPS 2016 Conference. MIT Press.
Papandreou, A. G. (1953). An experimental test of an axiom in the theory of choice. Econometrica, 21, 477.
Papandreou, A. G., Sauerlender, O. H., Brownlee, O. H., Hurwicz, L., and Franklin, W. (1957). A test of a stochastic theory of choice. University of California Publications in Economics, 16, 1-18.
Pieters, R., and Warlop, L. (1999). Visual attention during brand choice: the impact of time pressure and task motivation. International Journal of Research in Marketing, 16, 1-16.
Pinsky, M., and Karlin, S. (2011). An introduction to stochastic modeling. Academic Press.
Plott, C. R. (1996). Rational individual behavior in markets and social choice processes: the discovered preference hypothesis. In Arrow, K., Colombatto, E., Perlaman, M., and Schmidt, C. (Eds.). The Rational Foundations of Economic Behavior (pp. 225-250). Macmillan.
Proto, E., Rustichini, A., and Sofianos, A. (2018). Intelligence, personality and gains from cooperation in repeated interactions. Journal of Political Economy, forthcoming.
Quandt, R. E. (1956). A probabilistic theory of consumer behavior. Quarterly Journal of Economics, 70, 507-536.
Rangel, A., and Clithero, J. A. (2014). The computation of stimulus values in simple choice. In Glimcher, P. W., and Fehr, E. (Eds.). Neuroeconomics (pp. 125-148). Academic Press.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Neyman, J. (Ed.). Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 4 (pp. 321-333). University of California Press.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research. Expanded edition (1980), The University of Chicago Press.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108.
Ratcliff, R., Smith, P. L., Brown, S. D., and McKoon, G. (2016). Diffusion decision model: current issues and history. Trends in Cognitive Sciences, 20, 260-281.
Renyi, A. (1955). On a new axiomatic theory of probability. Acta Mathematica Hungarica, 6, 285-335.
Reutskaja, E., Nagel, R., Camerer, C. F., and Rangel, A. (2011). Search dynamics in consumer choice under time pressure: an eye-tracking study.
American Economic Review , 101, 900-926.Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision (cid:133)eld theory: Adynamic connectionst model of decision making.
Psychological Review , 108, 370.Russo, J. E., and Rosen, L. D. (1975). An eye (cid:133)xation analysis of multialternative choice.
Memory &Cognition , 3, 267-276.Rustichini, A., and Padoa-Schioppa, C. (2015). A neuro-computational model of economic decisions.
Journal of Neurophysiology , 114, 1382-1398.Rustichini, A., Conen, K.E., Cai, X., Padoa-Schioppa, C. (2017). Optimal coding and neuronal adaptationin economic decisions.
Nature Communications , 8, 1208.Saito, K. (2017). Axiomatizations of the Mixed Logit Model. Mimeo.Saltzman, I.J., and Garner, W.R. (1948). Reaction time as a measure of span of attention.
Journal ofPsychology , 25, 227-241.Shadlen, M. N., and Shohamy, D. (2016). Decision making and sequential sampling from memory.
Neuron ,90, 927-939.Shapley, L. S. (1975). Cardinal utility from intensity comparisons. RAND Working Paper R-1683-PR.Steiner, J., Stewart, C., and Matejka, F. (2017). Rational inattention dynamics: inertia and delay indecision-making.
Econometrica , 85, 521-553.Suppes, P., and Winet, M. (1955). An axiomatization of utility based on the notion of utility di⁄erences.
Management science , 1, 259-270.Swait, J., and Louviere, J. (1993). The role of the scale parameter in the estimation and comparison ofmultinomial logit models.
Journal of Marketing Research , 30, 305-314.69rain, K. E. (2009).
Discrete choice methods with simulation . Cambridge University Press.Vogel, E. K., and Machizawa, M. G. (2004). Neural activity predicts individual di⁄erences in visualworking memory capacity.
Nature , 428, 748-751.Wakker, P. P. (1989).
Additive representations of preferences: A new foundation of decision analysis .Springer.Webb, R. (2019). The (neural) dynamics of stochastic choice.
Management Science , 65, 230-255.Woodford, M. (2014). Stochastic choice: an optimizing neuroeconomic model.
American Economic Re-view , 104, 495-500.Zhang, T. (2006a). From " -entropy to KL-entropy: Analysis of minimum information complexity densityestimation. Annals of Statistics , 34, 2180-2210.Zhang, T. (2006b). Information-theoretic upper and lower bounds for statistical estimation.