Entropic Decision Making

Abstract

Using results from neurobiology on perceptual decision making and value-based decision making, the problem of decision making between lotteries is reformulated in an abstract space where uncertain prospects are mapped to corresponding active neuronal representations. This mapping allows us to maximize non-extensive entropy in the new space with some constraints instead of a utility function. To achieve good agreement with behavioral data, the constraints must include at least constraints on the weighted average of the stimulus and on its variance. Both constraints are supported by the adaptability of neuronal responses to an external stimulus. By analogy with thermodynamic and information engines, we discuss the dynamics of choice between two lotteries as they are being processed simultaneously in the brain by rate equations that describe the transfer of attention between lotteries and within the various prospects of each lottery. This model is able to give new insights into risk aversion and into behavioral anomalies not accounted for by Prospect Theory.



Adnan Rebei

University of Illinois at Urbana-Champaign, Champaign, USA

Keywords: Allais paradox, stochastic dominance, Maxwell's demon, nonextensive entropy, variance, neurobiology, Birnbaum paradoxes.

arXiv: [q-bio.NC]

Introduction

The underlying neuronal mechanisms of decision-making in the brain are still largely unknown, especially in complex tasks where uncertainty in the choices is important in the decision process. However, at the behavioral level, risky decision-making models are numerous and vary in methods of treatment with little attention to the neuronal basis of the process. This work is an attempt to build a model of risky decision-making based on behavioral results that also accounts for some of the features we learned about the brain network during decision tasks.

The Allais paradox [3] has been a driving force behind the development of utility-based theories for the last seventy years or so. The paradox was a concrete example that human behavior is incompatible with Expected Utility Theory (EUT) of von Neumann and Morgenstern [120]. In spite of its simplicity, EUT has a concave utility function that successfully accounts for subjective values of goods and is consistent with the law of diminishing marginal utility as opposed to the naive linear expected value theory (EVT). To study uncertainties in economic decisions, lotteries became a simple testing ground for theories of choice. In EUT, the prospect of a lottery is weighted by a linear probability which in a choice task between lotteries leads to common outcomes being ignored. This property is called independence of outcomes. The Allais paradox showed that the independence property of EUT is violated by most people.

Among the many extensions of EUT, Prospect Theory (PT), which is also an expectation-based model similar to EUT [51, 114], is by far the most popular theory of decision making under risk. PT extended EUT by introducing subjective probabilities and accounting for loss aversion. The subjective probabilities are nonlinear functions of the probabilities of the prospects that give rise to inverse-S shapes to reflect non-


[...] $x_1, x_2, \ldots, x_n$ and corresponding probabilities $p_1, p_2, \ldots, p_n$ are represented by $(x_1, p_1; x_2, p_2; \ldots; x_n, p_n)$, then given two lotteries $A^+ = (\$100, 1)$ and $B^+ = (\$500, 0.2; \$0, 0.8)$, most people choose lottery $A^+$ rather than lottery $B^+$, where lottery $A^+$ has zero variance. However, when it comes to choosing between lotteries $A^- = (-\$100, 1)$ and $B^- = (-\$500, 0.2; \$0, 0.8)$, most people choose lottery $B^-$, which has a larger variance. This asymmetry between positive and negative outcomes, called the reflection effect in PT, was not exactly what Allais had anticipated in his original work. Because PT can explain this behavior of people avoiding sure loss, variance has not been explicitly incorporated in utility theories.

One of the main objectives of this paper is to include the variance of the distribution of outcomes explicitly in the analysis of risky decision making. Presently, computational models of decision making include some kind of variance in the analysis. This variance is modeled as some additive noise that is added to each outcome. The noise is independently distributed across outcomes, and its strength is assigned ad hoc. Theories that fall in this latter category include Stochastic Expected Utility, e.g., Blavatskyy [16], and drift-diffusion-based models [91, 50, 19, 94].

The first attempt to use entropy as a measure of spread of outcomes in a lottery [...] $(x_1, p_1; x_2, p_2; \ldots; x_n, p_n)$ with utilities $u(x_i)$, $i = 1, 2, \ldots, n$ and entropy $H(p_1, p_2, \ldots, p_n)$ [105]

$$U(x, p) = \sum_{i=1}^{n} p_i\, u(x_i) + \beta H(p_1, p_2, \ldots, p_n), \qquad (1)$$

where $\beta$ is a free parameter that can be either positive or negative. Luce et al. [65] and Ng, Luce and Marley [75] developed this idea further axiomatically. In this paper, our goal is instead to develop Meginniss' idea in a more physical way.

In his thesis, Meginniss showed that his proposed utility function satisfies transitivity, irrelevance of impossible outcomes, and more importantly, that lotteries with equal probability weights are preferred to lotteries with different probability weights when prospects are the same. The extra entropy term allowed him to predict that risk-taking behavior should be observable in situations that involve large gains but small probabilities or a small loss with a large probability, as is the case when buying a lottery ticket.
Similarly, risk-taking behavior can be observed in events with small gains and large probabilities or large losses with small probabilities, as in the case of crossing the street to get a coffee. Another advantage of including an entropic term in the utility function is that he was able to show that buying insurance and gambling behavior are both still explainable with a monotonic concave wealth function, as opposed to the idea that concavity is not a valid requirement in this situation [32].

To explain the Allais paradox, Meginniss needed a negative scaling parameter $\beta$ in Eq. 1. This signals that in the Allais lottery, the decision maker (DM) is entropy averse. A negative $\beta$ was also needed to get a similar efficient frontier as in the modern portfolio theory of Markowitz [67]. In this paper, the parameter $\beta$ is fixed by the distribution of prospects independent of other lotteries. However, this is achieved only if variance of [...]


[...],000 other neurons, the neurons representing prospects 1 and 2 can be considered in a quasi-equilibrium state independent of those representing the other prospect(s). Since we are mainly interested in value-based decision making behavior, we will assume that value is expressed in terms of the intensity of firings or interactions in the neurons that are encoding a prospect. This is a generally valid assumption used in popular models of neuronal dynamics. In Wang's model of motion perception of parietal neurons, higher firing rate corresponds to higher value [121, 124, 122]. In the Good-based model of [80], higher firing rates are also correlated with higher values. The neuronal representation of the prospects that we adopt in this paper was motivated by measurements of Padoa-Schioppa's group [81, 84, 82].

Figure 1. Neural representation of a two-choice task. Each choice is initially represented by a group of neurons independently of the others. During the decision phase, both sets of neurons become coupled. Both sets of neurons directly involved in the decision state reside in the cortex.


One important finding in neuroeconomics is that subjective values are represented explicitly at the neuronal level [37]. In a decision making task, different cells encode different variables. Padoa-Schioppa [80] reported that the cells that encode values and choices are found in the orbitofrontal cortex (OFC) region of the brain. However, background noise in the OFC is significant and may lead to fluctuations in the encoded variables such as the probabilities of the lotteries. In this section, we will consider only this effect in a very simplified model of neuronal excitations.

The case of equal expected values and equal entropies provides a stringent test of the ideas proposed by Meginniss [70]. To be concrete, let's consider the following case inspired by the work of Lichtenstein and Slovic [63] on bets with high probability values and low utility values of outcomes and vice versa. As an example, let's consider the gambles $A = (\$10, 0.90; \$0, 0.10)$ and $B = (\$90, 0.10; \$0, 0.90)$. These gambles have $EV(A) = EV(B) = \$9$, where $EV$ stands for expected value, and $H(A) = H(B) = 0.[...]$. Setting $u(x_i) = x_i$ in Eq. 1, we have the utility of both gambles $U = 9 + 0.[...]\,\beta$. Therefore, in this case there is no $\beta$ that can give a preference [...] $A$ to gamble $B$. Cumulative Prospect Theory [114] is also of no help in this pair of gambles unless the probability weight function is allowed to be 'S' shaped, i.e., $\gamma > 1$. With $\gamma = 0.61$ for their original probability weight functions [114], we find that $U(A) = \$7.12$ and $U(B) = \$16.[...]$, so $B$ is preferred to $A$. For the TAX model of Birnbaum [9], we find that $U(A) = \$5.49$ and $U(B) = \$10.6$ (see the footnote below). Therefore, similar to PT, the TAX model predicts that people should choose $B$ over $A$.

To get around these difficulties, we instead adopt a more microscopic (biophysically-inspired) picture of the process of decision making. As we noted in the previous section, probabilities are positively correlated with firing rates between neurons. In a simple perceptual decision task that consists of identifying the direction of motion of a dot on a screen, neurons in the middle temporal (MT) area of the brain, which are sensitive only to motion directions, start firing with higher frequency consistent with the behavioral response. Moreover, fluctuations in the firings are also reflected in fluctuations in the decisions [18, 76]. Adopting this picture, we can easily see that because of the noisy environment in the brain, there will be a distribution of firing rates representing the probability distribution of outcomes themselves. In other words, the probabilities of outcomes represent only an average of a distribution with some variance. These fluctuations in the probabilities may become important in a pair of gambles like the one we are considering. Of course, we may also expect fluctuations in the representations of the values of the outcomes. But for the moment, we will confine ourselves to trying to account for the fluctuations around the probability values. In principle, a microscopic treatment should be able to provide a [...]

Footnote: These numbers for the TAX model were calculated using the online calculator available at Birnbaum's site: http://psych.fullerton.edu/mbirnbaum/calculators/taxcalculator.htm
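The comparison of gambles A and B above can be checked numerically. The sketch below is illustrative only and rests on several assumptions that are not stated in the text: entropy is measured in nats, the utility is linear ($u(x) = x$), and the probability weight is the one-parameter Tversky-Kahneman form $w(p) = p^\gamma / (p^\gamma + (1-p)^\gamma)^{1/\gamma}$. All function names are hypothetical. Under these assumptions the expected values and entropies of A and B coincide, so Eq. 1 cannot separate them for any $\beta$, and the weighted value of A with $\gamma = 0.61$ comes out at about 7.12, matching the figure quoted above.

```python
import math

def expected_value(gamble):
    """Expected value of a gamble given as (outcome, probability) pairs."""
    return sum(x * p for x, p in gamble)

def shannon_entropy(gamble):
    """Shannon entropy in nats (assumed unit); p = 0 branches are skipped."""
    return -sum(p * math.log(p) for _, p in gamble if p > 0)

def meginniss_utility(gamble, beta):
    """Eq. 1 with linear utility u(x) = x: expected value + beta * entropy."""
    return expected_value(gamble) + beta * shannon_entropy(gamble)

def tk_weight(p, gamma=0.61):
    """One-parameter inverse-S probability weight (assumed functional form)."""
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def cpt_value(x, p, gamma=0.61):
    """Weighted value of (x, p; 0, 1-p) for a gain x, with linear value."""
    return tk_weight(p, gamma) * x

A = [(10.0, 0.90), (0.0, 0.10)]
B = [(90.0, 0.10), (0.0, 0.90)]

for name, gamble in (("A", A), ("B", B)):
    x, p = gamble[0]
    print(name,
          "EV =", round(expected_value(gamble), 3),
          "H =", round(shannon_entropy(gamble), 3),
          "weighted value =", round(cpt_value(x, p), 2))
```

Because both the expected value and the entropy agree, `meginniss_utility(A, beta)` and `meginniss_utility(B, beta)` are equal for every `beta`, which is the degeneracy the paragraph describes.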

[...] ATP molecules to transmit a bit at a chemical synapse and 10[...] ATP molecules for graded signals in an interneuron, photoreceptor, or for spike coding [60]. In a study on photoreceptors in flies, it was observed that higher light intensities require much more energy to process. Therefore, energy considerations can lead to stricter constraints than those due to information processing [77]. Other studies also showed that taking into account energy considerations in addition to neural coding efficiency changes the understanding of what is being optimized. Considerations based solely on information theory tend to have a goal of minimizing entropy [53, 6]. However, when energy considerations are included, efficient information processing, i.e., firing rates, can sometimes favor increases in entropy [62]. According to Padoa-Schioppa [80], each neuronal response encoded only one variable, and the encoding was linear. Therefore, higher outcome values translate into higher firing rates, which implies higher energy consumption. In other words, the higher the utility function, the higher the energy needed to represent that utility in the brain. This is how energy is introduced in our model, which may translate into maximizing motivational states or anticipating states rather than gains. Therefore, an outcome in a gamble [...]

[...] in time is assumed to be on average constant in each population. If we treat each gamble separately, it is abstractly represented by a system of 'energy levels' that represent value, 'populated' by corresponding spikes with their corresponding relative number representing probabilities. Taking this point of view, our decision problem becomes the inverse of a familiar problem in statistical physics, that of determining the distribution of particles among $n$ energy levels in contact with a heat bath at given temperature [47]. In our case, we have a distribution of spikes over $n$ outcomes, and we want to determine external parameters due to other neurons in the background that help in maintaining the metastable state of the neurons directly involved in the decision phase. This model is similar in spirit to the Good-based model of Padoa-Schioppa [80] and Padoa-Schioppa [83] in that the states of the decision space are labeled by the outcome values. This model, as it stands, is incapable of leading us dynamically to the post-decision state, since the interneurons in the OFC that encode the chosen value are not part of the model [98]. But in the presence of another gamble, we will assume that the gamble that is less noisy will be the one that will be transformed to a chosen state, given that the average strengths of both signals (i.e., gambles) are the same. This is what we will show next.

In Fig. 2, a pictorial representation of the abstract space of stimulus representation by neurons in the cortex is shown. Given that all the spikes are similar, the [...] $x_2 > x_1$. Therefore, the metastable state is characterized by two parameters, the average energy, $\bar{u} = p_1 x_1 + p_2 x_2$ (i.e., utility) and the average number of total spikes $n_s = n_1 + n_2$.

Footnote: The term 'instant' here means a time interval that is very short with respect to the time scale of the decision, which is of the order of 100 msec.
The averages are defined with respect to the exponential (Boltzmann) distributions that maximize the Shannon entropy and are given by Huang [47]:

$$p(x_i) = \frac{1}{n_s}\,\frac{1}{\lambda e^{x_i/\beta} - 1}. \qquad (2)$$

[...] $p_i = n_i/n_s$, then any noisy encoding of the stimulus will lead to noisy encoding of the anticipated utility, $\Delta u = u - \bar{u}$. The signal to noise ratio, SNR, for this gamble is therefore equal to $\mathrm{SNR} = \bar{u}/\left(\overline{\Delta u^2}\right)^{1/2}$, or

$$\mathrm{SNR} = \frac{p_1 x_1 + p_2 x_2}{|\beta| \left[\, \ldots \,\right]^{1/2}}, \qquad (3)$$

where

$$\beta = n_s\, \frac{p_1 p_2}{p_1 - p_2}\, (x_2 - x_1), \qquad (4)$$

$$\ln \lambda = -\left( \frac{x_1}{\beta} - \frac{1}{n_1} \right). \qquad (5)$$

It should be noted that the SNR is zero when $p_1 = p_2 = 0.5$, which is what we expect if only spikes determine the outcome as assumed here.

For our particular gambles, we have for gamble A, $x_1 = 0$, $x_2 = \$10$, $p_1 = 0.10$, $p_2 = 0.90$, which amounts to $\mathrm{SNR}(A) = 0.[...]$. For gamble B, $x_1 = 0$, $x_2 = \$90$, $p_1 = 0.90$, $p_2 = 0.10$. In this case, $\mathrm{SNR}(B) = 0.[...]$.

Figure 2. A schematic representation of the metastable state of the neurons representing the stimulus in the decision phase and right before the 'birth' of the chosen state. The spikes 1 and 2 are only differentiated by their frequency and the population of neurons that form the background for these spikes. These neurons themselves are in contact with another large number of neurons that may be part of the cortex or subcortex, such as the amygdala, that are considered part of the ambient medium or bath.


The discussion in the previous section was based on adopting a simple representation of the underlying environment of the spikes that encoded the stimulus. To address more complicated situations, we will instead apply Jaynes' principle of maximum entropy [48] to the macroscopic states but include risk or variance of outcomes explicitly, as suggested by Allais [3]. Moreover, we will use a generalized form of entropy, non-extensive entropy, to describe the mental states of a DM.

Meginniss [70], in the Appendix of his thesis, pointed to the possibility of using non-extensive entropy instead of Shannon entropy. Non-extensive entropy was first suggested by Havrda and Charvat [44] based solely on mathematical properties of entropy when the additive property is no longer valid. The first serious applications of non-extensive entropy appeared in physics [113] and later in decision making, by Ng, Luce and Marley [75]. Here, we will follow our treatment in the previous section, but start directly from maximizing the non-extensive entropy under constraints of both average outcome and variance of the gamble. These constraints can be understood in [...]

[...]

$$H_q = \sum_i \frac{p_i^q - p_i}{1 - q}, \qquad (6)$$

can be linked to intrinsic features of internal dynamics of neuronal networks, such as fluctuations driven by the stimulus [15] or interactions between lotteries in a choice task.

Instead of a full-blown neurocomputational model, we aim to propose a model that conceptually bridges both behavioral and neuronal approaches. This has three aims. First, it adopts the behavioral responses as the 'stuff' to be explained. Second, the model should be biophysically plausible and able to describe the macroscopic properties of spiking neurons. Third, the model must be consistent with physical laws of non-equilibrium thermodynamics.

The structure of the neuronal network reflects functional specialization (or modularity) and functional integration of the brain. Modularity allows us to treat a specific phase of a cognitive task locally while averaging the effect of the rest of the brain on this local population of neurons. In the visual system, the most widely studied system, neurons in the visual area V5 are very homogeneous and are only specialized in detecting motion. In the decision making problem, neurons in the OFC are believed to be responsible for input, output value and chosen good processing [82]. Therefore, neurons in this area will be our focus, while the rest of the neurons in the brain will be represented by parameters $\beta$ and $\lambda$.

For any gamble $G = (x_1, p_1; x_2, p_2; \ldots; x_n, p_n)$, a corresponding generalized Meginniss utility is obtained by maximizing the non-extensive entropy, Eq. 6, using q-weighted averages, subject to the conditions

$$\sum_i p(x_i) = 1, \qquad (7)$$

$$\frac{1}{\sum_i p(x_i)^q} \sum_i p(x_i)^q\, (x_i - x_M) = 0, \qquad (8)$$

$$\frac{1}{\sum_i p(x_i)^q} \sum_i p(x_i)^q\, (x_i - x_M)^2 = \sigma^2, \qquad (9)$$

where $x_M$ and $\sigma$ are fixed real quantities that may be used to enforce value normalization [...] $u(x_i) = x_i$.
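As a concrete illustration of Eq. 6 and of the q-weighted averages in the constraints, the following sketch evaluates the non-extensive entropy and the q-weighted mean and variance of a discrete gamble. It is a minimal example under stated assumptions, not the paper's fitting procedure: the function names are hypothetical, the input probabilities are simply taken from the stimulus, and the Shannon form in nats is used at $q = 1$, which is the limit of Eq. 6.

```python
import math

def tsallis_entropy(p, q):
    """Eq. 6: H_q = sum_i (p_i^q - p_i) / (1 - q); Shannon entropy at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return sum(pi**q - pi for pi in p if pi > 0) / (1.0 - q)

def q_weights(p, q):
    """Normalized q-weights p_i^q / sum_j p_j^q, as used in Eqs. 8 and 9."""
    powered = [pi**q if pi > 0 else 0.0 for pi in p]
    total = sum(powered)
    return [w / total for w in powered]

def q_mean_and_variance(x, p, q):
    """q-weighted mean x_M and variance sigma^2 of the outcomes x."""
    w = q_weights(p, q)
    x_m = sum(wi * xi for wi, xi in zip(w, x))
    var = sum(wi * (xi - x_m) ** 2 for wi, xi in zip(w, x))
    return x_m, var

# Illustrative two-outcome gamble: $10 with probability 0.9, else $0.
x = [10.0, 0.0]
p = [0.9, 0.1]
for q in (1.0, 0.9, 0.7):
    x_m, var = q_mean_and_variance(x, p, q)
    print(q, round(tsallis_entropy(p, q), 4), round(x_m, 4), round(var, 4))
```

For $q < 1$ the weights shift toward the low-probability outcome, so the q-weighted mean of this gamble falls below the ordinary expected value; this is one way to see how $q$ deforms the probabilities of the stimulus.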
Similarly, we will assume that there is a linear relationship between energies needed (e.g., in the form of firing rates) to keep the gamble representation active in the brain and corresponding outcomes. This representation is supported by neurobiological findings. For example, Kennerley and Wallis [54] found that a sizable population of OFC neurons is modulated by three variables: the juice quantity, the action cost, and the probability of receiving the juice at the end of the trial. The firing rate increased with quantity and probability, as in the Padoa-Schioppa [83] study. Therefore, (positive) value is understood in terms of energy expended to represent a positive outcome, or the motivation toward acquiring that outcome. Hence, an approximate variational principle can be written for this 'coarse-grained' energy representation of the gamble as follows [68]

$$\delta \left[ H_q(p) - \alpha \left( \sum_i p(x_i) - 1 \right) - \beta\, \frac{\sum_i x_i\, p(x_i)^q - x_M}{\sum_i p(x_i)^q} - \lambda\, \frac{\sum_i (x_i - x_M)^2\, p(x_i)^q - \sigma^2}{\sum_i p(x_i)^q} \right] = 0. \qquad (10)$$

The parameters $\alpha$, $\beta$, and $\lambda$ are the associated Lagrange multipliers of the constraints, Eq. (7-9). From a behavioral perspective, the parameter $\beta$ determines the average utility while $\lambda$ is correlated with the variance in outputs or the amount of risk involved in the gamble. The parameter $q$ reflects the degree of deformation of subjective probabilities compared to those in the stimulus, or a parameter reflecting long-range interactions between the neuron populations associated with the various outcomes. The limit $q = 1$ is the case treated in the previous section and represents independent neuron populations. Therefore, from a dynamical point of view, on the neuronal level, [...] $s$, $p(x|s)$. In this case, the variational principle will involve a negative KL-divergence, $D_q(p \,|\, p(x|s))$, between the distribution $p(x)$ and the actual one. Instead, we are assuming that all the microscopic states of the neuronal network are equally likely to be accessible.
This is the microscopic origin of keeping all options open by the DM.

We rewrite the constraints in the simpler form

$$\sum_m p_m = 1, \qquad (11)$$

$$\frac{\sum_m p_m^q\, u_m}{\sum_m p_m^q} = \mathcal{U}, \qquad (12)$$

$$\frac{\sum_m p_m^q\, u_m^2}{\sum_m p_m^q} = \mathcal{V}. \qquad (13)$$

The solution of this optimization, Eq. (11), gives [68]

$$p_m = \frac{1}{Z_q} \left[ 1 - (1 - q)\, \beta^* (u_m - \mathcal{U}) - (1 - q)\, \lambda^* \left( u_m^2 - \mathcal{V} \right) \right]^{\frac{1}{1 - q}}, \qquad (14)$$

where the normalized parameters are

$$\beta^* = \frac{\beta}{\sum_m p_m^q}, \qquad (15)$$

$$\lambda^* = \frac{\lambda}{\sum_m p_m^q}, \qquad (16)$$

[...]

$$Z_q = \sum_m \left[ 1 - (1 - q)\, \beta^* (u_m - \mathcal{U}) - (1 - q)\, \lambda^* \left( u_m^2 - \mathcal{V} \right) \right]^{\frac{1}{1 - q}}. \qquad (17)$$

Using the solution of the variational principle, Eq. 14, in the entropy, Eq. 6, we find that

$$\beta\, \frac{\partial \mathcal{U}}{\partial \beta} = \frac{\partial}{\partial \beta} H_q[p], \qquad (18)$$

and

$$\lambda\, \frac{\partial \mathcal{V}}{\partial \lambda} = \frac{\partial}{\partial \lambda} H_q[p]. \qquad (19)$$

Both of these equalities lead to the following relations

$$\frac{\partial H_q[p]}{\partial \mathcal{U}} = \beta, \qquad (20)$$

and

$$\frac{\partial H_q[p]}{\partial \mathcal{V}} = \lambda. \qquad (21)$$

A new functional, $C$, that is optimal with respect to utility $\mathcal{U}$ and risk $\mathcal{V}$ is defined as follows

$$C[\mathcal{U}, \mathcal{V}] = \mathcal{U} + \frac{\lambda}{\beta}\, \mathcal{V} - \frac{1}{\beta}\, H_q[p]. \qquad (22)$$

[...]

$$\frac{\delta C}{\delta \mathcal{U}} = 0, \qquad (23)$$

$$\frac{\delta C}{\delta \mathcal{V}} = 0. \qquad (24)$$

Therefore, this is the new function that controls decision making processes in the mind of the DM, where the parameters $\beta$ and $\lambda$ control the average value and the spread of the options, respectively. These parameters are fixed by the distribution of outcomes in the stimulus, and possibly the state of mind of the DM. The decision function $C$ is explicitly given by

$$C = \sum_m \frac{p_m^q}{\sum_l p_l^q} \left( \frac{\lambda}{\beta}\, u_m^2 + u_m \right) - \frac{1}{\beta}\, H_q[p]. \qquad (25)$$

The function $C$ is not necessarily convex, and that depends on the sign of both parameters $\beta$ and $\lambda$. Hence for negative ratio $\lambda/\beta$, the functional may have only maximum values. This is the case we are interested in if $C$ is to play the role of a generalized utility function. This $C$ function will be called an entropic utility. It must be understood that this is not an actual utility, but more like a motivational effort function.
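To make the definition of the entropic utility concrete, the sketch below evaluates $C$ in two ways for a given distribution: directly from the explicit sum in Eq. 25, and from the moments as in Eq. 22. It is only an illustration under assumptions that are not the paper's procedure: the function names are hypothetical, $\mathcal{V}$ is taken to be the q-weighted mean of $u_m^2$, and the distribution `p` is set equal to the stimulus probabilities rather than obtained by solving Eq. 14. The parameter values are borrowed purely as sample inputs.

```python
import math

def tsallis_entropy(p, q):
    """Eq. 6, with the Shannon form (nats) used at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(pm * math.log(pm) for pm in p if pm > 0)
    return sum(pm**q - pm for pm in p if pm > 0) / (1.0 - q)

def q_average(values, p, q):
    """q-weighted average sum_m p_m^q v_m / sum_m p_m^q (Eqs. 12 and 13)."""
    powered = [pm**q if pm > 0 else 0.0 for pm in p]
    return sum(w * v for w, v in zip(powered, values)) / sum(powered)

def entropic_utility(u, p, q, beta, lam):
    """Eq. 25: explicit sum over outcomes minus the entropy term."""
    ratio = lam / beta
    combined = [ratio * um**2 + um for um in u]
    return q_average(combined, p, q) - tsallis_entropy(p, q) / beta

def entropic_utility_from_moments(u, p, q, beta, lam):
    """Eq. 22: C = U + (lambda / beta) V - H_q / beta."""
    big_u = q_average(u, p, q)
    big_v = q_average([um**2 for um in u], p, q)  # assumed second moment
    return big_u + (lam / beta) * big_v - tsallis_entropy(p, q) / beta

# Sample inputs (assumptions): outcomes and probabilities of a two-branch
# gamble, with q, beta and lambda/beta chosen only for illustration.
u = [10.0, 0.0]
p = [0.9, 0.1]
q, beta = 0.944, -22.79
lam = -0.099 * beta

print(entropic_utility(u, p, q, beta, lam))
print(entropic_utility_from_moments(u, p, q, beta, lam))
```

The two printed values agree because Eq. 25 is Eq. 22 with the moments written out; comparing gambles then amounts to comparing their `entropic_utility` values at a common `q`.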
To understand this, we recall that we divided the brain into two parts: one small part that includes the actual neuron populations just before a decision is made, and a larger part, the rest of the brain, which constitutes in our picture the environment. In essence, we are considering the delayed stimulus response only in the period just prior to the choice state. If we consider the utility part of $C$ as the 'energy' needed to represent the gamble states, then negative $C$ may be considered, from the point of view of the environment, the difference between the 'information' cost needed to maintain the small population in their states by the rest [...] $C$, signals that the subsystem of neurons representing the outcomes is in a non-equilibrium state.

As in the previous section, the parameters are determined so that the distribution $p$ is a best estimate of the input distribution. For a fixed $q$, the parameters $\beta$ and $\lambda$ are determined numerically solely from the distribution of outcomes and initial conditions. The initial conditions $\beta = \lambda = 0$, $q = 1$ are used throughout this study, which reflect an unbiased mental state of the DM regarding the outcomes of the gamble. We will be looking for solutions that maximize $C$. Therefore, $\lambda/\beta$ will be negative. Among the possible solutions for different $q$, the solution with the highest $q$, $0 < q \le 1$, is chosen (see Eq. 26). Encoding the variance of the distribution of outcomes results in a normalization of the utility function of the values (or goods) of the gamble. It must be stressed that even though Schwartenbeck et al. [102] included variance in their analysis using the free energy method of Friston and Kiebel [33], there is nothing in common with the treatment presented here, which is motivated instead by energy considerations. Other work in decision making that was also inspired by [...]


Consider two lotteries A and B similar to the ones treated earlier. Instead of considering each lottery separately, we now treat them simultaneously through the parameter $q$. Therefore, our lottery system is now represented in the mind by a joint probability distribution $p(A, B, \theta)$. We will assume that both distributions, $p(A)$ and $p(B)$, are independent but their effective entropy satisfies the q-entropy decomposition formula,

$$H_q(A + B) = H_q(A) + H_q(B) + (1 - q)\, H_q(A)\, H_q(B). \qquad (26)$$

The q-parameter is a dynamical parameter that microscopically may reflect the interaction of the neuronal populations encoding the gambles with the rest of the neuronal network [15]. The $q = 1$ case is where both gambles are completely independent, but it is also the case where their combined entropy is minimum for $0 < q \le 1$. This is our motivation for choosing solutions with the largest $q$ to minimize the combined entropy of gambles in a choice experiment.

As an application of this proposed model, we apply it to the Allais paradox, the violation of stochastic dominance, and other newly discovered paradoxes by Birnbaum [9] that violate PT. When comparing two gambles, we will choose the one that has the [...] $C$, if $C$ is well-behaved in the sense discussed above. Moreover, the q-parameter is the same if both gambles are compared simultaneously, but not necessarily the same across different pairs of gambles. For example, in the Allais paradox we have two pairs of gambles that are presented to the DM at different times. From a physical point of view, the two pairs represent two different stimuli, and therefore it is expected that the q-parameters will be different too. But first, we take up the example of the two lotteries with equal expected values and equal entropies.

As a first application of the entropic utility, we calculate the corresponding $C$-values of both gambles that we discussed previously from a signal processing viewpoint. The gambles are represented pictorially in Fig. 3. As we mentioned earlier, most people choose gamble A, which is less risky, but PT fails to reproduce this result for probabilities not so close to 100%, unless values are also distorted. Similarly, TAX does not predict that people prefer A unless values are also distorted. In this section, we will show that with both probabilities and values kept undistorted, we can still predict that people choose A over B. In a small sample of 19 people, we had only three people choosing B, that is, about 85% of people chose A.

Solving for $q$, we find several solutions. Among these solutions we choose the one where the difference between the $C$-values is maximum. This difference can be associated with motivational momentum toward the chosen state and away from the suppressed state. In Fig. 4, we represent two cases where A is preferred to B from an energetic point of view, where in case (a) the choice of A is much easier to achieve than in case (b), which has a larger barrier between the two states. In this figure, we are sketching a potential for an interactive theory, i.e., when both choices are entertained simultaneously in the mind of the DM.
The star in the figure represents the metastable state of the system right before a decision is made. Based on this criterion, the best values [...]

Figure 3. Gambles with equal expected value of \$9 and equal entropy, $H_q$ [...].

[...] $\beta$ terms in both gambles, a DM who chooses gamble A over gamble B is entropy seeking in gamble A but entropy averse for gamble B. From an information theory point of view, this corresponds to increasing information about A while decreasing (or erasing) information about B [85]. For both gambles, a decrease in variance is energetically more favorable. This latter observation is always true in this analysis and follows from the sign of $\lambda/\beta$, which is always negative. A negative 'utility' can be interpreted to mean that, from an energy viewpoint, the cost of the processing of the information content exceeds that which is needed to maintain the state itself. However, it is more natural to interpret the difference as the direction of motivational momentum to drift toward one state and away from the other. The $C$ value by itself is meaningless in this model; only differences are physical.

Figure 4. A qualitative shape of a dynamical theory of choice. A and B represent two possible choices. Making a decision between A and B should be more energetically favorable in case (a) than in case (b). The mental state represented by a "*" is the state we discuss in this work, i.e., just before the decision, where both options are present.

Equal EV and equal non-extensive entropy gambles, Fig. 3. q = 0.944

Gamble    A         B
C         [...]     [...]
β         -22.79    30.75
λ/β       -0.099    -0.011

The current analysis that led us to the conclusion that A is preferred to B appears to be very different from the signal analysis presented earlier. Both approaches, however, lead to the same conclusion, in agreement with measurements but in contradiction with Prospect Theory and the TAX model. The signal analysis discussion was at a more refined scale, where fluctuations caused by background noise were taken into account, probabilities were left linear, and Boltzmann-Shannon entropy was used, but entropy is maximized in both cases. We expect a microscopic treatment along the lines of Roxin and Ledberg [97] to include aspects of both approaches, and to include higher order terms beyond variance, which is a discussion beyond the scope of this work.

Next we apply our model to the Allais paradox [3]. In his work, Allais argued that taking into account variance in addition to expected gain is essential to have a sound preference relation among gambles. To include information about variance, Meginniss [70] found that including an entropy function in addition to the expected utility function still provided a new utility function with properties mostly similar to EUT. The parameter $\beta$ in Eq. 1 is a fitting parameter that is adjusted in such a way that the Allais paradox can be explained on the basis of [...] $v$, then the entropy of outcomes is simply proportional to $\ln v$ [24], and probably this was the motivation for Meginniss to consider entropy as a measure of variance. In our case, we are motivated on the basis of the space of outcomes embedded in a much larger space of states, that of the whole brain network, and therefore the constraints imposed on prospects become very important to reproduce results in line with behavioral expectations.
We follow the same analysis carried out in section 3.2.1. The gambles that Allais proposed have the structure shown in Fig. 5. This particular example of four gambles has been tested extensively by Birnbaum [10].

Figure 5. The Allais paradox discussed in Birnbaum [10].
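As a rough illustration of the quantities that enter this analysis, the sketch below computes the mean, variance, and non-extensive entropy of a gamble. It assumes the Tsallis form S_q = (1 − Σ p_i^q)/(q − 1) for the non-extensive entropy, which reduces to the Boltzmann-Shannon entropy as q → 1. The function names are illustrative, and the probabilities used for the Allais gamble B (.01, .89, .10) are the standard values, assumed here rather than read from Fig. 5.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum p^q) / (q - 1); Shannon entropy when q == 1."""
    probs = [p for p in probs if p > 0.0]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def mean_and_variance(outcomes, probs):
    """Expected value and variance of a discrete gamble."""
    mean = sum(x * p for x, p in zip(outcomes, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    return mean, variance


# Assumed Allais gambles, in units of $1M (the $2 outcome is 2e-6).
gamble_a = ([1.0], [1.0])
gamble_b = ([2e-6, 1.0, 2.0], [0.01, 0.89, 0.10])

q = 0.713  # q-value quoted for the Allais example
for name, (outcomes, probs) in (("A", gamble_a), ("B", gamble_b)):
    mean, variance = mean_and_variance(outcomes, probs)
    print(name, round(mean, 4), round(variance, 4), round(tsallis_entropy(probs, q), 4))
```

The certain lottery has zero variance and zero entropy for any q, which is the sense in which its information content carries no cost.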


Table 2. Allais paradox, Fig. 5, in arbitrary units. q = 0.713

Gamble    A         B
C         100       56
β         -0.035    -0.059
λ/β

C
β
λ/β       -4.99e-3  -7.27e-2

the reasons we discussed above to be consistent with what most people choose. But this is not consistent with the physical picture we are advocating here. The q-factor can be understood as a residual interaction parameter of the underlying microscopic states of the gambles that are being compared only if both gambles are considered part of the metastable state at the same time.

For the Allais paradox example in Fig. 5, the values we found that best fit the physical picture advocated here are displayed in Table 2. The q-values found here give probabilities that have the usual inverse 'S'-shape as in Prospect Theory. In both pairs of gambles, the DM is risk averse in the sense that negative changes in variance are more 'energetically' favorable. Note that our interpretation of the entropic utility function as an overall motivational momentum, and of the extra parts as costs related to the information content, is reasonable. For example, in the case of the certain lottery of $1M, the information content is zero and hence there is no cost associated with it. However, differences are what is physically relevant: C_A − C_B > C_D − C_C = 8.5. According to this inequality, if reaction times are measured, people will tend to respond faster when choosing A compared to when choosing D. The negative sign of the parameter β implies that the higher value states are weighted more in the decision than lower value states.

Given two gambles A and B, if the probability of winning x or more in gamble A is greater than or equal to the probability of winning x or more in gamble B for all values of x, and if this probability is strictly higher for at least one value of x, we say that A stochastically dominates B. Mathematically, therefore, stochastic dominance involves calculating a cumulative probability distribution. In short, stochastic dominance expresses the apparently rational fact that more is better. This is the reason why stochastic dominance violations were not easily accepted. Birnbaum [9] extensively studied this violation under many different frameworks, and obtained strong evidence that violation of stochastic dominance is a very common behavior among DMs.

The violation of stochastic dominance is accounted for by the old version of Prospect Theory [51], the TAX model [10], and the Attention model [50] for varying reasons. We show here that the theory presented above also allows for stochastic dominance to be violated. The pair of gambles we have chosen to discuss for this violation is from Birnbaum [9]:

J = ($12, .10; $90, .05; $96, .85),
I = ($12, .05; $14, .05; $96, .90).

Gamble I stochastically dominates J because the probabilities of winning $14 or more and of $96 or more are greater in gamble I than J, and the probabilities of

splitting the outcomes:

J* = ($12, .05; $12, .05; $90, .05; $96, .85),
I* = ($12, .05; $14, .05; $96, .05; $96, .85).

If the two lotteries are compared term by term, it is clear that according to the laws of probability, lottery I should be preferred to lottery J. But this contradicts what most people choose to do. However, from a heuristic point of view, it is easy to see why most people would prefer to play lottery J if they want to maximize their chances of winning a bigger amount of money. People choose J over I presumably because J has two ways to produce a 'very good' outcome but I has only one way (as if the outcomes are equally likely even though they are not) [13]. This latter argument by itself indirectly shows the relevance of variance in the decision process, since both I* and J* have the same entropy [102], but in the initial set-up, J has higher entropy than I. Clearly, linear operations on the gambles, as described here, are not a step that the human brain uses to compute the effective value of each gamble, and variance brings the minimal nonlinearity needed in the computations to make them align with human behavior. According to Birnbaum's TAX model [10], people fail to detect stochastic dominance because the coalescence or splitting of branches with equal outcomes is not a valid step. From the point of view of our model, coalescence is clearly violated because of the nonlinearities in the probabilities and in the outcomes.

For the pair of gambles J and I displayed in Fig. 6, Birnbaum [9] conducted 16 different studies over the years. The violation of stochastic dominance was significant in all of them. The theory we proposed here also shows that violations of stochastic

Figure 6. A lottery that violates stochastic dominance [9]. J: $96 (85%), $90 (5%), $12 (10%); I: $96 (90%), $14 (5%), $12 (5%).

violation is valid for all 0. ≤ q ≤ 1. The q-value we have chosen, q = 0.801, is the one that also works for a transparent stochastic dominance example related to the I − J pair, but this does not have to be the case as we discussed above. The pair of gambles J and J [9], J = ($12, .10; $96, . J stochastically dominates J. The chosen q-value for the probability distributions respects stochastic dominance for J − J and violates stochastic dominance for I − J.

Table 3. Violation of stochastic dominance [9], Fig. 6. q = 0.801

Gamble    J         I
C
β
λ/β       -9.68e-3  -3.86e-2

The parameters derived for the I − J pair are given in Table 3. Again using our analogy with energy, the relative costs associated with the information content of gamble I are much higher than those of gamble J. This is reflected in the negative sign of C. According to our model, the reason people prefer J to I has more to do with the cost of information associated with the gamble than with its utility; put differently, people are less motivated to drift to gamble I. Since in our model we are assuming a linear relationship between utility of outcomes and energy, the opposite of C, −C, should be interpreted from the point of view of the rest of the neuronal network as the excess energy needed to maintain the encoding of the gamble in the brain. It is this excess of energy that is minimized by the (computational part of the) brain. So it is important to remember that in classic descriptive theories, the utility is synonymous with the DM, but here utility is just the energy needed to represent the gamble in one part of the cortex.

Finally, we would like to make further comments on the choice of the parameter q. According to the physical picture advocated here, the q parameter should reflect the interaction of the two lotteries that are being considered and no other. For each pair of gambles, we need just to find the q that gives the largest difference between the

q = 1 should have been chosen for the IJ gamble. The parameters are still approximately the same and our conclusions will not be affected. For q = 0.801, ΔC_{J−I} = 273, while for q = 1, ΔC_{J−I} = 275. In general, however, we should choose q independently for each pair of lotteries.
This is what we will do for the remaining paradoxes.

In Savage's subjective utility theory, the "sure-thing" principle is an important assumption. The principle states that if a DM prefers A to B given that event C occurred or not-C occurred, then the DM should still prefer A to B even if the DM does not know anything about C [101]. This principle is now only of general historical interest, since people do not obey it even in its weakest form, that of branch independence, which has been shown to be violated [13]. Branch independence implies that outcomes or branches with equal outcomes should not matter when comparing two gambles. Birnbaum's paradoxes are a manifestation of violations of branch independence and coalescence [14]. To further show that coalescence is at the root of these violations, Birnbaum [9] studied many gambles that he expected to violate Prospect Theory. In this paper, we will address the violations of upper and lower cumulative independence (Table 3 in Birnbaum [9]). For outcomes 0 < z < x' < x < y < y' < z', and probabilities p, q, and r, upper cumulative independence is expressed as follows:

\[ S' = (x, p;\, y, q;\, z', r) \prec R' = (x', p;\, y', q;\, z', r) \;\Rightarrow\; S''' = (x, p+q;\, y', r) \prec R''' = (x', p;\, y', q+r). \qquad (27) \]

Lower cumulative independence is expressed as

\[ S = (z, r;\, x, p;\, y, q) \succ R = (z, r;\, x', p;\, y', q) \;\Rightarrow\; S'' = (x', r;\, y, p+q) \succ R'' = (x', r+p;\, y', q). \qquad (28) \]

All theories with weights that depend on a cumulative distribution of outcomes, including Prospect Theory and rank-dependent utilities, satisfy these properties. These two properties do not depend on the inverse S-shape of the weight function, and the reader is referred to Birnbaum and Navarrete [14] for more details about how they are derived. To show the robustness of the violations of both of these properties, Birnbaum employed different forms of the stimuli to minimize any framing effects. His TAX model is able to reproduce these new violations. In the following, we show that these new paradoxes can also be explained within our model.
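Since coalescence and stochastic dominance carry much of the argument, a short numerical check may help. The sketch below, with illustrative function names, takes J = ($12, .10; $90, .05; $96, .85) and I = ($12, .05; $14, .05; $96, .90) as in Fig. 6, coalesces their split forms, tests first-order stochastic dominance, and compares entropies. The Boltzmann-Shannon entropy is used here as a stand-in; any entropy that depends only on the branch probabilities gives the same ordering for the split forms.

```python
import math
from collections import defaultdict


def coalesce(gamble):
    """Merge branches with equal outcomes; a gamble is a list of (outcome, prob)."""
    merged = defaultdict(float)
    for outcome, prob in gamble:
        merged[outcome] += prob
    return sorted(merged.items())


def prob_at_least(gamble, x):
    """Probability of winning x or more."""
    return sum(prob for outcome, prob in gamble if outcome >= x)


def dominates(a, b, tol=1e-12):
    """True if gamble a first-order stochastically dominates gamble b."""
    thresholds = sorted({x for x, _ in a} | {x for x, _ in b})
    diffs = [prob_at_least(a, x) - prob_at_least(b, x) for x in thresholds]
    return all(d >= -tol for d in diffs) and any(d > tol for d in diffs)


def shannon_entropy(gamble):
    """Entropy of the branch probabilities (branches are not coalesced first)."""
    return -sum(prob * math.log(prob) for _, prob in gamble if prob > 0.0)


J = [(12, 0.10), (90, 0.05), (96, 0.85)]
I = [(12, 0.05), (14, 0.05), (96, 0.90)]
J_split = [(12, 0.05), (12, 0.05), (90, 0.05), (96, 0.85)]
I_split = [(12, 0.05), (14, 0.05), (96, 0.05), (96, 0.85)]

print("I dominates J:", dominates(I, J))
print("H(J), H(I):", shannon_entropy(J), shannon_entropy(I))
print("H(J*), H(I*):", shannon_entropy(J_split), shannon_entropy(I_split))
```

In this check I dominates J, the split forms have identical branch probabilities and hence identical entropy, and the coalesced J has the higher entropy of the two original gambles.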

Upper cumulative independence:

Violations of upper cumulative independence (UCI) were predicted by TAX but are ruled out by PT. An example of UCI is displayed in Fig. 7. For the gambles, we will use the same notation as that of Birnbaum [9]. The violations of UCI are due to failure of coalescence and branch independence. According to PT, if a person prefers R' over S', then that person should also prefer R''' over S'''. Here S''' is obtained from S' by coalescing the lower two outcomes, while R''' is obtained from R' by coalescing the upper outcomes. In the 12 studies reported by Birnbaum testing UCI, he found that about 69% of people preferred R' to S', while about 63% of people switched preference in the second pair of gambles, i.e., most chose S'''.

S' = ($110, .80; $44, .10; $40, .10), R' = ($110, .80; $98, .10; $10, .10); S''' = ($98, .80; $40, .20), R''' = ($98, .90; $10, .10).

Figure 7. Upper cumulative independence: pair of gambles no. 10 and no. 9 in Birnbaum [9].

The parameters that are consistent with a violation of UCI are shown in Table 4. For each pair of gambles, the q-value corresponds to the maximum difference in utilities, and the utilities are in the expected preference order as chosen by people. In comparing gambles R' and S', the difference is ΔC_{R'S'} = 36.72, while that between the gambles S''' and R''' is ΔC_{S'''R'''} = 6. × . Therefore, there is a reversal in preferences.
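The construction behind this test can be made concrete with a small sketch. The helper below, whose name and signature are illustrative, builds the two UCI choice pairs of Eq. 27 from the outcomes and probabilities of choices no. 10 and no. 9 as read from Fig. 7, and recomputes the utility difference for the first pair from the C values listed in Table 4.

```python
def uci_pairs(x, y, x_prime, y_prime, z_prime, p, q, r):
    """Build the two choice pairs of upper cumulative independence (Eq. 27).

    Assumes x' < x < y < y' < z' and p + q + r = 1.
    Gambles are lists of (outcome, probability) branches.
    """
    s1 = [(x, p), (y, q), (z_prime, r)]              # S'
    r1 = [(x_prime, p), (y_prime, q), (z_prime, r)]  # R'
    s3 = [(x, p + q), (y_prime, r)]                  # S''': lower branches coalesced
    r3 = [(x_prime, p), (y_prime, q + r)]            # R''': upper branches coalesced
    return (s1, r1), (s3, r3)


def expected_value(gamble):
    return sum(outcome * prob for outcome, prob in gamble)


(s1, r1), (s3, r3) = uci_pairs(x=40, y=44, x_prime=10, y_prime=98,
                               z_prime=110, p=0.10, q=0.10, r=0.80)

# C values for R' and S' as listed in Table 4 (q = 1).
c_r1, c_s1 = -3.23, -39.95
delta_c_first_pair = c_r1 - c_s1

print("S''' =", s3, " R''' =", r3)
print("EV:", [round(expected_value(g), 2) for g in (s1, r1, s3, r3)])
print("Delta C (R' vs S') =", round(delta_c_first_pair, 2))
```

R has the higher expected value in both pairs, so a switch from R' to S''' cannot be explained by expected value alone.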

Table 4. Violation of upper cumulative independence of pair of gambles (10, 9) in Fig. 7 [9].

q = 1
Gamble    R'        S'
C         -3.23     -39.95
β
λ/β       -9.47e-3  -1.21e-2

q = 0.368
Gamble    S'''      R'''
C
β
λ/β       -7.23e-3  -0.70

Figure 8. Upper cumulative independence: pair of gambles no. 12 and no. 14 in Birnbaum [9].

The second pair of gambles discussed by Birnbaum is shown in Fig. 8. The parameters corresponding to this pair of gambles are given in Table 5. According to the experimental measurements, about 52% of people chose R' over S', while about 71% of people chose S''' over R''', which is a violation of UCI. The much higher percentage for S''' over R''' is reflected in the magnitude of the differences of entropic utilities, ΔC_{R'S'} = 9.14 for the first pair and ΔC_{S'''R'''} = 7. × for the second pair of lotteries.

Table 5. Violation of upper cumulative independence of pair of gambles (12, 14) in Fig. 8 [9].

q = 1
Gamble    R'        S'
C
β
λ/β       -9.40e-3  -1.0e-2

q = 0.737
Gamble    S'''      R'''
C
β         -8.29     3.13e-3
λ/β       -6.88e-3  -9.30e-2

Lower cumulative independence:

The pair of gambles we use to show violations of lower cumulative independence (LCI) is from Birnbaum [10]. The gambles were studied under 12 different frames [9]. According to LCI, if a person chooses S over R, then that person should also choose S'' over R''. Prospect Theory satisfies LCI.

Figure 9. Lower cumulative independence: pair of gambles no. 17 and no. 20 in Birnbaum [9].

The pair of gambles is shown in Fig. 9, with the corresponding parameters in Table 6. According to the measurements, about 51% of people prefer S over R, while about 65% prefer R'' over S'', which is a violation of LCI. The effective motivational effort, C, is larger for the R''S'' pair than for the RS pair, ΔC_{SR} = 0. < ΔC_{R''S''} = 0.92. This is indirectly reflected in the percentage numbers quoted above, which imply that it is cognitively easier for the process R'' > S'' to be processed than that of S > R. Note that according to our theory, we cannot conclude for certain

C_{R''S} = 6.96 is larger than C_{SR} = 6.54 since all the parameters will be different, but what we can conclude is that the reaction time to choose S should be slower than that of choosing R''. Only a fully dynamical theory, with higher-order terms in values, can account for these differences in transition rates. In the next section, we propose some ideas on how to approach this problem using the Maxwell 'demon' analogy [85].

Table 6. Violation of lower cumulative independence of gambles (17, 20) in Fig. 9 [9].

q = 0.506
Gamble    S         R
C
β
λ/β       -1.0e-2   -1.0e-2

q = 0.974
Gamble    R''       S''
C
β
λ/β       -9.26e-3  -1.56e-2

Why is it not enough to compare lotteries only by calculating their expected utilities separately? While this approach appears to work in many cases, it does not work when decoy effects or framing effects are included. For example, it is known that questions posed in terms of losses instead of gains may reverse the choice made by the DM. Similarly, the addition of irrelevant options can affect the choice [119]. At the neuronal level, it has also been detected in single trials that changes in preferences can occur right before the commitment to a decision [55]. Moreover, we know that inhibitory neurons play a role in the dynamics leading to a decision, but a full picture of the mechanisms of choice is still not known [45, 125]. Therefore, extensions of choice mechanisms that include interaction among the different options are desirable.

So far, we have been able to argue, based on a physical picture and plausible physical arguments, for an account of behavioral decision making under risk. The results we were able to obtain were very encouraging, but this is still far from a physically sound theory that can bridge the gap between behavioral and neurobiological processes. Another improvement of the treatment presented above is to propose a scheme where interactions between gambles that are being compared simultaneously are included in the analysis. To achieve this, we need to account for other processes that take part in a decision process, such as attention and memory, and hence extra parameters will need to be included.

The need to incorporate attention in models that explain cognitive tasks is well documented [86].
For example, eye-tracking analysis of the dynamics of decision making shows that attention increases with increasing probability and value [29, 50]. This attention process will be regulated by another population of neurons in addition to the populations considered above that encoded only the gambles. However, to

From an energetic viewpoint, consider two lotteries L and R. The population of neurons representing each gamble will form an instantaneous equilibrium, and during the deliberation phase of the decision making, the decision maker is assumed to explore all possible states by a feedback process that also sustains the signal in the absence of the stimulus. The feedback loop process is similar to the rehearsal component of the Atkinson and Shiffrin [5] model for short-term memory. For the decision to be made, both gambles must remain active in a short-term buffer in prefrontal cortex, such as in OFC, and maybe in other regions of the cortex. In our model, both implicit and explicit attention are involved, and may span different time scales depending on the relative values of the parameters that we determined earlier in the entropic utility approach.

In the deliberation phase, shifting attention among all possible states will be part of the dynamical process that will upset the local equilibrium and drive the system toward the chosen state. The analogy with a chemical reaction in the presence of an enzyme is very close to the picture we are adopting here. Schematically, the dynamics of decision is represented in Fig. 10. Each well represents a lottery with transitions represented by thin arrows, while fat arrows represent transitions between states. The demon is represented here by a tape that represents bias or an energy-feedback loop that facilitates the decision to proceed along the path of least resistance in the neuronal network and keeps track of the transitions made between L and R. We will not treat the whole dynamics of the system in a self-consistent way, but we will introduce the minimum ingredients needed to account for the memory of the demon. Entropy manipulation by WM will not be discussed here, but may give a better picture of how entropy generation is transported across the different regions

Figure 10. The system is made of two subsystems, L and R, that represent the lotteries. The tape represents the memory of the mind as it juggles between the two subsystems. The distribution of L's and R's in the outgoing tape represents the distribution of choices made by people and is represented by the variable θ = 0, 1. Here it is assumed that people are first presented with the lottery at the left. So initially, people register lottery L in their memory, and then afterwards decide to stay in L or move attention to R. Moving attention away from L is equivalent to erasing information about L. In this particular example, a single time step is assumed to start always from the L state.

In the sensorimotor region, different action options are represented in parallel [39]. Hence, it is reasonable to represent different options with different states in parallel. Moreover, as we discussed above, there is some evidence in a two-choice reaching task that neural activity in the primary motor cortex (M1) and dorsal premotor

\[ \frac{dp_i}{dt} = \sum_j \left( p_j W_{ji} - p_i W_{ij} \right), \qquad (29) \]

where W_ij is the transition probability rate from the i-th state to the j-th state. The transition probabilities within each gamble are determined from the probability distributions found earlier, while the transitions between gambles are caused by the demon and are assumed constant for simplicity. These transition rates give rise to a flow of entropy in or out of the coarse-grained subsystem describing the gambles to the rest of the brain. The picture we adopt here is very similar to the one used in systems with thermodynamic and information processes [85]. Therefore, the introduction of the demon will drive the subsystem to a non-equilibrium state. The coefficients W_ij will be interpreted in terms of attention [78].

To keep our model manageable, we will assume local detailed balance for transitions within a gamble. This implies that the transition probability rates W_ij (i → j) satisfy the following relation between any pair of outcomes or states:

\[ W_{ij}\, p^{eq}_i = W_{ji}\, p^{eq}_j. \qquad (30) \]

Fortunately for us, this non-equilibrium state can achieve a steady state which makes

\[ \frac{W_{ij}}{W_{ji}} = \left[ \frac{1 - (1-q) \sum_\nu \lambda^*_\nu \left( O_{\nu,i} - u_\nu \right)}{1 - (1-q) \sum_\nu \lambda^*_\nu \left( O_{\nu,j} - u_\nu \right)} \right]^{\frac{1}{1-q}}, \qquad (31) \]

where the λ*_ν are the normalized Lagrange multipliers and \( \sum_i O_{\nu,i} = u_\nu \) is the ν-th constraint on the subsystem. We choose these transition rates as

\[ W_{ij} = \Omega(\beta, \lambda) \left[ \frac{1 - (1-q) \sum_\nu \lambda^*_\nu \left( O_{\nu,i} - u_\nu \right)}{1 - (1-q) \sum_\nu \lambda^*_\nu \left( O_{\nu,j} - u_\nu \right)} \right]^{\frac{1}{2} \frac{1}{1-q}}, \qquad (32) \]

where Ω is the transition rate of attention within a single lottery. If transition state theory is any guidance, we expect to have Ω of the same order as 1/β. Hence the parameters Ω and ω represent the frequency of shifting attention among the outcomes of a single lottery and between lotteries, respectively. It may be possible that an eye fixation analysis can be used to estimate some of these parameters [96]. In addition to these two parameters, we also need a distribution in working memory of the two gambles. This distribution will eventually represent the belief or information gained after the deliberation (i.e., feedback) process. Based on this distribution, the working memory will erase any information associated with the discarded lottery (i.e., decrease in entropy), and 'focus' attention on the chosen lottery. This information is then relayed

et al. were able to identify a different set of neurons that encoded the chosen good (juice). Our model is based on his findings, and the chosen gamble is represented in the working memory (WM), or demon. This is best illustrated by treating various examples.
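To make the rate equations concrete, the following sketch integrates the master equation (29) for a single three-outcome gamble. It is only an illustration under simplifying assumptions: the equilibrium distribution is taken to be the stated probabilities of a gamble (.10, .05, .85) rather than the q-deformed distribution of the text, and the rates are chosen as W_ij = Ω sqrt(p_j^eq / p_i^eq), a symmetric split that satisfies the detailed balance condition (30) in the same spirit as Eq. 32. Function names and the explicit Euler integrator are illustrative choices.

```python
import math


def detailed_balance_rates(p_eq, big_omega=1.0):
    """Rates W[i][j] for i -> j obeying W_ij * p_eq[i] == W_ji * p_eq[j]."""
    n = len(p_eq)
    return [[0.0 if i == j else big_omega * math.sqrt(p_eq[j] / p_eq[i])
             for j in range(n)] for i in range(n)]


def master_equation_step(p, rates, dt):
    """One explicit Euler step of dp_i/dt = sum_j (p_j W_ji - p_i W_ij)."""
    n = len(p)
    dp = [sum(p[j] * rates[j][i] - p[i] * rates[i][j] for j in range(n))
          for i in range(n)]
    return [p[i] + dt * dp[i] for i in range(n)]


def relax(p0, rates, dt=0.01, steps=5000):
    """Integrate from p0 long enough to reach the stationary distribution."""
    p = list(p0)
    for _ in range(steps):
        p = master_equation_step(p, rates, dt)
    return p


p_eq = [0.10, 0.05, 0.85]          # assumed equilibrium distribution
rates = detailed_balance_rates(p_eq)
p_final = relax([1.0, 0.0, 0.0], rates)
print([round(x, 6) for x in p_final])
```

Starting with all attention on the first outcome, the distribution relaxes to p_eq, as expected when no demon drives the system away from equilibrium.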

The first example we treat is that of stochastic dominance discussed earlier. Since we are addressing switching of attention among the various possible outcomes of both lotteries, we first need to set up the possible paths along which these transitions of attention can be activated. The working memory will play the role of a traffic controller between various states that are usually the focus of attention. The circuit that corresponds to the two lotteries in Fig. 6 is shown in Fig. 11, and is represented in an abstract value-based space. In addition to the probability transition rates, W_ij, that we already discussed, there is a new element in this network that corresponds to the working memory and is responsible for new attentional guidance between the various options available to the DM. Similar processes were observed in the visual system, where the WM modulates competitive interactions between items in the visual field before a selection is made [57]. Therefore, our attention model is very plausible physically. The word attention is mostly associated with a conscious action, but it is also known that attention can be implicit. For example, studies on the effect of emotions on explicit task-related attention processes are well documented, and showed that implicit attention to marginal information (stimulus) is possible [30]. In our model, we assume both types of attention mechanisms are involved in a decision task.

Figure 11. Stochastic dominance circuit: Choice between gamble J = ($96, .85; $90, .05; $12, .10) and gamble I = ($96, .90; $14, .05; $12, .05). ω represents the switching rate between the states $14 ⇔ $90, while ε is a constant distribution that may represent the degree of bias or belief of the DM towards the decision states. Hence ω and ε are associated with the WM (or demon).

In suggesting this particular circuit, we are assuming that the DM really focuses on states and not gambles. If the most probable state happens to be in gamble A, then gamble A is chosen. Therefore, the mental representation we have adopted is the following [93]. The Maxwell demon lives

\[ \begin{aligned}
\dot{p}_1 &= \omega\epsilon\, p_2 - \omega(1-\epsilon)\, p_1 + W_{31} p_3 - W_{13} p_1 + W_{41} p_4 - W_{14} p_1, \\
\dot{p}_2 &= -\omega\epsilon\, p_2 + \omega(1-\epsilon)\, p_1 + W_{32} p_3 - W_{23} p_2 + W_{42} p_4 - W_{24} p_2, \\
\dot{p}_3 &= W_{13} p_1 - W_{31} p_3 + W_{23} p_2 - W_{32} p_3 + W_{43} p_4 - W_{34} p_3, \\
\dot{p}_4 &= W_{14} p_1 - W_{41} p_4 + W_{24} p_2 - W_{42} p_4 + W_{34} p_3 - W_{43} p_4,
\end{aligned} \qquad (33) \]

where the indices 1 to 4 label the $90, $14, $96, and $12 states, and the probabilities satisfy p_1 + p_2 + p_3 + p_4 = 1 and are not equal to the initial probabilities. The initial (i.e., stimulus) probabilities are now encoded in the transition rates W_ij according to Eq. 32.

For the working memory, we adopt the simplest model. The bits 1 and 0 are associated with states $90 and $14, respectively. The choice of these two states appears arbitrary, but what is important is that they do not belong to the same lottery. Otherwise, their choice does not have a significant effect on the final distribution. For simplicity, the incoming bits are assumed to have probability p(1) = ε. The bits interact for a period T with the network according to the rate equations above, Eq. 33, before the next bit arrives. After the interaction period, the network evolves to a new state such that the value outcome $90 has a new probability, p_1. The change in p_1 will be reflected in the outgoing part of the tape such that p_out(1) = p_1/(p_1 + p_2). This system reaches a non-equilibrium steady state with

p_out(1) = [0. k + 0. k + ωε (0.21 + 0. k/k + 0. k/k)] / [k + 1. k + ω (0.21 + 0. k/k + 0. k/k)],   (34)

where k_J and k_I are the corresponding transition probability rates for J and I, respectively. From reaction rate theory, the ratios of the transition probabilities are expected to be such that k/k ∝ |β/β|^ν, with ν >

p_out(1) = 0.46 (1 + 0. ε ω/Ω) / (1 + 0. ω/Ω),   (35)

where k_J = k_I = Ω. The function in Eq. 35 is displayed in Fig. 12. Therefore, for small ratios of ω/Ω, state J can never be chosen according to our scheme. In the opposite limit of this ratio, the state $90 becomes abundant in the memory, or in other words, attention is more focused on this state than on other states. Physically, this limit corresponds to very fast decisions. However, this latter approximation is very poor given that, according to the entropic utility solution, the ratio of β_J/β_I ≈

p_out(1) corresponds to the percentage of the population choosing gamble J. In this case, if the population starts from the $90 state upon being asked to choose between I and J, only about 67% of them will end up choosing J. According to Birnbaum's measurements, 65.8% of the population choose J over I in 12 studies [9].

Figure 12. Violation of stochastic dominance: The probability of state $90, which corresponds to bit 1 in the memory, as calculated from the viewpoint of the memory (demon). The data is for k_I = k_J = Ω.

First, we treat the pair of gambles A and B. The circuit for this case is shown in Fig. 13-a. Here the transition rate, ω, between the $2M state and the $1M state is influenced by the presence of the certain state of $1M. Hence, ω will be proportional to 1/|β_A|. Since people will focus their attention on the relative chances of receiving $2M, a naive estimation is to evaluate p($2M)/(p($1M) + p($2M)), which is approximately equal to 10%, the same as in the original B gamble since the $2 state is barely relevant. Therefore, we expect including the certain state to make receiving $2M less probable than receiving $1M even in the most pessimistic case, that of ε = 0. To calculate the chances of choosing the state of $2M, and the gamble B, we first write the rate equations for the circuit.

Figure 13. Allais paradox [10]: (a) A vs. B. The A lottery consists of the $1M option with certainty, while the B lottery has three options with ($2, .01), ($1M, .89) and ($2M, .10). (b) C vs. D.

We denote by p_1, p_2 and p_3 the probabilities of the $1M, $2M, and $2 states, respectively. The rates of change of these probabilities for the lotteries A-B are

\[ \begin{aligned}
\dot{p}_1 &= W_{21} p_2 + W_{31} p_3 - W_{12} p_1 - W_{13} p_1 + \omega\epsilon\, p_2 - \omega(1-\epsilon)\, p_1, \\
\dot{p}_2 &= W_{12} p_1 - W_{21} p_2 + W_{32} p_3 - W_{23} p_2 - \omega\epsilon\, p_2 + \omega(1-\epsilon)\, p_1, \\
\dot{p}_3 &= W_{13} p_1 - W_{31} p_3 + W_{23} p_2 - W_{32} p_3,
\end{aligned} \qquad (36) \]

where ω is the frequency of transitions between the two lotteries. Therefore it is assumed to reflect the process of attending to the various outcome values. The outgoing

\[ p_{out}(1) = \frac{p_1}{p_1 + p_2} = \frac{W_{31} W_{23} + (W_{31} + W_{32})(W_{21} + \omega\epsilon)}{W_{31} W_{23} + W_{13} W_{32} + W_{31} W_{12} + W_{32} W_{12} + (W_{31} + W_{32})(W_{21} + \omega)} \qquad (37) \]

= 0.898 (1 + 0. ε ω/k_B) / (1 + . ω/k_B).

Thus according to this representation, the relative probability of the $1M state compared to the $2M state in the registered memory is at worst 75%, with complete bias toward the $2M state, i.e., ε = 0 and ω/k_B ≈ (k_A + k_B). The treatment discussed here can be interpreted differently. If we assume that each time step corresponds to a different DM, and that the $1M state attracts the attention of each DM the same way, we estimate that only 8% of the population will choose to play lottery B and not A. This estimate is very close to the one reported in Kahneman and Tversky [51], about 12%, but far smaller than the 42% measured by Birnbaum [10]. This latter experiment was not part of his extensive studies reported in Birnbaum [9].

The circuit for the two lotteries C and D in the brain is schematically represented in Fig. 13-b. Similar to the previous set of lotteries, we write the rate equations for the CD network in the value-based space of choices. With the states labeled as before, the rate of change of prob-

\[ \begin{aligned}
\dot{p}_1 &= W_{31} p_3 - W_{13} p_1 - \omega\epsilon\, p_1 + \omega(1-\epsilon)\, p_2, \\
\dot{p}_2 &= W_{32} p_3 - W_{23} p_2 + \omega\epsilon\, p_1 - \omega(1-\epsilon)\, p_2, \\
\dot{p}_3 &= W_{13} p_1 - W_{31} p_3 + W_{23} p_2 - W_{32} p_3.
\end{aligned} \qquad (38) \]

The probability of bit 1 in the memory after many interaction periods with the network representing both gambles is

\[ p_{out}(1) = \frac{p_2}{p_1 + p_2} = \frac{W_{13} W_{32} + \omega\epsilon\, (W_{31} + W_{32})}{W_{13} W_{32} + W_{23} W_{31} + \omega\, (W_{31} + W_{32})}. \qquad (39) \]

Using the probability transition rates, we have

p_out(1) = [0.92 + ε (ω/k_C) ( + 0. k_C/k_D)] / [.01 + (ω/k_C) ( + 0. k_C/k_D)].   (40)

For this pair of lotteries, Birnbaum [10] gave 76% as the percentage of people who chose lottery D over lottery C. This pair of gambles was not part of the 12 studies we quoted above. To get a percentage close to the measured value, the population must have a bias toward the $2M state, i.e., ε > .5, and the average interaction periods between the WM and the network must be much shorter than the transition times between the outcomes of the D gamble. The requirement of bias is not necessarily a weakness of the model, but is considered a prediction in this instance. Diffusion-based models [92], for example, require a bias parameter that needs to be varied to achieve agreement in two-choice decision tasks. From Table 2, we see that the focus of attention, i.e., 1/k_D, is much longer than that of gamble C. Therefore, if the interaction period between memory and network is much larger than the average

C, but less than that of gamble D, we can attain probabilities in the range measured by the experiment. These are details that can be checked in future measurements. Next we suggest an interactive circuit for the Birnbaum paradoxes.
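The qualitative behaviour of such demon-coupled circuits can be explored numerically. The sketch below is a simplified stand-in for the A-B Allais circuit, not the calculation of the text: it assumes the stated probabilities of gamble B (.89, .10, .01 for the $1M, $2M and $2 states) in place of the q-deformed distribution, within-gamble rates W_ij = k_B sqrt(p_j/p_i) that satisfy detailed balance, and a demon that moves attention from $2M to $1M at rate ωε and back at rate ω(1 − ε). All function and parameter names are illustrative.

```python
import math


def gamble_rates(p_eq, k=1.0):
    """Within-gamble rates W[i][j] = k * sqrt(p_eq[j] / p_eq[i]) (detailed balance)."""
    n = len(p_eq)
    return [[0.0 if i == j else k * math.sqrt(p_eq[j] / p_eq[i]) for j in range(n)]
            for i in range(n)]


def steady_state(rates, dt=0.005, steps=20000):
    """Relax dp_i/dt = sum_j (p_j W_ji - p_i W_ij) by explicit Euler steps."""
    n = len(rates)
    p = [1.0 / n] * n
    for _ in range(steps):
        dp = [sum(p[j] * rates[j][i] - p[i] * rates[i][j] for j in range(n))
              for i in range(n)]
        p = [p[i] + dt * dp[i] for i in range(n)]
    return p


def p_out_1(omega, eps, k_b=1.0):
    """Share of the $1M state (bit 1) among the two demon-coupled states."""
    # States: 0 = $1M, 1 = $2M, 2 = $2 (assumed stated probabilities of gamble B).
    rates = gamble_rates([0.89, 0.10, 0.01], k_b)
    rates[1][0] += omega * eps          # demon moves attention $2M -> $1M
    rates[0][1] += omega * (1.0 - eps)  # demon moves attention $1M -> $2M
    p = steady_state(rates)
    return p[0] / (p[0] + p[1])


for omega, eps in ((0.0, 0.5), (1.0, 0.0), (1.0, 1.0)):
    print(omega, eps, round(p_out_1(omega, eps), 3))
```

With the demon switched off (ω = 0) the sketch returns 0.89/0.99, close to the prefactor 0.898 in Eq. 37, and the outgoing probability of bit 1 grows with ε, so ε = 0 is the least favorable case for the $1M state.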

Upper Cumulative Independence.

For the upper cumulative independence test, we use the pair of gambles 10 and 9 in Table 3 of Birnbaum [9]. As we observed above, we will interpret our results from two angles, that of a single DM or a population of DMs.

Figure 14. Upper cumulative independence: (a) R' versus S', choice 10 in Table 3; (b) S''' versus R''', choice 9 in Table 3 in [9].

The initial probability of bit 1 is p_in(1) = ε. After interaction with the bit, the new probability of bit 1 in the R'S' circuit becomes

p_out(1) = [0. + . ε (ω/k) ( + k/k)] / [ + . (ω/k) (1 + k/k)].   (41)

This functional dependence is plotted in Fig. 15 for different values of the initial

ε = 0.5, the attention scheme does not work since it cannot decide on its own which state to choose. A slight bias is needed for a choice to be made. According to the measurements of Birnbaum [9] averaged over his 12 studies, 69% of people chose gamble R' over gamble S'. If we adopt the view that the interaction intervals between the demon and the gambles are independent and correspond to different people, the probability of outgoing bit 1 should be closely aligned with the percentage of people choosing R' in the limit of an infinite sample of people. Therefore, according to Eq. 41, to achieve comparable percentages, the population should have an initial bias toward the higher value state of $98 in lottery R'. As an example, if we take k/k ≈ β_R/β_S ≈ 5, Table 4, and choose the rate of information manipulation by the WM to be approximately ω/k = 2/5, i.e., midway between the intra-transition rates, we find p_out(1) ≡ .

R'''S''' of test 9 in Birnbaum [9] gives for the probability of outgoing bit 1 after interaction with the network of the gamble

p_out(1) = [1. ε (ω/k) (1 + 1. k/k)] / [2.17 + (ω/k) (1 + 1. k/k)].   (42)

This function is shown in Fig. 16 for various values of the initial bit 1. The percentage of the population that chose S''' over lottery R''', averaged over 12 studies, was about 63%. Such a percentage can be understood either as a bias toward the $40 state or as faster shifting of attention between gambles than within gamble S'''. The latter means that more attention is paid to the chosen gamble.

Figure 15. Upper cumulative independence in the interactive model: R' versus S' of test no. 10 in Table 3 of Birnbaum [9]. The data is for k_S/k_R = 5.

The plot is shown for the case when k_S/k_R ≈

ω and ε reflect cognitive effort by the WM, this implies that the memory can interact very little with the network and still induce a decision. Therefore, compared to the decision made on the R'S' pair, it is easier to make a decision on this pair. Again, this is very reasonable given the complexity of the first pair. Since the measured probabilities of the first decision are higher than those of the current one, this may lead us to believe that it is easier to make a decision on the first pair than on this one. From the structure of both pairs, it is very difficult to agree with this interpretation, and we believe that the conclusion from this model is more plausible in this case.

Figure 16. Violation of upper cumulative independence: S''' vs. R''', choice 9 in Table 3 in [9]. The data is for k_S/k_R = 0.

Lower Cumulative Independence.

Finally, we address the violation of lower cumulative independence with this transfer-of-attention approach, and discuss how this violation can be understood from the WM manipulation of the regions encoding the stimulus, i.e., the gambles. Lower cumulative independence implies that if people prefer S over R, then they should also prefer S'' over R''. The gambles are R = ($3, .90; $12, .05; $96, .05) and S = ($3, .90; $48, .05; $52, .05) for the first pair of choices, and R'' = ($12, .95; $96, .05) and S'' = ($12, .90; $52, .10) for the second pair of gambles. The circuits for the two pairs of gambles are shown in Fig. 17. For the pair of gambles RS, the probability of having bit 1 in the memory after interaction with the network is given by

p ( S ) = 0 . . ε ωk (1 + k k ) 1 + 0 . ωk (1 + k k ) .   (43)

The parameters of the entropic utility for this pair of gambles suggest that the ratio k /k ≈ [...]

Figure 17. Lower cumulative independence circuits: (a) R versus S lotteries: Table 3, test no. 17 in [9]; (b) R'' versus S'' lotteries: Table 3, test no. 20 in [9].

According to Birnbaum's measurements, approximately 52% of the population chose S over R. According to the suggested circuit, a preference of S over R needs at least some bias toward the state S, which is shown in Fig. 18. Moreover, this bias is accentuated for higher ω/k_S (k = k_S), i.e., when the WM interacts with the S-side of the network more often than with R. This is again a plausible result: most of the attention is focused on the S gamble, and the more attention a gamble receives, the more often it is chosen. Hence, in spite of the simplicity of the model, the details of the prediction can be tested.

Figure 18. Lower cumulative independence: R vs S lotteries. Probability of choosing R as a function of the parameters ω, k, and ε ([9], test 17). The data is for k /k = 3.

For the second pair of gambles R'' and S'' in test 20 of Birnbaum [9], the probability of bit 1 in the memory after the interaction with the network is

p ( R ) = 0.321 1 + ε ωk (0.485 + k k ) 1 + ωk (0.156 + 0 . k k ) .   (44)

The probability of choosing R'' depends heavily on the ratio ω/k in addition to an initial bias towards the largest-value state of $96. From the energetic picture we tried to lay out in previous sections, it is very plausible that this state will have more motivational momentum because of its high variance, but since this simple attention model cannot quantify the dependence of the rates on the parameter λ, we cannot study the behavior due to this term in detail. The dependence of the rates on the variance term has not been included in the previous applications, and therefore our discussion remains incomplete. In spite of this shortcoming, the method is still able to shed some light on the interaction of the WM with the various prospects of the lotteries. The data is shown in Fig. 19.

Birnbaum's measurement for this pair of gambles was about 69%. For this particular pair of gambles, the attention is mainly concentrated on gamble R'' and little attention is paid to gamble S''. In addition to the bias, the reaction time of the WM with the lotteries must be less than that of gamble R''. Therefore, attention is more concentrated on the chosen item, similar to earlier conclusions.

The advantage of including a component representing WM as part of the decision problem is obvious from these simple illustrations. Our discussion involved only the qualitative influence of the β parameter, which is related to the average anticipated utility; we completely ignored the direct effect of the parameter λ on the attention process. As we demonstrated in the previous section, this term is essential to the decision problem and cannot be ignored. However, its effect will be difficult to quantify theoretically.

Figure 19. Lower cumulative independence: R'' vs S'' lotteries. Probability of choosing R'' as a function of the parameters ω, k, and ε ([9], test 20). The data is for k /k = 1.
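Since the constraints of the model are placed on the weighted average of the stimulus and on its variance, it can help to see how the four gambles of the lower cumulative independence test compare on these two plain moments. The sketch below is only an arithmetic aid: it computes the ordinary probability-weighted mean and variance of the payoffs, not the entropic utility or the β and λ parameters of the paper. The function name and the dictionary layout are illustrative assumptions, and the double-primed labels follow the panel labels of Fig. 17.

```python
def moments(gamble):
    """Return the probability-weighted mean and variance of a gamble.

    A gamble is a list of (payoff_in_dollars, probability) pairs.
    """
    mean = sum(p * x for x, p in gamble)
    variance = sum(p * (x - mean) ** 2 for x, p in gamble)
    return mean, variance


# Payoffs and probabilities as listed in the text (tests 17 and 20 of [9]).
GAMBLES = {
    "R": [(3, 0.90), (12, 0.05), (96, 0.05)],
    "S": [(3, 0.90), (48, 0.05), (52, 0.05)],
    "R''": [(12, 0.95), (96, 0.05)],
    "S''": [(12, 0.90), (52, 0.10)],
}

summary = {name: moments(gamble) for name, gamble in GAMBLES.items()}

for name, (mean, variance) in summary.items():
    print(f"{name:4s} mean = {mean:6.2f}  variance = {variance:7.2f}")
```

In both pairs the R gamble has the slightly higher mean (8.1 versus 7.7, and 16.2 versus 16.0) and a much larger variance (about 410 versus 199, and about 335 versus 144), which is the spread carried by the $96 state discussed above.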
Much progress has been made in identifying regions of the brain that are involved in decision making under risk. However, little is known about the actual mechanisms of choice. Normative theories of decision making have had very limited success in accounting for many anomalies that appear to be irrational from the viewpoint of the laws of probability. Descriptive theories such as Prospect Theory, however, were able to address some of these anomalies by introducing a non-linear weighting function of the probabilities in the utility function. Unfortunately, Prospect Theory turned out to be itself incomplete. For example, the TAX model of Birnbaum is more
