Over-representation of Extreme Events in Decision-Making: A Rational Metacognitive Account
Ardavan S. Nobandegani, Kevin da Silva Castanheira, A. Ross Otto, Thomas R. Shultz
{ardavan.salehinobandegani, kevin.dasilvacastanheira}@mail.mcgill.ca
{ross.otto, thomas.shultz}@mcgill.ca
Department of Electrical & Computer Engineering, McGill University
School of Computer Science, McGill University
Department of Psychology, McGill University
Abstract
The Availability bias, manifested in the over-representation of extreme eventualities in decision-making, is a well-known cognitive bias, and is generally taken as evidence of human irrationality. In this work, we present the first rational, metacognitive account of the Availability bias, formally articulated at Marr's algorithmic level of analysis. Concretely, we present a normative, metacognitive model of how a cognitive system should over-represent extreme eventualities, depending on the amount of time available at its disposal for decision-making. Our model also accounts for two well-known framing effects in human decision-making under risk, the fourfold pattern of risk preferences in outcome probability (Tversky & Kahneman, 1992) and in outcome magnitude (Markowitz, 1952), thereby providing the first metacognitively-rational basis for those effects. Empirical evidence, furthermore, confirms an important prediction of our model. Surprisingly, our model is remarkably robust with respect to its focal parameter. We discuss the implications of our work for studies on human decision-making, and conclude by presenting a counterintuitive prediction of our model, which, if confirmed, would have intriguing implications for human decision-making under risk. To our knowledge, our model is the first metacognitive, resource-rational process model of cognitive biases in decision-making.
Keywords:
Availability bias; decision-making under uncertainty and risk; metacognitively rational models; fourfold pattern of risk preferences
Which one comes to your mind more easily: the most horrible car crash of your life, or the event of driving home safely on an ordinary day? Among the great many cognitive biases documented in the literature, the Availability bias (Tversky & Kahneman, 1973) is a notable one: people overestimate the probability of events that easily come to mind. A number of notable effects can be explained by this cognitive bias, including people's overestimation of the frequency of extreme events like an earthquake (Lichtenstein, Slovic, Fischhoff, Layman, & Combs, 1978) and people's overreaction to threats like terrorism (Lichtenstein et al., 1978; Rothman, Klein, & Weinstein, 1996; Sunstein & Zeckhauser, 2011). Neurobiological work shows that the strength of a memory is modulated by the salience of its positive or negative valence (Cruciani et al., 2011), thereby providing a possible explanation of the Availability bias.

Recently, Lieder, Griffiths, and Hsu (2014, 2017) proposed a boundedly-optimal, rational process model of this bias which can explain a wide range of findings in the human decision-making literature. Drawing on the importance sampling paradigm, their account aimed to minimize the mean-squared error (MSE) of an expected-utility estimator, a well-established and normatively-justified measure of the quality of an estimator (Poor, 2013). Since the variance of the estimator is the asymptotically-dominant term in the MSE (i.e., for large sample sizes, variance becomes an accurate proxy for MSE), Lieder et al. (2014, 2017) suggested that people adopt the following, variance-minimizing, importance distribution for mental simulations of events:

q(o) ∝ p(o) |u(o) − E_p[u(o)]|.   (1)
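To make (1) concrete, here is a minimal numerical sketch on a hypothetical discrete outcome space (the utilities and probabilities below are made up for illustration): the resulting q over-weights rare extreme outcomes relative to their objective probability p.

```python
import numpy as np

# Hypothetical discrete outcome space: extreme events sit at the tails.
u = np.array([-100.0, -1.0, 0.5, 1.0, 100.0])   # utilities of the events
p = np.array([0.01, 0.30, 0.38, 0.30, 0.01])    # objective probabilities

Eu = np.dot(p, u)            # E_p[u(o)]
q = p * np.abs(u - Eu)       # Eq. (1), unnormalized
q /= q.sum()                 # normalize to a probability distribution

# Over-representation ratio q/p: well above 1 for the extreme events,
# below 1 for the mundane ones.
print(q / p)
```

Despite each having objective probability .01, the two extreme events here receive over a third of the subjective simulation probability apiece.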
In (1), o denotes an arbitrary event, p the objective probability of event o, u(o) the utility of event o, q the probability distribution one adopts for their mental simulations (i.e., the subjective probability of event o), and, finally, E_p[·] the expectation with respect to p.

Note that the expression in (1) does not depend on the number of samples one gets to draw before making their decision (denoted by s). In that light, Lieder et al.'s (2014, 2017) account implies that time availability, i.e., the amount of time a decision-maker has for making a decision, should have no bearing on which importance distribution q one adopts. While a cognitively-rational agent does not adapt their importance distribution q based on time availability, a metacognitively-rational agent plausibly considers it in their choice of q. That is, the metacognitively-rational agent chooses, among all q's, the one which is normatively justified by time-availability considerations; this essentially makes the choice of q a strategy-selection task guided by time availability. In agreement with this view, a large body of psychological work on decision-making suggests that (1) people evoke different strategies for decision-making under time pressure vs. no time pressure, and (2) people adapt their strategies in accord with time availability (see, e.g., Svenson & Maule, 1993; Svenson, 1993).

In this work, we present the first normative, metacognitive model of how an agent should over-represent extreme eventualities, depending on the amount of time available at their disposal for decision-making. Concretely, our work serves as a rational, meta-level model for the work by Lieder et al. (2014, 2017). More specifically, the importance distribution suggested by Lieder et al. naturally follows from our metacognitive account when s is large (i.e., in the large-sample-size regime). In contrast to Lieder et al.
(2014, 2017), our meta-level account also specifies how a decision-maker should rationally choose their importance distribution when they can only afford to collect very few samples (i.e., when making a decision under extremely high time pressure). Importantly, recent work has provided mounting evidence suggesting that people often use very few samples in probabilistic judgments and reasoning under uncertainty (e.g., Vul et al., 2014; Battaglia et al., 2013; Lake et al., 2017; Gershman, Horvitz, & Tenenbaum, 2015; Hertwig & Pleskac, 2010; Griffiths et al., 2012; Gershman, Vul, & Tenenbaum, 2012; Bonawitz et al., 2014), elevating the importance of developing process models specifically directed at the small-sample-size regime.

We show that our model can account for two well-known framing effects in human decision-making under risk: the fourfold pattern of risk preferences in outcome probability (Tversky & Kahneman, 1992) and in outcome magnitude (Markowitz, 1952). Although these effects are often taken as strong evidence for human irrationality, we provide the first metacognitively-rational basis for them. Empirical evidence, furthermore, confirms an important prediction of our model: over-representation of extreme events regardless of their frequencies. Our model also makes a counterintuitive (normative) prediction, which, if confirmed, would have surprising implications for human decision-making under risk.
In this section, we formally present our metacognitively-rational model of the Availability bias (Tversky & Kahneman, 1973). According to expected utility theory (Von Neumann & Morgenstern, 1944), an agent chooses the action a with the highest expected utility

E[u(o)] = ∫ p(o|a) u(o) do,   (2)

where p(o|a) denotes the distribution over outcomes o resulting from taking action a, u(o) the subjective utility associated with outcome o, and E[·] the expectation operator. Since the computation of (2) is intractable in general, we assume that the agent estimates (2) using sampling methods (Hammersley & Handscomb, 1964). Substantial neural and behavioral evidence supports this hypothesis (see, e.g., Fiser, Berkes, Orbán, & Lengyel, 2010; Vul, Goodman, Griffiths, & Tenenbaum, 2014; Denison, Bonawitz, Gopnik, & Griffiths, 2013; Griffiths & Tenenbaum, 2006). Concretely, following Lieder et al. (2014, 2017), we assume that the agent estimates (2) using (self-normalized) importance sampling (Hammersley & Handscomb, 1964; Geweke, 1989), which has been shown to have connections to both neural networks (Shi & Griffiths, 2009) and cognitive process models (Shi, Griffiths, Feldman, & Sanborn, 2010):

Ê = (1 / Σ_{j=1}^{s} w_j) Σ_{i=1}^{s} w_i u(o_i),  ∀i: o_i ∼ q,  w_i = p(o_i)/q(o_i).   (3)

(The optimality of Lieder et al.'s (2014, 2017) model hinges on the number of samples s being large; when s is small, i.e., in the small-sample-size regime, their model is no longer optimal. Our model, however, remains rational for both small and large s.) In Eq.
(3), s denotes the total number of mental simulations performed by the agent, o_i the i-th mentally simulated outcome, u(o_i) the utility of o_i, p the objective probability of event o_i, q the probability distribution the agent adopts for their mental simulations (i.e., the subjective probability of event o_i), and Ê the (self-normalized) importance sampling estimator of E[u(o)] given in (2).

The mean-squared error (MSE) of the estimator in (3), a standard normative measure of the quality of an estimator (Poor, 2013), can be decomposed as E[(Ê − E[u(o)])²] = (Bias[Ê])² + Var[Ê], where the bias Bias[Ê] and variance Var[Ê] of the estimator Ê can be approximated by (Zabaras, 2010):

Bias[Ê] ≈ (1/s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o)) do,   (4)

Var[Ê] ≈ (1/s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do.   (5)

Under mild technical conditions, it can be shown that the rational importance distribution minimizing (an upper bound on) the MSE of the estimator Ê is given by:

q*_meta(o) ∝ p(o) |u(o) − E_p[u(o)]| √(1 + 1/(√s |u(o) − E_p[u(o)]|)),   (6)

where p denotes the objective probability of event o, and E_p[·] the expectation with respect to the distribution p. We refer to q*_meta given in (6) as the metacognitively-rational importance distribution the agent should adopt for mental simulation of events for decision-making under uncertainty. For the derivation of the expression given in (6), the reader is referred to Sec. A-I of the Appendix.

Comparing expressions (1) and (6) reveals that the multiplicative factor √(1 + 1/(√s |u(o) − E_p[u(o)]|)), which we term the metacognitive rationality factor (MCRF), is what sets apart Lieder et al.'s (2014, 2017) cognitively-rational model (see Eq. (1)) from our metacognitively-rational model.
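The limiting behavior of the MCRF can be checked numerically. In the sketch below we read the factor as √(1 + 1/(√s·|u(o) − E_p[u(o)]|)), with E_p[u(o)] = 0 for simplicity: it diverges for mundane events (u(o) near 0) and tends to 1 for extreme events, where the metacognitive distribution reduces to (1).

```python
import numpy as np

def mcrf(u, s, Eu=0.0):
    """Metacognitive rationality factor for utility u and sample budget s,
    read as sqrt(1 + 1/(sqrt(s)*|u - E_p[u]|))."""
    return np.sqrt(1.0 + 1.0 / (np.sqrt(s) * np.abs(u - Eu)))

u = np.linspace(0.01, 10.0, 1000)
f = mcrf(u, s=2)

print(f[0])    # mundane event (u near 0): the factor is large
print(f[-1])   # extreme event: the factor is close to 1

# q*_meta itself stays bounded near u = 0, since
# u * mcrf(u, s) = sqrt(u**2 + u/np.sqrt(s)) -> 0 as u -> 0.
print(u[0] * f[0])
```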
In the remainder of this work, we show that the MCRF plays a crucial role in accounting for two important framing effects in decision-making under risk. It is crucial to note that q*_meta takes into account the amount of time available for making a decision (i.e., time availability), as evidenced by expression (6) explicitly depending on the number of mental simulations s performed by the agent. Following Lieder et al. (2014, 2017), and for ease of exposition, we assume E_p[u(o)] = 0.

A simple investigation of the metacognitively-rational importance distribution q*_meta given in (6) yields an important prediction of our model: extreme eventualities should be over-represented in decision-making, regardless of how rare or frequent they are. Importantly, this effect is already empirically confirmed (Lieder et al., 2017). Note that this coverage is to be expected, as our proposed model subsumes the model outlined in Lieder et al. (2014, 2017).

Figure 1: A metacognitively-rational agent should over-represent extreme events precisely according to Lieder et al.'s (2014, 2017) cognitively-rational model, as evidenced by the curves converging to 1 as u(o) → +∞. Importantly, however, a metacognitively-rational agent should also over-represent mundane events significantly more than what a merely cognitively-rational model prescribes, as evidenced by the curves overshooting in the neighborhood of u(o) = 0.

Importantly, a detailed analysis of the MCRF reveals that a metacognitively-rational agent should over-represent extreme events precisely according to Lieder et al.'s (2014, 2017) cognitively-rational model; however, it should also over-represent mundane events significantly more than what the cognitively-rational model by Lieder et al. prescribes. These findings are depicted in Fig. 1.

Next, we formally show that when the number of samples s is sufficiently large (i.e., in the large-sample-size regime), our proposed metacognitively-rational importance distribution q*_meta converges to the cognitively-rational importance distribution of Lieder et al. (2014, 2017) given in (1). More accurately, in formal terms, q*_meta converges to (1) almost surely, except at u(o) =
0. Notice that despite the unboundedness of the MCRF at u(o) = 0, q*_meta remains bounded at u(o) = 0.

Proposition 1.
When the number of mental simulations s is large, q*_meta converges to the importance distribution given in (1). Formally, assuming u(o) − E_p[u(o)] ≠ 0,

lim_{s→+∞} q*_meta = (1/Z) p(o) |u(o) − E_p[u(o)]|,   (7)

where Z is a normalizing constant (aka the partition function).

For a formal proof of Proposition 1, the reader is referred to Sec. A-II of the Appendix. Proposition 1 formally establishes that our metacognitively-rational model of the Availability bias serves as a rational, meta-level model for the work by Lieder et al. (2014, 2017), with our model converging to Lieder et al.'s when the number of samples s is large. Note that, since Lieder et al.'s importance distribution was specifically derived under the assumption that s is large, the result presented in Proposition 1 intuitively makes sense and, importantly, attests to the claim that our metacognitively-rational model subsumes Lieder et al.'s cognitively-rational model, with the rationality of our model holding for both small and large s while that of Lieder et al.'s holds only for large s.

Past work has documented that people's risk preferences are inconsistent and context-dependent (see, e.g., Tversky & Kahneman, 1992; Markowitz, 1952). For example, in choosing between a safe gamble (low payoff with high probability) and a risky gamble (high payoff with low probability), risk preferences change depending on the probabilities of the gambles (Tversky & Kahneman, 1992), the amounts offered (Markowitz, 1952), and whether the gambles are framed as gains or losses (Tversky & Kahneman, 1992).

In what follows, we show that our metacognitively-rational model can account for two well-known framing effects in human decision-making under risk: the fourfold pattern of risk preferences in outcome probability (Tversky & Kahneman, 1992) and in outcome magnitude (Markowitz, 1952). Thus,
Figure 2:
Accounting for the fourfold pattern of risk preferences in outcome probability (Tversky & Kahneman, 1992), with few samples (s = 2) and the utility function given in (8), based on prospect theory (Tversky & Kahneman, 1992). (a) Our metacognitively-rational model can account for the fourfold pattern of risk preferences in outcome probability, with s = 2. (b) Lieder et al.'s (2014, 2017) cognitively-rational model's prediction for the probability of choosing the risky option, with s = 2.

Our model establishes the first metacognitively-rational basis for those effects.

Framing outcomes as losses rather than gains can reverse people's risk preferences (Tversky & Kahneman, 1992): in the domain of gains, people prefer a lottery (o dollars with probability p) to its expected value (i.e., risk seeking) when p < .
5, but prefer the expected value (i.e., risk aversion) when p > .5. In the domain of losses this pattern reverses: people are risk seeking when p > .5, and risk averse when p < .5.

Following prospect theory (Tversky & Kahneman, 1992), as did Lieder et al. (2014), we assume that the agent's utility function can be modeled by:

u(o) = o^{.88} if o ≥ 0, and u(o) = −|o|^{.88} if o < 0.   (8)

Normatively, people should make their choice depending on whether the expected value of the utility difference Δu(o) is negative or positive:

Δu(o) = u(o) − u(p × o) with probability p, and Δu(o) = −u(p × o) with probability 1 − p.   (9)

Fig. 2(a) shows that our metacognitively-rational model can account for the fourfold pattern of risk preferences in outcome probability (Tversky & Kahneman, 1992), with the utility function given in (8) and very few samples (s = 2). Under the same setting, Lieder et al.'s (2014, 2017) cognitively-rational model fails to reproduce the probability of risky choice suggested by Tversky and Kahneman (1992) with the prospect-theory utility function given in (8); our simulations suggest that this apparent failure also holds for other values of s. However, Lieder et al.'s (2014, 2017) cognitively-rational model can partially account for this effect (see Fig. 3) based on the expected value of the importance sampling estimator given in (3), E[Ê], replicating the finding reported in Lieder et al.'s (2014) Fig. 3.

Figure 3: Expected value of the importance sampling estimator given in (3), E[Ê], as a prediction of the probability of risky choice suggested by Tversky and Kahneman (1992), using the utility function given in (8) based on prospect theory; cf. Fig. 2(b).

In their recent work, Lieder et al. (2017) showed that their cognitively-rational model can better account for the fourfold pattern of risk preferences in outcome probability, provided that the utility function is noisy (efficient neural coding; Summerfield & Tsetsos, 2015); see Fig. 4 in Lieder et al. (2017). The result reported in Fig.
2(a) strongly suggests that this effect can be accounted for by a purely metacognitively-rational model together with a utility function fully consistent with prospect theory (Tversky & Kahneman, 1992), without necessarily having to invoke a noisy utility function (see Lieder et al., 2017, Appendix C).
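The simulation behind this analysis can be sketched as follows. This is a minimal Monte Carlo illustration, not the paper's exact code: it assumes the prospect-theory exponent .88, takes E_p[u(o)] = 0 in the metacognitive importance distribution as in the text, and reads the MCRF as √(1 + 1/(√s·|Δu|)). The agent draws s = 2 samples of the utility difference in (9) from this distribution, forms the self-normalized estimator (3), and picks the lottery when the estimate is positive.

```python
import numpy as np

rng = np.random.default_rng(0)

def u_pt(x, a=0.88):
    # Prospect-theory-style power utility; the exponent .88 is the
    # classic Tversky & Kahneman (1992) estimate (an assumption here).
    return np.sign(x) * np.abs(x) ** a

def q_meta(p, vals, s):
    # Metacognitive importance distribution over a discrete set of
    # utility values, with E_p[u] taken to be 0.
    w = p * np.abs(vals) * np.sqrt(1 + 1 / (np.sqrt(s) * np.abs(vals)))
    return w / w.sum()

def prob_risky(p, o, s=2, reps=5000):
    """Monte Carlo estimate of P(choose the lottery: o dollars w.p. p)
    over its expected value p*o."""
    dvals = np.array([u_pt(o) - u_pt(p * o),   # lottery pays off
                      -u_pt(p * o)])           # lottery pays nothing
    probs = np.array([p, 1.0 - p])
    q = q_meta(probs, dvals, s)
    risky = 0
    for _ in range(reps):
        idx = rng.choice(2, size=s, p=q)          # s mental simulations from q
        w = probs[idx] / q[idx]                   # importance weights p/q
        est = np.sum(w * dvals[idx]) / np.sum(w)  # self-normalized estimator (3)
        risky += est > 0
    return risky / reps

# Gains side of the fourfold pattern: risk seeking for a low-probability
# gain, risk aversion for a high-probability gain.
print(prob_risky(0.05, 100))   # > 0.5
print(prob_risky(0.95, 100))   # < 0.5
```

Both lotteries here have slightly lower expected utility than the sure amount under (8), yet with s = 2 the agent picks the low-probability lottery more often than not, mirroring the human pattern.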
Past work in behavioral economics has documented another curious inconsistency in human decision-making under risk: the fourfold pattern of risk preferences in outcome magnitude (Markowitz, 1952; Hershey & Schoemaker, 1980; Scholten & Read, 2014). Concretely, in choosing between a sure thing and a low-probability risky gamble, people demonstrate the following behavioral pattern: for moderate-to-large outcomes, people are risk-averse for gains and risk-seeking for losses. This pattern reverses when outcomes are small, with people being risk-seeking for gains and risk-averse for losses. For example, people would rather choose a sure 1-million-dollar option than a (low-probability) risky gamble yielding 10 million dollars with probability 0.1. (Specifically, Lieder et al. (2017) adopt the noisy utility function u(o) = o/(o_max − o_min) + ε, where ε is additive Gaussian noise, i.e., ε ∼ N(0, σ²).)

Figure 4:
Simulating the fourfold pattern of risk preferences in outcome magnitude, with few samples (s = 2) and the normalized logarithmic utility function in (10) with α = .032 and the corresponding estimate of β. (a) Our metacognitively-rational model can account for this effect: moving from left to right along the x-axis within the boxed region clearly shows the risk-preference reversal from risk-seeking to risk-aversion (in losses), back to risk-seeking, and finally to risk-aversion (in gains). For ease of visualization, a magnified version of the part lying within the black square is shown at the top right. (b) Lieder et al.'s (2014, 2017) cognitively-rational model's prediction under the same setting as (a).

Scholten and Read (2014) show that, armed with a particular choice of utility function, prospect theory can accommodate this effect. Concretely, they show that prospect theory can best account for this effect by adopting the normalized logarithmic utility function (Rachlin, 1992; Scholten & Read, 2010; Kirby, 2011; Kontek, 2011):

u_nlog(o) = (1/α) log(1 + α·o) if o ≥ 0, and u_nlog(o) = −(λ/β) log(1 − β·o) if o < 0,   (10)

where α, β ∈ R_{>0} and λ ≥ 1. Scholten and Read (2014) estimated α to be 0.
032, with a comparably small estimate for β. In our simulations, we set λ = 1, adopted these estimates of α and β, and used few samples (s = 2). These findings suggest that the fourfold pattern of risk preferences in outcome magnitude could stem from the optimization of a boundedly-rational agent's decision strategy at the metacognitive level, as suggested by (6).

As discussed earlier, a metacognitively-rational agent optimizes their decision strategy (in our case, their importance distribution for mental simulations) according to time availability. This requires the agent to have a good estimate of the number of samples s they will likely draw within the available time frame, using which they can appropriately select their importance distribution q*_meta. However, a crucial question immediately presents itself: what if the agent is inaccurate at approximating the number of samples they get to draw before making their decision? After all, it seems plausible to assume that the agent has only a rough estimate of the parameter s. Thus, it is very likely that there is a mismatch between the number of samples the agent thinks they can draw and the number of samples they actually draw. Our model nicely allows for a quantitative investigation of the effects of such a mismatch: the parameter s in (6) indicates the number of samples the agent thinks they can draw, whereas the parameter s in (3) reflects the number of samples the agent actually draws before making a decision. It is worth noting that the cognitively-rational model by Lieder et al. (2014, 2017) does not permit the investigation of this possible mismatch, as the parameter s does not feature in Lieder et al.'s importance distribution (Eq. (1)).

Intriguingly, our model demonstrates a striking insensitivity to such mismatches: even if the number of samples the agent thinks they can draw is much greater (to be precise, 10 times greater) than the number of samples they actually draw before making their decision, the agent should still show the fourfold patterns.
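This insensitivity can be probed with a small exact computation. The sketch below uses illustrative parameter choices (the prospect-theory exponent .88 is assumed, and the MCRF is read as √(1 + 1/(√s·|Δu|)) with E_p[u] = 0): it builds the metacognitive importance distribution for an assumed sample budget s_assumed, then enumerates all sample pairs for an actual budget of 2 samples; the gains-side pattern survives a tenfold mismatch.

```python
import numpy as np

def u_pt(x, a=0.88):
    # prospect-theory-style power utility (exponent assumed, not fitted)
    return np.sign(x) * np.abs(x) ** a

def q_meta(p, vals, s):
    # metacognitive importance distribution with E_p[u] = 0
    w = p * np.abs(vals) * np.sqrt(1 + 1 / (np.sqrt(s) * np.abs(vals)))
    return w / w.sum()

def p_risky(p, o, s_assumed):
    """Exact P(choose the lottery) when the importance distribution is
    tuned for s_assumed samples but the agent actually draws s = 2,
    enumerating all 2x2 sample pairs."""
    d = np.array([u_pt(o) - u_pt(p * o), -u_pt(p * o)])  # Eq. (9)
    probs = np.array([p, 1.0 - p])
    q = q_meta(probs, d, s_assumed)
    w = probs / q                                  # importance weights
    total = 0.0
    for i in range(2):
        for j in range(2):                         # enumerate sample pairs
            est = (w[i] * d[i] + w[j] * d[j]) / (w[i] + w[j])
            total += q[i] * q[j] * (est > 0)
    return total

# Matched budget (s_assumed = 2) vs. tenfold mismatch (s_assumed = 20):
# the qualitative pattern (risk seeking for the p = .05 gain, risk
# aversion for the p = .95 gain) is unchanged.
for s_a in (2, 20):
    print(s_a, p_risky(0.05, 100, s_a), p_risky(0.95, 100, s_a))
```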
Figures are omitted due to lack of space.

People overestimate the probability of extreme events, and show the fourfold pattern of risk preferences in outcome probability (Tversky & Kahneman, 1992) and in outcome magnitude (Markowitz, 1952) in decision-making under risk; these effects are generally taken as evidence against human rationality. In this work, we presented the first metacognitively-rational process model which can account for those effects, suggesting that they might not be signs of human irrationality after all, but the result of a boundedly-rational decision-maker optimizing their decision strategy (in our case, their importance sampling distribution for performing mental simulations) in accord with time availability. In fact, it can be shown that the metacognitively-rational importance distribution q*_meta in (6) allows the decision-maker to ensure an upper bound on the MSE of their estimator of the expected value in (2) using a minimal number of samples, thereby demonstrating signs of economy (the rational minimalist program; Nobandegani, 2017). Furthermore, our model is remarkably robust to inaccurate estimation of its focal parameter s, positioning it as the first rational process model we know of which scores near-perfectly in optimality, economical use of limited cognitive resources, and robustness, all at the same time.

The metacognitively-rational process model presented in this work and Lieder et al.'s (2014, 2017) cognitively-rational process model seem to suggest that a (boundedly) rationalist approach to understanding human decision-making at the algorithmic level might be a fruitful endeavor.
In fact, the influential Rescorla-Wagner model (1972) and its extension, the temporal-difference learning model (Sutton & Barto, 1987; Sutton & Barto, 1998), can be given solid rational grounds based on linear-Gaussian generative models and the Kalman filtering paradigm, a rational scheme in signal detection theory (Kalman, 1960).

Our model also makes a counterintuitive (normative) prediction, which, if confirmed, would have surprising implications for human decision-making under risk: in choosing between a lottery (o dollars with probability p) and its expected value (p × o), people should qualitatively behave the same under the following two conditions: (i) making a decision based on a mere single sample (i.e., under extremely high time pressure), and (ii) making a decision based on a great many samples (i.e., after a long deliberation time). Note that, given the normative status of our model, this is exactly the behavior that a boundedly-rational agent should manifest, a finding which would be of great interest to the artificial intelligence community. If confirmed, this prediction suggests an intriguing possibility for human decision-making under risk: people's performance after long deliberation times is qualitatively similar to their performance under extremely high time pressure (i.e., s = 1).

To our knowledge, our model is the first metacognitive, resource-rational process model of cognitive biases, and it generally sheds light on possible rational grounds of human decision-making. We hope to have made some progress in this exciting direction.

Acknowledgments:
We would like to thank Falk Lieder for fruitful discussions. This work is supported by an operating grant to TRS from the Natural Sciences and Engineering Research Council of Canada.
Appendix
A-I: Derivation of q*_meta Given in (6)
Using (4)-(5), the mean-squared error (MSE) of the (self-normalized) importance sampling estimator Ê (Eq. (3)) can be written as:

E[(Ê − E[u(o)])²] ≈ (1/s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do + (1/s²) [ ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o)) do ]².

Under the mild technical condition

(1/s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do ≤ [ (1/√s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do ]²,

the following holds:

E[(Ê − E[u(o)])²] ≤ [ (1/√s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do + (1/s) | ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o)) do | ]²
≤ [ (1/√s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do + (1/s) ∫ (p(o)²/q(o)) |E_p[u(o)] − u(o)| do ]².

Next, we show that

q*_meta = argmin_q [ (1/√s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do + (1/s) ∫ (p(o)²/q(o)) |E_p[u(o)] − u(o)| do ].   (A.1)

Forming the Lagrangian,

L(q) = (1/√s) ∫ (p(o)²/q(o)) (E_p[u(o)] − u(o))² do + (1/s) ∫ (p(o)²/q(o)) |E_p[u(o)] − u(o)| do + λ ( ∫ q(o) do − 1 ),

and equating the first variation to zero straightforwardly implies

δL(q)/δq = 0  ⇒  q*_meta ∝ p(o) |u(o) − E_p[u(o)]| √(1/√s + 1/(s |u(o) − E_p[u(o)]|)).

Note that, as a distribution (which integrates to one over all events o), q*_meta is invariant under any multiplicative rescaling by a pure function of s, f(s), which does not involve o. Hence, rescaling the radicand by f(s) = √s, we have:

q*_meta ∝ p(o) |u(o) − E_p[u(o)]| √(1 + 1/(√s |u(o) − E_p[u(o)]|)),

thereby granting the validity of the expression (6) in the main text. Finally, using Jensen's inequality, it is straightforward to show that q*_meta indeed satisfies (A.1). This completes the derivation. □

A-II: Proof of Proposition 1

Proof.
The distribution q*_meta given in (6) can be rewritten as:

q*_meta ∝ p(o) |u(o) − E_p[u(o)]| √(1 + 1/(√s |u(o) − E_p[u(o)]|)).

Assuming u(o) − E_p[u(o)] ≠ 0, the second term under the radical, 1/(√s |u(o) − E_p[u(o)]|), approaches zero as s → +∞, so q*_meta converges to the distribution proportional to p(o)|u(o) − E_p[u(o)]| given in (1). This completes the proof. □

References
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.

Bonawitz, E., Denison, S., Griffiths, T. L., & Gopnik, A. (2014). Probabilistic models, learning algorithms, and response variability: Sampling in cognitive development. Trends in Cognitive Sciences, 18(10), 497–500.

Cruciani, F., Berardi, A., Cabib, S., & Conversi, D. (2011). Positive and negative emotional arousal increases duration of memory traces: Common and independent mechanisms. Frontiers in Behavioral Neuroscience, 5, 86.

Denison, S., Bonawitz, E., Gopnik, A., & Griffiths, T. L. (2013). Rational variability in children's causal inferences: The sampling hypothesis. Cognition, 126(2), 285–300.

Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (2010). Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Sciences, 14(3), 119–130.

Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 273–278.

Gershman, S. J., Vul, E., & Tenenbaum, J. B. (2012). Multistability and perceptual inference. Neural Computation, 24(1), 1–24.

Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica: Journal of the Econometric Society, 1317–1339.

Griffiths, T. L., & Tenenbaum, J. B. (2006). Optimal predictions in everyday cognition. Psychological Science, 17(9), 767–773.

Griffiths, T. L., Vul, E., & Sanborn, A. N. (2012). Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science, 21(4), 263–268.

Hammersley, J., & Handscomb, D. (1964). Monte Carlo Methods. London: Methuen & Co Ltd.

Hershey, J. C., & Schoemaker, P. J. (1980). Prospect theory's reflection hypothesis: A critical examination. Organizational Behavior and Human Performance, 25(3), 395–418.

Hertwig, R., & Pleskac, T. J. (2010). Decisions from experience: Why small samples? Cognition, 115(2), 225–237.

Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3), 430–454.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35–45.

Kirby, K. N. (2011). An empirical assessment of the form of utility functions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(2), 461.

Kontek, K. (2011). On mental transformations. Journal of Neuroscience, Psychology, and Economics, 4(4), 235.

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40.

Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B. (1978). Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory, 4(6), 551.

Lieder, F., Griffiths, T. L., & Hsu, M. (2017). Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review.

Lieder, F., Hsu, M., & Griffiths, T. L. (2014). The high availability of extreme events serves resource-rational decision-making. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 36).

Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60(2), 151–158.

Maule, A. J., & Svenson, O. (1993). Theoretical and empirical approaches to behavioral decision making and their relation to time constraints. In Time Pressure and Stress in Human Judgment and Decision Making (pp. 3–25). Springer.

Nobandegani, A. S. (2017). The Minimalist Mind: On Minimality in Learning, Reasoning, Action, & Imagination. PhD Dissertation, McGill University.

Poor, H. V. (2013). An Introduction to Signal Detection and Estimation. Springer Science & Business Media.

Rachlin, H. (1992). Diminishing marginal value as delay discounting. Journal of the Experimental Analysis of Behavior, 57(3), 407–415.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Research and Theory (pp. 64–99).

Rothman, A. J., Klein, W. M., & Weinstein, N. D. (1996). Absolute and relative biases in estimations of personal risk. Journal of Applied Social Psychology, 26(14), 1213–1236.

Scholten, M., & Read, D. (2010). The psychology of intertemporal tradeoffs. Psychological Review, 117(3), 925.

Scholten, M., & Read, D. (2014). Prospect theory and the forgotten fourfold pattern of risk preferences. Journal of Risk and Uncertainty, 48(1), 67–83.

Shi, L., & Griffiths, T. L. (2009). Neural implementation of hierarchical Bayesian inference by importance sampling. In Advances in Neural Information Processing Systems (pp. 1669–1677).

Shi, L., Griffiths, T. L., Feldman, N. H., & Sanborn, A. N. (2010). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review, 17(4), 443–464.

Sunstein, C. R., & Zeckhauser, R. (2011). Overreaction to fearsome risks. Environmental and Resource Economics, 48(3), 435–449.

Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 355–378).

Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

Svenson, O. (1993). Time Pressure and Stress in Human Judgment and Decision Making. Springer Science & Business Media.

Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2), 207–232.

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323.

Von Neumann, J., & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press.

Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (2014). One and done? Optimal decisions from very few samples. Cognitive Science, 38(4), 599–637.