Confidence biases and learning among intuitive Bayesians

Abstract

We design a double-or-quits game to compare the speed of learning one's specific ability with the speed of rising confidence as the task gets increasingly difficult. We find that people on average learn to be overconfident faster than they learn their true ability and we present an intuitive-Bayesian model of confidence which integrates confidence biases and learning. Uncertainty about one's true ability to perform a task in isolation can be responsible for large and stable confidence biases, namely limited discrimination, the hard--easy effect, the Dunning--Kruger effect, conservative learning from experience and the overprecision phenomenon (without underprecision) if subjects act as Bayesian learners who rely only on sequentially perceived performance cues and contrarian illusory signals induced by doubt. Moreover, these biases are likely to persist since the Bayesian aggregation of past information consolidates the accumulation of errors and the perception of contrarian illusory signals generates conservatism and under-reaction to events. Taken together, these two features may explain why intuitive Bayesians make systematically wrong predictions of their own performance.


Theory and Decision manuscript No. (will be inserted by the editor)

Confidence Biases and Learning among Intuitive Bayesians

Louis Lévy-Garboua · Muniza Askari · Marco Gazel



Keywords

Confidence biases · intuitive-Bayesian · learning · double or quits experimental game · doubt · contrarian illusory signals

Louis Lévy-Garboua
Paris School of Economics, Université Paris 1 Panthéon-Sorbonne, and Centre d'Economie de la Sorbonne, 106-112 Bd de l'Hôpital, 75013 Paris, France
Tel.: +33 6 85 11 18 68
E-mail: [email protected]

Muniza Askari
Centre d'Economie de la Sorbonne
E-mail: [email protected]

Marco Gazel
Paris School of Economics, Université Paris 1 Panthéon-Sorbonne, and Centre d'Economie de la Sorbonne, 106-112 Bd de l'Hôpital, 75013 Paris
E-mail: [email protected]

In many circumstances, people appear to be "overconfident" in their own abilities and good fortune. This may occur when they compare themselves with others, massively finding themselves "better-than-average" in familiar domains (e.g., Svenson 1981, Kruger 1999), when they overestimate their own absolute ability to perform a task (e.g., Lichtenstein and Fischhoff 1977, Lichtenstein et al 1982), or when they overestimate the precision of their estimates and forecasts (e.g., Oskamp 1965). Moore and Healy (2008) designate these three forms of overconfidence respectively as overplacement, overestimation, and overprecision. We shall here be concerned with how people overestimate, or sometimes underestimate, their own absolute ability to perform a task in isolation. Remarkably, however, our explanation of the estimation bias predicts the overprecision phenomenon as well.

The estimation bias refers to the discrepancy between ex post objective performance (measured by frequency of success in a task) and ex ante subjectively held confidence (Lichtenstein et al, 1982). It was first interpreted as a cognitive bias caused by the difficulty of the task (e.g., Griffin and Tversky 1992). It is the so-called "hard-easy effect" (Lichtenstein and Fischhoff, 1977): people underestimate their ability to perform an easy task and overestimate their ability to perform a difficult task. However, a recent literature has challenged this interpretation by seeking to explain the apparent over/underconfidence by the rational-Bayesian calculus of individuals discovering their own ability through experience and learning (Moore and Healy, 2008, Grieco and Hogarth, 2009, Benoît and Dubra, 2011, Van den Steen, 2011). While the cognitive bias view describes self-confidence as a stable trait, the Bayesian learning perspective points at the experiences leading to over- or under-confidence. The primary goal of this paper is to propose a parsimonious integration of the cognitive bias and the learning approach.

We design a real-effort experiment which enables us to test the respective strengths of estimation biases and learning. People enter a game in which the task becomes increasingly difficult, i.e. risky, over time. By comparing, for three levels of difficulty, the subjective probability of success (confidence) with the objective frequency at three moments before and during the task, we examine the speed of learning one's ability for this task and the persistence of overconfidence with experience. We conjecture that subjects will be first underconfident when the task is easy and become overconfident when the task is getting difficult. However, "difficulty" is a relative notion and a task that a low-ability individual finds difficult may look easy to a high-ability person. Thus, we should observe that overconfidence declines with ability and rises with difficulty. The question raised here is the following: if people have initially an imperfect knowledge of their ability and miscalibrate their estimates, will their rising overconfidence as the task becomes increasingly difficult be offset by learning, and will they learn their true ability fast enough to stop the game before it is too late?

The popular game "double or quits" fits the previous description and will thus inspire the following experiment. A modern version of this game is the world-famous TV show "Who Wants to Be a Millionaire". In the games of "double or quits" and "Who Wants to Be a Millionaire", players are first given a number of easy questions to answer so that most of them win a small prize. At this point, they have an option to quit with their prize or double by pursuing the game and answering a few more questions of increasing difficulty. The same sort of double or quits decision may be repeated several times in order to allow enormous gains in case of repeated success. However, if the player fails to answer one question, she must step out of the game with a consolation prize of lower value than the prize that she had previously declined.

Our experiment reproduces the double or quits game. We observe that subjects are under-confident in front of a novel but easy task, whereas they feel overconfident and willing to engage in tasks of increasing difficulty to the point of failing.

We propose a new model of "intuitive Bayesian learning" to interpret the data and draw new testable implications. Our model builds on ideas put forward by Erev et al (1994) and Moore and Healy (2008). It is Bayesian like Moore and Healy (2008), while viewing confidence as a subjective probability of success, like Erev et al (1994). However, it introduces intuitive rationality to overcome a limitation of the rational-Bayesian framework, which is to describe how rational people learn from experience without being able to predict the formation of confidence biases before completion of a task. This is not an innocuous limitation because it means, among other things, that the rational-Bayesian theory is inconsistent with the systematic probability distortions observed in decisions under risk or uncertainty since the advent of prospect theory (Kahneman and Tversky, 1979).
Therefore, we need to go deeper into the cognitive process of decision. Subjects in our view derive their beliefs exclusively from their prior and the informative signals that they receive. However, "intuitive Bayesians" decide on the basis of the sensory evidence that they perceive sequentially. If they feel uncertain of their prior belief, they will perceive the objection to it triggered by their doubt and wish to "test" its strength before making their decision, like those decision makers weighing the pros and cons of an option. The perceived objection to a rational prior acts like a contrarian illusory signal that causes probability distortions in opposition to the prior, and this is a cognitive mechanism that does not require completion of the task. As they gain experience, they keep on applying Bayes' rule to update their prior belief both by cues on their current performance and by the prior-dependent contrarian signal. Thus, with the single assumption of intuitive rationality, we can account for all the cognitive biases described in our data within the Bayesian paradigm and integrate the cognitive bias and the learning approach. With this model, and in contrast with Gervais and Odean (2001), we don't need to assume a self-attribution bias (Langer and Roth, 1975, Miller and Ross, 1975) combined with Bayesian learning to produce overconfidence. Signals of future success and failure are treated symmetrically. Finally, unlike models of confidence management (e.g. Brunnermeier and Parker 2005, Köszegi 2006, Mobius et al 2014), we don't have to postulate that individuals manipulate their beliefs and derive direct utility from optimistic beliefs about themselves.

Section 2 lays down the structure of the experiment and incentives, and provides the basic descriptive statistics. Our large data set allows a thorough description of confidence biases and a dynamic view of their evolution with experience of the task. Section 3 describes the confidence biases and learning shown by our data. Four basic facts about confidence are reported from our data: (i) limited discrimination among different tasks; (ii) miscalibration of subjective probabilities of success elicited by the "hard-easy effect"; (iii) differential, ability-dependent, calibration biases known as the Dunning-Kruger (or ability) effect (Kruger and Dunning, 1999); and (iv) local, but not global, learning. Section 4 proposes a new theory of over (under)-confidence among intuitive Bayesians which integrates doubt and learning and can predict biases, before as well as during the task, in repeated as well as in single trials. Doubt-driven miscalibration appears to be a sufficient explanation, not only for the hard-easy effect and the 'ability' or Dunning-Kruger effect, but also for limited discrimination and for the overprecision phenomenon. The theory is further used in section 5 to predict the evolution of confidence over experience on our data set. For instance, low-ability subjects first lose confidence when they discover their low performance during the first and easiest level; but they eventually regain their initial confidence in their own ability to perform more difficult tasks in the future after laborious but successful completion of the first level.
Intuitive Bayesians exhibit conservatism, that is, under-reaction to received information, and slow learning. Finally, we show in sub-section 5.3 that the cues upon which subjects construct their own estimate of success, i.e. confidence, widely differ from the genuine predictors of success, which further explains the planning fallacy. The conclusion follows in section 6.

Footnote: Using German survey data about stock market forecasters, Deaves et al (2010) do not confirm that success has a greater impact than failure on self-confidence, which casts doubt on the self-attribution bias explanation.

Footnote: In studies where subjects are free to stay or to leave after a negative feedback, subjects who most strongly update their confidence in their future success in response to a negative feedback are selectively sorted out of the sample. This creates an asymmetry in measured responses to positive and negative feedback. Such spurious asymmetry does not exist in the present experiment, because subjects who fail to reach one level must drop out of the game.

Footnote: The planning fallacy is the tendency to underestimate the time needed for completion of a task. See, e.g., Buehler et al (2002).

[...] i.e., 6 anagrams per round to be solved in no more than eight minutes). It is long enough to let participants feel that a large effort and ability is required of them to succeed at the optional upper levels. It also leaves them ample time to learn the task. The middle and high levels, which come next, comprise 3 rounds each.

The gradient of task difficulty was manipulated after completion of the training level and two conditions are available: (i) in the 'wall' condition, the difficulty jumps sharply at middle level, but remains constant at high level; (ii) in the 'hill' condition, the difficulty always rises from one level to the next, slowly first at middle level, then sharply at high level.

By the end of the experiment, the required number of anagrams is the same for the 'wall' and 'hill' conditions.
However, the distribution of anagrams to be decoded differs for these two conditions. In the wall condition, ten anagrams per round are proposed at the middle and high levels, of which 20 anagrams at least must be decoded per level. In the hill condition, eight anagrams per round are proposed at middle level, and this rises to twelve anagrams at high level. Decoding sixteen anagrams in three rounds is required for middle level; and decoding twenty-four anagrams in three rounds is required for high level. This design can be visualized in Figure 1. The same figure appears (without the legends) on the screen before each round.

Footnote: The screen highlights the round, the number of correct anagrams cumulated during the current level and the number of anagrams needed to pass this level.

The manipulation of the 'wall' and 'hill' conditions gave rise to three treatments:
– Wall treatment (wall): the wall condition is imposed on participants who passed the training level;
– Hill treatment (hill): the hill condition is imposed on participants who passed the training level;
– Choice treatment (choice): a choice between the two conditions (wall or hill) is offered to participants who passed the training level.

The double or quits game is played under these three treatments. All subjects first go through the training level. Those who were successful, i.e., those who solved at least 36 anagrams during the training level, will then be asked to double or quits:
– Double: Continue to the next level to win a substantial increase in earnings;
– Quits: Stop the experiment and take your earnings.

Participants who decide to go to middle level get a consolation prize that is lower than the foregone earnings if they fail or drop out before the third round. If they succeed at the middle level, they will be asked again to double or quits. The same rules apply for high level at rising levels of earnings. The potential gains (in Euros) were (10, 2) at the training level, that is, 10 € for successful quitters and 2 € for failures, (14, 4) at middle level, and (26, 11) at high level.

Fig. 1 Decision problem perceived by participants at the start of level 2 of the choice treatment.

Notes: Payoffs in parentheses: (fail, success and stop). Decisions I, II and III are conditioned to success in the previous level. Decision II depends on the treatment. Estimation of Confidence After is conditioned to success in the first level and decision to start the second level.

2.2 Experimental sessions

We ran 24 sessions for a total of 410 participants, half for the choice treatment and the other half equally split between the 'wall' and 'hill' treatments. Eight sessions were run in the BULCIRANO lab (Center for Interuniversity Research and Analysis on Organizations), Montreal (Canada), and the same number of sessions were conducted at the LEEP (Laboratoire d'Economie Expérimentale de Paris), Pantheon-Sorbonne University. The difference between Paris and Montreal was observed to be insignificant. Thus, eight additional sessions were conducted at LEEP in order to acquire robust results. A show-up fee of 5 € in Paris and Can$ 5 in Montreal was paid to the participants (from now on, all money amounts will be given in Euros). About 80% of the participants were students.

At the start, instructions were read out and a hard copy was also provided to each participant.
Participants answered six questions to test their full comprehension of the experiment. Information on gender, age, educational level and labor market status was required. The last question was a hypothetical choice between 5 € for sure and an ambiguous urn containing 100 balls of two colors (white and black) in unknown proportions. Ten Euros (10 €) were to be earned if a black ball was drawn. Choice of the sure gain provided a rough but simple measure of risk aversion in the uncertainty context of the experiment.

2.3 Descriptive statistics

The main descriptive statistics for the three treatments are reported in Table 1.

Table 1

Descriptive statistics for the three treatments

                                   Treatments
Variables                  Wall         Hill         Choice

Male                       56%          48%          49%
Age                        24.5         25.8         25.1
Risk Averse                54%          59%          51%
Payments                   9.1          8.9          7.8
Total anagrams solved      55.6         53.7         54.3
Ability
Number of observations     101          106          203

Decision to double conditional on success at previous level:

Middle level               78% (91)     76% (90)     77% (176)
High level                 95% (22)     72% (29)     82% (34)

Notes: Decision to double to High level: difference between the "Wall" and "Hill" treatments is significant at 5%; all other differences are not significant at 10% level (t-test). Number of participants successfully clearing the previous level is in parentheses.

The results of tests show that the three samples are homogeneous. No significant difference is observed among the samples' means for individual characteristics. As expected, the 'wall' and 'hill' treatments had a substantial impact on the decision to double upon clearing the middle level. Almost everybody doubles in the 'wall' treatment after clearing the middle level because the high level is no more difficult than the middle level. In contrast, only 72% enter the high level in the 'hill' treatment as the difficulty gradient is very steep (t-test: t = 2.20; p-value = 0.033). In spite of these differences, the number of anagrams solved and payments may be considered equal among treatments at the usual level of significance.

Footnote: Ability is measured by the number of anagrams solved per minute in the first 4 rounds. It lies in the interval [0,6].
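The Wall versus Hill comparison above can be approximately reproduced from the figures in Table 1. The sketch below is only an illustration, not the authors' code. It assumes that the reported test is a pooled-variance two-sample t-test on binary double/quit decisions, and it infers the counts of doublers (21 of 22 in Wall, 21 of 29 in Hill) from the rounded percentages. Under those assumptions the statistic comes out close to the reported t = 2.20.

```python
import math

def pooled_t_statistic(successes_a, n_a, successes_b, n_b):
    """Two-sample pooled-variance t statistic for 0/1 outcomes."""
    mean_a = successes_a / n_a
    mean_b = successes_b / n_b
    # For a 0/1 sample, the sum of squared deviations equals n * p * (1 - p).
    ss_a = n_a * mean_a * (1 - mean_a)
    ss_b = n_b * mean_b * (1 - mean_b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err

# Assumed counts, inferred from Table 1: 95% of 22 (Wall) and 72% of 29 (Hill).
t_stat = pooled_t_statistic(21, 22, 21, 29)
print(f"t = {t_stat:.2f}")
```

The p-value is left out because the standard library has no Student t distribution; a statistics package would be needed to check the reported 0.033.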

Subjects can also be grouped in three different levels of ability, according to the number of anagrams solved per minute in the first 4 rounds: high ability (first tercile), medium ability (second tercile) and low ability (last tercile). Some descriptive statistics for the three ability groups are reported in Table 2. The three groups are homogeneous in terms of gender and risk aversion but a slightly greater proportion of low-ability subjects can be found among older, probably non-student, participants.
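The grouping by ability tercile can be sketched as follows. This is a hypothetical illustration: the ability values are made up, the function name is ours, and the way ties at the cut points are assigned is an assumption (the paper does not say how they were handled).

```python
import statistics

def ability_groups(abilities):
    """Label each subject 'high', 'medium' or 'low' by ability tercile."""
    # quantiles(..., n=3) returns the two cut points separating the terciles.
    low_cut, high_cut = statistics.quantiles(abilities, n=3)
    labels = []
    for ability in abilities:
        if ability > high_cut:
            labels.append("high")
        elif ability > low_cut:
            labels.append("medium")
        else:
            labels.append("low")  # assumption: ties fall in the lower group
    return labels

# Hypothetical abilities (anagrams solved per minute in the first 4 rounds).
sample = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0]
groups = ability_groups(sample)
print(groups)
```

With tied ability scores the three groups need not be of equal size.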

Table 2

Descriptive statistics by ability level

                                  Level of ability                  Difference
Variables                  High        Medium      Low          M-H    L-M    L-H

Male                       47%         54%         50%          ns     ns     ns
Age                        23.6        24.5        27.2         ns     ***    ***
Risk Aversion              53%         50%         59%          ns     ns     ns
Payments                   11.7        7.7         6.0          ***    **     ***
Number anagrams solved     67.7        53.8        42.6         ***    ***    ***
Ability                    4.5         2.4         1.1          ***    ***    ***
Number of observations     131         142         137

Decision to double conditional on success at previous level:

Middle level               91% (128)   81% (127)   54% (102)    **     ***    ***
High level                 87% (55)    72% (25)    80% (5)      *      ns     ns

Notes: Significance level: * 10%; ** 5%; *** 1%; ns: not significant at 10% level (t-test). Number of participants successfully clearing the previous level is in parentheses.

Table 2 shows that "ability" strongly discriminates among participants in terms of performance (total anagrams solved, payments) and quits before the middle level. However, the training level was meant to be easy enough that three-quarters (102 of 137) of low-ability subjects would pass it.

2.4 Confidence judgments

Participants were asked to state their subjective probability of success for the three levels and at three moments: before, during, and after the training level. Before beginning the game, they were shown a demonstration slide which lasted one minute. Anagrams of the kind they would have to solve appeared on the screen with their solution. Then, they were asked to assess their chances of success on a scale of 0 to 100 (Adams, 1957), and the game started for real. After four rounds of decoding anagrams, players were asked again to rate their confidence. Lastly, players who had passed the training level and decided to double re-estimated their chances of success for the middle and high levels.

The Adams (1957) scale that we used is convenient for quantitative analysis because it converts confidence into (almost) continuous subjective probabilities. It was required for consistency that the reported chances of success do not increase as the difficulty level increased. Answers could not be validated as long as they remained inconsistent. Subjects actually used the whole scale but, before the experiment, 14% expressed absolute certainty that they would succeed at the first level and only 1 participant was sure that she would fail.

We did not directly incentivize beliefs because our primary aim was not to force subjects to make optimal forecasts of their chances of success but to have them report sincerely their true beliefs in their attempt to maximize their subjective expected utility, and to observe the variation of such beliefs with experience.
The true beliefs are those which dictate actual behavior following such prediction, and the latter was incentivized by the money gains based on subjects' decisions to double or quits and performance in the task. Armantier and Treich (2013) have recently generalized previous work on proper scoring rules (see their extensive bibliography). They show that, when subjects have a financial stake in the events they are predicting and can hedge their predictions by taking additional action after reporting their beliefs, use of any proper scoring rule generates complex distortions in the predictions and further behavior since these are not independent and are in general different from what they would have been if each had been decided separately. In the present context, final performance yields income and does not immediately follow the forecast. Hence, incentivizing forecasts might force subjects to try and adjust gradually their behavior to their forecast and, therefore, unduly condition their behavior. A further difficulty in this experiment was that, by incentivizing beliefs on three successive occasions, we would have induced risk-averse subjects to diversify their reported estimates as a hedge against the risk of prediction error.

Self-report methods have been widely used and validated by psychologists and neuroscientists; and recent careful comparisons of this method with the quadratic scoring rule found that it performed as well (Clark and Friesen, 2009) or better (Hollard et al, 2015) than the quadratic scoring rule. Considering that self-reports perform nicely while being much simpler and faster than incentive-compatible rules, use of the self-report seemed appropriate in this experiment.

Footnote: After the subject has reported a probability $p$, the quadratic scoring rule imposes a cost that is proportional to $(1 - p)^2$ in case of success and to $(0 - p)^2$ in case of failure. The score takes the general form: $S = a - b \cdot \mathrm{Cost}$, with $a, b > 0$.
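To see why the quadratic scoring rule is called incentive-compatible, the sketch below computes the expected score of a risk-neutral subject and searches for the report that maximizes it. It is a minimal illustration: it assumes the cost is the squared error between the report and the 0/1 outcome, as the rule's name suggests, and the values of `a`, `b` and the true probability are arbitrary choices of ours.

```python
def quadratic_cost(report, success):
    """Squared distance between the reported probability and the outcome."""
    outcome = 1.0 if success else 0.0
    return (outcome - report) ** 2

def expected_score(report, true_p, a=1.0, b=1.0):
    """Expected value of S = a - b * Cost when success has probability true_p."""
    expected_cost = (true_p * quadratic_cost(report, True)
                     + (1 - true_p) * quadratic_cost(report, False))
    return a - b * expected_cost

# Hypothetical subject whose true probability of success is 0.70.
true_p = 0.7
grid = [i / 100 for i in range(101)]
best_report = max(grid, key=lambda r: expected_score(r, true_p))
print(best_report)
```

The expected cost equals (report - true_p)^2 plus a constant, so a risk-neutral subject does best by reporting the true probability. Risk aversion or hedging opportunities, as discussed in the text, can break this property.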
Footnote: The second study also included the lottery rule in the comparison and found that the latter slightly outperformed self-report. The lottery rule rests on the following mechanism: after the subject has reported a probability p, a random number q is drawn. If q is smaller than p, the subject is paid according to the task. If q is greater than p, the subject is paid according to a risky bet that provides the same reward with probability q. The lottery rule cannot be implemented on our design.

[...] whatsoever of the characteristics, nor even the existence, of the other path.

Result 1 (Limited discrimination): Subjects do not perceive differences of difficulty between two different tasks in the future unless such differences are particularly salient. Moreover, they are not forward-looking, in the sense that they are unable to anticipate the increased likelihood of their success at the high level conditional on passing the middle level. However, they can be sophisticated when it is time for them to choose.

Support of result 1:

Table 3 compares confidence judgments regarding the three levels of difficulty among the 'wall' and the 'hill' subjects before, during, and after the training period. Although the 'wall' and 'hill' were designed to be quite different at the middle and high levels, the subjective estimates of success exhibit almost no significant difference at any level. The single exception concerns the early estimate (before round 1) regarding the high level for which the difference of gradient between the two paths is particularly salient. However, the difference ceases to be significant as subjects acquire experience of the task. This striking observation suggests that individuals are unable to discriminate distinctive characteristics of the task unless the latter are particularly salient.

Perhaps even more disturbing is the fact that, in Table 3, subjects discount their confidence level from the middle to the high level as much in the Wall as in the Hill treatment. For instance, just before the middle level, the ratio of confidence in passing the high level to confidence in passing the middle level was close to 0.70 in both treatments. However, a perfectly rational agent should realize that the high level is no more difficult than the middle level in the Wall treatment whereas it is much more difficult in the Hill treatment. Thus, she should report almost the same confidence at both levels in the Wall treatment, and a considerably lower confidence at the high level in the Hill treatment. The latter observation suggests that most individuals are unable to compute conditional probabilities accurately even when the latter is equal to one as in the Wall treatment. They don't anticipate that, if they demonstrate the ability to solve 20 anagrams or more at middle level, they should be almost sure to solve 20 or more at the high level. However, subjects do make the right inference when it is time for them to make the decision since 95% of subjects who passed the middle level in the Wall treatment decided to continue (Table 1). And, if they have a choice between Wall and Hill, they do make a difference between these two tracks: 71.4% of doublers then prefer the Wall track although they would have greater chances of success at the middle level if they chose Hill. This observation suggests that subjects did not maximize their immediate probability of success but made a sophisticated comparison of the expected utility of both tracks, taking the option value of Wall into consideration before making an irreversible choice of track spanning over two periods.

Footnote: We are grateful to Luis Santos-Pinto for making the last point clear in early discussions.
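The option-value argument can be made concrete with a small expected-value calculation. The sketch below is a simplification of ours, not the authors' model: it assumes a risk-neutral subject, uses the payoffs stated in Section 2.1 ((14, 4) at the middle level and (26, 11) at the high level), and plugs in purely hypothetical success probabilities for each track.

```python
# Payoffs in euros from Section 2.1: (success and stop, failure) per level.
PAYOFFS = {"training": (10, 2), "middle": (14, 4), "high": (26, 11)}

def break_even_probability(stop_now, win, lose):
    """Success probability at which doubling and quitting pay the same."""
    return (stop_now - lose) / (win - lose)

def track_value(p_middle, p_high):
    """Expected payoff of entering a track, with the option to stop after middle."""
    win_mid, lose_mid = PAYOFFS["middle"]
    win_high, lose_high = PAYOFFS["high"]
    # After passing the middle level, the subject doubles only if it pays.
    continue_value = p_high * win_high + (1 - p_high) * lose_high
    value_after_middle = max(win_mid, continue_value)
    return p_middle * value_after_middle + (1 - p_middle) * lose_mid

# Hypothetical probabilities: Wall is harder at middle level but, once
# passed, almost guarantees the high level; Hill is the reverse.
wall_value = track_value(p_middle=0.50, p_high=0.95)
hill_value = track_value(p_middle=0.60, p_high=0.40)
print(wall_value, hill_value)
```

With these illustrative numbers the Wall track is worth more than the Hill track even though its immediate probability of success is lower, which is the pattern the choice data suggest. The same payoffs imply break-even probabilities of 0.6 for entering the middle level and 0.2 for entering the high level when continuation is ignored.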

Table 3

A comparison of confidence for the wall and hill treatments shown separately

                               No-choice treatment
Subjective confidence      Wall (%)     Hill (%)     Difference

Before round 1:
  Level 1                  80           77           ns
  Level 2                  62           58           ns
  Level 3                  47           40           **

Before round 5:
  Level 1                  71           71           ns
  Level 2                  53           52           ns
  Level 3                  40           36           ns

Before round 10:
  Level 2                  60           56           ns
  Level 3                  43           39           ns

Notes. Observations: Before rounds 1 and 5 (before round 10): 101 (71) for wall and 106 (68) for hill. Significance level: * p < 0.10, ** p < 0.05, *** p < 0.01, ns: not significant at 10% level.

Result 2 (The hard-easy effect): In comparison with actual performance, confidence in one's ability to reach a given level is underestimated for a novel but relatively easy task (the training level); and it is overestimated for the subsequent more difficult tasks (the middle and high levels). Overconfidence increases in relative terms with the difficulty of the task. Conditional on an initial success (training level) and on the decision to continue, confidence in one's ability to reach higher levels is still overestimated. Thus, initially successful subjects remain too optimistic about their future.

Support for result 2:

Figure 2 compares the measured frequency of success with the reported subjective confidence in the three successive levels of increasing difficulty. For the middle and high levels, we also indicate these two probabilities as they appear before the training period and after it conditional on doubling. The Choice and No-choice conditions have been aggregated on this figure because no significant difference was found in the result of tests.

The task required at the training level was relatively easy for our subjects since 87% passed this level. However, subjects started it without knowing what it would be like and, even after four rounds of training, they underestimated their own ability, reporting only a 77% probability of success. The difference between the two percentages is significant (t=5.77, p=0.000; t-test). Hence, individuals are under-confident on the novel but relatively easy task.

In contrast, subjects appear to be overconfident as the task gets increasingly difficult. They consistently diminish their estimated probabilities of success but do not adjust their estimates in proportion to the difficulty of the task. Thus, individuals tend to overestimate their own chances for the advanced levels. The difference between the frequency of success and confidence before the task is always significant, both at the middle level (t=18.3, p=0.000) and at the high level (t=17.1, p=0.000).

Fig. 2 Hard-easy effect observed at three levels

Notes: Observations: before training level (N: 410); after training level (N: 275 - analysis restricted to doublers). Differences between frequency of success and confidence (before and after) are significant at 1% at all levels (Training, Middle and High) (t-test).

The same conclusions hold conditional on passing the training level and choosing to double. Subjects remain overconfident in their future chances of success. However, their confidence does not rise after their initial success in proportion to their chances of further success.

3.3 The ability effect

Result 3 (The ability effect): Overcalibration diminishes with task-specific ability.

Support for result 3:

The hard-easy effect is reproduced on Figures 3a, 3b, 3c for the three ability terciles. Low-ability subjects are obviously more overconfident at middle and high levels relative to high and medium-ability individuals. This result confirms earlier observations of Kruger and Dunning (1999) among others (see Ryvkin et al (2012) for a recent overview and incentivized experiments). The so-called Dunning-Kruger effect has been attributed to a metacognitive inability of the unskilled to recognize their mistakes. We give here another, and in our opinion, simpler explanation. The ability (or Dunning-Kruger) effect may be seen as a corollary of the hard-easy effect because "difficulty" is a relative notion and a task that a low-ability individual finds difficult certainly looks easier to a high-ability person. Thus, if overconfidence rises with the difficulty of a task, it is natural to observe that it declines on a given task with the ability of performers.

Footnote: Difference between confidence and frequency of success is significant at 1% for all ability levels. For these figures, we selected confidence reported after the 4th round (during training level) in order to minimize the impact of mismeasurement.

Footnote: The Dunning-Kruger effect initially addressed general knowledge questions whereas we consider self-assessments of own performance in a real-effort task.

Footnote: Our explanation may also be better than the initial explanation according to which the unskilled are unaware of their lower abilities. Miller and Geraci (2011) found that students with poor abilities showed greater overconfidence than high-performing students, but they also reported lower confidence in these predictions.

Fig. 3a Under-confidence at the training level, by ability.

Fig. 3b Overconfidence at middle level, by ability.

Fig. 3c Overconfidence at high level, by ability.

Result 4 (Learning is local, not global): Confidence and performance co-vary during the task. Subjects learned locally upon experiencing variations in their performance. However, they didn't learn globally in our experiment, since doublers remained as confident as before after completing the training level irrespective of their true ability level.

Support for result 4: Figures 4 and 5 describe confidence by ability group before, during, and after the training period for the middle and high level respectively whereas Figure 6 describes the variation of performance of the same groups within the same period. These graphs, taken together, show a decline in both (ability-adjusted) confidence and performance during the first four rounds, followed by a concomitant rise of confidence and performance in the following rounds. The observed decline of confidence at the beginning of the training period can be related on Figure 6 to the fact that participants solved fewer and fewer anagrams per period during the first four periods: 5.51 on average in period 1, 5.18 in period 2, 4.60 in period 3, and 4.17 in period 4. Subjects kept solving at least two-thirds of the anagrams available during the training session but probably lost part of their motivation on repeating the task. On sequentially observing their declining performance, they revised their initial estimate of future success downward. However, on being asked to report their confidence after four rounds, they became conscious of their performance decline and responded to this information feedback. Performance rose sharply but momentarily during the next two rounds. The average performance first rose to 4.37 in period 5 and 5.05 in period 6, then sharply declined to 4.39 in period 7, 4.06 in period 8 and 3.48 in period 9. As soon as subjects became (almost) sure of passing the training level, they diminished their effort. During the experiment it was also observed that individuals stopped decoding further anagrams as soon as the minimum requirement to clear a level was fulfilled.

Subjects experiencing low (medium) performance in the first rounds seem to learn locally that they have a low (medium) ability since the confidence gap widens during the first four periods. However, this learning effect is short-lived since the confidence gap shrinks back to its initial size after low (medium)-ability subjects strove to succeed, increasing their performance (as reported on Figure 6) and regaining confidence. Eventually, experienced "doublers" are as confident of succeeding at higher levels as they were before the task, irrespective of their ability level: there is no global learning effect. We share the conclusion of Merkle and Weber (2011) that the persistence of prior beliefs is inconsistent with fully rational-Bayesian behavior (see also Benoît et al 2015).

Footnote: No significant difference was found between the Choice and No-choice conditions, suggesting that the option to choose the preferred path does not trigger an illusion of control.

Footnote 13: Participants who reported confidence after the training period were more able than average since they had passed this level and decided to double. Thus, we compare ability-adjusted confidence Before and During with the reported confidence After. The ability-adjusted confidence Before and During are obtained by running a simple linear regression of confidence Before and During on ability, measured by the average number of anagrams solved per minute in the first 4 rounds of the training level. The estimated effect of superior ability of doublers was added to confidence During or Before to get the ability-adjusted confidence which directly compares with the observed confidence After.

Footnote: With a single exception, confidence variations are statistically significant at 1% level in the middle and high levels. There was no significant difference between treatments.

Fig. 4

Variation of confidence with experience, by level of ability: middle level

Notes.

Sample size: 410 individuals for Before and During, and 275 for After (only doublers). We report the adjusted ability for doublers, see Footnote 13 for more details.

Differences between ability levels are significant at 1% level Before and During. Differences After are not significant at 10% level.

Differences by ability level: High-ability: During-Before: ***; After-During: ns; After-Before: ns. Medium-ability: During-Before: ***; After-During: ***; After-Before: **. Low-ability: During-Before: ***; After-During: ***; After-Before: ns.

Significance level: *** 1%; ** 5%; * 10%; ns: not significant at 10% level (t-test).

Fig. 5

Variation of confidence with experience, by level of ability: high level

Notes.

Sample size: 410 individuals for Before and During, and 275 for After (only doublers). We report the adjusted ability for doublers, see Footnote 13 for more details.

Differences between ability levels are significant at 1% level Before and During. Differences After are not significant at 10% level.

Differences by ability level: High-ability: During-Before: ns; After-During: ns; After-Before: ns. Medium-ability: During-Before: ***; After-During: **; After-Before: ns. Low-ability: During-Before: ***; After-During: ***; After-Before: ns.

Significance level: *** 1%; ** 5%; * 10%; ns: not significant at 10% level (t-test).

Fig. 6

Number of anagrams solved per round by level of ability

We present now a simple Bayesian model that describes absolute confidence reported before and during completion of a task, and predicts limited discrimination, the hard-easy effect and the ability effect. It builds on ideas put forward by Erev et al (1994) and Moore and Healy (2008) who both consider that confidence, like most judgments, is subject to errors. Erev et al (1994) view confidence as a subjective probability that must lie between 0 and 1. Hence, probabilities close to 1 are most likely to be underestimated and probabilities close to 0 are most likely to be overestimated. The hard-easy effect and the ability effect may be merely the consequence of that simple truth. However, their theory offers a qualitative assessment that lacks precision and cannot be applied to intermediate values of confidence. Moore and Healy (2008) analyze confidence as a score in a quiz that the player must guess after completion of the task and before knowing her true performance. Bayesian players adjust their prior estimate after receiving a subjective signal from their own experience. It is natural to think that signals are randomly distributed around their true unknown value. Assuming normal distributions for the signal and the prior, the posterior expectation of confidence is then a weighted average of the prior and the signal lying necessarily between these two values. Thus, if the task was easier than expected, the signal tends to be higher than the prior. The attraction of the prior pulls reported confidence below the high signal, hence below true performance on average since the signal is drawn from an unbiased distribution. While rational-Bayesian models like Moore and Healy (2008) may account for learning over experience, they fail to predict limited discrimination, miscalibration of confidence before completion of the task, or the absence of global learning. Therefore, we add to the Bayesian model a crucial but hidden aspect of behavior under risk or uncertainty, that is doubt.
We describe the behavior of subjects who are uncertain of their true probability of success and become consequently vulnerable to prediction errors and cognitive illusions if they rely essentially on what they perceive sequentially. We designate these subjects as "intuitive Bayesians". It turns out, unexpectedly, that the same model also predicts the overprecision bias of confidence, which we consider as a further confirmation of its validity.

Intuitive Bayesians may miscalibrate their own probability of success even if they have an unbiased estimate of their own ability to succeed. This can occur if they are uncertain of the true probability of success because they can be misled by "available" illusory signals triggered by their doubt. The direction of doubt is entirely different depending on whether their prior estimate led them to believe that they would fail or that they would succeed. We thus distinguish miscalibration among those individuals who should normally believe that they should not perform the task and those who should normally believe that they should.

To facilitate intuition, let us first consider a subject who is almost sure to succeed at a task, either because the task is easy or because the subject has high ability ($H$). However, the "availability" of a possible failure acts like a negative signal which leads to overweighting this possibility (Tversky and Kahneman, 1973), and underweighting her subjective probability of success $Ep_H$, i.e. underconfidence:

$$q_H = \mu Ep_H + (1-\mu)\,0 = \mu Ep_H \le Ep_H, \qquad (1)$$

with $0 < \mu \le 1$.

Even though high-ability agents are almost sure of succeeding at the training level, their confidence is way below 1, confirming the Dunning-Kruger effect where high-ability subjects underestimate their abilities. An estimate of this undercalibration bias for an easy task is derived from Figure 3a:

$$\mu_H(\text{training level}) = \frac{q_H}{Ep_H} = 0.806 \cong q_H(\text{training level}), \quad \text{with } Ep_H = 0.98.$$

The undercalibration bias is: $1 - 0.806 = 0.194$.

However, underweighting a high probability of success need not reverse the intention of doubling. Indeed, taking the expected value as the decision criterion, among 167 "able" subjects who should double if objective probabilities are used for computation, 158 (i.e. 94.6%) still intended to double according to the subjective confidence reported before the game.

At the other end of the spectrum, consider now a subject who is almost sure of failing, either because the task is very difficult or because the subject has low ability ($L$). However, the "availability" of a possible success leads to overweighting her subjective probability of success $Ep_L$, i.e. overconfidence:

$$q_L = \mu Ep_L + (1-\mu)\,1 \ge Ep_L, \qquad (2)$$

with $0 < \mu \le 1$.

Thus, even though low-ability agents should give up a difficult task, they are overconfident and are thus tempted by the returns to success. In the limit, confidence remains positive if one is almost certain to fail. This means that low-ability individuals always exhibit a positive bottom confidence, which is in line with the Dunning-Kruger effect (they overestimate their abilities). An estimate of this overcalibration bias for the high level is derived from Figure 3c:

$$1-\mu_L(\text{high level}) = \frac{q_L - Ep_L}{1 - Ep_L} = 0.333 \cong q_L(\text{high level}), \quad \text{with } Ep_L = 0.01.$$

Similarly, the overcalibration bias for the middle level is derived from Figure 3b:

$$1-\mu'_L(\text{middle level}) = \frac{q_L - Ep_L}{1 - Ep_L} = 0.427 \cong q_L(\text{middle level}), \quad \text{with } Ep_L = 0.04.$$

Notice that the overcalibration bias is about twice as large as the undercalibration bias. Hence, taking the expected value as the decision criterion, among 190 "unable" subjects who should quit if objective probabilities are used for computation, 159 (i.e. 83.7%) intended to double according to the subjective confidence reported before the game.

To sum up, we explain both the hard-easy effect and the ability effect by an availability bias triggered by the doubt about one's possibility to fail a relatively easy task (underconfidence) or to succeed at a relatively difficult task (overconfidence). If probabilities are updated in a Bayesian fashion, the calibration bias is the relative precision of the illusory signal. The latter is inversely related with the absolute precision of the prior estimate and positively related with the absolute precision of the illusory signal. Thus, we should not be surprised to find that our estimate of the calibration bias is lower for the training level (19.4%) than for upper levels (42.7% and 33.3% respectively) because experience in the first rounds of the training level must be more relevant for predicting the probability of success in the training level than in subsequent levels. And, when comparing upper levels, the illusion of success should be more credible for the near future (middle level) than for the more distant future (high level).

This explanation is also consistent with the other measures displayed by Figures 3a, 3b, 3c, given the fact that they aggregate overconfident subjects who should not undertake the task with underconfident subjects who should undertake it. If $\lambda_L$ is the proportion who should stop and $\lambda_H$ the proportion who should continue ($\lambda_L + \lambda_H \equiv 1$), the average confidence is:

$$\lambda_L(\mu Ep_L + 1 - \mu) + \lambda_H \mu Ep_H = \mu Ep + (1-\mu)\lambda_L.$$

Confidence is overcalibrated on average iff $\lambda_L > Ep$ and undercalibrated iff the reverse condition holds.

Footnote: The time $t = (1, 5, 10)$ when confidence is reported is omitted in this sub-section to alleviate notations.

Footnote: Very close numbers are obtained for all calibration biases with confidence reported during the game.

Footnote: This should not be confounded with motivated inference as it applies symmetrically to undesirable and desirable outcomes.
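The miscalibration rules (1) and (2) and the aggregation identity above can be checked numerically. The following is only a minimal sketch: the function names and all input values (`mu`, `lam_low`, `ep_low`, `ep_high`) are hypothetical and chosen for illustration, not taken from the experiment.

```python
def confidence_should_succeed(ep, mu):
    """Eq. (1): the contrarian signal is a possible failure (value 0)."""
    return mu * ep + (1.0 - mu) * 0.0


def confidence_should_fail(ep, mu):
    """Eq. (2): the contrarian signal is a possible success (value 1)."""
    return mu * ep + (1.0 - mu) * 1.0


def average_confidence(lam_low, ep_low, ep_high, mu):
    """Mean confidence of a group mixing both types of subjects."""
    lam_high = 1.0 - lam_low
    return (lam_low * confidence_should_fail(ep_low, mu)
            + lam_high * confidence_should_succeed(ep_high, mu))


# Hypothetical inputs, chosen only for illustration.
mu, lam_low, ep_low, ep_high = 0.8, 0.6, 0.1, 0.9
ep_mean = lam_low * ep_low + (1.0 - lam_low) * ep_high  # average true chance
q_mean = average_confidence(lam_low, ep_low, ep_high, mu)

# Closed form from the text: mu * Ep + (1 - mu) * lambda_L
closed_form = mu * ep_mean + (1.0 - mu) * lam_low

# The text predicts overcalibration on average whenever lam_low > ep_mean.
overcalibrated = q_mean > ep_mean
print(round(q_mean, 3), round(ep_mean, 3), overcalibrated)
```

With these illustrative inputs the share of subjects who should stop (0.6) exceeds the average chance of success (0.42), so the group is overcalibrated on average, as the condition $\lambda_L > Ep$ requires.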
The apparent overcalibration of confidence for a difficult task takes less extreme values when the average measured ability of the group rises. For instance, the results displayed by Figure 3c are consistent with our estimate for the overcalibration bias if the proportion of successful middle-ability subjects is 12% and that of successful high-ability subjects is 25%, since these two predicted values are close to the observed frequency of success in these groups, respectively 10% and 27%.

Footnote 19: The rational decision to undertake a non-trivial task of level $l$ (with a possibility to fail and regret) is subjective. The economic criterion for making this decision rests on the comparison of the expected utilities of all options conditional on the estimated probabilities of success at the time of decision. A rational subject should refuse the task if the expected utility of continuing to level $l$ or above is no higher than the expected utility of stopping before level $l$. We make use of this criterion for writing equations 6 and 7 in the next sub-section (5.1).

Remarkably, this simple model of miscalibration also predicts limited discrimination. Although Wall is more difficult than Hill at the middle level, our subjects attributed on average about the same confidence level to both tasks (see Table 1). High-ability subjects who should double at middle level in the Wall condition, and low-ability subjects who should stop before middle level in the Hill condition would both estimate their chances of success to be higher with 16 anagrams to solve with Hill than with 20 anagrams with Wall. The former would underestimate their chances according to (1) and the latter would overestimate them according to (2), but the difference between the two estimates would be the same, equal to $\mu(Ep_{Hill} - Ep_{Wall})$. Thus, if their prior estimates were unbiased, intuitive (s.t. $\mu < 1$) high and low-ability subjects would imperfectly discriminate between Hill and Wall by underestimating the difficulty gap between them. Things are even worse for middle-ability subjects who should opt for middle level under Hill and quit before middle level under Wall. According to (1) and (2), those individuals would have a low estimate ($\mu Ep_{Hill}$) of their pass rate under Hill and a high estimate ($\mu Ep_{Wall} + 1 - \mu$) under Wall. They would then underestimate the difficulty gap more severely than high or low-ability subjects and they might even give a higher estimate under Wall than under Hill iff $Ep_{Hill} - Ep_{Wall} < (1-\mu)/\mu$. Therefore, our model implies limited discrimination of differences in difficulty by intuitive Bayesians when the difference is not very salient.

Footnote: It is assumed here, as in Table 1, that the two estimates are independent.

A further implication of Bayesian updating is that, in the subject's mind, the precision of the posterior estimate for probabilities of success, i.e. confidence in her estimate, is increased by reception of the illusory signal, whatever the latter may be. Therefore, our theory of confidence predicts the overprecision phenomenon even before completion of the task. In contrast with the other distortions of confidence, underprecision will never be observed, a prediction which is corroborated by Moore and Healy (2008) who do not quote any study in their discussion of "underprecision". The overestimation of the precision of acquired knowledge is an additional manifestation of the hidden search undertaken by intuitive Bayesians. Our analysis of overprecision is congruent with the observation that greater overconfidence of this kind was found for tasks in which subjects considered they were more competent (Heath and Tversky, 1991).

Footnote: If $\nu_i$ denotes the prior precision of subject $i$'s estimate of her future success (omitting level $l$ for simplicity), $\nu_i + 1 \equiv \Phi_i$ will be the posterior precision after reception of an i.i.d. signal. Thus, $\Phi_i > \nu_i$. Notice that $\mu_i = \frac{\nu_i}{\nu_i + 1}$.

[...] $E_1 p$, after four rounds $E_5 p$, and after nine rounds (only for doublers) $E_{10} p$.

After going through four rounds of anagrams, a number of cues on the task have been received and processed. Participants may recall how many anagrams they solved in each round and in the aggregate, whether they would have passed the test in each round or on the whole at this stage of the task, whether their performance improved or declined from one round to the next, how fast they could solve anagrams, and so forth. For the purpose of decision-making, cues are converted into a discrete set of i.i.d. Bernoulli variables taking value 1 if they signal to the individual that she should reach her goal for level $l$ ($l = 1, 2, 3$), and 0 otherwise. The single parameter of the Bernoulli variable is its mean which defines the expected likelihood of success. However, this mean is essentially unknown to that individual. Thus, let it be denoted by $\tilde{p}$ which is randomly distributed within the interval [0, 1]. Assume that the prior distribution of $\tilde{p}$ is a Beta-distribution with a reported mean $E_1 p$ and precision $\nu_1$.

Behaving like intuitive Bayesians, participants update their prior expectation of success at level $l$ ($l = 1, 2, 3$) before the training session $E_1 p_l$ in the following manner (see DeGroot 1970, Chapter 9):

$$E_5 p_l = \frac{\nu_{1l}}{\nu_{1l} + \tau_{5l}} E_1 p_l + \frac{1}{\nu_{1l} + \tau_{5l}} X_{1-4,l} \qquad (3)$$

with $\tau_{5l} > 0$ designating the precision of all the independent cues perceived during the first four rounds, and $X_{1-4,l}$ defining the number of independent cues predicting future success at level $l$ at this stage of the task. They also update the precision of the posterior expectation $E_5 p_l$, which rises from $\nu_{1l}$ to:

$$\nu_{5l} = \nu_{1l} + \tau_{5l} \qquad (3')$$

with $0 \le X_{1-4,l} \le \tau_{5l}$.

Equation (3) cannot be directly estimated on the data because the estimated probabilities $E_1 p_l$ and $E_5 p_l$ are unobservable.
However, it may be rewritten concisely in terms of reported confidence $q_1(l)$ and $q_5(l)$ with the help of the miscalibration equations (1) and (2). Let us express generally the Bayesian transformation of the probability estimates into confidence as:

$$q_5(l) = \mu_{5l} E_5 p_l + (1 - \mu_{5l}) D_{5,l}, \quad l = (1, 2, 3) \qquad (4)$$

with $\mu_{5l} = \frac{\nu_{1l} + \tau_{5l}}{\nu_{1l} + \tau_{5l} + 1}$ and

$$D_{(5,l)} = \begin{cases} 1 & \text{if } \max EU(l' \mid E_5 p_{l'},\ l' = (0, \dots, l-1)) \ge \max EU(l'' \mid E_5 p_{l''},\ l'' = (l, \dots, 3)) \\ 0 & \text{otherwise.} \end{cases}$$

Confidence is merely a weighted average of the prior forecast and a doubt term acting as a contrarian Bernoulli signal.

And likewise:

$$q_1(l) = \mu_{1l} E_1 p_l + (1 - \mu_{1l}) D_{1,l} \qquad (5)$$

with $\mu_{1l} = \frac{\nu_{1l}}{\nu_{1l} + 1}$ and

$$D_{(1,l)} = \begin{cases} 1 & \text{if } \max EU(l' \mid E_1 p_{l'},\ l' = (0, \dots, l-1)) \ge \max EU(l'' \mid E_1 p_{l''},\ l'' = (l, \dots, 3)) \\ 0 & \text{otherwise.} \end{cases}$$

Combining (3), (4) and (5), we get:

$$q_5(l) = \frac{\nu_{1l} + 1}{\nu_{1l} + \tau_{5l} + 1}\, q_1(l) + \frac{1}{\nu_{1l} + \tau_{5l} + 1}\, X_{1-4,l} + \frac{1}{\nu_{1l} + \tau_{5l} + 1}\, (D_{5,l} - D_{1,l}) \qquad (6)$$

By the same reasoning, we can express the confidence of doublers for upper levels $l = (2, 3)$ as:

$$q_{10}(l) = \frac{\nu_{1l} + \tau_{5l} + 1}{\nu_{1l} + \tau_{10,l} + 1}\, q_5(l) + \frac{1}{\nu_{1l} + \tau_{10,l} + 1}\, X_{5-9,l} + \frac{1}{\nu_{1l} + \tau_{10,l} + 1}\, (D_{10,l} - D_{5,l}) \qquad (7)$$

with $\tau_{10,l} \ge \tau_{5l}$ designating the precision of all of the independent cues perceived during the training level (9 rounds), $\nu_{1l} + \tau_{10,l}$ the precision of the posterior expectation $E_{10} p_l$, and $X_{5-9,l}$ defining the number of independent cues predicting future success at level $l$ at this stage of the task.

Footnote: In order to have an unambiguous definition of $D_{(5,l)}$ and $D_{(1,l)}$ below, we use the expected utility (EU) criterion, as explained in note 19.

Equations (6) and (7) are essentially the same with a moving prior of increasing precision.
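The step from equations (3), (4) and (5) to equation (6) can be verified with a small numerical sketch. Everything below is an assumption made for illustration: the function names, the subject's prior, the cue counts, and the two doubt indicators, which are simply taken as given rather than derived from an expected-utility comparison.

```python
def posterior_mean(prior_mean, nu, tau, successes):
    """Eq. (3): precision-weighted update of the expected chance of success."""
    return (nu * prior_mean + successes) / (nu + tau)


def reported_confidence(mean, precision, doubt):
    """Eqs. (4)-(5): mix the estimate with a contrarian signal (doubt is 0 or 1)."""
    mu = precision / (precision + 1.0)
    return mu * mean + (1.0 - mu) * doubt


def confidence_update(q_prior, nu, tau, successes, d_prior, d_post):
    """Eq. (6): posterior confidence expressed through prior confidence."""
    return ((nu + 1.0) * q_prior + successes + (d_post - d_prior)) / (nu + tau + 1.0)


# Hypothetical subject: prior chance 0.3 held with precision 4, then 3
# favourable cues out of a cue precision of 4.
ep1, nu, tau, successes = 0.3, 4.0, 4.0, 3.0
d_prior, d_post = 1.0, 0.0  # assumed: intended to stop before, to continue after

q1 = reported_confidence(ep1, nu, d_prior)
ep5 = posterior_mean(ep1, nu, tau, successes)
q5_direct = reported_confidence(ep5, nu + tau, d_post)               # via (3)-(4)
q5_eq6 = confidence_update(q1, nu, tau, successes, d_prior, d_post)  # via (6)
print(q1, ep5, q5_direct, q5_eq6)
```

In this illustration the expected chance of success rises from 0.30 to 0.525, while reported confidence only moves from 0.44 to about 0.47, which is the kind of under-reaction to encouraging experience that the doubt term produces.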
In the absence of miscalibration, confidence reported before round $t$ ($t = (5, 10)$) would be a weighted average of prior confidence and the mean frequency of cues predicting future success at level $l$ since the last time confidence was reported. With miscalibration, another term is added which can only take three values, reflecting the occurrence and direction of change in subjects' estimated ability with experience. If experience confirms the prior intention to stop or continue to level $l$, this additional term takes value 0 and confidence is predicted by the rational-Bayesian model (with perfect calibration). However, if experience disconfirms the prior intention to stop or continue to level $l$, confidence rises above this reference value with disappointing experience and declines symmetrically below this reference value with encouraging experience. Thus, our model predicts that intuitive Bayesians are conservative and under-react symmetrically to negative experience (by diminishing their confidence less than they should) and to positive experience (by raising their confidence less than they should). Below, we report indeed rather small variations of confidence in our experiment in the form of local, but not global, learning.

[...] respectively. Reported confidence in participant $i$'s ability to reach one level of the double-or-quits game is regressed in Table 4 (Table 5) on the confidence that she reported before the first (fifth) round and on a vector $Z_{li}$ of level-specific cues observable in the first four (last five) rounds, assuming that $X_{1-4,li}$ ($X_{5-9,li}$) $= \beta_l Z_{li} + \epsilon_{li}$ where $\beta_l$ is a vector of coefficients and $\epsilon_{li}$ an error term of zero mean. Two dummy variables for the hill and choice treatments (wall as reference) have been added to the regression.

Table 4

OLS estimation of the Bayesian model of confidence before round 5

                                                  Training Level   Middle Level   High Level
Confidence before training session                0.79***          0.86***        0.90***
Freq. of rounds with 4 anagrams solved            + ***            + ns           + ns
Freq. of rounds with 5-6 anagrams solved          + ***            + ***          + ***
Freq. of rounds with non-declining performance    + ***            + ***          + ***
Anagrams solved per minute on rounds 1-4          + ***            + ***          + ***
Hill                                              + *              + **           + *
Choice                                            + ns             + ns           + ns
Constant                                          − ***            − ***          − ***
R²                                                67%              70%            76%
Observations                                      410              410            410

Notes.

The regressions confirm the existence of local learning. Subjects did revise their expectations with experience of the task as several cues have highly significant coefficients (at 1% level) with the right sign. Moreover, they analyze their own performance correctly by setting stronger pre-requisites for themselves when the task gets more difficult. For example, their ability to solve just four anagrams per round in the training period increases their confidence for this period only because, if such performance is enough to ensure success in this period, it is no longer sufficient when the task becomes more difficult. Another interesting result in Table 5 consistent with the miscalibration term in equation (7) concerns low achievers who double. The later they ended up solving the required number of anagrams in the training period, the more abruptly their confidence rose. It is indeed an implication of subjects' vulnerability to illusory signals that low-ability doublers find themselves almost as confident as high-ability doublers in spite of widely different performances. This result appears too on Figures 4 and 5, where the ability-adjusted confidence of low-ability doublers jumps from bottom to top during the second stage of the training period.

Footnote: The discrete value of confidence between 0 and 100 can be safely treated as continuous.

Table 5

OLS estimation of the Bayesian model of confidence for doublers reported before the middle level

                                                        Middle Level   High Level
Confidence after round 4                                0.77***        0.87***
Freq. of rounds with 4 anagrams solved (5-9)            + ns           − ns
Freq. of rounds with 5-6 anagrams solved (5-9)          + ***          + *
Freq. of rounds with non-declining performance (5-9)    + ns           + **
Number of rounds used to solve 36 anagrams              + ***          + **
Anagrams solved per minute on rounds 5-9                + ns           − ns
Hill                                                    − ***          − ns
Choice                                                  − ns           + ns
Constant                                                − ns           − **
R²                                                      74%            81%
Observations                                            275            275

Notes.

Significance level: * p < 0.10, ** p < 0.05, *** p < 0.01, ns: not significant at 10% level. Variables: Frequency of rounds with non-declining performance represents the percentage of rounds (in rounds 5-9) in which the number of anagrams solved was equal to or higher than in the previous round. Hill and Choice: dummy variables with Wall as reference. Number of rounds used to solve 36 anagrams (between rounds 6 and 9). (5-9) refers to measures between rounds 5 and 9.

A major testable implication of the Bayesian model lies in the coefficient of the prior confidence, which must be interpreted as the precision of prior information relative to the information collected by experience of the task during the training period. This coefficient is always high in Tables 4 and 5 with a minimum value of 0.77. Observing such high weights for the prior favors the hypothesis of rational-Bayesian updating over adaptive expectations as the latter would considerably underweight the prior relative to the evidence accumulated in the first four rounds. Successful experience of the easier task in the early rounds is expected to be more predictive of final success on the same task than in future tasks of greater difficulty. Thus, the relative weight of experience should diminish in the confidence equation at increasing levels or, equivalently, the relative weight of prior confidence should rise. Indeed, the coefficient of prior confidence increases continuously with the level. It rises from 0.79 to 0.86 and 0.90 in Table 4; and, from 0.77 to 0.87 in Table 5. In parallel, the coefficients of cues signaling a successful experience continuously diminish when the level rises. We can use the mathematical expressions of the two coefficients of prior confidence derived from equations (6) and (7) to calculate the precision of early experience relative to prior confidence (before the task) $\tau_{5l}/\nu_{1l}$ ($l = 1, 2, 3$).
With the data of Table 4, we get 0.266 for the training level, 0.163 for the middle level, and 0.111 for the high level. Similarly, we compute the precision of late experience relative to prior confidence (before the task) $\tau_{10,l}/\nu_{1l}$ ($l = 2, 3$). With the data of Table 5, we get 0.506 for middle level and 0.274 for high level. The impact of learning from experience appears to be substantial and with increasing returns. By elimination of $\nu_{1l}$, we finally calculate the precision of early experience relative to total experience during the training period $\tau_{5l}/\tau_{10,l}$ ($l = 2, 3$). We obtain 0.322 for middle level and 0.405 for high level. The rate of increase of precision resulting from longer experience (from 4 to 9 rounds) $(\tau_{10,l} - \tau_{5l})/\tau_{5l}$ reaches a considerable 211% at middle level and 147% at high level, which forms indirect evidence of the overprecision phenomenon.

5.3 Why do intuitive Bayesians make wrong (and costly) predictions of performance?

The answer to this important question, and to the related planning fallacy, is contained in Table 6, which uses the same set of potential predictors to forecast confidence in succeeding at the middle level after doubling and ex post chances of success: prior confidence, ability, and performance cues observed subsequently (during rounds 5 to 9). The mere comparison of coefficients between the two columns of Table 6 demonstrates that posterior confidence is based on both objective performance cues and subjective variables, whereas the chances of success are predicted by the objective performance cues and ability only. The latter are the frequencies of rounds with 4 and with 5-6 anagrams solved respectively (effort) and the speed of anagram resolution (ability); and the subjective variables are essentially the prior confidence and the illusory signal given to low achievers by their (lucky) initial success.
Remarkably, the number of rounds needed for solving 36 anagrams (varying from 6 to 9), which indicates low achievement and recommends quitting the game at an early stage, acts as an illusory signal with a significantly positive effect on confidence in column 1; but the same variable acts as a correlate of low ability in column 2 with a strong negative effect on the chances of success at middle level. Indeed, the subjective predictors of posterior confidence do not predict success when the objective performance cues are held constant. Prior confidence predicts the posterior confidence that conditions the decision to double but fails to predict success because it is based on an intuitive reasoning which suffers from systematic biases. Past errors convey to the prior through the aggregation procedure of Bayesian calculus and may add up with further errors caused by the perception of illusory signals.

To reinforce our demonstration, we used the regressions listed in Table 6 to predict normative (based on rational expectations) and subjective (confidence-based) expected values and determine the best choice of doubling or quitting prescribed by those alternative models. As expected, the normative model's [...]

Footnote: We used an OLS to predict probabilities of success so as to make the comparison with confidence transparent. Estimating an OLS instead of a Probit in columns 3 and 4 didn't affect the qualitative conclusions.

Footnote: Conditional on initial success, prior confidence is a good predictor of the future decision to double (regression not shown). This is good news for the quality of confidence reports; and it confirms that subjects behave as intuitive Bayesians who rely on their own subjective estimates of success to make the choice of doubling.

Footnote: The predicted values were computed on regressions containing only the significant variables. We checked that these values stayed close to predictions derived from the regressions listed in Table 6 which contain non-significant variables too.
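The relative precisions reported in Section 5.2 can be recomputed from the prior-confidence coefficients quoted in the text for Tables 4 and 5. This is a sketch under one stated assumption: ratios are taken relative to $\nu + 1$ rather than $\nu$, which reproduces the early-experience figures to three decimals. The function name is hypothetical. Because the published coefficients are rounded to two decimals, the late-experience ratios come out close to, but not exactly equal to, the reported 0.506 and 0.274.

```python
def relative_precisions(c4, c5):
    """Back out precision ratios from the two prior-confidence coefficients.

    c4 = (nu + 1) / (nu + tau5 + 1)            from eq. (6)
    c5 = (nu + tau5 + 1) / (nu + tau10 + 1)    from eq. (7)
    Ratios are expressed relative to nu + 1 (an assumption of this sketch).
    """
    early = 1.0 / c4 - 1.0            # tau5 / (nu + 1)
    late = (1.0 + early) / c5 - 1.0   # tau10 / (nu + 1)
    share_early = early / late        # tau5 / tau10
    growth = late / early - 1.0       # (tau10 - tau5) / tau5
    return early, late, share_early, growth


# Prior-confidence coefficients quoted in the text for Tables 4 and 5.
table4 = {"training": 0.79, "middle": 0.86, "high": 0.90}
table5 = {"middle": 0.77, "high": 0.87}

early_training = 1.0 / table4["training"] - 1.0
results = {lvl: relative_precisions(table4[lvl], table5[lvl]) for lvl in table5}

print(round(early_training, 3))
for lvl, values in results.items():
    print(lvl, [round(v, 3) for v in values])
```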

Table 6

Estimation of posterior confidence (after doubling) and ex post chances of success at the middle level

Level 2                                                 Confidence After   Chances of success
Confidence after round 4                                + ***              + ns
Freq. of rounds with 4 anagrams solved (5-9)            + ns               + *
Freq. of rounds with 5-6 anagrams solved (5-9)          + ***              + **
Freq. of rounds with non-declining performance (5-9)    + ns               − ns
Number of rounds used to solve 36 anagrams              + ***              − ***
Anagrams solved per minute on rounds 5-9                + ns               + ***
Ability                                                 − ns               + ***
Hill                                                    − ***              + ns
Choice                                                  − ns               − *
Constant                                                − ns               + *
R²                                                      74%                30%
Observations                                            275                275

Notes.

Sample: to be comparable, these regressions consider only those who succeeded at the first level and decided to double to the second level.

Table 7

The prevalence and cost of miscalibration among doublers

Prescription of subjective   Prescription of normative   Category                Share   Rate of failure
expected value               expected value

double                       double                      able and calibrated     47%     52%
stop                         stop                        unable and calibrated   12%     78%
double                       stop                        overconfident           36%     91%
stop                         double                      underconfident           5%     57%
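The classification reported in Table 7 can be illustrated with a short Python sketch. It is not the authors' code: the function names and the payoffs passed to `prescription` are hypothetical, and the expected-value rule is only a plain reading of "the best choice of doubling or quitting". The shares and failure rates are the rounded figures printed in Table 7.

```python
# Illustrative sketch only; names and payoffs are assumptions, not from the paper.

def prescription(p_success, payoff_success, payoff_failure, payoff_quit):
    """Return 'double' if the expected value of doubling exceeds the
    payoff from quitting, else 'stop'. Payoffs here are hypothetical."""
    ev_double = p_success * payoff_success + (1.0 - p_success) * payoff_failure
    return "double" if ev_double > payoff_quit else "stop"

# (subjective prescription, normative prescription) -> Table 7 category
CATEGORIES = {
    ("double", "double"): "able and calibrated",
    ("stop", "stop"): "unable and calibrated",
    ("double", "stop"): "overconfident",
    ("stop", "double"): "underconfident",
}

def classify(subjective, normative):
    """Compare the confidence-based and the normative prescriptions."""
    return CATEGORIES[(subjective, normative)]

# category -> (share of doublers, rate of failure), as printed in Table 7
TABLE_7 = {
    "able and calibrated": (0.47, 0.52),
    "unable and calibrated": (0.12, 0.78),
    "overconfident": (0.36, 0.91),
    "underconfident": (0.05, 0.57),
}

def overall_failure_rate(table):
    """Share-weighted failure rate implied by the rounded table entries."""
    return sum(share * failure for share, failure in table.values())

def miscalibrated_share(table):
    """Share of doublers whose two prescriptions disagree."""
    return table["overconfident"][0] + table["underconfident"][0]

if __name__ == "__main__":
    # A subject who believes p = 0.6 while the normative estimate is 0.3,
    # with hypothetical payoffs of 100 (success), 0 (failure), 40 (quit).
    subjective = prescription(0.6, 100, 0, 40)
    normative = prescription(0.3, 100, 0, 40)
    print(classify(subjective, normative))
    print(round(overall_failure_rate(TABLE_7), 4))
    print(round(miscalibrated_share(TABLE_7), 2))
```

Under these rounded figures, about 41% of doublers are miscalibrated and the share-weighted failure rate is roughly 69%; the overconfident group combines a large share (36%) with the highest failure rate (91%).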

We designed an experimental analog to the popular double-or-quits game to compare the speed of learning one’s ability to perform a task in isolation with the speed of rising confidence as the task gets increasingly difficult. In simple words, we found that people on average learn to be overconfident faster than they learn their true ability. We present a new intuitive-Bayesian model of confidence which integrates confidence biases and learning. The distinctive feature of our model of self-confidence is that it rests solely on a Bayesian representation of the cognitive process: intuitive people predict their own probability of performing a task on the basis of cues and contrarian illusory signals related to the task that they perceive sequentially. Confidence biases arise in our opinion, not from an irrationality of the treatment of information, but from the poor quality and subjectivity of the information being treated. For instance, we rule out self-attribution biases, motivated cognition, self-image concerns and manipulation of beliefs but we describe people as being fundamentally uncertain of their future performance and taking all the information they can get with limited discrimination, including cognitive illusions. Above all, a persistent doubt about their true ability is responsible for their perception of contrarian illusory signals that make them believe, either in their possible failure if they should succeed or in their possible success if they should fail.

Our intuitive-Bayesian theory of estimation combines parsimoniously the cognitive bias and the learning approach. It brings a novel interpretation of the cognitive bias and it provides a general account of estimation biases.
Indeed, we did not attribute confidence biases to specific cognitive errors but to the fundamental uncertainty about one’s true ability; and we predicted phenomena beyond the hard-easy and Dunning-Kruger effects which could not be explained all together by previous models: miscalibration and overprecision before completion of the task, limited discrimination, conservatism, slow learning and the planning fallacy. Moreover, we showed that these biases are likely to persist since the Bayesian aggregation of past information consolidates the accumulation of errors, and the perception of illusory signals generates conservatism and under-reaction to events. Taken together, these two features may explain why intuitive Bayesians make systematically wrong and costly predictions of their own performance. Don’t we systematically underestimate the time needed to perform a new (difficult) task and never seem to learn?

Our analysis of overconfidence is restricted to the overestimation bias. The latter must be carefully distinguished from the overplacement bias since the hard-easy effect that we observed here with absolute confidence has often been reversed when observing relative confidence: overplacement for an easy task (like driving one’s car) and underplacement for a novel or difficult task. The reasons for overplacement are probably not unique and are context-dependent. When people really compete, the over (under) placement bias may result from their observing and knowing their own ability (although imperfectly) better than others’. If both high-ability and low-ability individuals compare themselves with average-ability others, the former are likely to experience overplacement and the latter underplacement. The same reasoning applies to individuals familiar or unfamiliar with the task, and to individuals who were initially successful or unsuccessful with the task.
When no real competition is involved, the overplacement effect relates to an evaluation-based estimate of probability. While there is an underlying choice to be made in the estimation task, no such thing is present in the latter case. If I ask you whether you consider yourself as a top driver (relative to others), I don’t generally expect you to show me how you drive. Preference reversals are not uncommon between choices and evaluations (Lichtenstein and Slovic, 1971). Thus, the present analysis of overestimation is consistent with reasonable explanations of overplacement. Moreover, it predicts the overprecision phenomenon and even rules out underprecision. This demonstrates that overestimation and overprecision are related but different biases.

Double-or-quits-type behavior can be found in many important decisions like addictive gambling (Goodie, 2005), military conquests (Johnson, 2004), business expansion (Malmendier and Tate, 2005), speculative behavior (Shiller, 2000), educational choices (Breen, 2001), etc. Overconfident players, chiefs, entrepreneurs, traders, or students are inclined to take excessive risks; they are unable to stop at the right time and eventually fail more than well-calibrated persons (e.g., Barber and Odean 2001, Camerer and Lovallo 1999). In contrast, under-confident individuals won’t take enough risks and stay permanently out of successful endeavors.

However, overconfidence may pay off when there is uncertainty about opponents’ real strengths, and when the benefits of the prize at stake are sufficiently larger than the costs (e.g., Johnson and Fowler 2011, Anderson et al 2012).

On the theoretical side, the intuitive-Bayesian model of confidence before completion of a task creates a link between confidence and decision analyses and their respective biases. Confidence biases and the anomalies of decision under risk or uncertainty can be analyzed with the same tools. The estimation of one’s ability implies an implicit comparison between an uncertain binary lottery and a reference outcome. It is a by-product of the question: should I double or quit? This is a question of interest to behavioral and decision theorists.

Acknowledgements

We thank the French Ministère de la Recherche (ACI "Contextes sociaux, contextes institutionnels et rendements des systèmes éducatifs") for generous support, Claude Montmarquette for offering an opportunity to conduct part of the experimental sessions at CIRANO (Montreal), and Noemi Berlin for numerous discussions. We are grateful to the referees and the editors of this special issue for bringing very helpful remarks and suggestions. We remain responsible for any error.

References

Adams JK (1957) A confidence scale defined in terms of expected percentages. The American journal of psychology pp 432–436

Anderson C, Brion S, Moore DA, Kennedy JA (2012) A status-enhancement account of overconfidence. Journal of personality and social psychology 103(4):718–735

Armantier O, Treich N (2013) Eliciting beliefs: Proper scoring rules, incentives, stakes and hedging. European Economic Review 62:17–40

Barber BM, Odean T (2001) Boys will be boys: Gender, overconfidence, and common stock investment. Quarterly journal of Economics pp 261–292

Benoît JP, Dubra J (2011) Apparent overconfidence. Econometrica 79(5):1591–1625

Benoît JP, Dubra J, Moore DA (2015) Does the better-than-average effect show that people are overconfident?: Two experiments. Journal of the European Economic Association 13(2):293–329

Breen R (2001) A rational choice model of educational inequality. Centro de Estudios Avanzados en Ciencias Sociales Instituto Juan March de Estudios e Investigaciones, Madrid Working paper(166)

Brunnermeier MK, Parker JA (2005) Optimal expectations. American Economic Review 95(4):1092–1118, DOI 10.1257/0002828054825493

Buehler R, Griffin D, Ross M (2002) Inside the planning fallacy: The causes and consequences of optimistic time predictions. Heuristics and biases: The psychology of intuitive judgment pp 250–270

Camerer C, Lovallo D (1999) Overconfidence and excess entry: An experimental approach. American economic review pp 306–318

Clark J, Friesen L (2009) Overconfidence in forecasts of own performance: An experimental study. The Economic Journal 119(534):229–251

Deaves R, Lüders E, Schröder M (2010) The dynamics of overconfidence: Evidence from stock market forecasters. Journal of Economic Behavior & Organization 75(3):402–412

DeGroot MH (1970) Optimal Statistical Decisions. New York: McGraw-Hill

Erev I, Wallsten TS, Budescu DV (1994) Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological review 101(3):519–528

Gervais S, Odean T (2001) Learning to be overconfident. Review of Financial studies 14(1):1–27

Goodie AS (2005) The role of perceived control and overconfidence in pathological gambling. Journal of Gambling Studies 21(4):481–502

Grieco D, Hogarth RM (2009) Overconfidence in absolute and relative performance: The regression hypothesis and Bayesian updating. Journal of Economic Psychology 30(5):756–771

Griffin D, Tversky A (1992) The weighing of evidence and the determinants of confidence. Cognitive psychology 24(3):411–435

Heath C, Tversky A (1991) Preference and belief: Ambiguity and competence in choice under uncertainty. Journal of risk and uncertainty 4(1):5–28

Hollard G, Massoni S, Vergnaud JC (2015) In search of good probability assessors: an experimental comparison of elicitation rules for confidence judgments. Theory and Decision pp 1–25, DOI 10.1007/s11238-015-9509-9

Johnson DD (2004) Overconfidence and War: The Havoc and Glory of Positive Illusions. Cambridge, MA: Harvard UP

Johnson DD, Fowler JH (2011) The evolution of overconfidence. Nature 477(7364):317–320

Kahneman D, Tversky A (1979) Prospect theory: An analysis of decision under risk. Econometrica pp 263–291

Köszegi B (2006) Ego utility, overconfidence, and task choice. Journal of the European Economic Association 4(4):673–707

Kruger J (1999) Lake Wobegon be gone! The "below-average effect" and the egocentric nature of comparative ability judgments. Journal of personality and social psychology 77(2):221–232

Kruger J, Dunning D (1999) Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of personality and social psychology 77(6):1121–1134

Langer EJ, Roth J (1975) Heads I win, tails it’s chance: The illusion of control as a function of the sequence of outcomes in a purely chance task. Journal of Personality and Social Psychology 32:951–955

Lichtenstein S, Fischhoff B (1977) Do those who know more also know more about how much they know? Organizational behavior and human performance 20(2):159–183

Lichtenstein S, Slovic P (1971) Reversals of preference between bids and choices in gambling decisions. Journal of experimental psychology 89(1):46–55

Lichtenstein S, Fischhoff B, Phillips L (1982) Calibration of probabilities: The state of the art to 1980. In: Kahneman D, Slovic P, Tversky A (eds) Judgement under uncertainty: Heuristics and biases, New York: Cambridge University Press, pp 306–334

Malmendier U, Tate G (2005) CEO overconfidence and corporate investment. The journal of finance 60(6):2661–2700

Merkle C, Weber M (2011) True overconfidence: The inability of rational information processing to account for apparent overconfidence. Organizational Behavior and Human Decision Processes 116(2):262–271

Miller DT, Ross M (1975) Self-serving biases in the attribution of causality: Fact or fiction? Psychological bulletin 82(2):213–225

Miller TM, Geraci L (2011) Unskilled but aware: reinterpreting overconfidence in low-performing students. Journal of experimental psychology: learning, memory, and cognition 37(2):502–506

Mobius M, Niederle M, Niehaus P, Rosenblat T (2014) Managing self-confidence. Tech. rep., Working Paper

Moore DA, Healy PJ (2008) The trouble with overconfidence. Psychological review 115(2):502–517

Oskamp S (1965) Overconfidence in case-study judgments. Journal of consulting psychology 29(3):261–265

Ryvkin D, Krajč M, Ortmann A (2012) Are the unskilled doomed to remain unaware? Journal of Economic Psychology 33(5):1012–1031

Shiller RJ (2000) Measuring bubble expectations and investor confidence. The Journal of Psychology and Financial Markets 1(1):49–60

Van den Steen E (2011) Overconfidence by Bayesian-rational agents. Management Science 57(5):884–896

Svenson O (1981) Are we all less risky and more skillful than our fellow drivers? Acta Psychologica 47(2):143–148

Tversky A, Kahneman D (1973) Availability: A heuristic for judging frequency and probability. Cognitive psychology 5(2):207–232

Appendix