Confidence biases and learning among intuitive Bayesians
Theory and Decision manuscript No. (will be inserted by the editor)
Louis Lévy-Garboua · Muniza Askari · Marco Gazel
Abstract
We design a double-or-quits game to compare the speed of learning one's specific ability with the speed of rising confidence as the task gets increasingly difficult. We find that people on average learn to be overconfident faster than they learn their true ability, and we present an Intuitive-Bayesian model of confidence which integrates confidence biases and learning. Uncertainty about one's true ability to perform a task in isolation can be responsible for large and stable confidence biases, namely limited discrimination, the hard-easy effect, the Dunning-Kruger effect, conservative learning from experience and the overprecision phenomenon (without underprecision), if subjects act as Bayesian learners who rely only on sequentially perceived performance cues and contrarian illusory signals induced by doubt. Moreover, these biases are likely to persist, since the Bayesian aggregation of past information consolidates the accumulation of errors, and the perception of contrarian illusory signals generates conservatism and under-reaction to events. Taken together, these two features may explain why intuitive Bayesians make systematically wrong predictions of their own performance.
Keywords
Confidence biases · intuitive-Bayesian · learning · double or quits experimental game · doubt · contrarian illusory signals

Louis Lévy-Garboua
Paris School of Economics, Université Paris 1 Panthéon-Sorbonne, and Centre d’Economie de la Sorbonne, 106-112 Bd de l’Hôpital, 75013 Paris, France
Tel.: +33 6 85 11 18 68

Muniza Askari
Centre d’Economie de la Sorbonne

Marco Gazel
Paris School of Economics, Université Paris 1 Panthéon-Sorbonne, and Centre d’Economie de la Sorbonne, 106-112 Bd de l’Hôpital, 75013 Paris, France
In many circumstances, people appear to be "overconfident" in their own abilities and good fortune. This may occur when they compare themselves with others, massively finding themselves "better-than-average" in familiar domains (e.g., Svenson 1981, Kruger 1999), when they overestimate their own absolute ability to perform a task (e.g., Lichtenstein and Fischhoff 1977, Lichtenstein et al 1982), or when they overestimate the precision of their estimates and forecasts (e.g., Oskamp 1965). Moore and Healy (2008) designate these three forms of overconfidence as overplacement, overestimation, and overprecision, respectively. We are concerned here with how people overestimate, or sometimes underestimate, their own absolute ability to perform a task in isolation. Remarkably, however, our explanation of the estimation bias predicts the overprecision phenomenon as well.

The estimation bias refers to the discrepancy between ex post objective performance (measured by the frequency of success in a task) and ex ante subjectively held confidence (Lichtenstein et al, 1982). It was first interpreted as a cognitive bias caused by the difficulty of the task (e.g., Griffin and Tversky 1992). This is the so-called "hard-easy effect" (Lichtenstein and Fischhoff, 1977): people underestimate their ability to perform an easy task and overestimate their ability to perform a difficult task. However, a recent literature has challenged this interpretation by seeking to explain apparent over- and underconfidence through the rational-Bayesian calculus of individuals discovering their own ability through experience and learning (Moore and Healy, 2008, Grieco and Hogarth, 2009, Benoît and Dubra, 2011, Van den Steen, 2011). While the cognitive-bias view describes self-confidence as a stable trait, the Bayesian learning perspective points to the experiences leading to over- or underconfidence.
The primary goal of this paper is to propose a parsimonious integration of the cognitive-bias and learning approaches.

We design a real-effort experiment which enables us to test the respective strengths of estimation biases and learning. People enter a game in which the task becomes increasingly difficult (i.e., risky) over time. By comparing, for three levels of difficulty, the subjective probability of success (confidence) with the objective frequency of success at three moments before and during the task, we examine the speed of learning one's ability for this task and the persistence of overconfidence with experience. We conjecture that subjects will first be underconfident when the task is easy and become overconfident as the task gets more difficult. However, "difficulty" is a relative notion, and a task that a low-ability individual finds difficult may look easy to a high-ability person. Thus, we should observe that overconfidence declines with ability and rises with difficulty. The question raised here is the following: if people initially have imperfect knowledge of their ability and miscalibrate their estimates, will their rising overconfidence as the task becomes increasingly difficult be offset by learning, and will they learn their true ability fast enough to stop the game before it is too late?
The popular game "double or quits" fits the previous description and inspires the following experiment. A modern version of this game is the world-famous TV show "Who Wants to Be a Millionaire". In the games of "double or quits" and "Who Wants to Be a Millionaire", players are first given a number of easy questions to answer, so that most of them win a small prize. At this point, they have an option to quit with their prize or to double by pursuing the game and answering a few more questions of increasing difficulty. The same sort of double-or-quits decision may be repeated several times in order to allow enormous gains in case of repeated success. However, if the player fails to answer a question, she must step out of the game with a consolation prize of lower value than the prize that she had previously declined.

Our experimental data reproduce the double or quits game. We observe that subjects are underconfident when facing a novel but easy task, whereas they are overconfident and willing to engage in tasks of increasing difficulty to the point of failing.

We propose a new model of "intuitive Bayesian learning" to interpret the data and draw new testable implications. Our model builds on ideas put forward by Erev et al (1994) and Moore and Healy (2008). It is Bayesian, like Moore and Healy (2008), while viewing confidence as a subjective probability of success, like Erev et al (1994). However, it introduces intuitive rationality to overcome a limitation of the rational-Bayesian framework, namely that it describes how rational people learn from experience without being able to predict the formation of confidence biases before completion of a task. This is not an innocuous limitation because it means, among other things, that the rational-Bayesian theory is inconsistent with the systematic probability distortions observed in decisions under risk or uncertainty since the advent of prospect theory (Kahneman and Tversky, 1979).
Therefore, we need to go deeper into the cognitive process of decision. In our view, subjects derive their beliefs exclusively from their prior and the informative signals that they receive. However, "intuitive Bayesians" decide on the basis of the sensory evidence that they perceive sequentially. If they feel uncertain about their prior belief, they will perceive the objection to it triggered by their doubt and wish to "test" its strength before making their decision, like decision makers weighing the pros and cons of an option. The perceived objection to a rational prior acts like a contrarian illusory signal that causes probability distortions in opposition to the prior, and this cognitive mechanism does not require completion of the task. As they gain experience, they keep applying Bayes' rule to update their prior belief, using both cues on their current performance and the prior-dependent contrarian signal. Thus, with the single assumption of intuitive rationality, we can account for all the cognitive biases described in our data within the Bayesian paradigm and integrate the cognitive-bias and learning approaches. With this model, and in contrast with Gervais and Odean (2001), we do not need to assume a self-attribution bias (Langer and Roth, 1975, Miller and Ross, 1975) combined with Bayesian learning to produce overconfidence. Signals of future success and failure are treated symmetrically. Finally, unlike models of confidence management (e.g., Brunnermeier and Parker 2005, Köszegi 2006, Mobius et al 2014), we do not have to postulate that individuals manipulate their beliefs and derive direct utility from optimistic beliefs about themselves.

Section 2 lays down the structure of the experiment and incentives, and provides the basic descriptive statistics. Our large data set allows a thorough description of confidence biases and a dynamic view of their evolution with experience of the task. Section 3 describes the confidence biases and learning shown by our data. Four basic facts about confidence are reported: (i) limited discrimination among different tasks; (ii) miscalibration of subjective probabilities of success, elicited by the "hard-easy effect"; (iii) differential, ability-dependent calibration biases known as the Dunning-Kruger (or ability) effect (Kruger and Dunning, 1999); and (iv) local, but not global, learning. Section 4 proposes a new theory of over- (under-)confidence among intuitive Bayesians which integrates doubt and learning and can predict biases before as well as during the task, in repeated as well as in single trials. Doubt-driven miscalibration appears to be a sufficient explanation not only for the hard-easy effect and the 'ability' or Dunning-Kruger effect, but also for limited discrimination and for the overprecision phenomenon. The theory is further used in section 5 to predict the evolution of confidence with experience on our data set. For instance, low-ability subjects first lose confidence when they discover their low performance during the first and easiest level, but they eventually regain their initial confidence in their ability to perform more difficult tasks in the future after laborious but successful completion of the first level.
Intuitive Bayesians exhibit conservatism, that is, under-reaction to received information, and slow learning. Finally, we show in sub-section 5.3 that the cues upon which subjects construct their own estimate of success, i.e., confidence, differ widely from the genuine predictors of success, which further explains the planning fallacy. The conclusion follows in section 6.

Using German survey data about stock market forecasters, Deaves et al (2010) do not confirm that success has a greater impact than failure on self-confidence, which casts doubt on the self-attribution bias explanation.

In studies where subjects are free to stay or to leave after negative feedback, the subjects whose confidence in their future success responds most strongly to negative feedback are selectively sorted out of the sample. This creates an asymmetry in measured responses to positive and negative feedback. Such spurious asymmetry does not exist in the present experiment, because subjects who fail to reach one level must drop out of the game.

The planning fallacy is the tendency to underestimate the time needed for completion of a task. See, e.g., Buehler et al (2002).

i.e., 6 anagrams per round to be solved in no more than eight minutes). It is long enough to let participants feel that a large effort and ability are required of them to succeed at the optional upper levels. It also gives them ample time to learn the task.
The middle and high levels, which come next, comprise 3 rounds each.

The gradient of task difficulty was manipulated after completion of the training level, and two conditions are available: (i) in the 'wall' condition, the difficulty jumps sharply at the middle level but remains constant at the high level; (ii) in the 'hill' condition, the difficulty always rises from one level to the next, slowly at first at the middle level, then sharply at the high level.

By the end of the experiment, the required number of anagrams is the same for the 'wall' and 'hill' conditions. However, the distribution of anagrams to be decoded differs between these two conditions. In the wall condition, ten anagrams per round are proposed at the middle and high levels, of which at least 20 must be decoded per level. In the hill condition, eight anagrams per round are proposed at the middle level, and this rises to twelve anagrams at the high level. Decoding sixteen anagrams in three rounds is required for the middle level, and decoding twenty-four anagrams in three rounds is required for the high level. This design can be visualized in Figure 1. The same figure appears (without the legends) on the screen before each round.

The screen highlights the round, the number of correct anagrams cumulated during the current level and the number of anagrams needed to pass this level.

The manipulation of the 'wall' and 'hill' conditions gave rise to three treatments:
– Wall treatment (wall): the wall condition is imposed on participants who passed the training level;
– Hill treatment (hill): the hill condition is imposed on participants who passed the training level;
– Choice treatment (choice): a choice between the two conditions (wall or hill) is proposed to participants who passed the training level.

Fig. 1
Decision problem perceived by participants at the start of level 2 of the choice treatment.
Notes: Payoffs in parentheses: (fail, success and stop). Decisions I, II and III are conditional on success in the previous level. Decision II depends on the treatment. Estimation of Confidence After is conditional on success in the first level and on the decision to start the second level.

The double or quits game is played under these three treatments. All subjects first go through the training level. Those who were successful, i.e., those who solved at least 36 anagrams during the training level, are then asked to double or quits:
– Double: continue to the next level to win a substantial increase in earnings;
– Quits: stop the experiment and take your earnings.

Participants who decide to go to the middle level get a consolation prize that is lower than the forgone earnings if they fail or drop out before the third round. If they pass the middle level, they are asked again to double or quits. The same rules apply for the high level at rising levels of earnings. The potential gains (in Euros) were (10, 2) at the training level, that is, €10 for successful quitters and €2 for failures, (14, 4) at the middle level, and (26, 11) at the high level.

2.2 Experimental sessions
We ran 24 sessions for a total of 410 participants, half for the choice treatment and the other half equally split between the 'wall' and 'hill' treatments. Eight sessions were run in the BULCIRANO lab (Center for Interuniversity Research and Analysis on Organizations), Montreal (Canada), and the same number of sessions were conducted at the LEEP (Laboratoire d'Economie Expérimentale de Paris), Panthéon-Sorbonne University. The difference between Paris and Montreal was observed to be insignificant. Thus, eight additional sessions were conducted at LEEP in order to acquire robust results. A show-up fee of €5 in Paris and Can$5 in Montreal was paid to the participants (from now on, all money amounts will be given in Euros). About 80% of the participants were students.

At the start, instructions were read out and a hard copy was also provided individually.
Participants answered six questions to test their full comprehension of the experiment. Information on gender, age, educational level and labor market status was requested. The last question was a hypothetical choice between €5 for sure and an ambiguous urn containing 100 balls of two colors (white and black) in unknown proportions. Ten Euros (€10) were to be earned if a black ball was drawn. Choice of the sure gain provided a rough but simple measure of risk aversion in the uncertainty context of the experiment.

2.3 Descriptive statistics
The main descriptive statistics for the three treatments are reported in Table 1.

Table 1
Descriptive statistics for the three treatments
Variables (treatments) Wall Hill Choice
Male 56% 48% 49%
Age 24.5 25.8 25.1
Risk averse 54% 59% 51%
Payments 9.1 8.9 7.8
Total anagrams solved 55.6 53.7 54.3
Ability
Number of observations 101 106 203
Decision to double conditional on success at previous level:
Middle level 78% (91) 76% (90) 77% (176)
High level 95% (22) 72% (29) 82% (34)
Notes: Decision to double to High level: the difference between the "Wall" and "Hill" treatments is significant at 5%; all other differences are not significant at 10% level (t-test). Number of participants successfully clearing the previous level is in parentheses.
The test results show that the three samples are homogeneous: no significant difference is observed among the sample means of individual characteristics. As expected, the 'wall' and 'hill' treatments had a substantial impact on the decision to double after passing the middle level. Almost everybody doubles at that point in the 'wall' treatment because the high level is no more difficult than the middle level. In contrast, only 72% enter the high level in the 'hill' treatment, as the difficulty gradient is very steep (t-test: t = 2.20; p-value = 0.033). In spite of these differences, the number of anagrams solved and payments may be considered equal across treatments at the usual levels of significance.

Ability is measured by the number of anagrams solved per minute in the first 4 rounds. It lies in the interval [0, 6].
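The design and payoff figures given earlier in this section can be collected in a short sketch. It checks two implications of the stated numbers: both conditions require 40 anagrams in total beyond the training level, and the required share of offered anagrams is two-thirds at every level, so what separates 'wall' from 'hill' is the pace required per round (about 6.7 at both levels in wall; about 5.3, then 8, in hill). It also computes the success probability above which a player would double, under the assumption of risk neutrality and ignoring the option value of later levels. The names `DESIGN`, `PAYOFFS`, `required_share`, `required_pace` and `doubling_threshold` are hypothetical, chosen for illustration only.

```python
from fractions import Fraction

# Per level: (anagrams offered per round, rounds, anagrams required to pass),
# as stated in the design description.
DESIGN = {
    "wall": {"middle": (10, 3, 20), "high": (10, 3, 20)},
    "hill": {"middle": (8, 3, 16), "high": (12, 3, 24)},
}

# Euros: (prize for passing the level and stopping, consolation prize on failure).
PAYOFFS = {"training": (10, 2), "middle": (14, 4), "high": (26, 11)}


def required_share(offered, rounds, required):
    """Fraction of the offered anagrams that must be solved to pass a level."""
    return Fraction(required, offered * rounds)


def required_pace(rounds, required):
    """Average number of anagrams that must be solved per round."""
    return Fraction(required, rounds)


def doubling_threshold(current_prize, next_level):
    """Success probability above which a risk-neutral player doubles.

    Quitting pays current_prize for sure; doubling pays the next level's
    prize with probability p and its consolation prize otherwise. The
    option value of any later level is ignored (an assumption).
    """
    win, lose = PAYOFFS[next_level]
    return (current_prize - lose) / (win - lose)


for condition, levels in DESIGN.items():
    total = sum(required for _, _, required in levels.values())
    print(condition, "total required beyond training:", total)
    for level, (offered, rounds, required) in levels.items():
        print(" ", level, "share", required_share(offered, rounds, required),
              "pace per round", float(required_pace(rounds, required)))

print(doubling_threshold(PAYOFFS["training"][0], "middle"))  # 0.6
print(doubling_threshold(PAYOFFS["middle"][0], "high"))      # 0.2
```

Under these assumptions, a risk-neutral player would double after the training level only if she put her chance of passing the middle level at 60% or more, and after the middle level only at 20% or more. Actual doubling decisions also reflect risk attitudes and confidence, so these thresholds are only a benchmark.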
Subjects can also be grouped into three ability levels, according to the number of anagrams solved per minute in the first 4 rounds: high ability (first tercile), medium ability (second tercile) and low ability (last tercile). Some descriptive statistics for the three ability groups are reported in Table 2. The three groups are homogeneous in terms of gender and risk aversion, but a slightly greater proportion of low-ability subjects can be found among older, probably non-student, participants.
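A minimal sketch of the tercile grouping, assuming an ability score (anagrams per minute over the first 4 rounds) has already been computed for each subject. The text does not say how subjects lying exactly at a cut point were assigned, so the rule below (ties go to the lower group) is a hypothetical convention, and the scores are made-up values.

```python
import statistics


def ability_terciles(abilities):
    """Label each subject 'high', 'medium' or 'low' by ability tercile.

    Cut points come from statistics.quantiles(..., n=3) with its default
    'exclusive' method. Subjects exactly at a cut point go to the lower
    group: a hypothetical convention, since the text does not specify one.
    """
    lower_cut, upper_cut = statistics.quantiles(abilities, n=3)
    labels = []
    for a in abilities:
        if a > upper_cut:
            labels.append("high")    # first tercile in the text's ordering
        elif a > lower_cut:
            labels.append("medium")  # second tercile
        else:
            labels.append("low")     # last tercile
    return labels


# Hypothetical abilities (anagrams per minute, within [0, 6]).
scores = [0.8, 1.1, 1.5, 2.0, 2.4, 2.9, 3.6, 4.5, 5.2]
print(ability_terciles(scores))
```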
Table 2
Descriptive statistics by ability level
Variables High Medium Low M-H L-M L-H
(High, Medium, Low: level of ability; M-H, L-M, L-H: difference between ability groups)
Male 47% 54% 50% ns ns ns
Age 23.6 24.5 27.2 ns *** ***
Risk Aversion 53% 50% 59% ns ns ns
Payments 11.7 7.7 6.0 *** ** ***
Number anagrams solved 67.7 53.8 42.6 *** *** ***
Ability 4.5 2.4 1.1 *** *** ***
Number of observations 131 142 137
Decision to double conditional on success at previous level:
Middle level 91% (128) 81% (127) 54% (102) ** *** ***
High level 87% (55) 72% (25) 80% (5) * ns ns
Notes: Significance level: * 10%; ** 5%; *** 1%; ns: not significant at 10% level (t-test). Number of participants successfully clearing the previous level is in parentheses.
Table 2 shows that "ability" strongly discriminates among participants in terms of performance (total anagrams solved, payments) and of the decision to quit before the middle level. However, the training level was designed to be easy enough that three-quarters (102 of 137) of low-ability subjects would pass it.

2.4 Confidence judgments
Participants were asked to state their subjective probability of success for the three levels at three moments: before, during, and after the training level. Before beginning the game, they were shown a demonstration slide which lasted one minute. Anagrams of the kind they would have to solve appeared on the screen with their solution. Then, they were asked to assess their chances of success on a scale of 0 to 100 (Adams, 1957), and the game started for real. After four rounds of decoding anagrams, players were asked again to rate their confidence. Lastly, players who had passed the training level and decided to double re-estimated their chances of success for the middle and high levels.

The Adams (1957) scale that we used is convenient for quantitative analysis because it converts confidence into (almost) continuous subjective probabilities. For consistency, the reported chances of success were required not to increase as the difficulty level increased, and answers could not be validated as long as they remained inconsistent. Subjects actually used the whole scale but, before the experiment, 14% expressed absolute certainty that they would pass the first level, and only 1 participant was sure that she would fail.

We did not directly incentivize beliefs because our primary aim was not to force subjects to make optimal forecasts of their chances of success but to have them sincerely report their true beliefs in their attempt to maximize their subjective expected utility, and to observe the variation of such beliefs with experience.
The true beliefs are those which dictate actual behavior following such a prediction, and that behavior was incentivized by the money gains based on subjects' decisions to double or quit and on their performance in the task. Armantier and Treich (2013) have recently generalized previous work on proper scoring rules (see their extensive bibliography). They show that, when subjects have a financial stake in the events they are predicting and can hedge their predictions by taking additional action after reporting their beliefs, the use of any proper scoring rule generates complex distortions in the predictions and in further behavior, since these are not independent and in general differ from what they would have been if each had been decided separately. In the present context, final performance yields income and does not immediately follow the forecast. Hence, incentivizing forecasts might lead subjects to try to adjust their behavior gradually to their forecast and, therefore, unduly condition their behavior. A further difficulty in this experiment is that incentivizing beliefs on three successive occasions would induce risk-averse subjects to diversify their reported estimates as a hedge against the risk of prediction error.

Self-report methods have been widely used and validated by psychologists and neuroscientists, and recent careful comparisons of this method with the quadratic scoring rule found that it performed as well as (Clark and Friesen, 2009) or better than (Hollard et al, 2015) the quadratic scoring rule. Considering that self-reports perform well while being much simpler and faster than incentive-compatible rules, the use of self-reports seemed appropriate in this experiment.

After the subject has reported a probability p, the quadratic scoring rule imposes a cost that is proportional to $(1 - p)^2$ in case of success and to $(0 - p)^2$ in case of failure. The score takes the general form $S = a - b \cdot \text{Cost}$, with $a, b > 0$.
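To see why the quadratic scoring rule rewards truthful reporting, and why the lottery rule (described in the accompanying footnote) does so without relying on risk neutrality, one can compute the payoff criterion of a subject whose true belief of success is π under each rule and check that it peaks at the report p = π. Under the lottery rule, the uniform draw q gives a reward probability of p·π + (1 − p²)/2, whose maximum is at p = π; because the reward is always the same amount, any subject who prefers more money to less wants to maximize that probability. This is a minimal sketch: the constants a = b = 1, the belief values and the grid search are illustrative choices, not parameters of the cited studies.

```python
def expected_quadratic_score(p, belief, a=1.0, b=1.0):
    """Expected score S = a - b * Cost, with Cost = (1 - p)**2 on success and
    (0 - p)**2 on failure, for a subject who believes P(success) = belief."""
    return belief * (a - b * (1 - p) ** 2) + (1 - belief) * (a - b * p ** 2)


def lottery_win_probability(p, belief):
    """Probability of earning the reward under the lottery rule.

    q ~ Uniform(0, 1): if q < p the subject is paid by the task (success with
    probability `belief`); otherwise by a bet that wins with probability q.
    Integrating over q gives p * belief + (1 - p**2) / 2.
    """
    return p * belief + (1 - p ** 2) / 2


def best_report(criterion, belief, steps=100):
    """Grid-search the report p in [0, 1] that maximizes `criterion`."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: criterion(p, belief))


belief = 0.7
print(best_report(expected_quadratic_score, belief))  # 0.7
print(best_report(lottery_win_probability, belief))   # 0.7
```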
The second study also included the lottery rule in the comparison and found that the latter slightly outperformed self-report. The lottery rule rests on the following mechanism: after the subject has reported a probability p, a random number q is drawn. If q is smaller than p, the subject is paid according to the task. If q is greater than p, the subject is paid according to a risky bet that provides the same reward with probability q. The lottery rule cannot be implemented in our design.

whatsoever of the characteristics, nor even the existence, of the other path.

Result 1 (Limited discrimination): Subjects do not perceive differences of difficulty between two different tasks in the future unless such differences are particularly salient. Moreover, they are not forward-looking, in the sense that they are unable to anticipate the increased likelihood of their success at the high level conditional on passing the middle level. However, they can be sophisticated when it is time for them to choose.

Support for result 1:
Table 3 compares confidence judgments regarding the three levels of difficulty among the 'wall' and 'hill' subjects before, during, and after the training period. Although the 'wall' and 'hill' paths were designed to be quite different at the middle and high levels, the subjective estimates of success exhibit almost no significant difference at any level. The single exception concerns the early estimate (before round 1) regarding the high level, for which the difference of gradient between the two paths is particularly salient. However, the difference ceases to be significant as subjects acquire experience of the task. This striking observation suggests that individuals are unable to discriminate distinctive characteristics of the task unless the latter are particularly salient.

Perhaps even more disturbing is the fact that, in Table 3, subjects discount their confidence from the middle to the high level as much in the Wall as in the Hill treatment. For instance, just before the middle level, the ratio of confidence in passing the high level to confidence in passing the middle level was close to 0.70 in both treatments. However, a perfectly rational agent should realize that the high level is no more difficult than the middle level in the Wall treatment, whereas it is much more difficult in the Hill treatment. Thus, she should report almost the same confidence at both levels in the Wall treatment, and a considerably lower confidence at the high level in the Hill treatment. The latter observation suggests that most individuals are unable to compute conditional probabilities accurately, even when the conditional probability is equal to one, as in the Wall treatment. They do not anticipate that, if they demonstrate the ability to solve 20 anagrams or more at the middle level, they should be almost sure to solve 20 or more at the high level.
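The ratio argument can be made explicit with the before-round-10 entries of Table 3, taken here as the "just before the middle level" figures the text refers to. Since passing the high level requires passing the middle level first, P(high) = P(middle) × P(high | middle), so the ratio of the two reported confidences is the conditional probability that subjects implicitly assign to passing the high level once the middle level is passed. The sketch assumes, as the text argues, that this conditional probability should be close to one in the Wall treatment; the function name is illustrative.

```python
# Mean confidence (%) before round 10, i.e. just before the middle level (Table 3).
before_round_10 = {
    "wall": {"middle": 60, "high": 43},
    "hill": {"middle": 56, "high": 39},
}


def implied_conditional(conf):
    """P(pass high | pass middle) implied by reported confidences,
    using P(high) = P(middle) * P(high | middle)."""
    return conf["high"] / conf["middle"]


for treatment, conf in before_round_10.items():
    print(treatment, round(implied_conditional(conf), 2))
# wall 0.72
# hill 0.7

# Benchmark for Wall if P(high | middle) is about 1: confidence at the high
# level should then be close to confidence at the middle level.
wall_benchmark_high = before_round_10["wall"]["middle"] * 1.0
print(wall_benchmark_high)  # 60.0
```

In other words, Wall subjects behave as if they had roughly a 28% chance of failing a level identical to one they have just passed.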
However, subjects do make the right inference when it is time for them to decide, since 95% of subjects who passed the middle level in the Wall treatment decided to continue (Table 1). And, if they have a choice between Wall and Hill, they do distinguish between these two tracks: 71.4% of doublers then prefer the Wall track, although they would have greater chances of success at the middle level if they chose Hill. This observation suggests that subjects did not maximize their immediate probability of success but made a sophisticated comparison of the expected utility of both tracks, taking the option value of Wall into consideration before making an irreversible choice of track spanning two periods.

We are grateful to Luis Santos-Pinto for making the last point clear in early discussions.
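The option-value argument can be illustrated with a two-stage calculation that uses the payoffs given in Section 2 (€14 or €4 at the middle level, €26 or €11 at the high level). This is a sketch under stated assumptions: the subject is taken to be risk neutral (the text speaks more generally of expected utility), and the success probabilities are purely hypothetical, chosen only so that Hill offers the better chance at the middle level while Wall offers near-certain success at the high level once the middle level is passed.

```python
# Payoffs in euros: (prize if the level is passed, consolation prize if failed).
PAYOFFS = {"middle": (14, 4), "high": (26, 11)}


def value_after_middle(p_high_given_middle):
    """Risk-neutral value once the middle level is passed:
    quit with the middle-level prize, or double for the high level."""
    win, lose = PAYOFFS["high"]
    double = p_high_given_middle * win + (1 - p_high_given_middle) * lose
    return max(PAYOFFS["middle"][0], double)


def value_of_track(p_middle, p_high_given_middle):
    """Risk-neutral value of committing to a track before the middle level."""
    _, lose = PAYOFFS["middle"]
    return p_middle * value_after_middle(p_high_given_middle) + (1 - p_middle) * lose


# Hypothetical beliefs: Hill is easier at the middle level, but Wall's high
# level is no harder than its middle level.
wall = value_of_track(p_middle=0.5, p_high_given_middle=1.0)
hill = value_of_track(p_middle=0.6, p_high_given_middle=0.3)
print(round(wall, 2), round(hill, 2))  # 15.0 10.9
```

With these numbers, a risk-neutral subject prefers Wall (€15.0 against about €10.9) even though Hill maximizes the immediate probability of success, which is the pattern the text attributes to the option value of Wall.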
Table 3
A comparison of confidence for the wall and hill treatments shown separately
No-choice treatment
Subjective confidence Wall (%) Hill (%) Difference
Before round 1:
Level 1 80 77 ns
Level 2 62 58 ns
Level 3 47 40 **
Before round 5:
Level 1 71 71 ns
Level 2 53 52 ns
Level 3 40 36 ns
Before round 10:
Level 2 60 56 ns
Level 3 43 39 ns
Notes. Observations before rounds 1 and 5 (before round 10): 101 (71) for wall and 106 (68) for hill.
Significance level: * p < 0.1, ** p < 0.05, *** p < 0.01; ns: not significant at 10% level.

Result 2 (The hard-easy effect): In comparison with actual performance, confidence in one's ability to reach a given level is underestimated for a novel but relatively easy task (the training level), and it is overestimated for the subsequent, more difficult tasks (the middle and high levels). Overconfidence increases in relative terms with the difficulty of the task. Conditional on an initial success (training level) and on the decision to continue, confidence in one's ability to reach higher levels is still overestimated. Thus, initially successful subjects remain too optimistic about their future.

Support for result 2:
Figure 2 compares the measured frequency of success with the reported subjective confidence in the three successive levels of increasing difficulty. For the middle and high levels, we also indicate these two probabilities as they appear before the training period and after it, conditional on doubling. The Choice and No-choice conditions have been aggregated in this figure because no significant difference was found in the tests.

The task required at the training level was relatively easy for our subjects, since 87% passed this level. However, subjects started it without knowing what it would be like and, even after four rounds of training, they underestimated their own ability, reporting a low 77% probability of success. The difference between the two percentages is significant (t = 5.77, p < 0.001; t-test). Hence, individuals are underconfident on the novel but relatively easy task.

In contrast, subjects appear to be overconfident as the task gets increasingly difficult. They consistently lower their estimated probabilities of success but do not adjust their estimates in proportion to the difficulty of the task. Thus, individuals tend to overestimate their own chances at the advanced levels. The difference between the frequency of success and confidence before the task is always significant, both at the middle level (t = 18.3, p < 0.001) and at the high level (t = 17.1, p < 0.001).

Fig. 2
Hard-easy effect observed at three levels
Notes:
Observations: before training level (N = 410); after training level (N = 275; analysis restricted to doublers).
Differences between frequency of success and confidence (before and after) are significant at 1% at all levels (Training, Middle and High) (t-test).

The same conclusions hold conditional on passing the training level and choosing to double. Subjects remain overconfident in their future chances of success. However, their confidence does not rise after their initial success in proportion to their chances of further success.

3.3 The ability effect
Result 3 (The ability effect): Overcalibration diminishes with task-specific ability.

Support for result 3:
The hard-easy effect is reproduced in Figures 3a, 3b and 3c for the three ability terciles. Low-ability subjects are clearly more overconfident at the middle and high levels than high- and medium-ability individuals. This result confirms earlier observations of Kruger and Dunning (1999), among others (see Ryvkin et al (2012) for a recent overview and incentivized experiments). The so-called Dunning-Kruger effect has been attributed to a metacognitive inability of the unskilled to recognize their mistakes. We give here another and, in our opinion, simpler explanation. The ability (or Dunning-Kruger) effect may be seen as a corollary of the hard-easy effect because "difficulty" is a relative notion, and a task that a low-ability individual finds difficult certainly looks easier to a high-ability person. Thus, if overconfidence rises with the difficulty of a task, it is natural to observe that, on a given task, it declines with the ability of performers.

The difference between confidence and frequency of success is significant at 1% for all ability levels. For these figures, we selected confidence reported after the 4th round (during the training level) in order to minimize the impact of mismeasurement.

The Dunning-Kruger effect initially addressed general knowledge questions, whereas we consider self-assessments of own performance in a real-effort task.

Our explanation may also fit better than the initial explanation that the unskilled are unaware of their lower abilities. Miller and Geraci (2011) found that students with poor abilities showed greater overconfidence than high-performing students, but they also reported lower confidence in these predictions.

Fig. 3a
Under-confidence at the training level, by ability.

Fig. 3b
Overconfidence at middle level, by ability.
Fig. 3c
Overconfidence at high level, by ability.
Result 4 (Learning is local, not global): Confidence and performance co-vary during the task. Subjects learned locally upon experiencing variations in their performance. However, they did not learn globally in our experiment, since doublers remained as confident as before after completing the training level, irrespective of their true ability level.

Support for result 4:
Figures 4 and 5 describe confidence by ability group before, during, and after the training period for the middle and high levels respectively, whereas Figure 6 describes the variation of performance of the same groups within the same period. These graphs, taken together, show a decline in both (ability-adjusted) confidence and performance during the first four rounds, followed by a concomitant rise of confidence and performance in the following rounds. The observed decline of confidence at the beginning of the training period can be related to the fact, shown in Figure 6, that participants solved fewer and fewer anagrams per period during the first four periods: 5.51 on average in period 1, 5.18 in period 2, 4.60 in period 3, and 4.17 in period 4. Subjects kept solving at least two-thirds of the anagrams available during the training session but probably lost part of their motivation on repeating the task. On sequentially observing their declining performance, they revised their initial estimate of future success downward. However, on being asked to report their confidence after four rounds, they became conscious of their performance decline and responded to this information feedback. Performance rose sharply but momentarily during the next two rounds. The average performance first rose to 4.37 in period 5 and 5.05 in period 6, then sharply declined to 4.39 in period 7, 4.06 in period 8 and 3.48 in period 9. As soon as subjects became (almost) sure of passing the training level, they diminished their effort. During the experiment it was also observed that individuals stopped decoding further anagrams as soon as the minimum requirement to clear a level was fulfilled.

Subjects experiencing low (medium) performance in the first rounds seem to learn locally that they have a low (medium) ability, since the confidence gap widens during the first four periods. However, this learning effect is short-lived, since the confidence gap shrinks back to its initial size after low (medium)-ability subjects strove to succeed, increasing their performance (as reported in Figure 6) and regaining confidence. Eventually, experienced "doublers" are as confident of succeeding at higher levels as they were before the task, irrespective of their ability level: there is no global learning effect. We share the conclusion of Merkle and Weber (2011) that the persistence of prior beliefs is inconsistent with fully rational-Bayesian behavior (see also Benoît et al 2015).

Footnote: No significant difference was found between the Choice and No-choice conditions, suggesting that the option to choose the preferred path does not trigger an illusion of control.

Footnote: Participants who reported confidence after the training period were more able than average, since they had passed this level and decided to double. Thus, we compare ability-adjusted confidence Before and During with the reported confidence After. The ability-adjusted confidence Before and During are obtained by running a simple linear regression of confidence Before and During on ability, measured by the average number of anagrams solved per minute in the first 4 rounds of the training level. The estimated effect of the superior ability of doublers was added to confidence During or Before to get the ability-adjusted confidence, which compares directly with the observed confidence After.

Footnote: With a single exception, confidence variations are statistically significant at the 1% level in the middle and high levels. There was no significant difference between treatments.
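The ability adjustment used for Figures 4 and 5 can be sketched as follows. This is a minimal illustration under our own reading of the footnote on ability-adjusted confidence, not the authors' code: it assumes the adjustment equals the OLS slope of confidence on ability multiplied by the gap between the doublers' mean ability and the whole group's mean ability. The function names and the toy data are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ability_adjusted_confidence(confidence, ability, is_doubler):
    """Hypothetical helper: the group's mean confidence shifted by the
    estimated effect of the doublers' superior ability.

    confidence -- confidence reported Before (or During), one value per subject
    ability    -- anagrams solved per minute in rounds 1-4 (assumed measure)
    is_doubler -- True for subjects who passed the training level and doubled
    """
    slope = ols_slope(ability, confidence)
    doubler_ability = mean(a for a, d in zip(ability, is_doubler) if d)
    return mean(confidence) + slope * (doubler_ability - mean(ability))
```

When confidence is exactly linear in ability, as in `ability_adjusted_confidence([30, 40, 50, 60], [1, 2, 3, 4], [False, False, True, True])`, the adjusted value (55.0) coincides with the doublers' own mean confidence, which is the comparison the adjustment is meant to make possible.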
Fig. 4
Variation of confidence with experience, by level of ability: middle level
Notes.
Sample size: 410 individuals for Before and During, and 275 for After (only doublers). We report the adjusted ability for doublers; see Footnote 13 for more details.
Differences between ability levels are significant at the 1% level Before and During. Differences After are not significant at the 10% level.
Differences by ability level: High-ability: During-Before: ***; After-During: ns; After-Before: ns. Medium-ability: During-Before: ***; After-During: ***; After-Before: **. Low-ability: During-Before: ***; After-During: ***; After-Before: ns.
Significance level: *** 1%; ** 5%; * 10%; ns: not significant at 10% level (t-test).
Fig. 5
Variation of confidence with experience, by level of ability: high level
Notes.
Sample size: 410 individuals for Before and During, and 275 for After (only doublers). We report the adjusted ability for doublers; see Footnote 13 for more details.
Differences between ability levels are significant at the 1% level Before and During. Differences After are not significant at the 10% level.
Differences by ability level: High-ability: During-Before: ns; After-During: ns; After-Before: ns. Medium-ability: During-Before: ***; After-During: **; After-Before: ns. Low-ability: During-Before: ***; After-During: ***; After-Before: ns.
Significance level: *** 1%; ** 5%; * 10%; ns: not significant at 10% level (t-test).
Fig. 6
Number of anagrams solved per round by level of ability
We now present a simple Bayesian model that describes absolute confidence reported before and during completion of a task, and predicts limited discrimination, the hard-easy effect and the ability effect. It builds on ideas put forward by Erev et al (1994) and Moore and Healy (2008), who both consider that confidence, like most judgments, is subject to errors. Erev et al (1994) view confidence as a subjective probability that must lie between 0 and 1. Hence, probabilities close to 1 are most likely to be underestimated and probabilities close to 0 are most likely to be overestimated. The hard-easy effect and the ability effect may be merely the consequence of that simple truth. However, their theory offers a qualitative assessment that lacks precision and cannot be applied to intermediate values of confidence. Moore and Healy (2008) analyze confidence as a score in a quiz that the player must guess after completion of the task and before knowing her true performance. Bayesian players adjust their prior estimate after receiving a subjective signal from their own experience. It is natural to think that signals are randomly distributed around their true unknown value. Assuming normal distributions for the signal and the prior, the posterior expectation of confidence is then a weighted average of the prior and the signal, lying necessarily between these two values. Thus, if the task was easier than expected, the signal tends to be higher than the prior. The attraction of the prior pulls reported confidence below the high signal, hence below true performance on average, since the signal is drawn from an unbiased distribution. While rational-Bayesian models like Moore and Healy (2008) may account for learning over experience, they fail to predict limited discrimination, miscalibration of confidence before completion of the task, or the absence of global learning. Therefore, we add to the Bayesian model a crucial but hidden aspect of behavior under risk or uncertainty, namely doubt.
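The Moore and Healy (2008) argument rests on the standard normal-normal updating rule. A minimal sketch, assuming known precisions for the prior and the signal; the function name and the numbers are illustrative and are not taken from their paper:

```python
def normal_posterior_mean(prior_mean, prior_precision, signal, signal_precision):
    """Posterior mean with a normal prior and a normal signal.

    It is a precision-weighted average of the prior and the signal,
    so it always lies between the two values.
    """
    weight_on_prior = prior_precision / (prior_precision + signal_precision)
    return weight_on_prior * prior_mean + (1 - weight_on_prior) * signal


# Task easier than expected: the signal (0.9) exceeds the prior (0.5),
# and the attraction of the prior pulls the posterior (about 0.6) below the signal.
posterior = normal_posterior_mean(0.5, 3.0, 0.9, 1.0)
```

The larger the prior precision relative to the signal precision, the closer the posterior stays to the prior, which is the "attraction of the prior" invoked in the text.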
We describe the behavior of subjects who are uncertain of their true probability of success and consequently become vulnerable to prediction errors and cognitive illusions if they rely essentially on what they perceive sequentially. We designate these subjects as "intuitive Bayesians". It turns out, unexpectedly, that the same model also predicts the overprecision bias of confidence, which we consider a further confirmation of its validity.

Intuitive Bayesians may miscalibrate their own probability of success even if they have an unbiased estimate of their own ability to succeed. This can occur if they are uncertain of the true probability of success, because they can be misled by "available" illusory signals triggered by their doubt. The direction of doubt is entirely different depending on whether their prior estimate led them to believe that they would fail or that they would succeed. We thus distinguish miscalibration among individuals who should normally believe that they should not perform the task from miscalibration among those who should normally believe that they should.

To facilitate intuition, let us first consider a subject who is almost sure to succeed at a task, either because the task is easy or because the subject has high ability ($H$). However, the "availability" of a possible failure acts like a negative signal, which leads to overweighting this possibility (Tversky and Kahneman, 1973) and underweighting her subjective probability of success $Ep_H$, i.e. underconfidence:

$$q_H = \mu Ep_H + (1 - \mu) \cdot 0 = \mu Ep_H \le Ep_H, \qquad (1)$$

with $0 < \mu \le 1$. Even though high-ability agents are almost sure of succeeding at the training level, their confidence is well below 1, confirming the Dunning-Kruger effect whereby high-ability subjects underestimate their abilities. An estimate of this undercalibration bias for an easy task is derived from Figure 3a:

$$\mu_H(\text{training level}) = \frac{q_H(\text{training level})}{0.98} \cong 0.806.$$

The undercalibration bias is $1 - 0.806 = 0.194$.

However, underweighting a high probability of success need not reverse the intention of doubling. Indeed, taking the expected value as the decision criterion, among 167 "able" subjects who should double if objective probabilities are used for computation, 158 (i.e. 94.6%) still intended to double according to the subjective confidence reported before the game.

At the other end of the spectrum, consider now a subject who is almost sure of failing, either because the task is very difficult or because the subject has low ability ($L$). However, the "availability" of a possible success leads to overweighting her subjective probability of success $Ep_L$, i.e. overconfidence:

$$q_L = \mu Ep_L + (1 - \mu) \cdot 1 \ge Ep_L, \qquad (2)$$

with $0 < \mu \le 1$. Thus, even though low-ability agents should give up a difficult task, they are overconfident and are thus tempted by the returns to success. In the limit, confidence remains positive even if one is almost certain to fail. This means that low-ability individuals always exhibit a positive minimum level of confidence, which is in line with the Dunning-Kruger effect (they overestimate their abilities). An estimate of this overcalibration bias for the high level is derived from Figure 3c by solving (2) for $1 - \mu$:

$$1 - \mu_L(\text{high level}) = \frac{q_L(\text{high level}) - 0.01}{1 - 0.01} \cong 0.333.$$

Footnote: The time t = (1, 5, 10) when confidence is reported is omitted in this subsection to alleviate notation.

Footnote: Very close numbers are obtained for all calibration biases with confidence reported during the game.

Footnote: This should not be confounded with motivated inference, as it applies symmetrically to undesirable and desirable outcomes.
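Equations (1) and (2) can be inverted to recover μ from reported confidence q and the success probability Ep, which is how the calibration biases in this subsection are obtained. The helper names below are hypothetical; the round trip uses the training-level estimate μ_H ≅ 0.806 and the high-level bias 1 − μ_L ≅ 0.333, together with the probabilities 0.98 and 0.01 that enter the formulas.

```python
def confidence(ep, mu, doubt):
    """Eqs. (1)-(2): doubt = 0 for a subject who should continue
    (underconfidence), doubt = 1 for one who should stop (overconfidence)."""
    return mu * ep + (1 - mu) * doubt


def mu_from_underconfidence(q, ep):
    """Invert eq. (1): q = mu * Ep."""
    return q / ep


def mu_from_overconfidence(q, ep):
    """Invert eq. (2): q = mu * Ep + (1 - mu), i.e. 1 - mu = (q - Ep) / (1 - Ep)."""
    return 1 - (q - ep) / (1 - ep)


q_high_ability = confidence(0.98, 0.806, doubt=0)  # training level, eq. (1)
q_low_ability = confidence(0.01, 0.667, doubt=1)   # high level, eq. (2)
```

The calibration bias is then 1 − μ. The two confidence values produced here are by-products of the round trip, not figures reported in the paper.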
Similarly, the overcalibration bias for the middle level is derived from Figure 3b:

$$1 - \mu'_L(\text{middle level}) = \frac{q_L(\text{middle level}) - 0.04}{1 - 0.04} \cong 0.427.$$

Notice that the overcalibration bias is about twice as large as the undercalibration bias. Accordingly, taking the expected value as the decision criterion, among 190 "unable" subjects who should quit if objective probabilities are used for computation, 159 (i.e. 83.7%) intended to double according to the subjective confidence reported before the game.

To sum up, we explain both the hard-easy effect and the ability effect by an availability bias triggered by doubt about one's possibility of failing a relatively easy task (underconfidence) or of succeeding at a relatively difficult task (overconfidence). If probabilities are updated in a Bayesian fashion, the calibration bias $1 - \mu$ is the relative precision of the illusory signal. It is inversely related to the absolute precision of the prior estimate and positively related to the absolute precision of the illusory signal. Thus, we should not be surprised to find that our estimate of the calibration bias is lower for the training level (19.4%) than for the upper levels (42.7% and 33.3% respectively), because experience in the first rounds of the training level must be more relevant for predicting the probability of success at the training level than at subsequent levels. And, when comparing upper levels, the illusion of success should be more credible for the near future (middle level) than for the more distant future (high level).

This explanation is also consistent with the other measures displayed in Figures 3a, 3b and 3c, given that they aggregate overconfident subjects who should not undertake the task with underconfident subjects who should undertake it. If $\lambda_L$ is the proportion who should stop and $\lambda_H$ the proportion who should continue ($\lambda_L + \lambda_H \equiv 1$), the average confidence is:

$$\lambda_L(\mu Ep_L + 1 - \mu) + \lambda_H \mu Ep_H = \mu Ep + (1 - \mu)\lambda_L,$$

where $Ep \equiv \lambda_L Ep_L + \lambda_H Ep_H$ is the group's average probability of success. Since average confidence exceeds $Ep$ by $(1 - \mu)(\lambda_L - Ep)$, confidence is overcalibrated on average iff $\lambda_L > Ep$ and undercalibrated iff the reverse condition holds.
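The aggregation argument can be checked numerically. A minimal sketch assuming, as in the text, a common μ for the two groups; the function names and parameter values are arbitrary illustrations:

```python
def average_confidence(lam_L, ep_L, ep_H, mu):
    """Average confidence of a group mixing a share lam_L of subjects who should
    stop (eq. 2) and a share 1 - lam_L of subjects who should continue (eq. 1)."""
    lam_H = 1 - lam_L
    return lam_L * (mu * ep_L + 1 - mu) + lam_H * mu * ep_H


def calibration_gap(lam_L, ep_L, ep_H, mu):
    """Average confidence minus the group's average success probability Ep."""
    ep = lam_L * ep_L + (1 - lam_L) * ep_H
    return average_confidence(lam_L, ep_L, ep_H, mu) - ep


# The gap equals (1 - mu) * (lam_L - Ep): positive iff lam_L > Ep.
lam_L, ep_L, ep_H, mu = 0.8, 0.01, 0.9, 0.667
ep = lam_L * ep_L + (1 - lam_L) * ep_H
gap = calibration_gap(lam_L, ep_L, ep_H, mu)
```

With these values most of the group should stop (λ_L = 0.8 exceeds Ep ≈ 0.19), so the group looks overconfident on average; lowering λ_L to 0.1 flips the sign of the gap.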
The apparent overcalibration of confidence for a difficult task takes less extreme values when the average measured ability of the group rises. For instance, the results displayed in Figure 3c are consistent with our estimate of the overcalibration bias if the proportion of successful middle-ability subjects is 12% and that of successful high-ability subjects is 25%, since these two predicted values are close to the observed frequencies of success in these groups, respectively 10% and 27%.

Footnote: The rational decision to undertake a non-trivial task of level l (with a possibility to fail and regret) is subjective. The economic criterion for making this decision rests on the comparison of the expected utilities of all options, conditional on the estimated probabilities of success at the time of decision. A rational subject should refuse the task if the expected utility of continuing to level l or above is no higher than the expected utility of stopping before level l. We make use of this criterion for writing equations 6 and 7 in the next sub-section (5.1).

Remarkably, this simple model of miscalibration also predicts limited discrimination. Although Wall is more difficult than Hill at the middle level, our subjects attributed on average about the same confidence level to both tasks (see Table 1). High-ability subjects who should double at the middle level in the Wall condition, and low-ability subjects who should stop before the middle level in the Hill condition, would both estimate their chances of success to be higher with 16 anagrams to solve with Hill than with 20 anagrams with Wall. The former would underestimate their chances according to (1) and the latter would overestimate them according to (2), but the difference between the two estimates would be the same, equal to $\mu(Ep_{Hill} - Ep_{Wall})$. Thus, if their prior estimates were unbiased, intuitive (i.e. $\mu < 1$) high- and low-ability subjects would imperfectly discriminate between Hill and Wall by underestimating the difficulty gap between them. Things are even worse for middle-ability subjects who should opt for the middle level under Hill and quit before the middle level under Wall. According to (1) and (2), those individuals would have a low estimate ($\mu Ep_{Hill}$) of their pass rate under Hill and a high estimate ($\mu Ep_{Wall} + 1 - \mu$) under Wall. They would then underestimate the difficulty gap more severely than high- or low-ability subjects, and they might even give a higher estimate under Wall than under Hill, which happens iff $Ep_{Hill} - Ep_{Wall} < (1 - \mu)/\mu$. Therefore, our model implies limited discrimination of differences in difficulty by intuitive Bayesians when the difference is not very salient.

A further implication of Bayesian updating is that, in the subject's mind, the precision of the posterior estimate of the probability of success, i.e. confidence in her estimate, is increased by reception of the illusory signal, whatever the latter may be. Therefore, our theory of confidence predicts the overprecision phenomenon even before completion of the task. In contrast with the other distortions of confidence, underprecision will never be observed, a prediction which is corroborated by Moore and Healy (2008), who do not quote any study in their discussion of "underprecision". The overestimation of the precision of acquired knowledge is an additional manifestation of the hidden search undertaken by intuitive Bayesians. Our analysis of overprecision is congruent with the observation that greater overconfidence of this kind was found for tasks in which subjects considered they were more competent (Heath and Tversky, 1991).

… $Ep^1$, after four rounds $Ep^5$, and after nine rounds (only for doublers) $Ep^{10}$.

After going through four rounds of anagrams, a number of cues on the task have been received and processed.
Footnote: It is assumed here, as in Table 1, that the two estimates are independent.

Footnote: If $\nu_i$ denotes the prior precision of subject i's estimate of her future success (omitting level l for simplicity), $\nu_i + 1 \equiv \Phi_i$ will be the posterior precision after reception of an i.i.d. signal. Thus, $\Phi_i > \nu_i$. Notice that $\mu_i = \frac{\nu_i}{\nu_i + 1}$.

Participants may recall how many anagrams they solved in each round and in the aggregate, whether they would have passed the test in each round or on the whole at this stage of the task, whether their performance improved or declined from one round to the next, how fast they could solve anagrams, and so forth. For the purpose of decision-making, cues are converted into a discrete set of i.i.d. Bernoulli variables taking value 1 if they signal to the individual that she should reach her goal for level l (l = 1, 2, 3), and 0 otherwise. The single parameter of the Bernoulli variable is its mean, which defines the expected likelihood of success. However, this mean is essentially unknown to the individual. Let it therefore be denoted by $\tilde{p}$, which is randomly distributed within the interval [0, 1]. Assume that the prior distribution of $\tilde{p}$ is a Beta distribution with reported mean $Ep^1_l$ and precision $\nu_l$.

Behaving like intuitive Bayesians, participants update their prior expectation of success at level l (l = 1, 2, 3) before the training session, $Ep^1_l$, in the following manner (see DeGroot 1970, Chapter 9):

$$Ep^5_l = \frac{\nu_l}{\nu_l + \tau_l}\, Ep^1_l + \frac{1}{\nu_l + \tau_l}\, X^{1-4}_l, \qquad (3)$$

with $\tau_l > 0$ designating the precision of all the independent cues perceived during the first four rounds, and $X^{1-4}_l$ defining the number of independent cues predicting future success at level l at this stage of the task.
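Equation (3) and the accompanying precision update are the usual Beta-Bernoulli conjugate update written in mean-precision form. A minimal sketch, assuming each cue counts as one Bernoulli observation of unit precision, so that τ_l is simply the number of cues (consistent with 0 ≤ X ≤ τ_l); the function name is hypothetical:

```python
def beta_bernoulli_update(prior_mean, prior_precision, cues):
    """Mean-precision form of the Beta-Bernoulli update.

    prior_mean      -- prior expected probability of success, Ep^1_l
    prior_precision -- nu_l = alpha + beta of the Beta prior
    cues            -- list of 0/1 cues; 1 signals future success at level l
    Returns the posterior mean Ep^5_l and the posterior precision nu_l + tau_l.
    """
    nu = prior_precision
    tau = len(cues)  # precision contributed by the cues (one unit per cue)
    x = sum(cues)    # number of cues predicting success
    posterior_mean = nu / (nu + tau) * prior_mean + x / (nu + tau)  # eq. (3)
    return posterior_mean, nu + tau
```

Since α = ν·Ep^1_l, the posterior mean equals (α + X)/(α + β + τ), the conjugate Beta posterior. For example, a prior mean of 0.6 with precision 10 and four favourable cues gives 10/14 ≈ 0.714, still well short of the cues' own frequency of 1: a precise prior makes updating conservative.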
They also update the precision of the posterior expectation $Ep^5_l$, which rises from $\nu_l$ to:

$$\nu^5_l = \nu_l + \tau_l, \qquad (3')$$

with $0 \le X^{1-4}_l \le \tau_l$.

Equation (3) cannot be directly estimated on the data because the estimated probabilities $Ep^1_l$ and $Ep^5_l$ are unobservable. However, it may be rewritten concisely in terms of reported confidence $q^1(l)$ and $q^5(l)$ with the help of the miscalibration equations (1) and (2). Let us express generally the Bayesian transformation of the probability estimates into confidence as:

$$q^5(l) = \mu^5_l Ep^5_l + (1 - \mu^5_l) D_{5,l}, \quad l = 1, 2, 3, \qquad (4)$$

with $\mu^5_l = \frac{\nu_l + \tau_l}{\nu_l + \tau_l + 1}$ and

$$D_{5,l} = \begin{cases} 1 & \text{if } \max_{l' \in \{0,\dots,l-1\}} EU(l' \mid Ep^5_{l'}) \ge \max_{l'' \in \{l,\dots,3\}} EU(l'' \mid Ep^5_{l''}), \\ 0 & \text{otherwise.} \end{cases}$$

Confidence is merely a weighted average of the prior forecast and a doubt term acting as a contrarian Bernoulli signal.

Footnote: In order to have an unambiguous definition of $D_{5,l}$ and $D_{1,l}$ below, we use the expected utility (EU) criterion, as explained in note 19.

And likewise:
$$q^1(l) = \mu^1_l Ep^1_l + (1 - \mu^1_l) D_{1,l}, \qquad (5)$$

with $\mu^1_l = \frac{\nu_l}{\nu_l + 1}$ and

$$D_{1,l} = \begin{cases} 1 & \text{if } \max_{l' \in \{0,\dots,l-1\}} EU(l' \mid Ep^1_{l'}) \ge \max_{l'' \in \{l,\dots,3\}} EU(l'' \mid Ep^1_{l''}), \\ 0 & \text{otherwise.} \end{cases}$$

Combining (3), (4) and (5), we get:

$$q^5(l) = \frac{\nu_l + 1}{\nu_l + \tau_l + 1}\, q^1(l) + \frac{1}{\nu_l + \tau_l + 1}\, X^{1-4}_l + \frac{1}{\nu_l + \tau_l + 1}\,(D_{5,l} - D_{1,l}). \qquad (6)$$

By the same reasoning, we can express the confidence of doublers for the upper levels l = 2, 3 as:

$$q^{10}(l) = \frac{\nu_l + \tau_l + 1}{\nu_l + \tau'_l + 1}\, q^5(l) + \frac{1}{\nu_l + \tau'_l + 1}\, X^{5-9}_l + \frac{1}{\nu_l + \tau'_l + 1}\,(D_{10,l} - D_{5,l}), \qquad (7)$$

with $\tau'_l \ge \tau_l$ designating the precision of all of the independent cues perceived during the training level (9 rounds), $\nu_l + \tau'_l$ the precision of the posterior expectation $Ep^{10}_l$, and $X^{5-9}_l$ defining the number of independent cues predicting future success at level l at this stage of the task.

Equations (6) and (7) are essentially the same, with a moving prior of increasing precision. In the absence of miscalibration, confidence reported before round t (t = 5, 10) would be a weighted average of prior confidence and the mean frequency of cues predicting future success at level l since the last time confidence was reported. With miscalibration, another term is added, which can take only three values (−1, 0 or 1), reflecting the occurrence and direction of change in subjects' estimated ability with experience. If experience confirms the prior intention to stop or continue to level l, this additional term takes value 0 and confidence is predicted by the rational-Bayesian model (with perfect calibration). However, if experience disconfirms the prior intention to stop or continue to level l, confidence rises above this reference value with disappointing experience and declines symmetrically below this reference value with encouraging experience.
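The reduced form (6) can be checked against the three structural equations it comes from. A minimal sketch with hypothetical function names and arbitrary parameter values, again treating τ_l as the number of unit-precision cues: invert (5) to recover Ep^1_l, update it with (3), map the result back to confidence with (4), and compare with (6) for every combination of the doubt terms.

```python
from itertools import product


def q5_structural(q1, nu, cues, d1, d5):
    """Chain eqs. (5), (3) and (4)."""
    mu1 = nu / (nu + 1)
    ep1 = (q1 - (1 - mu1) * d1) / mu1             # invert eq. (5)
    tau, x = len(cues), sum(cues)
    ep5 = nu / (nu + tau) * ep1 + x / (nu + tau)  # eq. (3)
    mu5 = (nu + tau) / (nu + tau + 1)
    return mu5 * ep5 + (1 - mu5) * d5             # eq. (4)


def q5_reduced(q1, nu, cues, d1, d5):
    """Eq. (6): prior confidence, cues, and the doubt term D5 - D1."""
    tau, x = len(cues), sum(cues)
    return ((nu + 1) * q1 + x + (d5 - d1)) / (nu + tau + 1)


q1, nu, cues = 0.7, 8.0, [1, 0, 0, 1, 0, 0]
for d1, d5 in product((0, 1), repeat=2):
    assert abs(q5_structural(q1, nu, cues, d1, d5) - q5_reduced(q1, nu, cues, d1, d5)) < 1e-12
```

When experience reverses the prior intention (D5 ≠ D1), the extra term shifts confidence by ±1/(ν_l + τ_l + 1) against the direction of the news, which is the source of the under-reaction discussed next.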
Thus, our model predicts that intuitive Bayesians are conservative and under-react symmetrically to negative experience (by diminishing their confidence less than they should) and to positive experience (by raising their confidence less than they should). Below, we indeed report rather small variations of confidence in our experiment, in the form of local, but not global, learning.

… respectively. Reported confidence in participant i's ability to reach one level of the double-or-quits game is regressed in Table 4 (Table 5) on the confidence that she reported before the first (fifth) round and on a vector $Z_{li}$ of level-specific cues observable in the first four (last five) rounds, assuming that $X^{1-4}_{li}$ ($X^{5-9}_{li}$) $= \beta_l Z_{li} + \epsilon_{li}$, where $\beta_l$ is a vector of coefficients and $\epsilon_{li}$ an error term of zero mean. Two dummy variables for the Hill and Choice treatments (Wall as reference) have been added to the regression.

Table 4
OLS estimation of the Bayesian model of confidence before round 5
Variable | Training level | Middle level | High level
Confidence before training session | 0.79*** | 0.86*** | 0.90***
Freq. of rounds with 4 anagrams solved | …*** | … ns | … ns
Freq. of rounds with 5-6 anagrams solved | …*** | …*** | …***
Freq. of rounds with non-declining performance | …*** | …*** | …***
Anagrams solved per minute on rounds 1-4 | …*** | …*** | …***
Hill | …* | …** | …*
Choice | … ns | … ns | … ns
Constant | −…*** | −…*** | −…***
R² | 67% | 70% | 76%
Observations | 410 | 410 | 410
Notes.
Significance level: * p < 0.10, ** p < 0.05, *** p < 0.01; ns: not significant at 10% level. Variables: Frequency of rounds with non-declining performance represents the percentage of rounds (in rounds 2-4) in which the number of anagrams solved was equal to or higher than in the previous round; it takes four values (0, .33, .67, 1). Hill and Choice: dummy variables with Wall as reference.
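The regressor "Frequency of rounds with non-declining performance" can be computed as in the sketch below. This reflects our reading of the table note (the function name is hypothetical): compare each round with the previous one and average the indicator over rounds 2-4.

```python
def non_declining_share(solved_per_round):
    """Share of rounds, from the second on, in which the number of anagrams
    solved was equal to or higher than in the previous round."""
    pairs = list(zip(solved_per_round, solved_per_round[1:]))
    return sum(curr >= prev for prev, curr in pairs) / len(pairs)


# Rounds 1-4 give three comparisons, hence the values 0, .33, .67 and 1.
share = non_declining_share([6, 5, 5, 4])  # only round 3 is non-declining
```

With three comparisons the variable can only take the four values listed in the note, which is why it enters the regressions as a frequency rather than a count.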
The regressions confirm the existence of local learning. Subjects did revise their expectations with experience of the task, as several cues have highly significant coefficients (at the 1% level) with the right sign. Moreover, they analyze their own performance correctly by setting stronger prerequisites for themselves when the task gets more difficult. For example, their ability to solve just four anagrams per round in the training period increases their confidence for this period only because, while such performance is enough to ensure success in this period, it is no longer sufficient when the task becomes more difficult. Another interesting result in Table 5, consistent with the miscalibration term in equation (7), concerns low achievers who double. The later they ended up solving the required number of anagrams in the training period, the more abruptly their confidence rose. It is indeed an implication of subjects' vulnerability to illusory signals that low-ability doublers find themselves almost as confident as high-ability doublers in spite of widely different performances. This result also appears in Figures 4 and 5, where the ability-adjusted confidence of low-ability doublers jumps from bottom to top during the second stage of the training period.

Footnote: The discrete value of confidence between 0 and 100 can be safely treated as continuous.

Table 5
OLS estimation of the Bayesian model of confidence for doublers reported before the middle level
Variable | Middle level | High level
Confidence after round 4 | 0.77*** | 0.87***
Freq. of rounds with 4 anagrams solved (5-9) | … ns | −… ns
Freq. of rounds with 5-6 anagrams solved (5-9) | …*** | …*
Freq. of rounds with non-declining performance (5-9) | … ns | …**
Number of rounds used to solve 36 anagrams | …*** | …**
Anagrams solved per minute on rounds 5-9 | … ns | −… ns
Hill | −…*** | −… ns
Choice | −… ns | … ns
Constant | −… ns | −…**
R² | 74% | 81%
Observations | 275 | 275
Notes.
Significance level: * p < 0.10, ** p < 0.05, *** p < 0.01; ns: not significant at 10% level. Variables: Frequency of rounds with non-declining performance represents the percentage of rounds (in rounds 5-9) in which the number of anagrams solved was equal to or higher than in the previous round. Hill and Choice: dummy variables with Wall as reference. Number of rounds used to solve 36 anagrams: between rounds 6 and 9. (5-9) refers to measures between rounds 5 and 9.

A major testable implication of the Bayesian model lies in the coefficient of prior confidence, which must be interpreted as the precision of prior information relative to the information collected by experience of the task during the training period. This coefficient is always high in Tables 4 and 5, with a minimum value of 0.77. Observing such high weights for the prior favors the hypothesis of rational-Bayesian updating over adaptive expectations, as the latter would considerably underweight the prior relative to the evidence accumulated in the first four rounds. Successful experience of the easier task in the early rounds is expected to be more predictive of final success on the same task than on future tasks of greater difficulty. Thus, the relative weight of experience should diminish in the confidence equation at increasing levels or, equivalently, the relative weight of prior confidence should rise. Indeed, the coefficient of prior confidence increases continuously with the level: it rises from 0.79 to 0.86 and 0.90 in Table 4, and from 0.77 to 0.87 in Table 5. In parallel, the coefficients of cues signaling a successful experience continuously diminish as the level rises. We can use the mathematical expressions of the two coefficients of prior confidence derived from equations (6) and (7) to calculate the precision of early experience relative to prior confidence (before the task), $\tau_l/\nu_l$ (l = 1, 2, 3).
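Under equation (6), the coefficient b on prior confidence equals (ν_l + 1)/(ν_l + τ_l + 1), so 1/b − 1 = τ_l/(ν_l + 1), which is close to τ_l/ν_l when ν_l is large. That large-ν reading is our assumption, but it reproduces the figures quoted in the text from the rounded coefficients 0.79, 0.86 and 0.90. The second helper combines the early and late relative precisions as the text does; with the rounded inputs the middle-level growth comes out at about 210% rather than the reported 211%, presumably a rounding difference. Function names are hypothetical.

```python
def relative_precision_from_prior_coef(b):
    """tau_l / nu_l implied by a prior-confidence coefficient b,
    using the large-nu approximation b ~= nu / (nu + tau)."""
    return 1 / b - 1


def early_share_and_growth(early, late):
    """Given tau_l/nu_l (4 rounds) and tau'_l/nu_l (9 rounds), return
    tau_l/tau'_l and the rate of increase (tau'_l - tau_l)/tau_l."""
    return early / late, (late - early) / early


table4 = {"training": 0.79, "middle": 0.86, "high": 0.90}
early = {lvl: round(relative_precision_from_prior_coef(b), 3) for lvl, b in table4.items()}
middle = early_share_and_growth(0.163, 0.506)
high = early_share_and_growth(0.111, 0.274)
```

The dictionary `early` holds 0.266, 0.163 and 0.111, and the ratios τ_l/τ'_l come out at 0.322 (middle) and 0.405 (high).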
With the data of Table 4, we get 0.266 for the training level, 0.163 for the middle level, and 0.111 for the high level. Similarly, we compute the precision of late experience relative to prior confidence (before the task), $\tau'_l/\nu_l$ (l = 2, 3). With the data of Table 5, we get 0.506 for the middle level and 0.274 for the high level. The impact of learning from experience appears to be substantial and with increasing returns. By elimination of $\nu_l$, we finally calculate the precision of early experience relative to total experience during the training period, $\tau_l/\tau'_l$ (l = 2, 3). We obtain 0.322 for the middle level and 0.405 for the high level. The rate of increase of precision resulting from longer experience (from 4 to 9 rounds), $(\tau'_l - \tau_l)/\tau_l$, reaches a considerable 211% at the middle level and 147% at the high level, which forms indirect evidence of the overprecision phenomenon.

5.3 Why do intuitive Bayesians make wrong (and costly) predictions of performance?

The answer to this important question, and to the related planning fallacy, is contained in Table 6, which uses the same set of potential predictors to forecast both confidence in succeeding at the middle level after doubling and the ex post chances of success: prior confidence, ability, and performance cues observed subsequently (during rounds 5 to 9). The mere comparison of coefficients between the two columns of Table 6 demonstrates that posterior confidence is based on both objective performance cues and subjective variables, whereas the chances of success are predicted by the objective performance cues and ability only. The objective cues are the frequencies of rounds with 4 and with 5-6 anagrams solved (effort) and the speed of anagram resolution (ability); the subjective variables are essentially the prior confidence and the illusory signal given to low achievers by their (lucky) initial success.
Remarkably, the number of rounds needed for solving 36 anagrams (varying from 6 to 9), which indicates low achievement and recommends quitting the game at an early stage, acts as an illusory signal with a significantly positive effect on confidence in column 1, but the same variable acts as a correlate of low ability in column 2, with a strong negative effect on the chances of success at the middle level. Indeed, the subjective predictors of posterior confidence do not predict success when the objective performance cues are held constant. Prior confidence predicts the posterior confidence that conditions the decision to double, but fails to predict success because it is based on an intuitive reasoning which suffers from systematic biases. Past errors are carried into the prior through the aggregation procedure of Bayesian calculus and may add up with further errors caused by the perception of illusory signals.

To reinforce our demonstration, we used the regressions listed in Table 6 to predict normative (based on rational expectations) and subjective (confidence-based) expected values and to determine the best choice of doubling or quitting prescribed by those alternative models. As expected, the normative model's predictions (based on the true, ex post, probabilities) deviate farther from subjects' actual choices than the subjective model's: they prescribe quitting to 48% of doublers, versus 17%. However, the confidence-based prescriptions have no information value, since the rate of failure is the same whether one follows the prescription (69%) or not (70%). By contrast, the normative prescriptions have great value, since the rate of failure is 52% for those who respect them versus 88% for those who don't. Finally, Table 7 divides the sample of doublers into four categories: 47% are able and calibrated, 12% are unable and calibrated, 36% are overconfident and 5% are underconfident. Rates of failure are markedly different among these categories: only 52% for the able calibrated, 57% for the (able) underconfident, 78% for the unable calibrated and 91% for the (unable) overconfident! Undeniably, the prevalence of miscalibration among doublers is substantial and its cost in terms of failure is massive.

Footnote: We used an OLS to predict probabilities of success so as to make the comparison with confidence transparent. Estimating an OLS instead of a Probit in columns 3 and 4 didn't affect the qualitative conclusions.

Footnote: Conditional on initial success, prior confidence is a good predictor of the future decision to double (regression not shown). This is good news for the quality of confidence reports, and it confirms that subjects behave as intuitive Bayesians who rely on their own subjective estimates of success to make the choice of doubling.

Footnote: The predicted values were computed on regressions containing only the significant variables. We checked that these values stayed close to predictions derived from the regressions listed in Table 6, which contain non-significant variables too.

Table 6
Estimation of posterior confidence (after doubling) and ex post chances of success at the middle level
Level 2 | Confidence After | Chances of success
Confidence after round 4 | …*** | … ns
Freq. of rounds with 4 anagrams solved (5-9) | … ns | …*
Freq. of rounds with 5-6 anagrams solved (5-9) | …*** | …**
Freq. of rounds with non-declining performance (5-9) | … ns | −… ns
Number of rounds used to solve 36 anagrams | …*** | −…***
Anagrams solved per minute on rounds 5-9 | … ns | …***
Ability | −… ns | …***
Hill | −…*** | … ns
Choice | −… ns | −…*
Constant | −… ns | …*
R² | 74% | 30%
Observations | 275 | 275
Notes.
Sample: to be comparable, these regressions consider only those who succeeded at the first level and decided to double to the second level.
Significance level: * p < 0.10, ** p < 0.05, *** p < 0.01; ns: not significant at 10% level. Variables: Frequency of rounds with non-declining performance represents the percentage of rounds (in rounds 5-9) in which the number of anagrams solved was equal to or higher than in the previous round. Hill and Choice: dummy variables with Wall as reference. Number of rounds used to solve 36 anagrams: between rounds 6 and 9. (5-9) refers to measures between rounds 5 and 9.
Table 7
The prevalence and cost of miscalibration among doublers

Prescription of        Prescription of
subjective expected    normative expected
value                  value                 Category                 Share   Rate of failure
double                 double                able and calibrated      47%     52%
stop                   stop                  unable and calibrated    12%     78%
double                 stop                  overconfident            36%     91%
stop                   double                underconfident            5%     57%
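The aggregate figures quoted in the text can be recomputed from Table 7 as share-weighted averages over its rows. The short Python sketch below does this. The names TABLE_7, share_where and failure_rate_where are ours, and reading "deviates from reality" as a "stop" prescription issued to someone who in fact doubled is our interpretation, one that the row shares reproduce exactly (12% + 36% = 48% and 12% + 5% = 17%).

```python
# Rows of Table 7: (subjective prescription, normative prescription,
# category, share of doublers in %, rate of failure in %).
TABLE_7 = [
    ("double", "double", "able and calibrated", 47, 52),
    ("stop", "stop", "unable and calibrated", 12, 78),
    ("double", "stop", "overconfident", 36, 91),
    ("stop", "double", "underconfident", 5, 57),
]


def share_where(pred):
    """Total share (%) of doublers in the rows selected by pred."""
    return sum(row[3] for row in TABLE_7 if pred(row))


def failure_rate_where(pred):
    """Share-weighted mean failure rate (%) over the rows selected by pred."""
    rows = [row for row in TABLE_7 if pred(row)]
    total_share = sum(row[3] for row in rows)
    return sum(row[3] * row[4] for row in rows) / total_share


# Assumption (our reading): every subject in Table 7 doubled, so a "stop"
# prescription is a prediction that deviates from what actually happened.
normative_deviates = share_where(lambda r: r[1] == "stop")
subjective_deviates = share_where(lambda r: r[0] == "stop")

# A doubler "follows" a prescription when that prescription also says "double".
follow_subjective = failure_rate_where(lambda r: r[0] == "double")
ignore_subjective = failure_rate_where(lambda r: r[0] == "stop")
follow_normative = failure_rate_where(lambda r: r[1] == "double")
ignore_normative = failure_rate_where(lambda r: r[1] == "stop")

if __name__ == "__main__":
    print(f"normative prescription deviates:  {normative_deviates}%")
    print(f"subjective prescription deviates: {subjective_deviates}%")
    print(f"failure, follow/ignore subjective: "
          f"{follow_subjective:.1f}% / {ignore_subjective:.1f}%")
    print(f"failure, follow/ignore normative:  "
          f"{follow_normative:.1f}% / {ignore_normative:.1f}%")
```

The sketch gives failure rates of about 69% versus 72% for doublers who follow or ignore the confidence-based prescription, and about 52% versus 88% for the normative one. These land within about two percentage points of the figures quoted in the text; since the inputs are rounded percentages, exact agreement is not expected.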
We designed an experimental analog to the popular double-or-quits game to compare the speed of learning one's ability to perform a task in isolation with the speed of rising confidence as the task gets increasingly difficult. In simple words, we found that people on average learn to be overconfident faster than they learn their true ability. We present a new intuitive-Bayesian model of confidence which integrates confidence biases and learning. The distinctive feature of our model of self-confidence is that it rests solely on a Bayesian representation of the cognitive process: intuitive people predict their own probability of performing a task on the basis of cues and contrarian illusory signals related to the task that they perceive sequentially. In our view, confidence biases arise not from an irrational treatment of information but from the poor quality and subjectivity of the information being treated. For instance, we rule out self-attribution biases, motivated cognition, self-image concerns and manipulation of beliefs; instead, we describe people as fundamentally uncertain of their future performance and as taking all the information they can get with limited discrimination, including cognitive illusions. Above all, a persistent doubt about their true ability is responsible for their perception of contrarian illusory signals that make them believe either in their possible failure if they should succeed or in their possible success if they should fail.

Our intuitive-Bayesian theory of estimation parsimoniously combines the cognitive-bias and the learning approaches. It offers a novel interpretation of the cognitive bias and provides a general account of estimation biases.
Indeed, we did not attribute confidence biases to specific cognitive errors but to the fundamental uncertainty about one's true ability; and we predicted phenomena beyond the hard-easy and Dunning-Kruger effects that previous models could not explain all together: miscalibration and overprecision before completion of the task, limited discrimination, conservatism, slow learning and the planning fallacy. Moreover, we showed that these biases are likely to persist, since the Bayesian aggregation of past information consolidates the accumulation of errors, and the perception of illusory signals generates conservatism and under-reaction to events. Taken together, these two features may explain why intuitive Bayesians make systematically wrong and costly predictions of their own performance. Don't we systematically underestimate the time needed to perform a new (difficult) task, and never seem to learn?

Our analysis of overconfidence is restricted to the overestimation bias. The latter must be carefully distinguished from the overplacement bias, since the hard-easy effect that we observed here with absolute confidence has often been reversed when observing relative confidence: overplacement for an easy task (like driving one's car) and underplacement for a novel or difficult task. The reasons for overplacement are probably multiple and context-dependent. When people really compete, the over- (under-) placement bias may result from their observing and knowing their own ability (although imperfectly) better than others'. If both high-ability and low-ability individuals compare themselves with average-ability others, the former are likely to experience overplacement and the latter underplacement. The same reasoning applies to individuals familiar or unfamiliar with the task, and to individuals who were initially successful or unsuccessful with the task.
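The competitive case can be illustrated numerically. The Python sketch below rests on assumptions of ours, not the paper's: abilities and performance noise are normally distributed, each individual knows his or her own ability exactly, and rivals are represented by a single opponent of exactly average ability, so the dispersion of others' abilities is ignored. The function names and parameter values are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal CDF written with math.erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def believed_win_prob(own, mean_other, noise_sd):
    # Hypothetical belief: the rival is taken to have exactly average ability.
    # Both performances carry independent N(0, noise_sd^2) noise, so the
    # performance gap (own - rival) has variance 2 * noise_sd^2.
    return normal_cdf((own - mean_other) / sqrt(2.0 * noise_sd**2))


def true_win_prob(own, mean_other, sd_other, noise_sd):
    # Actual rivals' abilities are spread as N(mean_other, sd_other^2),
    # which adds sd_other^2 to the variance of the performance gap.
    return normal_cdf((own - mean_other) / sqrt(2.0 * noise_sd**2 + sd_other**2))


if __name__ == "__main__":
    for own in (-1.0, 0.0, 1.0):
        believed = believed_win_prob(own, 0.0, 1.0)
        actual = true_win_prob(own, 0.0, 1.0, 1.0)
        print(f"own ability {own:+.1f}: believed {believed:.2f}, true {actual:.2f}")
```

With unit noise and unit dispersion of rivals' abilities, an individual one unit above the mean believes he or she beats a random rival with probability of about 0.76, whereas the true probability is about 0.72; one unit below the mean, the errors mirror each other (about 0.24 believed versus 0.28 true). Ignoring others' dispersion makes beliefs too extreme in both directions, which reproduces the overplacement and underplacement pattern described above.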
When no real competition is involved, the overplacement effect relates to an evaluation-based estimate of probability. While there is an underlying choice to be made in the estimation task, no such choice is present in the latter case. If I ask you whether you consider yourself a top driver (relative to others), I don't generally expect you to show me how you drive. Preference reversals between choices and evaluations are not uncommon (Lichtenstein and Slovic, 1971). Thus, the present analysis of overestimation is consistent with reasonable explanations of overplacement. Moreover, it predicts the overprecision phenomenon and even rules out underprecision. This demonstrates that overestimation and overprecision are related but different biases.

Double-or-quits-type behavior can be found in many important decisions like addictive gambling (Goodie, 2005), military conquests (Johnson, 2004), business expansion (Malmendier and Tate, 2005), speculative behavior (Shiller, 2000), educational choices (Breen, 2001), etc. Overconfident players, chiefs, entrepreneurs, traders, or students are inclined to take excessive risks; they are unable to stop at the right time and eventually fail more than well-calibrated persons (e.g., Barber and Odean 2001, Camerer and Lovallo 1999). In contrast, underconfident individuals won't take enough risks and stay permanently out of successful endeavors.

On the theoretical side, the intuitive-Bayesian model of confidence before completion of a task creates a link between confidence and decision analyses and their respective biases. Confidence biases and the anomalies of decision-making under risk or uncertainty can be analyzed with the same tools. The estimation of one's ability implies an implicit comparison between an uncertain binary lottery and a reference outcome. It is a by-product of the question: should I double or quit? This is a question of interest to behavioral and decision theorists.

Acknowledgements
We thank the French Ministère de la Recherche (ACI "Contextes sociaux, contextes institutionnels et rendements des systèmes éducatifs") for generous support, Claude Montmarquette for offering an opportunity to conduct part of the experimental sessions at CIRANO (Montreal), and Noemi Berlin for numerous discussions. We are grateful to the referees and the editors of this special issue for their very helpful remarks and suggestions. We remain responsible for any error.
However, overconfidence may pay off when there is uncertainty about opponents' real strengths, and when the benefits of the prize at stake are sufficiently larger than the costs (e.g., Johnson and Fowler 2011, Anderson et al 2012).

References

Adams JK (1957) A confidence scale defined in terms of expected percentages. The American Journal of Psychology pp 432–436
Anderson C, Brion S, Moore DA, Kennedy JA (2012) A status-enhancement account of overconfidence. Journal of Personality and Social Psychology 103(4):718–735
Armantier O, Treich N (2013) Eliciting beliefs: Proper scoring rules, incentives, stakes and hedging. European Economic Review 62:17–40
Barber BM, Odean T (2001) Boys will be boys: Gender, overconfidence, and common stock investment. Quarterly Journal of Economics pp 261–292
Benoît JP, Dubra J (2011) Apparent overconfidence. Econometrica 79(5):1591–1625
Benoît JP, Dubra J, Moore DA (2015) Does the better-than-average effect show that people are overconfident?: Two experiments. Journal of the European Economic Association 13(2):293–329
Breen R (2001) A rational choice model of educational inequality. Centro de Estudios Avanzados en Ciencias Sociales, Instituto Juan March de Estudios e Investigaciones, Madrid, Working paper (166)
Brunnermeier MK, Parker JA (2005) Optimal expectations. American Economic Review 95(4):1092–1118, DOI 10.1257/0002828054825493
Buehler R, Griffin D, Ross M (2002) Inside the planning fallacy: The causes and consequences of optimistic time predictions. Heuristics and biases: The psychology of intuitive judgment pp 250–270
Camerer C, Lovallo D (1999) Overconfidence and excess entry: An experimental approach. American Economic Review pp 306–318
Clark J, Friesen L (2009) Overconfidence in forecasts of own performance: An experimental study. The Economic Journal 119(534):229–251
Deaves R, Lüders E, Schröder M (2010) The dynamics of overconfidence: Evidence from stock market forecasters. Journal of Economic Behavior & Organization 75(3):402–412
DeGroot MH (1970) Optimal Statistical Decisions. New York: McGraw-Hill
Erev I, Wallsten TS, Budescu DV (1994) Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review 101(3):519–528
Gervais S, Odean T (2001) Learning to be overconfident. Review of Financial Studies 14(1):1–27
Goodie AS (2005) The role of perceived control and overconfidence in pathological gambling. Journal of Gambling Studies 21(4):481–502
Grieco D, Hogarth RM (2009) Overconfidence in absolute and relative performance: The regression hypothesis and Bayesian updating. Journal of Economic Psychology 30(5):756–771
Griffin D, Tversky A (1992) The weighing of evidence and the determinants of confidence. Cognitive Psychology 24(3):411–435
Heath C, Tversky A (1991) Preference and belief: Ambiguity and competence in choice under uncertainty. Journal of Risk and Uncertainty 4(1):5–28
Hollard G, Massoni S, Vergnaud JC (2015) In search of good probability assessors: An experimental comparison of elicitation rules for confidence judgments. Theory and Decision pp 1–25, DOI 10.1007/s11238-015-9509-9
Johnson DD (2004) Overconfidence and War: The Havoc and Glory of Positive Illusions. Cambridge, MA: Harvard UP
Johnson DD, Fowler JH (2011) The evolution of overconfidence. Nature 477(7364):317–320
Kahneman D, Tversky A (1979) Prospect theory: An analysis of decision under risk. Econometrica pp 263–291
Köszegi B (2006) Ego utility, overconfidence, and task choice. Journal of the European Economic Association 4(4):673–707
Kruger J (1999) Lake Wobegon be gone! The "below-average effect" and the egocentric nature of comparative ability judgments. Journal of Personality and Social Psychology 77(2):221–232
Kruger J, Dunning D (1999) Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology 77(6):1121–1134
Langer EJ, Roth J (1975) Heads I win, tails it's chance: The illusion of control as a function of the sequence of outcomes in a purely chance task. Journal of Personality and Social Psychology 32:951–955
Lichtenstein S, Fischhoff B (1977) Do those who know more also know more about how much they know? Organizational Behavior and Human Performance 20(2):159–183
Lichtenstein S, Slovic P (1971) Reversals of preference between bids and choices in gambling decisions. Journal of Experimental Psychology 89(1):46–55
Lichtenstein S, Fischhoff B, Phillips L (1982) Calibration of probabilities: The state of the art to 1980. In: Kahneman D, Slovic P, Tversky A (eds) Judgment under uncertainty: Heuristics and biases, New York: Cambridge University Press, pp 306–334
Malmendier U, Tate G (2005) CEO overconfidence and corporate investment. The Journal of Finance 60(6):2661–2700
Merkle C, Weber M (2011) True overconfidence: The inability of rational information processing to account for apparent overconfidence. Organizational Behavior and Human Decision Processes 116(2):262–271
Miller DT, Ross M (1975) Self-serving biases in the attribution of causality: Fact or fiction? Psychological Bulletin 82(2):213–225
Miller TM, Geraci L (2011) Unskilled but aware: Reinterpreting overconfidence in low-performing students. Journal of Experimental Psychology: Learning, Memory, and Cognition 37(2):502–506
Mobius M, Niederle M, Niehaus P, Rosenblat T (2014) Managing self-confidence. Tech. rep., Working Paper
Moore DA, Healy PJ (2008) The trouble with overconfidence. Psychological Review 115(2):502–517
Oskamp S (1965) Overconfidence in case-study judgments. Journal of Consulting Psychology 29(3):261–265
Ryvkin D, Krajč M, Ortmann A (2012) Are the unskilled doomed to remain unaware? Journal of Economic Psychology 33(5):1012–1031
Shiller RJ (2000) Measuring bubble expectations and investor confidence. The Journal of Psychology and Financial Markets 1(1):49–60
Van den Steen E (2011) Overconfidence by Bayesian-rational agents. Management Science 57(5):884–896
Svenson O (1981) Are we all less risky and more skillful than our fellow drivers? Acta Psychologica 47(2):143–148
Tversky A, Kahneman D (1973) Availability: A heuristic for judging frequency and probability. Cognitive Psychology 5(2):207–232
Appendix