Gender Differences in Motivated Reasoning ∗
Michael Thaler †
January 2021
Abstract
Men and women systematically differ in their beliefs about their performance relative to others; in particular, men tend to be more overconfident. This paper provides support for one explanation for gender differences in overconfidence, performance-motivated reasoning, in which people distort how they process new information in ways that make them believe they outperformed others. Using a large online experiment, I find that male subjects distort information processing in ways that favor their performance, while female subjects do not systematically distort information processing in either direction. These statistically significant gender differences in performance-motivated reasoning mimic gender differences in overconfidence; beliefs of male subjects are systematically overconfident, while beliefs of female subjects are well-calibrated on average. The experiment also includes political questions, and finds that politically-motivated reasoning is similar for both men and women. These results suggest that, while men and women are both susceptible to motivated reasoning in general, men find it particularly attractive to believe that they outperformed others.
JEL classification:
J16; D83; C91; D91
∗ I would like to thank Alberto Alesina, Christine Exley, David Laibson, Matthew Rabin, and Mattie Toma for helpful comments on this paper. I am grateful for funding support from the Harvard Business School Research Fellowship and the Eric M. Mindich Research Fund for the Foundations of Human Behavior. The experiment was approved by the IRB at Harvard (IRB17-1725).
† Princeton University. Email: [email protected].
Introduction
When applying for jobs, entering competitions, or picking stocks to trade, people must predict how their own performance will compare to the performance of others. A growing body of literature shows that there are sizable gender gaps in beliefs about one’s relative performance. The consistent finding across labor market, tournament, and finance environments is that men are more overconfident than women.
Information can remedy or exacerbate these gender gaps. On the one hand, informative signals can enable people to learn about their performance, which will reduce overconfidence; in environments in which men are more overconfident, grounding them in the truth will reduce gender differences. On the other hand, people may distort how they process information in directions that make them falsely believe they are high performers: performance-motivated reasoning. If there are gender differences in motivated reasoning, information may cause men to become even more overconfident and the gender confidence gap can expand.
To identify performance-motivated reasoning, I use the experiment from Thaler (2020), which asks people to assess the veracity of a news source that tells them their current median beliefs are too high or too low. People first report their median belief about their performance relative to other subjects on a set of factual questions about the US, politics, and current events. Then, they are given a binary message from an unknown news source that either says that their true performance is greater than their median belief or less than their median belief. The news source either reports the truth (“True News”) or reports the opposite of the truth (“Fake News”). Since subjects’ medians are elicited, subjects have stated that they believe the correct answer is equally likely to be greater than or less than their median. Therefore, a Bayesian subject would not infer anything about the veracity of the source from a “greater than” message or a “less than” message.
However, “greater than” may evoke positive motivated beliefs for subjects who find it attractive to believe they outperformed others, and motivated reasoning would lead these subjects to assess this source as more likely to be True News.
I run this experiment online with approximately 1000 participants, and this paper’s main finding is that men systematically engage in performance-motivated reasoning and that women do not. That is, men think that news is more likely to be from True News if it tells them they should become even more confident than they currently are (“good news”), whereas women trust good news and bad news about performance equally. The gender gap in performance-motivated reasoning is statistically significant at the p = 0.001 level.
There are two possible explanations for the finding that men alone motivatedly reason about performance. The first explanation is that men may be more susceptible to the bias of motivated reasoning, and women may reason in a more Bayesian manner. This explanation would imply that, on questions that evoke motivated beliefs about other topics, men would again engage in a greater degree of motivated reasoning than women would. The second explanation is that men and women may hold different motivated beliefs in the particular domain of relative performance, but that both would be susceptible to motivated reasoning in other settings.
To test these competing hypotheses, I consider gender differences in motivated reasoning about politics. On political questions, I find that subjects of both genders engage in politically-motivated reasoning, and that the differences between men and women are small and statistically insignificant. This result suggests that susceptibility to motivated reasoning in general need not be different between men and women, but that certain types of motivated beliefs differ by gender (such as in Coffman, Collis, and Kulkarni 2019). It is consistent with the theory that men find it more attractive to believe that they outperformed others, but that both men and women find it attractive to believe that their political party’s stances are right.
The data suggest that results are not driven by potential confounding factors of subjects misreporting their median belief or subjects misunderstanding the distribution of news sources. In particular, subjects who have a confidence interval that is symmetric about their initial guess are likely to have a belief distribution for which their mean and median are similar. For such unskewed subjects, I find that the treatment effects are nearly identical to the full sample.
For the distribution of news sources, I tell some subjects that the (ex ante) likelihood of receiving True News and Fake News is 50-50; the other subjects are not given this information. The first group may anchor more towards 50-50, and the second group may update their meta-beliefs about the distribution; however, I find no significant differences in treatment effects between the two groups, suggesting that these potential confounds are not a principal driver of the main treatment effects.
Footnote: This form of heterogeneity is also mentioned in passing in Thaler 2020.
This paper contributes to a decades-old literature that discusses why there have been substantial gender differences in labor market outcomes (Goldin 1990; Goldin 2014). As shown by numerous experiments (Gneezy, Niederle, and Rustichini 2003; Niederle and Vesterlund 2007; Dohmen and Falk 2011) and the literature summarized in Niederle and Vesterlund (2011), men tend to engage in competition more than women do. Both preferences (Eckel and Grossman 2008; Croson and Gneezy 2009; Shurchkov and Eckel 2018) and beliefs about one’s ability and performance play a role. In particular, gender differences in overconfidence may be explained by responses to new information (Lenney 1977; Beyer 1990; Barber and Odean 2001), leading to differential behavior. This paper focuses on a form of overconfidence, overplacement (Moore and Healy 2008), in which people are overconfident about the ranking of their performance or ability within a comparison group; overplacement is particularly important in competitive settings.
These findings also complement the empirical results from Sarsons and Xu (2016), who find survey evidence of gender gaps in perceived knowledge about economic questions, relating both to overconfidence and overprecision.
Motivated reasoning from new information is an important potential driver of overconfidence (Kunda 1990; Benabou and Tirole 2002), but has been difficult to distinguish experimentally from other models of cognition (Benjamin 2019; Logg, Haran, and Moore 2018). As such, I use the experimental design from Thaler (2020), which is able to portably identify motivated reasoning in many domains since there is nothing to infer from the signals subjects receive. Using designs that estimate non-Bayesian responsiveness to “good news” and “bad news,” previous papers have not systematically found statistically significant gender differences in asymmetric updating (typically explained by motivated reasoning) (Eil and Rao 2011; Mobius et al. 2014; Ertac 2011; Coutts 2018). However, heterogeneity by gender is not the main focus of those papers, and they are not well-powered to rule out moderate differences. Tangentially, Mobius et al. (2014) and Coutts (2018) find that women underreact more to information than men do; this dimension is not captured using my design, which rules out underreaction by giving subjects uninformative signals.
One related paper with a similar hypothesis is Coffman, Collis, and Kulkarni (2019), which ran contemporaneous experiments to the one in this paper, and finds that gender differences in responses to information about performance depend on whether the task is stereotypically male-typed or female-typed. My paper differs in its primary focus on identifying motivated reasoning, which the new experimental design enables me to cleanly identify. While Coffman, Collis, and Kulkarni (2019) find novel evidence for asymmetries in responses to information, most of the effect is due to Bayesian updating or to inaccurate prior beliefs.
The results from my paper indicate that differences in performance-motivated reasoning may be an important determinant of differences in overconfidence. Indeed, I find that overconfidence and motivated reasoning are correlated. Previous literature has looked at potentially-related implications of these gender differences, such as in labor markets (Sarsons 2019; Schultz and Thoni 2016; Shurchkov and Eckel 2018), self-promotion (Exley and Kessler 2019), and stereotyping others (Bordalo et al. 2019; Grossman et al. 2019).
Footnote: Ertac (2011) does find gender differences in asymmetric updating on a verbal task, but not on a math task.
Footnote: Coffman, Collis, and Kulkarni (2019) also test a version of the regression specification of Eil and Rao (2011). Once they control for Bayesian predictions, they find a suggestive, but statistically insignificant, effect of gender-congruent motivated reasoning.
This section outlines the formalization of the theory of motivated reasoning, and follows Thaler (2020). Further details are in that paper.
There are a set of agents $i$ and a set of questions $q$. For each agent and question, Nature determines whether a source is true ($T$) or false ($\neg T$). Sources are independently drawn with $P(T) = p$, and agents receive data $x_{iq}$ about the source veracity.
Agents engage in motivated reasoning, which means that they form their posterior by incorporating prior, likelihood, and a motivated beliefs term:
$$\underbrace{P(T \mid x_{iq})}_{\text{posterior}} \propto \underbrace{P(T)}_{\text{prior}} \cdot \underbrace{P(x_{iq} \mid T)}_{\text{likelihood}} \cdot \underbrace{M_{iq}(T)^{\varphi}}_{\text{mot. reasoning}}$$
We take log odds ratios of both sides to attain an additive form, writing $m_{iq} = \log M_{iq}$:
$$\operatorname{logit} P(T \mid x_{iq}) = \operatorname{logit} P(T) + \log\left(\frac{P(x_{iq} \mid T)}{P(x_{iq} \mid \neg T)}\right) + \varphi \left( m_{iq}(T) - m_{iq}(\neg T) \right). \tag{1}$$
Motivated reasoners act as if they receive both the actual signal ($x_{iq}$) and a signal whose relative likelihood corresponds to how much they are motivated to believe the state is $T$. $m_{iq} : \{T, \neg T\} \to \mathbb{R}$ is called the motive function. The weight put on this signal is $\varphi \geq 0$, called susceptibility. When $\varphi = 0$, agents are Bayesian; when $\varphi > 0$, agents motivatedly reason.
We particularly consider one information structure, in which the $x_{iq}$ are binary and uninformative about the news source veracity. That is, we will choose the $x_{iq}$ in such a way that $P(x_{iq} \mid T) = P(x_{iq} \mid \neg T) = 1/2$. In such a setting, described experimentally below, the motivated reasoner will have $\operatorname{logit} P(T \mid x_{iq}) = \operatorname{logit} P(T) + \varphi \left( m_{iq}(T) - m_{iq}(\neg T) \right)$. If the message is attractive, then their posterior on the likelihood that the news source is true will be higher; if the message is unattractive, then their posterior will be lower. A Bayesian will not update from either message.
This paper will allow for two types of agents, male and female, who differ in their motives: $m_{\text{Male}}(T) \neq m_{\text{Female}}(T)$. We assume that $\varphi \geq$ … $m(T) - m(\neg T)$ by observing agents’ inference processes.
To test the model, the experiment provides people with signals about the veracity of news sources that tell them their median beliefs are too high or too low (Thaler 2020). These signals are constructed such that there would be nothing for a Bayesian to infer, but they evoke motivated beliefs.
First, subjects take a study that tests knowledge and responses to news about political and US knowledge issues like crime, income mobility, racial discrimination, and geography. Full question texts are in the Appendix. Next, subjects are asked the following question:
How well do you think you performed on this study about political and U.S. knowledge? I’ve compared the average points you scored for all questions (prior to this one) to that of 100 other participants.
How many of the 100 do you think you scored higher than?
(Please guess between 0 and 100.)
Subjects were indeed ranked versus 100 others, so this question has a factual answer. Subjects may be motivated to believe that they outperformed others, and we will hypothesize that there are gender differences in performance-motivated reasoning. This test of motivated reasoning involves three steps:
1. Beliefs: Subjects are asked to guess the answer to the question above. They are asked and incentivized to guess their median belief (such that they find it equally likely for the answer to be above or below their guess). Details on incentives are below.
2. News: Subjects receive a binary message from one of two randomly-chosen news sources: True News and Fake News. The message from True News is always correct, and the message from Fake News is always incorrect. This is the main treatment variation.
   The message says either “The answer is greater than your previous guess of [previous guess].” or “The answer is less than your previous guess of [previous guess].” Note that the exact messages are different for subjects who make different initial guesses.
   For the question above, “greater than” corresponds to Pro-Performance News, or Good News; “less than” corresponds to Anti-Performance News, or Bad News.
3. Assessment: After receiving the message, subjects assess the probability that the message came from True News using a scale from 0/10 to 10/10, and are incentivized to state their true belief. I test for motivated reasoning by looking at the treatment effect of seeing a “greater than” message versus a “less than” message on news veracity assessments.
This same procedure is followed for the nine political questions participants answer elsewhere in the study. On the political questions, the news is either classified as Pro-Republican or Pro-Democratic; then, we use subjects’ party preference and classify news as Good News / Pro-Party News or Bad News / Anti-Party News.
Recall that since subjects receive “greater than” or “less than” messages that compare the answer to their median, a Bayesian would not change her assessment based on the message. If she had a prior that P(True News) = 1/2 before seeing the message, she would form a posterior that P(True | “greater than”) = P(True | “less than”) = 1/2. We attribute systematic treatment effects of the messages on veracity assessments to motivated reasoning. For instance, if men tend to state P(True | “greater than”) > P(True | “less than”) and women tend to state P(True | “greater than”) = P(True | “less than”) on the question about performance, this would indicate that there is systematic performance-motivated reasoning for men but not for women.
Overall Scoring Rule
At the end of the experiment, subjects earn a show-up fee of $3 and either receive an additional $10 bonus or no additional bonus. As will be elaborated below, in each round of the experiment subjects earn between 0 and 100 “points” based on their performance. These points correspond to the probability that the subject wins the bonus: a score of x points corresponds to an x/10 percent chance of winning the bonus.
Footnote: Likewise, general over- and under-weighting of priors and likelihoods (such as forms of prior-confirming biases and conservatism) do not predict a treatment effect of message direction on assessment.
Footnote: This lottery system is designed to account for risk aversion, as directly mapping points to earnings could lead to subjects hedging their guesses. As such, we do not need to assume risk neutrality in order for the experiment to be incentive compatible, but we do need to assume that subjects behave as if compound lotteries are reduced to simple lotteries. The probability distribution is identical to randomly choosing a question for payment and subsequently playing the lottery based on the points scored on that question.
Questions Page
On question pages, subjects are given the text of the question and are asked to input three numbers about their initial beliefs:
• My Guess: This elicits the median of the subjects’ prior distribution.
• My Lower Bound: This elicits the 25th percentile of the subjects’ prior distribution.
• My Upper Bound: This elicits the 75th percentile of the subjects’ prior distribution.
The scoring rule for guesses is piecewise linear. Subjects earn $\max\{100 - |c - g|, 0\}$ points for a guess of $g$ when the correct answer is $c$. Subjects are told that they will maximize expected points by stating the median of their belief distribution.
The scoring rule for bounds is piecewise linear with different slopes. For upper bound $ub$, subjects earn $\max\{100 - 3(c - ub), 0\}$ points if $c \geq ub$ and $\max\{100 - (ub - c), 0\}$ points if $c \leq ub$. For lower bound $lb$, subjects earn $\max\{100 - (c - lb), 0\}$ points if $c \geq lb$ and $\max\{100 - 3(lb - c), 0\}$ points if $c \leq lb$. Subjects maximize expected points by setting $ub$ to be the 75th percentile and $lb$ to be the 25th percentile of their belief distribution. Subjects are restricted to give answers for which My Lower Bound ≤ My Guess ≤ My Upper Bound; if they do not, they see an error message.
News Assessments Page
After submitting their guess, subjects see a second page about the same question. At the top of the page is the exact text of the original question. Below the original question is a message relating the answer to the number they submit for My Guess. This message says either:
“The answer is greater than your previous guess of [My Guess].” or
“The answer is less than your previous guess of [My Guess].”
Subjects are told that True News always tells the truth and Fake News never tells the truth, and that sources are independent across questions. Below the message, subjects are asked: “Do you think this information is from True News or Fake News?” and choose one of eleven radio buttons that say “x/10 chance it’s True News, (10-x)/10 chance it’s Fake News” for each x = 0, 1, . . . , 10.
The scoring rule for assessments is quadratic. For assessment $a$, subjects earn $100(1 - (1 - a)^2)$ points if the source is True News and $100(1 - a^2)$ points if it is Fake News.
Seven answers to the performance question (0.71 percent) are exactly correct. The likelihood of getting the answer exactly correct is similar to the likelihood if subjects guessed randomly (0.99 percent), so there is no evidence of guess manipulation.
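The scoring rules on the question and news assessment pages can be sketched in a few lines of Python. This is only an illustration, not the experiment's oTree code: all function names are hypothetical, and the 100-point cap, the floor of 0, and the 3-to-1 slope ratio for the bounds are assumptions (3 to 1 is the ratio under which the 75th and 25th percentiles maximize expected points, ignoring the floor).

```python
def guess_points(c: float, g: float) -> float:
    """Piecewise-linear rule for My Guess: one point lost per unit of error."""
    return max(100 - abs(c - g), 0)


def upper_bound_points(c: float, ub: float) -> float:
    """Assumed slopes: 3 when the answer is above the bound, 1 when below."""
    if c >= ub:
        return max(100 - 3 * (c - ub), 0)
    return max(100 - (ub - c), 0)


def lower_bound_points(c: float, lb: float) -> float:
    """Mirror image of the upper bound: slope 1 above, slope 3 below."""
    if c >= lb:
        return max(100 - (c - lb), 0)
    return max(100 - 3 * (lb - c), 0)


def assessment_points(a: float, source_is_true: bool) -> float:
    """Quadratic rule; a is the stated probability of True News in [0, 1]."""
    if source_is_true:
        return 100 * (1 - (1 - a) ** 2)
    return 100 * (1 - a ** 2)


def expected_assessment_points(a: float, belief: float) -> float:
    """Expected points from stating a when the subject's true belief is belief."""
    return (belief * assessment_points(a, True)
            + (1 - belief) * assessment_points(a, False))
```

Under these assumptions, a subject who believes the source is True News with probability 0.7 maximizes `expected_assessment_points` over the eleven radio-button values by choosing 7/10, which is the sense in which the quadratic rule rewards truthful reporting.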
The experiment, also the focus of Thaler (2020), was conducted on June 25, 2018 on Amazon’s Mechanical Turk (MTurk) platform. MTurk is an online labor marketplace in which participants choose “Human Intelligence Tasks” to complete. MTurk has become a very popular way to run economic experiments, and participants generally tend to have more diverse demographics than students in university laboratories (e.g. Horton, Rand, and Zeckhauser 2011; Levay, Freese, and Druckman 2016). The experiment was coded using oTree, an open-source software package developed by Chen, Schonger, and Wickens (2016) and based on the Django web application framework.
The study was offered to MTurk workers currently living in the United States. 1,387 subjects were recruited and answered at least one question, and 1,300 subjects completed the study. Of these subjects, 987 (76 percent) passed simple attention and comprehension checks, and the rest are dropped from the analyses. As discussed in Section 3.3 and the Appendix, results are robust to the inclusion of these subjects.
Of the 987 subjects, 980 (99.3 percent) do not get the performance question exactly correct. These subjects receive a message relating their guess to the true answer, and are given the news veracity assessment page. Of the 980, 528 (53.9 percent) identify as male, 447 (45.6 percent) identify as female, and 5 (0.5 percent) do not identify as either or prefer not to report their gender. These results focus on the 975 subjects who report their gender as male or female.
The balance table for the Pro-Performance / Anti-Performance treatment for these 975 subjects is in Appendix A.1. There are no statistically significant differences across demographics between subjects who receive Good News or Bad News about their performance.
Footnote: This is true except for the comprehension check question, where the message says “The answer is equal / not equal to your previous guess of [My Guess].”
Footnote: In order to pass these checks, subjects needed to perfectly answer the comprehension check question in Appendix B (by giving a correct answer, correct bounds, and answering the news assessment with certainty). In addition, many questions had clear maximum and minimum possible answers (such as percentages, between 0 and 100). Subjects were dropped if any of their answers did not lie within these bounds.
Section 3.1 analyzes gender differences in overconfidence about performance. Next, Section 3.2 presents the main results about how men and women motivatedly reason about their performance. It shows that the gender gap in motivated reasoning on the performance question is not mirrored on politicized questions. Section 3.3 discusses robustness and potential confounds to identification, and does not find evidence that results are explained by these alternative explanations. Section 3.4 proposes an interpretation of these results, as well as complementary evidence regarding overprecision.
Gender differences in overconfidence are apparent from the raw data. Male subjects expect to outperform 55.3 percent of subjects (s.e. 0.9 pp), and they actually outperform 49.5 percent of subjects (s.e. 1.1 pp). Female subjects expect to outperform 44.5 percent of subjects (s.e. 1.0 pp), and they actually outperform 45.4 percent of subjects (s.e. 1.3 pp). Male subjects’ performance predictions are 5.8 pp too high on average (s.e. 1.3 pp, p < …) …
Figure 1: Confidence and Performance by Gender
Notes: The graph shows a binned scatter plot of performance and confidence by gender. The x-axis measures performance percentile on the quiz. The y-axis measures confidence, which is what percentile subjects expected to score.
The figure shows that confidence is higher for men at every point in the performance distribution on this question. In fact, men of high, medium, and low performance levels all expect to perform at or above the median on average. Except at the high end, women do not expect to perform above the median. The differences between the beliefs of men and women (10.8 pp) are larger than the differences between the beliefs of the top quintile and bottom quintile of subjects (6.3 pp).
Appendix Figure 6 presents a CDF of subjects’ levels of overconfidence. It shows that differences in overconfidence are not driven by outliers; the distribution for men first-order-stochastically dominates the distribution for women.
The gender gap in motivated reasoning is also apparent in the raw data.
Recall that a Bayesian would believe that the news source was equally likely to be True or Fake whether it gave good news or bad news about their relative performance. As shown in Figure 2, men trust the news significantly more when it gives Good News about their relative performance than when it gives Bad News. By contrast, women give nearly identical assessments of Good News and Bad News about their performance.
Figure 2: Trust in News About Own Performance by Gender
Notes: Good News tells subjects their current median beliefs about their performance are too low; Bad News tells subjects these beliefs are too high. The y-axis measures subjects’ assessments of the veracity of the source. Bayesians would have the same trust in news for Good News and Bad News, and the residual is motivated reasoning. Error bars correspond to 95 percent confidence intervals.
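The identification logic can be illustrated numerically with equation (1). The sketch below is only an illustration: the function name `posterior_true` and the motive values passed to it are hypothetical, and the messages are treated as uninformative (a likelihood ratio of 1), as in the experimental design.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inv_logit(z: float) -> float:
    """Inverse of logit: maps log odds back to a probability."""
    return 1 / (1 + math.exp(-z))


def posterior_true(prior: float, likelihood_ratio: float,
                   susceptibility: float, motive_gap: float) -> float:
    """Posterior that the source is True News under equation (1).

    motive_gap stands for m(T) - m(not T); its values here are hypothetical.
    susceptibility is the weight on the motive term (0 for a Bayesian).
    """
    z = (logit(prior)
         + math.log(likelihood_ratio)
         + susceptibility * motive_gap)
    return inv_logit(z)


# Uninformative message: likelihood ratio of 1, prior of 1/2.
bayesian = posterior_true(0.5, 1.0, susceptibility=0.0, motive_gap=0.4)
motivated_good = posterior_true(0.5, 1.0, susceptibility=1.0, motive_gap=0.4)
motivated_bad = posterior_true(0.5, 1.0, susceptibility=1.0, motive_gap=-0.4)
```

With zero susceptibility the posterior stays at the prior for either message, so any gap between assessments of Good News and Bad News reflects the motive term.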
These results demonstrate that, when given news about their relative performance, men on average motivatedly reason to think they did even better than their (already-overconfident) current beliefs, while women are approximately Bayesian on average.
Footnote: Note that this is the measure of the average treatment effect. The average treatment effect for women could be due to women not engaging in any motivated reasoning about performance, or due to women being equally likely to motivatedly reason to believe they performed better than expected and believe they performed worse than expected.
Gender differences in motivated reasoning appear in both directions: men believe that Good News is more likely to be True News and believe that Bad News is more likely to be Fake News. The differences in average treatment effects are not driven by male and female subjects differently choosing extreme probabilities. Appendix Figure 7 shows the CDF of news veracity assessments P(True News | Good News) and Figure 8 shows the CDF of news veracity assessments P(True News | Bad News) about performance. Male subjects give higher veracity assessments of Good News about performance, and lower veracity assessments of Bad News about performance, at all points in the distribution.
Next, we ask whether this discrepancy is due to the specific domain of performance or due to an overall susceptibility to motivated reasoning. To tease apart these competing hypotheses, we consider gender differences in motivated reasoning in another setting: politics. The political topics used follow Thaler (2020), and the list of hypothesized political motives is in Appendix Table 3. Figure 3 shows that there is no sizable heterogeneity by gender in motivated reasoning about politics, suggesting that the gender differences we see are particular to beliefs about performance.
Figure 3: Motivated Reasoning by Gender
Notes: The x-axis is the treatment effect of seeing Good News versus Bad News about one’s performance or party. Bayesians would have an effect of zero, while motivated reasoners would have a positive effect. Error bars correspond to 95 percent confidence intervals.
The regression specifications for gender differences in performance-motivated reasoning are between subjects, regressing assessments $a$ for subject $i$ on the subject’s gender, whether the news was Pro-Performance (vs. Anti-Performance), and the interaction between gender and news direction, with and without controls for a vector of other demographics $Z_i$.
$$a_i = \alpha + \beta_1 \cdot \text{Male}_i + \beta_2 \cdot \text{Pro-Performance}_i + \beta_3 \cdot \text{Pro-Performance}_i \cdot \text{Male}_i + \gamma Z_i + \epsilon_i$$
$\beta_3$ is the main object of interest, measuring how much more men motivatedly reason than women about performance. $\beta_2$ is also interesting in its own right; it measures how much women motivatedly reason about performance. The demographic controls $Z_i$ are race, age, income, education, religion, and party preference.
The regression specification that compares performance-motivated reasoning to politically-motivated reasoning is within subjects, regressing assessments $a$ by subject $i$ on question $q$ in round $r$ on a dummy for the news being Pro-Performance vs. Anti-Performance, a dummy for the news being overall Good vs. overall Bad (this now corresponds to Pro-Performance or Pro-Party vs. Anti-Performance or Anti-Party), and the interactions between gender and news direction for performance and overall. Since this test is within subjects, we can replace controls with fixed effects for subjects $i$, question topic $q$ interacted with gender, and round $r$ interacted with gender. Note that the performance question is always in the same round, so question and round fixed effects only pertain to the political questions. For this analysis, we remove the politically-neutral subjects, since there is no hypothesis as to which way they will motivatedly reason about politics.
$$a_{iqr} = \alpha + \beta_1 \cdot \text{Good News}_{iqr} + \beta_2 \cdot \text{Good News}_{iqr} \cdot \text{Male}_i + \beta_3 \cdot \text{Pro-Performance}_{iqr} + \beta_4 \cdot \text{Pro-Performance}_{iqr} \cdot \text{Male}_i + \gamma FE_i + \delta FE_{\text{gender}, q} + \zeta FE_{\text{gender}, r} + \epsilon_{iqr}$$
Table 1 shows that the regression results match the patterns in the raw data.
Footnote: The performance question is always in the same round since the correct answer depends on subjects’ answers to the previous questions.
Table 1: Motivated Reasoning by Gender and Topic
                            (1)        (2)        (3)        (4)
Pro-Performance            0.005      0.006                -0.088
                          (0.026)    (0.026)              (0.028)
Pro-Performance x Male     0.113      0.114                 0.122
                          (0.035)    (0.035)              (0.037)
Pro-Party                                        0.095
                                                (0.009)
Pro-Party x Male                                -0.014
                                                (0.013)
Good News                                                   0.097
                                                          (0.009)
Good News x Male                                           -0.013
                                                          (0.013)
Gender x Question FE        No         No         Yes        Yes
Gender x Round FE           No         No         Yes        Yes
Subject FE                  No         No         Yes        Yes
Gender Control              Yes        Yes        No         No
Other Controls              No         Yes        No         No
Observations                887        887        7868       8755
Notes: OLS, errors clustered at subject level. Only subjects who list male or female as their gender, and subjects who have a party preference, are included. Good News refers to Pro-Party or Pro-Performance news, so specification (4) is comparing Pro-Performance to Pro-Party news. Subject controls: race, age, log(income), years of education, religion, and party preference.
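The between-subjects specification without demographic controls is saturated in two dummies, so its OLS coefficients coincide with differences in cell means. The sketch below uses that fact to recover the interaction term as a difference-in-differences. It is an illustration only: the function name and the toy rows are hypothetical (not experiment data), and it computes no standard errors, clustered or otherwise.

```python
from statistics import mean


def interaction_estimates(rows):
    """Coefficients of a = alpha + b1*Male + b2*ProPerf + b3*ProPerf*Male.

    rows: iterable of (assessment, male, pro_performance), dummies coded 0/1.
    With no further controls the model is saturated, so OLS reproduces the
    four cell means exactly and the coefficients are differences of means.
    """
    cells = {(m, p): [] for m in (0, 1) for p in (0, 1)}
    for assessment, male, pro_performance in rows:
        cells[(male, pro_performance)].append(assessment)
    # statistics.mean raises StatisticsError if any cell is empty.
    mu = {key: mean(values) for key, values in cells.items()}
    return {
        "alpha": mu[(0, 0)],
        "male": mu[(1, 0)] - mu[(0, 0)],
        "pro_performance": mu[(0, 1)] - mu[(0, 0)],
        "pro_performance_x_male": (mu[(1, 1)] - mu[(1, 0)])
                                  - (mu[(0, 1)] - mu[(0, 0)]),
    }


# Hypothetical toy data: (assessment, male, pro_performance).
toy_rows = [
    (0.50, 0, 0), (0.52, 0, 0),
    (0.51, 0, 1), (0.53, 0, 1),
    (0.50, 1, 0), (0.54, 1, 0),
    (0.62, 1, 1), (0.66, 1, 1),
]
estimates = interaction_estimates(toy_rows)
```

Adding the demographic controls or the fixed effects of the within-subjects specification breaks this equivalence, and a regression package would be needed instead.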
In particular, columns (1) and (2) show that gender interacts with performance-motivated reasoning, and that only men are systematically biased towards trusting Pro-Performance news. Column (3) shows that men and women are similar at motivatedly reasoning towards trusting Pro-Party news. Column (4) shows that the gap between Pro-Performance and Pro-Party motivated reasoning (Pro-Performance minus Pro-Party) is significantly larger for men than it is for women. Good News is defined as either Pro-Performance or Pro-Party news, so the Pro-Performance row in (4) measures the difference between Pro-Performance and Pro-Party since Good News is controlled for.
Recall that the experiment tested motivated reasoning on nine separate political questions. Instead of aggregating all the political topics, Figure 4 compares motivated reasoning by gender on each question individually by interacting the motivated reasoning measure with topic-by-topic dummies.
Figure 4: Motivated Reasoning by Gender and Topic
Notes: The x-axis is the treatment effect of seeing Good News versus Bad News about one’s performance or party. Bayesians would have an effect of zero, while motivated reasoners would have a positive effect. Error bars correspond to 95 percent confidence intervals.
This figure shows that the null effect in gender differences on politically-motivated reasoning is not driven by large and heterogeneous gender differences on individual questions. On none of the individual political questions do men and women motivatedly reason by a statistically significantly different amount. For both men and women, we can rule out Bayesian updating on eight of the nine political questions. The performance question uniquely stands out in its gender discrepancy.
3.3 Robustness
Two threats to identification are misunderstanding the distribution of news sources and misreporting median beliefs in initial guesses.
First, subjects may not understand the distribution of news sources. The actual likelihood of True News and Fake News is 50 percent each. However, subjects who are given this prior may overly anchor towards 50-50, and subjects who are not given a prior may update about the distribution. While it is not clear why there would be gender differences that interact with these effects, I run a between-subjects treatment to ensure that results do not depend on the prior. That is, subjects are either told in the instructions that the news sources were (ex ante) equally likely, or they were not given this information.
I find that giving subjects a prior about the likelihood of True News does not noticeably affect the results. In each treatment, male subjects systematically engage in performance-motivated reasoning while female subjects do not; and in each treatment, male and female subjects engage in politically-motivated reasoning. These results are shown in the circle and diamond plots in Figure 5. In general, there are not clear differences between the treatments, though the effects are noisier.
Second, subjects may not correctly understand what a median is and report another moment of their belief distribution (such as their mean). If subjects make this error, then results could be explained by Bayesian updating if subjects’ mean were lower than their median.
Subjects whose initial guesses were lower than their true median belief would rationally think that the source is more likely to be True News if they receive a “greater than” message.

To account for this potential confound, I elicit 50-percent confidence intervals (25th and 75th percentile beliefs) from all subjects and analyze where the initial guess lies in this range. If subjects’ initial guesses were closer to their lower bound than their upper bound, this would be indicative of negatively-skewed prior belief distributions. I do not find evidence of substantially-skewed distributions. For men, initial guesses lie 49.4 percent of the way between their lower and upper bound; for women, initial guesses lie 51.7 percent of the way between their bounds. Guesses are close to the exact midpoint of confidence intervals, and the gender difference is not large enough to explain the 11 percentage point gap between men and women’s news assessments.

Furthermore, the main results look similar if we restrict estimates to the subjects whose initial guesses lie exactly at the midpoint of their confidence interval. As shown in the square and triangle plots in Figure 5, there are still sizable gender differences in motivated reasoning on the performance question but not on the political questions.

Figure 5: Motivated Reasoning by Gender: Robustness
Notes: The x-axis is the treatment effect of seeing Good News versus Bad News about one’s performance or party. Bayesians would have an effect of zero, while motivated reasoners would have a positive effect. Error bars correspond to 95 percent confidence intervals. Circle: Received 50-50 prior about the veracity of the news. Diamond: Did not receive a prior about the veracity of the news. Square: Unskewed prior belief distributions. Triangle: Skewed prior belief distributions.

Results are also similar if subjects who failed attention checks are included in the analysis. With the inclusion of the 311 subjects who failed attention checks, there are 1300 subjects in total. Of these 1300, there are 1282 news veracity assessments on the performance question by men and women; 710 are by men, and 572 are by women. In the Appendix, Figure 9 shows that the results are similar to the results from the main sample (Figure 3), though the treatment effect estimates are slightly smaller since the inattentive subjects tend to give more random news assessments.
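As a rough illustration of the Bayesian benchmark behind the skewness check above, the sketch below computes the posterior that a source is True News after a “greater than” message, given the probability that the correct answer exceeds the reported guess. The function names and the example value 0.6 are illustrative assumptions, not taken from the experiment's code or data.

```python
def posterior_true_given_greater(q_above, prior_true=0.5):
    """P(True News | "greater than" message) for a Bayesian subject.

    q_above is the subject's probability that the correct answer exceeds
    the reported guess. True News sends "greater than" exactly when the
    answer is above the guess; Fake News sends it exactly when it is below.
    """
    numerator = prior_true * q_above
    denominator = numerator + (1.0 - prior_true) * (1.0 - q_above)
    return numerator / denominator


def position_in_interval(guess, lower, upper):
    """Share of the way the guess lies between the 25th and 75th percentile bounds."""
    return (guess - lower) / (upper - lower)


# A guess at the true median (q_above = 0.5) leaves the 50-50 prior unchanged.
print(posterior_true_given_greater(0.5))
# A guess below the true median (q_above = 0.6, an assumed value) makes a
# "greater than" message look more credible even to a Bayesian.
print(posterior_true_given_greater(0.6))
# Hypothetical subject: guess 50 with bounds 40 and 60 sits at the midpoint.
print(position_in_interval(50, 40, 60))
```

The second call shows why a reported guess below the true median would be a confound: a Bayesian would then trust “greater than” messages more, without any motivated reasoning.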
Most subjects who failed the attention check incorrectly answered the question that asked “What is the year right now?”

One explanation for these findings is that, while men and women are both susceptible to biasing how they process information towards more attractive beliefs, they differ in which [...] (ϕ in the model) being the same for men and women, but motives (m_Male(·) and m_Female(·)) differing by gender.

These results provide further evidence that heterogeneity in overconfidence may be partly explained, and furthered, by heterogeneity in motivated reasoning. Table 2 provides suggestive evidence that subjects who are more overconfident about their performance also motivatedly reason more about performance. While this evidence is purely correlational, it suggests that motivated reasoning may play a role in overconfidence.
Table 2: Motivated Reasoning and Overconfidence about Performance

                                       (1)        (2)
Pro-Performance                       0.055      0.056
                                     (0.016)    (0.016)
Overconfidence                       -0.015     -0.009
                                     (0.038)    (0.037)
Pro-Performance x Overconfidence      0.114      0.119
                                     (0.057)    (0.056)
Controls                              No         Yes
Observations                          980        980
R²

Notes: OLS, robust standard errors. Subject controls: gender, race, age, log(income), years of education, religion, and party preference.
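Table 2 does not state its estimating equation, so the specification below is an assumption inferred from the row labels. The dependent variable (written here as Assessment_i, a subject's news veracity assessment on the performance question), the control vector X_i, and all symbol names are illustrative.

```latex
\begin{align*}
\text{Assessment}_i &= \alpha + \beta_1\,\text{ProPerformance}_i + \beta_2\,\text{Overconfidence}_i \\
&\quad + \beta_3\,\bigl(\text{ProPerformance}_i \times \text{Overconfidence}_i\bigr) + \gamma^{\top} X_i + \varepsilon_i, \\
\frac{\Delta\,\text{Assessment}_i}{\Delta\,\text{ProPerformance}_i} &= \beta_1 + \beta_3\,\text{Overconfidence}_i .
\end{align*}
```

Under this reading, the column (1) estimates imply a Pro-Performance effect of 0.055 + 0.114 × Overconfidence, so the estimated effect is larger for more overconfident subjects. The units in which Overconfidence enters the regression are not stated in this section.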
Note, however, that ϕ and m(·) are not separately identified, so both may vary by gender.

These results about what issues men and women differentially motivatedly reason about relate to the broader discussion of gender stereotyping. Results suggest that relative performance is a male-typed belief, while political topics are not systematically gendered. To support the latter hypothesis, we can also look at another measure of confidence: overprecision (Moore and Healy 2008; Moore, Tenney, and Haran 2015; Sarsons and Xu 2016).

In the experiment, subjects are also asked to state their 50-percent confidence intervals. These beliefs are incentivized using a piecewise-linear scoring rule: a subject earns 100 − 3(solution − upper bound) if upper bound < solution and 100 − (upper bound − solution) if upper bound > solution, and the opposite for the lower bound. On each question, subjects are labeled as underprecise if the confidence interval contains the correct answer more than 50 percent of the time, and labeled as overprecise if the confidence interval contains the correct answer less than 50 percent of the time.

If gender confidence differences played a role for political beliefs, we would expect men to have greater levels of overprecision than women. However, men and women have almost identical levels of overprecision. Male subjects’ confidence intervals include the true answer 46.7 percent of the time (s.e. 0.7 pp) and female subjects’ confidence intervals include the true answer 46.6 percent of the time (s.e. 0.8 pp). There is significant evidence of overprecision for both men and women (using a t-test we can reject that these probabilities are at least 50; p < .001 for both men and women). The gender difference is 0.1 pp (s.e. 1.1 pp) and statistically insignificant (p = 0.[...]). Thaler (2020) provides a deeper discussion on the relationship between politically-motivated beliefs and overprecision.

This paper has shown that there are sizable gender gaps in motivated reasoning. Men systematically engage in motivated reasoning about their performance relative to others; women do not systematically engage in motivated reasoning about performance. The gender gaps in motivated reasoning can make gender gaps in overconfidence persist, and even further them. By contrast, there is little gender difference in politically-motivated reasoning; both men and women are systematically biased in their inference.

Results are consistent with a theory in which men and women are both susceptible to motivated reasoning, but in which there are gender differences in how attractive people find it to believe they performed better than others.

There are several avenues for future work in both theoretical and applied directions.
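The confidence-interval scoring rule and the overprecision figures discussed before the conclusion can be checked numerically. The sketch below is illustrative only. It assumes a shortfall slope of 3 for the upper bound (the slope that makes the 75th percentile the best report when overshooting costs 1 per unit), a hypothetical uniform belief over 0 to 100, a normal approximation to the t-test, and independent male and female samples. All function names are made up for this example.

```python
import math


def upper_bound_score(upper, solution, shortfall_slope=3.0):
    """Points earned for a reported upper bound (75th percentile belief).

    shortfall_slope=3.0 is an assumption, not a value read from the study code.
    """
    if upper < solution:
        return 100.0 - shortfall_slope * (solution - upper)
    return 100.0 - (upper - solution)


def expected_score(upper, outcomes):
    """Average score when every value in `outcomes` is equally likely."""
    return sum(upper_bound_score(upper, s) for s in outcomes) / len(outcomes)


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical belief: the solution is equally likely to be 0, 1, ..., 100.
outcomes = [float(s) for s in range(0, 101)]
best_report = max(range(0, 101), key=lambda u: expected_score(float(u), outcomes))
print(best_report)  # the expected-score-maximizing report is the 75th percentile

# Coverage rates and standard errors (in percentage points) as reported in the text.
men_cov, men_se = 46.7, 0.7
women_cov, women_se = 46.6, 0.8
z_men = (men_cov - 50.0) / men_se              # about -4.7
z_women = (women_cov - 50.0) / women_se        # about -4.25
diff_se = math.sqrt(men_se ** 2 + women_se ** 2)   # about 1.06, close to the reported 1.1
z_diff = (men_cov - women_cov) / diff_se
print(two_sided_p(z_men) < 0.001, two_sided_p(z_women) < 0.001, two_sided_p(z_diff) > 0.05)
```

Under these assumptions the best report is 75, both coverage rates are significantly below 50 percent, and the gender difference is far from significant, in line with the conclusions stated in the text.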
References
Barber, Brad and Terrance Odean (2001). “Boys will be boys: Gender, overconfidence, and common stock investment”. In: Quarterly Journal of Economics.
Benabou, Roland and Jean Tirole (2002). “Self-Confidence and Personal Motivation”. In: Quarterly Journal of Economics.
Benjamin, Daniel (2019). “Errors in Probabilistic Reasoning and Judgment Biases”. In: Chapter for the Handbook of Behavioral Economics.
Beyer, Sylvia (1990). “Gender Differences in the Accuracy of Self-Evaluations of Performance”. In: Journal of Personality and Social Psychology.
Bordalo, Pedro et al. (2019). “Beliefs about Gender”. In: American Economic Review.
Chen, Daniel, Martin Schonger, and Chris Wickens (2016). “oTree – An open-source platform for laboratory, online, and field experiments”. In: Journal of Behavioral and Experimental Finance.
Coffman, Katherine, Manuela Collis, and Leena Kulkarni (2019). “Stereotypes and Belief Updating”. In: Working Paper.
Coutts, Alexander (2018). “Good news and bad news are still news: Experimental evidence on belief updating”. In: Experimental Economics.
Croson, Rachel and Uri Gneezy (2009). “Gender differences in preferences”. In: Journal of Economic Literature.
Dohmen, Thomas and Armin Falk (2011). “Performance pay and multidimensional sorting: Productivity, preferences, and gender”. In: American Economic Review.
Eckel, Catherine and Philip Grossman (2008). “Men, Women and Risk Aversion: Experimental Evidence”. In: Handbook of Experimental Economics Results.
Eil, David and Justin Rao (2011). “The good news-bad news effect: asymmetric processing of objective information about yourself”. In: American Economic Journal: Microeconomics.
Ertac, Seda (2011). “Does self-relevance affect information processing? Experimental evidence on the response to performance and non-performance feedback”. In: Journal of Economic Behavior and Organization.
Exley, Christine and Judd Kessler (2018). “Motivated Errors”. In: Working Paper.
Exley, Christine and Judd Kessler (2019). “The gender gap in self-promotion”. In: Working Paper.
Gneezy, Uri, Muriel Niederle, and Aldo Rustichini (2003). “Performance in competitive environments: Gender differences”. In: Quarterly Journal of Economics.
Goldin, Claudia (1990). Understanding the Gender Gap: An Economic History of American Women. Oxford University Press.
Goldin, Claudia (2014). “A Grand Gender Convergence: Its Last Chapter”. In: American Economic Review.
Grossman, Philip et al. (2019). “It pays to be a man: Rewards for leaders in a coordination game”. In: Journal of Economic Behavior and Organization.
Horton, John, David Rand, and Richard Zeckhauser (2011). “The online laboratory: conducting experiments in a real labor market”. In: Experimental Economics.
Kahan, Dan (2016). “The Politically Motivated Reasoning Paradigm, Part 1: What Politically Motivated Reasoning Is and How to Measure It”. In: Emerging Trends in Social and Behavioral Sciences.
Kuhnen, Camelia (2014). “Asymmetric Learning from Financial Information”. In: The Journal of Finance.
Kunda, Ziva (1990). “The case for motivated reasoning”. In: Psychological Bulletin.
Lenney, Ellen (1977). “Women’s self-confidence in achievement settings”. In: Psychological Bulletin.
Levay, Kevin, Jeremy Freese, and James Druckman (2016). “The Demographic and Political Composition of Mechanical Turk Samples”. In: SAGE Open.
Logg, Jennifer, Uriel Haran, and Don Moore (2018). “Is overconfidence a motivated bias? Experimental evidence”. In: Journal of Experimental Psychology.
Mobius, Markus et al. (2014). “Managing self-confidence: Theory and experimental evidence”. In: Working Paper.
Moore, Don and Paul Healy (2008). “The Trouble with Overconfidence”. In: Psychological Review.
Moore, Don, Elizabeth Tenney, and Uriel Haran (2015). “Overprecision in Judgment”. In: The Wiley Blackwell Handbook of Judgment and Decision Making.
Niederle, Muriel and Lise Vesterlund (2007). “Do women shy away from competition? Do men compete too much?” In: American Economic Review.
Niederle, Muriel and Lise Vesterlund (2011). “Gender and Competition”. In: Annual Review of Economics.
Sarsons, Heather (2019). “Interpreting Signals in the Labor Market: Evidence from Medical Referrals”. In: Working Paper.
Sarsons, Heather and Guo Xu (2016). “Confidence Men? Gender and Confidence: Evidence among Top Economists”. In: Working Paper.
Schultz, Jonathan and Christian Thoni (2016). “Overconfidence and Career Choice”. In: PLOS One.
Shurchkov, Olga and Catherine Eckel (2018). “Gender Differences in Behavioral Traits and Labor Market Outcomes”. In: The Oxford Handbook of Women and the Economy.
Thaler, Michael (2020). “The “Fake News” Effect: Experimentally Identifying Motivated Reasoning Using Trust in News”. In: Working Paper.
Additional Figures
A.1 Balance Table
                  Good News    Bad News    Good vs. Bad    p-value
Male                0.545       0.538         0.007         0.824
                   (0.023)     (0.023)       (0.032)
Female              0.455       0.462        -0.007         0.824
                   (0.023)     (0.023)       (0.032)
Age                34.830      35.780        -0.950         0.172
                   (0.482)     (0.501)       (0.695)
White               0.779       0.735         0.044         0.113
                   (0.019)     (0.020)       (0.027)
Black               0.074       0.088        -0.015         0.406
                   (0.012)     (0.013)       (0.017)
Latino              0.051       0.078        -0.027         0.089
                   (0.010)     (0.012)       (0.016)
Education          14.617      14.719        -0.102         0.392
                   (0.085)     (0.084)       (0.119)
Log(income)        10.715      10.687         0.028         0.574
                   (0.035)     (0.036)       (0.050)
Religious           0.465       0.417         0.048         0.129
                   (0.023)     (0.022)       (0.032)
Pro-Republican      0.279       0.269         0.010         0.735
                   (0.020)     (0.020)       (0.029)
Pro-Democratic      0.633       0.639        -0.005         0.861
                   (0.022)     (0.022)       (0.031)
N                    488         487           975

Notes: Standard errors in parentheses. Only subjects who list male or female as their gender are included. Good News / Bad News refers to news about relative performance. Education is in years. Religious is 1 if subject is in any religious group. Pro-Republican (Pro-Democratic) is 1 if subject gives a strictly higher rating to the Republican (Democratic) Party.

A.2 Overconfidence

Figure 6: CDF of Overconfidence by Gender
Notes: Overconfidence is measured by subtracting actual percentile performance (which ranges from 0 to 100) from predicted percentile performance (also ranging from 0 to 100). Positive numbers indicate overconfidence and negative numbers indicate underconfidence.

A.3 Motivated Reasoning

Figure 7: CDF of Trust in “Good News” by Gender
Notes: Good News tells subjects their current median beliefs about their performance are too low. This figure shows that men trust Good News more than women do. The x-axis measures subjects’ assessments of P(True News | Good News). The y-axis measures the share of respondents that give at most that high of an assessment. Bayesians would have the same trust in news for Good News and Bad News, and the residual is motivated reasoning. Error bars correspond to 95 percent confidence intervals.

Figure 8: CDF of Trust in “Bad News” by Gender
Notes: Bad News tells subjects their current median beliefs about their performance are too high. This figure shows that women trust Bad News more than men do. The x-axis measures subjects’ assessments of P(True News | Bad News). The y-axis measures the share of respondents that give at most that high of an assessment. Bayesians would have the same trust in news for Good News and Bad News, and the residual is motivated reasoning. Error bars correspond to 95 percent confidence intervals.

Table 3: Topics and Hypothesized Motives for Democrats and Republicans

Topic                    Pro-Democrat Motives           Pro-Republican Motives
US crime                 Got better under Obama         Got worse under Obama
Upward mobility          Low in US after tax cuts       High in US after tax cuts
Racial discrimination    Severe in labor market         Not severe in labor market
Gender                   Girls better at math           Boys better at math
Refugees                 Decreased violent crime        Increased violent crime
Climate change           Scientific consensus           No scientific consensus
Gun reform               Decreased homicides            Didn’t decrease homicides
Media bias               Media not dominated by Dems    Media is dominated by Dems
Party performance        Higher for Dems over Reps      Higher for Reps over Dems

Figure 9: Motivated Reasoning by Gender, Including Subjects Who Fail Attention Checks
Notes: Same as Figure 3 but the subjects who failed attention checks are also included. The x-axis is the treatment effect of seeing Good News versus Bad News about one’s performance or party. Bayesians would have an effect of zero, while motivated reasoners would have a positive effect. Error bars correspond to 95 percent confidence intervals.

Study Materials: Exact Question Wordings
The Relative Performance question is seen in Round 13, after the political and neutral topics (but before the Party Performance question).
Performance Question
How well do you think you performed on this study about political and U.S. knowledge? I’ve compared the average points you scored for all questions (prior to this one) to that of 100 other participants.
How many of the 100 do you think you scored higher than?
(Please guess between 0 and 100.)
Correct answer: Depends on participant’s performance.
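To make the scoring of this question concrete, the sketch below computes how many of 100 comparison participants a subject outscored, and the resulting overconfidence (guess minus actual). It is a minimal illustration and not the study's code: the function names, the strict-inequality treatment of ties, and the example scores are all assumptions.

```python
def others_outscored(own_average, other_averages):
    """Number of comparison participants with a strictly lower average score.

    Counting ties as "not outscored" is an assumption; the study materials
    do not say how ties are handled.
    """
    return sum(1 for score in other_averages if score < own_average)


def overconfidence(guess, actual):
    """Predicted minus actual relative performance; positive means overconfident."""
    return guess - actual


# Hypothetical comparison group: 100 participants with average scores 0, 1, ..., 99.
comparison_group = [float(points) for points in range(100)]
actual = others_outscored(70.5, comparison_group)
print(actual)                      # participants scoring below 70.5
print(overconfidence(80, actual))  # a guess of 80 overstates the true rank
```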
Political Questions
Crime Under Obama
Some people believe that the Obama administration was too soft on crime and that violent crime increased during his presidency, while others believe that President Obama’s pushes towards criminal justice reform and reducing incarceration did not increase violent crime.
This question asks how murder and manslaughter rates changed during the Obama administration. In 2008 (before Obama became president), the murder and manslaughter rate was 54 per million Americans.
In 2016 (at the end of Obama’s presidency), what was the per-million murder and manslaughter rate?
Correct answer: 53.
Source linked on results page: http://bit.ly/us-crime-rate
Upward Mobility
In 2017, Donald Trump signed into law the largest tax reform bill since Ronald Reagan’s 1981 and 1986 bills. Some people believe that Reagan’s reforms accelerated economic growth and allowed lower-income Americans to reap the benefits of lower taxes, while other people believe that this decreased the government’s spending to help lower-income Americans get ahead.
This question asks whether children who grew up in low-income families during Reagan’s tenure were able to benefit from his tax reforms.
Of Americans who were born in the lowest-income (bottom 20%) families from 1980-1985, what percent rose out of the lowest-income group as adults?
(Please guess between 0 and 100.)
Correct answer: 64.9.
Source linked on results page: http://bit.ly/us-upward-mobility (page 47)

Racial Discrimination
In the United States, white Americans have higher salaries than black Americans on average. Some people attribute these differences in income to differences in education, training, and culture, while others attribute them more to racial discrimination.
In a study, researchers sent fictitious resumes to respond to thousands of help-wanted ads in newspapers. The resumes sent had identical skills and education, but the researchers gave half of the (fake) applicants stereotypically White names such as Emily Walsh and Greg Baker, and gave the other half of the applicants stereotypically Black names such as Lakisha Washington and Jamal Jones.
9.65 percent of the applicants with White-sounding names received a call back. What percent of the applicants with Black-sounding names received a call back?
(Please guess between 0 and 100.)
Correct answer: 6.45.
Source linked on results page: http://bit.ly/labor-market-discrimination
Gender and Math GPA
In the United States, men are more likely to enter into mathematics and math-related fields. Some people attribute this to gender differences in interest in or ability in math, while others attribute it to other factors like gender discrimination.
This question asks whether high school boys and girls differ substantially in how well they do in math classes. A major testing service analyzed data on high school seniors and compared the average GPA for male and female students in various subjects.
Male students averaged a 3.04 GPA (out of 4.00) in math classes. What GPA did female students average in math classes?
(Please guess between 0.00 and 4.00.)
Correct answer: 3.15.
Source linked on results page: http://bit.ly/gender-hs-gpa
Refugees and Violent Crime
Some people believe that the U.S. has a responsibility to accept refugees into the country, while others believe that an open-doors refugee policy will be taken advantage of by criminals and put Americans at risk.
In 2015, German leader Angela Merkel announced an open-doors policy that allowed all Syrian refugees who had entered Europe to take up residence in Germany. From 2015-17, nearly one million Syrians moved to Germany. This question asks about the effect of Germany’s open-doors refugee policy on violent crime rates.
In 2014 (before the influx of refugees), the violent crime rate in Germany was 224.0 per hundred-thousand people.
In 2017 (after the entrance of refugees), what was the violent crime rate in Germany per hundred-thousand people?
Correct answer: 228.2.
Sources linked on results page: Main site: http://bit.ly/germany-crime-main-site. 2014 and 2015 data: http://bit.ly/germany-crime-2014-2015. 2016 and 2017 data: http://bit.ly/germany-crime-2016-2017.

Climate change
Some people believe that there is a scientific consensus that human activity is causing global warming and that we should have stricter environmental regulations, while others believe that scientists are not in agreement about the existence or cause of global warming and think that stricter environmental regulations will sacrifice jobs without much environmental gain.
This question asks about whether most scientists think that global warming is caused by humans. A major nonpartisan polling company surveyed thousands of scientists about the existence and cause of global warming.
What percent of these scientists believed that “Climate change is mostly due to human activity”?
(Please guess between 0 and 100.)
Correct answer: 87.
Source linked on results page: http://bit.ly/scientists-climate-change
Gun Reform
The United States has a homicide rate that is much higher than other wealthy countries. Some people attribute this to the prevalence of guns and favor stricter gun laws, while others believe that stricter gun laws will limit Americans’ Second Amendment rights without reducing homicides very much.
After a mass shooting in 1996, Australia passed a massive gun control law called the National Firearms Agreement (NFA). The law illegalized, bought back, and destroyed almost one million firearms by 1997, mandated that all non-destroyed firearms be registered, and required a lengthy waiting period for firearm sales.
Democrats and Republicans have each pointed to the NFA as evidence for/against stricter gun laws. This question asks about the effect of the NFA on the homicide rate in Australia.
In the five years before the NFA (1991-1996), there were 319.8 homicides per year in Australia. In the five years after the NFA (1998-2003), how many homicides were there per year in Australia?
Correct answer: 318.6.
Sources linked on results page: http://bit.ly/australia-homicide-rate (Suicides declined substantially, however. For details: http://bit.ly/impact-australia-gun-laws.)

Media Bias
Some people believe that the media is unfairly biased towards Democrats, while some believe it is balanced, and others believe it is biased towards Republicans.
This question asks whether journalists are more likely to be Democrats than Republicans.
A representative sample of journalists were asked about their party affiliation. Of those affiliated with either the Democratic or Republican Party, what percent of journalists are Republicans?
(Please guess between 0 and 100.)
Correct answer: 19.8.
Source linked on results page: http://bit.ly/journalist-political-affiliation
Party Relative Performance
Subjects are randomly assigned to see either the Democrats’ score (and asked to predict the Republicans’ score) or to see the Republicans’ score (and asked to predict the Democrats’ score).
Democrats’ Relative Performance
This question asks whether you think Democrats or Republicans did better on this study about political and U.S. knowledge. I’ve compared the average points scored by Democrats and Republicans among 100 participants (not including yourself).
The Republicans scored 70.83 points on average.
How many points do you think the Democrats scored on average?
(Please guess between 0 and 100)
Correct answer: 72.44.
Republicans’ Relative Performance
This question asks whether you think Democrats or Republicans did better on this study about political and U.S. knowledge. I’ve compared the average points scored by Democrats and Republicans among 100 participants (not including yourself).
The Democrats scored 72.44 points on average.
How many points do you think the Republicans scored on average?
(Please guess between 0 and 100)
Correct answer: 70.83.

Neutral Questions
Random Number
A computer will randomly generate a number between 0 and 100. What number do you think the computer chose?
(As a reminder, it is in your best interest to guess an answer that is close to the computer’s choice, even if you don’t perfectly guess it.)
Correct answer: Randomly generated for each participant.
Latitude of Center of the United States
The U.S. National Geodetic Survey approximated the geographic center of the continental United States. (This excludes Alaska and Hawaii, and U.S. territories.)
How many degrees North is this geographic center?
(Please guess between 0 and 90. The continental U.S. lies in the Northern Hemisphere, the Equator is 0 degrees North, and the North Pole is 90 degrees North.)
Correct answer: 39.833.
Source linked on results page: http://bit.ly/center-of-the-us
Longitude of Center of the United States
The U.S. National Geodetic Survey approximated the geographic center of the continental United States. (This excludes Alaska and Hawaii, and U.S. territories.)
How many degrees West is this geographic center?
(Please guess between 0 and 180. The continental U.S. lies in the Western Hemisphere, which ranges from 0 degrees West to 180 degrees West.)
Correct answer: 98.583.
Source linked on results page: http://bit.ly/center-of-the-us
Attention Check Question
Current Year
In 1776 our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
What is the year right now?
This is not a trick question and the first sentence is irrelevant; this is a comprehension check to make sure you are paying attention. For this question, your lower and upper bounds should be equal to your guess if you know what year it currently is.
Correct answer: 2018.
Source linked on results page: http://bit.ly/what-year-is-it

Online Appendix: Study Materials
C.1 Flow of Experiment
Subjects see a series of pages in the following order:
• Introduction and Consent
• Demographics and Current Events Quiz
• Opinions
• Instructions for Question Pages
• Question 1
• Instructions for News Assessment Pages
• News Assessment 1
• Question 2, News Assessment 2, . . . , Question 14, News Assessment 14
• Feedback
• Results and Payment

The Performance question is always in Round 13, and pertains to the performance in Rounds 1-12.

Screenshots for each of the pages are in the following subsection. Red boxes are not shown to subjects and are included for illustration purposes only. Results pages here are cut off after three questions, but all results are shown to subjects. Choices on the Demographics page are randomly ordered.

C.2 Study Materials

Figure 10: The question page for the performance question.

Figure 11: