Overconfidence and Prejudice

Abstract

We explore conclusions a person draws from observing society when he allows for the possibility that individuals' outcomes are affected by group-level discrimination. Injecting a single non-classical assumption, that the agent is overconfident about himself, we explain key observed patterns in social beliefs, and make a number of additional predictions. First, the agent believes in discrimination against any group he is in more than an outsider does, capturing widely observed self-centered views of discrimination. Second, the more group memberships the agent shares with an individual, the more positively he evaluates the individual. This explains one of the most basic facts about social judgments, in-group bias, as well as "legitimizing myths" that justify an arbitrary social hierarchy through the perceived superiority of the privileged group. Third, biases are sensitive to how the agent divides society into groups when evaluating outcomes. This provides a reason why some ethnically charged questions should not be asked, as well as a potential channel for why nation-building policies might be effective. Fourth, giving the agent more accurate information about himself increases all his biases. Fifth, the agent is prone to substitute biases, implying that the introduction of a new outsider group to focus on creates biases against the new group but lowers biases vis-à-vis other groups. Sixth, there is a tendency for the agent to agree more with those in the same groups. As a microfoundation for our model, we provide an explanation for why an overconfident agent might allow for potential discrimination in evaluating outcomes, even when he initially did not conceive of this possibility.


Paul Heidhues, DICE, Heinrich-Heine University Düsseldorf
Botond Kőszegi, Central European University
Philipp Strack, Yale University
September 19, 2019 (arXiv, econ.TH)

∗ We are grateful to Aislinn Bohren, Alex Imas, Robert Lieli, and Florian Zimmermann for insightful discussions, and seminar and conference audiences for comments.

Introduction

Among the many factors that hinder the fair and productive coexistence of different social groups, two sets of widely held beliefs surely stand out. First, many or most people think of their own group as superior to other groups, and often hold negative evaluations of other groups. For instance, the majority of non-Muslim Americans believe that Muslims do not want to adopt the American way of life, a statement that most Muslims disagree with. Second, different groups hold dissenting views about the most urgent intergroup problems. For instance, blacks consider discrimination against blacks a greater problem than whites do; whites, in turn, often believe that discrimination against whites is going on. Such beliefs can in turn foster or maintain interpersonal discrimination and conflict between groups, especially when resources are scarce (Jackson, 2011; Bertrand and Duflo, 2017).

In this paper, we develop a novel theory of prejudiced beliefs based on a person's attempts to understand society while maintaining stubborn, unrealistically positive views of himself. Looking to best explain why his outcomes are not as good as he thinks he deserves, the agent comes to overestimate discrimination against any social group he is a member of, and to overestimate discrimination in favor of any group he is a competitor of. Furthermore, interpreting others' outcomes in light of his views about discrimination, he is led to develop excessively positive views of his social groups, and excessively negative views of competing groups.
Our framework provides a new perspective on key facts explained by previous theories based on politician-induced hate (Glaeser, 2005) and the representativeness heuristic (Bordalo et al., 2016), but we also explain several other facts, including connecting prejudiced beliefs to self-serving beliefs about discrimination, and make a number of further predictions.

We begin in Section 2 by deriving a general formula for what a person with dogmatic incorrect beliefs about one variable learns about other variables. This formula is invaluable for our subsequent analysis, and might also be useful for studying other learning settings with misspecification and multi-dimensional states and signals. The agent is interested in estimating the levels of $L$ fundamentals; he has a degenerate prior about one fundamental and a full-support prior over the other fundamentals. He repeatedly observes some linear combinations of the fundamentals with multivariate normal errors that are i.i.d. over time, and updates using Bayes' Rule. We identify his limiting beliefs about the fundamentals and the covariance matrix of the errors, which depend on what linear combinations he observes, the true covariance matrix of the errors, and how wrong his dogmatic belief is. Interestingly, although the agent misinfers the covariance matrix, his inferences about the fundamentals are the same as when he knows the true covariance matrix.

In Section 3, we turn to our main topic, a theory of social beliefs. We assume that society is composed of $K$ potentially overlapping groups, and a person is either a member, a competitor, or a neutral outsider of a group. The agent observes many draws of the "recognition" (i.e., achievement, social status, or other measure of success) of each individual, including himself.
He understands that recognition depends in part on a person's "caliber" (i.e., ability, hard work, or other measure of deservingness) and noise, but he also posits that there might be "discrimination" (or policies, cheating, conspiracies, etc.) that benefits a group and hurts its competitors. Some of these possibilities might be realistic, while others might be imagined by the agent; but he does not assume any particular pattern of discrimination a priori. Instead, he attempts to learn the true pattern from his observations, including from direct, unbiased signals about the degrees of discrimination themselves. All noise terms in the agent's observations are independent of each other. Crucially, to these relatively standard ingredients we add a single non-classical but empirically well-founded assumption, stubborn overconfidence: the agent has a point belief about his own caliber that is above the correct one.

In Section 4, we identify properties of the agent's long-run beliefs, beginning with two central patterns that are widely documented in the literature. First, the agent holds self-serving views about discrimination: he overestimates discrimination against any group he is a member of and underestimates discrimination against any group he is in competition with, and consequently he considers discrimination against his groups a greater issue than outsiders do. Intuitively, the agent's overconfidence implies that the recognition he obtains is in his view systematically too low, and believing in discrimination against his groups provides an explanation for why this is the case.

Second, the agent is subject to an "in-group bias," perhaps the most basic finding in the literature on stereotypes, discrimination, and prejudice: he tends to hold overly favorable views about those in the same social circles, and overly unfavorable views about those in competing circles.
Intuitively, since the agent believes that discrimination against his groups and in favor of competing groups is going on, he attributes more of an in-group member's recognition, and less of a competitor's recognition, to caliber.

Beyond naturally accounting for two important stylized facts, our theory makes many subtler predictions. As the intuitions make clear, the degree of the agent's self-serving view of discrimination, and the degree of his in-group bias, are directly related to his degree of overconfidence. Hence, we predict that these biases are increasing in overconfidence, or (equivalently in our model) in the extent to which a person feels that he is not getting what he deserves based on his caliber. To our knowledge, this relationship has not been directly tested in the literature, although it is consistent, for instance, with the observation that men are both more overconfident and more prejudiced than women.

A person's pattern of biases, however, derives not only from his overconfidence, but also from the manner in which he thinks about society. As an illustrative extreme case, suppose that discrimination is in reality non-existent in society. If the agent conceives of each individual separately ($K = 0$ groups), then he develops unbiased beliefs about everyone. If he conceives of individuals in terms of group membership ($K >$ […] $K = 0$), but individuals' recognitions are not necessarily independent of each other. Then, the agent develops a positive bias about any person whose recognition is positively correlated with his, and a negative bias about any person whose recognition is negatively correlated with his. We can then endogenously define the agent's in-group as the former group (those "in the same boat" with him) and the latter group as his out-group. Furthermore, this case of our model provides a potential microfoundation for why an agent who does not initially think about the possibility of group-level discrimination nevertheless starts believing in it.
Namely, comparing his conclusions to the very observations on which his conclusions are based, the agent might observe that his in-group does not get the outcomes that he thinks it should, and draw the natural inference that something must be hurting the entire group.

We also analyze simple versions of our model in which a person observes not just the recognitions of others, but also signals about their calibers, for instance through personal contact. We show that more precise information about caliber, or knowing a greater number of individuals, lowers all of the person's biases. More precise information about an outside group's recognition, however, lowers the agent's opinion of the group. This insight provides a justification for the convention of news outlets not to report the race of a suspected criminal unless it is essential for the story. Finally, we give an example illustrating that when people are characterized by multiple attributes (e.g., ability and morality), the agent may develop positive biases about some but not all of a competing group's attributes.

We discuss related literature throughout, but especially in Section 6. While a large literature in sociology and social psychology explores prejudices and stereotypes, to our knowledge no previous theory has derived these phenomena from overconfidence or connected them to views about group-level discrimination. As a result, existing theories make some very different predictions than we do. Similarly, Glaeser's (2005) theory that politicians supply, and citizens often passively accept, hate-creating stories about minorities, and Bordalo et al.'s (2016) theory that individuals exaggerate distinctive true differences across groups, address different aspects of stereotypes and prejudice than our paper.

In Section 7, we mention some questions that are unaddressed by our current framework. For instance, a person might be unsure about the composition of groups in society.
Then, we conjecture that in some circumstances he comes to believe in a secret group that is favored or conspires against others. More broadly, while our framework takes group relationships as exogenous, it would be important to endogenize them in future work.

In this section, we derive a theoretical result that we will apply in multiple ways to analyze our main models, and that might be useful for other researchers in studying implications of overconfidence and other misspecifications. Readers uninterested in our abstract result can skip to Section 3.

A person makes inferences about an $L$-dimensional vector of fundamentals $f = (f_1, \ldots, f_L)^T \in \mathbb{R}^L$, which are fixed over time. In each period $t$, he observes a $D$-dimensional signal
$$r_t = M f + \epsilon_t,$$
where $M \in \mathbb{R}^{D \times L}$ is a matrix and $\epsilon_t \in \mathbb{R}^D$ is a vector of errors that is jointly normally distributed with mean zero and positive definite covariance matrix $\Sigma$. We assume that $D \geq L$ and $M$ has rank $L$. Otherwise, there would be two different fundamentals that entail the same distribution of signals, and hence the agent could not learn the fundamentals even with access to infinite data.

The agent observes a sequence of realizations $r_1, r_2, r_3, \ldots$, with the $\epsilon_t$ drawn independently over time. He updates his beliefs in a Bayesian way: given a prior belief $P$ over the set of fundamentals and positive definite covariance matrices, the probability that the agent's posterior belief $P_t$ assigns to the set $A$ after seeing the sequence of signals $r = (r_1, r_2, \ldots, r_t)$ is given by
$$P_t[A] = \frac{\int_{(f', \Sigma') \in A} \ell_t(r \mid f', \Sigma') \, dP(f', \Sigma')}{\int \ell_t(r \mid f', \Sigma') \, dP(f', \Sigma')},$$
where the likelihood equals
$$\ell_t(r \mid f', \Sigma') = \prod_{z=1}^{t} \frac{1}{\sqrt{(2\pi)^D \det \Sigma'}} \exp\left(-\frac{1}{2} (r_z - M f')^T \Sigma'^{-1} (r_z - M f')\right).$$

Crucially, we assume that the agent is misspecified in a particular sense: while the true value of fundamental $i$ equals $f_i$, he believes with certainty that it is $\tilde f_i$.

We consider three different inference problems depending on which aspects of the agent's beliefs are fixed by his prior belief, and which are derived from his observations. In our preferred specification, the agent is trying to infer the fundamentals $f$ as well as the covariance matrix $\Sigma$:
$$\operatorname{supp} P = \left\{ (f', \Sigma') \in \mathbb{R}^L \times \mathbb{R}^{D \times D} : f'_i = \tilde f_i,\ \Sigma' \text{ is positive definite} \right\}. \qquad \text{(Case III)}$$
Because they are potentially of interest in other applications, we also consider two simpler inference problems. We ask what the agent infers about the fundamentals when his beliefs about the covariance matrix are fixed at some positive definite $\tilde\Sigma$, so that
$$\operatorname{supp} P = \left\{ (f', \Sigma') \in \mathbb{R}^L \times \mathbb{R}^{D \times D} : f'_i = \tilde f_i,\ \Sigma' = \tilde\Sigma \right\}. \qquad \text{(Case I)}$$
And we ask what the agent infers about the covariance matrix when his beliefs about all fundamentals are fixed at $\tilde f = (\tilde f_1, \ldots, \tilde f_L)^T$, so that
$$\operatorname{supp} P = \left\{ (f', \Sigma') \in \mathbb{R}^L \times \mathbb{R}^{D \times D} : f' = \tilde f,\ \Sigma' \text{ is positive definite} \right\}. \qquad \text{(Case II)}$$
We say that the agent's beliefs concentrate on a point $(\tilde f, \tilde\Sigma)$ if for every open set $A$ containing $(\tilde f, \tilde\Sigma)$, the agent in the limit assigns probability 1 to $A$:
$$\lim_{t \to \infty} P_t[A] = 1.$$
Since $\tilde\Sigma$ is positive definite, the matrix $M^T \tilde\Sigma^{-1} M$ is well-defined; and since $M$ has rank $L$, this matrix is positive definite and hence invertible.

Theorem 1 (Long-Run Beliefs). In Cases (I), (II), and (III), the agent's beliefs concentrate on a single point $(\tilde f, \tilde\Sigma)$. Furthermore:

Case (I) If the agent has fixed beliefs $\tilde\Sigma$ about the covariance matrix but is uncertain about the fundamentals $j \neq i$, then in the limit his bias about fundamental $j$ is
$$\tilde f_j - f_j = \frac{\left[(M^T \tilde\Sigma^{-1} M)^{-1}\right]_{ij}}{\left[(M^T \tilde\Sigma^{-1} M)^{-1}\right]_{ii}} \, (\tilde f_i - f_i). \tag{1}$$

Case (II) If the agent has fixed beliefs $\tilde f$ about the fundamentals but is uncertain about the covariance matrix, then in the limit his bias about the covariance matrix is
$$\tilde\Sigma - \Sigma = \left(M(\tilde f - f)\right)\left(M(\tilde f - f)\right)^T. \tag{2}$$

Case (III) If the agent is uncertain about both the fundamentals $j \neq i$ and the covariance matrix, then in the limit his bias about fundamental $j$ is
$$\tilde f_j - f_j = \frac{\left[(M^T \Sigma^{-1} M)^{-1}\right]_{ji}}{\left[(M^T \Sigma^{-1} M)^{-1}\right]_{ii}} \, (\tilde f_i - f_i), \tag{3}$$
and his bias about the covariance matrix is given by Expression (2).

The proof of Theorem 1 proceeds as follows. First, by the seminal result of Berk (1966), beliefs concentrate on the set of minimizers of the Kullback-Leibler divergence. Intuitively, the negative of the Kullback-Leibler divergence is increasing in the subjective likelihood of observing the true distribution of data, so it is a natural measure of how likely a combination of parameters is in the agent's view in the long run. Due to our assumption of normal signals, the Kullback-Leibler divergence $D_{\mathrm{KL}}(f, \Sigma \,\|\, \hat f, \hat\Sigma)$ assigned to the parameters $(\hat f, \hat\Sigma)$ when the true parameters equal $(f, \Sigma)$ simplifies to
$$D_{\mathrm{KL}}(f, \Sigma \,\|\, \hat f, \hat\Sigma) = \frac{1}{2}\left( \operatorname{tr}(\hat\Sigma^{-1}\Sigma) + \left(M(\hat f - f)\right)^T \hat\Sigma^{-1} M(\hat f - f) - D + \log\frac{\det\hat\Sigma}{\det\Sigma} \right).$$
The proof then derives the unique minimizer of the above expression over the support specified in Cases (I), (II), and (III) using properties of the trace, Kronecker product, determinant, and eigenvalues of a matrix. While Case (I) can be verified by taking first-order conditions with respect to the fundamentals, in Cases (II) and (III) the objective function involves the determinant of $\hat\Sigma$, which is not a tractable function in general.
We solve this problem by looking at the eigenvaluesof a well-chosen matrix in each case, greatly reducing the dimensionality of the problems as well aseliminating the determinant from the objective.One curious fact about the agent’s inferences is immediate from plugging the true covariancematrix into Expression (1) — which yields exactly Expression (3). Hence, Part (III) says thatwhen the agent is initially agnostic about both the fundamentals and the covariance matrix, then— although by Part (II) he misinfers the covariance matrix — his beliefs about the fundamentalsare the same as if he correctly understood the covariance matrix.
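To make Theorem 1 concrete, here is a minimal numerical sketch in Python with NumPy. Everything specific in it is our own illustrative assumption rather than the authors' code: the dimensions, the random matrix $M$, the true covariance, and the helper names `bias_case_one`, `bias_direct`, and `kl`. It first checks Expression (1) against a direct minimization of the quadratic Kullback-Leibler term subject to the dogmatic constraint $\hat f_i = \tilde f_i$. It then forms the Case (III) point (Expression (3) for the fundamentals, Expression (2) for the covariance) and confirms that small perturbations keeping fundamental $i$ fixed never lower the divergence.

```python
import numpy as np


def bias_case_one(M, Sigma_tilde, i, delta):
    """Expression (1): column i of (M^T Sigma_tilde^{-1} M)^{-1}, rescaled so
    that its i-th entry equals the dogmatic error delta = f_tilde_i - f_i."""
    H = M.T @ np.linalg.solve(Sigma_tilde, M)
    H_inv = np.linalg.inv(H)
    return H_inv[:, i] / H_inv[i, i] * delta


def bias_direct(M, Sigma_tilde, i, delta):
    """Minimize d^T H d over bias vectors d with d_i = delta."""
    H = M.T @ np.linalg.solve(Sigma_tilde, M)
    free = [j for j in range(M.shape[1]) if j != i]
    d = np.empty(M.shape[1])
    d[i] = delta
    # First-order condition for the free block: H_ff d_f + H_fi * delta = 0
    d[free] = -delta * np.linalg.solve(H[np.ix_(free, free)], H[free, i])
    return d


def kl(M, f, Sigma, f_hat, Sigma_hat):
    """KL divergence of the subjective law N(M f_hat, Sigma_hat) from the true N(M f, Sigma)."""
    D = M.shape[0]
    m = M @ (f_hat - f)
    S_inv = np.linalg.inv(Sigma_hat)
    log_ratio = np.linalg.slogdet(Sigma_hat)[1] - np.linalg.slogdet(Sigma)[1]
    return 0.5 * (np.trace(S_inv @ Sigma) + m @ S_inv @ m - D + log_ratio)


rng = np.random.default_rng(0)
D, L, i, delta = 5, 3, 0, 1.0        # hypothetical sizes; fundamental i is held dogmatically
M = rng.normal(size=(D, L))          # has rank L with probability one
A = rng.normal(size=(D, D))
Sigma = A @ A.T + np.eye(D)          # true covariance (positive definite)
f = rng.normal(size=L)               # true fundamentals

# Case (I) with an arbitrary fixed belief Sigma_tilde: formula vs. direct minimization
Sigma_tilde = np.eye(D)
assert np.allclose(bias_case_one(M, Sigma_tilde, i, delta),
                   bias_direct(M, Sigma_tilde, i, delta))

# Case (III): fundamentals from Expression (3), covariance from Expression (2)
d = bias_case_one(M, Sigma, i, delta)
m = M @ d
f_hat, Sigma_hat = f + d, Sigma + np.outer(m, m)
best = kl(M, f, Sigma, f_hat, Sigma_hat)

# Local check: perturbations that keep f_hat[i] fixed never lower the divergence
for _ in range(200):
    df = 1e-3 * rng.normal(size=L)
    df[i] = 0.0
    B = 1e-3 * rng.normal(size=(D, D))
    assert kl(M, f, Sigma, f_hat + df, Sigma_hat + (B + B.T) / 2) >= best - 1e-12
```

The perturbation loop is only a sampled, local test of optimality, not a proof. It is, however, consistent with the observation above that Case (III) amounts to evaluating Expression (1) at the true covariance matrix.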

We now turn to our main interest, a model of how overconﬁdence aﬀects social beliefs.

There are $I$ individuals in society. Individual $j \in \{1, \ldots, I\}$ has fixed "caliber" $a_j \in \mathbb{R}$ and $K$ observable group relationships $c_j \in \{1, 0, -1\}^K$, with $c_{jk} = 1$ denoting that he is a member of group $k$, $c_{jk} = 0$ denoting that he is outside of but neutrally related to group $k$, and $c_{jk} = -1$ denoting that he is a competitor of group $k$. We consider society from the perspective of one individual $i \in \{1, \ldots, I\}$, whom we call agent $i$; in some cases, we also compare the views of different agents who all think according to our model. Agent $i$ observes a sequence of realizations of
$$q_j = a_j + \sum_{k=1}^{K} c_{jk} \theta_k + \epsilon^q_j, \quad j = 1, \ldots, I, \qquad \text{and} \qquad \eta_k = \theta_k + \epsilon^\eta_k, \quad k = 1, \ldots, K, \tag{4}$$
where $q_j$ is individual $j$'s "recognition" on that occasion, the time-invariant constant $\theta_k \in \mathbb{R}$ is discrimination in favor of group $k$, $\eta_k$ is a signal of $\theta_k$, and $\epsilon^q_j$ and $\epsilon^\eta_k$ are independent normally distributed errors with mean zero and variances $v^q_j$ and $v^\eta_k$, respectively. Hence, recognition depends in part on caliber (and noise), but in addition, discrimination toward a group increases a member's recognition by a fixed amount and decreases a competitor's recognition by the same amount.

The crucial assumption of our model, and the single non-classical assumption from which our results derive, is that agent $i$ is overconfident about himself. Formally, while his true caliber is $A_i$, he believes with certainty that it is $\tilde a_i > A_i$. Beyond having an unrealistic self-view, however, agent $i$ is rational: he applies Bayes' Rule correctly to update his beliefs. Furthermore, he is agnostic regarding the levels of discrimination and the calibers of other individuals, with his prior having full support. Similarly, he is uncertain about the covariance matrix of the errors, with a full-support prior over positive definite covariance matrices.

Especially given our context of social judgments and prejudice, we think of the variables $a_j$, $q_j$, and $\theta_k$, as well as the concept of overconfidence, quite broadly.
To start with the first two, a straightforward interpretation is that $a_j$ is ability and $q_j$ is a wage or another measure of economic success that is susceptible to discrimination. Alternatively, $a_j$ could denote a person's deservingness of social rewards based on past work or behavior or general character, with $q_j$ capturing the respect he gets in the form of transfers, perks, or other recognition. For instance, views on whether a low-income person should get transfers often rest on whether the person is seen as hard-working and honest, and attitudes toward immigrants are sometimes framed in terms of who deserves help from the state, with any perceived deviation from fair treatment interpreted as reflecting discrimination. Accordingly, we think of overconfidence as an overly positive view not only of one's ability, but also of one's skills, importance, appropriate status, or deservingness in society. What is crucial for our theory is that the overconfident person thinks of the recognition he obtains as unjustifiably low. And while $\theta_k$ is most straightforwardly interpreted as discrimination, it could also capture policies that favor a group while potentially hurting others, or any actions by group members that favor their own group. For instance, group members may cheat at the expense of outsiders, or they may be part of a conspiracy to benefit themselves.

Since we assume that recognition in reality as well as recognition as perceived by agent $i$ satisfies Equation (4), our framework implicitly presumes that agent $i$'s theory includes all groups for which discrimination occurs (i.e., $\theta_k \neq 0$). Our theory does, however, allow for the possibility that some groups in the agent's theory actually face no discrimination ($\theta_k = 0$).
But even when the agent includes such an "irrelevant" group in his model of society, he does not assume that there is discrimination in favor of (or against) that group; he merely allows for the possibility that there might be, and evaluates what he sees with this possibility in mind. In fact, the agent correctly understands that excluding any group for which $\theta_k \neq 0$ leads to misspecification and therefore incorrect conclusions; and he wrongly believes that if $\theta_k = 0$, then he will learn that this is the case. So from his perspective, it is best to include all potentially relevant groups.

For the purposes of our model, whether a person outside group $k$ is neutral toward or in competition with group $k$ (i.e., whether $c_{jk} = 0$ or $c_{jk} = -1$) […] $k$ hurt the individual. Technically, we take these relationships as exogenous, although some economic principles might help guide their specification if the agent is reasonable. For example, if a socio-economic group is very small, then the agent may reasonably entertain the idea that there is discrimination in favor of that group, but he cannot reasonably think that such discrimination would hurt him much. It is important to note, however, that the agent may perceive potential competition with groups that is not logically well-founded.

We present the model and results by referring to individual $j$ as a person. In reality, a person is unlikely to observe the recognitions of all individuals. An equivalent model obtains if some observations $q_j$ are average recognitions of groups or subgroups.
And if agent $i$ observes an individual's or group's recognition with noise, that noise can be incorporated into the error $\epsilon^q_j$.

The main assumption of our model, that the agent is stubbornly overconfident about himself, is consistent with a large body of evidence from psychology as well as economics that documents overly positive self-views among individuals who have had plenty of opportunity to learn about themselves (see our earlier paper, Heidhues et al., 2018, for a selective review). But beyond this assumption, our model still presumes a lot of rationality, in that the agent perfectly applies Bayes' Rule to a diverse set of observations, and develops a single full set of beliefs about all individuals and groups. Indeed, if the agent does not update properly about his caliber, one may wonder whether he makes reasonable inferences about anything. While we cannot think of plausible and tractable specific alternatives, our modeling assumption of Bayesian updating does not seem to […]

Following Heidhues et al. (2018), we specify agent $i$'s belief about his own caliber as degenerate for two main reasons. First, it allows us to study the implications of overconfidence for inferences in a tractable manner. Second, and more importantly, it is a reduced-form way to incorporate forces modeled in other papers that induce individuals to retain excessively positive self-views. That there are such forces is supported by the plethora of evidence mentioned in our earlier paper. Furthermore, a recent experiment by Goette and Kozakiewicz (2018) specifically tests and confirms the predictions of our earlier model based on point beliefs.

Theorem 1 implies that agent $i$'s beliefs converge to point beliefs (in a sense defined precisely in Section 2). To state our results, we denote agent $i$'s long-run belief about discrimination toward group $k$ by $\tilde\theta_{ik}$, and his long-run belief about individual $j$'s caliber by $\tilde a_{ij}$.
We also denote true discrimination in favor of group $k$ by $\Theta_k$, and individual $j$'s true caliber by $A_j$.

Proposition 1 (Biases). Agent $i$'s long-run bias about discrimination toward group $k$ is
$$\tilde\theta_{ik} - \Theta_k = -\frac{c_{ik} v^\eta_k}{v^q_i + \sum_{k'} c_{ik'}^2 v^\eta_{k'}} \, (\tilde a_i - A_i), \tag{5}$$
and his long-run bias about individual $j$'s caliber is
$$\tilde a_{ij} - A_j = \frac{\sum_k c_{ik} c_{jk} v^\eta_k}{v^q_i + \sum_k c_{ik}^2 v^\eta_k} \, (\tilde a_i - A_i). \tag{6}$$

Proposition 1 has a number of important implications for how individuals think about society.

4.1 Self-Centered Views of Discrimination

The first implication is immediately apparent from Equation (5): a person overestimates discrimination against any group he is a member of, and underestimates discrimination against any group he is in competition with. Intuitively, overconfidence implies that agent $i$'s average recognition is not on par with his perceived caliber. A compelling explanation is that discrimination against the groups he is in and discrimination in favor of the groups he competes with are hurting him.

The above bias is difficult to measure directly if the true extent of discrimination is unclear or not easily compared to individuals' opinions. But there is an immediate implication that is amenable to measurement and consistent with evidence: a member's estimate of discrimination against a group is higher than a non-member's. Regarding racial discrimination, 88 percent of blacks say that "the country needs to continue making changes to give blacks equal rights with whites," while only 54 percent of whites and 69 percent of Hispanics agree with that statement (Pew Research Center, 2017, Chapter 4). In fact, the majority of American whites think that they are the ones being discriminated against (National Public Radio et al., 2017). Regarding gender discrimination, 77 percent of surveyed male STEM employees say that women are treated fairly in opportunities for promotion and advancement, but only 43 percent of females agree (Funk and Parker, 2018).
And in the financial domain, 37% of those with family incomes over $75,000, but 56% of those with family incomes below $30,000, think that being rich has more to do with having had advantages than with working harder (Pew Research Center, 2018a).

It is worth noting that one group the agent conceives of could be a singleton consisting of himself. In this case, he develops the view that there is some "exclusive" discrimination directed only at him. If he is in addition particularly bad at judging the degree of exclusive discrimination (the $v^\eta_k$ corresponding to himself is much higher than the $v^\eta_k$ corresponding to other groups), then he converges on what might be called paranoid beliefs: he explains his lack of performance mostly by exclusive discrimination, the belief that "the world is out to get him" and only him. In this case, the group-based biases we identify below are small. But it seems reasonable to posit […]

More specifically, 70 percent of blacks, but only 37 percent of whites, say that blacks are treated less fairly by police than whites, with similar gaps regarding the treatment of blacks in courts, stores, public schools, health care, and on the job (Anderson, 2014).

Similarly, in a representative survey of Germans between 39 and 50 years of age, 69 percent of women versus 43 percent of men answered that much still needs to be done to accomplish gender equality ("Weniger Respekt und wachsende Fremdenfeindlichkeit," Frankfurter Allgemeine Zeitung, 12.09.2019).

[…] $v^\eta_k$ corresponding to himself is not that high). In those cases, all of our conclusions continue to hold.

The agent's biased beliefs about discrimination in turn lead to biased beliefs about individuals. Equation (6) implies that the greater is $\sum_k c_{ik} c_{jk} v^\eta_k$, the more positively biased is $\tilde a_{ij}$: the more group memberships and the more competing groups individuals $i$ and $j$ have in common, the higher is agent $i$'s opinion of individual $j$.
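Because Equation (4) is a special case of the setup in Section 2 (fundamentals $(a_1, \ldots, a_I, \theta_1, \ldots, \theta_K)$, one signal row per $q_j$ and per $\eta_k$, and a diagonal error covariance), Equations (5) and (6) can be cross-checked against Expression (3). The Python/NumPy sketch below does this for a hypothetical partitional society that we made up for illustration: four individuals in two mutually competing groups, all variances equal to one, and overconfidence $\tilde a_i - A_i = 1$. The function names are ours, not the authors'. It then compares the views of one agent from each group.

```python
import numpy as np


def build_society(c, v_q, v_eta):
    """Stack Equation (4) as r = M f + eps with f = (a_1..a_I, theta_1..theta_K)."""
    I, K = c.shape
    M = np.zeros((I + K, I + K))
    M[:I, :I] = np.eye(I)          # q_j loads on a_j ...
    M[:I, I:] = c                  # ... and on c_jk * theta_k
    M[I:, I:] = np.eye(K)          # eta_k loads on theta_k only
    Sigma = np.diag(np.concatenate([v_q, v_eta]))
    return M, Sigma


def biases_theorem(c, v_q, v_eta, i, delta):
    """Biases from Expression (3) of Theorem 1, using the true covariance."""
    M, Sigma = build_society(c, v_q, v_eta)
    H_inv = np.linalg.inv(M.T @ np.linalg.solve(Sigma, M))
    d = H_inv[:, i] / H_inv[i, i] * delta
    I = c.shape[0]
    return d[:I], d[I:]            # (caliber biases, discrimination biases)


def biases_prop1(c, v_q, v_eta, i, delta):
    """Closed forms (5) and (6) of Proposition 1."""
    denom = v_q[i] + np.sum(c[i] ** 2 * v_eta)
    theta = -c[i] * v_eta / denom * delta
    a = (c @ (c[i] * v_eta)) / denom * delta
    a[i] = delta                   # agent i's own caliber bias is his overconfidence
    return a, theta


# Hypothetical society: individuals 0, 1 in group 0; individuals 2, 3 in group 1;
# each group competes with the other (c_jk = -1 for the other group).
c = np.array([[1, -1], [1, -1], [-1, 1], [-1, 1]], dtype=float)
v_q, v_eta, delta = np.ones(4), np.ones(2), 1.0

for i in (0, 2):
    a_thm, th_thm = biases_theorem(c, v_q, v_eta, i, delta)
    a_p1, th_p1 = biases_prop1(c, v_q, v_eta, i, delta)
    assert np.allclose(a_thm, a_p1) and np.allclose(th_thm, th_p1)

a0, th0 = biases_prop1(c, v_q, v_eta, 0, delta)   # an agent in group 0
a2, th2 = biases_prop1(c, v_q, v_eta, 2, delta)   # an agent in group 1
print("agent 0, discrimination biases:", th0)
print("agent 0, caliber biases:", a0)
print("agent 0, group-0 minus group-1 average bias:", a0[:2].mean() - a0[2:].mean())
print("agent 2, group-0 minus group-1 average bias:", a2[:2].mean() - a2[2:].mean())
```

With these illustrative numbers, the agent in group 0 underestimates discrimination in favor of his own group and overestimates it in favor of the competing group, rates his fellow member up and both competitors down, and believes group 0 is ahead on average; the agent in group 1 holds the mirror-image beliefs. This is the pattern that Corollary 1 below states in general.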
Intuitively, since agent $i$ believes that discrimination against his groups and in favor of competitors is going on, the more groups and competitor groups he shares with individual $j$, the more he believes individual $j$ is suffering from discrimination, so the more of individual $j$'s recognition he attributes to caliber.

The above bias has two important, closely related implications. For simplicity and clarity, we state these implications in a special case in which a person's in-group and out-groups are unambiguous. We say that the group structure is partitional if the $K$ groups are disjoint, their union is the set of all individuals, and $c_j = c_{j'}$ whenever $j$ and $j'$ are in the same group. This means that society is divided into separate groups, with group memberships determining individuals' relationships to other groups.

Corollary 1.

Suppose that the group structure is partitional, and take two overconfident agents $i_1$ and $i_2$ who belong to different groups $k_1$ and $k_2$.
1. If the average calibers of the groups are equal, then agent $i_1$ believes that group $k_1$ has higher average caliber than group $k_2$.
2. Agent $i_1$'s belief about the average caliber of group $k_1$, and his belief about the difference in average calibers between groups $k_1$ and $k_2$, are higher than agent $i_2$'s beliefs about the same measures.

Part 1 says that if the average abilities of the groups are (approximately) equal, then all groups believe themselves to be better than other groups. This "in-group bias" is perhaps the most basic stylized fact in the literature on stereotypes, discrimination, and prejudice. It was central in the groundbreaking works of Sumner (1906), Allport (1954), and Tajfel (1982), and has been confirmed by many researchers (see Mullen et al., 1992, for a meta-analysis of 137 studies). As recently as the 1990s, for instance, about 65% of whites expressed the view that whites are more hard-working than blacks, and 55% thought that whites are more intelligent (Krysan and Moberg, 2016), while both whites and blacks showed biological prejudice, the most traditional form of prejudiced belief that one's group is innately superior (Hraba et al., 1996). Relatedly, Gilens (1999) provides evidence that the widespread dislike of welfare programs in the US is based on the (mis-)perception of whites that recipients are mainly blacks lacking sufficient work ethic, which Alesina et al. (2001) argue is an important reason why the US welfare state is smaller than its European counterparts.
And prejudices can persist even when there is a strong non-discrimination norm: Shayo and Zussman (2011) document that both Jewish and Arab judges favor plaintiffs of their own ethnicity in Israeli small-claims courts, although it is unclear to what extent this reflects beliefs rather than tastes.

Research on the in-group bias distinguishes between in-group favoritism and out-group derogation (Hewstone et al., 2002), with the former being viewed as a more essential and more common ingredient of in-group bias than the latter. We can define in-group favoritism as the overestimation of in-group members, and out-group derogation as the underestimation of out-group members. Then, our framework says that an overconfident individual always engages in in-group favoritism, but he may or may not engage in out-group derogation. First, if agent $i$ is not a competitor of anyone (for any $k$, $c_{ik} \neq -1$, and for any $j$, $c_{jk_i} \neq -1$ for $i$'s group $k_i$), then he correctly perceives outsiders. In this case, overestimating discrimination in favor of an outside group does not help agent $i$ in explaining his low performance, so he misestimates only discrimination vis-à-vis his own group. Since his own group has no competitors, however, this does not affect his view of an outsider. Second, agent $i$'s evaluation of an outsider $j$ is increasing with each competing group they share (i.e., for which $c_{ik} = c_{jk} = -1$). If $i$ and $j$ have a common competitor group, then $i$ believes that $j$ suffers from discrimination in favor of that group just like he does, which introduces a positive bias in his evaluation. Agent $i$ does derogate $j$ if he is, or perceives he is, in competition with $j$ ($c_{jk_i} = -1$ for $i$'s group $k_i$), and they do not have common competing groups, i.e., they are unequivocal competitors. This pattern is roughly consistent with the basic premise of group conflict theory discussed below, although we account for

In the United States, overt expressions of racism have declined over the years; e.g., in a 2014 survey, "only" 23% of whites said that blacks are less intelligent (Krysan and Moberg, 2016). But evidence suggests that this decline reflects individuals' realization that unambiguous racism is socially unacceptable, and more subtle measures still show substantial prejudice (see for instance McConahay et al. 1981 for an early contribution, and Fiske and North 2015 for a review of modern prejudice measures).

Nevertheless, van Oorschot (2006) finds a similar pattern regarding immigration in Europe: citizens view immigrant groups as less deserving than other needy groups, which he points out is in line with prior research suggesting that people close to us in terms of identity are seen as more deserving.

Regarding gender, while both male and female students give lower evaluations to female instructors than to male instructors, male

Specifically, 56 percent of U.S. Muslims, but only 33 percent of the general public, think that Muslims who come to the U.S. want to adopt American customs, and 20 percent of Muslims, but 51 percent of the general public, think that Muslims want to be different from the larger American society. Regarding another dimension of the issue, 40 percent of the general public think that there is a great deal or fair amount of support for extremism in the Muslim American community, but only 21 percent of Muslims agree. In reality, Muslim Americans have some distinctly American values and concerns. For instance, 71 percent, as compared to 62 percent in the general public, say that most people can get ahead if they are willing to work hard. And Muslims are almost as concerned about Islamic extremism as others, with 60 percent very or somewhat concerned about extremism in the U.S., and 72 percent very or somewhat concerned about extremism in the world, versus 67 and 73 percent for the general public. For more details, see the report by the Pew Research Center (2011).

While agent $i$ is biased about in-group members, his bias is limited by his overconfidence: even if he belongs to the exact same groups as individual $j$ ($c_{ik} = c_{jk}$ for all $k$), he has less biased beliefs about agent $j$ than about himself ($\tilde a_{ij} - A_j < \tilde a_i - A_i$). People are positively biased about those in the same groups, but the median person still believes that he is better than most members of his closest group. Another immediate implication of Proposition 1 is that person $i$'s biases about both individuals and discrimination are increasing in his overconfidence $\tilde a_i - A_i$. As we have mentioned, in our model overconfidence has an equivalent interpretation to the belief that one is getting less than what he deserves. Hence, our model predicts that those who feel more strongly that they are getting less than they deserve are more prone to exhibit the biases and prejudices we identify.
We have not found direct evidence addressing this prediction, but it is consistent with some correlational patterns.

Relatedly, a simple implication of our framework is that a person's biases depend on his trying to explain what is happening to himself. Suppose that in building his theory about society, agent $i$ acted like a disinterested scientist, fitting a model in which he does not treat his own outcomes as observations. Then, he uses a correctly specified model, so he develops correct beliefs about everything. As a specific example, a white male professor studying discrimination in different cities may conclude that racial and gender discrimination are widespread. At the same time, he may be prone to believe that discrimination in his own workplace (academia) is non-existent, or even that getting hired or promoted is easier for women and minorities.

While the evidence is not fully conclusive, existing research suggests that males tend to be more overconfident, at least regarding "stereotypically male" attributes such as mathematical ability (Beyer, 1990, Jakobsson et al., 2013, Niederle and Vesterlund, 2007). Inasmuch as males are more overconfident, we predict they have greater belief biases, and some evidence from Sweden and the US suggests that male students indeed have stronger racial prejudices (Ekehammar and Sidanius, 1982, Qualls et al., 1992). Researchers have also studied the relationship between self-esteem and in-group bias, and in line with our prediction have generally found a positive relationship (see, e.g., Aberson et al., 2000, for a review). But we cannot be sure that self-esteem is a good measure of overconfidence, even if it is plausible that they are positively correlated.

4.4 Biases Derive from Group-Based Thinking

A person's pattern of biases, however, derives not only from his overconfidence, but also from the manner in which he thinks about society.
As a starting point, suppose that $\Theta_k = 0$ for all $k$. Then, the biases agent $i$ comes to develop are directly related to the extent to which he evaluates observations with group distinctions in mind. If he conceives of each individual separately instead of in terms of group membership ($K = 0$), then he develops unbiased beliefs about everyone. If he conceives of society in terms of groups ($K > 0$), then adding a group to his theory (for any $K$ and any $\Theta_1, \dots, \Theta_K$) increases his bias regarding the extent of discrimination:

Corollary 2.

Adding an irrelevant new group (group $K+1$ with $\Theta_{K+1} = 0$) to the agent's theory increases $\sum_k |\tilde\theta_{ik} - \Theta_k|$.

The more groups agent $i$ considers that are irrelevant in reality, the better he can explain his observations by developing biased views about these groups, and hence his total bias regarding discrimination increases. As a result, agent $i$'s biases regarding individuals closest to him (for whom $c_{jk} = c_{ik} \neq 0$ for all $k$) and individuals furthest from him (for whom $c_{jk} = -c_{ik} \neq 0$ for all $k$) also increase. At the same time, some of agent $i$'s biases can offset each other, so his bias about a person who shares some but not all group memberships with him can decrease in the number of groups he considers. As a simple example, suppose first that agent $i$ thinks of one group ($K = 1$), and $c_{i1} = 1$, $c_{j1} = -1$. Then, agent $i$'s bias about agent $j$ is $\tilde a_{ij} - A_j = -v_{\eta 1}(\tilde a_i - A_i)/(v_{qi} + v_{\eta 1})$. Now suppose that agent $i$ also considers another division of society ($K = 2$), with $c_{i2} = c_{j2} = 1$ and $v_{\eta 1} = v_{\eta 2}$. Now agent $i$'s bias is $\tilde a_{ij} - A_j = 0$. Intuitively, a white male evaluates a black male more negatively if he thinks of society along a black/white divide than if he thinks of society along black/white as well as male/female divides.

Corollary 2 is somewhat analogous to Schwartzstein's (2014) result that an agent who ignores an important explanatory variable when trying to understand his observations may overestimate the relevance of another variable.

In some circumstances, however, adding a new group to a person's theory does change his views in a specific direction. In particular, we consider a situation in which agent $i$ starts to contemplate a group of outsiders who are competitors of one of his groups. For instance, men may start asking themselves whether a specific group of women is receiving favorable discrimination at the expense of men.

Corollary 3.

Suppose $c_{i\kappa} = 1$, and an irrelevant group $K+1$ satisfying $c_{j,K+1} = -1$ for any $j$ with $c_{j\kappa} = 1$ is added to agent $i$'s theory. This improves agent $i$'s view of any member of group $\kappa$ and worsens his view of any member of group $K+1$.

Contemplating the new competitor group, agent $i$ concludes that there is discrimination in favor of it, which lowers his opinion of the new group and improves his opinion of anyone who he believes is hurt by the discrimination.

Corollary 3 provides one justification for the view, typically associated with political correctness, that some questions relating to disadvantaged groups should not be discussed or investigated. The concern behind this view is that, even though groups are not innately different, many racist, sexist, or other discriminatory questions, such as "are women as capable as men, or are they getting ahead due to favorable treatment?", can promote prejudiced beliefs. From the perspective of a correctly specified model, such a position is puzzling: if the groups a person investigates are approximately equal, then he will (eventually) conclude that this is the case, lowering any existing prejudices. But in our model, asking such questions is indeed prone to produce prejudiced answers despite equality between groups. Unfortunately, as we have discussed, a person often prefers to ask such questions, and it might be difficult to prevent him from doing so.

We identify a few senses in which attempts to address the agent's biases through the provision of information can backfire or be only partially effective. From a classical perspective, the most obvious way to debias a person whose biases derive from an inflated self-view is to provide more accurate information about himself. Indeed, if a person has a correctly specified model with an overly high prior about himself, better information can serve to rectify his self-view faster.
But a stubbornly overconfident person's inferences work completely differently: more precise evaluations of himself (i.e., a decrease in $v_{qi}$) merely lead agent $i$ to develop stronger biases about everything. Intuitively, being evaluated more clearly forces agent $i$ to acknowledge that his low performance is not due to bad luck, requiring a better explanation for why he is not recognized as he thinks he should be. He responds by increasing his belief about how much discrimination he suffers from, resulting in greater in-group biases as well. This implies that societies or parts of society where evaluations are more frequent or more objectively clear should (all else equal) be more prejudiced.

Attempts to debias can also lead to reallocating a person's biases. As a case in point, suppose that information about discrimination toward one group the agent belongs to or competes with becomes more precise. This could happen due to a social planner providing more information about discrimination, or due to the agent investigating the issue himself. The effect is not fully beneficial:

Corollary 4.

An increase in the precision of information about discrimination toward a group that agent $i$ either belongs to or competes with (a decrease in $v_{\eta k}$ for a $k$ with $c_{ik} \neq 0$) lowers agent $i$'s bias regarding discrimination toward group $k$ and his total bias regarding discrimination, but increases his bias regarding discrimination toward all other groups.

If agent $i$ receives more information about discrimination toward group $k$, then it becomes more difficult to maintain that discrimination toward group $k$ is going on, and it becomes more difficult to believe in discrimination more generally, so his biases regarding these matters decrease. Looking to explain his recognition in another way, however, agent $i$ engages in bias substitution: he comes to form more biased beliefs about discrimination toward other groups. For instance, if someone convinces a white male that there is no discrimination against males in hiring decisions, then he comes to believe in discrimination against whites to a greater extent.

Another manifestation of bias substitution occurs when the agent finds a new competitor group to evaluate his observations with, such as when citizens are confronted with the refugee crisis and start asking themselves whether immigrants are different or are being treated differently. For a new issue such as this one, it is plausible to assume that information about discrimination is poor, so that $\Sigma_{\eta k}$ is large.

Corollary 5.

Suppose that a new competitor group of individuals is added to society (individuals $j = I+1, \dots, I'$ with $c_{j,K+1} = 1$ for $j > I$, $c_{j,K+1} = -1$ for $j \le I$, and $c_{jk} = 0$ for any $j > I$ and $k \le K$). Then, agent $i$ develops a negative bias about any member of group $K+1$, and if $v_{\eta,K+1}$ is sufficiently large, then he develops a positive bias about everyone else.

Intuitively, the presence of immigrants provides a convenient account for why agent $i$ is not getting what he thinks he deserves, so he comes to believe in discrimination in favor of immigrants and develops a negative view of immigrants. But because he views his fellow citizens as also competing with immigrants, he forms positive opinions of them. This provides a mechanism for how focusing on a competitor outside group can help unify a population hitherto riddled with disagreements and dislikes, a common tactic of politicians. At the same time, agents who do not view themselves as competitors of the new group (perhaps because they are wealthier and do not compete directly for low-income housing and other state benefits) do not come to believe in favorable discrimination towards the new group, do not form negative opinions of it, and do not change their opinions of others.

The fact that beliefs depend on group membership implies that patterns of agreement and disagreement often also fall along group lines, at least when it comes to the beliefs of overconfident individuals. Suppose that agents $i$ and $i'$ have the same degree of overconfidence: $\tilde a_i - A_i = \tilde a_{i'} - A_{i'}$. Then, Equation (5) implies that agents $i$ and $i'$ agree about the direction of discrimination toward a group if and only if they have the same relationship with the group ($c_{ik} = c_{i'k}$). And Equation (6) implies that agents $i$ and $i'$ agree about the calibers of all individuals if they share all group memberships ($c_i = c_{i'}$); otherwise they may disagree about some or all individuals.
Because different biases can cancel each other, however, the degree of agreement is not necessarily decreasing in the number of shared groups. As an example, suppose that $K = 2$, $c_{i1} = c_{i2} = 1$, $c_{j1} = 1$, $c_{j2} = -1$, and $\Sigma_{\eta 1} = \Sigma_{\eta 2}$. Then, agents $i$ and $i'$ agree about the caliber of individual $j$ if $c_{i'1} = c_{i'2} = -1$, but they do not agree if $c_{i'1} = 1$, $c_{i'2} = -1$.

For instance, a white male and an African American female may evaluate a white female similarly, as they are both biased against her for different reasons. But a white male and a white female do not evaluate the white female similarly.

By the logic of our model, the pattern of disagreements above applies only to overconfident individuals. Individuals who have realistic beliefs about themselves also develop realistic beliefs about others and about discrimination. Hence, realistic individuals agree with each other irrespective of group membership, and disagree with overconfident individuals of all groups.
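The cancellation logic above can be checked with a small calculation. The sketch below assumes our reading of the Section 3 setup: recognitions $q_j = a_j + \sum_k c_{jk}\theta_k + \epsilon_j$ with $\mathrm{Var}(\epsilon_i) = v_{qi}$, independent discrimination signals $\eta_k = \theta_k + \epsilon_{\eta k}$ with variance $v_{\eta k}$, and an agent who pins his own caliber at $A_i + \Delta$ while fitting all other calibers freely, so that others' recognitions carry no information about discrimination. Minimizing his variance-weighted squared misfit then gives, by our own derivation (not a quotation of Proposition 1), $\tilde\theta_{ik} - \Theta_k = -v_{\eta k}c_{ik}\Delta/D$ and $\tilde a_{ij} - A_j = \Delta\sum_k v_{\eta k}c_{ik}c_{jk}/D$ with $D = v_{qi} + \sum_k v_{\eta k}c_{ik}^2$. These expressions reproduce the one-group formula quoted earlier; the function names are hypothetical.

```python
"""Long-run biases of an overconfident agent: a sketch under stated assumptions.

Assumed model (our reading of Section 3, not the paper's code):
  q_j   = a_j + sum_k c[j][k] * theta_k + eps_j,   Var(eps_i) = v_q
  eta_k = theta_k + noise,                          Var = v_eta[k]
The agent pins his own caliber at A_i + delta and fits all other parameters.
"""


def _denominator(c_i, v_q, v_eta):
    # D = v_q + sum_k v_eta_k * c_ik^2
    return v_q + sum(v * c * c for v, c in zip(v_eta, c_i))


def discrimination_bias(c_i, v_q, v_eta, delta):
    """Bias in perceived discrimination toward each group k."""
    d = _denominator(c_i, v_q, v_eta)
    return [-v * c * delta / d for v, c in zip(v_eta, c_i)]


def individual_bias(c_i, c_j, v_q, v_eta, delta):
    """Bias about individual j: shared memberships and shared competitors add up."""
    d = _denominator(c_i, v_q, v_eta)
    overlap = sum(v * ci * cj for v, ci, cj in zip(v_eta, c_i, c_j))
    return overlap * delta / d


# One group: a member (c = 1) evaluating a competitor (c = -1).
print(individual_bias([1], [-1], 1.0, [1.0], 1.0))             # -0.5
# Add a second division that both share: the two biases cancel.
print(individual_bias([1, 1], [-1, 1], 1.0, [1.0, 1.0], 1.0))  # 0.0

# Disagreement example with groups (white, male); j is a white female.
white_male, black_female, white_female = [1, 1], [-1, -1], [1, -1]
for judge in (white_male, black_female, white_female):
    # prints 0.0, 0.0 and 0.667 for the three judges
    print(judge, round(individual_bias(judge, white_female, 1.0, [1.0, 1.0], 1.0), 3))
```

Under these formulas the white male and the African American female both end up with zero net bias about the white female, so they agree, whereas a white female judge rates her more favorably. The same functions also illustrate results stated earlier: a smaller `v_q` (a more precise self-evaluation) enlarges every bias, and a smaller `v_eta[k]` for one group shrinks that group's bias and the total bias while enlarging the others.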

In our main model above, group relationships are exogenous. In the current section, we consider a model in which there are no exogenously given groups that the agent considers relevant to think about, but nevertheless he develops biases. This variant allows us to endogenize a person's in-group and out-group in some situations, and motivates why the agent might want to think of groups and group-level discrimination even in other situations.

Formally, we make two modifications to our previous model, with all other assumptions remaining unchanged. First, $q_j = a_j + \epsilon_j$ is an unbiased signal of caliber. Second, the $\epsilon_j$ are not necessarily independent, but have a positive definite covariance matrix $\Sigma^q$.

See, e.g., Bagues et al. (2014) in the context of academic evaluations, Bagues and Esteve-Volart (2010) in the context of judicial hiring decisions, and Card et al. (2019) in the context of refereeing at top journals. At the same time, other authors, including Zinovyeva and Bagues (2011) studying academic promotions, Gagliarducci and Paserman (2012) studying municipal governments, De Paola and Scoppa (2015) studying academic evaluations, and Kunze and Miller (2017) studying promotions at private firms, find that women are treated better by other women than by men.

A plausible economic example is team production. The $I$ individuals are working in two disjoint teams. Pay is determined by individual performance, which depends in part on individual ability and idiosyncratic noise. But pay also depends on shocks common to the team, such as how well-chosen the tasks are or how a manager evaluates team performance in allocating bonuses. This noise structure induces positive correlation between the outcomes of individuals on the same team, and may induce negative correlation between the outcomes of individuals on different teams.

Biases are now determined in the following way:

Proposition 2 (Correlated Errors and Biases).
Agent $i$'s long-run bias about agent $j$ is
$$\tilde a_{ij} - A_j = \frac{\Sigma^q_{ij}}{\Sigma^q_{ii}}\,(\tilde a_i - A_i), \qquad (7)$$
while his bias about the covariance matrix is given by
$$\tilde\Sigma^q_{jj'} - \Sigma^q_{jj'} = (\tilde a_{ij} - A_j)(\tilde a_{ij'} - A_{j'}) = \frac{\Sigma^q_{j'i}\,\Sigma^q_{ji}}{(\Sigma^q_{ii})^2}\,(\tilde a_i - A_i)^2. \qquad (8)$$

To start developing intuition for Proposition 2, suppose first that agent $i$ has a correct understanding of the covariance structure of signals. As before, the basic implication of agent $i$'s overconfidence is that he repeatedly observes levels of $q_i$ that seem to him surprisingly low. If he knows that $q_i$ and $q_j$ are positively correlated, then his conclusion that $q_i$ is systematically too low leads him to conclude that $q_j$ must be systematically too low as well. As a result, he overestimates individual $j$.

By the second part of Proposition 2, however, the agent misestimates the covariance matrix as well; specifically, he overestimates the covariance between $q_j$ and $q_{j'}$ if and only if he misestimates individuals $j$ and $j'$ in the same direction. For an intuition, suppose that he overestimates both individuals. Then, in a prototypical observation both $q_j$ and $q_{j'}$ seem to him to be unexpectedly low and thus positively correlated.

Finally, Proposition 2 implies that agent $i$'s misestimation of the covariance matrix does not affect his inferences about individuals. Intuitively, the amount by which agent $i$ overestimates $a_j$ relative to $a_i$ (i.e., $(\tilde a_{ij} - A_j)/(\tilde a_i - A_i)$) both determines the relative amount by which he overestimates the covariance of $q_j$ and $q_i$ (i.e., $(\tilde\Sigma^q_{ij} - \Sigma^q_{ij})/(\tilde\Sigma^q_{ii} - \Sigma^q_{ii})$), and is determined by his relative estimate of that covariance ($\tilde\Sigma^q_{ij}/\tilde\Sigma^q_{ii}$). This can only be consistent if he estimates the relative covariance $\tilde\Sigma^q_{ij}/\tilde\Sigma^q_{ii}$ correctly.

While agent $i$'s long-run conclusions depend on the correlation structure of the signals, a realistic person's long-run conclusions do not.
For an overconfident agent to make correct inferences, individuals must not only be evaluated in an unbiased way; they must also be evaluated independently of him.

This model allows us to endogenize a person's in-group as individuals whose outcomes are positively correlated with his (those "in the same boat" with him) and his out-group as individuals whose outcomes are negatively correlated with his. With this endogenous specification of the in-group and out-group, the model predicts the same type of in-group bias that we have identified in Section 4, as well as an interesting pattern of biases in agent $i$'s perception of the covariance structure. Specifically, agent $i$ overestimates the covariance between the outcomes of two in-group members as well as the covariance between the outcomes of two out-group members, but he underestimates the covariance between the outcomes of an in-group member and an out-group member. These biases are consistent with one aspect of perceived group homogeneity defined by Linville et al. (1989) and documented by Quattrone and Jones (1980): a person overestimates how much one group member's outcome predicts another's outcome. Perceived group homogeneity is an interesting contrast to correlation neglect, whereby people perceive or assume less correlation between relevant variables than there is in reality (DeMarzo et al., 2003, Eyster and Rabin, 2005, Enke and Zimmermann, forthcoming).

Furthermore, the current model suggests one possible explanation for why individuals might want to estimate theories of discrimination, as we have assumed exogenously in Sections 3 and 4. Comparing his conclusions to the very performances that generated his conclusions, agent $i$ might notice that his in-group is faring persistently worse, and his out-group is faring persistently better, than he thinks they should given their calibers. Even if he initially did not think so, he might begin to suspect that discrimination is going on.
As a result, he might be drawn to evaluate the data allowing for discrimination.

Formally, Part II of Theorem 1 implies that $(\tilde\Sigma^q_{ij} - \Sigma^q_{ij})/(\tilde\Sigma^q_{ii} - \Sigma^q_{ii}) = (\tilde a_{ij} - A_j)/(\tilde a_i - A_i)$; and Part I of Theorem 1 implies that $(\tilde a_{ij} - A_j)/(\tilde a_i - A_i) = \tilde\Sigma^q_{ij}/\tilde\Sigma^q_{ii}$. For both equations to hold simultaneously, it must be that $(\tilde\Sigma^q_{ij} - \Sigma^q_{ij})/(\tilde\Sigma^q_{ii} - \Sigma^q_{ii}) = \tilde\Sigma^q_{ij}/\tilde\Sigma^q_{ii}$. Dividing by the right-hand side and rewriting yields $(1 - \Sigma^q_{ij}/\tilde\Sigma^q_{ij})/(1 - \Sigma^q_{ii}/\tilde\Sigma^q_{ii}) = 1$, implying that $\tilde\Sigma^q_{ij}/\tilde\Sigma^q_{ii} = \Sigma^q_{ij}/\Sigma^q_{ii}$.
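To see Proposition 2 at work, the sketch below evaluates Equations (7) and (8) for a hypothetical covariance matrix in the spirit of the team-production example: individuals 0 (the agent) and 1 share a team, 2 and 3 form the other team, and outcomes are positively correlated within teams and negatively across them. The numbers are illustrative choices only, and the function name is hypothetical. The code checks the footnote's claim that the agent estimates the relative covariance $\tilde\Sigma^q_{ij}/\tilde\Sigma^q_{ii}$ correctly, as well as the perceived-homogeneity pattern described in the text.

```python
"""Proposition 2 evaluated for a hypothetical two-team covariance matrix."""

# Individuals 0 (the agent) and 1 form one team; 2 and 3 form the other.
SIGMA = [
    [1.0, 0.5, -0.2, -0.2],
    [0.5, 1.0, -0.2, -0.2],
    [-0.2, -0.2, 1.0, 0.5],
    [-0.2, -0.2, 0.5, 1.0],
]


def long_run_beliefs(sigma, i, delta):
    """Equations (7)-(8): caliber biases b_j and the perceived covariance matrix."""
    n = len(sigma)
    bias = [sigma[j][i] / sigma[i][i] * delta for j in range(n)]
    perceived = [[sigma[j][k] + bias[j] * bias[k] for k in range(n)] for j in range(n)]
    return bias, perceived


bias, perceived = long_run_beliefs(SIGMA, i=0, delta=1.0)
print(bias)  # [1.0, 0.5, -0.2, -0.2]: teammate overestimated, rivals underestimated

# The relative covariance with the agent is estimated correctly.
for j in range(4):
    assert abs(perceived[0][j] / perceived[0][0] - SIGMA[0][j] / SIGMA[0][0]) < 1e-12

# Perceived homogeneity: within-team covariances rise, cross-team ones fall.
print(perceived[2][3] > SIGMA[2][3], perceived[1][2] < SIGMA[1][2])  # True True
```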

A natural question is what happens when, combining our model in Section 3 with that here, agent $i$ allows for group-level discrimination, and individuals' recognitions are correlated. In fact, our proof of Proposition 1 allows for this possibility, and implies that the effects are additive: agent $i$'s opinion of individual $j$ is increasing both in the covariance between their recognitions and in the number of groups and competing groups they share.

In this section, we consider how an agent's inferences are modified if he also observes signals of the calibers of individuals. This could happen, for instance, if he has personal contact with, or a trustworthy source about, some members of society, so that he receives signals about them that are not tainted by discrimination.

We assume that agent $i$ makes the same observations as in the model of Section 3, and also observes signals of individuals' calibers. Because the general analysis appears intractable, however, we solve a special case of the model. We assume that there is only one group, and each individual is either a member or a competitor of the group ($c_j \in \{-1, 1\}$); we drop the subscript 1 for the single group. Furthermore, recognitions $q_j = a_j + c_j\theta + \epsilon_{qj}$, signals about caliber $s_j = a_j + \epsilon_{aj}$, and signals about discrimination $\eta = \theta + \epsilon_\eta$ are independently distributed with $\epsilon_{qj} \sim N(0, v_q)$, $\epsilon_{aj} \sim N(0, v_a)$, and $\epsilon_\eta \sim N(0, v_\eta)$. This means that individuals' recognitions have the same variance, and so do signals about individuals' calibers.

Proposition 3 (The Effect of Personal Contact). Agent $i$'s long-run bias about discrimination is
$$\tilde\theta_i - \Theta = -\frac{v_\eta(v_q + v_a)\,c_i}{(v_q + v_\eta)(v_q + v_a) + (I-1)\,v_q v_\eta}\cdot(\tilde a_i - A_i),$$
and his long-run bias about individual $j$ is
$$\tilde a_{ij} - A_j = \frac{v_\eta v_a\,c_i c_j}{(v_q + v_\eta)(v_q + v_a) + (I-1)\,v_q v_\eta}\cdot(\tilde a_i - A_i).$$
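Because both expressions in Proposition 3 share one denominator, $(v_q + v_\eta)(v_q + v_a) + (I-1)v_q v_\eta$, their comparative statics are easy to verify numerically. The sketch below transcribes the two formulas, with a hypothetical function name and illustrative parameter values. It confirms that a lower $v_a$ or a larger $I$ shrinks both biases, and that as $v_a$ grows without bound (caliber signals become useless) the bias about a competitor approaches the no-contact expression $-v_\eta(\tilde a_i - A_i)/(v_q + v_\eta)$ quoted for the one-group case.

```python
"""Proposition 3 (personal contact), transcribed for numerical checks."""


def contact_biases(c_i, c_j, v_q, v_a, v_eta, n_people, delta):
    """Return (bias about discrimination, bias about individual j)."""
    denom = (v_q + v_eta) * (v_q + v_a) + (n_people - 1) * v_q * v_eta
    theta_bias = -v_eta * (v_q + v_a) * c_i * delta / denom
    indiv_bias = v_eta * v_a * c_i * c_j * delta / denom
    return theta_bias, indiv_bias


# A member (c_i = 1) judging a competitor (c_j = -1) in a society of four.
base = contact_biases(1, -1, 1.0, 1.0, 1.0, 4, 1.0)
more_contact = contact_biases(1, -1, 1.0, 0.5, 1.0, 4, 1.0)  # lower v_a
more_people = contact_biases(1, -1, 1.0, 1.0, 1.0, 10, 1.0)  # higher I
no_contact = contact_biases(1, -1, 1.0, 1e9, 1.0, 4, 1.0)    # v_a -> infinity

for case in (more_contact, more_people):
    assert all(abs(new) < abs(old) for new, old in zip(case, base))
print(round(no_contact[1], 6))  # -0.5, i.e. -v_eta / (v_q + v_eta)
```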
The qualitative pattern of biases is similar to that before: the agent is prone to believe in discrimination against his group and in favor of the out-group, and he develops positive biases regarding his in-group and negative biases regarding his out-group. But Proposition 3 also implies that more accurate information about individuals' calibers (a lower $v_a$) and observing more people (a higher $I$) both lower all biases. Intuitively, the former makes it more difficult to maintain one's biases about individuals, and the latter provides better information about the role of discrimination in performances by allowing a person to compare those performances to his direct observations about caliber. For instance, getting to know many members of an out-group might make it clear that their achievements are not due to favorable discrimination.

In the comparative static above, a reduction in $v_a$ applies to observations of both in-group members and out-group members. Although the intuition applies generally, unfortunately we cannot solve a model in which the variances are different. To help confirm the effect of improved information about just out-group members, we consider a particular example.

Example 1. Suppose that $I = 4$, with individuals 1 and 2 being members and individuals 3 and 4 being competitors of the group. Agent 1 observes an out-group member's recognition with variance $v_{qo}$, and an out-group member's caliber with variance $v_{ao}$. The variances of all other errors are 1. Then,
$$\frac{\tilde a_{13} - A_3}{\tilde a_1 - A_1} = \frac{\tilde a_{14} - A_4}{\tilde a_1 - A_1} = -\frac{2v_{ao}}{5v_{qo} + 5v_{ao} + 4} \quad\text{and}\quad \frac{\tilde\theta_1 - \Theta}{\tilde a_1 - A_1} = -\frac{2(v_{qo} + v_{ao})}{5v_{qo} + 5v_{ao} + 4}. \qquad (9)$$
Confirming the previous logic, better information about the out-group's caliber (a reduction in $v_{ao}$) lowers all of the agent's biases. This makes the agent more realistic not only about his out-group and the extent of discrimination, but also about his in-group.
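Equation (9) can also be cross-checked without any closed form. The sketch below assumes that, with known error variances, the agent's long-run beliefs are the weighted least-squares fit of his observations' means (weights are inverse variances), with his own caliber pinned $\Delta$ too high; this is our shortcut for the Gaussian case, not code from the paper. Each observation in Example 1 (agent 1's own recognition, the recognitions and caliber signals of individuals 2 to 4, and the discrimination signal) is written as a linear function of the bias vector $(\tilde\theta_1 - \Theta, \tilde a_{12} - A_2, \tilde a_{13} - A_3, \tilde a_{14} - A_4)$. The helper names and the test values $v_{qo} = 2$, $v_{ao} = 0.5$ are hypothetical.

```python
"""Example 1 re-derived as a weighted least-squares fit (a sketch)."""


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_biases(observations, n_params):
    """Minimize sum((g . x + h)^2 / var) over the bias vector x.

    Each observation is (g, h, var): its perceived-minus-true mean is g . x + h.
    """
    a = [[0.0] * n_params for _ in range(n_params)]
    b = [0.0] * n_params
    for g, h, var in observations:
        for r in range(n_params):
            b[r] -= g[r] * h / var
            for c in range(n_params):
                a[r][c] += g[r] * g[c] / var
    return solve(a, b)


def example_1(v_qo, v_ao, delta=1.0):
    # Bias vector x = [theta, a2, a3, a4]; agent 1's own caliber is off by delta.
    obs = [
        ([1, 0, 0, 0], delta, 1.0),  # q1 = a1 + theta
        ([1, 1, 0, 0], 0.0, 1.0),    # q2 = a2 + theta (in-group)
        ([0, 1, 0, 0], 0.0, 1.0),    # s2 = a2
        ([-1, 0, 1, 0], 0.0, v_qo),  # q3 = a3 - theta (out-group)
        ([0, 0, 1, 0], 0.0, v_ao),   # s3 = a3
        ([-1, 0, 0, 1], 0.0, v_qo),  # q4 = a4 - theta
        ([0, 0, 0, 1], 0.0, v_ao),   # s4 = a4
        ([1, 0, 0, 0], 0.0, 1.0),    # eta = theta
    ]
    return fit_biases(obs, 4)


v_qo, v_ao = 2.0, 0.5
theta, a2, a3, a4 = example_1(v_qo, v_ao)
denom = 5 * v_qo + 5 * v_ao + 4
print(abs(theta + 2 * (v_qo + v_ao) / denom) < 1e-9, abs(a3 + 2 * v_ao / denom) < 1e-9)
```

Re-running `example_1(0.5, 0.5)` shows the other comparative static in this example: more precise out-group recognitions shrink the discrimination bias but enlarge the bias about the out-group's caliber.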
Intuitively, receiving more accurate information about the out-group's caliber makes it more difficult to believe that the group's average caliber is low, which in turn makes it more difficult to believe that discrimination in favor of the out-group is going on. As a result, it is also less viable to believe that the in-group has high caliber.

The prediction of our model that contact between different groups can reduce prejudices and biases is an instance of Allport's (1954) influential contact hypothesis, for which the evidence is overwhelming. Pettigrew and Tropp (2006) provide a meta-analysis of hundreds of studies, most of which find evidence consistent with the hypothesis. Many studies are correlational in nature, which makes them suggestive albeit not well-identified. But evidence reviewed by Paluck et al. (2018), in which researchers experimentally manipulate interactions between groups, shows that contact has a causal negative impact on prejudices.

As a simple illustrative example from a poll by the Pew Research Center (2006), 52 percent of Americans agreed with the statement that immigrants are a burden because they take jobs and housing. It is unlikely that the same percentage of immigrants agree, so this is probably another instance of in-group bias. But more important for the present purpose is the geographic variation in these beliefs. In areas with a high concentration of foreign-born individuals, only 47 percent of those with U.S.-born parents think that immigrants are a burden, whereas in areas with a low concentration of foreigners, 65 percent think so. Of course, an alternative interpretation is that immigrants settle in places that are friendlier toward them.

By contrast, more precise information about recognitions rather than caliber is detrimental. Part of the reason is the same as in our basic model: better information about himself makes it more imperative for the agent to explain why his recognition is low, increasing all his biases.
But Example 1 makes it clear that there is another effect acting through observations of the out-group. Observing more accurate information about the recognition of the out-group (i.e., a decrease in $v_{qo}$) decreases the agent's misinference about discrimination, but it increases his bias about the out-group's caliber. The intuition is the following. Given that agent 1 overestimates discrimination in favor of the out-group, the actual recognition of the out-group is worse than he expects. He attributes this difference to noise, but a decrease in the noise makes such an attribution less plausible. As a result, he concludes that discrimination must not be as strong, but also that the out-group must be of lower caliber, than he thought. For example, many majority Eastern Europeans believe that the Roma receive positive discrimination from police and get away with crimes too easily. Our model says that providing more information about how badly the Roma are treated will lead the majority to conclude not only that favorable treatment is not as pronounced, but also that the Roma are committing worse crimes, than they previously thought.

This prediction of our model provides a potential justification for the practice of mainstream news outlets not to report the race of a suspected criminal unless it is essential for the story. Under the assumption that all parties use a correctly specified model, it is difficult to understand this practice, especially when in reality groups' crime rates do not differ much. But in our model, giving more precise information about outsiders' outcomes (as would be the case if racial information was provided) creates or exacerbates incorrect, prejudiced views, so it can be seen as harmful. It is important to emphasize, however, that other misspecifications on the part of readers or journalists can also render racial information in crime reporting detrimental.

In our models above, each individual is characterized by a single attribute $a_i$.
In reality, people think of others in multidimensional ways. In this section, we consider a simple example of such

See for example Guideline 12.1 of the German Press Codex (available at ).

For instance, some journalists may hold prejudiced views that affect when and how they report a criminal's race. If readers do not account for journalists' prejudices when interpreting reports, they can develop biased views. And within our framework, reporting racial information can also encourage readers to think in terms of races and the moral standing of different races. If so, our model says that this can increase racial biases (Section 4.4).

Example 2. There is one group and two individuals, with individual 1 being a representative member and individual 2 being a representative competitor of the group. Agent 1 makes observations about the social statuses of his in-group and out-group, which equal $q_1 = a_1 + m_1 + \theta + \epsilon_{q1}$ and $q_2 = a_2 + m_2 - \theta + \epsilon_{q2}$, where $a_j$ is group $j$'s talent, $m_j$ is group $j$'s morality, and $\theta$ is discrimination in favor of the in-group. Agent 1 also observes his out-group's business success, which equals $b_2 = 2a_2 + m_2 + \epsilon_b$. Hence, business success is unaffected by discrimination, and depends relatively more on talent than does social status. Finally, the agent observes a signal of discrimination $\eta = \theta + \epsilon_\eta$ as before. We assume that the agent is overconfident regarding his total deservingness of social status, overestimating $a_1 + m_1$ by $\Delta$. The errors are independent, with the variance of $\epsilon_{qj}$ being $v_q$ and the variance of $\epsilon_\eta$ being $v_\eta$. Then, agent 1 develops the following biases in the long run:
$$\tilde a_2 - A_2 = \frac{1}{1 + v_q/v_\eta}\cdot\Delta; \qquad \tilde m_2 - M_2 = -\frac{2}{1 + v_q/v_\eta}\cdot\Delta; \qquad \tilde\theta - \Theta = -\frac{1}{1 + v_q/v_\eta}\cdot\Delta.$$

As in our previous models, the agent comes to believe that discrimination against his in-group and in favor of the out-group is going on. But he does not develop exclusively negative views of the out-group as a result: he comes to think that the out-group is more talented than it really is. Intuitively, given that he believes discrimination in favor of the out-group is going on, he underestimates the out-group's total deservingness of status, $a_2 + m_2$. At the same time, he must reconcile his beliefs with realistic views of the out-group's business success, which are more sensitive to $a_2$. The best way to do so is to slightly overestimate the out-group's talent and grossly underestimate its morality.

Of course, the above overestimation depends on the specific pattern of observations the agent makes, and is therefore a possibility result rather than a general prediction of our model. Nevertheless, it is consistent with the observation that stereotypes about out-groups are sometimes positive. For instance, Jews used to be stereotyped as smart and hard-working, women are often seen as being kind and empathic, and some minority men are considered good athletes. At first sight, this may seem to contradict the idea that individuals tend to hold negative or at best realistic views of their out-groups. Yet exactly as in our example, even when some stereotypes are positive, they often go along with, or even form part of, broader and arguably more important negative stereotypes (see Jackson 2011, pp. 18-20, for a discussion, and Glick and Fiske 1996 and Fiske et al. 2002 for closely related ideas). That Jews are smart and hard-working goes along with the idea that they are cool and competitive. That women are kind goes along with the view that they are not capable leaders. And that minorities are good athletes goes along with the notion that they are not good at academics.
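The long-run biases in the talent-versus-morality example can be reproduced with a weighted least-squares fit, again assuming known error variances and the agent's overstated $a_1 + m_1$ pinned in place. The bias vector is $(\tilde\theta - \Theta, \tilde a_2 - A_2, \tilde m_2 - M_2)$, and the 3-by-3 normal equations are solved by Cramer's rule. The text does not specify the variance of $\epsilon_b$, so `v_b` is a hypothetical parameter; the fit does not depend on it, because the out-group's status and business success can both be matched exactly. With $v_q = v_\eta$ the factor $1/(1 + v_q/v_\eta)$ equals $1/2$, so the output should be $(-\Delta/2, \Delta/2, -\Delta)$. Function names are hypothetical.

```python
"""Talent vs. morality example as a weighted least-squares fit: a sketch."""


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Cramer's rule for a nonsingular 3x3 system a x = b."""
    d = det3(a)
    out = []
    for col in range(3):
        swapped = [row[:col] + [b[r]] + row[col + 1:] for r, row in enumerate(a)]
        out.append(det3(swapped) / d)
    return out


def example_2(v_q, v_eta, delta, v_b=1.0):
    # Bias vector x = [theta, a2, m2]; the agent overstates a1 + m1 by delta.
    obs = [
        ([1, 0, 0], delta, v_q),  # q1 = a1 + m1 + theta
        ([-1, 1, 1], 0.0, v_q),   # q2 = a2 + m2 - theta
        ([0, 2, 1], 0.0, v_b),    # b2 = 2 a2 + m2 (no discrimination)
        ([1, 0, 0], 0.0, v_eta),  # eta = theta
    ]
    a = [[sum(g[r] * g[c] / v for g, _, v in obs) for c in range(3)] for r in range(3)]
    b = [-sum(g[r] * h / v for g, h, v in obs) for r in range(3)]
    return solve3(a, b)


theta, talent, morality = example_2(v_q=1.0, v_eta=1.0, delta=1.0)
print([round(x, 6) for x in (theta, talent, morality)])  # [-0.5, 0.5, -1.0]
```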
In this section, we relate our theory to research not discussed elsewhere in the paper. Because the agent draws conclusions from observations while holding an incorrect view about himself, conceptually our paper belongs to the growing literature on learning with misspecified models. Researchers have studied inferences by individuals who ignore some explanatory variables (Hanna et al., 2014, Schwartzstein, 2014), misunderstand causal relationships (Spiegler, 2016), misinterpret social observations (Bohren, 2016, Bohren and Hauser, 2019), are overconfident (Hestermann and Le Yaouanq, 2016, Heidhues et al., 2018), or make mistakes in applying Bayes’ Rule (e.g., Rabin and Schrag, 1999, Rabin, 2002). The specific economic questions we ask and the specific theoretical methods we use are different from those in the literature.

The predominant economic approach to stereotypes (i.e., generalizations about groups) is that of statistical discrimination, in which individuals use available information correctly to make inferences about individuals (Phelps, 1972, Arrow, 1973). In our setting, the agent also uses his observations to form beliefs, but he does so incorrectly. In this sense, our model can be thought of as one of misspecified statistical discrimination.

From a psychological perspective, our model is most closely related to social identity theory (Tajfel and Turner, 1979, Tajfel, 1982). Social identity theory posits that individuals see themselves as members of relevant social groups (their in-groups) and identify with those groups.
As a result, their self-esteem is bound up with their in-groups, so thinking positively about their in-groups and negatively about their out-groups leads them to think and feel positively about themselves. Our theory also implies that a person’s prejudices are intimately tied to his views about himself, but the connection follows a different, in a sense reverse, logic: a person thinks positively about himself, and this leads to biases about his in-groups and out-groups. By virtue of being the beliefs most consistent with an inflated self-view, the prejudices in our model can also be interpreted as helping to maintain high self-esteem.

Relatedly, group conflict theory posits that competition between two groups for the same limited resource naturally leads to hostility between the groups, as well as discrimination and prejudice (Jackson, 2011). By contrast, we derive prejudiced beliefs from intrapersonal considerations.

Another strand of the social psychology literature conceptualizes stereotypes as heuristic simplifications of real attributes of groups. Bordalo et al. (2016) formalize this idea using a version of Kahneman and Tversky’s (1972) representativeness heuristic. They assume that a person considers a trait more typical in a group if it is relatively more common in the group than in the relevant comparison group. This approach does not comfortably explain why stereotypes are often derogatory prejudices and why many views are self-serving, and unless different groups have different comparison groups, it also does not explain why different groups hold different views. On the other hand, our framework does not explain neutral stereotypes, such as the view that Swedes are blonde, which the framework of Bordalo et al. does.

Glaeser (2005) presents a political-economy model of hate, which he defines as beliefs about the harmfulness of others. Politicians can send fake messages that the out-group is dangerous, and these messages are costly for the electorate to investigate.
Because voters who believe that the out-group is dangerous prefer policies that lower the out-group’s resources, politicians benefit from hate-inducing messages that complement their policies. For instance, a pro-redistribution politician might want to induce hate against rich minorities. Unlike our framework, this model explains how the political environment can affect people’s beliefs about minorities, and which messages are communicated by which politicians. At the same time, our theory helps understand why negative attitudes often persist without politicians stoking them, or even despite politicians’ attempts to debias. In addition, our theory can be viewed as one explanation for Glaeser’s (2005) assumption that only hateful messages can be sent: negative messages about other groups resonate more with citizens because these fit better with pre-existing beliefs.

See also Akerlof and Kranton (2000) for a theory about the effects of identity on behavior, which is less related to our theory, which focuses on beliefs.

Conclusion

Our theory posits exogenously given groups that are known to individuals. What happens when, as in reality, groups are endogenous and not fully known is an interesting question for future research. As a simple illustration, consider a young academic who is unsure about what determines publication success but knows that he is not a member of a privileged group whose members accept each other’s papers at the expense of others. As he observes that his papers do not get the credit he overconfidently believes they deserve, he concludes that there must be such a group. As a result, he tries to find the group and become a member of it (rather than improving his papers). Since he never finds the group, he develops the conspiracy theory that it must be a secret society.

While we study only beliefs in this paper, ultimately we are interested in how actions interact with biased beliefs. Among the many possible questions, consider the troubling finding in the literature on stereotypes we have mentioned: that biased beliefs can become self-fulfilling through a variety of mechanisms. Our model provides a platform for exploring such mechanisms. For instance, researchers have found that a salient negative stereotype (a “stereotype threat”) can directly affect a stereotyped person’s performance in the relevant domain (Steele and Aronson, 1995). This changes the observations people make about the person for the worse, exacerbating the stereotype and potentially creating a vicious circle that ends in a real performance gap far greater than the bias itself.

References

Aberson, Christopher L., Michael Healy, and Victoria Romero, “Ingroup Bias and Self-Esteem: A Meta-Analysis,” Personality and Social Psychology Review, 2000, (2), 157–173.

Akerlof, George A. and Rachel E. Kranton, “Economics and Identity,” Quarterly Journal of Economics, 2000, (3), 715–753.

Alesina, Alberto and Bryony Reich, “Nation Building,” 2015. NBER Working Paper.

Alesina, Alberto, Edward L. Glaeser, and Bruce Sacerdote, “Why Doesn’t the United States Have a European-Style Welfare State?,” Brookings Papers on Economic Activity, 2001, 187–277.

Allport, Gordon W., The Nature of Prejudice, Addison-Wesley Publishing Company, Inc., 1954.

Anderson, Monica, “Vast Majority of Blacks View the Criminal Justice System as Unfair,” Technical Report, Pew Research Center, 2014.

Arnold, David, Will Dobbie, and Crystal S. Yang, “Racial Bias in Bail Decisions,” Quarterly Journal of Economics, 2018, (4), 1885–1932.

Arrow, Kenneth J., “The Theory of Discrimination,” in O. Ashenfelter and A. Rees, eds., Discrimination in Labor Markets, Princeton University Press, 1973.

Bagues, Manuel F. and Berta Esteve-Volart, “Can Gender Parity Break the Glass Ceiling? Evidence from a Repeated Randomized Experiment,” Review of Economic Studies, 2010, (4), 1301–1328.

Bagues, Manuel, Mauro Sylos-Labini, and Natalia Zinovyeva, “Do Gender Quotas Pass the Test? Evidence from Academic Evaluations in Italy,” 2014. Working Paper.

Berk, Robert H., “Limiting Behavior of Posterior Distributions when the Model Is Incorrect,” Annals of Mathematical Statistics, 1966, (1), 51–58.

Bertrand, Marianne and Esther Duflo, “Field Experiments on Discrimination,” in Esther Duflo and Abhijit Banerjee, eds., Handbook of Field Experiments, Vol. 1, Elsevier, 2017, chapter 8, pp. 309–394.

Beyer, Sylvia, “Gender Differences in the Accuracy of Self-Evaluations of Performance,” Journal of Personality and Social Psychology, 1990, (5), 960–970.

Bohren, Aislinn, “Informational Herding with Model Misspecification,” Journal of Economic Theory, 2016, 222–247.

Bohren, Aislinn, Alex Imas, and Michael Rosenberg, “The Dynamics of Discrimination: Theory and Evidence,” 2018. Working Paper.

Bohren, Aislinn and Daniel Hauser, “Misinterpreting Social Outcomes and Information Campaigns,” 2019. Working Paper.

Bordalo, Pedro, Katherine Coffman, Nicola Gennaioli, and Andrei Shleifer, “Stereotypes,” Quarterly Journal of Economics, 2016, (4), 1753–1794.

Card, David, Stefano DellaVigna, Patricia Funk, and Nagore Iriberri, “Are Referees and Editors in Economics Gender Neutral?,” 2019. Working Paper.

Carlana, Michela, “Implicit Stereotypes: Evidence from Teachers’ Gender Bias,” The Quarterly Journal of Economics, 2019, (3), 1163–1224.

Coate, Stephen and Glenn Loury, “Antidiscrimination Enforcement and the Problem of Patronization,” American Economic Review, 1993, (2), 92–98.

DeMarzo, Peter M., Dimitri Vayanos, and Jeffrey Zwiebel, “Persuasion Bias, Social Influence, And Unidimensional Opinions,” Quarterly Journal of Economics, 2003, (3), 909–968.

Ekehammar, Bo and Jim Sidanius, “Sex Differences in Sociopolitical Attitudes: A Replication and Extension,” British Journal of Social Psychology, 1982, (3), 249–257.

Enke, Benjamin and Florian Zimmermann, “Correlation Neglect in Belief Formation,” The Review of Economic Studies, forthcoming.

Eyster, Erik and Matthew Rabin, “Cursed Equilibrium,” Econometrica, 2005, (5), 1623–1672.

Fiske, Susan T., Amy J. C. Cuddy, Peter Glick, and Jun Xu, “A Model of (Often Mixed) Stereotype Content: Competence and Warmth Respectively Follow from Perceived Status and Competition,” Journal of Personality and Social Psychology, 2002, (6), 878–902.

Fiske, Susan T. and Michael S. North, “Measures of Stereotyping and Prejudice: Barometers of Bias,” in Gregory J. Boyle, Donald H. Saklofske, and Gerald Matthews, eds., Measures of Personality and Social Psychological Constructs, Academic Press, 2015, chapter 24, pp. 684–718.

Funk, Cary and Kim Parker, “Women and Men in STEM Often at Odds Over Workplace Equity,” Technical Report, Pew Research Center, January 2018.

Gagliarducci, Stefano and M. Daniele Paserman, “Gender Interactions within Hierarchies: Evidence from the Political Arena,” Review of Economic Studies, 2012, (3), 1021–1052.

Gentzkow, Matthew and Jesse Shapiro, “Media Bias and Reputation,” Journal of Political Economy, 2006, (2), 280–316.

Gilens, Martin, Why Americans Hate Welfare: Race, Media and the Politics of Antipoverty Policy, The University of Chicago Press, 1999.

Glaeser, Edward L., “The Political Economy of Hatred,” Quarterly Journal of Economics, 2005, (1), 45–86.

Glick, Peter and Susan T. Fiske, “The Ambivalent Sexism Inventory: Differentiating Hostile and Benevolent Sexism,” Journal of Personality and Social Psychology, 1996, (3), 491–512.

Glover, Dylan, Amanda Pallais, and William Pariente, “Discrimination as a Self-Fulfilling Prophecy: Evidence from French Grocery Stores,” Quarterly Journal of Economics, 2017, (3), 1219–1260.

Goette, Lorenz and Marta Kozakiewicz, “Experimental Evidence on Misguided Learning,” 2018. Working Paper.

Hanna, Rema, Sendhil Mullainathan, and Joshua Schwartzstein, “Learning Through Noticing: Theory and Evidence from a Field Experiment,” Quarterly Journal of Economics, 2014, (3), 1311–1353.

Heidhues, Paul, Botond Kőszegi, and Philipp Strack, “Unrealistic Expectations and Misguided Learning,” Econometrica, 2018, (4), 1159–1214.

Hestermann, Nina and Yves Le Yaouanq, “It’s Not My Fault! Ego, Excuses and Perseverance,” 2016. Working Paper.

Hewstone, Miles, Mark Rubin, and Hazel Willis, “Intergroup Bias,” Annual Review of Psychology, 2002, (1), 575–604.

Hraba, Joseph, Richard Brinkman, and Phyllis Gray-Ray, “A Comparison of Black and White Prejudice,” Sociological Spectrum, 1996, (2), 129–157.

Jackson, Lynne M., The Psychology of Prejudice: From Attitudes to Social Action, Washington, DC, US: American Psychological Association, 2011.

Jakobsson, Niklas, Minna Levin, and Andreas Kotsadam, “Gender and Overconfidence: Effects of Context, Gendered Stereotypes, and Peer Group,” Advances in Applied Sociology, 2013, (2), 137–141.

Kahneman, Daniel and Amos Tversky, “Subjective Probability: A Judgment of Representativeness,” Cognitive Psychology, 1972, (3), 430–454.

Krysan, Maria and Sarah P. Moberg, “Trends in Racial Attitudes,” Technical Report, University of Illinois Institute of Government and Public Affairs, 2016.

Kunze, Astrid and Amalia R. Miller, “Women Helping Women? Evidence from Private Sector Data on Workplace Hierarchies,” Review of Economics and Statistics, 2017, (5), 769–775.

Lavy, Victor and Edith Sand, “On the Origins of Gender Gaps in Human Capital: Short- and Long-Term Consequences of Teachers’ Biases,” Journal of Public Economics, 2018, (C), 263–279.

Linville, Patricia W., Gregory Fischer, and Peter Salovey, “Perceived Distributions of the Characteristics of In-Group and Out-Group Members: Empirical Evidence and a Computer Simulation,” Journal of Personality and Social Psychology, 1989, (2), 165–188.

McConahay, John B., Betty B. Hardee, and Valerie Batts, “Has Racism Declined in America? It Depends on Who is Asking and What is Asked,” Journal of Conflict Resolution, 1981, (4), 563–579.

Mengel, Friederike, Jan Sauermann, and Ulf Zölitz, “Gender Bias in Teaching Evaluations,” Journal of the European Economic Association, 2018, (2), 535–566.

Miguel, Edward, “Tribe or Nation? Nation Building and Public Goods in Kenya versus Tanzania,” World Politics, 2004, (3), 327–362.

Mullen, Brian, Rupert Brown, and Colleen Smith, “Ingroup Bias as a Function of Salience, Relevance, and Status: An Integration,” European Journal of Social Psychology, 1992, (2), 103–122.

National Public Radio, Robert Wood Johnson Foundation, and Harvard T.H. Chan School of Public Health, “Discrimination in America: Experiences and Views of White Americans,” Technical Report, November 2017.

Niederle, Muriel and Lise Vesterlund, “Do Women Shy Away From Competition? Do Men Compete Too Much?,” Quarterly Journal of Economics, 2007, (3), 1067–1101.

Paluck, Elizabeth Levy, Seth A. Green, and Donald P. Green, “The Contact Hypothesis Re-Evaluated,” Behavioural Public Policy, 2018, pp. 1–30.

Paola, Maria De and Vincenzo Scoppa, “Gender Discrimination and Evaluators’ Gender: Evidence from Italian Academia,” Economica, 2015, (325), 162–188.

Pettigrew, Thomas F. and Linda R. Tropp, “A Meta-Analytic Test of Intergroup Contact Theory,” Journal of Personality and Social Psychology, 2006, (5), 751–783.

Pew Research Center, “America’s Immigration Quandary,” Technical Report, 2006.

Pew Research Center, “Muslim Americans: No Signs of Growth in Alienation or Support for Extremism,” Technical Report, 2011.

Pew Research Center, “The Partisan Divide on Political Values Grows Even Wider,” Technical Report, October 2017.

Pew Research Center, “Partisans are Divided over The Fairness of the U.S. Economy – And Why People are Rich or Poor,” Technical Report, 2018.

Pew Research Center, “Trump’s International Ratings Remain Low, Especially Among Key Allies,” Technical Report, 2018.

Phelps, Edmund S., “The Statistical Theory of Racism and Sexism,” The American Economic Review, 1972, (4), 659–661.

Pratto, Felicia, Jim Sidanius, and Shana Levin, “Social Dominance Theory and the Dynamics of Intergroup Relations: Taking Stock and Looking Forward,” European Review of Social Psychology, 2006, (1), 271–320.

Prendergast, Canice, “A Theory of “Yes Men”,” American Economic Review, 1993, (4), 757–70.

Qualls, Christopher, Mary Cox, and Terra Schehr, “Racial Attitudes on Campus: Are there Gender Differences?,” Journal of College Student Development, 1992, (6), 524–530.

Quattrone, George A. and Edward E. Jones, “The Perception of Variability within In-Groups and Out-Groups: Implications for the Law of Small Numbers,” Journal of Personality and Social Psychology, 1980, (1), 141–152.

Rabin, Matthew, “Inference by Believers in the Law of Small Numbers,” Quarterly Journal of Economics, 2002, (3), 775–816.

Rabin, Matthew and Joel Schrag, “First Impressions Matter: A Model of Confirmatory Bias,” Quarterly Journal of Economics, 1999, (1), 37–82.

Schwartzstein, Joshua, “Selective Attention and Learning,” Journal of the European Economic Association, 2014, (6), 1423–1452.

Shayo, Moses and Asaf Zussman, “Judicial Ingroup Bias in the Shadow of Terrorism,” Quarterly Journal of Economics, 2011, (3), 1447–1484.

Sidanius, Jim and Felicia Pratto, Social Dominance: An Intergroup Theory of Social Hierarchy and Oppression, Cambridge University Press, 1999.

Spiegler, Ran, “Bayesian Networks and Boundedly Rational Expectations,” Quarterly Journal of Economics, 2016, (3), 1243–1290.

Steele, Claude M. and Joshua Aronson, “Stereotype Threat and the Intellectual Test Performance of African Americans,” Journal of Personality and Social Psychology, 1995, (5), 797–811.

Sumner, William G., Folkways, New York: Ginn, 1906.

Tajfel, Henri, “Social Psychology of Intergroup Relations,” Annual Review of Psychology, 1982, (1), 1–39.

Tajfel, Henri and John Turner, “An Integrative Theory of Intergroup Conflict,” in William G. Austin and Stephen Worchel, eds., The Social Psychology of Intergroup Relations, Brooks/Cole Pub. Co, 1979, chapter 3, pp. 33–47.

van Oorschot, Wim, “Making the Difference in Social Europe: Deservingness Perceptions Among Citizens of European Welfare States,” Journal of European Social Policy, 2006, (1), 23–42.

Zinovyeva, Natalia and Manuel Bagues, “Does Gender Matter for Academic Promotion? Evidence from a Randomized Natural Experiment,” 2011. Working Paper.

A Proofs

For brevity, throughout the Appendix we denote the bias of the agent’s long-run beliefs about fundamental $j$ by $\Delta_j = \tilde f_j - f_j$, and let $\Delta = (\Delta_1, \ldots, \Delta_L)^T$.

Proof of Theorem 1.

As shown in (Berk, 1966, main theorem, p. 54), the support of the agent’s beliefs concentrates on the set of points that minimize the Kullback-Leibler divergence to the true model parameters $(f, \Sigma)$ over the support of $P$:

$$\arg\min_{(\hat f, \hat\Sigma) \in \operatorname{supp} P} D\big(f, \Sigma \,\|\, \hat f, \hat\Sigma\big), \tag{10}$$

where the Kullback-Leibler divergence is given by

$$D\big(f, \Sigma \,\|\, \hat f, \hat\Sigma\big) = \mathbb{E}\left[\log \frac{\ell(r \mid f, \Sigma)}{\ell(r \mid \hat f, \hat\Sigma)}\right].$$

We will argue that the minimization problem (10) admits a unique solution when the prior $P$ satisfies (Case I), (Case II), or (Case III), and thus beliefs concentrate on a single point. As both the true model and the subjective model are Normal, the Kullback-Leibler divergence simplifies to (see, for example, https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)

$$D\big(f, \Sigma \,\|\, \hat f, \hat\Sigma\big) = \frac{1}{2}\left(\operatorname{tr}(\hat\Sigma^{-1}\Sigma) + (M(\hat f - f))^T \hat\Sigma^{-1} M(\hat f - f) - D + \log\frac{\det\hat\Sigma}{\det\Sigma}\right). \tag{11}$$

Throughout, we denote by $\tilde f, \tilde\Sigma$ the agent’s subjective long-run beliefs about the mean of the fundamentals and the covariance matrix. Define the matrix $B = M^T\tilde\Sigma^{-1}M \in \mathbb{R}^{L\times L}$ and denote its elements by $(B_{jk})_{j,k\in\{1,\ldots,L\}}$. For future reference, note that since $\tilde\Sigma$ is symmetric, so is $M^T\tilde\Sigma^{-1}M$, and thus $B_{jk} = B_{kj}$. Furthermore, as $\tilde\Sigma$ is positive definite, so are $\tilde\Sigma^{-1}$ and (given that $M$ has full column rank) $B = M^T\tilde\Sigma^{-1}M$.

We first analyze Case (I): By condition (Case I), the minimum in (10) is taken over means of the fundamentals $\hat f$, or equivalently biases $\Delta = \hat f - f$, taking the subjective covariance matrix $\hat\Sigma = \tilde\Sigma$ as given. By Berk’s Theorem, the agent’s beliefs about the fundamentals concentrate on the set that minimizes the Kullback-Leibler divergence (11). As we can ignore all terms that do not depend on $\hat f$, the support of the subjective long-run belief about the mean of the fundamentals is contained in

$$\arg\min_{\hat f:\, \hat f_i = \tilde f_i} (M(\hat f - f))^T\tilde\Sigma^{-1}M(\hat f - f) = f + \arg\min_{\Delta:\, \Delta_i = \tilde f_i - f_i} \Delta^T\big(M^T\tilde\Sigma^{-1}M\big)\Delta = f + \arg\min_{\Delta:\, \Delta_i = \tilde f_i - f_i}\sum_{k=1}^L\sum_{j=1}^L B_{kj}\Delta_k\Delta_j. \tag{12}$$

Here the sum denotes the addition of $f$ to every element of the set of minimizers. Taking the first-order condition with respect to the bias $\Delta_j$ for $j \neq i$ and using that $B_{jk} = B_{kj}$ yields

$$0 = 2\sum_{k=1}^L B_{kj}\Delta_k.$$

Dividing by 2 and plugging in $\Delta_k = \frac{(B^{-1})_{ki}}{(B^{-1})_{ii}}\Delta_i$ on the right-hand side yields

$$\sum_{k=1}^L B_{kj}\Delta_k = \sum_{k=1}^L B_{kj}\frac{(B^{-1})_{ki}}{(B^{-1})_{ii}}\Delta_i = \frac{\Delta_i}{(B^{-1})_{ii}}\sum_{k=1}^L B_{kj}(B^{-1})_{ki} = \frac{\Delta_i}{(B^{-1})_{ii}}\sum_{k=1}^L B_{jk}(B^{-1})_{ki} = \frac{\Delta_i}{(B^{-1})_{ii}}(BB^{-1})_{ji} = 0,$$

as $BB^{-1}$ is the identity and $i \neq j$. Hence, $\Delta_k = \frac{(B^{-1})_{ki}}{(B^{-1})_{ii}}\Delta_i$ satisfies the first-order condition.

Let $e_k$ be the $k$-th unit vector, for $k \in \{1,\ldots,L\}$. We next verify that the first-order condition is sufficient for a global minimum. To do so, we rewrite the objective (12) in terms of $\Delta_{-i} = \sum_{j\neq i}e_j\Delta_j$:

$$\Delta^TB\Delta = \Big(e_i\Delta_i + \sum_{j\neq i}e_j\Delta_j\Big)^T B\Big(e_i\Delta_i + \sum_{j\neq i}e_j\Delta_j\Big) = (e_i\Delta_i + \Delta_{-i})^TB(e_i\Delta_i + \Delta_{-i}) = (e_i\Delta_i)^TB(e_i\Delta_i) + \Delta_{-i}^TB\Delta_{-i} + 2(e_i\Delta_i)^TB\Delta_{-i}. \tag{13}$$

The Hessian of (13) with respect to $\Delta_{-i}$ equals $2B$ with its $i$-th row and column removed, which is positive definite because $B$ is. As any quadratic form with a positive definite Hessian has a unique global minimum, which satisfies the first-order condition, it follows that

$$\Delta_k = \frac{(B^{-1})_{ki}}{(B^{-1})_{ii}}\Delta_i = \frac{[(M^T\tilde\Sigma^{-1}M)^{-1}]_{ki}}{[(M^T\tilde\Sigma^{-1}M)^{-1}]_{ii}}\Delta_i$$

is indeed the unique global minimizer for all $k \neq i$. This completes (I).

We next analyze Case (II): In this case, the agent takes the subjective mean of the fundamentals $\tilde f$, and thus the bias $\Delta$, as given and estimates the covariance matrix $\tilde\Sigma$.
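Again, by Berk’s Theorem, the agent’s beliefs about the covariance matrix concentrate on the set that minimizes the Kullback-Leibler divergence (11), which is equivalent to the set

$$\arg\min_{\hat\Sigma}\ \operatorname{tr}(\hat\Sigma^{-1}\Sigma) + (M\Delta)^T\hat\Sigma^{-1}(M\Delta) + \log\frac{\det\hat\Sigma}{\det\Sigma}. \tag{14}$$

Denote by $\cdot\otimes\cdot : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}^{D\times D}$ the Kronecker product (for a column vector $y$, $y \otimes y^T = yy^T$ is the outer product). In matrix notation, we want to show that the unique minimum of (14) is attained at

$$\hat\Sigma = \Sigma + (M\Delta)\otimes(M\Delta)^T.$$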

To simplify notation, let $y = M\Delta$. We first manipulate the objective function:

$$\begin{aligned}
&\operatorname{tr}(\hat\Sigma^{-1}\Sigma) + y^T\hat\Sigma^{-1}y + \log\frac{\det\hat\Sigma}{\det\Sigma}\\
&= \operatorname{tr}(\hat\Sigma^{-1}\Sigma) + \operatorname{tr}(y^T\hat\Sigma^{-1}y) + \log(\det\hat\Sigma) - \log(\det\Sigma)\\
&= \operatorname{tr}(\hat\Sigma^{-1}\Sigma) + \operatorname{tr}(\hat\Sigma^{-1}[y\otimes y^T]) - \log(\det\hat\Sigma^{-1}) - \log(\det\Sigma)\\
&= \operatorname{tr}\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) - \log(\det\hat\Sigma^{-1}) - \log(\det\Sigma)\\
&= \operatorname{tr}\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) - \log\det\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) + \log\det\big(\Sigma^{-1}(\Sigma + [y\otimes y^T])\big)\\
&= \operatorname{tr}\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) - \log\det\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) + \log\det\big(\mathrm{Id} + \Sigma^{-1}[y\otimes y^T]\big).
\end{aligned}\tag{15}$$

Here we used in the first equality that a real number equals its trace and that the log of a ratio equals the difference of the logs. The second equality uses that the trace of $A^TB$ equals the trace of $BA^T$ and that $\log\det\hat\Sigma = -\log\det\hat\Sigma^{-1}$. For the third equality we use that the trace is additive. In the last two equalities we use that the sum of logarithms equals the logarithm of the product, that the product of determinants equals the determinant of the product, and that $\Sigma^{-1}\Sigma = \mathrm{Id}$. Now notice that since $\Sigma$ and $y$ do not depend on $\hat\Sigma$, the set of minimizers equals

$$\arg\min_{\hat\Sigma}\ \operatorname{tr}\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) - \log\det\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big). \tag{16}$$

Let $\lambda_1, \ldots, \lambda_D$ be the eigenvalues of the matrix $\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])$; they are positive, as this matrix is similar to a positive definite matrix. Since the trace is the sum of eigenvalues and the determinant is the product of eigenvalues, (16) is minimized by all matrices $\hat\Sigma$ such that the eigenvalues of $\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])$ minimize

$$\sum_{k=1}^D \lambda_k - \sum_{k=1}^D \log\lambda_k. \tag{17}$$

As (17) is strictly convex, we can take the first-order condition to identify the unique minimizer. This yields that (17) is uniquely minimized if and only if $\lambda_k = 1$ for all $k$. As all eigenvalues equal one and $\tilde\Sigma^{-1}(\Sigma + [y\otimes y^T])$ is similar to the symmetric matrix $\tilde\Sigma^{-1/2}(\Sigma + [y\otimes y^T])\tilde\Sigma^{-1/2}$, and hence diagonalizable, $\tilde\Sigma^{-1}(\Sigma + [y\otimes y^T])$ is the identity matrix.
This establishes that

$$\tilde\Sigma = \Sigma + [y\otimes y^T] = \Sigma + (M\Delta)\otimes(M\Delta)^T \tag{18}$$

is the unique minimizer of (14), and thus the subjective long-run belief of the agent about the covariance matrix. This establishes (II).

Finally, we prove Case (III): Again, by Berk’s Theorem, the agent’s long-run bias about the fundamentals and beliefs about the covariance matrix concentrate on the set that minimizes the Kullback-Leibler divergence (11),

$$\arg\min_{(\Delta, \hat\Sigma):\, \Delta_i = \tilde f_i - f_i}\ \operatorname{tr}(\hat\Sigma^{-1}\Sigma) + y^T\hat\Sigma^{-1}y - D + \log\frac{\det\hat\Sigma}{\det\Sigma}. \tag{19}$$

As shown in (15), this objective equals

$$\operatorname{tr}\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) - \log\det\big(\hat\Sigma^{-1}(\Sigma + [y\otimes y^T])\big) - D + \log\det\big(\mathrm{Id} + \Sigma^{-1}[y\otimes y^T]\big).$$

Plugging in the minimizer for the covariance matrix, $\Sigma + [y\otimes y^T]$, derived in Case (II) simplifies the objective to

$$\log\det\big(\mathrm{Id} + \Sigma^{-1}[y\otimes y^T]\big). \tag{20}$$

We first observe that, as the determinant is the product of eigenvalues, (20) equals the sum of the logarithms of the eigenvalues of $\mathrm{Id} + \Sigma^{-1}[y\otimes y^T]$. Furthermore, if $\lambda$ is an eigenvalue of $\mathrm{Id} + \Sigma^{-1}[y\otimes y^T]$ with associated eigenvector $v$, then $\lambda - 1$ is an eigenvalue of $\Sigma^{-1}[y\otimes y^T]$, as

$$\lambda v = (\mathrm{Id} + \Sigma^{-1}[y\otimes y^T])v \;\Rightarrow\; (\lambda - 1)v = \Sigma^{-1}[y\otimes y^T]v.$$

If we denote the eigenvalues of $\Sigma^{-1}[y\otimes y^T]$ by $\lambda_1, \ldots, \lambda_D$, then the objective (20) equals

$$\sum_{k=1}^D \log(\lambda_k + 1).$$

As eigenvalues are independent of the basis, we next choose an orthogonal basis $x_1, \ldots, x_D$ such that $x_1 = y$ (we can always do so by picking an arbitrary basis and applying the Gram-Schmidt process). Denote by $\mathbf{1} = (1)$ the $1\times 1$ matrix with entry one. As $x_i$ is orthogonal to $y = x_1$ for $i \neq 1$, we have that

$$\Sigma^{-1}[y\otimes y^T]x_i = \Sigma^{-1}[y\otimes y^T][\mathbf{1}\otimes x_i] = \Sigma^{-1}[y]\otimes[y^Tx_i] = \begin{cases} 0 & \text{if } i \neq 1,\\ (y^Ty)(\Sigma^{-1}y) & \text{if } i = 1.\end{cases}$$

Hence, $D - 1$ eigenvalues of $\Sigma^{-1}[y\otimes y^T]$ equal zero. We will next show that $v = \Sigma^{-1}y$ is an eigenvector with associated non-zero eigenvalue. Let $v = \sum_{i=1}^D \alpha_i x_i$ be the representation of $v = \Sigma^{-1}y$ in the basis $x$.
We have that

$$\Sigma^{-1}[y\otimes y^T]v = \alpha_1(y^Ty)(\Sigma^{-1}y) = \alpha_1(y^Ty)v,$$

so $v$ is an eigenvector of $\Sigma^{-1}[y\otimes y^T]$ with eigenvalue $\alpha_1(y^Ty)$. As $\alpha_1$ is given by the projection of $v$ on $y$, we have that $\alpha_1 = \frac{y^Tv}{y^Ty}$, and thus the non-zero eigenvalue of $\Sigma^{-1}[y\otimes y^T]$ equals $\alpha_1(y^Ty) = y^Tv = y^T\Sigma^{-1}y$. The objective (20) therefore equals $\log(1 + y^T\Sigma^{-1}y)$, which is strictly increasing in $y^T\Sigma^{-1}y$. Consequently, the agent’s long-run belief about the mean of the state satisfies

$$\tilde f = f + \arg\min_{\Delta:\, \Delta_i = \tilde f_i - f_i} y^T\Sigma^{-1}y = f + \arg\min_{\Delta:\, \Delta_i = \tilde f_i - f_i} \Delta^T\big(M^T\Sigma^{-1}M\big)\Delta.$$

By (I), we then have that the unique minimizer, and thus the long-run belief of the agent, is given by

$$\Delta_k = \frac{[(M^T\Sigma^{-1}M)^{-1}]_{ki}}{[(M^T\Sigma^{-1}M)^{-1}]_{ii}}\Delta_i \quad \text{for } k \neq i, \qquad \tilde\Sigma = \Sigma + (M\Delta)\otimes(M\Delta)^T. \tag{21}$$

This completes the proof of (III).

Proof of Proposition 1.

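Let $\Sigma_q, \Sigma_\eta$ be the variance-covariance matrices of $\epsilon_q$ and $\epsilon_\eta$,

$$\Sigma_q = \operatorname{diag}(v_{q1}, \ldots, v_{qI}), \qquad \Sigma_\eta = \operatorname{diag}(v_{\eta 1}, \ldots, v_{\eta K}),$$

and observe that they are invertible as the variances are greater than zero. We show that this model can be reduced to our general model. To see this, observe that one can write the vector $(q\ \eta)^T$ in matrix notation as

$$\begin{pmatrix} q\\ \eta \end{pmatrix} = \begin{pmatrix} \mathrm{Id} & C\\ 0 & \mathrm{Id} \end{pmatrix}\cdot\begin{pmatrix} a\\ \theta \end{pmatrix} + \begin{pmatrix} \epsilon_q\\ \epsilon_\eta \end{pmatrix}. \tag{22}$$

Let $M = \begin{pmatrix} \mathrm{Id} & C\\ 0 & \mathrm{Id} \end{pmatrix}$. As $M$ has determinant 1, it is invertible. We have that the matrix $[M^T\Sigma^{-1}M]^{-1}$ is given by

$$[M^T\Sigma^{-1}M]^{-1} = M^{-1}\Sigma(M^{-1})^T = \begin{pmatrix} \mathrm{Id} & -C\\ 0 & \mathrm{Id} \end{pmatrix}\begin{pmatrix} \Sigma_q & 0\\ 0 & \Sigma_\eta \end{pmatrix}\begin{pmatrix} \mathrm{Id} & 0\\ -C^T & \mathrm{Id} \end{pmatrix} = \begin{pmatrix} \mathrm{Id} & -C\\ 0 & \mathrm{Id} \end{pmatrix}\begin{pmatrix} \Sigma_q & 0\\ -\Sigma_\eta C^T & \Sigma_\eta \end{pmatrix} = \begin{pmatrix} \Sigma_q + C\Sigma_\eta C^T & -C\Sigma_\eta\\ -\Sigma_\eta C^T & \Sigma_\eta \end{pmatrix}.$$

By Theorem 1, agent $i$’s bias about the ability of agent $j \neq i$ is given by

$$\tilde a_{ij} - A_j = \frac{[(M^T\Sigma^{-1}M)^{-1}]_{ji}}{[(M^T\Sigma^{-1}M)^{-1}]_{ii}}\Delta_i = \frac{[\Sigma_q + C\Sigma_\eta C^T]_{ji}}{[\Sigma_q + C\Sigma_\eta C^T]_{ii}}(\tilde a_i - A_i) = \frac{\sum_k c_{ik}c_{jk}v_{\eta k}}{v_{qi} + \sum_k c_{ik}^2 v_{\eta k}}\cdot(\tilde a_i - A_i).$$

By a similar argument, the estimated bias associated with characteristic $k$ is given by

$$\tilde\theta_{ik} - \Theta_k = \frac{[(M^T\Sigma^{-1}M)^{-1}]_{(I+k)i}}{[(M^T\Sigma^{-1}M)^{-1}]_{ii}}\Delta_i = \frac{[-\Sigma_\eta C^T]_{ki}}{[\Sigma_q + C\Sigma_\eta C^T]_{ii}}(\tilde a_i - A_i) = -\frac{c_{ik}v_{\eta k}}{v_{qi} + \sum_{k'} c_{ik'}^2 v_{\eta k'}}\cdot(\tilde a_i - A_i).$$

This proves the result.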
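The closed forms above can be cross-checked against the general formula of Theorem 1 (III), $\Delta_k/\Delta_i = [(M^T\Sigma^{-1}M)^{-1}]_{ki}/[(M^T\Sigma^{-1}M)^{-1}]_{ii}$. The following is a minimal sketch, assuming a diagonal $\Sigma$; it uses exact rational arithmetic (`fractions.Fraction`) and a hand-written Gauss-Jordan inverse so that no third-party packages are needed. The helper names, the membership matrix `C`, and the variance values are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse in exact arithmetic; A must be square and invertible."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def bias_ratios(M, variances, i):
    """Theorem 1 (III) with diagonal Sigma: returns Delta_k / Delta_i for every k."""
    n = len(variances)
    sigma_inv = [[Fraction(1) / variances[r] if r == c else Fraction(0) for c in range(n)]
                 for r in range(n)]
    b_inv = inverse(mat_mul(mat_mul(transpose(M), sigma_inv), M))
    return [b_inv[k][i] / b_inv[i][i] for k in range(len(b_inv))]


# Illustrative setting: 3 individuals, 2 characteristics, c_ik in {-1, 0, 1}.
C = [[1, 1], [1, -1], [-1, 0]]
v_q = [Fraction(1), Fraction(2), Fraction(3)]
v_eta = [Fraction(2), Fraction(5)]
n_ind, n_char = len(C), len(C[0])

# M = [[Id, C], [0, Id]] acting on f = (a_1, ..., a_I, theta_1, ..., theta_K).
M = [[int(r == c) for c in range(n_ind)] + C[r] for r in range(n_ind)]
M += [[0] * n_ind + [int(k == c) for c in range(n_char)] for k in range(n_char)]

i = 0
ratios = bias_ratios(M, v_q + v_eta, i)

# Closed forms from the proof of Proposition 1 (agent i = 0).
denom = v_q[i] + sum(C[i][k] ** 2 * v_eta[k] for k in range(n_char))
ability = [sum(C[i][k] * C[j][k] * v_eta[k] for k in range(n_char)) / denom
           for j in range(n_ind) if j != i]
discrimination = [-C[i][k] * v_eta[k] / denom for k in range(n_char)]

print(ratios[1:n_ind] == ability, ratios[n_ind:] == discrimination)  # True True
```

Exact fractions are used so that agreement can be tested with `==` rather than a floating-point tolerance; for larger systems a numerical linear-algebra library would be the natural replacement for the hand-written inverse.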

Proof of Corollary 1.

Part 1. Consider individual $i$, who is a member of group $k$. For any individual $j$ in group $k$, $c_{ik}c_{jk} = 1$, so by Equation (6) individual $i$ overestimates individual $j$. Hence, individual $i$ overestimates the average ability of group $k$. For any $k' \neq k$ and member $j$ of group $k'$, we have $c_{ik'}c_{jk'} \leq 0$, so individual $i$ does not overestimate individual $j$. As a result, individual $i$ does not overestimate the average ability of group $k'$.

Given that the average abilities of the groups are equal and $i$ overestimates the average caliber of $k$ but not of $k'$, the result follows.

Part 2. Since, by the reasoning in the first paragraph of the proof of Part 1, $i$ overestimates the ability of group $k$ but $i'$ does not, $i$ believes the average ability of $k$ to be greater than $i'$ does. And because in addition $i'$ overestimates the average ability of $k'$ while $i$ does not, $i$ thinks that the difference between the average abilities of $k$ and $k'$ is greater than $i'$ does.

Proof of Corollary 2.

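Using Equation (5), we have

$$\sum_k |\tilde\theta_{ik} - \Theta_k| = \frac{\sum_k |c_{ik}|\Sigma_{\eta k}}{\Sigma_{qi} + \sum_k c_{ik}^2\Sigma_{\eta k}}\cdot(\tilde a_i - A_i) = \frac{\sum_k c_{ik}^2\Sigma_{\eta k}}{\Sigma_{qi} + \sum_k c_{ik}^2\Sigma_{\eta k}}\cdot(\tilde a_i - A_i),$$

where the second equality uses $|c_{ik}| = c_{ik}^2$ for $c_{ik} \in \{-1, 0, 1\}$. Adding an irrelevant group increases the numerator and denominator on the right-hand side by the same amount. Since the numerator is smaller, the fraction increases.

Proof of Corollary 3.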

Note that $c_{i,K+1} = 1$. For any member $j \neq i$ of group $\kappa$, $c_{i,K+1}c_{j,K+1} = 1$. Hence, adding group $K+1$ increases the numerator and denominator on the right-hand side of Equation (6) by the same amount. Since the ratio has absolute value less than 1, this increases the ratio.

For any member $j$ of group $K+1$, $c_{i,K+1}c_{j,K+1} = -1$. Hence, adding group $K+1$ lowers the numerator on the right-hand side of Equation (6), and raises the denominator by the same amount. Since the ratio has absolute value less than 1, this lowers the ratio.

Proof of Corollary 4.

Obvious from Equation (5).
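The comparative statics in the proofs of Corollaries 2 and 3 rest on simple facts about ratios: adding the same positive amount to the numerator and denominator of a ratio with absolute value below one raises it, while subtracting that amount from the numerator and adding it to the denominator lowers it. The sketch below illustrates both with exact arithmetic. It assumes $c_{ik} \in \{-1, 0, 1\}$; the function names and all numbers are hypothetical, chosen only for illustration. It evaluates the summed discrimination bias implied by Equation (5) before and after one more characteristic with $c_{ik} \neq 0$ is added, and shows how a ratio shaped like Equation (6) moves when a new characteristic contributes $\pm\Sigma_{\eta,K+1}$ to its numerator and $\Sigma_{\eta,K+1}$ to its denominator.

```python
from fractions import Fraction


def summed_discrimination_bias(v_qi, v_eta_relevant, overconfidence):
    """Sum over k of |theta_ik - Theta_k| from Equation (5), assuming c_ik in {-1, 0, 1};
    v_eta_relevant lists Sigma_eta_k for the characteristics with c_ik != 0."""
    s = sum(v_eta_relevant, Fraction(0))
    return s / (v_qi + s) * overconfidence


def add_characteristic(numerator, denominator, v_new, product):
    """A ratio shaped like Equation (6) after adding a characteristic with variance v_new,
    where product = c_{i,K+1} * c_{j,K+1} is +1 or -1 and c_{i,K+1}**2 = 1."""
    return (numerator + product * v_new) / (denominator + v_new)


# Illustrative numbers only.
before = summed_discrimination_bias(Fraction(1), [Fraction(2)], Fraction(1))
after = summed_discrimination_bias(Fraction(1), [Fraction(2), Fraction(5)], Fraction(1))
print(before, after)                       # 2/3 7/8

num, den = Fraction(1), Fraction(3)        # a ratio of 1/3, below one in absolute value
same_side = add_characteristic(num, den, Fraction(2), +1)
other_side = add_characteristic(num, den, Fraction(2), -1)
print(num / den, same_side, other_side)    # 1/3 3/5 -1/5
```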

Proof of Corollary 5.

The negative bias about members of group $K+1$ follows from the facts that for any $j > I$, $c_{i,K+1}c_{j,K+1} = -1$ and $c_{jk} = 0$ for any $k \leq K$. The second part follows from the fact that for any $j \leq I$, $c_{i,K+1}c_{j,K+1} = 1$, and that for $\Sigma_{\eta,K+1}$ sufficiently large, this term dominates.

Proof of Proposition 2.

We apply Part III of Theorem 1 to $f = a$, $M = \mathrm{Id}$. Then $[M^T\Sigma^{-1}M]^{-1} = \Sigma$, and $M(\tilde f - f) = \tilde a - A$, yielding the formulas in the proposition.

Proof of Proposition 3.

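Again, the model is a special case of our general model introduced in Section 2, with

$$\begin{pmatrix} q\\ \eta\\ s \end{pmatrix} = M\begin{pmatrix} a\\ \theta \end{pmatrix} + \epsilon,$$

where $\epsilon \sim N(0, \Sigma)$. The matrix $M$ is given by

$$M = \begin{pmatrix} \mathrm{Id} & C\\ 0 & \mathrm{Id}\\ \mathrm{Id} & 0 \end{pmatrix},$$

and the variance-covariance matrix is of the form

$$\Sigma = \begin{pmatrix} v_q\mathrm{Id} & 0 & 0\\ 0 & v_\eta\mathrm{Id} & 0\\ 0 & 0 & v_a\mathrm{Id} \end{pmatrix}.$$

By Theorem 1 (III), the agent’s long-run bias is given by

$$\Delta_k = \frac{[(M^T\Sigma^{-1}M)^{-1}]_{ki}}{[(M^T\Sigma^{-1}M)^{-1}]_{ii}}\Delta_i. \tag{23}$$

To compute the agent’s beliefs, we first compute $(M^T\Sigma^{-1}M)^{-1}$. We get that

$$M^T\Sigma^{-1}M = \begin{pmatrix} \mathrm{Id} & 0 & \mathrm{Id}\\ C^T & \mathrm{Id} & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{v_q}\mathrm{Id} & 0 & 0\\ 0 & \frac{1}{v_\eta}\mathrm{Id} & 0\\ 0 & 0 & \frac{1}{v_a}\mathrm{Id} \end{pmatrix}\begin{pmatrix} \mathrm{Id} & C\\ 0 & \mathrm{Id}\\ \mathrm{Id} & 0 \end{pmatrix} = \begin{pmatrix} \mathrm{Id} & 0 & \mathrm{Id}\\ C^T & \mathrm{Id} & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{v_q}\mathrm{Id} & \frac{1}{v_q}C\\ 0 & \frac{1}{v_\eta}\mathrm{Id}\\ \frac{1}{v_a}\mathrm{Id} & 0 \end{pmatrix} = \begin{pmatrix} \left(\frac{1}{v_q} + \frac{1}{v_a}\right)\mathrm{Id} & \frac{1}{v_q}C\\ \frac{1}{v_q}C^T & \frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_q}C^TC \end{pmatrix}.$$

The inverse of this matrix is given by

$$\begin{aligned}
[M^T\Sigma^{-1}M]^{-1} &= \begin{pmatrix} \left[\left(\frac{1}{v_q} + \frac{1}{v_a}\right)\frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_qv_a}CC^T\right]^{-1} & 0\\ 0 & \left[\left(\frac{1}{v_q} + \frac{1}{v_a}\right)\frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_qv_a}C^TC\right]^{-1} \end{pmatrix} \times \begin{pmatrix} \frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_q}CC^T & -\frac{1}{v_q}C\\ -\frac{1}{v_q}C^T & \left(\frac{1}{v_q} + \frac{1}{v_a}\right)\mathrm{Id} \end{pmatrix}\\
&= \begin{pmatrix} \left[\left(\frac{1}{v_q} + \frac{1}{v_a}\right)\frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_qv_a}CC^T\right]^{-1}\left[\frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_q}CC^T\right] & -\left[\left(\frac{1}{v_q} + \frac{1}{v_a}\right)\frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_qv_a}CC^T\right]^{-1}\frac{1}{v_q}C\\ -\left[\left(\frac{1}{v_q} + \frac{1}{v_a}\right)\frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_qv_a}C^TC\right]^{-1}\frac{1}{v_q}C^T & \left[\left(\frac{1}{v_q} + \frac{1}{v_a}\right)\frac{1}{v_\eta}\mathrm{Id} + \frac{1}{v_qv_a}C^TC\right]^{-1}\left(\frac{1}{v_q} + \frac{1}{v_a}\right)\mathrm{Id} \end{pmatrix}.
\end{aligned}$$

To identify agent $i$’s biases regarding other individuals, we need to understand the upper left corner of this matrix. Furthermore, since each bias given in (23) is the ratio of two matrix entries, it is sufficient to understand the matrix up to a multiplicative constant. The upper left corner is proportional to

$$\left[\frac{v_q + v_a}{v_\eta}\mathrm{Id} + CC^T\right]^{-1}\left[\frac{v_q}{v_\eta}\mathrm{Id} + CC^T\right].$$

Define $x = \frac{v_q + v_a}{v_\eta} \in \mathbb{R}$ and $y = \frac{v_q}{v_\eta} \in \mathbb{R}$. Rewriting gives

$$\left[x\mathrm{Id} + CC^T\right]^{-1}\left[x\mathrm{Id} + CC^T + (y - x)\mathrm{Id}\right] = \mathrm{Id} + (y - x)\left[x\mathrm{Id} + CC^T\right]^{-1}. \tag{24}$$

We consider the special case in which there is one group, and each individual in the population is either a member or a competitor of the group. This means that $C$ is an $I$-dimensional vector consisting only of $+1$’s and $-1$’s. Hence,

$$(CC^TCC^T)_{ij} = \sum_k (CC^T)_{ik}(CC^T)_{kj} = \sum_k c_ic_kc_kc_j = \sum_k c_ic_j = I\,(CC^T)_{ij},$$

so that $(CC^T)^2 = I\,CC^T$. Given this, we have that

$$\left[x\mathrm{Id} + CC^T\right]\left(\frac{1}{x}\mathrm{Id} - \frac{1}{(x + I)x}CC^T\right) = \mathrm{Id} - \frac{x}{(x + I)x}CC^T + \frac{1}{x}CC^T - \frac{1}{(x + I)x}CC^TCC^T = \mathrm{Id} - \frac{x}{(x + I)x}CC^T + \frac{1}{x}CC^T - \frac{I}{(x + I)x}CC^T = \mathrm{Id},$$

and thus

$$\left[x\mathrm{Id} + CC^T\right]^{-1} = \frac{1}{x}\mathrm{Id} - \frac{1}{(x + I)x}CC^T.$$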
As a consequence, (24) simplifies to

$$\mathrm{Id} + (y - x)\left[x\mathrm{Id} + CC^T\right]^{-1} = \mathrm{Id} + \frac{y - x}{x}\mathrm{Id} - \frac{y - x}{(x + I)x}CC^T = \frac{y}{x}\mathrm{Id} + \frac{x - y}{(x + I)x}CC^T.$$

Plugging in for $x$ and $y$ yields

$$\frac{v_q}{v_q + v_a}\mathrm{Id} + \frac{\frac{v_a}{v_\eta}}{\left(\frac{v_q + v_a}{v_\eta} + I\right)\frac{v_q + v_a}{v_\eta}}CC^T,$$

which is proportional to

$$v_q\mathrm{Id} + \frac{v_a}{\frac{v_q + v_a}{v_\eta} + I}CC^T.$$

Hence, agent $i$’s bias regarding agent $j \neq i$ satisfies

$$\frac{\tilde a_{ij} - A_j}{\tilde a_i - A_i} = \frac{\left(v_q\mathrm{Id} + \frac{v_a}{\frac{v_q + v_a}{v_\eta} + I}CC^T\right)_{ij}}{\left(v_q\mathrm{Id} + \frac{v_a}{\frac{v_q + v_a}{v_\eta} + I}CC^T\right)_{ii}} = \frac{\frac{v_a}{\frac{v_q + v_a}{v_\eta} + I}c_ic_j}{v_q + \frac{v_a}{\frac{v_q + v_a}{v_\eta} + I}} = \frac{v_\eta v_a c_ic_j}{(v_q + v_\eta)(v_q + v_a) + (I - 1)v_qv_\eta}.$$

Similarly, the upper right corner of the matrix is proportional (with the same proportionality constant) to

$$-v_\eta C + \frac{v_\eta}{\frac{v_q + v_a}{v_\eta} + I}CC^TC.$$

Since $CC^TC = I\,C$, the $i$-th component of the above vector equals

$$-\frac{v_q + v_a}{\frac{v_q + v_a}{v_\eta} + I}c_i,$$

and therefore agent $i$’s bias about discrimination is

$$\frac{\tilde\theta_i - \Theta}{\tilde a_i - A_i} = \frac{-\frac{v_q + v_a}{\frac{v_q + v_a}{v_\eta} + I}c_i}{v_q + \frac{v_a}{\frac{v_q + v_a}{v_\eta} + I}} = -\frac{v_\eta(v_q + v_a)c_i}{(v_q + v_\eta)(v_q + v_a) + (I - 1)v_qv_\eta}.$$

Calculations behind Example 1.

In the notation of Theorem 1, f =  a a a a θ  , M =  −

10 0 0 1 −

10 0 0 0 11 0 0 0 00 1 0 0 00 0 1 0 00 0 0 1 0  , Σ =  σ qo σ qo σ ao

00 0 0 0 0 0 0 0 σ ao  . Applying Part III of Theorem 1, and using Matlab in symbolic mode yields the results.

Calculations behind Example 2.

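Let $a'_1 = a_1 + m_1$ and $a'_2 = a_2 + m_2$. In the notation of Theorem 1, with the observations ordered as $(q_1, q_2, b_2, \eta)$,

$$f = \begin{pmatrix} a'_1\\ a'_2\\ \theta\\ a_2 \end{pmatrix}, \qquad M = \begin{pmatrix} 1 & 0 & 1 & 0\\ 0 & 1 & -1 & 0\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} v_q & 0 & 0 & 0\\ 0 & v_q & 0 & 0\\ 0 & 0 & v_b & 0\\ 0 & 0 & 0 & v_\eta \end{pmatrix}.$$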
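As an alternative to symbolic computation, the matrix formula of Theorem 1 (III) can be evaluated in exact arithmetic. The sketch below applies it to Example 2, assuming the observations are ordered as $(q_1, q_2, b_2, \eta)$, so that $M$ has rows $(1,0,1,0)$, $(0,1,-1,0)$, $(0,1,0,1)$, $(0,0,1,0)$ acting on $f = (a'_1, a'_2, \theta, a_2)$. It recovers the biases stated in the example per unit of $\Delta$. With the same helper, it also checks the closed forms derived in the proof of Proposition 3 for one group of three individuals. Helper names and all numerical values are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse in exact arithmetic; A must be square and invertible."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def bias_ratios(M, variances, i):
    """Theorem 1 (III) with diagonal Sigma: Delta_k / Delta_i for every fundamental k."""
    n = len(variances)
    sigma_inv = [[Fraction(1) / variances[r] if r == c else Fraction(0) for c in range(n)]
                 for r in range(n)]
    b_inv = inverse(mat_mul(mat_mul(transpose(M), sigma_inv), M))
    return [b_inv[k][i] / b_inv[i][i] for k in range(len(b_inv))]


# Example 2: f = (a1', a2', theta, a2); the fixed, inflated self-view a1' is index 0.
v_q, v_b, v_eta = Fraction(2), Fraction(7), Fraction(1)
M_ex2 = [[1, 0, 1, 0],    # q1 = a1' + theta
         [0, 1, -1, 0],   # q2 = a2' - theta
         [0, 1, 0, 1],    # b2 = 2*a2 + m2 = a2' + a2
         [0, 0, 1, 0]]    # eta = theta
ex2 = bias_ratios(M_ex2, [v_q, v_q, v_b, v_eta], 0)
talent, discrimination = ex2[3], ex2[2]
morality = ex2[1] - ex2[3]                    # m2 = a2' - a2
print(talent, morality, discrimination)       # 1/3 -2/3 -1/3, i.e. (1, -2, -1) / (1 + v_q/v_eta)

# Proposition 3, one group: f = (a_1, a_2, a_3, theta); observations (q, eta, s).
c = [1, 1, -1]
vq, veta, va = Fraction(2), Fraction(3), Fraction(5)
n = len(c)
M_p3 = [[int(r == k) for k in range(n)] + [c[r]] for r in range(n)]   # q = a + C theta
M_p3 += [[0] * n + [1]]                                                # eta = theta
M_p3 += [[int(r == k) for k in range(n)] + [0] for r in range(n)]     # s = a
r3 = bias_ratios(M_p3, [vq] * n + [veta] + [va] * n, 0)
den = (vq + veta) * (vq + va) + (n - 1) * vq * veta
print(r3[1] == veta * va * c[0] * c[1] / den, r3[n] == -veta * (vq + va) * c[0] / den)  # True True
```

The noise variance $v_b$ of business success does not affect the result here, because $M$ is square and invertible, so the biased beliefs can fit $q_2$ and $b_2$ exactly.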