Evolutionarily Stable (Mis)specifications: Theory and Applications

Abstract

We introduce an evolutionary framework to evaluate competing (mis)specifications in strategic situations, focusing on which misspecifications can persist over a correct specification. Agents with heterogeneous specifications coexist in a society and repeatedly match against random opponents to play a stage game. They draw Bayesian inferences about the environment based on personal experience, so their learning depends on the distribution of specifications and matching assortativity in the society. One specification is evolutionarily stable against another if, whenever sufficiently prevalent, its adherents obtain higher expected objective payoffs than their counterparts. The learning channel leads to novel stability phenomena compared to frameworks where the heritable unit of cultural transmission is a single belief instead of a specification (i.e., set of feasible beliefs). We apply the framework to linear-quadratic-normal games where players receive correlated signals but possibly misperceive the information structure. The correct specification is not evolutionarily stable against a correlational error, whose direction depends on matching assortativity. As another application, the framework also endogenizes coarse analogy classes in centipede games.


Kevin He † and Jonathan Libgober ‡

First version: December 20, 2020. This version: December 29, 2020.


Keywords: misspecified Bayesian learning, endogenous misspecifications, evolutionary stability, higher-order beliefs, analogy classes

∗ We thank Cuimin Ba, In-Koo Cho, Krishna Dasaratha, Andrew Ellis, Ignacio Esponda, Drew Fudenberg, Alice Gindin, Ryota Iijima, Yuhta Ishii, Filippo Massari, Philipp Sadowski, Alvaro Sandroni, Joshua Schwartzstein, Philipp Strack, and various conference and seminar participants for helpful comments. Kevin He thanks the California Institute of Technology for hospitality when some of the work on this paper was completed.

† University of Pennsylvania. Email: [email protected]

‡ University of Southern California. Email: [email protected]

Introduction

In many economic settings, people draw misspecified inferences about the world: that is, they start with a prior belief that dogmatically precludes the true data-generating process. For instance, behavioral economics documents a number of prevalent statistical biases. When people reason about economic fundamentals under the spell of one of these biases, they engage in misspecified learning. Following Esponda and Pouzo (2016), a growing literature has focused on the implications of Bayesian learning under different misspecifications, taking the errors as exogenously given.

Why and when might we expect such misspecifications to persist? Mistakes that distort learning are empirically ubiquitous, which is puzzling for two reasons. First, many of these errors demand even greater computational sophistication than the simple truth, making them hard to justify on the grounds of bounded cognition or costly attention. Convoluted conspiracy theories fall into this category. So does a behavioral error called projection bias, where agents overestimate the similarity between their own information and others' information. Reasoning with projection bias in settings with statistical independence requires the learner to keep track of inter-personal correlations, complicating the inference problem. Second, conventional economic wisdom dating back to at least Friedman (1953)'s market-selection hypothesis suggests competitive pressures will eliminate mistakes, including misspecifications. Indeed, contemporaneous papers that formalize payoff-based criteria for selecting between (mis)specifications find no strict advantage to being misspecified in single-agent decision problems (Fudenberg and Lanzani, 2020; Frick, Iijima, and Ishii, 2020).

This paper introduces a general framework to evaluate competing (mis)specifications based on their expected objective payoffs, with particular emphasis on which misspecifications are likely to persist over the correct specification (and in which environments).
We find that when agents with heterogeneous specifications coexist in a society and repeatedly match against random opponents to play a stage game, misspecified agents may enjoy a strict payoff advantage compared to their correctly specified counterparts. Unlike in decision problems, misspecifications in games can lead to strategically beneficial misinferences about the game parameters. Through several examples and applications, we discuss how details of the social interaction structure, such as the matching assortativity between agents with different specifications, shape the stability of different mistakes.

We consider an evolutionary framework where specifications are encoded in theories, which delineate feasible beliefs and serve as the basic unit of cultural transmission. Each theory may represent, for example, a scientific paradigm that stipulates a set of (possibly incorrect) relationships between the environmental parameters and the observables. Adherents of the theory learn about the environment by estimating the parameters of their theory and play the stage game based on their calibrated model. Theories rise and fall in prominence based on the objective welfare of their adherents, as the school of thought that leads to higher payoffs tends to acquire more resources and attract more followers in the future.

The fitness of a theory is determined by its average payoff in stage games, and this average depends on the distribution of opponents. We introduce the concept of a zeitgeist to capture the relevant social interaction structure in the society: the sizes of the subpopulations with different theories, and the matchmaking technology that pairs up opponents to play the game. In equilibrium, each agent forms a Bayesian belief about her environment using data from all of her interactions, and subjectively best responds to this belief.
We define the evolutionary stability of theory A against theory B based on whether theory A has a higher equilibrium fitness than theory B when the population share of theory A is close to 1.

Adherents of a misspecified theory may come to different conclusions about the economic fundamentals in different zeitgeists, with these different beliefs translating into different subjective best-response functions in the stage game. This kind of endogeneity in stage-game behavior leads to novel stability phenomena. First, we show the possibility of a strong form of multiplicity in the stability comparison between two theories: stability reversals. Two theories exhibit stability reversal if (i) theory A's adherents strictly outperform theory B's adherents not only on average, but even conditional on opponent's type, whenever theory A is dominant; (ii) theory B's adherents strictly outperform theory A's adherents, whenever theory B is dominant. Second, we show that the relative stability of one theory over another may be non-monotonic in matching assortativity. One theory may be evolutionarily stable against another when assortativity is either high or low, but not when it is intermediate. Both of these stability phenomena operate through misinference and cannot happen if the learning channel is eliminated. That is, they never arise in a world where the basic unit of cultural transmission is a single belief about the economic environment instead of a theory (i.e., a collection of feasible beliefs).

As an application of our general framework, we examine a linear-quadratic-normal game of incomplete information from Vives (1988). This game has been used to study the equilibrium impact of changing information structures (Bergemann and Morris, 2013), but we instead ask about the evolutionary stability of misspecifications on the information structure.
Players receive objectively correlated signals about Nature's type in the stage game, and we consider theories that may be misspecified about the correlation between the signals. After seeing their own signals, the adherents of misspecified theories hold correct first-order beliefs about Nature's type but incorrect higher-order beliefs (i.e., wrong beliefs about the rival's signal and action), and thus misinfer parameters of the stage game from the game's outcome. We show the correctly specified theory is not evolutionarily stable against either theories that dogmatically stipulate excessively correlated information (projection bias) or those that stipulate excessively independent information (correlation neglect), but not both. Which correlational error can invade a rational society depends on the social interaction structure, namely the matching assortativity of how agents with different theories are paired to play the game. We also use this game to illustrate that the mislearning channel is crucial to the predictions: the same correlational errors would instead confer an evolutionary disadvantage if they were combined with correct beliefs about the other game parameters.

As a second application, we discuss how our framework can endogenize analogy classes, a solution concept that Jehiel (2005) introduced to capture simplified strategic thinking in complex environments. Forming analogy classes is a type of misspecification about strategic uncertainty in extensive-form games, where a biased agent incorrectly believes that her opponent follows the same strategy at distinct nodes in the same analogy class. Providing a foundation for coarse analogy classes has been an open question. We show how to represent different analogy classes as different theories in our framework, then find that the theory with the finest analogy class (i.e., rational agents) is not evolutionarily stable against a theory with coarse analogy classes in a centipede game.
This result provides an evolutionary justification for analogy-based thinking in this context that does not involve thinking cost.

The rest of this section reviews related literature. Section 2 introduces the environment and the evolutionary framework for assessing the stability of specifications. Section 3 discusses how the learning channel enables novel stability phenomena and gives conditions for the existence and continuity of equilibria in zeitgeists. Sections 4 and 5 contain applications to misspecified information structures in linear-quadratic-normal games and coarse analogy classes in extensive-form games. Section 6 concludes.

Our paper contributes to the literature on misspecified Bayesian learning by proposing a framework to assess which specifications are more likely to persist based on their objective performance. Most prior work on misspecified Bayesian learning studies implications of particular errors in specific active-learning environments (i.e., when actions affect observations), including both single-agent decision problems (Nyarko, 1991; Fudenberg, Romanyuk, and Strack, 2017; Heidhues, Koszegi, and Strack, 2018; He, 2020) and multi-agent games (Bohren, 2016; Bohren and Hauser, 2018; Jehiel, 2018; Molavi, 2019; Dasaratha and He, 2020; Ba and Gindin, 2020; Frick, Iijima, and Ishii, 2021). A number of papers establish general convergence properties of misspecified learning (Esponda and Pouzo, 2016; Esponda, Pouzo, and Yamamoto, 2019; Frick, Iijima, and Ishii, 2019; Fudenberg, Lanzani, and Strack, 2020). All of the above papers take misspecifications as exogenously given. By contrast, we propose endogenizing misspecifications using ideas from evolutionary game theory. This also lets us ask how details of the evolutionary process (e.g., the matching assortativity) shape the stability of misspecifications.

Another strand of literature shares our central focus on selecting between multiple specifications for Bayesian learning. Except for two contemporaneous papers discussed later, the selection criteria in this literature can be categorized into two groups: subjective expectations of payoffs and goodness-of-fit tests.

Subjective expectations of payoffs.

The first approach selects specifications based on adherents' (possibly incorrect) beliefs about their own payoffs. Olea, Ortoleva, Pai, and Prat (2020) consider a decision-maker estimating a possibly misspecified linear regression model of $y$ as a function of $x \in \mathbb{R}^k$. They define a notion of competition between agents with different regression specifications where the agent with a higher subjective confidence about their prediction error wins. Levy, Razin, and Young (2020) study a model of electoral competition between a misspecified simple worldview and the correct complex worldview. Voters believing in each worldview are more likely to vote if they expect a higher payoff difference between the policies that would be implemented under their own worldview and the competing worldview. Eliaz and Spiegler (2020) study the equilibrium distribution of political narratives, requiring that only the narratives that promise the highest subjective payoffs survive. Gagnon-Bartsch, Rabin, and Schwartzstein (2020) define a misspecified theory to be attentionally stable against an alternative theory if misspecified agents subjectively judge a certain coarsening of the data to be sufficient for decision-making, and such coarsened data does not falsify the wrong theory.

Goodness-of-fit tests.

The second approach applies exogenous statistical criteria to abandon or downplay specifications that fit past data poorly. Cho and Kasa (2015) consider a central bank that switches between two misspecified theories whenever its current theory fails a goodness-of-fit test. Ba (2020) also considers an agent who switches between two theories, but uses the relative fits of the two theories to data as the switching criterion. She investigates whether a misspecified theory can survive in the presence of a "nearby" competing theory. Cho and Kasa (2017) let an agent entertaining two competing theories make predictions using a weighted average of the two theories, with the weights determined by how well they fit past data.
Schwartzstein and Sunderam (2021) study a persuasion setting where the sender observes the receiver's data and proposes a model to interpret the data and influence the receiver's belief, with the constraint that the likelihood of data must be greater under the proposed model than the receiver's default model.

The present work differs in that our selection criterion is based on the objective expected payoffs of agents with different specifications. We are implicitly motivated by a story of cultural transmission where agents with higher objective welfare are more likely to pass down their theories to future agents. In general, the theory that leads to the highest objective payoff need not be the one that leads to the highest subjective expectation of payoffs or the one that best fits a finite dataset.

In independent and contemporaneous work, Fudenberg and Lanzani (2020) and Frick, Iijima, and Ishii (2020) consider welfare-based criteria for selecting among misspecifications in single-agent decision problems. Fudenberg and Lanzani (2020) study a framework where a continuum of agents with heterogeneous misspecifications arrive each period and learn from their predecessors' data. When the population shares of different misspecifications change according to their objective performance, Fudenberg and Lanzani (2020) ask which Berk-Nash equilibria under one misspecification are robust to invasion by a small fraction of mutants with a different misspecification. Frick, Iijima, and Ishii (2020) compare learning under different misspecified signal structures with the property that biased agents still learn the state correctly with enough signals.
They assign an efficiency index to every misspecification and show two agents with misspecifications ranked by this index must also have the same welfare ranking in any decision problem, provided there is a large enough but finite number of signals.

In single-agent decision problems, correctly specified agents always perform weakly better than misspecified agents (except when there are non-identifiability issues, see Proposition 1), so the welfare-based criteria in Fudenberg and Lanzani (2020) and Frick, Iijima, and Ishii (2020) do not provide a strict advantage to misspecified individuals compared to the correctly specified ones in the same society. By contrast, we focus on a theory of welfare-based selection of misspecifications in games, where strategic concerns may imply that learning under a misspecification confers a strict evolutionary advantage relative to learning under the correct specification. The central concept in our framework, a zeitgeist, captures aspects of the social interaction structure that are uniquely relevant when agents confront a game as opposed to a decision problem: namely, the assortativity of the matching technology that pairs up agents with different specifications to play the stage game, and how agents behave when matched against different types of opponents.

Our framework of competition between different specifications for Bayesian learning is inspired by the evolutionary game theory literature. This literature also uses objective payoffs as the selection criterion, and studies the evolution of subjective preferences in games and decision problems (e.g., Dekel, Ely, and Yilankaya (2007), see also the surveys Robson and Samuelson (2011) and Alger and Weibull (2019)) and the evolution of constrained strategy spaces (Heller, 2015; Heller and Winter, 2016). Learning does not play a key role in these papers.
By contrast, our work seeks to provide a foundation for the exogenously given sets of preferences, viewing every misspecification (i.e., a set of feasible stage-game parameters) as a set of preferences over strategy profiles. A few papers in this literature study the evolution of different belief-formation processes (Heller and Winter, 2020; Berman and Heller, 2020), but they take a reduced-form (and possibly non-Bayesian) approach and consider arbitrary inference rules. We require agents to be Bayesians who only differ in the support of their Bayesian prior (i.e., their specification), given the relation of this work to the literature on misspecified Bayesian learning.

Footnote: Some papers studying misspecified learning in games also point out that misspecifications can improve an agent's welfare in particular situations (e.g., Jehiel (2005) and Ba and Gindin (2020)). We contribute by introducing a general framework that can be applied broadly.

In this section, we introduce the general environment and stability concept. We begin with the objective stage game and subjective theories that encode specifications. We define the notion of an equilibrium zeitgeist, which describes the steady-state behavior and beliefs in a society populated by agents with heterogeneous specifications. We then present the stability concept, based on objective welfare in equilibrium zeitgeists when one theory is sufficiently prevalent.

We first set up the objective primitives of the general environment. The stage game is a symmetric two-player game with a common strategy space $A$, assumed to be metrizable. When $i$ and $-i$ choose strategies $a_i, a_{-i} \in A$, random consequences $y_i, y_{-i} \in Y$ are generated for the players from a metrizable space $Y$. These consequences determine each player's utility, according to a utility function $\pi : Y \to \mathbb{R}$. Objectively, $y_i$ is generated as a function of $i$ and $-i$'s play. We take this distribution to be $F^\bullet(a_i, a_{-i}) \in \Delta(Y)$, where $\Delta(Y)$ is the set of distributions over $Y$. We denote the density or probability mass function associated with this distribution by $f^\bullet(a_i, a_{-i}) : Y \to \mathbb{R}_+$.

This general setup can allow for mixed strategies (if $A$ is the set of mixtures over some pure actions) and incomplete-information games (if $S$ is a space of private signals, $\mathcal{A}$ a space of actions, and $A = \mathcal{A}^S$ is the set of signal-contingent actions). It can also describe asymmetric games. Suppose there is a game with action sets $A_1, A_2$ for player roles P1 and P2, and that the consequences of P1 and P2 under the action profile $(a_1, a_2) \in A_1 \times A_2$ are generated according to the distributions $F^\bullet_1(a_1, a_2)$ and $F^\bullet_2(a_1, a_2)$ over $Y$, where we assume the consequence also fully reveals the agent's role. We may construct a symmetric stage game by letting $A = A_1 \times A_2$, so the strategies of two matched agents spell out what actions they would take if they were assigned into each of the player roles. The agents are then placed into the player roles uniformly at random and play according to the strategies. That is, the objective distribution over $i$'s consequence when playing $(a_{i,1}, a_{i,2}) \in A$ against $(a_{-i,1}, a_{-i,2}) \in A$ is given by the 50-50 mixture over $F^\bullet_1(a_{i,1}, a_{-i,2})$ and $F^\bullet_2(a_{-i,1}, a_{i,2})$.

Throughout this paper, we will take the strategy space $A$, the set of consequences $Y$, and the utility function over consequences $\pi$ to be common knowledge among the agents.
But agents entertain two kinds of uncertainty. First, they are unsure about how play in the stage game translates into consequences: that is, they have fundamental uncertainty about the function $F^\bullet$. For example, the agents may be uncertain about some parameters of the stage game, such as the market price elasticity in a quantity-competition game. Second, they are unsure about how others play: that is, they have strategic uncertainty about others' behavior.

We will consider a society with two observably distinguishable groups of agents, A and B, who may behave differently in the stage game (due to each group having a different belief about the economic fundamentals, for example). All agents entertain different models of the world as possible resolutions of their uncertainty. Models are triplets $(a_A, a_B, F)$ with $a_A, a_B \in A$ and $F : A^2 \to \Delta(Y)$. Each model contains a conjecture $a_A$ about how group A opponents act when matched with the agent, a conjecture $a_B$ about how group B opponents act, and a conjecture $F$ about how strategy profiles translate into consequences for the agent. Assume each $F$, like $F^\bullet$, is given by a density or probability mass function $f(a_i, a_{-i}) : Y \to \mathbb{R}_+$ for every $(a_i, a_{-i}) \in A^2$.

A theory $\Theta$ is a collection of models: that is, a subset of $A^2 \times (\Delta(Y))^{A^2}$. We assume the marginal of the theory on $(\Delta(Y))^{A^2}$ is metrizable. Each agent enters society with a persistent theory, which depends entirely on whether they are from group A or group B. We think of this exogenously endowed theory as coming from education or cultural background, and each agent dogmatically believes that her theory contains the correct model of the world. A theory $\Theta$ is correctly specified if $\Theta \supseteq A^2 \times \{F^\bullet\}$, so the agent can make unrestricted inferences about others' play and does not rule out the correct fundamental environment $F^\bullet$.

In general, a theory may exclude some feasible opponent strategies or the true $F^\bullet$.
Such misspecified theories can represent a scientific paradigm about the economy based on a false premise, a religious belief system with dogmas that contradict facts about the world, or heuristic thinking stemming from a psychological bias that deems the true environment as implausible. Each agent plays the stage game with a random opponent in every period, and uses her personal experience in these matches to calibrate the most accurate model within her theory in a way that we will make precise in Section 2.4.

An agent endowed with a theory is called an adherent of the theory. As alluded to above, we suppose the society is composed of the adherents in the two observable groups A and B. This presumes that agents can identify which group their matched opponent belongs to, though we do not assume that agents know the models contained in theories other than their own. For instance, imagine two dominant theories about business economics coexist in a society, taught by two different universities. Agents are the executives of competing firms and they can use public records to look up the educational background of other executives and therefore learn which school of thought they subscribe to. But even though each agent can perfectly identify her opponent's group membership (which helps to predict the opponent's behavior), she does not understand anything about the contents of the rival economic theory.

To study competition between two theories, we must describe the social composition and interaction structure in the society where learning takes place. We introduce the concepts of zeitgeists and equilibrium zeitgeists to capture these details.

The Cambridge Dictionary defines the noun "zeitgeist" as "the general set of ideas, beliefs, feelings, etc. that is typical of a particular period in history." Crucial in this dictionary definition is the multiplicity of coexisting ideas and beliefs in the society at a moment in time.
In the spirit of the usual meaning of the word, we define a zeitgeist as a landscape of beliefs from different schools of thought, their relative prominence in the society, and the interaction among the adherents of different theories.

Definition 1. A zeitgeist $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ consists of: (1) two theories $\Theta_A$ and $\Theta_B$; (2) a belief over models for each theory, $\mu_A \in \Delta(\Theta_A)$ and $\mu_B \in \Delta(\Theta_B)$; (3) relative sizes of the two groups in the society, $p = (p_A, p_B)$ with $p_A, p_B \geq 0$, $p_A + p_B = 1$; (4) a matching assortativity parameter $\lambda \in [0, 1]$; (5) a strategy profile $a = (a_{AA}, a_{AB}, a_{BA}, a_{BB})$ where $a_{g,g'} \in A$ is the strategy that an adherent of $\Theta_g$ plays against an adherent of $\Theta_{g'}$.

A zeitgeist outlines the beliefs and interactions among agents with heterogeneous theories living in the same society. Parts (1) and (2) of this definition capture the beliefs of each group. Parts (3) and (4) determine social composition and social interaction: the relative prominence of each theory and the probability of interacting with one's own group versus with the population as a whole. In each period, every agent is matched with an opponent from her own group with probability $\lambda$, and matched uniformly by population proportion with probability $1 - \lambda$. Therefore, an agent from group $g$ has an overall probability of $\lambda + (1 - \lambda) p_g$ of being matched with an opponent from her own group, and a complementary chance of being matched with an opponent from the other group. Part (5) describes behavior in the society.

To evaluate payoffs under a zeitgeist, which we then use to determine each theory's evolutionary fitness, we introduce our equilibrium concept.

An equilibrium zeitgeist (EZ) imposes equilibrium conditions on the beliefs and behavior in a zeitgeist. Specifically, it is a zeitgeist that satisfies the optimality of inference and behavior, holding fixed the population shares $p$ and the matching assortativity $\lambda$.
Optimality of behavior requires all players best respond, and optimality of inference requires that the belief is supported on models that minimize Kullback-Leibler (KL) divergence.

Formally, for two distributions $F^\bullet, \hat{F} \in \Delta(Y)$ with density functions / probability mass functions $f^\bullet, \hat{f}$, define the KL divergence from $\hat{F}$ to $F^\bullet$ as

$$D_{KL}(F^\bullet \,\|\, \hat{F}) := \int f^\bullet(y) \ln\left(\frac{f^\bullet(y)}{\hat{f}(y)}\right) dy.$$

Definition 2.

A zeitgeist $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ is an equilibrium zeitgeist (EZ) if for every $g, g' \in \{A, B\}$,

$$a_{g,g'} \in \arg\max_{\hat{a} \in A} \; \mathbb{E}_{(a_A, a_B, F) \sim \mu_g}\left[\mathbb{E}_{y \sim F(\hat{a}, a_{g'})}(\pi(y))\right]$$

and, for every $g \in \{A, B\}$, the belief $\mu_g$ is supported on

$$\arg\min_{(\hat{a}_A, \hat{a}_B, \hat{F}) \in \Theta_g} \left\{ (\lambda + (1 - \lambda) p_g) \cdot D_{KL}\big(F^\bullet(a_{g,g}, a_{g,g}) \,\|\, \hat{F}(a_{g,g}, \hat{a}_g)\big) + (1 - \lambda)(1 - p_g) \cdot D_{KL}\big(F^\bullet(a_{g,-g}, a_{-g,g}) \,\|\, \hat{F}(a_{g,-g}, \hat{a}_{-g})\big) \right\}.$$

We now interpret the definition of an EZ. Each agent from group $g$ chooses a subjective best response $a_{g,g'}$ against each group $g'$ of opponents, given her belief $\mu_g$ about the fundamental and strategic uncertainty. Her belief $\mu_g$ is supported on the models in her theory that minimize a weighted KL divergence, with the data from each type of match weighted by the probability of confronting this type of opponent. Section 3.4 and Appendix B develop a learning foundation of EZs as the social steady state when Bayesian learners start with a prior supported on the models in their theory.

An important assumption behind this framework is that agents (correctly) believe the economic fundamentals are fixed, no matter who they are matched against. That is, the mapping $(a_i, a_{-i}) \mapsto \Delta(Y)$ describes the stage game that they are playing, and agents know that they always play the same stage game even though opponents from different groups may use different strategies in the game. As a result, the agent's experiences in games against both groups of opponents jointly resolve the same fundamental uncertainty about the environment. Generally, play between two groups $g$ and $g'$ is not a Berk-Nash equilibrium, as the individuals in group $g$ draw inferences about the game's parameters not only from the matches against group $g'$, but also from the matches against the other group $-g'$, who may use a different strategy.

Even as agents adjust their beliefs and behavior to converge to an EZ, the population proportions of different theories $p_A, p_B$ remain fixed. We imagine a world where the relative prominence of theories changes much more slowly than the rate of convergence to an EZ. Thus, an equilibrium zeitgeist provides a snapshot of the society in a given era, and the social transitions between different EZs as $p$ evolves take place on a longer timescale.

Equilibrium zeitgeists describe environments where agents entertain both fundamental uncertainty and strategic uncertainty. For some applications, we may wish to focus attention on agents' inferences about the game parameters and abstract away from learning how others play. To do this, we introduce a variant of EZs where agents are restricted to hold correct beliefs about others' behavior.
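
The way the inference condition ties beliefs to the zeitgeist can be illustrated with a small numerical sketch. It is not part of the formal framework: it assumes a binary consequence space, a hypothetical theory containing only two candidate models (labelled "low" and "high"), and conjectures about opponents' play that are held fixed at actual play, so that only the conjecture about the fundamentals is fitted. All function names and numbers are illustrative assumptions.

```python
import math


def match_weights(lam, p_g):
    """Match probabilities for a group-g agent: (own group, other group)."""
    own = lam + (1.0 - lam) * p_g
    other = (1.0 - lam) * (1.0 - p_g)
    return own, other


def kl_divergence(true_dist, model_dist):
    """D_KL(true || model) for finite distributions given as lists."""
    total = 0.0
    for t, m in zip(true_dist, model_dist):
        if t > 0.0:
            if m == 0.0:
                return math.inf  # model rules out an outcome that occurs
            total += t * math.log(t / m)
    return total


def best_fitting_models(true_own, true_other, candidates, lam, p_g):
    """Return (minimizers, scores) of the match-weighted KL objective.

    candidates maps a model name to a pair (predicted distribution in
    own-group matches, predicted distribution in cross-group matches).
    """
    w_own, w_other = match_weights(lam, p_g)
    scores = {}
    for name, (pred_own, pred_other) in candidates.items():
        score = 0.0
        if w_own > 0.0:  # skip zero-weight terms to avoid 0 * inf
            score += w_own * kl_divergence(true_own, pred_own)
        if w_other > 0.0:
            score += w_other * kl_divergence(true_other, pred_other)
        scores[name] = score
    best = min(scores.values())
    minimizers = [n for n, s in scores.items()
                  if math.isclose(s, best, abs_tol=1e-12)]
    return minimizers, scores


# Hypothetical objective outcome distributions over Y = {0, 1}.
true_own = [0.5, 0.5]    # in matches against one's own group
true_other = [0.8, 0.2]  # in matches against the other group

# A misspecified theory whose models ignore the opponent's play, so each
# model predicts the same distribution in both kinds of matches.
candidates = {
    "low": ([0.5, 0.5], [0.5, 0.5]),
    "high": ([0.8, 0.2], [0.8, 0.2]),
}

for lam, p_g in [(0.0, 0.9), (0.0, 0.1), (1.0, 0.1)]:
    minimizers, _ = best_fitting_models(true_own, true_other,
                                        candidates, lam, p_g)
    print(f"lambda={lam}, p_g={p_g}: best-fitting models {minimizers}")
```

In this toy example the adherent settles on the "low" model when her own group is prevalent and on the "high" model when it is rare, while under perfectly assortative matching only own-group data matter. The same theory therefore produces different beliefs in different zeitgeists.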

Definition 3.

A zeitgeist $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ is an equilibrium zeitgeist with strategic certainty (EZ-SC) if for every $g, g' \in \{A, B\}$,

$$a_{g,g'} \in \arg\max_{\hat{a} \in A} \; \mathbb{E}_{(a_A, a_B, F) \sim \mu_g}\left[\mathbb{E}_{y \sim F(\hat{a}, a_{g'})}(\pi(y))\right]$$

and, for every $g \in \{A, B\}$, the belief $\mu_g$ is supported on

$$\arg\min_{\substack{(\hat{a}_g, \hat{a}_{-g}, \hat{F}) \in \Theta_g \\ \text{s.t. } \hat{a}_g = a_{g,g},\ \hat{a}_{-g} = a_{-g,g}}} \left\{ (\lambda + (1 - \lambda) p_g) \cdot D_{KL}\big(F^\bullet(a_{g,g}, a_{g,g}) \,\|\, \hat{F}(a_{g,g}, a_{g,g})\big) + (1 - \lambda)(1 - p_g) \cdot D_{KL}\big(F^\bullet(a_{g,-g}, a_{-g,g}) \,\|\, \hat{F}(a_{g,-g}, a_{-g,g})\big) \right\}.$$

That is, an EZ-SC adds the extra requirement relative to EZ that $\mu_g$ correctly reflects others' play. In our applications, we will only work with EZ-SC in environments where theories have the product structure $\Theta = A \times A \times \mathcal{F}$, where $\mathcal{F} \subseteq (\Delta(Y))^{A^2}$. So, agents can make any inference about others' play and the optimization problem in the definition of EZ-SC can be thought of as an optimization over conjectures about the game, $F \in \mathcal{F}$. As will be made precise by the learning foundation, an EZ-SC is an EZ in situations where agents see sufficiently informative ex-post signals about the matched opponent's strategy at the end of every match.

Agents may still be misspecified because $\mathcal{F}$ may exclude the true mapping $F^\bullet$ that translates strategy profiles into consequences. For theories with a product structure, we sometimes abuse notation and use the terminology $\Theta$ or "theory" to refer to $\mathcal{F}$, even though $\mathcal{F}$ formally represents only the marginal of the theory on fundamental uncertainty.

In an EZ or EZ-SC, define the fitness of each theory $\Theta_A$ and $\Theta_B$ as the objective expected payoff of its adherents. Consider an evolutionary story where the relative prominence of the two theories in the society rises and falls according to their relative fitness.
This could happen, for example, if the theories are the basic heritable units of information passed down to future agents via cultural transmission, and the school of thought whose adherents have higher average payoff tends to acquire more resources and attract a larger share of future adherents. We are interested in a notion of stability based on this "evolutionary" process where two co-existing rival theories compete to create intellectual descendants in a payoff-monotonic way. Can the adherents of a resident theory $\Theta_A$, starting at a position of social prominence, always repel an invasion from a small $\epsilon$ mass of agents who adhere to a mutant theory $\Theta_B$? The definition of evolutionary stability formalizes this idea.

Since we are motivated by situations where a small but strictly positive population of theory $\Theta_B$ adherents invades an otherwise homogeneous society all believing in theory $\Theta_A$, we begin with a refinement of EZ and EZ-SC that rules out those equilibria with the population share $(p_A, p_B) = (1, 0)$ that cannot be written as the limit of equilibria with a positive but vanishing $p_B$. This rules out, for example, EZs with $p_A = 1$ sustained only because group A holds arbitrary beliefs about the play of group B or fragile beliefs about the economic fundamentals that would be discarded after a single match against a group B opponent.

Definition 4.

An EZ $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ with $p = (1, 0)$ is approachable if there exists a sequence of EZs $Z^{(n)} = (\Theta_A, \Theta_B, \mu_A^{(n)}, \mu_B^{(n)}, (p_A^{(n)}, p_B^{(n)}), \lambda, (a_{AA}^{(n)}, a_{AB}^{(n)}, a_{BA}^{(n)}, a_{BB}^{(n)}))$, where $p_B^{(n)} > 0$ for every $n$, $p_B^{(n)} \to 0$, $\mu_A^{(n)} \to \mu_A$, $\mu_B^{(n)} \to \mu_B$, and $a^{(n)} \to a$. An EZ-SC $Z$ with $p = (1, 0)$ is approachable if there exists a sequence of EZ-SCs $Z^{(n)}$ satisfying the analogous convergence conditions.

In this definition, $\mu_g^{(n)} \to \mu_g$ refers to convergence in the weak* topology on the space $\Delta(\Theta_g)$ of distributions over the models in theory $\Theta_g$, and $a^{(n)} \to a$ means the convergence of the strategy profile in the metrizable space $A^4$.

We now turn to the definition of evolutionary stability, which is defined only when the set of approachable EZ / EZ-SC with $p = (1, 0)$ is non-empty. Stability is defined based on the fitness of theories $\Theta_A, \Theta_B$ in such equilibria. Evolutionary stability is when $\Theta_A$ has higher fitness than $\Theta_B$ in all approachable equilibria, and evolutionary fragility is when $\Theta_A$ has lower fitness in all approachable equilibria. These two cases give sharp predictions about whether a small share of mutant-theory invaders might grow in size, across all equilibrium selections. A third possible case, where $\Theta_A$ has lower fitness than $\Theta_B$ in some but not all approachable equilibria, corresponds to a situation where the mutant theory may or may not grow in the society, depending on the equilibrium selection.

Definition 5.

Suppose there exists at least one approachable EZ with theories $\Theta_A, \Theta_B$, $p = (1, 0)$, and matching assortativity $\lambda$. Say $\Theta_A$ is evolutionarily stable [fragile] against $\Theta_B$ under $\lambda$-matching if in all such approachable EZ, $\Theta_A$ has a weakly higher [strictly lower] fitness than $\Theta_B$.

Analogously, suppose there exists at least one approachable EZ-SC with theories $\Theta_A, \Theta_B$, $p = (1, 0)$, and matching assortativity $\lambda$. Say $\Theta_A$ is evolutionarily stable [fragile] with strategic certainty against $\Theta_B$ under $\lambda$-matching if in all such approachable EZ-SC, $\Theta_A$ has a weakly higher [strictly lower] fitness than $\Theta_B$.

Before turning to particular applications, we discuss some general properties of the framework. Section 3.1 points out that correct specifications are stable against misspecifications in decision problems. Section 3.2 shows the framework's learning channel leads to new stability phenomena. Section 3.3 presents sufficient conditions for the existence and upper hemicontinuity of EZ-SCs. Section 3.4 outlines a learning foundation for EZs and EZ-SCs (with the details relegated to Appendix B).

We first show that in single-agent problems, evolutionary arguments will always favor a correctly specified theory over an incorrect one. The stage "game" is a decision problem if $(a_i, a_{-i}) \mapsto F^\bullet(a_i, a_{-i})$ only depends on $a_i$. In decision problems, the correctly specified theory is evolutionarily stable (with or without strategic certainty) against any other theory, except when there are identification issues. We adapt the notion of strong identification from Esponda and Pouzo (2016).

Definition 6. Theory $\Theta_A$ is strongly identified in EZ $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ if whenever $(\hat a_A', \hat a_B', \hat F'), (\hat a_A'', \hat a_B'', \hat F'') \in \Theta_A$ both solve

$$\min_{(\hat a_A, \hat a_B, \hat F) \in \Theta_A} \left\{ (\lambda + (1-\lambda) p_A) \cdot D_{KL}\big(F^\bullet(a_{AA}, a_{AA}) \,\|\, \hat F(a_{AA}, \hat a_A)\big) + (1-\lambda)(1-p_A) \cdot D_{KL}\big(F^\bullet(a_{AB}, a_{BA}) \,\|\, \hat F(a_{AB}, \hat a_B)\big) \right\},$$

we have $\hat F'(a_i, \hat a_A') = \hat F''(a_i, \hat a_A'')$ and $\hat F'(a_i, \hat a_B') = \hat F''(a_i, \hat a_B'')$ for all $a_i \in A$.

Theory $\Theta_A$ is strongly identified in EZ-SC $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ if whenever $(a_{AA}, a_{BA}, \hat F'), (a_{AA}, a_{BA}, \hat F'') \in \Theta_A$ are such that $\hat F', \hat F''$ both solve

$$\min_{\hat F \text{ s.t. } (a_{AA}, a_{BA}, \hat F) \in \Theta_A} \left\{ (\lambda + (1-\lambda) p_A) \cdot D_{KL}\big(F^\bullet(a_{AA}, a_{AA}) \,\|\, \hat F(a_{AA}, a_{AA})\big) + (1-\lambda)(1-p_A) \cdot D_{KL}\big(F^\bullet(a_{AB}, a_{BA}) \,\|\, \hat F(a_{AB}, a_{BA})\big) \right\},$$

we have $\hat F'(a_i, a_{AA}) = \hat F''(a_i, a_{AA})$ and $\hat F'(a_i, a_{BA}) = \hat F''(a_i, a_{BA})$ for all $a_i \in A$.

Proposition 1. Suppose the stage game is a decision problem. Let $\lambda$ and two theories $\Theta_A, \Theta_B$ be given, where $\Theta_A$ is correctly specified. Suppose there exists at least one approachable equilibrium zeitgeist [with strategic certainty] with $p_A = 1$, and $\Theta_A$ is strongly identified in all such equilibria. Then $\Theta_A$ is evolutionarily stable [with strategic certainty] under $\lambda$-matching against $\Theta_B$.

The result that a resident correct specification is immune to invasions from misspecifications echoes related results in Fudenberg and Lanzani (2020) and Frick, Iijima, and Ishii (2020). For the rest of the paper, we focus on stage games where multiple agents' actions jointly determine their payoffs.

A key feature of our theory-evolution framework is that each agent interprets her observations through the lens of her theory, thus drawing inferences about her environment (e.g., game parameters). These inferences, in turn, shape her preference over strategy profiles in the stage game. So, the learning channel endogenously determines the preferences that the adherents of different theories hold in the stage game. By contrast, the literature on preference evolution discussed in Section 1.1 precludes such inferences and endows each agent with a fixed preference.

We first show how preference evolution is embedded as a special case of our framework. We then explore the implications of the learning channel for evolutionary stability, showing that some novel stability phenomena can only arise with theory evolution, and not with preference evolution. Some of the results in our applications (e.g., Proposition 9) also show that predictions about evolutionary stability change drastically without the learning channel.

A theory $\Theta$ is called a singleton if $\Theta = A^2 \times \{F\}$ for some $F : A^2 \to \Delta(Y)$. An agent with a singleton theory does not entertain fundamental uncertainty: she is sure that the stage game is described by $F$. We can view every singleton theory as a subjective utility function in the stage game. That is, we define $(a_i, a_{-i}) \mapsto U_i(a_i, a_{-i}; F)$ with $U_i(a_i, a_{-i}; F) := E_{y \sim F(a_i, a_{-i})}[\pi(y)]$. An EZ-SC in a society where all agents have singleton theories corresponds to an equilibrium in a setting with preference evolution. The adherents of $\Theta_g$ hold the subjective preference $U_i(\cdot, \cdot; F_g)$ in the stage game, and all agents maximize their subjective preferences in all match types. To see this, suppose $\Theta_A = A^2 \times \{F_A\}$, $\Theta_B = A^2 \times \{F_B\}$ are singleton theories. If $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, (p), \lambda, (a))$ is an EZ-SC, then $\mu_A$ must put probability 1 on $(a_{AA}, a_{BA}, F_A)$ and $\mu_B$ must put probability 1 on $(a_{AB}, a_{BB}, F_B)$, so for every $g, g' \in \{A, B\}$, $a_{g,g'}$ satisfies $a_{g,g'} \in \arg\max_{\hat a \in A} U_i(\hat a, a_{g',g}; F_g)$. We work with strategic certainty for ease of comparison with the literature on preference evolution, as that literature typically assumes agents correctly know others' behavior in equilibrium.

In a society with matching assortativity $\lambda$, an adherent of a theory with population proportion $p_g$ is matched up with someone from the same group with probability $\lambda + (1-\lambda) p_g$. This matching probability is an increasing and linear function in each of $\lambda$ and $p_g$. Suppose the two subjective preferences $U_i(\cdot, \cdot; F_A)$ and $U_i(\cdot, \cdot; F_B)$ associated with the two singleton theories $\Theta_A$ and $\Theta_B$ in a society induce a unique equilibrium in matches between groups $g$ and $g'$ for all $g, g' \in \{A, B\}$. Then, the fitness of each theory changes linearly as we change the matching assortativity or population shares. This linearity underlies the key distinction between preference evolution and theory evolution.

Every non-singleton theory may be thought of as a set of preferences over stage game strategy profiles, viewing each feasible conjecture about the stage game $F : A^2 \to \Delta(Y)$ as one such preference. As matching assortativity or population shares change, each agent encounters a different distribution over opponent strategies. This may lead a misspecified agent to draw a different inference about the stage game parameters and may change the agent's best-response function. By contrast, in a world of preference evolution, a game between two agents with a given pair of subjective preferences always plays out in the same way, regardless of the social composition or matching assortativity of the larger society where the game takes place.

We exhibit two stability phenomena that only happen with non-singleton theories.
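The linearity claim for singleton theories can be made concrete with a minimal Python sketch. It assumes, as in the paragraph above, that each match type has a unique equilibrium, so a group's payoff against each opponent type is a fixed number. The function names and the payoff values 3.0 and 1.0 are hypothetical and chosen only for illustration.

```python
def same_group_prob(lam: float, p_g: float) -> float:
    """Probability that a group-g agent is matched with another group-g agent."""
    return lam + (1.0 - lam) * p_g


def fitness(lam: float, p_g: float, payoff_vs_own: float, payoff_vs_other: float) -> float:
    """Average payoff of group g when payoffs in each match type are fixed.

    With singleton theories (fixed subjective preferences) and a unique
    equilibrium per match type, the two conditional payoffs do not depend
    on lam or p_g, so fitness is affine in each of them.
    """
    q = same_group_prob(lam, p_g)
    return q * payoff_vs_own + (1.0 - q) * payoff_vs_other


if __name__ == "__main__":
    # Hypothetical conditional payoffs: 3.0 against own group, 1.0 against the other group.
    for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(lam, fitness(lam, p_g=0.2, payoff_vs_own=3.0, payoff_vs_other=1.0))
```

With a non-singleton theory the two conditional payoffs would themselves depend on `lam` and `p_g` through the inferred model, which is exactly what breaks this linearity.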
Stability reversal refers to a strong kind of multiplicity in the relative stability of two theories $\Theta_A$ and $\Theta_B$ under uniform matching. Recall that in an EZ-SC, the fitness of a theory is the objective expected payoff of its adherents, where this expectation averages across expected payoffs in matches against each of the two groups. Let a theory's conditional fitness against group $g$ refer to the expected payoff of the theory's adherents in matches against group $g$.

Definition 7. Two theories $\Theta_A, \Theta_B$ exhibit stability reversal if (i) in every EZ-SC with $\lambda = 0$ and $(p_A, p_B) = (1, 0)$, $\Theta_A$ has strictly higher conditional fitness than $\Theta_B$ against group A opponents and against group B opponents, but also (ii) in every EZ-SC with $\lambda = 0$ and $(p_A, p_B) = (0, 1)$, $\Theta_B$ has strictly higher fitness than $\Theta_A$.

If at least one EZ-SC is approachable with $\lambda = 0$, $(p_A, p_B) = (1, 0)$, then part (i) of this definition is stronger than requiring $\Theta_A$ to be evolutionarily stable with strategic certainty against $\Theta_B$. It imposes the more stringent condition that $\Theta_A$ outperforms $\Theta_B$ not only on average, but also conditional on the opponent's group. The linearity of fitness in population share discussed above then implies that stability reversal cannot take place if both theories are singletons (i.e., if we are in the world of preference evolution).

Proposition 2. Two singleton theories (i.e., two subjective preferences in the stage game) cannot exhibit stability reversal in any stage game.

Stability reversal is unique to the world of theory evolution. For an example, consider a two-player investment game where player $i$ chooses an investment level $a_i \in \{1, 2\}$. A random productivity level $P$ is realized according to $P = b^\bullet (a_i + a_{-i}) + \epsilon$, where $\epsilon$ is a zero-mean noise term and $b^\bullet > 0$. Player $i$ gets $a_i \cdot P - \mathbf{1}_{\{a_i = 2\}} \cdot c$. So $P$ determines the marginal return on investment, and $c > 0$ is the cost of choosing the higher investment level. The consequence observed at the end of a match is $y = (a_i, a_{-i}, P)$. The payoff matrix below displays the objective expected payoffs for different investment profiles.

$$\begin{array}{c|cc} & 1 & 2 \\ \hline 1 & 2b^\bullet,\ 2b^\bullet & 3b^\bullet,\ 6b^\bullet - c \\ 2 & 6b^\bullet - c,\ 3b^\bullet & 8b^\bullet - c,\ 8b^\bullet - c \end{array}$$

Condition 1. $5b^\bullet < c < 6b^\bullet$.

Condition 1 ensures that $a_i = 1$ is a strictly dominant strategy in the stage game, and the investment profile (2,2) Pareto dominates the investment profile (1,1). Higher investment has a positive externality as it also increases the opponent's productivity.

Consider two theories in the society. Theory $\Theta_A$ is a correctly specified singleton: its adherents understand how investment profiles translate into distributions over productivity. Theory $\Theta_B$ wrongly stipulates $P = b(x_i + x_{-i}) - m + \epsilon$, where $m > 0$ is a fixed parameter of the theory and $b \in \mathbb{R}$ is a parameter that the adherents infer. We require the following condition, which is satisfied whenever $m > 0$ is large enough, that is, whenever $\Theta_B$ is sufficiently misspecified.

Condition 2. $c < 4b^\bullet + \frac{m}{3}$ and $c < 5b^\bullet + \frac{m}{4}$.

We show that in contrast to the impossibility result when all theories are singletons, in this example theories $\Theta_A$ and $\Theta_B$ exhibit stability reversal.

Example 1. In the investment game, under Condition 1 and Condition 2, $\Theta_A$ and $\Theta_B$ exhibit stability reversal.

The idea is that the adherents of $\Theta_B$ overestimate the complementarity of investments, and this overestimation is more severe when they face data generated from lower investment profiles. As a result, the match between $\Theta_A$ and $\Theta_B$ plays out in a different way depending on which theory is resident: it results in the investment profile (1, 2) when $\Theta_A$ is resident, but results in (1, 1) when $\Theta_B$ is resident.

Let $b^*(a_i, a_{-i})$ solve $\min_{b \in \mathbb{R}} D_{KL}\big(F^\bullet(a_i, a_{-i}) \,\|\, \hat F(a_i, a_{-i}; b, m)\big)$, where $F^\bullet(a_i, a_{-i})$ is the objective distribution over observations under the investment profile $(a_i, a_{-i})$, and $\hat F(a_i, a_{-i}; b, m)$ is the distribution under the same investment profile in the model where productivity is given by $P = b(x_i + x_{-i}) - m + \epsilon$. We find that

$$b^*(a_i, a_{-i}) = b^\bullet + \frac{m}{a_i + a_{-i}}.$$

That is, adherents of $\Theta_B$ end up with different beliefs about the game parameter $b$ depending on the behavior of their typical opponents, which in turn affects how they respond to different rival investment levels. Stability reversal hinges on the fact that when $\Theta_A$ is resident and the adherents of $\Theta_B$ always meet opponents who play $a_i = 1$, they end up with a more distorted belief about the fundamental than when $\Theta_B$ is resident.

In this example, stability reversal happens because the misspecified agents hold different beliefs about a stage-game parameter depending on which theory is resident. Also, note the stage game involves non-trivial strategic interaction between the players: the complementarity in investment levels implies an agent's best response may vary with the rival's strategy. Both of these turn out to be necessary conditions for stability reversal in general stage games.

Definition 8.

A theory $\Theta$ is strategically independent if for all $\mu \in \Delta(\Theta)$, $\arg\max_{a_i \in A} E_{F \sim \mu}[U_i(a_i, a_{-i}; F)]$ is the same for every $a_{-i} \in A$.

The adherents of a strategically independent theory believe that while the opponent's action may affect their utility, it does not affect their best response.
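Returning to the investment example, the closed form $b^* = b^\bullet + m/(a_i + a_{-i})$ can be checked numerically. The sketch below is a rough check rather than the paper's derivation. It assumes the noise term is normal with the same known variance under the truth and under the model, so the KL divergence reduces to a squared difference in means. The source only says the noise has zero mean. The values `b_true = 1.0` and `m = 6.0` and all function names are hypothetical.

```python
def kl_normal_same_var(mu_true: float, mu_model: float, sigma: float = 1.0) -> float:
    """KL divergence between N(mu_true, sigma^2) and N(mu_model, sigma^2)."""
    return (mu_true - mu_model) ** 2 / (2.0 * sigma ** 2)


def b_star_closed_form(b_true: float, m: float, a_i: int, a_j: int) -> float:
    """Closed form stated in the text: b* = b_true + m / (a_i + a_j)."""
    return b_true + m / (a_i + a_j)


def b_star_numeric(b_true: float, m: float, a_i: int, a_j: int,
                   lo: float = -100.0, hi: float = 100.0, iters: int = 200) -> float:
    """Minimize the KL objective over b by ternary search (the objective is convex in b)."""
    total = a_i + a_j

    def objective(b: float) -> float:
        # True mean of P is b_true * total; the misspecified model's mean is b * total - m.
        return kl_normal_same_var(b_true * total, b * total - m)

    for _ in range(iters):
        x1 = lo + (hi - lo) / 3.0
        x2 = hi - (hi - lo) / 3.0
        if objective(x1) < objective(x2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    b_true, m = 1.0, 6.0  # hypothetical parameter values
    for profile in [(1, 1), (2, 1), (2, 2)]:
        print(profile, b_star_numeric(b_true, m, *profile), b_star_closed_form(b_true, m, *profile))
```

Under these assumptions the inferred $b$ is largest at the lowest investment profile, in line with the statement that the overestimation is more severe for data generated from lower investment profiles.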

Proposition 3. In any stage game, suppose $\Theta_A, \Theta_B$ exhibit stability reversal and $\Theta_A$ is the correctly specified singleton theory and $\Theta_B$ has the product structure. Then, the beliefs that the adherents of $\Theta_B$ hold in all EZ-SCs with $p = (1, 0)$ and the beliefs they hold in all EZ-SCs with $p = (0, 1)$ form disjoint sets. Also, $\Theta_B$ is not strategically independent.

The first claim of Proposition 3 shows that stability reversal must operate through the learning channel. So in particular, it cannot happen if the group B agents simply have a different subjective preference in the stage game. The second claim shows that stability reversal can only happen if the misspecified agents respond differently to different rival play. In particular, it cannot happen in decision problems.

3.2.2 Non-Monotonic Stability in Matching Assortativity

We now turn to the role of matching assortativity on the stability of theories. In the world of preference evolution, the linearity of fitness in matching assortativity discussed before implies that if a theory $\Theta_A$ is evolutionarily stable with strategic certainty against a theory $\Theta_B$ both under uniform matching ($\lambda = 0$) and perfectly assortative matching ($\lambda = 1$), then the same must also hold under any intermediate level of assortativity $\lambda \in (0, 1)$.

Proposition 4. Suppose $\Theta_A, \Theta_B$ are singleton theories (i.e., subjective preferences in the stage game) and $\Theta_A$ is evolutionarily stable with strategic certainty against $\Theta_B$ with $\lambda$-matching for both $\lambda = 0$ and $\lambda = 1$. Then, $\Theta_A$ is also evolutionarily stable with strategic certainty against $\Theta_B$ with $\lambda$-matching for any $\lambda \in [0, 1]$.

This result does not always hold with general non-singleton theories. We use an example to show that stability need not be monotonic in matching assortativity. In this example, a correctly specified singleton theory is evolutionarily stable with strategic certainty against another misspecified theory both when $\lambda = 0$ and when $\lambda = 1$, but it is also evolutionarily fragile with strategic certainty for some intermediate values of $\lambda$.

Consider a stage game where each player chooses an action from $\{a_1, a_2, a_3\}$. Every player then receives a random prize, $y \in \{g, b\}$, which are worth utilities $\pi(g) = 1$, $\pi(b) = 0$. The payoff matrix below displays the objective expected utilities associated with different action profiles, which also correspond to the probabilities that the row and column players receive the good prize $g$.

[Payoff matrix with rows and columns $a_1, a_2, a_3$; entries […]]

Let $\Theta_A$ be the correctly specified singleton theory. The action a[…] is strictly dominant under the objective payoffs, so an adherent of $\Theta_A$ always plays a[…] in all matches. Let $\Theta_B$ be a misspecified theory $\Theta_B = A^2 \times \{F_H, F_L\}$. Each model $F_H, F_L$ stipulates that the prize $g$ is generated according to the probabilities in the following table, where $b$ and $c$ are parameters that depend on the model. The model $F_H$ has $(b, c) = ($[…]$)$ and $F_L$ has $(b, c) = ($[…]$)$.

[Table of model-implied probabilities of prize $g$, with rows and columns $a_1, a_2, a_3$ and entries depending on $b$ and $c$; entries […]]

Example 2. In this stage game, $\Theta_A$ is evolutionarily stable with strategic certainty against $\Theta_B$ under $\lambda$-matching when $\lambda = 0$ and $\lambda = 1$, but it is also evolutionarily fragile with strategic certainty under $\lambda$-matching when $\lambda \in (\lambda_l, \lambda_h)$, where $0 < \lambda_l < \lambda_h < 1$ […].

[…] $\Theta_B$. If they believe in $F_H$, they will play the action profile (a[…], a[…]) and generate the objective payoff profile (0.4, 0.4) […] (a[…], a[…]). The problem is that the data generated from the (a[…], a[…]) profile provides a better fit for $F_L$ than $F_H$, since the objective 40% probability of getting prize $g$ is closer to $F_L$'s conjecture of 10% than $F_H$'s conjecture of 80%. A belief in $F_H$, and hence the profile (a[…], a[…]), cannot be sustained if the mutants only play each other. On the other hand, when an adherent of $\Theta_B$ plays a correctly specified $\Theta_A$ adherent, both models $F_H$ and $F_L$ prescribe a best response of a[…] against the $\Theta_A$ adherent's play a[…]. The data generated from the (a[…], a[…]) profile lead biased agents to the model $F_H$ that enables cooperative behavior within the mutant community. But these matches against correctly specified opponents harm the mutants' welfare, as they only get an objective payoff of 0.2.

Therefore, the most advantageous interaction structure for the mutants is one where they can calibrate the model $F_H$ using the data from matches against correctly specified opponents, then extrapolate this optimistic belief about $b$ to coordinate on (a[…], a[…]) in matches against fellow mutants. This requires the mutants to match with intermediate assortativity. Figure 1 depicts the equilibrium fitness of the mutant theory $\Theta_B$ as a function of assortativity. While payoffs of $\Theta_B$ adherents increase in $\lambda$ at first, eventually they drop when mutant-vs-mutant matches become sufficiently frequent that a belief in $F_H$ can no longer be sustained. The preference evolution framework does not allow this non-linear and even non-monotonic change in fitness with respect to $\lambda$, which the theory evolution framework accommodates.
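The lower end of the fragile region can be illustrated with a short Python sketch. It uses only three numbers that appear in the surrounding text: a mutant who infers $F_H$ earns 0.4 against fellow mutants and 0.2 against residents, and the resident's fitness is 0.25. It assumes $p_B \to 0$, so a mutant meets another mutant with probability $\lambda$. The function names are hypothetical. The sketch does not compute the upper cutoff $\lambda_h$, which depends on when the belief in $F_H$ stops being KL-minimizing and therefore on model parameters not reproduced here.

```python
RESIDENT_FITNESS = 0.25       # resident theory's EZ-SC fitness, from the Figure 1 caption
MUTANT_VS_MUTANT = 0.4        # objective payoff when two F_H-believing mutants meet
MUTANT_VS_RESIDENT = 0.2      # objective payoff of a mutant against a resident


def mutant_fitness_under_FH(lam: float) -> float:
    """Mutant fitness when p_B -> 0 and the mutants' inference is F_H.

    With p_B -> 0 the same-group matching probability lam + (1 - lam) * p_B
    reduces to lam.
    """
    return lam * MUTANT_VS_MUTANT + (1.0 - lam) * MUTANT_VS_RESIDENT


def lower_cutoff() -> float:
    """Assortativity at which the mutant's F_H fitness equals the resident's fitness."""
    return (RESIDENT_FITNESS - MUTANT_VS_RESIDENT) / (MUTANT_VS_MUTANT - MUTANT_VS_RESIDENT)


if __name__ == "__main__":
    print("cutoff:", lower_cutoff())
    for lam in (0.0, 0.2, 0.3, 0.5):
        f = mutant_fitness_under_FH(lam)
        print(lam, f, "mutant ahead" if f > RESIDENT_FITNESS else "resident ahead")
```

Above this cutoff the mutants outperform the residents for as long as the inference $F_H$ can be sustained, which gives the linearly increasing segment described in the Figure 1 caption.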
We provide a few technical results about the existence of EZ-SC and the upper hemicontinuity of the set of EZ-SC with respect to population share. The existence and continuity results also establish the existence of approachable EZ-SCs with population shares $p = (1, 0)$.

[Figure 1 placeholder. Plot title: "Misspecified Theory's Fitness in EZ-SC". Horizontal axis: assortativity $\lambda$, from 0 to 1. Vertical axis: theory B's fitness. Legend: infer $F_H$, infer $F_L$, resident's fitness.]

Figure 1: The EZ-SC fitness of $\Theta_B$ for different values of matching assortativity $\lambda$ when $p_B = 0$. (The EZ-SC fitness of the resident theory $\Theta_A$ is always 0.25.) In the blue region, there is a unique EZ-SC where the adherents of $\Theta_B$ infer $F_H$ and receive linearly increasing average payoffs across all matches as $\lambda$ increases. In the red region, there is an EZ-SC where the adherents of $\Theta_B$ infer $F_L$ and receive payoff 0.2 in all matches, regardless of $\lambda$.

One could relax the requirement that agents hold correct beliefs about others' play and prove analogous results for EZ instead of EZ-SC, but this result is not needed in the subsequent applications as we will only consider strategic uncertainty in examples where we can explicitly characterize the entire set of EZ for every population share $p$.

Let two theories, $\Theta_A, \Theta_B$, be fixed, where $\Theta_A = A^2 \times \mathcal{F}_A$ and $\Theta_B = A^2 \times \mathcal{F}_B$ have product structures. Also fix population shares $p$ and matching assortativity $\lambda$. For $\mu \in \Delta(\Theta_A) \cup \Delta(\Theta_B)$ and $a_i, a_{-i} \in A$, let $U_i(a_i, a_{-i}; \mu) := E_{F \sim \mu}\left[ E_{y \sim F(a_i, a_{-i})}(\pi(y)) \right]$ be the subjective expected utility of playing $a_i$ against $a_{-i}$, under the belief $\mu$ over models. Let $U_A : A^2 \times \Theta_A \to \mathbb{R}$ be such that $U_A(a_i, a_{-i}; F) = U_i(a_i, a_{-i}; \delta_F)$ and let $U_B : A^2 \times \Theta_B \to \mathbb{R}$ be such that $U_B(a_i, a_{-i}; F) = U_i(a_i, a_{-i}; \delta_F)$.

Assumption 1. $A, \Theta_A, \Theta_B$ are compact metrizable spaces.

Assumption 2. $U_A, U_B$ are continuous.

Assumption 3. For every $F \in \Theta_A \cup \Theta_B$ and $a_i, a_{-i} \in A$, $KL(F^\bullet(a_i, a_{-i}) \,\|\, F(a_i, a_{-i}))$ is well-defined and finite.

Under Assumption 3, we have the well-defined functions $K_A : A^2 \times \Theta_A \to \mathbb{R}_+$ and $K_B : A^2 \times \Theta_B \to \mathbb{R}_+$, where $K_g(a_i, a_{-i}; F) := KL(F^\bullet(a_i, a_{-i}) \,\|\, F(a_i, a_{-i}))$.

Assumption 4. $K_A$ and $K_B$ are continuous.

Assumption 5. $A$ is convex and, for all $a_{-i} \in A$ and $\mu \in \Delta(\Theta_A) \cup \Delta(\Theta_B)$, $a_i \mapsto U_i(a_i, a_{-i}; \mu)$ is quasiconcave.

We show existence of EZ-SC using the Kakutani-Fan-Glicksberg fixed point theorem, applied to the correspondence which maps strategy profiles and beliefs over models into best replies and beliefs over KL-divergence minimizing models. We start with a lemma.

Lemma 1. For $g \in \{A, B\}$, $a = (a_{AA}, a_{AB}, a_{BA}, a_{BB}) \in A^4$, and $0 \le m_g \le 1$, let

$$\Theta_g^*(a, m_g) := \arg\min_{\hat F \in \Theta_g} \left\{ m_g \cdot D_{KL}\big(F^\bullet(a_{g,g}, a_{g,g}) \,\|\, \hat F(a_{g,g}, a_{g,g})\big) + (1 - m_g) \cdot D_{KL}\big(F^\bullet(a_{g,-g}, a_{-g,g}) \,\|\, \hat F(a_{g,-g}, a_{-g,g})\big) \right\}.$$

Then, $\Theta_g^*$ is upper hemicontinuous in its arguments.

This lemma says the set of KL-minimizing models is upper hemicontinuous in strategy profile and matching assortativity. This leads to the existence result.
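As a rough illustration of the object in Lemma 1, the following Python sketch computes the set of weighted-KL minimizers over a finite model set for a fixed strategy profile, as the weight $m_g$ on own-group matches varies. Everything numerical here is hypothetical and not taken from the paper's examples: outcomes are binary, the objective success probabilities are 0.4 in own-group matches and 0.2 in cross-group matches, and the two models are labeled "H" and "L".

```python
import math


def kl_bernoulli(p_true: float, q_model: float) -> float:
    """KL divergence D_KL(Bernoulli(p_true) || Bernoulli(q_model))."""
    return (p_true * math.log(p_true / q_model)
            + (1.0 - p_true) * math.log((1.0 - p_true) / (1.0 - q_model)))


# Hypothetical objective success probabilities at the fixed strategy profile.
TRUE_OWN, TRUE_CROSS = 0.4, 0.2

# Hypothetical models: name -> (predicted probability in own-group matches,
#                               predicted probability in cross-group matches).
MODELS = {"H": (0.8, 0.2), "L": (0.1, 0.4)}


def weighted_kl(name: str, m_g: float) -> float:
    """Objective from Lemma 1: m_g * KL(own-group data) + (1 - m_g) * KL(cross-group data)."""
    q_own, q_cross = MODELS[name]
    return (m_g * kl_bernoulli(TRUE_OWN, q_own)
            + (1.0 - m_g) * kl_bernoulli(TRUE_CROSS, q_cross))


def theta_star(m_g: float, tol: float = 1e-12) -> list:
    """Names of all models that minimize the weighted KL objective (ties within tol)."""
    values = {name: weighted_kl(name, m_g) for name in MODELS}
    best = min(values.values())
    return sorted(name for name, v in values.items() if v <= best + tol)


if __name__ == "__main__":
    for m_g in (0.0, 0.3, 0.6, 0.9, 1.0):
        print(m_g, theta_star(m_g))
```

In this toy setup the minimizer switches from "H" to "L" as own-group matches receive more weight. At the switching weight both models are minimizers, so the correspondence is upper but not lower hemicontinuous there.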

Proposition 5.

Under Assumptions 1, 2, 3, 4, and 5, an EZ-SC exists.

Next, upper hemicontinuity in $m_g$ in Lemma 1 allows us to deduce the upper hemicontinuity of the EZ-SC correspondence in population shares, and conclude that the notion of approachability from Definition 4 is a non-empty refinement of the set of EZ-SC with $p = (1, 0)$.

Proposition 6. Fix two theories $\Theta_A, \Theta_B$ where $\Theta_A = A^2 \times \mathcal{F}_A$ and $\Theta_B = A^2 \times \mathcal{F}_B$. Also fix matching assortativity $\lambda \in [0, 1]$. The set of EZ-SC is an upper hemicontinuous correspondence in $p_B$ under Assumptions 1, 2, 3, and 4.

Corollary 1. Under Assumptions 1, 2, 3, 4, and 5, the set of approachable EZ-SC with $p = (1, 0)$ is non-empty for every $\lambda$.

Appendix B provides a learning foundation for our equilibrium concepts by showing that it is not possible for behavior and beliefs to stabilize at any outcome other than an EZ or an EZ-SC. We summarize it here, omitting the technical assumptions necessary for the result.

A continuum of long-lived agents are endowed with one of two theories at time 0, and match in every period to play the stage game. Each agent starts with a full-support prior belief over the models in her theory, believing her environment to be stationary. When matched with an opponent, the agent observes the opponent's group, then chooses a strategy $a_i \in A$, and finally observes a consequence $y_i \in Y$ and possibly also a signal $x_i$ about the matched opponent's strategy $a_{-i}$ at the end of the game. She then updates her belief using Bayes' rule. As models include conjectures about others' play and conjectures about the fundamental parameters, agents may make inferences about the game parameters using opponents' strategy choice through a correlated prior over strategic uncertainty and fundamental uncertainty.

We do not require agents to act myopically and only assume that they use asymptotically myopic policies: they eventually choose actions that are $\epsilon$-myopic best responses to the Bayesian posterior belief about the environment. We provide this foundation for the case of games with finite strategy spaces, but conjecture a similar argument would extend to games with compact infinite strategy spaces given some uniformity conditions.

The learning foundation clarifies the difference between EZ and EZ-SC. The former emerges without the additional ex-post signals about the opponent's play. The latter emerges when the ex-post signals are sufficiently informative.
The idea is that a wrong conjecture about the opponent's play then leads to higher KL divergence than the correct beliefs about the opponent's play combined with any feasible belief about the fundamentals. So, agents must hold correct beliefs about others' play if beliefs and behavior converge to a steady state.

We apply our framework to study the stability of misperceptions of the information structure in linear quadratic normal (LQN) games. LQN games have been used as a tractable workhorse model for studying comparative statics of equilibrium outcomes with respect to changes in information (e.g., Bergemann and Morris (2013)). In this application, we exploit the same tractability to study the evolutionary stability of correct beliefs about the information structure against misspecifications, in particular misspecifications about the correlation in information between different players. Assuming that agents know others' strategies, the key conclusion is that a society of rational residents with correct beliefs about how private signals are correlated is evolutionarily fragile against misspecified mutants who suffer from either correlation neglect or projection bias. The type of bias that gets selected depends on the matching assortativity $\lambda$ in the society.

In the LQN setup we consider, we interpret the players as competing firms that possess correlated private information about market demand. At the start of the stage game, Nature's type (i.e., a demand state) $\omega$ is drawn from $N(0, \sigma_\omega^2)$, where $N(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$. Each of the two players $i$ (i.e., firms) receives a private signal $s_i = \omega + \epsilon_i$, then chooses an action $q_i \in \mathbb{R}$ (i.e., a quantity). Market price is then realized according to $P = \omega - r^\bullet \cdot \frac{1}{2}(q_1 + q_2) + \zeta$, where $\zeta \sim N(0, (\sigma_\zeta^\bullet)^2)$ is an idiosyncratic price shock that is independent of all the other random variables. Firm $i$'s profit in the game is $q_i P - \frac{1}{2} q_i^2$.
The stage game is parametrized by the strictly positive terms $\sigma_\omega^2$, $r^\bullet$, and $(\sigma_\zeta^\bullet)^2$, which represent variance in market demand, the elasticity of market price with respect to average quantity supplied, and the variance of price shocks. These parameters remain constant through all matches. But in every match, demand state $\omega$, signals $(s_i)$, and price shock $\zeta$ are redrawn, independently across matches. The environment can be interpreted as a market with daily fluctuations in demand, but the fluctuations are generated according to a fixed set of fundamental parameters.

In the LQN game, market prices and quantity choices may be positive or negative. To interpret, when $P > 0$, the market pays for each unit of good supplied, and market price decreases in total supply. When $P < 0$, the market pays for disposal of the good. Firms make money by submitting negative quantities, which represent offers to remove the good from the market. The per-unit disposal fee decreases as the firms offer to dispose more. The cost $\frac{1}{2} q_i^2$ represents either a convex production cost or a convex disposal cost, depending on the sign of $q_i$.

We now turn to the information structure of the stage game, that is, the joint distribution of $(\omega, s_i, s_{-i})$. The firms' signals $s_i = \omega + \epsilon_i$ are conditionally correlated given $\omega$. The error terms $\epsilon_i$ are generated by

$$\epsilon_i = \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, z + \frac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, \eta_i,$$

where $\eta_i \sim N(0, \sigma_\epsilon^2)$ is the idiosyncratic component of the error generated i.i.d. across $i$, and $z \sim N(0, \sigma_\epsilon^2)$ is the common component for both $i$. Here, $\kappa \in [0, 1]$ parametrizes the conditional correlation of the two firms' signals. Higher $\kappa$ leads to an information structure with higher conditional correlation. When $\kappa = 0$, $s_i$ and $s_{-i}$ are conditionally uncorrelated given the state (though still unconditionally correlated since both depend on $\omega$). When $\kappa = 1$, we always have $s_i = s_{-i}$. The functional form of $\epsilon_i$ ensures the variance of the signals $\mathrm{Var}(s_i)$ remains constant across all possible values of $\kappa$.

We consider a family of misspecifications about the information structure parametrized by misperceptions of $\kappa$. The objective information structure is given by $\kappa = \kappa^\bullet$. Note that a misspecified information structure associated with a wrong $\kappa$ leads to a higher-order misspecification about the state $\omega$ in the stage game. Suppose agents are correct about the distributions of $\omega$, $\eta_i$, and $z$. Write $E_\kappa$ for expectation under the information structure with correlational parameter $\kappa$. Then $E_\kappa[\omega \mid s_i]$ is the same for all $\kappa$. In particular, even an agent who believes in some $\kappa \neq \kappa^\bullet$ makes a correct first-order inference about the expectation of the market demand, given her own information. But, one can show (Lemma 2) there exists a strictly increasing and strictly positive function $\psi(\kappa)$ so that $E_\kappa[s_{-i} \mid s_i] = \psi(\kappa) \cdot s_i$ for all $s_i \in \mathbb{R}$, $\kappa \in [0, 1]$. The misspecified agent holds a wrong belief about the rival's signal, and thus a wrong belief about the rival's belief about $\omega$.

Many experiments have found that subjects do not form accurate beliefs about the beliefs of others. We draw a connection between the misperception we study and the statistical biases that have been previously documented:

Definition 9. Let $\tilde\kappa$ be a player's perceived $\kappa$. A player suffers from correlation neglect if $\tilde\kappa < \kappa^\bullet$. A player suffers from projection bias if $\tilde\kappa > \kappa^\bullet$.

Under correlation neglect, agents believe signals are more independent from one another than they really are. Under projection bias, agents "project" their own information onto others and exaggerate the similarity between others' signals and their own signals. We are agnostic about the origin of these misspecifications about correlation. They may be psychological in nature and come directly from the agents' cognitive biases, or they could be driven by more complex mechanisms. We instead ask whether such misspecifications could persist in the society once they appear.
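The signal structure described above can be explored with a small Python sketch. The closed form used for $\psi(\kappa)$ below is not quoted from Lemma 2. It is the regression coefficient $\mathrm{Cov}(s_i, s_{-i}) / \mathrm{Var}(s_i)$ implied by joint normality and the stated error decomposition, so treat it as an assumption of this sketch. The parameter values $\sigma_\omega = \sigma_\epsilon = 1$ and all function names are hypothetical.

```python
import math
import random


def weights(kappa: float):
    """Weights on the common shock z and the idiosyncratic shock eta_i."""
    norm = math.sqrt(kappa ** 2 + (1.0 - kappa) ** 2)
    return kappa / norm, (1.0 - kappa) / norm


def error_variance(kappa: float, sigma_eps: float) -> float:
    """Var(eps_i); the weights are normalized so this equals sigma_eps^2 for every kappa."""
    w_z, w_eta = weights(kappa)
    return (w_z ** 2 + w_eta ** 2) * sigma_eps ** 2


def psi(kappa: float, sigma_omega: float, sigma_eps: float) -> float:
    """Coefficient in E[s_{-i} | s_i] = psi * s_i, computed as Cov(s_i, s_{-i}) / Var(s_i)."""
    w_z, _ = weights(kappa)
    cov = sigma_omega ** 2 + (w_z ** 2) * sigma_eps ** 2
    return cov / (sigma_omega ** 2 + sigma_eps ** 2)


def simulated_slope(kappa: float, sigma_omega: float, sigma_eps: float,
                    n: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the slope of s_{-i} on s_i (all variables have mean zero)."""
    rng = random.Random(seed)
    w_z, w_eta = weights(kappa)
    sxy = sxx = 0.0
    for _ in range(n):
        omega = rng.gauss(0.0, sigma_omega)
        z = rng.gauss(0.0, sigma_eps)
        s_i = omega + w_z * z + w_eta * rng.gauss(0.0, sigma_eps)
        s_j = omega + w_z * z + w_eta * rng.gauss(0.0, sigma_eps)
        sxy += s_i * s_j
        sxx += s_i * s_i
    return sxy / sxx


if __name__ == "__main__":
    for kappa in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(kappa, error_variance(kappa, 1.0), psi(kappa, 1.0, 1.0))
    print("simulated slope at kappa = 0.5:", simulated_slope(0.5, 1.0, 1.0))
```

In this sketch a player with perceived $\tilde\kappa$ below the true value uses a smaller coefficient than the objective one when predicting the rival's signal (correlation neglect), and a larger one under projection bias. Her own-signal inference about $\omega$ is unaffected because $\mathrm{Var}(s_i)$ does not depend on $\kappa$.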

We translate the environment described above into the formalism from Section 2.

A strategy in the stage game is a function $Q_i : \mathbb{R} \to \mathbb{R}$ that assigns a quantity $Q_i(s_i)$ to every signal $s_i$. The strategy is called linear if there exists an $\alpha_i \ge 0$ so that $Q_i(s_i) = \alpha_i s_i$ for every $s_i \in \mathbb{R}$. We will later show that the best response to any linear strategy is linear, regardless of the agent's belief about the correlation parameter and market price elasticity (Lemma 3). We therefore restrict attention to linear strategies and let $A = [0, \bar M_\alpha]$ for some $\bar M_\alpha < \infty$, where a typical element $\alpha_i \in A$ corresponds to the linear strategy with coefficient $\alpha_i$.

Footnote: For example, Hansen, Misra, and Pai (2021) show that multiple agents simultaneously conducting algorithmic price experiments in the same market may generate correlated information which gets misinterpreted as independent information, a form of correlation neglect for firms. Goldfarb and Xiao (2019) structurally estimate a model of thinking cost and find that bar owners over-extrapolate the effect of today's weather shock on future profitability.

We suppose all parameters of the stage game are common knowledge except for $r^\bullet$, $\kappa^\bullet$, and $\sigma_\zeta^\bullet$. To investigate the evolutionary implications of higher-order misspecifications about the state, we consider theories that are dogmatic and possibly wrong about $\kappa$, but allow agents to make inferences about $r$ and $\sigma_\zeta$. We let the space of consequences be $Y = \mathbb{R}^3$, where each consequence $y = (s_i, q_i, P)$ shows the agent's signal, quantity choice, and the market price. The consequence $y$ delivers the utility $\pi(y) := q_i P - \frac{1}{2} q_i^2$. We consider theories with the product structure $\Theta(\kappa) := A^2 \times \mathcal{F}_\kappa$, where $\mathcal{F}_\kappa := \{F_{r,\kappa,\sigma_\zeta} : r \in [0, \bar M_r], \sigma_\zeta \in [0, \bar M_{\sigma_\zeta}]\}$ for some $\bar M_r, \bar M_{\sigma_\zeta} < \infty$. So $\mathcal{F}_\kappa$ is a set of conjectures of the game environment indexed by the parameters $(r, \kappa, \sigma_\zeta)$, but all reflecting a dogmatic belief in the correlation parameter $\kappa$.
Each $F_{r,\kappa,\sigma_\zeta} : \mathbb{A} \times \mathbb{A} \to \Delta(\mathbb{Y})$ is such that $F_{r,\kappa,\sigma_\zeta}(\alpha_i, \alpha_{-i})$ gives the distribution over $i$'s consequences in a stage game with parameters $(r, \kappa, \sigma_\zeta)$, when $i$ uses the linear strategy $\alpha_i$ against an opponent using the linear strategy $\alpha_{-i}$. Since we will focus on EZ-SC with theories that allow unrestricted inference about others' play, we will abuse notation and identify $\Theta(\kappa)$ with $\mathcal{F}_\kappa$.

While agents learn about both $r$ and $\sigma_\zeta$, it is their (mis)inferences about the market price elasticity $r$ that drive the main results. Since each firm's profit is linear in the market price, an agent's belief about the variance of the idiosyncratic price shock does not change her expected payoffs or behavior. We use inference over $\sigma_\zeta$ to simplify our analysis: this parameter absorbs changes in the variance of market price under different correlation structures. A Bayesian agent whose data are all generated from the same strategy profile only learns about $r$ using the mean of the market price in the data, not its variance.

In formalizing the stage game and translating misperceptions of the information structure into theories, we have assumed that the space of feasible linear strategies $\alpha_i \in [0, \bar{M}_\alpha]$ and the domain of inference over game parameters $r \in [0, \bar{M}_r]$, $\sigma_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$ are bounded sets. These compactness assumptions help ensure that EZ-SC exist. In analyzing evolutionary stability, we will focus on the case where the bounds $\bar{M}_\alpha, \bar{M}_r, \bar{M}_{\sigma_\zeta}$ are finite but sufficiently large, so that the optimal behavior and beliefs are interior. We introduce the following shorthand:

Notation.
A result is said to hold “with high enough price volatility and large enough strategy space and inference space” if, whenever the strategy space $[0, \bar{M}_\alpha]$ has $\bar{M}_\alpha \ge \frac{1/\sigma_\epsilon^2}{1/\sigma_\epsilon^2 + 1/\sigma_\omega^2}$, there exist $0 < L_1, L_2, L_3 < \infty$ so that for any objective game $F^\bullet$ with $(\sigma_\zeta^\bullet)^2 \ge L_1$ and with theories where the parameter spaces $r \in [0, \bar{M}_r]$, $\sigma_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$ are such that $\bar{M}_{\sigma_\zeta} \ge (\sigma_\zeta^\bullet)^2 + L_2$ and $\bar{M}_r \ge L_3$, the result is true.

In order to determine which theories (i.e., perceptions of $\kappa$) are stable against rival theories, we must characterize the relevant equilibrium zeitgeists. This section develops a number of preliminary results that relate beliefs about the game parameters to best responses, and conversely strategy profiles to the KL-divergence minimizing inferences.

We begin by proving the result alluded to earlier: under normality, every agent's inferences about the state and about the opponent's signal are linear functions of her own signal. The linear coefficient on the latter increases with the correlation parameter $\kappa$.

Lemma 2.

There exists a strictly increasing function $\psi(\kappa)$, with $\psi(0) > 0$ and $\psi(1) = 1$, so that $E_\kappa[s_{-i} \mid s_i] = \psi(\kappa) \cdot s_i$ for all $s_i \in \mathbb{R}$, $\kappa \in [0, 1]$. Also, there exists a strictly positive $\gamma \in \mathbb{R}$ so that $E_\kappa[\omega \mid s_i] = \gamma \cdot s_i$ for all $s_i \in \mathbb{R}$, $\kappa \in [0, 1]$.

Linearity of $E[\omega \mid s_i]$ and $E[s_{-i} \mid s_i]$ in $s_i$ allows us to explicitly characterize the corresponding linear best responses, given beliefs about $\kappa$ and elasticity $r$. For $Q_i, Q_{-i}$ (not necessarily linear) strategies in the stage game and $\mu \in \Delta(\Theta(\kappa))$, let $U_i(Q_i, Q_{-i}; \mu)$ be $i$'s subjective expected utility from playing $Q_i$ against $Q_{-i}$, under the belief $\mu$.

Lemma 3.

For $\alpha_{-i}$ a linear strategy,

$$U_i(\alpha_i, \alpha_{-i}; \mu) = E[s_i^2] \cdot \left( \alpha_i \gamma - \frac{1}{2} \hat{r} \alpha_i^2 - \frac{1}{2} \hat{r} \psi(\kappa) \alpha_i \alpha_{-i} - \frac{1}{2} \alpha_i^2 \right)$$

for every linear strategy $\alpha_i$, where $\hat{r} = \int r \, d\mu(r, \kappa, \sigma_\zeta)$ is the mean of $\mu$'s marginal on elasticity. For $\kappa \in [0, 1]$ and $r > 0$,

$$\alpha_i^{BR}(\alpha_{-i}; \kappa, r) := \frac{\gamma - \frac{1}{2} r \psi(\kappa) \alpha_{-i}}{1 + r}$$

best responds to $\alpha_{-i}$ among all strategies $Q_i : \mathbb{R} \to \mathbb{R}$ for all $\sigma_\zeta > 0$.

Lemma 3 shows that $\alpha_i^{BR}(\alpha_{-i}; \kappa, r)$ is not only the best-responding linear strategy when the opponent plays $\alpha_{-i}$ and $i$ believes in correlation parameter $\kappa$ and elasticity $r$, it is also optimal among the class of all strategies $Q_i(s_i)$ against the same opponent play and under the same beliefs.

Call a linear strategy more aggressive if its coefficient $\alpha_i \ge 0$ is larger. Player $i$'s subjective best response function becomes more aggressive when $i$ believes in lower $\kappa$ or lower $r$. We have $\frac{\partial \alpha_i^{BR}}{\partial \kappa} < 0$ and $\frac{\partial \alpha_i^{BR}}{\partial r} < 0$.

We now turn to inference about $r^\bullet$. The following lemma shows that any linear profile generates data whose KL-divergence can be minimized to 0 by a unique value of $r$. We also characterize how this inference about elasticity depends on the strategy profiles and the agent's belief about the correlation parameter $\kappa$. As mentioned earlier, we focus on the case where the bounds on the inferences $r \in [0, \bar{M}_r]$, $\sigma_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$ are sufficiently large to ensure that the KL-divergence minimization problem is well-behaved.

Lemma 4.

For every $0 < r^\bullet, \bar{M}_\alpha < \infty$, there exist $0 < L_1, L_2, L_3 < \infty$ such that for any $(\sigma_\zeta^\bullet)^2 \ge L_1$, $\bar{M}_{\sigma_\zeta} \ge (\sigma_\zeta^\bullet)^2 + L_2$, $\bar{M}_r \ge L_3$, $\kappa^\bullet, \kappa \in [0, 1]$, $\alpha_i, \alpha_{-i} \in [0, \bar{M}_\alpha]$, we have $D_{KL}(F_{r^\bullet,\kappa^\bullet,\sigma_\zeta^\bullet}(\alpha_i, \alpha_{-i}) \,\|\, F_{\hat{r},\kappa,\hat{\sigma}_\zeta}(\alpha_i, \alpha_{-i})) = 0$ for exactly one pair $\hat{r} \in [0, \bar{M}_r]$, $\hat{\sigma}_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$. This $\hat{r}$ is given by

$$r_i^{INF}(\alpha_i, \alpha_{-i}; \kappa^\bullet, \kappa, r^\bullet) := r^\bullet \, \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}.$$

The agent's inference about $r$ is strictly decreasing in her belief about the correlation parameter $\kappa$. To understand why, assume player $i$ uses the linear strategy $\alpha_i$ and player $-i$ uses the linear strategy $\alpha_{-i}$. After receiving a private signal $s_i$, player $i$ expects to face a price distribution with a mean of $\gamma s_i - \frac{1}{2} r (\alpha_i s_i + \alpha_{-i} E_\kappa[s_{-i} \mid s_i])$. Under projection bias $\kappa > \kappa^\bullet$, $E_\kappa[s_{-i} \mid s_i]$ is excessively steep in $s_i$. For example, following a large and positive $s_i$, the agent overestimates the similarity of $-i$'s signal and wrongly predicts that $-i$ must also choose a very high quantity, and thus becomes surprised when the market price remains high. The agent then wrongly infers that the market price elasticity must be low. Therefore, in order to rationalize the average market price conditional on her own signal, an agent with projection bias must infer $r < r^\bullet$. For similar reasons, an agent with correlation neglect infers $r > r^\bullet$.

Combining Lemma 3 and Lemma 4, we find that increasing $\kappa$ has an a priori ambiguous impact on the agent's equilibrium aggressiveness. Increasing $\kappa$ has the direct effect of lowering aggression (by Lemma 3), but it also has the indirect effect of lowering the inference about $r$ (by Lemma 4), which increases aggression (by Lemma 3). Nevertheless, we show in the results below that the indirect effect through the mislearning channel dominates, and the evolutionary stability of correlational errors is driven by this channel.
We show in Section 4.6 that the results are reversed when we shut down the learning channel.

Lemma 4 considers the problem of KL-divergence minimization when all of the data are generated from a single strategy profile, $(\alpha_i, \alpha_{-i})$. It implies that if $\lambda \in \{0, 1\}$ and $(p_A, p_B) = (1, 0)$, [...] $\Theta_A$ and $\Theta_B$. Thus, they must find a single set of parameters for the stage game that best fits all of their data, and even this best-fitting model will have positive KL divergence in equilibrium. The next lemma shows the LQN game satisfies Assumptions 1 through 5. Therefore, the existence and continuity results from Section 3.3 imply that the tractable analysis in homogeneous societies remains robust to the introduction of a small but non-zero share of a mutant theory.
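The interplay between the direct effect (Lemma 3) and the indirect effect through inference (Lemma 4) can be illustrated numerically. The following is a minimal sketch, not part of the formal analysis: the function names, the values of $\gamma$ and $r^\bullet$, the data-generating profile, and the values standing in for $\psi(\kappa^\bullet)$ and $\psi(\kappa)$ are all hypothetical, since the closed form of $\psi$ is not restated here. It assumes the bounds on strategies and inferences are slack.

```python
def best_response(alpha_opp, psi, r, gamma):
    """Linear best-response coefficient from Lemma 3.

    psi stands in for psi(kappa), the perceived slope of E[s_-i | s_i];
    r is the (believed) market price elasticity.
    """
    return (gamma - 0.5 * r * psi * alpha_opp) / (1.0 + r)


def inferred_elasticity(alpha_own, alpha_opp, psi_true, psi_perceived, r_true):
    """Elasticity that sets the KL divergence to zero, from Lemma 4."""
    return r_true * (alpha_own + alpha_opp * psi_true) / (
        alpha_own + alpha_opp * psi_perceived
    )


if __name__ == "__main__":
    # All numbers below are hypothetical and chosen only for illustration.
    gamma, r_true = 0.5, 1.0
    psi_true = 0.6
    alpha_own = alpha_opp = 0.3  # profile that generates the agent's data

    cases = [
        ("correlation neglect", 0.4),
        ("correct specification", 0.6),
        ("projection bias", 0.8),
    ]
    for label, psi_perceived in cases:
        r_hat = inferred_elasticity(
            alpha_own, alpha_opp, psi_true, psi_perceived, r_true
        )
        # Direct effect only: misperceive psi but hold the elasticity at r_true.
        direct = best_response(alpha_opp, psi_perceived, r_true, gamma)
        # Direct plus indirect effect: also use the mislearned elasticity r_hat.
        total = best_response(alpha_opp, psi_perceived, r_hat, gamma)
        print(f"{label}: r_hat={r_hat:.4f}, direct={direct:.4f}, total={total:.4f}")
```

In this parametrization, the agent with the higher perceived $\psi$ infers a lower elasticity, and once the mislearned elasticity is used her best response is more aggressive than the correctly specified one, even though the direct effect alone points the other way.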

Lemma 5.

For every $r^\bullet, \sigma_\zeta^\bullet \ge 0$, $\lambda \in [0, 1]$, $\kappa^\bullet, \kappa \in [0, 1]$, $\bar{M}_\alpha, \bar{M}_{\sigma_\zeta}, \bar{M}_r < \infty$, the LQN game with objective parameters $(r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet)$, strategy space $\mathbb{A} = [0, \bar{M}_\alpha]$ and theories $\Theta(\kappa^\bullet)$, $\Theta(\kappa)$ with parameter spaces $[0, \bar{M}_r]$, $[0, \bar{M}_{\sigma_\zeta}]$ satisfies Assumptions 1, 2, 3, 4, and 5.

4.4 Uniform Matching ($\lambda = 0$) and Projection Bias

We now describe our main results on the evolutionary instability of correctly specified beliefs about the information structure. Our first main result is that in a society where agents are uniformly matched, a correctly specified $\kappa$ will be evolutionarily fragile with strategic certainty against some amount of projection bias. The proof of this result involves characterizing the asymmetric equilibrium strategy profile in matches between the correctly specified residents and the projection-biased mutants, and proving that a small amount of projection bias leads the mutants to have higher payoffs in the resident-vs-mutant matches than the residents' payoffs in the resident-vs-resident matches.

Proposition 7.

Let $r^\bullet > 0$, $\kappa^\bullet \in [0, 1]$ be given. With high enough price volatility and large enough strategy space and inference space, there exist $\underline{\kappa} < \kappa^\bullet < \bar{\kappa}$ so that in societies with two theories $(\Theta_A, \Theta_B) = (\Theta(\kappa^\bullet), \Theta(\kappa))$ where $\kappa \in [\underline{\kappa}, \bar{\kappa}]$, there is a unique EZ-SC with uniform matching ($\lambda = 0$) and $(p_A, p_B) = (1, 0)$. The equilibrium fitness of $\Theta(\kappa)$ is strictly higher than that of $\Theta(\kappa^\bullet)$ if $\kappa > \kappa^\bullet$, and strictly lower if $\kappa < \kappa^\bullet$.

Combining this result with Lemma 5, we conclude that in societies with theories $\Theta(\kappa^\bullet)$ and $\Theta(\kappa)$ where $\kappa$ is slightly above $\kappa^\bullet$, the unique EZ-SC is approachable. Hence, the correct specification is not evolutionarily stable with strategic certainty against a small amount of projection bias.

Intuitively, as discussed after Lemma 4, projection bias generates a commitment to aggression as it leads the biased agents to under-infer market price elasticity. It is well known that in Cournot oligopoly games, such commitment can be beneficial. For instance, if quantities are chosen sequentially, the first mover obtains a higher payoff compared to the case where quantities are chosen simultaneously. A similar force is at work here, but the source of the commitment is different. Misspecification about signal correlation leads to misinference about $r^\bullet$, which causes the mutants to credibly respond to their opponents' play in an overly aggressive manner. The rational residents, who can identify the mutants in the population, back down and yield a larger share of the surplus. While projection bias is beneficial in small amounts, it is also intuitive that excessive aggression would be detrimental, as overproduction can be individually suboptimal.

4.5 Perfectly Assortative Matching ($\lambda = 1$) and Correlation Neglect

Turning to the case of perfectly assortative matching, we obtain the opposite result: evolutionary stability now selects for theories with correlation neglect. The fragility of the correct specification is even starker here, as we show that any level of correlation neglect leads to higher equilibrium fitness.

Proposition 8.

Let $r^\bullet > 0$, $\kappa^\bullet \in [0, 1]$ be given. With high enough price volatility and large enough strategy space and inference space, in societies with two theories $(\Theta_A, \Theta_B) = (\Theta(\kappa_A), \Theta(\kappa_B))$ where $\kappa_A \le \kappa_B$, the fitness of $\Theta_A$ is weakly higher than that of $\Theta_B$ in every EZ-SC with any population proportion $p$ and perfectly assortative matching ($\lambda = 1$).

Combining this result with Lemma 5, we conclude that under Proposition 8's conditions with $(p_A, p_B) = (1, 0)$, at least one EZ-SC is approachable, and each theory's fitness is invariant across all approachable EZ-SCs. Furthermore, this fitness is strictly decreasing in $\kappa$. Hence, for any $\kappa_A < \kappa_B$, theory $\Theta(\kappa_A)$ is evolutionarily stable with strategic certainty against theory $\Theta(\kappa_B)$. Specializing to $\kappa_B = \kappa^\bullet$, we conclude that the correct specification is evolutionarily fragile against any level of correlation neglect.

As discussed after Lemma 4, correlation neglect makes agents over-infer market price elasticity, and thus lets them commit to more cooperative behavior (i.e., linear strategies with a smaller coefficient $\alpha_i$). Rational opponents would take advantage of such agents, but the biased agents never match up against rational opponents in a society with perfectly assortative matching. Note also that in the uniform matching case, projection bias leads to a higher payoff for the mutant at the expense of the rational opponent's payoff. With perfectly assortative matching, correlation neglect Pareto improves both biased agents' payoffs.

To understand why equilibrium fitness is a monotonically decreasing function of $\kappa$ with perfectly assortative matching, let $\alpha^{TEAM}$ denote the symmetric linear strategy profile that maximizes the sum of the two firms' expected objective payoffs. We can show that among symmetric strategy profiles, players' payoffs strictly decrease in their aggressiveness in the region $\alpha > \alpha^{TEAM}$. We can also show that with $\lambda = 1$ and any $\kappa \in [0, 1]$, the equilibrium play among two adherents of $\Theta(\kappa)$ strictly increases in aggression as $\kappa$ grows, and it is always strictly more aggressive than $\alpha^{TEAM}$. Lowering the perception of $\kappa$ confers an evolutionary advantage by bringing play monotonically closer to the team solution $\alpha^{TEAM}$ in equilibrium.
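The monotonicity argument above can be checked numerically under stated assumptions. The sketch below is illustrative only: it assumes interior solutions (slack bounds), normalizes payoffs by $E[s_i^2]$, and uses hypothetical values for $\gamma$, $r^\bullet$, and the numbers standing in for $\psi(\kappa^\bullet)$ and $\psi(\kappa)$. With $\lambda = 1$, two adherents of $\Theta(\kappa)$ generate data from a symmetric profile, so the Lemma 4 inference reduces to $\hat{r} = r^\bullet (1 + \psi(\kappa^\bullet)) / (1 + \psi(\kappa))$, and the Lemma 3 best response then has the fixed point $\alpha = \gamma / (1 + \hat{r}(1 + \psi(\kappa)/2))$. The function names are hypothetical.

```python
def team_coefficient(psi_true, r_true, gamma):
    """Symmetric coefficient maximizing the sum of objective payoffs."""
    return gamma / (1.0 + r_true * (1.0 + psi_true))


def assortative_equilibrium(psi_perceived, psi_true, r_true, gamma):
    """Symmetric fixed point of inference (Lemma 4) and best response (Lemma 3)."""
    # In a symmetric profile the inferred elasticity does not depend on alpha.
    r_hat = r_true * (1.0 + psi_true) / (1.0 + psi_perceived)
    # Solve alpha = (gamma - 0.5 * r_hat * psi_perceived * alpha) / (1 + r_hat).
    return gamma / (1.0 + r_hat * (1.0 + 0.5 * psi_perceived))


def symmetric_payoff(alpha, psi_true, r_true, gamma):
    """Objective expected payoff per unit of E[s_i^2] when both firms play alpha."""
    return alpha * gamma - 0.5 * alpha**2 * (1.0 + r_true * (1.0 + psi_true))


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    gamma, r_true, psi_true = 0.5, 1.0, 0.6
    team = team_coefficient(psi_true, r_true, gamma)
    print(f"team solution: alpha={team:.4f}")
    for psi_perceived in (0.4, 0.6, 0.8):
        alpha = assortative_equilibrium(psi_perceived, psi_true, r_true, gamma)
        payoff = symmetric_payoff(alpha, psi_true, r_true, gamma)
        print(f"perceived psi={psi_perceived}: alpha={alpha:.4f}, payoff={payoff:.5f}")
```

In this parametrization the equilibrium coefficient rises with the perceived $\psi$ and always exceeds the team coefficient, so the objective payoff falls as the perceived correlation grows, in line with the discussion above.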

The key mechanism behind Proposition 7 and Proposition 8 is that misperceptions about $\kappa$ confer an evolutionary advantage through the learning channel: they cause the misspecified agents to misinfer some other parameter of the stage game. This mislearning is strategically beneficial as it commits the agents to certain behavior that increases their equilibrium payoffs against their typical opponents, given the matching assortativity. Section 3.2 showed that the learning channel unique to the world of theory evolution permits novel stability phenomena in general games, and here we find the same channel is also indispensable for the predictions in this particular application. The results about the evolutionary fragility of the correct specification in Proposition 7 and Proposition 8 would be reversed without it.

Proposition 9. Let $r^\bullet > 0$, $\kappa^\bullet \in [0, 1]$ be given. With high enough price volatility and large enough strategy space and inference space, there exists $\epsilon > 0$ so that for any $\kappa_l, \kappa_h \in [0, 1]$, $\kappa_l < \kappa^\bullet < \kappa_h \le \kappa^\bullet + \epsilon$, the correctly specified theory $\Theta(\kappa^\bullet)$ is evolutionarily stable with strategic certainty against the singleton theory $\{ F_{r^\bullet,\kappa_h,\sigma_\zeta^\bullet} \}$ under uniform matching ($\lambda = 0$), and evolutionarily stable with strategic certainty against the singleton theory $\{ F_{r^\bullet,\kappa_l,\sigma_\zeta^\bullet} \}$ under perfectly assortative matching ($\lambda = 1$).

In this proposition, we consider agents with singleton theories who misperceive the signal correlation structure but hold dogmatic and correct beliefs about the other game parameters, including the elasticity of market price. Once the mislearning channel is shut down, we find that misperceptions about $\kappa$ that used to confer an evolutionary advantage under a certain matching assortativity can no longer invade a society of correctly specified residents.

We turn to general incomplete-information games and provide a condition for a theory to be evolutionarily fragile against a “nearby” misspecified theory.
This condition shows how assortativity and the learning channel shape the evolutionary selection of theories for a broader class of stage games and biases. We also relate the condition to the specific results studied so far in this application.

Consider a stage game where a state of the world $\omega$ is realized at the start of the game. Players 1 and 2 observe private signals $s_1, s_2 \in S \subseteq \mathbb{R}$, possibly correlated given $\omega$. The objective distribution of $(\omega, s_1, s_2)$ is $P^\bullet$. Based on their signals, players choose actions $q_1, q_2 \in \mathbb{R}$ and receive random consequences $y_1, y_2 \in \mathbb{Y}$. The distribution over consequences as a function of $(\omega, s_1, s_2, q_1, q_2)$ and the utility over consequences $\pi : \mathbb{Y} \to \mathbb{R}$ are such that each player $i$'s objective expected utility from taking action $q_i$ against opponent action $q_{-i}$ in state $\omega$ is given by $u_i^\bullet(q_i, q_{-i}; \omega)$, differentiable in its first two arguments.

For an interval of real numbers $[\underline{\kappa}, \bar{\kappa}]$ with $\underline{\kappa} < \bar{\kappa}$ and $\kappa^\bullet \in (\underline{\kappa}, \bar{\kappa})$, suppose there is a family of theories $(\Theta(\kappa))_{\kappa \in [\underline{\kappa}, \bar{\kappa}]}$, each with a product structure $\Theta(\kappa) = \mathbb{A} \times \mathcal{F}(\kappa)$. Fix $\lambda \in [0, 1]$ and a strategy space $\mathbb{A} \subseteq \mathbb{R}^S$, representing the feasible signal-contingent strategies. Suppose the two theories in the society are $\Theta_A = \Theta(\kappa^\bullet)$ and $\Theta_B = \Theta(\kappa)$ for some $\kappa \in [\underline{\kappa}, \bar{\kappa}]$. The next assumption requires there to be a unique EZ-SC with $(p_A, p_B) = (1, 0)$ in such societies with any $\kappa \in [\underline{\kappa}, \bar{\kappa}]$, and further requires the EZ-SC to feature linear equilibria. Linear equilibria exist and are unique in a large class of games outside of the duopoly framework, and in particular in LQN games under some conditions on the payoff functions (see, e.g., Angeletos and Pavan (2007)).

Assumption 6.

Suppose there is a unique EZ-SC under $\lambda$-matching and population proportions $(p_A, p_B) = (1, 0)$ with $\Theta_A = \Theta(\kappa^\bullet)$, $\Theta_B = \Theta(\kappa)$ for every $\kappa \in [\underline{\kappa}, \bar{\kappa}]$. Suppose the $\kappa$-indexed EZ-SC strategy profiles $(\sigma(\kappa)) = (\sigma_{AA}(\kappa), \sigma_{AB}(\kappa), \sigma_{BA}(\kappa), \sigma_{BB}(\kappa))$ are linear, i.e., $\sigma_{gg'}(\kappa)(s_i) = \alpha_{gg'}(\kappa) \cdot s_i$ with $\alpha_{gg'}(\kappa)$ differentiable in $\kappa$. Suppose that in the EZ-SC with $\kappa = \kappa^\bullet$, $\alpha_{AA}(\kappa^\bullet)$ is objectively interim-optimal against itself. Finally, assume for every $\kappa$, Assumptions 1, 2, 3, 4, and 5 are satisfied.

Proposition 10.

Let $\alpha^\bullet := \alpha_{AA}(\kappa^\bullet)$. Then, under Assumption 6, if

$$E^\bullet \left[ E^\bullet \left[ \frac{\partial u_1^\bullet}{\partial q_2}(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \cdot [(1 - \lambda) \alpha_{AB}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)] \cdot s_2 \,\Big|\, s_1 \right] \right] > 0,$$

then there exists some $\epsilon > 0$ so that $\Theta(\kappa^\bullet)$ is evolutionarily fragile with strategic certainty against theories $\Theta(\kappa)$ with $\kappa \in (\kappa^\bullet, \kappa^\bullet + \epsilon] \cap [\underline{\kappa}, \bar{\kappa}]$. Also, if

$$E^\bullet \left[ E^\bullet \left[ \frac{\partial u_1^\bullet}{\partial q_2}(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \cdot [(1 - \lambda) \alpha_{AB}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)] \cdot s_2 \,\Big|\, s_1 \right] \right] < 0,$$

then there exists some $\epsilon > 0$ so that $\Theta(\kappa^\bullet)$ is evolutionarily fragile with strategic certainty against theories $\Theta(\kappa)$ with $\kappa \in [\kappa^\bullet - \epsilon, \kappa^\bullet) \cap [\underline{\kappa}, \bar{\kappa}]$. Here $E^\bullet$ is the expectation with respect to the objective distribution of $(\omega, s_1, s_2)$ under $P^\bullet$.

Proposition 10 describes a general condition to determine whether a correctly specified theory is evolutionarily fragile against a nearby misspecified mutant theory. The condition asks if a slight change in the mutant theory's $\kappa$ leads mutants' opponents to change their equilibrium actions such that the mutants become better off on average. These opponents are the residents under uniform matching $\lambda = 0$, so $\alpha_{AB}'(\kappa^\bullet)$ is relevant. These opponents are other mutants under perfectly assortative matching $\lambda = 1$, so $\alpha_{BB}'(\kappa^\bullet)$ is relevant.

Proposition 10 implies that one should only expect the correctly specified theory to be stable against all nearby theories in “special” cases, that is, when the expectation in the statement of Proposition 10 is exactly equal to 0. One such special case is when the agents face a decision problem where 2's action does not affect 1's payoffs, that is $\frac{\partial u_1^\bullet}{\partial q_2} = 0$. This sets the expectation to zero, so the result never implies that the correctly specified theory is evolutionarily fragile against a misspecified theory in such decision problems.

In the duopoly game analyzed previously, we have $\frac{\partial u_1^\bullet}{\partial q_2}(q_1, q_2, \omega) = -\frac{1}{2} r^\bullet q_1$. Player 1 is harmed by player 2 producing more if $q_1 > 0$, and helped if $q_1 < 0$.
From straightforward algebra, the expectation in Proposition 10 simplifies to

$$E^\bullet[s_1^2] \cdot \left( -\frac{1}{2} \psi(\kappa^\bullet) r^\bullet \alpha^\bullet \right) \cdot [(1 - \lambda) \alpha_{AB}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)].$$

The proof of Proposition 7 shows that when $\lambda = 0$, $\alpha_{AB}'(\kappa^\bullet) < 0$. The proof of Proposition 8 shows that when $\lambda = 1$, $\alpha_{BB}'(\kappa^\bullet) > 0$. The uniqueness of EZ-SC also follows from these propositions, for an open interval of $\kappa$ containing $\kappa^\bullet$. We restrict $\mathbb{A}$ to the set of linear strategies, and Lemma 3 implies the linear strategies played by two correctly specified firms against each other are interim optimal. (More precisely, for every $s_i \in S$, $\alpha_{AA}(\kappa^\bullet) \cdot s_i$ maximizes the agent's objective expected utility across all of $\mathbb{R}$ when $-i$ uses the same linear strategy $\alpha_{AA}(\kappa^\bullet)$.) Finally, Lemma 5 verifies that Assumptions 1 through 5 are satisfied. Therefore, the conditions of Proposition 10 are satisfied for $\lambda \in \{0, 1\}$, and we deduce the correctly specified theory is evolutionarily fragile with strategic certainty against slightly higher $\kappa$ (for $\lambda = 0$) and slightly lower $\kappa$ (for $\lambda = 1$).

In the next application, we use the framework of theory evolution to provide a justification for coarse analogy classes in games. Jehiel (2005) introduces the solution concept of analogy-based expectation equilibrium (ABEE) in extensive-form games, where agents group opponents' nodes in an extensive-form game into analogy classes and only keep track of average behavior within each analogy class. An ABEE is a strategy profile where agents best respond to such average opponent behavior. In the ensuing literature that applies ABEE to different settings, analogy classes are usually exogenously given and interpreted as arising from agents' cognitive constraints. We show through an example that suitably defined theories whose marginals on opponents' play are restricted subsets of extensive-form strategies can encode analogy classes, and that the matches between any two groups in an EZ constitute ABEEs. We can then investigate which analogy classes are more likely to arise by studying the stability of different theories (i.e., analogy classes), including the correctly specified theory (i.e., the finest analogy class).

Consider a centipede game, shown in Figure 2. P1 and P2 take turns choosing Across or Drop. The non-terminal nodes are labeled $n^k$, $1 \le k \le K$ for an even $K$. P1 acts at nodes $n^1, n^3, \ldots, n^{K-1}$, P2 acts at nodes $n^2, n^4, \ldots, n^K$, and choosing Drop at $n^k$ leads to the terminal node $z^k$.
If Across is always chosen, then the terminal node $z^{end}$ is reached. If P1 chooses Drop at the first node, the game ends with the payoff profile $(0, 0)$. Every time a player $i$ chooses Across, the sum of payoffs grows by $g > 0$, but if the next player chooses Drop then $i$'s payoff is $\ell > 0$ smaller than what $i$ would have gotten by choosing Drop. If $z^{end}$ is reached, both get $Kg/2$. That is, if $u_j^k$ is the utility of $j$ at the terminal node $z^k$, and $i$ moves at $n^k$, then $u_{-i}^k = u_{-i}^{k-1} - \ell$ while $u_i^k = (u_i^{k-1} + u_{-i}^{k-1} + g) - u_{-i}^k$. This works out to $u_j^k = \frac{g(k-1)}{2}$ for both players when $k$ is odd, and $u_1^k = \frac{k-2}{2} g - \ell$, $u_2^k = \frac{k}{2} g + \ell$ when $k$ is even. While this is an asymmetric stage game, we study the symmetrized version mentioned in Section 2.1, where two matched agents are randomly assigned into the roles of P1 and P2.

(Section 6.2 of Jehiel (2005) mentions that if players could choose their own analogy classes, then the finest analogy classes need not arise, but also says “it is beyond the scope of this paper to analyze the implications of this approach.”)

Figure 2: The centipede game. There are $K$ non-terminal nodes and players 1 (blue) and 2 (red) alternate in choosing Across (A) or Drop (D). Payoff profiles are shown at the terminal nodes.

Let $\mathbb{A} = \{ (d^k)_{k=1}^K \in [0, 1]^K \}$, so each strategy is characterized by the probabilities of playing Drop at various nodes in the game tree. When assigned into the role of P1, the strategy $(d^k)$ plays Drop with probabilities $d^1, d^3, \ldots, d^{K-1}$ at nodes $n^1, n^3, \ldots, n^{K-1}$. When assigned into the role of P2, it plays Drop with probabilities $d^2, d^4, \ldots, d^K$ at nodes $n^2, n^4, \ldots, n^K$. The set of consequences is $\mathbb{Y} = \{1, 2\} \times (\{ z^k : 1 \le k \le K \} \cup \{ z^{end} \})$, where the first dimension of the consequence returns the player role that the agent was assigned into, and the second dimension returns the terminal node reached. The objective distribution over consequences as a function of play is $F^\bullet : \mathbb{A}^2 \to \Delta(\mathbb{Y})$.

Consider a learning environment where agents know the game tree (i.e., they know $F^\bullet$), but some agents mistakenly think that when their opponents are assigned into a role, these opponents play Drop with the same probabilities at all of their nodes. Formally, define the restricted space of strategies $\mathbb{A}^{An} := \{ (d^k) \in [0, 1]^K : d^k = d^{k'} \text{ if } k \equiv k' \pmod 2 \} \subseteq \mathbb{A}$. The correctly specified theory is $\Theta^\bullet := \mathbb{A} \times \mathbb{A} \times \{ F^\bullet \}$. The misspecified theory with a restriction on beliefs about opponents' play is $\Theta^{An} := \mathbb{A}^{An} \times \mathbb{A}^{An} \times \{ F^\bullet \}$, reflecting a dogmatic belief that opponents play the same mixed action at all nodes in the analogy class. It is important to remember that these restrictions on strategies only exist in the subjective beliefs of the theory $\Theta^{An}$ adherents. All agents, regardless of their theory, actually have the strategy space $\mathbb{A}$. The next proposition provides a justification for why we might expect agents with coarse analogy classes given by $\mathbb{A}^{An}$ to persist in the society.

Proposition 11.

Suppose $K \ge 4$ and $g > \frac{2}{K-2} \ell$. For any matching assortativity $\lambda \in [0, 1]$, the correctly specified theory $\Theta^\bullet$ is evolutionarily stable against itself, but it is not evolutionarily stable against the misspecified theory $\Theta^{An}$. Also, $\Theta^{An}$ is not evolutionarily stable against $\Theta^\bullet$, unless $\lambda = 1$.

In contrast to the results from the previous section, which predict different biases may arise under different matching assortativities, we find in this environment that the correctly specified theory is not evolutionarily stable against the theory $\Theta^{An}$ with coarse analogy classes under any level of assortativity. In the previous application to LQN games, agents with projection bias commit to acting more aggressively, which increases their equilibrium welfare in matches against rational agents but decreases their equilibrium welfare in matches against other agents with the same bias, and vice versa for agents with correlation neglect. But in the current application, the conditional fitness of $\Theta^{An}$ against both $\Theta^\bullet$ and $\Theta^{An}$ can strictly improve on the correctly specified residents' equilibrium fitness. This is because the matches between two adherents of $\Theta^\bullet$ must result in Dropping at the first move in equilibrium, while matches where at least one player is an adherent of $\Theta^{An}$ either lead to the same outcome or lead to a Pareto dominating payoff profile.

But at the same time, $\Theta^{An}$ is not evolutionarily stable against $\Theta^\bullet$ either, because the correctly specified agents receive higher payoffs than the misspecified agents when matched against each other. It is easy to see that there are some interior population proportions for the two theories, $(p_A, p_B) \in (0, 1)^2$, such that there is an EZ where no matches that involve misspecified agents result in immediate dropping, and the two theories have the same fitness.

This paper presents an evolutionary selection criterion to endogenize (mis)specifications when agents learn about a strategic environment.
We introduce the concept of a zeitgeist to capture the ambient social structure where learning takes place: the prominence of different theories in the society and the interaction patterns among their adherents. These details matter because different types of opponents behave differently, inducing different beliefs about the economic fundamentals for a misspecified agent. Evolutionary stability of a theory is defined based on the expected objective payoffs (fitness) of its adherents in equilibrium.

We have highlighted settings where the correct specification is not evolutionarily stable against some misspecifications. We view our main contributions as twofold. First, we point out how details of the zeitgeist (e.g., the matching assortativity) change which learning biases may persist in an otherwise rational society. Second, we emphasize that the learning channel, unique to a world where evolutionary forces act on specifications (sets of feasible beliefs) instead of single beliefs, generates novel stability phenomena.

Our framework evaluates whether a misspecification is likely to persist once it emerges in a society, but does not account for which errors appear in the first place. It is plausible that some first-stage filter prevents certain obvious misspecifications from ever reaching the stage that we study in the evolutionary framework. In the applications, we have focused on misspecifications that seem psychologically plausible or harder to detect, such as misspecified higher-order beliefs.

We have used the simplest evolutionary framework where fitness is identified with the expectation of objective payoffs, as opposed to some more exotic function of the payoffs. This paper is not meant to be a just-so congruence exercise of identifying the suitable definition of fitness to justify a particular error (which is the focus of many of the papers that Robson and Samuelson (2011) survey).
Rather, we hope that our stability notions are simple and universal enough that they may become a part of the applied theory toolkit in the future. Studies on the implications of misspecifications in various strategic environments may further enrich our understanding of these errors by paying more attention to their evolutionary stability.

References

Alger, I. and J. Weibull (2019): “Evolutionary models of preference formation,”

AnnualReview of Economics , 11, 329–354.

Aliprantis, C. and K. Border (2006):

Inﬁnite Dimensional Analysis: A Hitchhiker’sGuide , Springer Science & Business Media.

Angeletos, G.-M. and A. Pavan (2007): “Eﬃcient use of information and social valueof information,”

Econometrica , 75, 1103–1142.

Ba, C. (2020): “Model misspeciﬁcation and paradigm shift,”

Working Paper . Ba, C. and A. Gindin (2020): “A multi-agent model of misspeciﬁed learning with over-conﬁdence,”

Working Paper . Bergemann, D. and S. Morris (2013): “Robust predictions in games with incompleteinformation,”

Econometrica , 81, 1251–1308.

Berman, R. and Y. Heller (2020): “Naive analytics equilibrium,”

Working Paper . Bohren, J. A. (2016): “Informational herding with model misspeciﬁcation,”

Journal ofEconomic Theory , 163, 222–247.

Bohren, J. A. and D. Hauser (2018): “Learning with model misspeciﬁcation: Charac-terization and robustness,”

Working Paper . Cho, I.-K. and K. Kasa (2015): “Learning and model validation,”

Review of EconomicStudies , 82, 45–82.——— (2017): “Gresham’s law of model averaging,”

American Economic Review , 107, 3589–3616.

Dasaratha, K. and K. He (2020): “Network structure and naive sequential learning,”

Theoretical Economics , 15, 415–444. 34 ekel, E., J. Ely, and O. Yilankaya (2007): “Evolution of preferences,”

Review ofEconomic Studies , 74, 685–704.

Eliaz, K. and R. Spiegler (2020): “A model of competing narratives,”

American Eco-nomic Review , 110, 3786–3816.

Esponda, I. and D. Pouzo (2016): “Berk–Nash equilibrium: A framework for modelingagents with misspeciﬁed models,”

Econometrica , 84, 1093–1130.

Esponda, I., D. Pouzo, and Y. Yamamoto (2019): “Asymptotic behavior of Bayesianlearners with misspeciﬁed models,”

Working Paper . Frick, M., R. Iijima, and Y. Ishii (2019): “Stability and rbustness in misspeciﬁedlearning models,”

Working Paper .——— (2020): “Welfare comparisons for biased learning,”

In Preparation .——— (2021): “Misinterpreting others and the fragility of social learning,”

Econometrica,forthcoming . Friedman, M. (1953):

Essays in Positive Economics , University of Chicago Press.

Fudenberg, D. and G. Lanzani (2020): “Which misperceptions persist?”

WorkingPaper . Fudenberg, D., G. Lanzani, and P. Strack (2020): “Limits points of endogenousmisspeciﬁed learning,”

Working Paper . Fudenberg, D., G. Romanyuk, and P. Strack (2017): “Active learning with a mis-speciﬁed prior,”

Theoretical Economics , 12, 1155–1189.

Gagnon-Bartsch, T., M. Rabin, and J. Schwartzstein (2020): “Channeled atten-tion and stable errors,”

Working Paper . Goldfarb, A. and M. Xiao (2019): “Transitory shocks, limited attention, and a ﬁrm’sdecision to exit,”

Working Paper . Hansen, K., K. Misra, and M. Pai (2021): “Algorithmic collusion: Supra-competitiveprices via independent algorithms,”

Marketing Science, forthcoming . He, K. (2020): “Mislearning from censored data: The gambler’s fallacy in optimal-stoppingproblems,”

Working Paper . Heidhues, P., B. Koszegi, and P. Strack (2018): “Unrealistic expectations and mis-guided learning,”

Econometrica , 86, 1159–1214.

Heller, Y. (2015): “Three steps ahead,”

Theoretical Economics , 10, 203–241.

Heller, Y. and E. Winter (2016): “Rule rationality,”

International Economic Review ,57, 997–1026. 35—— (2020): “Biased-belief equilibrium,”

American Economic Journal: Microeconomics ,12, 1–40.

Jehiel, P. (2005): “Analogy-based expectation equilibrium,” Journal of Economic Theory, 123, 81–104.

——— (2018): “Investment strategy and selection bias: An equilibrium perspective on overoptimism,” American Economic Review, 108, 1582–97.

Levy, G., R. Razin, and A. Young (2020): “Misspecified politics and the recurrence of populism,” Working Paper.

Molavi, P. (2019): “Macroeconomics with learning and misspecification: A general theory and applications,” Working Paper.

Nyarko, Y. (1991): “Learning in mis-specified models and the possibility of cycles,” Journal of Economic Theory, 55, 416–427.

Olea, J. L. M., P. Ortoleva, M. M. Pai, and A. Prat (2020): “Competing models,” Working Paper.

Robson, A. J. and L. Samuelson (2011): “The evolutionary foundations of preferences,” in Handbook of Social Economics, Elsevier, vol. 1, 221–310.

Schwartzstein, J. and A. Sunderam (2021): “Using models to persuade,” American Economic Review, forthcoming.

Vives, X. (1988): “Aggregation of information in large Cournot markets,” Econometrica, 851–876.

Appendix

A Proofs

A.1 Proof of Proposition 1

Proof. In any approachable EZ or approachable EZ-SC, let $(\hat a_A, \hat a_B, \hat F) \in \operatorname{supp}(\mu_A)$ and note that $(a_{AA}, a_{BA}, F^\bullet) \in \Theta_A$ since $\Theta_A$ is correctly specified. Both $(\hat a_A, \hat a_B, \hat F)$ and $(a_{AA}, a_{BA}, F^\bullet)$ solve the weighted minimization problem, the former because it is in the support of $\mu_A$, the latter because it attains the lowest possible minimization objective of 0. By strong identification, the best-response function under belief $\mu_A$ is the same as that of someone who knows the game is the decision problem $F^\bullet$. Therefore, adherents of $\Theta_A$ obtain the highest possible expected payoffs in $F^\bullet$, so $\Theta_A$ has weakly higher fitness than $\Theta_B$ in the approachable EZ or EZ-SC.

A.2 Proof of Proposition 2

Proof.

Let two singleton theories $\Theta_A, \Theta_B$ be given. By way of contradiction, suppose they exhibit stability reversal. Let $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0,1), \lambda = 0, (a))$ be any EZ-SC where $\Theta_B$ is resident. By the definition of EZ-SC, $Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1,0), \lambda = 0, (a))$ is also an EZ-SC, one where $\Theta_A$ is resident. Let $u_{g,g'}$ be theory $\Theta_g$'s conditional fitness against group $g'$ in the EZ-SC $Z'$. Part (i) of the definition of stability reversal requires that $u_{AA} > u_{BA}$ and $u_{AB} > u_{BB}$. These conditional fitness levels remain the same in $Z$, where every agent is matched against group $B$. This means the fitness of $\Theta_A$ is strictly higher than that of $\Theta_B$ in $Z$, a contradiction.

A.3 Proof of Example 1

Proof. Define $b^*(a_i, a_{-i}) := b^\bullet + \frac{m}{a_i + a_{-i}}$. It is clear that $D_{KL}\big(F^\bullet(a_i, a_{-i}) \,\|\, \hat F(a_i, a_{-i}; b^*(a_i, a_{-i}), m)\big) = 0$, while this KL divergence is strictly positive for any other choice of $b$.

In every EZ-SC with $\lambda = 0$ and $p = (1,0)$, we must have $a_{AA} = a_{AB} = 1$. If $a_{BA} = 2$, then the adherents of $\Theta_B$ infer $b^*(1,2) = b^\bullet + \frac{m}{3}$. With this inference, the biased agents expect $1 \cdot \big(2(b^\bullet + \frac{m}{3}) - m\big) = 2b^\bullet - \frac{m}{3}$ from playing 1 against rival investment 1, and expect $2 \cdot \big(3(b^\bullet + \frac{m}{3}) - m\big) - c = 6b^\bullet - c$ from playing 2 against rival investment 1. Since $4b^\bullet + \frac{m}{3} - c > 0$, playing 2 is the subjective best response, so $a_{BA} = 2$ is sustained and $\mu_B$ puts probability 1 on $b^\bullet + \frac{m}{3}$. It is impossible to have $a_{BA} = 1$ in EZ-SC. This is because $b^*(1,1) > b^*(1,2)$, and under the inference $b^*(1,2)$ we already have that the best response to 1 is 2, so the same also holds under any higher belief about complementarity. Also, we have $a_{BB} = 2$, since 2 must best respond to both 1 and 2. So in every such EZ-SC, $\Theta_A$'s conditional fitness against group $A$ is $2b^\bullet$ and $\Theta_B$'s conditional fitness against group $A$ is $6b^\bullet - c$, with $2b^\bullet > 6b^\bullet - c$ by Condition 1. Also, $\Theta_A$'s conditional fitness against group $B$ is $3b^\bullet$, while $\Theta_B$'s conditional fitness against group $B$ is $8b^\bullet - c$. Again, $3b^\bullet > 8b^\bullet - c$ by Condition 1.

Next, we show $\Theta_B$ has strictly higher fitness than $\Theta_A$ in every EZ-SC with $\lambda = 0$, $p_B = 1$. There is no EZ-SC with $a_{BB} = 1$. This is because $b^*(1,1) = b^\bullet + \frac{m}{2}$. As discussed before, under this inference the best response to 1 is 2, not 1. Now suppose $a_{BB} = 2$. Then $\mu_B$ puts probability 1 on $b^*(2,2) = b^\bullet + \frac{m}{4}$. With this inference, the biased agents expect $1 \cdot \big(3(b^\bullet + \frac{m}{4}) - m\big) = 3b^\bullet - \frac{m}{4}$ from playing 1 against rival investment 2, and expect $2 \cdot \big(4(b^\bullet + \frac{m}{4}) - m\big) - c = 8b^\bullet - c$ from playing 2 against rival investment 2. We have $5b^\bullet + \frac{m}{4} - c > 0$, so 2 is the subjective best response to 2. As before, $a_{AA} = a_{AB} = 1$. We conclude the unique EZ-SC behavior is $(a_{AA}, a_{AB}, a_{BA}, a_{BB}) = (1, 1, 1, 2)$: under the inference $b^\bullet + \frac{m}{4}$, the biased agents expect $1 \cdot \big(2(b^\bullet + \frac{m}{4}) - m\big) = 2b^\bullet - \frac{m}{2}$ from playing 1 against rival investment 1, and expect $2 \cdot \big(3(b^\bullet + \frac{m}{4}) - m\big) - c = 6b^\bullet - \frac{m}{2} - c$ from playing 2 against rival investment 1. We have $4b^\bullet - c < 0$, so 1 is their best response to 1. Hence in every EZ-SC with $\lambda = 0$ and $p = (0,1)$, the fitness of $\Theta_A$ is $2b^\bullet$ and the fitness of $\Theta_B$ is $8b^\bullet - c$, where $8b^\bullet - c > 2b^\bullet$ by Condition 1.
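The payoff comparisons in this proof can be checked numerically. The sketch below is illustrative only: it assumes, as the arithmetic of the proof suggests, that an agent investing $a_i$ against $a_{-i}$ objectively earns $a_i (a_i + a_{-i}) b^\bullet$ minus a cost $c$ when $a_i = 2$, and that a biased agent with belief $b$ expects $a_i((a_i + a_{-i}) b - m)$ minus the same cost. The parameter values ($b^\bullet = 1$, $m = 6$, $c = 5.5$) and all function names are hypothetical choices, picked so that the inequalities used in the proof hold.

```python
# Hypothetical parameters chosen so that the inequalities used in the proof hold.
B_TRUE, M, C = 1.0, 6.0, 5.5


def inferred_b(a_i, a_j, b_true=B_TRUE, m=M):
    """Belief b*(a_i, a_j) that makes the biased model fit the observed data exactly."""
    return b_true + m / (a_i + a_j)


def subjective_payoff(a_i, a_j, b, m=M, c=C):
    """Payoff a biased agent expects under belief b (assumed functional form)."""
    return a_i * ((a_i + a_j) * b - m) - (c if a_i == 2 else 0.0)


def objective_payoff(a_i, a_j, b_true=B_TRUE, c=C):
    """True expected payoff (assumed functional form)."""
    return a_i * (a_i + a_j) * b_true - (c if a_i == 2 else 0.0)


def biased_best_response(a_j, b):
    """Investment in {1, 2} maximizing the subjective payoff against a_j."""
    return max((1, 2), key=lambda a: subjective_payoff(a, a_j, b))


# Theta_A resident: biased agents learn from matches against action 1.
b_vs_A = inferred_b(2, 1)
# Theta_B resident: biased agents learn from matches among themselves at (2, 2).
b_vs_B = inferred_b(2, 2)

print("a_BA when A is resident:", biased_best_response(1, b_vs_A))
print("a_BB when B is resident:", biased_best_response(2, b_vs_B))
print("a_BA when B is resident:", biased_best_response(1, b_vs_B))
print("fitness of A and B when B is resident:",
      objective_payoff(1, 1), objective_payoff(2, 2))
```

Under these assumed parameters the misspecified theory loses when rare but wins when resident, which is the reversal the example is built to show.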

A.4 Proof of Proposition 3

Proof. To show the first claim, by way of contradiction, suppose $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1,0), \lambda = 0, (a_{AA}, a_{AB}, a_{BA}, a_{BB}))$ is an EZ-SC, and $\tilde Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0,1), \lambda = 0, (\tilde a_{AA}, \tilde a_{AB}, \tilde a_{BA}, \tilde a_{BB}))$ is another EZ-SC where the adherents of $\Theta_B$ hold the same belief $\mu_B$ (group $A$'s belief cannot change as $\Theta_A$ is the correctly specified singleton theory). By the optimality of behavior in $Z$, $a_{BA}$ best responds to $a_{AB}$ under the belief $\mu_B$, and $a_{AB}$ best responds to $a_{BA}$ under the belief $\mu_A$, therefore $\tilde Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0,1), \lambda = 0, (\tilde a_{AA}, a_{AB}, a_{BA}, \tilde a_{BB}))$ is another EZ-SC. This holds because the distributions of observations for the adherents of $\Theta_B$ are identical in $\tilde Z$ and $\tilde Z'$, since they only face data generated from the profile $(\tilde a_{BB}, \tilde a_{BB})$. At the same time, since $\tilde a_{BB}$ best responds to itself under the belief $\mu_B$, we have that $Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1,0), \lambda = 0, (a_{AA}, a_{AB}, a_{BA}, \tilde a_{BB}))$ is an EZ-SC. Part (i) of the definition of stability reversal applied to $Z'$ requires that $U^\bullet(a_{AB}, a_{BA}) > U^\bullet(\tilde a_{BB}, \tilde a_{BB})$ (where $U^\bullet$ is the objective expected payoff), but part (ii) of the same definition applied to $\tilde Z'$ requires $U^\bullet(\tilde a_{BB}, \tilde a_{BB}) \ge U^\bullet(a_{AB}, a_{BA})$, a contradiction.

To show the second claim, by way of contradiction suppose $\Theta_B$ is strategically independent and $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0,1), \lambda = 0, (a_{AA}, a_{AB}, a_{BA}, a_{BB}))$ is an EZ-SC. We can write $\Theta_B = \mathbb{A}^2 \times \mathcal{F}_B$. By definition of EZ-SC, $\mu_B$ can be written as $\delta_{a_{AB}} \times \delta_{a_{BB}} \times \mu_{\mathcal{F}_B}$ where $\mu_{\mathcal{F}_B} \in \Delta(\mathcal{F}_B)$. By strategic independence, the adherents of $\Theta_B$ find it optimal to play $a_{BB}$ against any opponent strategy under the belief that $F$ is drawn from $\mu_{\mathcal{F}_B}$. So, there exists another EZ-SC of the form $Z' = (\Theta_A, \Theta_B, \mu_A, \delta_{a'_{AB}} \times \delta_{a_{BB}} \times \mu_{\mathcal{F}_B}, p = (0,1), \lambda = 0, (a_{AA}, a'_{AB}, a_{BB}, a_{BB}))$, where $a'_{AB}$ is an objective best response to $a_{BB}$. The belief $\mu_{\mathcal{F}_B}$ is sustained because in both $Z$ and $Z'$, the adherents of $\Theta_B$ have the same data: those generated from the strategy profile $(a_{BB}, a_{BB})$. In $Z'$, $\Theta_A$'s fitness is $U^\bullet(a'_{AB}, a_{BB})$ and $\Theta_B$'s fitness is $U^\bullet(a_{BB}, a_{BB})$. We have $U^\bullet(a'_{AB}, a_{BB}) \ge U^\bullet(a_{BB}, a_{BB})$ since $a'_{AB}$ is an objective best response to $a_{BB}$, contradicting the definition of stability reversal.

A.5 Proof of Proposition 4

Proof. Let $\lambda \in [0,1]$ be given and let $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1,0), \lambda, (a))$ be an EZ-SC. Since $\Theta_A, \Theta_B$ are singleton theories, $Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1,0), \lambda = 0, (a))$ and $Z'' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1,0), \lambda = 1, (a))$ are also EZ-SCs. Furthermore, they are all approachable since the same beliefs and behavior are sustained as EZ-SCs with any population proportions. Let $u_{g,g'}$ represent theory $\Theta_g$'s conditional fitness against group $g'$ in each of these three EZ-SCs. From the hypothesis of the proposition, $u_{A,A} \ge u_{B,A}$ and $u_{A,A} \ge u_{B,B}$. This means the fitness of $\Theta_A$ in $Z$, which is $u_{A,A}$, is weakly larger than the fitness of $\Theta_B$ in $Z$, which is $\lambda u_{B,B} + (1-\lambda) u_{B,A}$. This shows $\Theta_A$ has weakly higher fitness than $\Theta_B$ in every approachable EZ-SC with $\lambda$ and $p = (1,0)$. Finally, at least one approachable EZ-SC exists under $\lambda$, for at least one approachable EZ-SC exists when $\lambda = 0$, and the same equilibrium belief and behavior also constitute an EZ-SC for any other assortativity.

A.6 Proof of Example 2

Proof.

Let KL , := 0 . · ln . . + 0 . · ln . . ≈ . , KL , := 0 . · ln . . + 0 . · ln . . ≈ . , and KL , := 0 . · ln . . + 0 . · ln . . ≈ . λ h be the unique solution to(1 − λ ) KL , − λ ( KL , − KL , ) = 0 , so λ h ≈ . . We show for any λ ∈ [0 , λ h ), there exists a unique EZ-SC Z = (Θ A , Θ B , µ A , µ B , p =(1 , , λ, ( a )), and that this EZ-SC has µ B putting probability 1 on F H , a AA = a , a AB = a ,a BA = a , a BB = a . First, we may verify that under F H , a best responds to both a and a . Also, the KL divergence of F H is λ · KL , while that of F L is λ · KL , + (1 − λ ) · KL , . Since λ < λ h , we see that F H has strictly lower KL divergence. Finally, to check that there areno other EZ-SCs, note we must have a AA = a , a AB = a , a BA = a in every EZ-SC. In anEZ-SC where a BB puts probability q ∈ [0 ,

1] on a , the KL divergence of F H is λp · KL , andthe KL divergence of F L is λp · KL , + (1 − λ ) · KL , . We have λq · KL , +(1 − λ ) · KL , − λq · KL , = λq · ( KL , − KL . )+(1 − λ ) KL , ≥ (1 − λ ) KL , − λ ( KL , − KL , ) . Since λ < λ h , this is strictly positive. Therefore we must have µ B put probability 1 on F H , which in turn implies q = 1 . For each λ ∈ [0 , λ h ), the beliefs and behavior in the unique EZ-SC discussed above alsoconstitute an EZ-SC for a small enough p B > . So, the unique EZ-SC with p B = 0 isapproachable.When Θ A is dominant, the equilibrium ﬁtness of Θ A is always 0.25 for every λ . Theequilibrium ﬁtness of Θ B , as a function of λ , is 0 . λ + 0 . − λ ) . Let λ l solve 0 .

25 =0 . λ + 0 . − λ ) , that is λ l = 0 . . This shows Θ A is evolutionarily fragile with strategiccertainty against Θ B for λ ∈ ( λ l , λ h ) , and it is evolutionarily stable with strategic certaintyagainst Θ B for λ = 0.Now suppose λ = 1 . If there is an EZ-SC with p A = 1 where a BB plays a with positiveprobability, then µ B must put probability 1 on F L , since KL , < KL , . This is a contra-diction, since a does not best respond to itself under F L . So the unique EZ-SC involves a AA = a , a AB = a , a BA = a , a BB = a . It is easy to check this EZ-SC is approachable.In the EZ-SC, the ﬁtness of Θ A is 0.25, and the ﬁtness of Θ B is 0.2. This shows Θ A is39volutionarily stable with strategic certainty against Θ B for λ = 1 . A.7 Proof of Lemma 1

Proof. First note the minimization objective may be written as
$$W(a, m_g, F) := m_g K_g(a_{g,g}, a_{g,g}; F) + (1 - m_g) K_g(a_{g,-g}, a_{-g,g}; F),$$
a continuous function of $(a, m_g, F)$ by Assumption 4. Suppose we have a sequence $(a^{(n)}, m_g^{(n)}) \to (a^*, m_g^*) \in \mathbb{A}^4 \times [0,1]$ and let $F^{(n)} \in \Theta_g^*(a^{(n)}, m_g^{(n)})$ for each $n$, with $F^{(n)} \to F^* \in \Theta_g$. For any other $\hat F \in \Theta_g$, note that $W(a^*, m_g^*, \hat F) = \lim_{n \to \infty} W(a^{(n)}, m_g^{(n)}, \hat F)$ by continuity. But also by continuity, $W(a^*, m_g^*, F^*) = \lim_{n \to \infty} W(a^{(n)}, m_g^{(n)}, F^{(n)})$, and $W(a^{(n)}, m_g^{(n)}, F^{(n)}) \le W(a^{(n)}, m_g^{(n)}, \hat F)$ for every $n$. It therefore follows that $W(a^*, m_g^*, F^*) \le W(a^*, m_g^*, \hat F)$. Since $\hat F$ was arbitrary, $F^* \in \Theta_g^*(a^*, m_g^*)$.

A.8 Proof of Proposition 5

Proof. Consider the correspondence $\Gamma : \mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B) \rightrightarrows \mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B)$,
$$\Gamma(a_{AA}, a_{AB}, a_{BA}, a_{BB}, \mu_A, \mu_B) := \big(\mathrm{BR}(a_{AA}, \mu_A), \mathrm{BR}(a_{BA}, \mu_A), \mathrm{BR}(a_{AB}, \mu_B), \mathrm{BR}(a_{BB}, \mu_B), \Delta(\Theta_A^*(a)), \Delta(\Theta_B^*(a))\big),$$
where $\mathrm{BR}(a_{-i}, \mu_g) := \arg\max_{\hat a_i \in \mathbb{A}} U_g(\hat a_i, a_{-i}; \mu_g)$ and, for each $g \in \{A, B\}$, the correspondence $\Theta_g^*$ is defined with $m_g = \lambda + (1-\lambda) p_g$, $m_{-g} = 1 - m_g$. It is clear that fixed points of $\Gamma$ are EZ-SCs.

We apply the Kakutani-Fan-Glicksberg theorem (see, e.g., Corollary 17.55 in Aliprantis and Border (2006)). By Assumptions 1 and 5, $\mathbb{A}$ is a compact and convex metric space, and each $\Theta_g$ is a compact metric space, so it follows that the domain of $\Gamma$ is a nonempty, compact and convex metric space. We need only verify that $\Gamma$ has closed graph, non-empty values, and convex values.

To see that $\Gamma$ has closed graph, the previous lemma shows the upper hemicontinuity of $\Theta_A^*(a)$ and $\Theta_B^*(a)$ in $a$, and Theorem 17.13 of Aliprantis and Border (2006) then implies $\Delta(\Theta_A^*(a))$ and $\Delta(\Theta_B^*(a))$ are also upper hemicontinuous in $a$. By a standard argument, the continuity of $U_A, U_B$ supposed in Assumption 2 implies that the best-response correspondences $\mathrm{BR}(a_{AA}, \mu_A)$, $\mathrm{BR}(a_{BA}, \mu_A)$, $\mathrm{BR}(a_{AB}, \mu_B)$, $\mathrm{BR}(a_{BB}, \mu_B)$ have closed graphs.

To see that $\Gamma$ is non-empty valued, recall that each $\hat a_i \mapsto U_g(\hat a_i, a_{-i}; \mu_g)$ is a continuous function on a compact domain, so it must attain a maximum on $\mathbb{A}$. Similarly, the minimization problem that defines each $\Theta_g^*(a)$ has an objective that is a continuous function of $F$ over a compact domain of possible $F$'s, so it attains a minimum. Thus each $\Delta(\Theta_g^*(a))$ is the set of distributions over a non-empty set.

To see that $\Gamma$ is convex valued, clearly $\Delta(\Theta_A^*(a))$ and $\Delta(\Theta_B^*(a))$ are convex valued by definition. Also, $\hat a_i \mapsto U_A(\hat a_i, a_{AA}; \mu_A)$ is quasiconcave by Assumption 5. That means if $a_i', a_i'' \in \mathrm{BR}(a_{AA}, \mu_A)$, then for any convex combination $\tilde a_i$ of $a_i', a_i''$, we have $U_A(\tilde a_i, a_{AA}; \mu_A) \ge \min\big(U_A(a_i', a_{AA}; \mu_A), U_A(a_i'', a_{AA}; \mu_A)\big) = \max_{\hat a_i \in \mathbb{A}} U_A(\hat a_i, a_{AA}; \mu_A)$. Therefore, $\mathrm{BR}(a_{AA}, \mu_A)$ is convex. For similar reasons, $\mathrm{BR}(a_{BA}, \mu_A)$, $\mathrm{BR}(a_{AB}, \mu_B)$, $\mathrm{BR}(a_{BB}, \mu_B)$ are convex.

A.9 Proof of Proposition 6

Proof. Since $\mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B)$ is compact by Assumption 1, we need only show that for every sequence $(p_B^{(k)})_{k \ge 1}$ and $(a^{(k)}, \mu^{(k)})_{k \ge 1} = (a_{AA}^{(k)}, a_{AB}^{(k)}, a_{BA}^{(k)}, a_{BB}^{(k)}, \mu_A^{(k)}, \mu_B^{(k)})_{k \ge 1}$ such that for every $k$, $(a^{(k)}, \mu^{(k)})$ is an EZ-SC with $p = (1 - p_B^{(k)}, p_B^{(k)})$, $p_B^{(k)} \to p_B^*$, and $(a^{(k)}, \mu^{(k)}) \to (a^*, \mu^*)$, the limit $(a^*, \mu^*)$ is an EZ-SC with $p = (1 - p_B^*, p_B^*)$.

We first show that for all $g, g' \in \{A, B\}$, $a^*_{g,g'}$ is optimal against $a^*_{g',g}$ under the belief $\mu_g^*$. Assortativity does not matter here, since optimality applies within all type match-ups. By Assumption 2, $U_g(a_i, a_{-i}; F)$ is continuous, so by a property of convergence in distribution, $U_g(a^{(k)}_{g,g'}, a^{(k)}_{g',g}; \mu_g^{(k)}) \to U_g(a^*_{g,g'}, a^*_{g',g}; \mu_g^*)$. For any other $\hat a_i \in \mathbb{A}$, $U_g(\hat a_i, a^{(k)}_{g',g}; \mu_g^{(k)}) \to U_g(\hat a_i, a^*_{g',g}; \mu_g^*)$, and for every $k$, $U_g(a^{(k)}_{g,g'}, a^{(k)}_{g',g}; \mu_g^{(k)}) \ge U_g(\hat a_i, a^{(k)}_{g',g}; \mu_g^{(k)})$. Therefore $a^*_{g,g'}$ best responds to $a^*_{g',g}$ under belief $\mu_g^*$.

Next, we show models in the support of $\mu_g^*$ minimize weighted KL divergence for group $g$. First consider the correspondence $H : \mathbb{A}^4 \times [0,1] \rightrightarrows \Theta_g$ where $H(a, p_g) := \Theta_g^*(a, \lambda + (1-\lambda) p_g)$. Then $H$ is upper hemicontinuous by Lemma 1. Since $H(a, p_g)$ represents the minimizers of a continuous function on a compact domain, it is non-empty and closed. By Theorem 17.13 of Aliprantis and Border (2006), the correspondence $\tilde H : \mathbb{A}^4 \times [0,1] \rightrightarrows \Delta(\Theta_g)$ defined so that $\tilde H(a, p_g) := \Delta(H(a, p_g))$ is also upper hemicontinuous. For every $k$, $\mu_g^{(k)} \in \tilde H(a^{(k)}, p_g^{(k)})$, and $\mu_g^{(k)} \to \mu_g^*$, $a^{(k)} \to a^*$, $p_g^{(k)} \to p_g^*$. Therefore, $\mu_g^* \in \tilde H(a^*, p_g^*)$, that is to say $\mu_g^*$ is supported on the minimizers of weighted KL divergence.

A.10 Proof of Lemma 2

Proof. For $i \ne j$, rewrite
$$s_i = \Big(\omega + \tfrac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, z\Big) + \tfrac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, \eta_i \quad\text{and}\quad s_j = \Big(\omega + \tfrac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, z\Big) + \tfrac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, \eta_j.$$
Note that $\omega + \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z$ has a normal distribution with mean 0 and variance $\sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2$. The posterior distribution of $\omega + \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z$ given $s_i$ is therefore normal with a mean of
$$\frac{1 \big/ \big(\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big)}{1 \big/ \big(\sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big) + 1 \big/ \big(\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big)}\, s_i$$
and a variance of
$$\frac{1}{1 \big/ \big(\sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big) + 1 \big/ \big(\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big)}.$$
Since $\eta_j$ is mean-zero and independent of $i$'s signal, the posterior distribution of $s_j \mid s_i$ under the correlation parameter $\kappa$ is normal with the same mean and a variance equal to the variance above plus $\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2$. We thus define
$$\psi(\kappa) := \frac{1 \big/ \big(\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big)}{1 \big/ \big(\sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big) + 1 \big/ \big(\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big)}$$
for $\kappa \in [0,1)$, and $\psi(1) := 1$. To see that $\psi(\kappa)$ is strictly increasing in $\kappa$, we have
$$\frac{1}{\psi(\kappa)} = 1 + \frac{\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2}{\sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2} = 1 + \frac{(1-\kappa)^2 \sigma_\epsilon^2}{(\kappa^2 + (1-\kappa)^2) \sigma_\omega^2 + \kappa^2 \sigma_\epsilon^2},$$
and then we can verify that the second term is decreasing in $\kappa$. As $\kappa \to 1$, the term $1 \big/ \big(\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2\big)$ tends to $\infty$, so $\psi(\kappa)$ approaches 1. We also verify that $\psi(0) = \frac{1/\sigma_\epsilon^2}{(1/\sigma_\omega^2) + (1/\sigma_\epsilon^2)} > 0$. Finally, for any $\kappa \in [0,1]$, $\frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z + \frac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} \eta_i$ has variance $\sigma_\epsilon^2$ and mean 0, so $E_\kappa[\omega \mid s_i] = \frac{1/\sigma_\epsilon^2}{1/\sigma_\epsilon^2 + 1/\sigma_\omega^2} s_i$.
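As a numerical companion to the formula for $\psi(\kappa)$, the following sketch evaluates it on a grid and compares it with the regression coefficient $\mathrm{Cov}(s_i, s_j)/\mathrm{Var}(s_i)$ implied by the same signal decomposition. The variance values and function names are hypothetical, chosen only for illustration.

```python
def psi(kappa, var_omega, var_eps):
    """Posterior-mean coefficient of s_j on s_i, from the precision-weighting formula."""
    if kappa == 1.0:
        return 1.0  # defined separately, since the idiosyncratic variance is zero
    w = kappa ** 2 + (1.0 - kappa) ** 2
    var_common = var_omega + kappa ** 2 / w * var_eps   # variance of omega + common noise
    var_idio = (1.0 - kappa) ** 2 / w * var_eps         # variance of the idiosyncratic noise
    prec_common, prec_idio = 1.0 / var_common, 1.0 / var_idio
    return prec_idio / (prec_common + prec_idio)


def psi_regression(kappa, var_omega, var_eps):
    """Same coefficient computed as Cov(s_i, s_j) / Var(s_i)."""
    w = kappa ** 2 + (1.0 - kappa) ** 2
    cov = var_omega + kappa ** 2 / w * var_eps
    return cov / (var_omega + var_eps)


VAR_OMEGA, VAR_EPS = 1.0, 2.0  # hypothetical variances
grid = [k / 100 for k in range(101)]
values = [psi(k, VAR_OMEGA, VAR_EPS) for k in grid]
print("psi(0) =", values[0], " psi(0.5) =", values[50], " psi(1) =", values[-1])
```

The two functions agree because the common and idiosyncratic variances always sum to $\sigma_\omega^2 + \sigma_\epsilon^2$, which is also why the second moment of $s_i$ does not depend on $\kappa$.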
We then define $\gamma$ as the strictly positive constant $\frac{1/\sigma_\epsilon^2}{1/\sigma_\epsilon^2 + 1/\sigma_\omega^2}$.

A.11 Proof of Lemma 3

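Proof. Player $i$'s conditional expected utility given signal $s_i$ is
$$\alpha_i s_i \cdot E_\kappa\Big[E_{r \sim \mathrm{marg}_r(\mu)}\big[\omega - \tfrac12 r \alpha_i s_i - \tfrac12 r \alpha_{-i} s_{-i} + \zeta\big] \,\Big|\, s_i\Big] - \tfrac12 (\alpha_i s_i)^2.$$
By linearity, the expectation over $r$ is equivalent to evaluating the inner expectation with $r = \hat r$, which gives
$$\begin{aligned}
\alpha_i s_i \cdot E_\kappa\big[\omega - \tfrac12 \hat r \alpha_i s_i - \tfrac12 \hat r \alpha_{-i} s_{-i} + \zeta \,\big|\, s_i\big] - \tfrac12 (\alpha_i s_i)^2
&= \alpha_i s_i \cdot \big(\gamma s_i - \tfrac12 \hat r \alpha_i s_i - \tfrac12 \hat r \psi(\kappa) s_i \alpha_{-i}\big) - \tfrac12 (\alpha_i s_i)^2 \\
&= s_i^2 \cdot \big(\alpha_i \gamma - \tfrac12 \hat r \alpha_i^2 - \tfrac12 \hat r \psi(\kappa) \alpha_i \alpha_{-i} - \tfrac12 \alpha_i^2\big).
\end{aligned}$$
The ex-ante expected utility is the expectation of this expression over $s_i$, and the second moment of $s_i$ is the same for all values of $\kappa$. Therefore this expectation is
$$E[s_i^2] \cdot \big(\alpha_i \gamma - \tfrac12 \hat r \alpha_i^2 - \tfrac12 \hat r \psi(\kappa) \alpha_i \alpha_{-i} - \tfrac12 \alpha_i^2\big).$$
The expression for $\alpha_i^{BR}(\alpha_{-i}; \kappa, r)$ follows from simple algebra, noting that $E[s_i^2] > 0$ and that the second derivative with respect to $\alpha_i$ for the term in the parenthesis is $-\hat r - 1 < 0$.

To see that the said linear strategy is optimal among all strategies, suppose $i$ instead chooses any $q_i$ after $s_i$. By the above arguments, the objective to maximize is
$$q_i \cdot \big(\gamma s_i - \tfrac12 \hat r q_i - \tfrac12 \hat r \psi(\kappa) s_i \alpha_{-i}\big) - \tfrac12 q_i^2.$$
This objective is a strictly concave function in $q_i$, as $-\hat r - 1 < 0$. The first-order condition finds the maximizer $q_i^* = \alpha_i^{BR}(\alpha_{-i}; \kappa, \hat r) \cdot s_i$. Therefore, the linear strategy also maximizes interim expected utility after every signal $s_i$, and so it cannot be improved on by any other strategy.

A.12 Proof of Lemma 4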

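Proof. Note that $\frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)} \ge 0$ and
$$\frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)} = 1 + \frac{\alpha_{-i} (\psi(\kappa^\bullet) - \psi(\kappa))}{\alpha_i + \alpha_{-i} \psi(\kappa)} \le 1 + \frac{1}{\psi(0)}$$
(recalling $\psi(0) > 0$). Let $L_1 = r^\bullet \cdot (1 + \frac{1}{\psi(0)})$. When $\bar M_r \ge L_1$, we always have $r_i^{INF}(\alpha_i, \alpha_{-i}; \kappa^\bullet, \kappa, r^\bullet) \le \bar M_r$ for all $\alpha_i, \alpha_{-i} \ge 0$ and $\kappa^\bullet, \kappa \in [0,1]$.

Conditional on the signal $s_i$, the distribution of market price under the model $F_{\hat r, \kappa, \hat\sigma_\zeta}$ is normal with a mean of
$$E[\omega \mid s_i] - \tfrac12 \hat r \alpha_i s_i - \tfrac12 \hat r \alpha_{-i} \cdot E_\kappa[s_{-i} \mid s_i] = \gamma s_i - \tfrac12 \hat r \alpha_i s_i - \tfrac12 \hat r \alpha_{-i} \psi(\kappa) s_i,$$
while the distribution of market price under the model $F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}$ is normal with a mean of
$$E[\omega \mid s_i] - \tfrac12 r^\bullet \alpha_i s_i - \tfrac12 r^\bullet \alpha_{-i} \cdot E_{\kappa^\bullet}[s_{-i} \mid s_i] = \gamma s_i - \tfrac12 r^\bullet \alpha_i s_i - \tfrac12 r^\bullet \alpha_{-i} \psi(\kappa^\bullet) s_i.$$
Matching coefficients on $s_i$, we find that if $\hat r = r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}$, then these means match after every $s_i$. On the other hand, for any other value of $\hat r$, these means will not match for any $s_i \ne 0$, and thus $D_{KL}\big(F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}(\alpha_i, \alpha_{-i}) \,\|\, F_{\hat r, \kappa, \hat\sigma_\zeta}(\alpha_i, \alpha_{-i})\big) > 0$ whenever $\hat r \ne r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}$.

Let
$$L_2 = \max_{\kappa \in [0,1]} \Big\{ \mathrm{Var}_\kappa[\omega \mid s_i] + \mathrm{Var}_\kappa\big[\tfrac12 r^\bullet \cdot (1 + \tfrac{1}{\psi(0)}) B_\alpha \cdot s_{-i} \,\big|\, s_i\big] \Big\}.$$
This maximum exists and is finite, since the expression is a continuous function of $\kappa$ on the compact domain $[0,1]$. Also, let
$$L_3 = \max_{\kappa \in [0,1]} \Big\{ \mathrm{Var}_\kappa[\omega \mid s_i] + \mathrm{Var}_\kappa\big[\tfrac12 r^\bullet B_\alpha \cdot s_{-i} \,\big|\, s_i\big] \Big\},$$
where the maximum exists for the same reason. Write $\hat r^* := r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}$ for the mean-matching value found above. Conditional on the signal $s_i$, the variance of market price under the model $F_{\hat r^*, \kappa, \hat\sigma_\zeta}$ is
$$\mathrm{Var}_\kappa\big[\omega - \tfrac12 \hat r^* \alpha_{-i} s_{-i} \,\big|\, s_i\big] + \hat\sigma_\zeta^2.$$
Since $\omega$ and $s_{-i}$ are positively correlated given $s_i$, and using the facts $\hat r^* \le r^\bullet \cdot (1 + \frac{1}{\psi(0)})$ and $\alpha_{-i} \le B_\alpha$, this variance is no larger than
$$\mathrm{Var}_\kappa[\omega \mid s_i] + \mathrm{Var}_\kappa\big[\tfrac12 r^\bullet \cdot (1 + \tfrac{1}{\psi(0)}) B_\alpha \cdot s_{-i} \,\big|\, s_i\big] + \hat\sigma_\zeta^2 \le L_2 + \hat\sigma_\zeta^2.$$
On the other hand, the variance of market price under the model $F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}$ is
$$\mathrm{Var}_{\kappa^\bullet}\big[\omega - \tfrac12 r^\bullet \alpha_{-i} s_{-i} \,\big|\, s_i\big] + (\sigma^\bullet_\zeta)^2 \le \mathrm{Var}_{\kappa^\bullet}[\omega \mid s_i] + \mathrm{Var}_{\kappa^\bullet}\big[\tfrac12 r^\bullet B_\alpha \cdot s_{-i} \,\big|\, s_i\big] + (\sigma^\bullet_\zeta)^2 \le L_3 + (\sigma^\bullet_\zeta)^2.$$
At the same time, since $(\sigma^\bullet_\zeta)^2 \ge L_2$, this conditional variance is at least $L_2$. Among values of $\hat\sigma_\zeta^2 \in [0, \bar M_{\sigma_\zeta}]$, there exists exactly one such that the conditional variance under $F_{\hat r^*, \kappa, \hat\sigma_\zeta}$ is the same as that under $F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}$, since we have let $\bar M_{\sigma_\zeta} \ge (\sigma^\bullet_\zeta)^2 + L_3$. Thus there is one choice of $\hat\sigma_\zeta^2 \in [0, \bar M_{\sigma_\zeta}]$ such that $D_{KL}\big(F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}(\alpha_i, \alpha_{-i}) \,\|\, F_{\hat r^*, \kappa, \hat\sigma_\zeta}(\alpha_i, \alpha_{-i})\big) = 0$. For any other choice of $\tilde\sigma_\zeta$, we conclude that $D_{KL}\big(F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}(\alpha_i, \alpha_{-i}) \,\|\, F_{\hat r^*, \kappa, \tilde\sigma_\zeta}(\alpha_i, \alpha_{-i})\big) > 0$.

A.13 Proof of Lemma 5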

Proof. Assumption 1 holds as $\mathbb{A}, \Theta_A, \Theta_B$ are compact due to the finite bounds $\bar M_\alpha, \bar M_r, \bar M_{\sigma_\zeta}$. Also, from Lemma 3, the expected utility from playing $\alpha_i$ against $\alpha_{-i}$ in a model with parameters $(\hat r, \kappa, \sigma_\zeta)$ is $E[s_i^2] \cdot \big(\alpha_i \gamma - \tfrac12 \hat r \alpha_i^2 - \tfrac12 \hat r \psi(\kappa) \alpha_i \alpha_{-i} - \tfrac12 \alpha_i^2\big)$. This is a continuous function in $(\alpha_i, \alpha_{-i}, \hat r)$ and strictly concave in $\alpha_i$. Therefore Assumptions 2 and 5 are satisfied.

To see the finiteness and continuity of the $K$ functions, first recall that the KL divergence from a true distribution $N(\mu_1, \sigma_1^2)$ to a different distribution $N(\mu_2, \sigma_2^2)$ is given by
$$\ln(\sigma_2 / \sigma_1) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac12.$$
Under own play $\alpha_i$, opponent play $\alpha_{-i}$, correlation parameter $\kappa$, elasticity $\hat r$ and price idiosyncratic variance $\sigma_\zeta^2$, the expected distribution of price after signal $s_i$ is
$$-\tfrac12 \hat r \alpha_i s_i + \big(\omega - \tfrac12 \hat r \alpha_{-i} s_{-i} \mid s_i, \kappa\big) + \hat\zeta,$$
where the first term is not random, and the middle term is the conditional distribution of $\omega - \tfrac12 \hat r \alpha_{-i} s_{-i}$ given $s_i$, based on the joint distribution of $(\omega, s_i, s_{-i})$ with correlation parameter $\kappa$. The final term is an independent random variable with mean 0 and variance $\sigma_\zeta^2$. The analogous true distribution of price is
$$-\tfrac12 r^\bullet \alpha_i s_i + \big(\omega - \tfrac12 r^\bullet \alpha_{-i} s_{-i} \mid s_i, \kappa^\bullet\big) + \zeta^\bullet,$$
where $\zeta^\bullet$ is an independent random variable with mean 0 and variance $(\sigma^\bullet_\zeta)^2$. For a fixed $\kappa$, we may find $0 < \underline\sigma^2 < \bar\sigma^2 < \infty$ so that the variances of both distributions lie in $[\underline\sigma^2, \bar\sigma^2]$ for all $s_i \in \mathbb{R}$, $\alpha_i, \alpha_{-i} \in [0, \bar M_\alpha]$, $\hat r \in [0, \bar M_r]$. First note that as a consequence of multivariate normality, the variances of these two expressions do not change with the realization of $s_i$. The lower bound comes from the fact that $\mathrm{Var}_\kappa(\omega - \tfrac12 \hat r \alpha_{-i} s_{-i} \mid s_i)$ is nonzero for all $\alpha_{-i}, \hat r$ in the compact domains and it is a continuous function of these two arguments, so it must have some positive lower bound $\underline\sigma^2 > 0$. For a similar reason, the variance of the middle term has an upper bound for choices of the parameters $\alpha_{-i}, \hat r$ in the compact domains, and the inference about $\sigma_\zeta$ is also bounded.

The difference in the means of the two distributions is no larger in absolute value than $|s_i| \cdot \Delta$, where
$$\Delta := \tfrac12 (\bar M_r + r^\bullet) \bar M_\alpha + \tfrac12 (\bar M_r + r^\bullet) \bar M_\alpha \cdot (\psi(\kappa) + \psi(\kappa^\bullet)).$$
Thus consider the function
$$h(s_i) := \ln(\bar\sigma / \underline\sigma) + \frac12 (\bar\sigma^2 / \underline\sigma^2) + \frac{\Delta^2}{2 \underline\sigma^2} s_i^2 - \frac12.$$
That is, $h(s_i)$ has the form $h(s_i) = C_1 + C_2 s_i^2$ for constants $C_1, C_2$. It is absolutely integrable against the distribution of $s_i$, and it dominates the KL divergence between the true and expected price distributions at every $s_i$ and for any choices of $\alpha_i, \alpha_{-i} \in [0, \bar M_\alpha]$, $\hat r \in [0, \bar M_r]$, $\sigma_\zeta^2 \in [0, \bar M_{\sigma_\zeta}]$. This shows $K_A, K_B$ are finite, so Assumption 3 holds.
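The closed-form KL divergence between two normal distributions, and the way a bound of the form $C_1 + C_2 s_i^2$ dominates it, can be illustrated with a short standard-library sketch. The function names and the numerical values below are hypothetical; the numerical integral is a plain trapezoid rule used only as a cross-check.

```python
import math


def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL divergence from the true N(mu1, sigma1^2) to the model N(mu2, sigma2^2)."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2) - 0.5)


def kl_normal_numeric(mu1, sigma1, mu2, sigma2, n=20001, width=12.0):
    """Trapezoid-rule approximation of the integral of p * ln(p / q)."""
    lo, hi = mu1 - width * sigma1, mu1 + width * sigma1
    step = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * step
        log_p = -0.5 * ((x - mu1) / sigma1) ** 2 - math.log(sigma1 * math.sqrt(2 * math.pi))
        log_q = -0.5 * ((x - mu2) / sigma2) ** 2 - math.log(sigma2 * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(log_p) * (log_p - log_q)
    return total * step


def kl_upper_bound(mean_gap, sd_low, sd_high):
    """Bound valid whenever both standard deviations lie in [sd_low, sd_high]."""
    return (math.log(sd_high / sd_low) + sd_high ** 2 / (2.0 * sd_low ** 2)
            + mean_gap ** 2 / (2.0 * sd_low ** 2) - 0.5)


closed = kl_normal(0.3, 1.2, -0.5, 0.8)
approx = kl_normal_numeric(0.3, 1.2, -0.5, 0.8)
print("closed form:", closed, " numerical:", approx)
print("bound with sd in [0.8, 1.2]:", kl_upper_bound(0.8, 0.8, 1.2))
```

Because the mean gap is linear in $s_i$, substituting a bound of the form $|s_i| \cdot \Delta$ into `kl_upper_bound` yields a quadratic in $s_i$, which is integrable against a normal distribution.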
Further, since the KL divergence is a continuous function of the means and variances of the price distributions, and since these mean and variance parameters are continuous functions of $\alpha_i, \alpha_{-i}, \hat r, \sigma_\zeta$, the existence of the absolutely integrable dominating function $h$ also proves $K_A, K_B$ (as integrals of KL divergences across different $s_i$) are continuous, so Assumption 4 holds.

A.14 Proof of Proposition 7

Proof. We can take $L_1, L_2, L_3$ as given by Lemma 4. Suppose there is an EZ-SC with behavior $\alpha = (\alpha_{AA}, \alpha_{AB}, \alpha_{BA}, \alpha_{BB})$ and beliefs over parameters $\mu_A \in \Delta(\Theta(\kappa^\bullet))$, $\mu_B \in \Delta(\Theta(\kappa))$. By Lemma 4, both $\mu_A$ and $\mu_B$ must be degenerate beliefs that induce zero KL divergence, since both groups match up with group $A$ with probability 1. Furthermore, since $\Theta_A$ is correctly specified, it is easy to see that the model $F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}$ generates 0 KL divergence, hence the belief of the adherents of $\Theta_A$ must be degenerate on this correct model.

In terms of behavior, from Lemma 3, $\alpha_i^{BR}(\alpha_{-i}; \kappa, r) \le \gamma$ for all $\alpha_{-i} \ge 0$, $\kappa \in [0,1]$, $r \ge 0$. Since the upper bound $\bar M_\alpha \ge \gamma$, the adherents of each theory must be best responding (across all linear strategies in $[0, \infty)$) in all matches, given their beliefs about the environment. Using the equilibrium belief of group $A$, we must have $\alpha_{AA} = \alpha_i^{BR}(\alpha_{AA}; \kappa^\bullet, r^\bullet)$, so $\alpha_{AA} = \frac{\gamma - \frac12 r^\bullet \psi(\kappa^\bullet) \alpha_{AA}}{1 + r^\bullet}$. We find the unique solution $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac12 r^\bullet \psi(\kappa^\bullet)}$.

Next we turn to $\alpha_{AB}$, $\alpha_{BA}$, and $\mu_B$. We know $\mu_B$ puts probability 1 on some $r_B$. For adherents of groups $A$ and $B$ to best respond to each other's play and for group $B$'s inference to have 0 KL divergence (when paired with an appropriate choice of $\sigma_\zeta$), we must have
$$\alpha_{AB} = \frac{\gamma - \frac12 r^\bullet \psi(\kappa^\bullet) \alpha_{BA}}{1 + r^\bullet}, \qquad \alpha_{BA} = \frac{\gamma - \frac12 r_B \psi(\kappa) \alpha_{AB}}{1 + r_B}, \qquad r_B = r^\bullet \frac{\alpha_{BA} + \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)}$$
from Lemma 4. We may rearrange the expression for $\alpha_{BA}$ to say $\alpha_{BA} = \gamma - r_B \alpha_{BA} - \tfrac12 r_B \psi(\kappa) \alpha_{AB}$.
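The three conditions above form a fixed-point system in $(\alpha_{AB}, \alpha_{BA}, r_B)$. A minimal numerical sketch solves it by simple iteration. The parameter values, the starting point, and the function names are hypothetical, $\psi(\kappa)$ and $\psi(\kappa^\bullet)$ are passed in directly as numbers, and truncating the best response at zero is an assumption made here to keep strategies in $[0, \infty)$.

```python
def best_response(alpha_opp, psi, r, gamma):
    """Best-response coefficient from the first-order condition, truncated at zero."""
    return max(0.0, (gamma - 0.5 * r * psi * alpha_opp) / (1.0 + r))


def inferred_r(alpha_own, alpha_opp, psi_true, psi_model, r_true):
    """Elasticity that makes the misspecified model match the observed mean price."""
    return r_true * (alpha_own + alpha_opp * psi_true) / (alpha_own + alpha_opp * psi_model)


def solve_cross_match(psi_true, psi_model, r_true, gamma, iters=500):
    """Iterate the three equilibrium conditions for the A-versus-B match."""
    a_ab = a_ba = 0.1  # arbitrary positive starting point
    for _ in range(iters):
        r_b = inferred_r(a_ba, a_ab, psi_true, psi_model, r_true)
        a_ba = best_response(a_ab, psi_model, r_b, gamma)
        a_ab = best_response(a_ba, psi_true, r_true, gamma)
    r_b = inferred_r(a_ba, a_ab, psi_true, psi_model, r_true)
    return a_ab, a_ba, r_b


GAMMA, R_TRUE, PSI_TRUE = 0.5, 1.0, 0.6  # hypothetical values
x_star = GAMMA / (1.0 + R_TRUE + 0.5 * R_TRUE * PSI_TRUE)

print("correct specification:", solve_cross_match(PSI_TRUE, PSI_TRUE, R_TRUE, GAMMA))
print("symmetric benchmark  :", x_star)
print("psi(kappa) = 0.7     :", solve_cross_match(PSI_TRUE, 0.7, R_TRUE, GAMMA))
```

With $\psi(\kappa) = \psi(\kappa^\bullet)$ the iteration should return the symmetric solution $\gamma / (1 + r^\bullet + \tfrac12 r^\bullet \psi(\kappa^\bullet))$ for both coefficients. Plain iteration is only a convenience and is not guaranteed to converge for every parameter choice.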
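Substituting the expression of $r_B$ into this expression of $\alpha_{BA}$, we get
$$\begin{aligned}
\alpha_{BA} &= \gamma - r_B \cdot \big(\alpha_{BA} + \alpha_{AB} \psi(\kappa) - \tfrac12 \alpha_{AB} \psi(\kappa)\big) \\
&= \gamma - r^\bullet \frac{\alpha_{BA} + \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)} \cdot \big(\alpha_{BA} + \alpha_{AB} \psi(\kappa) - \tfrac12 \alpha_{AB} \psi(\kappa)\big) \\
&= \gamma - r^\bullet \alpha_{BA} - r^\bullet \alpha_{AB} \psi(\kappa^\bullet) + \tfrac12 \psi(\kappa) \alpha_{AB} \frac{r^\bullet \alpha_{BA} + r^\bullet \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)}.
\end{aligned}$$
Multiply by $\alpha_{BA} + \alpha_{AB} \psi(\kappa)$ on both sides and collect terms by powers of $\alpha$:
$$(\alpha_{BA})^2 \cdot [-1 - r^\bullet] + (\alpha_{BA} \alpha_{AB}) \cdot \big[-\psi(\kappa) - \tfrac12 r^\bullet \psi(\kappa) - r^\bullet \psi(\kappa^\bullet)\big] - (\alpha_{AB})^2 \cdot \big[\tfrac12 r^\bullet \psi(\kappa^\bullet) \psi(\kappa)\big] + \gamma [\alpha_{BA} + \alpha_{AB} \psi(\kappa)] = 0.$$
Consider the following quadratic function in $x$,
$$H(x) := x^2 [-1 - r^\bullet] + (x \cdot \ell(x)) \cdot \big[-\psi(\kappa) - \tfrac12 r^\bullet \psi(\kappa) - r^\bullet \psi(\kappa^\bullet)\big] - (\ell(x))^2 \cdot \big[\tfrac12 r^\bullet \psi(\kappa^\bullet) \psi(\kappa)\big] + \gamma [x + \ell(x) \psi(\kappa)], \tag{1}$$
where $\ell(x) := \frac{\gamma - \frac12 r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet}$ is a linear function in $x$. In an EZ-SC, $\alpha_{BA}$ is a root of $H(x)$ in $\big[0, \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}\big]$. To see why, if we were to have $\alpha_{BA} > \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}$, then $\alpha_{AB} = 0$. In that case, $r_B = r^\bullet$ and so $\alpha_{BA} = \alpha_i^{BR}(0; \kappa^\bullet, r^\bullet) = \frac{\gamma}{1 + r^\bullet}$. Yet $\frac{\gamma}{1 + r^\bullet} < \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}$, a contradiction. Conversely, for any root $x^*$ of $H(x)$ in $\big[0, \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}\big]$, there is an EZ-SC where $\alpha_{BA} = x^*$, $\alpha_{AB} = \ell(x^*) \in [0, \gamma]$, and $r_B = r^\bullet \frac{\alpha_{BA} + \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)}$.

We now show that, when $\kappa = \kappa^\bullet$, $H(x)$ (i) has a unique root in $\big[0, \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}\big]$; (ii) does not have a root at $x = 0$ or $x = \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}$; and (iii) the root in the interval is not a double root. Since $H(x)$ is a continuous function of $\kappa$, there must exist some $\underline\kappa_1 < \kappa^\bullet < \bar\kappa_1$ so that it continues to have a unique root in $\big[0, \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}\big]$ for all $\kappa \in [\underline\kappa_1, \bar\kappa_1] \cap [0,1]$.

Claim (i) has to do with the fact that if $\kappa = \kappa^\bullet$, then we need $\alpha_{AB} = \frac{\gamma - \frac12 r^\bullet \psi(\kappa^\bullet) \alpha_{BA}}{1 + r^\bullet}$ and $\alpha_{BA} = \frac{\gamma - \frac12 r^\bullet \psi(\kappa^\bullet) \alpha_{AB}}{1 + r^\bullet}$. These are linear best response functions with a slope of $-\frac{\frac12 r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet)$, which falls in $(-\frac12, 0)$. So there can only be one solution to $H$ in that region (even when we allow $\alpha_{AB} \ne \alpha_{BA}$), which is the symmetric equilibrium found before, $\alpha_{AB} = \alpha_{BA} = \frac{\gamma}{1 + r^\bullet + \frac12 r^\bullet \psi(\kappa^\bullet)}$.

For Claim (ii), we evaluate
$$H(0) = -\Big(\frac{\gamma}{1 + r^\bullet}\Big)^2 \tfrac12 r^\bullet \psi(\kappa^\bullet)^2 + \frac{\gamma^2 \psi(\kappa^\bullet)}{1 + r^\bullet} = \frac{\psi(\kappa^\bullet) \gamma^2}{1 + r^\bullet} \Big(1 - \frac{(1/2) r^\bullet \psi(\kappa^\bullet)}{1 + r^\bullet}\Big) \ne 0$$
because $1 + r^\bullet > (1/2) r^\bullet \psi(\kappa^\bullet)$. Finally, we evaluate
$$H\Big(\frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}\Big) = \Big(\frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}\Big)^2 (-1 - r^\bullet) + \gamma \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)} = \frac{\gamma^2}{\frac12 r^\bullet \psi(\kappa^\bullet)} \Big(1 - \frac{1 + r^\bullet}{\frac12 r^\bullet \psi(\kappa^\bullet)}\Big).$$
This is once again not 0 because $1 + r^\bullet > (1/2) r^\bullet \psi(\kappa^\bullet)$.

For Claim (iii), we show that $H'(x^*) < 0$ where $x^* = \frac{\gamma}{1 + r^\bullet + \frac12 r^\bullet \psi(\kappa^\bullet)}$. We find that
$$\begin{aligned}
H'(x) ={}& 2x(-1 - r^\bullet) + \Big(\frac{\gamma - r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet}\Big) \big(-\psi(\kappa^\bullet) - \tfrac12 r^\bullet \psi(\kappa^\bullet) - r^\bullet \psi(\kappa^\bullet)\big) \\
&- 2 \Big(\frac{\gamma - \frac12 r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet}\Big) \Big(\frac{-\frac12 r^\bullet \psi(\kappa^\bullet)}{1 + r^\bullet}\Big) \big(\tfrac12 r^\bullet \psi(\kappa^\bullet)^2\big) + \gamma - \frac{\frac12 r^\bullet \psi(\kappa^\bullet)}{1 + r^\bullet} \gamma \psi(\kappa^\bullet).
\end{aligned}$$
Collecting terms, the coefficient on $x$ is
$$-2 - 2r^\bullet + \frac{\psi(\kappa^\bullet)^2 r^\bullet}{1 + r^\bullet} \Big(\tfrac32 r^\bullet + 1 - \tfrac14 \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet}\Big),$$
while the constant term is
$$\frac{\gamma \psi(\kappa^\bullet)}{1 + r^\bullet} \Big(-\tfrac32 r^\bullet - 1 + \tfrac12 \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet} - \tfrac12 r^\bullet \psi(\kappa^\bullet)\Big) + \gamma.$$
Therefore, we may calculate $H'(x^*) \cdot \frac{1}{x^*} (1 + r^\bullet)^2$, which has the same sign as $H'(x^*)$, to be
$$\begin{aligned}
&-(1 + r^\bullet)^2 (2 + 2r^\bullet) + \psi(\kappa^\bullet)^2 r^\bullet \Big((1 + r^\bullet)\big(\tfrac32 r^\bullet + 1\big) - \tfrac14 (r^\bullet)^2 \psi(\kappa^\bullet)^2\Big) \\
&+ \big(1 + r^\bullet + \tfrac12 r^\bullet \psi(\kappa^\bullet)\big) \Big[\psi(\kappa^\bullet) \Big((1 + r^\bullet)\big[-\tfrac32 r^\bullet - 1 - \tfrac12 r^\bullet \psi(\kappa^\bullet)\big] + \tfrac12 (r^\bullet)^2 \psi(\kappa^\bullet)^2\Big) + (1 + r^\bullet)^2\Big].
\end{aligned}$$
We have
$$-(1 + r^\bullet)^2 (2 + 2r^\bullet) + \big(1 + r^\bullet + \tfrac12 r^\bullet \psi(\kappa^\bullet)\big)(1 + r^\bullet)^2 \le (1 + r^\bullet)^2 \big(-1 - \tfrac12 r^\bullet\big) < 0,$$
since $0 \le \psi(\kappa^\bullet) \le 1$. Also, for the same reason,
$$(1 + r^\bullet)\big[-\tfrac12 r^\bullet \psi(\kappa^\bullet)\big] + \tfrac12 (r^\bullet)^2 \psi(\kappa^\bullet)^2 \le -\tfrac12 (r^\bullet)^2 \psi(\kappa^\bullet) + \tfrac12 (r^\bullet)^2 \psi(\kappa^\bullet)^2 \le 0.$$
Finally, $\psi(\kappa^\bullet)^2 r^\bullet (1 + r^\bullet)(\tfrac32 r^\bullet + 1) + (1 + r^\bullet + \tfrac12 r^\bullet \psi(\kappa^\bullet)) \psi(\kappa^\bullet) (1 + r^\bullet)(-\tfrac32 r^\bullet - 1)$ is no larger than
$$\psi(\kappa^\bullet)^2 r^\bullet \big(\tfrac32 (r^\bullet)^2 + \tfrac52 r^\bullet + 1\big) + \big[r^\bullet \psi(\kappa^\bullet) r^\bullet (-\tfrac32 r^\bullet)\big] + \big[r^\bullet \psi(\kappa^\bullet) r^\bullet (-1) + 1 \cdot \psi(\kappa^\bullet) r^\bullet (-\tfrac32 r^\bullet)\big] + \big[r^\bullet \psi(\kappa^\bullet) \cdot 1 \cdot (-1)\big],$$
which equals $(\psi(\kappa^\bullet)^2 - \psi(\kappa^\bullet)) \cdot r^\bullet \big(\tfrac32 (r^\bullet)^2 + \tfrac52 r^\bullet + 1\big) \le 0$. The only remaining term, $-\tfrac14 (r^\bullet)^3 \psi(\kappa^\bullet)^4$, is also non-positive. Therefore $H'(x^*) < 0$.

We have shown that for $\kappa \in [\underline\kappa_1, \bar\kappa_1] \cap [0,1]$, $\alpha_{AB}$ and $\alpha_{BA}$ are pinned down in EZ-SC (and so is the inference $r_B(\kappa)$), since there is only one possible outcome in the match between group $A$ and group $B$. This means $\alpha_{BB}$ is also pinned down, since there is only one solution to $\alpha_{BB} = \alpha_i^{BR}(\alpha_{BB}; \kappa, r_B(\kappa))$. So for every $\kappa \in [\underline\kappa_1, \bar\kappa_1] \cap [0,1]$, the EZ-SC behavior is unique, and we denote it as a function of $\kappa$ by $\alpha(\kappa) = (\alpha_{AA}(\kappa), \alpha_{AB}(\kappa), \alpha_{BA}(\kappa), \alpha_{BB}(\kappa))$.

Recall from Lemma 3 that the objective expected utility from playing $\alpha_i$ against an opponent who plays $\alpha_{-i}$ is
$$U_i^\bullet(\alpha_i, \alpha_{-i}) = E[s_i^2] \cdot \big(\alpha_i \gamma - \tfrac12 r^\bullet \alpha_i^2 - \tfrac12 r^\bullet \psi(\kappa^\bullet) \alpha_i \alpha_{-i} - \tfrac12 \alpha_i^2\big).$$
If $-i$ plays the rational best response, then the objective expected utility of choosing $\alpha_i$ is
$$\bar U_i(\alpha_i) := E[s_i^2] \cdot \Big(\alpha_i \gamma - \tfrac12 r^\bullet \alpha_i^2 - \tfrac12 r^\bullet \psi(\kappa^\bullet) \alpha_i \frac{\gamma - \frac12 r^\bullet \psi(\kappa^\bullet) \alpha_i}{1 + r^\bullet} - \tfrac12 \alpha_i^2\Big).$$
Up to the positive factor $E[s_i^2]$, the derivative in $\alpha_i$ is
$$\bar U_i'(\alpha_i) = \gamma - r^\bullet \alpha_i - \frac{\frac12 r^\bullet}{1 + r^\bullet} \gamma \psi(\kappa^\bullet) + \tfrac12 \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet} \alpha_i - \alpha_i.$$
We also know that $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac12 r^\bullet \psi(\kappa^\bullet)}$ satisfies the first-order condition $\gamma - r^\bullet \alpha_{AA} - \tfrac12 r^\bullet \psi(\kappa^\bullet) \alpha_{AA} - \alpha_{AA} = 0$, therefore
$$\bar U_i'(\alpha_{AA}) = -\frac{\frac12 r^\bullet}{1 + r^\bullet} \gamma \psi(\kappa^\bullet) + \tfrac12 \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet} \alpha_{AA} + \tfrac12 r^\bullet \psi(\kappa^\bullet) \alpha_{AA} = \Big[\frac{r^\bullet \psi(\kappa^\bullet)}{2}\Big] \Big(-\frac{\gamma}{1 + r^\bullet} + \alpha_{AA} \frac{\psi(\kappa^\bullet) r^\bullet}{1 + r^\bullet} + \alpha_{AA}\Big).$$
Making the substitution $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac12 r^\bullet \psi(\kappa^\bullet)}$,
$$-\frac{\gamma}{1 + r^\bullet} + \alpha_{AA} \frac{\psi(\kappa^\bullet) r^\bullet}{1 + r^\bullet} + \alpha_{AA} = \frac{-\gamma (1 + r^\bullet + \frac12 \psi(\kappa^\bullet) r^\bullet) + \gamma \psi(\kappa^\bullet) r^\bullet + \gamma (1 + r^\bullet)}{(1 + r^\bullet)(1 + r^\bullet + \frac12 \psi(\kappa^\bullet) r^\bullet)} = \frac{\frac12 \gamma \psi(\kappa^\bullet) r^\bullet}{(1 + r^\bullet)(1 + r^\bullet + \frac12 \psi(\kappa^\bullet) r^\bullet)} > 0.$$
Therefore, if we can show that $\alpha_{BA}'(\kappa^\bullet) > 0$, then there exist some $\underline\kappa_1 \le \underline\kappa < \kappa^\bullet < \bar\kappa \le \bar\kappa_1$ so that for every $\kappa \in [\underline\kappa, \bar\kappa] \cap [0,1]$ with $\kappa \ne \kappa^\bullet$, adherents of $\Theta_B$ have strictly higher or strictly lower equilibrium fitness in the unique EZ-SC than adherents of $\Theta_A$, depending on the sign of $\kappa - \kappa^\bullet$.

Consider again the quadratic function $H(x)$ in Equation (1) and implicitly characterize the unique root $x$ in $\big[0, \frac{\gamma}{\frac12 r^\bullet \psi(\kappa^\bullet)}\big]$ as a function of $\kappa$ in a neighborhood around $\kappa^\bullet$. Denote this root by $\alpha_M$, let $D := \frac{d\alpha_M}{d\psi(\kappa)}$ and also note $\frac{d\ell(\alpha_M)}{d\psi(\kappa)} = \frac{-\frac12 r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) \cdot D$. We have
$$\begin{aligned}
&(-1 - r^\bullet) \cdot (2\alpha_M) \cdot D + (\alpha_M \ell(\alpha_M)) \big(-1 - \tfrac12 r^\bullet\big) \\
&+ \Big(\ell(\alpha_M) D + \alpha_M \frac{-\frac12 r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) D\Big) \cdot \big(-\psi(\kappa) - \tfrac12 r^\bullet \psi(\kappa) - r^\bullet \psi(\kappa^\bullet)\big) + (\ell(\alpha_M))^2 \cdot \big(-\tfrac12 r^\bullet \psi(\kappa^\bullet)\big) \\
&+ \Big(2 \ell(\alpha_M) \frac{-\frac12 r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) D\Big) \cdot \big(-\tfrac12 r^\bullet \psi(\kappa^\bullet) \psi(\kappa)\big) + \gamma \Big(D + \ell(\alpha_M) + \psi(\kappa) \frac{-\frac12 r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) D\Big) = 0.
\end{aligned}$$
Evaluate at $\kappa = \kappa^\bullet$, noting that $\alpha_M(\kappa^\bullet) = \ell(\alpha_M(\kappa^\bullet)) = x^* := \frac{\gamma}{1 + r^\bullet + \frac12 \psi(\kappa^\bullet) r^\bullet}$.

The terms without $D$ are:
$$(x^*)^2 \big(-1 - \tfrac12 r^\bullet\big) + (x^*)^2 \big(-\tfrac12 r^\bullet \psi(\kappa^\bullet)\big) + \gamma x^* = x^* \cdot \Big[-x^* \cdot \big(1 + r^\bullet + \tfrac12 \psi(\kappa^\bullet) r^\bullet - \tfrac12 r^\bullet\big) + \gamma\Big] = x^* \cdot \big[-\gamma + \tfrac12 x^* r^\bullet + \gamma\big] = \tfrac12 r^\bullet (x^*)^2 > 0.$$
The coefficient in front of $D$ is:
$$(-1 - r^\bullet)(2x^*) + \Big(x^* + x^* \frac{-\frac12 r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet)\Big) \cdot \big(-\psi(\kappa^\bullet) - \tfrac32 r^\bullet \psi(\kappa^\bullet)\big) + \tfrac12 x^* \frac{(r^\bullet)^2}{1 + r^\bullet} \psi(\kappa^\bullet)^3 + \gamma + \gamma \psi(\kappa^\bullet)^2 \cdot \frac{-\frac12 r^\bullet}{1 + r^\bullet}.$$
Make the substitution $\gamma = x^* \cdot \big(1 + r^\bullet + \tfrac12 \psi(\kappa^\bullet) r^\bullet\big)$ to obtain
$$x^* \cdot \Big\{-2 - 2r^\bullet + \Big(1 - \frac{\frac12 r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet)\Big) \cdot \psi(\kappa^\bullet) \big(-\tfrac32 r^\bullet - 1\big) + \frac{\frac12 (r^\bullet)^2}{1 + r^\bullet} \psi(\kappa^\bullet)^3\Big\} + x^* \cdot \Big\{\big(1 + r^\bullet + \tfrac12 \psi(\kappa^\bullet) r^\bullet\big) \cdot \Big(1 - \frac{\frac12 \psi(\kappa^\bullet)^2 r^\bullet}{1 + r^\bullet}\Big)\Big\}.$$
Collecting terms inside the braces based on powers of $\psi(\kappa^\bullet)$, we get
$$\begin{aligned}
&x^* \cdot \Big\{\psi(\kappa^\bullet)^3 \frac{\frac12 (r^\bullet)^2}{1 + r^\bullet} - \psi(\kappa^\bullet)^2 \frac{\frac12 r^\bullet}{1 + r^\bullet} \big(-\tfrac32 r^\bullet - 1\big) + \psi(\kappa^\bullet) \big(-\tfrac32 r^\bullet - 1\big) - 2r^\bullet - 2\Big\} \\
&+ x^* \cdot \Big\{-\psi(\kappa^\bullet)^3 \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet} - \psi(\kappa^\bullet)^2 \frac{\frac12 r^\bullet}{1 + r^\bullet} \cdot (1 + r^\bullet) + 1 + r^\bullet + \tfrac12 \psi(\kappa^\bullet) r^\bullet\Big\}.
\end{aligned}$$
Combine to get:
$$x^* \cdot \Big[\psi(\kappa^\bullet)^3 \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet} + \psi(\kappa^\bullet)^2 \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet} - \psi(\kappa^\bullet) r^\bullet - \psi(\kappa^\bullet) - r^\bullet - 1\Big].$$
Here $\psi(\kappa^\bullet)^3 \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet}$ and $\psi(\kappa^\bullet)^2 \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet}$ are positive terms with
$$\psi(\kappa^\bullet)^3 \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet} + \psi(\kappa^\bullet)^2 \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet} \le \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet} + \frac{\frac14 (r^\bullet)^2}{1 + r^\bullet} \le \tfrac12 \cdot r^\bullet \cdot \frac{r^\bullet}{1 + r^\bullet} \le \tfrac12 r^\bullet.$$
Now $-r^\bullet + \tfrac12 \cdot r^\bullet < 0$, and also $-\psi(\kappa^\bullet) r^\bullet - \psi(\kappa^\bullet) - 1 < 0$. Thus the coefficient in front of $D$ is strictly negative. This shows $D(\kappa^\bullet) > 0$. Finally, $\frac{d\alpha_M}{d\psi(\kappa)}$ has the same sign as $\frac{d\alpha_M}{d\kappa}$ since $\psi(\kappa)$ is strictly increasing in $\kappa$.

A.15 Proof of Proposition 8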

Proof.

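We will show that in every EZ-SC: (i) for each $g \in \{A, B\}$, $\mu_g$ puts probability 1 on $\frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa_g)} r^\bullet$; (ii) for each $g \in \{A, B\}$, $\alpha_{gg} = \frac{\gamma}{1 + \frac{1}{2} r^\bullet (1+\psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa_g)}}$; (iii) the equilibrium fitness of group A is weakly higher than that of group B if and only if $\kappa_A \le \kappa_B$.

Choose $L_1, L_2, L_3$ as in Lemma 4, given $r^\bullet$ and $\bar{M}_\alpha$. In any EZ-SC with behavior $(\alpha_{AA}, \alpha_{AB}, \alpha_{BA}, \alpha_{BB})$, since the adherents of each theory match with their own group with probability 1 under perfectly assortative matching, we conclude that each $\mu_g$ for $g \in \{A, B\}$ must put full weight on

$$r_i^{INF}(\alpha_{gg}, \alpha_{gg}; \kappa^\bullet, \kappa_g, r^\bullet) = \frac{\alpha_{gg} + \alpha_{gg}\psi(\kappa^\bullet)}{\alpha_{gg} + \alpha_{gg}\psi(\kappa_g)}\, r^\bullet = \frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa_g)}\, r^\bullet,$$

proving (i). Given this belief, we must have

$$\alpha_{gg} = \frac{\gamma - \frac{1}{2} \frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa_g)} r^\bullet \psi(\kappa_g)\, \alpha_{gg}}{1 + \frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa_g)} r^\bullet}$$

by Lemma 3. Rearranging yields $\alpha_{gg} = \frac{\gamma}{1 + \frac{1}{2} r^\bullet (1+\psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa_g)}}$, proving (ii).

From Lemma 3, the objective expected utility of each player when both play the strategy profile $\alpha_{symm}$ is

$$\mathbb{E}[s_i^2] \cdot \left( \alpha_{symm}\gamma - \tfrac{1}{2} r^\bullet \alpha_{symm}^2 - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{symm}^2 - \tfrac{1}{2} \alpha_{symm}^2 \right).$$

This is a strictly concave quadratic function in $\alpha_{symm}$ that is 0 at $\alpha_{symm} = 0$. Therefore, it is strictly decreasing in $\alpha_{symm}$ for $\alpha_{symm}$ larger than the team solution $\alpha_{TEAM}$ that maximizes this expression, given by the first-order condition

$$\gamma - r^\bullet \alpha_{TEAM} - r^\bullet \psi(\kappa^\bullet) \alpha_{TEAM} - \alpha_{TEAM} = 0 \;\Rightarrow\; \alpha_{TEAM} = \frac{\gamma}{1 + r^\bullet + r^\bullet \psi(\kappa^\bullet)}.$$

For any value of $\kappa \in [0, 1]$, using the fact that $\psi(0) > 0$ and $\psi$ is strictly increasing,

$$\frac{\gamma}{1 + \frac{1}{2} r^\bullet (1+\psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa)}} > \frac{\gamma}{1 + \frac{1}{2} r^\bullet (1+\psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet (1+\psi(\kappa^\bullet))} = \alpha_{TEAM}.$$

Also, $\frac{\gamma}{1 + \frac{1}{2} r^\bullet (1+\psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \frac{1+\psi(\kappa^\bullet)}{1+\psi(\kappa)}}$ is a strictly increasing function in $\kappa$, since $\psi$ is strictly increasing. We therefore conclude that each player's utility when both play this strategy against each other is strictly decreasing in $\kappa$, proving (iii).

A.16 Proof of Proposition 9

Proof.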

Find $L_1, L_2, L_3$ as given by Lemma 4. Suppose $\Theta_A = \Theta(\kappa^\bullet)$, $\Theta_B = \{F_{r^\bullet,\kappa,\sigma^\bullet_\zeta}\}$ for any $\kappa \in [0, 1]$, $(p_A, p_B) = (1, 0)$, and $\lambda \in [0, 1]$. Then arguments similar to those in the proof of Lemma 4 imply there exists exactly one EZ-SC, and it involves the adherents of $\Theta_A$ holding correct beliefs and playing $\frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$ against each other.

We now analyze $\alpha_{BA}(\kappa)$ in such EZ-SC. In the proof of Proposition 7, we defined $\bar{U}_i(\alpha_i)$ as $i$'s objective expected utility of choosing $\alpha_i$ when $-i$ plays the rational best response. We showed that $\bar{U}_i'\!\left(\frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}\right) > 0$. In an EZ-SC where $i$ believes in the model $F_{r^\bullet,\kappa,\sigma^\bullet_\zeta}$ and $-i$ believes in the model $F_{r^\bullet,\kappa^\bullet,\sigma^\bullet_\zeta}$, using the expression for $\alpha_i^{BR}$ from Lemma 3, the play of $i$ solves

$$x = \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa) \left( \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet} \right)}{1 + r^\bullet},$$

which implies

$$\alpha_{BA}(\kappa) = \frac{\gamma \left(1 + r^\bullet - \frac{1}{2} \psi(\kappa) r^\bullet\right)}{1 + 2 r^\bullet + (r^\bullet)^2 - \frac{1}{4} \psi(\kappa)\psi(\kappa^\bullet)(r^\bullet)^2}.$$

Taking the derivative and evaluating at $\kappa = \kappa^\bullet$, we find an expression with the same sign as $\psi'(\kappa^\bullet)\, r^\bullet (1 + r^\bullet)\, \gamma \left( -2(1 + r^\bullet) + \psi(\kappa^\bullet) r^\bullet \right)$, which is strictly negative because $\psi'(\kappa^\bullet) > 0$, $r^\bullet > 0$, $\gamma > 0$, and $\psi(\kappa^\bullet) \le 1$. This shows there exists $\epsilon > 0$ so that for every $\kappa_h \in (\kappa^\bullet, \kappa^\bullet + \epsilon]$, we have $\bar{U}_i(\alpha_{BA}(\kappa_h)) < \bar{U}_i\!\left(\frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}\right)$; that is, the adherents of $\{F_{r^\bullet,\kappa_h,\sigma^\bullet_\zeta}\}$ have strictly lower fitness than the adherents of $\Theta(\kappa^\bullet)$ with $\lambda = 0$ in the unique EZ-SC. Finally, existence and upper-hemicontinuity of EZ-SC in population proportion in such societies can be established using arguments similar to the proofs of Propositions 5 and 6. This establishes the first claim to be proved.

Next, we turn to $\alpha_{BB}(\kappa)$. Using the expression for $\alpha_i^{BR}$ in Lemma 3, we find that $\alpha_{BB}(\kappa) = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa)}$. Since $\psi' > 0$, $\alpha_{BB}(\kappa)$ is strictly larger than $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$ when $\kappa < \kappa^\bullet$. From the proof of Proposition 8, we know that objective payoffs in the stage game are strictly decreasing in linear strategies larger than the team solution $\alpha_{TEAM} = \frac{\gamma}{1 + r^\bullet + r^\bullet \psi(\kappa^\bullet)}$. Since $\alpha_{BB}(\kappa) > \alpha_{AA} > \alpha_{TEAM}$, we conclude the adherents of $\{F_{r^\bullet,\kappa_l,\sigma^\bullet_\zeta}\}$ have strictly lower fitness than the adherents of $\Theta(\kappa^\bullet)$ with $\lambda = 1$ in the unique EZ-SC, for any $\kappa_l < \kappa^\bullet$. Again, existence and upper-hemicontinuity of EZ-SC in population proportion in such societies can be established using arguments similar to the proofs of Propositions 5 and 6. This establishes the second claim to be proved.
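The closed forms used in the proofs of Propositions 8 and 9 can be checked numerically. The sketch below is only an illustration: the parameter values for $\gamma$, $r^\bullet$, and $\psi(\kappa^\bullet)$ are assumptions chosen for convenience, the positive factor $\mathbb{E}[s_i^2]$ is dropped from the payoff, and all function names are hypothetical.

```python
# Numerical sanity check of the closed forms in the proofs of Propositions 8 and 9.
# Parameter values are illustrative assumptions, not taken from the paper.
GAMMA, R, PSI_TRUE = 1.0, 1.0, 0.5  # gamma, r^bullet, psi(kappa^bullet)


def utility(a_i, a_j):
    """Objective payoff from Lemma 3, dropping the positive factor E[s_i^2]."""
    return (a_i * GAMMA - 0.5 * R * a_i ** 2
            - 0.5 * R * PSI_TRUE * a_i * a_j - 0.5 * a_i ** 2)


def best_response(a_j, psi):
    """Linear best response of a player who believes the correlation term is psi."""
    return (GAMMA - 0.5 * R * psi * a_j) / (1.0 + R)


def u_bar(a_i):
    """Payoff from a_i when the opponent plays the rational best response."""
    return utility(a_i, best_response(a_i, PSI_TRUE))


def alpha_symmetric(psi):
    """Symmetric play gamma / (1 + r + r * psi / 2) of a group that believes psi."""
    return GAMMA / (1.0 + R + 0.5 * R * psi)


def alpha_ba(psi):
    """Closed form for a mutant's play against a correctly specified opponent."""
    numerator = GAMMA * (1.0 + R - 0.5 * psi * R)
    denominator = (1.0 + R) ** 2 - 0.25 * psi * PSI_TRUE * R ** 2
    return numerator / denominator


alpha_aa = alpha_symmetric(PSI_TRUE)
alpha_team = GAMMA / (1.0 + R + R * PSI_TRUE)

# A mutant with a slightly higher psi plays less aggressively against group A
# and earns less than a resident would (first claim, lambda = 0).
print(u_bar(alpha_ba(0.6)) < u_bar(alpha_aa))

# A mutant group with a lower psi overshoots the team solution even more than
# the residents do, so its within-group payoff is lower (second claim, lambda = 1).
alpha_bb = alpha_symmetric(0.4)
print(utility(alpha_bb, alpha_bb) < utility(alpha_aa, alpha_aa))
```

With these assumed parameters both comparisons go in the direction the proof asserts; other parameter values satisfying $\psi \in (0, 1]$, $r^\bullet > 0$, $\gamma > 0$ can be substituted to probe the same inequalities.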

A.17 Proof of Proposition 10

Proof.

Consider the society where $\Theta_A = \Theta_B = \Theta(\kappa^\bullet)$, $(p_A, p_B) = (1, 0)$. For any EZ-SC with behavior $(\sigma_{AA}, \sigma_{AB}, \sigma_{BA}, \sigma_{BB})$ and beliefs $(\mu_A, \mu_B)$, there exists another EZ-SC $(\sigma'_{AA}, \sigma'_{AB}, \sigma'_{BA}, \sigma'_{BB})$ where $\sigma'_{g,g'} = \sigma_{AA}$ for all $g, g' \in \{A, B\}$ and all agents hold the belief $\mu_A$. The uniqueness of EZ-SC from Assumption 6 implies $\alpha_{AB}(\kappa^\bullet) = \alpha_{BA}(\kappa^\bullet) = \alpha_{BB}(\kappa^\bullet) = \alpha^\bullet$.

Now consider the society where $\Theta_B = \Theta(\kappa)$, $(p_A, p_B) = (1, 0)$. By the same arguments as the existence arguments in Proposition 5, there exists an EZ-SC where $\alpha_{AA}(\kappa) = \alpha_{AA}(\kappa^\bullet)$. By the uniqueness of EZ-SC from Assumption 6, we must in fact have $\alpha_{AA}(\kappa) = \alpha_{AA}(\kappa^\bullet)$ for all $\kappa$, so the fitness of theory $\Theta(\kappa^\bullet)$ in the unique EZ-SC is $\mathbb{E}^\bullet[\mathbb{E}^\bullet[u^\bullet(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \mid s_1]]$. Under $\lambda$ matching with mutant theory $\Theta(\kappa)$, the mutant's fitness in the unique EZ-SC is

$$\mathbb{E}^\bullet\Big[\mathbb{E}^\bullet\big[(1-\lambda)\, u^\bullet(\alpha_{BA}(\kappa) s_1, \alpha_{AB}(\kappa) s_2, \omega) + \lambda\, u^\bullet(\alpha_{BB}(\kappa) s_1, \alpha_{BB}(\kappa) s_2, \omega) \,\big|\, s_1\big]\Big].$$

Differentiate and evaluate at $\kappa = \kappa^\bullet$. At $\kappa = \kappa^\bullet$, adherents of $\Theta_A$ and $\Theta_B$ have the same fitness since they play the same strategies. So, a non-zero sign on the derivative would give the desired evolutionary fragility against theories with either slightly higher or slightly lower $\kappa$. This derivative is:

$$\mathbb{E}^\bullet\left[\mathbb{E}^\bullet\left[\left.\begin{aligned} &\frac{\partial u^\bullet}{\partial q_1}(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \cdot [(1-\lambda)\alpha'_{BA}(\kappa^\bullet) + \lambda \alpha'_{BB}(\kappa^\bullet)] \cdot s_1 \\ &+ \frac{\partial u^\bullet}{\partial q_2}(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \cdot [(1-\lambda)\alpha'_{AB}(\kappa^\bullet) + \lambda \alpha'_{BB}(\kappa^\bullet)] \cdot s_2 \end{aligned}\ \right|\ s_1 \right]\right].$$

By the interim optimality part of Assumption 6 and the necessity of the first-order condition, $\mathbb{E}^\bullet\left[\frac{\partial u^\bullet}{\partial q_1}(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \mid s_1\right] = 0$ for every $s_1 \in S$. The derivative thus simplifies as claimed.

A.18 Proof of Proposition 11

Proof.

When Θ A = Θ B = Θ • , for any matching assortativity λ and with ( p A , p B ) = (1 , , we show adherents of both theories have 0 ﬁtness in every approachable EZ. Suppose insteadthat the match between groups g and g reach a terminal node other than z with positiveprobability. Let n L be the last non-terminal node reached with positive probability, so wemust have L ≥

2, and also that nodes n , ..., n L − are also reached with positive probability.So Drop must be played with probability 1 at n L . Since n L is reached with positive probabilityand the EZ is approachable, correctly speciﬁed agents hold correct beliefs about opponent’splay at n L , which means at n L − it cannot be optimal to play Across with positive probabilitysince this results in a loss of ‘ compared to playing Drop, a contradiction.Now let Θ A = Θ • , Θ B = Θ An . Suppose λ ∈ [0 ,

1] and let p B ∈ (0 , . We claim thereis an EZ where d kAA = 1 for every k , d kAB = 0 for every even k with k < K , d kAB = 1 forevery other k , d kBA = 0 for every odd k and d kBA = 1 for every even k , and d kBB = 0 forevery k with k < K, d KBB = 1 . It is easy to see that the behavior ( d AA ) is optimal undercorrect belief about opponent’s play. In the Θ A vs. Θ B matches, the conjecture about A’splay ˆ d kAB = 2 /K for k even, ˆ d kAB = 1 for k odd minimizes KL divergence among all strategies52n A An , given B’s play. To see this, note that when B has the role of P2, opponent Dropsimmediately. When B has the role of P1, the outcome is always z K . So a conjecture withˆ d kAB = x for every even k has the conditional KL divergence of: X k ≤ K − · ln (cid:18) (cid:19)| {z } (1 ,z k ) for k ≤ K − + X k ≤ K − · ln / · (1 − x ) ( k/ − · x !| {z } (1 ,z k ) for k ≤ K − + 12 ln / / · (1 − x ) ( K/ − · x !| {z } (1 ,z K ) + 0 · ln − x ) ( K/ !| {z } (1 ,z end ) when matched with an opponent from Θ A . Using 0 · ln(0) = 0 , the expression simpliﬁes to ln (cid:16) − x ) ( K/ − · x (cid:17) , which is minimized among x ∈ [0 ,

1] by x = 2 /K. Against this conjecture,the diﬀerence in expected payoﬀ at node n K − from Across versus Drop is (1 − /K )( g ) +(2 /K )( − ‘ ) . This is strictly positive when g > K − ‘. This means the continuation value at n K − is at least g larger than the payoﬀ of Dropping at n K − , so again Across has strictlyhigher expected payoﬀ than Drop. Inductively, ( d kBA ) is optimal given the belief ( ˆ d kAB ) . Also,( d kAB ) is optimal as it results in the highest possible payoﬀ. We can similarly show that theconjecture ˆ d kBB with ˆ d kBB = 2 /K for k even, ˆ d kBB = 0 for k odd minimizes KL divergenceconditional on Θ B opponent, and ( d kBB ) is optimal given this conjecture.As p B → , we ﬁnd an approachable EZ where adherents of A have ﬁtness 0, whereasthe adherents of B have ﬁtness at least ((( K/ − g − ‘ ) > g > K − ‘. This showsΘ A is not evolutionarily stable against Θ B .But consider the same ( d AA , d AB , d BA ) and suppose d kBB = 1 for every k . Taking p B → , with λ <

1, we ﬁnd an approachable EZ where adherents of B have ﬁtness 0, adherents of Ahave ﬁtness (1 − λ ) · · (( K/ g + ‘ ) > . This shows Θ B is not evolutionarily stable againstΘ A . B Learning Foundation of EZ and EZ-SC

We provide a foundation for EZ and EZ-SC as the steady state of a learning system.

B.1 Regularity Assumptions

We make some regularity assumptions on the objective environments and on the theories $\Theta_A$, $\Theta_B$. These are similar to the regularity assumptions from Section 3.3, but we do not require here that $\Theta_A$, $\Theta_B$ have a product structure for the EZ microfoundation.

Suppose $\mathbb{A}$ is finite. Suppose the marginals of $\Theta_A$, $\Theta_B$ on the dimension of fundamental uncertainty, $\mathcal{F}_A$, $\mathcal{F}_B$, are compact metrizable spaces. So, we can endow $\Theta_A$ and $\Theta_B$ with the product metric. Suppose that each model $(a_A, a_B, F)$ in each theory is such that for every $(a_i, a_{-i}) \in \mathbb{A}^2$, whenever $f^\bullet(a_i, a_{-i})(y) > 0$, we also get $f(a_i, a_A)(y) > 0$ and $f(a_i, a_B)(y) > 0$, where $f$ is the density or probability mass function for $F$.

For each $g, g' \in \{A, B\}$, $F \in \mathcal{F}_g$, define $K_{g,g'} : \mathbb{A}^2 \times \Theta_g \to \mathbb{R}$ by $K_{g,g'}(a_i, a_{-i}; (a_A, a_B, F)) = D_{KL}(F^\bullet(a_i, a_{-i}) \,\|\, F(a_i, a_{g'}))$. Suppose each $K_{g,g'}$ is well defined and a continuous function of the model $(a_A, a_B, F)$.

For $g \in \{A, B\}$, $F \in \mathcal{F}_g$, let $U_g(a_i, a_{-i}; F)$ be the expected payoff of the strategy profile $(a_i, a_{-i})$ for $i$ when consequences are drawn according to $F$. Assume $U_A$, $U_B$ are continuous.

Suppose for every theory $\Theta_g$, every $(a_A, a_B, F) \in \Theta_g$, and every $\epsilon > 0$, there exists an open neighborhood $V \subseteq \Theta_g$ of $(a_A, a_B, F)$, so that for every $(\hat{a}_A, \hat{a}_B, \hat{F}) \in V$, $1 - \epsilon \le f(a_i, a_A)(y) / \hat{f}(a_i, \hat{a}_A)(y) \le 1 + \epsilon$ and $1 - \epsilon \le f(a_i, a_B)(y) / \hat{f}(a_i, \hat{a}_B)(y) \le 1 + \epsilon$ for all $a_i \in \mathbb{A}$, $y \in \mathbb{Y}$. Also suppose there is some $M > 0$ so that $\ln(f(a_i, a_A)(y))$ and $\ln(f(a_i, a_B)(y))$ are bounded in $[-M, M]$ for all $(a_A, a_B, F) \in \Theta_g$, $a_i, a_{-i} \in \mathbb{A}$, $y \in \mathbb{Y}$.

B.2 Learning Environment

Time is discrete and infinite, $t = 0, 1, 2, \ldots$ There is a unit mass of agents, $i \in [0, 1]$. A $p_A \in (0, 1)$ measure of them are assigned to theory A and the rest are assigned to theory B. Each agent born into theory $g$ starts with the same full support prior over this theory, $\mu_g^{(0)} \in \Delta(\Theta_g)$, and believes there is some $(a_A, a_B, F) \in \Theta_g$ so that every group $g'$ opponent always plays $a_{g'}$ and the consequences are always generated by $F$.

In each period $t$, agents are matched up partially assortatively to play the stage game. Assortativity is $\lambda \in (0, 1)$. Each person in group $g$ has $\lambda + (1 - \lambda) p_g$ chance of matching with someone from group $g$, and matches with someone from group $-g$ with the complementary chance. Each agent $i$ observes their opponent's group membership and chooses a strategy $a_i^{(t)} \in \mathbb{A}$. At the end of the match, the agent observes own consequence $y_i^{(t)}$ and an ex-post signal $x_i^{(t)} \in \mathbb{A}$, where $x_i^{(t)}$ equals the matched opponent's strategy $a_{-i}^{(t)}$ with probability $\tau \in [0, 1)$, and it is uniformly random on $\mathbb{A}$ with the complementary probability. To give a foundation for EZ, we consider $\tau = 0$, so the signal $x_i$ is uninformative. To give a foundation for EZ-SC, we consider $\tau$ close to 1.

Thus, the space of histories from one period is $\{A, B\} \times \mathbb{A} \times \mathbb{Y} \times \mathbb{A}$, where the first instance of the strategy is own strategy and the second instance is the ex-post signal. Let $\mathbb{H}$ denote the space of all finite-length histories.

Given the assumption on the two theories, there is a well-defined Bayesian belief operator for each theory $g$, $\mu_g : \mathbb{H} \to \Delta(\Theta_g)$, mapping every finite-length history into a belief over models in $\Theta_g$, starting with the prior $\mu_g^{(0)}$.
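The matching and signal technology just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the string labels for strategies, and the numerical values of $\lambda$, $p_g$, and $\tau$ are all hypothetical and only serve to make the two probability rules concrete.

```python
import random


def match_probabilities(lam, p_own):
    """Return (chance of meeting own group, chance of meeting the other group)
    under assortativity lam when the agent's own group has population share p_own."""
    own = lam + (1.0 - lam) * p_own
    return own, 1.0 - own


def signal_distribution(opponent_strategy, strategies, tau):
    """Distribution of the ex-post signal: with probability tau it equals the
    opponent's strategy, otherwise it is uniform over the finite strategy set."""
    n = len(strategies)
    return {
        a: (tau if a == opponent_strategy else 0.0) + (1.0 - tau) / n
        for a in strategies
    }


def draw_signal(opponent_strategy, strategies, tau, rng):
    """Draw one ex-post signal using the same rule as signal_distribution."""
    if rng.random() < tau:
        return opponent_strategy
    return rng.choice(strategies)


# Illustrative values (assumptions): lambda = 0.3, own-group share = 0.6.
own, other = match_probabilities(0.3, 0.6)
strategies = ["low", "mid", "high"]  # hypothetical labels for a finite strategy set
uninformative = signal_distribution("mid", strategies, tau=0.0)
nearly_revealing = signal_distribution("mid", strategies, tau=0.95)
sample = draw_signal("mid", strategies, 0.95, random.Random(0))
```

With $\tau = 0$ the signal distribution does not depend on the opponent's strategy, which is the sense in which it is uninformative in the EZ foundation; as $\tau$ approaches 1 almost all mass sits on the opponent's actual strategy.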

We also take as exogenously given policy functions for choosing strategies after each history. That is, $a_{g,g'} : \mathbb{H} \to \mathbb{A}$ for every $g, g' \in \{A, B\}$ gives the strategy that a group $g$ agent uses against a group $g'$ opponent after every history. Assume these policy functions are asymptotically myopic.

Assumption A.1. For every $\epsilon > 0$, there exists $K$ so that for any history $h$ containing at least $K$ matches against opponents of each group, $a_{g,g'}(h)$ is an $\epsilon$-best response to the Bayesian belief $\mu_g(h)$ about the model.

From the perspective of each agent $i$ in group $g$, $i$'s play against groups A and B, as well as $i$'s belief over $\Theta_g$, is a stochastic process $(\tilde{a}_{iA}^{(t)}, \tilde{a}_{iB}^{(t)}, \tilde{\mu}_i^{(t)})_{t \ge 0}$ valued in $\mathbb{A} \times \mathbb{A} \times \Delta(\Theta_g)$. The randomness is over the groups of opponents matched with in different periods, the strategies they play, and the random consequences and ex-post signals drawn at the end of the match. At the same time, since there is a continuum of agents, the distribution over histories within each population in each period is deterministic. As such, there is a deterministic sequence $(\alpha_{AA}^{(t)}, \alpha_{AB}^{(t)}, \alpha_{BA}^{(t)}, \alpha_{BB}^{(t)}, \nu_A^{(t)}, \nu_B^{(t)}) \in \Delta(\mathbb{A})^4 \times \Delta(\Delta(\Theta_A)) \times \Delta(\Delta(\Theta_B))$ that describes the distributions of play and beliefs that prevail in the two sub-populations in every period $t$.

B.3 Steady State Limits are EZs and EZ-SCs

We state and prove the learning foundation. For $(\alpha^{(t)})_t$ a sequence valued in $\Delta(\mathbb{A})$ and $a^* \in \mathbb{A}$, $\alpha^{(t)} \to a^*$ means $\mathbb{E}_{\hat{a} \sim \alpha^{(t)}} \|\hat{a} - a^*\| \to 0$ as $t \to \infty$. For $(\nu^{(t)})_t$ a sequence valued in $\Delta(\Delta(\Theta_g))$ and $\mu^* \in \Delta(\Theta_g)$, $\nu^{(t)} \to \mu^*$ means $\mathbb{E}_{\hat{\mu} \sim \nu^{(t)}} \|\hat{\mu} - \mu^*\| \to 0$ as $t \to \infty$.

Proposition A.1. Suppose the regularity assumptions in Section B.1 hold, and suppose Assumption A.1 holds. Suppose $\tau = 0$. Suppose there exists $(a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B) \in \mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B)$ so that $(\alpha_{AA}^{(t)}, \alpha_{AB}^{(t)}, \alpha_{BA}^{(t)}, \alpha_{BB}^{(t)}, \nu_A^{(t)}, \nu_B^{(t)}) \to (a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B)$ and for each agent $i$ in group $g$, almost surely $(\tilde{a}_{iA}^{(t)}, \tilde{a}_{iB}^{(t)}, \tilde{\mu}_i^{(t)}) \to (a^*_{gA}, a^*_{gB}, \mu^*_g)$. Then, $(a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B)$ is an EZ. Suppose $\Theta_A$, $\Theta_B$ have the product structure. Then, there exists some $\underline{\tau} < 1$ so that for every $\tau \in (\underline{\tau}, 1)$, $(a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B)$ is an EZ-SC under the above conditions.

Proof. We first consider the case of $\tau = 0$, so the uninformative ex-post signals may be ignored.

For $\mu$ a belief and $g \in \{A, B\}$, let $u^{\mu}(a_i; g)$ represent the subjective expected payoff from playing $a_i$ against group $g$. Suppose $a^*_{AA} \notin \arg\max_{\hat{a} \in \mathbb{A}} u^{\mu^*_A}(\hat{a}; A)$ (the other cases are analogous). By the continuity assumptions on $U_A$ (which is also bounded because $\mathcal{F}_A$ is bounded), there are some $\epsilon_1, \epsilon_2 > 0$ so that whenever $\mu_i \in \Delta(\Theta_A)$ with $\|\mu_i - \mu^*_A\| < \epsilon_1$, we also have $u^{\mu_i}(a^*_{AA}; A) < \max_{\hat{a} \in \mathbb{A}} u^{\mu_i}(\hat{a}; A) - \epsilon_2$. By the definition of asymptotically myopic best responses, find $K$ so that $a_{A,A}(h)$ must be a myopic $\epsilon_2$-best response when there are at least $K$ periods of matches against A and B. Agent $i$ has a strictly positive chance to match with groups A and B in every period. So, at all except a null set of points in the probability space, $i$'s history eventually records at least $K$ periods of play by groups A and B. Also, by assumption, almost surely $\tilde{\mu}_i^{(t)} \to \mu^*_A$. This shows that by asymptotically myopic best responses, almost surely $\tilde{a}_{iA}^{(t)} \not\to a^*_{AA}$, a contradiction.

Now suppose some $\theta^*_A = (a^*_A, a^*_B, f^*)$ in the support of $\mu^*_A$ does not minimize the weighted KL divergence in the definition of EZ (the case of a model $\theta^*_B$ in the support of $\mu^*_B$ not minimizing is similar). Then we have

$$\theta^*_A \notin \arg\min_{\hat{\theta} \in \Theta_A} \left\{ \begin{aligned} &(\lambda + (1-\lambda) p_A) \cdot D_{KL}(F^\bullet(a^*_{AA}, a^*_{AA}) \,\|\, \hat{F}(a^*_{AA}, \hat{a}_A)) \\ &+ (1-\lambda)(1-p_A) \cdot D_{KL}(F^\bullet(a^*_{AB}, a^*_{BA}) \,\|\, \hat{F}(a^*_{AB}, \hat{a}_B)) \end{aligned} \right\}$$

where $\hat{\theta} = (\hat{a}_A, \hat{a}_B, \hat{F})$. This is equivalent to:

$$\theta^*_A \notin \arg\max_{\hat{\theta} \in \Theta_A} \left\{ \begin{aligned} &(\lambda + (1-\lambda) p_A) \cdot \mathbb{E}_{y \sim F^\bullet(a^*_{AA}, a^*_{AA})} \ln(\hat{f}(a^*_{AA}, \hat{a}_A)(y)) \\ &+ (1-\lambda)(1-p_A) \cdot \mathbb{E}_{y \sim F^\bullet(a^*_{AB}, a^*_{BA})} \ln(\hat{f}(a^*_{AB}, \hat{a}_B)(y)) \end{aligned} \right\}$$

Let this objective, as a function of $\hat{\theta}$, be denoted $WL(\hat{\theta})$.
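The equivalence between minimizing the weighted KL divergence and maximizing the weighted expected log-likelihood $WL$ holds because the two objectives differ only by a term (the weighted entropy of the true distributions) that does not depend on the candidate model. A small finite illustration follows; the outcome distributions, the three candidate models, and the values of $\lambda$ and $p_A$ are made-up assumptions, and the helper names are hypothetical.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) between two finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_loglik(p, q):
    """Expected log-likelihood of model q when outcomes are drawn from p."""
    return sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


# Matching weights for a group-A agent (illustrative lambda and p_A).
lam, p_a = 0.3, 0.6
m_a = lam + (1.0 - lam) * p_a          # chance of meeting group A
m_b = (1.0 - lam) * (1.0 - p_a)        # chance of meeting group B

# True outcome distributions in matches against A and against B (assumed).
true_vs_a = [0.5, 0.3, 0.2]
true_vs_b = [0.2, 0.2, 0.6]

# Each candidate model predicts one distribution per type of match (assumed).
candidates = {
    "model_1": ([0.4, 0.4, 0.2], [0.3, 0.3, 0.4]),
    "model_2": ([0.5, 0.25, 0.25], [0.1, 0.3, 0.6]),
    "model_3": ([0.6, 0.2, 0.2], [0.25, 0.25, 0.5]),
}


def weighted_kl(model):
    pred_a, pred_b = model
    return m_a * kl(true_vs_a, pred_a) + m_b * kl(true_vs_b, pred_b)


def weighted_loglik(model):
    pred_a, pred_b = model
    return (m_a * expected_loglik(true_vs_a, pred_a)
            + m_b * expected_loglik(true_vs_b, pred_b))


best_by_kl = min(candidates, key=lambda name: weighted_kl(candidates[name]))
best_by_loglik = max(candidates, key=lambda name: weighted_loglik(candidates[name]))
print(best_by_kl == best_by_loglik)
```

In this sketch the sum `weighted_kl(model) + weighted_loglik(model)` is the same for every candidate, which is why the minimizer of one objective is the maximizer of the other.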
There exists $\theta^{opt}_A = (a^{opt}_A, a^{opt}_B, f^{opt}) \in \Theta_A$ and $\delta, \epsilon > 0$ so that $(1-\delta) WL(\theta^{opt}_A) - 2\delta M - 2\epsilon > (1-\delta) WL(\theta^*_A)$. By assumption on the primitives, find open neighborhoods $V^{opt}$ and $V^*$ of $\theta^{opt}_A$, $\theta^*_A$ respectively, so that for all $a_i \in \mathbb{A}$, $g \in \{A, B\}$, $y \in \mathbb{Y}$, $1 - \epsilon \le f^{opt}(a_i, a^{opt}_g)(y) / \hat{f}(a_i, \hat{a}_g)(y) \le 1 + \epsilon$ for all $\hat{\theta} = (\hat{a}_A, \hat{a}_B, \hat{f}) \in V^{opt}$, and also $1 - \epsilon \le f^*(a_i, a^*_g)(y) / \hat{f}(a_i, \hat{a}_g)(y) \le 1 + \epsilon$ for all $\hat{\theta} = (\hat{a}_A, \hat{a}_B, \hat{f}) \in V^*$. Also, by convergence of play in the populations, find $T_1$ so that in all periods $t \ge T_1$, $\alpha_{AA}^{(t)}(a^*_{AA}) \ge 1 - \delta$ and $\alpha_{BA}^{(t)}(a^*_{BA}) \ge 1 - \delta$.

For $T_2 \ge T_1$, consider a probability space defined by $\Omega := (\{A, B\} \times \mathbb{A}^2 \times (\mathbb{Y})^{\mathbb{A}^2})^\infty$ that describes the randomness in an agent's learning process starting with period $T_2 + 1$. For a point $\omega \in \Omega$ and each period $T_2 + s$, $s \ge 1$, $\omega_s = (g, a_{-i,A}, a_{-i,B}, (y_{a_i,a_{-i}})_{(a_i,a_{-i}) \in \mathbb{A}^2})$ specifies the group $g$ of the matched opponent, the play $a_{-i,A}$, $a_{-i,B}$ of hypothetical opponents from groups A and B, and the hypothetical consequence $y_{a_i,a_{-i}}$ that would be generated for every pair of strategies $(a_i, a_{-i})$ played. As notation, let $opp(\omega, s)$, $a_{-i,A}(\omega, s)$, $a_{-i,B}(\omega, s)$, and $y_{a_i,a_{-i}}(\omega, s)$ denote the corresponding components of $\omega_s$. Define $P_{T_2}$ over this space in the natural way. That is, it is independent across periods, and within each period, the density (or probability mass function if $\mathbb{Y}$ is finite) of $\omega_s = (g, a_{-i,A}, a_{-i,B}, (y_{a_i,a_{-i}})_{(a_i,a_{-i}) \in \mathbb{A}^2})$ is

$$m_g \cdot \alpha_{AA}^{(T_2+s)}(a_{-i,A}) \cdot \alpha_{BA}^{(T_2+s)}(a_{-i,B}) \cdot \prod_{(a_i,a_{-i}) \in \mathbb{A}^2} f^\bullet(a_i, a_{-i})(y_{a_i,a_{-i}}),$$

where $m_g$ is the probability of $i$ from group A being matched up against an opponent of group $g$, that is $m_A = \lambda + (1-\lambda) p_A$, $m_B = (1-\lambda)(1-p_A)$.

For $\theta = (a^\theta_A, a^\theta_B, F_\theta) \in \Theta_A$ with $f_\theta$ the density of $F_\theta$, and $\omega \in \Omega$, consider the stochastic process

$$\ell_s(\theta, \omega) := \frac{1}{s} \sum_{t=T_2+1}^{T_2+s} \ln\!\left( f_\theta(a^*_{AA}, a^\theta_{opp(\omega,t)})\big(y_{a^*_{AA},\, a_{-i,opp(\omega,t)}(\omega,t)}(\omega, t)\big) \right).$$

By choice of the neighborhood $V^*$,

$$\begin{aligned} \limsup_s \sup_{\theta_A \in V^*} \ell_s(\theta_A, \omega) &\le \epsilon + \frac{1}{s} \sum_{t=T_2+1}^{T_2+s} \ln\!\left( f^*(a^*_{AA}, a^*_{opp(\omega,t)})\big(y_{a^*_{AA},\, a_{-i,opp(\omega,t)}(\omega,t)}(\omega, t)\big) \right) \\ &\le \epsilon + \frac{1}{s} \sum_{t=T_2+1}^{T_2+s} \Big[ \mathbf{1}\{a_{-i,opp(\omega,t)}(\omega,t) = a^*_{opp(\omega,t),A}\} \cdot \ln\!\left( f^*(a^*_{AA}, a^*_{opp(\omega,t)})\big(y_{a^*_{AA},\, a^*_{opp(\omega,t),A}}(\omega, t)\big) \right) \\ &\qquad\qquad + \big(1 - \mathbf{1}\{a_{-i,opp(\omega,t)}(\omega,t) = a^*_{opp(\omega,t),A}\}\big) \cdot M \Big]. \end{aligned}$$

Since $T_2 \ge T_1$, in every period $t$, $P_{T_2}(a_{-i,opp(\omega,t)}(\omega, t) = a^*_{opp(\omega,t),A}) \ge 1 - \delta$. Let $(\xi_k)_{k \ge 1}$ be a related stochastic process: it is i.i.d. such that each $\xi_k$ has $\delta$ chance to be equal to $M$, $(1-\delta) m_A$ chance to be distributed according to $\ln(f^*(a^*_{AA}, a^*_A)(y))$ where $y \sim f^\bullet(a^*_{AA}, a^*_{AA})$, and $(1-\delta) m_B$ chance to be distributed according to $\ln(f^*(a^*_{AB}, a^*_B)(y))$ where $y \sim f^\bullet(a^*_{AB}, a^*_{BA})$. By the law of large numbers, $\frac{1}{s} \sum_{k=1}^{s} \xi_k$ converges almost surely to $\delta M + (1-\delta) WL(\theta^*_A)$. By this comparison, $\limsup_s \sup_{\theta_A \in V^*} \ell_s(\theta_A, \omega) \le \epsilon + \delta M + (1-\delta) WL(\theta^*_A)$ $P_{T_2}$-almost surely. By a similar argument, $\liminf_s \inf_{\theta_A \in V^{opt}} \ell_s(\theta_A, \omega) \ge -\epsilon - \delta M + (1-\delta) WL(\theta^{opt}_A)$ $P_{T_2}$-almost surely.

Along any $\omega$ where we have both $\limsup_s \sup_{\theta_A \in V^*} \ell_s(\theta_A, \omega) \le \epsilon + \delta M + (1-\delta) WL(\theta^*_A)$ and $\liminf_s \inf_{\theta_A \in V^{opt}} \ell_s(\theta_A, \omega) \ge -\epsilon - \delta M + (1-\delta) WL(\theta^{opt}_A)$, if $\omega$ also leads to $i$ always playing $a^*_{AA}$ against group A and $a^*_{AB}$ against group B in all periods starting with $T_2 + 1$, then the posterior probability assigned to $V^*$ must tend to 0, hence $\tilde{\mu}_i^{(t)} \not\to \mu^*_A$. Starting from any length $T_2$ history $h$, there exists a subset $\hat{\Omega}_h \subseteq \Omega$ that leads to $i$ not playing the EZ strategy in at least one period starting with $T_2 + 1$.
So conditional on $h$, the probability of $\tilde{\mu}_i^{(t)} \to \mu^*_A$ is no larger than $1 - P_{T_2}(\hat{\Omega}_h)$. The unconditional probability is therefore no larger than $\mathbb{E}_h[1 - P_{T_2}(\hat{\Omega}_h)]$, where $\mathbb{E}_h$ is taken with respect to the distribution of period $T_2$ histories for $i$. But this term is also the probability of $i$ playing a non-EZ action at least once starting with period $T_2$. Since there are finitely many actions and $(\tilde{a}_{iA}^{(t)}, \tilde{a}_{iB}^{(t)}) \to (a^*_{AA}, a^*_{AB})$ almost surely, $\mathbb{E}_h[1 - P_{T_2}(\hat{\Omega}_h)]$ tends to 0 as $T_2 \to \infty$. We have a contradiction as this shows $\tilde{\mu}_i^{(t)} \not\to \mu^*_A$ with probability 1.

Now consider the case where $\Theta_A$, $\Theta_B$ have the product structure. Let $\bar{K} < \infty$ be an upper bound on $K_{g,g'}(a_i, a_{-i}; (a_A, a_B, F))$ across all $g, g' \in \{A, B\}$, $a_i, a_{-i} \in \mathbb{A}$, $(a_A, a_B, F) \in \Theta_g$. Here $\bar{K}$ is finite because $\mathbb{A}$ is finite and $K_{g,g'}$ is continuous in the model, which is from a compact domain. Let $F^X_\tau(a_{-i}) \in \Delta(\mathbb{A})$ represent the distribution of ex-post signals given precision $\tau$, when the opponent plays $a_{-i} \in \mathbb{A}$. It is clear that there exists some $\underline{\tau} < 1$ so that for any $a_{-i} \ne a'_{-i}$, $\tau \in (\underline{\tau}, 1)$, we get $\min(m_A, m_B) \cdot D_{KL}(F^X_\tau(a_{-i}) \,\|\, F^X_\tau(a'_{-i})) > \bar{K}$. Therefore, given any $(a^*_{AA}, a^*_{AB}, a^*_{BA}) \in \mathbb{A}^3$, the solution to

$$\min_{\hat{\theta} \in \Theta_A} \left\{ \begin{aligned} &(\lambda + (1-\lambda) p_A) \cdot \left[ D_{KL}(F^\bullet(a^*_{AA}, a^*_{AA}) \,\|\, \hat{F}(a^*_{AA}, \hat{a}_A)) + D_{KL}(F^X_\tau(a^*_{AA}) \,\|\, F^X_\tau(\hat{a}_A)) \right] \\ &+ (1-\lambda)(1-p_A) \cdot \left[ D_{KL}(F^\bullet(a^*_{AB}, a^*_{BA}) \,\|\, \hat{F}(a^*_{AB}, \hat{a}_B)) + D_{KL}(F^X_\tau(a^*_{BA}) \,\|\, F^X_\tau(\hat{a}_B)) \right] \end{aligned} \right\}$$

must satisfy $\hat{a}_A = a^*_{AA}$, $\hat{a}_B = a^*_{BA}$, because $(a^*_{AA}, a^*_{BA}, F)$ for any $F \in \mathcal{F}_A$ has a KL divergence no larger than $\bar{K}$, and it is in $\Theta_A$ because of the product structure. On the other hand, any $(\hat{a}_A, \hat{a}_B, \hat{F})$ with either $\hat{a}_A \ne a^*_{AA}$ or $\hat{a}_B \ne a^*_{BA}$ has KL divergence strictly larger than $\bar{K}$ by the choice of $\underline{\tau}$