Evolutionarily Stable (Mis)specifications: Theory and Applications
Kevin He † and Jonathan Libgober ‡
First version: December 20, 2020
This version: December 29, 2020
Abstract
We introduce an evolutionary framework to evaluate competing (mis)specifications in strategic situations, focusing on which misspecifications can persist over a correct specification. Agents with heterogeneous specifications coexist in a society and repeatedly match against random opponents to play a stage game. They draw Bayesian inferences about the environment based on personal experience, so their learning depends on the distribution of specifications and matching assortativity in the society. One specification is evolutionarily stable against another if, whenever sufficiently prevalent, its adherents obtain higher expected objective payoffs than their counterparts. The learning channel leads to novel stability phenomena compared to frameworks where the heritable unit of cultural transmission is a single belief instead of a specification (i.e., set of feasible beliefs). We apply the framework to linear-quadratic-normal games where players receive correlated signals but possibly misperceive the information structure. The correct specification is not evolutionarily stable against a correlational error, whose direction depends on matching assortativity. As another application, the framework also endogenizes coarse analogy classes in centipede games.
Keywords: misspecified Bayesian learning, endogenous misspecifications, evolutionary stability, higher-order beliefs, analogy classes

∗ We thank Cuimin Ba, In-Koo Cho, Krishna Dasaratha, Andrew Ellis, Ignacio Esponda, Drew Fudenberg, Alice Gindin, Ryota Iijima, Yuhta Ishii, Filippo Massari, Philipp Sadowski, Alvaro Sandroni, Joshua Schwartzstein, Philipp Strack, and various conference and seminar participants for helpful comments. Kevin He thanks the California Institute of Technology for hospitality when some of the work on this paper was completed.
† University of Pennsylvania. Email: [email protected]
‡ University of Southern California. Email: [email protected]

Introduction
In many economic settings, people draw misspecified inferences about the world: they start with a prior belief that dogmatically precludes the true data-generating process. For instance, behavioral economics documents a number of prevalent statistical biases. When people reason about economic fundamentals under the spell of one of these biases, they engage in misspecified learning. Following Esponda and Pouzo (2016), a growing literature has focused on the implications of Bayesian learning under different misspecifications, taking the errors as exogenously given.

Why and when might we expect such misspecifications to persist? Mistakes that distort learning are empirically ubiquitous, which is puzzling for two reasons. First, many of these errors demand even greater computational sophistication than the simple truth, making them hard to justify on the grounds of bounded cognition or costly attention. Convoluted conspiracy theories fall into this category. So does a behavioral error called projection bias, where agents overestimate the similarity between their own information and others' information. Reasoning with projection bias in settings with statistical independence requires the learner to keep track of inter-personal correlations, complicating the inference problem. Second, conventional economic wisdom dating back to at least Friedman (1953)'s market-selection hypothesis suggests competitive pressures will eliminate mistakes, including misspecifications. Indeed, contemporaneous papers that formalize payoff-based criteria for selecting between (mis)specifications find no strict advantage to being misspecified in single-agent decision problems (Fudenberg and Lanzani, 2020; Frick, Iijima, and Ishii, 2020).

This paper introduces a general framework to evaluate competing (mis)specifications based on their expected objective payoffs, with particular emphasis on which misspecifications are likely to persist over the correct specification (and in which environments).
We find that when agents with heterogeneous specifications coexist in a society and repeatedly match against random opponents to play a stage game, misspecified agents may enjoy a strict payoff advantage compared to their correctly specified counterparts. Unlike in decision problems, misspecifications in games can lead to strategically beneficial misinferences about the game parameters. Through several examples and applications, we discuss how details of the social interaction structure, such as the matching assortativity between agents with different specifications, shape the stability of different mistakes.

We consider an evolutionary framework where specifications are encoded in theories, which delineate feasible beliefs and serve as the basic unit of cultural transmission. Each theory may represent, for example, a scientific paradigm that stipulates a set of (possibly incorrect) relationships between the environmental parameters and the observables. Adherents of the theory learn about the environment by estimating the parameters of their theory and play the stage game based on their calibrated model. Theories rise and fall in prominence based on the objective welfare of their adherents, as the school of thought that leads to higher payoffs tends to acquire more resources and attract more followers in the future.

The fitness of a theory is determined by its average payoff in stage games, and this average depends on the distribution of opponents. We introduce the concept of a zeitgeist to capture the relevant social interaction structure in the society: the sizes of the subpopulations with different theories, and the matchmaking technology that pairs up opponents to play the game. In equilibrium, each agent forms a Bayesian belief about her environment using data from all of her interactions, and subjectively best responds to this belief.
We define the evolutionary stability of theory A against theory B based on whether theory A has a higher equilibrium fitness than theory B when the population share of theory A is close to 1.

Adherents of a misspecified theory may come to different conclusions about the economic fundamentals in different zeitgeists, with these different beliefs translating into different subjective best-response functions in the stage game. This kind of endogeneity in stage-game behavior leads to novel stability phenomena. First, we show the possibility of a strong form of multiplicity in the stability comparison between two theories: stability reversals. Two theories exhibit stability reversal if (i) theory A's adherents strictly outperform theory B's adherents not only on average, but even conditional on opponent's type, whenever theory A is dominant; and (ii) theory B's adherents strictly outperform theory A's adherents whenever theory B is dominant. Second, we show that the relative stability of one theory over another may be non-monotonic in matching assortativity. One theory may be evolutionarily stable against another when assortativity is either high or low, but not when it is intermediate. Both of these stability phenomena operate through misinference and cannot happen if the learning channel is eliminated. That is, they never arise in a world where the basic unit of cultural transmission is a single belief about the economic environment instead of a theory (i.e., a collection of feasible beliefs).

As an application of our general framework, we examine a linear-quadratic-normal game of incomplete information from Vives (1988). This game has been used to study the equilibrium impact of changing information structures (Bergemann and Morris, 2013), but we instead ask about the evolutionary stability of misspecifications on the information structure.
Players receive objectively correlated signals about Nature's type in the stage game, and we consider theories that may be misspecified about the correlation between the signals. After seeing their own signals, the adherents of misspecified theories hold correct first-order beliefs about Nature's type but incorrect higher-order beliefs (i.e., wrong beliefs about the rival's signal and action), and thus misinfer parameters of the stage game from the game's outcome. We show that the correctly specified theory fails to be evolutionarily stable against one, but not both, of two kinds of misspecified theories: those that dogmatically stipulate excessively correlated information (projection bias), or those that stipulate excessively independent information (correlation neglect). Which correlational error can invade a rational society depends on the social interaction structure, namely the matching assortativity of how agents with different theories are paired to play the game. We also use this game to illustrate that the mislearning channel is crucial to the predictions: the same correlational errors would instead confer an evolutionary disadvantage if they were combined with correct beliefs about the other game parameters.

As a second application, we discuss how our framework can endogenize analogy classes, a solution concept that Jehiel (2005) introduced to capture simplified strategic thinking in complex environments. Forming analogy classes is a type of misspecification about strategic uncertainty in extensive-form games, where a biased agent incorrectly believes that her opponent follows the same strategy at distinct nodes in the same analogy class. Providing a foundation for coarse analogy classes has been an open question. We show how to represent different analogy classes as different theories in our framework, then find that the theory with the finest analogy class (i.e., rational agents) is not evolutionarily stable against a theory with coarse analogy classes in a centipede game.
This result provides an evolutionary justification for analogy-based thinking in this context that does not involve thinking cost.

The rest of this section reviews related literature. Section 2 introduces the environment and the evolutionary framework for assessing the stability of specifications. Section 3 discusses how the learning channel enables novel stability phenomena and gives conditions for the existence and continuity of equilibria in zeitgeists. Sections 4 and 5 contain applications to misspecified information structures in linear-quadratic-normal games and coarse analogy classes in extensive-form games. Section 6 concludes.
Our paper contributes to the literature on misspecified Bayesian learning by proposing a framework to assess which specifications are more likely to persist based on their objective performance. Most prior work on misspecified Bayesian learning studies the implications of particular errors in specific active-learning environments (i.e., when actions affect observations), including both single-agent decision problems (Nyarko, 1991; Fudenberg, Romanyuk, and Strack, 2017; Heidhues, Koszegi, and Strack, 2018; He, 2020) and multi-agent games (Bohren, 2016; Bohren and Hauser, 2018; Jehiel, 2018; Molavi, 2019; Dasaratha and He, 2020; Ba and Gindin, 2020; Frick, Iijima, and Ishii, 2021). A number of papers establish general convergence properties of misspecified learning (Esponda and Pouzo, 2016; Esponda, Pouzo, and Yamamoto, 2019; Frick, Iijima, and Ishii, 2019; Fudenberg, Lanzani, and Strack, 2020). All of the above papers take misspecifications as exogenously given. By contrast, we propose endogenizing misspecifications using ideas from evolutionary game theory. This also lets us ask how details of the evolutionary process (e.g., the matching assortativity) shape the stability of misspecifications.

Another strand of literature shares our central focus on selecting between multiple specifications for Bayesian learning. Except for two contemporaneous papers discussed later, the selection criteria in this literature can be categorized into two groups: subjective expectations of payoffs and goodness-of-fit tests.
Subjective expectations of payoffs.
The first approach selects specifications based on adherents' (possibly incorrect) beliefs about their own payoffs. Olea, Ortoleva, Pai, and Prat (2020) consider a decision-maker estimating a possibly misspecified linear regression model of $y$ as a function of $x \in \mathbb{R}^k$. They define a notion of competition between agents with different regression specifications where the agent with a higher subjective confidence about their prediction error wins. Levy, Razin, and Young (2020) study a model of electoral competition between a misspecified simple worldview and the correct complex worldview. Voters believing in each worldview are more likely to vote if they expect a higher payoff difference between the policies that would be implemented under their own worldview and the competing worldview. Eliaz and Spiegler (2020) study the equilibrium distribution of political narratives, requiring that only the narratives that promise the highest subjective payoffs survive. Gagnon-Bartsch, Rabin, and Schwartzstein (2020) define a misspecified theory to be attentionally stable against an alternative theory if misspecified agents subjectively judge a certain coarsening of the data to be sufficient for decision-making, and such coarsened data does not falsify the wrong theory.

Goodness-of-fit tests. The second approach applies exogenous statistical criteria to abandon or downplay specifications that fit past data poorly. Cho and Kasa (2015) consider a central bank that switches between two misspecified theories whenever its current theory fails a goodness-of-fit test. Ba (2020) also considers an agent who switches between two theories, but uses the relative fits of the two theories to data as the switching criterion. She investigates whether a misspecified theory can survive in the presence of a "nearby" competing theory.
Cho and Kasa (2017) let an agent entertaining two competing theories make predictions using a weighted average of the two theories, with the weights determined by how well they fit past data. Schwartzstein and Sunderam (2021) study a persuasion setting where the sender observes the receiver's data and proposes a model to interpret the data and influence the receiver's belief, with the constraint that the likelihood of data must be greater under the proposed model than the receiver's default model.

The present work differs in that our selection criterion is based on the objective expected payoffs of agents with different specifications. We are implicitly motivated by a story of cultural transmission where agents with higher objective welfare are more likely to pass down their theories to future agents. In general, the theory that leads to the highest objective payoff need not be the one that leads to the highest subjective expectation of payoffs or the one that best fits a finite dataset.

In independent and contemporaneous work, Fudenberg and Lanzani (2020) and Frick, Iijima, and Ishii (2020) consider welfare-based criteria for selecting among misspecifications in single-agent decision problems. Fudenberg and Lanzani (2020) study a framework where a continuum of agents with heterogeneous misspecifications arrive each period and learn from their predecessors' data. When the population shares of different misspecifications change according to their objective performance, Fudenberg and Lanzani (2020) ask which Berk-Nash equilibria under one misspecification are robust to invasion by a small fraction of mutants with a different misspecification. Frick, Iijima, and Ishii (2020) compare learning under different misspecified signal structures with the property that biased agents still learn the state correctly with enough signals.
They assign an efficiency index to every misspecification and show two agents with misspecifications ranked by this index must also have the same welfare ranking in any decision problem, provided there is a large enough but finite number of signals.

In single-agent decision problems, correctly specified agents always perform weakly better than misspecified agents (except when there are non-identifiability issues, see Proposition 1), so the welfare-based criteria in Fudenberg and Lanzani (2020) and Frick, Iijima, and Ishii (2020) do not provide a strict advantage to misspecified individuals compared to the correctly specified ones in the same society. By contrast, we focus on a theory of welfare-based selection of misspecifications in games, where strategic concerns may imply that learning under a misspecification confers a strict evolutionary advantage relative to learning under the correct specification. The central concept in our framework, a zeitgeist, captures aspects of the social interaction structure that are uniquely relevant when agents confront a game as opposed to a decision problem: the assortativity of the matching technology that pairs up agents with different specifications to play the stage game, and how agents behave when matched against different types of opponents.

Our framework of competition between different specifications for Bayesian learning is inspired by the evolutionary game theory literature. This literature also uses objective payoffs as the selection criterion, and studies the evolution of subjective preferences in games and decision problems (e.g., Dekel, Ely, and Yilankaya (2007), see also the surveys Robson and Samuelson (2011) and Alger and Weibull (2019)) and the evolution of constrained strategy spaces (Heller, 2015; Heller and Winter, 2016). Learning does not play a key role in these papers.
By contrast, our work seeks to provide a foundation for the exogenously given sets of preferences, viewing every misspecification (i.e., a set of feasible stage-game parameters) as a set of preferences over strategy profiles. A few papers in this literature study the evolution of different belief-formation processes (Heller and Winter, 2020; Berman and Heller, 2020), but they take a reduced-form (and possibly non-Bayesian) approach and consider arbitrary inference rules. We require agents to be Bayesians who only differ in the support of their Bayesian prior (i.e., their specification), given the relation of this work to the literature on misspecified Bayesian learning.

Footnote: Some papers studying misspecified learning in games also point out that misspecifications can improve an agent's welfare in particular situations (e.g., Jehiel (2005) and Ba and Gindin (2020)). We contribute by introducing a general framework that can be applied broadly.
In this section, we introduce the general environment and stability concept. We begin with the objective stage game and subjective theories that encode specifications. We define the notion of an equilibrium zeitgeist, which describes the steady-state behavior and beliefs in a society populated by agents with heterogeneous specifications. We then present the stability concept, based on objective welfare in equilibrium zeitgeists when one theory is sufficiently prevalent.
We first set up the objective primitives of the general environment. The stage game is a symmetric two-player game with a common strategy space $A$, assumed to be metrizable. When $i$ and $-i$ choose strategies $a_i, a_{-i} \in A$, random consequences $y_i, y_{-i} \in Y$ are generated for the players from a metrizable space $Y$. These consequences determine each player's utility, according to a utility function $\pi : Y \to \mathbb{R}$. Objectively, the distribution of $y_i$ is a function of $i$ and $-i$'s play. We take this distribution to be $F^\bullet(a_i, a_{-i}) \in \Delta(Y)$, where $\Delta(Y)$ is the set of distributions over $Y$. We denote the density or probability mass function associated with this distribution by $f^\bullet(a_i, a_{-i}) : Y \to \mathbb{R}_+$.

This general setup can allow for mixed strategies (if $A$ is the set of mixtures over some pure actions) and incomplete-information games (if $S$ is a space of private signals, $\mathcal{A}$ a space of actions, and $A = \mathcal{A}^S$ is the set of signal-contingent actions). It can also describe asymmetric games. Suppose there is a game with action sets $A_1, A_2$ for player roles P1 and P2, and that the consequences of P1 and P2 under the action profile $(a_1, a_2) \in A_1 \times A_2$ are generated according to the distributions $F^\bullet_1(a_1, a_2)$ and $F^\bullet_2(a_1, a_2)$ over $Y$, where we assume the consequence also fully reveals the agent's role. We may construct a symmetric stage game by letting $A = A_1 \times A_2$, so the strategies of two matched agents spell out what actions they would take if they were assigned into each of the player roles. The agents are then placed into the player roles uniformly at random and play according to the strategies. That is, the objective distribution over $i$'s consequence when playing $(a_{i1}, a_{i2}) \in A$ against $(a_{-i1}, a_{-i2}) \in A$ is given by the 50-50 mixture over $F^\bullet_1(a_{i1}, a_{-i2})$ and $F^\bullet_2(a_{-i1}, a_{i2})$.

Throughout this paper, we will take the strategy space $A$, the set of consequences $Y$, and the utility function over consequences $\pi$ to be common knowledge among the agents.
But agents entertain two kinds of uncertainty. First, they are unsure about how play in the stage game translates into consequences: they have fundamental uncertainty about the function $F^\bullet$. For example, the agents may be uncertain about some parameters of the stage game, such as the market price elasticity in a quantity-competition game. Second, they are unsure about how others play: they have strategic uncertainty about others' behavior.

We will consider a society with two observably distinguishable groups of agents, A and B, who may behave differently in the stage game (due to each group having a different belief about the economic fundamentals, for example). All agents entertain different models of the world as possible resolutions of their uncertainty. Models are triplets $(a_A, a_B, F)$ with $a_A, a_B \in A$ and $F : A^2 \to \Delta(Y)$. Each model contains a conjecture $a_A$ about how group A opponents act when matched with the agent, a conjecture $a_B$ about how group B opponents act, and a conjecture $F$ about how strategy profiles translate into consequences for the agent. Assume each $F$, like $F^\bullet$, is given by a density or probability mass function $f(a_i, a_{-i}) : Y \to \mathbb{R}_+$ for every $(a_i, a_{-i}) \in A^2$.

A theory $\Theta$ is a collection of models: that is, a subset of $A^2 \times (\Delta(Y))^{A^2}$. We assume the marginal of the theory on $(\Delta(Y))^{A^2}$ is metrizable. Each agent enters society with a persistent theory, which depends entirely on whether they are from group A or group B. We think of this exogenously endowed theory as coming from education or cultural background, and each agent dogmatically believes that her theory contains the correct model of the world. A theory $\Theta$ is correctly specified if $\Theta \supseteq A^2 \times \{F^\bullet\}$, so the agent can make unrestricted inferences about others' play and does not rule out the correct fundamental environment $F^\bullet$.

In general, a theory may exclude some feasible opponent strategies or the true $F^\bullet$.
Such misspecified theories can represent a scientific paradigm about the economy based on a false premise, a religious belief system with dogmas that contradict facts about the world, or heuristic thinking stemming from a psychological bias that deems the true environment as implausible. Each agent plays the stage game with a random opponent in every period, and uses her personal experience in these matches to calibrate the most accurate model within her theory in a way that we will make precise in Section 2.4.

An agent endowed with a theory is called an adherent of the theory. As alluded to above, we suppose the society is composed of the adherents in the two observable groups A and B. This presumes that agents can identify which group their matched opponent belongs to, though we do not assume that agents know the models contained in theories other than their own. For instance, imagine two dominant theories about business economics coexist in a society, taught by two different universities. Agents are the executives of competing firms and they can use public records to look up the educational background of other executives and therefore learn which school of thought they subscribe to. But even though each agent can perfectly identify her opponent's group membership (which helps to predict the opponent's behavior), she does not understand anything about the contents of the rival economic theory.

To study competition between two theories, we must describe the social composition and interaction structure in the society where learning takes place. We introduce the concepts of zeitgeists and equilibrium zeitgeists to capture these details.

The Cambridge Dictionary defines the noun "zeitgeist" as "the general set of ideas, beliefs, feelings, etc. that is typical of a particular period in history." Crucial in this dictionary definition is the multiplicity of coexisting ideas and beliefs in the society at a moment in time.
In the spirit of the usual meaning of the word, we define a zeitgeist as a landscape of beliefs from different schools of thought, their relative prominence in the society, and the interaction among the adherents of different theories.
Definition 1. A zeitgeist $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ consists of: (1) two theories $\Theta_A$ and $\Theta_B$; (2) a belief over models for each theory, $\mu_A \in \Delta(\Theta_A)$ and $\mu_B \in \Delta(\Theta_B)$; (3) relative sizes of the two groups in the society, $p = (p_A, p_B)$ with $p_A, p_B \geq 0$, $p_A + p_B = 1$; (4) a matching assortativity parameter $\lambda \in [0, 1]$; (5) a strategy profile $a = (a_{AA}, a_{AB}, a_{BA}, a_{BB})$ where $a_{g,g'} \in A$ is the strategy that an adherent of $\Theta_g$ plays against an adherent of $\Theta_{g'}$.

A zeitgeist outlines the beliefs and interactions among agents with heterogeneous theories living in the same society. Parts (1) and (2) of this definition capture the beliefs of each group. Parts (3) and (4) determine social composition and social interaction: the relative prominence of each theory and the probability of interacting with one's own group versus with the population as a whole. In each period, every agent is matched with an opponent from her own group with probability $\lambda$, and matched uniformly by population proportion with probability $1 - \lambda$. Therefore, an agent from group $g$ has an overall probability of $\lambda + (1 - \lambda) p_g$ of being matched with an opponent from her own group, and a complementary chance of being matched with an opponent from the other group. Part (5) describes behavior in the society.

To evaluate payoffs under a zeitgeist, which we then use to determine each theory's evolutionary fitness, we introduce our equilibrium concept.

An equilibrium zeitgeist (EZ) imposes equilibrium conditions on the beliefs and behavior in a zeitgeist. Specifically, it is a zeitgeist that satisfies the optimality of inference and behavior, holding fixed the population shares $p$ and the matching assortativity $\lambda$.
Optimality of behavior requires that all players best respond, and optimality of inference requires that the belief is supported on models that minimize Kullback-Leibler (KL) divergence.

Formally, for two distributions $F^\bullet, \hat{F} \in \Delta(Y)$ with density functions / probability mass functions $f^\bullet, \hat{f}$, define the KL divergence from $\hat{F}$ to $F^\bullet$ as

$$D_{KL}(F^\bullet \,\|\, \hat{F}) := \int f^\bullet(y) \ln\left(\frac{f^\bullet(y)}{\hat{f}(y)}\right) dy.$$

Definition 2.
A zeitgeist $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ is an equilibrium zeitgeist (EZ) if for every $g, g' \in \{A, B\}$,

$$a_{g,g'} \in \arg\max_{\hat{a} \in A} \; E_{(a_A, a_B, F) \sim \mu_g}\left[ E_{y \sim F(\hat{a}, a_{g'})}(\pi(y)) \right]$$

and, for every $g \in \{A, B\}$, the belief $\mu_g$ is supported on

$$\arg\min_{(\hat{a}_A, \hat{a}_B, \hat{F}) \in \Theta_g} \; (\lambda + (1 - \lambda) p_g) \cdot D_{KL}\left(F^\bullet(a_{g,g}, a_{g,g}) \,\|\, \hat{F}(a_{g,g}, \hat{a}_g)\right) + (1 - \lambda)(1 - p_g) \cdot D_{KL}\left(F^\bullet(a_{g,-g}, a_{-g,g}) \,\|\, \hat{F}(a_{g,-g}, \hat{a}_{-g})\right).$$

We now interpret the definition of an EZ. Each agent from group $g$ chooses a subjective best response $a_{g,g'}$ against each group $g'$ of opponents, given her belief $\mu_g$ about the fundamental and strategic uncertainty. Her belief $\mu_g$ is supported on the models in her theory that minimize a weighted KL divergence, with the data from each type of match weighted by the probability of confronting this type of opponent. Section 3.4 and Appendix B develop a learning foundation of EZs as the social steady state when Bayesian learners start with a prior supported on the models in their theory.

An important assumption behind this framework is that agents (correctly) believe the economic fundamentals are fixed, no matter who they are matched against. That is, the mapping $(a_i, a_{-i}) \mapsto \Delta(Y)$ describes the stage game that they are playing, and agents know that they always play the same stage game even though opponents from different groups may use different strategies in the game. As a result, the agent's experience in games against both groups of opponents jointly resolves the same fundamental uncertainty about the
environment. Generally, play between two groups $g$ and $g'$ is not a Berk-Nash equilibrium, as the individuals in group $g$ draw inferences about the game's parameters not only from the matches against group $g'$, but also from the matches against the other group $-g'$, who may use a different strategy.

Even as agents adjust their beliefs and behavior to converge to an EZ, the population proportions of different theories $p_A, p_B$ remain fixed. We imagine a world where the relative prominence of theories changes much more slowly than the rate of convergence to an EZ. Thus, an equilibrium zeitgeist provides a snapshot of the society in a given era, and the social transitions between different EZs as $p$ evolves take place on a longer timescale.

Equilibrium zeitgeists describe environments where agents entertain both fundamental uncertainty and strategic uncertainty. For some applications, we may wish to focus attention on agents' inferences about the game parameters and abstract away from learning how others play. To do this, we introduce a variant of EZs where agents are restricted to hold correct beliefs about others' behavior.
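The weighted KL-divergence objective in Definition 2 can be illustrated numerically. The following is a minimal Python sketch, assuming a finite consequence space (so every distribution is a probability vector) and treating opponents' play as known, so that only the conjecture about the game varies across models. The function names, the two candidate models, and all numbers are hypothetical illustrations and do not come from the paper.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for two probability vectors on the same finite support."""
    total = 0.0
    for p_y, q_y in zip(p, q):
        if p_y > 0:
            if q_y == 0:
                return math.inf  # the model rules out an outcome that occurs
            total += p_y * math.log(p_y / q_y)
    return total


def match_weights(lam, p_g):
    """Chance that a group-g agent meets her own group, and the other group."""
    own = lam + (1 - lam) * p_g
    other = (1 - lam) * (1 - p_g)
    return own, other


def weighted_kl(true_own, model_own, true_other, model_other, lam, p_g):
    """Match-probability-weighted KL divergence, as in the EZ inference condition."""
    own, other = match_weights(lam, p_g)
    terms = [(own, true_own, model_own), (other, true_other, model_other)]
    # Matches that never occur generate no data, so they are skipped.
    return sum(w * kl_divergence(t, m) for w, t, m in terms if w > 0)


def best_fitting_model(theory, true_own, true_other, lam, p_g):
    """Name of the model in a (hypothetical) finite theory with the lowest objective."""
    return min(
        theory,
        key=lambda name: weighted_kl(
            true_own, theory[name][0], true_other, theory[name][1], lam, p_g
        ),
    )


# Hypothetical true outcome distributions in own-group and other-group matches.
true_own = [0.7, 0.3]
true_other = [0.4, 0.6]
# A hypothetical misspecified theory: each model predicts the same outcome
# distribution in both kinds of matches, so no model fits both data sources.
theory = {
    "fits_own_matches": ([0.7, 0.3], [0.7, 0.3]),
    "fits_other_matches": ([0.4, 0.6], [0.4, 0.6]),
}

print(best_fitting_model(theory, true_own, true_other, lam=1.0, p_g=0.1))
print(best_fitting_model(theory, true_own, true_other, lam=0.0, p_g=0.1))
```

In this toy example the selected model changes with the matching assortativity: under fully assortative matching the agent only sees own-group data, while under uniform matching a small group mostly sees data from the other group. This is one way to see why the inference of a misspecified agent depends on the zeitgeist.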
Definition 3.
A zeitgeist $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ is an equilibrium zeitgeist with strategic certainty (EZ-SC) if for every $g, g' \in \{A, B\}$,

$$a_{g,g'} \in \arg\max_{\hat{a} \in A} \; E_{(a_A, a_B, F) \sim \mu_g}\left[ E_{y \sim F(\hat{a}, a_{g'})}(\pi(y)) \right]$$

and, for every $g \in \{A, B\}$, the belief $\mu_g$ is supported on

$$\arg\min_{\substack{(\hat{a}_g, \hat{a}_{-g}, \hat{F}) \in \Theta_g \\ \text{s.t. } \hat{a}_g = a_{g,g}, \; \hat{a}_{-g} = a_{-g,g}}} \; (\lambda + (1 - \lambda) p_g) \cdot D_{KL}\left(F^\bullet(a_{g,g}, a_{g,g}) \,\|\, \hat{F}(a_{g,g}, a_{g,g})\right) + (1 - \lambda)(1 - p_g) \cdot D_{KL}\left(F^\bullet(a_{g,-g}, a_{-g,g}) \,\|\, \hat{F}(a_{g,-g}, a_{-g,g})\right).$$

That is, an EZ-SC adds the extra requirement relative to EZ that $\mu_g$ correctly reflects others' play. In our applications, we will only work with EZ-SC in environments where theories have the product structure $\Theta = A \times A \times \mathcal{F}$, where $\mathcal{F} \subseteq (\Delta(Y))^{A^2}$. So, agents can make any inference about others' play and the optimization problem in the definition of EZ-SC can be thought of as an optimization over conjectures about the game, $F \in \mathcal{F}$. As will be made precise by the learning foundation, an EZ-SC is an EZ in situations where agents see sufficiently informative ex-post signals about the matched opponent's strategy at the end of every match.

Agents may still be misspecified because $\mathcal{F}$ may exclude the true mapping $F^\bullet$ that translates strategy profiles into consequences. For theories with a product structure, we sometimes abuse notation and use the terminology $\Theta$ or "theory" to refer to $\mathcal{F}$, even though $\mathcal{F}$ formally represents only the marginal of the theory on fundamental uncertainty.

In an EZ or EZ-SC, define the fitness of each theory $\Theta_A$ and $\Theta_B$ as the objective expected payoff of its adherents. Consider an evolutionary story where the relative prominence of the two theories in the society rises and falls according to their relative fitness.
This could happen, for example, if the theories are the basic heritable units of information passed down to future agents via cultural transmission, and the school of thought whose adherents have higher average payoff tends to acquire more resources and attract a larger share of future adherents. We are interested in a notion of stability based on this "evolutionary" process where two co-existing rival theories compete to create intellectual descendants in a payoff-monotonic way. Can the adherents of a resident theory $\Theta_A$, starting at a position of social prominence, always repel an invasion from a small $\epsilon$ mass of agents who adhere to a mutant theory $\Theta_B$? The definition of evolutionary stability formalizes this idea.

Since we are motivated by situations where a small but strictly positive population of theory $\Theta_B$ adherents invades an otherwise homogeneous society all believing in theory $\Theta_A$, we begin with a refinement of EZ and EZ-SC that rules out those equilibria with the population share $(p_A, p_B) = (1, 0)$ that cannot be written as the limit of equilibria with a positive but vanishing $p_B$. This rules out, for example, EZs with $p_A = 1$ sustained only because group A holds arbitrary beliefs about the play of group B or fragile beliefs about the economic fundamentals that would be discarded after a single match against a group B opponent.

Definition 4.
An EZ $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ with $p = (1, 0)$ is approachable if there exists a sequence of EZs $Z^{(n)} = (\Theta_A, \Theta_B, \mu_A^{(n)}, \mu_B^{(n)}, (p_A^{(n)}, p_B^{(n)}), \lambda, (a_{AA}^{(n)}, a_{AB}^{(n)}, a_{BA}^{(n)}, a_{BB}^{(n)}))$, where $p_B^{(n)} > 0$ for every $n$, $p_B^{(n)} \to 0$, $\mu_A^{(n)} \to \mu_A$, $\mu_B^{(n)} \to \mu_B$, and $a^{(n)} \to a$. An EZ-SC $Z$ with $p = (1, 0)$ is approachable if there exists a sequence of EZ-SCs $Z^{(n)}$ satisfying the analogous convergence conditions.

In this definition, $\mu_g^{(n)} \to \mu_g$ refers to convergence in the weak* topology on the space $\Delta(\Theta_g)$ of distributions over the models in theory $\Theta_g$, and $a^{(n)} \to a$ means the convergence of the strategy profile in the metrizable space $\mathbb{A}^4$.

We now turn to the definition of evolutionary stability, which is defined only when the set of approachable EZ / EZ-SC with $p = (1, 0)$ is non-empty. Stability is defined based on the fitness of theories $\Theta_A$, $\Theta_B$ in such equilibria. Evolutionary stability is when $\Theta_A$ has higher fitness than $\Theta_B$ in all approachable equilibria, and evolutionary fragility is when $\Theta_A$ has lower fitness in all approachable equilibria. These two cases give sharp predictions about whether a small share of mutant-theory invaders might grow in size, across all equilibrium selections. A third possible case, where $\Theta_A$ has lower fitness than $\Theta_B$ in some but not all approachable equilibria, corresponds to a situation where the mutant theory may or may not grow in the society, depending on the equilibrium selection.

Definition 5.
Suppose there exists at least one approachable EZ with theories $\Theta_A$, $\Theta_B$, $p = (1, 0)$, and matching assortativity $\lambda$. Say $\Theta_A$ is evolutionarily stable [fragile] against $\Theta_B$ under $\lambda$-matching if in all such approachable EZ, $\Theta_A$ has a weakly higher [strictly lower] fitness than $\Theta_B$.

Analogously, suppose there exists at least one approachable EZ-SC with theories $\Theta_A$, $\Theta_B$, $p = (1, 0)$, and matching assortativity $\lambda$. Say $\Theta_A$ is evolutionarily stable [fragile] with strategic certainty against $\Theta_B$ under $\lambda$-matching if in all such approachable EZ-SC, $\Theta_A$ has a weakly higher [strictly lower] fitness than $\Theta_B$.

Before turning to particular applications, we discuss some general properties of the framework. Section 3.1 points out that correct specifications are stable against misspecifications in decision problems. Section 3.2 shows the framework's learning channel leads to new stability phenomena. Section 3.3 presents sufficient conditions for the existence and upper hemicontinuity of EZ-SCs. Section 3.4 outlines a learning foundation for EZs and EZ-SCs (with the details relegated to Appendix B).
We first show that in single-agent problems, evolutionary arguments will always favor a correctly specified theory over an incorrect one. The stage "game" is a decision problem if $(a_i, a_{-i}) \mapsto F^\bullet(a_i, a_{-i})$ only depends on $a_i$. In decision problems, the correctly specified theory is evolutionarily stable (with or without strategic certainty) against any other theory, except when there are identification issues. We adapt the notion of strong identification from Esponda and Pouzo (2016).
Definition 6. Theory $\Theta_A$ is strongly identified in EZ $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ if whenever $(\hat{a}_A, \hat{a}_B, \hat{F}), (\hat{a}_A', \hat{a}_B', \hat{F}') \in \Theta_A$ both solve

$$\min_{(\hat{a}_A, \hat{a}_B, \hat{F}) \in \Theta_A} \Big\{ (\lambda + (1-\lambda) p_A) \cdot D_{KL}\big(F^\bullet(a_{AA}, a_{AA}) \,\|\, \hat{F}(a_{AA}, \hat{a}_A)\big) + (1-\lambda)(1-p_A) \cdot D_{KL}\big(F^\bullet(a_{AB}, a_{BA}) \,\|\, \hat{F}(a_{AB}, \hat{a}_B)\big) \Big\},$$

we have $\hat{F}(a_i, \hat{a}_A) = \hat{F}'(a_i, \hat{a}_A')$ and $\hat{F}(a_i, \hat{a}_B) = \hat{F}'(a_i, \hat{a}_B')$ for all $a_i \in \mathbb{A}$.

Theory $\Theta_A$ is strongly identified in EZ-SC $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p, \lambda, a)$ if whenever $(a_{AA}, a_{BA}, \hat{F}), (a_{AA}, a_{BA}, \hat{F}') \in \Theta_A$ are such that $\hat{F}, \hat{F}'$ both solve

$$\min_{\hat{F} \text{ s.t. } (a_{AA}, a_{BA}, \hat{F}) \in \Theta_A} \Big\{ (\lambda + (1-\lambda) p_A) \cdot D_{KL}\big(F^\bullet(a_{AA}, a_{AA}) \,\|\, \hat{F}(a_{AA}, a_{AA})\big) + (1-\lambda)(1-p_A) \cdot D_{KL}\big(F^\bullet(a_{AB}, a_{BA}) \,\|\, \hat{F}(a_{AB}, a_{BA})\big) \Big\},$$

we have $\hat{F}(a_i, a_{AA}) = \hat{F}'(a_i, a_{AA})$ and $\hat{F}(a_i, a_{BA}) = \hat{F}'(a_i, a_{BA})$ for all $a_i \in \mathbb{A}$.

Proposition 1.
Suppose the stage game is a decision problem. Let $\lambda$ and two theories $\Theta_A$, $\Theta_B$ be given, where $\Theta_A$ is correctly specified. Suppose there exists at least one approachable equilibrium zeitgeist [with strategic certainty] with $p_A = 1$, and $\Theta_A$ is strongly identified in all such equilibria. Then $\Theta_A$ is evolutionarily stable [with strategic certainty] under $\lambda$-matching against $\Theta_B$.

The result that a resident correct specification is immune to invasions from misspecifications echoes related results in Fudenberg and Lanzani (2020) and Frick, Iijima, and Ishii (2020). For the rest of the paper, we focus on stage games where multiple agents' actions jointly determine their payoffs.
A key feature of our theory-evolution framework is that each agent interprets her observations through the lens of her theory, thus drawing inferences about her environment (e.g., game parameters). These inferences, in turn, shape her preference over strategy profiles in the stage game. So, the learning channel endogenously determines the preferences that the adherents of different theories hold in the stage game. By contrast, the literature on preference evolution discussed in Section 1.1 precludes such inferences and endows each agent with a fixed preference.

We first show how preference evolution is embedded as a special case of our framework. We then explore the implications of the learning channel for evolutionary stability, showing that some novel stability phenomena can only arise with theory evolution, and not with preference evolution. Some of the results in our applications (e.g., Proposition 9) also show that predictions about evolutionary stability change drastically without the learning channel.

A theory $\Theta$ is called a singleton if $\Theta = \mathbb{A}^2 \times \{F\}$ for some $F : \mathbb{A}^2 \to \Delta(\mathbb{Y})$. An agent with a singleton theory does not entertain fundamental uncertainty: she is sure that the stage game is described by $F$. We can view every singleton theory as a subjective utility function in the stage game. That is, we define $(a_i, a_{-i}) \mapsto U_i(a_i, a_{-i}; F)$ with $U_i(a_i, a_{-i}; F) := E_{y \sim F(a_i, a_{-i})}[\pi(y)]$. An EZ-SC in a society where all agents have singleton theories corresponds to an equilibrium in a setting with preference evolution. The adherents of $\Theta_g$ hold the subjective preference $U_i(\cdot, \cdot; F_g)$ in the stage game, and all agents maximize their subjective preferences in all match types. To see this, suppose $\Theta_A = \mathbb{A}^2 \times \{F_A\}$, $\Theta_B = \mathbb{A}^2 \times \{F_B\}$ are singleton theories.
If $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, (p), \lambda, (a))$ is an EZ-SC, then $\mu_A$ must put probability 1 on $(a_{AA}, a_{BA}, F_A)$ and $\mu_B$ must put probability 1 on $(a_{AB}, a_{BB}, F_B)$, so for every $g, g' \in \{A, B\}$, $a_{g,g'}$ satisfies $a_{g,g'} \in \arg\max_{\hat{a} \in \mathbb{A}} U_i(\hat{a}, a_{g',g}; F_g)$. We work with strategic certainty for ease of comparison with the literature on preference evolution, as that literature typically assumes agents correctly know others' behavior in equilibrium.

In a society with matching assortativity $\lambda$, an adherent of a theory with population proportion $p_g$ is matched up with someone from the same group with probability $\lambda + (1-\lambda) p_g$. This matching probability is an increasing and linear function in each of $\lambda$ and $p_g$. Suppose the two subjective preferences $U_i(\cdot, \cdot; F_A)$ and $U_i(\cdot, \cdot; F_B)$ associated with the two singleton theories $\Theta_A$ and $\Theta_B$ in a society induce a unique equilibrium in matches between groups $g$ and $g'$ for all $g, g' \in \{A, B\}$. Then, the fitness of each theory changes linearly as we change the matching assortativity or population shares. This linearity underlies the key distinction between preference evolution and theory evolution.

Every non-singleton theory may be thought of as a set of preferences over stage game strategy profiles, viewing each feasible conjecture about the stage game $F : \mathbb{A}^2 \to \Delta(\mathbb{Y})$ as one such preference. As matching assortativity or population shares change, each agent encounters a different distribution over opponent strategies. This may lead a misspecified agent to draw a different inference about the stage game parameters and may change the agent's best-response function. By contrast, in a world of preference evolution, a game between two agents with a given pair of subjective preferences always plays out in the same way, regardless of the social composition or matching assortativity of the larger society where the game takes place.

We exhibit two stability phenomena that only happen with non-singleton theories.
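The linearity of fitness under singleton theories can be made concrete with a minimal sketch. It assumes, as in the preference-evolution benchmark, that the objective payoffs in own-group and cross-group matches are fixed numbers that do not respond to $\lambda$ or $p_g$; the function names and the payoff values are hypothetical and chosen only for illustration.

```python
def same_group_prob(lam: float, p_g: float) -> float:
    """Probability that a group-g agent meets a same-group opponent."""
    return lam + (1.0 - lam) * p_g


def fitness(lam: float, p_g: float, payoff_vs_own: float, payoff_vs_other: float) -> float:
    """Average objective payoff of group g when match outcomes are held fixed.

    With singleton theories (fixed subjective preferences) the two match
    payoffs do not depend on lam or p_g, so fitness is linear in each of them.
    """
    q = same_group_prob(lam, p_g)
    return q * payoff_vs_own + (1.0 - q) * payoff_vs_other


# Hypothetical match payoffs, used only to illustrate the linearity.
U_OWN, U_OTHER = 3.0, 1.0
for lam in (0.0, 0.5, 1.0):
    print(lam, fitness(lam, p_g=0.2, payoff_vs_own=U_OWN, payoff_vs_other=U_OTHER))
```

With non-singleton theories, the two payoff arguments would themselves depend on the inferred model and hence on $\lambda$ and $p_g$, which is what breaks this linearity.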
Stability reversal refers to a strong kind of multiplicity in the relative stability of two theories $\Theta_A$ and $\Theta_B$ under uniform matching. Recall that in an EZ-SC, the fitness of a theory is the objective expected payoff of its adherents, where this expectation averages across expected payoffs in matches against each of the two groups. Let a theory's conditional fitness against group $g$ refer to the expected payoff of the theory's adherents in matches against group $g$.

Definition 7. Two theories $\Theta_A$, $\Theta_B$ exhibit stability reversal if (i) in every EZ-SC with $\lambda = 0$ and $(p_A, p_B) = (1, 0)$, $\Theta_A$ has strictly higher conditional fitness than $\Theta_B$ against group A opponents and against group B opponents, but also (ii) in every EZ-SC with $\lambda = 0$ and $(p_A, p_B) = (0, 1)$, $\Theta_B$ has strictly higher fitness than $\Theta_A$.

If at least one EZ-SC is approachable with $\lambda = 0$, $(p_A, p_B) = (1, 0)$, then part (i) requires $\Theta_A$ to be evolutionarily stable with strategic certainty against $\Theta_B$. It imposes the more stringent condition that $\Theta_A$ outperforms $\Theta_B$ not only on average, but also conditional on the opponent's group. The linearity of fitness in population share discussed above then implies that stability reversal cannot take place if both theories are singletons (i.e., if we are in the world of preference evolution).

Proposition 2.
Two singleton theories (i.e., two subjective preferences in the stage game)cannot exhibit stability reversal in any stage game.
Stability reversal is unique to the world of theory evolution. For an example, consider a two-player investment game where player $i$ chooses an investment level $a_i \in \{1, 2\}$. A random productivity level $P$ is realized according to $P = b^\bullet (a_i + a_{-i}) + \epsilon$, where $\epsilon$ is a zero-mean noise term and $b^\bullet > 0$. Player $i$ gets $a_i \cdot P - 1_{\{a_i = 2\}} \cdot c$. So $P$ determines the marginal return on investment, and $c > 0$ is the cost of choosing the higher investment level. The consequence is $y = (a_i, a_{-i}, P)$. The payoff matrix below displays the objective expected payoffs for different investment profiles.

$$\begin{array}{c|cc} & 1 & 2 \\ \hline 1 & 2b^\bullet,\ 2b^\bullet & 3b^\bullet,\ 6b^\bullet - c \\ 2 & 6b^\bullet - c,\ 3b^\bullet & 8b^\bullet - c,\ 8b^\bullet - c \end{array}$$

Condition 1. $5b^\bullet < c < 6b^\bullet$.

Condition 1 ensures that $a_i = 1$ is a strictly dominant strategy in the stage game, and the investment profile (2, 2) Pareto dominates the investment profile (1, 1). Higher investment has a positive externality as it also increases the opponent's productivity.

Consider two theories in the society. Theory $\Theta_A$ is a correctly specified singleton: its adherents understand how investment profiles translate into distributions over productivity. Theory $\Theta_B$ wrongly stipulates $P = b(x_i + x_{-i}) - m + \epsilon$, where $m > 0$ and $b \in \mathbb{R}$ is a parameter that the adherents infer. We require the following condition, which is satisfied whenever $m > 0$ is large enough, that is, whenever $\Theta_B$ is sufficiently misspecified.

Condition 2. $c < 4b^\bullet + \frac{1}{3} m$ and $c < 5b^\bullet + \frac{1}{4} m$.

We show that in contrast to the impossibility result when all theories are singletons, in this example theories $\Theta_A$ and $\Theta_B$ exhibit stability reversal.

Example 1. In the investment game, under Condition 1 and Condition 2, $\Theta_A$ and $\Theta_B$ exhibit stability reversal.

The idea is that the adherents of $\Theta_B$ overestimate the complementarity of investments, and this overestimation is more severe when they face data generated from lower investment profiles. As a result, the match between $\Theta_A$ and $\Theta_B$ plays out in a different way depending on which theory is resident: it results in the investment profile (1, 2) when $\Theta_A$ is resident, but results in (1, 1) when $\Theta_B$ is resident.

Let $b^*(a_i, a_{-i})$ solve $\min_{b \in \mathbb{R}} D_{KL}\big(F^\bullet(a_i, a_{-i}) \,\|\, \hat{F}(a_i, a_{-i}; b, m)\big)$, where $F^\bullet(a_i, a_{-i})$ is the objective distribution over observations under the investment profile $(a_i, a_{-i})$, and $\hat{F}(a_i, a_{-i}; b, m)$ is the distribution under the same investment profile in the model where productivity is given by $P = b(x_i + x_{-i}) - m + \epsilon$. We find that $b^*(a_i, a_{-i}) = b^\bullet + \frac{m}{a_i + a_{-i}}$. That is, adherents of $\Theta_B$ end up with different beliefs about the game parameter $b$ depending on the behavior of their typical opponents, which in turn affects how they respond to different rival investment levels. Stability reversal hinges on the fact that when $\Theta_A$ is resident and the adherents of $\Theta_B$ always meet opponents who play $a_i = 1$, they end up with a more distorted belief about the fundamental than when $\Theta_B$ is resident.

In this example, stability reversal happens because the misspecified agents hold different beliefs about a stage-game parameter depending on which theory is resident. Also, note that the stage game involves non-trivial strategic interaction between the players: the complementarity in investment levels implies an agent's best response may vary with the rival's strategy. Both of these turn out to be necessary conditions for stability reversal in general stage games.

Definition 8.
A theory $\Theta$ is strategically independent if for all $\mu \in \Delta(\Theta)$, $\arg\max_{a_i \in \mathbb{A}} E_{F \sim \mu}[U_i(a_i, a_{-i}; F)]$ is the same for every $a_{-i} \in \mathbb{A}$.

The adherents of a strategically independent theory believe that while the opponent's action may affect their utility, it does not affect their best response.
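The investment-game example above can be checked by brute force. The following is a minimal sketch, assuming the expected productivity $b^\bullet(a_i + a_{-i})$, the payoff $a_i \cdot P - 1_{\{a_i = 2\}} \cdot c$, and the inference rule $b^*(a_i, a_{-i}) = b^\bullet + m/(a_i + a_{-i})$ stated in the text, uniform matching ($\lambda = 0$), and pure strategies. All function names are hypothetical, and treating an empty equilibrium set as "no reversal" is a choice made for this sketch only.

```python
from itertools import product

ACTIONS = (1, 2)


def objective_payoff(a_i, a_j, b_true, c):
    """Expected payoff a_i * E[P] - 1{a_i = 2} * c with E[P] = b_true * (a_i + a_j)."""
    return a_i * b_true * (a_i + a_j) - (c if a_i == 2 else 0.0)


def subjective_payoff(a_i, a_j, b, m, c):
    """Expected payoff under the misspecified model E[P] = b * (a_i + a_j) - m."""
    return a_i * (b * (a_i + a_j) - m) - (c if a_i == 2 else 0.0)


def kl_minimizing_b(a_i, a_j, b_true, m):
    """Inferred b that matches the mean of P at the observed investment profile."""
    return b_true + m / (a_i + a_j)


def best_responses(a_j, payoff):
    """Set of actions maximizing payoff(a, a_j) over ACTIONS."""
    values = {a: payoff(a, a_j) for a in ACTIONS}
    top = max(values.values())
    return {a for a, v in values.items() if abs(v - top) < 1e-12}


def ez_sc_profiles(resident, b_true, m, c):
    """Pure profiles (a_AA, a_AB, a_BA, a_BB) that are EZ-SC when lambda = 0
    and the whole population belongs to the resident group ("A" or "B")."""
    def obj(a, a_j):
        return objective_payoff(a, a_j, b_true, c)

    found = []
    for a_AA, a_AB, a_BA, a_BB in product(ACTIONS, repeat=4):
        # Group B's data come only from matches against the resident group.
        if resident == "A":
            b = kl_minimizing_b(a_BA, a_AB, b_true, m)
        else:
            b = kl_minimizing_b(a_BB, a_BB, b_true, m)

        def subj(a, a_j, b=b):
            return subjective_payoff(a, a_j, b, m, c)

        if (a_AA in best_responses(a_AA, obj)
                and a_AB in best_responses(a_BA, obj)
                and a_BA in best_responses(a_AB, subj)
                and a_BB in best_responses(a_BB, subj)):
            found.append((a_AA, a_AB, a_BA, a_BB))
    return found


def exhibits_stability_reversal(b_true, m, c):
    """Check parts (i) and (ii) of the stability-reversal definition."""
    def obj(a, a_j):
        return objective_payoff(a, a_j, b_true, c)

    res_A = ez_sc_profiles("A", b_true, m, c)
    res_B = ez_sc_profiles("B", b_true, m, c)
    part_i = all(obj(aAA, aAA) > obj(aBA, aAB) and obj(aAB, aBA) > obj(aBB, aBB)
                 for aAA, aAB, aBA, aBB in res_A)
    part_ii = all(obj(aBB, aBB) > obj(aAB, aBA) for _, aAB, aBA, aBB in res_B)
    return bool(res_A) and bool(res_B) and part_i and part_ii
```

For instance, with the hypothetical values `b_true=1.0`, `c=5.5`, `m=12.0`, calling `ez_sc_profiles("A", 1.0, 12.0, 5.5)` and `ez_sc_profiles("B", 1.0, 12.0, 5.5)` lets one compare how the cross-group match plays out under each resident theory.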
Proposition 3. In any stage game, suppose $\Theta_A$, $\Theta_B$ exhibit stability reversal, $\Theta_A$ is the correctly specified singleton theory, and $\Theta_B$ has the product structure. Then, the beliefs that the adherents of $\Theta_B$ hold in all EZ-SCs with $p = (1, 0)$ and the beliefs they hold in all EZ-SCs with $p = (0, 1)$ form disjoint sets. Also, $\Theta_B$ is not strategically independent.

The first claim of Proposition 3 shows that stability reversal must operate through the learning channel. So in particular, it cannot happen if the group B agents simply have a different subjective preference in the stage game. The second claim shows that stability reversal can only happen if the misspecified agents respond differently to different rival play. In particular, it cannot happen in decision problems.

3.2.2 Non-Monotonic Stability in Matching Assortativity
We now turn to the role of matching assortativity in the stability of theories. In the world of preference evolution, the linearity of fitness in matching assortativity discussed before implies that if a theory $\Theta_A$ is evolutionarily stable with strategic certainty against a theory $\Theta_B$ both under uniform matching ($\lambda = 0$) and perfectly assortative matching ($\lambda = 1$), then the same must also hold under any intermediate level of assortativity $\lambda \in (0, 1)$.

Proposition 4.
Suppose Θ A , Θ B are singleton theories (i.e., subjective preferences in thestage game) and Θ A is evolutionarily stable with strategic certainty against Θ B with λ -matching for both λ = 0 and λ = 1 . Then, Θ A is also evolutionarily stable with strategiccertainty against Θ B with λ -matching for any λ ∈ [0 , . This result does not always hold with non-singleton general theories. We use an exampleto show that stability need not be monotonic in matching assortativity. In this example, acorrectly specified singleton theory is evolutionarily stable with strategic certainty againstanother misspecified theory both when λ = 0 and when λ = 1, but it is also evolutionarilyfragile with strategic certainty for some intermediate values of λ. Consider a stage game where each player chooses an action from { a , a , a } . Every playerthen receives a random prize, y ∈ { g, b } , which are worth utilities π ( g ) = 1 , π ( b ) = 0 . Thepayoff matrix below displays the objective expected utilities associated with different actionprofiles, which also correspond to the probabilities that the row and column players receivethe good prize g . a a a a a a A be the correctly specified singleton theory. The action a is strictly dominantunder the objective payoffs, so an adherent of Θ A always plays a in all matches. Let Θ B bea misspecified theory Θ B = A × { F H , F L } . Each model F H , F L stipulates that the prize g is generated the the probabilities in the following table, where b and c are parameters thatdepend on the model. The model F H has ( b, c ) = (0 . , .
2) and F L has ( b, c ) = (0 . , . .a a a a c a c, b, b b, a b Example 2.
In this stage game, Θ A is evolutionarily stable with strategic certainty againstΘ B under λ -matching when λ = 0 and λ = 1 , but it is also evolutionarily fragile withstrategic certainty under λ -matching when λ ∈ ( λ l , λ h ), where 0 < λ l < λ h < λ l = 0 . λ h ≈ . B . If theybelieve in F H , they will play the action profile ( a , a ) and generate the objective payoffprofile (0 . , . a , a ).The problem is that the data generated from the ( a , a ) profile provides a better fit for F L than F H , since the objective 40% probability of getting prize g is closer to F L ’s conjecture of10% than F H ’s conjecture of 80%. A belief in F H — and hence the profile ( a , a ) — cannotbe sustained if the mutants only play each other. On the other hand, when an adherent of Θ B plays a correctly specified Θ A adherent, both models F H and F L prescribe a best response of a against the Θ A adherent’s play a . The data generated from the ( a , a ) profile lead biasedagents to the model F H that enables cooperative behavior within the mutant community.But, these matches against correctly specified opponents harm the mutant’s welfare, as theyonly get an objective payoff of 0.2.Therefore, the most advantageous interaction structure for the mutants is one wherethey can calibrate the model F H using the data from matches against correctly specifiedopponents, then extrapolate this optimistic belief about b to coordinate on ( a , a ) in matchesagainst fellow mutants. This requires the mutants to match with intermediate assortativity.Figure 1 depicts the equilibrium fitness of the mutant theory Θ B as a function of assortativity.While payoffs of Θ B adherents increase in λ at first, eventually they drop when mutant-vs-mutant matches become sufficiently frequent that a belief in F H can no longer be sustained.The preference evolution framework does not allow this non-linear and even non-monotonicchange in fitness with respect to λ, which the theory evolution framework accommodates. 
We provide a few technical results about the existence of EZ-SC and the upper hemicontinuity of the set of EZ-SC with respect to population share. The existence and continuity results also establish the existence of approachable EZ-SCs with population shares $p = (1, 0)$. [...] agents hold correct beliefs about others' play and prove analogous results for EZ instead of EZ-SC, but this result is not needed in the subsequent applications as we will only consider strategic uncertainty in examples where we can explicitly characterize the entire set of EZ for every population share $p$.

[Figure 1 plot: "Misspecified Theory's Fitness in EZ-SC"; horizontal axis: assortativity $\lambda$; vertical axis: theory B's fitness; legend: infer $F_H$, infer $F_L$, resident's fitness.]

Figure 1: The EZ-SC fitness of $\Theta_B$ for different values of matching assortativity $\lambda$ when $p_B = 0$. (The EZ-SC fitness of the resident theory $\Theta_A$ is always 0.25.) In the blue region, there is a unique EZ-SC where the adherents of $\Theta_B$ infer $F_H$ and receive linearly increasing average payoffs across all matches as $\lambda$ increases. In the red region, there is an EZ-SC where the adherents of $\Theta_B$ infer $F_L$ and receive payoff 0.2 in all matches, regardless of $\lambda$.

Let two theories, $\Theta_A$, $\Theta_B$, be fixed, where $\Theta_A = \mathbb{A}^2 \times \mathcal{F}_A$ and $\Theta_B = \mathbb{A}^2 \times \mathcal{F}_B$ have product structures. Also fix population shares $p$ and matching assortativity $\lambda$. For $\mu \in \Delta(\Theta_A) \cup \Delta(\Theta_B)$, $a_i, a_{-i} \in \mathbb{A}$, let $U_i(a_i, a_{-i}; \mu) := E_{F \sim \mu}\left[ E_{y \sim F(a_i, a_{-i})}(\pi(y)) \right]$ be the subjective expected utility of playing $a_i$ against $a_{-i}$, under the belief $\mu$ over models. Let $U_A : \mathbb{A}^2 \times \Theta_A \to \mathbb{R}$ be such that $U_A(a_i, a_{-i}; F) = U_i(a_i, a_{-i}; \delta_F)$ and let $U_B : \mathbb{A}^2 \times \Theta_B \to \mathbb{R}$ be such that $U_B(a_i, a_{-i}; F) = U_i(a_i, a_{-i}; \delta_F)$.

Assumption 1. $\mathbb{A}$, $\Theta_A$, $\Theta_B$ are compact metrizable spaces.

Assumption 2. $U_A$, $U_B$ are continuous.

Assumption 3.
For every $F \in \Theta_A \cup \Theta_B$ and $a_i, a_{-i} \in \mathbb{A}$, $D_{KL}(F^\bullet(a_i, a_{-i}) \,\|\, F(a_i, a_{-i}))$ is well-defined and finite.

Under Assumption 3, we have the well-defined functions $K_A : \mathbb{A}^2 \times \Theta_A \to \mathbb{R}_+$ and $K_B : \mathbb{A}^2 \times \Theta_B \to \mathbb{R}_+$, where $K_g(a_i, a_{-i}; F) := D_{KL}(F^\bullet(a_i, a_{-i}) \,\|\, F(a_i, a_{-i}))$.

Assumption 4. $K_A$ and $K_B$ are continuous.

Assumption 5. $\mathbb{A}$ is convex and, for all $a_{-i} \in \mathbb{A}$ and $\mu \in \Delta(\Theta_A) \cup \Delta(\Theta_B)$, $a_i \mapsto U_i(a_i, a_{-i}; \mu)$ is quasiconcave.

We show existence of EZ-SC using the Kakutani-Fan-Glicksberg fixed point theorem, applied to the correspondence which maps strategy profiles and beliefs over models into best replies and beliefs over KL-divergence minimizing models. We start with a lemma.
Lemma 1.
For $g \in \{A, B\}$, $a = (a_{AA}, a_{AB}, a_{BA}, a_{BB}) \in \mathbb{A}^4$, and $0 \le m_g \le 1$, let

$$\Theta_g^*(a, m_g) := \arg\min_{\hat{F} \in \Theta_g} \Big\{ m_g \cdot D_{KL}\big(F^\bullet(a_{g,g}, a_{g,g}) \,\|\, \hat{F}(a_{g,g}, a_{g,g})\big) + (1 - m_g) \cdot D_{KL}\big(F^\bullet(a_{g,-g}, a_{-g,g}) \,\|\, \hat{F}(a_{g,-g}, a_{-g,g})\big) \Big\}.$$

Then, $\Theta_g^*$ is upper hemicontinuous in its arguments.

This lemma says the set of KL-minimizing models is upper hemicontinuous in strategy profile and matching assortativity. This leads to the existence result.
Proposition 5.
Under Assumptions 1, 2, 3, 4, and 5, an EZ-SC exists.
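To illustrate the weighted KL-minimization that underlies Lemma 1, the following minimal sketch computes the minimizing set for a finite collection of models when consequences are binary, so that each model reduces to a predicted success probability in own-group matches and in cross-group matches. The restriction to Bernoulli consequences, the function names, and all numbers are assumptions made for this sketch; it is not the construction used in the proof.

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """D_KL(Bernoulli(p) || Bernoulli(q)) in nats; requires 0 < q < 1."""
    total = 0.0
    for p_k, q_k in ((p, q), (1.0 - p, 1.0 - q)):
        if p_k > 0.0:  # terms with zero true probability contribute nothing
            total += p_k * math.log(p_k / q_k)
    return total


def kl_minimizers(models, true_own, true_cross, m_g, tol=1e-12):
    """Names of models minimizing the weighted KL objective.

    models: dict mapping a model name to (q_own, q_cross), the success
        probabilities the model predicts in own-group and cross-group matches.
    true_own, true_cross: objective success probabilities in those matches.
    m_g: weight on own-group matches, i.e. lambda + (1 - lambda) * p_g.
    """
    scores = {
        name: m_g * kl_bernoulli(true_own, q_own)
        + (1.0 - m_g) * kl_bernoulli(true_cross, q_cross)
        for name, (q_own, q_cross) in models.items()
    }
    best = min(scores.values())
    return {name for name, score in scores.items() if score <= best + tol}


# Hypothetical two-model theory: "H" fits cross-group data exactly,
# "L" fits own-group data better.
MODELS = {"H": (0.8, 0.2), "L": (0.1, 0.3)}
for weight in (0.0, 0.5, 1.0):
    print(weight, kl_minimizers(MODELS, true_own=0.4, true_cross=0.2, m_g=weight))
```

The selected model can switch as the weight `m_g` moves, which is the channel through which matching assortativity and population shares affect inference.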
Next, upper hemicontinuity in $m_g$ in Lemma 1 allows us to deduce the upper hemicontinuity of the EZ-SC correspondence in population shares, and conclude that the notion of approachability from Definition 4 is a non-empty refinement of the set of EZ-SC with $p = (1, 0)$.

Proposition 6. Fix two theories $\Theta_A$, $\Theta_B$ where $\Theta_A = \mathbb{A}^2 \times \mathcal{F}_A$ and $\Theta_B = \mathbb{A}^2 \times \mathcal{F}_B$. Also fix matching assortativity $\lambda \in [0, 1]$. The set of EZ-SC is an upper hemicontinuous correspondence in $p_B$ under Assumptions 1, 2, 3, and 4.

Corollary 1.
Under Assumptions 1, 2, 3, 4, and 5, the set of approachable EZ-SC with $p = (1, 0)$ is non-empty for every $\lambda$.

Appendix B provides a learning foundation for our equilibrium concepts by showing that it is not possible for behavior and beliefs to stabilize at any outcome other than an EZ or an EZ-SC. We summarize it here, omitting the technical assumptions necessary for the result.

A continuum of long-lived agents are endowed with one of two theories at time 0, and match in every period to play the stage game. Each agent starts with a full-support prior belief over the models in her theory, believing her environment to be stationary. When matched with an opponent, the agent observes the opponent's group, then chooses a strategy $a_i \in \mathbb{A}$, and finally observes a consequence $y_i \in \mathbb{Y}$ and possibly also a signal $x_i$ about the matched opponent's strategy $a_{-i}$ at the end of the game. She then updates her belief using Bayes' rule. As models include conjectures about others' play and conjectures about the fundamental parameters, agents may make inferences about the game parameters using opponents' strategy choice through a correlated prior over strategic uncertainty and fundamental uncertainty.

We do not require agents to act myopically and only assume that they use asymptotically myopic policies: they eventually choose actions that are $\epsilon$-myopic best responses to the Bayesian posterior belief about the environment. We provide this foundation for the case of games with finite strategy spaces, but conjecture a similar argument would extend to games with compact infinite strategy spaces given some uniformity conditions.

The learning foundation clarifies the difference between EZ and EZ-SC. The former emerges without the additional ex-post signals about the opponent's play. The latter emerges when the ex-post signals are sufficiently informative.
The idea is that a wrong conjecture about the opponent's play then leads to higher KL divergence than the correct beliefs about the opponent's play combined with any feasible belief about the fundamentals. So, agents must hold correct beliefs about others' play if beliefs and behavior converge to a steady state.

We apply our framework to study the stability of misperceptions of the information structure in linear quadratic normal (LQN) games. LQN games have been used as a tractable workhorse model for studying comparative statics of equilibrium outcomes with respect to changes in information (e.g., Bergemann and Morris (2013)). In this application, we exploit the same tractability to study the evolutionary stability of correct beliefs about the information structure to misspecifications, in particular misspecifications about the correlation in information between different players. Assuming that agents know others' strategies, the key conclusion is that a society of rational residents with correct beliefs about how private signals are correlated is evolutionarily fragile against misspecified mutants who suffer from either correlation neglect or projection bias. The type of bias that gets selected depends on the matching assortativity $\lambda$ in the society.

In the LQN setup we consider, we interpret the players as competing firms that possess correlated private information about market demand. At the start of the stage game, Nature's type (i.e., a demand state) $\omega$ is drawn from $N(0, \sigma_\omega^2)$, where $N(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$. Each of the two players $i$ (i.e., firms) receives a private signal $s_i = \omega + \epsilon_i$, then chooses an action $q_i \in \mathbb{R}$ (i.e., a quantity). Market price is then realized according to $P = \omega - r^\bullet \cdot \frac{1}{2}(q_1 + q_2) + \zeta$, where $\zeta \sim N(0, (\sigma_\zeta^\bullet)^2)$ is an idiosyncratic price shock that is independent of all the other random variables. Firm $i$'s profit in the game is $q_i P - \frac{1}{2} q_i^2$.
The stage game is parametrized by the strictly positive terms $\sigma_\omega^2$, $r^\bullet$, and $(\sigma_\zeta^\bullet)^2$, which represent variance in market demand, the elasticity of market price with respect to average quantity supplied, and the variance of price shocks. These parameters remain constant through all matches. But in every match, demand state $\omega$, signals $(s_i)$, and price shock $\zeta$ are redrawn, independently across matches. The environment can be interpreted as a market with daily fluctuations in demand, but the fluctuations are generated according to a fixed set of fundamental parameters.

In the LQN game, market prices and quantity choices may be positive or negative. To interpret, when $P > 0$, the market pays for each unit of good supplied, and market price decreases in total supply. When $P < 0$, the market pays for disposal of the good. Firms make money by submitting negative quantities, which represent offers to remove the good from the market. The per-unit disposal fee decreases as the firms offer to dispose more. The cost $\frac{1}{2} q_i^2$ represents either a convex production cost or a convex disposal cost, depending on the sign of $q_i$.

We now turn to the information structure of the stage game, that is, the joint distribution of $(\omega, s_i, s_{-i})$. The firms' signals $s_i = \omega + \epsilon_i$ are conditionally correlated given $\omega$. The error terms $\epsilon_i$ are generated by

$$\epsilon_i = \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, z + \frac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}}\, \eta_i,$$

where $\eta_i \sim N(0, \sigma_\epsilon^2)$ is the idiosyncratic component of the error generated i.i.d. across $i$, and $z \sim N(0, \sigma_\epsilon^2)$ is the common component for both $i$. Here, $\kappa \in [0, 1]$ parametrizes the conditional correlation of the two firms' signals. Higher $\kappa$ leads to an information structure with higher conditional correlation. When $\kappa = 0$, $s_i$ and $s_{-i}$ are conditionally uncorrelated given the state (though still unconditionally correlated since both depend on $\omega$). When $\kappa = 1$, we always have $s_i = s_{-i}$. The functional form of $\epsilon_i$ ensures the variance of the signals $\mathrm{Var}(s_i)$ remains constant across all possible values of $\kappa$.

We consider a family of misspecifications about the information structure parametrized by misperceptions of $\kappa$. The objective information structure is given by $\kappa = \kappa^\bullet$. Note that a misspecified information structure associated with a wrong $\kappa$ leads to a higher-order misspecification about the state $\omega$ in the stage game. Suppose agents are correct about the distributions of $\omega$, $\eta_i$, and $z$. Write $E_\kappa$ for expectation under the information structure with correlational parameter $\kappa$. Then $E_\kappa[\omega \mid s_i]$ is the same for all $\kappa$. In particular, even an agent who believes in some $\kappa \neq \kappa^\bullet$ makes a correct first-order inference about the expectation of the market demand, given her own information. But, one can show (Lemma 2) there exists a strictly increasing and strictly positive function $\psi(\kappa)$ so that $E_\kappa[s_{-i} \mid s_i] = \psi(\kappa) \cdot s_i$ for all $s_i \in \mathbb{R}$, $\kappa \in [0, 1]$. The misspecified agent holds a wrong belief about the rival's signal, and thus a wrong belief about the rival's belief about $\omega$.

Many experiments have found that subjects do not form accurate beliefs about the beliefs of others. We draw a connection between the misperception we study and the statistical biases that have been previously documented:
Definition 9.
Let ˜ κ be a player’s perceived κ . A player suffers from correlation neglect if˜ κ < κ • . A player suffers from projection bias if ˜ κ > κ • .Under correlation neglect, agents believe signals are more independent from one another thanthey really are. Under projection bias, agents “project” their own information onto othersand exaggerate the similarity between others’ signals and their own signals. We are agnosticabout the origin of these misspecifications about correlation. They may be psychological innature and come directly from the agents’ cognitive biases, or they could be driven by morecomplex mechanisms. We instead ask whether such misspecifications could persist in thesociety once they appear.
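As a rough numerical companion to the information structure, the sketch below assumes that the error term loads on the common shock $z$ with weight $\kappa / \sqrt{\kappa^2 + (1-\kappa)^2}$ and on the idiosyncratic shock $\eta_i$ with weight $(1-\kappa) / \sqrt{\kappa^2 + (1-\kappa)^2}$, with both shocks having variance $\sigma_\epsilon^2$. Under joint normality, the coefficient in $E_\kappa[s_{-i} \mid s_i]$ is then the regression coefficient $\mathrm{Cov}(s_i, s_{-i}) / \mathrm{Var}(s_i)$. This closed form is derived here from that assumption rather than quoted from Lemma 2, and the function names are hypothetical.

```python
import math


def common_weight(kappa: float) -> float:
    """Loading of the error term on the common shock z."""
    return kappa / math.sqrt(kappa ** 2 + (1.0 - kappa) ** 2)


def idio_weight(kappa: float) -> float:
    """Loading of the error term on the idiosyncratic shock eta_i."""
    return (1.0 - kappa) / math.sqrt(kappa ** 2 + (1.0 - kappa) ** 2)


def error_variance(kappa: float, var_eps: float) -> float:
    """Var(eps_i); the two squared loadings sum to one for every kappa."""
    return (common_weight(kappa) ** 2 + idio_weight(kappa) ** 2) * var_eps


def psi(kappa: float, var_omega: float, var_eps: float) -> float:
    """Coefficient of s_i in E[s_{-i} | s_i] under joint normality.

    Cov(s_i, s_{-i}) = var_omega + common_weight(kappa)^2 * var_eps,
    Var(s_i)         = var_omega + var_eps.
    """
    cov = var_omega + common_weight(kappa) ** 2 * var_eps
    return cov / (var_omega + var_eps)


# Perceived kappa below (above) the true value lowers (raises) psi, which
# corresponds to correlation neglect (projection bias).
for k in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(k, psi(k, var_omega=1.0, var_eps=1.0))
```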
[Footnote: For example, Hansen, Misra, and Pai (2021) show that multiple agents simultaneously conducting algorithmic price experiments in the same market may generate correlated information which gets misinterpreted as independent information, a form of correlation neglect for firms. Goldfarb and Xiao (2019) structurally estimate a model of thinking cost and find that bar owners over-extrapolate the effect of today's weather shock on future profitability.]

We translate the environment described above into the formalism from Section 2.

A strategy in the stage game is a function $Q_i : \mathbb{R} \to \mathbb{R}$ that assigns a quantity $Q_i(s_i)$ to every signal $s_i$. The strategy is called linear if there exists an $\alpha_i \ge 0$ so that $Q_i(s_i) = \alpha_i s_i$ for every $s_i \in \mathbb{R}$. We will later show that the best response to any linear strategy is linear, regardless of the agent's belief about the correlation parameter and market price elasticity (Lemma 3). We therefore restrict attention to linear strategies and let $\mathbb{A} = [0, \bar{M}_\alpha]$ for some $\bar{M}_\alpha < \infty$, where a typical element $\alpha_i \in \mathbb{A}$ corresponds to the linear strategy with coefficient $\alpha_i$.

We suppose all parameters of the stage game are common knowledge except for $r^\bullet$, $\kappa^\bullet$, and $\sigma_\zeta^\bullet$. To investigate the evolutionary implications of higher-order misspecifications about the state, we consider theories that are dogmatic and possibly wrong about $\kappa$, but allow agents to make inferences about $r$ and $\sigma_\zeta$. We let the space of consequences be $\mathbb{Y} = \mathbb{R}^3$, where each consequence $y = (s_i, q_i, P)$ shows the agent's signal, quantity choice, and the market price. The consequence $y$ delivers the utility $\pi(y) := q_i P - \frac{1}{2} q_i^2$. We consider theories with the product structure $\Theta(\kappa) := \mathbb{A}^2 \times \mathcal{F}_\kappa$, where $\mathcal{F}_\kappa := \{F_{r,\kappa,\sigma_\zeta} : r \in [0, \bar{M}_r],\ \sigma_\zeta \in [0, \bar{M}_{\sigma_\zeta}]\}$ for some $\bar{M}_r, \bar{M}_{\sigma_\zeta} < \infty$. So $\mathcal{F}_\kappa$ is a set of conjectures of the game environment indexed by the parameters $(r, \kappa, \sigma_\zeta)$, but all reflecting a dogmatic belief in the correlation parameter $\kappa$.
Each $F_{r,\kappa,\sigma_\zeta} : \mathbb{A} \times \mathbb{A} \to \Delta(\mathbb{Y})$ is such that $F_{r,\kappa,\sigma_\zeta}(\alpha_i, \alpha_{-i})$ gives the distribution over $i$’s consequences in a stage game with parameters $(r, \kappa, \sigma_\zeta)$, when $i$ uses the linear strategy $\alpha_i$ against an opponent using the linear strategy $\alpha_{-i}$. Since we will focus on EZ-SC with theories that allow unrestricted inference about others’ play, we will abuse notation and identify $\Theta(\kappa)$ with $\mathcal{F}_\kappa$.

While agents learn about both $r$ and $\sigma_\zeta$, it is their (mis)inferences about the market price elasticity $r$ that drive the main results. Since each firm’s profit is linear in the market price, an agent’s belief about the variance of the idiosyncratic price shock does not change her expected payoffs or behavior. We use inference over $\sigma_\zeta$ to simplify our analysis: this parameter absorbs changes in the variance of market price under different correlation structures. A Bayesian agent whose data are all generated from the same strategy profile only learns about $r$ using the mean of the market price in the data, not its variance.

In formalizing the stage game and translating misperceptions of the information structure into theories, we have assumed that the space of feasible linear strategies $\alpha_i \in [0, \bar{M}_\alpha]$ and the domain of inference over game parameters $r \in [0, \bar{M}_r]$, $\sigma_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$ are bounded sets. These compactness assumptions help ensure that EZ-SC exist. In analyzing evolutionary stability, we will focus on the case where the bounds $\bar{M}_\alpha, \bar{M}_r, \bar{M}_{\sigma_\zeta}$ are finite but sufficiently large, so that the optimal behavior and beliefs are interior. We introduce the following shorthand:

Notation.
A result is said to hold “with high enough price volatility and large enough strategy space and inference space” if, whenever the strategy space $[0, \bar{M}_\alpha]$ has $\bar{M}_\alpha \ge \frac{1/\sigma_\epsilon^2}{1/\sigma_\epsilon^2 + 1/\sigma_\omega^2}$, there exist $0 < L_1, L_2, L_3 < \infty$ so that for any objective game $F^\bullet$ with $(\sigma_\zeta^\bullet)^2 \ge L_1$ and with theories where the parameter spaces $r \in [0, \bar{M}_r]$, $\sigma_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$ are such that $\bar{M}_{\sigma_\zeta} \ge (\sigma_\zeta^\bullet)^2 + L_2$ and $\bar{M}_r \ge L_3$, the result is true.

In order to determine which theories (i.e., perceptions of $\kappa$) are stable against rival theories, we must characterize the relevant equilibrium zeitgeists. This section develops a number of preliminary results that relate beliefs about the game parameters to best responses, and conversely strategy profiles to the KL-divergence minimizing inferences.

We begin by proving the result alluded to earlier: under normality, every agent’s inferences about the state and about the opponent’s signal are linear functions of her own signal. The linear coefficient on the latter increases with the correlation parameter $\kappa$.

Lemma 2.
There exists a strictly increasing function $\psi(\kappa)$, with $\psi(0) > 0$ and $\psi(1) = 1$, so that $E_\kappa[s_{-i} \mid s_i] = \psi(\kappa) \cdot s_i$ for all $s_i \in \mathbb{R}$, $\kappa \in [0, 1]$. Also, there exists a strictly positive $\gamma \in \mathbb{R}$ so that $E_\kappa[\omega \mid s_i] = \gamma \cdot s_i$ for all $s_i \in \mathbb{R}$, $\kappa \in [0, 1]$.

Linearity of $E[\omega \mid s_i]$ and $E[s_{-i} \mid s_i]$ in $s_i$ allows us to explicitly characterize the corresponding linear best responses, given beliefs about $\kappa$ and elasticity $r$. For $Q_i, Q_{-i}$ (not necessarily linear) strategies in the stage game and $\mu \in \Delta(\Theta(\kappa))$, let $U_i(Q_i, Q_{-i}; \mu)$ be $i$’s subjective expected utility from playing $Q_i$ against $Q_{-i}$, under the belief $\mu$.

Lemma 3.
For $\alpha_{-i}$ a linear strategy,

$$U_i(\alpha_i, \alpha_{-i}; \mu) = E[s_i^2] \cdot \left( \alpha_i \gamma - \frac{1}{2} \hat{r} \alpha_i^2 - \frac{1}{2} \hat{r} \psi(\kappa) \alpha_i \alpha_{-i} - \frac{1}{2} \alpha_i^2 \right)$$

for every linear strategy $\alpha_i$, where $\hat{r} = \int r \, d\mu(r, \kappa, \sigma_\zeta)$ is the mean of $\mu$’s marginal on elasticity. For $\kappa \in [0, 1]$ and $r > 0$,

$$\alpha_i^{BR}(\alpha_{-i}; \kappa, r) := \frac{\gamma - \frac{1}{2} r \psi(\kappa) \alpha_{-i}}{1 + r}$$

best responds to $\alpha_{-i}$ among all strategies $Q_i : \mathbb{R} \to \mathbb{R}$ for all $\sigma_\zeta > 0$.

Lemma 3 shows that $\alpha_i^{BR}(\alpha_{-i}; \kappa, r)$ is not only the best-responding linear strategy when the opponent plays $\alpha_{-i}$ and $i$ believes in correlation parameter $\kappa$ and elasticity $r$, it is also optimal among the class of all strategies $Q_i(s_i)$ against the same opponent play and under the same beliefs.

Call a linear strategy more aggressive if its coefficient $\alpha_i \ge 0$ is larger. Player $i$’s subjective best response function becomes more aggressive when $i$ believes in lower $\kappa$ or lower $r$: we have $\frac{\partial \alpha_i^{BR}}{\partial \kappa} < 0$ and $\frac{\partial \alpha_i^{BR}}{\partial r} < 0$.

We next consider inference about the true elasticity $r^\bullet$. The following lemma shows that any linear profile generates data whose KL-divergence can be minimized to 0 by a unique value of $r$. We also characterize how this inference about elasticity depends on the strategy profiles and the agent’s belief about the correlation parameter $\kappa$. As mentioned earlier, we focus on the case where the bounds on the inferences $r \in [0, \bar{M}_r]$, $\sigma_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$ are sufficiently large to ensure that the KL-divergence minimization problem is well-behaved.

Lemma 4.
For every $0 < r^\bullet, \bar{M}_\alpha < \infty$, there exist $0 < L_1, L_2, L_3 < \infty$ such that for any $(\sigma_\zeta^\bullet)^2 \ge L_1$, $\bar{M}_{\sigma_\zeta} \ge (\sigma_\zeta^\bullet)^2 + L_2$, $\bar{M}_r \ge L_3$, $\kappa^\bullet, \kappa \in [0, 1]$, $\alpha_i, \alpha_{-i} \in [0, \bar{M}_\alpha]$, we have $D_{KL}(F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}(\alpha_i, \alpha_{-i}) \,\|\, F_{\hat{r}, \kappa, \hat{\sigma}_\zeta}(\alpha_i, \alpha_{-i})) = 0$ for exactly one pair $\hat{r} \in [0, \bar{M}_r]$, $\hat{\sigma}_\zeta \in [0, \bar{M}_{\sigma_\zeta}]$. This $\hat{r}$ is given by

$$r_i^{INF}(\alpha_i, \alpha_{-i}; \kappa^\bullet, \kappa, r^\bullet) := r^\bullet \, \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}.$$

The agent’s inference about $r$ is strictly decreasing in her belief about the correlation parameter $\kappa$. To understand why, assume player $i$ uses the linear strategy $\alpha_i$ and player $-i$ uses the linear strategy $\alpha_{-i}$. After receiving a private signal $s_i$, player $i$ expects to face a price distribution with a mean of $\gamma s_i - \frac{1}{2} r (\alpha_i s_i + \alpha_{-i} E_\kappa[s_{-i} \mid s_i])$. Under projection bias $\kappa > \kappa^\bullet$, $E_\kappa[s_{-i} \mid s_i]$ is excessively steep in $s_i$. For example, following a large and positive $s_i$, the agent overestimates the similarity of $-i$’s signal and wrongly predicts that $-i$ must also choose a very high quantity, and thus becomes surprised when the market price remains high. The agent then wrongly infers that the market price elasticity must be low. Therefore, in order to rationalize the average market price conditional on her own signal, an agent with projection bias must infer $r < r^\bullet$. For similar reasons, an agent with correlation neglect infers $r > r^\bullet$.

Combining Lemma 3 and Lemma 4, we find that increasing $\kappa$ has an a priori ambiguous impact on the agent’s equilibrium aggressiveness. Increasing $\kappa$ has the direct effect of lowering aggression (by Lemma 3), but it also has the indirect effect of lowering the inference about $r$ (by Lemma 4), which increases aggression (by Lemma 3). Nevertheless, we show in the results below that the indirect effect through the mislearning channel dominates, and the evolutionary stability of correlational errors is driven by this channel.
We show in Section 4.6 that the results are reversed when we shut down the learning channel.

Lemma 4 considers the problem of KL-divergence minimization when all of the data are generated from a single strategy profile, $(\alpha_i, \alpha_{-i})$. It implies that if $\lambda \in \{0, 1\}$ and $(p_A, p_B) = (1, 0)$, then each agent’s data come from a single type of match, so her inference about elasticity is given by $r_i^{INF}$. In other societies, agents observe data from matches against adherents of both $\Theta_A$ and $\Theta_B$. Thus, they must find a single set of parameters for the stage game that best fits all of their data, and even this best-fitting model will have positive KL divergence in equilibrium. The next lemma shows the LQN game satisfies Assumptions 1 through 5. Therefore, the existence and continuity results from Section 3.3 imply that the tractable analysis in homogeneous societies remains robust to the introduction of a small but non-zero share of a mutant theory.
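The direct and indirect effects of a misperceived correlation can be illustrated numerically. The following is a minimal sketch, assuming the Lemma 3 best response takes the form $(\gamma - \frac{1}{2} r \psi(\kappa) \alpha_{-i})/(1 + r)$ and the Lemma 4 inference takes the form $r^\bullet (\alpha_i + \alpha_{-i}\psi(\kappa^\bullet))/(\alpha_i + \alpha_{-i}\psi(\kappa))$. The parameter values and the function names (`best_response`, `inferred_elasticity`, `decompose`) are hypothetical and chosen only for illustration; the function $\psi$ is not modeled, and its values are passed in directly.

```python
# Illustrative parameter values: assumptions for this sketch, not from the paper.
GAMMA = 0.5      # gamma in E[omega | s_i] = gamma * s_i
R_TRUE = 1.0     # objective price elasticity
PSI_TRUE = 0.5   # psi evaluated at the true correlation parameter


def best_response(alpha_opp, psi, r, gamma=GAMMA):
    """Linear best response under beliefs (psi, r), in the assumed Lemma 3 form."""
    return (gamma - 0.5 * r * psi * alpha_opp) / (1.0 + r)


def inferred_elasticity(alpha_own, alpha_opp, psi_perceived,
                        psi_true=PSI_TRUE, r_true=R_TRUE):
    """Elasticity that rationalizes the mean price, in the assumed Lemma 4 form."""
    return r_true * (alpha_own + alpha_opp * psi_true) / (
        alpha_own + alpha_opp * psi_perceived)


def decompose(psi_perceived, alpha_own, alpha_opp):
    """Return (baseline, direct_only, total, r_hat) for data from (alpha_own, alpha_opp).

    baseline:    best response with correct psi and correct r
    direct_only: best response with misperceived psi but correct r
    total:       best response with misperceived psi and the inferred r_hat
    """
    baseline = best_response(alpha_opp, PSI_TRUE, R_TRUE)
    direct_only = best_response(alpha_opp, psi_perceived, R_TRUE)
    r_hat = inferred_elasticity(alpha_own, alpha_opp, psi_perceived)
    total = best_response(alpha_opp, psi_perceived, r_hat)
    return baseline, direct_only, total, r_hat


# Data generated by the symmetric profile of two correctly specified firms.
ALPHA_STAR = GAMMA / (1.0 + R_TRUE + 0.5 * R_TRUE * PSI_TRUE)

for psi_tilde in (0.45, 0.50, 0.55):
    base, direct, total, r_hat = decompose(psi_tilde, ALPHA_STAR, ALPHA_STAR)
    print(f"psi={psi_tilde:.2f}  r_hat={r_hat:.4f}  "
          f"baseline={base:.4f}  direct={direct:.4f}  total={total:.4f}")
```

With these illustrative numbers, a perceived $\psi$ above the true value lowers the response when elasticity is held fixed but raises it once the inferred elasticity is used, which is the sense in which the indirect effect dominates.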
Lemma 5. For every $r^\bullet, \sigma_\zeta^\bullet \ge 0$, $\lambda \in [0, 1]$, $\kappa^\bullet, \kappa \in [0, 1]$, $\bar{M}_\alpha, \bar{M}_{\sigma_\zeta}, \bar{M}_r < \infty$, the LQN with objective parameters $(r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet)$, strategy space $\mathbb{A} = [0, \bar{M}_\alpha]$ and theories $\Theta(\kappa^\bullet), \Theta(\kappa)$ with parameter spaces $[0, \bar{M}_r]$, $[0, \bar{M}_{\sigma_\zeta}]$ satisfies Assumptions 1, 2, 3, 4, and 5.

4.4 Uniform Matching ($\lambda = 0$) and Projection Bias

We now describe our main results on the evolutionary instability of correctly specified beliefs about the information structure. Our first main result is that in a society where agents are uniformly matched, a correctly specified $\kappa$ will be evolutionarily fragile with strategic certainty against some amount of projection bias. The proof of this result involves characterizing the asymmetric equilibrium strategy profile in matches between the correctly specified residents and the projection-biased mutants, and proving that a small amount of projection bias leads the mutants to have higher payoffs in the resident-vs-mutant matches than the residents’ payoffs in the resident-vs-resident matches.

Proposition 7.
Let $r^\bullet > 0$, $\kappa^\bullet \in [0, 1]$ be given. With high enough price volatility and large enough strategy space and inference space, there exist $\underline{\kappa} < \kappa^\bullet < \bar{\kappa}$ so that in societies with two theories $(\Theta_A, \Theta_B) = (\Theta(\kappa^\bullet), \Theta(\kappa))$ where $\kappa \in [\underline{\kappa}, \bar{\kappa}]$, there is a unique EZ-SC with uniform matching ($\lambda = 0$) and $(p_A, p_B) = (1, 0)$. The equilibrium fitness of $\Theta(\kappa)$ is strictly higher than that of $\Theta(\kappa^\bullet)$ if $\kappa > \kappa^\bullet$, and strictly lower if $\kappa < \kappa^\bullet$.

Combining this result with Lemma 5, we conclude that in societies with theories $\Theta(\kappa^\bullet)$ and $\Theta(\kappa)$ where $\kappa$ is slightly above $\kappa^\bullet$, the unique EZ-SC is approachable. Hence, the correct specification is not evolutionarily stable with strategic certainty against a small amount of projection bias.

Intuitively, as discussed after Lemma 4, projection bias generates a commitment to aggression as it leads the biased agents to under-infer market price elasticity. It is well known that in Cournot oligopoly games, such commitment can be beneficial. For instance, if quantities are chosen sequentially, the first mover obtains a higher payoff compared to the case where quantities are chosen simultaneously. A similar force is at work here, but the source of the commitment is different. Misspecification about signal correlation leads to misinference about $r^\bullet$, which causes the mutants to credibly respond to their opponents’ play in an overly aggressive manner. The rational residents, who can identify the mutants in the population, back down and yield a larger share of the surplus. While projection bias is beneficial in small amounts, it is also intuitive that excessive aggression would be detrimental, as overproduction can be individually suboptimal.

4.5 Perfectly Assortative Matching ($\lambda = 1$) and Correlation Neglect

Turning to the case of perfectly assortative matching, we obtain the opposite result: evolutionary stability now selects for theories with correlation neglect.
The fragility of the correct specification is even starker here, as we show that any level of correlation neglect leads to higher equilibrium fitness.

Proposition 8.
Let $r^\bullet > 0$, $\kappa^\bullet \in [0, 1]$ be given. With high enough price volatility and large enough strategy space and inference space, in societies with two theories $(\Theta_A, \Theta_B) = (\Theta(\kappa_A), \Theta(\kappa_B))$ where $\kappa_A \le \kappa_B$, the fitness of $\Theta_A$ is weakly higher than that of $\Theta_B$ in every EZ-SC with any population proportion $p$ and perfectly assortative matching ($\lambda = 1$).

Combining this result with Lemma 5, we conclude that under Proposition 8’s conditions with $(p_A, p_B) = (1, 0)$, at least one EZ-SC is approachable, and each theory’s fitness is invariant across all approachable EZ-SCs. Furthermore, this fitness is strictly decreasing in $\kappa$. Hence, for any $\kappa_A < \kappa_B$, theory $\Theta(\kappa_A)$ is evolutionarily stable with strategic certainty against theory $\Theta(\kappa_B)$. Specializing to $\kappa_B = \kappa^\bullet$, we conclude that the correct specification is evolutionarily fragile against any level of correlation neglect.

As discussed after Lemma 4, correlation neglect makes agents over-infer market price elasticity, and thus lets them commit to more cooperative behavior (i.e., linear strategies with a smaller coefficient $\alpha_i$). Rational opponents would take advantage of such agents, but the biased agents never match up against rational opponents in a society with perfectly assortative matching. Note also that in the uniform matching case, projection bias leads to a higher payoff for the mutant at the expense of the rational opponent’s payoff. With perfectly assortative matching, correlation neglect Pareto improves both biased agents’ payoffs.

To understand why equilibrium fitness is a monotonically decreasing function of $\kappa$ with perfectly assortative matching, let $\alpha^{TEAM}$ denote the symmetric linear strategy profile that maximizes the sum of the two firms’ expected objective payoffs. We can show that among symmetric strategy profiles, players’ payoffs strictly decrease in their aggressiveness in the region $\alpha > \alpha^{TEAM}$. We can also show that with $\lambda = 1$ and any $\kappa \in [0, 1]$, the equilibrium play among two adherents of $\Theta(\kappa)$ strictly increases in aggression as $\kappa$ grows, and it is always strictly more aggressive than $\alpha^{TEAM}$. Lowering the perception of $\kappa$ confers an evolutionary advantage by bringing play monotonically closer to the team solution $\alpha^{TEAM}$ in equilibrium.
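The comparison between the two matching regimes can be checked on a small numerical example. The sketch below assumes the objective payoff per unit of $E[s_i^2]$ is $\alpha_i \gamma - \frac{1}{2} r^\bullet \alpha_i^2 - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_i \alpha_{-i} - \frac{1}{2}\alpha_i^2$, and that biased agents best respond using the elasticity inferred from their own matches. All parameter values and function names are hypothetical, $\psi$ values are supplied directly rather than derived from $\kappa$, and the uniform-matching profile is found by simple best-response iteration, which is a convenience of this sketch and not the paper’s proof technique.

```python
# Illustrative parameter values: assumptions for this sketch, not from the paper.
GAMMA, R_TRUE, PSI_TRUE = 0.5, 1.0, 0.5


def objective_payoff(a, b):
    """Objective expected profit of strategy a against b, per unit of E[s_i^2]."""
    return (a * GAMMA - 0.5 * R_TRUE * a * a
            - 0.5 * R_TRUE * PSI_TRUE * a * b - 0.5 * a * a)


def team_play():
    """Symmetric strategy maximizing the sum of the two objective payoffs."""
    return GAMMA / (1.0 + R_TRUE + R_TRUE * PSI_TRUE)


def assortative_play(psi):
    """Symmetric play between two adherents who perceive psi (lambda = 1).

    With identical strategies, the inferred elasticity reduces to
    r_true * (1 + psi_true) / (1 + psi); play solves a = BR(a; psi, r_hat).
    """
    r_hat = R_TRUE * (1.0 + PSI_TRUE) / (1.0 + psi)
    return GAMMA / (1.0 + r_hat + 0.5 * r_hat * psi)


def uniform_match(psi_mutant, n_iter=500):
    """Resident-vs-mutant play with lambda = 0 and (p_A, p_B) = (1, 0).

    The resident best responds under correct beliefs; the mutant best responds
    under (psi_mutant, r_hat), where r_hat rationalizes data from this match.
    Returns (resident_alpha, mutant_alpha).
    """
    a_res = a_mut = assortative_play(PSI_TRUE)
    for _ in range(n_iter):
        r_hat = R_TRUE * (a_mut + a_res * PSI_TRUE) / (a_mut + a_res * psi_mutant)
        a_mut_new = (GAMMA - 0.5 * r_hat * psi_mutant * a_res) / (1.0 + r_hat)
        a_res_new = (GAMMA - 0.5 * R_TRUE * PSI_TRUE * a_mut) / (1.0 + R_TRUE)
        a_res, a_mut = a_res_new, a_mut_new
    return a_res, a_mut


nash = assortative_play(PSI_TRUE)          # play among correctly specified firms
resident_fitness = objective_payoff(nash, nash)

for psi in (0.4, 0.5, 0.6):
    a = assortative_play(psi)
    print(f"lambda=1  psi={psi:.1f}  alpha={a:.4f}  "
          f"fitness={objective_payoff(a, a):.6f}")

for psi in (0.45, 0.55):
    a_res, a_mut = uniform_match(psi)
    print(f"lambda=0  psi={psi:.2f}  mutant alpha={a_mut:.4f}  "
          f"mutant fitness={objective_payoff(a_mut, a_res):.6f}  "
          f"resident fitness={resident_fitness:.6f}")
```

In this example, play under perfectly assortative matching stays above the team solution and moves toward it as the perceived $\psi$ falls, while under uniform matching a slightly higher perceived $\psi$ gives the mutant a marginally higher objective payoff than the residents earn against each other.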
The key mechanism behind Proposition 7 and Proposition 8 is that misperceptions about $\kappa$ confer an evolutionary advantage through the learning channel: they cause the misspecified agents to misinfer some other parameter of the stage game. This mislearning is strategically beneficial as it commits the agents to certain behavior that increases their equilibrium payoffs against their typical opponents, given the matching assortativity. Section 3.2 showed that the learning channel unique to the world of theory evolution permits novel stability phenomena in general games, and here we find the same channel is also indispensable for the predictions in this particular application. The results about the evolutionary fragility of the correct specification in Proposition 7 and Proposition 8 would be reversed without it.

Proposition 9. Let $r^\bullet > 0$, $\kappa^\bullet \in [0, 1]$ be given. With high enough price volatility and large enough strategy space and inference space, there exists $\epsilon > 0$ so that for any $\kappa_l, \kappa_h \in [0, 1]$, $\kappa_l < \kappa^\bullet < \kappa_h \le \kappa^\bullet + \epsilon$, the correctly specified theory $\Theta(\kappa^\bullet)$ is evolutionarily stable with strategic certainty against the singleton theory $\{F_{r^\bullet, \kappa_h, \sigma_\zeta^\bullet}\}$ under uniform matching ($\lambda = 0$), and evolutionarily stable with strategic certainty against the singleton theory $\{F_{r^\bullet, \kappa_l, \sigma_\zeta^\bullet}\}$ under perfectly assortative matching ($\lambda = 1$).

In this proposition, we consider agents with singleton theories who misperceive the signal correlation structure but hold dogmatic and correct beliefs about the other game parameters, including the elasticity of market price. Once the mislearning channel is shut down, we find that misperceptions about $\kappa$ that used to confer an evolutionary advantage under a certain matching assortativity can no longer invade a society of correctly specified residents.

We turn to general incomplete-information games and provide a condition for a theory to be evolutionarily fragile against a “nearby” misspecified theory.
This condition shows how assortativity and the learning channel shape the evolutionary selection of theories for a broader class of stage games and biases. We also relate the condition to the specific results studied so far in this application.

Consider a stage game where a state of the world $\omega$ is realized at the start of the game. Players 1 and 2 observe private signals $s_1, s_2 \in S \subseteq \mathbb{R}$, possibly correlated given $\omega$. The objective distribution of $(\omega, s_1, s_2)$ is $P^\bullet$. Based on their signals, players choose actions $q_1, q_2 \in \mathbb{R}$ and receive random consequences $y_1, y_2 \in \mathbb{Y}$. The distribution over consequences as a function of $(\omega, s_1, s_2, q_1, q_2)$ and the utility over consequences $\pi : \mathbb{Y} \to \mathbb{R}$ are such that each player $i$’s objective expected utility from taking action $q_i$ against opponent action $q_{-i}$ in state $\omega$ is given by $u_i^\bullet(q_i, q_{-i}; \omega)$, differentiable in its first two arguments.

For an interval of real numbers $[\underline{\kappa}, \bar{\kappa}]$ with $\underline{\kappa} < \bar{\kappa}$ and $\kappa^\bullet \in (\underline{\kappa}, \bar{\kappa})$, suppose there is a family of theories $(\Theta(\kappa))_{\kappa \in [\underline{\kappa}, \bar{\kappa}]}$, each with a product structure $\Theta(\kappa) = \mathbb{A} \times \mathcal{F}(\kappa)$. Fix $\lambda \in [0, 1]$ and a strategy space $\mathbb{A} \subseteq \mathbb{R}^S$, representing the feasible signal-contingent strategies. Suppose the two theories in the society are $\Theta_A = \Theta(\kappa^\bullet)$ and $\Theta_B = \Theta(\kappa)$ for some $\kappa \in [\underline{\kappa}, \bar{\kappa}]$. The next assumption requires there to be a unique EZ-SC with $(p_A, p_B) = (1, 0)$ in such societies with any $\kappa \in [\underline{\kappa}, \bar{\kappa}]$, and further requires the EZ-SC to feature linear equilibria. Linear equilibria exist and are unique in a large class of games outside of the duopoly framework, and in particular in LQN games under some conditions on the payoff functions (see, e.g., Angeletos and Pavan (2007)).

Assumption 6.
Suppose there is a unique EZ-SC under $\lambda$-matching and population proportions $(p_A, p_B) = (1, 0)$ with $\Theta_A = \Theta(\kappa^\bullet)$, $\Theta_B = \Theta(\kappa)$ for every $\kappa \in [\underline{\kappa}, \bar{\kappa}]$. Suppose the $\kappa$-indexed EZ-SC strategy profiles $(\sigma(\kappa)) = (\sigma_{AA}(\kappa), \sigma_{AB}(\kappa), \sigma_{BA}(\kappa), \sigma_{BB}(\kappa))$ are linear, i.e., $\sigma_{gg'}(\kappa)(s_i) = \alpha_{gg'}(\kappa) \cdot s_i$ with $\alpha_{gg'}(\kappa)$ differentiable in $\kappa$. Suppose that in the EZ-SC with $\kappa = \kappa^\bullet$, $\alpha_{AA}(\kappa^\bullet)$ is objectively interim-optimal against itself. Finally, assume for every $\kappa$, Assumptions 1, 2, 3, 4, and 5 are satisfied.

Proposition 10.
Let $\alpha^\bullet := \alpha_{AA}(\kappa^\bullet)$. Then, under Assumption 6, if

$$E^\bullet\left[ E^\bullet\left[ \frac{\partial u_1^\bullet}{\partial q_2}(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \cdot \left[(1 - \lambda)\alpha_{AB}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)\right] \cdot s_2 \,\middle|\, s_1 \right] \right] > 0,$$

then there exists some $\epsilon > 0$ so that $\Theta(\kappa^\bullet)$ is evolutionarily fragile with strategic certainty against theories $\Theta(\kappa)$ with $\kappa \in (\kappa^\bullet, \kappa^\bullet + \epsilon] \cap [\underline{\kappa}, \bar{\kappa}]$. Also, if

$$E^\bullet\left[ E^\bullet\left[ \frac{\partial u_1^\bullet}{\partial q_2}(\alpha^\bullet s_1, \alpha^\bullet s_2, \omega) \cdot \left[(1 - \lambda)\alpha_{AB}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)\right] \cdot s_2 \,\middle|\, s_1 \right] \right] < 0,$$

then there exists some $\epsilon > 0$ so that $\Theta(\kappa^\bullet)$ is evolutionarily fragile with strategic certainty against theories $\Theta(\kappa)$ with $\kappa \in [\kappa^\bullet - \epsilon, \kappa^\bullet) \cap [\underline{\kappa}, \bar{\kappa}]$. Here $E^\bullet$ is the expectation with respect to the objective distribution of $(\omega, s_1, s_2)$ under $P^\bullet$.

Footnote (on interim optimality in Assumption 6): More precisely, for every $s_i \in S$, $\alpha_{AA}(\kappa^\bullet) \cdot s_i$ maximizes the agent’s objective expected utility across all of $\mathbb{R}$ when $-i$ uses the same linear strategy $\alpha_{AA}(\kappa^\bullet)$.

Proposition 10 describes a general condition to determine whether a correctly specified theory is evolutionarily fragile against a nearby misspecified mutant theory. The condition asks if a slight change in the mutant theory’s $\kappa$ leads mutants’ opponents to change their equilibrium actions such that the mutants become better off on average. These opponents are the residents under uniform matching $\lambda = 0$, so $\alpha_{AB}'(\kappa^\bullet)$ is relevant. These opponents are other mutants under perfectly assortative matching $\lambda = 1$, so $\alpha_{BB}'(\kappa^\bullet)$ is relevant.

Proposition 10 implies that one should only expect the correctly specified theory to be stable against all nearby theories in “special” cases, that is, when the expectation in the statement of Proposition 10 is exactly equal to 0. One such special case is when the agents face a decision problem where 2’s action does not affect 1’s payoffs, that is $\frac{\partial u_1^\bullet}{\partial q_2} = 0$. This sets the expectation to zero, so the result never implies that the correctly specified theory is evolutionarily fragile against a misspecified theory in such decision problems.

In the duopoly game analyzed previously, we have $\frac{\partial u_1^\bullet}{\partial q_2}(q_1, q_2, \omega) = -\frac{1}{2} r^\bullet q_1$. Player 1 is harmed by player 2 producing more if $q_1 > 0$, and helped if $q_1 < 0$. From straightforward algebra, the expectation in Proposition 10 simplifies to

$$E^\bullet[s_1^2] \cdot \left(-\frac{1}{2}\psi(\kappa^\bullet) r^\bullet \alpha^\bullet\right) \cdot \left[(1 - \lambda)\alpha_{AB}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)\right].$$

The proof of Proposition 7 shows that when $\lambda = 0$, $\alpha_{AB}'(\kappa^\bullet) < 0$. The proof of Proposition 8 shows that when $\lambda = 1$, $\alpha_{BB}'(\kappa^\bullet) > 0$. The uniqueness of EZ-SC also follows from these propositions, for an open interval of $\kappa$ containing $\kappa^\bullet$. We restrict $\mathbb{A}$ to the set of linear strategies, and Lemma 3 implies the linear strategies played by two correctly specified firms against each other are interim optimal. Finally, Lemma 5 verifies that Assumptions 1 through 5 are satisfied. Therefore, the conditions of Proposition 10 are satisfied for $\lambda \in \{0, 1\}$, and we deduce the correctly specified theory is evolutionarily fragile with strategic certainty against slightly higher $\kappa$ (for $\lambda = 0$) and slightly lower $\kappa$ (for $\lambda = 1$).

In the next application, we use the framework of theory evolution to provide a justification for coarse analogy classes in games. Jehiel (2005) introduces the solution concept of analogy-based expectation equilibrium (ABEE) in extensive-form games, where agents group opponents’ nodes in an extensive-form game into analogy classes and only keep track of average behavior within each analogy class. An ABEE is a strategy profile where agents best respond to such average opponent behavior. In the ensuing literature that applies ABEE to different settings, analogy classes are usually exogenously given and interpreted as arising from agents’ cognitive constraints. We show through an example that suitably defined theories whose marginals on opponents’ play are restricted subsets of extensive-form strategies can encode analogy classes, and that the matches between any two groups in an EZ constitute ABEEs. We can then investigate which analogy classes are more likely to arise by studying the stability of different theories (i.e., analogy classes), including the correctly specified theory (i.e., the finest analogy class).

Consider a centipede game, shown in Figure 2. P1 and P2 take turns choosing Across or Drop. The non-terminal nodes are labeled $n_k$, $1 \le k \le K$ for an even $K$. P1 acts at nodes $n_1, n_3, \ldots, n_{K-1}$, P2 acts at nodes $n_2, n_4, \ldots, n_K$, and choosing Drop at $n_k$ leads to the terminal node $z_k$.
If Across is always chosen, then the terminal node $z_{end}$ is reached. If P1 chooses Drop at the first node, the game ends with the payoff profile (0, 0). Every time a player $i$ chooses Across, the sum of payoffs grows by $g > 0$, but if the next player chooses Drop then $i$’s payoff is $\ell > 0$ smaller than what $i$ would have gotten by choosing Drop. If $z_{end}$ is reached, both get $Kg/2$. That is, if $u_j^k$ is the utility of $j$ at the terminal node $z_k$, and $i$ moves at $n_k$, then $u_{-i}^k = u_{-i}^{k-1} - \ell$ while $u_i^k = (u_i^{k-1} + u_{-i}^{k-1} + g) - u_{-i}^k$. This works out to $u_j^k = \frac{g(k-1)}{2}$ for both players when $k$ is odd, and $u_1^k = \frac{k-2}{2} g - \ell$, $u_2^k = \frac{k}{2} g + \ell$ when $k$ is even.

Footnote: Section 6.2 of Jehiel (2005) mentions that if players could choose their own analogy classes, then the finest analogy classes need not arise, but also says “it is beyond the scope of this paper to analyze the implications of this approach.”

Figure 2: The centipede game. There are $K$ non-terminal nodes and players 1 (blue) and 2 (red) alternate in choosing Across (A) or Drop (D). Payoff profiles are shown at the terminal nodes.

While this is an asymmetric stage game, we study the symmetrized version mentioned in Section 2.1, where two matched agents are randomly assigned into the roles of P1 and P2. Let $\mathbb{A} = \{(d_k)_{k=1}^K \in [0, 1]^K\}$, so each strategy is characterized by the probabilities of playing Drop at various nodes in the game tree. When assigned into the role of P1, the strategy $(d_k)$ plays Drop with probabilities $d_1, d_3, \ldots, d_{K-1}$ at nodes $n_1, n_3, \ldots, n_{K-1}$. When assigned into the role of P2, it plays Drop with probabilities $d_2, d_4, \ldots, d_K$ at nodes $n_2, n_4, \ldots, n_K$. The set of consequences is $\mathbb{Y} = \{1, 2\} \times (\{z_k : 1 \le k \le K\} \cup \{z_{end}\})$, where the first dimension of the consequence returns the player role that the agent was assigned into, and the second dimension returns the terminal node reached. The objective distribution over consequences as a function of play is $F^\bullet : \mathbb{A}^2 \to \Delta(\mathbb{Y})$.

Consider a learning environment where agents know the game tree (i.e., they know $F^\bullet$), but some agents mistakenly think that when their opponents are assigned into a role, these opponents play Drop with the same probabilities at all of their nodes. Formally, define the restricted space of strategies $\mathbb{A}^{An} := \{(d_k) \in [0, 1]^K : d_k = d_{k'} \text{ if } k \equiv k' \pmod 2\} \subseteq \mathbb{A}$. The correctly specified theory is $\Theta^\bullet := \mathbb{A} \times \mathbb{A} \times \{F^\bullet\}$. The misspecified theory with a restriction on beliefs about opponents’ play is $\Theta^{An} := \mathbb{A}^{An} \times \mathbb{A}^{An} \times \{F^\bullet\}$, reflecting a dogmatic belief that opponents play the same mixed action at all nodes in the analogy class. It is important to remember that this restriction on strategies only exists in the subjective beliefs of the adherents of $\Theta^{An}$. All agents, regardless of their theory, actually have the strategy space $\mathbb{A}$. The next proposition provides a justification for why we might expect agents with coarse analogy classes given by $\mathbb{A}^{An}$ to persist in the society.

Proposition 11.
Suppose $K \ge 4$ and $g > \frac{2}{K-2} \ell$. For any matching assortativity $\lambda \in [0, 1]$, the correctly specified theory $\Theta^\bullet$ is evolutionarily stable against itself, but it is not evolutionarily stable against the misspecified theory $\Theta^{An}$. Also, $\Theta^{An}$ is not evolutionarily stable against $\Theta^\bullet$, unless $\lambda = 1$.

In contrast to the results from the previous section, which predict different biases may arise under different matching assortativities, we find in this environment that the correctly specified theory is not evolutionarily stable against the theory $\Theta^{An}$ with coarse analogy classes under any level of assortativity. In the previous application to LQN games, agents with projection bias commit to acting more aggressively, which increases their equilibrium welfare in matches against rational agents but decreases their equilibrium welfare in matches against other agents with the same bias, and vice versa for agents with correlation neglect. But in the current application, the conditional fitness of $\Theta^{An}$ against both $\Theta^\bullet$ and $\Theta^{An}$ can strictly improve on the correctly specified residents’ equilibrium fitness. This is because the matches between two adherents of $\Theta^\bullet$ must result in Dropping at the first move in equilibrium, while matches where at least one player is an adherent of $\Theta^{An}$ either lead to the same outcome or lead to a Pareto dominating payoff profile.

But at the same time, $\Theta^{An}$ is not evolutionarily stable against $\Theta^\bullet$ either, because the correctly specified agents receive higher payoffs than the misspecified agents when matched against each other. It is easy to see that there are some interior population proportions for the two theories, $(p_A, p_B)$ with $p_A, p_B \in (0, 1)$, such that there is an EZ where no matches that involve misspecified agents result in immediate dropping, and the two theories have the same fitness.

This paper presents an evolutionary selection criterion to endogenize (mis)specifications when agents learn about a strategic environment.
We introduce the concept of a zeitgeist to capture the ambient social structure where learning takes place: the prominence of different theories in the society and the interaction patterns among their adherents. These details matter because different types of opponents behave differently, inducing different beliefs about the economic fundamentals for a misspecified agent. Evolutionary stability of a theory is defined based on the expected objective payoffs (fitness) of its adherents in equilibrium.

We have highlighted settings where the correct specification is not evolutionarily stable against some misspecifications. We view our main contributions as twofold. First, we point out how details of the zeitgeist (e.g., the matching assortativity) change which learning biases may persist in an otherwise rational society. Second, we emphasize that the learning channel, unique to a world where evolutionary forces act on specifications (sets of feasible beliefs) instead of single beliefs, generates novel stability phenomena.

Our framework evaluates whether a misspecification is likely to persist once it emerges in a society, but does not account for which errors appear in the first place. It is plausible that some first-stage filter prevents certain obvious misspecifications from ever reaching the stage that we study in the evolutionary framework. In the applications, we have focused on misspecifications that seem psychologically plausible or harder to detect, such as misspecified higher-order beliefs.

We have used the simplest evolutionary framework where fitness is identified with the expectation of objective payoffs, as opposed to some more exotic function of the payoffs. This paper is not meant to be a just-so congruence exercise of identifying the suitable definition of fitness to justify a particular error (which is the focus for many of the papers that Robson and Samuelson (2011) survey).
Rather, we hope that our stability notions are simple and universal enough that they may become a part of the applied theory toolkit in the future. Studies on the implications of misspecifications in various strategic environments may further enrich our understanding of these errors by paying more attention to their evolutionary stability.
References
Alger, I. and J. Weibull (2019): “Evolutionary models of preference formation,”
AnnualReview of Economics , 11, 329–354.
Aliprantis, C. and K. Border (2006):
Infinite Dimensional Analysis: A Hitchhiker’sGuide , Springer Science & Business Media.
Angeletos, G.-M. and A. Pavan (2007): “Efficient use of information and social valueof information,”
Econometrica , 75, 1103–1142.
Ba, C. (2020): “Model misspecification and paradigm shift,”
Working Paper . Ba, C. and A. Gindin (2020): “A multi-agent model of misspecified learning with over-confidence,”
Working Paper . Bergemann, D. and S. Morris (2013): “Robust predictions in games with incompleteinformation,”
Econometrica , 81, 1251–1308.
Berman, R. and Y. Heller (2020): “Naive analytics equilibrium,”
Working Paper . Bohren, J. A. (2016): “Informational herding with model misspecification,”
Journal ofEconomic Theory , 163, 222–247.
Bohren, J. A. and D. Hauser (2018): “Learning with model misspecification: Charac-terization and robustness,”
Working Paper . Cho, I.-K. and K. Kasa (2015): “Learning and model validation,”
Review of EconomicStudies , 82, 45–82.——— (2017): “Gresham’s law of model averaging,”
American Economic Review , 107, 3589–3616.
Dasaratha, K. and K. He (2020): “Network structure and naive sequential learning,”
Theoretical Economics , 15, 415–444. 34 ekel, E., J. Ely, and O. Yilankaya (2007): “Evolution of preferences,”
Review ofEconomic Studies , 74, 685–704.
Eliaz, K. and R. Spiegler (2020): “A model of competing narratives,”
American Eco-nomic Review , 110, 3786–3816.
Esponda, I. and D. Pouzo (2016): “Berk–Nash equilibrium: A framework for modelingagents with misspecified models,”
Econometrica , 84, 1093–1130.
Esponda, I., D. Pouzo, and Y. Yamamoto (2019): “Asymptotic behavior of Bayesianlearners with misspecified models,”
Working Paper . Frick, M., R. Iijima, and Y. Ishii (2019): “Stability and rbustness in misspecifiedlearning models,”
Working Paper .——— (2020): “Welfare comparisons for biased learning,”
In Preparation .——— (2021): “Misinterpreting others and the fragility of social learning,”
Econometrica,forthcoming . Friedman, M. (1953):
Essays in Positive Economics , University of Chicago Press.
Fudenberg, D. and G. Lanzani (2020): “Which misperceptions persist?”
WorkingPaper . Fudenberg, D., G. Lanzani, and P. Strack (2020): “Limits points of endogenousmisspecified learning,”
Working Paper . Fudenberg, D., G. Romanyuk, and P. Strack (2017): “Active learning with a mis-specified prior,”
Theoretical Economics , 12, 1155–1189.
Gagnon-Bartsch, T., M. Rabin, and J. Schwartzstein (2020): “Channeled atten-tion and stable errors,”
Working Paper . Goldfarb, A. and M. Xiao (2019): “Transitory shocks, limited attention, and a firm’sdecision to exit,”
Working Paper . Hansen, K., K. Misra, and M. Pai (2021): “Algorithmic collusion: Supra-competitiveprices via independent algorithms,”
Marketing Science, forthcoming . He, K. (2020): “Mislearning from censored data: The gambler’s fallacy in optimal-stoppingproblems,”
Working Paper . Heidhues, P., B. Koszegi, and P. Strack (2018): “Unrealistic expectations and mis-guided learning,”
Econometrica , 86, 1159–1214.
Heller, Y. (2015): “Three steps ahead,”
Theoretical Economics , 10, 203–241.
Heller, Y. and E. Winter (2016): “Rule rationality,”
International Economic Review ,57, 997–1026. 35—— (2020): “Biased-belief equilibrium,”
American Economic Journal: Microeconomics ,12, 1–40.
Jehiel, P. (2005): “Analogy-based expectation equilibrium,” Journal of Economic Theory, 123, 81–104.
——— (2018): “Investment strategy and selection bias: An equilibrium perspective on overoptimism,” American Economic Review, 108, 1582–97.
Levy, G., R. Razin, and A. Young (2020): “Misspecified politics and the recurrence of populism,” Working Paper.
Molavi, P. (2019): “Macroeconomics with learning and misspecification: A general theory and applications,” Working Paper.
Nyarko, Y. (1991): “Learning in mis-specified models and the possibility of cycles,” Journal of Economic Theory, 55, 416–427.
Olea, J. L. M., P. Ortoleva, M. M. Pai, and A. Prat (2020): “Competing models,” Working Paper.
Robson, A. J. and L. Samuelson (2011): “The evolutionary foundations of preferences,” in Handbook of Social Economics, Elsevier, vol. 1, 221–310.
Schwartzstein, J. and A. Sunderam (2021): “Using models to persuade,” American Economic Review, forthcoming.
Vives, X. (1988): “Aggregation of information in large Cournot markets,” Econometrica, 851–876.
Appendix
A Proofs
A.1 Proof of Proposition 1
Proof.
In any approachable EZ or approachable EZ-SC, let $(\hat a_A, \hat a_B, \hat F) \in \operatorname{supp}(\mu_A)$ and note that $(a_{AA}, a_{BA}, F^\bullet) \in \Theta_A$ since $\Theta_A$ is correctly specified. Both $(\hat a_A, \hat a_B, \hat F)$ and $(a_{AA}, a_{BA}, F^\bullet)$ solve the weighted minimization problem, the former because it is in the support of $\mu_A$, the latter because it attains the lowest minimization objective of 0. By strong identification, the best-response function under belief $\mu_A$ is the same as that of someone who knows the game is the decision problem $F^\bullet$. Therefore, adherents of $\Theta_A$ obtain the highest possible expected payoffs in $F^\bullet$, so $\Theta_A$ has weakly higher fitness than $\Theta_B$ in the approachable EZ or EZ-SC.
A.2 Proof of Proposition 2
Proof.
Let two singleton theories $\Theta_A, \Theta_B$ be given. By way of contradiction, suppose they exhibit stability reversal. Let $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0, 1), \lambda = 0, (a))$ be any EZ-SC where $\Theta_B$ is resident. By the definition of EZ-SC, $Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1, 0), \lambda = 0, (a))$ is also an EZ-SC where $\Theta_A$ is resident. Let $u_{g,g'}$ be theory $\Theta_g$'s conditional fitness against group $g'$ in the EZ-SC $Z'$. Part (i) of the definition of stability reversal requires that $u_{AA} > u_{BA}$ and $u_{AB} > u_{BB}$. These conditional fitness levels remain the same in $Z$. This means the fitness of $\Theta_A$ is strictly higher than that of $\Theta_B$ in $Z$, a contradiction.
A.3 Proof of Example 1
Proof.
Define $b^*(a_i, a_{-i}) := b^\bullet + \frac{m}{a_i + a_{-i}}$. It is clear that $D_{KL}(F^\bullet(a_i, a_{-i}) \,\|\, \hat F(a_i, a_{-i}; b^*(a_i, a_{-i}), m)) = 0$, while this KL divergence is strictly positive for any other choice of $b$.
In every EZ-SC with $\lambda = 0$ and $p = (1, 0)$, we must have $a_{AA} = a_{AB} = 1$. If $a_{BA} = 2$, then the adherents of $\Theta_B$ infer $b^*(1, 2) = b^\bullet + \frac{m}{3}$. With this inference, the biased agents expect $1 \cdot (2(b^\bullet + \frac{m}{3}) - m) = 2b^\bullet - \frac{m}{3}$ from playing 1 against rival investment 1, and expect $2 \cdot (3(b^\bullet + \frac{m}{3}) - m) - c = 6b^\bullet - c$ from playing 2 against rival investment 1. Since $4b^\bullet + \frac{m}{3} - c > 0$, there is an EZ-SC in which $a_{BA} = 2$ and $\mu_B$ puts probability 1 on $b^\bullet + \frac{m}{3}$. It is impossible to have $a_{BA} = 1$ in EZ-SC. This is because $b^*(1, 1) > b^*(1, 2)$, and under the inference $b^*(1, 2)$ we already have that the best response to 1 is 2, so the same also holds under any higher belief about complementarity. Also, we have $a_{BB} = 2$, since 2 must best respond to both 1 and 2. So in every such EZ-SC, $\Theta_A$'s conditional fitness against group A is $2b^\bullet$ and $\Theta_B$'s conditional fitness against group A is $6b^\bullet - c$, with $2b^\bullet > 6b^\bullet - c$ by Condition 1. Also, $\Theta_A$'s conditional fitness against group B is $3b^\bullet$, while $\Theta_B$'s conditional fitness against group B is $8b^\bullet - c$. Again, $3b^\bullet > 8b^\bullet - c$ by Condition 1.
Next, we show $\Theta_B$ has strictly higher fitness than $\Theta_A$ in every EZ-SC with $\lambda = 0$, $p_B = 1$. There is no EZ-SC with $a_{BB} = 1$. This is because $b^*(1, 1) = b^\bullet + \frac{m}{2}$. As discussed before, under this inference the best response to 1 is 2, not 1. Now suppose $a_{BB} = 2$. Then $\mu_B$ puts probability 1 on $b^*(2, 2) = b^\bullet + \frac{m}{4}$. With this inference, the biased agents expect $1 \cdot (3(b^\bullet + \frac{m}{4}) - m) = 3b^\bullet - \frac{m}{4}$ from playing 1 against rival investment 2, and expect $2 \cdot (4(b^\bullet + \frac{m}{4}) - m) - c = 8b^\bullet - c$ from playing 2 against rival investment 2. We have $5b^\bullet + \frac{m}{4} - c > 0$, so 2 best responds to 2 under this inference. As before, $a_{AA} = a_{AB} = 1$. We conclude the unique EZ-SC behavior is $(a_{AA}, a_{AB}, a_{BA}, a_{BB}) = (1, 1, 1, 2)$. To see that $a_{BA} = 1$, note that under the inference $b^\bullet + \frac{m}{4}$ the biased agents expect $1 \cdot (2(b^\bullet + \frac{m}{4}) - m) = 2b^\bullet - \frac{m}{2}$ from playing 1 against rival investment 1, and expect $2 \cdot (3(b^\bullet + \frac{m}{4}) - m) - c = 6b^\bullet - \frac{m}{2} - c$ from playing 2 against rival investment 1. We have $4b^\bullet - c < 0$, so 1 is the best response to 1. In the unique EZ-SC with $\lambda = 0$ and $p = (0, 1)$, the fitness of $\Theta_A$ is $2b^\bullet$ and the fitness of $\Theta_B$ is $8b^\bullet - c$, where $8b^\bullet - c > 2b^\bullet$ by Condition 1.
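The payoff comparisons in this proof are easy to check numerically. The following is a minimal sketch, assuming the payoff forms that the proof's arithmetic suggests: a biased agent with belief $b$ expects $a_i \cdot (b(a_i + a_{-i}) - m)$, the objective payoff is $a_i \cdot b^\bullet (a_i + a_{-i})$, and investment 2 carries an extra cost $c$. The function names and the parameter values ($b^\bullet = 1$, $m = 6$, $c = 5.5$) are hypothetical and chosen only so that the inequalities used in the proof hold.

```python
# Sketch of the payoff arithmetic in the proof of Example 1.
# ASSUMPTION: the payoff forms are inferred from the proof's arithmetic,
# and all names and parameter values here are hypothetical illustrations.

def inferred_b(b_true, m, a_i, a_opp):
    """KL-minimizing belief b*(a_i, a_-i) = b_true + m / (a_i + a_-i)."""
    return b_true + m / (a_i + a_opp)

def subjective_payoff(a_i, a_opp, b, m, c):
    """Payoff a biased agent with belief b expects; investment 2 costs c."""
    return a_i * (b * (a_i + a_opp) - m) - (c if a_i == 2 else 0.0)

def objective_payoff(a_i, a_opp, b_true, c):
    """Objective expected payoff (fitness) from investment a_i against a_opp."""
    return a_i * b_true * (a_i + a_opp) - (c if a_i == 2 else 0.0)

def best_response(a_opp, b, m, c):
    """Subjectively optimal investment in {1, 2} against a_opp under belief b."""
    return max((1, 2), key=lambda a: subjective_payoff(a, a_opp, b, m, c))

if __name__ == "__main__":
    b_true, m, c = 1.0, 6.0, 5.5  # hypothetical values

    # Theta_A resident: group B only observes matches against group A.
    b_vs_A = inferred_b(b_true, m, 2, 1)
    print("B's best response to 1 when A is resident:",
          best_response(1, b_vs_A, m, c))

    # Theta_B resident: group B only observes matches within group B.
    b_vs_B = inferred_b(b_true, m, 2, 2)
    print("B's best response to 1 when B is resident:",
          best_response(1, b_vs_B, m, c))
    print("B's best response to 2 when B is resident:",
          best_response(2, b_vs_B, m, c))
```

With these hypothetical values the biased agents' inference, and hence their best response to investment 1, changes with the identity of the resident group, which is the mechanism behind the stability reversal in the example.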
A.4 Proof of Proposition 3
Proof.
To show the first claim, by way of contradiction, suppose $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1, 0), \lambda = 0, (a_{AA}, a_{AB}, a_{BA}, a_{BB}))$ is an EZ-SC, and $\tilde Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0, 1), \lambda = 0, (\tilde a_{AA}, \tilde a_{AB}, \tilde a_{BA}, \tilde a_{BB}))$ is another EZ-SC where the adherents of $\Theta_B$ hold the same belief $\mu_B$ (group A's belief cannot change as $\Theta_A$ is the correctly specified singleton theory). By the optimality of behavior in $Z$, $a_{BA}$ best responds to $a_{AB}$ under the belief $\mu_B$, and $a_{AB}$ best responds to $a_{BA}$ under the belief $\mu_A$, therefore $\tilde Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0, 1), \lambda = 0, (\tilde a_{AA}, a_{AB}, a_{BA}, \tilde a_{BB}))$ is another EZ-SC. This holds because the distributions of observations for the adherents of $\Theta_B$ are identical in $\tilde Z$ and $\tilde Z'$, since they only face data generated from the profile $(\tilde a_{BB}, \tilde a_{BB})$. At the same time, since $\tilde a_{BB}$ best responds to itself under the belief $\mu_B$, we have that $Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1, 0), \lambda = 0, (a_{AA}, a_{AB}, a_{BA}, \tilde a_{BB}))$ is an EZ-SC. Part (i) of the definition of stability reversal applied to $Z'$ requires that $U^\bullet(a_{AB}, a_{BA}) > U^\bullet(\tilde a_{BB}, \tilde a_{BB})$ (where $U^\bullet$ is the objective expected payoff), but part (ii) of the same definition applied to $\tilde Z'$ requires $U^\bullet(\tilde a_{BB}, \tilde a_{BB}) \ge U^\bullet(a_{AB}, a_{BA})$, a contradiction.
To show the second claim, by way of contradiction suppose $\Theta_B$ is strategically independent and $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (0, 1), \lambda = 0, (a_{AA}, a_{AB}, a_{BA}, a_{BB}))$ is an EZ-SC. We can write $\Theta_B = \mathbb{A}^2 \times \mathcal{F}_B$. By definition of EZ-SC, $\mu_B$ can be written as $\delta_{a_{AB}} \times \delta_{a_{BB}} \times \mu_{\mathcal{F}_B}$ where $\mu_{\mathcal{F}_B} \in \Delta(\mathcal{F}_B)$. By strategic independence, the adherents of $\Theta_B$ find it optimal to play $a_{BB}$ against any opponent strategy under the belief that $F$ is drawn from $\mu_{\mathcal{F}_B}$. So, there exists another EZ-SC of the form $Z' = (\Theta_A, \Theta_B, \mu_A, \delta_{a'_{AB}} \times \delta_{a_{BB}} \times \mu_{\mathcal{F}_B}, p = (0, 1), \lambda = 0, (a_{AA}, a'_{AB}, a_{BB}, a_{BB}))$, where $a'_{AB}$ is an objective best response to $a_{BB}$. The belief $\mu_{\mathcal{F}_B}$ is sustained because in both $Z$ and $Z'$, the adherents of $\Theta_B$ have the same data: from the strategy profile $(a_{BB}, a_{BB})$. In $Z'$, $\Theta_A$'s fitness is $U^\bullet(a'_{AB}, a_{BB})$ and $\Theta_B$'s fitness is $U^\bullet(a_{BB}, a_{BB})$. We have $U^\bullet(a'_{AB}, a_{BB}) \ge U^\bullet(a_{BB}, a_{BB})$ since $a'_{AB}$ is an objective best response to $a_{BB}$, contradicting the definition of stability reversal.
A.5 Proof of Proposition 4
Proof.
Let $\lambda \in [0, 1]$ be given and let $Z = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1, 0), \lambda, (a))$ be an EZ-SC. Since $\Theta_A, \Theta_B$ are singleton theories, $Z' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1, 0), \lambda = 0, (a))$ and $Z'' = (\Theta_A, \Theta_B, \mu_A, \mu_B, p = (1, 0), \lambda = 1, (a))$ are also EZ-SCs. Furthermore, they are all approachable since the same beliefs and behavior are sustained as EZ-SCs with any population proportions. Let $u_{g,g'}$ represent theory $\Theta_g$'s conditional fitness against group $g'$ in each of these three EZ-SCs. From the hypothesis of the proposition, $u_{A,A} \ge u_{B,A}$ and $u_{A,A} \ge u_{B,B}$. This means the fitness of $\Theta_A$ in $Z$, which is $u_{A,A}$, is weakly larger than the fitness of $\Theta_B$ in $Z$, which is $\lambda u_{B,B} + (1 - \lambda) u_{B,A}$. This shows $\Theta_A$ has weakly higher fitness than $\Theta_B$ in every approachable EZ-SC with $\lambda$ and $p = (1, 0)$. Moreover, at least one approachable EZ-SC exists with this $\lambda$, for at least one approachable EZ-SC exists when $\lambda = 0$, and the same equilibrium belief and behavior also constitute an EZ-SC for any other assortativity.
A.6 Proof of Example 2
Proof.
Let KL , := 0 . · ln . . + 0 . · ln . . ≈ . , KL , := 0 . · ln . . + 0 . · ln . . ≈ . , and KL , := 0 . · ln . . + 0 . · ln . . ≈ . λ h be the unique solution to(1 − λ ) KL , − λ ( KL , − KL , ) = 0 , so λ h ≈ . . We show for any λ ∈ [0 , λ h ), there exists a unique EZ-SC Z = (Θ A , Θ B , µ A , µ B , p =(1 , , λ, ( a )), and that this EZ-SC has µ B putting probability 1 on F H , a AA = a , a AB = a ,a BA = a , a BB = a . First, we may verify that under F H , a best responds to both a and a . Also, the KL divergence of F H is λ · KL , while that of F L is λ · KL , + (1 − λ ) · KL , . Since λ < λ h , we see that F H has strictly lower KL divergence. Finally, to check that there areno other EZ-SCs, note we must have a AA = a , a AB = a , a BA = a in every EZ-SC. In anEZ-SC where a BB puts probability q ∈ [0 ,
1] on a , the KL divergence of F H is λp · KL , andthe KL divergence of F L is λp · KL , + (1 − λ ) · KL , . We have λq · KL , +(1 − λ ) · KL , − λq · KL , = λq · ( KL , − KL . )+(1 − λ ) KL , ≥ (1 − λ ) KL , − λ ( KL , − KL , ) . Since λ < λ h , this is strictly positive. Therefore we must have µ B put probability 1 on F H , which in turn implies q = 1 . For each λ ∈ [0 , λ h ), the beliefs and behavior in the unique EZ-SC discussed above alsoconstitute an EZ-SC for a small enough p B > . So, the unique EZ-SC with p B = 0 isapproachable.When Θ A is dominant, the equilibrium fitness of Θ A is always 0.25 for every λ . Theequilibrium fitness of Θ B , as a function of λ , is 0 . λ + 0 . − λ ) . Let λ l solve 0 .
0 . λ + 0 . − λ ) , that is λ l = 0 . . This shows Θ A is evolutionarily fragile with strategic certainty against Θ B for λ ∈ ( λ l , λ h ) , and it is evolutionarily stable with strategic certainty against Θ B for λ = 0.
Now suppose λ = 1 . If there is an EZ-SC with p A = 1 where a BB plays a with positive probability, then µ B must put probability 1 on F L , since KL , < KL , . This is a contradiction, since a does not best respond to itself under F L . So the unique EZ-SC involves a AA = a , a AB = a , a BA = a , a BB = a . It is easy to check this EZ-SC is approachable. In the EZ-SC, the fitness of Θ A is 0.25, and the fitness of Θ B is 0.2. This shows Θ A is evolutionarily stable with strategic certainty against Θ B for λ = 1 .
A.7 Proof of Lemma 1
Proof.
First note the minimization objective may be written as
\[ W(a, m_g, F) := m_g K_g(a_{g,g}, a_{g,g}; F) + (1 - m_g) K_g(a_{g,-g}, a_{-g,g}; F), \]
a continuous function of $(a, m_g, F)$ by Assumption 4. Suppose we have a sequence $(a^{(n)}, m_g^{(n)}) \to (a^*, m_g^*) \in \mathbb{A}^4 \times [0, 1]$ and let $F^{(n)} \in \Theta_g^*(a^{(n)}, m_g^{(n)})$ for each $n$, with $F^{(n)} \to F^* \in \Theta_g$. For any other $\hat F \in \Theta_g$, note that $W(a^*, m_g^*, \hat F) = \lim_{n \to \infty} W(a^{(n)}, m_g^{(n)}, \hat F)$ by continuity. But also by continuity, $W(a^*, m_g^*, F^*) = \lim_{n \to \infty} W(a^{(n)}, m_g^{(n)}, F^{(n)})$ and $W(a^{(n)}, m_g^{(n)}, F^{(n)}) \le W(a^{(n)}, m_g^{(n)}, \hat F)$ for every $n$. It therefore follows $W(a^*, m_g^*, F^*) \le W(a^*, m_g^*, \hat F)$. Hence $F^*$ minimizes $W(a^*, m_g^*, \cdot)$ over $\Theta_g$, that is, $F^* \in \Theta_g^*(a^*, m_g^*)$.
A.8 Proof of Proposition 5
Proof.
Consider the correspondence $\Gamma : \mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B) \rightrightarrows \mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B)$,
\[ \Gamma(a_{AA}, a_{AB}, a_{BA}, a_{BB}, \mu_A, \mu_B) := (\mathrm{BR}(a_{AA}, \mu_A), \mathrm{BR}(a_{BA}, \mu_A), \mathrm{BR}(a_{AB}, \mu_B), \mathrm{BR}(a_{BB}, \mu_B), \Delta(\Theta_A^*(a)), \Delta(\Theta_B^*(a))), \]
where $\mathrm{BR}(a_{-i}, \mu_g) := \arg\max_{\hat a_i \in \mathbb{A}} U_g(\hat a_i, a_{-i}; \mu_g)$ and, for each $g \in \{A, B\}$, the correspondence $\Theta_g^*$ is defined with $m_g = \lambda + (1 - \lambda) p_g$, $m_{-g} = 1 - m_g$. It is clear that fixed points of $\Gamma$ are EZ-SC.
We apply the Kakutani-Fan-Glicksberg theorem (see, e.g., Corollary 17.55 in Aliprantis and Border (2006)). By Assumptions 1 and 5, $\mathbb{A}$ is a compact and convex metric space, and each $\Theta_g$ is a compact metric space, so it follows the domain of $\Gamma$ is a nonempty, compact and convex metric space. We need only verify that $\Gamma$ has closed graph, non-empty values, and convex values.
To see that $\Gamma$ has closed graph, the previous lemma shows the upper hemicontinuity of $\Theta_A^*(a)$ and $\Theta_B^*(a)$ in $a$, and Theorem 17.13 of Aliprantis and Border (2006) then implies $\Delta(\Theta_A^*(a))$ and $\Delta(\Theta_B^*(a))$ are also upper hemicontinuous in $a$. By a standard argument, the continuity of $U_A, U_B$ supposed in Assumption 2 implies that the best-response correspondences $\mathrm{BR}(a_{AA}, \mu_A)$, $\mathrm{BR}(a_{BA}, \mu_A)$, $\mathrm{BR}(a_{AB}, \mu_B)$, $\mathrm{BR}(a_{BB}, \mu_B)$ have closed graphs.
To see that $\Gamma$ is non-empty valued, recall that each $\hat a_i \mapsto U_g(\hat a_i, a_{-i}; \mu_g)$ is a continuous function on a compact domain, so it must attain a maximum on $\mathbb{A}$. Similarly, the objective of the minimization problem that defines each $\Theta_g^*(a)$ is a continuous function of $F$ over a compact domain of possible $F$'s, so it attains a minimum. Thus each $\Delta(\Theta_g^*(a))$ is the set of distributions over a non-empty set.
To see that $\Gamma$ is convex valued, clearly $\Delta(\Theta_A^*(a))$ and $\Delta(\Theta_B^*(a))$ are convex valued by definition. Also, $\hat a_i \mapsto U_A(\hat a_i, a_{AA}; \mu_A)$ is quasiconcave by Assumption 5. That means if $a'_i, a''_i \in \mathrm{BR}(a_{AA}, \mu_A)$, then for any convex combination $\tilde a_i$ of $a'_i, a''_i$, we have $U_A(\tilde a_i, a_{AA}; \mu_A) \ge \min(U_A(a'_i, a_{AA}; \mu_A), U_A(a''_i, a_{AA}; \mu_A)) = \max_{\hat a_i \in \mathbb{A}} U_A(\hat a_i, a_{AA}; \mu_A)$. Therefore, $\mathrm{BR}(a_{AA}, \mu_A)$ is convex. For similar reasons, $\mathrm{BR}(a_{BA}, \mu_A)$, $\mathrm{BR}(a_{AB}, \mu_B)$, $\mathrm{BR}(a_{BB}, \mu_B)$ are convex.
A.9 Proof of Proposition 6
Proof.
Since $\mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B)$ is compact by Assumption 1, we need only show the following. Take any sequences $(p_B^{(k)})_{k \ge 1}$ and $(a^{(k)}, \mu^{(k)})_{k \ge 1} = (a_{AA}^{(k)}, a_{AB}^{(k)}, a_{BA}^{(k)}, a_{BB}^{(k)}, \mu_A^{(k)}, \mu_B^{(k)})_{k \ge 1}$ such that for every $k$, $(a^{(k)}, \mu^{(k)})$ is an EZ-SC with $p = (1 - p_B^{(k)}, p_B^{(k)})$, and such that $p_B^{(k)} \to p_B^*$ and $(a^{(k)}, \mu^{(k)}) \to (a^*, \mu^*)$. Then $(a^*, \mu^*)$ is an EZ-SC with $p = (1 - p_B^*, p_B^*)$.
We first show for all $g, g' \in \{A, B\}$, $a^*_{g,g'}$ is optimal against $a^*_{g',g}$ under the belief $\mu_g^*$. Assortativity does not matter here, since optimality applies within all type match-ups. By Assumption 2, $U_g(a_i, a_{-i}; F)$ is continuous, so by a property of convergence in distribution, $U_g(a^{(k)}_{g,g'}, a^{(k)}_{g',g}; \mu_g^{(k)}) \to U_g(a^*_{g,g'}, a^*_{g',g}; \mu_g^*)$. For any other $\hat a_i \in \mathbb{A}$, $U_g(\hat a_i, a^{(k)}_{g',g}; \mu_g^{(k)}) \to U_g(\hat a_i, a^*_{g',g}; \mu_g^*)$ and for every $k$, $U_g(a^{(k)}_{g,g'}, a^{(k)}_{g',g}; \mu_g^{(k)}) \ge U_g(\hat a_i, a^{(k)}_{g',g}; \mu_g^{(k)})$. Therefore $a^*_{g,g'}$ best responds to $a^*_{g',g}$ under belief $\mu_g^*$.
Next, we show models in the support of $\mu_g^*$ minimize weighted KL divergence for group $g$. First consider the correspondence $H : \mathbb{A}^4 \times [0, 1] \rightrightarrows \Theta_g$ where $H(a, p_g) := \Theta_g^*(a, \lambda + (1 - \lambda) p_g)$. Then $H$ is upper hemicontinuous by Lemma 1. Since $H(a, p_g)$ represents the minimizers of a continuous function on a compact domain, it is non-empty and closed. By Theorem 17.13 of Aliprantis and Border (2006), the correspondence $\tilde H : \mathbb{A}^4 \times [0, 1] \rightrightarrows \Delta(\Theta_g)$ defined so that $\tilde H(a, p_g) := \Delta(H(a, p_g))$ is also upper hemicontinuous. For every $k$, $\mu_g^{(k)} \in \tilde H(a^{(k)}, p_g^{(k)})$, and $\mu_g^{(k)} \to \mu_g^*$, $a^{(k)} \to a^*$, $p_g^{(k)} \to p_g^*$. Therefore, $\mu_g^* \in \tilde H(a^*, p_g^*)$, that is to say $\mu_g^*$ is supported on the minimizers of weighted KL divergence.
A.10 Proof of Lemma 2
Proof.
For $i \neq j$, rewrite
\[ s_i = \left( \omega + \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z \right) + \frac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} \eta_i \quad \text{and} \quad s_j = \left( \omega + \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z \right) + \frac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} \eta_j. \]
Note that $\omega + \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z$ has a normal distribution with mean 0 and variance $\sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2$. The posterior distribution of $\omega + \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z$ given $s_i$ is therefore normal with a mean of
\[ \frac{1 / \left( \frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right)}{1 / \left( \sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right) + 1 / \left( \frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right)} \, s_i \]
and a variance of
\[ \frac{1}{1 / \left( \sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right) + 1 / \left( \frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right)}. \]
Since $\eta_j$ is mean-zero and independent of $i$'s signal, the posterior distribution of $s_j \mid s_i$ under the correlation parameter $\kappa$ is normal with the same mean and with a variance equal to the variance above plus $\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2$. We thus define
\[ \psi(\kappa) := \frac{1 / \left( \frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right)}{1 / \left( \sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right) + 1 / \left( \frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right)} \]
for $\kappa \in [0, 1)$, and $\psi(1) := 1$. To see that $\psi(\kappa)$ is strictly increasing in $\kappa$, we have
\[ \frac{1}{\psi(\kappa)} = 1 + \frac{\frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2}{\sigma_\omega^2 + \frac{\kappa^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2} = 1 + \frac{(1-\kappa)^2 \sigma_\epsilon^2}{(\kappa^2 + (1-\kappa)^2) \sigma_\omega^2 + \kappa^2 \sigma_\epsilon^2} \]
and then we can verify that the second term is decreasing in $\kappa$. As $\kappa \to 1$, the term $1 / \left( \frac{(1-\kappa)^2}{\kappa^2 + (1-\kappa)^2} \sigma_\epsilon^2 \right)$ tends to $\infty$, so $\psi(\kappa)$ approaches 1. We also verify that $\psi(0) = \frac{1/\sigma_\epsilon^2}{(1/\sigma_\omega^2) + (1/\sigma_\epsilon^2)} > 0$.
Finally, for any $\kappa \in [0, 1]$, $\frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z + \frac{1-\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} \eta_i$ has variance $\sigma_\epsilon^2$ and mean 0, so $E_\kappa[\omega \mid s_i] = \frac{1/\sigma_\epsilon^2}{1/\sigma_\epsilon^2 + 1/\sigma_\omega^2} s_i$. We then define $\gamma$ as the strictly positive constant $\frac{1/\sigma_\epsilon^2}{1/\sigma_\epsilon^2 + 1/\sigma_\omega^2}$.
A.11 Proof of Lemma 3
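The properties of $\psi$ used in this proof can be sanity-checked numerically. The sketch below is illustrative only: the function names and the unit variances are hypothetical choices, and the signal structure (a common component $\omega + \frac{\kappa}{\sqrt{\kappa^2 + (1-\kappa)^2}} z$ plus an idiosyncratic component) is the one written out in the proof. It computes $\psi(\kappa)$ from the precision-weighting formula and cross-checks it against the regression coefficient $\mathrm{Cov}(s_i, s_j) / \mathrm{Var}(s_i)$, which should coincide for jointly normal signals.

```python
# Sketch: numerical check of psi(kappa) from the proof of Lemma 2.
# ASSUMPTION: function names and the variances used below are hypothetical.

def psi(kappa, var_omega, var_eps):
    """Coefficient on s_i in E[s_j | s_i], via the precision-weighting formula."""
    if kappa == 1.0:
        return 1.0  # psi(1) := 1 by definition
    d = kappa ** 2 + (1 - kappa) ** 2
    common_var = var_omega + kappa ** 2 / d * var_eps  # variance of the shared part
    idio_var = (1 - kappa) ** 2 / d * var_eps          # variance of the private part
    return (1 / idio_var) / (1 / common_var + 1 / idio_var)

def psi_cov(kappa, var_omega, var_eps):
    """Same coefficient computed as Cov(s_i, s_j) / Var(s_i)."""
    d = kappa ** 2 + (1 - kappa) ** 2
    return (var_omega + kappa ** 2 / d * var_eps) / (var_omega + var_eps)

if __name__ == "__main__":
    grid = [k / 100 for k in range(101)]
    values = [psi(k, 1.0, 1.0) for k in grid]
    print("psi(0) =", values[0])
    print("strictly increasing on the grid:",
          all(x < y for x, y in zip(values, values[1:])))
    print("max gap between the two formulas:",
          max(abs(psi(k, 1.0, 1.0) - psi_cov(k, 1.0, 1.0)) for k in grid))
```

The second formula makes the monotonicity transparent: the total variance of each signal does not depend on $\kappa$, while the variance of the shared component increases with $\kappa$.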
Proof.
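Player $i$'s conditional expected utility given signal $s_i$ is
\[ \alpha_i s_i \cdot E_\kappa \left[ E_{r \sim \mathrm{marg}_r(\mu)} \left[ \omega - \tfrac{1}{2} r \alpha_i s_i - \tfrac{1}{2} r \alpha_{-i} s_{-i} + \zeta \right] \mid s_i \right] - \tfrac{1}{2} (\alpha_i s_i)^2. \]
By linearity, expectation over $r$ is equivalent to evaluating the inner expectation with $r = \hat r$, which gives
\[ \alpha_i s_i \cdot E_\kappa \left[ \omega - \tfrac{1}{2} \hat r \alpha_i s_i - \tfrac{1}{2} \hat r \alpha_{-i} s_{-i} + \zeta \mid s_i \right] - \tfrac{1}{2} (\alpha_i s_i)^2 \]
\[ = \alpha_i s_i \cdot \left( \gamma s_i - \tfrac{1}{2} \hat r \alpha_i s_i - \tfrac{1}{2} \hat r \psi(\kappa) s_i \alpha_{-i} \right) - \tfrac{1}{2} (\alpha_i s_i)^2 \]
\[ = s_i^2 \cdot \left( \alpha_i \gamma - \tfrac{1}{2} \hat r \alpha_i^2 - \tfrac{1}{2} \hat r \psi(\kappa) \alpha_i \alpha_{-i} - \tfrac{1}{2} \alpha_i^2 \right). \]
The signal enters this expression only through $s_i^2$, and the second moment of $s_i$ is the same for all values of $\kappa$. Therefore the expected utility is
\[ E[s_i^2] \cdot \left( \alpha_i \gamma - \tfrac{1}{2} \hat r \alpha_i^2 - \tfrac{1}{2} \hat r \psi(\kappa) \alpha_i \alpha_{-i} - \tfrac{1}{2} \alpha_i^2 \right). \]
The expression for $\alpha_i^{BR}(\alpha_{-i}; \kappa, r)$ follows from simple algebra, noting that $E[s_i^2] > 0$ and the second derivative in $\alpha_i$ for the term in the parenthesis is $-\hat r - 1 < 0$.
To see that the said linear strategy is optimal among all strategies, suppose $i$ instead chooses any $q_i$ after $s_i$. By the above arguments, the objective to maximize is
\[ q_i \cdot \left( \gamma s_i - \tfrac{1}{2} \hat r q_i - \tfrac{1}{2} \hat r \psi(\kappa) s_i \alpha_{-i} \right) - \tfrac{1}{2} q_i^2. \]
This objective is a strictly concave function in $q_i$, as $-\hat r - 1 < 0$. The first-order condition finds the maximizer $q_i^* = \alpha_i^{BR}(\alpha_{-i}; \kappa, \hat r) \cdot s_i$. Therefore, the linear strategy also maximizes interim expected utility after every signal $s_i$, and so it cannot be improved on by any other strategy.
A.12 Proof of Lemma 4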
Proof.
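Note that $\frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)} \ge 0$ and
\[ \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)} = 1 + \frac{\alpha_{-i} (\psi(\kappa^\bullet) - \psi(\kappa))}{\alpha_i + \alpha_{-i} \psi(\kappa)} \le 1 + \frac{1}{\psi(0)} \]
(recalling $\psi(0) > 0$). Let $L_1 = r^\bullet \cdot (1 + \frac{1}{\psi(0)})$. When $\bar M_r \ge L_1$, we always have $r_i^{INF}(\alpha_i, \alpha_{-i}; \kappa^\bullet, \kappa, r^\bullet) \le \bar M_r$ for all $\alpha_i, \alpha_{-i} \ge 0$ and $\kappa^\bullet, \kappa \in [0, 1]$.
Conditional on the signal $s_i$, the distribution of market price under the model $F_{\hat r, \kappa, \hat\sigma_\zeta}$ is normal with a mean of
\[ E[\omega \mid s_i] - \tfrac{1}{2} \hat r \alpha_i s_i - \tfrac{1}{2} \hat r \alpha_{-i} \cdot E_\kappa[s_{-i} \mid s_i] = \gamma s_i - \tfrac{1}{2} \hat r \alpha_i s_i - \tfrac{1}{2} \hat r \alpha_{-i} \psi(\kappa) s_i, \]
while the distribution of market price under the model $F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}$ is normal with a mean of
\[ E[\omega \mid s_i] - \tfrac{1}{2} r^\bullet \alpha_i s_i - \tfrac{1}{2} r^\bullet \alpha_{-i} \cdot E_{\kappa^\bullet}[s_{-i} \mid s_i] = \gamma s_i - \tfrac{1}{2} r^\bullet \alpha_i s_i - \tfrac{1}{2} r^\bullet \alpha_{-i} \psi(\kappa^\bullet) s_i. \]
Matching coefficients on $s_i$, we find that if $\hat r = r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}$, then these means match after every $s_i$. On the other hand, for any other value of $\hat r$, these means will not match for any $s_i \neq 0$ and thus $D_{KL}(F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}(\alpha_i, \alpha_{-i}) \,\|\, F_{\hat r, \kappa, \hat\sigma_\zeta}(\alpha_i, \alpha_{-i})) > 0$. Hence any model with zero KL divergence must have $\hat r = r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}$.
Let $L_2 = \max_{\kappa \in [0,1]} \left\{ \mathrm{Var}_\kappa[\omega \mid s_i] + \mathrm{Var}_\kappa \left[ \tfrac{1}{2} r^\bullet \cdot (1 + \tfrac{1}{\psi(0)}) B_\alpha \cdot s_{-i} \mid s_i \right] \right\}$. This maximum exists and is finite, since the expression is a continuous function of $\kappa$ on the compact domain $[0, 1]$. Also, let $L_3 = \max_{\kappa \in [0,1]} \left\{ \mathrm{Var}_\kappa[\omega \mid s_i] + \mathrm{Var}_\kappa \left[ \tfrac{1}{2} r^\bullet B_\alpha \cdot s_{-i} \mid s_i \right] \right\}$, where the maximum exists for the same reason. Conditional on the signal $s_i$, the variance of market price under the model $F_{r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}, \kappa, \hat\sigma_\zeta}$ is
\[ \mathrm{Var}_\kappa \left[ \omega - \tfrac{1}{2} r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)} \alpha_{-i} s_{-i} \mid s_i \right] + \hat\sigma_\zeta^2. \]
Since $\omega$ and $s_{-i}$ are positively correlated given $s_i$, and using the fact $r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)} \le r^\bullet \cdot (1 + \frac{1}{\psi(0)})$ and $\alpha_{-i} \le B_\alpha$, this variance is no larger than
\[ \mathrm{Var}_\kappa[\omega \mid s_i] + \mathrm{Var}_\kappa \left[ \tfrac{1}{2} r^\bullet \cdot (1 + \tfrac{1}{\psi(0)}) B_\alpha \cdot s_{-i} \mid s_i \right] + \hat\sigma_\zeta^2 \le L_2 + \hat\sigma_\zeta^2. \]
On the other hand, the variance of market price under the model $F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}$ is
\[ \mathrm{Var}_{\kappa^\bullet} \left[ \omega - \tfrac{1}{2} r^\bullet \alpha_{-i} s_{-i} \mid s_i \right] + (\sigma_\zeta^\bullet)^2 \le \mathrm{Var}_{\kappa^\bullet}[\omega \mid s_i] + \mathrm{Var}_{\kappa^\bullet} \left[ \tfrac{1}{2} r^\bullet B_\alpha \cdot s_{-i} \mid s_i \right] + (\sigma_\zeta^\bullet)^2 \le L_3 + (\sigma_\zeta^\bullet)^2. \]
At the same time, since $(\sigma_\zeta^\bullet)^2 \ge L_2$, this conditional variance is at least $L_2$. Among values of $\hat\sigma_\zeta^2 \in [0, \bar M_{\sigma_\zeta^2}]$, there exists exactly one such that the conditional variance under $F_{r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}, \kappa, \hat\sigma_\zeta}$ is the same as that under $F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}$, since we have let $\bar M_{\sigma_\zeta^2} \ge (\sigma_\zeta^\bullet)^2 + L_3$. Thus there is one choice of $\hat\sigma_\zeta^2 \in [0, \bar M_{\sigma_\zeta^2}]$ such that $D_{KL}(F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}(\alpha_i, \alpha_{-i}) \,\|\, F_{r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}, \kappa, \hat\sigma_\zeta}(\alpha_i, \alpha_{-i})) = 0$. For any other choice $\tilde\sigma_\zeta$, we conclude that $D_{KL}(F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}(\alpha_i, \alpha_{-i}) \,\|\, F_{r^\bullet \frac{\alpha_i + \alpha_{-i} \psi(\kappa^\bullet)}{\alpha_i + \alpha_{-i} \psi(\kappa)}, \kappa, \tilde\sigma_\zeta}(\alpha_i, \alpha_{-i})) > 0$.
A.13 Proof of Lemma 5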
Proof.
Assumption 1 holds as A , Θ A , Θ B are compact due to the finite bounds ¯ M α , ¯ M r , ¯ M σ ζ . Also, from Lemma 3, the expected utility from playing α i against α − i in a model with param-eters (ˆ r, κ, σ ζ ) is E [ s i ] · (cid:16) α i γ − ˆ rα i − ˆ rψ ( κ ) α i α − i − α i (cid:17) . This is a continuous function in( α i , α − i , ˆ r ) and strictly concave in α i . Therefore Assumptions 2 and 5 are satisfied.To see the finiteness and continuity of the K functions, first recall that the KL divergencefrom a true distribution N ( µ , σ ) to a different distribution N ( µ , σ ) is given by ln( σ /σ )+ σ +( µ − µ ) σ − . Under own play α i , opponent play α − i , correlation parameter κ, elasticity ˆ r and price idiosyncratic variance σ ζ , the expected distribution of price after signal s i is −
12 ˆ rα i s i + ( ω −
12 ˆ rα − i s − i | s i , κ ) + ˆ ζ where the first term is not random, the middle term is the conditional distribution of ω − ˆ rα − i s − i given s i , based on the joint distribution of ( ω, s i , s − i ) with correlation parameter κ. The final term is an independent random variable with mean 0, variance σ ζ . The analogous44rue distribution of price is − r • α i s i + ( ω − r • α − i s − i | s i , κ • ) + ζ • where ζ • is an independent random variable with mean 0, variance ( σ • ζ ) . For a fixed κ, wemay find 0 < σ < ¯ σ < ∞ so that the variances of both distributions lie in [ σ , ¯ σ ] for all s i ∈ R , α i , α − i ∈ [0 , ¯ M α ] , ˆ r ∈ [0 , ¯ M r ] . First note that as a consequence of the multivariatenormality, the variances of these two expressions do not change with the realization of s i . The lower bound comes from the fact that Var κ ( ω − ˆ rα − i s − i | s i ) is nonzero for all α − i , ˆ r in the compact domains and it is a continuous function of these two arguments, so it musthave some positive lower bound σ > . For a similar reason, the variance of the middleterm has a upper bound for choices of the parameters α − i , ˆ r in the compact domains, andthe inference about σ ζ is also bounded.The difference in the means of the two distributions is no larger than s i · [ ( ¯ M r + r • ) · ( ¯ M r + r • ) · · ( ψ ( κ ) + ψ ( κ • ))] . Thus consider the function h ( s i ) := ln(¯ σ/σ ) + 12 (¯ σ /σ ) + [ ( ¯ M r + r • ) · ( ¯ M r + r • ) · · ( ψ ( κ ) + ψ ( κ • ))] σ s i − . That is h ( s i ) has the form h ( s i ) = C + C s i for constants C , C . It is absolutely integrableagainst the distribution of s i , and it dominates the KL divergence between the true and ex-pected price distributions at every s i and for any choices of α i , α − i ∈ [0 , ¯ M α ] , ˆ r ∈ [0 , ¯ M r ] , σ ζ ∈ [0 , ¯ M ζ ] . This shows K A , K B are finite, so Assumption 3 holds. 
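The closed-form KL divergence between two normal distributions that this proof recalls, $\ln(\sigma_2 / \sigma_1) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}$, can be verified against direct numerical integration. The sketch below is an illustration only; the function names, the integration grid, and the test parameters are hypothetical choices and not part of the paper.

```python
# Sketch: closed-form KL divergence between normals vs. numerical integration.
# ASSUMPTION: names, grid size and parameter values are hypothetical.
import math

def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL divergence from the true N(mu1, sigma1^2) to the model N(mu2, sigma2^2)."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)

def kl_normal_numeric(mu1, sigma1, mu2, sigma2, n=20001, width=12.0):
    """Trapezoid-rule integral of p(x) * log(p(x) / q(x)) over mu1 +/- width * sigma1."""
    lo = mu1 - width * sigma1
    h = 2 * width * sigma1 / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * h
        z1 = (x - mu1) / sigma1
        z2 = (x - mu2) / sigma2
        p = math.exp(-0.5 * z1 * z1) / (sigma1 * math.sqrt(2 * math.pi))
        # log p(x) - log q(x), written out to avoid underflow in the tails
        log_ratio = math.log(sigma2 / sigma1) - 0.5 * z1 * z1 + 0.5 * z2 * z2
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * p * log_ratio
    return total * h

if __name__ == "__main__":
    args = (0.3, 1.2, -0.5, 0.9)
    print("closed form:", kl_normal(*args))
    print("numerical:  ", kl_normal_numeric(*args))
```

The closed form is quadratic in the difference of means, which is why a dominating function of the form $C_1 + C_2 s_i^2$ suffices in the argument: the mean difference is linear in $s_i$ and the variances are bounded away from zero and infinity.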
Further, since the KL diver-gence is a continuous function of the means and variances of the price distributions, and sincethese mean and variance parameters are continuous functions of α i , α − i , ˆ r, σ ζ , the existenceof the absolutely integrable dominating function h also proves K A , K B (as integrals of KLdivergences across different s i ) are continuous, so Assumption 4 holds. A.14 Proof of Proposition 7
Proof.
We can take $L_1, L_2, L_3$ as given by Lemma 4. Suppose there is an EZ-SC with behavior $\alpha = (\alpha_{AA}, \alpha_{AB}, \alpha_{BA}, \alpha_{BB})$ and beliefs over parameters $\mu_A \in \Delta(\Theta(\kappa^\bullet))$, $\mu_B \in \Delta(\Theta(\kappa))$. By Lemma 4, both $\mu_A$ and $\mu_B$ must be degenerate beliefs that induce zero KL divergence, since both groups match up with group A with probability 1. Furthermore, since $\Theta_A$ is correctly specified, it is easy to see that the model $F_{r^\bullet, \kappa^\bullet, \sigma_\zeta^\bullet}$ generates 0 KL divergence, hence the belief of the adherents of $\Theta_A$ must be degenerate on this correct model.
In terms of behavior, from Lemma 3, $\alpha_i^{BR}(\alpha_{-i}; \kappa, r) \le \gamma$ for all $\alpha_{-i} \ge 0$, $\kappa \in [0, 1]$, $r \ge 0$. Since the upper bound $\bar M_\alpha \ge \gamma$, the adherents of each theory must be best responding (across all linear strategies in $[0, \infty)$) in all matches, given their beliefs about the environment. Using the equilibrium belief of group A, we must have $\alpha_{AA} = \alpha_i^{BR}(\alpha_{AA}; \kappa^\bullet, r^\bullet)$, so $\alpha_{AA} = \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{AA}}{1 + r^\bullet}$. We find the unique solution $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$.
Next we turn to $\alpha_{AB}$, $\alpha_{BA}$, and $\mu_B$. We know $\mu_B$ puts probability 1 on some $r_B$. For adherents of groups A and B to best respond to each other's play and for group B's inference to have 0 KL divergence (when paired with an appropriate choice of $\sigma_\zeta$), we must have $\alpha_{AB} = \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{BA}}{1 + r^\bullet}$, $\alpha_{BA} = \frac{\gamma - \frac{1}{2} r_B \psi(\kappa) \alpha_{AB}}{1 + r_B}$, and $r_B = r^\bullet \frac{\alpha_{BA} + \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)}$ from Lemma 4. We may rearrange the expression for $\alpha_{BA}$ to say $\alpha_{BA} = \gamma - r_B \alpha_{BA} - \frac{1}{2} r_B \psi(\kappa) \alpha_{AB}$.
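Substituting the expression of $r_B$ into this expression of $\alpha_{BA}$, we get
\[ \alpha_{BA} = \gamma - r_B \cdot \left( \alpha_{BA} + \alpha_{AB} \psi(\kappa) - \tfrac{1}{2} \alpha_{AB} \psi(\kappa) \right) \]
\[ = \gamma - r^\bullet \frac{\alpha_{BA} + \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)} \cdot \left( \alpha_{BA} + \alpha_{AB} \psi(\kappa) - \tfrac{1}{2} \alpha_{AB} \psi(\kappa) \right) \]
\[ = \gamma - r^\bullet \alpha_{BA} - r^\bullet \alpha_{AB} \psi(\kappa^\bullet) + \tfrac{1}{2} \psi(\kappa) \alpha_{AB} \frac{r^\bullet \alpha_{BA} + r^\bullet \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)}. \]
Multiply by $\alpha_{BA} + \alpha_{AB} \psi(\kappa)$ on both sides and collect terms by powers of $\alpha$,
\[ (\alpha_{BA})^2 \cdot [-1 - r^\bullet] + (\alpha_{BA} \alpha_{AB}) \cdot \left[ -\psi(\kappa) - \tfrac{1}{2} r^\bullet \psi(\kappa) - r^\bullet \psi(\kappa^\bullet) \right] - (\alpha_{AB})^2 \cdot \left[ \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \psi(\kappa) \right] + \gamma [\alpha_{BA} + \alpha_{AB} \psi(\kappa)] = 0. \]
Consider the following quadratic function in $x$,
\[ H(x) := x^2 [-1 - r^\bullet] + (x \cdot \ell(x)) \cdot \left[ -\psi(\kappa) - \tfrac{1}{2} r^\bullet \psi(\kappa) - r^\bullet \psi(\kappa^\bullet) \right] - (\ell(x))^2 \cdot \left[ \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \psi(\kappa) \right] + \gamma [x + \ell(x) \psi(\kappa)] = 0, \quad (1) \]
where $\ell(x) := \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet}$ is a linear function in $x$. In an EZ-SC, $\alpha_{BA}$ is a root of $H(x)$ in $\left[ 0, \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right]$. To see why, if we were to have $\alpha_{BA} > \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$, then $\alpha_{AB} = 0$. In that case, $r_B = r^\bullet$ and so $\alpha_{BA} = \alpha_i^{BR}(0; \kappa^\bullet, r^\bullet) = \frac{\gamma}{1 + r^\bullet}$. Yet $\frac{\gamma}{1 + r^\bullet} < \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$, a contradiction. Conversely, for any root $x^*$ of $H(x)$ in $\left[ 0, \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right]$, there is an EZ-SC where $\alpha_{BA} = x^*$, $\alpha_{AB} = \ell(x^*) \in [0, \gamma]$, and $r_B = r^\bullet \frac{\alpha_{BA} + \alpha_{AB} \psi(\kappa^\bullet)}{\alpha_{BA} + \alpha_{AB} \psi(\kappa)}$.
We now show $H(x)$ (i) has a unique root in $\left[ 0, \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right]$ when $\kappa = \kappa^\bullet$; (ii) does not have a root at $x = 0$ or $x = \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$; and (iii) the root in the interval is not a double root. Since $H(x)$ is a continuous function of $\kappa$, there must exist some $\underline{\kappa}_1 < \kappa^\bullet < \bar\kappa_1$ so that it continues to have a unique root in $\left[ 0, \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right]$ for all $\kappa \in [\underline{\kappa}_1, \bar\kappa_1] \cap [0, 1]$.
Claim (i) has to do with the fact that if $\kappa = \kappa^\bullet$, then we need $\alpha_{AB} = \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{BA}}{1 + r^\bullet}$ and $\alpha_{BA} = \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{AB}}{1 + r^\bullet}$. These are linear best response functions with a slope of $-\frac{\frac{1}{2} r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet)$, which falls in $(-\frac{1}{2}, 0)$.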
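So there can only be one solution to $H$ in that region (even when we allow $\alpha_{AB} \neq \alpha_{BA}$), which is the symmetric equilibrium found before, $\alpha_{AB} = \alpha_{BA} = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$.
For Claim (ii), we evaluate
\[ H(0) = -\left( \frac{\gamma}{1 + r^\bullet} \right)^2 \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet)^2 + \gamma^2 \frac{\psi(\kappa^\bullet)}{1 + r^\bullet} = \frac{\psi(\kappa^\bullet) \gamma^2}{1 + r^\bullet} \left( 1 - \frac{(1/2) r^\bullet \psi(\kappa^\bullet)}{1 + r^\bullet} \right) \neq 0 \]
because $1 + r^\bullet > (1/2) r^\bullet \psi(\kappa^\bullet)$. Finally, we evaluate
\[ H\left( \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right) = \left( \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right)^2 (-1 - r^\bullet) + \gamma \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} = \frac{\gamma^2}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \left( 1 - \frac{1 + r^\bullet}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right). \]
This is once again not 0 because $1 + r^\bullet > (1/2) r^\bullet \psi(\kappa^\bullet)$.
For Claim (iii), we show that $H'(x^*) < 0$ at $x^* = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$. We find that
\[ H'(x) = 2x(-1 - r^\bullet) + \left( \frac{\gamma - r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet} \right) \left( -\psi(\kappa^\bullet) - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) - r^\bullet \psi(\kappa^\bullet) \right) - 2 \left( \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet} \right) \left( \frac{-\frac{1}{2} r^\bullet \psi(\kappa^\bullet)}{1 + r^\bullet} \right) \left( \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet)^2 \right) + \gamma - \frac{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)}{1 + r^\bullet} \gamma \psi(\kappa^\bullet). \]
Collecting terms, the coefficient on $x$ is
\[ -2 - 2 r^\bullet + \psi(\kappa^\bullet)^2 \frac{r^\bullet}{1 + r^\bullet} \left( \tfrac{3}{2} r^\bullet + 1 - \tfrac{1}{4} \left( \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet} \right) \right), \]
while the constant term is
\[ \frac{\gamma \psi(\kappa^\bullet)}{1 + r^\bullet} \left( -\tfrac{3}{2} r^\bullet - 1 + \tfrac{1}{2} \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet} - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \right) + \gamma. \]
Therefore, we may calculate $H'(x^*) \cdot \frac{(1 + r^\bullet)^2}{x^*}$, which has the same sign as $H'(x^*)$, to be:
\[ -(1 + r^\bullet)^2 (2 + 2 r^\bullet) + \psi(\kappa^\bullet)^2 r^\bullet \left( (1 + r^\bullet)(\tfrac{3}{2} r^\bullet + 1) - \tfrac{1}{4} (r^\bullet)^2 \psi(\kappa^\bullet)^2 \right) + \left( 1 + r^\bullet + \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \right) \left[ \psi(\kappa^\bullet) \left( (1 + r^\bullet) \left[ -\tfrac{3}{2} r^\bullet - 1 - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \right] + \tfrac{1}{2} (r^\bullet)^2 \psi(\kappa^\bullet)^2 \right) + (1 + r^\bullet)^2 \right]. \]
We have
\[ -(1 + r^\bullet)^2 (2 + 2 r^\bullet) + \left( 1 + r^\bullet + \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \right) (1 + r^\bullet)^2 \le (1 + r^\bullet)^2 \left( -1 - \tfrac{1}{2} r^\bullet \right) < 0, \]
since $0 \le \psi(\kappa^\bullet) \le 1$. Also, for the same reason,
\[ (1 + r^\bullet) \left[ -\tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \right] + \tfrac{1}{2} (r^\bullet)^2 \psi(\kappa^\bullet)^2 \le -\tfrac{1}{2} (r^\bullet)^2 \psi(\kappa^\bullet) + \tfrac{1}{2} (r^\bullet)^2 \psi(\kappa^\bullet)^2 \le 0. \]
Finally, $\psi(\kappa^\bullet)^2 r^\bullet (1 + r^\bullet)(\tfrac{3}{2} r^\bullet + 1) + (1 + r^\bullet + \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet)) \psi(\kappa^\bullet) (1 + r^\bullet)(-\tfrac{3}{2} r^\bullet - 1)$ is no larger than
\[ \psi(\kappa^\bullet)^2 r^\bullet \left( \tfrac{3}{2} (r^\bullet)^2 + \tfrac{5}{2} r^\bullet + 1 \right) + \left[ r^\bullet \psi(\kappa^\bullet) r^\bullet (-\tfrac{3}{2} r^\bullet) \right] + \left[ r^\bullet \psi(\kappa^\bullet) r^\bullet (-1) + 1 \cdot \psi(\kappa^\bullet) r^\bullet (-\tfrac{3}{2} r^\bullet) \right] + \left[ r^\bullet \psi(\kappa^\bullet) \cdot 1 \cdot (-1) \right] \]
\[ = \psi(\kappa^\bullet) r^\bullet \left( \tfrac{3}{2} (r^\bullet)^2 + \tfrac{5}{2} r^\bullet + 1 \right) (\psi(\kappa^\bullet) - 1) \le 0. \]
Therefore $H'(x^*) < 0$.
We have shown that for $\kappa \in [\underline{\kappa}_1, \bar\kappa_1] \cap [0, 1]$, the play $(\alpha_{AB}, \alpha_{BA})$ and group B's inference $r_B(\kappa)$ are uniquely pinned down in EZ-SC, since there is only one possible outcome in the match between group A and group B. This means $\alpha_{BB}$ is also pinned down, since there is only one solution to $\alpha_{BB} = \alpha_i^{BR}(\alpha_{BB}; \kappa, r_B(\kappa))$. So for every $\kappa \in [\underline{\kappa}_1, \bar\kappa_1] \cap [0, 1]$, there is a unique EZ-SC, and we denote its behavior as a function of $\kappa$ by $\alpha(\kappa) = (\alpha_{AA}(\kappa), \alpha_{AB}(\kappa), \alpha_{BA}(\kappa), \alpha_{BB}(\kappa))$.
Recall from Lemma 3 that the objective expected utility from playing $\alpha_i$ against an opponent who plays $\alpha_{-i}$ is
\[ U_i^\bullet(\alpha_i, \alpha_{-i}) = E[s_i^2] \cdot \left( \alpha_i \gamma - \tfrac{1}{2} r^\bullet \alpha_i^2 - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_i \alpha_{-i} - \tfrac{1}{2} \alpha_i^2 \right). \]
If $-i$ plays the rational best response, then the objective expected utility of choosing $\alpha_i$ is
\[ \bar U_i(\alpha_i) := E[s_i^2] \cdot \left( \alpha_i \gamma - \tfrac{1}{2} r^\bullet \alpha_i^2 - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_i \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_i}{1 + r^\bullet} - \tfrac{1}{2} \alpha_i^2 \right). \]
Up to the positive factor $E[s_i^2]$, the derivative in $\alpha_i$ is
\[ \bar U_i'(\alpha_i) = \gamma - r^\bullet \alpha_i - \frac{\frac{1}{2} r^\bullet}{1 + r^\bullet} \gamma \psi(\kappa^\bullet) + \tfrac{1}{2} \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet} \alpha_i - \alpha_i. \]
We also know that $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$ satisfies the first-order condition $\gamma - r^\bullet \alpha_{AA} - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{AA} - \alpha_{AA} = 0$, therefore
\[ \bar U_i'(\alpha_{AA}) = -\frac{\frac{1}{2} r^\bullet}{1 + r^\bullet} \gamma \psi(\kappa^\bullet) + \tfrac{1}{2} \frac{(r^\bullet)^2 \psi(\kappa^\bullet)^2}{1 + r^\bullet} \alpha_{AA} + \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{AA} = \left[ \frac{r^\bullet \psi(\kappa^\bullet)}{2} \right] \left( \frac{-\gamma}{1 + r^\bullet} + \frac{\alpha_{AA} \psi(\kappa^\bullet) r^\bullet}{1 + r^\bullet} + \alpha_{AA} \right). \]
Making the substitution $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$,
\[ \frac{-\gamma}{1 + r^\bullet} + \frac{\alpha_{AA} \psi(\kappa^\bullet) r^\bullet}{1 + r^\bullet} + \alpha_{AA} = \frac{-\gamma (1 + r^\bullet + \frac{1}{2} \psi(\kappa^\bullet) r^\bullet) + \gamma \psi(\kappa^\bullet) r^\bullet + \gamma (1 + r^\bullet)}{(1 + r^\bullet)(1 + r^\bullet + \frac{1}{2} \psi(\kappa^\bullet) r^\bullet)} = \frac{\frac{1}{2} \gamma \psi(\kappa^\bullet) r^\bullet}{(1 + r^\bullet)(1 + r^\bullet + \frac{1}{2} \psi(\kappa^\bullet) r^\bullet)} > 0. \]
Therefore, if we can show that $\alpha_{BA}'(\kappa^\bullet) > 0$, then there exist some $\underline{\kappa}_1 \le \underline{\kappa} < \kappa^\bullet < \bar\kappa \le \bar\kappa_1$ so that for every $\kappa \in [\underline{\kappa}, \bar\kappa] \cap [0, 1]$, $\kappa \neq \kappa^\bullet$, adherents of $\Theta_B$ have strictly higher or strictly lower equilibrium fitness in the unique EZ-SC than adherents of $\Theta_A$, depending on the sign of $\kappa - \kappa^\bullet$.
Consider again the quadratic function $H(x)$ in Equation (1) and implicitly characterize the unique root $x$ in $\left[ 0, \frac{\gamma}{\frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right]$ as a function of $\kappa$ in a neighborhood around $\kappa^\bullet$. Denote this root by $\alpha^M$, let $D := \frac{d \alpha^M}{d \psi(\kappa)}$ and also note $\frac{d \ell(\alpha^M)}{d \psi(\kappa)} = \frac{-\frac{1}{2} r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) \cdot D$. We have
\[ (-1 - r^\bullet) \cdot (2 \alpha^M) \cdot D + (\alpha^M \ell(\alpha^M)) \left( -1 - \tfrac{1}{2} r^\bullet \right) + \left( \ell(\alpha^M) D + \alpha^M \frac{-\frac{1}{2} r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) D \right) \cdot \left( -\psi(\kappa) - \tfrac{1}{2} r^\bullet \psi(\kappa) - r^\bullet \psi(\kappa^\bullet) \right) \]
\[ + (\ell(\alpha^M))^2 \cdot \left( -\tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \right) + \left( 2 \ell(\alpha^M) \frac{-\frac{1}{2} r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) D \right) \cdot \left( -\tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \psi(\kappa) \right) + \gamma \left( D + \ell(\alpha^M) + \psi(\kappa) \frac{-\frac{1}{2} r^\bullet}{1 + r^\bullet} \psi(\kappa^\bullet) D \right) = 0. \]
Evaluate at $\kappa = \kappa^\bullet$, noting that $\alpha^M(\kappa^\bullet) = \ell(\alpha^M(\kappa^\bullet)) = x^* := \frac{\gamma}{1 + r^\bullet + \frac{1}{2} \psi(\kappa^\bullet) r^\bullet}$. The terms without $D$ are:
\[ (x^*)^2 \left( -1 - \tfrac{1}{2} r^\bullet \right) + (x^*)^2 \left( -\tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \right) + \gamma x^* = x^* \cdot \left[ -x^* \cdot \left( 1 + r^\bullet + \tfrac{1}{2} \psi(\kappa^\bullet) r^\bullet - \tfrac{1}{2} r^\bullet \right) + \gamma \right] = x^* \cdot \left[ -\gamma + \tfrac{1}{2} x^* r^\bullet + \gamma \right] = \tfrac{1}{2} r^\bullet (x^*)^2 > 0. \]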
The coefficient in front of D is:( − − r • )(2 x ∗ )+( x ∗ + x ∗ − r • r • ) ψ ( κ • )) · ( − ψ ( κ • ) − r • ψ ( κ • ))+ 12 x ∗ ( r • ) (1 + r • ) ψ ( κ • ) + γ + γψ ( κ • ) · − r • r • ) . Make the substitution γ = x ∗ · (cid:16) r • + ψ ( κ • ) r • (cid:17) , x ∗ · ( − − r • + − r • r • ) ψ ( κ • ) ! · ψ ( κ • )( − r • −
1) + ( r • ) r • ) ψ ( κ • ) ) + x ∗ · ((cid:18) r • + 12 ψ ( κ • ) r • (cid:19) · (1 − ψ ( κ • ) r • r • ) ) ) . Collect terms inside the parenthesis based on powers of ψ ( κ • ) , we get x ∗ · ( ψ ( κ • ) ( r • ) r • ) − ψ ( κ • ) r • r • ) ( − r • −
1) + ψ ( κ • )( − r • − − r • − ) + x ∗ · ( − ψ ( κ • ) ( r • ) r • ) − ψ ( κ • ) r • r • ) · (1 + r • ) + 1 + r • + 12 ψ ( κ • ) r • ) . Combine to get: x ∗ · " ψ ( κ • ) ( r • ) r • ) + ψ ( κ • ) ( r • ) r • ) − ψ ( κ • ) r • − ψ ( κ • ) − r • − . ψ ( κ • ) r • ) r • ) and ψ ( κ • ) ( r • ) r • ) are positive terms with ψ ( κ • ) ( r • ) r • ) + ψ ( κ • ) ( r • ) r • ) ≤ ( r • ) r • ) + ( r • ) r • ) ≤ · r • · r • r • ≤ r • . Now − r • + · r • <
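0, and also − ψ ( κ • ) r • − ψ ( κ • ) − < . Thus the coefficient in front of $D$ is strictly negative. Since the terms without $D$ sum to a strictly positive number, this shows $D(\kappa^\bullet) > 0$. Finally, $\frac{d\alpha_M}{d\psi(\kappa)}$ has the same sign as $\frac{d\alpha_M}{d\kappa}$ since $\psi(\kappa)$ is strictly increasing in $\kappa$.
A.15 Proof of Proposition 8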
Proof.
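We will show that in every EZ-SC: (i) for each $g \in \{A, B\}$, $\mu_g$ puts probability 1 on $\frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa_g)} r^\bullet$; (ii) for each $g \in \{A, B\}$, $\alpha_{gg} = \frac{\gamma}{1 + \frac{1}{2} r^\bullet (1 + \psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \left( \frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa_g)} \right)}$; (iii) the equilibrium fitness of group A is weakly higher than that of group B if and only if $\kappa_A \le \kappa_B$.
Choose $L_1, L_2, L_3$ as in Lemma 4, given $r^\bullet$ and $\bar M_\alpha$. In any EZ-SC with behavior $(\alpha_{AA}, \alpha_{AB}, \alpha_{BA}, \alpha_{BB})$, since the adherents of each theory match with their own group with probability 1 under perfectly assortative matching, we conclude that each of $\mu_g$ for $g \in \{A, B\}$ must put full weight on
$$r^{INF}_i(\alpha_{gg}, \alpha_{gg}; \kappa^\bullet, \kappa_g, r^\bullet) = \frac{\alpha_{gg} + \alpha_{gg} \psi(\kappa^\bullet)}{\alpha_{gg} + \alpha_{gg} \psi(\kappa_g)} r^\bullet = \frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa_g)} r^\bullet,$$
proving (i).
Given this belief, we must have
$$\alpha_{gg} = \frac{\gamma - \frac{1}{2} \frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa_g)} r^\bullet \psi(\kappa_g) \alpha_{gg}}{1 + \frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa_g)} r^\bullet}$$
by Lemma 3. Rearranging yields $\alpha_{gg} = \frac{\gamma}{1 + \frac{1}{2} r^\bullet (1 + \psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \left( \frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa_g)} \right)}$, proving (ii).
From Lemma 3, the objective expected utility of each player when both play the strategy profile $\alpha_{symm}$ is $E[s_i^2] \cdot \left( \alpha_{symm} \gamma - \tfrac{1}{2} r^\bullet \alpha_{symm}^2 - \tfrac{1}{2} r^\bullet \psi(\kappa^\bullet) \alpha_{symm}^2 - \tfrac{1}{2} \alpha_{symm}^2 \right)$. This is a strictly concave quadratic function in $\alpha_{symm}$ that is 0 at $\alpha_{symm} = 0$. Therefore, it is strictly decreasing in $\alpha_{symm}$ for $\alpha_{symm}$ larger than the team solution $\alpha_{TEAM}$ that maximizes this expression, given by the first-order condition
$$\gamma - r^\bullet \alpha_{TEAM} - r^\bullet \psi(\kappa^\bullet) \alpha_{TEAM} - \alpha_{TEAM} = 0 \;\Rightarrow\; \alpha_{TEAM} = \frac{\gamma}{1 + r^\bullet + r^\bullet \psi(\kappa^\bullet)}.$$
For any value of $\kappa \in [0, 1]$, using the fact that $\psi(0) > 0$ and $\psi$ is strictly increasing,
$$\frac{\gamma}{1 + \frac{1}{2} r^\bullet (1 + \psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \left( \frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa)} \right)} > \frac{\gamma}{1 + \frac{1}{2} r^\bullet (1 + \psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet (1 + \psi(\kappa^\bullet))} = \alpha_{TEAM}.$$
Also, $\frac{\gamma}{1 + \frac{1}{2} r^\bullet (1 + \psi(\kappa^\bullet)) + \frac{1}{2} r^\bullet \left( \frac{1 + \psi(\kappa^\bullet)}{1 + \psi(\kappa)} \right)}$ is a strictly increasing function in $\kappa$, since $\psi$ is strictly increasing. We therefore conclude that each player's utility when both play this strategy against each other is strictly decreasing in $\kappa$, proving (iii).
A.16 Proof of Proposition 9
Proof.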
Find $L_1, L_2, L_3$ as given by Lemma 4. Suppose $\Theta_A = \Theta(\kappa^\bullet)$, $\Theta_B = \{F_{r^\bullet, \kappa, \sigma^\bullet_\zeta}\}$ for any $\kappa \in [0, 1]$, $(p_A, p_B) = (1, 0)$, and $\lambda \in [0, 1]$. Then arguments similar to those in the proof of Lemma 4 imply there exists exactly one EZ-SC, and it involves the adherents of $\Theta_A$ holding correct beliefs and playing $\frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$ against each other.
We now analyze $\alpha_{BA}(\kappa)$ in such EZ-SC. In the proof of Proposition 7, we defined $\bar U_i(\alpha_i)$ as $i$'s objective expected utility of choosing $\alpha_i$ when $-i$ plays the rational best response. We showed that $\bar U_i'\left( \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right) > 0$. In an EZ-SC where $i$ believes in the model $F_{r^\bullet, \kappa, \sigma^\bullet_\zeta}$ and $-i$ believes in the model $F_{r^\bullet, \kappa^\bullet, \sigma^\bullet_\zeta}$, using the expression for $\alpha^{BR}_i$ from Lemma 3, the play of $i$ solves
$$x = \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa) \left( \frac{\gamma - \frac{1}{2} r^\bullet \psi(\kappa^\bullet) x}{1 + r^\bullet} \right)}{1 + r^\bullet},$$
which implies
$$\alpha_{BA}(\kappa) = \frac{\gamma \left( 1 + r^\bullet - \frac{1}{2} \psi(\kappa) r^\bullet \right)}{1 + 2 r^\bullet + (r^\bullet)^2 - \frac{1}{4} \psi(\kappa) \psi(\kappa^\bullet) (r^\bullet)^2}.$$
Taking the derivative and evaluating at $\kappa = \kappa^\bullet$, we find an expression with the same sign as $\psi'(\kappa^\bullet) \, r^\bullet (1 + r^\bullet) \, \gamma \left( -2(1 + r^\bullet) + \psi(\kappa^\bullet) r^\bullet \right)$, which is strictly negative because $\psi'(\kappa^\bullet) > 0$, $r^\bullet > 0$, $\gamma > 0$, and $\psi(\kappa^\bullet) \le 1$. This shows there exists $\epsilon > 0$ so that for every $\kappa_h \in (\kappa^\bullet, \kappa^\bullet + \epsilon]$, we have $\bar U_i(\alpha_{BA}(\kappa_h)) < \bar U_i\left( \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)} \right)$, that is, the adherents of $\{F_{r^\bullet, \kappa_h, \sigma^\bullet_\zeta}\}$ have strictly lower fitness than the adherents of $\Theta(\kappa^\bullet)$ with $\lambda = 0$ in the unique EZ-SC. Finally, existence and upper-hemicontinuity of EZ-SC in population proportion in such societies can be established using arguments similar to the proofs of Propositions 5 and 6. This establishes the first claim to be proved.
Next, we turn to $\alpha_{BB}(\kappa)$. Using the expression for $\alpha^{BR}_i$ in Lemma 3, we find that $\alpha_{BB}(\kappa) = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa)}$. Since $\psi' > 0$, $\alpha_{BB}(\kappa)$ is strictly larger than $\alpha_{AA} = \frac{\gamma}{1 + r^\bullet + \frac{1}{2} r^\bullet \psi(\kappa^\bullet)}$ when $\kappa < \kappa^\bullet$. From the proof of Proposition 8, we know that objective payoffs in the stage game are strictly decreasing in linear strategies larger than the team solution $\alpha_{TEAM} = \frac{\gamma}{1 + r^\bullet + r^\bullet \psi(\kappa^\bullet)}$. Since $\alpha_{BB}(\kappa) > \alpha_{AA} > \alpha_{TEAM}$, we conclude the adherents of $\{F_{r^\bullet, \kappa_l, \sigma^\bullet_\zeta}\}$ have strictly lower fitness than the adherents of $\Theta(\kappa^\bullet)$ with $\lambda = 1$ in the unique EZ-SC, for any $\kappa_l < \kappa^\bullet$. Again, existence and upper-hemicontinuity of EZ-SC in population proportion in such societies can be established using arguments similar to the proofs of Propositions 5 and 6. This establishes the second claim to be proved.
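The closed form for $\alpha_{BA}(\kappa)$ and the ordering $\alpha_{BB}(\kappa) > \alpha_{AA} > \alpha_{TEAM}$ used above can be checked numerically. The following is only a sketch under stated assumptions: the function `psi` is a hypothetical stand-in (any strictly increasing function with $\psi(0) > 0$ and $\psi \le 1$ would do), the parameter values are arbitrary illustrations, and payoffs are reported per unit of $E[s_i^2]$.

```python
def psi(kappa):
    # Hypothetical stand-in for psi: strictly increasing, psi(0) > 0, psi <= 1 on [0, 1].
    return 0.2 + 0.6 * kappa


def best_response(alpha_opp, kappa, r, gamma):
    # Linear best response of a player who believes the correlation parameter is kappa.
    return (gamma - 0.5 * r * psi(kappa) * alpha_opp) / (1.0 + r)


def alpha_BA(kappa, kappa_true, r, gamma):
    # Closed form for the misspecified player's coefficient against a correct opponent.
    num = gamma * (1.0 + r - 0.5 * psi(kappa) * r)
    den = 1.0 + 2.0 * r + r ** 2 - 0.25 * psi(kappa) * psi(kappa_true) * r ** 2
    return num / den


def symmetric_payoff(alpha, kappa_true, r, gamma):
    # Objective payoff per unit of E[s_i^2] when both players use the same alpha.
    return alpha * gamma - 0.5 * (1.0 + r + r * psi(kappa_true)) * alpha ** 2


# Illustrative parameter values only (assumptions, not taken from the paper).
r, gamma, kappa_true = 0.8, 0.5, 0.5
kappa_low, kappa_high = 0.3, 0.6

# The closed form should be a fixed point of "B best responds to A's best response".
x = alpha_BA(kappa_low, kappa_true, r, gamma)
a_reply = best_response(x, kappa_true, r, gamma)
fixed_point_gap = abs(x - best_response(a_reply, kappa_low, r, gamma))

alpha_AA = gamma / (1.0 + r + 0.5 * r * psi(kappa_true))
alpha_BB_low = gamma / (1.0 + r + 0.5 * r * psi(kappa_low))
alpha_TEAM = gamma / (1.0 + r + r * psi(kappa_true))

payoff_AA = symmetric_payoff(alpha_AA, kappa_true, r, gamma)
payoff_BB_low = symmetric_payoff(alpha_BB_low, kappa_true, r, gamma)
```

Under these illustrative values, `alpha_BA` evaluated at `kappa_high` lies below its value at `kappa_true`, in line with the negative derivative at $\kappa^\bullet$, and `payoff_BB_low` lies below `payoff_AA`.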
A.17 Proof of Proposition 10
Proof.
Consider the society where $\Theta_A = \Theta_B = \Theta(\kappa^\bullet)$, $(p_A, p_B) = (1, 0)$. For any EZ-SC with behavior $(\sigma_{AA}, \sigma_{AB}, \sigma_{BA}, \sigma_{BB})$ and beliefs $(\mu_A, \mu_B)$, there exists another EZ-SC $(\sigma'_{AA}, \sigma'_{AB}, \sigma'_{BA}, \sigma'_{BB})$ where $\sigma'_{g,g'} = \sigma_{AA}$ for all $g, g' \in \{A, B\}$ and all agents hold the belief $\mu_A$. The uniqueness of EZ-SC from Assumption 6 implies $\alpha_{AB}(\kappa^\bullet) = \alpha_{BA}(\kappa^\bullet) = \alpha_{BB}(\kappa^\bullet) = \alpha^\bullet$.
Now consider the society where $\Theta_B = \Theta(\kappa)$, $(p_A, p_B) = (1, 0)$. By the same arguments as the existence arguments in Proposition 5, there exists an EZ-SC where $\alpha_{AA}(\kappa) = \alpha_{AA}(\kappa^\bullet)$. By the uniqueness of EZ-SC from Assumption 6, we must in fact have $\alpha_{AA}(\kappa) = \alpha_{AA}(\kappa^\bullet)$ for all $\kappa$, so the fitness of theory $\Theta(\kappa^\bullet)$ in the unique EZ-SC is $E^\bullet[E^\bullet[u^\bullet(\alpha^\bullet s_i, \alpha^\bullet s_{-i}, \omega) \mid s_i]]$. Under $\lambda$ matching with mutant theory $\Theta(\kappa)$, the mutant's fitness in the unique EZ-SC is
$$E^\bullet[E^\bullet[(1 - \lambda) u^\bullet(\alpha_{BA}(\kappa) s_i, \alpha_{AB}(\kappa) s_{-i}, \omega) + \lambda u^\bullet(\alpha_{BB}(\kappa) s_i, \alpha_{BB}(\kappa) s_{-i}, \omega) \mid s_i]].$$
Differentiate and evaluate at $\kappa = \kappa^\bullet$. At $\kappa = \kappa^\bullet$, adherents of $\Theta_A$ and $\Theta_B$ have the same fitness since they play the same strategies. So, a non-zero sign on the derivative would give the desired evolutionary fragility against theories with either slightly higher or slightly lower $\kappa$. This derivative is:
$$E^\bullet\left[ E^\bullet\left[ \frac{\partial u^\bullet}{\partial q_i}(\alpha^\bullet s_i, \alpha^\bullet s_{-i}, \omega) \cdot [(1 - \lambda) \alpha_{BA}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)] \cdot s_i + \frac{\partial u^\bullet}{\partial q_{-i}}(\alpha^\bullet s_i, \alpha^\bullet s_{-i}, \omega) \cdot [(1 - \lambda) \alpha_{AB}'(\kappa^\bullet) + \lambda \alpha_{BB}'(\kappa^\bullet)] \cdot s_{-i} \,\middle|\, s_i \right] \right].$$
By the interim optimality part of Assumption 6 and the necessity of the first-order condition, $E^\bullet\left[ \frac{\partial u^\bullet}{\partial q_i}(\alpha^\bullet s_i, \alpha^\bullet s_{-i}, \omega) \mid s_i \right] = 0$ for every $s_i \in S$. The derivative thus simplifies as claimed.
A.18 Proof of Proposition 11
Proof.
When $\Theta_A = \Theta_B = \Theta^\bullet$, for any matching assortativity $\lambda$ and with $(p_A, p_B) = (1, 0)$, we show adherents of both theories have 0 fitness in every approachable EZ. Suppose instead that the match between groups $g$ and $g'$ reaches a terminal node other than $z_1$ with positive probability. Let $n_L$ be the last non-terminal node reached with positive probability, so we must have $L \ge 2$, and also that nodes $n_1, ..., n_{L-1}$ are also reached with positive probability. So Drop must be played with probability 1 at $n_L$. Since $n_L$ is reached with positive probability and the EZ is approachable, correctly specified agents hold correct beliefs about opponent's play at $n_L$, which means at $n_{L-1}$ it cannot be optimal to play Across with positive probability since this results in a loss of $\ell$ compared to playing Drop, a contradiction.
Now let $\Theta_A = \Theta^\bullet$, $\Theta_B = \Theta^{An}$. Suppose $\lambda \in [0, 1]$ and let $p_B \in (0, 1)$. We claim there is an EZ where $d^k_{AA} = 1$ for every $k$, $d^k_{AB} = 0$ for every even $k$ with $k < K$, $d^k_{AB} = 1$ for every other $k$, $d^k_{BA} = 0$ for every odd $k$ and $d^k_{BA} = 1$ for every even $k$, and $d^k_{BB} = 0$ for every $k$ with $k < K$, $d^K_{BB} = 1$. It is easy to see that the behavior $(d_{AA})$ is optimal under correct belief about opponent's play. In the $\Theta_A$ vs. $\Theta_B$ matches, the conjecture about A's play $\hat d^k_{AB} = 2/K$ for $k$ even, $\hat d^k_{AB} = 1$ for $k$ odd minimizes KL divergence among all strategies in $A^{An}$, given B's play. To see this, note that when B has the role of P2, opponent Drops immediately. When B has the role of P1, the outcome is always $z_K$. So a conjecture with $\hat d^k_{AB} = x$ for every even $k$ has the conditional KL divergence of:
$$\underbrace{\sum_{k \le K-1 \text{ odd}} 0 \cdot \ln\left( \frac{0}{0} \right)}_{(1, z_k) \text{ for } k \le K-1 \text{ odd}} + \underbrace{\sum_{k \le K-2 \text{ even}} 0 \cdot \ln\left( \frac{0}{\frac{1}{2} \cdot (1 - x)^{(k/2) - 1} \cdot x} \right)}_{(1, z_k) \text{ for } k \le K-2 \text{ even}} + \underbrace{\frac{1}{2} \ln\left( \frac{1/2}{\frac{1}{2} \cdot (1 - x)^{(K/2) - 1} \cdot x} \right)}_{(1, z_K)} + \underbrace{0 \cdot \ln\left( \frac{0}{\frac{1}{2} (1 - x)^{K/2}} \right)}_{(1, z_{end})}$$
when matched with an opponent from $\Theta_A$. Using $0 \cdot \ln(0) = 0$, the expression simplifies to $\frac{1}{2} \ln\left( \frac{1}{(1 - x)^{(K/2) - 1} \cdot x} \right)$, which is minimized among $x \in [0, 1]$ by $x = 2/K$. Against this conjecture, the difference in expected payoff at node $n_{K-1}$ from Across versus Drop is $(1 - 2/K)(g) + (2/K)(-\ell)$. This is strictly positive when $g > \frac{2}{K - 2} \ell$. This means the continuation value at $n_{K-1}$ is at least $g$ larger than the payoff of Dropping at $n_{K-3}$, so again Across has strictly higher expected payoff than Drop. Inductively, $(d^k_{BA})$ is optimal given the belief $(\hat d^k_{AB})$. Also, $(d^k_{AB})$ is optimal as it results in the highest possible payoff. We can similarly show that the conjecture $\hat d^k_{BB}$ with $\hat d^k_{BB} = 2/K$ for $k$ even, $\hat d^k_{BB} = 0$ for $k$ odd minimizes KL divergence conditional on a $\Theta_B$ opponent, and $(d^k_{BB})$ is optimal given this conjecture.
As $p_B \to 0$, we find an approachable EZ where adherents of A have fitness 0, whereas the adherents of B have fitness at least $\frac{1}{2}(((K/2) - 1) g - \ell) > 0$ when $g > \frac{2}{K - 2} \ell$. This shows $\Theta_A$ is not evolutionarily stable against $\Theta_B$.
But consider the same $(d_{AA}, d_{AB}, d_{BA})$ and suppose $d^k_{BB} = 1$ for every $k$. Taking $p_B \to 1$, with $\lambda < 1$, we find an approachable EZ where adherents of B have fitness 0, and adherents of A have fitness $(1 - \lambda) \cdot \frac{1}{2} \cdot ((K/2) g + \ell) > 0$. This shows $\Theta_B$ is not evolutionarily stable against $\Theta_A$.
B Learning Foundation of EZ and EZ-SC
We provide a foundation for EZ and EZ-SC as the steady state of a learning system.
B.1 Regularity Assumptions
We make some regularity assumptions on the objective environments and on the theories $\Theta_A, \Theta_B$. These are similar to the regularity assumptions from Section 3.3, but we do not require here that $\Theta_A, \Theta_B$ have a product structure for the EZ microfoundation.
Suppose $\mathbb{A}$ is finite. Suppose the marginals of $\Theta_A, \Theta_B$ on the dimension of fundamental uncertainty, $\mathcal{F}_A, \mathcal{F}_B$, are compact metrizable spaces. So, we can endow $\Theta_A$ and $\Theta_B$ with the product metric. Suppose that each model $(a_A, a_B, F)$ in each theory is so that for every $(a_i, a_{-i}) \in \mathbb{A}^2$, whenever $f^\bullet(a_i, a_{-i})(y) > 0$, we also get $f(a_i, a_A)(y) > 0$ and $f(a_i, a_B)(y) > 0$, where $f$ is the density or probability mass function for $F$.
For each $g, g' \in \{A, B\}$, $F \in \mathcal{F}_g$, define $K_{g,g'} : \mathbb{A}^2 \times \Theta_g \to \mathbb{R}$ by $K_{g,g'}(a_i, a_{-i}; (a_A, a_B, F)) = KL(F^\bullet(a_i, a_{-i}) \,\|\, F(a_i, a_{g'}))$. Suppose each $K_{g,g'}$ is well defined and a continuous function of the model $(a_A, a_B, F)$.
For $g \in \{A, B\}$, $F \in \mathcal{F}_g$, let $U_g(a_i, a_{-i}; F)$ be the expected payoffs of the strategy profile $(a_i, a_{-i})$ for $i$ when consequences are drawn according to $F$. Assume $U_A, U_B$ are continuous.
Suppose for every theory $\Theta_g$ and every $(a_A, a_B, F) \in \Theta_g$ and $\epsilon > 0$, there exists an open neighborhood $V \subseteq \Theta_g$ of $(a_A, a_B, F)$, so that for every $(\hat a_A, \hat a_B, \hat F) \in V$, $1 - \epsilon \le f(a_i, a_A)(y) / \hat f(a_i, \hat a_A)(y) \le 1 + \epsilon$ and $1 - \epsilon \le f(a_i, a_B)(y) / \hat f(a_i, \hat a_B)(y) \le 1 + \epsilon$ for all $a_i \in \mathbb{A}$, $y \in \mathbb{Y}$. Also suppose there is some $M > 0$ so that $\ln(f(a_i, a_A)(y))$ and $\ln(f(a_i, a_B)(y))$ are bounded in $[-M, M]$ for all $(a_A, a_B, F) \in \Theta_g$, $a_i, a_{-i} \in \mathbb{A}$, $y \in \mathbb{Y}$.
B.2 Learning Environment
Time is discrete and infinite, $t = 0, 1, 2, ...$ There is a unit mass of agents, $i \in [0, 1]$; a $p_A \in (0, 1)$ measure of them are assigned to theory A and the rest are assigned to theory B. Each agent born into theory $g$ starts with the same full support prior over this theory, $\mu^{(0)}_g \in \Delta(\Theta_g)$, and believes there is some $(a_A, a_B, F) \in \Theta_g$ so that every group $g'$ opponent always plays $a_{g'}$ and the consequences are always generated by $F$.
In each period $t$, agents are matched up partially assortatively to play the stage game. Assortativity is $\lambda \in (0, 1)$. Each person in group $g$ has $\lambda + (1 - \lambda) p_g$ chance of matching with someone from group $g$, and matches with someone from group $-g$ with the complementary chance. Each agent $i$ observes their opponent's group membership and chooses a strategy $a^{(t)}_i \in \mathbb{A}$. At the end of the match, the agent observes own consequence $y^{(t)}_i$ and an ex-post signal $x^{(t)}_i \in \mathbb{A}$, where $x^{(t)}_i$ equals the matched opponent's strategy $a_{-i}$ with probability $\tau \in [0, 1)$, and it is uniformly random on $\mathbb{A}$ with the complementary probability. To give a foundation for EZ, we consider $\tau = 0$, so the signal $x_i$ is uninformative. To give a foundation for EZ-SC, we consider $\tau$ close to 1.
Thus, the space of histories from one period is $\{A, B\} \times \mathbb{A} \times \mathbb{Y} \times \mathbb{A}$, where the first instance of the strategy is own strategy and the second instance is the ex-post signal. Let $\mathbb{H}$ denote the space of all finite-length histories.
Given the assumption on the two theories, there is a well-defined Bayesian belief operator for each theory $g$, $\mu_g : \mathbb{H} \to \Delta(\Theta_g)$, mapping every finite-length history into a belief over models in $\Theta_g$, starting with the prior $\mu^{(0)}_g$.
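The matching probabilities and the ex-post signal described above can be sketched in a few lines. This is an illustration under stated assumptions, not part of the formal model: the function names and the three-element strategy set are hypothetical, and the KL divergence between signal distributions is included only as one way to see that the signal carries no information at $\tau = 0$ and becomes very informative as $\tau$ approaches 1.

```python
import math


def match_probabilities(own_group_share, lam):
    # Chance of meeting own group is lam + (1 - lam) * p_g; the rest is the other group.
    same = lam + (1.0 - lam) * own_group_share
    return same, 1.0 - same


def signal_distribution(opponent_strategy, strategies, tau):
    # With probability tau the signal equals the opponent's strategy;
    # otherwise it is drawn uniformly from the (finite) strategy set.
    n = len(strategies)
    return {
        a: tau * (1.0 if a == opponent_strategy else 0.0) + (1.0 - tau) / n
        for a in strategies
    }


def kl_divergence(p, q):
    # KL(p || q) for two distributions on the same finite set (q has full support here).
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0.0)


# Hypothetical strategy labels, for illustration only.
strategies = ["a1", "a2", "a3"]

same_prob, other_prob = match_probabilities(own_group_share=0.4, lam=0.25)

kl_by_tau = {
    tau: kl_divergence(
        signal_distribution("a1", strategies, tau),
        signal_distribution("a2", strategies, tau),
    )
    for tau in (0.0, 0.5, 0.99)
}
```

In this sketch the divergence between the signal distributions induced by two different opponent strategies is zero at $\tau = 0$ and grows as $\tau$ increases toward 1.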
We also take as exogenously given policy functions for choosing strategies after each history. That is, $a_{g,g'} : \mathbb{H} \to \mathbb{A}$ for every $g, g' \in \{A, B\}$ gives the strategy that a group $g$ agent uses against a group $g'$ opponent after every history. Assume these policy functions are asymptotically myopic.
Assumption A.1.
For every $\epsilon > 0$, there exists $K$ so that for any history $h$ containing at least $K$ matches against opponents of each group, $a_{g,g'}(h)$ is an $\epsilon$-best response to the Bayesian belief $\mu_g(h)$ about the model.
From the perspective of each agent $i$ in group $g$, $i$'s play against groups A and B, as well as $i$'s belief over $\Theta_g$, is a stochastic process $(\tilde a^{(t)}_{iA}, \tilde a^{(t)}_{iB}, \tilde \mu^{(t)}_i)_{t \ge 0}$ valued in $\mathbb{A} \times \mathbb{A} \times \Delta(\Theta_g)$. The randomness is over the groups of opponents matched with in different periods, the strategies they play, and the random consequence and ex-post signals drawn at the end of the match. At the same time, since there is a continuum of agents, the distribution over histories within each population in each period is deterministic. As such, there is a deterministic sequence $(\alpha^{(t)}_{AA}, \alpha^{(t)}_{AB}, \alpha^{(t)}_{BA}, \alpha^{(t)}_{BB}, \nu^{(t)}_A, \nu^{(t)}_B) \in \Delta(\mathbb{A})^4 \times \Delta(\Delta(\Theta_A)) \times \Delta(\Delta(\Theta_B))$ that describes the distributions of play and beliefs that prevail in the two sub-populations in every period $t$.
B.3 Steady State Limits are EZs and EZ-SCs
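We state and prove the learning foundation. For $(\alpha^{(t)})_t$ a sequence valued in $\Delta(\mathbb{A})$ and $a^* \in \mathbb{A}$, $\alpha^{(t)} \to a^*$ means $E_{\hat a \sim \alpha^{(t)}} \| \hat a - a^* \| \to 0$ as $t \to \infty$. For $(\nu^{(t)})_t$ a sequence valued in $\Delta(\Delta(\Theta_g))$ and $\mu^* \in \Delta(\Theta_g)$, $\nu^{(t)} \to \mu^*$ means $E_{\hat \mu \sim \nu^{(t)}} \| \hat \mu - \mu^* \| \to 0$ as $t \to \infty$.
Proposition A.1.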
Suppose the regularity assumptions in Section B.1 hold, and suppose Assumption A.1 holds. Suppose $\tau = 0$. Suppose there exists $(a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B) \in \mathbb{A}^4 \times \Delta(\Theta_A) \times \Delta(\Theta_B)$ so that $(\alpha^{(t)}_{AA}, \alpha^{(t)}_{AB}, \alpha^{(t)}_{BA}, \alpha^{(t)}_{BB}, \nu^{(t)}_A, \nu^{(t)}_B) \to (a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B)$ and for each agent $i$ in group $g$, almost surely $(\tilde a^{(t)}_{iA}, \tilde a^{(t)}_{iB}, \tilde \mu^{(t)}_i) \to (a^*_{gA}, a^*_{gB}, \mu^*_g)$. Then, $(a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B)$ is an EZ.
Suppose $\Theta_A, \Theta_B$ have the product structure. Then, there exists some $\underline{\tau} < 1$ so that for every $\tau \in (\underline{\tau}, 1)$, $(a^*_{AA}, a^*_{AB}, a^*_{BA}, a^*_{BB}, \mu^*_A, \mu^*_B)$ is an EZ-SC under the above conditions.
Proof. We first consider the case of $\tau = 0$, so the uninformative ex-post signals may be ignored.
For $\mu$ a belief and $g \in \{A, B\}$, let $u^\mu(a_i; g)$ represent subjective expected payoff from playing $a_i$ against group $g$. Suppose $a^*_{AA} \notin \arg\max_{\hat a \in \mathbb{A}} u^{\mu^*_A}(\hat a; A)$ (the other cases are analogous). By the continuity assumptions on $U_A$ (which is also bounded because $\mathcal{F}_A$ is bounded), there are some $\epsilon_1, \epsilon_2 > 0$ so that whenever $\mu_i \in \Delta(\Theta_A)$ with $\| \mu_i - \mu^*_A \| < \epsilon_1$, we also have $u^{\mu_i}(a^*_{AA}; A) < \max_{\hat a \in \mathbb{A}} u^{\mu_i}(\hat a; A) - \epsilon_2$. By the definition of asymptotically myopic policies (Assumption A.1), find $K$ so that $a_{A,A}(h)$ must be a myopic $\epsilon_2$-best response when there are at least $K$ periods of matches against A and B. Agent $i$ has a strictly positive chance to match with groups A and B in every period. So, at all except a null set of points in the probability space, $i$'s history eventually records at least $K$ periods of play by groups A and B. Also, by assumption, almost surely $\tilde \mu^{(t)}_i \to \mu^*_A$. This shows that by asymptotically myopic best responses, almost surely $\tilde a^{(t)}_{iA} \not\to a^*_{AA}$, a contradiction.
Now suppose some $\theta^*_A = (a^*_A, a^*_B, f^*)$ in the support of $\mu^*_A$ does not minimize the weighted KL divergence in the definition of EZ (the case of a model $\theta^*_B$ in the support of $\mu^*_B$ not minimizing is similar). Then we have
$$\theta^*_A \notin \arg\min_{\hat \theta \in \Theta_A} \; (\lambda + (1 - \lambda) p_A) \cdot D_{KL}(F^\bullet(a^*_{AA}, a^*_{AA}) \,\|\, \hat F(a^*_{AA}, \hat a_A)) + (1 - \lambda)(1 - p_A) \cdot D_{KL}(F^\bullet(a^*_{AB}, a^*_{BA}) \,\|\, \hat F(a^*_{AB}, \hat a_B))$$
where $\hat \theta = (\hat a_A, \hat a_B, \hat F)$. This is equivalent to:
$$\theta^*_A \notin \arg\max_{\hat \theta \in \Theta_A} \; (\lambda + (1 - \lambda) p_A) \cdot E_{y \sim F^\bullet(a^*_{AA}, a^*_{AA})} \ln(\hat f(a^*_{AA}, \hat a_A)(y)) + (1 - \lambda)(1 - p_A) \cdot E_{y \sim F^\bullet(a^*_{AB}, a^*_{BA})} \ln(\hat f(a^*_{AB}, \hat a_B)(y)).$$
Let this objective, as a function of $\hat \theta$, be denoted $WL(\hat \theta)$.
There exists $\theta^{opt}_A = (a^{opt}_A, a^{opt}_B, f^{opt}) \in \Theta_A$ and $\delta, \epsilon > 0$ so that $(1 - \delta) WL(\theta^{opt}_A) - 2 \delta M - 2 \epsilon > (1 - \delta) WL(\theta^*_A)$. By assumption on the primitives, find open neighborhoods $V^{opt}$ and $V^*$ of $\theta^{opt}_A, \theta^*_A$ respectively, so that for all $a_i \in \mathbb{A}$, $g \in \{A, B\}$, $y \in \mathbb{Y}$, $1 - \epsilon \le f^{opt}(a_i, a^{opt}_g)(y) / \hat f(a_i, \hat a_g)(y) \le 1 + \epsilon$ for all $\hat \theta = (\hat a_A, \hat a_B, \hat f) \in V^{opt}$, and also $1 - \epsilon \le f^*(a_i, a^*_g)(y) / \hat f(a_i, \hat a_g)(y) \le 1 + \epsilon$ for all $\hat \theta = (\hat a_A, \hat a_B, \hat f) \in V^*$. Also, by convergence of play in the populations, find $T_1$ so that in all periods $t \ge T_1$, $\alpha^{(t)}_{AA}(a^*_{AA}) \ge 1 - \delta$ and $\alpha^{(t)}_{BA}(a^*_{BA}) \ge 1 - \delta$.
For $T_2 \ge T_1$, consider a probability space defined by $\Omega := (\{A, B\} \times \mathbb{A}^2 \times (\mathbb{Y})^{\mathbb{A}^2})^\infty$ that describes the randomness in an agent's learning process starting with period $T_2 + 1$. For a point $\omega \in \Omega$ and each period $T_2 + s$, $s \ge 1$, $\omega_s = (g, a_{-i,A}, a_{-i,B}, (y_{a_i, a_{-i}})_{(a_i, a_{-i}) \in \mathbb{A}^2})$ specifies the group $g$ of the matched opponent, the play $a_{-i,A}, a_{-i,B}$ of hypothetical opponents from groups A and B, and the hypothetical consequence $y_{a_i, a_{-i}}$ that would be generated for every pair of strategies $(a_i, a_{-i})$ played. As notation, let $opp(\omega, s)$, $a_{-i,A}(\omega, s)$, $a_{-i,B}(\omega, s)$, and $y_{a_i, a_{-i}}(\omega, s)$ denote the corresponding components of $\omega_s$. Define $P_{T_2}$ over this space in the natural way. That is, it is independent across periods, and within each period, the density (or probability mass function if $\mathbb{Y}$ is finite) of $\omega_s = (g, a_{-i,A}, a_{-i,B}, (y_{a_i, a_{-i}})_{(a_i, a_{-i}) \in \mathbb{A}^2})$ is
$$m_g \cdot \alpha^{(T_2 + s)}_{AA}(a_{-i,A}) \, \alpha^{(T_2 + s)}_{BA}(a_{-i,B}) \cdot \prod_{(a_i, a_{-i}) \in \mathbb{A}^2} f^\bullet(a_i, a_{-i})(y_{a_i, a_{-i}}),$$
where $m_g$ is the probability of $i$ from group A being matched up against an opponent of group $g$, that is $m_A = (\lambda + (1 - \lambda) p_A)$, $m_B = (1 - \lambda)(1 - p_A)$.
For $\theta = (a^\theta_A, a^\theta_B, F_\theta) \in \Theta_A$ with $f_\theta$ the density of $F_\theta$, $\omega \in \Omega$, consider the stochastic process
$$\ell_s(\theta, \omega) := \frac{1}{s} \sum_{t = T_2 + 1}^{T_2 + s} \ln\left( f_\theta(a^*_{AA}, a^\theta_{opp(\omega, t)})(y_{a^*_{AA}, \, a_{-i, opp(\omega, t)}(\omega, t)}(\omega, t)) \right).$$
By choice of the neighborhood $V^*$,
$$\limsup_s \sup_{\theta_A \in V^*} \ell_s(\theta_A, \omega) \le \epsilon + \frac{1}{s} \sum_{t = T_2 + 1}^{T_2 + s} \ln\left( f^*(a^*_{AA}, a^*_{opp(\omega, t)})(y_{a^*_{AA}, \, a_{-i, opp(\omega, t)}(\omega, t)}(\omega, t)) \right)$$
$$\le \epsilon + \frac{1}{s} \sum_{t = T_2 + 1}^{T_2 + s} \left[ \mathbf{1}\{a_{-i, opp(\omega, t)}(\omega, t) = a^*_{opp(\omega, t), A}\} \cdot \ln\left( f^*(a^*_{AA}, a^*_{opp(\omega, t)})(y_{a^*_{AA}, \, a^*_{opp(\omega, t), A}}(\omega, t)) \right) + \left( 1 - \mathbf{1}\{a_{-i, opp(\omega, t)}(\omega, t) = a^*_{opp(\omega, t), A}\} \right) \cdot M \right].$$
Since $T_2 \ge T_1$, in every period $t$, $P_{T_2}(a_{-i, opp(\omega, t)}(\omega, t) = a^*_{opp(\omega, t), A}) \ge 1 - \delta$. Let $(\xi_k)_{k \ge 1}$ be a related stochastic process: it is i.i.d. such that each $\xi_k$ has $\delta$ chance to be equal to $M$, $(1 - \delta) m_A$ chance to be distributed according to $\ln(f^*(a^*_{AA}, a^*_A)(y))$ where $y \sim f^\bullet(a^*_{AA}, a^*_{AA})$, and $(1 - \delta) m_B$ chance to be distributed according to $\ln(f^*(a^*_{AB}, a^*_B)(y))$ where $y \sim f^\bullet(a^*_{AB}, a^*_{BA})$. By the law of large numbers, $\frac{1}{s} \sum_{k=1}^s \xi_k$ converges almost surely to $\delta M + (1 - \delta) WL(\theta^*_A)$. By this comparison, $\limsup_s \sup_{\theta_A \in V^*} \ell_s(\theta_A, \omega) \le \epsilon + \delta M + (1 - \delta) WL(\theta^*_A)$ $P_{T_2}$-almost surely. By a similar argument, $\liminf_s \inf_{\theta_A \in V^{opt}} \ell_s(\theta_A, \omega) \ge -\epsilon - \delta M + (1 - \delta) WL(\theta^{opt}_A)$ $P_{T_2}$-almost surely.
Along any $\omega$ where we have both $\limsup_s \sup_{\theta_A \in V^*} \ell_s(\theta_A, \omega) \le \epsilon + \delta M + (1 - \delta) WL(\theta^*_A)$ and $\liminf_s \inf_{\theta_A \in V^{opt}} \ell_s(\theta_A, \omega) \ge -\epsilon - \delta M + (1 - \delta) WL(\theta^{opt}_A)$, if $\omega$ also leads to $i$ always playing $a^*_{AA}$ against group A and $a^*_{AB}$ against group B in all periods starting with $T_2 + 1$, then the posterior belief assigned to $V^*$ must tend to 0, hence $\tilde \mu^{(t)}_i \not\to \mu^*_A$. Starting from any length $T_2$ history $h$, there exists a subset $\hat \Omega_h \subseteq \Omega$ that leads to $i$ not playing the EZ strategy in at least one period starting with $T_2 + 1$.
So conditional on $h$, the probability of $\tilde \mu^{(t)}_i \to \mu^*_A$ is no larger than $1 - P_{T_2}(\hat \Omega_h)$. The unconditional probability is therefore no larger than $E_h[1 - P_{T_2}(\hat \Omega_h)]$, where $E_h$ is taken with respect to the distribution of period $T_2$ histories for $i$. But this term is also the probability of $i$ playing a non-EZ action at least once starting with period $T_2$. Since there are finitely many actions and $(\tilde a^{(t)}_{iA}, \tilde a^{(t)}_{iB}) \to (a^*_{AA}, a^*_{AB})$ almost surely, $E_h[1 - P_{T_2}(\hat \Omega_h)]$ tends to 0 as $T_2 \to \infty$. We have a contradiction as this shows $\tilde \mu^{(t)}_i \not\to \mu^*_A$ with probability 1.
Now consider the case where $\Theta_A, \Theta_B$ have the product structure. Let $\bar K < \infty$ be an upper bound on $K_{g,g'}(a_i, a_{-i}; (a_A, a_B, F))$ across all $g, g' \in \{A, B\}$, $a_i, a_{-i} \in \mathbb{A}$, $(a_A, a_B, F) \in \Theta_g$. Here $\bar K$ is finite because $\mathbb{A}$ is finite and $K_{g,g'}$ is continuous in the model, which is from a compact domain. Let $F^X_\tau(a_{-i}) \in \Delta(\mathbb{A})$ represent the distribution of ex-post signals given precision $\tau$, when opponent plays $a_{-i} \in \mathbb{A}$. It is clear that there exists some $\underline{\tau} < 1$ so that for any $a_{-i} \neq a'_{-i}$, $\tau \in (\underline{\tau}, 1)$, we get $\min(m_A, m_B) \cdot D_{KL}(F^X_\tau(a_{-i}) \,\|\, F^X_\tau(a'_{-i})) > \bar K$. Therefore, given any $(a^*_{AA}, a^*_{AB}, a^*_{BA}) \in \mathbb{A}^3$, the solution to
$$\min_{\hat \theta \in \Theta_A} \; (\lambda + (1 - \lambda) p_A) \cdot \left[ D_{KL}(F^\bullet(a^*_{AA}, a^*_{AA}) \,\|\, \hat F(a^*_{AA}, \hat a_A)) + D_{KL}(F^X_\tau(a^*_{AA}) \,\|\, F^X_\tau(\hat a_A)) \right] + (1 - \lambda)(1 - p_A) \cdot \left[ D_{KL}(F^\bullet(a^*_{AB}, a^*_{BA}) \,\|\, \hat F(a^*_{AB}, \hat a_B)) + D_{KL}(F^X_\tau(a^*_{BA}) \,\|\, F^X_\tau(\hat a_B)) \right]$$
must satisfy $\hat a_A = a^*_{AA}$, $\hat a_B = a^*_{BA}$, because $(a^*_{AA}, a^*_{BA}, F)$ for any $F \in \mathcal{F}_A$ has a KL divergence no larger than $\bar K$, and it is in $\Theta_A$ because of the product structure. On the other hand, any $(\hat a_A, \hat a_B, \hat F)$ with either $\hat a_A \neq a^*_{AA}$ or $\hat a_B \neq a^*_{BA}$ has KL divergence strictly larger than $\bar K$ by the choice of $\underline{\tau}$