Coevolution of Deception and Preferences: Darwin and Nash Meet Machiavelli

Yuval Heller and Erik Mohlin

Final pre-print of a paper published in Games and Economic Behavior, 113, 2019, pp. 223–247.
Abstract
We develop a framework in which individuals’ preferences coevolve with their abilities to deceive others about their preferences and intentions. Specifically, individuals are characterised by (i) a level of cognitive sophistication and (ii) a subjective utility function. Increased cognition is costly, but higher-level individuals have the advantage of being able to deceive lower-level opponents about their preferences and intentions in some of the matches. In the remaining matches, the individuals observe each other’s preferences. Our main result shows that, essentially, only efficient outcomes can be stable. Moreover, under additional mild assumptions, we show that an efficient outcome is stable if and only if the gain from unilateral deviation is smaller than the effective cost of deception in the environment.
Keywords:
Evolution of Preferences; Indirect Evolutionary Approach; Theory of Mind; Depth of Reasoning; Deception; Efficiency.
JEL codes:
C72, C73, D03, D83.

∗ Valuable comments were provided by the anonymous associate editor and referees, Vince Crawford, Eddie Dekel, Jeffrey Ely, Itzhak Gilboa, Christoph Kuzmics, Larry Samuelson, Jörgen Weibull, and Okan Yilankaya, as well as participants at presentations at Oxford University, Queen Mary University, G.I.R.L.13 in Lund, the Toulouse Economics and Biology Workshop, DGL13 in Stockholm, the 25th International Conference on Game Theory at Stony Brook, and the Biological Basis of Preference and Strategic Behaviour 2015 conference at Simon Fraser University. Yuval Heller is grateful to the European Research Council for its financial support (ERC starting grant). Erik Mohlin is grateful for financial support from Handelsbankens forskningsstiftelser and the Swedish Research Council.

† Affiliation: Department of Economics, Bar Ilan University. Address: Ramat Gan 5290002, Israel. E-mail: [email protected].

‡ Affiliation: Department of Economics, Lund University. Address: Tycho Brahes väg 1, 220 07 Lund, Sweden. E-mail: [email protected].

Introduction
For a long time economists took preferences as given. The study of their origin and formation was considered a question outside the scope of economics. Over the past two decades this has changed dramatically. In particular, there is now a large literature on the evolutionary foundations of preferences (for an overview, see Robson and Samuelson, 2011). A prominent strand of this literature is the so-called “indirect evolutionary approach,” pioneered by Güth and Yaari (1992) (the term was coined by Güth, 1995). This approach has been used to explain the existence of a variety of “non-standard” preferences that do not coincide with material payoffs, e.g., altruism, spite, and reciprocal preferences. Typically, the non-materialistic preferences in question convey some form of commitment advantage that induces opponents to behave in a way that benefits individuals with non-materialistic preferences, as described by Schelling (1960) and Frank (1987). Indeed, Heifetz, Shannon, and Spiegel (2007) show that this kind of result is generic.

A crucial feature of the indirect evolutionary approach is that preferences are explicitly or implicitly assumed to be at least partially observable. Consequently the results are vulnerable to the existence of mimics who signal that they have, say, a preference for cooperation, but actually defect on cooperators, thereby earning the benefits of having the non-standard preference without having to pay the cost (Samuelson, 2001). The effect of varying the degree to which preferences can be observed has been investigated by Ok and Vega-Redondo (2001), Ely and Yilankaya (2001), Dekel, Ely, and Yilankaya (2007), and Herold and Kuzmics (2009). They confirm that the degree to which preferences are observed decisively influences the outcome of preference evolution. Yet, the degree to which preferences are observed is still exogenous in these models.
In reality we would expect both the preferences and the ability to observe or conceal them to be the product of an evolutionary process. This paper provides a first step towards filling in the missing link between the evolution of preferences and the evolution of how preferences are concealed, feigned, and detected. In our model the ability to observe preferences and the ability to deceive and induce false beliefs about preferences are endogenously determined by evolution, jointly with the evolution of preferences. Cognitively more sophisticated players have a positive probability of deceiving cognitively less sophisticated players. Mutual observation of preferences occurs only in matches in which such deception fails. This setup is general enough to encompass both the standard indirect evolutionary model where preferences are always observed, and the reverse case in which more sophisticated types always deceive lower types, as well as all intermediate cases between these two extremes. We find that, generically, only efficient outcomes can be played in stable population states. Moreover, we define a single number that captures the effective cost of deception against naive opponents, and show that an efficient outcome is stable if and only if the gain from a unilateral deviation is smaller than the effective cost of deception.

For example, Bester and Güth (1998), Bolle (2000), and Possajennikov (2000) study combinations of altruism, spite, and selfishness. Ellingsen (1997) finds that preferences that induce aggressive bargaining can survive in a Nash demand game. Fershtman and Weiss (1998) study the evolution of concerns for social status. Sethi and Somanathan (2001) study the evolution of reciprocity in the form of preferences that are conditional on the opponent’s preference type. In the context of the finitely repeated Prisoner’s Dilemma, Guttman (2003) explores the stability of conditional cooperation. Dufwenberg and Güth (1999) study firms’ preferences for large sales. Güth and Napel (2006) study preference evolution when players use the same preferences in both ultimatum and dictator games. Koçkesen, Ok, and Sethi (2000) investigate the survival of more general interdependent preferences in aggregative games. Friedman and Singh (2009) show that vengefulness may survive if observation has some degree of informativeness. Recently, Norman (2012) has shown how to adapt some of these results into a dynamic model.

Gamba (2013) is an interesting exception. She assumes play of a self-confirming equilibrium, rather than a Nash equilibrium, in an extensive-form game. This allows for the evolution of non-materialistic preferences even when they are completely unobservable. An alternative is to allow for a dynamic that is not strictly payoff monotonic. This approach is pursued by Frenkel, Heller, and Teper (forthcoming), who show that multiple biases (inducing non-materialistic preferences) can survive in non-monotonic evolutionary dynamics even if they are unobservable, because each approximately compensates for the errors of the others.

On this topic, Robson and Samuelson (2011) write: “The standard argument is that we can observe preferences because people give signals – a tightening of the lips or flash of the eyes – that provide clues as to their feelings. However, the emission of such signals and their correlation with the attendant emotions are themselves the product of evolution. [...] We cannot simply assume that mimicry is impossible, as we have ample evidence of mimicry from the animal world, as well as experience with humans who make their way by misleading others as to their feelings, intentions and preferences. [...] In our view, the indirect evolutionary approach will remain incomplete until the evolution of preferences, the evolution of signals about preferences, and the evolution of reactions to these signals, are all analysed within the model.” [Emphasis added] (pp. 14–15)

The recent working paper of Gauer and Kuzmics (2016) presents a different way of endogenising the observability of preferences. Specifically, they assume that preferences are ex ante uncertain, and that each player may exert a cognitive effort to privately observe the opponent’s preferences.

Overview of the Model.

As is common in standard evolutionary game theory we assume an infinite population of individuals who are uniformly randomly matched to play a symmetric normal-form game. Each individual has a type, which is a tuple consisting of a preference component and a cognitive component. The preference component is identified with a subjective utility function over the set of outcomes (i.e., action profiles), which may differ from the objective payoffs (i.e., fitness) of the underlying game. The cognitive component is simply a natural number representing the level of cognitive sophistication of the individual. The cost of increased cognition is strictly positive.

It is known that positive assortative matching is conducive to the evolution of altruistic behaviour (Hines and Maynard Smith, 1979) and non-materialistic preferences even when preferences are perfectly unobservable (Alger and Weibull, 2013; Bergstrom, 1995). It is also known that finite populations allow for the evolution of spiteful behaviours (Schaffer, 1988) and non-materialistic preferences (Huck and Oechssler, 1999). By assuming that individuals are uniformly randomly matched in an infinite population, we avoid confounding these effects with the effect of endogenising the degree of observability.

The one-dimensional representation of cognitive ability reflects the idea that if one is good at deceiving others, then one is more likely to be good also at reading others and avoiding being deceived by them. In this paper we simplify this relation by assuming a perfect correlation between the two abilities, and leave the study of more general relations for future research. Remark 7 in Section 2.2 presents an alternative interpretation of our model, according to which this cognitive component represents the agent’s social status, rather than the agent’s ability to deceive other agents.

When two individuals with different cognitive levels are matched, there is a positive probability (which may depend on the cognitive levels of both agents) that the agent with the higher level deceives his opponent. For the sake of tractability, and in order not to limit the degree to which higher types can take advantage of lower types, we assume that in these matches the players play a deception equilibrium. With the remaining probability (or with probability one if both agents have the same cognitive level) there is no deception in the match. In this case, we assume that each player observes the opponent’s preferences, and the individuals play a Nash equilibrium of the complete information game induced by their subjective preferences.

The state of a population is described by a configuration, consisting of a type distribution and a behaviour policy. The type distribution is simply a finite-support distribution on the set of types. The behaviour policy specifies a Nash equilibrium for each match without deception, and a deception equilibrium for each match with deception. In a neutrally stable configuration all incumbents earn the same expected fitness, and if a small group of mutants enter they earn weakly less than the incumbents in any focal post-entry state. A focal post-entry state is one in which the incumbents behave against each other in the same way as before the mutants entered.
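The match-level accounting described in this overview can be sketched in code. The following Python snippet is a minimal illustration, not part of the paper's formal apparatus: the payoff matrix, the specific functional form of the deception probability q, and all numbers are invented for the example; only the bookkeeping (deception play with probability q(n, n′) + q(n′, n), complete-information Nash play otherwise) follows the text.

```python
import numpy as np

# Fitness payoffs pi[a, a'] of a symmetric 2-action game (row player's payoff).
# This matrix is a hypothetical Prisoner's-Dilemma-like example.
PI = np.array([[4.0, 0.0],
               [5.0, 1.0]])

def q(n, n_prime):
    """Probability that a level-n player deceives a level-n' opponent.
    q(n, n') = 0 whenever n <= n' (only strictly higher levels deceive);
    the functional form below is invented for illustration."""
    return 0.0 if n <= n_prime else 1.0 - 0.5 ** (n - n_prime)

def match_fitness(n, n_prime, b_D, b_D_opp, b_N, b_N_opp):
    """Expected material payoff of a level-n player against a level-n'
    opponent, given the actions played in deception matches (b_D vs b_D_opp)
    and in matches without deception (b_N vs b_N_opp)."""
    p_deception = q(n, n_prime) + q(n_prime, n)  # at most one term is positive
    return (p_deception * PI[b_D, b_D_opp]
            + (1.0 - p_deception) * PI[b_N, b_N_opp])

# A level-2 player who, when deception succeeds, plays action 1 while the
# deceived level-1 opponent plays action 0; without deception both play 1.
print(match_fitness(2, 1, b_D=1, b_D_opp=0, b_N=1, b_N_opp=1))  # 0.5*5 + 0.5*1 = 3.0
```

Here the level-2 player earns the deception payoff half the time and the complete-information payoff otherwise; the cognitive cost of her higher level, introduced below, would then be subtracted from this match payoff.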
Main Results.
We say that a strategy profile is (fitness-)efficient if it maximises the sum of objective payoffs. Theorem 1 shows that in any stable configuration, any type θ̄ with the highest cognitive level in the incumbent population must play an efficient strategy profile when meeting itself. The intuition is that otherwise a highest-type mutant who mimics the play of θ̄ against all incumbents while playing an efficient strategy profile against itself would outperform type θ̄ (a novel application of the “secret handshake” argument due to Robson, 1990).

Next we restrict attention to generic games (i.e., games that result with probability one if fitness payoffs are independently drawn from a continuous distribution) and obtain our first main result: any stable configuration must induce efficient play in all matches between all types. The idea of the proof can be briefly sketched as follows. We first show that any type θ in a stable configuration must play an efficient strategy profile when meeting itself. Otherwise a mutant who has the same level as θ and the same utility function as θ, but who plays efficiently against itself, could invade the population. Next, we show that any two types must play an efficient strategy profile. The intuition is that otherwise the average within-group fitness would be higher than the between-group fitness, which implies instability in the face of small perturbations in the frequency of the types: a type who became slightly more frequent would have a higher fitness than the other incumbents, and this would move the population away from the original configuration.

The existing literature (e.g., Dekel, Ely, and Yilankaya, 2007) has demonstrated that if players perfectly observe each other’s preferences (or do so with sufficiently high probability), then only efficient outcomes are stable. As was pointed out above, our model encompasses the limiting case in which it is arbitrarily “cheap and easy” to deceive the opponent, i.e., the case in which the marginal cost of an additional cognitive level is very low, and having a slightly higher cognitive level allows a player to deceive the opponent with probability one. A key contribution of the paper is to show that even when it is cheap and easy to deceive the opponent, the seemingly mild assumption of perfect observability, and Nash equilibrium behaviour, among players with the same cognitive level is enough to ensure that stability implies efficiency.

In order to obtain sufficient conditions for stability we restrict attention to generic games that admit a “punishment action” that ensures that the opponent achieves strictly less than the symmetric efficient fitness payoff. For games satisfying this relatively mild requirement we fully characterise stable configurations. We define the (fitness) deviation gain of an action profile to be the maximal fitness increase a player may obtain by unilaterally deviating from this action profile (this gain is zero if and only if the action profile is a Nash equilibrium of the underlying game). Next we define the effective cost of deception in the environment as the minimal ratio between the cost of an increased cognitive level and the probability that an agent with this level deceives an opponent with the lowest cognitive level. Our second main result shows that an efficient action profile is the outcome of a stable configuration if and only if its deviation gain is smaller than the effective cost of deception. In particular, efficient Nash equilibria are stable in all environments, while non-Nash efficient action profiles are stable only as long as the gain from a unilateral deviation is sufficiently small.

Next, we note that non-generic games may admit different kinds of stable configurations. One particularly interesting family of non-generic games is the family of zero-sum games, such as the Rock-Paper-Scissors game. We analyse this game and characterise a heterogeneous stable population (inspired by a related construction in Conlisk, 2001) in which different cognitive levels coexist, players with equal levels play the Nash equilibrium of the underlying game, and players with higher levels beat their opponents but this gain is offset by higher cognitive costs.

Finally, in Section 4 we discuss two extensions of the model (which are formally analysed in Appendices B and D): (1) we relax the assumption that each agent perfectly observes the partner’s preferences in matches without deception, and (2) we allow for type-interdependent preferences (à la Herold and Kuzmics, 2009), which are represented by utility functions that are defined over both action profiles and the opponent’s type.

Further Related Literature.
Our model is related to work in biology and evolutionary psychology on the evolution of the “theory of mind” (Premack and Woodruff, 1979), specifically, the “Machiavellian intelligence” hypothesis (Humphrey, 1976) and the “social brain” hypothesis (Byrne and Whiten, 1998), according to which the extraordinary cognitive abilities of humans evolved as a result of the demands of social interactions, rather than the demands of the natural environment: in a single-person decision problem there is a fixed benefit from being smart, but in a strategic situation it may be important to be smarter than the opponent. From an evolutionary perspective, there is a trade-off between the benefit of outsmarting the opponent and the non-negligible costs associated with increased cognitive capacity (Holloway, 1996; Kinderman, Dunbar, and Bentall, 1998). Our model incorporates these features.

There is a smaller literature on the evolution of strategic sophistication within game theory; see, e.g., Stahl (1993), Banerjee and Weibull (1995), Stennek (2000), Conlisk (2001), Abreu and Sethi (2003), Mohlin (2012), Rtischev (2016), and Heller (2015). Following these papers, we provide results to the effect that different degrees of cognitive sophistication may coexist.

Robalino and Robson (2016) construct a model to demonstrate the advantage of having a theory of mind (understood as an ability to ascribe stable preferences to other players) over learning by reinforcement. In novel games the ascribed preferences allow the agents with a theory of mind to draw on past experience, whereas a reinforcement learner without such a model has to start over again. Hopkins (2014) explains why costly signalling of altruism may be especially valuable for those agents who have a theory of mind.

Robson (1990) initiated a literature on evolution in cheap-talk games by formulating the secret handshake effect: evolution selects an efficient stable state if mutants can send messages that the incumbents either do not see or do not benefit from seeing. Against the incumbents a mutant plays the same action as the incumbents do, but against other mutants the mutant plays an action that is a component of the efficient equilibrium. Thus the mutants are able to invade unless the incumbents are already playing efficiently. See also the related analysis in Matsui (1991) and Wiseman and Yilankaya (2001). We allow for deception and still find that efficiency is necessary (though no longer sufficient) for stability. As pointed out by Wärneryd (1991) and Schlag (1993), among others, problems arise if either the incumbents use all available messages (so that there is no message left for the mutants to coordinate on) or the incumbents follow a strategy that induces the mutants to play an action that lowers the mutants’ payoffs below those of the incumbents. To circumvent this problem, Kim and Sobel (1995) use stochastic stability arguments and Wärneryd (1998) uses complexity costs. Similarly, evolution selects an efficient outcome in our model, where the preferences also serve the function of messages.

We conclude by mentioning three other related strands of literature in which deception has been implicitly studied: (1) the “strategic teaching” literature, which studies situations in which sophisticated agents manipulate the learning input of opponents in order to change the beliefs and future actions of these opponents (see, e.g., Fudenberg and Levine, 1998; Camerer, Ho, and Chong, 2002; Schipper, 2017, Section 8.11); (2) the “reputation” literature, in which a long-run player manipulates the beliefs and behaviour of short-run opponents (see Mailath and Samuelson, 2006, for a textbook exposition); and (3) non-equilibrium level-k analysis of games of conflict, where agents can use pre-play communication to deceive naive opponents (see, e.g., Crawford, 2003).

Structure.
The rest of the paper is organised as follows. Section 2 presents the model. The results are presented in Section 3. In Section 4 we extend the model to deal with partial observability (formally analysed in Appendix D) and type-interdependent preferences (formally analysed in Appendix B). We conclude in Section 5. Appendix A contains proofs not in the main text. Appendix C formally constructs heterogeneous stable populations in specific games.
2 Model

We consider a large population of agents, each of whom is endowed with a type that determines her subjective preferences and her cognitive level. The agents are randomly matched to play a symmetric two-player game. A dynamic evolutionary process of cultural learning, or biological inheritance, increases the frequency of more successful types. We present a static solution concept to capture stable population states in such environments.
Consider a symmetric two-player normal-form game G with a finite set A of pure actions and a set ∆(A) of mixed actions (or strategies). We use the letter a (resp., σ) to describe a typical pure action (resp., mixed action). Payoffs are given by π : A × A → ℝ, where π(a, a′) is the material (or fitness) payoff to a player using action a against action a′. The payoff function is extended to mixed actions in the standard way, where π(σ, σ′) denotes the material payoff to a player using strategy σ against an opponent using strategy σ′. With a slight abuse of notation let a denote the degenerate strategy that puts all the weight on action a. We adopt this convention for probability distributions throughout the paper.

Remark 1. Asymmetric interactions can be captured in our setup (as is standard in the literature; see, e.g., Brown and von Neumann, 1950; Selten, 1980; van Damme, 1987, Section 9.5) by embedding the asymmetric interaction in a larger, symmetric game in which nature first randomly assigns the players to roles in the asymmetric interaction.

We imagine a large population of individuals (technically, a continuum) who are uniformly randomly matched to play the game G. Each individual i in the population is endowed with a type θ = (u, n) ∈ Θ = U × ℕ, consisting of preferences, identified with a von Neumann–Morgenstern utility function u ∈ U, and a cognitive level n ∈ ℕ. Let ∆(Θ) be the set of all finite-support probability distributions on Θ. A population is represented by a finite-support type distribution µ ∈ ∆(Θ). Let C(µ) denote the support (carrier) of the type distribution µ ∈ ∆(Θ). Given a type θ, we use u_θ and n_θ to refer to its preferences and cognitive level, respectively.

For tractability, we choose to work with a discrete set of cognitive levels. The main results in the paper can be adapted to a setup in which the feasible set of cognitive efforts is a continuum, provided that we maintain our focus on finite-support type distributions. Comment 6 in Section 2.2 explains why we restrict attention to finite-support type distributions.

In the main model we assume that the preferences are defined over action profiles, as in Dekel, Ely, and Yilankaya (2007). This means that any preferences can be represented by a utility function of the form u : A × A → ℝ. The set of all possible (modulo affine transformations) utility functions on A × A is U = [0, 1]^{|A|×|A|}. Let BR_u(σ′) denote the set of best replies to strategy σ′ given preferences u, i.e., BR_u(σ′) = arg max_{σ ∈ ∆(A)} u(σ, σ′).

There is a fitness cost to increased cognition, represented by a strictly increasing cognitive cost function k : ℕ → ℝ₊ satisfying lim_{n→∞} k(n) = ∞. The fitness payoff of an individual equals the material payoff from the game, minus the cognitive cost. Let k_n denote the cost of having cognitive level n. Hence k_θ = k_{n_θ} denotes the cost of having type θ. Without loss of generality, we assume that k_1 = 0.

We would like to put forward two motivations for the assumption that there is an increasing fitness cost of having a higher cognitive level. The first motivation is relevant to settings in which the evolution of types is influenced by biological inheritance. There is a literature in biology and biological anthropology showing that brain volume, especially neocortex volume, is correlated with the size of social groups across species. Noting that brain tissue is metabolically costly, it has been argued that the size of the brain (in particular the neocortex) is at least partially determined by the complexity of social organisation (see Dunbar, 1998, for a summary of the evidence and the arguments), which is in line with the “Machiavellian intelligence” and “social brain” hypotheses (Humphrey, 1976; Byrne and Whiten, 1997; Whiten and Byrne, 1988).

The second motivation is relevant also in setups in which types evolve as part of a social learning process. For concreteness, suppose that agents face two kinds of decision problems throughout their lives: (1) individual (ecological) decision problems against nature, and (2) interactive (social) decision problems, as represented by playing the underlying game G. Agents have limited cognitive capacity. New agents who join the population face a trade-off between developing their deception-related cognitive skills (which are helpful when playing the game G) and developing other skills (which are helpful in the decision problems against nature). When a new agent joins the population, his type θ = (u_θ, n_θ) determines how much effort the agent exerts in developing his deception-related cognitive ability n_θ (while the remaining effort is exerted to develop the other skills). The increasing cognitive cost function k(n_θ) captures the agent’s loss due to his sub-optimal performance in the decision problems against nature, which is induced by diverting effort to developing his deception-related cognitive ability at the expense of developing the other skills.

In Appendix B, we study type-interdependent preferences, which depend on the opponent’s type, as in Herold and Kuzmics (2009).

2.2 Configurations

A state of the population is described by a type distribution and a behaviour policy for each type in the support of the type distribution.
An individual’s behaviour is assumed to be (subjectively) rational in the sense that it maximises her subjective preferences given the belief she has about the opponent’s expected behaviour. However, her beliefs may be incorrect if she is deceived by her opponent. An individual may be deceived if her opponent is of a strictly higher cognitive level. The probability of deception is given by the function q : ℕ × ℕ → [0, 1] that satisfies q(n, n′) = 0 if and only if n ≤ n′. We interpret q(n, n′) as the probability that a player with cognitive level n deceives an opponent with cognitive level n′. Specifically, when two players with cognitive levels n′ and n ≥ n′ are matched to play, then with a probability of q(n, n′) the individual with the higher cognitive level n (henceforth, the higher type) observes the opponent’s preferences perfectly, and is able to deceive the opponent (henceforth, the lower type). The deceiver is allowed to choose whatever she wants the deceived party to believe about the deceiver’s intended action choice. The deceived party best-replies given her possibly incorrect belief. For simplicity, we assume that if the deceived party has multiple best replies, then the deceiver is allowed to break the indifference, and choose which of the best replies she wants the deceived party to play. Consequently the deceiver is able to induce the deceived party to play any strategy that is a best reply to some belief about the opponent’s mixed action, given the deceived party’s preferences.

Given preferences u ∈ U, let Σ(u) denote the set of undominated strategies, which are the strategies that are best replies to at least one strategy of the opponent (given the preferences u). Formally, we define

Σ(u) = {σ ∈ ∆(A) : there exists σ′ ∈ ∆(A) such that σ ∈ BR_u(σ′)}.

We say that a strategy profile is a deception equilibrium if the strategy profile is optimal from the point of view of the deceiver under the constraint that the deceived player has to play an undominated strategy. Formally:
Definition 1. Given two types θ, θ′ with n_θ > n_θ′, a strategy profile (σ̃, σ̃′) is a deception equilibrium if

(σ̃, σ̃′) ∈ arg max_{σ ∈ ∆(A), σ′ ∈ Σ(u_θ′)} u_θ(σ, σ′).

Let DE(θ, θ′) be the set of all such deception equilibria.

One can extend our main results to a setup in which individuals with lower cognitive levels can deceive opponents with higher cognitive levels with a sufficiently small probability. Specifically, assume that for each generic game there exists ε > 0 such that q(n, n′) < ε for each n ≤ n′ (instead of requiring q(n, n′) = 0). One can show that the characterisation of NSCs in Corollary 2 remains qualitatively the same. Namely, the only candidates to be NSCs are configurations in which all agents have the minimal cognitive level, and all agents play the efficient action profile in every match with no deception. These configurations are NSCs if the effective cost of deception is sufficiently high.

With the remaining probability of 1 − q(n, n′) − q(n′, n) there is no deception between the two matched individuals with cognitive levels n and n′, and they play a Nash equilibrium of the game induced by their preferences. Given two preferences u, u′ ∈ U, let NE(u, u′) ⊆ ∆(A) × ∆(A) be the set of mixed equilibria of the game induced by the preferences u and u′, i.e.,

NE(u, u′) = {(σ, σ′) ∈ ∆(A) × ∆(A) : σ ∈ BR_u(σ′) and σ′ ∈ BR_{u′}(σ)}.

We are now in a position to define our key notion of a configuration (following the terminology of Dekel, Ely, and Yilankaya, 2007), by combining a type distribution with a behaviour policy, as represented by Nash equilibria and deception equilibria.
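Definition 1 can be made concrete with a small numerical sketch. The following Python code is a hypothetical pure-strategy approximation (the paper allows fully mixed strategies): it approximates the undominated set Σ(u) of a two-action game by checking pure best replies against a grid of beliefs, and then searches over pure profiles for a deception equilibrium. The payoff matrices are invented for illustration.

```python
import numpy as np

def undominated_pure_actions(u, grid=101):
    """Pure actions that are a best reply to at least one belief about the
    opponent's mixed action (two-action games only; beliefs on a grid)."""
    best = set()
    for p in np.linspace(0.0, 1.0, grid):
        belief = np.array([p, 1.0 - p])       # opponent plays action 0 w.p. p
        best.add(int(np.argmax(u @ belief)))  # own best reply to this belief
    return best

def deception_equilibrium(u_deceiver, u_victim):
    """Deceiver's favourite pure profile (a, a'), subject to the victim's
    action a' being undominated given the victim's own preferences."""
    feasible = undominated_pure_actions(u_victim)
    profiles = [(a, a_v) for a in range(u_deceiver.shape[0]) for a_v in feasible]
    return max(profiles, key=lambda pr: u_deceiver[pr[0], pr[1]])

# Deceiver with selfish Prisoner's-Dilemma fitness; victim whose subjective
# utility makes cooperation (action 0) dominant, so Sigma(u_victim) = {0}.
u_deceiver = np.array([[4.0, 0.0],
                       [5.0, 1.0]])
u_victim = np.array([[3.0, 1.0],
                     [2.0, 0.0]])
print(deception_equilibrium(u_deceiver, u_victim))  # (1, 0): defect vs. cooperate
```

In this example the deceiver convinces the victim that cooperating is her best reply and then defects against it, exactly the mimicry concern raised in the introduction.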
Definition 2. A configuration is a pair (µ, b), where µ ∈ ∆(Θ) is a type distribution and b = (b^N, b^D) is a behaviour policy, where b^N, b^D : C(µ) × C(µ) → ∆(A) satisfy, for each θ, θ′ ∈ C(µ):

q(n_θ, n_θ′) + q(n_θ′, n_θ) < 1 ⇒ (b^N_θ(θ′), b^N_θ′(θ)) ∈ NE(u_θ, u_θ′), and

q(n_θ, n_θ′) > 0 (⇔ n_θ > n_θ′) ⇒ (b^D_θ(θ′), b^D_θ′(θ)) ∈ DE(θ, θ′).

We interpret b^D_θ(θ′) (resp., b^N_θ(θ′)) to be the strategy used by type θ against type θ′ when deception occurs (resp., does not occur).

Given a configuration (µ, b) we call the types in the support of µ incumbents. Note that standard arguments imply that for any distribution µ, there exists a mapping b : C(µ) × C(µ) → ∆(A) such that (µ, b) is a configuration. Given a configuration (µ, b) and types θ, θ′ ∈ C(µ), let π_θ(θ′ | (µ, b)) be the expected fitness of an agent with type θ conditional on being matched with θ′:

π_θ(θ′ | (µ, b)) = (q(n_θ, n_θ′) + q(n_θ′, n_θ)) · π(b^D_θ(θ′), b^D_θ′(θ)) + (1 − (q(n_θ, n_θ′) + q(n_θ′, n_θ))) · π(b^N_θ(θ′), b^N_θ′(θ)).

The expected fitness of an individual of type θ in configuration (µ, b) is

Π_θ|(µ,b) = Σ_{θ′ ∈ C(µ)} µ(θ′) · π_θ(θ′ | (µ, b)) − k_θ,

where µ(θ′) denotes the frequency of type θ′ in the population. Given a configuration (µ, b), let Π_(µ,b) be the average fitness in the population, i.e.,

Π_(µ,b) = Σ_{θ ∈ C(µ)} µ(θ) · Π_θ|(µ,b).

When all incumbent types have the same expected fitness (i.e., Π_(µ,b) = Π_θ|(µ,b) for each θ ∈ C(µ)), we say that the configuration is balanced.

A number of aspects of our model of cognitive sophistication merit further discussion.

1. Unidimensional cognitive ability: In reality the ability to deceive and the ability to detect preferences are probably not identical.
However, both of them are likely to be stronglyrelated to cognitive ability in general, and more specifically to theory of mind and the abil-ity to entertain higher-order intentional attitudes (Kinderman, Dunbar, and Bentall, 1998;Dunbar, 1998). For this reason we believe that a unidimensional cognitive trait is a reason-able approximation. Moreover, it is an approximation that affords us necessary tractability.We connect the abilities to detect and conceal preferences with the ability to deceive, by as-suming (throughout the paper) that one is able to deceive one’s opponent if and only if oneobserves the opponent’s preferences and conceals one’s own preferences from the opponent.2.
Power of deception: Our definition of deception equilibrium amounts to an assumption that a successful deception attempt allows the deceiver to implement her favourite strategy profile, under the constraint that the deceived party does not choose a dominated action from her point of view. Moreover, we assume that a player with a higher cognitive level knows whether her deception was successful when choosing her action. These assumptions give higher cognitive types a clear advantage over lower cognitive types. Hence, in an alternative model in which successful deceivers have less deception power, we would expect the evolutionary advantage of higher types to be weaker than in our current model. Below we find that (for generic games) in any stable state everyone plays the same efficient action profile and has the lowest cognitive level. We conjecture that these states will remain stable also in a model where successful deception is less powerful. We leave for future research the analysis of feasible but less powerful deception technologies.

3.
Same deception against all lower types: Our model assumes that a player may use different deceptions against different types with lower cognitive levels. We note that our results remain the same (with minor changes to the proofs) in an alternative setup in which individuals have to use the same mixed action in their deception efforts towards all opponents.

4.
Non-Bayesian deception: Note that a successful deceiver is able to induce the opponent to believe that the deceiving type will play any mixed action σ̂, even an action that is never played by any agent in the population. That is, deception is so powerful in our model that the deceived opponent is not able to apply Bayesian reasoning in his false assessment of which action the agent is going to play. We think of this assumption as describing a setting in which the deceiver (of a higher cognitive type) is able to provide a convincing argument (tell a convincing story) that she is going to play σ̂. From a Bayesian perspective one might object that these arguments are signals that should be used to update beliefs. To this we would respond that the stories told to a potential victim by different deceivers will vary across

(Footnote: Thus, in our setup a cognitive arms race (i.e. the Machiavellian intelligence hypothesis à la Humphrey, 1976; Robson, 2003) is a non-equilibrium phenomenon, or alternatively a feature of non-generic games.)
5. Observation and Nash equilibrium behaviour in the case of non-deception: It is difficult to avoid an element of arbitrariness when making an assumption about what is being observed when neither party is able to deceive the other. As in most of the existing literature on the indirect evolutionary approach (e.g., Güth and Yaari, 1992; Dekel, Ely, and Yilankaya, 2007, Section 3), we assume that when there is no deception, there is perfect observability of the opponent's preferences. In Section 4.1 we discuss the implications of relaxing this assumption. We consider it to be an important contribution of our analysis that it highlights the critical importance of the assumption made regarding observability, and the resulting behaviour, in matches without deception.

We further assume that if two agents observe each other's preferences then they play a Nash equilibrium of the complete information game induced by their preferences. This assumption is founded on the common idea that when agents are not deceived, (1) over time they adapt their beliefs (in a way that is consistent with Bayesian inference) about the distribution of actions they face, conditional on their partners' observed preferences, and (2) they best-reply given their belief about their current partner's distribution of actions. By contrast, as discussed above, when agents are deceived they are unable to correctly update their beliefs about their partner's action (i.e. unable to use Bayesian inference to arrive at beliefs about the opponent's distribution of actions). Still, they are able to best-reply given their (possibly false) beliefs about the deceiver's action.

6.
Continuum population and finite-support type distributions: Our model is intended to be a simple approximation of a real-life environment that includes a large finite population, and in which new agents who join the population, or existing agents who revise their choice of type, typically choose to mimic one of the existing active types. As a result each active type is played by several agents (rather than by a single agent), and for each active type there is a positive probability of a match between agents who are endowed with this type. As is common in the literature, for tractability, we assume a continuum population and an “exact law of large numbers,” rather than a large finite population. We want all other aspects of the model to be as close as possible to the real-life environment. Specifically, we want to maintain the property that for each type, there is a positive probability of a match between agents who are endowed with this type. In order to maintain this property, we have to assume that the distribution of active types has a finite support.

7. Alternative interpretation of our model: social status: As suggested by one of the referees, one can present an interesting interpretation of our model that describes social status, rather than deception. According to this interpretation, the level n_θ of type θ describes the social status (like caste) of agents belonging to this type. When two players are randomly matched to play a game, first a “social struggle” ensues. With a certain probability, the higher-caste player prevails and enslaves the lower-caste opponent. This means he can dictate the choice by the lower-caste opponent as long as the choice is undominated for this opponent. Otherwise, they simply play the Nash equilibrium of the game (given by their preferences). Maintaining a higher social status is costly in terms of fitness.
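To make the fitness expressions in Definition 2 concrete, the following Python sketch computes the conditional match fitness π_θ(θ′ | (μ, b)), the expected fitness Π_θ|(μ,b), and a balancedness check for a toy two-type population. The payoff matrix, the cognitive costs, and the deception technology q are hypothetical illustrations, not the model's calibration.

```python
# A minimal sketch of the fitness calculations in Definition 2.
# All numerical values (payoffs, costs, the deception technology q) are
# hypothetical illustrations.

def q(n, n_prime):
    """Hypothetical deception probability: a strictly higher level deceives
    a lower level with probability 1/2; otherwise deception is impossible."""
    return 0.5 if n > n_prime else 0.0

# Fitness payoff pi[(a, a')] of playing action a against action a'.
pi = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}

# Two illustrative types: cognitive level n and cognitive cost k.
types = {"theta1": {"n": 1, "k": 0.0}, "theta2": {"n": 2, "k": 0.4}}
mu = {"theta1": 0.8, "theta2": 0.2}                 # type distribution
bN = {(s, t): "C" for s in types for t in types}    # play when no deception
bD = {(s, t): "D" for s in types for t in types}    # play when deception occurs

def match_fitness(s, t):
    """pi_s(t | (mu, b)): expected fitness of s conditional on meeting t."""
    p_dec = q(types[s]["n"], types[t]["n"]) + q(types[t]["n"], types[s]["n"])
    return (p_dec * pi[(bD[(s, t)], bD[(t, s)])]
            + (1 - p_dec) * pi[(bN[(s, t)], bN[(t, s)])])

def expected_fitness(s):
    """Pi_{s|(mu,b)}: average over opponents, net of the cognitive cost."""
    return sum(mu[t] * match_fitness(s, t) for t in types) - types[s]["k"]

fitnesses = {s: expected_fitness(s) for s in types}
# The configuration is balanced iff all incumbent types earn the same fitness.
balanced = max(fitnesses.values()) - min(fitnesses.values()) < 1e-9
```

With these made-up numbers the two types earn different expected fitnesses, so this particular configuration is not balanced.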
As discussed in the previous subsection, each agent in the population behaves in a way that maximises the agent's subjective preferences induced by the agent's type. By contrast, the distribution of types in the population evolves according to the expected material fitness obtained by each type. This evolutionary process is captured by the static solution concepts introduced in this subsection.

We consider dynamics in which types with higher expected fitness gradually become more frequent. One example of such dynamics is the replicator dynamic (Taylor and Jonker, 1978), which can be interpreted in terms of biological (asexual) reproduction or as social learning by imitation (see Weibull, 1995, Chapter 3, for a textbook introduction). According to the latter interpretation, an agent who has the opportunity to revise her choice, or a new agent who joins the population, randomly chooses a member of the population as a “mentor,” and imitates the mentor's type; the probability that an agent is chosen as a mentor is proportional to that agent's fitness.

Recall that a neutrally stable strategy (Maynard Smith and Price, 1973; Maynard Smith, 1982) is a strategy that, if played by most of the population, weakly outperforms any other strategy. Similarly, an evolutionarily stable strategy is a strategy that, if played by most of the population, strictly outperforms any other strategy.
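The stability comparison recalled above, and its connection to the replicator dynamic, can be illustrated numerically. The sketch below uses a made-up two-action fitness matrix; is_ess carries out the post-entry payoff comparison on a grid of mutant strategies, and replicator_step performs one Euler step of the replicator dynamic. All parameters are illustrative assumptions.

```python
# Illustrative check of evolutionary stability in a symmetric two-action game,
# together with one replicator-dynamic step (all payoffs are made up).

# Payoff matrix pi[i][j]: fitness of pure action i against pure action j.
pi = [[3, 0],
      [4, 1]]   # action 1 strictly dominates, as in a Prisoner's Dilemma

def payoff(sigma, tau):
    """Expected fitness of mixed strategy sigma against mixed strategy tau."""
    return sum(sigma[i] * pi[i][j] * tau[j] for i in range(2) for j in range(2))

def is_ess(sigma, eps=1e-3, grid=50):
    """Crude numerical stability test: sigma must strictly outperform every
    grid mutant sigma' inside the post-entry population (1-eps)sigma + eps sigma'."""
    for k in range(grid + 1):
        p = k / grid
        mutant = [p, 1 - p]
        if mutant == sigma:
            continue
        post = [(1 - eps) * sigma[i] + eps * mutant[i] for i in range(2)]
        if payoff(mutant, post) >= payoff(sigma, post):
            return False
    return True

def replicator_step(x, dt=0.1):
    """One Euler step of the replicator dynamic on the action frequencies x."""
    f = [sum(pi[i][j] * x[j] for j in range(2)) for i in range(2)]
    avg = sum(x[i] * f[i] for i in range(2))
    return [x[i] + dt * x[i] * (f[i] - avg) for i in range(2)]
```

In this game the strict equilibrium action (index 1) passes the test, the dominated action fails it, and the replicator step moves frequencies towards the better-performing action.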
Definition 3.
A strategy σ ∈ Δ(A) is a neutrally stable strategy (NSS) if for every σ′ ∈ Δ(A) there is some ε̄ ∈ (0, 1) such that if ε ∈ (0, ε̄), then π̃(σ′, (1 − ε)σ + εσ′) ≤ π̃(σ, (1 − ε)σ + εσ′). If the weak inequality is replaced by a strict inequality for each σ′ ≠ σ, then σ is an evolutionarily stable strategy (ESS).

(Footnote: More accurately, we need to assume that the set of active types is countable. All of our results hold under this somewhat weaker assumption.)

It is well known that NSSs and ESSs correspond to Lyapunov stable and asymptotically stable population states, respectively, under the replicator dynamics. That is, a population starting close
to an NSS will always remain close to the NSS, and a population starting close to an ESS will converge to the ESS (see, e.g., Taylor and Jonker, 1978; Thomas, 1985; Bomze and Weibull, 1995; Cressman, 1997; Sandholm, 2010).

We extend the notions of neutral and evolutionary stability from strategies to configurations. We begin by defining the type game that is induced by a configuration.
Definition 4.
For any configuration (μ, b) the corresponding type game Γ_(C(μ),b) is the symmetric two-player game in which each player's pure strategy space is C(μ), and the payoff to strategy θ, against θ′, is π_θ(θ′ | (μ, b)) − k_θ.

The definition of a type game allows us to apply notions and results from standard evolutionary game theory, where evolution acts upon strategies, to the present setting, where evolution acts upon types. A similar methodology was used in Mohlin (2012). Note that each type distribution with support in C(μ) is represented by a mixed strategy in Γ_(C(μ),b).

We want to capture robustness with respect to small groups of individuals, henceforth called mutants, who introduce new types and new behaviours into the population. Suppose that a fraction ε of the population is replaced by mutants, and suppose that the distribution of types within the group of mutants is μ′ ∈ Δ(Θ). Consequently the post-entry type distribution is μ̃ = (1 − ε) · μ + ε · μ′. That is, for each type θ ∈ C(μ) ∪ C(μ′), μ̃(θ) = (1 − ε) · μ(θ) + ε · μ′(θ). In line with most of the literature on the indirect evolutionary approach, we assume that adjustment of behaviour is infinitely faster than adjustment of the type distribution. Thus we assume that the post-entry type distribution quickly stabilises into a configuration (μ̃, b̃). There may exist many such post-entry configurations, all having the same type distribution but different behaviour policies. We note that incumbents do not have to adjust their behaviour against other incumbents in order to continue playing Nash equilibria, and deception equilibria, among themselves. For this reason, we assume (similarly to Dekel, Ely, and Yilankaya, 2007, in the setup with perfect observability) that the incumbents maintain the same pre-entry behaviour among themselves. Formally:

Definition 5.
Let (μ, b) and (μ̃, b̃) be two configurations such that C(μ) ⊆ C(μ̃). We say that (μ̃, b̃) is focal (with respect to (μ, b)) if θ, θ′ ∈ C(μ) implies that b̃^D_θ(θ′) = b^D_θ(θ′) and b̃^N_θ(θ′) = b^N_θ(θ′).

Standard fixed-point arguments imply that for every configuration (μ, b) and every type distribution μ̃ satisfying C(μ) ⊆ C(μ̃), there exists a behaviour policy b̃ such that (μ̃, b̃) is a focal configuration.

Our stability notion requires that the incumbents outperform all mutants in all configurations that are focal relative to the initial configuration.

(Footnote: Sandholm (2001) and Mohlin (2010) are exceptions.)

Definition 6. A configuration (μ, b) is a neutrally stable configuration (NSC) if, for every μ′ ∈ Δ(Θ), there is some ε̄ ∈ (0,
1) such that for all ε ∈ (0, ε̄), it holds that if (μ̃, b̃), where μ̃ = (1 − ε) · μ + ε · μ′, is a focal configuration, then μ is an NSS in the type game Γ_(μ̃,b̃). The configuration (μ, b) is an evolutionarily stable configuration (ESC) if the same conditions imply that μ is an ESS in the type game Γ_(μ̃,b̃) for each μ′ ≠ μ.

We conclude this section by discussing a few issues related to our notion of stability.

1. In line with existing notions of evolutionary stability in the literature (in particular, the notions of Dekel, Ely, and Yilankaya, 2007, and Alger and Weibull, 2013), we require the mutants to be outperformed in all focal configurations (rather than requiring them to be outperformed in at least one focal configuration). This reflects the assumption that the population converges to a new post-entry equilibrium in a decentralised (possibly random) way that may lead to any of the post-entry focal configurations. Thus the incumbents cannot coordinate their post-entry play on a specific focal configuration that favours them.

2. In order to be consistent with the standard definition of neutral stability, we require the incumbents to earn weakly more than the average payoff of the mutants. We note that all of our results remain the same if one uses an alternative, weaker definition that requires the incumbents to earn weakly more than the worst-performing mutant.

3. The main stability notion that we use in the paper is NSC. The stronger notion of ESC is not useful in our main model because there always exist equivalent types that have slightly different preferences (as the set of preferences is a continuum) and induce the same behaviour as the incumbents. Such mutants always achieve the same fitness as the incumbents in post-entry configurations, and thus ESCs never exist. Note that the stability notions in Dekel, Ely, and Yilankaya (2007) and Alger and Weibull (2013) are also based on neutral stability.
In Section B we study a variant of the model in which preferences may depend also on the opponent's type. This allows for the existence of ESCs.

4. Observe that Definition 6 implies internal stability with respect to small perturbations in the frequencies of the incumbent types (because when μ′ = μ, then μ is required to be an NSS in Γ_(C(μ),b)). By standard arguments, internal stability implies that any NSC is balanced: all incumbent types obtain the same fitness.

5. The stability notions of Dekel, Ely, and Yilankaya (2007) and Alger and Weibull (2013) consider only monomorphic groups of mutants (i.e. mutants all having the same type). We additionally consider stability against polymorphic groups of mutants (as do Herold and Kuzmics,

(Footnote: In their stability analysis of homo hamiltonensis preferences, Alger and Weibull (2013) disregard mutants who are behaviourally indistinguishable from homo hamiltonensis upon entry.)
Define the deviation gain of action a ∈ A, denoted by g(a) ∈ R_+, as the maximal gain a player can get by playing a different action in a population in which everyone plays a:

g(a) = max_{a′ ∈ A} π(a′, a) − π(a, a).

Note that g(a) = 0 iff (a, a) is a Nash equilibrium.

Define the effective cost of deception in the environment, denoted by c ∈ R_+, as the minimal ratio between the cognitive cost and the probability of deceiving an opponent of cognitive level one:
c = min_{n ≥ 2} k_n / q(n, 1).

We say that a strategy profile is efficient if it maximises the sum of fitness payoffs. Formally:
Definition 7.
A strategy profile (σ, σ′) is efficient in the game G = (A, π) if π(σ, σ′) + π(σ′, σ) ≥ π(a, a′) + π(a′, a) for each action profile (a, a′) ∈ A × A.

Note that our notion of efficiency is defined: (1) with respect to the fitness payoff (rather than the agents' subjective payoffs), similarly to the analogous definition of efficiency in Dekel, Ely, and Yilankaya (2007), and (2) with respect to the strategy profile played by the agents; by contrast, the definition does not take into account the cognitive costs.

A pure Nash equilibrium (a, a) is strict if π(a, a) > π(a′, a) for all a′ ≠ a ∈ A. Let π̂ = max_{a,a′ ∈ A} 0.5 · (π(a, a′) + π(a′, a)) denote the efficient payoff, i.e. the average payoff achieved by players who play an efficient profile.

(Footnote: The minimum in the definition of c is well defined for the following reason. Let n̂ be a number such that k_n̂ > k_2 / q(2, 1) (such a number exists because lim_{n→∞} k_n = ∞). Observe that k_n / q(n, 1) ≥ k_n > k_2 / q(2, 1) for any n ≥ n̂. This implies that there is an n̄ such that 2 ≤ n̄ ≤ n̂ and n̄ = arg min_{n ≥ 2} k_n / q(n, 1).)

(Footnote: We define the effective cost of deception only with respect to an opponent with a cognitive level of one because we later show (Lemma 1 and Theorem 2) that the only candidate to be an NSC is a configuration in which all agents have a cognitive level of one, and such a configuration is an NSC iff the effective cost of deception against these incumbents with n = 1 is sufficiently large.)
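The quantities g(a), π̂, and c are directly computable. The following Python sketch does so for a made-up two-action game, with hypothetical cost and deception-probability functions k(n) and q(n, n′); the model only requires that k be increasing and unbounded and that q(n, n′) be positive only for n > n′, so these functional forms are illustrative assumptions.

```python
# Sketch of the quantities defined above, for a made-up two-action game:
# the deviation gain g(a), the efficient payoff pi_hat, and the effective
# cost of deception c = min over n >= 2 of k(n) / q(n, 1).

pi = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
actions = ["C", "D"]

def g(a):
    """Deviation gain: maximal gain from a unilateral deviation when
    everyone else plays a. Equals 0 iff (a, a) is a Nash equilibrium."""
    return max(pi[(b, a)] for b in actions) - pi[(a, a)]

def pi_hat():
    """Efficient payoff: half the maximal total fitness over action profiles."""
    return max(0.5 * (pi[(a, b)] + pi[(b, a)]) for a in actions for b in actions)

def k(n):
    """Hypothetical cognitive-cost function (increasing and unbounded in n)."""
    return 0.6 * (n - 1)

def q(n, n_prime):
    """Hypothetical deception probability (positive only when n > n_prime)."""
    return 1 - 2 ** (n_prime - n) if n > n_prime else 0.0

def effective_cost(n_max=50):
    """c = min_{n >= 2} k(n) / q(n, 1); a finite search suffices here
    because k(n) / q(n, 1) >= k(n), which tends to infinity."""
    return min(k(n) / q(n, 1) for n in range(2, n_max + 1))
```

With these assumptions g("C") = 1, π̂ = 3, and c = k(2) / q(2, 1) = 1.2.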
An action a is a punishment action if playing it guarantees that the opponent will obtain less than the efficient payoff, i.e. π(a′, a) < π̂ for each a′ ∈ A. Some of our results below assume that the underlying game admits a punishment action.

Remark 2. Many economic interactions admit punishment actions. A few examples include:

1. Price competition (Bertrand), either for a homogeneous good or for differentiated goods, where a punishment action is setting the price equal to zero.

2. Quantity competition (Cournot), either for a homogeneous good or for differentiated goods, where the punishment action is “flooding” the market.

3. Public good games, where contributing nothing to the public good is the punishment action.

4. Bargaining situations, where the punishment action is for one side of the bargaining to insist on obtaining all the surplus.

5. Any game that admits an action profile that Pareto dominates all other action profiles (i.e., games with common interests).

Moreover, if one adds to any underlying generic game a new pure action that is equivalent to playing the mixed action that min-maxes the opponent's payoff (e.g., in matching pennies this new action is equivalent to privately tossing a coin and then playing according to the toss's outcome), then this newly added action is always a punishment action.

Given a configuration (μ, b), let n̄ = max_{θ ∈ C(μ)} n_θ denote the maximal cognitive level of the incumbents. We refer to incumbents with this cognitive level as the highest types.

A deception equilibrium is fitness maximising if it maximises the fitness of the higher type in the match (under the restriction that the lower type plays an action that is not dominated, given her preferences). Formally:

Definition 8.
Let θ, θ′ be types with n_θ > n_θ′. A deception equilibrium (σ̃, σ̃′) is fitness maximising if

(σ̃, σ̃′) ∈ arg max_{σ ∈ Δ(A), σ′ ∈ Σ(u_θ′)} π(σ, σ′).

Let
FMDE(θ, θ′) ⊆ DE(θ, θ′) denote the set of all such fitness-maximising deception equilibria of two types θ, θ′ with n_θ > n_θ′. In principle, FMDE(θ, θ′) might be an empty set (if there is no action profile that maximises both the fitness and the subjective utility of the higher type). Our first result (Theorem 1 below) implies that the preferences of the higher type in any NSC are such that the set FMDE(θ, θ′) is non-empty.

A configuration is pure if everyone plays the same action. Formally:

Definition 9. A configuration (μ, b) is pure if there exists a* ∈ A such that for each θ, θ′ ∈ C(μ) it holds that b^N_θ(θ′) = a* whenever q(n_θ, n_θ′) + q(n_θ′, n_θ) <
1, and b^D_θ(θ′) = a* whenever q(n_θ, n_θ′) > 0. In this case we denote the configuration by (μ, a*), and we refer to b ≡ a* as the outcome of the configuration.

In order to simplify the notation and the arguments in the proofs, we assume throughout this section that the underlying game admits at least three actions (i.e. |A| ≥ 3).

In this section we characterise the behaviour of an incumbent type, θ̄ = (u, n̄), which has the highest level of cognition in the population. We show that the behaviour satisfies the following three conditions:

1. Type θ̄ plays an efficient action profile when meeting itself.

2. Type θ̄ maximises its fitness in all deception equilibria.

3. Any opponent with a lower cognitive level achieves at most the efficient payoff π̂ against type θ̄.

Theorem 1.
Let (μ*, b*) be an NSC, and let θ, θ̄ ∈ C(μ*). Then: (1) if n_θ̄ = n̄ then π(θ̄, θ̄) = π̂; (2) if n_θ < n_θ̄ = n̄ then (b^D_θ̄(θ), b^D_θ(θ̄)) ∈ FMDE(θ̄, θ); and (3) if n_θ < n_θ̄ = n̄ then π(θ, θ̄) ≤ π̂.

Proof Sketch (formal proof in Appendix A.2). The proof utilises mutants (denoted by θ_1, θ_2, θ_3, and θ̂ below) with the highest cognitive level n̄ and with a specific kind of utility function, called indifferent and pro-generous, that makes a player indifferent between all her own actions, but which makes the player prefer that the opponent choose an action that allows the player to obtain the highest possible fitness payoff.

To prove part 1 of the theorem, assume to the contrary that π(b_θ̄(θ̄), b_θ̄(θ̄)) < π̂. Let a_1, a_2 ∈ A be any two actions such that (a_1, a_2) is an efficient action profile (i.e. 0.5 · (π(a_1, a_2) + π(a_2, a_1)) = π̂). Consider three different mutant types θ_1, θ_2, and θ_3, which are of the highest cognitive level that is present in the population, and have indifferent and pro-generous utility functions. Suppose that mutants of these three types enter the population.

(Footnote: For tractability we assume that a configuration can have only finite support. Note, however, that there is some sufficiently high cognitive level n such that k_n > max_{a,a′ ∈ A} π(a, a′). As a result, any NSC must include only a finite number of cognitive levels, even without the finite-support assumption.)
There is a focal post-entry configuration in which the incumbents keep playing their pre-entry play among themselves, the mutants play the same Nash equilibria as the incumbent θ̄ against all incumbent types (and the incumbents behave against the mutants in the same way they behave against θ̄), the mutants play fitness-maximising deception equilibria against all lower types, when mutants of type θ_i are matched with mutants of type θ_{(i+1) mod 3} they play the efficient profile (a_1, a_2), and when two mutants of the same type are matched they play the same way as two incumbents of type θ̄ that are matched together. In such a focal post-entry configuration all mutants earn a weakly higher fitness than θ̄ against the incumbents, and a strictly higher fitness against the mutants. This implies that (μ*, b*) cannot be an NSC.

To prove part 2, assume to the contrary that (b^D_θ̄(θ), b^D_θ(θ̄)) ∉ FMDE(θ̄, θ). Suppose mutants of type θ̂ enter. Consider a post-entry configuration in which the incumbents keep playing their pre-entry play among themselves, and the mutants mimic the play of θ̄, except that they play fitness-maximising deception equilibria against all lower types. The mutants obtain a weakly higher payoff than θ̄ against all types, and a strictly higher payoff than θ̄ against at least one lower type. Thus (μ*, b*) cannot be an NSC.

To prove part 3, assume to the contrary that π(θ, θ̄) > π̂. This implies that against type θ̄, type θ earns more than π̂ in either the deception equilibrium or the Nash equilibrium. Suppose mutants of type θ̂ enter.
Consider a post-entry configuration in which the incumbents keep playing their pre-entry play among themselves, while the mutants: (i) play fitness-maximising deception equilibria against lower types, (ii) mimic type θ's play in the Nash/deception equilibrium against type θ̄ in which θ earns more than π̂, and (iii) mimic the play of θ̄ in all other interactions. The type θ̂ mutants earn strictly more than θ̄ against both θ̂ and θ̄. The mutants earn weakly more than θ̄ against all other types. This implies that (μ*, b*) cannot be an NSC.

Remark 3. The first part of Theorem 1 (a highest type must play an efficient strategy profile when meeting itself) is similar to Dekel, Ely, and Yilankaya's (2007) Proposition 2, which shows that only efficient outcomes can be stable in a setup with perfect observability and no deception. We should note that Dekel, Ely, and Yilankaya (2007) use a weaker notion of efficiency. An action is efficient in the sense of Dekel, Ely, and Yilankaya (2007) (DEY-efficient) if its fitness is the highest among the symmetric strategy profiles (i.e. action a is DEY-efficient if π(a, a) ≥ π(σ, σ) for all strategies σ ∈ Δ(A)). Observe that our notion of efficiency (Definition 7) implies DEY-efficiency, but the converse is not necessarily true. The weaker notion of DEY-efficiency is the relevant one in the setup of Dekel, Ely, and Yilankaya (2007), because they consider only monomorphic groups of mutants.

(Footnote: One must have at least two different types of mutants, in order for the mutants to be able to play the asymmetric profile (a_1, a_2). We present a construction with three different mutant types in order to allow all mutant types to outperform the incumbents (one can also prove the result using a construction with only two different mutant types, but in this case one can only guarantee that the mutants, on average, would outperform the incumbents).)

(Footnote: If i = 1 (resp., i = 2, i = 3), then θ_{(i+1) mod 3} = θ_2 (resp., θ_3, θ_1).)
Corollary 1. If G does not have an efficient profile that is symmetric (i.e. if π(a, a) < π̂ for each a ∈ A), then the game does not admit an NSC.

Remark 4. As discussed in Remark 1, any interaction (symmetric or asymmetric) can be embedded in a larger, symmetric game in which nature first randomly assigns roles to the players, and then each player chooses an action given his assigned role. Observe that such an embedded game always admits an efficient symmetric action profile. In particular, if the efficient asymmetric profile in the original game is (a, a′), then the efficient symmetric profile in the embedded game is the one in which each player plays a as the row player and a′ as the column player.

In this subsection we characterise pure NSCs, i.e. stable configurations in which everyone plays the same pure action in every match. Such a configuration may be viewed as representing the state of a population that has settled on a convention that there is a unique correct way to behave. We begin by showing that in a pure NSC all incumbents have the minimal cognitive level, since having a higher ability does not yield any advantage when everyone plays the same action.
Lemma 1. If (μ, a*) is an NSC, and (u, n) ∈ C(μ), then n = 1.

Proof. Since all players earn the same game payoff of π(a*, a*), they must also incur the same cognitive cost, or else the fitness of the different incumbent types would not be balanced (which would contradict the assumption that (μ, a*) is an NSC). Moreover, this uniform cognitive level must be level 1. Otherwise a mutant of a lower level, who strictly prefers to play a* against all actions, would strictly outperform the incumbents in nearby post-entry focal configurations.

The following proposition shows that a pure outcome is stable iff it is efficient and its deviation gain is smaller than the effective cost of deception. Formally:

Proposition 1.
Let a* be an action in a game that admits a punishment action. The following two statements are equivalent:

(a) There exists a type distribution μ such that (μ, a*) is an NSC.

(b) a* satisfies the following two conditions: (1) π(a*, a*) = π̂, and (2) g(a*) ≤ c.

(Footnote: If the original game is symmetric, the role (i.e. being either the row or the column player) can be interpreted as reflecting some observable payoff-irrelevant asymmetry between the two players.)

Proof. 1. “If” side. Assume that (a*, a*) is an efficient profile and that g(a*) ≤ c. Let ã be a punishment action. Consider a monomorphic configuration (μ, a*) consisting of type θ* = (u*,
1), where all incumbents are of cognitive level 1 and of the same preference type u*, according to which all actions except a* and ã are strictly dominated, ã weakly dominates a*, and a* is a best reply to itself:

u*(a, a′) = 2 if a = ã and a′ ≠ a*; u*(a, a′) = 1 if a = a* or (a = ã and a′ = a*); and u*(a, a′) = 0 otherwise.

First, consider mutants with cognitive level 1. Observe that such mutants earn at most π(a*, a*) = π̂ when they are matched with the incumbents, and strictly less than π̂ if the mutants play any action a ≠ a* with positive probability against the incumbents. Further observe that the mutants can earn (on average) at most π̂ when they are matched with other mutants (because π̂ is the efficient payoff). This implies that incumbents weakly outperform any mutants with cognitive level one in any post-entry population.

Next, consider mutants with a higher cognitive level n >
1. Such mutants can earn at most π̂ + g(a*) when they deceive the incumbents, and at most π̂ when they do not deceive the incumbents (recall that π(ã, ã) + g(ã) = max_{a′} π(a′, ã) < π̂ because ã is a punishment action). Thus the mutants are weakly outperformed by the incumbents if

q(n, 1) · (g(a*) + π̂) + (1 − q(n, 1)) · π̂ − k_n ≤ π̂ ⇔ g(a*) ≤ k_n / q(n, 1).

This holds for all n if g(a*) ≤ c. Indeed, if g(a*) ≤ c, then the probability of deceiving the incumbents is at most k_n / g(a*), which implies that the average payoff of the mutants against the incumbents is at most π̂ + g(a*) · (k_n / g(a*)) = π̂ + k_n; thus, if the mutants are sufficiently rare, they are weakly outperformed (due to paying the cognitive cost of k_n). We conclude that (μ, a*) is an NSC.

2. “Only if” side. Assume that (μ, a*) is an NSC. Theorem 1 implies that π(a*, a*) = π̂. Assume that g(a*) > c. The definition of the effective cost of deception implies that there exists a cognitive level n such that k_n / q(n, 1) < g(a*). Lemma 1 implies that all the incumbents have cognitive level 1. Consider mutants with cognitive level n and completely indifferent preferences (i.e. preferences that induce indifference between all action profiles). Let a′ be a best reply against a*. There is a post-entry focal configuration in which (i) the incumbents play a* against the mutants, (ii) the mutants play a′ when they deceive an incumbent opponent and a* when they do not deceive an incumbent opponent, and (iii) the mutants play a* when they are matched with another mutant. Note that the mutants achieve at least π̂ + g(a*) · q(n,
1) when they are matched against the incumbents. The gain relative to incumbents, g(a*) · q(n, 1), is larger than the cognitive cost k_n, by our assumption that g(a*) > c. Thus the mutants strictly outperform the incumbents.
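Proposition 1 reduces the stability of a pure outcome to two finite checks. The sketch below verifies conditions (1) and (2) for a toy game; the payoffs, the cognitive-cost function, and the deception probabilities are all illustrative assumptions, not the paper's calibration.

```python
# Numerical check of the two conditions in Proposition 1 for a pure outcome
# a*: (1) pi(a*, a*) = pi_hat, and (2) g(a*) <= c. All parameters are made up.

pi = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
actions = ["C", "D"]
k = lambda n: 0.6 * (n - 1)                          # hypothetical cost
q = lambda n, m: 1 - 2 ** (m - n) if n > m else 0.0  # hypothetical deception

pi_hat = max(0.5 * (pi[(a, b)] + pi[(b, a)]) for a in actions for b in actions)
c = min(k(n) / q(n, 1) for n in range(2, 51))        # effective cost of deception

def stable_pure_outcome(a_star):
    """True iff a* is efficient and its deviation gain is at most c."""
    gain = max(pi[(b, a_star)] for b in actions) - pi[(a_star, a_star)]
    return pi[(a_star, a_star)] == pi_hat and gain <= c
```

Here ("C", "C") is the unique efficient symmetric profile and its deviation gain, 1, is below c = 1.2, so only "C" can be the outcome of a pure NSC; if the deviation gain were raised above c (e.g., by increasing π(D, C)), no pure NSC would exist.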
In this section we characterise NSCs in generic games, by which we mean games in which any two different action profiles each give the same player a different payoff, and each yield a different sum of payoffs.
Definition 10.
A (symmetric) game is generic if for each a, a′, b, b′ ∈ A, {a, a′} ≠ {b, b′} implies π(a, a′) ≠ π(b, b′), and π(a, a′) + π(a′, a) ≠ π(b, b′) + π(b′, b).

For example, if the entries of the payoff matrix π are drawn independently from a continuous distribution on an open subset of the real numbers, then the induced game is generic with probability one.

Note that a generic game admits at most one efficient action profile. From Corollary 1 we know that if the game does not have a symmetric efficient profile then it does not admit any NSC (and, as discussed in Remark 4, essentially every interaction admits a symmetric efficient profile). Hence we can restrict attention to games with exactly one efficient action profile. Let ā denote the action played in this unique profile.

Next we present our main result: all incumbent types play efficiently in any NSC of a generic game.

Theorem 2. If (μ*, b*) is an NSC of a generic game with a (unique) efficient outcome (ā, ā), then b* ≡ ā for all θ, θ′ ∈ C(μ*); i.e. all types play the pure action ā in all matches.

Proof. Assume to the contrary that configuration (μ*, b*) is an NSC such that there are some θ, θ′ ∈ C(μ*) such that b^N_θ(θ′) ≠ ā and q(n_θ, n_θ′) + q(n_θ′, n_θ) <
1, or b^D_θ(θ′) ≠ ā and q(n_θ, n_θ′) > 0. Let θ̊ be the type with the highest cognitive level among the types that satisfy at least one of the following conditions:

(A) θ̊ plays inefficiently against itself, i.e. π(θ̊, θ̊) < π̂.

(B) θ̊ and an opponent with a weakly higher type play an inefficient strategy profile, i.e. 0.5 · (π(θ̊, θ′) + π(θ′, θ̊)) < π̂ for some θ′ ≠ θ̊ with n_θ̊ ≤ n_θ′.

(C) A strictly lower type earns strictly more than π̂ against θ̊, i.e. π(θ″, θ̊) > π̂ for some θ″ ≠ θ̊ with n_θ̊ > n_θ″.

We will now successively rule out each of these cases.

Assume first that (A) holds. Let û be a utility function that is identical to u_θ̊ except that: (i) the payoff of the outcome (ā, ā) is increased by the minimal amount required to make it a best reply to itself, and (ii) the payoff of some other outcome is altered slightly (to ensure that û is not already an incumbent) in a way that does not change the behaviour of agents. (The formal definition of û is provided in Appendix A.3.) Suppose that mutants of type θ̂ = (û, n_θ̊) invade the population. Consider a focal post-entry configuration in which the mutants mimic the play of the type θ̊ incumbents in all matches except that: (i) the mutants play the efficient profile (ā, ā) among themselves (which yields a higher payoff than what θ̊ achieves when matched against θ̊), and (ii) when the mutants face a higher type they play either (ā, ā) or the same deception/Nash equilibrium that the higher types play against θ̊. It follows that the mutants θ̂ earn a strictly higher payoff than θ̊ against θ̂, and a weakly higher fitness than type θ̊ against all other types. Thus the mutants strictly outperform the incumbents, which contradicts the assumption that (μ*, b*) is an NSC.
The full technical details of this argument are given in Appendix A.3.

Next, assume that case (B) holds and that case (A) does not hold. This implies that

0.5 · (π(θ̊, θ′) + π(θ′, θ̊)) < π̂ = π(θ̊, θ̊) = π(θ′, θ′).

That is, in the subpopulation that includes types θ̊ and θ′, the within-type matchings yield higher payoffs (π̂) than the out-group matchings (an average payoff of less than π̂). The following formal argument shows that this property implies dynamic instability. The fact that (µ∗, b∗) is an NSC implies that µ∗ is an NSS in the type game Γ(µ∗, b∗). Let B be the payoff matrix of the type game Γ(µ∗, b∗) and let n = |C(µ∗)|. It is well known (e.g., Hofbauer and Sigmund, 1988, Exercise 6.4.3, and Hofbauer, 2011, pp. 1–2) that an interior Nash equilibrium of a normal-form game is an NSS if and only if the payoff matrix is negative semi-definite with respect to the tangent space, i.e. if and only if xᵀBx ≤ 0 for all x ∈ Rⁿ such that Σᵢ xᵢ = 0. Assume without loss of generality that type θ̊ (θ′) is represented by the j-th (k-th) row of the matrix B. Let the column vector x be defined as follows: x(j) = 1, x(k) = −1, and x(i) = 0 for each i ∉ {j, k}. That is, the vector x has all entries equal to zero, except for the j-th entry, which is equal to 1, and the k-th entry, which is equal to −1. We have

xᵀBx = B_jj − B_kj − B_jk + B_kk
     = π(ā, ā) − k_{n_θ̊} + π(ā, ā) − k_{n_θ′} − (π(b_θ̊(θ′), b_θ′(θ̊)) − k_{n_θ̊} + π(b_θ′(θ̊), b_θ̊(θ′)) − k_{n_θ′})
     = 2 · π(ā, ā) − (π(b_θ̊(θ′), b_θ′(θ̊)) + π(b_θ′(θ̊), b_θ̊(θ′))) > 0.

Thus B is not negative semi-definite with respect to the tangent space, which contradicts the fact that µ∗ is an NSS.

Finally, assume that only case (C) holds. Let θ̄ be an incumbent type with the highest cognitive level. The fact that case (B) does not hold implies that π(θ̄, θ̊) = π(θ̊, θ̄) = π̂. The fact that case (C) holds implies that π(θ″, θ̊) > π̂, which implies that type θ̊ has an undominated action that can yield a deceiving opponent a payoff of more than π̂ in a deception equilibrium. This contradicts part (2) of Theorem 1, according to which we should have (b^D_θ̄(θ̊), b^D_θ̊(θ̄)) ∈ FMDE(θ̄, θ̊).

We have shown that no type in the population satisfies (A), (B), or (C). The fact that no type satisfies (A) implies that in any match between agents of the same type both agents play action ā, and the fact that no type satisfies (B) implies that in any match between two agents of different types both players play action ā.

Combining the results of this section with the above characterisation of pure NSCs yields the following corollary, which fully characterises the NSCs of generic games that admit punishment actions (as discussed in Remark 2, such actions exist in many economic interactions). The result shows that such games admit an NSC iff the deviation gain from the pure efficient symmetric profile is smaller than the effective cost of deception, and when an NSC exists, its outcome is the pure efficient symmetric profile.
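The tangent-space test used in case (B) above can be checked numerically. The sketch below uses a hypothetical 2×2 type-game payoff matrix B (efficient payoff π(ā, ā) = 3, out-group fitness payoffs of 1, and illustrative cognitive costs; all numbers are assumptions for illustration) and verifies that xᵀBx = 2·π(ā, ā) − (sum of out-group payoffs) > 0 for the proof's tangent vector, so that B fails negative semi-definiteness with respect to the tangent space.

```python
# Hypothetical type-game payoff matrix B for two incumbent types j and k.
# Diagonal entries: the efficient payoff pi(a_bar, a_bar) minus the type's
# cognitive cost; off-diagonal entries: an inefficient out-group profile.
k_j, k_k = 0.1, 0.2                  # illustrative cognitive costs
pi_eff = 3.0                         # efficient payoff pi(a_bar, a_bar)
pi_jk, pi_kj = 1.0, 1.0              # inefficient out-group fitness payoffs
B = [[pi_eff - k_j, pi_jk - k_j],    # payoffs of type j against (j, k)
     [pi_kj - k_k, pi_eff - k_k]]    # payoffs of type k against (j, k)

# Tangent vector from the proof: +1 on type j, -1 on type k (entries sum to 0).
x = [1.0, -1.0]
value = sum(x[i] * B[i][j] * x[j] for i in range(2) for j in range(2))

# The cognitive costs cancel: value = 2*pi_eff - (pi_jk + pi_kj) = 4 > 0,
# so B is not negative semi-definite on the tangent space.
print(value)
```

The positive value replicates the proof's conclusion: whenever the out-group matches are inefficient on average, this tangent direction yields xᵀBx > 0.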
In particular, in any game that admits an efficient symmetric pure Nash equilibrium, this equilibrium is the unique NSC outcome; and in the Prisoner's Dilemma, mutual cooperation is the unique NSC outcome iff the gain from defecting against a cooperator is less than the effective cost of deception, and no NSC exists otherwise.

Corollary 2.
Let G be a generic game that admits a punishment action. The environment admits an NSC iff there exists an efficient symmetric pure profile (a∗, a∗) satisfying g(a∗) ≤ c (i.e. the deviation gain is smaller than the effective cost of deception). Moreover, if (µ, b) is an NSC, then b ≡ a∗, and n = 1 for all (u, n) ∈ C(µ).

Remark. Corollary 2 shows that generic games do not admit NSCs if the effective cost of deception is less than the deviation gain of the efficient profile. In such cases the distribution of types and their induced behaviour will not converge to a static population state. We leave the formal analysis of environments that do not admit NSCs to future research. One conjecture for the dynamic behaviour in such environments is a never-ending cycle between states in which almost all agents are naive and play an efficient action profile, and states in which different cognitive levels coexist and agents play inefficient action profiles (see the related analysis of cyclic behaviour in the Prisoner's Dilemma with cheap talk and material preferences in Wiseman and Yilankaya, 2001).
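The threshold in Corollary 2 can be made concrete with a small numerical sketch. The Prisoner's Dilemma payoffs below (mutual cooperation 3, temptation 4, mutual defection 1) and the values of the effective cost of deception c are hypothetical numbers chosen for illustration, not taken from the paper.

```python
# Illustrative check of the Corollary 2 condition for a Prisoner's Dilemma.
# pi[(a, a_opp)] is the fitness of a player choosing a against a_opp.
pi = {('C', 'C'): 3.0, ('C', 'D'): 0.0,
      ('D', 'C'): 4.0, ('D', 'D'): 1.0}

a_star = 'C'  # the efficient symmetric profile is (C, C)

# Deviation gain g(a*): best unilateral deviation payoff minus the efficient payoff.
g = max(pi[(a, a_star)] for a in 'CD') - pi[(a_star, a_star)]  # = 4 - 3 = 1

def admits_nsc(g, c):
    """NSC exists iff the deviation gain is at most the effective cost of deception."""
    return g <= c

print(g)                      # deviation gain of the efficient profile
print(admits_nsc(g, c=2.0))   # True: mutual cooperation is the unique NSC outcome
print(admits_nsc(g, c=0.5))   # False: no NSC exists
```

With a high effective cost of deception the efficient profile is stable even though it is not a Nash equilibrium of the underlying game; with a low cost, no NSC exists.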
Remark. Corollary 2 states that in an NSC of a generic game everyone has the same cognitive level. One may wonder how this relates to the apparent cognitive heterogeneity in the real world. Our analysis in this paper assumes a single underlying game, while in reality we face a potentially infinite set of games. If an individual's fitness is the result of interactions in a set of games that includes generic games with an NSC as well as non-generic games (see Section 3.5) or generic games that do not admit any NSC (see the previous remark), then evolution may lead to states in which different cognitive levels coexist, possibly with a never-ending cycle between states with different mixtures of cognitive levels.
Remark. Corollary 2 assumes that the underlying game admits a punishment action ã that gives an opponent a payoff strictly smaller than the efficient payoff π̂, regardless of the opponent's play. This punishment action is used in the construction of the NSC that induces the efficient action a∗. Specifically, a non-deceived incumbent plays the punishment action ã against any mutant who does not always play action a∗. If the game does not admit a punishment action, then (1) a complicated game-specific construction of the way in which incumbents behave against mutants who do not always play a∗ may be required to support the efficient action as the outcome of an NSC, and (2) this construction may require further restrictions on the effective cost of deception, in addition to g(a∗) ≤ c. We leave the study of these issues to future research.

The previous two subsections fully characterise (i) pure NSCs and (ii) NSCs in generic games. In this section we analyse non-pure NSCs in non-generic games. Non-generic games may be of interest in various setups, such as: (1) normal-form representations of generic extensive-form games (the induced matrix is typically non-generic), and (2) interesting families of games, such as zero-sum games. Unlike generic games, non-generic games can admit NSCs that are not pure and that may therefore contain multiple cognitive levels. To demonstrate this we consider the Rock-Paper-Scissors game, with the following payoff matrix:

        R        P        S
R      0, 0    −1, 1    1, −1
P     1, −1     0, 0    −1, 1
S     −1, 1    1, −1     0, 0

To simplify the analysis and the notation, we assume in this subsection that a player always succeeds in deceiving an opponent with a lower cognitive level, i.e. that q(n, n′) = 1 whenever n > n′.
The analysis can be extended to the more general setup.

The result below shows that, under mild assumptions on the cognitive cost function, this game admits an NSC in which all players have the same materialistic preferences, but players of different cognitive levels coexist, and non-Nash profiles are played in all matches between two individuals of different cognitive levels. More precisely, when individuals of different cognitive levels meet, the higher-level individual deceives the lower-level individual into taking a pure action that the higher-level individual then best-replies to. Thus the higher-level individual earns 1 and her opponent earns −1. Individuals of the same cognitive level play the unique Nash equilibrium. This means that higher-level types will obtain a payoff of 1 more often than lower-level types, and lower-level types will obtain a payoff of −1 more often than higher-level types.

For the construction presented in this subsection to work, the underlying game must be non-generic. Observe that if one slightly perturbs the payoffs of the Rock-Paper-Scissors game to make it a strictly competitive almost-zero-sum generic game, then Corollary 2 applies, and the only candidate to be an NSC is a configuration in which all agents have cognitive level one, and they all play an efficient action profile.

Proposition 2.
Let G be the Rock-Paper-Scissors game. Let u_π denote the (materialistic) preferences such that u_π(a, a′) = π(a, a′) for all profiles (a, a′). Assume that q(n, n′) = 1 whenever n > n′. Further assume that the marginal cognitive cost is small but non-vanishing, so that (a) there is an N such that k_N ≤ 1 < k_{N+1}, and (b) 1 > k_{n+1} − k_n for all n ≤ N. Under these assumptions there exists an NSC (µ∗, b∗) such that C(µ∗) ⊆ {(u_π, n)}_{n=1}^N, and µ∗ is mixed (i.e. |C(µ∗)| > 1). The behaviour of the incumbent types is as follows: if the individuals in a match are of different cognitive levels, then the higher level plays Paper and the lower level plays Rock; if both individuals in a match are of the same cognitive level, then they both play the unique Nash equilibrium (i.e. randomise uniformly over the three actions).

Appendix C contains a formal proof of this result and relates it to a similar construction in Conlisk (2001).

Our next result gives a lower bound on the fitness obtained in NSCs. Let M be the pure maxmin value of the underlying game:

M = max_{a∈A} min_{a′∈A} π(a, a′).

The pure maxmin value M is the minimal fitness payoff a player can guarantee herself in the sequential game in which she plays first, and the opponent replies in an arbitrary way (i.e. not necessarily maximising the opponent's fitness).

Proposition 3 shows that the pure maxmin value is a lower bound on the fitness payoff obtained in an NSC. The intuition is that if the payoff is lower, then a mutant of cognitive level 1, with preferences such that the maxmin action a_M is dominant, will outperform the incumbents.

Proposition 3. If (µ∗, b∗) is an NSC then Π(µ∗, b∗) ≥ M.

Proof. Assume to the contrary that Π(µ∗, b∗) < M. Let a_M be a maxmin action of a player, which guarantees that the player's payoff is at least M, i.e. a_M ∈ arg max_{a∈A} min_{a′∈A} π(a, a′). Let u_{a_M} be the preferences in which the player obtains a payoff of 1 if she plays a_M and a payoff of 0 otherwise. Consider a monomorphic group of mutants with type (u_{a_M}, 1). The fact that a_M is a maxmin action implies that Π_{(u_{a_M},1)}(μ̃, b̃) ≥ M in any post-entry configuration. Furthermore, due to continuity it holds that Π_θ(μ̃, b̃) < M for any θ ∈ C(µ∗) in all sufficiently close focal post-entry configurations. This contradicts the fact that µ∗ is an NSS in Γ(μ̃, b̃), and thus it contradicts the fact that (µ∗, b∗) is an NSC.

We conclude by demonstrating that the lower bound of the maxmin payoff is binding. Specifically, Example 1 shows an NSC in a zero-sum game in which the fitness of the incumbents is arbitrarily close to −1, the lowest feasible payoff in the underlying game (which is equal to the maxmin payoff).

Example 1.
Consider the Rock-Paper-Scissors game described above. Assume that k₂ = 1, that k₃ is sufficiently large, and that q(2, 1) = 1. For each ǫ ∈ (0, 1), consider the following configuration: a fraction ǫ of the agents have cognitive level 1, and the remaining 1 − ǫ of the agents have level 2. The agents' behaviour is as described in Proposition 2, i.e.: (1) an agent of level 2 deceives a level-1 opponent into taking a pure action that the level-2 agent then best-replies to; thus the level-2 agent earns 1 and her opponent earns −1; and (2) individuals of the same cognitive level play the unique Nash equilibrium, and obtain a payoff of zero in the underlying game. When one takes into account the cognitive cost k₂ = 1 of the level-2 agents, this behaviour implies that all incumbents obtain a fitness of ǫ − 1. An argument analogous to the proof of Proposition 2 implies that this configuration is an NSC.
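The fitness bookkeeping in Example 1 can be verified directly. The sketch below assumes, consistently with the example's fitness of ǫ − 1, that the level-1 cognitive cost is normalised to zero and that k₂ = 1; it checks that agents of both cognitive levels obtain the same fitness ǫ − 1.

```python
# Verification of the fitness computation in Example 1 (Rock-Paper-Scissors).
# Assumptions: k_1 = 0 (normalisation), k_2 = 1, and a level-2 agent always
# deceives a level-1 opponent (earning 1 while the deceived opponent earns -1);
# same-level matches play the uniform Nash equilibrium and earn 0.
def fitness(level, eps, k2=1.0):
    p1, p2 = eps, 1.0 - eps        # population shares of levels 1 and 2
    if level == 1:
        return p1 * 0.0 + p2 * (-1.0)     # deceived by every level-2 opponent
    return p1 * 1.0 + p2 * 0.0 - k2       # deceives level-1; pays cognitive cost

eps = 0.1
print(fitness(1, eps), fitness(2, eps))   # both equal eps - 1
```

As ǫ → 0 the common fitness approaches −1, showing that the maxmin lower bound of Proposition 3 is binding in this zero-sum game.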
As mentioned above, our basic model assumes perfect observability, and Nash equilibrium behaviour, in matches without deception. In what follows we briefly describe the results of a robustness check that relaxes the first of these two assumptions. For brevity, we defer the full technical analysis to Appendix D.

Specifically, we follow Dekel, Ely, and Yilankaya (2007) and assume that in matches without deception, each player privately observes the opponent's type with an exogenous probability p, and with the remaining probability observes an uninformative signal. This general model extends both our baseline model (where p = 1) and Dekel, Ely, and Yilankaya's (2007) model (which can be viewed as assuming arbitrarily high deception costs).

The main results of the baseline model (p = 1) show that (1) only efficient profiles can be NSCs, and (2) there exist non-Nash efficient NSCs, provided that the cost of deception is sufficiently large. Our analysis shows that the former result (namely, stability implies efficiency) is robust to the introduction of partial observability: (1) a somewhat weaker notion of efficiency is satisfied by the behaviour of the incumbents with the highest cognitive level in any NSC for any p > 0, and (2) in games such as the Prisoner's Dilemma, we show that only the efficient profile can be the outcome of an NSC.

On the other hand, our analysis shows that our second main result (namely, the stability of non-Nash efficient outcomes) is not robust to the introduction of partial observability. Specifically, we show that non-Nash efficient profiles cannot be NSC outcomes for any p < 1.

In the main text we deal exclusively with preferences that are defined only over action profiles. In what follows we briefly describe how to extend the analysis to interdependent preferences, i.e. preferences that may also depend on the opponent's type. A detailed formal analysis is presented in Appendix B. Herold and Kuzmics (2009) study a similar setup while assuming perfect observability of types among all individuals. Their key result is that any mixed action that gives each player a payoff above her maxmin payoff can be the outcome of a stable configuration.

Our main result for interdependent preferences in our setup shows that a pure configuration is stable essentially iff: (1) all incumbents have the same cognitive level n, (2) the cost of level n is smaller than the difference between the incumbents' (fitness) payoff and the minmax/maxmin value, and (3) the deviation gain is smaller than the effective cost of deception against an opponent with cognitive level n. In particular, if the marginal effective cost of deception is sufficiently small, then only Nash equilibria can be the outcomes of pure stable configurations, while if the effective cost of deceiving some cognitive level n is sufficiently high (while the cost of achieving level n is sufficiently low), then essentially any action profile is the outcome of a pure stable configuration.

Herold and Kuzmics (2009) expand the framework of Dekel, Ely, and Yilankaya (2007) to include interdependent preferences, i.e. preferences that depend on the opponent's preference type.
Under perfect or almost perfect observability, if all preferences that depend on the opponent's type are considered, then any symmetric outcome above the minmax material payoff is evolutionarily stable. In our setting a pure profile also has to be a Nash equilibrium in order to be the sole outcome supported by evolutionarily stable preferences. Herold and Kuzmics (2009) find that non-discriminating preferences (including selfish materialistic preferences) are typically not evolutionarily stable on their own. By contrast, certain preferences that exhibit discrimination are evolutionarily stable. Similarly, evolutionary stability requires the presence of discriminating preferences in our setup as well.
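The minmax/maxmin value that appears both in Proposition 3 and in condition (2) of the interdependent-preferences result is straightforward to compute. A minimal sketch on a hypothetical 3-action payoff matrix (the numbers are illustrative, not taken from the paper):

```python
# Pure maxmin value M = max_a min_{a'} pi(a, a'): the fitness a player can
# guarantee by committing to a single action, whatever the opponent replies.
pi = [
    [2, 0, 1],   # hypothetical fitness payoffs pi(a, a'), rows = own action
    [3, 1, 0],
    [1, 1, 1],
]

def pure_maxmin(payoffs):
    return max(min(row) for row in payoffs)

M = pure_maxmin(pi)
print(M)  # the third action guarantees a payoff of at least 1
```

Applied to the Rock-Paper-Scissors matrix from the previous subsection, `pure_maxmin` returns −1, matching the lower bound that Example 1 shows to be binding.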
We have developed a model in which preferences coevolve with the ability to detect others' preferences and misrepresent one's own preferences. To this end, we have allowed for heterogeneity with respect to costly cognitive ability. The assumption of an exogenously given level of observability of the opponent's preferences, which has characterised the indirect evolutionary approach up until now, is replaced by the Machiavellian notion of deception equilibrium, which endogenously determines what each player observes. Our model assumes a very powerful form of deception. This allows us to derive sharp results that clearly demonstrate the effects of endogenising observation and introducing deception. We think that “Bayesian” deception is an interesting model for future research: each incumbent type is associated with a signal, agents with high cognitive levels can mimic the signals of types with lower cognitive levels, and agents maximise their preferences given the received signals and the correct Bayesian inference about the opponent's type.

In a companion paper (Heller and Mohlin, forthcoming) we study environments in which players are randomly matched, and make inferences about the opponent's type by observing her past behaviour (rather than directly observing her type, as is standard in the “indirect evolutionary approach”). In future research, it would be interesting to combine both approaches and allow the observation of past behaviour to be influenced by deception.

Most papers taking the indirect evolutionary approach study the stability of preferences defined over material outcomes. Moreover, it is common to restrict attention to some parameterised class of such preferences. Since we study preferences defined on the more abstract level of action profiles, we do not make predictions about whether some particular kind of preferences over material outcomes, from a particular family of utility functions, will be stable or not.
It would be interesting to extend our model to such classes of preferences. Furthermore, with preferences defined over material outcomes it would be possible to study the coevolution of preferences and deception not only in isolated games, but also when individuals play many different games using the same preferences. We hope to come back to these questions, and we invite others to employ and modify our framework in these directions.
A Formal Proofs of Theorems 1 and 2
A.1 Preliminaries
This subsection contains notation and definitions that will be used in the following proofs. A generous action is an action such that, if played by the opponent, it allows a player to achieve the maximal fitness payoff. Formally:
Definition 11.
Action a_{g₁} ∈ A is generous if there exists a ∈ A such that π(a, a_{g₁}) ≥ π(a′, a″) for all a′, a″ ∈ A.

Fix a generous action a_{g₁} ∈ A of the game G. A second-best generous action is an action such that, if played by the opponent, it allows a player to achieve the fitness payoff that is maximal under the constraint that the opponent is not allowed to play the generous action a_{g₁}. Formally:

Definition 12.
Action a_{g₂} ∈ A is second-best generous, conditional on a_{g₁} ∈ A being first-best generous, if there exists a ∈ A such that π(a, a_{g₂}) ≥ π(a′, a″) for all a′, a″ ∈ A such that a″ ≠ a_{g₁}.

Fix a generous action a_{g₁} ∈ A, and fix a second-best generous action a_{g₂} ∈ A, conditional on a_{g₁} ∈ A being first-best generous. For each α > β > 0, let u_{α,β} be the following utility function:

u_{α,β}(a, a′) = α if a′ = a_{g₁}; β if a′ = a_{g₂}; and 0 otherwise.

Each u_{α,β} satisfies:

1. Indifference: the utility function depends only on the opponent's action; i.e. the player is indifferent between any two of her own actions.

2. Pro-generosity: the utility is highest if the opponent plays the generous action, second-highest if the opponent plays the second-best generous action, and lowest otherwise.

Let U_GI = {u_{α,β} | α > β > 0} be the family of all such preferences, called pro-generous indifferent preferences. Note that U_GI includes a continuum of different utilities (under the assumption that G includes at least three actions). Thus, for any set of incumbent types, we can always find a utility function in U_GI that does not belong to any of the current incumbents.

A.2 Proof of Theorem 1 (Behaviour of the Highest Types)
A.2.1 Proof of Theorem 1, Part 1
Assume to the contrary that π(b^N_θ̄(θ̄), b^N_θ̄(θ̄)) < π̂. (Note that the definition of π̂ implies that the opposite strict inequality is impossible.) Let a₁, a₂ ∈ A be any two actions such that (a₁, a₂) is an efficient action profile, i.e. 0.5 · (π(a₁, a₂) + π(a₂, a₁)) = π̂. Let θ₁, θ₂, θ₃ be three types that satisfy the following conditions: (1) the types are not incumbents: θ₁, θ₂, θ₃ ∉ C(µ∗); (2) the types have the highest incumbent cognitive level: n_θ₁ = n_θ₂ = n_θ₃ = n̄; and (3) the types have different pro-generous indifferent preferences: u_θ₁, u_θ₂, u_θ₃ ∈ U_GI and u_θᵢ ≠ u_θⱼ for each i ≠ j ∈ {1, 2, 3}. Let µ′ be the distribution that assigns mass 1/3 to each of these types. The post-entry type distribution is μ̃ = (1 − ǫ)·µ∗ + ǫ·µ′. Let the post-entry behaviour policy b̃ be defined as follows:

1. Behaviour among incumbents respects focality: b̃^N_θ(θ′) = b^N_θ(θ′) and b̃^D_θ(θ′) = b^D_θ(θ′) for each incumbent pair θ, θ′ ∈ C(µ∗).

2. The mutants play fitness-maximising deception equilibria against incumbents with lower cognitive levels: (b̃^D_θᵢ(θ′), b̃^D_θ′(θᵢ)) ∈ FMDE(θᵢ, θ′) for each i ∈ {1, 2, 3} and θ′ ∈ C(µ∗) with n_θ′ < n̄. Note that FMDE(θᵢ, θ′) is nonempty in virtue of the construction of U_GI.

3. In matches without deception between mutants and incumbents, the mutants mimic θ̄ and the incumbents play the same way they play against θ̄: (b̃^N_θᵢ(θ′), b̃^N_θ′(θᵢ)) = (b^N_θ̄(θ′), b^N_θ′(θ̄)), for each i ∈ {1, 2, 3} and θ′ ∈ C(µ∗).

4. Two mutants of different types play efficiently when meeting each other: b̃^N_θᵢ(θ_{(i+1) mod 3}) = a₁ and b̃^N_θᵢ(θ_{(i−1) mod 3}) = a₂ for each i ∈ {1, 2, 3}.

5. When two mutants of the same type meet, they play the same way θ̄ plays against itself: b̃^N_θᵢ(θᵢ) = b^N_θ̄(θ̄) for each i ∈ {1, 2, 3}.

In virtue of point 1 the construction (μ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)). By points 2 and 3 each mutant θᵢ earns weakly more than θ̄ against all incumbent types. By points 4 and 5 each mutant earns strictly more than θ̄ against the mutants. In total, the average fitness earned by each mutant is strictly higher than that of θ̄, against a population that follows (μ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the population game Γ(μ̃, b̃). Thus µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS, in Γ(μ̃, b̃), which implies that (µ∗, b∗) is not an NSC.

A.2.2 Proof of Theorem 1, Part 2
Assume to the contrary that (b^D_θ̄(θ), b^D_θ(θ̄)) ∉ FMDE(θ̄, θ). Let θ̂ be a type that satisfies the conditions of: (1) not being an incumbent: θ̂ ∉ C(µ∗), (2) having the highest incumbent cognitive level: n_θ̂ = n̄, and (3) having pro-generous indifferent preferences: u_θ̂ ∈ U_GI. Let µ′ be the distribution that assigns mass one to type θ̂. The post-entry type distribution is μ̃ = (1 − ǫ)·µ∗ + ǫ·µ′. Let the post-entry behaviour policy b̃ be defined as follows:

1. Behaviour among incumbents respects focality: b̃^N_θ(θ′) = b^N_θ(θ′) and b̃^D_θ(θ′) = b^D_θ(θ′) for all θ, θ′ ∈ C(µ∗).

2. In matches with deception between mutants and incumbents, behaviour is such that the mutants maximise their fitness: (b̃^D_θ̂(θ′), b̃^D_θ′(θ̂)) ∈ FMDE(θ̂, θ′) for each θ′ ∈ C(µ∗) with n_θ′ < n̄.

3. In matches without deception between mutants and incumbents, the mutants mimic θ̄ and the incumbents play the same way they play against θ̄: (b̃^N_θ̂(θ′), b̃^N_θ′(θ̂)) = (b^N_θ̄(θ′), b^N_θ′(θ̄)), for each θ′ ∈ C(µ∗).

4. The mutant θ̂ plays against itself the same way θ̄ plays against itself: (b̃^N_θ̂(θ̂), b̃^N_θ̂(θ̂)) = (b̃^N_θ̄(θ̄), b̃^N_θ̄(θ̄)).

Note that (μ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)) and that θ̂ obtains a strictly higher fitness than θ̄ against a population that follows (μ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the population game Γ(μ̃, b̃). Thus µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS, in Γ(μ̃, b̃), which implies that (µ∗, b∗) is not an NSC.

A.2.3 Proof of Theorem 1, Part 3
Assume to the contrary that π(θ, θ̄) > π̂, which immediately implies that π(θ̄, θ) < π̂ and that either π(b^D_θ(θ̄), b^D_θ̄(θ)) > π̂ or π(b^N_θ(θ̄), b^N_θ̄(θ)) > π̂. Let θ̂ be a type that satisfies the conditions of: (1) not being an incumbent: θ̂ ∉ C(µ∗), (2) having the highest incumbent cognitive level: n_θ̂ = n̄, and (3) having pro-generous indifferent preferences: u_θ̂ ∈ U_GI. Let µ′ be the distribution that assigns mass one to type θ̂. The post-entry type distribution is μ̃ = (1 − ǫ)·µ∗ + ǫ·µ′. Let the post-entry behaviour policy b̃ be defined as follows:

1. Behaviour among incumbents respects focality: b̃^N_θ(θ′) = b^N_θ(θ′) and b̃^D_θ(θ′) = b^D_θ(θ′) for all θ, θ′ ∈ C(µ∗).

2. In matches with deception between mutants and incumbents, behaviour is such that the mutants maximise their fitness: (b̃^D_θ̂(θ′), b̃^D_θ′(θ̂)) ∈ FMDE(θ̂, θ′) for each θ′ ∈ C(µ∗) with n_θ′ < n̄.

3. In a match between a mutant θ̂ and the incumbent θ̄, the mutant mimics θ, and the incumbent θ̄ plays the same way it plays against θ: (b̃^N_θ̂(θ̄), b̃^N_θ̄(θ̂)) = (b^N_θ(θ̄), b^N_θ̄(θ)) if π(b^N_θ(θ̄), b^N_θ̄(θ)) > π̂, and (b̃^N_θ̂(θ̄), b̃^N_θ̄(θ̂)) = (b^D_θ(θ̄), b^D_θ̄(θ)) otherwise.

4. The mutant θ̂ plays against itself the same way θ̄ plays against itself: (b̃^N_θ̂(θ̂), b̃^N_θ̂(θ̂)) = (b̃^N_θ̄(θ̄), b̃^N_θ̄(θ̄)).

5. The mutant θ̂ mimics θ̄ against all other incumbents without deception, and these incumbents play against θ̂ in the same way they play against θ̄: (b̃^N_θ̂(θ′), b̃^N_θ′(θ̂)) = (b^N_θ̄(θ′), b^N_θ′(θ̄)) for each θ′ ≠ θ̄.

Note that (μ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)). By point 2 the mutant θ̂ earns weakly more than θ̄ against lower types. By point 3 and Theorem 1.1, the mutant earns strictly more than θ̄ against type θ̄. By points 3 and 4 and Theorem 1.1, the mutant earns strictly more than θ̄ against the mutant. By point 5 the mutant θ̂ earns the same as θ̄ against all other types. In total, the average fitness earned by θ̂ is strictly higher than that of θ̄, against a population that follows (μ̃, b̃). Recall (Remark 4 in Section 2.3) that all the incumbent types have the same fitness in (µ∗, b∗). By a standard continuity argument, the fitness of incumbent θ̄ is arbitrarily close (for a sufficiently small ǫ) to the fitness level of any other incumbent type in the focal post-entry configuration (μ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the type game Γ(μ̃, b̃). Thus µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS, in Γ(μ̃, b̃), which implies that (µ∗, b∗) is not an NSC.

A.3 Proof of Case (A) in Theorem 2
In what follows we fill in the missing technical details for the part of the proof of Theorem 2 that concerns case (A). We begin by proving a lemma.
Lemma 2. If (σ₁, σ₂) ∈ DE(θ₁, θ₂) then there exist actions a₂, a₂′ ∈ C(σ₂) and a₁, a₁′ ∈ C(σ₁) such that (a₁, a₂) ∈ DE(θ₁, θ₂) and (a₁′, a₂′) ∈ DE(θ₁, θ₂), with π(a₁, a₂) ≥ π(σ₁, σ₂) and π(a₁′, a₂′) ≤ π(σ₁, σ₂).

Proof. Note that for any mixed deception equilibrium (σ₁, σ₂) and any action a₂ ∈ C(σ₂), the profile (σ₁, a₂) is also a deception equilibrium (because otherwise the deceiver would not induce the deceived party to take a mixed action that puts positive weight on a₂). It follows that there are actions a₂, a₂′ ∈ C(σ₂) such that (σ₁, a₂) and (σ₁, a₂′) are deception equilibria, with π(σ₁, a₂) ≥ π(σ₁, σ₂) and π(σ₁, a₂′) ≤ π(σ₁, σ₂). Furthermore, if (σ₁, a₂) and (σ₁, a₂′) are deception equilibria, then for any action a ∈ C(σ₁), the profiles (a, a₂) and (a, a₂′) are also deception equilibria, with π(σ₁, a₂) = π(a, a₂) and π(σ₁, a₂′) = π(a, a₂′). Hence there are actions a₁, a₁′ ∈ C(σ₁) such that (a₁, a₂) and (a₁′, a₂′) are deception equilibria, with π(a₁, a₂) = π(σ₁, a₂) ≥ π(σ₁, σ₂) and π(a₁′, a₂′) = π(σ₁, a₂′) ≤ π(σ₁, σ₂).

Assume that case (A) holds: there is an incumbent θ̊ that plays inefficiently against itself, i.e. (b^N_θ̊(θ̊), b^N_θ̊(θ̊)) ≠ (ā, ā), and there is no incumbent type with a strictly higher cognitive level than θ̊ that satisfies any of the cases (A), (B), or (C). To prove that this cannot hold in an NSC we introduce a mutant θ̂ = (û, n_θ̊) ∉ C(µ∗). If Σ(u_θ̊) = ∆, then we let û ∈ U_GI be such that θ̂ = (û, n_θ̊) ∉ C(µ∗). If Σ(u_θ̊) ≠ ∆, then we fix a dominated action a_d ∈ A \ Σ(u_θ̊), and let û be defined as follows:

û(a, a′) = max_{a″∈A} u_θ̊(a″, ā)  if a = a′ = ā;
û(a, a′) = u_θ̊(a, a′) − β_{a′}  if a = a_d and a′ ≠ ā;
û(a, a′) = u_θ̊(a, a′)  otherwise,

where each β_{a′} ≥ 0 is chosen such that θ̂ = (û, n_θ̊) ∉ C(µ∗).
That is, if Σ ( u ˚ θ ) = ∆, then theutility function ˆ u is constructed from the utility function u ˚ θ by arbitrarily lowering the payoff ofsome of the outcomes associated with the (already) dominated action a and that do not involveaction ¯ a , while increasing the payoff of the outcome (¯ a, ¯ a ) by the minimal amount that makes¯ a a best reply to itself. Note that this definition of ˆ u is valid also for the case of ¯ a = a . Itfollows that a ∈ Σ ( u ˚ θ ) ∪ { ¯ a } iff a ∈ Σ (ˆ u ). To see this, note that if Σ ( u ˚ θ ) = ∆ and a = ¯ a , thenΣ (ˆ u ) = Σ ( u ˚ θ ) ∪ { ¯ a } . Otherwise Σ (ˆ u ) = Σ ( u ˚ θ ). Thus, ˆ θ can be induced to play exactly the samepure actions as ˚ θ , unless ¯ a = a , in which case ˆ θ can be induced to play ¯ a in addition to all actionsthat ˚ θ can be induced to play.Let µ ′ be the distribution that assigns mass one to type (ˆ u, n ˚ θ ). Let the post-entry typedistribution be ˜ µ = (1 − ǫ ) · µ ∗ + ǫ · µ ′ , and let the post-entry behaviour policy ˜ b be defined asfollows:1. Behaviour among incumbents respects focality: ˜ b Nθ ( θ ′ ) = b Nθ ( θ ′ ) and ˜ b Dθ ( θ ′ ) = b Dθ ( θ ′ ) ∀ θ, θ ′ ∈ C ( µ ∗ ).2. In matches without deception between the mutant type ˆ θ and any incumbent type θ ′ ,the mutant ˆ θ mimics ˚ θ , and the incumbent θ ′ treats the mutant ˆ θ like the incumbent ˚ θ : (cid:16) ˜ b N ˆ θ ( θ ′ ) , ˜ b Nθ ′ (cid:16) ˆ θ (cid:17)(cid:17) = (cid:16) b N ˚ θ ( θ ′ ) , b Nθ ′ (cid:16) ˚ θ (cid:17)(cid:17) for all θ ′ such that n θ ′ = n ˚ θ and θ ′ = ˆ θ .3. In matches with deception between the mutant type ˆ θ and any lower type θ ′ ∈ C ( µ ∗ ) (with n θ ′ < n ˆ θ ), we distinguish two cases.(a) Suppose that Σ ( u ˚ θ ) = ∆. In this case let (cid:16) ˜ b D ˆ θ ( θ ′ ) , ˜ b Dθ ′ (cid:16) ˆ θ (cid:17)(cid:17) ∈ F M DE (cid:16) ˆ θ, θ ′ (cid:17) . Note that F M DE (cid:16) ˆ θ, θ ′ (cid:17) is nonempty since in this case ˆ u ∈ U GI .(b) Suppose that Σ ( u ˚ θ ) = ∆. 
In this case let (b̃^D_θ̂(θ′), b̃^D_θ′(θ̂)) = (a₁, a₂), for some (a₁, a₂) ∈ DE(˚θ, θ′) such that π(a₁, a₂) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)). By Lemma 2 above such a profile (a₁, a₂) exists.

4. The mutant plays efficiently when meeting itself: b̃^N_θ̂(θ̂) = ā.

5. In matches with deception between the mutant θ̂ and a higher type θ′ ∈ C(µ∗) (with n_θ′ > n_θ̂), we distinguish two cases. Pick a profile (a₁, a₂) ∈ DE(θ′, ˚θ) such that π(a₂, a₁) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)). By Lemma 2 above such a profile (a₁, a₂) exists. Moreover, by the construction of û, either (a₁, a₂) ∈ DE(θ′, θ̂), or there is some ã such that u_θ′(ã, ā) > u_θ′(a₁, a₂). In the latter case we have (ā, ā) ∈ DE(θ′, θ̂), due to the fact that (b^N_θ′(θ′), b^N_θ′(θ′)) = (ā, ā) implies that ā is a best reply to ā for type θ′.

(a) If u_θ′(a₁, a₂) > u_θ′(ā, ā), let (b̃^D_θ′(θ̂), b̃^D_θ̂(θ′)) = (a₁, a₂). Note that by the definition of (a₁, a₂) it holds that π(a₂, a₁) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)).

(b) If u_θ′(a₁, a₂) ≤ u_θ′(ā, ā), let (b̃^D_θ′(θ̂), b̃^D_θ̂(θ′)) = (ā, ā). Note that by the definition of ˚θ it holds that π(ā, ā) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)).

By point 1, (µ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)). By point 2 the mutant θ̂ earns the same as ˚θ against all incumbents of level n_˚θ. By point 3 the mutant θ̂ earns weakly more than ˚θ against lower types.
By points 2 and 4 (and the assumption that ˚θ does not play efficiently against itself), the mutant θ̂ earns strictly more than ˚θ against θ̂. By point 5 the mutant θ̂ earns weakly more than ˚θ against all incumbents of a higher cognitive level. In total, the average fitness earned by θ̂ is strictly higher than that of ˚θ against a population that follows (µ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the population game Γ(µ̃, b̃). Thus, µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS of Γ(µ̃, b̃), which implies that µ∗ is not an NSC. Thus we have shown that ˚θ plays efficiently against itself.

B Type-interdependent Preferences
As argued by Herold and Kuzmics (2009, pp. 542–543), people playing a game seem to care not only about the outcome, but also about their opponent's intentions, and they discriminate between different types of opponents (for experimental evidence, see, e.g., Falk, Fehr, and Fischbacher, 2003; Charness and Levine, 2007). Motivated by this observation, in this appendix we extend our baseline model to allow preferences to depend not only on action profiles, but also on the opponent's type.
B.1 Changes to the Baseline Model
We briefly describe how to extend the model to handle type-interdependent preferences. Our construction is similar to that of Herold and Kuzmics (2009).

When the preferences of a type depend on the opponent's type, we can no longer work with the set of all possible preferences, because doing so would create problems of circularity and cardinality. Instead, we must restrict attention to a pre-specified set of feasible preferences. We begin by defining Θ_ID as an arbitrary set of labels. Each label is a pair θ = (u, n) ∈ Θ_ID, where n ∈ N and u is a type-interdependent utility function that depends on the played action profile as well as on the opponent's label, u: A × A × Θ_ID → R. (The circularity comes from the fact that each type contains a preferences component, which is identified with a utility function defined over types and action profiles. To see that this creates a problem if the set of types is unrestricted, let U∗ be the set of all utility functions that we want to include in our model, so that Θ∗ = U∗ × N is the set of all types. If U∗∗ is the set of all mappings u: A × A × Θ∗ → R, or, equivalently, the set of all mappings u: A × A × U∗ × N → R, then clearly U∗∗ ≠ U∗. See also footnote 10 in Herold and Kuzmics, 2009.) Each label θ = (u, n) may now be interpreted as a type. The definition of u extends to mixed actions in the obvious way. We slightly abuse notation and use the label u also to denote its associated utility function. Thus u(σ, σ′, θ′) denotes the subjective payoff that a player with preferences u earns when she plays strategy σ against an opponent of type θ′ who plays strategy σ′.

Let U_ID denote the set of all preferences that are part of some type in Θ_ID, i.e. U_ID = {u : ∃n ∈ N s.t. (u, n) ∈ Θ_ID}.
For each preference ũ ∈ U of the baseline model (which is defined only over action profiles) we can define an equivalent type-interdependent preference u ∈ U_ID that is independent of the opponent's type; that is, u(σ, σ′, θ′) = u(σ, σ′, θ′′) = ũ(σ, σ′) for each θ′, θ′′ ∈ Θ_ID and σ, σ′ ∈ ∆(A). Let U_N denote the set of all such type-interdependent versions of the preferences of the baseline model. To simplify the statements of the results of Section B.3, in what follows we assume that U_N ⊆ U_ID.

Next, we amend the definitions of Nash equilibrium, undominated strategies, and deception equilibrium. The best-reply correspondence now takes both strategies and types as arguments: BR_u(σ′, θ′) = arg max_{σ∈∆(A)} u(σ, σ′, θ′). Accordingly, we adjust the definition of the set of Nash equilibria,

NE(θ, θ′) = {(σ, σ′) ∈ ∆(A) × ∆(A) : σ ∈ BR_u(σ′, θ′) and σ′ ∈ BR_{u′}(σ, θ)},

and the set of undominated strategies,

Σ(θ) = {σ ∈ ∆(A) : there exist σ′ ∈ ∆(A) and θ′ ∈ Θ_ID such that σ ∈ BR_u(σ′, θ′)}.

Finally, we adapt the definition of deception equilibrium. Given two types θ, θ′ with n_θ > n_θ′, a strategy profile (σ̃, σ̃′) is a deception equilibrium if

(σ̃, σ̃′) ∈ arg max_{σ∈∆(A), σ′∈Σ(θ′)} u_θ(σ, σ′, θ′).

The interpretation of this definition is that the deceiver is able to induce both a belief about the deceiver's preferences and a belief about the deceiver's intentions in the mind of the deceived party. Let DE(θ, θ′) be the set of all such deception equilibria. The rest of our model remains unchanged.

Some of the following results rely on the existence of preferences u_{ãã′,ñ} that satisfy two conditions: (1) action ã is a (subjectively) dominant action against an opponent with the same preferences and with cognitive level ñ, and (2) action ã′ is the dominant action against all other opponents. Formally:

Definition 13.
Given any two actions ã, ã′ ∈ A, let u_{ãã′,ñ} be the discriminating preferences defined by the following utility function: for all a, a′ ∈ A and θ′ ∈ Θ_ID,

u_{ãã′,ñ}(a, a′, θ′) = 1 if either (θ′ = (u_{ãã′,ñ}, ñ) and a = ã) or (θ′ ≠ (u_{ãã′,ñ}, ñ) and a = ã′), and u_{ãã′,ñ}(a, a′, θ′) = 0 otherwise.

Finally, define the effective cost of deceiving cognitive level n, denoted by c(n), as the minimal ratio between the additional cognitive cost and the probability of deceiving an opponent of cognitive level n:

c(n) = min_{m>n} (k_m − k_n) / q(m, n).

Note that c(1) ≡ c, which coheres with the definition of the effective cost of deception (with respect to cognitive level 1) in the baseline model.

B.2 Pure Maxmin and Minimal Fitness
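Before turning to the payoff bounds, here is a small numerical sketch of the effective cost of deception c(n) defined above, and of the stability logic it supports: when the deviation gain is below c(n), a higher-level deceiver's expected gain never covers her extra cognitive cost. The cost schedule k, deception technology q, and gain g below are illustrative assumptions, not values from the model.

```python
# Sketch of the effective cost of deceiving cognitive level n:
#   c(n) = min over m > n of (k_m - k_n) / q(m, n).
# The cost schedule k and deception probabilities q are assumed numbers.

def effective_cost(n, k, q):
    """Minimal ratio of extra cognitive cost to deception probability."""
    return min((k[m] - k[n]) / q(m, n) for m in k if m > n)

k = {1: 0.0, 2: 0.1, 3: 0.25, 4: 0.45}   # assumed cognitive costs k_n
q = lambda m, n: 1 - 0.5 ** (m - n)      # assumed deception probability q(m, n)

c1 = effective_cost(1, k, q)             # minimised at m = 2: 0.1 / 0.5
print(c1)                                # -> 0.2

# If the deviation gain g satisfies g < c(1), a level-m mutant's expected
# net gain from deceiving level-1 incumbents is negative for every m > 1:
g = 0.15                                 # assumed deviation gain, below c(1)
assert all(g * q(m, 1) - (k[m] - k[1]) < 0 for m in (2, 3, 4))
```

The same computation with n = 2 returns c(2) = 0.3, attained at m = 3 under these assumed numbers.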
The pure maxmin and minmax values provide a lower bound on the fitness of an NSC. Given a game G = (A, π), define M̲ and M̄ as its pure maxmin and minmax values, respectively:

M̲ = max_{a₁∈A} min_{a₂∈A} π(a₁, a₂),    M̄ = min_{a₂∈A} max_{a₁∈A} π(a₁, a₂).

The pure maxmin value M̲ is the minimal fitness payoff a player can guarantee herself in the sequential game in which she plays first, and the opponent replies in an arbitrary way. The pure minmax value M̄ is the minimal fitness payoff a player can guarantee herself in the sequential game in which her opponent first plays an arbitrary action, and she best-replies to the opponent's pure action. It is immediate that M̲ ≤ M̄ and that the minmax value in mixed actions lies between these two values.

Let a_M̲ be a maxmin action of a player, i.e. an action that guarantees that the player's payoff is at least M̲, and let a_M̄ be a minmax action, i.e. an action that guarantees that the opponent's payoff is at most M̄:

a_M̲ ∈ arg max_{a₁∈A} min_{a₂∈A} π(a₁, a₂),    a_M̄ ∈ arg min_{a₂∈A} max_{a₁∈A} π(a₁, a₂).

The proof of Proposition 3 holds with minor changes also in the setup of interdependent preferences (under the assumption that (u_{a_M̲ a_M̲,1}, 1) ∈ Θ_ID), and this implies that the maxmin value is a lower bound on the fitness payoff obtained in an NSC (i.e. if (µ, b) is an NSC then Π(µ, b) ≥ M̲).

B.3 Characterisation of Pure Stable Configurations
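The pure maxmin and minmax values introduced above are mechanical to compute. The sketch below does so for an assumed 2×2 fitness matrix (a Hawk–Dove instance; the matrix and its parameter values are illustrative assumptions, not part of the model) and checks that the maxmin value never exceeds the minmax value.

```python
# Pure maxmin and minmax values of a symmetric two-player fitness game.
# pi[a1][a2] is the payoff of the player choosing a1; the matrix is an
# illustrative assumption (a Hawk-Dove game with g = 0.5 and l = 0.5).

def pure_maxmin(pi):
    # the player commits to a row first; the opponent then replies arbitrarily
    return max(min(row) for row in pi)

def pure_minmax(pi):
    # the opponent commits to a column first; the player then best-replies
    n = len(pi)
    return min(max(pi[a1][a2] for a1 in range(n)) for a2 in range(n))

pi = [[0.0, 1.5],   # H against (H, D)
      [0.5, 1.0]]   # D against (H, D)

M_low, M_high = pure_maxmin(pi), pure_minmax(pi)
assert M_low <= M_high      # holds for every finite game
print(M_low, M_high)        # -> 0.5 0.5
```

In this instance the two bounds coincide, since playing D guarantees 0.5 and an opponent playing H holds the player to at most 0.5.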
In this subsection we show that, essentially, a pure configuration is stable if and only if (1) all incumbents have the same cognitive level n, (2) the cost of level n is smaller than the difference between the incumbents' (fitness) payoff and the minmax/maxmin values, and (3) the deviation gain is smaller than the effective cost of deceiving cognitive level n.

We begin by formally stating and proving the necessity claim.

Proposition 4. If (µ∗, a∗) is a pure NSC then the following holds: (1) if θ, θ′ ∈ C(µ∗) then n_θ = n_θ′ = n for some n, (2) π(a∗, a∗) − M̲ ≥ k_n, and (3) g(a∗) ≤ c(n).

Proof.
1. Since all players earn the same game payoff of π(a∗, a∗), they must also incur the same cognitive cost, or else the fitness of the different incumbent types would not be balanced (which would contradict the fact that (µ∗, a∗) is an NSC).

2. Assume to the contrary that π(a∗, a∗) − M̲ < k_n. A mutant of type (π, 1) will be able to earn at least M̲ against incumbents in any post-entry focal configuration. As the fraction of mutants vanishes, the average fitness of the mutants is weakly higher than M̲, whereas the fitness of the incumbents converges to π(a∗, a∗) − k_n. Thus, if it were the case that π(a∗, a∗) − M̲ < k_n, the mutants would strictly outperform the incumbents, contradicting the assumption that (µ∗, a∗) is an NSC.
Proposition 5. Suppose that θ̂ := (u_{a∗ a_M̄, n}, n) ∈ Θ_ID. If π(a∗, a∗) − M̄ > k_n and g(a∗) < c(n), then (θ̂, a∗) is an ESC.

Proof. Suppose that all incumbents are of type (u_{a∗ a_M̄, n}, n). Note that in all focal post-entry configurations the incumbent θ̂ always plays either a∗ or a_M̄. Moreover, whenever an incumbent agent is not deceived, she plays action a∗ against a fellow incumbent and action a_M̄ against a mutant. The fact that π(a∗, a∗) − k_n > M̄ implies that any mutant θ ≠ θ̂ with cognitive level n_θ ≤ n earns a strictly lower payoff against the incumbents in any focal post-entry configuration. As a result, if the frequency of mutants is sufficiently small, then they are strictly outperformed. Against a mutant θ′ with cognitive level n′ > n, an incumbent may play action a∗ only when she is being deceived. Since π(a∗, a∗) > M̄ the mutants earn (on average) at most π(a∗, a∗) + g(a∗)·q(n′, n) in matches against incumbents. Consequently, as the fraction of mutants vanishes, the average fitness of the mutants is weakly less than

π(a∗, a∗) + g(a∗)·q(n′, n) − k_{n′} < π(a∗, a∗) + ((k_{n′} − k_n)/q(n′, n))·q(n′, n) − k_{n′} = π(a∗, a∗) − k_n,

whereas the average fitness of the incumbents converges to π(a∗, a∗) − k_n. Hence, the mutants are outperformed.

In particular, our results imply that:

1. Any pure equilibrium that induces a payoff above the minmax value M̄ is the outcome of a pure ESC (regardless of the cost of deception).

2. If the effective cost of deception is sufficiently small, then only Nash equilibria can be the outcomes of pure NSCs. Specifically, this is the case if c(n) < g(a) for each cognitive level n and each action a such that (a, a) is not a Nash equilibrium of the fitness game.

3.
If there is a cognitive level n such that (1) the cost of achieving level n is sufficiently small, and (2) the effective cost of deceiving an opponent of level n is sufficiently high, then essentially any pure profile is the outcome of a pure ESC (similar to the results of Herold and Kuzmics, 2009, in the setup without deception). Formally, let A′ ⊆ A be the set of actions that induce a payoff above the minmax value: A′ = {a ∈ A | π(a, a) > M̄}. Assume that there is a cognitive level n such that (1) k_n < π(a, a) − M̄ for each action a ∈ A′ and (2) c(n) > g(a) for each action a. Then any action a ∈ A′ is the outcome of a pure ESC (in which all incumbents have cognitive level n).

B.4 Application: In-group Cooperation and Out-group Exploitation
The following table represents a family of Hawk–Dove games. When both players play D (Dove) they earn 1 each, and when they both play H (Hawk) they earn 0. When a player plays H against an opponent playing D, she obtains an additional gain of g > 0 (earning 1 + g), while the opponent playing D loses l ∈ (0, 1) (earning 1 − l):

              H               D
    H       0, 0         1 + g, 1 − l
    D   1 − l, 1 + g         1, 1            (1)

It is natural to think of a mutual play of D as the cooperative outcome. We define preferences that induce players to cooperate with their own kind and to seek to exploit those who are not of their own kind.

Definition 14.
Let u_n denote the preferences such that:

1. If u_θ′ = u_n and n_θ′ = n then u_n(D, a′, θ′) = 1 and u_n(H, a′, θ′) = 0 for all a′.

2. If u_θ′ ≠ u_n or n_θ′ ≠ n then u_n(H, a′, θ′) = 1 and u_n(D, a′, θ′) = 0 for all a′.

Thus, when facing someone who is of the same type, an individual with u_n-preferences strictly prefers cooperation, in the sense of playing D. When facing someone who is not of the same type, an individual with u_n-preferences strictly prefers the aggressive action H.

To simplify the analysis and the notation in this example, we assume that a player always succeeds in deceiving an opponent with a lower cognitive level; i.e. we assume that q(n, n′) = 1 whenever n > n′.

Under the assumption that g > l and that the marginal cognitive costs are sufficiently small (but non-vanishing), we construct an ESC in which only individuals with preferences from {u_n}_{n=1}^∞ are present. Individuals of different cognitive levels coexist, and non-Nash profiles are played in all matches between equals. When individuals of the same level meet, they play mutual cooperation (D, D). When individuals of different levels meet, the higher level plays H and the lower level plays D. The gain from obtaining the high payoff of 1 + g against lower types is exactly counterbalanced by the higher cognitive costs. By contrast, if g < l then the game does not admit this kind of stable configuration.

Proposition 6.
Let G be the game represented in (1), where g > 0 and l ∈ (0, 1). Assume that q(n, n′) = 1 whenever n > n′. Suppose that the marginal cognitive cost is small but non-vanishing, so that (a) there is an N such that k_N ≤ l + g < k_{N+1}, and (b) it holds that g > k_{n+1} − k_n for all n ≤ N.

(i) If g > l then there exists an ESC (µ∗, b∗) such that C(µ∗) ⊆ {(u_n, n)}_{n=1}^N, and µ∗ is mixed (i.e. |C(µ∗)| > 1). The behaviour of the incumbents is as follows: if the individuals in a match are of different cognitive levels, then the higher level plays H and the lower level plays D; if both individuals in a match are of the same cognitive level, then they both play D.

(ii) If g = l then there exists an NSC with the above properties.

(iii) If g < l then there does not exist any NSC (µ∗, b∗) such that C(µ∗) ⊆ {(u_n, n)}_{n=1}^∞.

Remark. It is possible to construct an ESC that is like the one in Proposition 6(i) except that when incumbents of the same cognitive level meet they play the mixed equilibrium of the Hawk–Dove game. Thus we can have ESCs in which agents mix at the individual level. For instance, this can be accomplished by considering preferences u_m such that: (1) if u_θ′ = u_m and n_θ′ = n then u_m(a, a′, θ′) = π(a, a′) for all a and a′, and (2) if u_θ′ ≠ u_m or n_θ′ ≠ n then u_m(H, a′, θ′) = 1 and u_m(D, a′, θ′) = 0 for all a′.

C Constructions of Heterogeneous NSCs in Examples
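As a numerical flavour of such heterogeneous constructions, the sketch below works out a two-level instance of the Hawk–Dove configuration of Proposition 6: a level-2 agent plays H against a (deceived) level-1 agent, who plays D, while equals play (D, D), and the fraction p of level-1 agents is pinned down by fitness balance. The parameter values g, l, k1, k2 are illustrative assumptions satisfying g > l and g > k2 − k1.

```python
# Two-level instance of the Hawk-Dove construction: level 2 exploits
# level 1 (payoffs 1 + g vs 1 - l), equals play (D, D) and earn 1 each.
# All parameter values are illustrative assumptions.

g, l = 0.6, 0.3        # Hawk's gain and Dove's loss (assumed), with g > l
k1, k2 = 0.0, 0.4      # cognitive costs (assumed), with l < k2 - k1 < g

# Fitness balance across levels:
#   p*1 + (1-p)*(1-l) - k1  ==  p*(1+g) + (1-p)*1 - k2,
# which rearranges to k2 - k1 = p*g + (1-p)*l, giving:
p = (k2 - k1 - l) / (g - l)

f1 = p * 1 + (1 - p) * (1 - l) - k1      # level-1 average fitness
f2 = p * (1 + g) + (1 - p) * 1 - k2      # level-2 average fitness
assert abs(f1 - f2) < 1e-9 and 0 < p < 1
print(round(p, 4), round(f1, 4))         # -> 0.3333 0.8
```

With these assumed numbers both levels earn average fitness 0.8, so the gain from exploiting lower types is exactly offset by the extra cognitive cost, as in the proposition.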
Appendix C appears in the supplementary material that can be found online.
D Partial Observability When There Is No Deception
Appendix D appears in the supplementary material that can be found online.
References

Abreu, D., and R. Sethi (2003): "Evolutionary Stability in a Reputational Model of Bargaining," Games and Economic Behavior, 44(2), 195–216.

Alger, I., and J. W. Weibull (2013): "Homo Moralis, Preference Evolution under Incomplete Information and Assortative Matching," Econometrica, 81(6), 2269–2302.

Banerjee, A., and J. W. Weibull (1995): "Evolutionary Selection and Rational Behavior," in Learning and Rationality in Economics, ed. by A. Kirman and M. Salmon. Blackwell, Oxford, pp. 343–363.

Bergstrom, T. C. (1995): "On the Evolution of Altruistic Ethical Rules for Siblings," American Economic Review, 85(1), 58–81.

Bester, H., and W. Güth (1998): "Is Altruism Evolutionarily Stable?," Journal of Economic Behavior and Organization, 34, 193–209.

Bolle, F. (2000): "Is Altruism Evolutionarily Stable? And Envy and Malevolence? Remarks on Bester and Güth," Journal of Economic Behavior and Organization, 42, 131–133.

Bomze, I. M., and J. W. Weibull (1995): "Does Neutral Stability Imply Lyapunov Stability?," Games and Economic Behavior, 11(2), 173–192.

Brown, G. W., and J. von Neumann (1950): "Solutions of Games by Differential Equations," in Contributions to the Theory of Games, ed. by H. W. Kuhn and A. W. Tucker, Annals of Mathematics Studies 24. Princeton University Press, Princeton.

Byrne, R. W., and A. Whiten (1997): "Machiavellian Intelligence," Machiavellian Intelligence II: Extensions and Evaluations, pp. 1–23.

Byrne, R. W., and A. Whiten (1998): Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes, and Humans. Oxford University Press, Oxford.

Camerer, C. F., T.-H. Ho, and J.-K. Chong (2002): "Sophisticated Experience-Weighted Attraction Learning and Strategic Teaching in Repeated Games," Journal of Economic Theory, 104(1), 137–188.

Charness, G., and D. I. Levine (2007): "Intention and Stochastic Outcomes: An Experimental Study," The Economic Journal, 117(522), 1051–1072.

Conlisk, J. (2001): "Costly Predation and the Distribution of Competence," American Economic Review, 91(3), 475–484.

Crawford, V. P. (2003): "Lying for Strategic Advantage: Rational and Boundedly Rational Misrepresentation of Intentions," American Economic Review, 93(1), 133–149.

Cressman, R. (1997): "Local Stability of Smooth Selection Dynamics for Normal Form Games," Mathematical Social Sciences, 34(1), 1–19.

Dekel, E., J. C. Ely, and O. Yilankaya (2007): "Evolution of Preferences," Review of Economic Studies, 74, 685–704.

Dufwenberg, M., and W. Güth (1999): "Indirect Evolution vs. Strategic Delegation: A Comparison of Two Approaches to Explaining Economic Institutions," European Journal of Political Economy, 15(2), 281–295.

Dunbar, R. I. M. (1998): "The Social Brain Hypothesis," Evolutionary Anthropology, 6, 178–190.

Ellingsen, T. (1997): "The Evolution of Bargaining Behavior," The Quarterly Journal of Economics, 112(2), 581–602.

Ely, J. C., and O. Yilankaya (2001): "Nash Equilibrium and the Evolution of Preferences," Journal of Economic Theory, 97, 255–272.

Falk, A., E. Fehr, and U. Fischbacher (2003): "On the Nature of Fair Behavior," Economic Inquiry, 41(1), 20–26.

Fershtman, C., and Y. Weiss (1998): "Social Rewards, Externalities and Stable Preferences," Journal of Public Economics, 70(1), 53–73.

Frank, R. H. (1987): "If Homo Economicus Could Choose His Own Utility Function, Would He Want One with a Conscience?," The American Economic Review, 77(4), 593–604.

Frenkel, S., Y. Heller, and R. Teper (forthcoming): "The Endowment Effect as a Blessing," International Economic Review.

Friedman, D., and N. Singh (2009): "Equilibrium Vengeance," Games and Economic Behavior, 66(2), 813–829.

Fudenberg, D., and D. K. Levine (1998): The Theory of Learning in Games, vol. 2. MIT Press.

Gamba, A. (2013): "Learning and Evolution of Altruistic Preferences in the Centipede Game," Journal of Economic Behavior and Organization, 85(C), 112–117.

Gauer, F., and C. Kuzmics (2016): "Cognitive Empathy in Conflict Situations," mimeo, SSRN 2715160.

Güth, W. (1995): "An Evolutionary Approach to Explaining Cooperative Behavior by Reciprocal Incentives," International Journal of Game Theory, 24(4), 323–344.

Güth, W., and S. Napel (2006): "Inequality Aversion in a Variety of Games: An Indirect Evolutionary Analysis," The Economic Journal, 116, 1037–1056.

Güth, W., and M. E. Yaari (1992): "Explaining Reciprocal Behavior in Simple Strategic Games: An Evolutionary Approach," in Explaining Process and Change, ed. by U. Witt. University of Michigan Press, Ann Arbor, MI, pp. 22–34.

Guttman, J. M. (2003): "Repeated Interaction and the Evolution of Preferences for Reciprocity," The Economic Journal, 113(489), 631–656.

Heifetz, A., C. Shannon, and Y. Spiegel (2007): "What to Maximize if You Must," Journal of Economic Theory, 133(1), 31–57.

Heller, Y. (2015): "Three Steps Ahead," Theoretical Economics, 10, 203–241.

Heller, Y., and E. Mohlin (forthcoming): "Observations on Cooperation," Review of Economic Studies.

Herold, F., and C. Kuzmics (2009): "Evolutionary Stability of Discrimination under Observability," Games and Economic Behavior, 67, 542–551.

Hines, W. G. S., and J. Maynard Smith (1979): "Games between Relatives," Journal of Theoretical Biology, 79(1), 19–30.

Hofbauer, J. (2011): "Deterministic Evolutionary Game Dynamics," in Proceedings of Symposia in Applied Mathematics, vol. 69, pp. 61–79.

Hofbauer, J., and K. Sigmund (1988): The Theory of Evolution and Dynamical Systems. Cambridge University Press, Cambridge.

Holloway, R. (1996): "Evolution of the Human Brain," in Handbook of Human Symbolic Evolution, ed. by A. Lock and C. R. Peters. Clarendon Press, New York: Oxford University Press, pp. 74–116.

Hopkins, E. (2014): "Competitive Altruism, Mentalizing and Signalling," American Economic Journal: Microeconomics, 6, 272–292.

Huck, S., and J. Oechssler (1999): "The Indirect Evolutionary Approach to Explaining Fair Allocations," Games and Economic Behavior, 28, 13–24.

Humphrey, N. K. (1976): "The Social Function of Intellect," in Growing Points in Ethology, ed. by P. P. G. Bateson and R. A. Hinde. Cambridge University Press, Cambridge, pp. 303–317.

Kim, Y.-G., and J. Sobel (1995): "An Evolutionary Approach to Pre-Play Communication," Econometrica, 63(5), 1181–1193.

Kinderman, P., R. I. M. Dunbar, and R. P. Bentall (1998): "Theory-of-Mind Deficits and Causal Attributions," British Journal of Psychology, 89, 191–204.

Koçkesen, L., E. A. Ok, and R. Sethi (2000): "Evolution of Interdependent Preferences in Aggregative Games," Games and Economic Behavior, 31(2), 303–310.

Mailath, G. J., and L. Samuelson (2006): Repeated Games and Reputations: Long-Run Relationships. Oxford University Press.

Matsui, A. (1991): "Cheap-Talk and Cooperation in a Society," Journal of Economic Theory, 54(2), 245–258.

Maynard Smith, J. (1982): Evolution and the Theory of Games. Cambridge University Press, Cambridge.

Maynard Smith, J., and G. R. Price (1973): "The Logic of Animal Conflict," Nature, 246(5427), 15–18.

Mohlin, E. (2010): "Internalized Social Norms in Conflicts: An Evolutionary Approach," Economics of Governance, 11(2), 169–181.

Mohlin, E. (2012): "Evolution of Theories of Mind," Games and Economic Behavior, 75(1), 299–312.

Norman, T. W. L. (2012): "Equilibrium Selection and the Dynamic Evolution of Preferences," Games and Economic Behavior, 74(1), 311–320.

Ok, E. A., and F. Vega-Redondo (2001): "On the Evolution of Individualistic Preferences: An Incomplete Information Scenario," Journal of Economic Theory, 97, 231–254.

Possajennikov, A. (2000): "On the Evolutionary Stability of Altruistic and Spiteful Preferences," Journal of Economic Behavior and Organization, 42, 125–129.

Premack, D., and G. Woodruff (1979): "Does the Chimpanzee Have a Theory of Mind?," Behavioral and Brain Sciences, 1, 515–526.

Robalino, N., and A. Robson (2016): "The Evolution of Strategic Sophistication," American Economic Review, 106(4), 1046–1072.

Robson, A. J. (1990): "Efficiency in Evolutionary Games: Darwin, Nash and the Secret Handshake," Journal of Theoretical Biology, 144(3), 379–396.

Robson, A. J. (2003): "The Evolution of Rationality and the Red Queen," Journal of Economic Theory, 111, 1–22.

Robson, A. J., and L. Samuelson (2011): "The Evolutionary Foundations of Preferences," in The Social Economics Handbook, ed. by J. Benhabib, A. Bisin, and M. Jackson. North Holland, Amsterdam, pp. 221–310.

Rtischev, D. (2016): "Evolution of Mindsight and Psychological Commitment among Strategically Interacting Agents," Games, 7(3), 27.

Samuelson, L. (2001): "Introduction to the Evolution of Preferences," Journal of Economic Theory, 97(2), 225–230.

Sandholm, W. H. (2001): "Preference Evolution, Two-Speed Dynamics, and Rapid Social Change," Review of Economic Dynamics, 4, 637–679.

Sandholm, W. H. (2010): "Local Stability under Evolutionary Game Dynamics," Theoretical Economics, 5(1), 27–50.

Schaffer, M. E. (1988): "Evolutionarily Stable Strategies for a Finite Population and a Variable Contest Size," Journal of Theoretical Biology, 132, 469–478.

Schelling, T. C. (1960): The Strategy of Conflict. Harvard University Press, Cambridge, MA.

Schipper, B. C. (2017): "Strategic Teaching and Learning in Games," mimeo.

Schlag, K. H. (1993): "Cheap Talk and Evolutionary Dynamics," Bonn Department of Economics Discussion Paper B-242.

Selten, R. (1980): "A Note on Evolutionarily Stable Strategies in Asymmetric Animal Conflicts," Journal of Theoretical Biology, 84(1), 93–101.

Sethi, R., and E. Somanathan (2001): "Preference Evolution and Reciprocity," Journal of Economic Theory, 97, 273–297.

Stahl, D. O. (1993): "Evolution of Smart n Players," Games and Economic Behavior, 5(4), 604–617.

Stennek, J. (2000): "The Survival Value of Assuming Others to be Rational," International Journal of Game Theory, 29, 147–163.

Taylor, P. D., and L. B. Jonker (1978): "Evolutionary Stable Strategies and Game Dynamics," Mathematical Biosciences, 40(1–2), 145–156.

Thomas, B. (1985): "On Evolutionarily Stable Sets," Journal of Mathematical Biology, 22(1), 105–115.

van Damme, E. (1987): Stability and Perfection of Nash Equilibria. Springer, Berlin.

Wärneryd, K. (1991): "Evolutionary Stability in Unanimity Games with Cheap Talk," Economics Letters, 36(4), 375–378.

Wärneryd, K. (1998): "Communication, Complexity, and Evolutionary Stability," International Journal of Game Theory, 27(4), 599–609.

Weibull, J. W. (1995): Evolutionary Game Theory. MIT Press, Cambridge, Massachusetts.

Whiten, A., and R. W. Byrne (1988): "Tactical Deception in Primates," Behavioral and Brain Sciences, 11(2), 233–244.

Wiseman, T., and O. Yilankaya (2001): "Cooperation, Secret Handshakes, and Imitation in the Prisoners' Dilemma,"