Coevolution of Deception and Preferences: Darwin and Nash Meet Machiavelli

Yuval Heller and Erik Mohlin

Final pre-print of a paper published in Games and Economic Behavior, 113, 2019, pp. 223–247.
Abstract
We develop a framework in which individuals’ preferences coevolve with their abilities to deceive others about their preferences and intentions. Specifically, individuals are characterised by (i) a level of cognitive sophistication and (ii) a subjective utility function. Increased cognition is costly, but higher-level individuals have the advantage of being able to deceive lower-level opponents about their preferences and intentions in some of the matches. In the remaining matches, the individuals observe each other’s preferences. Our main result shows that, essentially, only efficient outcomes can be stable. Moreover, under additional mild assumptions, we show that an efficient outcome is stable if and only if the gain from unilateral deviation is smaller than the effective cost of deception in the environment.
Keywords:
Evolution of Preferences; Indirect Evolutionary Approach; Theory of Mind; Depth of Reasoning; Deception; Efficiency.
JEL codes:
C72, C73, D03, D83.

∗ Valuable comments were provided by the anonymous associate editor and referees, Vince Crawford, Eddie Dekel, Jeffrey Ely, Itzhak Gilboa, Christoph Kuzmics, Larry Samuelson, Jörgen Weibull, and Okan Yilankaya, as well as participants at presentations at Oxford University, Queen Mary University, G.I.R.L.13 in Lund, the Toulouse Economics and Biology Workshop, DGL13 in Stockholm, the 25th International Conference on Game Theory at Stony Brook, and the Biological Basis of Preference and Strategic Behaviour 2015 conference at Simon Fraser University. Yuval Heller is grateful to the European Research Council for its financial support (ERC starting grant). Erik Mohlin is grateful for financial support from Handelsbankens forskningsstiftelser and the Swedish Research Council.

† Affiliation: Department of Economics, Bar Ilan University. Address: Ramat Gan 5290002, Israel. E-mail: [email protected].

‡ Affiliation: Department of Economics, Lund University. Address: Tycho Brahes väg 1, 220 07 Lund, Sweden. E-mail: [email protected].

Introduction
For a long time economists took preferences as given. The study of their origin and formation was considered a question outside the scope of economics. Over the past two decades this has changed dramatically. In particular, there is now a large literature on the evolutionary foundations of preferences (for an overview, see Robson and Samuelson, 2011). A prominent strand of this literature is the so-called “indirect evolutionary approach,” pioneered by Güth and Yaari (1992) (the term was coined by Güth, 1995). This approach has been used to explain the existence of a variety of “non-standard” preferences that do not coincide with material payoffs, e.g., altruism, spite, and reciprocal preferences. Typically, the non-materialistic preferences in question convey some form of commitment advantage that induces opponents to behave in a way that benefits individuals with non-materialistic preferences, as described by Schelling (1960) and Frank (1987). Indeed, Heifetz, Shannon, and Spiegel (2007) show that this kind of result is generic.

A crucial feature of the indirect evolutionary approach is that preferences are explicitly or implicitly assumed to be at least partially observable. Consequently the results are vulnerable to the existence of mimics who signal that they have, say, a preference for cooperation, but actually defect on cooperators, thereby earning the benefits of having the non-standard preference without having to pay the cost (Samuelson, 2001). The effect of varying the degree to which preferences can be observed has been investigated by Ok and Vega-Redondo (2001), Ely and Yilankaya (2001), Dekel, Ely, and Yilankaya (2007), and Herold and Kuzmics (2009). They confirm that the degree to which preferences are observed decisively influences the outcome of preference evolution. Yet, the degree to which preferences are observed is still exogenous in these models.
In reality we would expect both the preferences and the ability to observe or conceal them to be the product of an evolutionary process. This paper provides a first step towards filling in the missing link between the evolution of preferences and the evolution of how preferences are concealed, feigned, and detected. In our model the ability to observe preferences and the ability to deceive and induce false beliefs about preferences are endogenously determined by evolution, jointly with the evolution of preferences. Cognitively more sophisticated players have a positive probability of deceiving cognitively less sophisticated players. Mutual observation of preferences occurs only in matches in which such deception fails. This setup is general enough to encompass both the standard indirect evolutionary model where preferences are always observed, and the reverse case in which more sophisticated types always deceive lower types, as well as all intermediate cases between these two extremes. We find that, generically, only efficient outcomes can be played in stable population states. Moreover, we define a single number that captures the effective cost of deception against naive opponents, and show that an efficient outcome is stable if and only if the gain from a unilateral deviation is smaller than the effective cost of deception.

For example, Bester and Güth (1998), Bolle (2000), and Possajennikov (2000) study combinations of altruism, spite, and selfishness. Ellingsen (1997) finds that preferences that induce aggressive bargaining can survive in a Nash demand game. Fershtman and Weiss (1998) study the evolution of concerns for social status. Sethi and Somanathan (2001) study the evolution of reciprocity in the form of preferences that are conditional on the opponent’s preference type. In the context of the finitely repeated Prisoner’s Dilemma, Guttman (2003) explores the stability of conditional cooperation. Dufwenberg and Güth (1999) study firms’ preferences for large sales. Güth and Napel (2006) study preference evolution when players use the same preferences in both ultimatum and dictator games. Koçkesen, Ok, and Sethi (2000) investigate the survival of more general interdependent preferences in aggregative games. Friedman and Singh (2009) show that vengefulness may survive if observation has some degree of informativeness. Recently, Norman (2012) has shown how to adapt some of these results into a dynamic model.

Gamba (2013) is an interesting exception. She assumes play of a self-confirming equilibrium, rather than a Nash equilibrium, in an extensive-form game. This allows for the evolution of non-materialistic preferences even when they are completely unobservable. An alternative is to allow for a dynamic that is not strictly payoff monotonic. This approach is pursued by Frenkel, Heller, and Teper (forthcoming), who show that multiple biases (inducing non-materialistic preferences) can survive in non-monotonic evolutionary dynamics even if they are unobservable, because each approximately compensates for the errors of the others.

On this topic, Robson and Samuelson (2011) write: “The standard argument is that we can observe preferences because people give signals – a tightening of the lips or flash of the eyes – that provide clues as to their feelings. However, the emission of such signals and their correlation with the attendant emotions are themselves the product of evolution. [...] We cannot simply assume that mimicry is impossible, as we have ample evidence of mimicry from the animal world, as well as experience with humans who make their way by misleading others as to their feelings, intentions and preferences. [...] In our view, the indirect evolutionary approach will remain incomplete until the evolution of preferences, the evolution of signals about preferences, and the evolution of reactions to these signals, are all analysed within the model.” [Emphasis added] (pp. 14–15)

The recent working paper of Gauer and Kuzmics (2016) presents a different way of endogenising the observability of preferences. Specifically, they assume that preferences are ex ante uncertain, and that each player may exert a cognitive effort to privately observe the opponent’s preferences.

Overview of the Model.

As is common in standard evolutionary game theory we assume an infinite population of individuals who are uniformly randomly matched to play a symmetric normal-form game. Each individual has a type, which is a tuple consisting of a preference component and a cognitive component. The preference component is identified with a subjective utility function over the set of outcomes (i.e., action profiles), which may differ from the objective payoffs (i.e., fitness) of the underlying game. The cognitive component is simply a natural number representing the level of cognitive sophistication of the individual. The cost of increased cognition is strictly positive.

It is known that positive assortative matching is conducive to the evolution of altruistic behaviour (Hines and Maynard Smith, 1979) and non-materialistic preferences even when preferences are perfectly unobservable (Alger and Weibull, 2013; Bergstrom, 1995). It is also known that finite populations allow for the evolution of spiteful behaviours (Schaffer, 1988) and non-materialistic preferences (Huck and Oechssler, 1999). By assuming that individuals are uniformly randomly matched in an infinite population, we avoid confounding these effects with the effect of endogenising the degree of observability.

The one-dimensional representation of cognitive ability reflects the idea that if one is good at deceiving others, then one is more likely to be good also at reading others and avoiding being deceived by them. In this paper we simplify this relation by assuming a perfect correlation between the two abilities, and leave the study of more general relations for future research. Remark 7 in Section 2.2 presents an alternative interpretation of our model, according to which this cognitive component represents the agent’s social status, rather than the agent’s ability to deceive other agents.

When two individuals with different cognitive levels are matched, there is a positive probability (which may depend on the cognitive levels of both agents) that the agent with the higher level deceives his opponent. For the sake of tractability, and in order not to limit the degree to which higher types can take advantage of lower types, we assume that in these matches the players play a deception equilibrium. With the remaining probability (or with probability one if both agents have the same cognitive level) there is no deception in the match. In this case, we assume that each player observes the opponent’s preferences, and the individuals play a Nash equilibrium of the complete information game induced by their subjective preferences.

The state of a population is described by a configuration, consisting of a type distribution and a behaviour policy. The type distribution is simply a finite-support distribution on the set of types. The behaviour policy specifies a Nash equilibrium for each match without deception, and a deception equilibrium for each match with deception. In a neutrally stable configuration all incumbents earn the same expected fitness, and if a small group of mutants enter they earn weakly less than the incumbents in any focal post-entry state. A focal post-entry state is one in which the incumbents behave against each other in the same way as before the mutants entered.
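The match-level accounting described in this overview can be sketched in code. The following Python snippet is a minimal illustration, not part of the paper's formal apparatus: the payoff matrix, the specific functional form of the deception probability q, and all numbers are invented for the example; only the bookkeeping (deception play with probability q(n, n′) + q(n′, n), complete-information Nash play otherwise) follows the text.

```python
import numpy as np

# Fitness payoffs pi[a, a'] of a symmetric 2-action game (row player's payoff).
# This matrix is a hypothetical Prisoner's-Dilemma-like example.
PI = np.array([[4.0, 0.0],
               [5.0, 1.0]])

def q(n, n_prime):
    """Probability that a level-n player deceives a level-n' opponent.
    q(n, n') = 0 whenever n <= n' (only strictly higher levels deceive);
    the functional form below is invented for illustration."""
    return 0.0 if n <= n_prime else 1.0 - 0.5 ** (n - n_prime)

def match_fitness(n, n_prime, b_D, b_D_opp, b_N, b_N_opp):
    """Expected material payoff of a level-n player against a level-n'
    opponent, given the actions played in deception matches (b_D vs b_D_opp)
    and in matches without deception (b_N vs b_N_opp)."""
    p_deception = q(n, n_prime) + q(n_prime, n)  # at most one term is positive
    return (p_deception * PI[b_D, b_D_opp]
            + (1.0 - p_deception) * PI[b_N, b_N_opp])

# A level-2 player who, when deception succeeds, plays action 1 while the
# deceived level-1 opponent plays action 0; without deception both play 1.
print(match_fitness(2, 1, b_D=1, b_D_opp=0, b_N=1, b_N_opp=1))  # 0.5*5 + 0.5*1 = 3.0
```

Here the level-2 player earns the deception payoff half the time and the complete-information payoff otherwise; the cognitive cost of her higher level, introduced below, would then be subtracted from this match payoff.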
Main Results.
We say that a strategy profile is (fitness-)efficient if it maximises the sum of objective payoffs. Theorem 1 shows that in any stable configuration, any type θ̄ with the highest cognitive level in the incumbent population must play an efficient strategy profile when meeting itself. The intuition is that otherwise a highest-type mutant who mimics the play of θ̄ against all incumbents while playing an efficient strategy profile against itself would outperform type θ̄ (a novel application of the “secret handshake” argument due to Robson, 1990).

Next we restrict attention to generic games (i.e., games that result with probability one if fitness payoffs are independently drawn from a continuous distribution) and obtain our first main result: any stable configuration must induce efficient play in all matches between all types. The idea of the proof can be briefly sketched as follows. We first show that any type θ in a stable configuration must play an efficient strategy profile when meeting itself. Otherwise a mutant who has the same level as θ and the same utility function as θ, but who plays efficiently against itself, could invade the population. Next, we show that any two types must play an efficient strategy profile. The intuition is that otherwise the average within-group fitness would be higher than the between-group fitness, which implies instability in the face of small perturbations in the frequency of the types: a type who became slightly more frequent would have a higher fitness than the other incumbents, and this would move the population away from the original configuration.

The existing literature (e.g., Dekel, Ely, and Yilankaya, 2007) has demonstrated that if players perfectly observe each other’s preferences (or do so with sufficiently high probability), then only efficient outcomes are stable. As was pointed out above, our model encompasses the limiting case in which it is arbitrarily “cheap and easy” to deceive the opponent, i.e., the case in which the marginal cost of an additional cognitive level is very low, and having a slightly higher cognitive level allows a player to deceive the opponent with probability one. A key contribution of the paper is to show that even when it is cheap and easy to deceive the opponent, the seemingly mild assumption of perfect observability, and Nash equilibrium behaviour, among players with the same cognitive level is enough to ensure that stability implies efficiency.

In order to obtain sufficient conditions for stability we restrict attention to generic games that admit a “punishment action” that ensures that the opponent achieves strictly less than the symmetric efficient fitness payoff. For games satisfying this relatively mild requirement we fully characterise stable configurations. We define the (fitness) deviation gain of an action profile to be the maximal fitness increase a player may obtain by unilaterally deviating from this action profile (this gain is zero if and only if the action profile is a Nash equilibrium of the underlying game). Next we define the effective cost of deception in the environment as the minimal ratio between the cost of an increased cognitive level and the probability that an agent with this level deceives an opponent with the lowest cognitive level. Our second main result shows that an efficient action profile is the outcome of a stable configuration if and only if its deviation gain is smaller than the effective cost of deception. In particular, efficient Nash equilibria are stable in all environments, while non-Nash efficient action profiles are stable only as long as the gain from a unilateral deviation is sufficiently small.

Next, we note that non-generic games may admit different kinds of stable configurations. One particularly interesting family of non-generic games is the family of zero-sum games, such as the Rock-Paper-Scissors game. We analyse this game and characterise a heterogeneous stable population (inspired by a related construction in Conlisk, 2001) in which different cognitive levels coexist, players with equal levels play the Nash equilibrium of the underlying game, and players with higher levels beat their opponents but this gain is offset by higher cognitive costs.

Finally, in Section 4 we discuss two extensions of the model (which are formally analysed in Appendices B and D): (1) we relax the assumption that each agent perfectly observes the partner’s preferences in matches without deception, and (2) we allow for type-interdependent preferences (à la Herold and Kuzmics, 2009), which are represented by utility functions that are defined over both action profiles and the opponent’s type.

Further Related Literature.
Our model is related to work in biology and evolutionary psychology on the evolution of the “theory of mind” (Premack and Woodruff, 1979), specifically, the “Machiavellian intelligence” hypothesis (Humphrey, 1976) and the “social brain” hypothesis (Byrne and Whiten, 1998), according to which the extraordinary cognitive abilities of humans evolved as a result of the demands of social interactions, rather than the demands of the natural environment: in a single-person decision problem there is a fixed benefit from being smart, but in a strategic situation it may be important to be smarter than the opponent. From an evolutionary perspective, there is a trade-off between the benefit of outsmarting the opponent and the non-negligible costs associated with increased cognitive capacity (Holloway, 1996; Kinderman, Dunbar, and Bentall, 1998). Our model incorporates these features.

There is a smaller literature on the evolution of strategic sophistication within game theory; see, e.g., Stahl (1993), Banerjee and Weibull (1995), Stennek (2000), Conlisk (2001), Abreu and Sethi (2003), Mohlin (2012), Rtischev (2016), and Heller (2015). Following these papers, we provide results to the effect that different degrees of cognitive sophistication may coexist.

Robalino and Robson (2016) construct a model to demonstrate the advantage of having a theory of mind (understood as an ability to ascribe stable preferences to other players) over learning by reinforcement. In novel games the ascribed preferences allow the agents with a theory of mind to draw on past experience, whereas a reinforcement learner without such a model has to start over again. Hopkins (2014) explains why costly signalling of altruism may be especially valuable for those agents who have a theory of mind.

Robson (1990) initiated a literature on evolution in cheap-talk games by formulating the secret handshake effect: evolution selects an efficient stable state if mutants can send messages that the incumbents either do not see or do not benefit from seeing. Against the incumbents a mutant plays the same action as the incumbents do, but against other mutants the mutant plays an action that is a component of the efficient equilibrium. Thus the mutants are able to invade unless the incumbents are already playing efficiently. See also the related analysis in Matsui (1991) and Wiseman and Yilankaya (2001). We allow for deception and still find that efficiency is necessary (though no longer sufficient) for stability. As pointed out by Wärneryd (1991) and Schlag (1993), among others, problems arise if either the incumbents use all available messages (so that there is no message left for the mutants to coordinate on) or the incumbents follow a strategy that induces the mutants to play an action that lowers the mutants’ payoffs below those of the incumbents. To circumvent this problem, Kim and Sobel (1995) use stochastic stability arguments and Wärneryd (1998) uses complexity costs. Similarly, evolution selects an efficient outcome in our model, where the preferences also serve the function of messages.

We conclude by mentioning three other related strands of literature in which deception has been implicitly studied: (1) the “strategic teaching” literature, which studies situations in which sophisticated agents manipulate the learning input of opponents in order to change the beliefs and future actions of these opponents (see, e.g., Fudenberg and Levine, 1998; Camerer, Ho, and Chong, 2002; Schipper, 2017, Section 8.11); (2) the “reputation” literature, in which a long-run player manipulates the beliefs and behaviour of short-run opponents (see Mailath and Samuelson, 2006, for a textbook exposition); and (3) non-equilibrium level-k analysis of games of conflict, where agents can use pre-play communication to deceive naive opponents (see, e.g., Crawford, 2003).

Structure.
The rest of the paper is organised as follows. Section 2 presents the model. The results are presented in Section 3. In Section 4 we extend the model to deal with partial observability (formally analysed in Appendix D) and type-interdependent preferences (formally analysed in Appendix B). We conclude in Section 5. Appendix A contains proofs not in the main text. Appendix C formally constructs heterogeneous stable populations in specific games.
2 Model

We consider a large population of agents, each of whom is endowed with a type that determines her subjective preferences and her cognitive level. The agents are randomly matched to play a symmetric two-player game. A dynamic evolutionary process of cultural learning, or biological inheritance, increases the frequency of more successful types. We present a static solution concept to capture stable population states in such environments.
Consider a symmetric two-player normal-form game G with a finite set A of pure actions and a set ∆(A) of mixed actions (or strategies). We use the letter a (resp., σ) to describe a typical pure action (resp., mixed action). Payoffs are given by π : A × A → ℝ, where π(a, a′) is the material (or fitness) payoff to a player using action a against action a′. The payoff function is extended to mixed actions in the standard way, where π(σ, σ′) denotes the material payoff to a player using strategy σ against an opponent using strategy σ′. With a slight abuse of notation let a denote the degenerate strategy that puts all the weight on action a. We adopt this convention for probability distributions throughout the paper.

Remark 1. Asymmetric interactions can be captured in our setup (as is standard in the literature; see, e.g., Brown and von Neumann, 1950; Selten, 1980; van Damme, 1987, Section 9.5) by embedding the asymmetric interaction in a larger, symmetric game in which nature first randomly assigns the players to roles in the asymmetric interaction.

We imagine a large population of individuals (technically, a continuum) who are uniformly randomly matched to play the game G. Each individual i in the population is endowed with a type θ = (u, n) ∈ Θ = U × ℕ, consisting of preferences, identified with a von Neumann–Morgenstern utility function u ∈ U, and a cognitive level n ∈ ℕ. Let ∆(Θ) be the set of all finite-support probability distributions on Θ. A population is represented by a finite-support type distribution µ ∈ ∆(Θ). Let C(µ) denote the support (carrier) of the type distribution µ ∈ ∆(Θ). Given a type θ, we use u_θ and n_θ to refer to its preferences and cognitive level, respectively.

For tractability, we choose to work with a discrete set of cognitive levels. The main results in the paper can be adapted to a setup in which the feasible set of cognitive efforts is a continuum, provided that we maintain our focus on finite-support type distributions. Comment 6 in Section 2.2 explains why we restrict attention to finite-support type distributions.

In the main model we assume that the preferences are defined over action profiles, as in Dekel, Ely, and Yilankaya (2007). This means that any preferences can be represented by a utility function of the form u : A × A → ℝ. The set of all possible (modulo affine transformations) utility functions on A × A is U = [0, 1]^{|A|×|A|}. Let BR_u(σ′) denote the set of best replies to strategy σ′ given preferences u, i.e., BR_u(σ′) = arg max_{σ ∈ ∆(A)} u(σ, σ′).

There is a fitness cost to increased cognition, represented by a strictly increasing cognitive cost function k : ℕ → ℝ₊ satisfying lim_{n→∞} k(n) = ∞. The fitness payoff of an individual equals the material payoff from the game, minus the cognitive cost. Let k_n denote the cost of having cognitive level n. Hence k_θ = k_{n_θ} denotes the cost of having type θ. Without loss of generality, we assume that k_1 = 0.

We would like to put forward two motivations for the assumption that there is an increasing fitness cost of having a higher cognitive level. The first motivation is relevant to settings in which the evolution of types is influenced by biological inheritance. There is a literature in biology and biological anthropology showing that brain volume, especially neocortex volume, is correlated with the size of social groups across species. Noting that brain tissue is metabolically costly, it has been argued that the size of the brain (in particular the neocortex) is at least partially determined by the complexity of social organisation (see Dunbar, 1998, for a summary of the evidence and the arguments), which is in line with the “Machiavellian intelligence” and “social brain” hypotheses (Humphrey, 1976; Byrne and Whiten, 1997; Whiten and Byrne, 1988).

The second motivation is relevant also in setups in which types evolve as part of a social learning process. For concreteness, suppose that agents face two kinds of decision problems throughout their lives: (1) individual (ecological) decision problems against nature, and (2) interactive (social) decision problems, as represented by playing the underlying game G. Agents have limited cognitive capacity. New agents who join the population face a trade-off between developing their deception-related cognitive skills (which are helpful when playing the game G) and developing other skills (which are helpful in the decision problems against nature). When a new agent joins the population, his type θ = (u_θ, n_θ) determines how much effort the agent exerts in developing his deception-related cognitive ability n_θ (while the remaining effort is exerted to develop the other skills). The increasing cognitive cost function k(n_θ) captures the agent’s loss due to his sub-optimal performance in the decision problems against nature, which is induced by diverting effort to developing his deception-related cognitive ability at the expense of developing the other skills.

In Appendix B, we study type-interdependent preferences, which depend on the opponent’s type, as in Herold and Kuzmics (2009).

2.2 Configurations

A state of the population is described by a type distribution and a behaviour policy for each type in the support of the type distribution.
An individual’s behaviour is assumed to be (subjectively) rational in the sense that it maximises her subjective preferences given the belief she has about the opponent’s expected behaviour. However, her beliefs may be incorrect if she is deceived by her opponent. An individual may be deceived if her opponent is of a strictly higher cognitive level. The probability of deception is given by the function q : ℕ × ℕ → [0, 1] that satisfies q(n, n′) = 0 if and only if n ≤ n′. We interpret q(n, n′) as the probability that a player with cognitive level n deceives an opponent with cognitive level n′. Specifically, when two players with cognitive levels n′ and n ≥ n′ are matched to play, then with a probability of q(n, n′) the individual with the higher cognitive level n (henceforth, the higher type) observes the opponent’s preferences perfectly, and is able to deceive the opponent (henceforth, the lower type). The deceiver is allowed to choose whatever she wants the deceived party to believe about the deceiver’s intended action choice. The deceived party best-replies given her possibly incorrect belief. For simplicity, we assume that if the deceived party has multiple best replies, then the deceiver is allowed to break the indifference, and choose which of the best replies she wants the deceived party to play. Consequently the deceiver is able to induce the deceived party to play any strategy that is a best reply to some belief about the opponent’s mixed action, given the deceived party’s preferences.

Given preferences u ∈ U, let Σ(u) denote the set of undominated strategies, which are the strategies that are best replies to at least one strategy of the opponent (given the preferences u). Formally, we define

Σ(u) = {σ ∈ ∆(A) : there exists σ′ ∈ ∆(A) such that σ ∈ BR_u(σ′)}.

We say that a strategy profile is a deception equilibrium if the strategy profile is optimal from the point of view of the deceiver under the constraint that the deceived player has to play an undominated strategy. Formally:
Definition 1. Given two types θ, θ′ with n_θ > n_θ′, a strategy profile (σ̃, σ̃′) is a deception equilibrium if

(σ̃, σ̃′) ∈ arg max_{σ ∈ ∆(A), σ′ ∈ Σ(u_θ′)} u_θ(σ, σ′).

Let DE(θ, θ′) be the set of all such deception equilibria.

One can extend our main results to a setup in which individuals with lower cognitive levels can deceive opponents with higher cognitive levels with a sufficiently small probability. Specifically, assume that for each generic game there exists ε > 0 such that q(n, n′) < ε for each n ≤ n′ (instead of requiring q(n, n′) = 0). One can show that the characterisation of NSCs in Corollary 2 remains qualitatively the same. Namely, the only candidates to be NSCs are configurations in which all agents have the minimal cognitive level, and all agents play the efficient action profile in every match with no deception. These configurations are NSCs if the effective cost of deception is sufficiently high.

With the remaining probability of 1 − q(n, n′) − q(n′, n) there is no deception between the two matched individuals with cognitive levels n and n′, and they play a Nash equilibrium of the game induced by their preferences. Given two preferences u, u′ ∈ U, let NE(u, u′) ⊆ ∆(A) × ∆(A) be the set of mixed equilibria of the game induced by the preferences u and u′, i.e.,

NE(u, u′) = {(σ, σ′) ∈ ∆(A) × ∆(A) : σ ∈ BR_u(σ′) and σ′ ∈ BR_{u′}(σ)}.

We are now in a position to define our key notion of a configuration (following the terminology of Dekel, Ely, and Yilankaya, 2007), by combining a type distribution with a behaviour policy, as represented by Nash equilibria and deception equilibria.
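Definition 1 can be made concrete with a small numerical sketch. The following Python code is a hypothetical pure-strategy approximation (the paper allows fully mixed strategies): it approximates the undominated set Σ(u) of a two-action game by checking pure best replies against a grid of beliefs, and then searches over pure profiles for a deception equilibrium. The payoff matrices are invented for illustration.

```python
import numpy as np

def undominated_pure_actions(u, grid=101):
    """Pure actions that are a best reply to at least one belief about the
    opponent's mixed action (two-action games only; beliefs on a grid)."""
    best = set()
    for p in np.linspace(0.0, 1.0, grid):
        belief = np.array([p, 1.0 - p])       # opponent plays action 0 w.p. p
        best.add(int(np.argmax(u @ belief)))  # own best reply to this belief
    return best

def deception_equilibrium(u_deceiver, u_victim):
    """Deceiver's favourite pure profile (a, a'), subject to the victim's
    action a' being undominated given the victim's own preferences."""
    feasible = undominated_pure_actions(u_victim)
    profiles = [(a, a_v) for a in range(u_deceiver.shape[0]) for a_v in feasible]
    return max(profiles, key=lambda pr: u_deceiver[pr[0], pr[1]])

# Deceiver with selfish Prisoner's-Dilemma fitness; victim whose subjective
# utility makes cooperation (action 0) dominant, so Sigma(u_victim) = {0}.
u_deceiver = np.array([[4.0, 0.0],
                       [5.0, 1.0]])
u_victim = np.array([[3.0, 1.0],
                     [2.0, 0.0]])
print(deception_equilibrium(u_deceiver, u_victim))  # (1, 0): defect vs. cooperate
```

In this example the deceiver convinces the victim that cooperating is her best reply and then defects against it, exactly the mimicry concern raised in the introduction.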
Definition 2. A configuration is a pair (µ, b), where µ ∈ ∆(Θ) is a type distribution and b = (b^N, b^D) is a behaviour policy, where b^N, b^D : C(µ) × C(µ) → ∆(A) satisfy, for each θ, θ′ ∈ C(µ):

q(n_θ, n_θ′) + q(n_θ′, n_θ) < 1 ⇒ (b^N_θ(θ′), b^N_θ′(θ)) ∈ NE(u_θ, u_θ′), and

q(n_θ, n_θ′) > 0 (⇔ n_θ > n_θ′) ⇒ (b^D_θ(θ′), b^D_θ′(θ)) ∈ DE(θ, θ′).

We interpret b^D_θ(θ′) (resp., b^N_θ(θ′)) to be the strategy used by type θ against type θ′ when deception occurs (resp., does not occur).

Given a configuration (µ, b) we call the types in the support of µ incumbents. Note that standard arguments imply that for any distribution µ, there exists a mapping b : C(µ) × C(µ) → ∆(A) such that (µ, b) is a configuration. Given a configuration (µ, b) and types θ, θ′ ∈ C(µ), let π_θ(θ′ | (µ, b)) be the expected fitness of an agent with type θ conditional on being matched with θ′:

π_θ(θ′ | (µ, b)) = (q(n_θ, n_θ′) + q(n_θ′, n_θ)) · π(b^D_θ(θ′), b^D_θ′(θ)) + (1 − (q(n_θ, n_θ′) + q(n_θ′, n_θ))) · π(b^N_θ(θ′), b^N_θ′(θ)).

The expected fitness of an individual of type θ in configuration (µ, b) is

Π_θ|(µ,b) = Σ_{θ′ ∈ C(µ)} µ(θ′) · π_θ(θ′ | (µ, b)) − k_θ,

where µ(θ′) denotes the frequency of type θ′ in the population. Given a configuration (µ, b), let Π_(µ,b) be the average fitness in the population, i.e.,

Π_(µ,b) = Σ_{θ ∈ C(µ)} µ(θ) · Π_θ|(µ,b).

When all incumbent types have the same expected fitness (i.e., Π_(µ,b) = Π_θ|(µ,b) for each θ ∈ C(µ)), we say that the configuration is balanced.

A number of aspects of our model of cognitive sophistication merit further discussion.

1. Unidimensional cognitive ability: In reality the ability to deceive and the ability to detect preferences are probably not identical.
However, both of them are likely to be stronglyrelated to cognitive ability in general, and more specifically to theory of mind and the abil-ity to entertain higher-order intentional attitudes (Kinderman, Dunbar, and Bentall, 1998;Dunbar, 1998). For this reason we believe that a unidimensional cognitive trait is a reason-able approximation. Moreover, it is an approximation that affords us necessary tractability.We connect the abilities to detect and conceal preferences with the ability to deceive, by as-suming (throughout the paper) that one is able to deceive one’s opponent if and only if oneobserves the opponent’s preferences and conceals one’s own preferences from the opponent.2.
Power of deception: Our definition of deception equilibrium amounts to an assumption that a successful deception attempt allows the deceiver to implement her favourite strategy profile, under the constraint that the deceived party does not choose a dominated action from her point of view. Moreover, we assume that a player with a higher cognitive level knows whether her deception was successful when choosing her action. These assumptions give higher cognitive types a clear advantage over lower cognitive types. Hence, in an alternative model in which successful deceivers have less deception power, we would expect the evolutionary advantage of higher types to be weaker than in our current model. Below we find that (for generic games) in any stable state everyone plays the same efficient action profile and has the lowest cognitive level. We conjecture that these states will remain stable also in a model where successful deception is less powerful. We leave for future research the analysis of feasible but less powerful deception technologies.

3.
Same deception against all lower types: Our model assumes that a player may use different deceptions against different types with lower cognitive levels. We note that our results remain the same (with minor changes to the proofs) in an alternative setup in which individuals have to use the same mixed action in their deception efforts towards all opponents.

4.
Non-Bayesian deception: Note that a successful deceiver is able to induce the opponent to believe that the deceiving type will play any mixed action σ̂, even an action that is never played by any agent in the population. That is, deception is so powerful in our model that the deceived opponent is not able to apply Bayesian reasoning in his false assessment of which action the agent is going to play. We think of this assumption as describing a setting in which the deceiver (of a higher cognitive type) is able to provide a convincing argument (tell a convincing story) that she is going to play σ̂. From a Bayesian perspective one might object that these arguments are signals that should be used to update beliefs. To this we would respond that the stories told to a potential victim by different deceivers will vary across

(Footnote: Thus, in our setup a cognitive arms race (i.e. the Machiavellian intelligence hypothesis à la Humphrey, 1976; Robson, 2003) is a non-equilibrium phenomenon, or alternatively a feature of non-generic games.)
5. Observation and Nash equilibrium behaviour in the case of non-deception: It is difficult to avoid an element of arbitrariness when making an assumption about what is being observed when neither party is able to deceive the other. As in most of the existing literature on the indirect evolutionary approach (e.g., Güth and Yaari, 1992; Dekel, Ely, and Yilankaya, 2007, Section 3), we assume that when there is no deception, there is perfect observability of the opponent's preferences. In Section 4.1 we discuss the implications of relaxing this assumption. We consider it to be an important contribution of our analysis that it highlights the critical importance of the assumption made regarding observability, and the resulting behaviour, in matches without deception.

We further assume that if two agents observe each other's preferences then they play a Nash equilibrium of the complete information game induced by their preferences. This assumption is founded on the common idea that when agents are not deceived, (1) over time they adapt their beliefs (in a way that is consistent with Bayesian inference) about the distribution of actions they face, conditional on their partners' observed preferences, and (2) they best-reply given their belief about their current partner's distribution of actions. By contrast, as discussed above, when agents are deceived they are unable to correctly update their beliefs about their partner's action (i.e. unable to use Bayesian inference to arrive at beliefs about the opponent's distribution of actions). Still, they are able to best-reply given their (possibly false) beliefs about the deceiver's action.

6.
Continuum population and finite-support type distributions: Our model is intended to be a simple approximation of a real-life environment that includes a large finite population, and in which new agents who join the population, or existing agents who revise their choice of type, typically choose to mimic one of the existing active types. As a result each active type is played by several agents (rather than by a single agent), and for each active type there is a positive probability of a match between agents who are endowed with this type. As is common in the literature, for tractability, we assume a continuum population and an “exact law of large numbers,” rather than a large finite population. We want all other aspects of the model to be as close as possible to the real-life environment. Specifically, we want to maintain the property that for each type, there is a positive probability of a match between agents who are endowed with this type. In order to maintain this property, we have to assume that the distribution of active types has a finite support.

7. Alternative interpretation of our model: social status: As suggested by one of the referees, one can present an interesting interpretation of our model that describes social status, rather than deception. According to this interpretation, the level n_θ of type θ describes the social status (like caste) of agents belonging to this type. When two players are randomly matched to play a game, first a “social struggle” ensues. With a certain probability, the higher-caste player prevails and enslaves the lower-caste opponent. This means he can dictate the choice by the lower-caste opponent as long as the choice is undominated for this opponent. Otherwise, they simply play the Nash equilibrium of the game (given by their preferences). Maintaining a higher social status is costly in terms of fitness.
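To make the fitness expressions in Definition 2 concrete, the following Python sketch computes the conditional match fitness π_θ(θ′ | (μ, b)), the expected fitness Π_θ|(μ,b), and a balancedness check for a toy two-type population. The payoff matrix, the cognitive costs, and the deception technology q are hypothetical illustrations, not the model's calibration.

```python
# A minimal sketch of the fitness calculations in Definition 2.
# All numerical values (payoffs, costs, the deception technology q) are
# hypothetical illustrations.

def q(n, n_prime):
    """Hypothetical deception probability: a strictly higher level deceives
    a lower level with probability 1/2; otherwise deception is impossible."""
    return 0.5 if n > n_prime else 0.0

# Fitness payoff pi[(a, a')] of playing action a against action a'.
pi = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}

# Two illustrative types: cognitive level n and cognitive cost k.
types = {"theta1": {"n": 1, "k": 0.0}, "theta2": {"n": 2, "k": 0.4}}
mu = {"theta1": 0.8, "theta2": 0.2}                 # type distribution
bN = {(s, t): "C" for s in types for t in types}    # play when no deception
bD = {(s, t): "D" for s in types for t in types}    # play when deception occurs

def match_fitness(s, t):
    """pi_s(t | (mu, b)): expected fitness of s conditional on meeting t."""
    p_dec = q(types[s]["n"], types[t]["n"]) + q(types[t]["n"], types[s]["n"])
    return (p_dec * pi[(bD[(s, t)], bD[(t, s)])]
            + (1 - p_dec) * pi[(bN[(s, t)], bN[(t, s)])])

def expected_fitness(s):
    """Pi_{s|(mu,b)}: average over opponents, net of the cognitive cost."""
    return sum(mu[t] * match_fitness(s, t) for t in types) - types[s]["k"]

fitnesses = {s: expected_fitness(s) for s in types}
# The configuration is balanced iff all incumbent types earn the same fitness.
balanced = max(fitnesses.values()) - min(fitnesses.values()) < 1e-9
```

With these made-up numbers the two types earn different expected fitnesses, so this particular configuration is not balanced.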
As discussed in the previous subsection, each agent in the population behaves in a way that maximises the agent's subjective preferences induced by the agent's type. By contrast, the distribution of types in the population evolves according to the expected material fitness obtained by each type. This evolutionary process is captured by the static solution concepts introduced in this subsection.

We consider dynamics in which types with higher expected fitness gradually become more frequent. One example of such dynamics is the replicator dynamic (Taylor and Jonker, 1978), which can be interpreted in terms of biological (asexual) reproduction or as social learning by imitation (see Weibull, 1995, Chapter 3, for a textbook introduction). According to the latter interpretation, an agent who has the opportunity to revise her choice, or a new agent who joins the population, randomly chooses a member of the population as a “mentor,” and imitates the mentor's type; the probability that an agent is chosen as a mentor is proportional to that agent's fitness.

Recall that a neutrally stable strategy (Maynard Smith and Price, 1973; Maynard Smith, 1982) is a strategy that, if played by most of the population, weakly outperforms any other strategy. Similarly, an evolutionarily stable strategy is a strategy that, if played by most of the population, strictly outperforms any other strategy.
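The stability comparison recalled above, and its connection to the replicator dynamic, can be illustrated numerically. The sketch below uses a made-up two-action fitness matrix; is_ess carries out the post-entry payoff comparison on a grid of mutant strategies, and replicator_step performs one Euler step of the replicator dynamic. All parameters are illustrative assumptions.

```python
# Illustrative check of evolutionary stability in a symmetric two-action game,
# together with one replicator-dynamic step (all payoffs are made up).

# Payoff matrix pi[i][j]: fitness of pure action i against pure action j.
pi = [[3, 0],
      [4, 1]]   # action 1 strictly dominates, as in a Prisoner's Dilemma

def payoff(sigma, tau):
    """Expected fitness of mixed strategy sigma against mixed strategy tau."""
    return sum(sigma[i] * pi[i][j] * tau[j] for i in range(2) for j in range(2))

def is_ess(sigma, eps=1e-3, grid=50):
    """Crude numerical stability test: sigma must strictly outperform every
    grid mutant sigma' inside the post-entry population (1-eps)sigma + eps sigma'."""
    for k in range(grid + 1):
        p = k / grid
        mutant = [p, 1 - p]
        if mutant == sigma:
            continue
        post = [(1 - eps) * sigma[i] + eps * mutant[i] for i in range(2)]
        if payoff(mutant, post) >= payoff(sigma, post):
            return False
    return True

def replicator_step(x, dt=0.1):
    """One Euler step of the replicator dynamic on the action frequencies x."""
    f = [sum(pi[i][j] * x[j] for j in range(2)) for i in range(2)]
    avg = sum(x[i] * f[i] for i in range(2))
    return [x[i] + dt * x[i] * (f[i] - avg) for i in range(2)]
```

In this game the strict equilibrium action (index 1) passes the test, the dominated action fails it, and the replicator step moves frequencies towards the better-performing action.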
Definition 3.
A strategy σ ∈ Δ(A) is a neutrally stable strategy (NSS) if for every σ′ ∈ Δ(A) there is some ε̄ ∈ (0, 1) such that if ε ∈ (0, ε̄), then π̃(σ′, (1 − ε)σ + εσ′) ≤ π̃(σ, (1 − ε)σ + εσ′). If the weak inequality is replaced by a strict inequality for each σ′ ≠ σ, then σ is an evolutionarily stable strategy (ESS).

(Footnote: More accurately, we need to assume that the set of active types is countable. All of our results hold under this somewhat weaker assumption.)

It is well known that NSSs and ESSs correspond to Lyapunov stable and asymptotically stable population states, respectively, under the replicator dynamics. That is, a population starting close
to an NSS will always remain close to the NSS, and a population starting close to an ESS will converge to the ESS (see, e.g., Taylor and Jonker, 1978; Thomas, 1985; Bomze and Weibull, 1995; Cressman, 1997; Sandholm, 2010).

We extend the notions of neutral and evolutionary stability from strategies to configurations. We begin by defining the type game that is induced by a configuration.
Definition 4.
For any configuration (μ, b) the corresponding type game Γ_(C(μ),b) is the symmetric two-player game in which each player's pure strategy space is C(μ), and the payoff to strategy θ, against θ′, is π_θ(θ′ | (μ, b)) − k_θ.

The definition of a type game allows us to apply notions and results from standard evolutionary game theory, where evolution acts upon strategies, to the present setting, where evolution acts upon types. A similar methodology was used in Mohlin (2012). Note that each type distribution with support in C(μ) is represented by a mixed strategy in Γ_(C(μ),b).

We want to capture robustness with respect to small groups of individuals, henceforth called mutants, who introduce new types and new behaviours into the population. Suppose that a fraction ε of the population is replaced by mutants, and suppose that the distribution of types within the group of mutants is μ′ ∈ Δ(Θ). Consequently the post-entry type distribution is μ̃ = (1 − ε) · μ + ε · μ′. That is, for each type θ ∈ C(μ) ∪ C(μ′), μ̃(θ) = (1 − ε) · μ(θ) + ε · μ′(θ). In line with most of the literature on the indirect evolutionary approach, we assume that adjustment of behaviour is infinitely faster than adjustment of the type distribution. Thus we assume that the post-entry type distribution quickly stabilises into a configuration (μ̃, b̃). There may exist many such post-entry configurations, all having the same type distribution but different behaviour policies. We note that incumbents do not have to adjust their behaviour against other incumbents in order to continue playing Nash equilibria, and deception equilibria, among themselves. For this reason, we assume (similarly to Dekel, Ely, and Yilankaya, 2007, in the setup with perfect observability) that the incumbents maintain the same pre-entry behaviour among themselves. Formally:

Definition 5.
Let (μ, b) and (μ̃, b̃) be two configurations such that C(μ) ⊆ C(μ̃). We say that (μ̃, b̃) is focal (with respect to (μ, b)) if θ, θ′ ∈ C(μ) implies that b̃^D_θ(θ′) = b^D_θ(θ′) and b̃^N_θ(θ′) = b^N_θ(θ′).

Standard fixed-point arguments imply that for every configuration (μ, b) and every type distribution μ̃ satisfying C(μ) ⊆ C(μ̃), there exists a behaviour policy b̃ such that (μ̃, b̃) is a focal configuration.

Our stability notion requires that the incumbents outperform all mutants in all configurations that are focal relative to the initial configuration.

(Footnote: Sandholm (2001) and Mohlin (2010) are exceptions.)

Definition 6. A configuration (μ, b) is a neutrally stable configuration (NSC) if, for every μ′ ∈ Δ(Θ), there is some ε̄ ∈ (0,
1) such that for all ε ∈ (0, ε̄), it holds that if (μ̃, b̃), where μ̃ = (1 − ε) · μ + ε · μ′, is a focal configuration, then μ is an NSS in the type game Γ_(μ̃,b̃). The configuration (μ, b) is an evolutionarily stable configuration (ESC) if the same conditions imply that μ is an ESS in the type game Γ_(μ̃,b̃) for each μ′ ≠ μ.

We conclude this section by discussing a few issues related to our notion of stability.

1. In line with existing notions of evolutionary stability in the literature (in particular, the notions of Dekel, Ely, and Yilankaya, 2007, and Alger and Weibull, 2013), we require the mutants to be outperformed in all focal configurations (rather than requiring them to be outperformed in at least one focal configuration). This reflects the assumption that the population converges to a new post-entry equilibrium in a decentralised (possibly random) way that may lead to any of the post-entry focal configurations. Thus the incumbents cannot coordinate their post-entry play on a specific focal configuration that favours them.

2. In order to be consistent with the standard definition of neutral stability, we require the incumbents to earn weakly more than the average payoff of the mutants. We note that all of our results remain the same if one uses an alternative, weaker definition that requires the incumbents to earn weakly more than the worst-performing mutant.

3. The main stability notion that we use in the paper is NSC. The stronger notion of ESC is not useful in our main model because there always exist equivalent types that have slightly different preferences (as the set of preferences is a continuum) and induce the same behaviour as the incumbents. Such mutants always achieve the same fitness as the incumbents in post-entry configurations, and thus ESCs never exist. Note that the stability notions in Dekel, Ely, and Yilankaya (2007) and Alger and Weibull (2013) are also based on neutral stability.
In Section B we study a variant of the model in which preferences may depend also on the opponent's type. This allows for the existence of ESCs.

4. Observe that Definition 6 implies internal stability with respect to small perturbations in the frequencies of the incumbent types (because when μ′ = μ, then μ is required to be an NSS in Γ_(C(μ),b)). By standard arguments, internal stability implies that any NSC is balanced: all incumbent types obtain the same fitness.

5. The stability notions of Dekel, Ely, and Yilankaya (2007) and Alger and Weibull (2013) consider only monomorphic groups of mutants (i.e. mutants all having the same type). We additionally consider stability against polymorphic groups of mutants (as do Herold and Kuzmics,

(Footnote: In their stability analysis of homo hamiltonensis preferences, Alger and Weibull (2013) disregard mutants who are behaviourally indistinguishable from homo hamiltonensis upon entry.)
Define the deviation gain of action a ∈ A, denoted by g(a) ∈ R_+, as the maximal gain a player can get by playing a different action in a population in which everyone plays a:

g(a) = max_{a′ ∈ A} π(a′, a) − π(a, a).

Note that g(a) = 0 iff (a, a) is a Nash equilibrium.

Define the effective cost of deception in the environment, denoted by c ∈ R_+, as the minimal ratio between the cognitive cost and the probability of deceiving an opponent of cognitive level one:
c = min_{n ≥ 2} k_n / q(n, 1).

We say that a strategy profile is efficient if it maximises the sum of fitness payoffs. Formally:
Definition 7.
A strategy profile (σ, σ′) is efficient in the game G = (A, π) if π(σ, σ′) + π(σ′, σ) ≥ π(a, a′) + π(a′, a) for each action profile (a, a′) ∈ A × A.

Note that our notion of efficiency is defined: (1) with respect to the fitness payoff (rather than the agents' subjective payoffs), similarly to the analogous definition of efficiency in Dekel, Ely, and Yilankaya (2007), and (2) with respect to the strategy profile played by the agents; by contrast, the definition does not take into account the cognitive costs.

A pure Nash equilibrium (a, a) is strict if π(a, a) > π(a′, a) for all a′ ≠ a ∈ A. Let π̂ = max_{a,a′ ∈ A} 0.5 · (π(a, a′) + π(a′, a)) denote the efficient payoff, i.e. the average payoff achieved by players who play an efficient profile.

(Footnote: The minimum in the definition of c is well defined for the following reason. Let n̂ be a number such that k_n̂ > k_2 / q(2, 1) (such a number exists because lim_{n→∞} k_n = ∞). Observe that k_n / q(n, 1) ≥ k_n > k_2 / q(2, 1) for any n ≥ n̂. This implies that there is an n̄ such that 2 ≤ n̄ ≤ n̂ and n̄ = arg min_{n ≥ 2} k_n / q(n, 1).)

(Footnote: We define the effective cost of deception only with respect to an opponent with a cognitive level of one because we later show (Lemma 1 and Theorem 2) that the only candidate to be an NSC is a configuration in which all agents have a cognitive level of one, and such a configuration is an NSC iff the effective cost of deception against these incumbents with n = 1 is sufficiently large.)
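The quantities g(a), π̂, and c are directly computable. The following Python sketch does so for a made-up two-action game, with hypothetical cost and deception-probability functions k(n) and q(n, n′); the model only requires that k be increasing and unbounded and that q(n, n′) be positive only for n > n′, so these functional forms are illustrative assumptions.

```python
# Sketch of the quantities defined above, for a made-up two-action game:
# the deviation gain g(a), the efficient payoff pi_hat, and the effective
# cost of deception c = min over n >= 2 of k(n) / q(n, 1).

pi = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
actions = ["C", "D"]

def g(a):
    """Deviation gain: maximal gain from a unilateral deviation when
    everyone else plays a. Equals 0 iff (a, a) is a Nash equilibrium."""
    return max(pi[(b, a)] for b in actions) - pi[(a, a)]

def pi_hat():
    """Efficient payoff: half the maximal total fitness over action profiles."""
    return max(0.5 * (pi[(a, b)] + pi[(b, a)]) for a in actions for b in actions)

def k(n):
    """Hypothetical cognitive-cost function (increasing and unbounded in n)."""
    return 0.6 * (n - 1)

def q(n, n_prime):
    """Hypothetical deception probability (positive only when n > n_prime)."""
    return 1 - 2 ** (n_prime - n) if n > n_prime else 0.0

def effective_cost(n_max=50):
    """c = min_{n >= 2} k(n) / q(n, 1); a finite search suffices here
    because k(n) / q(n, 1) >= k(n), which tends to infinity."""
    return min(k(n) / q(n, 1) for n in range(2, n_max + 1))
```

With these assumptions g("C") = 1, π̂ = 3, and c = k(2) / q(2, 1) = 1.2.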
An action a is a punishment action if playing it guarantees that the opponent will obtain less than the efficient payoff, i.e. π(a′, a) < π̂ for each a′ ∈ A. Some of our results below assume that the underlying game admits a punishment action.

Remark 2. Many economic interactions admit punishment actions. A few examples include:

1. Price competition (Bertrand), either for a homogeneous good or for differentiated goods, where a punishment action is setting the price equal to zero.

2. Quantity competition (Cournot), either for a homogeneous good or for differentiated goods, where the punishment action is “flooding” the market.

3. Public good games, where contributing nothing to the public good is the punishment action.

4. Bargaining situations, where the punishment action is for one side of the bargaining to insist on obtaining all the surplus.

5. Any game that admits an action profile that Pareto dominates all other action profiles (i.e., games with common interests).

Moreover, if one adds to any underlying generic game a new pure action that is equivalent to playing the mixed action that min-maxes the opponent's payoff (e.g., in matching pennies this new action is equivalent to privately tossing a coin and then playing according to the toss's outcome), then this newly added action is always a punishment action.

Given a configuration (μ, b), let n̄ = max_{θ ∈ C(μ)} n_θ denote the maximal cognitive level of the incumbents. We refer to incumbents with this cognitive level as the highest types.

A deception equilibrium is fitness maximising if it maximises the fitness of the higher type in the match (under the restriction that the lower type plays an action that is not dominated, given her preferences). Formally:

Definition 8.
Let θ, θ′ be types with n_θ > n_θ′. A deception equilibrium (σ̃, σ̃′) is fitness maximising if

(σ̃, σ̃′) ∈ arg max_{σ ∈ Δ(A), σ′ ∈ Σ(u_θ′)} π(σ, σ′).

Let
FMDE(θ, θ′) ⊆ DE(θ, θ′) denote the set of all such fitness-maximising deception equilibria of two types θ, θ′ with n_θ > n_θ′. In principle, FMDE(θ, θ′) might be an empty set (if there is no action profile that maximises both the fitness and the subjective utility of the higher type). Our first result (Theorem 1 below) implies that the preferences of the higher type in any NSC are such that the set FMDE(θ, θ′) is non-empty.

A configuration is pure if everyone plays the same action. Formally:

Definition 9. A configuration (μ, b) is pure if there exists a* ∈ A such that for each θ, θ′ ∈ C(μ) it holds that b^N_θ(θ′) = a* whenever q(n_θ, n_θ′) + q(n_θ′, n_θ) <
1, and b^D_θ(θ′) = a* whenever q(n_θ, n_θ′) > 0. In this case we denote the configuration by (μ, a*), and we refer to b ≡ a* as the outcome of the configuration.

In order to simplify the notation and the arguments in the proofs, we assume throughout this section that the underlying game admits at least three actions (i.e. |A| ≥ 3).

In this section we characterise the behaviour of an incumbent type, θ̄ = (u, n̄), which has the highest level of cognition in the population. We show that the behaviour satisfies the following three conditions:

1. Type θ̄ plays an efficient action profile when meeting itself.

2. Type θ̄ maximises its fitness in all deception equilibria.

3. Any opponent with a lower cognitive level achieves at most the efficient payoff π̂ against type θ̄.

Theorem 1.
Let (μ*, b*) be an NSC, and let θ, θ̄ ∈ C(μ*). Then: (1) if n_θ̄ = n̄ then π(θ̄, θ̄) = π̂; (2) if n_θ < n_θ̄ = n̄ then (b^D_θ̄(θ), b^D_θ(θ̄)) ∈ FMDE(θ̄, θ); and (3) if n_θ < n_θ̄ = n̄ then π(θ, θ̄) ≤ π̂.

Proof Sketch (formal proof in Appendix A.2). The proof utilises mutants (denoted by θ_1, θ_2, θ_3, and θ̂ below) with the highest cognitive level n̄ and with a specific kind of utility function, called indifferent and pro-generous, that makes a player indifferent between all her own actions, but which makes the player prefer that the opponent choose an action that allows the player to obtain the highest possible fitness payoff.

To prove part 1 of the theorem, assume to the contrary that π(b_θ̄(θ̄), b_θ̄(θ̄)) < π̂. Let a_1, a_2 ∈ A be any two actions such that (a_1, a_2) is an efficient action profile (i.e. 0.5 · (π(a_1, a_2) + π(a_2, a_1)) = π̂). Consider three different mutant types θ_1, θ_2, and θ_3, which are of the highest cognitive level that is present in the population, and have indifferent and pro-generous utility functions. Suppose that mutants of these three types enter the population.

(Footnote: For tractability we assume that a configuration can have only finite support. Note, however, that there is some sufficiently high cognitive level n such that k_n > max_{a,a′ ∈ A} π(a, a′). As a result, any NSC must include only a finite number of cognitive levels, even without the finite-support assumption.)
There is a focal post-entry configuration in which the incumbents keep playing their pre-entry play among themselves, the mutants play the same Nash equilibria as the incumbent θ̄ against all incumbent types (and the incumbents behave against the mutants in the same way they behave against θ̄), the mutants play fitness-maximising deception equilibria against all lower types, when mutants of type θ_i are matched with mutants of type θ_{(i+1) mod 3} they play the efficient profile (a_1, a_2), and when two mutants of the same type are matched they play the same way as two incumbents of type θ̄ that are matched together. In such a focal post-entry configuration all mutants earn a weakly higher fitness than θ̄ against the incumbents, and a strictly higher fitness against the mutants. This implies that (μ*, b*) cannot be an NSC.

To prove part 2, assume to the contrary that (b^D_θ̄(θ), b^D_θ(θ̄)) ∉ FMDE(θ̄, θ). Suppose mutants of type θ̂ enter. Consider a post-entry configuration in which the incumbents keep playing their pre-entry play among themselves, and the mutants mimic the play of θ̄, except that they play fitness-maximising deception equilibria against all lower types. The mutants obtain a weakly higher payoff than θ̄ against all types, and a strictly higher payoff than θ̄ against at least one lower type. Thus (μ*, b*) cannot be an NSC.

To prove part 3, assume to the contrary that π(θ, θ̄) > π̂. This implies that against type θ̄, type θ earns more than π̂ in either the deception equilibrium or the Nash equilibrium. Suppose mutants of type θ̂ enter.
Consider a post-entry configuration in which the incumbents keep playing their pre-entry play among themselves, while the mutants: (i) play fitness-maximising deception equilibria against lower types, (ii) mimic type θ's play in the Nash/deception equilibrium against type θ̄ in which θ earns more than π̂, and (iii) mimic the play of θ̄ in all other interactions. The type θ̂ mutants earn strictly more than θ̄ against both θ̂ and θ̄. The mutants earn weakly more than θ̄ against all other types. This implies that (μ*, b*) cannot be an NSC.

Remark 3. The first part of Theorem 1 (a highest type must play an efficient strategy profile when meeting itself) is similar to Dekel, Ely, and Yilankaya's (2007) Proposition 2, which shows that only efficient outcomes can be stable in a setup with perfect observability and no deception. We should note that Dekel, Ely, and Yilankaya (2007) use a weaker notion of efficiency. An action is efficient in the sense of Dekel, Ely, and Yilankaya (2007) (DEY-efficient) if its fitness is the highest among the symmetric strategy profiles (i.e. action a is DEY-efficient if π(a, a) ≥ π(σ, σ) for all strategies σ ∈ Δ(A)). Observe that our notion of efficiency (Definition 7) implies DEY-efficiency, but the converse is not necessarily true. The weaker notion of DEY-efficiency is the relevant one in the setup of Dekel, Ely, and Yilankaya (2007), because they consider only monomorphic groups of mutants.

(Footnote: One must have at least two different types of mutants, in order for the mutants to be able to play the asymmetric profile (a_1, a_2). We present a construction with three different mutant types in order to allow all mutant types to outperform the incumbents (one can also prove the result using a construction with only two different mutant types, but in this case one can only guarantee that the mutants, on average, would outperform the incumbents).)

(Footnote: If i = 1 (resp., i = 2, i = 3), then θ_{(i+1) mod 3} = θ_2 (resp., θ_3, θ_1).)
Corollary 1. If G does not have an efficient profile that is symmetric (i.e. if π(a, a) < π̂ for each a ∈ A), then the game does not admit an NSC.

Remark 4. As discussed in Remark 1, any interaction (symmetric or asymmetric) can be embedded in a larger, symmetric game in which nature first randomly assigns roles to the players, and then each player chooses an action given his assigned role. Observe that such an embedded game always admits an efficient symmetric action profile. In particular, if the efficient asymmetric profile in the original game is (a, a′), then the efficient symmetric profile in the embedded game is the one in which each player plays a as the row player and a′ as the column player.

In this subsection we characterise pure NSCs, i.e. stable configurations in which everyone plays the same pure action in every match. Such a configuration may be viewed as representing the state of a population that has settled on a convention that there is a unique correct way to behave. We begin by showing that in a pure NSC all incumbents have the minimal cognitive level, since having a higher ability does not yield any advantage when everyone plays the same action.
Lemma 1. If (μ, a*) is an NSC, and (u, n) ∈ C(μ), then n = 1.

Proof. Since all players earn the same game payoff of π(a*, a*), they must also incur the same cognitive cost, or else the fitness of the different incumbent types would not be balanced (which would contradict the assumption that (μ, a*) is an NSC). Moreover, this uniform cognitive level must be level 1. Otherwise a mutant of a lower level, who strictly prefers to play a* against all actions, would strictly outperform the incumbents in nearby post-entry focal configurations.

The following proposition shows that a pure outcome is stable iff it is efficient and its deviation gain is smaller than the effective cost of deception. Formally:

Proposition 1.
Let a* be an action in a game that admits a punishment action. The following two statements are equivalent:

(a) There exists a type distribution μ such that (μ, a*) is an NSC.

(b) a* satisfies the following two conditions: (1) π(a*, a*) = π̂, and (2) g(a*) ≤ c.

(Footnote: If the original game is symmetric, the role (i.e. being either the row or the column player) can be interpreted as reflecting some observable payoff-irrelevant asymmetry between the two players.)

Proof. 1. “If” side. Assume that (a*, a*) is an efficient profile and that g(a*) ≤ c. Let ã be a punishment action. Consider a monomorphic configuration (μ, a*) consisting of type θ* = (u*,
1), where all incumbents are of cognitive level 1 and of the same preference type u*, according to which all actions except a* and ã are strictly dominated, ã weakly dominates a*, and a* is a best reply to itself:

u*(a, a′) = 2 if a = ã and a′ ≠ a*; u*(a, a′) = 1 if a = a* or (a = ã and a′ = a*); and u*(a, a′) = 0 otherwise.

First, consider mutants with cognitive level 1. Observe that such mutants earn at most π(a*, a*) = π̂ when they are matched with the incumbents, and strictly less than π̂ if the mutants play any action a ≠ a* with positive probability against the incumbents. Further observe that the mutants can earn (on average) at most π̂ when they are matched with other mutants (because π̂ is the efficient payoff). This implies that incumbents weakly outperform any mutants with cognitive level one in any post-entry population.

Next, consider mutants with a higher cognitive level n >
1. Such mutants can earn at most π̂ + g(a*) when they deceive the incumbents, and at most π̂ when they do not deceive the incumbents (recall that π(ã, ã) + g(ã) = max_{a′} π(a′, ã) < π̂ because ã is a punishment action). Thus the mutants are weakly outperformed by the incumbents if

q(n, 1) · (g(a*) + π̂) + (1 − q(n, 1)) · π̂ − k_n ≤ π̂ ⇔ g(a*) ≤ k_n / q(n, 1).

This holds for all n if g(a*) ≤ c. Indeed, if g(a*) ≤ c, then the probability of deceiving the incumbents is at most k_n / g(a*), which implies that the average payoff of the mutants against the incumbents is at most π̂ + g(a*) · (k_n / g(a*)) = π̂ + k_n; thus, if the mutants are sufficiently rare, they are weakly outperformed (due to paying the cognitive cost of k_n). We conclude that (μ, a*) is an NSC.

2. “Only if” side. Assume that (μ, a*) is an NSC. Theorem 1 implies that π(a*, a*) = π̂. Assume that g(a*) > c. The definition of the effective cost of deception implies that there exists a cognitive level n such that k_n / q(n, 1) < g(a*). Lemma 1 implies that all the incumbents have cognitive level 1. Consider mutants with cognitive level n and completely indifferent preferences (i.e. preferences that induce indifference between all action profiles). Let a′ be a best reply against a*. There is a post-entry focal configuration in which (i) the incumbents play a* against the mutants, (ii) the mutants play a′ when they deceive an incumbent opponent and a* when they do not deceive an incumbent opponent, and (iii) the mutants play a* when they are matched with another mutant. Note that the mutants achieve at least π̂ + g(a*) · q(n,
1) when they are matched against the incumbents. The gain relative to incumbents, g(a*) · q(n, 1), is larger than the cognitive cost k_n, by our assumption that g(a*) > c. Thus the mutants strictly outperform the incumbents.
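Proposition 1 reduces the stability of a pure outcome to two finite checks. The sketch below verifies conditions (1) and (2) for a toy game; the payoffs, the cognitive-cost function, and the deception probabilities are all illustrative assumptions, not the paper's calibration.

```python
# Numerical check of the two conditions in Proposition 1 for a pure outcome
# a*: (1) pi(a*, a*) = pi_hat, and (2) g(a*) <= c. All parameters are made up.

pi = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
actions = ["C", "D"]
k = lambda n: 0.6 * (n - 1)                          # hypothetical cost
q = lambda n, m: 1 - 2 ** (m - n) if n > m else 0.0  # hypothetical deception

pi_hat = max(0.5 * (pi[(a, b)] + pi[(b, a)]) for a in actions for b in actions)
c = min(k(n) / q(n, 1) for n in range(2, 51))        # effective cost of deception

def stable_pure_outcome(a_star):
    """True iff a* is efficient and its deviation gain is at most c."""
    gain = max(pi[(b, a_star)] for b in actions) - pi[(a_star, a_star)]
    return pi[(a_star, a_star)] == pi_hat and gain <= c
```

Here ("C", "C") is the unique efficient symmetric profile and its deviation gain, 1, is below c = 1.2, so only "C" can be the outcome of a pure NSC; if the deviation gain were raised above c (e.g., by increasing π(D, C)), no pure NSC would exist.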
In this section we characterise NSCs in generic games, by which we mean games in which any two different action profiles each give the same player a different payoff, and each yield a different sum of payoffs.
Definition 10.
A (symmetric) game is generic if for each a, a′, b, b′ ∈ A, {a, a′} ≠ {b, b′} implies π(a, a′) ≠ π(b, b′), and π(a, a′) + π(a′, a) ≠ π(b, b′) + π(b′, b).

For example, if the entries of the payoff matrix π are drawn independently from a continuous distribution on an open subset of the real numbers, then the induced game is generic with probability one.

Note that a generic game admits at most one efficient action profile. From Corollary 1 we know that if the game does not have a symmetric efficient profile then it does not admit any NSC (and, as discussed in Remark 4, essentially every interaction admits a symmetric efficient profile). Hence we can restrict attention to games with exactly one efficient action profile. Let ā denote the action played in this unique profile.

Next we present our main result: all incumbent types play efficiently in any NSC of a generic game.

Theorem 2. If (μ*, b*) is an NSC of a generic game with a (unique) efficient outcome (ā, ā), then b* ≡ ā for all θ, θ′ ∈ C(μ*); i.e. all types play the pure action ā in all matches.

Proof. Assume to the contrary that configuration (μ*, b*) is an NSC such that there are some θ, θ′ ∈ C(μ*) such that b^N_θ(θ′) ≠ ā and q(n_θ, n_θ′) + q(n_θ′, n_θ) <
1, or b^D_θ(θ′) ≠ ā and q(n_θ, n_θ′) > 0. Let θ̊ be the type with the highest cognitive level among the types that satisfy at least one of the following conditions:

(A) θ̊ plays inefficiently against itself, i.e. π(θ̊, θ̊) < π̂.

(B) θ̊ and an opponent with a weakly higher type play an inefficient strategy profile, i.e. 0.5 · (π(θ̊, θ′) + π(θ′, θ̊)) < π̂ for some θ′ ≠ θ̊ with n_θ̊ ≤ n_θ′.

(C) A strictly lower type earns strictly more than π̂ against θ̊, i.e. π(θ″, θ̊) > π̂ for some θ″ ≠ θ̊ with n_θ̊ > n_θ″.

We will now successively rule out each of these cases.

Assume first that (A) holds. Let û be a utility function that is identical to u_θ̊ except that: (i) the payoff of the outcome (ā, ā) is increased by the minimal amount required to make it a best reply to itself, and (ii) the payoff of some other outcome is altered slightly (to ensure that û is not already an incumbent) in a way that does not change the behaviour of agents. (The formal definition of û is provided in Appendix A.3.) Suppose that mutants of type θ̂ = (û, n_θ̊) invade the population. Consider a focal post-entry configuration in which the mutants mimic the play of the type θ̊ incumbents in all matches except that: (i) the mutants play the efficient profile (ā, ā) among themselves (which yields a higher payoff than what θ̊ achieves when matched against θ̊), and (ii) when the mutants face a higher type they play either (ā, ā) or the same deception/Nash equilibrium that the higher types play against θ̊. It follows that the mutants θ̂ earn a strictly higher payoff than θ̊ against θ̂, and a weakly higher fitness than type θ̊ against all other types. Thus the mutants strictly outperform the incumbents, which contradicts the assumption that (μ*, b*) is an NSC.
The full technical details of this argument are given in Appendix A.3.

Next, assume that case (B) holds and that case (A) does not hold. This implies that

0.5 · (π(θ̊, θ′) + π(θ′, θ̊)) < π̂ = π(θ̊, θ̊) = π(θ′, θ′).

That is, in the subpopulation that includes types θ̊ and θ′, the within-type matchings yield higher payoffs (π̂) than the out-group matchings (an average payoff of less than π̂). The following formal argument shows that this property implies dynamic instability. The fact that (µ∗, b∗) is an NSC implies that µ∗ is an NSS in the type game Γ(µ∗, b∗). Let B be the payoff matrix of the type game Γ(µ∗, b∗) and let n = |C(µ∗)|. It is well known (e.g., Hofbauer and Sigmund, 1988, Exercise 6.4.3, and Hofbauer, 2011, pp. 1–2) that an interior Nash equilibrium of a normal-form game is an NSS if and only if the payoff matrix is negative semi-definite with respect to the tangent space, i.e. if and only if xᵀBx ≤ 0 for all x ∈ Rⁿ such that Σᵢ xᵢ = 0. Assume without loss of generality that type θ̊ (θ′) is represented by the j-th (k-th) row of the matrix B. Let the column vector x be defined as follows: x(j) = 1, x(k) = −1, and x(i) = 0 for each i ∉ {j, k}. That is, the vector x has all entries equal to zero, except for the j-th entry, which is equal to 1, and the k-th entry, which is equal to −1. We have

xᵀBx = B_jj − B_kj − B_jk + B_kk
     = π(ā, ā) − k_{n_θ̊} + π(ā, ā) − k_{n_θ′} − (π(b_θ̊(θ′), b_θ′(θ̊)) − k_{n_θ̊} + π(b_θ′(θ̊), b_θ̊(θ′)) − k_{n_θ′})
     = 2 · π(ā, ā) − (π(b_θ̊(θ′), b_θ′(θ̊)) + π(b_θ′(θ̊), b_θ̊(θ′))) > 0.

Thus B is not negative semi-definite with respect to the tangent space, which contradicts the fact that µ∗ is an NSS.

Finally, assume that only case (C) holds. Let θ̄ be an incumbent type with the highest cognitive level. The fact that case (B) does not hold implies that π(θ̄, θ̊) = π(θ̊, θ̄) = π̂. The fact that case (C) holds implies that π(θ″, θ̊) > π̂, which implies that type θ̊ has an undominated action that can yield a deceiving opponent a payoff of more than π̂ in a deception equilibrium. This contradicts part (2) of Theorem 1, according to which we should have (b^D_θ̄(θ̊), b^D_θ̊(θ̄)) ∈ FMDE(θ̄, θ̊).

We have shown that no type in the population satisfies (A), (B), or (C). The fact that no type satisfies (A) implies that in any match between agents of the same type both agents play action ā, and the fact that no type satisfies (B) implies that in any match between two agents of different types both players play action ā.

Combining the results of this section with the above characterisation of pure NSCs yields the following corollary, which fully characterises the NSCs of generic games that admit punishment actions (as discussed in Remark 2, such actions exist in many economic interactions). The result shows that such games admit an NSC iff the deviation gain from the pure efficient symmetric profile is smaller than the effective cost of deception, and when an NSC exists, its outcome is the pure efficient symmetric profile.
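The tangent-space test used in case (B) above can be checked numerically. The sketch below uses a hypothetical 2×2 type-game payoff matrix B (efficient payoff π(ā, ā) = 3, out-group fitness payoffs of 1, and illustrative cognitive costs; all numbers are assumptions for illustration) and verifies that xᵀBx = 2·π(ā, ā) − (sum of out-group payoffs) > 0 for the proof's tangent vector, so that B fails negative semi-definiteness with respect to the tangent space.

```python
# Hypothetical type-game payoff matrix B for two incumbent types j and k.
# Diagonal entries: the efficient payoff pi(a_bar, a_bar) minus the type's
# cognitive cost; off-diagonal entries: an inefficient out-group profile.
k_j, k_k = 0.1, 0.2                  # illustrative cognitive costs
pi_eff = 3.0                         # efficient payoff pi(a_bar, a_bar)
pi_jk, pi_kj = 1.0, 1.0              # inefficient out-group fitness payoffs
B = [[pi_eff - k_j, pi_jk - k_j],    # payoffs of type j against (j, k)
     [pi_kj - k_k, pi_eff - k_k]]    # payoffs of type k against (j, k)

# Tangent vector from the proof: +1 on type j, -1 on type k (entries sum to 0).
x = [1.0, -1.0]
value = sum(x[i] * B[i][j] * x[j] for i in range(2) for j in range(2))

# The cognitive costs cancel: value = 2*pi_eff - (pi_jk + pi_kj) = 4 > 0,
# so B is not negative semi-definite on the tangent space.
print(value)
```

The positive value replicates the proof's conclusion: whenever the out-group matches are inefficient on average, this tangent direction yields xᵀBx > 0.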
In particular, in any game that admits an efficient symmetric pure Nash equilibrium, this equilibrium is the unique NSC outcome; and in the Prisoner's Dilemma, mutual cooperation is the unique NSC outcome iff the gain from defecting against a cooperator is less than the effective cost of deception, and no NSC exists otherwise.

Corollary 2.
Let G be a generic game that admits a punishment action. The environment admits an NSC iff there exists an efficient symmetric pure profile (a∗, a∗) satisfying g(a∗) ≤ c (i.e. the deviation gain is smaller than the effective cost of deception). Moreover, if (µ, b) is an NSC, then b ≡ a∗, and n = 1 for all (u, n) ∈ C(µ).

Remark. Corollary 2 shows that generic games do not admit NSCs if the effective cost of deception is less than the deviation gain of the efficient profile. In such cases the distribution of types and their induced behaviour will not converge to a static population state. We leave the formal analysis of environments that do not admit NSCs to future research. One conjecture for the dynamic behaviour in such environments is a never-ending cycle between states in which almost all agents are naive and play an efficient action profile, and states in which different cognitive levels coexist and agents play inefficient action profiles (see the related analysis of cyclic behaviour in the Prisoner's Dilemma with cheap talk and material preferences in Wiseman and Yilankaya, 2001).
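The threshold in Corollary 2 can be made concrete with a small numerical sketch. The Prisoner's Dilemma payoffs below (mutual cooperation 3, temptation 4, mutual defection 1) and the values of the effective cost of deception c are hypothetical numbers chosen for illustration, not taken from the paper.

```python
# Illustrative check of the Corollary 2 condition for a Prisoner's Dilemma.
# pi[(a, a_opp)] is the fitness of a player choosing a against a_opp.
pi = {('C', 'C'): 3.0, ('C', 'D'): 0.0,
      ('D', 'C'): 4.0, ('D', 'D'): 1.0}

a_star = 'C'  # the efficient symmetric profile is (C, C)

# Deviation gain g(a*): best unilateral deviation payoff minus the efficient payoff.
g = max(pi[(a, a_star)] for a in 'CD') - pi[(a_star, a_star)]  # = 4 - 3 = 1

def admits_nsc(g, c):
    """NSC exists iff the deviation gain is at most the effective cost of deception."""
    return g <= c

print(g)                      # deviation gain of the efficient profile
print(admits_nsc(g, c=2.0))   # True: mutual cooperation is the unique NSC outcome
print(admits_nsc(g, c=0.5))   # False: no NSC exists
```

With a high effective cost of deception the efficient profile is stable even though it is not a Nash equilibrium of the underlying game; with a low cost, no NSC exists.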
Remark. Corollary 2 states that in an NSC of a generic game everyone has the same cognitive level. One may wonder how this relates to the apparent cognitive heterogeneity in the real world. Our analysis in this paper assumes a single underlying game, while in reality we face a potentially infinite set of games. If an individual's fitness is the result of interactions in a set of games that includes generic games with an NSC as well as non-generic games (see Section 3.5) or generic games that do not admit any NSC (see the previous remark), then evolution may lead to states in which different cognitive levels coexist, possibly with a never-ending cycle between states with different mixtures of cognitive levels.
Remark. Corollary 2 assumes that the underlying game admits a punishment action ã that gives an opponent a payoff strictly smaller than the efficient payoff π̂, regardless of the opponent's play. This punishment action is used in the construction of the NSC that induces the efficient action a∗. Specifically, a non-deceived incumbent plays the punishment action ã against any mutant who does not always play action a∗. If the game does not admit a punishment action, then (1) a complicated game-specific construction of the way in which incumbents behave against mutants who do not always play a∗ may be required to support the efficient action as the outcome of an NSC, and (2) this construction may require further restrictions on the effective cost of deception, in addition to g(a∗) ≤ c. We leave the study of these issues to future research.

The previous two subsections fully characterise (i) pure NSCs and (ii) NSCs in generic games. In this section we analyse non-pure NSCs in non-generic games. Non-generic games may be of interest in various setups, such as: (1) normal-form representations of generic extensive-form games (the induced matrix is typically non-generic), and (2) interesting families of games, such as zero-sum games. Unlike generic games, non-generic games can admit NSCs that are not pure and that may therefore contain multiple cognitive levels. To demonstrate this we consider the Rock-Paper-Scissors game, with the following payoff matrix:

        R        P        S
R      0, 0    −1, 1    1, −1
P     1, −1     0, 0    −1, 1
S     −1, 1    1, −1     0, 0

To simplify the analysis and the notation, we assume in this subsection that a player always succeeds in deceiving an opponent with a lower cognitive level, i.e. that q(n, n′) = 1 whenever n > n′.
The analysis can be extended to the more general setup.

The result below shows that, under mild assumptions on the cognitive cost function, this game admits an NSC in which all players have the same materialistic preferences, but players of different cognitive levels coexist, and non-Nash profiles are played in all matches between two individuals of different cognitive levels. More precisely, when individuals of different cognitive levels meet, the higher-level individual deceives the lower-level individual into taking a pure action that the higher-level individual then best-replies to. Thus the higher-level individual earns 1 and her opponent earns −1. Individuals of the same cognitive level play the unique Nash equilibrium. This means that higher-level types will obtain a payoff of 1 more often than lower-level types, and lower-level types will obtain a payoff of −1 more often than higher-level types.

For the construction presented in this subsection to work, the underlying game must be non-generic. Observe that if one slightly perturbs the payoffs of the Rock-Paper-Scissors game to make it a strictly competitive almost-zero-sum generic game, then Corollary 2 applies, and the only candidate to be an NSC is a configuration in which all agents have cognitive level one, and they all play an efficient action profile.

Proposition 2.
Let G be the Rock-Paper-Scissors game. Let u_π denote the (materialistic) preferences such that u_π(a, a′) = π(a, a′) for all profiles (a, a′). Assume that q(n, n′) = 1 whenever n > n′. Further assume that the marginal cognitive cost is small but non-vanishing, so that (a) there is an N such that k_N ≤ 1 < k_{N+1}, and (b) 1 > k_{n+1} − k_n for all n ≤ N. Under these assumptions there exists an NSC (µ∗, b∗) such that C(µ∗) ⊆ {(u_π, n)}_{n=1}^N, and µ∗ is mixed (i.e. |C(µ∗)| > 1). The behaviour of the incumbent types is as follows: if the individuals in a match are of different cognitive levels, then the higher level plays Paper and the lower level plays Rock; if both individuals in a match are of the same cognitive level, then they both play the unique Nash equilibrium (i.e. randomise uniformly over the three actions).

Appendix C contains a formal proof of this result and relates it to a similar construction in Conlisk (2001).

Our next result gives a lower bound on the fitness obtained in NSCs. Let M be the pure maxmin value of the underlying game:

M = max_{a∈A} min_{a′∈A} π(a, a′).

The pure maxmin value M is the minimal fitness payoff a player can guarantee herself in the sequential game in which she plays first, and the opponent replies in an arbitrary way (i.e. not necessarily maximising the opponent's fitness).

Proposition 3 shows that the pure maxmin value is a lower bound on the fitness payoff obtained in an NSC. The intuition is that if the payoff is lower, then a mutant of cognitive level 1, with preferences such that the maxmin action a_M is dominant, will outperform the incumbents.

Proposition 3. If (µ∗, b∗) is an NSC then Π(µ∗, b∗) ≥ M.

Proof. Assume to the contrary that Π(µ∗, b∗) < M. Let a_M be a maxmin action of a player, which guarantees that the player's payoff is at least M, i.e. a_M ∈ arg max_{a∈A} min_{a′∈A} π(a, a′). Let u_{a_M} be the preferences in which the player obtains a payoff of 1 if she plays a_M and a payoff of 0 otherwise. Consider a monomorphic group of mutants with type (u_{a_M}, 1). The fact that a_M is a maxmin action implies that Π_{(u_{a_M},1)}(μ̃, b̃) ≥ M in any post-entry configuration. Furthermore, due to continuity it holds that Π_θ(μ̃, b̃) < M for any θ ∈ C(µ∗) in all sufficiently close focal post-entry configurations. This contradicts the fact that µ∗ is an NSS in Γ(μ̃, b̃), and thus it contradicts the fact that (µ∗, b∗) is an NSC.

We conclude by demonstrating that the lower bound of the maxmin payoff is binding. Specifically, Example 1 shows an NSC in a zero-sum game in which the fitness of the incumbents is arbitrarily close to −1, the lowest feasible payoff in the underlying game (which is equal to the maxmin payoff).

Example 1.
Consider the Rock-Paper-Scissors game described above. Assume that k₂ = 1, that k₃ is sufficiently large, and that q(2, 1) = 1. For each ǫ ∈ (0, 1), consider the following configuration: a fraction ǫ of the agents have cognitive level 1, and the remaining 1 − ǫ of the agents have level 2. The agents' behaviour is as described in Proposition 2, i.e.: (1) an agent of level 2 deceives a level-1 opponent into taking a pure action that the level-2 agent then best-replies to; thus the level-2 agent earns 1 and her opponent earns −1; and (2) individuals of the same cognitive level play the unique Nash equilibrium, and obtain a payoff of zero in the underlying game. When one takes into account the cognitive cost k₂ = 1 of the level-2 agents, this behaviour implies that all incumbents obtain a fitness of ǫ − 1. An argument analogous to the proof of Proposition 2 implies that this configuration is an NSC.
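The fitness bookkeeping in Example 1 can be verified directly. The sketch below assumes, consistently with the example's fitness of ǫ − 1, that the level-1 cognitive cost is normalised to zero and that k₂ = 1; it checks that agents of both cognitive levels obtain the same fitness ǫ − 1.

```python
# Verification of the fitness computation in Example 1 (Rock-Paper-Scissors).
# Assumptions: k_1 = 0 (normalisation), k_2 = 1, and a level-2 agent always
# deceives a level-1 opponent (earning 1 while the deceived opponent earns -1);
# same-level matches play the uniform Nash equilibrium and earn 0.
def fitness(level, eps, k2=1.0):
    p1, p2 = eps, 1.0 - eps        # population shares of levels 1 and 2
    if level == 1:
        return p1 * 0.0 + p2 * (-1.0)     # deceived by every level-2 opponent
    return p1 * 1.0 + p2 * 0.0 - k2       # deceives level-1; pays cognitive cost

eps = 0.1
print(fitness(1, eps), fitness(2, eps))   # both equal eps - 1
```

As ǫ → 0 the common fitness approaches −1, showing that the maxmin lower bound of Proposition 3 is binding in this zero-sum game.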
As mentioned above, our basic model assumes perfect observability, and Nash equilibrium behaviour, in matches without deception. In what follows we briefly describe the results of a robustness check that relaxes the first of these two assumptions. For brevity, we defer the full technical analysis to Appendix D.

Specifically, we follow Dekel, Ely, and Yilankaya (2007) and assume that in matches without deception, each player privately observes the opponent's type with an exogenous probability p, and with the remaining probability observes an uninformative signal. This general model extends both our baseline model (where p = 1) and Dekel, Ely, and Yilankaya's (2007) model (which can be viewed as assuming arbitrarily high deception costs).

The main results of the baseline model (p = 1) show that (1) only efficient profiles can be NSCs, and (2) there exist non-Nash efficient NSCs, provided that the cost of deception is sufficiently large. Our analysis shows that the former result (namely, stability implies efficiency) is robust to the introduction of partial observability: (1) a somewhat weaker notion of efficiency is satisfied by the behaviour of the incumbents with the highest cognitive level in any NSC for any p > 0, and (2) in games such as the Prisoner's Dilemma, we show that only the efficient profile can be the outcome of an NSC.

On the other hand, our analysis shows that our second main result (namely, the stability of non-Nash efficient outcomes) is not robust to the introduction of partial observability. Specifically, we show that non-Nash efficient profiles cannot be NSC outcomes for any p < 1.

In the main text we deal exclusively with preferences that are defined only over action profiles. In what follows we briefly describe how to extend the analysis to interdependent preferences, i.e. preferences that may also depend on the opponent's type. A detailed formal analysis is presented in Appendix B. Herold and Kuzmics (2009) study a similar setup while assuming perfect observability of types among all individuals. Their key result is that any mixed action that gives each player a payoff above her maxmin payoff can be the outcome of a stable configuration.

Our main result for interdependent preferences in our setup shows that a pure configuration is stable essentially iff: (1) all incumbents have the same cognitive level n, (2) the cost of level n is smaller than the difference between the incumbents' (fitness) payoff and the minmax/maxmin value, and (3) the deviation gain is smaller than the effective cost of deception against an opponent with cognitive level n. In particular, if the marginal effective cost of deception is sufficiently small, then only Nash equilibria can be the outcomes of pure stable configurations, while if the effective cost of deceiving some cognitive level n is sufficiently high (while the cost of achieving level n is sufficiently low), then essentially any action profile is the outcome of a pure stable configuration.

Herold and Kuzmics (2009) expand the framework of Dekel, Ely, and Yilankaya (2007) to include interdependent preferences, i.e. preferences that depend on the opponent's preference type.
Under perfect or almost perfect observability, if all preferences that depend on the opponent's type are considered, then any symmetric outcome above the minmax material payoff is evolutionarily stable. In our setting a pure profile also has to be a Nash equilibrium in order to be the sole outcome supported by evolutionarily stable preferences. Herold and Kuzmics (2009) find that non-discriminating preferences (including selfish materialistic preferences) are typically not evolutionarily stable on their own. By contrast, certain preferences that exhibit discrimination are evolutionarily stable. Similarly, evolutionary stability requires the presence of discriminating preferences in our setup as well.
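The minmax/maxmin value that appears both in Proposition 3 and in condition (2) of the interdependent-preferences result is straightforward to compute. A minimal sketch on a hypothetical 3-action payoff matrix (the numbers are illustrative, not taken from the paper):

```python
# Pure maxmin value M = max_a min_{a'} pi(a, a'): the fitness a player can
# guarantee by committing to a single action, whatever the opponent replies.
pi = [
    [2, 0, 1],   # hypothetical fitness payoffs pi(a, a'), rows = own action
    [3, 1, 0],
    [1, 1, 1],
]

def pure_maxmin(payoffs):
    return max(min(row) for row in payoffs)

M = pure_maxmin(pi)
print(M)  # the third action guarantees a payoff of at least 1
```

Applied to the Rock-Paper-Scissors matrix from the previous subsection, `pure_maxmin` returns −1, matching the lower bound that Example 1 shows to be binding.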
We have developed a model in which preferences coevolve with the ability to detect others' preferences and misrepresent one's own preferences. To this end, we have allowed for heterogeneity with respect to costly cognitive ability. The assumption of an exogenously given level of observability of the opponent's preferences, which has characterised the indirect evolutionary approach up until now, is replaced by the Machiavellian notion of deception equilibrium, which endogenously determines what each player observes. Our model assumes a very powerful form of deception. This allows us to derive sharp results that clearly demonstrate the effects of endogenising observation and introducing deception. We think that “Bayesian” deception is an interesting model for future research: each incumbent type is associated with a signal, agents with high cognitive levels can mimic the signals of types with lower cognitive levels, and agents maximise their preferences given the received signals and the correct Bayesian inference about the opponent's type.

In a companion paper (Heller and Mohlin, forthcoming) we study environments in which players are randomly matched, and make inferences about the opponent's type by observing her past behaviour (rather than directly observing her type, as is standard in the “indirect evolutionary approach”). In future research, it would be interesting to combine both approaches and allow the observation of past behaviour to be influenced by deception.

Most papers taking the indirect evolutionary approach study the stability of preferences defined over material outcomes. Moreover, it is common to restrict attention to some parameterised class of such preferences. Since we study preferences defined on the more abstract level of action profiles, we do not make predictions about whether some particular kind of preferences over material outcomes, from a particular family of utility functions, will be stable or not.
It would be interesting to extend our model to such classes of preferences. Furthermore, with preferences defined over material outcomes it would be possible to study the coevolution of preferences and deception not only in isolated games, but also when individuals play many different games using the same preferences. We hope to come back to these questions, and we invite others to employ and modify our framework in these directions.
A Formal Proofs of Theorems 1 and 2
A.1 Preliminaries
This subsection contains notation and definitions that will be used in the following proofs. A generous action is an action such that, if played by the opponent, it allows a player to achieve the maximal fitness payoff. Formally:
Definition 11.
Action a_{g₁} ∈ A is generous if there exists a ∈ A such that π(a, a_{g₁}) ≥ π(a′, a″) for all a′, a″ ∈ A.

Fix a generous action a_{g₁} ∈ A of the game G. A second-best generous action is an action such that, if played by the opponent, it allows a player to achieve the fitness payoff that is maximal under the constraint that the opponent is not allowed to play the generous action a_{g₁}. Formally:

Definition 12.
Action a_{g₂} ∈ A is second-best generous, conditional on a_{g₁} ∈ A being first-best generous, if there exists a ∈ A such that π(a, a_{g₂}) ≥ π(a′, a″) for all a′, a″ ∈ A such that a″ ≠ a_{g₁}.

Fix a generous action a_{g₁} ∈ A, and fix a second-best generous action a_{g₂} ∈ A, conditional on a_{g₁} ∈ A being first-best generous. For each α > β > 0, let u_{α,β} be the following utility function:

u_{α,β}(a, a′) = α if a′ = a_{g₁}; β if a′ = a_{g₂}; and 0 otherwise.

Each u_{α,β} satisfies:

1. Indifference: the utility function depends only on the opponent's action; i.e. the player is indifferent between any two of her own actions.

2. Pro-generosity: the utility is highest if the opponent plays the generous action, second-highest if the opponent plays the second-best generous action, and lowest otherwise.

Let U_GI = {u_{α,β} | α > β > 0} be the family of all such preferences, called pro-generous indifferent preferences. Note that U_GI includes a continuum of different utilities (under the assumption that G includes at least three actions). Thus, for any set of incumbent types, we can always find a utility function in U_GI that does not belong to any of the current incumbents.

A.2 Proof of Theorem 1 (Behaviour of the Highest Types)
A.2.1 Proof of Theorem 1, Part 1
Assume to the contrary that π(b^N_θ̄(θ̄), b^N_θ̄(θ̄)) < π̂. (Note that the definition of π̂ implies that the opposite strict inequality is impossible.) Let a₁, a₂ ∈ A be any two actions such that (a₁, a₂) is an efficient action profile, i.e. 0.5 · (π(a₁, a₂) + π(a₂, a₁)) = π̂. Let θ₁, θ₂, θ₃ be three types that satisfy the following conditions: (1) the types are not incumbents: θ₁, θ₂, θ₃ ∉ C(µ∗); (2) the types have the highest incumbent cognitive level: n_θ₁ = n_θ₂ = n_θ₃ = n̄; and (3) the types have different pro-generous indifferent preferences: u_θ₁, u_θ₂, u_θ₃ ∈ U_GI and u_θᵢ ≠ u_θⱼ for each i ≠ j ∈ {1, 2, 3}. Let µ′ be the distribution that assigns mass 1/3 to each of these types. The post-entry type distribution is μ̃ = (1 − ǫ)·µ∗ + ǫ·µ′. Let the post-entry behaviour policy b̃ be defined as follows:

1. Behaviour among incumbents respects focality: b̃^N_θ(θ′) = b^N_θ(θ′) and b̃^D_θ(θ′) = b^D_θ(θ′) for each incumbent pair θ, θ′ ∈ C(µ∗).

2. The mutants play fitness-maximising deception equilibria against incumbents with lower cognitive levels: (b̃^D_θᵢ(θ′), b̃^D_θ′(θᵢ)) ∈ FMDE(θᵢ, θ′) for each i ∈ {1, 2, 3} and θ′ ∈ C(µ∗) with n_θ′ < n̄. Note that FMDE(θᵢ, θ′) is nonempty in virtue of the construction of U_GI.

3. In matches without deception between mutants and incumbents, the mutants mimic θ̄ and the incumbents play the same way they play against θ̄: (b̃^N_θᵢ(θ′), b̃^N_θ′(θᵢ)) = (b^N_θ̄(θ′), b^N_θ′(θ̄)), for each i ∈ {1, 2, 3} and θ′ ∈ C(µ∗).

4. Two mutants of different types play efficiently when meeting each other: b̃^N_θᵢ(θ_{(i+1) mod 3}) = a₁ and b̃^N_θᵢ(θ_{(i−1) mod 3}) = a₂ for each i ∈ {1, 2, 3}.

5. When two mutants of the same type meet, they play the same way θ̄ plays against itself: b̃^N_θᵢ(θᵢ) = b^N_θ̄(θ̄) for each i ∈ {1, 2, 3}.

In virtue of point 1 the construction (μ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)). By points 2 and 3 each mutant θᵢ earns weakly more than θ̄ against all incumbent types. By points 4 and 5 each mutant earns strictly more than θ̄ against the mutants. In total, the average fitness earned by each mutant is strictly higher than that of θ̄, against a population that follows (μ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the population game Γ(μ̃, b̃). Thus µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS, in Γ(μ̃, b̃), which implies that (µ∗, b∗) is not an NSC.

A.2.2 Proof of Theorem 1, Part 2
Assume to the contrary that (b^D_θ̄(θ), b^D_θ(θ̄)) ∉ FMDE(θ̄, θ). Let θ̂ be a type that satisfies the conditions of: (1) not being an incumbent: θ̂ ∉ C(µ∗), (2) having the highest incumbent cognitive level: n_θ̂ = n̄, and (3) having pro-generous indifferent preferences: u_θ̂ ∈ U_GI. Let µ′ be the distribution that assigns mass one to type θ̂. The post-entry type distribution is μ̃ = (1 − ǫ)·µ∗ + ǫ·µ′. Let the post-entry behaviour policy b̃ be defined as follows:

1. Behaviour among incumbents respects focality: b̃^N_θ(θ′) = b^N_θ(θ′) and b̃^D_θ(θ′) = b^D_θ(θ′) for all θ, θ′ ∈ C(µ∗).

2. In matches with deception between mutants and incumbents, behaviour is such that the mutants maximise their fitness: (b̃^D_θ̂(θ′), b̃^D_θ′(θ̂)) ∈ FMDE(θ̂, θ′) for each θ′ ∈ C(µ∗) with n_θ′ < n̄.

3. In matches without deception between mutants and incumbents, the mutants mimic θ̄ and the incumbents play the same way they play against θ̄: (b̃^N_θ̂(θ′), b̃^N_θ′(θ̂)) = (b^N_θ̄(θ′), b^N_θ′(θ̄)), for each θ′ ∈ C(µ∗).

4. The mutant θ̂ plays against itself the same way θ̄ plays against itself: (b̃^N_θ̂(θ̂), b̃^N_θ̂(θ̂)) = (b̃^N_θ̄(θ̄), b̃^N_θ̄(θ̄)).

Note that (μ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)) and that θ̂ obtains a strictly higher fitness than θ̄ against a population that follows (μ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the population game Γ(μ̃, b̃). Thus µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS, in Γ(μ̃, b̃), which implies that (µ∗, b∗) is not an NSC.

A.2.3 Proof of Theorem 1, Part 3
Assume to the contrary that π(θ, θ̄) > π̂, which immediately implies that π(θ̄, θ) < π̂ and that either π(b^D_θ(θ̄), b^D_θ̄(θ)) > π̂ or π(b^N_θ(θ̄), b^N_θ̄(θ)) > π̂. Let θ̂ be a type that satisfies the conditions of: (1) not being an incumbent: θ̂ ∉ C(µ∗), (2) having the highest incumbent cognitive level: n_θ̂ = n̄, and (3) having pro-generous indifferent preferences: u_θ̂ ∈ U_GI. Let µ′ be the distribution that assigns mass one to type θ̂. The post-entry type distribution is μ̃ = (1 − ǫ)·µ∗ + ǫ·µ′. Let the post-entry behaviour policy b̃ be defined as follows:

1. Behaviour among incumbents respects focality: b̃^N_θ(θ′) = b^N_θ(θ′) and b̃^D_θ(θ′) = b^D_θ(θ′) for all θ, θ′ ∈ C(µ∗).

2. In matches with deception between mutants and incumbents, behaviour is such that the mutants maximise their fitness: (b̃^D_θ̂(θ′), b̃^D_θ′(θ̂)) ∈ FMDE(θ̂, θ′) for each θ′ ∈ C(µ∗) with n_θ′ < n̄.

3. In a match between a mutant θ̂ and the incumbent θ̄, the mutant mimics θ, and the incumbent θ̄ plays the same way it plays against θ: (b̃^N_θ̂(θ̄), b̃^N_θ̄(θ̂)) = (b^N_θ(θ̄), b^N_θ̄(θ)) if π(b^N_θ(θ̄), b^N_θ̄(θ)) > π̂, and (b̃^N_θ̂(θ̄), b̃^N_θ̄(θ̂)) = (b^D_θ(θ̄), b^D_θ̄(θ)) otherwise.

4. The mutant θ̂ plays against itself the same way θ̄ plays against itself: (b̃^N_θ̂(θ̂), b̃^N_θ̂(θ̂)) = (b̃^N_θ̄(θ̄), b̃^N_θ̄(θ̄)).

5. The mutant θ̂ mimics θ̄ against all other incumbents without deception, and these incumbents play against θ̂ in the same way they play against θ̄: (b̃^N_θ̂(θ′), b̃^N_θ′(θ̂)) = (b^N_θ̄(θ′), b^N_θ′(θ̄)) for each θ′ ≠ θ̄.

Note that (μ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)). By point 2 the mutant θ̂ earns weakly more than θ̄ against lower types. By point 3 and Theorem 1.1, the mutant earns strictly more than θ̄ against type θ̄. By points 3 and 4 and Theorem 1.1, the mutant earns strictly more than θ̄ against the mutant. By point 5 the mutant θ̂ earns the same as θ̄ against all other types. In total, the average fitness earned by θ̂ is strictly higher than that of θ̄, against a population that follows (μ̃, b̃). Recall (Remark 4 in Section 2.3) that all the incumbent types have the same fitness in (µ∗, b∗). By a standard continuity argument, the fitness of incumbent θ̄ is arbitrarily close (for a sufficiently small ǫ) to the fitness level of any other incumbent type in the focal post-entry configuration (μ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the type game Γ(μ̃, b̃). Thus µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS, in Γ(μ̃, b̃), which implies that (µ∗, b∗) is not an NSC.

A.3 Proof of Case (A) in Theorem 2
In what follows we fill in the missing technical details for the part of the proof of Theorem 2 that concerns case (A). We begin by proving a lemma.
Lemma 2. If (σ₁, σ₂) ∈ DE(θ₁, θ₂) then there exist actions a₂, a₂′ ∈ C(σ₂) and a₁, a₁′ ∈ C(σ₁) such that (a₁, a₂) ∈ DE(θ₁, θ₂) and (a₁′, a₂′) ∈ DE(θ₁, θ₂), with π(a₁, a₂) ≥ π(σ₁, σ₂) and π(a₁′, a₂′) ≤ π(σ₁, σ₂).

Proof. Note that for any mixed deception equilibrium (σ₁, σ₂) and any action a₂ ∈ C(σ₂), the profile (σ₁, a₂) is also a deception equilibrium (because otherwise the deceiver would not induce the deceived party to take a mixed action that puts positive weight on a₂). It follows that there are actions a₂, a₂′ ∈ C(σ₂) such that (σ₁, a₂) and (σ₁, a₂′) are deception equilibria, with π(σ₁, a₂) ≥ π(σ₁, σ₂) and π(σ₁, a₂′) ≤ π(σ₁, σ₂). Furthermore, if (σ₁, a₂) and (σ₁, a₂′) are deception equilibria, then for any action a ∈ C(σ₁), the profiles (a, a₂) and (a, a₂′) are also deception equilibria, with π(σ₁, a₂) = π(a, a₂) and π(σ₁, a₂′) = π(a, a₂′). Hence there are actions a₁, a₁′ ∈ C(σ₁) such that (a₁, a₂) and (a₁′, a₂′) are deception equilibria, with π(a₁, a₂) = π(σ₁, a₂) ≥ π(σ₁, σ₂) and π(a₁′, a₂′) = π(σ₁, a₂′) ≤ π(σ₁, σ₂).

Assume that case (A) holds: there is an incumbent θ̊ that plays inefficiently against itself, i.e. (b^N_θ̊(θ̊), b^N_θ̊(θ̊)) ≠ (ā, ā), and there is no incumbent type with a strictly higher cognitive level than θ̊ that satisfies any of the cases (A), (B), or (C). To prove that this cannot hold in an NSC we introduce a mutant θ̂ = (û, n_θ̊) ∉ C(µ∗). If Σ(u_θ̊) = ∆, then we let û ∈ U_GI be such that θ̂ = (û, n_θ̊) ∉ C(µ∗). If Σ(u_θ̊) ≠ ∆, then we fix a dominated action a_d ∈ A \ Σ(u_θ̊), and let û be defined as follows:

û(a, a′) = max_{a″∈A} u_θ̊(a″, ā)  if a = a′ = ā;
û(a, a′) = u_θ̊(a, a′) − β_{a′}  if a = a_d and a′ ≠ ā;
û(a, a′) = u_θ̊(a, a′)  otherwise,

where each β_{a′} ≥ 0 is chosen such that θ̂ = (û, n_θ̊) ∉ C(µ∗).
That is, if Σ ( u ˚ θ ) = ∆, then theutility function ˆ u is constructed from the utility function u ˚ θ by arbitrarily lowering the payoff ofsome of the outcomes associated with the (already) dominated action a and that do not involveaction ¯ a , while increasing the payoff of the outcome (¯ a, ¯ a ) by the minimal amount that makes¯ a a best reply to itself. Note that this definition of ˆ u is valid also for the case of ¯ a = a . Itfollows that a ∈ Σ ( u ˚ θ ) ∪ { ¯ a } iff a ∈ Σ (ˆ u ). To see this, note that if Σ ( u ˚ θ ) = ∆ and a = ¯ a , thenΣ (ˆ u ) = Σ ( u ˚ θ ) ∪ { ¯ a } . Otherwise Σ (ˆ u ) = Σ ( u ˚ θ ). Thus, ˆ θ can be induced to play exactly the samepure actions as ˚ θ , unless ¯ a = a , in which case ˆ θ can be induced to play ¯ a in addition to all actionsthat ˚ θ can be induced to play.Let µ ′ be the distribution that assigns mass one to type (ˆ u, n ˚ θ ). Let the post-entry typedistribution be ˜ µ = (1 − ǫ ) · µ ∗ + ǫ · µ ′ , and let the post-entry behaviour policy ˜ b be defined asfollows:1. Behaviour among incumbents respects focality: ˜ b Nθ ( θ ′ ) = b Nθ ( θ ′ ) and ˜ b Dθ ( θ ′ ) = b Dθ ( θ ′ ) ∀ θ, θ ′ ∈ C ( µ ∗ ).2. In matches without deception between the mutant type ˆ θ and any incumbent type θ ′ ,the mutant ˆ θ mimics ˚ θ , and the incumbent θ ′ treats the mutant ˆ θ like the incumbent ˚ θ : (cid:16) ˜ b N ˆ θ ( θ ′ ) , ˜ b Nθ ′ (cid:16) ˆ θ (cid:17)(cid:17) = (cid:16) b N ˚ θ ( θ ′ ) , b Nθ ′ (cid:16) ˚ θ (cid:17)(cid:17) for all θ ′ such that n θ ′ = n ˚ θ and θ ′ = ˆ θ .3. In matches with deception between the mutant type ˆ θ and any lower type θ ′ ∈ C ( µ ∗ ) (with n θ ′ < n ˆ θ ), we distinguish two cases.(a) Suppose that Σ ( u ˚ θ ) = ∆. In this case let (cid:16) ˜ b D ˆ θ ( θ ′ ) , ˜ b Dθ ′ (cid:16) ˆ θ (cid:17)(cid:17) ∈ F M DE (cid:16) ˆ θ, θ ′ (cid:17) . Note that F M DE (cid:16) ˆ θ, θ ′ (cid:17) is nonempty since in this case ˆ u ∈ U GI .(b) Suppose that Σ ( u ˚ θ ) = ∆. 
In this case let (b̃^D_θ̂(θ′), b̃^D_θ′(θ̂)) = (a₁, a₂), for some (a₁, a₂) ∈ DE(˚θ, θ′) such that π(a₁, a₂) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)). By Lemma 2 above such a profile (a₁, a₂) exists.

4. The mutant plays efficiently when meeting itself: b̃^N_θ̂(θ̂) = ā.

5. In matches with deception between the mutant θ̂ and a higher type θ′ ∈ C(µ∗) (with n_θ′ > n_θ̂), we distinguish two cases. Pick a profile (a₁, a₂) ∈ DE(θ′, ˚θ) such that π(a₂, a₁) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)). By Lemma 2 above such a profile (a₁, a₂) exists. Moreover, by the construction of û, either (a₁, a₂) ∈ DE(θ′, θ̂), or there is some ã such that u_θ′(ã, ā) > u_θ′(a₁, a₂). In the latter case we have (ā, ā) ∈ DE(θ′, θ̂), due to the fact that (b^N_θ′(θ′), b^N_θ′(θ′)) = (ā, ā) implies that ā is a best reply to ā for type θ′.

(a) If u_θ′(a₁, a₂) > u_θ′(ā, ā), let (b̃^D_θ′(θ̂), b̃^D_θ̂(θ′)) = (a₁, a₂). Note that by the definition of (a₁, a₂) it holds that π(a₂, a₁) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)).

(b) If u_θ′(a₁, a₂) ≤ u_θ′(ā, ā), let (b̃^D_θ′(θ̂), b̃^D_θ̂(θ′)) = (ā, ā). Note that by the definition of ˚θ it holds that π(ā, ā) ≥ π(b^D_˚θ(θ′), b^D_θ′(˚θ)).

By point 1, (µ̃, b̃) is a focal configuration (with respect to (µ∗, b∗)). By point 2 the mutant θ̂ earns the same as ˚θ against all incumbents of level n_˚θ. By point 3 the mutant θ̂ earns weakly more than ˚θ against lower types.
By points 2 and 4 (and the assumption that ˚θ does not play efficiently against itself), the mutant θ̂ earns strictly more than ˚θ against θ̂. By point 5 the mutant θ̂ earns weakly more than ˚θ against all incumbents of a higher cognitive level. In total, the average fitness earned by θ̂ is strictly higher than that of ˚θ against a population that follows (µ̃, b̃). This implies that µ′ is a strictly better reply against µ∗ in the population game Γ(µ̃, b̃). Thus, µ∗ is not a symmetric Nash equilibrium, and therefore it is not an NSS of Γ(µ̃, b̃), which implies that µ∗ is not an NSC. Thus we have shown that ˚θ plays efficiently against itself.

B Type-interdependent Preferences
As argued by Herold and Kuzmics (2009, pp. 542–543), people playing a game seem to care not only about the outcome, but also about their opponent's intentions, and they discriminate between different types of opponents (for experimental evidence, see, e.g., Falk, Fehr, and Fischbacher, 2003; Charness and Levine, 2007). Motivated by this observation, in this appendix we extend our baseline model to allow preferences to depend not only on action profiles, but also on the opponent's type.
B.1 Changes to the Baseline Model
We briefly describe how to extend the model to handle type-interdependent preferences. Our construction is similar to that of Herold and Kuzmics (2009).

When the preferences of a type depend on the opponent's type, we can no longer work with the set of all possible preferences, because doing so would create problems of circularity and cardinality. Instead, we must restrict attention to a pre-specified set of feasible preferences. We begin by defining Θ_ID as an arbitrary set of labels. Each label is a pair θ = (u, n) ∈ Θ_ID, where n ∈ N and u is a type-interdependent utility function that depends on the played action profile as well as on the opponent's label, u: A × A × Θ_ID → R. (The circularity comes from the fact that each type contains a preferences component, which is identified with a utility function defined over types and action profiles. To see that this creates a problem if the set of types is unrestricted, let U∗ be the set of all utility functions that we want to include in our model, so that Θ∗ = U∗ × N is the set of all types. If U∗∗ is the set of all mappings u: A × A × Θ∗ → R, or, equivalently, the set of all mappings u: A × A × U∗ × N → R, then clearly U∗∗ ≠ U∗. See also footnote 10 in Herold and Kuzmics, 2009.) Each label θ = (u, n) may now be interpreted as a type. The definition of u extends to mixed actions in the obvious way. We slightly abuse notation and use the label u also to denote its associated utility function. Thus u(σ, σ′, θ′) denotes the subjective payoff that a player with preferences u earns when she plays strategy σ against an opponent of type θ′ who plays strategy σ′.

Let U_ID denote the set of all preferences that are part of some type in Θ_ID, i.e. U_ID = {u : ∃n ∈ N s.t. (u, n) ∈ Θ_ID}.
For each preference ũ ∈ U of the baseline model (which is defined only over action profiles) we can define an equivalent type-interdependent preference u ∈ U_ID that is independent of the opponent's type; that is, u(σ, σ′, θ′) = u(σ, σ′, θ′′) = ũ(σ, σ′) for each θ′, θ′′ ∈ Θ_ID and σ, σ′ ∈ ∆(A). Let U_N denote the set of all such type-interdependent versions of the preferences of the baseline model. To simplify the statements of the results of Section B.3, in what follows we assume that U_N ⊆ U_ID.

Next, we amend the definitions of Nash equilibrium, undominated strategies, and deception equilibrium. The best-reply correspondence now takes both strategies and types as arguments: BR_u(σ′, θ′) = arg max_{σ∈∆(A)} u(σ, σ′, θ′). Accordingly, we adjust the definition of the set of Nash equilibria,

NE(θ, θ′) = {(σ, σ′) ∈ ∆(A) × ∆(A) : σ ∈ BR_u(σ′, θ′) and σ′ ∈ BR_{u′}(σ, θ)},

and the set of undominated strategies,

Σ(θ) = {σ ∈ ∆(A) : there exist σ′ ∈ ∆(A) and θ′ ∈ Θ_ID such that σ ∈ BR_u(σ′, θ′)}.

Finally, we adapt the definition of deception equilibrium. Given two types θ, θ′ with n_θ > n_θ′, a strategy profile (σ̃, σ̃′) is a deception equilibrium if

(σ̃, σ̃′) ∈ arg max_{σ∈∆(A), σ′∈Σ(θ′)} u_θ(σ, σ′, θ′).

The interpretation of this definition is that the deceiver is able to induce both a belief about the deceiver's preferences and a belief about the deceiver's intentions in the mind of the deceived party. Let DE(θ, θ′) be the set of all such deception equilibria. The rest of our model remains unchanged.

Some of the following results rely on the existence of preferences u_{ãã′,ñ} that satisfy two conditions: (1) action ã is a (subjectively) dominant action against an opponent with the same preferences and with cognitive level ñ, and (2) action ã′ is the dominant action against all other opponents. Formally:

Definition 13.
Given any two actions ã, ã′ ∈ A, let u_{ãã′,ñ} be the discriminating preferences defined by the following utility function: for all a, a′ ∈ A and θ′ ∈ Θ_ID,

u_{ãã′,ñ}(a, a′, θ′) = 1 if either (θ′ = (u_{ãã′,ñ}, ñ) and a = ã) or (θ′ ≠ (u_{ãã′,ñ}, ñ) and a = ã′), and u_{ãã′,ñ}(a, a′, θ′) = 0 otherwise.

Finally, define the effective cost of deceiving cognitive level n, denoted by c(n), as the minimal ratio between the additional cognitive cost and the probability of deceiving an opponent of cognitive level n:

c(n) = min_{m>n} (k_m − k_n) / q(m, n).

Note that c(1) ≡ c, which coheres with the definition of the effective cost of deception (with respect to cognitive level 1) in the baseline model.

B.2 Pure Maxmin and Minimal Fitness
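Before turning to the payoff bounds, here is a small numerical sketch of the effective cost of deception c(n) defined above, and of the stability logic it supports: when the deviation gain is below c(n), a higher-level deceiver's expected gain never covers her extra cognitive cost. The cost schedule k, deception technology q, and gain g below are illustrative assumptions, not values from the model.

```python
# Sketch of the effective cost of deceiving cognitive level n:
#   c(n) = min over m > n of (k_m - k_n) / q(m, n).
# The cost schedule k and deception probabilities q are assumed numbers.

def effective_cost(n, k, q):
    """Minimal ratio of extra cognitive cost to deception probability."""
    return min((k[m] - k[n]) / q(m, n) for m in k if m > n)

k = {1: 0.0, 2: 0.1, 3: 0.25, 4: 0.45}   # assumed cognitive costs k_n
q = lambda m, n: 1 - 0.5 ** (m - n)      # assumed deception probability q(m, n)

c1 = effective_cost(1, k, q)             # minimised at m = 2: 0.1 / 0.5
print(c1)                                # -> 0.2

# If the deviation gain g satisfies g < c(1), a level-m mutant's expected
# net gain from deceiving level-1 incumbents is negative for every m > 1:
g = 0.15                                 # assumed deviation gain, below c(1)
assert all(g * q(m, 1) - (k[m] - k[1]) < 0 for m in (2, 3, 4))
```

The same computation with n = 2 returns c(2) = 0.3, attained at m = 3 under these assumed numbers.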
The pure maxmin and minmax values provide a lower bound on the fitness of an NSC. Given a game G = (A, π), define M̲ and M̄ as its pure maxmin and minmax values, respectively:

M̲ = max_{a₁∈A} min_{a₂∈A} π(a₁, a₂),    M̄ = min_{a₂∈A} max_{a₁∈A} π(a₁, a₂).

The pure maxmin value M̲ is the minimal fitness payoff a player can guarantee herself in the sequential game in which she plays first, and the opponent replies in an arbitrary way. The pure minmax value M̄ is the minimal fitness payoff a player can guarantee herself in the sequential game in which her opponent first plays an arbitrary action, and she best-replies to the opponent's pure action. It is immediate that M̲ ≤ M̄ and that the minmax value in mixed actions lies between these two values.

Let a_M̲ be a maxmin action of a player, i.e. an action that guarantees that the player's payoff is at least M̲, and let a_M̄ be a minmax action, i.e. an action that guarantees that the opponent's payoff is at most M̄:

a_M̲ ∈ arg max_{a₁∈A} min_{a₂∈A} π(a₁, a₂),    a_M̄ ∈ arg min_{a₂∈A} max_{a₁∈A} π(a₁, a₂).

The proof of Proposition 3 holds with minor changes also in the setup of interdependent preferences (under the assumption that (u_{a_M̲ a_M̲,1}, 1) ∈ Θ_ID), and this implies that the maxmin value is a lower bound on the fitness payoff obtained in an NSC (i.e. if (µ, b) is an NSC then Π(µ, b) ≥ M̲).

B.3 Characterisation of Pure Stable Configurations
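The pure maxmin and minmax values introduced above are mechanical to compute. The sketch below does so for an assumed 2×2 fitness matrix (a Hawk–Dove instance; the matrix and its parameter values are illustrative assumptions, not part of the model) and checks that the maxmin value never exceeds the minmax value.

```python
# Pure maxmin and minmax values of a symmetric two-player fitness game.
# pi[a1][a2] is the payoff of the player choosing a1; the matrix is an
# illustrative assumption (a Hawk-Dove game with g = 0.5 and l = 0.5).

def pure_maxmin(pi):
    # the player commits to a row first; the opponent then replies arbitrarily
    return max(min(row) for row in pi)

def pure_minmax(pi):
    # the opponent commits to a column first; the player then best-replies
    n = len(pi)
    return min(max(pi[a1][a2] for a1 in range(n)) for a2 in range(n))

pi = [[0.0, 1.5],   # H against (H, D)
      [0.5, 1.0]]   # D against (H, D)

M_low, M_high = pure_maxmin(pi), pure_minmax(pi)
assert M_low <= M_high      # holds for every finite game
print(M_low, M_high)        # -> 0.5 0.5
```

In this instance the two bounds coincide, since playing D guarantees 0.5 and an opponent playing H holds the player to at most 0.5.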
In this subsection we show that, essentially, a pure configuration is stable if and only if (1) all incumbents have the same cognitive level n, (2) the cost of level n is smaller than the difference between the incumbents' (fitness) payoff and the minmax/maxmin values, and (3) the deviation gain is smaller than the effective cost of deceiving cognitive level n.

We begin by formally stating and proving the necessity claim.

Proposition 4. If (µ∗, a∗) is a pure NSC then the following holds: (1) if θ, θ′ ∈ C(µ∗) then n_θ = n_θ′ = n for some n, (2) π(a∗, a∗) − M̲ ≥ k_n, and (3) g(a∗) ≤ c(n).

Proof.
1. Since all players earn the same game payoff of π(a∗, a∗), they must also incur the same cognitive cost, or else the fitness of the different incumbent types would not be balanced (which would contradict the fact that (µ∗, a∗) is an NSC).

2. Assume to the contrary that π(a∗, a∗) − M̲ < k_n. A mutant of type (π, 1) will be able to earn at least M̲ against incumbents in any post-entry focal configuration. As the fraction of mutants vanishes, the average fitness of the mutants is weakly higher than M̲, whereas the fitness of the incumbents converges to π(a∗, a∗) − k_n. Thus, if it were the case that π(a∗, a∗) − M̲ < k_n, the mutants would strictly outperform the incumbents, contradicting the assumption that (µ∗, a∗) is an NSC.
Proposition 5. Suppose that θ̂ := (u_{a∗ a_M̄, n}, n) ∈ Θ_ID. If π(a∗, a∗) − M̄ > k_n and g(a∗) < c(n), then (θ̂, a∗) is an ESC.

Proof. Suppose that all incumbents are of type (u_{a∗ a_M̄, n}, n). Note that in all focal post-entry configurations the incumbent θ̂ always plays either a∗ or a_M̄. Moreover, whenever an incumbent agent is not deceived, she plays action a∗ against a fellow incumbent and action a_M̄ against a mutant. The fact that π(a∗, a∗) − k_n > M̄ implies that any mutant θ ≠ θ̂ with cognitive level n_θ ≤ n earns a strictly lower payoff against the incumbents in any focal post-entry configuration. As a result, if the frequency of mutants is sufficiently small, then they are strictly outperformed. Against a mutant θ′ with cognitive level n′ > n, an incumbent may play action a∗ only when she is being deceived. Since π(a∗, a∗) > M̄ the mutants earn (on average) at most π(a∗, a∗) + g(a∗)·q(n′, n) in matches against incumbents. Consequently, as the fraction of mutants vanishes, the average fitness of the mutants is weakly less than

π(a∗, a∗) + g(a∗)·q(n′, n) − k_{n′} < π(a∗, a∗) + ((k_{n′} − k_n)/q(n′, n))·q(n′, n) − k_{n′} = π(a∗, a∗) − k_n,

whereas the average fitness of the incumbents converges to π(a∗, a∗) − k_n. Hence, the mutants are outperformed.

In particular, our results imply that:

1. Any pure equilibrium that induces a payoff above the minmax value M̄ is the outcome of a pure ESC (regardless of the cost of deception).

2. If the effective cost of deception is sufficiently small, then only Nash equilibria can be the outcomes of pure NSCs. Specifically, this is the case if c(n) < g(a) for each cognitive level n and each action a such that (a, a) is not a Nash equilibrium of the fitness game.

3.
If there is a cognitive level n such that (1) the cost of achieving level n is sufficiently small, and (2) the effective cost of deceiving an opponent of level n is sufficiently high, then essentially any pure profile is the outcome of a pure ESC (similar to the results of Herold and Kuzmics, 2009, in the setup without deception). Formally, let A′ ⊆ A be the set of actions that induce a payoff above the minmax value: A′ = {a ∈ A | π(a, a) > M̄}. Assume that there is a cognitive level n such that (1) k_n < π(a, a) − M̄ for each action a ∈ A′ and (2) c(n) > g(a) for each action a. Then any action a ∈ A′ is the outcome of a pure ESC (in which all incumbents have cognitive level n).

B.4 Application: In-group Cooperation and Out-group Exploitation
The following table represents a family of Hawk–Dove games. When both players play D (Dove) they earn 1 each, and when they both play H (Hawk) they earn 0. When a player plays H against an opponent playing D, she obtains an additional gain of g > 0 (earning 1 + g), while the opponent playing D loses l ∈ (0, 1) (earning 1 − l):

              H               D
    H       0, 0         1 + g, 1 − l
    D   1 − l, 1 + g         1, 1            (1)

It is natural to think of a mutual play of D as the cooperative outcome. We define preferences that induce players to cooperate with their own kind and to seek to exploit those who are not of their own kind.

Definition 14.
Let u_n denote the preferences such that:

1. If u_θ′ = u_n and n_θ′ = n then u_n(D, a′, θ′) = 1 and u_n(H, a′, θ′) = 0 for all a′.

2. If u_θ′ ≠ u_n or n_θ′ ≠ n then u_n(H, a′, θ′) = 1 and u_n(D, a′, θ′) = 0 for all a′.

Thus, when facing someone who is of the same type, an individual with u_n-preferences strictly prefers cooperation, in the sense of playing D. When facing someone who is not of the same type, an individual with u_n-preferences strictly prefers the aggressive action H.

To simplify the analysis and the notation in this example, we assume that a player always succeeds in deceiving an opponent with a lower cognitive level; i.e. we assume that q(n, n′) = 1 whenever n > n′.

Under the assumption that g > l and that the marginal cognitive costs are sufficiently small (but non-vanishing), we construct an ESC in which only individuals with preferences from {u_n}_{n=1}^∞ are present. Individuals of different cognitive levels coexist, and non-Nash profiles are played in all matches between equals. When individuals of the same level meet, they play mutual cooperation (D, D). When individuals of different levels meet, the higher level plays H and the lower level plays D. The gain from obtaining the high payoff of 1 + g against lower types is exactly counterbalanced by the higher cognitive costs. By contrast, if g < l then the game does not admit this kind of stable configuration.

Proposition 6.
Let G be the game represented in (1), where g > 0 and l ∈ (0, 1). Assume that q(n, n′) = 1 whenever n > n′. Suppose that the marginal cognitive cost is small but non-vanishing, so that (a) there is an N such that k_N ≤ l + g < k_{N+1}, and (b) it holds that g > k_{n+1} − k_n for all n ≤ N.

(i) If g > l then there exists an ESC (µ∗, b∗) such that C(µ∗) ⊆ {(u_n, n)}_{n=1}^N, and µ∗ is mixed (i.e. |C(µ∗)| > 1). The behaviour of the incumbents is as follows: if the individuals in a match are of different cognitive levels, then the higher level plays H and the lower level plays D; if both individuals in a match are of the same cognitive level, then they both play D.

(ii) If g = l then there exists an NSC with the above properties.

(iii) If g < l then there does not exist any NSC (µ∗, b∗) such that C(µ∗) ⊆ {(u_n, n)}_{n=1}^∞.

Remark. It is possible to construct an ESC that is like the one in Proposition 6(i) except that when incumbents of the same cognitive level meet they play the mixed equilibrium of the Hawk–Dove game. Thus we can have ESCs in which agents mix at the individual level. For instance, this can be accomplished by considering preferences u_m such that: (1) if u_θ′ = u_m and n_θ′ = n then u_m(a, a′, θ′) = π(a, a′) for all a and a′, and (2) if u_θ′ ≠ u_m or n_θ′ ≠ n then u_m(H, a′, θ′) = 1 and u_m(D, a′, θ′) = 0 for all a′.

C Constructions of Heterogeneous NSCs in Examples
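As a numerical flavour of such heterogeneous constructions, the sketch below works out a two-level instance of the Hawk–Dove configuration of Proposition 6: a level-2 agent plays H against a (deceived) level-1 agent, who plays D, while equals play (D, D), and the fraction p of level-1 agents is pinned down by fitness balance. The parameter values g, l, k1, k2 are illustrative assumptions satisfying g > l and g > k2 − k1.

```python
# Two-level instance of the Hawk-Dove construction: level 2 exploits
# level 1 (payoffs 1 + g vs 1 - l), equals play (D, D) and earn 1 each.
# All parameter values are illustrative assumptions.

g, l = 0.6, 0.3        # Hawk's gain and Dove's loss (assumed), with g > l
k1, k2 = 0.0, 0.4      # cognitive costs (assumed), with l < k2 - k1 < g

# Fitness balance across levels:
#   p*1 + (1-p)*(1-l) - k1  ==  p*(1+g) + (1-p)*1 - k2,
# which rearranges to k2 - k1 = p*g + (1-p)*l, giving:
p = (k2 - k1 - l) / (g - l)

f1 = p * 1 + (1 - p) * (1 - l) - k1      # level-1 average fitness
f2 = p * (1 + g) + (1 - p) * 1 - k2      # level-2 average fitness
assert abs(f1 - f2) < 1e-9 and 0 < p < 1
print(round(p, 4), round(f1, 4))         # -> 0.3333 0.8
```

With these assumed numbers both levels earn average fitness 0.8, so the gain from exploiting lower types is exactly offset by the extra cognitive cost, as in the proposition.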
Appendix C appears in the supplementary material that can be found online.
D Partial Observability When There Is No Deception
Appendix D appears in the supplementary material that can be found online.
References

Abreu, D., and R. Sethi (2003): "Evolutionary Stability in a Reputational Model of Bargaining," Games and Economic Behavior, 44(2), 195–216.

Alger, I., and J. W. Weibull (2013): "Homo Moralis, Preference Evolution under Incomplete Information and Assortative Matching," Econometrica, 81(6), 2269–2302.

Banerjee, A., and J. W. Weibull (1995): "Evolutionary Selection and Rational Behavior," in Learning and Rationality in Economics, ed. by A. Kirman and M. Salmon. Blackwell, Oxford, pp. 343–363.

Bergstrom, T. C. (1995): "On the Evolution of Altruistic Ethical Rules for Siblings," American Economic Review, 85(1), 58–81.

Bester, H., and W. Güth (1998): "Is Altruism Evolutionarily Stable?," Journal of Economic Behavior and Organization, 34, 193–209.

Bolle, F. (2000): "Is Altruism Evolutionarily Stable? And Envy and Malevolence? Remarks on Bester and Güth," Journal of Economic Behavior and Organization, 42, 131–133.

Bomze, I. M., and J. W. Weibull (1995): "Does Neutral Stability Imply Lyapunov Stability?," Games and Economic Behavior, 11(2), 173–192.

Brown, G. W., and J. von Neumann (1950): "Solutions of Games by Differential Equations," in Contributions to the Theory of Games, ed. by H. W. Kuhn and A. W. Tucker, Annals of Mathematics Studies 24. Princeton University Press, Princeton.

Byrne, R. W., and A. Whiten (1997): "Machiavellian Intelligence," Machiavellian Intelligence II: Extensions and Evaluations, pp. 1–23.

Byrne, R. W., and A. Whiten (1998): Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes, and Humans. Oxford University Press, Oxford.

Camerer, C. F., T.-H. Ho, and J.-K. Chong (2002): "Sophisticated Experience-Weighted Attraction Learning and Strategic Teaching in Repeated Games," Journal of Economic Theory, 104(1), 137–188.

Charness, G., and D. I. Levine (2007): "Intention and Stochastic Outcomes: An Experimental Study," The Economic Journal, 117(522), 1051–1072.

Conlisk, J. (2001): "Costly Predation and the Distribution of Competence," American Economic Review, 91(3), 475–484.

Crawford, V. P. (2003): "Lying for Strategic Advantage: Rational and Boundedly Rational Misrepresentation of Intentions," American Economic Review, 93(1), 133–149.

Cressman, R. (1997): "Local Stability of Smooth Selection Dynamics for Normal Form Games," Mathematical Social Sciences, 34(1), 1–19.

Dekel, E., J. C. Ely, and O. Yilankaya (2007): "Evolution of Preferences," Review of Economic Studies, 74, 685–704.

Dufwenberg, M., and W. Güth (1999): "Indirect Evolution vs. Strategic Delegation: A Comparison of Two Approaches to Explaining Economic Institutions," European Journal of Political Economy, 15(2), 281–295.

Dunbar, R. I. M. (1998): "The Social Brain Hypothesis," Evolutionary Anthropology, 6, 178–190.

Ellingsen, T. (1997): "The Evolution of Bargaining Behavior," The Quarterly Journal of Economics, 112(2), 581–602.

Ely, J. C., and O. Yilankaya (2001): "Nash Equilibrium and the Evolution of Preferences," Journal of Economic Theory, 97, 255–272.

Falk, A., E. Fehr, and U. Fischbacher (2003): "On the Nature of Fair Behavior," Economic Inquiry, 41(1), 20–26.

Fershtman, C., and Y. Weiss (1998): "Social Rewards, Externalities and Stable Preferences," Journal of Public Economics, 70(1), 53–73.

Frank, R. H. (1987): "If Homo Economicus Could Choose His Own Utility Function, Would He Want One with a Conscience?," The American Economic Review, 77(4), 593–604.

Frenkel, S., Y. Heller, and R. Teper (forthcoming): "The Endowment Effect as a Blessing," International Economic Review.

Friedman, D., and N. Singh (2009): "Equilibrium Vengeance," Games and Economic Behavior, 66(2), 813–829.

Fudenberg, D., and D. K. Levine (1998): The Theory of Learning in Games, vol. 2. MIT Press.

Gamba, A. (2013): "Learning and Evolution of Altruistic Preferences in the Centipede Game," Journal of Economic Behavior and Organization, 85(C), 112–117.

Gauer, F., and C. Kuzmics (2016): "Cognitive Empathy in Conflict Situations," mimeo, SSRN 2715160.

Güth, W. (1995): "An Evolutionary Approach to Explaining Cooperative Behavior by Reciprocal Incentives," International Journal of Game Theory, 24(4), 323–344.

Güth, W., and S. Napel (2006): "Inequality Aversion in a Variety of Games: An Indirect Evolutionary Analysis," The Economic Journal, 116, 1037–1056.

Güth, W., and M. E. Yaari (1992): "Explaining Reciprocal Behavior in Simple Strategic Games: An Evolutionary Approach," in Explaining Process and Change, ed. by U. Witt. University of Michigan Press, Ann Arbor, MI, pp. 22–34.

Guttman, J. M. (2003): "Repeated Interaction and the Evolution of Preferences for Reciprocity," The Economic Journal, 113(489), 631–656.

Heifetz, A., C. Shannon, and Y. Spiegel (2007): "What to Maximize if You Must," Journal of Economic Theory, 133(1), 31–57.

Heller, Y. (2015): "Three Steps Ahead," Theoretical Economics, 10, 203–241.

Heller, Y., and E. Mohlin (forthcoming): "Observations on Cooperation," Review of Economic Studies.

Herold, F., and C. Kuzmics (2009): "Evolutionary Stability of Discrimination under Observability," Games and Economic Behavior, 67, 542–551.

Hines, W. G. S., and J. Maynard Smith (1979): "Games between Relatives," Journal of Theoretical Biology, 79(1), 19–30.

Hofbauer, J. (2011): "Deterministic Evolutionary Game Dynamics," in Proceedings of Symposia in Applied Mathematics, vol. 69, pp. 61–79.

Hofbauer, J., and K. Sigmund (1988): The Theory of Evolution and Dynamical Systems. Cambridge University Press, Cambridge.

Holloway, R. (1996): "Evolution of the Human Brain," in Handbook of Human Symbolic Evolution, ed. by A. Lock and C. R. Peters. Clarendon Press, New York: Oxford University Press, pp. 74–116.

Hopkins, E. (2014): "Competitive Altruism, Mentalizing and Signalling," American Economic Journal: Microeconomics, 6, 272–292.

Huck, S., and J. Oechssler (1999): "The Indirect Evolutionary Approach to Explaining Fair Allocations," Games and Economic Behavior, 28, 13–24.

Humphrey, N. K. (1976): "The Social Function of Intellect," in Growing Points in Ethology, ed. by P. P. G. Bateson and R. A. Hinde. Cambridge University Press, Cambridge, pp. 303–317.

Kim, Y.-G., and J. Sobel (1995): "An Evolutionary Approach to Pre-Play Communication," Econometrica, 63(5), 1181–1193.

Kinderman, P., R. I. M. Dunbar, and R. P. Bentall (1998): "Theory-of-Mind Deficits and Causal Attributions," British Journal of Psychology, 89, 191–204.

Koçkesen, L., E. A. Ok, and R. Sethi (2000): "Evolution of Interdependent Preferences in Aggregative Games," Games and Economic Behavior, 31(2), 303–310.

Mailath, G. J., and L. Samuelson (2006): Repeated Games and Reputations: Long-Run Relationships. Oxford University Press.

Matsui, A. (1991): "Cheap-Talk and Cooperation in a Society," Journal of Economic Theory, 54(2), 245–258.

Maynard Smith, J. (1982): Evolution and the Theory of Games. Cambridge University Press, Cambridge.

Maynard Smith, J., and G. R. Price (1973): "The Logic of Animal Conflict," Nature, 246(5427), 15–18.

Mohlin, E. (2010): "Internalized Social Norms in Conflicts: An Evolutionary Approach," Economics of Governance, 11(2), 169–181.

Mohlin, E. (2012): "Evolution of Theories of Mind," Games and Economic Behavior, 75(1), 299–312.

Norman, T. W. L. (2012): "Equilibrium Selection and the Dynamic Evolution of Preferences," Games and Economic Behavior, 74(1), 311–320.

Ok, E. A., and F. Vega-Redondo (2001): "On the Evolution of Individualistic Preferences: An Incomplete Information Scenario," Journal of Economic Theory, 97, 231–254.

Possajennikov, A. (2000): "On the Evolutionary Stability of Altruistic and Spiteful Preferences," Journal of Economic Behavior and Organization, 42, 125–129.

Premack, D., and G. Woodruff (1979): "Does the Chimpanzee Have a Theory of Mind?," Behavioral and Brain Sciences, 1, 515–526.

Robalino, N., and A. Robson (2016): "The Evolution of Strategic Sophistication," American Economic Review, 106(4), 1046–1072.

Robson, A. J. (1990): "Efficiency in Evolutionary Games: Darwin, Nash and the Secret Handshake," Journal of Theoretical Biology, 144(3), 379–396.

Robson, A. J. (2003): "The Evolution of Rationality and the Red Queen," Journal of Economic Theory, 111, 1–22.

Robson, A. J., and L. Samuelson (2011): "The Evolutionary Foundations of Preferences," in The Social Economics Handbook, ed. by J. Benhabib, A. Bisin, and M. Jackson. North Holland, Amsterdam, pp. 221–310.

Rtischev, D. (2016): "Evolution of Mindsight and Psychological Commitment among Strategically Interacting Agents," Games, 7(3), 27.

Samuelson, L. (2001): "Introduction to the Evolution of Preferences," Journal of Economic Theory, 97(2), 225–230.

Sandholm, W. H. (2001): "Preference Evolution, Two-Speed Dynamics, and Rapid Social Change," Review of Economic Dynamics, 4, 637–679.

Sandholm, W. H. (2010): "Local Stability under Evolutionary Game Dynamics," Theoretical Economics, 5(1), 27–50.

Schaffer, M. E. (1988): "Evolutionarily Stable Strategies for a Finite Population and a Variable Contest Size," Journal of Theoretical Biology, 132, 469–478.

Schelling, T. C. (1960): The Strategy of Conflict. Harvard University Press, Cambridge, MA.

Schipper, B. C. (2017): "Strategic Teaching and Learning in Games," mimeo.

Schlag, K. H. (1993): "Cheap Talk and Evolutionary Dynamics," Bonn Department of Economics Discussion Paper B-242.

Selten, R. (1980): "A Note on Evolutionarily Stable Strategies in Asymmetric Animal Conflicts," Journal of Theoretical Biology, 84(1), 93–101.

Sethi, R., and E. Somanathan (2001): "Preference Evolution and Reciprocity," Journal of Economic Theory, 97, 273–297.

Stahl, D. O. (1993): "Evolution of Smart n Players," Games and Economic Behavior, 5(4), 604–617.

Stennek, J. (2000): "The Survival Value of Assuming Others to be Rational," International Journal of Game Theory, 29, 147–163.

Taylor, P. D., and L. B. Jonker (1978): "Evolutionary Stable Strategies and Game Dynamics," Mathematical Biosciences, 40(1–2), 145–156.

Thomas, B. (1985): "On Evolutionarily Stable Sets," Journal of Mathematical Biology, 22(1), 105–115.

van Damme, E. (1987): Stability and Perfection of Nash Equilibria. Springer, Berlin.

Wärneryd, K. (1991): "Evolutionary Stability in Unanimity Games with Cheap Talk," Economics Letters, 36(4), 375–378.

Wärneryd, K. (1998): "Communication, Complexity, and Evolutionary Stability," International Journal of Game Theory, 27(4), 599–609.

Weibull, J. W. (1995): Evolutionary Game Theory. MIT Press, Cambridge, Massachusetts.

Whiten, A., and R. W. Byrne (1988): "Tactical Deception in Primates," Behavioral and Brain Sciences, 11(2), 233–244.

Wiseman, T., and O. Yilankaya (2001): "Cooperation, Secret Handshakes, and Imitation in the Prisoners' Dilemma,"