A model of cultural evolution in the context of strategic conflict
MISHA PEREPELITSA
ABSTRACT. We consider a model of cultural evolution for strategy selection in a population of individuals who interact in a game-theoretic framework. The evolution combines individual learning of the environment (population strategy profile), reproduction proportional to the success of the acquired knowledge, and social transmission of the knowledge to the next generation. A mean-field type equation is derived that describes the dynamics of the distribution of cultural traits in terms of the rate of learning, the reproduction rate, and the population size. We establish global well-posedness of the initial-boundary value problem for this equation and give several examples that illustrate the process of cultural evolution for some classical games.
1. INTRODUCTION
Evolutionary game theory, pioneered by Maynard Smith and Price [16], is a powerful tool that explains the dominance of some behavioral traits as being uninvadable by other traits in the competition for Darwinian fitness points, when fitness is frequency dependent. A deterministic dynamic process that selects stable behavioral traits can be described by the replicator equation, see Taylor and Jonker [17], Hofbauer et al. [7], Zeeman [18]. The replicator equation also governs the dynamics of reinforcement learning in repeated play of a game, see Borgers and Sarin [1], Fudenberg and Levine [3], Krishnendu et al. [10], Perepelitsa [11].

Learning in games is an integral part of game theory that goes back to the works of Robinson [13] and Shapley [14]. One of its mainstays is fictitious play, or statistical learning. Learning by fictitious play in large populations can be described by an ODE, called the best-response equation, see Gilboa and Matsui [5], Gaunersdorfer and Hofbauer [4], Hofbauer [6] and Hofbauer and Sigmund [8]. The best-response equation describes changes in the mean statistical prior about the opponent's actions, and its stationary points are Nash equilibria.

In this paper we consider an evolutionary process that combines the concept of "the survival of the fittest" from biological evolution with individual
learning through fictitious play, where the state of learning is socially transmitted to the next generation of players. Examples of this type of process are furnished by the theory of cultural evolution.

Consider a cartoonish scenario of cultural evolution. Let's say there is an island populated by pedantic statisticians capable of asexual reproduction. The statisticians wander aimlessly around the island, meeting each other occasionally for a round of a game (a symmetric game with a finite set of strategies). Each carries a ledger where he/she carefully marks how many times the opponent played a particular strategy (opponents are indistinguishable). To select a strategy, each of them uses the "sacred book of rules" (best response in a game) that prescribes what to do given the current count from his/her ledger. The book has the dual purpose of settling the outcome of each play of the game (payoffs), and the players collect a certain amount of fitness points from each play. From time to time the statisticians reproduce at a rate proportional to their accumulated fitness. When that happens, they solemnly pass an exact copy of their ledgers to each of the offspring, who carry on with them in the same manner.

We say that this model is a form of cultural evolution because it is characterized by social transmission of traits (inheritance of knowledge) and individual learning as adaptation mechanisms, see Hoppitt and Laland [9], Richerson and Boyd [12]. Note that, in this case, social transmission and learning change the population strategy profile (the environment), which determines, in its turn, the degree of success of a cultural trait, rendering the problem nonlinear.

The main parameters of the problem are the rate of learning, the rate of reproduction, and the population size. Additional information might be needed to completely specify the problem. For example, if the island is not big, we may assume that the frequency with which inhabitants meet and play the game increases with the population size. Another scenario is an infinite island which allows inhabitants to spread, no matter how many of them there are, so that the interaction frequency is constant.

The goal of the paper is to develop a mathematical model that takes as input the initial distribution of cultural traits in a population and the above-mentioned rates of learning and reproduction, and outputs the distribution of cultural traits at any moment t in the future. As we will see from the examples of section 3, it is essential for an accurate description of the dynamics that the model specifies the whole distribution of traits and not just some statistical averages, such as the mean and the variance. The model is derived as a mean-field approximation of the distribution density of a Markov process describing the interaction of agents. The equation is of kinetic type with nonlinear kinetic velocities. Due to the discontinuities of the best-response function, solutions of this equation are intrinsically weak. Our main result, stated in section 2, establishes global well-posedness of the initial-boundary value problem for this equation.

In section 3 we discuss two examples that illustrate the dynamics of this model of cultural evolution. In the first, we consider the Hawk-Dove-Retaliator game with two evolutionary stable strategies: $0.5H + 0.5D$ and $R$. These two strategies are the only asymptotically stable points for the replicator and best-response equations.
The phase portraits, however, are different, with the basin of attraction of $0.5H + 0.5D$ for the replicator equation being strictly included in its basin of attraction for the best-response dynamics. As a result, there are initial conditions for the distribution of cultural traits which evolve to Retaliator when the rate of learning is low, but proceed to $0.5H + 0.5D$ when the learning rate is increased. It also means that if the biological evolution proceeds to the mix of Hawk and Dove, it cannot be diverted to anything else by learning. Another interesting property of this process is a sharp change in the environment (population strategy profile) when a subpopulation continuously transitions from one decision polygon to another.

In the second example, the Rock-Paper-Scissors game, we show how an exponentially growing heterogeneous population can lock the cultural evolution in a suboptimal pure strategy, in contrast to the dynamics of both the replicator and best-response equations.

In general, determining the asymptotic state of this type of evolution for an arbitrary game is problematic, due to the complicated dynamics and the absence of entropy functionals. It can be done in some cases, at least partially, as in the model with zero reproduction rate. For that model we show that if two statistical averages, the mean prior and the mean best-response, converge to some values (not necessarily the same), then the prior of every agent in the population converges to a Nash equilibrium of the game.

2. MODEL
We consider a series of plays of a symmetric 2-player game between randomly selected agents in a large population. There are $d$ strategies available to the agents, and the payoffs are given by a matrix $A = \{a_{ij}\}_{i,j=1}^d$, which we assume to have non-negative entries. The game defines a multi-valued best response function $BR(p) : S^{d-1} \to \mathcal{P}(S^{d-1})$, where $S^{d-1}$ is the $(d-1)$-dimensional simplex, and a single-valued selection $b(p) \in BR(p/\sum_i p_i)$ defined for $p \in \mathbb{R}^d_+$. We refer the reader to Appendix section 4.1 for details. We start with the case when there is no reproduction and the population stays at the same level $N$.

We will record the changes of the states of agents that occur at discrete epochs, labeled by $t$. The state of agent $i$ at epoch $t$ is a $(d+1)$-dimensional vector $X_i^t = (P_i^t, S_i^t)$, where $P_i^t = (p_{i,1}^t, \dots, p_{i,d}^t)$ is a vector of learning priors (unscaled) and $S_i^t$ is an averaged, accumulated fitness.

An interaction is a round of the game between two random agents, say $i$ and $j$, who play according to their priors $P_i^t$ and $P_j^t$, that is, using their best response strategies. Based on that, they earn fitness points and update the learning priors. To describe the update rule we will use the following parameters: $h$ – the characteristic learning increment, $\mu h$ – the characteristic fitness increment, and $\delta$ – the time increment. Thus, we assume that the learning and fitness increments are of the same order, but not necessarily equal. The rule takes the form

$$P_i^{t+\delta} = P_i^t + h\, b(P_j^t), \qquad P_j^{t+\delta} = P_j^t + h\, b(P_i^t),$$
$$S_i^{t+\delta} = (1-\mu h)\, S_i^t + \mu h\, a(b(P_i^t), b(P_j^t)), \qquad S_j^{t+\delta} = (1-\mu h)\, S_j^t + \mu h\, a(b(P_j^t), b(P_i^t)),$$

where

$$a(b(P_i^t), b(P_j^t)) = \sum_{k,l} a_{kl}\, b_k(P_i^t)\, b_l(P_j^t)$$

is the fitness earned by agent $i$. In these formulas the fitness is averaged over the history of payoffs, so that it cannot grow without bound. One can think of $\mu$ as a recency parameter: large values of $\mu$ put more weight on more recent payoffs.

Our goal here is to derive an approximate equation for $f(p, s, t)$ – the density of the distribution of agents over the space of learning priors and fitness $(p, s) \in \mathbb{R}^d_+ \times \mathbb{R}_+$. In the following derivation we use the convention that $x_i = (p_i, s_i)$, $\bar{x} = (x_1, \dots, x_N) \in (\mathbb{R}^d_+ \times \mathbb{R}_+)^N$, where $\bar{x}$ parametrizes the state of the whole population.

Let $w(\bar{x}, t)$ be the density of the distribution of priors and fitness for the whole population. This function implicitly depends on parameters such as $h$, $\delta$, and $\mu$, but we suppress them from the notation for convenience of presentation. The update rule can be expressed as a moment relation with a test function $\varphi$:

$$\int \varphi(\bar{x})\, w(\bar{x}, t+\delta)\, d\bar{x} = \sum_{i \neq j} (N(N-1))^{-1} \int \varphi(\bar{x})\Big|_{x_i = \hat{x}_i,\, x_j = \hat{x}_j}\, w(\bar{x}, t)\, d\bar{x},$$

where $\hat{x}_i = \big(p_i + h\, b(p_j),\; (1-\mu h)\, s_i + \mu h\, a(b(p_i), b(p_j))\big)$, and symmetrically for $\hat{x}_j$. The last equation can be written as

(1) $$\int \varphi(\bar{x})\,[w(\bar{x}, t+\delta) - w(\bar{x}, t)]\, d\bar{x} = \sum_{i \neq j} (N(N-1))^{-1} \int \Big[\varphi(\bar{x})\Big|_{x_i = \hat{x}_i,\, x_j = \hat{x}_j} - \varphi(\bar{x})\Big]\, w(\bar{x}, t)\, d\bar{x}.$$

The function $f(x, t)$, where $x = (p, s)$, is related to the multi-agent distribution $w(\bar{x}, t)$ through the rule

$$f(x, t) = \sum_k N^{-1} \int w(\bar{x})\Big|_{x_k = x}\, d\bar{x}_k, \quad x \in \mathbb{R}^d_+ \times \mathbb{R}_+,$$

where $\bar{x}_k$ is the $(d+1)(N-1)$-dimensional vector of all the coordinates $x_1, \dots, x_N$ excluding $x_k$.
This is the one-particle distribution function. In the formulas to follow we also need the two-particle distribution function

$$g(x, y, t) = \sum_{i \neq j} (N(N-1))^{-1} \int w(\bar{x})\Big|_{x_i = x,\, x_j = y}\, d\bar{x}_{ij},$$

where $\bar{x}_{ij}$ is the $(d+1)(N-2)$-dimensional vector of all the coordinates excluding $x_i$ and $x_j$. The function $g$ is symmetric in $(x, y)$ and is related to $f$ by the formulas

$$f(x, t) = \int g(x, y, t)\, dy = \int g(y, x, t)\, dy.$$

The moments of the functions $f$ and $g$ are computed from the moments of $w$:

$$\int \psi(x)\, f(x, t)\, dx = \sum_k N^{-1} \int \psi(x_k)\, w(\bar{x})\, d\bar{x},$$

and

$$\int \omega(x, y)\, g(x, y, t)\, dx\, dy = \sum_{i \neq j} (N(N-1))^{-1} \int \omega(x_i, x_j)\, w(\bar{x})\, d\bar{x}.$$
Now we use (1) to obtain an integral equation for the change of the function $f$. For that, select $\varphi(\bar{x}) = \psi(x_k)$, sum over $k$ and take the average. We get

(2) $$\int \psi(x)[f(x, t+\delta) - f(x, t)]\, dx = N^{-1} \sum_k \sum_{i \neq j} (N(N-1))^{-1} \int \Big[\psi(x_k)\Big|_{x_i = \hat{x}_i,\, x_j = \hat{x}_j} - \psi(x_k)\Big]\, w(\bar{x}, t)\, d\bar{x}$$
$$= N^{-1} \sum_{i \neq j} (N(N-1))^{-1} \left( \int [\psi(\hat{x}_i) - \psi(x_i)]\, w(\bar{x}, t)\, d\bar{x} + \int [\psi(\hat{x}_j) - \psi(x_j)]\, w(\bar{x}, t)\, d\bar{x} \right)$$
$$= \frac{2}{N} \iint [\psi(\hat{x}) - \psi(x)]\, g(x, y, t)\, dx\, dy,$$

where $x = (p, s)$, $y = (p', s')$ and

$$\hat{x} = \big(p + h\, b(p'),\; (1-\mu h)\, s + \mu h\, a(b(p), b(p'))\big).$$

To proceed, we make the assumption of statistical independence of the states of two randomly selected agents:

$$g(x, y, t) = f(x, t)\, f(y, t).$$

The plausibility of this condition is partially justified if the population is large, so that the same agents are rarely matched together, and the information about an interaction is not shared with other agents. Then, expanding $\psi(\hat{x})$ in a Taylor series and integrating by parts, we obtain

(3) $$\int \psi(x)[f(x, t+\delta) - f(x, t)]\, dx = -\frac{2}{N} \iint \psi(p, s)\, \mathrm{div}_{p,s}\Big( \big(h\, b(p'),\; \mu h\,(a(b(p), b(p')) - s)\big)\, f(x, t) \Big)\, f(y, t)\, dy\, dx + O(h^2)$$
$$= -\frac{2}{N} \int \psi(p, s)\, \mathrm{div}_{p,s}\Big( \big(h\, \bar{b}(t),\; \mu h\,(\bar{a}(b(p), t) - s)\big)\, f(x, t) \Big)\, dx + O(h^2),$$

with the mean best response

(4) $$\bar{b}(t) = \iint b(p)\, f(p, s, t)\, dp\, ds,$$

and the mean fitness for using strategy $b(p)$:

(5) $$\bar{a}(b(p), t) = \iint a(b(p), b(p'))\, f(p', s', t)\, dp'\, ds' = \sum_{i,j} a_{ij}\, b_i(p)\, \bar{b}_j(t).$$
Dividing the equation by $\delta$ and ignoring the higher order terms, we arrive at the Fokker-Planck equation for the density $f(p, s, t)$:

(6) $$\partial_t f + \frac{2h}{N\delta}\, \mathrm{div}_p\big(\bar{b}(t) f\big) + \frac{2\mu h}{N\delta}\, \partial_s\big((\bar{a}(b(p), t) - s) f\big) = 0.$$

In passing from a discrete to a continuous time model we assume that $\delta$, $h$ are small and $N$ is large, so that the ratios

(7) $$\alpha_p = \frac{2h}{N\delta}, \qquad \alpha_s = \frac{2\mu h}{N\delta}$$

are of finite order. Note that $2(N\delta)^{-1}$ can be interpreted as the number of interactions per agent, per unit of time. We assume that this number is large and inversely proportional to the characteristic learning and fitness increment $h$.

Now we extend the model to variable populations, by allowing agents to reproduce at a rate proportional to their level of fitness. At this point we proceed heuristically, leaving out the details of the derivation. With reproduction, the Fokker-Planck equation must be appended by a source term proportional to $(s - \bar{r}(t))\, f(p, s, t)$ on the right-hand side of (6), where $\bar{r}(t)$ is the mean population fitness

(8) $$\bar{r}(t) = \sum_{i,j} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t).$$

The mean population size $N = N(t)$ is determined from the equation

(9) $$\frac{1}{N}\frac{dN}{dt} = \alpha \iint s\, f(p, s, t)\, dp\, ds,$$

where $\alpha$ is the reproduction rate. Moreover, the rates $\alpha_p, \alpha_s$ are now variable and depend on $N = N(t)$. The final model reads:

(10) $$\partial_t f + \alpha_p\, \mathrm{div}_p\big(\bar{b}(t) f\big) + \alpha_s\, \partial_s\big((\bar{a}(b(p), t) - s) f\big) = \alpha\,(s - \bar{r}(t))\, f,$$

with $\alpha_p$, $\alpha_s$, $\bar{b}(t)$, $\bar{a}(b(p), t)$ and $\bar{r}(t)$ given by (7), (4), (5), and (8), respectively. Note that equations (9) and (10) are coupled through formulas (7).

2.1. Singular limit of the recency parameter $\mu$. In the reproduction scenario described by (10), children acquire not only the knowledge $p$ of their parents but also their averaged, accumulated fitness $s$. Hypothetically, this might be a valid assumption in some situations; however, it seems more relevant to consider the case where it is only the knowledge $p$ that eventually determines the fitness of the offspring. This can easily be achieved in the framework of the model (7)-(10) by taking the limit $\mu \to \infty$ ($\alpha_s \to \infty$), which overweights the stimulus obtained from recent encounters. For the derivation of the new model we proceed informally. Dividing equation (10) by $\alpha_s$ and passing to the limit, we get

$$\partial_s\big((\bar{a}(b(p), t) - s) f\big) = 0.$$

Since $f$ is non-negative, this equation can hold only if, for all $p \in \mathbb{R}^d_+$ and $t > 0$, $f$ is a delta-function in $s$ concentrated at the value $\bar{a}(b(p), t)$:

$$f(p, s, t) = f(p, t)\, \delta(s - \bar{a}(b(p), t)).$$

That is, the fitness equals the expected payoff for an agent using strategy $b(p)$ against the population strategy profile $\bar{b}(t)$:

$$s = \bar{a}(b(p), t) = \sum_{ij} a_{ij}\, b_i(p)\, \bar{b}_j(t).$$

Now the dimension of the problem can be reduced, as we can integrate (10) in $s$ and find an equation for the moment $\int_{-\infty}^{\infty} f(p, s, t)\, ds$, which, with a slight abuse of notation, we still call $f(p, t)$. The equation reads:

(11) $$\partial_t f + \alpha_p\, \mathrm{div}_p\big(\bar{b}(t) f\big) = \alpha\big(\bar{a}(b(p), t) - \bar{r}(t)\big) f = \alpha \left( \sum_{ij} a_{ij}\, b_i(p)\, \bar{b}_j(t) - \sum_{ij} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t) \right) f.$$

This is the equation of our main interest, for which we will establish global well-posedness. Before we switch to the mathematical analysis, we mention a special case with zero reproduction, $\alpha = 0$. To complete the mathematical setup for equations (11) and (14), it remains to add the initial condition for the population size, $N(0) = N_0$, the initial condition for the density,

(12) $$f(p, 0) = f_0(p), \quad p \in \mathbb{R}^d_+,$$
and the boundary condition (zero influx of probability):

(13) $$f(p, t) = 0, \quad p \in \partial\mathbb{R}^d_+, \quad t \geq 0.$$

Note that the velocity vector $\bar{b}(t)$ is always directed into $\mathbb{R}^d_+$, so the problem is not over-determined.
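For the reader who wants to experiment, the following is a minimal sketch (in Python) of the agent-based Markov process that the kinetic model approximates. The payoff matrix, population size, and increments below are illustrative assumptions, not values used elsewhere in the paper; for large $N$ and small $h$, a histogram of the pairs $(P_i, S_i)$ approximates a solution of (6).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative non-negative 2-strategy payoff matrix (a shifted Hawk-Dove type game).
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])
d, N = 2, 500
h, mu = 0.01, 1.0                # learning increment h and recency parameter mu

def b(P):
    """Single-valued best response to the normalized prior P/sum(P);
    ties are resolved by taking the first maximizer."""
    r = A @ (P / P.sum())        # r_i = sum_k a_ik p_k
    e = np.zeros(d)
    e[np.argmax(r)] = 1.0
    return e

P = rng.uniform(0.5, 1.5, size=(N, d))   # unscaled learning priors
S = np.zeros(N)                          # averaged accumulated fitness

for epoch in range(200_000):
    i, j = rng.choice(N, size=2, replace=False)
    bi, bj = b(P[i]), b(P[j])
    # the update rule of section 2: record the opponent's play,
    # and average the payoff with recency weight mu*h
    P[i] = P[i] + h * bj
    P[j] = P[j] + h * bi
    S[i] = (1 - mu * h) * S[i] + mu * h * (bi @ A @ bj)
    S[j] = (1 - mu * h) * S[j] + mu * h * (bj @ A @ bi)

bbar = np.mean([b(q) for q in P], axis=0)
print("mean best response:", bbar, "mean fitness:", S.mean())
```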
2.2. Fictitious play in large populations. With zero reproduction, $\alpha = 0$, the model becomes particularly simple:

(14) $$\partial_t f + \alpha_p\, \mathrm{div}_p\big(\bar{b}(t) f\big) = 0,$$

with the mean best response

(15) $$\bar{b}(t) = \int b(p)\, f(p, t)\, dp.$$
Using equation (14) we can compute the equation for the mean empirical frequencies vector $P(t) = \int (p/\sum_j p_j)\, f(p, t)\, dp$:

(16) $$\frac{dP_i}{dt} = \alpha_p \int \frac{1}{\sum_j p_j} \left( \bar{b}_i(t) - \frac{p_i}{\sum_j p_j} \right) f(p, t)\, dp, \quad i = 1..d,$$

since $\sum_j \bar{b}_j(t) = 1$. If one postulates that all agents have the same, or approximately the same, priors,

(17) $$p(t) = (P_1(t), \dots, P_d(t)),$$

then the above equation reduces to a variant of the best-response dynamics equation for the normalized prior $\hat{P} = P/\sum_j P_j$:

(18) $$\frac{d\hat{P}_i}{dt} = \frac{\alpha_p}{\sum_j P_j(t)} \big( \bar{b}_i(\hat{P}) - \hat{P}_i \big), \quad i = 1..d.$$

Notice the positive factor on the right-hand side of the equation: for a learning process in which the priors become large, the learning rate slows down.
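Equation (18) is easy to integrate numerically by tracking the unscaled prior $Q(t)$, for which $dQ/dt = \alpha_p\, b(Q)$; the normalized prior then obeys (18). A minimal sketch, assuming purely for illustration the Rock-Paper-Scissors payoffs of section 3 with basis fitness 1, for which fictitious play converges to the mixed equilibrium $(1/3, 1/3, 1/3)$:

```python
import numpy as np

# Rock-Paper-Scissors with basis fitness 1 (see table 2); an illustrative choice.
A = 1.0 + np.array([[ 0.0, -1.0,  1.0],
                    [ 1.0,  0.0, -1.0],
                    [-1.0,  1.0,  0.0]])

def b(Q):
    """Pure best response to the normalized prior (first maximizer on ties)."""
    r = A @ (Q / Q.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

alpha_p, dt = 1.0, 1e-2
Q = np.array([3.0, 2.0, 1.0])      # unscaled prior; normalized form is Q/Q.sum()
for _ in range(500_000):
    Q += dt * alpha_p * b(Q)       # dQ/dt = alpha_p * b(Q), equivalent to (18)

print(Q / Q.sum())                 # tends to the Nash equilibrium (1/3, 1/3, 1/3)
```

Note how the effective step of the normalized prior at time $t$ is of order $\alpha_p\, dt/\sum_j Q_j(t)$: the learning slows down exactly as predicted by the factor in (18).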
2.3. Relation to the replicator equation.
With zero learning rate, $\alpha_p = 0$, equation (11) describes replicator dynamics for the density $f(p, t)$. Indeed, in this case each agent uses a fixed strategy $b(p)$, so that the population is split into at most $d$ groups, each using a particular strategy, and each reproducing at a rate proportional to the averaged fitness obtained from interacting with the whole population. Formally, one obtains the system of replicator equations by integrating (11) over the sets $\{p : b(p) = e_k\}$, $k = 1..d$, as sketched below.
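A minimal numerical sketch of this reduction: for a population consisting of $d$ point masses, the group weights $w_k(t) = \int_{\{b(p)=e_k\}} f(p,t)\, dp$ satisfy the replicator system $\dot{w}_k = \alpha\, w_k\big((Aw)_k - w \cdot Aw\big)$ with $\bar{b} = w$. The payoff matrix below is an illustrative non-negative Hawk-Dove type matrix, not one used elsewhere in the paper.

```python
import numpy as np

# Non-negative Hawk-Dove type payoffs; the mixed equilibrium is w = (1/2, 1/2).
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])

alpha, dt = 1.0, 1e-3
w = np.array([0.9, 0.1])                 # group weights, summing to 1
for _ in range(30_000):
    fitness = A @ w                      # payoff of each pure strategy against profile w
    w += dt * alpha * w * (fitness - w @ fitness)

print(w)                                 # approaches (0.5, 0.5)
```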
2.4. Existence of weak solutions. In this section we establish our main result, theorem 1. Let $\Omega = \mathbb{R}^d_+$, and let $C^1_0(\Omega)$ be the space of continuously differentiable functions with compact support in $\Omega$. We adopt the standard notation for the $L^p(\Omega)$ spaces and for the space of functions of locally bounded variation $BV_{loc}(\Omega)$. The latter consists of all measurable and locally integrable functions $f$ such that for any ball $B_r$,

$$\|f\|_{TV(B_r \cap \Omega)} = \sup \left\{ \int_{B_r \cap \Omega} f\, \mathrm{div}\, \psi\, dp \; : \; \psi \in C^1_0(B_r \cap \Omega; \mathbb{R}^d),\; \sup_p |\psi| \leq 1 \right\} < +\infty.$$

For such functions the distributional derivative $\partial_{p_i} f$, $i = 1..d$, is a signed Radon measure. The information on these spaces and the results from functional analysis that we use below can be found, for example, in the book by Brezis [2].

Theorem 1.
Let $f_0 \in C^1_0(\Omega)$ be a non-negative function with unit mass. There is a unique weak solution $f$ of (11), (12), (13) such that

$$f \in C([0, T]; L^1(\Omega)) \cap L^\infty([0, T]; BV(B_r \cap \Omega)), \quad \forall r, T > 0.$$

For any $t > 0$, $f(p, t) \geq 0$ a.e. in $\Omega$ and $\int f(p, t)\, dp = 1$.

Proof.
From the definition of the function $b(p)$ and the properties of $BR(p)$ it follows that for any ball $B_r$, $b(p)$ has finite total variation on $B_r \cap \Omega$, and there is $C = C(r)$, not depending on the center of the ball, such that

(19) $$\|b\|_{TV(B_r \cap \Omega)} \leq C.$$

Equation (11) can be written in non-conservative form as

(20) $$\partial_t f + \bar{b}(t) \cdot \nabla_p f = \Big( \sum a_{ij}\, (b_i(p)\, \bar{b}_j(t) - \bar{b}_i(t)\, \bar{b}_j(t)) \Big) f,$$

where for simplicity we set $\alpha = \alpha_p = 1$. Given a continuous function $\bar{b}(t)$, we solve this equation by the method of characteristics. For the mapping $X_t : \mathbb{R}^d \to \mathbb{R}^d$, defined as

$$X_t(p) = p + \int_0^t \bar{b}(\tau)\, d\tau,$$

$f$ is expressed through the formula

$$f(X_t(p), t) = f_0(p) \exp\left\{ \int_0^t \sum a_{ij}\, [b_i(X_\tau(p))\, \bar{b}_j(\tau) - \bar{b}_i(\tau)\, \bar{b}_j(\tau)]\, d\tau \right\},$$

or as

(21) $$f(p, t) = f_0\Big(p - \int_0^t \bar{b}(\tau)\, d\tau\Big) \exp\left\{ \int_0^t \sum a_{ij} \Big[ b_i\Big(p - \int_\tau^t \bar{b}(s)\, ds\Big)\, \bar{b}_j(\tau) - \bar{b}_i(\tau)\, \bar{b}_j(\tau) \Big]\, d\tau \right\}.$$

Let $g \in C([0, T]; L^1(\Omega))$ be a non-negative function such that $g(p, 0) = f_0(p)$ and $\int g(p, t)\, dp = 1$ for all $t \in [0, T]$. We denote this subset of functions by $K$. It is a closed, convex subset of $C([0, T]; L^1(\Omega))$. Let

$$\bar{b}_g(t) = \int b(p)\, g(p, t)\, dp,$$

and define the map $f = L(g)$ by evaluating (21) with $\bar{b} = \bar{b}_g$. Notice that, due to the assumptions on $g$, $\sup_t |\bar{b}_g(t)| \leq 1$. It follows that

$$\sup_{p, t} f(p, t) \leq e^{CT} \sup_p f_0(p),$$

for some $C > 0$ independent of $g$, and $\int f(p, t)\, dp = 1$. Moreover, the following lemma holds.
Lemma 1.
For any $r > 0$, $f \in L^\infty((0, T); BV(B_r \cap \Omega))$, and there is $C = C(r, T)$, independent of $g$, such that

$$\mathrm{ess\,sup}_t\, \|f(\cdot, t)\|_{TV(B_r \cap \Omega)} \leq C(r, T).$$
Proof.
Recall that $b(p)$ is a function of finite total variation that verifies estimate (19). Differentiating (21) in $p_k$ and using the chain rule, we find that for any ball $B_r$,

(22) $$\int_{B_r} |\partial_{p_k} f|\, dp \leq C(T) \int_\Omega |\partial_{p_k} f_0|\, dp + C(T)\, \sup f_0 \int_0^t \int_{B_r - \int_\tau^t \bar{b}_g(s)\, ds} \sum_i |\partial_{p_k} b_i(p)|\, dp\, d\tau \leq C(T) \int_\Omega |\partial_{p_k} f_0|\, dp + C(r, T)\, \sup f_0 \leq C(r, T),$$

where $|\partial_{p_k} b_i(p)|$ is understood as a Borel measure. $\square$

Using the argument of the last lemma, one easily verifies that $f$ is Lipschitz continuous in time with values in $L^1(\Omega)$:

Lemma 2.
Let $t$ and $t + \delta \in [0, T]$. Then there is $C = C(T)$, independent of $g$, such that

(23) $$\int |f(p, t+\delta) - f(p, t)|\, dp \leq \|f_0\|_{C^1(\Omega)}\, C\, \delta.$$

From the properties of $f = L(g)$ that we have just established, we see that $L$ maps $K$ into itself. In addition, we now show that

Lemma 3. $L[K]$ is pre-compact in $C([0, T]; L^1(\Omega))$.

Proof.
Indeed, since $f$ has bounded total variation in $p$, we know that

$$\sup_{t \in [0, T]} \int |f(p + h, t) - f(p, t)|\, dp \leq C(T)\, |h|.$$

The support of the functions $f(\cdot, t)$, for all different $t$'s and $g$'s, is contained in some fixed ball $B_r$, because $X_t$ is a uniform translation by the continuous vector $\int_0^t \bar{b}_g(\tau)\, d\tau$. By the Kolmogorov-Riesz-Frechet theorem, for every $t \in [0, T]$ the set $\{L(g)\}_{g \in K}$ is pre-compact in $L^1(\Omega)$. Using the Lipschitz continuity in time, this also implies that $\{L(g)\}_{g \in K}$ is pre-compact in $C([0, T]; L^1(\Omega))$. $\square$

Thus, $L$ is a compact mapping from $K$ into itself. By the Schauder fixed point theorem, there is a fixed point $f = L(f)$ in $K \subset C([0, T]; L^1(\Omega))$. Clearly, it verifies all the estimates that we have derived. Moreover, it can be shown that $f$ is a weak solution of the PDE (11).

Uniqueness of solutions follows from a stronger property, a stability estimate. Let $f_1, f_2$ be two solutions of (11)-(13) with initial conditions $f_{1,0}$
and $f_{2,0}$. Such solutions verify formula (21), from which we find that

$$\int |f_1(p, t) - f_2(p, t)|\, dp \leq C(T) \int |f_{1,0}(p) - f_{2,0}(p)|\, dp + C \int_0^t \int |f_1(p, \tau) - f_2(p, \tau)|\, dp\, d\tau.$$

Thus, according to Gronwall's inequality,

$$\int |f_1(p, t) - f_2(p, t)|\, dp \leq C(T) \int |f_{1,0}(p) - f_{2,0}(p)|\, dp. \qquad \square$$

Now we collect information on the support of solutions of (11) that will be used in the proof of theorem 2.
Lemma 4.
Suppose that $\mathrm{supp}\, f_0 \subset \mathrm{Interior}(\Omega)$. Then, for any $t > 0$,

a. $\mathrm{supp}\, f(\cdot, t) \subset \mathrm{Interior}(\Omega)$;

b. for any $p \in \Omega$, $\big| p + \int_0^t \bar{b}(\tau)\, d\tau \big| \geq t/d$;

c. if $\mathrm{supp}\, f_0 \subset B_r(p_0)$, for some $r$ and $p_0 \in \Omega$, then

$$\mathrm{supp}\, f(\cdot, t) \subset B_r\Big( p_0 + \int_0^t \bar{b}(\tau)\, d\tau \Big).$$
Since for any $t$, $\bar{b}(t) \in S^{d-1} \subset \overline{\Omega}$ and $\Omega$ is a cone, it follows that $\int_0^t \bar{b}(\tau)\, d\tau \in \overline{\Omega}$ and, for any $p \in \mathrm{Interior}(\Omega)$, $p + \int_0^t \bar{b}(\tau)\, d\tau \in \mathrm{Interior}(\Omega)$. Moreover, the distance from $p + \int_0^t \bar{b}(\tau)\, d\tau$ to $\partial\Omega$ is no less than the distance from $p$ to $\partial\Omega$. This proves part a. Part b. follows from the fact that for any $t > 0$, $\sum_{i=1}^d \bar{b}_i(t) = 1$, and so there is an index $i$ and a set $\Delta \subset [0, t]$ such that $\bar{b}_i(\tau) \geq 1/d$ for all $\tau \in \Delta$, and $|\Delta| \geq t/d$. Part c. follows immediately from (21). $\square$

2.5. Asymptotic behavior in fictitious play.
Consider the model of statistical learning in a large population described by equation (14). The initial-boundary value problem (12), (13) with arbitrary $f_0 \in C^1_0(\Omega)$ has a global unique solution, as was established in theorem 1. Denote the population mean learning prior by

$$P(t) = \int \frac{p}{\sum_i p_i}\, f(p, t)\, dp,$$

and by $\hat{f}$ the projection of $f(p, t)$ onto the simplex $S^{d-1}$. That is,

$$\hat{f}(\hat{p}, t) = f(p, t), \quad \hat{p} = \frac{p}{\sum_i p_i} \in S^{d-1}.$$
             Hawk   Dove   Retaliator
Hawk          -1      2       -1
Dove           0      1        0.9
Retaliator    -1      1.1      1

TABLE 1. Hawk-Dove-Retaliator game.

The next theorem shows that if the population averages $P(t)$ and $\bar{b}(t)$ converge to certain values, then these values must be the same and equal to a Nash equilibrium of the matrix game, and the learning priors of every agent in the population converge to that Nash equilibrium.

Theorem 2.
Suppose that $\lim_{t \to \infty} P(t) = P_0$ and $\lim_{t \to \infty} \bar{b}(t) = b_0$. Then $b_0 = P_0 \in BR(P_0)$, and $\forall \varepsilon > 0$ there is $T(\varepsilon)$ such that if $t > T(\varepsilon)$, then

(24) $$\mathrm{supp}\, \hat{f}(\cdot, t) \subset B_\varepsilon(P_0) \cap S^{d-1}.$$

Proof.
Consider the function $\hat{f}(\hat{p}, t)$, which is defined for $\hat{p} \in S^{d-1}$. From the definition of $P(t)$ it follows that $P(t)$ belongs to the closed convex hull spanned by $\mathrm{supp}\, \hat{f}(\cdot, t)$. At time $t = 0$ the support of $f_0$ is separated from the origin, and thus, by properties b. and c. of lemma 4 (which applies to solutions of (14) as well), the support of $f(\cdot, t)$ is contained in a ball of fixed radius whose center diverges to infinity. This means that the diameter of the support of the projection $\hat{f}$ decreases to zero. At the same time, since it contains the point $P(t)$, which accumulates at $P_0$, statement (24) follows.

To prove the first statement, notice that for sufficiently small $\varepsilon$ and large $t$, all of the mass of $\hat{f}$ is near $P_0$, so that $\bar{b}(t)$ is a convex combination of values of $BR(p)$ in the polytopes adjacent to the point $P_0$, and so (see (29) from the Appendix) is an element of $BR(P_0)$. On the other hand, $P_0$ must be equal to $b_0$ because of the transport structure of the kinetic equation (14). $\square$
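Theorem 2 is easy to observe numerically. The following is a minimal sketch with illustrative payoffs and random initial data, not tied to any computation in this paper; it assumes the Rock-Paper-Scissors payoffs with basis fitness 1, for which both averages converge.

```python
import numpy as np

# Since the transport velocity in (14) is the same for every p, the support of f
# translates rigidly while its center drifts to infinity, so the normalized
# priors of all groups contract to a single point, a Nash equilibrium.
A = 1.0 + np.array([[ 0.0, -1.0,  1.0],
                    [ 1.0,  0.0, -1.0],
                    [-1.0,  1.0,  0.0]])

def b(p):
    r = A @ (p / p.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

rng = np.random.default_rng(1)
K = 50
p = rng.uniform(0.5, 1.5, size=(K, 3))          # equal-weight point masses of f_0
alpha_p, dt = 1.0, 2e-2

for _ in range(50_000):
    bbar = np.mean([b(q) for q in p], axis=0)   # the mean best response (15)
    p += dt * alpha_p * bbar                    # common transport velocity

phat = p / p.sum(axis=1, keepdims=True)
print("spread of normalized priors:", phat.max(axis=0) - phat.min(axis=0))
print("mean normalized prior:", phat.mean(axis=0))   # near (1/3, 1/3, 1/3)
```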
3. EXAMPLES
3.1. Cultural evolution in the Hawk-Dove-Retaliator game.
Consider the classical Hawk-Dove-Retaliator game of table 1 from evolutionary game theory, see Maynard Smith [15] and Zeeman [18]. The game has two ESS: $0.5H + 0.5D$ and Retaliator. Depending on the initial distribution of frequencies to play hawk, dove, or retaliator, the replicator dynamics will proceed to one of the ESS's, as shown in figure 1. The same strategies are also the asymptotically stable points for the best-response dynamics, which describes statistical learning (fictitious play) in this game. Figure 1 shows the basins of attraction for each of the strategies. Notice that the basin of attraction for strategy $R$ in the replicator equation contains the corresponding region for the best-response dynamics. The mismatch between the two dynamics accounts for different scenarios of cultural learning for different pairs of the learning and reproduction rates $(\alpha_p, \alpha)$.

For a population consisting of three groups, located in the three best-response polygons, figures 2 and 3 show two different scenarios of cultural evolution. The first is reproduction dominated and the other is learning dominated dynamics. Trajectories were obtained by solving (11) numerically; a minimal sketch of this computation is given at the end of this subsection. Notice also that the mean best-response (strategy profile) changes discontinuously when one of the subpopulations crosses the boundaries of the best response polygons.

With a finite number of subpopulations, the model reduces to a system of ODEs. In this particular example the density function is

$$f(p, t) = \sum_{i=1}^3 w_i(t)\, \delta(p - p_i(t)), \qquad \sum_{i=1}^3 w_i(t) = 1,$$

where the functions $p_i$ and $w_i$ are solutions of

$$\partial_t p_i = \alpha_p\, \bar{b}(t), \quad i = 1..3,$$

$$\partial_t w_i = \alpha\, w_i \left( \sum_{kl} a_{kl}\, b_k(p_i(t))\, \bar{b}_l(t) - \sum_{kl} a_{kl}\, \bar{b}_k(t)\, \bar{b}_l(t) \right), \quad i = 1..3,$$

and

$$\bar{b}(t) = w_1(t)\, b(p_1(t)) + w_2(t)\, b(p_2(t)) + w_3(t)\, b(p_3(t)).$$

Notice that all the priors $p_i(t)$ change in the direction of the mean best response $\bar{b}(t)$ (when projected to $S^{d-1}$, this means that $p_i(t)$ moves toward $\bar{b}(t)$), and the weights $w_i$ change according to the performance of the priors $p_i$. Figures 2 and 3 were obtained using initial weights $w_1(0), w_2(0), w_3(0)$ and priors $p_1(0), p_2(0), p_3(0)$ chosen so that the mean best response $\bar{b}(0)$ is located in the basin of attraction of Retaliator according to the replicator equation, and in the basin of attraction of $0.5H + 0.5D$ for the best response dynamics. The pair $(\alpha, \alpha_p)$ is reproduction dominated for the example in figure 2 and learning dominated for the one in figure 3.
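The sketch below integrates this ODE system by forward Euler. The exact initial weights, priors, and rates used for figures 2 and 3 are not reproduced here; the values in the code are illustrative placeholders, with one prior in each of the three decision polygons.

```python
import numpy as np

# Table 1 payoffs; rows and columns are ordered Hawk, Dove, Retaliator.
A = np.array([[-1.0, 2.0, -1.0],
              [ 0.0, 1.0,  0.9],
              [-1.0, 1.1,  1.0]])

def b(p):
    """Single-valued best response to the normalized prior."""
    r = A @ (p / p.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

alpha_p, alpha, dt = 1.0, 5.0, 1e-3       # placeholder rates (reproduction dominated)
w = np.array([0.3, 0.3, 0.4])             # placeholder group weights
p = np.array([[0.80, 0.10, 0.10],         # placeholder priors: the best responses
              [0.10, 0.80, 0.10],         # are D, H, and R, respectively
              [0.05, 0.15, 0.80]])

for _ in range(50_000):
    B = np.array([b(q) for q in p])       # pure strategies of the three groups
    bbar = w @ B                          # mean best response (strategy profile)
    p += dt * alpha_p * bbar              # d p_i / dt = alpha_p * bbar(t)
    w += dt * alpha * w * (B @ A @ bbar - bbar @ A @ bbar)
    w /= w.sum()                          # keep the total mass exactly 1

print("weights:", w)
print("strategy profile:", w @ np.array([b(q) for q in p]))
```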
3.2. Effect of growing population. We consider a situation where the number of interactions per agent is constant and does not change as the population size $N(t)$ increases. That is, the effective learning rate is

$$\alpha_p(t) = \frac{2h}{N(t)\delta} = \frac{N_0}{N(t)} \cdot \frac{2h}{N_0\delta} = \alpha_{p,0}\, \frac{N_0}{N(t)},$$

where $N_0$ is the population size at time $t = 0$.
FIGURE 1. Phase portraits for the best-response (left) and replicator (right) equations for the Hawk-Dove-Retaliator game in table 1. On the left, the three polygonal regions formed by the lines OL, OM and O(0.5H + 0.5D) are the regions where the best response function takes a single value: H, D, or R. The basin of attraction for R is the polygon KRMO (left) and the region above the curve HOD (right). The plots show several trajectories for the best-response and the replicator equations.
FIGURE 2. Cultural evolution: reproduction dominated case. Three subpopulations starting at A, B, and C move toward Retaliator. The trajectory starting at D is the mean best response (strategy profile). Notice that it changes discontinuously when one of the groups crosses the boundaries of the best response polygons.
FIGURE 3. Cultural evolution: learning dominated case. Same initial conditions as in figure 2. All groups converge to 0.5H + 0.5D after the group that started at C moves to the adjacent polygon. The trajectory starting at D is the mean best response (strategy profile). Notice that it changes discontinuously (switches to D′) when the top group moves to the adjacent polygon.

The system of equations is

(25) $$\partial_t f + \alpha_{p,0}\, \frac{N_0}{N(t)}\, \mathrm{div}_p\big(\bar{b}(t) f\big) = \alpha f \left( \sum_{ij} a_{ij}\, b_i(p)\, \bar{b}_j(t) - \sum_{ij} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t) \right),$$

(26) $$\partial_t N(t) = \alpha N \sum_{ij} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t),$$

with $\bar{b}(t)$ given by (15). Consider the Rock-Paper-Scissors game from table 2. We define the fitness levels (numbers of offspring) $a_{ij}$ as a basis fitness of 1 plus the numbers from the table. The best response function $b(p)$ is sketched in figure 4, to which we refer below. Initially, the population is split into three groups. The first is 23/32 of the whole population, and every agent in this group initially has learning prior $p_1(0)$, located in the polygon RNOL, for which the best response is to play "paper". The second group, of proportion 1/4, has prior $p_2(0)$ with best response "scissors", and the third, of proportion 1/32, has prior $p_3(0)$. The mean best response $\bar{b}(0)$ is located inside the region LONP. Suppose that initially there are 10 agents, and fix the values of the parameters $\alpha_{p,0}$ and $\alpha$. The distribution function $f$ has the form

$$f(p, t) = \sum_{i=1}^3 w_i(t)\, \delta(p - p_i(t)), \qquad \sum_{i=1}^3 w_i(t) = 1,$$

and the system (25), (26) reduces to a system of ODEs for $w_i(t)$, $p_i(t)$, $N(t)$, $i = 1..3$. The trajectories of $p_1(t), p_2(t), p_3(t)$ and the mean best response $\bar{b}(t)$, obtained by solving the system of ODEs numerically, are shown in figure 4. In this dynamics, statistical learning pushes $p_1, p_2, p_3$ toward $\bar{b}(t)$; however, the rate of learning decreases exponentially (we show this below), and as a result $p_1(t)$ and $p_2(t)$ asymptotically approach some locations in the same polygons where they started, whereas $p_3(t)$ moves to the decision polygon of $p_2(t)$ and also becomes locked there. Then the population frequency vector $\bar{b}$ converges to "scissors" along the line PS, meaning that the subpopulations that started in LOMP and MSNO out-evolve the first group.

The dynamics here is different from that of the replicator equation, for which $\bar{b}(t)$ oscillates on a closed trajectory passing through the initial point $\bar{b}(0)$. It differs also from the dynamics of the best-response equation, which converges to the equilibrium $(1/3, 1/3, 1/3)$, see Gaunersdorfer and Hofbauer [4].

To see that this scenario takes place, notice that as long as $\bar{b}(t)$ is located on the line PS, $\bar{b}_1(t) = 0$ and

$$\partial_t N = \alpha\, (\bar{b}_2(t) + \bar{b}_3(t))^2\, N = \alpha N.$$

Thus the population grows exponentially, $N(t) = N_0 e^{\alpha t}$. In the state space of priors $\mathbb{R}^3_+$, each group moves to a new position given by the formula

$$p_i(t) = p_i(0) + \alpha_{p,0} \int_0^t e^{-\alpha \tau}\, \bar{b}(\tau)\, d\tau.$$

Note that figure 4 shows the projections of $p_i(t)$ onto $S^2$. Clearly, $p_1(t)$ and $p_2(t)$ move a finite distance away from their initial positions, and the parameters $\alpha, \alpha_{p,0}, N_0$ can be selected (as in this example) in such a way that $p_1(t)$ and $p_2(t)$ remain in the polygons where they started.
Moreover, a small fraction of the population, $w_3$, can be placed initially into the region OMSN, close to the line ON, so that it crosses that line, forcing $\bar{b}(t)$ to move to the line PS.
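A minimal sketch of the reduced system for (25)-(26) follows. The group proportions 23/32, 1/4, 1/32 and the initial population of 10 agents are taken from the text above; the prior locations and the rates $\alpha, \alpha_{p,0}$ are illustrative placeholders, since the exact values are not reproduced here.

```python
import numpy as np

# Basis fitness 1 plus the Rock-Paper-Scissors payoffs of table 2.
A = 1.0 + np.array([[ 0.0, -1.0,  1.0],
                    [ 1.0,  0.0, -1.0],
                    [-1.0,  1.0,  0.0]])

def b(p):
    r = A @ (p / p.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

alpha_p0, alpha, dt = 1.0, 1.0, 1e-3     # placeholder rates
N0 = N = 10.0                            # 10 agents initially
w = np.array([23/32, 1/4, 1/32])         # group proportions from the text
p = np.array([[0.6, 0.2, 0.2],           # placeholder prior; best response "paper"
              [0.2, 0.6, 0.2],           # placeholder prior; best response "scissors"
              [0.2, 0.3, 0.5]])          # placeholder prior for the small third group

for _ in range(100_000):
    B = np.array([b(q) for q in p])
    bbar = w @ B                                  # population strategy profile
    p += dt * alpha_p0 * (N0 / N) * bbar          # learning slows as N grows
    w += dt * alpha * w * (B @ A @ bbar - bbar @ A @ bbar)
    w /= w.sum()
    N += dt * alpha * N * (bbar @ A @ bbar)       # equation (26)

print("N =", N)
print("strategy profile:", w @ np.array([b(q) for q in p]))
```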
4. APPENDIX
4.1. Best response function $BR(p)$. Let $S^{d-1}$ be the $(d-1)$-dimensional simplex $\{p \in \mathbb{R}^d_+ : \sum_i p_i = 1\}$. Let $A = \{a_{ij}\}$ represent the payoff matrix in a symmetric game.

            Rock   Paper   Scissors
Rock          0     -1        1
Paper         1      0       -1
Scissors     -1      1        0
TABLE 2. Rock-Paper-Scissors game.
FIGURE 4. Cultural evolution with constant interaction frequency. The plot shows three polygonal regions where the best response function takes a single value: R, P, or S. There are three subpopulations, located at the points A, B, and C, respectively. The subpopulation starting at C moves to the adjacent polygon and stays there for all subsequent times. The subpopulations that started at A and B do not leave their polygons. The discontinuous trajectory starting at D is the mean best-response (strategy profile). Asymptotically it moves to S, meaning that the subpopulations contained in the polygon LONP out-evolve the population from the adjacent polygon LRMO.

We will assume that there are no two indices $i \neq j$ such that

(27) $$\sum_k a_{ik}\, p_k = \sum_k a_{jk}\, p_k, \quad \forall p \in S^{d-1}.$$
Denote by $r_i(p) = \sum_k a_{ik}\, p_k$ the payoff to strategy $i$ played against the mixed strategy $p$, and define the set

$$I(p) = \Big\{ i \in 1..d \; : \; r_i(p) = \max_{i'} r_{i'}(p) \Big\}.$$

Denote the coordinate vectors $e_i = (0, \dots, 1, \dots, 0)$, with 1 in the $i$-th position, and define the multi-valued function

(28) $$BR(p) = \big\{ \text{convex hull of all } e_i \text{ such that } i \in I(p) \big\}.$$

Under hypothesis (27), $S^{d-1}$ is a union of a finite number of polytopes such that $BR(p)$ is single-valued in the interior of each polytope $P_k$, and at any point $p_0$ on the boundary of $P_k$ the best response $BR(p_0)$ contains the value $BR(p)$ from the interior of $P_k$:

$$BR(p) \in BR(p_0), \quad \forall p \in \mathrm{Interior}(P_k), \; p_0 \in \partial P_k.$$

This condition can be re-phrased in an equivalent way, as a continuity condition: for any $p_0 \in S^{d-1}$ there is $\varepsilon_0 > 0$ such that for any $\varepsilon < \varepsilon_0$ and any point $p \in B_\varepsilon(p_0) \cap S^{d-1}$,

(29) $$BR(p) \subseteq BR(p_0).$$

Finally, we select a single-valued representative $b(p)$ from the values of $BR(p)$. If $p \in \mathbb{R}^d_+$, then $b(p)$ is one of the values of $BR(p/\sum_i p_i)$. The selection can be, for example, the barycenter of the set of values of $BR(p)$, which corresponds to the situation when agents choose one strategy at random (from a uniform distribution).
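A short sketch of the barycentric selection, with a small numerical tolerance standing in for exact ties:

```python
import numpy as np

def b_barycenter(p, A, tol=1e-9):
    """Barycenter selection from BR(p/sum(p)): the uniform average
    of the optimal pure strategies e_i, i in I(p)."""
    p = np.asarray(p, dtype=float)
    r = A @ (p / p.sum())                    # r_i(p) = sum_k a_ik p_k
    I = np.flatnonzero(r >= r.max() - tol)   # the index set I(p)
    e = np.zeros(len(p))
    e[I] = 1.0 / len(I)                      # barycenter of {e_i : i in I(p)}
    return e

# Table 2 payoffs; at the barycenter of the simplex all strategies tie.
A_rps = np.array([[ 0.0, -1.0,  1.0],
                  [ 1.0,  0.0, -1.0],
                  [-1.0,  1.0,  0.0]])
print(b_barycenter([1.0, 1.0, 1.0], A_rps))  # -> [1/3, 1/3, 1/3]
```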
REFERENCES

[1] T. Borgers and R. Sarin. Learning through reinforcement and replicator dynamics. Journal of Economic Theory, 77(1):1–14, 1997.
[2] H. Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, 1st edition, 2011.
[3] D. Fudenberg and D. Levine. The Theory of Learning in Games. MIT Press, Cambridge, MA, 1998.
[4] A. Gaunersdorfer and J. Hofbauer. Fictitious play, Shapley polygons and the replicator equation. Games and Economic Behavior, 11:279–303, 1995.
[5] I. Gilboa and A. Matsui. Social stability and equilibrium. Econometrica, 59(3):859–867, 1991.
[6] J. Hofbauer. From Nash and Brown to Maynard Smith: equilibria, dynamics and ESS. Selection, 1:81–88, 2000.
[7] J. Hofbauer, P. Schuster, and K. Sigmund. A note on evolutionary stable strategies and game dynamics. Journal of Theoretical Biology, 81:609–612, 1979.
[8] J. Hofbauer and K. Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, New York, 1998.
[9] W. Hoppitt and K. Laland. Social Learning: An Introduction to Mechanisms, Methods and Models. Princeton University Press, 2013.
[10] Ch. Krishnendu, D. Zufferey, and M. Nowak. Evolutionary game dynamics in populations with different learners. Journal of Theoretical Biology, 301:161–173, 2012.
[11] M. Perepelitsa. Adaptive learning in large populations. Journal of Mathematical Biology, 79:2237–2253, 2019.
[12] P. Richerson and R. Boyd. Not by Genes Alone: How Culture Transformed Human Evolution. University of Chicago Press, 2005.
[13] J. Robinson. An iterative method for solving a game. Annals of Mathematics, 54:296–301, 1951.
[14] L. S. Shapley. Some topics in two-person games. In M. Drescher, L. S. Shapley, and A. W. Tucker, editors, Advances in Game Theory, chapter 1. Princeton University Press, 1964.
[15] J. Maynard Smith. Evolution and the Theory of Games. Cambridge University Press, New York, 1982.
[16] J. Maynard Smith and G. R. Price. The logic of animal conflict. Nature, 246:15–18, 1973.
[17] P. D. Taylor and L. B. Jonker. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40:145–156, 1978.
[18] E. C. Zeeman. Dynamics of the evolution of animal conflicts. Journal of Theoretical Biology, 89:249–270, 1981.
mperepel@central.uh.edu, Department of Mathematics, University of Houston, 4800 C