A model of cultural evolution in the context of strategic conflict
MISHA PEREPELITSA
ABSTRACT. We consider a model of cultural evolution for strategy selection in a population of individuals who interact in a game-theoretic framework. The evolution combines individual learning of the environment (population strategy profile), reproduction proportional to the success of the acquired knowledge, and social transmission of the knowledge to the next generation. A mean-field type equation is derived that describes the dynamics of the distribution of cultural traits in terms of the rate of learning, the reproduction rate, and the population size. We establish global well-posedness of the initial-boundary value problem for this equation and give several examples that illustrate the process of cultural evolution for some classical games.
1. INTRODUCTION
Evolutionary game theory, pioneered by Maynard Smith and Price [16], is a powerful tool that explains the dominance of some behavioral traits as being uninvadable by other traits in the competition for Darwinian fitness points, when fitness is frequency dependent. A deterministic dynamic process that selects stable behavioral traits can be described by the replicator equation, see Taylor and Jonker [17], Hofbauer et al. [7], Zeeman [18]. The replicator equation also governs the dynamics of reinforcement learning in repeated play of a game, see Borgers and Sarin [1], Fudenberg and Levine [3], Krishnendu et al. [10], Perepelitsa [11].

Learning in games is an integral part of game theory that goes back to the works of Robinson [13] and Shapley [14]. One of its mainstays is fictitious play, or statistical learning. Learning by fictitious play in large populations can be described by an ODE, called the best-response equation, see Gilboa and Matsui [5], Gaunersdorfer and Hofbauer [4], Hofbauer [6] and Hofbauer and Sigmund [8]. The best-response equation describes changes in the mean statistical prior about the opponent's actions, and its stationary points are Nash equilibria.

In this paper we consider an evolutionary process that combines the concept of "the survival of the fittest" from biological evolution with individual
learning through fictitious play, where the state of learning is socially transmitted to the next generation of players. Examples of this type of process are furnished by the theory of cultural evolution.

Consider a cartoonish scenario of cultural evolution. Let's say there is an island populated by pedantic statisticians capable of asexual reproduction. The statisticians wander aimlessly around the island, meeting each other occasionally for a round of a game (a symmetric game with a finite set of strategies). Each carries a ledger where he/she carefully marks how many times the opponent played a particular strategy (opponents are indistinguishable). To select a strategy, each of them uses the "sacred book of rules" (best response in a game) that prescribes what to do given the current count from his/her ledger. The book has the dual purpose of settling the outcome of each play of the game (payoffs), and the players collect a certain amount of fitness points from each play. From time to time the statisticians reproduce at a rate proportional to their accumulated fitness. When that happens, they solemnly pass an exact copy of their ledgers to each of the offspring, who carry on with them in the same manner.

We say that this model is a form of cultural evolution because it is characterized by social transmission of traits (inheritance of knowledge) and individual learning as adaptation mechanisms, see Hoppitt and Laland [9], Richerson and Boyd [12]. Note that, in this case, social transmission and learning change the population strategy profile (the environment), which determines, in its turn, the degree of success of a cultural trait, rendering the problem nonlinear.

The main parameters of the problem are the rate of learning, the rate of reproduction, and the population size. Additional information might be needed to completely specify the problem. For example, if the island is not big, we may assume that the frequency with which inhabitants meet and play the game increases with the population size. Another scenario is an infinite island which allows inhabitants to spread, no matter how many of them there are, so that the interaction frequency is constant.

The goal of the paper is to develop a mathematical model that takes as input the initial distribution of cultural traits in a population and the above-mentioned rates of learning and reproduction, and outputs the distribution of cultural traits at any moment t in the future. As we will see from the examples of section 3, it is essential for an accurate description of the dynamics that the model specifies the whole distribution of traits and not just some statistical averages, such as the mean and the variance. The model is derived as a mean-field approximation of the distribution density of a Markov process describing the interaction of agents. The equation is of kinetic type with nonlinear kinetic velocities. Due to the discontinuities of the best-response function, solutions of this equation are intrinsically weak. Our main result, stated in section 2, establishes global well-posedness of the initial-boundary value problem for this equation.

In section 3 we discuss two examples that illustrate the dynamics of this model of cultural evolution. In the first, we consider the Hawk-Dove-Retaliator game with two evolutionary stable strategies: $0.5H + 0.5D$ and $R$. These two strategies are the only asymptotically stable points for the replicator and best-response equations.
The phase portraits, however, are different, with the basin of attraction of $0.5H + 0.5D$ for the replicator equation being strictly included in its basin of attraction for the best-response dynamics. As a result, there are initial conditions for the distribution of cultural traits which evolve to Retaliator when the rate of learning is low, but proceed to $0.5H + 0.5D$ when the learning rate is increased. It also means that if the biological evolution proceeds to the mix of Hawk and Dove, it cannot be diverted to anything else by learning. Another interesting property of this process is a sharp change in the environment (population strategy profile) when a subpopulation continuously transitions from one decision polygon to another.

In the second example, the Rock-Paper-Scissors game, we show how an exponentially growing heterogeneous population can lock the cultural evolution in a suboptimal pure strategy, in contrast to the dynamics of both the replicator and best-response equations.

In general, determining the asymptotic state of this type of evolution for an arbitrary game is problematic, due to the complicated dynamics and the absence of entropy functionals. It can be done in some cases, at least partially, as in the model with zero reproduction rate. For that model we show that if two statistical averages, the mean prior and the mean best-response, converge to some values (not necessarily the same), then the prior of every agent in the population converges to a Nash equilibrium of the game.

2. MODEL
We consider a series of plays of a symmetric 2-player game between randomly selected agents in a large population. There are $d$ strategies available to the agents, and the payoffs are given by a matrix $A = \{a_{ij}\}_{i,j=1}^d$, which we assume to have non-negative entries. The game defines a multi-valued best response function $BR(p) : S^{d-1} \to \mathcal{P}(S^{d-1})$, where $S^{d-1}$ is the $(d-1)$-dimensional simplex, and a single-valued selection $b(p) \in BR(p/\sum_i p_i)$ defined for $p \in \mathbb{R}^d_+$. We refer the reader to Appendix section 4.1 for details. We start with the case when there is no reproduction and the population stays at the same level $N$.

We will record the changes of the states of agents that occur at discrete epochs, labeled by $t$. The state of agent $i$ at epoch $t$ is a $(d+1)$-dimensional vector $X_i^t = (P_i^t, S_i^t)$, where $P_i^t = (p_{i,1}^t, \dots, p_{i,d}^t)$ is a vector of learning priors (unscaled) and $S_i^t$ is an averaged, accumulated fitness.

An interaction is a round of the game between two random agents, say $i$ and $j$, who play according to their priors $P_i^t$ and $P_j^t$, that is, using their best response strategies. Based on that, they earn fitness points and update the learning priors. To describe the update rule we will use the following parameters: $h$ – the characteristic learning increment, $\mu h$ – the characteristic fitness increment, and $\delta$ – the time increment. Thus, we assume that the learning and fitness increments are of the same order, but not necessarily equal. The rule takes the form

$$P_i^{t+\delta} = P_i^t + h\, b(P_j^t), \qquad P_j^{t+\delta} = P_j^t + h\, b(P_i^t),$$
$$S_i^{t+\delta} = (1-\mu h)\, S_i^t + \mu h\, a(b(P_i^t), b(P_j^t)), \qquad S_j^{t+\delta} = (1-\mu h)\, S_j^t + \mu h\, a(b(P_j^t), b(P_i^t)),$$

where

$$a(b(P_i^t), b(P_j^t)) = \sum_{k,l} a_{kl}\, b_k(P_i^t)\, b_l(P_j^t)$$

is the fitness earned by agent $i$. In these formulas the fitness is averaged over the history of payoffs, so that it cannot grow without bound. One can think of $\mu$ as a recency parameter: large values of $\mu$ put more weight on more recent payoffs.

Our goal here is to derive an approximate equation for $f(p, s, t)$ – the density of the distribution of agents over the space of learning priors and fitness $(p, s) \in \mathbb{R}^d_+ \times \mathbb{R}_+$. In the following derivation we use the convention that $x_i = (p_i, s_i)$, $\bar{x} = (x_1, \dots, x_N) \in (\mathbb{R}^d_+ \times \mathbb{R}_+)^N$, where $\bar{x}$ parametrizes the state of the whole population.

Let $w(\bar{x}, t)$ be the density of the distribution of priors and fitness for the whole population. This function implicitly depends on parameters such as $h$, $\delta$, and $\mu$, but we suppress them from the notation for convenience of presentation. The update rule can be expressed as a moment relation with a test function $\varphi$:

$$\int \varphi(\bar{x})\, w(\bar{x}, t+\delta)\, d\bar{x} = \sum_{i \neq j} (N(N-1))^{-1} \int \varphi(\bar{x})\Big|_{x_i = \hat{x}_i,\, x_j = \hat{x}_j}\, w(\bar{x}, t)\, d\bar{x},$$

where $\hat{x}_i = \big(p_i + h\, b(p_j),\; (1-\mu h)\, s_i + \mu h\, a(b(p_i), b(p_j))\big)$, and symmetrically for $\hat{x}_j$. The last equation can be written as

(1) $$\int \varphi(\bar{x})\,[w(\bar{x}, t+\delta) - w(\bar{x}, t)]\, d\bar{x} = \sum_{i \neq j} (N(N-1))^{-1} \int \Big[\varphi(\bar{x})\Big|_{x_i = \hat{x}_i,\, x_j = \hat{x}_j} - \varphi(\bar{x})\Big]\, w(\bar{x}, t)\, d\bar{x}.$$

The function $f(x, t)$, where $x = (p, s)$, is related to the multi-agent distribution $w(\bar{x}, t)$ through the rule

$$f(x, t) = \sum_k N^{-1} \int w(\bar{x})\Big|_{x_k = x}\, d\bar{x}_k, \quad x \in \mathbb{R}^d_+ \times \mathbb{R}_+,$$

where $\bar{x}_k$ is the $(d+1)(N-1)$-dimensional vector of all the coordinates $x_1, \dots, x_N$ excluding $x_k$.
This is the one-particle distribution function. In the formulas to follow we also need the two-particle distribution function

$$g(x, y, t) = \sum_{i \neq j} (N(N-1))^{-1} \int w(\bar{x})\Big|_{x_i = x,\, x_j = y}\, d\bar{x}_{ij},$$

where $\bar{x}_{ij}$ is the $(d+1)(N-2)$-dimensional vector of all the coordinates excluding $x_i$ and $x_j$. The function $g$ is symmetric in $(x, y)$ and is related to $f$ by the formulas

$$f(x, t) = \int g(x, y, t)\, dy = \int g(y, x, t)\, dy.$$

The moments of the functions $f$ and $g$ are computed from the moments of $w$:

$$\int \psi(x)\, f(x, t)\, dx = \sum_k N^{-1} \int \psi(x_k)\, w(\bar{x})\, d\bar{x},$$

and

$$\int \omega(x, y)\, g(x, y, t)\, dx\, dy = \sum_{i \neq j} (N(N-1))^{-1} \int \omega(x_i, x_j)\, w(\bar{x})\, d\bar{x}.$$
Now we use (1) to obtain an integral equation for the change of the function $f$. For that, select $\varphi(\bar{x}) = \psi(x_k)$, sum over $k$ and take the average. We get

(2) $$\int \psi(x)[f(x, t+\delta) - f(x, t)]\, dx = N^{-1} \sum_k \sum_{i \neq j} (N(N-1))^{-1} \int \Big[\psi(x_k)\Big|_{x_i = \hat{x}_i,\, x_j = \hat{x}_j} - \psi(x_k)\Big]\, w(\bar{x}, t)\, d\bar{x}$$
$$= N^{-1} \sum_{i \neq j} (N(N-1))^{-1} \left( \int [\psi(\hat{x}_i) - \psi(x_i)]\, w(\bar{x}, t)\, d\bar{x} + \int [\psi(\hat{x}_j) - \psi(x_j)]\, w(\bar{x}, t)\, d\bar{x} \right)$$
$$= \frac{2}{N} \iint [\psi(\hat{x}) - \psi(x)]\, g(x, y, t)\, dx\, dy,$$

where $x = (p, s)$, $y = (p', s')$ and

$$\hat{x} = \big(p + h\, b(p'),\; (1-\mu h)\, s + \mu h\, a(b(p), b(p'))\big).$$

To proceed, we make the assumption of statistical independence of the states of two randomly selected agents:

$$g(x, y, t) = f(x, t)\, f(y, t).$$

The plausibility of this condition is partially justified if the population is large, so that the same agents are rarely matched together, and the information about an interaction is not shared with other agents. Then, expanding $\psi(\hat{x})$ in a Taylor series and integrating by parts, we obtain

(3) $$\int \psi(x)[f(x, t+\delta) - f(x, t)]\, dx = -\frac{2}{N} \iint \psi(p, s)\, \mathrm{div}_{p,s}\Big( \big(h\, b(p'),\; \mu h\,(a(b(p), b(p')) - s)\big)\, f(x, t) \Big)\, f(y, t)\, dy\, dx + O(h^2)$$
$$= -\frac{2}{N} \int \psi(p, s)\, \mathrm{div}_{p,s}\Big( \big(h\, \bar{b}(t),\; \mu h\,(\bar{a}(b(p), t) - s)\big)\, f(x, t) \Big)\, dx + O(h^2),$$

with the mean best response

(4) $$\bar{b}(t) = \iint b(p)\, f(p, s, t)\, dp\, ds,$$

and the mean fitness for using strategy $b(p)$:

(5) $$\bar{a}(b(p), t) = \iint a(b(p), b(p'))\, f(p', s', t)\, dp'\, ds' = \sum_{i,j} a_{ij}\, b_i(p)\, \bar{b}_j(t).$$
Dividing the equation by $\delta$ and ignoring the higher order terms, we arrive at the Fokker-Planck equation for the density $f(p, s, t)$:

(6) $$\partial_t f + \frac{2h}{N\delta}\, \mathrm{div}_p\big(\bar{b}(t) f\big) + \frac{2\mu h}{N\delta}\, \partial_s\big((\bar{a}(b(p), t) - s) f\big) = 0.$$

In passing from a discrete to a continuous time model we assume that $\delta$, $h$ are small and $N$ is large, so that the ratios

(7) $$\alpha_p = \frac{2h}{N\delta}, \qquad \alpha_s = \frac{2\mu h}{N\delta}$$

are of finite order. Note that $2(N\delta)^{-1}$ can be interpreted as the number of interactions per agent, per unit of time. We assume that this number is large and inversely proportional to the characteristic learning and fitness increment $h$.

Now we extend the model to variable populations, by allowing agents to reproduce at a rate proportional to their level of fitness. At this point we proceed heuristically, leaving out the details of the derivation. With reproduction, the Fokker-Planck equation must be appended by a source term proportional to $(s - \bar{r}(t))\, f(p, s, t)$ on the right-hand side of (6), where $\bar{r}(t)$ is the mean population fitness

(8) $$\bar{r}(t) = \sum_{i,j} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t).$$

The mean population size $N = N(t)$ is determined from the equation

(9) $$\frac{1}{N}\frac{dN}{dt} = \alpha \iint s\, f(p, s, t)\, dp\, ds,$$

where $\alpha$ is the reproduction rate. Moreover, the rates $\alpha_p, \alpha_s$ are now variable and depend on $N = N(t)$. The final model reads:

(10) $$\partial_t f + \alpha_p\, \mathrm{div}_p\big(\bar{b}(t) f\big) + \alpha_s\, \partial_s\big((\bar{a}(b(p), t) - s) f\big) = \alpha\,(s - \bar{r}(t))\, f,$$

with $\alpha_p$, $\alpha_s$, $\bar{b}(t)$, $\bar{a}(b(p), t)$ and $\bar{r}(t)$ given by (7), (4), (5), and (8), respectively. Note that equations (9) and (10) are coupled through formulas (7).

2.1. Singular limit of the recency parameter $\mu$. In the reproduction scenario described by (10), children acquire not only the knowledge $p$ of their parents but also their averaged, accumulated fitness $s$. Hypothetically, this might be a valid assumption in some situations; however, it seems more relevant to consider the case where it is only the knowledge $p$ that eventually determines the fitness of the offspring. This can easily be achieved in the framework of the model (7)-(10) by taking the limit $\mu \to \infty$ ($\alpha_s \to \infty$), which overweights the stimulus obtained from recent encounters. For the derivation of the new model we proceed informally. Dividing equation (10) by $\alpha_s$ and passing to the limit, we get

$$\partial_s\big((\bar{a}(b(p), t) - s) f\big) = 0.$$

Since $f$ is non-negative, this equation can hold only if, for all $p \in \mathbb{R}^d_+$ and $t > 0$, $f$ is a delta-function in $s$ concentrated at the value $\bar{a}(b(p), t)$:

$$f(p, s, t) = f(p, t)\, \delta(s - \bar{a}(b(p), t)).$$

That is, the fitness equals the expected payoff for an agent using strategy $b(p)$ against the population strategy profile $\bar{b}(t)$:

$$s = \bar{a}(b(p), t) = \sum_{ij} a_{ij}\, b_i(p)\, \bar{b}_j(t).$$

Now the dimension of the problem can be reduced, as we can integrate (10) in $s$ and find an equation for the moment $\int_{-\infty}^{\infty} f(p, s, t)\, ds$, which, with a slight abuse of notation, we still call $f(p, t)$. The equation reads:

(11) $$\partial_t f + \alpha_p\, \mathrm{div}_p\big(\bar{b}(t) f\big) = \alpha\big(\bar{a}(b(p), t) - \bar{r}(t)\big) f = \alpha \left( \sum_{ij} a_{ij}\, b_i(p)\, \bar{b}_j(t) - \sum_{ij} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t) \right) f.$$

This is the equation of our main interest, for which we will establish global well-posedness. Before we switch to the mathematical analysis, we mention a special case with zero reproduction, $\alpha = 0$. To complete the mathematical setup for equations (11) and (14), it remains to add the initial condition for the population size, $N(0) = N_0$, the initial condition for the density,

(12) $$f(p, 0) = f_0(p), \quad p \in \mathbb{R}^d_+,$$
and the boundary condition (zero influx of probability):

(13) $$f(p, t) = 0, \quad p \in \partial\mathbb{R}^d_+, \quad t \geq 0.$$

Note that the velocity vector $\bar{b}(t)$ is always directed into $\mathbb{R}^d_+$, so the problem is not over-determined.
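For the reader who wants to experiment, the following is a minimal sketch (in Python) of the agent-based Markov process that the kinetic model approximates. The payoff matrix, population size, and increments below are illustrative assumptions, not values used elsewhere in the paper; for large $N$ and small $h$, a histogram of the pairs $(P_i, S_i)$ approximates a solution of (6).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative non-negative 2-strategy payoff matrix (a shifted Hawk-Dove type game).
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])
d, N = 2, 500
h, mu = 0.01, 1.0                # learning increment h and recency parameter mu

def b(P):
    """Single-valued best response to the normalized prior P/sum(P);
    ties are resolved by taking the first maximizer."""
    r = A @ (P / P.sum())        # r_i = sum_k a_ik p_k
    e = np.zeros(d)
    e[np.argmax(r)] = 1.0
    return e

P = rng.uniform(0.5, 1.5, size=(N, d))   # unscaled learning priors
S = np.zeros(N)                          # averaged accumulated fitness

for epoch in range(200_000):
    i, j = rng.choice(N, size=2, replace=False)
    bi, bj = b(P[i]), b(P[j])
    # the update rule of section 2: record the opponent's play,
    # and average the payoff with recency weight mu*h
    P[i] = P[i] + h * bj
    P[j] = P[j] + h * bi
    S[i] = (1 - mu * h) * S[i] + mu * h * (bi @ A @ bj)
    S[j] = (1 - mu * h) * S[j] + mu * h * (bj @ A @ bi)

bbar = np.mean([b(q) for q in P], axis=0)
print("mean best response:", bbar, "mean fitness:", S.mean())
```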
2.2. Fictitious play in large populations. With zero reproduction, $\alpha = 0$, the model becomes particularly simple:

(14) $$\partial_t f + \alpha_p\, \mathrm{div}_p\big(\bar{b}(t) f\big) = 0,$$

with the mean best response

(15) $$\bar{b}(t) = \int b(p)\, f(p, t)\, dp.$$
Using equation (14) we can compute the equation for the mean empirical frequencies vector $P(t) = \int (p/\sum_j p_j)\, f(p, t)\, dp$:

(16) $$\frac{dP_i}{dt} = \alpha_p \int \frac{1}{\sum_j p_j} \left( \bar{b}_i(t) - \frac{p_i}{\sum_j p_j} \right) f(p, t)\, dp, \quad i = 1..d,$$

since $\sum_j \bar{b}_j(t) = 1$. If one postulates that all agents have the same, or approximately the same, priors,

(17) $$p(t) = (P_1(t), \dots, P_d(t)),$$

then the above equation reduces to a variant of the best-response dynamics equation for the normalized prior $\hat{P} = P/\sum_j P_j$:

(18) $$\frac{d\hat{P}_i}{dt} = \frac{\alpha_p}{\sum_j P_j(t)} \big( \bar{b}_i(\hat{P}) - \hat{P}_i \big), \quad i = 1..d.$$

Notice the positive factor on the right-hand side of the equation: for a learning process in which the priors become large, the learning rate slows down.
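Equation (18) is easy to integrate numerically by tracking the unscaled prior $Q(t)$, for which $dQ/dt = \alpha_p\, b(Q)$; the normalized prior then obeys (18). A minimal sketch, assuming purely for illustration the Rock-Paper-Scissors payoffs of section 3 with basis fitness 1, for which fictitious play converges to the mixed equilibrium $(1/3, 1/3, 1/3)$:

```python
import numpy as np

# Rock-Paper-Scissors with basis fitness 1 (see table 2); an illustrative choice.
A = 1.0 + np.array([[ 0.0, -1.0,  1.0],
                    [ 1.0,  0.0, -1.0],
                    [-1.0,  1.0,  0.0]])

def b(Q):
    """Pure best response to the normalized prior (first maximizer on ties)."""
    r = A @ (Q / Q.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

alpha_p, dt = 1.0, 1e-2
Q = np.array([3.0, 2.0, 1.0])      # unscaled prior; normalized form is Q/Q.sum()
for _ in range(500_000):
    Q += dt * alpha_p * b(Q)       # dQ/dt = alpha_p * b(Q), equivalent to (18)

print(Q / Q.sum())                 # tends to the Nash equilibrium (1/3, 1/3, 1/3)
```

Note how the effective step of the normalized prior at time $t$ is of order $\alpha_p\, dt/\sum_j Q_j(t)$: the learning slows down exactly as predicted by the factor in (18).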
2.3. Relation to the replicator equation.
With zero learning rate, $\alpha_p = 0$, equation (11) describes replicator dynamics for the density $f(p, t)$. Indeed, in this case each agent uses a fixed strategy $b(p)$, so that the population is split into at most $d$ groups, each using a particular strategy, and each reproducing at a rate proportional to the averaged fitness obtained from interacting with the whole population. Formally, one obtains the system of replicator equations by integrating (11) over the sets $\{p : b(p) = e_k\}$, $k = 1..d$, as sketched below.
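A minimal numerical sketch of this reduction: for a population consisting of $d$ point masses, the group weights $w_k(t) = \int_{\{b(p)=e_k\}} f(p,t)\, dp$ satisfy the replicator system $\dot{w}_k = \alpha\, w_k\big((Aw)_k - w \cdot Aw\big)$ with $\bar{b} = w$. The payoff matrix below is an illustrative non-negative Hawk-Dove type matrix, not one used elsewhere in the paper.

```python
import numpy as np

# Non-negative Hawk-Dove type payoffs; the mixed equilibrium is w = (1/2, 1/2).
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])

alpha, dt = 1.0, 1e-3
w = np.array([0.9, 0.1])                 # group weights, summing to 1
for _ in range(30_000):
    fitness = A @ w                      # payoff of each pure strategy against profile w
    w += dt * alpha * w * (fitness - w @ fitness)

print(w)                                 # approaches (0.5, 0.5)
```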
2.4. Existence of weak solutions. In this section we establish our main result, theorem 1. Let $\Omega = \mathbb{R}^d_+$, and let $C^1_0(\Omega)$ be the space of continuously differentiable functions with compact support in $\Omega$. We adopt the standard notation for the $L^p(\Omega)$ spaces and for the space of functions of locally bounded variation $BV_{loc}(\Omega)$. The latter consists of all measurable and locally integrable functions $f$ such that for any ball $B_r$,

$$\|f\|_{TV(B_r \cap \Omega)} = \sup \left\{ \int_{B_r \cap \Omega} f\, \mathrm{div}\, \psi\, dp \; : \; \psi \in C^1_0(B_r \cap \Omega; \mathbb{R}^d),\; \sup_p |\psi| \leq 1 \right\} < +\infty.$$

For such functions the distributional derivative $\partial_{p_i} f$, $i = 1..d$, is a signed Radon measure. The information on these spaces and the results from functional analysis that we use below can be found, for example, in the book by Brezis [2].

Theorem 1.
Let $f_0 \in C^1_0(\Omega)$ be a non-negative function with unit mass. There is a unique weak solution $f$ of (11), (12), (13) such that

$$f \in C([0, T]; L^1(\Omega)) \cap L^\infty([0, T]; BV(B_r \cap \Omega)), \quad \forall r, T > 0.$$

For any $t > 0$, $f(p, t) \geq 0$ a.e. in $\Omega$ and $\int f(p, t)\, dp = 1$.

Proof.
From the definition of the function $b(p)$ and the properties of $BR(p)$ it follows that for any ball $B_r$, $b(p)$ has finite total variation on $B_r \cap \Omega$, and there is $C = C(r)$, not depending on the center of the ball, such that

(19) $$\|b\|_{TV(B_r \cap \Omega)} \leq C.$$

Equation (11) can be written in non-conservative form as

(20) $$\partial_t f + \bar{b}(t) \cdot \nabla_p f = \Big( \sum a_{ij}\, (b_i(p)\, \bar{b}_j(t) - \bar{b}_i(t)\, \bar{b}_j(t)) \Big) f,$$

where for simplicity we set $\alpha = \alpha_p = 1$. Given a continuous function $\bar{b}(t)$, we solve this equation by the method of characteristics. For the mapping $X_t : \mathbb{R}^d \to \mathbb{R}^d$, defined as

$$X_t(p) = p + \int_0^t \bar{b}(\tau)\, d\tau,$$

$f$ is expressed through the formula

$$f(X_t(p), t) = f_0(p) \exp\left\{ \int_0^t \sum a_{ij}\, [b_i(X_\tau(p))\, \bar{b}_j(\tau) - \bar{b}_i(\tau)\, \bar{b}_j(\tau)]\, d\tau \right\},$$

or as

(21) $$f(p, t) = f_0\Big(p - \int_0^t \bar{b}(\tau)\, d\tau\Big) \exp\left\{ \int_0^t \sum a_{ij} \Big[ b_i\Big(p - \int_\tau^t \bar{b}(s)\, ds\Big)\, \bar{b}_j(\tau) - \bar{b}_i(\tau)\, \bar{b}_j(\tau) \Big]\, d\tau \right\}.$$

Let $g \in C([0, T]; L^1(\Omega))$ be a non-negative function such that $g(p, 0) = f_0(p)$ and $\int g(p, t)\, dp = 1$ for all $t \in [0, T]$. We denote this subset of functions by $K$. It is a closed, convex subset of $C([0, T]; L^1(\Omega))$. Let

$$\bar{b}_g(t) = \int b(p)\, g(p, t)\, dp,$$

and define the map $f = L(g)$ by evaluating (21) with $\bar{b} = \bar{b}_g$. Notice that, due to the assumptions on $g$, $\sup_t |\bar{b}_g(t)| \leq 1$. It follows that

$$\sup_{p, t} f(p, t) \leq e^{CT} \sup_p f_0(p),$$

for some $C > 0$ independent of $g$, and $\int f(p, t)\, dp = 1$. Moreover, the following lemma holds.
Lemma 1.
For any $r > 0$, $f \in L^\infty((0, T); BV(B_r \cap \Omega))$, and there is $C = C(r, T)$, independent of $g$, such that

$$\mathrm{ess\,sup}_t\, \|f(\cdot, t)\|_{TV(B_r \cap \Omega)} \leq C(r, T).$$
Proof.
Recall that $b(p)$ is a function of finite total variation that verifies estimate (19). Differentiating (21) in $p_k$ and using the chain rule, we find that for any ball $B_r$,

(22) $$\int_{B_r} |\partial_{p_k} f|\, dp \leq C(T) \int_\Omega |\partial_{p_k} f_0|\, dp + C(T)\, \sup f_0 \int_0^t \int_{B_r - \int_\tau^t \bar{b}_g(s)\, ds} \sum_i |\partial_{p_k} b_i(p)|\, dp\, d\tau \leq C(T) \int_\Omega |\partial_{p_k} f_0|\, dp + C(r, T)\, \sup f_0 \leq C(r, T),$$

where $|\partial_{p_k} b_i(p)|$ is understood as a Borel measure. $\square$

Using the argument of the last lemma, one easily verifies that $f$ is Lipschitz continuous in time with values in $L^1(\Omega)$:

Lemma 2.
Let $t$ and $t + \delta \in [0, T]$. Then there is $C = C(T)$, independent of $g$, such that

(23) $$\int |f(p, t+\delta) - f(p, t)|\, dp \leq \|f_0\|_{C^1(\Omega)}\, C\, \delta.$$

From the properties of $f = L(g)$ that we have just established, we see that $L$ maps $K$ into itself. In addition, we now show that

Lemma 3. $L[K]$ is pre-compact in $C([0, T]; L^1(\Omega))$.

Proof.
Indeed, since $f$ has bounded total variation in $p$, we know that

$$\sup_{t \in [0, T]} \int |f(p + h, t) - f(p, t)|\, dp \leq C(T)\, |h|.$$

The support of the functions $f(\cdot, t)$, for all different $t$'s and $g$'s, is contained in some fixed ball $B_r$, because $X_t$ is a uniform translation by the continuous vector $\int_0^t \bar{b}_g(\tau)\, d\tau$. By the Kolmogorov-Riesz-Frechet theorem, for every $t \in [0, T]$ the set $\{L(g)\}_{g \in K}$ is pre-compact in $L^1(\Omega)$. Using the Lipschitz continuity in time, this also implies that $\{L(g)\}_{g \in K}$ is pre-compact in $C([0, T]; L^1(\Omega))$. $\square$

Thus, $L$ is a compact mapping from $K$ into itself. By the Schauder fixed point theorem, there is a fixed point $f = L(f)$ in $K \subset C([0, T]; L^1(\Omega))$. Clearly, it verifies all the estimates that we have derived. Moreover, it can be shown that $f$ is a weak solution of the PDE (11).

Uniqueness of solutions follows from a stronger property, a stability estimate. Let $f_1, f_2$ be two solutions of (11)-(13) with initial conditions $f_{1,0}$
and $f_{2,0}$. Such solutions verify formula (21), from which we find that

$$\int |f_1(p, t) - f_2(p, t)|\, dp \leq C(T) \int |f_{1,0}(p) - f_{2,0}(p)|\, dp + C \int_0^t \int |f_1(p, \tau) - f_2(p, \tau)|\, dp\, d\tau.$$

Thus, according to Gronwall's inequality,

$$\int |f_1(p, t) - f_2(p, t)|\, dp \leq C(T) \int |f_{1,0}(p) - f_{2,0}(p)|\, dp. \qquad \square$$

Now we collect information on the support of solutions of (11) that will be used in the proof of theorem 2.
Lemma 4.
Suppose that $\mathrm{supp}\, f_0 \subset \mathrm{Interior}(\Omega)$. Then, for any $t > 0$,

a. $\mathrm{supp}\, f(\cdot, t) \subset \mathrm{Interior}(\Omega)$;

b. for any $p \in \Omega$, $\big| p + \int_0^t \bar{b}(\tau)\, d\tau \big| \geq t/d$;

c. if $\mathrm{supp}\, f_0 \subset B_r(p_0)$, for some $r$ and $p_0 \in \Omega$, then

$$\mathrm{supp}\, f(\cdot, t) \subset B_r\Big( p_0 + \int_0^t \bar{b}(\tau)\, d\tau \Big).$$
Since for any $t$, $\bar{b}(t) \in S^{d-1} \subset \overline{\Omega}$ and $\Omega$ is a cone, it follows that $\int_0^t \bar{b}(\tau)\, d\tau \in \overline{\Omega}$ and, for any $p \in \mathrm{Interior}(\Omega)$, $p + \int_0^t \bar{b}(\tau)\, d\tau \in \mathrm{Interior}(\Omega)$. Moreover, the distance from $p + \int_0^t \bar{b}(\tau)\, d\tau$ to $\partial\Omega$ is no less than the distance from $p$ to $\partial\Omega$. This proves part a. Part b. follows from the fact that for any $t > 0$, $\sum_{i=1}^d \bar{b}_i(t) = 1$, and so there is an index $i$ and a set $\Delta \subset [0, t]$ such that $\bar{b}_i(\tau) \geq 1/d$ for all $\tau \in \Delta$, and $|\Delta| \geq t/d$. Part c. follows immediately from (21). $\square$

2.5. Asymptotic behavior in fictitious play.
Consider the model of statistical learning in a large population described by equation (14). The initial-boundary value problem (12), (13) with arbitrary $f_0 \in C^1_0(\Omega)$ has a global unique solution, as was established in theorem 1. Denote the population mean learning prior by

$$P(t) = \int \frac{p}{\sum_i p_i}\, f(p, t)\, dp,$$

and by $\hat{f}$ the projection of $f(p, t)$ onto the simplex $S^{d-1}$. That is,

$$\hat{f}(\hat{p}, t) = f(p, t), \quad \hat{p} = \frac{p}{\sum_i p_i} \in S^{d-1}.$$
             Hawk   Dove   Retaliator
Hawk          -1      2       -1
Dove           0      1        0.9
Retaliator    -1      1.1      1

TABLE 1. Hawk-Dove-Retaliator game.

The next theorem shows that if the population averages $P(t)$ and $\bar{b}(t)$ converge to certain values, then these values must be the same and equal to a Nash equilibrium of the matrix game, and the learning priors of every agent in the population converge to that Nash equilibrium.

Theorem 2.
Suppose that $\lim_{t \to \infty} P(t) = P_0$ and $\lim_{t \to \infty} \bar{b}(t) = b_0$. Then $b_0 = P_0 \in BR(P_0)$, and $\forall \varepsilon > 0$ there is $T(\varepsilon)$ such that if $t > T(\varepsilon)$, then

(24) $$\mathrm{supp}\, \hat{f}(\cdot, t) \subset B_\varepsilon(P_0) \cap S^{d-1}.$$

Proof.
Consider the function $\hat{f}(\hat{p}, t)$, which is defined for $\hat{p} \in S^{d-1}$. From the definition of $P(t)$ it follows that $P(t)$ belongs to the closed convex hull spanned by $\mathrm{supp}\, \hat{f}(\cdot, t)$. At time $t = 0$ the support of $f_0$ is separated from the origin, and thus, by properties b. and c. of lemma 4 (which applies to solutions of (14) as well), the support of $f(\cdot, t)$ is contained in a ball of fixed radius whose center diverges to infinity. This means that the diameter of the support of the projection $\hat{f}$ decreases to zero. At the same time, since it contains the point $P(t)$, which accumulates at $P_0$, statement (24) follows.

To prove the first statement, notice that for sufficiently small $\varepsilon$ and large $t$, all of the mass of $\hat{f}$ is near $P_0$, so that $\bar{b}(t)$ is a convex combination of values of $BR(p)$ in the polytopes adjacent to the point $P_0$, and so (see (29) from the Appendix) is an element of $BR(P_0)$. On the other hand, $P_0$ must be equal to $b_0$ because of the transport structure of the kinetic equation (14). $\square$
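Theorem 2 is easy to observe numerically. The following is a minimal sketch with illustrative payoffs and random initial data, not tied to any computation in this paper; it assumes the Rock-Paper-Scissors payoffs with basis fitness 1, for which both averages converge.

```python
import numpy as np

# Since the transport velocity in (14) is the same for every p, the support of f
# translates rigidly while its center drifts to infinity, so the normalized
# priors of all groups contract to a single point, a Nash equilibrium.
A = 1.0 + np.array([[ 0.0, -1.0,  1.0],
                    [ 1.0,  0.0, -1.0],
                    [-1.0,  1.0,  0.0]])

def b(p):
    r = A @ (p / p.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

rng = np.random.default_rng(1)
K = 50
p = rng.uniform(0.5, 1.5, size=(K, 3))          # equal-weight point masses of f_0
alpha_p, dt = 1.0, 2e-2

for _ in range(50_000):
    bbar = np.mean([b(q) for q in p], axis=0)   # the mean best response (15)
    p += dt * alpha_p * bbar                    # common transport velocity

phat = p / p.sum(axis=1, keepdims=True)
print("spread of normalized priors:", phat.max(axis=0) - phat.min(axis=0))
print("mean normalized prior:", phat.mean(axis=0))   # near (1/3, 1/3, 1/3)
```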
3. EXAMPLES
3.1. Cultural evolution in the Hawk-Dove-Retaliator game.
Consider the classical Hawk-Dove-Retaliator game of table 1 from evolutionary game theory, see Maynard Smith [15] and Zeeman [18]. The game has two ESS: $0.5H + 0.5D$ and Retaliator. Depending on the initial distribution of frequencies to play hawk, dove, or retaliator, the replicator dynamics will proceed to one of the ESS's, as shown in figure 1. The same strategies are also the asymptotically stable points for the best-response dynamics, which describes statistical learning (fictitious play) in this game. Figure 1 shows the basins of attraction for each of the strategies. Notice that the basin of attraction for strategy $R$ in the replicator equation contains the corresponding region for the best-response dynamics. The mismatch between the two dynamics accounts for different scenarios of cultural learning for different pairs of the learning and reproduction rates $(\alpha_p, \alpha)$.

For a population consisting of three groups, located in the three best-response polygons, figures 2 and 3 show two different scenarios of cultural evolution. The first is reproduction dominated and the other is learning dominated dynamics. Trajectories were obtained by solving (11) numerically; a minimal sketch of this computation is given at the end of this subsection. Notice also that the mean best-response (strategy profile) changes discontinuously when one of the subpopulations crosses the boundaries of the best response polygons.

With a finite number of subpopulations, the model reduces to a system of ODEs. In this particular example the density function is

$$f(p, t) = \sum_{i=1}^3 w_i(t)\, \delta(p - p_i(t)), \qquad \sum_{i=1}^3 w_i(t) = 1,$$

where the functions $p_i$ and $w_i$ are solutions of

$$\partial_t p_i = \alpha_p\, \bar{b}(t), \quad i = 1..3,$$

$$\partial_t w_i = \alpha\, w_i \left( \sum_{kl} a_{kl}\, b_k(p_i(t))\, \bar{b}_l(t) - \sum_{kl} a_{kl}\, \bar{b}_k(t)\, \bar{b}_l(t) \right), \quad i = 1..3,$$

and

$$\bar{b}(t) = w_1(t)\, b(p_1(t)) + w_2(t)\, b(p_2(t)) + w_3(t)\, b(p_3(t)).$$

Notice that all the priors $p_i(t)$ change in the direction of the mean best response $\bar{b}(t)$ (when projected to $S^{d-1}$, this means that $p_i(t)$ moves toward $\bar{b}(t)$), and the weights $w_i$ change according to the performance of the priors $p_i$. Figures 2 and 3 were obtained using initial weights $w_1(0), w_2(0), w_3(0)$ and priors $p_1(0), p_2(0), p_3(0)$ chosen so that the mean best response $\bar{b}(0)$ is located in the basin of attraction of Retaliator according to the replicator equation, and in the basin of attraction of $0.5H + 0.5D$ for the best response dynamics. The pair $(\alpha, \alpha_p)$ is reproduction dominated for the example in figure 2 and learning dominated for the one in figure 3.
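The sketch below integrates this ODE system by forward Euler. The exact initial weights, priors, and rates used for figures 2 and 3 are not reproduced here; the values in the code are illustrative placeholders, with one prior in each of the three decision polygons.

```python
import numpy as np

# Table 1 payoffs; rows and columns are ordered Hawk, Dove, Retaliator.
A = np.array([[-1.0, 2.0, -1.0],
              [ 0.0, 1.0,  0.9],
              [-1.0, 1.1,  1.0]])

def b(p):
    """Single-valued best response to the normalized prior."""
    r = A @ (p / p.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

alpha_p, alpha, dt = 1.0, 5.0, 1e-3       # placeholder rates (reproduction dominated)
w = np.array([0.3, 0.3, 0.4])             # placeholder group weights
p = np.array([[0.80, 0.10, 0.10],         # placeholder priors: the best responses
              [0.10, 0.80, 0.10],         # are D, H, and R, respectively
              [0.05, 0.15, 0.80]])

for _ in range(50_000):
    B = np.array([b(q) for q in p])       # pure strategies of the three groups
    bbar = w @ B                          # mean best response (strategy profile)
    p += dt * alpha_p * bbar              # d p_i / dt = alpha_p * bbar(t)
    w += dt * alpha * w * (B @ A @ bbar - bbar @ A @ bbar)
    w /= w.sum()                          # keep the total mass exactly 1

print("weights:", w)
print("strategy profile:", w @ np.array([b(q) for q in p]))
```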
3.2. Effect of growing population. We consider a situation where the number of interactions per agent is constant and does not change as the population size $N(t)$ increases. That is, the effective learning rate is

$$\alpha_p(t) = \frac{2h}{N(t)\delta} = \frac{N_0}{N(t)} \cdot \frac{2h}{N_0\delta} = \alpha_{p,0}\, \frac{N_0}{N(t)},$$

where $N_0$ is the population size at time $t = 0$.
FIGURE 1. Phase portraits for the best-response (left) and replicator (right) equations for the Hawk-Dove-Retaliator game in table 1. On the left, the three polygonal regions formed by the lines OL, OM and O(0.5H + 0.5D) are the regions where the best response function takes a single value: H, D, or R. The basin of attraction for R is the polygon KRMO (left) and the region above the curve HOD (right). The plots show several trajectories for the best-response and the replicator equations.
FIGURE 2. Cultural evolution: reproduction dominated case. Three subpopulations starting at A, B, and C move toward Retaliator. The trajectory starting at D is the mean best response (strategy profile). Notice that it changes discontinuously when one of the groups crosses the boundaries of the best response polygons.
FIGURE 3. Cultural evolution: learning dominated case. Same initial conditions as in figure 2. All groups converge to 0.5H + 0.5D after the group that started at C moves to the adjacent polygon. The trajectory starting at D is the mean best response (strategy profile). Notice that it changes discontinuously (switches to D′) when the top group moves to the adjacent polygon.

The system of equations is

(25) $$\partial_t f + \alpha_{p,0}\, \frac{N_0}{N(t)}\, \mathrm{div}_p\big(\bar{b}(t) f\big) = \alpha f \left( \sum_{ij} a_{ij}\, b_i(p)\, \bar{b}_j(t) - \sum_{ij} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t) \right),$$

(26) $$\partial_t N(t) = \alpha N \sum_{ij} a_{ij}\, \bar{b}_i(t)\, \bar{b}_j(t),$$

with $\bar{b}(t)$ given by (15). Consider the Rock-Paper-Scissors game from table 2. We define the fitness levels (numbers of offspring) $a_{ij}$ as a basis fitness of 1 plus the numbers from the table. The best response function $b(p)$ is sketched in figure 4, to which we refer below. Initially, the population is split into three groups. The first is 23/32 of the whole population, and every agent in this group initially has learning prior $p_1(0)$, located in the polygon RNOL, for which the best response is to play "paper". The second group, of proportion 1/4, has prior $p_2(0)$ with best response "scissors", and the third, of proportion 1/32, has prior $p_3(0)$. The mean best response $\bar{b}(0)$ is located inside the region LONP. Suppose that initially there are 10 agents, and fix the values of the parameters $\alpha_{p,0}$ and $\alpha$. The distribution function $f$ has the form

$$f(p, t) = \sum_{i=1}^3 w_i(t)\, \delta(p - p_i(t)), \qquad \sum_{i=1}^3 w_i(t) = 1,$$

and the system (25), (26) reduces to a system of ODEs for $w_i(t)$, $p_i(t)$, $N(t)$, $i = 1..3$. The trajectories of $p_1(t), p_2(t), p_3(t)$ and the mean best response $\bar{b}(t)$, obtained by solving the system of ODEs numerically, are shown in figure 4. In this dynamics, statistical learning pushes $p_1, p_2, p_3$ toward $\bar{b}(t)$; however, the rate of learning decreases exponentially (we show this below), and as a result $p_1(t)$ and $p_2(t)$ asymptotically approach some locations in the same polygons where they started, whereas $p_3(t)$ moves to the decision polygon of $p_2(t)$ and also becomes locked there. Then the population frequency vector $\bar{b}$ converges to "scissors" along the line PS, meaning that the subpopulations that started in LOMP and MSNO out-evolve the first group.

The dynamics here is different from that of the replicator equation, for which $\bar{b}(t)$ oscillates on a closed trajectory passing through the initial point $\bar{b}(0)$. It differs also from the dynamics of the best-response equation, which converges to the equilibrium $(1/3, 1/3, 1/3)$, see Gaunersdorfer and Hofbauer [4].

To see that this scenario takes place, notice that as long as $\bar{b}(t)$ is located on the line PS, $\bar{b}_1(t) = 0$ and

$$\partial_t N = \alpha\, (\bar{b}_2(t) + \bar{b}_3(t))^2\, N = \alpha N.$$

Thus the population grows exponentially, $N(t) = N_0 e^{\alpha t}$. In the state space of priors $\mathbb{R}^3_+$, each group moves to a new position given by the formula

$$p_i(t) = p_i(0) + \alpha_{p,0} \int_0^t e^{-\alpha \tau}\, \bar{b}(\tau)\, d\tau.$$

Note that figure 4 shows the projections of $p_i(t)$ onto $S^2$. Clearly, $p_1(t)$ and $p_2(t)$ move a finite distance away from their initial positions, and the parameters $\alpha, \alpha_{p,0}, N_0$ can be selected (as in this example) in such a way that $p_1(t)$ and $p_2(t)$ remain in the polygons where they started.
Moreover, a small fraction of the population, $w_3$, can be placed initially into the region OMSN, close to the line ON, so that it crosses that line, forcing $\bar{b}(t)$ to move to the line PS.
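A minimal sketch of the reduced system for (25)-(26) follows. The group proportions 23/32, 1/4, 1/32 and the initial population of 10 agents are taken from the text above; the prior locations and the rates $\alpha, \alpha_{p,0}$ are illustrative placeholders, since the exact values are not reproduced here.

```python
import numpy as np

# Basis fitness 1 plus the Rock-Paper-Scissors payoffs of table 2.
A = 1.0 + np.array([[ 0.0, -1.0,  1.0],
                    [ 1.0,  0.0, -1.0],
                    [-1.0,  1.0,  0.0]])

def b(p):
    r = A @ (p / p.sum())
    e = np.zeros(3)
    e[np.argmax(r)] = 1.0
    return e

alpha_p0, alpha, dt = 1.0, 1.0, 1e-3     # placeholder rates
N0 = N = 10.0                            # 10 agents initially
w = np.array([23/32, 1/4, 1/32])         # group proportions from the text
p = np.array([[0.6, 0.2, 0.2],           # placeholder prior; best response "paper"
              [0.2, 0.6, 0.2],           # placeholder prior; best response "scissors"
              [0.2, 0.3, 0.5]])          # placeholder prior for the small third group

for _ in range(100_000):
    B = np.array([b(q) for q in p])
    bbar = w @ B                                  # population strategy profile
    p += dt * alpha_p0 * (N0 / N) * bbar          # learning slows as N grows
    w += dt * alpha * w * (B @ A @ bbar - bbar @ A @ bbar)
    w /= w.sum()
    N += dt * alpha * N * (bbar @ A @ bbar)       # equation (26)

print("N =", N)
print("strategy profile:", w @ np.array([b(q) for q in p]))
```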
4. APPENDIX
4.1. Best response function $BR(p)$. Let $S^{d-1}$ be the $(d-1)$-dimensional simplex $\{p \in \mathbb{R}^d_+ : \sum_i p_i = 1\}$. Let $A = \{a_{ij}\}$ represent the payoff matrix in a symmetric game.

            Rock   Paper   Scissors
Rock          0     -1        1
Paper         1      0       -1
Scissors     -1      1        0
TABLE 2. Rock-Paper-Scissors game.
FIGURE 4. Cultural evolution with constant interaction frequency. The plot shows three polygonal regions where the best response function takes a single value: R, P, or S. There are three subpopulations, located at the points A, B, and C, respectively. The subpopulation starting at C moves to the adjacent polygon and stays there for all subsequent times. The subpopulations that started at A and B do not leave their polygons. The discontinuous trajectory starting at D is the mean best-response (strategy profile). Asymptotically it moves to S, meaning that the subpopulations contained in the polygon LONP out-evolve the population from the adjacent polygon LRMO.

We will assume that there are no two indices $i \neq j$ such that

(27) $$\sum_k a_{ik}\, p_k = \sum_k a_{jk}\, p_k, \quad \forall p \in S^{d-1}.$$
Denote by $r_i(p) = \sum_k a_{ik}\, p_k$ the payoff to strategy $i$ played against the mixed strategy $p$, and define the set

$$I(p) = \Big\{ i \in 1..d \; : \; r_i(p) = \max_{i'} r_{i'}(p) \Big\}.$$

Denote the coordinate vectors $e_i = (0, \dots, 1, \dots, 0)$, with 1 in the $i$-th position, and define the multi-valued function

(28) $$BR(p) = \big\{ \text{convex hull of all } e_i \text{ such that } i \in I(p) \big\}.$$

Under hypothesis (27), $S^{d-1}$ is a union of a finite number of polytopes such that $BR(p)$ is single-valued in the interior of each polytope $P_k$, and at any point $p_0$ on the boundary of $P_k$ the best response $BR(p_0)$ contains the value $BR(p)$ from the interior of $P_k$:

$$BR(p) \in BR(p_0), \quad \forall p \in \mathrm{Interior}(P_k), \; p_0 \in \partial P_k.$$

This condition can be re-phrased in an equivalent way, as a continuity condition: for any $p_0 \in S^{d-1}$ there is $\varepsilon_0 > 0$ such that for any $\varepsilon < \varepsilon_0$ and any point $p \in B_\varepsilon(p_0) \cap S^{d-1}$,

(29) $$BR(p) \subseteq BR(p_0).$$

Finally, we select a single-valued representative $b(p)$ from the values of $BR(p)$. If $p \in \mathbb{R}^d_+$, then $b(p)$ is one of the values of $BR(p/\sum_i p_i)$. The selection can be, for example, the barycenter of the set of values of $BR(p)$, which corresponds to the situation when agents choose one strategy at random (from a uniform distribution).
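A short sketch of the barycentric selection, with a small numerical tolerance standing in for exact ties:

```python
import numpy as np

def b_barycenter(p, A, tol=1e-9):
    """Barycenter selection from BR(p/sum(p)): the uniform average
    of the optimal pure strategies e_i, i in I(p)."""
    p = np.asarray(p, dtype=float)
    r = A @ (p / p.sum())                    # r_i(p) = sum_k a_ik p_k
    I = np.flatnonzero(r >= r.max() - tol)   # the index set I(p)
    e = np.zeros(len(p))
    e[I] = 1.0 / len(I)                      # barycenter of {e_i : i in I(p)}
    return e

# Table 2 payoffs; at the barycenter of the simplex all strategies tie.
A_rps = np.array([[ 0.0, -1.0,  1.0],
                  [ 1.0,  0.0, -1.0],
                  [-1.0,  1.0,  0.0]])
print(b_barycenter([1.0, 1.0, 1.0], A_rps))  # -> [1/3, 1/3, 1/3]
```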
REFERENCES

[1] T. Borgers and R. Sarin. Learning through reinforcement and replicator dynamics. Journal of Economic Theory, 77(1):1–14, 1997.
[2] H. Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, 1st edition, 2011.
[3] D. Fudenberg and D. Levine. The Theory of Learning in Games. MIT Press, Cambridge, MA, 1998.
[4] A. Gaunersdorfer and J. Hofbauer. Fictitious play, Shapley polygons and the replicator equation. Games and Economic Behavior, 11:279–303, 1995.
[5] I. Gilboa and A. Matsui. Social stability and equilibrium. Econometrica, 59(3):859–867, 1991.
[6] J. Hofbauer. From Nash and Brown to Maynard Smith: equilibria, dynamics and ESS. Selection, 1:81–88, 2000.
[7] J. Hofbauer, P. Schuster, and K. Sigmund. A note on evolutionary stable strategies and game dynamics. Journal of Theoretical Biology, 81:609–612, 1979.
[8] J. Hofbauer and K. Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, New York, 1998.
[9] W. Hoppitt and K. Laland. Social Learning: An Introduction to Mechanisms, Methods and Models. Princeton University Press, 2013.
[10] Ch. Krishnendu, D. Zufferey, and M. Nowak. Evolutionary game dynamics in populations with different learners. Journal of Theoretical Biology, 301:161–173, 2012.
[11] M. Perepelitsa. Adaptive learning in large populations. Journal of Mathematical Biology, 79:2237–2253, 2019.
[12] P. Richerson and R. Boyd. Not by Genes Alone: How Culture Transformed Human Evolution. University of Chicago Press, 2005.
[13] J. Robinson. An iterative method for solving a game. Annals of Mathematics, 54:296–301, 1951.
[14] L. S. Shapley. Some topics in two-person games. In M. Drescher, L. S. Shapley, and A. W. Tucker, editors, Advances in Game Theory, chapter 1. Princeton University Press, 1964.
[15] J. Maynard Smith. Evolution and the Theory of Games. Cambridge University Press, New York, 1982.
[16] J. Maynard Smith and G. R. Price. The logic of animal conflict. Nature, 246:15–18, 1973.
[17] P. D. Taylor and L. B. Jonker. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40:145–156, 1978.
[18] E. C. Zeeman. Dynamics of the evolution of animal conflicts. Journal of Theoretical Biology, 89:249–270, 1981.
mperepel@central.uh.edu, Department of Mathematics, University of Houston, 4800 C