Statistical Physics of the Spatial Prisoner's Dilemma with Memory-Aware Agents
Marco Alberto Javarone
Department of Mathematics and Computer Science, University of Cagliari, Cagliari (Italy)
DUMAS - Department of Humanities and Social Sciences, University of Sassari, Sassari (Italy)
Abstract.
We introduce an analytical model to study the evolution towards equilibrium in spatial games with ‘memory-aware’ agents, i.e., agents that accumulate their payoff over time. In particular, we focus on the spatial Prisoner’s Dilemma, as it constitutes an emblematic example of a game whose Nash equilibrium is defection. Previous investigations showed that, under suitable conditions, the evolutionary Prisoner’s Dilemma can reach an equilibrium of cooperation. Notably, it seems that mechanisms like motion may lead a population to become cooperative. In the proposed model, we map agents to particles of a gas so that, on varying the system temperature, they randomly move. In doing so, we are able to identify a relation between the temperature and the final equilibrium of the population, explaining how it is possible to break the classical Nash equilibrium in the spatial Prisoner’s Dilemma when considering agents able to increase their payoff over time. Moreover, we introduce a formalism to study order-disorder phase transitions in these dynamics. As a result, we highlight that the proposed model makes it possible to explain analytically how a population, whose interactions are based on the Prisoner’s Dilemma, can reach an equilibrium far from the expected one, also opening the way to defining a direct link between evolutionary game theory and statistical physics.
Evolutionary games [1,2,3] represent an attempt to study the evolution of populations [4,5,6] within the framework of game theory [7]. Notably, these games make it possible to analyze simplified scenarios in different domains, ranging from socio-economic dynamics to biological systems [8,9,1,10,11,12,13,14,15,16,17]. In general, evolutionary games consider a population of agents whose interactions are based on games like the Prisoner’s Dilemma (hereinafter PD) or the Hawk-Dove game [4], where there are two possible strategies: cooperation and defection. As in classical game theory, the concept of equilibrium is a core aspect [18]. Therefore, we aim to evaluate whether a population reaches an equilibrium equal to, or different from, the expected one, i.e., the Nash equilibrium of the considered game. At each interaction, agents gain a payoff according to the adopted strategy and to a payoff matrix. The payoff represents a form of reward in the considered domain (e.g., money in an economic system or food in an ecosystem). Remarkably, as agents are allowed to change their strategy over time, we can map them to spins with states $\sigma = \pm 1$ […] analytical formulation. Section 3 shows analytical results. Finally, Section 4 concludes the paper.
In the proposed model, we are interested in studying the spatial Prisoner’s Dilemma by an analytical approach. Let us start by introducing the general form of a payoff matrix

$$
\begin{array}{c|cc}
  & C & D \\ \hline
C & R & S \\
D & T & P
\end{array}
\qquad (1)
$$

where the set of strategies is $\Sigma = \{C, D\}$: $C$ stands for ‘Cooperator’ and $D$ for ‘Defector’. In matrix 1, $R$ is the gain obtained by two interacting cooperators; $T$ represents the Temptation, i.e., the payoff that an agent gains if it defects while its opponent cooperates; $S$ is the Sucker’s payoff, i.e., the gain achieved by a cooperator while the opponent defects; finally, $P$ is the payoff of two interacting defectors. In the case of the PD, the elements of matrix 1 are: $R = 1$, $-1 \le S \le 0$, $1 \le T \le$ […], and $P = 0$. As stated before, during the evolution of the system agents can change their strategy from $C$ to $D$, and vice versa, following an updating rule, for instance the one named ‘imitation of the best’ (see [19,4]), where agents imitate the strategy of their richest neighbor.

Now, we consider a mixed population of $N$ agents with, at the beginning, an equal density of cooperators and defectors. Under the hypothesis that all agents interact together, at each time step the payoffs gained by cooperators and defectors are computed as follows

$$
\begin{cases}
\pi_c = (\rho_c \cdot N - 1) + (\rho_d \cdot N)\, S \\
\pi_d = (\rho_c \cdot N)\, T
\end{cases}
\qquad (2)
$$

with $\rho_c + \rho_d = 1$, $\rho_c$ the density of cooperators and $\rho_d$ the density of defectors. We recall that defection is the dominant strategy in the PD and, even if we set $S = 0$ and $T = 1$, it corresponds to the final equilibrium because $\pi_d$ is always greater than $\pi_c$. At this point, it is important to highlight that previous investigations [19,20,21] have been performed with ‘memoryless’ agents (i.e., agents that do not accumulate the payoff over time) whose interactions were defined only with their neighbors, and focusing only on one agent (and on its neighbors) at a time. These conditions are fundamental. For instance, if at each time step we randomly select one agent interacting only with its neighbors, there is a nonzero probability of consecutively selecting several nearby cooperators; in this case, very rich cooperators may emerge and then prevail over defectors, even without introducing mechanisms like motion. It is also worth observing that, as $P = 0$, a homogeneous population of defectors does not increase its overall payoff. Instead, according to matrix 1, a cooperative population continuously increases its payoff over time.

Now, we consider a population divided into two groups by a wall: a group $G^a$ composed of cooperators, and a mixed group $G^b$, i.e., composed of cooperators and defectors in equal amount. Agents interact only with members of the same group, so the group $G^a$ never changes and, in addition, it strongly increases its payoff over time. The opposite occurs in the group $G^b$, as it converges to an ordered phase of defection, limiting its final payoff. Remarkably, in this scenario, we can introduce a strategy to modify the equilibria of the two groups. In particular, we can change both the equilibrium of $G^b$ to cooperation and that of $G^a$ to defection. In the first case, we have to wait a while before moving one or a few cooperators to $G^b$: in the meantime defectors increase their payoff, but during the revision phase they change strategy to cooperation, as the newcomers are richer than them. In the second case, if after a few time steps we move a small group of defectors from $G^b$ to $G^a$, the latter converges to a final defection phase. These preliminary and theoretical observations reveal an important property of the ‘memory-aware’ PD: considering the two different groups, cooperators may succeed when they act after a long time and individually. Instead, defectors may succeed by acting fast and in a group. Notably, rich cooperators have to move individually, since otherwise too many rich cooperators risk increasing the payoff of defectors too much, so that the latter would not change strategy. The opposite holds for defectors that, acting in a group, may strongly reduce the payoff of a community of cooperators (for $S < 0$).
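As a quick numerical check of the well-mixed payoffs in Eq. 2, the following minimal Python sketch evaluates $\pi_c$ and $\pi_d$ for a given cooperator density. The function name `well_mixed_payoffs` and the chosen parameter values are illustrative assumptions, not part of the original model description; $R = 1$ and $P = 0$ are taken from the payoff matrix.

```python
def well_mixed_payoffs(rho_c, n_agents, sucker, temptation):
    """Per-step payoffs of one cooperator and one defector (Eq. 2).

    Assumes R = 1 and P = 0, so a cooperator earns 1 from every other
    cooperator and `sucker` from every defector, while a defector earns
    `temptation` from every cooperator and nothing from other defectors.
    """
    n_coop = rho_c * n_agents
    n_def = (1.0 - rho_c) * n_agents
    pi_c = (n_coop - 1.0) + n_def * sucker
    pi_d = n_coop * temptation
    return pi_c, pi_d


# Hypothetical example: the mildest dilemma mentioned in the text (S = 0, T = 1).
pi_c, pi_d = well_mixed_payoffs(rho_c=0.5, n_agents=100, sucker=0.0, temptation=1.0)
print(pi_c, pi_d)
```

Even in this mildest case the defector payoff exceeds the cooperator payoff by one unit, which is why the well-mixed, memoryless population is driven towards defection.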
We hypothesize that the spatial PD, with moving agents, can be successfully studied within the framework of kinetic theory [30]. Therefore, in the proposed model, we map agents to particles of a gas. In doing so, the average speed of particles is computed as $\langle v \rangle = \sqrt{\frac{T_s k_b}{m_p}}$, with $T_s$ the system temperature, $k_b$ the Boltzmann constant, and $m_p$ the particle mass. Particles are divided into two groups by a permeable wall, which can be crossed by particles but prevents interactions among particles belonging to different groups. Now, it is worth emphasizing that we can provide a dual description of our system: one in the ‘physical’ domain of particles, the other in the ‘information’ domain of agents. Notably, to analyze the system in the ‘information’ domain we will introduce, as discussed above, the mapping of agents to a spin system (see [33]). Summarizing, we map agents to gas particles in order to represent their ‘physical’ property of motion, and we map agents to spins to represent their ‘information’ property (i.e., their strategy). Remarkably, these two mappings can be viewed as two different layers for studying how the agent population evolves over time. Although the physical property (i.e., the motion) affects the agent strategy (i.e., its spin), the equilibrium can be reached in both layers/domains independently. This last observation is important since we are interested in evaluating only the final equilibrium reached in the ‘information’ domain.
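The dependence of the particle speed on the temperature can be sketched in a few lines of Python. This is only an illustration of the square-root scaling quoted above: the function name `mean_speed` and the default values $k_b = 1$ and $m_p = 1$ are placeholders chosen here, not the values used for the results.

```python
import math


def mean_speed(temperature, k_b=1.0, mass=1.0):
    """Average particle speed, <v> = sqrt(T_s * k_b / m_p).

    k_b and mass default to 1.0 purely for illustration (assumed values).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return math.sqrt(temperature * k_b / mass)


# Heating the system speeds up the particles; T_s = 0 is the motionless case.
for t_s in (0.0, 1.0, 4.0, 9.0):
    print(t_s, mean_speed(t_s))
```

Quadrupling the temperature only doubles the speed, so large changes in $T_s$ are needed to shorten travel times substantially.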
Then, as stated before, agents interact only with those belonging to the same group, so the evolution of the mixed group $G^b$ can be described by the following equations

$$
\begin{cases}
\frac{d\rho_c^b(t)}{dt} = p_c^b(t) \cdot \rho_c^b(t) \cdot \rho_d^b(t) - p_d^b(t) \cdot \rho_d^b(t) \cdot \rho_c^b(t) \\
\frac{d\rho_d^b(t)}{dt} = p_d^b(t) \cdot \rho_d^b(t) \cdot \rho_c^b(t) - p_c^b(t) \cdot \rho_c^b(t) \cdot \rho_d^b(t) \\
\rho_c^b(t) + \rho_d^b(t) = 1
\end{cases}
\qquad (3)
$$

with $p_c^b(t)$ the probability that cooperators prevail over defectors (at time $t$), and $p_d^b(t)$ the probability that defectors prevail over cooperators (at time $t$). These probabilities are computed according to the payoffs obtained, at each time step, by cooperators and defectors

$$
\begin{cases}
p_c^b(t) = \frac{\pi_c^b(t)}{\pi_c^b(t) + \pi_d^b(t)} \\
p_d^b(t) = 1 - p_c^b(t)
\end{cases}
\qquad (4)
$$

The system 3 can be analytically solved provided that, at each time step, the values of $p_c^b(t)$ and $p_d^b(t)$ are updated. So, the density of cooperators reads

$$
\rho_c^b(t) = \frac{\rho_c^b(0)}{\rho_c^b(0) - \left[ (\rho_c^b(0) - 1) \cdot e^{\frac{\tau t}{N^b}} \right]}
\qquad (5)
$$

with $\rho_c^b(0)$ the initial density of cooperators in $G^b$, $\tau = p_d^b(t) - p_c^b(t)$, and $N^b$ the number of agents in $G^b$. Recall that setting $T_s = 0$, not allowed in a thermodynamic system, corresponds to a motionless case, leading to the Nash equilibrium in $G^b$. Instead, for $T_s > 0$ […] $t = 0$, particles of $G^a$ are much closer to the wall than those of $G^b$ (later we will relax this constraint); for instance, let us consider a particle of $G^a$ that, during its random motion, follows a trajectory of length $d$ (in the $n$-dimensional physical space) towards the wall. Assuming this particle moves with speed equal to $\langle v \rangle$, we can compute the instant of crossing $t_c = \frac{d}{\langle v \rangle}$ […]

$$
[\ldots] \left[ (\rho_c^b \cdot N^b - 1) + (\rho_d^b \cdot N^b)\, S \right]_i
\qquad (7)
$$

moreover, $\pi_c^b \to 0$ as $\rho_c^b \to 0$. At $t = t_c$, a new cooperator reaches $G^b$, with a payoff computed with equation 6.

The analytical solution 5 makes it possible to analyze the evolution of the system and to evaluate how initial conditions affect the outcomes of the model. Let us observe that, if $\pi_c^a(t_c)$ is large enough, the new cooperator may modify the equilibrium of $G^b$, turning defectors into cooperators. Notably, the payoff considered to compute $p_c^b$, after $t_c$, corresponds to $\pi_c^a(t_c)$, as the newcomer is the richest cooperator in $G^b$. Furthermore, we note that $\pi_c^a(t_c)$ depends on $N^a$, hence we study the evolution of the system on varying the parameter $\epsilon = \frac{N^a}{N^b}$, i.e., the ratio between particles in the two groups. Finally, for numerical convenience, we set $k_b = 1 \cdot 10^{-[\ldots]}$, $m_p = 1$, and $d = 1$.

Figure 1 shows the evolution of $G^b$, for $\epsilon = 1$ on varying $T_s$ and, depicted in the inner insets, the variation of the system magnetization over time (always inside $G^b$) computed as [34]

$$
M = \frac{\sum_{i=1}^{N^b} \sigma_i}{N^b}
\qquad (8)
$$

with $\sigma_i$ the strategy of the $i$-th agent. As discussed before, in the physical domain of particles, heating the system increases the average speed of particles. Thus, under the assumption that two agents play together if they stay close (i.e., in the same group) for a long enough time, we hypothesize that there exists a maximum speed above which interactions (in terms of the game) do not occur. This hypothesis requires a critical temperature $T_c$, above which no interactions, in the ‘information’ domain, are possible. As shown in plot f of figure 1, for temperatures in the range $0 < T_s < T_{max}$ the system converges to a cooperation phase (i.e., $M = +1$), for $T_{max} < T_s < T_c$ the system follows the Nash equilibrium (i.e., $M = -1$), and for $T_s > T_c$ a disordered phase emerges at equilibrium. Remarkably, the results of our model suggest that it is always possible to compute a range of temperatures that yields an equilibrium of full cooperation (see figure 2).
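The quantities used in this part of the analysis can be combined into a small numerical sketch. The Python code below evaluates the crossing time $t_c = d / \langle v \rangle$, the winning probabilities of Eq. 4, the closed-form density of Eq. 5, and the magnetization of Eq. 8 written in terms of the cooperator density. It is an illustration under simplifying assumptions: $\tau$ is held constant instead of being updated at each time step, and all function names and numerical values are hypothetical.

```python
import math


def crossing_time(distance, speed):
    """Time needed to cover `distance` at constant `speed` (t_c = d / <v>)."""
    if speed <= 0:
        raise ValueError("speed must be positive, i.e. T_s > 0")
    return distance / speed


def winning_probabilities(pi_c, pi_d):
    """Eq. 4: probabilities that cooperators and defectors prevail."""
    p_c = pi_c / (pi_c + pi_d)
    return p_c, 1.0 - p_c


def cooperator_density(t, rho0, tau, n_b):
    """Eq. 5 with tau = p_d - p_c held constant (a simplification)."""
    return rho0 / (rho0 - (rho0 - 1.0) * math.exp(tau * t / n_b))


def magnetization(rho_c):
    """Eq. 8 for spins +1 (C) and -1 (D): M = rho_c - rho_d = 2 * rho_c - 1."""
    return 2.0 * rho_c - 1.0


# Hypothetical example: payoffs of Eq. 2 with N = 100, S = 0, T = 1.
p_c, p_d = winning_probabilities(pi_c=49.0, pi_d=50.0)
tau = p_d - p_c  # positive, so defectors prevail in the motionless case
for t in (0, 1_000, 100_000):
    rho = cooperator_density(t, rho0=0.5, tau=tau, n_b=100)
    print(t, round(rho, 4), round(magnetization(rho), 4))
```

With a positive $\tau$ the cooperator density decays towards zero and $M$ approaches $-1$; a negative $\tau$, as produced by a sufficiently rich newcomer, reverses the trend and drives $M$ towards $+1$.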
Moreover, we study the variation of $T_{max}$ on varying $\epsilon$ (see figure 3), showing that, even for low $\epsilon$, it is possible to obtain a time $t_c$ that allows the system to converge towards cooperation. Finally, we investigate the relation between the maximum value of $T_s$ that allows a population to become cooperative and its size $N$ (i.e., the number of agents). Remarkably, as shown in figure 4, the maximum $T_s$ scales with $N$ following a power-law function characterized by a scaling parameter (i.e., an exponent) $\gamma \sim 2$. The value of $\gamma$ has been computed by considering the values of $T_s$ shown in figure 2 for the case $\epsilon = 2$. Finally, it is worth highlighting that all analytical results reveal a link between the system temperature and its final equilibrium. Recalling that we are not considering the equilibrium of the gas, i.e., it does not thermalize in the proposed model, we emphasize that the equilibrium is considered only in the information domain.

Fig. 1. From a to e: Evolution of the group $G^b$, with $N = 100$ and $\epsilon = 1$, on varying the temperature: a. $T_s = 0$. b. $T_s = 0.$[…] c. $T_s = 9$. d. $T_s = 15$. e. $T_s = 50$. Insets show the system magnetization over time. The instant $t = t_c$ can be detected in plots c, d, e as a discontinuity of the two lines (i.e., red and black). f. Final magnetization $M$ of $G^b$ for different temperatures ($T_c$ indicates the ‘critical temperature’).

Fig. 2. Maximum values of temperature $T_s$ that allow the group $G^b$ to converge to cooperation. Red values correspond to results computed with $\epsilon = 0.5$, while blue values correspond to those computed with $\epsilon = 1$. Circles are placed in the $TS$ diagram, indicating the values of $T$ and $S$ of the payoff matrix used for each case. Even for high values of $T$, and small values of $S$, it is possible to achieve cooperation.

Fig. 3. Maximum value of system temperature that allows cooperation to be achieved at equilibrium versus $\epsilon$ (i.e., the ratio between particles in the two groups). Different colors identify different trends, fitted by power-law functions. After the final green plateau, temperatures are too high to play the spatial PD.

Fig. 4. Maximum value of $T_s$ to achieve full cooperation at equilibrium as a function of $N$, i.e., the size of the population. The fitting function (dotted line) is a power law characterized by a scaling parameter equal to 2.

As discussed before, in the information domain we can study the system by mapping agents to spins, whose value represents their strategy. In addition, we can map the difference between the winning probabilities of cooperators and defectors to an external magnetic field: $h = p_c^b - p_d^b$. In doing so, by the Landau theory [30], we can analytically identify an order-disorder phase transition. Notably, we analyze the free energy $F$ of the spin system on varying the control parameter $m$ [35] (corresponding to the magnetization $M$)

$$
F(m) = -hm \pm \frac{m^2}{2} + \frac{m^4}{4}
$$

where the sign of the second term is positive for $T_s > T_c$ and negative for $T_s < T_c$; recall that $T_c$ represents the temperature beyond which it is not possible to play the PD due to the high speed of particles (according to the condition discussed before). For the sake of clarity, we emphasize that the free energy is introduced in order to evaluate the nature of the final equilibrium achieved by the system. In particular, looking for the minima of $F$ makes it possible to investigate whether our population reaches the Nash equilibrium or different configurations (e.g., full cooperation). Figure 5 shows a pictorial representation of the phase transitions that occur in our system, on varying $T_s$ and the external field $h$. Finally, the constraints related to the average speed of particles, and to the distance between each group and the permeable wall, can in principle be relaxed, as we can imagine extending this description to a wider system with several groups (as done in previous investigations, e.g. [20]), where agents are uniformly spread over the whole space. It is worth highlighting that our results are fully consistent with those achieved by authors who studied the role of motion in the PD (as [19,20]), explaining why clusters of cooperators emerge in their simulations [20]. We also recall that, in the proposed model, we are using memory-aware agents, while in previous computational investigations agents reset their payoff at each step, i.e., before starting new interactions.

To conclude, in this work we provide an analytical description of the spatial Prisoner’s Dilemma, using the framework of statistical physics, studying the particular case of agents provided with memory of their payoff (defined as memory-aware agents).
This condition entails that their payoff is not reset at each time step, so that they can increase it over time. In particular, we propose a model based on the kinetic theory of gases, showing how motion may lead a population towards an equilibrium far from the expected one (i.e., the Nash equilibrium). Remarkably, the final equilibrium depends on the system temperature, so that we have been able to identify a range of temperatures that triggers cooperation for all values of the payoff matrix (related to the PD). In addition, we found an interesting relation between the maximum temperature that fosters cooperation and the size of the system. Notably, a scaling parameter in that relation has been computed by investigating different orders of magnitude of the size of the system. Furthermore, the dynamics of the resulting model have also been described in terms of order-disorder phase transitions. Finally, we deem that our results open the way to defining a direct link between evolutionary game theory and statistical physics.
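The order-disorder picture summarized above can be explored numerically by locating the minimum of a Landau free energy. The Python sketch below assumes the standard quartic form $F(m) = -hm \pm \frac{m^2}{2} + \frac{m^4}{4}$, with the plus sign above $T_c$ and the minus sign below it; the coefficients $1/2$ and $1/4$, the search interval, the grid resolution, and the function names are assumptions made for illustration only.

```python
def free_energy(m, h, above_tc):
    """Assumed Landau form: F(m) = -h*m +/- m^2/2 + m^4/4."""
    sign = 1.0 if above_tc else -1.0
    return -h * m + sign * m ** 2 / 2.0 + m ** 4 / 4.0


def equilibrium_magnetization(h, above_tc, n_grid=4001):
    """Grid search for the global minimum of F over m in [-2, 2]."""
    grid = [-2.0 + 4.0 * i / (n_grid - 1) for i in range(n_grid)]
    return min(grid, key=lambda m: free_energy(m, h, above_tc))


# Below T_c the sign of the field h = p_c - p_d selects the ordered phase.
print(equilibrium_magnetization(h=-0.1, above_tc=False))  # defection side
print(equilibrium_magnetization(h=+0.1, above_tc=False))  # cooperation side
# Above T_c, with no field, the minimum sits at m = 0 (disordered phase).
print(equilibrium_magnetization(h=0.0, above_tc=True))
```

A grid search is used instead of solving the cubic stationarity condition because it returns the global minimum directly, which is the relevant one when a nonzero field makes the two wells of the low-temperature free energy unequal.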
Acknowledgments
MAJ is extremely grateful to Adriano Barra for all priceless suggestions. Moreover, he wants to thank Mirko Degli Esposti, Marco Lenci, and Giampaolo Cristadoro for the useful comments. This work has been supported by Fondazione Banco di Sardegna.
Fig. 5. Order-disorder phase transitions in the population. For $T_s < T_c$, the population is in a ferromagnetic phase: a. Applying an external negative field, the system converges to the Nash equilibrium ($\sigma = -1$), corresponding to $m = -1$. b. Applying an external positive field, the population converges to cooperation ($\sigma = +1$), corresponding to $m = +1$. c. For temperatures higher than $T_c$, a disordered paramagnetic phase emerges.

References
1. Perc, M., Grigolini, P.: Collective behavior and evolutionary games - An introduction. Chaos, Solitons & Fractals […]
2. […] Harvard University Press (2006)
3. Tomassini, M.: Introduction to evolutionary game theory. Proc. Conf. on Genetic and evolutionary computation companion (2014)
4. Julia, PC, Gomez-Gardenes, J., Traulsen, A., and Moreno, Y.: Evolutionary game dynamics in a growing structured population. New Journal of Physics […]
5. […] Phys. Rev. E […]
6. […] Cambridge University Press (1988)
7. Colman, A.M.: Game Theory and Its Applications. Digital Printing (2008)
8. Perc, M., Szolnoki, A.: Social diversity and promotion of cooperation in the spatial prisoner’s dilemma. Phys. Rev. E […]
9. […] J. R. Soc. Interface […]
10. […] Scientific Reports […]
11. Szolnoki, A., Xie, N.-G., Wang, C. and Perc, M.: Imitating emotions instead of strategies in spatial games elevates social welfare. Europhysics Letters […]
12. […] New Journal of Physics […]
13. […] Journal of Evolutionary Economics […]
14. […] BioSystems […]
15. […] Physica A […]
16. […] The Royal Society - Proc. B (2011)
17. Lieberman, E., Hauert, C., Nowak, M.A.: Evolutionary dynamics on graphs. Nature (2004)
18. Galam, S. and Walliser, B.: Ising model versus normal form game. Physica A […]
19. […] Phys. Rev. E […]
20. […] Journal of Theoretical Biology 344 (2014)
21. Tomassini, M., Antonioni, A.: Levy flights and cooperation among mobile individuals. Journal of Theoretical Biology […]
22. […] Scientific Reports (2015)
23. Perc, M., Gomez-Gardenes, J., Szolnoki, A., Floria, L.M., and Moreno, Y.: Evolutionary dynamics of group interactions on structured populations: a review. J. R. Soc. Interface […]
24. […] Computational Social Networks (2015)
25. Javarone, M.A., Atzeni, A.E. and Galam, S.: Emergence of Cooperation in the Prisoner’s Dilemma Driven by Conformity. LNCS - Springer […]
26. […] Science […]
27. […] Physics Reports […]
28. […] Nature […]
29. […] A. J. Phys. 73 - 405 (2005)
30. Huang, K.: Statistical Mechanics. Wiley 2nd Ed. (1987)
31. Szolnoki, A., Szabo, G., Perc, M.: Phase diagrams for the spatial public goods game with pool punishment. Phys. Rev. E […]
32. […] EPL […]
33. […] Europhysics Letters (2015)
34. Mobilia, M. and Redner, S.: Majority versus minority dynamics: Phase transition in an interacting two-state spin system. Phys. Rev. E […]
35. […] Journal of Statistical Physics 132-5 […]