Competition of individual and institutional punishments in spatial public goods games

Abstract

We have studied the evolution of strategies in spatial public goods games where both individual (peer) and institutional (pool) punishments are present alongside unconditional defector and cooperator strategies. The evolution of the strategy distribution is governed by imitation based on random sequential comparison of neighbors' payoffs at a fixed level of noise. Using numerical simulations we have evaluated the strategy frequencies and phase diagrams when varying the synergy factor, punishment cost, and fine. Our attention is focused on two extreme cases describing all the relevant behaviors in such a complex system. According to our numerical data, peer punishers prevail and control the system behavior in large segments of the parameter space, while pool punishers can only survive in the limit of weak peer punishment, where a rich variety of solutions is observed. Paradoxically, the two types of punishment may extinguish each other's impact, resulting in the triumph of defectors. The technical difficulties and suggested methods are briefly discussed.


arXiv [physics.soc-ph], Oct

Attila Szolnoki, György Szabó, and Lilla Czakó
Research Institute for Technical Physics and Materials Science, P.O. Box 49, H-1525 Budapest, Hungary
Roland Eötvös University, Institute of Physics, Pázmány P. sétány 1/A, H-1117 Budapest, Hungary


PACS numbers: 89.65.-s, 89.75.Fb, 87.23.Kg

I. INTRODUCTION

The emergence of cooperation among selfish individuals is an important and intensively studied puzzle inspired by systems in biology, sociology, and economics [1, 2]. One of the frequently used frameworks to capture the conflict between individual and common interests is the so-called public goods game (PGG), in which several players decide simultaneously whether or not to contribute to a common venture. The collected income is multiplied by a factor (representing the advantage of collective action) and shared equally among all members of the group, independently of their personal acts. Accordingly, defectors, who refuse to contribute but enjoy the common benefit (due to the cooperators), collect higher individual payoffs and are favored, leading to the “tragedy of the commons” state [3].

In the last decade several mechanisms have been identified that help resolve this dilemma by ensuring a competitive payoff for altruistic (cooperative) players [4–19]. A plausible idea is to punish defectors by lowering their income, which decreases their popularity [20–23]. Punishing cheaters, however, can be executed in two significantly different ways.

Firstly, players can retaliate individually by paying the extra cost of punishment as often as they face defectors. Naturally, this so-called peer punisher strategy fares equally well with pure cooperators in the absence of cheaters. The pure cooperators, however, who do not contribute to the sanctions but utilize the advantage of punishment, can be considered as “second-order free-riders” [24]. As a consequence, the generally less favored peer punisher strategy gradually becomes extinct and the original problem emerges again. Without introducing further complexity this problem cannot be solved in a well-mixed population. In a structured population, however, an adequate solution may be achieved by utilizing spatial effects [2, 5]. Here the pure cooperators and peer punishers are able to separate from each other and fight independently against defectors. Since punishers do it more successfully, they eventually displace the pure cooperators via an indirect territorial fight [25, 26].

The alternative way to impose sanctions is when players invest a permanent cost into a punishment pool and punish defectors “institutionally”. In this case, if there is punishment in the group, the fine imposed on defectors need not depend on the actual number of punishers in the group, and the cost of punishers can also be independent of the number of cheaters among group members. In this way the cost is always charged, independently of the necessity or efficiency of punishment. In a well-mixed population pool punishment can only prevail if “second-order punishment” is allowed, i.e., pure cooperators, who do not invest extra cost into the punishment pool, are also fined [27, 28]. In the absence of the latter possibility defectors will spread if participation in the PGG is compulsory. In agreement with expectations, spatial models offer another type of solution where the pool punisher strategy can survive without assuming additional punishment of pure cooperators. In the latter case a self-organizing spatio-temporal pattern can be observed [29]. The emergence of spatial patterns, maintained by cyclic dominance among three strategies, is a general phenomenon and occurs for a wide variety of systems including the PGG [4, 30, 31] and different variants of the prisoner’s dilemma game [32–36].

We note that many aspects of punishment have already been investigated in human experiments [21, 37–43], as well as by means of mathematical models with three [4, 44, 45], four [46, 47], and even more strategies [48, 49].

The seminal work of Sigmund et al. has revealed that pool punishers always lose and peer punishers prevail for well-mixed populations in the absence of second-order punishment [27, 28]. In the present paper we study the competition of punishing strategies by assuming a structured population. It will be demonstrated that the stable solution depends sensitively on the relative cost that the punishing strategies bear. Accordingly, we have studied two extreme cases illustrating the possible relations of pool and peer punisher players. It should be stressed that in our model additional strategies, such as voluntary (optional) participation in the PGG or second-order punishment of pure cooperators, are not allowed. Despite its simplicity the spatial model exhibits really complex behavior, including different space and time scales in connection with the emerging solutions.

The remainder of this paper is organized as follows. In the next section we describe the studied models and motivate the suggested extreme cases, termed the “hard” and “weak” peer punishment limits. In Sec. III we present the solutions obtained by Monte Carlo (MC) simulations for an expensive peer strategy. The results of the other extreme case are presented in Sec. IV. Finally, we summarize our observations and discuss their potential implications.

II. SPATIAL PUBLIC GOODS GAME WITH PUNISHING STRATEGIES

To preserve comparability with previous works [26, 29] the public goods game is staged on a square lattice using periodic boundary conditions. We should emphasize, however, that the observed results are robust and are valid in a wider class of two-dimensional lattices. The players are arranged into overlapping five-person ($G = 5$) groups in such a way that each player at site $x$ serves as a focal player in the group formed together with his/her four nearest neighbors [50–54]. Consequently, each individual belongs to $G = 5$ different groups and plays five five-person games by following the same strategy in every group he/she is affiliated with.

According to the four possible strategies, a player on site $x$ is designated as a defector ($s_x = D$), pure cooperator ($s_x = C$), peer punisher ($s_x = E$), or pool punisher ($s_x = O$). For the last three strategies the player contributes a fixed amount (equal to 1 without loss of generality) to the public goods while defectors contribute nothing. The sum of all contributions in each group is multiplied by the factor $r$ ($1 < r < G$), reflecting the synergetic effects of cooperation, and the multiplied investment is divided equally among the group members irrespective of their strategies.

In addition to the basic game, defectors may be punished if there are pool or peer punisher players in the group. Pool punishment requires precursory allocation of resources, that is, each pool punisher contributes a fixed amount $\gamma$ to the punishment pool irrespective of the strategies in its neighborhood. Furthermore, because of the institutional character of this sanction, the resulting fine $\beta$ of defectors is independent of the frequency of pool punishers: the only criterion is the presence of at least one pool punisher in the group.

The characters of peer and pool punishment differ significantly. Namely, the cost of peer punishment is charged only if a peer punisher faces a defector, but this cost is multiplied by the number of defectors ($N^g_D$) in the given group $g$ ($g = 1, \ldots, G$). The latter fact reflects that a peer punisher should penalize every defector individually. In addition, the fine of a defector originating from peer punishment is accumulated and is proportional to the number of peer punishers $N^g_E$ in the group. Denoting the number of cooperators and pool punishers in the group by $N^g_C$ and $N^g_O$, the payoffs for the possible strategies can be given as

$$
\begin{aligned}
P^g_C &= r\,(N^g_C + N^g_O + N^g_E + 1)/G - 1, \\
P^g_O &= P^g_C - \gamma, \\
P^g_E &= P^g_C - \gamma\, m\, N^g_D, \\
P^g_D &= r\,(N^g_C + N^g_O + N^g_E)/G - \beta\, m\, N^g_E - \beta f(N^g_O),
\end{aligned}
\tag{1}
$$

where the step-like function $f(Z)$ is 1 if $Z > 0$ and 0 otherwise. The total payoff of a player at site $x$ is accumulated from five public goods games, consequently, $P_{s_x} = \sum_g P^g_{s_x}$ ($g = 1, \ldots, G$).

The parameter $m$ in Eqs. (1) allows us to quantify two relevant limits in the relation of pool and peer punisher strategies. At the “hard” limit of peer punishment ($m = 1$) the pool punisher pays a lump cost $\gamma$ while the peer punisher is charged the same cost $\gamma$ for each action of punishment, that is, its income is reduced by $\gamma N^g_D$. Notice that, in spite of their potentially high cost, peer punishers may overcome pool punishers in the absence of defectors. The latter constellations may become relevant in spatial systems if defectors are rare. On the other hand, hard peer punishment reduces the income of defectors more efficiently if several neighbors apply this strategy against a defector.

The “weak” limit of peer punishment will be studied at the parameter value $m = 1/(G-1)$. In this case the cost of a pool punisher always exceeds the cost of a peer punisher, except when every other group member around the $E$ player chooses defection. Here we consider only the case when the efficiency (i.e., the corresponding fine) of peer punishment is also reduced by the factor $m$.
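As an illustration, the single-group payoffs of Eqs. (1) can be written out in a few lines of Python. This is a minimal sketch under stated assumptions, not the simulation code used for the results: the names `step` and `group_payoff` are hypothetical, and the counts are assumed to refer to the other $G - 1$ members of the group, as the "+1" in $P^g_C$ suggests.

```python
def step(z):
    """Step-like function f(Z) of Eqs. (1): 1 if Z > 0, otherwise 0."""
    return 1 if z > 0 else 0


def group_payoff(strategy, n_c, n_o, n_e, n_d, r, gamma, beta, m, G=5):
    """Payoff of a focal player in a single group g, following Eqs. (1).

    Assumption: n_c, n_o, n_e, n_d count the *other* G - 1 group members
    (cooperators, pool punishers, peer punishers, defectors).
    """
    if n_c + n_o + n_e + n_d != G - 1:
        raise ValueError("counts must add up to the G - 1 co-players")
    contributors = n_c + n_o + n_e
    if strategy == "D":
        # Defectors contribute nothing and pay the peer and pool fines.
        return r * contributors / G - beta * m * n_e - beta * step(n_o)
    # C, O and E all contribute one unit to the common pool.
    p_c = r * (contributors + 1) / G - 1.0
    if strategy == "C":
        return p_c
    if strategy == "O":
        return p_c - gamma              # lump cost, paid unconditionally
    if strategy == "E":
        return p_c - gamma * m * n_d    # cost per punished defector
    raise ValueError("strategy must be one of 'D', 'C', 'O', 'E'")
```

With these definitions the two limits can be compared directly: for a punisher surrounded by four defectors, `group_payoff("E", 0, 0, 0, 4, r, gamma, beta, m=1/4)` equals `group_payoff("O", 0, 0, 0, 4, r, gamma, beta, m=1/4)`, whereas for $m = 1$ the peer punisher pays four times the pool punisher's cost.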
The above situations raise many questions about the competition and coexistence of the basically different types of punishment.

Following the traditional concept of evolutionary game theory, the population of the more successful individual strategies expands at the expense of others having lower income (fitness). For networked populations this strategy update is usually performed via a stochastic imitation of the more successful neighbors. Accordingly, during an elementary step of the Monte Carlo simulation a randomly selected player $x$ plays public goods games with all its co-players in the $G$ groups and collects the total payoff $P_{s_x}$ as described in Eqs. (1). Next, player $x$ chooses one of its four nearest neighbors at random, and the chosen co-player $y$ also acquires its payoff $P_{s_y}$ in the same way. Finally, player $x$ imitates the strategy of player $y$ with a probability

$$
w(s_x \to s_y) = \frac{1}{1 + \exp[(P_{s_x} - P_{s_y})/K]},
$$

where $K$ quantifies the uncertainty in strategy adoptions [2, 55]. Generally, the possibility of error in the strategy update prevents the system from being trapped in a frozen, metastable state. For the sake of direct comparison with previous results [26, 29] we set $K = 0.5$. It is emphasized that the solutions found are robust and remain valid at other (low) values of the noise parameter.

The frequencies of pool and peer punishers ($\rho_O$ and $\rho_E$), cooperators ($\rho_C$), and defectors ($\rho_D$) [satisfying the condition $\rho_D + \rho_C + \rho_O + \rho_E = 1$] are determined by averaging over a sampling time $t_s$ after a sufficiently long relaxation time $t_r$. Time is measured in units of Monte Carlo steps (MCS), each giving the players one chance on average to adopt one of the neighboring strategies. Depending on the values of the parameters $\gamma$, $\beta$, and $r$, the emerging spatial patterns exhibit a large variety in the characteristic length and time scales.
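The imitation rule and the elementary Monte Carlo step described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the lattice is assumed to be a square list of lists with periodic boundaries, and `total_payoff` is a caller-supplied placeholder for the five-group payoff sum $P_{s_x}$, which is not implemented here.

```python
import math
import random


def imitation_probability(p_x, p_y, K=0.5):
    """w(s_x -> s_y) = 1 / (1 + exp((P_x - P_y) / K)).

    Payoff differences in this model are of order ten at most, so the
    exponential does not overflow for K = 0.5.
    """
    return 1.0 / (1.0 + math.exp((p_x - p_y) / K))


def elementary_step(lattice, total_payoff, K=0.5, rng=random):
    """One elementary step on an L x L lattice with periodic boundaries.

    `total_payoff(lattice, i, j)` is a hypothetical callable returning the
    payoff accumulated by the player at site (i, j) over its G groups.
    Returns True if the selected player changed its strategy.
    """
    L = len(lattice)
    i, j = rng.randrange(L), rng.randrange(L)        # random player x
    di, dj = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    ni, nj = (i + di) % L, (j + dj) % L              # random neighbor y
    if lattice[i][j] == lattice[ni][nj]:
        return False                                 # nothing to imitate
    w = imitation_probability(total_payoff(lattice, i, j),
                              total_payoff(lattice, ni, nj), K)
    if rng.random() < w:
        lattice[i][j] = lattice[ni][nj]
        return True
    return False
```

One Monte Carlo step (MCS) then corresponds to $L^2$ calls of `elementary_step`. Note that for equal payoffs the rule gives $w = 1/2$, so imitation between equally successful strategies remains possible.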
In order to achieve adequate accuracy (typically the line thickness) we need to vary the linear system size from $L = 400$ to $7200$ for sufficiently long sampling and relaxation times (in some crucial cases $t_r = t_s > \ldots$ MCS). As we will describe in detail in the subsequent sections, the usual choice of a random distribution of strategies as an initial state was not always appropriate to find the solution that is valid in the large system size limit. At some parameter values even the largest attainable system size ($L = 7200$) was not large enough to reach the most stable solution from a random initial state. This problem is related to the fact that the formation of some solutions is characterized by different time scales, and the fast relaxation from a random state toward an intermediate (unstable) state prevents the more complex solutions from emerging. In the latter cases we had to use a prepared (artificial) initial state (e.g., a patchwork-like pattern) combining solutions of subsystems where several strategies are missing.

III. HARD PEER PUNISHMENT

First we discuss the case of hard peer punishment because it yields simpler phase diagrams. In this case the cost of peer punishers exceeds the cost of pool punishers when several defectors are present in their neighborhood. At the same time, peer punishers can help each other if they form compact colonies in the spatial system, and these collaborations multiply the fine reducing the income of neighboring defectors. To reveal the possible stable solutions we have studied different values of the synergy factor $r$ that yield significantly different results in the simpler models studied previously [26, 29]. The applied values ($r = 3.\ldots$, $3.\ldots$, and $2$) represent three different classes in the stationary behavior.

The highest synergy factor ($r = 3.\ldots$) allows pure cooperators to survive even in the absence of punishment. At a slightly lower synergy value ($r = 3.\ldots$) defectors would prevail without punishment; however, both types of punishment (as a possible third strategy) can boost cooperation, as was already shown [26, 29]. In the case of the lowest synergy factor ($r = 2$), the simpler three-strategy models predict significantly different behaviors when applying only peer or pool punishment. For low cost values cooperators were unable to survive for the case of peer punishment in the presence of a weak noise allowing additional rare creation of defectors. On the contrary, for pool punishment, the $D$, $C$, and $O$ strategies formed a self-organizing spatial pattern maintained by cyclic dominance. Now the numerical analysis is extended to higher values of $\beta$ and $\gamma$. As a result, we have observed the coexistence of the $D$, $C$, and $E$ strategies via a curious mechanism within a region of parameters not investigated previously.

MC simulations were performed to determine the stationary frequency of strategies when varying the value of the fine $\beta$ for different values of the cost $\gamma$ and $r$.
The numerical data indicated discontinuous (first-order) or continuous (second-order) phase transition(s) between phases characterized by basically different compositions and/or spatio-temporal structures, as illustrated in Fig. 1 for the lowest value of $r$, which we study first. If the system is started from a random initial state with four strategies then it evolves into the homogeneous (absorbing) state D, where only defectors remain alive ($\rho_D = 1$), if the fine is smaller than a threshold value $\beta_{th}(\gamma = 0.\ldots, r = 2, K = 0.5) = 0.\ldots$ [indicated by an arrow in Fig. 1].

FIG. 1: (Color online) Average strategy frequencies vs. fine in the final stationary state for the hard peer punishment limit ($m = 1$) at $\gamma = 0.\ldots$ and $r = 2$. The corresponding phases [D, D$_h$(E), (D+C+E), and E] are denoted at the top. Lines are just to guide the eye. The arrow points to the value of fine separating the phases D and D$_h$, where the average invasion velocity between the domains of $D$ and $E$ strategies becomes zero.

The simulations show clearly that defectors invade the territories of peer punishers if $\beta < \beta_{th}$. For $\beta_{th} < \beta < \beta_{c1}$ the superiority of defectors ($\rho_D = 1$) is due to a mechanism that can be understood by considering first the curious coexistence of the $D$, $C$, and $E$ strategies occurring for $\beta_{c1} < \beta < \beta_{c2}$ [$\beta_{c1}(\gamma = 0.\ldots, r = 2, K = 0.5) = 1.\ldots$ and $\beta_{c2}(\gamma = 0.\ldots, r = 2, K = 0.5) = 2.\ldots$]. The corresponding phase is denoted as (D+C+E). Within this phase strategy $E$ can invade the territories of $D$s along the interfaces separating them, as illustrated in the snapshot in Fig. 2. For sufficiently high values of $\beta$ and $\gamma$, however, the expensive action of punishment reduces the income of both defectors and peer punishers along the interface, where players can increase their payoff by choosing cooperation. As a result cooperators can spread along these interfaces by forming a “monolayer”. At the same time the interfacial cooperators serve as a “cooperator reservoir” from where cooperation can spread into the phase E via the mechanism described by the voter model [56–58]. Occasionally the cooperators aggregate in the vicinity of the interface and the given territory becomes unprotected against the invasion of defectors. Consequently, the presence of cooperators along the D-E interfaces reverses the direction of invasion. In the snapshot of Fig. 2 one can observe both types of invasions balanced in the (D+C+E) phase.

FIG. 2: (Color online) Typical arrangement of cooperators (white), defectors (black), and peer punishers (orange - light gray) for the (D+C+E) phase within a $\ldots \times \ldots$ part of a larger system at $r = 2$, $\gamma = 1$, and $\beta = 2.\ldots$ in the hard peer punishment limit.

The spreading of cooperators along the D-E interfaces is influenced by the values of $\gamma$ and $\beta$, and it may become so efficient that $C$ monolayers are formed throughout these interfaces. In that case the E domains are invaded by defectors with the assistance of cooperators. Having removed the last peer punishers, the defectors sweep out the cooperators, too. This process resembles a real-life situation referred to as “The Moor has done his duty, the Moor may go”. Such a scenario occurs within the phase D$_h$, where the subscript $h$ refers to homoclinic instability. The mentioned transient process to D is confirmed by MC simulations for most of the runs in small systems (e.g., $L < \ldots$).
The present system, however, can evolve into the homogeneous state E ($\rho_E = 1$) with a probability increasing with $L$. The phase E can conquer D (via a nucleation mechanism) if a small colony of $E$ players survives the extinction of cooperators and the colony size exceeds a critical value during the stochastic evolutionary steps. It is emphasized that the $E$ invasion can be reversed by the offspring of a single cooperator substituted for one of the players along the D-E interface, and finally the system evolves into a state dominated by defectors. Notice that pool punishers die out for all the cases plotted in Fig. 1. Furthermore, the (D+C+E) phase transforms into E with a continuous extinction of both the $D$ and $C$ strategies when approaching $\beta_{c2}$. Similar numerical investigations were made for many other values of the cost $\gamma$, and the results are summarized in the phase diagram plotted in Fig. 3.

The simulations indicate that both defectors and pool punishers die out within a transient time for sufficiently high values of $\beta$ if $\gamma < \gamma_c(r = 2) = 0.\ldots$. As the surviving cooperators and peer punishers receive the same payoff, the resulting two-strategy evolutionary process becomes equivalent to that described by the voter model. The two-dimensional voter model exhibits an extremely (logarithmically) slow evolution toward one of the (homogeneous) absorbing states [59]. The coexistence of the $C$ and $E$ strategies, however, can be destroyed by introducing defectors (as mutants, even at arbitrarily small rates), which favors and accelerates fixation in the homogeneous state of the $E$ strategy [60]. This is the reason why the final stationary state is denoted by E in the phase diagrams throughout the whole paper (see, e.g., Figs. 1 and 3). Finally we mention that the dotted line in Fig. 3 is the analytical continuation of the dashed (red) one separating the phases D and E. Along these lines the average velocity of invasion between the phases E and D becomes zero.

FIG. 3: (Color online) Cost-fine phase diagram in the hard peer punishment limit ($m = 1$) for a low synergy factor ($r = 2$). The dashed (red) and solid (blue) lines represent first- and second-order phase transitions; the dotted (black) line separates the homogeneous phases of defectors (D) with different stabilities.

As expected, the increase of $r$ supports the maintenance of cooperation. Consequently, a smaller fine is capable of suppressing defection. As for $r = 2$, pool punishers die out quickly if $r = 3.\ldots$. For high values of $\beta$ and $\gamma$ the cooperators prefer staying along the interfaces separating domains of the D and E phases (as described above), which slows the approach to the final stationary state. This technical difficulty is reduced significantly for lower values of cost and fine, where Fig. 4 illustrates a discontinuous (first-order) phase transition between the phases D and E at a threshold value of fine increasing with the cost of peer punishment if these quantities exceed the suitable critical values ($\beta_c = 0.\ldots$ and $\gamma_c = 0.\ldots$ for $r = 3.\ldots$ and $K = 0.5$). When increasing $\beta$ for $\gamma < \gamma_c$, the homogeneous D and E states are separated by a coexistence region of $D$ and $E$ strategies instead of a first-order transition. Within this phase the frequency of peer punishers varies continuously from 0 to 1, and both transitions exhibit the general features of the directed percolation universality class, in agreement with previous results obtained for imitation dynamics [32, 55, 61].

FIG. 4: (Color online) Cost-fine phase diagram at $m = 1$ for $r = 3.\ldots$. Solid (blue) and dashed (red) lines represent second-order and first-order phase transitions, respectively. D+E denotes a phase with coexisting $D$ and $E$ strategies.

For higher synergy factors (e.g., $r = 3.\ldots$) the cooperators survive in the absence of punishment ($\gamma = 0$). Consequently, the homogeneous D phase is missing in the phase diagram.
In Fig. 5 the phase D+C refers to the coexistence of cooperators and defectors in the final stationary states. The increase of fine yields a discontinuous transition from D+C to D+E if $\gamma < \ldots$ for the given parameters; otherwise one can observe a first-order transition from D+C to E (within the region of $\gamma$ and $\beta$ plotted in Fig. 5). Within the D+E phase the density of defectors vanishes continuously when approaching the phase boundary separating the phases D+E and E.

FIG. 5: (Color online) Cost-fine phase diagram at $m = 1$ for $r = 3.\ldots$. Phases and phase boundaries are denoted as in Fig. 4.

IV. WEAK PEER PUNISHMENT

In this section we focus on the opposite limit, where peer punishers bear a lower cost and impose a lower fine than pool punishers when sanctioning. Using the parameter value $m = 1/(G-1)$, their costs are equal only if a peer punisher is surrounded only by defectors ($N^g_D = G - 1$). According to a naive argument, peer punishers might benefit from the powerful fine of pool punishers, which would further strengthen their position compared to the latter strategy. This is expected especially after what we observed in the previous section, where peer punishers prevail in the system despite their large extra cost. Following the established protocol, we explore the possible solutions at three representative synergy factors.

At a high synergy factor ($r = 3.\ldots$) the phase diagram, plotted in Fig. 6, partly supports our expectation. Namely, at high cost ($\gamma > \ldots$) the solutions become identical to those obtained in the absence of $O$ strategies. At low values of cost, however, the above-mentioned belief is broken because pool punishers can survive even though they are charged a larger permanent cost of punishment. Notice furthermore that they can fully displace not only pure cooperators but also peer punishers, both of whom can be considered second-order free-riders.

FIG. 6: (Color online) Phase diagram for the weak peer punishment limit ($m = 1/(G-1)$) at $r = 3.\ldots$. Solid (blue) and dashed (red) lines represent second- and first-order phase transitions.

Figure 7 shows the variation of strategy frequencies and illustrates five consecutive phase transitions (at $\beta_{c1}, \beta_{c2}, \ldots, \beta_{c5}$) when the fine is increased at $\gamma = 0.\ldots$.

FIG. 7: (Color online) Strategy frequencies as a function of fine in the weak peer punishment limit ($m = 1/(G-1)$) for a low punishment cost ($\gamma = 0.\ldots$) and $r = 3.\ldots$. The inset shows an enlargement of the small-fine area.

The visualization of the time dependence of the spatial strategy distribution has helped us understand what happens, and the characteristic mechanisms can be summarized as follows. If the system is started from a random initial state then after a short relaxation process we can observe a sea of defectors with homogeneous islands of cooperative strategies ($C$, $O$, and $E$) for low values of $\beta$. Due to the stochastic dynamics the islands grow and shrink at random, and sometimes they disappear, unite, or split into two. In the late stage of the evolutionary process the pattern formation can be considered as a competition among three two-strategy associations (denoted as D+C, D+O, and D+E) representing the corresponding stationary solutions of subsystems where only two strategies are present [2]. Evidently, the D+C solution can invade the other two associations for infinitesimally small values of fine, because $C$s are not charged the cost of punishment. The increase of fine, however, favors the survival of the $O$ and $E$ strategies. As a result, the average frequency of the punishing strategies ($\rho_O$ and $\rho_E$) increases with the fine in the corresponding two-strategy phases (D+O and D+E) while $\rho_C$ remains constant in the phase D+C. These variations modify the relationship among the three two-strategy solutions. The MC simulations indicate that the D+E phase conquers the whole system if $\beta_{c1} < \beta < \beta_{c2}$ and the D+O phase can be observed in the final state if $\beta_{c2} < \beta < \beta_{c3}$. In the latter two phases the frequency of defectors decreases monotonically with the fine, and the punisher islands are separated by channels that become narrower. In parallel with this process the punishing islands unite more frequently, enhancing the relevance of direct competition between $E$ and $O$, which boosts the spreading of $E$. The latter effect helps peer punishers to survive in the three-strategy phase D+O+E within the region $\beta_{c3} < \beta < \beta_{c4}$. In the following region of fine ($\beta_{c4} < \beta < \beta_{c5}$) the direct $E$ invasion sweeps out all the pool punishers and the system develops into the phase D+E, where the defector frequency approaches 0 at $\beta_{c5}$. If $\beta > \beta_{c5}$ then the system evolves into the phase E as detailed above.

The general behavior of the four-strategy system at $r = 3.\ldots$ is similar to that described above, except for the missing D+C phase in the low-fine limit. Figure 8 shows that pool punishers can survive with defectors both in the absence and in the presence of peer punishers at a sufficiently low cost. Otherwise the phase diagram is identical to the result achieved in the absence of pool punishers.

FIG. 8: (Color online) Phase diagram for the weak peer punishment limit ($m = 1/(G-1)$) at $r = 3.\ldots$.

Significantly different and more complex solutions are found at a low synergy factor $r$, offering a modest efficiency of investment paid into the common pool. The phase diagram for $r = 2$ is plotted in Fig. 9. In agreement with the previous results, some parts of the corresponding phase diagram are identical to those one can obtain if only one type of punishment is allowed. For example, at high fine values the $E$ strategy conquers not only $D$ but $O$ as well, and the solution reproduces the cases when the player can choose only the $D$, $C$, or $E$ strategy. This feature is related to an earlier observation indicating that the increase of fine does not necessarily help the invasion of the $O$ strategy, whereas peer punishers are unequivocally supported and conquer the system if $\beta$ is increased.

On the other hand, one can observe a striking similarity with the previous results of a simpler model [29] obtained in the absence of peer punishers. This happens in the low-fine region, where $E$s cannot fight efficiently against $D$ and die out within a transient period. Accordingly, in this region of the $\beta$-$\gamma$ plane the D+O, (D+C+DO)$_c$, and (D+C+O)$_c$ phases are identified (as detailed in [29]), where the subscript “c” refers to a self-organizing spatial strategy distribution maintained by cyclic dominance, in analogy with evolutionary rock-paper-scissors games.

FIG. 9: (Color online) Cost-fine phase diagram for the weak peer punishment limit ($m = 1/(G-1)$) at $r = 2.\ldots$.

The coexistence of both types of punishment occurs in the phases (D+O+E)$_c$ and D+C+O+E indicated in the $\beta$-$\gamma$ phase diagram (see Fig. 9). Within the phase (D+O+E)$_c$ three strategies cyclically dominate each other (namely, $D$ beats $E$ beats $O$ beats $D$) and form a self-organizing spatial pattern. At these parameter values the (D+C+O)$_c$ phase is also a possible solution.

A. Stability analyses

In the present four-strategy model, however, the (D+O+E)$_c$ coalition (with the proper spatio-temporal pattern) is more stable and capable of invading the territory of other solutions, as demonstrated by consecutive snapshots in Fig. 10. For this purpose the whole system is divided into large rectangular regions, each with its own periodic boundary conditions (PBC) during a relaxation time. Within each box only three strategies [$D+C+O$ or $D+O+E$] are placed randomly in the initial state. After a suitable relaxation time the box-wise PBCs are removed and simultaneously the usual PBC is switched on. This trick has allowed us to visualize the spatial competition between the solutions (D+C+O)$_c$ (left) and (D+O+E)$_c$ (right).
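The prepared initial state and the switch between box-wise and global periodic boundaries can be sketched as below. This is a hedged illustration only: the helper names `patchwork_state` and `neighbors` are hypothetical, the system is assumed to be split into vertical strips of equal width (so $L$ must be divisible by the number of boxes), and the actual box geometry used for Fig. 10 may differ.

```python
import random


def patchwork_state(L, left=("D", "C", "O"), right=("D", "O", "E"), rng=random):
    """L x L lattice whose left and right halves hold different strategy triplets."""
    half = L // 2
    return [[rng.choice(left) if j < half else rng.choice(right)
             for j in range(L)]
            for _ in range(L)]


def neighbors(i, j, L, n_boxes=2, boxed=True):
    """Four nearest neighbors of site (i, j).

    boxed=True  : columns wrap inside their own vertical strip, so the
                  strips relax independently (box-wise PBC).
    boxed=False : usual periodic boundaries of the whole lattice.
    """
    if L % n_boxes != 0:
        raise ValueError("L must be divisible by n_boxes")
    width = L // n_boxes if boxed else L
    start = (j // width) * width                 # first column of the strip
    left = start + (j - start - 1) % width
    right = start + (j - start + 1) % width
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, left), (i, right)]
```

In such a setup one would relax the lattice with `boxed=True` and then continue the same run with `boxed=False`, which opens the vertical interfaces between the two solutions.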

FIG. 10: (Color online) Spatial competition between two solutions of three-strategy subsystems in the weak peer punishment limit (m = 1/(G − )) for r = 2. , β = 0. , and γ = 0. . Before invasions are allowed at t = 0 MCS along the vertical interfaces, both stationary solutions [(D+C+O)_c (left) and (D+O+E)_c (right)] have developed without disturbing each other in the corresponding regions. Snapshots of an L = 400 × part of an L = 800 × system are taken at t = 0 MCS (a), MCS (b), MCS (c), and from the stationary state (d). The lower panel shows the colors of the strategies and their relations at these parameters (each arrow points towards the strategy that is invaded by the other). These are black for D, white for C, blue (dark gray) for O, and orange (light gray) for E.

If the system size is large enough, there is always a chance that all possible solutions emerge locally somewhere in the system, and the most stable solution can finally prevail in the whole system through an invasion process. This expectation is not necessarily satisfied, particularly if the system size is small (such as L < for c ≈ . , r = 2). In addition, the "small" size of the system limits the characteristic size of patterns and prevents the formation of phases with significantly larger correlation lengths. These are the reasons why one cannot obtain reliable MC results on small systems when analyzing the system behavior in the vicinity of a critical point where the correlation length diverges [62]. Further difficulties arise from the fact that the present spatio-temporal patterns can be characterized by two or more length scales, which prevents the straightforward application of methods (e.g., finite-size scaling) developed in statistical physics for the investigation of simpler systems [63].

Moreover, a small size decreases the probability of the emergence of phases that require longer relaxation through a complex evolutionary process. Figure 11 demonstrates the related difficulties of numerical simulations we faced when studying this system for sizes as large as L = 5000. Despite the large system size, the final state is still ambiguous if the system is started from a random initial state. In most cases the system evolves to either the D or the O state, as demonstrated by the upper two plots of Fig. 11. Only very few runs result in a third type, the (D+C+O)_c phase. In order to justify the stability of the (D+C+O)_c phase we have performed further stability analyses. Namely, starting from a three-strategy initial state, the stochastic evolution of the (D+C+O)_c phase is interrupted at a time (indicated by an arrow in the bottom plot of Fig. 11), half of the system is replaced by a large domain of the E phase, and afterwards the simulation is continued. The time dependence of the strategy frequencies quantifies how the original solution is restored. A similar analysis can be done to justify the superiority of the (D+C+O)_c phase over the O phase. It is worth mentioning that this unavoidable analysis is not time-consuming due to the smaller system size used in simulations.
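The perturbation test described above (interrupt the run, overwrite half of the lattice with a homogeneous domain, continue, and follow the strategy frequencies) can be sketched in Python as follows. This is only a bookkeeping sketch under stated assumptions: the function names are hypothetical, the lattice is a list of rows holding the labels D, C, O, and E, and the evolutionary update itself is passed in as a user-supplied callable `update` because the payoff and imitation rules are not reproduced here.

```python
def insert_domain(lattice, strategy, fraction=0.5):
    """Overwrite the leftmost `fraction` of columns with a single strategy."""
    cut = int(len(lattice[0]) * fraction)
    for row in lattice:
        row[:cut] = [strategy] * cut
    return lattice


def strategy_frequencies(lattice, strategies=("D", "C", "O", "E")):
    """Fraction of sites occupied by each strategy."""
    counts = {s: 0 for s in strategies}
    total = 0
    for row in lattice:
        for s in row:
            counts[s] += 1
            total += 1
    return {s: c / total for s, c in counts.items()}


def record_recovery(lattice, update, n_steps):
    """Apply `update` n_steps times and record the frequencies after each step."""
    history = [strategy_frequencies(lattice)]
    for _ in range(n_steps):
        update(lattice)  # assumed to perform one Monte Carlo step in place
        history.append(strategy_frequencies(lattice))
    return history


if __name__ == "__main__":
    lattice = [["D", "C", "O", "D"], ["C", "O", "D", "C"]]
    insert_domain(lattice, "E")
    # A no-op update stands in for the real dynamics in this illustration.
    history = record_recovery(lattice, lambda lat: None, n_steps=3)
    print(history[0])
```

With a real update rule in place of the no-op, the recorded history plays the role of the time series in the bottom plot of Fig. 11: the inserted strategy either decays and the original frequencies are restored, or it takes over.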
Furthermore, such a conclusive test cannot be avoided when the model contains more than three competing strategies.

Now we discuss two (perpendicular) cross-sections of the cost-fine phase diagram at r = 2 (Fig. 9) where the competition between the two punishing strategies plays a relevant role. The upper plot of Fig. 12 shows the variation of strategy frequencies in the stationary state when the fine is varied from β = 0. to β = 1. at a fixed cost. The reader can observe that the four-strategy D+C+O+E phase emerges via a continuous transition from the phase (D+C+O)_c when the fine is increased, and subsequently it transforms abruptly into the phase D_h(O), where only defectors are present in the final stationary state. As previously, the subscript in the notation D_h(O) refers to a homoclinic instability, which differs from those discussed in the previous section. In the present case the homogeneous D phase can be invaded by the offspring of pool punishers if they help each other by forming a sufficiently large domain. At the same time the growing domain of pool punishers can be eliminated by the offspring of either a single cooperator or a single peer punisher inserted into the territory of pool punishers as a mutant created with an arbitrarily small rate. In both cases defectors play the role of tertius gaudens and take over the whole population. For the given cost the peer punishers can beat defectors (with or without the presence of others) if β > . , when the system evolves into the phase E.

The lower plot of Fig. 12 illustrates three consecutive phase transitions when γ is increased from 0 to 0.1 for a fixed value of the fine (β = 0. ). Notice that within the four-strategy phase the frequency of cooperators is low (ρ_C < . ). Despite the low values of ρ_C, the presence of cooperators influences the efficiency of punishing strategies in a complex way, as indicated by Fig. 12.

FIG. 11: (Color online) The upper two plots show evolutionary processes within the region of the (D+C+O)_c phase when the system is started from a random initial state for L = 5000, using identical r = 2. , β = 0. , γ = 0. , and m = 1/(G − ) parameter values. The bottom plot demonstrates the stability of the (D+C+O)_c phase if we insert a large E domain into the given state at t = 0 MCS (here L = 1200).

FIG. 12: (Color online) Strategy frequencies vs. fine if γ = 0. (upper plot) and vs. cost if β = 0. (bottom plot) for m = 1/(G − ) and r = 2. . Notation of phases is indicated at the top.

V. CONCLUSIONS

In this work we have compared the efficiency of pool (institutional) and peer (individual) punishment within the framework of the spatial public goods game when the strategy evolution is controlled by stochastic imitation (resembling Darwinian selection). This study is considered an initial effort to understand why some societies rely mainly on peer punishment while others prefer pool punishment. As a general conclusion, the outcome in a structured population may depend sensitively on the parameter values that characterize the relation between the punishment strategies.

Both types of punishment are applied by cooperative players in different ways. The present four-strategy model exhibits a wide variety of final stationary behaviors in the limit of infinitely large system size when the model parameters (synergy factor, cost and fine of punishment) are tuned for a fixed level of noise. In many cases the peer punisher strategy seems to be more efficient in eliminating the "tragedy of the commons", in which all players choose defection. The numerical analysis allowed us to identify phases where both types of punishment coexist, sometimes together with (pure) cooperators that weaken the efficiency of punishment. We have found regions in the parameter plane where the competition between the different punishments helped defectors to take over the whole system.

Finally we emphasize some additional and general conclusions extracted during the numerical analysis of the present four-strategy evolutionary game on a square lattice. Namely, we have observed an interesting phase where the spreading of one of the strategies (here cooperation) is favored along an interface and the resulting monolayer can reverse the direction of invasion between the separated homogeneous domains. We think that the structure of the present interactions (namely, the players' income is accumulated from five five-person games) provides convenient conditions for studying these types of self-organizing patterns.
Furthermore, we should stress the technical difficulties in the evaluation of phase diagrams describing the boundaries between distinguishable stationary behaviors in the limit L → ∞. It turned out that, using the concept of competing associations [2], we should check the direction of invasion between most pairs of solutions characterizing the spatio-temporal patterns of all possible subsystems if we wish to avoid artifacts related to the complex finite-size effects. At the same time, the application of this approach may enhance the accuracy and efficiency of the numerical investigations when quantifying the phase boundaries in the large-size limit. Evidently, a systematic investigation of finite-size effects and of the expansion of one subsystem solution into another is indispensable in similar complex systems.

We thank Karl Sigmund for initiating the present investigations and for stimulating discussions. This work was supported by the Hungarian National Research Fund (grant K-73449), the Bolyai Research Grant, and the COST Action MP0801 (Physics of Competition and Conflicts).

[1] M. A. Nowak, Evolutionary Dynamics (Harvard University Press, Cambridge, MA, 2006).
[2] G. Szabó and G. Fáth, Phys. Rep. , 97 (2007).
[3] G. Hardin, Science , 1243 (1968).
[4] C. Hauert, S. De Monte, J. Hofbauer, and K. Sigmund, Science , 1129 (2002).
[5] M. A. Nowak, Science , 1560 (2006).
[6] C. Hauert, A. Traulsen, H. Brandt, M. A. Nowak, and K. Sigmund, Science , 1905 (2007).
[7] C. Hauert, A. Traulsen, H. Brandt, M. A. Nowak, and K. Sigmund, Biol. Theor. , 114 (2008).
[8] J. Y. Wakano, M. A. Nowak, and C. Hauert, Proc. Natl. Acad. Sci. USA , 7910 (2009).
[9] X.-B. Cao, W.-B. Du, and Z.-H. Rong, Physica A , 1273 (2010).
[10] J. Zhang, X. Chen, C. Zhang, L. Wang, and T. Chu, Physica A , 4081 (2010).
[11] Z. Rong, Z.-X. Wu, and W.-X. Wang, Phys. Rev. E , 026101 (2010).
[12] H.-Y. Cheng, H.-H. Li, Q.-L. Dai, Y. Zhu, and J.-Z. Yang, New J. Phys. , 123014 (2010).
[13] M. Perc and Z. Wang, PLoS ONE , e15117 (2011).
[14] Y.-T. Lin, H.-X. Yang, Z.-X. Wu, and B.-H. Wang, Physica A , 77 (2011).
[15] H. Cheng, Q. Dai, H. Li, Y. Zhu, M. Zhang, and J. Yang, New J. Phys. , 043032 (2011).
[16] D. Peng, H.-X. Yang, W.-X. Wang, G. R. Chen, and B.-H. Wang, Eur. Phys. J. B , 455 (2010).
[17] F. Du and F. Fu, Dyn. Games Appl. (2011).
[18] R. Suzuki and T. Arita, Int. J. Bio-Inspired Computation , 151 (2011).
[19] T. Ohdaira and T. Terano, Adv. Complex Syst. , 377 (2011).
[20] E. Fehr and S. Gächter, Am. Econ. Rev. , 980 (2000).
[21] E. Fehr and S. Gächter, Nature , 137 (2002).
[22] K. Sigmund, Trends Ecol. Evol. , 593 (2007).
[23] K. Sigmund, The Calculus of Selfishness (Princeton University Press, Princeton, NJ, 2010).
[24] K. Panchanathan and R. Boyd, Nature , 499 (2004).
[25] D. Helbing, A. Szolnoki, M. Perc, and G. Szabó, PLoS Comput. Biol. , e1000758 (2010).
[26] D. Helbing, A. Szolnoki, M. Perc, and G. Szabó, New J. Phys. , 083005 (2010).
[27] K. Sigmund, H. De Silva, A. Traulsen, and C. Hauert, Nature , 861 (2010).
[28] K. Sigmund, C. Hauert, A. Traulsen, and H. De Silva, Dyn. Games Appl. , 149 (2011).
[29] A. Szolnoki, G. Szabó, and M. Perc, Phys. Rev. E , 036101 (2011).
[30] G. Szabó and C. Hauert, Phys. Rev. Lett. , 118101 (2002).
[31] A. Szolnoki and M. Perc, EPL , 38003 (2010).
[32] G. Szabó and C. Hauert, Phys. Rev. E , 062903 (2002).
[33] V. C. L. Hutson and G. T. Vickers, Phil. Trans. R. Soc. Lond. B , 393 (1995).
[34] T. Reichenbach, M. Mobilia, and E. Frey, J. Theor. Biol. , 368 (2008).
[35] E. Frey, Physica A , 4265 (2010).
[36] A. Szolnoki, Z. Wang, J. Wang, and X. Zhu, Phys. Rev. E , 036110 (2010).
[37] T. Clutton-Brock and G. A. Parker, Nature , 209 (1995).
[38] E. Fehr and B. Rockenbach, Nature , 137 (2003).
[39] D. Semmann, H.-J. Krambeck, and M. Milinski, Nature , 390 (2003).
[40] D. J.-F. de Quervain, U. Fischbacher, V. Treyer, M. Schellhammer, U. Schnyder, A. Buck, and E. Fehr, Science , 1254 (2004).
[41] J. Henrich, Science , 60 (2006).
[42] T. Sasaki, I. Okada, and T. Unemi, Proc. R. Soc. Lond. B , 2639 (2007).
[43] M. Egas and A. Riedl, Proc. R. Soc. B , 871 (2008).
[44] S. Bowles and H. Gintis, Theor. Pop. Biol. , 17 (2004).
[45] H. Brandt and K. Sigmund, Proc. Natl. Acad. Sci. USA , 2666 (2005).
[46] K. Sigmund, C. Hauert, and M. A. Nowak, Proc. Natl. Acad. Sci. USA , 10757 (2001).
[47] H. Ohtsuki, Y. Iwasa, and M. A. Nowak, Nature , 79 (2009).
[48] J. Henrich and R. Boyd, J. Theor. Biol. , 79 (2001).
[49] A. Dreber, D. G. Rand, D. Fudenberg, and M. A. Nowak, Nature , 348 (2008).
[50] A. Szolnoki, M. Perc, and G. Szabó, Phys. Rev. E , 056109 (2009).
[51] D.-M. Shi, H.-X. Yang, M.-B. Hu, W.-B. Du, B.-H. Wang, and X.-B. Cao, Physica A , 4646 (2009).
[52] R.-R. Liu, C.-X. Jia, and B.-H. Wang, Physica A , 5719 (2010).
[53] D.-M. Shi, Y. Zhuang, and B.-H. Wang, EPL , 58003 (2010).
[54] X.-J. Chen and L. Wang, EPL , 38003 (2010).
[55] G. Szabó and C. Tőke, Phys. Rev. E , 69 (1998).
[56] P. Clifford and A. Sudbury, Biometrika , 581 (1973).
[57] J. T. Cox and D. Griffeath, Ann. Probab. , 876 (1983).
[58] T. M. Liggett, Interacting Particle Systems (Springer, New York, 1985).
[59] I. Dornic, H. Chaté, J. Chave, and H. Hinrichsen, Phys. Rev. Lett. , 045701 (2001).
[60] D. Helbing, A. Szolnoki, M. Perc, and G. Szabó, Phys. Rev. E , 057104 (2010).
[61] J. R. N. Chiappin and M. J. de Oliveira, Phys. Rev. E , 6419 (1999).
[62] D. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics (Cambridge University Press, Cambridge, 2000).
[63] H. E. Stanley,