Social imitation vs strategic choice, or consensus vs cooperation in the networked Prisoner's Dilemma
Daniele Vilone, Jose J. Ramasco, Angel Sánchez, Maxi San Miguel
LABSS (Laboratory of Agent Based Social Simulation), Institute of Cognitive Science and Technology, National Research Council (CNR), Via Palestro 32, 00185 Rome, Italy
Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), 07122 Palma de Mallorca, Spain
Grupo Interdisciplinar de Sistemas Complejos (GISC), Departamento de Matemáticas, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, 50018 Zaragoza, Spain
The interplay of social and strategic motivations in human interactions is a largely unexplored question in collective social phenomena. Whether individuals' decisions are taken on a purely strategic basis or due to social pressure without a rational background crucially influences the model outcome. Here we study a networked Prisoner's Dilemma in which decisions are made either based on the replication of the most successful neighbor's strategy (unconditional imitation) or by pure social imitation following an update rule inspired by the voter model. The main effects of the voter dynamics are an enhancement of the final consensus, i.e., asymptotic states are generally uniform, and a promotion of cooperation in certain regions of the parameter space as compared to the outcome of purely strategic updates. Thus, voter dynamics acts as an interface noise and has a similar effect to a pure random noise; furthermore, its influence is mostly independent of the network heterogeneity. When strategic decisions are made following other update rules, such as the replicator or Moran processes, the dynamic mixed state found under unconditional imitation for some parameters disappears, but an increase of cooperation in certain parameter regions is still observed. Comparing our results with recent experiments on the Prisoner's Dilemma, we conclude that such a mixed dynamics may explain moody conditional cooperation among the agents.
PACS numbers: 89.75.Fb, 87.23.Ge, 87.23.Kg
Keywords: complex systems, evolutionary game theory, prisoner's dilemma, voter model, cooperation, consensus
I. INTRODUCTION
Collective social phenomena arise from interactions of individuals, considered the elementary units of social structures [1]. Related phenomena include, e.g., strategic interactions, opinion/cultural dynamics, epidemic and rumor spreading, and so on. In many of these problems, the dynamics is not uniquely defined as in a Hamiltonian system, and it is often the case that social contexts can evolve in time in many different manners [2]. This is particularly relevant in strategic interactions, in which people take into account their expectations about what their partners might choose to do [3, 4].

A specific and socially important problem where the dynamics leads to very different outcomes is the emergence of cooperation [5–8]. Indeed, the spread and prosperity of cooperative behaviors, often observed in human societies despite their disadvantage in terms of the fitness of single individuals, has been for years a central topic in Evolutionary Dynamics, Economics, Sociology and Psychology. Several models, mechanisms and ideas have been suggested in order to explain the emergence and the stability of cooperation [9–11], particularly in games where cooperation is costly, such as the Prisoner's Dilemma (PD). One such mechanism is so-called network reciprocity, first proposed by Nowak and May [12]. They showed that if the PD is played on a lattice, frozen configurations can appear where cooperators survive and even overwhelm defectors. Subsequent research established that the effect of the topology is not universal [13] (see also [14] for a recent review), and that the enhancement of cooperative strategies depends on the network properties, the dynamical update rule, the exact entries of the payoff matrix, the size of the system and the time structure [15–21].
The theoretical discussion was finally settled by several experiments [22–26], which established that networks do not exhibit a cooperation level significantly higher than a well-mixed population. Interestingly, those experimental works allow us to conclude that most of the update rules that have been used in analytical or numerical studies of the evolutionary PD are not used by actual people when playing the game, and therefore other dynamics must be considered.

In this paper, we propose an evolutionary update rule that combines strategic thinking with imitation of a more social character, and study how it affects the outcome of the networked PD. Indeed, it has been noticed that in the real world a key point is that people do not reach a decision only on the basis of strategic reasoning, but also considering the social pressure of their environment, with the possibility of making mistakes [27, 28]. The first and most common way for an individual to follow social pressure is to imitate a neighbor's act or opinion without any strategic considerations behind it. To represent this, in a previous work [29] we introduced a model where agents can evolve by a mixed dynamics of pure and strategic imitation in a coordination game, finding that such mixing makes it possible for the system to order in one of the two absorbing states in situations in which neither the pure coordination game nor the voter model reaches consensus. Our aim here is to broaden the topic by studying what happens when pure imitation acts together with strategic dynamics on the PD, verifying whether in this case, too, the non-strategic behavior leads to a more convenient final configuration for the whole society. In addition, we will compare our results to the available experimental evidence, drawing conclusions as to the relevance of the proposed dynamics to the real world.

The paper is organized as follows: in Sec. II, we present in detail our model and the rationale behind its definition.
Section III then deals with the numerical results obtained in two relevant systems, namely random (Erdős–Rényi, ER) and scale-free (SF) networks. Subsequently, in Sec. IV, we analyze how the system outcome changes when other update dynamics are used for the strategic decisions. Section V discusses our work in the light of the experiments, and finally Sec. VI summarizes our main findings.

II. MODEL DESCRIPTION
Our model consists of a system of N agents on a network, which for the purpose of this work will be either Erdős–Rényi or random SF (generated with the configurational model), as paradigmatic examples to explore the effects of heterogeneity in the number of connections. Additionally, experiments on the behavior of human subjects have been carried out on the same type of inhomogeneous networks [25], which facilitates the comparison between our results and the experimental ones. Each agent interacts directly only with her nearest neighbors in the network, and can choose between two possible actions, represented by a two-valued integer variable ("action") s = ±1. We identify the positive value with cooperation (C) and the negative one with defection (D). Individuals interact in a PD, i.e., they play with their neighbors and then collect a payoff according to the actions adopted by themselves and their opponents. Payoffs are collected according to the following payoff matrix:

         C    D
    C    1    0
    D    T    ε        (1)

where the punishment parameter ε must fall in the interval [0, 1) and the temptation T > 1; in our simulations T is kept fixed while ε (and the mixing parameter q introduced below) is varied. Note that this is not the usual way to work with the PD in most papers, as typically ε is set to 0 and the payoff to C vs D lies in a range of zero or negative values. The reason for this particular choice is to facilitate the interpretation of our results in the light of earlier work of ours on symmetric coordination games [29], given by the same payoff matrix as in Eq. (1) with T = 0 and ε = 1. In the game considered here, it is well known that the rational choice is always to defect, not only to prevent betrayals by other players but also because defection always gives a higher payoff no matter what the other player does (in economic terms, it is a Nash equilibrium [4]). However, it is also clear that the global gain would be higher if both agents cooperated, and precisely herein lies the dilemma.

Having defined our strategic context, we now turn to the dynamics. After every interaction, with probability q an individual updates her strategy by simply imitating a neighbor chosen at random, that is, following the voter model (VM) dynamics. Otherwise, with probability 1 − q, an update rule based on the actual fitness attained by herself and her neighbors is implemented. For the main part of the paper, we will use unconditional imitation (UI) as the strategic update rule. With UI, at the end of each round of the game, every player imitates the strategy of the neighbor that has obtained the best payoff, provided it is larger than her own. This update rule was introduced in [12] as a simple way to implement the interest of a (rational) agent in maximizing her own fitness by imitating the most successful individuals. It is important to note that there are other strategic imitation rules that could be considered, some of which are discussed in Sec. IV below as alternatives to our main model. Experimental results, however, show that UI might resemble the actions of the players in real life, provided they do not stick to the update rule all the time [24, 26].
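To make the model concrete, one round of play followed by the mixed update can be sketched as follows. This is a minimal illustration under our own conventions, not the authors' code: the network is a plain adjacency-list dictionary, the update is synchronous, and the default values of T and ε are only admissible example choices.

```python
import random

def payoff(my, opp, T=1.4, eps=0.1):
    """Payoff matrix of Eq. (1): actions coded 1 (C) and -1 (D)."""
    if my == 1:
        return 1.0 if opp == 1 else 0.0
    return T if opp == 1 else eps

def play_round(neighbors, strategy):
    """Accumulated payoff of every agent after playing all her neighbors."""
    return {i: sum(payoff(strategy[i], strategy[j]) for j in neighbors[i])
            for i in neighbors}

def update(neighbors, strategy, q):
    """One synchronous update: VM imitation with prob. q, else UI."""
    fitness = play_round(neighbors, strategy)
    new = {}
    for i in neighbors:
        if random.random() < q:               # social imitation (voter model)
            new[i] = strategy[random.choice(neighbors[i])]
        else:                                 # unconditional imitation (UI)
            best = max(neighbors[i], key=fitness.get)
            new[i] = strategy[best] if fitness[best] > fitness[i] else strategy[i]
    return new
```

With q = 0 the step reduces to pure UI and is fully deterministic; with q = 1 it reduces to the pure voter model.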
This is a further motivation to consider a mixed dynamics as we propose here. On the other hand, regarding the VM, in spite of its simplicity it has been shown to capture some features of the way people behave in, e.g., electoral processes [31], and therefore it is a very suitable way to introduce mechanisms of social imitation in an otherwise strategic problem.

For reference, it is useful to recall the effects of either of our dynamics when acting alone. In high-dimensional lattices (d ≥ 3) and random networks, the VM dynamics drives the system to a disordered active state, whose proportion of opinions (in our case, actions) is given by the initial conditions [1, 30]. This state is dynamic in the ensemble average: the opinions of the agents keep changing along time. In finite systems, fluctuations will eventually drive the system to consensus, which is an absorbing state. The characteristic time to reach consensus grows with the system size N, diverging in the limit N → ∞. As for the strategic dynamics, while in mean field the PD ends up in a consensus (frozen) state with all defectors regardless of the update rule, in complex topologies it is possible that some cooperators survive, even though this effect is not universal and the final state depends sensitively on an entire set of different parameters. A full summary of the different outcomes can be found in the review [14]. For our basic rule, UI, the outcome of the evolution on random networks is in general also full defection, but in a small parameter region around ε = 0 cooperation prevails, the more so the lower the degree of the network. Such promotion of cooperation in a bounded range of parameters takes place as well for scale-free networks, although in this case the cooperative region is smaller than for random networks. Note, however, that (whether ordered or disordered) the final outcome of the system on random networks with strategic rules is in general a frozen state, where no player changes her decision anymore. Finally, it is also interesting to mention what happens when the interaction is given by a coordination game [29]. In this case, as explained, the final outcome of either rule (VM or UI) alone is a disordered state, active or frozen, but the mix of UI and VM completely changes the scenario, because the system tends to consensus, either rapidly when the VM is dominant or slowly when it acts as a noise over the game-like evolution.

FIG. 1. Final active bond density (top) and cooperator density (bottom) as a function of the dynamics mixing parameter q and the punishment ε for a system on an ER random network of size N = 3000 and average degree ⟨k⟩ = 8.48. The interval of parameter space investigated is (ε, q) ∈ [0, 0.5] × (0, 1); q = 0 and q = 1 are excluded.
FIG. 2. Final active bond density (top) and cooperator density (bottom) as a function of the mixing parameter q and ε for a system on networks of size N = 3000. The network is an ER on the left and scale-free on the right, with average degree ⟨k⟩ = 5.14. The interval of parameter space investigated is (ε, q) ∈ [0, 0.5] × (0, 1); q = 0 and q = 1 are excluded.

III. NUMERICAL RESULTS
We begin the report of our numerical results by checking whether the mixing of the two dynamics influences how the system reaches the final state, and the nature of the final state itself. Unless otherwise stated, we will be presenting results averaged over 1000 system realizations. Figure 1 summarizes our results for an ER random network with ⟨k⟩ = 8.48. In the top plot, the final value of the active link density is shown as a function of the punishment and of the mixing parameter. The active link density is the fraction of links connecting nodes with opposite strategies. In the case of VM dynamics, these links are susceptible to further updates. As can be seen in the bottom panel, for a majority of parameter choices the system becomes frozen in a consensus state (either all C or all D). Nevertheless, there exists a parameter region in which the fate of the system is not a frozen but a dynamic state. As mentioned above, this is not the case when the game is one of coordination, and therefore it is a new phenomenon arising for the PD. This region corresponds to low values of ε, meaning that the incentive to defect is relatively small, and high values of q, i.e., a large probability of imitating a neighbor at random (VM). The dynamics is thus such that, as the transition from cooperator to defector is relatively slow due to the strategic update, there remains a pool of cooperators that can be socially copied, and an overall intermediate level of cooperation is maintained. In the regions where full consensus is reached, the cooperator density in the plot is actually the probability that the final configuration of a realization is a full C state. An interesting feature demonstrated by these plots is that cooperation is enhanced with respect to the classical mean-field case and, although the comparison is not straightforward due to the different parameterization, it is also enhanced when compared with pure UI dynamics on a random network.

FIG. 3. Time evolution of the average active link density n_a(t) for a system on networks of size N = 3000; a random network on the left and a scale-free network on the right, both with ⟨k⟩ = 5.14. Top panels correspond to a low value of ε (0.05 in both cases); bottom panels to higher punishment (0.3 and 0.4, respectively).

As these results are obtained for a specific type of network, it is important to check their generality when the average degree or the full network topology is changed. For comparison, Figure 2 presents the final values of the active link density and the cooperator density for an ER random network and for a scale-free network of average degree ⟨k⟩ = 5.14. It can be seen that decreasing the number of neighbors leads to an expansion of the region where cooperation is observed, in agreement with what takes place with UI dynamics only; as the number of neighbors decreases, it becomes more likely to imitate a cooperator, and the network may be driven towards full cooperation. On the contrary, the case of the scale-free network behaves differently: adding more heterogeneity to the network leads to a slightly larger region of cooperative behavior and, at the same time, to a smaller region where the dynamics leads to an active state. Notwithstanding, the general conclusion one can draw from these plots is that the network properties, at least in the class of uncorrelated networks we are looking at, do not affect very much the outcome of the mixed dynamics.

Let us focus now on the details of the time evolution of the system. Figure 3 shows examples of the behavior of the average active link density in a system evolving on a random or on a scale-free network for different values of the punishment ε. For larger values of the punishment, we find that the system always reaches a consensus state, following an exponential decay for high values of q, and a power-law-like decay for small values of the mixing parameter. This behavior is qualitatively the same in both types of networks, and it is in good agreement with the results of the model studied in [29], where players interacted through a coordination game. When the punishment is small, for small q the system still reaches a consensus state; for higher q, however, it ends up in a dynamic state. It is interesting to stress that, for the q values chosen in the case of the scale-free network, convergence to consensus or to an active mixed state is in fact a reentrant phenomenon, as was expected from the results in Figure 2.
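The active link density tracked in these figures is straightforward to measure on a given configuration; a minimal helper, assuming the same adjacency-list representation of the network (our own illustrative code):

```python
def active_link_density(neighbors, strategy):
    """Fraction of undirected links joining nodes with opposite actions (+1/-1)."""
    active = total = 0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if i < j:                      # count each undirected link once
                total += 1
                if strategy[i] != strategy[j]:
                    active += 1
    return active / total if total else 0.0
```

A value of zero signals that no further voter-model update can change anything, i.e., the configuration is uniform or the two strategies occupy disconnected components.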
The fact that the system reaches a truly active final state is proven in Figure 4, where the time needed to reach consensus, τ, is shown to diverge exponentially with the system size. In the thermodynamic limit, the system never orders.

To gain further insight into the properties of the final state, we check whether cooperation is promoted or not by the influence of the VM dynamics. The final cooperator density is shown in Figure 5 as a function of q for ε = 0.05. The aim is to provide a clearer picture than the general phase diagrams (ε, q) discussed above. Even though in the absence of any other model contribution (q = 0) cooperation is already promoted (as compared to mean field) due to topological effects [14], we can notice how initially the mixing of dynamics actually enhances cooperation compared with the case of the pure PD game. Subsequently, upon increasing q, the final cooperator density begins its decay towards zero (not shown), apart from the singular limit at q = 1 where, as already stated, only the voter dynamics is acting and the system remains in an active disordered state (more precisely, the limit of n_c^∞ for q → 1⁻ is zero, but an infinite time is needed to reach it). Once again, the behavior is similar on the two types of networks studied. This is reminiscent of the promotion of cooperation due to random noise (or mutation) analyzed in [32]. However, the two models are different, because in the case of [32] noise acted upon any network node, through a random change in the current action of the player. On the contrary, what we have here is interfacial noise, because it only acts through imitation of neighbors who are playing a different action. Therefore, the noise introduced by the VM does not act on players who are completely surrounded by others with their same action. Notwithstanding, we do observe that an appropriate amount of VM leads to an increase of cooperation.

FIG. 4. Dependence of the time needed to reach the final state, τ, on the system size N. Operatively, τ has been measured as the time when the active link density dropped below 0.07. The networks have a random (left) or scale-free (right) topology with average degree ⟨k⟩ = 5.14; ε = 0.05 and q is fixed at a high value. Assuming τ ∼ exp(γN), we have estimated γ for both topologies (exponential fits shown).

FIG. 5. Final cooperator density for a system on a random (left) or scale-free (right) network of size N = 3000 and average degree ⟨k⟩ = 5.14. The punishment parameter is ε = 0.05. As q = 0 means a purely strategic update of the PD, note that there is a slight increase of n_c^∞ as soon as q becomes larger than zero.

FIG. 6. Time evolution of the average active links for a system on a random network of size N = 3000 and average degree ⟨k⟩ = 8.48, with punishment ε = 0.1 (top) and ε = 0.4 (bottom). Left panels: REP update rule; right panels: MOR rule.

IV. OTHER UPDATE RULES
To complete our study and enlarge our understanding of the combined effects of social imitation and strategic dynamics, we now consider two additional dynamics, namely the replicator (REP) and Moran (MOR) rules. With REP, agents choose a neighbor at random: if the payoff of the chosen neighbor is lower than the agent's own, nothing changes, but if it is larger, the agent adopts the neighbor's strategy with a probability proportional to the difference between the two payoffs. On the other hand, with MOR, agents choose one of their neighbors with probability proportional to their payoffs, without considering whether it is larger than their own or not. We chose these two rules because they complement our study of UI: indeed, UI is a local deterministic rule (an agent watches her whole neighborhood and the evolution is completely predetermined by the rule), REP is pairwise and stochastic (an agent decides how to evolve watching only one neighbor at a time, and the result of the evolution is not univocally determined), and MOR is local and stochastic. Moreover, MOR allows individuals to make mistakes (there is a non-zero probability of imitating a neighbor with worse fitness). In fact, MOR can be understood as a weighted social imitation, where the weight is the success of the observed individual. We stress, however, that these two rules have not been observed in experiments, and therefore their interest here arises from the viewpoint of understanding the mechanisms of the mixed dynamics.

The first important difference when the update rule changes is the disappearance of the final active state for any value of the parameters. In all of the (ε, q) phase space, the behavior is always as in the examples depicted in Figure 6: the active link density vanishes and a final consensus is reached, with all the network choosing the same action (either C or D). Moreover, the decay is in all cases exponential, pointing to the power-law behavior observed with UI as a specificity of that algorithm.

FIG. 7. Final cooperator density as a function of q and ε for a system on a random (left) or scale-free (right) network of size N = 3000 and average degree ⟨k⟩ = 8.48 and ⟨k⟩ = 5.14, respectively. Top: REP update rule. Bottom: MOR rule. The interval of parameter space investigated is (ε, q) ∈ [0, 0.5] × (0, 1); q = 0 and q = 1 are excluded.
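In the same spirit as before, the two additional rules can be sketched as single-agent updates. This is our own illustrative code, not the authors': `fitness` maps agents to the payoffs collected in the current round, and the REP normalization constant `norm` (any bound on the maximum payoff difference) is an assumption, since the paper only states proportionality.

```python
import random

def rep_update(i, neighbors, strategy, fitness, norm):
    """Replicator (REP): compare with ONE random neighbor; switch with
    probability proportional to the payoff difference when it is positive."""
    j = random.choice(neighbors[i])
    if fitness[j] > fitness[i] and random.random() < (fitness[j] - fitness[i]) / norm:
        return strategy[j]
    return strategy[i]

def mor_update(i, neighbors, strategy, fitness):
    """Moran (MOR): imitate a neighbor chosen with probability proportional
    to her payoff, even if that payoff is lower than one's own."""
    nbrs = neighbors[i]
    weights = [fitness[j] for j in nbrs]
    total = sum(weights)
    if total <= 0:                 # all neighbors with zero payoff: pick uniformly
        return strategy[random.choice(nbrs)]
    r = random.uniform(0.0, total)
    acc = 0.0
    for j, w in zip(nbrs, weights):
        acc += w
        if r <= acc:
            return strategy[j]
    return strategy[nbrs[-1]]
```

Note how MOR never compares payoffs, so mistakes (imitating a worse-off neighbor) occur with finite probability, whereas REP can only move towards higher payoffs.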
In practice, what we find is that the stochasticity of the strategic evolution algorithm speeds up the dynamics and helps the system always reach a final consensus, whereas the pairwise or local character of the rule is less relevant. As with the UI strategic dynamics, the type of network considered (within the broad class of uncorrelated networks of high dimensionality) is not relevant, in particular with regard to the final configuration. The MOR rule enhances cooperation with respect to REP, as shown in Figure 7, and this increase is always somewhat larger in scale-free networks. It can be noticed that while for REP the final cooperator density drops rapidly with increasing q, for agents evolving with the Moran rule it falls more slowly and over a longer range of q values (Fig. 8). In practice, while with the MOR rule inserting non-strategic imitation slightly enhances (or at least does not hinder) cooperation, just as with UI, with REP the voter dynamics appears to hamper the spreading of the cooperative strategy. As indicated above, MOR can be seen as a generalized voter model, and indeed the dynamics is similar in the sense that it leads with similar probability to either of the two possible outcomes. The final state is slightly biased towards the equilibrium but, interestingly, always a frozen configuration, which does not occur with the VM alone. Finally, it is clear that in both cases there is only a weak dependence on ε.

FIG. 8. Final cooperator density for a system on a random network of size N = 3000 and average degree ⟨k⟩ = 8.48. The punishment parameter is ε = 0.10. Left: REP update rule. Right: MOR rule. Note that since the final configuration is static, n_c^∞ corresponds to the fraction of realizations ending in full cooperation.

V. DISCUSSION
Thus far, we have studied the combination of strategic and social imitation dynamics as the driver of evolution in a networked PD. We have thus learned that when strategic updates obey the UI rule, the mixing of dynamics helps the system reach consensus in an all-C or all-D configuration. Note that for certain values of the parameters the final state is full cooperation, which is not an equilibrium of the underlying game. Furthermore, there is a small parameter region (high q and low ε) where the final fate of the system is a dynamic state with a fraction of cooperators and defectors.

In view of these results, a first question that arises is whether some analytical explanation of those observations is possible. Unfortunately, the approach used for this purpose in [29] does not work here due to the strongly different nature of the two games: VM and PD do not share the same equilibria in the way VM and the coordination game do. Indeed, the outcome of the evolution for the coordination game is always a Nash equilibrium, whereas when the PD converges to full cooperation, it is not. There is, however, a possibility to address the issue in a limited parameter regime by considering a random walk on a weighted directed network. It is known that the convergence time of the voter model has the same distribution as the convergence time of the coalescing random walk process to a single particle [33–35], which in its turn is related to the mean first passage time (MFPT) of a single walker [36]. Then, for q = 1 (i.e., pure VM dynamics) our problem is exactly that of the random walk on complex networks. On the other hand, for q < 1, because of the evolutionary rules, the agents with higher fitness can be seen as nodes into which the probability of a walker falling increases and, at the same time, from which it becomes more difficult to escape. Now, for an undirected network, the MFPT from whatever starting point to an arrival node j, τ_j^u, is [36, 37]

    τ_j^u = D / Σ_l A_lj ,    (2)

where A_lj is the adjacency matrix, which includes the weight of each link, and D is an appropriate constant. Considering a situation with a mixing parameter q close to one, say q = 1 − η with η small enough, the MFPT can be approximated by

    τ_j = τ_j^u + α_j(η) ,    (3)

where α_j is a suitable perturbation depending on η and on the node j. Then, if j is occupied by an individual with the best fitness of her whole neighborhood, and having in mind that the strategic evolution is UI, no walker will ever reach node j, i.e., α_j → ∞, and then τ_j diverges as well. For the other rules we have checked, with the MOR rule α_j always remains finite, whereas with the REP rule the system behaves in principle like UI; but since REP considerably hinders cooperation, after relatively few time steps it is easy for the finite system to undergo a fluctuation (due to the voter dynamics) that lands it in the all-defectors consensus. Of course, such an argument is only qualitative and does not fully explain what goes on in the dynamics for every value of q close to 1, but it at least provides some justification of our observations under the UI update rule.

FIG. 9. Evidence for moody conditional cooperation on systems given by a random network (left) and a scale-free network (right) of size N = 3000 and average degree ⟨k⟩ = 5.14. Top: frequency of cooperation after defection as a function of the fraction of cooperating neighbors (see text) in the previous round. Bottom: frequency of cooperation after cooperation as a function of the fraction of cooperating neighbors in the previous round. Note that the histograms are similar regardless of the network topology. The payoff matrix is the same as in [24] for comparison, with ε = 0, T = 1.4, and the payoff to a cooperator facing a defector vanishing as well; UI update rule.

FIG. 10. Time evolution of the fractions of cooperators and active links on a scale-free network of size N = 3000 and average degree ⟨k⟩ = 5.14. The payoff matrix is the same as in [24] for comparison, with ε = 0, T = 1.4, and the payoff to a cooperator facing a defector vanishing as well; UI update rule.

Beyond finding analytical insights into our results, the other important question that requires discussion is the relation of this dynamics with experiments. The experiments on square lattices [24, 26] and on heterogeneous networks [25] show that subjects behave in a way that has been called moody conditional cooperation (MCC). This means that the action they take depends on the number of cooperating neighbors they had in the previous round: the more neighbors cooperated, the more likely it is that the player cooperates. However, the choice depends also on the player's own previous action: thus, it has been shown that cooperation following cooperation is much more likely than cooperation following defection. We have therefore checked whether the dynamics of our simulations is compatible with this observation. To this end, we looked at the probability of cooperation of an agent as a function of her last action and of the fraction of cooperating neighbors the last time she was selected to play a round.
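This measurement can be tallied directly from a simulation history; a minimal sketch under our own conventions, not the authors' code (actions coded 1 for C and −1 for D, `history` a list of per-round action dictionaries, and the binning into `bins` equal context ranges an illustrative choice):

```python
def mcc_frequencies(history, neighbors, bins=3):
    """P(cooperate now | own previous action, fraction of cooperating
    neighbors in the previous round), binned into `bins` context ranges."""
    counts = {(a, b): [0, 0] for a in ("C", "D") for b in range(bins)}
    for prev, curr in zip(history, history[1:]):
        for i, nbrs in neighbors.items():
            frac = sum(prev[j] == 1 for j in nbrs) / len(nbrs)
            b = min(int(frac * bins), bins - 1)      # e.g. thirds when bins=3
            key = ("C" if prev[i] == 1 else "D", b)
            counts[key][1] += 1                      # times this context occurred
            if curr[i] == 1:
                counts[key][0] += 1                  # ...followed by cooperation
    return {k: (c / n if n else None) for k, (c, n) in counts.items()}
```

MCC then shows up as frequencies that grow with the context bin after cooperation ("C" keys) but stay low and flat after defection ("D" keys).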
Also, in order to get better statistics and to compare the results, we consider three ranges of cooperation contexts: when less than one third of the neighbors cooperated, when more than two thirds cooperated, and the intermediate case. We then estimate the probabilities of cooperation by looking at the frequencies with which the corresponding actions are chosen.

Our results are shown in Figure 9, where it can be seen that we indeed observe a behavior of the agents reminiscent of the experimental observations. Indeed, we see that after defecting the probability of cooperating in the next round is low and not very dependent on the context, while after cooperating there is an almost linear dependence of the probability of cooperating again on the number of neighboring cooperators, as reported in the experiments [24–26]. Moreover, again in accordance with experiments, there is no dependence on the topology. The simulation has been done with a value of q = 0.3, which we have chosen given that in the experimental data, if there is any imitation of the best neighbor, it happens at most in around 70% of the cases. However, it is important to stress that these measurements are carried out during the transient phase: as we already stated, for most of the parameter space the final state of the system is either full cooperation or full defection, and in particular for payoff values similar to those used in the experiments, the system converges to full cooperation (see Fig. 10). This is not in agreement with the experimental observations, where, if there is indeed convergence to a homogeneous state, it is very slow and much more likely towards full defection. In previous works [29, 38], we have observed a much slower dynamics for the cooperation fraction, but associated with other types of games or more involved opinion models. On the other hand, we have checked that the same behavior arises for a very wide range of values of q and ε, and hence in this sense the relation between moody conditional cooperation and our mixed dynamics is quite robust. In any event, we stress that such a mixing of dynamics is not a necessary condition to obtain MCC behavior, but a sufficient one. Therefore, while there are alternative explanations of the observed behavior in terms of players' learning [39], we believe that the fact that the outcome of our simulations mixing social and strategic imitation is not far, in behavioral terms, from the experimental results warrants further research to clarify the explanatory power of this mechanism or similar ones. In this respect, the observed lack of qualitative dependence of the agents' behavior on specific properties of the high-dimensional network where the game takes place, being another feature in which our simulations and the experiments coincide, makes such a study even more appealing.

VI. CONCLUSIONS
In this paper, we have provided evidence that social imitation dynamics, added to the strategic update rules driving the evolution of players' actions in a PD, leads to consensus. In addition, in an ample range of parameters, the selected consensus is cooperative, even though it is not a Nash equilibrium of the game. In this manner, social imitation given by a VM-like dynamics helps the individuals of a complex society adopt the most convenient behavior without being influenced by the risks involved in such a choice, again, at least for not too large punishments. It is important to realize that reaching cooperative outcomes is a product of our mixed dynamics and of the existence of a network, as in a well-mixed population the consensus is always full defection. Full cooperation appears when the punishment ε for mutual defection is small: in this case cooperation is not an equilibrium, but it is not very far from one, and in that situation social imitation of defectors is not fast enough to induce the defective consensus. We also want to stress that this is a general effect, not limited to the PD, as our earlier study [29] on the totally symmetric coordination game with unconditional imitation shows. Interestingly, in contrast to what happens in the coordination game, for large q and small ε the networked PD remains in a disordered dynamic state that is stable in the large-size limit and which, in turn, appears to be independent of the topology, at least as far as the heterogeneity of the network is concerned. Moreover, with the UI update rule, as in the coordination game, we also observe two regimes in the time evolution towards consensus: a power-law regime when q (the probability of social imitation) is small, so that the main driver of the agents' decisions is strategic, and an exponential one when q → 1. In this last case, the system behaves initially according to pure voter dynamics, reaching a state that is very close to the final configuration of the voter model. Only after a characteristic time t*(q) is that configuration affected by the action of the strategic rule, ending up in a frozen configuration, with the exception of the region mentioned above. Naturally, t*(q) diverges as q approaches 1, where only the voter model regime is left. On the other hand, with REP and MOR only an exponential decay is observed (Fig. 6), for every ε and q.

We have also extended our study to other strategic dynamics in order to assess the mechanisms behind our observations. To that end, we have implemented REP, which copies a neighboring agent's action if it led to a better payoff than that of the focal agent, and MOR, which copies a neighboring agent's action with probability proportional to her payoff and can make mistakes, choosing less profitable actions. By comparing these update rules with UI, given by imitation of the best-performing neighbor, we conclude that payoff-increasing cooperation can arise with some generality only if the strategic imitation takes into account the whole neighborhood. Indeed, with the REP rule consensus on cooperation is the outcome in at most 15% of the realizations, whereas the asymptotics of MOR is almost random (somewhat biased towards defection), as it is a combination of two bounded-rationality imitation rules. Another particular result of UI dynamics is the regime in which the system remains in an active state, whereas with the other two dynamics the final outcome is always full consensus and a frozen configuration.

It is worth recalling that other researchers have already dealt with models of the PD whose dynamics change by tuning a parameter.
For example, in [40] the behavior of the game on different topologies is studied as a function of selection pressure, from neutral evolution to pure imitation, while in [41] the robustness of the outcomes of weak-selection dynamics in well-mixed populations is analyzed when selection becomes intermediate or strong. There are interesting transitions in these cases too, but we must stress that our study is deeply different: here the varying parameter is the weight of the VM dynamics, which cannot be considered simple noise, since it produces correlations in the system [29]. Moreover, at variance with reference [41], it acts on interfaces, that is, it can work only between agents holding opposite opinions.

To conclude on a more general note, we have found that the mechanism of combined social and strategic imitation is a very powerful one for driving the system towards full consensus, even if the frozen configuration so reached is not an equilibrium of the game, as we have seen in this paper. We note that this mechanism does not necessarily lead to desirable states such as cooperation in a networked PD: cooperation is only observed under suitable parameters (in particular, a not too stringent dilemma) and with a "greedy" strategic rule such as UI. Otherwise, social imitation leads, in the case of the PD, to less cooperation than pure strategic behavior. We have also noted that social imitation is not a mutation-type noise; rather, it behaves as interfacial noise, since it acts only on active links, where a cooperator and a defector are connected. As such a noise, it also leads to enhanced cooperation as compared to pure UI on networks. Finally, we have seen that a number of features of the experimental results on networked PDs are recovered by this model, and that a description of those results in terms of transient configurations of our dynamics would be possible.
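To make the mixed dynamics concrete, the single-agent update combining voter-model (social) imitation with probability q and unconditional imitation (UI) of the best-performing neighbor with probability 1 - q can be sketched as below. The payoff values (reward 1, temptation b, sucker's payoff 0, punishment -eps for mutual defection) and all function names are illustrative assumptions for this sketch, not the exact parametrization used in the simulations.

```python
import random

def payoff(a, b_action, b=1.3, eps=0.1):
    """Illustrative PD payoffs (assumed values, not the paper's exact ones):
    R = 1 for mutual cooperation, temptation T = b against a cooperator,
    S = 0 for cooperating against a defector, -eps for mutual defection."""
    if a == 'C' and b_action == 'C':
        return 1.0
    if a == 'C' and b_action == 'D':
        return 0.0
    if a == 'D' and b_action == 'C':
        return b
    return -eps  # mutual defection is (slightly) punished

def round_payoffs(actions, neighbors, b=1.3, eps=0.1):
    """Accumulated payoff of every agent against all of its neighbors."""
    return {i: sum(payoff(actions[i], actions[j], b, eps) for j in neighbors[i])
            for i in actions}

def mixed_update(i, actions, neighbors, q, rng, b=1.3, eps=0.1):
    """One update of agent i: voter-model imitation with probability q,
    unconditional imitation (UI) of the best-paid neighbor otherwise."""
    if rng.random() < q:
        # Social imitation (voter model): copy a uniformly random neighbor,
        # regardless of payoffs.
        return actions[rng.choice(neighbors[i])]
    # Strategic imitation (UI): copy the neighbor with the highest payoff
    # in the last round, provided it outperformed the focal agent.
    pay = round_payoffs(actions, neighbors, b, eps)
    best = max(neighbors[i], key=lambda j: pay[j])
    return actions[best] if pay[best] > pay[i] else actions[i]
```

With q = 0 the rule reduces to pure UI and with q = 1 to pure voter dynamics; intermediate values of q mix the two updates, which is the regime studied throughout the paper.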
These findings suggest that our mechanism could provide the basis for alternative explanations of the behavior of human subjects in experiments.

ACKNOWLEDGMENTS
D. V. acknowledges support from the PRISMA project (PON04a2 A), within the Italian National Program for Research and Innovation. M. S. M. acknowledges support from grant INTENSE@COSYP (FIS2012-30634) of the Ministerio de Economía y Competitividad (MINECO, Spain). J. J. R. also receives funding from the MINECO through the Ramón y Cajal program and through the project MODASS (FIS2011-24785). A. S. acknowledges support from grant PRODIEVO from the MINECO. In addition, funding from the EU Commission was received through project LASAGNE.
REFERENCES

[1] C. Castellano, S. Fortunato and V. Loreto, Rev. Mod. Phys., 591 (2009).
[2] J. Klüver, The Dynamics and Evolution of Social Systems: New Foundations of a Mathematical Sociology, Springer-Verlag (2000).
[3] K. Sigmund, The Calculus of Selfishness, Princeton University Press (2010).
[4] H. Gintis, Game Theory Evolving, 2nd edition, Princeton University Press (2009).
[5] R. Axelrod and W.D. Hamilton, Science, 1390 (1981).
[6] J. Maynard-Smith and E. Szathmáry, The Major Transitions in Evolution, Oxford Univ. Press (1995).
[7] P. Hammerstein, Genetic and Cultural Evolution of Cooperation, MIT Press (2003).
[8] R. Kappeler and C.P. van Schaik, Cooperation in Primates and Humans: Mechanisms and Evolution, Springer-Verlag (2006).
[9] E. Fehr and U. Fischbacher, Nature, 785 (2003).
[10] M.A. Nowak, Science, 1560 (2006).
[11] J.A. Fletcher and M. Doebeli, Proc. R. Soc. B, 13 (2009).
[12] M.A. Nowak and R.M. May, Nature, 826 (1992).
[13] C.P. Roca, J.A. Cuesta and A. Sánchez, Phys. Rev. E, 046106 (2009).
[14] C.P. Roca, J.A. Cuesta and A. Sánchez, Phys. Life Rev., 208 (2009).
[15] B.A. Huberman and N.S. Glance, Proc. Natl. Acad. Sci. USA, 7716 (1993).
[16] M.A. Nowak, A. Sasaki, C. Taylor and D. Fudenberg, Nature, 646 (2004).
[17] V.M. Eguíluz, M.G. Zimmerman, C.J. Cela-Conde and M. San Miguel, Am. J. Soc. 110 (4), 977 (2005).
[18] F.C. Santos, J.M. Pacheco and T. Lenaerts, Proc. Natl. Acad. Sci. USA, 3490 (2006).
[19] C.P. Roca, J.A. Cuesta and A. Sánchez, Eur. Phys. J. B, 587 (2009).
[20] L.L. Jiang, T. Zhou, M. Perc, X. Huang and B.H. Wang, New J. Phys., 103001 (2009).
[21] D. Vilone, A. Robledo and A. Sánchez, Phys. Rev. Lett., 038101 (2011).
[22] O. Kirchkamp and R. Nagel, Games Econ. Behav., 269 (2007).
[23] A. Traulsen, D. Semmann, R.D. Sommerfeld, H.J. Krambeck and M. Milinski, Proc. Natl. Acad. Sci. USA, 2962 (2010).
[24] J. Grujić, C. Fosco, L. Araujo, J.A. Cuesta and A. Sánchez, PLoS ONE, e13749 (2010).
[25] C. Gracia-Lázaro, A. Ferrer, G. Ruiz, A. Tarancón, J.A. Cuesta, A. Sánchez and Y. Moreno, Proc. Natl. Acad. Sci. USA, 12922 (2012).
[26] J. Grujić, C. Gracia-Lázaro, A. Traulsen, M. Milinski, D. Semmann, J.A. Cuesta, Y. Moreno and A. Sánchez, Sci. Reports, 4615 (2014).
[27] M. Granovetter, Am. J. Soc., 1420 (1978).
[28] W. Yu, Phys. Rev. E, 026105 (2011).
[29] D. Vilone, J.J. Ramasco, A. Sánchez and M. San Miguel, Sci. Rep., 686 (2012).
[30] K. Suchecki, V.M. Eguíluz and M. San Miguel, Phys. Rev. E, 036132 (2005).
[31] J. Fernandez-Gracia, K. Suchecki, J.J. Ramasco, M. San Miguel and V.M. Eguiluz, Phys. Rev. Lett., 158701 (2014).
[32] C.P. Roca, J.A. Cuesta and A. Sánchez, Eur. Phys. Lett., 48005 (2009).
[33] D. Aldous, Random walks on finite groups and rapidly mixing Markov chains, Springer (1983).
[34] T.M. Liggett, Interacting Particle Systems, Springer (2005).
[35] M.E. Yildiz, R. Pagliari, A. Ozdaglar and A. Scaglione, Voting models in random networks, Information Theory and Applications Workshop - ITA (2010).
[36] J.D. Noh and H. Rieger, Phys. Rev. Lett., 118701 (2004).
[37] M.E.J. Newman, Phys. Rev. E, 056131 (2004).
[38] F. Gargiulo and J.J. Ramasco, PLoS ONE, e48916 (2012).
[39] G. Cimini and A. Sánchez, J. Roy. Soc. Interface, 20131186 (2014).
[40] F.E. Pinheiro, F.C. Santos and J.M. Pacheco, New J. Phys., 073035 (2012).
[41] B. Wu, J. García, C. Hauert and A. Traulsen, PLoS Comp. Biol. 9 (12).