Imitation Dynamics with Payoff Shocks
PANAYOTIS MERTIKOPOULOS AND YANNICK VIOSSAT
Abstract.
We investigate the impact of payoff shocks on the evolution of large populations of myopic players that employ simple strategy revision protocols such as the "imitation of success". In the noiseless case, this process is governed by the standard (deterministic) replicator dynamics; in the presence of noise however, the induced stochastic dynamics are different from previous versions of the stochastic replicator dynamics (such as the aggregate-shocks model of Fudenberg and Harris, 1992). In this context, we show that strict equilibria are always stochastically asymptotically stable, irrespective of the magnitude of the shocks; on the other hand, in the high-noise regime, non-equilibrium states may also become stochastically asymptotically stable and dominated strategies may survive in perpetuity (they become extinct if the noise is low). Such behavior is eliminated if players are less myopic and revise their strategies based on their cumulative payoffs. In this case, we obtain a second-order stochastic dynamical system whose attracting states coincide with the game's strict equilibria and where dominated strategies become extinct (a.s.), no matter the noise level.

1. Introduction
Evolutionary game dynamics study the evolution of behavior in populations of boundedly rational agents that interact strategically. The most widely studied dynamical model in this context is the replicator dynamics: introduced in biology as a model of natural selection (Taylor and Jonker, 1978), the replicator dynamics also arise from models of imitation of successful individuals (Björnerstedt and Weibull, 1996; Schlag, 1998; Weibull, 1995) and from models of learning in games (Hofbauer et al., 2009; Mertikopoulos and Moustakas, 2010; Rustichini, 1999). Mathematically, they stipulate that the growth rate of the frequency of a strategy is proportional to the difference between the payoff of individuals playing this strategy and the mean payoff in the population. These payoffs are usually assumed deterministic: this is typically motivated by a large population assumption and the premise that, owing to the law of large numbers, the resulting mean field provides a good approximation of a more realistic but less tractable stochastic model. This approach makes sense when the stochasticity affecting payoffs is independent across individuals playing the same strategies, but it fails when the payoff shocks are aggregate, that is, when they affect all individuals playing a given strategy in a similar way. Such aggregate shocks are not uncommon. Bergstrom (2014) recounts the story of squirrels stocking nuts for the winter months: squirrels may stock a few or a lot of nuts, the latter leading to a higher probability of surviving a long winter but a higher exposure to predation. The unpredictable mildness or harshness of
the ensuing winter will then favor one of these strategies in an aggregate way (see also Robson and Samuelson, 2011, Sec. 3.1.1, and references therein). In traffic engineering, one might think of a choice of itinerary to go to work: fluctuations of traffic on some roads affect all those who chose them in a similar way. Likewise, in data networks, a major challenge occurs when trying to minimize network latencies in the presence of stochastic disturbances: in this setting, the travel time of a packet in the network does not depend only on the load of each link it traverses, but also on unpredictable factors such as random packet drops and retransmissions, fluctuations in link quality, excessive backlog queues, etc. (Bertsekas and Gallager, 1992).

Incorporating such aggregate payoff shocks in the biological derivation of the replicator dynamics leads to the stochastic replicator dynamics of Fudenberg and Harris (1992), later studied by (among others) Cabrales (2000), Imhof (2005) and Hofbauer and Imhof (2009). To study the long-run behavior of these dynamics, Imhof (2005) introduced a modified game where the expected payoff of a strategy is penalized by a term which increases with the variance of the noise affecting this strategy's payoff (see also Hofbauer and Imhof, 2009). Among other results, it was then shown that a) strategies that are iteratively (strictly) dominated in this modified game become extinct almost surely; and b) strict equilibria of the modified game are stochastically asymptotically stable.

In this biological model, noise is detrimental to the long-term survival of strategies: a strategy which is strictly dominant on average (i.e. in the original, unmodified game) but which is affected by shocks of substantially higher intensity becomes extinct almost surely.

Footnote: Supported in part by the French National Research Agency under grant no. GAGA–13–JS01–0004–01 and the French National Center for Scientific Research (CNRS) under grant no. PEPS–GATHERING–2014.
By contrast, in the learning derivation of the replicator dynamics, noise leads to a stochastic exponential learning model where only iteratively undominated strategies survive, irrespective of the intensity of the noise (Mertikopoulos and Moustakas, 2010); as a result, the frequency of a strictly dominant strategy converges to 1 almost surely. Moreover, strict Nash equilibria of the original game remain stochastically asymptotically stable (again, independently of the level of the noise), so the impact of the noise in the stochastic replicator dynamics of exponential learning is minimal when compared to the stochastic replicator dynamics with aggregate shocks.

In this paper, we study the effect of payoff shocks when the replicator equation is seen as a model of imitation of successful agents. As in the case of Imhof (2005) and Hofbauer and Imhof (2009), it is convenient to introduce a noise-adjusted game which reduces to the original game in the noiseless, deterministic regime. We show that: a) strategies that are iteratively strictly dominated in the modified game become extinct almost surely; and b) strict equilibria of the modified game are stochastically asymptotically stable. However, despite the formal similarity, our results are qualitatively different from those of Imhof (2005) and Hofbauer and Imhof (2009): in the modified game induced by imitation of success in the presence of noise, noise is not detrimental per se. In fact, in the absence of differences in expected payoffs, a strategy survives with a probability that does not depend on the variance of its payoffs: a strategy's survival probability is simply its initial frequency. Similarly, even if a strategy which is strictly dominant in expectation is subject to arbitrarily high noise, it will always survive with positive probability; by contrast, such strategies become extinct (a.s.) in the aggregate shocks model of Fudenberg and Harris (1992).
That said, the dynamics' long-term properties change dramatically if players are less "myopic" and, instead of imitating strategies based on their instantaneous payoffs, they base their decisions on the cumulative payoffs of their strategies over time. In this case, we obtain a second-order stochastic replicator equation which can be seen as a noisy version of the higher order game dynamics of Laraki and Mertikopoulos (2013). Thanks to this payoff aggregation mechanism, the noise averages out in the long run and we recover results that are similar to those of Mertikopoulos and Moustakas (2010): strategies that are dominated in the original game become extinct (a.s.) and strict Nash equilibria attract nearby initial conditions with arbitrarily high probability.

1.1. Paper Outline. The remainder of our paper is structured as follows: in Section 2, we present our model and we derive the stochastic replicator dynamics induced by imitation of success in the presence of noise. Our long-term rationality analysis begins in Section 3, where we introduce the noise-adjusted game discussed above and we state our elimination and stability results in terms of this modified game. In Section 4, we consider the case where players imitate strategies based on their cumulative payoffs and we show that the adjustment due to noise is no longer relevant. Finally, in Section 5, we discuss some variants of our core model related to different noise processes.

1.2. Notational conventions.
The real space spanned by a finite set S = {s_α}_{α=1}^{d+1} will be denoted by R^S and we will write {e_s}_{s∈S} for its canonical basis; in a slight abuse of notation, we will also use α to refer interchangeably to either s_α or e_α, and we will write δ_αβ for the Kronecker delta symbols on S. The set Δ(S) of probability measures on S will be identified with the d-dimensional simplex Δ = {x ∈ R^S : ∑_α x_α = 1 and x_α ≥ 0} of R^S and the relative interior of Δ will be denoted by Δ°; also, the support of p ∈ Δ(S) will be written supp(p) = {α ∈ S : p_α > 0}. For simplicity, if {S_k}_{k∈N} is a finite family of finite sets, we use the shorthand (α_k; α_{−k}) for the tuple (…, α_{k−1}, α_k, α_{k+1}, …) and we write ∑_α^k instead of ∑_{α∈S_k}. Unless mentioned otherwise, deterministic processes will be represented by lowercase letters, while their stochastic counterparts will be denoted by the corresponding uppercase letter. Finally, we will suppress the dependence of the law of a process X(t) on its initial condition X(0) = x, and we will write P instead of P_x.

2. The model
In this section, we recall a few preliminaries from the theory of population games and evolutionary dynamics, and we introduce the stochastic game dynamics under study.

2.1. Population games.
Our main focus will be games played by populations of nonatomic players. Formally, such games consist of a finite set of player populations N = {1, …, N} (assumed for simplicity to have unit mass), each with a finite set of pure strategies (or types) A_k = {α_{k,1}, α_{k,2}, …}, k ∈ N. During play, each player chooses a strategy and the state of each population is given by the distribution x_k = (x_{kα})_{α∈A_k} of players employing each strategy α ∈ A_k. Accordingly, the state space of the k-th population is the simplex X_k ≡ Δ(A_k) and the state space of the game is the product X ≡ ∏_k X_k.
The payoff to a player of population k ∈ N playing α ∈ A_k is determined by the corresponding payoff function v_{kα} : X → R (assumed Lipschitz). Thus, given a population state x ∈ X, the average payoff to population k will be

∑_α^k x_{kα} v_{kα}(x) = ⟨v_k(x) | x⟩,    (2.1)

where v_k(x) ≡ (v_{kα}(x))_{α∈A_k} denotes the payoff vector of the k-th population in the state x ∈ X. Putting all this together, a population game is then defined as a tuple G ≡ G(N, A, v) of nonatomic player populations k ∈ N, their pure strategies α ∈ A_k and the associated payoff functions v_{kα} : X → R.

In this context, we say that a pure strategy α ∈ A_k is dominated by β ∈ A_k if

v_{kα}(x) < v_{kβ}(x) for all x ∈ X,    (2.2)

i.e. the payoff of an α-strategist is always inferior to that of a β-strategist. More generally (and in a slight abuse of terminology), we will say that p_k ∈ X_k is dominated by p′_k ∈ X_k if

⟨v_k(x) | p_k⟩ < ⟨v_k(x) | p′_k⟩ for all x ∈ X,    (2.3)

i.e. when the average payoff of a small influx of mutants in population k is always greater when they are distributed according to p′_k rather than p_k (irrespective of the incumbent population state x ∈ X).

Finally, we will say that the population state x* ∈ X is at Nash equilibrium if

v_{kα}(x*) ≥ v_{kβ}(x*) for all α ∈ supp(x*_k) and for all β ∈ A_k, k ∈ N.    (NE)

In particular, if x* is pure (in the sense that supp(x*) is a singleton) and (NE) holds as a strict inequality for all β ∉ supp(x*_k), x* will be called a strict equilibrium.

Remark. Throughout this paper, we will be suppressing the population index k ∈ N for simplicity, essentially focusing on the single-population case. This is done only for notational clarity: all our results apply as stated to the multi-population model described in detail above.

2.2. Revision protocols.
A fundamental evolutionary model in the context of population games is provided by the notion of a revision protocol. Following Sandholm (2010, Chapter 3), it is assumed that each nonatomic player receives an opportunity to switch strategies at every ring of an independent Poisson alarm clock, and this decision is based on the payoffs associated to each strategy and the current population state. The players' revision protocol is thus defined in terms of the conditional switch rates ρ_αβ ≡ ρ_αβ(v, x) that determine the relative mass dx_αβ of players switching from α to β over an infinitesimal time interval dt:

dx_αβ = x_α ρ_αβ dt.    (2.4)

The population shares x_α are then governed by the revision protocol dynamics:

ẋ_α = ∑_β x_β ρ_βα − x_α ∑_β ρ_αβ,    (2.5)

with ρ_αα defined arbitrarily.

Footnote: Note that we are considering general payoff functions and not only multilinear (resp. linear) payoffs arising from asymmetric (resp. symmetric) random matching in finite N-person (resp. 2-person) games. This distinction is important as it allows our model to cover e.g. general traffic games as in Sandholm (2010).

Footnote: In other words, ρ_αβ is the probability of an α-strategist becoming a β-strategist up to normalization by the alarm clocks' rate.
In what follows, we will focus on revision protocols of the general "imitative" form

ρ_αβ(v, x) = x_β r_αβ(v, x),    (2.6)

corresponding to the case where a player imitates the strategy of a uniformly drawn opponent with probability proportional to the so-called conditional imitation rate r_αβ (assumed Lipschitz). In particular, one of the most widely studied revision protocols of this type is the "imitation of success" protocol (Weibull, 1995) where the imitation rate of a given target strategy is proportional to its payoff, i.e.

r_αβ(v, x) = v_β.    (2.7)

On account of (2.5), the mean evolutionary dynamics induced by (2.7) take the form:

ẋ_α = x_α [v_α(x) − ∑_β x_β v_β(x)],    (RD)

which is simply the classical replicator equation of Taylor and Jonker (1978).

The replicator dynamics have attracted significant interest in the literature and their long-run behavior is relatively well understood. For instance, Akin (1980), Nachbar (1990) and Samuelson and Zhang (1992) showed that dominated strategies become extinct under (RD), whereas the (multi-population) "folk theorem" of evolutionary game theory (Hofbauer and Sigmund, 2003) states that a) (Lyapunov) stable states are Nash; b) limits of interior trajectories are Nash; and c) strict Nash equilibria are asymptotically stable under (RD).
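As a quick sanity check, the passage from the revision protocol (2.5) with the imitation rate (2.7) to the replicator field (RD) can be verified numerically, and (RD) itself can be integrated with a simple Euler scheme. The sketch below (Python; the three-strategy payoff vector, step size, and horizon are our own illustrative choices, not taken from the text) does both: the protocol drift coincides with the replicator drift, and the strictly dominated strategy's share is driven to zero, as the elimination results quoted above predict.

```python
import numpy as np

def protocol_drift(x, v):
    """Mean dynamics (2.5) with rho_ab = x_b * v_b (imitation of success):
    xdot_a = sum_b x_b rho_ba - x_a sum_b rho_ab."""
    n = len(x)
    rho = lambda a, b: x[b] * v[b]
    return np.array([sum(x[b] * rho(b, a) for b in range(n))
                     - x[a] * sum(rho(a, b) for b in range(n))
                     for a in range(n)])

def replicator_drift(x, v):
    """Right-hand side of (RD): xdot_a = x_a (v_a - <v | x>)."""
    return x * (v - x @ v)

v = np.array([1.0, 2.0, 1.5])   # strategy 0 is strictly dominated by strategy 1
x = np.array([0.6, 0.2, 0.2])
# (2.5) + (2.6)/(2.7) reduce to (RD), since the shares sum to one:
assert np.allclose(protocol_drift(x, v), replicator_drift(x, v))

for _ in range(4000):           # Euler integration of (RD) up to T = 40
    x = x + 0.01 * replicator_drift(x, v)
print(x)  # the dominated strategy's share is driven to (almost) zero
```

The design choice here is deliberate: the protocol form makes the imitation story explicit (switch rates proportional to the target's share and payoff), while the replicator form is what one actually integrates.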
2.3. Payoff shocks and the induced dynamics.

Our main goal in this paper is to investigate the rationality properties of the replicator dynamics in a setting where the players' payoffs are subject to exogenous stochastic disturbances. To model these "payoff shocks", we assume that the players' payoffs at time t are of the form v̂_α(t) = v_α(x(t)) + ξ_α(t) for some zero-mean "white noise" process ξ_α. Then, in Langevin notation, the replicator dynamics (RD) become:

dX_α/dt = X_α [v̂_α − ∑_β X_β v̂_β] = X_α [v_α(X) − ∑_β X_β v_β(X)] + X_α [ξ_α − ∑_β X_β ξ_β],    (2.8)

or, in stochastic differential equation (SDE) form:

dX_α = X_α [v_α(X) − ∑_β X_β v_β(X)] dt + X_α [σ_α(X) dW_α − ∑_β X_β σ_β(X) dW_β],    (SRD)

where the diffusion coefficients σ_α : X → R (assumed Lipschitz) measure the intensity of the payoff shocks and the Wiener processes W_α are assumed independent. The stochastic dynamics (SRD) will constitute the main focus of this paper, so some remarks are in order:

Remark. With v and σ assumed Lipschitz, it follows that (SRD) admits a unique (strong) solution X(t) for every initial condition X(0) ∈ X. Moreover, since the drift and diffusion terms of (SRD) all vanish at the boundary bd(X) of X, standard arguments can be used to show that these solutions exist (a.s.) for all time, and that X(t) ∈ X° for all t ≥ 0 if X(0) ∈ X° (Khasminskii, 2012; Øksendal, 2007).

Remark. The independence assumption for the Wiener processes W_α can be relaxed without qualitatively affecting our analysis; in particular, as we shall see in the proofs of our results, the rationality properties of (SRD) can be formulated directly in terms of the quadratic (co)variation of the noise processes W_α. Doing so however would complicate the relevant expressions considerably, so, for clarity, we will retain this independence assumption throughout our paper.

Footnote: Modulo an additive constant which ensures that ρ is positive but which cancels out when it comes to the dynamics.
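The SDE (SRD) can also be simulated directly with a standard Euler–Maruyama scheme. The sketch below (Python; the two-strategy payoffs, noise intensities, step size, and seed are illustrative choices of ours, not from the text) implements the replicator drift plus the coupled diffusion term X_α[σ_α dW_α − ∑_β X_β σ_β dW_β]; with this payoff gap and these mild shocks, the dominated strategy is driven out along typical sample paths, in line with the elimination results of Section 3.

```python
import numpy as np

def srd_step(x, payoffs, sig, dt, rng):
    """One Euler-Maruyama step of (SRD) with independent Wiener increments dW_a."""
    dW = rng.normal(0.0, np.sqrt(dt), size=len(x))
    drift = x * (payoffs - x @ payoffs)                  # replicator drift
    noise = x * (sig * dW - x @ (sig * dW))              # coupled diffusion term
    x_new = np.clip(x + drift * dt + noise, 0.0, None)   # guard the simplex against
    return x_new / x_new.sum()                           # discretization error

rng = np.random.default_rng(1)
v = np.array([1.0, 2.0])        # strategy 1 strictly dominates strategy 0
sigma = np.array([0.3, 0.3])    # mild, state-independent payoff shocks
x = np.array([0.5, 0.5])
for _ in range(5000):           # horizon T = 50
    x = srd_step(x, v, sigma, dt=0.01, rng=rng)
print(x)  # the dominated strategy's share is driven toward 0
```

The clip-and-renormalize step is purely numerical: the exact solution stays in the simplex, but a discretized step can overshoot its boundary.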
Remark. The deterministic replicator dynamics (RD) are also the governing dynamics for the "pairwise proportional imitation" revision protocol (Schlag, 1998) where a revising agent imitates the strategy of a randomly chosen opponent only if the opponent's payoff is higher than his own, and he does so with probability proportional to the payoff difference. Formally, the conditional switch rate ρ_αβ under this revision protocol is:

ρ_αβ = x_β [v_β − v_α]_+,    (2.9)

where [x]_+ = max{x, 0} denotes the positive part of x. Accordingly, if the game's payoffs at time t are of the perturbed form v̂_α(t) = v_α(x(t)) + ξ_α(t) as before, (2.5) leads to the master stochastic equation:

Ẋ_α = ∑_β X_β X_α [v̂_α − v̂_β]_+ − X_α ∑_β X_β [v̂_β − v̂_α]_+
    = X_α ∑_β X_β {[v̂_α − v̂_β]_+ − [v̂_β − v̂_α]_+}
    = X_α ∑_β X_β (v̂_α − v̂_β)
    = X_α [v̂_α − ∑_β X_β v̂_β],    (2.10)

which is simply the stochastic replicator dynamics (2.8). In other words, (SRD) could also be interpreted as the mean dynamics of a pairwise imitation process with perturbed payoff comparisons as above.

2.4. Related stochastic models.
The replicator dynamics were first introduced in biology, as a model of frequency-dependent selection. They arise from the geometric population growth equation:

ż_α = z_α v_α,    (2.11)

where z_α denotes the absolute population size of the α-th genotype of a given species.

Footnote: The replicator equation (RD) is obtained simply by computing the evolution of the frequencies x_α = z_α / ∑_β z_β under (2.11).

Footnote: An important special case where it makes sense to consider correlated shocks is if the payoff functions v_α(x) are derived from random matchings in a finite game whose payoff matrix is subject to stochastic perturbations. This specific disturbance model is discussed in Section 5.

This biological model was also the starting point of Fudenberg and Harris (1992) who added aggregate payoff shocks to (2.11) based on the geometric Brownian model:

dZ_α = Z_α [v_α dt + σ_α dW_α],    (2.12)

where the diffusion process σ_α dW_α represents the impact of random, weather-like effects on the genotype's fitness (see also Cabrales, 2000; Hofbauer and Imhof, 2009;
Imhof, 2005). Itô's lemma applied to the population shares X_α = Z_α / ∑_β Z_β then yields the replicator dynamics with aggregate shocks:

dX_α = X_α [v_α(X) − ∑_β X_β v_β(X)] dt + X_α [σ_α dW_α − ∑_β σ_β X_β dW_β] − X_α [σ_α² X_α − ∑_β σ_β² X_β²] dt.    (2.13)

In a repeated game context, the replicator dynamics also arise from a continuous-time variant of the exponential weight algorithm introduced by Vovk (1990) and Littlestone and Warmuth (1994) (see also Sorin, 2009). In particular, if players follow the exponential learning scheme:

dy_α = v_α(x) dt,
x_α = exp(y_α) / ∑_β exp(y_β),    (2.14)

that is, if they play a logit best response to the vector of their cumulative payoffs, then the frequencies x_α follow (RD). Building on this, Mertikopoulos and Moustakas (2009, 2010) considered the stochastically perturbed exponential learning scheme:

dY_α = v_α(X) dt + σ_α(X) dW_α,
X_α = exp(Y_α) / ∑_β exp(Y_β),    (2.15)

where the cumulative payoffs are perturbed by the observation noise process σ_α dW_α. By Itô's lemma, we then obtain the stochastic replicator dynamics of exponential learning:

dX_α = X_α [v_α(X) − ∑_β X_β v_β(X)] dt + X_α [σ_α dW_α − ∑_β σ_β X_β dW_β] + ½ X_α [σ_α² (1 − 2X_α) − ∑_β σ_β² X_β (1 − 2X_β)] dt.    (2.16)

Besides their very distinct origins, a key difference between the stochastic replicator dynamics (SRD) and the stochastic models (2.13)/(2.16) is that there is no Itô correction term in the former. The reason for this is that in (2.13) and (2.16), the noise affects primarily the evolution of an intermediary variable (the absolute population sizes Z_α and the players' cumulative payoffs Y_α respectively) before being carried over to the evolution of the strategy shares X_α. By contrast, the payoff shocks that impact the players' revision protocol in (SRD) affect the corresponding strategy shares directly, so there is no intervening Itô correction.
Footnote: Khasminskii and Potsepun (2006) also considered a related evolutionary model with Stratonovich-type perturbations while, more recently, Vlasic (2012) studied the effect of discontinuous semimartingale shocks incurred by catastrophic, earthquake-like events.

Footnote: The intermediate variable y_α should be thought of as an evaluation of how good the strategy α is, and the formula for x_α as a way of transforming these evaluations into a strategy.
The pure noise case. To better understand the differences between our model and previous models of stochastic replicator dynamics, it is useful to consider the case of pure noise, that is, when the expected payoff of each strategy is equal to one and the same constant C: v_α(x) = C for all α ∈ A and for all x ∈ X. For simplicity, let us also assume that σ_α(x) is independent of the state of the population x. Eq. (2.12) then becomes a simple geometric Brownian motion of the form:

dZ_α = Z_α [C dt + σ_α dW_α],    (2.17)

which readily yields Z_α(t) = Z_α(0) exp((C − σ_α²/2) t + σ_α W_α(t)). The corresponding frequency X_α = Z_α / ∑_β Z_β will then be:

X_α(t) = X_α(0) exp(−½σ_α² t + σ_α W_α(t)) / ∑_β X_β(0) exp(−½σ_β² t + σ_β W_β(t)).    (2.18)

If σ_α ≠ 0, the law of large numbers yields −½σ_α² t + σ_α W_α(t) ∼ −½σ_α² t (a.s.). Therefore, letting σ_min = min_{α∈A} σ_α, it follows from (2.18) that strategy α ∈ A is eliminated if σ_α > σ_min and survives if σ_α = σ_min (a.s.). In particular, if all intensities are equal (σ_α = σ_min for all α ∈ A), then all strategies survive and the share of each strategy oscillates for ever, occasionally taking values arbitrarily close to 0 and arbitrarily close to 1. On the other hand, under the stochastic replicator dynamics of exponential learning for the pure noise case, (2.16) readily yields:

X_α(t) = X_α(0) exp(σ_α W_α(t)) / ∑_β X_β(0) exp(σ_β W_β(t)).    (2.19)

Therefore, for any value of the diffusion coefficients σ_α (and, in particular, even if some strategies are affected by noise much more than others), all pure strategies survive.

Our model behaves differently from both (2.13) and (2.16): in the pure noise case, for any value of the noise coefficients σ_α (as long as σ_α > 0 for all α), only a single strategy survives (a.s.), and strategy α survives with probability equal to X_α(0).
To see this, consider first the model with pure noise and only two strategies, α and β. Then, letting X(t) = X_α(t) (so X_β(t) = 1 − X(t)), we get:

dX(t) = X(t)(1 − X(t)) [σ_α dW_α − σ_β dW_β] = X(t)(1 − X(t)) σ dW(t),    (2.20)

where σ² = σ_α² + σ_β² and we have used the time-change theorem for martingales to write σ dW = σ_α dW_α − σ_β dW_β for some Wiener process W(t). This diffusion process can be seen as a continuous-time random walk on [0, 1] with step sizes that get smaller as X approaches {0, 1}. Thus, at a heuristic level, when X(t) starts close to X = 1 and takes one step to the left followed by one step to the right (or the opposite), the walk does not return to its initial position, but will approach 1 (of course, the same phenomenon occurs near 0). This suggests that the process should eventually converge to one of the vertices: indeed, letting f(x) = log x(1 − x), Itô's lemma yields

df(X) = (1 − 2X) σ dW − ½ [(1 − X)² + X²] σ² dt ≤ (1 − 2X) σ dW − ¼ σ² dt,    (2.21)

so, by Lemma A.1, we get lim_{t→∞} f(X(t)) = −∞ (a.s.), that is, lim_{t→∞} X(t) ∈ {0, 1}.

Footnote: Elimination is obvious; for survival, simply add ½σ_min² t to the exponents of (2.18) and recall that any Wiener process has lim sup_t W(t) > 0 and lim inf_t W(t) < 0 (a.s.).
More generally, consider the model with pure noise and n strategies. Then, computing d[log X_α(1 − X_α)] as above, we readily obtain lim_{t→∞} X_α(t) ∈ {0, 1} (a.s.), for every strategy α ∈ A with σ_α > 0. Since X_α is a martingale, we will have E[X_α(t)] = X_α(0) for all t ≥ 0, so X_α → 1 with probability X_α(0) and X_α(t) → 0 with probability 1 − X_α(0).

The above highlights two important differences between our model and the stochastic replicator dynamics of Fudenberg and Harris (1992). First, in our model, noise is not detrimental in itself: in the pure noise case, the expected frequency of a strategy remains constant, irrespective of the noise level; by contrast, in the model of Fudenberg and Harris (1992), the expected frequency of strategies affected by strong payoff noise decreases. Second, our model behaves in a somewhat more "unpredictable" way: for instance, in the model of Fudenberg and Harris (1992), when there are only two strategies with the same expected payoff, and if one of the strategies is affected by a stronger payoff noise, then it will be eliminated (a.s.); in our model, we cannot say in advance whether it will be eliminated or not.
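These contrasting pure-noise predictions are easy to check numerically. The sketch below (Python; the noise intensities, horizon, seed, and initial shares are our own illustrative choices) evaluates (i) the aggregate-shocks frequencies (2.18), where the noisier strategy dies out, (ii) the exponential-learning frequencies (2.19), where both strategies keep fluctuating, and (iii) a Monte Carlo batch of the two-strategy diffusion (2.20), where each run is absorbed near 0 or 1 and the fraction of runs ending near 1 is close to the initial share, as the martingale argument above predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps = 0.01, 20_000                # horizon T = 200
t = dt * np.arange(1, n_steps + 1)
sigma = np.array([1.0, 0.2])              # strategy 0 is much noisier

# Two independent Wiener paths, shared by the closed-form models below.
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_steps, 2)), axis=0)

def shares(logits):
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

# (2.18): aggregate shocks -- the -sigma^2 t / 2 drift kills the noisy strategy.
X_fh = shares(-0.5 * sigma**2 * t[:, None] + sigma * W)
# (2.19): exponential learning -- no drift, so both strategies keep oscillating.
X_el = shares(sigma * W)

# (2.20): pure-noise imitation dynamics dX = X (1 - X) s dW, many runs at once.
s, x0, runs = np.sqrt(np.sum(sigma**2)), 0.3, 2000
X = np.full(runs, x0)
for _ in range(n_steps):
    X += X * (1.0 - X) * s * rng.normal(0.0, np.sqrt(dt), size=runs)
    np.clip(X, 0.0, 1.0, out=X)           # numerical guard on [0, 1]

print(X_fh[-1])          # noisy strategy eliminated under aggregate shocks
print(np.mean(X > 0.5))  # close to x0: survival probability = initial share
```

Since X is (numerically) a bounded martingale, the absorption frequencies in the last line estimate P(X → 1) ≈ X(0), up to Monte Carlo and discretization error.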
3. Long-term rationality analysis
In this section, we investigate the long-run rationality properties of the stochastic dynamics (SRD); in particular, we focus on the elimination of dominated strategies and the stability of equilibrium play.

3.1. Elimination of dominated strategies.
We begin with the elimination of dominated strategies. Formally, given a trajectory of play x(t) ∈ X, we say that a pure strategy α ∈ A becomes extinct along x(t) if x_α(t) → 0 as t → ∞. More generally, following Samuelson and Zhang (1992), we will say that the mixed strategy p ∈ X becomes extinct along x(t) if min{x_α(t) : α ∈ supp(p)} → 0 as t → ∞; otherwise, we say that p survives.

Now, with a fair degree of hindsight, it will be convenient to introduce a modified game G_σ ≡ G_σ(N, A, v^σ) with payoff functions v^σ_α adjusted for noise as follows:

v^σ_α(x) = v_α(x) − ½ (1 − 2x_α) σ_α²(x).    (3.1)

Imhof (2005) introduced a similar modified game to study the long-term convergence and stability properties of the stochastic replicator dynamics with aggregate shocks (2.13) and showed that strategies that are dominated in this modified game are eliminated (a.s.) – cf. Remark 3.7 below. Our main result concerning the elimination of dominated strategies under (SRD) is of a similar nature:

Footnote: We are implicitly assuming here deterministic initial conditions, i.e. X(0) = x (a.s.) for some x ∈ X.

Footnote: If several strategies are unaffected by noise, that is, are such that σ_α = 0, then their relative shares remain constant (that is, if α and β are two such strategies, then X_α(t)/X_β(t) = X_α(0)/X_β(0) for all t ≥ 0). It follows from this observation and the above result that, almost surely, all these strategies are eliminated or all these strategies survive (and only them).

Footnote: In the pure noise case of the model of Fudenberg and Harris (1992), what remains constant is the expected number of individuals playing a strategy. A crucial point here is that this number may grow to infinity. What happens to strategies affected by large aggregate shocks is that with small probability, the total number of individuals playing this strategy gets huge, but with a large probability (going to 1), it gets small (at least compared to the number of individuals playing other strategies). This can be seen as a gambler's ruin phenomenon, which explains that even with a higher expected payoff than others (hence a higher expected subpopulation size), the frequency of a strategy may go to zero almost surely (see e.g. Robson and Samuelson, 2011, Sec. 3.1.1). This cannot happen in our model since noise is added directly to the frequencies (which are bounded).
Theorem 3.1.
Let X(t) be an interior solution orbit of the stochastic replicator dynamics (SRD). Assume further that p ∈ X is dominated by p′ ∈ X in the modified game G_σ. Then, p becomes extinct along X(t) (a.s.).

Remark. As a special case, if the (pure) strategy α ∈ A is dominated by the (pure) strategy β ∈ A, Theorem 3.1 shows that α becomes extinct under (SRD) as long as

v_β(x) − v_α(x) > ½ [σ_α²(x) + σ_β²(x)] for all x ∈ X.    (3.2)

In terms of the original game, this condition can be interpreted as saying that α is dominated by β by a margin no less than ½ max_x (σ_α²(x) + σ_β²(x)). Put differently, Theorem 3.1 shows that dominated strategies in the original, unmodified game become extinct provided that the payoff shocks are mild enough.

Proof of Theorem 3.1.
Following Cabrales (2000), we will show that p becomes extinct along X(t) by studying the "cross-entropy" function:

V(x) = D_KL(p, x) − D_KL(p′, x) = ∑_α (p_α log p_α − p′_α log p′_α) + ∑_α (p′_α − p_α) log x_α,    (3.3)

where D_KL(p, x) = ∑_α p_α log(p_α/x_α) denotes the Kullback–Leibler (KL) divergence of x with respect to p. By a standard argument (Weibull, 1995), p becomes extinct along X(t) if lim_{t→∞} D_KL(p, X(t)) = ∞; thus, with D_KL(p′, x) ≥ 0, it suffices to show that lim_{t→∞} V(X(t)) = ∞.

To that end, let Y_α = log X_α so that

dY_α = dX_α / X_α − ½ (dX_α)² / X_α²,    (3.4)

by Itô's lemma. Then, writing dS_α = X_α [σ_α dW_α − ∑_β X_β σ_β dW_β] for the martingale term of (SRD), we readily obtain:

(dS_α)² = X_α² [σ_α dW_α − ∑_β X_β σ_β dW_β] · [σ_α dW_α − ∑_γ X_γ σ_γ dW_γ] = X_α² [(1 − X_α)² σ_α² + ∑_{β≠α} X_β² σ_β²] dt,    (3.5)

where we have used the orthogonality conditions dW_β · dW_γ = δ_βγ dt. By the same token, we also get (dX_α)² = (dS_α)², and hence:

dY_α = (v_α − ⟨v | X⟩) dt − ½ [(1 − X_α)² σ_α² + ∑_{β≠α} X_β² σ_β²] dt + σ_α dW_α − ∑_β X_β σ_β dW_β.    (3.6)

Therefore, after some easy algebra, we obtain:

dV = ∑_α (p′_α − p_α) dY_α
   = ⟨v(X) | p′ − p⟩ dt − ½ ∑_α (p′_α − p_α)(1 − 2X_α) σ_α²(X) dt + ∑_α (p′_α − p_α) σ_α(X) dW_α
   = ⟨v^σ(X) | p′ − p⟩ dt + ∑_α (p′_α − p_α) σ_α(X) dW_α,    (3.7)

where we have used the fact that ∑_α (p′_α − p_α) = 0.

Now, since p is dominated by p′ in G_σ, we will have ⟨v^σ(x) | p′ − p⟩ ≥ m for some positive constant m > 0 and for all x ∈ X. Eq. (3.7) then yields:

V(X(t)) ≥ V(X(0)) + mt + ξ(t),    (3.8)

where ξ denotes the martingale part of (3.7), viz.

ξ(t) = ∑_α (p′_α − p_α) ∫_0^t σ_α(X(s)) dW_α(s).    (3.9)

Since σ(X(t)) is bounded and continuous (a.s.), Lemma A.1 shows that mt + ξ(t) ∼ mt as t → ∞, so the RHS of (3.8) escapes to ∞ as t → ∞. This implies lim_{t→∞} V(X(t)) = ∞ and our proof is complete. ∎

Theorem 3.1 is our main result concerning the extinction of dominated strategies under (SRD), so a few remarks are in order:
Remark . Theorem 3.1 is analogous to the elimination results of Imhof (2005,Theorem 3.1) and Cabrales (2000, Prop. 1A) who show that dominated strategiesbecome extinct under the replicator dynamics with aggregate shocks (2.13) if theshocks satisfy certain “tameness” requirements. On the other hand, Theorem 3.1should be contrasted to the corresponding results of Mertikopoulos and Moustakas(2010) who showed that dominated strategies become extinct under the stochasticreplicator dynamics of exponential learning (2.16) irrespective of the noise level (fora related elimination result, see also Bravo and Mertikopoulos, 2014). The crucialqualitative difference here lies in the Itô correction term that appears in the driftof the stochastic replicator dynamics: the Itô correction in (2.16) is “just right”with respect to the logarithmic variables Y α = log X α and this is what leads to theunconditional elimination of dominated strategies. On the other hand, even thoughthere is no additional drift term in (SRD) except for the one driven by the game’spayoffs, the logarithmic transformation Y α = log X α incurs an Itô correction whichis reflected in the definition of the modified payoff functions (3.1). Remark . A standard induction argument based on the rounds of eliminationof iteratively dominated strategies (see e.g. Cabrales, 2000 or Mertikopoulos andMoustakas, 2010) can be used to show that the only strategies that survive underthe stochastic replicator dynamics (SRD) must be iteratively undominated in themodified game G σ . Remark . Finally, it is worth mentioning that Imhof (2005) also establishes anexponential rate of extinction of dominated strategies under the stochastic repli-cator dynamics with aggregate shocks (2.13). Specifically, if α ∈ A is dominated,Imhof (2005) showed that there exist constants A, B > and A ′ , B ′ > such that X α ( t ) = o (cid:16) exp (cid:16) − At + B p t log log t (cid:17)(cid:17) (a.s.) , (3.10)and P [ X α ( t ) > ε ] ≤
$\tfrac{1}{2}\,\mathrm{erfc}\big[A' t^{1/2} + B' \log\varepsilon \cdot t^{-1/2}\big]$, (3.11)
provided that the noise coefficients of (2.13) satisfy a certain "tameness" condition. Following the same reasoning, it is possible to establish similar exponential decay rates for the elimination of dominated strategies under (SRD), but the exact expressions for the constants in (3.10) and (3.11) are more complicated, so we do not present them here.

Stability analysis of equilibrium play.
In this section, our goal will be to investigate the stability and convergence properties of the stochastic replicator dynamics (SRD) with respect to equilibrium play. Motivated by a collection of stability results that is sometimes called the "folk theorem" of evolutionary game theory (Hofbauer and Sigmund, 2003), we will focus on the following three properties of the deterministic replicator dynamics (RD):
(1) Limits of interior orbits are Nash equilibria.
(2) Lyapunov stable states are Nash equilibria.
(3) Strict Nash equilibria are asymptotically stable under (RD).
Of course, given the stochastic character of the dynamics (SRD), the notions of Lyapunov and asymptotic stability must be suitably modified. In this SDE context, we have:
Definition 3.2.
Let $x^\ast \in X$. We will say that:
(1) $x^\ast$ is stochastically Lyapunov stable under (SRD) if, for every $\varepsilon > 0$ and for every neighborhood $U$ of $x^\ast$ in $X$, there exists a neighborhood $U_1 \subseteq U$ of $x^\ast$ such that
$$\mathbb{P}\big(X(t) \in U \text{ for all } t \ge 0\big) \ge 1 - \varepsilon \quad\text{whenever } X(0) \in U_1. \tag{3.12}$$
(2) $x^\ast$ is stochastically asymptotically stable under (SRD) if it is stochastically stable and attracting: for every $\varepsilon > 0$ and for every neighborhood $U$ of $x^\ast$ in $X$, there exists a neighborhood $U_1 \subseteq U$ of $x^\ast$ such that
$$\mathbb{P}\Big(X(t) \in U \text{ for all } t \ge 0 \text{ and } \lim_{t\to\infty} X(t) = x^\ast\Big) \ge 1 - \varepsilon \quad\text{whenever } X(0) \in U_1. \tag{3.13}$$
For (SRD), we have:
Theorem 3.3.
Let $X(t)$ be an interior solution orbit of the stochastic replicator dynamics (SRD) and let $x^\ast \in X$.
(1) If $\mathbb{P}(\lim_{t\to\infty} X(t) = x^\ast) > 0$, then $x^\ast$ is a Nash equilibrium of the noise-adjusted game $G_\sigma$.
(2) If $x^\ast$ is stochastically Lyapunov stable, then it is also a Nash equilibrium of the noise-adjusted game $G_\sigma$.
(3) If $x^\ast$ is a strict Nash equilibrium of the noise-adjusted game $G_\sigma$, then it is stochastically asymptotically stable under (SRD).

Remark. By the nature of the modified payoff functions (3.1), strict equilibria of the original game $G$ are also strict equilibria of $G_\sigma$, so Theorem 3.3 implies that strict equilibria of $G$ are also stochastically asymptotically stable under the stochastic dynamics (SRD). The converse does not hold: if the noise coefficients $\sigma_\alpha$ are sufficiently large, (SRD) possesses stochastically asymptotically stable states that are not Nash equilibria of $G$. This is consistent with the behavior of (SRD) in the pure noise case that we discussed in the previous section: if $X(t)$ starts within $\varepsilon$ of a vertex of $X$ and there are no payoff differences, then $X(t)$ converges to this vertex with probability at least $1-\varepsilon$.

Remark. The condition for $\alpha$ to be a strict equilibrium of the modified game is that
$$v_\beta - v_\alpha < \tfrac{1}{2}\big(\sigma_\alpha^2 + \sigma_\beta^2\big) \quad\text{for all } \beta \neq \alpha, \tag{3.14}$$
where the payoffs and the noise coefficients are evaluated at the vertex $e_\alpha$ of $X$ (note the similarity with (3.2)). To provide some intuition for this condition, consider the case of only two pure strategies, $\alpha$ and $\beta$, and assume constant noise coefficients. Letting $X(t) = X_\beta(t)$ and proceeding as in (2.20), we get $dX = X(1-X)\,\big[(v_\beta - v_\alpha)\,dt - \sigma\,dW\big]$, where $\sigma^2 = \sigma_\alpha^2 + \sigma_\beta^2$ and $W$ is a rescaled Wiener process.
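Before passing to the discrete-time heuristic, note that this two-strategy diffusion is easy to probe directly. The sketch below runs a plain Euler–Maruyama discretization of $dX = X(1-X)[(v_\beta - v_\alpha)\,dt - \sigma\,dW]$ with illustrative (hypothetical) payoff and noise values; it is only a numerical sanity check, not part of the formal argument.

```python
import numpy as np

def simulate_two_strategy(dv, sigma, x0=0.5, dt=1e-3, T=20.0, seed=0):
    """Euler-Maruyama scheme for dX = X(1-X)[dv dt - sigma dW], where X is
    the population share of beta-strategists and dv = v_beta - v_alpha."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(int(T / dt)):
        x += x * (1 - x) * (dv * dt - sigma * np.sqrt(dt) * rng.standard_normal())
        x = min(max(x, 1e-12), 1 - 1e-12)  # guard against discretization overshoot
    return x

# Noiseless benchmark: with v_beta > v_alpha, the share of beta tends to 1.
print(simulate_two_strategy(dv=1.0, sigma=0.0))   # close to 1

# A dominated beta with moderate noise: the share is driven towards 0.
print(simulate_two_strategy(dv=-1.0, sigma=0.5))  # close to 0
```

The hard clipping is only a numerical safeguard for the discretization; the continuous-time process itself never leaves $(0, 1)$.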
Heuristically, a discrete-time counterpart of $X(t)$ is then provided by the random walk:
$$X(n+1) - X(n) = X(n)\big(1 - X(n)\big)\Big[(v_\beta - v_\alpha)\,\delta + \sigma \xi_n \sqrt{\delta}\Big], \tag{3.15}$$
where $\xi_n \in \{+1, -1\}$ is a zero-mean Bernoulli process, and the noise term is multiplied by $\sqrt{\delta}$ instead of $\delta$ because $dW\cdot dW = dt$. For small $X$ and $\delta$, a simple computation then shows that, in the event $\xi_{n+1} = -\xi_n$, we have:
$$X(n+2) - X(n) = 2\delta X(n)\Big[v_\beta - v_\alpha - \tfrac{1}{2}\sigma^2\Big] + o(\delta) + o(X(n)). \tag{3.16}$$
Since $\sigma^2 = \sigma_\alpha^2 + \sigma_\beta^2$, the bracket is negative (so $X_\alpha = 1 - X$ increases) if and only if condition (3.14) is satisfied. Thus, (3.14) may be interpreted as saying that when the discrete-time process $X(n)$ is close to $e_\alpha$ and the random noise term $\xi_n$ takes two successive steps in opposite directions, then the process ends up even closer to $e_\alpha$. On the other hand, if the opposite strict inequality holds, then this interpretation suggests that $\beta$ should successfully invade a population where most individuals play $\alpha$ – which, in turn, explains (3.2).

Proof of Theorem 3.3.
Contrary to the approach of Hofbauer and Imhof (2009), we will not employ the stochastic Lyapunov method (see e.g. Khasminskii, 2012), which requires calculating the infinitesimal generator of (SRD). Instead, motivated by the recent analysis of Bravo and Mertikopoulos (2014), our proof will rely on the "dual" variables $Y_\alpha = \log X_\alpha$ that were already used in the proof of Theorem 3.1.

Part 1. We argue by contradiction. Indeed, assume that $\mathbb{P}(\lim_{t\to\infty} X(t) = x^\ast) > 0$ but that $x^\ast$ is not Nash for the noise-adjusted game $G_\sigma$, so $v^\sigma_\alpha(x^\ast) < v^\sigma_\beta(x^\ast)$ for some $\alpha \in \operatorname{supp}(x^\ast)$, $\beta \in A$. On that account, let $U$ be a sufficiently small neighborhood of $x^\ast$ in $X$ such that $v^\sigma_\beta(x) - v^\sigma_\alpha(x) \ge m$ for some $m > 0$ and for all $x \in U$. Then, by (3.6), we get:
$$dY_\alpha - dY_\beta = [v_\alpha - v_\beta]\,dt - \tfrac{1}{2}\big[(1-2X_\alpha)\sigma_\alpha^2 - (1-2X_\beta)\sigma_\beta^2\big]\,dt + \sigma_\alpha\,dW_\alpha - \sigma_\beta\,dW_\beta, \tag{3.17}$$
so, if $X(t)$ is an interior orbit of (SRD) that converges to $x^\ast$, we will have:
$$dY_\alpha - dY_\beta \le -m\,dt - d\xi \quad\text{for all large enough } t > 0, \tag{3.18}$$
where $\xi$ denotes the martingale part of (3.17). Since the diffusion coefficients of (3.17) are bounded, Lemma A.1 shows that $mt + \xi(t) \sim mt$ for large $t$ (a.s.), so
$$\log\frac{X_\alpha(t)}{X_\beta(t)} \le \log\frac{X_\alpha(0)}{X_\beta(0)} - mt - \xi(t) \sim -mt \to -\infty \quad\text{(a.s.)} \tag{3.19}$$
as $t \to \infty$. This implies that $\lim_{t\to\infty} X_\alpha(t) = 0$, contradicting our original assumption that $X(t)$ stays in a small enough neighborhood of $x^\ast$ with positive probability (recall that $x^\ast_\alpha > 0$); we thus conclude that $x^\ast$ is a Nash equilibrium of the noise-adjusted game $G_\sigma$, as claimed.

(In the setting of the preceding remark, it is more probable for $X(n)$ to decrease rather than increase: $X(n+2) > X(n)$ with probability 1/4, i.e. if and only if $\xi_n$ takes two positive steps, while $X(n+2) < X(n)$ with probability 3/4.)

Part 2. Assume that $x^\ast$ is stochastically Lyapunov stable. Then, every neighborhood $U$ of $x^\ast$ admits an interior trajectory $X(t)$ that stays in $U$ for all time with positive probability.
The proof of Part 1 shows that this is only possible if $x^\ast$ is a Nash equilibrium of the modified game $G_\sigma$, so our claim follows.

Part 3. To show that strict Nash equilibria of $G_\sigma$ are stochastically asymptotically stable, let $x^\ast = (\alpha^\ast_1, \dots, \alpha^\ast_N) \in X$ be a strict equilibrium of $G_\sigma$. Then, suppressing the population index $k$ as before, let
$$Z_\alpha = Y_\alpha - Y_{\alpha^\ast}, \tag{3.20}$$
so that $X(t) \to x^\ast$ if and only if $Z_\alpha(t) \to -\infty$ for all $\alpha \in A^\ast \equiv A \setminus \{\alpha^\ast\}$. To proceed, fix some probability threshold $\varepsilon > 0$ and a neighborhood $U$ of $x^\ast$ in $X$. Since $x^\ast$ is a strict equilibrium of $G_\sigma$, there exists a neighborhood $U_1 \subseteq U$ of $x^\ast$ and some $m > 0$ such that
$$v^\sigma_{\alpha^\ast}(x) - v^\sigma_\alpha(x) \ge m \quad\text{for all } x \in U_1 \text{ and for all } \alpha \in A^\ast. \tag{3.21}$$
Let $M > 0$ be sufficiently large so that $X(t) \in U_1$ if $Z_\alpha(t) \le -M$ for all $\alpha \in A^\ast$; we will show that if $M$ is chosen suitably (in terms of $\varepsilon$) and $Z_\alpha(0) < -2M$, then $X(t) \in U_1$ for all $t \ge 0$ and $Z_\alpha(t) \to -\infty$ with probability at least $1 - \varepsilon$, i.e. $x^\ast$ is stochastically asymptotically stable.

To that end, take $Z_\alpha(0) \le -2M$ in (3.20) and define the first exit time:
$$\tau_U = \inf\{t > 0 : X(t) \notin U_1\}. \tag{3.22}$$
By applying (3.17), we then get:
$$dZ_\alpha = dY_\alpha - dY_{\alpha^\ast} = \big[v^\sigma_\alpha - v^\sigma_{\alpha^\ast}\big]\,dt - d\xi, \tag{3.23}$$
where the martingale term $d\xi$ is defined as in (3.17), taking $\beta = \alpha^\ast$. Hence, for all $t \le \tau_U$, we will have:
$$Z_\alpha(t) = Z_\alpha(0) + \int_0^t \big[v^\sigma_\alpha(X(s)) - v^\sigma_{\alpha^\ast}(X(s))\big]\,ds - \xi(t) \le -2M - mt - \xi(t). \tag{3.24}$$
By the time-change theorem for martingales (Øksendal, 2007, Cor. 8.5.4), there exists a standard Wiener process $\widetilde W(t)$ such that $\xi(t) = \widetilde W(\rho(t))$, where $\rho = [\xi, \xi]$ denotes the quadratic variation of $\xi$; as such, we will have $Z_\alpha(t) \le -M$ whenever $\widetilde W(\rho(t)) \ge -M - mt$. However, with $\sigma$ Lipschitz over $X$, we readily get $\rho(t) \le Kt$ for some positive constant $K > 0$, so it suffices to show that the hitting time
$$\tau = \inf\big\{t > 0 : \widetilde W(t) = -M - mt/K\big\} \tag{3.25}$$
is finite with probability not exceeding $\varepsilon$.
Indeed, if a trajectory of $\widetilde W(t)$ has $\widetilde W(t) \ge -M - mt/K$ for all $t \ge 0$, we will also have
$$\widetilde W(\rho(t)) \ge -M - m\rho(t)/K \ge -M - mt, \tag{3.26}$$
so $\tau_U$ is infinite for every trajectory of $\widetilde W$ with infinite $\tau$, hence $\mathbb{P}(\tau_U < +\infty) \le \mathbb{P}(\tau < +\infty)$. Lemma A.2 then shows that $\mathbb{P}(\tau < +\infty) = e^{-2Mm/K}$, so, if we take
$M > -(2m)^{-1} K \log\varepsilon$, we get $\mathbb{P}(\tau_U = \infty) \ge 1 - \varepsilon$. Conditioning on the event $\tau_U = +\infty$, Lemma A.1 applied to (3.24) yields
$$Z_\alpha(t) \le -2M - mt - \xi(t) \sim -mt \to -\infty \quad\text{(a.s.)}, \tag{3.27}$$
so $X(t) \to x^\ast$ with probability at least $1 - \varepsilon$, as was to be shown. (Simply note that $X_{\alpha^\ast} = \big(1 + \sum_{\beta\in A^\ast} \exp(Z_\beta)\big)^{-1}$.) □

Remark. As mentioned before, Hofbauer and Imhof (2009) state a similar "evolutionary folk theorem" in the context of single-population random matching games under the stochastic replicator dynamics with aggregate shocks (2.13). In particular, Hofbauer and Imhof (2009) consider the modified game:
$$v^\sigma_\alpha(x) = v_\alpha(x) - \tfrac{1}{2}\sigma_\alpha^2, \tag{3.28}$$
where $\sigma_\alpha$ denotes the intensity of the aggregate shocks in (2.13), and they show that strict Nash equilibria of this noise-adjusted game are stochastically asymptotically stable under (2.13). It is interesting to note that the adjustments (3.1) and (3.28) do not coincide: the payoff shocks affect the deterministic replicator equation (RD) in a different way than the aggregate shocks of (2.13). Heuristically, in the model of Fudenberg and Harris (1992), noise is detrimental because, for a given expected growth rate, noise almost surely lowers the long-term average geometric growth rate of the total number of individuals playing $\alpha$ by the quantity $\tfrac{1}{2}\sigma_\alpha^2$. In a geometric growth process, the quantities that matter (the proper fitness measures) are these long-term geometric growth rates, so the relevant payoffs are those of this modified game. In our model, noise is not detrimental, but if it is strong enough compared to the deterministic drift, then, with positive probability, it may lead to other outcomes than the deterministic model.
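The geometric-growth point above can be made concrete with a toy computation in the spirit of the discrete-time footnote: with a growth factor of 1.5 or 0.6 with equal probability (illustrative values), the arithmetic mean exceeds 1 while the geometric mean — which governs long-run growth — falls below 1.

```python
import math
import random

# Growth factor g = 1.5 or 0.6 with probability 1/2 each (illustrative values).
factors, p = (1.5, 0.6), 0.5

arithmetic = p * factors[0] + (1 - p) * factors[1]
geometric = math.exp(p * math.log(factors[0]) + (1 - p) * math.log(factors[1]))
print(arithmetic)  # 1.05: growth "on average"
print(geometric)   # ~0.949: the long-run fitness measure is below 1

# A long product of i.i.d. factors concentrates around geometric**n, so the
# process decays almost surely despite its positive expected growth rate.
random.seed(0)
z = 1.0
for _ in range(5000):
    z *= random.choice(factors)
print(z < 1e-10)   # True
```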
Instead, the assumptions of Theorems 3.1 and 3.3 should be interpreted as guaranteeing that the deterministic drift prevails. One way to see this is to note that if $\beta$ strictly dominates $\alpha$ in the original game and both strategies are affected by the same noise intensity ($\sigma_\alpha = \sigma_\beta = \sigma$), then $\beta$ need not dominate $\alpha$ in the modified game defined by (3.1), unless the payoff margin in the original game is always greater than $\sigma^2$.

Remark. It is also worth contrasting Theorem 3.3 to the unconditional convergence and stability results of Mertikopoulos and Moustakas (2010) for the stochastic replicator dynamics of exponential learning (2.16). As in the case of dominated strategies, the reason for this qualitative difference is the distinct origins of the perturbation process: the Itô correction in (2.16) is "just right" with respect to the dual variables $Y_\alpha = \log X_\alpha$, so a state $x^\ast \in X$ is stochastically asymptotically stable under (2.16) if and only if it is a strict equilibrium of the original game $G$.

4. The effect of aggregating payoffs
In this section, we examine the case where players are less "myopic" and, instead of using revision protocols driven by their instantaneous payoffs, they base their decisions on the cumulative payoffs of their strategies over time. Formally, focusing for concreteness on the "imitation of success" revision protocol (2.7), this amounts to considering conditional switch rates of the form:
$$\tilde\rho_{\alpha\beta} = x_\beta U_\beta, \tag{4.1}$$
where
$$U_\beta(t) = \int_0^t v_\beta(x(s))\,ds \tag{4.2}$$
denotes the cumulative payoff of strategy $\beta$ up to time $t$. (In a discrete time setting, if $Z(n+1) = g(n) Z(n)$ and $g(n) = k_i$ with probability $p_i$, what we mean is that the quantity that a.s. governs the long-term growth of $Z$ is not $\mathbb{E}(g) = \sum_i p_i k_i$, but $\exp(\mathbb{E}(\ln g)) = \prod_i k_i^{p_i}$.) In this case, (RD) becomes:
$$\dot x_\alpha = x_\alpha\Big[U_\alpha - \sum_\beta x_\beta U_\beta\Big], \tag{4.3}$$
and, as was shown by Laraki and Mertikopoulos (2013), the evolution of mixed strategy shares is governed by the (deterministic) second order replicator dynamics:
$$\ddot x_\alpha = x_\alpha\Big[v_\alpha(x) - \sum_\beta x_\beta v_\beta(x)\Big] + x_\alpha\Big[(\dot x_\alpha/x_\alpha)^2 - \sum_\beta x_\beta\,(\dot x_\beta/x_\beta)^2\Big]. \tag{RD$^2$}$$
As in the previous section, we are interested in the effects of random payoff shocks on the dynamics (RD$^2$). If the game's payoff functions are subject to random shocks at each instant in time, then these shocks will also be aggregated over time, leading to the perturbed cumulative payoff process:
$$\hat U_\alpha(t) = \int_0^t v_\alpha(X(s))\,ds + \int_0^t \sigma_\alpha(X(s))\,dW_\alpha(s). \tag{4.4}$$
Since $\hat U_\alpha$ is continuous (a.s.), we obtain the stochastic integro-differential dynamics:
$$\dot X_\alpha = X_\alpha\Big[U_\alpha(t) - \sum_\beta X_\beta(t) U_\beta(t)\Big] + X_\alpha\Big[\int_0^t \sigma_\alpha(X(s))\,dW_\alpha(s) - \sum_\beta \int_0^t X_\beta(s)\sigma_\beta(X(s))\,dW_\beta(s)\Big], \tag{4.5}$$
where, as in (SRD), we assume that the Brownian disturbances $W_\alpha(t)$ are independent. To obtain an autonomous SDE from (4.5), let $V_\alpha = \dot X_\alpha$ denote the growth rate of strategy $\alpha$.
Then, differentiating (4.5) yields:
$$dV_\alpha = X_\alpha\Big[\dot U_\alpha - \sum_\beta X_\beta \dot U_\beta\Big]\,dt \tag{4.6a}$$
$$\qquad + V_\alpha\Big[U_\alpha - \sum_\beta X_\beta U_\beta\Big]\,dt - X_\alpha \sum_\beta U_\beta V_\beta\,dt \tag{4.6b}$$
$$\qquad + V_\alpha\Big[\int_0^t \sigma_\alpha(X(s))\,dW_\alpha(s) - \sum_\beta \int_0^t X_\beta(s)\sigma_\beta(X(s))\,dW_\beta(s)\Big]\,dt \tag{4.6c}$$
$$\qquad + X_\alpha\Big[\sigma_\alpha(X)\,dW_\alpha - \sum_\beta \sigma_\beta(X) X_\beta\,dW_\beta\Big]. \tag{4.6d}$$
By (4.5), the sum of the first term of (4.6b) and (4.6c) is equal to $V_\alpha^2/X_\alpha\,dt$. Thus, using (4.2) we obtain:
$$dV_\alpha = X_\alpha\Big[v_\alpha(X) - \sum_\beta X_\beta v_\beta(X)\Big]\,dt + \frac{V_\alpha^2}{X_\alpha}\,dt - X_\alpha\sum_\beta U_\beta V_\beta\,dt + X_\alpha\Big[\sigma_\alpha(X)\,dW_\alpha - \sum_\beta \sigma_\beta(X) X_\beta\,dW_\beta\Big], \tag{4.7}$$
and, after summing over all $\alpha$ and solving for $X_\alpha\sum_\beta U_\beta V_\beta\,dt$, we get the second order SDE system:
$$dX_\alpha = V_\alpha\,dt$$
$$dV_\alpha = X_\alpha\Big[v_\alpha(X) - \sum_\beta X_\beta v_\beta(X)\Big]\,dt + X_\alpha\Big[V_\alpha^2/X_\alpha^2 - \sum_\beta V_\beta^2/X_\beta\Big]\,dt + X_\alpha\Big[\sigma_\alpha(X)\,dW_\alpha - \sum_\beta \sigma_\beta(X) X_\beta\,dW_\beta\Big]. \tag{4.8}$$
(Recall that $\sum_\alpha dV_\alpha = 0$ since $\sum_\alpha X_\alpha = 1$.)
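As an illustration of the dynamics just derived, the following sketch integrates the noiseless cumulative-payoff dynamics (4.3) with a forward Euler scheme (the payoff values are hypothetical). Since the cumulative payoff gap of a dominated strategy grows linearly in $t$, its log-share decays quadratically — noticeably faster than under the first order dynamics.

```python
import numpy as np

def simulate_cumulative(v, x0, dt=1e-3, T=5.0):
    """Forward Euler for x'_a = x_a [U_a - sum_b x_b U_b] with the
    cumulative payoffs U_a(t) = int_0^t v_a(x(s)) ds of (4.2)."""
    x = np.array(x0, dtype=float)
    U = np.zeros_like(x)
    for _ in range(int(T / dt)):
        U = U + v(x) * dt                    # aggregate payoffs over time
        x = x + x * (U - np.dot(x, U)) * dt  # imitation of long-term success
        x = np.clip(x, 1e-300, None)
        x = x / x.sum()                      # re-project onto the simplex
    return x

v = lambda x: np.array([0.0, 1.0])  # strategy 0 is dominated with margin m = 1
x_T = simulate_cumulative(v, [0.5, 0.5])
print(x_T[0])  # roughly exp(-T**2/2) in relative terms: negligible at T = 5
```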
By comparing the second order system (4.8) to (RD$^2$), we see that there is no Itô correction, just as in the first order case. By using similar arguments as in Laraki and Mertikopoulos (2013), it is then possible to show that the system (4.8) is well-posed, i.e. it admits a unique (strong) solution $X(t)$ for every interior initial condition $X(0) \in X^\circ$, $V(0) \in \mathbb{R}^A$, and this solution remains in $X^\circ$ for all time.

With this well-posedness result at hand, we begin by showing that (4.8) eliminates strategies that are dominated in the original game $G$ (instead of the modified game $G_\sigma$):

Theorem 4.1.
Let $X(t)$ be a solution orbit of the dynamics (4.5) and assume that $\alpha \in A$ is dominated by $\beta \in A$. Then, $\alpha$ becomes extinct (a.s.).

Proof. As in the proof of Theorem 3.1, let $Y_\alpha = \log X_\alpha$. Then, following the same string of calculations leading to (3.6), we get:
$$dY_\alpha - dY_\beta = [U_\alpha - U_\beta]\,dt + \Big[\int_0^t \sigma_\alpha(X)\,dW_\alpha - \int_0^t \sigma_\beta(X)\,dW_\beta\Big]\,dt. \tag{4.9}$$
Since $\alpha$ is dominated by $\beta$, there exists some positive constant $m > 0$ such that $v_\alpha - v_\beta \le -m$, and hence $U_\alpha(t) - U_\beta(t) \le -mt$. Furthermore, with $\sigma$ bounded and continuous on $X$, Lemma A.1 readily yields:
$$-mt + \Big[\int_0^t \sigma_\alpha(X)\,dW_\alpha - \int_0^t \sigma_\beta(X)\,dW_\beta\Big] \sim -mt \tag{4.10}$$
as $t \to \infty$. Accordingly, (4.9) becomes:
$$dY_\alpha - dY_\beta \le -mt\,dt + \theta(t)\,dt, \tag{4.11}$$
where the remainder function $\theta(t)$ corresponding to the drift term of (4.9) is sublinear in $t$. By integrating and applying Lemma A.1 a second time, we then obtain:
$$Y_\alpha(t) - Y_\beta(t) \le Y_\alpha(0) - Y_\beta(0) - \tfrac{1}{2}mt^2 + \int_0^t \theta(s)\,ds \sim -\tfrac{1}{2}mt^2 \quad\text{(a.s.)}. \tag{4.12}$$
We infer that $\lim_{t\to\infty} X_\alpha(t) = 0$ (a.s.), i.e. $\alpha$ becomes extinct along $X(t)$. □

Remark. In view of Theorem 4.1, we see that the "imitation of long-term success" protocol (4.1) provides more robust elimination results than (2.7) in the presence of payoff shocks: contrary to Theorem 3.1, there are no "small noise" requirements in Theorem 4.1.

Our next result provides the analogue of Theorem 3.3 regarding the stability of equilibrium play:
Theorem 4.2.
Let $X(t)$ be an interior solution orbit of the stochastic dynamics (4.5) and let $x^\ast \in X$. Then:
(1) If $\mathbb{P}(\lim_{t\to\infty} X(t) = x^\ast) > 0$, then $x^\ast$ is a Nash equilibrium of $G$.

(The reason however is different: in (SRD), there is no Itô correction because the noise is added directly to the dynamical system under study; in (4.8), there is no Itô correction because the noise is integrated over, so $X_\alpha$ is smooth and, hence, obeys the rules of ordinary calculus. Recall also that $\int_0^t \sigma_\alpha(X(s))\,dW_\alpha(s)$ is continuous, so the only Itô correction stems from random mutations. Finally, Theorem 4.1 actually applies to mixed dominated strategies as well, even iteratively dominated ones; the proof is a simple adaptation of the pure strategies case, so we omit it.)
Moreover, for every neighborhood $U$ of $x^\ast$ and for all $\varepsilon > 0$, we have:
(2) If $\mathbb{P}(X(t) \in U \text{ for all } t \ge 0) \ge 1 - \varepsilon$ whenever $X(0) \in U_1$ for some neighborhood $U_1 \subseteq U$ of $x^\ast$, then $x^\ast$ is a Nash equilibrium of $G$.
(3) If $x^\ast$ is a strict Nash equilibrium of $G$, there exists a neighborhood $U_1$ of $x^\ast$ such that:
$$\mathbb{P}\Big(X(t) \in U \text{ for all } t \ge 0 \text{ and } \lim_{t\to\infty} X(t) = x^\ast\Big) \ge 1 - \varepsilon, \tag{4.13}$$
whenever $X(0) \in U_1$.

Remark. Part 1 of Theorem 4.2 is in direct analogy with Part 1 of Theorem 3.3: the difference is that Theorem 4.2 shows that only Nash equilibria of the original game $G$ can be $\omega$-limits of interior orbits with positive probability. Put differently, if $x^\ast$ is a strict equilibrium of $G_\sigma$ but not of $G$, there is zero probability that (4.5) converges to $x^\ast$.

On the other hand, Parts 2 and 3 are not tantamount to stochastic stability (Lyapunov or asymptotic) under the autonomous SDE system (4.8). The difference here is that (4.8) is only well-defined in the interior $X^\circ$ of $X$, so it is not straightforward how to define the notion of (stochastic) stability for boundary points $x^\ast \in \mathrm{bd}(X)$; moreover, given that (4.8) is a second order system, stability should be stated in terms of the problem's entire phase space, including initial velocities (for a relevant discussion, see Laraki and Mertikopoulos, 2013). Instead, the stated stability conditions simply reflect the fact that the integro-differential dynamics (4.5) always start with initial velocity $V(0) = 0$, so this added complexity is not relevant.

Proof of Theorem 4.2.
We shadow the proof of Theorem 3.3.

Part 1. Assume that $\mathbb{P}(\lim_{t\to\infty} X(t) = x^\ast) > 0$ for some $x^\ast \in X$. If $x^\ast$ is not a Nash equilibrium of $G$, we will have $v_\alpha(x^\ast) < v_\beta(x^\ast)$ for some $\alpha \in \operatorname{supp}(x^\ast)$, $\beta \in A$. Accordingly, let $U$ be a sufficiently small neighborhood of $x^\ast$ in $X$ such that $v_\beta(x) - v_\alpha(x) \ge m$ for some $m > 0$ and for all $x \in U$. Since $X(t) \to x^\ast$ with positive probability, it also follows that $\mathbb{P}(X(t) \in U \text{ for all } t \ge 0) > 0$; hence, arguing as in (4.12) and conditioning on the positive probability event "$X(t) \in U$ for all $t \ge 0$", we get:
$$Y_\alpha(t) - Y_\beta(t) \sim -\tfrac{1}{2}mt^2 \quad\text{(conditionally a.s.)}. \tag{4.14}$$
This implies $X_\alpha(t) \to 0$, contradicting our original assumption that $X(t)$ stays in a small neighborhood of $x^\ast$. We infer that $x^\ast$ is a Nash equilibrium of $G$, as claimed.

Part 2. Simply note that the stability assumption of Part 2 implies that there exists a positive measure of interior trajectories $X(t)$ that remain in an arbitrarily small neighborhood of $x^\ast$ with positive probability. The proof then follows as in Part 1.

Part 3. Let $Z_\alpha = Y_\alpha - Y_{\alpha^\ast}$ be defined as in (3.20) and let $m > 0$ be such that $v_{\alpha^\ast}(x) - v_\alpha(x) \ge m$ for all $x$ in some sufficiently small neighborhood $U$ of $x^\ast$. Also, let $M > 0$ be sufficiently large so that $X(t) \in U$ if $Z_\alpha(t) \le -M$ for all $\alpha \in A^\ast$; as in the proof of Theorem 3.3, we will show that there is a suitable choice of $M$ such that $Z_\alpha(0) < -2M$ for all $\alpha \neq \alpha^\ast$ implies that $X(t) \in U$ for all $t \ge 0$ and $Z_\alpha(t) \to -\infty$ for all $\alpha \neq \alpha^\ast$ with probability at least $1 - \varepsilon$.

(Recall here that strict equilibria of $G$ are also strict equilibria of $G_\sigma$, but the converse need not hold.)
Indeed, by setting $\beta = \alpha^\ast$ in (4.9), we obtain:
$$dZ_\alpha = [U_\alpha - U_{\alpha^\ast}]\,dt + \Big[\int_0^t \sigma_\alpha(X)\,dW_\alpha - \int_0^t \sigma_{\alpha^\ast}(X)\,dW_{\alpha^\ast}\Big]\,dt, \tag{4.15}$$
so, recalling (4.4), for all $t \le \tau_U = \inf\{t > 0 : X(t) \notin U\}$, we will have:
$$Z_\alpha(t) \le -2M - \tfrac{1}{2}mt^2 + \int_0^t \theta(s)\,ds - \xi(t), \tag{4.16}$$
where $\xi$ denotes the martingale part of (4.15) and $\theta(t)$ is defined as in (4.11).

Now, let $W(t)$ be a Wiener process starting at the origin. We will show that if $M$ is chosen sufficiently large, then
$$\mathbb{P}\Big(M + \tfrac{1}{2}mt^2 \ge \int_0^t W(s)\,ds \text{ for all } t \ge 0\Big) \ge 1 - \varepsilon. \tag{4.17}$$
With a fair degree of hindsight, we note first that $\tfrac{1}{2}mt^2 + M \ge at^2 + bt + c$ for $a = m/4$, $b = \sqrt{Mm/2}$ and $c = M/2$, so it suffices to show that the hitting time $\tau = \inf\{t : \int_0^t W(s)\,ds = at^2 + bt + c\}$ is infinite with probability at least $1 - \varepsilon$. However, if $\tau < \infty$ and $W(t) < 2at + b$ for all $t \le \tau$, we would get the contradiction
$$\int_0^\tau W(s)\,ds < a\tau^2 + b\tau < a\tau^2 + b\tau + c, \tag{4.18}$$
since $c > 0$. Hence, the hitting time $\tau' = \inf\{t > 0 : W(t) = 2at + b\}$ satisfies $\tau'(\omega) \le \tau(\omega)$ for every trajectory $\omega$ of $W$ with $\tau(\omega) < \infty$. However, Lemma A.2 gives $\mathbb{P}[\tau' < \infty] = \exp(-4ab)$, hence:
$$\mathbb{P}(\tau < \infty) \le \mathbb{P}(\tau' < \infty) = \exp(-4ab) = \exp\big(-m\sqrt{Mm/2}\big), \tag{4.19}$$
i.e. $\mathbb{P}(\tau < \infty)$ can be made arbitrarily small by choosing $M$ large enough. We thus deduce that
$$\int_0^t W(s)\,ds \le \tfrac{1}{2}mt^2 + M \quad\text{for all } t \ge 0 \tag{4.20}$$
with probability no less than $1 - \varepsilon$.

Going back to (4.16), we see that $\int_0^t \theta(s)\,ds - \xi(t) - \tfrac{1}{2}mt^2$ remains below $M$ for all time with probability at least $1 - \varepsilon$ (simply use the probability estimate (4.17) and argue as in the proof of Theorem 3.3, recalling that the processes $W_\alpha$ are assumed independent). In turn, this shows that $\mathbb{P}(X(t) \in U \text{ for all } t \ge 0) \ge 1 - \varepsilon$, so, conditioning on this last event and letting $t \to \infty$ in (4.16), we obtain:
$$\mathbb{P}\Big(\lim_{t\to\infty} Z_\alpha(t) = -\infty \,\Big|\, X(t) \in U \text{ for all } t \ge 0\Big) = 1 \quad\text{for all } \alpha \neq \alpha^\ast. \tag{4.21}$$
We conclude that $X(t)$ remains in $U$ and $X(t) \to x^\ast$ with probability at least $1 - \varepsilon$, as was to be shown.
□

5. Discussion
In this section, we discuss some points that would have otherwise disrupted the flow of the main text:
5.1. Payoff shocks in bimatrix games.
Throughout our paper, we have worked with generic population games, so we have not made any specific assumptions on the payoff shocks either. On the other hand, if the game's payoff functions are obtained from some common underlying structure, then the resulting payoff shocks may also end up having a likewise specific form.

For instance, consider a basic (symmetric) random matching model where pairs of players are drawn randomly from a nonatomic population to play a symmetric two-player game with payoff matrix $V_{\alpha\beta}$, $\alpha, \beta = 1, \dots, n$. In this case, the payoff to an $\alpha$-strategist in the population state $x \in X$ will be of the form:
$$v_\alpha(x) = \sum_\beta V_{\alpha\beta} x_\beta. \tag{5.1}$$
Thus, if the entries of $V$ are disturbed at each $t \ge 0$ by some (otherwise independent) white noise process $\xi_{\alpha\beta}$, the perturbed payoff matrix $\hat V_{\alpha\beta} = V_{\alpha\beta} + \xi_{\alpha\beta}$ will result in the total payoff shock:
$$\xi_\alpha = \sum_\beta \xi_{\alpha\beta} x_\beta. \tag{5.2}$$
The stochastic dynamics (2.8) thus become:
$$dX_\alpha = X_\alpha\Big[\sum_\beta V_{\alpha\beta} X_\beta - \sum_{\beta,\gamma} V_{\beta\gamma} X_\beta X_\gamma\Big]\,dt + X_\alpha\Big[\sum_\beta \sigma_{\alpha\beta} X_\beta\,dW_{\alpha\beta} - \sum_{\beta,\gamma} \sigma_{\beta\gamma} X_\beta X_\gamma\,dW_{\beta\gamma}\Big], \tag{5.3}$$
where the Wiener processes $W_{\alpha\beta}$ are assumed independent.

To compare (5.3) with the core model (SRD), the same string of calculations as in the proof of Theorems 3.1 and 3.3 leads to the modified payoff functions: $v^\sigma_\alpha(x) = v_\alpha(x)\; -$
$\tfrac{1}{2}(1 - 2x_\alpha)\sum_\beta \sigma^2_{\alpha\beta}\, x_\beta^2$. (5.4)
It is then trivial to see that Theorems 3.1 and 3.3 still apply with respect to the modified game $G_\sigma$ with payoff functions defined as above; however, seeing as (5.4) is cubic in $x$ and considering the case of constant noise, these modified payoff functions no longer correspond to random matching in a modified bimatrix game.

5.2. Stratonovich-type perturbations.
Depending on the origins of the payoffshock process (for instance, if there are nontrivial autocorrelation effects that donot vanish in the continuous-time regime), the perturbed dynamics (SRD) couldinstead be written as a Stratonovich-type SDE (Kuo, 2006): ∂X α = X α h v α ( X ) − X β X β v β ( X ) i dt + X α h σ α ∂W α − X β X β σ β ∂W β i , (5.5)where ∂ ( · ) denotes Stratonovich integration. In this case, if M αβ = X α ( δ αβ − X β ) σ β denotes the diffusion matrix of (5.5), the Itô equivalent SDE correspondingto (5.5) will be: dX α = X α h v α ( X ) − X β X β v β ( X ) i dt + 12 X β,γ ∂M αβ ∂X γ M γβ dt + X α h σ α dW α − X β X β σ β dW β i . (5.6) For a general overview of the differences between Itô and Stratonovich integration, see vanKampen (1981); for a more specific account in the context of stochastic population growth models,the reader is instead referred to Khasminskii and Potsepun (2006) and Hofbauer and Imhof (2009).
Then, assuming that the shock coefficients $\sigma_\beta$ are constant, some algebra yields the following explicit expression for the Itô correction of (5.6):
$$\tfrac{1}{2}\sum_{\beta,\gamma} \frac{\partial M_{\alpha\beta}}{\partial X_\gamma}\, M_{\gamma\beta}\,dt = \tfrac{1}{2} X_\alpha\Big[\sigma_\alpha^2(1-X_\alpha)^2 + \sum_{\beta\neq\alpha}\sigma_\beta^2 X_\beta^2 - \sum_\beta \sigma_\beta^2 X_\beta(1-X_\beta)\Big]\,dt = \tfrac{1}{2} X_\alpha\Big[(1-2X_\alpha)\sigma_\alpha^2 - \sum_\beta X_\beta(1-2X_\beta)\sigma_\beta^2\Big]\,dt. \tag{5.7}$$
By substituting this correction back to (5.6), we see that the replicator dynamics with Stratonovich shocks (5.5) are equivalent to the (Itô) stochastic replicator dynamics of exponential learning (2.16). In this context, Mertikopoulos and Moustakas (2010) showed that the conclusions of Theorems 3.1 and 3.3 apply directly to the original, unmodified game $G$ under (2.16), so dominated strategies become extinct and strict equilibria are stochastically asymptotically stable under (5.5) as well. Alternatively, this can also be seen directly from the correction term (5.7), which cancels with that of (3.1).

5.3. Random strategy switches.
An alternative source of noise to the players’evolution under (RD) could come from random masses of players that switch strate-gies without following an underlying deterministic drift – as opposed to jumps witha well-defined direction induced by a revision protocol. To model this kind of “mu-tations”, we posit that the relative mass dX αβ of players switching from α to β overan infinitesimal time interval dt is governed by the SDE: dX αβ = X α ( ρ αβ dt + dM αβ ) , (5.8)where dM αβ denotes the (conditional) mutation rate from α to β .To account for randomness, we will assume that M αβ has unbounded variationover finite time intervals (contrary to the bounded variation drift term X α ρ αβ dt ).Moreover, for concreteness, we will focus on the imitative regime where ρ αβ = x β r αβ and the mutation processes M αβ follow a similar imitative pattern, namely dM αβ = X β dR αβ . The net change in the population of α -strategists will then be X β X β dM βα − X α X β dM αβ = X α X β X β dQ βα , (5.9)where dQ βα = dR βα − dR αβ describes the net influx of β -strategists in strategy α per unit population mass. Thus, assuming that the increments dQ βα are zero-mean,we will model Q as an Itô process of the form: dQ αβ = η αβ ( X ) dW αβ (5.10)where W αβ is an ordinary Wiener process and the diffusion coefficients η αβ : X → R reflect the intensity of the mutation process. In particular, the only assumptionsthat we need to make for W and η are that: dW αβ = − dW βα and η αβ = η βα for all α, β ∈ A and for all k ∈ N , (5.11)so that the net influx from α to β is minus the net influx from β to α ; except forthis “conservation of mass” requirement, we will assume that the processes dW αβ are otherwise independent. Thus, in the special case of the “imitation of success”revision protocol (2.7), we obtain the replicator dynamics with random mutations : dX α = X α h v α ( X ) − X β X β v β ( X ) i dt + X α X β = α X β η βα ( X ) dW βα . 
(5.12)
This equation differs from (SRD) in that the martingale term of (SRD) cannot be recovered from that of (5.12) without violating the symmetry conditions (5.11) that guarantee that there is no net transfer of mass across any pair of strategies $\alpha, \beta \in A$. Nonetheless, by repeating the same analysis as in the case of Theorems 3.1 and 3.3, we obtain the following proposition for the stochastic dynamics (5.12):
Let $X(t)$ be an interior solution orbit of the stochastic dynamics (5.12) and consider the noise-adjusted game $G_\eta$ with modified payoff functions:
$$v^\eta_\alpha(x) = v_\alpha(x) - \tfrac{1}{2}\sum_{\beta\neq\alpha} x_\beta^2\, \eta^2_{\beta\alpha}(x). \tag{5.13}$$
We then have:
(1) If $p \in X$ is dominated in $G_\eta$, then it becomes extinct under (5.12).
(2) If $\mathbb{P}(\lim_{t\to\infty} X(t) = x^\ast) > 0$ for some $x^\ast \in X$, then $x^\ast$ is a Nash equilibrium of $G_\eta$.
(3) If $x^\ast \in X$ is stochastically Lyapunov stable, then it is a Nash equilibrium of $G_\eta$.
(4) If $x^\ast \in X$ is a strict Nash equilibrium of $G_\eta$, then it is stochastically asymptotically stable under (5.12).

Proof. The proof is similar to that of Theorems 3.1 (for Part 1) and 3.3 (for Parts 2–4), so we omit it. □
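The Itô corrections appearing in this section reduce to finite-dimensional algebraic identities, so they can be spot-checked numerically. The sketch below verifies, at a randomly drawn state, (i) that the $\alpha$-dependent part of the quadratic variation of the bimatrix noise in (5.3) matches the correction term of (5.4), and (ii) the contraction formula (5.7) for constant noise; the dimensions and coefficient ranges are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.dirichlet(np.ones(n))              # random interior population state

# --- Bimatrix shocks, cf. (5.3)-(5.4): the coefficient of dW_{mu,nu} in
# dX_alpha / X_alpha is sigma_{mu,nu} x_nu (delta_{alpha,mu} - x_mu).
sig = rng.uniform(0.1, 2.0, size=(n, n))
qv = np.empty(n)
for a in range(n):
    coeff = sig * x[None, :] * (np.eye(n)[a][:, None] - x[:, None])
    qv[a] = np.sum(coeff**2)
common = np.sum(x[:, None]**2 * sig**2 * x[None, :]**2)  # alpha-independent part
print(np.allclose(qv - common, (1 - 2 * x) * (sig**2 @ x**2)))  # True

# --- Stratonovich correction, cf. (5.7): contract dM_{ab}/dX_g with M_{gb}
# for M_{ab} = X_a (delta_ab - X_b) s_b and constant s.
s = rng.uniform(0.2, 1.5, size=n)
eye = np.eye(n)
M = x[:, None] * (eye - x[None, :]) * s[None, :]
dM = np.zeros((n, n, n))
for a in range(n):
    for b in range(n):
        for g in range(n):
            dM[a, b, g] = (eye[a, g] * (eye[a, b] - x[b]) - x[a] * eye[b, g]) * s[b]
contraction = np.einsum('abg,gb->a', dM, M)
closed_form = x * ((1 - 2 * x) * s**2 - np.dot(x, (1 - 2 * x) * s**2))
print(np.allclose(contraction, closed_form))  # True
```

Since both checks are exact algebraic identities, agreement holds to machine precision regardless of the sampled state.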
Appendix A. Auxiliary results from stochastic analysis
In this appendix, we state and prove two auxiliary results from stochastic analysisthat were used throughout the paper. Lemma A.1 is an asymptotic growth boundfor Wiener processes relying on the law of the iterated logarithm, while Lemma A.2is a calculation of the probability that a Wiener process starting at the origin hitsthe line a + bt in finite time. Both lemmas appear in a similar context in Bravoand Mertikopoulos (2014); we provide a proof here only for completeness and easeof reference. Lemma A.1.
Let $W(t) = (W_1(t), \dots, W_n(t))$, $t \ge 0$, be an $n$-dimensional Wiener process and let $Z(t)$ be a bounded, continuous process in $\mathbb{R}^n$. Then:
$$f(t) + \int_0^t Z(s)\cdot dW(s) \sim f(t) \quad\text{as } t \to \infty \ \text{(a.s.)}, \tag{A.1}$$
for any function $f\colon [0,\infty) \to \mathbb{R}$ such that $\lim_{t\to\infty} (t \log\log t)^{-1/2} f(t) = +\infty$.

Proof. Let $\xi(t) = \int_0^t Z(s)\cdot dW(s) = \sum_{i=1}^n \int_0^t Z_i(s)\,dW_i(s)$. Then, the quadratic variation $\rho = [\xi, \xi]$ of $\xi$ satisfies:
$$d[\xi,\xi] = d\xi\cdot d\xi = \sum_{i,j=1}^n Z_i Z_j \delta_{ij}\,dt \le M\,dt, \tag{A.2}$$
where $M = \sup_{t\ge 0} \lVert Z(t)\rVert^2 < +\infty$ (recall that $Z(t)$ is bounded by assumption). On the other hand, by the time-change theorem for martingales (Øksendal, 2007,
Corollary 8.5.4), there exists a Wiener process $\widetilde W(t)$ such that $\xi(t) = \widetilde W(\rho(t))$, and hence:
$$\frac{f(t) + \xi(t)}{f(t)} = 1 + \frac{\widetilde W(\rho(t))}{f(t)}. \tag{A.3}$$
Obviously, if $\lim_{t\to\infty} \rho(t) \equiv \rho(\infty) < +\infty$, $\widetilde W(\rho(\infty))$ is normally distributed, so $\widetilde W(\rho(t))/f(t) \to 0$ and there is nothing to show. Otherwise, if $\lim_{t\to\infty}\rho(t) = +\infty$, the quadratic variation bound (A.2) and the law of the iterated logarithm yield:
$$\frac{\big|\widetilde W(\rho(t))\big|}{f(t)} \le \frac{\big|\widetilde W(\rho(t))\big|}{\sqrt{2\rho(t)\log\log\rho(t)}} \times \frac{\sqrt{2Mt \log\log Mt}}{f(t)} \to 0 \quad\text{as } t\to\infty, \tag{A.4}$$
and our claim follows. □

Lemma A.2.
Let $W(t)$ be a standard one-dimensional Wiener process and consider the hitting time $\tau_{a,b} = \inf\{t > 0 : W(t) = a + bt\}$, $a, b \in \mathbb{R}$. Then:
$$\mathbb{P}(\tau_{a,b} < \infty) = \exp(-ab - |ab|). \tag{A.5}$$

Proof.
Let $\bar W(t) = W(t) - bt$ so that $\tau_{a,b} = \inf\{t > 0 : \bar W(t) = a\}$. By Girsanov's theorem (see e.g. Øksendal, 2007, Chap. 8), there exists a probability measure $Q$ such that a) $\bar W$ is a Brownian motion with respect to $Q$; and b) the Radon–Nikodym derivative of $Q$ with respect to $P$ satisfies
$$\frac{dQ}{dP}\bigg|_{\mathcal F_t} = \exp\big(-b^2 t/2 + bW(t)\big) = \exp\big(b^2 t/2 + b\bar W(t)\big), \tag{A.6}$$
where $\mathcal F_t$ denotes the natural filtration of $W(t)$. We then get
$$\mathbb{P}(\tau_{a,b} < t) = \mathbb{E}_P[\mathbb{1}(\tau_{a,b} < t)] = \mathbb{E}_Q\big[\mathbb{1}(\tau_{a,b} < t)\cdot \exp(-b^2 t/2 - b\bar W(t))\big] = \mathbb{E}_Q\big[\mathbb{1}(\tau_{a,b} < t)\cdot \exp(-b^2\tau_{a,b}/2 - b\bar W(\tau_{a,b}))\big] = \exp(-ab)\,\mathbb{E}_Q\big[\mathbb{1}(\tau_{a,b} < t)\cdot \exp(-b^2\tau_{a,b}/2)\big], \tag{A.7}$$
and hence:
$$\mathbb{P}(\tau_{a,b} < \infty) = \lim_{t\to\infty}\mathbb{P}(\tau_{a,b} < t) = \exp(-ab)\,\mathbb{E}_Q\big[\exp(-b^2\tau_{a,b}/2)\big] = \exp(-ab - |ab|), \tag{A.8}$$
where, in the last step, we used the expression $\mathbb{E}[\exp(-\lambda\tau_a)] = \exp(-|a|\sqrt{2\lambda})$ for the Laplace transform of the Brownian hitting time $\tau_a = \inf\{t > 0 : W(t) = a\}$ (Karatzas and Shreve, 1998). □

References
Akin, E., 1980: Domination or equilibrium. Mathematical Biosciences, 50(3-4), 239–250.
Bergstrom, T. C., 2014: On the evolution of hoarding, risk-taking, and wealth distribution in nonhuman and human populations. Proceedings of the National Academy of Sciences of the USA, 111(3), 10860–10867.
Bertsekas, D. P. and R. Gallager, 1992: Data Networks. 2nd ed., Prentice Hall, Englewood Cliffs, NJ.
Björnerstedt, J. and J. W. Weibull, 1996: Nash equilibrium and evolution by imitation. The Rational Foundations of Economic Behavior, K. J. Arrow, E. Colombatto, M. Perlman, and C. Schmidt, Eds., St. Martin's Press, New York, NY, 155–181.
Bravo, M. and P. Mertikopoulos, 2014: On the robustness of learning in games with stochastically perturbed payoff observations. http://arxiv.org/abs/1412.6565.
Cabrales, A., 2000: Stochastic replicator dynamics. International Economic Review, 41(2), 451–481.
Fudenberg, D. and C. Harris, 1992: Evolutionary dynamics with aggregate shocks. Journal of Economic Theory, 57(2), 420–441.
Hofbauer, J. and L. A. Imhof, 2009: Time averages, recurrence and transience in the stochastic replicator dynamics. The Annals of Applied Probability, 19(4), 1347–1368.
Hofbauer, J. and K. Sigmund, 2003: Evolutionary game dynamics. Bulletin of the American Mathematical Society, 40(4), 479–519.
Hofbauer, J., S. Sorin, and Y. Viossat, 2009: Time average replicator and best reply dynamics. Mathematics of Operations Research, 34(2), 263–269.
Imhof, L. A., 2005: The long-run behavior of the stochastic replicator dynamics. The Annals of Applied Probability, 15(1B), 1019–1045.
Karatzas, I. and S. E. Shreve, 1998: Brownian Motion and Stochastic Calculus. Springer-Verlag, Berlin.
Khasminskii, R. Z., 2012: Stochastic Stability of Differential Equations. 2nd ed., No. 66 in Stochastic Modelling and Applied Probability, Springer-Verlag, Berlin.
Khasminskii, R. Z. and N. Potsepun, 2006: On the replicator dynamics behavior under Stratonovich type random perturbations. Stochastics and Dynamics, 197–211.
Kuo, H.-H., 2006: Introduction to Stochastic Integration. Springer, Berlin.
Laraki, R. and P. Mertikopoulos, 2013: Higher order game dynamics. Journal of Economic Theory, 148(6), 2666–2695.
Littlestone, N. and M. K. Warmuth, 1994: The weighted majority algorithm. Information and Computation, 108(2), 212–261.
Mertikopoulos, P. and A. L. Moustakas, 2009: Learning in the presence of noise. GameNets '09: Proceedings of the 1st International Conference on Game Theory for Networks.
Mertikopoulos, P. and A. L. Moustakas, 2010: The emergence of rational behavior in the presence of stochastic perturbations. The Annals of Applied Probability, 20(4), 1359–1388.
Nachbar, J. H., 1990: Evolutionary selection dynamics in games. International Journal of Game Theory, 59–89.
Øksendal, B., 2007: Stochastic Differential Equations. 6th ed., Springer-Verlag, Berlin.
Robson, A. J. and L. Samuelson, 2011: The evolutionary foundations of preferences. Handbook of Social Economics, J. Benhabib, A. Bisin, and M. O. Jackson, Eds., North-Holland, Vol. 1, chap. 7, 221–310.
Rustichini, A., 1999: Optimal properties of stimulus-response learning models. Games and Economic Behavior, 230–244.
Samuelson, L. and J. Zhang, 1992: Evolutionary stability in asymmetric games. Journal of Economic Theory, 363–391.
Sandholm, W. H., 2010: Population Games and Evolutionary Dynamics. Economic Learning and Social Evolution, MIT Press, Cambridge, MA.
Schlag, K. H., 1998: Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits. Journal of Economic Theory, 78(1), 130–156.
Sorin, S., 2009: Exponential weight algorithm in continuous time. Mathematical Programming, 116(1), 513–528.
Taylor, P. D. and L. B. Jonker, 1978: Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40(1-2), 145–156.
van Kampen, N. G., 1981: Itô versus Stratonovich. Journal of Statistical Physics, 24(1), 175–187.
Vlasic, A., 2012: Long-run analysis of the stochastic replicator dynamics in the presence of random jumps. http://arxiv.org/abs/1206.0344.
Vovk, V. G., 1990: Aggregating strategies. COLT '90: Proceedings of the 3rd Workshop on Computational Learning Theory, 371–383.
Weibull, J. W., 1995: Evolutionary Game Theory. MIT Press, Cambridge, MA.

(P. Mertikopoulos) CNRS (French National Center for Scientific Research), LIG, F-38000 Grenoble, France, and Univ. Grenoble Alpes, LIG, F-38000 Grenoble, France
E-mail address : [email protected] URL : http://mescal.imag.fr/membres/panayotis.mertikopoulos (Y. Viossat) PSL, Université Paris–Dauphine, CEREMADE UMR7534, Place duMaréchal de Lattre de Tassigny, 75775 Paris, France
E-mail address ::