Ordinal Imitative Dynamics
George Loginov ∗ July 10, 2019
Abstract
This paper introduces an evolutionary dynamics based on the imitate the better realization (IBR) rule. Under this rule, agents in a population game imitate the strategy of a randomly chosen opponent whenever the opponent's realized payoff is higher than their own. Such behavior generates an ordinal mean dynamics which is polynomial in strategy utilization frequencies. We demonstrate that while the dynamics does not possess Nash stationarity or payoff monotonicity, under it pure strategies iteratively strictly dominated by pure strategies are eliminated and strict equilibria are locally stable. We investigate the relationship between the dynamics based on the IBR rule and the replicator dynamics. In trivial cases, the two dynamics are topologically equivalent. In Rock-Paper-Scissors games we conjecture that both dynamics exhibit the same types of behavior, but the partitions of the game set do not coincide. In other cases, the IBR dynamics exhibits behaviors that are impossible under the replicator dynamics.

Introduction

When information about the available strategies and their payoffs is limited, it may be reasonable for players in a population game to copy the behavior of their opponents if it yields, or at least seems to yield, better payoffs. Such copying gives rise to a family of imitative revision protocols, in which a player's decision to switch strategies depends on some summary of the relative performance of a random sample of opponents the player gets to observe.

The two components that comprise any imitative protocol are the sampling procedure, which determines the candidates to be imitated, and the conditional imitation rate, which describes the likelihood of imitation given the information about the candidates' strategies and payoffs. With respect to the payoff information, one can distinguish between protocols that rely on average payoffs and ones that only depend on realized payoffs from a small number of matches.

∗ Department of Economics, Augustana University, 2001 S Summit Ave, Sioux Falls, SD 57197, USA. Email: [email protected]

This paper studies the imitative protocol with the fewest information requirements. A player who gets a revision opportunity observes one opponent from the population at random and switches to that opponent's strategy whenever the opponent's realized payoff is higher than his or her own. This revision rule, labeled imitate the better realization (IBR), was first studied in Izquierdo and Izquierdo (2013) in the context of two-strategy games. (We elaborate on their results in Section 3.) It gives rise to an ordinal mean dynamics, since it ignores the magnitudes of payoff differences, and the resulting dynamics is polynomial in strategy utilization frequencies.

The disregard of payoff differences deprives the dynamics of some common cardinal properties. For instance, the Nash equilibria of the base game need not be rest points of the dynamics, and the average payoffs need not improve along the solution trajectories. At a rest point, instead of equilibrating the average payoffs of the surviving strategies, the dynamics balances the flows to and from each surviving strategy.

Despite not being monotone in average payoffs, the dynamics still eliminates pure strategies iteratively dominated by pure strategies. Weakly dominated strategies and pure strategies dominated by mixed strategies, on the contrary, may survive. In addition, we demonstrate that strict equilibria are locally stable.

The dynamics generated by the IBR rule in many cases qualitatively resembles the replicator dynamics, which can be derived from the pairwise proportional imitation (PPI) rule of Schlag (1998). Under the PPI rule, a revising agent observes one opponent from the population at random and switches to that opponent's strategy at a rate proportional to the payoff advantage of that strategy.
Thus, both the IBR and the PPI rule are based on comparisons of realized payoffs, but the magnitudes of payoffs matter only under the latter rule.

In two-strategy games the IBR dynamics and the replicator dynamics are topologically equivalent: they have the same number of rest points, and their stability and convergence properties are the same. In Rock-Paper-Scissors games both dynamics exhibit one of three possible behaviors: global convergence to the rest point, global convergence to the boundary, or closed orbits around the rest point, but these behaviors need not be the same. (In the first two cases we make a conjecture about global behavior based on local stability analysis and simulations. In the third case we prove the statement formally.) In other cases, for instance in Zeeman's game, the number of interior rest points the two dynamics possess is different.

The study of ordinal imitative dynamics originated with Hofbauer (1995) and Schlag (1998), in which the imitate if better (IB) dynamics, the average payoff counterpart of imitate the better realization, were introduced and their main properties established. In particular, Hofbauer (1995) demonstrates that in Rock-Paper-Scissors games the IB dynamics behaves similarly to the replicator dynamics.

The imitate the better realization rule can be viewed as an analogue of the word-of-mouth communication model (Ellison and Fudenberg (1993)) for strategic environments: agents can learn about the relative merits of strategies from others' experiences, but need not be able to find out the exact advantage of a particular strategy. For instance, one can learn about a better route from a neighbor or a better mode of behavior from an elder.
Imitate the Better Realization protocol and its properties
Suppose that a continuum of agents of mass 1 is randomly matched to play a symmetric two-player game with the payoff matrix A. Let S = {1, . . . , n} be the set of strategies, and for i, j ∈ S let π_ij be the payoff of strategy i against strategy j. At any instant, the population state x = (x_1, x_2, . . . , x_n) describes the proportions of players choosing each strategy. The set of all such population states is the (n − 1)-dimensional simplex ∆ = {x | Σ_{i=1}^n x_i = 1, x_i ≥ 0}.

Assume that the agents do not know the structure of the game, they are not aware of all the available strategies, and they do not keep a record of the strategies they used in the past or the payoffs they received in their previous interactions. In addition, they do not know the current population state, and they are not capable of correctly anticipating the way it will evolve. The only piece of information they possess and are able to retain is their current payoff, and the only way they can learn about alternative modes of behavior is by observing the strategies of others.

The only objective of the agents compatible with these assumptions is maximizing their current payoffs. As usual in a population setting, they are only able to switch their strategies infrequently, and once a strategy revision opportunity arises, the revising agent observes one opponent from the population at random and switches to that opponent's strategy whenever the opponent's realized payoff is higher than his or her own.

Effectively, the revising agent gets to compare their current payoff to a sample payoff of some other strategy without learning the circumstances under which that sample payoff was obtained. Thus agents cannot distinguish between strategies that perform better on average and favorable circumstances in which worse strategies outperform better ones. Besides, upon switching to the candidate strategy, the agent's payoff may differ from the sample payoff he or she got to observe.
As a result, such "blind" imitation may not be improving in terms of average payoffs, and yet it eliminates dominated strategies.

Given this imitate the better realization revision protocol, the switch rate ρ_ij from strategy i to strategy j can be expressed as the probability that a payoff drawn from the j-th row of the payoff matrix A exceeds a payoff drawn from the i-th row, with the population state x determining the distribution of draws:

ρ_ij(x, A) = x_j Σ_{k=1}^n x_k Σ_{m=1}^n x_m 1{π_jk > π_im}

The population setting offers the following interpretation: the realized payoff of a revising agent playing strategy i equals π_im with probability x_m for m ∈ S. Such an agent would observe a payoff realization π_jk for strategy j with probability x_j x_k for k ∈ S. Summing over m and k and counting the cases in which the payoff to the candidate strategy j is better yields the probability of switching ρ_ij(x, A).

A stochastic process that emerges when agents in the population independently receive revision opportunities can be approximated by its mean dynamics, which describes the expected change in the proportion of agents playing each strategy (Benaïm and Weibull (2003)). The mean dynamics of the process governed by imitation of the better realization is

ẋ_i = Σ_{j=1}^n x_j ρ_ji(x, A) − x_i Σ_{j=1}^n ρ_ij(x, A)    (IBRD)
    = x_i Σ_{j=1}^n x_j Σ_{k=1}^n x_k Σ_{m=1}^n x_m (1{π_jk < π_im} − 1{π_jk > π_im})

Thus in general the IBR dynamics is a quartic polynomial in the n variables {x_1, x_2, . . . , x_n}, but in certain cases, as shown in the next subsection, it reduces to the replicator dynamics, which is cubic in x_i and is the mean dynamics for a number of more information-demanding imitative rules (imitation via pairwise comparisons of Helbing (1992) and Schlag (1998), imitation driven by dissatisfaction of Björnerstedt and Weibull (1996), and imitation of success of Hofbauer (1995); see also Section 5.4.2 in Sandholm (2010)).
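The switch rates and the mean dynamics above can be transcribed directly into code. The sketch below is ours, not the paper's: the function name and the brute-force quadruple loop are illustrative choices, and the loop simply evaluates the quartic polynomial (IBRD) for an arbitrary payoff matrix.

```python
import numpy as np

def ibr_vector_field(x, A):
    """Mean dynamics (IBRD) of the imitate-the-better-realization rule.

    x : population state on the simplex; A[i, k] : payoff of strategy i
    against strategy k.  Evaluates, for each strategy i,
        xdot_i = x_i * sum_{j,k,m} x_j * x_k * x_m *
                 (1{A[j,k] < A[i,m]} - 1{A[j,k] > A[i,m]}).
    """
    n = len(x)
    xdot = np.zeros(n)
    for i in range(n):
        s = 0.0
        for j in range(n):
            for k in range(n):
                for m in range(n):
                    # sign(.) is +1, 0, or -1: the difference of the two indicators
                    s += x[j] * x[k] * x[m] * np.sign(A[i, m] - A[j, k])
        xdot[i] = x[i] * s
    return xdot
```

Because only the signs of payoff comparisons enter, the field is unchanged by any strictly increasing transformation of the payoffs, which is exactly the ordinality of the dynamics discussed above.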
The connection between the IBR dynamics and the replicator dynamics can be established via the pairwise proportional imitation protocol (PPI) introduced in Schlag (1998). Under the PPI the revising agent observes one opponent from the population at random and imitates that opponent's strategy at a rate proportional to its payoff advantage. Compared to the PPI rule based on realized payoffs, the IBR rule suppresses any payoff differences: under the IBR rule all conditional switch rates to strategies that exhibit higher outcomes are the same, whereas under the PPI rule the switch rate is higher the higher the outcome.

Under the PPI rule based on realized payoffs, an agent whose strategy i yields payoff π_im for some m ∈ S switches to strategy j if the observed payoff realization π_jk exceeds π_im, with a conditional switch rate proportional to the payoff advantage of j, which is [π_jk − π_im]_+. Accounting for the likelihood of each payoff realization π_jk, one can express the switch rate from strategy i to strategy j as

ρ_ij(x, A) = x_j Σ_{k=1}^n x_k Σ_{m=1}^n x_m [π_jk − π_im]_+    (PPI_R)

The PPI rule based on average payoffs generates the switch rates in the form

ρ_ij(x, A) = x_j [π_j − π_i]_+    (PPI_A)

where x_j is the probability of sampling an agent whose strategy is j, and the term [π_j − π_i]_+ is the (average) payoff advantage of strategy j over i. Due to linearity of average payoffs in the strategy utilization frequencies x_i, both kinds of proportional imitation generate the replicator dynamics as their mean dynamics (Theorem 3 of Schlag (1998)):

ẋ_i = x_i (π_i − π̄)    (RD)

The difference in the conditional switch rates under the IBR and the PPI rules leads to a significant dissimilarity in the resulting mean dynamics.
The rest points of the replicator dynamics are the restricted equilibria of the underlying game, so the average payoffs to all active strategies are the same and the agents have no incentives to switch between them. The rest points of the IBR dynamics, on the other hand, are "ordinal restricted equilibria" in which the net flow for each strategy is zero. Yet it is possible that the flows between the active strategies are positive at a rest point, and the average payoffs to strategies need not be the same. Thus in general, the sets of rest points of the replicator and the IBR dynamics do not coincide. However, as the following proposition states, when the cardinality of the set of payoffs is low, the IBR dynamics coincides with the replicator dynamics up to a constant change of speed.

Proposition 1.
If the payoff matrix A contains only two distinct payoffs, the IBR dynamics reduces to the replicator dynamics up to a constant change of speed.

Proof. Suppose WLOG that the set of payoffs is {π_ij} = {0, k} for some k > 0. Then for any pair of strategy profiles involving the strategies i and j the conditional switch rates from i to j under the IBR rule are proportional to those under the PPI rule. For any i, j, k, m ∈ {1, 2, . . . , n} the following holds:

[π_jk − π_im]_+ = k · 1{π_jk > π_im},

therefore ρ^PPI_ij(x, A) = k · ρ^IBR_ij(x, A), so the IBR dynamics is the replicator dynamics scaled by k. □

There are also payoff matrices with more than two distinct payoffs in which both protocols generate the same dynamics.

Example 1.
Consider a coordination game whose payoffs take more than two distinct values. Both the IBR and the PPI protocols generate the same dynamics: ẋ = x(1 − x)(2x − 1). Under the PPI rule an agent with the lowest realized payoff who observes a candidate with the highest payoff is twice as likely to switch to strategy 1 as an agent with the intermediate payoff, but at the same time an agent with the lowest payoff is twice as likely to switch to strategy 2 when he or she observes a candidate with the highest payoff rather than a candidate with the intermediate payoff. These higher switch rates annihilate each other, so in the end the flows between the strategies are identical to those under the IBR rule, where all switch rates upon observing a better outcome are the same. □

Section 4.1 presents another example in which the two dynamics draw closer: in the standard Rock-Paper-Scissors game the IBR dynamics can be obtained from the replicator dynamics by a positive non-constant change of speed. But Example 1 and the standard RPS game are exceptions to the general rule.

Payoff monotonicity and payoff positivity

In this section it is shown that the IBR dynamics need not preserve such cardinal properties as payoff monotonicity and payoff positivity. Payoff monotonicity (Nachbar (1990)) requires that the order of growth rates be the same as the order of average payoffs (if π_i > π_j then ẋ_i/x_i > ẋ_j/x_j). Payoff positivity (Nachbar (1990)) is a weaker requirement that a strategy have a positive growth rate if and only if its payoff is higher than the average payoff in the population. Weak payoff positivity (Weibull (1995)) is an even weaker requirement that among the strategies with above-average payoffs there is one with a positive growth rate. The next example demonstrates that all these properties are violated for the IBR dynamics even in two-strategy games:

Example 2.
Consider the coordination game with the payoff matrix

A =
( 10  0 )
(  3  3 )

Let x be the frequency of the first (top) strategy in the population. The mixed strategy equilibrium in game A is x* = 0.3, and for all x > x* the average payoff of the first strategy is higher than that of the second strategy. But the IBR dynamics for A is ẋ = x(1 − x)(2x − 1), so for all x < 0.5 the proportion of agents playing the first strategy is decreasing. Thus, for instance, when x = 0.4 we have π_1 = 4 > 3 = π_2, but ẋ(0.4) < 0, and so the IBR dynamics is neither payoff monotone nor payoff positive. □

Depending on the payoffs, any state x* ∈ (0,
1) can be the mixed Nash equilibrium of a game which has the same order of payoffs as game A from Example 2. Yet for all such games the interior rest point selected by the IBR dynamics is x = 0.5. Intuitively, at states between x* and 0.5 the first strategy already has a higher average payoff, yet the majority of agents playing it receive the lowest payoff and thus would treat switching to the other strategy as an improvement. The switches in the opposite direction are less likely, since the agents currently playing the second strategy would only imitate the minority of strategy 1 agents who currently receive the overall highest payoff. In the remainder of the state space the vector fields of the IBR dynamics and the replicator dynamics point in the same direction. For this to happen, a strategy with a higher average payoff needs to guarantee a better payoff for a larger share of agents than the other strategy.

This intuition also paves the way for the next result: elimination of dominated strategies. If strategy 1 dominates strategy 2, then at any interior population state there will be a positive flow from 2 to 1, and the net inflow from any other strategy will be higher for 1 than for 2. Together these effects result in the ultimate extinction of strategy 2.

Elimination of dominated strategies

This section sharpens the results on elimination of dominated strategies for imitative dynamics. As established by Nachbar (1990) and Samuelson and Zhang (1992), "cardinal" imitative dynamics, including the replicator dynamics, eliminate pure strategies (iteratively) dominated by other pure strategies due to payoff monotonicity. The IBR dynamics, on the contrary, is a non-monotone imitative dynamics which still eliminates such dominated strategies.
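The failure of payoff monotonicity in Example 2 is easy to reproduce numerically. The sketch below hard-codes the payoff matrix of Example 2 together with the closed-form IBR dynamics ẋ = x(1 − x)(2x − 1) reported there; the variable names are our own.

```python
import numpy as np

# Payoff matrix of Example 2: row 1 is the first (top) strategy.
A = np.array([[10.0, 0.0],
              [3.0, 3.0]])

x = 0.4                                  # share of strategy 1, above the mixed equilibrium x* = 0.3
state = np.array([x, 1.0 - x])

avg_payoffs = A @ state                  # average payoffs of the two strategies at this state
xdot = x * (1.0 - x) * (2.0 * x - 1.0)   # closed-form IBR dynamics for game A
```

At x = 0.4 the first strategy earns 4 on average against 3 for the second, yet its share shrinks, so both payoff monotonicity and payoff positivity fail.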
In terms of the comparison between the PPI and the IBR rules, this result means that imitation alone can be sufficient for the elimination of dominated strategies. In addition, Hofbauer and Sandholm (2011) demonstrate that under most dynamics not based on imitation dominated pure strategies can survive, in part because the agents in a population setting may be unable to recognize dominated strategies and thus avoid them. In the case of the IBR dynamics, the available payoff information is also insufficient to identify dominated strategies, and yet the agents switch away from them in the course of the play.

Proposition 2.
If a strategy is (iteratively) dominated by another pure strategy, then it is eliminated along any interior solution of the IBR dynamics.

Proof.
Suppose that strategy j is dominated by strategy i. Fix k ∈ {1, . . . , n}; by dominance, π_ik > π_jk. Take a strategy p ≠ i, j and consider all possible strategy profiles (p, q) that might arise in a match involving an agent playing this strategy. For a fixed q ∈ {1, . . . , n} the payoff π_pq falls into one of three categories:

1. π_pq ≤ π_jk < π_ik, in which case there is an inflow into each strategy: (x_p x_q x_k) x_i into strategy i and (x_p x_q x_k) x_j into j. Let P_k x_i and P_k x_j denote the total flows in this case.

2. π_jk < π_pq ≤ π_ik, so there is an outflow from strategy j and an inflow into strategy i, with total flows expressed as some Q_k x_i and Q_k x_j.

3. π_jk < π_ik < π_pq, so there is an outflow from both strategies, with total outflows expressed as R_k x_i and R_k x_j.

In addition, there is a net inflow T x_i x_j from strategy j to i, which includes at least the terms x_k² x_i x_j.

In terms of these flow components, the change in the population proportions of the strategies i and j can be expressed as

ẋ_i = Σ_{k=1}^n (P_k x_i + Q_k x_i − R_k x_i) + T x_i x_j
ẋ_j = Σ_{k=1}^n (P_k x_j − Q_k x_j − R_k x_j) − T x_i x_j

and so the change in the difference d in growth rates of the two strategies is always positive:

ḋ = ẋ_i/x_i − ẋ_j/x_j = Σ_{k=1}^n 2Q_k + T(x_i + x_j) > 0

Therefore d → +∞ as t → ∞, and thus strategy j is eliminated.

In the restricted game in which strategy j is eliminated, one can apply the same reasoning to demonstrate that any strategy that becomes dominated will be eliminated as well, by continuity of the IBR dynamics in the neighborhood of the edge opposite to the vertex x_j = 1. See Example 3 for an illustration of this argument. □

With a slight adjustment (π_ik > π_jk would hold for at least one k, but not necessarily all k ∈ {1, . . . , n}) the argument can be applied to weakly dominated strategies as well, but one should only consider weakly dominated strategies after all strictly dominated strategies are eliminated. Otherwise, a strategy that is weakly dominated only with respect to a strictly dominated strategy may survive, as illustrated by game A_1 in the following example.

Example 3. Consider the games A_1 and A_2.

Figure 1: Some solution trajectories in games A_1 (left) and A_2 (right).

In the game A_1 strategy 3 is dominated by both 1 and 2, and strategy 2 is weakly dominated by 1, but once strategy 3 is eliminated, both 1 and 2 coexist. From the perspective of an agent playing strategy 3, the remaining two strategies are equally good. The only case in which strategy 1 gains an advantage over strategy 2 is when an agent playing strategy 2 gets to imitate someone playing 1 against 3. The probability of observing such a candidate is x_1 x_3, so the switch rate from strategy 2 to strategy 1 is low near the pure state x_3 = 1 and near the edge x_3 = 0. It is relatively high near the center of the simplex, where the expression x_1 x_2 x_3 is maximized.

In the game A_2 strategy 3 is dominated by strategy 2, and after strategy 3 is eliminated, strategy 2 is dominated by strategy 1. In this game the solution trajectories originating near the pure state x_3 = 1 first move in the direction of the pure state x_2 = 1, since when most agents choose strategy 3 almost no one gets to imitate strategy 1, as the majority of strategy 1 agents receive the lowest payoff. But after the population state gets sufficiently close to x_2 = 1 and strategy 3 becomes almost extinct, agents begin to realize the advantage of 1 over 2. □

Another example that complements the result of Proposition 2 demonstrates that a strategy dominated by mixed strategies may survive.
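Proposition 2 can be illustrated numerically. In the sketch below the payoff matrix, the Euler discretization, and the time horizon are our own illustrative choices: strategy 1 strictly dominates strategy 2, which in turn dominates strategy 3, and integrating the mean dynamics (IBRD) forward drives the dominated strategies to extinction.

```python
import numpy as np

def ibr_field(x, A):
    # Mean dynamics (IBRD) of the imitate-the-better-realization rule.
    n = len(x)
    return np.array([
        x[i] * sum(x[j] * x[k] * x[m] * np.sign(A[i, m] - A[j, k])
                   for j in range(n) for k in range(n) for m in range(n))
        for i in range(n)
    ])

# Hypothetical game: payoffs depend only on one's own strategy,
# so strategy 1 dominates 2, and 2 dominates 3.
A = np.array([[3.0, 3.0, 3.0],
              [2.0, 2.0, 2.0],
              [0.0, 0.0, 0.0]])

x = np.array([1/3, 1/3, 1/3])
dt = 0.01
for _ in range(2000):            # forward Euler up to t = 20
    x = x + dt * ibr_field(x, A)
    x = np.clip(x, 0.0, 1.0)
    x = x / x.sum()              # guard against round-off drift off the simplex
```

Along the simulated solution the share of the dominant strategy approaches 1, while the iteratively dominated strategies 2 and 3 essentially vanish.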
Example 4.
Consider the games A_3 and A_4 with α ∈ (1, 2). (The figures in this and other examples are generated in EvoDyn-3s; see Izquierdo et al. (2018).)

Compared to game A_3, the order of payoffs in game A_4 is reversed. In both games strategy 2 is always the second best, and when α < 1.5 it is dominated by a mixed strategy of 1 and 3.

The IBR dynamics in game A_3 is

ẋ = x(1 − x)(1 − 2z)
ẏ = y(z − x)(1 − 2z)
ż = z(1 − z)(2z − 1)

where x, y, and z are the proportions of strategies 1, 2, and 3, respectively. When exactly half of the population (z = 0.5) chooses strategy 3, strategy 2 can survive. In the game A_3, strategy 2 is otherwise eliminated: strategy 1 is better than 2 when z < 0.5, while 3 is better than 2 when z > 0.5.

Figure 2: Some solution trajectories in games A_3 (left) and A_4 (right). The critical region is z = 0.5.

In game A_4 with the reversed order of payoffs the critical region z = 0.5 becomes absorbing, which suggests that strategy 2 survives along any trajectory originating in the interior of the state space. □

We conclude this section with a stability property for strict equilibria. If a strategy i is the unique best response to itself, then in the neighborhood of the pure state x_i = 1 an agent playing any strategy j ≠ i would most likely be matched against an opponent playing strategy i, and upon receiving a revision opportunity would most likely observe a candidate earning π_ii and as a consequence switch to i. The behavior of such agents creates an inflow into strategy i which is of higher order than any potential outflow caused by payoff advantages of other strategies over i, so the proportion of agents playing i increases, making the state x_i = 1 locally stable.

Proposition 3.
Strict symmetric equilibria are locally stable, with a basin of attraction that includes all states with x_i > (√5 − 1)/2.

Proof. Suppose that the strategy profile (i, i) is a strict equilibrium. To show that it is locally stable it is enough to demonstrate that ẋ_i > 0 whenever x_i is sufficiently close to 1, so that L(x) = 1 − x_i is a strict local Lyapunov function for the state x_i = 1. To do so, construct a lower bound on ẋ_i by considering a game with the lowest net inflow into strategy i.

Let x_i = 1 − ε. Since (i, i) is a strict equilibrium, for any j ≠ i we have π_ii > π_ji, so strategy i would be imitated by any agent who currently obtains π_ji and who observes a candidate obtaining π_ii. Such switches to strategy i create an inflow of x_i² Σ_{j≠i} x_j = (1 − ε)² ε, which is a lower bound on the inflow into strategy i.

To obtain a lower bound on ẋ_i, assume that in all other cases strategy i performs worse than its alternatives, i.e. π_im < π_jk for any m ∈ S and j, k ≠ i, and π_ik < π_ji for any j, k ≠ i. In the former case the outflow from strategy i is Σ_{m=1}^n x_i x_m Σ_{j,k≠i} x_j x_k = (1 − ε) ε², and in the latter it is Σ_{k≠i} x_i x_k Σ_{j≠i} x_j x_i = (1 − ε)² ε². The sum of these two components is an upper bound on the outflow from strategy i.

Subtracting the highest outflow from the lowest inflow yields the desired lower bound:

ẋ_i = Σ_{j=1}^n x_j ρ_ji(x, A) − x_i Σ_{j=1}^n ρ_ij(x, A)
    ≥ (1 − ε)² ε − (1 − ε) ε² − (1 − ε)² ε²
    = ε (1 − ε) (1 − 3ε + ε²)
    > 0 whenever ε < (3 − √5)/2.

Thus whenever ε < (3 − √5)/2 the trajectory from an initial condition with x_i = 1 − ε converges to x_i = 1, so any strict symmetric equilibrium is locally stable. □

Two-strategy games
In this section we completely characterize the behavior of the IBR dynamics for two-strategy games. This topic was first studied in Izquierdo and Izquierdo (2013), in which the dynamics is used to approximate the behavior of a finite population of agents who employ the IBR rule in the Hawk-Dove game. Izquierdo and Izquierdo (2013) also derive the general IBR equation for two-strategy games. Our paper complements their findings by identifying all possible rest points of the dynamics, and by demonstrating its equivalence to the replicator dynamics in terms of the number of rest points and the local behavior around them.

For two-strategy games with two distinct payoffs the IBR dynamics and the replicator dynamics coincide. Table 1 presents all possible cases with 3 or 4 distinct payoffs, grouped by the game type: D, a game with a dominant strategy; W, a game with a weakly dominant strategy; C, a coordination game; A, an anticoordination game.

Table 1: types of two-strategy games with 3 or 4 distinct payoffs.

In games with a (weakly) dominant strategy the set of rest points is {0, 1}, and trajectories from any interior state converge to the state in which the whole population plays the dominant strategy. The corresponding mean dynamics for each type are collected in Table 2.

Table 2: the mean dynamics for the games with (weakly) dominant strategies.

In coordination games the rest points are the two boundary states and the interior rest points. The interior rest points are repelling, and trajectories from the interior states converge to the boundary states. In anticoordination games the rest points are essentially the same as in coordination games, save for the order of the strategies, but the trajectories from the interior states converge to the interior rest point.

Rock-Paper-Scissors games

First consider the symmetric RPS game with the payoff matrix A and a, b > 0:

A =
(  0  −a   b )
(  b   0  −a )
( −a   b   0 )

For any values of the payoff parameters the game A induces the same order of payoffs as the standard RPS game (a = b = 1), for which the replicator dynamics is

ẋ = x(z − y),  ẏ = y(x − z),  ż = z(y − x),

where x, y, and z are the shares of agents playing Rock, Paper, and Scissors, respectively. The mean dynamics in game A generated by the IBR protocol is

ẋ = x(z − y)(1 − xy − xz − yz),
ẏ = y(x − z)(1 − xy − xz − yz),
ż = z(y − x)(1 − xy − xz − yz).

Thus in the standard RPS game the IBR dynamics is the replicator dynamics with speed adjusted by the positive non-constant function (1 − xy − xz − yz). This relationship helps identify the global behavior of the IBR dynamics in symmetric RPS games.

Proposition 4.
In all symmetric RPS games the trajectories under the IBR dynamics are closed orbits around the unique interior rest point (1/3, 1/3, 1/3).

Proof. Clearly, the only interior solution of the system is x* = y* = z* = 1/3. Since the IBR dynamics is the speed-adjusted replicator dynamics for this game, the Lyapunov function H(x) = x* log(x/x*) + y* log(y/y*) + z* log(z/z*) (introduced in Theorem 6 in Zeeman (1980)) is constant along the solutions of the IBR dynamics for all symmetric RPS games. □

Intuitively, learning the average payoff to a strategy (the information about the candidate strategy that a player receives under the proportional imitation rule based on average payoffs) is equivalent to learning the difference in the shares of winners and losers under that strategy. So the average payoff under the PPI rule is higher whenever the likelihood of switching under the IBR rule is higher.

In general, under the replicator dynamics the behavior of the system in a Rock-Paper-Scissors game depends solely on the determinant of the payoff matrix A: if det A = 0, the solutions form closed orbits around the interior rest point; if det A > 0, they converge to it; and if det A < 0, they converge to the boundary. Under the IBR dynamics the global behavior of the system can be one of the same three types: either all solutions converge to the interior steady state, or form closed orbits around it, or converge to the boundary. The difference is that the behavior depends on the order of payoffs, so for a fixed RPS game one can have any combination of behaviors under the replicator and the IBR dynamics.

In Table 5 we consider the nine possible orderings over the payoffs in the RPS game. In all cases Rock yields the highest positive payoff.

Table 5: The interior steady state is repelling in games A1-A3, an attractor in B1-B3, and a center in C1-C3.

Each B game is obtained from an A game with the same index by reversing the order of payoffs and subsequently relabelling the strategies. This procedure reverses the flows along the solution trajectories, so the repelling rest points in games of type A become attractors in games of type B.
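Both observations in this subsection, the speed-change relation behind Proposition 4 and the flow reversal behind the A-to-B construction, can be checked numerically. The helper functions below are our own transcriptions of (IBRD) and (RD); the sample states and the random test game are arbitrary choices.

```python
import numpy as np

def ibr_field(x, A):
    # Mean dynamics (IBRD): only the signs of payoff comparisons enter.
    n = len(x)
    return np.array([
        x[i] * sum(x[j] * x[k] * x[m] * np.sign(A[i, m] - A[j, k])
                   for j in range(n) for k in range(n) for m in range(n))
        for i in range(n)
    ])

def replicator_field(x, A):
    # Replicator dynamics (RD): xdot_i = x_i * (pi_i - average payoff).
    payoffs = A @ x
    return x * (payoffs - x @ payoffs)

# Standard Rock-Paper-Scissors (a = b = 1).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

x = np.array([0.5, 0.3, 0.2])
speed = 1.0 - (x[0] * x[1] + x[0] * x[2] + x[1] * x[2])

ibr = ibr_field(x, RPS)        # should equal speed * replicator_field(x, RPS)
rep = replicator_field(x, RPS)
```

The first identity is the speed change of Proposition 4. The second, checked in the test on an arbitrary game, is the reversal property used for the B games: negating all payoffs exactly negates the IBR field, i.e. it runs the dynamics backwards in time.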
The next proposition states this result formally. (We provide the proof for the closed-orbits case, and state the remaining two cases as conjectures based on simulations.)
Proposition 5. i) The unique interior rest point in any A game is repelling.ii) For any i ∈ { , , } the game B i can be obtained from the game ( − A i ) by relabeling the strategies.Proof. i) The statement is proved for the game A by direct computation. The proofs forgames A and A are similar.The IBR dynamics in game A can be written as F ( x , y , z ) = ˙ x ˙ y ˙ z = x ( z − y )( x + y + z ) y ( x − z )( x + y + z ) + yz ( y − x )( x + z ) z ( y − x )( x + y + z ) − yz ( y − x )( x + z ) To identify the interior rest points notice that ˙ x = y = z , in which case ˙ y = x − y ) (cid:16) x + y − y ( x + y ) (cid:17) = . Plugging in x = − y results in the equation(1 − y ) (cid:16) y − y + (cid:17) = , with the only real solution y = . Thus the unique interior rest point of the system F is( , , ).To identify the local behavior of the system F around the interior steady state, projectit from R onto ∆ , the two-dimensional simplex, to obtain the systemˆ F ( x , y ) = (cid:32) x (1 − x − y )( x − x + y (2 x + y − y − y + + xy ( y − x )(1 − x − y ) (cid:33) , ) is D ˆ F (cid:18) , (cid:19) = (cid:32) − − (cid:33) with the eigenvalues ± i √ . Since both eigenvalues have positive real parts the rest pointis repelling.ii) To see the relationship between A and B , write the matrices A , − A , and B side byside: A = − A = B = The game B can be obtained from ( − A ) by relabeling strategies 2 and 3. Formally theIBR dynamics in game B can be written as G ( x , y , z ) = ˙ x ˙ y ˙ z = x ( z − y )( x + y + z ) y ( x − z )( x + y + z ) − yz ( x − z )( x + y ) z ( y − x )( x + y + z ) + yz ( x − z )( x + y ) so − F ( x , y , z ) = G ( x , z , y ). Thus the system G is the time-reversed system F , so the eigenval-ues of the Jacobian evaluated at the interior rest point of ˆ G both have negative real parts,and that rest point is an attractor. 
Games of type C require a different approach: in them, the eigenvalues of the Jacobian at the interior rest point are purely imaginary, so local stability analysis via the Jacobian does not produce an unambiguous result. However, this obstacle can be overcome once one notices that, up to the strategy labels, reversing the order in any C game results in the same game.

Proposition 6.
The unique interior rest point in any C game is a center, and any trajectory from the interior forms a closed orbit around it.

Proof.
The proposition is proved for the game C_1; the proofs for games C_2 and C_3 are similar. Observe that the negative of the game C_1 is C_1 itself with strategies 2 and 3 interchanged. Formally, the IBR dynamics in C_1,

    H(x, y, z) = (ẋ, ẏ, ż) = ( x(z − y)(x + y + z),
                               y(x − z)(x + y + z) − yz(x − xy − xz − yz),
                               z(y − x)(x + y + z) + yz(x − xy − xz − yz) ),

has the property −H(x, y, z) = H(x, z, y).

To identify the interior rest points of the system H, observe that ẋ = 0 in the interior requires z = y, so that x = 1 − y − z = 1 − 2y. Plugging the expressions for x and z into ẏ = 0 yields a cubic equation in y with only one root y ≈ 0.374 in the interval [0, 1/2]; hence x∗ ≈ (0.252, 0.374, 0.374) is the unique interior rest point of H.

To show that the solution trajectories originating in the interior of the simplex form closed orbits around the rest point x∗, we first show that any such solution circles around x∗, and then apply the "self-negating" property to conclude that any circling solution trajectory must indeed be a closed orbit.

Figure 4: Left: the nullclines divide the simplex into 6 regions. Right: solution trajectories from the marked points have to remain within the red areas.

Observe that the x-nullcline is the x-bisector, while the y-nullcline and z-nullcline originate at the states y = 1 and z = 1, respectively, pass through the interior rest point x∗, and hit the opposite edges of the simplex. Together the nullclines divide the simplex into six regions: x is decreasing in regions 1, 2, and 3, and increasing in regions 4, 5, and 6; y is decreasing in 3, 4, and 5, and increasing in 6, 1, and 2; z is decreasing in regions 5, 6, and 1, and increasing in regions 2, 3, and 4.

Given the signs of the nullclines, the possible directions of motion in each of the six regions are restricted to a particular 60-degree wedge. For instance, in region 1, x and z must be decreasing, while y is increasing, so a solution originating from a point inside region 1 can only move toward the boundary z = 0. But as the solution gets closer to the boundary, the speed ż approaches 0, while the speeds ẋ and ẏ are bounded away from 0 within the red area that restricts the feasible directions of motion (formally, as z → 0, ẋ → −xy(x + y), ẏ → xy(x + y), and ż → 0). Hence near the edge z = 0 the solution cannot converge to the boundary and must cross the z-nullcline into region 2, where z must be increasing.

The trajectories originating in region 2 thus have to move away from the edge, but it is not immediately obvious that they cannot hit the rest point x∗. To exclude this possibility, one has to show that the y-component of every point in region 2 is at least as high as the y-component of x∗, so that x∗ cannot be reached while y must be increasing. This will be the case if the slope of the z-nullcline at x∗ is not lower than the slope of the line y = const. Using z = 1 − x − y, apply the Implicit Function Theorem to the equation of the z-nullcline to compute its slope dy/dx; evaluating it at the rest point x∗, which is characterized by z = y and x = 1 − 2y, shows that dy/dx|x=x∗ > 0. Thus at x∗ the z-nullcline has a positive slope, whereas the slope of the line y = const is 0 (in standard coordinates). Therefore at any point in the interior of region 2, y > y(x∗), and the trajectories originating in that region have to escape it via the y-nullcline.

Similarly, in region 4 the slope of the x-nullcline z − y = 0 exceeds the slope of the line z = const, which equals −1, and in region 6 the analogous Implicit Function Theorem computation for the y-nullcline yields dy/dx|x=x∗ < 0, whereas the line x = const is vertical. Hence the trajectories originating in regions 4 and 6 have to escape them via the nullclines. Therefore the solution trajectory from any interior initial condition circles around the interior rest point x∗ by sequentially entering and exiting each of the six regions via the nullclines.

Figure 5: The solution trajectory from x₀ (blue) and the time-reversed solution trajectory from x₀ (red).

The final step of the proof is to show that every solution trajectory is a closed orbit. Since all trajectories circle around x∗, it suffices to consider a solution originating at some point x₀ on the x-bisector between the rest point x∗ and the state x = 1. Suppose that once this solution completes a loop around the rest point, it hits the x-bisector again at some point x₁, whereas if one were to reverse the flow it would hit the bisector at a point x₋₁. The "self-negating" property of C_1 implies that the mirror image of the phase portrait of the negative of C_1 is the phase portrait of C_1 (the x-bisector being the axis of symmetry). In particular, the mirror image of the segment of the solution trajectory between x₋₁ and x₀ has to be the segment between x₀ and x₁. Therefore x₋₁ = x₁, but that is only possible if x₋₁ = x₁ = x₀, as otherwise x₋₁ and x₁ would be on opposite sides of x₀. Therefore all solution trajectories form closed orbits around x∗. □

Not every self-negating game shares this phase portrait: the game W in Example 5 is a self-negating game in which some interior solution trajectories form closed orbits, while others converge to a pure rest point.

In games with more than three strategies one might not get as much mileage out of self-negation. With only three strategies, the self-negating property implies an axisymmetric phase portrait, since there must be a pair of strategies that is relabeled when the order of payoffs is reversed. With four or more strategies it is possible that more than one pair of strategies is relabeled, and it is not entirely clear what this possibility implies.

5. Other examples

In this section we provide two more examples that relate the IBR and the replicator dynamics. First, we show that the game from Example 1 of Zeeman (1980) admits two interior rest points under the IBR dynamics. Such behavior is impossible under the replicator dynamics, under which non-degenerate games admit at most one interior rest point (Theorem 3 in Zeeman (1980)). This result suggests that the IBR dynamics generates more classes of game dynamics than the replicator dynamics. Second, we construct a self-negating game in which the interior is split into two regions, one containing closed orbits around the interior rest point, and the other being a basin of attraction for a pure rest point. Up to the position of the rest point, both dynamics are equivalent.
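The two structural facts the proof of Proposition 6 leans on — the self-negating identity −H(x, y, z) = H(x, z, y) and invariance of the simplex (the components of H sum to zero) — can be checked numerically for the system H as displayed above. The sketch treats the typeset form as given; the function name H and the RK4 integrator are our own:

```python
# Sanity checks on the displayed system H: (1) the "self-negating"
# identity -H(x, y, z) = H(x, z, y) after relabeling strategies 2 and 3,
# and (2) tangency to the simplex, so an RK4 trajectory keeps x+y+z = 1.
import random

def H(x, y, z):
    s = x + y + z
    q = x - x * y - x * z - y * z   # common factor in the y and z equations
    return (x * (z - y) * s,
            y * (x - z) * s - y * z * q,
            z * (y - x) * s + y * z * q)

random.seed(1)
for _ in range(1000):
    x, y, z = (random.random() for _ in range(3))
    hx, hy, hz = H(x, y, z)
    gx, gy, gz = H(x, z, y)             # strategies 2 and 3 relabeled
    assert abs(gx + hx) < 1e-12 and abs(gy + hz) < 1e-12 and abs(gz + hy) < 1e-12
    assert abs(hx + hy + hz) < 1e-12    # vector field sums to zero

def rk4_step(p, dt):
    # one classical Runge-Kutta step for the system H
    def shift(p, k, c):
        return tuple(a + c * b for a, b in zip(p, k))
    k1 = H(*p)
    k2 = H(*shift(p, k1, dt / 2))
    k3 = H(*shift(p, k2, dt / 2))
    k4 = H(*shift(p, k3, dt))
    return tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(p, k1, k2, k3, k4))

p = (0.5, 0.3, 0.2)
for _ in range(10000):
    p = rk4_step(p, 0.01)
assert abs(sum(p) - 1) < 1e-9           # trajectory stays on the simplex
print("self-negation and simplex invariance hold for the displayed H")
```

Note that the simulation only certifies invariance of the simplex, not the closed-orbit conclusion itself, which rests on the circling and mirror-image arguments of the proof.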
Example 5.
Consider the games Z and W. Game Z is the game from Example 1 in Zeeman (1980). The IBR dynamics in it,

    ẋ = x(1 − x)(x + y − z) − xyz,
    ẏ = −y(1 − y)(x + y − z) + xyz,
    ż = z(x − y)(z − x − y),

yields two distinct interior rest points z₁ and z₂. At z₁ the relevant eigenvalues are a pair of complex conjugates with negative real parts, so this rest point is stable, whereas at z₂ the eigenvalues are 0.41 and −0.041, so it is unstable.

Figure 6: Some solution trajectories in games Z (left) and W (right).

Game W has the self-negating property, so there are closed orbits around the interior rest point, but a part of the interior of the simplex is the basin of attraction of a pure state. In the time-reversed game a pure state becomes the attractor, while the interior rest point preserves its region with the closed orbits.

6. Conclusion

This paper investigated the properties of an imitative rule that ignores any cardinal information about the game's payoffs. Agents switch to strategies which they perceive as better, based on a comparison of their realized payoffs to that of a random member of the population. Since this behavioral rule bears a similarity to the pairwise proportional imitation of Schlag (1998), the resulting ordinal imitative dynamics begs comparison with the replicator dynamics arising from the PPI.

We demonstrate that while the IBR dynamics does not possess the payoff monotonicity and Nash stationarity properties of the replicator dynamics in general, the two dynamics are topologically equivalent in two-strategy games. We also conjecture that they generate the same types of behavior in Rock-Paper-Scissors games. In other cases, the IBR dynamics can generate behavior that is impossible under the replicator dynamics.

Better understanding the relationship between the two dynamics and investigating the self-negating property in games with more than three strategies would be the two most important directions for future research.
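The imitate-the-better-realization rule studied above can be illustrated with a minimal agent-based sketch in a standard Rock-Paper-Scissors game. Every concrete choice below (payoff matrix, population size, uniform random matching, number of revisions) is our own illustrative assumption, not taken from the paper:

```python
# Minimal agent-based sketch of the imitate-the-better-realization (IBR)
# rule: a revising agent observes one random opponent and copies the
# opponent's strategy whenever the opponent's realized payoff is higher.
import random

# standard RPS payoffs (illustrative choice)
PAYOFF = [[0, -1, 1],   # Rock   vs (Rock, Paper, Scissors)
          [1, 0, -1],   # Paper
          [-1, 1, 0]]   # Scissors

def realized_payoff(strategy, pop):
    """Payoff from a single match against a uniformly drawn opponent."""
    return PAYOFF[strategy][random.choice(pop)]

def ibr_step(pop):
    """One revision opportunity under the IBR rule."""
    i = random.randrange(len(pop))      # revising agent
    j = random.randrange(len(pop))      # observed agent
    if realized_payoff(pop[j], pop) > realized_payoff(pop[i], pop):
        pop[i] = pop[j]                 # imitate the better realization
    return pop

random.seed(42)
pop = [random.randrange(3) for _ in range(300)]
for _ in range(20000):
    ibr_step(pop)
shares = [pop.count(s) / len(pop) for s in range(3)]
print("strategy shares after 20000 revisions:", shares)
```

Because the rule conditions only on the sign of the payoff comparison, rescaling the payoffs monotonically leaves the simulated behavior unchanged, which is the ordinality the paper exploits.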
References

Benaïm, M. and Weibull, J. W. (2003). Deterministic approximation of stochastic evolution in games. Econometrica, 71:873–903.

Björnerstedt, J. and Weibull, J. W. (1996). Nash equilibrium and evolution by imitation. In Arrow, K. J. et al., editors, The Rational Foundations of Economic Behavior, pages 155–181. St. Martin's Press, New York.

Helbing, D. (1992). A mathematical model for behavioral changes by pair interactions. In Haag, G., Mueller, U., and Troitzsch, K. G., editors, Economic Evolution and Demographic Change: Formal Models in Social Sciences, pages 330–348. Springer, Berlin.

Hofbauer, J. (1995). Imitation dynamics for games. Unpublished manuscript, University of Vienna.

Hofbauer, J. and Sandholm, W. H. (2011). Survival of dominated strategies under evolutionary dynamics. Theoretical Economics, 6:341–377.

Izquierdo, L. R., Izquierdo, S. S., and Sandholm, W. H. (2018). EvoDyn-3s: A Mathematica computable document to analyse evolutionary dynamics in 3-strategy games. Unpublished manuscript.

Izquierdo, S. S. and Izquierdo, L. R. (2013). Stochastic approximation to understand simple simulation models. Journal of Statistical Physics, 151(1):254–276.

Nachbar, J. H. (1990). 'Evolutionary' selection dynamics in games: Convergence and limit properties. International Journal of Game Theory, 19:59–89.

Samuelson, L. and Zhang, J. (1992). Evolutionary stability in asymmetric games. Journal of Economic Theory, 57:363–391.

Sandholm, W. H. (2010). Population Games and Evolutionary Dynamics. MIT Press, Cambridge.

Schlag, K. H. (1998). Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits. Journal of Economic Theory, 78:130–156.

Viossat, Y. (2015). Evolutionary dynamics and dominated strategies. Economic Theory Bulletin, 3:91–113.

Weibull, J. W. (1995). Evolutionary Game Theory. MIT Press, Cambridge.

Zeeman, E. C. (1980). Population dynamics from game theory. In Nitecki, Z. and Robinson, C., editors,