A taxonomy of learning dynamics in 2 x 2 games
Marco Pangallo, James Sanders, Tobias Galla, and J. Doyne Farmer

Institute for New Economic Thinking at the Oxford Martin School, University of Oxford, Oxford OX2 6ED, UK
Mathematical Institute, University of Oxford, Oxford OX1 3LP, UK
Theoretical Physics, School of Physics and Astronomy, University of Manchester, Manchester M13 9PL, UK
Santa Fe Institute, Santa Fe, NM 87501, US
Abstract
Learning would be a convincing method to achieve coordination on an equilibrium. But does learning converge, and to what? We answer this question in generic 2-player, 2-strategy games, using Experience-Weighted Attraction (EWA), which encompasses many extensively studied learning algorithms. We exhaustively characterize the parameter space of EWA learning, for any payoff matrix, and we understand the generic properties that imply convergent or non-convergent behaviour in 2 × 2 games.

Key Words:
Behavioural Game Theory, EWA Learning, Convergence, Equilibrium, Chaos.
JEL Class.:
C62, C73, D83.

∗ Corresponding author: [email protected]. We thank Vince Crawford, Mike Harré, Cars Hommes, Peiran Jiao, Robin Nicole, Shabnam Mousavi and Peyton Young, as well as seminar participants at Nuffield College, INET YSI Plenary, Herbert Simon Society International Workshop, Conference on Complex Systems 2016 and King's College, for helpful comments and suggestions.

Introduction
How do players coordinate on specific profiles of strategies in non-cooperative games, and why should they coordinate on an equilibrium profile? If the game is simple or one-shot, a reasonable explanation is provided by strategic thinking and introspection. Another justification, which is more generally valid in complicated and repeated games, is learning and interaction. However, as has been well known since the contribution of Shapley (1964), the learning dynamics may fail to converge to an equilibrium. This calls into question the validity of equilibrium thinking in game theory: at least in some contexts, strategic interactions might be governed by learning in an ever-changing environment, rather than by rational and fully-informed decision making. The literature has faced the dilemma about the convergence of the learning dynamics to Nash Equilibria (NE) in several ways. Most theoretical work has identified classes of games and learning algorithms in which the dynamics succeeds in converging; some authors provided counter-examples in which learning would not converge. Little has been said about the generic properties of games and learning algorithms which yield convergent or non-convergent dynamics. Recent work (Galla and Farmer, 2013) addressed this issue by considering ensembles of 2-person, N-strategy games and finding the regions of the parameter space where learning was less likely to converge: negatively correlated payoffs and "rational" long-memory learning implied limit cycles and high-dimensional chaos in the learning dynamics. However, little understanding of the reasons for non-convergent behaviour was provided.

In order to shed light on the mechanisms behind (non-)convergence, this paper investigates the drivers of instability in the simplest possible non-trivial setting, that is, generic 2-person, 2-strategy normal form games, trying to capture the typical features of the payoff matrix and of the learning behaviour that yield cycling or irregular dynamics.
We study a slightly simplified version of Experience-Weighted Attraction (EWA), which is general enough to encompass both reinforcement and belief learning and has been shown to be in accord with experimental data (Camerer and Ho, 1999). In short, we find that the existence of a cycle of best responses in the payoff matrix, coupled with quick enough learning dynamics (in a sense that will be specified later), is a sufficient condition for the non-convergence of learning. In particular, in games with a unique mixed strategy equilibrium (to which we refer as discoordination games, lacking an established terminology in the literature) the players follow the cycle of best responses and never converge to the NE: we rather observe limit cycles or low-dimensional chaos. Lack of convergence is driven by the players adapting too quickly to the moves of their opponent. In the same learning scenario, if the payoff matrix is acyclic (there is at least one fixed point in terms of best responses, that is, a profile of strategies which is the best response by both players to some beliefs on their opponent), as in dominance-solvable and coordination games, convergence to a pure strategy NE occurs immediately. On the contrary, if the players are "irrational" and/or do not have enough incentives to switch their moves, they do not recognize that a pure strategy may be better and simply randomize between their possible moves, reaching a mixed strategy fixed point.

We find such a taxonomy of the learning dynamics by looking at relevant combinations of parameters, which naturally emerge from the mathematical analysis. Figure 1 illustrates our approach and provides a qualitative characterization of the parameter space. We denote by "irrationality" the ratio of two parameters of EWA, namely the memory loss of past performance α divided by the closeness to optimal decision making β (payoff sensitivity or intensity of choice).
"Coordination" (AC) depends on the payoff matrix and quantifies the preference of the players for "diagonal" outcomes: if we denote their pure strategies by 1 and 2, coordination is large when the payoffs associated with the profiles of strategies (1, 1) and (2, 2) are much larger than the payoffs for (1, 2) and (2, 1). "Dominance" (|BD|) on the other hand quantifies the relative strength of a pure strategy with respect to the other one. Coordination and dominance naturally relate to well-known classes of 2 × 2 games.

Footnote: Robinson (1951); Miyazawa (1961); Shapley (1964); Crawford (1974); Stahl (1988); Nachbar (1990); Milgrom and Roberts (1991); Krishna (1992); Conlisk (1993a); Monderer and Shapley (1996); Hahn (1999); Arieli and Young (2016).

Footnote: For instance, in Matching Pennies, if player Row (who wins if the pennies are matched) thinks that player Column would play Heads, the best response for Row would be to play Heads. The best response for Column to this move of player Row is to play Tails. Row would then switch to Tails as well, and so on.

[Figure 1: regions of the (β/α AC, β/α |BD|) plane labelled "Dominance-solvable games", "(Anti)coordination games" and "Discoordination games", with legend entries "Multiple fixed points", "Unique pure fixed point", "Unique mixed fixed point" and "Limit cycles and chaos", and example payoff matrices for each region.]

Figure 1: Qualitative characterization of the parameter space. The irrationality α/β refers to the intrinsic noise in the learning algorithm. Coordination (AC) and dominance (|BD|) quantify properties of the payoff matrix. The combinations of these parameters characterize the learning dynamics and relate to specific classes of 2 × 2 games.

Related literature
The first example of a normal form game where convergence of fictitious play (Brown, 1951; Robinson, 1951) did not occur was provided by Shapley (1964). He considered a 3 × 3 game in which fictitious play cycles among the profiles of strategies and never converges to the unique mixed strategy NE.

Footnote: Or vice versa: coordination is also large if the payoffs for (1, 2) and (2, 1) are much larger than the payoffs for (1, 1) and (2, 2).

Table 1: Games in the taxonomy. The games are defined in terms of the orderings in the payoff matrix

$$\Pi = \begin{pmatrix} a, e & b, g \\ c, f & d, h \end{pmatrix}.$$

  Coordination:        a > c, b < d, e > g, f < h.  Two pure strategy NE, (1, 1) and (2, 2).  (Example: AC = 72, |BD| = 6.)
  Anticoordination:    a < c, b > d, e < g, f > h.  Two pure strategy NE, (1, 2) and (2, 1).  (Example: AC = 12, |BD| = 0.)
  Discoordination:     a > c, e < g, b < d, f > h, or a < c, e > g, b > d, f < h.  Unique mixed strategy NE.  (Example: AC < 0, |BD| = 18.)
  Dominance-solvable:  all other possible orderings, e.g. a > c, b > d, e > g, f > h.  Unique pure strategy NE.  (Example: AC = 4, |BD| = 18.)

Coordination is AC, where A = (a + d − b − c)/4 and C = (e + h − f − g)/4, while dominance is |BD|, with B = (a + b − c − d)/4 and D = (e + f − g − h)/4. In dominance-solvable games, |BD| > |AC|; in coordination and anticoordination games, |BD| < |AC| and AC > 0; in discoordination games, |BD| < |AC| and AC < 0. Note that there are some exceptions: see Proposition 1.
Another literature focused on the generic properties of the payoff matrices and learning algorithms that were associated with multiplicity of NE or non-convergent behaviour. Berg and Weigt (1999) showed how the number of NE increases exponentially with the correlation of the payoffs, while Opper and Diederich (1992) considered the replicator dynamics with a large number of species and used techniques from the statistical physics of disordered systems to show how, below a certain level of cooperation pressure (a parameter characterizing the learning algorithm), the dynamics becomes unstable. More recently, Galla and Farmer (2013) analysed random games and EWA learning, showing that high-dimensional chaos and limit cycles could be observed in a significant portion of the parameter space, for negatively correlated payoffs.

This paper bridges the two described literatures in that we exhaustively characterize the parameter space of EWA in generic 2 × 2 games, relating ex-post the learning dynamics to specific classes of games based on the convergence properties of the learning algorithm, rather than focusing ex-ante on any specific class of games. We show that convergence occurs in acyclic 2 × 2 games. In 2 × 2 games, with only two pure strategies available to each player, we find low-dimensional chaos, in contrast with Galla and Farmer (2013), who find high-dimensional chaotic attractors (which are consistent with an essentially random and unpredictable learning dynamics) in games with many pure strategies. Since we find a quasi-cyclical learning dynamics, it can sensibly be argued that the pattern can be guessed by one of the players, who could then take advantage of her forecast of the moves of her opponent in order to systematically outguess his choices, and thereby perform better than him. In evolutionary terms, the player who can guess the cyclical behaviour of her opponent has higher fitness and is eventually expected to take over the entire population.
This is the rational expectations argument of Muth (1961) and would suggest that the cyclic behaviour is expected to die out. However, in line with the view of the rational route to randomness (Brock and Hommes, 1997), this is not an obvious outcome. The information cost for guessing the moves of the other player and the interaction between two or more forecasting strategies easily yield complex dynamics, preventing rational and perfectly informed players from outperforming less sophisticated players. Hommes et al. (2016) apply this formalism to the theory of learning in games by considering the interplay between rational play and a short-memory adjustment process such as best-response dynamics or fictitious play in Cournot games. Rational players are able to outguess the choices of their opponents, but complex dynamics may still occur. In a different context, Huberman and Hogg (1988) show that more sophisticated learning algorithms may lead to chaotic dynamics.

Another understandable critique is whether our learning algorithm can be considered representative of how players learn in reality, and whether limit cycles or chaos in the learning dynamics play a role in the real world and could be detected in experiments. Camerer and Ho (1999) and Ho et al. (2007) fit the EWA model to experimental data in several classes of games and show that it outperforms other learning models in most cases. However, it is likely that the players would change their learning strategy as the game evolves, implying that they learn how to learn. Stahl (1996) considered a model of rule learning where the players are of different k-levels (Nagel, 1995) and change their k-level using reinforcement learning. Crawford (1995) proposed a generalization of the standard belief learning algorithms to take into account time-varying memory and idiosyncratic shocks.
Would we find the same qualitative learning dynamics if we used more sophisticated learning algorithms? Our analysis suggests that limit cycles and chaos may theoretically be observed as long as the players are willing to quickly switch their moves, independently of the reason why they behave so. A property of the cycling behaviour, as opposed to convergence to a mixed strategy equilibrium, is the slower decay in the autocorrelation function of the moves chosen by each player. In the language of time series, the sequence of moves by each player exhibits persistence. This is a precise theoretical prediction that can be tested against data on experimental learning of discoordination games.

Organization of the paper
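The persistence prediction can be made concrete: the sample autocorrelation of a player's move sequence (coded as 0/1) decays slowly when the player follows a best-response cycle. A minimal sketch (the function name is ours, not part of the paper):

```python
def autocorrelation(moves, lag):
    """Lag-k sample autocorrelation of a move sequence coded as 0/1."""
    n = len(moves)
    mean = sum(moves) / n
    var = sum((m - mean) ** 2 for m in moves) / n
    if var == 0.0:
        return 0.0  # a constant sequence has no defined correlation structure
    cov = sum((moves[t] - mean) * (moves[t + lag] - mean)
              for t in range(n - lag)) / n
    return cov / var

# A player cycling with period 4 (e.g. following a cycle of best responses)
# shows persistent, slowly decaying correlations:
cycling = [1, 1, 0, 0] * 250
print(autocorrelation(cycling, 2))  # close to -1: anti-phase at half the period
print(autocorrelation(cycling, 4))  # close to +1: in phase at the full period
```

For an i.i.d. randomizing player, by contrast, all autocorrelations at positive lags are close to zero, which is the signature of convergence to a mixed strategy fixed point.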
The rest of this paper is organized as follows: in Section 2 we define the classes of 2 × 2 games, in Section 3 we describe EWA learning and the simplifications we adopt, in Section 4 we analyse the dynamical properties of EWA learning, and in Section 5 we consider extensions.

Based on the properties one wants to look at, it is possible to construct several classifications of 2-person, 2-strategy (2 × 2) games. Rapoport et al. (1976) find 78 classes of games, which can be reduced to 24 when fewer properties are considered. Here we are only concerned with the number of Nash Equilibria (NE) and with their type, i.e. whether they are pure or mixed strategy NE.

Footnote: The dimensionality of the chaotic attractors quantifies the departure from regular oscillations.

We only find 3 classes of 2 × 2 games under this criterion. The generic payoff matrix of a 2 × 2 game is

$$\Pi = \begin{pmatrix} a, e & b, g \\ c, f & d, h \end{pmatrix}. \qquad (1)$$

The number and type of the NE depend on the pairwise ordering of the payoffs each player compares, namely (a, c) and (b, d) for player Row, and (e, g) and (f, h) for player Column. There are 2⁴ = 16 such orderings. We find the following classes of 2 × 2 games:

• Coordination and anticoordination games are respectively defined by the orderings a > c, d > b, e > g, h > f and a < c, d < b, e < g, h < f. Coordination 2 × 2 games have two pure strategy NE, (1, 1) and (2, 2); anticoordination 2 × 2 games have two pure strategy NE, (1, 2) and (2, 1).

• Discoordination games are defined by the orderings a > c, d > b, e < g, h < f and a < c, d < b, e > g, h > f (again, there exists no standard terminology for this class of games). They have a unique mixed strategy NE and no pure strategy NE because the players have incentives to coordinate on different profiles of strategies. The prototypical discoordination game is Matching Pennies.

• Dominance-solvable games are defined by all 12 remaining possible orderings. They have a unique pure strategy NE, obtainable from the elimination of strongly dominated strategies. For instance, if a > c, d < b, e > g, h < f, the NE is (1, 1).

To characterize 2 × 2 games it is useful to define the following combinations of the payoffs:

$$A = \frac{1}{4}(a + d - b - c), \quad B = \frac{1}{4}(a + b - c - d), \quad C = \frac{1}{4}(e + h - f - g), \quad D = \frac{1}{4}(e + f - g - h). \qquad (2)$$

The parameter A indicates the preference of player Row for outcomes of the type (1, 1) and (2, 2) over the cases (1, 2) and (2, 1); C is a measure for the preference of player Column for the same "diagonal" outcomes. It is then sensible to use the product AC as a measure of overall coordination. We then name AC the "coordination" parameter. If both A and C are positive and large, coordination is positive and large and both players prefer outcomes (1, 1) and (2, 2). If both A and C are negative and large, coordination is still positive and large; the payoff matrix describes an anticoordination game and both players prefer outcomes (1, 2) and (2, 1). If A is positive and large and C is negative and large, coordination is negative and large: one player prefers outcomes (1, 1) and (2, 2), while the other prefers (1, 2) and (2, 1).

B is a measure for the dominance of player Row's first strategy over her second, and similarly D measures the dominance of player Column's first strategy over her second. We refer to |BD| as the "dominance" parameter (we take the absolute value of the product BD because its sign only determines which profile of strategies is selected as the NE, but does not change the type of game). If dominance is large the payoff matrix describes a dominance-solvable game and it is sensible that the learning dynamics is characterized by a unique fixed point, close to the pure strategy NE.

These statements are made more precise in the following proposition:

Proposition 1. (i) In symmetric games (A = C, B = D), where coordination (AC = A²) and dominance (|BD| = B²) are automatically positive, it is equivalent to consider |A| as the coordination parameter and |B| as the dominance parameter. If coordination is larger than dominance (|A| > |B|), the payoff matrix describes a coordination (if A > 0) or anticoordination (if A < 0) game. Vice versa, if |A| < |B|, it describes a dominance-solvable game.

(ii) In asymmetric games (A ≠ C, B ≠ D), if coordination in absolute value is smaller than dominance (|AC| < |BD|), the game is dominance-solvable; in the opposite case (|AC| > |BD|), we cannot disambiguate between the classes of games using only these parameters. In particular, if both |B| < |A| and |D| < |C|, the payoff matrix describes a coordination (if AC > 0, A > 0, C > 0), anticoordination (if AC > 0, A < 0, C < 0) or discoordination (if AC < 0) game. On the other hand, if |B| > |A| or |D| > |C|, even if |AC| > |BD|, the game is dominance-solvable. However, the larger the value of coordination (compared to dominance), the less likely it is that the payoff matrix describes a dominance-solvable game.
The proof of Proposition 1 is in Appendix A, where we also show that there are only 4 effective degrees of freedom in the payoff matrix, as far as the NE and the dynamical properties of EWA learning are concerned.
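The classification by orderings of Section 2 and the parameter definitions in Eq. (2) can be implemented mechanically; a minimal sketch (function names are ours), with payoffs laid out as in Eq. (1) and ties between payoffs ignored (the analysis assumes generic games):

```python
def parameters(a, b, c, d, e, f, g, h):
    """Combinations of payoffs defined in Eq. (2); layout as in Eq. (1)."""
    A = (a + d - b - c) / 4
    B = (a + b - c - d) / 4
    C = (e + h - f - g) / 4
    D = (e + f - g - h) / 4
    return A, B, C, D

def classify(a, b, c, d, e, f, g, h):
    """Classify a generic 2x2 game by the pairwise payoff orderings:
    Row compares (a, c) and (b, d); Column compares (e, g) and (f, h)."""
    row_diag = (a > c, d > b)   # Row best-responds on the diagonal
    col_diag = (e > g, h > f)   # Column best-responds on the diagonal
    if all(row_diag) and all(col_diag):
        return "coordination"
    if not any(row_diag) and not any(col_diag):
        return "anticoordination"
    if (all(row_diag) and not any(col_diag)) or \
       (not any(row_diag) and all(col_diag)):
        return "discoordination"
    return "dominance-solvable"

# Matching Pennies: Row wins (payoff 1) on matched pennies, Column on mismatched.
print(classify(1, -1, -1, 1, -1, 1, 1, -1))    # discoordination
print(parameters(1, -1, -1, 1, -1, 1, 1, -1))  # (1.0, 0.0, -1.0, 0.0) -> AC < 0
```

Consistently with Table 1, Matching Pennies has AC = −1 < 0 and BD = 0.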
In this section we describe Experience-Weighted Attraction (EWA) learning and we list all mathematical simplifications that ease the subsequent analysis. In Section 3.1 we provide a formal definition of EWA and discuss the meaning of its parameters. In Section 3.2 we start to simplify the dynamics by assuming that the experience (one of the EWA components) has already reached a steady state and by taking a deterministic limit. In Section 3.3 we specify a diffeomorphism that allows us to substantially simplify the equations governing the learning dynamics, with no loss in generality.
Camerer and Ho (1999) proposed EWA as a hybrid of reinforcement learning (the players learn on the basis of the performance of their actions) and belief learning (the players construct beliefs on the possible actions of their opponents and respond to these beliefs). They noticed that the two largely studied classes of learning algorithms are in fact equivalent if the players also consider forgone payoffs. Thanks to the generality of EWA, the fit with experimental data is better than with pure reinforcement or pure belief learning. The reason is that real players learn using both information about performance and beliefs.

Footnote: The other requirement is that they average between the current payoff for a certain strategy and the past tendency to play the same strategy, see Section 3.2.

We now introduce some notation. Consider a 2-person, 2-strategy normal form game. We index the players by µ ∈ {Row = R, Column = C} and the pure strategies by i = 1, 2. We denote by x(t) the probability for player R to play pure strategy 1 at time t, and by y(t) the probability for player C to play pure strategy 1 at time t. We further denote by s^µ(t) the pure strategy which is actually chosen by player µ at time t, so that Π^µ(i, s^{−µ}(t)) represents the payoff that player µ receives at t if she plays pure strategy i and the other player chooses the pure strategy s^{−µ}(t).

In EWA, the mixed strategies are determined from the so-called attractions or propensities Q^µ_i(t) following a logit rule. For example, the probability for player R to play pure strategy 1 is given by

$$x(t+1) = \frac{e^{\beta Q^R_1(t+1)}}{e^{\beta Q^R_1(t+1)} + e^{\beta Q^R_2(t+1)}}, \qquad (3)$$

where β is the payoff sensitivity or intensity of choice, and a similar expression holds for y(t + 1). The propensities update as follows:

$$Q^\mu_i(t+1) = \frac{(1-\alpha)N(t)Q^\mu_i(t) + \left[\delta + (1-\delta)I(i, s^\mu(t+1))\right]\Pi^\mu(i, s^{-\mu}(t+1))}{N(t+1)}, \qquad (4)$$

where N(t + 1) = (1 − α)(1 − κ)N(t) + 1. Here N(t) represents experience and increases with the number of rounds played; the more it grows, the smaller becomes the influence of the received payoffs on the attractions. The propensities change according to the received payoff when playing action i against the strategy s^{−µ} chosen by the other player, i.e. Π^µ(i, s^{−µ}(t+1)). The indicator function I(i, s^µ(t+1)) is equal to 1 if i is the actual pure strategy that was played by µ at time t + 1, that is i = s^µ(t+1). All attractions (those corresponding to strategies that were and were not played) are updated with weight δ, while an additional weight 1 − δ is given to the specific attraction corresponding to the strategy that was actually played. Finally, the memory loss parameter α determines how quickly previous attraction and experience are discounted, and the parameter κ interpolates between cumulative and average reinforcement learning (see below).

Here we make two substantial, albeit rather innocuous, simplifications. First, EWA has two state variables: attraction and experience. The dynamics of the latter is trivial, as it reaches a fixed point extremely fast (for many combinations of parameters, the time scale of convergence is of the order of 2-3 time steps). Therefore we assume, with a small loss in generality, that experience has already reached a fixed point N* when the dynamics starts. To ensure the existence of such a fixed point we need to assume that (1 − α)(1 − κ) < 1. This restriction only rules out standard fictitious play, in which all past actions are taken into account with the same weight and therefore the relative weight of the most recent actions becomes smaller and smaller. There is no further loss in generality, as all other reinforcement and belief learning algorithms can still be viewed as particular cases of the EWA dynamics once N(t) has reached a fixed point.

The update rule (4) now reads:

$$Q^\mu_i(t+1) = (1-\alpha)Q^\mu_i(t) + \left(1-(1-\alpha)(1-\kappa)\right)\left[\delta + (1-\delta)I(i, s^\mu(t+1))\right]\Pi^\mu(i, s^{-\mu}(t+1)). \qquad (5)$$

The interpretation of κ is now more transparent: if κ = 1 the past payoffs are cumulated, hence cumulative reinforcement learning; if κ = 0 the past attraction and the current payoff are averaged with weight given by the memory loss parameter α, hence average reinforcement learning. Note that the two learning algorithms can be made equivalent by rescaling the propensities (or equivalently the intensity of choice) by α (see Galla and Farmer 2013).

Footnote: Due to the normalization condition, the learning dynamics is fully characterized by {x(t), y(t)} for t = 0, 1, 2, ….

Footnote: The larger β, the more the players consider the attractions in determining their strategy. In the limit β → ∞ the players choose with certainty the pure strategy with the larger attraction. In the limit β → 0 they choose among their pure strategies uniformly at random.

Following Camerer and Ho (1999), note that belief learning is recovered if δ = 1 and at least one of the following conditions is satisfied:

• There is no memory (α = 1). If, in addition, β → ∞, one recovers best-response dynamics (Cournot, 1838), in that the players best respond to the last period beliefs only;

• Average reinforcement learning (κ = 0).

Therefore, by studying the dynamic properties of (3) and (5) we are considering a wide class of learning algorithms, including reinforcement learning, best-response dynamics and weighted fictitious play.
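The logit rule (3) and the simplified propensity update (5) translate into code directly; the following sketch (function and variable names are ours) hard-codes two strategies per player:

```python
import math
import random

def logit_prob(Q, beta):
    """Probability of playing strategy 0 given the attractions Q, Eq. (3)."""
    z0, z1 = math.exp(beta * Q[0]), math.exp(beta * Q[1])
    return z0 / (z0 + z1)

def ewa_update(Q, own_move, opp_move, payoffs, alpha, delta, kappa):
    """Propensity update of Eq. (5), with experience at its fixed point.
    payoffs[i][j] is the payoff for playing i against opponent move j."""
    pre = 1 - (1 - alpha) * (1 - kappa)
    return [(1 - alpha) * Q[i]
            + pre * (delta + (1 - delta) * (i == own_move)) * payoffs[i][opp_move]
            for i in (0, 1)]

# One round of Matching Pennies from Row's perspective (Row wins on a match):
rng = random.Random(0)
payoffs_row = [[1, -1], [-1, 1]]
Q = [0.0, 0.0]
move = 0 if rng.random() < logit_prob(Q, beta=1.0) else 1
Q = ewa_update(Q, move, opp_move=1, payoffs=payoffs_row,
               alpha=0.5, delta=1.0, kappa=1.0)
```

With δ = 1 and κ = 1 the update reduces to Q′ = (1 − α)Q + Π, the cumulative reinforcement learning benchmark studied in Section 4.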
As a benchmark case we consider cumulative reinforcement learning (Section 4), which excludes belief learning, but we allow for average reinforcement learning in Section 5.2, where we generalize the results to belief learning.

We make another bold assumption in this section, which will then be relaxed in Section 5.1: we assume that the players play against each other many times before updating their propensities, so that the empirical frequency of their moves corresponds to their mixed strategy. This sort of argument was already made by Crawford (1974) and justified by Conlisk (1993a) in terms of "two-rooms experiments": the players only interact through a computer console and need to specify several moves before they know the moves of their opponent. This assumption is useful from a theoretical point of view and does not affect the results in most cases (Section 5.1): the only difference when noise is allowed is a blurring of the dynamical properties.

We denote by Π^µ_i the expected payoff for player µ playing pure strategy i at time t, given that player −µ plays a distribution of strategies given by her mixed strategy. An important remark is that, under the deterministic assumption, it is intended that δ = 1, as it would be ambiguous to distinguish between the strategies which were and were not played (as long as the players choose a non-degenerate mixed strategy, both pure strategies would be chosen by each player with non-zero frequency), so in order to recover belief learning it is really just enough to consider average reinforcement learning (κ = 0).

Finally, it is useful to combine (3) and (5) and to write the probabilities x(t + 1) and y(t + 1) directly in terms of the same probabilities at time t, that is x(t) and y(t).
In the deterministic limit (and so with δ = 1) we get

$$x(t+1) = \frac{x(t)^{1-\alpha}\, e^{\beta(1-(1-\alpha)(1-\kappa))\Pi^R_1(y(t))}}{Z_x}, \qquad (6)$$

where $Z_x = x(t)^{1-\alpha}\, e^{\beta(1-(1-\alpha)(1-\kappa))\Pi^R_1(y(t))} + (1-x(t))^{1-\alpha}\, e^{\beta(1-(1-\alpha)(1-\kappa))\Pi^R_2(y(t))}$, and an analogous expression holds for y(t + 1).

The remaining simplification implies no loss of generality and matters for technical reasons: we propose a diffeomorphism that transforms the coordinates of the learning dynamics and leads to a simpler set of equations. As we consider the combinations of parameters in the transformed coordinates, the taxonomy of the learning dynamics starts naturally to emerge. A diffeomorphism between a coordinate space (x, y), henceforth denoted as original coordinates, and a coordinate space (x̃, ỹ), henceforth denoted as transformed coordinates, leaves the dynamical properties (e.g. Jacobian, Lyapunov Exponents) in (x, y) unchanged in (x̃, ỹ), thanks to a well-known property in dynamical systems theory (Ott, 2002).

We consider the generic 2 × 2 game and the transformation

$$\tilde{x} = -\frac{1}{2}\ln\left(\frac{1}{x}-1\right), \qquad \tilde{y} = -\frac{1}{2}\ln\left(\frac{1}{y}-1\right). \qquad (7)$$

In terms of the transformed coordinates, the map (6) writes:

$$\tilde{x}(t+1) = (1-\alpha)\tilde{x}(t) + \beta\left(1-(1-\alpha)(1-\kappa)\right)\left(A\tanh\tilde{y}(t) + B\right),$$
$$\tilde{y}(t+1) = (1-\alpha)\tilde{y}(t) + \beta\left(1-(1-\alpha)(1-\kappa)\right)\left(C\tanh\tilde{x}(t) + D\right), \qquad (8)$$

where A, B, C and D have been defined in (2).

The original coordinates are restricted to x(t) ∈ [0, 1] and y(t) ∈ [0, 1]. The pure strategy profiles (x, y) ∈ {(0, 0), (0, 1), (1, 0), (1, 1)} in the original coordinates map to (x̃, ỹ) ∈ {(±∞, ±∞)} in the transformed coordinates. Note also that mixed strategies where the players choose among their actions with the same probability, i.e. x, y = 1/2, are mapped to x̃, ỹ = 0. The inverse transformation is given by

$$x = \frac{1}{1 + e^{-2\tilde{x}}}, \qquad y = \frac{1}{1 + e^{-2\tilde{y}}}. \qquad (9)$$

We analyse the dynamical properties of EWA learning in generic 2 × 2 games. Throughout this section we take the deterministic limit (δ = 1) and consider cumulative reinforcement learning (κ = 1). The extensions are considered in Section 5.

We first analyse the existence and the position of the fixed points in the strategy simplex, and then we consider their stability. In Section 4.1.1 we focus on symmetric games. We find that there always exists at least one stable fixed point, which may or may not correspond to the NE. In Section 4.1.2 we consider "antisymmetric" games, where, for any combination of strategies, the payoffs received by one player are the opposite of the payoffs received by the other player (this does not necessarily correspond to zero-sum games, see below). For discoordination games, the learning dynamics may not settle to a fixed point. Finally, in Section 4.1.3, we analyse the most general class of asymmetric games.
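The two-dimensional map (8), together with the inverse transformation (9), is straightforward to iterate numerically; a minimal sketch (names ours), where the prefactor 1 − (1 − α)(1 − κ) is computed explicitly:

```python
import math

def transformed_map(xt, yt, A, B, C, D, alpha, beta, kappa=1.0):
    """One iteration of the map (8) in the transformed coordinates."""
    pre = beta * (1 - (1 - alpha) * (1 - kappa))
    xn = (1 - alpha) * xt + pre * (A * math.tanh(yt) + B)
    yn = (1 - alpha) * yt + pre * (C * math.tanh(xt) + D)
    return xn, yn

def to_original(xt, yt):
    """Inverse transformation (9): back to mixed-strategy probabilities."""
    return 1 / (1 + math.exp(-2 * xt)), 1 / (1 + math.exp(-2 * yt))

# Dominance-solvable payoff structure (A = C = 0, B = D = 1, kappa = 1):
# the iteration contracts onto the unique fixed point x~* = y~* = (beta/alpha) B.
x, y = 0.0, 0.0
for _ in range(100):
    x, y = transformed_map(x, y, A=0.0, B=1.0, C=0.0, D=1.0, alpha=0.5, beta=0.5)
print(to_original(x, y))  # both probabilities close to 0.88
```

Choosing instead A = −C (as in Matching Pennies) with large β/α produces the non-convergent behaviour discussed below.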
We start from the simplest case from the point of view of the analysis, namely symmetric 2 × 2 games, in which Π^R_ij = Π^C_ij, so A = C and B = D. Therefore, coordination is A and dominance is B. Recall from Proposition 1 that if |A| > |B| and A > 0, the payoff matrix describes a coordination game; if |A| > |B| and A < 0, the payoff matrix describes an anticoordination game; if |B| > |A|, the game is dominance-solvable.

The fixed points in the transformed coordinates can be obtained from (8), by setting x̃(t + 1) = x̃(t) = x̃* and ỹ(t + 1) = ỹ(t) = ỹ*. The fixed point equation is x̃* = Ψ(x̃*), where

$$\Psi(\tilde{x}^\star) = \frac{\beta}{\alpha}\left[A\tanh\left(\frac{\beta}{\alpha}\left(A\tanh\tilde{x}^\star + B\right)\right) + B\right]. \qquad (10)$$

An identical expression holds for ỹ*. Note that the EWA parameters α and β combine as the ratio α/β (or β/α). It makes sense to define α/β as the "irrationality" parameter because it is large if there is substantial memory loss and/or small intensity of choice. Eq. (10) can have either 1 or 3 solutions. If there are 3 intersections between Ψ(x̃*) and the x̃* line, we denote as central solution the intersection with an intermediate value of x̃*, and as lateral solutions the intersections with the maximum and minimum values. Note that the fixed points are a vector (x̃*, ỹ*), so it is not enough to compute the solutions of Eq. (10); one also needs to find the right couplings by replacing the possible combinations in (8).

Thanks to the fact that the maps (6) and (8) are topologically conjugate, their Jacobian is the same. We compute it from (8):

$$J\big|_{\tilde{x}^\star,\tilde{y}^\star} = \begin{pmatrix} 1-\alpha & \dfrac{A\beta}{\cosh^2(\tilde{y}^\star)} \\[2mm] \dfrac{C\beta}{\cosh^2(\tilde{x}^\star)} & 1-\alpha \end{pmatrix}. \qquad (11)$$

The eigenvalues are

$$\lambda_\pm = 1 - \alpha \pm \frac{|A|\beta}{\cosh(\tilde{x}^\star)\cosh(\tilde{y}^\star)}. \qquad (12)$$

Since 1 − α >
0, the leading eigenvalue is λ₊ and it is enough to study that for the stability properties. After a little algebra we get the stability condition

$$\frac{\alpha}{\beta}\cosh(\tilde{x}^\star)\cosh(\tilde{y}^\star) - |A| \ge 0. \qquad (13)$$

The shape of Ψ(x̃*) varies according to the irrationality (α/β), coordination (|A|) and dominance (|B|) parameters. Due to the strong non-linearity of Ψ(x̃*), it is not possible to study it analytically in full. Therefore, we first solve Eq. (10) numerically, and then provide a mathematical analysis of a number of specific cases. Figure 2 shows the properties of the fixed points obtained from the numerical solution of (10), keeping irrationality constant, i.e. α/β = 1 (since the parameters combine as (β/α)A and (β/α)B, it is equivalent to change the values of A and B). We also check the stability of the fixed points by using Eq. (13). We find that there is always at least one stable fixed point. If there are multiple fixed points, only the lateral solutions are stable. For small values of the payoffs, such that the players do not have strong incentives to choose a specific pure strategy, learning converges to a mixed strategy fixed point, where the players randomly choose between the pure strategies. If dominance is larger than coordination, the payoff matrix describes a dominance-solvable game and learning converges to a pure strategy fixed point corresponding to the NE. If coordination is larger than dominance, the payoff matrix may represent a coordination or an anticoordination game. Note that multiple fixed points are much more likely in anticoordination games. To see why this is the case, consider the following payoff matrices, with A = C = ±1.75 and B = D = 1.25:

$$\Pi_{\text{coord}} = \begin{pmatrix} 6, 6 & 0, 0 \\ 0, 0 & 1, 1 \end{pmatrix}; \qquad \Pi_{\text{anticoord}} = \begin{pmatrix} 5, 5 & 6, 6 \\ 6, 6 & 0, 0 \end{pmatrix}. \qquad (14)$$

While in Π_coord the NE which yields payoffs (6, 6) is clearly to be preferred over the NE yielding (1, 1), so it is reasonable that learning only converges to the preferred outcome (unique pure strategy fixed point), in Π_anticoord there is no preferred NE, so it is sensible that learning displays multiple fixed points. This is indeed what happens in the top right and top left corners of Figure 2, which correspond to the payoff matrices in (14).

Footnote: It never occurs that the components of a fixed point are the central and lateral solutions: either both components are central solutions, or both components are lateral solutions.

[Figure 2: in the (A = C, B = D) plane, regions labelled "Dominance-solvable games", "Anticoordination games" and "Coordination games", with markers for multiple fixed points, a unique pure fixed point and a unique mixed fixed point.]
Figure 2: Numerical solution of Eq. (10) for α/β = 1 and several values of A = C and B = D. If 0.3 < x* < 0.7 and 0.3 < y* < 0.7, the solution is classified as a "mixed strategy fixed point". All unique fixed points are stable; if there are multiple fixed points, only the "lateral solutions" are stable. The overall picture is that for small values of the payoffs learning converges to a mixed strategy fixed point; if dominance is strong, to a pure strategy fixed point; if coordination is strong, to multiple fixed points. There are noticeable differences between coordination and anticoordination games. This figure corresponds to the case AC > 0 (because both A = C < 0 and A = C > 0 yield AC > 0).
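The numerical procedure just described can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: it assumes that the symmetric fixed-point equation (10) takes the self-consistent form x̃* = (β/α)[A tanh((β/α)(A tanh x̃* + B)) + B] (i.e. Eq. (20) below specialized to C = A, D = B), together with the stability condition (13); grid bounds and resolution are arbitrary choices.

```python
import math

def fixed_points(A, B, alpha, beta, lo=-50.0, hi=50.0, n=4000):
    """Find the roots x of  x - r*(A*tanh(r*(A*tanh(x) + B)) + B) = 0,
    with r = beta/alpha, by scanning for sign changes and bisecting."""
    r = beta / alpha
    f = lambda x: x - r * (A * math.tanh(r * (A * math.tanh(x) + B)) + B)
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        fa, fb = f(a), f(b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            for _ in range(80):  # plain bisection on the bracketing interval
                m = 0.5 * (a + b)
                if fa * f(m) <= 0.0:
                    b = m
                else:
                    a, fa = m, f(m)
            roots.append(0.5 * (a + b))
    return roots

def is_stable(x, y, A, alpha, beta):
    """Stability condition (13): (alpha/beta)*cosh(x)*cosh(y) >= |A|."""
    return (alpha / beta) * math.cosh(x) * math.cosh(y) >= abs(A)

# Example with alpha = beta = 1 and B = 0: for (beta/alpha)|A| <= 1 there is
# a unique central fixed point; for larger |A| two lateral solutions appear.
print(fixed_points(0.5, 0.0, 1.0, 1.0))   # one root, at 0
print(fixed_points(3.0, 0.0, 1.0, 1.0))   # three roots
```

Checking `is_stable` at each root reproduces the pattern reported above: the central solution loses stability exactly when the lateral solutions appear.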
We now proceed with the mathematical solution for a number of specific cases. We first set B = 0, and study the interplay of the coordination parameter, |A|, and the irrationality parameter, α/β. The lateral solutions do not exist if

(β/α)|A| ≤ 1.  (15)

The interpretation is straightforward: if irrationality is large (so its inverse is small) and/or coordination is small (i.e. the absolute value of the payoffs is small), there is a unique fixed point in the centre of the strategy simplex. This transition can be seen in Figure 2 for B = 0. We next consider B >
0. It is possible to check in Eq. (10) that a large value of (β/α)|B| flattens Ψ(x̃*) (because it makes the argument less sensitive to the values of x̃*) and moves the offset Ψ(0) away from zero. Therefore, for a sufficiently large value of (β/α)|B| there is a unique fixed point far from the centre of the simplex. This is indeed what happens in Figure 2. Stability is addressed in the following proposition.

Proposition 2. We consider a symmetric 2 × 2 game. The following results hold:
(i) if B = 0 and (β/α)|A| ≤ 1, the unique fixed point is stable.
(ii) if B = 0 and (β/α)|A| → 1⁺ or (β/α)|A| → +∞, the fixed point whose components are the central solutions is unstable and the fixed points whose components are the lateral solutions are stable. In particular, at (β/α)|A| = 1 a supercritical pitchfork bifurcation occurs.
(iii) if (β/α)|B| → +∞ and B ≫ A, the unique fixed point is stable.

The proof is in Appendix B.

4.1.2 "Antisymmetric" games

So far, we analysed the learning dynamics in dominance-solvable, coordination and anticoordination games. We now focus on the remaining class of 2 × 2 games, in which Π^R_ij = −Π^C_ij, and so A = −C, B = −D. Note that this condition does not generally define zero-sum games, as the latter are rather defined by the equality Π^R_ij = −Π^C_ji (so the two classes of games correspond only if Π
^R_ij = −Π^C_ij = 0 for i ≠ j). Again, if B > A the game is dominance-solvable, but if A > B we have a discoordination game. The fixed points (x̃*, ỹ*) are again obtained from (8):

x̃* = (β/α)[−A tanh((β/α)(A tanh x̃* + B)) + B],
ỹ* = (β/α)[−A tanh((β/α)(A tanh ỹ* + B)) − B],  (16)

where we have used the identity tanh(−x) = −tanh(x). It is immediate to note from (16) that there exists a unique fixed point. Indeed, the functions on the RHS are monotonically decreasing, so only one intersection with the x̃* and ỹ* lines is possible. Moreover, given AC = −A² <
0, the eigenvalues of the Jacobian (11) are complex:

λ± = 1 − α ± i β|A| / (cosh(x̃*) cosh(ỹ*)).  (17)

The stability condition reads:

(β/√(2α − α²)) |A| / (cosh(x̃*) cosh(ỹ*)) ≤ 1.  (18)

In Figure 3 we show the properties of the unique fixed point obtained from the numerical solution of (16), for several values of A and B. We also check the stability of the fixed points by using Eq. (18). Focusing on small B, a larger value of |A| or a smaller value of √(2α − α²)/β (which is close to the irrationality parameter α/β) implies that instability is more likely. The intuition is straightforward: if the players are rational and/or have strong incentives to switch away from a strategy which is not performing well, they follow the cycle of best responses in the payoff matrix and keep switching their moves, rather than smoothly converging to a fixed point in the centre of the simplex, where they would randomize between the pure strategies. On the contrary, if B is large (with respect to A), the learning dynamics simply converges towards a fixed point close to the pure strategy NE.

We conclude this section by focusing on one specific example of discoordination games, where dominance is null: B = D = 0, C = −A. The unique fixed point is (0, 0) and, assuming without loss of generality A > 0 (so that C < 0), it is stable if

(β/√(2α − α²)) A ≤ 1.  (19)

Replacing the values of α and β used in Figure 3, the fixed point becomes unstable above a critical value A*.

We now consider asymmetric 2 × 2 games, A ≠ C and B ≠ D. There is a larger variety of behaviours, but in general asymmetric games are broadly similar to their symmetric counterparts (e.g. asymmetric dominance-solvable games are broadly similar to symmetric dominance-solvable games), except that the player with the strongest incentive to play a certain move plays mixed strategies farther from the centre of the strategy simplex.
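The instability threshold (19) can be checked by iterating the learning map directly. This is a minimal sketch under the assumption that the map in the transformed coordinates reads x̃' = (1 − α)x̃ + β(A tanh ỹ + B) and ỹ' = (1 − α)ỹ + β(C tanh x̃ + D), a form consistent with the fixed-point equations (16) and (20) and with the eigenvalues (17):

```python
import math

def simulate(A, B, C, D, alpha, beta, x0=0.1, y0=-0.1, steps=10000):
    """Iterate the assumed deterministic learning map in the transformed
    coordinates; returns the trajectory of (x, y)."""
    x, y = x0, y0
    traj = []
    for _ in range(steps):
        x, y = ((1 - alpha) * x + beta * (A * math.tanh(y) + B),
                (1 - alpha) * y + beta * (C * math.tanh(x) + D))
        traj.append((x, y))
    return traj

# Discoordination game with B = D = 0, C = -A, alpha = 0.2, beta = 1.
# Condition (19) predicts stability iff A <= sqrt(2*alpha - alpha**2)/beta = 0.6.
tail_stable = simulate(0.3, 0.0, -0.3, 0.0, 0.2, 1.0)[-200:]   # A below threshold
tail_cycling = simulate(3.0, 0.0, -3.0, 0.0, 0.2, 1.0)[-200:]  # A above threshold
```

With A = 0.3 the trajectory collapses onto the mixed strategy fixed point at the origin; with A = 3 it settles on a persistent bounded oscillation, in line with Figure 3.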
Figure 3: Numerical solution of Eq. (20) for α = β and several values of A = −C and B = −D. If 0.3 < x* < 0.7 and 0.3 < y* < 0.7, the solution is classified as a "mixed strategy fixed point". There is always a unique fixed point, which may become unstable in discoordination games for low values of irrationality and/or high (absolute) values of coordination. The intuition is that the players have strong incentives to try to improve their payoffs, so they fail to coordinate on the mixed strategy NE and the learning dynamics keeps cycling. This figure corresponds to the case AC < 0.

The fixed points (x̃*, ỹ*) are given by

x̃* = (β/α)[A tanh((β/α)(C tanh x̃* + D)) + B],
ỹ* = (β/α)[C tanh((β/α)(A tanh ỹ* + B)) + D].  (20)

Without loss of generality, we can write the (combinations of) payoffs of player Column as a rescaled version of the (combinations of) payoffs of player Row, that is C = WA and D = ZB, with W and Z scale factors. The magnitudes of W and Z quantify the imbalance in coordination and dominance for the two players. For instance, if W is large, player Column has stronger incentives to converge on one of the pure strategy NE (just consider the payoff matrix (1), with a = d = 1, e = h = 5, b = c = f = g = 0). Consistently, the height of the hyperbolic tangent in (20) for player Column is larger, leading to an intersection with the ỹ* line which is farther away from zero (ỹ* ≫ x̃*). Therefore, player Column will choose a mixed strategy farther from the centre of the simplex. Likewise, if Z is large, player Column ends up at a fixed point closer to the pure strategy NE. Concerning the signs, the sign of Z does not matter in determining the stability properties, while the sign of W has a substantial effect. If W > 0, we find little difference with the symmetric case; if W < 0, the game is a discoordination game (provided |B| < |A| and |D| < |C|), which may have no stable fixed points, as shown in Section 4.1.2.

We choose a parameter setting where the fixed point of the discoordination game is unstable. Figure 4 shows some examples of the dynamics for several values of α and β, for a given payoff matrix. The dynamics superficially appears to follow a limit cycle, whose shape is governed by α and β: Fig. 4a shows that, for high α and β, the players frequently change their strategies, whereas in Fig. 4b, for low values of α and β, the dynamics is smoother; in Fig. 4c, where α is very small but β is reasonably high, the players spend a long time playing mostly one pure strategy and then quickly switch to the other one (because they have long memory). Finally, in Fig. 4d we choose B ≠ 0: the discrepancy between the pure strategies seems to yield the most irregular dynamics.

In order to get further insights into the learning dynamics, Figure 5 represents the bifurcation diagrams and the largest Lyapunov Exponent (LLE), varying α and β. We focus on the values of the payoff matrix in Fig. 4d, as the behaviour of the learning dynamics in Fig. 4a is only marginally chaotic. Figures 5a and 5c refer to a parameter setting where the fixed point is unstable, and we observe alternating limit cycles and chaotic bands. On the other hand, in Figures 5b and 5d, for small intensity of choice the fixed point is stable, while for intermediate values up to β ≈ 0.8 the dynamics is not periodic, but the LLE is almost null. This case corresponds to a marginally chaotic behaviour, like the one in Fig. 4a. For larger values of β we observe again chaotic bands and limit cycles. At the points where the limit cycles become chaotic we can observe a higher density of trajectories, probably related to the intermittency scenario of transition to chaos (Pomeau and Manneville, 1980).

Figure 6 shows that chaos is more frequently observed if one of the pure strategies is dominant over the other, B > 0. Moreover, the LLE is always negative if B > A (consistently with the diagram depicted in Figure 3) and is larger for high absolute values of the payoffs (so that A and B are larger).

Since the system is 2-dimensional, in order to compute the Lyapunov exponents it is necessary to periodically orthogonalize the unit vectors using a Gram-Schmidt procedure, see Benettin et al. (1980). Note that, while this is strictly necessary only in order to obtain the whole Lyapunov spectrum, and so compute the Kaplan-Yorke dimension, in practice the estimate of the LLE is much more accurate if one uses the orthogonalization method even just to compute the LLE. Since we choose C = −A and D = −B, there is a 4-fold symmetry in the AB plane, so we only plot the first quadrant.
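For the two-dimensional map assumed above, the LLE can be estimated by evolving a tangent vector with the Jacobian and renormalizing it at every step, in the spirit of Benettin et al. (1980); in this two-dimensional sketch the per-step renormalization plays the role that Gram-Schmidt orthogonalization plays when the whole spectrum is needed. A hypothetical illustration, not the authors' code:

```python
import math

def lle(A, B, C, D, alpha, beta, steps=20000, transient=500):
    """Largest Lyapunov exponent of the assumed map
    x' = (1-a)x + b(A tanh y + B),  y' = (1-a)y + b(C tanh x + D),
    via tangent-vector evolution with per-step renormalization."""
    x, y = 0.1, -0.1
    vx, vy = 1.0, 0.0
    total, count = 0.0, 0
    for t in range(steps):
        # Jacobian at (x, y): [[1-a, b*A*sech^2(y)], [b*C*sech^2(x), 1-a]]
        j12 = beta * A / math.cosh(y) ** 2
        j21 = beta * C / math.cosh(x) ** 2
        x, y = ((1 - alpha) * x + beta * (A * math.tanh(y) + B),
                (1 - alpha) * y + beta * (C * math.tanh(x) + D))
        vx, vy = (1 - alpha) * vx + j12 * vy, j21 * vx + (1 - alpha) * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
        if t >= transient:
            total += math.log(norm)
            count += 1
    return total / count
```

For a stable discoordination setting (A = 0.3, C = −0.3, B = D = 0, α = 0.2, β = 1) the estimate converges to ln|λ±| = (1/2) ln(0.73) ≈ −0.157, the modulus of the complex eigenvalues (17) at the origin.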
Figure 4: Time series of the probabilities x (in blue) and y (in red) for four (α, β) settings: (a) high α with β = 1; (b) low α and β; (c) very small α and moderately high β; (d) B ≠ 0 with β = 1. The payoff parameters are b = c = f = g = 0, with a = d = 4 and e = h = −4 in panels (a)-(c); panel (d) uses the asymmetric values a, d < 0, e ≈ 11, h ≈ 1.8 (the same payoff matrix as in Figures 5 and 8).

In the above sections, the taxonomy of learning dynamics is determined by three classes of 2 × 2 games. We now ask how frequently each class occurs a priori. We choose an ensemble of payoff matrices obtained by constraining the mean, variance and correlation of the payoff elements. In particular, we assume that the mean is 0, the variance is 1 and the two payoffs for the same profile of pure strategies in the payoff matrix are correlated by a parameter Γ. A value Γ = −1 implies perfectly anticorrelated payoffs, as in zero-sum games; negative values, Γ <
0, are more generally associated with competitive games; on the contrary, Γ = 1 reveals perfect correlation and positive values of Γ are related to cooperative games. Finally, Γ = 0 implies lack of correlation. Under these constraints, the maximum entropy distribution is a bivariate Gaussian with the specified mean and covariance matrix (Galla and Farmer, 2013).

Figure 7 represents the fraction of games which belong to each of the three classes, as a function of the correlation parameter Γ. We see that for all values of Γ, dominance-solvable games are the most likely. Positive values of Γ are associated with (anti)coordination games, which display multiple fixed points under EWA learning, whereas for negative values of Γ it is more likely to obtain a discoordination game, and consequently limit cycles or chaos in the learning dynamics. Indeed, this was observed by Galla and Farmer (2013), who find convergence to multiple fixed points for Γ > 0 and non-convergent dynamics for Γ < 0 (provided irrationality α/β is low).

Figure 5: Bifurcation diagrams and largest Lyapunov exponent for b = c = f = g = 0, a, d < 0, e ≈ 11 and h ≈ 1.8. Panels (a) and (c): β = 1, α varying. Panels (b) and (d): α = 0.2, β varying. Low-dimensional chaos may be observed.

Figure 6: Largest Lyapunov exponent as a function of the parameters A and B. The colour scale is set such that chaos is observed from green to red. The parameters are C = −A, D = −B, β = 1, and two values of α in panels (a) and (b). Chaos is mainly observed where B > 0 and the payoffs are quite large in absolute value.

Figure 7: Fraction of dominance-solvable, (anti)coordination and discoordination games, as a function of the correlation Γ. These results are averaged over 10000 random draws of the (Gaussian) payoff matrix.

A difference with Galla and Farmer (2013) is that, whereas they find consistently unstable behaviour in certain regions of the parameter space, we cannot rule out convergence to a fixed point for Γ < 0. In fact, for all values of Γ, most payoff matrices describe dominance-solvable games, which always display a stable fixed point. This difference might be explained by the fact that Galla and Farmer (2013) consider high-dimensional strategy spaces, whereas we are restricted to two pure strategies. A reasonable conjecture is that, by increasing the number of pure strategies, the fixed points of the learning dynamics may become unstable. We leave the exploration of this conjecture to future work.
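The fractions in Figure 7 can be reproduced with a short Monte Carlo. This sketch makes two assumptions worth flagging: the reduced parameters are computed as in Appendix A (H = a − c, K = d − b for Row, L = e − g, M = h − f for Column) with A = (H + K)/2, B = (H − K)/2 and analogously C, D for Column; and a game is classified as dominance-solvable when at least one player has a dominant strategy (|B| ≥ |A| or |D| ≥ |C|), as (anti)coordination when AC > 0, and as discoordination otherwise.

```python
import random

def classify(a, b, c, d, e, f, g, h):
    """Classify a 2x2 game via the reduced parameters of Appendix A."""
    H, K = a - c, d - b          # Row
    L, M = e - g, h - f          # Column
    A, B = (H + K) / 2, (H - K) / 2
    C, D = (L + M) / 2, (L - M) / 2
    if abs(B) >= abs(A) or abs(D) >= abs(C):
        return "dominance-solvable"      # at least one dominant strategy
    return "(anti)coordination" if A * C > 0 else "discoordination"

def fractions(gamma, n=20000, seed=0):
    """Fractions of the three classes when each profile's payoff pair is
    drawn from a bivariate Gaussian with mean 0, variance 1, correlation gamma."""
    rng = random.Random(seed)
    s = (1 - gamma ** 2) ** 0.5
    counts = {"dominance-solvable": 0, "(anti)coordination": 0,
              "discoordination": 0}
    for _ in range(n):
        pay = []
        for _ in range(4):               # one (Row, Column) pair per profile
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            pay += [z1, gamma * z1 + s * z2]
        a, e, b, f, c, g, d, h = pay
        counts[classify(a, b, c, d, e, f, g, h)] += 1
    return {k: v / n for k, v in counts.items()}
```

At Γ = 0 this classification gives dominance-solvable games a fraction of 3/4 (each player independently has a dominant strategy with probability 1/2), with the remaining 1/4 split evenly between the other two classes; for Γ < 0 discoordination games dominate the remainder, in line with Figure 7.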
Here we show that the NE in pure strategies are “infinitely” unstable.
Proposition 3.
Consider a generic 2 × 2 game and the learning dynamics in the original coordinates (6). At the profiles of pure strategies, (x, y) ∈ {(0, 0), (0, 1), (1, 0), (1, 1)}, for positive memory loss, α > 0, the Jacobian has infinite elements along the main diagonal and null elements along the antidiagonal.

The proof of Proposition 3 is in Appendix C. A clarification is in order here: while the NE in pure strategies are formally unstable (unless α = 0), for most values of the parameters there is a fixed point nearby. In particular, if irrationality is not too high and the absolute values of the payoffs are not too small, it is likely that one of the fixed points will be quite close to the NE in pure strategies. This result could be anticipated, since, e.g., for dominance-solvable games a reasonable learning dynamics is expected to converge sufficiently close to the NE.

We generalize the results in Section 4 by relaxing two seemingly restrictive assumptions. In Section 5.1 we drop the simplification of deterministic learning and analyse the stochastic learning dynamics. All previous results still hold, and the only effect of this extension is a blurring of the dynamical properties. In Section 5.2 we analyse the more general case where the parameter κ, which interpolates between average and cumulative reinforcement learning, is not restricted to κ = 1. This allows us to recover belief learning and to reproduce the well-known result about the convergence of fictitious play in 2 × 2 games.

When playing a game, except for very specific experimental arrangements (Conlisk, 1993a), the players would update their propensities after observing each move by their opponent. This questions whether the deterministic dynamics (6), which assumes that the participants of the game play against each other many times before updating their propensities, provides robust conclusions. We interpolate from the deterministic limit by considering batches of size T, in which the players sample their mixed strategies.
The limit T → ∞ recovers deterministic learning, whereas actual learning would occur with T = 1. As noted in Section 3.2, unless T = 1, the meaning of the parameter δ is unclear. Indeed, a value of δ different from 1 implies that the players give an additional update to the attraction corresponding to the move which they chose. This rule is not well defined if they play against each other many times before updating their attractions, as they might choose both pure strategies at least once. However, for T = 1 we consider several values of δ, and we show that the lower the value of δ, the noisier the learning dynamics becomes, as there is an additional source of stochasticity: which strategy the player randomly chooses, in addition to which strategy is randomly chosen by his opponent.

It is beyond the scope of this paper to systematically study the effect of noise on the learning dynamics, and we refer the reader to Galla (2009) for a study of the effect of noise on learning, and to Crutchfield et al. (1982) for a more general discussion of the effect of noise on the properties of dynamical systems. In the following we show a few numerical examples where we investigate what happens as we progressively increase the level of noise. We simply describe our findings and leave most of the numerical support to Appendix D. We stress that the dynamical properties in the deterministic limit, in order to be considered robust, need to hold down to T = 1, as that is the natural choice for a realistic learning dynamics. We focus on the three classes of games which we identified in the paper.

Dominance-solvable games. Provided that the irrationality parameter is not too high, the players converge close to the pure strategy NE (Figures D.1 (a)-(d)). After an irregular transient, as the learning dynamics moves close to the faces of the simplex, it becomes remarkably stable. On the contrary, if α/β is high, the players converge to the centre of the simplex, as occurs with deterministic learning (Figures D.1 (e)-(f)). However, the learning dynamics is much more irregular. The asymptotic learning behaviour is explained by two factors: deviations from the previous moves, and their effect. If the players always played the same moves, the learning dynamics would converge to a fixed point. But as soon as one of them switches her move, we observe a perturbation away from such a fixed point. This explains in part why, close to the centre of the simplex, the learning dynamics is more irregular: the players converge to a mixed strategy where they choose each move with roughly the same probability. The other factor is that the attractions are large at the faces of the simplex, so the relative magnitude of their update (due to the deviation) is smaller. We also observe another pattern in Figures D.1 (a)-(d): the higher the level of noise (i.e., the smaller T and/or the smaller δ), the more irregular the transient.

(Anti)Coordination games. As for dominance-solvable games, we observe convergence to a fixed point close to one of the pure strategy NE (for low levels of irrationality). We investigate whether noise can help in reaching the Pareto-optimal NE, as it does in the theory of stochastic stability (Young, 1993). Given the previous remark on the effect of noise near the faces of the simplex, we expect that stochastic learning can help reach the Pareto-optimal NE only in the first steps of the dynamics. This conjecture is confirmed by the numerical simulations in Figure D.2.
We find that EWA is path dependent, unlike the learning algorithms introduced by Young (1993), which are based on ergodic Markov chains. With EWA, the learning dynamics reaches the Pareto-optimal NE only if there is a favourable fluctuation in the first stage of the dynamics.
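This path dependence can be illustrated with a sampled-move version of the map assumed earlier: at T = 1 we replace the opponent's average-move term tanh(ỹ) with a realized move s ∈ {−1, +1}, drawn with probability (1 + tanh ỹ)/2 for +1. This is a hypothetical illustration, not the exact EWA update: in a symmetric coordination game the process locks into one of the two pure NE depending on early fluctuations.

```python
import math
import random

def run_coordination(A, alpha, beta, steps=500, seed=0):
    """Stochastic (T = 1) learning in a symmetric coordination game:
    each player's average-move term is replaced by a sampled move."""
    rng = random.Random(seed)
    x = y = 0.0
    for _ in range(steps):
        sx = 1 if rng.random() < (1 + math.tanh(x)) / 2 else -1
        sy = 1 if rng.random() < (1 + math.tanh(y)) / 2 else -1
        x = (1 - alpha) * x + beta * A * sy
        y = (1 - alpha) * y + beta * A * sx
    return x, y

# 40 independent runs: roughly half lock into each pure NE,
# with the outcome decided by the early noise.
finals = [run_coordination(3.0, 0.2, 1.0, seed=s)[0] for s in range(40)]
```

Once the propensities are large, the probability of a deviating move is exponentially small, so the early fluctuation is effectively irreversible, which is the path dependence described above.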
Discoordination games
In Section 4 we identify two learning behaviours: if irrationality is high, the dynamics converges to the centre of the strategy simplex and the players simply randomize between their moves; if irrationality is low, the players do not converge to an equilibrium and the mixed strategies keep oscillating. This distinction survives when we allow for noise. In Figure 8 we plot the stochastic time series for both behaviours. In Figure 8a the mixed strategy fixed point of the corresponding deterministic dynamics is unstable and the stochastic learning dynamics is chaotic (the parameters are the same as in Fig. 5), whereas in Figure 8b the mixed strategy fixed point is an attractor of the (deterministic) dynamics. It is immediately clear that in the latter case there is a total lack of autocorrelation in the moves of each player (because the dynamics does not spend much time near the faces of the simplex), whereas in the former the autocorrelation function decays more slowly as a function of the time lag. These results are confirmed in Figure 9 and constitute a precise theoretical prediction that can be tested against data on experimental learning in discoordination games. Finally, Figure 10 represents the same bifurcation diagram and largest Lyapunov exponent as in Figs. 5a and 5c respectively, with the only difference that we consider stochastic learning with T = 1. For small values of α the LLE is still positive, so the dynamics is chaotic. We consider several values of T in Figure D.3. We observe the equivalence between parametric and additive noise (Crutchfield et al., 1982): the effect of noise on the properties of dynamical systems equivalently occurs as a perturbation of their trajectories or as a perturbation of their parameter values. (In fact, the noise source induced by finite T is not additive, but it is always possible to express the noise through a properly defined additive stochastic term in the dynamical equations.) By progressively increasing the level of noise, we observe a smoothing of both the bifurcation diagram and the plot representing the LLE, losing the finely alternating structure with bands of chaos and windows of regularity.
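The qualitative difference between the two regimes of Figure 8 can be mimicked with the same sampled-move (T = 1) version of our assumed reduced map — a hedged sketch, not the full EWA update with attractions:

```python
import math
import random

def run_stochastic(A, alpha, beta, steps=20000, seed=1):
    """T = 1 stochastic version of the assumed discoordination map
    (C = -A, B = D = 0): the opponent's average-move term tanh(.)
    is replaced by a sampled move in {-1, +1}."""
    rng = random.Random(seed)
    x = y = 0.1
    xs = []
    for _ in range(steps):
        sx = 1 if rng.random() < (1 + math.tanh(x)) / 2 else -1
        sy = 1 if rng.random() < (1 + math.tanh(y)) / 2 else -1
        x, y = (1 - alpha) * x + beta * A * sy, (1 - alpha) * y - beta * A * sx
        xs.append(x)
    return xs

xs_stable = run_stochastic(0.3, 0.2, 1.0)   # deterministic fixed point stable
xs_chaotic = run_stochastic(3.0, 0.2, 1.0)  # deterministic fixed point unstable
```

In the stable regime the propensity is confined near the centre (here |x̃| ≤ βA/α = 1.5 by construction) and the moves look like uncorrelated coin flips; in the unstable regime the trajectory makes large, persistent excursions, consistent with the slowly decaying autocorrelation in Figure 9.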
Figure 8: Time series of the probabilities x (in blue) and y (in red), for stochastic learning with T = 1. The payoff parameters are b = c = f = g = 0, a, d < 0, e ≈ 11 and h ≈ 1.8. The memory loss is α = 0.2; the intensity of choice is β = 1 in panel (a), implying deterministic chaotic behaviour, and β < 1 in panel (b).

Figure 9: (a) Time series of the moves of player Column, for stochastic learning with T = 1. The upper (lower) panel corresponds to the stochastic learning dynamics in Fig. 8a (8b). (b) Autocorrelation function of the moves of player Column, for both learning dynamics represented in the left panel. If irrationality is high, the players randomize between their moves and the autocorrelation decays instantaneously.

We drop the assumption of cumulative reinforcement learning (κ = 1) and analyse other learning algorithms in the EWA family. Looking at Eqs. (6) and (8), in order to consider a general value of κ, it is sufficient to rescale the intensity of choice β, replacing it by β̃ = β[1 − (1 − α)(1 − κ)]. As the quantity multiplying β is lower than one, the intensity of choice is smaller and so the irrationality parameter is larger. Therefore, the learning dynamics is generally more stable, and it is easier to converge to a fixed point in the centre of the simplex.

If κ = 0 and δ = 1 we recover most forms of belief learning (α = 1: best-response dynamics; α = 0: fictitious play; 0 < α <
1: weighted fictitious play). The rescaled intensity of choice is β̃ = βα. First of all, this means that the coordinates of the fixed points no longer depend on α (Eqs. (10) and (20)). However, the memory loss sets the timescale of convergence to the fixed points, and the stability condition retains a dependence on α, which does not cancel out:

(βα/√(2α − α²)) A ≤ 1.  (21)

The derivative of the LHS with respect to α is positive, so smaller and smaller values of α make stability more and more likely. In other words, the parameter space where it is possible to observe unstable behaviour shrinks as α is reduced. In the limit α → 0, the LHS goes to zero, so stability is ensured for all parameter values. Note that the case α = 0, κ = 0 is the standard fictitious play learning algorithm (see Ho et al. (2007), Fig. 1) that was ruled out by obtaining the steady state dynamical equations (5) from the more general EWA rule (4). However, fictitious play can still be approached by taking the limit α → 0.

Figure 10: Bifurcation diagram and largest Lyapunov exponent as a function of α for stochastic learning with T = 1. The payoff parameters are the same as in Figure 5 (in the reduced notation, H, K < 0, L ≈ 11, M ≈ 1.8), with β = 1. Chaos survives for small values of α and we observe the equivalence between additive and parametric noise.

In this paper we have exhaustively characterized the dynamics of EWA learning in generic 2 × 2 games. A variety, indeed a taxonomy, of different behaviours can be observed, according to the properties of the payoff matrix and to the values of the parameters of the learning algorithm. The taxonomy naturally relates to classes of games that have been extensively studied in the literature: in dominance-solvable games we observe convergence towards the unique pure strategy NE; in coordination games we find multiple fixed points corresponding to the NE; in discoordination games the unique mixed strategy NE may be unstable and the learning dynamics may settle in a limit cycle or a low-dimensional chaotic attractor. However, for all classes of games, if the players cannot choose with certainty the best performing strategy (because of finite intensity of choice), quickly forget the past performance of their moves and/or have little incentive in terms of payoffs, the learning dynamics converges to a fixed point well in the centre of the simplex, where the players simply randomize between the pure strategies.

The novelty of this work is first of all in its approach: we have identified a number of relevant parameters and classified the learning dynamics accordingly, by ex-post relating the values of the parameters to the classes of games described above. In particular, we have found that irrationality, defined as the ratio of memory loss α to intensity of choice β, if large implies convergence to a mixed strategy in the centre of the simplex.
We have then defined a coordination parameter by computing the difference between the diagonal and the off-diagonal elements in the payoff matrix for each player (A for Row and C for Column), and multiplying the two numbers (AC). A large positive value of coordination is related to a coordination or an anticoordination game, where the players try to coordinate on the same profiles of pure strategies. (In coordination games, where A and C are positive, the players try to coordinate on profiles where they play the same strategy; in anticoordination games, where A and C are negative, on profiles where their moves differ.) If coordination is negative (A is positive and C is negative, or vice versa), the players try to coordinate on different profiles of strategies and there is no pure strategy NE. The payoff matrix then defines a discoordination game and, for a good level of rationality, is related to an unstable learning dynamics. The third parameter is called dominance. It is obtained as the absolute value of the product of the differences between the payoffs associated with one pure strategy and the payoffs associated with the other one, for both players (B for Row and D for Column, so dominance is |BD|). If it is large, it is likely that the payoff matrix describes a dominance-solvable game.

Thanks to the exhaustive characterization of 2 × 2 games, one can ask which learning behaviour is typical in more general games, a question that has not been thoroughly explored. It is sensible that, by increasing the size of the strategy sets and/or the number of players, unstable dynamics may become prevalent. Some work has in part confirmed this conjecture (Sanders et al., 2016), but a more systematic investigation is required.

The ultimate goal of this line of research is to test whether learning converges in experiments. Most experiments show approximate aggregate convergence, but the underlying games usually have distinct equilibria and paths of convergent best replies.
For general payoff matrices with cycles in best responses and several players, the players may just endlessly cycle between the profiles of strategies, even in reality, and equilibrium concepts would be meaningless. For 2 × 2 games, our taxonomy provides precise predictions that can be tested against experimental data.

Bibliography
Arieli, I. and Young, H. P. (2016) “Stochastic Learning Dynamics and Speed of Convergence in Population Games,” Econometrica, Vol. 84, pp. 627–676.

Benaïm, M., Hofbauer, J., and Hopkins, E. (2009) “Learning in games with unstable equilibria,” Journal of Economic Theory, Vol. 144, pp. 1694–1709.

Benettin, G., Galgani, L., Giorgilli, A., and Strelcyn, J.-M. (1980) “Lyapunov characteristic exponents for smooth dynamical systems and for Hamiltonian systems; a method for computing all of them. Part 1: Theory,” Meccanica, Vol. 15, pp. 9–20.

Berg, J. and Weigt, M. (1999) “Entropy and typical properties of Nash equilibria in two-player games,” EPL (Europhysics Letters), Vol. 48, pp. 129–135.

Brock, W. A. and Hommes, C. H. (1997) “A rational route to randomness,” Econometrica: Journal of the Econometric Society, pp. 1059–1095.

Brown, G. W. (1951) “Iterative solution of games by fictitious play,” in T. Koopmans ed. Activity Analysis of Production and Allocation, New York: Wiley, pp. 374–376.

Camerer, C. and Ho, T. (1999) “Experience-weighted attraction learning in normal form games,” Econometrica, Vol. 67, pp. 827–874.

Conlisk, J. (1993a) “Adaptation in games: Two solutions to the Crawford puzzle,” Journal of Economic Behavior & Organization, Vol. 22, pp. 25–50.

(1993b) “Adaptive tactics in games: Further solutions to the Crawford puzzle,” Journal of Economic Behavior & Organization, Vol. 22, pp. 51–68.

Cournot, A.-A. (1838) Recherches sur les principes mathématiques de la théorie des richesses, Paris: L. Hachette.

Crawford, V. P. (1974) “Learning the optimal strategy in a zero-sum game,” Econometrica: Journal of the Econometric Society, pp. 885–891.

(1995) “Adaptive dynamics in coordination games,” Econometrica: Journal of the Econometric Society, pp. 103–143.

Crutchfield, J. P., Farmer, J. D., and Huberman, B. A. (1982) “Fluctuations and simple chaotic dynamics,” Physics Reports, Vol. 92, pp. 45–82.

Foster, D. P. and Young, H. P. (1998) “On the nonconvergence of fictitious play in coordination games,” Games and Economic Behavior, Vol. 25, pp. 79–96.

Galla, T. (2009) “Intrinsic noise in game dynamical learning,” Physical Review Letters, Vol. 103, p. 198702.

Galla, T. and Farmer, J. D. (2013) “Complex dynamics in learning complicated games,” Proceedings of the National Academy of Sciences, Vol. 110, pp. 1232–1236.

Hahn, S. (1999) “The convergence of fictitious play in 3 × 3 games with strategic complementarities,” Economics Letters, Vol. 64, pp. 57–60.

Ho, T. H., Camerer, C. F., and Chong, J.-K. (2007) “Self-tuning experience weighted attraction learning in games,” Journal of Economic Theory, Vol. 133, pp. 177–198.

Hofbauer, J. and Sigmund, K. (1998) Evolutionary Games and Population Dynamics: Cambridge University Press.

Hommes, C. H., Ochea, M. I., and Tuinstra, J. (2016) “Evolutionary Competition between Adjustment Processes in Cournot Oligopoly: Instability and Complex Dynamics,” Technical report, THEMA (THéorie Economique, Modélisation et Applications), Université de Cergy-Pontoise.

Huberman, B. A. and Hogg, T. (1988) “The behavior of computational ecologies,” in The Ecology of Computation, pp. 77–115: North-Holland.

Krishna, V. (1992) “Learning in games with strategic complementarities,” Technical report, Harvard Business School.

Milgrom, P. and Roberts, J. (1991) “Adaptive and sophisticated learning in normal form games,” Games and Economic Behavior, Vol. 3, pp. 82–100.

Miyazawa, K. (1961) “On the Convergence of the Learning Process in a 2 × 2 Non-Zero-Sum Two-Person Game,” Research Memorandum No. 33, Econometric Research Program, Princeton University.

Monderer, D. and Sela, A. (1996) “A 2 × 2 game without the fictitious play property,” Games and Economic Behavior, Vol. 14, pp. 144–148.

Monderer, D. and Shapley, L. S. (1996) “Fictitious play property for games with identical interests,” Journal of Economic Theory, Vol. 68, pp. 258–265.

Muth, J. F. (1961) “Rational expectations and the theory of price movements,” Econometrica: Journal of the Econometric Society, Vol. 29, pp. 315–335.

Nachbar, J. H. (1990) ““Evolutionary” selection dynamics in games: Convergence and limit properties,” International Journal of Game Theory, Vol. 19, pp. 59–89.

Nagel, R. (1995) “Unraveling in guessing games: An experimental study,” The American Economic Review, Vol. 85, pp. 1313–1326.

Opper, M. and Diederich, S. (1992) “Phase transition and 1/f noise in a game dynamical model,” Physical Review Letters, Vol. 69, pp. 1616–1619.

Ott, E. (2002) Chaos in Dynamical Systems: Cambridge University Press.

Pomeau, Y. and Manneville, P. (1980) “Intermittent transition to turbulence in dissipative dynamical systems,” Communications in Mathematical Physics, Vol. 74, pp. 189–197.

Rapoport, A., Guyer, M. J., and Gordon, D. G. (1976) The 2 × 2 Game, Ann Arbor: University of Michigan Press.

Robinson, J. (1951) “An iterative method of solving a game,” Annals of Mathematics, pp. 296–301.

Sanders, J. B., Galla, T., and Farmer, J. D. (2016) “The prevalence of complex dynamics in games with many players,” in preparation.

Shapley, L. S. (1964) “Some topics in two-person games,” Advances in Game Theory, Annals of Mathematical Studies, Vol. 52, pp. 1–29.

Stahl, D. O. (1988) “On the instability of mixed-strategy Nash equilibria,” Journal of Economic Behavior & Organization, Vol. 9, pp. 59–69.

(1996) “Boundedly rational rule learning in a guessing game,” Games and Economic Behavior, Vol. 16, pp. 303–330.

Vilone, D., Robledo, A., and Sánchez, A. (2011) “Chaos and unpredictability in evolutionary dynamics in discrete time,” Physical Review Letters, Vol. 107, p. 038101.

Young, H. P. (1993) “The evolution of conventions,” Econometrica: Journal of the Econometric Society, Vol. 61, pp. 57–84.

Payoff parameters and classes of games
To prove Proposition 1 for the generic 2 × 2 payoff matrix (1), it is convenient to consider the transformed payoff matrix

Π′ = ( (H, L)  (0, 0) ; (0, 0)  (K, M) ),  (22)

where H = a − c, K = d − b, L = e − g and M = h − f. Finally, consider

Π′′ = ( (0, 0)  (0, L′) ; (H′, 0)  (K′, M′) ),  (23)

where H′ = −H, L′ = −L, K′ = K and M′ = M.

Proposition A.1. (i) The payoff matrices Π and Π′, defined by (1) and (22) respectively, have the same pure and mixed strategy NE. (ii) The EWA dynamics (6) is identical in the two cases, and so is any learning dynamics in which the propensities are mapped to the probabilities through a logit function and the expected payoff enters as an additive term in the update of the propensities. (iii) Any other payoff matrix Π′′ whose elements H′, K′, L′ and M′ are either identical or opposite to H, K, L and M, placed in the same position if identical and in the opposite position if opposite (up rather than down for Row, left rather than right for Column), is equivalent to Π and Π′. An example of such a payoff matrix is Π′′, defined in (23).

Therefore, we set the off-diagonal elements to zero and prove Proposition 1 using payoff matrix (22). We then prove Proposition A.1.
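This equivalence is easy to check numerically. The sketch below is our illustration, not the paper's code: it samples random payoff matrices, builds the diagonal elements of the transformed matrix (22) via H = a − c, K = d − b, L = e − g, M = h − f, and verifies that the mixed-strategy NE computed from the indifference formulas used later in the proof of Proposition A.1 (Eq. (26)) coincides for Π and Π′. Function names and tolerances are our choices.

```python
import random

def transform(a, b, c, d, e, f, g, h):
    # Diagonal elements of the transformed matrix (22): H, K, L, M.
    return a - c, d - b, e - g, h - f

def mixed_ne(a, b, c, d, e, f, g, h):
    # Interior mixed-strategy NE (p, q), Eq. (26); assumes nonzero denominators.
    p = (h - f) / (e - g + h - f)  # probability that Row plays strategy 1
    q = (d - b) / (a - c + d - b)  # probability that Column plays strategy 1
    return p, q

random.seed(0)
max_dev = 0.0
for _ in range(1000):
    a, b, c, d, e, f, g, h = [random.uniform(-1, 1) for _ in range(8)]
    if abs(e - g + h - f) < 1e-2 or abs(a - c + d - b) < 1e-2:
        continue  # skip near-degenerate draws with no well-defined interior NE
    H, K, L, M = transform(a, b, c, d, e, f, g, h)
    p1, q1 = mixed_ne(a, b, c, d, e, f, g, h)
    # Payoff matrix (22): Row payoffs (H, 0; 0, K), Column payoffs (L, 0; 0, M).
    p2, q2 = mixed_ne(H, 0, 0, K, L, 0, 0, M)
    max_dev = max(max_dev, abs(p1 - p2), abs(q1 - q2))
```

Up to floating-point rounding, `max_dev` stays at machine precision, in line with part (i) of Proposition A.1 for the mixed equilibrium.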
Proof of Proposition 1. In terms of the payoff matrix (22), the parameters A, B, C and D are defined as

A = (1/4)(H + K),  B = (1/4)(H − K),  C = (1/4)(L + M),  D = (1/4)(L − M).  (24)

As we are interested only in their relative magnitudes, we drop the 1/4 prefactor and consider

AC = (H + K)(L + M),  |BD| = |(H − K)(L − M)|.  (25)

We start by proving (i). The game is symmetric, so H = L and K = M. Hence |A| = |H + K| and |B| = |H − K|. Moreover, the condition H, K > 0 identifies a coordination game, H, K < 0 an anticoordination game, and if only one of H or K is negative the game is dominance-solvable. So, if H and K have the same sign, the payoff matrix describes a coordination game and the sum of H and K is larger (in absolute value) than their difference, so that coordination is larger than dominance; if the signs of H and K differ, the game is dominance-solvable and the difference between H and K is larger (in absolute value) than their sum: dominance is larger than coordination.

We then consider (ii). If |BD| > |AC|, either |B| > |A|, or |D| > |C|, or both. Therefore, either H and K do not have the same sign, or L and M do not have the same sign, or both. All of these cases represent dominance-solvable games (which profile of pure strategies is the NE of the game depends on the relative signs). On the contrary, the condition |BD| < |AC| does not necessarily imply that both |B| < |A| and |D| < |C|. However, if that is the case, the sums H + K and L + M are larger (in absolute value) than the differences H − K and L − M, which means that H, K have the same sign and so do L, M. If AC > 0, A and C also have the same sign, so either H, K, L, M are all positive or they are all negative. If H, K, L, M > 0, the payoff matrix describes a coordination game; if H, K, L, M < 0, it describes an anticoordination game. If AC < 0, A and C have different signs. Suppose without loss of generality that A > 0 and C < 0. Then H, K > 0 and L, M < 0. The payoff matrix represents a discoordination game.

We still have to show that the larger the value of coordination (compared to dominance), the more likely it is that the payoff matrix describes a coordination or anticoordination game (rather than a dominance-solvable game). This is not obvious. Coordination may be large because A ≫ B, but it could still be that C ≲ D. An extreme example is B = 0 (so dominance is null) while A, C ≠ 0: then |AC| > |BD| = 0 always, but this condition imposes no restriction on whether |C| > |D| or |C| < |D|. The intuition is that randomly chosen payoff elements are unlikely to produce such a specific payoff matrix. We verify this conjecture by running extensive numerical simulations. For each (AC, |BD|) point, we generate 1000 random realizations of the payoff matrix with the specified AC and |BD|; we then compute the fraction of dominance-solvable games (the remaining games are coordination or discoordination games, according to whether we are in the positive or negative AC semiplane). The results are shown in Figure A.1. As expected, if |BD| > AC, all games are dominance-solvable. Vice versa, the larger the absolute value of AC, the more likely the payoff matrix is to represent (anti)coordination or discoordination games. Interestingly, the fraction of dominance-solvable games never drops to zero. Finally, notice the consistency between Figure A.1 and Figure 1 (apart from the fact that there is no neat separation between the dominance-solvable and the (anti)coordination and discoordination regions).

[Panels (a) and (b) omitted: fraction of dominance-solvable games over the (AC, |BD|) plane.]

Figure A.1: Fraction of dominance-solvable games for randomly generated payoff matrices, as a function of the coordination (AC) and dominance (|BD|) parameters.
The larger coordination is compared to dominance, the more likely the payoff matrix describes a coordination (if AC > 0) or an anticoordination or discoordination (if AC < 0) game. For instance, consider the payoff matrix (1) with a = 3, e = 1, d = −h = 2, b = c = f = g = 0: this is a dominance-solvable game, but |AC| = 3/8 > |BD| = 2/8. Note that |D| = 1/4 < |C| = 3/4, but |B| = 1 > |A| = 1/2.

Proof of Proposition A.1. We start by proving (i). The pure strategy NE are determined only by the ordinal properties of the payoffs. Consider player Row. Her contribution in determining the pure strategy Nash Equilibria depends on whether a > c or d > b, so it is unchanged if we consider H = a − c > 0 or K = d − b > 0. The same argument applies to player Column: his contribution in determining the pure strategy Nash Equilibria depends on whether e > g or h > f, so it is unchanged if we consider L = e − g > 0 or M = h − f > 0. The same is true for all other positive/negative combinations.

In the 2 × 2 game, the mixed strategy NE (p, 1 − p) and (q, 1 − q) for players Row and Column respectively are given by

p = (h − f) / (e − g + h − f),  q = (d − b) / (a − c + d − b).  (26)

Again, we can rewrite the above equations without loss of generality in terms of H, K, L and M, namely

q = K / (H + K),  p = M / (L + M).  (27)

We then consider (ii). We focus only on player Row (the proof for Column is identical). If, at time t, Column plays a mixed strategy (y(t), 1 − y(t)), the expected payoff to Row of playing pure strategy 1 is Π_R^1(y(t)) = a y(t) + b(1 − y(t)) and the expected payoff of strategy 2 is Π_R^2(y(t)) = c y(t) + d(1 − y(t)). Now, the ratio x(t+1)/(1 − x(t+1)) fully determines x(t+1). Using (6) we find

x(t+1) / (1 − x(t+1)) ∝ exp[ β (1 − (1 − α)(1 − κ)) ( Π_R^1(y(t)) − Π_R^2(y(t)) ) ],  (28)

where Π_R^1(y(t)) − Π_R^2(y(t)) = (a − c) y(t) − (d − b)(1 − y(t)) = H y(t) − K(1 − y(t)). Note that the same argument applies to any other learning algorithm in which the expected payoffs appear in the argument of an exponential and can be separated from the past propensities (i.e. do not enter multiplicatively).

Finally, (iii) follows from the above results. If we consider H′ = −H at the bottom left of the payoff matrix, we have H′ = c − a, so a > c implies H′ < 0: the ordinal properties are preserved under H ↔ −H′. The learning dynamics depends on H as well, since Π_R^1(y(t)) − Π_R^2(y(t)) ∝ (0 − H′) y(t) = H y(t).

Apart from the above properties, we stress that the transformed payoff matrix (22) is not fully equivalent to (1). For instance, consider the following Prisoner's Dilemma (PD):

Π_PD = ( (2, 2)  (0, 3) ; (3, 0)  (1, 1) ),  (29)

where strategy 1 is Cooperate and strategy 2 is Defect. The transformed payoff matrix is

Π′_PD = ( (−1, −1)  (0, 0) ; (0, 0)  (1, 1) ).  (30)

The payoff matrices (30) and (29) are not equivalent, in that the property that the NE and the Pareto-efficient outcome do not coincide is lost, and so is the dilemma between cooperation and defection.

In a similar manner, consider the Stag Hunt (SH) game:

Π_SH = ( (3, 3)  (0, 2) ; (2, 0)  (2, 2) ),  (31)

where strategy 1 is Stag (S) and strategy 2 is Hare (H). Here (S,S) is the payoff-dominant NE, while (H,H) is the risk-dominant NE. If we apply the transformation we find

Π′_SH = ( (1, 1)  (0, 0) ; (0, 0)  (2, 2) ).  (32)

The above is a pure coordination game, and the properties of payoff dominance and risk dominance no longer hold.

However, note that both in (29)–(30) and in (31)–(32) the NE are the same, and so are all differences in payoffs obtained holding the strategy of the other player fixed. A learning algorithm that bases its updates on the performance of one pure strategy relative to the other should therefore be invariant under the payoff matrix transformation we described: this is probably the most intuitive explanation of why Proposition 1 holds.

B Proof of Proposition 2
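The bifurcation established in this appendix can be previewed numerically. The sketch below is ours, under the assumption that the relevant one-dimensional reduced map has the form x̃ ↦ r tanh(x̃) with r = (β/α)|A|; the function name, starting point and parameter values are illustrative choices. Iterating on either side of r = 1 shows the fixed point at zero losing stability to two symmetric lateral fixed points.

```python
import math

def iterate_reduced_map(r, x0=0.1, n_steps=500):
    # Iterate the assumed reduced map x -> r * tanh(x), with r = beta*|A|/alpha.
    x = x0
    for _ in range(n_steps):
        x = r * math.tanh(x)
    return x

# Below the bifurcation (r < 1) the iteration collapses onto the fixed point at 0.
x_sub = iterate_reduced_map(0.8)
# Above it (r > 1) the iteration settles on a nonzero lateral fixed point;
# starting from -x0 would select the symmetric negative one.
x_super = iterate_reduced_map(1.5)
```

Near r = 1 the lateral fixed point is small, consistent with a third-order expansion of the map; far above the threshold it approaches the height r of the hyperbolic tangent.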
We first consider assertion (i). Since B = 0, there is always a fixed point x̃* = 0. It is stable if (from Eq. (13))

(β/α)|A| ≤ 1.  (33)

So, as long as x̃* = 0 is the unique fixed point, it is stable.

We then consider assertion (ii), and in particular the lower bound, (β/α)|A| → 1⁺. There are two fixed points x̃* = ±ε, where ε is an arbitrarily small number. Thanks to the symmetry of the game, we focus on a profile of mixed strategies given by (x̃*, x̃*). To second order, cosh x̃* ≈ 1 + (x̃*)²/2. The stability condition becomes

(α/β) (1 + (x̃*)²/2)(1 + (x̃*)²/2) − |A| ≥ 0,  (34)

i.e., to leading order,

(x̃*)² ≥ (β/α)|A| − 1.  (35)

Now, we Taylor expand Ψ(x̃*) (defined in Section 4.1.1) to third order (first order would just yield x̃* = 0) and solve x̃* = Ψ(x̃*). Apart from the null solution, we get

(x̃*)² = 3 ( (β/α)|A| − 1 ) / ( (β/α)|A| ).  (36)

It is easily checked that for (β/α)|A| → 1⁺, condition (35) is satisfied: the fixed points whose components are the "lateral solutions" are stable. Therefore, there is a supercritical pitchfork bifurcation at (β/α)|A| = 1.

The upper bound, (β/α)|A| → ∞, is easily dealt with. Indeed, because we are searching for the intersection with the x̃* line, the fixed point is approximately the height of the hyperbolic tangent itself: x̃* ≈ ±(β/α)|A|. Now, for (β/α)|A| → ∞ the hyperbolic cosine can be approximated by

cosh( (β/α)|A| ) ≈ exp( (β/α)|A| ) / 2.  (37)

We can then rewrite the stability condition as

4 (β/α)|A| exp( −2(β/α)|A| ) ≤ 1.  (38)

For (β/α)|A| → ∞, the LHS of the above equation goes to zero, so the inequality obviously holds.

Finally, the proof of (iii) is identical to the proof of the upper bound for (β/α)|A|, in that the same arguments apply to sufficiently large values of (β/α)|B|, for which the only fixed point will be far enough from zero to be stable.

C Proof of Proposition 3
In order to study the properties of the pure strategy NE we need to consider the learning dynamics in the original coordinates (the pure strategies map into infinite elements in the transformed coordinates). Using (6) and the payoff matrix (1), the EWA dynamics reads

x(t+1) = x(t)^{1−α} e^{β(a y(t) + b(1−y(t)))} / [ x(t)^{1−α} e^{β(a y(t) + b(1−y(t)))} + (1 − x(t))^{1−α} e^{β(c y(t) + d(1−y(t)))} ],
y(t+1) = y(t)^{1−α} e^{β(e x(t) + f(1−x(t)))} / [ y(t)^{1−α} e^{β(e x(t) + f(1−x(t)))} + (1 − y(t))^{1−α} e^{β(g x(t) + h(1−x(t)))} ].  (39)

From Eq. (39) we see that the pure strategy profiles (x, y) ∈ {(0,0), (0,1), (1,0), (1,1)} are all fixed points of the dynamics. Let us study their stability properties. The Jacobian is

J = ( J11  J12 ; J21  J22 ),  (40)

with

J11 = (1 − α)(x − x²)^α e^{β(y(a−b−c+d)+b−d)} / [ x(1 − x)^α e^{β(y(a−b−c+d)+b−d)} + (1 − x)x^α ]²,
J12 = β (x − x²)^{α+1} (a − b − c + d) e^{β(y(a−b−c+d)+b−d)} / [ x(1 − x)^α e^{β(y(a−b−c+d)+b−d)} + (1 − x)x^α ]²,
J21 = β (y − y²)^{α+1} (e − f − g + h) e^{β(x(e−f−g+h)+f−h)} / [ y(1 − y)^α e^{β(x(e−f−g+h)+f−h)} + (1 − y)y^α ]²,
J22 = (1 − α)(y − y²)^α e^{β(x(e−f−g+h)+f−h)} / [ y(1 − y)^α e^{β(x(e−f−g+h)+f−h)} + (1 − y)y^α ]².  (41)

As can be seen by taking the appropriate limits in Eqs. (41), for all pure strategy profiles the Jacobian has infinite elements along the main diagonal (for α > 0) and null elements along the antidiagonal. This means that the NE in pure strategies are infinitely unstable, and this may be the reason for the extreme nonlinearities observed in Galla and Farmer (2013) near the faces of the simplex. The only case in which the elements of the Jacobian at the pure strategy NE are not infinite is that of no memory loss, α = 0, as can be seen by computing the eigenvalues under this parameter restriction.
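The divergence of the diagonal Jacobian entries can also be seen numerically. The following sketch is ours (the payoff values, the finite-difference step and all names are arbitrary illustrative choices): it implements the map (39) and estimates the entry ∂x(t+1)/∂x(t) by central differences at points approaching the pure profile (1, 1). The entry grows without bound when α > 0 but stays bounded when α = 0.

```python
import math

def ewa_step(x, y, a, b, c, d, e, f, g, h, alpha, beta):
    # One step of the EWA map, Eq. (39).
    ux = x ** (1 - alpha) * math.exp(beta * (a * y + b * (1 - y)))
    vx = (1 - x) ** (1 - alpha) * math.exp(beta * (c * y + d * (1 - y)))
    uy = y ** (1 - alpha) * math.exp(beta * (e * x + f * (1 - x)))
    vy = (1 - y) ** (1 - alpha) * math.exp(beta * (g * x + h * (1 - x)))
    return ux / (ux + vx), uy / (uy + vy)

def j11(x, y, payoffs, alpha, beta, eps=1e-8):
    # Central finite-difference estimate of the Jacobian entry dx(t+1)/dx(t).
    xp, _ = ewa_step(x + eps, y, *payoffs, alpha, beta)
    xm, _ = ewa_step(x - eps, y, *payoffs, alpha, beta)
    return (xp - xm) / (2 * eps)

payoffs = (2, 0, 0, 2, 2, 0, 0, 2)  # a simple coordination game (illustrative)
# Approach the pure strategy profile (x, y) = (1, 1) from inside the simplex.
points = [1 - 10 ** -k for k in (2, 4, 6)]
j_memory_loss = [j11(p, p, payoffs, alpha=0.2, beta=1.0) for p in points]
j_no_memory_loss = [j11(p, p, payoffs, alpha=0.0, beta=1.0) for p in points]
```

With α = 0.2 the estimates increase steadily as the profile is approached and eventually exceed 1; with α = 0 they settle near a finite value below 1, consistent with the pure NE becoming stable fixed points in the no-memory-loss case.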
In that case, the NE in pure strategies become stable fixed points of the learning dynamics.

D Effect of stochasticity on learning

[Figure D.1 panels omitted: time series plots of x and y against t.]

Figure D.1: Time series of the probabilities x (in blue) and y (in red). Values of the parameters: α = 0. , b = c = f = g = 0, a = e = 2, d = h = − , β = 0. ; (b) Stochastic learning with T = 2; (c) Stochastic learning with T = 1 and δ = 1; (d) Stochastic learning with T = 1 and δ = 0; (e) Deterministic learning; (f) Stochastic learning with T = 1 and δ = 1. Deterministic and stochastic learning are largely similar. See Section 5.1 for further comments.

[Figure D.2 panels omitted: time series plots of x and y against t.]

Figure D.2: Time series of the probabilities x (in blue) and y (in red). Values of the parameters: α = 0. , β = 1, b = c = f = g = 0, a = e = 6, d = h = 1. (a) Deterministic learning starting from the initial conditions x(0) = 0.05 and y(0) = 0.05, close to the Pareto-dominated NE; (b) Deterministic learning starting from the initial conditions x(0) = 0. and y(0) = 0.