Poincaré-Bendixson Limit Sets in Multi-Agent Learning
Aleksander Czechowski
Delft University of Technology
Georgios Piliouras
Singapore University of Technology and Design
ABSTRACT
A key challenge of evolutionary game theory and multi-agent learning is to characterize the limit behavior of game dynamics. Whereas convergence is often a property of learning algorithms in games satisfying a particular reward structure (e.g. zero-sum), it is well known that for general payoffs even basic learning models, such as the replicator dynamics, are not guaranteed to converge. Worse yet, chaotic behavior is possible even in rather simple games, such as variants of Rock-Paper-Scissors games [35]. Although chaotic behavior in learning dynamics can be precluded by the celebrated Poincaré-Bendixson theorem, that theorem is only applicable to low-dimensional settings. Are there other characteristics of a game which can force regularity in the limit sets of learning? In this paper, we show that behaviors consistent with the Poincaré-Bendixson theorem (limit cycles, but no chaotic attractor) follow purely from the topological structure of the interaction graph, even in high-dimensional settings with an arbitrary number of players and arbitrary payoff matrices. We prove our result for a wide class of follow-the-regularized-leader (FoReL) dynamics, which generalize the replicator dynamics, in games where each player has two strategies at their disposal, and for interaction graphs where the payoffs of each agent are affected by only one other agent (i.e. interaction graphs of indegree one). Since chaos has been observed in a game with only two players and three strategies, this class of non-chaotic games is in a sense maximal. Moreover, we provide simple conditions under which such behavior translates to social welfare guarantees, implying that FoReL learning achieves time-average social welfare at least as good as that of a Nash equilibrium, and connecting the topology of the dynamics to Price of Anarchy analysis.
KEYWORDS
Replicator Dynamics, Follow-the-Regularized-Leader, Regret Minimization, Poincaré-Bendixson Theorem, Polymatrix Games, Price of Anarchy
Understanding and predicting the behavior of learning dynamics in normal form games has been a fundamental question that has attracted the attention of researchers from diverse disciplines such as economics, optimization theory, and artificial intelligence [6, 36, 37, 40]. Even in simple games, such as Rock-Paper-Scissors [25, 29, 35], models of evolution and learning are not guaranteed to converge; and even beyond cycles, long-term behavior may become chaotic, a phenomenon well known to the dynamical systems community, e.g., from weather models [22]. Not only does chaos manifest itself even in simple games with two players, but moreover, a string of recent results seems to suggest that such chaotic, unpredictable behavior may indeed be the norm across a variety of simple low-dimensional game dynamics [1–3, 7, 8, 10, 13, 15, 27, 28, 33, 38]. Worse yet, the emergence of chaotic behavior has been connected with increased social inefficiency: even in games with a unique socially optimal equilibrium, i.e. games with Price of Anarchy [31] equal to 1, chaotic dynamics [9] may lead to highly inefficient outcomes. Such profoundly negative results lead us to the following natural questions:
• Do there exist simple, robust conditions under which learning behaves well?
• Which types of games lie at the "edge of chaos"?
• Does dynamic simplicity translate to high efficiency and social welfare?
Traditionally, a lot of work has focused on showing that in specific classes of games (e.g., potential games), learning dynamics can lead to convergence and equilibration (e.g. [6, 14, 34, 40] and references therein). One disadvantage of these approaches is that such classes of games are non-generic, i.e., the set of such games is of measure zero in the space of all games. There is a negligible chance that a game will satisfy any of these properties (e.g., be an exact potential game).
Hence, such results are typically contingent on the stringent assumption that the agents must internalize an abstract game-theoretic model to an arbitrarily high degree of accuracy. Another approach to multi-agent learning, relatively underexplored so far, is the possibility of interpreting simple non-equilibrating behavior, such as cyclic/periodic behavior, as an example of a positive, helpful regularity. Instead, such behavior is typically categorized together with chaos under the label of disequilibrium behavior, despite the vast difference between them. In the work closest to ours, [26] recently explored the possibility of such non-equilibrating regularities; however, once again they do so by assuming non-generic structure on the set of allowable games (e.g., networks of 2 × 2 zero-sum games).

Our approach and results.
To make progress on this hard task, we explore a different type of constraint in games. We constrain only the combinatorial structure of the game. How many strategies does each player have? We only allow two. What are the allowable interactions between the agents? Every agent is affected by the behavior of at most one other agent. Finally, we add a technical restriction that the game is connected – it cannot be decomposed into two subgames completely independent of each other. Under these assumptions, we prove our main contribution in the form of Theorems 3 and 4 – that the limit behavior of FoReL dynamics [25, 36] in these games is always consistent with the famous Poincaré-Bendixson theorem, which informally states that the system is either convergent or cyclic, and in particular that no chaotic attractor is possible. Furthermore, under additional but structurally robust assumptions on the payoff matrices (i.e., assumptions that remain valid after small perturbations of the payoff matrices), we prove positive results about the efficiency of the time-average behavior of the dynamics, regardless of whether they are convergent or not. As is typically the case in the Price of Anarchy literature [21], we focus on the measure of social welfare – the sum of individual payoffs; but whereas the typical PoA literature tries to argue that regret-minimizing dynamics (such as FoReL) are at most a constant factor worse than the worst-case Nash equilibrium [31, 32], we instead show that FoReL dynamics are always at least as efficient as the worst-case Nash equilibrium. Finally, in Section 5 we provide examples, in the form of sample trajectories in high-dimensional games, where limit behavior is non-chaotic, but cyclic.

Limit sets in two-player two-strategy games are very well understood. In particular, the celebrated Poincaré-Bendixson theorem states that smooth two-dimensional systems can have only stationary and cyclic limit sets [4].
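The planar dichotomy can be illustrated numerically. Below is a minimal sketch (our own illustration, not from the paper) integrating the standard Hopf normal form ẋ = x(1 − x² − y²) − y, ẏ = y(1 − x² − y²) + x, whose every bounded nonstationary trajectory spirals onto the unit circle – a periodic limit set, exactly as the theorem permits.

```python
import math

def hopf_step(x, y, dt):
    # Hopf normal form: radial contraction toward r = 1 plus unit-speed rotation.
    r2 = x * x + y * y
    dx = x * (1.0 - r2) - y
    dy = y * (1.0 - r2) + x
    return x + dx * dt, y + dy * dt

def limit_radius(x0=0.1, y0=0.0, dt=0.001, steps=20000):
    """Forward-Euler integration; returns the final radius of the trajectory."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = hopf_step(x, y, dt)
    return math.hypot(x, y)
```

Starting from radius 0.1 (or from outside the circle), the radius approaches 1, so the ω-limit set is a cycle rather than a chaotic set.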
Since the evolution of mixed strategies of both agents can be described by two variables only, the theorem implies that no chaotic behavior can emerge. The Poincaré-Bendixson theorem has also been successfully applied in the past to higher-dimensional learning systems [12, 26]. The key technique of these papers was to show that the underlying dynamic is in fact two-dimensional, by finding constants of motion. In contrast, our analysis in this paper is not contingent on identifying any invariant function, while at the same time exploring truly high-dimensional games without any restriction on the number of players. Moreover, our theorems apply to a wide class of learning models in games, so-called follow-the-regularized-leader (FoReL) systems, where agents evolve their mixed strategies in the direction of maximal reward, but while taking into account a regularizer term, which models exploratory behavior [24]. This class has a number of strong properties, such as finite regret, and contains as special cases, e.g., variants of the replicator dynamics [25]. Due to the chaotic example of Sato et al. [35] in two-player, three-action games, as well as another negative example by Plank [30] of complex quasi-cyclic behavior in three-player games with two actions for each player but without a structured network of interaction, both for replicator dynamics (cf. Figure 2), our topological results establish a maximal class of games for which such positive results are possible.

A finite game in normal form consists of a set of N players, each with a finite set of strategies 𝒜_i. The preferences of each player are represented by the payoff function u_i : ∏_i 𝒜_i → ℝ. To model behavior at scale, or probabilistic strategy choices, one assumes that players use mixed strategies, i.e. probability distributions (x_{iα_i})_{α_i∈𝒜_i} ∈ Δ(𝒜_i) =: X_i.
With slight abuse of notation, the expected payoff of the i-th player in the profile x = (x_{iα_i})_{i,α_i} is again denoted by u_i, and given by

u_i(x) = Σ_{α_1∈𝒜_1,…,α_N∈𝒜_N} u_i(α_1, …, α_N) x_{1α_1} ⋯ x_{Nα_N}.  (1)

A mixed strategy x̂ is a Nash equilibrium iff for all i and all x with x_j = x̂_j for j ≠ i we have u_i(x) ≤ u_i(x̂); in other words, no player can unilaterally increase their payoff by changing their strategy distribution. The minimax value for player i is given by min_{x_{−i}} max_{x_i} u_i(x), where x_{−i} := (x_j)_{j≠i}. It is the smallest possible value player i can be forced to attain by the other players, without them knowing player i's strategy. We call a game binary iff each agent has only two strategies at their disposal, i.e. |𝒜_i| = 2 for all i.

To model the topology of interactions between players, we restrict our attention to a subset of normal form games, where the structure of interactions between players can be encoded by a graph of two-player normal form subgames, leading us to consider so-called graphical polymatrix games (GPGs) [17, 19, 39]. A simple directed graph is a pair (V, E), where V = {1, …, N} is a finite set of vertices (representing the players), and E is a set of ordered distinct vertex pairs (edges), where the first element is called the predecessor, and the second is called the successor. Each edge (i, k) has an associated two-player normal form game, where only the successor k is assigned payoffs, and they are represented by a matrix A^{i,k} with rows enumerating strategies of player k, and columns enumerating strategies of player i. For a given strategy profile s = {s_i}_i ∈ ∏_i 𝒜_i, the payoffs for player k in the full game are then determined as the sum

u_k(s) = Σ_{i:(i,k)∈E} A^{i,k}(s_i, s_k).  (2)

The payoffs can be extended to mixed strategies in the usual multilinear fashion:

u_k(x) = Σ_{i:(i,k)∈E} Σ_{s_i∈𝒜_i, s_k∈𝒜_k} A^{i,k}(s_i, s_k) x_{s_i} x_{s_k}.  (3)

Note that a situation where both the successor k and the predecessor i obtain a reward can be modelled by including both edges (i, k) and (k, i) in the graph.

We say that a simple directed graph is weakly connected if any two vertices can be connected by a set of edges, where the direction of the edges is not taken into account. This is a weaker condition than strong connectedness, where each pair of vertices needs to be connected by a path, i.e. a sequence of edges, together with associated vertices, where the successor in one edge is the predecessor in the next one. The indegree of a vertex is the number of edges for which the vertex is the successor (in other words: the number of its predecessors). The outdegree is the number of edges for which the vertex is the predecessor, i.e. the number of its successors. A cycle is a path where the predecessor in the first edge is the successor in the last edge. For our exposition we shall identify cycles modulo shifts, i.e. if two paths consist of the same edges in shifted order, then they form the same cycle.

In this paper we consider two types of connected GPGs:
(1) Firstly, cyclic games, where the interaction between the agents forms a cycle, and each agent interacts only with the previous neighbor. We observe that in such a cyclic game the indegree and outdegree of each vertex is one.
(2) Secondly, a more general class of graphical games, where each player's payoffs depend on at most one other player, i.e. the indegree of each vertex is at most one. For a vertex i ∈ V, we will then denote the predecessor vertex by î. For cyclic games we have î ≡ i − 1 (mod N).

Below, we state and prove a simple lemma which characterizes the one-predecessor assumption in terms of graph topology (cf. Figure 1).

Lemma 1. Let (V, E) be a weakly connected, simple, directed graph. If the indegree of each vertex is at most one, then the graph can have at most one cycle. If the graph has no cycle, then it has exactly one root vertex, i.e. a vertex of indegree zero, such that all other vertices are connected to it by a unique, directed path.

Proof. For the first part of the lemma, let us assume the contrary: that a_1, a_2 are nodes of two distinct cycles within the same weakly connected component. The edges between a_1 and a_2 need to form a path (otherwise there would be a vertex with two predecessors). Assume the path leads from a_1 to a_2, and let a_3 be the first vertex which is both on the path and on a_2's cycle. Then a_3 has two predecessors, which leads to a contradiction.

For the second part of the lemma, we can argue as follows. If any vertex had a sequence of predecessors which did not form a cycle, then by backtracking through the predecessors we could identify an infinite collection of distinct vertices. Therefore, there needs to be at least one root node for each vertex. The path from such a root node to the given vertex needs to be unique, as otherwise one could identify a vertex along the path with two predecessors. Finally, if there were two root nodes, from connectedness it follows that there must be a node with two predecessors on the edges between them. □

Remark 1.
Under the assumptions of Lemma 1, if the graph has a cycle, then the cycle serves the role of the root node; i.e. there are no paths from outside of the cycle into it (otherwise one vertex in the cycle would have two predecessors), and all vertices outside of the cycle have to be connected by a path from one of the vertices of the cycle (unique, up to the starting point within the cycle). This follows from the same arguments as in the second part of the proof of Lemma 1. Further on, we will refer to such a cycle as the root cycle.

Denote v_{iα_i}(x) := u_i(α_i; x_{−i}) and v_i(x) := (v_{iα_i}(x))_{α_i∈𝒜_i}. To model the dynamics of learning we use a class of learning systems known as follow-the-regularized-leader systems (FoReL) [6, 36]. This class encompasses a variety of models, such as gradient and replicator dynamics, and allows for a natural description of agent learning as regularized maximization of individual utilities. FoReL dynamics for player i are defined by the evolution of so-called utilities y_i = {y_{iα_i}}_{α_i∈𝒜_i} ∈ ℝ^{|𝒜_i|} – real numbers representing a score each player assigns to each respective strategy – by the integral equation

y_i(t) = y_i(0) + ∫_0^t v_i(x(s)) ds,  x_i(t) = Q_i(y_i(t)),  (4)

where the choice map Q = (Q_1, …, Q_N), Q_i : ℝ^{|𝒜_i|} → X_i, which determines the evaluated strategy profile x(t), is given on each coordinate by

Q_i(y_i) = argmax_{x_i∈X_i} { ⟨y_i, x_i⟩ − h_i(x_i) }.  (5)

In the above, h_i : X_i → ℝ ∪ {−∞, ∞} is a convex regularizer function, representing a regularization/exploration term. Equation (4) represents the population adaptation to the perceived evolution of utility values for each respective strategy of each player.

Figure 1: A connected graph where each vertex is of indegree at most one.

In binary games, each agent has only two strategies at their disposal, say α_1, α_2.
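The FoReL loop (4)–(5) can be sketched numerically. The following minimal illustration is our own, not code from the paper: for the entropic regularizer of eq. (10) below, the argmax in (5) has the standard closed form of a softmax map, and the utility integral in (4) is discretized by an Euler step; the payoff map v is a hypothetical stand-in supplied by the caller.

```python
import math

def entropic_choice(y):
    """Closed-form argmax of <y, x> - sum_a x_a log x_a over the simplex:
    the softmax map (the standard solution for the entropic regularizer)."""
    m = max(y)
    w = [math.exp(v - m) for v in y]
    s = sum(w)
    return [v / s for v in w]

def forel(v, y0, dt=0.01, steps=1000):
    """Euler discretization of eq. (4): y accumulates the payoffs of the
    current mixed profile x = Q(y).  v maps a mixed strategy to a payoff
    vector (one entry per pure strategy)."""
    y = list(y0)
    for _ in range(steps):
        x = entropic_choice(y)
        payoffs = v(x)
        y = [yi + pi * dt for yi, pi in zip(y, payoffs)]
    return entropic_choice(y)
```

For instance, with a constant payoff vector v(x) = (1, 0), the learner's mixed strategy converges to the dominant first strategy, as one would expect of any regret-minimizing dynamic.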
The variable x_i then denotes the proportion of time player i plays strategy α_1, and the proportion for α_2 is given by 1 − x_i. Following [25], we introduce new variables z_i ∈ ℝ,

z_i := y_{iα_1} − y_{iα_2},  (6)

representing the difference in utilities between playing strategies α_1 and α_2. It was proved in [25] that Q_i(z_i + c, c) is constant in c. Therefore, without loss of generality, we can choose c := 0 and work with the z-dependent choice map Q̂_i(z_i) := Q_i(z_i, 0). Provided that Q is sufficiently regular (e.g. continuous), the integral equation (4) can be converted to a system of differential equations

ż = V(z)  (7)

given coordinate-wise by

V_i(z) := v_{iα_1}(Q̂(z)) − v_{iα_2}(Q̂(z));  (8)

for details see again [25].

Remark 2. An intuitively obvious, but technically important observation is that the evolution of the i-th coordinate of the system (4), and, in turn, of (8), depends solely on the values of x_j / z_j, respectively, for nodes j that influence the payoffs of i. In particular, for GPGs, ∂V_i/∂z_j ≠ 0 implies that there is an edge from j to i in the game graph; and for GPGs with at most one predecessor, without loss of generality we can rewrite (7) as

ż_i = V_i(z_î) = v_{iα_1}(Q̂(z_î)) − v_{iα_2}(Q̂(z_î)).  (9)

Perhaps the best known example of a follow-the-regularized-leader learning system are the replicator equations, where

h_i(x_i) := Σ_{α_i} x_{iα_i} log x_{iα_i},  (10)

which yields the following equations for a binary GPG with at most one predecessor:

ż_i = ( A^{î,i}(α_1, α_1) − A^{î,i}(α_1, α_2) − A^{î,i}(α_2, α_1) + A^{î,i}(α_2, α_2) ) · exp(z_î)/(1 + exp(z_î)) + A^{î,i}(α_2, α_1) − A^{î,i}(α_2, α_2).  (11)

Firstly, we prove the following lemma on monotonicity and smoothness of the choice map when a player has exactly two strategies at their disposal (i.e. for X_i identified with [0, 1]), which will be used later in our other proofs.

Lemma 2.
Assume that the regularizer h_i satisfies the following conditions:
(1) h_i ∈ C²((0, 1)) ∩ C⁰([0, 1]) (smoothness),
(2) h′_i(x) → −∞ as x → 0 and h′_i(x) → ∞ as x → 1 (steepness),
(3) h″_i(x) > 0 for x ∈ (0, 1) (strict convexity).
Then Q̂_i ∈ C¹(ℝ) and Q̂′_i(z_i) > 0.

Proof. For a given z_i, Q̂_i(z_i) is defined as the maximizer of ⟨(z_i, 0), (x_i, 1 − x_i)⟩ − h_i(x_i) over x_i ∈ [0, 1]. We have

⟨(z_i, 0), (x_i, 1 − x_i)⟩ − h_i(x_i) = z_i x_i − h_i(x_i).  (12)

From steepness, continuity and strict convexity it follows that the one-sided derivative of z_i x_i − h_i(x_i) is +∞ at x_i = 0 and −∞ at x_i = 1, so the maximum cannot be attained at the boundary. A necessary condition for the maximum to be attained within (0, 1) is

z_i = h′_i(x_i).  (13)

From steepness and strict convexity it follows that equation (13) has a unique solution x_i =: Q̂_i(z_i) for any z_i ∈ ℝ. From the inverse function theorem we have

∂x_i/∂z_i = Q̂′_i(z_i) = 1/h″_i(x_i) > 0,  (14)

and Q̂_i is C¹. □

A differential equation ẋ = F(x) given by a C¹ vector field F : Ω → ℝⁿ on a domain Ω ⊂ ℝⁿ admits a unique solution x(t) : I → ℝⁿ on a maximal open interval I = (I_l, I_r) for any initial condition x(0) = x_0 ∈ Ω. Among the possible solutions of such an equation, we distinguish particular types of solutions due to their qualitative properties: we say that a solution x(t) is an equilibrium iff x(t) = const for all t ∈ I. A solution is periodic iff x(t) = x(t + T) for some T > 0 and all t ∈ I; and it is a connecting orbit between two equilibria (constant solutions) x_0 and x_1 iff x(t) → x_1 as t → ∞ and x(t) → x_0 as t → −∞. If x_0 = x_1, such an orbit is called a homoclinic orbit; otherwise it is a heteroclinic orbit.

A set ω(x_0) ⊂ Ω is a limit set for an initial condition x_0 ∈ Ω if for all x ∈ ω(x_0) there exists a sequence {t_n}_n ⊂ ℝ₊ such that x(t_n) → x as n → ∞. Limit sets are invariant – they are formed by unions of solutions of the differential equation.
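Equation (13) also gives a practical recipe for evaluating the choice map: Q̂_i is the inverse of h′_i, which by steepness and strict convexity can be found by bisection. The sketch below (ours, with a hypothetical tolerance parameter) verifies this against the entropic regularizer of eq. (10), whose inverse is the logistic map z ↦ eᶻ/(1 + eᶻ).

```python
import math

def choice_map(z, h_prime, tol=1e-12):
    """Solve z = h'(x) on (0, 1) by bisection (eq. (13)).  Steepness makes
    h' sweep all of R, and strict convexity makes it increasing, so the
    root exists and is unique."""
    lo, hi = tol, 1.0 - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h_prime(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Entropic regularizer h(x) = x log x + (1 - x) log(1 - x) has
# h'(x) = log(x / (1 - x)), whose inverse is the logistic function.
def entropy_grad(x):
    return math.log(x) - math.log(1.0 - x)
```

Here choice_map(z, entropy_grad) agrees with eᶻ/(1 + eᶻ), and its monotonicity in z is exactly the statement Q̂′_i > 0 of Lemma 2.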
They are also compact – bounded as subsets of ℝⁿ, and closed under taking limits of sequences from within them.

Fundamental research has been devoted to the study of solutions within limit sets, as they offer a qualitative description of the long-term behavior of the system [16]. Since the discovery of chaotic attractors [22], it has become known that in general these solutions can have arbitrarily complicated shapes and exhibit seemingly random behavior, a clearly undesirable feature from the point of view of applications; engineering systems with simple ω-limit sets therefore became of particular interest.

Definition 1. We say that a differential equation ẋ = F(x), x ∈ Ω has the Poincaré-Bendixson property iff for all x_0 ∈ Ω such that the solution x(t) is bounded, the limit set ω(x_0) is either:
• an equilibrium;
• a periodic solution;
• a union of equilibria and connecting orbits between these equilibria.

A well known result from the qualitative theory of differential equations shows that planar systems exhibit this trait.

Theorem 1. (The Poincaré-Bendixson Theorem [4]) Let F = F(x), x ∈ Ω ⊂ ℝ² be a C¹ vector field with finitely many zeroes. Then the differential equation ẋ = F(x) has the Poincaré-Bendixson property.

Already in ℝ³ there are known examples of systems having complicated, chaotic attractors [22]. However, dimensionality is not the only factor which determines the potential shapes of limit sets. In particular, for certain systems of arbitrary dimension, with structured "previous-neighbor" interactions between the variables, the limit sets are as simple as in planar systems.

Theorem 2. (Mallet-Paret & Smith [23]) Let (f_i(x_{i−1}, x_i))ⁿ_{i=1} be a C¹ vector field on an open convex set O ⊂ ℝⁿ, and let x_0 := x_n. Assume that δ_i ∂f_i/∂x_{i−1} > 0 for all x ∈ O, where δ_i ∈ {−1, 1}. Then the system of differential equations

ẋ_i = f_i(x_{i−1}, x_i),  i = 1, …, n,  (15)

has the Poincaré-Bendixson property.

The above theorem is key to proving our further results about GPG games with one predecessor. We will refer to systems satisfying the assumptions of the above theorem as monotone cyclic feedback systems.

In this section we will show that follow-the-regularized-leader systems of generic binary, cyclic games satisfy the Poincaré-Bendixson property. We first state and prove the Poincaré-Bendixson theorem for cyclic games:

Theorem 3.
Let ż = V(z) be the system of differential equations given by the vector field (8), i.e. the follow-the-regularized-leader learning dynamics of a binary, cyclic game. For any smooth, steep, strictly convex collection of regularizers {h_i}_i and almost all values of payoffs – that is, outside of a set of measure zero – such a system possesses the Poincaré-Bendixson property.

Proof. Since u_i depends only on Q_i and Q_{i−1}, we have

V_i(z) = v_{iα_1}(Q̂_{i−1}(z_{i−1})) − v_{iα_2}(Q̂_{i−1}(z_{i−1})).  (16)

Our goal is to employ Theorem 2 and show that the vector field V induces a monotone cyclic feedback system. Therefore, we would like to establish under which conditions

δ_i ∂V_i/∂z_{i−1} > 0  (17)

for all i and a suitable combination of δ_i ∈ {−1, 1}. We have

∂V_i/∂z_{i−1} = (∂v_{iα_1}/∂x_{i−1}) · (∂x_{i−1}/∂z_{i−1}) − (∂v_{iα_2}/∂x_{i−1}) · (∂x_{i−1}/∂z_{i−1}).  (18)

Moreover,

∂v_{iα_1}/∂x_{i−1} = A^{î,i}(α_1, α_1) − A^{î,i}(α_2, α_1),  (19)

and

∂v_{iα_2}/∂x_{i−1} = A^{î,i}(α_1, α_2) − A^{î,i}(α_2, α_2).  (20)

Under the assumptions of Lemma 2 we have ∂x_{i−1}/∂z_{i−1} > 0, so the necessary condition to satisfy inequality (17) is

A^{î,i}(α_1, α_1) + A^{î,i}(α_2, α_2) ≠ A^{î,i}(α_1, α_2) + A^{î,i}(α_2, α_1),  (21)

which is generically satisfied for such normal form games. □

Lemma 3.
Consider the following y-augmented system of differential equations

ẋ = f(x),  ẏ = g(x_i),  x = (x_1, …, x_n) ∈ ℝⁿ, y ∈ ℝ,  (22)

for smooth f, g and some fixed coordinate i. If the original system

ẋ = f(x)  (23)

has the Poincaré-Bendixson property, then the augmented system (22) also has the Poincaré-Bendixson property.

Proof. Let Z be an ω-limit set corresponding to some solution (x(t), y(t)) of the system (22). Consider X – the ω-limit set of the solution x(t) of (23). From the invariance of ω-limit sets it follows that the set Z consists of a union of solutions of (22). For any solution {x*(t), y*(t) : t ∈ ℝ} ⊂ Z, we have {x*(t)} ⊂ X. By the Poincaré-Bendixson property of the original system, we can distinguish three cases:
(1) x*(t) is an equilibrium of (23),
(2) x*(t) is a periodic orbit of (23),
(3) x*(t) is a connecting orbit of (23) – a part of a cycle of connecting orbits.
In the rest of the proof we will frequently use the integral form of the solutions y(t) of (22), given by y(t) = y(0) + ∫_0^t g(x_i(s)) ds.

Case (1): We will prove that (x*(t), y*(t)) is stationary for (22). It is enough to show that g(x*_i) = 0. Assume otherwise. Then y*(t) = y*(0) + ∫_0^t g(x*_i) ds = y*(0) + t·g(x*_i) → ±∞ as t → ∞. This contradicts the boundedness of an ω-limit set.

Case (2): Let T be the period of x*(t). We will show that (x*(t), y*(t)) is a periodic solution of (22) of the same period. We have

d/dt (y*(t + T) − y*(t)) = d/dt ∫_t^{T+t} g(x*_i(s)) ds = g(x*_i(T + t)) − g(x*_i(t)) = 0,  (24)

hence y*(t + T) − y*(t) = const. If this quantity were non-zero, the diameter of the set {y*(t) : t ∈ ℝ} would be infinite. However, the set Z is bounded, and therefore y*(t + T) = y*(t).

Case (3): We will show that (x*(t), y*(t)) is a connecting orbit between two equilibria of the full system (22). We shall only prove convergence as t → ∞; the very same argument holds for t → −∞ and α-limit sets. The orbit (x*(t), y*(t)) is bounded and therefore it has an accumulation point as t → ∞, given by (x**, y**) ∈ ω(x*(0), y*(0)). The point x** is an equilibrium of (23). We will show that (x**, y**) is an equilibrium. It is enough to show that g(x**_i) = 0. Assume otherwise. Then y**(t) = y** + t·g(x**_i), which is unbounded. However, this solution is also a part of ω(x*(0), y*(0)), since ω-limit sets are invariant. Boundedness of ω(x*(0), y*(0)) leads to a contradiction. The same process, repeated for all connecting orbits of (23), creates a cycle of connecting orbits for (22). □

Theorem 4.
Let ż = V(z) be the system of differential equations given by the follow-the-regularized-leader dynamics of a binary, connected, graphical polymatrix game, where each player has at most one predecessor. Then, for any smooth, steep, strictly convex collection of regularizers {h_i}_i and almost all values of payoffs – that is, outside of a set of measure zero – such a system possesses the Poincaré-Bendixson property.

Proof. By Lemma 1 and Remark 1 we know that the graph of the system has either a root vertex or a root cycle. We will first address the case of a root vertex; we will see that this case is somewhat degenerate. Without loss of generality, let us assume that it is labelled as the first vertex, and that the other vertices are numbered in order of increasing path distance from vertex 1 (i.e. j < i implies that the path from 1 to j is shorter than the path from 1 to i) – this is possible by Lemma 1.

The payoffs of the root node are only affected by its own choice of strategy. Therefore ż_1 = u_1(α_1) − u_1(α_2), and hence z_1(t) = t(u_1(α_1) − u_1(α_2)) + z_1(0). This system constitutes an autonomous ODE, which trivially has the Poincaré-Bendixson property (as it is either stationary or divergent). By adding vertex 2, we again obtain an autonomous system, and again it is either stationary or divergent; and in the same manner the proof continues for all vertices. It should be noted that "divergence" in practice means that z_i(t) approaches either ∞ or −∞ in the limit t → ∞; the former implies that player i is placing almost all probability mass on strategy α_1, and the latter – on α_2.

The more interesting scenario arises for the root cycle, where periodic limit sets are possible. Enumerate the cycle vertices by 1, …, N_1, with N_1 ≤ N, and assume that the vertices from N_1 + 1 to N are arranged in the order of increasing path distance from the vertices of the cycle (possible by Remark 1). Observe that the system

ż_i = V_i(z_î),  i = 1, …, N_1,  (25)

is an autonomous system of differential equations (as there are no edges with successors in {1, …, N_1} and predecessors outside of this set), and forms a binary, cyclic game in the sense of Theorem 3. As such, this subsystem possesses the Poincaré-Bendixson property. From then on, the proof continues similarly as for the root vertex – in an inductive way. We add the vertex N_1 + 1, and observe that the system

ż_i = V_i(z_î),  i = 1, …, N_1 + 1,  (26)

is again autonomous, i.e. there are no edges with predecessors in {N_1 + 1, …, N} and successors in {1, …, N_1 + 1}. By Lemma 3, this system then also possesses the Poincaré-Bendixson property. The proof continues inductively with respect to the vertices, until we conclude that the full system ż = V(z) has the Poincaré-Bendixson property. □

Remark 3.
Our theorems apply only to fully mixed initial strategy profiles, as the differences of utilities corresponding to pure strategies are infinite, and FoReL learning, as in Equation (4), is formally not defined in such a situation. However, when one player has an initial pure strategy, the system can be suitably decomposed, and the Poincaré-Bendixson property still holds. More specifically, in a game where each agent has one predecessor, if agent i plays a pure strategy, then all the agents in

V_i := { j : there exists a path from i to j }  (27)

would eventually, sequentially, converge under all reasonable learning dynamics (including replicator) to their best responses to the strategy of agent i. One can then apply Theorem 4 to the autonomous reduced system on V \ V_i, where Equations (4) are again well defined.

Remark 4.
The assumption of connectedness is needed for the Poincaré-Bendixson property, as (by Lemma 1) it ensures that the graph of interactions has only one cycle. For games with multiple cycles, one can have yet another type of limit behavior. Consider a disjoint union of FoReL systems for two binary graphical games, both possessing the Poincaré-Bendixson property, such that the systems have non-resonant periods of periodic orbits; e.g. one of the systems has a periodic solution of period 1 and the other system has a periodic orbit of period √2. Such orbits can be easily obtained from replicator dynamics for appropriately scaled mismatched pennies games, cf. Section 5. Let (z_1(0), z_2(0)) be a point belonging to the periodic solution of period 1, and (z_3(0), z_4(0)) be a point belonging to the periodic solution of period √2. Then the solution of the full system starting from (z_1(0), z_2(0), z_3(0), z_4(0)) forms a quasi-periodic motion, with an ω-limit set of toroidal shape, see Figure 2, cf. [5].

In this section we will show that in the case of cyclic, binary games, under additional but structurally robust assumptions on the payoff matrices (i.e., assumptions that remain valid after small perturbations of the payoff matrices), the time-average social welfare of our FoReL dynamics is at least as large as the social welfare of the worst Nash equilibrium. As is typically the case, the social welfare is defined as the sum of individual payoffs, SW = Σ_i u_i.

Figure 2: An invariant torus in a 4-dimensional dynamical system – a projection onto the first three variables.
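To make the time-average of social welfare concrete, here is a small self-contained simulation (our own illustration, not code from the paper) for the simplest cyclic binary game: two-player matching pennies, where player 1 tries to match player 0 and player 0 tries to mismatch. Under the replicator z-dynamics of eq. (11), trajectories cycle around the interior equilibrium, and since this particular pair happens to be zero-sum, the social welfare along the whole trajectory coincides with the Nash welfare (both are 0).

```python
import math

def avg_social_welfare(T=200.0, dt=0.01, z=(1.0, -0.5)):
    """Euler-integrate the replicator z-dynamics (eq. (11)) of a two-player
    matching-pennies cycle; return the time-average social welfare."""
    def x(zi):  # logistic choice map: probability of playing alpha_1
        return math.exp(zi) / (1.0 + math.exp(zi))
    z0, z1 = z
    sw = 0.0
    for _ in range(int(T / dt)):
        x0, x1 = x(z0), x(z1)
        u1 = (2.0 * x0 - 1.0) * (2.0 * x1 - 1.0)  # matcher's expected payoff
        u0 = -u1                                   # mismatcher: zero-sum pair
        sw += (u0 + u1) * dt
        dz0 = 2.0 - 4.0 * x1   # eq. (11) for the mismatcher (predecessor: 1)
        dz1 = 4.0 * x0 - 2.0   # eq. (11) for the matcher (predecessor: 0)
        z0, z1 = z0 + dz0 * dt, z1 + dz1 * dt
    return sw / T
```

In this degenerate case the average is identically zero, matching Σ_k u_k(x_NE) = 0; the theorem below asserts the inequality for a much broader class of payoff matrices.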
Theorem 5.
In any binary, cyclic game with the property that for any agent k + 1 the payoff entries are distinct and

[A^{k,k+1}(α_1, α_1) − A^{k,k+1}(α_1, α_2)] · [A^{k,k+1}(α_2, α_1) − A^{k,k+1}(α_2, α_2)] < 0,

the time-average of the social welfare of FoReL dynamics is at least the social welfare of the worst Nash equilibrium. Formally,

lim inf_{T→∞} (1/T) ∫_0^T Σ_k u_k(x(t)) dt ≥ Σ_k u_k(x_NE),

where x_NE is the worst-case Nash equilibrium, i.e., a Nash equilibrium that minimizes the sum of utilities of all agents.
Proof. Let us consider the payoff matrix of each agent k + 1. By assumption there is at most one agent i such that A^{i,k+1} is a non-zero matrix, i.e., the unique predecessor of k + 1, which for simplicity of notation we call k. By assumption the four entries of this matrix are distinct. Next, we break down the analysis into two cases.

As a first case, we consider the scenario where there exists at least one agent with a strictly dominant strategy. The FoReL dynamics of that agent will trivially converge to playing the strictly dominant strategy with probability one. Similarly, all agents reachable from that agent will in turn best respond to it. This is clearly the unique Nash equilibrium of the binary cyclic game, so in this case the limit behavior of FoReL dynamics exactly corresponds to the unique Nash behavior, and the theorem follows immediately.

Next, let us consider the case where no agent has a strictly dominant strategy. In this case, we will construct a specific Nash equilibrium of the cyclic game (although it may have more than one). In this Nash equilibrium every agent k plays the unique mixed strategy that makes its successor (agent k + 1) indifferent between its two strategies. Such a strategy exists for each agent, because otherwise there would exist an agent with a strictly dominant strategy. In fact, by the assumption [A^{k,k+1}(α_1, α_1) − A^{k,k+1}(α_1, α_2)] · [A^{k,k+1}(α_2, α_1) − A^{k,k+1}(α_2, α_2)] < 0, this mixed strategy is exactly agent k's min-max strategy if they participated in a zero-sum game with agent k + 1 given by the matrix A^{k,k+1}. Indeed, this assumption, along with the fact that agent k + 1 has no strictly dominant strategy, implies that such a zero-sum game (with payoffs A^{k,k+1}) has an interior Nash equilibrium. Given her predecessor's behavior, agent k + 1 is indifferent between her two strategies, and agent k just plays the strategy that makes agent k + 1 indifferent. □

To illustrate our theoretical findings, we present the dynamical structure of two multidimensional binary, cyclic games, which exhibit non-convergence, and therefore non-trivial limit behavior. To determine the limit sets, we perform numerical integration of initial value problems for various starting conditions.
Firstly, we analyze a four-dimensional system of matched-mismatched pennies. Each player has a choice of two strategies, $\alpha_1$ and $\alpha_2$. The payoffs for players 0, 2 are given by

$A_{3,0} = A_{1,2} = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}$,  (28)

and the payoffs of players 1, 3 by

$A_{0,1} = A_{2,3} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$,  (29)

where the rows correspond to the choice of the $i$-th player and the columns to the choice of the $(i-1)$-th player. Simply put, players 0 and 2 will try to mismatch the strategy with players 1 and 3, and players 1 and 3 will try to match them. The induced system of replicator equations is given by

$\dot z_i = A_{\hat i,i}(\alpha_1,\alpha_1)\,Q_{\hat i}(z) + A_{\hat i,i}(\alpha_1,\alpha_2)\,(1-Q_{\hat i}(z)) - A_{\hat i,i}(\alpha_2,\alpha_1)\,Q_{\hat i}(z) - A_{\hat i,i}(\alpha_2,\alpha_2)\,(1-Q_{\hat i}(z))$,  (30)

with $Q_i(z) = x_i = \exp(z_i)/(1+\exp(z_i)) \in (0,1)$, and we recall that $\hat i = (i-1) \bmod 4$. We extend the dynamics to the boundary in the $x$ coordinates, compactifying the state space in the following manner:

$Q_i(z) = x_i \in \{0, 1\} \Rightarrow \dot x_i = 0$.  (31)

This yields the original replicator dynamics (cf. [25]), where $x = (x_i)_i$ denotes the frequencies with which the players play strategy $\alpha_1$. The system possesses three Nash equilibria, corresponding to the following strategy profiles: $(0,0,1,1)$, $(1,1,0,0)$, $(0.5,0.5,0.5,0.5)$, out of which the pure Nash equilibria are attracting, and the mixed Nash equilibrium has two center directions, one repelling direction, and one attracting direction. We will denote the mixed Nash by $x_{MNE}$. Despite the nonlinear nature of the system, the dynamical situation in the vicinity of $x_{MNE}$ can be described exactly. Due to the symmetry of the system, the plane

$W^c(x_{MNE}) = \{(t, s, t, s) : t, s \in [0,1]\}$  (32)

is invariant, and consists purely of periodic orbits, which constitute a two-dimensional center manifold of the mixed Nash equilibrium. The parametrizations of the stable and unstable manifolds of the mixed Nash equilibrium are given by

$W^s(x_{MNE}) = \{(1-t, t, t, 1-t) : t \in [0,1]\}, \quad W^u(x_{MNE}) = \{(t, t, 1-t, 1-t) : t \in [0,1]\}$,  (33)

respectively.

Figure 3: Limit sets in the matched-mismatched pennies system: a projection onto the first three variables. Top left: convergence towards the mixed equilibrium along the one-dimensional stable manifold; top right: a periodic orbit on the center manifold of the mixed equilibrium. Bottom: an orbit converging to a limit cycle.

The numerical results are in line with Theorems 3 and 4. Indeed, the interior Nash equilibrium is an $\omega$-limit set for its own stable manifold. Each periodic orbit is in particular an $\omega$-limit set of any point lying on that orbit. From numerical simulations it appears that the periodic orbits are true limit cycles, in the sense that they themselves possess a stable manifold and an unstable manifold, and are therefore $\omega$-limit sets for a collection of points outside of themselves, see Figure 3. Most crucially, more complicated behavior such as chaos or invariant tori does not emerge, despite the system being nontrivially embedded in four dimensions.

The mixed Nash yields the minimax payoff vector $(0,0,0,0)$, and hence a social welfare of 0. The payoff matrices satisfy the assumptions of Theorem 5, and the average payoffs along solutions are therefore at least non-negative. In fact, almost all (as a set of full measure) initial conditions appear to converge to the pure equilibria at the boundary, and their time-average payoffs exceed those of the Nash equilibrium and converge to the maximal welfare of 4, see Figure 4.

Figure 4: Time-average payoffs and social welfare of a sample orbit in the matched-mismatched pennies game.

Our second system is a system of $N$-player asymmetric mismatched pennies, previously introduced in [20]. Again, each player can choose between two strategies, $\alpha_1$ and $\alpha_2$. The payoff of player $i$ with respect to player $i-1$ is given by the matrix

$A_{\hat i, i} = \begin{bmatrix} 0 & 1 \\ p & 0 \end{bmatrix}$  (34)

for $\hat i = (i-1) \bmod N$ and $p > 0$. The induced system of replicator equations is given by

$\dot z_i = (1 - Q_{\hat i}(z)) - p\,Q_{\hat i}(z)$,  (35)

with $x_i = Q_i(z)$ given by $\exp(z_i)/(1+\exp(z_i))$ as before. We extend the phase space by $x_i \in \{0, 1\}$ in the same manner as in our previous example.

For odd $N$ there is no Nash equilibrium in pure strategies; instead, best response dynamics in pure strategies eventually converge to cyclic behavior formed by mixtures of strategies $\alpha_1$ and $\alpha_2$. For the replicator system, the pure strategy profiles are saddle-type stationary points of the ODE, connected by heteroclinic orbits of mixed strategies. The system has a unique, mixed Nash equilibrium defined by $x_i = \frac{1}{p+1}$, $i \in \{1, \ldots, N\}$, where each player attains a payoff of $\frac{p}{p+1}$. Due to the nonlinear nature of the system, this time it is difficult to give exact formulas for its stable and unstable manifolds; however, linear stability analysis for various values of $p, N$ (e.g. $N = 5$) shows that the equilibrium is a saddle-focus, i.e., it has one attracting direction corresponding to a real negative eigenvalue of the Jacobian (corresponding to the diagonal direction), and multiple complex eigenvalues with non-zero real parts, some positive, yielding unstable directions.

The system was thoroughly analyzed in [20], and the main result provided therein is that for $N = 3$ and all but a degenerate set of values of $p > 0$, the only limit sets in the system are equilibria, periodic orbits, and cycles of orbits connecting equilibria. The payoff matrices satisfy the assumptions of Theorem 5, and in particular, for all $p > 0$, the mixed equilibrium yields the minimax payoff for each player, and time averages of payoffs along other orbits have to exceed the minimax payoffs. By numerical integration we observe that for almost all initial conditions the dynamics is attracted to the boundary cycle of average payoff $(p+1)\frac{N-1}{2}$, see Figure 5, and indeed no chaotic emergent behavior is apparent.

Figure 5: Left: projection of an orbit of the asymmetric mismatched pennies system onto the first three variables. Note that the integration time approaches infinity as orbits approach the corners of the heteroclinic cycle, hence the simulated orbit appears to converge to one of the corners. Right: time-averaged payoffs and social welfare corresponding to this orbit.

Numerous recent results in learning in games have established a clear separation between the idealized behavior of equilibration and the erratic, unpredictable and typically chaotic behavior of learning dynamics, even in simple games and domains [1-3, 8, 11, 13, 20, 27, 29, 35]. Although at first glance this realization might seem like a setback, when viewed from the right perspective it opens up a new possibility: a new way of understanding learning dynamics in games that does not focus primarily on the vocabulary of solution concepts of game theory, with its numerous notions of equilibration, but instead examines solution concepts from the topology of dynamical systems that are more native to the nature of game dynamics. Our results showcase the possibility of establishing links between the combinatorial structure of multi-agent games (e.g., game graph, number of actions) to understand and constrain the topological complexity of game dynamics (Poincaré-Bendixson behavior), and finally to establish links back to more traditional game-theoretic analysis, such as understanding the social welfare and efficiency of the system.
These connections showcase promising advantages of this approach, and we hope to encourage more work along these lines.
REFERENCES
[1] James P. Bailey and Georgios Piliouras. 2018. Multiplicative Weights Update in Zero-Sum Games. In Proceedings of the 2018 ACM Conference on Economics and Computation, Ithaca, NY, USA, June 18-22, 2018. 321-338. https://doi.org/10.1145/3219166.3219235
[2] James P. Bailey and Georgios Piliouras. 2019. Multi-Agent Learning in Network Zero-Sum Games is a Hamiltonian System. In AAMAS.
[3] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. 2012. Perturbations of set-valued dynamical systems, with applications to game theory. Dynamic Games and Applications 2, 2 (2012), 195-205.
[4] Ivar Bendixson. 1901. Sur les courbes définies par des équations différentielles. Acta Mathematica 24, 1 (1901), 1.
[5] Hendrik W. Broer, George B. Huitema, and Mikhail B. Sevryuk. 2009. Quasi-periodic motions in families of dynamical systems: order amidst chaos. Springer.
[6] Nicolò Cesa-Bianchi and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.
[7] Yun Kuen Cheung and Georgios Piliouras. 2020. Chaos, Extremism and Optimism: Volume Analysis of Learning in Games. arXiv preprint arXiv:2005.13996 (2020).
[8] Yun Kuen Cheung and Georgios Piliouras. 2019. Vortices Instead of Equilibria in MinMax Optimization: Chaos and Butterfly Effects of Online Learning in Zero-Sum Games. In COLT.
[9] Thiparat Chotibut, Fryderyk Falniowski, Michał Misiurewicz, and Georgios Piliouras. 2019. The route to chaos in routing games: When is Price of Anarchy too optimistic? arXiv preprint arXiv:1906.02486 (2019).
[10] Thiparat Chotibut, Fryderyk Falniowski, Michał Misiurewicz, and Georgios Piliouras. 2020. Family of chaotic maps from game theory. Dynamical Systems (2020), 1-16.
[11] Constantinos Daskalakis and Ioannis Panageas. 2018. The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada.
[12] Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, and Georgios Piliouras. 2019. Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games. arXiv preprint arXiv:1910.13010 (2019).
[13] Seth Frey and Robert L. Goldstone. 2013. Cyclic game dynamics driven by iterated reasoning. PLoS ONE 8, 2 (2013), e56416.
[14] Drew Fudenberg and David K. Levine. 1998. The Theory of Learning in Games. Vol. 2. MIT Press.
[15] Tobias Galla and J. Doyne Farmer. 2013. Complex dynamics in learning complicated games. Proceedings of the National Academy of Sciences 110, 4 (2013), 1232-1236.
[16] Ordinary Differential Equations. Dover Publications.
[17] Joseph T. Howson Jr. 1972. Equilibria of polymatrix games. Management Science 18, 5-part-1 (1972), 312-318.
[18] Yu. Ilyashenko. 2002. Centennial history of Hilbert's 16th problem. Bull. Amer. Math. Soc. 39, 3 (2002), 301-354.
[19] Michael Kearns. 2007. Graphical games. In Algorithmic Game Theory. Cambridge University Press.
[20] Robert Kleinberg, Katrina Ligett, Georgios Piliouras, and Éva Tardos. 2011. Beyond the Nash Equilibrium Barrier. In ICS. 125-140.
[21] Elias Koutsoupias and Christos Papadimitriou. 1999. Worst-case equilibria. In Annual Symposium on Theoretical Aspects of Computer Science. Springer, 404-413.
[22] Edward N. Lorenz. 1963. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences 20, 2 (1963), 130-141.
[23] John Mallet-Paret and Hal L. Smith. 1990. The Poincaré-Bendixson theorem for monotone cyclic feedback systems. Journal of Dynamics and Differential Equations 2, 4 (1990), 367-421.
[24] Brendan McMahan. 2011. Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 525-533.
[25] Panayotis Mertikopoulos, Christos Papadimitriou, and Georgios Piliouras. 2018. Cycles in adversarial regularized learning. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2703-2717.
[26] Sai Ganesh Nagarajan, David Balduzzi, and Georgios Piliouras. 2020. From Chaos to Order: Symmetry and Conservation Laws in Game Dynamics. In ICML.
[27] Gerasimos Palaiopanos, Ioannis Panageas, and Georgios Piliouras. 2017. Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos. In The Thirty-First Annual Conference on Neural Information Processing Systems (NIPS).
[28] Marco Pangallo, James Sanders, Tobias Galla, and Doyne Farmer. 2017. A taxonomy of learning dynamics in 2 x 2 games. arXiv preprint arXiv:1701.09043 (2017).
[29] G. Piliouras and J. S. Shamma. 2014. Optimization Despite Chaos: Convex Relaxations to Complex Limit Sets via Poincaré Recurrence. In SODA.
[30] Manfred Plank. 1997. Some qualitative differences between the replicator dynamics of two player and n player games. Nonlinear Analysis: Theory, Methods & Applications 30, 3 (1997), 1411-1417.
[31] Tim Roughgarden. 2009. Intrinsic robustness of the price of anarchy. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. 513-522.
[32] Tim Roughgarden. 2016. Twenty Lectures on Algorithmic Game Theory. Cambridge University Press.
[33] James B. T. Sanders, J. Doyne Farmer, and Tobias Galla. 2018. The prevalence of chaotic dynamics in games with many players. Scientific Reports 8, 1 (2018), 1-13.
[34] William H. Sandholm. 2010. Population Games and Evolutionary Dynamics. MIT Press.
[35] Yuzuru Sato, Eizo Akiyama, and J. Doyne Farmer. 2002. Chaos in learning a simple two-person game. Proceedings of the National Academy of Sciences 99, 7 (2002), 4748-4751.
[36] Shai Shalev-Shwartz. 2011. Online learning and online convex optimization. Foundations and Trends in Machine Learning 4, 2 (2011), 107-194.
[37] Yoav Shoham and Kevin Leyton-Brown. 2008. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press.
[38] Sebastian van Strien. 2011. Hamiltonian flows with random-walk behaviour originating from zero-sum games and fictitious play. Nonlinearity 24, 6 (2011), 1715. http://stacks.iop.org/0951-7715/24/i=6/a=002
[39] Elena B. Yanovskaya. 1968. Equilibrium points in polymatrix games. Litovskii Matematicheskii Sbornik (1968).