Subgame-perfect Equilibria in Mean-payoff Games
arXiv [cs.GT], Jan
Léonard Brice
ENS Paris-Saclay, France. Email: [email protected]
Jean-François Raskin
Université libre de Bruxelles (ULB), Belgium. Email: [email protected]
Marie Van Den Bogaard
Université Gustave Eiffel, LIGM, France. Email: [email protected]
Abstract. In this paper, we provide an effective characterization of all the subgame-perfect equilibria in infinite duration games played on finite graphs with mean-payoff objectives. To this end, we introduce the notion of requirement and the notion of negotiation function. We establish that the plays that are supported by SPEs are exactly those that are consistent with the least fixed point of the negotiation function. Finally, we show that the negotiation function is piecewise linear and can be analyzed using the linear algebraic toolbox.
I. INTRODUCTION
The notion of Nash equilibrium (NE) is one of the most important and most studied solution concepts in game theory. A profile of strategies is an NE when no rational player has an incentive to change their strategy unilaterally, i.e. while the other players keep their strategies. Thus an NE models a stable situation. Unfortunately, it is well known that, in sequential games, NEs suffer from the problem of non-credible threats, see e.g. [16]. In those games, some NEs only exist when some players do not play rationally in subgames and so use non-credible threats to force the NE. This is why, in sequential games, the stronger notion of subgame-perfect equilibrium is used instead: a profile of strategies is a subgame-perfect equilibrium (SPE) if it is an NE in all the subgames of the sequential game. Thus SPE imposes rationality even after a deviation has occurred.

In this paper, we study sequential games that are infinite duration games played on graphs with mean-payoff objectives and focus on SPEs. While NEs are guaranteed to exist in infinite duration games played on graphs with mean-payoff objectives, it is known that it is not the case for SPEs, see e.g. [17], [3]. We provide in this paper a constructive characterization of the entire set of SPEs, which allows us to decide, among others, the SPE existence problem. This problem was left open in previous contributions on the subject. More precisely, our contributions are described in the next paragraphs.
Contributions.
First, we introduce two important new notions that allow us to capture NEs, and more importantly SPEs, in infinite duration games played on graphs with mean-payoff objectives: the notion of requirement and the notion of negotiation function. (A large part of our results applies to the larger class of games with prefix-independent objectives. For the sake of readability of this introduction, we focus here on mean-payoff games, but the technical results in the paper usually cover broader classes of games.)
A requirement λ is a function that assigns to each vertex v ∈ V of a game graph a value in ℝ ∪ {−∞, +∞}. The value λ(v) represents a requirement on any play $\rho = \rho_0 \rho_1 \dots \rho_n \dots$ that traverses this vertex: if we want the player that controls the vertex v to follow ρ and to give up deviating from ρ, then the play must offer a payoff to this player that is at least λ(v). An infinite play ρ is λ-consistent if, for each player i, the payoff of ρ for player i is larger than or equal to the largest value of λ on vertices occurring along ρ and controlled by player i.

We first establish that if λ maps a vertex v to the largest value that the player that controls v can secure against a fully adversarial coalition of the other players, i.e. if λ(v) is the zero-sum worst-case value, then the plays that are λ-consistent are exactly the plays that are supported by an NE (Theorem 1).

As SPEs force players to play rationally in all subgames, we cannot rely on the zero-sum worst-case value to characterize them. Indeed, when considering the worst-case value, we allow adversaries to play fully adversarially after a deviation and so potentially in an irrational way w.r.t. their own objective. In fact, in an SPE, a player is deterred from deviating when opposed by a coalition of rational adversaries. To characterize this relaxation of the notion of worst-case value, we rely on our notion of negotiation function.

The negotiation function operates from the set of requirements into itself. To understand the purpose of the negotiation function, let us consider its application on the requirement λ that maps every vertex v to the worst-case value as above. Now, we can naturally formulate the following question. Given v and λ, can the player that controls v improve the value that they can ensure against all the other players if only plays that are consistent with λ are proposed by the other players? In other words, can this player enforce a better value when playing against the other players if those players are not willing to give away their own worst-case value? Clearly, securing this worst-case value can be seen as a minimal goal for any rational adversary. So nego(λ)(v) returns this value. But now this reasoning can be iterated. One of the main contributions of this paper is to show that the least fixed point λ* of the negotiation function exactly characterizes the set of plays supported by SPEs (Theorem 2).

To turn this fixed point characterization of SPEs into algorithms, we additionally draw links between the negotiation function and two classes of zero-sum games, called abstract and concrete negotiation games (see Theorem 3). We show that the latter can be solved effectively and allow, given λ, to compute nego(λ) (Lemma 5). While solving concrete negotiation games allows us to compute nego(λ) for any requirement λ, and even if the function nego(·) is monotone and Scott-continuous (Proposition 2), a direct application of the Kleene-Tarski fixed point theorem is not sufficient to obtain an effective algorithm to compute λ*. Indeed, we give examples that require a transfinite number of iterations to converge to the least fixed point. To provide an algorithm to compute λ*, we show that the function nego(·) is piecewise linear and we provide an effective representation of this function (Theorem 4). This effective representation can then be used to extract all its fixed points, and in particular its least fixed point, using linear algebraic techniques. Finally, let us note that all our results are also shown to extend to ε-SPEs, which are quantitative relaxations of SPEs.
Related works.
Non-zero sum infinite duration games have attracted a large attention in recent years, with applications targeting reactive synthesis problems. We refer the interested reader to the survey papers [1], [5] and their references for the relevant literature. We detail below contributions more closely related to the work presented here.

In [4], Brihaye et al. offer a characterization of NEs in quantitative games for cost-prefix-linear reward functions based on the worst-case value. The mean-payoff is cost-prefix-linear. In their paper, the authors do not consider the stronger notion of SPE, which is the central solution concept studied in our paper.

In [18], Ummels proves that there always exists an SPE in games with ω-regular objectives and defines algorithms based on tree automata to decide constrained SPE problems. Strategy logics, see e.g. [10], can be used to encode the concept of SPE in the case of ω-regular objectives, with application to the rational synthesis problem [13] for instance. The mean-payoff reward function is not ω-regular and so the techniques defined there cannot be used in our setting. Furthermore, as already recalled above, see e.g. [20], [3], contrary to the ω-regular case, SPEs in games with mean-payoff objectives may fail to exist.

In [3], Brihaye et al. introduce and study the notion of weak subgame-perfect equilibria, which is a weakening of the classical notion of SPE. This weakening is equivalent to the original SPE concept on reward functions that are continuous. This is the case for example for the quantitative reachability reward function. On the contrary, the mean-payoff cost function is not continuous, and the techniques used in [3] and generalized in [8] cannot be used to characterize SPEs for the mean-payoff reward function.

In [11], Flesch et al. show that the existence of ε-SPEs is guaranteed when the reward function is lower-semicontinuous, which is not the case of the mean-payoff reward function.

In [6], Bruyère et al. study secure equilibria, which are a refinement of NEs. Secure equilibria are not subgame-perfect and are, as classical NEs, subject to non-credible threats in sequential games.

In [2], Brihaye et al. solve the problem of the existence of SPEs on quantitative reachability games. Their techniques rely on the property that the quantitative reachability reward function is continuous, which implies that in that case weak SPEs and SPEs are equivalent. This is not the case for the mean-payoff reward function.

In [15], Noémie Meunier develops a method based on Prover-Challenger games to solve the problem of the existence of SPEs on games with a finite number of possible outcomes. This method is not applicable to the mean-payoff reward function, as the number of outcomes in this case is uncountably infinite.

Structure of the paper.
In Sect. 2, we introduce the necessary background. Sect. 3 defines the notion of requirement and the negotiation function. Sect. 4 contains the main technical contribution of the paper, which shows that the plays that are supported by an SPE are those that are λ*-consistent, where λ* is the least fixed point of the negotiation function. Sect. 5 draws a link between the negotiation function and negotiation games. Finally, Sect. 6 establishes that the negotiation function is effectively piecewise linear. All the detailed proofs of our results can be found in a well identified appendix, and a large number of examples are provided in the main part of the paper to illustrate the main ideas behind our new concepts and constructions.

II. BACKGROUND
In all that follows, we will use the word game for the infinite duration turn-based quantitative games on graphs with complete information.
Definition 1 (Game). A game is a tuple $G = (\Pi, V, (V_i)_{i \in \Pi}, E, \mu)$, where:
• $\Pi$ is a finite set of players;
• $(V, E)$ is a directed graph, whose vertices are sometimes called states and whose edges are sometimes called transitions, and in which for each $v \in V$, the set $\{w \in V \mid (v, w) \in E\}$ of the states directly accessible from $v$ is nonempty;
• $(V_i)_{i \in \Pi}$ is a partition of $V$, in which $V_i$ is the set of states controlled by player $i$;
• $\mu : V^\omega \to \mathbb{R}^\Pi$ is an outcome function.

For the simplicity of writing, a transition $(v, w) \in E$ will often be written $vw$.

When $\mu$ is an outcome function and $i$ is a player, $\mu_i$ denotes the $i$-th component of $\mu$: if $\mu(\rho) = \bar{x} = (x_i)_{i \in \Pi}$, then $\mu_i(\rho) = x_i$. That quantity is the payoff of player $i$ in $\rho$.

Definition 2 (Initialized game). An initialized game is a tuple $(G, v_0)$, often written $G_{\upharpoonright v_0}$, where $G$ is a game and $v_0 \in V$ is a state called initial state. Moreover, the game $G_{\upharpoonright v_0}$ is well-initialized if any state of $G$ is accessible from $v_0$ in the graph $(V, E)$.

Definition 3 (Play). A play in a game $G$ is an infinite word $\rho = \rho_0 \rho_1 \dots \in V^\omega$ such that for all $n$, we have $\rho_n \rho_{n+1} \in E$. It is also a play in the initialized game $G_{\upharpoonright \rho_0}$. The set of plays in the game $G$ (resp. the initialized game $G_{\upharpoonright v_0}$) is denoted by $\mathrm{Plays}\,G$ (resp. $\mathrm{Plays}\,G_{\upharpoonright v_0}$).

Remark.
In the literature, the word outcome can be used to name plays, and the word payoff to name what we call here outcome. Here, the word payoff will be used to refer to outcomes, seen from the point of view of a given player; in other words, an outcome will be seen as the collection of all players' payoffs.
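As an illustration of Definitions 1 to 3, the following minimal Python sketch encodes the finite part of a game (players, states, ownership, transitions) and checks the structural conditions stated above. The class name `Arena`, its method names and the two-state example are assumptions made for this illustration; the outcome function µ is deliberately left out, since it ranges over infinite plays.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Arena:
    """Finite part of a game: players, state ownership and transitions.

    Hypothetical helper for illustration; the outcome function is omitted.
    """
    players: set
    owner: dict   # maps each state v to the player i with v in V_i
    edges: set    # set of transitions (v, w)

    @property
    def states(self):
        return set(self.owner)

    def successors(self, v):
        return {w for (u, w) in self.edges if u == v}

    def validate(self):
        # (V_i) is a partition of V: each state has exactly one known owner.
        for v, i in self.owner.items():
            if i not in self.players:
                raise ValueError(f"state {v!r} is controlled by unknown player {i!r}")
        for (u, w) in self.edges:
            if u not in self.owner or w not in self.owner:
                raise ValueError(f"transition {(u, w)!r} uses an unknown state")
        # Every state must have at least one directly accessible state.
        for v in self.owner:
            if not self.successors(v):
                raise ValueError(f"state {v!r} has no outgoing transition")

    def is_nonempty_history(self, h):
        # A nonempty finite path; it extends to a play because no state is a dead end.
        return (len(h) > 0
                and all(v in self.owner for v in h)
                and all((h[k], h[k + 1]) in self.edges for k in range(len(h) - 1)))

    def is_well_initialized(self, v0):
        # Breadth-first search: every state must be accessible from v0.
        seen, queue = {v0}, deque([v0])
        while queue:
            v = queue.popleft()
            for w in self.successors(v):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return seen == self.states


# Hypothetical two-state arena, each state carrying a self-loop.
arena = Arena(players={1, 2},
              owner={"a": 1, "b": 2},
              edges={("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")})
arena.validate()
```

Representing ownership as a map from states to players makes the partition condition hold by construction, so `validate` only needs to check that owners are known players and that no state is a dead end.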
Definition 4 (History). A history in a game $G$ is a finite prefix $h_0 \dots h_n$ of a play in $G$. If it is nonempty, it is also a history in the initialized game $G_{\upharpoonright h_0}$. The set of histories in the game $G$ (resp. the initialized game $G_{\upharpoonright v_0}$) is denoted by $\mathrm{Hist}\,G$ (resp. $\mathrm{Hist}\,G_{\upharpoonright v_0}$).

Definition 5 (Strategy). Let $i$ be a player in an initialized game $G_{\upharpoonright v_0}$. A strategy for player $i$ is a function $\sigma_i : \{hv \in \mathrm{Hist}\,G_{\upharpoonright v_0} \mid v \in V_i\} \to V$ such that $v\sigma_i(hv)$ is an edge of $(V, E)$ for all $hv$.

A play $\rho$ is compatible with a strategy $\sigma_i$ if and only if $\rho_{n+1} = \sigma_i(\rho_0 \dots \rho_n)$ for all $n$ such that $\rho_n \in V_i$. A history $h$ is compatible with a strategy $\sigma_i$ if it is the prefix of a play that is compatible with $\sigma_i$.

Definition 6 (Strategy profile). Let $P \subseteq \Pi$ be a set of players in an initialized game $G_{\upharpoonright v_0}$. A strategy profile for $P$ is a tuple $\bar\sigma_P = (\sigma_i)_{i \in P}$ where for each $i$, $\sigma_i$ is a strategy for player $i$ in $G_{\upharpoonright v_0}$.

A complete strategy profile is a strategy profile for $\Pi$, the set of all the players in the considered game: then, it is simply written $\bar\sigma$. A play or a history is compatible with a strategy profile $\bar\sigma_P$ if it is compatible with every strategy $\sigma_i$ for $i \in P$. In a strategy profile $\bar\sigma_P$, the $\sigma_i$'s domains are pairwise disjoint. Therefore, we can consider $\bar\sigma_P$ as one function: for $hv \in \mathrm{Hist}\,G_{\upharpoonright v_0}$ such that $v \in \bigcup_{i \in P} V_i$, we liberally write $\bar\sigma_P(hv)$ for $\sigma_i(hv)$ with $i$ such that $v \in V_i$. For any complete strategy profile $\bar\sigma$, there exists only one play in $G_{\upharpoonright v_0}$ compatible with every $\sigma_i$, denoted by $\langle \bar\sigma \rangle_{v_0}$.

When $i$ is a player and when the context is clear, we will often write $-i$ for the set $\Pi \setminus \{i\}$. Then, a strategy profile for all the players, except player $i$, will typically be written $\bar\sigma_{-i}$. We will often refer to $\Pi \setminus \{i\}$ as the environment against player $i$.
When $\bar\tau_P$ and $\bar\tau'_Q$ are two strategy profiles with $P \cap Q = \emptyset$, $(\bar\tau_P, \bar\tau'_Q)$ denotes the strategy profile $\bar\sigma_{P \cup Q}$ such that $\sigma_i = \tau_i$ for $i \in P$, and $\sigma_i = \tau'_i$ for $i \in Q$.

Before moving on to SPEs, let us recall the notion of Nash equilibrium.

Definition 7 (Nash equilibrium). Let $G_{\upharpoonright v_0}$ be an initialized game. The strategy profile $\bar\sigma$ is a Nash equilibrium, or NE for short, in $G_{\upharpoonright v_0}$, if and only if for each player $i$ and for every strategy $\sigma'_i$, called deviation of $\sigma_i$, we have the inequality:
$$\mu_i(\langle \sigma'_i, \bar\sigma_{-i} \rangle_{v_0}) \le \mu_i(\langle \bar\sigma \rangle_{v_0}).$$

To define SPEs, we need the notion of subgame.
Definition 8 (Subgame). Let $G = (\Pi, V, (V_i)_i, E, \mu)$ be a game, and let $hv$ be a history in $G$. The subgame of $G$ after $hv$, denoted by $G_{\upharpoonright hv}$, is the initialized game $(\Pi, V, (V_i)_i, E, \mu_{\upharpoonright hv})_{\upharpoonright v}$, where:
$$\mu_{\upharpoonright hv} : vV^\omega \to \mathbb{R}^\Pi, \quad v\rho \mapsto \mu(hv\rho).$$

Remark.
A subgame is initialized, but defined from a game which is not: that is why $G_{\upharpoonright v}$ denotes the game $G$ initialized in the state $v$, which is also the subgame of $G$ after the one-state history $v$.

Definition 9 (Substrategy). Let $G_{\upharpoonright v_0}$ be an initialized game, $\sigma_i$ be a strategy for some player $i$, and $hv$ be a history in $G_{\upharpoonright v_0}$. Then, the substrategy of $\sigma_i$ after $hv$, denoted by $\sigma_{i \upharpoonright hv}$, is the strategy in the subgame $G_{\upharpoonright hv}$:
$$\sigma_{i \upharpoonright hv} : vh' \mapsto \sigma_i(hvh').$$

Definition 10 (Subgame-perfect equilibrium). Let $G_{\upharpoonright v_0}$ be an initialized game. The strategy profile $\bar\sigma$ is a subgame-perfect equilibrium, or SPE for short, in $G_{\upharpoonright v_0}$, if and only if for every history $h$ in $G_{\upharpoonright v_0}$, the strategy profile $\bar\sigma_{\upharpoonright h}$ is a Nash equilibrium in the subgame $G_{\upharpoonright h}$.

The notion of subgame-perfect equilibrium can be seen as a refinement of Nash equilibrium: it is a stronger equilibrium, which excludes players resorting to non-credible threats.

Example 1. In the game represented in Figure 1, where the square state is controlled by player and the round states by player , if both players get the payoff by reaching the state d and the payoff in the other cases, there are actually two NEs: one, in blue, where goes to the state b and then player goes to the state d, and both win, and one, in red, where player goes to the state c because player was planning to go to the state e. However, only the blue one is an SPE, as moving from b to e is irrational for player in the subgame $G_{\upharpoonright ab}$.

Fig. 1. A game with two NEs and one SPE

An ε-SPE is a strategy profile which is almost an SPE, meaning that if a player deviates after some history, they will not be able to improve their payoff by more than a quantity ε ≥ 0.

Definition 11 (ε-SPE). Let $G_{\upharpoonright v_0}$ be an initialized game, and ε ≥ 0. A strategy profile $\bar\sigma$ from $v_0$ is an ε-SPE if and only if for every history $hv$, for every player $i$, for every strategy $\sigma'_i$, we have:
$$\mu_i(\langle \bar\sigma_{-i \upharpoonright hv}, \sigma'_{i \upharpoonright hv} \rangle_v) \le \mu_i(\langle \bar\sigma_{\upharpoonright hv} \rangle_v) + \varepsilon.$$

Remark. A 0-SPE is an SPE.
In this article, we will focus on prefix-independent games, and in particular mean-payoff-inf games.
Definition 12 (Mean-payoff-inf game). A mean-payoff-inf game is a game $G = (\Pi, V, (V_i)_i, E, \mu)$, where $\mu$ is defined from a function $\pi : E \to \mathbb{Q}^\Pi$, called weight function, by, for each player $i$:
$$\mu_i : \rho \mapsto \liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \pi_i(\rho_k \rho_{k+1}).$$

In a mean-payoff-inf game, the weight given by the function $\pi$ represents the immediate reward that each action gives to each player. The final payoff of each player is their average payoff along the play, defined as the limit inferior over $n$ (since the limit may not be defined) of the average payoff after $n$ steps.

Definition 13 (Prefix-independent game). A game $G = (\Pi, V, (V_i)_i, E, \mu)$ is prefix-independent if for every history $h$ and for every play $\rho$, $\mu(h\rho) = \mu(\rho)$. We also say, in that case, that the outcome function $\mu$ is prefix-independent.

Remark.
Mean-payoff-inf games are prefix-independent.

Before moving on to some examples, we recall a few classical results about two-player zero-sum games.
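To make the mean-payoff definition and its prefix-independence concrete, here is a small Python sketch restricted to ultimately periodic plays, i.e. plays of the form prefix · cycle^ω. For such plays the running averages converge, so the limit inferior is simply the average weight along the cycle. The function names and the weight table `W` are assumptions made for this illustration and are not taken from the figures of the paper.

```python
from fractions import Fraction


def cycle_mean(weights, cycle, player):
    """Mean-payoff of `player` on the play cycle^omega.

    `cycle` is a list of states; the transition from its last state back
    to its first state closes the cycle.
    """
    n = len(cycle)
    total = sum(Fraction(weights[(cycle[k], cycle[(k + 1) % n])][player])
                for k in range(n))
    return total / n


def prefix_average(weights, prefix, cycle, player, steps):
    """Average weight of `player` over the first `steps` transitions
    of the play prefix . cycle^omega."""
    def state(k):
        if k < len(prefix):
            return prefix[k]
        return cycle[(k - len(prefix)) % len(cycle)]

    total = sum(Fraction(weights[(state(k), state(k + 1))][player])
                for k in range(steps))
    return total / steps


# Hypothetical weights: each transition maps to (weight for player 0, weight for player 1).
W = {("a", "a"): (1, 0), ("a", "b"): (0, 0),
     ("b", "a"): (0, 0), ("b", "b"): (0, 1)}

# On (aab)^omega, player 0 earns 1 on one transition out of three.
third = cycle_mean(W, ["a", "a", "b"], 0)

# The finite prefix bbb only perturbs finitely many terms of the average.
after_1000 = prefix_average(W, ["b", "b", "b"], ["a"], 0, 1000)
```

In this example `after_1000` differs from the payoff of a^ω only by the three transitions spent in the prefix, and that gap vanishes as the number of steps grows, which is the prefix-independence stated in the remark above.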
Definition 14 (Zero-sum game). A game $G = (\Pi, V, (V_i)_i, E, \mu)$ is zero-sum if for every play $\rho$, we have:
$$\sum_{i \in \Pi} \mu_i(\rho) = 0.$$

Definition 15 (Borelian game). A game $G = (\Pi, V, (V_i)_i, E, \mu)$ is Borelian if the function $\mu$, from the set $V^\omega$ equipped with the product topology to the Euclidean space $\mathbb{R}^\Pi$, is Borelian, i.e. if for any Borelian set $B \subseteq \mathbb{R}^\Pi$, the set $\mu^{-1}(B)$ is Borelian.

Proposition 1 (Determinacy of Borelian two-player games [14]). Let $G_{\upharpoonright v_0} = (\{\ ,\ \}, V, (V_i)_i, E, \mu)_{\upharpoonright v_0}$ be an initialized two-player zero-sum Borelian game, with a distinguished player . Then, we have the following equality:
$$\sup_{\sigma} \inf_{\sigma} \mu(\langle \bar\sigma \rangle_{v_0}) = \inf_{\sigma} \sup_{\sigma} \mu(\langle \bar\sigma \rangle_{v_0}).$$

Definition 16 (Value of a zero-sum game). Let $G_{\upharpoonright v_0} = (\{\ ,\ \}, V, (V_i)_i, E, \mu)_{\upharpoonright v_0}$ be an initialized two-player zero-sum Borelian game, with a distinguished player . Then, the quantity:
$$\sup_{\sigma} \inf_{\sigma} \mu(\langle \bar\sigma \rangle_{v_0}) = \inf_{\sigma} \sup_{\sigma} \mu(\langle \bar\sigma \rangle_{v_0})$$
is called value of $G_{\upharpoonright v_0}$, and denoted by $\mathrm{val}(G_{\upharpoonright v_0})$.

In the two following examples, we illustrate the problem of the existence of SPEs in mean-payoff games.

Fig. 2. A game without SPE

Fig. 3. A game with an infinity of SPEs

Example 2. Let G be the mean-payoff-inf game of Figure 2, where for every edge, the left number is the weight for player , and the right number is the weight for player . No weight is given for the edges ac and bd since they can be used only once, and therefore do not influence the final payoff.

As shown in [7], this game does not have any SPE, neither from the state a nor from the state b. The idea is the following:
• The only NE plays from the state b are the plays where player eventually leaves the cycle ab and goes to d: if they stay in the cycle ab, then player would be better off leaving it, and if she does, player would be better off leaving it before.
• From the state a, if player knows that player will leave, she has no incentive to do it before: there is no NE where leaves the cycle and plans to do it if ever she does not. Therefore, there is no SPE where leaves the cycle.
• But then, after a history that terminates in b, player has actually no incentive to leave if player never plans to do it afterwards: contradiction.

Throughout the remainder of the paper, we will show how to apply our method on that example.

Example 3.
Let us now study the game of Figure 3. Using techniques from [9], we can represent the outcomes of possible plays in that game as in Figure 4 (gray and blue areas).

Fig. 4. The outcomes of plays and SPE plays in the game of Figure 3

Following exclusively one of the three simple cycles a, ab, b of the game graph during a play yields the outcomes , and , respectively. By combining those cycles with well chosen frequencies, one can obtain any outcome in the convex hull of those three points.

Now, it is also possible to obtain the point by using the properties of the limit inferior: it is for instance the outcome of the play: a b a b . . . a n b n +1 . . . In fact, one can construct a play that yields any outcome in the convex hull of the four points , , , and .

We claim that the outcomes of SPE plays correspond to the entire blue area in Figure 4: there exists an SPE $\bar\sigma$ in $G_{\upharpoonright a}$ with $\langle \bar\sigma \rangle_a = \rho$ if and only if µ (ρ), µ (ρ) ≥ .

That statement is a direct consequence of the results we show in the remaining sections, but let us give a first intuition: a play with such an outcome necessarily uses infinitely often both states. It is an NE play because none of the players can get a better payoff by looping forever on their state, and they can both force each other to follow that play, by threatening them to loop forever on their state whenever they can. But such a strategy profile is clearly not an SPE.

It can be transformed into an SPE as follows: when a player deviates, say player , then player can punish him by looping on a, not forever, but a great number of times, until player 's mean-payoff gets very close to . Afterwards, both players follow again the play that was initially planned. Since that threat is temporary, it does not affect player 's payoff on the long term, but it really punishes player if that one tries to deviate infinitely often.

For example, let us consider the play $(aaabbb)^\omega$, with outcome . An SPE which generates that play is the following: when player has to play, by default, she loops twice on a, and when player has to play, he loops twice on b. But if player goes to b without having looped twice on a, then player stays in b until 's payoff passes below n , where n is the number of times player has already deviated; then and only then, he plays as it was planned. And if player deviates, player punishes him the same way.

III. REQUIREMENTS AND NEGOTIATION
We will now see that SPEs are strategy profiles that respect some requirements about the payoffs, depending on the states they traverse. In this part, we develop the notions of requirement and negotiation.

A. Requirement

In the method we will develop further, we will need to analyze the players' behaviour when they have some requirement to satisfy. Intuitively, one can see requirements as rationality constraints for the players, that is, a threshold payoff value under which a player will not accept to follow a play.

In all that follows, $\overline{\mathbb{R}}$ denotes the set $\mathbb{R} \cup \{\pm\infty\}$.

Definition 17 (Requirement). A requirement on the game $G$ is a function $\lambda : V \to \overline{\mathbb{R}}$.

For a given state $v$, the quantity $\lambda(v)$ represents the minimal payoff that the player controlling $v$ will require in a play beginning in $v$.

Definition 18 (λ-consistency). Let λ be a requirement on a game $G$. A play $\rho$ in the game $G$ is λ-consistent if and only if for all $n \in \mathbb{N}$, for $i \in \Pi$ such that $\rho_n \in V_i$, we have $\mu_i(\rho_n \rho_{n+1} \dots) \ge \lambda(\rho_n)$.

The set of the λ-consistent plays from a state $v$ is denoted by $\lambda\mathrm{Cons}(v)$.

Definition 19 (Satisfiability). A requirement λ on the initialized game $G_{\upharpoonright v_0}$ is satisfiable if and only if for each $v$ accessible from $v_0$, there exists a λ-consistent play in the game $G_{\upharpoonright v}$.

Definition 20 (λ-rationality). Let λ be a requirement on a mean-payoff-inf game $G$. Let $i \in \Pi$. A strategy profile $\bar\sigma_{-i}$ is λ-rational if and only if there exists a strategy $\sigma_i$ such that for every history $hv$ compatible with $\bar\sigma_{-i}$, the play $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$ is λ-consistent. We then say that the strategy $\sigma_i$ λ-rationalizes the strategy profile $\bar\sigma_{-i}$.

The set of λ-rational strategy profiles in $G_{\upharpoonright v}$ is denoted by $\lambda\mathrm{Rat}(v)$.

Note that λ-rationality is a property of a strategy profile for all the players but one, player $i$. Intuitively: all players (including player $i$) have some requirement to satisfy. The players other than $i$ form a coalition: they choose collectively their strategy profile, and they propose a strategy to player $i$, so that if player $i$ eventually follows that strategy (i.e. possibly after a finite number of deviations), then every player has their requirements satisfied.

Remark. A requirement λ is satisfiable in $G_{\upharpoonright v_0}$ if and only if for some player $i$, there exists a λ-rational strategy profile $\bar\sigma_{-i}$ from $v_0$.

Finally, let us define a particular requirement: the vacuous requirement, which requires nothing.

Definition 21 (Vacuous requirement). In any game, the vacuous requirement, denoted by $\lambda_0$, is the requirement constantly equal to $-\infty$.
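The notion of λ-consistency (Definition 18) can be checked mechanically on ultimately periodic plays of a mean-payoff-inf game. The sketch below is a minimal illustration under two assumptions: the play has the form prefix · cycle^ω, and the outcome function is prefix-independent, so that every suffix of the play gives each player the cycle average. The names `is_lambda_consistent`, `W` and `OWNER`, as well as the weights, are hypothetical and not taken from the paper.

```python
from fractions import Fraction

NEG_INF = float("-inf")


def cycle_mean(weights, cycle, player):
    """Mean-payoff of `player` on cycle^omega (the cycle closes on its first state)."""
    n = len(cycle)
    total = sum(Fraction(weights[(cycle[k], cycle[(k + 1) % n])][player])
                for k in range(n))
    return total / n


def is_lambda_consistent(weights, owner, lam, prefix, cycle):
    """Check Definition 18 on the play prefix . cycle^omega.

    By prefix-independence, every suffix of the play has the same payoff,
    so it suffices to compare, for each visited state v, the payoff of the
    player controlling v with lam[v].
    """
    for v in set(prefix) | set(cycle):
        if cycle_mean(weights, cycle, owner[v]) < lam[v]:
            return False
    return True


# Hypothetical game: player 0 controls a, player 1 controls b.
W = {("a", "a"): (1, 0), ("a", "b"): (0, 0),
     ("b", "a"): (0, 0), ("b", "b"): (0, 1)}
OWNER = {"a": 0, "b": 1}

vacuous = {"a": NEG_INF, "b": NEG_INF}                # requires nothing
demanding = {"a": Fraction(1, 2), "b": Fraction(1, 4)}
```

With these hypothetical weights, the play (aab)^ω gives player 0 the payoff 1/3, so it is consistent with `vacuous` but not with `demanding`. The play a^ω is consistent with `demanding`, because the state b is never visited and its requirement is therefore never tested.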
Fig. 5. The requirement $\lambda_1$ on the game of Example 2

Remark. Any play is $\lambda_0$-consistent.

B. Negotiation
We will show that SPEs in prefix-independent games are characterized by the fixed points of a function on requirements. That function can be seen as a negotiation: when a player has a requirement to satisfy, another player can hope for a better payoff than what they can secure in general, and therefore update their own requirement.
Definition 22 (Negotiation function). Let $G$ be a game. The negotiation function is the function that transforms any requirement λ on $G$ into a requirement nego(λ) on $G$, such that for each $i \in \Pi$ and $v \in V_i$,
$$\mathrm{nego}(\lambda)(v) = \inf_{\bar\sigma_{-i} \in \lambda\mathrm{Rat}(v)} \ \sup_{\sigma_i} \ \mu_i(\langle \bar\sigma \rangle_v),$$
with the convention $\inf \emptyset = +\infty$.

Remark. The negotiation function has the following properties:
• The requirement λ is not satisfiable from $v$ if and only if $\mathrm{nego}(\lambda)(v) = +\infty$.
• The negotiation function is monotone: if $\lambda \le \lambda'$ (for the pointwise order, i.e. if for each $v$, $\lambda(v) \le \lambda'(v)$), then $\mathrm{nego}(\lambda) \le \mathrm{nego}(\lambda')$.
• The negotiation function is non-decreasing: for every λ, we have $\lambda \le \mathrm{nego}(\lambda)$.

In the general case, the quantity nego(λ)(v) represents the worst case value that the player controlling $v$ can ensure, assuming that the other players play λ-rationally.

Example 4. Let us consider the game of Example 2: we represented it again in Figure 5, with the requirement $\lambda_1 = \mathrm{nego}(\lambda_0)$, which is easy to compute since any strategy profile is $\lambda_0$-rational: for each $v$, $\lambda_1(v)$ is the classical worst-case value or antagonistic value of $v$, i.e. the best value the player controlling $v$ can enforce against a fully hostile environment.

Let us now compute the requirement $\lambda_2 = \mathrm{nego}(\lambda_1)$.
• From the state c, there exists exactly one $\lambda_1$-rational strategy profile ¯σ − = σ , which is the empty strategy since player has never to choose anything. Against that strategy, the best and the only payoff player can get is 1, hence $\lambda_2(c) = 1$.
• For the same reasons, $\lambda_2(d) = 2$.
• From the state b, player can force to get the payoff 2 or less, with the strategy profile σ : h ↦ c. Such a strategy is $\lambda_1$-rational, rationalized by the strategy σ : h ↦ d. Therefore, $\lambda_2(b) = 2$.
• Finally, from the state a, player can force to get the payoff 2 or less, with the strategy profile σ : h ↦ d. Such a strategy is $\lambda_1$-rational, rationalized by the strategy σ : h ↦ c. But he cannot force her to get less than the payoff 2, because she can force the access to the state b, and the only $\lambda_1$-consistent plays starting from b are the plays of the form $(ba)^k b d^\omega$. Therefore, $\lambda_2(a) = 2$.

C. Steady negotiation
Often in what follows, we will need a game to be with steady negotiation, i.e. such that there always exists a worst λ-rational behaviour for the environment against a given player.

Definition 23 (Game with steady negotiation). A game $G$ is with steady negotiation if and only if for every player $i$, for every vertex $v$, and for every requirement λ satisfiable from $v$, there exists a λ-rational strategy profile $\bar\sigma_{-i}$ from $v$ such that:
$$\inf_{\bar\sigma'_{-i} \in \lambda\mathrm{Rat}(v)} \ \sup_{\sigma'_i} \ \mu_i(\langle \bar\sigma' \rangle_v) = \sup_{\sigma_i} \ \mu_i(\langle \bar\sigma_{-i}, \sigma_i \rangle_v).$$

Remark. In particular, when a game is with steady negotiation, the infimum in the definition of negotiation is always reached.

It will be proved in Section V that mean-payoff-inf games are with steady negotiation.
D. Link with Nash equilibria
Requirements and the negotiation function are able to capture Nash equilibria. Indeed, if $\lambda_0$ is the vacuous requirement, then $\mathrm{nego}(\lambda_0)$ characterizes the plays that are supported by a Nash equilibrium (abbreviated by NE plays), in the following formal sense:

Theorem 1. Let $G$ be a game with steady negotiation. Then, a play $\rho$ in $G$ is an NE play if and only if $\rho$ is $\mathrm{nego}(\lambda_0)$-consistent.

Proof sketch: For a given state $v$, the value $\mathrm{nego}(\lambda_0)(v)$ is defined as the best payoff that the player controlling $v$ can ensure against any $\lambda_0$-rational strategy profile, that is, against any strategy profile: it is what is often called in the literature the antagonistic value or the worst-case value of $v$.

In a play that is not $\mathrm{nego}(\lambda_0)$-consistent, any player that does not have their requirement satisfied can deviate and ensure that requirement, and therefore has a profitable deviation.

Conversely, given a play that is $\mathrm{nego}(\lambda_0)$-consistent, we can construct an NE realizing that play, where all the players force each other to follow it, by threatening to play fully adversarially against the one who would deviate. The steady negotiation property guarantees the existence of a fully adversarial strategy profile.

See Appendix A for a detailed proof.

Example 5. Let us consider again the game of Example 2, with the requirement $\lambda_1$ given in Figure 5. The only $\lambda_1$-consistent plays in this game, starting from the state a, are $ac^\omega$, and $(ab)^k d^\omega$ with $k \ge 1$. One can check that those plays are exactly the NE plays in that game.

Fig. 6. The construction of the SPE $\bar\sigma$

In the following section, we will prove that, just as the requirement $\mathrm{nego}(\lambda_0)$ characterizes the NEs, the requirement that is the least fixed point of the negotiation function characterizes the SPEs.

IV. LINK BETWEEN NEGOTIATION AND SPES

A. From negotiation fixed points to SPEs
The notion of negotiation will enable us to find the SPEs, but also more generally the ε-SPEs, in a game. For that purpose, we need the notion of ε-fixed points of a function.

Definition 24 (ε-fixed point). Let ε ≥ 0, let $D$ be a finite set and let $f : \mathbb{R}^D \to \mathbb{R}^D$ be a mapping. A tuple $\bar{x} \in \mathbb{R}^D$ is an ε-fixed point of $f$ if, writing $\bar{y} = f(\bar{x})$, we have $y_d \in [x_d - \varepsilon, x_d + \varepsilon]$ for each $d \in D$.

Remark. A 0-fixed point is a fixed point.

We can now prove that in games with steady negotiation, the ε-fixed points λ of the negotiation function are such that all λ-consistent plays are ε-SPE plays.

Lemma 1. Let $G_{\upharpoonright v_0}$ be a well-initialized prefix-independent game with steady negotiation, and ε ≥ 0. Let λ be an ε-fixed point of the function nego. Then, for every λ-consistent play ξ starting in $v_0$, there exists an ε-SPE $\bar\sigma$ such that $\langle \bar\sigma \rangle_{v_0} = \xi$.

Proof sketch: Given a play ξ, the construction of $\bar\sigma$ can be represented as in Figure 6.

First, the play generated by $\bar\sigma$ is ξ. Then, whenever a player $i$ deviates from that play, the other players must punish them, and stay λ-rational: they follow the λ-rational strategy profile that minimizes player $i$'s payoff, whose existence is guaranteed by the steady negotiation property. Player $i$ plays a strategy that λ-rationalizes that strategy profile.

After a history where player $i$ would have made serious mistakes, that is, choices that lower the best payoff they can ensure against a λ-rational environment, that environment must "reset" its strategy profile in order to be as hostile as it can be. Prefix-independence guarantees that those resets happen finitely many times.

We use the same construction if a second player $j$ deviates after finitely many deviations of player $i$, and so on.

See Appendix B for a detailed proof.

B. From SPEs to negotiation fixed points
Conversely, let us prove that every ε -SPE play is λ -consistent for some ε -fixed point λ of the negotiation function. Lemma 2.
Let $G_{\upharpoonright v_0}$ be a well-initialized prefix-independent game, and let $\varepsilon \geq 0$. Let $\bar\sigma$ be an $\varepsilon$-SPE in $G_{\upharpoonright v_0}$. Then, there exists an $\varepsilon$-fixed point $\lambda$ of the negotiation function such that for every history $hv$ starting in $v_0$, the play $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$ is $\lambda$-consistent.
Proof sketch: Given an $\varepsilon$-SPE $\bar\sigma$, we can define, for every player $i$ and every state $v \in V_i$:
$$\lambda(v) = \inf_{hv \in \mathrm{Hist}\, G_{\upharpoonright v_0}} \mu_i\left(\langle \bar\sigma_{\upharpoonright hv} \rangle_v\right)$$
as the lowest payoff that can be given to player $i$ in a play starting from the state $v$ in the strategy profile $\bar\sigma$. By construction, any play generated by $\bar\sigma$ after a finite history is $\lambda$-consistent, and the fact that $\bar\sigma$ is an $\varepsilon$-SPE implies that $\lambda$ is an $\varepsilon$-fixed point of the negotiation function.
See Appendix C for a detailed proof.
C. Least $\varepsilon$-fixed point
Since the set of requirements, equipped with the componentwise order $\leq$, is a complete lattice, and since the negotiation function is monotone, Tarski's fixed point theorem states that the negotiation function has a least fixed point.
That result can be generalized to $\varepsilon$-fixed points:
Lemma 3.
Let $G$ be a game, and let $\varepsilon \geq 0$. The negotiation function has a least $\varepsilon$-fixed point.
A proof is given in Appendix D.
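As a concrete reading of Definition 24, the following minimal sketch checks whether a tuple is an $\varepsilon$-fixed point of a map on $\mathbb{R}^D$. Tuples are represented as Python dictionaries indexed by $D$; the names `is_eps_fixed_point` and `toy_map`, and the toy map itself, are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Dict, Hashable

Vector = Dict[Hashable, float]


def is_eps_fixed_point(f: Callable[[Vector], Vector], x: Vector, eps: float) -> bool:
    """Return True if every coordinate of f(x) lies in [x_d - eps, x_d + eps]."""
    y = f(x)
    for d in x:
        if y[d] == x[d]:
            # Equal coordinates always qualify (this also covers equal infinite values).
            continue
        if not abs(y[d] - x[d]) <= eps:
            return False
    return True


def toy_map(x: Vector) -> Vector:
    """Hypothetical monotone map, used only to exercise the check."""
    return {d: 1 + value / 2 for d, value in x.items()}


print(is_eps_fixed_point(toy_map, {"a": 2.0, "b": 2.0}, 0.0))
print(is_eps_fixed_point(toy_map, {"a": 1.5, "b": 1.5}, 0.25))
print(is_eps_fixed_point(toy_map, {"a": 1.5, "b": 1.5}, 0.1))
```

With $\varepsilon = 0$ the check reduces to plain equality, which matches the remark that a $0$-fixed point is a fixed point.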
D. Theorem
The following theorem is an immediate consequence of the three previous lemmas, and sums up the link between the negotiation function and the SPEs, or $\varepsilon$-SPEs.
Theorem 2.
Let $G_{\upharpoonright v_0}$ be an initialized prefix-independent game, and let $\varepsilon \geq 0$. Let $\lambda^*$ be the least $\varepsilon$-fixed point of the negotiation function. Let $\xi$ be a play starting in $v_0$. If there exists an $\varepsilon$-SPE $\bar\sigma$ such that $\langle \bar\sigma \rangle_{v_0} = \xi$, then $\xi$ is $\lambda^*$-consistent. The converse is true if the game $G$ is with steady negotiation.
Proof: First, let us recall that $\lambda^*$, the least $\varepsilon$-fixed point of the negotiation function, exists by Lemma 3.
If $\bar\sigma$ is an $\varepsilon$-SPE, then by Lemma 2, there exists an $\varepsilon$-fixed point $\lambda$ of the negotiation function such that all the plays generated by $\bar\sigma$ after some history are $\lambda$-consistent; in particular, the play $\xi$ is $\lambda$-consistent, and therefore $\lambda^*$-consistent since $\lambda^* \leq \lambda$.
Conversely, if the game $G$ is with steady negotiation, and if the play $\xi$ is $\lambda^*$-consistent, then by Lemma 1, there exists an $\varepsilon$-SPE $\bar\sigma$ such that $\langle \bar\sigma \rangle_{v_0} = \xi$.
In the following sections, we will develop a method to compute the negotiation function: we will prove that in the case of mean-payoff-inf games, it is actually a piecewise linear function, which makes it feasible to compute and express the set of its $\varepsilon$-fixed points, and therefore to find the least of them using classical linear algebraic techniques.
However, when one looks for a least fixed point, a usual method, under some continuity hypothesis, is to compute the limit of the iterations, by successive approximations based on the Kleene-Tarski theorem. We develop that possibility in the following subsection, confirming that the least fixed point of the negotiation function is indeed the limit of its iteration on the vacuous requirement $\lambda_0$, and explaining why it cannot always be used in practice.
[Fig. 7. A game where the negotiation function is not stationary.]
E. An insufficient track: the negotiation sequence
We assume in this subsection that $G$ is a game such that the negotiation function is Scott-continuous, i.e. such that for every non-decreasing sequence $(\lambda_n)_n$ of requirements on $G$, we have:
$$\mathrm{nego}\left(\sup_n \lambda_n\right) = \sup_n \mathrm{nego}(\lambda_n).$$
The least fixed point of the negotiation function is, then, the limit of the negotiation sequence, defined as the sequence $(\lambda_n)_{n \in \mathbb{N}} = (\mathrm{nego}^n(\lambda_0))_n$.
Indeed, that sequence is non-decreasing; therefore it has a limit $\lambda$. By Scott-continuity, the equality $\lambda_{n+1} = \mathrm{nego}(\lambda_n)$ implies, when we take the suprema over $n$, that $\lambda$ is a fixed point of the negotiation function. If $\lambda^*$ is the least fixed point of the negotiation function, then $\lambda^* \leq \lambda$. On the other hand, $\lambda^* \geq \lambda_n$ for all $n$ by induction: we have $\lambda^* \geq \lambda_0$, and if $\lambda^* \geq \lambda_n$, then $\lambda^* = \mathrm{nego}(\lambda^*) \geq \mathrm{nego}(\lambda_n) = \lambda_{n+1}$ because the negotiation function is non-decreasing. Taking the supremum, $\lambda^* \geq \lambda$, and therefore $\lambda^* = \lambda$.
In mean-payoff-inf games, in particular, the negotiation function is Scott-continuous:
Proposition 2.
In mean-payoff-inf games, the negotiationfunction is Scott-continuous.
A proof of that statement is given in Appendix I: note that it uses results that will be presented in Section V.
In many cases, the negotiation sequence is stationary, and in that case, it is possible to compute its limit: whenever a term is equal to the previous one, we know that we have reached it. But the negotiation sequence is not always stationary. The game of Figure 7, where each edge carries three labels, one weight per player, is a counter-example.
For all $n$, we have:
$$\lambda_n(a) = \lambda_n(b) = 2 - \frac{1}{2^{n-1}}.$$
[Fig. 8. The play constructed by Prover and Challenger in the abstract negotiation game.]
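The closed form $\lambda_n(a) = 2 - 1/2^{n-1}$ can be checked against the inductive equation $\lambda_{n+1}(a) = 1 + \frac{1}{2}\lambda_n(a)$ justified in the next paragraph. The sketch below iterates that equation in exact rational arithmetic; the starting value $\lambda_1(a) = 1$ and the helper name `negotiation_values` are assumptions made for illustration.

```python
from fractions import Fraction
from typing import List


def negotiation_values(steps: int) -> List[Fraction]:
    """Return [lambda_1(a), ..., lambda_steps(a)] for lambda_{n+1} = 1 + lambda_n / 2.

    Assumes steps >= 1 and the (hypothetical) starting value lambda_1(a) = 1.
    """
    values = [Fraction(1)]
    while len(values) < steps:
        values.append(1 + values[-1] / 2)
    return values


values = negotiation_values(20)
for n, value in enumerate(values, start=1):
    # Each term matches the closed form and stays strictly below the limit 2.
    assert value == 2 - Fraction(1, 2 ** (n - 1))
    assert value < 2
print(values[:4])
```

Exact fractions are used rather than floats so that the strict inequality with the limit is not blurred by rounding: no finite number of iterations produces two equal consecutive terms.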
Indeed, the game is symmetric, and the lowest payoff that can be proposed from the state $a$ to the player controlling it is obtained by a combination of the cycles ef and f that has to satisfy, in the state $b$, the requirement of the player controlling $b$. Hence the following inductive equation:
$$\lambda_{n+1}(a) = 1 + \frac{1}{2} \lambda_n(a),$$
whose solution is the sequence proposed above. This sequence converges to $2$ but never reaches it. All the details of that statement, and a similar example with only two players, are given in Example 15, in Appendix K.
V. NEGOTIATION GAMES
We have now proved that SPEs are characterized by the requirements that are fixed points of the negotiation function; but we need to know how to compute, in practice, the quantity $\mathrm{nego}(\lambda)$ for a given requirement $\lambda$. In other words, we need an algorithm that, given a state $v$ controlled by a player $i$ in the game $G$ and a requirement $\lambda$, returns the value that player $i$ can ensure in $G_{\upharpoonright v}$ if the other players play $\lambda$-rationally. The concept of Prover-Challenger games, used for example in [15], gives us a tool for that purpose.
A. Abstract negotiation game
We first define an abstract negotiation game, which is conceptually simple but not directly usable for algorithmic purposes, because it is defined on an uncountably infinite state space.
Here is an intuitive definition of the abstract negotiation game $\mathrm{Abs}_{\lambda i}(G)_{\upharpoonright [v]}$ from a state $v$, a player $i$ and a requirement $\lambda$:
• player Prover proposes a $\lambda$-consistent play $\rho$ from $v$ (or loses, if she has no play to propose);
• either:
– player Challenger accepts the play and the game terminates;
– or he chooses an edge $\rho_k \rho_{k+1}$, with $\rho_k \in V_i$, from which he can make player $i$ deviate, using another edge $\rho_k w$ with $w \neq \rho_{k+1}$: then, the game starts again from $w$ instead of $v$.
• In the resulting play (either eventually accepted by Challenger, or constructed by infinitely many deviations), as represented in Figure 8, Prover wants player $i$'s payoff to be low, and Challenger wants it to be high.
That game gives us the basis of a method to compute $\mathrm{nego}(\lambda)$ from $\lambda$: if $\alpha$ is the maximal outcome that Challenger can ensure in $\mathrm{Abs}_{\lambda i}(G)_{\upharpoonright [v]}$, with $v \in V_i$, then it is the maximal payoff that player $i$ can guarantee in $G_{\upharpoonright v}$ against a $\lambda$-rational environment. Hence the equality:
$$\mathrm{val}\left(\mathrm{Abs}_{\lambda i}(G)_{\upharpoonright [v]}\right) = \mathrm{nego}(\lambda)(v).$$
A proof of that statement, with a complete formalization of the abstract negotiation game, is presented in Appendix E.
[Fig. 9. The requirement $\lambda_2$ on the game of Example 2.]
Example 6. Let us take the game of Example 2: in Figure 9, we wrote, in red, the requirement $\lambda_2 = \mathrm{nego}(\lambda_1)$, computed in Section III-B. Let us use the abstract negotiation game to compute the requirement $\lambda_3 = \mathrm{nego}(\lambda_2)$.
From the state $a$, Prover can propose the play $abd^\omega$, and the only deviation available to Challenger is going to $c$; he has of course no incentive to do so. Therefore, $\lambda_3(a) = 2$.
From the state $b$, whatever Prover proposes at first, Challenger can deviate and go to $a$. Then, from $a$, Prover cannot propose the play $ac^\omega$, which is not $\lambda_2$-consistent: her only possibility is to propose a play beginning with $ab$, and to let Challenger deviate once more. He can deviate infinitely often that way, and generate the play $(ba)^\omega$: therefore, $\lambda_3(b) = 3$.
The other states keep the same values.
Note that $\lambda_3$ is no longer satisfiable from $a$ or $b$, and therefore that if $\lambda_4 = \mathrm{nego}(\lambda_3)$, then $\lambda_4(a) = \lambda_4(b) = +\infty$. By the considerations on the negotiation sequence given in Section IV-E, this proves that the least fixed point of the negotiation function is not satisfiable, and therefore that there is no SPE in that game.
The interested reader will find other examples in Appendix K.
B. Concrete negotiation game
In the abstract negotiation game, Prover has to propose complete plays, which are required to be $\lambda$-consistent. In practice, there will often be infinitely many such plays, and therefore that game cannot be used directly for an algorithmic purpose. Instead, those plays can be given edge by edge, in a finite-state game. Its definition is more technical, but it can be shown that it is equivalent to the abstract one.
Definition 25 (Concrete negotiation game). Let $G_{\upharpoonright v_0}$ be an initialized prefix-independent game, and let $\lambda$ be a requirement on $G$, with either $\lambda(V) \subseteq \mathbb{R}$, or $\lambda = \lambda_0$.
The concrete negotiation game of $G_{\upharpoonright v_0}$ for player $i$ is the two-player zero-sum game:
$$\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0} = \left(\{\mathbb{P}, \mathbb{C}\}, S, (S_{\mathbb{P}}, S_{\mathbb{C}}), \Delta, \nu\right)_{\upharpoonright s_0},$$
that, intuitively, mimics the abstract game introduced above:
• the states controlled by Prover are $S_{\mathbb{P}} = V \times 2^V$, where the state $s = (v, M)$ represents a current state $v$ on which Prover has to define the strategy profile, called the localization of the state $s$, and $M$ is the memory of $s$, which memorizes the states that have been traversed so far since the last deviation, in order to check the $\lambda$-consistency of the proposed play;
• the states controlled by Challenger are $S_{\mathbb{C}} = E \times 2^V$;
• there are three types of transitions, proposals, acceptations and deviations: $\Delta = \mathrm{Prop} \cup \mathrm{Acc} \cup \mathrm{Dev}$, where:
– the proposals are transitions in which Prover proposes an edge to follow in the game $G$:
$$\mathrm{Prop} = \left\{ (v, M)(vw, M) \mid vw \in E,\ M \in 2^V \right\};$$
– the acceptations are transitions in which Challenger accepts to follow the edge proposed by Prover (this is in particular his only possibility whenever that edge begins on a state that is not controlled by player $i$):
$$\mathrm{Acc} = \left\{ (vw, M)(w, M \cup \{w\}) \mid j \in \Pi,\ w \in V_j \right\}$$
(note that the memory is updated);
– the deviations are transitions in which Challenger refuses to follow the edge proposed by Prover, as he can if that edge begins in a state controlled by player $i$:
$$\mathrm{Dev} = \left\{ (uv, M)(w, \{w\}) \mid u \in V_i,\ w \neq v,\ uw \in E \right\}$$
(the memory is erased, and only the new state the deviating edge leads to is memorized);
• the outcome function $\nu$ measures player $i$'s payoff, with a defeat condition if the constructed strategy profile is not $\lambda$-rational, that is to say if after finitely many deviations of player $i$, it can generate a play which is not $\lambda$-consistent:
– $\nu_{\mathbb{C}}(\eta) = +\infty$ if there exists $n$ such that no transition in the play $\eta_n \eta_{n+1} \dots$ is a deviation, and if there exists $j \in \Pi$ such that $\hat\mu_j(\eta) < 0$;
– $\nu_{\mathbb{C}}(\eta) = \hat\mu_\star(\eta)$ otherwise;
– and $\nu_{\mathbb{P}} = -\nu_{\mathbb{C}}$;
where for each dimension $j \in \Pi$, $\hat\mu_j$ measures the difference between player $j$'s payoff and player $j$'s maximal requirement:
$$\hat\mu_j\left((\rho_0, M_0)(\rho_0 \rho'_0, M_0)(\rho_1, M_1) \dots\right) = \mu_j(\rho) - \limsup_n \max_{v \in M_n \cap V_j} \lambda(v),$$
and, on the dimension $\star$, $\hat\mu_\star$ measures player $i$'s payoff:
$$\hat\mu_\star\left((\rho_0, M_0)(\rho_0 \rho'_0, M_0)(\rho_1, M_1) \dots\right) = \mu_i(\rho).$$
The dimension $\star$ is called the main dimension, and each $j \in \Pi$ is a non-main dimension;
• and finally, $s_0 = (v_0, \{v_0\})$.
As in the abstract negotiation game, the goal of Prover is to find a $\lambda$-rational strategy profile that forces the worst possible payoff for player $i$, and the goal of Challenger is to find a possibly deviating strategy for player $i$ that gives them the highest possible payoff.
In the case of mean-payoff-inf games, the function $\hat\mu$ is a multi-mean-payoff function, which will enable us to compute the value of the concrete negotiation game.
Remark. In the case of a mean-payoff-inf game, each function $\hat\mu_j$ is the mean-payoff-inf function corresponding to the weight function $\hat\pi_j$ defined by, for $j \in \Pi$:
$$\hat\pi_j\left((v, M)(vw, M)\right) = 0, \qquad \hat\pi_j\left((uv, M)(w, N)\right) = 2\left(\pi_j(uw) - \max_{v_j \in M \cap V_j} \lambda(v_j)\right),$$
and:
$$\hat\pi_\star\left((v, M)(vw, M)\right) = 0, \qquad \hat\pi_\star\left((uv, M)(w, N)\right) = 2\pi_i(uw).$$
A play or a history in the concrete negotiation game has a projection in the game on which that negotiation game has been constructed, defined as follows:
Definition 26 (Projection of a history, of a play). Let $G$ be a prefix-independent game. Let $\lambda$ be a requirement and $i$ a player, and let $\mathrm{Conc}_{\lambda i}(G)$ be the corresponding concrete negotiation game. Let $H = (h_0, M_0)(h_0 h'_0, M_0) \dots (h_n h'_n, M_n)$ be a history in $\mathrm{Conc}_{\lambda i}(G)$: the projection of the history $H$ is the history in the game $G$:
$$\dot{H} = h_0 \dots h_n.$$
That definition is naturally extended to plays.
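The transitions of Definition 25 and the projection of Definition 26 can be sketched on a small explicit graph. In the code below, Prover states are pairs `(v, M)` and Challenger states are pairs `((v, w), M)`; the four-state graph, the ownership map and all function names are hypothetical choices made for illustration, and the outcome function is not modelled.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

State = str
Edge = Tuple[State, State]
ProverState = Tuple[State, FrozenSet[State]]     # (v, M)
ChallengerState = Tuple[Edge, FrozenSet[State]]  # (vw, M)

# Hypothetical toy graph and ownership (player 1 owns a and c, player 2 owns b and d).
EDGES: Set[Edge] = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "c"), ("b", "d"), ("d", "d")}
OWNER: Dict[State, int] = {"a": 1, "b": 2, "c": 1, "d": 2}


def prover_moves(s: ProverState, edges: Set[Edge]) -> List[ChallengerState]:
    """Proposals: Prover picks an edge vw leaving the current state; memory is unchanged."""
    v, memory = s
    return [((u, w), memory) for (u, w) in sorted(edges) if u == v]


def challenger_moves(s: ChallengerState, edges: Set[Edge],
                     owner: Dict[State, int], i: int) -> List[ProverState]:
    """Acceptation of the proposed edge, plus deviations if player i owns its source."""
    (u, v), memory = s
    moves: List[ProverState] = [(v, memory | {v})]  # acceptation: memory is updated
    if owner[u] == i:
        for (x, w) in sorted(edges):
            if x == u and w != v:
                moves.append((w, frozenset({w})))   # deviation: memory is erased
    return moves


def projection(history: list) -> List[State]:
    """Keep the localizations h_0 ... h_n of the Prover states of a history."""
    return [s[0] for s in history if isinstance(s[0], str)]


s0: ProverState = ("a", frozenset({"a"}))
proposal = prover_moves(s0, EDGES)[0]            # Prover proposes the edge ab
print(challenger_moves(proposal, EDGES, OWNER, 1))
print(projection([s0, proposal, ("b", frozenset({"a", "b"})),
                  (("b", "d"), frozenset({"a", "b"}))]))
```

When the source of the proposed edge is not owned by player `i`, `challenger_moves` returns the acceptation only, which mirrors the parenthetical remark in the definition of $\mathrm{Acc}$.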
Remark.
For a play $\eta$ where no transition is a deviation, we have $\hat\mu_j(\eta) \geq 0$ for each $j \in \Pi$ if and only if $\dot\eta$ is $\lambda$-consistent.
Although the construction is technically more complex, the concrete negotiation game is equivalent to the abstract one: the only differences are that the plays proposed by Prover are proposed edge by edge, and that their $\lambda$-consistency is not written in the rules of the game but in its outcome function.
Theorem 3.
Let $G_{\upharpoonright v_0}$ be an initialized prefix-independent Borelian game. Let $\lambda$ be a requirement and $i$ a player. Then, we have:
$$\mathrm{val}\left(\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}\right) = \inf_{\bar\sigma_{-i} \in \lambda\mathrm{Rat}(v_0)} \ \sup_{\sigma_i} \ \mu_i\left(\langle \bar\sigma \rangle_{v_0}\right).$$
A proof can be found in Appendix F.
[Fig. 10. A concrete negotiation game.]
Example 7. Let us consider again the game from Example 2. Figure 10 represents a concrete negotiation game $\mathrm{Conc}_{\lambda i}(G)$ on that game (with $\lambda(a) = 1$ and $\lambda(b) = 2$), where the dashed states are controlled by Challenger, and the other ones by Prover.
The dotted arrows indicate the deviations. When a transition $ss'$ carries three labels $x$, $y$ and $z$, the labels $x$ and $y$ denote the weights $\hat\pi_j(ss')$ on the two non-main dimensions $j \in \Pi$, and $z$ denotes $\hat\pi_\star(ss')$.
The transitions that are not labelled either have weight zero on the three coordinates, or have meaningless weights since they cannot be used more than once.
The red arrows indicate a (memoryless) optimal strategy for Challenger. Against that strategy, the lowest outcome Prover can ensure is $2$.
Therefore, $\mathrm{nego}(\lambda)(v_0) = 2$, in line with the abstract game in Example 6.
C. Solving the concrete negotiation game for the mean-payoff-inf case
First, let us note that mean-payoff-inf games are Borelian, and therefore satisfy the hypotheses of Theorem 3.
We now know that $\mathrm{nego}(\lambda)(v)$, for a given requirement $\lambda$, a given player $i$ and a given state $v \in V_i$, is the value of the concrete negotiation game $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright (v, \{v\})}$. Let us now show how, in the mean-payoff-inf case, that value can be computed.
Definition 27 (Memoryless strategy). A strategy $\sigma_i$ in a game $G$ is memoryless if for all vertices $v \in V_i$ and for all histories $h$ and $h'$, $\sigma_i(hv) = \sigma_i(h'v)$.
For any game $G$ and any memoryless strategy $\sigma_i$, $G[\sigma_i]$ denotes the graph induced by $\sigma_i$, that is the graph $(V, E')$, with:
$$E' = \{ vw \in E \mid v \notin V_i \text{ or } w = \sigma_i(v) \}.$$
For any finite set $D$ and any set $X \subseteq \mathbb{R}^D$, $\mathrm{Conv}\, X$ denotes the convex hull of $X$.
We can now prove that in the concrete negotiation game constructed from a mean-payoff-inf game, the player Challenger has an optimal strategy that is memoryless.
Lemma 4.
Let $G$ be a mean-payoff-inf game, let $i$ be a player, let $\lambda$ be a requirement and let $\mathrm{Conc}_{\lambda i}(G)$ be the corresponding concrete negotiation game. There exists a memoryless strategy $\tau_{\mathbb{C}}$ such that for all states $s$:
$$\inf_{\tau_{\mathbb{P}}} \nu_{\mathbb{C}}\left(\langle \bar\tau \rangle_s\right) = \mathrm{val}\left(\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s}\right),$$
i.e. that is optimal for Challenger from every state.
Proof. By [12], a player whose objective is prefix-independent and convex, that is, such that whenever two plays satisfy that objective, a shuffling of those two plays satisfies it too, has a memoryless optimal strategy.
For Challenger, ensuring a payoff above some value is not exactly a convex objective; but it becomes convex if we replace the limit inferior in the definition of $\hat\mu_\star$ by a limit superior. The optimal strategy of Challenger with regard to that slightly modified objective is also optimal in the actual concrete negotiation game.
See Appendix G for a complete proof.
With Lemma 4, we can now compute the value of the concrete negotiation game.
When $G$ is a graph, $\mathrm{SC}(G)$ denotes the set of all the simple cycles of $G$.
For any closed set $C \subseteq \mathbb{R}^{\Pi \cup \{\star\}}$, the quantity:
$$\min{}^{\star} C = \min \{ x_\star \mid \bar x \in C,\ \forall j \in \Pi,\ x_j \geq 0 \}$$
is the $\star$-minimum of $C$: it will capture, in the concrete negotiation game, the least payoff that can be imposed on player $i$ while keeping every player's payoff above their requirements, among a set of possible outcomes. See Figure 11 for an illustration.
For every game $G_{\upharpoonright v_0}$ and each player $i$, $\mathrm{ML}_i(G_{\upharpoonright v_0})$, or $\mathrm{ML}(G_{\upharpoonright v_0})$ when the context is clear, denotes the set of memoryless strategies for player $i$ in $G_{\upharpoonright v_0}$.
For any graph $(V, E)$, $\mathrm{SConn}(V, E)$ denotes the set of strongly connected components of $(V, E)$ (considered as subgraphs of $(V, E)$ or as subsets of $V$, depending on the context).
Lemma 5. Let $G_{\upharpoonright v_0}$ be an initialized mean-payoff-inf game, and let $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}$ be its concrete negotiation game for some $\lambda$ and some $i$. Then, the value of the game $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}$ is given by the formula:
$$\max_{\tau_{\mathbb{C}} \in \mathrm{ML}_{\mathbb{C}}(\mathrm{Conc}_{\lambda i}(G))} \ \min_{\substack{K \in \mathrm{SConn}(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s_0}} \mathrm{opt}(K),$$
where $\mathrm{opt}(K)$ is the minimal value $\nu_{\mathbb{C}}(\rho)$ for $\rho$ among the infinite paths in $K$.
• If $K$ contains a deviation, then Prover can choose the simple cycle of $K$ that minimizes player $i$'s payoff:
$$\mathrm{opt}(K) = \min_{c \in \mathrm{SC}(K)} \hat\mu_\star(c^\omega).$$
• If $K$ does not contain a deviation, then Prover must choose a combination of the simple cycles of $K$ that minimizes player $i$'s payoff while keeping the non-main dimensions above $0$:
$$\mathrm{opt}(K) = \min{}^{\star} \ \mathrm{Conv}_{c \in \mathrm{SC}(K)} \ \hat\mu(c^\omega).$$
A proof can be found in Appendix H.
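The first case of Lemma 5 can be illustrated on a small explicit graph: the mean payoff of a periodic play $c^\omega$ is the average weight along the cycle $c$, so $\mathrm{opt}(K)$ is a minimum of cycle means. The sketch below enumerates simple cycles by brute force, which is exponential in general and only meant for tiny examples; the graph `GRAPH`, the weights `WEIGHT` (standing for main-dimension weights) and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Iterator, List, Tuple

Graph = Dict[str, List[str]]


def simple_cycles(graph: Graph) -> Iterator[List[str]]:
    """Yield every simple cycle once, as a vertex list starting at its smallest vertex."""
    def extend(path: List[str], root: str) -> Iterator[List[str]]:
        for nxt in graph.get(path[-1], ()):
            if nxt == root:
                yield list(path)            # closed a cycle back to the root
            elif nxt > root and nxt not in path:
                yield from extend(path + [nxt], root)

    for root in sorted(graph):
        yield from extend([root], root)


def min_cycle_mean(graph: Graph, weight: Dict[Tuple[str, str], int]) -> Fraction:
    """Minimum, over the simple cycles c, of the average weight along c."""
    means = []
    for cycle in simple_cycles(graph):
        edges = list(zip(cycle, cycle[1:] + cycle[:1]))
        means.append(Fraction(sum(weight[e] for e in edges), len(edges)))
    return min(means)


# Hypothetical strongly connected component with main-dimension weights.
GRAPH: Graph = {"s": ["s", "t"], "t": ["s", "u"], "u": ["s"]}
WEIGHT = {("s", "s"): 3, ("s", "t"): 0, ("t", "s"): 4, ("t", "u"): 1, ("u", "s"): 2}

print(sorted(simple_cycles(GRAPH)))
print(min_cycle_mean(GRAPH, WEIGHT))
```

The second case of the lemma additionally constrains the non-main dimensions, so a single best cycle is no longer enough and convex combinations of cycles have to be considered.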
Corollary 1.
For each player $i$ and every state $v \in V_i$, the value $\mathrm{nego}(\lambda)(v)$ can be computed with the formula given in Lemma 5 applied to the game $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright (v, \{v\})}$.
Moreover, another corollary of that result is that there always exists a best play that Prover can choose, i.e. Prover has an optimal strategy; by Theorem 3, this is equivalent to saying that mean-payoff-inf games are games with steady negotiation.
Corollary 2.
The mean-payoff-inf games are games withsteady negotiation.
VI. ANALYSIS OF THE NEGOTIATION FUNCTION IN MEAN-PAYOFF-INF GAMES
In this section, we will show that for the case of mean-payoff-inf games, the negotiation function is a piecewise linear function from the vector space of requirements into itself, which can therefore be computed and analyzed using classical linear algebra techniques. Then, it becomes possible to search for the fixed points or the $\varepsilon$-fixed points of such a function, and to decide the existence or not of SPEs or $\varepsilon$-SPEs in the game studied.
Theorem 4.
Let $G_{\upharpoonright v_0}$ be an initialized mean-payoff-inf game. Let us identify any requirement $\lambda$ on $G$ with finite values with the tuple $\bar\lambda = (\lambda(v))_{v \in V}$, an element of the finite-dimensional vector space $\mathbb{R}^V$. Then, for each player $i$ and every vertex $v \in V_i$, the quantity $\mathrm{nego}(\lambda)(v)$ is a piecewise linear function of $\bar\lambda$, which can be effectively expressed and whose $\varepsilon$-fixed points are computable for all $\varepsilon$.
Proof sketch: By Lemma 5, the quantity $\mathrm{nego}(\lambda)(v)$ is equal to:
$$\max_{\tau_{\mathbb{C}} \in \mathrm{ML}_{\mathbb{C}}(\mathrm{Conc}_{\lambda i}(G))} \ \min_{\substack{K \in \mathrm{SConn}(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s_0}} \mathrm{opt}(K),$$
which is defined from the concrete negotiation game $\mathrm{Conc}_{\lambda i}(G)$, itself depending on $\lambda$. But the underlying graph of the concrete negotiation game is actually independent of $\lambda$, which appears only in the computation of the non-main dimension weights. Therefore, the indexation of the maximum and the minimum in the formula above does not depend on $\lambda$.
Then, we have to prove that the quantity $\mathrm{opt}(K)$ is a piecewise linear function of $\bar\lambda$, in both of the cases given by Lemma 5: when $K$ contains a deviation, and when it does not.
In the first case, it is trivially true, because $\mathrm{opt}(K)$ is actually independent of $\lambda$. In the second case, $\mathrm{opt}(K)$ is defined as the $\star$-minimum of the convex hull of finitely many points, i.e. of a polyhedron. A modification of $\lambda$ induces a translation of that polyhedron, and the $\star$-minimum of that polyhedron is the minimum, along the dimension $\star$, of its intersection with the quadrant of the points that have non-negative coordinates along the non-main dimensions: that quantity evolves linearly with such translations, as illustrated in Figure 11.
See Appendix J for a detailed proof.
[Fig. 11. The $\star$-minimum of a polyhedron.]
Example 8. Let $G$ be the game of Example 3, represented in Figure 3.
Then, if a requirement $\lambda$ is represented by the tuple $(\lambda(a), \lambda(b))$, the function $\mathrm{nego} : \mathbb{R}^2 \to \mathbb{R}^2$ can be represented by Figure 12, where in each of the regions delimited by the dashed lines, we wrote a formula for the couple $(\mathrm{nego}(\lambda)(a), \mathrm{nego}(\lambda)(b))$.
[Fig. 12. The negotiation function on the game of Example 3.]
The orange area indicates the fixed points of the function, and the yellow area indicates the other $\varepsilon$-fixed points represented in the figure.
Remember that, by Proposition 2, the negotiation function is Scott-continuous: that means that when we are exactly on a magenta line, the correct expression of $\mathrm{nego}$ is the one of the tile to the left (for a vertical line) or at the bottom (for a horizontal one).
[Fig. 13. The negotiation function on the game of Example 2.]
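The dependence of a $\star$-minimum on a requirement can be made concrete in the simplest setting, with a single non-main dimension. The sketch below assumes that each simple cycle yields a pair (payoff of the constrained player, payoff of player $i$), that a requirement $r$ translates the first coordinate by $-r$, and that the function names and the two-cycle data are hypothetical; `None` is returned when no point of the hull satisfies the constraint.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

Point = Tuple[Fraction, Fraction]  # (non-main coordinate x_j, main coordinate x_star)


def star_minimum(points: List[Point]) -> Optional[Fraction]:
    """Minimum of x_star over Conv(points) intersected with the half-plane x_j >= 0."""
    # Input points that already satisfy the constraint.
    candidates = [xs for (xj, xs) in points if xj >= 0]
    # Points where a segment between two input points crosses the line x_j = 0.
    for (aj, a_star), (bj, b_star) in combinations(points, 2):
        if (aj < 0) != (bj < 0):
            t = Fraction(-aj) / (bj - aj)
            candidates.append(a_star + t * (b_star - a_star))
    return min(candidates) if candidates else None


def opt_for_requirement(cycle_payoffs: List[Point], r: Fraction) -> Optional[Fraction]:
    """Translate the non-main coordinate by the requirement r, then take the star-minimum."""
    return star_minimum([(pj - r, p_star) for (pj, p_star) in cycle_payoffs])


# Two hypothetical cycles: one gives payoffs (0, 0), the other (2, 2).
CYCLES: List[Point] = [(Fraction(0), Fraction(0)), (Fraction(2), Fraction(2))]
for r in [Fraction(-1), Fraction(0), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(3)]:
    print(r, opt_for_requirement(CYCLES, r))
```

On this data the result is constant for $r \leq 0$, equal to $r$ for $0 < r \leq 2$, and undefined for $r > 2$, which is the kind of piecewise linear behaviour that Theorem 4 states in general. In two dimensions it suffices to examine the input points and the crossings of segments between them, because the minimum of a linear function over the constrained polygon is attained at one of its vertices.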
Example . As a second example, let us consider the game ofthe Example 2, represented in Figure 2.First, let us note that if λ ( c ) ≤ and λ ( d ) ≤ , then nego( λ )( c ) = 1 and nego( λ )( d ) = 2 . We can therefore fix λ ( c ) = 1 and λ ( d ) = 2 , and represent the requirements λ bythe tuples ( λ ( a ) , λ ( b )) , as in the previous example. Then, thenegotiation function can be represented as in Figure 13.One can check that there is no fixed point here, and evenno -fixed point, except (+ ∞ , + ∞ ) .C ONCLUSION AND FUTURE WORKS
With the tools that we defined, we are now able to characterize effectively all the SPEs, and $\varepsilon$-SPEs, in a mean-payoff-inf game with arbitrarily many players. We do it by constructing a complete representation of the negotiation function using its associated concrete negotiation games, and then finding its least fixed point, or $\varepsilon$-fixed point, with classical linear algebraic tools. That algorithm also provides a solution to the (constrained) existence problem for SPEs in mean-payoff-inf games, which was left open in the literature.
If we are able to find a formula expressing the negotiation function like in Lemma 4 for other classes of prefix-independent games with steady negotiation, for example using the concrete negotiation game, then it will also be possible, by Theorem 2, to characterize the SPEs and $\varepsilon$-SPEs in those games.
When we look for SPEs, i.e. $\varepsilon$-SPEs with $\varepsilon = 0$, another method can be the computation of the limit of the negotiation sequence, using the concrete negotiation game, or the abstract one if the game is simple enough. But such an algorithm does not always stop, since in some cases the negotiation sequence needs a transfinite number of steps to converge. Nevertheless, let us note that the sequence will always converge after finitely many iterations in games where there exists only a finite number of possible outcomes; examples of such games are liminf and limsup games, or mean-payoff-inf and mean-payoff-sup games with only disjoint cycles. An algorithmic method was already defined in [15] for such cases: an open question is then to know which alternative has the better complexity, and perhaps to define wider classes of games in which we know that the negotiation sequence will stabilize in a finite number of steps.
Finally, another open question is whether the notion of negotiation, and its link with the SPEs, can be generalized to games that are not prefix-independent, or that are not with steady negotiation.
REFERENCES
[1] Romain Brenguier, Lorenzo Clemente, Paul Hunter, Guillermo A. Pérez, Mickael Randour, Jean-François Raskin, Ocan Sankur, and Mathieu Sassolas. Non-zero sum games for reactive synthesis. In Language and Automata Theory and Applications - 10th International Conference, LATA 2016, Prague, Czech Republic, March 14-18, 2016, Proceedings, volume 9618 of Lecture Notes in Computer Science, pages 3–23. Springer, 2016.
[2] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, Jean-François Raskin, and Marie van den Bogaard. The complexity of subgame perfect equilibria in quantitative reachability games. In CONCUR, volume 140 of LIPIcs, pages 13:1–13:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[3] Thomas Brihaye, Véronique Bruyère, Noémie Meunier, and Jean-François Raskin. Weak subgame perfect equilibria and their application to quantitative reachability. Volume 41 of LIPIcs, pages 504–518. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.
[4] Thomas Brihaye, Julie De Pril, and Sven Schewe. Multiplayer cost games with simple Nash equilibria. In Logical Foundations of Computer Science, International Symposium, LFCS 2013, San Diego, CA, USA, January 6-8, 2013. Proceedings, volume 7734 of Lecture Notes in Computer Science, pages 59–73. Springer, 2013.
[5] Véronique Bruyère. Computer aided synthesis: A game-theoretic approach. In Developments in Language Theory - 21st International Conference, DLT 2017, Liège, Belgium, August 7-11, 2017, Proceedings, volume 10396 of Lecture Notes in Computer Science, pages 3–35. Springer, 2017.
[6] Véronique Bruyère, Noémie Meunier, and Jean-François Raskin. Secure equilibria in weighted games. In CSL-LICS, pages 26:1–26:26. ACM, 2014.
[7] Véronique Bruyère, Stéphane Le Roux, Arno Pauly, and Jean-François Raskin. On the existence of weak subgame perfect equilibria. CoRR, abs/1612.01402, 2016.
[8] Véronique Bruyère, Stéphane Le Roux, Arno Pauly, and Jean-François Raskin. On the existence of weak subgame perfect equilibria. In Foundations of Software Science and Computation Structures - 20th International Conference, FOSSACS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, volume 10203 of Lecture Notes in Computer Science, pages 145–161, 2017.
[9] Krishnendu Chatterjee, Laurent Doyen, Herbert Edelsbrunner, Thomas A. Henzinger, and Philippe Rannou. Mean-payoff automaton expressions. In Paul Gastin and François Laroussinie, editors, CONCUR 2010 - Concurrency Theory, 21th International Conference, CONCUR 2010, Paris, France, August 31-September 3, 2010. Proceedings, volume 6269 of Lecture Notes in Computer Science, pages 269–283. Springer, 2010.
[10] Krishnendu Chatterjee, Thomas A. Henzinger, and Nir Piterman. Strategy logic. Inf. Comput., 208(6):677–693, 2010. doi:10.1016/j.ic.2009.07.004.
[11] János Flesch, Jeroen Kuipers, Ayala Mashiah-Yaakovi, Gijs Schoenmakers, Eilon Solan, and Koos Vrieze. Perfect-information games with lower-semicontinuous payoffs. Math. Oper. Res., 35(4):742–755, 2010.
[12] Eryk Kopczynski. Half-positional determinacy of infinite games. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II, volume 4052 of Lecture Notes in Computer Science, pages 336–347. Springer, 2006.
[13] Orna Kupferman, Giuseppe Perelli, and Moshe Y. Vardi. Synthesis with rational environments. Ann. Math. Artif. Intell., 78(1):3–20, 2016. doi:10.1007/s10472-016-9508-8.
[14] Donald A. Martin. Borel determinacy. Annals of Mathematics, pages 363–371, 1975.
[15] Noémie Meunier. Multi-Player Quantitative Games: Equilibria and Algorithms. PhD thesis, Université de Mons, 2016.
[16] Martin J. Osborne. An introduction to game theory. Oxford Univ. Press, 2004.
[17] Eilon Solan and Nicolas Vieille. Deterministic multi-player Dynkin games. Journal of Mathematical Economics, 39(8):911–929, 2003.
[18] Michael Ummels. Rational behaviour and strategy construction in infinite multiplayer games. In FSTTCS 2006: Foundations of Software Technology and Theoretical Computer Science, 26th International Conference, Kolkata, India, December 13-15, 2006, Proceedings, volume 4337 of Lecture Notes in Computer Science, pages 212–223. Springer, 2006.
[19] Yaron Velner, Krishnendu Chatterjee, Laurent Doyen, Thomas A. Henzinger, Alexander Moshe Rabinovich, and Jean-François Raskin. The complexity of multi-mean-payoff and multi-energy games. Inf. Comput., 241:177–196, 2015.
[20] Nicolas Vieille and Eilon Solan. Deterministic multi-player Dynkin games. Journal of Mathematical Economics, 39(8):911–929, November 2003. URL: https://hal-hec.archives-ouvertes.fr/hal-00464953, doi:10.1016/S0304-4068(03)00021-1.
APPENDIX A
PROOF OF THEOREM 1
Theorem 1.
Let G be a game with steady negotiation. Then,a play ρ in G is an NE play if and only if ρ is nego( λ ) -consistent.Proof: • Let ¯ σ be a Nash equilibrium in G ↾ v , for some state v , and let ρ = h ¯ σ i v : let us prove that the play ρ is nego( λ ) -consistent.Let k ∈ N , let i ∈ Π be such that ρ k ∈ V i , and let usprove that µ i ( ρ k ρ k +1 . . . ) ≥ nego( λ )( ρ k ) .For any deviation σ ′ i of σ i ↾ ρ ...ρ k , by definition of NEs, µ i ( h ¯ σ − i ↾ ρ ...ρ k , σ ′ i i ρ k ) ≤ µ i ( ρ ) . Therefore: µ i ( ρ ) ≥ sup σ ′ i µ i ( h ¯ σ − i ↾ ρ ...ρ k , σ ′ i i ρ k ) hence: µ i ( ρ ) ≥ inf ¯ τ − i sup τ i µ i ( h ¯ τ − i ↾ ρ ...ρ k , τ i i ρ k ) i.e.: µ i ( ρ ) ≥ nego( λ )( ρ k ) . • Let ρ be a nego( λ ) -consistent play from a state v . Letus define a strategy profile ¯ σ such that h ¯ σ i v = ρ , by: – h ¯ σ i v = ρ ; – for all histories of the form ρ . . . ρ k v with v = ρ k +1 ,let i be the player controlling ρ k .Since the game G is with steady negotiation, theinfimum: inf ¯ τ − i ∈ λ Rat( ρ k ) sup τ i µ i ( h ¯ τ i ρ k ) is a minimum. Let ¯ τ k − i be λ -rational strategy profilefrom ρ k realizing that minimum, and let τ ki be somestrategy from ρ k such that τ ki ( ρ k ) = v . Then, wedefine: h ¯ σ ↾ ρ ...ρ k v i v = h ¯ τ kρ k v i v ; – for every other history h , ¯ σ ( h ) is defined arbitrarily.Let us prove that ¯ σ is an NE: let σ ′ i be a deviation of σ i , let ρ ′ = h ¯ σ − i , σ ′ i i v and let ρ . . . ρ k be the longestcommon prefix of ρ and ρ ′ . Let v = ρ ′ k +1 .Then, we have: µ i ( ρ ′ ) ≤ sup τ ki µ i (cid:0) h ¯ τ k i ρ k (cid:1) = nego( λ )( ρ k ) , and since ρ is λ -consistent, nego( λ )( ρ k ) ≤ µ i ( ρ ) ,hence µ i ( ρ ′ ) ≤ µ i ( ρ ) . A PPENDIX BP ROOF OF L EMMA Lemma 1.
Let G ↾ v be a well-initialized prefix-independentgame with steady negotiation, and ε ≥ . Let λ be an ε -fixedpoint of the function nego . Then, for every λ -consistent play ξ starting in v , there exists an ε -SPE ¯ σ such that h ¯ σ i v = ξ .Proof: • Particular case: if there exists v such that λ ( v ) = + ∞ . In that case, since the game G ↾ v is well-initialized,there is no λ -rational strategy profile from v , and nego( λ )( v ) = + ∞ . Since ε is finite and since λ isan ε -fixed point of the negotiation function, it followsthat λ ( v ) = + ∞ : in that case, there is no λ -consistentplay ξ from v , and then the proof is done. Therefore,for the rest of the proof, we assume that for all v , wehave λ ( v ) = + ∞ . As a consequence, since λ is an ε -fixed point of the function nego , for all v , we have nego( λ )( v ) = + ∞ ; and so finally, for each such v , thereexists a λ -consistent play starting from v . • Preliminary result: a game with steady negotiation is alsowith subgame-steady negotiation.
Recall that a game with steady negotiation is a game such that for every requirement $\lambda$, for every player $i$ and for every state $v$, there exists a $\lambda$-rational strategy profile $\bar\tau^v_{-i}$ realizing the infimum, that is, such that:
$$\sup_{\tau^v_i} \mu_i(\langle \bar\tau^v \rangle_v) = \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(v)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_v),$$
i.e. there exists a worst $\lambda$-rational strategy profile against player $i$ from the state $v$, with regards to player $i$'s payoff.
Our goal in this part of the proof is to show that a game that is with steady negotiation is also with subgame-steady negotiation, that is to say, for every requirement $\lambda$, for every player $i$ and for every state $v$, there exists a $\lambda$-rational strategy profile $\bar\tau^{v*}_{-i}$ such that for every history $hw$ starting from $v$ compatible with $\bar\tau^{v*}_{-i}$, we have:
$$\sup_{\tau^{v*}_i} \mu_i(\langle \bar\tau^{v*}_{\upharpoonright hw} \rangle_w) = \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(w)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_w),$$
i.e. there exists a $\lambda$-rational strategy profile against player $i$ from the state $v$ that is the worst with regards to player $i$'s payoff in any subgame, in other words a subgame-worst strategy profile.
Let us construct inductively the strategy profile $\bar\tau^{v*}_{-i}$ and the strategy $\tau^{v*}_i$ $\lambda$-rationalizing it. We define them only on histories that are compatible with $\bar\tau^{v*}_{-i}$, since they can be defined arbitrarily on any other histories. We proceed by assembling the strategy profiles of the form $\bar\tau^w$, and the histories after which we follow a new $\bar\tau^w$ will be called the resets of $\bar\tau^{v*}_{-i}$.
– First, $\langle \bar\tau^{v*} \rangle_v = \langle \bar\tau^v \rangle_v$: the one-state history $v$ is then the first reset of $\bar\tau^{v*}_{-i}$;
– then, for every history $hw$ from $v$ such that $h$ is compatible with $\bar\tau^{v*}_{-i}$ and ends in $V_i$, and such that $w \neq \tau^{v*}_i(h)$: let us write $hw = h'uh''$ so that $h'u$ is the longest reset of $\bar\tau^{v*}_{-i}$ among the prefixes of $h$, and therefore so that the strategy profile $\bar\tau^{v*}_{\upharpoonright h'u}$ has been defined as equal to $\bar\tau^u$ over the prefixes of $h''$ until $w$. Then, we have:
$$\sup_{\tau_i} \mu_i(\langle \bar\tau^w_{-i}, \tau_i \rangle_w) \leq \sup_{\tau_i} \mu_i(\langle \bar\tau^u_{-i \upharpoonright uh''}, \tau_i \rangle_w)$$
by prefix-independence of $G$ and since, by its definition, the strategy profile $\bar\tau^w_{-i}$ minimizes the quantity $\sup_{\tau_i} \mu_i(\langle \bar\tau^w_{-i}, \tau_i \rangle_w)$. Let us separate two cases.
∗ Suppose first that:
$$\sup_{\tau_i} \mu_i(\langle \bar\tau^w_{-i}, \tau_i \rangle_w) = \sup_{\tau_i} \mu_i(\langle \bar\tau^u_{-i \upharpoonright uh''}, \tau_i \rangle_w).$$
Then, $\langle \bar\tau^{v*}_{\upharpoonright hw} \rangle_w = \langle \bar\tau^u_{\upharpoonright uh''} \rangle_w$: the coalition of players against player $i$ keeps following their strategy profile, so that player $i$ will have no more than the payoff they can ensure.
∗ Suppose now that:
$$\sup_{\tau_i} \mu_i(\langle \bar\tau^w_{-i}, \tau_i \rangle_w) < \sup_{\tau_i} \mu_i(\langle \bar\tau^u_{-i \upharpoonright uh''}, \tau_i \rangle_w).$$
Then, $\langle \bar\tau^{v*}_{\upharpoonright hw} \rangle_w = \langle \bar\tau^w \rangle_w$: player $i$ has done something that lowers the payoff they can ensure, and therefore the other players have to update their strategy profile in order to enforce that new minimum. The history $hw$ is a reset of $\bar\tau^{v*}_{-i}$.
All the plays constructed are $\lambda$-consistent, hence $\bar\tau^{v*}_{-i}$ is indeed $\lambda$-rationalized by $\tau^{v*}_i$.
Let us now prove that $\bar\tau^{v*}_{-i}$ is the subgame-worst $\lambda$-rational strategy profile against player $i$. Let $hw$ be a history starting in $v$ compatible with $\bar\tau^{v*}_{-i}$, let $\upsilon_i$ be a strategy from the state $w$, let $\eta = \langle \bar\tau^{v*}_{-i \upharpoonright hw}, \upsilon_i \rangle_w$ and let us prove that:
$$\mu_i(\eta) \leq \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(w)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_w).$$
Let us consider the sequence $(\alpha_n)_{n \in \mathbb{N}}$, defined by:
$$\alpha_n = \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(\eta_n)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_{\eta_n}).$$
That sequence is non-increasing. Indeed, for all $n$:
∗ If $\eta_n \in V_i$, then no action of player $i$ can improve the payoff player $i$ themself can secure against a $\lambda$-rational environment.
∗ If $\eta_n \notin V_i$, then $\eta_{n+1} = \bar\tau^{v*}_{-i}(h\eta_0 \dots \eta_n) = \bar\tau^{\eta_k}_{-i}(\eta_k \dots \eta_n)$ for some $k$ such that, by construction of $\bar\tau^{v*}_{-i}$, $\alpha_k = \dots = \alpha_n$. Since the strategy profile $\bar\tau^{\eta_k}_{-i}$ is defined to realize the payoff $\alpha_k = \alpha_n$, we have $\alpha_{n+1} = \alpha_n$.
Moreover, that sequence can only take a finite number of values (at most $\mathrm{card}\, V$). Therefore, it is stationary: there exists $n_0 \in \mathbb{N}$ such that $(\alpha_n)_{n \geq n_0}$ is constant, and there are no resets of $\bar\tau^{v*}_{-i}$ among the prefixes of $\eta$ of length greater than $n_0$. Therefore, if we choose $n_0$ minimal (i.e., $n_0$ is the index of the last reset in $\eta$), then the play $\eta_{n_0} \eta_{n_0+1} \dots$ is compatible with the strategy profile $\bar\tau^{\eta_{n_0}}_{-i}$. Then, we have:
$$\mu_i(\eta) \leq \alpha_{n_0} \leq \alpha_0, \quad \text{and:} \quad \alpha_0 = \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(w)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_w),$$
which proves that $\bar\tau^{v*}_{-i}$ is the subgame-worst $\lambda$-rational strategy profile against player $i$ from the state $v$, and therefore that the game $G$ is a game with subgame-steady negotiation.
• Construction of $\bar\sigma$.
Let $\mathcal{H}_0 = \mathrm{Hist}\, G_{\upharpoonright v_0}$. Let us construct $\bar\sigma$ inductively by defining all the plays $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$, for $hv \in \mathcal{H}_0$, keeping the hypothesis that at any step $n$, the set $\mathcal{H}_n$ contains exactly the histories $hv$ such that the play $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$ has not yet been defined, and that every play defined so far is $\lambda$-consistent: this will define a $\lambda$-rational strategy profile, and we will then prove that it is an $\varepsilon$-SPE.
– First, $\langle \bar\sigma \rangle_{v_0} = \xi$, which satisfies the induction hypothesis. We then remove all the finite prefixes of $\xi$ from $\mathcal{H}_0$ to obtain $\mathcal{H}_1$. Note that the only history of length 1 has been removed.
– At the $n$-th step, with $n > 0$: let us choose $hv \in \mathcal{H}_n$ of minimal length, and therefore minimal for the prefix order: the strategy profile $\bar\sigma$ has been defined on all the strict prefixes of $hv$, but not on $hv$ itself, and $v \neq \bar\sigma(h)$. Let then $i$ be the player controlling the last state of $h$ (which exists since all the histories of $\mathcal{H}_n$ have length at least 2).
Let $\bar\tau^{v*}_{-i}$ be a subgame-worst $\lambda$-rational strategy profile against player $i$ from $v$, whose existence has been proved in the previous point, and let $\tau^{v*}_i$ be a strategy rationalizing it.
Then, we define $\langle \bar\sigma_{\upharpoonright hv} \rangle_v = \langle \bar\tau^{v*} \rangle_v$, and inductively, for every history $h'w$ starting from $v$ and compatible with $\bar\sigma_{-i \upharpoonright hv}$ as it has been defined so far, we define $\langle \bar\sigma_{\upharpoonright hh'w} \rangle_w = \langle \bar\tau^{v*}_{\upharpoonright h'w} \rangle_w$. The strategy profile $\bar\sigma_{\upharpoonright hv}$ is then equal to $\bar\tau^{v*}$ on any history compatible with $\bar\tau^{v*}_{-i}$.
We remove all such histories from $\mathcal{H}_n$ to obtain $\mathcal{H}_{n+1}$. All the plays we built are $\lambda$-consistent, which was our induction hypothesis.
Since each step removes from $\mathcal{H}_n$ a history of minimal length, and since there are finitely many histories of any given length, we have $\bigcap_n \mathcal{H}_n = \emptyset$, and this process completely defines $\bar\sigma$.
• Such $\bar\sigma$ is an $\varepsilon$-SPE.
Let $h^{(0)}w \in \mathrm{Hist}\, G_{\upharpoonright v_0}$, let $i \in \Pi$, let $\sigma'_i$ be a deviation of $\sigma_i$. Let $\rho = h^{(0)} \langle \bar\sigma_{\upharpoonright h^{(0)}w} \rangle_w$ and let $\rho' = h^{(0)} \langle \sigma'_{i \upharpoonright h^{(0)}w}, \bar\sigma_{-i \upharpoonright h^{(0)}w} \rangle_w$. We prove that $\mu_i(\rho') \leq \mu_i(\rho) + \varepsilon$.
If $\rho'$ is compatible with $\sigma_i$, then $\rho' = \rho$ and the proof is immediate. If it is not, we let $huv$ denote the shortest prefix of $\rho'$ such that $u \in V_i$ and $v \neq \sigma_i(hu)$. The transition $uv$ can be considered as the first deviation of player $i$, but note that $hu$ can be either longer or shorter than $h^{(0)}$: player $i$ may have already deviated in $h^{(0)}$. Be that as it may, the history $hu$ is a common prefix of the plays $\rho$ and $\rho'$, and if $\bar\tau^{v*}_{-i}$ denotes a subgame-worst $\lambda$-rational strategy profile against player $i$ from the state $v$, and if $\tau^{v*}_i$ is a strategy $\lambda$-rationalizing it, then $\bar\sigma_{\upharpoonright huv}$ has been defined as equal to $\bar\tau^{v*}$ on any history compatible with $\bar\sigma_{-i \upharpoonright huv}$.
– If $huv$ is a prefix of $\rho$: let $huh'w'$ be the longest common prefix of $\rho$ and $\rho'$. Necessarily, $w' \in V_i$.
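Then, by definition of $\bar\tau^{v*}_{-i}$, we have:
$$\mu_i(\rho') \leq \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(w')} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_{w'}) = \mathrm{nego}(\lambda)(w'),$$
and since $\lambda$ is an $\varepsilon$-fixed point of $\mathrm{nego}$:
$$\mu_i(\rho') \leq \lambda(w') + \varepsilon.$$
On the other hand, the play $\langle \bar\sigma_{\upharpoonright huh'w'} \rangle_{w'}$, which is a suffix of $\rho$, is $\lambda$-consistent, hence $\mu_i(\rho) \geq \lambda(w')$. Therefore, $\mu_i(\rho') \leq \mu_i(\rho) + \varepsilon$.
– If $huv$ is not a prefix of $\rho$: then, $\rho = h \langle \bar\sigma_{\upharpoonright hu} \rangle_u$. Since $u \in V_i$, we have:
$$\mathrm{nego}(\lambda)(u) = \sup_{uv' \in E} \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(v')} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_{v'}).$$
In particular, we have:
$$\inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(v)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_v) \leq \mathrm{nego}(\lambda)(u) \leq \lambda(u) + \varepsilon.$$
Then, for the same reason as above, we know that:
$$\mu_i(\rho') \leq \inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(v)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_v).$$
Finally, since the suffix $\langle \bar\sigma_{\upharpoonright hu} \rangle_u$ of $\rho$ is $\lambda$-consistent, we have $\mu_i(\rho) \geq \lambda(u) \geq \mathrm{nego}(\lambda)(u) - \varepsilon \geq \mu_i(\rho') - \varepsilon$.
The strategy profile $\bar\sigma$ is an $\varepsilon$-SPE.

APPENDIX C
PROOF OF LEMMA 2

Lemma 2.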
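Let $G_{\upharpoonright v_0}$ be a well-initialized prefix-independent game, and let $\varepsilon \geq 0$. Let $\bar\sigma$ be an $\varepsilon$-SPE in $G_{\upharpoonright v_0}$. Then, there exists an $\varepsilon$-fixed point $\lambda$ of the negotiation function such that for every history $hv$ starting in $v_0$, the play $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$ is $\lambda$-consistent.

Proof: Let us define the requirement $\lambda$ by, for each $i \in \Pi$ and $v \in V_i$:
$$\lambda(v) = \inf_{hv \in \mathrm{Hist}\, G_{\upharpoonright v_0}} \mu_i(\langle \bar\sigma_{\upharpoonright hv} \rangle_v).$$
Note that the set $\{ \mu_i(\langle \bar\sigma_{\upharpoonright hv} \rangle_v) \mid hv \in \mathrm{Hist}\, G_{\upharpoonright v_0} \}$ is never empty, since the game $G_{\upharpoonright v_0}$ is well-initialized.
Then, for every history $hv$ starting in $v_0$, the play $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$ is $\lambda$-consistent. Let us prove that $\lambda$ is an $\varepsilon$-fixed point of $\mathrm{nego}$: let $i \in \Pi$, let $v \in V_i$, and let us assume towards contradiction (since the negotiation function is non-decreasing) that $\mathrm{nego}(\lambda)(v) > \lambda(v) + \varepsilon$, that is to say:
$$\inf_{\bar\tau_{-i} \in \lambda\mathrm{Rat}(v)} \sup_{\tau_i} \mu_i(\langle \bar\tau \rangle_v) > \inf_{hv \in \mathrm{Hist}\, G_{\upharpoonright v_0}} \mu_i(\langle \bar\sigma_{\upharpoonright hv} \rangle_v) + \varepsilon.$$
Then, since all the plays generated by the strategy profile $\bar\sigma$ are $\lambda$-consistent, and therefore since any strategy profile of the form $\bar\sigma_{-i \upharpoonright hv}$ is $\lambda$-rational, we have:
$$\inf_{hv} \sup_{\tau_i} \mu_i(\langle \bar\sigma_{-i \upharpoonright hv}, \tau_i \rangle_v) > \inf_{hv} \mu_i(\langle \bar\sigma_{\upharpoonright hv} \rangle_v) + \varepsilon.$$
Therefore, there exists a history $hv$ such that:
$$\sup_{\tau_i} \mu_i(\langle \bar\sigma_{-i \upharpoonright hv}, \tau_i \rangle_v) > \mu_i(\langle \bar\sigma_{\upharpoonright hv} \rangle_v) + \varepsilon,$$
which is impossible if the strategy profile $\bar\sigma$ is an $\varepsilon$-SPE. Therefore, there is no such $v$, and the requirement $\lambda$ is an $\varepsilon$-fixed point of the negotiation function.

APPENDIX D
PROOF OF LEMMA 3

Lemma 3.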
Let $G$ be a game, and let $\varepsilon \geq 0$. The negotiation function has a least $\varepsilon$-fixed point.

Proof: The following proof is a generalization of a classical proof of Tarski's fixed point theorem.
Let $\Lambda$ be the set of the $\varepsilon$-fixed points of the negotiation function. The set $\Lambda$ is not empty, since it contains at least the requirement $v \mapsto +\infty$. Let $\lambda^*$ be the requirement defined by:
$$\lambda^* : v \mapsto \inf_{\lambda \in \Lambda} \lambda(v).$$
For every $\varepsilon$-fixed point $\lambda$ of the negotiation function, we then have, for each $v$, $\lambda^*(v) \leq \lambda(v)$, and $\mathrm{nego}(\lambda^*)(v) \leq \mathrm{nego}(\lambda)(v)$ since $\mathrm{nego}$ is monotone; and therefore, $\mathrm{nego}(\lambda^*)(v) \leq \lambda(v) + \varepsilon$.
As a consequence, we have:
$$\mathrm{nego}(\lambda^*)(v) \leq \inf_{\lambda \in \Lambda} \lambda(v) + \varepsilon = \lambda^*(v) + \varepsilon.$$
The requirement $\lambda^*$ is an $\varepsilon$-fixed point of the negotiation function, and is therefore the least $\varepsilon$-fixed point of the negotiation function.

APPENDIX E
ABSTRACT NEGOTIATION GAME
Definition 28 (Abstract negotiation game). Let $G_{\upharpoonright v_0}$ be an initialized game, let $i \in \Pi$, and let $\lambda$ be a requirement on $G$. The abstract negotiation game of $G_{\upharpoonright v_0}$ for player $i$ with requirement $\lambda$ is the two-player zero-sum initialized game:
$$\mathrm{Abs}_{\lambda i}(G)_{\upharpoonright [v_0]} = (\{\mathbb{P}, \mathbb{C}\}, S, (S_{\mathbb{P}}, S_{\mathbb{C}}), \Delta, \nu)_{\upharpoonright [v_0]},$$
where:
• $\mathbb{P}$ denotes the player Prover and $\mathbb{C}$ the player Challenger;
• the states of $S_{\mathbb{C}}$ are written $[\rho]$, where $\rho$ is a $\lambda$-consistent play in $G$;
• the states of $S_{\mathbb{P}}$ are written $[hwv]$, where $hwv$ is a history in $G$, with $w \in V_i$, or $[v]$ with $v \in V$, plus two additional states $\top$ and $\bot$;
• the set $\Delta$ contains the transitions of the form:
– $[hv][v\rho]$, where $[hv] \in S_{\mathbb{P}}$ and $[v\rho] \in S_{\mathbb{C}}$ (Prover proposes a play);
– $[\rho][\rho_0 \dots \rho_n v]$, where $[\rho] \in S_{\mathbb{C}}$, $n \in \mathbb{N}$, $\rho_n \in V_i$, and $v \neq \rho_{n+1}$ (Challenger makes player $i$ deviate);
– $[\rho]\top$, where $[\rho] \in S_{\mathbb{C}}$ (Challenger accepts the proposed play);
– $\top\top$ (the game is over);
– $[hv]\bot$ (Prover has no more play to propose);
– $\bot\bot$ (the game is over).
• $\nu$ is the outcome function defined by, for all $\rho^{(0)}, \rho^{(1)}, \dots$, $h^{(1)}v_1, h^{(2)}v_2, \dots$, $k$, $H$:
$$\nu_{\mathbb{C}}\left([v_0][\rho^{(0)}][h^{(1)}v_1][\rho^{(1)}] \dots [h^{(k)}v_k][\rho^{(k)}]\top^\omega\right) = \mu_i\left(h^{(1)} \dots h^{(k)} \rho^{(k)}\right),$$
$$\nu_{\mathbb{C}}\left([v_0][\rho^{(0)}][h^{(1)}v_1][\rho^{(1)}] \dots [h^{(n)}v_n][\rho^{(n)}] \dots\right) = \mu_i\left(h^{(1)} h^{(2)} \dots\right),$$
$$\nu_{\mathbb{C}}(H\bot^\omega) = +\infty,$$
and by $\nu_{\mathbb{P}} = -\nu_{\mathbb{C}}$.

Remark.
If the game $G$ is Borelian, then so is the game $\mathrm{Abs}_{\lambda i}(G)$.

Proposition 3.
Let $G_{\upharpoonright v_0}$ be an initialized Borelian game, let $\lambda$ be a requirement on $G$ and let $i \in \Pi$. Then, choosing Challenger as distinguished player, the value of the game $\mathrm{Abs}_{\lambda i}(G)_{\upharpoonright [v_0]}$ is equal to the quantity:
$$\inf_{\bar\sigma_{-i} \in \lambda\mathrm{Rat}(v_0)} \sup_{\sigma_i} \mu_i(\langle \bar\sigma \rangle_{v_0}).$$

Proof:
Let $\alpha \in \mathbb{R}$, and let us prove that the following statements are equivalent:
1) there exists a strategy $\tau_{\mathbb{P}}$ such that for every strategy $\tau_{\mathbb{C}}$, $\nu_{\mathbb{C}}(\langle \bar\tau \rangle_{[v_0]}) < \alpha$;
2) there exists a $\lambda$-rational strategy profile $\bar\sigma_{-i}$ in the game $G_{\upharpoonright v_0}$ such that for every strategy $\sigma_i$, we have $\mu_i(\langle \bar\sigma \rangle_{v_0}) < \alpha$.
• (1) implies (2).
Let $\tau_{\mathbb{P}}$ be such that for every strategy $\tau_{\mathbb{C}}$, $\nu_{\mathbb{C}}(\langle \bar\tau \rangle_{[v_0]}) < \alpha$.
In what follows, any history $h$ compatible with an already defined strategy profile $\bar\sigma_{-i}$ in $G_{\upharpoonright v_0}$ will be decomposed in:
$$h = v_0 h^{(0)} v_1 h^{(1)} \dots h^{(n-1)} v_n h^{(n)},$$
so that there exist plays $\rho^{(0)}, \dots, \rho^{(n-1)}, \eta$ and a history:
$$[v_0]\left[\rho^{(0)}\right]\left[v_0 h^{(0)} v_1\right] \dots \left[v_{n-1} h^{(n-1)} v_n\right]\left[v_n h^{(n)} \eta\right]$$
in the game $\mathrm{Abs}_{\lambda i}(G)$ compatible with $\tau_{\mathbb{P}}$: the existence and the uniqueness of that decomposition can be proved by induction. Intuitively, the history $h$ is cut into histories which are prefixes of plays that can be proposed by Prover.
Then, let us define inductively the strategy profile $\bar\sigma_{-i}$ by, for every $h$ such that $\bar\sigma_{-i}$ has been defined on the prefixes of $h$, and such that the last state of $h$ is not controlled by player $i$, $\bar\sigma_{-i}(h) = \eta_0$ with $\eta$ defined from $h$ as above. Let us prove that $\bar\sigma_{-i}$ is the desired strategy profile.
– The strategy profile $\bar\sigma_{-i}$ is $\lambda$-rational.
Let us define $\sigma_i$ so that for every history $hv$ compatible with $\bar\sigma_{-i}$, the play $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$ is $\lambda$-consistent. For any history:
$$h = v_0 h^{(0)} v_1 h^{(1)} \dots h^{(n-1)} v_n h^{(n)}$$
compatible with $\bar\sigma_{-i}$ and ending in $V_i$, let $\sigma_i(h) = \eta_0$ with $\eta$ corresponding to the decomposition of $h$, so that by induction:
$$\langle \bar\sigma_{\upharpoonright v_0 h^{(0)} v_1 h^{(1)} \dots h^{(n-1)} v_n} \rangle_{v_n} = v_n h^{(n)} \eta.$$
Let now $hv$ be a history in $G_{\upharpoonright v_0}$, and let us show that the play $\langle \bar\sigma_{\upharpoonright hv} \rangle_v$ is $\lambda$-consistent. If we decompose:
$$hv = v_0 h^{(0)} v_1 h^{(1)} \dots h^{(n-1)} v_n h^{(n)}$$
with the same definition of $\eta$ (note that the vertex $v$ is now included in the decomposition), then $\langle \bar\sigma_{\upharpoonright hv} \rangle_v = v\eta$, and by definition of the abstract negotiation game, $v_n h^{(n)} \eta$ is a $\lambda$-consistent play, and therefore so is $v\eta$.
– The strategy profile $\bar\sigma_{-i}$ keeps player $i$'s payoff under the value $\alpha$.
Let $\sigma_i$ be a strategy for player $i$, and let $\rho = \langle \bar\sigma \rangle_{v_0}$. We want to prove that $\mu_i(\rho) < \alpha$.
Let us define two finite or infinite sequences $\left(\rho^{(k)}\right)_{k \in K}$ and $\left(h^{(k)} v_k\right)_{k \in K}$, where $K = \{1, \dots, n\}$ or $K = \mathbb{N} \setminus \{0\}$, by setting $h^{(0)}$ equal to the empty history, and for every $k \in K$:
$$\left[\rho^{(k)}\right] = \tau_{\mathbb{P}}\left([v_0]\left[\rho^{(0)}\right] \dots \left[\rho^{(k-1)}\right]\left[h^{(k)} v_k\right]\right)$$
and so that for every $k$, the history $h^{(k)} v_k$ is the shortest prefix of $\rho$ that is not a prefix of $h^{(1)} \dots h^{(k-1)} \rho^{(k-1)}$ (or equivalently, the history $h^{(k)}$ is the longest common prefix of $\rho$ and $h^{(1)} \dots h^{(k-1)} \rho^{(k-1)}$).
Then, the length of the longest common prefix of $h^{(1)} \dots h^{(k-1)} \rho^{(k)}$ and $\rho$ increases with $k$, and the set $K$ is finite if and only if there exists $n$ such that $h^{(1)} \dots h^{(n-1)} \rho^{(n)} = \rho$.
In the infinite case, let:
$$\chi = [v_0]\left[\rho^{(0)}\right]\left[h^{(1)} v_1\right] \dots \left[\rho^{(k)}\right]\left[h^{(k)} v_k\right] \dots$$
The play $\chi$ is compatible with $\tau_{\mathbb{P}}$, hence $\nu_{\mathbb{C}}(\chi) < \alpha$, that is to say $\mu_i\left(h^{(1)} h^{(2)} \dots\right) < \alpha$, i.e. $\mu_i(\rho) < \alpha$.
In the finite case, let:
$$\chi = [v_0]\left[\rho^{(0)}\right]\left[h^{(1)} v_1\right] \dots \left[\rho^{(n)}\right]\top^\omega.$$
For the same reason, $\nu_{\mathbb{C}}(\chi) < \alpha$, that is to say $\mu_i\left(h^{(1)} \dots h^{(n)} \rho^{(n)}\right) = \mu_i(\rho) < \alpha$.
• (2) implies (1).
Let $\bar\sigma_{-i}$ be a $\lambda$-rational strategy profile keeping player $i$'s payoff below $\alpha$. Then, let $\sigma_i$ be a strategy $\lambda$-rationalizing $\bar\sigma_{-i}$. Let us define a strategy $\tau_{\mathbb{P}}$ for Prover in the abstract negotiation game.
Let $H = [v_0]\left[\rho^{(0)}\right]\left[h^{(1)} v_1\right]\left[\rho^{(1)}\right] \dots \left[h^{(n)} v_n\right]$ be a history in the abstract game, ending in $S_{\mathbb{P}}$. Then, we define:
$$\tau_{\mathbb{P}}(H) = \left[\langle \bar\sigma_{\upharpoonright h^{(1)} \dots h^{(n)} v_n} \rangle_{v_n}\right].$$
If $H$ is a history ending in $\top$, then $\tau_{\mathbb{P}}(H) = \top$, and in the same way, if $H$ ends in $\bot$, then $\tau_{\mathbb{P}}(H) = \bot$.
Let us show that $\tau_{\mathbb{P}}$ is the strategy we were looking for. Let $\chi$ be a play compatible with $\tau_{\mathbb{P}}$, and let us note that the state $\bot$ does not appear in $\chi$. Then, the play $\chi$ can only have two forms:
– If $\chi = [v_0]\left[\rho^{(0)}\right]\left[h^{(1)} v_1\right] \dots \left[\rho^{(n)}\right]\top^\omega$, then we have:
$$\rho^{(n)} = \langle \bar\sigma_{\upharpoonright h^{(1)} \dots h^{(n)} v_n} \rangle_{v_n},$$
and the history $h^{(1)} \dots h^{(n)} v_n$ in the game $G_{\upharpoonright v_0}$ is compatible with $\bar\sigma_{-i}$. By hypothesis, we have $\mu_i\left(h^{(1)} \dots h^{(n)} \rho^{(n)}\right) < \alpha$, hence $\nu_{\mathbb{C}}(\chi) < \alpha$.
– If $\chi = [v_0]\left[\rho^{(0)}\right] \dots \left[h^{(n)} v_n\right]\left[\rho^{(n)}\right] \dots$, then the play $\rho = h^{(1)} h^{(2)} \dots$ is compatible with $\bar\sigma_{-i}$, and by hypothesis $\mu_i(\rho) < \alpha$, hence $\nu_{\mathbb{C}}(\chi) < \alpha$.

Remark.
Prover has a strategy to avoid $\bot$ if and only if $\lambda$ is satisfiable.

APPENDIX F
PROOF OF THEOREM 3

Theorem 3.
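Let $G_{\upharpoonright v_0}$ be an initialized prefix-independent Borelian game. Let $\lambda$ be a requirement and $i$ a player. Then, we have:
$$\mathrm{val}\left(\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}\right) = \inf_{\bar\sigma_{-i} \in \lambda\mathrm{Rat}(v_0)} \sup_{\sigma_i} \mu_i(\langle \bar\sigma \rangle_{v_0}).$$

Proof: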
First, let us define:
$$A = \left\{ \sup_{\sigma_i} \mu_i(\langle \bar\sigma \rangle_{v_0}) \;\middle|\; \bar\sigma_{-i} \in \lambda\mathrm{Rat}(v_0) \right\}$$
and:
$$B = \left\{ \sup_{\tau_{\mathbb{C}}} \nu_{\mathbb{C}}(\langle \bar\tau \rangle_{s_0}) \;\middle|\; \tau_{\mathbb{P}} \right\} \setminus \{+\infty\}.$$
We prove our point if we prove that $A = B$.
• $B \subseteq A$.
Let $\tau_{\mathbb{P}}$ be a strategy such that:
$$\sup_{\tau_{\mathbb{C}}} \nu_{\mathbb{C}}(\langle \bar\tau \rangle_{s_0}) < +\infty,$$
and let $\bar\sigma$ be the strategy profile defined by $\bar\sigma(\dot{H}) = w$ for every history $H$ compatible with $\tau_{\mathbb{P}}$ (by induction, the localized projection is injective on the histories compatible with $\tau_{\mathbb{P}}$) with $\tau_{\mathbb{P}}(H) = (vw, \cdot)$, and arbitrarily defined on any other histories.
– The strategy profile $\bar\sigma_{-i}$ is $\lambda$-rational, rationalized by the strategy $\sigma_i$.
Indeed, let us assume it is not. Then, there exists a history $h = h_0 \dots h_n$ in $G_{\upharpoonright v_0}$ compatible with $\bar\sigma_{-i}$ such that the play $\dot\rho = \langle \bar\sigma_{\upharpoonright h} \rangle_{h_n}$ is not $\lambda$-consistent. Then, let:
$$Hs = (h_0, M_0)(h_0 \bar\sigma(h_0), M_0) \dots (h_n, M_n)$$
be the only history in $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}$ compatible with $\tau_{\mathbb{P}}$ whose localized projection is $h$.
Let $\tau_{\mathbb{C}}$ be a strategy constructing the history $h$, defined by:
$$\tau_{\mathbb{C}}(H_0 \dots H_{k-1}) = H_k$$
for every $k$, and:
$$\tau_{\mathbb{C}}(H'(vw, M)) = (w, M \cup \{w\})$$
for any other history $H'(vw, M)$.
Then, the play $\eta = \langle \bar\tau \rangle_{s_0}$ contains finitely many deviations (Challenger stops the deviations after having drawn the history $h$), and the play $\dot\eta = h_0 \dots h_{n-1} \dot\rho$ is not $\lambda$-consistent, i.e. there exists a dimension $j \in \Pi$ such that:
$$\mu_j(\dot\eta) - \max_{v \in M_n \cap V_j} \lambda(v) < 0,$$
i.e. $\hat\mu_j(\eta) < 0$, and therefore $\nu_{\mathbb{C}}(\eta) = +\infty$, which is false by hypothesis.
– Now, let us prove the equality:
$$\sup_{\sigma'_i} \mu_i(\langle \bar\sigma_{-i}, \sigma'_i \rangle_{v_0}) = \sup_{\tau_{\mathbb{C}}} \nu_{\mathbb{C}}(\langle \bar\tau \rangle_{s_0}).$$
For that purpose, let us prove the equality of sets:
$$\{ \mu_i(\langle \bar\sigma_{-i}, \sigma'_i \rangle_{v_0}) \mid \sigma'_i \} = \{ \nu_{\mathbb{C}}(\langle \bar\tau \rangle_{s_0}) \mid \tau_{\mathbb{C}} \}.$$
∗ Let $\tau_{\mathbb{C}}$ be a strategy for Challenger, and let $\rho = \langle \bar\tau \rangle_{s_0}$. Since $\nu_{\mathbb{C}}(\rho) \neq +\infty$ by hypothesis, we have $\nu_{\mathbb{C}}(\rho) = \hat\mu_\star(\rho) = \mu_i(\dot\rho)$, which is an element of the left-hand set.
∗ Conversely, if $\sigma'_i$ is a strategy for player $i$ and if $\eta = \langle \bar\sigma_{-i}, \sigma'_i \rangle_{v_0}$, let $\tau_{\mathbb{C}}$ be a strategy such that for every $k$:
$$\tau_{\mathbb{C}}\left((\eta_0, \cdot)(\eta_0 \cdot, \cdot) \dots (\eta_k \cdot, \cdot)\right) = (\eta_{k+1}, \cdot),$$
i.e. a strategy forcing $\eta$, and let $\rho = \langle \bar\tau \rangle_{s_0}$. Then, since $\nu_{\mathbb{C}}(\rho) \neq +\infty$ by hypothesis on $\tau_{\mathbb{P}}$, we have $\mu_i(\eta) = \nu_{\mathbb{C}}(\rho)$, which is an element of the right-hand set.
• $A \subseteq B$.
Let $\bar\sigma_{-i}$ be a $\lambda$-rational strategy profile from $v_0$, rationalized by the strategy $\sigma_i$; let us define a strategy $\tau_{\mathbb{P}}$ by, for every history $H$ and for every $v \in V$:
$$\tau_{\mathbb{P}}(H(v, \cdot)) = \left(v \bar\sigma(\dot{H}v), \cdot\right).$$
As above, let us prove the equality:
$$\sup_{\sigma'_i} \mu_i(\langle \bar\sigma_{-i}, \sigma'_i \rangle_{v_0}) = \sup_{\tau_{\mathbb{C}}} \nu_{\mathbb{C}}(\langle \bar\tau \rangle_{s_0}).$$
For that purpose, let us prove the equality of sets:
$$\{ \mu_i(\langle \bar\sigma_{-i}, \sigma'_i \rangle_{v_0}) \mid \sigma'_i \} = \{ \nu_{\mathbb{C}}(\langle \bar\tau \rangle_{s_0}) \mid \tau_{\mathbb{C}} \}.$$
– Let $\tau_{\mathbb{C}}$ be a strategy for Challenger, and let $\rho = \langle \bar\tau \rangle_{s_0}$. If $\nu_{\mathbb{C}}(\rho) = +\infty$, then $\dot\rho$ is compatible with $\bar\sigma$ and not $\lambda$-consistent after finitely many steps, which is impossible. Therefore, $\nu_{\mathbb{C}}(\langle \bar\tau \rangle_{s_0}) \neq +\infty$, and as a consequence we have $\nu_{\mathbb{C}}(\rho) = \hat\mu_\star(\rho) = \mu_i(\dot\rho)$, which is an element of the left-hand set.
– Conversely, if $\sigma'_i$ is a strategy for player $i$ and if $\eta = \langle \bar\sigma_{-i}, \sigma'_i \rangle_{v_0}$, let $\tau_{\mathbb{C}}$ be a strategy such that for all $k$:
$$\tau_{\mathbb{C}}\left((\eta_0, \cdot)(\eta_0 \cdot, \cdot) \dots (\eta_k \cdot, \cdot)\right) = (\eta_{k+1}, \cdot),$$
i.e. a strategy forcing $\eta$, and let $\rho = \langle \bar\tau \rangle_{s_0}$. Then, either $\nu_{\mathbb{C}}(\rho) = +\infty$, in which case $\eta$ is not $\lambda$-consistent while being compatible with $\bar\sigma$ after finitely many steps, which is impossible; or $\mu_i(\eta) = \nu_{\mathbb{C}}(\rho)$, which is an element of the right-hand set.

APPENDIX G
PROOF OF LEMMA 4

Lemma 4.
Let $G$ be a mean-payoff-inf game, let $i$ be a player, let $\lambda$ be a requirement and let $\mathrm{Conc}_{\lambda i}(G)$ be the corresponding concrete negotiation game. There exists a memoryless strategy $\tau_{\mathbb{C}}$ such that for each state $s$:
$$\inf_{\tau_{\mathbb{P}}} \nu_{\mathbb{C}}(\langle \bar\tau \rangle_s) = \mathrm{val}\left(\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s}\right),$$
i.e. that is optimal for Challenger from every state.

Proof: The structure of this proof is inspired by the proof of Lemma 14 in [19].
Let $\alpha \in \mathbb{R}$, and let $\Phi$ be the set of the plays $\rho$ in $\mathrm{Conc}_{\lambda i}(G)$ such that:
• $\liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \left(-\hat\pi_\star(\rho_k \rho_{k+1})\right) \geq -\alpha$;
• and either:
– $\rho$ contains infinitely many deviations;
– or for each $j \in \Pi$, $\hat\mu_j(\rho) \geq 0$.
Note that the set of the plays $\rho$ such that $\mu_i(\dot\rho) \leq \alpha$ could be defined almost the same way, but with a limit superior instead of the limit inferior.
By [12], if Challenger can falsify the objective $\Phi$, he can falsify it with a memoryless strategy, provided that $\Phi$ is prefix-independent and convex.
Convex objectives are defined as follows: the objective $\Phi$ is convex if for all $\rho, \eta \in \Phi$ and for any decompositions:
$$\rho_0 \dots \rho_{k_1} \dots \rho_{k_2} \dots \quad \text{and} \quad \eta_0 \dots \eta_{\ell_1} \dots \eta_{\ell_2} \dots$$
such that:
$$\chi = \rho_0 \dots \rho_{k_1} \eta_0 \dots \eta_{\ell_1} \rho_{k_1+1} \dots \rho_{k_2} \eta_{\ell_1+1} \dots$$
is a play, we have $\chi \in \Phi$. Let us take two such plays with such decompositions, and let us prove that $\chi \in \Phi$.
Let us write $\Phi = \Psi \cap (X \cup \Xi)$, where:
• $\Psi$ is the set of the plays $\rho$ such that $\liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \left(-\hat\pi_\star(\rho_k \rho_{k+1})\right) \geq -\alpha$;
• $X$ is the set of the plays containing infinitely many deviations;
• $\Xi$ is the set of the plays $\rho$ such that for each $j \in \Pi$, $\hat\mu_j(\rho) \geq 0$.
As shown in [19], a mean-payoff-inf objective is convex: therefore, we can already say that $\chi \in \Psi$. Let us now prove that $\chi \in X \cup \Xi$.
• If $\rho \in X$ or $\eta \in X$. Then, $\chi$ contains the deviations of $\rho$ and $\eta$, hence $\chi \in X$.
• If $\rho, \eta \in \Xi$. Then, since mean-payoff-inf objectives are convex, $\chi \in \Xi$.
In both cases, $\chi \in X \cup \Xi$, so $\chi \in \Phi$: the objective $\Phi$ is convex.
Therefore, there exists a memoryless strategy $\tau_{\mathbb{C}}$ such that for every strategy $\tau_{\mathbb{P}}$, for each state $s$ from which Challenger has some strategy to falsify the objective $\Phi$, we have $\langle \bar\tau \rangle_s \notin \Phi$.
Let $s$ be a state from which Challenger can enforce an outcome $\nu_{\mathbb{C}}$ greater than $\alpha$. Then, since the limit inferior of a sequence is always less than or equal to its limit superior, Challenger can, from $s$, falsify the objective $\Phi$. Therefore, by definition of $\tau_{\mathbb{C}}$, for every strategy $\tau_{\mathbb{P}}$, we have $\langle \bar\tau \rangle_s \notin \Phi$. Let us prove that $\nu_{\mathbb{C}}(\langle \bar\tau \rangle_s) > \alpha$.
In other words, let us prove that for every infinite path $\rho$ from $s$ in the graph $\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]$, we have $\nu_{\mathbb{C}}(\rho) > \alpha$. Since $\rho \notin \Phi$, we have either $\rho \notin X \cup \Xi$ or $\rho \notin \Psi$. In the first case, we have $\nu_{\mathbb{C}}(\rho) = +\infty$, which ends the proof. In the second case, we have:
$$\limsup_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \hat\pi_\star(\rho_k \rho_{k+1}) > \alpha.$$
We want to prove that $\nu_{\mathbb{C}}(\rho) > \alpha$, that is, since we assume $\rho \in X \cup \Xi$:
$$\hat\mu_\star(\rho) = \liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \hat\pi_\star(\rho_k \rho_{k+1}) > \alpha.$$
Here, the play $\rho$ is an infinite path in the graph $\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]$: by the description of the possible outcomes in a mean-payoff game given in [9], the mean-payoff-inf $\hat\mu_\star(\rho)$ is then greater than or equal to the least mean-payoff-inf $\hat\mu_\star$ we get by looping on a simple cycle $c$ of that graph accessible from the state $s$: intuitively, a play can be seen as a combination of those cycles. That is to say:
$$\hat\mu_\star(\rho) \geq \min_{\substack{c \in \mathrm{SC}(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s}} \hat\mu_\star(c^\omega).$$
For each such cycle $c$, since $c^\omega$ is a play compatible with $\tau_{\mathbb{C}}$, we have:
$$\limsup_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \hat\pi_\star(c_k c_{k+1}) > \alpha,$$
where the indices are taken in $\mathbb{Z}/|c|\mathbb{Z}$, i.e.:
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \hat\pi_\star(c_k c_{k+1}) > \alpha,$$
and therefore:
$$\liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \hat\pi_\star(c_k c_{k+1}) > \alpha,$$
that is to say $\hat\mu_\star(c^\omega) > \alpha$, hence $\hat\mu_\star(\rho) > \alpha$.

APPENDIX H
PROOF OF LEMMA 5

Lemma 5.
Let $G_{\upharpoonright v_0}$ be an initialized mean-payoff-inf game, and let $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}$ be its concrete negotiation game for some $\lambda$ and some $i$.
Then, the value of the game $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}$ is given by the formula:
$$\max_{\tau_{\mathbb{C}} \in \mathrm{ML}_{\mathbb{C}}(\mathrm{Conc}_{\lambda i}(G))} \; \min_{\substack{K \in \mathrm{SConn}(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s_0}} \mathrm{opt}(K),$$
where $\mathrm{opt}(K)$ is the minimal value $\nu_{\mathbb{C}}(\rho)$ for $\rho$ among the infinite paths in $K$.
• If $K$ contains a deviation, then Prover can simply choose the simple cycle of $K$ that minimizes player $i$'s payoff:
$$\mathrm{opt}(K) = \min_{c \in \mathrm{SC}(K)} \hat\mu_\star(c^\omega).$$
• If $K$ does not contain a deviation, then Prover must choose a combination of the simple cycles of $K$ that minimizes player $i$'s payoff while keeping the non-main dimensions above 0:
$$\mathrm{opt}(K) = \min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega).$$

Proof:
By Lemma 4, there exists a memoryless strategy $\tau_{\mathbb{C}}$ which is optimal for Challenger among all his possible strategies.
It follows from Theorem 3 that the highest value player $i$ can get against a hostile $\lambda$-rational environment is the minimal payoff of Challenger in a path in the graph $\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]$. For any such path $\rho$, there exists a strongly connected component $K$ of $\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]$ accessible from $s_0$ such that after a finite number of steps, $\rho$ is a play in $K$. The least payoff of Challenger in such a path, for a given $K$, is $\mathrm{opt}(K)$; let us prove that it is given by the desired formula.
There are, then, two cases to distinguish:
• If there is at least a deviation in $K$.
Then, a play in $K$ can contain infinitely many deviations. Therefore, the outcomes $\nu_{\mathbb{C}}(\rho)$ of plays in $K$ are exactly the mean-payoff-infs $\hat\mu_\star(\rho)$ of plays in $K$, and possibly $+\infty$; and in particular, the lowest outcome Prover can get in $K$ is the quantity:
$$\min_{c \in \mathrm{SC}(K)} \hat\mu_\star(c^\omega),$$
the least value of a simple cycle in $K$.
• If there is no deviation in $K$.
Let us first introduce a notation: for any finite set $D$ and any set $X \subseteq \mathbb{R}^D$, $X^{\llcorner}$ denotes the set:
$$X^{\llcorner} = \left\{ \left( \min_{\bar y \in Y} y_d \right)_{d \in D} \;\middle|\; Y \subseteq X \text{ finite} \right\}.$$
For example, in $\mathbb{R}^2$, if $X$ is the blue area in Figure 14, then $X^{\llcorner}$ is the union of the blue area and the gray area.
Fig. 14. An example for the operator $\cdot^{\llcorner}$.
Let us already note that for all $X \subseteq \mathbb{R}^{\Pi \cup \{\star\}}$:
$$\min{}^{\star} X^{\llcorner} = \min \left\{ x_\star \;\middle|\; \bar x \in X^{\llcorner}, \forall j \in \Pi, x_j \geq 0 \right\}$$
$$= \min \left\{ \min_{\bar y \in Y} y_\star \;\middle|\; Y \subseteq X \text{ finite}, \forall \bar y \in Y, \forall j \in \Pi, y_j \geq 0 \right\}$$
$$= \min \left\{ y_\star \;\middle|\; \bar y \in X, \forall j \in \Pi, y_j \geq 0 \right\} = \min{}^{\star} X.$$
Then, it has been proved in [9] that the set of possible values of $\hat\mu(\rho)$ for all plays $\rho$ in $K$ is exactly the set:
$$X = \left( \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) \right)^{\llcorner}.$$
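As a small sanity check of the notation just introduced, the closure under componentwise minima and the fact that it leaves the constrained minimum of the main coordinate unchanged can be tried out on a finite set of vectors. The following Python sketch is only an illustration under stated assumptions: the function names `lower_closure` and `min_star`, the convention that coordinate 0 plays the role of $\star$, and the sample points are all hypothetical, and only finite sets and non-empty subsets are handled, whereas the operator in the text applies to arbitrary sets.

```python
from itertools import combinations


def lower_closure(points):
    """Componentwise minima of all non-empty subsets of a finite set of vectors.

    Assumption: `points` is a finite, non-empty collection of tuples of equal length.
    """
    points = list(points)
    dimension = len(points[0])
    closure = set()
    for size in range(1, len(points) + 1):
        for subset in combinations(points, size):
            closure.add(tuple(min(p[d] for p in subset) for d in range(dimension)))
    return closure


def min_star(points, main=0):
    """Least main coordinate among the vectors whose other coordinates are all >= 0.

    Returns +infinity when no vector satisfies the constraint.
    """
    feasible = [
        p[main]
        for p in points
        if all(x >= 0 for k, x in enumerate(p) if k != main)
    ]
    return min(feasible) if feasible else float("inf")


# Hypothetical sample: coordinate 0 is the main dimension, coordinate 1 a player j.
sample = {(3, 1), (1, -2), (2, 4)}
closed = lower_closure(sample)
```

On this sample, `closed` contains the three original vectors plus `(2, 1)`, the componentwise minimum of `(3, 1)` and `(2, 4)`; both `min_star(sample)` and `min_star(closed)` equal 2, in line with the chain of equalities above.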
Since all the plays in $K$ contain finitely many deviations (actually none), for every $\bar x = \hat\mu(\rho) \in X$, we have $\nu_{\mathbb{C}}(\rho) = +\infty$ if and only if there exists $j \in \Pi$ such that $x_j < 0$. Then, the lowest outcome Prover can get in $K$ is:
$$\min \{ x_\star \mid \bar x \in X, \forall j \in \Pi, x_j \geq 0 \},$$
that is to say:
$$\min{}^{\star} \left( \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) \right)^{\llcorner},$$
i.e. $\min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega)$.
Theorem 3 then yields the desired formula.

APPENDIX I
PROOF OF PROPOSITION 2

Proposition 2.
In mean-payoff-inf games, the negotiation function is Scott-continuous.

Proof:
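Let $(\lambda_n)_n$ be a non-decreasing sequence of requirements on a mean-payoff-inf game $G$, and let $\lambda = \sup_n \lambda_n$. We want to prove that $\mathrm{nego}(\lambda) = \sup_n \mathrm{nego}(\lambda_n)$.
Since the negotiation function is monotone, we already have $\mathrm{nego}(\lambda) \geq \sup_n \mathrm{nego}(\lambda_n)$. Let us prove that $\mathrm{nego}(\lambda) \leq \sup_n \mathrm{nego}(\lambda_n)$.
Let $\delta > 0$: we want to find $n$ such that $\mathrm{nego}(\lambda_n)(v) \geq \mathrm{nego}(\lambda)(v) - \delta$ for each $v \in V$.
Let:
$$\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0} = (\{\mathbb{P}, \mathbb{C}\}, S, (S_{\mathbb{P}}, S_{\mathbb{C}}), \Delta, \nu)_{\upharpoonright s_0}$$
be the concrete negotiation game of $G$ for $\lambda$ and player $i$ controlling $v$, and let:
$$\mathrm{Conc}_{\lambda_n i}(G)_{\upharpoonright s_0} = (\{\mathbb{P}, \mathbb{C}\}, S, (S_{\mathbb{P}}, S_{\mathbb{C}}), \Delta, \nu')_{\upharpoonright s_0}$$
be the concrete negotiation game of $G$ for some requirement $\lambda_n$ in $v$. Let us note that both have the same underlying graph, and that the only difference lies in the weight functions $\hat\pi$ and $\hat\pi'$, on the non-main dimensions.
By Lemma 5, we have:
$$\mathrm{nego}(\lambda)(v) = \max_{\tau_{\mathbb{C}} \in \mathrm{ML}_{\mathbb{C}}(\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0})} \; \min_{\substack{K \in \mathrm{SConn}(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s_0}} \mathrm{opt}(K)$$
with:
$$\mathrm{opt}(K) = \begin{cases} \min_{c \in \mathrm{SC}(K)} \hat\mu_\star(c^\omega) & \text{if } K \text{ contains a deviation,} \\ \min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) & \text{otherwise,} \end{cases}$$
and identically:
$$\mathrm{nego}(\lambda_n)(v) = \max_{\tau_{\mathbb{C}} \in \mathrm{ML}_{\mathbb{C}}(\mathrm{Conc}_{\lambda_n i}(G)_{\upharpoonright s_0})} \; \min_{\substack{K \in \mathrm{SConn}(\mathrm{Conc}_{\lambda_n i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s_0}} \mathrm{opt}'(K)$$
with:
$$\mathrm{opt}'(K) = \begin{cases} \min_{c \in \mathrm{SC}(K)} \hat\mu_\star(c^\omega) & \text{if } K \text{ contains a deviation,} \\ \min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu'(c^\omega) & \text{otherwise.} \end{cases}$$
Let $\tau_{\mathbb{C}}$ be a memoryless strategy for Challenger in the game $\mathrm{Conc}_{\lambda i}(G)_{\upharpoonright s_0}$; it can also be considered as a memoryless strategy in the game $\mathrm{Conc}_{\lambda_n i}(G)_{\upharpoonright s_0}$.
Let us now define:
$$\gamma_n = \sup_{v \in V} (\lambda(v) - \lambda_n(v)).$$
Then, the sequence $(\gamma_n)_n$ is non-increasing and converges to 0. Moreover, since the non-main weights are obtained by subtracting the requirement and $\lambda_n \leq \lambda$, for each transition $st \in \Delta$ and each $j \in \Pi$, we have:
$$\hat\pi'_j(st) \in [\hat\pi_j(st), \hat\pi_j(st) + \gamma_n].$$
Let:
$$\Gamma_n = \left\{ \bar x \in \mathbb{R}^{\Pi \cup \{\star\}} \;\middle|\; x_\star = 0 \text{ and } \forall j \in \Pi, x_j \in [0, \gamma_n] \right\}.$$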
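Then, let $K$ be a strongly connected component of the graph $\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]$, without deviation, accessible from $s_0$; we have:
$$\operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu'(c^\omega) \subseteq \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) + \Gamma_n.$$
Let $R = \left\{ \bar x \in \mathbb{R}^{\Pi \cup \{\star\}} \mid \forall j \in \Pi, x_j \geq 0 \right\}$.
• If $\operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) \cap R = \emptyset$: since $\operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega)$ and $R$ are closed sets, if $\gamma_n$ is small enough, we have $\operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu'(c^\omega) \cap R = \emptyset$. Therefore, if:
$$\min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) = +\infty,$$
then, for $n$ great enough:
$$\min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu'(c^\omega) = +\infty.$$
• Otherwise, we have:
$$\min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu'(c^\omega) \geq \min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) - \gamma_n \max_{c, d \in \mathrm{SC}(K)} \sum_{\substack{j \in \Pi, \\ \hat\mu_j(c^\omega) > \hat\mu_j(d^\omega)}} \frac{\hat\mu_\star(c^\omega) - \hat\mu_\star(d^\omega)}{\hat\mu_j(c^\omega) - \hat\mu_j(d^\omega)},$$
and if $\gamma_n$ is small enough, we have:
$$\min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu'(c^\omega) \geq \min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) - \delta.$$
In both cases, we find that there exists $\gamma_n$ small enough, i.e. $n$ great enough, to ensure:
$$\min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu'(c^\omega) \geq \min{}^{\star} \operatorname{Conv}_{c \in \mathrm{SC}(K)} \hat\mu(c^\omega) - \delta.$$
We can find such $n$ for each strongly connected component $K$ without deviation, and there exists a finite number of such components. Moreover, when $K$ is a strongly connected component with a deviation, the quantity:
$$\min_{c \in \mathrm{SC}(K)} \hat\mu_\star(c^\omega)$$
is the same in $\mathrm{Conc}_{\lambda i}(G)$ and in $\mathrm{Conc}_{\lambda_n i}(G)$. Therefore, there exists $n \in \mathbb{N}$ such that:
$$\min_{\substack{K \in \mathrm{SConn}(\mathrm{Conc}_{\lambda_n i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s_0}} \mathrm{opt}'(K) \geq \min_{\substack{K \in \mathrm{SConn}(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]) \\ \text{accessible from } s_0}} \mathrm{opt}(K) - \delta.$$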
We find such $n$ for every memoryless strategy $\tau_{\mathbb{C}}$, and there exists a finite number of such strategies. Therefore, there exists $n \in \mathbb{N}$ such that:
$$\mathrm{nego}(\lambda_n)(v) \geq \mathrm{nego}(\lambda)(v) - \delta.$$
Finally, since there are finitely many states $v \in V$, we can conclude that there exists $n \in \mathbb{N}$ such that for each $v \in V$, we have:
$$\mathrm{nego}(\lambda_n)(v) \geq \mathrm{nego}(\lambda)(v) - \delta.$$
The negotiation function is Scott-continuous.

APPENDIX J
PROOF OF THEOREM 4

Theorem 4.
Let $G{\upharpoonright}_v$ be an initialized mean-payoff-inf game. Let us identify any requirement $\lambda$ on $G$ with finite values with the tuple $\bar{\lambda} = (\lambda(v))_{v \in V}$, an element of the finite-dimensional vector space $\mathbb{R}^V$.

Then, for each player $i$ and each vertex $v \in V_i$, the quantity $\mathrm{nego}(\lambda)(v)$ is a piecewise linear function of $\bar{\lambda}$, which can be effectively expressed and whose fixed points are computable.

Proof: By Lemma 5, we have the formula:
\[ \mathrm{nego}(\lambda)(v) = \max_{\tau_{\mathbb{C}} \in \mathrm{ML}_{\mathbb{C}}\left(\mathrm{Conc}_{\lambda i}(G)\right)} \ \min_{\substack{K \in \mathrm{SConn}\left(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]\right) \\ \text{accessible from } (v, \{v\})}} \mathrm{opt}(K). \]
Let $\tau_{\mathbb{C}}$ be a memoryless strategy realizing the maximum above, and let $K$ be a strongly connected component realizing the minimum above. Let us prove that the quantity:
\[ \mathrm{opt}(K) = \begin{cases} \min_{c \in \mathrm{SC}(K)} \hat{\mu}_\star(c^\omega) & \text{if } K \text{ contains a deviation,} \\ \min^{\star} \mathrm{Conv}_{c \in \mathrm{SC}(K)} \hat{\mu}(c^\omega) & \text{otherwise} \end{cases} \]
is the desired piecewise linear function of $\lambda$.

When $K$ contains a deviation, the quantity:
\[ \min_{c \in \mathrm{SC}(K)} \hat{\mu}_\star(c^\omega) \]
is independent of $\lambda$, and the result is then immediate. Let us assume that $K$ does not contain any deviation, and let us prove that the quantity:
\[ f(\lambda) = \min^{\star} \mathrm{Conv}_{c \in \mathrm{SC}(K)} \hat{\mu}(c^\omega) \]
is a piecewise linear function of $\lambda$.

Let $M$ be the common memory of the states of $K$ (it is common to all of them since $K$ does not contain deviations). We know that for each $j \in \Pi$ and for every cycle $c \in \mathrm{SC}(K)$, we have:
\[ \hat{\mu}_j(c^\omega) = \mu_j(\dot{c}^\omega) - \max_{v \in V_j \cap M} \lambda(v). \]
Let $C = \{ \dot{c} \mid c \in \mathrm{SC}(K) \}$. Since there is no deviation in $K$, any cycle in $C$ is a simple cycle of $G$. Then, the quantity $f(\lambda)$ is the minimal $x_i$ for $\bar{x}$ in the set:
\[ X = \mathrm{Conv}_{c \in C} \mu(c^\omega) \cap \bigcap_{v \in M} \left\{ \bar{x} \ \middle|\ x_j \geq \lambda(v) \text{ with } v \in V_j \right\}. \]
The set $X$ is a polyhedron: therefore, there exists a vertex $\bar{x}$ of that polyhedron which minimizes $x_i$ for $\bar{x} \in X$. That vertex is the intersection between a face of the greater polyhedron $P = \mathrm{Conv}_{c \in C} \mu(c^\omega)$, and some of the hyperplanes $H_v$ (possibly zero), defined as the hyperplanes of equation $x_j = \lambda(v)$ for $j \in \Pi$ controlling $v$, such that $\lambda(v) = \max_{w \in M \cap V_j} \lambda(w)$.

Example . With three cycles and two players against player $i$, each controlling one vertex $v$ such that $\lambda(v) = 0$, the vertex $\bar{x}$ is the red point in Figure 15 and Figure 16.

[Fig. 15: the intersection between a face of $P$ and zero hyperplanes.]

[Fig. 16: the intersection between a face of $P$ and two hyperplanes.]

The set of vertices of the polyhedron $X$ is included in the finite set:
\[ Y = \left\{ \bar{y} \in \mathbb{R}^{\Pi \cup \{\star\}} \ \middle|\ \begin{array}{l} \exists W \subseteq M, \ \exists D \subseteq C, \ \mathrm{Conv}_{c \in D} \mu(c^\omega) \cap \bigcap_{w \in W} H_w = \{\bar{y}\} \\ \text{and } \forall j, \ \forall v \in M \cap V_j, \ y_j \geq \lambda(v) \end{array} \right\}, \]
where the tuple $\bar{y}$ corresponding to the sets $W$ and $D$ is the intersection of the face of $P$ delimited by the values of the cycles of $D$, and the hyperplanes $H_v$ for $v \in W$. That set $Y$ is itself included in $X$.

We have, therefore:
\[ \min^{\star} \mathrm{Conv}_{c \in C} \mu(c^\omega) = \min_{\bar{y} \in Y} y_i. \]
Let now $\bar{y} \in Y$, and let $D \subseteq C$ and $W \subseteq M$ be the corresponding sets.

Let us choose $D$ and $W$ minimal, so that each player $j \in \Pi$ controls at most one state $w \in W$, and so that there exists only one decomposition:
\[ \bar{y} = \sum_{c \in D} \alpha_c \mu(c^\omega) \]
with $\alpha_c > 0$ for all $c$, and $\sum_c \alpha_c = 1$.

Furthermore, $\bar{y}$ is the only such solution of the system of equations:
\[ \forall j \in \Pi, \ \forall w \in W \cap V_j, \quad y_j = \lambda(w). \]
Therefore, $\bar{\alpha} = (\alpha_c)_{c \in D}$ is the only solution of the system:
\[ \begin{cases} \sum_{c \in D} \alpha_c = 1 \\ \forall j \in \Pi, \ \forall w \in W \cap V_j, \ \sum_{c \in D} \alpha_c \mu_j(c^\omega) = \lambda(w) \\ \forall c \in D, \ \alpha_c > 0. \end{cases} \]
Then, if $\oplus$ is a symbol and $A_{WD}$ is the matrix:
\[ A_{WD} = \left( \begin{cases} 1 & \text{if } w = \oplus \\ \mu_j(c^\omega) & \text{else, with } w \in V_j \end{cases} \right)_{w \in W \cup \{\oplus\}, \ c \in D} \]
then $A_{WD}$ is invertible and:
\[ \bar{\alpha} = A_{WD}^{-1} \left( \begin{cases} 1 & \text{if } w = \oplus \\ \lambda(w) & \text{otherwise} \end{cases} \right)_{w \in W \cup \{\oplus\}}, \]
with $\alpha_c > 0$ for all $c \in D$.

Let us write:
\[ \bar{\beta}^{\lambda}_W = \left( \begin{cases} 1 & \text{if } w = \oplus \\ \lambda(w) & \text{otherwise} \end{cases} \right)_{w \in W \cup \{\oplus\}}. \]
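We have, thus, $\bar{\alpha} = A_{WD}^{-1} \bar{\beta}^{\lambda}_W$.

Let us write, for each player $j$, $\bar{\gamma}^j_D = (\mu_j(c^\omega))_{c \in D}$. Then, we can write:
\[ y_i = \sum_c \alpha_c \mu_i(c^\omega) = {}^t\bar{\gamma}^i_D \, \bar{\alpha} = {}^t\bar{\gamma}^i_D \, A_{WD}^{-1} \, \bar{\beta}^{\lambda}_W. \]
Finally, if we write:
\[ B_W = \left( \begin{cases} 1 & \text{if } w = v \\ 0 & \text{otherwise} \end{cases} \right)_{w \in W \cup \{\oplus\}, \ v \in V} \]
and:
\[ \bar{\delta}_W = \left( \begin{cases} 1 & \text{if } w = \oplus \\ 0 & \text{otherwise} \end{cases} \right)_{w \in W \cup \{\oplus\}} \]
we have $\bar{\beta}^{\lambda}_W = B_W \bar{\lambda} + \bar{\delta}_W$, and therefore:
\[ y_i = {}^t\bar{\gamma}^i_D \, A_{WD}^{-1} \left( B_W \bar{\lambda} + \bar{\delta}_W \right). \]
Conversely, the tuple $\bar{y}$ defined by, for each $j \in \Pi$,
\[ y_j = {}^t\bar{\gamma}^j_D \, A_{WD}^{-1} \left( B_W \bar{\lambda} + \bar{\delta}_W \right) \]
for given $W \subseteq M$ and $D \subseteq C$, is an element of the set $Y$ if and only if:

• the intersection $\mathrm{Conv}_{c \in D} \mu(c^\omega) \cap \bigcap_{w \in W} H_w$ is a singleton, i.e. the matrix $A_{WD}$ is invertible (otherwise the matrix $A_{WD}^{-1}$ is not defined);

• $\bar{y} \in \mathrm{Conv}_{c \in D} \mu(c^\omega)$, i.e. the tuple $\bar{\alpha} = A_{WD}^{-1} \left( B_W \bar{\lambda} + \bar{\delta}_W \right)$ has only non-negative coordinates (actually positive if $D$ is minimal);

• for each player $j$, for each vertex $v \in M \cap V_j$, we have $y_j \geq \lambda(v)$, i.e. ${}^t\bar{\gamma}^j_D \, A_{WD}^{-1} \left( B_W \bar{\lambda} + \bar{\delta}_W \right) \geq \lambda(v)$.

Hence the formula:
\[ \mathrm{nego}(\lambda)(v) = \max_{\tau_{\mathbb{C}} \in \mathrm{ML}\left(\mathrm{Conc}_{\lambda i}(G)\right)} \ \min_{\substack{K \in \mathrm{SConn}\left(\mathrm{Conc}_{\lambda i}(G)[\tau_{\mathbb{C}}]\right) \\ \text{accessible from } (v, \{v\})}} \begin{cases} \min_{c \in \mathrm{SC}(K)} \hat{\mu}_\star(c^\omega) & \text{if } K \text{ contains a deviation,} \\ \min S_K & \text{otherwise,} \end{cases} \]
where $S_K$ is the set of real numbers of the form:
\[ {}^t\bar{\gamma}^i_D \, A_{WD}^{-1} \left( B_W \bar{\lambda} + \bar{\delta}_W \right) \]
such that:

• $W \subseteq M_K$;

• $D \subseteq C_K$;

• the matrix $A_{WD}$ is invertible;

• the tuple $A_{WD}^{-1} \left( B_W \bar{\lambda} + \bar{\delta}_W \right)$ has only positive coordinates;

• and for each $j \in \Pi$, for each $v \in M_K \cap V_j$, we have ${}^t\bar{\gamma}^j_D \, A_{WD}^{-1} \left( B_W \bar{\lambda} + \bar{\delta}_W \right) \geq \lambda(v)$.

This is, indeed, the expression of a piecewise linear function.

APPENDIX K
SOME EXAMPLES OF NEGOTIATION SEQUENCES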
We gather in this section some examples that may interest the reader who wants a complete overview of the behaviour of the negotiation function on mean-payoff-inf games. For all of them, we computed the negotiation sequence, as defined in Section IV-E. For some of them, we only give the negotiation sequence; for the most important ones, we give a complete explanation of how we computed it, using the abstract negotiation game, as defined in Appendix E.
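Before turning to the examples, the linear expression obtained in the proof of Theorem 4 can be illustrated on a toy instance. The sketch below is only an illustration under stated assumptions: the two cycles, their payoffs `mu_j` and `mu_i`, and the single constrained vertex `w` (controlled by player $j$) are hypothetical values that do not come from any game of the paper, and the helper names `solve` and `vertex_payoff` are ours. It builds the matrix $A_{WD}$ (one row for the symbol $\oplus$, one row for $w$), solves $A_{WD} \bar{\alpha} = \bar{\beta}^{\lambda}_W$ exactly, and returns $y_i = {}^t\bar{\gamma}^i_D \bar{\alpha}$.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination.

    The matrix is assumed to be invertible, as A_WD is in the proof.
    """
    n = len(matrix)
    rows = [[Fraction(x) for x in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def vertex_payoff(mu_j, mu_i, lam_w):
    """Return (alpha, y_i) for two cycles and one constrained vertex w.

    mu_j, mu_i: hypothetical payoffs of players j and i on each cycle of D.
    lam_w: hypothetical requirement lambda(w) of the vertex w in W.
    """
    a_wd = [[1] * len(mu_j), list(mu_j)]  # row "oplus", then row w
    beta = [1, lam_w]                     # the vector beta^lambda_W
    alpha = solve(a_wd, beta)
    y_i = sum(a * m for a, m in zip(alpha, mu_i))
    return alpha, y_i


alpha, y_i = vertex_payoff([0, 2], [3, 1], 1)
print(alpha, y_i)
```

On this toy instance the result depends linearly on the requirement: $y_i = 3 - \lambda(w)$ as long as both coefficients of $\bar{\alpha}$ stay positive, which is the kind of linear piece that the theorem describes.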
Example . Let us take again the game of Example 2: let us give (in red) the values of $\lambda_1 = \mathrm{nego}(\lambda_0)$, which correspond to the antagonistic values.

[Figure: the game of Example 2, on the states a, b, c, d, with the requirement $\lambda_1$.]

At the second step, let us execute the abstract game on the state $a$, with the requirement $\lambda_1$: whatever Prover proposes at first, Challenger has the possibility to deviate and to reach the state $b$. Then, Prover has to propose a $\lambda_1$-consistent play from the state $b$, i.e. a play in which player gets at least the payoff : such a play necessarily ends in the state $d$, and gives player the payoff .

The other states keep the same values.

[Figure: the same game with the requirement $\lambda_2$.]

But then, at the third step, from the state $b$: whatever Prover proposes at first, Challenger can deviate to reach the state $a$. Then, Prover has to propose a $\lambda_2$-consistent play from $a$, i.e. a play in which player gets at least the payoff : such a play necessarily ends in the state $d$, i.e. after possibly some prefix, Prover proposes the play $abd^\omega$. But then, Challenger can always deviate to go back to the state $a$; and the play which is thus created is $(ab)^\omega$, which gives player the payoff .

[Figure: the same game with the requirement $\lambda_3$.]

Finally, from the states $a$ and $b$, there exists no $\lambda_3$-consistent play, and therefore no $\lambda_3$-rational strategy profile.

[Figure: the same game with the requirement $\lambda_4$, equal to $+\infty$ in the states $a$ and $b$.]

And for all $n \geq 4$, $\lambda_n = \lambda_4$.

Example . In this example, we show a game that can be turned into a family of games, where the negotiation function needs as many steps as there are states to reach its limit: when the requirement changes in some state, it opens new possibilities from the neighbour states, and so on.
[Figures: the game on the states a, b, c, d, e, at each step of the negotiation sequence. The requirement values on (a, b, c, d, e) at the successive steps are (1, 0, 0, 0, 2), (1, 1, 0, 2, 2), (1, 1, 2, 2, 2), (1, 2, 2, 2, 2) and (2, 2, 2, 2, 2).]

The last requirement is a fixed point of the negotiation function.

Example . In all the previous examples, all the games whose underlying graphs were strongly connected contained SPEs. Here is an example of a game with a strongly connected underlying graph that does not contain SPEs.

[Figures: the game on the states a, b, c, d, e, f, at the successive steps of the negotiation sequence; at the last step, the requirement is $+\infty$ in every state.]

Example . This example shows how a new requirement can emerge from the combination of several cycles.

Let $G$ be the following game:

[Figure: the game $G$ on the states a, b, c, d, e, f, g, with the requirement $\lambda_1$.]

At the first step, the requirement $\lambda_1$ captures the antagonistic values.

Then, from the state $c$, if player forces the access to the state $b$, then player must get at least : the worst play that can be proposed to player is then $(babc)^\omega$, which gives player the payoff .

From the state $f$, if player forces the access to the state $g$, then the worst play that can be proposed to them is $g^\omega$.

[Figure: the game $G$ with the updated requirement.]

Then, from the state $d$, if player forces the access to the state $c$, then player must get at least : the worst play that can be proposed to player is then $(cccd)^\omega$, which gives player the payoff .

At the same time, from the state $e$, player can now force the access to the state $f$: then, the worst play that can be proposed to them is $fg^\omega$.

[Figure: the game $G$ with the updated requirement.]
But then, from the state $c$, player can now force the access to the state $e$: then, the worst play that can be proposed to them is $efg^\omega$.

[Figure: the game $G$ with the updated requirement.]

And finally, from that point, if from the state $d$ player forces the access to the state $c$, then player must have at least the payoff ; and therefore, the worst play that can be proposed to player is now $(ccd)^\omega$, which gives them the payoff .

[Figure: the game $G$ with the updated requirement.]

The requirement reached at this step is a fixed point of the negotiation function.

Example . We give here the details of the example given in Figure 15, in which the negotiation sequence was not stationary, and we provide a similar example with only two players.

[Figure: the game on the states a, b, c, d, e, f.]
For each edge, the weights are given in the following order: player first, player second, player third. Since all the weights are equal to , for all $n > 0$, we have $\lambda_n(d) = \lambda_n(f) = 0$. It follows that for all $n > 0$, we also have $\lambda_n(c) = \lambda_n(e) = 0$. Moreover, by symmetry of the game, we always have $\lambda_n(a) = \lambda_n(b)$. Therefore, to compute the negotiation sequence, it suffices to compute $\lambda_{n+1}(a)$ as a function of $\lambda_n(b)$, knowing that $\lambda_1(a) = \lambda_1(b) = 1$, and therefore that for all $n > 0$, $\lambda_n(a) = \lambda_n(b) \geq 1$.

From $a$, the worst play player could propose to player would be a combination of the cycles $cd$ and $d$ giving her exactly . But then, player will deviate to go to $b$, from which if player proposes plays in the strongly connected component containing $c$ and $d$, then player will always deviate and generate the play $(ab)^\omega$, and then get the payoff . Then, in order to give her a payoff lower than , player has to go to the state $e$. Since player does not control any state in that strongly connected component, the play he will propose will be accepted: he will then propose the worst possible combination of the cycles $ef$ and $f$ for player , such that he gets at least his requirement $\lambda_n(b)$. The payoff $\lambda_{n+1}(a)$ is then the maximal solution of the system:
\[ \begin{cases} \lambda_{n+1}(a) = x + 2(1 - x) \\ 2(1 - x) \geq \lambda_n(b) \\ 0 \leq x \leq 1 \end{cases} \]
that is to say $\lambda_{n+1}(a) = 1 + \frac{\lambda_n(b)}{2} = 1 + \frac{\lambda_n(a)}{2}$, and by induction, for all $n > 0$:
\[ \lambda_n(a) = \lambda_n(b) = 2 - \frac{1}{2^{n-1}} \]
which tends to $2$ but never reaches it.

That example could lead us to think that three players are needed to observe such a phenomenon. Actually, the existence of a player for whom all the plays are equivalent was useful to build a not too complicated example, but not necessary. Here is a variant of that example with only two players, slightly less intuitive, but where the sequences $(\lambda_n(a))_n$ and $(\lambda_n(b))_n$ are the same as previously:

[Figure: the two-player variant of the game, on the states a, b, c, d, e, f.]
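The recurrence $\lambda_{n+1}(a) = 1 + \lambda_n(a)/2$ of the three-player example can be checked numerically. The following sketch is a minimal illustration, assuming that the sequence is indexed from $n = 1$ with $\lambda_1(a) = 1$; the function names `negotiation_step` and `negotiation_sequence` are ours and only model the value in the state $a$, not the whole negotiation function. Exact rational arithmetic is used so that the comparison with the limit is not blurred by rounding.

```python
from fractions import Fraction


def negotiation_step(lam):
    """One step of the recurrence lambda_{n+1}(a) = 1 + lambda_n(a) / 2."""
    return 1 + Fraction(lam) / 2


def negotiation_sequence(steps, start=Fraction(1)):
    """Return [lambda_1(a), ..., lambda_steps(a)], assuming lambda_1(a) = start."""
    values = [start]
    for _ in range(steps - 1):
        values.append(negotiation_step(values[-1]))
    return values


for n, value in enumerate(negotiation_sequence(6), start=1):
    # Each term matches the closed form and stays strictly below the limit 2.
    assert value == 2 - Fraction(1, 2 ** (n - 1))
    assert value < 2
    print(n, value)
```

Every term is strictly below 2 while the gap to 2 is halved at each step, which is why the negotiation sequence of this game is not stationary.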