Nash Equilibria in Finite-Horizon Multiagent Concurrent Games
Senthil Rajasekaran
Computer Science Department, Rice University
[email protected]
Moshe Y. Vardi
Computer Science Department, Rice University
[email protected]
ABSTRACT
The problem of finding pure strategy Nash equilibria in multiagent concurrent games with finite-horizon temporal goals has received some recent attention. Earlier work solved this problem through the use of Rabin automata. In this work, we take advantage of the finite-horizon nature of the agents' goals and show that checking for and finding pure strategy Nash equilibria can be done using a combination of safety games and lasso testing in Büchi automata. To separate strategic reasoning from temporal reasoning, we model agents' goals by deterministic finite-word automata (DFAs), since finite-horizon logics such as LTLf and LDLf are reasoned about through conversion to equivalent DFAs. This allows us to characterize the complexity of the problem as PSPACE-complete.

ACM Reference Format:
Senthil Rajasekaran and Moshe Y. Vardi. 2021. Nash Equilibria in Finite-Horizon Multiagent Concurrent Games. In
Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3–7, 2021, IFAAMAS, 11 pages.
INTRODUCTION
Game theory provides a powerful framework for modeling problems in system design and verification [9, 14, 28]. In particular, two-player games have been used in synthesis problems for temporal logics [24]. In these games, one player takes on the role of the system that tries to realize a property and the other takes on the role of the environment that tries to falsify the property. Within the scope of multiplayer games, two-player zero-sum games are the easiest to analyze, since they are purely adversarial – there is no reason for either player to do anything but maximize their own utility at the expense of the other. When there are multiple agents with multiple goals, pure antagonism is not a reasonable assumption [30].
Concurrent games are a fundamental model of such multiagent systems [1, 20].
Iterated Boolean Games (iBG) [10] are a restriction of concurrent games introduced in part to generalize temporal synthesis problems to the multiagent setting. In an iBG, each agent has a temporal goal, usually expressed in
Linear Time Temporal Logic (LTL) [23], and is given control over a unique set of boolean variables. At each time step, the agents collectively decide a setting of all boolean variables by individually and concurrently assigning values to their own variables. This creates an infinite sequence of boolean assignments (a trace) that is used to determine which goals are satisfied
(Work supported in part by NSF grants IIS-1527668, CCF-1704883, IIS-1830549, and an award from the Maryland Procurement Office.)
and which are not [10]. In this paper, we generalize the iBG formalism slightly to admit arbitrary finite alphabets rather than just truth assignments to boolean variables, as discussed below. The concept of the
Nash Equilibrium [22] is widely accepted as an important notion of a solution in multiagent games and represents a situation where agents cannot improve their outcomes unilaterally. In this paper we consider deterministic agents, and therefore the notion of a Nash equilibrium in this paper is that of a pure strategy Nash equilibrium [25]. This definition has a natural analogue when iBGs are considered, so finding Nash equilibria in iBGs is an effective way to reason about temporal interactions between multiple agents [10]. This problem has received attention in the literature when the goals are derived from infinite-horizon logics such as LTL [5, 11]. There are, however, interactions that are better modeled by finite-horizon goals, especially when notions such as "completion" are considered [7]. In such settings, it is more effective to reason about goals that can be completed in some finite but perhaps unbounded number of steps. Thus, while the agents still create an infinite trace with their decisions, satisfaction occurs at a finite time index. With this modification in mind, the analogous problem for finite-horizon temporal logics has recently begun to receive attention [12]. The main result of [12] is that automated equilibrium analysis of finite-horizon goals in iterated Boolean games can be done via reasoning about automata on infinite words, specifically,
Rabin automata. Here we address a more abstract version of the multiagent finite-horizon temporal-equilibrium problem by analyzing concurrent iterated games in which each agent is given their own
Deterministic Finite Word Automaton (DFA) goal. The reason for this is twofold. First, essentially all finite-horizon temporal logics are reasoned about through conversion to equivalent DFAs, including the popular logics LTLf and LDLf [6, 7]. Thus, using DFA goals offers us a general way of dealing with a variety of temporal formalisms. Furthermore, using DFA goals enables us to separate the complexity of temporal reasoning from the complexity of strategic reasoning. Our focus on DFAs also ties in to a growing interest in DFAs as graphical models that can be reasoned about directly in a number of related fields; see [13, 19, 31] for a few examples in the context of machine learning.

Our modelling of this problem is done from the viewpoint of a system planner. Specifically, when given a system in which multiple agents have DFA goals, we query a subset W of "good" agents to see if there is a Nash equilibrium in which only the agents in W are able to satisfy their goals. By the definition of the Nash equilibrium, this means that agents not within W, which we consider as "bad" agents, are unable to unilaterally change their strategy and satisfy their own "bad" goal. In doing so we can naturally incorporate malicious agents with goals contrary to the planner's by specifying a set W that does not contain such agents. This study of teams of cooperating agents has clear parallels to earlier work in rational synthesis [5, 16].

Our main result is that automated temporal-equilibrium analysis is PSPACE-complete. We prove that the problem of identifying sets of players that admit Nash equilibria in concurrent multiagent games with DFA goals can be solved using rather simple constructions. Specifically, our algorithm works by first solving a safety game for each agent in the game and then considers nonemptiness in a Büchi word automaton constructed with respect to the set W of agents, which can be done in PSPACE.
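The Büchi nonemptiness test in this second phase amounts to a lasso search: the language of a Büchi automaton is nonempty iff some accepting state is reachable from the initial state and lies on a cycle. The following is only an illustrative sketch (not the paper's implementation), assuming the automaton's transition graph is given explicitly as a successor relation:

```python
def buchi_nonempty(init, succ, accepting):
    """Lasso test: L(A) is nonempty iff some accepting state is reachable
    from init and can reach itself again (i.e., lies on a cycle).
    succ maps a state to the set of its successor states; the letters are
    elided, since only the graph matters for nonemptiness."""
    # forward reachability from the initial state
    reach, stack = {init}, [init]
    while stack:
        for r in succ.get(stack.pop(), ()):
            if r not in reach:
                reach.add(r)
                stack.append(r)

    def on_cycle(q):
        # is q reachable from q by a nonempty path?
        seen, stack = set(), list(succ.get(q, ()))
        while stack:
            r = stack.pop()
            if r == q:
                return True
            if r not in seen:
                seen.add(r)
                stack.extend(succ.get(r, ()))
        return False

    return any(on_cycle(q) for q in accepting & reach)
```

On an explicit graph this runs in time polynomial in the number of states; the PSPACE bound in the paper presumably relies on exploring the exponentially large product automaton on the fly rather than constructing it.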
This is in contrast to the 2EXPTIME upper bound of [12], which analyzed the combined complexity of temporal and strategic reasoning and also considered existence overall instead of with respect to a specific set of agents W. In that case the driving force behind the complexity result was the doubly exponential blow-up from LDLf to DFAs [6, 17]. Finally, we prove our algorithm optimal by providing a matching lower bound.

PRELIMINARIES
We assume familiarity with basic automata theory, as in [26]. Below is a quick refresher on ω-automata and infinite-tree automata.

Definition 2.1 (ω-automata). [9] A deterministic ω-automaton is a 5-tuple ⟨Q, q_0, Σ, δ, Acc⟩, where Q is a finite set of states, q_0 ∈ Q is the initial state, Σ is a finite alphabet, δ : Q × Σ → Q is the transition function, and Acc is an acceptance criterion. An infinite word w = a_0, a_1, ... ∈ Σ^ω is accepted by the automaton if the run q_0, q_1, ... ∈ Q^ω is accepting; here q_0 is the initial state, q_{i+1} = δ(q_i, a_i) for all i ≥ 0, and the run is accepting if it satisfies the acceptance condition Acc.

Definition 2.2 (ω-automata Büchi Acceptance Condition). [9] The Büchi condition is specified by a finite set F ⊆ Q. For a given infinite run r, let inf(r) denote the set of states that occur infinitely often in r. The Büchi condition is satisfied by r if inf(r) ∩ F ≠ ∅.

We now extend this definition to deterministic Büchi tree automata. These automata recognize sets of labeled directed trees. A Σ-labeled, Δ-directed tree, for finite alphabets Σ (label alphabet, or labels, for short) and Δ (direction alphabet, or directions, for short) is a mapping τ : Δ* → Σ. Intuitively, τ labels the nodes u ∈ Δ* with labels from Σ. A path p of a Δ-directed tree is an infinite sequence p = u_0, u_1, ... ∈ (Δ*)^ω such that u_{i+1} = u_i b_i for some b_i ∈ Δ. We use the notation τ(p) to denote the infinite sequence τ(u_0), τ(u_1), ... ∈ Σ^ω.

Definition 2.3 (Deterministic Büchi Tree Automata). [9] A deterministic Büchi tree automaton is a tuple ⟨Σ, Δ, Q, q_0, ρ, F⟩, where Σ is a finite label alphabet, Δ is a finite direction alphabet, Q is a finite state set, q_0 ∈ Q is the initial state, ρ : (Q × Σ × Δ) → Q is a deterministic transition function, and F ⊆ Q is the accepting-state set. The automaton is considered to be top-down if runs of the automaton start from the root of a tree. All automata in this paper will be top-down, and our notion of a run is conditioned on this. A run of this automaton on a Σ-labeled, Δ-directed tree τ : Δ* → Σ is a Q-labeled, Δ-directed tree r : Δ* → Q such that r(ε) = q_0, and if u ∈ Δ*, τ(u) = a for a ∈ Σ, r(u) = q, and v = ub for b ∈ Δ, then r(v) = ρ(q, a, b). The run r is accepting if r(p) satisfies the Büchi condition F for every path p of r.
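On ultimately periodic ("lasso") runs, the Büchi condition of Definition 2.2 becomes effectively checkable: for the run of a deterministic automaton on a word of the form prefix · loop^ω, the set inf(r) can be computed by iterating the loop until the state at the loop boundary repeats. A small sketch under these assumptions (an explicit transition table and a finitely presented input word, both hypothetical encodings):

```python
def inf_states(delta, q0, prefix, loop):
    """Compute inf(r) for the run of a deterministic automaton
    (Definition 2.1) on the ultimately periodic word prefix . loop^w.
    delta maps (state, letter) -> state; loop must be nonempty."""
    q = q0
    for a in prefix:                  # consume the finite prefix
        q = delta[(q, a)]
    boundary = {}                     # boundary state -> traversal index
    traversals = []                   # states visited during each loop pass
    while q not in boundary:
        boundary[q] = len(traversals)
        visited = set()
        for a in loop:
            q = delta[(q, a)]
            visited.add(q)
        traversals.append(visited)
    # from the repeated boundary state on, the run cycles forever
    return set().union(*traversals[boundary[q]:])

def buchi_accepts_lasso(delta, q0, F, prefix, loop):
    """Definition 2.2: the run satisfies the Buchi condition iff
    inf(r) intersects F."""
    return bool(inf_states(delta, q0, prefix, loop) & set(F))
```

The "lasso testing in Büchi automata" mentioned in the abstract operates on exactly this kind of finitely presented witness.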
In this section we provide some definitions related to simple two-player games, to establish a standard notation throughout this paper. The two players will be denoted by player 0 and player 1.
Definition 2.4 (Arena). An arena is a four-tuple A = (V, V_0, V_1, E) where V is a finite set of vertices, V_0 and V_1 are disjoint subsets of V with V_0 ∪ V_1 = V that represent the vertices that belong to player 0 and player 1 respectively, and E ⊆ V × V is a set of directed edges, i.e. (v, v′) ∈ E if there is an edge from v to v′. Intuitively, the player that owns a node decides which outgoing edge to follow. Since V = V_0 ∪ V_1, we can notate the same arena while omitting V, a convention we follow in this paper.

Definition 2.5 (Play). A play in an arena A is an infinite sequence ρ_0 ρ_1 ρ_2 ... ∈ V^ω such that (ρ_n, ρ_{n+1}) ∈ E holds for all n ∈ ℕ. We say that ρ starts at ρ_0. We now introduce a very broad definition for two-player games.
Definition 2.6 (Game). A game G = (A, Win) consists of an arena A with vertex set V and a set of winning plays Win ⊆ V^ω. A play ρ is winning for player 0 if ρ ∈ Win; otherwise it is winning for player 1. Note that in this formulation of a game, reaching a state v ∈ V with no outgoing transitions is always losing for player 0, as player 0 is the one that must ensure that ρ is infinite (a member of V^ω). A game is thus defined by its set of winning plays, often called the winning condition. One such widely used winning condition is the safety condition.

Definition 2.7 (Safety Condition / Safety Game).
Let A = (V, V_0, V_1, E) be an arena and S ⊆ V be a subset of A's vertices. Then, the safety condition Safety(S) is defined as Safety(S) = {ρ ∈ V^ω | Occ(ρ) ⊆ S}, where Occ(ρ) denotes the subset of vertices that occur at least once in ρ. A game with the safety winning condition for a subset S is a safety game with the set S of safe vertices. Information about solving safety games, including notions of winning strategies and winning sets, can be found in [18].

A concurrent game structure (CGS) is an 8-tuple (Prop, Ω, (Act_i)_{i∈Ω}, S, λ, τ, s_0, (A_i)_{i∈Ω}), where Prop is a finite set of propositions; Ω = {0, ..., k−1} is a finite set of agents; Act_i is a set of actions, where each Act_i is associated with an agent i (we also construct the set of decisions D = Act_0 × Act_1 × ... × Act_{k−1}); S is a set of states; λ : S → 2^Prop is a labeling function that associates each state with a set of propositions that are interpreted as true in that state; τ : S × D → S is a deterministic transition function that takes a state and a decision as input and returns another state; s_0 is a state in S that serves as the initial state; and A_i is a DFA associated with agent i. A DFA A_i is denoted as the goal of agent i. Intuitively, agent i prefers plays in the game that satisfy A_i, that is, a play such that some finite prefix of the play is accepted by A_i. It is for this reason we refer to A_i as a "goal". We now define iterated boolean games (iBG), a restriction on the CGS formalism. Our formulation is a slight generalization of the iBG framework introduced in [10], as we take the set of actions to be a finite alphabet rather than a set of truth assignments, since we are interested in separating temporal reasoning from strategic reasoning. An iBG is defined by applying the following restrictions to the CGS formalism. Each agent i is associated with its own alphabet Σ_i.
These Σ_i are disjoint and each Σ_i serves as the set of actions for agent i; an action for agent i consists of choosing a letter in Σ_i. The set of decisions is then Σ = Σ_0 × ... × Σ_{k−1}. The set of states corresponds to the set of decisions Σ; there is a bijection between the set of states and the set of decisions. The labeling function mirrors the element of Σ associated with each state. As in [10], we still have λ(s) = s, but with s ∈ Σ now. As a slight abuse of notation, we consider the "proposition" σ ∈ Σ_i for some i to be true at state s if σ appears in s, allowing us to generalize towards arbitrary alphabets. Finally, the transition function τ is simply right projection: τ(s, d) = d. We now introduce the notion of a strategy for agent i in the general CGS formalism.

Definition 2.8 (Strategy for agent i). A strategy for agent i is a function π_i : S* → Act_i. Intuitively, this is a function that, given the observed history of the game (represented by an element of S*), returns an action a_i ∈ Act_i. Recalling that Ω = {0, ..., k−1} represents the set of agents, we now introduce the notion of a strategy profile.

Definition 2.9 (Strategy Profile).
Let Π_i represent the set of strategies for agent i. Then, we define the set of strategy profiles Π = ∏_{i∈Ω} Π_i. Note that since both the notion of strategies for individual agents and the transition function in a CGS are deterministic, a given strategy profile for a CGS defines a unique element of S^ω (a trace).

Definition 2.10 (Primary Trace resulting from a Strategy Profile).
Given a strategy profile π, the primary trace of π is the unique trace t that satisfies
(1) t[0] = π(ε)
(2) t[i] = π(t[0], ..., t[i−1])
We denote this trace as t_π. Given a trace t ∈ S^ω, define the winning set W_t = {i ∈ Ω : t ⊨ A_i} to be the set of agents whose DFA goals are satisfied by a finite prefix of the trace t. The losing set is then defined as Ω \ W_t. A common solution concept in game theory is the Nash equilibrium, which we will now modify to fit our iBG framework. In our framework, a Nash equilibrium is a strategy profile π such that for each agent i, if A_i is not satisfied on t_π, then any unilateral strategy deviation for agent i will not result in a trace that satisfies A_i. Formally:

Definition 2.11 (Nash Equilibrium). [10] Let G be an iBG and π = ⟨π_0, π_1, ..., π_{k−1}⟩ be a strategy profile. We denote W_π = W_{t_π}. The profile π is a Nash equilibrium if for every i ∈ Ω \ W_π we have that, given all strategy profiles of the form π′ = ⟨π_0, π_1, ..., π′_i, ..., π_{k−1}⟩ for π′_i ∈ Π_i, it is the case that i ∈ Ω \ W_{π′}. This definition provides an analogue of the Nash equilibrium defined in [22] by capturing the same property: no agent can unilaterally deviate to improve its own payoff (moving from having an unsatisfied goal to a satisfied goal). Agents already in the set W_π cannot have their payoff improved further, so we do not check their deviations. Our paper is based around one central question: given an iBG, which subsets of agents admit at least one Nash equilibrium?
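To make the primary trace and the winning set W_t concrete, both can be computed directly, up to a finite horizon, by iterating the strategies and simulating each goal DFA on the growing prefix. This is only an illustrative sketch with hypothetical encodings (strategies as Python functions over histories, goals as explicit transition dictionaries); it certifies satisfaction only for goals met within the chosen horizon.

```python
def primary_trace(strategies, horizon):
    """First `horizon` decisions of t_pi: at each step, every agent
    applies its strategy to the shared history, and the resulting
    decision tuple is also the next state of the iBG."""
    history = []
    for _ in range(horizon):
        decision = tuple(s(tuple(history)) for s in strategies)
        history.append(decision)
    return history

def winning_set(trace, goals):
    """W_t restricted to the given finite prefix: agent i is included iff
    its DFA goal (q0, delta, F) accepts some nonempty prefix of the trace.
    A missing transition means the goal can no longer be satisfied."""
    winners = set()
    for i, (q0, delta, F) in enumerate(goals):
        q = q0
        for d in trace:
            q = delta.get((q, d))
            if q is None:
                break
            if q in F:
                winners.add(i)
                break
    return winners
```

For instance, with two agents whose (hypothetical) alphabets are Σ_0 = {'a', 'b'} and Σ_1 = {'x'}, constant strategies produce the trace ('a','x'), ('a','x'), ..., and a goal DFA that accepts exactly the words beginning with ('a','x') puts agent 0 in the winning set.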
THE TREE-AUTOMATA FRAMEWORK
In order to address our central question, we first describe a tree-automata framework to characterize the set of Nash equilibrium strategies in an iBG G. In this section we fix a winning set W ⊆ Ω and then describe a deterministic Büchi tree automaton that recognizes the set of strategy profiles for W. In the next section we develop an algorithm based on this tree-automata framework. Given k DFA goals corresponding to k agents, we retain the notation that the set of actions for agent i is given by Σ_i. The goal DFA for agent i will then be denoted as A_i = ⟨Q_i, q_i^0, Σ, δ_i, F_i⟩. Note that the alphabet of the DFA is Σ, since it transitions according to decisions by all agents in the overlying iBG structure. Since Σ = Σ_0 × ... × Σ_{k−1}, compact notation is often used to describe the transition function δ_i. For example, the Mona tool uses binary decision diagrams to represent automata with large alphabets [4].
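Since Σ is a product of the agents' alphabets, an explicit transition table for δ_i has a row per decision tuple and blows up exponentially in the number of agents, which is why compact representations (such as Mona's BDDs) are used. A sketch of the functional alternative follows, built around a hypothetical goal ("agent 1 eventually plays 'a'") that inspects only one component of each decision; all names here are illustrative, not from the paper.

```python
from itertools import product

# Hypothetical agent alphabets: three agents, two letters each, so the
# decision alphabet Sigma already has 2**3 = 8 letters.
SIGMAS = [('a', 'b'), ('a', 'b'), ('a', 'b')]

def make_goal_dfa():
    """Goal DFA for "agent 1 eventually plays 'a'": its transition
    function reads a whole decision tuple but only inspects component 1,
    so no table over all of Sigma is ever materialized."""
    q0, acc = 'wait', 'done'
    def delta(q, decision):
        return acc if q == acc or decision[1] == 'a' else q
    return q0, delta, {acc}

def accepts_prefix(dfa, word):
    """Does the DFA accept this finite prefix (a list of decision tuples)?"""
    q0, delta, F = dfa
    q = q0
    for d in word:
        q = delta(q, d)
    return q in F
```

Representing δ_i as a function (or a BDD) keeps the description polynomial even though |Σ| grows exponentially with the number of agents.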
As defined previously, strategy profiles are functions π : Σ* → Σ. Therefore, strategy profiles correspond exactly to Σ-labeled, Σ-directed trees, which are defined in the exact same way. We use the common notions of tree paths and label-direction pairs as widely defined in the literature (see [9] for reference). A W-NE strategy, for W ⊆ Ω, is a mapping π : Σ* → Σ such that the following conditions are satisfied:
(1) Primary-Trace Condition: The primary infinite trace t_π defined by π satisfies the goals A_j precisely for j ∈ W. The trace t_π = x_0, x_1, ... for π is once again defined as follows:
(a) x_0 = ε
(b) x_{i+1} = x_i · π(x_i)
(2) j-Deviant-Trace Condition: Each j-deviant trace t = y_0, y_1, ... for j ∉ W does not satisfy the goal A_j.
For α ∈ Σ, we introduce the notation α[−j] to refer to α|_{Σ\Σ_j} (that is, α with Σ_j projected out). A trace t = y_0, y_1, ... is j-deviant if
(a) y_0 = ε
(b) y_{i+1} = y_i · α, where α ∈ Σ and α[−j] = π(y_i)[−j]
(c) t is not the primary trace
In order to simplify the presentation, we introduce the assumption that for all agents j we have |Σ_j| ≥ 2. This is because there are no j-deviant traces for an agent with only one strategy; therefore, W-NE analysis only amounts to checking the Primary-Trace Condition for these agents.

Note that there are traces that do not fall into either category. For example, we could have a trace that contains a label-direction pair (α, β) such that α[−j] ≠ β[−j] for all j ∈ Ω \ W. Or, we could have a trace that contains two label-direction pairs (α_1, β_1) and (α_2, β_2) such that α_1 ≠ β_1, α_1[−j_1] = β_1[−j_1], α_2 ≠ β_2, and α_2[−j_2] = β_2[−j_2] for j_1 ≠ j_2. Traces like these and others that do not fit into either the Primary-Trace category or the j-Deviant-Trace category are irrelevant to the Nash equilibrium condition – it does not matter what properties do or do not hold on these traces. As a reminder, a trace z_0, z_1, ... ∈ Σ^ω satisfies a DFA A if A accepts z_0, ..., z_k for some k ≥ 0.

To check for the existence of a W-NE strategy, we construct an infinite-tree automaton T_W that accepts all W-NE strategies. The problem of determining whether a W-NE exists then reduces to querying L(T_W) ≠ ∅. Recall that we notate the goal DFA of agent i as A_i = ⟨Q_i, q_i^0, Σ, δ_i, F_i⟩. We assume that q_i^0 ∉ F_i, since we are not interested in empty traces. We first construct a deterministic Büchi word automaton A_W = ⟨Q, q_0, Σ, δ, F⟩ that accepts a word in Σ^ω if it satisfies precisely the goals A_j for j ∈ W. Intuitively, A_W simulates all the goal DFAs concurrently, and checks that A_j is satisfied precisely for j ∈ W. We define the following for A_W:
(1) Q = (∏_{j∈Ω} Q_j) × 2^Ω
(2) q_0 = ⟨q_0^0, ..., q_{k−1}^0, W⟩
(3) F = (∏_{j∈Ω} Q_j) × {∅}
(4) δ(⟨q_0, ..., q_{k−1}, U⟩, α) = ⟨q′_0, ..., q′_{k−1}, V⟩, where q′_j = δ_j(q_j, α), provided that q′_j ∉ F_j for all j ∉ W, and V = U − {j : q′_j ∈ F_j}
Note that A_W concurrently simulates all the goal DFAs while it also checks that no goal DFA A_j for j ∉ W is satisfied.
(Note that if q′_j ∈ F_j for some j ∉ W, then the transition is not defined, and A_W is stuck.) The last component of the state holds the indices of the goals that are yet to be satisfied. For A_W to accept an infinite trace, all goals A_j for j ∈ W have to be satisfied, so the last component of the state has to become empty. Note that if A_W reaches an accepting state in F, then it stays in the set F unless it gets stuck.

Lemma 3.1 (A_W Correctness).
For a given W ⊆ Ω, the automaton A_W accepts an ω-word u ∈ Σ^ω iff u ⊨ A_i for precisely the agents i ∈ W. Proof.
First, note that no prefix of u can satisfy A_j for some j ∈ Ω \ W: if that were the case, then by the definition of the transition function δ we would have no transition defined upon reading this prefix, meaning that A_W cannot accept. Next, note that every goal A_j for j ∈ W must be satisfied by a prefix of u. Otherwise, the 2^Ω component of the states in Q would never reach ∅, as the only way to remove elements from this component is to satisfy the goals A_j for j ∈ W. Since the Büchi acceptance condition requires that a final state in A_W be reached, we know that when a final state is reached all goals A_j for j ∈ W have previously been satisfied. Since both of these conditions must hold, we conclude the lemma. □

We now construct a deterministic top-down Büchi tree automaton T that accepts an infinite tree π : Σ* → Σ if the Primary-Trace Condition with respect to W holds. Essentially, T runs A_W on the primary trace defined by the input strategy π. Formally, T = (Σ, Σ, Q ∪ {q_a}, q_0, ρ, F ∪ {q_a}), where:
(1) Σ is both the label alphabet of the tree and its set of directions. Here we introduce the notation that α is an element of Σ corresponding to a label and β is an element of Σ corresponding to a direction.
(2) q_a is a new accepting state.
(3) For a state q, label α, and direction β, we have ρ(q, α, β) = δ(q, α) if α = β and q ≠ q_a, and ρ(q, α, β) = q_a otherwise.
Note that T simulates A_W along the branch corresponding to the primary trace defined by the input tree π. Along all other branches, T enters the accepting state q_a.

Lemma 3.2.
Let G be an iBG and W ⊆ Ω be a set of agents. Let π : Σ* → Σ be a strategy profile. Then π is accepted by the tree automaton T iff π satisfies the Primary-Trace Condition. Proof.
The primary trace is the single path p of π such that for all label-direction pairs (α, β) ∈ p we have α = β. The automaton T transitions to the state q_a immediately after seeing a label-direction pair (α, β) such that α ≠ β, meaning that acceptance by T is solely determined by acceptance on the path p with α = β for every (α, β) ∈ p, which is the primary trace of π by definition. The Primary-Trace Condition is that on the primary trace, only the goals A_i for i ∈ W are satisfied. By virtue of construction, T simulates the DBW A_W on the primary trace, which captures this condition by the previous arguments presented in the construction of A_W in Lemma 3.1. □

We also construct a deterministic top-down Büchi infinite-tree automaton T_j that accepts precisely the trees π : Σ* → Σ that satisfy the j-Deviant-Trace Condition. Given a DFA goal A_j = (Q_j, q_j^0, Σ, δ_j, F_j), we define T_j = (Σ, Σ, (Q_j × {0, 1}) ∪ {q_A}, ⟨q_j^0, 0⟩, ρ_j, (Q_j × {0}) ∪ ((Q_j \ F_j) × {1}) ∪ {q_A}), where:
(1) Σ is both the label alphabet of the tree and its set of directions. We retain the notation that α is a label and β is a direction.
(2) q_A is a new accepting state. (By a slight abuse of notation we consider q_A to be a pair ⟨q_A, 1⟩.)
(3) We maintain two copies of Q_j, one tagged with 0 and one tagged with 1. Intuitively, we stay in Q_j × {0} on the primary trace until there is a j-deviation, and then we transition to Q_j × {1}.
(4) ρ_j(⟨q, i⟩, α, β) is defined as follows:
(a) ⟨δ_j(q, α), 0⟩ if i = 0 and α = β
(b) ⟨δ_j(q, β), 1⟩ if i = 0, α ≠ β, α[−j] = β[−j], and δ_j(q, β) ∉ F_j
(c) ⟨δ_j(q, β), 1⟩ if i = 1, α[−j] = β[−j], and δ_j(q, β) ∉ F_j
(d) q_A if q = q_A or α[−j] ≠ β[−j]
On the primary trace of π, we enter states q ∈ Q_j × {0}. All of these states are accepting, so the primary trace will always be an accepting branch in T_j, since the primary trace is not relevant to the j-Deviant-Trace Condition.
Intuitively, we may leave the primary trace at a node labeled α by following a direction β such that α[−j] = β[−j] and α ≠ β. Here, we transition to the second copy of Q_j, Q_j × {1}, where the 1 denotes that we have left the primary trace. When we are in these states on a node labeled α, we may transition according to δ_j on any direction β with β[−j] = α[−j]. Nevertheless, due to how the transitions are defined, we can never enter a state in F_j. If a direction β exists such that β[−j] = α[−j] and the resulting transition according to δ_j would put A_j in F_j, then the automaton does not have a defined transition and therefore cannot accept on this path. Otherwise, if we see a direction β such that for our current label α we have that α[−j] ≠ β[−j], then this no longer corresponds to a j-deviant trace. At this point we transition to q_A, a catch-all accepting state that marks all continuations of the current path irrelevant to the j-Deviant-Trace Condition. Therefore if we are in state q_A we transition back to q_A on all directions β regardless of the label.

Lemma 3.3.
Let G be an iBG and W ⊆ Ω be a set of agents. Let π : Σ* → Σ be a strategy profile. Then π is accepted by the tree automaton T_j iff π satisfies the j-Deviant-Trace Condition. Proof.
By definition, the set of j-deviant traces is the set of paths p such that for all (α, β) ∈ p we have α[−j] = β[−j], excluding the primary trace. The j-Deviant-Trace Condition says that none of these paths p have a finite prefix accepted by A_j. For an infinite or finite sequence of label-direction pairs p = (α_1, β_1), ..., let β_p denote the infinite or finite word obtained by concatenating all the β_i together in index order. If π does not satisfy the j-Deviant-Trace Condition, then there exists a finite sequence of label-direction pairs p_j = (α_1, β_1) ... (α_n, β_n) such that ∀i (1 ≤ i ≤ n). α_i[−j] = β_i[−j], ∃i (1 ≤ i ≤ n). α_i ≠ β_i, and A_j accepts β_{p_j}. Since α_i[−j] = β_i[−j] for every index in p_j, T_j never attempts to transition to q_A along p_j. And since A_j accepts β_{p_j}, we know that along p_j, T_j attempts to transition to a final state in F_j and gets stuck, therefore rejecting. Therefore, T_j rejects π.

Now assume that T_j does not accept π. This means that along some j-deviant trace T_j attempts to transition to a state ⟨q_f, 1⟩, where q_f ∈ F_j, and gets stuck, as this is the only way for T_j to reject. This follows from the observation that every reachable state in T_j is accepting. This means there exists a finite sequence of label-direction pairs p_j = (α_1, β_1) ... (α_n, β_n) such that ∀i (1 ≤ i ≤ n). α_i[−j] = β_i[−j] (otherwise T_j would have transitioned into q_A), ∃i (1 ≤ i ≤ n). α_i ≠ β_i (otherwise this would be a prefix of the primary trace), and A_j accepts β_{p_j} (since T_j attempted to transition into a final state and got stuck). Therefore, π does not satisfy the j-Deviant-Trace Condition.
□

W-NE Automata
We constructed a tree automaton T = (Σ, Σ, Q ∪ {q_a}, q_0, ρ, F ∪ {q_a}) that recognizes the set of strategies that satisfy the Primary-Trace Condition for a fixed subset W ⊆ Ω of agents in an iBG G. We also constructed the automaton T_j that checks the j-Deviant-Trace Condition for a specific agent j. A simple way to check both the Primary-Trace Condition and the j-Deviant-Trace Conditions for some W ⊆ Ω would be to take the cross product of T with all the T_j's for every j ∉ W. We now show that this can be done more efficiently, by taking a modified union of the state sets of T and the T_j's instead of their cross product. This is motivated by the observation that each automaton "checks" a disjoint set of paths in a tree π and marks all others with a repeating accepting state.

We construct a deterministic top-down Büchi infinite-tree automaton T_W = (Σ, Σ, Q ∪ ⋃_{j∈Ω\W} Q_j ∪ {q_A}, q_0, τ, F ∪ ⋃_{j∈Ω\W} (Q_j \ F_j) ∪ {q_A}) that accepts all strategies that satisfy both the Primary-Trace Condition and the j-Deviant-Trace Conditions, where
(1) Σ is both the label alphabet of the tree and its set of directions, with the α and β notations defined as previously.
(2) q_A is a repeating accepting state.
(3) τ is defined as follows for a given state q, label α, and direction β:
(a) If q ∈ Q:
(i) If α = β, then τ(q, α, β) = ρ(q, α, β).
(ii) If α ≠ β, but for some j ∈ Ω \ W we have α[−j] = β[−j], then τ(q, α, β) = δ_j(q[j], β), where q[j] is the j-th component of q, provided that δ_j(q[j], β) ∉ F_j.
(iii) If for all j ∈ Ω \ W we have α[−j] ≠ β[−j], then τ(q, α, β) = q_A.
(b) If q ∈ Q_j for j ∈ Ω \ W, then:
(i) If α[−j] = β[−j], then τ(q, α, β) = δ_j(q, β), provided that δ_j(q, β) ∉ F_j.
(ii) If α[−j] ≠ β[−j], then τ(q, α, β) = q_A.
(c) If q = q_A, then τ(q, α, β) = q_A.
Intuitively, the automaton T_W simulates the automaton T on the primary trace defined by π.
If the automaton is on the primary trace, it is in a state in Q, and it checks all possible j-deviations from that state by transitioning, on the corresponding directions, to all states reachable by possible j-deviant actions. Note that here we only check whether α[−j] = β[−j] for a single j, as it is easy to see that if α[−j_1] = β[−j_1] and α[−j_2] = β[−j_2] for two different j_1, j_2, then α = β, since Σ_{j_1} and Σ_{j_2} are disjoint. On a direction that does not represent either a continuation of the primary trace or one reachable by a deviation from some agent j, we move to the repeating accepting state q_A. If the automaton is in some state q ∈ Q_j, it transitions according to δ_j on a direction β with β[−j] = α[−j], including the one where α = β. On all other directions, it transitions to the state q_A. If the automaton reaches a final state for A_j, it gets stuck and cannot accept. This simulates the automaton T_j and verifies the j-Deviant-Trace Condition. If the automaton is in the state q_A, it means we have marked the subtree starting from the current node as irrelevant to the Nash equilibrium definition. Therefore, we simply stay in the accepting state q_A on every direction.

Theorem 3.4.
Let G be an iBG and W ⊆ Ω be a set of agents. Let π : Σ* → Σ be a strategy profile. Then π is accepted by the tree automaton T_W iff π is a W-NE strategy. Proof. (→) Suppose π : Σ* → Σ is accepted by T_W. We show that π must satisfy both the Primary-Trace Condition and the j-Deviant-Trace Condition for all j ∈ Ω \ W.
(1) The primary trace of π is the unique path p = (α_0, β_0), ... such that for every (α_i, β_i) we have α_i = β_i. On this path, the automaton T_W stays in states in Q and transitions according to the transition function δ of A_W; thus T_W simulates A_W on the primary trace. Since T_W accepts π, we know that A_W accepts on p, meaning that exactly the goals A_i for i ∈ W are satisfied. Therefore π satisfies the Primary-Trace Condition.
(2) A j-deviant trace of π is a path p_j = (α_0, β_0), ... such that for every (α_i, β_i) ∈ p_j we have α_i[−j] = β_i[−j], and p_j is different from the primary trace. Therefore, for at least one index i, we have that α_i ≠ β_i in p_j. When T_W runs on such a trace, it starts in states in Q and eventually transitions to states in Q_j upon reaching the first index where α_i ≠ β_i. When it is in the states in Q, A_j cannot reach a final state, as otherwise T_W would get stuck and not accept due to the construction of A_W, contradicting our assumption that T_W does accept. When it reaches the states in Q_j, it also can never get stuck attempting a transition to a final state in F_j, due to the construction of the transition function τ, as any such attempted transition would mean T_W would reject. This is true no matter which j-deviant trace we choose, since T_W accepts on all paths of π. Therefore π satisfies the j-Deviant-Trace Condition for all j ∈ Ω \ W.
(←) Note that T_W is deterministic, so there is a unique run T_W(π). We have to show that all paths of this run are accepting. There are three types of paths:

Primary Path:
If a path 𝑝 is the primary path, then 𝑇 𝑊 emulates 𝐴 𝑊 along 𝑝 . Because of the Primary-Trace Condition, we know that 𝐴 𝑊 eventually enters and stays in the set 𝐹 of accepting states. Thus, this path 𝑝 of 𝑇 𝑊 ( 𝜋 ) is accepting.

𝑗 -Deviant Paths: If 𝑝 = ( 𝛼 0 , 𝛽 0 ) , . . . is a 𝑗 -deviant path for some 𝑗 ∈ Ω \ 𝑊 , then it can be factored as 𝑝 𝑃 · 𝑝 𝑗 , with 𝑝 𝑃 finite but possibly empty. For every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑃 we have 𝛼 𝑖 = 𝛽 𝑖 , and for every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑗 we have 𝛼 𝑖 [− 𝑗 ] = 𝛽 𝑖 [− 𝑗 ] . Note that only one choice of 𝑗 is appropriate: letting 𝑖 be the first index in 𝑝 where 𝛼 𝑖 ≠ 𝛽 𝑖 , having 𝛼 𝑖 [− 𝑗 1 ] = 𝛽 𝑖 [− 𝑗 1 ] and 𝛼 𝑖 [− 𝑗 2 ] = 𝛽 𝑖 [− 𝑗 2 ] for two different agents 𝑗 1 , 𝑗 2 would imply that 𝛼 𝑖 = 𝛽 𝑖 . 𝑇 𝑊 first emulates 𝐴 𝑊 along 𝑝 𝑃 . Since 𝜋 satisfies the Primary-Trace Condition, 𝑇 𝑊 never gets stuck and rejects on 𝑝 𝑃 . Since 𝑝 is a 𝑗 -deviant trace, there is a smallest 𝑖 such that 𝛼 𝑖 ≠ 𝛽 𝑖 in 𝑝 . At this point 𝑇 𝑊 switches from emulating 𝐴 𝑊 to emulating 𝐴 𝑗 . Because 𝜋 satisfies the 𝑗 -Deviant-Trace Condition, the goal 𝐴 𝑗 does not hold along 𝑝 . Thus, 𝑇 𝑊 does not get stuck along 𝑝 𝑃 or along 𝑝 𝑗 , and it accepts along 𝑝 .

Other Paths: If 𝑝 is neither the primary path nor a 𝑗 -deviant path, then there are two possibilities.

(1) The first case is when 𝑝 can be factored as 𝑝 𝑃 · 𝑝 ′ , with 𝑝 𝑃 finite but possibly empty. For every point ( 𝛼 𝑖 , 𝛽 𝑖 ) of 𝑝 𝑃 we have 𝛼 𝑖 = 𝛽 𝑖 , and at the first point ( 𝛼 𝑘 , 𝛽 𝑘 ) of 𝑝 ′ we have 𝛼 𝑘 [− 𝑗 ] ≠ 𝛽 𝑘 [− 𝑗 ] for all 𝑗 ∈ Ω \ 𝑊 . Then 𝑇 𝑊 emulates 𝐴 𝑊 along 𝑝 𝑃 and transitions to 𝑞 𝐴 upon reading ( 𝛼 𝑘 , 𝛽 𝑘 ) . By the previous arguments, we know that 𝑇 𝑊 does not get stuck and reject along 𝑝 𝑃 . Once 𝑇 𝑊 enters 𝑞 𝐴 it stays in 𝑞 𝐴 , an accepting state. Therefore 𝑇 𝑊 accepts the path 𝑝 = 𝑝 𝑃 · 𝑝 ′ .

(2) The second case is when 𝑝 can be factored as 𝑝 𝑃 · 𝑝 𝑗 · 𝑝 ′ , with 𝑝 𝑃 finite but possibly empty and 𝑝 𝑗 finite and nonempty.
For every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑃 we have 𝛼 𝑖 = 𝛽 𝑖 . For some 𝑗 ∈ Ω \ 𝑊 we have 𝛼 𝑖 [− 𝑗 ] = 𝛽 𝑖 [− 𝑗 ] for every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑗 , again noting that only one choice of 𝑗 is appropriate. Finally, at the first point ( 𝛼 𝑘 , 𝛽 𝑘 ) of 𝑝 ′ we have 𝛼 𝑘 [− 𝑗 ] ≠ 𝛽 𝑘 [− 𝑗 ] . By the previous arguments, we know that 𝑇 𝑊 does not get stuck and reject along 𝑝 𝑃 or 𝑝 𝑗 . And since 𝑇 𝑊 transitions to 𝑞 𝐴 at the beginning of 𝑝 ′ , it cannot get stuck and reject along 𝑝 ′ . Therefore 𝑇 𝑊 accepts on 𝑝 = 𝑝 𝑃 · 𝑝 𝑗 · 𝑝 ′ . □

Corollary 3.5.
Let 𝐺 be an iBG and 𝑊 ⊆ Ω be a set of agents. Then a 𝑊 -NE strategy exists in 𝐺 iff the automaton 𝑇 𝑊 constructed with respect to 𝐺 is nonempty.

In the previous section, we constructed an automaton 𝑇 𝑊 that recognizes the set of Nash-equilibrium strategy profiles with winning set 𝑊 in an iBG 𝐺 , which we denoted as 𝑊 -NE strategies. The problem of determining whether a 𝑊 -NE strategy exists is equivalent to testing 𝑇 𝑊 for nonemptiness. The standard algorithm for testing nonemptiness of Büchi tree automata involves Büchi games [9]. In this section, we prove that testing 𝑇 𝑊 for nonemptiness is equivalent to solving safety games and then testing a Büchi word automaton for nonemptiness. This gives us a simpler path towards constructing an algorithm that decides our central question.

Note that the Büchi condition on the 𝑗 -deviant traces simply consists of avoiding the set of final states in 𝐴 𝑗 , making it simpler than a general Büchi acceptance condition. In order to characterize this condition precisely, we now construct, for each agent 𝑗 ∈ Ω \ 𝑊 , a 2-player safety game that partitions the states of 𝑄 𝑗 in 𝑇 𝑊 into two sets: states 𝑞 ∈ 𝑄 𝑗 such that 𝑇 𝑊 started in 𝑞 is empty, and states 𝑞 ∈ 𝑄 𝑗 such that 𝑇 𝑊 started in 𝑞 is nonempty. We construct the safety game 𝐺 𝑗 = ( 𝑄 𝑗 , 𝑄 𝑗 × Σ , 𝐸 𝑗 ) . The safety set can intuitively be thought of as all the vertices not in 𝐹 𝑗 , but for our purposes it is more convenient not to define outgoing transitions from these states, thus making them losing for player 0 by violating the infinite-play condition. Player 0 owns 𝑄 𝑗 and player 1 owns 𝑄 𝑗 × Σ . Here we retain our 𝛼 and 𝛽 notation insofar as both are elements of Σ .
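Once the edge relation 𝐸 𝑗 defined next is in place, 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) can be computed with the standard safety fixpoint: repeatedly discard player-0 vertices with no remaining safe choice and player-1 vertices with some unsafe successor. The Python sketch below is purely illustrative and is not the authors' implementation; the state encoding and the helpers `delta_j` (for 𝛿 𝑗 ) and `minus_j_equal` (for the test 𝛼 [− 𝑗 ] = 𝛽 [− 𝑗 ] ) are hypothetical stand-ins. Linear-time algorithms for safety games exist [2]; this quadratic version favors clarity.

```python
def solve_safety_game(Qj, Fj, Sigma, delta_j, minus_j_equal):
    """Compute player 0's winning set in the safety game G_j (illustrative sketch).

    Vertices are states q in Qj (player 0) and pairs (q, alpha) (player 1).
    States in F_j have no successors, so they are losing for player 0.
    We iterate the standard safety fixpoint: discard
      - player-1 vertices (q, alpha) with some losing successor delta_j(q, beta)
        for a beta that agrees with alpha outside agent j, and
      - player-0 vertices q whose every choice (q, alpha) has been discarded,
    until nothing changes; the surviving Qj-vertices form Win(G_j).
    """
    safe0 = {q for q in Qj if q not in Fj}
    safe1 = {(q, a) for q in safe0 for a in Sigma}
    changed = True
    while changed:
        changed = False
        for (q, a) in list(safe1):
            # player 1 may answer alpha with any beta such that alpha[-j] == beta[-j]
            succs = [delta_j(q, b) for b in Sigma if minus_j_equal(a, b)]
            if any(s not in safe0 for s in succs):
                safe1.discard((q, a))
                changed = True
        for q in list(safe0):
            if not any((q, a) in safe1 for a in Sigma):
                safe0.discard(q)
                changed = True
    return safe0
```

For instance, with two agents where agent 𝑗 controls the second action component, a state whose only escape from 𝐹 𝑗 depends on agent 𝑗 's own cooperation is losing for player 0, while a state that player 0 can keep safe regardless of agent 𝑗 's replies survives the fixpoint.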
The edge relation 𝐸 𝑗 is defined as follows:

(1) ( 𝑞, ⟨ 𝑞, 𝛼 ⟩ ) ∈ 𝐸 𝑗 for 𝑞 ∈ 𝑄 𝑗 \ 𝐹 𝑗 and 𝛼 ∈ Σ .

(2) ( ⟨ 𝑞, 𝛼 ⟩ , 𝑞 ′ ) ∈ 𝐸 𝑗 for 𝑞 ∈ 𝑄 𝑗 and 𝑞 ′ ∈ 𝑄 𝑗 , where 𝑞 ′ = 𝛿 𝑗 ( 𝑞, 𝛽 ) for some 𝛽 ∈ Σ such that 𝛼 [− 𝑗 ] = 𝛽 [− 𝑗 ] .

Note that, as defined above, if 𝑞 ∈ 𝐹 𝑗 , then 𝑞 has no successor node, and player 0 is stuck and loses the game. Since 𝐺 𝑗 is a safety game, player 0's goal is to avoid states in 𝐹 𝑗 and not get stuck. Let 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) be the set of winning states for player 0 in the safety game 𝐺 𝑗 .

Theorem 4.1.
A state 𝑞 ∈ 𝑄 𝑗 \ 𝐹 𝑗 belongs to 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) iff 𝑇 𝑊 is nonempty when started in state 𝑞 .

Proof. ( → ) Suppose 𝑞 ∈ 𝑄 𝑗 \ 𝐹 𝑗 and 𝑞 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . We construct a tree 𝜋 𝑞 : Σ ∗ → Σ that is accepted by 𝑇 𝑊 starting in state 𝑞 . To show that 𝜋 𝑞 is accepted, we also construct an accepting run 𝑟 𝑞 : Σ ∗ → ( 𝑄 𝑗 \ 𝐹 𝑗 ) ∪ { 𝑞 𝐴 } . By construction, we have 𝑟 𝑞 ( 𝑥 ) ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for all 𝑥 ∈ Σ ∗ . We proceed by induction on the length of the run.

For the basis of the induction, we start by defining 𝜋 𝑞 ( 𝜀 ) and 𝑟 𝑞 ( 𝜀 ) . First, we let 𝑟 𝑞 ( 𝜀 ) = 𝑞 . By the assumption that 𝑞 ∉ 𝐹 𝑗 , the run cannot get stuck and reject here.

For the step case, suppose now that we have constructed 𝑟 𝑞 ( 𝑦 ) = 𝑝 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for some 𝑦 ∈ Σ ∗ . Since 𝑝 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) and cannot get stuck, there must be a node ⟨ 𝑝, 𝛼 𝑦 ⟩ contained in both 𝑄 𝑗 × Σ and 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , so we let 𝜋 𝑞 ( 𝑦 ) = 𝛼 𝑦 . Recall that the directions of 𝜋 𝑞 are Σ . Divide the possible directions 𝛽 ∈ Σ into two types: either 𝛼 𝑦 [− 𝑗 ] = 𝛽 [− 𝑗 ] or 𝛼 𝑦 [− 𝑗 ] ≠ 𝛽 [− 𝑗 ] . If 𝛼 𝑦 [− 𝑗 ] = 𝛽 [− 𝑗 ] , then this corresponds to a legal move by player 1 in 𝐺 𝑗 . Since ⟨ 𝑝, 𝛼 𝑦 ⟩ ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , moves by player 1 must stay in 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . It follows that 𝑞 ′ = 𝛿 𝑗 ( 𝑝, 𝛽 ) ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , so 𝑞 ′ ∉ 𝐹 𝑗 . We let 𝑟 𝑞 ( 𝑦 · 𝛽 ) = 𝑞 ′ . If, on the other hand, 𝛼 𝑦 [− 𝑗 ] ≠ 𝛽 [− 𝑗 ] , we let 𝑟 𝑞 ( 𝑦 · 𝛽 ) = 𝑞 𝐴 . Once we have reached a node 𝑧 ∈ Σ ∗ with 𝑟 𝑞 ( 𝑧 ) = 𝑞 𝐴 , we define 𝑟 𝑞 ( 𝑧 ′ ) = 𝑞 𝐴 for all descendants 𝑧 ′ of 𝑧 , and we can define 𝜋 𝑞 ( 𝑧 ′ ) arbitrarily. Since we can never get stuck, we never reach a state in 𝐹 𝑗 , so the run 𝑟 𝑞 is accepting.

( ← ) Suppose now that 𝑇 𝑊 started in state 𝑞 accepts a tree 𝜋 𝑞 : Σ ∗ → Σ . Since the automaton 𝑇 𝑊 is deterministic, it accepts with a unique run of 𝑇 𝑊 on 𝜋 𝑞 , denoted 𝑟 𝑞 : Σ ∗ → ( 𝑄 𝑗 \ 𝐹 𝑗 ) ∪ { 𝑞 𝐴 } . We claim that 𝜋 𝑞 is a winning strategy for player 0 in 𝐺 𝑗 from the state 𝑞 . Consider a play 𝜋 = 𝑝 0 , 𝛼 0 , 𝛽 0 , 𝑝 1 , 𝛼 1 , 𝛽 1 , . . . , where 𝑝 𝑖 ∈ 𝑄 𝑗 , 𝑝 0 = 𝑞 , and 𝛼 𝑖 , 𝛽 𝑖 ∈ Σ .
In round 𝑖 ≥ 0 , player 0 moves from 𝑝 𝑖 to ⟨ 𝑝 𝑖 , 𝛼 𝑖 ⟩ , for 𝛼 𝑖 = 𝜋 𝑞 ( ⟨ 𝛽 0 , . . . , 𝛽 𝑖 − 1 ⟩ ) , and then player 1 moves from ⟨ 𝑝 𝑖 , 𝛼 𝑖 ⟩ to 𝑝 𝑖 + 1 = 𝛿 𝑗 ( 𝑝 𝑖 , 𝛽 𝑖 ) , for some 𝛽 𝑖 such that 𝛼 𝑖 [− 𝑗 ] = 𝛽 𝑖 [− 𝑗 ] . Let 𝑥 𝑖 = ⟨ 𝛽 0 , . . . , 𝛽 𝑖 − 1 ⟩ , so that 𝛼 𝑖 = 𝜋 𝑞 ( 𝑥 𝑖 ) . By induction on the length of 𝑥 𝑖 it follows that 𝑝 𝑖 = 𝑟 𝑞 ( 𝑥 𝑖 ) . Since 𝑟 𝑞 is an accepting run of 𝑇 𝑊 on 𝜋 𝑞 , it follows that 𝑝 𝑖 = 𝑟 𝑞 ( 𝑥 𝑖 ) ∉ 𝐹 𝑗 . Thus, the play 𝜋 is a winning play for player 0. It follows that 𝜋 𝑞 is a winning strategy for player 0 in 𝐺 𝑗 from the state 𝑞 . □

𝑇 𝑊 Nonemptiness
Recall that the tree automaton 𝑇 𝑊 , which recognizes 𝑊 -NE strategies, emulates the Büchi automaton 𝐴 𝑊 = ( 𝑄, 𝑞 0 , Σ , 𝛿, 𝐹 ) along the primary trace and the goal automaton 𝐴 𝑗 along 𝑗 -deviant traces. We have constructed the above games 𝐺 𝑗 to capture nonemptiness of 𝑇 𝑊 from states in 𝑄 𝑗 , in terms of the winning sets 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . We now modify 𝐴 𝑊 to take these safety games into account. Let 𝐴 ′ 𝑊 = ( 𝑄 ′ , 𝑞 0 , Σ , 𝛿 ′ , 𝐹 ∩ 𝑄 ′ ) be obtained from 𝐴 𝑊 by restricting states to 𝑄 ′ ⊆ 𝑄 , where 𝑄 ′ = ⨉ 𝑖 ∈ 𝑊 𝑄 𝑖 × ⨉ 𝑗 ∈ Ω \ 𝑊 ( 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) ∩ 𝑄 𝑗 ) × Ω . In other words, the 𝑗 -th component 𝑞 𝑗 of a state 𝑞 ∈ 𝑄 ′ must be in 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for all 𝑗 ∈ Ω \ 𝑊 ; otherwise the automaton 𝐴 ′ 𝑊 gets stuck.

Theorem 4.2.
The Büchi word automaton 𝐴 ′ 𝑊 is nonempty iff the tree automaton 𝑇 𝑊 is nonempty.

Proof. ( → ) Assume 𝐴 ′ 𝑊 is nonempty. Then it accepts an infinite word 𝑤 = 𝑤 0 𝑤 1 . . . ∈ Σ 𝜔 with a run 𝑟 = 𝑞 0 , 𝑞 1 , . . . ∈ ( 𝑄 ′ ) 𝜔 . We use 𝑤 and 𝑟 to create a tree 𝜋 : Σ ∗ → Σ with an accepting run 𝑟 𝜋 : Σ ∗ → 𝑄 ∪ { 𝑞 𝐴 } with respect to 𝑇 𝑊 .

Let 𝑥 0 = 𝜀 . We start by setting 𝜋 ( 𝑥 0 ) = 𝑤 0 and 𝑟 𝜋 ( 𝑥 0 ) = 𝑞 0 . Suppose now that we have just defined 𝜋 ( 𝑥 𝑖 ) = 𝛼 and 𝑟 𝜋 ( 𝑥 𝑖 ) = 𝑞 , and, by construction, 𝑥 𝑖 is on the primary trace. Consider now the node 𝑥 𝑖 · 𝛽 . There are three cases to consider:

(1) If 𝜋 ( 𝑥 𝑖 ) = 𝛽 , then we set 𝑥 𝑖 + 1 = 𝑥 𝑖 · 𝛽 , 𝜋 ( 𝑥 𝑖 + 1 ) = 𝑤 𝑖 + 1 , and 𝑟 𝜋 ( 𝑥 𝑖 + 1 ) = 𝑞 𝑖 + 1 . Note that 𝑥 𝑖 + 1 is, by construction, the successor of 𝑥 𝑖 on the primary trace. Thus, the projection of 𝑟 𝜋 on the primary trace of 𝜋 is precisely 𝑟 , so 𝑟 𝜋 is accepting along the primary path.

(2) If 𝜋 ( 𝑥 𝑖 )[− 𝑗 ] = 𝛽 [− 𝑗 ] and 𝜋 ( 𝑥 𝑖 ) ≠ 𝛽 for some 𝑗 ∈ Ω \ 𝑊 , then we set 𝑟 𝜋 ( 𝑥 𝑖 · 𝛽 ) = 𝑞 ′ 𝑗 = 𝛿 𝑗 ( 𝑞 𝑗 , 𝛽 ) , where 𝑞 𝑗 is the 𝑗 -th component of 𝑞 . Since 𝑞 𝑗 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , we have that 𝑞 ′ 𝑗 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . By Theorem 4.1, 𝑇 𝑊 is nonempty when started in state 𝑞 ′ 𝑗 . That is, there is a tree 𝜋 𝑞 ′ 𝑗 and an accepting run 𝑟 𝑞 ′ 𝑗 of 𝑇 𝑊 on 𝜋 𝑞 ′ 𝑗 , starting from 𝑞 ′ 𝑗 . So we take the subtree of 𝜋 rooted at the node 𝑥 𝑖 · 𝛽 to be 𝜋 𝑞 ′ 𝑗 , and the run of 𝑇 𝑊 from 𝑥 𝑖 · 𝛽 to be 𝑟 𝑞 ′ 𝑗 . So all paths of 𝑟 𝜋 that go through 𝑥 𝑖 · 𝛽 are accepting.

(3) Finally, if 𝜋 ( 𝑥 𝑖 )[− 𝑗 ] ≠ 𝛽 [− 𝑗 ] for all 𝑗 ∈ Ω \ 𝑊 , then 𝑥 𝑖 · 𝛽 is neither on the primary trace nor on a 𝑗 -deviant trace for some 𝑗 ∈ Ω \ 𝑊 . So we set 𝑟 𝜋 ( 𝑥 𝑖 · 𝛽 ) = 𝑞 𝐴 as well as 𝑟 𝜋 ( 𝑦 ) = 𝑞 𝐴 for all descendants 𝑦 of 𝑥 𝑖 · 𝛽 . The labels of 𝑥 𝑖 · 𝛽 and its descendants can be set arbitrarily. So all paths of 𝑟 𝜋 that go through 𝑥 𝑖 · 𝛽 are accepting.

( ← ) Assume 𝑇 𝑊 is nonempty. Then, we know that it accepts at least one tree 𝜋 : Σ ∗ → Σ .
In particular, since 𝑇 𝑊 accepts on all branches of 𝜋 , it accepts on the primary trace, denoted 𝜋 𝑝 . Since 𝑇 𝑊 accepts on 𝜋 𝑝 , we can consider the run of 𝑇 𝑊 on 𝜋 , which we denote 𝑟 : Σ ∗ → 𝑄 . Let the image of 𝑟 ( 𝜋 𝑝 ) be 𝑄 ∗ ⊆ 𝑄 . We claim that 𝑄 ∗ ⊆ 𝑄 ′ .

Assume otherwise, i.e., that for some finite prefix 𝑝 of the primary trace of 𝜋 we have 𝑟 ( 𝑝 ) ∉ 𝑄 ′ . Since 𝑟 ( 𝑝 ) clearly is inside 𝑄 , it must be the case that 𝑟 ( 𝑝 )[ 𝑗 ] ∉ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for some 𝑗 ∈ Ω \ 𝑊 . Since 𝑟 ( 𝑝 )[ 𝑗 ] is not in 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , by determinacy of safety games it must be winning for player 1. This means that, upon observing 𝑝 , a direction 𝛽 exists that transitions 𝑇 𝑊 into a state 𝑞 ′ from which player 1 has a winning strategy in 𝐺 𝑗 . Following one of the paths created by player 1 playing directions according to this winning strategy, with player 0 playing anything in response, player 1 eventually wins the game, forcing 𝑇 𝑊 to attempt a transition into 𝐹 𝑗 and getting stuck. Therefore 𝑇 𝑊 does not actually accept 𝜋 , a contradiction.

Since the image of 𝑟 ( 𝜋 𝑝 ) is contained within 𝑄 ′ , we claim that 𝐴 ′ 𝑊 accepts the word formed by the labels along 𝜋 𝑝 , which we denote by 𝛼 ( 𝜋 𝑝 ) . Since 𝑇 𝑊 accepts along 𝜋 𝑝 and the run 𝑟 ( 𝜋 𝑝 ) never leaves 𝑄 ′ , there are infinitely many members of the set 𝐹 ∩ 𝑄 ′ in the run 𝑟 ( 𝜋 𝑝 ) , satisfying the Büchi condition of 𝐴 ′ 𝑊 . And since any states in which some component in 𝑄 𝑗 for 𝑗 ∈ Ω \ 𝑊 reaches a final state are excluded from 𝑄 ′ , 𝐴 ′ 𝑊 never gets stuck reading 𝛼 ( 𝜋 𝑝 ) . Therefore, 𝐴 ′ 𝑊 accepts 𝛼 ( 𝜋 𝑝 ) and is therefore nonempty. □

Corollary 4.3.
Let 𝐺 be an iBG and 𝑊 ⊆ Ω be a set of agents. Then a 𝑊 -NE strategy exists in 𝐺 iff the automaton 𝐴 ′ 𝑊 constructed with respect to 𝐺 is nonempty.

The algorithm outlined by our previous constructions consists of two main parts. First, we construct and solve a safety game for each agent. Second, for 𝑊 ⊆ Ω , we check the automaton 𝐴 ′ 𝑊 for nonemptiness. The input to this algorithm consists of 𝑘 goal DFAs with alphabet Σ and a set of 𝑘 alphabets Σ 𝑖 corresponding to the actions available to each agent. Therefore, the size of the input is the sum of the sizes of these 𝑘 goal DFAs.

In the first step, we construct a safety game for each of the agents. The size of the state space of the safety game for agent 𝑗 is | 𝑄 𝑗 | · ( | Σ | + 1 ) . The size of the edge set for the safety game can be bounded by ( | 𝑄 𝑗 | · | Σ | ) + ( | 𝑄 𝑗 | 2 · | Σ | ) , where | 𝑄 𝑗 | · | Σ | represents the | Σ | outgoing transitions from each state in 𝑄 𝑗 owned by player 0, and | 𝑄 𝑗 | 2 · | Σ | is an upper bound assuming that each of the states in 𝑄 𝑗 × Σ owned by player 1 can transition to each of the states in 𝑄 𝑗 owned by player 0. Since safety games can be solved in linear time with respect to the number of edges [2], each safety game is solved in polynomial time. We solve one such safety game per agent, which represents a linear blow-up. Therefore, solving the safety games for all agents can be done in polynomial time.

For a given 𝑊 ⊆ Ω , querying the automaton 𝐴 ′ 𝑊 for nonemptiness can be done in PSPACE, as the state space of 𝐴 ′ 𝑊 consists of tuples from the product of the input DFAs. We can then test 𝐴 ′ 𝑊 on the fly by guessing the prefix of a lasso and then guessing the cycle, which can be done in polynomial space [29].

Theorem 5.1.
The problem of deciding whether there exists a 𝑊 -NE strategy profile for an iBG 𝐺 and a set 𝑊 ⊆ Ω of agents is in PSPACE.

In this section we show that the problem of determining whether a 𝑊 -NE exists in an iBG is PSPACE-hard by providing a reduction from the PSPACE-complete problem of DFA Intersection Emptiness (DFAIE). The DFAIE problem is as follows: given 𝑘 DFAs 𝐴 0 , . . . , 𝐴 𝑘 − 1 with a common alphabet Σ , decide whether ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ [15].

Given a DFA 𝐴 𝑖 = ⟨ 𝑄 𝑖 , 𝑞 𝑖 0 , Σ , 𝛿 𝑖 , 𝐹 𝑖 ⟩ , we define the goal DFA ˆ 𝐴 𝑖 = ⟨ ˆ 𝑄 𝑖 , 𝑞 𝑖 0 , ˆ Σ , ˆ 𝛿 𝑖 , ˆ 𝐹 𝑖 ⟩ as follows:

(1) ˆ Σ = Σ ∪ { 𝐾 } , where 𝐾 is a new symbol, i.e., 𝐾 ∉ Σ .

(2) ˆ 𝑄 𝑖 = 𝑄 𝑖 ∪ { accept , reject } .

(3) ˆ 𝛿 𝑖 ( 𝑞, 𝑎 ) = 𝑞 for 𝑞 ∈ { accept , reject } and 𝑎 ∈ ˆ Σ ; ˆ 𝛿 𝑖 ( 𝑞, 𝑎 ) = 𝛿 𝑖 ( 𝑞, 𝑎 ) for 𝑞 ∈ 𝑄 𝑖 and 𝑎 ∈ Σ ; ˆ 𝛿 𝑖 ( 𝑞, 𝐾 ) = accept for 𝑞 ∈ 𝐹 𝑖 ; ˆ 𝛿 𝑖 ( 𝑞, 𝐾 ) = reject for 𝑞 ∈ 𝑄 𝑖 \ 𝐹 𝑖 .

(4) ˆ 𝐹 𝑖 = { accept } .

Intuitively, accept and reject are two new accepting and rejecting states that only transition back to themselves. The new symbol 𝐾 takes accepting states to accept and rejecting states to reject. The purpose of 𝐾 is to synchronize acceptance by all goal automata. We call the process of modifying 𝐴 𝑖 into ˆ 𝐴 𝑖 the transformation.

The transformation from 𝐴 𝑖 to ˆ 𝐴 𝑖 can be done in linear time with respect to the size of 𝐴 𝑖 , as the process only involves adding two new states. Furthermore, if 𝐴 𝑖 is a DFA then ˆ 𝐴 𝑖 is also a DFA. Given an instance of DFAIE, i.e., 𝑘 DFAs 𝐴 0 , . . . , 𝐴 𝑘 − 1 , we create an iBG 𝐺 , defined in the following manner:

(1) Ω = { 0 , . . . , 𝑘 − 1 } .

(2) The goal for agent 𝑖 is ˆ 𝐴 𝑖 .

(3) Σ 0 = Σ ∪ { 𝐾 } = ˆ Σ .

(4) Σ 𝑖 = {∗} for 𝑖 ≠ 0 . Here ∗ represents a fresh symbol, i.e., ∗ ∉ Σ and ∗ ≠ 𝐾 .

Clearly, the blow-up of the construction is linear. Since each agent except 0 is given control over a set consisting solely of ∗ , the common alphabet of the ˆ 𝐴 𝑖 is technically ˆ Σ × {∗} 𝑘 − 1 . This alphabet is isomorphic to ˆ Σ , so by a slight abuse of notation we keep considering the alphabet of the ˆ 𝐴 𝑖 to be ˆ Σ .

Before stating and proving the correctness of the reduction, we make two observations. We are interested here in Nash equilibria in which every agent is included in 𝑊 . This implies the following:

(1) The existence of an Ω -NE is determined solely by the Primary-Trace Condition. Since there are no agents in Ω \ 𝑊 , there is no concept of a 𝑗 -deviant trace. If we are given an infinite word that satisfies the Primary-Trace Condition, we can extend it to a full Ω -NE strategy tree by labeling the nodes that do not occur on the primary trace arbitrarily.

(2) Since there are no 𝑗 -deviant traces in this specific instance of the Ω -NE Nonemptiness problem, we can relax our assumption that | Σ 𝑗 | ≥ 2 for all 𝑗 ∈ Ω , since there is no meaningful concept of deviation in an Ω -NE. Recall that this assumption was made only for simplicity of presentation regarding 𝑗 -deviant traces.

Theorem 5.2.
Let 𝐴 0 , . . . , 𝐴 𝑘 − 1 be 𝑘 DFAs with alphabet Σ . Then ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ iff there exists an Ω -NE in the iBG 𝐺 constructed from 𝐴 0 , . . . , 𝐴 𝑘 − 1 .

Proof.
In this proof, we introduce the notation 𝑆 to denote an infinite suffix, which is an arbitrarily chosen element of ( Σ ∪ { 𝐾 } ) 𝜔 .

( → ) Assume that ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ . Then there is a word 𝑤 ∈ Σ ∗ that is accepted by each of 𝐴 0 , . . . , 𝐴 𝑘 − 1 . We now show that 𝑤 · 𝐾 · 𝑆 satisfies all goals ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 . Since each of 𝐴 0 , . . . , 𝐴 𝑘 − 1 accepts 𝑤 , each of ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 reaches a final state of 𝐴 0 , . . . , 𝐴 𝑘 − 1 , respectively, after reading 𝑤 . Then, after reading 𝐾 , the automata ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 all simultaneously transition to accept. Therefore all goals ˆ 𝐴 𝑖 are satisfied on 𝑤 · 𝐾 · 𝑆 , and 𝑤 · 𝐾 · 𝑆 satisfies the Primary-Trace Condition. Since we are considering an Ω -NE, there is no need to check deviant traces, and 𝑤 · 𝐾 · 𝑆 can be arbitrarily extended to a full Ω -NE strategy profile tree.

( ← ) Assume that the iBG 𝐺 with goals ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 admits an Ω -NE. We claim that its primary trace must be of the form 𝑤 · 𝐾 · 𝑆 , where 𝑤 ∈ Σ ∗ does not contain 𝐾 . This is equivalent to saying that a satisfying primary trace must contain at least one 𝐾 . This is easy to see, as the character 𝐾 is the only way to transition into an accepting state of each ˆ 𝐴 𝑖 ; therefore it must occur at least once if all ˆ 𝐴 𝑖 are satisfied on this trace.

We now claim that each of 𝐴 0 , . . . , 𝐴 𝑘 − 1 accepts 𝑤 . Assume this is not the case, and some 𝐴 𝑖 does not accept 𝑤 . Then, while reading 𝑤 , ˆ 𝐴 𝑖 never reaches accept, as 𝑤 does not contain 𝐾 . Furthermore, upon seeing the first 𝐾 , ˆ 𝐴 𝑖 transitions to reject, since 𝐴 𝑖 is not in a final state in 𝐹 𝑖 after reading 𝑤 . Thus, ˆ 𝐴 𝑖 can never reach accept, contradicting the assumption that 𝑤 · 𝐾 · 𝑆 was an Ω -NE. Therefore all 𝐴 𝑖 must accept 𝑤 , and ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ . □

This establishes a polynomial-time reduction from DFAIE to 𝑊 -NE Nonemptiness; therefore 𝑊 -NE Nonemptiness is PSPACE-hard. In fact, this reduction has shown that checking the Primary-Trace Condition is itself PSPACE-hard.
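The transformation of Section 5.2 is mechanical enough to state as code. The sketch below is illustrative rather than the authors' implementation: the dictionary encoding of ˆ 𝛿 𝑖 and the state names `accept` and `reject` are our own choices. It adds the fresh letter 𝐾 and the two sink states, so that reading 𝐾 synchronizes acceptance exactly as the proof above uses.

```python
def hat_transform(Q, q0, Sigma, delta, F, K='K'):
    """Build the goal DFA A-hat from a DFA (Q, q0, Sigma, delta, F).

    Two fresh sink states are added: `accept` and `reject`, each looping to
    itself on every letter of the extended alphabet. The fresh letter K moves
    every final state of the original DFA to `accept` and every non-final
    original state to `reject`; `accept` is the only accepting state.
    """
    Sigma_hat = list(Sigma) + [K]
    Q_hat = list(Q) + ["accept", "reject"]
    delta_hat = dict(delta)                       # original transitions on Sigma
    for q in Q:
        delta_hat[(q, K)] = "accept" if q in F else "reject"
    for sink in ("accept", "reject"):
        for a in Sigma_hat:
            delta_hat[(sink, a)] = sink           # sinks loop on every letter
    return Q_hat, q0, Sigma_hat, delta_hat, ["accept"]
```

Running any word 𝑤 accepted by the original DFA followed by 𝐾 lands the transformed DFA in `accept`, and a rejected 𝑤 followed by 𝐾 lands it in `reject`, which is how the reduction forces all goal automata to accept simultaneously.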
Combining this with our PSPACE decision algorithm yields PSPACE-completeness.

Theorem 5.3.
The problem of deciding whether there exists a 𝑊 -NE strategy profile for an iBG 𝐺 and a set 𝑊 ⊆ Ω of agents is PSPACE-complete.

The main contribution of this work is Theorem 5.3, which shows that deciding whether a 𝑊 -NE strategy profile exists for an iBG 𝐺 and a set 𝑊 ⊆ Ω of agents is PSPACE-complete.

Separation of Strategic and Temporal Reasoning: The main objective of this work was to analyze equilibria in finite-horizon multiagent concurrent games, focusing on the strategic-reasoning aspect of the problem separately from temporal reasoning. To accomplish this, we used DFA goals instead of goals expressed in some finite-horizon temporal logic. For these finite-horizon temporal logics, previous analysis [12] consisted of two steps. First, the logical goals are translated into a DFA, which involves a doubly exponential blow-up [6, 17]. The second step was to perform the strategic reasoning, i.e., finding the Nash equilibria with the DFA from the first step as input. In terms of computational complexity, the first step completely dominated the second step, in which the strategic reasoning was conducted with respect to the DFAs. Here we eliminated the doubly exponential blow-up from consideration by starting with DFA goals, and provided a PSPACE-completeness result for the second step.
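The strategic-reasoning step behind Theorem 5.3 ultimately comes down to a lasso test on the Büchi word automaton 𝐴 ′ 𝑊 : the automaton is nonempty iff some accepting state is reachable from the initial state and lies on a cycle. The sketch below is illustrative only; it materializes the reachable graph through a hypothetical `successors` callback standing in for 𝛿 ′ , whereas the PSPACE procedure instead guesses the prefix and the cycle on the fly, storing only one state at a time [29].

```python
def buchi_nonempty(initial, accepting, successors):
    """Explicit-graph lasso test for Büchi word automaton nonemptiness.

    Nonempty iff an accepting state is (i) reachable from `initial` and
    (ii) reachable from itself in at least one step (i.e., on a cycle).
    `successors(state)` returns the list of one-step successor states.
    """
    def reachable(source):
        # states reachable from `source` in one or more steps
        seen, stack = set(), [source]
        while stack:
            for t in successors(stack.pop()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    reach = reachable(initial) | {initial}
    return any(f in reachable(f) for f in accepting if f in reach)
```

For example, on a three-state automaton where states 1 and 2 cycle between each other and state 0 only feeds into the cycle, the test succeeds when an accepting state sits on that cycle and fails when the only accepting state is the transient state 0.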
Future Work: Our immediate next goals are to analyze problems such as verification (deciding whether a given strategy profile is a 𝑊 -NE) and strategy extraction (i.e., constructing a finite-state controller that implements the 𝑊 -NEs found) within the context of our DFA-based iBGs. Furthermore, we are interested in implementation, i.e., a tool based on the theory developed in this paper. Further points of interest can be motivated from a game-theoretic lens, such as introducing imperfect information. Earlier work has already introduced imperfect information to problems in synthesis and verification; see [3, 8, 27]. Finally, the work can be extended both to the general CGS formalism (as opposed to iBGs) and to querying other properties and equilibrium concepts beyond Nash equilibria. Strategy Logic [21] has been introduced as a way to query general game-theoretic properties on concurrent game structures, and a version of Strategy Logic with finite goals would be a promising place to start for these extensions.
REFERENCES

[1] R. Alur, T. A. Henzinger, and O. Kupferman. Alternating-time temporal logic. J. ACM, 49(5):672–713, 2002.
[2] J. Bernet, D. Janin, and I. Walukiewicz. Permissive strategies: from parity games to safety games. RAIRO - Theoretical Informatics and Applications, 36(3):261–275, 2002.
[3] R. Berthon, B. Maubert, A. Murano, S. Rubin, and M. Y. Vardi. Strategy logic with imperfect information. CoRR, abs/1805.12592, 2018.
[4] J. Elgaard, N. Klarlund, and A. Möller. Mona 1.x: new techniques for WS1S and WS2S. In Proc. 10th Int'l Conf. on Computer Aided Verification, volume 1427 of Lecture Notes in Computer Science, pages 516–520. Springer, 1998.
[5] D. Fisman, O. Kupferman, and Y. Lustig. Rational synthesis. In Proc. 16th Int'l Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2010), volume 6015 of Lecture Notes in Computer Science, pages 190–204. Springer, 2010.
[6] G. D. Giacomo and M. Y. Vardi. Linear temporal logic and linear dynamic logic on finite traces. In Proc. 23rd Int'l Joint Conf. on Artificial Intelligence (IJCAI 2013), pages 854–860. IJCAI/AAAI, 2013.
[7] G. D. Giacomo and M. Y. Vardi. Synthesis for LTL and LDL on finite traces. In Proc. 24th Int'l Joint Conf. on Artificial Intelligence (IJCAI 2015), pages 1558–1564. AAAI Press, 2015.
[8] G. D. Giacomo and M. Y. Vardi. LTLf and LDLf synthesis under partial observability. In Proc. 25th Int'l Joint Conf. on Artificial Intelligence (IJCAI 2016), pages 1044–1050. IJCAI/AAAI Press, 2016.
[9] E. Grädel, W. Thomas, and T. Wilke. Automata, Logics, and Infinite Games: A Guide to Current Research, volume 2500 of Lecture Notes in Computer Science. Springer, 2002.
[10] J. Gutierrez, P. Harrenstein, and M. J. Wooldridge. Iterated boolean games. Inf. Comput., 242:53–79, 2015.
[11] J. Gutierrez, M. Najib, G. Perelli, and M. J. Wooldridge. Automated temporal equilibrium analysis: Verification and synthesis of multi-player games. Artif. Intell., 287:103353, 2020.
[12] J. Gutierrez, G. Perelli, and M. J. Wooldridge. Iterated games with LDL goals over finite traces. In Proc. 16th Conf. on Autonomous Agents and MultiAgent Systems (AAMAS 2017), pages 696–704. ACM, 2017.
[13] M. Hasanbeig, N. Y. Jeppu, A. Abate, T. Melham, and D. Kroening. Deepsynth: Program synthesis for automatic task segmentation in deep reinforcement learning. CoRR, abs/1911.10244, 2019.
[14] T. A. Henzinger. Games in system design and verification. In Proc. 10th Conf. on Theoretical Aspects of Rationality and Knowledge (TARK 2005), pages 1–4. National University of Singapore, 2005.
[15] D. Kozen. Lower bounds for natural proof systems. In Proc. 18th Annual Symposium on Foundations of Computer Science, pages 254–266. IEEE Computer Society, 1977.
[16] O. Kupferman, G. Perelli, and M. Y. Vardi. Synthesis with rational environments. Ann. Math. Artif. Intell., 78(1):3–20, 2016.
[17] O. Kupferman and M. Vardi. Model checking of safety properties. Formal Methods in System Design, 19(3):291–314, 2001.
[18] R. McNaughton. Infinite games played on finite graphs. Ann. Pure Appl. Logic, 65(2):149–184, 1993.
[19] J. J. Michalenko, A. Shah, A. Verma, R. G. Baraniuk, S. Chaudhuri, and A. B. Patel. Representing formal languages: A comparison between finite automata and recurrent neural networks. In Proc. 7th Int'l Conf. on Learning Representations (ICLR 2019). OpenReview.net, 2019.
[20] F. Mogavero, A. Murano, G. Perelli, and M. Y. Vardi. Reasoning about strategies: On the model-checking problem. ACM Trans. on Computational Logic, 15(4):1–47, 2014.
[21] F. Mogavero, A. Murano, G. Perelli, and M. Y. Vardi. Reasoning about strategies: On the model-checking problem. ACM Trans. Comput. Log., 15(4):34:1–34:47, 2014.
[22] J. F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.
[23] A. Pnueli. The temporal logic of programs. In Proc. 18th Annual Symposium on Foundations of Computer Science, pages 46–57. IEEE Computer Society, 1977.
[24] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. 16th ACM Symposium on Principles of Programming Languages (POPL 1989), pages 179–190. ACM Press, 1989.
[25] Y. Shoham and K. Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.
[26] M. Sipser. Introduction to the Theory of Computation. Course Technology, second edition, 2006.
[27] L. M. Tabajara and M. Y. Vardi. LTLf synthesis under partial observability: From theory to practice. CoRR, abs/2009.10875, 2020.
[28] J. van Benthem. Logic games: From tools to models of interaction. In Proof, Computation and Agency: Logic at the Crossroads, volume 352 of Synthese Library, pages 183–216. Springer, 2011.
[29] M. Vardi and P. Wolper. Reasoning about infinite computations. Information and Computation, 115(1):1–37, 1994.
[30] M. J. Wooldridge. An Introduction to MultiAgent Systems, Second Edition. Wiley, 2009.
[31] E. Yahav. From programs to interpretable deep models and back. In Proc. 30th Int'l Conf. on Computer Aided Verification (CAV 2018), Part I, volume 10981 of Lecture Notes in Computer Science, pages 27–37. Springer, 2018.

Notation Glossary

𝐺 : An iBG as a whole.
Ω : The set of agents in an iBG.
𝑘 : The cardinality of Ω ; Ω = { 0 , . . . , 𝑘 − 1 } .
𝑖 , 𝑗 : Agents in an iBG; 𝑗 usually refers to a deviating agent.
𝐴 𝑖 : The goal automaton for agent 𝑖 (usually 𝐴 𝑗 if 𝑗 is not in 𝑊 ); 𝐴 𝑖 = ⟨ 𝑄 𝑖 , 𝑞 𝑖 0 , Σ , 𝛿 𝑖 , 𝐹 𝑖 ⟩ .
𝜋 𝑖 : A strategy for agent 𝑖 .
Π 𝑖 : The set of strategies for agent 𝑖 .
𝜋 : A strategy profile consisting of one strategy for each agent, 𝜋 = ⟨ 𝜋 0 , . . . , 𝜋 𝑘 − 1 ⟩ ; in the context of safety games, a play.
Σ 𝑖 : The set of actions for agent 𝑖 .
Σ : The cross product of all Σ 𝑖 .
𝑤 : An element of Σ ∗ .
𝛼 : An element of Σ ; in the context of Σ ∗ → Σ trees, it refers to a label.
𝛽 : An element of Σ ; in the context of Σ ∗ → Σ trees, it refers to a direction.
𝑞 𝐴 , 𝑞 𝑎 : A catch-all accepting state in tree automata that always transitions back to itself.
𝐴 𝑊 : A deterministic Büchi word automaton that accepts traces in which all goals from 𝑊 are satisfied and no others; 𝐴 𝑊 = ⟨ 𝑄, 𝑞 0 , Σ , 𝛿, 𝐹 ⟩ .
𝑇 : A deterministic top-down Büchi tree automaton that accepts a tree if its primary trace is accepted by 𝐴 𝑊 ; 𝑇 = ( Σ , Σ , 𝑄 ∪ { 𝑞 𝑎 } , 𝑞 0 , 𝜌, 𝐹 ∪ { 𝑞 𝑎 } ) .
𝑇 𝑗 : A deterministic top-down Büchi tree automaton that accepts a tree if it satisfies the 𝑗 -Deviant-Trace Condition; 𝑇 𝑗 = ( Σ , Σ , ( 𝑄 𝑗 × { 0 , 1 }) ∪ { 𝑞 𝐴 } , ⟨ 𝑞 𝑗 0 , 0 ⟩ , 𝜌 𝑗 , ( 𝑄 𝑗 × { 0 }) ∪ (( 𝑄 𝑗 \ 𝐹 𝑗 ) × { 1 }) ∪ { 𝑞 𝐴 } ) .
𝑇 𝑊 : A deterministic top-down Büchi tree automaton that accepts a tree if it represents a 𝑊 -NE strategy profile; 𝑇 𝑊 = ( Σ , Σ , 𝑄 ∪ ⋃ 𝑗 ∈ Ω \ 𝑊 𝑄 𝑗 ∪ { 𝑞 𝐴 } , 𝑞 0 , 𝜏, 𝐹 ∪ ⋃ 𝑗 ∈ Ω \ 𝑊 ( 𝑄 𝑗 \ 𝐹 𝑗 ) ∪ { 𝑞 𝐴 } ) .
𝐺 𝑗 : A safety game constructed to partition the states of 𝑄 𝑗 in 𝐴 𝑗 into those from which 𝑇 𝑊 is empty and those from which it is nonempty; 𝐺 𝑗 = ( 𝑄 𝑗 , 𝑄 𝑗 × Σ , 𝐸 𝑗 ) .
𝑊 𝑖𝑛 ( 𝐺 𝑗 ) : The winning set of player 0 in 𝐺 𝑗 .
𝐴 ′ 𝑊 : A deterministic Büchi word automaton used to test 𝑇 𝑊 for nonemptiness; 𝐴 ′ 𝑊 = ( 𝑄 ′ , 𝑞 0 , Σ , 𝛿 ′ , 𝐹 ∩ 𝑄 ′ ) .
𝐾 : A fresh character that is not contained in Σ .
∗ : A second fresh character that is neither contained in Σ nor equal to 𝐾 .
ˆ 𝐴 𝑖 : A transformed DFA that serves as a goal DFA in an iBG; ˆ 𝐴 𝑖 = ⟨ ˆ 𝑄 𝑖 , 𝑞 𝑖 0 , ˆ Σ , ˆ 𝛿 𝑖 , ˆ 𝐹 𝑖 ⟩ . See Section 5.2.
ˆ Σ : Σ ∪ { 𝐾 } .
𝑆 : An arbitrary element of ( Σ ∪ { 𝐾 } ) 𝜔 .