Nash Equilibria in Finite-Horizon Multiagent Concurrent Games
Senthil Rajasekaran
Computer Science Department, Rice University
[email protected]
Moshe Y. Vardi
Computer Science Department, Rice University
[email protected]
ABSTRACT
The problem of finding pure strategy Nash equilibria in multiagent concurrent games with finite-horizon temporal goals has received some recent attention. Earlier work solved this problem through the use of Rabin automata. In this work, we take advantage of the finite-horizon nature of the agents' goals and show that checking for and finding pure strategy Nash equilibria can be done using a combination of safety games and lasso testing in Büchi automata. To separate strategic reasoning from temporal reasoning, we model agents' goals by deterministic finite-word automata (DFAs), since finite-horizon logics such as LTLf and LDLf are reasoned about through conversion to equivalent DFAs. This allows us to characterize the complexity of the problem as PSPACE-complete.

ACM Reference Format:
Senthil Rajasekaran and Moshe Y. Vardi. 2021. Nash Equilibria in Finite-Horizon Multiagent Concurrent Games. In
Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3–7, 2021, IFAAMAS, 11 pages.
INTRODUCTION
Game theory provides a powerful framework for modeling problems in system design and verification [9, 14, 28]. In particular, two-player games have been used in synthesis problems for temporal logics [24]. In these games, one player takes on the role of the system that tries to realize a property and the other takes on the role of the environment that tries to falsify the property. Within the scope of multiplayer games, two-player zero-sum games are the easiest to analyze, since they are purely adversarial – there is no reason for either player to do anything but maximize their own utility at the expense of the other. When there are multiple agents with multiple goals, pure antagonism is not a reasonable assumption [30].
Concurrent games are a fundamental model of such multiagent systems [1, 20].
Iterated Boolean Games (iBG) [10] are a restriction of concurrent games introduced in part to generalize temporal synthesis problems to the multiagent setting. In an iBG, each agent has a temporal goal, usually expressed in
Linear Time Temporal Logic (LTL) [23], and is given control over a unique set of boolean variables. At each time step, the agents collectively decide a setting of all boolean variables by individually and concurrently assigning values to their own variables. This creates an infinite sequence of boolean assignments (a trace) that is used to determine which goals are satisfied
(Work supported in part by NSF grants IIS-1527668, CCF-1704883, IIS-1830549, and an award from the Maryland Procurement Office.)
and which are not [10]. In this paper, we generalize the iBG formalism slightly to admit arbitrary finite alphabets rather than just truth assignments to boolean variables, as discussed below. The concept of the
Nash Equilibrium [22] is widely accepted as an important notion of a solution in multiagent games and represents a situation where agents cannot improve their outcomes unilaterally. In this paper we consider deterministic agents, and therefore the notion of a Nash equilibrium in this paper is that of a pure strategy Nash equilibrium [25]. This definition has a natural analogue when iBGs are considered, so finding Nash equilibria in iBGs is an effective way to reason about temporal interactions between multiple agents [10]. This problem has received attention in the literature when the goals are derived from infinite-horizon logics such as LTL [5, 11]. There are, however, interactions that are better modeled by finite-horizon goals, especially when notions such as "completion" are considered [7]. In such settings, it is more effective to reason about goals that can be completed in some finite but perhaps unbounded number of steps. Thus, while the agents still create an infinite trace with their decisions, satisfaction occurs at a finite time index. With this modification in mind, the analogous problem for finite-horizon temporal logics has recently begun to receive attention [12]. The main result of [12] is that automated equilibrium analysis of finite-horizon goals in iterated Boolean games can be done via reasoning about automata on infinite words, specifically,
Rabin automata. Here we address a more abstract version of the multiagent finite-horizon temporal-equilibrium problem by analyzing concurrent iterated games in which each agent is given their own
Deterministic Finite Word Automaton (DFA) goal. The reason for this is twofold. First, essentially all finite-horizon temporal logics are reasoned about through conversion to equivalent DFAs, including the popular logics LTLf and LDLf [6, 7]. Thus, using DFA goals offers us a general way of dealing with a variety of temporal formalisms. Furthermore, using DFA goals enables us to separate the complexity of temporal reasoning from the complexity of strategic reasoning. Our focus on DFAs also ties in to a growing interest in DFAs as graphical models that can be reasoned about directly in a number of related fields; see [13, 19, 31] for a few examples in the context of machine learning.

Our modelling of this problem is done from the viewpoint of a system planner. Specifically, when given a system in which multiple agents have DFA goals, we query a subset W of "good" agents to see if there is a Nash equilibrium in which only the agents in W are able to satisfy their goals. By the definition of the Nash equilibrium, this means that agents not within W, which we consider as "bad" agents, are unable to unilaterally change their strategy and satisfy their own "bad" goal. In doing so we can naturally incorporate malicious agents with goals contrary to the planner's by specifying a set W that does not contain such agents. This study of teams of cooperating agents has clear parallels to earlier work in rational synthesis [5, 16].

Our main result is that automated temporal-equilibrium analysis is PSPACE-complete. We prove that the problem of identifying sets of players that admit Nash equilibria in concurrent multiagent games with DFA goals can be solved using rather simple constructions. Specifically, our algorithm works by first solving a safety game for each agent in the game and then considers nonemptiness in a Büchi word automaton constructed with respect to the set W of agents, which can be done in PSPACE.
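The Büchi nonemptiness test in this second phase amounts to a lasso search: the language of a Büchi automaton is nonempty iff some accepting state is reachable from the initial state and lies on a cycle. The following is only an illustrative sketch (not the paper's implementation), assuming the automaton's transition graph is given explicitly as a successor relation:

```python
def buchi_nonempty(init, succ, accepting):
    """Lasso test: L(A) is nonempty iff some accepting state is reachable
    from init and can reach itself again (i.e., lies on a cycle).
    succ maps a state to the set of its successor states; the letters are
    elided, since only the graph matters for nonemptiness."""
    # forward reachability from the initial state
    reach, stack = {init}, [init]
    while stack:
        for r in succ.get(stack.pop(), ()):
            if r not in reach:
                reach.add(r)
                stack.append(r)

    def on_cycle(q):
        # is q reachable from q by a nonempty path?
        seen, stack = set(), list(succ.get(q, ()))
        while stack:
            r = stack.pop()
            if r == q:
                return True
            if r not in seen:
                seen.add(r)
                stack.extend(succ.get(r, ()))
        return False

    return any(on_cycle(q) for q in accepting & reach)
```

On an explicit graph this runs in time polynomial in the number of states; the PSPACE bound in the paper presumably relies on exploring the exponentially large product automaton on the fly rather than constructing it.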
This is in contrast to the 2EXPTIME upper bound of [12], which analyzed the combined complexity of temporal and strategic reasoning and also considered existence overall instead of with respect to a specific set of agents W. In that case the driving force behind the complexity result was the doubly exponential blow-up from LDLf to DFAs [6, 17]. Finally, we prove our algorithm optimal by providing a matching lower bound.

PRELIMINARIES
We assume familiarity with basic automata theory, as in [26]. Below is a quick refresher on ω-automata and infinite-tree automata.

Definition 2.1 (ω-automata). [9] A deterministic ω-automaton is a 5-tuple ⟨Q, q_0, Σ, δ, Acc⟩, where Q is a finite set of states, q_0 ∈ Q is the initial state, Σ is a finite alphabet, δ : Q × Σ → Q is the transition function, and Acc is an acceptance criterion. An infinite word w = a_0, a_1, ... ∈ Σ^ω is accepted by the automaton if the run q_0, q_1, ... ∈ Q^ω is accepting; here q_0 is the initial state, q_{i+1} = δ(q_i, a_i) for all i ≥ 0, and the run is accepting if it satisfies the acceptance condition Acc.

Definition 2.2 (ω-automata Büchi Acceptance Condition). [9] The Büchi condition is specified by a finite set F ⊆ Q. For a given infinite run r, let inf(r) denote the set of states that occur infinitely often in r. The Büchi condition is satisfied by r if inf(r) ∩ F ≠ ∅.

We now extend this definition to deterministic Büchi tree automata. These automata recognize sets of labeled directed trees. A Σ-labeled, Δ-directed tree, for finite alphabets Σ (label alphabet, or labels, for short) and Δ (direction alphabet, or directions, for short) is a mapping τ : Δ* → Σ. Intuitively, τ labels the nodes u ∈ Δ* with labels from Σ. A path p of a Δ-directed tree is an infinite sequence p = u_0, u_1, ... ∈ (Δ*)^ω such that u_{i+1} = u_i b_i for some b_i ∈ Δ. We use the notation τ(p) to denote the infinite sequence τ(u_0), τ(u_1), ... ∈ Σ^ω.

Definition 2.3 (Deterministic Büchi Tree Automata). [9] A deterministic Büchi tree automaton is a tuple ⟨Σ, Δ, Q, q_0, ρ, F⟩, where Σ is a finite label alphabet, Δ is a finite direction alphabet, Q is a finite state set, q_0 ∈ Q is the initial state, ρ : (Q × Σ × Δ) → Q is a deterministic transition function, and F ⊆ Q is the accepting-state set. The automaton is considered to be top-down if runs of the automaton start from the root of a tree. All automata in this paper will be top-down, and our notion of a run is conditioned on this. A run of this automaton on a Σ-labeled, Δ-directed tree τ : Δ* → Σ is a Q-labeled, Δ-directed tree r : Δ* → Q such that r(ε) = q_0, and if u ∈ Δ*, τ(u) = a for a ∈ Σ, r(u) = q, and v = ub for b ∈ Δ, then r(v) = ρ(q, a, b). The run r is accepting if r(p) satisfies the Büchi condition F for every path p of r.
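On ultimately periodic ("lasso") runs, the Büchi condition of Definition 2.2 becomes effectively checkable: for the run of a deterministic automaton on a word of the form prefix · loop^ω, the set inf(r) can be computed by iterating the loop until the state at the loop boundary repeats. A small sketch under these assumptions (an explicit transition table and a finitely presented input word, both hypothetical encodings):

```python
def inf_states(delta, q0, prefix, loop):
    """Compute inf(r) for the run of a deterministic automaton
    (Definition 2.1) on the ultimately periodic word prefix . loop^w.
    delta maps (state, letter) -> state; loop must be nonempty."""
    q = q0
    for a in prefix:                  # consume the finite prefix
        q = delta[(q, a)]
    boundary = {}                     # boundary state -> traversal index
    traversals = []                   # states visited during each loop pass
    while q not in boundary:
        boundary[q] = len(traversals)
        visited = set()
        for a in loop:
            q = delta[(q, a)]
            visited.add(q)
        traversals.append(visited)
    # from the repeated boundary state on, the run cycles forever
    return set().union(*traversals[boundary[q]:])

def buchi_accepts_lasso(delta, q0, F, prefix, loop):
    """Definition 2.2: the run satisfies the Buchi condition iff
    inf(r) intersects F."""
    return bool(inf_states(delta, q0, prefix, loop) & set(F))
```

The "lasso testing in Büchi automata" mentioned in the abstract operates on exactly this kind of finitely presented witness.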
In this section we provide some definitions related to simple two-player games, to establish a standard notation throughout this paper. The two players will be denoted by player 0 and player 1.
Definition 2.4 (Arena). An arena is a four-tuple A = (V, V_0, V_1, E) where V is a finite set of vertices, V_0 and V_1 are disjoint subsets of V with V_0 ∪ V_1 = V that represent the vertices that belong to player 0 and player 1 respectively, and E ⊆ V × V is a set of directed edges, i.e. (v, v′) ∈ E if there is an edge from v to v′. Intuitively, the player that owns a node decides which outgoing edge to follow. Since V = V_0 ∪ V_1, we can notate the same arena while omitting V, a convention we follow in this paper.

Definition 2.5 (Play). A play in an arena A is an infinite sequence ρ_0 ρ_1 ρ_2 ... ∈ V^ω such that (ρ_n, ρ_{n+1}) ∈ E holds for all n ∈ ℕ. We say that ρ starts at ρ_0. We now introduce a very broad definition for two-player games.
Definition 2.6 (Game). A game G = (A, Win) consists of an arena A with vertex set V and a set of winning plays Win ⊆ V^ω. A play ρ is winning for player 0 if ρ ∈ Win; otherwise it is winning for player 1. Note that in this formulation of a game, reaching a state v ∈ V with no outgoing transitions is always losing for player 0, as player 0 is the one that must ensure that ρ is infinite (a member of V^ω). A game is thus defined by its set of winning plays, often called the winning condition. One such widely used winning condition is the safety condition.

Definition 2.7 (Safety Condition / Safety Game).
Let A = (V, V_0, V_1, E) be an arena and S ⊆ V be a subset of A's vertices. Then, the safety condition Safety(S) is defined as Safety(S) = {ρ ∈ V^ω | Occ(ρ) ⊆ S}, where Occ(ρ) denotes the subset of vertices that occur at least once in ρ. A game with the safety winning condition for a subset S is a safety game with the set S of safe vertices. Information about solving safety games, including notions of winning strategies and winning sets, can be found in [18].

A concurrent game structure (CGS) is an 8-tuple (Prop, Ω, (Act_i)_{i∈Ω}, S, λ, τ, s_0, (A_i)_{i∈Ω}), where Prop is a finite set of propositions; Ω = {0, ..., k−1} is a finite set of agents; Act_i is a set of actions, where each Act_i is associated with an agent i (we also construct the set of decisions D = Act_0 × Act_1 × ... × Act_{k−1}); S is a set of states; λ : S → 2^Prop is a labeling function that associates each state with a set of propositions that are interpreted as true in that state; τ : S × D → S is a deterministic transition function that takes a state and a decision as input and returns another state; s_0 is a state in S that serves as the initial state; and A_i is a DFA associated with agent i. A DFA A_i is denoted as the goal of agent i. Intuitively, agent i prefers plays in the game that satisfy A_i, that is, a play such that some finite prefix of the play is accepted by A_i. It is for this reason we refer to A_i as a "goal". We now define iterated boolean games (iBG), a restriction on the CGS formalism. Our formulation is a slight generalization of the iBG framework introduced in [10], as we take the set of actions to be a finite alphabet rather than a set of truth assignments, since we are interested in separating temporal reasoning from strategic reasoning. An iBG is defined by applying the following restrictions to the CGS formalism. Each agent i is associated with its own alphabet Σ_i.
These Σ_i are disjoint and each Σ_i serves as the set of actions for agent i; an action for agent i consists of choosing a letter in Σ_i. The set of decisions is then Σ = Σ_0 × ... × Σ_{k−1}. The set of states corresponds to the set of decisions Σ; there is a bijection between the set of states and the set of decisions. The labeling function mirrors the element of Σ associated with each state. As in [10], we still have λ(s) = s, but with s ∈ Σ now. As a slight abuse of notation, we consider the "proposition" σ ∈ Σ_i for some i to be true at state s if σ appears in s, allowing us to generalize towards arbitrary alphabets. Finally, the transition function τ is simply right projection: τ(s, d) = d. We now introduce the notion of a strategy for agent i in the general CGS formalism.

Definition 2.8 (Strategy for agent i). A strategy for agent i is a function π_i : S* → Act_i. Intuitively, this is a function that, given the observed history of the game (represented by an element of S*), returns an action a_i ∈ Act_i. Recalling that Ω = {0, ..., k−1} represents the set of agents, we now introduce the notion of a strategy profile.

Definition 2.9 (Strategy Profile).
Let Π_i represent the set of strategies for agent i. Then, we define the set of strategy profiles Π = ∏_{i∈Ω} Π_i. Note that since both the notion of strategies for individual agents and the transition function in a CGS are deterministic, a given strategy profile for a CGS defines a unique element of S^ω (a trace).

Definition 2.10 (Primary Trace resulting from a Strategy Profile).
Given a strategy profile π, the primary trace of π is the unique trace t that satisfies
(1) t[0] = π(ε)
(2) t[i] = π(t[0], ..., t[i−1])
We denote this trace as t_π. Given a trace t ∈ S^ω, define the winning set W_t = {i ∈ Ω : t ⊨ A_i} to be the set of agents whose DFA goals are satisfied by a finite prefix of the trace t. The losing set is then defined as Ω \ W_t. A common solution concept in game theory is the Nash equilibrium, which we will now modify to fit our iBG framework. In our framework, a Nash equilibrium is a strategy profile π such that for each agent i, if A_i is not satisfied on t_π, then any unilateral strategy deviation for agent i will not result in a trace that satisfies A_i. Formally:

Definition 2.11 (Nash Equilibrium). [10] Let G be an iBG and π = ⟨π_0, π_1, ..., π_{k−1}⟩ be a strategy profile. We denote W_π = W_{t_π}. The profile π is a Nash equilibrium if for every i ∈ Ω \ W_π we have that, given all strategy profiles of the form π′ = ⟨π_0, π_1, ..., π′_i, ..., π_{k−1}⟩ for π′_i ∈ Π_i, it is the case that i ∈ Ω \ W_{π′}. This definition provides an analogue of the Nash equilibrium defined in [22] by capturing the same property: no agent can unilaterally deviate to improve its own payoff (moving from having an unsatisfied goal to a satisfied goal). Agents already in the set W_π cannot have their payoff improved further, so we do not check their deviations. Our paper is based around one central question: given an iBG, which subsets of agents admit at least one Nash equilibrium?
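To make the primary trace and the winning set W_t concrete, both can be computed directly, up to a finite horizon, by iterating the strategies and simulating each goal DFA on the growing prefix. This is only an illustrative sketch with hypothetical encodings (strategies as Python functions over histories, goals as explicit transition dictionaries); it certifies satisfaction only for goals met within the chosen horizon.

```python
def primary_trace(strategies, horizon):
    """First `horizon` decisions of t_pi: at each step, every agent
    applies its strategy to the shared history, and the resulting
    decision tuple is also the next state of the iBG."""
    history = []
    for _ in range(horizon):
        decision = tuple(s(tuple(history)) for s in strategies)
        history.append(decision)
    return history

def winning_set(trace, goals):
    """W_t restricted to the given finite prefix: agent i is included iff
    its DFA goal (q0, delta, F) accepts some nonempty prefix of the trace.
    A missing transition means the goal can no longer be satisfied."""
    winners = set()
    for i, (q0, delta, F) in enumerate(goals):
        q = q0
        for d in trace:
            q = delta.get((q, d))
            if q is None:
                break
            if q in F:
                winners.add(i)
                break
    return winners
```

For instance, with two agents whose (hypothetical) alphabets are Σ_0 = {'a', 'b'} and Σ_1 = {'x'}, constant strategies produce the trace ('a','x'), ('a','x'), ..., and a goal DFA that accepts exactly the words beginning with ('a','x') puts agent 0 in the winning set.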
THE TREE-AUTOMATA FRAMEWORK
In order to address our central question, we first describe a tree-automata framework to characterize the set of Nash equilibrium strategies in an iBG G. In this section we fix a winning set W ⊆ Ω and then describe a deterministic Büchi tree automaton that recognizes the set of strategy profiles for W. In the next section we develop an algorithm based on this tree-automata framework. Given k DFA goals corresponding to k agents, we retain the notation that the set of actions for agent i is given by Σ_i. The goal DFA for agent i will then be denoted as A_i = ⟨Q_i, q_i^0, Σ, δ_i, F_i⟩. Note that the alphabet of the DFA is Σ, since it transitions according to decisions by all agents in the overlying iBG structure. Since Σ = Σ_0 × ... × Σ_{k−1}, compact notation is often used to describe the transition function δ_i. For example, the Mona tool uses binary decision diagrams to represent automata with large alphabets [4].
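Since Σ is a product of the agents' alphabets, an explicit transition table for δ_i has a row per decision tuple and blows up exponentially in the number of agents, which is why compact representations (such as Mona's BDDs) are used. A sketch of the functional alternative follows, built around a hypothetical goal ("agent 1 eventually plays 'a'") that inspects only one component of each decision; all names here are illustrative, not from the paper.

```python
from itertools import product

# Hypothetical agent alphabets: three agents, two letters each, so the
# decision alphabet Sigma already has 2**3 = 8 letters.
SIGMAS = [('a', 'b'), ('a', 'b'), ('a', 'b')]

def make_goal_dfa():
    """Goal DFA for "agent 1 eventually plays 'a'": its transition
    function reads a whole decision tuple but only inspects component 1,
    so no table over all of Sigma is ever materialized."""
    q0, acc = 'wait', 'done'
    def delta(q, decision):
        return acc if q == acc or decision[1] == 'a' else q
    return q0, delta, {acc}

def accepts_prefix(dfa, word):
    """Does the DFA accept this finite prefix (a list of decision tuples)?"""
    q0, delta, F = dfa
    q = q0
    for d in word:
        q = delta(q, d)
    return q in F
```

Representing δ_i as a function (or a BDD) keeps the description polynomial even though |Σ| grows exponentially with the number of agents.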
As defined previously, strategy profiles are functions π : Σ* → Σ. Therefore, strategy profiles correspond exactly to Σ-labeled, Σ-directed trees, which are defined in the exact same way. We use the common notions of tree paths and label-direction pairs as widely defined in the literature (see [9] for reference). A W-NE strategy, for W ⊆ Ω, is a mapping π : Σ* → Σ such that the following conditions are satisfied:
(1) Primary-Trace Condition: The primary infinite trace t_π defined by π satisfies the goals A_j precisely for j ∈ W. The trace t_π = x_0, x_1, ... for π is once again defined as follows:
(a) x_0 = ε
(b) x_{i+1} = x_i · π(x_i)
(2) j-Deviant-Trace Condition: Each j-deviant trace t = y_0, y_1, ... for j ∉ W does not satisfy the goal A_j.
For α ∈ Σ, we introduce the notation α[−j] to refer to α|_{Σ\Σ_j} (that is, α with Σ_j projected out). A trace t = y_0, y_1, ... is j-deviant if
(a) y_0 = ε
(b) y_{i+1} = y_i · α, where α ∈ Σ and α[−j] = π(y_i)[−j]
(c) t is not the primary trace
In order to simplify the presentation, we introduce the assumption that for all agents j we have |Σ_j| ≥ 2. This is because there are no j-deviant traces for an agent with only one strategy; therefore, W-NE analysis only amounts to checking the Primary-Trace Condition for these agents.

Note that there are traces that do not fall into either category. For example, we could have a trace that contains a label-direction pair (α, β) such that α[−j] ≠ β[−j] for all j ∈ Ω \ W. Or, we could have a trace that contains two label-direction pairs (α_1, β_1) and (α_2, β_2) such that α_1 ≠ β_1, α_1[−j_1] = β_1[−j_1], α_2 ≠ β_2, and α_2[−j_2] = β_2[−j_2] for j_1 ≠ j_2. Traces like these and others that do not fit into either the Primary-Trace category or the j-Deviant-Trace category are irrelevant to the Nash equilibrium condition – it does not matter what properties do or do not hold on these traces. As a reminder, a trace z_0, z_1, ... ∈ Σ^ω satisfies a DFA A if A accepts z_0, ..., z_k for some k ≥ 0.

To check for the existence of a W-NE strategy, we construct an infinite-tree automaton T_W that accepts all W-NE strategies. The problem of determining whether a W-NE exists then reduces to querying L(T_W) ≠ ∅. Recall that we notate the goal DFA of agent i as A_i = ⟨Q_i, q_i^0, Σ, δ_i, F_i⟩. We assume that q_i^0 ∉ F_i, since we are not interested in empty traces. We first construct a deterministic Büchi word automaton A_W = ⟨Q, q_0, Σ, δ, F⟩ that accepts a word in Σ^ω if it satisfies precisely the goals A_j for j ∈ W. Intuitively, A_W simulates all the goal DFAs concurrently, and checks that A_j is satisfied precisely for j ∈ W. We define the following for A_W:
(1) Q = (∏_{j∈Ω} Q_j) × 2^Ω
(2) q_0 = ⟨q_0^0, ..., q_{k−1}^0, W⟩
(3) F = (∏_{j∈Ω} Q_j) × {∅}
(4) δ(⟨q_0, ..., q_{k−1}, U⟩, α) = ⟨q′_0, ..., q′_{k−1}, V⟩, where q′_j = δ_j(q_j, α), provided that q′_j ∉ F_j for all j ∉ W, and V = U − {j : q′_j ∈ F_j}
Note that A_W concurrently simulates all the goal DFAs while it also checks that no goal DFA A_j for j ∉ W is satisfied.
(Note that if q′_j ∈ F_j for some j ∉ W, then the transition is not defined, and A_W is stuck.) The last component of the state holds the indices of the goals that are yet to be satisfied. For A_W to accept an infinite trace, all goals A_j for j ∈ W have to be satisfied, so the last component of the state has to become empty. Note that if A_W reaches an accepting state in F, then it stays in the set F unless it gets stuck.

Lemma 3.1 (A_W Correctness).
For a given W ⊆ Ω, the automaton A_W accepts an ω-word u ∈ Σ^ω iff u ⊨ A_i for precisely the agents i ∈ W. Proof.
First, note that no prefix of u can satisfy A_j for some j ∈ Ω \ W: if that were the case, then by the definition of the transition function δ we would have no transition defined upon reading this prefix, meaning that A_W cannot accept. Next, note that every goal A_j for j ∈ W must be satisfied by a prefix of u. Otherwise, the 2^Ω component of the states in Q would never reach ∅, as the only way to remove elements from this component is to satisfy the goals A_j for j ∈ W. Since the Büchi acceptance condition requires that a final state in A_W be reached, we know that when a final state is reached all goals A_j for j ∈ W have previously been satisfied. Since both of these conditions must hold, we conclude the lemma. □

We now construct a deterministic top-down Büchi tree automaton T that accepts an infinite tree π : Σ* → Σ if the Primary-Trace Condition with respect to W holds. Essentially, T runs A_W on the primary trace defined by the input strategy π. Formally, T = (Σ, Σ, Q ∪ {q_a}, q_0, ρ, F ∪ {q_a}), where:
(1) Σ is both the label alphabet of the tree and its set of directions. Here we introduce the notation that α is an element of Σ corresponding to a label and β is an element of Σ corresponding to a direction.
(2) q_a is a new accepting state.
(3) For a state q, label α, and direction β, we have ρ(q, α, β) = δ(q, α) if α = β and q ≠ q_a, and ρ(q, α, β) = q_a otherwise.
Note that T simulates A_W along the branch corresponding to the primary trace defined by the input tree π. Along all other branches, T enters the accepting state q_a.

Lemma 3.2.
Let G be an iBG and W ⊆ Ω be a set of agents. Let π : Σ* → Σ be a strategy profile. Then π is accepted by the tree automaton T iff π satisfies the Primary-Trace Condition. Proof.
The primary trace is the single path p of π such that for all label-direction pairs (α, β) ∈ p we have α = β. The automaton T transitions to the state q_a immediately after seeing a label-direction pair (α, β) such that α ≠ β, meaning that acceptance by T is solely determined by acceptance on the path p with α = β for every (α, β) ∈ p, which is the primary trace of π by definition. The Primary-Trace Condition is that on the primary trace, only the goals A_i for i ∈ W are satisfied. By virtue of construction, T simulates the DBW A_W on the primary trace, which captures this condition by the previous arguments presented in the construction of A_W in Lemma 3.1. □

We also construct a deterministic top-down Büchi infinite-tree automaton T_j that accepts precisely the trees π : Σ* → Σ that satisfy the j-Deviant-Trace Condition. Given a DFA goal A_j = (Q_j, q_j^0, Σ, δ_j, F_j), we define T_j = (Σ, Σ, (Q_j × {0, 1}) ∪ {q_A}, ⟨q_j^0, 0⟩, ρ_j, (Q_j × {0}) ∪ ((Q_j \ F_j) × {1}) ∪ {q_A}), where:
(1) Σ is both the label alphabet of the tree and its set of directions. We retain the notation that α is a label and β is a direction.
(2) q_A is a new accepting state. (By a slight abuse of notation we consider q_A to be a pair ⟨q_A, 1⟩.)
(3) We maintain two copies of Q_j, one tagged with 0 and one tagged with 1. Intuitively, we stay in Q_j × {0} on the primary trace until there is a j-deviation, and then we transition to Q_j × {1}.
(4) ρ_j(⟨q, i⟩, α, β) is defined as follows:
(a) ⟨δ_j(q, α), 0⟩ if i = 0 and α = β
(b) ⟨δ_j(q, β), 1⟩ if i = 0, α ≠ β, α[−j] = β[−j], and δ_j(q, β) ∉ F_j
(c) ⟨δ_j(q, β), 1⟩ if i = 1, α[−j] = β[−j], and δ_j(q, β) ∉ F_j
(d) q_A if q = q_A or α[−j] ≠ β[−j]
On the primary trace of π, we enter states q ∈ Q_j × {0}. All of these states are accepting, so the primary trace will always be an accepting branch in T_j, since the primary trace is not relevant to the j-Deviant-Trace Condition.
Intuitively, we may leave the primary trace at a node labeled α by following a direction β such that α[−j] = β[−j] and α ≠ β. Here, we transition to the second copy of Q_j, Q_j × {1}, where the 1 denotes that we have left the primary trace. When we are in these states on a node labeled α, we may transition according to δ_j on any direction β with β[−j] = α[−j]. Nevertheless, due to how the transitions are defined, we can never enter a state in F_j. If a direction β exists such that β[−j] = α[−j] and the resulting transition according to δ_j would put A_j in F_j, then the automaton does not have a defined transition and therefore cannot accept on this path. Otherwise, if we see a direction β such that for our current label α we have that α[−j] ≠ β[−j], then this no longer corresponds to a j-deviant trace. At this point we transition to q_A, a catch-all accepting state that marks all continuations of the current path irrelevant to the j-Deviant-Trace Condition. Therefore if we are in state q_A we transition back to q_A on all directions β regardless of the label.

Lemma 3.3.
Let G be an iBG and W ⊆ Ω be a set of agents. Let π : Σ* → Σ be a strategy profile. Then π is accepted by the tree automaton T_j iff π satisfies the j-Deviant-Trace Condition. Proof.
By definition, the set of j-deviant traces is the set of paths p such that for all (α, β) ∈ p we have α[−j] = β[−j], excluding the primary trace. The j-Deviant-Trace Condition says that none of these paths p have a finite prefix accepted by A_j. For an infinite or finite sequence of label-direction pairs p = (α_1, β_1), ..., let β_p denote the infinite or finite word obtained by concatenating all the β_i together in index order. If π does not satisfy the j-Deviant-Trace Condition, then there exists a finite sequence of label-direction pairs p_j = (α_1, β_1) ... (α_n, β_n) such that ∀i (1 ≤ i ≤ n). α_i[−j] = β_i[−j], ∃i (1 ≤ i ≤ n). α_i ≠ β_i, and A_j accepts β_{p_j}. Since α_i[−j] = β_i[−j] for every index in p_j, T_j never attempts to transition to q_A along p_j. And since A_j accepts β_{p_j}, we know that along p_j, T_j attempts to transition to a final state in F_j and gets stuck, therefore rejecting. Therefore, T_j rejects π.

Now assume that T_j does not accept π. This means that along some j-deviant trace T_j attempts to transition to a state ⟨q_f, 1⟩, where q_f ∈ F_j, and gets stuck, as this is the only way for T_j to reject. This follows from the observation that every reachable state in T_j is accepting. This means there exists a finite sequence of label-direction pairs p_j = (α_1, β_1) ... (α_n, β_n) such that ∀i (1 ≤ i ≤ n). α_i[−j] = β_i[−j] (otherwise T_j would have transitioned into q_A), ∃i (1 ≤ i ≤ n). α_i ≠ β_i (otherwise this would be a prefix of the primary trace), and A_j accepts β_{p_j} (since T_j attempted to transition into a final state and got stuck). Therefore, π does not satisfy the j-Deviant-Trace Condition.
□

W-NE Automata
We constructed a tree automaton T = (Σ, Σ, Q ∪ {q_a}, q_0, ρ, F ∪ {q_a}) that recognizes the set of strategies that satisfy the Primary-Trace Condition for a fixed subset W ⊆ Ω of agents in an iBG G. We also constructed the automaton T_j that checks the j-Deviant-Trace Condition for a specific agent j. A simple way to check both the Primary-Trace Condition and the j-Deviant-Trace Conditions for some W ⊆ Ω would be to take the cross product of T with all the T_j's for every j ∉ W. We now show that this can be done more efficiently, by taking a modified union of the state sets of T and the T_j's instead of their cross product. This is motivated by the observation that each automaton "checks" a disjoint set of paths in a tree π and marks all others with a repeating accepting state.

We construct a deterministic top-down Büchi infinite-tree automaton T_W = (Σ, Σ, Q ∪ ⋃_{j∈Ω\W} Q_j ∪ {q_A}, q_0, τ, F ∪ ⋃_{j∈Ω\W} (Q_j \ F_j) ∪ {q_A}) that accepts all strategies that satisfy both the Primary-Trace Condition and the j-Deviant-Trace Conditions, where
(1) Σ is both the label alphabet of the tree and its set of directions, with the α and β notations defined as previously.
(2) q_A is a repeating accepting state.
(3) τ is defined as follows for a given state q, label α, and direction β:
(a) If q ∈ Q:
(i) If α = β, then τ(q, α, β) = ρ(q, α, β).
(ii) If α ≠ β, but for some j ∈ Ω \ W we have α[−j] = β[−j], then τ(q, α, β) = δ_j(q[j], β), where q[j] is the j-th component of q, provided that δ_j(q[j], β) ∉ F_j.
(iii) If for all j ∈ Ω \ W we have α[−j] ≠ β[−j], then τ(q, α, β) = q_A.
(b) If q ∈ Q_j for j ∈ Ω \ W, then:
(i) If α[−j] = β[−j], then τ(q, α, β) = δ_j(q, β), provided that δ_j(q, β) ∉ F_j.
(ii) If α[−j] ≠ β[−j], then τ(q, α, β) = q_A.
(c) If q = q_A, then τ(q, α, β) = q_A.
Intuitively, the automaton T_W simulates the automaton T on the primary trace defined by π.
If the automaton is on the primary trace, it is in a state in Q, and it checks all possible j-deviations from that state by transitioning, on the corresponding directions, to all states reachable by possible j-deviant actions. Note that here we only check whether α[−j] = β[−j] for a single j, as it is easy to see that if α[−j_1] = β[−j_1] and α[−j_2] = β[−j_2] for two different j_1, j_2, then α = β, since Σ_{j_1} and Σ_{j_2} are disjoint. On a direction that does not represent either a continuation of the primary trace or one reachable by a deviation from some agent j, we move to the repeating accepting state q_A. If the automaton is in some state q ∈ Q_j, it transitions according to δ_j on a direction β with β[−j] = α[−j], including the one where α = β. On all other directions, it transitions to the state q_A. If the automaton reaches a final state for A_j, it gets stuck and cannot accept. This simulates the automaton T_j and verifies the j-Deviant-Trace Condition. If the automaton is in the state q_A, it means we have marked the subtree starting from the current node as irrelevant to the Nash equilibrium definition. Therefore, we simply stay in the accepting state q_A on every direction.

Theorem 3.4.
Let G be an iBG and W ⊆ Ω be a set of agents. Let π : Σ* → Σ be a strategy profile. Then π is accepted by the tree automaton T_W iff π is a W-NE strategy. Proof. (→) Suppose π : Σ* → Σ is accepted by T_W. We show that π must satisfy both the Primary-Trace Condition and the j-Deviant-Trace Condition for all j ∈ Ω \ W.
(1) The primary trace of π is the unique path p = (α_0, β_0), ... such that for every (α_i, β_i) we have α_i = β_i. On this path, the automaton T_W stays in states in Q and transitions according to the transition function δ of A_W; thus T_W simulates A_W on the primary trace. Since T_W accepts π, we know that A_W accepts on p, meaning that exactly the goals A_i for i ∈ W are satisfied. Therefore π satisfies the Primary-Trace Condition.
(2) A j-deviant trace of π is a path p_j = (α_0, β_0), ... such that for every (α_i, β_i) ∈ p_j we have α_i[−j] = β_i[−j], and p_j is different from the primary trace. Therefore, for at least one index i, we have that α_i ≠ β_i in p_j. When T_W runs on such a trace, it starts in states in Q and eventually transitions to states in Q_j upon reaching the first index where α_i ≠ β_i. When it is in the states in Q, A_j cannot reach a final state, as otherwise T_W would get stuck and not accept due to the construction of A_W, contradicting our assumption that T_W does accept. When it reaches the states in Q_j, it also can never get stuck attempting a transition to a final state in F_j, due to the construction of the transition function τ, as any such attempted transition would mean T_W would reject. This is true no matter which j-deviant trace we choose, since T_W accepts on all paths of π. Therefore π satisfies the j-Deviant-Trace Condition for all j ∈ Ω \ W.
(←) Note that T_W is deterministic, so there is a unique run T_W(π). We have to show that all paths of this run are accepting. There are three types of paths:

Primary Path:
If a path 𝑝 is the primary path, then 𝑇 𝑊 emulates 𝐴 𝑊 along 𝑝 . Because of the Primary-Trace Condition, we know that 𝐴 𝑊 eventually enters and stays in the set 𝐹 of accepting states. Thus, this path 𝑝 of 𝑇 𝑊 ( 𝜋 ) is accepting.

𝑗 -Deviant Paths: If 𝑝 = ( 𝛼 0 , 𝛽 0 ) , . . . is a 𝑗 -deviant path for some 𝑗 ∈ Ω \ 𝑊 , then it can be factored as 𝑝 𝑃 · 𝑝 𝑗 , with 𝑝 𝑃 finite but possibly empty. For every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑃 we have 𝛼 𝑖 = 𝛽 𝑖 , and for every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑗 we have 𝛼 𝑖 [− 𝑗 ] = 𝛽 𝑖 [− 𝑗 ] . Note that only one choice of 𝑗 is appropriate: letting 𝑖 be the first index in 𝑝 where 𝛼 𝑖 ≠ 𝛽 𝑖 , having 𝛼 𝑖 [− 𝑗 1 ] = 𝛽 𝑖 [− 𝑗 1 ] and 𝛼 𝑖 [− 𝑗 2 ] = 𝛽 𝑖 [− 𝑗 2 ] for two different agents 𝑗 1 , 𝑗 2 would imply that 𝛼 𝑖 = 𝛽 𝑖 . 𝑇 𝑊 first emulates 𝐴 𝑊 along 𝑝 𝑃 . Since 𝜋 satisfies the Primary-Trace Condition, 𝑇 𝑊 never gets stuck and rejects on 𝑝 𝑃 . Since 𝑝 is a 𝑗 -deviant trace, there is a smallest 𝑖 such that 𝛼 𝑖 ≠ 𝛽 𝑖 in 𝑝 . At this point 𝑇 𝑊 switches from emulating 𝐴 𝑊 to emulating 𝐴 𝑗 . Because 𝜋 satisfies the 𝑗 -Deviant-Trace Condition, the goal 𝐴 𝑗 does not hold along 𝑝 . Thus, 𝑇 𝑊 does not get stuck along 𝑝 𝑃 or along 𝑝 𝑗 , and it accepts along 𝑝 .

Other Paths: If 𝑝 is neither the primary path nor a 𝑗 -deviant path, then there are two possibilities.

(1) The first case is when 𝑝 can be factored as 𝑝 𝑃 · 𝑝 ′ , with 𝑝 𝑃 finite but possibly empty. For every point ( 𝛼 𝑖 , 𝛽 𝑖 ) of 𝑝 𝑃 we have 𝛼 𝑖 = 𝛽 𝑖 , and at the first point ( 𝛼 𝑘 , 𝛽 𝑘 ) of 𝑝 ′ we have 𝛼 𝑘 [− 𝑗 ] ≠ 𝛽 𝑘 [− 𝑗 ] for all 𝑗 ∈ Ω \ 𝑊 . Then 𝑇 𝑊 emulates 𝐴 𝑊 along 𝑝 𝑃 and transitions to 𝑞 𝐴 upon reading ( 𝛼 𝑘 , 𝛽 𝑘 ) . By the previous arguments, we know that 𝑇 𝑊 does not get stuck and reject along 𝑝 𝑃 . Once 𝑇 𝑊 enters 𝑞 𝐴 it stays in 𝑞 𝐴 , an accepting state. Therefore 𝑇 𝑊 accepts the path 𝑝 = 𝑝 𝑃 · 𝑝 ′ .

(2) The second case is when 𝑝 can be factored as 𝑝 𝑃 · 𝑝 𝑗 · 𝑝 ′ , with 𝑝 𝑃 finite but possibly empty and 𝑝 𝑗 finite and nonempty.
For every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑃 we have 𝛼 𝑖 = 𝛽 𝑖 . For some 𝑗 ∈ Ω \ 𝑊 we have 𝛼 𝑖 [− 𝑗 ] = 𝛽 𝑖 [− 𝑗 ] for every label-direction pair ( 𝛼 𝑖 , 𝛽 𝑖 ) in 𝑝 𝑗 , again noting that only one choice of 𝑗 is appropriate. Finally, at the first point ( 𝛼 𝑘 , 𝛽 𝑘 ) of 𝑝 ′ we have 𝛼 𝑘 [− 𝑗 ] ≠ 𝛽 𝑘 [− 𝑗 ] . By the previous arguments, we know that 𝑇 𝑊 does not get stuck and reject along 𝑝 𝑃 or 𝑝 𝑗 . And since 𝑇 𝑊 transitions to 𝑞 𝐴 at the beginning of 𝑝 ′ , it cannot get stuck and reject along 𝑝 ′ . Therefore 𝑇 𝑊 accepts on 𝑝 = 𝑝 𝑃 · 𝑝 𝑗 · 𝑝 ′ . □

Corollary 3.5.
Let 𝐺 be an iBG and 𝑊 ⊆ Ω be a set of agents. Then a 𝑊 -NE strategy exists in 𝐺 iff the automaton 𝑇 𝑊 constructed with respect to 𝐺 is nonempty.

In the previous section, we constructed an automaton 𝑇 𝑊 that recognizes the set of Nash-equilibrium strategy profiles with winning set 𝑊 in an iBG 𝐺 , which we denoted as 𝑊 -NE strategies. The problem of determining whether a 𝑊 -NE strategy exists is equivalent to testing 𝑇 𝑊 for nonemptiness. The standard algorithm for testing nonemptiness of Büchi tree automata involves Büchi games [9]. In this section, we prove that testing 𝑇 𝑊 for nonemptiness is equivalent to solving safety games and then testing a Büchi word automaton for nonemptiness. This gives us a simpler path towards constructing an algorithm that decides our central question.

Note that the Büchi condition on the 𝑗 -deviant traces simply consists of avoiding the set of final states in 𝐴 𝑗 , making it simpler than a general Büchi acceptance condition. In order to characterize this condition precisely, we now construct, for each agent 𝑗 ∈ Ω \ 𝑊 , a 2-player safety game that partitions the states of 𝑄 𝑗 in 𝑇 𝑊 into two sets: states 𝑞 ∈ 𝑄 𝑗 such that 𝑇 𝑊 started in 𝑞 is empty, and states 𝑞 ∈ 𝑄 𝑗 such that 𝑇 𝑊 started in 𝑞 is nonempty. We construct the safety game 𝐺 𝑗 = ( 𝑄 𝑗 , 𝑄 𝑗 × Σ , 𝐸 𝑗 ) . The safety set can intuitively be thought of as all the vertices not in 𝐹 𝑗 , but for our purposes it is more convenient not to define outgoing transitions from these states, thus making them losing for player 0 by violating the infinite-play condition. Player 0 owns 𝑄 𝑗 and player 1 owns 𝑄 𝑗 × Σ . Here we retain our 𝛼 and 𝛽 notation insofar as both are elements of Σ .
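Once the edge relation 𝐸 𝑗 defined next is in place, 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) can be computed with the standard safety fixpoint: repeatedly discard player-0 vertices with no remaining safe choice and player-1 vertices with some unsafe successor. The Python sketch below is purely illustrative and is not the authors' implementation; the state encoding and the helpers `delta_j` (for 𝛿 𝑗 ) and `minus_j_equal` (for the test 𝛼 [− 𝑗 ] = 𝛽 [− 𝑗 ] ) are hypothetical stand-ins. Linear-time algorithms for safety games exist [2]; this quadratic version favors clarity.

```python
def solve_safety_game(Qj, Fj, Sigma, delta_j, minus_j_equal):
    """Compute player 0's winning set in the safety game G_j (illustrative sketch).

    Vertices are states q in Qj (player 0) and pairs (q, alpha) (player 1).
    States in F_j have no successors, so they are losing for player 0.
    We iterate the standard safety fixpoint: discard
      - player-1 vertices (q, alpha) with some losing successor delta_j(q, beta)
        for a beta that agrees with alpha outside agent j, and
      - player-0 vertices q whose every choice (q, alpha) has been discarded,
    until nothing changes; the surviving Qj-vertices form Win(G_j).
    """
    safe0 = {q for q in Qj if q not in Fj}
    safe1 = {(q, a) for q in safe0 for a in Sigma}
    changed = True
    while changed:
        changed = False
        for (q, a) in list(safe1):
            # player 1 may answer alpha with any beta such that alpha[-j] == beta[-j]
            succs = [delta_j(q, b) for b in Sigma if minus_j_equal(a, b)]
            if any(s not in safe0 for s in succs):
                safe1.discard((q, a))
                changed = True
        for q in list(safe0):
            if not any((q, a) in safe1 for a in Sigma):
                safe0.discard(q)
                changed = True
    return safe0
```

For instance, with two agents where agent 𝑗 controls the second action component, a state whose only escape from 𝐹 𝑗 depends on agent 𝑗 's own cooperation is losing for player 0, while a state that player 0 can keep safe regardless of agent 𝑗 's replies survives the fixpoint.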
The edge relation 𝐸 𝑗 is defined as follows:

(1) ( 𝑞, ⟨ 𝑞, 𝛼 ⟩ ) ∈ 𝐸 𝑗 for 𝑞 ∈ 𝑄 𝑗 \ 𝐹 𝑗 and 𝛼 ∈ Σ .

(2) ( ⟨ 𝑞, 𝛼 ⟩ , 𝑞 ′ ) ∈ 𝐸 𝑗 for 𝑞 ∈ 𝑄 𝑗 and 𝑞 ′ ∈ 𝑄 𝑗 , where 𝑞 ′ = 𝛿 𝑗 ( 𝑞, 𝛽 ) for some 𝛽 ∈ Σ such that 𝛼 [− 𝑗 ] = 𝛽 [− 𝑗 ] .

Note that, as defined above, if 𝑞 ∈ 𝐹 𝑗 , then 𝑞 has no successor node, and player 0 is stuck and loses the game. Since 𝐺 𝑗 is a safety game, player 0's goal is to avoid states in 𝐹 𝑗 and not get stuck. Let 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) be the set of winning states for player 0 in the safety game 𝐺 𝑗 .

Theorem 4.1.
A state 𝑞 ∈ 𝑄 𝑗 \ 𝐹 𝑗 belongs to 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) iff 𝑇 𝑊 is nonempty when started in state 𝑞 .

Proof. ( → ) Suppose 𝑞 ∈ 𝑄 𝑗 \ 𝐹 𝑗 and 𝑞 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . We construct a tree 𝜋 𝑞 : Σ ∗ → Σ that is accepted by 𝑇 𝑊 starting in state 𝑞 . To show that 𝜋 𝑞 is accepted, we also construct an accepting run 𝑟 𝑞 : Σ ∗ → ( 𝑄 𝑗 \ 𝐹 𝑗 ) ∪ { 𝑞 𝐴 } . By construction, we have 𝑟 𝑞 ( 𝑥 ) ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for all 𝑥 ∈ Σ ∗ . We proceed by induction on the length of the run.

For the basis of the induction, we start by defining 𝜋 𝑞 ( 𝜀 ) and 𝑟 𝑞 ( 𝜀 ) . First, we let 𝑟 𝑞 ( 𝜀 ) = 𝑞 . By the assumption that 𝑞 ∉ 𝐹 𝑗 , the run cannot get stuck and reject here.

For the step case, suppose now that we have constructed 𝑟 𝑞 ( 𝑦 ) = 𝑝 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for some 𝑦 ∈ Σ ∗ . Since 𝑝 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) and cannot get stuck, there must be a node ⟨ 𝑝, 𝛼 𝑦 ⟩ contained in both 𝑄 𝑗 × Σ and 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , so we let 𝜋 𝑞 ( 𝑦 ) = 𝛼 𝑦 . Recall that the directions of 𝜋 𝑞 are Σ . Divide the possible directions 𝛽 ∈ Σ into two types: either 𝛼 𝑦 [− 𝑗 ] = 𝛽 [− 𝑗 ] or 𝛼 𝑦 [− 𝑗 ] ≠ 𝛽 [− 𝑗 ] . If 𝛼 𝑦 [− 𝑗 ] = 𝛽 [− 𝑗 ] , then this corresponds to a legal move by player 1 in 𝐺 𝑗 . Since ⟨ 𝑝, 𝛼 𝑦 ⟩ ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , moves by player 1 must stay in 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . It follows that 𝑞 ′ = 𝛿 𝑗 ( 𝑝, 𝛽 ) ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , so 𝑞 ′ ∉ 𝐹 𝑗 . We let 𝑟 𝑞 ( 𝑦 · 𝛽 ) = 𝑞 ′ . If, on the other hand, 𝛼 𝑦 [− 𝑗 ] ≠ 𝛽 [− 𝑗 ] , we let 𝑟 𝑞 ( 𝑦 · 𝛽 ) = 𝑞 𝐴 . Once we have reached a node 𝑧 ∈ Σ ∗ with 𝑟 𝑞 ( 𝑧 ) = 𝑞 𝐴 , we define 𝑟 𝑞 ( 𝑧 ′ ) = 𝑞 𝐴 for all descendants 𝑧 ′ of 𝑧 , and we can define 𝜋 𝑞 ( 𝑧 ′ ) arbitrarily. Since we can never get stuck, we never reach a state in 𝐹 𝑗 , so the run 𝑟 𝑞 is accepting.

( ← ) Suppose now that 𝑇 𝑊 started in state 𝑞 accepts a tree 𝜋 𝑞 : Σ ∗ → Σ . Since the automaton 𝑇 𝑊 is deterministic, it accepts with a unique run of 𝑇 𝑊 on 𝜋 𝑞 , denoted 𝑟 𝑞 : Σ ∗ → ( 𝑄 𝑗 \ 𝐹 𝑗 ) ∪ { 𝑞 𝐴 } . We claim that 𝜋 𝑞 is a winning strategy for player 0 in 𝐺 𝑗 from the state 𝑞 . Consider a play 𝜋 = 𝑝 0 , 𝛼 0 , 𝛽 0 , 𝑝 1 , 𝛼 1 , 𝛽 1 , . . . , where 𝑝 𝑖 ∈ 𝑄 𝑗 , 𝑝 0 = 𝑞 , and 𝛼 𝑖 , 𝛽 𝑖 ∈ Σ .
In round 𝑖 ≥ 0 , player 0 moves from 𝑝 𝑖 to ⟨ 𝑝 𝑖 , 𝛼 𝑖 ⟩ , for 𝛼 𝑖 = 𝜋 𝑞 ( ⟨ 𝛽 0 , . . . , 𝛽 𝑖 − 1 ⟩ ) , and then player 1 moves from ⟨ 𝑝 𝑖 , 𝛼 𝑖 ⟩ to 𝑝 𝑖 + 1 = 𝛿 𝑗 ( 𝑝 𝑖 , 𝛽 𝑖 ) , for some 𝛽 𝑖 such that 𝛼 𝑖 [− 𝑗 ] = 𝛽 𝑖 [− 𝑗 ] . Let 𝑥 𝑖 = ⟨ 𝛽 0 , . . . , 𝛽 𝑖 − 1 ⟩ , so that 𝛼 𝑖 = 𝜋 𝑞 ( 𝑥 𝑖 ) . By induction on the length of 𝑥 𝑖 it follows that 𝑝 𝑖 = 𝑟 𝑞 ( 𝑥 𝑖 ) . Since 𝑟 𝑞 is an accepting run of 𝑇 𝑊 on 𝜋 𝑞 , it follows that 𝑝 𝑖 = 𝑟 𝑞 ( 𝑥 𝑖 ) ∉ 𝐹 𝑗 . Thus, the play 𝜋 is a winning play for player 0. It follows that 𝜋 𝑞 is a winning strategy for player 0 in 𝐺 𝑗 from the state 𝑞 . □

𝑇 𝑊 Nonemptiness
Recall that the tree automaton 𝑇 𝑊 , which recognizes 𝑊 -NE strategies, emulates the Büchi automaton 𝐴 𝑊 = ( 𝑄, 𝑞 0 , Σ , 𝛿, 𝐹 ) along the primary trace and the goal automaton 𝐴 𝑗 along 𝑗 -deviant traces. We have constructed the above games 𝐺 𝑗 to capture nonemptiness of 𝑇 𝑊 from states in 𝑄 𝑗 , in terms of the winning sets 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . We now modify 𝐴 𝑊 to take these safety games into account. Let 𝐴 ′ 𝑊 = ( 𝑄 ′ , 𝑞 0 , Σ , 𝛿 ′ , 𝐹 ∩ 𝑄 ′ ) be obtained from 𝐴 𝑊 by restricting states to 𝑄 ′ ⊆ 𝑄 , where 𝑄 ′ = ⨉ 𝑖 ∈ 𝑊 𝑄 𝑖 × ⨉ 𝑗 ∈ Ω \ 𝑊 ( 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) ∩ 𝑄 𝑗 ) × Ω . In other words, the 𝑗 -th component 𝑞 𝑗 of a state 𝑞 ∈ 𝑄 ′ must be in 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for all 𝑗 ∈ Ω \ 𝑊 ; otherwise the automaton 𝐴 ′ 𝑊 gets stuck.

Theorem 4.2.
The Büchi word automaton 𝐴 ′ 𝑊 is nonempty iff the tree automaton 𝑇 𝑊 is nonempty.

Proof. ( → ) Assume 𝐴 ′ 𝑊 is nonempty. Then it accepts an infinite word 𝑤 = 𝑤 0 𝑤 1 . . . ∈ Σ 𝜔 with a run 𝑟 = 𝑞 0 , 𝑞 1 , . . . ∈ ( 𝑄 ′ ) 𝜔 . We use 𝑤 and 𝑟 to create a tree 𝜋 : Σ ∗ → Σ with an accepting run 𝑟 𝜋 : Σ ∗ → 𝑄 ∪ { 𝑞 𝐴 } with respect to 𝑇 𝑊 .

Let 𝑥 0 = 𝜀 . We start by setting 𝜋 ( 𝑥 0 ) = 𝑤 0 and 𝑟 𝜋 ( 𝑥 0 ) = 𝑞 0 . Suppose now that we have just defined 𝜋 ( 𝑥 𝑖 ) = 𝛼 and 𝑟 𝜋 ( 𝑥 𝑖 ) = 𝑞 , and, by construction, 𝑥 𝑖 is on the primary trace. Consider now the node 𝑥 𝑖 · 𝛽 . There are three cases to consider:

(1) If 𝜋 ( 𝑥 𝑖 ) = 𝛽 , then we set 𝑥 𝑖 + 1 = 𝑥 𝑖 · 𝛽 , 𝜋 ( 𝑥 𝑖 + 1 ) = 𝑤 𝑖 + 1 , and 𝑟 𝜋 ( 𝑥 𝑖 + 1 ) = 𝑞 𝑖 + 1 . Note that 𝑥 𝑖 + 1 is, by construction, the successor of 𝑥 𝑖 on the primary trace. Thus, the projection of 𝑟 𝜋 on the primary trace of 𝜋 is precisely 𝑟 , so 𝑟 𝜋 is accepting along the primary path.

(2) If 𝜋 ( 𝑥 𝑖 )[− 𝑗 ] = 𝛽 [− 𝑗 ] and 𝜋 ( 𝑥 𝑖 ) ≠ 𝛽 for some 𝑗 ∈ Ω \ 𝑊 , then we set 𝑟 𝜋 ( 𝑥 𝑖 · 𝛽 ) = 𝑞 ′ 𝑗 = 𝛿 𝑗 ( 𝑞 𝑗 , 𝛽 ) , where 𝑞 𝑗 is the 𝑗 -th component of 𝑞 . Since 𝑞 𝑗 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , we have that 𝑞 ′ 𝑗 ∈ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) . By Theorem 4.1, 𝑇 𝑊 is nonempty when started in state 𝑞 ′ 𝑗 . That is, there is a tree 𝜋 𝑞 ′ 𝑗 and an accepting run 𝑟 𝑞 ′ 𝑗 of 𝑇 𝑊 on 𝜋 𝑞 ′ 𝑗 , starting from 𝑞 ′ 𝑗 . So we take the subtree of 𝜋 rooted at the node 𝑥 𝑖 · 𝛽 to be 𝜋 𝑞 ′ 𝑗 , and the run of 𝑇 𝑊 from 𝑥 𝑖 · 𝛽 to be 𝑟 𝑞 ′ 𝑗 . So all paths of 𝑟 𝜋 that go through 𝑥 𝑖 · 𝛽 are accepting.

(3) Finally, if 𝜋 ( 𝑥 𝑖 )[− 𝑗 ] ≠ 𝛽 [− 𝑗 ] for all 𝑗 ∈ Ω \ 𝑊 , then 𝑥 𝑖 · 𝛽 is neither on the primary trace nor on a 𝑗 -deviant trace for some 𝑗 ∈ Ω \ 𝑊 . So we set 𝑟 𝜋 ( 𝑥 𝑖 · 𝛽 ) = 𝑞 𝐴 as well as 𝑟 𝜋 ( 𝑦 ) = 𝑞 𝐴 for all descendants 𝑦 of 𝑥 𝑖 · 𝛽 . The labels of 𝑥 𝑖 · 𝛽 and its descendants can be set arbitrarily. So all paths of 𝑟 𝜋 that go through 𝑥 𝑖 · 𝛽 are accepting.

( ← ) Assume 𝑇 𝑊 is nonempty. Then, we know that it accepts at least one tree 𝜋 : Σ ∗ → Σ .
In particular, since 𝑇 𝑊 accepts on all branches of 𝜋 , it accepts on the primary trace, denoted 𝜋 𝑝 . Since 𝑇 𝑊 accepts on 𝜋 𝑝 , we can consider the run of 𝑇 𝑊 on 𝜋 , which we denote 𝑟 : Σ ∗ → 𝑄 . Let the image of 𝑟 ( 𝜋 𝑝 ) be 𝑄 ∗ ⊆ 𝑄 . We claim that 𝑄 ∗ ⊆ 𝑄 ′ .

Assume otherwise, i.e., that for some finite prefix 𝑝 of the primary trace of 𝜋 we have 𝑟 ( 𝑝 ) ∉ 𝑄 ′ . Since 𝑟 ( 𝑝 ) clearly is inside 𝑄 , it must be the case that 𝑟 ( 𝑝 )[ 𝑗 ] ∉ 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) for some 𝑗 ∈ Ω \ 𝑊 . Since 𝑟 ( 𝑝 )[ 𝑗 ] is not in 𝑊 𝑖𝑛 ( 𝐺 𝑗 ) , by determinacy of safety games it must be winning for player 1. This means that, upon observing 𝑝 , a direction 𝛽 exists that transitions 𝑇 𝑊 into a state 𝑞 ′ from which player 1 has a winning strategy in 𝐺 𝑗 . Following one of the paths created by player 1 playing directions according to this winning strategy, with player 0 playing anything in response, player 1 eventually wins the game, forcing 𝑇 𝑊 to attempt a transition into 𝐹 𝑗 and getting stuck. Therefore 𝑇 𝑊 does not actually accept 𝜋 , a contradiction.

Since the image of 𝑟 ( 𝜋 𝑝 ) is contained within 𝑄 ′ , we claim that 𝐴 ′ 𝑊 accepts the word formed by the labels along 𝜋 𝑝 , which we denote by 𝛼 ( 𝜋 𝑝 ) . Since 𝑇 𝑊 accepts along 𝜋 𝑝 and the run 𝑟 ( 𝜋 𝑝 ) never leaves 𝑄 ′ , there are infinitely many members of the set 𝐹 ∩ 𝑄 ′ in the run 𝑟 ( 𝜋 𝑝 ) , satisfying the Büchi condition of 𝐴 ′ 𝑊 . And since any states in which some component in 𝑄 𝑗 for 𝑗 ∈ Ω \ 𝑊 reaches a final state are excluded from 𝑄 ′ , 𝐴 ′ 𝑊 never gets stuck reading 𝛼 ( 𝜋 𝑝 ) . Therefore, 𝐴 ′ 𝑊 accepts 𝛼 ( 𝜋 𝑝 ) and is therefore nonempty. □

Corollary 4.3.
Let 𝐺 be an iBG and 𝑊 ⊆ Ω be a set of agents. Then a 𝑊 -NE strategy exists in 𝐺 iff the automaton 𝐴 ′ 𝑊 constructed with respect to 𝐺 is nonempty.

The algorithm outlined by our previous constructions consists of two main parts. First, we construct and solve a safety game for each agent. Second, for 𝑊 ⊆ Ω , we check the automaton 𝐴 ′ 𝑊 for nonemptiness. The input to this algorithm consists of 𝑘 goal DFAs with alphabet Σ and a set of 𝑘 alphabets Σ 𝑖 corresponding to the actions available to each agent. Therefore, the size of the input is the sum of the sizes of these 𝑘 goal DFAs.

In the first step, we construct a safety game for each of the agents. The size of the state space of the safety game for agent 𝑗 is | 𝑄 𝑗 | · ( | Σ | + 1 ) . The size of the edge set for the safety game can be bounded by ( | 𝑄 𝑗 | · | Σ | ) + ( | 𝑄 𝑗 | 2 · | Σ | ) , where | 𝑄 𝑗 | · | Σ | represents the | Σ | outgoing transitions from each state in 𝑄 𝑗 owned by player 0, and | 𝑄 𝑗 | 2 · | Σ | is an upper bound assuming that each of the states in 𝑄 𝑗 × Σ owned by player 1 can transition to each of the states in 𝑄 𝑗 owned by player 0. Since safety games can be solved in linear time with respect to the number of edges [2], each safety game is solved in polynomial time. We solve one such safety game per agent, which represents a linear blow-up. Therefore, solving the safety games for all agents can be done in polynomial time.

For a given 𝑊 ⊆ Ω , querying the automaton 𝐴 ′ 𝑊 for nonemptiness can be done in PSPACE, as the state space of 𝐴 ′ 𝑊 consists of tuples from the product of the input DFAs. We can then test 𝐴 ′ 𝑊 on the fly by guessing the prefix of a lasso and then guessing the cycle, which can be done in polynomial space [29].

Theorem 5.1.
The problem of deciding whether there exists a 𝑊 -NE strategy profile for an iBG 𝐺 and a set 𝑊 ⊆ Ω of agents is in PSPACE.

In this section we show that the problem of determining whether a 𝑊 -NE exists in an iBG is PSPACE-hard by providing a reduction from the PSPACE-complete problem of DFA Intersection Emptiness (DFAIE). The DFAIE problem is as follows: given 𝑘 DFAs 𝐴 0 , . . . , 𝐴 𝑘 − 1 with a common alphabet Σ , decide whether ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ [15].

Given a DFA 𝐴 𝑖 = ⟨ 𝑄 𝑖 , 𝑞 𝑖 0 , Σ , 𝛿 𝑖 , 𝐹 𝑖 ⟩ , we define the goal DFA ˆ 𝐴 𝑖 = ⟨ ˆ 𝑄 𝑖 , 𝑞 𝑖 0 , ˆ Σ , ˆ 𝛿 𝑖 , ˆ 𝐹 𝑖 ⟩ as follows:

(1) ˆ Σ = Σ ∪ { 𝐾 } , where 𝐾 is a new symbol, i.e., 𝐾 ∉ Σ .

(2) ˆ 𝑄 𝑖 = 𝑄 𝑖 ∪ { accept , reject } .

(3) ˆ 𝛿 𝑖 ( 𝑞, 𝑎 ) = 𝑞 for 𝑞 ∈ { accept , reject } and 𝑎 ∈ ˆ Σ ; ˆ 𝛿 𝑖 ( 𝑞, 𝑎 ) = 𝛿 𝑖 ( 𝑞, 𝑎 ) for 𝑞 ∈ 𝑄 𝑖 and 𝑎 ∈ Σ ; ˆ 𝛿 𝑖 ( 𝑞, 𝐾 ) = accept for 𝑞 ∈ 𝐹 𝑖 ; ˆ 𝛿 𝑖 ( 𝑞, 𝐾 ) = reject for 𝑞 ∈ 𝑄 𝑖 \ 𝐹 𝑖 .

(4) ˆ 𝐹 𝑖 = { accept } .

Intuitively, accept and reject are two new accepting and rejecting states that only transition back to themselves. The new symbol 𝐾 takes accepting states to accept and rejecting states to reject. The purpose of 𝐾 is to synchronize acceptance by all goal automata. We call the process of modifying 𝐴 𝑖 into ˆ 𝐴 𝑖 the transformation.

The transformation from 𝐴 𝑖 to ˆ 𝐴 𝑖 can be done in linear time with respect to the size of 𝐴 𝑖 , as the process only involves adding two new states. Furthermore, if 𝐴 𝑖 is a DFA then ˆ 𝐴 𝑖 is also a DFA. Given an instance of DFAIE, i.e., 𝑘 DFAs 𝐴 0 , . . . , 𝐴 𝑘 − 1 , we create an iBG 𝐺 , defined in the following manner:

(1) Ω = { 0 , . . . , 𝑘 − 1 } .

(2) The goal for agent 𝑖 is ˆ 𝐴 𝑖 .

(3) Σ 0 = Σ ∪ { 𝐾 } = ˆ Σ .

(4) Σ 𝑖 = {∗} for 𝑖 ≠ 0 . Here ∗ represents a fresh symbol, i.e., ∗ ∉ Σ and ∗ ≠ 𝐾 .

Clearly, the blow-up of the construction is linear. Since each agent except 0 is given control over a set consisting solely of ∗ , the common alphabet of the ˆ 𝐴 𝑖 is technically ˆ Σ × {∗} 𝑘 − 1 . This alphabet is isomorphic to ˆ Σ , so by a slight abuse of notation we keep considering the alphabet of the ˆ 𝐴 𝑖 to be ˆ Σ .

Before stating and proving the correctness of the reduction, we make two observations. We are interested here in Nash equilibria in which every agent is included in 𝑊 . This implies the following:

(1) The existence of an Ω -NE is determined solely by the Primary-Trace Condition. Since there are no agents in Ω \ 𝑊 , there is no concept of a 𝑗 -deviant trace. If we are given an infinite word that satisfies the Primary-Trace Condition, we can extend it to a full Ω -NE strategy tree by labeling the nodes that do not occur on the primary trace arbitrarily.

(2) Since there are no 𝑗 -deviant traces in this specific instance of the Ω -NE Nonemptiness problem, we can relax our assumption that | Σ 𝑗 | ≥ 2 for all 𝑗 ∈ Ω , since there is no meaningful concept of deviation in an Ω -NE. Recall that this assumption was made only for simplicity of presentation regarding 𝑗 -deviant traces.

Theorem 5.2.
Let 𝐴 0 , . . . , 𝐴 𝑘 − 1 be 𝑘 DFAs with alphabet Σ . Then ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ iff there exists an Ω -NE in the iBG 𝐺 constructed from 𝐴 0 , . . . , 𝐴 𝑘 − 1 .

Proof.
In this proof, we introduce the notation 𝑆 to denote an infinite suffix, which is an arbitrarily chosen element of ( Σ ∪ { 𝐾 } ) 𝜔 .

( → ) Assume that ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ . Then there is a word 𝑤 ∈ Σ ∗ that is accepted by each of 𝐴 0 , . . . , 𝐴 𝑘 − 1 . We now show that 𝑤 · 𝐾 · 𝑆 satisfies all goals ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 . Since each of 𝐴 0 , . . . , 𝐴 𝑘 − 1 accepts 𝑤 , each of ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 reaches a final state of 𝐴 0 , . . . , 𝐴 𝑘 − 1 , respectively, after reading 𝑤 . Then, after reading 𝐾 , the automata ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 all simultaneously transition to accept. Therefore all goals ˆ 𝐴 𝑖 are satisfied on 𝑤 · 𝐾 · 𝑆 , and 𝑤 · 𝐾 · 𝑆 satisfies the Primary-Trace Condition. Since we are considering an Ω -NE, there is no need to check deviant traces, and 𝑤 · 𝐾 · 𝑆 can be arbitrarily extended to a full Ω -NE strategy profile tree.

( ← ) Assume that the iBG 𝐺 with goals ˆ 𝐴 0 , . . . , ˆ 𝐴 𝑘 − 1 admits an Ω -NE. We claim that its primary trace must be of the form 𝑤 · 𝐾 · 𝑆 , where 𝑤 ∈ Σ ∗ does not contain 𝐾 . This is equivalent to saying that a satisfying primary trace must contain at least one 𝐾 . This is easy to see, as the character 𝐾 is the only way to transition into an accepting state of each ˆ 𝐴 𝑖 ; therefore it must occur at least once if all ˆ 𝐴 𝑖 are satisfied on this trace.

We now claim that each of 𝐴 0 , . . . , 𝐴 𝑘 − 1 accepts 𝑤 . Assume this is not the case, and some 𝐴 𝑖 does not accept 𝑤 . Then, while reading 𝑤 , ˆ 𝐴 𝑖 never reaches accept, as 𝑤 does not contain 𝐾 . Furthermore, upon seeing the first 𝐾 , ˆ 𝐴 𝑖 transitions to reject, since 𝐴 𝑖 is not in a final state in 𝐹 𝑖 after reading 𝑤 . Thus, ˆ 𝐴 𝑖 can never reach accept, contradicting the assumption that 𝑤 · 𝐾 · 𝑆 was an Ω -NE. Therefore all 𝐴 𝑖 must accept 𝑤 , and ⋂ 0 ≤ 𝑖 ≤ 𝑘 − 1 𝐿 ( 𝐴 𝑖 ) ≠ ∅ . □

This establishes a polynomial-time reduction from DFAIE to 𝑊 -NE Nonemptiness; therefore 𝑊 -NE Nonemptiness is PSPACE-hard. In fact, this reduction has shown that checking the Primary-Trace Condition is itself PSPACE-hard.
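The transformation of Section 5.2 is mechanical enough to state as code. The sketch below is illustrative rather than the authors' implementation: the dictionary encoding of ˆ 𝛿 𝑖 and the state names `accept` and `reject` are our own choices. It adds the fresh letter 𝐾 and the two sink states, so that reading 𝐾 synchronizes acceptance exactly as the proof above uses.

```python
def hat_transform(Q, q0, Sigma, delta, F, K='K'):
    """Build the goal DFA A-hat from a DFA (Q, q0, Sigma, delta, F).

    Two fresh sink states are added: `accept` and `reject`, each looping to
    itself on every letter of the extended alphabet. The fresh letter K moves
    every final state of the original DFA to `accept` and every non-final
    original state to `reject`; `accept` is the only accepting state.
    """
    Sigma_hat = list(Sigma) + [K]
    Q_hat = list(Q) + ["accept", "reject"]
    delta_hat = dict(delta)                       # original transitions on Sigma
    for q in Q:
        delta_hat[(q, K)] = "accept" if q in F else "reject"
    for sink in ("accept", "reject"):
        for a in Sigma_hat:
            delta_hat[(sink, a)] = sink           # sinks loop on every letter
    return Q_hat, q0, Sigma_hat, delta_hat, ["accept"]
```

Running any word 𝑤 accepted by the original DFA followed by 𝐾 lands the transformed DFA in `accept`, and a rejected 𝑤 followed by 𝐾 lands it in `reject`, which is how the reduction forces all goal automata to accept simultaneously.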
Combining this with our PSPACE decision algorithm yields PSPACE-completeness.

Theorem 5.3.
The problem of deciding whether there exists a 𝑊 -NE strategy profile for an iBG 𝐺 and a set 𝑊 ⊆ Ω of agents is PSPACE-complete.

The main contribution of this work is Theorem 5.3, which shows that deciding whether a 𝑊 -NE strategy profile exists for an iBG 𝐺 and a set 𝑊 ⊆ Ω of agents is PSPACE-complete.

Separation of Strategic and Temporal Reasoning: The main objective of this work was to analyze equilibria in finite-horizon multiagent concurrent games, focusing on the strategic-reasoning aspect of the problem separately from temporal reasoning. To accomplish this, we used DFA goals instead of goals expressed in some finite-horizon temporal logic. For these finite-horizon temporal logics, previous analysis [12] consisted of two steps. First, the logical goals are translated into a DFA, which involves a doubly exponential blow-up [6, 17]. The second step was to perform the strategic reasoning, i.e., finding the Nash equilibria with the DFA from the first step as input. In terms of computational complexity, the first step completely dominated the second step, in which the strategic reasoning was conducted with respect to the DFAs. Here we eliminated the doubly exponential blow-up from consideration by starting with DFA goals, and provided a PSPACE-completeness result for the second step.
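The strategic-reasoning step behind Theorem 5.3 ultimately comes down to a lasso test on the Büchi word automaton 𝐴 ′ 𝑊 : the automaton is nonempty iff some accepting state is reachable from the initial state and lies on a cycle. The sketch below is illustrative only; it materializes the reachable graph through a hypothetical `successors` callback standing in for 𝛿 ′ , whereas the PSPACE procedure instead guesses the prefix and the cycle on the fly, storing only one state at a time [29].

```python
def buchi_nonempty(initial, accepting, successors):
    """Explicit-graph lasso test for Büchi word automaton nonemptiness.

    Nonempty iff an accepting state is (i) reachable from `initial` and
    (ii) reachable from itself in at least one step (i.e., on a cycle).
    `successors(state)` returns the list of one-step successor states.
    """
    def reachable(source):
        # states reachable from `source` in one or more steps
        seen, stack = set(), [source]
        while stack:
            for t in successors(stack.pop()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    reach = reachable(initial) | {initial}
    return any(f in reachable(f) for f in accepting if f in reach)
```

For example, on a three-state automaton where states 1 and 2 cycle between each other and state 0 only feeds into the cycle, the test succeeds when an accepting state sits on that cycle and fails when the only accepting state is the transient state 0.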
Future Work: Our immediate next goals are to analyze problems such as verification (deciding whether a given strategy profile is a 𝑊 -NE) and strategy extraction (i.e., constructing a finite-state controller that implements the 𝑊 -NEs found) within the context of our DFA-based iBGs. Furthermore, we are interested in implementation, i.e., a tool based on the theory developed in this paper. Further points of interest can be motivated from a game-theoretic lens, such as introducing imperfect information. Earlier work has already introduced imperfect information to problems in synthesis and verification; see [3, 8, 27]. Finally, the work can be extended both to the general CGS formalism (as opposed to iBGs) and to querying other properties and equilibrium concepts beyond Nash equilibria. Strategy Logic [21] has been introduced as a way to query general game-theoretic properties on concurrent game structures, and a version of Strategy Logic with finite goals would be a promising place to start for these extensions.
REFERENCES

[1] R. Alur, T. A. Henzinger, and O. Kupferman. Alternating-time temporal logic. J. ACM, 49(5):672–713, 2002.
[2] J. Bernet, D. Janin, and I. Walukiewicz. Permissive strategies: from parity games to safety games. RAIRO - Theoretical Informatics and Applications, 36(3):261–275, 2002.
[3] R. Berthon, B. Maubert, A. Murano, S. Rubin, and M. Y. Vardi. Strategy logic with imperfect information. CoRR, abs/1805.12592, 2018.
[4] J. Elgaard, N. Klarlund, and A. Möller. Mona 1.x: new techniques for WS1S and WS2S. In Proc. 10th Int'l Conf. on Computer Aided Verification, volume 1427 of Lecture Notes in Computer Science, pages 516–520. Springer, 1998.
[5] D. Fisman, O. Kupferman, and Y. Lustig. Rational synthesis. In Proc. 16th Int'l Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2010), volume 6015 of Lecture Notes in Computer Science, pages 190–204. Springer, 2010.
[6] G. D. Giacomo and M. Y. Vardi. Linear temporal logic and linear dynamic logic on finite traces. In Proc. 23rd Int'l Joint Conf. on Artificial Intelligence (IJCAI 2013), pages 854–860. IJCAI/AAAI, 2013.
[7] G. D. Giacomo and M. Y. Vardi. Synthesis for LTL and LDL on finite traces. In Proc. 24th Int'l Joint Conf. on Artificial Intelligence (IJCAI 2015), pages 1558–1564. AAAI Press, 2015.
[8] G. D. Giacomo and M. Y. Vardi. LTLf and LDLf synthesis under partial observability. In Proc. 25th Int'l Joint Conf. on Artificial Intelligence (IJCAI 2016), pages 1044–1050. IJCAI/AAAI Press, 2016.
[9] E. Grädel, W. Thomas, and T. Wilke. Automata, Logics, and Infinite Games: A Guide to Current Research, volume 2500 of Lecture Notes in Computer Science. Springer, 2002.
[10] J. Gutierrez, P. Harrenstein, and M. J. Wooldridge. Iterated boolean games. Inf. Comput., 242:53–79, 2015.
[11] J. Gutierrez, M. Najib, G. Perelli, and M. J. Wooldridge. Automated temporal equilibrium analysis: Verification and synthesis of multi-player games. Artif. Intell., 287:103353, 2020.
[12] J. Gutierrez, G. Perelli, and M. J. Wooldridge. Iterated games with LDL goals over finite traces. In Proc. 16th Conf. on Autonomous Agents and MultiAgent Systems (AAMAS 2017), pages 696–704. ACM, 2017.
[13] M. Hasanbeig, N. Y. Jeppu, A. Abate, T. Melham, and D. Kroening. Deepsynth: Program synthesis for automatic task segmentation in deep reinforcement learning. CoRR, abs/1911.10244, 2019.
[14] T. A. Henzinger. Games in system design and verification. In Proc. 10th Conf. on Theoretical Aspects of Rationality and Knowledge (TARK 2005), pages 1–4. National University of Singapore, 2005.
[15] D. Kozen. Lower bounds for natural proof systems. In Proc. 18th Annual Symposium on Foundations of Computer Science, pages 254–266. IEEE Computer Society, 1977.
[16] O. Kupferman, G. Perelli, and M. Y. Vardi. Synthesis with rational environments. Ann. Math. Artif. Intell., 78(1):3–20, 2016.
[17] O. Kupferman and M. Vardi. Model checking of safety properties. Formal Methods in System Design, 19(3):291–314, 2001.
[18] R. McNaughton. Infinite games played on finite graphs. Ann. Pure Appl. Logic, 65(2):149–184, 1993.
[19] J. J. Michalenko, A. Shah, A. Verma, R. G. Baraniuk, S. Chaudhuri, and A. B. Patel. Representing formal languages: A comparison between finite automata and recurrent neural networks. In Proc. 7th Int'l Conf. on Learning Representations (ICLR 2019). OpenReview.net, 2019.
[20] F. Mogavero, A. Murano, G. Perelli, and M. Y. Vardi. Reasoning about strategies: On the model-checking problem. ACM Trans. on Computational Logic, 15(4):1–47, 2014.
[21] F. Mogavero, A. Murano, G. Perelli, and M. Y. Vardi. Reasoning about strategies: On the model-checking problem. ACM Trans. Comput. Log., 15(4):34:1–34:47, 2014.
[22] J. F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.
[23] A. Pnueli. The temporal logic of programs. In Proc. 18th Annual Symposium on Foundations of Computer Science, pages 46–57. IEEE Computer Society, 1977.
[24] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. 16th ACM Symposium on Principles of Programming Languages (POPL 1989), pages 179–190. ACM Press, 1989.
[25] Y. Shoham and K. Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.
[26] M. Sipser. Introduction to the Theory of Computation. Course Technology, second edition, 2006.
[27] L. M. Tabajara and M. Y. Vardi. LTLf synthesis under partial observability: From theory to practice. CoRR, abs/2009.10875, 2020.
[28] J. van Benthem. Logic games: From tools to models of interaction. In Proof, Computation and Agency: Logic at the Crossroads, volume 352 of Synthese Library, pages 183–216. Springer, 2011.
[29] M. Vardi and P. Wolper. Reasoning about infinite computations. Information and Computation, 115(1):1–37, 1994.
[30] M. J. Wooldridge. An Introduction to MultiAgent Systems, Second Edition. Wiley, 2009.
[31] E. Yahav. From programs to interpretable deep models and back. In Proc. 30th Int'l Conf. on Computer Aided Verification (CAV 2018), Part I, volume 10981 of Lecture Notes in Computer Science, pages 27–37. Springer, 2018.

Notation Glossary

𝐺 : An iBG as a whole.
Ω : The set of agents in an iBG.
𝑘 : The cardinality of Ω ; Ω = { 0 , . . . , 𝑘 − 1 } .
𝑖 , 𝑗 : Agents in an iBG; 𝑗 usually refers to a deviating agent.
𝐴 𝑖 : The goal automaton for agent 𝑖 (usually 𝐴 𝑗 if 𝑗 is not in 𝑊 ); 𝐴 𝑖 = ⟨ 𝑄 𝑖 , 𝑞 𝑖 0 , Σ , 𝛿 𝑖 , 𝐹 𝑖 ⟩ .
𝜋 𝑖 : A strategy for agent 𝑖 .
Π 𝑖 : The set of strategies for agent 𝑖 .
𝜋 : A strategy profile consisting of one strategy for each agent, 𝜋 = ⟨ 𝜋 0 , . . . , 𝜋 𝑘 − 1 ⟩ ; in the context of safety games, a play.
Σ 𝑖 : The set of actions for agent 𝑖 .
Σ : The cross product of all Σ 𝑖 .
𝑤 : An element of Σ ∗ .
𝛼 : An element of Σ ; in the context of Σ ∗ → Σ trees, it refers to a label.
𝛽 : An element of Σ ; in the context of Σ ∗ → Σ trees, it refers to a direction.
𝑞 𝐴 , 𝑞 𝑎 : A catch-all accepting state in tree automata that always transitions back to itself.
𝐴 𝑊 : A deterministic Büchi word automaton that accepts traces in which all goals from 𝑊 are satisfied and no others; 𝐴 𝑊 = ⟨ 𝑄, 𝑞 0 , Σ , 𝛿, 𝐹 ⟩ .
𝑇 : A deterministic top-down Büchi tree automaton that accepts a tree if its primary trace is accepted by 𝐴 𝑊 ; 𝑇 = ( Σ , Σ , 𝑄 ∪ { 𝑞 𝑎 } , 𝑞 0 , 𝜌, 𝐹 ∪ { 𝑞 𝑎 } ) .
𝑇 𝑗 : A deterministic top-down Büchi tree automaton that accepts a tree if it satisfies the 𝑗 -Deviant-Trace Condition; 𝑇 𝑗 = ( Σ , Σ , ( 𝑄 𝑗 × { 0 , 1 }) ∪ { 𝑞 𝐴 } , ⟨ 𝑞 𝑗 0 , 0 ⟩ , 𝜌 𝑗 , ( 𝑄 𝑗 × { 0 }) ∪ (( 𝑄 𝑗 \ 𝐹 𝑗 ) × { 1 }) ∪ { 𝑞 𝐴 } ) .
𝑇 𝑊 : A deterministic top-down Büchi tree automaton that accepts a tree if it represents a 𝑊 -NE strategy profile; 𝑇 𝑊 = ( Σ , Σ , 𝑄 ∪ ⋃ 𝑗 ∈ Ω \ 𝑊 𝑄 𝑗 ∪ { 𝑞 𝐴 } , 𝑞 0 , 𝜏, 𝐹 ∪ ⋃ 𝑗 ∈ Ω \ 𝑊 ( 𝑄 𝑗 \ 𝐹 𝑗 ) ∪ { 𝑞 𝐴 } ) .
𝐺 𝑗 : A safety game constructed to partition the states of 𝑄 𝑗 in 𝐴 𝑗 into those from which 𝑇 𝑊 is empty and those from which it is nonempty; 𝐺 𝑗 = ( 𝑄 𝑗 , 𝑄 𝑗 × Σ , 𝐸 𝑗 ) .
𝑊 𝑖𝑛 ( 𝐺 𝑗 ) : The winning set of player 0 in 𝐺 𝑗 .
𝐴 ′ 𝑊 : A deterministic Büchi word automaton used to test 𝑇 𝑊 for nonemptiness; 𝐴 ′ 𝑊 = ( 𝑄 ′ , 𝑞 0 , Σ , 𝛿 ′ , 𝐹 ∩ 𝑄 ′ ) .
𝐾 : A fresh character that is not contained in Σ .
∗ : A second fresh character that is neither contained in Σ nor equal to 𝐾 .
ˆ 𝐴 𝑖 : A transformed DFA that serves as a goal DFA in an iBG; ˆ 𝐴 𝑖 = ⟨ ˆ 𝑄 𝑖 , 𝑞 𝑖 0 , ˆ Σ , ˆ 𝛿 𝑖 , ˆ 𝐹 𝑖 ⟩ . See Section 5.2.
ˆ Σ : Σ ∪ { 𝐾 } .
𝑆 : An arbitrary element of ( Σ ∪ { 𝐾 } ) 𝜔 .