Polyhedral value iteration for discounted games and energy games
Alexander Kozachinskiy∗

Department of Computer Science, University of Warwick, Coventry, UK

July 20, 2020
Abstract
We present a deterministic algorithm solving discounted games with n nodes in strongly n^{O(1)} · (2 + √2)^n-time. For the special case of bipartite discounted games our algorithm runs in n^{O(1)} · 2^n-time. Prior to our work, no deterministic algorithm running in time 2^{o(n log n)} regardless of the discount factor was known.

We call our approach polyhedral value iteration. We rely on the well-known fact that the values of a discounted game can be found from the so-called optimality equations. In the algorithm we consider a polyhedron obtained by relaxing the optimality equations. We iterate over points on the border of this polyhedron, each time moving along a carefully chosen shift as far as possible. This continues until the current point satisfies the optimality equations.

Our approach is heavily inspired by a recent algorithm of Dorfman et al. (ICALP 2019) for energy games. For completeness, we present their algorithm in terms of polyhedral value iteration. Our exposition, unlike the original algorithm, does not require edge weights to be integers and works for arbitrary real weights.

1 Introduction

We study discounted games, mean payoff games and energy games. All three kinds of games are played on finite weighted directed graphs between two players called Max and Min. Players shift a pebble along the edges of a graph. The nodes of the graph are partitioned into two subsets, one where Max controls the pebble and the other where Min controls the pebble. One should also indicate in advance a starting node (a node where the pebble is located initially). By making infinitely many moves, the players obtain an infinite path e_1, e_2, e_3, ... of the graph (here e_i is the i-th edge passed by the pebble). The outcome of the game is a real number determined by the sequence w_1, w_2, w_3, ..., where w_i is the weight of the edge e_i. We assume that the outcome serves as the amount of fine paid by player Min to player Max.

∗ [email protected]. Supported by the EPSRC grant EP/P020992/1 (Solving Parity Games in Theory and Practice).
In other words, the goal of Max is to maximize the outcome and the goal of Min is to minimize it. The outcome is computed differently in discounted, mean payoff and energy games.

• The outcome of a discounted game is ∑_{i=1}^∞ λ^{i−1} w_i, where λ ∈ (0, 1) is a fixed-in-advance real number called the discount factor.

• The outcome of a mean payoff game is limsup_{n→∞} (w_1 + ... + w_n)/n.

• The outcome of an energy game is 1 if the sequence of partial sums w_1 + w_2 + ... + w_n, n ∈ ℕ, is bounded from below, and 0 otherwise (we interpret outcome 1 as a victory of Max and outcome 0 as a victory of Min).

In all three games every starting node has a value, i.e., a real number α such that (a) there is a Max's strategy σ guaranteeing that the outcome is at least α and (b) there is a Min's strategy τ guaranteeing that the outcome is at most α. Moreover [24, 7, 5], we can always choose σ and τ to be positional and independent of the starting node. Positionality means that a strategy never makes two different moves in the same node. The property of having such σ and τ is often called positional determinacy.

We study algorithmic problems that arise from these games. Namely, the value problem is the problem of finding the values of a given game. The decision problem is the problem of comparing the value of a node with a given threshold. Another fundamental problem is to find positional strategies establishing the value of a game.

Motivation. Positionally determined games are of great interest in the design of algorithms and in computational complexity. Specifically, these games serve as a source of problems that are in NP ∩ coNP but not known to be in P. Below we survey algorithms for discounted, mean payoff and energy games (including our contribution). Mean payoff and discounted games are also studied in the context of dynamic systems [9]. Positionally determined games in general have a broad impact on formal languages and automata theory [2].

Value problem vs. decision problem. The value problem, being the more general one, is at least as hard as the decision problem.
On the other hand, the values in discounted and mean payoff games can be obtained from a play of two positional strategies. Hence, the bit-length of the values is polynomial in the bit-length of the edge weights and (in the case of discounted games) the bit-length of the discount factor. This makes the value problem polynomial-time reducible to the decision problem via binary search. For energy games there is no difference between these two problems at all.

At the same time, for discounted and mean payoff games the value problem may turn out to be harder for strongly polynomial algorithms. Indeed, in the reduction given above one manipulates directly with binary representations of weights (to identify a range containing the values). This is prohibited for strongly polynomial algorithms.
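The binary-search reduction just described can be sketched in a few lines. This is only an illustration: decide is an assumed oracle for the decision problem, and the bound on the denominator of the value (which, for games, follows from the bit-length of the weights) is taken as a parameter.

```python
from fractions import Fraction

def value_via_decision(decide, lo, hi, denom_bound):
    """Recover an exact value from a threshold oracle by binary search.

    decide(t) is assumed to return True iff the value is >= t.  The value
    is assumed to be a rational in [lo, hi] with denominator at most
    denom_bound.  Two distinct such rationals differ by at least
    1/denom_bound^2, so once the search interval is much narrower than
    that, the value is the unique candidate closest to the midpoint.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    gap = Fraction(1, 2 * denom_bound ** 2)
    while hi - lo >= gap:
        mid = (lo + hi) / 2
        if decide(mid):
            lo = mid      # the value lies in [mid, hi]
        else:
            hi = mid      # the value lies in [lo, mid)
    # Round the midpoint to the nearest rational with a small denominator.
    return ((lo + hi) / 2).limit_denominator(denom_bound)
```

Note that the oracle is queried only on thresholds produced by halving, which is exactly the kind of manipulation with binary representations that a strongly polynomial algorithm is not allowed to perform.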
Reductions, structural complexity.
It is known that Max wins in an energy game if and only if the value of the corresponding mean payoff game is non-negative [3]. Hence, energy games are equivalent to the decision problem for mean payoff games with threshold 0. Any other threshold α is reducible to threshold 0 by adding −α to all the weights. So energy games and mean payoff games are polynomial-time equivalent.

The decision problem for discounted games lies in UP ∩ coUP [15]. In turn, mean payoff games are polynomial-time reducible to discounted games [25]. Hence, the same UP ∩ coUP upper bound applies to mean payoff and energy games. None of these problems is known to lie in P.

Algorithms for discounted games.
There are two classical approaches to discounted games. In the value iteration approach, going back to Shapley [24], one manipulates a real vector indexed by the nodes of the graph. The vector of values of a discounted game is known to be a fixed point of an explicit contracting operator. By applying this operator repeatedly to an arbitrary initial vector one obtains a sequence converging to the vector of values. Using this, Littman [17] gave a deterministic O(n^{O(1)} · L · (1/(1−λ)) · log(1/(1−λ)))-time algorithm solving the value problem for discounted games. Here n is the number of nodes, λ is the discount factor and L is the bit-length of the input. This gives a polynomial-time algorithm for λ = 1 − Ω(1).
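For reference, the classical value iteration just described can be sketched as follows. This is a minimal illustration (the game encoding via node sets and a weight dictionary is hypothetical, not the paper's notation); each round applies Shapley's contracting operator once, so the iterate after k rounds is within λ^k · C of the value vector for a constant C depending on the weights.

```python
def value_iteration(V_max, V_min, edges, w, lam, rounds):
    """Approximate the values of a discounted game by classical value
    iteration.

    edges: iterable of pairs (a, b); w: dict mapping an edge to its
    weight.  Every node is assumed to have at least one out-going edge.
    """
    x = {v: 0.0 for v in set(V_max) | set(V_min)}
    for _ in range(rounds):
        # One application of the contracting operator: Max nodes take the
        # best (maximal) move, Min nodes the worst (minimal) one.
        x = {a: (max if a in V_max else min)(
                w[(u, b)] + lam * x[b] for (u, b) in edges if u == a)
             for a in x}
    return x
```

For example, on the two-node game with a Max node a (edge a → b of weight 0) and a Min node b (edge b → a of weight 1) and λ = 1/2, the iterates converge to the values x_a = 2/3 and x_b = 4/3.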
The strategy iteration approach, going back to Howard [14] (see also [23]), can be seen as a sophisticated way of iterating over positional strategies of the players. Hansen et al. [13] showed that strategy iteration solves the value problem for discounted games in deterministic strongly O(n^{O(1)} · (1/(1−λ)) · log(1/(1−λ)))-time. Unlike Littman's algorithm, for λ = 1 − Ω(1) this algorithm is strongly polynomial.

More recently, interior point methods were applied to discounted games [12]. As of now, however, these methods do not outperform the algorithm of Hansen et al.

For all these algorithms the running time depends on λ (exponentially in the bit-length of λ). As far as we know, no deterministic algorithm with running time 2^{o(n log n)} regardless of the value of λ was known. One can get 2^{O(n log n)} time by simply trying all possible positional strategies of one of the players. Our main result pushes this bound down to 2^{O(n)}. More precisely, we show the following.

Theorem 1. The values of a discounted game on a graph with n nodes can be found in deterministic strongly n^{O(1)} · (2 + √2)^n-time.

We also obtain a better bound for a special case of discounted games, namely for bipartite discounted games. We call a discounted game bipartite if in the underlying graph each edge is either an edge from a Max's node to a Min's node or an edge from a Min's node to a Max's node. In other words, in a bipartite discounted game the players can only move alternately.
Theorem 2.
The values of a bipartite discounted game on a graph with n nodes can be found in deterministic strongly n^{O(1)} · 2^n-time.

Our algorithm is the fastest known deterministic algorithm for discounted games when λ > 1 − (2 + √2)^{−n}. For bipartite discounted games it is the fastest one for λ > 1 − (2 + Ω(1))^{−n}. For smaller discount factors the algorithm of Hansen et al. outperforms ours. One should also mention that their algorithm is applicable to the more general stochastic discounted games, while our algorithm is not.

In addition, it is known that randomized algorithms can solve discounted games faster, namely, in time 2^{O(√n · log n)} [18, 11, 1]. These algorithms are based on formulating discounted games as an LP-type problem [19].
Algorithms for mean payoff and energy games.
For mean payoff and energy games it is usually assumed that the weights of the edges are integers, and the running time often involves a parameter W, the largest absolute weight. In the case of rational weights one can simply multiply them all by a common denominator.

Zwick and Paterson [25] gave an algorithm solving the value problem for mean payoff games in pseudopolynomial time, namely, in time O(n^{O(1)} · W) (see also [21]). Brim et al. [4] improved the polynomial factor before W. In turn, Fijalkow et al. [8] slightly improved the dependence on W (from W to W^{1−1/n}).

There are algorithms whose running time depends on W much more mildly (at the cost of being exponential in n). Lifshits and Pavlov [16] gave an O(n^{O(1)} · 2^n)-time algorithm for energy games (here the running time does not depend on W at all). Recently, Dorfman et al. [6] pushed 2^n down to 2^{n/2} by giving an O(n^{O(1)} · 2^{n/2} · log W)-time algorithm for energy games. They also claim (without proof) that the log W factor can be removed. At the cost of an extra log W factor these algorithms can be lifted to the value problem for mean payoff games.

All these algorithms are deterministic. As for randomized algorithms, the state of the art is 2^{O(√n · log n)}-time, the same as for discounted games.

We show that:

Theorem 3.
For an energy game on n nodes one can find all the nodes where Max wins in deterministic strongly n^{O(1)} · 2^{n/2}-time.

This certifies that for the algorithm of Dorfman et al. the log W factor can indeed be removed. More importantly, unlike the algorithm of Dorfman et al., our algorithm is strongly n^{O(1)} · 2^{n/2}-time; i.e., it can be performed for arbitrary real weights (assuming basic arithmetic operations with them are carried out by an oracle).

The main reason we provide the proof of Theorem 3 is for the sake of exposition. Our result for discounted games is highly inspired by the algorithm of Dorfman et al., so we find it instructive to give Theorem 3 along with Theorem 1. We also believe that our exposition is more transparent, for the reasons discussed below.

Arguably, our approach arises more naturally for discounted games, yet it is rooted in the algorithm of Dorfman et al. for mean payoff games.

For discounted games we iterate a real vector x with coordinates indexed by the nodes of the graph, until x coincides with the vector of values. Thus, our approach can also be called value iteration. However, it differs significantly from classical value iteration, and we call it polyhedral value iteration.

We rely on the well-known fact that the vector of values is the unique solution to the so-called optimality equations. The optimality equations are a set of conditions that can be naturally split into two parts. The first part is just a system of linear inequalities over x, where each node has some subset of inequalities associated specifically with this node. They express the fact that the players cannot improve the value in a node. The second part states that among the inequalities associated with a node there is one turning into an equality. This part represents the fact that the values can be attained.

By throwing away the second part we obtain a polyhedron containing the vector of values. We call this polyhedron the optimality polyhedron.
Of course, besides the vector of values it contains some other points too.

We initialize x by finding any point belonging to the optimality polyhedron. There is little chance that x will immediately satisfy the optimality equations. So until it does, we do the following. We compute a shift directed from x into the interior of the optimality polyhedron. We move x along this shift as far as possible, until the border of the optimality polyhedron is reached. This point on the border becomes the new value of x.

We choose the shift in a very specific way. We consider an auxiliary discrete game which we call a discounted normal play game. The graph of this game depends on which inequalities of the optimality polyhedron are tight at x. The values of this game determine a shift for x. The rules of the game guarantee that such a shift does not violate the tight inequalities. Hence our shift does not immediately lead us outside the optimality polyhedron.

It turns out that this process converges to the vector of values. Moreover, it does so in O(n(2 + √2)^n) steps. The complexity analysis is split into two independent parts. First, we indicate some properties of how the underlying discounted normal play games change from one point to another in the algorithm. These lead to a definition of an abstract process of iterating discounted normal play games according to certain rules. In the second part of the argument we care only about this abstract process (called below DNP games iteration) and forget about the context of discounted games. We show that DNP games iteration can last only O(n(2 + √2)^n) steps.

It turns out that in essentially the same language one can present the algorithm of Dorfman et al. Now we search not for the solution to optimality equations but for a vector of potentials certifying that one of the players wins in certain nodes. Dorfman et al. build upon a potential lifting algorithm of Brim et al. [4]. Dorfman et al. notice that in the algorithm of Brim et al.
a lot of consecutive iterations may turn out to be lifting the same set of nodes. Instead, Dorfman et al. perform all these iterations at once, accelerating the algorithm of Brim et al. We notice that this can be seen as one step of polyhedral value iteration, but now for mean payoff games. The polyhedron inside which it all happens is a limit of the optimality polyhedra as λ → 1.

2 Preliminaries

To specify a discounted game G one has to specify:

• a finite directed graph G = (V, E) in which every node has at least one out-going edge, i.e., in which no node is a sink;

• a partition of the set of nodes V into two disjoint subsets V_Max and V_Min;

• a weight function w : E → ℝ;

• a real number λ ∈ (0, 1) called the discount factor.

Discounted games are played between two players called Max and Min. There is a pebble which at each moment of time is located in one of the nodes of G. First, we have to specify a node s ∈ V where the pebble is located initially. After that, at each move of the game the pebble is shifted along some edge of G by one of the players. Namely, if currently the pebble is in a node a ∈ V_Max, then player Max has to move the pebble to some node b ∈ V satisfying (a, b) ∈ E. Similarly, if currently the pebble is in a node c ∈ V_Min, then player Min has to move the pebble to some node d ∈ V satisfying (c, d) ∈ E. Since in G every node has at least one out-going edge, it is always possible to make a move.

By making infinitely many moves according to the rules above, the players obtain an infinite path of the graph G. If e_1, e_2, e_3, ... are the edges of this path (in the order they are visited), then the outcome of the game G is determined by the corresponding sequence of weights: w_1 = w(e_1), w_2 = w(e_2), w_3 = w(e_3), .... Namely, player Min pays to player Max a fine of size

    ∑_{i=1}^∞ λ^{i−1} w_i.                                        (1)

In other words, the goal of Max is to maximize (1) and the goal of Min is to minimize (1).

For any discounted game G and for any starting node s there exists a real number x*_s, called the value of G in the node s, such that:

• there is a Max's strategy guaranteeing that (1) is at least x*_s;

• there is a Min's strategy guaranteeing that (1) is at most x*_s.

Moreover, the values of G can be found from the following system of equations, called optimality equations:

    x_a = max_{e=(a,b) ∈ E} [w(e) + λx_b],   a ∈ V_Max,           (2)
    x_a = min_{e=(a,b) ∈ E} [w(e) + λx_b],   a ∈ V_Min,           (3)

where the system is over a real vector x with coordinates indexed by the nodes of the graph.
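For concreteness, whether a candidate vector solves the optimality equations (2)–(3) can be checked mechanically. A minimal sketch with a hypothetical encoding of the game; exact rational arithmetic is assumed so that the equality tests are meaningful:

```python
from fractions import Fraction

def satisfies_optimality(V_max, V_min, edges, w, lam, x):
    """Check the optimality equations (2)-(3) for a candidate vector x.

    edges: iterable of pairs (a, b); w: dict edge -> weight; lam and the
    entries of w and x are exact rationals (fractions.Fraction).
    """
    for a in set(V_max) | set(V_min):
        vals = [w[(u, b)] + lam * x[b] for (u, b) in edges if u == a]
        best = max(vals) if a in V_max else min(vals)
        if x[a] != best:          # (2) for Max nodes, (3) for Min nodes
            return False
    return True
```

For instance, with a single Max node a carrying a self-loop of weight 1 and λ = 1/2, equation (2) reads x_a = 1 + x_a/2, so x_a = 2 is the unique solution.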
More specifically, (a) there exists exactly one solution x* to (2–3) and (b) for any node s the value of G in s coincides with x*_s.

This characterization of the values of discounted games goes back to Shapley [24]. Let us sketch Shapley's argument for the reader's convenience. The fact that (2–3) has exactly one solution follows from the Banach fixed point theorem. Observe that the set of solutions to (2–3) coincides with the set of fixed points of the following mapping ∆ : ℝ^V → ℝ^V:

    ∆(x)_a = max_{e=(a,b) ∈ E} [w(e) + λx_b],   a ∈ V_Max,
    ∆(x)_a = min_{e=(a,b) ∈ E} [w(e) + λx_b],   a ∈ V_Min.
It remains to notice that ∆ is λ-contracting with respect to the ‖·‖_∞-norm.

Now, let x* be the solution to (2–3). We have to come up with a Max's strategy σ and a Min's strategy τ proving that the value in the node s exists and coincides with x*_s. Let σ be a strategy that from a node a ∈ V_Max moves along an edge on which the maximum in (2) is attained. Similarly, let τ be a strategy that from a node a ∈ V_Min moves along an edge on which the minimum in (3) is attained. It is not hard to verify that

• if the game starts in s and Max follows σ, then (1) is at least x*_s;

• if the game starts in s and Min follows τ, then (1) is at most x*_s.

Remarkably, the strategies σ and τ do not depend on s. Moreover, σ and τ are positional, i.e., the moves they make depend only on the current node and not on the path to this node. Thus, discounted games belong to the class of positionally determined games [10].

In this paper we are interested in the algorithmic problem of finding, for a given discounted game G and for every s ∈ V, the value of G in s. Throwing away the context of discounted games, one can simply say that we are interested in finding the solution to (2–3).

Energy games [3, 5] are also played between two players called Max and Min. They have the same underlying mechanics as discounted games. Namely, the game takes place on a directed graph G = (V, E) (with no sinks) equipped with a partition of V into sets V_Max and V_Min and with a weight function w : E → ℝ. In the same way the players produce an infinite sequence w_1, w_2, w_3, ... of weights of the edges they visit. Now there is no discount factor and no fine paid by Min to Max. Instead, depending on the sequence w_1, w_2, w_3, ..., either Max or Min wins. More precisely, player Max wins if the sequence of partial sums w_1 + w_2 + ... + w_n, n ∈ ℕ, is bounded from below. Player Min wins otherwise.

Energy games are also positionally determined.
More precisely, there is always a Max's positional strategy σ and a Min's positional strategy τ such that for every starting node s either σ is a Max's winning strategy or τ is a Min's winning strategy. This follows from positional determinacy of the more general mean payoff games [7] and requires a more elaborate argument than for discounted games.

It is instructive to provide a characterization of positional winning strategies in energy games in terms of cycles. First, by the weight of a cycle we mean the sum of the weights of its edges. We call a cycle positive if its weight is positive. In the same way we define negative cycles, zero cycles and so on. Now, for a Max's positional strategy σ let G_σ be the graph obtained from G by removing the edges that start in V_Max and are not consistent with the strategy σ. I.e., in G_σ each node from V_Max has exactly one out-going edge, namely the one used by σ in this node. It is easy to see that σ is winning for Max in the energy game with starting node s if and only if in the graph G_σ only non-negative cycles are reachable from s.

Similarly, for a Min's positional strategy τ one can define the graph G_τ where only the edges used by τ are left for the nodes in V_Min. Then a strategy τ is winning for Min in the energy game starting in a node s if and only if only negative cycles are reachable from s in G_τ.

In this notation, positional determinacy means that there is always a positional Max's strategy σ and a positional Min's strategy τ such that for every node s either only non-negative cycles are reachable from s in G_σ or only negative cycles are reachable from s in G_τ.

We consider the algorithmic problem of finding all the nodes where Max wins (equivalently, all the nodes where Min wins).

In this paper we use the term "bipartite" for directed graphs G = (V, E) equipped with a partition of V into sets V_Max and V_Min. Namely, we call a directed graph G = (V, E) bipartite if E ⊆ V_Max × V_Min ∪ V_Min × V_Max.
Next, by a bipartite discounted game or a bipartite energy game we mean a game played on a bipartite graph.

3 An n^{O(1)} · (2 + √2)^n-time algorithm for discounted games

In this section we give an algorithm establishing Theorems 1 and 2. We consider a discounted game on a graph G = (V, E) with a weight function w : E → ℝ and with a partition of V between the players given by the sets V_Max and V_Min. We assume that G has n nodes and m edges.

In Subsection 3.1 we define auxiliary games that we call discounted normal play games. We use these games both in the formulation of the algorithm and in the complexity analysis. In Subsection 3.2 we define the so-called optimality polyhedron by relaxing the optimality equations (2–3).

The algorithm is given in Subsection 3.3. In the algorithm we iterate over points of the optimality polyhedron in search of the solution to (2–3). First we initialize by finding any point belonging to the optimality polyhedron. Then for the current point we define a shift which does not immediately lead us outside the optimality polyhedron. In the definition of the shift we use discounted normal play games. To obtain the next point we move as far as possible along the shift until we reach the border. We do so until the current point satisfies (2–3). Along the way we also take some measures to prevent the bit-length of the current point from growing super-polynomially.

This process always terminates and, in fact, can take only O(n(2 + √2)^n) iterations. Moreover, for bipartite discounted games it can take only O(2^n) steps. The proof of this is deferred to Section 4.

3.1 Discounted normal play games

These games will always be played on directed graphs with the same set of nodes as G. Given such a graph G′ = (V, E′), we equip it with the same partition of V into V_Max and V_Min as in G. There may be sinks in G′.

Two players called Max and Min move a pebble along the edges of G′. Player Max controls the pebble in the nodes from V_Max and player Min controls the pebble in the nodes from V_Min.
If the pebble reaches a sink of G′ after s moves, then the player who cannot make a move pays a fine of size λ^s to his opponent. Here λ is the discount factor from our discounted game. If the pebble never reaches a sink, i.e., if the play lasts infinitely long, then the players pay each other nothing.

By the outcome of the play we mean the income of player Max. Thus, the outcome is

• positive, if the play ends in a sink from V_Min;

• zero, if the play lasts infinitely long;

• negative, if the play ends in a sink from V_Max.

It is not hard to see that in this game the players have optimal positional strategies. Moreover, if δ(v) is the value of this game in the node v, then

    δ(s) = −1,                          if s is a sink from V_Max,            (4)
    δ(s) = 1,                           if s is a sink from V_Min,            (5)
    δ(a) = λ · max_{(a,b) ∈ E′} δ(b),   if a ∈ V_Max and a is not a sink,     (6)
    δ(a) = λ · min_{(a,b) ∈ E′} δ(b),   if a ∈ V_Min and a is not a sink.     (7)

We omit the proofs of these facts, as below we only require the following

Proposition 4.
For any G′ = (V, E′) there exists exactly one solution to (4–7), and it can be found in strongly polynomial time.

Before proving Proposition 4, let us note that for graphs with n nodes any solution δ to (4–7) satisfies δ(v) ∈ {1, λ, ..., λ^{n−1}, 0, −λ^{n−1}, ..., −λ, −1}. Indeed, if a is not a sink, then by (6–7) the node a has an out-going edge leading to a node b with δ(b) = δ(a)/λ. By following these edges we either reach a sink after at most n − 1 steps (in which case δ(a) = ±λ^i for some i ∈ {0, 1, ..., n−1}) or we enter a loop. For every node b on a loop of length l > 0 we have δ(b) = λ^l δ(b), which means that δ(b) = 0 everywhere on the loop (recall that λ ∈ (0, 1)); hence for the initial node a we also have δ(a) = 0.

From this it is also clear that δ(v) = 1 if and only if v ∈ V_Min and v is a sink of G′. Similarly, δ(v) = −1 if and only if v ∈ V_Max and v is a sink of G′.

Proof of Proposition 4.
To show the existence and uniqueness of a solution we employ the Banach fixed point theorem. Let ∆ be the set of all vectors f ∈ ℝ^V satisfying f(s) = 1 for all sinks s ∈ V_Min and f(t) = −1 for all sinks t ∈ V_Max. Define the following mapping ρ : ∆ → ∆:

    ρ(f)(a) = −1,                          a is a sink from V_Max,
    ρ(f)(a) = 1,                           a is a sink from V_Min,
    ρ(f)(a) = λ · max_{(a,b) ∈ E′} f(b),   a ∈ V_Max and a is not a sink,
    ρ(f)(a) = λ · min_{(a,b) ∈ E′} f(b),   a ∈ V_Min and a is not a sink.

The solutions to (4–7) are exactly the δ ∈ ∆ such that ρ(δ) = δ. It remains to notice that ρ is λ-contracting with respect to the ‖·‖_∞-norm.

Now let us explain how to find the solution to (4–7) in strongly polynomial time. Let us first determine for every k ∈ {0, 1, ..., n−1} the set V_k = {v ∈ V | δ(v) = λ^k}. It is clear that V_0 coincides with the set of sinks of the graph G′ which lie in V_Min. Next, the set V_k can be determined in strongly polynomial time once V_0, V_1, ..., V_{k−1} are given. Indeed, by (6–7) the set V_k consists of

• all v ∈ V_Max \ (V_0 ∪ ⋯ ∪ V_{k−1}) having an out-going edge leading to V_{k−1};

• all v ∈ V_Min that are not sinks and whose out-going edges all lead to V_0 ∪ ⋯ ∪ V_{k−1}, with at least one of them leading to V_{k−1}.

The sets {v ∈ V | δ(v) = −λ^k} can be determined similarly, with the roles of Max and Min exchanged. For all the remaining nodes v we have δ(v) = 0.
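The layer-by-layer computation just described can be spelled out in code. A sketch with a hypothetical graph encoding and exact rational arithmetic; the positive layers {v : δ(v) = λ^k} are peeled off first, then the negative ones symmetrically, and everything left gets 0:

```python
from fractions import Fraction

def dnp_values(V_max, V_min, edges, lam):
    """Solve equations (4)-(7) of a discounted normal play game.

    V_max, V_min: disjoint sets of nodes; edges: set of pairs (a, b);
    lam: the discount factor as a Fraction in (0, 1).
    Returns a dict mapping each node to delta(node).
    """
    V = set(V_max) | set(V_min)
    succ = {v: {b for (a, b) in edges if a == v} for v in V}
    # (4)-(5): sinks from V_Max get -1, sinks from V_Min get +1.
    delta = {v: Fraction(-1 if v in V_max else 1) for v in V if not succ[v]}
    n = len(V)
    # Positive layers V_k = {v : delta(v) = lam^k}, built from V_{k-1}.
    layer = {v for v in V if delta.get(v) == 1}
    for k in range(1, n):
        prev, layer = layer, set()
        for v in V - delta.keys():
            if v in V_max and succ[v] & prev:
                layer.add(v)    # (6): max over successors is lam^(k-1)
            elif v in V_min and succ[v] & prev and all(
                    delta.get(b, Fraction(-2)) >= lam ** (k - 1)
                    for b in succ[v]):
                layer.add(v)    # (7): min over successors is lam^(k-1)
        for v in layer:
            delta[v] = lam ** k
    # Negative layers: symmetric, with the roles of Max and Min swapped.
    layer = {v for v in V if delta.get(v) == -1}
    for k in range(1, n):
        prev, layer = layer, set()
        for v in V - delta.keys():
            if v in V_min and succ[v] & prev:
                layer.add(v)
            elif v in V_max and succ[v] & prev and all(
                    delta.get(b, Fraction(2)) <= -lam ** (k - 1)
                    for b in succ[v]):
                layer.add(v)
        for v in layer:
            delta[v] = -lam ** k
    # Nodes from which optimal play never reaches a sink get value 0.
    for v in V - delta.keys():
        delta[v] = Fraction(0)
    return delta
```

For example, with λ = 1/2, a Min sink b gets δ(b) = 1 and a Max node a with an edge to b gets δ(a) = λ = 1/2, while nodes on a cycle with no way to force reaching a sink get 0.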
3.2 Optimality polyhedron

Consider the following system of linear inequalities over a real vector x with coordinates indexed by the nodes of the graph, obtained by relaxing the optimality equations (2–3):

    w(e) + λx_b − x_a ≤ 0,   e = (a, b) ∈ E, a ∈ V_Max,           (8)
    w(e) + λx_b − x_a ≥ 0,   e = (a, b) ∈ E, a ∈ V_Min.           (9)

We denote the polyhedron defined by (8–9) by OptPol.

We call a vector δ ∈ ℝ^V a valid shift for x ∈ OptPol if for all small enough ε > 0 the point x + εδ belongs to OptPol. To determine whether a shift δ is valid for x it is enough to look at the edges which are tight for x. Namely, we call an edge e = (a, b) ∈ E tight for x ∈ OptPol if w(e) + λx_b − x_a = 0, i.e., if the corresponding inequality in (8–9) becomes an equality on x. It is clear that δ ∈ ℝ^V is valid for x if and only if

    λδ(b) − δ(a) ≤ 0   for every (a, b) ∈ E with a ∈ V_Max that is tight for x,   (10)
    λδ(b) − δ(a) ≥ 0   for every (a, b) ∈ E with a ∈ V_Min that is tight for x.   (11)

Discounted normal play games can be used to produce for any x ∈ OptPol a valid shift for x. Namely, let E_x ⊆ E be the set of edges that are tight for x and consider the graph G_x = (V, E_x). I.e., G_x is a subgraph of G containing only the edges that are tight for x. An important observation is that x is the solution to the optimality equations (2–3) if and only if G_x has no sinks.

Define δ_x to be the solution to (4–7) for G_x. Note that δ_x is a valid shift for x, as (6–7) imply (10–11). Note also that as long as x does not satisfy (2–3), i.e., as long as the graph G_x has sinks, the vector δ_x is not zero.

Let us also define a procedure RealizeGraph(S) that we use in our algorithm to control the bit-length of the current point. The input to the procedure is a subset S ⊆ E. The output of RealizeGraph(S) is a point x ∈ OptPol satisfying S ⊆ E_x. If there is no such x, the output of RealizeGraph(S) is "not found". In other words, consider the polyhedron which can be obtained from (8–9) by turning the inequalities corresponding to the edges from S into equalities. The output of RealizeGraph(S) is a point of this polyhedron, if this polyhedron is not empty. In particular, RealizeGraph(∅) is simply a procedure finding a point of OptPol.

Note that each inequality in (8–9) contains exactly two variables. Hence (see [20]), the output of
RealizeGraph(S) can be computed in strongly polynomial time.

3.3 The algorithm

Algorithm 1: n^{O(1)} · (2 + √2)^n-time algorithm for discounted games

Result:
the solution to the optimality equations (2–3)

    initialization: x ← RealizeGraph(∅);
    while x does not satisfy (2–3) do
        ε_max ← the largest ε ∈ (0, +∞) s.t. x + εδ_x ∈ OptPol;
        x ← RealizeGraph(E_{x + ε_max δ_x});
    end
    output x;

Some remarks:

• We can find δ_x in strongly polynomial time by Proposition 4.

• The value of ε_max can be found as in the simplex method. Indeed, ε_max is the smallest ε ∈ (0, +∞) for which there exists an inequality in (8–9) which is tight for x + εδ_x but not for x. Thus, to find ε_max it is enough to solve at most m linear one-variable equations and compute the minimum over the positive solutions to these equations.

• In fact, ε_max < +∞ throughout the algorithm, i.e., we cannot move along δ_x forever. To show this, it is enough to indicate an inequality in (8–9) that is tight for x + εδ_x for some positive ε but not for x. First, since x does not yet satisfy the optimality equations (2–3), there exists a sink s of the graph G_x. Assume that s ∈ V_Max; the argument in the case s ∈ V_Min is similar. The graph G is sinkless, so there exists an edge e = (s, b) ∈ E. The edge (s, b) is not tight for x (otherwise s would not be a sink of G_x). Hence w(e) + λx_b − x_s <
0. The left-hand side of the same inequality for x + εδ_x looks as follows:

    w(e) + λx_b − x_s + ε · (λδ_x(b) − δ_x(s)).

In turn, the node s is a sink of G_x from V_Max, hence δ_x(s) = −1 < λδ_x(b). I.e., the left-hand side of the inequality for the edge (s, b) increases as ε increases, so for some positive ε it becomes tight.

One could consider a version of Algorithm 1 where we do not use the procedure
RealizeGraph and simply set x ← x + ε_max δ_x. A problem with this version is that it is not clear why the bit-length of the coordinates of x stays polynomially bounded throughout the algorithm. In turn, if we use the procedure RealizeGraph, this problem does not occur. Indeed, we maintain the property that x is an output of a strongly polynomial time algorithm on a polynomially bounded input.

4 Complexity analysis

Let x_0, x_1, x_2, ... be the sequence of points from OptPol that arise in Algorithm 1. The argument consists of two parts:

• first, we show that the sequence of graphs G_{x_0}, G_{x_1}, G_{x_2}, ... can be obtained in an abstract process that we call discounted normal play games iteration (DNP games iteration for short), see Subsection 4.2;

• second, we show that any sequence of n-node graphs that can be obtained by DNP games iteration has length O(n(2 + √2)^n), see Subsection 4.3.

This will establish Theorem 1. To show Theorem 2, note that if G is bipartite, then so are G_{x_0}, G_{x_1}, G_{x_2}, and so on. Thus, it is enough to demonstrate that:

• any sequence of bipartite n-node graphs that can be obtained by DNP games iteration has length O(2^n), see Subsection 4.4.

First of all, we have to give the definition of DNP games iteration (Subsection 4.1).

4.1 DNP games iteration

Consider a directed graph H = (V, E_H) and let δ_H be the solution to (4–7) for H. We say that an edge (a, b) ∈ E_H is optimal for H if δ_H(a) = λδ_H(b). Next, we say that a pair (a, b) ∈ V × V is improving for H if one of the following two conditions holds:

• a ∈ V_Max and δ_H(a) < λδ_H(b);

• a ∈ V_Min and δ_H(a) > λδ_H(b).

Note that an improving pair of nodes cannot be an edge of H because of (6–7).

Consider another directed graph K = (V, E_K) over the same set of nodes as H. We say that K can be obtained from H in one step of DNP games iteration if E_K contains all edges of H that are optimal for H and also at least one pair of nodes which is improving for H.
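The optimal-edge and improving-pair conditions translate directly into code. A small sketch with a hypothetical graph encoding; delta is assumed to be the solution to (4)–(7) for H, computed separately, and exact arithmetic is assumed:

```python
from fractions import Fraction

def optimal_and_improving(V_max, V_min, edges, delta, lam):
    """Return the edges of H that are optimal for H, and the pairs of
    nodes that are improving for H, given delta solving (4)-(7) for H.

    Improving pairs range over all of V x V (including pairs that are
    not edges of H); by (6)-(7) no edge of H can be improving.
    """
    V = set(V_max) | set(V_min)
    optimal = {(a, b) for (a, b) in edges if delta[a] == lam * delta[b]}
    improving = {(a, b) for a in V for b in V
                 if (a in V_max and delta[a] < lam * delta[b])
                 or (a in V_min and delta[a] > lam * delta[b])}
    return optimal, improving
```

For instance, on the graph with a single edge from a Max node a to a Min sink b (so δ(b) = 1, δ(a) = λ), the edge (a, b) is optimal, and the improving pairs are exactly those starting at b, whose value a Min player could only worsen by getting a move.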
I.e., we can erase some non-optimal edges of H, and then we can add some edges that are not in H; in particular, we must add at least one improving pair.

Finally, we say that a sequence of graphs H_0, H_1, ..., H_j can be obtained by DNP games iteration if for all i ∈ {0, 1, ..., j−1} the graph H_{i+1} can be obtained from H_i in one step of DNP games iteration.

4.2 Why the sequence G_{x_0}, G_{x_1}, G_{x_2}, ... can be obtained by DNP games iteration

Let x and x′ = RealizeGraph(E_{x + ε_max δ_x}) be two consecutive points of OptPol in the algorithm. We have to show that the graph G_{x′} can be obtained from G_x in one step of DNP games iteration. By definition of the procedure RealizeGraph, the graph G_{x′} contains all edges of the graph G_y, where y = x + ε_max δ_x. Hence it is enough to show the following:

(a) all the edges of the graph G_x that are optimal for G_x are also in the graph G_y;

(b) there is an edge of the graph G_y which is an improving pair for the graph G_x.

Proof of (a).
Take any edge (a, b) of the graph G_x which is optimal for G_x. The left-hand side of (8–9) for the edge (a, b) at the point y = x + ε_max δ_x looks as follows:

w(e) + λx_b − x_a + ε_max · (λδ_x(b) − δ_x(a)).  (12)

The last term of (12) is 0, as (a, b) is an optimal edge of G_x. Since (a, b) is tight for x, it is also tight for y, i.e., it also belongs to G_y.

Proof of (b).
In fact, any edge of the graph G_y which is not in the graph G_x is an improving pair for G_x. Assume (a, b) ∈ E is an edge of G_y but not of G_x. Hence (a, b) is tight for y but not for x. I.e., (12) is 0 for (a, b), but

• w(e) + λx_b − x_a < 0 if a ∈ V_Max;

• w(e) + λx_b − x_a > 0 if a ∈ V_Min.

This means that λδ_x(b) − δ_x(a) > 0 if a ∈ V_Max and λδ_x(b) − δ_x(a) < 0 if a ∈ V_Min. Therefore (a, b) is an improving pair for G_x.

It only remains to note that there exists an edge of G_y which is not an edge of G_x. Indeed, otherwise all inequalities that are tight for y = x + ε_max δ_x would already be tight for x. Then ε_max could be increased, a contradiction.

4.3 The O(n(2 + √2)^n) bound on the length of DNP games iteration

The argument has the following structure.

• Step 1.
For every directed graph H = (V, E) we define two vectors f_H, g_H ∈ ℕ^{2n−1}.

• Step 2.
We define a linear ordering of vectors from ℕ^{2n−1} called the alternating lexicographic ordering.

• Step 3.
We show that in each step of DNP games iteration (a) neither f_H nor g_H decreases and (b) either f_H or g_H increases (in the alternating lexicographic ordering).

• Step 4.
We bound the number of values f_H and g_H can take. By Step 3 this bound (multiplied by 2) is also a bound on the length of DNP games iteration.

Step 1.
The first coordinate of the vector f_H equals the number of nodes with δ_H(a) = 1 (all such nodes are from V_Min). The other 2n − 2 coordinates are split into n − 1 pairs: in the i-th pair we first have the number of nodes from V_Max with δ_H(a) = λ^i, and then the number of nodes from V_Min with δ_H(a) = λ^i.

The vector g_H is defined similarly, with the roles of Max and Min and of + and − reversed. The first coordinate of g_H equals the number of nodes with δ_H(a) = −1 (all such nodes are from V_Max). The other 2n − 2 coordinates are split into n − 1 pairs: in the i-th pair we first have the number of nodes from V_Min with δ_H(a) = −λ^i, and then the number of nodes from V_Max with δ_H(a) = −λ^i.

Step 2.
The alternating lexicographic ordering is the lexicographic order obtained from the standard ordering of the integers in the even coordinates and from the reverse of the standard ordering of the integers in the odd coordinates (coordinates are numbered starting from 1). For example,

(3, 1, 1) < (2, 1, 1),   (2, 2, 1) > (2, 1, 1)

in the alternating lexicographic order.

Step 3.
This step relies on the following
Lemma 5.
Assume that a graph K can be obtained from a graph H in one step of DNP games iteration. Then

(a) if for some i ∈ {0, 1, ..., n − 1} it holds that {a ∈ V | δ_H(a) = λ^i} ≠ {a ∈ V | δ_K(a) = λ^i}, then f_K is greater than f_H in the alternating lexicographic order;

(b) if for some i ∈ {0, 1, ..., n − 1} it holds that {a ∈ V | δ_H(a) = −λ^i} ≠ {a ∈ V | δ_K(a) = −λ^i}, then g_K is greater than g_H in the alternating lexicographic order.

Assume Lemma 5 is proved.

• Why can neither f_H nor g_H decrease? If f_K does not exceed f_H in the alternating lexicographic order, then {a ∈ V | δ_H(a) = λ^i} = {a ∈ V | δ_K(a) = λ^i} for every i ∈ {0, 1, ..., n − 1} by Lemma 5. On the other hand, f_H and f_K are determined by these sets, so f_H = f_K. A similar argument works for g_H and g_K as well.

• Why does either f_H or g_H increase? Assume that neither f_K is greater than f_H nor g_K is greater than g_H in the alternating lexicographic order. By Lemma 5 we have for every i ∈ {0, 1, ..., n − 1} that {a ∈ V | δ_H(a) = λ^i} = {a ∈ V | δ_K(a) = λ^i} and {a ∈ V | δ_H(a) = −λ^i} = {a ∈ V | δ_K(a) = −λ^i}. This means that the functions δ_H and δ_K coincide. On the other hand, the graph K contains as an edge a pair of nodes which is improving for H. Since δ_K = δ_H, this pair is also improving for K. Hence this pair cannot be an edge of the graph K, a contradiction.

We now proceed to a proof of Lemma 5. Let us stress that in the proof we do not use the fact that K contains an improving pair for H. We only use the fact that K contains all optimal edges of H.

Proof of Lemma 5.
We only prove (a); the proof of (b) is similar. Let j be the smallest element of {0, 1, ..., n − 1} for which {a ∈ V | δ_H(a) = λ^j} ≠ {a ∈ V | δ_K(a) = λ^j}.

First consider the case j = 0. We claim that in this case the first coordinate of f_K is smaller than the first coordinate of f_H. Indeed, the first coordinate of f_H is the number of sinks from V_Min in the graph H, and the first coordinate of f_K is the number of sinks from V_Min in the graph K. On the other hand, all sinks of K are also sinks of H. Indeed, nodes that are not sinks of H have in H an out-going optimal edge, and all these edges are also in K. Hence the first coordinate of f_K is at most the first coordinate of f_H. Equality is not possible: since every sink of K is a sink of H, equal counts would give {a ∈ V | δ_H(a) = 1} = {a ∈ V | δ_K(a) = 1}, contradicting the fact that j = 0. As the first coordinate carries the reverse ordering, f_K is greater than f_H.

Now assume that j >
0. Then the sets {v ∈ V | δ_H(v) = λ^j} and {v ∈ V | δ_K(v) = λ^j} are distinct. Hence there are two cases.

• First case: {v ∈ V_Max | δ_H(v) = λ^j} ≠ {v ∈ V_Max | δ_K(v) = λ^j}.

• Second case: {v ∈ V_Max | δ_H(v) = λ^j} = {v ∈ V_Max | δ_K(v) = λ^j} and {v ∈ V_Min | δ_H(v) = λ^j} ≠ {v ∈ V_Min | δ_K(v) = λ^j}.

In both cases the first 1 + 2(j −
1) coordinates of f_H and f_K coincide, because {v ∈ V | δ_H(v) = λ^i} = {v ∈ V | δ_K(v) = λ^i} for all i < j. Recall that coordinate 2j of f holds the number of nodes from V_Max with δ-value λ^j, and coordinate 2j + 1 the number of nodes from V_Min with δ-value λ^j. Moreover, in the second case we also have f^H_{2j} = f^K_{2j}. We claim that in the first case f^H_{2j} < f^K_{2j}, and in the second case f^H_{2j+1} > f^K_{2j+1}. The rest is devoted to a proof of this claim, as it clearly implies that f_K exceeds f_H in the alternating lexicographic order (coordinate 2j is even and carries the standard ordering, while coordinate 2j + 1 is odd and carries the reverse one).

Proving f^H_{2j} < f^K_{2j} in the first case. Since the sets {v ∈ V_Max | δ_H(v) = λ^j} and {v ∈ V_Max | δ_K(v) = λ^j} are distinct, it is enough to show that {v ∈ V_Max | δ_H(v) = λ^j} ⊆ {v ∈ V_Max | δ_K(v) = λ^j}. For that we take any a ∈ V_Max with δ_H(a) = λ^j and show that δ_K(a) = λ^j. By (6–7) there is an edge (a, b) of the graph H with δ_H(b) = λ^{j−1}. We also have that δ_K(b) = λ^{j−1}, because {v ∈ V | δ_H(v) = λ^{j−1}} = {v ∈ V | δ_K(v) = λ^{j−1}}. On the other hand, since δ_H(a) = λδ_H(b), the edge (a, b) is optimal for H, hence this edge is also in the graph K. So in the graph K there is an edge from a ∈ V_Max to a node b with δ_K(b) = λ^{j−1}. Hence by (6) we have δ_K(a) ≥ λ^j. It remains to show why it is impossible that δ_K(a) > λ^j. Indeed, then a ∈ {v ∈ V | δ_K(v) = λ^i} for some i < j. On the other hand, the node a is not in the set {v ∈ V | δ_H(v) = λ^i}. Hence the sets {v ∈ V | δ_H(v) = λ^i} and {v ∈ V | δ_K(v) = λ^i} are distinct, a contradiction with the minimality of j.

Proving f^H_{2j+1} > f^K_{2j+1} in the second case. Since the sets {v ∈ V_Min | δ_H(v) = λ^j} and {v ∈ V_Min | δ_K(v) = λ^j} are distinct, it is enough to show that {v ∈ V_Min | δ_K(v) = λ^j} ⊆ {v ∈ V_Min | δ_H(v) = λ^j}. For that we take any a ∈ V_Min with δ_K(a) = λ^j and show that δ_H(a) = λ^j.
It is clear that δ_H(a) ≤ λ^j, because otherwise for some i < j we would have that the sets {v ∈ V | δ_H(v) = λ^i} and {v ∈ V | δ_K(v) = λ^i} are distinct (a would belong to the first set and not to the second one). This would give us a contradiction with the minimality of j. Thus, it remains to show that δ_H(a) > λ^{j+1}. Assume that this is not the case, i.e., δ_H(a) ≤ λ^{j+1}. Since a ∈ V_Min, the node a is not a sink of H (that would mean that δ_H(a) = 1 > λ^{j+1}). Hence by (7) there exists an edge (a, b) in the graph H with δ_H(b) = δ_H(a)/λ ≤ λ^j. Then we also have that δ_K(b) ≤ λ^j, because by minimality of j we have {v ∈ V | δ_H(v) > λ^j} = {v ∈ V | δ_K(v) > λ^j} and hence {v ∈ V | δ_H(v) ≤ λ^j} = {v ∈ V | δ_K(v) ≤ λ^j}. But the edge (a, b) is optimal for H, so the edge (a, b) is also in the graph K. This means that in the graph K there is an edge from a to a node b with δ_K(b) ≤ λ^j. Hence by (7) we have δ_K(a) ≤ λ^{j+1}, a contradiction.

Step 4.
Notice that f_H and g_H belong to the set of all vectors v = (v_1, ..., v_{2n−1}) ∈ ℕ^{2n−1} satisfying:

‖v‖_1 ≤ n,  (13)

v_1 = 0 ⟹ v_2 = v_3 = ... = v_{2n−1} = 0,  (14)

v_{2i} = v_{2i+1} = 0 ⟹ v_{2i+2} = v_{2i+3} = ... = v_{2n−1} = 0 for every i ∈ {1, ..., n − 2}.  (15)

To see (13) note that in our case the ℓ_1-norm is just the sum of the coordinates. By construction, the sum of the coordinates of f_H is the number of nodes with δ_H(a) > 0, and the sum of the coordinates of g_H is the number of nodes with δ_H(a) <
0. The fact that f_H satisfies (14–15) can be seen from the following observation: if {a ∈ V | δ_H(a) = λ^i} = ∅, then we also have {a ∈ V | δ_H(a) = λ^j} = ∅ for every j > i, j ∈ {0, 1, ..., n − 1}. Indeed, by (6–7) a node a with δ_H(a) = λ^j has an edge leading to a node b with δ_H(b) = λ^{j−1}. By continuing in this way we would reach a node with δ-value λ^i, a contradiction.

Thus, the desired upper bound on the length of DNP games iteration follows from the following technical lemma.

Lemma 6.
The number of vectors v ∈ ℕ^{2n−1} satisfying (13–15) is O(n(2 + √2)^n).

Proof. Let A be the set of v ∈ ℕ^{2n−1} satisfying (13–15). For v ∈ A let t(v) be the largest t ∈ {1, 2, ..., n − 1} such that v_{2t} + v_{2t+1} >
0. If there is no such t at all (i.e., if v_2 = v_3 = ... = v_{2n−1} = 0), then define t(v) = 0. Let A_t = {v ∈ A | t(v) = t}. We claim that |A_t| ≤ (2 + √2)^n for any t. As t(v) can take only O(n) values, the lemma follows.

The size of A_0 is at most n + 1, so we may assume that t >
0. Take any ρ ∈ (0, 1). We have

ρ^n |A_t| ≤ Σ_{v ∈ A_t} ρ^{‖v‖_1} = Σ_{v ∈ A_t} ρ^{v_1} · ρ^{v_2 + v_3} · ... · ρ^{v_{2t} + v_{2t+1}}
  ≤ (Σ_{a=1}^∞ ρ^a) · (Σ_{(b,c) ∈ ℕ² \ {(0,0)}} ρ^{b+c})^t.

Indeed, the first inequality here holds because ‖v‖_1 ≤ n by (13) for v ∈ A. The second inequality holds because for v ∈ A with t(v) = t we have v_1 > 0 by (14) and v_{2i} + v_{2i+1} > 0 for every i ∈ {1, 2, ..., t} by (15).

Next, notice that for ρ = 1 − 1/√2 we have:

(Σ_{a=1}^∞ ρ^a) · (Σ_{(b,c) ∈ ℕ² \ {(0,0)}} ρ^{b+c})^t ≤
1. Indeed,

Σ_{a=1}^∞ ρ^a = ρ/(1 − ρ) = √2 − 1 < 1,   Σ_{(b,c) ∈ ℕ² \ {(0,0)}} ρ^{b+c} = 1/(1 − ρ)² − 1 = 1.

Thus, we get ρ^n |A_t| ≤
1. I.e., |A_t| ≤ (1/ρ)^n = (2 + √2)^n, as required.

In fact, as shown in Appendix A, Lemma 6 is tight up to a polynomial factor.

4.4 The O(2^n) bound on the length of DNP games iteration for bipartite graphs

The proof differs only in the last step, where for bipartite graphs we obtain a better bound. In more detail, if H is bipartite, then f_H and g_H in addition to (13–15) satisfy the following property:

v_{2i} = 0 for even i ∈ {1, ..., n − 1},   v_{2i+1} = 0 for odd i ∈ {1, ..., n − 1}.  (16)

Indeed, for f_H the condition (16) looks as follows:

f^H_{2i} = |{v ∈ V_Max | δ_H(v) = λ^i}| = 0 for even i,   f^H_{2i+1} = |{v ∈ V_Min | δ_H(v) = λ^i}| = 0 for odd i.

This holds because from a node a with δ_H(a) = λ^i there is a path of length i to a node s with δ_H(s) = 1. If δ_H(s) = 1, then s ∈ V_Min. Since H is bipartite, this means that a ∈ V_Min for even i and a ∈ V_Max for odd i. The argument for g_H is the same.

So it is enough to show that the number of v ∈ ℕ^{2n−1} satisfying (13–16) is O(2^n). Let t(v) be defined in the same way as in the proof of Lemma 6. I.e., t(v) is the largest t ∈ {1, ..., n − 1} for which v_{2t} + v_{2t+1} > 0 (if there is no such t, we set t(v) = 0). Let us bound the number of v ∈ ℕ^{2n−1} satisfying (13–16) and ‖v‖_1 = s, t(v) = t. For t = 0 the number of such v is exactly 1. Assume now that t >
0. Then

v_1 > 0, by (14);
v_{2i} > 0 and v_{2i+1} = 0 for odd i ∈ {1, ..., t}, by (15) and (16);
v_{2i} = 0 and v_{2i+1} > 0 for even i ∈ {1, ..., t}, by (15) and (16);
v_j = 0 for j > 2t + 1, by definition of t(v).

Hence the number of v ∈ ℕ^{2n−1} satisfying (13–16) and ‖v‖_1 = s, t(v) = t is equal to the number of solutions to the following system:

x_1 + x_2 + ... + x_{t+1} = s,   x_1, x_2, ..., x_{t+1} ∈ ℕ \ {0}.

This number is the binomial coefficient C(s − 1, t). By summing over all s ≤ n and t we get the required O(2^n) bound.

5 An n^{O(1)} · 2^{n/2}-time algorithm for energy games

In this section we give an algorithm establishing Theorem 3. We consider an energy game on a graph G = (V, E) with a weight function w: E → ℝ and with a partition of V between the players given by the sets V_Max and V_Min. We assume that G has n nodes and m edges.

First, we notice that without loss of generality we may assume that G is bipartite.

Lemma 7.
An energy game on n nodes can be reduced in strongly polynomial time to a bipartite energy game on at most n nodes.

This fact seems to be overlooked in the literature. Here is a brief sketch of it. Suppose that the pebble is at a ∈ V_Max. After controlling the pebble for some time, Max might decide to enter a node b of Min. Of course, it makes sense to do so via a path of the largest weight (among all paths from a to b with intermediate nodes controlled by Max). We can simply replace this path by a single edge from a to b of the same weight. A similar thing can be done for Min, but now the weight of a path should be minimized. By performing this for all pairs of nodes controlled by different players we obtain an equivalent bipartite game. A full proof is given in Appendix B.

To simplify the exposition we first present our algorithm for the case when the following assumption is satisfied.

Assumption 1.
In the graph G there are no zero cycles.

Discussion of the general case is postponed to the end of this section. The exposition of the algorithm follows the same scheme as for discounted games. First we define a polyhedron that we will work with; now we call it the polyhedron of potentials. In the algorithm we iterate over the points of this polyhedron via valid shifts. To produce valid shifts we again use discounted normal play games. We should also modify the terminating condition: given a point of the polyhedron of potentials satisfying our new terminating condition, one should be able to find all the nodes where Max wins in the energy game. We also describe an analog of the procedure
RealizeGraph (again used to control the bit-length of points that arise in the algorithm). All this is collected together in Algorithm 2. Here are the details.

The polyhedron of potentials is defined as follows:

w(e) + x_b − x_a ≤ 0 for every e = (a, b) ∈ E with a ∈ V_Max,  (17)
w(e) + x_b − x_a ≥ 0 for every e = (a, b) ∈ E with a ∈ V_Min.  (18)

Here x is an n-dimensional real vector with coordinates indexed by the nodes of the graph. This polyhedron is denoted by PolPoten. By setting

x_a = W for a ∈ V_Max,   x_a = 0 for a ∈ V_Min,

for W = max_{e ∈ E} |w(e)|, we obtain that PolPoten is not empty (here it is important that our energy game is bipartite).

We use notions similar to those we gave for the optimality polyhedron. Namely, we call an edge e = (a, b) ∈ E tight for x ∈ PolPoten if w(e) + x_b − x_a = 0. The set of all e ∈ E that are tight for x ∈ PolPoten is denoted by E_x. By G_x we mean the graph (V, E_x). A very important consequence of Assumption 1 is that for every x ∈ PolPoten the graph G_x is a directed acyclic graph. Indeed, a cycle consisting of edges that are tight for x would be a zero cycle, contradicting Assumption 1.

Next, we call a vector δ ∈ ℝ^n a valid shift for x ∈ PolPoten if x + εδ ∈ PolPoten for all small enough ε > 0. Again, discounted normal play games on G_x can be used to produce a valid shift for x. Now the discount factor in a discounted normal play game is irrelevant; we can pick an arbitrary one, say, λ = 1/
2. As before, for x ∈ PolPoten we let δ_x be the solution to (4–7) for the graph G_x. Since the graph G_x is acyclic, we have δ_x(a) ≠ 0 for every a ∈ V. Define V^+_x = {a ∈ V | δ_x(a) > 0} and V^−_x = {a ∈ V | δ_x(a) < 0}.

Lemma 8.
Assume that x ∈ PolPoten and let χ^+_x be the characteristic vector of the set V^+_x. Then χ^+_x is a valid shift for x.

Proof. Assume that (a, b) ∈ E_x. It is enough to show that χ^+_x(b) − χ^+_x(a) ≤ 0 if a ∈ V_Max and χ^+_x(b) − χ^+_x(a) ≥ 0 if a ∈ V_Min.

First, assume that a ∈ V_Max and χ^+_x(b) − χ^+_x(a) >
0. Then χ^+_x(a) = 0 and χ^+_x(b) = 1, i.e., δ_x(b) > 0 and δ_x(a) <
0. But this contradicts (6). Similarly, assume a ∈ V_Min and χ^+_x(b) − χ^+_x(a) <
0. Then χ^+_x(a) = 1 and χ^+_x(b) = 0, i.e., δ_x(b) < 0 and δ_x(a) >
0. This contradicts (7).

The following lemma specifies and justifies our new terminating condition.
Lemma 9.
Let x ∈ PolPoten and assume that in the graph G there are no edges from V^+_x ∩ V_Min to V^−_x and no edges from V^−_x ∩ V_Max to V^+_x. Then V^+_x is the set of nodes where Max wins in the energy game and V^−_x is the set of nodes where Min wins in the energy game.

Proof. Consider a positional strategy σ of Max defined as follows. For all a ∈ V^+_x ∩ V_Max the strategy σ goes from a by an edge (a, b) ∈ E_x with b ∈ V^+_x. There is always such an edge because of (6) and because there are no sinks from V_Max in V^+_x. In the nodes from V^−_x ∩ V_Max define the strategy σ arbitrarily.

Let us also define the following positional strategy τ of Min. For all a ∈ V^−_x ∩ V_Min the strategy τ goes from a by an edge (a, b) ∈ E_x with b ∈ V^−_x. Again, such an edge exists by (7) and since there are no sinks from V_Min in V^−_x. In the nodes from V^+_x ∩ V_Min the strategy τ is defined arbitrarily.

First, let us verify that from every a ∈ V^+_x one can reach only non-negative cycles in the graph G_σ. This would mean that Max wins in the energy game from any node of V^+_x. First, in G_σ it is impossible to reach V^−_x from a. Indeed, σ does not leave V^+_x, and by the assumptions of the lemma there are no edges from V^+_x ∩ V_Min to V^−_x. Hence it is enough to show that in the graph G_σ every cycle consisting of nodes from V^+_x is non-negative. Note that we can compute the weight of a cycle by summing up w(e) + x_b − x_a over all edges e = (a, b) belonging to the cycle (the terms x_a cancel out). In turn, for edges of G_σ lying inside V^+_x all the expressions w(e) + x_b − x_a are non-negative. Indeed, for every e that starts in V_Min the expression w(e) + x_b − x_a is non-negative by (18). In turn, the strategy σ uses edges of the graph G_x, i.e., edges that are tight for x. For these edges we have w(e) + x_b − x_a = 0.

Similarly one can show that from every a ∈ V^−_x one can reach only non-positive cycles in the graph G_τ.
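As an illustration of the terminating condition of Lemma 9, here is a hedged Python sketch (the representation is our own, not from the paper; it assumes the game is bipartite, that x lies in PolPoten, and that Assumption 1 holds, so the tight graph G_x is acyclic). Since only the sign of δ_x matters for the condition, it propagates signs over G_x instead of exact values:

```python
def winners_from_potentials(edges, v_max, x):
    """Check the terminating condition of Lemma 9 at a point x of
    PolPoten.  edges: dict (a, b) -> w(e); v_max: Max's nodes;
    x: dict of potentials.  Returns (W_Max, W_Min) when the condition
    holds, and None otherwise."""
    nodes = {a for a, b in edges} | {b for a, b in edges}
    tight = {(a, b) for (a, b), w in edges.items() if w + x[b] - x[a] == 0}
    succ = {a: [b for (c, b) in tight if c == a] for a in nodes}
    # sign of delta_x on the acyclic tight graph G_x (the discount factor
    # is irrelevant for the sign): Min sinks are +, Max sinks are -,
    # Max takes the max over successors, Min the min.
    memo = {}
    def sign(a):
        if a not in memo:
            if not succ[a]:
                memo[a] = -1 if a in v_max else 1
            elif a in v_max:
                memo[a] = max(sign(b) for b in succ[a])
            else:
                memo[a] = min(sign(b) for b in succ[a])
        return memo[a]
    plus = {a for a in nodes if sign(a) > 0}
    minus = nodes - plus
    for (a, b) in edges:
        if a in plus and a not in v_max and b in minus:
            return None  # an edge from V+ n V_Min into V-
        if a in minus and a in v_max and b in plus:
            return None  # an edge from V- n V_Max into V+
    return plus, minus
```

For instance, on the two-node game with edges (1, 2) of weight 2 and (2, 1) of weight −1 (node 1 belongs to Max), the cycle is positive, and with potentials x_1 = 2, x_2 = 0 the sketch reports that Max wins everywhere.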
In fact, by Assumption 1 there are no zero cycles, so in every node from V^−_x the winner of the energy game is Min.

We define the procedure RealizeGraph(S) similarly. The input to RealizeGraph(S) is a subset S ⊆ E and the output is a point x ∈ PolPoten satisfying S ⊆ E_x, provided such a point exists. Again, RealizeGraph(S) can be computed in strongly polynomial time. Let us remark that now there is no need to refer to Megiddo’s algorithm [20]. Indeed, notice that all inequalities in (17–18) are of the form x ≥ y + c, where x and y are variables and c is a constant. It is clear that all inequalities appearing in Fourier–Motzkin elimination for (17–18) will still have this form. Hence we can keep the number of inequalities O(n²) throughout Fourier–Motzkin elimination, by removing redundant inequalities.

Now we are ready to give an algorithm establishing Theorem 3. Our goal is to find the sets

W_Max = {a ∈ V | Max wins from a in the energy game},
W_Min = {a ∈ V | Min wins from a in the energy game}.

Algorithm 2: n^{O(1)} · 2^{n/2}-time algorithm for energy games

Result:
The sets W_Max, W_Min.
initialization: x = RealizeGraph(∅);
while there is an edge of G from V^+_x ∩ V_Min to V^−_x or from V^−_x ∩ V_Max to V^+_x do
    ε_max ← the largest ε ∈ (0, +∞) s.t. x + εχ^+_x ∈ PolPoten;
    x ← RealizeGraph(E_{x + ε_max χ^+_x});
end
output W_Max = V^+_x, W_Min = V^−_x;

The correctness of the output of our algorithm follows from Lemma 9. To compute V^+_x, V^−_x and χ^+_x we find δ_x in strongly polynomial time by Lemma 4. In turn, we compute ε_max in the same way as in Algorithm 1. To demonstrate the correctness of the algorithm it only remains to show that ε_max < +∞ throughout the algorithm. Indeed, when the terminating condition is not yet satisfied, there exists an edge e = (a, b) of the graph G such that either a ∈ V^+_x ∩ V_Min, b ∈ V^−_x or a ∈ V^−_x ∩ V_Max, b ∈ V^+_x. Let us consider the first case; the second one is similar. Note that (a, b) is not tight for x, because otherwise (a, b) would belong to the graph G_x. This would contradict (7), because we cannot have an edge from a Min node with a positive value of δ_x to a node with a negative value of δ_x. So we have

w(e) + x_b − x_a > 0.

In turn, if we consider the left-hand side of the same inequality at x + εχ^+_x, we obtain the following:

w(e) + x_b − x_a + ε(χ^+_x(b) − χ^+_x(a)) = w(e) + x_b − x_a − ε.

Indeed, χ^+_x(a) = 1 and χ^+_x(b) = 0. This means that for some positive ε the inequality corresponding to (a, b) in (17–18) is tight for x + εχ^+_x. The same inequality, as we established, is not tight for x. Hence it is impossible to move along χ^+_x forever, i.e., ε_max < +∞.

It remains to discuss the general case, when zero cycles are allowed. Assume that we add a small ρ > 0 to all the weights. Then all non-negative cycles of G become strictly positive. On the other hand, if ρ is small enough, then all negative cycles stay negative. Thus, for all small enough ρ > 0 the winner at every node does not change, while Assumption 1 becomes satisfied. For instance, if the weights are rationals of bit-length at most k, then we can set ρ = 2^{−k}/(n + 1). An interesting question is whether a suitable ρ > 0 can be found in strongly polynomial time. We do not know the answer.
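The computation of ε_max in the loop above reduces to one-variable linear constraints with unit coefficients: only inequalities of (17–18) whose slack shrinks along χ^+_x constrain ε, and each such constraint is solved by a single subtraction. A minimal Python sketch under our own representation (dicts for weights, potentials, and the characteristic vector; not code from the paper):

```python
def eps_max(edges, v_max, x, chi):
    """Largest eps with x + eps*chi in PolPoten (17-18); returns
    float('inf') when unbounded.  edges: dict (a, b) -> w(e);
    v_max: Max's nodes; x: potentials; chi: dict node -> 0 or 1
    (the characteristic vector of V+_x)."""
    bound = float('inf')
    for (a, b), w in edges.items():
        if a in v_max:
            # (17): w + x_b - x_a <= 0, slack = -(w + x_b - x_a)
            slack = -(w + x[b] - x[a])
            drift = chi[b] - chi[a]   # how fast the slack is consumed
        else:
            # (18): w + x_b - x_a >= 0, slack = w + x_b - x_a
            slack = w + x[b] - x[a]
            drift = chi[a] - chi[b]
        # the slack at x + eps*chi is slack - eps*drift
        if drift > 0:
            bound = min(bound, slack / drift)  # drift is 1 here, as chi is 0/1
    return bound
```

When the terminating condition of Lemma 9 already holds, no slack shrinks and the sketch returns infinity, matching the argument above that ε_max is finite exactly while the while-loop keeps running.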
Instead, we propose another approach that solves energy games in strongly n^{O(1)} · 2^{n/2} time in the general case. Our idea is to add ρ to all the weights of edges not as a real number but as a formal variable. I.e., we will consider the weights as formal linear combinations of the form a + b · ρ, where a, b ∈ ℝ. First, we will perform additions over such combinations. More specifically, the sum of a + b · ρ and c + d · ρ will be (a + c) + (b + d) · ρ. We will also perform comparisons of these linear combinations. We say that a + b · ρ < c + d · ρ if either a < c, or a = c and b < d. Note that the inequality a + b · ρ < c + d · ρ holds for formal linear combinations a + b · ρ and c + d · ρ if and only if it holds over the reals for all small enough γ > 0 when one substitutes γ instead of ρ.

Thus, more formally, we consider the weights as elements of the additive group ℝ² equipped with the lexicographic order. Now, given our initial “real” energy game, we consider another one where the weight of an edge e ∈ E is the formal linear combination w(e) + ρ. After that, Assumption 1 is satisfied (again, if one understands the weight of a cycle as an element of the group ℝ²): a cycle of length ℓ has weight of the form a + ℓ · ρ with ℓ > 0, which is non-zero.

We then run Algorithm 2, but now with the coordinates of the vector x being elements of the group ℝ². Note that in Algorithm 2 we perform only additions and comparisons with the weights of edges and with the coordinates of x. Indeed, in computing ε_max we solve at most m one-variable linear equations with the coefficient before the variable being 1. In computing the RealizeGraph procedure we perform Fourier–Motzkin elimination for inequalities of the form x ≥ y + c. Clearly, this also requires only additions and comparisons. So throughout the algorithm we never have to multiply or divide our formal linear combinations.

To argue that the version of Algorithm 2 with formal linear combinations is correct we use a sort of compactness argument. Fix some N and “freeze” the algorithm after N steps.
Up to now only finitely many comparisons of linear combinations over ρ have been performed. For all small enough real γ > 0, all these comparisons give the same results when one substitutes γ instead of ρ. So after N steps the “formal” version of Algorithm 2 will be in the same state as the “real” one, i.e., the one where in advance we add a small enough real number γ to all the weights. In turn, for all small enough γ the “real” version terminates in N = n^{O(1)} · 2^{n/2} steps (see the next section) with the correct output for our initial energy game. It is important to note that the bound N on the number of steps of the “real” algorithm is independent of γ. Hence the “formal” version also terminates in at most N = n^{O(1)} · 2^{n/2} steps with the correct output.

6 Complexity analysis of Algorithm 2

The complexity analysis of Algorithm 2 follows the same scheme as for discounted games. First, we define strong
DNP games iteration (a more restrictive version of DNP games iteration, see Subsection 6.1). Then we consider the sequence x_0, x_1, x_2, ... of points from PolPoten that arise in Algorithm 2. We show that the corresponding sequence of graphs G_{x_0}, G_{x_1}, G_{x_2}, ... can be obtained in strong DNP games iteration (Subsection 6.2). Finally, we show that the length of a strong DNP games iteration is bounded by O(2^{n/2}) (Subsection 6.3).

6.1 Strong DNP games iteration

In strong DNP games iteration all graphs are assumed to be bipartite and acyclic. Consider a directed bipartite acyclic graph H = (V, E). We say that a pair of nodes (a, b) ∈ V × V is strongly improving for H if either a ∈ V_Min, δ_H(a) > 0, δ_H(b) < 0, or a ∈ V_Max, δ_H(a) < 0, δ_H(b) >
0. Here, as before, δ_H is the solution to (4–7) for H (and for λ = 1/2); since H is acyclic, δ_H(a) ≠ 0 for all a ∈ V.

Consider another directed bipartite acyclic graph K = (V, E′) over the same set of nodes as H. We say that K can be obtained from H in one step of strong DNP games iteration if E′ contains all edges of H that are optimal for H and also at least one pair of nodes which is strongly improving for H. Finally, we say that a sequence of graphs H_0, H_1, ..., H_j can be obtained in strong DNP games iteration if for all i ∈ {0, 1, ..., j − 1} the graph H_{i+1} can be obtained from H_i in one step of strong DNP games iteration.

(Let us remark that multiplications and divisions would not be a disaster for the argument of the previous section, as we could consider formal rational fractions over ρ. However, we find it instructive to note that we never go beyond the group ℝ².)

6.2 Why the sequence G_{x_0}, G_{x_1}, G_{x_2}, ... can be obtained in strong DNP games iteration

Consider any two consecutive points x and x′ = RealizeGraph(E_{x + ε_max χ^+_x}) of PolPoten from Algorithm 2. We shall show that the graph G_{x′} can be obtained from G_x in one step of strong DNP games iteration. First, note that both of these graphs are bipartite (because the underlying energy game is bipartite) and acyclic (because of Assumption 1). Set y = x + ε_max χ^+_x. As the graph G_{x′} contains all edges of the graph G_y, it is enough to show the following:

(a) all the edges of the graph G_x that are optimal for G_x are also in the graph G_y;

(b) there is an edge of the graph G_y which is a strongly improving pair for the graph G_x.

Proof of (a).
Take any edge ( a, b ) of the graph G x which is optimal for G x . Clearly,the values of δ x ( a ) and δ x ( b ) are either both positive or both negative. Hence the shift χ + x increases both x a and x b by the same amount. This means that ( a, b ) is still tightfor y , i.e., ( a, b ) is an edge of G y . Proof of (b).
First, there exists an edge e = (a, b) ∈ E which belongs to the graph G_y and not to G_x. Indeed, otherwise all edges that are tight for y would already be tight for x, and hence ε_max could be increased. It is enough to show now that any edge (a, b) ∈ E_y \ E_x is strongly improving for G_x. Since (a, b) is not tight for x, we have:

• w(e) + x_b − x_a < 0 if a ∈ V_Max;

• w(e) + x_b − x_a > 0 if a ∈ V_Min.

On the other hand, since (a, b) is tight for y, we have:

w(e) + x_b − x_a + ε_max(χ^+_x(b) − χ^+_x(a)) = 0.

Hence χ^+_x(b) − χ^+_x(a) > 0 if a ∈ V_Max and χ^+_x(b) − χ^+_x(a) < 0 if a ∈ V_Min. Consider the case a ∈ V_Max; the case a ∈ V_Min is similar. Note that χ^+_x(b) − χ^+_x(a) > 0 means that χ^+_x(b) = 1 and χ^+_x(a) = 0. I.e., δ_x(a) < 0 and δ_x(b) >
0. Since a ∈ V_Max, this means that (a, b) is strongly improving for G_x.

6.3 The O(2^{n/2}) bound on the length of strong DNP games iteration

Note that strong DNP games iteration is a special case of DNP games iteration. Hence all the results we established for DNP games iteration can be applied here. Since we are dealing with bipartite graphs, we already have the bound O(2^n) proved in Subsection 4.4. The improvement from 2^n to 2^{n/2} will be obtained by noticing that in every step of strong DNP games iteration both f_H and g_H increase in the alternating lexicographic order. Before, we could only show that one of these vectors increases, while the other could remain unchanged.

So why does the fact that both f_H and g_H increase each time lead to an O(2^{n/2}) bound? Note that ‖f_H‖_1 = |{a ∈ V | δ_H(a) > 0}| and ‖g_H‖_1 = |{a ∈ V | δ_H(a) < 0}|. Hence ‖f_H‖_1 + ‖g_H‖_1 = n. Therefore, if a strong DNP games iteration has length l, then either ‖f_H‖_1 ≤ n/2 for at least l/2 of its graphs, or ‖g_H‖_1 ≤ n/2 for at least l/2 of its graphs. Since both vectors strictly increase at every step, in either case l/2 does not exceed the number of v ∈ ℕ^{2n−1} satisfying (13–16) and ‖v‖_1 ≤ n/
2. On theother hand, the number of such vectors is O (2 n/ ). Indeed, as shown in Subsection 4.4the number of v ∈ N n − satisfying (13–16) and k v k = s, t ( v ) = t is (cid:16) s − t (cid:17) . By summingover all s n/ t we get the required O (2 n/ ) bound.It only remains to explain why both f H and g H increase in each step of a strongDNP games iteration. Let H and K be two consecutive graphs in strong DNP gamesiteration. Assume first that f K is not greater than f H in the alternating lexicographicorder. By Lemma 5 for every i ∈ { , , . . . , n − } it holds that { a ∈ V | δ H ( a ) = λ i } = { a ∈ V | δ K ( a ) = λ i } . In particular, { a ∈ V | δ H ( a ) > } = { a ∈ V | δ K ( a ) > } . Since δ H and δ K are non-zero in all nodes (again, this is because these graphs are acyclic), wealso have { a ∈ V | δ H ( a ) < } = { a ∈ V | δ K ( a ) < } . Hence a pair ( a, b ) ∈ V × V is strongly improving for H if and only if it is strongly improving for K . On the otherhand, the graph K contains as an edge a strongly improving pair for H . This pair isalso strongly improving for K . Therefore it can not be an edge of K , contradiction.Exactly the same argument shows that g K is greater than g H in the alternatinglexicographic order. References [1]
Björklund, H., and Vorobyov, S.
Combinatorial structure and randomized subexponential algorithms for infinite games.
Theoretical Computer Science 349, 3 (2005), 347–360.
[2]
Bojańczyk, M., and Czerwiński, W.
An automata toolbox. A book of lecture notes, available at ∼bojan/upload/reduced-may-25.pdf, 2018.
[3] Bouyer, P., Fahrenberg, U., Larsen, K. G., Markey, N., and Srba, J.
Infinite runs in weighted timed automata with energy constraints. In
International Conference on Formal Modeling and Analysis of Timed Systems (2008), Springer, pp. 33–47.
[4]
Brim, L., Chaloupka, J., Doyen, L., Gentilini, R., and Raskin, J.-F.
Faster algorithms for mean-payoff games.
Formal Methods in System Design 38, 2 (2011), 97–118.
[5]
Chakrabarti, A., De Alfaro, L., Henzinger, T. A., and Stoelinga, M.
Resource interfaces. In
International Workshop on Embedded Software (2003), Springer, pp. 117–133.
[6]
Dorfman, D., Kaplan, H., and Zwick, U.
A faster deterministic exponential time algorithm for energy games and mean payoff games. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019) (2019), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[7]
Ehrenfeucht, A., and Mycielski, J.
Positional strategies for mean payoff games.
International Journal of Game Theory 8 , 2 (1979), 109–113.[8]
Fijalkow, N., Gawrychowski, P., and Ohlmann, P.
The complexity of mean payoff games using universal graphs. arXiv preprint arXiv:1812.07072 (2018).
[9]
Filar, J., and Vrieze, K.
Competitive Markov decision processes. Springer Science & Business Media, 2012.
[10]
Gimbert, H., and Zielonka, W.
Games where you can play optimally without any memory. In
International Conference on Concurrency Theory (2005), Springer,pp. 428–442.[11]
Halman, N.
Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems.
Algorithmica 49 , 1 (2007), 37–50.[12]
Hansen, T. D., and Ibsen-Jensen, R.
The complexity of interior point methods for solving discounted turn-based stochastic games. In
Conference on Computability in Europe (2013), Springer, pp. 252–262.
[13]
Hansen, T. D., Miltersen, P. B., and Zwick, U.
Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor.
Journal of the ACM (JACM) 60 , 1 (2013), 1–16.[14]
Howard, R. A.
Dynamic programming and markov processes.[15]
Jurdziński, M.
Deciding the winner in parity games is in up ∩ co-up. InformationProcessing Letters 68 , 3 (1998), 119–124.[16]
Lifshits, Y. M., and Pavlov, D. S.
Potential theory for mean payoff games.
Journal of Mathematical Sciences 145 , 3 (2007), 4967–4974.[17]
Littman, M. L.
Algorithms for sequential decision making . 1996.[18]
Ludwig, W.
A subexponential randomized algorithm for the simple stochasticgame problem.
Information and computation 117 , 1 (1995), 151–155.[19]
Matoušek, J., Sharir, M., and Welzl, E.
A subexponential bound for linearprogramming.
Algorithmica 16 , 4-5 (1996), 498–516.2620]
Megiddo, N.
Towards a genuinely polynomial algorithm for linear programming.
SIAM Journal on Computing 12 , 2 (1983), 347–353.[21]
Pisaruk, N. N.
Mean cost cyclical games.
Mathematics of Operations Research24 , 4 (1999), 817–828.[22]
Puterman, M. L.
Markov decision processes: discrete stochastic dynamic pro-gramming . John Wiley & Sons, 2014.[23]
Rao, S. S., Chandrasekaran, R., and Nair, K.
Algorithms for discountedstochastic games.
Journal of Optimization Theory and Applications 11 , 6 (1973),627–637.[24]
Shapley, L. S.
Stochastic games.
Proceedings of the national academy of sciences39 , 10 (1953), 1095–1100.[25]
Zwick, U., and Paterson, M.
The complexity of mean payoff games on graphs.
Theoretical Computer Science 158 , 1-2 (1996), 343–359.
A Why Lemma 6 is tight
For $v \in A$, let $k(v)$ be the number of $i \in \{1, 2, \ldots, t(v)\}$ such that either $v_i = 0$ or $v_{i+1} = 0$. It is not hard to see that the number of $v \in A$ with $\|v\| = n$, $t(v) = t$, $k(v) = k$ is
$$2^k \cdot \binom{t}{k} \cdot \binom{n-1}{2t-k}. \qquad (19)$$
Let us explain how to choose $t$ and $k$ so that (19) equals $(2 + \sqrt{2})^n$ (up to a polynomial factor). First, by using the standard approximation of binomial coefficients in terms of the Shannon entropy function, we get that up to a polynomial factor (19) equals
$$2^{(\alpha\beta + \alpha h(\beta) + h(2\alpha - \alpha\beta))(n-1)}, \qquad (20)$$
where $\alpha, \beta \in [0, 1]$ are such that $t = \alpha(n-1)$ and $k = \beta t$, and $h(x) = x \log_2(1/x) + (1-x) \log_2(1/(1-x))$. A direct calculation shows that for $\alpha = (\sqrt{2}+1)/4$ and $\beta = 2(\sqrt{2}-1)$ (numerically, $\alpha \approx 0.604$ and $\beta \approx 0.828$), the coefficient before $(n-1)$ in the exponent of (20) equals $\log_2(2 + \sqrt{2}) \approx 1.7716$, so that (20) equals $(2 + \sqrt{2})^n$ up to a polynomial factor.

B Proof of Lemma 7
Let us call a node a ∈ V of the graph G trivial in the following two cases:

• a ∈ V_Max and only nodes of V_Max are reachable from a;
• a ∈ V_Min and only nodes of V_Min are reachable from a.

Next, let us call a cycle C of the graph G trivial in the following two cases:

• the cycle C is non-negative and all its nodes are from V_Max;
• the cycle C is negative and all its nodes are from V_Min.

The first step of our reduction is to get rid of trivial nodes and cycles. Note that once we have detected a trivial node or a trivial cycle, we can determine the winner of the energy game in at least one node of G. Indeed, to determine the winner of the energy game in a trivial node we essentially need to solve a one-player energy game. It is well-known that this can be done in strongly polynomial time. In turn, all nodes of a trivial cycle are winning for the player controlling these nodes – he can win just by staying on the cycle forever.

Next, once the winner is determined in at least one node, there is a standard way of reducing the initial game to a game with fewer nodes. Suppose we know the winner in a node a; say, it is Max. Then Max also wins in all the nodes from where he can enforce reaching a. We simply remove all these nodes. This does not affect who wins the energy game in the remaining nodes. Indeed, Max has no edges to the removed nodes, and a winning strategy of Min would never use an edge to these nodes. It should also be noted that in the remaining graph all the nodes still have at least one outgoing edge (a sink would have been removed).

So getting rid of trivial nodes and cycles can be done as follows. We first detect whether they exist. Then we determine the winner in some node of the graph and reduce our game to a game with a smaller number of nodes. Clearly, all these actions take strongly polynomial time.
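As an illustration, detecting trivial nodes amounts to a plain reachability check from each node. Below is a minimal sketch in Python; the graph encoding (a `succ` map from each node to its successors, and an `is_max` map marking the nodes of V_Max) is an assumed representation, not taken from the paper:

```python
def trivial_nodes(succ, is_max):
    """Return the nodes a from which only nodes of a's own player are reachable.

    succ: dict mapping each node to the list of its successors.
    is_max: dict mapping each node to True iff it belongs to V_Max.
    """
    result = set()
    for a in succ:
        stack, seen = [a], {a}
        mixed = False  # becomes True if a node of the other player is reachable
        while stack:
            v = stack.pop()
            if is_max[v] != is_max[a]:
                mixed = True
                break
            for w in succ[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if not mixed:
            result.add(a)
    return result
```

Each trivial node found this way leaves the player who owns it alone in a one-player energy game, which, as noted above, can be solved in strongly polynomial time.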
This can be repeated at most n times, so the whole procedure takes strongly polynomial time.

From now on we assume that we are given an energy game G on a graph G = (V, E) with no trivial nodes and cycles. We construct a bipartite graph G′ over the same set of nodes and the corresponding bipartite energy game G′ equivalent to the initial one. In the definition of G′ we use the following notation. Consider a path p of the graph G. We say that p is Max-controllable if all the nodes of p except the last one are from V_Max (the last one can belong to V_Min as well as to V_Max). In other words, Max should be able to navigate the pebble along p without giving the control to Min. Similarly, we say that p is Min-controllable if all the nodes of p except the last one are from V_Min.

First, consider a pair of nodes a ∈ V_Max, b ∈ V_Min. We include (a, b) as an edge to the graph G′ if and only if in G there is a Max-controllable path from a to b. Since a is not a trivial node in G, there will be at least one edge starting at a in G′. Provided (a, b) was included, we let its weight in G′ be the largest weight of a Max-controllable path from a to b in G (with respect to the weight function of G). We call a path on which this maximum is attained underlying for the edge (a, b). In this way we always obtain a finite weight, since in G there are no positive cycles consisting entirely of nodes from V_Max.

We have described edges of G′ from V_Max to V_Min. Edges in the opposite direction are defined analogously. Namely, consider a pair of nodes a ∈ V_Min, b ∈ V_Max. We include this pair to G′ as an edge if and only if in G there is a Min-controllable path from a to b. Once (a, b) is included, we let its weight be the minimal weight of a Min-controllable path from a to b in G. A path attaining this minimum will be called underlying for (a, b). Again, the absence of trivial nodes guarantees that in G′ the node a will have at least one outgoing edge.
The weight of (a, b) will be well-defined due to the absence of trivial cycles.

It only remains to argue that G′ is equivalent to G. Let W_Max (W_Min) be the set of nodes where Max (Min) wins in G. It is enough to show that the set W_Max (the set W_Min) is winning for Max (Min) in G′. We present an argument only for W_Max; the argument for W_Min is similar.

Let σ be a positional strategy of Max which is winning for the game G in W_Max. Consider the following positional strategy σ′ of Max for the graph G′. We will define it only for nodes in W_Max. Given a node a ∈ W_Max of Max, apply σ to a repeatedly until a node from V_Min is reached. In fact, there is a possibility that, starting from a, the strategy σ loops before reaching any node of Min. But then the corresponding cycle would be negative (there are no trivial cycles). This would mean that σ is not winning for Max in a. So we conclude that indeed, by applying σ repeatedly to a, we reach a node from V_Min. Let this node from V_Min be b. Note that (a, b) is an edge of G′, as we have reached b by a Max-controllable path from a. We let (a, b) be the edge that the strategy σ′ uses in the node a.

We shall prove that only non-negative cycles are reachable from W_Max in (G′)_σ′. First, note that the edges that σ′ uses do not leave W_Max. This is because by applying a winning strategy of Max repeatedly we cannot leave W_Max in G. Moreover, no edge of Min in G′ can leave W_Max. Indeed, otherwise Min could leave W_Max in G. Thus, it remains to argue that any cycle C′ in (G′)_σ′, located in W_Max, is non-negative. Indeed, we can obtain in G_σ a cycle C, located in W_Max and having at most the same weight. As C is non-negative, the same holds for C′.

To obtain C we replace each edge (a, b) of C′ by a path p_(a,b) from a to b in G_σ. The path p_(a,b) will never leave W_Max, and its weight in G will be at most the weight of (a, b) in G′.

If (a, b) ∈ V_Min × V_Max, we let p_(a,b) be an underlying path for (a, b).
Its weight in G just equals the weight of (a, b) in G′. As this path is Min-controllable, it belongs to G_σ and never leaves W_Max.

If (a, b) ∈ V_Max × V_Min, then (a, b) is used by the strategy σ′ in a. Hence, by the definition of σ′, there is a Max-controllable path in G_σ from a to b. We let p_(a,b) be this path. It never leaves W_Max, as σ cannot leave W_Max. The weight of (a, b) in G′ is the largest weight of a Max-controllable path from a to b in G, so the weight of p_(a,b) in G is at most the weight of (a, b) in G′.
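To make the construction of G′ concrete, the weight of a Max-edge (a, b) — the largest weight of a Max-controllable path from a to b — can be computed by Bellman–Ford-style relaxation for longest paths. The sketch below is an assumed implementation, not the paper's; it relies on the fact established above that, after removing trivial cycles, G has no positive cycle consisting entirely of V_Max nodes, so n − 1 rounds of relaxation suffice:

```python
NEG_INF = float("-inf")

def max_controllable_weights(nodes, edges, is_max, a):
    """Largest weight of a Max-controllable path from a to every node.

    A path is Max-controllable if every node on it except possibly the
    last one lies in V_Max.  edges is a list of (u, v, w) triples.
    Assumes there is no positive cycle consisting entirely of V_Max
    nodes, so n - 1 rounds of longest-path relaxation suffice.
    """
    best = {v: NEG_INF for v in nodes}
    best[a] = 0
    for _ in range(len(nodes) - 1):
        for (u, v, w) in edges:
            # only extend paths whose current endpoint is controlled by Max
            if is_max[u] and best[u] != NEG_INF and best[u] + w > best[v]:
                best[v] = best[u] + w
    return best
```

For a ∈ V_Max and b ∈ V_Min, the edge (a, b) is then included in G′ whenever best[b] is finite, with weight best[b]; the Min-to-Max edges are obtained symmetrically by minimizing instead of maximizing.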