A Generic Strategy Improvement Method for Simple Stochastic Games
David Auger
David Laboratory, Université Paris [email protected]
Xavier Badin de Montjoye
David Laboratory, Université Paris [email protected]
Yann Strozecki
David Laboratory, Université Paris [email protected]
Abstract
We present a generic strategy iteration algorithm (GSIA) to find an optimal strategy of a simple stochastic game (SSG). We prove the correctness of GSIA and derive a general complexity bound, which implies and improves on the results of several articles. First, we remove the assumption that the SSG is stopping, which is usually obtained by a polynomial blowup of the game. Second, we prove a tight bound on the denominator of the values associated to a strategy, and use it to prove that all strategy iteration algorithms are in fact fixed parameter tractable in the number of random vertices. All known strategy iteration algorithms can be seen as instances of GSIA, which allows us to analyse the complexity of Condon's Converge From Below algorithm [10] and to propose a class of algorithms generalising Gimbert and Horn's algorithm [14, 15]. These algorithms require fewer than r! iterations in general, and fewer iterations than the current best deterministic algorithm for binary SSGs given by Ibsen-Jensen and Miltersen [17].

2012 ACM Subject Classification Theory of computation → Algorithmic game theory
Keywords and phrases
Simple Stochastic Games, Strategy Improvement, Parametrized Complexity, Stopping, Meta Algorithm, f-strategy
Acknowledgements
The authors want to thank Pierre Coucheney for many good discussions on SSGs.

Introduction

A simple stochastic game, or SSG, is a two-player turn-based zero-sum game with perfect information introduced by Condon [11]. It is a simpler version of stochastic games, previously defined by Shapley [22]. An SSG is played by two players, max and min, moving a pebble on a graph. The vertices of the graph are divided into min vertices, max vertices, random vertices and a target vertex for max. When the pebble reaches a player vertex, the corresponding player chooses a neighbouring vertex where to move the pebble. If it reaches a random vertex, the next vertex is chosen at random following some probability law. Finally, when the pebble reaches the target vertex, min pays 1 to max. The goal of min is to minimise the probability to reach the target vertex, while max must maximise this probability.

We study the algorithmic problem of solving an SSG, i.e. finding a pair of optimal strategies in an SSG, or equivalently the optimal value vector, which contains the optimal probability for max to reach the sink from each vertex of the game. There are always optimal strategies for both players that are positional [11], i.e. stationary and deterministic, but the number of positional strategies is exponential in the size of the game. Consequently, finding a pair of optimal strategies is a problem not known to be in FP, but it is in PPAD [19], a class included in FNP.
Simple Stochastic Games can be used to simulate many classical games such as parity games, mean or discounted payoff games [2, 7]. Moreover, stochastic versions of these games are equivalent to SSGs [2], which underlines that SSGs are an important model to study. SSGs have applications in different domains such as model checking of the modal µ-calculus [23], or modelling autonomous urban driving [9].

There are three known methods to solve SSGs: strategy iteration, value iteration and quadratic programming. A strategy iteration algorithm (SIA) starts with a strategy for one player and improves it until it is optimal, whereas value iteration algorithms (VIA) update a value vector by elementary operations, which converges to the optimal value vector of the game. Implementations of those algorithms have been written and compared in [20].

Denote by n the number of max vertices and by r the number of random vertices in an SSG. For SSGs with max vertices of outdegree 2, the best known deterministic algorithm is an SIA which makes at worst O(2^n / n) iterations (see [24]), and the best known randomised algorithm is an SIA described by Ludwig in [21], which runs in 2^{O(√n)}. Gimbert and Horn give an SIA in [14] running in O*(r!) iterations, that is, with a superpolynomial dependency in r only (O* omits polynomial factors in r and n). For SSGs where random vertices have probability distribution (1/2, 1/2) (coin toss), Ibsen-Jensen and Miltersen present a VIA of complexity O*(2^r) [17]. It turns out that all SIAs run in O*(2^r) on this family of SSGs, as we prove in this article. The same complexity of O*(2^r) is obtained for general SSGs with a more involved randomised algorithm in [4].

Most of the mentioned algorithms rely on the game being stopping, which means that it ends in a sink with probability 1. This condition is not restrictive since any SSG can be transformed into a stopping SSG while keeping the same optimal strategies. However, this transformation incurs a quadratic blow-up of the game and cannot really be used in real-life applications. This restriction has been lifted for quadratic programming in [20], and before that for SIA and VIA in [8, 6].

Contributions
We introduce GSIA, a new meta-algorithm to solve SSGs, in Sec. 3. This algorithm proves simultaneously the correctness of multiple algorithms ([10, 14, 13, 24, 17, 4]). In Sec. 4, we give a general complexity bound that matches or improves previous bounds obtained by ad-hoc methods, and shows that the complexity of all these algorithms is fixed-parameter tractable in the number of random vertices. Moreover, we do not rely on the fact that the game is stopping, which was commonly assumed in the aforementioned papers. The correctness proof relies on concatenations of strategies to interpolate between two strategies and on an analysis of the absorbing sets of the game, while the complexity is derived from a new and tight characterisation of the values of an SSG. Finally, in Sec. 5, we show how GSIA can be used to derive new algorithms generalising classical ones. In particular, we exhibit a class of algorithms which generalise Gimbert and Horn's algorithm and use fewer iterations than Ibsen-Jensen and Miltersen's algorithm.
We give a generalised definition of a Simple Stochastic Game, a two-player zero-sum game with turn-based moves and perfect information introduced by Anne Condon [10].
Definition 1.
A Simple Stochastic Game (SSG) is a directed graph G, together with:
- a partition of the vertex set V into four parts V_max, V_min, V_R and V_S (all possibly empty, except V_S), satisfying the following conditions: (a) every vertex of V_max, V_min or V_R has at least one outgoing arc; (b) every vertex of V_S has exactly one outgoing arc, which is a loop on itself;
- for every x ∈ V_R, a probability distribution p_x(·) with rational values on the outneighbourhood of x;
- for every x ∈ V_S, a value Val(x) which is a rational number in the closed interval [0, 1].

In the article, we denote |V_max| by n and |V_R| by r. Vertices from V_max, V_min, V_R and V_S are respectively called max vertices, min vertices, random vertices and sinks. For x ∈ V, we denote by N+(x) the set of outneighbours of x. We assume that for every x ∈ V_R and y ∈ V, y ∈ N+(x) if and only if p_x(y) > 0.

The game is played by the two players max and min. A token is positioned on a starting vertex x. If x is in V_max (resp. V_min), the max player (resp. the min player) chooses one of the outneighbours of x to move the token to. If x is in V_R, the token is randomly moved to one of the outneighbours of x according to the probability distribution p_x(·), independently of everything else. This process continues until the token reaches a sink s; then player min pays Val(s) to player max and the game stops. The problem we study is to find the best possible strategies for min and max, and the expected value that min has to pay to max when both follow those strategies.

We consider a slightly restricted class of SSGs where the probability distribution on each random vertex has a given precision and the values of the sinks are 0 and 1.

Definition 2.
For q a positive integer, we say that an SSG is a q-SSG if there are only two sinks, of value 0 and 1, and for all x ∈ V_R there is an integer q_x ≤ q such that the probability distribution p_x(·) can be written as p_x(y) = ℓ_{x,y} / q_x for all y, where ℓ_{x,y} is an integer.

Let x be a random vertex of a 2-SSG and let u ∈ N+(x); then p_x(u) can be equal to 0, 1/2 or 1. The case p_x(u) = 0 is forbidden by definition, and if p_x(u) = 1, then x is of degree one and can be removed (by redirecting the arcs entering x directly to u) without changing anything about the outcome of the game. Hence, we suppose without loss of generality that each random vertex of a 2-SSG has outdegree 2 and probability distribution (1/2, 1/2). This is the model of binary SSG, given by Condon and used in most articles on SSGs, except that we allow here max and min vertices to have an outdegree larger than 2.

Definition 3.

A play in G is an infinite sequence of vertices X = (x_0, x_1, x_2, ...) such that for all t ≥ 0, (x_t, x_{t+1}) is an arc of G.

If for a play X = (x_t) there is some t ≥ 0 such that x_t = s ∈ V_S, then all subsequent vertices in the play are also equal to s, and such a sink s is unique. In this case, we say that the play reaches sink vertex s, and we define the value of the play Val(X) as Val(s). If the play reaches no sink, then we set Val(X) = 0.

A history of G is a finite directed path h = (x_0, x_1, ..., x_k). If the last vertex x_k is a max vertex (resp. min vertex), we say that h is a max history (resp. min history).

Definition 4.

A general max strategy (resp. general min strategy) is a map σ assigning to every max history (resp. min history) h = (x_0, x_1, ..., x_k) a vertex σ(h) which is an outneighbour of x_k. The set of these strategies is denoted by Σ^max_gen (resp. Σ^min_gen).
For σ ∈ Σ^max_gen and τ ∈ Σ^min_gen, given a starting vertex x, we recursively define a random play X = (X_0, X_1, ...) of G in the following way. At t = 0 let X_0 = x, and for t ≥ 0:
- if X_t ∈ V_max, define X_{t+1} = σ(X_0, X_1, ..., X_t);
- if X_t ∈ V_min, define X_{t+1} = τ(X_0, X_1, ..., X_t);
- if X_t ∈ V_R, then X_{t+1} is an outneighbour of X_t chosen following the probability distribution p_{X_t}(·), independently of everything else;
- if X_t ∈ V_S, define X_{t+1} = X_t.
This defines a distribution on plays which we denote by P^x_{σ,τ}(·), or simply P(·) if the strategies and the starting vertex are clear from context. The corresponding expected and conditional expected values are denoted by E^x_{σ,τ}(·|·), or simply E(·|·).

We now define positional strategies, which only depend on the last vertex in the history:

Definition 5.
A general max strategy σ (resp. min strategy) is said to be positional if for any max vertex x (resp. min vertex) and any history h = (x_0, ..., x) ending in x, we have σ(h) = σ((x)), where (x) is the history containing only x as a start vertex. The set of positional max strategies (resp. min strategies) is denoted Σ^max (resp. Σ^min).

Definition 6.
Let G be an SSG and let (σ, τ) be a pair of max and min strategies; the value vector v^G_{σ,τ} is the real vector of dimension |V| defined by, for any x ∈ V,

v^G_{σ,τ}(x) = E^x_{σ,τ}(Val(X)).

As before, the superscript G can be omitted when the context is clear.

To compare value vectors, we use the pointwise order: we say that v_1 ≥ v_2 if for all vertices x ∈ V we have v_1(x) ≥ v_2(x). Moreover, we say that v_1 > v_2 if v_1 ≥ v_2 and there is some x such that v_1(x) > v_2(x). Given a max strategy σ, a best response to σ is a min strategy τ such that v_{σ,τ} ≤ v_{σ,τ′} for all min strategies τ′.

Proposition 7 ([10]). A positional strategy admits a positional best response, which can be found in polynomial time using linear programming.
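To make Definition 6 concrete, the value v_{σ,τ}(x) = E^x_{σ,τ}(Val(X)) can be approximated by sampling plays. The sketch below uses a toy dictionary encoding of a binary SSG of our own devising; the names `kind`, `succ`, `val` and the uniform choice at random vertices are our assumptions, not the paper's notation.

```python
import random

def simulate_value(kind, succ, val, sigma, tau, x, n_plays=2000, horizon=1000, rng=None):
    """Estimate v_{sigma,tau}(x) = E[Val(X)] by sampling plays (Definition 6).
    Plays reaching no sink within `horizon` steps contribute 0, matching Val(X) = 0."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_plays):
        v = x
        for _ in range(horizon):
            if kind[v] == 'sink':
                total += val[v]
                break
            if kind[v] == 'max':
                v = sigma[v]          # positional max strategy: depends on v only
            elif kind[v] == 'min':
                v = tau[v]            # positional min strategy
            else:
                v = rng.choice(succ[v])  # uniform choice models the (1/2, 1/2) case

    return total / n_plays

# Tiny game: max vertex a -> random vertex r -> {sink of value 1, sink of value 0}
kind = {'a': 'max', 'r': 'rand', 's1': 'sink', 's0': 'sink'}
succ = {'a': ['r'], 'r': ['s1', 's0'], 's1': ['s1'], 's0': ['s0']}
val = {'s1': 1.0, 's0': 0.0}
print(simulate_value(kind, succ, val, {'a': 'r'}, {}, 'a'))  # close to 1/2
```

Exact values are of course obtained from the linear characterisations of the next lemmas, not by sampling; the sketch only illustrates the definition of the value vector as an expectation over plays.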
The set of positional best responses to σ is denoted by BR(σ). Similarly, for a min strategy τ, we define the notion of best response to τ, and the corresponding set is denoted by BR(τ). Except when explicitly stated otherwise (in Sec. 3.4), all considered strategies are positional.

We denote by τ(σ) a positional best response to σ. For a max strategy σ and τ ∈ BR(σ), we write v_σ for v_{σ,τ}. For a min strategy τ and σ ∈ BR(τ), we write v_τ for v_{σ,τ}. The vector v_σ is called the value vector of strategy σ, and is used to compare strategies by writing σ′ >_G σ if v^G_{σ′} > v^G_σ.

It is well known (see [10, 24]) that there is a pair of deterministic positional strategies (σ*, τ*), called optimal strategies, that are best responses to each other and therefore satisfy, for all x, v*(x) = v_{σ*,τ*}(x) = v_{σ*}(x) = v_{τ*}(x).

The next two lemmas give characterisations of (optimal) value vectors under a pair of strategies. They are fundamental to all algorithms finding optimal strategies. Proofs of similar results can be found in [11]; we add here a fifth condition to make the characterisation hold when the game is not stopping.

For any SSG, the vertices with value 0 under optimal strategies can be found in linear time by a simple graph traversal computing its complement, the set of vertices from which max can reach a sink of positive value regardless of the choices of the min player. Let K^G be the set of these value-0 vertices. For a max strategy σ of G and a min strategy τ, we call K^G_{σ,τ} the set of vertices with value zero under the pair of strategies (σ, τ), and K^G_σ the set of vertices with value zero under (σ, τ) when τ is a best response to σ.

Lemma 8.
Given positional strategies (σ, τ) and a real |V|-dimensional vector v, one has equality between v and v_{σ,τ} if and only if the following conditions are met:
(i) for s ∈ V_S, v(s) = Val(s);
(ii) for r ∈ V_R, v(r) = Σ_{y ∈ N+(r)} p_r(y) v(y);
(iii) for x ∈ V_min, v(x) = v(τ(x));
(iv) for x ∈ V_max, v(x) = v(σ(x));
(v) for any x ∈ V, v(x) = 0 if and only if x ∈ K^G_{σ,τ}.
Moreover, τ ∈ BR(σ) if and only if, for any x in V_min, v(x) = min_{y ∈ N+(x)} v(y) = v(τ(x)), and the last condition is modified into: v(x) = 0 if and only if x ∈ K^G_σ.

Lemma 9 (Optimality conditions). Given positional strategies (σ, τ) and denoting v = v_{σ,τ}, (σ, τ) are optimal strategies if and only if:
(i) for s ∈ V_S, v(s) = Val(s);
(ii) for r ∈ V_R, v(r) = Σ_{y ∈ N+(r)} p_r(y) v(y);
(iii) for x ∈ V_min, v(x) = min_{y ∈ N+(x)} v(y);
(iv) for x ∈ V_max, v(x) = max_{y ∈ N+(x)} v(y);
(v) for any x ∈ V, v(x) = 0 if and only if x ∈ K^G.

The conditions of Lemma 9 make (σ, τ) a certificate of optimality that can be checked in polynomial time: compute v_{σ,τ} by solving the linear system of Lemma 8, compute K^G in linear time, then check in linear time whether the conditions are met.

We present a simple transformation of an SSG, where some arcs of the game are rerouted to new sinks with appropriate values.
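The certificate check of Lemma 9 can be illustrated as follows. This is a simplified sketch under assumptions of our own: the game encoding is hypothetical, the graph induced by (σ, τ) is assumed acyclic so that conditions (i)-(iv) of Lemma 8 can be solved by direct substitution with exact rationals, and the zero-value condition (v) on K^G is omitted.

```python
from fractions import Fraction

def values(kind, succ, prob, val, sigma, tau):
    """Solve conditions (i)-(iv) of Lemma 8 by substitution (acyclic case only)."""
    memo = {}
    def v(x):
        if x not in memo:
            if kind[x] == 'sink':
                memo[x] = val[x]                                   # (i)  v(s) = Val(s)
            elif kind[x] == 'rand':
                memo[x] = sum(prob[x][y] * v(y) for y in succ[x])  # (ii) expectation
            elif kind[x] == 'min':
                memo[x] = v(tau[x])                                # (iii)
            else:
                memo[x] = v(sigma[x])                              # (iv)
        return memo[x]
    return {x: v(x) for x in kind}

def is_optimal(kind, succ, prob, val, sigma, tau):
    """Check the local min/max conditions (iii) and (iv) of Lemma 9."""
    w = values(kind, succ, prob, val, sigma, tau)
    ok_min = all(w[x] == min(w[y] for y in succ[x]) for x in kind if kind[x] == 'min')
    ok_max = all(w[x] == max(w[y] for y in succ[x]) for x in kind if kind[x] == 'max')
    return ok_min and ok_max

half = Fraction(1, 2)
kind = {'m': 'max', 'r': 'rand', 's1': 'sink', 's0': 'sink'}
succ = {'m': ['r', 's0'], 'r': ['s1', 's0'], 's1': [], 's0': []}
prob = {'r': {'s1': half, 's0': half}}
val = {'s1': Fraction(1), 's0': Fraction(0)}
print(is_optimal(kind, succ, prob, val, {'m': 'r'}, {}))  # True: moving to r is optimal
```

In general the system of Lemma 8 has cycles and must be solved as a linear system, and condition (v) is needed to rule out spurious solutions on vertices of value zero.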
Definition 10.
Let G be an SSG, let A be a subset of the arcs of G and let f be a function from A to the set of rational numbers. Let G[A, f] be the SSG obtained from a copy of G with the following modifications: each arc e = (x, y) ∈ A is removed and replaced in G[A, f] by an arc (x, s_e), where s_e is a new sink vertex with value f(e). These new sinks of G[A, f] are called A-sinks, and A is the set of fixed arcs.

Note that in the previous definition, the end vertex y of an arc (x, y) ∈ A is not removed from the game. Its incoming arcs which are in A are simply redirected to sinks, see Fig. 1.

Figure 1
Transformation of the graph G into G[{(x_i, x_j)}, f] where f((x_i, x_j)) = 0.

The function f is usually given by the values of a strategy: we denote by G[A, σ] the game G[A, f], where f is the function associating to each arc e = (x, y) of A the value f(e) = v_σ(y). Comparing G and G[A, σ], the only differences are that the arcs of A have their endpoints changed to new sinks. Therefore, a strategy defined in G can be interpreted as a strategy of G[A, σ] and vice versa; hence, we identify strategies of both players in G and G[A, σ] in the article. However, when we compare a strategy in both games in the following lemma, in order to make sense we only compare values on the vertices of G and do not consider A-sinks (in any case, the values of A-sinks are fixed).

Lemma 11.
For an SSG G, a subset of arcs A, and a max strategy σ, K^G_σ = K^{G[A,σ]}_σ.

Proof.
Fix a min strategy τ and define R^G_{σ,τ}(x) as the set of vertices that can be reached from x in G, following only the arcs corresponding to σ and τ after max and min vertices, and any arc out of random vertices. We repeatedly use the easy fact that the three following assertions are equivalent:
(i) v^G_{σ,τ}(x) = 0;
(ii) v^G_{σ,τ}(y) = 0 for all y ∈ R^G_{σ,τ}(x);
(iii) Val^G(s) = 0 for all s ∈ V^G_S ∩ R^G_{σ,τ}(x).
The same equivalence is true in G[A, σ], where we define R^{G[A,σ]}_{σ,τ} likewise. Denote by R^G_A(x) the vertices of R^G_{σ,τ}(x) that are endpoints of arcs in A, and let S_A(x) be the corresponding A-sinks in G[A, σ].

Suppose that v^G_{σ,τ}(x) = 0 and consider a sink s in V^{G[A,σ]}_S ∩ R^{G[A,σ]}_{σ,τ}(x): either it belongs to V^G_S, hence also to R^G_{σ,τ}(x), and satisfies Val^G(s) = 0 by (iii); or it belongs to S_A(x), and then by definition Val^{G[A,σ]}(s) = v^G_σ(y) ≤ v^G_{σ,τ}(y) = 0 for the corresponding endpoint y ∈ R^G_A(x). Thus, by (iii) once again, we have v^{G[A,σ]}_{σ,τ}(x) = 0.

Conversely, suppose that v^{G[A,σ]}_{σ,τ}(x) = 0 and let s ∈ V^G_S ∩ R^G_{σ,τ}(x). Then either s ∈ R^{G[A,σ]}_{σ,τ}(x), hence by (iii) Val^G(s) = Val^{G[A,σ]}(s) = 0; or there is a y ∈ R^G_A(x) such that s ∈ R^G_{σ,τ}(y). In this case we have v^G_{σ,τ}(y) = 0 by (ii), hence Val^G(s) = 0 by (iii) applied to y, and we see that v^G_{σ,τ}(x) = 0.

Since we have v^G_{σ,τ}(x) = 0 if and only if v^{G[A,σ]}_{σ,τ}(x) = 0, regardless of τ, the result follows. ◀

Lemma 12.
For an SSG G, a subset of arcs A, and a max strategy σ, v^G_σ = v^{G[A,σ]}_σ.

Proof.
This is a direct consequence of Lemma 8 and Lemma 11, since the vector v^G_σ satisfies the best-response conditions in G[A, σ] and vice versa. ◀
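For concreteness, the transformation of Definition 10 can be sketched in code. The adjacency-list encoding below is our own illustration, not the paper's; `fix_arcs` builds G[A, f], and G[A, σ] is obtained by passing f(e) = v_σ(y) for each e = (x, y) ∈ A.

```python
def fix_arcs(succ, kind, val, A, f):
    """Build G[A, f]: redirect each arc (x, y) in A to a fresh sink of value f[(x, y)]."""
    succ2 = {u: list(ns) for u, ns in succ.items()}
    kind2, val2 = dict(kind), dict(val)
    for (x, y) in A:
        s = ('sink', x, y)                      # the fresh A-sink s_e for e = (x, y)
        kind2[s], val2[s], succ2[s] = 'sink', f[(x, y)], []
        succ2[x] = [s if z == y else z for z in succ2[x]]
    return succ2, kind2, val2

# Toy game with a cycle a <-> b; fixing the arc (a, b) with value 0 breaks the cycle.
succ = {'a': ['b'], 'b': ['a', 's1'], 's1': []}
kind = {'a': 'max', 'b': 'min', 's1': 'sink'}
val = {'s1': 1}
succ2, kind2, val2 = fix_arcs(succ, kind, val, [('a', 'b')], {('a', 'b'): 0})
print(succ2['a'])  # [('sink', 'a', 'b')]: the arc (a, b) now enters the new A-sink
```

As remarked after Definition 10, the endpoint b stays in the game with its outgoing arcs untouched; only the fixed arc (a, b) is redirected.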
An SSG is stopping if under every pair of strategies a play eventually reaches a sink with probability 1. Most algorithms in the literature depend on the game being stopping. It is usually not seen as a limitation since it is possible to transform every SSG into a stopping SSG, but the transformation makes the game polynomially larger, which is bad from a complexity point of view. We strengthen the classical order on strategies to get rid of the stopping condition in the generic strategy improvement algorithm presented in this section.
Definition 13.
Let σ and σ′ be two max strategies; then σ′ ≻_G σ if σ′ >_G σ and, for every max vertex x, if v^G_{σ′}(x) = v^G_σ(x), then σ′(x) = σ(x).

Algorithm 1 is a classical strategy improvement algorithm with two twists: the improvement is for the stricter order ≻, and it is guaranteed in the transformed game rather than in the original game. We call Algorithm 1 the Generic Strategy Improvement Algorithm, or GSIA.
Algorithm 1
GSIA
Data: G an SSG
Result: (σ, τ) a pair of optimal strategies
1 begin
2     select an initial max strategy σ
3     while (σ, τ(σ)) are not optimal strategies of G do
4         choose a subset A of arcs of G
5         find σ′ such that σ′ ≻_{G[A,σ]} σ
6         σ ← σ′
7     return (σ, τ(σ))

Algorithm 1 is a generic algorithm (or meta-algorithm) because neither the selection of an initial strategy σ at line 2, nor the way of choosing A at line 4, nor the way of finding σ′ at line 5 is specified. A choice of implementation for these three parts is an instance of GSIA, that is, a concrete strategy improvement algorithm. Note that if a σ′ with σ′ >_{G[A,σ]} σ is found, it is easy to obtain σ′′ with σ′′ ≻_{G[A,σ]} σ: define σ′′ as equal to σ′, except on the max vertices x such that v_{σ′}(x) = v_σ(x) and σ′(x) ≠ σ(x), where σ′′(x) is defined as σ(x).

When we prove some property of GSIA in this article, it means that the property is true for all instances of GSIA, that is, regardless of the selection of the initial strategy, of the set A and of the strategy σ′.

In order to prove the correctness of GSIA, we need to prove two points:
1. If σ is not optimal in G, then σ is not optimal in G[A, σ].
2. If σ′ ≻_{G[A,σ]} σ, then σ′ >_G σ.
The first point is proved in the following lemma, while the second one is much harder to obtain and is the subject of the next two subsections.

Lemma 14.
For an SSG G and a subset of arcs A, a max strategy σ is optimal in G if and only if it is optimal in G[A, σ].
Proof.
Except on A-sinks, the value vectors of σ in G and G[A, σ] are equal by Lemma 12. Furthermore, by Lemma 11, K^G_σ = K^{G[A,σ]}_σ; hence σ satisfies the optimality conditions of Lemma 9 in G if and only if it satisfies them in G[A, σ]. ◀

In order to avoid requiring the game to be stopping, it is necessary to pay particular attention to the set of vertices where the play can loop infinitely and yield value zero, which is a subset of the set of vertices of value 0. We now prove that a step of GSIA can only reduce this set.
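The loop of Algorithm 1, with the stopping test justified by Lemma 14, can be rendered schematically as follows. Everything here is an illustrative skeleton of our own: the three unspecified parts of the meta-algorithm are passed in as callables, and the toy instance at the end only exercises the control flow, not an actual SSG.

```python
def gsia(game, initial_strategy, choose_arcs, improve, is_optimal):
    """Schematic rendering of Algorithm 1 (GSIA); an instance supplies the callables."""
    sigma = initial_strategy(game)          # line 2: initial max strategy
    while not is_optimal(game, sigma):      # line 3: test (sigma, tau(sigma)) in G
        A = choose_arcs(game, sigma)        # line 4: the set of fixed arcs
        sigma = improve(game, A, sigma)     # line 5: some sigma' improving in G[A, sigma]
    return sigma                            # line 7: optimal, with best response tau(sigma)

# Toy instance: "game" is a chain of candidate strategies ordered by value,
# and the improvement step simply moves one step up the chain.
chain = ['s0', 's1', 's2']
result = gsia(chain,
              initial_strategy=lambda g: g[0],
              choose_arcs=lambda g, s: set(),
              improve=lambda g, A, s: g[g.index(s) + 1],
              is_optimal=lambda g, s: s == g[-1])
print(result)  # 's2'
```

Termination of the real algorithm is exactly what Theorem 22 below establishes: each improvement step strictly increases the value vector, and there are finitely many positional strategies.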
Definition 15.
For an SSG G and two strategies (σ, τ), an absorbing set Z is a subset of V ∖ V_S such that any vertex of Z has probability zero of reaching a vertex of V ∖ Z.

For σ and τ two strategies, Z(σ, τ) is the set of all vertices belonging to some absorbing set under (σ, τ); hence, Z(σ, τ) is itself an absorbing set. By definition, a play remains stuck in an absorbing set and can never reach a sink, so all vertices of an absorbing set have value zero under (σ, τ). The next lemma proves the existence of an inclusion-wise maximum over τ of Z(σ, τ), which we denote by Z(σ).

Lemma 16.
For every max strategy σ, there is τ_0 ∈ BR(σ) such that for every min strategy τ, we have Z(σ, τ) ⊆ Z(σ, τ_0).

Proof.
Let τ_1 be in BR(σ) and let τ_2 be such that Z(σ, τ_2) ⊄ Z(σ, τ_1). We define τ̃ by τ̃(x) = τ_2(x) for x in Z(σ, τ_2) and τ̃(x) = τ_1(x) otherwise. We now prove that τ̃ ∈ BR(σ) and Z(σ, τ̃) ⊇ Z(σ, τ_1) ∪ Z(σ, τ_2).

Since τ_1 is a best response to σ, we have v_{σ,τ_1}(x) ≤ v_{σ,τ_2}(x). Moreover, for x ∈ Z(σ, τ_2), v_{σ,τ_2}(x) = 0, thus v_{σ,τ_1}(x) = 0. From this, we deduce that the two systems of linear equations given by Lemma 8, characterising respectively the vectors v_{σ,τ_1} and v_{σ,τ̃}, are exactly the same: the only vertices where τ̃(x) and τ_1(x) differ satisfy v_{σ,τ_1}(τ_1(x)) = v_{σ,τ_1}(τ̃(x)) = 0. Hence, we have v_{σ,τ_1} = v_{σ,τ̃} and τ̃ ∈ BR(σ).

For any play under the strategies (σ, τ̃) starting in x ∈ Z(σ, τ_2), the min vertices of the play are all in Z(σ, τ_2), because τ̃ plays as τ_2 on these vertices; thus Z(σ, τ_2) ⊆ Z(σ, τ̃). For a play starting in x ∈ Z(σ, τ_1), either the play reaches a vertex of Z(σ, τ_2) and then stays in Z(σ, τ_2), or it plays like τ_1 and stays in Z(σ, τ_1). Hence, we have Z(σ, τ_1) ⊆ Z(σ, τ̃). Iterating this construction over the finitely many sets Z(σ, τ) yields the desired τ_0. ◀

From this we deduce the following result on the improvement step of GSIA (where absorbing sets are understood in G):

Proposition 17.
Let G be an SSG, A a set of arcs of G, and σ, σ′ two max strategies such that σ′ ≻_{G[A,σ]} σ; then Z(σ′) ⊆ Z(σ).

Proof.
Suppose that Z(σ′) is not a subset of Z(σ). From Lemma 16, there are τ ∈ BR(σ) such that Z(σ, τ) = Z(σ) and τ′ ∈ BR(σ′) such that Z(σ′, τ′) = Z(σ′) = Z.

Let X be the set of max vertices x in Z(σ′) ∖ Z(σ) such that σ′(x) ≠ σ(x); it is nonempty, otherwise Z(σ′) would be an absorbing set for (σ, τ′). If x is in X, since σ′ ≻_{G[A,σ]} σ, we have v^{G[A,σ]}_{σ′}(x) > v^{G[A,σ]}_σ(x) ≥ 0. Consider a play of G[A, σ] starting from x under the strategies (σ′, τ′). Since Z is an absorbing set in G under the same strategies, all the sinks accessible in G[A, σ] are A-sinks. Hence, there is at least one arc e = (y, z) ∈ A with both ends in Z and such that v_σ(z) > 0. We define the vertex s of Z as

s = argmax_{z ∈ Z} { v_σ(z) | ∃ y ∈ V, (y, z) ∈ A }

and we let v = v_σ(s). The value of each vertex of Z under (σ′, τ′) in G[A, σ] is bounded by v. Similarly, in G under the strategies (σ, τ), the value of s is bounded by the values of the vertices leaving Z; such vertices exist since Z is not a subset of Z(σ). We now show that those vertices all have value strictly less than v, which yields a contradiction.

First, since Z is an absorbing set for (σ′, τ′), all arcs leaving a random vertex of Z remain in Z in G; this does not depend on the strategies considered. Let E_X ⊆ X be the set of max vertices x of X such that σ(x) ∉ Z, and let E_N ⊆ Z ∩ V_min be the set of min vertices x of Z such that τ(x) ∉ Z.

On the one hand, for a min vertex x ∈ E_N:

v^G_σ(τ(x)) ≤ v^G_σ(τ′(x))                      since τ = τ(σ)
v^G_σ(τ′(x)) = v^{G[A,σ]}_σ(τ′(x))               by Lemma 12
v^{G[A,σ]}_σ(τ′(x)) ≤ v^{G[A,σ]}_{σ′}(τ′(x))     since σ′ ≻_{G[A,σ]} σ
v^{G[A,σ]}_{σ′}(τ′(x)) ≤ v                       since τ′(x) ∈ Z

Thus, v^G_σ(τ(x)) ≤ v. In case of equality, we have v = v^G_σ(τ(x)) = v^G_σ(τ′(x)); hence we can replace τ by τ̄, which is identical to τ except that τ̄(x) = τ′(x). We have v_{σ,τ} = v_{σ,τ̄} and Z(σ, τ) = Z(σ, τ̄): indeed, according to Lemma 8, the only situation that could occur would be to violate condition (v) by creating an absorbing set, and this would contradict the definition of τ. Thus, we can suppose that for any x in E_N, v_σ(τ(x)) < v.

On the other hand, since σ′ ≻_{G[A,σ]} σ, we know that for any x in E_X:

v^G_σ(σ(x)) < v^{G[A,σ]}_{σ′}(x) ≤ v

Now, for any vertex x of E = E_X ∪ E_N, let p_x be the probability that x is the first vertex of E reached when starting from s and following the strategies (σ, τ). By conditional expectation:

v_σ(s) = Σ_{x ∈ E_X} p_x v_σ(σ(x)) + Σ_{x ∈ E_N} p_x v_σ(τ(x))

Thus, v_σ(s) < v, which contradicts the definition of v and proves that Z(σ′) ⊆ Z(σ). ◀

As a tool for proving the correctness of Algorithm 1, we introduce the notion of concatenation of strategies, which produces non-positional strategies even if both concatenated strategies are positional. The idea of using a sequence of concatenated strategies to interpolate between two strategies was introduced in [15].
Definition 18.
For two max strategies σ, σ′ and a subset of arcs A, we call σ|_A σ′ the non-positional strategy that plays like σ until an arc of A is crossed, and then plays like σ′ until the end of the game. We let σ|^0_A σ′ = σ′ and, for all i ≥ 0, σ|^{i+1}_A σ′ = σ|_A (σ|^i_A σ′).

When A is clear from the context, we omit it and write σ|^i σ′. The strategy σ|^i_A σ′ plays like σ until i arcs from A have been crossed, and then plays like σ′. Hence, we can relate the strategy σ′|_A σ to a positional strategy in G[A, σ], as shown in the next lemma.
Lemma 19.
For two max strategies σ, σ′ and a subset of arcs A, we have: v^G_{σ′|_A σ} = v^{G[A,σ]}_{σ′}.

Proof. In G, after crossing an arc from A, by definition of σ′|_A σ, max plays according to σ. The game being memoryless from this point, the best response for min is to play like τ(σ) ∈ BR(σ). Thus, there is a best response to σ′|σ of the form τ|τ(σ), with τ a min strategy that is not necessarily positional. Let us consider a play following (σ′|σ, τ|τ(σ)), with τ any min strategy. If the play does not cross an arc of A, then there is no difference between this play and a play following (σ′, τ) in G[A, σ]. If an arc of A is used, then by Lemma 12 there is no difference between stopping with the value of G[A, σ] or continuing in G while following (σ, τ(σ)). Thus we have: v^G_{σ′|σ, τ|τ(σ)} = v^{G[A,σ]}_{σ′,τ}. Thus, if τ is a best response to σ′ in G[A, σ], then τ|τ(σ) is a best response to σ′|σ in G. This implies that v^G_{σ′|σ} = v^{G[A,σ]}_{σ′}. ◀

We now prove that increasing the values of sinks can only increase the value of the game (a similar lemma is proved in [3]).
Lemma 20.
Let G and G′ be two identical SSGs except for the values of their sinks s ∈ V_S, denoted respectively by Val(s) and Val′(s). If for every s ∈ V_S, Val′(s) ≥ Val(s), then for every max strategy σ we have v^{G′}_σ ≥ v^G_σ.

Proof.
For s ∈ V_S, let P^x_{σ,τ}(→ s) be the probability that the play ends in sink s while starting from vertex x and following the strategies (σ, τ). For any vertex x we have:

v^{G′}_{σ,τ}(x) = Σ_{s ∈ V_S} P^x_{σ,τ}(→ s) Val′(s) ≥ Σ_{s ∈ V_S} P^x_{σ,τ}(→ s) Val(s) = v^G_{σ,τ}(x)

This is true for any min strategy τ, thus v^{G′}_σ ≥ v^G_σ. ◀

The following proposition is the core idea of GSIA: a strategy which improves on σ in the transformed game also improves on σ in the original game.

Proposition 21.
Let G be an SSG, A a subset of arcs of G, and σ, σ′ two max strategies. If σ′ ≻_{G[A,σ]} σ, then σ′ >_G σ.

Proof.
We introduce a sequence of non-positional strategies (σ_i)_{i ≥ 1} defined by σ_i = σ′|^i σ for i ≥ 1. By hypothesis σ′ ≻_{G[A,σ]} σ, and by Lemma 19 v^G_{σ′|_A σ} = v^{G[A,σ]}_{σ′}; then we have

v^G_{σ_1} = v^G_{σ′|σ} = v^{G[A,σ]}_{σ′} > v^{G[A,σ]}_σ = v^G_σ.

Hence, by definition, the sinks of G[A, σ_1] have at least the values of the corresponding sinks in G[A, σ]. Applying Lemma 20, we obtain that v^{G[A,σ_1]}_{σ′} ≥ v^{G[A,σ]}_{σ′}, which can also be written as v^G_{σ_2} ≥ v^G_{σ_1}. More generally, we have:

for all i ≥ 1, v^G_{σ_{i+1}} ≥ v^G_{σ_i} ≥ v^G_{σ_1} > v^G_σ.

We now prove that v^G_{σ′} ≥ v^G_{σ_1} to conclude the proof.

From now on, we only consider the game G. Fix a vertex x and a min strategy τ ∈ BR(σ′) such that Z(σ′) = Z(σ′, τ). From Proposition 17, we know that Z(σ′) ⊆ Z(σ). It implies that for every z ∈ Z(σ′), v^G_σ(z) = v^G_{σ′}(z) = 0, which implies that v^{G[A,σ]}_{σ′}(z) = 0 and thus σ′(z) = σ(z). It follows that Z(σ′) ⊆ Z(σ, τ).

We now only consider G_0, the game G where every vertex of Z(σ′, τ) is replaced by a sink of value 0. Lemma 12 directly implies that v^G_σ = v^{G_0}_σ and v^G_{σ′} = v^{G_0}_{σ′}. Moreover, when playing following σ_i, once a vertex of Z(σ′) is reached the play stays in the absorbing set, whatever the history; thus v^G_{σ_i} = v^{G_0}_{σ_i}.

Recall that P^x_{σ′,τ}(→ s) is the probability of reaching sink s in G_0 while starting in x and following (σ′, τ). Let T_{σ′,τ} be the random variable giving the time at which a sink is reached; note that T_{σ′,τ} may be equal to +∞.

For every i ≥ 1, we express the value v_{σ′,τ}(x) by conditioning on whether the game finishes before i steps:

v_{σ′,τ}(x) = P(T_{σ′,τ} < i) Σ_{s ∈ V_S} P^x_{σ′,τ}(→ s | T_{σ′,τ} < i) Val(s) + P(i ≤ T_{σ′,τ} < +∞) Σ_{s ∈ V_S} P^x_{σ′,τ}(→ s | +∞ > T_{σ′,τ} ≥ i) Val(s)

If T_{σ_i,τ} < i, fewer than i arcs have been crossed, thus fewer than i arcs from A have been crossed when the sink is reached. Hence σ_i acts like σ′ during the whole play, which yields:

v_{σ′,τ}(x) = P(T_{σ_i,τ} < i) Σ_{s ∈ V_S} P^x_{σ_i,τ}(→ s | T_{σ_i,τ} < i) Val(s) + P(i ≤ T_{σ′,τ} < +∞) Σ_{s ∈ V_S} P^x_{σ′,τ}(→ s | +∞ > T_{σ′,τ} ≥ i) Val(s)

We use the same decomposition for v_{σ_i,τ}(x):

v_{σ_i,τ}(x) = P(T_{σ_i,τ} < i) Σ_{s ∈ V_S} P^x_{σ_i,τ}(→ s | T_{σ_i,τ} < i) Val(s) + P(i ≤ T_{σ_i,τ} < +∞) Σ_{s ∈ V_S} P^x_{σ_i,τ}(→ s | T_{σ_i,τ} ≥ i) Val(s)

Since every absorbing vertex of G associated with σ′ has been turned into a sink in G_0, P(T_{σ′,τ} < i) = P(T_{σ_i,τ} < i) converges to 1 as i grows. Hence, both P(i ≤ T_{σ′,τ} < +∞) and P(i ≤ T_{σ_i,τ} < +∞) go to 0 and

lim_{i → +∞} |v_{σ′,τ}(x) − v_{σ_i,τ}(x)| = 0.

Hence, if there were an x such that v_{σ′}(x) < v_{σ_1}(x), denoting ε = v_{σ_1}(x) − v_{σ′}(x), there would be a rank I such that for all i ≥ I, |v_{σ′,τ}(x) − v_{σ_i,τ}(x)| < ε/2, which implies v_{σ_i,τ}(x) < v_{σ_1}(x). We recall that v_{σ_1}(x) ≤ v_{σ_i}(x). This means that v_{σ_i,τ}(x) < v_{σ_i}(x), which contradicts the notion of optimal response against σ_i. Therefore, we have shown that σ′ ≥_G σ_1 >_G σ. ◀

As a consequence of all the previous lemmas, we obtain the correctness of GSIA.
Theorem 22.
GSIA terminates and returns a pair of optimal strategies.
Proof.
We denote by σ_i the max strategy σ at the end of the i-th loop of Algorithm 1. By induction, we prove that the sequence (σ_i) has strictly increasing values. Indeed, Line 5 of Algorithm 1 guarantees that σ′ ≻_{G[A,σ]} σ, thus Prop. 21 implies that σ′ >_G σ, that is, σ_{i+1} > σ_i.

The strategies produced by the algorithm are positional, hence there is only a finite number of them. Since the sequence is strictly increasing, it stops at some point. The algorithm only stops when Line 5 of Algorithm 1 fails to find σ′ ≻_{G[A,σ]} σ. In other words, σ is optimal in G[A,σ]. By Lemma 14, σ is also optimal in G. ◀

We now analyse the algorithmic complexity of GSIA, by lower bounding the values of the sequence of strategies it produces. We obtain a bound on the number of iterations of GSIA depending on the number of random vertices, rather than on the number of max or min vertices. We can then derive the complexity of any instance of GSIA by evaluating the cost of computing σ′ from σ in G[A,σ].

To prove a complexity bound using the values of a strategy, we need to characterise precisely the form of these values. In a 2-SSG, there is a function f(r) such that, for every pair of positional strategies (σ, τ), there is t ≤ f(r) such that for every vertex x there is an integer p_x with v_{σ,τ}(x) = p_x/t. Condon proved in [11] that f(r) ≤ 4^r; this bound was later improved by Auger, Coucheney and Strozecki in [3]. We show that f(r) = q^r for q-SSGs, which gives the improved bound f(r) ≤ 2^r for 2-SSGs.

▶ Theorem 23. Let q ≥ 2 and let G be a q-SSG with r random vertices. Then for any pair of strategies (σ, τ) there is t ≤ q^r such that, for every vertex x, there is an integer s_x with v_{σ,τ}(x) = s_x/t.

In order to prove this result, let us first remark that a q-SSG can be assumed to have all its transition probabilities of the form p/q.

▶ Lemma 24. Let G be a q-SSG. Then there is a q-SSG G′ with the same vertices which defines the same expectations E^x_{σ,τ}(·) and such that, for all x ∈ V_R and all x′ ∈ N⁺(x), there is an integer p_{x,x′} with p_x(x′) = p_{x,x′}/q.

Proof. Let a be a random vertex of G with denominator q_a < q, that is, for every other vertex x of G there is p_x ∈ ℕ such that the probability to go directly from a to x is p_x/q_a. We change each of these probabilities to p_x/q and add a probability p/q to stay in a, where p = q − Σ_{x ∈ V} p_x. The distribution over the successors of a, conditioned on leaving a, is unchanged, hence the expectations are preserved. ◀

Now, we state the classical matrix-tree theorem that we use in our proof (see e.g. [5]). Let G be a directed multigraph with n vertices; the Laplacian matrix of G is the n × n matrix L(G) = (l_{i,j})_{i,j ≤ n} defined by:
(i) for i ≠ j, l_{i,j} equals −m, where m is the number of arcs from i to j;
(ii) l_{i,i} is the number of arcs going to i, excluding the loops.

▶ Theorem 25 (Matrix tree theorem for directed multigraphs [5]). For G = (V, E) a directed multigraph with vertices V = {v_1, …, v_k} and L its Laplacian matrix, the number of spanning trees rooted at v_i is det(L̂_{i,i}), where L̂_{i,i} is the matrix obtained by deleting the i-th row and column from L.

We can now prove Th. 23.
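Th. 25 can be sanity-checked by brute force on a small instance; the following sketch (the five-arc multigraph is a hypothetical example of ours) compares the determinant of the Laplacian minor with direct enumeration of the spanning trees rooted at vertex 0:

```python
from itertools import product

# A hypothetical 3-vertex directed multigraph, given as a list of arcs (u, v).
# Vertex 0 is the root; the arc 0->1 appears twice (multigraph).
arcs = [(0, 1), (0, 1), (0, 2), (1, 2), (2, 1)]
n = 3

# Laplacian as in the text: l[i][j] = -(number of arcs i->j) for i != j,
# l[i][i] = in-degree of i (loops excluded).
L = [[0] * n for _ in range(n)]
for (u, v) in arcs:
    if u != v:
        L[u][v] -= 1
        L[v][v] += 1

def det(m):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

# Minor of L obtained by deleting the row and column of the root (vertex 0).
minor = [[L[i][j] for j in range(n) if j != 0] for i in range(n) if i != 0]

def count_trees():
    """A spanning tree rooted at 0 picks exactly one incoming arc for every
    other vertex, and every vertex must be reachable from the root."""
    count = 0
    choices = [[a for a in range(len(arcs)) if arcs[a][1] == v]
               for v in range(1, n)]
    for pick in product(*choices):
        chosen = [arcs[a] for a in pick]
        reached, frontier = {0}, [0]
        while frontier:
            u = frontier.pop()
            for (x, y) in chosen:
                if x == u and y not in reached:
                    reached.add(y)
                    frontier.append(y)
        if len(reached) == n:
            count += 1
    return count

print(det(minor), count_trees())  # both counts agree: 5 5
```

Both counts agree, as the theorem predicts; the determinant computation is what the proof below exploits.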
Proof of Th. 23.
The beginning of the proof is the same as in [10] and [3]. We consider a q-SSG G and two positional strategies σ and τ. Without loss of generality, we can restrict ourselves to the computation of non-zero, non-sink values; thus each vertex has a non-zero probability to reach the 1-sink. To compute the values v_{σ,τ}, we can consider G_A, an SSG whose vertices are V_R ∪ V_S: the random vertices and the sinks of V. The values of the sinks are not changed, and the probability distribution p′_x is defined as follows. For x ∈ V_R and x′ a vertex of G_A, we call M_{x,x′} the set of max and min vertices y ∈ N⁺(x) such that there is a path from y to x′ following only arcs of σ and τ. We then have

p′_x(x′) = Σ_{y ∈ M_{x,x′}} p_x(y).

The graph G_A has r + 2 vertices that we denote by a_1, …, a_r, a_{r+1}, a_{r+2}, where a_{r+1} is the 0-sink and a_{r+2} is the 1-sink. Let b be the r-dimensional column vector with b_i = p′_{a_i}(a_{r+2}). We define A as the r × r matrix with A_{i,j} = p′_{a_i}(a_j). The values of the random vertices are given by the vector z that satisfies the equation

z = Az + b.

Let I be the identity matrix; (I − A) is invertible because each random vertex has access to a sink and every eigenvalue of A has modulus strictly less than 1 (we refer to [10] for details). Hence the equation has a unique solution, and z is also the solution of

q(I − A) z = qb.

Hence, under the strategies σ, τ, the value z_i of a random vertex a_i is given by Cramer's rule:

z_i = det(B_i) / det(q(I − A)),

where B_i is the matrix q(I − A) in which the i-th column has been replaced by qb. The value det(B_i) is an integer (see [3] for more details). Our goal is now to bound det(q(I − A)).

From the graph G_A, we construct a graph G′ by reversing all arcs and duplicating an arc of probability p/q into p arcs of probability 1/q. We also add an arc coming from the 1-sink to the 0-sink and one from the 0-sink towards the 1-sink. Figure 2 shows an example of the transformation from G_A to G′.

The Laplacian L of G′ then has q(I − A)^T as the block indexed by the random vertices. [Figure 2: example of the transformation of a graph G_A into a graph G′.]

Indeed, every random vertex has in-degree q, minus its number of loops. Thus, by Th. 25, the number of spanning trees of G′ rooted at the 1-sink is equal to det(L̂_{r+2,r+2}), where L̂_{r+2,r+2} contains q(I − A)^T as its block on the random vertices. In other words, the number of spanning trees of G′ rooted at the 1-sink is equal to det(q(I − A)). Furthermore, each spanning tree contains exactly one incoming arc for every random vertex, and the arc (a_{r+2}, a_{r+1}) has to be used. Thus there are at most q^r such spanning trees, and

det(q(I − A)) ≤ q^r. ◀

GSIA produces a sequence of strictly increasing positional max strategies. The number of positional max strategies is bounded by |Σ_max| = Π_{x ∈ V_max} deg(x), hence the number of iterations of GSIA is bounded by this value. In the case of a binary SSG (all vertices of outdegree 2), we obtain the classical bound of |Σ_max| = 2^n iterations. This trivial bound in n is not far from the best known for deterministic algorithms: O(2^n/n) iterations, obtained for the Hoffman-Karp algorithm [24].

We now give a bound for q-SSGs which depends on q and on r, the number of random vertices. By Th. 23, two distinct values attained under the same pair of strategies can be written as a/t and b/t with t ≤ q^r, hence they differ by at least q^{−r}. Therefore, if a value increases in GSIA, it increases by at least q^{−r}. Using the classical notions of switch and anti-switch [24], we can prove that all vertices whose value increases during a step of GSIA see their value increase by at least q^{−r}.

▶ Definition 26.
A switch (resp. an anti-switch) of a max strategy σ with switched set S ⊆ V_max is a strategy σ_S defined by σ_S(x) = σ(x) for x ∉ S, and satisfying v_σ(σ(x)) < v_σ(σ_S(x)) (resp. v_σ(σ(x)) > v_σ(σ_S(x))) for x ∈ S (hence σ_S(x) ≠ σ(x)).

A common tool to solve SSGs is the fact that a switch increases the value of a strategy, while an anti-switch decreases it. Within our framework of transformed games, this is extremely simple to prove.

▶ Lemma 27. If σ_S is a switch of σ, then σ_S > σ. If σ_S is an anti-switch of σ, then σ_S < σ.

Proof. Consider G[A,σ], the game obtained from G where A is the set of all arcs of G. Let x be a vertex switched in σ_S, that is, with v^G_σ(σ(x)) < v^G_σ(σ_S(x)). Then, because all arcs are in A, we have v^{G[A,σ]}_σ(x) = v^G_σ(σ(x)) and v^{G[A,σ]}_{σ_S}(x) = v^G_σ(σ_S(x)). Hence v^{G[A,σ]}_{σ_S}(x) > v^{G[A,σ]}_σ(x), and for the non-switched vertices σ_S(x) = σ(x), which implies σ_S ≻_{G[A,σ]} σ. Prop. 21 then proves σ_S >_G σ.

The proof is the same for an anti-switch, since σ_S ≺_{G[A,σ]} σ ⇒ σ_S <_G σ (which can be proved similarly to Prop. 21, while keeping in mind that in the decreasing case, creating an absorbing set lowers the value). ◀

▶ Theorem 28.
For G a q-SSG with r random vertices and n max vertices, the number of iterations of GSIA is at most nq^r.

Proof. Let σ be the strategy computed at some point by GSIA and σ′ the next strategy. By Prop. 21, σ < σ′. Hence, by Lemma 27, σ′ cannot be an anti-switch of σ. Thus there is a max vertex x such that v_σ(σ(x)) < v_σ(σ′(x)).

Since σ < σ′, we have v_σ(x) = v_σ(σ(x)) < v_σ(σ′(x)) ≤ v_{σ′}(σ′(x)) = v_{σ′}(x). We now evaluate v_σ(σ′(x)) − v_σ(σ(x)). In the game G, under the strategies σ, τ(σ), Th. 23 implies that for some t ≤ q^r, v_σ(σ(x)) = p/t and v_σ(σ′(x)) = p′/t. We have p/t < p′/t, thus p′/t − p/t ≥ 1/t ≥ 1/q^r. Hence the value of some max vertex increases by at least 1/q^r in each iteration of GSIA. Since there are n max vertices and their values are bounded by 1, there are at most nq^r iterations. ◀

The complexity of GSIA is the number of iterations given by Th. 28, multiplied by the complexity of an iteration. In an iteration, there are two sources of complexity: constructing the game G[A,σ] and finding an improving strategy σ′ in G[A,σ]. To construct the game, v_σ is computed by solving a linear program of size m up to precision p = q^r. Let C(m, p) be the complexity of computing v_σ; the best bound is currently in O(m^ω log(p)) [18], with ω the current best bound on the matrix multiplication exponent. Let C′(n, r, q) be the complexity of computing σ′; the total complexity of GSIA is then in O(nq^r (C(n + r, q^r) + C′(n, r, q))).

We obtain a better complexity when C′(n, r, q) = O(C(n, q^r) r/n), which is the case for most considered instances of GSIA. The number of iterations is only rq^r if we can guarantee that a random vertex increases its value at each step. When no random vertex is improved, the cost of computing G[A,σ] can be shown to be smaller, which yields the following theorem.

▶ Theorem 29. Let G be a q-SSG with r random vertices and n max vertices. If C′(n, r, q) = O(C(n, q^r) r/n), then the complexity of GSIA is in O(rq^r C(n, q^r)).

Proof. We assume that r < n, otherwise the theorem is trivial. Let σ′ be the strategy computed by GSIA at some point, improving on the strategy σ. GSIA must compute G[A, σ′], and thus v_{σ′}, and we explain how to do so efficiently.

Assume first that the order of the values (in G) of the random vertices is the same for σ and σ′. Then, knowing this order and σ′, it is easy to compute τ(σ′), a best response to σ′, in quasi-linear time [1]. Then we can compute the values v_{σ′,τ(σ′)} in time O(C(r, q^r)), since it is done by solving a linear system of dimension r with precision q^{−r}, a task which is simpler than solving a linear program. Since C(r, q^r) is at least quadratic in r, we have C(r, q^r) < C(n, q^r) r/n, and by hypothesis C′(n, r, q) = O(C(n, q^r) r/n); hence such a step has complexity at most O(C(n, q^r) r/n). There are at most nq^r such steps, for a total complexity of O(rq^r C(n, q^r)).

We need to detect when the assumption that the order of the values of the random vertices is the same for σ and σ′ fails. If v_{σ′,τ(σ′)} satisfies the optimality conditions at the min vertices, then τ(σ′) is a best response. Otherwise, we compute the best response by solving a linear program in time C(n, q^r). In that case, the order of the random vertices has changed: there are two random vertices x and x′ such that v_σ(x) < v_σ(x′) and v_{σ′}(x) > v_{σ′}(x′). Hence v_{σ′}(x) > v_σ(x′), which implies v_{σ′}(x) − v_σ(x) > v_σ(x′) − v_σ(x) ≥ q^{−r}.

We have proved that when the order of the random vertices changes, the value of some random vertex increases by at least q^{−r}, hence there are at most rq^r such steps. The complexity of these steps is bounded by O(rq^r C(n, q^r)), which proves the theorem. ◀

As previously mentioned, all known strategy iteration algorithms can be viewed as particular instances of GSIA. This includes e.g. switch-based algorithms, like the Hoffman-Karp algorithm [10, 24] or Ludwig's recursive algorithm [21].
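The denominator bound of Th. 23, which drives the q^{−r} increments above, can be observed numerically. Below is a minimal sketch solving z = Az + b over the rationals for a hypothetical 2-SSG with two random vertices (the transition probabilities are toy numbers of ours):

```python
from fractions import Fraction

# A toy 2-SSG (q = 2) with two random vertices and no max/min vertices:
#   a1 -> a2 with prob 1/2, 0-sink with prob 1/2
#   a2 -> a1 with prob 1/2, 1-sink with prob 1/2
# Values of the random vertices satisfy z = A z + b with:
q, r = 2, 2
A = [[Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(0)]]
b = [Fraction(0), Fraction(1, 2)]

def solve(A, b):
    """Gaussian elimination over the rationals for (I - A) z = b."""
    n = len(b)
    M = [[Fraction(int(i == j)) - A[i][j] for j in range(n)] + [b[i]]
         for i in range(n)]
    for col in range(n):
        piv = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                M[i] = [x - M[i][col] * y for x, y in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]

z = solve(A, b)
print(z)  # [Fraction(1, 3), Fraction(2, 3)]
# Common denominator t = 3 <= q^r = 4, as Th. 23 guarantees.
assert all(v.denominator <= q ** r for v in z)
```

Here the values 1/3 and 2/3 share the denominator 3 ≤ q^r = 4, so any strict improvement of a value in this game is by at least 1/4, matching the granularity used in Th. 28.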
With the help of GSIA it also becomes very easy to derive new algorithms, by transforming the game into polynomial-time solvable instances, such as almost acyclic games [3]. We detail all these old and new algorithms in the last section of this paper.

In this section, we focus instead on two particular instances (or families of instances) of GSIA, for which we obtain new complexity bounds using the results of the previous sections.

In [10], Condon first presents a faulty algorithm (the Naive Converge From Below Algorithm) and then a correct modified version, the Converge From Below (CFB) Algorithm. This algorithm proceeds by improving a value vector iteratively, but it is in fact a strategy improvement algorithm that can be seen as an instance of GSIA. This gives us a proof of convergence of the CFB algorithm in the general non-stopping case (whereas Condon assumes in her proof that the game is stopping) and also a bound on the number of iterations (none is given in the original paper).

The CFB algorithm is restated with some clarifications as Algorithm 2 (we omit the details of the linear program, see [10]). The algorithm uses two properties of a vector, which we now define. First, a vector v is feasible if
(i) for s ∈ V_S, v(s) = Val(s);
(ii) for x ∈ V_R, v(x) = Σ_{y ∈ N⁺(x)} p_x(y) v(y);
(iii) for x ∈ V_min, v(x) ≤ min_{y ∈ N⁺(x)} v(y);
(iv) for x ∈ V_max, v(x) ≥ max_{y ∈ N⁺(x)} v(y).
A feasible vector is stable at a min vertex (resp. max vertex) x if it satisfies condition (iii) (resp. condition (iv)) of feasibility for x with equality. The main loop of the algorithm consists of two parts:

first, "compute the value vector v_r of an optimal response to the max strategy that plays greedily according to v" is precisely what we do in GSIA when we pull back our strategy σ′ ≻ σ from G[A,σ] and compute a best response; here v_r is the value vector of σ′;

second, "update v as the feasible vector where all min vertices x have value v_r(x) and all max vertices are stable" amounts to fixing all min vertices by defining A as the set of arcs entering min vertices, considering the game G[A,σ], and computing an optimal strategy for max in this game.

Hence, the CFB algorithm is the instance of GSIA where all min vertices are fixed, except that the two steps of the algorithm are reversed in the loop. The initial vector in the CFB algorithm is the value vector of an optimal max strategy in G[A, f], where A is the set of arcs entering min vertices, and f is the zero vector. Here f may not correspond to the actual values of a max strategy σ in G, but after the first loop of the algorithm this is the case for all remaining steps.

Algorithm 2
Converge From Below Algorithm
Data: G an SSG
Result: the optimal value vector v* of G
begin
· let v be a feasible vector in which all min vertices have value 0 and all max vertices are stable
· while v is not an optimal value vector do
  · use linear programming to compute the value vector v_r of an optimal response to the max strategy that plays greedily according to v
  · update v as the feasible vector where all min vertices x have value v_r(x) and all max vertices are stable
· return v

The strategy improvement algorithm proposed by Gimbert and Horn in [14] (denoted by GHA) can be viewed as an instance of GSIA where the set A of fixed arcs is the set R of all arcs going out of random vertices, and the improvement step in the subgame G[R,σ] consists in taking an optimal strategy. In this case, the subgame G[R,σ] is deterministic (random vertices are connected to sinks only and can be replaced by sinks), hence optimal values in G[R,σ] depend only on the relative ordering of the values v_σ(x) for sink and random vertices x of G. These values can be computed in quasi-linear time [1]. In the original paper [14], the algorithm is proposed in a context where the number of sinks is two, but we generalise their definitions to our context.

Consider a total ordering f on V_R ∪ V_S, f: x_1 < x_2 < ⋯ < x_{r+s}, where s is the number of sinks. An f-strategy corresponding to this ordering is an optimal max strategy in the game where the s + r vertices above are replaced by sinks with new values satisfying Val(x_1) < Val(x_2) < ⋯ < Val(x_{r+s}). Clearly, this strategy does not depend on the actual values chosen but only on f. Note that if several f-strategies exist for a given f, they share the same values on all vertices.

Algorithm GHA produces an improving sequence of f-strategies, and the two sinks of value zero and one are always first and last in the order, hence its number of iterations is bounded by r!, the total number of possible orderings of the random nodes.
We extend this result to a large class of instances of GSIA: let us call Optimal-GSIA (OGSIA) the meta-algorithm obtained from Algorithm 1 with two additional constraints:
· the set A of fixed arcs is the same at each step of Algorithm 1;
· at Line 5, the improving strategy σ′ is an optimal strategy in G[A,σ].
All classical algorithms captured by GSIA, as well as the new ones presented in this article, are in fact instances of OGSIA. We now show that OGSIA has an iteration bound similar to GHA's. Since we have proved a bound of nq^r iterations in Th. 28, OGSIA has essentially the best known number of iterations, for q small and large (the latter being interesting in the case of random vertices with large degree and arbitrary probability distributions).

▶ Theorem 30. Consider an SSG G and a set of arcs A containing k arcs out of max or min vertices. Then Algorithm OGSIA runs in at most min((r + k) q^r, (r + k)!) iterations.

Proof. Let σ be one of the iterated max strategies obtained by an instance of OGSIA, and let σ′ be an optimal strategy in G[A,σ]. Then σ′ is an f-strategy in G[A,σ], where f is the ordering on V_R ∪ V_A (where V_A is the set of A-sinks) induced by the value vector v^{G[A,σ]}_{σ′} (if vertices have the same value, just decide their relative ordering in f arbitrarily).

Since the strategies produced by the algorithm strictly increase in value by Prop. 21, they must all be distinct. Hence the ordering f must be distinct at each step of the algorithm, which proves that OGSIA does at most (r + k)! iterations.

Moreover, at every step the value in G of at least one vertex of V_R ∪ V_A must improve, by at least q^{−r} because of Th. 23. Since the values of these vertices are bounded by 1, the number of iterations of OGSIA is bounded by (r + k) q^r. ◀

Th. 30 gives a competitive bound on the number of iterations for a strategy improvement algorithm, but the algorithmic complexity of an instance of OGSIA also depends on how we find an optimal strategy in G[A,σ]. We study a class of instances of OGSIA generalising GHA, with two interesting properties: there is a simple sufficient condition which guarantees that G[A,σ] is solvable in polynomial time, and they can be precisely compared to Ibsen-Jensen and Miltersen's algorithm (denoted by IJMA) [17], which is the current best deterministic algorithm for 2-SSGs.

First, as noted in Sec. 5.2, recall that Optimal-GSIA with R as the fixed set of arcs is equivalent to Gimbert and Horn's algorithm (GHA). The meta-algorithm GGHA (for Generalised Gimbert-Horn Algorithm) is any instance of OGSIA with A ⊆ R. By Th. 30, GGHA needs at most min(rq^r, r!) iterations. Let us call the stopping factor of a random node x the sum of the probabilities of the arcs going out of x into a sink. The stopping factor of the game is the minimum of the stopping factors of its random nodes.
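Computing stopping factors is straightforward; a minimal sketch under a hypothetical encoding of ours (prob maps each random vertex to its successor distribution, sinks is the set of sink vertices):

```python
from fractions import Fraction

def stopping_factor(prob, sinks):
    """Smallest, over the random vertices, probability mass sent directly
    into a sink: the quantity relevant to the discount argument below."""
    return min(sum(p for y, p in dist.items() if y in sinks)
               for dist in prob.values())

# Toy game with two random vertices (q = 4).
prob = {
    "a1": {"a2": Fraction(3, 4), "sink1": Fraction(1, 4)},
    "a2": {"a1": Fraction(1, 2), "sink0": Fraction(1, 2)},
}
print(stopping_factor(prob, {"sink0", "sink1"}))  # 1/4
```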
The game G[R,σ] has a stopping factor of 1, which means that the game stops as soon as a random node is encountered, and the length of a play is at most n. If the choice of A yields a game G[A,σ] with stopping factor at least 1/q̃, then the game can be seen as having a discount factor of at most (q̃ − 1)/q̃, and it can be solved in time polynomial in n and q̃ using the Hoffman-Karp algorithm [16]. Remark that there are exponentially many choices of A ⊆ R for which the stopping factor is more than 1/n, e.g. by requiring that, for each random vertex, its outgoing arc of largest probability is in A. When the stopping factor is more than 1/n, G[A,σ] can be solved in time polynomial in n.

Let us describe IJMA, restated in our framework and generalised to q-SSGs. IJMA is not a strategy improvement algorithm but a value iteration algorithm, which keeps a vector of values for the random vertices, here denoted by v^{IJMA}_i at step i. This vector is updated in the following way:

first, an optimal f-strategy is computed in the deterministic game G[R, v^{IJMA}_i], and we denote the values of this game by v̂^{IJMA}_i (remember that R is the set of arcs going out of random vertices in G);

second, the vector is updated on every random vertex x by

v^{IJMA}_{i+1}(x) = Σ_{y ∈ N⁺(x)} p_x(y) · v̂^{IJMA}_i(y).

As can be seen, IJMA has an almost linear update complexity, since G[R,σ] is a deterministic game and can be solved in quasi-linear time. The time needed by one iteration of an instance of GGHA is polynomial if A is chosen so that the stopping factor is larger than 1/n, but since a best response must be computed, it is at least in O(rn^ω). We now prove that the advantage of this more expensive iteration is that any instance of GGHA converges in fewer iterations than IJMA on all instances.

▶ Theorem 31.
GGHA needs fewer iterations than IJMA to find the optimal values on any input.
Proof.
We denote by σ_i the strategy obtained after i steps of an instance of GGHA, where A satisfies the conditions defined above. We prove by induction on i that v_{σ_i} ≥ v^{IJMA}_i (on random vertices). In IJMA, the value vector is initialised to 0 at the first step, hence any choice of initial strategy for GGHA guarantees a larger value on random vertices and satisfies the induction hypothesis.

Now assume that v_{σ_i} ≥ v^{IJMA}_i for some i. First, we have

v_{σ_{i+1}} = v^{G[A,σ_{i+1}]}_{σ_{i+1}}   (1)

by Lemma 12, and

v^{G[A,σ_{i+1}]}_{σ_{i+1}} ≥ v^{G[A,σ_i]}_{σ_{i+1}}   (2)

since σ_{i+1} > σ_i by Proposition 21. Now v^{G[A,σ_i]}_{σ_{i+1}} is, by definition of OGSIA, an optimal value vector of G[A,σ_i]; but optimal values are larger in G[A,σ_i] than in G[R,σ_i] since A ⊆ R, and on the other hand, optimal values of G[R,σ_i] are larger than those of G[R, v^{IJMA}_i], using Lemma 20 and the induction hypothesis, the latter values being v̂^{IJMA}_i by definition of IJMA. Putting these together, we have proved that

v_{σ_{i+1}} ≥ v̂^{IJMA}_i.   (3)

Consider now a random vertex x. By the optimality conditions of Lemma 8 for v_{σ_{i+1}} and the definition of v^{IJMA}_{i+1}(x), we see that

v_{σ_{i+1}}(x) = Σ_{y ∈ N⁺(x)} p_x(y) v_{σ_{i+1}}(y) ≥ Σ_{y ∈ N⁺(x)} p_x(y) v̂^{IJMA}_i(y) = v^{IJMA}_{i+1}(x).

This concludes the induction and the result follows. ◀
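On an input with only random vertices (the shape of the extremal instance for IJMA discussed below), the f-strategy step is trivial and the update reduces to v ← Av + b. A sketch with toy numbers of ours:

```python
# IJMA-style value propagation on a toy game with only random vertices:
#   a1 -> a2 with prob 1/2, 0-sink with prob 1/2
#   a2 -> a1 with prob 1/2, 1-sink with prob 1/2
# With no max/min vertices, each iteration is just v <- A v + b; the
# fixpoint is the optimal value vector (1/3, 2/3).
def ijma_step(v):
    v1, v2 = v
    return (0.5 * v2, 0.5 * v1 + 0.5)

v = (0.0, 0.0)
for _ in range(60):
    v = ijma_step(v)
print(v)  # close to (1/3, 2/3)
```

The error shrinks by a factor 1/2 per step, so value propagation needs many iterations to separate values that differ by q^{−r}, whereas a single best-response computation solves this instance exactly.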
We have proved that on every game, GGHA algorithms always make fewer iterations than IJMA; but as previously stated, they can need dramatically fewer. Indeed, the analysis of IJMA [17] relies on finding an extremal input for the algorithm, which happens to have no max nor min vertices. This extremal input is solved in one iteration of GGHA thanks to the best-response step, and GGHA is then exponentially faster. We have no result yet quantifying how much faster GGHA is in the general case, but we suspect the number of iterations of GGHA to be much smaller in many cases.

To go further in that direction, we can also propose a hybrid version of GGHA and IJMA, by computing a best response every rn^{ω−1} steps of IJMA rather than doing a value propagation. This hybrid version enjoys the same complexity as IJMA, since the cost of computing the best response is amortised over enough steps, and the proof of Th. 31 shows that it needs fewer iterations than IJMA, and exponentially fewer on the extremal input.

We now show that all known strategy improvement algorithms can be expressed as instances of GSIA, and we also propose several new algorithms, derived from choices of A which make the transformed game solvable in polynomial time. The only algorithms which are not instances of GSIA are based on values rather than strategies: value propagation [12, 17], quadratic programming [12, 20] and dichotomy [3].

The most classical method to solve an SSG, called the Hoffman-Karp algorithm, repeatedly applies switches to the strategy until finding the optimal one. It is also a generic algorithm, since neither the set of vertices to switch at each step nor the initial strategy is specified. Many details on these algorithms can be found in [10] or [24]. Hoffman-Karp algorithms are instances of GSIA, where A is the set of all arcs of the SSG. Indeed, as proved in Lemma 27, a switch σ_S of σ satisfies σ_S ≻_{G[A,σ]} σ.
Interpreting Hoffman-Karp algorithms as instances of GSIA proves that they work on non-stopping games, while in most articles the stopping condition is required. Moreover, it shows that their number of iterations is O(nq^r) on q-SSGs, a complexity exponential in r only, which was previously known only for algorithms specially designed for this purpose [15, 13, 17, 4].

Ludwig's algorithm [21], which is the best randomised algorithm to solve SSGs, can be seen as a Hoffman-Karp algorithm using Bland's rule, as shown in [4]: a random order on the vertices is drawn, and at each step the first switchable vertex in this order is switched. Two other Hoffman-Karp algorithms are presented in [24]: switching all switchable vertices at each step, or switching a random subset. Seeing these three algorithms as instances of GSIA yields O(nq^r) as a deterministic bound on their number of iterations, which was unknown. However, the analysis of [21, 24] is required to obtain a good complexity in n for these algorithms.

In [13], Dai and Ge give a randomised improvement of GHA simply by choosing a better initial strategy. To do so, they draw √(r! log(r!)) strategies at random and keep the one with the highest value. This ensures, with high probability, that at most O(√(r!)) iterations of GHA are performed. This algorithm is also captured by GSIA by selecting the initial strategy in the same way; however, it seems hard to combine the gain made by the random selection of the initial strategy with the bound in O(q^r) of GSIA, since even a strategy close to the optimal one may have values far from it. Remark that it is trivial to extend this method to any instance of OGSIA to improve on the complexity of Th. 30.

We can use GSIA to design many strategy improvement algorithms. We present three of them, all based on a choice of A which makes G[A,σ] solvable in polynomial time. The initial strategy can be anything, and σ′ is always chosen to be an optimal strategy in G[A,σ]. Most of them can be seen as generalisations of known algorithms.

· Let A be a feedback arc set of G; then G[A,σ] is acyclic and can be solved in linear time. It seems intuitively appealing to think that this algorithm will be faster if the feedback arc set is small, but we have no proof to sustain such a proposition.

· A max-acyclic SSG is an SSG such that every max vertex has at most one outgoing arc in a cycle. Max-acyclic SSGs can be solved in polynomial time, see [3]. If we let A be a set of arcs that contains all but one of the outgoing arcs of each max vertex, then G[A,σ] is max-acyclic and can be solved in polynomial time. Moreover, such a game can be solved by strategy iteration in at most n iterations. This can be seen as a generalisation of the Hoffman-Karp algorithm, in which A contains all outgoing arcs of max vertices.

· As an intermediate between acyclic games and max-acyclic games, we may consider almost acyclic games, where all vertices have at most one outgoing arc in a cycle. Almost acyclic SSGs can be solved in linear time [3].
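When A is a feedback arc set, G[A,σ] is acyclic and can be solved by backward induction in reverse topological order; a minimal sketch under a hypothetical encoding of ours (the names kind, succ, dist and val are not from the paper):

```python
# Backward induction on an acyclic SSG, processed in reverse topological order.
# kind[x] in {"max", "min", "random", "sink"}; succ[x] lists successors;
# dist[x] gives the probability distribution of a random vertex; val[x] is
# the fixed value of a sink (an A-sink would keep the value v_sigma of the
# arc it replaces).
def solve_acyclic(order, kind, succ, dist, val):
    v = {}
    for x in reversed(order):  # sinks are processed first
        if kind[x] == "sink":
            v[x] = val[x]
        elif kind[x] == "max":
            v[x] = max(v[y] for y in succ[x])
        elif kind[x] == "min":
            v[x] = min(v[y] for y in succ[x])
        else:  # random
            v[x] = sum(p * v[y] for y, p in dist[x].items())
    return v

# Toy acyclic game: a max vertex m chooses between a random vertex a and a sink.
order = ["m", "a", "s0", "s1"]
kind = {"m": "max", "a": "random", "s0": "sink", "s1": "sink"}
succ = {"m": ["a", "s0"], "a": []}
dist = {"a": {"s0": 0.5, "s1": 0.5}}
val = {"s0": 0.25, "s1": 1.0}
v = solve_acyclic(order, kind, succ, dist, val)
print(v["m"])  # 0.625: the max vertex prefers the random vertex
```

Each vertex is processed once, so the computation is linear in the size of the game, consistent with the linear-time claims above.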
References
[1] Daniel Andersson, Kristoffer Arnsfelt Hansen, Peter Bro Miltersen, and Troels Bjerre Sørensen. Deterministic graphical games revisited. In Conference on Computability in Europe, pages 1–10. Springer, 2008.
[2] Daniel Andersson and Peter Bro Miltersen. The complexity of solving stochastic games on graphs. In International Symposium on Algorithms and Computation, pages 112–121, 2009.
[3] David Auger, Pierre Coucheney, and Yann Strozecki. Finding optimal strategies of almost acyclic simple stochastic games. In International Conference on Theory and Applications of Models of Computation, pages 67–85, 2014.
[4] David Auger, Pierre Coucheney, and Yann Strozecki. Solving simple stochastic games with few random nodes faster using Bland's rule. In , pages 9:1–9:16, 2019.
[5] Seth Chaiken and Daniel J. Kleitman. Matrix tree theorems. Journal of Combinatorial Theory, Series A, 24(3):377–381, 1978.
[6] Krishnendu Chatterjee, Luca de Alfaro, and Thomas A. Henzinger. Strategy improvement for concurrent reachability and turn-based stochastic safety games. Journal of Computer and System Sciences, 79(5):640–657, 2013.
[7] Krishnendu Chatterjee and Nathanaël Fijalkow. A reduction from parity games to simple stochastic games. Electronic Proceedings in Theoretical Computer Science, 54, 2011.
[8] Krishnendu Chatterjee and Thomas A. Henzinger. Value Iteration, pages 107–138. 2008.
[9] Taolue Chen, Marta Kwiatkowska, Aistis Simaitis, and Clemens Wiltsche. Synthesis for multi-objective stochastic games: An application to autonomous urban driving. In Quantitative Evaluation of Systems, pages 322–337, 2013.
[10] Anne Condon. On algorithms for simple stochastic games. pages 51–72, 1990.
[11] Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992.
[12] Anne Condon. On algorithms for simple stochastic games. Advances in Computational Complexity Theory, 13:51–73, 1993.
[13] Decheng Dai and Rong Ge. New results on simple stochastic games. In International Symposium on Algorithms and Computation, pages 1014–1023. Springer, 2009.
[14] Hugo Gimbert and Florian Horn. Simple stochastic games with few random vertices are easy to solve. In Foundations of Software Science and Computational Structures, pages 5–19. Springer, 2008.
[15] Hugo Gimbert and Florian Horn. Solving simple stochastic games with few random vertices. Logical Methods in Computer Science, Volume 5, Issue 2, 2009.
[16] Thomas Dueholm Hansen, Peter Bro Miltersen, and Uri Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. Journal of the ACM (JACM), 60(1):1–16, 2013.
[17] Rasmus Ibsen-Jensen and Peter Bro Miltersen. Solving simple stochastic games with few coin toss positions. In European Symposium on Algorithms, pages 636–647. Springer, 2012.
[18] Shunhua Jiang, Zhao Song, Omri Weinstein, and Hengjie Zhang. Faster dynamic matrix inverse for faster LPs. arXiv preprint arXiv:2004.07470, 2020.
[19] Brendan Juba. On the hardness of simple stochastic games. Master's thesis, CMU, 2005.
[20] Jan Křetínský, Emanuel Ramneantu, Alexander Slivinskiy, and Maximilian Weininger. Comparison of algorithms for simple stochastic games. arXiv preprint arXiv:2009.10882, 2020.
[21] Walter Ludwig. A subexponential randomized algorithm for the simple stochastic game problem. Information and Computation, 117(1):151–155, 1995.
[22] L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
[23] C. Stirling. Bisimulation, modal logic and model checking games. Logic Journal of the IGPL, 7(1):103–124, 1999.
[24] Rahul Tripathi, Elena Valkanova, and V. S. Anil Kumar. On strategy improvement algorithms for simple stochastic games.