On Satisficing in Quantitative Games
Suguman Bansal, Krishnendu Chatterjee, and Moshe Y. Vardi
University of Pennsylvania, Philadelphia, USA, [email protected]
IST Austria, Klosterneuburg, Austria, [email protected]
Rice University, Houston, USA, [email protected]
Abstract.
Several problems in planning and reactive synthesis can be reduced to the analysis of two-player quantitative graph games. Optimization is one form of analysis. We argue that in many cases it may be better to replace the optimization problem with the satisficing problem, where instead of searching for optimal solutions, the goal is to search for solutions that adhere to a given threshold bound.

This work defines and investigates the satisficing problem on a two-player graph game with the discounted-sum cost model. We show that while the satisficing problem can be solved using numerical methods just like the optimization problem, this approach does not render compelling benefits over optimization. When the discount factor is, however, an integer, we present another approach to satisficing, which is purely based on automata methods. We show that this approach is algorithmically more performant – both theoretically and empirically – and demonstrates the broader applicability of satisficing over optimization.
Quantitative properties of systems are increasingly being explored in automated reasoning [4,14,16,20,21,26]. In decision-making domains such as planning and reactive synthesis, quantitative properties have been deployed to describe soft constraints such as quality measures [11], cost and resources [18,22], rewards [31], and the like. Since these constraints are soft, it suffices to generate solutions that are good enough w.r.t. the quantitative property.

Existing approaches to the analysis of quantitative properties have, however, primarily focused on optimization of these constraints, i.e., on generating optimal solutions. We argue that there may be disadvantages to searching for optimal solutions where good-enough ones suffice. First, optimization may be more expensive than searching for good-enough solutions. Second, optimization restricts the search space of possible solutions, and thus could limit the broader applicability of the resulting solutions. For instance, to generate solutions that operate within battery life, it is too restrictive to search for solutions with minimal battery consumption. Besides, solutions with minimal battery consumption may be limited in their applicability, since they may not satisfy other goals, such as desirable temporal tasks.

To this end, this work focuses on directly searching for good-enough solutions. We propose an alternate form of analysis of quantitative properties in which the objective is to search for a solution that adheres to a given threshold bound, possibly derived from a physical constraint such as battery life.
We call this the satisficing problem, a term popularized by H. A. Simon in economics to mean satisfy and suffice, implying a search for good-enough solutions [1]. Through theoretical and empirical investigation, we make the case that satisficing is algorithmically more performant than optimization and, further, that satisficing solutions may have broader applicability than optimal solutions.

This work formulates and investigates the satisficing problem on two-player, finite-state games with the discounted-sum (DS) cost model, which is a standard cost model in decision-making domains [24,25,28]. In these games, players take turns to pass a token along the transition relation between the states. As the token is pushed around, the play accumulates costs along the transitions using the DS cost model. The players are assumed to have opposing objectives: one player maximizes the cost, while the other player minimizes it. We define the satisficing problem as follows: Given a threshold value v ∈ Q, does there exist a strategy for the minimizing (or maximizing) player that ensures the cost of all resulting plays is strictly or non-strictly lower (or greater) than the threshold v?

Clearly, the satisficing problem is decidable, since the optimization problem on these quantitative games is known to be solvable in pseudo-polynomial time [17,23,32]. To design an algorithm for satisficing, we first adapt the celebrated value-iteration (VI) based algorithm for optimization [32]. The resulting algorithm, VISatisfice, displays the same complexity as optimization and hence renders no complexity-theoretic advantage. To obtain its worst-case complexity, we perform a thorough worst-case analysis of VI for optimization. It is interesting that a thorough analysis of VI for optimization had hitherto been absent from the literature, despite the popularity of VI.

To address this gap, we first prove that VI should be executed for Θ(|V|) iterations to compute the optimal value, where V and E refer to the sets of states and transitions in the quantitative game. Next, to compute the overall complexity, we take into account the cost of arithmetic operations as well, since they appear in abundance in VI. We demonstrate an orders-of-magnitude difference between the complexity of VI under different cost models of arithmetic. For instance, for integer discount factors, we show that VI is O(|V|·|E|) and O(|V|²·|E|) under the unit-cost and bit-cost models of arithmetic, respectively. Clearly, this shows that VI for optimization, and hence VISatisfice, does not scale to large quantitative games.

We then present a purely automata-based approach for satisficing, which, for integer discount factors, runs in time linear in |V| + |E|. This shows that there is a fundamental separation in complexity between satisficing and VI-based optimization, as even the lower bound on the number of iterations in VI is higher. In this approach, the satisficing problem is reduced to solving a safety or reachability game. Our core observation is that the criterion to fulfil satisficing with respect to threshold value v ∈ Q can be expressed as membership in an automaton that accepts a weight sequence A iff DS(A, d) R v holds, where d > 1 is the discount factor and R ∈ {≤, ≥, <, >}. In existing literature, such automata are called comparator automata (comparators, in short) when the threshold value v = 0 [6,7]. They are known to have a compact safety or co-safety automaton representation [9,19], which could be used to reduce the satisficing problem with zero threshold value. To solve satisficing for arbitrary threshold values v ∈ Q, we extend existing results on comparators to permit arbitrary but fixed threshold values v ∈ Q.

An empirical comparison between the performance of VISatisfice, VI for optimization, and the automata-based solution for satisficing shows that the latter outperforms the others in efficiency, scalability, and robustness. In addition to improved algorithmic performance, we demonstrate that satisficing solutions have broader applicability than optimal ones.

Reachability and safety games.
Both reachability and safety games are defined over the structure G = (V = V₀ ⊎ V₁, v_init, E, F) [30]. It consists of a directed graph (V, E) and a partition (V₀, V₁) of its states V. State v_init is the initial state of the game. The set of successors of state v is designated by vE. For convenience, we assume that every state has at least one outgoing edge, i.e., vE ≠ ∅ for all v ∈ V. F ⊆ V is a non-empty set of states; F is referred to as the set of accepting and rejecting states in reachability and safety games, respectively.

A play of a game involves two players, denoted by P₀ and P₁, who create an infinite path by moving a token along the transitions as follows: At the beginning, the token is at the initial state. If the current position v belongs to V_i, then P_i chooses the successor state from vE. Formally, a play ρ = v₀v₁v₂… is an infinite sequence of states such that the first state v₀ = v_init, and each pair of successive states is a transition, i.e., (v_k, v_{k+1}) ∈ E for all k ≥ 0. A play is winning for player P₀ in a reachability game if it visits an accepting state, and winning for player P₁ otherwise. The opposite holds in safety games, i.e., a play is winning for player P₀ if it does not visit any rejecting state, and winning for P₁ otherwise.

A strategy for a player is a recipe that guides the player on which state to go to next based on the history of the play. A strategy is winning for a player P_i if for all strategies of the opponent player P_{1−i}, the resulting plays are winning for P_i. To solve a graph game means to determine whether there exists a winning strategy for player P₀. Reachability and safety games are solved in O(|V| + |E|) time.

Quantitative graph games. A quantitative graph game (or quantitative game, in short) is defined over a structure G = (V = V₀ ⊎ V₁, v_init, E, γ). V, V₀, V₁, v_init, E, plays, and strategies are defined as earlier. Each transition of the game is associated with a cost determined by the cost function γ : E → Z. The cost sequence of a play ρ is the sequence of costs w₀w₁w₂… such that w_k = γ((v_k, v_{k+1})) for all k ≥ 0. Given a discount factor d > 1, the cost of play ρ, denoted wt(ρ), is the discounted sum of its cost sequence, i.e., wt(ρ) = DS(ρ, d) = w₀ + w₁/d + w₂/d² + ⋯.

Büchi automata. A Büchi automaton is a tuple A = (S, Σ, δ, s_I, F), where S is a finite set of states, Σ is a finite input alphabet, δ ⊆ (S × Σ × S) is the transition relation, state s_I ∈ S is the initial state, and F ⊆ S is the set of accepting states [30]. A Büchi automaton is deterministic if for all states s and inputs a, |{s′ | (s, a, s′) ∈ δ}| ≤ 1. For a word w = w₀w₁⋯ ∈ Σ^ω, a run ρ of w is a sequence of states s₀s₁… s.t. s₀ = s_I and τ_i = (s_i, w_i, s_{i+1}) ∈ δ for all i. Let inf(ρ) denote the set of states that occur infinitely often in run ρ. A run ρ is an accepting run if inf(ρ) ∩ F ≠ ∅. A word w is an accepting word if it has an accepting run. The language of Büchi automaton A is the set of all words accepted by A. Languages accepted by Büchi automata are called ω-regular.

Safety and co-safety languages.
Let L ⊆ Σ^ω be a language over alphabet Σ. A finite word w ∈ Σ* is a bad prefix for L if for all infinite words y ∈ Σ^ω, w·y ∉ L. A language L is a safety language if every word w ∉ L has a bad prefix for L [3]. A co-safety language is the complement of a safety language [19]. Safety and co-safety languages that are ω-regular are represented by specialized Büchi automata called safety and co-safety automata, respectively.

Comparison language and comparator automata. Given an integer upper bound μ > 0, discount factor d > 1, and relation R ∈ {<, >, ≤, ≥, =, ≠}, the comparison language with upper bound μ, relation R, and discount factor d is the language of words over the alphabet Σ = {−μ, …, μ} that accepts A ∈ Σ^ω iff DS(A, d) R 0 holds. The comparator automaton with upper bound μ, relation R, and discount factor d is the automaton that accepts the corresponding comparison language [6]. Depending on R, these languages are safety or co-safety [9]. A comparison language is said to be ω-regular if its automaton is a Büchi automaton. Comparison languages are ω-regular iff the discount factor is an integer [7].

This section shows that there are no complexity-theoretic benefits to solving the satisficing problem via algorithms for the optimization problem. While prior works claim without proof that the value-iteration algorithm runs in pseudo-polynomial time [32], its worst-case analysis is absent from the literature. This section presents a detailed account of the said analysis, and exposes the dependence of VI's worst-case complexity on the discount factor d > 1. It then presents the VI-based algorithm for satisficing, VISatisfice.

Definition 1 (Satisficing problem). Given a quantitative graph game G and a threshold value v ∈ Q, the satisficing problem is to determine whether the minimizing (or maximizing) player has a strategy that ensures the cost of all resulting plays is strictly or non-strictly lower (or greater) than the threshold v.

The satisficing problem can clearly be solved by solving the optimization problem. The optimal cost of a quantitative game is that value such that the maximizing and minimizing players can guarantee that the cost of plays is at least and at most the optimal value, respectively.
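As a concrete illustration of the discounted-sum cost model (our own sketch, not part of the paper's formal development), the cost of an ultimately periodic play, i.e., a lasso l₀·(l₁)^ω, can be computed in closed form, since the loop contributes a geometric series:

```python
def ds_finite(seq, d):
    """Discounted sum of a finite cost sequence: sum(seq[k] / d**k)."""
    return sum(w / d**k for k, w in enumerate(seq))

def ds_lasso(head, loop, d):
    """Discounted sum of the infinite sequence head . loop^omega.

    The loop repeats every len(loop) steps, each repetition discounted
    by an extra factor d**(-len(loop)), so the loop part sums to
    DS(loop, d) / (1 - d**(-len(loop))), discounted past the head.
    """
    loop_sum = ds_finite(loop, d) / (1 - d ** (-len(loop)))
    return ds_finite(head, d) + loop_sum / d ** len(head)

# Example with d = 2: head cost 1, then cost 2 forever:
# DS = 1 + (1/2) * (2 + 2/2 + 2/4 + ...) = 1 + (1/2) * 4 = 3
print(ds_lasso([1], [2], 2))  # 3.0
```

Comparing such a value against a threshold v is exactly the check a satisficing winning condition imposes on a single play.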
Definition 2 (Optimization problem).
Given a quantitative graph game G, the optimization problem is to compute the optimal cost over all possible plays of the game, under the assumption that the players have the opposing objectives of maximizing and minimizing the cost of plays, respectively.

Seminal work by Zwick and Paterson showed that the optimization problem is solved by the value-iteration algorithm presented here [32]. Essentially, the algorithm plays a min-max game between the two players. Let wt_k(v) denote the optimal cost of a k-length game that begins in state v ∈ V. Then wt_k(v) can be computed using the following equations: The optimal cost of a 1-length game beginning in state v ∈ V is max{γ(v, w) | (v, w) ∈ E} if v belongs to the maximizing player, and min{γ(v, w) | (v, w) ∈ E} if v belongs to the minimizing player. Given the optimal cost of a k-length game, the optimal cost of a (k+1)-length game is computed as follows:

wt_{k+1}(v) = max{γ(v, w) + (1/d)·wt_k(w) | (v, w) ∈ E} if v belongs to the maximizing player
wt_{k+1}(v) = min{γ(v, w) + (1/d)·wt_k(w) | (v, w) ∈ E} if v belongs to the minimizing player

Let W be the optimal cost. Then W = lim_{k→∞} wt_k(v_init) [27,32].

The VI algorithm described above terminates only in the limit. To compute the algorithm's worst-case complexity, we establish a linear bound on the number of iterations that is sufficient to compute the optimal cost. We also establish a matching lower bound, showing that our analysis is tight.
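The min-max recurrence can be run directly. The sketch below (illustrative only; the game, its encoding as successor dictionaries, and all names are our own assumptions, not the paper's implementation) computes wt_k for a fixed number of iterations:

```python
def value_iteration(max_succ, min_succ, gamma, d, iters):
    """Run the min-max value-iteration recurrence for `iters` rounds.

    max_succ / min_succ: state -> list of successor states, for the
    maximizing and minimizing player respectively (each state belongs
    to exactly one of the two dictionaries).
    gamma: (state, successor) -> integer transition cost.
    Returns wt_k(v) for every state v, where k = iters.
    """
    states = list(max_succ) + list(min_succ)
    wt = {v: 0.0 for v in states}  # the base case is discounted away as k grows
    for _ in range(iters):
        new = {}
        for v, succs in max_succ.items():
            new[v] = max(gamma[(v, w)] + wt[w] / d for w in succs)
        for v, succs in min_succ.items():
            new[v] = min(gamma[(v, w)] + wt[w] / d for w in succs)
        wt = new
    return wt

# Hypothetical two-state game: at 'a' the maximizer stays (cost 1) or
# moves to 'b' (cost 0); the minimizer at 'b' can only stay (cost 0).
max_succ = {'a': ['a', 'b']}
min_succ = {'b': ['b']}
gamma = {('a', 'a'): 1, ('a', 'b'): 0, ('b', 'b'): 0}
# Staying at 'a' forever yields 1 + 1/2 + 1/4 + ... = 2 for d = 2.
print(round(value_iteration(max_succ, min_succ, gamma, 2, 60)['a'], 6))  # 2.0
```

Note that the iterates only converge in the limit; the analysis below bounds how many iterations suffice to pin down the exact optimal cost.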
Upper bound on number of iterations.
The upper bound computation utilizes one key result from existing literature: There exist memoryless strategies for both players such that the cost of the resulting play is the optimal cost [27]. Hence, there must exist an optimal play in the form of a simple lasso in the quantitative game, where a lasso is a play represented as v₀v₁…vₙ(s₀s₁…sₘ)^ω. We call the initial segment v₀v₁…vₙ its head, and the cycle segment s₀s₁…sₘ its loop. A lasso is simple if each state in {v₀…vₙ, s₀, …, sₘ} is distinct. We begin our proof by assigning constraints on the optimal cost using the simple-lasso structure of an optimal play (Corollary 1 and Corollary 2).

Let l = a₀…aₙ(b₀…bₘ)^ω be the cost sequence of a lasso such that l₀ = a₀…aₙ and l₁ = b₀…bₘ are the cost sequences of the head and the loop, respectively. Then the following can be said about DS(l₀·l₁^ω, d):

Lemma 1. Let l = l₀·(l₁)^ω represent an integer cost sequence of a lasso, where l₀ and l₁ are the cost sequences of the head and loop of the lasso. Let d = p/q be the discount factor. Then DS(l, d) is a rational number with denominator at most (p^{|l₁|} − q^{|l₁|})·p^{|l₀|}.

Lemma 1 is proven by unrolling DS(l₀·l₁^ω, d). Then, the first constraint on the optimal cost is as follows:

Corollary 1.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let d = p/q be the discount factor. Then the optimal cost of the game is a rational number with denominator at most (p^{|V|} − q^{|V|})·p^{|V|}.

Proof. Recall that there exists a simple lasso that achieves the optimal cost. Since a simple lasso is of length at most |V|, the lengths of its head and loop are at most |V| each. So, the expression from Lemma 1 simplifies to (p^{|V|} − q^{|V|})·p^{|V|}. ⊓⊔

The second constraint has to do with the minimum non-zero difference between the costs of simple lassos:
Corollary 2.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let d = p/q be the discount factor. Then the minimal non-zero difference between the costs of simple lassos is a rational number with denominator at most (p^{|V|} − q^{|V|})²·p^{2·|V|}.

Proof. Given two rational numbers with denominator at most a, an upper bound on the denominator of the minimal non-zero difference of these two rational numbers is a². Then, using the result from Corollary 1, we immediately obtain that the minimal non-zero difference between the costs of two lassos is a rational number with denominator at most (p^{|V|} − q^{|V|})²·p^{2·|V|}. ⊓⊔

For notational convenience, let bound_W = (p^{|V|} − q^{|V|})·p^{|V|} and bound_diff = (p^{|V|} − q^{|V|})²·p^{2·|V|}. W.l.o.g. |V| > 1. Since 1/bound_diff < 1/bound_W, there is at most one rational number with denominator bound_W or less in any interval of size 1/bound_diff. Thus, if we can identify an interval of size less than 1/bound_diff around the optimal cost, then due to Corollary 1, the optimal cost will be the unique rational number with denominator bound_W or less in this interval.

Fig. 1: Sketch of game graph which requires Ω(|V|) iterations.

Thus, the final question is to identify a small enough interval (of size 1/bound_diff or less) such that the optimal cost lies within it. To find an interval around the optimal cost, we use a finite-horizon approximation of the optimal cost:

Lemma 2.
Let W be the optimal cost in quantitative game G. Let μ > 0 be the maximum of the absolute values of costs on transitions in G. Then, for all k ∈ N,

wt_k(v_init) − (1/d^{k−1})·(μ/(d−1)) ≤ W ≤ wt_k(v_init) + (1/d^{k−1})·(μ/(d−1))

Proof. Since W is the limit of wt_k(v_init) as k → ∞, W must lie between the minimum and maximum cost possible if the k-length game is extended to an infinite-length game. The minimum possible extension would be when the k-length game is extended by iterations in which the cost incurred in each round is −μ. Therefore, the minimum possible value is wt_k(v_init) − (1/d^{k−1})·(μ/(d−1)). Similarly, the maximum possible value is wt_k(v_init) + (1/d^{k−1})·(μ/(d−1)). ⊓⊔

Now that we have an interval around the optimal cost, we can compute the number of iterations of VI required to make it smaller than 1/bound_diff.

Theorem 1.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let μ > 0 be the maximum of the absolute values of costs along transitions. The number of iterations required by the value-iteration algorithm is
1. O(|V|) when the discount factor satisfies d ≥ 2,
2. O(log(μ)/(d−1) + |V|) when the discount factor satisfies 1 < d < 2.

Proof (Sketch). As discussed in Corollaries 1-2 and Lemma 2, for a large enough k, the optimal cost is the unique rational number with denominator bound_W or less within the interval (wt_k(v_init) − (1/d^{k−1})·(μ/(d−1)), wt_k(v_init) + (1/d^{k−1})·(μ/(d−1))). Thus, our task is to determine the smallest value of k for which (2μ/(d−1))·(1/d^{k−1}) ≤ 1/bound_diff holds. Analyzing the cases d ≥ 2 and 1 < d < 2 separately yields the stated bounds. ⊓⊔

Lower bound on number of iterations of VI. We establish a matching lower bound of Ω(|V|) iterations to show that our analysis is tight. Consider the sketch of a quantitative game in Fig. 1. Let all states belong to the maximizing player. Hence, the optimization problem reduces to searching for a path with optimal cost. Now let the loop on the right-hand side (RHS) be larger than the loop on the left-hand side (LHS). For carefully chosen values of w and lengths of the loops, one can show that the path for the optimal cost of a k-length game is along the RHS loop when k is small, but along the LHS loop when k is large. This way, the correct maximal value can be obtained only at a large value of k. Hence the VI algorithm runs for at least enough iterations that the optimal path lies in the LHS loop. By meticulous reverse engineering of the sizes of both loops and the value of w, one can guarantee that k = Ω(|V|).

Finally, we complete the worst-case complexity analysis of VI for optimization. We account for the cost of arithmetic operations, since they appear in abundance in VI. We demonstrate that there are orders-of-magnitude differences in complexity under different models of arithmetic, namely unit-cost and bit-cost.
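To see why the model of arithmetic matters, one can watch exact rational representations grow across iterations. The sketch below (our own illustration, on a hypothetical single state with a self-loop of constant cost) tracks the bit-length of the denominator of the iterate under exact arithmetic with a rational discount factor:

```python
from fractions import Fraction

d = Fraction(5, 2)   # rational discount factor p/q = 5/2
cost = 3             # constant cost on the self-loop
wt = Fraction(0)
bits = []
for _ in range(10):
    # one VI step on a single self-loop state: wt <- cost + wt / d
    wt = cost + wt / d
    bits.append(wt.denominator.bit_length())

# The denominator after k steps is q^... free but grows like 5**(k-1),
# so its bit-length grows linearly in the number of iterations:
print(bits)  # [1, 3, 5, 7, 10, 12, 14, 17, 19, 21]
```

Under the unit-cost model each of these steps counts as O(1); under the bit-cost model the cost of a step grows with the bit-length shown above, which is the source of the gap analyzed next.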
Unit-cost model.
Under the unit-cost model of arithmetic, all arithmetic operations are assumed to take constant time.
Theorem 2.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let μ > 0 be the maximum of the absolute values of costs along transitions. The worst-case complexity of the optimization problem under the unit-cost model of arithmetic is
1. O(|V|·|E|) when the discount factor satisfies d ≥ 2,
2. O(log(μ)·|E|/(d−1) + |V|·|E|) when the discount factor satisfies 1 < d < 2.

Proof. Each iteration takes O(|E|) time, since every transition is visited once. Thus, the complexity is O(|E|) multiplied by the number of iterations (Theorem 1). ⊓⊔

Bit-cost model.
Under the bit-cost model, the cost of arithmetic operations depends on the size of the numerical values. Integers are represented in their bit-wise representation. A rational number r/s is represented as a tuple of the bit-wise representations of the integers r and s. For two integers of length n and m, the cost of their addition and multiplication is O(m + n) and O(m·n), respectively.

Theorem 3.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let μ > 0 be the maximum of the absolute values of costs along transitions. Let d = p/q > 1 be the discount factor. The worst-case complexity of the optimization problem under the bit-cost model of arithmetic is
1. O(|V|²·|E|·log p·max{log μ, log p}) when d ≥ 2,
2. O((log(μ)/(d−1) + |V|)²·|E|·log p·max{log μ, log p}) when 1 < d < 2.

Proof (Sketch). Since arithmetic operations incur a cost and the length of the representation of intermediate costs increases linearly in each iteration, we can show that the cost of conducting the j-th iteration is O(|E|·j·log μ·log p). Summation over the number of iterations (Theorem 1) returns the given expressions. ⊓⊔

Remarks on integer discount factor. Our analysis shows that when the discount factor is an integer (d ≥ 2), VI requires Θ(|V|) iterations. Its worst-case complexity is, therefore, O(|V|·|E|) and O(|V|²·|E|) under the unit-cost and bit-cost models of arithmetic, respectively (ignoring logarithmic factors in the bit-cost case). From a practical point of view, the bit-cost model is more relevant, since implementations of VI will use multi-precision libraries to avoid floating-point errors. While one may argue that the upper bounds in Theorem 3 could be tightened, they would not improve significantly due to the Ω(|V|) lower bound on the number of iterations.

We present our first algorithm for the satisficing problem. It is an adaptation of VI. However, we see that it does not fare better than VI for optimization. The VI-based algorithm for satisficing is described as follows: Perform VI for optimization. Terminate as soon as one of these occurs: (a) VI completes as many iterations as prescribed by Theorem 1, or (b) the threshold value falls outside the interval defined in Lemma 2. Either way, one can tell how the threshold value relates to the optimal cost, and hence solve satisficing. Clearly, (a) needs as many iterations as optimization; (b) does not reduce the number of iterations in general, since the number of iterations grows as the distance between the optimal cost and the threshold value shrinks:
Theorem 4.
Let G = (V, v_init, E, γ) be a quantitative graph game with optimal cost W. Let v ∈ Q be the threshold value. Then the number of iterations taken by a VI-based algorithm for the satisficing problem is min{O(|V|), O(log_d(μ/|W − v|))} if d ≥ 2, and min{O(log(μ)/(d−1) + |V|), O(log_d(μ/|W − v|))} if 1 < d < 2.

Observe that this bound is tight, since the lower bounds from optimization apply here as well. The worst-case complexity can be derived using computations similar to those for optimization. Our empirical evaluation further demonstrates its non-robust performance.

Our second algorithm for satisficing is purely based on automata methods. While this approach operates with integer discount factors only, it runs linearly in the size of the quantitative game. This is lower than the number of iterations required by VI, let alone the worst-case complexities of VI. This approach reduces satisficing to solving a safety or reachability game using comparator automata. The intuition is as follows: Given threshold value v ∈ Q and relation R, let the satisficing problem be to ensure that the cost of plays relates to v by R. Then a play ρ is winning for satisficing with v and R if its cost sequence A satisfies DS(A, d) R v, where d > 1 is the discount factor. When d is an integer and v = 0, this simply checks whether A is in the safety/co-safety comparator, hence yielding the reduction. The caveat is that the above applies to v = 0 only. To overcome this, we extend the theory of comparators to permit arbitrary threshold values v ∈ Q. We find that the results for v = 0 transcend to v ∈ Q, and offer compact comparator constructions.

This section extends the existing literature on comparators with threshold value v = 0 [6,5,9] to permit non-zero thresholds. The properties we investigate are safety/co-safety and ω-regularity. We begin with formal definitions:

Definition 3 (Comparison language with threshold v ∈ Q).
For an integer upper bound μ > 0, discount factor d > 1, equality or inequality relation R ∈ {<, >, ≤, ≥, =, ≠}, and a threshold value v ∈ Q, the comparison language with upper bound μ, relation R, discount factor d, and threshold value v is the language of infinite words over the alphabet Σ = {−μ, …, μ} that accepts A ∈ Σ^ω iff DS(A, d) R v holds.

Definition 4 (Comparator automata with threshold v ∈ Q). For an integer upper bound μ > 0, discount factor d > 1, equality or inequality relation R ∈ {<, >, ≤, ≥, =, ≠}, and a threshold value v ∈ Q, the comparator automaton with upper bound μ, relation R, discount factor d, and threshold value v is an automaton that accepts the DS comparison language with upper bound μ, relation R, discount factor d, and threshold value v.

Safety and co-safety of comparison languages.
The primary observation is that to determine whether DS(A, d) R v holds, it is sufficient to examine finite-length prefixes of A, since weights later on get heavily discounted. Thus,

Theorem 5. Let μ > 0 be the integer upper bound. For arbitrary discount factor d > 1 and threshold value v ∈ Q,
1. Comparison languages are safety languages for relations R ∈ {≤, ≥, =}.
2. Comparison languages are co-safety languages for relations R ∈ {<, >, ≠}.

Proof. The proof is identical to that for threshold value v = 0 from [9]. ⊓⊔

Regularity of comparison languages.
Prior work on threshold value v = 0 shows that a comparator is ω-regular iff the discount factor is an integer [7]. We show the same result for arbitrary threshold values v ∈ Q.

First of all, trivially, comparators with arbitrary threshold values are not ω-regular for non-integer discount factors, since that already holds when v = 0. The rest of this section proves ω-regularity with arbitrary threshold values for integer discount factors. But first, let us introduce some notation: Since v ∈ Q, w.l.o.g. we assume that it has an n-length representation v = v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω. By abuse of notation, we denote both the expression v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω and the value DS(v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω, d) by v.

We will construct a Büchi automaton for the comparison language L≤ for relation ≤, threshold value v ∈ Q, and an integer discount factor. This is sufficient to prove ω-regularity for all relations, since Büchi automata are closed under Boolean operations.

From safety/co-safety of comparison languages, we argue that it is sufficient to examine the discounted sum of finite-length weight sequences to know whether their infinite extensions will be in L≤. For instance, if the discounted sum of a finite-length weight sequence W is very large, W is a bad prefix of L≤. Similarly, if the discounted sum of a finite-length weight sequence W is very small, then for all of its infinite-length bounded extensions Y, DS(W·Y, d) ≤ v. Thus, a mathematical characterization of very large and very small would formalize a criterion for membership of sequences in L≤ based on their finite prefixes.

To this end, we use the concept of a recoverable gap (or gap value), which is a measure of the distance of the discounted sum of a finite sequence from 0 [12]. The recoverable gap of a finite weight sequence W with discount factor d, denoted gap(W, d), is defined as follows: If W = ε (the empty sequence), gap(ε, d) = 0, and gap(W, d) = d^{|W|−1}·DS(W, d) otherwise. Then Lemma 3 formalizes very large and very small in Item 1 and Item 2, respectively, w.r.t. recoverable gaps. As for notation, given a sequence A, let A[⋯i] denote its i-length prefix:

Lemma 3.
Let μ > 0 be the integer upper bound and d > 1 be the discount factor. Let v ∈ Q be the threshold value such that v = v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω. Let W be a non-empty, bounded, finite-length weight sequence.
1. gap(W − v[⋯|W|], d) > (1/d)·DS(v[|W|⋯], d) + μ/(d−1) iff for all infinite-length, bounded extensions Y, DS(W·Y, d) > v.
2. gap(W − v[⋯|W|], d) ≤ (1/d)·DS(v[|W|⋯], d) − μ/(d−1) iff for all infinite-length, bounded extensions Y, DS(W·Y, d) ≤ v.

Proof. We present the proof of one direction of Item 1. The others follow similarly. Let W be s.t. for every infinite-length, bounded extension Y, DS(W·Y, d) > v holds, i.e., DS(W, d) + (1/d^{|W|})·DS(Y, d) > DS(v[⋯|W|], d) + (1/d^{|W|})·DS(v[|W|⋯], d) for all bounded Y. This implies DS(W, d) − DS(v[⋯|W|], d) > (1/d^{|W|})·(DS(v[|W|⋯], d) − DS(Y, d)). Taking DS(Y, d) at its minimal value −μ·d/(d−1) and multiplying both sides by d^{|W|−1}, this implies gap(W − v[⋯|W|], d) > (1/d)·(DS(v[|W|⋯], d) + μ·d/(d−1)). ⊓⊔

This segues into the state space of the Büchi automaton. We define the state space so that state s represents the gap value s. The idea is that all finite-length weight sequences with gap value s will terminate in state s. To assign transitions between these states, we observe that the gap value is defined inductively as follows: gap(ε, d) = 0 and gap(W·w, d) = d·gap(W, d) + w, where w ∈ {−μ, …, μ}. Thus, there is a transition from state s to state t on a ∈ {−μ, …, μ} if t = d·s + a. Since gap(ε, d) = 0, state 0 is assigned to be the initial state.

The issue with this construction is that it has infinitely many states. To limit that, we use Lemma 3. Since Item 1 is a necessary and sufficient criterion for bad prefixes of the safety language L≤, all states with gap value larger than the bound of Item 1 are fused into one non-accepting sink; all remaining states are accepting. Due to Item 2, all states with gap value at most the bound of Item 2 are fused into one accepting sink. Finally, since d is an integer, gap values are integral. Thus, there are only finitely many states between the bounds of Item 2 and Item 1.

Theorem 6.
Let µ > be an integer upper bound, d > an integer discountfactor, R an equality or inequality relation, and v ∈ Q the threshold value with an n -length representation given by v = v [0] v [1] . . . v [ m ]( v [ m + 1] v [ m + 2] . . . v [ n ]) ω .1. The DS comparator automata for µ, d, R , v is ω -regular iff d is an integer.2. For integer discount factors, the DS comparator is a safety or co-safety au-tomaton with O ( µ · nd − ) states.Proof. To prove Item 1 we present the construction of an ω -regular compara-tor automaton for integer upper bound µ >
0, integer discount factor d > ≤ , and threshold value v ∈ Q s.t. v = v [0] v [1] . . . v [ m ]( v [ m +1] v [ m + 2] . . . v [ n ]) ω . , denoted by A = ( S, s I , Σ, δ, F ) where:For i ∈ { , . . . , n } , let U i = d · DS ( v [ i · · · ] , d ) + µd − (Lemma 3, Item 1)For i ∈ { , . . . , n } , let L i = d · DS ( v [ i · · · ] , d ) − µd − (Lemma 3, Item 2) – States S = (cid:83) ni =0 S i ∪{ bad , veryGood } where S i = { ( s, i ) | s ∈ {(cid:98) L i (cid:99) +1 , . . . , (cid:98) U i (cid:99)}} – Initial state s I = (0 , F = S \ { bad } – Alphabet Σ = {− µ, − µ + 1 , . . . , µ − , µ } – Transition function δ ⊆ S × Σ → S where ( s, a, t ) ∈ δ then:1. If s ∈ { bad , veryGood } , then t = s for all a ∈ Σ
2. If s is of the form (p, i) and a ∈ Σ:
(a) If d·p + a − v[i] > ⌊U_i⌋, then t = bad
(b) If d·p + a − v[i] ≤ ⌊L_i⌋, then t = veryGood
(c) If ⌊L_i⌋ < d·p + a − v[i] ≤ ⌊U_i⌋:
i. If i == n, then t = (d·p + a − v[i], m+1)
ii. Else, t = (d·p + a − v[i], i+1)

We skip the proof of correctness, as it follows from the above discussion. Observe that A is deterministic. It is a safety automaton, as all non-accepting states are sinks. To prove Item 2, observe that since the comparator for ≤ is a deterministic safety automaton, the comparator for > is obtained by simply swapping the accepting and non-accepting states. This is a co-safety automaton of the same size. One can argue similarly for the remaining relations. ⊓⊔

This section describes our comparator-based linear-time algorithm for satisficing with integer discount factors. As described earlier, given discount factor d >
1, a play is winning for satisficing with threshold value v ∈ Q and relation R if its cost sequence A satisfies DS(A, d) R v. We now know from Theorem 6 that the winning condition for plays can be expressed as a safety or co-safety automaton for any v ∈ Q, as long as the discount factor is an integer. Therefore, a synchronized product of the quantitative game with the safety or co-safety comparator denoting the winning condition completes the reduction to a safety or reachability game, respectively. Theorem 7.
Let G = (V, v_init, E, γ) be a quantitative game, d > 1 the integer discount factor, R the equality or inequality relation, and v ∈ Q the threshold value with an n-length representation. Let µ > 0 be the maximum of the absolute values of costs along transitions in G. Then,
1. The satisficing problem reduces to solving a safety game if R ∈ {≤, ≥}
2. The satisficing problem reduces to solving a reachability game if R ∈ { <, > }
3. The satisficing problem is solved in O((|V| + |E|)·µ·n) time.

Proof. The first two items use a standard synchronized-product argument on the following formal reduction [15]: Let G = (V = V₀ ⊎ V₁, v_init, E, γ) be a quantitative game, d > 1 the integer discount factor, R the equality or inequality relation, and v ∈ Q the threshold value with an n-length representation. Let µ > 0 be the maximum of the absolute values of costs along transitions in G. The first step is to construct the safety/co-safety comparator A = (S, s_I, Σ, δ, F) for µ, d, R and v. The next is to synchronize the product of G and A over weights to construct the game GA = (W = W₀ ∪ W₁, s^×_init, δ_W, F_W), where

– W = V × S. In particular, W₀ = V₀ × S and W₁ = V₁ × S. Since V₀ and V₁ are disjoint, W₀ and W₁ are disjoint too.
– The initial state is s^×_init = (v_init, s_I).
– The transition relation δ_W ⊆ W × W is defined such that a transition ((v, s), (v′, s′)) ∈ δ_W synchronizes a transition (v, v′) ∈ E with a transition (s, a, s′) ∈ δ, where a = γ((v, v′)) is the cost of the transition in G.
– F_W = V × F. The game is a safety game if the comparator is a safety automaton and a reachability game if the comparator is a co-safety automaton.

We need the size of GA to analyze the worst-case complexity. Clearly, GA consists of O(|V|·µ·n) states. To establish the number of transitions in GA, observe that every state (v, s) in GA has the same number of outgoing edges as state v in G, because the comparator A is deterministic. Since GA has O(µ·n) copies of every state v ∈ G, there are a total of O(|E|·µ·n) transitions in GA. Since GA is either a safety or a reachability game, it is solved in time linear in its size. Thus, the overall complexity is O((|V| + |E|)·µ·n). ⊓⊔

With respect to the value µ, the VI-based solutions are logarithmic in the worst case, while the comparator-based solution is linear due to the size of the comparator.
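The synchronized product at the heart of this reduction is mechanical enough to sketch in code. Below is a minimal illustrative Python version (not the paper's implementation): the comparator is passed as an explicit deterministic transition table `comp_delta`, the initial vertex is taken to be the first element of `vertices`, and sink states of the comparator loop back to themselves.

```python
def product_game(vertices, edges, gamma, comp_init, comp_delta):
    """Synchronized product of a quantitative game with a deterministic
    comparator: product state (v, s); a game edge (v, v') with cost
    gamma[(v, v')] moves the comparator from s to comp_delta[(s, cost)].
    Only states reachable from the initial product state are built."""
    start = (vertices[0], comp_init)
    prod_states, prod_edges = {start}, set()
    frontier = [start]
    while frontier:
        v, s = frontier.pop()
        for (u, u2) in edges:
            if u != v:
                continue
            t = comp_delta[(s, gamma[(u, u2)])]  # comparator reads the edge cost
            prod_edges.add(((v, s), (u2, t)))
            if (u2, t) not in prod_states:
                prod_states.add((u2, t))
                frontier.append((u2, t))
    return prod_states, prod_edges
```

Because the comparator is deterministic, each product state has exactly as many outgoing edges as its game component, which is what yields the O(|E|·µ·n) bound on transitions.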
From a practical perspective, this may not be a limitation, since weights along transitions can be scaled down. The parameter that cannot be altered is the size of the quantitative game; with respect to that, the comparator-based solution displays clear superiority. Finally, the comparator-based solution is affected by n, the length of the representation of the threshold value, while the VI-based solution is not. It is natural to assume that the value of n is small.

Fig. 2: Cactus plot. µ = 5, v = 3. Total benchmarks = 291.

Fig. 3: Single-counter scalable benchmark. µ = 5, v = 3. Timeout = 500s.

The goal of the empirical analysis is to determine whether the practical performance of these algorithms resonates with our theoretical discoveries. For an apples-to-apples comparison, we implement three algorithms: (a)
VIOptimal: optimization via value iteration, (b) VISatisfice: satisficing via value iteration, and (c) CompSatisfice: satisficing via comparators. All tools have been implemented in C++. To avoid floating-point errors in VIOptimal and VISatisfice, the tools invoke the open-source GMP (GNU Multi-Precision) library [2]. Since all arithmetic operations in CompSatisfice are integral only, it does not use GMP.

To avoid completely randomized benchmarks, we create ∼290 benchmarks from an LTLf benchmark suite [29]. The state-of-the-art LTLf-to-automaton tool Lisa [8] is used to convert LTLf to (non-quantitative) graph games. Weights are randomly assigned to transitions. The number of states in our benchmarks ranges from 3 to 50000+. Discount factor d = 2, threshold v ∈ [0, …].

Observations and Inferences
Overall, we see that CompSatisfice is efficient and scalable, and exhibits steady and predictable performance.

CompSatisfice outperforms VIOptimal in both runtime and number of benchmarks solved, as shown in Fig 2. It is crucial to note that all benchmarks solved by VIOptimal had fewer than 200 states; in contrast, CompSatisfice solves much larger benchmarks with 3 to 50000+ states. To test scalability, we compared both tools on a set of scalable benchmarks. For integer parameter i > 0, the i-th scalable benchmark has 3·i states. Fig 3 plots number of states against runtime in log-log scale, so the slope of a straight line indicates the degree of the polynomial (in practice). It shows that CompSatisfice exhibits linear behavior (slope ∼ 1), whereas VIOptimal is much more expensive (slope >> 1) even in practice. (Figures are best viewed online and in color.)

Fig. 4: Robustness. Fix benchmark, vary v. µ = 5. Timeout = 500s.

CompSatisfice is more robust than VISatisfice. We compare CompSatisfice and VISatisfice as the threshold value changes. This experiment is chosen due to Theorem 4, which proves that VISatisfice is non-robust. As shown in Fig 4, the variance in the performance of VISatisfice is very high; the appearance of a peak close to the optimal value is an empirical demonstration of Theorem 4. On the other hand, CompSatisfice stays steady in performance owing to its low complexity.
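For reference, the value-iteration core that VIOptimal and VISatisfice build on can be sketched as follows. This is an illustrative Python version with exact rational arithmetic (mirroring the role GMP plays in the C++ tools); the function and parameter names are ours, not the tools'.

```python
from fractions import Fraction

def value_iteration(vertices, edges, gamma, is_max, d, iters):
    """One value-iteration pass per round:
    cost_k(v, w) = gamma(v, w) + (1/d) * wt_{k-1}(w), and
    wt_k(v) is the max (maximizer's vertex) or min (minimizer's vertex)
    of cost_k over v's outgoing edges."""
    wt = {v: Fraction(0) for v in vertices}
    for _ in range(iters):
        cost = {(v, w): Fraction(gamma[(v, w)]) + wt[w] / d for (v, w) in edges}
        wt = {v: (max if is_max[v] else min)(cost[(u, w)] for (u, w) in edges if u == v)
              for v in vertices}
    return wt
```

On a two-vertex cycle with costs +1 and −1 and d = 2, the values converge geometrically to ±2/3; the number of rounds needed before a threshold comparison stabilizes is what makes VISatisfice sensitive to thresholds near the optimal value.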
Having witnessed the algorithmic improvements of comparator-based satisficing over VI-based algorithms, we now shift focus to the question of applicability. While this section examines applicability with respect to the ability to extend to temporal goals, the discussion highlights a core strength of comparator-based reasoning in satisficing and shows its promise on a broader variety of problems.

The problem of extending optimal/satisficing solutions with a temporal goal is to determine whether there exists an optimal/satisficing solution that also satisfies a given temporal goal. Formally, given a quantitative game G, a labeling function L : V → AP which assigns states V of G to atomic propositions from the set AP, and a temporal goal ϕ over AP, we say a play ρ = v_0 v_1 ... satisfies ϕ if its proposition sequence L(v_0) L(v_1) ... satisfies the formula ϕ. Then, to solve optimization/satisficing with a temporal goal is to determine whether there exists a solution that is optimal/satisficing and also satisfies the temporal goal along resulting plays. Prior work has proven that the optimization problem cannot be extended to temporal goals [13] unless the temporal goals are very simple safety properties [10,31]. In contrast, our comparator-based solution for satisficing can naturally be extended to temporal goals, in fact to all ω-regular properties, owing to its automata-based underpinnings, as shown below: Theorem 8.
Let G be a quantitative game with state set V, L : V → AP a labeling function over the set of atomic propositions AP, ϕ a temporal goal over AP, and A_ϕ its equivalent deterministic parity automaton. Let d > 1 be an integer discount factor, µ the maximum of the absolute values of costs along transitions, and v ∈ Q the threshold value with an n-length representation. Then, solving satisficing with temporal goals reduces to solving a parity game of size linear in |V|, µ, n and |A_ϕ|.

Proof. The reduction involves two steps of synchronized products. The first reduces the satisficing problem to a safety/reachability game while preserving the labeling function. The second synchronized product is between the safety/reachability game and the DPA A_ϕ; these synchronize on the atomic propositions of the labeling function and the DPA transitions, respectively. Therefore, the resulting parity game is linear in |V|, µ, n, and |A_ϕ|. ⊓⊔

Broadly speaking, our ability to solve satisficing via automata-based methods is a key feature, as it propels a seamless integration of quantitative properties (threshold bounds) with qualitative properties, since both are grounded in automata-based methods. VI-based solutions are inhibited from doing so, since numerical methods are known to not combine well with the automata-based methods that are so prominent in qualitative reasoning [5,20]. This key feature could be exploited in several other problems to show further benefits of comparator-based satisficing over optimization and VI-based methods.
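The second synchronized product in this proof can be sketched in the same style as the first. The version below is purely illustrative: it assumes the DPA reads the label of the state being entered (conventions differ), and all names are ours.

```python
def parity_product(init, game_edges, label, dpa_init, dpa_delta, dpa_priority):
    """Synchronize a game with a deterministic parity automaton over state
    labels. dpa_delta maps (dpa_state, label) -> dpa_state; the priority of
    a product state (v, q) is the priority of its DPA component q."""
    start = (init, dpa_init)
    states, edges = {start}, set()
    todo = [start]
    while todo:
        v, q = todo.pop()
        for (u, u2) in game_edges:
            if u != v:
                continue
            q2 = dpa_delta[(q, label[u2])]  # DPA steps on the target's label
            edges.add(((v, q), (u2, q2)))
            if (u2, q2) not in states:
                states.add((u2, q2))
                todo.append((u2, q2))
    # a product state inherits the priority of its DPA component
    priority = {(v, q): dpa_priority[q] for (v, q) in states}
    return states, edges, priority
```

A product state (v, q) inherits the priority of its DPA component q; solving the resulting parity game then answers satisficing with the temporal goal.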
This work introduces the satisficing problem for quantitative games with the discounted-sum cost model. When the discount factor is an integer, we present a comparator-based solution for satisficing, which exhibits algorithmic improvements – better worst-case complexity and efficient, scalable, and robust performance – as well as broader applicability over traditional solutions based on numerical approaches for satisficing and optimization. Other technical contributions include the presentation of the missing proof of value iteration for optimization and the extension of comparator automata to enable direct comparison to arbitrary threshold values, as opposed to the zero threshold value only.

An undercurrent of our comparator-based approach for satisficing is that it offers an automata-based replacement for traditional numerical methods. By doing so, it paves a way to combine quantitative and qualitative reasoning without compromising on theoretical guarantees or performance. This motivates tackling more challenging problems in this area, such as more complex environments, variability in information availability, and their combinations.
Acknowledgements.
We thank the anonymous reviewers for their valuable input. This work is supported in part by NSF grant 2030859 to the CRA for the CIFellows Project, NSF grants IIS-1527668, CCF-1704883, and IIS-1830549, the ERC CoG 863818 (ForM-SMArt), and an award from the Maryland Procurement Office. References
1. Satisficing. https://en.wikipedia.org/wiki/Satisficing.
2. GMP. https://gmplib.org/.
3. B. Alpern and F. B. Schneider. Recognizing safety and liveness. Distributed Computing, 2(3):117–126, 1987.
4. C. Baier. Probabilistic model checking. In Dependable Software Systems Engineering, pages 1–23, 2016.
5. S. Bansal, S. Chaudhuri, and M. Y. Vardi. Automata vs linear-programming discounted-sum inclusion. In Proc. of International Conference on Computer-Aided Verification (CAV), 2018.
6. S. Bansal, S. Chaudhuri, and M. Y. Vardi. Comparator automata in quantitative verification. In Proc. of International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), 2018.
7. S. Bansal, S. Chaudhuri, and M. Y. Vardi. Comparator automata in quantitative verification (full version). CoRR, abs/1812.06569, 2018.
8. S. Bansal, Y. Li, L. Tabajara, and M. Y. Vardi. Hybrid compositional reasoning for reactive synthesis from finite-horizon specifications. In Proc. of AAAI, 2020.
9. S. Bansal and M. Y. Vardi. Safety and co-safety comparator automata for discounted-sum inclusion. In Proc. of International Conference on Computer-Aided Verification (CAV), 2019.
10. J. Bernet, D. Janin, and I. Walukiewicz. Permissive strategies: from parity games to safety games. RAIRO-Theoretical Informatics and Applications, 36(3):261–275, 2002.
11. R. Bloem, K. Chatterjee, T. Henzinger, and B. Jobstmann. Better quality in synthesis through quantitative objectives. In Proc. of CAV, pages 140–156. Springer, 2009.
12. U. Boker and T. A. Henzinger. Exact and approximate determinization of discounted-sum automata. LMCS, 10(1), 2014.
13. K. Chatterjee, T. A. Henzinger, J. Otop, and Y. Velner. Quantitative fair simulation games. Information and Computation, 254:143–166, 2017.
14. D. Clark, S. Hunt, and P. Malacaria. A static analysis for quantifying information flow in a simple imperative language. Journal of Computer Security, 15(3):321–371, 2007.
15. T. Colcombet and N. Fijalkow. Universal graphs and good for games automata: New tools for infinite duration games. In Proc. of FSTTCS, pages 1–26. Springer, 2019.
16. B. Finkbeiner, C. Hahn, and H. Torfah. Model checking quantitative hyperproperties. In Proc. of CAV, pages 144–163. Springer, 2018.
17. T. D. Hansen, P. B. Miltersen, and U. Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. Journal of the ACM, 60, 2013.
18. K. He, M. Lahijanian, L. Kavraki, and M. Vardi. Reactive synthesis for finite tasks under resource constraints. In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on, pages 5326–5332. IEEE, 2017.
19. O. Kupferman and M. Y. Vardi. Model checking of safety properties. In Proc. of CAV, pages 172–183. Springer, 1999.
20. M. Kwiatkowska. Quantitative verification: Models, techniques and tools. In Proc. 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 449–458. ACM Press, September 2007.
21. M. Kwiatkowska, G. Norman, and D. Parker. Advances and challenges of probabilistic model checking. Pages 1691–1698. IEEE, 2010.
22. M. Lahijanian, S. Almagor, D. Fried, L. Kavraki, and M. Vardi. This time the robot settles for a cost: A quantitative approach to temporal logic planning with partial satisfaction. In AAAI, pages 3664–3671, 2015.
23. M. L. Littman. Algorithms for sequential decision making. Brown University, Providence, RI, 1996.
24. M. Osborne and A. Rubinstein. A course in game theory. MIT Press, 1994.
25. M. Puterman. Markov decision processes. Handbooks in Operations Research and Management Science, 2:331–434, 1990.
26. S. A. Seshia, A. Desai, T. Dreossi, D. J. Fremont, S. Ghosh, E. Kim, S. Shivakumar, M. Vazquez-Chanlatte, and X. Yue. Formal specification for deep neural networks. In Proc. of ATVA, pages 20–34. Springer, 2018.
27. L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences of the United States of America, 39(10):1095, 1953.
28. R. Sutton and A. Barto. Introduction to reinforcement learning, volume 135. MIT Press, Cambridge, 1998.
29. L. M. Tabajara and M. Y. Vardi. Partitioning techniques in LTLf synthesis. In IJCAI, pages 5599–5606. AAAI Press, 2019.
30. W. Thomas, T. Wilke, et al. Automata, logics, and infinite games: A guide to current research, volume 2500. Springer Science & Business Media, 2002.
31. M. Wen, R. Ehlers, and U. Topcu. Correct-by-synthesis reinforcement learning with temporal logic constraints. Pages 4983–4990. IEEE, 2015.
32. U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1):343–359, 1996.
A Complexity proof for VI Optimization
Lemma 1
Let l = l0·(l1)^ω represent an integer cost sequence of a lasso, where l0 and l1 are the cost sequences of the head and the loop of the lasso, respectively. Let d = p/q be the discount factor. Then DS(l, d) is a rational number with denominator at most (p^{|l1|} − q^{|l1|})·p^{|l0|}.

Proof. The discounted sum of l is given as follows:

DS(l, d) = DS(l0, d) + (1/d^{|l0|})·DS((l1)^ω, d)
= DS(l0, d) + (1/d^{|l0|})·(DS(l1, d) + (1/d^{|l1|})·DS(l1, d) + (1/d^{2·|l1|})·DS(l1, d) + ···)

Taking the closed-form expression of the term in the parentheses, we get

= DS(l0, d) + (1/d^{|l0|})·(d^{|l1|}/(d^{|l1|} − 1))·DS(l1, d)

Let l1 = b_0 b_1 ... b_{|l1|−1}, where b_i ∈ Z. Then

= DS(l0, d) + (1/d^{|l0|})·(d^{|l1|}/(d^{|l1|} − 1))·(b_0 + b_1/d + ··· + b_{|l1|−1}/d^{|l1|−1})
= DS(l0, d) + (1/d^{|l0|})·(1/(d^{|l1|} − 1))·(b_0·d^{|l1|} + b_1·d^{|l1|−1} + ··· + b_{|l1|−1}·d)

Expressing d = p/q, we get

= DS(l0, d) + (1/d^{|l0|})·(q^{|l1|}/(p^{|l1|} − q^{|l1|}))·(b_0·(p/q)^{|l1|} + ··· + b_{|l1|−1}·(p/q))
= DS(l0, d) + (1/d^{|l0|})·(1/(p^{|l1|} − q^{|l1|}))·M, where M ∈ Z

Expressing d = p/q again, we get

= (1/p^{|l0|})·(1/(p^{|l1|} − q^{|l1|}))·N, where N ∈ Z ⊓⊔

Theorem 1
Let G = (V, v_init, E, γ) be a graph game. The number of iterations required by the value-iteration algorithm, i.e., the length of the finite-length game needed to compute the optimal value W, is
1. O(|V|) when the discount factor d ≥ 2,
2. O(log(µ)/(d−1) + |V|) when the discount factor 1 < d < 2.

Proof. Recall that the task is to find a k such that the interval identified by Lemma 2 is smaller than bound_diff. Note that bound_W < bound_diff; therefore, bound_diff < bound_v. Hence, there can be only one rational value with denominator bound_W or less in the small interval identified by the chosen k. Since the optimal value must also lie in this interval, the unique rational number with denominator bound_W or less must be the optimal value. Let k be such that the interval from Lemma 2 is smaller than bound_diff. Then, for suitable constants c, c′, c″ > 0:

2·µ/(d−1)·(1/d^{k−1}) ≤ c/((p^{|V|} − q^{|V|})·p^{2·|V|})
⟹ 2·µ/(d−1)·(1/d^{k−1}) ≤ c·q^{3·|V|}/((p^{|V|} − q^{|V|})·p^{2·|V|})
⟹ 2·µ/(d−1)·(1/d^{k−1}) ≤ c/((d^{|V|} − 1)·d^{2·|V|})
⟹ (d−1)·d^{k−1} ≥ c′·µ·(d^{|V|} − 1)·d^{2·|V|}
⟹ log(d−1) + (k−1)·log(d) ≥ c″ + log(µ) + log(d^{|V|} − 1) + 2·|V|·log(d)

When d ≥ 2: In this case, both d and d^{|V|} are large, so log(d^{|V|} − 1) ≈ |V|·log(d). Then

log(d−1) + (k−1)·log(d) ≥ c″ + log(µ) + |V|·log(d) + 2·|V|·log(d)
⟹ k = O(|V|)

When d is small but d^{|V|} is large: In this case, log(d) ≈ (d−1). Then

log(d−1) + (k−1)·(d−1) ≥ c″ + log(µ) + |V|·log(d) + 2·|V|·(d−1)
⟹ k = O(log(µ)/(d−1) + |V|)

When both d and d^{|V|} are small: In addition to the approximations from the earlier case, log(d^{|V|} − 1) ≈ −(2 − d^{|V|}). Then

log(d−1) + (k−1)·(d−1) ≥ c″ + log(µ) + (2 − d^{|V|}) + 2·|V|·(d−1)
⟹ k = O(log(µ)/(d−1) + |V|) ⊓⊔

Concrete example to establish the Ω(|V|) lower bound on the number of iterations required by the value-iteration algorithm. Recall Fig 1, presented here again as Fig 5.

Fig. 5: Sketch of a game graph which requires Ω(|V|) iterations.

Let the loop on the left-hand side have 4·n edges, the loop on the right-hand side have 2·n edges, and w = d^n + d^{2·n} + ··· + d^{m·n−1}, such that m·n − 1 ≥ c·n for a positive integer c > 1. For games of length (m·n − 1) or less, the optimal path arises from the loop on the right; but for games of length greater than (m·n − 1), it arises from the loop on the left. Hence, Ω(|V|) iterations are required.
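The closed form in Lemma 1 is easy to check with exact rational arithmetic. A small illustrative sketch (ours, not part of the paper's tooling):

```python
from fractions import Fraction

def ds_finite(seq, d):
    """DS(W, d) = sum_i W[i] / d^i for a finite weight sequence W."""
    return sum(Fraction(w) / d**i for i, w in enumerate(seq))

def ds_lasso(head, loop, d):
    """Lemma 1's closed form for a lasso l0·(l1)^w:
    DS(l, d) = DS(l0, d) + (1/d^|l0|) * (d^|l1| / (d^|l1| - 1)) * DS(l1, d)."""
    n0, n1 = len(head), len(loop)
    return ds_finite(head, d) + (d**n1 / (d**n1 - 1)) * ds_finite(loop, d) / d**n0
```

For instance, with d = 3/2, head (1) and loop (2, 1), the discounted sum is 21/5; its denominator 5 indeed stays within the bound (p^{|l1|} − q^{|l1|})·p^{|l0|} = (9 − 4)·3 = 15.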
1) or less, the optimal patharises from the loop to the right. But for games of length greater than ( m · n − B Complexity of VI under Bit-Cost model
Under the bit-cost model, the cost of arithmetic operations depends on the size of the numerical values. Integers are represented in their bit-wise representation. A rational number r/s is represented as a tuple of the bit-wise representations of the integers r and s. For two integers of length n and m, the cost of their addition and multiplication is O(m + n) and O(m·n), respectively.

To compute the cost of arithmetic in each iteration of the value-iteration algorithm, we define the cost of a transition (v, w) ∈ E in the k-th iteration as cost_1(v, w) = γ(v, w) and cost_k(v, w) = γ(v, w) + (1/d)·wt_{k−1}(w) for k > 1, where wt_k(v) = max{cost_k(v, w) | w ∈ vE} if v ∈ V₀ and wt_k(v) = min{cost_k(v, w) | w ∈ vE} if v ∈ V₁. Since we compute the cost of every transition in each iteration, it is crucial to analyze the size and the cost of computing cost_k. Lemma 4.
Let G be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q be the discount factor. Then for all (v, w) ∈ E and all k > 1,

cost_k(v, w) = (q^{k−1}·n_1 + q^{k−2}·p·n_2 + ··· + p^{k−1}·n_k) / p^{k−1},

where n_i ∈ Z and |n_i| ≤ µ for all i ∈ {1, ..., k}. Lemma 4 can be proven by induction on k. Lemma 5.
Let G be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q be the discount factor. For all (v, w) ∈ E and all k > 1, the cost of computing cost_k(v, w) in the k-th iteration is O(k·log p·max{log µ, log p}). Proof.
We compute the cost of computing cost_k(v, w) given that the optimal costs have been computed for the (k−1)-th iteration:

cost_k(v, w) = γ(v, w) + (1/d)·wt_{k−1}(w) = γ(v, w) + (q/p)·wt_{k−1}(w)
= γ(v, w) + (q/p)·(q^{k−2}·n_1 + q^{k−3}·p·n_2 + ··· + p^{k−2}·n_{k−1})/p^{k−2}

for some n_i ∈ Z with |n_i| ≤ µ. Therefore, the computation of cost_k(v, w) involves four operations:
1. Multiplication of q with (q^{k−2}·n_1 + ··· + p^{k−2}·n_{k−1}). The latter is bounded by (k−1)·µ·p^{k−2}, since |n_i| ≤ µ and p > q. The cost of this operation is O(log((k−1)·µ·p^{k−2})·log(p)) = O(((k−1)·log p + log µ + log(k−1))·log p).
2. Multiplication of p with p^{k−2}. Its cost is O((k−1)·(log p)²).
3. Multiplication of p^{k−1} with γ(v, w). Its cost is O((k−1)·log p·log µ).
4. Addition of γ(v, w)·p^{k−1} with q·(q^{k−2}·n_1 + ··· + p^{k−2}·n_{k−1}). Its cost is linear in the size of their representations.

Therefore, the cost of computing cost_k(v, w) is O(k·log p·max{log µ, log p}). ⊓⊔

Now we can compute the cost of computing the optimal costs in the k-th iteration from the (k−1)-th iteration. Lemma 6.
Let G be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q be the discount factor. The worst-case complexity of computing the optimal costs in the k-th iteration from the (k−1)-th iteration is O(|E|·k·log µ·log p).

Proof. The update requires us to first compute the transition cost in the k-th iteration for every transition in the game. Lemma 5 gives the cost of computing the transition cost of one transition; therefore, the worst-case complexity of computing the transition costs of all transitions is O(|E|·k·log p·max{log µ, log p}). To compute the optimal cost for each state, we are required to compute the maximum (or minimum) transition cost over all outgoing transitions of the state. Since the denominators are the same, the maximum value can be computed via lexicographic comparison of the numerators along all transitions. Therefore, the cost of computing the maximum for all states is O(|E|·k·log µ·log p). Thus, the total cost of computing the optimal costs in the k-th iteration from the (k−1)-th iteration is O(|E|·k·log p·max{log µ, log p}). ⊓⊔

Finally, the worst-case complexity of computing the optimal value of the quantitative game under the bit-cost model for arithmetic operations is as follows:
Theorem 3.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q > 1 be the discount factor. The worst-case complexity of computing the optimal value under the bit-cost model for arithmetic operations is
1. O(|V|²·|E|·log p·max{log µ, log p}) when d ≥ 2,
2. O((log(µ)/(d−1) + |V|)²·|E|·log p·max{log µ, log p}) when 1 < d < 2.

Proof. This is the sum of the costs of computing the optimal costs over all iterations. When d ≥ 2, it is sufficient to perform value iteration O(|V|) times (Theorem 1). So the cost is O((1 + 2 + 3 + ··· + |V|)·|E|·log p·max{log µ, log p}), which simplifies to O(|V|²·|E|·log p·max{log µ, log p}). A similar computation solves the case 1 < d < 2. ⊓⊔

C Discounted-sum comparator construction
Theorem 5
Let µ > 0 be the upper bound. For an arbitrary discount factor d > 1 and threshold value v:
1. DS-comparison languages are safety languages for relations R ∈ {≤, ≥, =}.
2. DS-comparison languages are co-safety languages for relations R ∈ {<, >, ≠}.

Proof. Due to the duality of safety and co-safety languages, it is sufficient to show that the DS-comparison language with ≤ is a safety language.

Assume that the DS-comparison language with ≤ is not a safety language. Let W be a weight sequence in the complement of the DS-comparison language with ≤ such that it does not have a bad prefix. Since W is in the complement of the DS-comparison language with ≤, DS(W, d) > v. By assumption, every i-length prefix W[···i] of W can be extended to a bounded weight sequence W[···i]·Y_i such that DS(W[···i]·Y_i, d) ≤ v.

Note that DS(W, d) = DS(W[···i], d) + (1/d^i)·DS(W[i···], d) and DS(W[···i]·Y_i, d) = DS(W[···i], d) + (1/d^i)·DS(Y_i, d). The contribution of the tail sequences W[i···] and Y_i to the discounted sums of W and W[···i]·Y_i, respectively, diminishes exponentially as the value of i increases. In addition, since W and W[···i]·Y_i share a common i-length prefix W[···i], their discounted-sum values must converge to each other. The discounted sum of W is fixed and greater than v, so due to convergence there must be a k ≥ 0 such that DS(W[···k]·Y_k, d) > v. Contradiction. Therefore, the DS-comparison language with ≤ is a safety language. The above intuition is formalized below.

Since DS(W, d) > v and DS(W[···i]·Y_i, d) ≤ v, the difference DS(W, d) − DS(W[···i]·Y_i, d) > 0. Moreover, DS(W, d) − DS(W[···i]·Y_i, d) = (1/d^i)·(DS(W[i···], d) − DS(Y_i, d)) ≤ (1/d^i)·(|DS(W[i···], d)| + |DS(Y_i, d)|). Since the maximum absolute value of the discounted sum of sequences bounded by µ is µ·d/(d−1), we also get that DS(W, d) − DS(W[···i]·Y_i, d) ≤ 2·(1/d^i)·µ·d/(d−1). Putting it all together, for all i ≥ 0:

0 < DS(W, d) − DS(W[···i]·Y_i, d) ≤ 2·(1/d^i)·µ·d/(d−1)

As i → ∞, 2·(1/d^i)·µ·d/(d−1) → 0. So lim_{i→∞}(DS(W, d) − DS(W[···i]·Y_i, d)) = 0, and since DS(W, d) is fixed, lim_{i→∞} DS(W[···i]·Y_i, d) = DS(W, d). By the definition of convergence, there exists an index k ≥ 0 such that DS(W[···k]·Y_k, d) falls within the (DS(W, d) − v)/2-neighborhood of DS(W, d). Finally, since DS(W, d) > v, this implies DS(W[···k]·Y_k, d) > v, which contradicts DS(W[···i]·Y_i, d) ≤ v for all i ≥ 0. Therefore, the DS-comparison language with ≤ is a safety language. ⊓⊔
Let µ > 0 be the integer upper bound, d > 1 an integer discount factor, and let the relation R be the inequality ≤. Let v ∈ Q be the threshold value such that v = v[0]v[1]...v[m](v[m+1]v[m+2]...v[n])^ω. Let W be a non-empty, bounded, finite weight sequence. Then W is a bad prefix of the DS-comparison language with µ, d, ≤ and v iff

gap(W − v[···|W|], d) > (1/d)·(DS(v[|W|···], d) + µ·d/(d−1)).

Proof. Let W be a bad prefix. Then for every infinite-length, bounded weight sequence Y, we get DS(W·Y, d) > v
⟹ DS(W, d) + (1/d^{|W|})·DS(Y, d) > DS(v[···|W|]·v[|W|···], d)
⟹ DS(W, d) − DS(v[···|W|], d) > (1/d^{|W|})·(DS(v[|W|···], d) − DS(Y, d))
⟹ gap(W − v[···|W|], d) > (1/d)·(DS(v[|W|···], d) + µ·d/(d−1)),
where the last step instantiates Y with the minimizing sequence (−µ)^ω, for which DS(Y, d) = −µ·d/(d−1), and multiplies both sides by d^{|W|−1}.

Next, we prove that if a finite weight sequence W satisfies the above inequality, then W is a bad prefix. Let Y be an arbitrary infinite, bounded weight sequence. Then DS(W·Y, d) = DS(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·(gap(v[···|W|], d) − gap(v[···|W|], d)). By rearrangement of terms, we get DS(W·Y, d) = (1/d^{|W|−1})·gap(W − v[···|W|], d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). Since gap(W − v[···|W|], d) > (1/d)·(DS(v[|W|···], d) + µ·d/(d−1)) holds, we get DS(W·Y, d) > (1/d^{|W|})·(DS(v[|W|···], d) + µ·d/(d−1)) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). Since the minimal value of DS(Y, d) is −µ·d/(d−1), the inequality simplifies to DS(W·Y, d) > (1/d^{|W|−1})·gap(v[···|W|], d) + (1/d^{|W|})·DS(v[|W|···], d) = DS(v, d) = v. Therefore, W is a bad prefix. ⊓⊔ Lemma 7.
Let µ and d > 1 be the bound and discount factor, respectively. Let W be a non-empty, bounded, finite weight sequence. Then W is a very good prefix of the DS comparator A^{µ,d}_{≤} with threshold v iff

gap(W − v[···|W|], d) ≤ (1/d)·(DS(v[|W|···], d) − µ·d/(d−1)).

Proof. Let W be a very good prefix. Then for all infinite, bounded sequences Y, we get DS(W·Y, d) ≤ v ⟹ DS(W, d) + (1/d^{|W|})·DS(Y, d) ≤ v. By rearrangement of terms, gap(W − v[···|W|], d) ≤ (1/d)·(DS(v[|W|···], d) − DS(Y, d)). Since the maximal value of DS(Y, d) is µ·d/(d−1), we get gap(W − v[···|W|], d) ≤ (1/d)·(DS(v[|W|···], d) − µ·d/(d−1)).

Next, we prove the converse. We know DS(W·Y, d) = DS(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·(gap(v[···|W|], d) − gap(v[···|W|], d)). By rearrangement of terms, DS(W·Y, d) = (1/d^{|W|−1})·gap(W − v[···|W|], d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). From the assumption we derive DS(W·Y, d) ≤ (1/d^{|W|})·(DS(v[|W|···], d) − µ·d/(d−1)) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). Since the maximal value of DS(Y, d) is µ·d/(d−1), we get DS(W·Y, d) ≤ v. Therefore, W is a very good prefix. ⊓⊔
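For the special case of threshold v = 0 (so every v[i] = 0 and DS(v[i···], d) = 0), the comparator run and the bad-prefix criterion above can be exercised in a few lines. This is an illustrative sketch with our own names, not the construction verbatim; the cutoffs are the v = 0 instances of the bounds in Lemma 3 and Lemma 7.

```python
import math
from fractions import Fraction

def comparator_run(W, mu, d):
    """Run the deterministic safety comparator for DS(A, d) <= 0.
    Returns 'bad', 'veryGood', or the final gap state."""
    U = Fraction(mu, d - 1)    # Lemma 3 cutoff: (1/d) * (0 + mu*d/(d-1))
    L = -U                     # Lemma 7 cutoff
    s = 0                      # gap(eps, d) = 0 is the initial state
    for a in W:
        t = d * s + a          # gap(W·a, d) = d * gap(W, d) + a
        if t > math.floor(U):
            return "bad"       # bad prefix: every extension has DS > 0
        if t <= math.floor(L):
            return "veryGood"  # very good prefix: every extension has DS <= 0
        s = t
    return s

def bad_by_definition(W, mu, d):
    """W is a bad prefix of { A : DS(A, d) <= 0 } iff even the cheapest
    infinite extension (-mu)^omega leaves the discounted sum above 0."""
    ds_w = sum(Fraction(w) / d**i for i, w in enumerate(W))
    tail = -Fraction(mu * d, d - 1) / d**len(W)  # DS of (-mu)^omega past W
    return ds_w + tail > 0
```

For µ = 1 and d = 2, a brute-force check over all short weight sequences confirms that the run reaches the bad sink exactly on the bad prefixes characterized by Lemma 3.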