On Satisficing in Quantitative Games
Suguman Bansal, Krishnendu Chatterjee, and Moshe Y. Vardi
University of Pennsylvania, Philadelphia, USA, [email protected]
IST Austria, Klosterneuburg, Austria, [email protected]
Rice University, Houston, USA, [email protected]
Abstract.
Several problems in planning and reactive synthesis can be reduced to the analysis of two-player quantitative graph games. Optimization is one form of analysis. We argue that in many cases it may be better to replace the optimization problem with the satisficing problem, where instead of searching for optimal solutions, the goal is to search for solutions that adhere to a given threshold bound.

This work defines and investigates the satisficing problem on a two-player graph game with the discounted-sum cost model. We show that while the satisficing problem can be solved using numerical methods just like the optimization problem, this approach does not render compelling benefits over optimization. When the discount factor is, however, an integer, we present another approach to satisficing, which is purely based on automata methods. We show that this approach is algorithmically more performant – both theoretically and empirically – and demonstrates the broader applicability of satisficing over optimization.
Quantitative properties of systems are increasingly being explored in automated reasoning [4,14,16,20,21,26]. In decision-making domains such as planning and reactive synthesis, quantitative properties have been deployed to describe soft constraints such as quality measures [11], cost and resources [18,22], rewards [31], and the like. Since these constraints are soft, it suffices to generate solutions that are good enough w.r.t. the quantitative property.

Existing approaches to the analysis of quantitative properties have, however, primarily focused on optimization of these constraints, i.e., on generating optimal solutions. We argue that there may be disadvantages to searching for optimal solutions where good-enough ones suffice. First, optimization may be more expensive than searching for good-enough solutions. Second, optimization restricts the search space of possible solutions, and thus could limit the broader applicability of the resulting solutions. For instance, to generate solutions that operate within battery life, it is too restrictive to search for solutions with minimal battery consumption. Besides, solutions with minimal battery consumption may be limited in their applicability, since they may not satisfy other goals, such as desirable temporal tasks.

To this end, this work focuses on directly searching for good-enough solutions. We propose an alternate form of analysis of quantitative properties in which the objective is to search for a solution that adheres to a given threshold bound, possibly derived from a physical constraint such as battery life.
We call this the satisficing problem, a term popularized by H. A. Simon in economics to mean satisfy and suffice, implying a search for good-enough solutions [1]. Through theoretical and empirical investigation, we make the case that satisficing is algorithmically more performant than optimization and, further, that satisficing solutions may have broader applicability than optimal solutions.

This work formulates and investigates the satisficing problem on two-player, finite-state games with the discounted-sum (DS) cost model, which is a standard cost model in decision-making domains [24,25,28]. In these games, players take turns to pass a token along the transition relation between the states. As the token is pushed around, the play accumulates costs along the transitions using the DS cost model. The players are assumed to have opposing objectives: one player maximizes the cost, while the other player minimizes it. We define the satisficing problem as follows: Given a threshold value v ∈ Q, does there exist a strategy for the minimizing (or maximizing) player that ensures the cost of all resulting plays is strictly or non-strictly lower (or greater) than the threshold v?

Clearly, the satisficing problem is decidable, since the optimization problem on these quantitative games is known to be solvable in pseudo-polynomial time [17,23,32]. To design an algorithm for satisficing, we first adapt the celebrated value-iteration (VI) based algorithm for optimization [32]. The resulting algorithm, VISatisfice, displays the same complexity as optimization and hence renders no complexity-theoretic advantage. To obtain its worst-case complexity, we perform a thorough worst-case analysis of VI for optimization. It is interesting that a thorough analysis of VI for optimization had hitherto been absent from the literature, despite the popularity of VI.

To address this gap, we first prove that VI should be executed for Θ(|V|) iterations to compute the optimal value, where V and E refer to the sets of states and transitions in the quantitative game. Next, to compute the overall complexity, we take into account the cost of arithmetic operations as well, since they appear in abundance in VI. We demonstrate an orders-of-magnitude difference between the complexity of VI under different cost models of arithmetic. For instance, for integer discount factors, we show that VI is O(|V|·|E|) and O(|V|²·|E|) under the unit-cost and bit-cost models of arithmetic, respectively. Clearly, this shows that VI for optimization, and hence VISatisfice, does not scale to large quantitative games.

We then present a purely automata-based approach for satisficing, which, for integer discount factors, runs in time linear in |V| + |E|. This shows that there is a fundamental separation in complexity between satisficing and VI-based optimization, as even the lower bound on the number of iterations in VI is higher. In this approach, the satisficing problem is reduced to solving a safety or reachability game. Our core observation is that the criterion to fulfil satisficing with respect to threshold value v ∈ Q can be expressed as membership in an automaton that accepts a weight sequence A iff DS(A, d) R v holds, where d > 1 is the discount factor and R ∈ {≤, ≥, <, >}. In existing literature, such automata are called comparator automata (comparators, in short) when the threshold value v = 0 [6,7]. They are known to have a compact safety or co-safety automaton representation [9,19], which could be used to reduce the satisficing problem with zero threshold value. To solve satisficing for arbitrary threshold values v ∈ Q, we extend existing results on comparators to permit arbitrary but fixed threshold values v ∈ Q.

An empirical comparison between the performance of VISatisfice, VI for optimization, and the automata-based solution for satisficing shows that the latter outperforms the others in efficiency, scalability, and robustness. In addition to improved algorithmic performance, we demonstrate that satisficing solutions have broader applicability than optimal ones.

Reachability and safety games.
Both reachability and safety games are defined over the structure G = (V = V₀ ⊎ V₁, v_init, E, F) [30]. It consists of a directed graph (V, E) and a partition (V₀, V₁) of its states V. State v_init is the initial state of the game. The set of successors of state v is designated by vE. For convenience, we assume that every state has at least one outgoing edge, i.e., vE ≠ ∅ for all v ∈ V. F ⊆ V is a non-empty set of states; F is referred to as the set of accepting and rejecting states in reachability and safety games, respectively.

A play of a game involves two players, denoted by P₀ and P₁, who create an infinite path by moving a token along the transitions as follows: At the beginning, the token is at the initial state. If the current position v belongs to V_i, then P_i chooses the successor state from vE. Formally, a play ρ = v₀v₁v₂… is an infinite sequence of states such that the first state v₀ = v_init, and each pair of successive states is a transition, i.e., (v_k, v_{k+1}) ∈ E for all k ≥ 0. A play is winning for player P₀ in a reachability game if it visits an accepting state, and winning for player P₁ otherwise. The opposite holds in safety games, i.e., a play is winning for player P₀ if it does not visit any rejecting state, and winning for P₁ otherwise.

A strategy for a player is a recipe that guides the player on which state to go to next based on the history of the play. A strategy is winning for a player P_i if for all strategies of the opponent player P_{1−i}, the resulting plays are winning for P_i. To solve a graph game means to determine whether there exists a winning strategy for player P₀. Reachability and safety games are solved in O(|V| + |E|) time.

Quantitative graph games. A quantitative graph game (or quantitative game, in short) is defined over a structure G = (V = V₀ ⊎ V₁, v_init, E, γ). V, V₀, V₁, v_init, E, plays, and strategies are defined as earlier. Each transition of the game is associated with a cost determined by the cost function γ : E → Z. The cost sequence of a play ρ is the sequence of costs w₀w₁w₂… such that w_k = γ((v_k, v_{k+1})) for all k ≥ 0. Given a discount factor d > 1, the cost of play ρ, denoted wt(ρ), is the discounted sum of its cost sequence, i.e., wt(ρ) = DS(ρ, d) = w₀ + w₁/d + w₂/d² + ⋯.

Büchi automata. A Büchi automaton is a tuple A = (S, Σ, δ, s_I, F), where S is a finite set of states, Σ is a finite input alphabet, δ ⊆ (S × Σ × S) is the transition relation, state s_I ∈ S is the initial state, and F ⊆ S is the set of accepting states [30]. A Büchi automaton is deterministic if for all states s and inputs a, |{s′ | (s, a, s′) ∈ δ}| ≤ 1. For a word w = w₀w₁⋯ ∈ Σ^ω, a run ρ of w is a sequence of states s₀s₁… s.t. s₀ = s_I and τ_i = (s_i, w_i, s_{i+1}) ∈ δ for all i. Let inf(ρ) denote the set of states that occur infinitely often in run ρ. A run ρ is an accepting run if inf(ρ) ∩ F ≠ ∅. A word w is an accepting word if it has an accepting run. The language of Büchi automaton A is the set of all words accepted by A. Languages accepted by Büchi automata are called ω-regular.

Safety and co-safety languages.
Let L ⊆ Σ^ω be a language over alphabet Σ. A finite word w ∈ Σ* is a bad prefix for L if for all infinite words y ∈ Σ^ω, w·y ∉ L. A language L is a safety language if every word w ∉ L has a bad prefix for L [3]. A co-safety language is the complement of a safety language [19]. Safety and co-safety languages that are ω-regular are represented by specialized Büchi automata called safety and co-safety automata, respectively.

Comparison language and comparator automata. Given an integer upper bound μ > 0, discount factor d > 1, and relation R ∈ {<, >, ≤, ≥, =, ≠}, the comparison language with upper bound μ, relation R, and discount factor d is the language of words over the alphabet Σ = {−μ, …, μ} that accepts A ∈ Σ^ω iff DS(A, d) R 0 holds. The comparator automaton with upper bound μ, relation R, and discount factor d is the automaton that accepts the corresponding comparison language [6]. Depending on R, these languages are safety or co-safety [9]. A comparison language is said to be ω-regular if its automaton is a Büchi automaton. Comparison languages are ω-regular iff the discount factor is an integer [7].

This section shows that there are no complexity-theoretic benefits to solving the satisficing problem via algorithms for the optimization problem. While prior works claim without proof that the value-iteration algorithm runs in pseudo-polynomial time [32], its worst-case analysis is absent from the literature. This section presents a detailed account of the said analysis, and exposes the dependence of VI's worst-case complexity on the discount factor d > 1. It then presents the VI-based algorithm for satisficing, VISatisfice.

Definition 1 (Satisficing problem). Given a quantitative graph game G and a threshold value v ∈ Q, the satisficing problem is to determine whether the minimizing (or maximizing) player has a strategy that ensures the cost of all resulting plays is strictly or non-strictly lower (or greater) than the threshold v.

The satisficing problem can clearly be solved by solving the optimization problem. The optimal cost of a quantitative game is that value such that the maximizing and minimizing players can guarantee that the cost of plays is at least and at most the optimal value, respectively.
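As a concrete illustration of the discounted-sum cost model (our own sketch, not part of the paper's formal development), the cost of an ultimately periodic play, i.e., a lasso l₀·(l₁)^ω, can be computed in closed form, since the loop contributes a geometric series:

```python
def ds_finite(seq, d):
    """Discounted sum of a finite cost sequence: sum(seq[k] / d**k)."""
    return sum(w / d**k for k, w in enumerate(seq))

def ds_lasso(head, loop, d):
    """Discounted sum of the infinite sequence head . loop^omega.

    The loop repeats every len(loop) steps, each repetition discounted
    by an extra factor d**(-len(loop)), so the loop part sums to
    DS(loop, d) / (1 - d**(-len(loop))), discounted past the head.
    """
    loop_sum = ds_finite(loop, d) / (1 - d ** (-len(loop)))
    return ds_finite(head, d) + loop_sum / d ** len(head)

# Example with d = 2: head cost 1, then cost 2 forever:
# DS = 1 + (1/2) * (2 + 2/2 + 2/4 + ...) = 1 + (1/2) * 4 = 3
print(ds_lasso([1], [2], 2))  # 3.0
```

Comparing such a value against a threshold v is exactly the check a satisficing winning condition imposes on a single play.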
Definition 2 (Optimization problem).
Given a quantitative graph game G, the optimization problem is to compute the optimal cost over all possible plays of the game, under the assumption that the players have the opposing objectives of maximizing and minimizing the cost of plays, respectively.

Seminal work by Zwick and Paterson showed that the optimization problem is solved by the value-iteration algorithm presented here [32]. Essentially, the algorithm plays a min-max game between the two players. Let wt_k(v) denote the optimal cost of a k-length game that begins in state v ∈ V. Then wt_k(v) can be computed using the following equations: The optimal cost of a 1-length game beginning in state v ∈ V is max{γ(v, w) | (v, w) ∈ E} if v belongs to the maximizing player, and min{γ(v, w) | (v, w) ∈ E} if v belongs to the minimizing player. Given the optimal cost of a k-length game, the optimal cost of a (k+1)-length game is computed as follows:

wt_{k+1}(v) = max{γ(v, w) + (1/d)·wt_k(w) | (v, w) ∈ E} if v belongs to the maximizing player
wt_{k+1}(v) = min{γ(v, w) + (1/d)·wt_k(w) | (v, w) ∈ E} if v belongs to the minimizing player

Let W be the optimal cost. Then W = lim_{k→∞} wt_k(v_init) [27,32].

The VI algorithm described above terminates only in the limit. To compute the algorithm's worst-case complexity, we establish a linear bound on the number of iterations that is sufficient to compute the optimal cost. We also establish a matching lower bound, showing that our analysis is tight.
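The min-max recurrence can be run directly. The sketch below (illustrative only; the game, its encoding as successor dictionaries, and all names are our own assumptions, not the paper's implementation) computes wt_k for a fixed number of iterations:

```python
def value_iteration(max_succ, min_succ, gamma, d, iters):
    """Run the min-max value-iteration recurrence for `iters` rounds.

    max_succ / min_succ: state -> list of successor states, for the
    maximizing and minimizing player respectively (each state belongs
    to exactly one of the two dictionaries).
    gamma: (state, successor) -> integer transition cost.
    Returns wt_k(v) for every state v, where k = iters.
    """
    states = list(max_succ) + list(min_succ)
    wt = {v: 0.0 for v in states}  # the base case is discounted away as k grows
    for _ in range(iters):
        new = {}
        for v, succs in max_succ.items():
            new[v] = max(gamma[(v, w)] + wt[w] / d for w in succs)
        for v, succs in min_succ.items():
            new[v] = min(gamma[(v, w)] + wt[w] / d for w in succs)
        wt = new
    return wt

# Hypothetical two-state game: at 'a' the maximizer stays (cost 1) or
# moves to 'b' (cost 0); the minimizer at 'b' can only stay (cost 0).
max_succ = {'a': ['a', 'b']}
min_succ = {'b': ['b']}
gamma = {('a', 'a'): 1, ('a', 'b'): 0, ('b', 'b'): 0}
# Staying at 'a' forever yields 1 + 1/2 + 1/4 + ... = 2 for d = 2.
print(round(value_iteration(max_succ, min_succ, gamma, 2, 60)['a'], 6))  # 2.0
```

Note that the iterates only converge in the limit; the analysis below bounds how many iterations suffice to pin down the exact optimal cost.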
Upper bound on number of iterations.
The upper bound computation utilizes one key result from existing literature: There exist memoryless strategies for both players such that the cost of the resulting play is the optimal cost [27]. Hence, there must exist an optimal play in the form of a simple lasso in the quantitative game, where a lasso is a play represented as v₀v₁…vₙ(s₀s₁…sₘ)^ω. We call the initial segment v₀v₁…vₙ its head, and the cycle segment s₀s₁…sₘ its loop. A lasso is simple if each state in {v₀…vₙ, s₀, …, sₘ} is distinct. We begin our proof by assigning constraints on the optimal cost using the simple-lasso structure of an optimal play (Corollary 1 and Corollary 2).

Let l = a₀…aₙ(b₀…bₘ)^ω be the cost sequence of a lasso such that l₀ = a₀…aₙ and l₁ = b₀…bₘ are the cost sequences of the head and the loop, respectively. Then the following can be said about DS(l₀·l₁^ω, d):

Lemma 1. Let l = l₀·(l₁)^ω represent an integer cost sequence of a lasso, where l₀ and l₁ are the cost sequences of the head and loop of the lasso. Let d = p/q be the discount factor. Then DS(l, d) is a rational number with denominator at most (p^{|l₁|} − q^{|l₁|})·p^{|l₀|}.

Lemma 1 is proven by unrolling DS(l₀·l₁^ω, d). Then, the first constraint on the optimal cost is as follows:

Corollary 1.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let d = p/q be the discount factor. Then the optimal cost of the game is a rational number with denominator at most (p^{|V|} − q^{|V|})·p^{|V|}.

Proof. Recall that there exists a simple lasso that achieves the optimal cost. Since a simple lasso is of length at most |V|, the lengths of its head and loop are at most |V| each. So, the expression from Lemma 1 simplifies to (p^{|V|} − q^{|V|})·p^{|V|}. ⊓⊔

The second constraint has to do with the minimum non-zero difference between the costs of simple lassos:
Corollary 2.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let d = p/q be the discount factor. Then the minimal non-zero difference between the costs of simple lassos is a rational number with denominator at most (p^{|V|} − q^{|V|})²·p^{2·|V|}.

Proof. Given two rational numbers with denominator at most a, an upper bound on the denominator of the minimal non-zero difference of these two rational numbers is a². Then, using the result from Corollary 1, we immediately obtain that the minimal non-zero difference between the costs of two lassos is a rational number with denominator at most (p^{|V|} − q^{|V|})²·p^{2·|V|}. ⊓⊔

For notational convenience, let bound_W = (p^{|V|} − q^{|V|})·p^{|V|} and bound_diff = (p^{|V|} − q^{|V|})²·p^{2·|V|}. W.l.o.g. |V| > 1. Since 1/bound_diff < 1/bound_W, there is at most one rational number with denominator bound_W or less in any interval of size 1/bound_diff. Thus, if we can identify an interval of size less than 1/bound_diff around the optimal cost, then due to Corollary 1, the optimal cost will be the unique rational number with denominator bound_W or less in this interval.

Fig. 1: Sketch of game graph which requires Ω(|V|) iterations.

Thus, the final question is to identify a small enough interval (of size 1/bound_diff or less) such that the optimal cost lies within it. To find an interval around the optimal cost, we use a finite-horizon approximation of the optimal cost:

Lemma 2.
Let W be the optimal cost in quantitative game G. Let μ > 0 be the maximum of the absolute values of costs on transitions in G. Then, for all k ∈ N,

wt_k(v_init) − (1/d^{k−1})·(μ/(d−1)) ≤ W ≤ wt_k(v_init) + (1/d^{k−1})·(μ/(d−1))

Proof. Since W is the limit of wt_k(v_init) as k → ∞, W must lie between the minimum and maximum cost possible if the k-length game is extended to an infinite-length game. The minimum possible extension would be when the k-length game is extended by iterations in which the cost incurred in each round is −μ. Therefore, the minimum possible value is wt_k(v_init) − (1/d^{k−1})·(μ/(d−1)). Similarly, the maximum possible value is wt_k(v_init) + (1/d^{k−1})·(μ/(d−1)). ⊓⊔

Now that we have an interval around the optimal cost, we can compute the number of iterations of VI required to make it smaller than 1/bound_diff.

Theorem 1.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let μ > 0 be the maximum of the absolute values of costs along transitions. The number of iterations required by the value-iteration algorithm is
1. O(|V|) when the discount factor satisfies d ≥ 2,
2. O(log(μ)/(d−1) + |V|) when the discount factor satisfies 1 < d < 2.

Proof (Sketch). As discussed in Corollaries 1-2 and Lemma 2, for a large enough k, the optimal cost is the unique rational number with denominator bound_W or less within the interval (wt_k(v_init) − (1/d^{k−1})·(μ/(d−1)), wt_k(v_init) + (1/d^{k−1})·(μ/(d−1))). Thus, our task is to determine the smallest value of k for which (2μ/(d−1))·(1/d^{k−1}) ≤ 1/bound_diff holds. Analyzing the cases d ≥ 2 and 1 < d < 2 separately yields the stated bounds. ⊓⊔

Lower bound on number of iterations of VI. We establish a matching lower bound of Ω(|V|) iterations to show that our analysis is tight. Consider the sketch of a quantitative game in Fig. 1. Let all states belong to the maximizing player. Hence, the optimization problem reduces to searching for a path with optimal cost. Now let the loop on the right-hand side (RHS) be larger than the loop on the left-hand side (LHS). For carefully chosen values of w and lengths of the loops, one can show that the path for the optimal cost of a k-length game is along the RHS loop when k is small, but along the LHS loop when k is large. This way, the correct maximal value can be obtained only at a large value of k. Hence the VI algorithm runs for at least enough iterations that the optimal path lies in the LHS loop. By meticulous reverse engineering of the sizes of both loops and the value of w, one can guarantee that k = Ω(|V|).

Finally, we complete the worst-case complexity analysis of VI for optimization. We account for the cost of arithmetic operations, since they appear in abundance in VI. We demonstrate that there are orders-of-magnitude differences in complexity under different models of arithmetic, namely unit-cost and bit-cost.
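To see why the model of arithmetic matters, one can watch exact rational representations grow across iterations. The sketch below (our own illustration, on a hypothetical single state with a self-loop of constant cost) tracks the bit-length of the denominator of the iterate under exact arithmetic with a rational discount factor:

```python
from fractions import Fraction

d = Fraction(5, 2)   # rational discount factor p/q = 5/2
cost = 3             # constant cost on the self-loop
wt = Fraction(0)
bits = []
for _ in range(10):
    # one VI step on a single self-loop state: wt <- cost + wt / d
    wt = cost + wt / d
    bits.append(wt.denominator.bit_length())

# The denominator after k steps is q^... free but grows like 5**(k-1),
# so its bit-length grows linearly in the number of iterations:
print(bits)  # [1, 3, 5, 7, 10, 12, 14, 17, 19, 21]
```

Under the unit-cost model each of these steps counts as O(1); under the bit-cost model the cost of a step grows with the bit-length shown above, which is the source of the gap analyzed next.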
Unit-cost model.
Under the unit-cost model of arithmetic, all arithmetic operations are assumed to take constant time.
Theorem 2.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let μ > 0 be the maximum of the absolute values of costs along transitions. The worst-case complexity of the optimization problem under the unit-cost model of arithmetic is
1. O(|V|·|E|) when the discount factor satisfies d ≥ 2,
2. O(log(μ)·|E|/(d−1) + |V|·|E|) when the discount factor satisfies 1 < d < 2.

Proof. Each iteration takes O(|E|) time, since every transition is visited once. Thus, the complexity is O(|E|) multiplied by the number of iterations (Theorem 1). ⊓⊔

Bit-cost model.
Under the bit-cost model, the cost of arithmetic operations depends on the size of the numerical values. Integers are represented in their bit-wise representation. A rational number r/s is represented as a tuple of the bit-wise representations of the integers r and s. For two integers of length n and m, the cost of their addition and multiplication is O(m + n) and O(m·n), respectively.

Theorem 3.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let μ > 0 be the maximum of the absolute values of costs along transitions. Let d = p/q > 1 be the discount factor. The worst-case complexity of the optimization problem under the bit-cost model of arithmetic is
1. O(|V|²·|E|·log p·max{log μ, log p}) when d ≥ 2,
2. O((log(μ)/(d−1) + |V|)²·|E|·log p·max{log μ, log p}) when 1 < d < 2.

Proof (Sketch). Since arithmetic operations incur a cost and the length of the representation of intermediate costs increases linearly in each iteration, we can show that the cost of conducting the j-th iteration is O(|E|·j·log μ·log p). Summation over the number of iterations (Theorem 1) returns the given expressions. ⊓⊔

Remarks on integer discount factor. Our analysis shows that when the discount factor is an integer (d ≥ 2), VI requires Θ(|V|) iterations. Its worst-case complexity is, therefore, O(|V|·|E|) and O(|V|²·|E|) under the unit-cost and bit-cost models of arithmetic, respectively (ignoring logarithmic factors in the bit-cost case). From a practical point of view, the bit-cost model is more relevant, since implementations of VI will use multi-precision libraries to avoid floating-point errors. While one may argue that the upper bounds in Theorem 3 could be tightened, they would not improve significantly due to the Ω(|V|) lower bound on the number of iterations.

We present our first algorithm for the satisficing problem. It is an adaptation of VI. However, we see that it does not fare better than VI for optimization. The VI-based algorithm for satisficing is described as follows: Perform VI for optimization. Terminate as soon as one of these occurs: (a) VI completes as many iterations as prescribed by Theorem 1, or (b) the threshold value falls outside the interval defined in Lemma 2. Either way, one can tell how the threshold value relates to the optimal cost, and hence solve satisficing. Clearly, (a) needs as many iterations as optimization; (b) does not reduce the number of iterations in general, since the number of iterations grows as the distance between the optimal cost and the threshold value shrinks:
Theorem 4.
Let G = (V, v_init, E, γ) be a quantitative graph game with optimal cost W. Let v ∈ Q be the threshold value. Then the number of iterations taken by a VI-based algorithm for the satisficing problem is min{O(|V|), O(log_d(μ/|W − v|))} if d ≥ 2, and min{O(log(μ)/(d−1) + |V|), O(log_d(μ/|W − v|))} if 1 < d < 2.

Observe that this bound is tight, since the lower bounds from optimization apply here as well. The worst-case complexity can be derived using computations similar to those for optimization. Our empirical evaluation further demonstrates its non-robust performance.

Our second algorithm for satisficing is purely based on automata methods. While this approach operates with integer discount factors only, it runs linearly in the size of the quantitative game. This is lower than the number of iterations required by VI, let alone the worst-case complexities of VI. This approach reduces satisficing to solving a safety or reachability game using comparator automata. The intuition is as follows: Given threshold value v ∈ Q and relation R, let the satisficing problem be to ensure that the cost of plays relates to v by R. Then a play ρ is winning for satisficing with v and R if its cost sequence A satisfies DS(A, d) R v, where d > 1 is the discount factor. When d is an integer and v = 0, this simply checks whether A is in the safety/co-safety comparator, hence yielding the reduction. The caveat is that the above applies to v = 0 only. To overcome this, we extend the theory of comparators to permit arbitrary threshold values v ∈ Q. We find that the results for v = 0 transcend to v ∈ Q, and offer compact comparator constructions.

This section extends the existing literature on comparators with threshold value v = 0 [6,5,9] to permit non-zero thresholds. The properties we investigate are safety/co-safety and ω-regularity. We begin with formal definitions:

Definition 3 (Comparison language with threshold v ∈ Q).
For an integer upper bound μ > 0, discount factor d > 1, equality or inequality relation R ∈ {<, >, ≤, ≥, =, ≠}, and a threshold value v ∈ Q, the comparison language with upper bound μ, relation R, discount factor d, and threshold value v is the language of infinite words over the alphabet Σ = {−μ, …, μ} that accepts A ∈ Σ^ω iff DS(A, d) R v holds.

Definition 4 (Comparator automata with threshold v ∈ Q). For an integer upper bound μ > 0, discount factor d > 1, equality or inequality relation R ∈ {<, >, ≤, ≥, =, ≠}, and a threshold value v ∈ Q, the comparator automaton with upper bound μ, relation R, discount factor d, and threshold value v is an automaton that accepts the DS comparison language with upper bound μ, relation R, discount factor d, and threshold value v.

Safety and co-safety of comparison languages.
The primary observation is that to determine whether DS(A, d) R v holds, it is sufficient to examine finite-length prefixes of A, since weights later on get heavily discounted. Thus,

Theorem 5. Let μ > 0 be the integer upper bound. For arbitrary discount factor d > 1 and threshold value v ∈ Q,
1. Comparison languages are safety languages for relations R ∈ {≤, ≥, =}.
2. Comparison languages are co-safety languages for relations R ∈ {<, >, ≠}.

Proof. The proof is identical to that for threshold value v = 0 from [9]. ⊓⊔

Regularity of comparison languages.
Prior work on threshold value v = 0 shows that a comparator is ω-regular iff the discount factor is an integer [7]. We show the same result for arbitrary threshold values v ∈ Q.

First of all, trivially, comparators with arbitrary threshold values are not ω-regular for non-integer discount factors, since that already holds when v = 0. The rest of this section proves ω-regularity with arbitrary threshold values for integer discount factors. But first, let us introduce some notation: Since v ∈ Q, w.l.o.g. we assume that it has an n-length representation v = v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω. By abuse of notation, we denote both the expression v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω and the value DS(v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω, d) by v.

We will construct a Büchi automaton for the comparison language L≤ for relation ≤, threshold value v ∈ Q, and an integer discount factor. This is sufficient to prove ω-regularity for all relations, since Büchi automata are closed under Boolean operations.

From safety/co-safety of comparison languages, we argue that it is sufficient to examine the discounted sum of finite-length weight sequences to know whether their infinite extensions will be in L≤. For instance, if the discounted sum of a finite-length weight sequence W is very large, W is a bad prefix of L≤. Similarly, if the discounted sum of a finite-length weight sequence W is very small, then for all of its infinite-length bounded extensions Y, DS(W·Y, d) ≤ v. Thus, a mathematical characterization of very large and very small would formalize a criterion for membership of sequences in L≤ based on their finite prefixes.

To this end, we use the concept of a recoverable gap (or gap value), which is a measure of the distance of the discounted sum of a finite sequence from 0 [12]. The recoverable gap of a finite weight sequence W with discount factor d, denoted gap(W, d), is defined as follows: If W = ε (the empty sequence), gap(ε, d) = 0, and gap(W, d) = d^{|W|−1}·DS(W, d) otherwise. Then Lemma 3 formalizes very large and very small in Item 1 and Item 2, respectively, w.r.t. recoverable gaps. As for notation, given a sequence A, let A[⋯i] denote its i-length prefix:

Lemma 3.
Let μ > 0 be the integer upper bound and d > 1 be the discount factor. Let v ∈ Q be the threshold value such that v = v[0]v[1]…v[m](v[m+1]v[m+2]…v[n])^ω. Let W be a non-empty, bounded, finite-length weight sequence.
1. gap(W − v[⋯|W|], d) > (1/d)·DS(v[|W|⋯], d) + μ/(d−1) iff for all infinite-length, bounded extensions Y, DS(W·Y, d) > v.
2. gap(W − v[⋯|W|], d) ≤ (1/d)·DS(v[|W|⋯], d) − μ/(d−1) iff for all infinite-length, bounded extensions Y, DS(W·Y, d) ≤ v.

Proof. We present the proof of one direction of Item 1. The others follow similarly. Let W be s.t. for every infinite-length, bounded extension Y, DS(W·Y, d) > v holds, i.e., DS(W, d) + (1/d^{|W|})·DS(Y, d) > DS(v[⋯|W|], d) + (1/d^{|W|})·DS(v[|W|⋯], d) for all bounded Y. This implies DS(W, d) − DS(v[⋯|W|], d) > (1/d^{|W|})·(DS(v[|W|⋯], d) − DS(Y, d)). Taking DS(Y, d) at its minimal value −μ·d/(d−1) and multiplying both sides by d^{|W|−1}, this implies gap(W − v[⋯|W|], d) > (1/d)·(DS(v[|W|⋯], d) + μ·d/(d−1)). ⊓⊔

This segues into the state space of the Büchi automaton. We define the state space so that state s represents the gap value s. The idea is that all finite-length weight sequences with gap value s will terminate in state s. To assign transitions between these states, we observe that the gap value is defined inductively as follows: gap(ε, d) = 0 and gap(W·w, d) = d·gap(W, d) + w, where w ∈ {−μ, …, μ}. Thus, there is a transition from state s to state t on a ∈ {−μ, …, μ} if t = d·s + a. Since gap(ε, d) = 0, state 0 is assigned to be the initial state.

The issue with this construction is that it has infinitely many states. To limit that, we use Lemma 3. Since Item 1 is a necessary and sufficient criterion for bad prefixes of the safety language L≤, all states with gap value larger than the bound of Item 1 are fused into one non-accepting sink; all remaining states are accepting. Due to Item 2, all states with gap value at most the bound of Item 2 are fused into one accepting sink. Finally, since d is an integer, gap values are integral. Thus, there are only finitely many states between the bounds of Item 2 and Item 1.

Theorem 6.
Let µ > be an integer upper bound, d > an integer discountfactor, R an equality or inequality relation, and v ∈ Q the threshold value with an n -length representation given by v = v [0] v [1] . . . v [ m ]( v [ m + 1] v [ m + 2] . . . v [ n ]) ω .1. The DS comparator automata for µ, d, R , v is ω -regular iff d is an integer.2. For integer discount factors, the DS comparator is a safety or co-safety au-tomaton with O ( µ · nd − ) states.Proof. To prove Item 1 we present the construction of an ω -regular compara-tor automaton for integer upper bound µ >
0, integer discount factor d > ≤ , and threshold value v ∈ Q s.t. v = v [0] v [1] . . . v [ m ]( v [ m +1] v [ m + 2] . . . v [ n ]) ω . , denoted by A = ( S, s I , Σ, δ, F ) where:For i ∈ { , . . . , n } , let U i = d · DS ( v [ i · · · ] , d ) + µd − (Lemma 3, Item 1)For i ∈ { , . . . , n } , let L i = d · DS ( v [ i · · · ] , d ) − µd − (Lemma 3, Item 2) – States S = (cid:83) ni =0 S i ∪{ bad , veryGood } where S i = { ( s, i ) | s ∈ {(cid:98) L i (cid:99) +1 , . . . , (cid:98) U i (cid:99)}} – Initial state s I = (0 , F = S \ { bad } – Alphabet Σ = {− µ, − µ + 1 , . . . , µ − , µ } – Transition function δ ⊆ S × Σ → S where ( s, a, t ) ∈ δ then:1. If s ∈ { bad , veryGood } , then t = s for all a ∈ Σ
2. If s is of the form (p, i) and a ∈ Σ:
(a) If d·p + a − v[i] > ⌊U_i⌋, then t = bad
(b) If d·p + a − v[i] ≤ ⌊L_i⌋, then t = veryGood
(c) If ⌊L_i⌋ < d·p + a − v[i] ≤ ⌊U_i⌋:
i. If i == n, then t = (d·p + a − v[i], m+1)
ii. Else, t = (d·p + a − v[i], i+1)

We skip the proof of correctness, as it follows from the above discussion. Observe that A is deterministic. It is a safety automaton, as all non-accepting states are sinks. To prove Item 2, observe that since the comparator for ≤ is a deterministic safety automaton, the comparator for > is obtained by simply swapping the accepting and non-accepting states. This is a co-safety automaton of the same size. One can argue similarly for the remaining relations. ⊓⊔

This section describes our comparator-based linear-time algorithm for satisficing with integer discount factors. As described earlier, given discount factor d >
1, a play is winning for satisficing with threshold value v ∈ Q and relation R if its cost sequence A satisfies DS(A, d) R v. We now know from Theorem 6 that the winning condition for plays can be expressed as a safety or co-safety automaton for any v ∈ Q, as long as the discount factor is an integer. Therefore, a synchronized product of the quantitative game with the safety or co-safety comparator denoting the winning condition completes the reduction to a safety or reachability game, respectively. Theorem 7.
Let G = (V, v_init, E, γ) be a quantitative game, d > 1 the integer discount factor, R the equality or inequality relation, and v ∈ Q the threshold value with an n-length representation. Let µ > 0 be the maximum of the absolute values of costs along transitions in G. Then,
1. The satisficing problem reduces to solving a safety game if R ∈ {≤, ≥}
2. The satisficing problem reduces to solving a reachability game if R ∈ { <, > }
3. The satisficing problem is solved in O((|V| + |E|)·µ·n) time.

Proof. The first two items use a standard synchronized-product argument on the following formal reduction [15]: Let G = (V = V₀ ⊎ V₁, v_init, E, γ) be a quantitative game, d > 1 the integer discount factor, R the equality or inequality relation, and v ∈ Q the threshold value with an n-length representation. Let µ > 0 be the maximum of the absolute values of costs along transitions in G. The first step is to construct the safety/co-safety comparator A = (S, s_I, Σ, δ, F) for µ, d, R and v. The next is to synchronize the product of G and A over weights to construct the game GA = (W = W₀ ∪ W₁, s^×_init, δ_W, F_W), where

– W = V × S. In particular, W₀ = V₀ × S and W₁ = V₁ × S. Since V₀ and V₁ are disjoint, W₀ and W₁ are disjoint too.
– The initial state is s^×_init = (v_init, s_I).
– The transition relation δ_W ⊆ W × W is defined such that a transition ((v, s), (v′, s′)) ∈ δ_W synchronizes a transition (v, v′) ∈ E with a transition (s, a, s′) ∈ δ, where a = γ((v, v′)) is the cost of the transition in G.
– F_W = V × F. The game is a safety game if the comparator is a safety automaton and a reachability game if the comparator is a co-safety automaton.

We need the size of GA to analyze the worst-case complexity. Clearly, GA consists of O(|V|·µ·n) states. To establish the number of transitions in GA, observe that every state (v, s) in GA has the same number of outgoing edges as state v in G, because the comparator A is deterministic. Since GA has O(µ·n) copies of every state v ∈ G, there are a total of O(|E|·µ·n) transitions in GA. Since GA is either a safety or a reachability game, it is solved in time linear in its size. Thus, the overall complexity is O((|V| + |E|)·µ·n). ⊓⊔

With respect to the value µ, the VI-based solutions are logarithmic in the worst case, while the comparator-based solution is linear due to the size of the comparator.
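The synchronized product at the heart of this reduction is mechanical enough to sketch in code. Below is a minimal illustrative Python version (not the paper's implementation): the comparator is passed as an explicit deterministic transition table `comp_delta`, the initial vertex is taken to be the first element of `vertices`, and sink states of the comparator loop back to themselves.

```python
def product_game(vertices, edges, gamma, comp_init, comp_delta):
    """Synchronized product of a quantitative game with a deterministic
    comparator: product state (v, s); a game edge (v, v') with cost
    gamma[(v, v')] moves the comparator from s to comp_delta[(s, cost)].
    Only states reachable from the initial product state are built."""
    start = (vertices[0], comp_init)
    prod_states, prod_edges = {start}, set()
    frontier = [start]
    while frontier:
        v, s = frontier.pop()
        for (u, u2) in edges:
            if u != v:
                continue
            t = comp_delta[(s, gamma[(u, u2)])]  # comparator reads the edge cost
            prod_edges.add(((v, s), (u2, t)))
            if (u2, t) not in prod_states:
                prod_states.add((u2, t))
                frontier.append((u2, t))
    return prod_states, prod_edges
```

Because the comparator is deterministic, each product state has exactly as many outgoing edges as its game component, which is what yields the O(|E|·µ·n) bound on transitions.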
From a practical perspective, this may not be a limitation, since weights along transitions can be scaled down. The parameter that cannot be altered is the size of the quantitative game; with respect to that, the comparator-based solution displays clear superiority. Finally, the comparator-based solution is affected by n, the length of the representation of the threshold value, while the VI-based solution is not. It is natural to assume that the value of n is small.

Fig. 2: Cactus plot. µ = 5, v = 3. Total benchmarks = 291.

Fig. 3: Single-counter scalable benchmark. µ = 5, v = 3. Timeout = 500s.

The goal of the empirical analysis is to determine whether the practical performance of these algorithms resonates with our theoretical discoveries. For an apples-to-apples comparison, we implement three algorithms: (a)
VIOptimal: optimization via value iteration, (b) VISatisfice: satisficing via value iteration, and (c) CompSatisfice: satisficing via comparators. All tools have been implemented in C++. To avoid floating-point errors in VIOptimal and VISatisfice, the tools invoke the open-source GMP (GNU Multi-Precision) library [2]. Since all arithmetic operations in CompSatisfice are integral only, it does not use GMP.

To avoid completely randomized benchmarks, we create ∼290 benchmarks from an LTLf benchmark suite [29]. The state-of-the-art LTLf-to-automaton tool Lisa [8] is used to convert LTLf to (non-quantitative) graph games. Weights are randomly assigned to transitions. The number of states in our benchmarks ranges from 3 to 50000+. Discount factor d = 2, threshold v ∈ [0, …].

Observations and Inferences
Overall, we see that CompSatisfice is efficient and scalable, and exhibits steady and predictable performance.

CompSatisfice outperforms VIOptimal in both runtime and number of benchmarks solved, as shown in Fig 2. It is crucial to note that all benchmarks solved by VIOptimal had fewer than 200 states; in contrast, CompSatisfice solves much larger benchmarks with 3 to 50000+ states. To test scalability, we compared both tools on a set of scalable benchmarks. For integer parameter i > 0, the i-th scalable benchmark has 3·i states. Fig 3 plots number of states against runtime in log-log scale, so the slope of a straight line indicates the degree of the polynomial (in practice). It shows that CompSatisfice exhibits linear behavior (slope ∼ 1), whereas VIOptimal is much more expensive (slope >> 1) even in practice. (Figures are best viewed online and in color.)

Fig. 4: Robustness. Fix benchmark, vary v. µ = 5. Timeout = 500s.

CompSatisfice is more robust than VISatisfice. We compare CompSatisfice and VISatisfice as the threshold value changes. This experiment is chosen due to Theorem 4, which proves that VISatisfice is non-robust. As shown in Fig 4, the variance in the performance of VISatisfice is very high; the appearance of a peak close to the optimal value is an empirical demonstration of Theorem 4. On the other hand, CompSatisfice stays steady in performance owing to its low complexity.
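For reference, the value-iteration core that VIOptimal and VISatisfice build on can be sketched as follows. This is an illustrative Python version with exact rational arithmetic (mirroring the role GMP plays in the C++ tools); the function and parameter names are ours, not the tools'.

```python
from fractions import Fraction

def value_iteration(vertices, edges, gamma, is_max, d, iters):
    """One value-iteration pass per round:
    cost_k(v, w) = gamma(v, w) + (1/d) * wt_{k-1}(w), and
    wt_k(v) is the max (maximizer's vertex) or min (minimizer's vertex)
    of cost_k over v's outgoing edges."""
    wt = {v: Fraction(0) for v in vertices}
    for _ in range(iters):
        cost = {(v, w): Fraction(gamma[(v, w)]) + wt[w] / d for (v, w) in edges}
        wt = {v: (max if is_max[v] else min)(cost[(u, w)] for (u, w) in edges if u == v)
              for v in vertices}
    return wt
```

On a two-vertex cycle with costs +1 and −1 and d = 2, the values converge geometrically to ±2/3; the number of rounds needed before a threshold comparison stabilizes is what makes VISatisfice sensitive to thresholds near the optimal value.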
Having witnessed the algorithmic improvements of comparator-based satisficing over VI-based algorithms, we now shift focus to the question of applicability. While this section examines applicability with respect to the ability to extend to temporal goals, the discussion highlights a core strength of comparator-based reasoning in satisficing and shows its promise on a broader variety of problems.

The problem of extending optimal/satisficing solutions with a temporal goal is to determine whether there exists an optimal/satisficing solution that also satisfies a given temporal goal. Formally, given a quantitative game G, a labeling function L : V → AP which assigns states V of G to atomic propositions from the set AP, and a temporal goal ϕ over AP, we say a play ρ = v_0 v_1 ... satisfies ϕ if its proposition sequence L(v_0) L(v_1) ... satisfies the formula ϕ. Then, to solve optimization/satisficing with a temporal goal is to determine whether there exists a solution that is optimal/satisficing and also satisfies the temporal goal along resulting plays. Prior work has proven that the optimization problem cannot be extended to temporal goals [13] unless the temporal goals are very simple safety properties [10,31]. In contrast, our comparator-based solution for satisficing can naturally be extended to temporal goals, in fact to all ω-regular properties, owing to its automata-based underpinnings, as shown below: Theorem 8.
Let G be a quantitative game with state set V, L : V → AP a labeling function over the set of atomic propositions AP, ϕ a temporal goal over AP, and A_ϕ its equivalent deterministic parity automaton. Let d > 1 be an integer discount factor, µ the maximum of the absolute values of costs along transitions, and v ∈ Q the threshold value with an n-length representation. Then, solving satisficing with temporal goals reduces to solving a parity game of size linear in |V|, µ, n and |A_ϕ|.

Proof. The reduction involves two steps of synchronized products. The first reduces the satisficing problem to a safety/reachability game while preserving the labeling function. The second synchronized product is between the safety/reachability game and the DPA A_ϕ; these synchronize on the atomic propositions of the labeling function and the DPA transitions, respectively. Therefore, the resulting parity game is linear in |V|, µ, n, and |A_ϕ|. ⊓⊔

Broadly speaking, our ability to solve satisficing via automata-based methods is a key feature, as it propels a seamless integration of quantitative properties (threshold bounds) with qualitative properties, since both are grounded in automata-based methods. VI-based solutions are inhibited from doing so, since numerical methods are known to not combine well with the automata-based methods that are so prominent in qualitative reasoning [5,20]. This key feature could be exploited in several other problems to show further benefits of comparator-based satisficing over optimization and VI-based methods.
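The second synchronized product in this proof can be sketched in the same style as the first. The version below is purely illustrative: it assumes the DPA reads the label of the state being entered (conventions differ), and all names are ours.

```python
def parity_product(init, game_edges, label, dpa_init, dpa_delta, dpa_priority):
    """Synchronize a game with a deterministic parity automaton over state
    labels. dpa_delta maps (dpa_state, label) -> dpa_state; the priority of
    a product state (v, q) is the priority of its DPA component q."""
    start = (init, dpa_init)
    states, edges = {start}, set()
    todo = [start]
    while todo:
        v, q = todo.pop()
        for (u, u2) in game_edges:
            if u != v:
                continue
            q2 = dpa_delta[(q, label[u2])]  # DPA steps on the target's label
            edges.add(((v, q), (u2, q2)))
            if (u2, q2) not in states:
                states.add((u2, q2))
                todo.append((u2, q2))
    # a product state inherits the priority of its DPA component
    priority = {(v, q): dpa_priority[q] for (v, q) in states}
    return states, edges, priority
```

A product state (v, q) inherits the priority of its DPA component q; solving the resulting parity game then answers satisficing with the temporal goal.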
This work introduces the satisficing problem for quantitative games with the discounted-sum cost model. When the discount factor is an integer, we present a comparator-based solution for satisficing, which exhibits algorithmic improvements – better worst-case complexity and efficient, scalable, and robust performance – as well as broader applicability over traditional solutions based on numerical approaches for satisficing and optimization. Other technical contributions include the presentation of the missing proof of value iteration for optimization and the extension of comparator automata to enable direct comparison to arbitrary threshold values, as opposed to the zero threshold value only.

An undercurrent of our comparator-based approach for satisficing is that it offers an automata-based replacement for traditional numerical methods. By doing so, it paves a way to combine quantitative and qualitative reasoning without compromising on theoretical guarantees or performance. This motivates tackling more challenging problems in this area, such as more complex environments, variability in information availability, and their combinations.
Acknowledgements.
We thank the anonymous reviewers for their valuable input. This work is supported in part by NSF grant 2030859 to the CRA for the CIFellows Project, NSF grants IIS-1527668, CCF-1704883, and IIS-1830549, the ERC CoG 863818 (ForM-SMArt), and an award from the Maryland Procurement Office. References
1. Satisficing. https://en.wikipedia.org/wiki/Satisficing.
2. GMP. https://gmplib.org/.
3. B. Alpern and F. B. Schneider. Recognizing safety and liveness. Distributed Computing, 2(3):117–126, 1987.
4. C. Baier. Probabilistic model checking. In Dependable Software Systems Engineering, pages 1–23, 2016.
5. S. Bansal, S. Chaudhuri, and M. Y. Vardi. Automata vs linear-programming discounted-sum inclusion. In Proc. of International Conference on Computer-Aided Verification (CAV), 2018.
6. S. Bansal, S. Chaudhuri, and M. Y. Vardi. Comparator automata in quantitative verification. In Proc. of International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), 2018.
7. S. Bansal, S. Chaudhuri, and M. Y. Vardi. Comparator automata in quantitative verification (full version). CoRR, abs/1812.06569, 2018.
8. S. Bansal, Y. Li, L. Tabajara, and M. Y. Vardi. Hybrid compositional reasoning for reactive synthesis from finite-horizon specifications. In Proc. of AAAI, 2020.
9. S. Bansal and M. Y. Vardi. Safety and co-safety comparator automata for discounted-sum inclusion. In Proc. of International Conference on Computer-Aided Verification (CAV), 2019.
10. J. Bernet, D. Janin, and I. Walukiewicz. Permissive strategies: from parity games to safety games. RAIRO-Theoretical Informatics and Applications, 36(3):261–275, 2002.
11. R. Bloem, K. Chatterjee, T. Henzinger, and B. Jobstmann. Better quality in synthesis through quantitative objectives. In Proc. of CAV, pages 140–156. Springer, 2009.
12. U. Boker and T. A. Henzinger. Exact and approximate determinization of discounted-sum automata. LMCS, 10(1), 2014.
13. K. Chatterjee, T. A. Henzinger, J. Otop, and Y. Velner. Quantitative fair simulation games. Information and Computation, 254:143–166, 2017.
14. D. Clark, S. Hunt, and P. Malacaria. A static analysis for quantifying information flow in a simple imperative language. Journal of Computer Security, 15(3):321–371, 2007.
15. T. Colcombet and N. Fijalkow. Universal graphs and good for games automata: New tools for infinite duration games. In Proc. of FSTTCS, pages 1–26. Springer, 2019.
16. B. Finkbeiner, C. Hahn, and H. Torfah. Model checking quantitative hyperproperties. In Proc. of CAV, pages 144–163. Springer, 2018.
17. T. D. Hansen, P. B. Miltersen, and U. Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. Journal of the ACM, 60, 2013.
18. K. He, M. Lahijanian, L. Kavraki, and M. Vardi. Reactive synthesis for finite tasks under resource constraints. In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on, pages 5326–5332. IEEE, 2017.
19. O. Kupferman and M. Y. Vardi. Model checking of safety properties. In Proc. of CAV, pages 172–183. Springer, 1999.
20. M. Kwiatkowska. Quantitative verification: Models, techniques and tools. In Proc. 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 449–458. ACM Press, September 2007.
21. M. Kwiatkowska, G. Norman, and D. Parker. Advances and challenges of probabilistic model checking. Pages 1691–1698. IEEE, 2010.
22. M. Lahijanian, S. Almagor, D. Fried, L. Kavraki, and M. Vardi. This time the robot settles for a cost: A quantitative approach to temporal logic planning with partial satisfaction. In AAAI, pages 3664–3671, 2015.
23. M. L. Littman. Algorithms for sequential decision making. Brown University, Providence, RI, 1996.
24. M. Osborne and A. Rubinstein. A course in game theory. MIT Press, 1994.
25. M. Puterman. Markov decision processes. Handbooks in Operations Research and Management Science, 2:331–434, 1990.
26. S. A. Seshia, A. Desai, T. Dreossi, D. J. Fremont, S. Ghosh, E. Kim, S. Shivakumar, M. Vazquez-Chanlatte, and X. Yue. Formal specification for deep neural networks. In Proc. of ATVA, pages 20–34. Springer, 2018.
27. L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences of the United States of America, 39(10):1095, 1953.
28. R. Sutton and A. Barto. Introduction to reinforcement learning, volume 135. MIT Press, Cambridge, 1998.
29. L. M. Tabajara and M. Y. Vardi. Partitioning techniques in LTLf synthesis. In IJCAI, pages 5599–5606. AAAI Press, 2019.
30. W. Thomas, T. Wilke, et al. Automata, logics, and infinite games: A guide to current research, volume 2500. Springer Science & Business Media, 2002.
31. M. Wen, R. Ehlers, and U. Topcu. Correct-by-synthesis reinforcement learning with temporal logic constraints. Pages 4983–4990. IEEE, 2015.
32. U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1):343–359, 1996.
A Complexity proof for VI Optimization
Lemma 1
Let l = l0·(l1)^ω represent an integer cost sequence of a lasso, where l0 and l1 are the cost sequences of the head and the loop of the lasso, respectively. Let d = p/q be the discount factor. Then DS(l, d) is a rational number with denominator at most (p^{|l1|} − q^{|l1|})·p^{|l0|}.

Proof. The discounted sum of l is given as follows:

DS(l, d) = DS(l0, d) + (1/d^{|l0|})·DS((l1)^ω, d)
= DS(l0, d) + (1/d^{|l0|})·(DS(l1, d) + (1/d^{|l1|})·DS(l1, d) + (1/d^{2·|l1|})·DS(l1, d) + ···)

Taking the closed-form expression of the term in the parentheses, we get

= DS(l0, d) + (1/d^{|l0|})·(d^{|l1|}/(d^{|l1|} − 1))·DS(l1, d)

Let l1 = b_0 b_1 ... b_{|l1|−1}, where b_i ∈ Z. Then

= DS(l0, d) + (1/d^{|l0|})·(d^{|l1|}/(d^{|l1|} − 1))·(b_0 + b_1/d + ··· + b_{|l1|−1}/d^{|l1|−1})
= DS(l0, d) + (1/d^{|l0|})·(1/(d^{|l1|} − 1))·(b_0·d^{|l1|} + b_1·d^{|l1|−1} + ··· + b_{|l1|−1}·d)

Expressing d = p/q, we get

= DS(l0, d) + (1/d^{|l0|})·(q^{|l1|}/(p^{|l1|} − q^{|l1|}))·(b_0·(p/q)^{|l1|} + ··· + b_{|l1|−1}·(p/q))
= DS(l0, d) + (1/d^{|l0|})·(1/(p^{|l1|} − q^{|l1|}))·M, where M ∈ Z

Expressing d = p/q again, we get

= (1/p^{|l0|})·(1/(p^{|l1|} − q^{|l1|}))·N, where N ∈ Z ⊓⊔

Theorem 1
Let G = (V, v_init, E, γ) be a graph game. The number of iterations required by the value-iteration algorithm, i.e., the length of the finite-length game needed to compute the optimal value W, is
1. O(|V|) when the discount factor d ≥ 2,
2. O(log(µ)/(d−1) + |V|) when the discount factor 1 < d < 2.

Proof. Recall that the task is to find a k such that the interval identified by Lemma 2 is smaller than bound_diff. Note that bound_W < bound_diff; therefore, bound_diff < bound_v. Hence, there can be only one rational value with denominator bound_W or less in the small interval identified by the chosen k. Since the optimal value must also lie in this interval, the unique rational number with denominator bound_W or less must be the optimal value. Let k be such that the interval from Lemma 2 is smaller than bound_diff. Then, for suitable constants c, c′, c″ > 0:

2·µ/(d−1)·(1/d^{k−1}) ≤ c/((p^{|V|} − q^{|V|})·p^{2·|V|})
⟹ 2·µ/(d−1)·(1/d^{k−1}) ≤ c·q^{3·|V|}/((p^{|V|} − q^{|V|})·p^{2·|V|})
⟹ 2·µ/(d−1)·(1/d^{k−1}) ≤ c/((d^{|V|} − 1)·d^{2·|V|})
⟹ (d−1)·d^{k−1} ≥ c′·µ·(d^{|V|} − 1)·d^{2·|V|}
⟹ log(d−1) + (k−1)·log(d) ≥ c″ + log(µ) + log(d^{|V|} − 1) + 2·|V|·log(d)

When d ≥ 2: In this case, both d and d^{|V|} are large, so log(d^{|V|} − 1) ≈ |V|·log(d). Then

log(d−1) + (k−1)·log(d) ≥ c″ + log(µ) + |V|·log(d) + 2·|V|·log(d)
⟹ k = O(|V|)

When d is small but d^{|V|} is large: In this case, log(d) ≈ (d−1). Then

log(d−1) + (k−1)·(d−1) ≥ c″ + log(µ) + |V|·log(d) + 2·|V|·(d−1)
⟹ k = O(log(µ)/(d−1) + |V|)

When both d and d^{|V|} are small: In addition to the approximations from the earlier case, log(d^{|V|} − 1) ≈ −(2 − d^{|V|}). Then

log(d−1) + (k−1)·(d−1) ≥ c″ + log(µ) + (2 − d^{|V|}) + 2·|V|·(d−1)
⟹ k = O(log(µ)/(d−1) + |V|) ⊓⊔

Concrete example to establish the Ω(|V|) lower bound on the number of iterations required by the value-iteration algorithm. Recall Fig 1, presented here again as Fig 5.

Fig. 5: Sketch of a game graph which requires Ω(|V|) iterations.

Let the loop on the left-hand side have 4·n edges, the loop on the right-hand side have 2·n edges, and w = d^n + d^{2·n} + ··· + d^{m·n−1}, such that m·n − 1 ≥ c·n for a positive integer c > 1. For games of length (m·n − 1) or less, the optimal path arises from the loop on the right; but for games of length greater than (m·n − 1), it arises from the loop on the left. Hence, Ω(|V|) iterations are required.
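The closed form in Lemma 1 is easy to check with exact rational arithmetic. A small illustrative sketch (ours, not part of the paper's tooling):

```python
from fractions import Fraction

def ds_finite(seq, d):
    """DS(W, d) = sum_i W[i] / d^i for a finite weight sequence W."""
    return sum(Fraction(w) / d**i for i, w in enumerate(seq))

def ds_lasso(head, loop, d):
    """Lemma 1's closed form for a lasso l0·(l1)^w:
    DS(l, d) = DS(l0, d) + (1/d^|l0|) * (d^|l1| / (d^|l1| - 1)) * DS(l1, d)."""
    n0, n1 = len(head), len(loop)
    return ds_finite(head, d) + (d**n1 / (d**n1 - 1)) * ds_finite(loop, d) / d**n0
```

For instance, with d = 3/2, head (1) and loop (2, 1), the discounted sum is 21/5; its denominator 5 indeed stays within the bound (p^{|l1|} − q^{|l1|})·p^{|l0|} = (9 − 4)·3 = 15.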
1) or less, the optimal patharises from the loop to the right. But for games of length greater than ( m · n − B Complexity of VI under Bit-Cost model
Under the bit-cost model, the cost of arithmetic operations depends on the size of the numerical values. Integers are represented in their bit-wise representation. A rational number r/s is represented as a tuple of the bit-wise representations of the integers r and s. For two integers of length n and m, the cost of their addition and multiplication is O(m + n) and O(m·n), respectively.

To compute the cost of arithmetic in each iteration of the value-iteration algorithm, we define the cost of a transition (v, w) ∈ E in the k-th iteration as cost_1(v, w) = γ(v, w) and cost_k(v, w) = γ(v, w) + (1/d)·wt_{k−1}(w) for k > 1, where wt_k(v) = max{cost_k(v, w) | w ∈ vE} if v ∈ V₀ and wt_k(v) = min{cost_k(v, w) | w ∈ vE} if v ∈ V₁. Since we compute the cost of every transition in each iteration, it is crucial to analyze the size and the cost of computing cost_k. Lemma 4.
Let G be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q be the discount factor. Then for all (v, w) ∈ E and all k > 1,

cost_k(v, w) = (q^{k−1}·n_1 + q^{k−2}·p·n_2 + ··· + p^{k−1}·n_k) / p^{k−1},

where n_i ∈ Z and |n_i| ≤ µ for all i ∈ {1, ..., k}. Lemma 4 can be proven by induction on k. Lemma 5.
Let G be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q be the discount factor. For all (v, w) ∈ E and all k > 1, the cost of computing cost_k(v, w) in the k-th iteration is O(k·log p·max{log µ, log p}). Proof.
We compute the cost of computing cost_k(v, w) given that the optimal costs have been computed for the (k−1)-th iteration:

cost_k(v, w) = γ(v, w) + (1/d)·wt_{k−1}(w) = γ(v, w) + (q/p)·wt_{k−1}(w)
= γ(v, w) + (q/p)·(q^{k−2}·n_1 + q^{k−3}·p·n_2 + ··· + p^{k−2}·n_{k−1})/p^{k−2}

for some n_i ∈ Z with |n_i| ≤ µ. Therefore, the computation of cost_k(v, w) involves four operations:
1. Multiplication of q with (q^{k−2}·n_1 + ··· + p^{k−2}·n_{k−1}). The latter is bounded by (k−1)·µ·p^{k−2}, since |n_i| ≤ µ and p > q. The cost of this operation is O(log((k−1)·µ·p^{k−2})·log(p)) = O(((k−1)·log p + log µ + log(k−1))·log p).
2. Multiplication of p with p^{k−2}. Its cost is O((k−1)·(log p)²).
3. Multiplication of p^{k−1} with γ(v, w). Its cost is O((k−1)·log p·log µ).
4. Addition of γ(v, w)·p^{k−1} with q·(q^{k−2}·n_1 + ··· + p^{k−2}·n_{k−1}). Its cost is linear in the size of their representations.

Therefore, the cost of computing cost_k(v, w) is O(k·log p·max{log µ, log p}). ⊓⊔

Now we can compute the cost of computing the optimal costs in the k-th iteration from the (k−1)-th iteration. Lemma 6.
Let G be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q be the discount factor. The worst-case complexity of computing the optimal costs in the k-th iteration from the (k−1)-th iteration is O(|E|·k·log µ·log p).

Proof. The update requires us to first compute the transition cost in the k-th iteration for every transition in the game. Lemma 5 gives the cost of computing the transition cost of one transition; therefore, the worst-case complexity of computing the transition costs of all transitions is O(|E|·k·log p·max{log µ, log p}). To compute the optimal cost for each state, we are required to compute the maximum (or minimum) transition cost over all outgoing transitions of the state. Since the denominators are the same, the maximum value can be computed via lexicographic comparison of the numerators along all transitions. Therefore, the cost of computing the maximum for all states is O(|E|·k·log µ·log p). Thus, the total cost of computing the optimal costs in the k-th iteration from the (k−1)-th iteration is O(|E|·k·log p·max{log µ, log p}). ⊓⊔

Finally, the worst-case complexity of computing the optimal value of the quantitative game under the bit-cost model for arithmetic operations is as follows:
Theorem 3.
Let G = (V, v_init, E, γ) be a quantitative graph game. Let µ > 0 be the maximum of the absolute values of all costs along transitions. Let d = p/q > 1 be the discount factor. The worst-case complexity of computing the optimal value under the bit-cost model for arithmetic operations is
1. O(|V|²·|E|·log p·max{log µ, log p}) when d ≥ 2,
2. O((log(µ)/(d−1) + |V|)²·|E|·log p·max{log µ, log p}) when 1 < d < 2.

Proof. This is the sum of the costs of computing the optimal costs over all iterations. When d ≥ 2, it is sufficient to perform value iteration O(|V|) times (Theorem 1). So the cost is O((1 + 2 + 3 + ··· + |V|)·|E|·log p·max{log µ, log p}), which simplifies to O(|V|²·|E|·log p·max{log µ, log p}). A similar computation solves the case 1 < d < 2. ⊓⊔

C Discounted-sum comparator construction
Theorem 5
Let µ > 0 be the upper bound. For an arbitrary discount factor d > 1 and threshold value v:
1. DS-comparison languages are safety languages for relations R ∈ {≤, ≥, =}.
2. DS-comparison languages are co-safety languages for relations R ∈ {<, >, ≠}.

Proof. Due to the duality of safety and co-safety languages, it is sufficient to show that the DS-comparison language with ≤ is a safety language.

Assume that the DS-comparison language with ≤ is not a safety language. Let W be a weight sequence in the complement of the DS-comparison language with ≤ such that it does not have a bad prefix. Since W is in the complement of the DS-comparison language with ≤, DS(W, d) > v. By assumption, every i-length prefix W[···i] of W can be extended to a bounded weight sequence W[···i]·Y_i such that DS(W[···i]·Y_i, d) ≤ v.

Note that DS(W, d) = DS(W[···i], d) + (1/d^i)·DS(W[i···], d) and DS(W[···i]·Y_i, d) = DS(W[···i], d) + (1/d^i)·DS(Y_i, d). The contribution of the tail sequences W[i···] and Y_i to the discounted sums of W and W[···i]·Y_i, respectively, diminishes exponentially as the value of i increases. In addition, since W and W[···i]·Y_i share a common i-length prefix W[···i], their discounted-sum values must converge to each other. The discounted sum of W is fixed and greater than v, so due to convergence there must be a k ≥ 0 such that DS(W[···k]·Y_k, d) > v. Contradiction. Therefore, the DS-comparison language with ≤ is a safety language. The above intuition is formalized below.

Since DS(W, d) > v and DS(W[···i]·Y_i, d) ≤ v, the difference DS(W, d) − DS(W[···i]·Y_i, d) > 0. Moreover, DS(W, d) − DS(W[···i]·Y_i, d) = (1/d^i)·(DS(W[i···], d) − DS(Y_i, d)) ≤ (1/d^i)·(|DS(W[i···], d)| + |DS(Y_i, d)|). Since the maximum absolute value of the discounted sum of sequences bounded by µ is µ·d/(d−1), we also get that DS(W, d) − DS(W[···i]·Y_i, d) ≤ 2·(1/d^i)·µ·d/(d−1). Putting it all together, for all i ≥ 0:

0 < DS(W, d) − DS(W[···i]·Y_i, d) ≤ 2·(1/d^i)·µ·d/(d−1)

As i → ∞, 2·(1/d^i)·µ·d/(d−1) → 0. So lim_{i→∞}(DS(W, d) − DS(W[···i]·Y_i, d)) = 0, and since DS(W, d) is fixed, lim_{i→∞} DS(W[···i]·Y_i, d) = DS(W, d). By the definition of convergence, there exists an index k ≥ 0 such that DS(W[···k]·Y_k, d) falls within the (DS(W, d) − v)/2-neighborhood of DS(W, d). Finally, since DS(W, d) > v, this implies DS(W[···k]·Y_k, d) > v, which contradicts DS(W[···i]·Y_i, d) ≤ v for all i ≥ 0. Therefore, the DS-comparison language with ≤ is a safety language. ⊓⊔
Let µ > 0 be the integer upper bound, d > 1 an integer discount factor, and let the relation R be the inequality ≤. Let v ∈ Q be the threshold value such that v = v[0]v[1]...v[m](v[m+1]v[m+2]...v[n])^ω. Let W be a non-empty, bounded, finite weight sequence. Then W is a bad prefix of the DS-comparison language with µ, d, ≤ and v iff

gap(W − v[···|W|], d) > (1/d)·(DS(v[|W|···], d) + µ·d/(d−1)).

Proof. Let W be a bad prefix. Then for every infinite-length, bounded weight sequence Y, we get DS(W·Y, d) > v
⟹ DS(W, d) + (1/d^{|W|})·DS(Y, d) > DS(v[···|W|]·v[|W|···], d)
⟹ DS(W, d) − DS(v[···|W|], d) > (1/d^{|W|})·(DS(v[|W|···], d) − DS(Y, d))
⟹ gap(W − v[···|W|], d) > (1/d)·(DS(v[|W|···], d) + µ·d/(d−1)),
where the last step instantiates Y with the minimizing sequence (−µ)^ω, for which DS(Y, d) = −µ·d/(d−1), and multiplies both sides by d^{|W|−1}.

Next, we prove that if a finite weight sequence W satisfies the above inequality, then W is a bad prefix. Let Y be an arbitrary infinite, bounded weight sequence. Then DS(W·Y, d) = DS(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·(gap(v[···|W|], d) − gap(v[···|W|], d)). By rearrangement of terms, we get DS(W·Y, d) = (1/d^{|W|−1})·gap(W − v[···|W|], d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). Since gap(W − v[···|W|], d) > (1/d)·(DS(v[|W|···], d) + µ·d/(d−1)) holds, we get DS(W·Y, d) > (1/d^{|W|})·(DS(v[|W|···], d) + µ·d/(d−1)) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). Since the minimal value of DS(Y, d) is −µ·d/(d−1), the inequality simplifies to DS(W·Y, d) > (1/d^{|W|−1})·gap(v[···|W|], d) + (1/d^{|W|})·DS(v[|W|···], d) = DS(v, d) = v. Therefore, W is a bad prefix. ⊓⊔ Lemma 7.
Let µ and d > 1 be the bound and discount factor, respectively. Let W be a non-empty, bounded, finite weight sequence. Then W is a very good prefix of the DS comparator A^{µ,d}_{≤} with threshold v iff

gap(W − v[···|W|], d) ≤ (1/d)·(DS(v[|W|···], d) − µ·d/(d−1)).

Proof. Let W be a very good prefix. Then for all infinite, bounded sequences Y, we get DS(W·Y, d) ≤ v ⟹ DS(W, d) + (1/d^{|W|})·DS(Y, d) ≤ v. By rearrangement of terms, gap(W − v[···|W|], d) ≤ (1/d)·(DS(v[|W|···], d) − DS(Y, d)). Since the maximal value of DS(Y, d) is µ·d/(d−1), we get gap(W − v[···|W|], d) ≤ (1/d)·(DS(v[|W|···], d) − µ·d/(d−1)).

Next, we prove the converse. We know DS(W·Y, d) = DS(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) = (1/d^{|W|−1})·gap(W, d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·(gap(v[···|W|], d) − gap(v[···|W|], d)). By rearrangement of terms, DS(W·Y, d) = (1/d^{|W|−1})·gap(W − v[···|W|], d) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). From the assumption we derive DS(W·Y, d) ≤ (1/d^{|W|})·(DS(v[|W|···], d) − µ·d/(d−1)) + (1/d^{|W|})·DS(Y, d) + (1/d^{|W|−1})·gap(v[···|W|], d). Since the maximal value of DS(Y, d) is µ·d/(d−1), we get DS(W·Y, d) ≤ v. Therefore, W is a very good prefix. ⊓⊔
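For the special case of threshold v = 0 (so every v[i] = 0 and DS(v[i···], d) = 0), the comparator run and the bad-prefix criterion above can be exercised in a few lines. This is an illustrative sketch with our own names, not the construction verbatim; the cutoffs are the v = 0 instances of the bounds in Lemma 3 and Lemma 7.

```python
import math
from fractions import Fraction

def comparator_run(W, mu, d):
    """Run the deterministic safety comparator for DS(A, d) <= 0.
    Returns 'bad', 'veryGood', or the final gap state."""
    U = Fraction(mu, d - 1)    # Lemma 3 cutoff: (1/d) * (0 + mu*d/(d-1))
    L = -U                     # Lemma 7 cutoff
    s = 0                      # gap(eps, d) = 0 is the initial state
    for a in W:
        t = d * s + a          # gap(W·a, d) = d * gap(W, d) + a
        if t > math.floor(U):
            return "bad"       # bad prefix: every extension has DS > 0
        if t <= math.floor(L):
            return "veryGood"  # very good prefix: every extension has DS <= 0
        s = t
    return s

def bad_by_definition(W, mu, d):
    """W is a bad prefix of { A : DS(A, d) <= 0 } iff even the cheapest
    infinite extension (-mu)^omega leaves the discounted sum above 0."""
    ds_w = sum(Fraction(w) / d**i for i, w in enumerate(W))
    tail = -Fraction(mu * d, d - 1) / d**len(W)  # DS of (-mu)^omega past W
    return ds_w + tail > 0
```

For µ = 1 and d = 2, a brute-force check over all short weight sequences confirms that the run reaches the bad sink exactly on the bad prefixes characterized by Lemma 3.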