One-Clock Priced Timed Games are PSPACE-hard
John Fearnley, Rasmus Ibsen-Jensen, and Rahul Savani
Dept. of Computer Science, University of Liverpool
March 6, 2020
Abstract
The main result of this paper is that computing the value of a one-clock priced timed game (OCPTG) is PSPACE-hard. Along the way, we provide a family of OCPTGs that have an exponential number of event points. Both results hold even in very restricted classes of games such as DAGs with treewidth three. Finally, we provide a number of positive results, including polynomial-time algorithms for even more restricted classes of OCPTGs such as trees.
In this paper, we study priced timed games (PTG), which are two-player zero-sum games that are played on a graph. The defining feature of PTGs is that the game is played over time, with players accumulating costs both for spending time waiting in states, and for using edges. Ultimately, one of the players would like to reach a goal state while minimizing the cost, while the opponent would like to prevent the goal state from being reached, or, if that is impossible, to maximize the cost of reaching the goal state.

Priced timed games have been studied extensively in the literature, starting with the work of La Torre, Mukhopadhyay, and Murano [18], who first studied games on DAGs, with the later paper of Bouyer, Cassez, Fleury, and Larsen [8] being the first to study the concept on general graphs. Since then, there has been a great deal of follow-up work on these games, e.g., [1, 3, 5–16, 18, 20], including work on practical applications in, for example, embedded systems, and also applications in other theoretical results.
One-clock priced timed games.
In general, a PTG can have any number of clocks, which all increase at the same rate as time progresses, but which can be independently reset back to zero. The edges of the game can have guards, which only allow the edge to be used if the clock values satisfy the conditions of the guard. In this paper, we focus on the case in which there is exactly one clock, and so we study one-clock priced timed games (OCPTG). It has been shown that one-clock priced timed games always have a value [11], and moreover algorithms have been proposed [11, 15, 20] for computing the value of these games. The current state of the art is the algorithm of Hansen, Ibsen-Jensen, and Miltersen [15], who give an algorithm that runs in O(m · poly(n) · 12^n) time, where m is the number of edges and n is the number of vertices. This gives an exponential-time upper bound for the problem.

It has remained open, however, whether the problem can be solved in polynomial time. The running time of Hansen, Ibsen-Jensen, and Miltersen's algorithm [15] is polynomial in the number of event points in the game, which are the set of points at which the gradient of the value function changes. They showed that all OCPTGs have at most 24m(n + 1)12^n event points, which directly leads to the running time of their algorithm mentioned above. They conjectured that the number of event points in an OCPTG is actually bounded by a polynomial [15], and if this conjecture were true, then their algorithm would always terminate in polynomial time.

Lower bounds.
This paper shows that computing the value of a one-clock priced timed game is very unlikely to be solvable in polynomial time, by showing that the problem is actually PSPACE-hard. We begin by constructing a family of examples that have exponentially many event points. This explicitly disproves the conjecture of Hansen, Ibsen-Jensen, and Miltersen. We then use those examples as the foundation of our computational complexity reductions. We first show that the problem is both NP-hard and coNP-hard, and we then combine the techniques from both those reductions to show hardness for the k-th level of the polynomial-time hierarchy, for all k, and finally PSPACE-hardness.

All of our lower bound constructions produce graphs with special structures. In particular, they are all acyclic, planar, have in-degree and out-degree at most 2, and overall degree at most 3 (our figures show a simpler variant with overall degree at most 4). Also, the treewidth, cliquewidth, and rankwidth of our constructions are all 3. Our results for the polynomial-time hierarchy give additional properties. To obtain hardness for the k-th level of the polynomial-time hierarchy, we only need k + 2 holding rates, which are the costs that the players incur by waiting in a particular state. Another interesting feature is that, in a variant of the construction, which loses planarity, all but k + 1 of the states are urgent. A state is urgent if the player is not allowed to wait at the state. Urgent states are relevant because the results of [11, 20] are based on the technique of converting more and more states into urgent states, since it is easy to solve a game in which all states are urgent. In particular, our NP- and coNP-hardness constructions have 3 distinct holding rates, namely 0, 1/2, and 1, and there is a variant in which all but two states can be made urgent.

Finally, members of our initial family that has exponentially many event points have the additional properties of having pathwidth 3, using only holding rates {0, 1}, and having only a single state that cannot be made urgent. Thus, the games may still have exponentially many event points even for many of the most obvious special cases.

Upper bounds.
Our hardness results essentially rule out finding polynomial-time algorithms for many questions in a large number of special cases, unless P = PSPACE. We are able to prove some upper bounds: we show that undirected graphs and trees have a polynomial number of event points, and so can be solved in polynomial time. Finally, we show that OCPTGs on DAGs are in PSPACE by showing that a variant of the event-point iteration algorithm [15] can solve games on DAGs in polynomial space. Combined with our hardness results, we obtain a PSPACE-completeness result for OCPTGs played on DAGs. This result improves on an exponential-time algorithm by [1], which in turn improved on a doubly-exponential-time algorithm [18], both of which are designed for games with many clocks.
As shown in [7], building on a similar result in [12] for five clocks, some problems are undecidable in general for priced timed games with three clocks. This was extended to the value problem in [10]. The complexity of most problems for two clocks is still open.

Games with only a single player, called priced timed automata, have been studied extensively on their own, following their introduction in [2, 4]. They can be solved in NL for the one-clock case [19] and in PSPACE for the multiple-clock case [6]. Games on DAGs are in EXP for any number of clocks [1], which improved on a previous bound in [18]. Games with no costs and holding rates in {0, 1} are called reachability timed games. They can be solved in polynomial time for one clock [15, 21] and in polynomial space for multiple clocks [16].

This result has been generalised by [14] to show a polynomial-time algorithm for the decision question for one-clock priced timed games with rates {0, 1} and integer costs. Previously, they also claimed that such games would have only a polynomial number of event points, implying that one could find the full value functions in polynomial time. This, however, is incorrect: we show in Appendix J how to convert our examples with exponentially many event points and two holding rates to have integer costs. Their result [14] does show a pseudo-polynomial number of event points for such games, though. More generally, [14] also give a pseudo-polynomial-time algorithm for the special case with holding rates in {−d, 0, d} for any number d (note that our paper otherwise does not discuss negative rates or costs).

As shown by [15], every one-clock priced timed game can be reduced, in polynomial time, to a simple priced timed game (SPTG), which is an OCPTG in which there are no edge guards and no clock resets. Our hardness results will directly build SPTGs, and so we restrict our definitions to SPTGs in this section. Since every SPTG is an OCPTG, all of our hardness results directly apply to OCPTGs.
SPTGs.
A simple priced timed game is a game played between two players called the minimizer and the maximizer. The game is formally defined by a 6-tuple (V_min, V_max, G, E, c, r), where:
• V_min is the set of states belonging to the minimizer, V_max is the set of states belonging to the maximizer, and G is a set of goal states. The set of all states is denoted as V = V_min ∪ V_max ∪ G, and we use n to denote the number of states.
• E is a set of directed edges, which is a subset of V × V. We use m to denote the number of edges.
• c : E → R≥0 is a non-negative cost function for edges.
• r : V → R≥0 is a non-negative holding rate function for states.
The game takes place over a period of time. At the start of the game, a pebble is placed on one of the states of the game. In each round of the game, we will be at some time t ∈ [0, 1]. The player who owns the state that holds the pebble can choose to move the pebble along one of the outgoing edges of that state, or to delay until some future point in time. Moving along an edge e incurs the fixed one-time cost given by c(e), while delaying for d time units at a state s incurs a cost of r(s) · d.

The game starts at time 0, and either ends when a goal state is reached, or it never ends. If a goal state is not reached, then the minimizer loses the game, and receives payoff −∞. Otherwise, the payoff is the total amount of cost that was incurred before the goal state was reached, which the maximizer wins, and the minimizer loses.

Strategies.
Our players will use time-positional strategies, meaning that for each state and each point in time, the strategy chooses a fixed action that is executed irrespective of the history of the play. Formally, for each j ∈ {0, 1}, a time-positional strategy σ_j for player j is defined by a pair (W_j, S_j).
• W_j is a set of non-negative change points. That is, W_j = {0 = w_j^0 < w_j^1 < w_j^2 < · · · < w_j^{k−1} < w_j^k = 1} gives a sequence of points in time at which the player changes their strategy. For notational convenience we define w_j^{k+1} = ∞.
• S_j = {S_j^0, S_j^1, . . . , S_j^k} is a corresponding list of strategy choices, which defines what action the player chooses at each point in time. The player can either choose an outgoing edge, or choose to wait at the state, which we denote with the symbol δ. So, for each i, we have that S_j^i : V_j → E ∪ {δ} with the requirement that if S_j^i(s) ∈ E then S_j^i(s) = (s, s′) for some state s′. At time 1, delay is not possible, so for all s ∈ V_j we require that S_j^k(s) ≠ δ.

Plays.
Given a pair of strategies σ_0, σ_1 for the minimizer and the maximizer, respectively, the resulting play from a starting state s_0 and a starting time t_0 is denoted as P(σ_0, σ_1, s_0, t_0), and is defined as follows. Initially, place a pebble on s_0 at time t_0. For each j ∈ {0, 1} and i, whenever the pebble is placed on a state s_i in V_j at time t_i, let i′ be the index such that t_i ∈ [w_j^{i′}, w_j^{i′+1}), and let ℓ ≥ i′ be the smallest index such that e_i := S_j^ℓ(s_i) = (s_i, s_{i+1}) ≠ δ. Then, player j waits until time t_{i+1} = w_j^ℓ, and then moves the pebble on to s_{i+1} at time t_{i+1}. We also define δ_i := t_{i+1} − t_i to be the delay that player j chooses at time t_i.

If s_i ∈ G, then the play is over and |P(σ_0, σ_1, s_0, t_0)| = i. If the play is never over, i.e. for all i, s_i ∉ G, we have that |P(σ_0, σ_1, s_0, t_0)| = ∞.

Outcomes and values.
The outcome val(P) is defined to be ∞ if |P| = ∞, since no goal state is reached. Otherwise, the outcome is

val(P) := Σ_{t=0}^{|P|−1} ( r(s_t) · δ_t + c(e_t) ),

where r(s_t) · δ_t is the cost for holding at the state s_t for δ_t time units, and c(e_t) is the cost for using the edge e_t.

Fix s to be a state, and t to be a time. The lower value is defined to be val_↓(s, t) = sup_{σ_1} inf_{σ_0} val(P(σ_0, σ_1, s, t)), while the upper value is defined to be val_↑(s, t) = inf_{σ_0} sup_{σ_1} val(P(σ_0, σ_1, s, t)). By definition, val_↓(s, t) ≤ val_↑(s, t). As shown in [11], for a richer class of strategies, val_↓(s, t) = val_↑(s, t). It mostly follows from [11] (but formally, one also needs [15]) that this equality holds even when the minimizer is restricted to time-positional strategies in the definition of lower value and the maximizer is restricted to time-positional strategies in the definition of upper value. Therefore, the game is determined in time-positional strategies, and we use val(s, t) := val_↓(s, t) = val_↑(s, t) to denote the value of the game starting at the state s and time t.

Optimal and ϵ-optimal strategies. Given an ϵ ≥
0, a strategy σ_0 is ϵ-optimal for the minimizer if sup_{σ_1} val(P(σ_0, σ_1, s, t)) ≤ val(s, t) + ϵ for all s and t. A strategy is optimal if it is 0-optimal. The definitions for the maximizer are symmetric. As shown in [11], for all ϵ > 0, ϵ-optimal strategies exist in OCPTGs, and [15] have shown that optimal strategies exist in SPTGs. Moreover, the function val(s, t) is piecewise linear for OCPTGs [11], and continuous for SPTGs [15].

Event points. As mentioned, the value function of each state in an SPTG is piecewise linear. An event point is a point in time at which the value function of some state s changes from one linear function to another. The set of event points contains every event point for every state in the game. As shown in [15], improving on [11, 20], the number of event points is less than 12^n for SPTGs and it is less than m · 12^n · poly(n) for OCPTGs. The optimal strategies for SPTGs constructed by [15] have the set of change points being equal to the set of event points. Conversely, it is clear that event points are a subset of the change points in any optimal strategy.

We begin by constructing a family of simple priced timed games in which the number of event points in the optimal strategy is exponential. This serves two purposes. Firstly, it provides a negative answer to the question, posed in prior work [15], of whether the number of event points is polynomial. Secondly, this construction will be used in a fundamental way in the hardness results that we present in later sections.
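Before turning to the construction, the definitions above can be made concrete with a minimal sketch. This is an illustrative Python rendering of the SPTG 6-tuple and of the outcome of a finite play; all names are ours, not the paper's.

```python
# Hedged sketch of an SPTG (V_min, V_max, G, E, c, r) and of the outcome
# val(P) = sum_t ( r(s_t) * delta_t + c(e_t) ) of a finite play.
from dataclasses import dataclass

@dataclass
class SPTG:
    min_states: set   # V_min: minimizer's states
    max_states: set   # V_max: maximizer's states
    goals: set        # G: goal states
    cost: dict        # c: edge (s, s') -> one-time cost >= 0
    rate: dict        # r: state -> holding rate >= 0

def outcome(game, states, delays, edges):
    """Total cost of a finite play: holding costs plus edge costs."""
    return sum(game.rate[s] * d + game.cost[e]
               for s, d, e in zip(states, delays, edges))

# Example: wait 0.5 time units at "u" (rate 1), then take the cost-0.5 edge.
g = SPTG(min_states={"u"}, max_states=set(), goals={"goal"},
         cost={("u", "goal"): 0.5}, rate={"u": 1.0, "goal": 0.0})
assert outcome(g, ["u"], [0.5], [("u", "goal")]) == 1.0
```

The example play pays 1 · 0.5 for waiting and 0.5 for the edge, for an outcome of 1.0.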
The construction.
The family of games is shown on the left-hand side in Figure 1. States belonging to the maximizer are drawn as squares, while states belonging to the minimizer are drawn as triangles. The number displayed on each state is the holding rate for that state, while the number affixed to each edge is the cost of using that edge.

The game is divided into levels, with each level containing two states, which we will call the left state (denoted as v_ℓ^i) and the right state (denoted as v_r^i). These names correspond to the positions at which these states are drawn in Figure 1.

At the bottom of the game, on level 0, the left state v_ℓ^0 is the goal state and the right state v_r^0 is a maximizer state with holding rate 1. The state v_r^0 has an edge to v_ℓ^0 with cost 0. For each level i > 0, the left state v_ℓ^i is a minimizer state with holding rate 1, and the right state v_r^i is a maximizer state with holding rate 0. Both states have the same outgoing edges: an edge to v_ℓ^{i−1} with cost 2^{−i}, and an edge to v_r^{i−1} with cost 0.

Value diagrams.
On the right-hand side in Figure 1, we show the value function for each state, represented as value diagrams. These show the value for each state at each point in time. The bottom-left diagram shows the value function of the goal state, which is zero at all points in time, since the game ends when the state is reached. The bottom-right diagram shows the value function of the state v_r^0 (the bottom-right state of the game). At this state, the maximizer will wait for as long as possible before moving to the goal, since this maximizes the cost generated from the holding rate of 1. Hence, the value of this state is 1 − x at time 0 ≤ x ≤ 1, which is shown in the diagram.

For the states at level one of the game, first observe that there is no incentive for either player to wait. The left state has holding rate 1, which is the worst possible holding rate for the minimizer, and the right state has holding rate 0, which is the worst possible holding rate for the maximizer. Hence both players will move immediately to the lower level, and we must determine which state is chosen.

To do this, we use the value function diagrams of the lower level. Both players can move to the goal with an edge cost of 0.5, or move to v_r^0 with a cost of zero. So we shift the value function of the goal state up by 0.5 and overlay it with the value function of v_r^0. This is displayed in the value diagram that lies between the two layers. The minimizer's value function is the lower envelope of these two functions, which minimizes the value, while the maximizer's value function is the upper envelope, which maximizes the value. This is shown in the value diagrams of the two states at level one.

This process repeats for each level. For level two, we overlay the two value diagrams from level one, after shifting the left-hand diagram up by the edge cost of 0.25, and then we take lower and upper envelopes for the respective players.
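The shift-and-envelope process can be checked numerically. The following sketch (our own, assumption-laden rendering: it uses grid sampling rather than exact envelopes, and it relies on the observation above that neither player benefits from waiting at levels i ≥ 1) evaluates the level-i value functions and counts their linear pieces.

```python
# Level 0: the goal has value 0 everywhere; at v0_r the maximizer waits,
# so its value is 1 - t. At level i, shift the left diagram up by the edge
# cost 2^-i, overlay it with the right diagram, and take the lower envelope
# (minimizer's left state) and the upper envelope (maximizer's right state).
def value_functions(n, grid=2048):
    ts = [k / grid for k in range(grid + 1)]
    vl = [0.0 for _ in ts]          # goal state v0_l
    vr = [1.0 - t for t in ts]      # v0_r
    for i in range(1, n + 1):
        shift = 2.0 ** -i
        vl, vr = ([min(a + shift, b) for a, b in zip(vl, vr)],
                  [max(a + shift, b) for a, b in zip(vl, vr)])
    return ts, vl, vr

def count_segments(ts, vs, eps=1e-9):
    """Number of maximal linear pieces of a sampled piecewise linear function."""
    slopes = [(vs[k + 1] - vs[k]) / (ts[k + 1] - ts[k])
              for k in range(len(ts) - 1)]
    return 1 + sum(abs(a - b) > eps for a, b in zip(slopes, slopes[1:]))

ts, vl, vr = value_functions(4)
assert count_segments(ts, vl) == 2 ** 4   # the pieces double at every level
```

On this grid the left value function at level n alternates between flat pieces and pieces of slope −1, giving 2^n segments, in line with the doubling described above.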
The exponential lower bound.
To see that this game produces exponentially many event points, observe that the left-hand value diagram at level two contains two complete copies of the left-hand value diagram at level one, and that the same property holds for the right-hand value diagrams. This property generalizes, and we can show that the value diagrams for v_ℓ^n and v_r^n both contain 2^n distinct line segments. The following theorem is shown in Appendix B.

Theorem 1. There is a family of simple priced timed games that have exponentially many event points.

Figure 1: Event points lower bound construction.

Inapproximability with few change points
We are also able to show that both players must use strategies with exponentially many change points in order to play close to optimally in our lower bound game. Specifically, we can show that if the game starts at the k-th level of our game, that is, in the vertices v_ℓ^k or v_r^k, and if both players play ϵ-optimally for ϵ < 1/2^k, then every interval of the form [ x/2^{k−1}, (x + 1)/2^{k−1} ) for some integer x must contain a change point. This is only possible if there are 2^{k−1} distinct change points.

We shall illustrate this for the case where k = 3, by showing that the minimizer must use four different change points at v_ℓ^3 to play an ϵ-optimal strategy with ϵ < 1/8. The value diagram of v_ℓ^3 is the lower envelope of the value diagram at the top of Figure 1. Let us consider the interval D = [x/4, (x + 1)/4) for some integer x, and, for the sake of contradiction, suppose that there are no change points in this interval. Since the minimizer cannot change their strategy, they have only three options during D: always go to v_ℓ^2, always go to v_r^2, or wait at v_ℓ^3 until the end of D.

If the minimizer chooses to wait, then let us consider a play starting at time x/4. This play has a payoff of at least 1/4 + val(v_ℓ^3, (x + 1)/4), because we wait with a holding rate of 1 for 1/4 time units and then arrive at time (x + 1)/4 in a state with value val(v_ℓ^3, (x + 1)/4). On the other hand, we have val(v_ℓ^3, x/4) = val(v_ℓ^3, (x + 1)/4) + 1/8. This can be seen from the value function for v_ℓ^3 in Figure 1: the first half of the value function during D is flat, while the second half falls at rate 1, hence the difference is 1/8. Since choosing to wait achieves a value that is 1/8 above the value of the state, it is not ϵ-optimal for any ϵ < 1/8.

Now consider the two remaining options: always going to v_r^2, and always going to v_ℓ^2, assuming that both players play optimally afterwards. The optimal strategy takes the lower envelope of the two corresponding lines. There is a difference of 1/8 between the two lines at x/4 and at (x + 1)/4, but the lines cross in the middle of the interval, so the line that is part of the lower envelope at x/4 is not part of it at (x + 1)/4. Hence, choosing to go to v_r^2, or to v_ℓ^2, for the entire interval will cause a loss in value of up to 1/8, which again is not ϵ-optimal since ϵ < 1/8. This argument can be generalised to every such interval and to all k.

Lemma 1. There is a family of simple priced timed games in which every ϵ-optimal strategy with ϵ < 1/2^k uses 2^{k−1} change points.

NP and coNP lower bounds

We now present NP-hardness and coNP-hardness results for computing the value of a simple priced timed game. This serves two purposes. Firstly, it introduces some of the key concepts that we will use in our PSPACE-hardness result. Secondly, these hardness results will hold for SPTGs that have only the holding rates {0, 1/2, 1}, which is not the case for our later results.

Our goal in this section is to show hardness results for the following decision problem: given a state s and a constant c, decide whether val(s, 0) ≥ c. In other words, it is hard to determine the value of a particular state at time zero. The majority of this section will be used to describe the NP-hardness result, and the coNP-hardness will be derived by slightly altering the techniques that we develop.

Relative values.
The family of games from Section 3 will be used as a basis for this result. We start by discussing a change in perspective that is helpful when dealing with value diagrams. Take, for example, the value diagram at the top of Figure 1. Observe that both of the value functions depicted in this diagram are weakly monotone. This will always be the case in an SPTG, since there are no guards, meaning that costs can only increase as the amount of time left in the game increases.

Figure 2: Our relative value diagramming convention.

We will use values at specific points in time to encode information. But we will not use the absolute value, but rather the value relative to some monotone linear function. This is shown in Figure 2. On the right-hand side we have added the linear function time/2, which causes the value functions to become horizontal. The diagram shows the value functions increasing and decreasing relative to this linear function.

We will use relative values in our reduction, because it makes the reduction easier to understand. It is worth pointing out, however, that this is only a change in perspective. The underlying absolute values are still always weakly monotone.
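As a tiny sketch of this change of perspective (the slope 1/2 matches the linear function added in Figure 2; the function names are ours):

```python
# Sketch: a relative value is the absolute value function plus the linear
# function t/2, so a value falling at rate 1/2 reads as a flat relative value.
def relative_value(absolute_val, t):
    # absolute_val: the (weakly decreasing) value function of a state
    return absolute_val(t) + t / 2

assert relative_value(lambda t: 1.0 - t / 2, 0.25) == 1.0
assert relative_value(lambda t: 1.0 - t / 2, 0.75) == 1.0
```

Only the presentation changes: the underlying absolute values remain weakly monotone.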
Enumerating bit strings.
Our NP-hardness reduction will be a direct reduction from Boolean satisfiability. There are two steps to the reduction. First we build a set of gadgets that enumerate all possible n-bit strings over time, and then we build a gadget that tests whether a Boolean formula is true over this set of bit strings.

We start by describing the enumeration gadget. We denote the n bits of a bit string as v_1 through v_n. The enumeration gadget builds 2n states, corresponding to v_i and ¬v_i for each index i. The top half of Figure 3 shows the relative value diagrams for these states.

The gadget divides time into 2^n intervals, with each interval corresponding to a particular bit string. Bit values of the bit string are encoded using the relative value function of the states, using two fixed constants L and H that the relative value stays between.
• If a bit is zero for an interval, then the relative value of the state remains at L during the interval.
• If a bit is one for an interval, then the relative value of the state begins the interval at L, it increases during the interval to H, and then decreases back to L by the end of the interval. This forms the peaks shown in Figure 3.

The enumeration gadget produces these value functions by using several copies of the exponentially-many event point games from Section 3. From Figure 1, we can see that the value functions there are similar to what we want: the functions alternate between having high relative value and low relative value, and there are exactly 2^i alternations at level i. However, these value functions do not exactly match those shown in Figure 3. Specifically:
• The exponential lower bound functions are symmetric with respect to peaks and troughs, but we would like zeroes to be represented by the fixed constant L, and ones to be represented as peaks.
• The functions start at either peaks or troughs, but we would like to start in the middle of the waveform. So attempting to represent v_1 in Figure 3 using the value functions from the exponential lower bound would result in a bit-sequence that is misaligned with the intervals.
• When a state has a sequence of intervals that all encode one-bits, we would like each to contain a copy of the peaks shown in Figure 3. However, the exponentially-many event point game value functions would instead give us a single large peak during the whole interval.

To address these issues, we transform the exponentially-many event point game value functions so that they have these properties. This involves inserting a sequence of intermediate states, and the construction is described in detail in Appendix F.1.
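The target encoding can be sketched numerically. In this sketch, the values of L and H and the triangular peak shape are illustrative assumptions of ours; the construction only requires that a one-bit rises from L to touch H and returns, with the peak in the middle of the interval.

```python
# Sketch of the target relative-value encoding of the enumeration gadget.
L, H = 0.0, 1.0   # the two fixed constants between which the value stays

def bit_value(i, n, t):
    """Relative value of state v_i at time t in [0, 1), with time split
    into 2^n intervals, one per n-bit string (v_1 is the high bit)."""
    k = int(t * 2 ** n)                  # index of the current interval
    bit = (k >> (n - i)) & 1             # i-th bit of the k-th bit string
    if bit == 0:
        return L                         # zero bit: stay at L
    x = t * 2 ** n - k                   # position within the interval
    return L + (H - L) * (1 - abs(2 * x - 1))   # peak touching H mid-interval

# Interval 0b101 = 5 of 8: v_1 = 1, v_2 = 0, v_3 = 1
t = (5 + 0.5) / 8                        # middle of interval 5
assert bit_value(1, 3, t) == H and bit_value(2, 3, t) == L
```

Each of the 2^n intervals thus carries one bit per state, with peaks always centred in their interval.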
Figure 3: NP lower bound construction.

Evaluating a Boolean formula. Once we have constructed the states v_1 through v_n and ¬v_1 through ¬v_n, we can then design a gadget to evaluate an arbitrary Boolean formula F over every n-bit string. The output of this gadget is a state, that we will also call F, whose value is depicted in Figure 3. The output of the F state uses the same encoding as before: if F evaluates to false for a specific bit string, then the value of F remains at L for the entire interval, while if it evaluates to true, the value forms a peak that starts at L, increases to touch H, and then returns to L by the end of the interval.

To evaluate the formula, we first apply De Morgan's laws to ensure that all negations are applied to propositions, meaning that all internal operations of the formula consist only of ∧ and ∨ operations. Next, we introduce a state in the game for each sub-formula F′ = x ⊕ y of F, where ⊕ ∈ {∧, ∨}. This state will have edges to the states corresponding to x and y with no edge costs, and
• if ⊕ = ∨ then the state is a maximizer state with holding rate 0, while
• if ⊕ = ∧ then the state is a minimizer state with holding rate 1.
The relative value of F′
• will be the maximum of the two input states for an ∨ gate, meaning that in any particular interval the relative value of F′ will contain a peak if either of the two input states contains a peak,
• while for an ∧ gate, the relative value will be the minimum of the two inputs, meaning that an interval will contain a peak only when both inputs contain peaks.
Hence this correctly simulates Boolean logic, and the output of state F will encode the set of bit strings that satisfy the formula.

NP-hardness of computing values. Finally, we can turn this into our NP-hardness result. So far, we have shown how to evaluate the Boolean formula, but the outcome of the evaluation does not affect the values at time zero, because each evaluation is entirely contained within its interval.

To address this, we introduce one final state called the extender. The relative value function of the extender is shown at the bottom of Figure 3. Whenever the relative value of F peaks at the value H, the extender makes the relative value decay more gradually on the left-hand side of the peak. This decay rate is carefully chosen so that the value will not have returned to L even after all 2^n intervals. Hence,
• if the relative value of F touches H at any point in time, the relative value of the extender at time zero will be strictly greater than L, while
• if the relative value of F is never more than L, then the relative value of the extender will be L at time zero.
This implies that the relative value (and hence absolute value) of the extender at time zero depends on the satisfiability of the formula F, which gives us our NP-hardness result.

The extender state is a maximizer state that has one outgoing edge to the state F with no edge cost, and a carefully chosen holding rate. The second to last relative value diagram in Figure 3 shows the effect of the holding rate of the extender. The idea is that the maximizer would like to wait in the extender until the next interval in which the formula evaluates to true (if there is such an interval).

The holding rate at the extender determines the gradient of the blue lines. For NP-hardness it is sufficient for this line to be horizontal, and never touch the relative value of L, but the ability for the extender state to decay back to L after a finite amount of time will be used later in our PSPACE-hardness result.

One final thing to note is that this construction uses exactly three different holding rates. The exponentially-many event point games use the holding rates 0 and 1, and one extra holding rate (of 1/2) is introduced in the enumeration gadget. We get the following theorem. The full formal description of the construction, along with a proof of correctness, can be found in Appendix G.
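Since the ∨ and ∧ gadget states act pointwise as max and min on relative values, the formula-evaluation gadget can be sketched as a small recursive evaluator. The tuple representation of formulas here is ours, purely for illustration.

```python
# Sketch: an OR sub-formula is a maximizer state (pointwise max of its two
# inputs' relative values); an AND sub-formula is a minimizer state
# (pointwise min). Literals are the enumeration-gadget states.
def eval_gadget(formula, lits):
    """formula: nested tuples ('or'|'and', f1, f2), or ('lit', name);
    lits: relative value of each literal's state at some fixed time t."""
    op = formula[0]
    if op == "lit":
        return lits[formula[1]]
    a = eval_gadget(formula[1], lits)
    b = eval_gadget(formula[2], lits)
    return max(a, b) if op == "or" else min(a, b)

# (v1 or not v2) and v3, with a peak (value 1 = H) encoding true and 0 = L false
f = ("and", ("or", ("lit", "v1"), ("lit", "not_v2")), ("lit", "v3"))
assert eval_gadget(f, {"v1": 0, "not_v2": 1, "v3": 1}) == 1   # satisfied
assert eval_gadget(f, {"v1": 0, "not_v2": 0, "v3": 1}) == 0   # clause fails
```

An interval's output peaks exactly when the corresponding bit string satisfies the formula, which is the property the extender state then detects.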
Theorem 2.
For an SPTG, deciding whether val(s, 0) ≥ c for a given state s and constant c is NP-hard, even if the game has only holding rates in {0, 1/2, 1}.

coNP-hardness of computing values. To obtain coNP-hardness, we use essentially the same technique, but with one important difference in our encoding of bits. In the NP-hardness result we used the constant L to encode a zero bit, and a peak that touches the constant H to encode a one bit. To prove coNP-hardness, we flip that upside down. (An issue could arise if the peaks were located at different points in the intervals, but as shown in Lemma 10 of Appendix F.2, the peaks are always exactly in the middle. Horizontal in our relative value diagrams means a holding rate of 1/2.)

Figure 4: coNP lower bound construction. Troughs encode false assignments.

• If a bit is one during an interval, then the relative value of the state will remain at H for the entire interval.
• If a bit is zero during an interval, then this is encoded as a trough, during which the relative value touches L.

We use this encoding, which we call the reverse encoding, throughout the coNP-hardness construction: all of the states of the enumeration gadget use the reverse encoding, and the formula evaluation is also done in reverse encoding. We end up with a state whose relative value encodes F in reverse encoding, as shown in Figure 4.

With the reverse encoding, if F is always true, then the relative value of the state will be H. If there exists an input that makes F false, then this will be encoded as a trough. We can extend this back to time zero using an extender state with a carefully chosen holding rate, though this time the extender state must be a minimizer state, since we want the extender player to obtain a lower value by waiting until F is not satisfied.

The end result is that the relative value at time 0 is H if F is always true, and it is strictly less than H if there exists an assignment to the variables that makes F false. Again, this construction uses only three holding rates, so we obtain the following theorem.
For an SPTG, deciding whether val(s, 0) ≥ c for a given state s and constant c is coNP-hard, even if the game has only holding rates in {0, 1/2, 1}. The proof of this theorem appears in Appendix G. Since the NP- and coNP-hardness proofs are very similar, we prove them both at the same time in the appendix.

PSPACE lower bound
We now move on to our main result, and show that computing the value of a particular state at time zero is
PSPACE -hard. We will reduce directly from TQBF, which is the problem of deciding whether a quantifiedBoolean formula is true. The high level idea is to make use of the techniques from our NP -hardness reduction todeal with existential quantifiers, and the techniques from our coNP -hardness reduction to deal with universalquantifiers.As a running example, we will use the formula F = ( v ∨ v ∨ v ) ∧ ( v ∨ ¬ v ∨ ¬ v ) ,and we will apply the reduction to the TQBF instance ∀ v ∃ v v · F ( v , v , v ) . The slightly odd choice of variable indices will be explained shortly. As for NP -hardness, this holding rate can be 1 / ( v , v , v ) : = ∀ v ∃ v v · ( v ∨ v ∨ v ) ∧ ( v ∨ ¬ v ∨ ¬ v ) | {z } Padding | {z } v = | {z } Padding | {z } v = v v v
Figure 5:
PSPACE lower bound construction.

Overview.
As in previous reductions, we will divide the time period into intervals, associate each interval with a bit string, and evaluate the formula on each of those bit strings. However, in this setting we must now deal with both types of quantifiers.

Our solution is shown in Figure 5. We use the quantifiers to divide the bit strings into blocks, and place padding between the blocks. For our running example, we have two blocks, which correspond to the cases where v = 0 and where v = 1. So we have split the bit strings according to the universal quantifier in the formula. We will refer to the two sub-instances as F′(0) and F′(1), where F′(v) := ∃v ∃v · F(v, v, v).

The idea is to evaluate the two blocks separately, using the method from the NP-hardness reduction. So we will determine whether F′(v) holds when v = 0, and independently determine whether it holds when v = 1. This then leaves us with the problem of deciding whether ∀v · F′(v) is true, which will be evaluated using methods from the coNP-hardness reduction. We do this by turning the output of the two independent evaluations of F′ into a reverse encoded input for the coNP problem.

The padding.
The padding between the blocks is used to ensure that the two evaluations of F′ are independent. The padding is implemented by inserting extra dummy variables into the formula. In our running example, we add the extra dummy variables v and v, but we do not modify the formula itself in any way. As shown in the first line of Figure 5, this leads to each block being repeated four times, since we enumerate all four possible settings for v and v, but none of these change the output of the formula.

The first step is to take the minimum of this relative value function with a state that we call the eraser, whose relative value function is shown in blue in the first line of Figure 5. This value function peaks during the block that we would like to keep, but stays at the value L during the blocks that we would like to erase. So by taking the minimum, we keep the right-most copy, and erase the other three, which gives us the padding between the blocks.

Recall from the NP-hardness reduction that the extender state is used to detect whether the relative value has peaked during a block. Furthermore, the relative value of the extender decays over time, and the rate at which this happens is controlled by the holding rate of the extender. In the PSPACE-hardness reduction, we choose the decay rate so that the value will always decay back to L during the padding before the next block starts. This can be seen in the extender state in the second line of Figure 5. In the right-hand block, there are two assignments that make the formula true, and this information is carried to the left-hand edge of the block by the extender. The padding provides enough space for the extender to decrease back to L before the left-hand block begins.

Changing the encoding.
So far we have independently evaluated F′(v) for both possible settings of v, and this is encoded in the second value function of Figure 5. The rest of the steps in that figure show how we then turn this into a reverse encoding of ∀v · F′(v).

The overall goal is to detect whether the extender is above L at the left-hand boundary of each block. In fact, we choose the decay rate of the extender to be slow enough to guarantee that if there was a peak during the block, then the value of the extender is above (H + L)/2 at the left-hand boundary. We then take the minimum with a limiter state, shown in blue in the third line of Figure 5, whose relative value is constant at (H + L)/2. This effectively chops off the top half of the function.

We then construct a state, known as the detector, shown in red in the fourth line of Figure 5. This state has a relative value function that remains at (H + L)/2, except for troughs during which it touches L. We do this by encoding the formula (¬v ∨ v ∨ ¬v ∨ ¬v) in reverse encoding. We take the maximum of the value function with the detector. This does the following.

• If there was a peak during the block, the value of the extender will be above (H + L)/2, and so the trough in the detector will be eliminated. The limiter ensures that the relative value does not exceed (H + L)/2.
• If there was no peak during the block, the value of the extender will be L, and so the trough in the detector will not be eliminated.

The end result is that we have a trough in the final value function if and only if F′(v) was false for the corresponding block. Observe that this is a valid reverse encoding of the problem ∀v · F′(v), with the only change being that the relative value function ranges between L and (H + L)/2, rather than between L and H. So we can apply the techniques from the coNP-hardness reduction to determine whether ∀v · F′(v) is true. We do so by encoding the formula (v ∧ v), using our previous constructions for formulas but only over these two variables, which results in the very high peak.

PSPACE-hardness.
So far, we have seen how to deal with alternations of the form ∀x ∃y, but the same techniques can also deal with alternations of the form ∃y ∀x. The only difference is that we must turn a reverse encoded output into the normal encoding, which can again be done with appropriately constructed limiter and detector states.

The full PSPACE-hardness result applies the two techniques inductively. Every alternation of quantifiers in the formula is handled by turning one encoding into the other, ready to be evaluated by the next level of quantifiers. The full details can be found in Appendix H, where we prove the following result.
Theorem 4.
For an SPTG, deciding whether val(s, 0) ≥ c for a given state s and constant c is PSPACE-hard.
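The inductive structure behind this theorem mirrors a plain recursive evaluation of the quantifier prefix: a universal layer combines its two sub-blocks the way the coNP machinery does (both must hold), and an existential layer the way the NP machinery does (one suffices). The following Python sketch is purely illustrative and not part of the construction; the function name, formula encoding, and variable numbering are our own assumptions.

```python
def eval_qbf(prefix, matrix, partial=()):
    """Evaluate a prenex QBF.  `prefix` lists quantifiers outermost-first
    ('A' = forall, 'E' = exists); `matrix` maps a full 0/1 assignment to a
    truth value.  Each recursion level corresponds to one layer of gadgets."""
    if len(partial) == len(prefix):
        return matrix(partial)
    sub = (eval_qbf(prefix, matrix, partial + (b,)) for b in (0, 1))
    # 'A' behaves like the coNP gadget (all blocks), 'E' like the NP gadget.
    return all(sub) if prefix[len(partial)] == 'A' else any(sub)

# In the spirit of the running example (illustrative variable numbering):
# forall x exists y exists z . (x or y or z) and (x or not y or not z)
F = lambda a: (a[0] or a[1] or a[2]) and (a[0] or not a[1] or not a[2])
print(eval_qbf("AEE", F))  # True: for x = 0, pick y = 1, z = 0
```

The hardness proof builds, for each such instance, a game whose value at time 0 reveals the outcome of exactly this recursion.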
It is also worth noting that if the formula has only k alternations, then the resulting game uses only O(k) holding rates. Hence, we also get the following result.

Theorem 5.
For an SPTG with O(k) distinct holding rates, deciding whether val(s, 0) ≥ c for a given state s and constant c is hard for the k-th level of the polynomial-time hierarchy.

All of our results so far have shown hardness of deciding whether val(s, 0) ≥ c for some state s and some constant c. In this section, we point out that our construction can also prove hardness for other, related, decision problems. As in the NP- and coNP-hardness sections, we can let the outer-most extender state produce a horizontal line, rather than a decaying one. This ensures that we can pick two constants H′ and L′ such that val(v, 0) = H′ if the formula is true and val(v, 0) = L′ if the formula is false. Thus, all our hardness proofs, for each of NP-, coNP- and PSPACE-hardness, and hardness for the k-th level of the polynomial-time hierarchy, apply to the following promise problem.

PromiseSPTG:
Given an SPTG, a state v and two numbers c > c′, with the promise that val(v, 0) ∈ {c, c′}, is val(v, 0) = c?

This problem can be reduced in polynomial time to each of the following problems.
1. DecisionSPTG: Given an SPTG G, a state v and a value c, is val(v, 0) ≥ c?
2. EqualDecisionSPTG: Given an SPTG G, a state v and a value c, is val(v, 0) = c?
3. ϵ-StrategySPTG: Given an SPTG G, a state v, an ϵ > 0 and an action a, is there an ϵ-optimal strategy that uses a at time 0?
4. StrategySPTG: Given an SPTG G, a state v and an action a, is there an optimal strategy that uses a at time 0?
5. AllOptimalStrategiesSPTG: Given an SPTG G, a state v and an action a, do all optimal strategies use a at time 0?

The reduction is trivial for (1) and (2), since we have just removed the promise. For (3), (4) and (5), fix some ϵ < (c − c′)/2. We add another minimizer state v′ to the game, with a holding rate larger than M, where M is the largest holding rate in the rest of the game. The state v′ has an edge to v and an edge to a goal state. The edge to v has cost 0 and the edge to the goal state has cost (c + c′)/2.

It is clear that val(v′, 0) = c′ if and only if val(v, 0) = c′. Also, if val(v, 0) = c, then no ϵ-optimal strategy can use the edge to v at time 0, so no optimal strategy can do this either. Similarly, if val(v, 0) = c′, then no ϵ-optimal strategy can use the edge to the goal state. This proves hardness for (3) and (4). Also, since the holding rate is larger than M in the above construction, the minimizer will not wait in v′ under an optimal strategy, and therefore he must use an edge immediately, which proves hardness for (5).

That said, it is ϵ-optimal, for any ϵ >
0, to wait for a duration of ϵ/M in v′ and then make the optimal choice in either case, when starting in v′ at time 0. This explains why the ϵ-optimal variant of AllOptimalStrategiesSPTG does not appear in our list.

Note that parametrising the problems with a time t, instead of always using time 0, trivially makes the questions even harder. Also, using techniques similar to the ones we use for shifting in our construction allows us to show hardness for any of these problems for a given time t ∈ (0, 1). Finally, as shown by [15], finding val(v, 1) and the optimal and ϵ-optimal choices at time 1 can be done in time O(m + n log n), and these problems are thus in P.

Properties of our hard instances
The instances that we have constructed actually lie in a very restricted class of graphs, which we describe in thissection.
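As a concrete reference point for the discussion below, the leveled graph of the basic construction (whose details are given in Appendix B) can be generated and sanity-checked in a few lines. This Python sketch uses our own naming, and assumes the edge pattern described in Appendix B: each level-i state has one edge to each state of level i − 1, with costs 2^{−i} and 0.

```python
def build_family(n):
    """Levels 0..n with states ('l', i) and ('r', i); returns a map
    state -> [(successor, cost)].  ('l', 0) is the goal state; holding
    rates are not modelled in this sketch."""
    edges = {('l', 0): [], ('r', 0): [(('l', 0), 0.0)]}
    for i in range(1, n + 1):
        succs = [(('l', i - 1), 2.0 ** -i), (('r', i - 1), 0.0)]
        edges[('l', i)] = list(succs)
        edges[('r', i)] = list(succs)
    return edges

g = build_family(5)
# Degree four: out-degree plus in-degree is at most 4 for every state.
deg = {v: len(outs) for v, outs in g.items()}
for v, outs in g.items():
    for u, _ in outs:
        deg[u] += 1
assert max(deg.values()) <= 4
# A DAG: every edge from a level i >= 1 goes to level i - 1.
assert all(u[1] == v[1] - 1 for v, outs in g.items() if v[1] > 0 for u, _ in outs)
```

The checks confirm the degree-four and DAG properties discussed below; the degree-three variant of Appendix I is not reproduced here.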
The exponentially-many event point games. In Section 3, the family of games that we constructed are all DAGs with degree four, as seen in Figure 1. In Appendix I, we show that, by slightly modifying the graph, this can actually be reduced to a DAG with degree three. Furthermore, there are only two distinct holding rates, namely the ones in {0, 1}.

The games also have planar graphs. This can be seen by redrawing Figure 1 in the following way: the crossing of edges in the middle of each level can be eliminated by taking each edge (v_r^i, v_ℓ^{i−1}) and making it "wrap-around" under the structure, passing v_r^0 on the right before going to the left side and moving up.

While proving an upper bound on the number of event points in a class of games similar to, but more general than, ours, the authors of [11, 20] use a technique based on adding more and more urgent states to the game. A state is urgent if its owner is not allowed to wait in it. In our construction with exponentially-many event points, the minimizer would not want to wait in a state with rate 1, and the maximizer would not want to wait in a state with rate 0, because in both cases this is the worst possible rate for them. So the optimal strategies only wait in the state v_r^0. Therefore, making any number of states, besides v_r^0, urgent does not change the value functions. Hence, our results show that, while games with no non-urgent states are easy (because they can be solved as a priced game), games with a single non-urgent state may still have an exponential number of event points.

In Appendix I we give a more in-depth argument for this, and also argue that each member of the family has pathwidth, treewidth, cliquewidth and rankwidth three.

The
PSPACE-hard games.
The PSPACE-hard games add several extra gadgets to the exponentially-many event point games. These gadgets essentially form a directed tree structure, whose leaves have outgoing edges to a unique copy of one of our exponential lower bound games. Hence, the games continue to be DAGs and planar (because no edge goes "over" the top states in our exponentially-many event point games), and the gadgets can also be constructed so that the games continue to have degree three. In Appendix I we give a more in-depth argument for this, and also argue that each member of the family has treewidth, cliquewidth and rankwidth three. We lose bounded pathwidth as a property, which is caused by the large tree of states that we add to construct our gadgets.

For the NP, coNP, and polynomial-time hierarchy results, we show in Appendix I that a variant of our constructions (which is not planar, and where the treewidth, cliquewidth and rankwidth is 4 instead of 3) has the property that for NP- and coNP-hard instances there are only 2 states that cannot be made urgent. Each alternation adds one extra state that cannot be made urgent. Hence, it is NP-hard to solve games with 2 non-urgent states, and hard for the k-th level of the polynomial-time hierarchy to solve games with k + 1 non-urgent states.

In Section 3 and Section 6, we showed that there are an exponential number of event points for SPTGs belonging to even very restrictive graph classes. In this section we show that there are some classes of games in which there is at most a polynomial number of event points. Specifically, this holds for undirected graphs and trees. It then follows by [15] that the event point iteration algorithm runs in polynomial time for these problems. Secondly, we show that SPTGs on DAGs are in
PSPACE. The result extends to OCPTGs because of a reduction by [15]. Our main result implies that they are PSPACE-hard, and thus they are PSPACE-complete.
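Several of the arguments below use, as a subroutine, solving an (untimed) priced game at a fixed time; [17] gives an O(m + n log n) Dijkstra-style algorithm for the two-player version. As a simplified, purely illustrative stand-in, here is the one-player (minimizer-only) special case, which is ordinary Dijkstra run backwards from the goal states; the two-player version must additionally handle maximizer states, which we omit here.

```python
import heapq

def min_cost_to_goal(edges, goals):
    """One-player priced game: cheapest cost to reach a goal state.
    edges: {v: [(cost, u), ...]} with non-negative costs; goals: set of
    states.  States missing from the result cannot reach a goal."""
    rev = {}  # reversed edge relation, so we can search backwards
    for v, outs in edges.items():
        for c, u in outs:
            rev.setdefault(u, []).append((c, v))
    dist = {g: 0 for g in goals}
    heap = [(0, g) for g in goals]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float('inf')):
            continue  # stale queue entry
        for c, u in rev.get(v, []):
            if d + c < dist.get(u, float('inf')):
                dist[u] = d + c
                heapq.heappush(heap, (d + c, u))
    return dist

costs = min_cost_to_goal({'a': [(1, 'b'), (5, 'g')], 'b': [(1, 'g')]}, {'g'})
print(costs['a'])  # 2: a -> b -> g beats the direct edge of cost 5
```

This is the untimed base case that the event point iteration algorithm of Appendix A repeatedly invokes.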
Undirected graphs.
The trick is to observe that whenever play goes to a maximizer state v at some time t from some other state v′, the maximizer can choose to send the play immediately back to state v′. Because our strategies are time-positional, if the maximizer follows this strategy, and the play ever goes to a maximizer state v from some other state v′, the play will continue going back and forth between v and v′ forever, and therefore never reach a goal state. The outcome is then ∞, which is the best possible for the maximizer, and so we can assume that he will adopt this strategy.

As shown in [15], if val(v, t′) = ∞ for some t′, then val(v, t) = ∞ for all t, and val(v, 1) can be found in time O(m + n log n). For the remaining states, we can assume that maximizer states cannot be entered. This allows us to solve the minimizer and goal states as a sub-game first (which can be done in polynomial time, since this sub-game is a priced timed automaton [15, 19]). The remaining maximizer states are also easy to solve in polynomial time once this has been done. Full details of the argument can be found in Appendix C.

Trees.
The argument for trees is also fairly straightforward, in that the following lemma (see Appendix C) can easily be shown, using structural induction and the way value functions are computed by the value iteration algorithm. We will say that a line segment L is covered by a line or line segment L′ if L ⊆ L′, and we also extend the notion to sets, i.e. a set S is covered by a set S′ if each element L ∈ S is covered by some element L′ ∈ S′ (which may depend on L).

Lemma 2.
Consider a state s which is the root of a tree with k leaves, for some number k. Then, let L_s be the set of line segments of val(s, t). There exists a set L_k of k lines that covers L_s.

Because there can be at most k(k − 1) intersections of k lines, this also bounds the number of line segments of val(v, t). This, in turn, means that there are at most O(nk²) = O(n³) line segments for val(v, t) over all n states of the graph. Because an event point is the time coordinate of an end point of some line segment of val(v, t) for some state v, we therefore have at most O(n³) event points.

DAGs.
To show that DAGs are in
PSPACE, we will first argue that the denominator of each event point t* and of each number val(v, t*) can be expressed in polynomial space. For a natural number c, we say that a fraction x is c-expressible if the denominator d of x is such that d · k = c for some natural number k (that is, d divides c). In Appendix C, we show the following lemma.

Lemma 3.
Consider an SPTG on a DAG of depth h with integer holding rates. Let R = Π_{v₁,v₂ ∈ V : r(v₁) ≠ r(v₂)} |r(v₁) − r(v₂)|. Let v be some state at depth h_v, and let (x, y) be some end point of a line segment of val(v, t). If y = ∞, then val(v, t) = ∞; otherwise, if y ≠ ∞, the numbers x and y are R^{h − h_v}-expressible.

We can find the set of states for which val(v, t) = ∞ in time O(m + n log n) by using techniques from [15]. Specifically, that paper shows that if val(v, t′) = ∞ for some t′ then val(v, t) = ∞ for all t, and that paper also shows how to find val(v, 1) in time O(m + n log n) for all v.

For the remaining states, note that R^{h−1} can be described in polynomial space, since it is a product of at most h · n² numbers, each of which is bounded by the largest holding rate. In turn, we can also bound the numerators so that they use at most polynomial space, and thus all of the numbers.

This, in turn, means that a variant of the event point iteration algorithm given in [15] (one that does not store the end points of line segments of val(v, t), which are only used for the output) runs in polynomial space (see Appendix A for pseudo-code for the event point iteration algorithm), because it then stores only t* and val(v, t*) for all v at any one point, for some event point t*. The algorithm can then find val(v, t′) for a given v and t′ by finding the value val(v, t*) for the smallest event point t* > t′, together with how val(v, t) behaves between t* and the next smaller event point (which is how the algorithm iterates over the event points). Thus, it can solve the decision question we are interested in. We give more details of this argument in Appendix C.

Games with holding rates {0, 1}. In [14], it was previously claimed (fixed in the latest arXiv version 6, arXiv:1404.5894v6) that an SPTG with holding rates {0, 1} and integer costs can be solved in polynomial time because, it was claimed, such games would have only polynomially-many event points.
Our results, however, show that this claim is incorrect: we show in Appendix J how to convert our examples with exponentially-many event points and holding rates {0, 1} to have integer costs.

While our results show that one-clock priced timed games and many special cases are
PSPACE-hard, there are still a number of open questions.

The biggest open question for priced timed games is likely the complexity of two-clock priced timed games. That said, a number of other models related to priced timed games have been considered, and there is often a jump in complexity when going from one clock to two or more clocks in those models, as we mentioned in the introduction. Also, many questions related to three or more clocks for priced timed games are undecidable [7, 10, 12]. This suggests that similar questions for the case of two clocks are also undecidable.

Besides that, we show that priced timed games are PSPACE-hard. Previous work has shown them to be solvable in exponential time [15, 20], which does leave a gap. A possible way to resolve the question is to prove a conjecture of [11]. If, as conjectured by [11], the number of iterations of the value iteration algorithm is polynomial, then the problem is PSPACE-complete, since DAGs are in PSPACE, as we show, and the value iteration algorithm in essence turns the game into a DAG whose number of states is polynomial in the number of iterations and the number of states of the game.

Let ℓ be the number of event points. We show a lower bound on ℓ that is exponential in n, and previous work [15] has shown an exponential upper bound on ℓ for SPTGs; the gap between the two bounds is still quite wide, however, and one could work on making it smaller.

We have shown that priced timed games on DAGs with one clock are PSPACE-complete, but the best result for DAGs with more clocks [1] is exponential. For DAGs, the results for more clocks seem similar to the one-clock case though: the value iteration algorithm runs in exponential time (see [1] for the upper bound for more clocks; we show the lower bound for one clock). There is an exponential number of areas (called event points for one clock) where the strategy should change (see [1] for more clocks; we have the lower bound for one clock, and the upper bound for one clock is in [15, 20]). Does our
PSPACE upper bound generalise to more clocks?

While we resolve several special cases of one-clock priced timed games, a number are still open:

•
Constant pathwidth.
We show that each member of our family that has exponentially many event points has pathwidth 3, but we show no computational-complexity hardness for this class, and it is plausible that such games are easier than the general case.
•
Pseudo-polynomial time algorithm for costs.
Our constructions use costs that double as we double the numberof event points. To avoid the lower bounds, one could consider pseudo-polynomial time algorithms.•
A player with few states.
Our
PSPACE-hard construction has a nearly equal number of minimizer and maximizer states. On the other hand, for automata (i.e., when only one player has states), the corresponding problem is in NL [19]. Can one design fast algorithms for the case where one player has only a few states? Here, few could mean either a constant, or one could do a parametrized analysis.
• Very limited graph width.
We show hardness for games with treewidth, cliquewidth and rankwidth three, but the cases of lower treewidths, cliquewidths and rankwidths are still open (apart from trees, which we have shown in Section 7 can be solved in polynomial time).

References

[1] Rajeev Alur, Mikhail Bernadsky, and P. Madhusudan. Optimal reachability for weighted timed games. In
Proc. of ICALP , pages 122–133, 2004.[2] Rajeev Alur, Salvatore La Torre, and George J. Pappas. Optimal paths in weighted timed automata. In
Proc.of HSCC , pages 49–62, 2001.[3] Gerd Behrmann, Agnès Cougnard, Alexandre David, Emmanuel Fleury, Kim G. Larsen, and Didier Lime.Uppaal-tiga: Time for playing games! In
Proc. of CAV, pages 121–125, 2007.
[4] Gerd Behrmann, Ansgar Fehnker, Thomas Hune, Kim Larsen, Paul Pettersson, Judi Romijn, and Frits Vaandrager. Minimum-cost reachability for priced timed automata. In
Proc. of HSCC , pages 147–161, 2001.[5] Patricia Bouyer. Weighted timed automata: Model-checking and games.
Electronic Notes in TheoreticalComputer Science , 158:3 – 17, 2006.[6] Patricia Bouyer, Thomas Brihaye, Véronique Bruyère, and Jean-François Raskin. On the optimal reachabilityproblem of weighted timed automata.
Formal Methods in System Design , 31(2):135–175, 2007.[7] Patricia Bouyer, Thomas Brihaye, and Nicolas Markey. Improved undecidability results on weighted timedautomata.
Information Processing Letters , 98(5):188 – 194, 2006.[8] Patricia Bouyer, Franck Cassez, Emmanuel Fleury, and Kim G. Larsen. Optimal strategies in priced timedgame automata. In
Proc. of FSTTCS , pages 148–160, 2004.[9] Patricia Bouyer, Franck Cassez, Emmanuel Fleury, and Kim G. Larsen. Synthesis of optimal strategies usinghytech.
Electronic Notes in Theoretical Computer Science , 119(1):11 – 31, 2005.[10] Patricia Bouyer, Samy Jaziri, and Nicolas Markey. On the value problem in weighted timed games. In
Proc.of CONCUR , pages 311–324, 2015.[11] Patricia Bouyer, Kim G. Larsen, Nicolas Markey, and Jacob Illum Rasmussen. Almost optimal strategies inone clock priced timed games. In
Proc. of FSTTCS , pages 345–356, 2006.[12] Thomas Brihaye, Véronique Bruyère, and Jean-François Raskin. On optimal timed strategies. In
Proc. ofFORMATS , pages 49–64, 2005.[13] Thomas Brihaye, Gilles Geeraerts, Axel Haddad, Engel Lefaucheux, and Benjamin Monmege. Simple pricedtimed games are not that simple. In
Proc. of FSTTCS , pages 278–292, 2015.[14] Thomas Brihaye, Gilles Geeraerts, Shankara Narayanan Krishna, Lakshmi Manasa, Benjamin Monmege,and Ashutosh Trivedi. Adding negative prices to priced timed games. In
Proc. of CONCUR , pages 560–575,2014.[15] Thomas Dueholm Hansen, Rasmus Ibsen-Jensen, and Peter Bro Miltersen. A faster algorithm for solvingone-clock priced timed games. In
Proc. of CONCUR , pages 531–545, 2013.[16] Marcin Jurdziński and Ashutosh Trivedi. Reachability-time games on timed automata. In
Proc. of ICALP ,pages 838–849, 2007.[17] Leonid Khachiyan, Endre Boros, Konrad Borys, Khaled Elbassioni, Vladimir Gurvich, Gabor Rudolf, andJihui Zhao. On short paths interdiction problems: Total and node-wise limited interdiction.
Theory ofComputing Systems , 43(2):204–233, 2008.[18] Salvatore La Torre, Supratik Mukhopadhyay, and Aniello Murano. Optimal-reachability and control foracyclic weighted timed automata. In
Proc. of IFIP 17th World Computer Congress — TC1 Stream , pages 485–497, 2002.[19] F. Laroussinie, N. Markey, and Ph. Schnoebelen. Model checking timed automata with one or two clocks.In
Proc. of CONCUR , pages 387–401, 2004.[20] Michal Rutkowski. Two-player reachability-price games on single-clock timed automata. In
Proc. of QAPL ,pages 31–46, 2011.[21] Ashutosh Trivedi.
Competitive Optimisation on Timed Automata. PhD thesis, University of Warwick, 2011.
A Two known algorithms for OCPTGs
In this section we describe two known algorithms that we will use in our analysis. The first algorithm iterates over how many steps one is allowed to take before the game is over. The second algorithm iterates over the event points of the game.
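Both algorithms manipulate piecewise-linear value functions through upper- and lower-envelope operations, which are defined formally below. As a self-contained illustration (with our own representation: a function is a sorted list of breakpoints over a shared domain, and all arithmetic is exact), the envelopes can be computed by evaluating at every breakpoint and at every pairwise crossing point:

```python
from fractions import Fraction as Fr

def pl_eval(f, x):
    """Evaluate a piecewise-linear function given as sorted breakpoints [(x, y), ...]."""
    for (x0, y0), (x1, y1) in zip(f, f[1:]):
        if x0 <= x <= x1:
            return y0 if x1 == x0 else y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x outside domain")

def envelope(fns, op):
    """Upper (op=max) or lower (op=min) envelope of piecewise-linear
    functions over a shared domain, returned as a breakpoint list."""
    xs = {x for f in fns for x, _ in f}
    for i, f in enumerate(fns):                 # add pairwise crossing points
        for g in fns[i + 1:]:
            for (a0, b0), (a1, b1) in zip(f, f[1:]):
                for (c0, d0), (c1, d1) in zip(g, g[1:]):
                    lo, hi = max(a0, c0), min(a1, c1)
                    if lo >= hi:
                        continue                # pieces do not overlap in x
                    s1 = (b1 - b0) / (a1 - a0)  # slopes of the two pieces
                    s2 = (d1 - d0) / (c1 - c0)
                    if s1 == s2:
                        continue                # parallel pieces never cross
                    x = ((d0 - s2 * c0) - (b0 - s1 * a0)) / (s1 - s2)
                    if lo <= x <= hi:
                        xs.add(x)
    return [(x, op(pl_eval(f, x) for f in fns)) for x in sorted(xs)]

f = [(Fr(0), Fr(1)), (Fr(1), Fr(0))]  # the function 1 - t
g = [(Fr(0), Fr(0)), (Fr(1), Fr(1))]  # the function t
# lower envelope: breakpoints at 0, 1/2, 1 with values 0, 1/2, 0
print(envelope([f, g], min))
```

The naive pairwise crossing detection is quadratic, so this is a reference implementation only; the event-point counting results of this paper concern exactly how many such breakpoints the value functions of a game can have.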
The value iteration algorithm.
A variant of the algorithm that iterates over how many steps one is allowed to take before the game is over is used for many classes of games, and is typically called the value iteration algorithm. The algorithm was defined and shown correct independently by [1, 8]. The algorithm, given a game G, is based on defining the notion of a finite-horizon game G_k, where k is some natural number. In G_k, the outcome is ∞ if more than k steps are taken. This definition allows one to find val(v, t, G_k) easily from the value functions for its successors in G_{k−1} (because, when entering the successors, one less step is left).

For a piecewise-linear function f(t), where t ∈ [0, 1], let L(f(t)) be the set of end points of line segments of f(t). The function upper (resp. lower) is the upper (resp. lower) envelope of a set of functions, i.e. basically max (resp. min), but for functions. Formally, let f_1, . . . , f_ℓ : [0, 1] → ℝ, for some number ℓ; then

upper(f_1, . . . , f_ℓ)(x) = max(f_1(x), . . . , f_ℓ(x)) and lower(f_1, . . . , f_ℓ)(x) = min(f_1(x), . . . , f_ℓ(x)).

For a fixed i and x, the number f_i(x) is omitted from the max or min if it is undefined, and in turn the functions upper and lower are undefined at x if all of the f_i(x) are. Also, given two points (x, y), (x′, y′), we let (x, y) − (x′, y′) be the line segment between them. We give pseudo-code for the algorithm in Algorithm 1. As shown by [11],

lim_{k→∞} val(v, t, G_k) = val(v, t, G).

Algorithm 1:
Value iteration algorithm
Result: val(v, t, G_k) for all v, t
for v ∈ V do
    if v is a goal state then val(v, t, G_0) = 0
    else val(v, t, G_0) = ∞
end
for (k′ ← 1; k′ ≤ k; k′ ← k′ + 1) do
    for v ∈ V do
        if v is a goal state then val(v, t, G_{k′}) = 0
        else
            S ← ∅
            for (v, u) ∈ E do
                S ← S ∪ {val(u, t, G_{k′−1}) + c((v, u))}
                for (x, y) ∈ L(val(u, t, G_{k′−1}) + c((v, u))) do
                    S ← S ∪ {(0, y + r(v) · x) − (x, y)}
                end
            end
            if v ∈ V_min then val(v, t, G_{k′}) = lower(S)
            else val(v, t, G_{k′}) = upper(S)
        end
    end
end

The event point iteration algorithm. The second algorithm iterates over the event points. The algorithm was given and shown correct by [15]. In particular, given an event point t′ and val(v, t′) for all v, it finds the largest event point t′′ < t′ and val(v, t′′) for all v. This is done using the fact that if one starts waiting at time t′′, then one waits until at least time t′. Also, val(v, 1) is easy to find for all v, since at time 1 one cannot wait any more and the game turns into a priced game. As shown by [17], such games can be solved in O(m + n log n) time, using an algorithm similar to Dijkstra's shortest path algorithm. The edge costs in the following priced games are lexicographically ordered pairs (but we omit the second component if it is 0).

To give pseudo-code for the algorithm, for an SPTG G and a function f : V → ℝ₊, let PG(G) be the priced game with the same states and edges as G, and let PG(G, f) be the extension of PG(G) that, for all v, has an edge (v, g) of cost f(v), where g is a goal state. To define a piecewise-linear function, one just needs to define the set of end points of line segments. Therefore, to define val(v, t), we will just define L(val(v, t)), i.e. the end points of line segments of val(v, t). We give pseudo-code for the algorithm in Algorithm 2 (the number t* becomes in turn each of the event points, starting from the last at 1, and ending with the first at 0).

Algorithm 2:
Event point iteration algorithm
Result: val(v, t) for all v, t
Solve PG(G);
t* ← 1;
for v ∈ V do
    L(val(v, t)) ← {(1, val(v, PG(G)))};
    f(v) ← (val(v, PG(G)), r(v));
end
while t* > 0 do
    Solve PG(G, f);
    d ← t*;
    for (v, u) ∈ E do
        (x, y) ← val(v, PG(G, f));
        (x′, y′) ← val(u, PG(G, f)) + (c((v, u)), 0);
        if v ∈ V_min and y′ < y and x′ > x and d > (x′ − x)/(y − y′) then d ← (x′ − x)/(y − y′);
        if v ∈ V_max and y′ > y and x′ < x and d > (x − x′)/(y′ − y) then d ← (x − x′)/(y′ − y);
    end
    for v ∈ V do
        (x, y) ← val(v, PG(G, f));
        L(val(v, t)) ← L(val(v, t)) ∪ {(t* − d, x + y · d)};
        f′(v) ← (x + y · d, r(v));
    end
    f ← f′;
    t* ← t* − d;
end

B Exponentially-many event points
In this section we provide the technical details and proofs for our basic construction with exponentially-many event points.

Consider the following graph G. The graph is divided into levels, with two states per level. We will divide the states into left and right states. For all i, the left state of level i is v_ℓ^i and the right state is v_r^i. On level 0, the left state, v_ℓ^0, is the goal state, and the right state, v_r^0, is a maximizer state with holding rate 1. The state v_r^0 has an edge to v_ℓ^0 of cost 0. For all i ≥
1, at level i, the left state, v_ℓ^i, is a minimizer state of holding rate 1, and the right state, v_r^i, is a maximizer state of holding rate 0. Each node v ∈ {v_ℓ^i, v_r^i} has an edge to v_ℓ^{i−1} of cost 2^{−i} and an edge to v_r^{i−1} of cost 0. We will in this section argue that G has 2^n many event points.

Lemma 4. The value function for each node at level i consists of 2^i line segments, each of duration (in time) 2^{−i}, with slope alternating between 0 and −1. The first line segment of val(v_ℓ^i, t) has slope 0 and starts at value 1 − 2^{−i}, and the first line segment of val(v_r^i, t) has slope −1 and starts at value 1. Furthermore, val(v_ℓ^i, t) + 2^{−i−1} intersects 2^i times with val(v_r^i, t), on a line L with slope −1/2 starting at 1 − 2^{−i−2}. More precisely, the intersections are at times t = 2^{−i−1} + k · 2^{−i}, for k ∈ {0, . . . , 2^i − 1}. Finally, at times k · 2^{−i}, for k ∈ {0, . . . , 2^i}, we have that |val(v_ℓ^i, t) + 2^{−i−1} − val(v_r^i, t)| = 2^{−i−1}.
The proof is by induction on the level.
The base case, level 0.
We see that the value function for v_ℓ^0 (being a goal state) is constantly 0, satisfying the statement. The optimal strategy in state v_r^0 is to wait until time 1 and then move to the goal. Waiting from time t until time 1 costs 1 − t, since the holding rate is 1. Thus, the value falls from 1 at time 0 to 0 at time 1 or, equivalently, val(v_r^0, t) = 1 − t, satisfying the lemma statement.

The induction case, level i + 1. By induction, we have the lemma statement for level i and need to show it for level i + 1, for some i ≥
0. Let f_ℓ(t) = val(v_ℓ^i, t) + 2^{−i−1} and let f_r(t) = val(v_r^i, t) for all t. The value function val(v_ℓ^{i+1}) = lower(val(v_ℓ^i) + 2^{−i−1}, val(v_r^i)) = lower(f_ℓ(t), f_r(t)), and the value function val(v_r^{i+1}) = upper(val(v_ℓ^i) + 2^{−i−1}, val(v_r^i)) = upper(f_ℓ(t), f_r(t)). Because of (1) the alternation of slopes 0 and −
1; (2) the fact that they start with different slopes; and (3) each line segment having equal duration (of 2^{−i}), we see that at all times (except at the ends of the line segments, at times k · 2^{−i} for some integer k), exactly one of the value functions val(v_ℓ^i, t) and val(v_r^i, t) has slope −1, and the other has slope 0.

Claim 1.
For each k ∈ {1, . . . , 2^i}, we have the following:
1. The values of the functions f_ℓ(t_k) and f_r(t_k) at time t_k := k · 2^{−i} − 2^{−i−1} are equal (and the functions are otherwise different between time (k − 1) · 2^{−i} and k · 2^{−i}), i.e. the functions' k-th line segments intersect in the middle.
2. Also, at time t′_k := k · 2^{−i}, we have that |f_r(t′_k) − f_ℓ(t′_k)| = 2^{−i−1}, i.e. the functions differ by 2^{−i−1} at the ends of the line segments, when the slopes alternate (this is also the case at time 0).

Proof. We will show the claim by induction on k. First, k =
1. By induction in i , we have that f ℓ ( t ) starts at1 − − i + − i − = − − i − and in the first line segment, of duration 2 − i , it has slope 0. On the other hand, f r ( t ) starts at 1 and for the first line segment, of duration 2 − i , it has slope −
1. Thus, at time 2 − i − ( = t ) they intersect,as wanted. Also, f ℓ ( t ′ ) = − − i − and f r ( t ′ ) = − − i , for t ′ = − i . Thus, | f r ( t ′ ) − f ℓ ( t ′ )| = − i − .Next, consider k ≥
2. By induction in k , we have that | f r ( t ′ k − ) − f ℓ ( t ′ k − )| = − i − and that the function f ℓ ( t ) intersected with f r ( t ) at time t k − = t ′ k − − − i − . At time t ′ k − (by induction in i ), the slopes of f ℓ ( t ) and f r ( t ) alternate. Because the two functions intersected in the middle of the last line segment, the function withleast (resp. highest) value must have slope 0 (resp. −
1) in the next line segment. They therefore intersect aftera duration of 2 − i − into the line segment, i.e. at time t ′ k − + − i − ( = t k ) and differ by 2 − i − after a duration 2 − i (which is also the duration of the line segment), i.e at time t ′ k − + − i ( = t ′ k ) , as wanted. (cid:3) Because of the use of lower, resp. upper in the above definition of the value function for v i ℓ , resp. v ir , thefirst part of the lemma follows from the first part of the claim. The second part of the lemma (about the first linesegment), follows from that (1) f ℓ ( t ) ≤ f r ( t ) until time 2 − i − , (2) that the first line segment of f ℓ ( t ) has slope 0and starts at value 1 − − i + − i − = − − i − ; and (3) that the first line segment of f r ( t ) has slope − i , but (1) also uses that the functions first intersect attime 2 − i − , which comes from the claim. The third and fourth part of the lemma also comes from the claim. Thelemma thus follows from the claim. (cid:3) The lemma implies the following theorem as a corollary:
Theorem 1.
There is a family of simple priced timed games that have exponentially many event points.

B.1 Inapproximability with few change points

We will in this section argue that, for a strategy of the minimizer (resp. maximizer), if there exists a duration [x · 2^{-k+1}, (x + 1) · 2^{-k+1}), for some integers k, x, in which there are no strategy change points, then the strategy is not ϵ-optimal for ϵ < 2^{-k} when starting in v^k_ℓ (resp. v^k_r). In particular, such a duration exists if there are fewer than 2^{k-1} strategy change points in the strategy, or if there exists any duration of length 2^{-k+2} without strategy change points.

The argument is nearly identical to the one in Section 3.1, but instead of referencing Figure 1, it uses Lemma 4. Thus, suppose towards a contradiction that ϵ < 2^{-k} and that we have an ϵ-optimal minimizer strategy with no strategy change points in D = [x · 2^{-k+1}, (x + 1) · 2^{-k+1}) for some integer x. Let a = x · 2^{-k+1} and b = (x + 1) · 2^{-k+1}. Thus, D = [a, b). Since there are no strategy change points in D, the minimizer has only three options for the duration of D: always go to v^{k-1}_ℓ, always go to v^{k-1}_r, or wait until the end of D and then possibly do something else. If he waits, the outcome for play starting at time a is ≥ val(v^k_ℓ, b) + 2^{-k+1} (because at best the minimizer starts playing optimally thereafter), while val(v^k_ℓ, a) = val(v^k_ℓ, b) + 2^{-k+1}/2, since in the first half the slope is 0 and in the last half it is −1, by Lemma 4. The strategy is therefore not ϵ-optimal, since ϵ < 2^{-k} = 2^{-k+1}/2. Alternatively, suppose we always go to state v^{k-1}_ℓ (and pay the cost of 2^{-k}) or always go to state v^{k-1}_r. We have that |val(v^{k-1}_ℓ, t) + 2^{-k} − val(v^{k-1}_r, t)| = 2^{-k}, by Lemma 4, for t ∈ {a, b}, but which of the two is smaller differs between a and b. Thus, at either t = a or t = b, the outcome we get in v^k_ℓ differs from val(v^k_ℓ, t) by at least 2^{-k} (at least, because the minimizer need not play optimally after leaving v^k_ℓ); the difference in outcome between going to v^{k-1}_ℓ and going to v^{k-1}_r changes linearly at all times because of Lemma 4. The strategy is therefore not ϵ-optimal, since ϵ < 2^{-k}.

The argument for the maximizer is symmetric and uses v^k_r instead of v^k_ℓ, except that if the maximizer waits during D, the outcome for starting at time a is ≤ val(v^k_r, b), because v^k_r has a holding rate of 0. Still val(v^k_r, a) = val(v^k_r, b) + 2^{-k+1}/2, so the strategy is not ϵ-optimal.

We get the following lemma.

Lemma 1.
There is a family of simple priced timed games in which every ϵ-optimal strategy with ϵ < 1/2^k uses at least 2^{k-1} change points.

C Upper bounds for trees and undirected graphs
In this section, we argue that there are few event points (i.e. at most polynomially many) in SPTGs that are either trees or undirected graphs. Recall that the event point iteration algorithm runs in time O(|E|(m + n log n)), where E is the set of event points.

We will say that a line segment L is covered by a line or line segment L' if L ⊆ L', and we extend the notion to sets: a set S is covered by a set S' if each element L ∈ S is covered by some element L' ∈ S' (which may depend on L).

Lemma 2.
Consider a state s which is the root of a tree with k leaves, for some number k. Let L_s be the line segments of val(s, t). Then there exists a set L_k of k lines that covers L_s.

Proof. The proof is by structural induction on the tree structure. Consider a leaf s. Either it is a goal state or not. If it is, then val(s, t) = 0; otherwise val(s, t) = ∞. In either case, the segments form a (horizontal) line, satisfying the statement.

Next, consider an inner node v of the tree, with children c_1, . . . , c_ℓ such that val(c_i, t) can be covered by a set of k_i lines, where k_i is also a lower bound on the number of leaves below c_i. Whether v is a maximizer or minimizer state, we have that val(v, t) is an upper/lower envelope of some functions, according to the value iteration algorithm. We can split the upper/lower envelope into some sets, first apply the upper/lower envelope to each set on its own, and then apply the upper/lower envelope to the set of results; this gives the same result.

The sets C = {C_1, . . . , C_ℓ} are the value functions and the corresponding line segments for each of the successors. Fix an i. It is clear that the lower/upper envelope of C_i is a piecewise linear function with at most as many pieces as the function val(c_i, t), and since val(c_i, t) could be covered with k_i lines before, the upper/lower envelope of the set can be covered by k_i lines now (or fewer). This is because there is an added line segment for each event point and these are parallel. Thus, when a line segment is left, it can never be reentered, because we must have entered a better line segment in between and could then just continue on that instead. Let the set that covers the upper/lower envelope of C_i be L_i.
As a side remark, we do not necessarily have that L_i is the set that covers val(c_i, t), because the additional line segments might require us to change some lines.

If we have sets of lines L_i that cover each of the sets on their own, then their union L covers the lower/upper envelope, i.e. L covers val(v, t). Note that there are at least k = Σ_{i=1}^{ℓ} k_i leaves under v (because there were at least k_i leaves under c_i and the leaves must be disjoint since the graph is a tree) and the size of L is at most k. □

Theorem 6.
In an SPTG that forms a tree, there are at most O(n^2) many event points.

Proof. According to Lemma 2, the value function of each state is covered by a set of at most n lines. There can be at most n(n − 1) = O(n^2) intersections of n lines. □

The next lemma can in particular be applied to undirected graphs.
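The counting in the proof above can be made concrete: every event point of an upper or lower envelope built from a set of lines occurs where two of the lines intersect, so a cover by n lines leaves at most quadratically many candidate event points. A small illustrative sketch (not the paper's algorithm), using exact rationals:

```python
from fractions import Fraction
from itertools import combinations

def candidate_event_points(lines):
    """Pairwise intersection times of (slope, intercept) lines. Any event
    point of an upper/lower envelope built from these lines lies in this
    set, so a cover by n lines gives O(n^2) candidate event points."""
    times = set()
    for (m1, b1), (m2, b2) in combinations(lines, 2):
        if m1 != m2:  # parallel lines never intersect
            times.add(Fraction(b2 - b1, m1 - m2))
    return times

# Three covering lines give at most 3 candidate event points.
pts = candidate_event_points([(0, 0), (-1, 1), (-2, 3)])
```

Here the three pairwise intersections happen to be distinct, so all three candidates survive; in general some may coincide or not lie on the envelope at all.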
Lemma 3.
Consider an SPTG in which, for every edge (s', s'') between two maximizer states, the reverse edge (s'', s') is also present. Then the game has at most a polynomial number of event points.

Proof. We have that the minimizer will never, in an optimal strategy, from a state s such that val(s, t) ≠ ∞ (it is easy to see that val(s, t') = ∞ for some t' iff val(s, t) = ∞ for all t), take an edge to a maximizer state. This is because, if he did so, the maximizer could just immediately move back to s, and then play would never reach a goal. Thus, we can remove all such edges. Now, the set of minimizer states forms a component, which we can solve first. Because only one player has states in the component, it can be solved as a one-player game, and as such it has at most a polynomial number of event points.

Next, there are two kinds of maximizer states s': those with an edge to another maximizer state s'' and those without. If (s', s'') ∈ E, then (s'', s') ∈ E by the assumption on the graph. Thus, the maximizer can just move back and forth between the two states forever, and so val(s', t) = val(s'', t) = ∞. Otherwise, if the maximizer only has edges to minimizer states, we can see, following the value iteration algorithm, that val(s', t) is the upper envelope of a polynomial number of line segments. Upper envelopes of line segments form a Davenport-Schinzel sequence of order 3, which implies that, if you have n line segments, then the envelope consists of at most 2nα(n) + O(n) line segments, where α is the inverse Ackermann function; α grows much more slowly than, e.g., log. Thus, we have at most a polynomial number of event points. □

D PSPACE upper bound for DAGs
In this section, we show that DAGs can be solved in polynomial space, by arguing that, for each state v, each end point of a line segment of val(v, t) is a pair of numbers with at most polynomially many bits.

For a natural number c, we say that a fraction x is c-expressible if the denominator d of x (in lowest terms) is such that d · k = c for some natural number k. We trivially see that if p, q are c-expressible and r is c'-expressible, then p − q and p + q are c-expressible and p · r is (c' · c)-expressible, for all natural numbers c, c'. Also, if a number is c-expressible, then it is also (c · k)-expressible for all natural numbers k.

Lemma 5.
Consider an SPTG on a DAG of depth h with integer holding rates. Let R = Π_{v1, v2 ∈ V, r(v1) ≠ r(v2)} |r(v1) − r(v2)|. Let v be some state at depth h_v and (x, y) some end point of a line segment of val(v, t). If y = ∞, then val(v, t) = ∞ for all t; otherwise, if y ≠ ∞, the numbers x and y are R^{h − h_v}-expressible.

Proof. We will show the statement using structural induction, considering how the value iteration algorithm computes the function.

First, consider the leaves, which are at depth h. Either the leaf v is a goal state or not. If so, then val(v, t) = 0; otherwise val(v, t) = ∞. In either case, the statement is satisfied, in the former case because the only end points are (0, 0) and (1, 0), and these are R^0 = 1-expressible.

Next, consider an inner state v at depth h_v, with successors c_1, . . . , c_k. By structural induction, we have that, for each i, each end point (x_i, y_i) of a line segment of val(c_i, t) is R^{h − h_v − 1}-expressible. The −1 is because the successors are at a depth one larger.

First of all, val(v, t) is either always ∞ or never ∞. To see this, consider first that v is a maximizer state. Either some c_i is such that val(c_i, t) = ∞ or not. In the first case, val(v, t) = ∞ and in the second, by the definition of upper and because each line segment has bounded end points, val(v, t) is finite. Otherwise, if v is a minimizer state, then either all c_i are such that val(c_i, t) = ∞ or not. In the first case, val(v, t) = ∞ and in the second, by the definition of lower and because each line segment has bounded end points, val(v, t) is finite.

To show the lemma, we thus just need to consider the case where y ≠ ∞. In that case, each line segment of val(c_i, t), and each of the additional line segments in L_i, has a slope equal to minus some holding rate, starts in an end point of some line segment of val(c_i, t) for some i, and goes towards the left.
Fix two such line segments, defined from end points (x_1, y_1) and (x_2, y_2) and slopes −r_1 and −r_2. They can potentially generate a new end point of a line segment in val(v, t) at the intersection of the two line segments (the upper and lower envelopes of the line segments trivially cannot contain other new end points of line segments). We consider the case where x_1 ≥ x_2; the other case is symmetric. We see that the line segment starting in (x_1, y_1) goes through (x_2, y_1 + r_1(x_1 − x_2)). We let y' = y_1 + r_1(x_1 − x_2). If y' = y_2, then the line segments intersect in (x_2, y_2). We have by induction that x_2, y_2 are such that their denominator d satisfies c · d = R^{h − h_v − 1} for some integer c, and thus they satisfy the statement. If y' < y_2 and r_1 ≤ r_2, or y' > y_2 and r_1 ≥ r_2, then the line segments will never meet. Finally, if y' < y_2 and r_1 > r_2, or y' > y_2 and r_1 < r_2, then the line segments intersect in (x, y), where x = x_2 − |y' − y_2|/|r_1 − r_2| and y = (x_2 − x) r_2 + y_2 (= (x_2 − x) r_1 + y').

First, observe that y' = y_1 + r_1(x_1 − x_2) is R^{h − h_v − 1}-expressible, because each of y_1, x_1, x_2 is, and r_1 is some integer and thus 1-expressible. Next, x = x_2 − |y' − y_2|/|r_1 − r_2| is R^{h − h_v}-expressible, because y', y_2, x_2 are R^{h − h_v − 1}-expressible and 1/|r_1 − r_2| is R-expressible. Finally, y = (x_2 − x) r_2 + y_2 is R^{h − h_v}-expressible, because x_2, y_2 are R^{h − h_v − 1}-expressible and thus R^{h − h_v}-expressible, x is R^{h − h_v}-expressible, and r_2 is some integer and thus 1-expressible. □

Theorem 7.
Consider an SPTG on a DAG with integer holding rates. Then DecisionSPTG is in
PSPACE.

Proof.
We will first argue that for each event point t' and each state v, val(v, t') is R^h-expressible. Fix a state v and an event point t'. An event point is the x-coordinate of an end point of a line segment of val(v', t) for some state v' and, by Lemma 5, event points are therefore R^h-expressible. The number val(v, t') is on some line segment of val(v, t), starting from some end point (x, y) of a line segment, which is R^h-expressible, and having slope −r for some holding rate r. Thus, x ≥ t' and hence val(v, t') = y + r · (x − t'). Note that t', x, y are R^h-expressible and r, being some holding rate, is 1-expressible. Thus, val(v, t') is R^h-expressible.

The event point iteration algorithm, given an event point t' and val(v, t') for all v, finds the next smaller event point t'' and val(v, t'') for all v, and finally outputs all of these numbers as the value function. To solve the DecisionSPTG problem with t* as the input time and v as the input state (i.e. we want to find val(v, t*)), we use a variant of the event point iteration algorithm that iterates over the event points, but simply deletes the values from previous iterations, until the iteration in which we have the smallest event point t' ≥ t*, and then finds val(v, t*) from that.

We have that the values val(v', t') are R^h-expressible, as shown above. Let M be the largest holding rate (which requires at most linear space to write down). For obvious reasons, the numerator of t' is smaller than its denominator (or equal, but only in case t' = 1), since t' ∈ [0, 1]. We have val(v, 0) ≤ val(v, 1) + M for all v. Also, val(v, 0) ≥ val(v, t) for all t. And finally, val(v, 1) ≤ n · W (unless val(v, 1) = ∞), where W is the biggest weight, since it is the cost of some acyclic path in the graph from v to a goal node. Hence, the numerator of val(v, t') can be at most (n · W + M) times the denominator (which was R^h-expressible). We see, by definition, that R has at most n^2 factors, each of which is at most M. Hence, we need at most h · n^2 · log(M) bits to write down R^h, and thus any denominator of an R^h-expressible number, in each iteration. Each of the numbers kept thus requires at most log(n · W + M) + h · n^2 · log(M) many bits, which is polynomial in the input size.

In the last iteration (i.e. the one where we output val(v, t*)), we have an event point t' ≥ t* and the next event point t'' < t*. We have that val(v, t*) is then val(v, t') + r · (t' − t*) for the holding rate r of that line segment of val(v, t). We see that this computation can be done in polynomial space, since val(v, t') and t' fit in polynomial space and r, t*, being explicitly in the input, take at most linear space.

In conclusion, the above described variant of the event point iteration algorithm runs in PSPACE and solves DecisionSPTG. □
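The expressibility bookkeeping in Lemma 5 and Theorem 7 can be checked on a tiny example. The sketch below is a hypothetical helper (the rates 1 and 3, giving R = |1 − 3| = 2, and the endpoints are chosen purely for illustration): it intersects two value-function lines with integer slopes and exact rational endpoints, and the denominators of the intersection divide the next power of R, as the lemma predicts.

```python
from fractions import Fraction

def intersect(p1, r1, p2, r2):
    """Intersect the lines through p_i = (x_i, y_i) with slope -r_i
    (value functions fall at the holding rate); returns None if parallel."""
    (x1, y1), (x2, y2) = p1, p2
    if r1 == r2:
        return None
    # Solve y1 - r1*(x - x1) = y2 - r2*(x - x2) for x, exactly.
    x = (y1 - y2 + r1 * x1 - r2 * x2) / Fraction(r1 - r2)
    y = y1 - r1 * (x - x1)
    return x, y

# Endpoint denominators divide R^(h - h_v - 1) = 2^2 for R = |1 - 3| = 2:
p1, p2 = (Fraction(0), Fraction(1, 2)), (Fraction(0), Fraction(3, 4))
x, y = intersect(p1, 1, p2, 3)  # denominators now divide R^(h - h_v) = 2^3
```

Repeating the intersection step h times multiplies the worst-case denominator by at most R each time, which is exactly why polynomially many bits suffice.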
E A helpful lemma
Before giving our lower bound constructions, we state the following useful lemma. The intuition behind it was also an important part of the proof in [11] that one-clock priced timed games have a value.
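The lemma below reasons about the upper envelope of a set of functions produced by the value iteration algorithm. As a generic illustration of the envelope operation only (a convex-hull-trick sketch under the simplifying assumption that the pieces are full lines; this is not the paper's Algorithm 1):

```python
def upper_envelope(lines):
    """Given lines as (slope, intercept) pairs, return the lines appearing
    on the pointwise upper envelope t -> max_i(slope_i*t + intercept_i),
    ordered by increasing slope (i.e. left to right along the t-axis)."""
    lines = sorted(set(lines))          # increasing slope, then intercept
    hull = []
    for m, b in lines:
        if hull and hull[-1][0] == m:   # parallel: the larger intercept wins
            hull.pop()
        # Pop the top line while it is dominated by its two neighbours.
        while len(hull) >= 2:
            (m1, b1), (m2, b2) = hull[-2], hull[-1]
            if (b - b2) * (m2 - m1) >= (b2 - b1) * (m - m2):
                hull.pop()
            else:
                break
        hull.append((m, b))
    return hull
```

For instance, in {1 − t, 0, t − 1} the constant line 0 only touches the envelope of the other two at the single point t = 1, so it never contributes a full segment and is discarded.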
Lemma 6.
In a minimizer state s with r(s) = max_v r(v), the minimizer can always avoid waiting, i.e. val(s, t) = min_{v ∈ V | (s,v) ∈ E} (val(v, t) + c((s, v))). Similarly, in a maximizer state s with holding rate 0, the maximizer can always avoid waiting, i.e. val(s, t) = max_{v ∈ V | (s,v) ∈ E} (val(v, t) + c((s, v))).

Proof. We consider the case of a maximizer state of holding rate 0; the argument for minimizer states (with holding rate r(s) = max_v r(v)) is similar.

The argument is based on the two algorithms described in Appendix A. We have from the value iteration algorithm (Algorithm 1) that val(s, t) is the upper envelope of a set of functions, some corresponding to waiting and some corresponding to going to a successor. However, the waiting line segments cannot be above the value function of the successor that spawned them. This is because each such line segment has slope at least 0 (by Algorithm 2). □

F Encoding formulas
This section is intended to serve as a link between our exponential lower bound in the previous section and our NP- and coNP-hardness proofs in the next. In essence, we will use our exponential lower bound to encode the variables of SAT/Tautology and later TQBF.

Assignment time interval.
For the set of booleans B = {b_1, . . . , b_n}, an assignment A is a map from B to {0, 1} (or false and true). For an assignment A, we define an interval of times T_A, where t ∈ T_A iff, for all i ∈ [n], the i-th bit of t after the comma, t_i, is A(b_i), and there exists some j > n such that t_j = 0.

t_A^s. We let t_A^s be the first t ∈ T_A, i.e. the t ∈ T_A such that t_i = A(b_i), for i ∈ [n], and t_j = 0 for j ≥ n + 1.

t_A^m. We let t_A^m be the middle t ∈ T_A, i.e. the t ∈ T_A such that t_i = A(b_i), for i ∈ [n], t_{n+1} = 1 and t_j = 0 for j ≥ n + 2.

t_A^e. We let t_A^e be the last t of the closure of T_A, i.e. the t such that t_i = A(b_i), for i ∈ [n], and t_j = 1 for j ≥ n + 1 (this equals t_{A'}^s, where A' is the "next" assignment, by the definition of the reals).

Definition of function encoding.
We will encode functions using two numbers, a value v and an offset v'. We have two encodings, straight encoding and reverse encoding (both depend on v, v' and on the set of variables the function is over, the latter because it changes the duration of an assignment). Straight encoding is used for exists, such as SAT and the exists-alternations of TQBF, and reverse encoding is used for for-all, such as Tautology and the for-all-alternations of TQBF. We will convert from one to the other as a part of our PSPACE-hardness proof.

For a boolean function F (in particular, (quantified) boolean formulas) and an assignment A to its variables, we write F(A) for the boolean that the function evaluates to when the variables are assigned according to A.

Definition 1 (Straight encoding). A state s encodes a boolean function F under straight encoding iff

(S1) ∀t we have that val(s, t) ∈ [v − t/2, v + v' − t/2];
(S2) ∀A s.t. F(A) is false, we have that ∀t ∈ T_A: val(s, t) = v − t/2;
(S3) ∀A s.t. F(A) is true, we have that ∃t ∈ T_A: val(s, t) = v + v' − t/2, and val(s, t_A^s) = v − t_A^s/2 and val(s, t_A^e) = v − t_A^e/2.

Definition 2 (Reverse encoding). A state s encodes a boolean function F under reverse encoding iff

(R1) ∀t we have that val(s, t) ∈ [v − v' − t/2, v − t/2];
(R2) ∀A s.t. F(A) is true, we have that ∀t ∈ T_A: val(s, t) = v − t/2;
(R3) ∀A s.t. F(A) is false, we have that ∃t ∈ T_A: val(s, t) = v − v' − t/2, and val(s, t_A^s) = v − t_A^s/2 and val(s, t_A^e) = v − t_A^e/2.

For i ∈ {1, 2, 3}, we use (en i) to refer to (S i) in the proof for straight encoding and to (R i) for reverse encoding.

F.1 Encoding booleans
In this sub-section we will show how to use our lower bound family to encode booleans, i.e. v i and ¬ v i for each i , in our encodings. Step 1 of using our game to encode a variable.
To encode a variable v_i, we first construct a game according to our lower bound family from the previous section, with i levels. We then add a new state s_i (which will encode v_i; whether it is a minimizer state of holding rate 1 or a maximizer state of holding rate 0 does not matter) which has an edge e to v^i_r. The cost of e is c(e) = 1/4 − 2^{-i-1} + 2^{-i-2}. This means that val(s_i, t) intersects, according to Lemma 4, the function val(v^i_ℓ, t) + 1/4 − 2^{-i-1} + 2 · 2^{-i-2}, 2^i times, on the line L that starts at 5/4 − 2^{-i-1} and has slope −1/2. The value function val(s_i, t) has some similarities with what we want when we encode v_i in (straight or reverse) encoding with v = 5/4 − 2^{-i-1} and v' = 2^{-i-2}. In particular:

1. There are 2^i + 1 durations (the first and last are of length 2^{-i-1}, the rest of length 2^{-i} each);

2. For all times t in every odd duration, v − t/2 ≤ val(s_i, t) ≤ v + v' − t/2, and for some t in each such duration val(s_i, t) = v + v' − t/2;

3. For all times t in an even duration, v − v' − t/2 ≤ val(s_i, t) ≤ v − t/2, and for some t in each such duration val(s_i, t) = v − v' − t/2.

To encode ¬v_i (i.e., a variable which is true iff v_i is false), we can first use a state ŝ_i, which has an edge e' to v^i_ℓ of cost c(e') = 1/4 − 2^{-i-1} + 2 · 2^{-i-2}. Such a construction has properties similar to those of s_i (but the properties of odd and even durations are exchanged). The constructed states s_i and ŝ_i have values similar to what we want in our encodings of v_i and ¬v_i (with v = 5/4 − 2^{-i-1} and v' = 2^{-i-2}), but there are three differences:

(D1) We start and end in the middle of a duration (and the value there is not equal to v − t/2);

(D2) The durations do not have the right lengths for the assignment intervals;

(D3) The value function can go below v − t/2 (for straight encoding; above v − t/2 for reverse encoding).

The first two differences are dealt with by shifting.

Shifting.
To deal with (D1) and (D2), we would like to have a state s'_i such that val(s'_i, t) = val(s_i, t/2 + 1/4 − 2^{-i-2}) for t ∈ [0, 1] (because of the t/2, we only use val(s_i, t) on an interval of times that we have already considered). Note that the last duration (which is the one cut in half) is of length 2^{-i-1}, which is why 2^{-i-2} appears here. Also, we keep only half of the durations (because of the t/2). The function val(s'_i, t) agrees with the line L at the endpoints of the interval, has 2^i many durations in a game of half the length, and is above the line on the interval we care about; see Figure 6 for an illustration. The resulting SPTG then has a state whose value function equals val(s'_i, t), as wanted. The game is quite similar to the original construction, but all nodes have holding rate either 0 or 1/2, and the slopes alternate at the start of the first and in the middle of the remaining durations, respectively at the end of the last and in the middle of the remaining durations.
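The shifting step here, and the currency change described next, are both simple reparametrisations of a value function. A minimal sketch (the offset and the example value function are illustrative placeholders, not the construction's actual constants):

```python
from fractions import Fraction

def shift(val, offset):
    """Shifting: play the original game at half speed starting from time
    `offset`, so the new value function is t -> val(t/2 + offset)."""
    return lambda t: val(Fraction(t) / 2 + offset)

def change_currency(val):
    """Currency change: halving the value of the currency doubles all
    holding rates and costs, and hence doubles the value at every time."""
    return lambda t: 2 * val(t)

# e.g. starting from val(t) = 1 - t (the level-0 right state), offset 1/4:
g = change_currency(shift(lambda t: 1 - t, Fraction(1, 4)))
```

Composing the two gives g(t) = 2 · (1 − (t/2 + 1/4)) = 3/2 − t, i.e. a game with integer holding rate again, as used below.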
Figure 6: Extracting a time period from a longer game.

The interval the game is constructed over is of length 1/2. We would prefer to use holding rates 0 and 1, so we use a classic trick from game theory and change the currency of the output to one of half as much value (such tricks are also used in [15]). This causes the holding rates, the costs and the values (at all times), when expressed in the new currency, to double. Let G be that game (which has holding rates in {0, 1}). Let s''_i be the state in G corresponding to s_i (or equally to s'_i). We see that val(s''_i, t) is such that (1) there are 2^i durations (each of length 2^{-i}); (2) for each time t in every even duration, 2 − t/2 ≤ val(s''_i, t) ≤ 2 + 2^{-i-1} − t/2, and for the middle t in each such duration val(s''_i, t) = 2 + 2^{-i-1} − t/2; (3) for each time t in an odd duration, 2 − 2^{-i-1} − t/2 ≤ val(s''_i, t) ≤ 2 − t/2, and for the middle t in each such duration val(s''_i, t) = 2 − 2^{-i-1} − t/2. Note also that the slope alternates in the middle of the durations and that val(s''_i, t) = 2 − t/2 at the ends of the durations.

Lemma 7.
Consider the game G above. Consider some i ≥ 1 and let s = s^i_ℓ and v = v^i_ℓ. If val(s, t) = val(v, t) for some t, then val(s, 1) = val(v, 1) and val(s, t) = 1 − t + val(v, 1). Also, let s' = s^i_r and v' = д^i_r. If val(s', t) = val(v', t) + 2 · c_{r,i} for some t, then val(s', t) = 2 · c_{r,i} = val(s', 1).

Proof. Observe that val(v, t) = c_{ℓ,i} + 1 − t, since the maximizer will wait until time 1 and then go to goal. Therefore, val(s, t) = val(v, t) = c_{ℓ,i} + 1 − t = 1 − t + val(v, 1). On the other hand, if the minimizer waits in s until time 1 and then moves as at time 1, we have that val(s, t) ≤ val(s, 1) + 1 − t. But the minimizer can go to v at time 1, so val(s, 1) ≤ val(v, 1). Therefore, since val(s, t) = 1 − t + val(v, 1), we see that val(s, 1) = val(v, 1).

On the other hand, suppose val(s', t) = val(v', t) + 2 · c_{r,i} (recall that we multiplied all holding rates and costs by 2 to get G, which explains why there is a 2 here). We have that val(v', t) = val(v', 1) = 0 and thus val(s', t) = 2 · c_{r,i}. On the other hand, the maximizer can wait until time 1 in s' (costing 0) and then do whatever he would do according to an optimal strategy at time 1. Thus, val(s', t) ≥ val(s', 1). One of the options for the maximizer at time 1 is to go to v', and thus val(s', 1) ≥ 2 · c_{r,i}. But then val(s', t) = 2 · c_{r,i} = val(s', 1). □

The lemma says that if a strategy ever goes from s to v or from s' to v', then we can assume that it first waits until time 1 and then does so. We will next argue (besides for v^0_r, which must use this new edge; but on the other hand, we can then just change the cost of the edge to s^0_ℓ instead of adding a new state) that at time 1, there are equally good options to going to the additional states.

Lemma 8.
For all k ≥ 0, in G, we have that

val(s^k_ℓ, 1) = val(s^{k-1}_ℓ, 1) + 2^{-k+1} if k ≥ 2; val(s^0_r, 1) if k = 1; 0 if k = 0.

Also,

val(s^k_r, 1) = val(s^{k-1}_r, 1) if k ≥ 2; val(s^0_ℓ, 1) + 1 if k = 1; 2 · c_{r,0} if k = 0.

Recall that, for each original minimizer state s^i_ℓ, there is a new maximizer state v^i_ℓ with holding rate 1 and an edge to it of cost 0, and then an edge from v^i_ℓ to some new goal state д^i_ℓ of cost c_{ℓ,i} = val(s^i_ℓ, 1 − 2^{-i-1}) in the original game; for each original maximizer state s^i_r, for i ≥ 1, there is a new goal state д^i_r and an edge to it from s^i_r of cost c_{r,i} = val(s^i_r, 1 − 2^{-i-1}) in the original game.

Proof. The state s^0_ℓ is a goal state, thus val(s^0_ℓ, t) = 0. In the remainder of the proof, let t = 1 − 2^{-i-1}. In state s^0_r, the maximizer can either go to goal with a cost of 0 or with one of 2 · c_{r,0}. The number c_{r,0} was val(s^0_r, t) in the original game. The optimal choice in s^0_r in the original game was to wait until time 1 and go to goal. We have, in the original game, that val(s^0_r, t) = 1 − t = 2^{-i-1}. Clearly, the latter option is preferable. Thus, val(s^0_r, 1) = 2 · c_{r,0} = 2^{-i}.

No node s^k_ℓ or s^k_r (in either G or the original game), for k ≥ 1, is a goal state. We first argue, for all 1 ≤ k ≤ i, that c_{ℓ,k} = 2^{-i-1} + Σ_{j=2}^{k} 2^{-j} = 1/2 − 2^{-k} + 2^{-i-1} < 1/2 (for i = k, we have that c_{ℓ,i} = 1/2 − 2^{-i-1}, and this is the top state), and that c_{r,k} = 1/2. For s^1_ℓ in the original game, we have that c_{ℓ,1} = val(s^1_ℓ, t) = min(val(s^0_ℓ, t) + 1/2, val(s^0_r, t)) = min(1/2, 2^{-i-1}) = 2^{-i-1}. For s^1_r, we get the same expression but with max replacing min, and thus c_{r,1} = 1/2. For s^k_ℓ in the original game, with k ≥ 2, we have that c_{ℓ,k} = val(s^k_ℓ, t) = min(2^{-k} + val(s^{k-1}_ℓ, t), val(s^{k-1}_r, t)) = min(2^{-k} + 2^{-i-1} + Σ_{j=2}^{k-1} 2^{-j}, 1/2) = 2^{-i-1} + Σ_{j=2}^{k} 2^{-j}. For s^k_r, we again get the same expression but with max replacing min, and thus c_{r,k} = 1/2.

We now show by induction on k, for all k ≥ 0, that val(G, s^k_ℓ, 1) = 2 · c_{ℓ,k} and val(G, s^k_r, 1) = 2 · c_{r,k}, and that there is an option that is equally good as going to v^k_ℓ and д^k_r in s^k_ℓ and s^k_r respectively. First, the base case, k =
1. For s^1_ℓ in G, we have that

val(G, s^1_ℓ, 1) = min(val(G, s^0_ℓ, 1) + 1, val(G, s^0_r, 1), val(G, v^1_ℓ, 1)) = min(1, 2^{-i}, 2^{-i}) = 2^{-i} = 2 · c_{ℓ,1}.

Note that val(G, s^1_ℓ, 1) = val(G, s^0_r, 1). For s^1_r in G, we get the same expression but with max replacing min, and thus val(G, s^1_r, 1) = 1 = 2 · c_{r,1}. Also, val(G, s^1_r, 1) = 1 + val(G, s^0_ℓ, 1).

Next, the induction case, for k ≥
2. For s^k_ℓ in G, we have that

val(G, s^k_ℓ, 1) = min(val(G, s^{k-1}_ℓ, 1) + 2^{-k+1}, val(G, s^{k-1}_r, 1), val(G, v^k_ℓ, 1)) = min(2 · c_{ℓ,k-1} + 2^{-k+1}, 1, 2 · c_{ℓ,k}) = min(2 · c_{ℓ,k}, 1, 2 · c_{ℓ,k}) = 2 · c_{ℓ,k}.

Note that val(G, s^k_ℓ, 1) = val(G, s^{k-1}_ℓ, 1) + 2^{-k+1}. For s^k_r in G, we get the same expression but with max replacing min, and thus val(G, s^k_r, 1) = 1 = 2 · c_{r,k}. Also, val(G, s^k_r, 1) = val(G, s^{k-1}_r, 1). □

We can therefore remove all the states д^k_ℓ, v^k_ℓ and д^k_r from G, and change the cost of the edge e = (s^0_r, s^0_ℓ) so that c(e) = 2^{-i}, and still have that all states have the same value as before we did so. An identical sequence of transformations can deal with (D1) and (D2) for ¬v_i.

Bounding the booleans. Finally, we deal with (D3). To do so, we add a new maximizer state L, which has only a single edge e. The holding rate of L is r(L) = 1/
2. The edge e goes to a new goal state and has cost 3/2. The optimal strategy in L is to wait until time 1 and then go to goal. Thus, the value is val(L, t) = (1 − t)/2 + 3/2 = 2 − t/2. We then add a state s*_i, with an edge to L and an edge to s''_i. In each case, we will have that s*_i is an encoding with v = 2 and v' = 2^{-i-1}.

1. Straight encoding:
In this case, s*_i should be a maximizer state of holding rate 0. The state s*_i is then a straight encoding of v_i (note that val(s*_i, t) ≥ v − t/2, because of L and s*_i being a maximizer state).

2. Reverse encoding:
In this case, s*_i should be a minimizer state of holding rate 1. The state s*_i is then a reverse encoding of v_i (note that val(s*_i, t) ≤ v − t/2, because of L and s*_i being a minimizer state).

The states are such that (en3) is satisfied by the midpoint of the duration in which the variable is true for straight encoding and false for reverse encoding; also, the slope of the function in such a duration is 0 in the first half and −1 in the second half. Again, we can deal with (D3) for ¬v_i in the same way.

We get the following lemma:

Lemma 9.
For each variable v_i or ¬v_i, we can, using O(i) states, construct a state that is a (straight or reverse) encoding of it with v = 2 and v' = 2^{-i-1}.

F.2 Formula encoding
In this section, we show how we encode a boolean formula F over n variables in our construction. We assume that De Morgan's laws have been applied repeatedly so that all negations are on variables only. In the previous section, we gave an encoding for variables and their negations, and we thus only need to show how we encode ANDs and ORs.

ANDs and ORs.
We give a recursive implementation of ORs and ANDs, using the same encoding for both types of gate in both straight and reverse encoding. Consider some AND or OR over sub-formulas F_1, F_2, . . . , F_k, for some k ≥ 1. We can recursively construct a game A_i such that a state s_i in it encodes F_i, for each i (to ensure that our construction has certain properties, in particular constant treewidth, the game for F_i has no state in common with the game for F_j for i ≠ j). For both ANDs and ORs, we add a state s that has an edge e_i to s_i for each i, of cost c(e_i) = 0.

1. AND:
In this case, s is a minimizer state of holding rate 1.

2. OR:
In this case, s is a maximizer state of holding rate 0.The state s has value val ( s , t ) and is close to satisfying our properties for straight/reverse encoding. In par-ticular, it satisfies (S1) and (R1) - for some bounds, which is trivial - and (S2)/(R2) for v =
2. The latter comesfrom the fact that all variables v i /¬ v i satisfies (en2) with v = v ′ ). It then follows fromLemma 6 that for any AND or OR directly of variables that (en2) is satisfied for (straight or reverse) encodingwith v = ( s , t ) will not necessarily satisfy requirement (S3)/(R3) of straight/reverse encoding, becausedifferent variables use different values of v ′ . We deal with it for the full formula by using a detector stater;the nodes corresponding to internal ANDs and ORs in the formula will not (necessarily) be straight or reverseencodings. Detector state.
The role of the detector state c is to convert the input state (here the state s as above for our full formula F) to the right format. This will allow us to detect the truth of the formula according to the game value at a specific time. As mentioned above, the issue with s concerns val(s, t) + t/2 for F, in either straight or reverse encoding. To fix this, we make c encode (F ∧ (v_n ∨ ¬v_n)) for straight encoding and (F ∨ (v_n ∧ ¬v_n)) for reverse encoding. Note that (v_n ∨ ¬v_n) is true for any assignment, and (v_n ∧ ¬v_n) is false for all assignments; thus both these boolean formulas are equivalent to F.

Lemma 10.
The state c (whether we consider straight or reverse encoding) encodes the formula F, with v = 1/2 and v′ = 2^{−n−1}. Also, requirement (en3) is satisfied precisely by t_Am for any A for which F(A) is true/false.

Proof. By the same argument as for s, c satisfies requirements (en1) and (en2). We will give the following argument for straight encoding (resp. reverse encoding):

Claim 2. If F is true (resp. false) for some assignment A, then val(s, t_Am) ≥ 1/2 + 2^{−n−1} (resp. val(s, t_Am) ≤ 1/2 − 2^{−n−1}).

Proof. We argue recursively.

• It is true for v_n and ¬v_n straightforwardly (because the duration of t_Am is exactly the duration for the corresponding state to v_n resp. ¬v_n, and variables reach their maximum in the middle).

• It is true for each state s corresponding to another variable, because val(s, t) + t/2 is v′ (resp. v) at the start of the duration for s, increases until the middle, and then decreases back to v′ (resp. v′) at the end. The first half of val(s, t) + t/2 has slope t/2 (resp. −t/2) and the last half has slope −t/2 (resp. t/2). For i < j, the duration of the variable v_i is 2^{j−i} times the duration of v_j, and the duration of each variable v_i starts at the start of a duration for v_j and ends at the end of a duration of v_j. Thus, if v_i and v_j are true in A, for i < j, then val(s*_i, t) ≥ val(s*_j, t) (resp. val(s*_i, t) ≤ val(s*_j, t)) for all t ∈ T_A. The claim then follows by considering j = n.

• It is true for each state v that encodes an OR over F_1, . . . , F_n (with state f_i encoding F_i). It is then a maximizer state of holding rate 0 and thus val(v, t) = max_i(val(f_i, t)) (by Lemma 6). If the OR is true (resp. false) for A, then the claim follows since for some (resp. each) F_i we have that F_i(A) is true (resp. false); it is then true for f_i and thus for v in turn.

• It is true for each state v that encodes an AND, similarly to OR. □

State c satisfies (str3) for straight encoding: Consider the node v that encodes the OR outside F in the formula (F ∧ (v_n ∨ ¬v_n)). It is then a maximizer state of holding rate 0 and thus val(v, t) = max(val(s*_n, t), val(ŝ*_n, t)) (by Lemma 6). Observe that val(s*_n, t), val(ŝ*_n, t) ≤ 1/2 + 2^{−n−1}, since they encode the variables v_n, ¬v_n in a straight encoding (with v = v′ = 2^{−n−1}), and val(s*_n, t_Am) = 1/2 + 2^{−n−1} or val(ŝ*_n, t_Am) = 1/2 + 2^{−n−1} (because either v_n or ¬v_n is true in each assignment). Also, val(s*_n, t_As) = val(ŝ*_n, t_As) = val(s*_n, t_Ae) = val(ŝ*_n, t_Ae) = 1/2 − t/2, so val(v, t_As) = max(val(s*_n, t_As), val(ŝ*_n, t_As)) = 1/2 − t/2 = max(val(s*_n, t_Ae), val(ŝ*_n, t_Ae)) = val(v, t_Ae), for all A (using Lemma 6). Since c is a minimizer state of holding rate 1 (being an AND), we then have that val(c, t) = min(val(s, t), val(v, t)) (by Lemma 6). We get that c satisfies (str3), since val(v, t_Am) = 1/2 + 2^{−n−1} − t/2 (for t ∈ T_A) and val(s, t_Am) ≥ 1/2 + 2^{−n−1}.
Also, val(c, t_As) = val(c, t_Ae) = 1/2 − t/2, for all A, because the same was true for v.

The argument for reverse encoding is similar. □

G NP- and coNP-hardness

As a stepping stone towards
PSPACE-hardness, we first show that the problem is NP- and coNP-hard. We will do so by encoding SAT and DNF-tautology, well-known NP- and coNP-hard problems.

NP-hardness. Given a SAT formula F (with variables encoded in straight format), we construct a game G so that a detector state c encodes F (according to the previous section). Our NP-hardness proof will use a special state called an extender state. In essence, the job of the extender state x is to extend the duration for which the formula is true (because of the exists part of SAT; we extend the duration for which the formula is false for coNP). The extender state x is a maximizer state with holding rate 1/2 and an edge to c of cost 0. Consider some time t. If there is an assignment A such that t ≤ t_Am and F(A) is true, then it is an optimal strategy in x to wait until t_Am and then go to c. We will show this by considering all possible strategies in x. Let f(t) = v − t/
2. If we start in x at time t, and wait until time t′ and then move to c (and afterwards follow some optimal strategy), we get

(t′ − t) · 1/2 + val(c, t′) = (t′ − t) · 1/2 + val(c, t′) − f(t′) + f(t′) = (val(c, t′) − f(t′)) + f(t),

using that f(t) = v − t/2 = v − t/2 − t′/2 + t′/2 = (t′ − t)/2 + f(t′). We thus see that the optimal strategy (for the maximizer, since x belongs to him) is to wait until a time that maximizes (val(c, t′) − f(t′)) (because f(t) is independent of how much we wait). According to Lemma 10, such times t′ are those for which there is a satisfying assignment A to F with t ≤ t_Am. If no such satisfying assignment exists, it is an optimal choice to go to c directly (because we might have that t is just slightly larger than t_Am for some satisfying assignment A, and the function (val(c, t′) − f(t′)) is then decreasing until hitting 0). Note that 0 < t_Am for all assignments A; thus, val(x, 0) = 1/2 + 2^{−n−1} if there is a satisfying assignment A to F (and it is optimal to wait until time t_Am and then move to c when starting in x at time 0). Otherwise, if no satisfying assignment exists, c is such that val(c, t) = f(t) (because it is a straight encoding of the formula), and, by our above calculations, val(x, t) = f(t) as well. In particular, val(x, 0) = 1/2. We get the following theorem.
Theorem 8.
In the SPTG G constructed above, val(x, 0) ∈ {1/2, 1/2 + 2^{−n−1}}, and val(x, 0) = 1/2 + 2^{−n−1} iff there is a satisfying assignment to the SAT instance with boolean formula F. Note that the SPTG problem in the theorem is an instance of the PromiseSPTG problem. Because DecisionSPTG is harder than PromiseSPTG, and since we only used holding rates in {0, 1/2, 1}, we get the following theorem as a corollary:

Theorem 2.
For an SPTG, deciding whether v(s, 0) ≥ c for a given state s and constant c is NP-hard, even if the game has only holding rates in {0, 1/2, 1}.

We can show coNP-hardness similarly, by considering DNF-tautology, using reverse encoding, and having the extender be a minimizer state instead. Making the extender a minimizer state ensures that val(x, 0) ∈ {1/2, 1/2 − 2^{−n−1}} and val(x, 0) = 1/2 − 2^{−n−1} iff there is an assignment to the DNF-tautology instance under which it evaluates to false (in other words, the formula F is a tautology iff val(x, 0) = 1/2).

Theorem 3.
For an SPTG, deciding whether v(s, 0) ≥ c for a given state s and constant c is coNP-hard, even if the game has only holding rates in {0, 1/2, 1}.

H Quantified boolean formula encoding and
PSPACE -hardness
In this section, we will show how to encode a quantified boolean formula in our encoding. This then trivially allows us to show
PSPACE-hardness, since if we can encode a quantified boolean formula, we can (easily) solve TQBF. Doing it this way also allows us to give a more precise picture of what is required for hardness at the k-th level of the polynomial-time hierarchy. Consider a quantified boolean formula

∀ v^1_1, . . . , v^1_{n_1} ∃ v^2_1, . . . , v^2_{n_2} ∀ . . . ∃ v^n_1, . . . , v^n_{n_n} : F(v^1_1, . . . , v^n_{n_n}),

where, as before, F(v^1_1, . . . , v^n_{n_n}) is assumed to have negations only on the variables (we can still assume this, because of De Morgan's laws).

Variable encoding.
We use variables v_1, . . . , v_{n_1} to encode v^1_1, . . . , v^1_{n_1}, variables v_{n_1+3}, . . . , v_{n_1+n_2+2} to encode v^2_1, . . . , v^2_{n_2}, and in general variable v_S to encode v^j_i, where S = Σ_{k=1}^{j−1} (n_k + 2) + i. In particular, we skip two variables whenever we have an alternation.

Recursive construction. We will recursively construct an encoding of our quantified boolean formula by showing: given a straight encoding, where 0 < v′ ≤ 2^{−n−1} (with free variables v^1_1, . . . , v^i_{n_i}, i.e. the ones given by time), of

F^A_{i+1} := ∀ v^{i+1}_1, . . . , v^{i+1}_{n_{i+1}} ∃ v^{i+2}_1, . . . , v^{i+2}_{n_{i+2}} ∀ . . . ∃ v^n_1, . . . , v^n_{n_n} : F(v^1_1, . . . , v^n_{n_n}),

we give a reverse encoding, where 0 < v′ ≤ 2^{−n−1} (with free variables v^1_1, . . . , v^{i−1}_{n_{i−1}}, i.e. the ones given by time), of

F^E_i := ∃ v^i_1, . . . , v^i_{n_i} ∀ v^{i+1}_1, . . . , v^{i+1}_{n_{i+1}} ∃ . . . ∃ v^n_1, . . . , v^n_{n_n} : F(v^1_1, . . . , v^n_{n_n}).

Also, we will show how, given a reverse encoding, where 0 < v′ ≤ 2^{−n−1} (with free variables v^1_1, . . . , v^i_{n_i}, i.e. the ones given by time), of

F^E_{i+1} := ∃ v^{i+1}_1, . . . , v^{i+1}_{n_{i+1}} ∀ v^{i+2}_1, . . . , v^{i+2}_{n_{i+2}} ∃ . . . ∃ v^n_1, . . . , v^n_{n_n} : F(v^1_1, . . . , v^n_{n_n}),

to give a straight encoding, where 0 < v′ ≤ 2^{−n−1} (with free variables v^1_1, . . . , v^{i−1}_{n_{i−1}}, i.e. the ones given by time), of

F^A_i := ∀ v^i_1, . . . , v^i_{n_i} ∃ v^{i+1}_1, . . . , v^{i+1}_{n_{i+1}} ∀ . . . ∃ v^n_1, . . . , v^n_{n_n} : F(v^1_1, . . . , v^n_{n_n}).

Note that we can directly give a straight/reverse encoding of F(v^1_1, . . . , v^n_{n_n}) using the section on formula encoding, and thus, if we show how to make this recursive construction, we can recursively construct the full quantified boolean formula.

The detector state.
Let v_S be the variable encoding v^i_1. We construct a state c which encodes the formula F′ = (F^A_{i+1} ∧ v_{S−1} ∧ v_{S−2}), i.e. by having states s*_{S−1} and s*_{S−2}, encoding v_{S−1} and v_{S−2} respectively, and then c is a minimizer state of holding rate 1 with an edge to each of s, s*_{S−1}, s*_{S−2}. The edge to s has cost 0 and the ones to s*_{S−1}, s*_{S−2} have cost v − 1/2. Thus, val(c, t) = min(val(s, t), val(s*_{S−1}, t) + v − 1/2, val(s*_{S−2}, t) + v − 1/2) (by Lemma 6).

Lemma 11.
The state c straight encodes F′ with the same v and v′ as for s.

Proof. Consider time split into durations of length 2^{−S+1} (i.e. a duration for v_{S−1}) each. Then, for any time t in the first 3/4 of such a duration, val(c, t) = v − t/2, because the assignment A for which t ∈ T_A is such that at least one of v_{S−1} and v_{S−2} is false. Still, the state c is a straight encoding of (F ∧ v_{S−1} ∧ v_{S−2}) (with the same v and v′ as for s): Property (str2) comes, similar to earlier cases, from the fact that each of the functions val(s, t), val(s*_{S−1}, t) + v − 1/2, val(s*_{S−2}, t) + v − 1/2 is at least v − t/2; hence v − t/2 ≤ val(c, t) ≤ val(s, t) ≤ v + v′ − t/2. That (str3) is satisfied comes from the fact that, for the last 1/4 of the durations, val(s*_{S−1}, t) starts with value v − t/2 and increases, so that val(s*_{S−1}, t) ≥ val(s, t) for such t. Similarly, val(s*_{S−2}, t) increases, from value v − t/2, with the highest holding rate from the middle of these durations until 3/4 into them, so that val(s*_{S−2}, t) ≥ val(s, t). In particular, val(c, t) = val(s, t) for such t and thus, since val(s, t) satisfies (str3) for such t, it is also satisfied by c. On the other hand, in the first 3/4 of these durations, at least one of val(s*_{S−1}, t) and val(s*_{S−2}, t) has value v − t/2, because they are straight encodings (and all such t belong to T_A for some assignment where either v_{S−1} or v_{S−2} is false). □

The extender state.
Next, let x be a maximizer state with holding rate r(x) = 1/2 − v′/(1.25 · 2^{−S+1}) and an edge to c of cost 0. Intuitively, x is an extender, similar to the one in our NP-hardness proof, for extending the truth value of our assignments to all variables, but which resets (back to having value v − t/
2) between each assignment of the first S − 1 variables. Consider (c being a straight encoding) time split into durations of length 2^{−S+2} = 2 · 2^{−S+1} (i.e. a duration for v_{S−2}) each. Consider such a duration, which corresponds to some assignment Â of variables v_1, . . . , v_{S−1}.

Lemma 12. For all t in a slim region, from 11/16 to 12/16 = 3/4 into the duration, either val(x, t) ≥ v + v′/2 − t/2, if F^E_i(Â) is true, or val(x, t) = v − t/2 if F^E_i(Â) is false.

Proof. We will first argue that the maximizer never waits w > 1.25 · 2^{−S+1} in x from time t, for any t. This is because the outcome is then r(x) · w + val(c, t + w) ≤ w/2 − w · v′/(1.25 · 2^{−S+1}) + v + v′ − (t + w)/2 < v − t/2. On the other hand, not waiting at all gives an outcome of val(c, t) ≥ v − t/
2, because c is a straight encoding. Observe that val(c, t) = v − t/2 for t from 0 to 3/4 into each duration, where v_{S−1} and v_{S−2} are false. Let t′ be some number between 11/
16 and 3/4 into the duration. If, for all t in the durations, val(c, t) = v − t/2, then it is optimal not to wait in x in that duration, because he cannot, as shown above, wait until 3/4 into the next duration; thus val(c, t) = val(x, t) = v − t/2 and in particular val(x, t′) = v − t′/
2. But if val(c, t) = v − t/2 for t ∈ T_A for each assignment A that extends Â into an assignment to variables v^1_1, . . . , v^i_{n_i}, then F^E_i(Â) is false. On the other hand, if for some t in a duration we have that val(c, t) = v + v′ − t/2 (and val(c, t) ≠ v − t/2 for t in such a duration, by straight encoding), then val(x, t′) ≥ v + v′/2 − t/
2, because t > t′ (since val(c, t′′) = v − t/2 for t′′ before 3/4 into the duration) and thus t − t′ is at most 5/16 of the duration. If we wait d ≤ w = 5/16 · 2^{−S+2} = 5/4 · 2^{−S} from time t until time t′, we get an outcome of

r(x) · d + val(c, t′) = r(x) · d + v + v′ − t′/2 = d/2 − d · v′/(1.25 · 2^{−S+1}) + v + v′ − t′/2 = (v′ − d · v′/(1.25 · 2^{−S+1})) + v − t/2 ≥ (v′ − w · v′/(1.25 · 2^{−S+1})) + v − t/2 = v + v′/2 − t/2.

But if val(c, t) = v + v′ − t/2 for some t ∈ T_A and some assignment A that extends Â into an assignment to variables v^1_1, . . . , v^i_{n_i}, then F^E_i(Â) is true. □

Limiter.
We will now, using a few more states, use the above lemma to create a state s′ that is a reverse encoding of F^E_i with v_i ← v + v′/2 and v′_i ← v′/
2. First, to ensure that val ( s ′ , t ) ≤ v i − t /
2, we will introduce a limiter state L. The state L is a minimizer state of holding rate 1, with an edge to x and one to a maximizer state s′′ of holding rate 1/
2, each of cost 0. The state s ′′ has a single edge to a goal state of cost v i − /
2. Thus, val(s′′, t) = v_i − 1/2 + (1 − t)/2 = v_i − t/2 and val(L, t) = min(val(x, t), val(s′′, t)) = min(val(x, t), v_i − t/2), by Lemma 6.

The reverse encoding state s′. The state s′, which we will show reverse encodes F^E_i, is a maximizer state of holding rate 0 that has an edge to L and to a state r reverse encoding F′′ = (¬v_{S+1} ∨ ¬v_S ∨ v_{S−1} ∨ ¬v_{S−2}) with v_i and v′_i.
The state s′ reverse encodes F^E_i with v_i and v′_i.

Proof.
We have that val(s′, t) = max(val(L, t), val(r, t)) by Lemma 6. First, property (rev1) follows from the fact that val(L, t) ≤ v_i − t/2 and val(r, t) ∈ [v_i − v′_i − t/2, v_i − t/2] (by reverse encoding), and thus val(s′, t) = max(val(L, t), val(r, t)) ∈ [v_i − v′_i − t/2, v_i − t/2]. For properties (rev2) and (rev3), consider some assignment Â to the variables v_1, . . . , v_{S−1}. For each t ∈ {t_Âs, t_Âe}, we have that val(s′, t) = val(L, t) = v_i − t/2, because val(L, t) = v_i − t/2 for such t (by reverse encoding and since such t are also the end of durations for higher-numbered variables) and val(s′, t) ≤ v_i − t/2; hence val(s′, t) = max(val(L, t), val(r, t)) = v_i − t/2. Note that s*_{S+1}, the state encoding ¬v_{S+1}, has a duration of 2^{−S−1} = 2^{−S+3}/
16. Also, the boolean encoding of 11/16 is 0.1011. F′′ is true, except for a period of length 1/16, starting at 11/16 into the durations. For any t ∉ [11/16, 12/16], we have that (1) val(r, t) = v_i − t/2; (2) val(L, t) ≤ v_i − t/2; and since val(s′, t) = max(val(L, t), val(r, t)) (by Lemma 6), we have that val(s′, t) = v_i − t/
2. For any t ∈ [11/16, 12/16], we have by Lemma 12 that val(x, t) ≥ v + v′/2 − t/2, and therefore val(L, t) is equal to v + v′/2 − t/2 = v_i − t/2 if F^E_i(Â) is true, and equal to v − t/2 = v_i − v′_i − t/2 if F^E_i(Â) is false. Let A be the extension of Â that maps v_{S+1}, v_S, v_{S−2} to true and v_{S−1} to false. Note that T_A is the duration covering [11/16, 12/16] of the duration. We have that val(r, t_Am) = v_i − v′_i − t/
2, because it reverse encodes F′′ with those parameters. Thus, for all t in the duration, val(s′, t) = v_i − t/2 if F^E_i(Â) is true, and otherwise val(s′, t_Am) = v_i − v′_i − t/2 if F^E_i(Â) is false. □

We can similarly give a straight encoding of F^A_i, with v ← v − v′/2 and v′ ← v′/2, from F^E_{i+1} with v and v′. We can thus encode any quantified boolean formula. In particular, we see that PromiseSPTG is PSPACE-hard. Hence, we get the following theorem.
Theorem 4.
For an SPTG, deciding whether v(s, 0) ≥ c for a given state s and constant c is PSPACE-hard.
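For intuition, the alternation structure that the straight/reverse encodings simulate is just the standard recursive evaluation of a quantified boolean formula. A minimal sketch in Python; the tuple-based formula representation is an illustrative assumption, not part of the paper's construction:

```python
# Sketch: standard recursive evaluation of a quantified boolean formula
# (TQBF), mirroring the forall/exists alternation that the straight and
# reverse encodings simulate. The tuple-based formula representation is
# an illustrative assumption, not part of the paper's construction.

def evaluate(formula, assignment):
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    if op == "and":
        return all(evaluate(f, assignment) for f in formula[1:])
    if op == "or":
        return any(evaluate(f, assignment) for f in formula[1:])
    if op in ("forall", "exists"):
        _, var, body = formula
        branches = [evaluate(body, {**assignment, var: b}) for b in (True, False)]
        # forall behaves like a minimizer (all branches must hold), exists
        # like a maximizer (some branch must hold) -- as the gadget states do.
        return all(branches) if op == "forall" else any(branches)
    raise ValueError("unknown operator: %r" % op)

# forall x exists y : (x or y) and (not x or not y)  -- true, via y = not x.
qbf = ("forall", "x",
       ("exists", "y",
        ("and", ("or", ("var", "x"), ("var", "y")),
                ("or", ("not", ("var", "x")), ("not", ("var", "y"))))))
assert evaluate(qbf, {}) is True
```

Each ∀ level takes a conjunction over branches and each ∃ a disjunction, which is exactly the minimizer/maximizer role split in the gadgets above.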
Also, since we need holding rates {0, 1} for our family with exponentially many event points, and besides that one more distinct holding rate (for the extender state) per alternation to encode a quantified boolean formula, we also get the following theorem.

Theorem 5.
For an SPTG with k + 2 distinct holding rates, deciding whether v(s, 0) ≥ c for a given state s and constant c is hard for the k-th level of the polynomial-time hierarchy.

I Graph properties
We will in this section argue that our construction (or similar constructions) belongs to many special classes of graphs. A simple way to understand our
PSPACE-hardness construction is that it basically forms a tree, with the leaves being members of our family of graphs with exponentially many event points. First, we will give three simple properties.

1. DAG. Note that our lower bound graph is acyclic, since our family of graphs with exponentially many event points is, and it is thus a DAG.

2.
Planar.
It is also planar (i.e. it can be drawn in the plane without edges crossing), since we can draw our family of graphs with exponentially many event points in a planar way such that the last level (i.e. states s^i_ℓ, s^i_r) lies on the outer face; a tree with such leaves, like our PSPACE-hardness construction, is therefore planar.

3.
Degree 3 or 4.
By assuming that ANDs and ORs are over two sub-formulas (which we can do without loss of generality), we see that our graph has degree at most 4. In fact, we can get it down to 3 by, for each state s, (1) adding a maximizer state s′ with holding rate 0; (2) adding an edge from s′ to s of cost 0, thus ensuring that val(s′, t) = val(s, t) by Lemma 6; and (3) redirecting all incoming edges of s to s′ instead. The resulting game is such that each state has the same value as before the modification, but the degree is at most 3. Doing so does not change any of the other graph properties.

Few holding rates.
First, observe that we only use two holding rates, 0 and 1, in our exponential lower bound family. Also, in our NP-hardness construction we only use holding rates 0, 1/2, 1. In general, for a formula with k − 1 alternations (i.e. k blocks of ∃ or ∀ in the formula) we use k + 2 holding rates, and hence SPTGs with k + 2 holding rates are hard for the k-th level of the polynomial time hierarchy.

I.1 Urgent states
One can consider a slightly more general variant of OCPTGs and SPTGs where a state may be declared urgent. In an urgent state, the owner cannot wait. The notion is of interest at least partly because the proofs in [11, 20], showing that OCPTGs have a value and at most exponentially many event points respectively, are recursive arguments on graphs with more and more urgent states.

We will say that a set of states S is urgent-equivalent if, for some pair of optimal strategies, no player waits in any state of S. Note that changing any number of states in an urgent-equivalent set to being urgent does not change the value (because some optimal strategy in the original game is still usable). Observe that our family with exponentially many event points is such that S is urgent-equivalent, where S is the set of all states except s_r, because of Lemma 6. Hence, even in games with all but 1 state being urgent, we have exponentially many event points.

Next, consider that in our NP-hardness or our PSPACE-hardness construction, we reuse the state s_r, also using it in place of the state L (used to bound booleans; the two states have the same value function), instead of copying it each time we use our family in the construction. Then, similar to above, we see that the problem is NP-hard with 2 non-urgent states (the state s_r and the extender state) and, in general, with k + 1 non-urgent states is hard for the k-th level of the polynomial time hierarchy (we add one extender state for each alternation). Note that the resulting graph is not planar (and also does not have low degree, but it can still be changed as before to have degree 3 while still satisfying all properties except planarity). One can also show that it has tree-width/clique-width 4 as defined later (by simply having s_r in all bags and giving it a unique color, respectively). Since it has clique-width 4, it has rank-width 4 as well.
Treewidth is a classic measure of how tree-like a graph is, with treewidth 1 if the graph is a tree. Many algorithmic problems are easier on graphs with constant tree-width, e.g. deciding monadic second-order logic, which is fortunate since many graph families (e.g. control flow graphs of C programs without gotos, used in verification of programs) have constant tree-width. We will argue that our PSPACE-hard family has tree-width 3, thus showing that solving SPTGs on constant-tree-width graphs is PSPACE-hard.
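The tree-decomposition conditions defined formally next can be checked mechanically. A minimal sanity-check sketch; the toy graph and bags below are illustrative assumptions, not the actual construction:

```python
# Sketch: verify that (bags, tree) is a valid tree-decomposition of a graph,
# following the two conditions of the standard definition. The 4-cycle
# example is toy data, not the paper's lower-bound family.

def is_tree_decomposition(graph_edges, bags, tree_edges):
    # 1. Every graph edge must be contained in some bag.
    for (s, t) in graph_edges:
        if not any(s in b and t in b for b in bags):
            return False
    # 2. For each vertex, the bags containing it must form a non-empty,
    #    connected subtree of the decomposition tree.
    vertices = {v for e in graph_edges for v in e}
    for v in vertices:
        holding = {i for i, b in enumerate(bags) if v in b}
        if not holding:
            return False
        seen, stack = set(), [next(iter(holding))]
        while stack:                       # BFS restricted to bags with v
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            for (a, b) in tree_edges:
                if a == i and b in holding:
                    stack.append(b)
                if b == i and a in holding:
                    stack.append(a)
        if seen != holding:
            return False
    return True

def width(bags):
    return max(len(b) for b in bags) - 1

# A 4-cycle has treewidth 2; two bags of size 3 along a path witness this.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
bags = [{0, 1, 3}, {1, 2, 3}]
tree = [(0, 1)]
assert is_tree_decomposition(cycle, bags, tree)
assert width(bags) == 2
```

The path-width witness given below for the lower-bound family has the same shape: size-4 bags arranged along a path, giving width 3.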
Treewidth definition. A tree-decomposition of a graph G = (V, E) is a pair (B, T) where B = {B_1, B_2, . . . , B_k} is a family of subsets of V (called bags), and T is a tree with the bags as nodes, satisfying that

• for each edge (s, t) ∈ E, there exists i such that s, t ∈ B_i;

• for each state s ∈ V, the set of bags X_s containing s (i.e. B_i ∈ X_s iff s ∈ B_i) forms a non-empty subtree of T.

The width of a tree-decomposition is max_i |B_i| − 1. The tree-width of a graph G = (V, E) is the minimum width of any tree-decomposition of G (there can be many tree-decompositions of the same graph). The path-width of a graph G = (V, E) is the minimum width of any tree-decomposition (X, T) of G for which T is a path.

Tree-decomposition construction.
We will first argue that our exponential lower bound example has path-width 3. The bags are such that B_k = {s^{k−1}_ℓ, s^{k−1}_r, s^k_ℓ, s^k_r} for k ∈ {1, . . . , i}. The bag B_k has an edge to B_{k+1} for k < i and to B_{k−1} for k > 1.

I.3 Clique-width
There are more general graph properties than tree-width (even if tree-width is the most well-known) for which many algorithms run faster on the special case. One of these is clique-width. It is known that if the treewidth is w, then the clique-width is at most 3 · 2^{w−1}. In our case, because the tree-width is 3, that would imply that the clique-width is at most 12. We will argue that it is in fact 3.

Definition of clique-width.
We will next define clique-width. To do so, let k be some number. The set of graphs of clique-width at most k is exactly the set of graphs that can be constructed using the following rules:

1. Given an integer 1 ≤ i ≤ k, output a new graph with one state of color i.

2. Given two graphs G, G′, output the disjoint union of them.

3. Given a graph and two integers i, j such that 1 ≤ i, j ≤ k, output the graph you get by adding an edge from each state of color i to each state of color j.

4. Given a graph and two integers i, j such that 1 ≤ i, j ≤ k, output the graph you get by changing the color of each state of color i to color j.

Lower bound on clique-width. A graph H = (V′, E′) is an induced sub-graph of a graph G = (V, E) if there exists an injective function f : V′ → V such that for all pairs s, t ∈ V′, we have that (s, t) ∈ E′ iff (f(s), f(t)) ∈ E. There are also further width measures, e.g. rank-width; however, if a graph has clique-width k, then it has rank-width at most k as well, and thus our graph has rank-width at most 3. It is known that the set of graphs (with edges) that have clique-width 2 is exactly the set of graphs that do not have a path of length 4 as an induced sub-graph. Note that already our exponential lower bound family (with i ≥
3) has an induced sub-graph which is a path of length 4. E.g., s^0_ℓ, s^1_ℓ, s^2_ℓ, s^3_ℓ is such an induced sub-graph. Thus, the clique-width is at least 3.

Upper bound on clique-width.
We first construct our family with exponentially many event points. We do so in steps. In the first few steps, we construct s^0_ℓ and s^0_r, add the edge between them (this uses 2 colors), and color them, say, green. We then do an iterative construction to construct all i levels as follows. (The last level is special, in that only one state of that level has incoming edges, and we could have omitted the other state from our construction.) Suppose that for some k ≤ i − 2, states s^k_ℓ and s^k_r are green, states s^j_ℓ, s^j_r for 0 ≤ j < k are blue, and all edges between them have been added. Then we can add s^{k+1}_ℓ and s^{k+1}_r as red states, add all edges between red and green states, then color the green states blue and the red states green, and go to the next iteration. We will only use one of s^i_ℓ and s^i_r in the tree above our family, so first add the other (i.e. if we want to use s^i_ℓ, first add s^i_r) as a red state, add all edges between red and green, and color the red state blue; then add the one we want to use later as a red state, add all edges between red and green, color the green states blue, and finally color the red state green. This uses 3 colors.

To construct our full PSPACE-hardness construction, we construct each family member as above and then construct the graph above level by level (i.e. add the next higher state in the tree as a red state, add all edges between red and green, recolor the green states blue and then the red state green, and proceed upward like that). Again, this uses 3 colors in total.
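The four construction rules can be implemented directly as graph operations. A minimal sketch that builds a path on four states with 3 colors, using the same green/blue/red recoloring style as the iterative construction above (the class and vertex names are illustrative assumptions):

```python
# Sketch of the four clique-width operations as graph constructors.
# Vertices carry colors; the P4 example mirrors the paper's recoloring
# scheme but is illustrative, not the paper's exact expression.

class ColoredGraph:
    def __init__(self):
        self.color = {}    # vertex -> color (an integer in 1..k)
        self.edges = set()

def single(v, i):          # rule 1: one new state of color i
    g = ColoredGraph()
    g.color[v] = i
    return g

def union(g1, g2):         # rule 2: disjoint union of two graphs
    g = ColoredGraph()
    g.color = {**g1.color, **g2.color}
    g.edges = g1.edges | g2.edges
    return g

def join(g, i, j):         # rule 3: edge from every color-i to every color-j state
    for u in g.color:
        for v in g.color:
            if g.color[u] == i and g.color[v] == j:
                g.edges.add(frozenset((u, v)))
    return g

def recolor(g, i, j):      # rule 4: recolor every color-i state to color j
    g.color = {v: (j if c == i else c) for v, c in g.color.items()}
    return g

# Build the path a-b-c-d with 3 colors: attach each new vertex (red = 3)
# to the current endpoint (green = 2), then retire old vertices to blue = 1.
g = single("a", 2)
for v in ["b", "c", "d"]:
    g = union(g, single(v, 3))
    g = join(g, 2, 3)
    g = recolor(g, 2, 1)   # old endpoint becomes blue
    g = recolor(g, 3, 2)   # new vertex becomes the green endpoint

assert g.edges == {frozenset("ab"), frozenset("bc"), frozenset("cd")}
```

Only the "frontier" of the construction ever needs its own color, which is why a constant number of colors suffices even for graphs with many states.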
J Integer costs
Here we show how to convert our games with exponentially many event points and two holding rates to have integer costs. Fix some i. Take the member with i levels of our family of SPTGs that have an exponential number of event points, change the unit of time to be 2^{−i} of the old time unit (thus, the game that was previously played over 1 time unit is now played over 2^i time units), and change the unit of currency for the output to be 2^{−i} of the old output currency. We then see that the costs on edges are scaled by 2^i (and are thus integers), the holding rates are scaled by 2^i · 2^{−i} = 1 (and thus stay in {0, 1}), because the changes in time unit and in currency cancel, and finally the duration has changed to 2^i time units, i.e. time is in [0, 2^i]. The resulting game has the same (exponential) number of event points, but integer costs and holding rates in {0, 1}.
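The change of units can be replayed numerically as a sanity check. The dyadic edge costs below are illustrative assumptions, not the construction's actual values; the point is only that scaling both axes by 2^i makes costs integral while holding rates (payoff per unit time) are unchanged:

```python
# Sketch: rescaling time and payoff units by alpha = 2**i.
# Edge costs are payoffs, so they scale by alpha; holding rates are
# payoff-per-time, so they scale by alpha / alpha = 1. The concrete
# costs below are illustrative dyadic rationals.

from fractions import Fraction

i = 4
alpha = 2 ** i

edge_costs = [Fraction(3, 2 ** i), Fraction(5, 2 ** i), Fraction(1, 2 ** i)]
holding_rates = [Fraction(0), Fraction(1)]
horizon = Fraction(1)                     # game played over t in [0, 1]

scaled_costs = [c * alpha for c in edge_costs]
scaled_rates = [r * alpha / alpha for r in holding_rates]
scaled_horizon = horizon * alpha

assert all(c.denominator == 1 for c in scaled_costs)   # integer costs
assert scaled_rates == [0, 1]                          # rates unchanged
assert scaled_horizon == 2 ** i                        # time now in [0, 2^i]
```

Exact rationals (rather than floats) are used so that the integrality check on the scaled costs is meaningful.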