Cops and Robbers, Game Theory and Zermelo's Early Results
arXiv [cs.DM]
ATHANASIOS KEHAGIAS AND GEORGIOS KONSTANTINIDIS
Abstract.
We provide a game theoretic framework for the game of cops and robbers (CR). Within this framework we study certain assumptions which underlie the concepts of optimal strategies and capture time. We also point out a connection of these concepts to early work by Zermelo and D. König. Finally, we discuss the relationship of CR (and related pursuit games) to reachability games.

1. Introduction
In this note we study the game of cops and robbers (CR), first introduced in [11, 12]. Our goals are the following.

(1) We examine the concepts of optimal strategies and capture time from a game theoretic point of view. While this is a natural formulation, it is generally not used in the CR literature. Notable exceptions are [7, 5, 6] but, in our opinion, these papers overlook some important details. We must stress that we consider [7, 5, 6] extremely valuable contributions, which introduce novel concepts and reach correct conclusions. However, we believe that some issues underlying these conclusions have not been analyzed completely.

(2) Similar issues have been discussed in two early papers by Zermelo [14] and D. König [9], in the context of chess. This brings us to our second goal: bring to the attention of CR researchers the connection between Zermelo (an early game theorist), D. König (an early graph theorist) and CR (a natural meeting point between graph and game theory).

(3) Our final goal is to present a connection between CR (and related pursuit games) and reachability games [3, 10]. This connection has been, to some degree, anticipated in the CR literature but never explicitly noted.

We assume the reader is familiar with CR as described in [11, 1]. We follow the notation and terminology of [7]. We will assume the CR game is played by one cop and one robber on a cop-win graph $G = (V, E)$ (extension to robber-win graphs and multi-cop games is straightforward but omitted, in the interest of brevity). We denote the cop's (resp. robber's) move at the $t$-th round by $x_t$ (resp. $y_t$).

2. Game Theoretic Analysis of CR
Discussion of (time) optimal strategies has appeared relatively recently in the CR literature. Two examples are [7, 5] and we will take these as representative of the approach prevalent in the CR literature. We will also examine a more recent discussion, which appears in [6].

While the concept of strategy is central in the CR literature, it is usually introduced informally, without providing a precise definition. Regarding optimality, in [7] a strategy is called “optimal for the cop if no other strategy gives a win in fewer moves” and a strategy is “optimal for the robber [...] if no other strategy forces a longer game”. This definition is also rather informal.

(Date: August 21, 2018. The authors thank G. Hahn for very useful discussion and comments.)

We will provide more rigorous definitions of “strategy” and “optimality” in game theoretic terms. To this end we introduce the following definitions.

(1) A game position is a triple $(x, y, p)$, where $x$ (resp. $y$) is the current cop (resp. robber) position and $p$ is the player whose turn it is to play.

(2) A history is a sequence of cop and robber moves. A finite history has the form $x_0 y_0 x_1 y_1 \ldots x_t$ or $x_0 y_0 x_1 y_1 \ldots y_t$; an infinite history has the form $x_0 y_0 x_1 y_1 \ldots$.

(3) A legal cop (resp. robber) strategy is a function $s_C$ (resp. $s_R$) which maps finite histories to legal next cop (resp. robber) moves: $x_{t+1} = s_C(x_0 y_0 \ldots x_t y_t) \in N[x_t]$ (resp. $y_{t+1} = s_R(x_0 y_0 \ldots y_t x_{t+1}) \in N[y_t]$). A cop strategy $s_C$ also provides a cop move $x_0 = s_C(\emptyset)$ at the beginning of the game, when presented with the empty history $\emptyset$.

(4) A memoryless (legal) cop strategy is one which only depends on the current cop and robber position; in other words: $\forall x_0 y_0 \ldots x_t y_t : s_C(x_0 y_0 \ldots x_t y_t) = \sigma_C(x_t y_t)$. Similarly, a memoryless (legal) robber strategy satisfies: $\forall x_0 y_0 \ldots y_t x_{t+1} : s_R(x_0 y_0 \ldots y_t x_{t+1}) = \sigma_R(y_t x_{t+1})$.

(5) A play is a history $h$ which either terminates with a capture (i.e., $h = x_0 y_0 x_1 \ldots y_{t-1} x_t$ with $y_{t-1} = x_t$, or $h = x_0 y_0 x_1 \ldots x_t y_t$ with $x_t = y_t$) or, if no capture takes place, continues for an infinite number of rounds (i.e., $h = x_0 y_0 x_1 \ldots$ with $y_{t-1} \neq x_t$ and $x_t \neq y_t$ for all $t$). Since a pair $(s_C, s_R)$ fully determines the corresponding play $h$, we will denote the length of the play by $T(s_C, s_R)$.

(6) When the cop uses $s_C$ and the robber uses $s_R$, the robber's (resp. cop's) payoff is $T(s_C, s_R)$ (resp. $-T(s_C, s_R)$). CR with this payoff is a two-person, zero-sum game [8].

We always have $\sup_{s_R} \inf_{s_C} T(s_C, s_R) \leq \inf_{s_C} \sup_{s_R} T(s_C, s_R)$. If we actually have

(1)  $\sup_{s_R} \inf_{s_C} T(s_C, s_R) = v = \inf_{s_C} \sup_{s_R} T(s_C, s_R)$,

then we say that $v$ is the value of the game. Suppose (1) holds and there exists a cop strategy $s_C^*$ (resp. a robber strategy $s_R^*$) such that

(2)  $\forall s_R : T(s_C^*, s_R) \leq v$  (resp. $\forall s_C : T(s_C, s_R^*) \geq v$),

then we say that $s_C^*$ (resp. $s_R^*$) is an optimal strategy. If optimal strategies $(s_C^*, s_R^*)$ exist, then we also have [8]

(3)  $T(s_C^*, s_R^*) = \sup_{s_R} \inf_{s_C} T(s_C, s_R) = \inf_{s_C} \sup_{s_R} T(s_C, s_R)$.

(In the CR context, $T(s_C^*, s_R^*)$ is commonly called capture time [4].) It is well known [8] that (3) holds for all finite games (i.e., they have both a value and optimal strategies). But CR is an infinite game (it may last an infinite number of rounds and, consequently, there is also an infinite number of strategies), hence (3) must be proved. To this end, it suffices to prove the following.

Lemma 2.1. If $G$ is cop-win, there exists a number $T_G$ and a cop strategy $s_C$ such that $\sup_{s_R} T(s_C, s_R) \leq T_G$.
In other words: if the cop can effect capture in a finite number of rounds, he can do so in a bounded number of rounds, and the bound is independent of the robber strategy. Some readers may consider this obvious, but we will soon argue that proving Lemma 2.1 is not trivial. At any rate, using the lemma, we can prove the following.
Theorem 2.2. For every cop-win graph $G$ there exist memoryless cop and robber strategies $\sigma_C^*$, $\sigma_R^*$ such that the CR game played on $G$ has value $T(\sigma_C^*, \sigma_R^*)$ (which we call capture time).

The proof of Theorem 2.2 is straightforward and hence omitted. The basic idea is that, since the cop strategy $s_C$ of Lemma 2.1 effects capture in at most $T_G$ rounds for every robber strategy, it suffices to examine the “truncated” version of CR in which the robber wins (and receives infinite payoff) if he can avoid capture for $T_G$ rounds. Since the truncated game is finite, it has a value and both cop and robber have memoryless optimal strategies.

Furthermore, Theorem 2.2 can be generalized as follows: when $K$ cops and one robber play on a graph $G$ with cop number $c(G) \geq 1$, optimal memoryless cop and robber strategies exist for every $K \geq 1$ (if $K < c(G)$ then capture time is infinite and every cop strategy is optimal). Once again, details are omitted.

In the definitions leading to Lemma 2.1 and Theorem 2.2 we have followed a “standard” game theoretic approach; this, as already noted, is generally not used in the CR literature. A notable exception is the formulation presented in [6], which is very similar to the one we have used, with one important difference. Namely, the definition of “strategy” used in [6] is the same as our definition of “memoryless strategy”; in other words, it appears that the authors of [6] only consider memoryless strategies. We will argue in Section 4 that this choice is based on an important implicit assumption.

3. Boundedness of Capture Time
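The boundedness asserted in Lemma 2.1 can be checked computationally on small graphs. The following is a minimal sketch, not the algorithm of [7, 5]: it labels every position $(x, y, p)$ with the number of remaining moves under optimal play by iterating the minimax recursion to a fixed point. The function names, the adjacency-list encoding, and the choice to count capture time as the number of cop moves after both players are placed are all assumptions made for illustration.

```python
from itertools import product
from math import inf


def closed_neighbourhood(adj, v):
    """N[v]: the vertex itself plus its neighbours (a player may stay put)."""
    return [v, *adj[v]]


def moves_to_capture(adj):
    """Map each position (cop, robber, turn) to the number of remaining moves
    under optimal play, or inf if the robber can evade forever."""
    vertices = list(adj)
    value = {(c, r, p): (0 if c == r else inf)
             for c, r, p in product(vertices, vertices, "CR")}
    changed = True
    while changed:                      # iterate until nothing improves
        changed = False
        for (c, r, p) in list(value):
            if c == r:
                continue                # capture has already happened
            if p == "C":                # the cop minimises
                new = 1 + min(value[(c2, r, "R")]
                              for c2 in closed_neighbourhood(adj, c))
            else:                       # the robber maximises
                new = 1 + max(value[(c, r2, "C")]
                              for r2 in closed_neighbourhood(adj, r))
            if new < value[(c, r, p)]:
                value[(c, r, p)] = new
                changed = True
    return value


def capture_time(adj):
    """Cop moves needed after both players have chosen start vertices
    (the cop chooses first, the robber answers); inf on robber-win graphs."""
    value = moves_to_capture(adj)
    vertices = list(adj)
    plies = min(max(value[(c, r, "C")] for r in vertices) for c in vertices)
    return inf if plies == inf else (plies + 1) // 2


# Hypothetical example graphs: a path on 5 vertices and a 4-cycle.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

print(capture_time(path5))    # finite, since the path is cop-win
print(capture_time(cycle4))   # inf, since the 4-cycle is robber-win
```

On a cop-win graph every label is finite, so the largest label is a bound of the kind Lemma 2.1 asks for. The sketch only illustrates the statement; it does not replace the proof discussed in this section.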
Apparently Lemma 2.1 has been considered so obvious that it has been used without proof. In fact, the following stronger assumption has been used [5]:

(4) “If the cop has a winning strategy he can play so that no state of the game is repeated.”

“Game state” is another term for “game position”. Lemma 2.1 follows immediately from (4): since the number of possible positions (excluding those of the 0-th round) is $2|V|^2$, if the cop can win without repeating any game position, then he can win in at most $|V|^2$ rounds.

We will now argue that (4) is correct but its proof, while short, is not trivial.

It turns out that an analog of (4) has been claimed by Zermelo for the game of chess. Both chess and CR are two-person, zero-sum games of perfect information, in which the players move alternately; furthermore, the number of positions is finite in both games; finally, Zermelo studied a version of chess in which the “draw on three repeated moves” rule is not applied, hence the game can last an infinite number of rounds. Because of the close analogy between CR and chess, Zermelo's original statements and later revisions are highly relevant to our discussion.

All the passages quoted in the rest of this section come from the excellent paper [13], which (among other things) discusses Zermelo's [14] and D. König's [9].

In 1913 Zermelo wrote [14], in which he studies (in the context of chess) the following question: “given that a player is in ‘a winning position’, how long does it take for White to force a win?”. His answer is the following (the similarity of the following passage to (4) is obvious).

“Zermelo claimed that it will never take more moves than there are positions in the game. His proof is by contradiction: Assume that White can win in a number of moves greater than the number of positions. Of course, at least one winning position must have appeared twice. So White could have played at the first occurrence in the same way he does at the second and thus could have won in fewer moves than there are positions.” [13]

This claim was challenged in 1927 by D. König [9].
After proving a very general theorem, he used it to (among other things) prove Zermelo's claim. But König also argued in [9] that Zermelo's original proof is incomplete because

“Zermelo had argued that White could [change] his behavior at the first occurrence of any repeated winning position and thus win without repetition.” [13]

The flaw of this argument, according to König, is that

“Zermelo implicitly assumes that Black would never change his behavior at any reoccurrence of a winning position. He only considered the special case of unchanging behavior on Black's part. What he needed to show was that his claim is true for all possible moves by Black.” [13]

König communicated his results to Zermelo, who accepted the criticism and provided a new, correct proof that the number of moves necessary to win is less than the number of positions.

“Zermelo's proof uses the nontrivial result that the number of moves necessary to force a win is bounded” [13];

this is the analog of Lemma 2.1. Zermelo's proof is included in [9]. The following English translation originally appeared in [13].

“Let $p_0$ be a position in which White, having to move, can force a checkmate however not in a bounded number of moves but, depending on the play of the opponent, in a possibly unbounded increasing number of moves. Then for every move by White, Black can bring about a position $p_1$ which has the same property. Otherwise, White could achieve his goal with a bounded number of moves starting from $p_0$, as the number of possible moves is finite. Consequently, and independently of White's play, if the opponent plays correctly, an unbounded sequence of positions $p_0, p_1, p_2, \ldots$ which all have the property [of] $p_0$ will emerge, i.e. which will never lead to a checkmate. Thus, if from a position $p_0$ a win can be forced at all, then it can be forced in a bounded number of moves.” [13]

Clearly, the above argument can also be applied (for the case of CR) to prove Lemma 2.1.

4. Memoryless Strategies
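The distinction between general and memoryless strategies (definitions (3) and (4) of Section 2) can be made concrete with a small sketch. Below, a strategy is any function of the history $x_0 y_0 x_1 y_1 \ldots$, and a memoryless rule is lifted to such a function by looking only at the last two entries. The path graph, the particular rules `sigma_cop` and `sigma_rob`, the initial placements, and the convention that $T$ is the index of the round in which capture occurs are all assumptions chosen for illustration; legality of moves is not checked.

```python
N = 5  # hypothetical example graph: a path with vertices 0, 1, ..., N-1


def sigma_cop(x, y):
    """Memoryless cop rule: step one vertex towards the robber."""
    return x + (y > x) - (y < x)


def sigma_rob(y, x):
    """Memoryless robber rule: step away from the cop, staying on the path."""
    return min(y + 1, N - 1) if y > x else max(y - 1, 0)


def lift_cop(sigma, start):
    """Strategy on histories x0 y0 ... xt yt that only inspects (xt, yt)."""
    return lambda h: start if not h else sigma(h[-2], h[-1])


def lift_rob(sigma, place):
    """Strategy on histories x0 y0 ... yt x(t+1) that only inspects the last pair."""
    return lambda h: place(h[0]) if len(h) == 1 else sigma(h[-2], h[-1])


def far_end(x0):
    """Initial robber placement: the end of the path farthest from the cop."""
    return N - 1 if x0 <= (N - 1) / 2 else 0


def s_rob_memory(h):
    """A robber strategy WITH memory: it stays put whenever len(h) % 4 == 1."""
    if len(h) == 1:
        return far_end(h[0])
    return h[-2] if len(h) % 4 == 1 else sigma_rob(h[-2], h[-1])


def play(s_cop, s_rob, max_rounds=50):
    """Generate the play determined by (s_cop, s_rob).

    Returns (T, history); T is the index of the round in which capture
    occurs, or None if there is no capture within max_rounds rounds.
    """
    h = []
    for t in range(max_rounds + 1):
        h.append(s_cop(tuple(h)))        # cop move x_t
        if t > 0 and h[-1] == h[-2]:     # the cop stepped onto the robber
            return t, h
        h.append(s_rob(tuple(h)))        # robber move y_t
        if h[-1] == h[-2]:               # the robber stepped onto the cop
            return t, h
    return None, h


s_cop = lift_cop(sigma_cop, start=N // 2)
s_rob = lift_rob(sigma_rob, place=far_end)
T, h = play(s_cop, s_rob)
pairs = list(zip(h[0::2], h[1::2]))      # (x_t, y_t) after each completed round
print(T, h, len(pairs) == len(set(pairs)))
print(play(s_cop, s_rob_memory)[0])
```

The question examined in this section is whether restricting both players to functions of the form produced by `lift_cop` and `lift_rob` changes the value of the game; simulating a few plays cannot settle this, since the sup and inf range over all strategies.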
We conjecture that Assumption (4) (and hence Lemma 2.1) appears self-evident because of an additional implicit assumption:

(5) “in the CR game neither player loses anything by using memoryless strategies”.

Assuming (5), we can prove that no game position is repeated (and hence capture time is bounded) by contradiction, as follows. If the cop uses a winning memoryless strategy and a position is repeated at rounds $t_1$ and $t_2$, then the robber can repeat his moves between $t_1$ and $t_2 - 1$; then at rounds $t_3 = t_2 + (t_2 - t_1)$, $t_4 = t_2 + 2(t_2 - t_1)$, ... the same position will be reached and the game will continue ad infinitum, which contradicts the assumption that the cop was using a winning strategy. But can we consider (5) self-evident? Informally it can be restated as follows:

(6) “remembering how a CR position was reached gives no advantage to either player”

and this seems quite reasonable. But in game theoretic terms, (5) states that there exist memoryless strategies $\sigma_C^*$, $\sigma_R^*$ such that

(7)  $T(\sigma_C^*, \sigma_R^*) = \sup_{s_R} \inf_{s_C} T(s_C, s_R) = \inf_{s_C} \sup_{s_R} T(s_C, s_R)$,

where the inf and sup are taken over all strategies (not just memoryless ones). We believe that (7) is far less obvious than either (5) or (6) and requires proof. The fact that neither (5) nor (6) was invoked by the authors of [9, 13, 14] (when studying the same issues for the game of chess) supports our position.

Footnote: Our emphasis.

It appears that (5) is taken for granted in [7, 5, 6]. For example, it is stated in [7] that “the situation of the game can be described simply by saying where each player is and whose turn it is to move”, i.e., by the triple $(x, y, p)$; this is reflected in the fact that in [5] $(x, y, p)$ is called the state of the game. Also, in both [7, 5] the algorithm used to determine optimal strategies considers only memoryless strategies, implying that nothing is lost by ignoring strategies with memory, i.e., (5) is assumed implicitly. Finally, the definition of “strategy” used in [6] encompasses only memoryless strategies, which again suggests that the authors take (5) for granted.

Our own point of view is different. We believe that (5) must be proved, rather than assumed. The proof is easy but not trivial. Namely, (5) is a corollary of Theorem 2.2, which depends on Lemma 2.1; as already mentioned, this lemma can be proved by the previously mentioned argument from [9].

5. Reachability Games
We can use another route, and establish (5) through well known results about reachability games [10]. This requires formulating CR as a reachability game; the connection is interesting and, as far as we know, has not been noted previously (despite the fact that the reachability formulation is almost identical to the one used in [7, 5]).

A reachability game [10] is played by two players (Player 0 and Player 1) on a digraph $D = (\overline{V}, \overline{E})$. Each move consists in sliding a token from one digraph node to another, along an edge; the $i$-th player slides the token if and only if it is currently located on a node $v \in V_i$ ($i \in \{0, 1\}$), where $V_0 \cup V_1 = \overline{V}$, $V_0 \cap V_1 = \emptyset$. Player 0 wins if and only if the token goes into a node $u \in F$; otherwise Player 1 wins. The game is fully described by the tuple $(V_0, V_1, \overline{E}, F)$. The following is well known [3, 10].

Theorem 5.1. Let $(V_0, V_1, \overline{E}, F)$ be a reachability game on the digraph $D = (\overline{V}, \overline{E})$. Then $\overline{V}$ can be partitioned into two sets $W_0$ and $W_1$ such that (for $i \in \{0, 1\}$) player $i$ has a memoryless strategy $\sigma_i$ which is winning whenever the game starts in $u \in W_i$.

CR can be converted to a reachability game. Essentially, this has been done in [7, 5] (even though the authors appear to not be aware of the connection to reachability games) using the move digraph $M_G$. Every node of $M_G$ corresponds to a position $(x, y, p)$, where $x$ (resp. $y$) is the cop (resp. robber) position in the original $G$ and $p$ is the player whose turn it is to play; $(u, v)$ is a directed edge of $M_G$ iff it is possible to get from $u$ to $v$ by a single move. The $M_G$ thus constructed can be used to play any modified CR game with prespecified initial player positions $x_0$ and $y_0$ and starting player $p$. For the “classic” CR game (starting on an “empty board”) $M_G$ must be expanded to $\overline{M}_G$ by adding:

(1) one node of the form $(\emptyset, \emptyset, C)$ (it corresponds to the beginning of the game, just before the cop is placed in the graph);

(2) $|V|$ nodes of the form $(x, \emptyset, C)$ with $x \in V$ (they correspond to the middle of the 0-th round, when the cop has been placed but not the robber);

(3) the edges which correspond to legal moves between the nodes of $\overline{M}_G$.

Let $\overline{M}_G = (\overline{V}, \overline{E})$ and consider the reachability game $(V_0, V_1, \overline{E}, F)$ where $V_0$ (resp. $V_1$) contains the nodes of the form $(x, y, C)$ (resp. $(x, y, R)$) and $F$ (the cop's target set) contains the nodes of the form $(x, x, p)$, i.e., positions of the CR game in which the cop and robber are on the same node of the original $G$. This reachability game subsumes both the classic and the (previously mentioned) modified CR games.

The graph $G$ is cop-win iff $(\emptyset, \emptyset, C)$ belongs to $W_0$; in this case, by Theorem 5.1, the cop has a memoryless winning strategy for the classic CR game. To establish that he has a time optimal memoryless strategy, we use the fact that the memoryless winning strategy yields bounded capture time; hence, by the arguments of Section 2, both cop and robber have memoryless time optimal strategies. The situation is similar when $G$ is robber-win. In this case, $(\emptyset, \emptyset, C) \in W_1$ and, by Theorem 5.1, the robber has a memoryless winning strategy $\sigma_R^*$; since the capture time is infinite, any winning robber strategy is also time optimal. When the robber uses $\sigma_R^*$, any cop strategy results in infinite capture time; hence any memoryless strategy $\sigma_C$ is time optimal. These ideas can be extended to any graph $G$ and any number $K$ of cops, provided the move digraph $M_G^{(K)} = (V^{(K)}, E^{(K)})$ is constructed accordingly.
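The reduction just described can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not the construction of [7, 5]: `attractor` is the standard backward fixed-point computation of Player 0's winning set in a reachability game, together with a memoryless strategy on that set, and `cr_game` builds an extended move digraph for one cop and one robber. All function names are hypothetical; `None` stands for $\emptyset$; and the intermediate placement nodes are labelled with `"R"` here, because in this sketch the third coordinate records who moves next.

```python
def attractor(nodes0, nodes1, edges, target):
    """Return (W0, sigma0): Player 0's winning set and a memoryless
    strategy (a dict node -> successor) that wins from every node of W0."""
    succ = {v: set() for v in nodes0 | nodes1}
    for u, v in edges:
        succ[u].add(v)
    win, sigma0 = set(target), {}
    changed = True
    while changed:
        changed = False
        for v in (nodes0 | nodes1) - win:
            if v in nodes0:
                # Player 0 needs one edge into the current winning set.
                good = [w for w in succ[v] if w in win]
                if good:
                    sigma0[v] = good[0]
                    win.add(v)
                    changed = True
            elif succ[v] and succ[v] <= win:
                # Player 1 cannot avoid the current winning set.
                win.add(v)
                changed = True
    return win, sigma0


START = (None, None, "C")  # empty board, cop about to be placed


def cr_game(adj):
    """Reachability game (V0, V1, E, F) for one cop and one robber on adj."""
    V = list(adj)
    closed = {v: [v, *adj[v]] for v in V}          # closed neighbourhoods N[v]
    nodes0 = {START} | {(x, y, "C") for x in V for y in V}
    nodes1 = ({(x, None, "R") for x in V}
              | {(x, y, "R") for x in V for y in V})
    edges = [(START, (x, None, "R")) for x in V]   # cop placement
    edges += [((x, None, "R"), (x, y, "C")) for x in V for y in V]  # robber placement
    edges += [((x, y, "C"), (x2, y, "R"))
              for x in V for y in V for x2 in closed[x]]            # cop moves
    edges += [((x, y, "R"), (x, y2, "C"))
              for x in V for y in V for y2 in closed[y]]            # robber moves
    target = {(x, x, p) for x in V for p in "CR"}  # cop and robber coincide
    return nodes0, nodes1, edges, target


def is_cop_win(adj):
    """G is cop-win iff the empty-board node lies in the cop's winning set."""
    win, _ = attractor(*cr_game(adj))
    return START in win


# Hypothetical example graphs: a path on 4 vertices and a 4-cycle.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(is_cop_win(path4), is_cop_win(cycle4))
```

Because `sigma0` always points to a node that entered the winning set earlier, following it reaches the target in a bounded number of moves, which is the boundedness property used in the argument above.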
The cop number $c(G)$ is the smallest $K$ such that $(\emptyset, \emptyset, C) \in W_0^{(K)}$, where $W_0^{(K)}$ denotes the cops' winning set in the reachability game played on $M_G^{(K)}$.

We have already mentioned that the move digraph formulation of CR has appeared in [7, 5]. It has also been used in the early paper [2] to study winning strategies (but not time optimality); the authors of [2] appear unaware of the connection to reachability games.

References
1. M. Aigner and M. Fromme, A game of cops and robbers, Discrete Applied Math. (1984), 1–12.
2. Alessandro Berarducci and Benedetto Intrigila, On the cop number of a graph, Advances in Applied Mathematics (1993), no. 4, 389–403.
3. D. Berwanger, Graph games with perfect information.
4. Anthony Bonato, Petr Golovach, Gena Hahn, and Jan Kratochvíl, The capture time of a graph, Discrete Mathematics (2009), no. 18, 5588–5595.
5. A.Y. Bonato and G. MacGillivray, A general framework for discrete-time pursuit games.
6. M. Boyer et al., Cops-and-robbers: remarks and problems, Journal of Combinatorial Mathematics and Combinatorial Computing (2013).
7. G. Hahn and G. MacGillivray, A note on k-cop, l-robber games on graphs, Discrete Mathematics (2006), no. 19-20, 2492–2497.
8. Samuel Karlin, Mathematical methods and theory in games, programming, and economics, Dover, 2003.
9. Dénes König, Über eine Schlussweise aus dem Endlichen ins Unendliche, Acta Litt. Ac. Sci. Hung. Fran. Joseph (1927), 121–130.
10. René Mazala, Infinite games, Automata logics, and infinite games (2002), 197–204.
11. R. Nowakowski and P. Winkler, Vertex-to-vertex pursuit in a graph, Discrete Math. (1983), 235–239.
12. A. Quilliot, Jeux et pointes fixes sur les graphes, these de 3eme cycle, Ph.D. thesis, Universite de Paris VI, 1985, pp. 131–145.
13. Ulrich Schwalbe and Paul Walker, Zermelo and the early history of game theory, Games and Economic Behavior (2001), no. 1, 123–137.
14. Ernst Zermelo, Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels.