General Cops and Robbers Games with randomness
Frédéric Simard^a, Josée Desharnais^b, François Laviolette^b

a School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
b Department of Computer Science and Software Engineering, Université Laval, Québec, QC, Canada
Abstract
Cops and Robbers games have been studied for the last few decades in computer science and mathematics. As in general pursuit-evasion games, pursuers (cops) seek to capture evaders (robbers); however, players move in turn, are constrained to move on a discrete structure, usually a graph, and know the exact location of their opponent. In 2017, Bonato and MacGillivray [2] presented a general characterization of Cops and Robbers games in order for them to be globally studied. However, their model does not cover cases where stochastic events may occur, such as the robbers moving in a random fashion. In this paper we present a novel model with stochastic elements that we call a Generalized Probabilistic Cops and Robbers game (GPCR). A typical such game is one where the robber moves according to a probability distribution, either because she is lost or drunk rather than evading, or because she is a robot. We present results for solving GPCR games, thus enabling one to study properties of the optimal strategies in large classes of Cops and Robbers games. Some classic Cops and Robbers game properties are also extended.
Keywords:
Cops and Robbers games, pursuit games, optimal strategies, graph theory, stochastic games
Contents

1 Introduction
2 An abstract Cops and Robbers game
Email addresses: [email protected] (Frédéric Simard), [email protected] (Josée Desharnais), [email protected] (François Laviolette)
Preprint submitted to Theoretical Computer Science, April 27, 2020 (arXiv [cs.DM]).

1. Introduction

Cops and Robbers games have been studied as examples of discrete-time pursuit games on graphs since the publication of Quilliot's doctoral thesis [29] in 1978 and, independently, Nowakowski and Winkler's article [26] in 1983. Both monographs describe a turn-based game in which a lone cop pursues a robber on the vertices of a graph. The game evolves in discrete time and with perfect information. The cop wins if he eventually shares the same vertex as the robber; otherwise, if the play continues indefinitely, the latter wins. A given graph is copwin if the cop has a winning strategy: for any possible move the robber makes, the cop has an answer that leads him to eventually catch the robber (in finite time). As there is no tie, it is always true that one player has a (deterministic) winning strategy.

Since the first exposition of the game of Cop and Robber, many variants have emerged. Aigner and Fromme [1] notably presented in 1984 the cop number: the minimal number of cops required on a graph to capture a robber. Since then, more alternatives have been described, each one modifying one or more game parameters, such as the speed of the players, the radius of capture of the cops, etc. We refer to Bonato and Nowakowski's book [4] for a comprehensive description of these different formulations. The survey on guaranteed graph searching problems by Fomin and Thilikos [11] is also a great reference on the subject. In graph searching games, the objective is to capture a fugitive on a graph.
The problems in which the object is always found are called guaranteed.

In 2017, Bonato and MacGillivray [2] presented a first generalization of Cops and Robbers games that encompasses the majority of the variants described previously. Indeed, all two-player, turn-based, discrete-time pursuit games of perfect information on graphs in which both players play optimally are contained in Bonato and MacGillivray's model. As such, this model encompasses all pursuit games deemed combinatorial (we refer to Conway's book
On Numbers and Games [9] for an introduction on the subject of combinatorial games). Those games include the set of turn-based, perfect-information games played on a discrete structure without any randomness.

Recently, some researchers such as Prałat and Kehagias [19], Komarov and Winkler [22] and Simard et al. [31] described a game, called the Cop and Drunk Robber game, in which the robber walks in a random fashion: each of her movements is described by a uniform random walk on the vertices of the graph. In general, this strategy is suboptimal. Since this particular game cannot be described by Bonato and MacGillivray's model, it appears natural to seek to extend their framework to integrate games with random events.

There has also been a recent push towards more game-theoretic approaches to modeling Cops and Robbers games, notably by Konstantinidis, Kehagias and others (see for example [23, 18, 16, 17, 24]). Our paper can be considered more in line with this way of treating Cops and Robbers games than with more traditional approaches.

This paper thus presents a model of Cops and Robbers games that is more general than that of Bonato and MacGillivray. The main objective of this model is to incorporate games such as the Cop and Drunk Robber game. The probabilistic nature of this game leads us to define a framework different from the one of Bonato and MacGillivray.

In Cops and Robbers games, one is generally interested in the question of solving a game. This question is universal to game theory, where one defines a solution concept such as the
Nash equilibrium. In Cops and Robbers games, oftentimes the cops' point of view is adopted and one seeks to determine whether it is feasible, and if so how, for them to capture the robbers. In stochastic Cops and Robbers games, one can generalize the question to a quantitative scale of success: what is the (best) probability for the cops to capture the robbers, and which strategy achieves it? One can also ask the dual question of what would be the minimal number of cops required in order to capture the robbers with some probability. In deterministic games, this graph parameter is known as the cop number.

One can note that many solutions of Cops and Robbers games share the same structure, and this is reflected in the fact that they can be solved with a recursive expression. Indeed, Nowakowski and Winkler [26] in 1983 presented a preorder relation on vertices, writing x ⪯_n y when the cop has a winning strategy in at most n moves if positioned on vertex y, while the robber is on vertex x. An important aspect of this relation ⪯_n is that it can be computed recursively and thus leads to a polynomial-time algorithm to compute its values, as well as the strategy of the cop. This relation was extended years later by Hahn and MacGillivray [15] in order to solve games of k cops by letting players move on the graph's strong product. Clarke and MacGillivray [7] have also defined a characterization of k-cop-win graphs through a dismantling strategy and studied the algorithmic complexity of the problem. For a fixed k, the problem can be resolved in polynomial time with degree k + 2. On a related note, Kinnersley [20] proved that it is EXPTIME-complete to determine whether the cop number of a graph G is less than some integer k when both G and k are part of the input.
This shows that Clarke and MacGillivray's result is somehow optimal.

In games with stochastic components, such order relations can be generalized by considering the probability of capture, as is done in a recent paper about the Optimal Search Path (OSP) problem [31]. A recursion w_n(x, y) is defined: it represents the probability that a cop standing on vertex y captures the robber, positioned on vertex x, in at most n steps. This relation, defined on the Cop and Drunk Robber game [19, 22, 31], is analogous to Nowakowski and Winkler's x ⪯_n y and is slightly more general, as it enables one to model the robber's random movement. One can wonder up to what point the relation w_n can be extended while preserving its polynomial nature. Theorem 2.13 and Proposition 2.24 give an answer to this question.

This paper is divided as follows. Section 2 presents our model of Cops and Robbers games and the w_n recursion, along with some complexity results, notably on w_n. Stationarity results on w_n are also included. Since most Cops and Robbers games are played on graphs, another formulation of our model is presented on such a structure in Section 3. We conclude in Section 4.
2. An abstract Cops and Robbers game
We now present a general model of Probabilistic Cops and Robbers games; it is played with perfect information, is turn-based starting with the cops, and takes place on a discrete structure. From each state/configuration of the game, after choosing their actions, the cops and robbers jump to a state according to their transition matrices, denoted T_rob and T_cop. These matrices may encode probabilistic behaviours: T_cop(s, a, s′) is interpreted as the probability that the cop, starting in s and playing action a, will arrive in s′.

Definition 2.1. A Generalized Probabilistic Cops and Robbers game (GPCR) is played by two players, the cop team and the robber team. It is given by the following tuple

G = (S, i, F, A, T_cop, T_rob),   (1)

satisfying:

(Footnote: the notation T(s, a, s′) refers to a transition matrix view; it corresponds to annotating the edge [s, s′] of the transition system with an action a and a positive value, the probability. In the Markov Decision Processes (MDP) community, it is also written T_a(s, s′) or P(s′ | s, a).)

• S = S_cop × S_rob × S_o is the non-empty finite set of states representing the possible configurations of the game. The sets S_cop and S_rob hold the possible cops and robbers positions, while S_o may contain other relevant information (like whose turn it is).

• i ∈ S is the initial state.

• F ⊆ S is the set of final (winning) states for the cops.

• A = A_cop ∪ A_rob, with A_cop and A_rob the non-empty, finite sets of actions of the cops and robbers, respectively.

• T_cop : S × A_cop × S → [0, 1] is a transition function for the cops, that is, Σ_{s′ ∈ S} T_cop(s, a, s′) ∈ {0, 1} for all s ∈ S and a ∈ A_cop. When the sum is 1, we say that a is playable in s, and we write A_cop(s) for the set of playable actions for the cops at state s ∈ S.
Furthermore, T_cop also satisfies:

• for all s ∈ S, A_cop(s) ≠ ∅;

• if s ∈ F, then T_cop(s, a, s) = 1 for all actions a ∈ A_cop; hence T_cop(s, a, s′) = 0 for all s′ ≠ s.

• T_rob is a transition function for the robbers, similar to T_cop. A_rob(s) is the set of playable actions by the robbers in state s ∈ S.

A play of G is an infinite sequence i a₀ s₁ a₁ s₂ a₂ ··· ∈ (S A_cop S A_rob)^ω of states and playable actions of G that alternates the moves of cop and rob. It thus satisfies T_cop(s_j, a_j, s_{j+1}) > 0 for j = 0, 2, 4, ... and T_rob(s_j, a_j, s_{j+1}) > 0 for j = 1, 3, 5, ..., where s₀ = i. The cops win whenever a final state s ∈ F is encountered; otherwise the robbers win. A turn is a subsequence of two moves, starting from cop. We also consider finite plays, and we write G_n for the game where plays are finite with n (complete) turns.

An equivalent, and sometimes handier, formulation for T_cop is to define T_cop(s, a) as a distribution on S, for an action a playable in s. The correspondence is T_cop(s, a)(X) = Σ_{s′ ∈ X} T_cop(s, a, s′) for X ⊆ S. For example, the second condition above could have been stated as T_cop(s, a) = δ_s, where δ_s is the Dirac distribution on an element s, that is, δ_s has value 1 on {s} and is 0 elsewhere.

A play progresses as follows: from a state s, the cops choose an action a_cop ∈ A_cop(s), which results in a new state s′, randomly chosen according to the distribution T_cop(s, a_cop); then the robbers play an action a_rob ∈ A_rob(s′), which results in the next state s″, drawn with probability T_rob(s′, a_rob, s″). Once a final state is reached, the players are forced to stay in the same state. Notice that one could record whose turn it is in the third component of the states: S_o = {cop, rob}.
However, this doubles the state set and complexifies the definition of the transition function. In most games, it is more intuitive to define the rules for movement independently of when this transition will be taken, as in chess.

We sometimes use the notation s_x, for x ∈ {cop, rob, o}, to denote the projection of a state s ∈ S on the set S_x. The set S_o is rarely used in the current section, but will be valuable further on, such as in Example 3.5 on dynamic graphs whose structures vary with time.

In what follows, we write Dist B for the set of discrete distributions on a set B and U_B ∈ Dist B for the discrete uniform distribution on the same set.

Most of the example games we will describe will be between a single cop and a single robber, even if the definition specifies a cop team and a robber team. The usual way of presenting the positions of the cop team is with a single vertex in the strong product of each member's possible territory.

We now describe a few known games, following the structure of Definition 2.1. The first one is a typical, deterministic example of a Cops and Robbers game. We say a game is deterministic when both distributions defined by T_cop and T_rob are concentrated on a single point, in other words if T_cop(s, a) and T_rob(s, a) are Dirac for all s ∈ S and a ∈ A. The reader can safely skip this section.

Example 2.2 (Classic Cop and Robber game).
Let G = (V, E) be a finite graph. In this game, both players play alone and walk on the vertices of the graph, successively choosing their next moves among their neighbourhoods. The final states are those in which both players share a vertex, in which case the cop wins. The tricky part in encoding this game is that in their first move, the cop and the robber can choose whatever vertices they want, so the rule of movement differs at the first move from the rest of the play. So we let i_cop, i_rob ∉ V be two elements that will serve as starting points for the cop and the robber. Because the first moves are chosen in turn, the set of states S below must contain states in V × {i_rob}, which can only be reached after the cop's first move, but before the robber's. To simplify S, we include states that will not be reached, and this will be governed by the transition functions. The different sets are:

i = (i_cop, i_rob)
S = ({i_cop} ∪ V) × ({i_rob} ∪ V)
F = {(x, x) : x ∈ V}
A_cop = V
A_rob = V.

Let (c, r) ∈ S, x ∈ V, and actions c′ ∈ A_cop and r′ ∈ A_rob. We define:

T_cop((c, r), c′, (x, r)) = { 1, if x = c′ and (c = i_cop or c′ ∈ N[c]); 0, otherwise. }

T_rob((c, r), r′, (c, x)) = { 1, if x = r′ and (r = i_rob or r′ ∈ N[r]); 0, otherwise. }

Thus, for a state (c, r) ∈ S \ {i}, the playable action set is A_cop(c, r) = N[c]. Similarly, for the robber we get A_rob(c, r) = N[r]. Because a play starts with the cop, it is not required to specify the condition c ≠ i_cop in the function T_rob. Similarly, it is not necessary to make a special case of the state c = r, since the play ends anyway.

The stochasticity of Definition 2.1 is motivated by the following example, called the Cop and Drunk Robber game. It is rather similar to the one just presented, except that the robber moves randomly on the vertices of the graph.

Example 2.3 (Cop and Drunk Robber game).
From the preceding example, only the robber's transition function T_rob is modified; the rest stays the same. Let (c, r) ∈ S and r′ ∈ A_rob. The robber's transition function is then:

T_rob((c, r), r′) = { δ_(c, r′), if r = i_rob; U_{{c} × N[r]}, otherwise. }

The robber, after the first move, moves uniformly at random on her neighbourhood, which amounts to ignoring her action r′ ∈ A_rob. One could also restrict her actions to a singleton when s ∈ S \ {i}.

In the Cop and Drunk Robber game, the robber moves according to a uniform distribution on her neighbourhood. Varying her transition function could represent various scenarios. For example, the robber's probability of ending on a vertex r′ from vertex r could depend on the distance between r and r′.

In addition to the Cop and Drunk Robber game itself, a recent paper by Simard et al. [31] presented a variant of this game in which the robber can evade capture. The main difference between these games is that the cop may not catch the robber even when standing on the same vertex. This game is presented in the next example.

Example 2.4 (Cop and Drunk Defending Robber).
The game's main structure is again similar to that of Example 2.2, but we need a jail state j* ∉ V to simulate the capture of the robber. The initial state is the same, and we have:

i = (i_cop, i_rob)
S = ({i_cop} ∪ V) × ({i_rob} ∪ V) ∪ {(j*, j*)}
F = {(j*, j*)}.

When the players do not meet, they move on G as before. Yet, when the cop steps on the same vertex v as the robber, there is a probability p(v) that the robber gets captured, where p : V → [0, 1]. For (c, r) ∉ F, the robber's transition function is then:

T_rob((c, r), r′) = { δ_(c, r′), if r = i_rob; U_{{c} × N[r]}, if c ≠ r and r ≠ i_rob; D_r, if c = r and r ≠ i_rob, }

where

D_r(x) = { (1 − p(r)) / |N[r]|, if x ∈ {c} × N[r] and c = r; p(r), if x = (j*, j*). }

When the cop steps on the robber's vertex (c = r) at the end of his turn, the next move of the robber follows the distribution D_r. The robber is caught by the cop with probability p(r), bringing the play to a final state; otherwise she proceeds as expected: the target state is chosen uniformly at random in the robber's neighbourhood. Variations of this game could be defined through different distributions for T_rob((c, r), r′) with c ≠ r. Likewise, in D_r, the uniform factor 1/|N[r]| could be replaced with any distribution on N[r].

We now present the Cop and Fast Robber game with surveillance zone, as first formulated in Marcoux [25]. This example is reconsidered further on in Section 3. Chalopin et al. also studied a game of Cop and Fast Robber with the aim of characterizing graph classes [6].

Example 2.5 (Cop and Fast Robber).
This game is similar to the classic one (Example 2.2), except that the robber is not limited to a single transition. It has been studied by Fomin et al. [10]. We present a variation where the cop can capture the robber when she appears in his watch zone, even in the middle of a path movement. This watch zone can simulate the use of a weapon by the cop. The states will now contain, in addition to both players' positions, the set of vertices watched by the cop. We assume here that the cop's watch zone is his neighbourhood, as in Marcoux [25]; Fomin et al.'s version is retrieved with a watch zone consisting of a single vertex, the cop's position. In the initial state, the cop's watch zone is empty, since the robber cannot be captured before her first step. We again use a jail state j* ∉ V. When both players find themselves there, the game ends and the robber has lost. Hence, we let:

i = (i_cop, ∅, i_rob), with i_cop, i_rob ∉ V,
F = {(j*, ∅, j*)},
S = ({(i_cop, ∅)} ∪ {(c, N[c]) | c ∈ V}) × ({i_rob} ∪ V) ∪ F.

Let (c, C, r) ∈ S be the current state and c′ ∈ N[c] an action of the cop. Here is the cop's transition function, for (c, C, r) ∉ F:

T_cop((c, C, r), c′) = { δ_(c′, N[c′], r), if (c = i_cop and c′ ∈ V) or (c ∈ V and c′ ∈ N[c]); 0, otherwise. }

As in the classic game, the cop can jump to any vertex in his first move; after that he moves in the neighbourhood of his current position. His watch zone then changes to N[c′]. We use C as watch zone in this definition to emphasize the fact that it does not influence the cop's next state. On her turn, on vertex r ∈ V, the robber's action consists in choosing a path π = (r₀, r₁, ..., r_n) of finite length n > 0, that is, [r_i, r_{i+1}] is an edge in E for each i = 0, 1, ..., n − 1.
The robber's transition function is:

T_rob((c, C, r), π) = { δ_(c, C, r_n), if (r = i_rob and r_n ∈ V \ N[c]) or (r ∈ V and r_i ∉ C for all 1 ≤ i ≤ n); δ_(j*, ∅, j*), otherwise. }

The robber is thus ensured to reach her destination r_n provided that she never crosses the cop's watch zone on her path π. If this happens, then the robber is taken to the jail state (j*, ∅, j*).

In Section 3, we present this game again, but with the possibility for the robber to evade capture.

Hence, because of Definition 2.1's rather general description, it is possible to encode a great variety of random events resulting from the cops' or the robbers' actions. In the following example, we encode a simple inhomogeneous Markov chain by forgetting the notions of cops and robbers. This makes the example fairly degenerate, but it also shows the generality of Definition 2.1.

Example 2.6 (Finite Markov chain).
A Markov chain is a sequence of random variables X₀, X₁, ... on a space E, having the Markov property. So we can assume that the evolution is given by an initial distribution q on E and a family of matrices M₀, M₁, ..., where M_i(s, s′) is the probability that X_{i+1} = s′ given that X_i = s. We can encode it as a GPCR game from Definition 2.1. In previous examples, we have ignored the third component of the states, S_o, but here we can ignore one of the player sets, like S_rob; equivalently, we can assume a single state for the robber and no effect by T_rob. We define:

i ∉ E
S = {i} ∪ (E × N)
F = ∅
A = {·}, a single dummy action
T_cop(i, ·, (e, 0)) = q(e)
T_cop((e, j), ·, (e′, j + 1)) = M_j(e, e′).

Since the action of the player has no influence on the progress of the game, it is natural to define A as a singleton. Technically, a play alternates between the moves of cops and robbers, so it is a sequence i e₀ e₀ e₁ e₁ ...; the repetitions reflect the fact that the robber has no effect. If we ignore the useless information of such a play, we obtain a sequence i e₀ e₁ e₂ ..., which is just a walk in the Markov chain (and the robber wins). Another way to write down this model would have been to let the two players play similarly, with T_rob = T_cop, but the states would then have to be triplets, and the initial state would force a less simple encoding.

Similarly, we can encode a finite-state Markov Decision Process (MDP) with reachability objectives [28] with Definition 2.1. The encoding satisfies that the optimal value of the MDP is 1 if the cops win; otherwise it is 0, and the robber wins.

The probabilistic Zombies and Survivors game on graphs [3] can also be viewed as a GPCR game, one in which only the robbers play optimally. It models a situation in which a single robber (the survivor) tries to escape a set of cops (the zombies).
However, the cops have to choose their initial vertices at random and, on each turn, choose randomly among the set of vertices that minimize the distance to the robber.

A deterministic (or pure) strategy is a function that prescribes to a player which action to play on each possible game history. Some strategies are better than others; we will be interested in the probability of winning for the cops, which will be attained by following a strategy. Ultimately, we are interested in memoryless strategies, that is, those that only depend on the present state, and not on the previous moves; nevertheless, we need to define more general strategies as well.
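Before the formal definitions below, the three kinds of strategies can be pictured as plain functions of different arguments. The following Python sketch is ours, not from the paper; the state and action names are purely illustrative:

```python
# Sketch: three strategy classes as callables. States/actions are hashable
# placeholder values; the names "left"/"right" and "s0" are illustrative only.

def general_strategy(history):
    """History-dependent: sees the whole history (a tuple of states/actions)."""
    return "left" if len(history) % 4 < 2 else "right"

def memoryless_strategy(state):
    """Depends only on the current state."""
    return "left" if state == "s0" else "right"

def finite_horizon_strategy(state, turns_remaining):
    """Memoryless except for a countdown of the remaining turns."""
    return "right" if turns_remaining == 1 else memoryless_strategy(state)
```

A finite horizon strategy can thus recommend a different action in the same state depending on how close the horizon is, which is exactly the behaviour exhibited in Example 2.9 below.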
Definition 2.7.
Let G be a game. A history on G is an initial fragment of a play on G ending in a state. H_G is the set of histories on G.

• The set of general strategies is Ω^g = {σ : H_G → A}.

• The set of memoryless strategies is Ω = {σ : S → A}.

• The set of finite horizon strategies is Ω^f = {σ : (S × N) → A}.

A finite horizon strategy counts the number of turns remaining, and is otherwise memoryless. A finite horizon strategy is conveniently defined on G, but it is actually played on G_n; hence the following definition of how such a strategy is followed. At turn 0 of h (histories i and i a₀ s₁), there are n turns remaining, so σ is evaluated with n on the second coordinate of its argument; at turn 1 (histories i a₀ s₁ a₁ s₂ and i a₀ s₁ a₁ s₂ a₂ s₃), there are n − 1 turns remaining.

Definition 2.8.
Let h = i a₀ s₁ a₁ s₂ a₂ s₃ ... be a (finite or infinite) play of G, with s₀ = i.

• h follows a general strategy σ ∈ Ω^g for the cops if for all j = 0, 2, 4, ... we have a_j = σ(i a₀ s₁ a₁ s₂ ... s_j). Similarly for the robbers, with odd j.

• h follows a memoryless strategy σ ∈ Ω_cop for the cops if for all j = 0, 2, 4, ... we have a_j = σ(s_j). Similarly for the robbers.

• h follows a finite horizon strategy σ ∈ Ω^f_cop on G_n for the cops if for j = 0, 1, 2, ..., n − 1 we have a_{2j} = σ(s_{2j}, n − j).

• h follows a finite horizon strategy σ ∈ Ω^f_rob on G_n for the robbers if for j = 0, 1, 2, ..., n − 1 we have a_{2j+1} = σ(s_{2j+1}, n − j − 1).

These strategies are all deterministic, or pure: a single action is chosen. Some papers consider mixed or behavioural strategies, where this choice is randomized. This is unnecessary in our setting because, as is well known in perfect information games, among all optimal strategies there is always a pure one. We will come back to this when we study optimal strategies later on.

We now present an example where the optimal strategy for the infinite game is memoryless (it only depends on the states) but, for any finite horizon game G_n, it is a finite horizon strategy.

Example 2.9.
This example is in the spirit of the Cop and Drunk Robber game, presented in Example 2.3. As in that example, the cop moves on his neighbourhood and so does the robber, who cannot choose her action, as before; but the difference with Example 2.3 is that the robber's movement is not uniform. The graph is a cycle of length 5. The robber moves clockwise with probability 0.9 and counterclockwise with probability 0.1. If the cop is at distance 1 of the robber at his turn, of course he wins in this turn. Otherwise, the cop is at distance 2; consider states s where the clockwise distance from the cop to the robber is 2. On the long term, the cop's best choice is to move counterclockwise. However, if only one turn remains, the best move for the cop is the clockwise move, because then with probability 0.1 the robber will jump to his position, whereas the probability of winning is zero in the counterclockwise direction. So the best strategy σ for G_n satisfies σ(s, n) ≠ σ(s, 1) in such a state s, for n > 1; hence it is not memoryless. Indeed, for example, σ(s, 2) ≠ σ(s, 1), because the probability of catching the robber by playing counterclockwise when 2 turns remain is 0.91, and it is 0.19 by playing clockwise (0.1 in one move of the robber plus 0.09 in two moves).

2.3. Winning conditions in GPCR games

In this section we are interested in winning strategies for the cops, their probability of winning in a given number n of turns (that is, in G_n) and their probability of winning without any limit on the number of turns (in G). Given finite horizon strategies σ_cop and σ_rob, for the cops and for the robbers, we consider the probability that the robbers are captured in n steps or less:

p_n(σ_cop, σ_rob) := P["capture in at most n steps" | σ_cop, σ_rob].
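Once the two strategies are fixed, p_n(σ_cop, σ_rob) can be approximated by straightforward simulation. The following sketch (ours, with illustrative names, not from the paper) estimates it for the biased drunk robber on the 5-cycle of Example 2.9, with a cop that always moves clockwise; the robber needs no strategy since her move is random:

```python
import random

# Monte Carlo sketch (ours) of p_n for Example 2.9: robber steps clockwise
# w.p. 0.9 and counterclockwise w.p. 0.1 on the 5-cycle. The cop strategy
# below (always clockwise) is illustrative and suboptimal for n >= 2.

N = 5

def play_once(rng, n, cop, rob, cop_strategy):
    """Simulate at most n turns; return True if the cop captures the robber."""
    for turns_left in range(n, 0, -1):
        cop = cop_strategy(cop, rob, turns_left)          # cop's move
        if cop == rob:
            return True
        rob = (rob + 1) % N if rng.random() < 0.9 else (rob - 1) % N
        if rob == cop:                                    # robber walks into the cop
            return True
    return False

def clockwise_cop(cop, rob, turns_left):
    return (cop + 1) % N                                  # always step clockwise

def estimate_p_n(n, trials=100_000, seed=0):
    """Fraction of plays captured within n turns from cop = 0, robber = 2."""
    rng = random.Random(seed)
    wins = sum(play_once(rng, n, cop=0, rob=2, cop_strategy=clockwise_cop)
               for _ in range(trials))
    return wins / trials
```

From the state where the clockwise distance from cop to robber is 2, the estimate approaches 0.1 for one turn and 0.19 for two turns, the clockwise values discussed in Example 2.9.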
Since the cops want to maximize this probability, and the robbers want to minimize it, the probability for the cops to win in n turns or less (playing optimally), whatever the robbers' strategy, is:

p*_n := max_{σ_cop ∈ Ω^f_cop} min_{σ_rob ∈ Ω^f_rob} p_n(σ_cop, σ_rob).   (2)

This is in fact the value of G_n in the sense of game theory. In game theory, the value of G_n exists if

max_{σ_cop ∈ Ω^g_cop} min_{σ_rob ∈ Ω^g_rob} p_n(σ_cop, σ_rob) = min_{σ_rob ∈ Ω^g_rob} max_{σ_cop ∈ Ω^g_cop} p_n(σ_cop, σ_rob).   (3)

In our setting, defining the payoff function of a play as 1 when the robbers are captured and 0 otherwise, we have, by Wal and Wessels [33], that the game G_n has value p*_n. That the restriction to finite horizon strategies does achieve the value of G_n is given again by Wal and Wessels, who call such strategies Markov strategies. Finally, since G_n is finite and with perfect information, a standard game-theoretical argument [27] justifies that the optimal strategies are deterministic (or pure).

We say that the cops and the robbers play optimally in G_n if they each follow a strategy that yields probability p*_n for the cops to win. We will show later on, but it is also straightforward from the definition, that p*_n is increasing in n (indeed, for n + 1 the cops can choose their optimal strategy for n and simply do anything on the last turn); since it is moreover bounded by 1, the limit always exists, and we will prove that it is equal to the value of G. G has a value, and this value is achieved by a pair of optimal strategies that are deterministic (or pure) and memoryless. The argument is well known in the literature on simple stochastic games (SSGs), but requires a construction, so we leave it to Appendix A.
Thus, let us write the value of the game G as p*_G, that is,

p*_G = max_{σ_cop ∈ Ω_cop} min_{σ_rob ∈ Ω_rob} P["capture in a play" | σ_cop, σ_rob],   (4)

and the equality still holds when the min and max operators are switched. This value is guaranteed by Theorem A.1 [8, 30]. In Proposition 2.16, we will show that the difference between the cops using a finite horizon strategy in G_n and a memoryless one in G is negligible for a sufficiently large integer n.

Equation (2) returns either 0 or 1 in deterministic games such as the Classic Cop and Robber game. We seek here to study games that can be stochastic, where p*_n can take any value in [0, 1]. Thus, we adapt the usual definition of copwin to our broader model.

Definition 2.10.
Let G be a GPCR game. We say G is

• c(p, n)-win if the cops can ensure a win with probability at least p in at most n turns, that is, p*_n ≥ p;

• p-copwin if it is c(p, n)-win for some n ∈ N;

• almost surely copwin if the cops can win when they are allowed to play infinitely, that is, p*_G = 1;

• copwin if it is c(1, n)-win for some n ∈ N.

It is easy to see that when G corresponds to the Classic Cop and Robber game, as defined in Example 2.2, this definition of copwin coincides with the classical one. In that sense, it can be considered as a generalization of the classical one, because in any copwin finite graph, the cop wins in at most n = |V(G)| turns.

Remark 2.11.
We will see in Proposition 2.16 that lim_{n→∞} p*_n = p*_G. Thus, if there exists n such that p*_n > 0 and if all states reachable within a finite number of moves of the cops' optimal strategy are in the same strongly connected component, then p*_G = 1. Indeed, after n turns, if the play is not over, the cops can go back to the configuration where p*_n > 0: the initial position that is proposed by the cops' strategy. In that state, the probability that the robbers have not been caught is at most 1 − p*_n; the probability that the robbers are not caught after m repetitions of this cycle is at most (1 − p*_n)^m. It is thus zero at the limit. This happens, for example, if p*_n > 0 and G is a strongly connected graph. However, we cannot, in general, claim that if p*_n > 0 after n > |S| turns have been played, then p*_G = 1.

We define a probabilistic analog of the cop number, c(G), which is the minimal number of cops required on a graph G in order for the cops to capture the robbers. It is an important subject of research in Classic Cops and Robbers games [4], in particular relating to Meyniel's conjecture that c(G) ∈ O(√|V(G)|). Furthermore, one of the main areas of research on Cops and Robbers games that involve random events is the expected capture time of the robbers [22, 21, 19]. Thus, we further generalize the expected capture time of the robbers for any game G.

Adding cops in a game G is done in the natural way: the set of cop states S_cop is the Cartesian product of the sets of single-cop positions, and the transition function is updated so as to let all cops move in one step.

Definition 2.12.
The $(p,n)$-cop number $c^n_p(G)$ of a game $G$ is the minimal number of cops required for the capture of the robbers in at most $n$ turns with probability at least $p$. In other words, $c^n_p(G)$ is the minimal number of cops required for a game $G$ to be $c(p,n)$-win. The $p$-cop number, $c_p(G) = c^\infty_p(G)$, is the minimal number of cops necessary for having $p^*_G \ge p$.

Let $T^p_G$ be the random variable giving the number of turns required for the robbers to be captured with probability at least $p$ in $G$ under optimal strategies. Then, the $p$-expected capture time of the robbers is $E[T^p_G]$. The expected capture time of the robbers is $E[T_G]$.

Since some of the optimal strategies of $G$ are memoryless, we can turn the question of computing $E[T^p_G]$ into that of computing an expected hitting time in a Markov chain. Let us write $\sigma^*_{cop}$ (resp. $\sigma^*_{rob}$) for the optimal strategy of the cops (resp. robbers) in $G$, and let $\mathcal{M}$ be the Markov chain such that for any state $s \in S$, it has the two states $(s, \sigma^*_{cop}(s))$ and $(s, \sigma^*_{rob}(s))$. Furthermore, let $M$ be its transition matrix, which is governed by the distributions $T_{cop}(s, \sigma^*_{cop}(s))$ and $T_{rob}(s, \sigma^*_{rob}(s))$. Suppose $(X_n)_{n \ge 0}$ is the stochastic process on $\mathcal{M}$ beginning at the initial state $i$; then $T := \min\{n \ge 0 : X_n \in F\}$ is the hitting time of $F$ from $i$, and the expectation of $T$ is $E[T_G]$.

Similarly to Bonato and MacGillivray's model, we define a method for solving GPCR games, that is, for computing the probability that the cops capture the robbers in an optimal play, together with the strategy to follow. This method takes the form of a recursion defining the probability $w_n(s)$ that state $s$ leads to a final state in at most $n$ steps ($w$ stands for winning in the following theorem). This recursion also yields a strategy for the cops.

Theorem 2.13.
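The reduction to an expected hitting time can be sketched in pure Python. The three-state chain below is a hypothetical stand-in for the chain $\mathcal{M}$ induced by the optimal strategies (its states and probabilities are assumptions for illustration, not from the paper); the expected hitting times $h$ of $F$ solve the linear system $h(s) = 0$ for $s \in F$ and $h(s) = 1 + \sum_{s'} M(s, s')\, h(s')$ otherwise:

```python
def expected_hitting_times(P, F):
    """Expected number of steps to reach the set F in a finite Markov chain
    with transition matrix P (row-stochastic, nested lists), by solving
    (I - Q) h = 1 on the transient states with Gauss-Jordan elimination."""
    n = len(P)
    trans = [s for s in range(n) if s not in F]
    m = len(trans)
    # Augmented system [I - Q | 1] restricted to the transient states.
    A = [[(1.0 if a == b else 0.0) - P[trans[a]][trans[b]] for b in range(m)] + [1.0]
         for a in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    h = [0.0] * n
    for k, s in enumerate(trans):
        h[s] = A[k][m] / A[k][k]
    return h

# Hypothetical 3-state chain: states 0 and 1 are transient, state 2 is final.
P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
print(expected_hitting_times(P, F={2}))  # approximately [2.0, 2.0, 0.0]
```

The same solve, run on the transition matrix $M$ of $\mathcal{M}$ restricted to states outside $F$, yields $E[T_G]$.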
Let $G$ be a GPCR game, and let:
$$w_0(s) := \begin{cases} 1, & \text{if } s \in F, \\ 0, & \text{otherwise;} \end{cases}$$
$$w_n(s) := \begin{cases} 1, & \text{if } s \in F, \\ \displaystyle\max_{a \in A_{cop}(s)} \sum_{s' \in S} T_{cop}(s, a, s') \min_{a' \in A_{rob}(s')} \sum_{s'' \in S} T_{rob}(s', a', s'')\, w_{n-1}(s''), & \text{otherwise.} \end{cases} \quad (5)$$

Then $w_n(s)$ gives the probability for the robbers to be captured in $n$ turns or less, given that both players play optimally, starting in state $s$. Thus, $w_n(i) = p^*_n$. This also says that $G$ is $c(p,n)$-win if and only if $w_n(i) \ge p$. For $(s,k) \in S \times \mathbb{N}$, let $\sigma^*_{cop}(s,k)$ be the argmax in place of the max in Equation (5). This defines finite-horizon strategies that are optimal in $G_n$.

The recursive part of $w_n$'s definition reads as follows: to win, the cops must take the best action $a$; this leads them to state $s'$ with probability $T_{cop}(s, a, s')$; from this state, the robbers choose the action $a'$ that gives them the smallest probability of being caught. Action $a'$ leads the robbers to state $s''$ with probability $T_{rob}(s', a', s'')$, and we then multiply by the probability $w_{n-1}(s'')$ that the cops catch the robbers from this state. Since the cops want a high probability, a maximum is taken; it is the converse for the robbers. The full equation gives the expected probability of capture of the robbers by the cops when both players move optimally.

Proof.
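The recursion (5) can be implemented directly by dynamic programming. The sketch below (pure Python) runs it on a small assumed instance, not taken from the paper: one cop with deterministic moves chasing a "drunk" robber on the path 0–1–2, where $T_{rob}$ moves the robber uniformly over her closed neighbourhood whatever action she picks:

```python
from functools import lru_cache

# Toy GPCR instance (assumed for illustration). States are (cop, robber)
# vertex pairs on the path 0-1-2; final states are those where they coincide.
adj = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}  # closed neighbourhoods

def final(s):
    return s[0] == s[1]

def A_cop(s):              # the cop may move within his closed neighbourhood
    return adj[s[0]]
def A_rob(s):              # the drunk robber has a single (irrelevant) action
    return ["stay"]

def T_cop(s, a):           # deterministic cop move: Dirac distribution
    return {(a, s[1]): 1.0}
def T_rob(s, a):           # uniform move over the robber's closed neighbourhood
    opts = adj[s[1]]
    return {(s[0], r): 1.0 / len(opts) for r in opts}

@lru_cache(maxsize=None)
def w(n, s):
    """Equation (5): probability of capture within n turns from state s."""
    if final(s):
        return 1.0
    if n == 0:
        return 0.0
    best = 0.0
    for a in A_cop(s):
        total = 0.0
        for s1, p in T_cop(s, a).items():
            if final(s1):      # s1 in F: T_rob is Dirac at s1, value is 1
                total += p
                continue
            worst = min(       # robbers minimize over their actions
                sum(q * w(n - 1, s2) for s2, q in T_rob(s1, a1).items())
                for a1 in A_rob(s1)
            )
            total += p * worst
        best = max(best, total)  # cops maximize over their actions
    return best

print(w(1, (0, 2)), w(2, (0, 2)))  # 0.5 1.0
```

From (cop at 0, robber at 2), one turn yields capture with probability 0.5 and two turns suffice with probability 1, matching a hand computation of (5) on this toy instance.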
The proof is by induction on $n$. We prove that $w_n(s)$ gives the probability for the robbers to be captured in $n$ turns or less, given that both players play optimally, starting in state $s$. Let $s$ be any state.

If $n = 0$, then the cops win if and only if $s \in F$, in which case, by definition, we do have $w_0(s) = 1$. Otherwise the robbers win and $w_0(s) = 0$, as wanted.

If $n > 0$, suppose the result holds for every $k \le n - 1$ and let $s$ be the current state. If this state is final, then the robbers are caught in $n$ turns or less with probability $1$ and $w_n(s) = 1$, as desired. Otherwise, let the cops, playing first, choose an action $a_{cop} \in A_{cop}(s)$, after which the next state $s'$ is drawn according to $T_{cop}(s, a_{cop}, s')$. Then, the robbers can choose an action $a_{rob} \in A_{rob}(s')$, in which case the next state $s''$ is drawn with probability $T_{rob}(s', a_{rob}, s'')$. By the induction hypothesis, a final state is reached in $n - 1$ turns or less with probability $w_{n-1}(s'')$ starting from state $s''$. Thus, the probability that the robbers are caught in $n$ turns or less by playing action $a_{rob}$ after the cops have reached state $s'$ is given by:
$$\sum_{s'' \in S} T_{rob}(s', a_{rob}, s'')\, w_{n-1}(s'').$$
Note that if $s' \in F$, this value is exactly $w_{n-1}(s')$, since by definition we must have $T_{rob}(s', a_{rob}, s'') = 1$ if $s'' = s'$ and $0$ otherwise. The robbers wish to minimize this value over their set of available actions, which is possible since both sets $S$ and $A_{rob}$ are finite. Hence, supposing action $a_{cop} \in A_{cop}(s)$ has been chosen by the cops, the game stochastically transits to some other state $s' \in S$ with probability $T_{cop}(s, a_{cop}, s')$.
Thus, with probability
$$\sum_{s' \in S} T_{cop}(s, a_{cop}, s') \min_{a_{rob} \in A_{rob}(s')} \sum_{s'' \in S} T_{rob}(s', a_{rob}, s'')\, w_{n-1}(s''),$$
the robbers are caught in $n$ turns or less from state $s$ when the cops play action $a_{cop}$. The cops want to maximize this value and, as for the robbers, this is possible because the sets considered are finite. Thus, the cops must play the action
$$\operatorname*{argmax}_{a_{cop} \in A_{cop}(s)} \sum_{s' \in S} T_{cop}(s, a_{cop}, s') \min_{a_{rob} \in A_{rob}(s')} \sum_{s'' \in S} T_{rob}(s', a_{rob}, s'')\, w_{n-1}(s'').$$
(The argmax is not necessarily unique.) The claim about $\sigma^*_{cop}$ is straightforward from this result. The choices of actions at the initial state thus give the probability $w_n(i)$. Because $p^*_n$ is, by definition, the probability of capture of the robbers in $n$ turns or less when both players play optimally, we conclude that $w_n(i) = p^*_n$.

This result implies that the $w_n$'s are probabilities that increase with $n$. In other words, we have the following corollary.

Corollary 2.14.
For any $n \in \mathbb{N}$ and $s \in S$, we have $0 \le w_n(s) \le w_{n+1}(s) \le 1$.

Note that there are many optimal strategies for the cops in $G$, that is, strategies that achieve the value $p^*_n$, but they are not all equally efficient. Consider a game $G_n$ where the robbers can be caught in $k < n$ turns with probability 1, and let $\sigma_k$ be an optimal strategy for $G_k$. Then the strategy that stays idle for $n - k$ turns and then behaves as prescribed by $\sigma_k$ is optimal, but not efficient, and it respects the argmax of Equation (5). The next proposition shows how to define an efficient one.

Proposition 2.15.
For each $N \in \mathbb{N}$, there exists an optimal strategy $\sigma^*_N$ with horizon $N$ that satisfies: for all $s \in S$, if $w_{N_1}(s) = w_{N_2}(s)$ for $N_1 \le N_2 \le N$, then $\sigma^*_N(s, N_1) = \sigma^*_N(s, N_2)$. Similarly for the robbers.

Proof. For any $(s, m) \in S \times \mathbb{N}$, we denote by $ACT(s, m)$ the set of actions that achieve the maximum in Equation (5) for $w_m(s)$. We have proved in Theorem 2.13 that any strategy satisfying $\sigma(s, m) \in ACT(s, m)$ for all $(s, m) \in S \times \mathbb{N}$ is optimal. Let us prove that if $w_N(s) = w_{N+1}(s)$ for $N \in \mathbb{N}$, then $ACT(s, N) \subseteq ACT(s, N+1)$. By contradiction, let $k$ be the smallest integer such that there is a state $s$ and an action $a \in ACT(s, k) \setminus ACT(s, k+1)$. The cops play action $a$ at time $k + 1$, and then, with horizon $k$, they choose an optimal action prescribed by $\sigma^*_{k-1}$ that is also prescribed by $\sigma^*_k$ (possible by minimality of $k$), and so on until the last turn, where they stay put if possible. This gives a value of at least $w_k(s)$ and, by definition, at most $w_{k+1}(s)$. Since $w_k(s) = w_{k+1}(s)$, the finite-horizon strategy defined above is optimal. This is a contradiction, since $a$ should then be in $ACT(s, k+1)$. Thus we obtain that if $w_{N_1}(s) = w_{N_2}(s)$ for $N_1 \le N_2 \le N$, then $ACT(s, N_1) \subseteq ACT(s, N_2)$. Hence the wanted strategy exists. The argument is similar for the robbers.

Although $w_n(i)$ only gives the value of the game $G_n$ with finite-horizon strategies, we can show that this value, as a function of $n$, converges to the value of $G$.

Proposition 2.16. The value of $G$ is $\lim_{n\to\infty} p^*_n$. Furthermore, the optimal strategies of $G_n$ are $\epsilon$-optimal strategies of $G$ for any $\epsilon > 0$ and sufficiently large integer $n$.

Proof. From a previous argument, we know that some pair $(s_c, s_r)$ of optimal memoryless strategies for the cops and the robbers yields a probability $p^*_G$ of winning for the cops.
It holds that $p^*_n \le p^*_G$ for any integer $n$, since the value of $G_n$ can be at most the value of $G$. Since $p^*_n$ is non-decreasing in $n$ and bounded above by $p^*_G$, we have $\lim_{n\to\infty} p^*_n \le p^*_G$.

Now, let us play the strategies $(s_c, s_r)$ chosen above in the game $G_n$, for any integer $n$. Consider the probability that the cops win in $G_n$ when both players follow those strategies. These probabilities, for each $n$, form a sequence $(v_n)_{n \in \mathbb{N}} := (v_0, v_1, \ldots)$. This sequence is non-decreasing and bounded above by $p^*_G$. Let $A_n$ be the event "there is a capture in at most $n$ turns under strategies $s_c$ and $s_r$". Observe that $A_0 \subseteq A_1 \subseteq \ldots$ is a non-decreasing sequence of events. Thus, by the Monotone Convergence Theorem:
$$p^*_G = P[\{h \mid h \text{ is a play following } s_c, s_r \text{ in which the cops win}\}] = P\Big[\bigcup_{i=0}^{\infty} A_i\Big] = \lim_{n\to\infty} P[A_n] = \lim_{n\to\infty} v_n.$$
Thus, for any $\epsilon > 0$ there exists an integer $N$ such that for all $n \ge N$, $p^*_G - v_n < \epsilon$. But we also have $v_n \le p^*_n$ for any integer $n$, since $w_n(i)$ is the value of $G_n$. Hence, it follows that $0 \le p^*_G - p^*_n \le p^*_G - v_n = |p^*_G - v_n| < \epsilon$. This completes the proof.

It is interesting to note that this proposition only applies if there are best strategies for the cops and the robbers. In particular, it is not true if $G$ is played on the infinite graph of the following example.

Example 2.17.
Consider an infinite star graph with a central vertex from which paths of length $n$ are deployed, for every integer $n$, and consider the Classic Cops and Robbers game $G$ on this graph with one cop and one robber. The best move for the cop is to start on the (infinitely branching) central vertex. Then, whatever state the robber chooses, the cop will catch her in a finite number of turns, so this graph is copwin in the sense of Definition 2.10. However, this number of turns is unbounded, so when playing in $G_n$, the robber can simply choose a vertex at distance greater than $n$; hence the value of $G_n$ is 0 for all $n$. The proof of the proposition fails in this case because, the graph being infinite, there is no optimal strategy for the robber in $G$. Whatever state the robber chooses, there is always a farther state that would allow her to be captured in more turns, that is, there is always a better strategy.

Under certain conditions that will be further studied in Subsection 2.6, the $(w_n)_{n \in \mathbb{N}}$ sequence becomes constant.

Definition 2.18. We say that $(w_n)_{n \in \mathbb{N}}$ is stationary if there exists an integer $N \in \mathbb{N}$ such that $w_n(s) = w_{n+1}(s)$ for all $n > N$ and $s \in S$. We write $w$ for the stationary part of $(w_n)_{n \in \mathbb{N}}$.

Remark 2.19.
It follows from the definition of $w_n$ that if, for some $N$, $w_N(s) = w_{N+1}(s)$ for all $s \in S$, then $(w_n)_{n \in \mathbb{N}}$ is stationary and $w$ starts at $n = N$ or earlier.

From Theorem 2.13, we deduce Theorem 2.20, which is more in line with traditional game-theoretic arguments and shows that, in addition to the equality $\lim_{n\to\infty} w_n(i) = p^*_G$, we can compute explicitly the optimal strategy of the cops in $G$ from the limit of the $w_n$'s.

Theorem 2.20.
The (point-wise) limit $w_\infty := \lim_{n\to\infty} w_n$ exists and satisfies
$$w_\infty(s) = \begin{cases} 1, & \text{if } s \in F, \\ \displaystyle\max_{a \in A_{cop}(s)} \sum_{s' \in S} T_{cop}(s, a, s') \min_{a' \in A_{rob}(s')} \sum_{s'' \in S} T_{rob}(s', a', s'')\, w_\infty(s''), & \text{otherwise.} \end{cases} \quad (6)$$
Moreover, the optimal (memoryless) strategy for the cops in $G$, from any state $s$, can be retrieved as a cops' action for which the maximum of Equation (6) is achieved.

Proof. Let $L$ be the lattice of functions $S \to [0,1]$, ordered point-wise, with the null function as bottom element $\bot$. Equation (5) determines the following function $\mathcal{F} : L \to L$. For $f : S \to [0,1]$ and $s \in S$,
$$\mathcal{F}(f)(s) := \begin{cases} 1, & \text{if } s \in F, \\ \displaystyle\max_{a \in A_{cop}(s)} \sum_{s' \in S} T_{cop}(s, a, s') \min_{a' \in A_{rob}(s')} \sum_{s'' \in S} T_{rob}(s', a', s'')\, f(s''), & \text{otherwise.} \end{cases}$$
From previous remarks, $\mathcal{F}$ is monotone increasing. Thus, we deduce from the Knaster-Tarski fixed-point theorem [14] that $\mathcal{F}$ has a least fixed point given by $w_\infty := \lim_{n\to\infty} \mathcal{F}^n(\bot)$. Furthermore, we have $\mathcal{F}(\bot) = w_0$ and $\mathcal{F}(w_{n-1}) = w_n$, so $\mathcal{F}^{n+1}(\bot) = w_n$ for every integer $n$; thus $w_\infty = \lim_{n\to\infty} w_n$ and it satisfies Equation (6).

We showed in Theorem 2.13 that $w_n(i) = p^*_n$, and in Proposition 2.16 that $\lim_{n\to\infty} p^*_n = p^*_G$. Consequently, $w_\infty(i) = p^*_G$. Hence, $w_\infty(i)$ is the probability that the cops capture the robbers when both teams play optimally. Similarly, one can show that $w_\infty(s)$ is the probability that, starting at $s$, the cops capture the robbers when both teams play optimally. This, together with the fact that $w_\infty$ satisfies Equation (6), implies that the optimal strategy for the cops is coherent with an action achieving the argmax in place of the max operator in Equation (6).
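The limit $w_\infty = \lim_n \mathcal{F}^n(\bot)$ can be approximated by iterating the operator from the bottom element. The sketch below (pure Python) does this on a hypothetical one-action game, assumed for illustration: from the single non-final state $s$, each turn the play reaches the final state $f$ with probability $1/M$ and stays at $s$ otherwise, so every finite $w_n(s)$ is strictly below 1 while $w_\infty(s) = 1$:

```python
# Fixed-point iteration for w_infinity (Theorem 2.20) on an assumed toy game.
# Both players have a single action, so the max and min of the operator F
# are trivial and F reduces to one affine update per state.
M = 4  # assumed per-turn capture probability is 1/M

def F(w):
    """One application of the operator F from the proof of Theorem 2.20."""
    return {
        "f": 1.0,                                              # final state
        "s": (1.0 / M) * w["f"] + (1.0 - 1.0 / M) * w["s"],    # Equation (6) at s
    }

w = {"s": 0.0, "f": 0.0}   # bottom element of the lattice L
for _ in range(200):       # F^n(bottom) converges point-wise to w_infinity
    w = F(w)
print(w["s"])  # numerically 1.0: almost surely copwin, yet no finite w_n is 1
```

The iterates approach 1 geometrically, mirroring the $K_3$-with-capture-probability discussion of Subsection 2.6: the game is almost surely copwin but not $c(1,n)$-win for any $n$.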
One cannot choose just any such action because, for example, a temporarily bad action, like staying idle, can give the same probability of winning as another action, yet may only be chosen a finite number of times, which is incompatible with a memoryless strategy.

Remark 2.21. Recall that we have $w_n(i) = p^*_n$ and that, by definition, $p^*_n = \min_{\sigma_{rob} \in \Omega^G_{rob}} \max_{\sigma_{cop} \in \Omega^G_{cop}} p_n(\sigma_{cop}, \sigma_{rob})$. Thus, we could have defined $w_n(i)$ with the operators min and max switched. We can then deduce the optimal robbers' strategies by flipping those operators and replacing the min operator by an argmin operator. This also holds for $w_\infty$.

Now, with the help of Equation (5), we can generalize the classic theorem of Cops and Robbers games. This is done in the next corollary.
Corollary 2.22.
Let $G$ be a GPCR game. Then, $G$ is copwin if and only if the sequence $(w_n)_{n \in \mathbb{N}}$ is stationary and $w(i) = 1$. Moreover, the game is $p$-copwin if and only if the sequence is stationary and $w(i) \ge p$. If $G$ is not $p$-copwin for any $p$, then the game is almost surely copwin if and only if the sequence is not stationary and $w_\infty(i) = 1$.

Remark 2.23.
If the GPCR game $G$ is deterministic, then $w_n(s)$ is $0$ or $1$ for any $n \in \mathbb{N}$ and $s \in S$. It therefore follows from the monotonicity of $(w_n)_{n \in \mathbb{N}}$ (see Corollary 2.14) and from Remark 2.19 that the stationary part starts at some $N \le |S|$. Indeed, if $w_n \ne w_{n+1}$, there is at least one $s$ such that $w_n(s) = 0$ and $w_{n+1}(s) = 1$. This difference can be observed at most $|S|$ times.

The conditions under which $(w_n)_{n \in \mathbb{N}}$ is stationary are presented in Proposition 2.26.

2.5. The computational complexity of the $w_n$ recursion

We show a result on the algorithmic complexity of computing the function $w_n$ (Equation (5)). This function is computable with dynamic programming, yet it may require a large number of operations, especially as its complexity is a function of the size of the state space. Recall that Equation (5) was devised to be as general and efficient as possible. However, given the generality of Definition 2.1, the best one can hope for is polynomial complexity in the size of the state and action spaces.

Proposition 2.24.
In the worst case and under a dynamic programming approach, computing $w_n$ requires $O\big(n|S|^2 \max|A_{cop}| \max|A_{rob}|\big)$ operations, where $\max|A_{cop}|$ stands for $\max_{s \in S} |A_{cop}(s)|$, and similarly for $\max|A_{rob}|$. The spatial complexity is $O(n|S|)$.

Proof. Let $a_n$ be the number of operations required for computing the recursion of $w_n$. Assume that computing the probabilities $T_{cop}$ and $T_{rob}$ has unit cost. Clearly, $a_0 = 1$. In the worst case, when $n > 0$, all elements of the sets $A_{cop}$ and $A_{rob}$ must be considered in order to ensure the optimality of the chosen actions, and thus $\max|A_{rob}| \max|A_{cop}|$ operations are required. We always have $|S| \ge |\{s' \in S \mid T_{cop}(s, a_{cop}, s') > 0\}|$, and similarly for $T_{rob}(s, a_{rob}, s')$. Then, in the worst case,
$$a_n \le |S|^2 \max|A_{rob}| \max|A_{cop}| + a_{n-1} \le n|S|^2 \max|A_{cop}| \max|A_{rob}| + 1,$$
where we assume that all values of $w_{n-1}$ were saved in memory. Memorizing those values requires a spatial complexity of at most $O(n|S|)$. The final complexity is thus $O\big(n|S|^2 \max|A_{cop}| \max|A_{rob}|\big)$.

Consequently, both spatial and temporal algorithmic complexities depend on the three sets $S$, $A_{cop}$ and $A_{rob}$. This suggests that these complexities may be high if the number of available actions is. One could imagine a game in which actions are paths, resulting in complexity exponential in $|S|$. Still, whenever $|A_{cop}| \in O(p(|S|))$ and $|A_{rob}| \in O(q(|S|))$ for some polynomials $p$ and $q$, Equation (5) is clearly computable in time polynomial in the size of $S$. Moreover, as we will see in Corollary 2.27, $w_n$ does not have to be computed for all $n$ in order to determine whether the cops have a winning strategy: essentially, $n = |S|$ suffices. In many studied cases, $|S|$ is itself polynomial in the size of the structure on which the game is played, leading each time to polynomial-time algorithms for solving the game.
2.6. A stationarity result

In traditional games of Cops and Robbers where a relation $\preceq_n$ is defined (such as the classic game [26] and the game with $k$ cops [7]), it is useful to prove results on the convergence of the recursion $\preceq_n$. One shows that the relation becomes stationary, that is, there exists a number $N \in \mathbb{N}$ such that for all integers $n > N$ and all pairs of vertices $(u, v) \in V^2$, if $u \preceq_n v$, then $u \preceq_{n+1} v$. One then writes $\preceq$ for the stationary part of the sequence, i.e. $\preceq \, = \, \preceq_N$. This result is vital for solving Cops and Robbers games, as it ensures the relation $\preceq$ can be computed in finite time.

Contrary to the relation $\preceq_n$ found in deterministic Cops and Robbers games (such as the classic game of Example 2.2), the recursion $w_n$ does not always become stationary. For example, consider the triangle $K_3$ with one cop and one robber: although it is copwin in the classical sense, whenever one adds a probability of capture on the vertices, say $1/M$ for some $M > 1$, then after $n$ turns the cop will have captured the robber with probability only $1 - (1 - 1/M)^n$. Thus, after $n$ turns, the cop can only ensure a probability of capture strictly less than 1, although he can clearly win with probability $p$ for any $p \in [0, 1)$. In other words, a game may be almost surely copwin, but not $c(1, n)$-win for any integer $n$. In the following proposition, we formulate and prove an upper bound on the minimal number of steps $n$ required to determine $p^*_G$, the probability of capture in an infinite game.

Recall that it does not hold in general that in a copwin graph (in the classical sense of one cop against one robber) every optimal strategy of the cop prevents him from visiting any vertex more than once [5]. Were this to be true, we could easily upper bound the capture time of the robber.
However, we show in Lemma 2.25 that a milder version of this result holds for states instead of simple cop positions.

To gain an intuition of why the following lemma is true, it is important to note that the stationarity condition is a very strong one. The contrapositive of the lemma may be more informative: the only way for $w_n$ to become stationary is that no loop is possible in any play following the optimal strategies of the players. An example of a graph where this does not happen is a cycle of length 3 where, in every state, the robbers move in both directions with equal probability: there are plays in which the robbers are caught only after an arbitrarily large number of turns. On the other hand, an acyclic graph does induce stationarity of $w_n$.

Lemma 2.25.
Suppose $(w_n(s))_{n \in \mathbb{N}}$ is stationary at $N > 0$ in a game $G$ for a state $s$, and that the cops and robbers follow their optimal strategies from Proposition 2.15. Then, every winning play (for the cops) from $s$ brings the cops to any given state at most once (at the end of a turn).

Proof. We prove the result for both the cops and the robbers, that is, in a winning play where they follow their optimal strategies, neither of them visits the same state twice on their turn. Because of stationarity and Proposition 2.16, the optimal strategy for the cops in $G$ is also optimal in $G_n$, $n \ge N$. Suppose the lemma is false. Then there is a winning play $\pi$ (i.e., reaching $F$ in $N$ turns or less) from state $s$ containing a loop through a state $s_k$ that is thus reached twice by the same player in the play, the second time being at $s_l$, $k < l$ (with $k$ and $l$ of the same parity). This play follows the optimal strategies of the players. None of the states of the loop are in $F$, by definition of a play. Consider the set $\Pi$ of plays $\pi_i$ that start as $\pi$ until the first occurrence of $s_k$, follow the loop fragment $s_k a_k s_{k+1} \ldots a_{l-1} s_l$ from $s_k$ for $i$ times, and then continue as the fragment of $\pi$ after it exits $s_l = s_k$ for the last time. These are plays (in particular, they alternate between the players). All these plays are winning (one of them may be $\pi$), but infinitely many of them reach $F$ at a turn greater than $N$. If we prove that these plays follow the optimal strategies, this contradicts stationarity, as the robbers are caught in more than $N$ turns in infinitely many of them, which implies that the value of $G_{N+k}$ is strictly greater than the value of $G_N$ for infinitely many $k$.

We do have that any play of $\Pi$ follows the optimal strategies. Indeed, since $\pi$ follows the optimal memoryless strategies, every time the play reaches state $s_k = s_l$, the same action is chosen for the player. In the first occurrence, it leads to entering the loop; in the last one, it leads to leaving it.
This can happen because the action leads to a stochastic next state.

Note that the lemma is not true if the robbers do not play well. Indeed, consider the very simple deterministic game played on a cycle of length greater than 3; a robber can avoid capture indefinitely by traveling away from the cop. Then $(w_n(s))_{n \in \mathbb{N}}$ is stationary for every state $s$. Consider a play where the robber decides to stop after having traveled 8 times around the cycle. The play is winning for the cop, but even though the cop follows the optimal strategy, the same state is encountered 8 times.

Proposition 2.26.
Let $G$ be a GPCR game and $s \in S$. Then, the recursion $w_n$ defined by Equation (5) is such that: (1) if $w_{|S|}(s) = 0$, then $w_{|S|+k}(s) = 0$ for every $k > 0$; (2) if $w_{|S|+1}(s) > w_{|S|}(s)$, then $(w_n(s))_{n \in \mathbb{N}}$ is not stationary.

Proof. For the first claim, assume that $w_{|S|+k}(s) > 0$. Then there is a path $\pi$ from state $s$ to a final state in $F$ that follows $\sigma^*_{|S|+k}$ (and that has positive probability). If this path is longer than $|S|$, then it contains a repetition of at least one state $s'$, at turns, say, $m_1$ and $m_2$. Consider the finite-horizon strategy that follows $\sigma^*_{|S|+k}$ for the first $m_1$ turns, and then follows $\sigma^*_{|S|+k-m_2}$, which is the strategy followed by $\sigma^*_{|S|+k}$ when $s'$ was encountered for the second time originally in $\pi$. So, removing from $\pi$ the subpath between $m_1$ and $m_2$, we obtain a shorter path that has positive probability and that follows this strategy. By continuing this procedure, we obtain a path of length $|S|$ or less, and Claim 1 is proved.

From Lemma 2.25, if $(w_n(s))_{n \in \mathbb{N}}$ is stationary from $N$, there is no (positive-probability, winning) play in which the same state is encountered twice in the first $N$ turns of $G_N$ following $\sigma^*_N$. Now, suppose $N \ge |S|$. Then there is no repetition of states, which implies that for all $s \in S$, all paths that contribute to the value $w_N(s)$ are of length at most $|S|$, and the result follows.

It is interesting to note the contrapositive of the second item of Proposition 2.26: if $(w_n(s))_{n \in \mathbb{N}}$ is stationary for some state $s$, then $w_{|S|}(s) = w_{|S|+1}(s)$. In other words, the stationary part starts at most at turn $|S|$. This result is state by state, so other states may not be stationary. Note, however, that we cannot deduce stationarity from observing $w_{|S|}(s) = w_{|S|+1}(s)$, because the sequence may stay stable for a few turns and then be updated with a positive value. We can nevertheless complete the algorithmic complexity result of Proposition 2.24.
Corollary 2.27.
In the worst case, under a dynamic programming approach, at most $O\big(|S|^3 \max|A_{cop}| \max|A_{rob}|\big)$ operations are sufficient to determine whether $w_n$ is null, stationary and equal to a number $p \in (0, 1]$, or indefinitely increasing.

Proof. The result follows from Propositions 2.26 and 2.24 by substituting $|S|$ for $n$. For stationarity, for example, if $(w_n)_{n \in \mathbb{N}}$ is stationary, then $(w_n(s))_{n \in \mathbb{N}}$ is stationary for all $s \in S$, so we can conclude that $(w_n)_{n \in \mathbb{N}}$ is stationary at $n = |S|$.

2.7. Bonato and MacGillivray's generalized Cops and Robbers game

This subsection is dedicated to a comparison with Bonato and MacGillivray's generalized Cops and Robbers game [2], which is another attempt at studying Cops and Robbers games in general form. For the sake of self-containment, their model is transcribed here. This model is completely deterministic and is thus included as a special case of Definition 2.1.

Bonato and MacGillivray's game is presented in the following definition.
Definition 2.28 (Bonato and MacGillivray's game). A discrete-time process $G$ is a generalized Cops and Robbers game if it satisfies the following rules:

• Two players, a pursuer and an evader, compete against each other. There is perfect information.

• There is a set $P_P$ of admissible positions for the pursuer and a set $P_E$ for the evader. The set of admissible positions of the game is the subset $P \subseteq P_P \times P_E$ of positions that can be reached according to the rules of the game. The set of game states is the subset $S \subseteq P \times \{P, E\}$ such that $((p_P, q_E), X) \in S$ if, when $X$ is the player next to play, the position $(p_P, q_E)$ can be reached by following the rules of the game.

• For each game state and each player, there exists a non-empty set of allowed movements. Each movement leaves the other player's position unchanged. We write $A_P(p_P, q_E)$ for the set of movements allowed to the pursuer when the game state is $((p_P, q_E), P)$, and $A_E(p_P, q_E)$ for the set of movements allowed to the evader when the game state is $((p_P, q_E), E)$.

• The rules of the game specify how the game begins. Thus, there exists a set $\mathcal{I} \subseteq P_P \times P_E$ of admissible starting positions. We define $\mathcal{I}_P = \{p_P : \exists q_E \in P_E, (p_P, q_E) \in \mathcal{I}\}$ and, for $p_P \in P_P$, we define the set $\mathcal{I}_E(p_P) = \{q_E \in P_E : (p_P, q_E) \in \mathcal{I}\}$. The game $G$ starts with the pursuer choosing a starting position $p_P \in \mathcal{I}_P$ and then the evader choosing a starting position $q_E \in \mathcal{I}_E(p_P)$.

• After both players have chosen their initial positions, the game unfolds alternately, with the pursuer moving first. Each player, on his turn, must choose an admissible action given the current state.

• The rules of the game specify when the pursuer has captured the evader. In other words, there is a subset $F$ of final positions. The pursuer wins $G$ if, at any moment, the current position belongs to $F$. The evader wins if his position never belongs to $F$.

Only Cops and Robbers games in which the set $P$ is finite are considered. The games considered are played on a finite sequence of turns indexed by the natural integers, including 0.

We also present how the same authors defined an extension of the relation $\preceq_n$ of Nowakowski and Winkler [26] in order to solve the set of games characterized by their model.

Definition 2.29 (Bonato and MacGillivray's $\preceq_n$). Let $G$ be a Cops and Robbers game given by Definition 2.28. We let:

• $q_E \preceq_0 p_P$ if and only if $(p_P, q_E) \in F$.

• Suppose that $\preceq_0, \preceq_1, \ldots, \preceq_{i-1}$ have all been defined for some $i \ge 1$. Define $q_E \preceq_i p_P$ if $(p_P, q_E) \in F$, or if $((p_P, q_E), E) \in S$ and for all $x_E \in A_E(p_P, q_E)$, either $(p_P, x_E) \in F$ or there exists some $w_P \in A_P(p_P, x_E)$ such that $x_E \preceq_j w_P$ for some $j < i$.

By definition, $\preceq_i$ contains $\preceq_{i-1}$ for all $i \ge 1$. Since $P_E \times P_P$ is finite, there exists some $t$ such that $\preceq_t \, = \, \preceq_k$ for all $k \ge t$. We define $\preceq \, = \, \preceq_t$.
Bonato and MacGillivray then use the relation of Definition 2.29 to show a necessary and sufficient condition for the existence of a winning strategy for the pursuer that is greatly similar to the corresponding theorem of Nowakowski and Winkler [26].
Theorem 2.30 (The copwin theorem of Bonato and MacGillivray). The pursuer has a winning strategy in a game of Cops and Robbers characterized by Definition 2.28 if and only if there exists some $p_P \in \mathcal{I}_P$ such that for all $q_E \in \mathcal{I}_E(p_P)$, either $(p_P, q_E) \in F$ or there exists $w_P \in A_P(p_P, q_E)$ such that $q_E \preceq w_P$.

It should be clear at this point that both Definitions 2.1 and 2.28 describe alternating pursuit games of perfect information that unfold on discrete structures. Although the notation differs, it should also be clear that Bonato and MacGillivray's model is embedded in ours. The only difference between the two formalisms has to do with the initial states. Indeed, we only allow one initial state, $i$, which is not the case in Definition 2.28. This does not cause any problem, as it suffices to play one more turn in our model, or even to modify the first reachable states. In order to simplify what follows, we assume the sets of initial states in both models are equivalent. We conclude that Equation (5) should encode the relation $\preceq_n$ of Definition 2.29. Indeed, other than its deterministic character, the relation $\preceq_n$ is greatly similar to our recursion. Both relations are binary and recursive, and they share the same structure: a base case when $n = 0$, in which neither player may make another move; a second case when $n > 0$ but the current state is final; finally, a last case, again when $n > 0$, in which both players must choose an action that is optimal in the subsequent turns. We now formally show how those two equations are related.

We first note that Equation (5) can be simplified when following Bonato and MacGillivray's model. Since the component $s_o$ is not used in what follows, we simply write $(c, r) \in S$. Since the game is deterministic, we let the players choose their next position directly.
The recursion $w_n$ is thus given by:
$$w_0(c, r) = 1 \iff (c, r) \in F;$$
$$w_n(c, r) = \begin{cases} 1, & \text{if } (c, r) \in F; \\ \displaystyle\max_{c' \in A_{cop}(c, r)} \min_{r' \in A_{rob}(c', r)} w_{n-1}(c', r'), & \text{otherwise.} \end{cases} \quad (7)$$

The following theorem thus makes the connection between the two formalisms, ours and that of Bonato and MacGillivray. In order to clarify the exposition, the relation $\preceq_n$ is written in our model, that of Definition 2.1. Given the preceding remarks, this incurs no loss of generality.
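The simplified recursion (7) is straightforward to run. The sketch below (pure Python, with the path $P_3$ and the cycle $C_4$ as assumed test graphs, not examples from the paper) checks whether one cop can guarantee capture within a given horizon, including the initial placement round where the cop chooses his vertex first and the robber answers:

```python
from functools import lru_cache

def copwin_value(adj, n_turns):
    """Equation (7) specialized to one cop and one robber on a graph given by
    its adjacency lists: returns 1 iff the cop can force capture in n_turns."""
    V = list(adj)

    @lru_cache(maxsize=None)
    def w(n, c, r):
        if c == r:                  # final states: cop on the robber's vertex
            return 1
        if n == 0:
            return 0
        return max(                 # cop picks the best next vertex c'
            1 if c1 == r else
            min(w(n - 1, c1, r1) for r1 in adj[r] + [r])  # robber's best answer
            for c1 in adj[c] + [c]
        )

    # Initial placement: the cop chooses first, the robber answers.
    return max(min(w(n_turns, c, r) for r in V) for c in V)

path3 = {0: [1], 1: [0, 2], 2: [1]}                    # P3 is copwin
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # on C4 the robber escapes
print(copwin_value(path3, 3), copwin_value(cycle4, 10))  # 1 0
```

On $C_4$ the robber simply keeps the antipodal vertex, so no finite horizon helps the lone cop; this matches the classical fact that cycles of length at least 4 are not copwin.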
Let the relation $\preceq_n$ be given by Definition 2.29 and $w_n$ be the recursion given by Equation (5). Assume $G$ is a GPCR game given by Definition 2.1, but following the specifications of Definition 2.28. Then, we have:
$$w_n(c, r) = 1 \iff \exists c' \in A_{cop}(c, r) : r \preceq_n c'. \quad (8)$$

Proof.
First, observe that the relation $\preceq_n$ compares the positions of the pursuer and the evader. These positions are encoded in the game states $S$ of our model. Moreover, the sets of actions $A$ defined in model 2.28 are in fact restrictions of the sets of actions from model 2.1. Indeed, actions in $A$ directly correspond to game positions, whereas in Definition 2.1 we allow the action sets to be disjoint from the set of states. It is thus possible to define a game $G$ that respects the hypotheses of Definition 2.28 and where Expression (8) is well-defined. A subtle difference between the two formalisms has to do with the turn counters: in the recursion $w_n$ the cops are next to play, while in the relation $\preceq_n$ the robbers are to make their move. This does not change the fact that cops play first in both games. We now prove the result by induction, similarly as in the proof of Proposition 3.2.

Base case: $n = 0$. $w_0(c, r) = 1$ if and only if $(c, r) \in F$, and $(c, r) \in F$ if and only if $r \preceq_0 c$.

Induction step.
Assume the result holds for $n \leq k$ and let us show it for $n = k + 1$. It holds that $w_{k+1}(c, r) = 1$ if and only if $(c, r) \in F$, in which case $r \preceq_{k+1} c$ by definition, or there exists an action $c' \in A_{cop}(c, r)$ for the cops such that no matter the response $r' \in A_{rob}(c', r)$ of the robbers, we have $w_k(c', r') = 1$. By the induction hypothesis, $w_k(c', r') = 1$ if and only if there exists an action $c'' \in A_{cop}(c', r')$ such that $r' \preceq_k c''$. Thus, if the cops play action $c'$, they position themselves on a state in which $r \preceq_{k+1} c'$. Conversely, assume there exists an action $c' \in A_{cop}(c, r)$ such that $r \preceq_{k+1} c'$. Then, by definition, for every response $r' \in A_{rob}(c', r)$ of the robbers there exists an action $c'' \in A_{cop}(c', r')$ of the cops such that $r' \preceq_k c''$. In this case, by the induction hypothesis, we have $w_k(c', r') = 1$. The cops play action $c' \in A_{cop}(c, r)$, in which case we have $w_{k+1}(c, r) = 1$.
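As an illustration, the deterministic recursion (7) can be evaluated directly when specialized to the classic one-cop game on a reflexive graph, where the actions of both players are moves to closed neighbourhoods (the specialization worked out in Section 3 as Equation (9)). The sketch below is ours, not code from the paper; following the watch-zone convention described in Section 3 (the game ends on the robber's turn once the cop has reached her vertex), a cop move onto the robber's vertex is treated as a capture.

```python
from itertools import product

def copwin_values(vertices, edges, n_max):
    """Evaluate the deterministic win recursion of Equation (7) for the
    classic one-cop game on a reflexive graph: w_0(c, r) = 1 iff c = r, and
    w_n(c, r) = max over c' in N[c] of min over r' in N[r] of w_{n-1}(c', r'),
    where a cop move onto the robber's vertex counts as a capture (the
    robber is then caught on her turn, per the watch-zone convention)."""
    nbhd = {v: {v} for v in vertices}   # N[v]: closed neighbourhoods
    for u, v in edges:
        nbhd[u].add(v)
        nbhd[v].add(u)
    w = {(c, r): int(c == r) for c, r in product(vertices, repeat=2)}
    for _ in range(n_max):
        w = {(c, r): 1 if c == r else max(
                 1 if c2 == r else min(w[c2, r2] for r2 in nbhd[r])
                 for c2 in nbhd[c])
             for c, r in product(vertices, repeat=2)}
    return w

# The path on three vertices is copwin: w_n reaches 1 from every state.
vertices = [0, 1, 2]
edges = [(0, 1), (1, 2)]
w = copwin_values(vertices, edges, n_max=len(vertices))
print(all(w[c, r] == 1 for c in vertices for r in vertices))  # True
```

Since the recursion is monotone in $n$, iterating it $|V|$ times suffices on this small example.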
3. A concrete model of GPCR games
In this section we present a more concrete model of GPCR games that is closer to the usual definitions in the literature. We thus specify that the game is played on a graph, without restricting its particular shape. The actions of the players will correspond to paths, as in the game of Cop and Fast Robber [25]. The game presented in Definition 2.1 is abstract because its sets do not depend on any precise structure, and so neither does the algorithmic complexity of computing Equation (5). The point of reformulating Definition 2.1 is to refine some results and formulate them in terms of the graph's structure.

3.1. Definition of concrete Cops and Robbers games
In the game presented below, players walk on paths since, in light of the literature, such actions appear to be the most general. We also grant the cops a watch zone that enables them to capture the robbers whenever they are observed. We write $\mathcal{P}$ for the set of finite paths in a graph and $\mathcal{P}_v \subseteq \mathcal{P}$ for the set of paths that start on vertex $v \in V$. To simplify the notation, we formulate the concrete model in the setting where there are one cop and one robber, and without the auxiliary information set $S_o$. The extension to the general case is straightforward.

Definition 3.1.
A GPCR game $G = (S, i, F, A, T_{cop}, T_{rob})$ with one cop and one robber (Definition 2.1) is concrete if there is a graph $G = (V, E)$ satisfying:

- $S = S_{cop} \times S_{rob}$ is a finite set of configurations of the game.
- $i = (i_{cop}, i_{rob})$, where $i_{cop}, i_{rob} \notin V$.
- $S_{cop} \subseteq V \times \mathcal{P}(E) \cup \{i_{cop}\}$ is the set of configurations of the cop. The second coordinate is the cop's watch zone.
- $S_{rob} \subseteq V \cup \{i_{rob}\}$ is the set of positions of the robber.
- $A_{cop}((c, z), r) \subseteq \mathcal{P}_c \times \mathcal{P}(E)$ is the set of available actions for the cop. He can move along a path from his present position $c$ and choose a watch zone. From the initial state, $A_{cop}(i_{cop}) \subseteq V \times \mathcal{P}(E)$.
- $A_{rob}((c, z), r) \subseteq \mathcal{P}_r$ is the set of available actions for the robber. She can move along a path from her present position $r$. From the initial state, $A_{rob}(i_{rob}) \subseteq V$.

The definition of a play and all previous remarks and details that apply to Definition 2.1 still apply to Definition 3.1. A peculiarity here is that we let the cop have his own watch zone, consisting of a set of edges. Thus, the cop can only capture the robber on the robber's turn. Indeed, since the robber moves along paths, we can explicitly deduce at what point she is susceptible to getting caught crossing the cop's watch zone. It is a natural modeling choice that makes writing the probability of capture easier.
Nowakowski and Winkler's, and Quilliot's, game is now presented in the form of Definition 3.1. In this game, we consider that the game is over not when the cop reaches the same position as the robber, but exactly after that, during the robber's turn, when she tries to escape. This slightly different interpretation leads to the same game. Our presentation allows us to model and solve a more general situation where the robber could have a possibility of escaping, even if the cop reaches her position. Let $G = (V, E)$ be a finite, undirected, reflexive and connected graph and let:
$$S_{cop} = V \times \mathcal{P}(E), \qquad S_{rob} = V,$$
$$A_{cop}(c, r) = \{([c, c'], E_{c'}) \mid [c, c'] \in E\}.$$
The watch zone $E_{c'}$ of the next state is the set of edges adjacent to the cop's next position $c'$. The final states are those in which both players stand on the same vertex, $F = \{(c, r) \in S : c = r\}$. The initial state is $i = (i_{cop}, i_{rob})$ and we let players choose any vertex from it, that is, $A_{cop}(i_{cop}, i_{rob}) = A_{rob}(c, i_{rob}) = V$, with $c \in V$. Finally, the transition probabilities are trivial since the game is deterministic.

Now, in order to show that Equation (5) is well-defined, we demonstrate how it encodes the relation $\preceq_n$ of Nowakowski and Winkler [26]. Since the game is deterministic, Equation (5) reduces to:
$$w_0(c, r) = 1 \iff c = r, \qquad w_n(c, r) = \max_{c' \in N[c]} \; \min_{r' \in N[r]} w_{n-1}(c', r'). \tag{9}$$
This equation is also a particular case of Equation (7). The next proposition shows that Equation (9) simulates the relation $\preceq_n$.

Proposition 3.2.
It holds that $w_n(c, r) = 1$ if and only if there exists a vertex $c' \in N[c]$ such that $r \preceq_n c'$.

Proof. We prove the result by induction. We note that in the recursion $w_n$ it is the cop's turn to play, while in the relation $\preceq_n$ the robber is next to move.

Base case: $n = 0$. $w_0(c, r) = 1$ if and only if $r = c$, and $r = c$ if and only if $r \preceq_0 c$.

Induction step.
Assume the result holds for $n \leq k$ and let us show it holds for $n = k + 1$. Then, $w_{k+1}(c, r) = 1$ if and only if there exists an action $c'$ for the cop from which, no matter the response $r'$ of the robber, we have $w_k(c', r') = 1$. By the induction hypothesis, $w_k(c', r') = 1$ if and only if there exists a vertex $c'' \in N[c']$ such that $r' \preceq_k c''$. Thus, the cop can play action $c' \in N[c]$ and we have $r \preceq_{k+1} c'$. Conversely, if there exists a vertex $c' \in N[c]$ such that $r \preceq_{k+1} c'$, then, by definition, for any action $r' \in N[r]$ of the robber there exists a response $c'' \in N[c']$ of the cop such that $r' \preceq_k c''$. By the induction hypothesis, we thus have $w_k(c', r') = 1$. The cop can therefore play the action $c' \in N[c]$ such that, no matter the answer $r' \in N[r]$ of the robber, $w_k(c', r') = 1$. By definition, we thus have $w_{k+1}(c, r) = 1$.

Definition 3.1 is further illustrated in the following example. It describes the game of Cop and Fast Robber with probability of capture, which is a variant of the one presented by Fomin et al. [10], already mentioned in Example 2.5, and a variant of Example 2.4, where the robber could evade capture. Unsurprisingly, given that both games ask the robbers to move along paths, it is easier to write this new game following Definition 3.1.

For a path $\pi \in \mathcal{P}$ on a graph $G$, we write $\pi[k]$ for its $k$th vertex and $\pi[*]$ for its last one. Let $G = (V, E)$ be a finite graph. Assume that the cop guards a watch zone $C \subset E$ and that each time the robber crosses an edge $e$ she survives her walk with probability $q_C(e)$ (between 0 and 1). In Example 2.4 a capture probability was used; here we define a survival probability, as it is simpler to use in the current context.
Contrary to the Defending Robber game of Example 2.4, the probability of survival depends on the cop's watch zone as well as on the robber's action. Here, only the cop's watch zone and the transition functions are modified compared to Example 2.5. So we have an element $j^* \notin V$ and the set of final states is $F = \{(j^*, \emptyset, j^*)\}$. We write $E_c$ for the set of edges incident to $c$. Similarly, we write $E_\pi$ for the set of edges of a path $\pi$. Let:
$$T_{cop}((c, E_c, r), c') = \begin{cases} \delta_{(c', E_{c'}, r)}, & \text{if } c = i_{cop} \text{ and } c' \in V, \text{ or if } c \in V \text{ and } c' \in N[c]; \\ 0, & \text{otherwise.} \end{cases}$$
The robber's transition function is given by:
$$T_{rob}((c, E_c, r), \pi) = \begin{cases} \delta_{(c, E_c, \pi[*])}, & \text{if } E_\pi \cap E_c = \emptyset; \\ D_{(r, \pi[*])}, & \text{if } E_\pi \cap E_c \neq \emptyset, \end{cases}$$
where $D_{(r, \pi[*])}$ is a function satisfying:
$$D_{(r, \pi[*])}(x) = \begin{cases} \prod_{e \in E_\pi} q_{E_c}(e), & \text{if } x = (c, E_c, \pi[*]); \\ 1 - \prod_{e \in E_\pi} q_{E_c}(e), & \text{if } x = (j^*, \emptyset, j^*). \end{cases}$$
Note that to retrieve the game considered in Example 2.5 and Marcoux's thesis [25], we should rather use a watch zone $E_c$ containing all edges on paths of length 2 from $c$ and change the conditions on $T_{rob}$ to $E_{\pi_1} \cap E_c = \emptyset$ and $E_{\pi_1} \cap E_c \neq \emptyset$, where $\pi_1$ is the subpath of $\pi$ starting in $\pi[1]$.

Since the watch zone is determined by the cop's position, we can use the simplified notation $(c, r)$ for a state $(c, E_c, r)$. Thus, the recursion of Equation (5) can be written as follows. For the jail state, $w_i(j^*, \emptyset, j^*) = 1$ for all $i \geq 0$. For $(c, E_c, r) \neq (j^*, \emptyset, j^*)$, we have $w_0(c, r) = 0$ and, for $n \geq 1$,
$$w_n(c, r) = \max_{c' \in N[c]} \; \min_{\pi \in \mathcal{P}_r} \Big( T_{rob}\big((c', r), \pi, (c', \pi[*])\big)\, w_{n-1}(c', \pi[*]) + T_{rob}\big((c', r), \pi, (j^*, j^*)\big) \Big).$$

Following Proposition 2.24, the algorithmic complexity of the previous recursion is at most $O\big(n \Delta |V|^2 |\mathcal{P}|\big)$, where $\Delta$ is the maximal degree of $G$.
Indeed, $S$ corresponds to the set of pairs of vertices, the cop can only move within his neighbourhood, and the robber is allowed to choose any path of finite length. Hence, even if we restrict the paths the robber can choose to elementary paths (paths that do not cross the same vertex twice), the number of possible robber actions, and therefore the size of $\mathcal{P}$, is exponential in the size of the graph on which the game is played. However, as shown in the next proposition, $w_n$ can be computed in time polynomial in the size of the graph itself.

Proposition 3.3.
Computing $w_n(i)$ in the Cop and Fast Defending Robber game requires at most $O\big(|V|^3 \log |V| + (n+1)|V|^2|E|\big)$ operations and uses at most $O(|V|^3)$ space, for any $n \in \mathbb{N}$.

Proof. Let $\mathcal{P}_r^{r'}$ be the set of paths beginning in $r$ and ending in $r'$. Let $(c, E_c)$ be a cop position. The robber's transition function can be simplified by assuming $q_{E_c}(e) = 1$ if $e \notin E_c$. Then, $T_{rob}((c, r), \pi, (c, \pi[*])) = \prod_{e \in E_\pi} q_{E_c}(e)$ if the robber is not caught on $\pi$. The previous recursion, when state $(c, r)$ is not final, can be simplified to:
$$w_n(c, r) = \max_{c' \in N[c]} \; \min_{\substack{r' \in V \\ \pi \in \mathcal{P}_r^{r'}}} \Bigg( \prod_{e \in E_\pi} q_{E_{c'}}(e)\, w_{n-1}(c', r') + 1 - \prod_{e \in E_\pi} q_{E_{c'}}(e) \Bigg) = \max_{c' \in N[c]} \; \min_{\substack{r' \in V \\ \pi \in \mathcal{P}_r^{r'}}} \Bigg( \big(w_{n-1}(c', r') - 1\big) \prod_{e \in E_\pi} q_{E_{c'}}(e) + 1 \Bigg).$$
If $c'$ and $r'$ are fixed, we look for the path $\pi$ minimizing the expression in parentheses, hence maximizing $\prod_{e \in E_\pi} q_{E_{c'}}(e)$. This is the same path that maximizes $\sum_{e \in E_\pi} \log q_{E_{c'}}(e)$, because $\log$ is a monotone increasing function. Because $\log q_{E_{c'}}(e) \leq 0$, with $q_{E_{c'}}(e) \in (0, 1]$, we can equivalently minimize $\sum_{e \in E_\pi} -\log q_{E_{c'}}(e)$. Observe that the survival probabilities $q_{E_{c'}}(e)$ depend only on the vertex $c'$ and the edge $e$. Thus, prior to evaluating $w_n(c, r)$, we can precompute $|V|$ all-pairs shortest paths (one for each possible cop position) by weighting each edge $e \in E$ with $-\log q_{E_x}(e)$ for each source $x \in V$. This is done in $O\big(|E||V|^2 + |V|^3 \log |V|\big)$ operations, for example by using the algorithm of Fredman and Tarjan [12]. This takes $O(|V|^3)$ space (the paths themselves do not have to be stored; a path can be recomputed in $O(\Delta|V|)$).
Thus, for each $c'$ we store the values $\prod_{e \in E_\pi} q_{E_{c'}}(e)$ for the shortest path $\pi$ between $r$ and $r'$. Finding the next robber position $r'$ thus requires at most $O(|V|)$ operations. Now, assume $w_{n-1}(c', r')$ is computed for all $c', r'$, in time $a_{n-1}$. We look for the vertex $c' \in N[c]$ maximizing
$$\min_{\substack{r' \in V \\ \pi \in \mathcal{P}_r^{r'}}} \Big( \big(w_{n-1}(c', r') - 1\big) \prod_{e \in E_\pi} q_{E_{c'}}(e) + 1 \Big).$$
The values $w_{n-1}(c', r')$ are already computed, as are the products $\prod_{e \in E_\pi} q_{E_{c'}}(e)$. Thus, at most $O(|N[c]||V|)$ operations are required to evaluate $w_n(c, r)$ when $c$ and $r$ are fixed. To find all maxima, that is, for all $c \in V$ and $r \in V$, we need at most $O\big(|V|^2 \sum_{c \in V} |N[c]|\big) = O(2|V|^2|E|)$ operations. On turn $n$, we make a number of operations $a_n \in O(|V|^2|E|) + a_{n-1} \subseteq O(n|V|^2|E|)$. The total complexity is thus:
$$O\big(|E||V|^2 + |V|^3 \log |V|\big) + a_n = O\big((n+1)|E||V|^2 + |V|^3 \log |V|\big).$$
The bottleneck for the spatial complexity is the shortest path computations, which require at most $O(|V|^3)$ space. For each $w_n$ we only need $w_{n-1}$, so we do not need to store any other $w_k$ for $k < n - 1$.

An important aspect of the fast robber game is its ability to model situations of imperfect information in which the cops only gather information on the robber's position at regular intervals. This game, deemed with witness, is shown by Chalopin et al. [6] to correspond to the game of Cop and Fast Robber (without watch zone). In essence, the authors present an equivalence between the classes of copwin graphs in the witness game and in the fast robber one. We can wonder whether the same could be said of the stochastic case.
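The $-\log$ transformation at the heart of this proof can be made concrete. The sketch below is our own illustration (the function name and example weights are not from the paper): for one fixed cop position, it computes the robber's maximum survival probability between every pair of vertices by running Dijkstra's algorithm on edges weighted $-\log q(e)$, exactly the precomputation step used above. A binary heap stands in for the Fibonacci heaps of Fredman and Tarjan [12], at a slightly worse asymptotic bound.

```python
import heapq
import math

def max_survival_paths(vertices, edges, q):
    """For a fixed cop position, the robber's best path from r to r'
    maximizes the product of survival probabilities q(e) over its edges;
    equivalently, it minimizes sum_e -log q(e) >= 0, a standard
    shortest-path problem.  `q` maps an undirected edge (u, v) to a
    survival probability in (0, 1].  Returns best[r][r'], the maximum
    over paths from r to r' of the product of the q(e)."""
    adj = {v: [] for v in vertices}
    for (u, v) in edges:
        w = -math.log(q[(u, v)])      # nonnegative weight
        adj[u].append((v, w))
        adj[v].append((u, w))
    best = {}
    for src in vertices:              # Dijkstra from every source
        dist = {v: math.inf for v in vertices}
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue              # stale heap entry
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        best[src] = {v: math.exp(-dist[v]) for v in vertices}
    return best

# Triangle: taking two "safe" edges can beat one risky edge.
vertices = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]
q = {(0, 1): 0.9, (1, 2): 0.9, (0, 2): 0.5}
best = max_survival_paths(vertices, edges, q)
print(round(best[0][2], 2))  # 0.81: the two-edge detour survives more often
```

Running this once per cop position, as in the proof, yields the table of products the recursion consults.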
Let us revisit the Cop and Drunk Robber game of Example 2.3 with the concrete model and Equation (5). We can show that it is always easier to capture a robber moving randomly than one playing optimally. For the sake of generality, assume the robber can play according to any distribution in
$\mathrm{Dist}(N[r])$ when she finds herself on vertex $r$. Let $\varphi \in \big(\mathrm{Dist}(N[r])\big)_{r \in V}$ be a sequence of distributions on the vertices $V$ and $\varphi_r$ its component that is in $\mathrm{Dist}(N[r])$. Then, we write $w_n^\varphi(c, r)$ for the recursion in which $T_{rob}((c, r), r') = \varphi_r$. In other words, we write:
$$w_n^\varphi(c, r) = \max_{c' \in N[c]} \sum_{r' \in N[r]} \varphi_r(r')\, w_{n-1}^\varphi(c', r'), \tag{10}$$
if $c \neq r$, and $w_n^\varphi(c, r) = 1$ if $c = r$, for all $n \geq 0$. The classic recursion from Equation (9) is written $w_n(c, r)$.

Proposition 3.4.
It is always easier to capture a robber playing randomly than an adversarial one, that is, $w_n^\varphi(c, r) \geq w_n(c, r)$.

Proof.
We write $\delta_{N[r]}$ for the set of Dirac distributions defined on $N[r]$. The robber would be harder to capture if she could minimize her probability of capture over distributions, and since her optimal strategy is deterministic, it suffices to minimize over Dirac distributions. Thus, we compute:
$$\begin{aligned}
w_n^\varphi(c, r) &:= \max_{c' \in N[c]} \sum_{r' \in N[r]} \varphi_r(r')\, w_{n-1}^\varphi(c', r') \\
&\geq \max_{c' \in N[c]} \; \min_{\psi \in (\mathrm{Dist}(N[r]))_{r \in V}} \sum_{r' \in N[r]} \psi_r(r')\, w_{n-1}^\psi(c', r') \\
&= \max_{c' \in N[c]} \; \min_{\psi \in (\delta_{N[r]})_{r \in V}} \sum_{r' \in N[r]} \psi_r(r')\, w_{n-1}^\psi(c', r') \\
&= \max_{c' \in N[c]} \; \min_{r' \in N[r]} w_{n-1}(c', r').
\end{aligned}$$
The first line is the definition of Equation (10) with the robber playing according to the distribution $\varphi$. If she could choose this distribution, she could pick a distribution $\psi \in \big(\mathrm{Dist}(N[r])\big)_{r \in V}$ ensuring her a greater probability of survival, which justifies the second line. Then, we observe that since her optimal strategy is deterministic, it corresponds to a sequence of Dirac distributions and she loses nothing in playing according to $\psi \in \big(\delta_{N[r]}\big)_{r \in V}$. The last line is simply the preceding one rewritten without distributions, as in this case $\psi_r$ is concentrated on a single vertex $r' \in N[r]$.

In graph theory, one can define many random processes that stochastically generate graphs varying at each time step. One thus obtains a sequence of graphs $G_1, G_2, \ldots$ that represents the evolution of a network over time. These graphs are called dynamic graphs, link streams, time-varying graphs or temporal networks, depending on the community, and can model, for example, the destruction of a bridge or of a road that makes it impossible for the players to pass through it. Suppose $k$ cops are chasing $l$ robbers on the sequence $G_1, G_2, \ldots$.
In order to take into account the variable nature of the underlying structure of a game from Definition 3.1, we can make use of the component $S_o$ as a turn counter. Let $G_t = (V_t, E_t)$ be the graph generated at time $t$, $S_t = V_t^k \times \mathcal{P}(E_t)^k \times V_t^l \times \{t\}$ and $S = \bigcup_{t=1}^{\infty} S_t$. Hence, at each time step $t$ a new graph $G_t$ is created according to a certain process and the set of states is renewed. The sets of actions can also be redefined. Let $\mathcal{P}_u^{G_t}$ be the set of finite paths on $G_t$ that begin on vertex $u \in V_t$. The sets of actions are thus:
$$A_{cop}(c, C, r, t) \subseteq \prod_{i=1}^{k} \mathcal{P}_{c_i}^{G_t} \times \mathcal{P}(E_t); \qquad A_{rob}(c, C, r, t) \subseteq \prod_{i=1}^{l} \mathcal{P}_{r_i}^{G_t}.$$
Since this example is rather general, we leave the transition functions undefined. We require, however, that the transition functions follow the arrow of time: if $s_t \in S_t$ is a game state at time $t$ and $a_{cop} \in A_{cop}(s_t)$ is a cop action, then $T_{cop}(s_t, a_{cop})(X) > 0$ only if $X \subseteq S_{t+1}$. The same holds for the robbers.
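Before moving on, Proposition 3.4 lends itself to a quick numerical check. The sketch below is an illustration of ours, not code from the paper: it iterates Equation (10) with a uniformly drunk robber alongside the adversarial recursion of Equation (9) on the 4-cycle, a graph that is not copwin for one cop, and confirms that the drunk values dominate pointwise. As before, a cop move onto the robber's vertex is treated as a capture, per the watch-zone convention.

```python
from itertools import product

def compare_drunk_and_adversarial(vertices, edges, n):
    """Iterate Equation (10) with phi_r uniform on N[r] (drunk robber) and
    Equation (9) (adversarial robber) side by side.  Proposition 3.4 states
    that the drunk capture probabilities dominate the adversarial ones."""
    nbhd = {v: {v} for v in vertices}  # reflexive: players may stand still
    for u, v in edges:
        nbhd[u].add(v)
        nbhd[v].add(u)
    base = {(c, r): float(c == r) for c, r in product(vertices, repeat=2)}
    w_phi, w_adv = dict(base), dict(base)
    for _ in range(n):
        w_phi = {(c, r): 1.0 if c == r else max(
                     1.0 if c2 == r else
                     sum(w_phi[c2, r2] for r2 in nbhd[r]) / len(nbhd[r])
                     for c2 in nbhd[c])
                 for c, r in product(vertices, repeat=2)}
        w_adv = {(c, r): 1.0 if c == r else max(
                     1.0 if c2 == r else min(w_adv[c2, r2] for r2 in nbhd[r])
                     for c2 in nbhd[c])
                 for c, r in product(vertices, repeat=2)}
    return w_phi, w_adv

# On the 4-cycle one cop never catches an adversarial robber at distance 2,
# but a drunk robber is caught with probability tending to 1.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
w_phi, w_adv = compare_drunk_and_adversarial(vertices, edges, n=25)
print(all(w_phi[s] >= w_adv[s] for s in w_phi))  # True
```

On this instance the adversarial value of the antipodal states stays at 0 while the drunk value satisfies $x_{n+1} = (2 + x_n)/3$ and converges to 1.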
4. Conclusion
This paper presented a relatively simple yet very general model for describing games of Cops and Robbers that, notably, may include stochastic aspects. The game $G$ was presented along with a method of resolution in the form of a recursion $w_n$ in Theorem 2.13. We showed in Proposition 2.16 that we can always retrieve an $\epsilon$-optimal strategy for $G$ from the recursion $w_n$ (for large enough $n$). Moreover, in Proposition 2.26 we showed that if the recursion becomes stationary, stationarity must occur at index at most $|S|$. This is a first step in the analysis of the rate of convergence of the recursion.

We have shown how some classic Cops and Robbers games can be written in our model and extended. Many more games could now be studied as GPCR games, such as the Firefighting game, under certain conditions, in which a team of firefighters seeks to prevent the nodes of a graph from burning. An interesting notion captured by our framework, in Definition 3.1, is that of the surveillance zones of the cops, which can be chosen at each step. Thus, we claim a wide variety of games of Cops and Robbers can be solved with the concepts developed in this paper. Furthermore, such a broad exposition of games of Cops and Robbers enables one to study the effects of modifying certain rules, for example the number of cops or the speed of the players. That is, one can use Equation (5) and probe its values in order to test how modifying these rules affects the ability of the cops to capture the robbers.

We have extended the classic notion of cop number with the $p$-cop number, although the question of the behaviour of this function remains open. The expected capture time of the robbers is also of great interest. This function can now be studied on large swaths of Cops and Robbers games. In part, this question can be motivated by a paper of Simard et al. [31] on the relation between an Operations Research problem and the resolution of a Cop and Drunk Robber game.
Specifically, the authors tackled the problem of upper bounding the probability of detecting a hidden and randomly moving object on a graph with a single optimally moving searcher. This problem, being NP-hard [32], is constrained to be solved in a maximum number of time steps $T \in \mathbb{N}$. In particular, it appears that if one could tightly upper bound the expected capture time of a game derived from Definition 2.1, then one could, following the ideas presented in this paper, deduce the optimal number of searchers to send on a mission to rescue the object. Then, if this number were deduced, one could further apply the ideas of this article along with Equation (5) in order to help solve this search problem with multiple searchers.

Finally, a last avenue of research worth mentioning, and possibly of most interest to researchers in robotics and operations research, concerns the extension of model 2.1 to games of imperfect information. Imperfect information refers to the lack of knowledge of one or both players. Cops and Robbers games of imperfect information thus contain games in which robbers are invisible, which can model problems of graph search such as the one mentioned above. Game theory seems apt to enable the transition from perfect information to imperfect information games with the use of belief states. Such a generalization could be paired with the branch and bound method presented in Simard et al. [31] in order to solve more general search problems.

In light of the literature on Cops and Robbers games, it appears this paper distances itself from most studies on the subject. Indeed, we do not claim any results on typical Cops and Robbers questions such as the asymptotic behaviour of $c_p(G)$ or dismantling schemes characterizing classes of winning graphs. However, we think that modelling such a wide variety of games opens the door to further studies on Cops and Robbers games, which can now be tackled in their generality, which was not possible before.
Thus, although our model may not enable one to compute analytical solutions to classical questions of Cops and Robbers games, we have good hope that algorithmic ones will be devised in order to solve more general problems on classes, not of graphs, but of games. In short, it appears that new and promising avenues of research have come to light with the objects presented in this paper, and we hope researchers will be driven to tackle the open questions that were unearthed.

Acknowledgement
The authors acknowledge the careful reading of reviewers, which has helpedimprove the paper presentation. Josée Desharnais and François Laviolette ac-knowledge the support of the Natural Sciences and Engineering Research Coun-cil of Canada (NSERC, grant numbers 239294 and 262067).
References

[1] M. Aigner and M. Fromme. A game of cops and robbers. Discrete Applied Mathematics, 8(1):1–12, 1984.
[2] Anthony Bonato and Gary MacGillivray. Characterizations and algorithms for generalized cops and robbers games. Contributions to Discrete Mathematics, 12(1):1–10, 2017.
[3] Anthony Bonato, Dieter Mitsche, Xavier Pérez-Giménez, and Paweł Prałat. A probabilistic version of the game of zombies and survivors on graphs. Theoretical Computer Science, 655:2–14, 2016.
[4] Anthony Bonato and Richard J. Nowakowski. The Game of Cops and Robbers on Graphs. American Mathematical Society, 2011.
[5] M. Boyer, S. El Harti, A. El Ouarari, R. Ganian, T. Gavenciak, G. Hahn, C. Moldenauer, I. Rutter, B. Thériault, and M. Vatshelle. Cops-and-robbers: remarks and problems. Journal of Combinatorial Mathematics and Combinatorial Computing, 85, 2013.
[6] Jérémie Chalopin, Victor Chepoi, Nicolas Nisse, and Yann Vaxès. Cop and robber games when the robber can hide and ride. SIAM Journal on Discrete Mathematics, 25(1):333–359, 2011.
[7] Nancy E. Clarke and Gary MacGillivray. Characterizations of k-copwin graphs. Discrete Mathematics, 312(8):1421–1425, 2012.
[8] Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992.
[9] John Horton Conway. On Numbers and Games. IMA, 1976.
[10] Fedor V. Fomin, Petr A. Golovach, Jan Kratochvíl, Nicolas Nisse, and Karol Suchan. Pursuing a fast robber on a graph. Theoretical Computer Science, 411(7–9):1167–1181, 2010.
[11] Fedor V. Fomin and Dimitrios M. Thilikos. An annotated bibliography on guaranteed graph searching. Theoretical Computer Science, 399(3):236–245, 2008.
[12] Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34(3):596–615, 1987.
[13] Hugo Gimbert and Florian Horn. Simple stochastic games with few random vertices are easy to solve. In International Conference on Foundations of Software Science and Computational Structures, pages 5–19. Springer, 2008.
[14] Andrzej Granas and James Dugundji. Fixed Point Theory. Springer-Verlag, 2003.
[15] Geňa Hahn and Gary MacGillivray. A note on k-cop, l-robber games on graphs. Discrete Mathematics, 306(19):2492–2497, 2006. Creation and Recreation: A Tribute to the Memory of Claude Berge.
[16] Ath. Kehagias. Generalized cops and robbers: a multi-player pursuit game on graphs. Dynamic Games and Applications, Nov 2018.
[17] Ath. Kehagias and G. Konstantinidis. Selfish cops and passive robber: qualitative games. Theoretical Computer Science, 680:25–35, 2017.
[18] Athanasios Kehagias, Dieter Mitsche, and Paweł Prałat. Cops and invisible robbers: the cost of drunkenness. Theoretical Computer Science, 481:100–120, 2013.
[19] Athanasios Kehagias and Paweł Prałat. Some remarks on cops and drunk robbers. Theoretical Computer Science, 463:133–147, 2012. Special Issue on Theory and Applications of Graph Searching Problems.
[20] William B. Kinnersley. Cops and Robbers is EXPTIME-complete. Journal of Combinatorial Theory, Series B, 111:201–220, 2015.
[21] Natasha Komarov. Expected Capture Time in Variants of Cops & Robbers Games. PhD thesis, Dartmouth College, 2013.
[22] Natasha Komarov and Peter Winkler. Capturing the drunk robber on a graph. The Electronic Journal of Combinatorics, 21(3):14, 2014.
[23] G. Konstantinidis and A. Kehagias. Selfish cops and active robber: multi-player pursuit evasion on graphs. Theoretical Computer Science, 780:84–102, 2019.
[24] G. Konstantinidis and Ath. Kehagias. Simultaneously moving cops and robbers. Theoretical Computer Science, 645:48–59, 2016.
[25] Héli Marcoux. Jeux de poursuite policier-voleur sur un graphe, le cas du voleur rapide. Master's thesis, Université Laval, 2014.
[26] Richard Nowakowski and Peter Winkler. Vertex-to-vertex pursuit in a graph. Discrete Mathematics, 43(2–3):235–239, 1983.
[27] Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, 1994.
[28] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
[29] Alain Quilliot. Problèmes de jeux, de point fixe, de connectivité et de représentation sur des graphes, des ensembles ordonnés et des hypergraphes. Doctoral thesis (thèse d'état), Université de Paris VI, France, 1983.
[30] Lloyd S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
[31] Frédéric Simard, Michael Morin, Claude-Guy Quimper, François Laviolette, and Josée Desharnais. Bounding an optimal search path with a game of cop and robber on graphs. In Principles and Practice of Constraint Programming: 21st International Conference, CP 2015, Cork, Ireland, August 31–September 4, 2015, Proceedings, volume 9255, pages 403–418. Springer, 2015.
[32] K. E. Trummel and J. R. Weisinger. The complexity of the optimal searcher path problem. Operations Research, 34(2):324–327, 1986.
[33] J. van der Wal and J. Wessels. On Markov games. Memorandum COSOR, Technische Hogeschool Eindhoven, 1975.
Appendices
A. Constructing a GPCR game as a Simple Stochastic Game
The following argument is inspired by the SSG exposition of Gimbert and Horn [13]. A simple stochastic game is a tuple $(V, V_{max}, V_{min}, V_R, E, t, p)$, where $(V, E)$ describes a directed graph $G$ and $V_{max}, V_{min}, V_R$ form a partition of $V$. There is a special vertex $t \in V$, called the target, and $p$ is a probability function such that for every vertex $w \in V$ and $v \in V_R$, $p(w \mid v)$ is the probability of transiting from $v$ to $w$. There are two players, max and min, and the game is played with perfect information. The set $V_{max}$ contains the nodes controlled by player max, i.e. where this player is next to play, and $V_{min}$ those controlled by player min. The set of edges $E$ is defined by the possible moves in the game. The game proceeds as follows: imagine a token placed on some initial vertex $i \in V_{max} \cup V_{min}$; the player who is next to play moves the token along an edge, after which either the token is again on some vertex of $V_{max} \cup V_{min}$ where one player has to make a move, or it is on some vertex $v$ of $V_R$. When the token is on $v$, an outneighbour of $v$ is chosen randomly according to the distribution $p(\cdot \mid v)$, and the token is moved there. The game ends if $t$ is ever reached, in which case max wins; otherwise it continues indefinitely and the other player wins.

Following Gimbert and Horn, we define a play as an infinite sequence of vertices $v_0 v_1 \ldots$ of $G$ such that $(v_i, v_{i+1}) \in E$ for all $i$, and a finite play (what we called a history) as a finite prefix of a play. A strategy for max is a function $\sigma : V^* V_{max} \to V$ and a strategy for min is a function $\tau : V^* V_{min} \to V$, where $V^*$ is the set of finite plays. We suppose that for each finite play $(v_0 \ldots v_n)$ and vertex $v \in V_{max}$, $(v, \sigma(v_0 \ldots v_n v)) \in E$, and similarly for $\tau$. Note that such strategies are deterministic, which is without loss of generality. For convenience, we write $\Gamma_{max}$ and $\Gamma_{min}$ for the sets of max (resp. min) strategies.
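To make these definitions concrete, here is a small simulation sketch of ours (the function name and the toy game are illustrative, not taken from Gimbert and Horn): it estimates the reach probability $p(t \mid \sigma, \tau, v)$ by Monte Carlo. For simplicity the strategies are memoryless maps from vertices to vertices, a special case of the general strategies on finite plays defined above.

```python
import random

def reach_probability(v_max, v_min, p, target, sigma, tau, start,
                      trials=20000, max_steps=100, seed=0):
    """Monte-Carlo estimate of p(t | sigma, tau, v): the probability that
    the target t of a simple stochastic game is reached from `start` when
    max plays sigma and min plays tau.  `p[v]` maps each random vertex v
    to a dict of successor probabilities; plays are truncated at max_steps."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        v = start
        for _ in range(max_steps):
            if v == target:
                wins += 1
                break
            if v in v_max:
                v = sigma[v]          # max moves the token
            elif v in v_min:
                v = tau[v]            # min moves the token
            else:                     # random vertex: sample from p(. | v)
                succs = list(p[v])
                v = rng.choices(succs, [p[v][w] for w in succs])[0]
    return wins / trials

# max at vertex 0 must pass through the random vertex 1, which reaches the
# target 't' with probability 1/2 and otherwise sends the token to the min
# vertex 2, from which min loops forever; the true value of vertex 0 is 1/2.
v_max, v_min = {0}, {2}
p = {1: {'t': 0.5, 2: 0.5}}
sigma, tau = {0: 1}, {2: 2}
est = reach_probability(v_max, v_min, p, 't', sigma, tau, start=0)
print(abs(est - 0.5) < 0.02)  # True with high probability
```

Fixing the seed makes the estimate reproducible; exact values are what the $\mathrm{val}$ operators defined next optimize over all strategy pairs.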
Now, for any node $v \in V$, we can define the value of $v$ for max (resp. min) as the probability that the target node is reached from that node. If $p(t \mid \sigma, \tau, v)$ is the probability that $t$ is reached from $v$ under strategies $\sigma$ and $\tau$, then we let
$$\mathrm{val}_{max}(v) := \sup_{\sigma \in \Gamma_{max}} \; \inf_{\tau \in \Gamma_{min}} p(t \mid \sigma, \tau, v), \qquad \mathrm{val}_{min}(v) := \inf_{\tau \in \Gamma_{min}} \; \sup_{\sigma \in \Gamma_{max}} p(t \mid \sigma, \tau, v).$$
The following theorem [8, 13, 30] about simple stochastic games is well known.
Theorem A.1.