Comparing Labelled Markov Decision Processes
Stefan Kiefer
Department of Computer Science, University of Oxford, [email protected]
Qiyi Tang
Department of Computer Science, University of Oxford, [email protected]
Abstract
A labelled Markov decision process is a labelled Markov chain with nondeterminism, i.e., together with a strategy a labelled MDP induces a labelled Markov chain. The model is related to interval Markov chains. Motivated by applications of equivalence checking for the verification of anonymity, we study the algorithmic comparison of two labelled MDPs, in particular, whether there exist strategies such that the MDPs become equivalent/inequivalent, both in terms of trace equivalence and in terms of probabilistic bisimilarity. We provide the first polynomial-time algorithms for computing memoryless strategies to make the two labelled MDPs inequivalent if such strategies exist. We also study the computational complexity of qualitative problems about making the total variation distance and the probabilistic bisimilarity distance less than one or equal to one.
2012 ACM Subject Classification Theory of computation → Program verification; Theory of computation → Models of computation; Mathematics of computing → Probability and statistics
Keywords and phrases
Markov decision processes, Markov chains, Behavioural metrics
Related Version
The paper is accepted to FSTTCS 2020.
Funding
Stefan Kiefer: Supported by a Royal Society University Research Fellowship.
Acknowledgements
We thank the anonymous reviewers of this paper for their constructive feedback.
Given a model of computation (e.g., finite automata), and two instances of it, are they semantically equivalent (i.e., do they accept the same language)? Such equivalence problems can be viewed as a fundamental question for almost any model of computation. As such, they permeate computer science, in particular, theoretical computer science.

In labelled Markov chains (LMCs), which are Markov chains whose states (or, equivalently, transitions) are labelled with an observable letter, there are two natural and very well-studied versions of equivalence, namely trace (or language) equivalence and probabilistic bisimilarity. The trace equivalence problem has a long history, going back to Schützenberger [36] and Paz [31], who studied weighted and probabilistic automata, respectively. Those models generalize LMCs, but the respective equivalence problems are essentially the same. It can be extracted from [36] that equivalence is decidable in polynomial time, using a technique based on linear algebra. Variants of this technique were developed in [42, 17]. More recently, the efficient decidability of the equivalence problem was exploited, both theoretically and practically, for the verification of probabilistic systems, see, e.g., [23, 24, 32, 30, 28]. In those works, equivalence naturally expresses properties such as obliviousness and anonymity, which are difficult to formalize in temporal logic. In a similar vein, inequivalence can mean detectability and the lack of anonymity.
Probabilistic bisimilarity is an equivalence that was introduced by Larsen and Skou [27]. It is finer than trace equivalence, i.e., probabilistic bisimilarity implies trace equivalence. A similar notion for Markov chains, called lumpability, can be traced back at least to the classical text by Kemeny and Snell [22]. Probabilistic bisimilarity can also be computed in polynomial time [3, 14, 43]. Indeed, in practice, computing the bisimilarity quotient is fast and has become a backbone for highly efficient tools for probabilistic verification such as Prism [26] and Storm [20].

© Stefan Kiefer and Qiyi Tang; licensed under Creative Commons License CC-BY. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

In this paper, we study equivalence problems for (labelled) Markov decision processes (MDPs), which are LMCs plus nondeterminism, i.e., each state may have several actions (or "moves"), one of which is chosen by a controller, potentially randomly. An MDP and a controller strategy together induce an LMC (potentially with infinite state space, depending on the complexity of the strategy). The nondeterminism in MDPs gives rise to a spectrum of equivalence queries: one may ask about the existence of strategies for two given MDPs such that the induced LMCs become trace/bisimulation equivalent, or such that they become trace/bisimulation inequivalent. Another potential dimension of this spectrum is whether to consider general strategies or more restricted ones, such as memoryless or even memoryless deterministic (MD) ones.

In this paper, we focus on memoryless strategies, for several reasons. First, these questions for unrestricted strategies quickly lead to undecidability. For example, in [18, Theorem 3.1] it was shown that whether there exists a general strategy such that a given MDP becomes trace equivalent with a given LMC is undecidable. Second, memoryless strategies are sufficient for a wide range of objectives in MDPs, and their simplicity means that even if it were known that a general strategy exists to accomplish (in)equivalence, one might still wonder if there also exists a memoryless strategy. Third, probabilistic bisimilarity is a less natural notion for LMCs induced by general strategies: such LMCs will in general have an infinite state space, even when the MDP is finite. Fourth, applying a memoryless strategy in an MDP is related to choosing an instance of an interval Markov chain (IMC). IMCs are like Markov chains, but the transitions are labelled not with probabilities but with probability intervals.
IMCs were introduced by Jonsson and Larsen [21] and have been well studied in verification-related domains [37, 8, 13, 4, 7], but also in areas such as systems biology, security or communication protocols, see, e.g., [12]. Selecting a memoryless strategy in an MDP corresponds to selecting a probability from each interval (one out of generally uncountably many). Parametric Markov chains and parametric MDPs are further related models, see, e.g., [19, 45] and the references therein.

LMCs can also be compared in terms of their distance. We consider two natural distance functions between two LMCs: the total variation distance (between the two trace distributions) and the probabilistic bisimilarity distance [16]. Both distances can be at most 1. The total variation (resp. probabilistic bisimilarity) distance is 0 if and only if the LMCs are trace equivalent (resp. probabilistic bisimilar). Further, the probabilistic bisimilarity distance is an upper bound on the total variation distance [9]. It was shown in [10] (resp. [40]) that whether the total variation (resp. probabilistic bisimilarity) distance of two LMCs equals 1 can be decided in polynomial time. This raises the question whether these results can be extended to MDPs, i.e., what is the complexity of deciding whether there exists a memoryless strategy to make the distance less than 1 or equal to 1, respectively. It turns out that some of these problems are closely related to the corresponding (in)equivalence problem.

Instead of comparing two MDPs with initial distributions/states, one may equivalently compare two initial distributions/states in a single MDP (by taking a disjoint union of the states). In this paper we study the computational complexity of the following problems:
- TV = 0 (resp. TV > 0): is there a memoryless strategy such that the two initial distributions are (resp. are not) trace equivalent in the induced labelled Markov chain;
- TV = 1 (resp. TV < 1): is there a memoryless strategy such that the total variation distance of the two initial distributions equals 1 (resp. is less than 1);
- PB = 0 (resp. PB > 0): is there a memoryless strategy such that the two initial states are (resp. are not) probabilistic bisimilar;
- PB = 1 (resp. PB < 1): is there a memoryless strategy such that the probabilistic bisimilarity distance of the two initial states equals 1 (resp. is less than 1).

Problem | Complexity
TV = 0  | ∃R-complete
TV > 0  | P
TV = 1  | NP-hard and in ∃R
TV < 1  | ∃R-complete
PB = 0  | NP-complete
PB > 0  | P
PB = 1  | NP-complete
PB < 1  | NP-complete

Table 1
Summary of the results.

These results also imply results for the problems which state "for all memoryless strategies". For example, since TV > 0 is in P, one can also decide in polynomial time whether the two initial distributions are trace equivalent under all memoryless strategies.

We write R for the set of real numbers and N for the set of nonnegative integers. Let S be a finite set. We denote by Distr(S) the set of probability distributions on S. By default we view vectors, i.e., elements of R^S, as row vectors. For a vector µ ∈ [0,1]^S we write |µ| := Σ_{s∈S} µ(s) for its L1-norm. A vector µ ∈ [0,1]^S is a distribution (resp. subdistribution) over S if |µ| = 1 (resp. 0 < |µ| ≤ 1). We write 𝟙 ∈ {0,1}^S and 𝟘 ∈ {0,1}^S for the column vectors all whose entries are 1 and 0, respectively. For s ∈ S we write δ_s for the (Dirac) distribution over S with δ_s(s) = 1 and δ_s(r) = 0 for r ∈ S ∖ {s}. For a (sub)distribution µ we write support(µ) = {s ∈ S | µ(s) > 0} for its support.

A labelled Markov chain (LMC) is a quadruple ⟨S, L, τ, ℓ⟩ consisting of a nonempty finite set S of states, a nonempty finite set L of labels, a transition function τ : S → Distr(S), and a labelling function ℓ : S → L. We denote by τ(s)(t) the transition probability from s to t. Similarly, we denote by τ(s)(E) = Σ_{t∈E} τ(s)(t) the transition probability from s to E ⊆ S. A trace in an LMC M is a sequence of labels w = a_1 a_2 ⋯ a_n where a_i ∈ L. We denote by L^{≤n} the set of traces of length at most n. Let M : L → [0,1]^{S×S} specify the transitions, so that Σ_{a∈L} M(a) is a stochastic matrix, M(a)(s,t) = τ(s)(t) if ℓ(s) = a, and M(a)(s,t) = 0 otherwise. We extend M to the mapping M : L* → [0,1]^{S×S} with M(w) = M(a_1) ⋯ M(a_n) for a trace w = a_1 ⋯ a_n. If the LMC is in state s, then with probability M(w)(s,s′) it emits the trace w and moves to state s′ in |w| steps. For a trace w ∈ L*, we define Run(w) := {w}L^ω; i.e., Run(w) is the set of infinite traces starting with w.
To an initial distribution π on S, we associate the probability space (L^ω, F, Pr_{M,π}), where F is the σ-field generated by all basic cylinders Run(w) with w ∈ L* and Pr_{M,π} : F → [0,1] is the unique probability measure such that Pr_{M,π}(Run(w)) = |πM(w)|. We generalize the definition of Pr_{M,π} to subdistributions π in the obvious way, yielding sub-probability measures. We may drop the subscript M if it is clear from the context.

Given two initial distributions µ and ν, the total variation distance between µ and ν is defined as follows:

d_tv(µ, ν) = sup_{E∈F} |Pr_µ(E) − Pr_ν(E)|.

We write µ ≡ ν to denote that µ and ν are trace equivalent, i.e., Pr_µ(Run(w)) = Pr_ν(Run(w)) holds for all w ∈ L*. We have that trace equivalence and the total variation distance being zero are equivalent [10, Proposition 3(a)].

The pseudometric probabilistic bisimilarity distance of Desharnais et al. [15], which we denote by d_pb, is a function from S × S to [0,1]. It can be defined as the least fixed point of the following function:

∆(d)(s,t) = 1 if ℓ(s) ≠ ℓ(t), and ∆(d)(s,t) = min_{ω ∈ Ω(τ(s),τ(t))} Σ_{u,v∈S} ω(u,v) d(u,v) otherwise,

where the set Ω(µ,ν) of couplings of µ, ν ∈ Distr(S) is defined as

Ω(µ,ν) = {ω ∈ Distr(S × S) | Σ_{t∈S} ω(s,t) = µ(s) ∧ Σ_{s∈S} ω(s,t) = ν(t)}.

Note that a coupling ω ∈ Ω(µ,ν) is a joint probability distribution with marginals µ and ν (see, e.g., [5, pages 260-262]).

An equivalence relation R ⊆ S × S is a probabilistic bisimulation if for all (s,t) ∈ R, ℓ(s) = ℓ(t) and τ(s)(E) = τ(t)(E) for each R-equivalence class E. Probabilistic bisimilarity, denoted by ∼_M (or ∼ when M is clear), is the largest probabilistic bisimulation.
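To make these definitions concrete, the following minimal Python sketch (our illustration, not code from the paper) computes the probabilistic bisimilarity classes of a finite LMC by naive partition refinement; the dictionary encoding of ℓ and τ is an assumption:

```python
from fractions import Fraction

def bisimilarity_classes(states, label, tau):
    """Compute the probabilistic bisimilarity classes of a finite LMC.

    label: dict mapping each state to its label.
    tau:   dict mapping each state to a dict {successor: probability}.
    """
    def signature(s, blocks):
        # Probability mass that s sends into each block of the partition.
        return tuple(sum(tau[s].get(t, Fraction(0)) for t in b) for b in blocks)

    # Start from the partition induced by the labelling function.
    partition = [frozenset(s for s in states if label[s] == l)
                 for l in sorted({label[s] for s in states})]
    while True:
        refined = {}
        for bi, block in enumerate(partition):
            for s in block:  # split each block by the signatures of its states
                refined.setdefault((bi, signature(s, partition)), set()).add(s)
        new_partition = [frozenset(b) for b in refined.values()]
        if len(new_partition) == len(partition):  # fixed point reached
            return set(partition)
        partition = new_partition
```

Each refinement step strictly increases the number of blocks, so the loop runs at most |S| − 1 times; this is the same termination argument used for Algorithm 2 later in the paper.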
For all s, t ∈ S, s ∼ t if and only if d_pb(s,t) = 0 [15, Theorem 1].

A (labelled) Markov decision process (MDP) is a tuple ⟨S, A, L, ϕ, ℓ⟩ consisting of a finite set S of states, a finite set A of actions, a finite set L of labels, a partial function ϕ : S × A ⇀ Distr(S) denoting the probabilistic transitions, and a labelling function ℓ : S → L. The set of available actions in a state s is A(s) = {m ∈ A | ϕ(s,m) is defined}. A memoryless strategy for an MDP is a function α : S → Distr(A) that, given a state s, returns a probability distribution on the available actions at that state. Such strategies are also known as positional, as they do not depend on the history of past states. A strategy α is memoryless deterministic (MD) if for all states s there exists an action m ∈ A(s) such that α(s)(m) = 1; we thus view an MD strategy as a function α : S → A.

For the remainder of the paper, we fix an MDP D = ⟨S, A, L, ϕ, ℓ⟩. Given a memoryless strategy α for D, an LMC D(α) = ⟨S, L, τ, ℓ⟩ is induced, where τ(s)(t) = Σ_{m∈A(s)} α(s)(m) · ϕ(s,m)(t). The matrix M_α specifies the transitions of the LMC D(α) as defined previously. We fix two initial distributions µ and ν on S (resp. two initial states s and t) for problems related to the total variation distance (resp. the probabilistic bisimilarity distance).

In this section we show that one can decide in polynomial time whether there exists a memoryless strategy α so that µ ≢ ν in D(α). In terms of the notation from the introduction, we show that TV > 0 is in P. Define the following column-vector spaces:

V_1 = ⟨M_{α_1}(a_1) M_{α_2}(a_2) ⋯ M_{α_m}(a_m) 𝟙 : α_i is a memoryless strategy; a_i ∈ L⟩ and
V_2 = ⟨M_α(w) 𝟙 : α is a memoryless strategy; w ∈ L*⟩ and
V_3 = ⟨M_α(w) 𝟙 : α is an MD strategy; w ∈ L*⟩.

Here and later we use the notation ⟨·⟩ to denote the span of (i.e., the vector space spanned by) a set of vectors.
By the definitions, we have that µ ≡ ν in all LMCs induced by all memoryless strategies α if and only if µ M_α(w) 𝟙 = ν M_α(w) 𝟙 holds for all memoryless strategies α and all w ∈ L*. It follows:

▶ Proposition 1.
For all distributions µ, ν over S we have: there exists a memoryless strategy α such that µ ≢ ν in D(α) ⟺ µv ≠ νv for some v ∈ V_2.

To decide TV > 0, i.e., whether there is a memoryless strategy α with µ ≢ ν in the induced LMC, it suffices to compute a basis for V_2; more precisely, a set of α and w such that the vectors M_α(w)𝟙 span V_2. As the set of memoryless strategies is uncountable, this is not straightforward. From the definitions, we know V_3 ⊆ V_2 ⊆ V_1. We will show V_1 ⊆ V_3 and thus establish the equality of these three vector spaces. It follows from [18, Theorem 5.12] that computing a basis for V_1 is in P. It follows that our problem TV > 0 is in P, but this does not explicitly give the witnessing memoryless strategy. Since V_2 = V_3, there must exist an MD strategy that witnesses µ ≢ ν. To find this MD strategy, one could go through all MD strategies (potentially exponentially many). In the following, by considering the vector spaces while restricting the word length, we show that a witness MD strategy can also be computed in polynomial time.

We define the following column-vector spaces. For each j ∈ N,

V_1^j = ⟨M_{α_1}(a_1) M_{α_2}(a_2) ⋯ M_{α_k}(a_k) 𝟙 : α_i is a memoryless strategy; a_i ∈ L; k ≤ j⟩ and
V_2^j = ⟨M_α(w) 𝟙 : α is a memoryless strategy; w ∈ L^{≤j}⟩ and
V_3^j = ⟨M_α(w) 𝟙 : α is an MD strategy; w ∈ L^{≤j}⟩.

Let α be an MD strategy and m be an action available at state i. Recall that an MD strategy can be viewed as a function α : S → A. We define α_{i→m} to be the MD strategy such that α_{i→m}(i) = m and α_{i→m}(s) = α(s) for all s ∈ S ∖ {i}. Let c_i ∈ {0,1}^S be the column bit vector whose only non-zero entry is the i-th one. For a set B ⊆ R^S, we define ⟨B⟩ to be the vector space spanned by B.

We call a column vector an MD vector if it is of the form M_α(w)𝟙 for an MD strategy α and w ∈ L*. Let P be a set of MD strategy and word pairs, i.e., P = {(α_1, w_1), (α_2, w_2), …, (α_m, w_m)} where α_i is an MD strategy and w_i ∈ L*.
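For intuition, the classical span-based check behind Proposition 1 can be sketched for a single LMC, i.e., an MDP with one fixed memoryless strategy (a hedged Python illustration with an assumed matrix encoding — not the paper's algorithm for MDPs):

```python
from fractions import Fraction

def trace_equivalent(labels, M, mu, nu):
    """Span-based trace-equivalence check for a single LMC.

    labels: list of labels; M[a]: square matrix (lists of Fractions) of
    a-labelled transition probabilities; mu, nu: initial distributions.
    Closes the all-ones vector under v -> M(a)·v and compares mu·v with
    nu·v on a basis of the resulting space.
    """
    n = len(mu)

    def mat_vec(A, v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def eliminate(v, basis):
        # Reduce v against basis vectors, keyed by their leading index.
        v = v[:]
        for i in range(n):
            if v[i] != 0 and i in basis:
                c = v[i] / basis[i][i]
                v = [x - c * y for x, y in zip(v, basis[i])]
        return v

    basis = {}                     # leading index -> basis vector
    queue = [[Fraction(1)] * n]    # start from the all-ones vector 𝟙
    while queue:
        r = eliminate(queue.pop(), basis)
        pivot = next((i for i in range(n) if r[i] != 0), None)
        if pivot is None:
            continue               # already in the span
        basis[pivot] = r
        queue.extend(mat_vec(M[a], r) for a in labels)
    dot = lambda x, v: sum(xi * vi for xi, vi in zip(x, v))
    return all(dot(mu, v) == dot(nu, v) for v in basis.values())
```

The basis contains at most |S| vectors, which is also the reason why restricting to words of length at most |S| − 1 suffices.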
We define a function B transforming such a set P to the set of corresponding MD vectors, i.e., B(P) = {M_{α_1}(w_1)𝟙, M_{α_2}(w_2)𝟙, …, M_{α_m}(w_m)𝟙}.

▶ Lemma 2.
Let j ∈ N. For all MD strategies α_1 and α_2, a ∈ L and w ∈ L^{≤j}, we have

M_{α_1}(a) M_{α_2}(w) 𝟙 ∈ ⟨V_3^j ∪ B({(α, aw)})⟩,

where α is the MD strategy defined by α(i) = α_1(i) if c_i ∉ V_3^j, and α(i) = α_2(i) otherwise.

The next lemma shows that a basis of V_3^j, for j < |S|, consisting only of MD vectors can be computed in polynomial time.

▶ Lemma 3.
Let j ∈ N with j < |S|. One can compute in polynomial time a set P_j = {(α_1, w_1), …, (α_k, w_k)} in which all α_i are MD strategies and all w_i are in L^{≤j} such that B(P_j) is a basis of V_3^j.

Proof sketch.
We prove this lemma by induction on j. The base case where j = 0 is vacuously true with P_0 = {(α_1, w_1)}, where α_1 is an arbitrary MD strategy, w_1 = ε and B(P_0) = {𝟙}. For the induction step, assume that we can compute in polynomial time a set P_j = {(α_1, w_1), …, (α_k, w_k)} where all the strategies are MD strategies and all the words are in L^{≤j} such that B(P_j) is a basis for V_3^j. We show that the statement holds for j + 1. Define

Σ = {α_1} ∪ {(α_1)_{s→m} : s ∈ S, m ∈ A(s)}  and  M = {M_α(a) ∈ R^{S×S} : α ∈ Σ, a ∈ L}.

Next, we present Algorithm 1 which computes a set P_{j+1} in polynomial time such that for all M ∈ M and all b ∈ B(P_j):

M · b ∈ ⟨B(P_{j+1})⟩   (1)

Algorithm 1
Polynomial-time algorithm computing P_{j+1}.

1: P_{j+1} := P_j
2: foreach α_1 ∈ Σ, a ∈ L and (α_2, w) ∈ P_j do
3:   if M_{α_1}(a) M_{α_2}(w) 𝟙 ∉ ⟨B(P_{j+1})⟩ then
4:     add (α, aw) to P_{j+1}, where α is the MD strategy defined as α(i) = α_1(i) if c_i ∉ V_3^j, and α(i) = α_2(i) otherwise
5:   end
6: end

All the vectors in B(P_{j+1}) are linearly independent, as we only add a pair if the corresponding vector is linearly independent of the existing vectors in B(P_{j+1}) (lines 3-4). Since B(P_j) is a basis for V_3^j, we can decide whether c_i ∈ V_3^j for i ∈ S in polynomial time, and thus compute a pair (α, aw) on line 4 in polynomial time. Since |Σ| and |L| are polynomial in the size of the MDP and |P_{j+1}| ≤ |S|, the number of iterations is polynomial in the size of the MDP. The construction of P_{j+1} is then in polynomial time. It remains to show that after adding (α, aw) to P_{j+1} (line 4), we have M · b = M_{α_1}(a) M_{α_2}(w) 𝟙 ∈ ⟨B(P_{j+1})⟩. Since the pair (α_2, w) is in P_j, we have w ∈ L^{≤j}. Then,

M · b = M_{α_1}(a) M_{α_2}(w) 𝟙
  ∈ ⟨V_3^j ∪ B({(α, aw)})⟩   [Lemma 2]
  = ⟨B(P_j) ∪ B({(α, aw)})⟩   [B(P_j) is a basis for V_3^j by the induction hypothesis]
  = ⟨B(P_j ∪ {(α, aw)})⟩.

Since P_j ⊆ P_{j+1} (line 1), we have B(P_j) ⊆ B(P_{j+1}). By adding the pair (α, aw) to P_{j+1}, we have ⟨B(P_j ∪ {(α, aw)})⟩ ⊆ ⟨B(P_{j+1})⟩, and thus M · b ∈ ⟨B(P_{j+1})⟩.

Finally, we show that the set P_{j+1} satisfies V_1^{j+1} = ⟨B(P_{j+1})⟩. We have

⟨B(P_{j+1})⟩ ⊆ V_3^{j+1}   [for all (α, w) ∈ P_{j+1}: α is an MD strategy and w ∈ L^{≤j+1}]
  ⊆ V_1^{j+1}   [from the definitions].

We prove the other direction, V_1^{j+1} ⊆ ⟨B(P_{j+1})⟩, in Appendix A. ◀

Figure 1
In this MDP no MD strategy witnesses s ≁ t. All states have the same label except state v. By default the transition probabilities out of each action are uniformly distributed.

Combining classical linear algebra arguments about equivalence checking (see, e.g., [42]) with Lemma 3, we obtain:

▶
Lemma 4.
For all j < |S| we have V_1^j = V_2^j = V_3^j. We have V_1 = V_2 = V_3 = V_1^{|S|−1} = V_2^{|S|−1} = V_3^{|S|−1}. Thus we obtain:

▶
Proposition 5.
One can compute in polynomial time a set P = {(α_1, w_1), …, (α_k, w_k)} of MD strategy and word pairs such that B(P) is a basis of V_2.

Proof.
By Lemma 4 it suffices to invoke Lemma 3 for j = |S| − 1. ◀

Now we can prove the main theorem of this section.

▶
Theorem 6.
The problem TV > 0 is in P. Further, for any positive instance of the problem TV > 0, we can compute in polynomial time an MD strategy α and a word w that witness µ ≢ ν, i.e., Pr_{µ,D(α)}(Run(w)) ≠ Pr_{ν,D(α)}(Run(w)).

Proof.
A polynomial algorithm follows naturally from Proposition 5 and Proposition 1. We first compute a set P of MD strategy and word pairs such that B(P) is a basis for V_2. For each b ∈ B(P), we check whether µb ≠ νb and output "yes", indicating a positive instance, if the inequality holds. Otherwise, we have µb = νb for all b ∈ B(P), and the algorithm outputs "no", indicating that µ ≡ ν holds for all memoryless strategies.

If the instance is positive, there exists a vector b ∈ B(P) such that µb ≠ νb. Since b is an MD vector which corresponds to a pair (α, w) ∈ P, we have µ M_α(w) 𝟙 ≠ ν M_α(w) 𝟙, equivalently Pr_{µ,D(α)}(Run(w)) ≠ Pr_{ν,D(α)}(Run(w)). ◀

In this section we show that one can decide in polynomial time whether there exists a memoryless strategy α so that s ≁ t in D(α), i.e., we show that PB > 0 is in P. For some MDPs, there might be memoryless strategies such that s ≁ t in the induced LMC but no such strategy is MD. The MDP in Figure 1 is such an example. Similar to

Algorithm 2
Partition Refinement

1: i := 0; X_0 := {S}
2: repeat
3:   i := i + 1
4:   X_i := S/≡_{X_{i−1}}
5: until X_i = X_{i−1}

the or-gate construction of [9, Theorem 2], we have s ∼ t if and only if q_1 ∼ q_3 or q_2 ∼ q_3. We have q_1 ∼ q_3 if the MD strategy maps q_3 to the action that goes to state u; otherwise, q_2 ∼ q_3 if the MD strategy maps q_3 to the action that goes to state v. This rules out the algorithm that goes through all the MD strategies.

We define an equivalence relation and run the classical polynomial-time partition refinement as shown in Algorithm 2, with an equivalence relation ≡_X defined below. At the beginning, all states are in the same equivalence class. In a refinement step, a pair of states is split if there could exist a memoryless strategy that makes them not probabilistic bisimilar. Two states s, t remain in the same equivalence class until the end if and only if they are probabilistic bisimilar under all memoryless strategies.

The correctness of this approach is not obvious, as some splits that occurred in different iterations of the algorithm may have been due to different, potentially contradictory, memoryless strategies. Furthermore, the algorithm does not compute a memoryless strategy that witnesses s ≁ t. The key to solving both problems will be Lemma 11.

A partition of the states S is a set X consisting of pairwise disjoint subsets E of S with ⋃_{E∈X} E = S. Recall that ϕ(s,m)(s′) is the transition probability from s to s′ when choosing action m. Similarly, ϕ(s,m)(E) is the transition probability from s to E ⊆ S when choosing action m. We write ϕ(s,m)(X) to denote the vector (probability distribution) (ϕ(s,m)(E))_{E∈X}. We define ϕ(s)(X) = {ϕ(s,m)(X) : m ∈ A(s)}, which is the set of probability distributions over the partition X obtained by choosing the available actions of s.
Each partition is associated with an equivalence relation ≡_X on S: s_1 ≡_X s_2 if and only if
- ℓ(s_1) = ℓ(s_2);
- s_1 ≠ s_2 ⟹ |ϕ(s_1)(X)| = 1 and ϕ(s_1)(X) = ϕ(s_2)(X).

Let S/≡_X denote the set of equivalence classes with respect to ≡_X, which forms a partition of S. We present in Table 2 the partitions obtained by running the algorithm on the MDP in Figure 1. Notice that states s and t are no longer in the same equivalence class at the end. The following lemma is standard, and claims that the partition gets finer.

▶ Lemma 7.
For all i ∈ N, we have ≡_{X_{i+1}} ⊆ ≡_{X_i}. Since each iteration that changes X_i increases the number of blocks, and X_{|S|−1} can consist of at most |S| one-element sets, after at most |S| − 1 iterations X_i cannot be refined further.

X_0 = {S}
X_1 = {{v}, S ∖ {v}}
X_2 = {{v}, {q_2}, {q_3}, S ∖ {v, q_2, q_3}}
X_3 = {{v}, {q_2}, {q_3}, {s_a}, {s_b}, {t_a}, {t_b}, {s, t, q_1, u}}
X_4 = {{v}, {q_2}, {q_3}, {s_a}, {s_b}, {t_a}, {t_b}, {s}, {t}, {q_1, u}}

Table 2
Example of running Algorithm 2 on the MDP in Figure 1.
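Algorithm 2 together with the relation ≡_X can be transcribed as the following Python sketch (our illustration; the encoding of ϕ as nested dictionaries is an assumption):

```python
from fractions import Fraction

def refine_mdp(states, label, phi):
    """Partition refinement with the relation ≡_X (Algorithm 2).

    phi: dict state -> dict action -> dict {successor: probability}.
    A state whose actions induce more than one distribution over the
    current partition becomes a singleton; otherwise states are grouped
    by their unique induced distribution.
    """
    def dists(s, blocks):
        # phi(s)(X): the set of distributions over the partition blocks.
        return {tuple(sum(d.get(t, Fraction(0)) for t in b) for b in blocks)
                for d in phi[s].values()}

    # Start from the partition induced by the labelling function.
    partition = [frozenset(s for s in states if label[s] == l)
                 for l in sorted({label[s] for s in states})]
    while True:
        refined = {}
        for bi, block in enumerate(partition):
            for s in block:
                ds = dists(s, partition)
                key = (bi, frozenset(ds)) if len(ds) == 1 else (bi, s)
                refined.setdefault(key, set()).add(s)
        new_partition = [frozenset(b) for b in refined.values()]
        if len(new_partition) == len(partition):
            return set(partition)
        partition = new_partition
```

On a small fragment with one choice state, the state with two distinct induced distributions is split into a singleton, as in the refinement steps of Table 2.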
We aim at proving that s ≡_{X_{|S|−1}} t if and only if s ∼_{D(α)} t for all memoryless strategies α. In the following lemma we show the forward direction:

▶ Lemma 8.
Let X be a partition with X = S/≡_X. We have ≡_X ⊆ ∼_{D(α)} for all memoryless strategies α.

For the converse, to guarantee that ≡_{X_{|S|−1}} is not too fine, it suffices to show that there exists a memoryless strategy α such that ∼_{D(α)} ⊆ ≡_{X_{|S|−1}}. To do that, we define the equivalence relations ∼^i_{D(α)} with 0 ≤ i ≤ |S| − 1 for all memoryless strategies α.

Let α be a memoryless strategy. Let τ be the transition function of the LMC D(α). Define the equivalence relation ∼^i_{D(α)} with 0 ≤ i ≤ |S| − 1 on S: s_1 ∼^i_{D(α)} s_2 if and only if
- ℓ(s_1) = ℓ(s_2);
- i > 0 ⟹ τ(s_1)(E) = τ(s_2)(E) for all E ∈ S/∼^{i−1}_{D(α)}.

Note that for the LMC D(α), we have ∼^{i+1}_{D(α)} ⊆ ∼^i_{D(α)} for all i ∈ N, and ∼^{|S|−1}_{D(α)} is probabilistic bisimilarity for the LMC D(α) (see, e.g., [3]).

Since the witness strategy might not be MD, we compute a set of prime numbers that can be used to form the weights of the actions. The prime numbers are used to rule out certain "accidental" bisimulations. We denote by size(D) the size of the representation of an object D. We represent rational numbers as quotients of integers written in binary.

For u ∈ S, m ∈ A(u) and E ⊆ S, we express ϕ(u,m)(E) as an irreducible fraction a_{u,m,E} / b_{u,m,E} where a_{u,m,E} and b_{u,m,E} are coprime integers. Similarly, for u ∈ S, m_1, m_2 ∈ A(u) and E ⊆ S, ϕ(u,m_1)(E) − ϕ(u,m_2)(E) is expressed as an irreducible fraction c_{u,m_1,m_2,E} / d_{u,m_1,m_2,E} such that c_{u,m_1,m_2,E} and d_{u,m_1,m_2,E} are coprime integers. Let N ⊆ N be the following set:

N = {b_{u,m,E} : u ∈ S, m ∈ A(u) and E ∈ ⋃_i X_i} ∪ {c_{u,m_1,m_2,E} : u ∈ S, m_1, m_2 ∈ A(u), E ∈ ⋃_i X_i and c_{u,m_1,m_2,E} > 0}.

We denote by θ(x) the number of different prime factors of a positive integer x, and by θ(N) the number of different prime factors in N where N is a set of positive integers.

▶ Lemma 9. θ(N) is polynomial in size(D).
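Lemma 10 below can be realised by plain trial division: by Lemma 9 the numbers in N have only polynomially many distinct prime factors, and by the prime number theorem only polynomially many candidates need to be scanned. A hedged Python sketch (ours, not the paper's):

```python
def fresh_primes(k, N):
    """Return k distinct primes, each dividing no number in the finite
    set N (hence coprime to every element of N)."""
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    found, p = [], 1
    while len(found) < k:
        p += 1
        if is_prime(p) and all(b % p != 0 for b in N):
            found.append(p)
    return found
```

For example, with N = {2, 6}, the two smallest usable primes are 5 and 7, since 2 and 3 each divide some element of N.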
Using the prime number theorem, we obtain the following lemma, which guarantees that one can find |S| additional distinct prime numbers, other than the prime factors of the numbers in N, in time polynomial in size(D).

▶ Lemma 10.
One can find |S| different prime numbers in time polynomial in size(D) such that each of them is coprime to all numbers in the set N.

To each u ∈ S, we assign a different prime number p_u that is coprime with all b ∈ N. This can be done in polynomial time by Lemma 10. We have

p_u ∤ b for all b ∈ N, and u ≠ v ⟹ p_u ≠ p_v for all u, v ∈ S.   (2)

We define a partial memoryless strategy for D to be a partial function α : S ⇀ Distr(A) that, given a state s ∈ S, returns α(s) ∈ Distr(A(s)) if α(s) is defined. A memoryless strategy α is compatible with a partial memoryless strategy α′, written α ⊒ α′, if and only if α(s) = α′(s) for all s such that α′(s) is defined. We construct the partial memoryless strategy iteratively.

▶ Lemma 11.
Let i ∈ N with i ≤ |S| − 1. One can compute in polynomial time a partial strategy α_i such that ∼^i_{D(α)} ⊆ ≡_{X_i} for all α ⊒ α_i.

Proof sketch.
We prove the statement by induction on i. Let s, t ∈ S. The base case is i = 0. By definition, we have that if s ≢_{X_0} t then ℓ(s) ≠ ℓ(t). We also have that if ℓ(s) ≠ ℓ(t), then s ≁^0_{D(α)} t in D(α) for every memoryless strategy α. We simply let α_0 be the empty partial function, so that α ⊒ α_0 holds for any memoryless strategy α.

For the induction step, assume that we can compute in polynomial time a partial strategy α_i such that ∼^i_{D(α)} ⊆ ≡_{X_i} for all α ⊒ α_i, i.e., if s ≢_{X_i} t then s ≁^i_{D(α)} t in D(α). We show that the statement holds for i + 1.

Algorithm 3
Polynomial-time algorithm constructing α_{i+1}.

1: α_{i+1} := α_i
2: foreach u ∈ S such that |ϕ(u)(X_i)| = 1 and |ϕ(u)(X_{i+1})| ≠ 1 do
3:   pick m_1, m_2 ∈ A(u) such that for a set E ∈ X_{i+1}: ϕ(u,m_1)(E) > ϕ(u,m_2)(E)
4:   α_{i+1}(u)(m_1) := 1/p_u
5:   α_{i+1}(u)(m_2) := 1 − 1/p_u
6: end

Algorithm 3 computes the partial memoryless strategy α_{i+1} in polynomial time. We show that α_j does not overwrite α_k for all k < j. It follows that any α ⊒ α_{i+1} also satisfies α ⊒ α_i. Let α ⊒ α_{i+1}. Assume s ≢_{X_{i+1}} t. We distinguish two cases: s ≁^i_{D(α)} t and s ∼^i_{D(α)} t. In both cases we can derive s ≁^{i+1}_{D(α)} t, i.e., ∼^{i+1}_{D(α)} ⊆ ≡_{X_{i+1}} as desired. The details can be found in Appendix B. ◀

For example, let p_{q_3}, the prime number assigned to state q_3 in Figure 1, be 3, which is coprime with the numbers in N = {1, 2}. We show how the partial strategy α_1 is constructed. On line 1 of Algorithm 3, α_1 is set to α_0, the empty partial function. Since |ϕ(q_3)(X_0)| = 1 and |ϕ(q_3)(X_1)| = 2, we enter the for loop. We can pick m_1, m_2 ∈ A(q_3) and E = S ∖ {v} ∈ X_1 on line 3, since ϕ(q_3,m_1)(E) = 1 > ϕ(q_3,m_2)(E). We then define the strategy for q_3 (lines 4-5): α_1(q_3)(m_1) = 1/3 and α_1(q_3)(m_2) = 2/3. We have completed the construction of α_1, as |ϕ(u)(X_0)| = |ϕ(u)(X_1)| = 1 for every other state u.

▶ Theorem 12.
One can compute in polynomial time a memoryless strategy β such that ∼_{D(β)} ⊆ ∼_{D(α)} for all memoryless strategies α.

Proof.
By invoking Lemma 11 for i = |S| − 1, a partial strategy α_{|S|−1} can be computed in polynomial time such that ∼^{|S|−1}_{D(α)} ⊆ ≡_{X_{|S|−1}} for all α ⊒ α_{|S|−1}. Since ∼^{|S|−1}_{D(α)} = ∼_{D(α)}, we have ∼_{D(α)} ⊆ ≡_{X_{|S|−1}} for all α ⊒ α_{|S|−1}. Let β be a memoryless strategy defined by

β(u) = α_{|S|−1}(u) if α_{|S|−1}(u) is defined, and β(u) = δ_{m_u} for some m_u ∈ A(u) otherwise.

By definition the memoryless strategy β is compatible with α_{|S|−1}. We have:

∼_{D(β)} ⊆ ≡_{X_{|S|−1}}   [β ⊒ α_{|S|−1}]
  ⊆ ∼_{D(α)} for all strategies α   [X_{|S|−1} = S/≡_{X_{|S|−1}} and Lemma 8]. ◀

▶ Corollary 13.
The problem PB > 0 is in P. Further, for any positive instance of the problem PB > 0, we can compute in polynomial time a memoryless strategy that witnesses s ≁ t.

We have 2 ∈ N since ϕ(s, m_s)({s_a}) = 1/2, where m_s is the only available action at state s.
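One refinement step of Algorithm 3 can be sketched as follows (our Python illustration; the encodings of ϕ, the partitions X_i and the prime assignment p_u are assumptions, and tie-breaking when picking m_1, m_2 is arbitrary):

```python
from fractions import Fraction

def extend_partial_strategy(alpha, phi, partitions, primes, i):
    """One step of Algorithm 3: extend the partial strategy alpha at every
    state whose set of induced distributions over the partition becomes
    non-trivial when moving from X_i to X_{i+1}.  The two chosen actions
    receive weights 1/p_u and 1 - 1/p_u, where p_u is the state's prime."""
    def dists(s, blocks):
        # One distribution over the partition blocks per available action.
        return {m: tuple(sum(d.get(t, Fraction(0)) for t in b) for b in blocks)
                for m, d in phi[s].items()}

    Xi, Xi1 = partitions[i], partitions[i + 1]
    for u in phi:
        di, di1 = dists(u, Xi), dists(u, Xi1)
        if len(set(di.values())) == 1 and len(set(di1.values())) > 1:
            # pick m1, m2 and a block E with phi(u,m1)(E) > phi(u,m2)(E)
            for k, _ in enumerate(Xi1):
                vals = {m: v[k] for m, v in di1.items()}
                if len(set(vals.values())) > 1:
                    m1 = max(vals, key=vals.get)
                    m2 = min(vals, key=vals.get)
                    alpha[u] = {m1: Fraction(1, primes[u]),
                                m2: 1 - Fraction(1, primes[u])}
                    break
    return alpha
```

On the fragment of Figure 1 used in the worked example above, with p_{q_3} = 3, the step yields α_1(q_3)(m_1) = 1/3 and α_1(q_3)(m_2) = 2/3.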
Figure 2
In this MDP, no MD strategy witnesses d_tv(δ_s, δ_t) = 1 (nor d_pb(s,t) = 1). States s_b and t_b have label b while all other states have label a.

In this section, we summarise the results for the two distance-one problems, namely TV = 1 and PB = 1. The existential theory of the reals,
ETR, is the set of valid formulas of the form

∃x_1 … ∃x_n R(x_1, …, x_n),

where R is a boolean combination of comparisons of the form p(x_1, …, x_n) ∼ 0, in which p(x_1, …, x_n) is a multivariate polynomial (with rational coefficients) and ∼ ∈ {<, >, ≤, ≥, =, ≠}. The complexity class ∃R [35] consists of those problems that are many-one reducible to ETR in polynomial time. Since ETR is NP-hard and in PSPACE [6, 33], we have NP ⊆ ∃R ⊆ PSPACE.

For some MDPs there exist memoryless strategies that make d_tv(δ_s, δ_t) = 1 but no such strategy is MD. For example, consider the MDP in Figure 2, which has two MD strategies. We have d_tv(δ_s, δ_t) < 1 in the LMC induced by either of the two MD strategies, and d_tv(δ_s, δ_t) = 1 in the LMC induced by any other memoryless strategy. Thus, we cannot simply guess an MD strategy. We show that the problem TV = 1 is in ∃R, using the characterization from [10, Theorem 21] of total variation distance 1 in LMCs and some reasoning on convex polyhedra:

▶ Theorem 14.
The problem
TV = 1 is in ∃ R . The problem TV = 1 is NP -hard, and PB = 1 is NP -complete. The hardness results forboth problems are by reductions from the Set Splitting problem. Given a finite set S and acollection C of subsets of S , Set Splitting asks whether there is a partition of S into disjointsets S and S such that no set in C is a subset of S or S .Let h S, Ci be an instance of Set Splitting where S = { e , · · · , e n } and C = { C , · · · , C m } is a collection of subsets of S . We construct an MDP D consisting of the following states: twostates s and t , a state e i for each element in S , twin states C j and C j for each element in C ,two sink states u and v . State v has label b while all other states have label a . State s ( t ) has asingle action which goes with uniform probability m to states C i ( C i ) for 1 ≤ i ≤ m . For each e i ∈ C j , there is an action from state C j and C j leading to state e i with probability one. Eachstate e i has two actions going to the sink states u and v with probability one, respectively. Wehave: h S, Ci ∈
Set Splitting ⇐⇒ ∃ memoryless strategy α such that d_tv(δ_s, δ_t) = 1 in D(α).

For example, let S = {e_1, e_2, e_3} and C = {C_1, C_2} with C_1 = {e_1, e_2} and C_2 = {e_2, e_3}.
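To make the reduction concrete, the Set Splitting side of this example can be checked by brute force. The following sketch is illustrative only (the function name and encoding are made up, not part of the paper's construction); it enumerates candidate partitions:

```python
from itertools import product

def set_splitting(elements, collection):
    """Brute-force Set Splitting: find a partition of `elements` into S1, S2
    such that no set in `collection` is contained in S1 or in S2."""
    for bits in product([0, 1], repeat=len(elements)):
        s1 = {e for e, b in zip(elements, bits) if b == 0}
        s2 = {e for e, b in zip(elements, bits) if b == 1}
        if all(not c <= s1 and not c <= s2 for c in collection):
            return s1, s2
    return None

# The instance from the running example: C1 = {e1, e2}, C2 = {e2, e3}.
split = set_splitting(["e1", "e2", "e3"], [{"e1", "e2"}, {"e2", "e3"}])
# split is ({'e1', 'e3'}, {'e2'}): the unique splitting for this instance
```

Of course, the point of the reduction is the opposite direction: the NP-hardness proof embeds this search problem into the choice of a memoryless strategy in the MDP.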
Figure 3  The MDP in the reduction from Set Splitting for NP-hardness of TV = 1 (or PB = 1).

Figure 3 shows the corresponding MDP. The highlighted MD strategy, corresponding to the partition S_1 = {e_1, e_3} and S_2 = {e_2}, witnesses d_tv(δ_s, δ_t) = 1.

▶ Theorem 15. The Set Splitting problem is polynomial-time many-one reducible to TV = 1, hence TV = 1 is NP-hard.

The problem PB = 1 is NP-complete. The MDP in Figure 2 also shows that no MD strategy may witness d_pb(s, t) = 1, which rules out the algorithm of simply guessing an MD strategy. By [39], deciding whether d_pb(s, t) = 1 in an LMC can be formulated as a reachability problem on a directed graph induced by the LMC. One can nondeterministically guess the graph induced by the LMC and use Algorithm 3 to construct a memoryless strategy that witnesses d_pb(s, t) = 1.

▶ Theorem 16.
The problem PB = 1 is NP-complete.

In this section, we summarise the results for the remaining problems, which are all about making the distance small (equal to 0 or less than 1). We show that TV = 0 and TV < 1 are ∃R-complete. The proof of the membership of TV = 0 in ∃R is similar to [18, Theorem 4.3]. For both hardness results we provide reductions from the Nonnegative Matrix Factorization (NMF) problem, which asks, given a nonnegative matrix J ∈ Q^{n×m} and a number r ∈ N, whether there exists a factorization J = A · W with nonnegative matrices A ∈ R^{n×r} and W ∈ R^{r×m}. The NMF problem is ∃R-complete by [38, Theorem 2]; see also [11, 44, 2] for more details on the NMF problem. The reduction is similar to [18, Theorem 4.5].

▶ Theorem 17.
The problem TV = 0 is ∃R-complete.

▶ Theorem 18.
The problem TV < 1 is ∃R-complete.

Finally, we show that PB = 0 and PB < 1 are NP-complete. For some MDPs there exist memoryless strategies that make d_pb(s, t) = 0 (resp. d_pb(s, t) < 1) but no such strategy is MD.

S. Kiefer and Q. Tang 50:13

Figure 4  In this MDP, no MD strategy witnesses d_pb(s, t) = 0. States s_b and t_b have label b while all other states have label a.

Figure 5  In this MDP, no MD strategy witnesses d_pb(s, t) < 1. States s_b and t_b have label b while all other states have label a.

Indeed, for the MDP in Figure 4 (resp. Figure 5), it is easy to check that the only strategy α that makes d_pb(s, t) = 0 (resp. d_pb(s, t) < 1) satisfies α(s)(m_1) = α(s)(m_2) = 1/2, where m_1 and m_2 are the two available actions of state s. Thus, to show the NP upper bound, we cannot simply guess an MD strategy. Instead, one can nondeterministically guess a partition of the states and check in polynomial time whether the partition is a probabilistic bisimulation.

The hardness results for both problems are by reductions from the Subset Sum problem. The reduction is similar to [18, Theorem 4.1].

▶ Theorem 19.
The problem PB = 0 is NP-complete.

By [39], deciding whether d_pb(s, t) < 1 in an LMC can likewise be formulated in terms of the graph induced by the LMC; the NP algorithm therefore also guesses the graph induced by the LMC.

▶ Theorem 20.
The problem PB < 1 is NP-complete.

We have studied the computational complexity of qualitative comparison problems in labelled MDPs. Motivated by the connection between obliviousness/anonymity and equivalence, we have devised polynomial-time algorithms to decide the existence of strategies for trace and bisimulation inequivalence. In the case of trace inequivalence, there always exists an MD witness strategy, and our algorithm computes it. The trace inequivalence algorithm is based on linear-algebra arguments that are considerably more subtle than in the LMC case. For bisimulation inequivalence, MD strategies may not exist, but we have devised a polynomial-time algorithm to compute a memoryless strategy witnessing inequivalence; here the randomization is based on prime numbers to rule out certain "accidental" bisimulations. The other six problems do not have polynomial complexity (unless P = NP), and we have established completeness results for all of them except TV = 1, where a complexity gap between NP and ∃R remains.

Concerning the relationship to interval Markov chains and parametric Markov chains mentioned in the introduction, the lower complexity bounds that we have derived in this paper carry over to corresponding problems in these models. Transferring the upper bounds requires additional work, as, e.g., even the consistency problem for IMCs (i.e., whether there exists a Markov chain conforming to an IMC) is not obvious to solve. Nevertheless, the algorithmic insights of this paper will be needed.

References

Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. PRIMES is in P.
Annals of Mathematics, 160(2):781–793, 2004.

Sanjeev Arora, Rong Ge, Ravi Kannan, and Ankur Moitra. Computing a nonnegative matrix factorization - provably. In
STOC , pages 145–162. ACM, 2012. Christel Baier. Polynomial time algorithms for testing probabilistic bisimulation and simulation.In Rajeev Alur and Thomas A. Henzinger, editors,
Computer Aided Verification , pages 50–61,Berlin, Heidelberg, 1996. Springer Berlin Heidelberg. Michael Benedikt, Rastislav Lenhardt, and James Worrell. LTL model checking of intervalMarkov chains. In Nir Piterman and Scott A. Smolka, editors,
Tools and Algorithms for theConstruction and Analysis of Systems - 19th International Conference, TACAS 2013, Held asPart of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013,Rome, Italy, March 16-24, 2013. Proceedings , volume 7795 of
Lecture Notes in ComputerScience , pages 32–46. Springer, 2013. Patrick Billingsley.
Probability and measure . Wiley Series in Probability and Statistics. Wiley,New York, NY, USA, 3rd edition, 1995. John Canny. Some algebraic and geometric computations in PSPACE. In
STOC , pages460–467, 1988. Souymodip Chakraborty and Joost-Pieter Katoen. Model checking of open interval Markovchains. In Marco Gribaudo, Daniele Manini, and Anne Remke, editors,
Analytical and StochasticModelling Techniques and Applications , pages 30–42. Springer International Publishing, 2015. Krishnendu Chatterjee, Koushik Sen, and Thomas A. Henzinger. Model-checking omega-regular properties of interval Markov chains. In Roberto M. Amadio, editor,
Foundations ofSoftware Science and Computational Structures, 11th International Conference, FOSSACS2008, Held as Part of the Joint European Conferences on Theory and Practice of Software,ETAPS 2008, Budapest, Hungary, March 29 - April 6, 2008. Proceedings , volume 4962 of
Lecture Notes in Computer Science , pages 302–317. Springer, 2008. Di Chen, Franck van Breugel, and James Worrell. On the complexity of computing probabilisticbisimilarity. In Lars Birkedal, editor,
Proceedings of the 15th International Conference onFoundations of Software Science and Computational Structures , volume 7213 of
Lecture Notesin Computer Science , pages 437–451, Tallinn, Estonia, March/April 2012. Springer-Verlag. Taolue Chen and Stefan Kiefer. On the total variation distance of labelled Markov chains.In
Proceedings of the Joint Meeting of the Twenty-Third EACSL Annual Conference onComputer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium onLogic in Computer Science (LICS) , CSL-LICS ’14, New York, NY, USA, 2014. ACM. Joel E. Cohen and Uriel G. Rothblum. Nonnegative ranks, decompositions, and factorizationsof nonnegative matrices.
Linear Algebra and its Applications, 190:149–168, 1993.

Benoît Delahaye. Consistency for parametric interval Markov chains. In Étienne André and Goran Frehse, editors,
SynCoP 2015, April 11, 2015, London, United Kingdom , volume 44 of
OASICS , pages 17–32.Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015. Benoît Delahaye, Kim G. Larsen, Axel Legay, Mikkel L. Pedersen, and Andrzej Wasowski.Decision problems for interval Markov chains. In Adrian-Horia Dediu, Shunsuke Inenaga,and Carlos Martín-Vide, editors,
Language and Automata Theory and Applications - 5thInternational Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings ,volume 6638 of
Lecture Notes in Computer Science , pages 274–285. Springer, 2011. Salem Derisavi, Holger Hermanns, and William H. Sanders. Optimal state-space lumping inMarkov chains.
Inf. Process. Lett. , 87(6):309–315, 2003. Josée Desharnais, Vineet Gupta, Radha Jagadeesan, and Prakash Panangaden. Metrics forlabeled Markov systems. In Jos Baeten and Sjouke Mauw, editors,
Proceedings of the 10thInternational Conference on Concurrency Theory , volume 1664 of
Lecture Notes in ComputerScience , pages 258–273, Eindhoven, The Netherlands, August 1999. Springer-Verlag. Josee Desharnais, Vineet Gupta, Radha Jagadeesan, and Prakash Panangaden. Metrics forlabelled Markov processes.
Theor. Comput. Sci. , 318(3):323–354, 2004. L. Doyen, T.A. Henzinger, and J.-F. Raskin. Equivalence of labeled Markov chains.
International Journal on Foundations of Computer Science, 19(3):549–563, 2008.

Nathanaël Fijalkow, Stefan Kiefer, and Mahsa Shirmohammadi. Trace refinement in labelled Markov decision processes.
Logical Methods in Computer Science , 16(2), 2020. Ernst Moritz Hahn, Holger Hermanns, and Lijun Zhang. Probabilistic reachability forparametric Markov models.
Int. J. Softw. Tools Technol. Transf., 13(1):3–19, 2011.

Christian Hensel, Sebastian Junges, Joost-Pieter Katoen, Tim Quatmann, and Matthias Volk. The probabilistic model checker Storm, 2020. arXiv:2002.07080.

Bengt Jonsson and Kim Guldstrand Larsen. Specification and refinement of probabilistic processes. In
Proceedings of the Sixth Annual Symposium on Logic in Computer Science (LICS’91), Amsterdam, The Netherlands, July 15-18, 1991 , pages 266–277. IEEE Computer Society,1991. John G. Kemeny and J. Laurie Snell.
Finite Markov Chains . Springer, 1976. S. Kiefer, A.S. Murawski, J. Ouaknine, B. Wachter, and J. Worrell. Language equivalence forprobabilistic automata. In
CAV , volume 6806 of
LNCS , pages 526–540. Springer, 2011. S. Kiefer, A.S. Murawski, J. Ouaknine, B. Wachter, and J. Worrell. APEX: An analyzer foropen probabilistic programs. In
CAV , volume 7358 of
LNCS , pages 693–698. Springer, 2012. Stefan Kiefer and Björn Wachter. Stability and complexity of minimising probabilistic automata.In
ICALP , volume 8573 of
LNCS , pages 268–279, 2014. M. Kwiatkowska, G. Norman, and D. Parker. PRISM 4.0: Verification of probabilistic real-timesystems. In G. Gopalakrishnan and S. Qadeer, editors,
Proc. 23rd International Conference onComputer Aided Verification (CAV’11) , volume 6806 of
LNCS , pages 585–591. Springer, 2011. Kim Guldstrand Larsen and Arne Skou. Bisimulation through probabilistic testing.
Inf.Comput. , 94(1):1–28, 1991. L. Li and Y. Feng. Quantum Markov chains: Description of hybrid systems, decidabilityof equivalence, and model checking linear-time properties.
Information and Computation ,244:229–244, 2015. LA Lindahl. Convexity and optimization, 2016. URL: . T.M. Ngo, M. Stoelinga, and M. Huisman. Confidentiality for probabilistic multi-threadedprograms and its verification. In
Engineering Secure Software and Systems , volume 7781 of
LNCS , pages 107–122. Springer, 2013. A. Paz.
Introduction to Probabilistic Automata . Academic Press, 1971. S. Peyronnet, M. de Rougemont, and Y. Strozecki. Approximate verification and enumerationproblems. In
ICTAC , volume 7521 of
LNCS , pages 228–242. Springer, 2012. James Renegar. On the computational complexity and geometry of the first-order theory ofthe reals. Parts I–III.
Journal of Symbolic Computation , 13(3):255–352, 1992. Barkley Rosser. Explicit bounds for some functions of prime numbers.
American Journal ofMathematics , 63(1):211–232, 1941. Marcus Schaefer and Daniel Stefankovic. Fixed points, Nash equilibria, and the existentialtheory of the reals.
Theory Comput. Syst., 60(2):172–193, 2017. doi:10.1007/s00224-015-9662-0.

M.-P. Schützenberger. On the definition of a family of automata.
Information and Control, 4:245–270, 1961.

Koushik Sen, Mahesh Viswanathan, and Gul Agha. Model-checking Markov chains in the presence of uncertainties. In Holger Hermanns and Jens Palsberg, editors,
Tools and Algorithmsfor the Construction and Analysis of Systems, 12th International Conference, TACAS 2006Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS2006, Vienna, Austria, March 25 - April 2, 2006, Proceedings , volume 3920 of
Lecture Notesin Computer Science , pages 394–410. Springer, 2006. Yaroslav Shitov. A universality theorem for nonnegative matrix factorizations, 2016. arXiv:1606.09068 . Qiyi Tang and Franck van Breugel. Deciding probabilistic bisimilarity distance one for labelledMarkov chains. In Hana Chockler and Georg Weissenbacher, editors,
Proceedings of the30th International Conference on Computer Aided Verification , volume 10981 of
LectureNotes in Computer Science , pages 681–699, Oxford, UK, July 2018. Springer-Verlag. doi:10.1007/978-3-319-96145-3_39 . Qiyi Tang and Franck van Breugel. Deciding probabilistic bisimilarity distance one forprobabilistic automata.
Journal of Computer and System Sciences , 111:57–84, 2020. Balder ten Cate, Phokion G. Kolaitis, and Walied Othman. Data exchange with arithmeticoperations. In Giovanna Guerrini and Norman W. Paton, editors,
Joint 2013 EDBT/ICDT Conferences, EDBT '13 Proceedings, Genoa, Italy, March 18-22, 2013, pages 537–548. ACM, 2013. doi:10.1145/2452376.2452439.

Wen-Guey Tzeng. A polynomial-time algorithm for the equivalence of probabilistic automata.
SIAM Journal on Computing , 21(2):216–227, 1992. Antti Valmari and Giuliana Franceschinis. Simple O ( m log n ) time Markov chain lumping. InJavier Esparza and Rupak Majumdar, editors, Tools and Algorithms for the Construction andAnalysis of Systems, 16th International Conference, TACAS 2010, Held as Part of the JointEuropean Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus,March 20-28, 2010. Proceedings , volume 6015 of
Lecture Notes in Computer Science , pages38–52. Springer, 2010. Stephen A. Vavasis. On the complexity of nonnegative matrix factorization.
SIAM Journal on Optimization, 20(3):1364–1377, 2009.

Tobias Winkler, Sebastian Junges, Guillermo A. Pérez, and Joost-Pieter Katoen. On the complexity of reachability in parametric Markov decision processes. In Wan J. Fokkink and Rob van Glabbeek, editors, volume 140 of LIPIcs, pages 14:1–14:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.CONCUR.2019.14.

A Proofs of Section 3
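This appendix works with the vector spaces V_j spanned by vectors of the form M_α(w)𝟙. As a rough illustration only — with made-up 2×2 sub-stochastic label matrices and floating-point arithmetic in place of the exact rational arithmetic a real implementation would need — the closure computation behind Lemmas 3 and 4.1 can be sketched as:

```python
def extend_basis(basis, vec, eps=1e-9):
    """Add `vec` to `basis` if it is not in the current span
    (stores the Gram-Schmidt residual); return True if the span grew."""
    v = list(vec)
    for b in basis:
        coef = sum(x * y for x, y in zip(v, b)) / sum(x * x for x in b)
        v = [x - coef * y for x, y in zip(v, b)]
    if sum(x * x for x in v) > eps:
        basis.append(v)
        return True
    return False

def closure(matrices, dim):
    """Basis of the smallest space containing the all-ones vector and closed
    under left-multiplication by each matrix; the iteration stabilises within
    `dim` rounds, mirroring V^{|S|-1} = V in Lemma 4.1."""
    basis, frontier = [], [[1.0] * dim]
    extend_basis(basis, frontier[0])
    while frontier:
        vec = frontier.pop()
        for M in matrices:
            img = [sum(M[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
            if extend_basis(basis, img):
                frontier.append(img)
    return basis

# Two made-up 2x2 sub-stochastic label matrices; their closure spans R^2.
B = closure([[[0.5, 0.0], [0.0, 0.25]], [[0.0, 0.5], [0.75, 0.0]]], 2)
```

Note that with a single row-stochastic matrix the closure would stay one-dimensional, since M𝟙 = 𝟙; it is the per-label sub-stochastic matrices that make the span grow.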
The following lemma is technical and is only used in the proof of Lemma 2.

▶ Lemma 21. Let j ∈ N and α be an MD strategy. Let i ∈ S. If c_i ∉ V_j, then M_α(w)𝟙 = M_{α[i→m]}(w)𝟙 for all m ∈ A(i) and w ∈ L^{≤j}.

Proof. Let j ∈ N and α be an MD strategy. Let i ∈ S and m ∈ A(i). Assume c_i ∉ V_j. We prove this lemma by induction on the length of the trace w. The base case where |w| = 0 is vacuously true. For the induction step, assume M_α(w)𝟙 = M_{α[i→m]}(w)𝟙 holds for all w of length less than j. Let a ∈ L.

  M_α(aw)𝟙 = M_α(a) M_α(w)𝟙
           = M_α(a) M_{α[i→m]}(w)𝟙                                  [M_α(w)𝟙 = M_{α[i→m]}(w)𝟙 by induction hypothesis]
           = (M_α(a) − M_{α[i→m]}(a)) M_{α[i→m]}(w)𝟙 + M_{α[i→m]}(aw)𝟙
           = x · c_i + M_{α[i→m]}(aw)𝟙                               [for some x ∈ R],

where the last equality follows from the fact that, since the two matrices M_α(a) and M_{α[i→m]}(a) differ only in the i-th row, M_α(a) − M_{α[i→m]}(a) is a matrix all of whose rows except possibly the i-th row are zero vectors. The product of such a matrix with a column vector is then a multiple of c_i.

Since both M_α(aw)𝟙 and M_{α[i→m]}(aw)𝟙 are in V_j, their difference, which is x · c_i, is in V_j as well. However, as c_i ∉ V_j by assumption, we have x = 0. Hence M_α(aw)𝟙 = M_{α[i→m]}(aw)𝟙. ◀

▶ Lemma 2.
Let j ∈ N. For all MD strategies α_1 and α_2, a ∈ L and w ∈ L^{≤j}, we have M_{α_1}(a) M_{α_2}(w)𝟙 ∈ ⟨V_j ∪ B({(α, aw)})⟩, where α is the MD strategy defined by

  α(i) = α_1(i) if c_i ∉ V_j, and α(i) = α_2(i) otherwise.

Proof. Let j ∈ N. Let α_1 and α_2 be two MD strategies, a ∈ L and w ∈ L^{≤j}. Since |w| ≤ j and for all i with α(i) ≠ α_2(i) we have c_i ∉ V_j, by Lemma 21, we have

  M_{α_2}(w)𝟙 = M_α(w)𝟙   (3)

Then,

  M_{α_1}(a) M_{α_2}(w)𝟙 = M_{α_1}(a) M_α(w)𝟙                          [(3)]
                         = (M_{α_1}(a) − M_α(a)) M_α(w)𝟙 + M_α(aw)𝟙.

The first summand in the previous line is in the vector space V_j, since

  (M_{α_1}(a) − M_α(a)) M_α(w)𝟙 ∈ ⟨c_i : c_i ∈ V_j⟩ ⊆ V_j.

Thus, M_{α_1}(a) M_{α_2}(w)𝟙 ∈ ⟨V_j ∪ {M_α(aw)𝟙}⟩ = ⟨V_j ∪ B({(α, aw)})⟩. ◀

The next lemma shows that a basis for V_j for some j < |S| consisting only of MD vectors can be computed in polynomial time.

▶ Lemma 3.
Let j ∈ N with j < |S|. One can compute in polynomial time a set P_j = {(α_1, w_1), …, (α_k, w_k)} in which all α_i are MD strategies and all w_i are in L^{≤j}, such that B(P_j) is a basis of V_j.

Proof. We have shown ⟨B(P_{j+1})⟩ ⊆ V_1^{j+1} in the proof sketch. To show the other direction, V_1^{j+1} ⊆ ⟨B(P_{j+1})⟩, it suffices to show that for all memoryless strategies α, a ∈ L and b ∈ ⟨B(P_j)⟩, we have M_α(a) · b ∈ ⟨B(P_{j+1})⟩. Let α be an arbitrary memoryless strategy and a ∈ L. The matrix M_α(a) can be expressed as a linear combination of the matrices from M: for a fixed MD strategy α̂,

  M_α(a) = M_α̂(a) + Σ_{s ∈ S} ( −M_α̂(a) + Σ_{m ∈ A(s)} α(s)(m) · M_{α̂[s→m]}(a) )

That is, there are y_{α′} ∈ R for all α′ ∈ Σ such that

  M_α(a) = Σ_{α′ ∈ Σ} y_{α′} · M_{α′}(a).   (4)

Let b ∈ B(P_j). Then,

  M_α(a) · b = Σ_{α′ ∈ Σ} y_{α′} · M_{α′}(a) · b   [(4)]

Since each term of the summation, y_{α′} · M_{α′}(a) · b, is in ⟨B(P_{j+1})⟩ by (1), M_α(a) · b is also in ⟨B(P_{j+1})⟩. ◀

For the proof of the following lemma we combine classical linear-algebra arguments about equivalence checking (see, e.g., [42]) with Lemma 3. ▶
Lemma 4.1. For all j < |S| we have V_1^j = V_2^j = V_3^j. We have V_1 = V_2 = V_3 = V_1^{|S|−1} = V_2^{|S|−1} = V_3^{|S|−1}.

Proof. We prove the items in turn.

1. Let j < |S|. From Lemma 3 it follows that V_1^j ⊆ V_3^j. From the definitions of V_1^j, V_2^j, V_3^j we have V_3^j ⊆ V_2^j ⊆ V_1^j.

2. We have V_1^j ⊆ V_1^{j+1} for all j ∈ N. Further we have for all j ∈ N:

  V_1^{j+1} = ⟨ v, M_α(a)v : α is a memoryless strategy, a ∈ L, v ∈ V_1^j ⟩   (5)

It follows that if V_1^j = V_1^{j+1} then V_1^j = V_1^k holds for all k ≥ j. Since dim V_1^j ≤ |S| for all j ∈ N, it follows that V_1^{|S|−1} = V_1^k holds for all k ≥ |S| − 1. By (5) we see that V_1^{|S|−1} contains 𝟙 and is closed under pre-multiplication with M_α(a) for any memoryless strategy α and for any a ∈ L. But from the definition of V_1 we can derive that V_1 is the smallest vector space that contains 𝟙 and has that closure property. Thus V_1 ⊆ V_1^{|S|−1}. We have:

  V_1 ⊆ V_1^{|S|−1}                  [as just shown]
      = V_2^{|S|−1} = V_3^{|S|−1}    [by item 1]
      ⊆ V_3 ⊆ V_2 ⊆ V_1             [from the definitions]

Hence these vector spaces are all equal. ◀

B Proofs of Section 4
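The lemmas in this appendix reason about the partition-refinement computation of Algorithm 2, which is not reproduced in this excerpt. As a rough, simplified illustration only — for a plain LMC rather than an MDP, with made-up state names and transition probabilities — partition refinement for probabilistic bisimilarity can be sketched as:

```python
def bisimilarity_partition(states, label, trans):
    """Naive partition refinement for probabilistic bisimilarity on an LMC.
    `label[s]` is the label of s; `trans[s][t]` the probability of s -> t.
    Refine until every state's signature (its current block, its label, and
    the probability mass it sends into each block) is stable; at most |S|
    rounds are needed, like the X_i iteration in Lemma 7."""
    def signature(s, partition):
        blk = next(i for i, b in enumerate(partition) if s in b)
        return (blk, label[s],
                tuple(sum(trans[s].get(t, 0) for t in block)
                      for block in partition))
    partition = [tuple(states)]
    while True:
        groups = {}
        for s in states:
            groups.setdefault(signature(s, partition), []).append(s)
        new_partition = [tuple(g) for g in groups.values()]
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition

# Two a-labelled states with identical one-step behaviour end up bisimilar:
blocks = bisimilarity_partition(
    ["s", "t", "u"],
    {"s": "a", "t": "a", "u": "b"},
    {"s": {"u": 1.0}, "t": {"u": 1.0}, "u": {"u": 1.0}})
```

The MDP setting of this appendix is more delicate: a state may have several actions with different successor distributions, which is what the sets ϕ(s)(X_i) and the prime-based strategy of Algorithm 3 handle.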
The following lemma is quite standard; it shows that the partition gets finer after each iteration of the partition refinement algorithm.

▶ Lemma 7. For all i ∈ N, we have ≡_{X_{i+1}} ⊆ ≡_{X_i}.

Proof. We prove the statement by induction on i. The base case where i = 0 is vacuously true. For the induction step, assume ≡_{X_{i+1}} ⊆ ≡_{X_i}. Then, for each ≡_{X_i}-equivalence class E, we have

  E = ⋃_j E_j where E_j ∈ S/≡_{X_{i+1}}.   (6)

Next, we show ≡_{X_{i+2}} ⊆ ≡_{X_{i+1}}. Let s, t ∈ S and s ≡_{X_{i+2}} t. If s = t, then s ≡_{X_{i+1}} t. Otherwise, assume s ≠ t. By the definition of ≡_{X_{i+2}}, we have ℓ(s) = ℓ(t), |ϕ(s)(X_{i+2})| = 1 and ϕ(s)(X_{i+2}) = ϕ(t)(X_{i+2}). For all m_1, m_2 ∈ A(s), m_3, m_4 ∈ A(t) and E ∈ X_{i+2}, we have

  ϕ(s, m_1)(E) = ϕ(s, m_2)(E) = ϕ(t, m_3)(E) = ϕ(t, m_4)(E).

Since X_{i+2} = S/≡_{X_{i+1}} and X_{i+1} = S/≡_{X_i}, by (6), we have for all m_1, m_2 ∈ A(s), m_3, m_4 ∈ A(t) and all E ∈ X_{i+1},

  ϕ(s, m_1)(E) = ϕ(s, m_2)(E) = ϕ(t, m_3)(E) = ϕ(t, m_4)(E).

Thus, |ϕ(s)(X_{i+1})| = |ϕ(t)(X_{i+1})| = 1 and ϕ(s)(X_{i+1}) = ϕ(t)(X_{i+1}). By the definition of ≡_{X_{i+1}}, we have s ≡_{X_{i+1}} t. ◀

The next lemma shows that if s ≡_{X_{|S|−1}} t then s ∼_{D(α)} t for all memoryless strategies α.

▶ Lemma 8.
Let X be a partition with X = S/≡_X. We have ≡_X ⊆ ∼_{D(α)} for all memoryless strategies α.

Proof. Let X be a partition with X = S/≡_X. Let α be a memoryless strategy. We show that ≡_X is a probabilistic bisimulation in the induced LMC D(α). By the definition of probabilistic bisimulation, it suffices to show that for all (u, v) ∈ ≡_X, we have ℓ(u) = ℓ(v) and τ(u)(E) = τ(v)(E) for each ≡_X-equivalence class E.

Since X = S/≡_X, each element E ∈ X is an ≡_X-equivalence class. Let u ≡_X v. We distinguish the following two cases: u = v and u ≠ v. If u = v, then u ∼_{D(α)} v is vacuously true. Assume u ≠ v. Let π be the probability distribution over X with π = ϕ(u, m_u)(X) for some m_u ∈ A(u). By definition of ≡_X, we have ℓ(u) = ℓ(v) and, for all m_u ∈ A(u) and m_v ∈ A(v), ϕ(u, m_u)(X) = ϕ(v, m_v)(X) = π.

In the LMC D(α), the transition probability from u to E ∈ X is

  τ(u)(E) = Σ_{m_u ∈ A(u)} α(u)(m_u) · ϕ(u, m_u)(E)
          = Σ_{m_u ∈ A(u)} α(u)(m_u) · π(E)            [π(E) = ϕ(u, m_u)(E)]
          = π(E)                                       [α(u) is a probability distribution over A(u)]

Similarly, the transition probability from v to E ∈ X, τ(v)(E), is also equal to π(E). Thus, for all ≡_X-equivalence classes E, we have τ(u)(E) = τ(v)(E). This completes the proof. ◀

▶ Lemma 9. θ(N) is polynomial in size(D).

Proof.
Since each X_i is a partition of S, we have |X_i| ≤ |S|. Together with the fact that Algorithm 2 runs for at most |S| iterations, |⋃_i X_i| is polynomial in |S|. Thus, |N| is polynomial in size(D).

Let b ∈ N. Since the smallest prime number is 2, we have θ(b) < log b. Furthermore, since log b is the bit size of b, θ(b) is then polynomial in size(D).

Finally, since θ(N) ≤ |N| · max_{b ∈ N} θ(b), θ(N) is polynomial in size(D). ◀

▶ Lemma 10.
One can find |S| different prime numbers in time polynomial in size(D) such that each of them is coprime to all numbers in the set N.

Proof. We denote by p(x) the number of primes less than or equal to a positive integer x. We show that there exists x ∈ N such that p(x) ≥ |S| + θ(N) and x is polynomial in size(D). Let x > 55 and x ≥ (|S| + θ(N))². Then,

  |S| + θ(N) ≤ √x = x/√x
             < x/(log x + 2)    [√x > log x + 2]
             < p(x)             [x/(log x + 2) < p(x) for x > 55 by [34]]

It follows that x is polynomial in size(D), as θ(N) is polynomial in size(D) by Lemma 9. For each positive integer i ≤ x, we can check whether it is prime, using the algorithm in [1], and whether it is coprime to all numbers in N. Each check can be done in polynomial time, as shown in [1], given that |N| is polynomial in size(D). ◀

▶ Lemma 11.
Let i ∈ N with i ≤ |S|. One can compute in polynomial time a partial strategy α_i such that ∼^i_{D(α)} ⊆ ≡_{X_i} for all α ⊒ α_i.

Proof. Following the proof sketch, we show the rest of the proof in detail.

In Algorithm 2, once a state u satisfies |ϕ(u)(X_i)| ≠ 1 for some partition X_i, it satisfies |ϕ(u)(X_j)| ≠ 1 for all j ≥ i, since ≡_{X_j} ⊆ ≡_{X_i} for all j ≥ i by Lemma 7. A state u is only added to the domain of the partial strategy once (in Algorithm 3, line 2), which guarantees that α_{i+1} does not overwrite α_i. It follows that any α ⊒ α_{i+1} also satisfies α ⊒ α_i.

Let s ≢_{X_{i+1}} t. Let α ⊒ α_{i+1}. If s ≁^i_{D(α)} t, then s ≁^{i+1}_{D(α)} t, since ∼^{i+1}_{D(α)} ⊆ ∼^i_{D(α)} for all i ∈ N. Otherwise, assume s ∼^i_{D(α)} t. From α ⊒ α_i and the induction hypothesis, it follows that s ≡_{X_i} t. Since s ≢_{X_{i+1}} t, by the definition of ≡_{X_i} and ≡_{X_{i+1}}, we have ℓ(s) = ℓ(t), s ≠ t and |ϕ(s)(X_i)| = |ϕ(t)(X_i)| = 1.

Towards a contradiction, assume s ∼^{i+1}_{D(α)} t. We show that under this assumption, for all E ∈ X_{i+1}, τ(s)(E) = τ(t)(E) should hold. Let E ∈ X_{i+1}. Since X_{i+1} = S/≡_{X_i}, E is an equivalence class with respect to ≡_{X_i}. By the induction hypothesis, we have

  E = ⋃_j E_j where E_j ∈ S/∼^i_{D(α)}.   (7)

Then,

  τ(s)(E) = Σ_j τ(s)(E_j)   [(7)]
          = Σ_j τ(t)(E_j)   [s ∼^{i+1}_{D(α)} t]
          = τ(t)(E)

The two different prime numbers p_s and p_t are associated with states s and t, respectively. If |ϕ(s)(X_{i+1})| ≠ 1, then α_{i+1}(s) is defined using p_s on line 4. It follows that τ(s)(E) ≠ τ(t)(E) for some E, since p_s can divide the denominator of τ(s)(E) but not the denominator of τ(t)(E). The case when |ϕ(t)(X_{i+1})| ≠ 1 is symmetrical. Otherwise, |ϕ(s)(X_{i+1})| = |ϕ(t)(X_{i+1})| = 1, and by the definition of ≡_{X_{i+1}} there must exist a set E with τ(s)(E) ≠ τ(t)(E).

We now show in detail that in each of the following cases we have the contradiction that τ(s)(E) ≠ τ(t)(E) for some E ∈ X_{i+1}.

- Assume |ϕ(s)(X_{i+1})| ≠ 1. From the construction of α_{i+1} on line 3, we pick m_1, m_2 ∈ A(s) and E ∈ X_{i+1} such that ϕ(s, m_1)(E) > ϕ(s, m_2)(E). In the LMC D(α), the probability from s to E is

  τ(s)(E) = Σ_{m ∈ A(s)} α(s)(m) · ϕ(s, m)(E)
          = Σ_{m ∈ A(s)} α_{i+1}(s)(m) · ϕ(s, m)(E)                  [α ⊒ α_{i+1}]
          = (1/p_s) · ϕ(s, m_1)(E) + (1 − 1/p_s) · ϕ(s, m_2)(E)
          = ϕ(s, m_2)(E) + (1/p_s) · ( ϕ(s, m_1)(E) − ϕ(s, m_2)(E) )
          = a_{s,m_2,E}/b_{s,m_2,E} + (1/p_s) · c_{s,m_1,m_2,E}/d_{s,m_1,m_2,E}
          = ( a_{s,m_2,E} d_{s,m_1,m_2,E} p_s + c_{s,m_1,m_2,E} b_{s,m_2,E} ) / ( b_{s,m_2,E} d_{s,m_1,m_2,E} p_s ),

  where a_{s,m_2,E}, b_{s,m_2,E}, c_{s,m_1,m_2,E} and d_{s,m_1,m_2,E} are defined before Lemma 9. The first summand of the numerator in the previous line can be divided by p_s. By (2), we have p_s ∤ b_{s,m_2,E} and p_s ∤ c_{s,m_1,m_2,E}. Thus, p_s cannot divide the second summand, and hence not the numerator. For u ∈ S and E ⊆ S, we express τ(u)(E) as an irreducible fraction x_{τ(u)(E)}/y_{τ(u)(E)}, where x_{τ(u)(E)} and y_{τ(u)(E)} are coprime integers. It follows that p_s | y_{τ(s)(E)}.

  For state t, we have either |ϕ(t)(X_{i+1})| ≠ 1 or |ϕ(t)(X_{i+1})| = 1. Assume |ϕ(t)(X_{i+1})| ≠ 1. Similarly to s, we have m_1, m_2 ∈ A(t) and a set E′ ∈ X_{i+1} such that ϕ(t, m_1)(E′) > ϕ(t, m_2)(E′). In the LMC D(α), the probability from t to E is

  τ(t)(E) = Σ_{m ∈ A(t)} α(t)(m) · ϕ(t, m)(E)
          = Σ_{m ∈ A(t)} α_{i+1}(t)(m) · ϕ(t, m)(E)                  [α ⊒ α_{i+1}]
          = (1/p_t) · ϕ(t, m_1)(E) + (1 − 1/p_t) · ϕ(t, m_2)(E)
          = (1/p_t) · a_{t,m_1,E}/b_{t,m_1,E} + ((p_t − 1)/p_t) · a_{t,m_2,E}/b_{t,m_2,E}
          = ( a_{t,m_1,E} b_{t,m_2,E} + (p_t − 1) · a_{t,m_2,E} b_{t,m_1,E} ) / ( p_t b_{t,m_1,E} b_{t,m_2,E} )

  By (2), the two prime numbers p_s and p_t are different and p_s ∤ b_{t,m_1,E}, b_{t,m_2,E}. It follows that p_s ∤ y_{τ(t)(E)}, and thus τ(s)(E) ≠ τ(t)(E).

  We consider the other case, where |ϕ(t)(X_{i+1})| = 1. It follows that ϕ(t, m)(E) is the same for all m ∈ A(t). Let m_1 ∈ A(t). In the LMC D(α), the probability from t to E is

  τ(t)(E) = Σ_{m ∈ A(t)} α(t)(m) · ϕ(t, m)(E) = ϕ(t, m_1)(E) = a_{t,m_1,E}/b_{t,m_1,E}

  By (2), p_s ∤ b_{t,m_1,E}. It follows that p_s ∤ y_{τ(t)(E)}, and thus τ(s)(E) ≠ τ(t)(E).

- Assume |ϕ(s)(X_{i+1})| = 1 and |ϕ(t)(X_{i+1})| ≠ 1. To avoid redundancy, we do not show the proof, as this case is similar to the case |ϕ(s)(X_{i+1})| ≠ 1 and |ϕ(t)(X_{i+1})| = 1.

- Assume |ϕ(s)(X_{i+1})| = |ϕ(t)(X_{i+1})| = 1. Since s ≢_{X_{i+1}} t, by definition of ≡_{X_{i+1}}, we have ϕ(s)(X_{i+1}) ≠ ϕ(t)(X_{i+1}). Let m_s ∈ A(s) and m_t ∈ A(t). There exists a set E ∈ X_{i+1} such that ϕ(s, m_s)(E) ≠ ϕ(t, m_t)(E). In the LMC D(α), we have

  τ(s)(E) = Σ_{m ∈ A(s)} α(s)(m) · ϕ(s, m)(E) = ϕ(s, m_s)(E)
          ≠ ϕ(t, m_t)(E) = Σ_{m ∈ A(t)} α(t)(m) · ϕ(t, m)(E) = τ(t)(E).

This completes the proof. ◀
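The prime-number machinery of Lemmas 9–11 can be illustrated by a small sketch. This is illustrative only: it uses naive trial division in place of the polynomial-time AKS primality test [1] that Lemma 10 actually relies on, and the function names are made up.

```python
from math import isqrt

def is_prime(n):
    """Naive trial-division primality test; Lemma 10 uses the
    polynomial-time AKS test [1] instead."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def find_coprime_primes(k, forbidden):
    """Find k distinct primes, each dividing no number in `forbidden`
    (hence coprime to all of them), by scanning upwards as in the proof
    of Lemma 10; Rosser's bound [34] keeps the search range polynomial."""
    primes, x = [], 2
    while len(primes) < k:
        if is_prime(x) and all(b % x != 0 for b in forbidden):
            primes.append(x)
        x += 1
    return primes

# Three primes avoiding the prime factors of 12 = 2^2 * 3 and 35 = 5 * 7:
find_coprime_primes(3, [12, 35])  # skips 2, 3, 5, 7 and returns [11, 13, 17]
```

Such primes p_s, used as strategy weights 1/p_s, guarantee that the induced transition probabilities of the perturbed states have denominators that cannot coincide "accidentally", which is exactly what the case analysis in Lemma 11 exploits.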
C Proofs of Section 5

C.1 Proofs of TV = 1

In this section, we show that the problem TV = 1 is in ∃R and is NP-hard. Recall that TV = 1 is the problem asking whether there is a memoryless strategy α for D such that the total variation distance of the two initial distributions is one in the induced labelled Markov chain D(α), i.e., d_tv(µ, ν) = 1.

Define the set

  R^{µ,ν} := { (r_1, r_2) ∈ S × S : ∃ w ∈ L* : r_1 ∈ support(µM(w)) and r_2 ∈ support(νM(w)) },

which can be computed in polynomial time, as shown in [10, Lemma 20]. For each r ∈ S, define the projection R^{µ,ν}_r := { r′ ∈ S : (r, r′) ∈ R^{µ,ν} }. According to [10, Theorem 21], the following proposition holds.

▶ Proposition 22. We have d_tv(µ, ν) < 1 if and only if there are r ∈ S and subdistributions µ_1 and µ_2 such that

  µ_1 ≡ µ_2 and r ∈ support(µ_1) and support(µ_2) ⊆ R^{µ,ν}_r   (8)

It is known that ∃R is closed under NP-reductions [41], which is needed for showing the membership of TV = 1 in ∃R, and later the membership of TV < 1 in ∃R.

▶ Theorem 14. The problem TV = 1 is in ∃R.

Proof.
Let B_α ∈ R^{S×r} be a matrix consisting of r ≤ |S| linearly independent columns, which we denote by b_0, …, b_{r−1}. Furthermore, we have
- b_0 = 𝟙;
- b_i = M_α(w_i)𝟙 where w_i ∈ L^{≤|S|}, for all 1 ≤ i < r.

The columns of B_α are linearly independent, i.e., B_α has full rank r, if and only if there exists a reduced QR factorization of B_α, i.e., there exist a matrix Q ∈ R^{S×r} with orthonormal columns and an upper triangular matrix R ∈ R^{r×r} with all diagonal entries nonzero such that B_α = QR.

The matrix B_α is a basis for the vector space ⟨M_α(w) · 𝟙 : w ∈ L*⟩ if and only if B_α is closed under pre-multiplication with M_α(a) for any a ∈ L, i.e., for each label a ∈ L there exists a matrix F(a) ∈ R^{r×r} such that M_α(a) · B_α = B_α F(a).

Let I_n ∈ R^{n×n} denote the identity matrix of size n. Let H_α ∈ R^{S×r′} be a matrix consisting of r′ columns, which are denoted by h_0, …, h_{r′−1}. Furthermore, we require that all of the columns have length one and are mutually orthogonal, i.e., H_α^T H_α = I_{r′}. It is an orthonormal basis for the vector space ⟨x : B_α^T x = 0⟩, i.e., the orthogonal complement of ⟨B_α⟩, if and only if B_α^T h_i = 0 for all 0 ≤ i < r′ and rank(B_α) + rank(H_α) = r + r′ = |S|.

Recall that c_i ∈ {0, 1}^S is the column bit vector whose only non-zero entry is the i-th one. For each s ∈ S, define a convex polyhedron

  P_s = { c_s + Σ_{t ∈ S} λ_t c_t + Σ_{t ∈ R^{µ,ν}_s} λ′_t (−c_t) | λ_t ≥ 0, λ′_t ≥ 0 }.

We call c_t for t ∈ S and −c_t for t ∈ R^{µ,ν}_s the spanning vectors of P_s.

Assume the matrix B_α is a basis for ⟨M_α(w) · 𝟙 : w ∈ L*⟩ and H_α is an orthonormal basis for the orthogonal complement of ⟨B_α⟩. We show that the two convex polyhedra ⟨H_α⟩ and P_s intersect if and only if d_tv(µ, ν) < 1 in the LMC D(α). We distinguish the following two cases:
- Assume s ∈ R^{µ,ν}_s. It is easy to check that 0 ∈ ⟨H_α⟩ ∩ P_s. Define the two subdistributions µ_1 and µ_2 as µ_1 = µ_2 = δ_s. By Proposition 22, d_tv(µ, ν) < 1, as µ_1 and µ_2 satisfy (8).
- Assume s ∉ R^{µ,ν}_s. We first show the backward implication. From Proposition 22, there exist subdistributions µ_1 and µ_2 satisfying (8). Let N = µ_1(s) − µ_2(s). Since s ∈ support(µ_1) and s ∉ R^{µ,ν}_s, we have N = µ_1(s) > 0. Define the vector v = (µ_1 − µ_2)^T / N. We can easily verify that it is in both ⟨H_α⟩ and P_s, and hence ⟨H_α⟩ ∩ P_s ≠ ∅.

  For the converse, assume ⟨H_α⟩ ∩ P_s ≠ ∅. Let v be a column vector such that v ∈ ⟨H_α⟩ and v ∈ P_s. Since v ∈ P_s and s ∉ R^{µ,ν}_s, we have v(s) ≥ 1 and v(t) ≥ 0 for t ∈ S \ R^{µ,ν}_s. Since B_α^T v = 0 and b_0 = 𝟙, we have 𝟙^T v = 0. It follows that {t : v(t) < 0} ⊆ R^{µ,ν}_s and {t : v(t) < 0} ≠ ∅. Let N = Σ_{u ∈ S} |v(u)|. Define the two subdistributions µ_1 and µ_2 as follows:

    µ_1(u) = v(u)/N if v(u) > 0, and µ_1(u) = 0 otherwise;
    µ_2(u) = −v(u)/N if v(u) < 0, and µ_2(u) = 0 otherwise.

  Since µ_1 − µ_2 = v/N and v is orthogonal to ⟨B_α⟩, µ_1 − µ_2 is also orthogonal to ⟨B_α⟩, and thus µ_1 ≡ µ_2. Furthermore, we have µ_1(s) = v(s)/N ≥ 1/N > 0, and support(µ_2) = {t : v(t) < 0} ⊆ R^{µ,ν}_s. From Proposition 22, it follows that d_tv(µ, ν) < 1 in D(α), since µ_1 and µ_2 satisfy (8).

From the analysis above, to show that there exists a memoryless strategy α such that d_tv(µ, ν) = 1 in the LMC D(α), it suffices to show that there exists a memoryless strategy α such that ⟨H_α⟩ ∩ P_s = ∅ for all s ∈ S. By [29, Theorem 5.5.1], if the two convex polyhedra ⟨H_α⟩ and P_s are disjoint, then there exists a hyperplane that strictly separates them, i.e., there exist a_s, b_s ∈ R and a row vector v_s ∈ R^S such that v_s · x ≤ a_s for all x ∈ ⟨H_α⟩, v_s · x ≥ b_s for all x ∈ P_s, and a_s < b_s. Since 0 ∈ ⟨H_α⟩, we have a_s ≥ 0, so 0 ≤ a_s < b_s. For any column vector h of H_α, a · h is also in ⟨H_α⟩ for any a ∈ R. It follows that v_s · h ≤ a_s/a and v_s · h ≥ −a_s/a for all a > 0, and hence

  −lim_{a→∞} a_s/a ≤ v_s · h ≤ lim_{a→∞} a_s/a.

Since both the left and right limits exist and are equal to zero, we have v_s · h = 0 for all column vectors h of H_α. It follows that v_s · x = 0 for all x ∈ ⟨H_α⟩, v_s · x ≥ b_s for all x ∈ P_s, and b_s > 0.

A memoryless strategy α for D can be characterised by numbers x_{s,m} ∈ [0, 1], where s ∈ S and m ∈ A, such that x_{s,m} = α(s)(m). We write x̄ for the collection (x_{s,m})_{s ∈ S, m ∈ A}. Thus, to decide if there exists a memoryless strategy such that d_tv(µ, ν) = 1, we nondeterministically guess a nonnegative integer r and a set of r − 1 words w_i ∈ L^{≤|S|}, 1 ≤ i < r, and then check the following decision problem, which is a closed formula in the existential theory of the reals:

∃ x̄, a matrix B_α ∈ R^{S×r} the columns of which are denoted by b_0, …, b_{r−1}, a matrix Q ∈ R^{S×r}, an upper triangular matrix R ∈ R^{r×r}, matrices F(a) ∈ R^{r×r} for all a ∈ L, a matrix H_α ∈ R^{S×r′} the columns of which are denoted by h_0, …, h_{r′−1}, and row vectors v_s ∈ R^S and b_s ∈ R for all s ∈ S, such that
- for all s ∈ S: Σ_{m ∈ A(s)} x_{s,m} = 1;   [x̄ characterising a memoryless strategy]
- b_0 = 𝟙;
- for all 1 ≤ i < r: b_i = M_α(w_i)𝟙;
- Q^T Q = I_r;
- R[i, i] ≠ 0 for all i;
- B_α = QR;
- for all labels a ∈ L: M_α(a) · B_α = B_α F(a);   [B_α is a basis for ⟨M_α(w) · 𝟙 : w ∈ L*⟩]
- H_α^T H_α = I_{r′};
- for all 0 ≤ i < r′: B_α^T h_i = 0;
- r + r′ = |S|;   [H_α is an orthonormal basis for the orthogonal complement of ⟨B_α⟩]
- for all s ∈ S: v_s · h_i = 0 for all 0 ≤ i < r′;
- for all s ∈ S: v_s · x ≥ b_s for all x ∈ P_s, i.e., v_s · c_s ≥ b_s and v_s · c ≥ 0 for each spanning vector c of P_s;
- for all s ∈ S: b_s > 0.   [for all s ∈ S, ⟨H_α⟩ and P_s do not intersect] ◀

Let µ_1, µ_2 be two subdistributions on S. We write µ_1 ≤ µ_2 to say that µ_1(u) ≤ µ_2(u) for all u ∈ S. According to [10, Proposition 17], the following proposition holds.

▶ Proposition 23.
We have d_tv(µ, ν) < 1 if and only if there are w ∈ L* and µ_1 and µ_2 with µ_1 ≤ µM(w) and µ_2 ≤ νM(w) and µ_1 ≡ µ_2 and |µ_1| = |µ_2| > 0.

▶ Theorem 15.
The Set Splitting problem is polynomial-time many-one reducible to TV = 1; hence TV = 1 is NP-hard.

Proof.
Let ⟨S, C⟩ be an instance of Set Splitting where S = {e_1, …, e_n} and C = {C_1, …, C_m} is a collection of subsets of S. We construct an MDP D (see Figure 3 for an example) consisting of the following states: two initial states s and t, a state e_i for each element of S, twin states C_j and C̄_j for each element of C, and two sink states u and v. State v has label b while all other states have label a. State s (resp. t) has a single action which goes with uniform probability 1/m to the states C_i (resp. C̄_i) for 1 ≤ i ≤ m. For each e_i ∈ C_j, there is an action from states C_j and C̄_j leading to state e_i with probability one. Each state e_i has two actions, going to the sink states u and v with probability one, respectively. We show that

⟨S, C⟩ ∈ Set Splitting ⇐⇒ ∃ memoryless strategy α such that d_tv(µ, ν) = 1 in D(α).

Intuitively, making C_i (resp. C̄_i) select the transition to e_j simulates the membership of e_j in S_1 (resp. S_2).

(⇒) Let S_1 and S_2 be the two disjoint sets that partition S and split the elements in C. For the MDP D, we define an MD strategy α as follows: let each state e_i ∈ S_1 select the action transitioning to u and each state e_i ∈ S_2 the action to v; let each state C_i select an available action that goes to a state in S_1 and each C̄_i an available action that goes to a state in S_2.

We show that d_tv(µ, ν) = 1 in the LMC D(α). Let µ_1 and µ_2 be subdistributions over the states that are reachable from s and t, respectively. Let E ∈ F be the set of words ending with an infinite number of b's. Since a word emitted by running D(α) from an arbitrary state in support(µ_2) always ends with infinitely many b's, we have Pr_{µ_2}(E) > 0. On the other hand, since a word emitted by running D(α) from an arbitrary state in support(µ_1) always ends with infinitely many a's, we have Pr_{µ_1}(E) = 0. Then,

  d_tv(µ_1, µ_2) = sup_{E′∈F} |Pr_{µ_1}(E′) − Pr_{µ_2}(E′)| ≥ |Pr_{µ_1}(E) − Pr_{µ_2}(E)| > 0,

so no such subdistributions with |µ_1| = |µ_2| > 0 can satisfy µ_1 ≡ µ_2. By Proposition 23, d_tv(µ, ν) = 1 in the LMC D(α).

(⇐) Let α be a memoryless strategy for D such that d_tv(µ, ν) = 1. Let τ be the transition function of the LMC D(α). Let S_1 = ⋃_{C_i} support(τ(C_i)) and S_2 = S \ S_1. Let S̄_2 = ⋃_{C_i} support(τ(C̄_i)). It suffices to show that S̄_2 ⊆ S_2 and that S_1 and S_2 split the elements of C.

We have S_1 ∩ S̄_2 = ∅, for otherwise d_tv(µ, ν) < 1; hence S̄_2 ⊆ S_2. We prove by contradiction that S_1 and S_2 split the elements of C. Assume there is a set C_i ∈ C which is not split by S_1 and S_2. Furthermore, without loss of generality, assume C_i ⊆ S_1. Since the states C_i and C̄_i have the same successors in the MDP D, there must exist a state e ∈ C_i such that e ∈ support(τ(C̄_i)). Let µ_1 = µ_2 = δ_e. We have d_tv(µ_1, µ_2) = 0, which leads to the desired contradiction d_tv(µ, ν) < 1 in D(α) by Proposition 23. ◀

C.2 Proofs of PB = 1
Next, we show that the problem PB = 1 is NP-complete. Recall that PB = 1 is the problem asking whether there is a memoryless strategy α for D such that the probabilistic bisimilarity distance of the two initial states is one in the induced labelled Markov chain D(α), i.e., d_pb(s, t) = 1.

▶ Definition 24. The directed graph G = (V, E) is defined by

  V = {(u, v) : ℓ(u) = ℓ(v)}
  E = {⟨(u, v), (s, t)⟩ : τ(s)(u) > 0 ∧ τ(t)(v) > 0}

By [39, Theorem 4, Proposition 5], the following proposition holds.

▶ Proposition 25. We have d_pb(s, t) < 1 if and only if in the graph G = (V, E) the vertex (s, t) is reachable from some (u, v) with u ∼ v.

▶ Theorem 26. The problem PB = 1 is in NP.

Proof.
Suppose there exists a memoryless strategy β such that d_pb(s, t) = 1 in D(β). Let G be the graph of Definition 24 induced by the LMC D(β). Consider an MDP D′ = ⟨S, A′, L, ϕ, ℓ⟩, which is over the same state space as D but is restricted to choose actions that conform to the graph G. Thus, β is also a strategy of D′. Furthermore, we have D′(α) = D(α) for all memoryless strategies α of D′.

According to Theorem 12, a memoryless strategy α* of D′ such that ∼_{D′(α*)} ⊆ ∼_{D′(α)} for all memoryless strategies α can be computed in polynomial time. Thus, we have ∼_{D(α*)} ⊆ ∼_{D(β)}, that is, if u ≁_{D(β)} v then u ≁_{D(α*)} v for u, v ∈ S. Let G* be the graph of Definition 24 for the LMC D(α*). Since D′ conforms to G, G* is a subgraph of G. Let R and R* be the sets of state pairs that can reach (s, t) in G and G*, respectively. We have R* ⊆ R.

According to Proposition 25, since d_pb(s, t) = 1 in the LMC D(β), we have u ≁_{D(β)} v for all (u, v) ∈ R. By ∼_{D(α*)} ⊆ ∼_{D(β)} and R* ⊆ R, we have u ≁_{D(α*)} v for all (u, v) ∈ R*. By Proposition 25, we have d_pb(s, t) = 1 in the LMC D(α*), and hence α* is a memoryless strategy that witnesses d_pb(s, t) = 1.

This induces the following nondeterministic algorithm: we guess the graph G and check whether d_pb(s, t) = 1 holds in D(α*), where both the construction of the memoryless strategy α* (using Algorithm 3) and the checking of d_pb(s, t) = 1 are in polynomial time. ◀

▶ Theorem 27.
The Set Splitting problem is polynomial-time many-one reducible to PB = 1; hence PB = 1 is NP-hard.

Proof.
Given an instance ⟨S, C⟩ of Set Splitting where S = {e_1, …, e_n} and C = {C_1, …, C_m} is a collection of subsets of S, we construct the same MDP D as in Theorem 15 (see Figure 3 for an example). We show that

⟨S, C⟩ ∈ Set Splitting ⇐⇒ ∃ memoryless strategy α for D such that d_pb(s, t) = 1 in D(α).

(⇒) Let S_1 and S_2 be the two disjoint sets that partition S and split the elements of C. According to Theorem 15, there exists a memoryless strategy α such that d_tv(δ_s, δ_t) = 1 in the induced LMC D(α). Since the probabilistic bisimilarity distance is an upper bound of the total variation distance [9], we have d_pb(s, t) = 1 in D(α).

(⇐) Let α be a memoryless strategy for D such that d_pb(s, t) = 1 in the LMC D(α). Let τ be the transition function of the LMC D(α). Let S_1 = ⋃_{C_i} support(τ(C_i)) and S_2 = S \ S_1. Let S̄_2 = ⋃_{C_i} support(τ(C̄_i)). It suffices to show that S_1 and S_2 split the elements of C and that S̄_2 ⊆ S_2.

Since d_pb(s, t) = 1, by definition of the probabilistic bisimilarity distance, d_pb(C_i, C̄_j) = 1 for any choice of C_i and C̄_j. We obtain, by the same argument, d_pb(e_k, e_l) = 1 for any e_k ∈ support(τ(C_i)) and e_l ∈ support(τ(C̄_j)). Thus, we have support(τ(C_i)) ∩ support(τ(C̄_j)) = ∅ for any choice of C_i and C̄_j. It follows that S_1 ∩ S̄_2 = ∅, that is, S̄_2 ⊆ S_2. Furthermore, for any set C_i ∈ C, there are two states e_k, e_l ∈ C_i such that e_k ∈ support(τ(C_i)) and e_l ∈ support(τ(C̄_i)), that is, e_k and e_l split the set C_i. ◀

▶ Theorem 16.
The problem PB = 1 is NP-complete.

Proof. It follows from Theorem 26 and Theorem 27. ◀
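As an illustration of the reduction used in Theorems 15 and 27, the MDP D built from a Set Splitting instance can be sketched in a few lines of Python. This is our own illustrative encoding — the function name `build_set_splitting_mdp` and the action names `'go'`, `'pick'`, `'to_u'`, `'to_v'` are not from the paper; the code only constructs the transition structure described above.

```python
from fractions import Fraction

def build_set_splitting_mdp(elements, collection):
    """Build the MDP D of the Set Splitting reduction (illustrative sketch).

    States: s, t, twin states ('C', j) and ('Cbar', j) for each set C_j,
    one state per element, and two sinks u, v.  Only v is labelled b.
    Returns (states, labels, actions), where actions maps each state to a
    dict {action_name: {successor: probability}}.
    """
    m = len(collection)
    states = ['s', 't', 'u', 'v'] + list(elements)
    states += [('C', j) for j in range(m)] + [('Cbar', j) for j in range(m)]
    labels = {q: ('b' if q == 'v' else 'a') for q in states}

    actions = {}
    # s and t each have a single action, uniform over the twin copies.
    actions['s'] = {'go': {('C', j): Fraction(1, m) for j in range(m)}}
    actions['t'] = {'go': {('Cbar', j): Fraction(1, m) for j in range(m)}}
    # Both twins of C_j may move (with probability one) to any e in C_j.
    for j, Cj in enumerate(collection):
        for twin in (('C', j), ('Cbar', j)):
            actions[twin] = {('pick', e): {e: Fraction(1)} for e in Cj}
    # Each element chooses one of the two sinks; the sinks loop forever.
    for e in elements:
        actions[e] = {'to_u': {'u': Fraction(1)}, 'to_v': {'v': Fraction(1)}}
    actions['u'] = {'loop': {'u': Fraction(1)}}
    actions['v'] = {'loop': {'v': Fraction(1)}}
    return states, labels, actions
```

An MD strategy then picks, for every `('C', j)` and `('Cbar', j)`, one `('pick', e)` action, and for every element one of `'to_u'`/`'to_v'`, exactly as in the (⇒) direction of the proof.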
D Proofs of Section 6

D.1 Proofs of TV = 0
In this section we show that the problem TV = 0 is ∃R-complete. Recall that TV = 0 is the problem asking whether there is a memoryless strategy α for D such that the total variation distance of the two initial distributions is zero in the induced labelled Markov chain D(α), i.e., d_tv(µ, ν) = 0. The following proposition is adapted from [25, Proposition 10]; it will be used to prove Theorem 29.

▶ Proposition 28. Let M = ⟨S, L, τ, ℓ⟩ be an LMC and µ and ν be two (sub)distributions. We have that µ ≡ ν if and only if there exists F ∈ R^{S×S} such that
- the first row of F is µ − ν;
- F𝟙 = 0; and
- for each label a ∈ L there exists a matrix B(a) ∈ R^{S×S} such that F M(a) = B(a) F.

▶ Theorem 29. The problem TV = 0 is in ∃R.

Proof.
The proof is very similar to the one of [18, Theorem 4.3]. A memoryless strategy α for D can be characterised by numbers x_{s,m} ∈ [0,1] where s ∈ S and m ∈ A such that x_{s,m} = α(s)(m). We write x̄ for the collection (x_{s,m})_{s∈S, m∈A}. According to Proposition 28, in the LMC D(α) we have µ ≡ ν if and only if the following decision problem, which is a closed formula in the existential theory of the reals, has answer "yes":

∃ x̄, matrices B(a) ∈ R^{S×S} for all a ∈ L and a matrix F ∈ R^{S×S} such that
- Σ_{m∈A(s)} x_{s,m} = 1 for all s ∈ S;
- the first row of F is µ − ν;
- F𝟙 = 0;
- F M_α(a) = B(a) F for all a ∈ L. ◀

To show that the problem TV = 0 is hard for ∃R, we present a reduction from the nonnegative matrix factorization (NMF) problem. Given an instance of NMF, a nonnegative matrix J ∈ Q^{n×m} and a number r ∈ N, we construct an MDP D; see Figure 6. Similar to [18, Theorem 4.5], we assume, without loss of generality, that J is a stochastic matrix. The left part is an LMC. The transition probability from s_i to p_j in the LMC encodes the entry J[i,j].

The other part is an MDP; see the right of Figure 6. The initial state t transitions to the successors t_1, …, t_n with equal probabilities. In each t_i where 1 ≤ i ≤ n, there are r actions m_{i,1}, …, m_{i,r} where ϕ(t_i, m_{i,k}) = δ_{t′_k} for 1 ≤ k ≤ r. In each t′_k, there are m actions m′_{k,1}, …, m′_{k,m} where ϕ(t′_k, m′_{k,j}) = δ_{q_j} for 1 ≤ j ≤ m. In state q_j, there is only one action, which transitions back to state t with probability one. The probabilities of choosing the action m_{i,k} in t_i and choosing m′_{k,j} in t′_k simulate the entries A[i,k] and W[k,j].

The distributions µ and ν are the Dirac distributions on s and t, respectively. The labels of the states are as follows: ℓ(s_i) = ℓ(t_i) = a_i for 1 ≤ i ≤ n, ℓ(p_j) = ℓ(q_j) = b_j for 1 ≤ j ≤ m, and all remaining states have label c. The construction is very similar to the one in [18, Theorem 4.5]. The following proposition is technical and is used in proving Theorem 31 and Theorem 34.

Figure 6: The MDP D in the reduction for ∃R-hardness of TV = 0 (or TV < 1), where ℓ(s_i) = ℓ(t_i) = a_i for 1 ≤ i ≤ n, ℓ(p_j) = ℓ(q_j) = b_j for 1 ≤ j ≤ m, and all remaining states have label c.

▶ Proposition 30.
The NMF instance is a yes-instance if and only if there is a memoryless strategy α such that d_tv(µ, ν) = 0 in D(α).

Proof. (⇐) Assume there is a memoryless strategy α such that in the induced Markov chain d_tv(µ, ν) = 0, that is, we have Pr_µ(Run(w)) = Pr_ν(Run(w)) for all words w ∈ L*. For all 1 ≤ i ≤ n, 1 ≤ k ≤ r and 1 ≤ j ≤ m, let A[i,k] = α(t_i)(m_{i,k}) and W[k,j] = α(t′_k)(m′_{k,j}). In the LMC D(α), for all 1 ≤ i ≤ n and all 1 ≤ j ≤ m, we have

  Pr_µ(Run(c a_i c b_j)) = (1/n) J[i,j]  and
  Pr_ν(Run(c a_i c b_j)) = (1/n) Σ_{k=1}^{r} α(t_i)(m_{i,k}) · α(t′_k)(m′_{k,j}) = (1/n) Σ_{k=1}^{r} A[i,k] · W[k,j].

For all i, j we have Pr_µ(Run(c a_i c b_j)) = Pr_ν(Run(c a_i c b_j)). Thus, we have Σ_{k=1}^{r} A[i,k] · W[k,j] = J[i,j] for all i, j.

(⇒) Assume the NMF instance is a yes-instance, that is, Σ_{k=1}^{r} A[i,k] · W[k,j] = J[i,j] for all i, j. We construct a memoryless strategy α such that d_tv(µ, ν) = 0 in D(α). For all states s′ ∈ S and m ∈ A, the strategy α is defined by

  α(s′)(m) = A[i,k] if s′ = t_i and m = m_{i,k} where 1 ≤ i ≤ n and 1 ≤ k ≤ r;
  α(s′)(m) = W[k,j] if s′ = t′_k and m = m′_{k,j} where 1 ≤ k ≤ r and 1 ≤ j ≤ m;
  α(s′)(m) = 1 if m is the only action available to s′.

Let k ∈ N. Define w_k to be a word c a_{i_k} c b_{j_k} where 1 ≤ i_k ≤ n and 1 ≤ j_k ≤ m. To show that Pr_µ(Run(w)) = Pr_ν(Run(w)) for all w ∈ L*, it suffices to show that for all k ∈ N we have:
- Pr_µ(Run(w_1 ⋯ w_k)) = Pr_ν(Run(w_1 ⋯ w_k)) = (1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}];
- Pr_µ(Run(w_1 ⋯ w_k c)) = Pr_ν(Run(w_1 ⋯ w_k c));
- Pr_µ(Run(w_1 ⋯ w_k c a_{i_{k+1}})) = Pr_ν(Run(w_1 ⋯ w_k c a_{i_{k+1}}));
- Pr_µ(Run(w_1 ⋯ w_k c a_{i_{k+1}} c)) = Pr_ν(Run(w_1 ⋯ w_k c a_{i_{k+1}} c)).

We prove the statement by induction on k. The base case is k = 0. We have Pr_µ(Run(ε)) = Pr_ν(Run(ε)) = Pr_µ(Run(c)) = Pr_ν(Run(c)) = 1 and Pr_µ(Run(c a_i)) = Pr_ν(Run(c a_i)) = Pr_µ(Run(c a_i c)) = Pr_ν(Run(c a_i c)) = 1/n.

For the induction step, assume the statement holds for all k′ ≤ k. By the induction hypothesis, we have:

  µ M_α(w_1 ⋯ w_k) = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) δ_s = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) µ  and  (9)
  ν M_α(w_1 ⋯ w_k) = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) δ_t = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) ν.  (10)

First, we show that

  Pr_µ(Run(w_1 ⋯ w_{k+1})) = Pr_ν(Run(w_1 ⋯ w_{k+1})) = (1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}].  (11)

We have

  Pr_µ(Run(w_1 ⋯ w_{k+1}))
  = |µ M_α(w_1 ⋯ w_{k+1})| = |µ M_α(w_1 ⋯ w_k) M_α(w_{k+1})|
  = |((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) µ M_α(w_{k+1})|  [(9)]
  = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) |µ M_α(w_{k+1})|
  = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) (1/n) J[i_{k+1}, j_{k+1}]  [induction hypothesis]
  = (1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}].

Similarly,

  Pr_ν(Run(w_1 ⋯ w_{k+1}))
  = |ν M_α(w_1 ⋯ w_{k+1})| = |ν M_α(w_1 ⋯ w_k) M_α(w_{k+1})|
  = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) |ν M_α(w_{k+1})|  [(10)]
  = ((1/n^k) Π_{k′=1}^{k} J[i_{k′}, j_{k′}]) (1/n) J[i_{k+1}, j_{k+1}]  [induction hypothesis]
  = (1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}].

By equation (11), we have

  µ M_α(w_1 ⋯ w_{k+1}) = ((1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]) δ_s = ((1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]) µ  and  (12)
  ν M_α(w_1 ⋯ w_{k+1}) = ((1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]) δ_t = ((1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]) ν.  (13)

Thus,

  Pr_µ(Run(w_1 ⋯ w_{k+1} c)) = |µ M_α(w_1 ⋯ w_{k+1} c)| = |µ M_α(w_1 ⋯ w_{k+1}) M_α(c)|
  = ((1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]) |µ M_α(c)|  [(12)]
  = (1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}].

Similarly, Pr_ν(Run(w_1 ⋯ w_{k+1} c)) = (1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]. We also have

  Pr_µ(Run(w_1 ⋯ w_{k+1} c a_{i_{k+2}})) = |µ M_α(w_1 ⋯ w_{k+1}) M_α(c a_{i_{k+2}})|
  = ((1/n^{k+1}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]) |µ M_α(c a_{i_{k+2}})|  [(12)]
  = (1/n^{k+2}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}].

Similarly, Pr_ν(Run(w_1 ⋯ w_{k+1} c a_{i_{k+2}})) = Pr_µ(Run(w_1 ⋯ w_{k+1} c a_{i_{k+2}} c)) = Pr_ν(Run(w_1 ⋯ w_{k+1} c a_{i_{k+2}} c)) = (1/n^{k+2}) Π_{k′=1}^{k+1} J[i_{k′}, j_{k′}]. ◀

▶ Theorem 31.
The NMF problem is polynomial-time reducible to the problem TV = 0; hence TV = 0 is ∃R-hard.

Proof.
Proposition 30 shows that the NMF problem is polynomial-time reducible to the problem TV = 0. Since the NMF problem is ∃R-complete [38], we have that the problem TV = 0 is ∃R-hard. ◀

▶ Theorem 17. The problem TV = 0 is ∃R-complete.

Proof. It follows from Theorem 29 and Theorem 31. ◀
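The key identity behind Proposition 30 can be checked mechanically: under the strategy induced by factors A and W, the word c a_i c b_j has probability (1/n)·J[i,j] from s and (1/n)·Σ_k A[i,k]·W[k,j] from t. The following Python sketch (our own naming, exact arithmetic via `Fraction`) tests this identity for given matrices; it is an illustration, not code from the paper.

```python
from fractions import Fraction

def trace_probs_match(J, A, W):
    """Check the identity from Proposition 30: with the strategy that plays
    m_{i,k} in t_i with probability A[i][k] and m'_{k,j} in t'_k with
    probability W[k][j], the word c a_i c b_j has probability
    (1/n) * J[i][j] from s and (1/n) * sum_k A[i][k]*W[k][j] from t.
    Returns True iff the two trace probabilities agree for all i, j,
    i.e. iff A*W = J entrywise.
    """
    n = len(J)
    for i in range(n):
        for j in range(len(J[0])):
            from_s = Fraction(1, n) * J[i][j]
            from_t = Fraction(1, n) * sum(A[i][k] * W[k][j]
                                          for k in range(len(W)))
            if from_s != from_t:
                return False
    return True
```

So d_tv(µ, ν) = 0 forces an exact rank-r nonnegative factorization of J, which is precisely what makes the reduction work.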
D.2 Proofs of TV < 1

Next, we show that the problem TV < 1 is ∃R-complete. Recall that TV < 1 is the problem asking whether there is a memoryless strategy α for D such that the total variation distance of the two initial distributions is less than one in the induced labelled Markov chain D(α), i.e., d_tv(µ, ν) < 1.

▶ Theorem 32. The problem TV < 1 is in ∃R.

Proof.
A memoryless strategy α for D can be characterised by numbers x_{s,m} ∈ [0,1] where s ∈ S and m ∈ A such that x_{s,m} = α(s)(m). We write x̄ for the collection (x_{s,m})_{s∈S, m∈A}. From Proposition 22, to check whether there is a memoryless strategy α such that d_tv(µ, ν) < 1, it suffices to check if there are subdistributions µ_1 and µ_2 that satisfy Equation (8). Thus, we can nondeterministically guess a state r and the support of µ_2 such that support(µ_2) ⊆ R_r^{µ,ν}, and then check the following decision problem, which is a closed formula in the existential theory of the reals:

∃ x̄, matrices B(a) ∈ R^{S×S} for all a ∈ L, a matrix F ∈ R^{S×S}, subdistributions µ_1 and µ_2 such that
- Σ_{m∈A(s)} x_{s,m} = 1 for all s ∈ S;
- the first row of F is µ_1 − µ_2;
- F𝟙 = 0;
- F M(a) = B(a) F for all a ∈ L;
- r ∈ support(µ_1);
- support(µ_2) ⊆ R_r^{µ,ν}.

It follows that the problem is in ∃R since ∃R is closed under NP-reductions [41]. ◀

To show that the problem TV < 1 is hard for ∃R, we present a reduction from the nonnegative matrix factorization (NMF) problem. We construct the same MDP D as shown in Figure 6. The reduction is similar to [18, Theorem 4.5]. The proposition below is technical and is only used in the proof of Theorem 34.

▶ Proposition 33.
If the NMF instance is a no-instance, then for all memoryless strategies α, all (sub)distributions µ_1 over the left part of D and all (sub)distributions µ_2 over the right part, we have d_tv(µ_1, µ_2) > 0 in the LMC D(α).

Proof. Let µ_1 and µ_2 be two (sub)distributions where µ_1 is over the left part of D and µ_2 is over the right part. Let the NMF instance be a no-instance. Let α be any memoryless strategy.

By the construction of the MDP D (see Figure 6), there must exist a word w_0 ∈ L* such that µ_1 M_α(w_0) is a scalar multiple of the Dirac distribution on state s. Let µ′_1 = µ_1 M_α(w_0) and µ′_2 = µ_2 M_α(w_0). We have that µ′_1 = µ′_1(s) δ_s. We distinguish the following three cases.

(a) Assume |µ′_1| ≠ |µ′_2|. Let E = L^ω. We have

  d_tv(µ′_1, µ′_2) = sup_{E′∈F} |Pr_{µ′_1}(E′) − Pr_{µ′_2}(E′)| ≥ |Pr_{µ′_1}(E) − Pr_{µ′_2}(E)|  [E ∈ F]
  = ||µ′_1| − |µ′_2|| > 0.  [|µ′_1| ≠ |µ′_2|]

(b) Assume |µ′_1| = |µ′_2| and µ′_1(s) ≠ µ′_2(t). Let E = Run(c a_i) ∈ F. We have

  d_tv(µ′_1, µ′_2) ≥ |Pr_{µ′_1}(E) − Pr_{µ′_2}(E)|  [E ∈ F]
  = |Pr_{µ′_1}(Run(c a_i)) − Pr_{µ′_2}(Run(c a_i))|
  = ||µ′_1 M_α(c a_i)| − |µ′_2 M_α(c a_i)||
  = |(1/n) µ′_1(s) − (1/n) µ′_2(t)| > 0.  [µ′_1(s) ≠ µ′_2(t)]

(c) Assume |µ′_1| = |µ′_2| and µ′_1(s) = µ′_2(t) > 0. Since µ′_1 = µ′_1(s) δ_s, |µ′_1| = |µ′_2| and µ′_2(t) = µ′_1(s) > 0, we have that µ′_2 = µ′_2(t) δ_t. By Proposition 30, if the NMF instance is a no-instance then d_tv(µ, ν) > 0 in D(α); that is, there exists a word w ∈ L* such that Pr_µ(Run(w)) ≠ Pr_ν(Run(w)). This word w is of the form c a_{i_1} c b_{j_1} c a_{i_2} ⋯, since it is emitted by running the MDP D from state s. Let E = Run(w) ∈ F. We have

  d_tv(µ′_1, µ′_2) ≥ |Pr_{µ′_1}(Run(w)) − Pr_{µ′_2}(Run(w))|  [E ∈ F]
  = ||µ′_1 M_α(w)| − |µ′_2 M_α(w)||
  = ||µ′_1(s) δ_s M_α(w)| − |µ′_2(t) δ_t M_α(w)||  [µ′_1 = µ′_1(s) δ_s and µ′_2 = µ′_2(t) δ_t]
  = |µ′_1(s) Pr_µ(Run(w)) − µ′_2(t) Pr_ν(Run(w))|  [µ = δ_s and ν = δ_t]
  = µ′_1(s) |Pr_µ(Run(w)) − Pr_ν(Run(w))|  [µ′_1(s) = µ′_2(t)]
  > 0.  [µ′_1(s) > 0 and Pr_µ(Run(w)) ≠ Pr_ν(Run(w))]

Following the three cases, we have d_tv(µ′_1, µ′_2) > 0, that is, there exists a word w′ ∈ L* such that Pr_{µ′_1}(Run(w′)) ≠ Pr_{µ′_2}(Run(w′)). Consider the word w_0 w′. We have

  Pr_{µ_1}(Run(w_0 w′)) = |µ_1 M_α(w_0 w′)| = |µ_1 M_α(w_0) M_α(w′)| = Pr_{µ′_1}(Run(w′))  [µ′_1 = µ_1 M_α(w_0)]
  ≠ Pr_{µ′_2}(Run(w′)) = |µ_2 M_α(w_0) M_α(w′)| = |µ_2 M_α(w_0 w′)| = Pr_{µ_2}(Run(w_0 w′)).  [µ′_2 = µ_2 M_α(w_0)]

Thus, we have d_tv(µ_1, µ_2) > 0. ◀

▶ Theorem 34.
The NMF problem is polynomial-time reducible to the problem TV < 1; hence TV < 1 is ∃R-hard.

Proof. According to Proposition 30, if the NMF instance is a yes-instance then there is a memoryless strategy such that d_tv(µ, ν) = 0 in the induced LMC, which implies d_tv(µ, ν) < 1. It remains to show that if there is a memoryless strategy such that d_tv(µ, ν) < 1, then the NMF instance is a yes-instance. We show the contrapositive, that is, if the NMF instance is a no-instance, then for all memoryless strategies d_tv(µ, ν) = 1 in the induced LMC. For all w ∈ L* and memoryless strategies α, we have that if |µM_α(w)| > 0 and |νM_α(w)| > 0, then µM_α(w) and νM_α(w) are subdistributions over the left and right part of D, respectively. It follows from Proposition 33 that d_tv(µ_1, µ_2) > 0 for all subdistributions µ_1, µ_2 over the left and right part of D, respectively. By Proposition 23, we have that d_tv(µ, ν) = 1 in every LMC D(α). ◀

▶ Theorem 18. The problem TV < 1 is ∃R-complete.

Proof. It follows from Theorem 32 and Theorem 34. ◀
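The space ⟨M_α(w)𝟙 : w ∈ L*⟩ used throughout these proofs is also the basis of the classical polynomial-time equivalence check for labelled Markov chains, in the spirit of Schützenberger's and Tzeng's algorithms mentioned in the introduction. The following self-contained sketch (our own function name and data layout; `M[a]` is assumed to be the n×n matrix of a-labelled transition probabilities) decides µ ≡ ν by closing a basis of that space under the matrices M(a) and testing orthogonality of µ − ν against it.

```python
from fractions import Fraction

def equivalent(mu, nu, M):
    """Decide mu ≡ nu for an LMC, i.e. whether mu and nu assign the same
    probability to every finite word (Schuetzenberger/Tzeng-style check).

    mu, nu: lists of Fractions (subdistributions over states 0..n-1).
    M: dict label -> n x n matrix (lists of Fractions) of transition
       probabilities for that label.
    We build a basis of span{M(w) * 1 : w in L*} and check that mu - nu
    is orthogonal to every basis vector.
    """
    n = len(mu)
    diff = [m - v for m, v in zip(mu, nu)]

    basis = []  # list of (pivot_index, vector), kept in insertion order

    def reduce(vec):
        # Eliminate the pivot coordinates of all stored basis vectors.
        vec = vec[:]
        for pivot, b in basis:
            if vec[pivot] != 0:
                c = vec[pivot] / b[pivot]
                vec = [x - c * y for x, y in zip(vec, b)]
        return vec

    work = [[Fraction(1)] * n]  # start from the all-ones vector
    while work:
        vec = reduce(work.pop())
        nz = next((i for i, x in enumerate(vec) if x != 0), None)
        if nz is None:
            continue  # linearly dependent on the basis, discard
        if sum(d * x for d, x in zip(diff, vec)) != 0:
            return False  # some word separates mu and nu
        basis.append((nz, vec))
        for A in M.values():  # close the basis under v -> M(a) * v
            work.append([sum(A[s][t] * vec[t] for t in range(n))
                         for s in range(n)])
    return True
```

Since the basis has at most n vectors and each new basis vector spawns only |L| new candidates, the procedure terminates in polynomial time, matching the discussion of the matrices B_α above.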
D.3 Proofs of PB = 0

Next, we show that the problem PB = 0 is NP-complete. Recall that PB = 0 is the problem asking whether there is a memoryless strategy α for D such that the probabilistic bisimilarity distance of the two initial states is zero in the induced labelled Markov chain D(α), i.e., d_pb(s, t) = 0.

▶ Theorem 35. [15, Theorem 1] For all s, t ∈ S, s ∼ t if and only if d_pb(s, t) = 0.

▶ Theorem 36. The problem PB = 0 is in NP.

Proof.
According to Theorem 35 and the definition of probabilistic bisimilarity, there exists a memoryless strategy α such that d_pb(s, t) = 0 in the induced LMC D(α) if and only if the initial states s and t are probabilistic bisimilar, i.e., s and t are in the same probabilistic bisimulation induced equivalence class.

We can nondeterministically guess a partition E_1, …, E_n of S such that each subset E_i is a probabilistic bisimulation induced equivalence class and the states s, t are in the same equivalence class, that is, ⋃ E_i = S, E_i ∩ E_j = ∅ for any i ≠ j, and s, t ∈ E_i for some i. A memoryless strategy α for D can be characterised by numbers x_{s,m} ∈ [0,1] where s ∈ S and m ∈ A such that x_{s,m} = α(s)(m). We write x̄ for the collection (x_{s,m})_{s∈S, m∈A}. Checking d_pb(s, t) = 0 in the induced LMC D(α) amounts to a feasibility test of the following linear program: ∃ x̄ such that Σ_{m∈A(s′)} x_{s′,m} = 1 for all s′ ∈ S and τ(s′)(E_j) = τ(t′)(E_j) for all E_i, E_j and all s′, t′ ∈ E_i; hence it can be decided in polynomial time. ◀

Given a set S = {s_1, …, s_n} of natural numbers and N ∈ N, Subset Sum asks whether there exists a set P ⊆ S such that Σ_{s_i∈P} s_i = N.

▶ Theorem 37.
The Subset Sum problem is polynomial-time many-one reducible to PB = 0; hence PB = 0 is NP-hard.

Proof.
Given an instance ⟨S, N⟩ of Subset Sum where S = {s_1, …, s_n} and N ∈ N, we construct an MDP D; see Figure 7. Let T = Σ_{s_i∈S} s_i. In the MDP D, state s transitions to state s_i with probability s_i/T for all 1 ≤ i ≤ n. Each state s_i has two available actions, transitioning to s_a and s_b by taking the action m_i and m̄_i, respectively. State t transitions to t_1 and t_2 with probability N/T and 1 − N/T, respectively. All the remaining states have only one available action, transitioning to the successor state with probability one. States s_b and t_b have label b and all other states have label a.

Figure 7: The MDP D in the reduction for NP-hardness of PB = 0 (or PB < 1). All states have label a except s_b and t_b, which have label b.

Next, we show that

⟨S, N⟩ ∈ Subset Sum ⇐⇒ ∃ memoryless strategy α such that d_pb(s, t) = 0 in D(α).

Intuitively, making s_i probabilistic bisimilar to t_1 simulates the membership of s_i in P. Conversely, making s_i probabilistic bisimilar to t_2 simulates the membership of s_i in S \ P.

(⇒) Let P ⊆ S be the set such that Σ_{s_i∈P} s_i = N. Let α be an MD strategy such that α(s_i) = m_i if s_i ∈ P and α(s_i) = m̄_i otherwise. Consider the following partition of the states of D: E_1 = {s, t}, E_2 = P ∪ {t_1}, E_3 = (S \ P) ∪ {t_2}, E_4 = {s_a, t_a} and E_5 = {s_b, t_b}. Then,

  τ(s)(E_2) = τ(s)(P) = Σ_{s_i∈P} s_i/T = N/T  [Σ_{s_i∈P} s_i = N]
  = τ(t)(t_1) = τ(t)(E_2)

and

  τ(s)(E_3) = τ(s)(S \ P) = Σ_{s_i∈S\P} s_i/T = 1 − Σ_{s_i∈P} s_i/T = 1 − N/T  [Σ_{s_i∈P} s_i = N]
  = τ(t)(t_2) = τ(t)(E_3).

Similarly, we can verify that for all E_i, E_j and all s′, t′ ∈ E_i: τ(s′)(E_j) = τ(t′)(E_j). By the definition of probabilistic bisimulation, each set E_i is a probabilistic bisimulation induced equivalence class. Since s and t are in the same equivalence class, we have s ∼ t, and hence d_pb(s, t) = 0 by Theorem 35.

(⇐) Assume there is a memoryless strategy α such that d_pb(s, t) = 0 in the LMC D(α). By Theorem 35, s and t are probabilistic bisimilar in D(α). Let P be the set of successor states of s that are probabilistic bisimilar to t_1. Then,

  τ(s)(P) = Σ_{s_i∈P} s_i/T  and  τ(t)(t_1) = N/T.

Since ℓ(t_a) ≠ ℓ(t_b), by definition of probabilistic bisimilarity we have t_a ≁ t_b, and hence t_a and t_b are not in the same ∼-equivalence class. Since τ(t_1)(t_a) = τ(t_2)(t_b) = 1 in the LMC D(α), again by definition of probabilistic bisimilarity, we have t_1 ≁ t_2, so t_1 and t_2 are not in the same ∼-equivalence class, and thus t_2 is not in the same ∼-equivalence class as the states in P. Since s ∼ t, we have Σ_{s_i∈P} s_i/T = N/T, and hence Σ_{s_i∈P} s_i = N. ◀

▶ Theorem 19.
The problem PB = 0 is NP-complete.

Proof. It follows from Theorem 36 and Theorem 37. ◀
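The forward direction of Theorem 37 boils down to one arithmetic fact: the MD strategy derived from a subset P makes the distinguished partition a probabilistic bisimulation exactly when the mass that s sends into P equals the mass N/T that t sends to t_1. A small Python sketch of this check (our own naming; `chosen` holds the indices of P):

```python
from fractions import Fraction

def witnesses_bisimulation(numbers, target, chosen):
    """In the Subset Sum reduction (Theorem 37), the MD strategy derived
    from a subset P makes {s,t}, P ∪ {t1}, (S\\P) ∪ {t2}, {s_a,t_a},
    {s_b,t_b} a probabilistic bisimulation iff the probability mass that
    s sends into P equals the mass N/T that t sends to t1.

    numbers: the set S of the instance; target: N; chosen: indices of P.
    """
    T = sum(numbers)
    mass_into_P = sum(Fraction(numbers[i], T) for i in chosen)
    return mass_into_P == Fraction(target, T)
```

Equivalently, the check succeeds exactly when the chosen numbers sum to N, which is why the reduction is a faithful encoding of Subset Sum.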
D.4 Proofs of PB < 1

We show in this section that the problem PB < 1 is NP-complete. Recall that PB < 1 is the problem asking whether there is a memoryless strategy α for D such that the probabilistic bisimilarity distance of the two initial states is less than one in the induced labelled Markov chain D(α), i.e., d_pb(s, t) < 1.

▶ Theorem 20. The problem PB < 1 is NP-complete.

Proof.
We first show that this problem is in NP. We nondeterministically guess a partition E_1, …, E_n of the states of D and two states u, v in the same subset E_i for some i. We also nondeterministically guess the graph G of Definition 24 for the LMC D(α) induced by some strategy α. By Proposition 25, if (s, t) is reachable from some state pair (u, v) in the graph G and u ∼ v, then d_pb(s, t) < 1. The condition that (s, t) is reachable from (u, v) in the graph G can be checked in polynomial time using, e.g., breadth-first search. To check u ∼ v, it suffices to check that each subset E_i is a probabilistic bisimulation induced equivalence class, which amounts to a feasibility test of the linear program: ∃ x̄ such that Σ_{m∈A(s′)} x_{s′,m} = 1 for all s′ ∈ S and τ(s′)(E_j) = τ(t′)(E_j) for all E_i, E_j and all s′, t′ ∈ E_i, and hence can be decided in polynomial time.

Next, we establish NP-hardness of the problem. Similar to Theorem 37, we provide a polynomial-time many-one reduction from Subset Sum. Given an instance ⟨S, N⟩ of Subset Sum, we construct the same MDP D as shown in Figure 7. Next, we show that

⟨S, N⟩ ∈ Subset Sum ⇐⇒ ∃ memoryless strategy α such that d_pb(s, t) < 1 in D(α).

(⇒) Let P ⊆ S be the set such that Σ_{s_i∈P} s_i = N. By Theorem 37, there exists a memoryless strategy α such that d_pb(s, t) = 0, and hence d_pb(s, t) < 1.

(⇐) We prove the contrapositive, that is, if the instance is a no-instance then for all memoryless strategies α we have d_pb(s, t) = 1 in D(α). Assume the instance is a no-instance. By Theorem 37, we have d_pb(s, t) > 0 in D(α) for all memoryless strategies α, i.e., s ≁ t. Let α be an arbitrary memoryless strategy. By the construction of the MDP and since s ≁ t, we have s_a ≁ t_a and s_b ≁ t_b. Since ℓ(s_a) ≠ ℓ(t_b) and ℓ(s_b) ≠ ℓ(t_a), we also have s_a ≁ t_b and s_b ≁ t_a. Thus, in the LMC D(α), no s_i for 1 ≤ i ≤ n is probabilistic bisimilar to t_1 or t_2. In the graph of Definition 24, the following vertices could reach (s, t): (s_i, t_1) or (s_i, t_2) for all 1 ≤ i ≤ n, (s_a, t_a) and (s_b, t_b). However, none of these pairs are probabilistic bisimilar. By Proposition 25, we have d_pb(s, t) = 1 in the LMC D(α). ◀
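Proposition 25 turns the qualitative question d_pb(s, t) < 1 for a fixed LMC into plain graph reachability, as exploited in the NP membership argument above. A stdlib-only Python sketch (our own naming; the set `bisim_pairs` of bisimilar pairs is assumed to be precomputed, e.g. by partition refinement):

```python
from collections import deque

def dpb_less_than_one(tau, label, bisim_pairs, s, t):
    """Proposition 25 as an algorithm: d_pb(s,t) < 1 iff, in the graph G
    whose vertices are the equally-labelled pairs and which has an edge
    from (u,v) to (s',t') whenever tau[s'][u] > 0 and tau[t'][v] > 0,
    the vertex (s,t) is reachable from some bisimilar pair.

    tau: dict state -> dict mapping each successor with positive
         probability to that probability (only the support is used);
    label: dict state -> label; bisim_pairs: set of bisimilar pairs.
    """
    states = list(tau)
    vertices = {(u, v) for u in states for v in states
                if label[u] == label[v]}
    # succ[(u, v)] lists the pairs (s', t') with an edge (u,v) -> (s',t').
    succ = {p: [] for p in vertices}
    for (sp, tp) in vertices:
        for u in tau[sp]:
            for v in tau[tp]:
                if (u, v) in vertices:
                    succ[(u, v)].append((sp, tp))
    # BFS from the bisimilar pairs towards (s, t).
    frontier = deque(p for p in bisim_pairs if p in vertices)
    seen = set(frontier)
    while frontier:
        p = frontier.popleft()
        if p == (s, t):
            return True
        for q in succ[p]:
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return False
```

The graph has at most |S|² vertices, so the whole check runs in polynomial time, which is what the guess-and-verify algorithm of Theorem 20 relies on.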