Proving Non-Inclusion of Büchi Automata based on Monte Carlo Sampling
Yong Li, Andrea Turrini, Xuechao Sun, Lijun Zhang
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China
Institute of Intelligent Software, Guangzhou, China
Abstract.
The search for a proof of correctness and the search for counterexamples (bugs) are complementary aspects of verification. In order to maximize the practical use of verification tools it is better to pursue them at the same time. While this is well understood in the termination analysis of programs, this is not the case for the language inclusion analysis of Büchi automata, where research mainly focused on improving algorithms for proving language inclusion, with the search for counterexamples left to the expensive complementation operation. In this paper, we present
IMC, a dedicated algorithm for proving Büchi automata non-inclusion L(A) ⊈ L(B), based on Grosu and Smolka's algorithm MC developed for Monte Carlo model checking against LTL formulas. The algorithm we propose takes M = ⌈ln δ / ln(1 − ε)⌉ random lasso-shaped samples from A to decide whether to reject the hypothesis L(A) ⊈ L(B), for a given error probability ε and confidence level 1 − δ. With such a number of samples, IMC ensures that the probability of witnessing L(A) ⊈ L(B) via further sampling is less than δ, under the assumption that the probability of finding a lasso counterexample is larger than ε. Extensive experimental evaluation shows that IMC is a fast and reliable way to find counterexamples to Büchi automata inclusion.

1 Introduction

The language inclusion checking of Büchi automata is a fundamental problem in the field of automated verification. In particular, in the automata-based model checking [24] framework, when both the system and the specification are given as Büchi automata, the model checking problem of verifying whether some system behavior violates the specification reduces to a language inclusion problem between the corresponding Büchi automata.

In this paper, we target the language inclusion checking problem of Büchi automata. Since this problem has been proved to be PSPACE-complete [18], researchers have been focusing on devising algorithms to reduce its practical cost.
This work has been supported by the Guangdong Science and Technology Department (grant no. 2018B010107004) and by the National Natural Science Foundation of China (grant nos. 61761136011, 61532019).

A naïve approach to checking the inclusion between Büchi automata A and B is to first construct a complement automaton B^c such that L(B^c) = Σ^ω \ L(B) and then to check the language emptiness of L(A) ∩ L(B^c); this is the algorithm implemented in SPOT [11], a highly optimized symbolic tool for manipulating LTL formulas and ω-automata.

The bottleneck of this approach is computing the automaton B^c, which can be exponentially larger than B [25]. As a result, various optimizations—such as subsumption and simulation—have been proposed to avoid exploring the whole state space of B^c; see, e.g., [1,2,9,10,13,14]. For instance, RABIT, currently the state-of-the-art tool for checking language inclusion between Büchi automata, integrates the simulation and subsumption techniques proposed in [1,2,9]. All these techniques improving language inclusion checking, however, focus on proving inclusion. In particular, the simulation techniques in [9,13] are specialized algorithms mainly proposed to obtain such a proof, which ensures that for every initial state q_a of A, there is an initial state q_b of B that simulates every possible behavior from q_a.

From a practical point of view, it is widely believed that a counterexample (or bug) found by a verification tool is as valuable as a proof of the correctness of a program; we would argue that showing why a program violates the specification is also intuitive for a programmer, since it gives a clear way to identify and correct the error. Thus, the search for a proof and the search for counterexamples (bugs) are complementary activities that need to be pursued at the same time in order to maximize the practical use of verification tools.
This is well understood in the termination analysis of programs, where the techniques for searching for a proof of termination [6,7,20] and for counterexamples [12,16,21] evolve concurrently. Counterexamples to Büchi automata language inclusion, instead, are the byproducts of a failure while proving language inclusion. Such a failure may be recognized only after a considerable amount of effort has been spent on proving inclusion, in particular when the proposed improvements are not effective. In this work, instead, we focus directly on the problem of finding a counterexample to language inclusion.

The main contribution is a novel algorithm called IMC for showing language non-inclusion based on sampling and statistical hypothesis testing. Our algorithm is inspired by the Monte Carlo approach proposed in [15] for model checking systems against LTL specifications. The algorithm proposed in [15] takes as input a Büchi automaton A as system and an LTL formula ϕ as specification and then checks whether A ⊭ ϕ by equivalently checking L(A) ⊈ L(B_ϕ), where B_ϕ is the Büchi automaton constructed for ϕ. The main idea of the algorithm for showing L(A) ⊈ L(B_ϕ) is to sample lasso words from the product automaton A × B^c_ϕ for L(A) ∩ L(B^c_ϕ); lasso words are of the form uv^ω and are obtained as soon as a state is visited twice. If one such lasso word is accepted by A × B^c_ϕ, then it is surely a witness to L(A) ⊈ L(B_ϕ), i.e., a counterexample to A ⊨ ϕ. Since in [15] the algorithm gets an LTL formula ϕ as input, the construction of B^c_ϕ reduces to the construction of B_¬ϕ, and it is widely assumed that the translation into a Büchi automaton is equally efficient for a formula and its negation.
In this paper, we consider the general case, namely the specification is given as a generic Büchi automaton B, where the construction of B^c from B can be very expensive [25]. To avoid the heavy generation of B^c, the algorithm IMC we propose directly samples lasso words in A, without constructing the product A × B^c. We show that usual lasso words, like the ones used in [15], do not suffice in our case, and propose a rather intriguing sampling procedure. We allow the lasso word uv^ω to visit each state of A multiple times, i.e., the run σ of A on the finite word uv can present small cycles on both the u and the v part of the lasso word. We achieve this by setting a bound K on the number of times a state can be visited: each state in σ is visited at most K − 1 times, except for the last state of σ, which is visited at most K times. We show that IMC gives a probabilistic guarantee in terms of finding a counterexample to inclusion when K is sufficiently large, as described in Theorem 4. This notion of generalized lasso allows our approach to find counterexamples that are not valid lassos in the usual sense. The extensive experimental evaluation shows that our approach is generally very fast and reliable in finding counterexamples to language inclusion. In particular, the prototype tool we developed easily manages Büchi automata with very large state spaces and alphabets on which state-of-the-art tools such as RABIT and SPOT fail. This makes our approach fit very well among tools that make use of Büchi automata language inclusion tests, since it can quickly provide counterexamples before having to rely on the possibly time and resource consuming structural methods, in case an absolute guarantee about the result of the inclusion test is desired.

Organization of the paper.
In the remainder of this paper, we briefly recall some known results about Büchi automata in Section 2. We then present the algorithm
IMC in Section 3 and give the experimental results in Section 4, before concluding the paper with some remarks in Section 5.

2 Büchi Automata
Let Σ be a finite set of letters called an alphabet. A finite sequence of letters is called a word; an infinite sequence of letters is called an ω-word. We use |α| to denote the length of the finite word α and λ to represent the empty word, i.e., the word of length 0. The set of all finite words on Σ is denoted by Σ*, and the set of all ω-words is denoted by Σ^ω. Moreover, we also denote by Σ⁺ the set Σ* \ {λ}.

A nondeterministic Büchi automaton (NBA) is a tuple B = (Σ, Q, Q_I, T, Q_F), consisting of a finite alphabet Σ of input letters, a finite set Q of states with a non-empty set Q_I ⊆ Q of initial states, a set T ⊆ Q × Σ × Q of transitions, and a set Q_F ⊆ Q of accepting states.

A run of an NBA B over an ω-word α = a₀a₁a₂··· ∈ Σ^ω is an infinite alternation of states and letters ρ = q₀a₀q₁a₁q₂··· ∈ (Q × Σ)^ω such that q₀ ∈ Q_I and, for each i ≥ 0, (ρ(i), a_i, ρ(i + 1)) ∈ T, where ρ(i) = q_i. A run ρ is accepting if it contains infinitely many accepting states, i.e., Inf(ρ) ∩ Q_F ≠ ∅, where Inf(ρ) = { q ∈ Q | ∀i ∈ ℕ. ∃j > i : ρ(j) = q }. An ω-word α is accepted by B if B has an accepting run on α, and the set of words L(B) = { α ∈ Σ^ω | α is accepted by B } accepted by B is called its language.

We call a subset of Σ^ω an ω-language and the language of an NBA an ω-regular language. Words of the form uv^ω are called ultimately periodic words. We use a pair of finite words (u, v) to denote the ultimately periodic word w = uv^ω; we also call (u, v) a decomposition of w. For an ω-language L, let UP(L) = { uv^ω ∈ L | u ∈ Σ*, v ∈ Σ⁺ } be the set of all ultimately periodic words in L. The set of ultimately periodic words can be seen as the fingerprint of L:

Theorem 1 (Ultimately Periodic Words [8]).
Let L, L′ be two ω-regular languages. Then L = L′ if, and only if, UP(L) = UP(L′).

An immediate consequence of Theorem 1 is that, for any two ω-regular languages L₁ and L₂, if L₁ ≠ L₂ then there is an ultimately periodic word xy^ω ∈ (UP(L₁) \ UP(L₂)) ∪ (UP(L₂) \ UP(L₁)). It follows that xy^ω ∈ L₁ \ L₂ or xy^ω ∈ L₂ \ L₁. Let A, B be two NBAs and assume that L(A) \ L(B) ≠ ∅. One can find an ultimately periodic word xy^ω ∈ L(A) \ L(B) as a counterexample to L(A) ⊆ L(B).

Language inclusion between NBAs can be reduced to the complementation, intersection, and emptiness problems on NBAs. The complementation operation of an NBA B is to construct an NBA B^c accepting the complement language of L(B), i.e., L(B^c) = Σ^ω \ L(B).

Lemma 1 (cf. [17,19]).
Let A, B be NBAs with n_a and n_b states, respectively.
1. It is possible to construct an NBA B^c such that L(B^c) = Σ^ω \ L(B) whose number of states is at most (2n_b + 2)^{n_b} × n_b, by means of the complement construction.
2. It is possible to construct an NBA C such that L(C) = L(A) ∩ L(B^c) whose number of states is at most 2 × n_a × (2n_b + 2)^{n_b} × n_b, by means of the product construction. Note that L(A) ⊆ L(B) holds if and only if L(C) = ∅ holds.
3. L(C) = ∅ is decidable in time linear in the number of states of C.

Further, testing whether an ω-word w is accepted by a Büchi automaton B can be done in time polynomial in the size of the decomposition (u, v) of w = uv^ω.

Lemma 2 (cf. [17]).
Let B be an NBA with n states and let (u, v) be a decomposition of an ultimately periodic word with |u| + |v| = m. Then checking whether uv^ω is accepted by B is decidable in time and space linear in n × m.

Random Sampling and Hypothesis Testing
Statistical hypothesis testing is a statistical method to assign a confidence level to the correctness of the interpretation given to a small set of data sampled from a population, when this interpretation is extended to the whole population.

Let Z be a Bernoulli random variable with parameter p_Z and let X be the random variable whose value is the number of independent trials required until we see Z = 1. Let δ be the significance level that Z = 1 will not appear within N trials. Then N = ⌈ln δ / ln(1 − p_Z)⌉ is the number of attempts needed to get a counterexample with probability at least 1 − δ. If the exact value of p_Z is unknown, given an error probability ε such that p_Z ≥ ε, we have that M = ⌈ln δ / ln(1 − ε)⌉ ≥ N = ⌈ln δ / ln(1 − p_Z)⌉ ensures that p_Z ≥ ε ⟹ Pr[X ≤ M] ≥ 1 − δ. In other words, M is the minimal number of attempts required to find a counterexample with probability 1 − δ, under the assumption that p_Z ≥ ε. See, e.g., [15,26] for more details about statistical hypothesis testing in the context of formal verification.
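As a quick sanity check on these quantities, the computation of M can be sketched as follows (the helper name `required_samples` is ours):

```python
import math

def required_samples(eps: float, delta: float) -> int:
    """M = ceil(ln(delta) / ln(1 - eps)): the number of independent Bernoulli
    trials needed so that an event of probability at least eps is observed
    at least once with probability at least 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - eps))

# With eps = 0.1% and delta = 2% (the parameters used in Section 4),
# roughly 4 000 samples are required.
M = required_samples(0.001, 0.02)
# Missing the event in all M trials happens with probability (1 - eps)^M,
# which is at most delta by construction.
assert (1.0 - 0.001) ** M <= 0.02
```

Note that M grows only logarithmically in 1/δ but roughly linearly in 1/ε, which is why the number of samples stays manageable in practice.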
3 The IMC Algorithm

In this section we present our Monte Carlo sampling algorithm IMC for testing non-inclusion between Büchi automata.

MC: Monte Carlo Sampling for LTL Model Checking

In [15], the authors proposed a Monte Carlo sampling algorithm MC for verifying whether a given system A satisfies a Linear Temporal Logic (LTL) specification ϕ. MC works directly on the product Büchi automaton P that accepts the language L(A) ∩ L(B_¬ϕ); it essentially checks whether L(P) is empty. First, MC takes two statistical parameters ε and δ as input and computes the number M of samples for the experiment. Since every ultimately periodic word xy^ω ∈ L(P) corresponds to some cyclic run (or "lasso") in P, MC just needs to find an accepting lasso whose corresponding ultimately periodic word xy^ω is such that xy^ω ∈ L(P). In each sampling procedure, MC starts from a randomly chosen initial state and performs a random walk on P's transition graph until a state has been visited twice, which consequently gives a lasso in P. MC then checks whether there exists an accepting state in the repeated part of the sampled lasso. If so, MC reports it as a counterexample to the verification; otherwise it continues with another sampling process, if necessary. The correctness of MC is straightforward, as the product automaton P is non-empty if and only if there is an accepting lasso.

The Monte Carlo sampling algorithm in [15] thus operates directly on the product; for language inclusion, as discussed in the introduction, constructing this product via complementation is the bottleneck. Thus, we aim at a sampling algorithm operating on the automata A and B separately. With this in mind, we first show that directly applying MC can be incomplete for language inclusion checking.

Example 1.
Consider checking the language inclusion of the Büchi automata A and B in Fig. 1. As we want to exploit MC to find a counterexample to the inclusion, we need to sample a word from A that is accepted by A but not accepted by B. In [15], the sampling procedure is terminated as soon as a

Fig. 1.
Two NBAs A and B.

state is visited twice. Thus, the set of lassos that can be sampled by MC is {s₀as₀, s₀bs₁bs₁}, which yields the set of words {a^ω, b^ω}. It is easy to see that neither of these two words is a counterexample to the inclusion. The inclusion, however, does not hold: the word ab^ω ∈ L(A) \ L(B) is a counterexample. ♦

According to Theorem 1, if L(A) \ L(B) ≠ ∅, then there must be an ultimately periodic word xy^ω ∈ L(A) \ L(B) as a counterexample to the inclusion. It follows that there exists some lasso in A whose corresponding ultimately periodic word is a counterexample to the inclusion. The limit of MC in checking the inclusion is that it only samples simple lasso runs, which may miss the non-trivial lassos in A that correspond to counterexamples to the inclusion. The reason why simple lassos suffice for checking non-emptiness of the product automaton is that the product automaton already synchronizes the behaviors of A and B_¬ϕ. In the remainder of this section, we propose a new definition of lassos allowing multiple occurrences of states, which is the key point of our extension.

IMC: Monte Carlo Sampling for Inclusion Checking

We now present our Monte Carlo sampling algorithm called
IMC, specialized for testing the language inclusion between two given NBAs A and B. We first define the lassos of A in Definition 1 and show how to compute the probability of a sampled lasso in Definition 2. Then we prove that, with our definition of the lasso probability space of A, the probability of sampling a lasso whose corresponding ultimately periodic word xy^ω is a counterexample to the inclusion is greater than 0 under the hypothesis L(A) ⊈ L(B). Thus we eventually get for sure a sample from A that is a counterexample to the inclusion, if the inclusion does not hold. In other words, we are able to obtain a counterexample to the inclusion with high probability from a large enough amount of samples.

In practice, a lasso of A is sampled via a random walk on A's transition graph, starting from a randomly chosen initial state and picking uniformly one outgoing transition. In the following, we fix a natural number K ≥ 2 and two NBAs A = (Σ, Q, Q_I, T, Q_F) and B. We assume that each state in A can reach an accepting state and has at least one outgoing transition. Note that each NBA A with L(A) ≠ ∅ can be pruned to satisfy this assumption; only NBAs A′ with L(A′) = ∅ do not satisfy it, but for these automata the problem L(A′) ⊆ L(B) is trivial.

Definition 1 (Lasso). Given two NBAs A, B and a natural number K ≥ 2, a finite run σ = q₁a₁q₂···a_{n−1}q_n a_n q_{n+1} of A is called a K-lasso if (1) each state in {q₁, ..., q_n} occurs at most K − 1 times in q₁a₁q₂···a_{n−1}q_n and (2) q_{n+1} = q_i for some 1 ≤ i ≤ n (thus, q_{n+1} occurs at most K times in σ). We write σ⊥ for the terminating K-lasso σ, where ⊥ is a fresh symbol denoting termination. We denote by S^K_A the set of all terminating K-lassos of A.

We call σ⊥ ∈ S^K_A a witness for L(A) \ L(B) ≠ ∅ if the associated ω-word (a₁···a_{i−1}, a_i···a_n) is accepted by A but not accepted by B.
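Conditions (1) and (2) of Definition 1 can be checked mechanically. A minimal sketch (the helper name and the encoding of a run as an alternating state/letter list are our own assumptions):

```python
from collections import Counter

def is_k_lasso(run, K):
    """Definition 1: run = [q1, a1, q2, ..., an, q_{n+1}] with states at even
    indices. Each of q1 ... qn may occur at most K - 1 times in the prefix,
    and the last state must close a loop, i.e., occur earlier in the run."""
    states = run[0::2]                  # q1, ..., q_{n+1}
    prefix, last = states[:-1], states[-1]
    counts = Counter(prefix)
    return last in prefix and all(c <= K - 1 for c in counts.values())

# Runs of the NBA A of Fig. 1 (s0 -a-> s0, s0 -b-> s1, s1 -b-> s1):
assert not is_k_lasso(["s0", "a", "s0", "b", "s1", "b", "s1"], K=2)  # s0 twice
assert is_k_lasso(["s0", "b", "s1", "b", "s1"], K=2)
assert is_k_lasso(["s0", "a", "s0", "b", "s1", "b", "s1"], K=3)
```

The first run is rejected for K = 2 but accepted for K = 3, in line with Remark 1 below: every K-lasso is also a K′-lasso for any K′ > K.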
It is worth noting that not every finite cyclic run of A is a valid K-lasso. Consider the NBA A shown in Fig. 1, for instance: the run s₀as₀bs₁bs₁ is not a lasso when K = 2 since, by Definition 1, every state except the last one is allowed to occur at most K − 1 = 1 times; s₀ clearly violates this requirement since it occurs twice and it is not the last state of the run. The run s₀bs₁bs₁, instead, is obviously a valid lasso when K = 2.

Remark 1. A K-lasso σ is also a K′-lasso for any K′ > K. Moreover, a terminating K-lasso can be a witness without being an accepting run: according to Definition 1, a terminating K-lasso σ⊥ is a witness if its corresponding word uv^ω is accepted by A but not accepted by B. This does not imply that σ is an accepting run, since there may be another run σ′ on the same word uv^ω that is accepting.

In order to define a probability space over S^K_A, we first define the probability of a terminating K-lasso of A. We denote by #(σ, q) the number of occurrences of the state q in the K-lasso σ.

Definition 2 (Lasso Probability).
Given an NBA A, a natural number K ≥ 2, and a stopping probability p⊥ ∈ (0, 1), the probability Pr_{p⊥}[σ⊥] of a terminating K-lasso σ⊥ = q₁a₁···q_n a_n q_{n+1}⊥ ∈ S^K_A is defined as follows:

Pr_{p⊥}[σ⊥] = Pr′_{p⊥}[σ]       if #(σ, q_{n+1}) = K,
Pr_{p⊥}[σ⊥] = p⊥ · Pr′_{p⊥}[σ]   if #(σ, q_{n+1}) < K;

Pr′_{p⊥}[σ] = 1/|Q_I|                                      if σ = q₁;
Pr′_{p⊥}[σ] = Pr′_{p⊥}[σ′] · π[q_l a_l q_{l+1}]               if σ = σ′a_l q_{l+1} and #(σ′, q_l) = 1;
Pr′_{p⊥}[σ] = (1 − p⊥) · Pr′_{p⊥}[σ′] · π[q_l a_l q_{l+1}]    if σ = σ′a_l q_{l+1} and #(σ′, q_l) > 1,

where π[qaq′] = 1/m if (q, a, q′) ∈ T and |T(q)| = m, and π[qaq′] = 0 otherwise. We extend Pr_{p⊥} to sets of terminating K-lassos in the natural way, i.e., for S ⊆ S^K_A, Pr_{p⊥}[S] = Σ_{σ⊥∈S} Pr_{p⊥}[σ⊥].

Intuitively, assume that the current state of the run σ is q. If q has already been visited at least twice but fewer than K times, the run σ can either terminate at q with probability p⊥ or continue with probability 1 − p⊥ by taking uniformly one of the outgoing transitions of q. As soon as q has been visited K times, however, the run σ has to terminate.
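The sampling procedure induced by Definitions 1 and 2 can be sketched as a random walk; the transition-table encoding and the choice of the first occurrence of the loop state for the decomposition (u, v) are our own assumptions:

```python
import random

def sample_lasso(trans, initial, K, p_stop, rng):
    """Sample a terminating K-lasso: start from a uniformly chosen initial
    state and repeatedly take a uniformly chosen outgoing transition.
    Once the current state is a repeat, the walk terminates with probability
    p_stop, and it terminates unconditionally at the K-th occurrence.
    trans: state -> list of (letter, successor). Returns (u, v), w = u v^ω."""
    q = rng.choice(sorted(initial))
    states, word = [q], []
    counts = {q: 1}
    while counts[q] < 2 or (counts[q] < K and rng.random() >= p_stop):
        a, q2 = rng.choice(trans[q])    # uniform over outgoing transitions
        word.append(a)
        states.append(q2)
        counts[q2] = counts.get(q2, 0) + 1
        q = q2
    i = states.index(q)                 # first occurrence of the loop state
    return "".join(word[:i]), "".join(word[i:])

# NBA A of Fig. 1: s0 -a-> s0, s0 -b-> s1, s1 -b-> s1.
A = {"s0": [("a", "s0"), ("b", "s1")], "s1": [("b", "s1")]}
u, v = sample_lasso(A, {"s0"}, K=3, p_stop=0.5, rng=random.Random(1))
```

With K = 3 this walk can produce a decomposition of the witness word ab^ω, which is out of reach for K = 2, as Example 2 illustrates.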
Fig. 2. An instance T of the trees used in the proof of Theorem 2. Each leaf node is labelled with a terminating 3-lasso σ⊥ ∈ S³_A for the NBAs A and B shown in Fig. 1, together with its corresponding probability value Pr_{p⊥}[σ⊥].

Theorem 2 (Lasso Probability Space).
Let A be an NBA, K ≥ 2 a natural number, and p⊥ ∈ (0, 1) a stopping probability. The σ-field (S^K_A, 2^{S^K_A}) together with Pr_{p⊥} defines a discrete probability space.

Proof (sketch). The facts that Pr_{p⊥}[σ⊥] is a non-negative real value for each σ⊥ ∈ S^K_A and that Pr_{p⊥}[S₁ ∪ S₂] = Pr_{p⊥}[S₁] + Pr_{p⊥}[S₂] for all S₁, S₂ ⊆ S^K_A such that S₁ ∩ S₂ = ∅ are both immediate consequences of the definition of Pr_{p⊥}. The interesting part of the proof is showing that Pr_{p⊥}[S^K_A] = 1. To prove this, we make use of a tree T = (N, ⟨λ, 1⟩, E), like the one shown in Fig. 2, whose nodes are labelled with finite runs and probability values. In particular, we label the leaf nodes of T with the terminating K-lassos in S^K_A, while we use their finite run prefixes to label the internal nodes. Formally, the tree T is constructed as follows. Let P = { σ′ ∈ Q × (Σ × Q)* | σ′ is a prefix of some σ⊥ ∈ S^K_A } be the set of prefixes of the K-lassos in S^K_A; T's components are defined as follows.

– N = (P × (0, 1]) ∪ (S^K_A × (0, 1]) ∪ {⟨λ, 1⟩} is the set of nodes;
– ⟨λ, 1⟩ is the root of the tree; and
– E ⊆ ({⟨λ, 1⟩} × (P × (0, 1])) ∪ ((P × (0, 1]) × (P × (0, 1])) ∪ ((P × (0, 1]) × (S^K_A × (0, 1])) is the set of edges, defined as
E = { (⟨λ, 1⟩, ⟨q, 1/|Q_I|⟩) | q ∈ Q_I }
  ∪ { (⟨σ, p⟩, ⟨σaq, p/|T(σ_l)|⟩) | σaq ∈ P ∧ #(σ, σ_l) = 1 }
  ∪ { (⟨σ, p⟩, ⟨σaq, p · (1 − p⊥)/|T(σ_l)|⟩) | σaq ∈ P ∧ #(σ, σ_l) > 1 }
  ∪ { (⟨σ, p⟩, ⟨σ⊥, p⟩) | σ⊥ ∈ S^K_A ∧ #(σ, σ_l) = K }
  ∪ { (⟨σ, p⟩, ⟨σ⊥, p · p⊥⟩) | σ⊥ ∈ S^K_A ∧ #(σ, σ_l) < K },
where σ_l denotes the last state s_n of the finite run σ = s₁a₁s₂···a_{n−1}s_n.

Then we show a correspondence between the reachable leaf nodes and the terminating K-lassos with their Pr_{p⊥} probability values, and that the probability value in each internal node equals the sum of the probabilities of its children. By the finiteness of the reachable part of the tree we then derive Pr_{p⊥}[S^K_A] = 1. The complete proof can be found in Appendix A. ⊔⊓

Example 2 (Probability of lassos).
Consider the Büchi automaton A of Fig. 1 and p⊥ = 1/2. For K = 2, there are only two terminating 2-lassos, namely s₀as₀⊥ and s₀bs₁bs₁⊥. According to Definition 2, each of these lassos occurs with probability 1/2, and neither is a witness, since the corresponding ultimately periodic words a^ω and b^ω do not belong to the language L(A) \ L(B). If we set K = 2 to check whether L(A) ⊆ L(B), we thus end up concluding that the inclusion holds with probability 1, since the probability to find some lasso of A related to the ω-word ab^ω ∈ L(A) \ L(B) is 0. If we want to find a witness K-lasso, we need to set K = 3 at least, since then the terminating 3-lasso s₀as₀bs₁bs₁⊥ with corresponding ω-word ab^ω ∈ L(A) \ L(B) can be found with positive probability. Note that MC corresponds to sampling with K = 2 and p⊥ = 1, thus the method of [15] is not complete for NBA language inclusion checking. ♦

According to Theorem 2, the probability space of the sampled terminating K-lassos of A can be organized in a tree, like the one shown in Fig. 2. It is then easy to see that the probability to find the witness 3-lasso s₀as₀bs₁bs₁⊥ of A is 1/16, as indicated by the leaf node ⟨s₀as₀bs₁bs₁⊥, 1/16⟩.

Definition 3 (Lasso Bernoulli Variable).
Let K ≥ 2 be a natural number and p⊥ a stopping probability. The Bernoulli random variable Z associated with the probability space (S^K_A, 2^{S^K_A}, Pr_{p⊥}) of the NBAs A and B is defined as follows: p_Z = Pr_{p⊥}[Z = 1] = Σ_{σ⊥∈S_w} Pr_{p⊥}[σ⊥] and q_Z = Pr_{p⊥}[Z = 0] = Σ_{σ⊥∈S_n} Pr_{p⊥}[σ⊥], where S_w, S_n ⊆ S^K_A are the sets of witness and non-witness lassos, respectively.

Under the assumption L(A) \ L(B) ≠ ∅, there exists some witness K-lasso σ⊥ ∈ S_w that can be sampled with positive probability if Pr_{p⊥}[Z = 1] >
0, as explained by Example 3.
Example 3.
For the NBAs A and B shown in Fig. 1, K = 3, and p⊥ = 1/2, the lasso probability space is organized as in Fig. 2. The lasso Bernoulli variable has associated probabilities p_Z = 1/8 and q_Z = 7/8, since the only witness lassos are s₀as₀bs₁bs₁⊥ and s₀as₀bs₁bs₁bs₁⊥, both occurring with probability 1/16. ♦

Therefore, if we set K = 3 and p⊥ = 1/2 to check the inclusion between A and B from Fig. 1, we are able to find with probability 1/8 the ω-word ab^ω as a counterexample to the inclusion L(A) ⊆ L(B). It follows that the probability we
Fig. 3. The NBA K_K making p_Z = 0 when checking L(A) ⊆ L(K_K) by means of sampling terminating K-lassos from the automaton A shown in Fig. 1.

do not find any witness 3-lasso after 50 trials would be (7/8)^50, i.e., less than 0.002.

Sampling with a K that is not sufficiently large is the main problem of the MC algorithm from [15] when applied to language inclusion checking. The natural question is then: how large should K be for checking the inclusion? First, let us discuss K without taking the automaton B into account. Consider the NBA A of Fig. 1: no matter how large K is, one can always construct an NBA K_K with K + 1 states making the probability p_Z = 0, as the counterexample a^l b^ω ∈ L(A) \ L(K_K) cannot be sampled for any l ≥ K. Fig. 3 depicts such an NBA K_K, for which we have L(K_K) = {b^ω, ab^ω, aab^ω, ..., a^{K−1}b^ω}. One can easily verify that the counterexample a^l b^ω cannot be sampled from A when l ≥ K, as sampling this word requires the state s₀ to occur l + 1 times in the run, which does not give a valid K-lasso. This means that K is a value that depends on the size of B. To get a K sufficiently large for every A and B, one can just take the product of A with the complement of B and check how many times, in the worst case, a state of A occurs in the shortest accepting run of the product.

Lemma 3 (Sufficiently Large K). Let A, B be NBAs with n_a and n_b states, respectively, and let Z be the random variable defined in Definition 3. Assume that L(A) \ L(B) ≠ ∅. If K ≥ 2 × (2n_b + 2)^{n_b} × n_b + 1, then Pr_{p⊥}[Z = 1] > 0.

Proof. To check whether L(A) \ L(B) ≠ ∅, we can use the NBA complementation and product operations to equivalently check whether L(A) ∩ L(B^c) ≠ ∅.
By Lemma 1, the resulting NBA C such that L(C) = L(A) ∩ L(B^c) has at most 2 × n_a × (2n_b + 2)^{n_b} × n_b states, each one of the form (q_a, q_b, i) with q_a ∈ Q_A, q_b ∈ Q_{B^c}, and i ∈ {1, 2}. In the worst case, the shortest run σ_c witnessing L(C) ≠ ∅ has |Q_C| + 1 states, where each state of C occurs exactly once in σ_c except for the last one, which occurs twice. Since C is the product NBA of A and B^c, the run σ_c can be projected onto the component runs σ_a and σ_b of A and B^c, respectively (see, e.g., [17]); both of them have the same length as σ_c. Note that, by the product construction, σ_a and σ_b are both accepting runs of A and B^c, respectively. By the product and projection constructions, each state of A occurs at most 2 × (2n_b + 2)^{n_b} × n_b times in σ_a, except for the last one, which occurs one more time. This means that, by setting K ≥ 2 × (2n_b + 2)^{n_b} × n_b + 1, σ_a becomes a valid K-lasso that is sampled with non-zero probability. ⊔⊓

Algorithm 1 The IMC Algorithm
1: procedure IMC(A, B, K, p⊥, ε, δ)
2:   M := ⌈ln δ / ln(1 − ε)⌉;
3:   for (i := 1; i ≤ M; i++) do
4:     (u, v) := sample(A, K, p⊥);
5:     if membership(A, (u, v)) then
6:       if not membership(B, (u, v)) then
7:         return (false, (u, v));
8:   return true;
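The two membership tests at lines 5-6 can be implemented along the lines of Lemma 2: read the stem u, then look for a reachable accepting cycle in the product of the loop v with the automaton. A minimal sketch follows; the encoding and helper name are ours, we take s1 as the accepting state of the A of Fig. 1, and for B we use a one-state stand-in accepting b^ω, which agrees with the two-state B of Fig. 1 on the words discussed there:

```python
def accepts_lasso(trans, initial, accepting, u, v):
    """Membership check for the ultimately periodic word u v^ω (cf. Lemma 2);
    assumes v is non-empty. trans: state -> letter -> set of successors."""
    cur = set(initial)
    for a in u:                               # states reachable after the stem
        cur = {p for q in cur for p in trans.get(q, {}).get(a, ())}
    n = len(v)

    def succ(j, q):                           # one step in the loop product
        return {((j + 1) % n, p) for p in trans.get(q, {}).get(v[j], ())}

    seen, frontier = set(), {(0, q) for q in cur}
    while frontier:                           # explore the loop product
        seen |= frontier
        frontier = {x for node in frontier for x in succ(*node)} - seen
    for node in seen:                         # accepting node lying on a cycle?
        if node[1] in accepting:
            vis, fr = set(), succ(*node)
            while fr:
                if node in fr:
                    return True
                vis |= fr
                fr = {x for m in fr for x in succ(*m)} - vis
    return False

# Fig. 1: L(A) = a*b^ω (accepting state s1); one-state stand-in B with L(B) = b^ω.
A = ({"s0": {"a": {"s0"}, "b": {"s1"}}, "s1": {"b": {"s1"}}}, {"s0"}, {"s1"})
B = ({"q0": {"b": {"q0"}}}, {"q0"}, {"q0"})
assert accepts_lasso(*A, "a", "b") and not accepts_lasso(*B, "a", "b")
```

The last line is exactly the witness test of lines 5-6: ab^ω ∈ L(A) \ L(B), so IMC would return (false, ("a", "b")).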
Remark 2. We want to stress that choosing K as given in Lemma 3 is a sufficient condition for sampling a counterexample with positive probability; choosing this value, however, is not a necessary condition. In practice, we can find counterexamples with positive probability with K set to a value much smaller than 2 × (2n_b + 2)^{n_b} × n_b + 1, as the experiments reported in Section 4 indicate.

Now we are ready to present our IMC algorithm, given in Algorithm 1. On input the two NBAs A and B, the bound K, the stopping probability p⊥, and the statistical parameters ε and δ, the algorithm at line 2 first computes the number M of samples according to ε and δ. Then, for each ω-word (u, v) = uv^ω associated with a terminating lasso sampled at line 4 according to Definitions 1 and 2, it checks whether the lasso is a witness by first (line 5) verifying whether uv^ω ∈ L(A) and then (line 6) whether uv^ω ∉ L(B). If the sampled lasso is indeed a witness, a counterexample to L(A) ⊆ L(B) has been found, so the algorithm can terminate at line 7 with the correct answer false and the counterexample (u, v). If none of the M sampled lassos is a witness, then the algorithm returns true at line 8, which indicates that the hypothesis L(A) ⊈ L(B) has been rejected and L(A) ⊆ L(B) is assumed to hold. It follows that IMC gives a probabilistic guarantee in terms of finding a counterexample to the inclusion when K is sufficiently large, as formalized by the following proposition.

Proposition 1.
Let A, B be two NBAs and K a sufficiently large number. If L(A) \ L(B) ≠ ∅, then IMC finds a counterexample to the inclusion L(A) ⊆ L(B) with positive probability.

In general, the exact value of p_Z, the probability of finding a word accepted by A but not accepted by B, is unknown or at least very hard to compute. Thus, we summarize our results about IMC in Theorems 3 and 4 with respect to the choice of the statistical parameters ε and δ.

Theorem 3 (Correctness).
Let A, B be two NBAs, K a sufficiently large number, and ε and δ statistical parameters. If IMC returns false, then L(A) ⊈ L(B) is certain. Otherwise, if IMC returns true, then the probability that we would continue and, with probability p_Z ≥ ε, find a counterexample is less than δ.

Proof. The fact that
IMC is correct when returning false is trivial, since this happens only when IMC finds an ω-word uv^ω such that uv^ω ∈ L(A) (cf. line 5) and uv^ω ∉ L(B) (cf. line 6), i.e., uv^ω is a witness of L(A) ⊈ L(B). The fact that IMC, on returning true, ensures that, by sampling more words, a counterexample occurring with probability p_Z ≥ ε would be found with probability at most δ, is justified by statistical hypothesis testing (cf. Section 2). ⊔⊓

Theorem 4 (Complexity).
Given two NBAs A, B with n_a and n_b states, respectively, and statistical parameters ε and δ, let M = ⌈ln δ / ln(1 − ε)⌉ and n = max(n_a, n_b). Then IMC runs in time O(M · K · n²) and space O(K · n²).

Proof. Let (u, v) be a word sampled from A. In the worst case, |u| + |v| = n × (K − 1) when every state of A occurs in the sampled lasso K − 1 times, except for the last state, which occurs K times. According to Lemma 2, determining whether an ω-word (u, v) = uv^ω is accepted by an NBA can be done in time and space linear in the number of states and the length of (u, v). Therefore the time and space complexity for resolving a membership checking problem for (u, v) is in O(n × K × n) = O(K · n²). It follows that IMC runs in time O(M · K · n²) and space O(K · n²). ⊔⊓

4 Experimental Results

We have implemented the Monte Carlo sampling algorithm proposed in Section 3 in ROLL [22] in order to evaluate it. We performed our experiments on a desktop PC. We ran RABIT with the option -fastc, while for ROLL we set parameters ε = 0.
1% and δ = 2%, resulting in sampling roughly 4 000 words fortesting inclusion, p ⊥ = , and K to the maximum of the number of states of thetwo automata. The automata we used in the experiment are represented in twoformats: the BA format used by GOAL [23] and the HOA format [4]. RABITsupports only the former, SPOT only the latter, while ROLL supports both.We used ROLL to translate between the two formats and then we comparedROLL (denoted ROLL H ) with SPOT on the HOA format and ROLL (denotedROLL B ) with RABIT on the BA format. When we present the outcome of theexperiments, we distinguish them depending on the used automata format. Thisallows us to take into account the possible effects of the automata representation,on both the language they represent and the running time of the tools. To run the different tools on randomly generated automata, we used SPOT togenerate 50 random HOA automata for each combination of state space size | Q | ∈ GOAL is omitted in our experiments as it is shown in [9] that RABIT performs muchbetter than GOAL. able 1.
Experiment results on random automata with fixed state space and alphabet.

Tool   | included | not included  | timeout | memory out | other failures
SPOT   | 1 803    | 10 177+53     | 1 780   | 670        | 1 517
ROLL_H | (5)      | 10 177+3 194  | 119     |            | 13
ROLL_B | (45)     |               |         |            | 9

{ , , …, } and alphabet size |Σ| ∈ { , , …, }, for a total of 8 000 automata, that we have then translated to the BA format. We then considered 100 different pairs of automata for each combination of state space size and alphabet size (say, for instance, 100 pairs of automata with 50 states and 10 letters, or 100 pairs with 175 states and 4 letters). The resulting 16 000 experiments are summarized in Table 1.

For each tool, we report the number of inclusion test instances that resulted in an answer for language inclusion and non-inclusion, as well as the number of cases where a tool went timeout, ran out of memory, or failed for any other reason. For the “included” case, we indicate in parentheses how many times ROLL has failed to reject the hypothesis L(A) ⊆ L(B), that is, ROLL returned “included” instead of the expected “not included”. For the “not included” case, instead, we split the number of experiments on which multiple tools returned “not included” and the number of times only this tool returned “not included”; for instance, we have that both SPOT and ROLL_H returned “not included” in 10 177 cases, that only SPOT returned so in 53 more experiments (for a total of 10 230 “not included” results), and that only ROLL_H identified non-inclusion in 3 194 additional experiments (for a total of 13 371 “not included” results).

We can see in Table 1 that both ROLL_H and ROLL_B were able to solve many more cases than their counterparts SPOT and RABIT, respectively, on both “included” and “not included” outcomes.
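As a sanity check on the sample size mentioned above, the bound M = ⌈ln δ / ln(1 − ε)⌉ can be evaluated for the parameters used in the experiments, ε = 0.1% and δ = 2%. The snippet below is our own illustration (plain Python, not part of ROLL) and confirms the figure of roughly 4 000 sampled words:

```python
import math

def number_of_samples(eps: float, delta: float) -> int:
    """M = ceil(ln(delta) / ln(1 - eps)): enough samples so that, if a
    counterexample is sampled with probability at least eps, the chance
    of missing it in all M samples is below delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - eps))

# parameters used in the experiments: eps = 0.1%, delta = 2%
M = number_of_samples(0.001, 0.02)
print(M)  # 3911, i.e., roughly 4 000 words
```

Note how M grows when ε shrinks: a rarer counterexample requires more samples to be caught with the same confidence.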
In particular, we can see that both ROLL_H and ROLL_B have been able to find a counterexample to the inclusion for many cases (3 194 and 1 052, respectively) where SPOT on the HOA format and RABIT on the BA format, respectively, failed.

On the other hand, there are only a few cases where SPOT or RABIT proved non-inclusion while ROLL failed to do so. In particular, since ROLL implements a statistical hypothesis testing algorithm for deciding language inclusion, we can expect a few experiments where ROLL fails to reject the alternative hypothesis L(A) ⊆ L(B). In the experiments this happened 5 (ROLL_H) and 45 (ROLL_B) times; this corresponds to a failure rate well below δ = 2%.

Regarding the 13 failures of ROLL_H and the 9 ones of ROLL_B, they are all caused by a stack overflow in the strongly connected components (SCC) decomposition procedure for checking membership uv^ω ∈ L(A) or uv^ω ∈ L(B) (i.e., whether L(A) ∩ {uv^ω} ≠ ∅ or L(B) ∩ {uv^ω} = ∅, cf. [17]) at lines 5 and 6 of Algorithm 1, since checking whether the sampled lasso is an accepting run of A does not suffice (cf. Remark 1). The 119 timeouts of ROLL_H occurred for 3 pairs
Fig. 4.
Experiment running time on the random automata with fixed state space and alphabet.

of automata with 200 states and 20 letters, 12/21 pairs of automata with 225 states and 18/20 letters, respectively, and 40/43 pairs of automata with 250 states and 18/20 letters, respectively. We plan to investigate why ROLL_H suffered from these timeouts while ROLL_B avoided them, to improve ROLL's performance.

About the execution running time of the tools, they are usually rather fast in giving an answer, as we can see from the plot in Fig. 4. In this plot, we show on the y axis the total number of experiments, each one completed within the time marked on the x axis; the vertical gray line marks the timeout limit. The plot is relative to the number of “included” and “not included” outcomes combined together; the shape of the plots for the two outcomes kept separated is similar to the combined one we present in Fig. 4; the only difference is that in the “not included” case, the plots for ROLL_B and ROLL_H would terminate earlier, since all experiments returning “not included” are completed within a smaller time than for the “included” case. As we can see, ROLL rather quickly overcame the other tools in giving an answer. This is likely motivated by the fact that, by using randomly generated automata, structure-based tools such as RABIT and SPOT are not able to take advantage of the symmetries or other structural properties one can find in automata obtained from, e.g., logical formulas. From the results of the experiments presented in Table 1 and Fig. 4, we have that the use of a sampling-based algorithm is a very fast, effective, and reliable way to rule out that L(A) ⊆ L(B) holds. Moreover, we also conclude that IMC complements existing approaches rather well, as it finds counterexamples to the language inclusion for a lot of instances that other approaches fail to manage.
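The terminating-lasso sampling performed at line 4 of Algorithm 1 can be sketched as follows. This is a simplified reading of Definitions 1 and 2, under our assumptions of a uniform choice among initial states and among outgoing transitions, a forced stop once the last state has occurred K times, and an early stop with probability p⊥ once it has occurred at least twice; the dictionary encoding of the automaton is ours, not ROLL's:

```python
import random

def sample_lasso(nba, K, p_stop, rng):
    """Sample one terminating K-lasso and return the associated
    omega-word (u, v), to be read as u v^omega.

    `nba` is a dict: 'initial' lists the initial states and 'trans'
    maps each state to its (letter, successor) pairs (assumed non-empty).
    """
    states = [rng.choice(nba['initial'])]  # run sampled so far
    letters = []
    while True:
        last = states[-1]
        seen = states.count(last)          # occurrences of the last state
        # forced stop at K occurrences; early stop with probability p_stop
        if seen == K or (seen > 1 and rng.random() < p_stop):
            i = states.index(last)         # the loop starts at the first occurrence
            return ''.join(letters[:i]), ''.join(letters[i:])
        a, q = rng.choice(nba['trans'][last])
        letters.append(a)
        states.append(q)

# toy automaton over {a, b}, alternating between two states
toy = {'initial': [0], 'trans': {0: [('a', 1)], 1: [('b', 0)]}}
u, v = sample_lasso(toy, K=3, p_stop=0.5, rng=random.Random(42))
```

Every returned pair satisfies v ≠ ε and uv^ω labels an infinite run of the automaton, which is exactly what the membership tests at lines 5 and 6 of Algorithm 1 then examine.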
ε and δ

To analyze the effect of the choice of ε and δ on the execution of the sampling algorithm we proposed, we have randomly taken 100 pairs of automata where, for each pair (A, B), the automata A and B have the same alphabet but possibly different state spaces. On these 100 pairs of automata, we repeatedly ran ROLL_H
10 times with different values of ε and of δ (eleven values each), for a total of 121 000 inclusion tests.

The choice of ε and δ plays essentially no role in the running time for the cases where a counterexample to the language inclusion is found: the average running time is between 1.67 and 1.77 seconds. This can be expected, since ROLL stops its sampling procedure as soon as a counterexample is found (cf. Algorithm 1). If we consider the number of experiments, again there is almost no difference, since for all combinations of the parameters it ranges between 868 and 870.

On the other hand, ε and δ indeed affect the running time for the “included” cases, since they determine the number M of sampled words and all such words have to be sampled and tested before rejecting the “not included” hypothesis. The average running time is 1 second or less for all but the smallest choice of ε; for the smallest ε, it grows as δ decreases, since more words have to be sampled.

K and p⊥

At last, we also experimented with different values of K and p⊥ while keeping the statistical parameters unchanged: we have generated another 100 pairs of automata as in Section 4.2 and then checked inclusion 10 times for each pair and each combination of ten values of K and several values of p⊥. As one can expect, low values for p⊥ and large values of K allow IMC to find more counterexamples, at the cost of a higher running time. It is worth noting that K = 2 is still rather effective in finding counterexamples: out of the 1 000 executions on the pairs, IMC returned “not included” between 906 and 910 times; for K = 3 it ranged between 914 and 919 for small p⊥, with slightly fewer for p⊥ > 0.5. Larger values of K showed similar behavior. Regarding the running time, except for K = 2 the running time of IMC is loosely dependent on the choice of K, for a given p⊥; this is likely motivated by the fact that imposing e.g. K = 51 still allows IMC to sample lassos that are, for instance, 4-lassos. Instead, the running time is affected by the choice of p⊥ for a given K ≥ 3: as one can expect, the smaller p⊥ is, the longer IMC takes to give an answer; a small p⊥ makes the sampled words uv^ω longer, which in turn makes the check uv^ω ∈ L(B) more expensive.

Experiments suggest that intermediate values of p⊥ together with K ≤ 11 give a good tradeoff between running time and number of “not included” outcomes. Very large values of K, such as K > 50, are usually not needed, also given the fact that lassos with several repetitions usually occur with rather low probability.
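The cost of the membership checks discussed above can be made concrete with a small sketch: deciding uv^ω ∈ L(B) amounts to reading u and then looking for a reachable accepting cycle in the product of B with the positions of the period v. The naive product construction below is our own illustration, not the SCC-based procedure of [17] implemented in ROLL:

```python
def member(nba, u, v):
    """Check whether u v^omega ∈ L(nba) for an NBA encoded as a dict with
    keys 'initial' (list of states), 'accepting' (set of states) and
    'trans' (state -> list of (letter, successor) pairs)."""
    assert v, "the periodic part v must be non-empty"
    trans, acc = nba['trans'], nba['accepting']
    # states reachable after reading the finite prefix u
    cur = set(nba['initial'])
    for a in u:
        cur = {q2 for q in cur for (b, q2) in trans.get(q, []) if b == a}
    # product of the automaton with the positions 0..|v|-1 of the period
    def succs(node):
        q, i = node
        return {(q2, (i + 1) % len(v))
                for (b, q2) in trans.get(q, []) if b == v[i]}
    def reach(sources):
        seen, todo = set(), list(sources)
        while todo:
            for m in succs(todo.pop()):
                if m not in seen:
                    seen.add(m)
                    todo.append(m)
        return seen
    start = {(q, 0) for q in cur}
    # accepted iff some accepting product node reachable from `start`
    # lies on a (non-trivial) cycle
    return any(node[0] in acc and node in reach({node})
               for node in reach(start) | start)

# toy NBA accepting exactly the words with infinitely many a's
inf_a = {'initial': [0], 'accepting': {1},
         'trans': {0: [('a', 1), ('b', 0)], 1: [('a', 1), ('b', 0)]}}
```

For instance, member(inf_a, '', 'ab') holds while member(inf_a, 'a', 'b') does not. Longer periods v enlarge the product, which reflects why a small p⊥ (hence longer sampled words) makes this check more expensive.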
We presented IMC, a sample-based algorithm for proving language non-inclusion between Büchi automata. Experimental evaluation showed that IMC is very fast and reliable in finding such witnesses, by sampling them in many cases where traditional structure-based algorithms fail or take too long to complete the analysis. We believe that IMC is a very good technique to disprove L(A) ⊆ L(B) and complements well the existing techniques for checking Büchi automata language inclusion. As future work, our algorithm can be applied to scenarios like black-box testing and PAC learning [3], in which inclusion provers are either not applicable in practice or not strictly needed. A uniform word sampling algorithm was proposed in [5] for concurrent systems with multiple components. We believe that extending our sampling algorithm to concurrent systems with multiple components is worthy of study.

References
1. Abdulla, P.A., Chen, Y., Clemente, L., Holík, L., Hong, C., Mayr, R., Vojnar, T.: Simulation subsumption in Ramsey-based Büchi automata universality and inclusion testing. In: CAV. LNCS, vol. 6174, pp. 132–147 (2010)
2. Abdulla, P.A., Chen, Y., Clemente, L., Holík, L., Hong, C., Mayr, R., Vojnar, T.: Advanced Ramsey-based Büchi automata inclusion testing. In: CONCUR. LNCS, vol. 6901, pp. 187–202 (2011)
3. Angluin, D.: Queries and concept learning. ML (4), 319–342 (1987)
4. Babiak, T., Blahoudek, F., Duret-Lutz, A., Klein, J., Křetínský, J., Müller, D., Parker, D., Strejček, J.: The Hanoi omega-automata format. In: CAV. LNCS, vol. 9206, pp. 479–486 (2015)
5. Basset, N., Mairesse, J., Soria, M.: Uniform sampling for networks of automata. In: CONCUR. pp. 36:1–36:16 (2017)
6. Ben-Amram, A.M., Genaim, S.: On multiphase-linear ranking functions. In: CAV. LNCS, vol. 10427, pp. 601–620 (2017)
7. Bradley, A.R., Manna, Z., Sipma, H.B.: The polyranking principle. In: ICALP. LNCS, vol. 3580, pp. 1349–1361 (2005)
8. Büchi, J.R.: On a decision method in restricted second order arithmetic. In: The Collected Works of J. Richard Büchi, pp. 425–435 (1990)
9. Clemente, L., Mayr, R.: Efficient reduction of nondeterministic automata with application to language inclusion testing. LMCS (1) (2019)
10. Doyen, L., Raskin, J.: Antichains for the automata-based approach to model-checking. LMCS (1) (2009)
11. Duret-Lutz, A., Lewkowicz, A., Fauchille, A., Michaud, T., Renault, E., Xu, L.: Spot 2.0 - A framework for LTL and ω-automata manipulation. In: ATVA. LNCS, vol. 9938, pp. 122–129 (2016)
12. Emmes, F., Enger, T., Giesl, J.: Proving non-looping non-termination automatically. In: IJCAR. LNCS, vol. 7364, pp. 225–240 (2012)
13. Etessami, K., Wilke, T., Schuller, R.A.: Fair simulation relations, parity games, and state space reduction for Büchi automata. SIAM J. Comput. (5), 1159–1175 (2005)
14. Fogarty, S., Vardi, M.Y.: Efficient Büchi universality checking. In: TACAS. LNCS, vol. 6015, pp. 205–220 (2010)
15. Grosu, R., Smolka, S.A.: Monte Carlo model checking. In: TACAS. LNCS, vol. 3440, pp. 271–286 (2005)
16. Gupta, A., Henzinger, T.A., Majumdar, R., Rybalchenko, A., Xu, R.: Proving non-termination. In: POPL. pp. 147–158 (2008)
17. Kupferman, O.: Automata theory and model checking. In: Handbook of Model Checking, pp. 107–151 (2018)
18. Kupferman, O., Vardi, M.Y.: Verification of fair transition systems. In: CAV. LNCS, vol. 1102, pp. 372–382 (1996)
19. Kupferman, O., Vardi, M.Y.: Weak alternating automata are not that weak. TOCL (3), 408–429 (2001)
20. Leike, J., Heizmann, M.: Ranking templates for linear loops. LMCS (1) (2015)
21. Leike, J., Heizmann, M.: Geometric nontermination arguments. In: TACAS. LNCS, vol. 10806, pp. 266–283 (2018)
22. Li, Y., Sun, X., Turrini, A., Chen, Y., Xu, J.: ROLL 1.0: ω-regular language learning library. In: TACAS I. LNCS, vol. 11427, pp. 365–371 (2019)
23. Tsai, M., Tsay, Y., Hwang, Y.: GOAL for games, omega-automata, and logics. In: CAV. LNCS, vol. 8044, pp. 883–889 (2013)
24. Vardi, M.Y., Wolper, P.: An automata-theoretic approach to automatic program verification (preliminary report). In: LICS. pp. 332–344 (1986)
25. Yan, Q.: Lower bounds for complementation of omega-automata via the full automata technique. LMCS (1) (2008)
26. Younes, H.L.S.: Planning and Verification for Stochastic Processes with Asynchronous Events. Ph.D. thesis, Carnegie Mellon University (2005)

A Proof of Theorem 2
Theorem 2 (Lasso Probability Space).
Let A be an NBA, K ≥ 2, and p⊥ ∈ (0, 1) a stopping probability. The σ-field (S^K_A, 2^{S^K_A}) together with Pr_{p⊥} defines a discrete probability space.

The fact that Pr_{p⊥}[σ] is a non-negative real value for each σ ∈ S^K_A is immediate by definition of Pr_{p⊥}. By definition, Pr_{p⊥} satisfies Pr_{p⊥}[S_1 ∪ S_2] = Pr_{p⊥}[S_1] + Pr_{p⊥}[S_2] for each S_1, S_2 ⊆ S^K_A such that S_1 ∩ S_2 = ∅. To complete the proof, we need to show that Pr_{p⊥}[S^K_A] = 1.

Let P = { σ′ ∈ Q × (Σ × Q)* | σ′ is a prefix of some σ⊥ ∈ S^K_A } be the set of proper prefixes of the K-lassos in S^K_A. For a given finite run σ = s_0 a_1 s_1 … a_n s_n, let σ_l denote the last state s_n of σ, and let #(σ, q) denote the number of occurrences of the state q in σ. Consider the tree T = (N, ⟨λ, 1⟩, E) where

– N = (P × (0, 1]) ∪ (S^K_A × (0, 1]) ∪ {⟨λ, 1⟩} is the set of nodes,
– ⟨λ, 1⟩ is the root of the tree, and
– E ⊆ ({⟨λ, 1⟩} × (P × (0, 1])) ∪ ((P × (0, 1]) × (P × (0, 1])) ∪ ((P × (0, 1]) × (S^K_A × (0, 1])) is the set of edges, defined as

E = { (⟨λ, 1⟩, ⟨q, 1/|Q_I|⟩) | q ∈ Q_I }
  ∪ { (⟨σ, p⟩, ⟨σaq, p/|T(σ_l)|⟩) | σaq ∈ P ∧ #(σ, σ_l) = 1 }
  ∪ { (⟨σ, p⟩, ⟨σaq, p · (1 − p⊥)/|T(σ_l)|⟩) | σaq ∈ P ∧ #(σ, σ_l) > 1 }
  ∪ { (⟨σ, p⟩, ⟨σ⊥, p⟩) | σ⊥ ∈ S^K_A ∧ #(σ, σ_l) = K }
  ∪ { (⟨σ, p⟩, ⟨σ⊥, p · p⊥⟩) | σ⊥ ∈ S^K_A ∧ 1 < #(σ, σ_l) < K }

Note that by construction, T is finite. We now show by induction on the length of the finite run σ ∈ P that ⟨σ, Pr′_{p⊥}[σ]⟩ ∈ N is reachable in T from ⟨λ, 1⟩.
Base case σ = q_0: by definition of E, we have that (⟨λ, 1⟩, ⟨q_0, 1/|Q_I|⟩) ∈ E, since by Definition 1 we have that q_0 ∈ Q_I; this implies that (⟨λ, 1⟩, ⟨q_0, Pr′_{p⊥}[q_0]⟩) ∈ E, since by Definition 2 we have that Pr′_{p⊥}[q_0] = 1/|Q_I|. Thus (⟨λ, 1⟩, ⟨σ, Pr′_{p⊥}[σ]⟩) ∈ E, showing that ⟨σ, Pr′_{p⊥}[σ]⟩ ∈ N is reachable in T from ⟨λ, 1⟩.

Inductive step: let σ be σ′aq for some σ′ ∈ P. By the induction hypothesis, we have that ⟨σ′, Pr′_{p⊥}[σ′]⟩ is reachable in T from ⟨λ, 1⟩. There are now two cases: either #(σ′, σ′_l) = 1 or #(σ′, σ′_l) > 1.

If #(σ′, σ′_l) = 1, then by definition of E we have that (⟨σ′, Pr′_{p⊥}[σ′]⟩, ⟨σ′aq, Pr′_{p⊥}[σ′]/|T(σ′_l)|⟩) ∈ E, which by simple rewriting is (⟨σ′, Pr′_{p⊥}[σ′]⟩, ⟨σ′aq, Pr′_{p⊥}[σ′] · π[σ′_l aq]⟩) ∈ E, since (σ′_l, a, q) ∈ T derives from the fact that σ′aq is a prefix of some K-lasso in S^K_A and π[σ′_l aq] = 1/m with m = |T(σ′_l)| from Definition 2. This implies that (⟨σ′, Pr′_{p⊥}[σ′]⟩, ⟨σ, Pr′_{p⊥}[σ]⟩) ∈ E as required, since σ = σ′aq and Pr′_{p⊥}[σ] = Pr′_{p⊥}[σ′] · π[σ′_l aq]; that is, ⟨σ, Pr′_{p⊥}[σ]⟩ ∈ N is reachable in T from ⟨λ, 1⟩.

Suppose now that #(σ′, σ′_l) > 1; then by definition of E we have that (⟨σ′, Pr′_{p⊥}[σ′]⟩, ⟨σ′aq, Pr′_{p⊥}[σ′] · (1 − p⊥)/|T(σ′_l)|⟩) ∈ E, which by simple rewriting is (⟨σ′, Pr′_{p⊥}[σ′]⟩, ⟨σ′aq, (1 − p⊥) · Pr′_{p⊥}[σ′] · π[σ′_l aq]⟩) ∈ E, since (σ′_l, a, q) ∈ T derives from the fact that σ′aq is a prefix of some K-lasso in S^K_A and π[σ′_l aq] = 1/m with m = |T(σ′_l)| from Definition 2. This implies that (⟨σ′, Pr′_{p⊥}[σ′]⟩, ⟨σ, Pr′_{p⊥}[σ]⟩) ∈ E as required, since σ = σ′aq and Pr′_{p⊥}[σ] = (1 − p⊥) · Pr′_{p⊥}[σ′] · π[σ′_l aq]; that is, ⟨σ, Pr′_{p⊥}[σ]⟩ ∈ N is reachable in T from ⟨λ, 1⟩.

This concludes the proof that for each finite run σ ∈ P we have that ⟨σ, Pr′_{p⊥}[σ]⟩ ∈ N is reachable in T from ⟨λ, 1⟩.

An analogous result holds also for each terminating K-lasso: for each terminating K-lasso σ⊥ ∈ S^K_A we have that ⟨σ⊥, Pr_{p⊥}[σ⊥]⟩ ∈ N is reachable in T from ⟨λ, 1⟩. Let σ⊥ ∈ S^K_A and let σ be the corresponding finite run, i.e., σ⊥ without the termination marker ⊥. By definition of P, we have that σ ∈ P, so by the result shown above we have that ⟨σ, Pr′_{p⊥}[σ]⟩ ∈ N is reachable in T from ⟨λ, 1⟩.
To complete the proof, we just need to show that (⟨σ, Pr′_{p⊥}[σ]⟩, ⟨σ⊥, Pr_{p⊥}[σ⊥]⟩) ∈ E. There are now two cases: either #(σ, σ_l) = K or #(σ, σ_l) < K. In the former case #(σ, σ_l) = K, the definition of E implies that (⟨σ, Pr′_{p⊥}[σ]⟩, ⟨σ⊥, Pr′_{p⊥}[σ]⟩) ∈ E, which is indeed (⟨σ, Pr′_{p⊥}[σ]⟩, ⟨σ⊥, Pr_{p⊥}[σ⊥]⟩) ∈ E, since Definition 2 implies that Pr_{p⊥}[σ⊥] = Pr′_{p⊥}[σ], as required. Similarly, in the latter case #(σ, σ_l) < K, the definition of E implies that (⟨σ, Pr′_{p⊥}[σ]⟩, ⟨σ⊥, p⊥ · Pr′_{p⊥}[σ]⟩) ∈ E, which is indeed (⟨σ, Pr′_{p⊥}[σ]⟩, ⟨σ⊥, Pr_{p⊥}[σ⊥]⟩) ∈ E, since Definition 2 implies that Pr_{p⊥}[σ⊥] = p⊥ · Pr′_{p⊥}[σ], as required.

By a completely symmetric reasoning, we can show that each node ⟨σ, p⟩ ∈ P × (0,
1] that is reachable in T from ⟨λ, 1⟩ has p = Pr′_{p⊥}[σ], and that each node ⟨σ⊥, p⟩ ∈ S^K_A × (0,
1] that is reachable in T from ⟨λ, 1⟩ has p = Pr_{p⊥}[σ⊥]. These results allow us to claim that checking ∑_{σ⊥ ∈ S^K_A} Pr_{p⊥}[σ⊥] = 1 is equivalent to checking ∑_{⟨σ⊥, p⟩} p = 1, where the summation is taken over all leaf nodes ⟨σ⊥, p⟩ ∈ N that are reachable in T from ⟨λ, 1⟩.

To show this last point, we now prove that for each non-leaf node ⟨σ, p⟩ ∈ (P × (0, 1]) ∪ {⟨λ, 1⟩}, it holds that ∑_{⟨σ′, p′⟩ : (⟨σ, p⟩, ⟨σ′, p′⟩) ∈ E} p′ = p. Since the reachable part of the tree T is finite, this result allows us to propagate backward, level by level, the probability values stored in the leaves, i.e., on the elements of S^K_A × (0, 1]. At the root of T, we will have that ∑_{⟨σ⊥, p⟩} p = p_r with p_r being the value associated with the root, that is, p_r = 1 as required, since the root of T is ⟨λ, 1⟩.

Let ⟨σ, p⟩ ∈ (P × (0, 1]) ∪ {⟨λ, 1⟩} be an arbitrary non-leaf node of T; we want to prove that ∑_{⟨σ′, p′⟩ : (⟨σ, p⟩, ⟨σ′, p′⟩) ∈ E} p′ = p. There are two cases: ⟨σ, p⟩ = ⟨λ, 1⟩ or ⟨σ, p⟩ ∈ P × (0, 1]. In the former case,

∑_{⟨σ′, p′⟩ : (⟨σ, p⟩, ⟨σ′, p′⟩) ∈ E} p′ = ∑_{q ∈ Q_I} 1/|Q_I| = |Q_I| · 1/|Q_I| = 1 = p,

as required, since by definition of E we have an edge (⟨λ, 1⟩, ⟨q, 1/|Q_I|⟩) ∈ E for each q ∈ Q_I.
In the latter case, i.e., when ⟨σ, p⟩ ∈ P × (0, 1], we have

∑_{⟨σ′, p′⟩ : (⟨σ, p⟩, ⟨σ′, p′⟩) ∈ E} p′
  = ∑_{σaq ∈ P ∧ #(σ, σ_l) = 1} p/|T(σ_l)|
  + ∑_{σaq ∈ P ∧ #(σ, σ_l) > 1} (1 − p⊥) · p/|T(σ_l)|
  + ∑_{σ⊥ ∈ S^K_A ∧ #(σ, σ_l) = K} p
  + ∑_{σ⊥ ∈ S^K_A ∧ 1 < #(σ, σ_l) < K} p · p⊥.

If #(σ, σ_l) = 1, only the first summation is non-empty and it has one summand p/|T(σ_l)| for each of the |T(σ_l)| transitions leaving σ_l, so it evaluates to p. If 1 < #(σ, σ_l) < K, the second and fourth summations contribute (1 − p⊥) · p and p · p⊥, respectively, for a total of p. Finally, if #(σ, σ_l) = K, only the third summation is non-empty and it evaluates to p. In all cases, ∑_{⟨σ′, p′⟩ : (⟨σ, p⟩, ⟨σ′, p′⟩) ∈ E} p′ = p, as required. It follows that ∑_{σ⊥ ∈ S^K_A} Pr_{p⊥}[σ⊥] = 1, which completes the proof. ⊓⊔
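Since the tree T is finite, Theorem 2 can also be checked mechanically on small instances by enumerating every terminating K-lasso together with its probability and verifying that the total mass is 1. The encoding and the toy automaton below are our own illustration; exact rational arithmetic avoids rounding issues:

```python
from fractions import Fraction

def lasso_total_probability(nba, K, p_stop):
    """Enumerate all terminating K-lassos of `nba` with their
    probabilities and return the total probability mass (Theorem 2
    claims this is exactly 1). Assumes every state has a successor."""
    total = Fraction(0)
    init = nba['initial']
    stack = [([q], Fraction(1, len(init))) for q in init]
    while stack:
        states, p = stack.pop()
        last = states[-1]
        m = states.count(last)
        if m == K:
            total += p           # forced termination
            continue
        succ = nba['trans'][last]
        if m > 1:
            total += p * p_stop  # early termination with probability p_stop
            cont = p * (1 - p_stop) / len(succ)
        else:
            cont = p / len(succ)
        for _, q in succ:
            stack.append((states + [q], cont))
    return total

toy = {'initial': [0], 'trans': {0: [('a', 0), ('b', 1)], 1: [('a', 0)]}}
assert lasso_total_probability(toy, 3, Fraction(1, 2)) == 1
```

At every internal node the outgoing probability mass equals the incoming one, mirroring the edge-by-edge case analysis of the proof, so the leaves must carry total mass 1.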