On the Hierarchy of Block Deterministic Languages
Pascal Caron, Ludovic Mignot, and Clément Miklarz
LITIS, Université de Rouen, 76801 Saint-Étienne du Rouvray Cedex, France {pascal.caron,ludovic.mignot,clement.miklarz1}@univ-rouen.fr
Abstract.
A regular language is k-lookahead deterministic (resp. k-block deterministic) if it is specified by a k-lookahead deterministic (resp. k-block deterministic) regular expression. These two subclasses of regular languages have been respectively introduced by Han and Wood (k-lookahead determinism) and by Giammarresi et al. (k-block determinism) as a possible extension of one-unambiguous languages defined and characterized by Brüggemann-Klein and Wood.

In this paper, we study the hierarchy and the inclusion links of these families. We first show that each k-block deterministic language is the alphabetic image of some one-unambiguous language. Moreover, we show that converting a minimal DFA of a k-block deterministic regular language into a k-block deterministic automaton requires more than state elimination, and that the proof given by Han and Wood of a proper hierarchy in k-block deterministic languages, which is based on this result, is erroneous. Despite these results, we show by giving a parameterized family that there is a proper hierarchy in k-block deterministic regular languages. We also prove that there is a proper hierarchy in k-lookahead deterministic regular languages by studying particular properties of unary regular expressions. Finally, using our valid results, we confirm that the family of k-block deterministic regular languages is strictly included into the one of k-lookahead deterministic regular languages by showing that any k-block deterministic unary language is one-unambiguous.

A Document Type Definition (DTD) containing a grammar is used to check whether an XML file fits some specification. These grammars are made of rules whose right-hand part is a restricted regular expression. Brüggemann-Klein and Wood have formalized these regular expressions and have shown that the set of languages specified is strictly included in the set of regular ones.
The distinctive aspect of such expressions is the one-to-one correspondence between each letter of the input word and a unique position in them. The resulting Glushkov automaton is deterministic. The languages specified are called one-unambiguous regular languages.

Several extensions of one-unambiguous expressions have been considered:
– k-block deterministic regular expressions [4] are such that, while reading an input word, there is a one-to-one correspondence between the next at most k input symbols and the same number of symbols of the expression. These expressions have particular Glushkov automata. The transitions of these automata can be labeled by words of length at most k and, for every pair of words labeling two outgoing transitions of a single state, neither word is a prefix of the other.
– k-lookahead deterministic regular expressions form another generalization. This time, the reading of the next k symbols of the input word allows one to know the next position in the expression. This extension has been proposed in [6].
– (k, l)-unambiguous regular expressions [3] are another extension of one-unambiguity, where the next k symbols may induce several paths, but with at most one common state.

Each of these three families of expressions defines a family of languages: a language is k-block deterministic (resp. k-lookahead deterministic, (k, l)-unambiguous) if there exists a k-block deterministic (resp. k-lookahead deterministic, (k, l)-unambiguous) expression to represent it.

In [6], Han and Wood show that there is a proper hierarchy in block deterministic languages and that there is a strict inclusion of the family of k-block deterministic languages into the one of k-lookahead deterministic languages. However, they based their proofs on an erroneous statement due to Giammarresi et al. [4], invalidating them.
In this paper, we first show that there is indeed a proper hierarchy in block deterministic languages by giving our own parameterized family. Then, we show that there is also a proper hierarchy in k-lookahead deterministic languages by studying the structural properties of unary Glushkov automata. Finally, using our valid results, we demonstrate that the family of k-block deterministic languages is strictly included into the one of k-lookahead deterministic languages by showing that any k-block deterministic unary language is also one-unambiguous.

Preliminaries are gathered in Section 2. In Section 3, we recall several results from [4,6] whose validity we question. Indeed, we show in Section 4 that, due to an erroneous statement of Lemma 4, the witness family given as a proof of Theorem 3 is invalid; we then present an alternative family, proving the infinite hierarchy of k-block deterministic regular languages w.r.t. k. In Section 5, we give another witness family to prove that there is also an infinite hierarchy in k-lookahead deterministic regular languages w.r.t. k. Then, in Section 6, we give our own proof that k-block deterministic regular languages are a proper subfamily of k-lookahead deterministic regular languages w.r.t. k.

Let Σ be a non-empty finite alphabet. A word w over Σ is a finite sequence of symbols from Σ. The length of a word w is denoted by |w|, and the empty word is denoted by ε. Let p, f, s, w ∈ Σ∗ be words such that w = pfs; then p is a prefix of w and f is a subword of w. The set of all prefixes (respectively subwords) of w is denoted by Pref(w) (respectively Subw(w)). Let Σ∗ denote the set of all words over Σ. A language over Σ is a subset of Σ∗. Let L and L′ be two languages over Σ.
The following operations are defined:
– the union: $L \cup L' = \{ w \mid w \in L \lor w \in L' \}$
– the concatenation: $L \cdot L' = \{ w \cdot w' \mid w \in L \land w' \in L' \}$
– the Kleene star: $L^* = \bigcup_{k \in \mathbb{N}} L^k$ with $L^0 = \{ \varepsilon \}$ and $L^{k+1} = L \cdot L^k$

A regular expression over Σ is built from ∅ (the empty set), ε, and symbols in Σ using the binary operators + and ·, and the unary operator ∗. The language L(E) specified by a regular expression E is defined as follows: L(∅) = ∅, L(ε) = {ε}, L(a) = {a}, L(F + G) = L(F) ∪ L(G), L(F · G) = L(F) · L(G), L(F∗) = L(F)∗, with a ∈ Σ, and F, G some regular expressions over Σ. Given a language L, if there exists a regular expression E such that L(E) = L, then L is a regular language. A regular expression is trimmed if it is equal to ∅ or does not contain any occurrence of ∅. We consider only trimmed regular expressions in the rest of this paper.

A finite automaton A is a 5-tuple (Σ, Q, I, F, δ) where: Q is a finite set of states, I ⊂ Q is the set of initial states, F ⊂ Q is the set of final states, and δ ⊂ Q × Σ × Q is a set of transitions. The set δ is equivalent to a function of $Q \times \Sigma \to 2^Q$: (p, a, q) ∈ δ ⟺ q ∈ δ(p, a). This function can be extended to $2^Q \times \Sigma^* \to 2^Q$ as follows: for any subset Q′ ⊂ Q, for any symbol a ∈ Σ, for any word w ∈ Σ∗: δ(Q′, ε) = Q′, $\delta(Q', a) = \bigcup_{q \in Q'} \delta(q, a)$, δ(Q′, a · w) = δ(δ(Q′, a), w); finally, we set δ(q, w) = δ({q}, w).

A set O ⊂ Q is called an orbit if it is a strongly connected component. An orbit is trivial if it consists of only one state and there is no transition from it to itself in A. The set of orbits of A is denoted by $\mathcal{O}_A$. Let $O \in \mathcal{O}_A$ be an orbit and p ∈ O be a state. The state p is an out-gate of O (respectively an in-gate of O) if (p ∈ F) ∨ (∃ a ∈ Σ, ∃ q ∈ (Q \ O), q ∈ δ(p, a)) (respectively if (p ∈ I) ∨ (∃ a ∈ Σ, ∃ q ∈ (Q \ O), p ∈ δ(q, a))).
The set of out-gates (respectively in-gates) of O is denoted by $G_{out}(O)$ (respectively $G_{in}(O)$).

The language L(A) recognized by A is the set $\{ w \in \Sigma^* \mid \delta(I, w) \cap F \neq \emptyset \}$. Two automata are equivalent if they recognize the same language. The right language of a state q of A is denoted by $L_q(A) = \{ w \in \Sigma^* \mid \delta(q, w) \cap F \neq \emptyset \}$. Two states are equivalent if they have the same right language.

An automaton A = (Σ, Q, I, F, δ) is trimmed if $\forall q \in Q, \exists w_p, w_s \in \Sigma^*, q \in \delta(I, w_p) \land \delta(q, w_s) \cap F \neq \emptyset$. If an automaton is not trimmed, it is possible to compute an equivalent trimmed automaton by getting rid of any useless state. We consider only trimmed automata in the rest of this paper.

An automaton A = (Σ, Q, I, F, δ) is standard if |I| = 1 and ∀ q ∈ Q, ∀ a ∈ Σ, δ(q, a) ∩ I = ∅. If A is not a standard automaton, then it is possible to compute an equivalent standard automaton $(\Sigma, Q_s, I_s, F_s, \delta_s)$ as follows:
– $Q_s = Q \cup \{ i_s \}$ with $i_s \notin Q$
– $I_s = \{ i_s \}$
– $F_s = F \cup \{ i_s \}$ if $I \cap F \neq \emptyset$, $F$ otherwise
– $\delta_s = \delta \cup \{ (i_s, a, q) \mid \exists i \in I, (i, a, q) \in \delta \}$
This operation is called standardization.

An automaton A = (Σ, Q, I, F, δ) is deterministic if |I| = 1 and $\forall t_1 = (p, a, q_1), t_2 = (p, b, q_2) \in \delta, (t_1 \neq t_2) \implies (a \neq b)$. If A is not deterministic, then it is possible to compute an equivalent deterministic automaton by using the powerset construction described in [10].

A deterministic automaton $A = (\Sigma, Q_A, \{ i_A \}, F_A, \delta_A)$ is minimal if there is no equivalent deterministic automaton $B = (\Sigma, Q_B, \{ i_B \}, F_B, \delta_B)$ such that $|Q_B| < |Q_A|$. If A is not minimal, then it is possible to compute an equivalent minimal deterministic automaton by merging equivalent states [7,9]. Notice that two equivalent minimal deterministic automata are isomorphic.

Kleene's Theorem [8] asserts that the set of the languages specified by regular expressions is the same as the set of languages recognized by finite automata.
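As an illustration only, the extended transition function, the acceptance condition and the standardization defined above can be transcribed into a short Python sketch. The class name `Automaton`, its method names, the helper `standardize` and the two-state example automaton for (ab)∗ are assumptions made for this sketch; they do not come from the cited works.

```python
class Automaton:
    """Finite automaton (Sigma, Q, I, F, delta); delta is a set of (p, a, q) triples."""

    def __init__(self, alphabet, states, initial, final, delta):
        self.alphabet = set(alphabet)
        self.states = set(states)
        self.initial = set(initial)
        self.final = set(final)
        self.delta = set(delta)

    def step(self, subset, a):
        """delta(Q', a): union of delta(q, a) over all q in Q'."""
        return {q for (p, b, q) in self.delta if p in subset and b == a}

    def run(self, subset, word):
        """delta(Q', w), read symbol by symbol; delta(Q', eps) = Q'."""
        current = set(subset)
        for a in word:
            current = self.step(current, a)
        return current

    def accepts(self, word):
        """w belongs to L(A) iff delta(I, w) meets F."""
        return bool(self.run(self.initial, word) & self.final)

    def is_standard(self):
        """|I| = 1 and no transition enters the initial state."""
        return len(self.initial) == 1 and all(
            q not in self.initial for (_, _, q) in self.delta
        )


def standardize(aut, fresh="i_s"):
    """Add a fresh initial state that copies the outgoing transitions of I."""
    assert fresh not in aut.states
    final = aut.final | ({fresh} if aut.initial & aut.final else set())
    delta = aut.delta | {(fresh, a, q) for (p, a, q) in aut.delta if p in aut.initial}
    return Automaton(aut.alphabet, aut.states | {fresh}, {fresh}, final, delta)


# Hypothetical example: a two-state automaton for (ab)*, which is not standard.
A = Automaton("ab", {0, 1}, {0}, {0}, {(0, "a", 1), (1, "b", 0)})
S = standardize(A)
```

In this sketch the former initial states are kept, so the standardized automaton may contain states that are no longer reachable and may have to be trimmed afterwards.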
The conversion of regular expressions into automata has been deeply studied, e.g. by Glushkov [5]. To differentiate each occurrence of the same symbol in a regular expression, a marking of all the symbols of the alphabet is performed by indexing them with their relative position in the expression. The marking of a regular expression E produces a marked regular expression denoted by $E^\sharp$ over the alphabet of indexed symbols denoted by $\Pi_E$, where each indexed symbol occurs at most once in $E^\sharp$. The reverse of marking is the dropping of subscripts, denoted by ♮, such that if $x \in \Pi_E$ and $x = a_k$, then $x^\natural = a$. It is then extended to marked regular expressions such that $(E^\sharp)^\natural = E$.

Let E be a regular expression over an alphabet Σ. The following functions are defined:
– Null(E) = {ε} if ε ∈ L(E), ∅ otherwise
– First(E) = { x ∈ Σ | ∃ w ∈ Σ∗, xw ∈ L(E) }
– Last(E) = { x ∈ Σ | ∃ w ∈ Σ∗, wx ∈ L(E) }
– Follow(E, x) = { y ∈ Σ | ∃ u, v ∈ Σ∗, uxyv ∈ L(E) }, ∀ x ∈ Σ

From these functions, an automaton recognizing L(E) can be computed:

Definition 1. The Glushkov automaton of a regular expression E over an alphabet Σ is denoted by $G_E = (\Sigma, Q_E, I_E, F_E, \delta_E)$ with:
– $Q_E = \Pi_E \cup \{ i \}$
– $I_E = \{ i \}$
– $F_E = \mathrm{Last}(E^\sharp) \cup \{ i \}$ if $\mathrm{Null}(E^\sharp) = \{ \varepsilon \}$, $\mathrm{Last}(E^\sharp)$ otherwise
– $\delta_E = \{ (x, a, y) \in \Pi_E \times \Sigma \times \Pi_E \mid y \in \mathrm{Follow}(E^\sharp, x) \land a = y^\natural \} \cup \{ (i, a, y) \in \{ i \} \times \Sigma \times \Pi_E \mid y \in \mathrm{First}(E^\sharp) \land a = y^\natural \}$
Finally, an automaton is a Glushkov automaton if it is the Glushkov automaton of a regular expression E.

Example 1. Let E = (a + b)∗ a + ε. Then $E^\sharp = (a_1 + b_2)^* a_3 + \varepsilon$ with $\Pi_E = \{ a_1, b_2, a_3 \}$, and $G_E$ is given in Figure 1.

Fig. 1. The Glushkov automaton $G_E$ of E = (a + b)∗ a + ε

We present the notion of one-unambiguity introduced in [1].
Definition 2.
A regular expression E is one-unambiguous if $G_E$ is deterministic. A regular language is one-unambiguous if it is specified by some one-unambiguous regular expression.

Brüggemann-Klein and Wood showed that the one-unambiguity of a regular language is structurally decidable over its minimal DFA. This decision procedure is related to the orbits of the underlying graph and to their links with the remaining parts: an automaton has the orbit property if all the out-gates of each orbit have identical connections to the outside. More formally:

Definition 3. An automaton A = (Σ, Q, I, F, δ) has the orbit property if, for any orbit O in $\mathcal{O}_A$, for any two states (p, q) in $G_{out}(O)$, the two following conditions are satisfied:
– p ∈ F ⟹ q ∈ F,
– ∀ r ∈ (Q \ O), ∀ a ∈ Σ, r ∈ δ(p, a) ⟹ r ∈ δ(q, a).

Let q ∈ Q be a state. The orbit of q, denoted by O(q), is the orbit to which q belongs. The orbit automaton $A_q$ of the state q in A is the automaton obtained by restricting the states and the transitions of A to O(q), with initial state q and final states $G_{out}(O(q))$. For any state q ∈ Q, the languages $L(A_q)$ are called the orbit languages of A. A symbol a ∈ Σ is A-consistent if there exists a state $q_a \in Q$ such that all final states of A have a transition labelled by a to $q_a$. A set S of symbols is A-consistent if each symbol in S is A-consistent. The S-cut $A_S$ of A is constructed from A by removing, for each a ∈ S, all transitions labelled by a that leave a final state of A. All these notions can be used to characterize one-unambiguous regular languages:

Theorem 1 ([1]). Let M be a minimal deterministic automaton and S be an M-consistent set of symbols. Then, L(M) is one-unambiguous if and only if:
1. the S-cut $M_S$ of M has the orbit property,
2. all orbit languages of $M_S$ are one-unambiguous.
Furthermore, if M consists of a single non-trivial orbit and L(M) is one-unambiguous, then M has at least one M-consistent symbol.

This theorem suggests an inductive algorithm to decide, given a minimal deterministic automaton M, whether L(M) is one-unambiguous: the BKW test. Furthermore, the theorem defines a sufficient condition over non-minimal deterministic automata:
Lemma 1 ([1]).
Let A be a deterministic automaton and M be its equivalent minimal deterministic automaton.
1. If A has the orbit property, then so does M.
2. If all orbit languages of A are one-unambiguous, then so are all orbit languages of M.

Consequently, the BKW test is extended to deterministic automata which are not minimal. Reinterpreting the results in [1], it can be shown that
Lemma 2.
The Glushkov automaton of a one-unambiguous regular expression passes the BKW test.
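Definition 1 and Definition 2 can be sketched in Python as follows. This is a minimal sketch under conventions chosen here for illustration: expressions are nested tuples tagged "eps", "sym", "alt", "cat" and "star", positions are pairs (letter, index), and all function names are hypothetical. The second expression, (b∗ a)∗, is not taken from the cited works; it is used because it specifies the same language as the expression of Example 1.

```python
def mark(expr, counter=None):
    """Index every symbol of expr with its position (the marked expression)."""
    if counter is None:
        counter = [0]
    tag = expr[0]
    if tag == "eps":
        return expr
    if tag == "sym":
        counter[0] += 1
        return ("sym", (expr[1], counter[0]))
    if tag == "star":
        return ("star", mark(expr[1], counter))
    left = mark(expr[1], counter)   # left operand first: positions grow left to right
    right = mark(expr[2], counter)
    return (tag, left, right)


def analyse(e):
    """Return (nullable, First, Last, Follow) of a marked expression."""
    tag = e[0]
    if tag == "eps":
        return True, set(), set(), {}
    if tag == "sym":
        return False, {e[1]}, {e[1]}, {e[1]: set()}
    if tag == "star":
        _, first, last, follow = analyse(e[1])
        for x in last:
            follow[x] = follow[x] | first
        return True, first, last, follow
    n1, f1, l1, fo1 = analyse(e[1])
    n2, f2, l2, fo2 = analyse(e[2])
    follow = {**fo1, **fo2}         # positions are unique, so the keys are disjoint
    if tag == "alt":
        return n1 or n2, f1 | f2, l1 | l2, follow
    for x in l1:                    # remaining case: concatenation
        follow[x] = follow[x] | f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return n1 and n2, first, last, follow


def glushkov(expr):
    """Return (initial state, final states, transitions) of the Glushkov automaton."""
    nullable, first, last, follow = analyse(mark(expr))
    delta = {("i", y[0], y) for y in first}
    delta |= {(x, y[0], y) for x, ys in follow.items() for y in ys}
    finals = set(last) | ({"i"} if nullable else set())
    return "i", finals, delta


def is_deterministic(delta):
    """True when no state has two distinct transitions carrying the same label."""
    return len({(p, a) for (p, a, _) in delta}) == len(delta)


def accepts(automaton, word):
    initial, finals, delta = automaton
    current = {initial}
    for a in word:
        current = {q for (p, b, q) in delta if p in current and b == a}
    return bool(current & finals)


a, b, eps = ("sym", "a"), ("sym", "b"), ("eps",)
E = ("alt", ("cat", ("star", ("alt", a, b)), a), eps)   # (a + b)* a + eps, Example 1
F = ("star", ("cat", ("star", b), a))                   # (b* a)*
print(is_deterministic(glushkov(E)[2]))
print(is_deterministic(glushkov(F)[2]))
```

The script prints False for E and True for F: the expression E is not one-unambiguous, whereas the language L(E) is, since F is a one-unambiguous expression specifying it.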
We present the notion of lookahead determinism introduced in [6]. The basic idea is that the reading of the next k symbols of the input word allows one to know the next position in the expression or in the automaton. Definition 4.
An automaton A = (Σ, Q, I, F, δ) is k-lookahead deterministic if the following conditions hold:
– |I| = 1
– $\forall t_1 = (p, a, q_1), t_2 = (p, b, q_2) \in \delta, (t_1 \neq t_2) \implies (a \neq b) \lor (\forall w \in \Sigma^{k-1}, \delta(q_1, w) = \emptyset \lor \delta(q_2, w) = \emptyset)$.

Definition 5.
A regular expression E is k-lookahead deterministic if $G_E$ is k-lookahead deterministic. A regular language is k-lookahead deterministic if it is specified by some k-lookahead deterministic regular expression.

Since a 1-lookahead deterministic automaton is deterministic, the family of 1-lookahead deterministic languages is the same as the family of one-unambiguous languages.

Example 2.
Let E = b∗ a (b∗ a)∗ (a + b); $G_E$ is given in Figure 2. Notice that the states $a_2$ and $a_4$ admit two successors by a and by b, but since $L_{a_5}(G_E) = L_{b_6}(G_E) = \{ \varepsilon \}$, $G_E$ and E are 2-lookahead deterministic.

Fig. 2. The 2-lookahead deterministic Glushkov automaton $G_E$

It has been proved in [6] that the language L(b∗ a (b∗ a)∗ (a + b)) is not one-unambiguous. Thus, one-unambiguous regular languages are a proper subfamily of k-lookahead deterministic regular languages.

We present the notion of block determinism introduced in [4].

Let Σ be an alphabet and k be an integer. The set of blocks $B_{\Sigma,k}$ is the set $\{ w \mid w \in \Sigma^* \land 1 \leq |w| \leq k \}$. The notions of regular expression and automaton can be extended to ones over sets of blocks. Let E be a regular expression over Γ and A = (Γ, Q, I, F, δ) be an automaton. Let Σ be an alphabet and k be an integer; if $\Gamma \subset B_{\Sigma,k}$ then E and A are (Σ, k)-block. And since $\Gamma \subset B_{\Sigma,k} \subset \Sigma^*$, a language over Γ is also a language over Σ. To distinguish blocks as syntactic components in a regular expression, we write them between square brackets. Those are omitted for one-letter blocks.

Since $\Sigma = B_{\Sigma,1}$, regular expressions and automata can be considered as ones over a set of blocks. Moreover, the blocks can be treated as single symbols, as we do when we refer to the elements of an alphabet. With this assumption, the marking of block regular expressions induces the construction of a Glushkov automaton from a block regular expression, and the usual automaton transformations such as determinization and minimization can be easily performed.

Example 3.
Let E = [aa]∗ ([ab] b + ba) b∗. Then $E^\sharp = [aa]_1^* ([ab]_2 b_3 + b_4 a_5) b_6^*$, and $G_E$ is given in Figure 3.

Fig. 3. The ({a, b}, 2)-block Glushkov automaton $G_E$

Fig. 4. The minimal DFA of L(E)

The notion of determinism can also be extended to block determinism as follows:
Definition 6.
An automaton A = (Γ, Q, I, F, δ) is k-block deterministic if the following conditions hold:
– there exists an alphabet Σ such that A is (Σ, k)-block,
– |I| = 1,
– $\forall t_1 = (p, b_1, q_1), t_2 = (p, b_2, q_2) \in \delta, (t_1 \neq t_2) \implies (b_1 \notin \mathrm{Pref}(b_2))$.

Finally, the block determinism of a Glushkov automaton can be used to extend block determinism to block expressions:
Definition 7.
A block regular expression E is k-block deterministic if $G_E$ is k-block deterministic. A regular language is k-block deterministic if it is specified by some k-block deterministic regular expression.

Since a 1-block deterministic automaton is a deterministic automaton, the family of 1-block deterministic languages is the same as the family of one-unambiguous languages.

Example 4. Since the Glushkov automaton in Figure 3 is 2-block deterministic, L([aa]∗ ([ab] b + ba) b∗) is 2-block deterministic.

Let A = (Σ, Q, I, F, δ) be an automaton and Γ be a set. Then the automaton B = (Γ, Q, I, F, δ′) is an alphabetic image of A if there exists an injection φ from Σ to Γ such that δ′ = { (p, φ(a), q) | (p, a, q) ∈ δ }. In this case, we set B = φ(A). Caron and Ziadi showed in [2] that an automaton is a Glushkov one if and only if the two following conditions hold:
– it is homogeneous (for any state q, for any two transitions (p, a, q) and (r, b, q), the symbols a and b are the same);
– it satisfies some structural properties over the transition structure.
One can check that any injection φ from Σ to Γ preserves such conditions, since the alphabetic image preserves the transition structure by only changing the symbol labeling a transition. Therefore:

Lemma 3.
The alphabetic image of an automaton A is a Glushkov automaton if and only if A is a Glushkov automaton.

Let us show that the BKW test can be used to characterize the k-block determinism of a regular language:

Theorem 2.
A regular language L is k-block deterministic if and only if it is recognized by a k-block deterministic automaton K such that K is the alphabetic image of a deterministic automaton which passes the BKW test.

Proof. Let us show the double implication.
1. Let L be a k-block deterministic regular language over Σ. Then there exists a k-block deterministic Glushkov automaton $K = (B_{\Sigma,k}, Q, \{ i \}, F, \delta_K)$ that recognizes L. Let $\Pi = \{ [b] \mid b \in B_{\Sigma,k} \}$ be an alphabet and $\varphi : \Pi \to B_{\Sigma,k}$ be the bijection such that for every [b] ∈ Π, ϕ([b]) = b. Let $A = (\Pi, Q, \{ i \}, F, \delta_A)$ be a Glushkov automaton such that K = ϕ(A). Let us suppose that A is not deterministic. Then, there exist two transitions $(p, a, q), (p, a, r) \in \delta_A$ such that q ≠ r. Thus, $(p, \varphi(a), q), (p, \varphi(a), r) \in \delta_K$, which contradicts the fact that K is k-block deterministic. So, A is a deterministic Glushkov automaton, and therefore passes the BKW test following Lemma 2.
2. Let $A = (\Pi, Q_A, \{ i_A \}, F_A, \delta_A)$ be a deterministic automaton which passes the BKW test, $K = (\Gamma, Q_A, \{ i_A \}, F_A, \delta_K)$ be a k-block deterministic automaton, and ϕ : Π → Γ be an injection such that K = ϕ(A). Now, ϕ : Π → Γ is extended into the morphism ϕ : Π∗ → Γ∗ such that for every letter a ∈ Π and every word w ∈ Π∗ we have ϕ(a · w) = ϕ(a) · ϕ(w) and ϕ(ε) = ε. In this case, L(K) = ϕ(L(A)). Since A passes the BKW test, there exists an equivalent deterministic Glushkov automaton $G = (\Pi, Q_G, \{ i_G \}, F_G, \delta_G)$. Following Lemma 3, there also exists a Glushkov automaton $H = (\Gamma, Q_G, \{ i_G \}, F_G, \delta_H)$ such that H = ϕ(G) and L(H) = ϕ(L(G)). Since A and G are equivalent deterministic automata, ϕ(L(G)) = ϕ(L(A)). And so L(H) = L(K). Let us suppose that H is not k-block deterministic; then there exist two transitions $(p_H, \varphi(a), q_H), (p_H, \varphi(b), r_H) \in \delta_H$ such that either $(\varphi(a) = \varphi(b)) \land (q_H \neq r_H)$ or $(\varphi(a) \neq \varphi(b)) \land (\varphi(a) \in \mathrm{Pref}(\varphi(b)))$. By definition, $(p_H, a, q_H), (p_H, b, r_H) \in \delta_G$. But since G and A are equivalent deterministic automata, there exist two transitions $(p_A, a, q_A), (p_A, b, r_A) \in \delta_A$, and by definition, $(p_A, \varphi(a), q_A), (p_A, \varphi(b), r_A) \in \delta_K$. Let us suppose that $(\varphi(a) = \varphi(b)) \land (q_H \neq r_H)$. Since ϕ is an injection, $(a = b) \land (q_H \neq r_H)$, which contradicts the fact that G is deterministic. So let us suppose that $(\varphi(a) \neq \varphi(b)) \land (\varphi(a) \in \mathrm{Pref}(\varphi(b)))$; it contradicts the fact that K is k-block deterministic. Therefore, H is a k-block deterministic Glushkov automaton, and L(K) is k-block deterministic.

It has been proved that one-unambiguous regular languages are a proper subfamily of k-block deterministic regular languages. As an example, the language L([aa]∗ ([ab] b + ba) b∗) is 2-block deterministic but not one-unambiguous since its minimal deterministic automaton given in Figure 4 does not pass the BKW test. Therefore one can wonder whether there exists an infinite hierarchy in k-block deterministic regular languages regarding k. That has been achieved by Han and Wood [6], but with an invalid assumption.

In [4], a method is presented for creating from a block automaton an equivalent block automaton with larger blocks by eliminating states while preserving the right language of every other state.

Let A = (Γ, Q, I, F, δ) be a block automaton. The state elimination of q in A creates a new block automaton, denoted by S(A, q), computed as follows: first, the state q and all transitions going in and out of it are removed; second, for every two transitions (r, u, q) and (q, v, s) in δ, the transition (r, uv, s) is added.
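The state elimination described above can be sketched in Python, together with a direct check of k-block determinism (blocks of length between 1 and k, a single initial state, and no label being a prefix of another label leaving the same state). The function names and the three-state example automaton are hypothetical and serve only as an illustration; blocks are represented as Python strings, and the sketch refuses to eliminate a state that is initial, final or carries a self-loop.

```python
def eliminate_state(states, initial, final, delta, q):
    """Return (states, delta) of S(A, q) for a block automaton given by its components."""
    if q in initial or q in final or any(p == q and s == q for (p, _, s) in delta):
        raise ValueError("q must be neither initial nor final and must have no self-loop")
    incoming = [(r, u) for (r, u, s) in delta if s == q]
    outgoing = [(v, s) for (p, v, s) in delta if p == q]
    # Remove q and its transitions, then bridge every r -u-> q -v-> s by r -uv-> s.
    kept = {(p, u, s) for (p, u, s) in delta if p != q and s != q}
    bridged = {(r, u + v, s) for (r, u) in incoming for (v, s) in outgoing}
    return states - {q}, kept | bridged


def is_block_deterministic(initial, delta, k):
    """Check k-block determinism on transitions labelled by strings."""
    if len(initial) != 1:
        return False
    labels_from = {}
    for (p, u, _) in delta:
        if not 1 <= len(u) <= k:
            return False
        labels_from.setdefault(p, []).append(u)
    for labels in labels_from.values():
        for i, u in enumerate(labels):
            for j, v in enumerate(labels):
                # Two distinct transitions of one state: u must not be a prefix of v.
                if i != j and v.startswith(u):
                    return False
    return True


# Hypothetical 1-block automaton for (aa)*(ab + b); state 1 can be eliminated.
states, initial, final = {0, 1, 2}, {0}, {2}
delta = {(0, "a", 1), (1, "a", 0), (1, "b", 2), (0, "b", 2)}
new_states, new_delta = eliminate_state(states, initial, final, delta, 1)
print(sorted(new_delta))
```

On this toy automaton, eliminating state 1 yields the transitions (0, aa, 0), (0, ab, 2) and (0, b, 2): the result is 2-block deterministic but, because of its blocks of length 2, not 1-block deterministic.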
State elimination is illustrated in Figure 5.

Fig. 5. The state elimination of the state q

Definition 8. Let A = (Γ, Q, I, F, δ) be a block automaton. A state q ∈ Q satisfies the state elimination precondition if it is neither an initial state nor a final state and it has no self-loop.

The state elimination is extended to a set S ⊂ Q of states if every state in S satisfies the state elimination precondition, and the subgraph induced by S is acyclic. In this case, we can eliminate the states in S in any order.

Giammarresi et al. [4] suggest that state elimination is sufficient to decide the k-block determinism of a regular language.

Lemma 4 ([4,6]).
Let M be a minimal deterministic automaton of a k -block deterministic regular language. Wecan transform M to a k -block deterministic automaton that satisfies the orbit property using state elimination. Using this lemma, Han and Wood stated that:
Theorem 3 ([6]).
There is a proper hierarchy in k-block deterministic regular languages.

Proof. Han and Wood exhibited the family of languages $L_k$ specified by the regular expressions $E_k = ([a^k])^* ([a^{k-1} b] b + ba) b^*$, whose minimal deterministic automata $M_k$ are represented in Figure 6. Following Lemma 4, there is no other choice but to eliminate states $q_1$ to $q_{k-1}$, in any order, to have the orbit property. Thus, $L_k$ is k-block deterministic and not (k − 1)-block deterministic.

Fig. 6. The minimal deterministic automaton $M_k$ and its equivalent k-block deterministic automaton after having eliminated states $q_1$ to $q_{k-1}$

In this section, we exhibit a counter-example for Lemma 4. We can find a k-block deterministic language with a minimal deterministic automaton from which we cannot get, by state elimination alone, any k-block deterministic automaton that satisfies the orbit property. In Figure 7, the leftmost automaton is minimal and none of its states can be eliminated. However, by applying standardization, we create an equivalent deterministic automaton from which we can eliminate the state i to get the rightmost equivalent 2-block deterministic automaton.

Fig. 7. The counter-example
This clearly shows that state elimination alone is not enough to decide whether a language is k-block deterministic. Using standardization followed by state elimination, we show that:

Proposition 1. ∀ k ∈ N \ {0}, the language $L_k$ is 2-block deterministic.

Proof. As shown in Figure 8, we can always standardize $M_k$, proceed to the state elimination of $q_k$ and get a 2-block deterministic automaton which respects the conditions stated in Theorem 2. Thus, $L_k$ is 2-block deterministic and is specified by the regular expression $F_k = (a^{k-1} ([aa] a^{k-2})^* ([ab] a + bb) + ba) b^*$.

Fig. 8. The standardization of $M_k$ followed by the state elimination of $q_k$

However, Theorem 3 is still correct, since we can give a proper proof with our own parameterized family of languages. Let k ∈ N \ {0} be an integer and $A_k = (\Sigma, Q_k, I_k, F_k, \delta_k)$ be the automaton (given in Figure 9) such that:
– Σ = {a, b, c}
– $Q_k = \{ f \} \cup \{ \alpha_j, \beta_j \mid 1 \leq j \leq k \}$
– $I_k = \{ \beta_k \}$
– $F_k = \{ f \} \cup \{ \alpha_k, \beta_k \}$
– $\delta_k = \Delta_k \cup \Gamma_k$ with:
  • $\Delta_k = \{ (\beta_k, a, \alpha_k), (\beta_1, b, f), (\alpha_k, a, \alpha_k), (\alpha_1, b, f), (\alpha_1, c, \beta_k) \}$
  • $\Gamma_k = \{ (\alpha_j, b, \alpha_{j-1}), (\beta_j, b, \beta_{j-1}) \mid 2 \leq j \leq k \}$

Fig. 9.
The deterministic automaton $A_k$

First of all, let us notice that, for j > 0, the word $b^j \in L(A_k)$ if and only if j = k. Thus, for all k ≠ k′, $L(A_k) \neq L(A_{k'})$. Furthermore,

Proposition 2. ∀ k ∈ N \ {0}, $L(A_k)$ is k-block deterministic.

Proof. By construction, for all k, $A_k$ is trimmed and deterministic. So, any automaton that we can get by eliminating states that satisfy the state elimination precondition is a block deterministic automaton. For any integer k in N \ {0}, we can eliminate the set of states $\{ \alpha_j, \beta_j \mid 1 \leq j \leq k-1 \}$ because none of these states is initial or final and their induced subgraph is acyclic. Thus, we can get a k-block deterministic automaton $B_k$, such that $L(B_k) = L(A_k)$, shown in Figure 10. Obviously $B_k$ respects the conditions stated in Theorem 2, so $L(A_k)$ is k-block deterministic. Furthermore, it can be checked that $L(A_k)$ is specified by the k-block deterministic regular expression $(a (\varepsilon + [b^{k-1} c]))^* (\varepsilon + [b^k])$.

Fig. 10. The k-block deterministic automaton $B_k$

Finally, let us show that the index cannot be reduced:

Proposition 3. ∀ k ∈ N \ {0, 1}, $L(A_k)$ is not (k − 1)-block deterministic.

Proof. Let $B = (B_{\Sigma,k-1}, Q_B, \{ i_B \}, F_B, \delta_B)$ be a (k − 1)-block deterministic automaton equivalent to $A_k$.

We first show that there exists a non-trivial orbit $O \subset Q_B$ and two states α, β ∈ O such that $L_\alpha(B) = L_{\alpha_k}(A_k)$ and $L_\beta(B) = L_{\beta_k}(A_k)$. Let us consider the following state sequences: $(\alpha_{k,j})_{j \in \mathbb{N}} \subset F_B$ and $(\beta_{k,j})_{j \in \mathbb{N}} \subset F_B$, such that $\beta_{k,0} = i_B$, $\delta_B(\beta_{k,j}, a) = \alpha_{k,j}$ and $\delta_B(\alpha_{k,j}, b^{k-1} c) = \beta_{k,j+1}$. It follows that $\delta_B(i_B, (a b^{k-1} c)^j) = \beta_{k,j}$ and $\delta_B(i_B, (a b^{k-1} c)^j a) = \alpha_{k,j}$. Notice that the existence of $\alpha_{k,j}$ and $\beta_{k,j}$ is ensured by the fact that $L(B) = L(A_k)$.

Let us suppose that there exists j ∈ N such that $L_{\beta_{k,j}}(B) \neq L_{\beta_k}(A_k)$. Then there exists w ∈ Σ∗ such that $w \in L_{\beta_{k,j}}(B) \triangle L_{\beta_k}(A_k)$, where for any two sets X and Y, X △ Y = (X \ Y) ∪ (Y \ X). And since $\delta_k(\beta_k, (a b^{k-1} c)^j) = \beta_k$, $(a b^{k-1} c)^j \cdot w \in L(B) \triangle L(A_k)$. Thus, $L(B) \neq L(A_k)$, which is contradictory. So, for every j ∈ N, we have $L_{\beta_{k,j}}(B) = L_{\beta_k}(A_k)$. The proof that for every j ∈ N we have $L_{\alpha_{k,j}}(B) = L_{\alpha_k}(A_k)$ is done in the same way. Now, let us suppose that for every j ≠ j′ ∈ N, we have $\alpha_{k,j} \neq \alpha_{k,j'}$ and $\beta_{k,j} \neq \beta_{k,j'}$. Then $Q_B$ would be infinite, which would contradict the fact that B is a finite automaton. So, there exist j < j′ ∈ N such that $\alpha_{k,j} = \alpha_{k,j'}$ or $\beta_{k,j} = \beta_{k,j'}$.
Thus, either there exists a path going from $\beta_{k,j}$ to $\alpha_{k,j}$ and a path going from $\alpha_{k,j}$ to $\beta_{k,j'} = \beta_{k,j}$, and $\beta_{k,j}$ and $\alpha_{k,j}$ belong to the same orbit; or there exists a path going from $\alpha_{k,j}$ to $\beta_{k,j+1}$ and a path going from $\beta_{k,j+1}$ to $\alpha_{k,j'} = \alpha_{k,j}$, and $\alpha_{k,j}$ and $\beta_{k,j+1}$ belong to the same orbit.

Finally, let us focus on such an orbit O with two out-gates α and β such that $L_\alpha(B) = L_{\alpha_k}(A_k)$ and $L_\beta(B) = L_{\beta_k}(A_k)$. We know that for every i ∈ N such that 1 ≤ i < k, we have $\delta_k(\beta_k, b^i) = \beta_{k-i}$ with $|L_{\beta_{k-i}}(A_k)| < \infty$. Since $L_\beta(B) = L_{\beta_k}(A_k)$ and B is (k − 1)-block deterministic, there exist j ∈ N and $p \in Q_B$ such that 1 ≤ j < k, $\delta_B(\beta, [b^j]) = p$ and $L_p(B) = L_{\beta_{k-j}}(A_k)$. This means that $|L_p(B)| < \infty$, so p ∉ O. Now, if there does not exist a state $q \in Q_B$ such that $\delta_B(\alpha, [b^j]) = q$, then B does not have the orbit property. So, let us suppose that such a state exists. We know that for every i ∈ N such that 1 ≤ i < k, we have $\delta_k(\alpha_k, b^i) = \alpha_{k-i}$ with $|L_{\alpha_{k-i}}(A_k)| = \infty$. Since $L_\alpha(B) = L_{\alpha_k}(A_k)$, we have $L_q(B) = L_{\alpha_{k-j}}(A_k)$ and $|L_q(B)| = \infty$. So p ≠ q and B does not have the orbit property.

Since $L(A_k)$ cannot be recognized by a (k − 1)-block deterministic alphabetic image of an automaton passing the BKW test, following Theorem 2 it holds that $L(A_k)$ is not (k − 1)-block deterministic.

In this section, we give a parameterized family $(L_j)_{j \geq 1}$ such that $L_j$ is (j + 1)-lookahead deterministic but not j-lookahead deterministic. In order to prove it, we show that any j-lookahead deterministic Glushkov automaton does not recognize $L_j$.

Let j ∈ N and let $A_j = (\Sigma, Q_j, I, F_j, \delta_j)$ be the automaton (given in Figure 11) such that:
– Σ = {a}
– $Q_j = \{ \alpha_i \mid 0 \leq i \leq 2j \}$
– $I = \{ \alpha_0 \}$
– $F_j = \{ \alpha_0, \alpha_j \}$
– $\delta_j = \{ (\alpha_i, a, \alpha_{i+1}) \mid 0 \leq i < 2j \} \cup \{ (\alpha_{2j}, a, \alpha_0) \}$

Fig. 11.
The minimal deterministic automaton $A_j$

Let us first show that the languages in the family are distinct and satisfy the condition of lookahead determinism.

Proposition 4. ∀ j ∈ N, $A_j$ is a minimal deterministic automaton.

Proof. By construction, ∀ j ∈ N, $A_j$ is trimmed and deterministic. Then, if j = 0, there is only one state, which is initial and final, so $A_0$ is minimal. Otherwise, if j > 0, $F_j = \{ \alpha_0, \alpha_j \}$ with $a^j \in L_{\alpha_0}(A_j)$ and $a^j \notin L_{\alpha_j}(A_j)$. Thus, $\alpha_0$ and $\alpha_j$ are not equivalent, and the same holds for the non-final states. Therefore, for every j ∈ N, $A_j$ is also minimal.

Thus, for all j ≠ j′, since $|Q_j| \neq |Q_{j'}|$, $L(A_j) \neq L(A_{j'})$. Furthermore,

Proposition 5. ∀ j ∈ N, $L(A_j)$ is (j + 1)-lookahead deterministic.

Proof. Let us consider the regular expression $E_j = (a^{2j+1})^* \cdot (\varepsilon + a^j)$. Then $E_j$ is (j + 1)-lookahead deterministic. And since the minimal deterministic automaton recognizing $L(E_j)$ is isomorphic to $A_j$, $L(A_j) = L(E_j)$. So $L(A_j)$ is (j + 1)-lookahead deterministic.

Let j′ ∈ N \ {0} and let $G = (\{ a \}, Q_G, i_G, F_G, \delta_G)$ be a j′-lookahead deterministic Glushkov automaton. We demonstrate that G cannot recognize $L(A_{j'})$, that is to say that $L(A_{j'})$ is not j′-lookahead deterministic.

In order to do so, we consider a property of Glushkov automata from Proposition 4.2 of [2]:

Lemma 5.
Let $O \in \mathcal{O}_G$ be a non-trivial orbit of $G$. Then for every in-gate $o_i$ of $O$ and for every out-gate $o_o$ of $O$, $o_i \in \delta(o_o, a)$.

Let us first restrict the set of Glushkov automata to consider. We show that a state in a lookahead deterministic unary automaton cannot admit two distinct successors with infinite right languages.
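The condition just stated (no state with two distinct successors whose right languages are both infinite) can be checked mechanically on a unary automaton. The following is a minimal sketch under stated assumptions: the automaton is given as a hypothetical dictionary `succ` mapping each state to the set of its $a$-successors, `finals` is a set of states, and the names `reachable`, `infinite_right_language` and `violations` are illustrative. The sketch only tests this condition; it does not test lookahead determinism itself.

```python
def reachable(succ, start):
    """States reachable from `start` by zero or more a-transitions."""
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for r in succ.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def infinite_right_language(succ, finals, q):
    """True iff infinitely many words a^n lead from q to a final state.

    This holds exactly when q reaches a state that lies on a cycle
    and from which a final state is reachable.
    """
    for c in reachable(succ, q):
        on_cycle = any(c in reachable(succ, r) for r in succ.get(c, ()))
        if on_cycle and reachable(succ, c) & finals:
            return True
    return False


def violations(succ, finals):
    """List (s, successors of s with infinite right language) when there are 2 or more."""
    found = []
    for s in sorted(succ):
        inf = sorted(t for t in succ[s]
                     if infinite_right_language(succ, finals, t))
        if len(inf) > 1:
            found.append((s, inf))
    return found


if __name__ == "__main__":
    cycle = {0: {1}, 1: {2}, 2: {0}}      # shape of A_2
    branch = {0: {1, 2}, 1: {1}, 2: {2}}  # two infinite branches from state 0
    print(violations(cycle, {0, 2}))
    print(violations(branch, {1, 2}))
```

On the cycle no state is reported, whereas on the branching automaton state 0 is reported with its two successors 1 and 2.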
Proposition 6. $\forall s \in Q_G$, $\forall q_1, q_2 \in \delta_G(s, a)$, $(|L_{q_1}(G)| = \infty \wedge |L_{q_2}(G)| = \infty) \implies q_1 = q_2$.

Proof. Let us suppose that there exist 3 states $q, q_1, q_2 \in Q_G$ such that $q_1 \ne q_2$, $|L_{q_1}(G)| = \infty$, $|L_{q_2}(G)| = \infty$ and $\{q_1, q_2\} \subseteq \delta_G(q, a)$. Then, necessarily, $\delta_G(q_1, a^{j'-1}) \ne \emptyset$ and $\delta_G(q_2, a^{j'-1}) \ne \emptyset$. Thus $G$ is not $j'$-lookahead deterministic.

Corollary 1.
Among the orbits of $G$, at most one is non-trivial.

Furthermore, let us suppose that $G$ has no non-trivial orbit; then $|L(G)| < \infty$. But since for every $j' \in \mathbb{N} \setminus \{0\}$ we have $|L(A_{j'})| = \infty$, $G$ could not recognize $L(A_{j'})$. Thus, $G$ must have a single non-trivial orbit, denoted by $O$, of size $l_O = |O|$. Moreover, the gates of $O$ are remarkable:

Proposition 7. $O$ has a single in-gate and a single out-gate.

Proof. The fact that there exists a single in-gate is also a direct consequence of Proposition 6. Furthermore, let us suppose that there exist 2 distinct states $g_1, g_2 \in G_{out}(O)$. Let $o_{in} \in G_{in}(O)$ be the single in-gate of $O$. Since $G$ is a Glushkov automaton, following Lemma 5, $o_{in} \in \delta_G(g_1, a) \cap \delta_G(g_2, a)$. Consequently, there exists $s \in O$ such that $|\delta_G(s, a)| > 1$, contradicting Proposition 6.

Consequently, we denote by $o_{in}$ the single in-gate of $O$ and by $o_{out}$ its single out-gate.

As a corollary of Proposition 6, and since Glushkov automata are standard, there exists a single state $s \in Q_G \setminus O$ such that $o_{in} \in \delta_G(s, a)$, which can be reached from the initial state by a single word $w_s$ such that $|w_s| = m$. This allows us to characterize the words reaching $o_{out}$ from $i$.

Lemma 6. $\forall w \in \{a\}^*$, $o_{out} \in \delta_G(i, w) \iff \exists k \in \mathbb{N} \setminus \{0\}$, $|w| = m + k \times l_O$.

Proof. Following Proposition 6, we can deduce: $\forall o \in O$, $|\delta_G(o, a) \cap O| = 1$. Thus, $\forall o \in O$, $o \in \delta_G(o, w) \iff \exists k \in \mathbb{N}$, $|w| = k \times l_O$. And following Lemma 5, $o_{in} \in \delta_G(o_{out}, a)$ and thus $o_{out} \in \delta_G(o_{in}, w) \iff \exists k \in \mathbb{N}$, $|w| = l_O - 1 + k \times l_O$. Since $o_{in} \in \delta_G(s, a)$ and $s \in \delta_G(i, w) \iff |w| = m$, then $o_{out} \in \delta_G(i, w) \iff \exists k \in \mathbb{N}$, $|w| = m + 1 + l_O - 1 + k \times l_O$.

Moreover, we give a necessary condition over words in $L(A_{j'})$.

Lemma 7.
Let $w_1, w_2, w_3 \in L(A_{j'})$ such that $|w_1| < |w_2| < |w_3|$ and for every $w \in L(A_{j'})$ such that $w \ne w_2$, either $|w| \le |w_1|$ or $|w| \ge |w_3|$. Then, either $|w_3| - |w_1| = j' + 1$ and $|w_2| - |w_1| = j'$, or $|w_3| - |w_2| = j'$ and $|w_3| - |w_1| = j' + 1$.

Proof. Since $L(A_{j'}) = L((a^{j'+1})^* \cdot (\varepsilon + a^{j'})) = \{\varepsilon, a^{j'}, a^{j'+1}, a^{2j'+1}, a^{2j'+2}, \cdots\}$, then either $|w_3| - |w_1| = j' + 1$ and $|w_2| - |w_1| = j'$, or $|w_3| - |w_2| = j'$ and $|w_3| - |w_1| = j' + 1$.

From the two previous lemmas, we show that $G$ cannot recognize $L(A_{j'})$.

Proposition 8. $L(G) \ne L(A_{j'})$.

Proof. Let $Q_o = \delta(o_{out}, a) \setminus O$ be the set of direct successors of $o_{out}$ outside of $O$, and $L_o = \bigcup_{q \in Q_o} L_q(G)$ be the union of their right languages. Then, since $G$ is $j'$-lookahead deterministic, the length of any word of $L_o$ is strictly smaller than $j' - 1$.

Let us consider the set $L_{out}$ of words reaching a final state from $o_{out}$ without going through $O$. By definition,

$$L_{out} = \begin{cases} \{a\} \cdot L_o \cup \{\varepsilon\} & \text{if } o_{out} \in F, \\ \{a\} \cdot L_o & \text{otherwise.} \end{cases}$$

Then, the length of any word $w \in L_{out}$ is strictly smaller than $j'$. If $L_{out} = \emptyset$, then $|L(G)| < \infty$ and $L(G) \ne L(A_{j'})$.

Now, let us suppose that there exist two distinct words $w_{o_1}, w_{o_2} \in L_{out}$ such that $|w_{o_1}| > |w_{o_2}|$; then $|w_{o_1}| - |w_{o_2}| < j'$. Moreover, there exists a word $w \in \{a\}^*$ such that $o_{out} \in \delta(i, w)$. Thus, $\{w w_{o_1}, w w_{o_2}\} \subset L(G)$ and $|w w_{o_1}| - |w w_{o_2}| < j'$. Following Lemma 7, $L(G) \ne L(A_{j'})$.

Now, let us suppose that $L_{out} = \{w_o\}$. Since $O$ is the single non-trivial orbit and $o_{out}$ is its single out-gate, there exists $n \in \mathbb{N}$ such that for every word $w_G \in L(G)$, if $|w_G| \ge n$ then $w_G = w_p w_o$ with $o_{out} \in \delta_G(i, w_p)$. Let $w_{G_1}, w_{G_2}, w_{G_3} \in L(G)$ such that $n \le |w_{G_1}| < |w_{G_2}| < |w_{G_3}|$ and for every $w_G \in L(G)$ such that $w_G \ne w_{G_2}$, either $|w_G| \le |w_{G_1}|$ or $|w_G| \ge |w_{G_3}|$. Then, following Lemma 6, $|w_{G_2}| - |w_{G_1}| = |w_{G_3}| - |w_{G_2}| = l_O$, which means that $L(G) \ne L(A_{j'})$.

Thus, $L(A_{j'})$ cannot be recognized by a $j'$-lookahead deterministic Glushkov automaton. Consequently:

Proposition 9. $\forall j \in \mathbb{N} \setminus \{0\}$, $L(A_j)$ is not $j$-lookahead deterministic.

We can conclude that:
Theorem 4.
There is a proper hierarchy in $k$-lookahead deterministic regular languages.

Han and Wood stated in [6] that block deterministic languages are a proper subfamily of lookahead deterministic languages. However, their proof relies on a statement made by Giammarresi et al. in [4], namely that the languages $L((a+b)^* a (a+b)^{k-1})$ are not $k$-block deterministic. But we proved that Lemma 4, which is used as a basis for deciding whether a language is block deterministic, is wrong. So we cannot be sure that their example is not block deterministic, and we give our own proof.

We start by presenting some properties of block regular expressions and regular expressions over the language of their marked expressions.
Lemma 8.
A block regular expression $E_b$ is $k$-block deterministic if and only if: $\forall u, v, w \in \Pi_{E_b}^*$, $\forall b_1, b_2 \in \Pi_{E_b}$, $(u b_1 v, u b_2 w \in L(E_b^\sharp) \wedge b_1 \ne b_2) \implies b_1^\natural \notin \mathrm{Pref}(b_2^\natural)$.

Proof. Giammarresi et al. in [4] defined a block regular expression $E_b$ as being block deterministic if the following two conditions hold:
– $\forall b_1, b_2 \in \mathrm{First}(E_b^\sharp)$, $b_1 \ne b_2 \implies b_1^\natural \notin \mathrm{Pref}(b_2^\natural)$
– $\forall x \in \Pi_{E_b}$, $\forall b_1, b_2 \in \mathrm{Follow}(E_b^\sharp, x)$, $b_1 \ne b_2 \implies b_1^\natural \notin \mathrm{Pref}(b_2^\natural)$.
And we can deduce from it the aforementioned property.

Lemma 9 ([6]).
A regular expression $E$ is $k$-lookahead deterministic if and only if: $\forall u, v, w \in \Pi_E^*$, $\forall x, y \in \Pi_E^k$, $(uxv, uyw \in L(E^\sharp) \wedge x(1) \ne y(1)) \implies x^\natural \ne y^\natural$.

Let $E_b$ be a $(\Sigma, k)$-block regular expression over an alphabet $\Gamma$ and $E_b^\sharp$ its marked block regular expression over the alphabet $\Pi_{E_b}$. Let $\Omega = \{a_{i,j} \mid \exists [w]_i \in \Pi_{E_b}, w[j] = a\}$ be an alphabet, and $\varphi : \Pi_{E_b} \to B_{\Omega,k}$ a function such that for every $[w]_i \in \Pi_{E_b}$, $\varphi([w]_i) = w[1]_{i,1} \cdot w[2]_{i,2} \cdots w[|w|]_{i,|w|}$.

Example 5. $\varphi([aba]_1) = a_{1,1} b_{1,2} a_{1,3}$.

Every symbol of $\Omega$ is linked to only one block position of $\Pi_{E_b}$ and represents the position of a letter inside that block position. Since every block of $\Pi_{E_b}$ is indexed differently, every element of $\Omega$ is produced by only one block. We define the following functions for every $a_{i,j} \in \Omega$:
– $\mathrm{Pos}_{E_b} : \Omega \to \Pi_{E_b}$ with $\mathrm{Pos}_{E_b}(a_{i,j}) = [w]_i$
– $\mathrm{BlockPos}_{E_b} : \Omega \to \mathbb{N} \setminus \{0\}$ with $\mathrm{BlockPos}_{E_b}(a_{i,j}) = j$
– $\mathrm{BlockLength}_{E_b} : \Omega \to \mathbb{N} \setminus \{0\}$ with $\mathrm{BlockLength}_{E_b}(a_{i,j}) = |(\mathrm{Pos}_{E_b}(a_{i,j}))^\natural|$

A word $w \in \Omega^*$ is simple block complete if there exists $x \in \Pi_{E_b}$ such that $\varphi(x) = w$. The set of simple block complete words over $\Omega$ is denoted by $SBC(\Omega)$. Let $w \in (SBC(\Omega))^*$; then $w$ is block complete, and the set of block complete words is denoted by $BC(\Omega)$. Then, we can define $\varphi$ as a bijection between $\Pi_{E_b}$ and $SBC(\Omega)$, and extend it by morphism between $\Pi_{E_b}^*$ and $BC(\Omega)$.

Let $\chi$ be the function which takes a marked block regular expression $E_b^\sharp$ and transforms it into a marked regular expression $E^\sharp$ over the alphabet $\Omega$ such that, for every subexpression $F = x \in \Pi_{E_b}$, $\chi(F) = \varphi(x)[1] \cdot \varphi(x)[2] \cdots \varphi(x)[|\varphi(x)|]$. In this way, we extend the definition of marked regular expression to regular expressions over an alphabet indexed by any number of elements, and whose symbols appear only once in the regular expression.

Example 6.
Let $E_b = ([aba] + [abb])^* [aa]$; then $E_b^\sharp = ([aba]_1 + [abb]_2)^* [aa]_3$ and $E^\sharp = \chi(E_b^\sharp) = (a_{1,1} b_{1,2} a_{1,3} + a_{2,1} b_{2,2} b_{2,3})^* a_{3,1} a_{3,2}$.

In the rest of this section, we consider the following expressions: $E_b$ a $(\Sigma, k)$-block regular expression, its marked block regular expression $E_b^\sharp$ over $\Pi_{E_b}$, $E^\sharp = \chi(E_b^\sharp)$ a marked regular expression over $\Omega$ and $E = (E^\sharp)^\natural$ a regular expression over $\Sigma$.

We can deduce some obvious properties about $E^\sharp$:

$\mathrm{Null}(E^\sharp) = \mathrm{Null}(E_b^\sharp)$ (1)

$\mathrm{First}(E^\sharp) = \{a_{i,1} \in \Omega \mid \mathrm{Pos}_{E_b}(a_{i,1}) \in \mathrm{First}(E_b^\sharp)\}$ (2)

$\mathrm{Last}(E^\sharp) = \{a_{i,j} \in \Omega \mid \mathrm{Pos}_{E_b}(a_{i,j}) \in \mathrm{Last}(E_b^\sharp) \wedge j = \mathrm{BlockLength}_{E_b}(a_{i,j})\}$ (3)

$\mathrm{Follow}(E^\sharp, a_{i,j}) = \begin{cases} \{b_{i,j+1} \in \Omega\} & \text{if } j < \mathrm{BlockLength}_{E_b}(a_{i,j}), \\ \{b_{i',1} \in \Omega \mid \mathrm{Pos}_{E_b}(b_{i',1}) \in \mathrm{Follow}(E_b^\sharp, \mathrm{Pos}_{E_b}(a_{i,j}))\} & \text{otherwise} \end{cases}$ (4)

We can also deduce some properties about block complete words: for every $w \in \Omega^*$, $w$ is block complete if $w = \varepsilon$ or if it satisfies all of the following conditions:

$\mathrm{BlockPos}_{E_b}(w[1]) = 1$ (5)

$\mathrm{BlockPos}_{E_b}(w[|w|]) = \mathrm{BlockLength}_{E_b}(w[|w|])$ (6)

$\forall i \in [1, |w|[$, $\mathrm{BlockPos}_{E_b}(w[i+1]) = \mathrm{BlockPos}_{E_b}(w[i]) + 1 \implies \mathrm{Pos}_{E_b}(w[i+1]) = \mathrm{Pos}_{E_b}(w[i])$ (7)

$\forall i \in [1, |w|[$, $\mathrm{BlockPos}_{E_b}(w[i+1]) \ne \mathrm{BlockPos}_{E_b}(w[i]) + 1 \implies \mathrm{BlockPos}_{E_b}(w[i]) = \mathrm{BlockLength}_{E_b}(w[i]) \wedge \mathrm{BlockPos}_{E_b}(w[i+1]) = 1$ (8)

Let us show that the previous statements induce important conditions concerning $L(E^\sharp)$.

Proposition 10.
Every word $w \in L(E^\sharp)$ is block complete.

Proof. Let $w \in L(E^\sharp)$. If $w = \varepsilon$, then $w$ is block complete. Now let us suppose that $w \ne \varepsilon$. Following property (2), $w$ satisfies property (5). Following property (3), $w$ satisfies property (6). And following property (4), $w$ satisfies properties (7) and (8). Therefore $w$ is block complete.

Proposition 11. Let $u, v, w \in \Omega^*$ and $x, y \in \Omega$ such that $uxv, uyw \in L(E^\sharp)$ and $x \ne y$. Then $u, xv, yw \in \mathrm{Subw}(L(E^\sharp)) \cap BC(\Omega)$ and $\mathrm{Pos}_{E_b}(x) \ne \mathrm{Pos}_{E_b}(y)$.

Proof. Since the words $uxv, uyw \in L(E^\sharp)$, then $u, xv, yw \in \mathrm{Subw}(L(E^\sharp))$. Following property (4), $|\mathrm{Follow}(E^\sharp, u[|u|])| > 1$ implies that $\mathrm{BlockPos}_{E_b}(u[|u|]) = \mathrm{BlockLength}_{E_b}(u[|u|])$ and $\mathrm{BlockPos}_{E_b}(x) = \mathrm{BlockPos}_{E_b}(y) = 1$. Then, using Proposition 10, we can conclude that $u, xv, yw \in BC(\Omega)$. Moreover, since $\mathrm{BlockPos}_{E_b}(x) = \mathrm{BlockPos}_{E_b}(y) = 1$ and $x \ne y$, then $\mathrm{Pos}_{E_b}(x) \ne \mathrm{Pos}_{E_b}(y)$.

Proposition 12. $\varphi$ is a bijection between $L(E_b^\sharp)$ and $L(E^\sharp)$.

Proof. Following property (1), $\varepsilon \in L(E^\sharp) \iff \varepsilon \in L(E_b^\sharp)$. Since $\varphi(\varepsilon) = \varepsilon$, then $\varepsilon \in L(E^\sharp) \iff \varphi(\varepsilon) \in L(E_b^\sharp)$.

Let $w_B \in L(E_b^\sharp)$ such that $w_B \ne \varepsilon$. According to the definition of the function $\chi$, $\varphi(w_B) \in L(E^\sharp)$.

Let $w \in L(E^\sharp)$ such that $w \ne \varepsilon$. Following Proposition 10, $w$ is block complete. Let us set $w = w_1 \cdot w_2 \cdots w_n$ such that for every $i \in [1, n]$, $w_i \in SBC(\Omega)$. Then:
– $w_1[1] \in \mathrm{First}(E^\sharp)$
– $w_n[|w_n|] \in \mathrm{Last}(E^\sharp)$
– $\forall i \in [1, n[$, $w_{i+1}[1] \in \mathrm{Follow}(E^\sharp, w_i[|w_i|])$.
This means that:
– $\varphi^{-1}(w_1) \in \mathrm{First}(E_b^\sharp)$ (following property (2))
– $\varphi^{-1}(w_n) \in \mathrm{Last}(E_b^\sharp)$ (following property (3))
– $\forall i \in [1, n[$, $\varphi^{-1}(w_{i+1}) \in \mathrm{Follow}(E_b^\sharp, \varphi^{-1}(w_i))$ (following property (4)).
Therefore, $\varphi^{-1}(w) = \varphi^{-1}(w_1) \cdot \varphi^{-1}(w_2) \cdots \varphi^{-1}(w_n) \in L(E_b^\sharp)$.
This demonstrates that $\varphi$ is also a bijection between $L(E_b^\sharp)$ and $L(E^\sharp)$.

In the same way, we can also prove that $\varphi$ is a bijection between $\mathrm{Subw}(L(E_b^\sharp))$ and $\mathrm{Subw}(L(E^\sharp)) \cap BC(\Omega)$.

Proposition 13. $L(E_b) = L(E)$.

Proof. Let $u \in \Gamma^*$, $w \in \Omega^*$, $w_B \in \Pi_{E_b}^*$ such that $u = w_B^\natural$ and $w = \varphi(w_B)$. According to the definition of the bijection $\varphi$, for every $w_B \in \Pi_{E_b}$, $w_B^\natural = (\varphi(w_B))^\natural$. And since $\Gamma^* \subset \Sigma^*$, then: $u \in L(E_b) \iff w_B \in L(E_b^\sharp) \iff \varphi(w_B) = w \in L(E^\sharp) \iff w^\natural = (\varphi(w_B))^\natural = w_B^\natural = u \in L(E)$.

Theorem 5. If $E_b$ is $k$-block deterministic, then $E$ is $k$-lookahead deterministic.

Proof. Let us suppose that $E$ is not $k$-lookahead deterministic. Then, there exist $u, v, w \in \Omega^*$ and $l_1, l_2 \in \Omega^k$ such that $u l_1 v, u l_2 w \in L(E^\sharp)$, $l_1[1] \ne l_2[1]$ and $(l_1)^\natural = (l_2)^\natural$. Since $u l_1 v, u l_2 w \in L(E^\sharp)$ then, following Proposition 11, $u, l_1 v, l_2 w \in \mathrm{Subw}(L(E^\sharp)) \cap BC(\Omega)$ and $\mathrm{Pos}_{E_b}(l_1[1]) \ne \mathrm{Pos}_{E_b}(l_2[1])$. Let $u_B, v_B, w_B \in \Pi_{E_b}^*$ and $b_1, b_2 \in \Pi_{E_b}$ such that $u_B = \varphi^{-1}(u)$, $b_1 v_B = \varphi^{-1}(l_1 v)$, $b_2 w_B = \varphi^{-1}(l_2 w)$. Then $u_B b_1 v_B, u_B b_2 w_B \in L(E_b^\sharp)$. Since $\mathrm{Pos}_{E_b}(l_1[1]) \ne \mathrm{Pos}_{E_b}(l_2[1])$, then $b_1 \ne b_2$. And since $b_1 v_B = \varphi^{-1}(l_1 v)$ and $b_2 w_B = \varphi^{-1}(l_2 w)$, then $(b_1 v_B)^\natural = (l_1 v)^\natural$ and $(b_2 w_B)^\natural = (l_2 w)^\natural$. But since $E_b$ is $k$-block deterministic, then $|b_1^\natural| \le k$ and $|b_2^\natural| \le k$. So $b_1^\natural \in \mathrm{Pref}(l_1^\natural)$ and $b_2^\natural \in \mathrm{Pref}(l_2^\natural)$. But since $l_1^\natural = l_2^\natural$, then either $b_1^\natural \in \mathrm{Pref}(b_2^\natural)$, or $b_2^\natural \in \mathrm{Pref}(b_1^\natural)$. Thus $E_b$ cannot be $k$-block deterministic. Therefore, if $E_b$ is $k$-block deterministic, then $E$ is indeed $k$-lookahead deterministic.

To conclude, following Theorem 5 and Proposition 13:

Theorem 6.
The family of $k$-block deterministic languages is included in the family of $k$-lookahead deterministic languages.

6.2 Block Deterministic Languages as a Proper Subfamily of Lookahead Deterministic Languages

In this section, we show that the family of block deterministic languages is strictly included in the one of lookahead deterministic languages. We first show that unary $k$-block deterministic languages are one-unambiguous, and then conclude using the parameterized family provided in Section 5.

Proposition 14.
Let $E$ be a regular expression over a unary alphabet $\{a\}$. If $E$ is $k$-block deterministic, then $|\mathrm{First}(E^\sharp)| \le 1$ and $\forall x \in \Pi_E$, $|\mathrm{Follow}(E^\sharp, x)| \le 1$.

Proof. Let us suppose that $E$ is $k$-block deterministic. If $|\mathrm{First}(E^\sharp)| > 1$, then there exist $b_1, b_2 \in \mathrm{First}(E^\sharp)$ such that $b_1 \ne b_2$. And since $E$ is defined over a unary alphabet, then necessarily either $(b_1)^\natural \in \mathrm{Pref}((b_2)^\natural)$, or $(b_2)^\natural \in \mathrm{Pref}((b_1)^\natural)$, which means that $E$ is not $k$-block deterministic. Therefore $|\mathrm{First}(E^\sharp)| \le 1$, and the same reasoning can be applied to $\mathrm{Follow}(E^\sharp, x)$.

Lemma 10.
Let $\Sigma$ be an alphabet and $E_b$ be a $k$-block deterministic regular expression over an alphabet $\Gamma \subset B_{\Sigma,k}$. If $\forall x \in \Pi_{E_b}$, $|\mathrm{Follow}(E_b^\sharp, x)| \le 1$ and $|\mathrm{First}(E_b^\sharp)| \le 1$, then $L(E_b)$ is $1$-block deterministic.

Proof. Let $E^\sharp = \chi(E_b^\sharp)$ be a marked regular expression and $E = (E^\sharp)^\natural$ a regular expression over $\Sigma$. Following Proposition 13, $L(E_b) = L(E)$. Now, following property (2), $|\mathrm{First}(E_b^\sharp)| \le 1$ implies that $|\mathrm{First}(E^\sharp)| \le 1$. And following property (4), $\forall x \in \Pi_{E_b}$, $|\mathrm{Follow}(E_b^\sharp, x)| \le 1$ implies that $\forall y \in \Pi_E$, $|\mathrm{Follow}(E^\sharp, y)| \le 1$. Therefore, following the construction of Glushkov automata in Definition 1, $G_E$ is deterministic and $E$ is one-unambiguous (that is to say $1$-block deterministic).

Consequently:

Theorem 7.
If a unary language is block deterministic, then it is $1$-block deterministic.

Therefore:
Proposition 15.
For any $k \in \mathbb{N} \setminus \{0, 1\}$, there exist unary languages which are $k$-lookahead deterministic without being block deterministic.

Proof. In Section 5, we showed that for any $j \in \mathbb{N} \setminus \{0\}$, $L(A_j)$ is $(j+1)$-lookahead deterministic without being $j$-lookahead deterministic. And since they are not $1$-lookahead deterministic (that is to say one-unambiguous), they are not block deterministic for any $k$. Thus, for any $k \in \mathbb{N} \setminus \{0\}$, $L(A_k)$ is $(k+1)$-lookahead deterministic without being block deterministic.

Finally:

Theorem 8. $\forall k \in \mathbb{N} \setminus \{0\}$, the family of $k$-block deterministic languages is strictly included in the one of $k$-lookahead deterministic languages.

In this paper, we demonstrated that despite some erroneous results, there exists an infinite hierarchy in block deterministic languages. We showed that such an infinite hierarchy also exists in lookahead deterministic languages. And finally, showing that block deterministic unary languages are one-unambiguous, we gave our own proof that the family of block deterministic languages is strictly included in the family of lookahead deterministic languages.

From these results, one can wonder whether there exists a $k$-lookahead deterministic language which is $(k+1)$-block deterministic without being $k$-block deterministic. Another open problem is the decidability of the lookahead determinism of a language. Finally, the decidability of the block determinism of a language has been studied by Giammarresi et al., but the proof relies on Lemma 4, which we invalidated. Thus, this problem is still open.

References
1. Anne Brüggemann-Klein and Derick Wood. One-unambiguous regular languages. Inf. Comput., 140(2):229–253, 1998.
2. P. Caron and D. Ziadi. Characterization of Glushkov automata. Theoret. Comput. Sci., 233(1–2):75–90, 2000.
3. Pascal Caron, Marianne Flouret, and Ludovic Mignot. (k, l)-unambiguity and quasi-deterministic structures: An alternative for the determinization. In LATA, pages 260–272, 2014.
4. Dora Giammarresi, Rosa Montalbano, and Derick Wood. Block-deterministic regular languages. In Theoretical Computer Science, 7th Italian Conference, ICTCS 2001, Torino, Italy, October 4-6, 2001, Proceedings, pages 184–196, 2001.
5. V. M. Glushkov. The abstract theory of automata. Russian Mathematical Surveys, 16:1–53, 1961.
6. Yo-Sub Han and Derick Wood. Generalizations of 1-deterministic regular languages. Inf. Comput., 206(9-10):1117–1125, 2008.
7. J. E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In Z. Kohavi, editor, The theory of machines and computations, pages 189–196. Academic Press, New York, 1971.
8. S. Kleene. Representation of events in nerve nets and finite automata. Automata Studies, Ann. Math. Studies 34:3–41, 1956. Princeton U. Press.
9. E. F. Moore. Gedanken experiments on sequential machines. In Automata studies, pages 129–153. Princeton Univ. Press, Princeton, N.J., 1956.
10. M. O. Rabin and D. Scott. Finite automata and their decision problems.