Restarting Automata with Auxiliary Symbols and Small Lookahead
Natalie Schluter
IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S., Denmark
Abstract
We present a study on lookahead hierarchies for restarting automata with auxiliary symbols and small lookahead. In particular, we show that there are just two different classes of languages recognised by RRWW automata, through the restriction of lookahead size. We also show that the respective (left-)monotone restarting automaton models characterise the context-free languages and that the respective right-left-monotone restarting automata characterise the linear languages, both with just lookahead length 2.
Restarting automata work in phases of scanning their input from the left end marker towards the right end marker, rewriting the lookahead contents with a shorter substring once per phase, and then restarting at some point before or at the right end marker. They were introduced to model the analysis by reduction grammar verification technique in the analysis of sentences in free-word order natural language. It has been shown that through various restrictions on the model, an important number of traditional and new formal language classes may be defined. The study of restarting automata has therefore become important both for its original intent of computational linguistic application development and as an alternative machine model for investigating properties of traditional and newly distinguished formal language classes.

In his study of lookahead hierarchies, Mraz [3] showed that the expressive power of restarting automata without auxiliary symbols increases with the size of the lookahead. Schluter [6] later showed that for deterministic monotone and monotone restarting automata with auxiliary symbols, separation of rewrite and restart step is not a significant restriction on expressive power for any fixed lookahead size k ≥ 3, and that for the deterministic model, the difference in power of the models can be overcome by approximately doubling the lookahead size, when k ≥ 3. In both studies, it was remarked that lookahead hierarchies collapse for (left-)mon-RWW and (left-)mon-RRWW automata to k = 3.

This paper presents a study on lookahead hierarchies for small lookahead sizes. In particular, we show that there are only two different classes of languages recognised by RRWW automata, through restrictions on lookahead size. We also partially improve a result from [6] and [3], by showing that the respective monotone and left-monotone restarting automaton models characterise the context-free languages with only lookahead size 2. We also establish a corresponding result for the characterisation of the linear languages by the respective right-left-monotone restarting automata with lookahead size 2.

Following the definition of restarting automata and presentation of some useful properties in Section 2, we present our main results in Section 3.

This is the full version of the paper accepted at LATA 2011.

Some notation. We refer to the i-th symbol of a string x as x[i], and its substring from the i-th to j-th symbols as x[i, j]. When we want to make the length of a string v such that |v| = k explicit, we may refer to v as v[1, k]. For i, j ∈ N, with i < j, [i, j] alone denotes the set {i, ..., j}. If i = 1, we say [j] := [1, j]. If S is a set of symbols, then by S^i we denote the set of strings of length i ∈ N with symbols from S. Also, λ denotes the empty string, so that S^0 = {λ}. Finally, REG, LIN and CFL denote the classes of regular, linear, and context-free languages respectively.

A restarting automaton or RRWW-automaton, M = (Q, Σ, Γ, ¢, $, q_0, k, δ), is a nondeterministic machine model with a finite control unit and a lookahead (or read/write) window of size k (including the symbol under its scanning head, which is the first symbol of the lookahead contents) that works on a list of symbols delimited by end markers (or sentinels) {¢, $}, where ¢ is the left sentinel and $ is the right sentinel.
Σ is the input alphabet and Γ ⊇ Σ the work tape alphabet. The symbols Γ − Σ are called auxiliary symbols. Q is the finite set of states and q_0 ∈ Q is the initial state. M's transition relation, δ, describes four types of transition steps (or instructions), where u is the contents of the lookahead.
(1) A move-right step is of the form q′ ∈ δ(q, u), where q, q′ ∈ Q. This means that M advances one tape square to the right and enters state q′ upon reading u.
(2) A rewrite step is of the form (q′, REWRITE(v)) ∈ δ(q, u), where q, q′ ∈ Q, and v is such that |v| < |u| (u, v ∈ Γ*). This means that M replaces its window contents u with v, advances to the tape square directly to the right of v, and enters state q′. In this rewrite instruction, we will refer to u as the redex and v as the reduct.
(3) A restart step is of the form RESTART ∈ δ(q, u), where q ∈ Q, in which M moves its read/write window to the beginning of the input and enters the initial state.
(4) An accept step is of the form ACCEPT ∈ δ(q, u), in which M halts and accepts. (This may also be viewed as the accept state.)
If δ(q, u) = ∅, in which case we say that δ is undefined, M halts and rejects; we could exclude this possibility through the use of a model with both accept and reject states, in which case all possibilities for δ are defined. If |δ(q, u)| ≤ 1 for all q, u, then the restarting automaton is deterministic.

A configuration of M is uqv, where u ∈ {λ} ∪ {¢}·Γ* is the contents of the worktape from the left sentinel to the position of the head, q ∈ Q is the current state and v ∈ {¢, λ}·Γ*·{$, λ} is the contents of the worktape from the current first symbol under the scanning head to the right sentinel, and uv is the current contents of the worktape. The head scans the first k symbols of v (or all of v when |v| ≤ k). A restarting configuration, for a word w ∈ Γ*, is of the form q_0¢w$. If w ∈ Σ*, q_0¢w$ is an initial configuration.
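The four step types can be made concrete with a small search over configurations. The following is only a minimal sketch under stated assumptions, not a construction from the paper: the dictionary encoding of δ, the action tags MVR, REWRITE, RESTART and ACCEPT, the function name accepts, and the example automaton (which deletes "aa" once per cycle, with window size 2) are all hypothetical. Symbols are single characters, and the sketch enforces one rewrite per phase and permits a restart only after a rewrite.

```python
from collections import deque

def accepts(delta, q0, k, word):
    """Breadth-first search over all computations of a restarting automaton.

    delta maps (state, window) to a set of actions (hypothetical encoding):
      ("MVR", q')          move right and enter q'
      ("REWRITE", v, q')   replace the window by the shorter string v, enter q'
      ("RESTART",)         return to the left sentinel in state q0
      ("ACCEPT",)          halt and accept
    """
    start = ("¢" + word + "$", 0, q0, False)  # (tape, head, state, rewritten?)
    seen, queue = {start}, deque([start])
    while queue:
        tape, pos, state, rewritten = queue.popleft()
        window = tape[pos:pos + k]            # all of the rest when shorter than k
        if not window:
            continue                          # head ran past the right sentinel
        for action in delta.get((state, window), ()):
            kind = action[0]
            if kind == "ACCEPT":
                return True
            if kind == "MVR":
                nxt = (tape, pos + 1, action[1], rewritten)
            elif kind == "REWRITE" and not rewritten:
                v, q = action[1], action[2]
                assert len(v) < len(window)   # the reduct must be shorter
                new_tape = tape[:pos] + v + tape[pos + len(window):]
                nxt = (new_tape, pos + len(v), q, True)
            elif kind == "RESTART" and rewritten:
                nxt = (tape, 0, q0, False)
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False                              # delta undefined everywhere: reject

# Hypothetical example: accepts a^n for even n by deleting "aa" once per cycle.
delta = {
    ("q0", "¢$"): {("ACCEPT",)},
    ("q0", "¢a"): {("MVR", "q1")},
    ("q1", "aa"): {("REWRITE", "", "q2")},
    ("q2", "aa"): {("MVR", "q2")},
    ("q2", "a$"): {("MVR", "q2")},
    ("q2", "$"): {("RESTART",)},
}
```

For instance, accepts(delta, "q0", 2, "aaaa") explores two cycles followed by an accepting tail phase, while on "aaa" the automaton gets stuck in state q1 with window "a$", where δ is undefined.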
An accepting configuration is a configuration with an accepting state. A computation of M for an input word w ∈ Σ* is a sequence of configurations starting with an initial configuration, where two consecutive configurations are in the relation ⊢_M induced by a finite set of instructions of one of the above mentioned types. The transitive closure of ⊢_M is denoted ⊢*_M.

A phase of a computation begins with a restarting configuration and (exclusively) either (1) ends with the next encountered restarting configuration, in which case it includes exactly one rewrite step and is called a cycle, or (2) halts, in which case it includes at most one rewrite step and is called a tail phase. We refer to segments of a computation within a single phase before (resp. after) a rewrite as left (resp. right) computation.

An input word w is accepted or recognised by M if there is a computation which starts on the initial configuration and finishes in an accepting configuration. Also, we define L(M) as the language recognised by M.

Consider a cycle C and say the configuration from which M carries out a rewrite step is uqv in C; we define the right distance of C as D_r(C) := |v| and the left distance as D_l(C) := |u|.

Let C = C_1, C_2, ..., C_n be a sequence of cycles of a restarting automaton M that, together with possibly a (final) tail phase, are M's computation on some input. If D_r(C_i) ≥ D_r(C_{i+1}) for all i ∈ [n − 1], then we say that C is right-monotone or simply monotone. Similarly, if D_l(C_i) ≥ D_l(C_{i+1}) for all i ∈ [n − 1], then we say that C is left-monotone. If C is both right- and left-monotone, then we say that C is right-left-monotone. If all the sequences of cycles corresponding to computations of a restarting automaton M are monotone (respectively left-monotone, right-left-monotone) then we say that M is monotone (respectively left-monotone, right-left-monotone).
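The distance-based definitions above can be checked mechanically for a given sequence of cycles. The sketch below is illustrative only: it assumes each cycle is summarised by the pair (u, v) taken from the configuration uqv in which the rewrite is carried out, with one character per tape symbol, and the names distances, non_increasing and classify are hypothetical.

```python
def distances(rewrite_configs):
    """Map each cycle's rewrite configuration (u, v) to (D_l, D_r) = (|u|, |v|)."""
    return [(len(u), len(v)) for u, v in rewrite_configs]

def non_increasing(values):
    """True if every value is at least as large as its successor."""
    return all(a >= b for a, b in zip(values, values[1:]))

def classify(rewrite_configs):
    """Report which monotonicity notions a sequence of cycles satisfies."""
    d = distances(rewrite_configs)
    right = non_increasing([d_r for _, d_r in d])
    left = non_increasing([d_l for d_l, _ in d])
    return {
        "monotone": right,
        "left-monotone": left,
        "right-left-monotone": right and left,
    }

# Rewrites that stay in the middle of the tape: both distances shrink.
middle = classify([("¢a", "abb$"), ("¢", "ab$")])

# A second rewrite further to the left: the right distance grows from 3 to 5.
jump_left = classify([("¢aaba", "ab$"), ("¢a", "abab$")])
```

Here middle reports all three properties, whereas jump_left is left-monotone but not monotone.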
We denote the class of monotone RRWW-automata (respectively left-monotone or right-left-monotone RRWW-automata) by mon-RRWW (respectively left-mon-RRWW or right-left-mon-RRWW).

Through restrictions on the restarting automaton model, we obtain many types of restarting automata. For instance, RRW-automata are RRWW-automata with no auxiliary symbols (Γ = Σ). An RR-automaton is an RRW-automaton with rewrite instructions that can only delete symbols. An RWW-automaton is an RRWW-automaton which restarts immediately after any rewrite instruction, and an RW-automaton is an RRW-automaton that restarts immediately after any rewrite instruction. Finally, an R-automaton is an RR-automaton that restarts after any rewrite instruction.

When the rewrite and restart steps are not separated, instead of items (2) and (3) in the description of δ above, we have simply the following type of instruction.
(2/3) A rewrite step (which is combined with restarting) is of the form REWRITE(v) ∈ δ(q, u), where q ∈ Q, and v is such that |v| < |u| (u, v ∈ Γ*). This means that M replaces its window contents u with v and then moves its read/write window to the beginning of the input and enters the initial state.
All notions of monotonicity and determinism and corresponding notation extend to these more restrictive versions in the obvious way.

An X automaton, X ∈ {R, RR, RW, RWW, RRW, RRWW}, with lookahead size k, will be denoted by X(k). For example, an RRWW(k) automaton is an RRWW automaton with lookahead size k.

Niemann and Otto [4] describe the behaviour of a non-deterministic restarting automaton M by means of a finite set of meta-instructions of the form (E_1, u → v, E_2) (called cycle meta-instructions) and (E, ACCEPT) (called tail meta-instructions).
In these meta-instructions, E_1, E_2, and E are regular languages, which are called the regular constraints of the meta-instruction, and u and v are strings such that u → v stands for a rewrite step of M, where u is the redex and v is the reduct. These meta-instructions are applied as follows. In a restarting configuration q_0¢w$, M nondeterministically chooses a meta-instruction, say (E_1, u → v, E_2). Now, if w does not admit a factorisation of the form w = w_1 u w_2 such that ¢w_1 ∈ E_1 and w_2$ ∈ E_2, then M halts and rejects. Otherwise, one such factorisation is chosen nondeterministically, and q_0¢w$ is transformed into the restarting configuration q_0¢w_1 v w_2$. If (E, ACCEPT) is chosen, then M halts and accepts if ¢w$ ∈ E; otherwise, M halts and rejects. Similarly, the behaviour of an RWW-automaton M can be described through a finite sequence of meta-instructions of the form (E, u → v) and (E, ACCEPT).

This section presents four basic lemmata used in the proofs of the main results in Section 3. The correctness preserving property is a fundamental property of restarting automata.
Proposition 1 (Correctness Preserving Property [5]). Let M be a restarting automaton, and u, v be arbitrary input words from Σ*. If u ∈ L(M) and u ⊢*_M v is an initial segment of an accepting computation of M, then v ∈ L(M).

It will be useful to simplify the computations of the restarting automata that we discuss (without reducing their power). The next three lemmata serve this purpose.

A nondeterministic restarting automaton M = (Q, Σ, Γ, ¢, $, q_0, k, δ) is in RR-semidet-form if (1) halting (and restarting for automata with separate rewrite and restart steps) occurs only when the right sentinel is under the lookahead, and (2) move-right steps are deterministic. The following lemma shows that non-deterministic restarting automata with lookahead length k can be assumed w.l.o.g. to be (1) in RR-semidet-form and (2) making move-right steps based only on the first symbol under the lookahead. Lemma 2.
For any X-Y automaton, M_1 = (Q, Σ, Γ, ¢, $, q_0, k, δ), where X ∈ {(right-left-, left-)mon-, λ} and Y ∈ {R, RR, RW, RRW, RWW, RRWW}, there is an X-Y automaton, M_2 = (Q′, Σ, Γ, ¢, $, q_0′, k, δ′), such that
1. M_2 is in RR-semidet-form,
2. M_2 makes move-right steps based on the couple (u[1], q), where u[1] is the first symbol under the lookahead and q is M_2's current state,
and L(M_1) = L(M_2).

Proof. Jančar [1] showed (1). (2) is easily seen by the specification of non-deterministic restarting automata by means of regular constraints. A restarting automaton specified by regular constraints can easily be assured to be in RR-semidet-form. Halting (and restarting for automata with separate rewrite and restart steps) can be made to occur after verification that the tape contents can be factorised according to the selected meta-instruction and once the automaton reaches the right sentinel. Moreover, move-right steps verify membership in a regular language, so not only can these move-right steps be determinised, but they can be determinised based on just the first symbol under the lookahead. Any monotonicity is preserved.

If a restarting automaton M only rewrites when the contents of its lookahead is full, we say that M has fixed rewrite size. Lemma 3.
For any X-Y automaton, M_1, where X ∈ {(right-left-, left-)mon-, λ} and Y ∈ {R, RR, RW, RRW, RWW, RRWW}, there exists an X-Y automaton, M_2, that has fixed rewrite size, such that L(M_1) = L(M_2).

Proof. For the proof, we construct a restarting automaton M_2 from M_1 that never rewrites when its lookahead contains fewer than k symbols (where k is the length of the lookahead), supposing without loss of generality that M_1 is in RR-semidet-form. We describe the case where restart and rewrite steps are separated, the other case being easily understood from this.

M_1's lookahead can only contain fewer than k symbols if it also contains the right sentinel. We rely on a simple speed-up of M_1's steps for the cases (1) where the left sentinel is also contained in the lookahead, (2) of a right computation, or (3) of a tail phase.

Otherwise, M_1 (with transition relation δ_1) has a rewrite of the form (p, REWRITE(v$)) ∈ δ_1(q, u$) where |u$| < k. In this case, we "plug up" the rewrite from the left with all strings α ∈ Γ^{k−|u$|} such that M_1, from state q′, reads αu$ and enters state q with u$ the prefix of its lookahead, giving (p, REWRITE(αv$)) ∈ δ_2(q′, αu$), where δ_2 is M_2's transition relation.

Clearly L(M_1) = L(M_2). Also, monotonicity is clearly preserved.

Footnote (on condition (2) of RR-semidet-form): Here, the decision whether or not to move right remains non-deterministic; however, the decision of which move-right step to carry out becomes deterministic.

Lemma 4.
For any X-Y automaton, M_1, where X ∈ {(right-left-, left-)mon-, λ} and Y ∈ {RWW, RRWW}, with lookahead size k, there exists an X-Y automaton, M_2, with lookahead size k, that reduces its input by only one symbol per cycle, and is such that L(M_1) = L(M_2).

Proof. Let M_1 = (Q, Σ, Γ, ¢, $, q_0, k, δ_1) be an X-RRWW automaton where X ∈ {(right-left-, left-)mon-, λ}, with fixed rewrite size, in RR-semidet-form, and that carries out move-right steps based on only the first symbol under the lookahead. Let B be a symbol not in Γ, which we call the blank symbol. We construct M_2 = (Q ∪ Q̄ ∪ Q̂, Σ, Γ ∪ {B}, ¢, $, q_0, k, δ_2), such that L(M_1) = L(M_2), from M_1.

In what follows, q, q′, p, p′ ∈ Q, u ∈ (Γ ∪ {¢})·Γ^{k−2}·(Γ ∪ {$}), x ∈ (Γ ∪ {B})^{k−2}·(Γ ∪ {B, $}), and x_1 x_2 ∈ (Γ ∪ {B})^{k−2}.

M_2's state set includes M_1's state set (Q), marked states for indicating a guess that there are blank symbols on the tape (in left computations), Q̄ := {q̄ | q ∈ Q}, and hat states for indicating that M_2 is working in a right computation, Q̂ := {q̂ | q ∈ Q}.

In a restarting configuration, M_2 can either rewrite or move right. Say M_2 wants to simulate a move-right step of M_1. M_2 first guesses whether there are any blank symbols currently on its tape. If M_2 guesses that there are blank symbols on its tape, then it will move into a marked state. Otherwise it will remain in a state from Q. So, if q′ ∈ δ_1(q_0, u), then M_2 has both of the following move-right instructions:
q̄′ ∈ δ_2(q_0, u) for guesses that there are blank symbols on the tape, and (1)
q′ ∈ δ_2(q_0, u) for guesses that there are no blank symbols on the tape. (2)
For rewrites, if (p, REWRITE(v)) ∈ δ_1(q, u), then
(p̂, REWRITE(B^{k−1−|v|} v)) ∈ δ_2(q, u). (3)
That is, we pad rewrites of M_1 (from the left) with k − 1 − |v| blank symbols so that the input is reduced by only one symbol for M_2. (Note that if q = q_0, since M_1 has fixed rewrite size, we never pad these lookaheads.)

The state p̂ indicates that M_2 has made a rewrite. There should be no blank symbols for the rest of this cycle (right computation). Therefore if M_2 finds a blank symbol while in a hat state, it rejects:
REJECT ∈ δ_2(q̂, Bx), and REJECT ∈ δ_2(q̂, x_1 B x_2 $).
In subsequent cycles, M_2 will delete the blank symbols introduced, one by one, and immediately restart. Unless M_2 is in a restarting configuration, it can only delete blank symbols if it is in a marked state (i.e., if it guessed that there were blank symbols on the tape at the start of the cycle):
REWRITE(x) ∈ δ_2(q̄, Bx), deletion of blank symbols in a marked state, (4)
REWRITE(x) ∈ δ_2(q_0, Bx), deletion of blank symbols in the start state. (5)
If M_2 reaches the right sentinel in a marked state, and still has no blank symbols under its lookahead, then it rejects (it has verified that its guess about the presence of blank symbols on the tape is incorrect):
REJECT ∈ δ_2(p̄, u[1, k − 1]$), for all u[1, k − 1] ∈ Γ^{k−1}.

We have already defined move-right instructions for M_2 in state q_0. M_2 can simulate M_1's move-right steps with only the first symbol under the lookahead. Therefore we can define the rest of M_2's move-right steps simply as follows, for q′ ∈ δ_1(q, u) and based on just the symbol u[1] of the lookahead (as well as the states q, q′). Here, neither q nor q′ is the restart state. Also, x does not have the right sentinel as a suffix. If M_2 is in a marked state (resp. hat state, state from Q) it remains in a marked state (resp. hat state, state from Q):
q̄′ ∈ δ_2(q̄, u[1]x), q̂′ ∈ δ_2(q̂, u[1]x), and q′ ∈ δ_2(q, u[1]x).
In state q̄ or q̂ and with lookahead contents u, M_2 moves right and rejects (resp. accepts) if in state q, M_1 moves right and rejects (resp. accepts). Also, it is clear that L(M_1) = L(M_2). Moreover, it is easy to see that monotonicity is preserved.

For the remainder of this paper, we will assume w.l.o.g. that all discussed non-deterministic restarting automata with auxiliary symbols (1) are in RR-semidet-form, (2) carry out move-right steps based on the current state and the first symbol under the lookahead, (3) have fixed rewrite size, and (4) reduce their input by only one symbol per cycle.

For restarting automata with auxiliary symbols and lookahead of size 1, we show that the separation of rewrite and restart steps results in an increase in power. In fact, the result is given for monotone restarting automata also. Proposition 5.
For X ∈ {(right-left-, left-)mon-, λ}, REG = L(X-RWW(1)) ⊊ L(right-left-mon-RRWW(1)).

Proof. Mraz [3] showed that REG = L(X-R(1)) = L(X-RW(1)) = L(X-RWW(1)), with X ∈ {det-mon, det, mon, λ}, and this clearly also holds for X = (right-left-, left-)mon. We specify a right-left-mon-RRWW(1) automaton M such that L(M) ∈ LIN − REG, through the following regular constraints. (Note that L(right-left-mon-RRWW) = LIN [2].)
(¢(ab)*a, b → λ, (cd)*$)
(¢(ab)*a, c → λ, d(cd)*$)
(¢(ab)*, a → λ, d(cd)*$)
(¢(ab)*, d → λ, (cd)*$)
(¢λ$, ACCEPT).
By an enumeration of the left-over context possibilities, it can be shown that
L(M) = {(ab)^n (cd)^n | n ≥ 0} ∪ {(ab)^{n−1} a (cd)^n | n ≥ 1} ∪ {(ab)^{n−1} ad (cd)^{n−1} | n ≥ 1} ∪ {(ab)^{n−1} d (cd)^{n−1} | n ≥ 1} ∈ LIN − REG.

We can also separate the classes of languages recognised by RWW (RRWW) automata with lookahead 1 from that of those with lookahead 2. The result is also given for monotone restarting automata.
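As a sanity check on the automaton in the proof of Proposition 5, the meta-instruction semantics can be simulated directly. The sketch below is an illustration under assumptions, not part of the paper: the regular constraints are encoded as Python regular expressions over the tape including both sentinels, one character stands for one tape symbol, and the names CYCLES, TAILS and accepts are hypothetical.

```python
import re
from functools import lru_cache

# Cycle meta-instructions (E1, u, v, E2): E1 must match ¢w1 and E2 must match w2$.
CYCLES = [
    (r"¢(ab)*a", "b", "", r"(cd)*\$"),
    (r"¢(ab)*a", "c", "", r"d(cd)*\$"),
    (r"¢(ab)*", "a", "", r"d(cd)*\$"),
    (r"¢(ab)*", "d", "", r"(cd)*\$"),
]
# Tail meta-instructions (E, ACCEPT): E must match the whole tape ¢w$.
TAILS = [r"¢\$"]

@lru_cache(maxsize=None)
def accepts(w):
    """True if some sequence of cycles followed by a tail phase accepts w."""
    tape = "¢" + w + "$"
    if any(re.fullmatch(e, tape) for e in TAILS):
        return True
    for e1, u, v, e2 in CYCLES:
        # Try every factorisation w = w1 u w2 (tape index i is w index i - 1).
        for i in range(1, len(tape) - len(u)):
            if (tape[i:i + len(u)] == u
                    and re.fullmatch(e1, tape[:i])
                    and re.fullmatch(e2, tape[i + len(u):])):
                if accepts(w[:i - 1] + v + w[i - 1 + len(u):]):
                    return True
    return False
```

On "ababcdcd" the simulation follows the reductions abacdcd, abadcd, abdcd, abcd and then repeats the same pattern down to the empty word, while a word such as "abcdcd" gets stuck at "cd", where no meta-instruction applies.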
Proposition 6.
For all X ∈ {(right-left-, left-)mon-, λ}, L(X-RWW(2)) − L(X-RWW(1)) ≠ ∅ and L(X-RRWW(2)) − L(X-RRWW(1)) ≠ ∅.

Note that this is only a small improvement on the fact that L(X-RWW(1)) ⊊ L(RRWW(1)) for all X ∈ {(right-left-, left-)mon-, λ}, which is an immediate consequence of results in [3].

Proof. The language L = {a^n b^n | n ≥ 0} is the classic example of a linear language that is not regular. A det-right-left-mon-RWW(2) automaton to recognise L may be specified (deterministically) by the following regular constraints:
(¢a*, ab → c, b*$), (¢a*, cb → d, b*$), (¢a*, ad → c, b*$), (¢λ$, ACCEPT), (¢c$, ACCEPT).
Here the final tail meta-instruction accepts on c rather than d: every nonempty a^n b^n reduces to the single symbol c (for example, aabb → acb → ad → c), whereas accepting on d would also accept abb. On the other hand, no restarting automaton M with just size 1 lookahead can recognise this language, for after the first deletion, the tape contents contain a string not in L(M), which is excluded by the correctness preserving property.

It turns out that further separation of language classes for RRWW is not possible. This is the main result of this paper, given in Theorem 7 and Corollary 12. Theorem 7.
For k ≥ 2 and X ∈ {(right-left-, left-)mon-, λ}, we have L(X-RRWW(k)) = L(X-RRWW(k + 1)). Proof of Theorem 7.
Assume M_1 = (Q_1, Σ, Γ_1, ¢, $, q_0, k + 1, δ_1) is an RRWW(k+1) automaton. We construct M_2 = (Q_2, Σ, Γ_2, ¢, $, q_0, k, δ_2), an RRWW(k) automaton to simulate M_1, such that L(M_1) = L(M_2).

For this construction, the nondeterminacy of M_2 is essential. M_2's lookahead is one symbol shorter than M_1's. So, M_2 will simulate M_1's rewrites by guessing the contents of the tape square, τ_R, following the last symbol of its lookahead, contained in tape square τ_L. It will verify this guess within up to one step (of the same cycle), using a compound state holding this information, leaving behind in the compound symbol in τ_L how M_2 should read the guessed contents of τ_R in subsequent cycles; we will call this instruction I. If there is a rewrite starting in τ_R in a subsequent cycle, C_i, then M_2 will record in τ_R that it should ignore I in all cycles after C_i. Using the Matching Lemma (Lemma 11) concerning the "interaction" of information in τ_L and τ_R, M_2 will be able to determine which message is most up-to-date. Note that this simulation could not work for k = 1, because then M_2 can only delete.

We now give the formal proof of the Theorem.

Notation for M_2's Work Tape. Let Θ_{t,C} = π_{i_{−1}} π_{i_0} π_{i_1} π_{i_2} ··· π_{i_{n−m}} π_{i_{n−m+1}} π_{i_{n−m+3}} denote M_2's work tape at time t in cycle C_m (m ≥
1) of computation C on an initial input of length n , whereeach π i j is a tape square boundary, for j ∈ {− , } ∪ [ n − m + 3]. Further, with respect to Θ t, C ,we let τ R ( π i j , t ) denote the contents of tape square to the right of π i j at time t (if it exists) and τ L ( π i j , t ) the contents of the tape square to the left of π i j at time t (if it exists). So, we always have,for example, τ R ( π − , t ) = ¢ = τ L ( π , t ). We call a tape square boundary internal if it is between twotape squares. With each cycle, one tape square and boundary are destroyed and for this proof, wesay that the second tape square involved in the redex and its boundary to the left are destroyed inthe rewrite of the cycle. Verification Information and Rewrite Instruction Set Notation. By verification informa-tion , VerInf , we will just mean some member of the set of M ’s rewrites, or the special blank symbol, B / ∈ Γ , and we will denote the set of verification information asΠ := { ( q, u [1 , k + 1] , v [1 , k ] , q ′ ) | ( q ′ , REWRITE ( v [1 , k ])) ∈ δ ( q, u [1 , k + 1]) } ∪ { B } . We’ll also refer to Π := Π − { B } as the set of M ’s rewrites. For ρ = ( q, u [1 , k + 1] , v [1 , k ] , q ′ ) ∈ Π ,we denote to the components of ρ as follows: redex ( ρ ) := u, reduct ( ρ ) := v, from state ( ρ ) : q, and to state ( ρ ) := q ′ . reduct ( ρ )[ k + 1] = u [ k + 1] and redex ( ρ )[ k ] = v [ k ]. Finally, we denote by Π , theset of M ’s rewrites,Π := { ( q, x [1 , k ] , y [1 , k − , q ′ ) | ( q ′ , REWRITE ( y [1 , k − ∈ δ ( q, x [1 , k ]) } which will be defined shortly. M ’s Tape Alphabet. M has tape alphabet Γ := Γ ∪ ∆, where∆ := { ( x, VerInf , c , c ) | x ∈ Γ , VerInf ∈ Π , c , c ∈ { , , neutral }} . The second through fourth components of the information from these compound symbols in ∆are used for verifying rewrite guesses, updating tape contents, and determining whether updatingis necessary.If
VerInf = B , we say that VerInf is blank ; we refer to the set of compound symbols withblank verification information as ∆ B . Also, we refer to the set of compound symbols with the lastcomponent, c , not equal to neutral as ∆ . M uses compound symbols as either the last and possibly also the first symbol of a reduct. Theinformation VerInf is used for verifying rewrite guesses and updating tape contents; this componentwill be non-blank in the last symbol of a reduct.
VerInf represents the latest simulated rewriteintroducing a compound symbol in the tape square as the last symbol of the reduct.The last two components of the 4-tuples in ∆ take values that help determine when verificationinformation is out of date; the third component gives instructions about information in the followingtape square and the fourth component gives instructions about information in the preceding tapesquare. Their usage will be made precise in Remark 8 and in the description of M ’s rewrite andmove-right instructions.To refer to the different components of compound symbols z = ( z ′ , VerInf , c , c ) ∈ ∆, weintroduce the notation comp i ( z ) , i ∈ { , , } , which refers to the i th component of z . On the otherhand, comp is defined as a homomorphism comp : Γ ∪ { ¢ , $ } → Γ ∪ { ¢ , $ } as follows, for z ∈ Γ ∪ { ¢ , $ } comp ( z ) := ( z if z ∈ Γ ∪ { ¢ , $ } x if z = ( x, VerInf , c , c ) ∈ ∆ . Then we extend comp in the natural way to comp : (Γ ∪ { ¢ , $ } ) ∗ → (Γ ∪ { ¢ , $ } ) ∗ .Further, we inductively define a mapping h : (Γ ∪ { λ, ¢ } ) × (Γ ∪ { ¢ , $ } ) ∗ → (Γ ∪ { ¢ , $ } ) ∗ by h ( z ′ , z ) = comp ( z ) if z ′ ∈ Γ ∪ ∆ B ∪ { ¢ } , orif z ′ ∈ ∆ − ∆ B , z ∈ ∆ , and comp ( z ) = comp ( z ′ ) , orif z ′ = λ. reduct ( comp ( z ′ ))[ k ] otherwise.Then we let h ( z ′ , zα ) := h ( z ′ , z ) h ( z, α ), where z is a single symbol.Since compound symbols may have various components in common, we will sometimes speak ofcomponents being introduced into tape squares. If at time t a tape square τ holds compound symbol z with some component comp i ( z ), but at time t − τ ’s contents held some symbol z ′ ∈ Γ withoutthe same component—that is, either z ′ ∈ Γ or comp i ( z ′ ) = comp i ( z )—then we say that comp i ( z )was introduced (into tape square τ ) at time t . 8 ’s State Set. For the definition of Q , we first define the two-by-two mutually exclusive sets Q and Q (which are also each mutually exclusive with Q ). 
Q := { ( q, VerInf , c, d, e ) | q ∈ Q − { ACCEPT , REJECT } , VerInf ∈ Π ,c, e ∈ { , , neutral } , d ∈ { verify , ignore , neutral }} Q := { q u [1 ,k ] | q ∈ Q , u [1 , k ] ∈ (Γ ∪ { ¢ } ) k and δ ( q, u [1 , k ]$) ∈ { ACCEPT, REJECT }} M has the state set Q := Q ∪ Q ∪ Q , where Q is the set of all possible contexts leading to anaccept state for M , used on exactly the accept step in M ’s computations. The compound states(from Q ) are only used to “pick up” information from compound symbols.To refer the different components of compound symbols q = ( q ′ , VerInf , c, d, e ) ∈ Q , we intro-duce the notation COMP i ( q ) , i ∈ { , , , } , which refers to the i th component of q . We further definethe homomorphism COMP ( q ) : Q → Q as follows, for q ∈ Q . COMP ( q ) := q if q ∈ Q p if q = p u [1 ,k ] ∈ Q p if q = ( p, VerInf , c, d, e ) ∈ Q . Using the mapping h above, we define another mapping g : Q × (Γ ∪ { ¢ , $ } ) ∗ → (Γ ∪ { ¢ , $ } ) ∗ by g ( q, z ) = comp ( z ) if q ∈ Q , orif z ∈ ∆ , and comp ( z ) = COMP ( q ) , orif z = { ¢ , $ } . reduct ( COMP ( q ))[ k ] otherwise.Then we let g ( q, zα ) := g ( q, z ) h ( z, α ), where z is a single symbol.The presentation of the proof is somewhat eased by first presenting some guiding propertiesfor M that the definition of rewrite and move-right steps will have to obey; this is the purposeof Remark 8 (some comments on Remark 8 follow). After this, we will prove some facts about M based on these properties and use these results in the remainder of our definition of M that follows. Remark 8. M will be defined according to the six following invariants:(I1) M ’s rewrites will be of the form ( p, REWRITE ( y [1 , k − ∈ δ ( q, x [1 , k ]) where:(a) The last symbol of the reduct, y [ k − , is from ∆ − ∆ B and is such that comp ( y [ k − ∈ Π is the rewrite of M simulated.(b) The first symbol of the reduct, y [1] , is from ∆ ∪ Γ .(c) All remaining symbols of the reduct, y [ i ] , i ∈ { , . . . 
, k − } are from Γ .(I2) M will only write a symbol from ∆ if in a compound state. In particular, if M is incompound state q and writes symbol y ∈ ∆ , then comp ( y ) = COMP ( q ) and comp ( y ) = COMP ( q ) .(I3) M will always enter a compound state after carrying out a rewrite step. In fact, if M isin compound state q after writing compound symbol y [ k − ∈ ∆ − ∆ B , then COMP ( q ) = comp ( y [ k − , COMP ( q ) = comp ( y [ k − , COMP ( q ) ∈ { verify , ignore } , and if x [ k ] ∈ ∆ − ∆ B , then COMP ( q ) = comp ( x [ k ]) , otherwise COMP ( q ) = neutral .(I4) M enters a compound state after reading a compound symbol from ∆ − ∆ B as the first symbolunder the lookahead. Otherwise, after a move-right step M must be in a state from Q . In fact,if M reads symbol z ∈ ∆ , then it enters a compound state q such that COMP ( q ) = comp ( z ) , COMP ( q ) = comp ( z ) , and COMP ( q ) = COMP ( q ) = neutral . I5) M in compound state q with COMP ( q ) ∈ { verify , ignore } rejects if it reads a compoundsymbol z ∈ ∆ such that COMP ( q ) = comp ( z ) .Moreover, if M does not reject, then(a) if COMP ( q ) = verify , then M checks that reduct ( COMP ( q ))[ k + 1] = comp ( z ) ( M verifies the symbol currently scanned). 
Furthermore, if COMP ( q ) ∈ { , } and z ∈ ∆ , M also assures that COMP ( q ) = comp ( z ) ( M verifies that the currently scanned symbolholds the most up-to-date information).(b) if COMP ( q ) = ignore , COMP ( q ) ∈ { , } , and z ∈ ∆ , then M assures that COMP ( q ) = comp ( z ) ( M verifies that the information in the currently scanned symbol, z , is out-of-date).Then M (in both cases of COMP ( q ) ) enters some state p such that COMP ( q ) = COMP ( p ) andif p / ∈ Q , then COMP ( p ) = COMP ( p ) = neutral and COMP i ( p ) = comp i ( z ) for i ∈ { , } .(I6) Let p ∈ Q − ( { ACCEPT, REJECT } ∪ Q ) .(a) There is some left computation on prefix ¢ α ∈ Γ ∗ in which M reaches state p if and onlyif there is some left computation on prefix h ( λ, ¢ α ) that puts M in state q = COMP ( p ) .(b) There is some right computation on prefix zα after which M enters state p where z ∈ Γ , α ∈ Γ ∗ starting in state p ′ if and only if there is some right computation on prefix h ( z, α ) after which M enters state COMP ( p ) starting in state COMP ( p ′ ) . (I1-I3) concern rewrite steps, (I4-I5) concern move-right steps, and (I6) is the main statementthat ensures this proof works (valid simulations).(I4) ensures that M can update tape contents after reading a compound symbol from ∆, butthat it should not verify that the rewrite guess indicated in this information is correct ( COMP ( q ) = neutral ). In fact, this verification should have taken place directly following the rewrite (in thesame cycle) as is indicated in (I3) ( COMP ( q ) ∈ { verify, ignore } ). Points (I3-I5) together indicatethat M can only be in a state with fourth component equal to a member of { verify, ignore } atmost once in a cycle: verification of the rewrite guess happens during a single move-right step inthe same cycle. 
By the same token, M can only be in a state with fifth component not equal to neutral during the same single move-right step of the cycle: verification of the updated-ness of the last symbol under the lookahead can happen only in the step after a rewrite, since move-right steps are only defined with respect to the first symbol under the lookahead.

(I2) ensures that M can detect when an update of the tape contents has been written onto the tape. (I5) permits M to keep track of cycle orders, to the extent that is necessary here. (See Lemma 11.)

From Remark 8, we easily obtain the following three facts: Lemma 9.
At no time t in M's computation C is there an interior square boundary π on M's work tape Θ_{t,C} such that τ_L(π, t) ∈ Γ ∪ ∆_B ∪ {¢} and τ_R(π, t) ∈ ∆. (No symbol from Γ ∪ ∆_B ∪ {¢} directly precedes a symbol from ∆ on M's work tape at any time t in the computation.)

Proof. This follows from (I1-I4).
Corollary 10. M cannot read a symbol from ∆ in a state from Q.

The following Matching Lemma shows that M can detect the order of rewrites over consecutive tape squares.

By prefix in a right computation we mean the prefix of the segment of work tape contents following the rewrite.

Lemma 11 (Matching Lemma). At time t in M's computation C, let π be an interior tape square boundary on M's work tape Θ_{t,C}. Suppose τ_L(π, t) ∈ ∆ − ∆_B and τ_R(π, t) ∈ ∆. Then there are two cycles C_{j_1}, C_{j_2} ∈ C such that

1. M uses rewrite ρ_i = (q_i, x_i[1, k], y_i[1, k−1], q′_i) at time t_i in C_{j_i} (i ∈ [2]) such that C_j introduced comp(τ_L(π, t)) = comp(y[k−1]), and C_j introduced comp(τ_R(π, t)) = comp(y[1]) ∈ { , }.

2. (a) comp(τ_L(π, t)) = comp(τ_R(π, t)) implies t < t.
   (b) comp(τ_L(π, t)) = comp(τ_R(π, t)) implies t > t.

Proof. (1) follows from (I1). (2a) follows from (I2) and (I4). (2b) follows from (I3) and (I5).

In case (2a) of the Matching Lemma, M should update the tape square (in memory) τ_R(π, t) as it reads it, and in case (2b), M should ignore the instruction in τ_L(π, t) to update the information in τ_R(π, t), since it is now “out of date”. We also remark that the Matching Lemma helped provide the definition of the mappings h and g.

We now describe the rewrite and move-right instructions for M with k >
2. The case for k = 2 is easily obtained from this by merging the requirements for the first and last symbols in reducts of the case k > 2.

Rewrite steps of M. Let ρ = (q, u[1, k+1], v[1, k], q′) ∈ Π. We define a set of M's rewrites required for simulating ρ, of the form ρ′ = (p, x[1, k], y[1, k−1], p′) ⊆ Π, with the following component requirements.

1. p = q if p ∈ Q, and p = (q, ρ′′, comp(τ_L(π, t)), neutral, neutral) otherwise, where ρ′′ has further constraints with respect to x[1]. (See Item (7).)

2. For p′, we have (by (I3))
   p′ = (q′, ρ, comp(y[k−1]), verify, neutral) if x[k] ∈ Γ ∪ ∆_B,
   p′ = (q′, ρ, comp(y[k−1]), ignore, comp(x[k])) if x[k] ∈ ∆ − ∆_B and only in (6b),
   p′ = (q′, ρ, comp(y[k−1]), verify, comp(x[k])) if x[k] ∈ ∆ − ∆_B and only in (6a).

3. Any x[2, k−1] ∈ Γ^{k−2} such that h(x[1], x[2, k−1]) = u[2, k−1].

4. y[2, k−2] = v[2, k−2].

5. y[k−1] = (v[k−1], ρ, c, neutral), with c ∈ { , }, by (I1).

6. (a) any x[k] ∈ Γ such that h(x[k−1], x[k]) = u[k], or
   (b) any x[k] ∈ ∆ − ∆_B such that reduct(comp(x[k]))[k] = u[k+1], and comp(x[k]) ∈ { , }.

7. Finally, for x[1], y[1]:
   • If p ∈ Q, then y[1] = v[1] and any x[1] ∈ Γ ∪ {¢} such that comp(x[1]) = u[1] will suffice.
   • If p ∈ Q, then y[1] = (v[1], B, neutral, COMP(p)) and
     – any x[1] ∈ (Γ ∪ {¢}) − ∆ such that comp(x[1]) = redex(COMP(p))[k+1] and reduct(COMP(p))[k] = u[1], or
     – any x[1] ∈ ∆ such that
       ∗ COMP(p) = comp(x[1]), comp(x[1]) = redex(COMP(p))[k+1] and reduct(COMP(p))[k] = u[1], or
       ∗ COMP(p) = comp(x[1]) and comp(x[1]) = u[1],
     by the Matching Lemma.

There are no other rewrites in δ.

Note that M cannot rewrite over the right sentinel, since it always simulates M's rewrites using only the first k symbols and M has fixed rewrite size.

Move-right steps of M not derived from M's move-right steps. We suppose without loss of generality that M does not rewrite over the right sentinel and then immediately halt. There are two types of move-right steps for M that are not derived from M's move-right steps, for verifying rewrite guesses; they are therefore derived from M's rewrites. These two cases, for δ(p, x[1, k]), are when p ∈ Q with COMP(p) ∈ {verify, ignore}. In these move-right steps, M simply verifies that Invariant (I5) is maintained and, if so, moves right and into state
   COMP(p) if x[1] ∈ Γ, and
   (COMP(p), comp(x[1]), comp(x[1]), neutral, neutral) otherwise,
indicating that M remains in the “same” state (with respect to M's state), picks up x[1]'s verification information (in case it must update tape contents), and its matching information (to keep track of the order of rewrites). The fourth and fifth components are always neutral in the compound state following any step that does not verify a rewrite step.
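The state change performed by a verification move-right step can be sketched in code. The following is a minimal illustration only: the class names, field names, and the function `after_verification_step` are hypothetical, and the sketch assumes that the second and third components of a compound symbol carry its verification and matching information, which the text above suggests but which is not fixed by this section. The check of Invariant (I5) is deliberately left out.

```python
from typing import NamedTuple, Union

NEUTRAL = "neutral"


class CompoundSymbol(NamedTuple):
    """Hypothetical 4-component compound tape symbol."""
    letter: str        # underlying tape letter
    verification: str  # verification information (assumed 2nd component)
    matching: int      # matching information (assumed 3rd component)
    last: str          # 4th component, not used by this step


class CompoundState(NamedTuple):
    """Hypothetical 5-component state of the simulating automaton."""
    base: str          # state of the simulated automaton
    verification: str  # remembered verification information
    matching: int      # remembered matching information
    mode: str          # "verify", "ignore" or "neutral"
    pending: object    # 5th component, or "neutral"


def after_verification_step(
    p: CompoundState, x1: Union[str, CompoundSymbol]
) -> Union[str, CompoundState]:
    """Return the state entered after a verification move-right step.

    Assumption: the (I5) check has already succeeded; it is not modelled.
    """
    if p.mode not in ("verify", "ignore"):
        raise ValueError("only defined for states in verify/ignore mode")
    if isinstance(x1, CompoundSymbol):
        # Pick up the symbol's verification and matching information;
        # the fourth and fifth components become neutral again.
        return CompoundState(p.base, x1.verification, x1.matching,
                             NEUTRAL, NEUTRAL)
    # Plain symbol: fall back to the simulated automaton's own state.
    return p.base
```

In this sketch a plain symbol is modelled as a Python string and a compound symbol as a `CompoundSymbol`, so the case distinction on x[1] becomes a type test.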
Move-right steps of M derived from M's move-right steps. Other than the above described move-right steps, M's move-right steps nondeterministically simulate those of M, simultaneously updating tape contents because of rewrite guesses. Recall that since M is in the RR-semidet-form, we only need to consider the first symbol under the lookahead for M's move-right steps (so, in particular, we can talk about move-right steps in δ on a lookahead contents of size k instead of k+1).

Let
   q′ ∈ δ(q, u[1, k+1])   (6)
be a move-right step for M.

Firstly, q′ ∈ δ(q, u[1] x[2, k]), for all h(x[1], x[2, k]) = u[2, k].

In addition, M has the following instructions:

If M accepts/rejects/restarts with fewer than k+1 symbols under the lookahead, then so can M; that is, if δ(q, u[1, j]) = ACCEPT (resp. REJECT, RESTART) for 1 ≤ j < k, with u[j] = $, then δ(p, x[1, j]) = ACCEPT (resp. REJECT, RESTART) with x[j] = $ and such that COMP(p) = q, and for all x[1, j−1] ∈ (Γ ∪ {¢}) · Γ^{j−2} such that g(q, x[1, j−1]) = u[1, j−1]. Otherwise, M always has k+1 symbols under the lookahead.

If q′ = ACCEPT (so u[k+1] = $), then we have, for q_{u[1,k]} ∈ Q, q_{u[1,k]} ∈ δ(p, x[1, k]), and
   δ(q_{u[1,k]}, x[2, k] z) ∋ ACCEPT if z = $, and REJECT otherwise,
for all p such that COMP(p) = q and COMP(p) = COMP(p) = neutral, and for all z ∈ Γ, and for all x[1, k] ∈ (Γ ∪ {¢}) · Γ^{k−1} such that g(p, x[1, k]) = u[1, k]. Here, M first guesses that M would accept and then verifies its guess. We must have COMP(p) = COMP(p) = neutral, because in the step after rewriting we have assumed that M does not immediately halt after rewriting.

If q′ = REJECT, then we have simply REJECT ∈ δ(p, x[1, k]) for all p such that COMP(p) = q and for all x[1, k] ∈ (Γ ∪ {¢}) · Γ^{k−1} such that g(p, x[1, k]) = u[1, k], so long as COMP(p) = COMP(p) = neutral. M can guess that M would reject; if this is not the case, there is still some computation that does not reject.

By Corollary 10, the remaining cases for the simulation of (6) are where M reads a compound symbol (as the first symbol under the lookahead) and/or is in a compound state.

Suppose p ∈ Q, i.e., p = q. By Corollary 10, we must have x[1] ∈ ∆ − ∆ and therefore comp(x[1]) = u[1]. Now M simply picks up the information in x[1] and moves right as M would:
   (q′, comp(x[1]), comp(x[1]), neutral, neutral) ∈ δ(p, x[1, k]).   (7)

Finally, suppose p ∈ Q; then COMP(p) = q. The only case left to treat is where COMP(p) = neutral.

1. If x[1] ∈ (Γ ∪ {¢}) − ∆, then comp(x[1]) = redex(COMP(p))[k+1] and reduct(COMP(p))[k] = u[1].

2. If x[1] ∈ ∆, then by the Matching Lemma,
   (a) COMP(p) = comp(x[1]), comp(x[1]) = redex(COMP(p))[k+1] and reduct(COMP(p))[k] = u[1], or
   (b) COMP(p) = comp(x[1]) and comp(x[1]) = u[1].

M rejects for all other contexts (except where it can rewrite).

M's rewrite and move-right steps being entirely determined by M's, it follows that L(M) = L(M).

As a corollary of Theorem 7, we have the following lookahead hierarchy collapse.

Corollary 12.
For k ≥ 2 and X ∈ {(left-, right-left-)mon, λ}, we have

   L(X-RRWW) = ⋃_{k=2}^{∞} L(X-RRWW(k)) = L(X-RRWW(2)).

Corollary 12 reduces the most important question concerning restarting automata, namely whether the separation of rewrite and restart steps results in an increase in power, to the same question about restarting automata with lookahead length 2: L(RWW) = L(RRWW) ⇐⇒ L(RWW) = L(RRWW(2)). Theorem 7 also leads to an improvement on a result of [6] with the following corollary, which was proven in [6] only for larger values of k.

Corollary 13. For all k ≥ 2 and X ∈ {left-mon, mon}, we have L(X-RRWW(k)) = CFL.

Corollary 14. For all k ≥ 2, we have L(right-left-RRWW(k)) = LIN.
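The collapse stated in Corollary 12 can be unpacked as a short chain of inclusions. The sketch below rests on two assumptions about statements not reproduced in this section: that Theorem 7 supplies the inclusion of the class for lookahead k+1 in the class for lookahead k whenever k ≥ 2 (the direction simulated in the proof above), and that the converse inclusion holds because a larger lookahead can simulate a smaller one.

```latex
\begin{align*}
\mathcal{L}(X\text{-RRWW}(k)) &\subseteq \mathcal{L}(X\text{-RRWW}(k+1))
  && \text{(assumed: larger lookahead simulates smaller)}\\
\mathcal{L}(X\text{-RRWW}(k+1)) &\subseteq \mathcal{L}(X\text{-RRWW}(k))
  && \text{(assumed content of Theorem 7, } k \ge 2\text{)}\\
\Longrightarrow\quad \mathcal{L}(X\text{-RRWW}(k)) &= \mathcal{L}(X\text{-RRWW}(2))
  && \text{for all } k \ge 2 \text{ (induction on } k\text{)}\\
\Longrightarrow\quad \bigcup_{k=2}^{\infty} \mathcal{L}(X\text{-RRWW}(k)) &= \mathcal{L}(X\text{-RRWW}(2)).
\end{align*}
```

Under these assumptions the only lookahead size left outside the collapsed class is k = 1, which is consistent with the two language classes mentioned in the abstract.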
We showed that the restriction on lookahead length is less significant for restarting automata with auxiliary symbols than for those without auxiliary symbols, so long as restart and rewrite steps are separated: only two different language classes are distinguished for RRWW automata. The respective question for RWW automata remains open.

Acknowledgements. We thank the anonymous reviewers for their helpful comments.
References

[1] P. Jančar, F. Mráz, M. Plátek, and J. Vogel. On monotonic automata with a restart operation. Journal of Automata, Languages and Combinatorics, 4(4):287–312, 1999.
[2] T. Jurdziński, F. Mráz, F. Otto, and M. Plátek. Degrees of non-monotonicity for restarting automata. Theoretical Computer Science, 369:1–34, 2006.
[3] F. Mráz. Lookahead hierarchies of restarting automata. Journal of Automata, Languages and Combinatorics, 6(4):493–506, 2001.
[4] G. Niemann and F. Otto. Restarting automata and prefix rewriting systems. Technical report, Kassel University, 1999.
[5] F. Otto. Restarting automata. In Z. Ésik, C. Martín-Vide, and V. Mitrana, editors, Recent Advances in Formal Languages and Applications, volume 25 of Studies in Computational Intelligence, pages 269–303. Springer-Verlag, Berlin, 2006.
[6] N. Schluter. On lookahead hierarchies for monotone and deterministic restarting automata with auxiliary symbols (extended abstract). In