SD-Regular Transducer Expressions for Aperiodic Transformations

Abstract

FO transductions, aperiodic deterministic two-way transducers, as well as aperiodic streaming string transducers are all equivalent models for first order definable functions. In this paper, we solve the long-standing open problem of expressions capturing first order definable functions, thereby generalizing the seminal SF=AP (star free expressions = aperiodic languages) result of Schützenberger. Our result also generalizes a lesser-known characterization by Schützenberger of aperiodic languages by SD-regular expressions (SD=AP). We show that every first order definable function over finite words captured by an aperiodic deterministic two-way transducer can be described with an SD-regular transducer expression (SDRTE). An SDRTE is a regular expression where Kleene stars are used in a restricted way: they can appear only on aperiodic languages which are prefix codes of bounded synchronization delay. SDRTEs are constructed from simple functions using the combinators unambiguous sum (deterministic choice), Hadamard product, and unambiguous versions of the Cauchy product and the k-chained Kleene star, where the star is restricted as mentioned. In order to construct an SDRTE associated with an aperiodic deterministic two-way transducer, (i) we concretize Schützenberger's SD=AP result, by proving that aperiodic languages are captured by SD-regular expressions which are unambiguous and stabilising; (ii) by structural induction on the unambiguous, stabilising SD-regular expressions describing the domain of the transducer, we construct SDRTEs. Finally, we also look at various formalisms equivalent to SDRTEs which use function composition, which allows trading the k-chained star for a 1-star.


Luc Dartois

Univ Paris Est Creteil, LACL, F-94010 Creteil

Paul Gastin

Université Paris-Saclay, ENS Paris-Saclay, CNRS, LMF, 91190, Gif-sur-Yvette

Shankara Narayanan Krishna

IIT Bombay

Theory of computation → Transducers

Keywords and phrases transducers, aperiodic functions, regular expressions, transition monoids.

Funding

Supported by IRL ReLaX

The seminal result of Kleene, which proves the equivalence of regular expressions and regular languages, is among the cornerstones of formal language theory. The Büchi, Elgot, Trakhtenbrot theorem, which proved the equivalence of regular languages with MSO definable languages, and the equivalence of regular languages with the class of languages having a finite syntactic monoid, established the synergy between machines, logic and algebra. The fundamental correspondence between machines and logic at the language level has been generalized to transformations by Engelfriet and Hoogeboom [15], where regular transformations are defined by two-way transducers (2DFTs) as well as by the MSO transductions of Courcelle [9]. A generalization of Kleene's theorem to transformations can be found in [3], [4] and [11].

In [3], regular transformations were described using additive cost register automata (ACRA) over finite words. ACRAs are a generalization of streaming string transducers (SSTs) [1] which make a single left to right pass over the input and use a finite set of variables over strings from the output alphabet. ACRAs compute partial functions from finite words over a finite input alphabet to a monoid (D, +, …).

Our Contributions. The following central problem remained open till now: given an aperiodic 2DFT A, does there exist a class of expressions over basic functions and regular combinators such that one can effectively compute from A an expression E in this class, and conversely, such that [[A]](w) = [[E]](w) for each w ∈ dom(A)? We solve this open problem by providing a characterization by means of expressions for aperiodic two-way transducers. In the following, we describe the main steps leading to the solution of the problem.

Concretizing Schützenberger's characterization.
In 1973, Schützenberger [22] presented a characterization of aperiodic languages in terms of rational expressions where the star operation is restricted to prefix codes with bounded synchronization delay and no complementation is used. This class of languages is denoted by SD, and this result is known as SD=AP. To circumvent the difficulty of using complementation in star-free expressions, we use this SD=AP characterization of aperiodic languages by SD-expressions. An SD-expression is a regular expression where the Kleene stars are restricted to appear only on prefix codes of bounded synchronization delay. Our first contribution is to concretize Schützenberger's result to more specific SD-expressions. We show that aperiodic languages can be captured by unambiguous, stabilising SD-expressions. The unambiguity of an expression refers to the unique way in which it can be parsed, while stabilising expressions are a new notion introduced in this paper. Our concretization (Theorem 10) shows that, given a morphism φ from the free monoid Σ* to a finite aperiodic monoid M, for each s ∈ M, φ^{-1}(s) can be expressed by an unambiguous, stabilising SD-expression. The two notions of unambiguity and stabilisation help us to capture the runs of an aperiodic two-way transducer. These two notions will be described in detail in Section 3.

Figure 1 (state diagram not reproduced): An aperiodic 2DFT A computing the partial function [[A]]($ a^{m_1} # a^{m_2} $ a^{m_3} # a^{m_4} $ ··· a^{m_{2k}} $) = b^{m_2} a^{m_1} b^{m_4} a^{m_3} ··· b^{m_{2k}} a^{m_{2k−1}}, for k ≥ 0. The input alphabet is Σ = {a, #, $} while the output alphabet is Γ = {a, b}.

The Combinators. Our second contribution is the definition of SD-regular transducer expressions (SDRTE). These are built from basic constant functions using combinators such as unambiguous sum, unambiguous Cauchy product and Hadamard product. In addition, we use the k-chained Kleene star [L, C]^{k⋆} (and its reverse) when the parsing language L is restricted to be aperiodic and a prefix code with bounded synchronisation delay. It should be noticed that, contrary to the case of regular transducer expressions (RTE) which define all regular functions, the 2-chained Kleene star [L, C]^{2⋆} does not seem sufficient to define all aperiodic functions (see Section 4.7 as well as Figure 2), and k-chained Kleene stars for arbitrarily large k seem necessary to capture all aperiodic functions.

The semantics of an SDRTE C is a partial function [[C]] : Σ* → Γ* with domain denoted dom(C). An SDRTE of the form L . v, where L ⊆ Σ* is an aperiodic language and v ∈ Γ*, is such that [[L . v]] is a constant function with value v and domain L. The Hadamard product C_1 ⊙ C_2 when applied to w ∈ dom(C_1) ∩ dom(C_2) produces [[C_1]](w) · [[C_2]](w). The unambiguous Cauchy product C_1 · C_2 when applied on w ∈ Σ* produces [[C_1]](u) · [[C_2]](v) if w can be unambiguously decomposed as u · v, with u ∈ dom(C_1) and v ∈ dom(C_2). The Kleene star C* is defined when L = dom(C) is an aperiodic language which is a prefix code with bounded synchronisation delay. Then dom(C*) = L*, and, for w = u_1 u_2 ··· u_n with u_i ∈ L, we have [[C*]](w) = [[C]](u_1) [[C]](u_2) ··· [[C]](u_n).

As an example, consider the SDRTEs C = C_1 · C_2, C′ = C_3 · C_4 and D = C ⊙ C′ with C_1 = (a*#) . ε, C_2 = (a . b)* · ($ . ε), C_3 = (a . a)* · (# . ε) and C_4 = (a*$) . ε. Then dom(C_1) = a*# = dom(C_3), dom(C_2) = a*$ = dom(C_4), and dom(C) = a*#a*$ = dom(C′) = dom(D). Further, [[C_1]](a^m #) = ε, [[C_2]](a^n $) = b^n, [[C_3]](a^m #) = a^m, and [[C_4]](a^n $) = ε. Also, [[D]](a^m # a^n $) = b^n a^m. Notice that dom(D) is a prefix code with synchronisation delay 1. Hence, we can define the SDRTE D* which has as domain the aperiodic language dom(D*) = (a*#a*$)*, and [[D*]](a^{m_1} # a^{m_2} $ a^{m_3} # a^{m_4} $) = b^{m_2} a^{m_1} b^{m_4} a^{m_3}. The SDRTE D′ = ($ . ε) · D* corresponds to the aperiodic 2DFT A in Figure 1: [[A]] = [[D′]].

SDRTE ↔ Aperiodic 2DFT. Our third and main contribution solves the open problem by proving the effective equivalence between aperiodic two-way transducers and SDRTEs over finite words:

Theorem 1. (1) Given an SDRTE, we can effectively construct an equivalent aperiodic 2DFT. (2) Given an aperiodic 2DFT, we can effectively construct an equivalent SDRTE.

The proof of (1) is by structural induction on the SDRTE. All cases except the k-chained Kleene star are reasonably simple, and it is easy to see how to construct the equivalent 2DFT. The case of the k-chained Kleene star is more involved. We write [L, C]^{k⋆} as the composition of 3 aperiodic functions f_1, f_2, f_3, where (i) f_1 takes as input u_1 u_2 ··· u_n ∈ L* with u_i ∈ L and produces as output #u_1#u_2# ··· #u_n#, (ii) f_2 takes #v_1#v_2# ··· #v_m# with v_i ∈ Σ* as input, and produces #v_1 ··· v_k#v_2 ··· v_{k+1}# ··· #v_{m−k+1} ··· v_m#, (iii) f_3 takes #w_1#w_2# ··· #w_ℓ# with w_i ∈ Σ* and produces as output f(w_1) f(w_2) ··· f(w_ℓ), where f = [[C]]. We produce aperiodic 2DFTs for f_1, f_2, f_3, and compose them, obtaining the required aperiodic 2DFT.

The construction of an SDRTE from an aperiodic 2DFT A is much more involved, and is based on the transition monoid TrM of the 2DFT A. The translation of A to an SDRTE is guided by an unambiguous, stabilising, SD-regular expression induced by TrM. These expressions are obtained thanks to Theorem 10 applied to the canonical morphism φ : Σ* → TrM where the transition monoid TrM of A is aperiodic. This construction is illustrated in detail via Examples 23, 26, 27 and 29.

Related Work. A natural operation on functions is that of composition. The composition operation can be used in place of the chained-sum operator of [3], and also in place of the unambiguous 2-chained iteration of [11], preserving expressiveness. In yet another recent paper, [17] proposes simple functions like copy, duplicate and reverse along with function composition to capture regular word transductions.

A closely related paper to our work is [6], where first-order and regular list functions were introduced. Using the basic functions reverse, append, co-append, map, block on lists, and combining them with the function combinators of disjoint union, map, pairing and composition, these were shown to be equivalent (after a suitable encoding) to FO transductions à la Courcelle (extendible to MSO transductions by adding to the basic functions the prefix multiplication operation on groups). [6] provides an equivalent characterization (modulo an encoding) for FO transductions with basic list functions and combinators.

Contrary to [6], where expressions crucially rely on function composition, we focus on concatenation and iteration as first-class combinators, in the spirit of Kleene's theorem and of Schützenberger's characterisation AP=SD. We are able to characterise 2DFTs with such SD-regular expressions without using composition. Hence, our result is fully independent and complementary to the work in [6]: both formalisms, SDRTEs and list functions, are natural choices for describing first order transductions. Our basic functions and combinators are inspired by the back and forth traversal of a two-way automaton, and the restrictions on the usage of the Kleene star come from the unambiguous, stabilising nature of the expressions capturing the aperiodic domain of the 2DFT. We also study in Section 6 how composition may be used to simplify our SDRTEs (Theorem 30). With composition, the k-chained Kleene star (k > 1) is no longer necessary, resulting in an equivalent formalism, namely, SDRTE where we only use the 1-star. Yet another equivalent formalism is obtained by restricting SDRTE to simple functions, unambiguous sum, Cauchy product and 1-star, but adding the functions duplicate and reverse along with composition.
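The semantics of the combinators described under "The Combinators" can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the construction used in the paper: partial functions are modelled as Python callables returning `None` when undefined, domains of constant functions are given by Python regular expressions, the third input letter is written `#`, and the names `const`, `hadamard`, `cauchy` and `star` are hypothetical helpers. The `star` helper only implements the 1-star and does not check that the domain is an aperiodic prefix code of bounded synchronisation delay.

```python
import re
from typing import Callable, Optional

# A partial function on words: None means "undefined on this input".
Fn = Callable[[str], Optional[str]]

def const(pattern: str, value: str) -> Fn:
    """Constant function with value `value` on the language of `pattern`."""
    rx = re.compile(pattern)
    return lambda w: value if rx.fullmatch(w) else None

def hadamard(f: Fn, g: Fn) -> Fn:
    """Apply f and g to the same word and concatenate the two outputs."""
    def h(w: str) -> Optional[str]:
        x, y = f(w), g(w)
        return None if x is None or y is None else x + y
    return h

def cauchy(f: Fn, g: Fn) -> Fn:
    """Split w = u v with u in dom(f), v in dom(g); defined only if the split is unique."""
    def h(w: str) -> Optional[str]:
        outs = []
        for i in range(len(w) + 1):
            x, y = f(w[:i]), g(w[i:])
            if x is not None and y is not None:
                outs.append(x + y)
        return outs[0] if len(outs) == 1 else None
    return h

def star(f: Fn) -> Fn:
    """1-star: cut w into nonempty pieces of dom(f) and concatenate their images.
    Returns None unless exactly one first piece leads to a factorisation."""
    def h(w: str) -> Optional[str]:
        if w == "":
            return ""
        outs = []
        for i in range(1, len(w) + 1):
            head = f(w[:i])
            tail = h(w[i:]) if head is not None else None
            if tail is not None:
                outs.append(head + tail)
        return outs[0] if len(outs) == 1 else None
    return h

# The example expressions from the text, with '#' as the assumed third letter.
C1 = const(r"a*#", "")
C2 = cauchy(star(const(r"a", "b")), const(r"\$", ""))
C3 = cauchy(star(const(r"a", "a")), const(r"#", ""))
C4 = const(r"a*\$", "")
D = hadamard(cauchy(C1, C2), cauchy(C3, C4))
D_prime = cauchy(const(r"\$", ""), star(D))

print(D("aa#aaa$"))              # bbbaa, i.e. b^3 a^2
print(D_prime("$aa#aaa$a#aa$"))  # bbbaabba
```

Trying every split point in `cauchy` and `star` keeps the sketch close to the definitions; for prefix codes a left-to-right greedy parse would be enough.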

Structure of the paper.

In Section 2, we introduce preliminary notions used throughout the paper. In Section 3 we give a procedure to construct complement-free expressions for aperiodic languages that suits our approach. This is a generic result on languages, independent of two-way transducers. Section 4 presents the combinators and the chain-star operators for our characterization. The main theorem and its technical proofs, which construct SD-regular transducer expressions from a two-way aperiodic transducer, are in Section 5.

We call a finite set Σ an alphabet and its elements letters. A finite sequence of letters of Σ is called a word, and a set of words is a language. The empty word is denoted ε, and we denote by Σ* the set of all words over the alphabet Σ. More generally, given any language L ⊆ Σ*, we write L* for the Kleene star of L, i.e., the set of words which can be written as a (possibly empty) sequence of words of L. Given a word u, we write |u| for the length of u, i.e., its number of letters, and we denote by u_i its i-th letter.

A monoid M is a set equipped with a binary associative law, usually denoted · or omitted when clear from context, and a neutral element 1_M for this law, meaning that for any s ∈ M, 1_M · s = s · 1_M = s. The set of words Σ* can be seen as the free monoid generated by Σ using the concatenation of words as binary law. Given a morphism φ : Σ* → M, i.e., a function between monoids that satisfies φ(ε) = 1_M and φ(xy) = φ(x)φ(y) for any x, y, we say that φ recognizes a language L ⊆ Σ* if M is finite and L = φ^{-1}(P) for some P ⊆ M. A monoid is called aperiodic if there exists an integer n such that for any element s of M, s^n = s^{n+1}.

Example 2.

We define the monoids Ũ_n, for n ≥ 0, as the set of elements {1, s_1, …, s_n}, with 1 being the neutral element, and for any 1 ≤ i, j ≤ n, s_i · s_j = s_i. Clearly, Ũ_n is aperiodic, actually idempotent, as s_i · s_i = s_i for any 1 ≤ i ≤ n. For instance, the monoid Ũ_2 is the transition monoid (defined below) of the automaton below with φ(a) = s_1, φ(b) = s_2 and φ(c) = 1.

(Automaton diagram not reproduced; its edge labels are "a, c", "b, c", "a" and "b".)

Rational languages are languages that can be described by rational expressions, i.e., sets of words constructed from finite sets using the operations of concatenation, union and Kleene star. It is well-known that rational languages are equivalent to regular languages, i.e., languages accepted by finite automata, and to languages recognized by finite monoids (and Monadic Second-order logic [7]).
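The definition of aperiodicity and the monoid Ũ_n of Example 2 can be checked mechanically on small instances. The sketch below is illustrative only: it encodes the unit as `0` and s_i as the integer `i`, and the function names `u_tilde` and `aperiodicity_index` are hypothetical. It relies on the fact that in a finite monoid with m elements, an aperiodic element x already satisfies x^m = x^{m+1}.

```python
from typing import Callable, List, Optional, Tuple

Mul = Callable[[int, int], int]

def u_tilde(n: int) -> Tuple[List[int], Mul]:
    """The monoid of Example 2: 0 encodes the unit, i encodes s_i (1 <= i <= n)."""
    def mul(x: int, y: int) -> int:
        if x == 0:
            return y
        # x = s_i: s_i * 1 = s_i and s_i * s_j = s_i
        return x
    return list(range(n + 1)), mul

def power(mul: Mul, x: int, k: int, unit: int = 0) -> int:
    """Compute x^k by repeated multiplication (x^0 is the unit)."""
    r = unit
    for _ in range(k):
        r = mul(r, x)
    return r

def aperiodicity_index(elems: List[int], mul: Mul, unit: int = 0) -> Optional[int]:
    """Smallest n with x^n == x^(n+1) for every x, or None if the monoid is not aperiodic."""
    for n in range(len(elems) + 1):
        if all(power(mul, x, n, unit) == power(mul, x, n + 1, unit) for x in elems):
            return n
    return None

elems, mul = u_tilde(2)
print(aperiodicity_index(elems, mul))                          # 1
print(aperiodicity_index([0, 1], lambda x, y: (x + y) % 2))    # None: Z/2Z is a group
```

The second call uses the two-element cyclic group, which is the standard example of a monoid that is not aperiodic.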

Star-free rational expressions are built from finite sets using the operations of concatenation, union and complement (instead of Kleene star). They have the same expressive power as finite aperiodic monoids [21] (as well as counter-free automata and first-order logic [18]).

Definition 3 (Two-way transducer). A (deterministic) two-way transducer (2DFT) is a tuple A = (Q, Σ, Γ, δ, γ, q_0, F) defined as follows:
- Q is a finite set of states.
- Σ and Γ are the finite input and output alphabets.
- δ : Q × (Σ ⊎ {⊢, ⊣}) → Q × {−1, +1} is the partial transition function. Contrary to one-way machines, the transition function also outputs an integer, indicating the move of the reading head. The alphabet is enriched with two new symbols ⊢ and ⊣, which are endmarkers that are added respectively at the beginning and at the end of the input word, such that for all q ∈ Q, we have δ(q, ⊢) ∈ Q × {+1} (if defined), δ(q, ⊣) ∈ Q × {−1} (if defined) and δ(q, ⊣) is undefined for q ∈ F.
- γ : Q × (Σ ⊎ {⊢, ⊣}) → Γ* is the partial production function with the same domain as δ.
- q_0 ∈ Q is the initial state.
- F ⊆ Q is the set of final states.

A configuration c of A over an input word w = w_1 ··· w_{|w|} is simply a pair (p, i) where p ∈ Q is the current state and 0 ≤ i ≤ |w| + 1 is the position of the head on the input tape containing ⊢w⊣. Two configurations c = (p, i) and c′ = (q, j) are successive if we have δ(p, w_i) = (q, d) and i + d = j, with w_0 = ⊢ and w_{|w|+1} = ⊣. In this case, they produce an output v = γ(p, w_i). Abusing notations we will sometimes write γ(c) when the input word w is clear. A run ρ is a sequence of successive configurations c_0 ··· c_n. The run ρ is initial if c_0 = (q_0, 0) and is final if c_n = (q, |w| + 1) for some q ∈ F. It is accepting if it is both initial and final.

The output of a run ρ = c_0 ··· c_n is the concatenation of the output of the configurations, and will be denoted [[ρ]] = γ(c_0) ··· γ(c_{n−1}). Given a deterministic two-way transducer A and an input word w, there is at most one accepting run of A over ⊢w⊣, which we will denote ρ(w). The output of A over w is then [[A]](w) = [[ρ(w)]]. The domain of A is the set dom(A) of words w such that there exists an accepting run of A over w. Finally, the semantics of A is the partial function [[A]] : Σ* → Γ* defined on dom(A) by w ↦ [[A]](w).

Let ρ = (p_0, i_0) ··· (p_n, i_n) be a run over a nonempty word w ∈ Σ^+ such that 1 ≤ i_j ≤ |w| for all 0 ≤ j < n. It is a left-right run if i_0 = 1 and i_n = |w| + 1. If this is the case, we say that ρ is a (→, p_0, p_n)-run. Similarly, it is a left-left (↶, p_0, p_n)-run if i_0 = 1 and i_n = 0. It is a right-left (←, p_0, p_n)-run if i_0 = |w| and i_n = 0 and it is a right-right (↷, p_0, p_n)-run if i_0 = |w| and i_n = |w| + 1. Notice that if |w| = 1, then left-right runs and right-right runs coincide, and right-left runs and left-left runs coincide as well.

Remark 4.

Given our semantics of two-way transducers, a run associates states to each position, whereas the classical semantics of one-way automata keeps the states between two positions. Then, if we consider a word w = uv and a left-left run (↶, p, q) on v, we begin on the first position of v in state p, and the state q is reached at the end of the run on the last position of u. This allows for easy sequential composition of partial runs when concatenating nonempty words, as the end of a partial run is the start of the next one.

However, in order to keep our figures as readable as possible, we will represent these states between words. A state q between two words u and v is to be placed on the first position of v if it is the start of a run going to the right, and on the last position of u otherwise. For instance, in Figure 4, one state is on the first position of u_{i+1} and another state is on the last position of u_i.

Transition monoid of a two-way automaton

Let A be a deterministic two-way automaton (2DFA) with set of states Q. When computing the transition monoid of a two-way automaton, we are interested in the behaviour of the partial runs, i.e., how these partial runs can be concatenated. Thus we abstract a given (d, p, q)-run ρ over a word w to a step (d, p, q) ∈ {→, ↶, ↷, ←} × Q^2 and we say that w realises the step (d, p, q). The transition monoid TrM of A is a subset of the powerset of steps: TrM ⊆ P({→, ↶, ↷, ←} × Q^2). The canonical surjective morphism φ : (Σ ⊎ {⊢, ⊣})* → TrM = φ((Σ ⊎ {⊢, ⊣})*) is defined for a word w ∈ (Σ ⊎ {⊢, ⊣})* as the set of steps realised by w, i.e., φ(w) = {(d, p, q) | there is a (d, p, q)-run on w} ⊆ {→, ↶, ↷, ←} × Q^2. As an example, in Figure 1, we have φ(a#) = {(→, q, q), (↷, q, q), (↶, q, q), (←, q, q), (↶, q, q), (→, q, q), (↷, q, q)}. The unit of TrM is 1 = {(→, p, p), (←, p, p) | p ∈ Q} and φ(ε) = 1.

A 2DFA is aperiodic if its transition monoid TrM is aperiodic. Also, a 2DFT is aperiodic if its underlying input 2DFA is aperiodic.

When talking about a given step (d, p, q) belonging to an element of TrM, we will sometimes forget p and q and talk about a d-step, for d ∈ {↶, ↷, →, ←}, if the states p, q are clear from the context, or are immaterial for the discussion. In this case we also refer to a step (d, p, q) as a d-step having p as the starting state and q as the final state.

As the aim of the paper is to obtain rational expressions corresponding to transformations computed by aperiodic two-way transducers, we cannot rely on extending the classical (SF=AP) star-free characterization of aperiodic languages, since the complement of a function is not a function. We solve this problem by considering the SD=AP characterization of aperiodic languages, which is based on prefix codes with bounded synchronisation delay, introduced by Schützenberger [22].

A language L is called a code if for any word u ∈ L*, there is a unique decomposition u = v_1 ··· v_n such that v_i ∈ L for 1 ≤ i ≤ n. For example, the language W = {a, ab, ba, bba} is not a code: the words abba, aba ∈ W* have decompositions a · bba = ab · ba and a · ba = ab · a respectively. A prefix code is a language L such that for any pair of words u, v, if u, uv ∈ L, then v = ε. W is not a prefix code, while W_1 = W \ {ab} and W_2 = W \ {a} are prefix codes. Prefix codes play a particular role in the sense that the unique decomposition can be obtained on the fly while reading the word from left to right.

Definition 5.

Let d be a positive integer. A prefix code C over an alphabet Σ has a synchronisation delay d (denoted d-SD) if for all u, v, w ∈ Σ*, uvw ∈ C* and v ∈ C^d implies uv ∈ C* (hence also w ∈ C*). An SD prefix code is a prefix code with a bounded synchronisation delay.

As an example, consider the prefix code C = {aa, ba} and the word ba(aa)^d ∈ C*. We have ba(aa)^d = uvw with u = b, v = (aa)^d ∈ C^d and w = a. Since uv ∉ C*, the prefix code C is not of bounded synchronisation delay. Likewise, C = {aa} is also not of bounded synchronisation delay. On the other hand, the prefix code C = {ba} is 1-SD.

The syntax of regular expressions over the alphabet Σ is given by the grammar

E ::= ∅ | ε | a | E ∪ E | E · E | E*

where a ∈ Σ. We say that an expression is ε-free (resp. ∅-free) if it does not use ε (resp. ∅) as a subexpression. The semantics of a regular expression E is a regular language over Σ* denoted L(E).

An SD-regular expression is a regular expression where Kleene stars are restricted to SD prefix codes: if E* is a subexpression then L(E) is a prefix code with bounded synchronization delay. Thus, the regular expression (ba)* is an SD-regular expression while (aa)* is not.

The relevance of SD-regular expressions comes from the fact that they are a complement-free characterization of aperiodic languages.

Theorem 6. [22] A language L is recognized by an aperiodic monoid if, and only if, there exists an SD-regular expression E such that L = L(E).

Theorem 10 concretizes this result, and extends it to get more specific expressions which are (i) unambiguous, a property required for the regular combinators expressing functions over words, and (ii) stabilising, which is a new notion introduced below that suits our need for characterizing runs of aperiodic two-way transducers. Our proof technique follows the local divisor technique, which was notably used by Diekert and Kufleitner to lift the result of Schützenberger to infinite words [12, 13].

Figure 2 (state diagram not reproduced): For u_i ∈ a*b, an aperiodic 2DFT A computing the partial function [[A]](b u_1 u_2 ··· u_n a^k) = u_3 u_1 u_4 u_2 ··· u_n u_{n−2} a^k if n ≥ 3, and a^k if n = 2. The domain is b(a*b)^{≥2} a*.

A regular expression E is unambiguous if it satisfies the following:
- for each subexpression E_1 ∪ E_2 we have L(E_1) ∩ L(E_2) = ∅,
- for each subexpression E_1 · E_2, each word w ∈ L(E_1 · E_2) has a unique factorisation w = uv with u ∈ L(E_1) and v ∈ L(E_2),
- for each subexpression E_1*, the language L(E_1) is a code, i.e., each word w ∈ L(E_1*) has a unique factorisation w = v_1 ··· v_n with v_i ∈ L(E_1) for 1 ≤ i ≤ n.

Definition 7.

Given an aperiodic monoid M and X ⊆ M, we say that X is n-stabilising if xy = x for all x ∈ X^n and y ∈ X. We say that X is stabilising if it is n-stabilising for some n ≥ 1.

Remark. Stabilisation generalizes aperiodicity in some sense. For aperiodicity, we require x^n = x^{n+1} for each element x ∈ M and some n ∈ ℕ, i.e., all singleton subsets of M should be stabilising.

Example 8.

Continuing Example 2, any subset X ⊆ {s_1, …, s_n} ⊆ Ũ_n is 1-stabilising. As another example, consider the aperiodic 2DFT A in Figure 1, and consider its transition monoid TrM. Clearly, TrM is an aperiodic monoid. Let φ be the morphism from (Σ ⊎ {⊢, ⊣})* to TrM. Consider the subset Z = {Y, Y^2} of TrM where Y = φ(a#a$):

Y = {(↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (↷, q, q), (←, q, q), (↷, q, q), (↷, q, q)}
Y^2 = {(↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q)}.

It can be seen that Y^3 = Y^2, hence Z is 2-stabilising.

Let φ : Σ* → M be a morphism. We say that a regular expression E is φ-stabilising (or simply stabilising when φ is clear from the context) if for each subexpression F* of E, the set φ(L(F)) is stabilising.

Continuing Example 8, we can easily see that φ(a) is idempotent and we get φ(a^+ # a^+ $) = {Y}. Since Y^3 = Y^2, we deduce that (aa*#aa*$)* is a stabilising expression. Notice also that, by definition, expressions without a Kleene star are vacuously stabilising.

Example 9.

As a less trivial example to illustrate stabilising expressions, consider the 2DFT A in Figure 2, whose domain is the language b(a*b)^{≥2} a*. Consider b(a*b)^{≥2} ⊆ dom(A). Note that a*b is a prefix code with synchronisation delay 1. Let X = φ(a*b), where φ is the morphism from (Σ ⊎ {⊢, ⊣})* to TrM. We will see that X stabilises.

First, we have X = {Y_1, Y_2} where

Y_1 = φ(b) = {(→, s, q), (→, q, q), (→, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (→, q, q), (↷, s, q), (↷, q, q), (↷, q, q), (←, q, q), (←, q, q), (←, q, q), (↷, q, q), (↷, q, q)}
Y_2 = φ(a^+ b) = φ(ab) = {(→, q, q), (→, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (↷, s, q), (↷, q, q), (↷, q, q), (←, q, q), (←, q, q), (←, q, q), (↷, q, q), (↷, q, q)}

Next, we can check that X^2 = {Y_3, Y_4} where

Y_3 = Y_1 Y_1 = Y_1 Y_2 = {(→, s, q), (→, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (→, q, q), (↷, s, q), (↷, q, q), (↷, q, q), (←, q, q), (←, q, q), (↷, q, q), (↷, q, q), (↷, q, q)}
Y_4 = Y_2 Y_1 = Y_2 Y_2 = {(→, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (↷, s, q), (↷, q, q), (↷, q, q), (←, q, q), (←, q, q), (↷, q, q), (↷, q, q), (↷, q, q)}

Then, we have X^4 = {Z_1, Z_2} where

Z_1 = Y_3 Y_3 = Y_3 Y_4 = {(→, s, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (→, q, q), (↷, s, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q)}
Z_2 = Y_4 Y_3 = Y_4 Y_4 = {(↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (↶, q, q), (→, q, q), (↷, s, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q), (↷, q, q)}

Finally, we can easily check that Z_1 Y_1 = Z_1 Y_2 = Z_1 and Z_2 Y_1 = Z_2 Y_2 = Z_2. Therefore, X is 4-stabilising. Moreover, b(a*b)^{≥3} ⊆ φ^{-1}(Z_1) and a^+ b(a*b)^{≥3} ⊆ φ^{-1}(Z_2).

Given a morphism φ from Σ* to some aperiodic monoid M, our goal is to build, for each language φ^{-1}(s) with s ∈ M, an SD-regular expression which is both unambiguous and stabilising. The proof is by induction on the monoid M via the local divisor technique, similar to Diekert and Kufleitner [12, 13, 14], and to Perrin and Pin [20, Chapter VIII, Section 6.1], with the objective to get stronger forms of SD-regular expressions.

Theorem 10.

Given a morphism φ from the free monoid Σ* to a finite aperiodic monoid M, for each s ∈ M there exists an unambiguous, stabilising, SD-regular expression E_s such that L(E_s) = φ^{-1}(s).

The proof of this theorem makes crucial use of marked substitutions (see [20]) that we define and study in the next section.
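Two of the side conditions behind Theorem 10, n-stabilising sets (Definition 7) and bounded synchronisation delay (Definition 5), can be explored by brute force on small instances. The sketch below is illustrative and rests on assumptions: codes are finite sets of strings, the monoid is given by a multiplication function, the search for a delay violation is bounded by `max_pad` (so a `None` result only means no counterexample was found within the bound), and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Set, Tuple

def is_n_stabilising(X: Set[int], n: int, mul: Callable[[int, int], int]) -> bool:
    """Check x*y == x for all x in X^n and y in X (n >= 1)."""
    power = set(X)
    for _ in range(n - 1):
        power = {mul(x, y) for x in power for y in X}
    return all(mul(x, y) == x for x in power for y in X)

def in_star(word: str, code: Iterable[str]) -> bool:
    """Decide whether `word` is a concatenation of words of `code`."""
    code = list(code)
    ok = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        ok[i] = any(
            len(c) <= i and word[i - len(c):i] == c and ok[i - len(c)] for c in code
        )
    return ok[len(word)]

def delay_violation(
    code: Set[str], d: int, alphabet: str, max_pad: int
) -> Optional[Tuple[str, str, str]]:
    """Search (u, v, w) with v in code^d, uvw in code*, uv not in code*, |u|,|w| <= max_pad."""
    pads = ["".join(p) for k in range(max_pad + 1) for p in product(alphabet, repeat=k)]
    for parts in product(sorted(code), repeat=d):
        v = "".join(parts)
        for u in pads:
            for w in pads:
                if in_star(u + v + w, code) and not in_star(u + v, code):
                    return u, v, w
    return None

# Multiplication of the monoid of Example 2: 0 is the unit, s_i * s_j = s_i.
def mul(x: int, y: int) -> int:
    return y if x == 0 else x

print(is_n_stabilising({1, 2, 3}, 1, mul))          # True
print(delay_violation({"ba"}, 1, "ab", 3))          # None within the bound
print(delay_violation({"aa", "ba"}, 2, "ab", 1) is not None)  # True
```

For C = {aa, ba} a violation exists for every d, in line with the example given after Definition 5, while for C = {ba} no violation exists for d = 1.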

Let A, B be finite alphabets. A map α : A → P(B*) is called a marked substitution if it satisfies the following two properties:
- There exists a partition B = B_1 ⊎ B_2 such that for all a in A, α(a) ⊆ B_1* B_2,
- For all a_1 and a_2 in A, a_1 ≠ a_2 implies α(a_1) ∩ α(a_2) = ∅.

A marked substitution α : A → P(B*) can be naturally extended to words in A* using concatenation of languages, i.e., to a morphism from the free monoid A* to (P(B*), ·, {ε}). It is then further lifted to languages L ⊆ A* by union: α(L) = ⋃_{w ∈ L} α(w).

Lemma 11 ([20] Chapter VIII, Proposition 6.2). Let α : A → P(B*) be a marked substitution, and X ⊆ A^+ be a prefix code with synchronisation delay d. Then Y = α(X) ⊆ (B_1* B_2)^+ is a prefix code with synchronisation delay d + 1.

Proof.

First, since B_1 and B_2 are disjoint, B_1* B_2 ⊆ B* is a prefix code. Hence, given a word w ∈ α(A*) ⊆ (B_1* B_2)*, there exists a unique decomposition w = w_1 ··· w_n such that w_i ∈ B_1* B_2 for 1 ≤ i ≤ n. Now since images of different letters from A are disjoint, there exists at most one a_i such that w_i ∈ α(a_i). We deduce that there is exactly one word w′ ∈ A* such that w ∈ α(w′). This word is denoted α^{-1}(w).

Now, we prove that Y is a prefix code. Let v, w ∈ α(A*) ⊆ (B_1* B_2)* and assume that v is a prefix of w. Write w = w_1 ··· w_n with w_i ∈ B_1* B_2. Since v ends with a letter from B_2 we deduce that v = w_1 ··· w_i for some 1 ≤ i ≤ n. Let w′ = α^{-1}(w) = a_1 ··· a_n. We have v′ = α^{-1}(v) = a_1 ··· a_i. Now, if v, w ∈ α(X) then we get v′, w′ ∈ X. Since X is a prefix code we get i = n. Hence v = w, proving that Y is also a prefix code.

Finally, we prove that Y has synchronization delay d + 1. Let u, v, w in B* be such that uvw ∈ Y* and v ∈ Y^{d+1}. We need to prove that uv ∈ Y*. Since v ∈ Y^{d+1}, it can be written v = v_0 v_1 ··· v_d with v_i ∈ Y for 0 ≤ i ≤ d. Then, let us remark that α(A) ⊆ B_1* B_2 is a prefix code with synchronisation delay 1. Since u v_0 ··· v_d w ∈ Y* ⊆ α(A)* and v_0 ∈ Y ⊆ α(A)^+, we deduce that u v_0 belongs to α(A)* = α(A*), as well as v_1 ··· v_d and w. Let r = α^{-1}(u v_0), s = α^{-1}(v_1 ··· v_d) and t = α^{-1}(w). We have rst = α^{-1}(uvw) and since uvw ∈ Y* = α(X*), we deduce that rst ∈ X*. Similarly, from v_1 ··· v_d ∈ Y^d = α(X)^d, we get s ∈ X^d. Now, X has synchronisation delay d. Therefore, rs ∈ X*, meaning that uv = u v_0 ··· v_d ∈ α(rs) ⊆ α(X*) = Y*. ◀

Marked substitutions also preserve unambiguity of union, concatenation and Kleene star.
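The first step of the proof of Lemma 11, namely that a word of α(A*) determines a unique preimage α^{-1}(w), can be sketched as a decoding procedure: cut the word after each marker letter of B_2, then identify the unique letter whose image contains each block. The code below is a hypothetical illustration, assuming single-character letters and images α(a) given as Python regular expressions; the example substitution with B_1 = {a} and B_2 = {b, c} is not taken from the paper.

```python
import re
from typing import Dict, Optional, Set

def decode(w: str, alpha: Dict[str, str], markers: Set[str]) -> Optional[str]:
    """Return the unique word w' over A with w in alpha(w'), or None if w is not in alpha(A*).

    alpha maps each letter of A to a regex over B whose language is assumed to lie
    in B1* B2; `markers` is the set B2 of marker letters.
    """
    blocks, current = [], ""
    for ch in w:
        current += ch
        if ch in markers:          # a marker closes the current block
            blocks.append(current)
            current = ""
    if current:                    # trailing letters without a closing marker
        return None
    decoded = []
    for block in blocks:
        hits = [a for a, rx in alpha.items() if re.fullmatch(rx, block)]
        if len(hits) != 1:         # images of distinct letters are disjoint
            return None
        decoded.append(hits[0])
    return "".join(decoded)

# Hypothetical marked substitution: alpha(x) = a*b and alpha(y) = a*c.
alpha = {"x": r"a*b", "y": r"a*c"}
print(decode("aabcab", alpha, {"b", "c"}))   # xyx
print(decode("aa", alpha, {"b", "c"}))       # None
```

The left-to-right scan mirrors the observation that B_1* B_2 is a prefix code, so the block decomposition is obtained on the fly.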

Lemma 12.

Let $\alpha : A \to \mathcal{P}(B^*)$ be a marked substitution and let $L_1, L_2 \subseteq A^*$.
If the union $L_1 \cup L_2$ is unambiguous then so is $\alpha(L_1) \cup \alpha(L_2)$.
If the concatenation $L_1 \cdot L_2$ is unambiguous then so is $\alpha(L_1) \cdot \alpha(L_2)$.
If the Kleene star $L_1^*$ is unambiguous then so is $\alpha(L_1)^*$.

Proof.

As stated in the previous proof, a marked substitution is one-to-one. We denote by $\alpha^{-1}(w)$ the unique inverse of $w$, for $w$ in $\alpha(A^*)$.

If $w \in \alpha(L_1) \cap \alpha(L_2)$ then $\alpha^{-1}(w) \in L_1 \cap L_2$. This shows that unambiguity of union is preserved by $\alpha$.

Assume now that the concatenation $L_1 \cdot L_2$ is unambiguous. Let $w \in \alpha(L_1) \cdot \alpha(L_2)$ and consider its unique factorisation $w = v_1 b_1 \cdots v_n b_n$ as above and $\alpha^{-1}(w) = a_1 \cdots a_n$. Since $\alpha(L_i) \subseteq (B_1^* B_2)^*$, a factorisation of $w$ according to $\alpha(L_1) \cdot \alpha(L_2)$ must be of the form $w = (v_1 b_1 \cdots v_i b_i) \cdot (v_{i+1} b_{i+1} \cdots v_n b_n)$ with $a_1 \cdots a_i \in L_1$ and $a_{i+1} \cdots a_n \in L_2$. From unambiguity of the product $L_1 \cdot L_2$ we deduce that such a factorisation of $w$ is unique. Hence, the product $\alpha(L_1) \cdot \alpha(L_2)$ is unambiguous.

We can prove similarly that $\alpha$ preserves unambiguity of Kleene stars.

We will be interested in marked substitutions that are defined by regular expressions. A regular marked substitution (RMS) is a map $\alpha : A \to \mathsf{Reg}(B^*)$ which assigns a regular expression $\alpha(a)$ over $B$ to each letter $a \in A$ such that $\tilde{\alpha} : A \to \mathcal{P}(B^*)$ defined by $\tilde{\alpha}(a) = \mathcal{L}(\alpha(a))$ is a marked substitution.

Let $\alpha : A \to \mathsf{Reg}(B^*)$ be an RMS and let $E$ be a regular expression over $A$. We define $\alpha(E) = E[\alpha(a)/a, \forall a \in A]$ to be the regular expression over $B$ obtained from $E$ by substituting each occurrence of a letter $a \in A$ with the expression $\alpha(a)$. Notice that $\alpha$ is compositional: $\alpha(E_1 \cup E_2) = \alpha(E_1) \cup \alpha(E_2)$, $\alpha(E_1 \cdot E_2) = \alpha(E_1) \cdot \alpha(E_2)$ and $\alpha(E_1^*) = \alpha(E_1)^*$. In particular, we have $\tilde{\alpha}(\mathcal{L}(E)) = \mathcal{L}(\alpha(E))$.

Further, we say that an RMS $\alpha$ is unambiguous (URMS) if $\alpha(a)$ is unambiguous for each $a \in A$. Similarly, an RMS $\alpha$ is SD-regular (SDRMS) if $\alpha(a)$ is an SD-regular expression for each $a \in A$. We obtain:

Corollary 13.

Let $\alpha : A \to \mathsf{Reg}(B^*)$ be an RMS and $E$ be a regular expression over $A$.
1. If $\alpha$ and $E$ are SD-regular, then $\alpha(E)$ is SD-regular.
2. If $\alpha$ and $E$ are unambiguous, then $\alpha(E)$ is unambiguous.

Proof.

1. Let $F^*$ be a subexpression of $\alpha(E)$. If $F^*$ is a subexpression of some $\alpha(a)$ then, $\alpha(a)$ being SD-regular, we obtain that $\mathcal{L}(F)$ is an SD prefix code. Otherwise, $F = \alpha(G)$ where $G^*$ is a subexpression of $E$. Since $E$ is SD-regular, $\mathcal{L}(G)$ is an SD prefix code. By Lemma 11 we deduce that $\mathcal{L}(F) = \tilde{\alpha}(\mathcal{L}(G))$ is an SD prefix code.

2. First, we know that each $\alpha(a)$ is unambiguous. Next, a subexpression of $\alpha(E)$ which is not a subexpression of some $\alpha(a)$ must be of the form $\alpha(F)$ where $F$ is a subexpression of $E$. We conclude easily using unambiguity of $E$ and Lemma 12.

Proof.

We first consider the set of neutral letters, i.e., letters whose image is the neutral element $1$ of $M$. To ease the proof, we first explain how to handle them, and in the rest of the proof, focus on the case where we do not have neutral letters.

Let $\varphi : \Sigma^* \to M$ be a morphism and $\Sigma_0 = \{a \in \Sigma \mid \varphi(a) = 1\}$ be the set of neutral letters. Further, let $\Sigma_1 = \Sigma \setminus \Sigma_0$ and let $\varphi_1 : \Sigma_1^* \to M$ be the restriction of $\varphi$ to $\Sigma_1^*$. Let $\alpha : \Sigma_1 \to \mathsf{Reg}(\Sigma^*)$ be the regular marked substitution defined by $\alpha(a) = \Sigma_0^* a$. Clearly, $\alpha$ is unambiguous and since $\Sigma_0$ is a 1-SD prefix code we get that $\alpha$ is SD-regular. By Corollary 13 we deduce that $\alpha$ preserves unambiguity and also SD-expressions. It also preserves stabilising expressions, i.e., if $E \in \mathsf{Reg}(\Sigma_1^*)$ is $\varphi_1$-stabilising then $\alpha(E) \in \mathsf{Reg}(\Sigma^*)$ is $\varphi$-stabilising. Indeed, $\varphi(\Sigma_0)$ is 1-stabilising. Further, if $G^*$ is a subexpression of $\alpha(E)$ different from $\Sigma_0^*$ then there is a subexpression $F^*$ of $E$ such that $G = \alpha(F)$. Hence, $\mathcal{L}(G) = \tilde{\alpha}(\mathcal{L}(F))$ and $X = \varphi(\mathcal{L}(G)) = \varphi_1(\mathcal{L}(F))$ is stabilising.

Now, suppose we have unambiguous, stabilising, SD-expressions $E_s$ for $\varphi_1$ and each $s \in M$: $\mathcal{L}(E_s) = \varphi_1^{-1}(s)$. We deduce that $E'_s = \alpha(E_s) \cdot \Sigma_0^*$ is an unambiguous, stabilising, SD-expression. Moreover, we have $\mathcal{L}(E'_s) = \varphi^{-1}(s)$.

In the rest of the proof, we assume that the morphism $\varphi$ has no neutral letters. The proof is by induction on the size of $M$, using a result from Perrin and Pin [20, Chapter XI, Proposition 4.14] stating that if $\varphi$ is a surjective morphism from $\Sigma^*$ to a finite aperiodic monoid $M$, then one of the following cases holds:

$M$ is a cyclic monoid, meaning that $M$ is generated by a single element.
$M$ is isomorphic to $\tilde{U}_n$ for some $n$.
There is a partition $\Sigma = A \uplus B$ such that $\varphi(A^*)$ and $\varphi((A^* B)^*)$ are proper submonoids of $M$.

We now treat the three cases above.

$M$ is a cyclic monoid. Then $M$ is of the form $\{1, s, s^2, \ldots, s^n\}$ with $s^i s^j = s^{i+j}$ if $i + j \le n$ and $s^n$ otherwise.
Notice that since we have no neutral letters, $\varphi^{-1}(1) = \{\varepsilon\}$. For $1 \le i \le n$, we denote by $\Sigma_i$ the set of letters whose image is $s^i$. Now, we define inductively stabilising, unambiguous, SD-regular expressions $E_j$ such that $\mathcal{L}(E_j) = \varphi^{-1}(s^j)$ for $1 \le j \le n$. Let $E_1 = \Sigma_1$. Then, for $1 < j < n$ we let $E_j = \Sigma_j \cup$ [ ≤ i
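For intuition about the cyclic case, the monoid $\{1, s, \ldots, s^n\}$ can be modelled by exponents with saturating addition. The sketch below is illustrative only: the function names and the sample morphism `PHI` are assumptions, not objects defined in the paper.

```python
def cyclic_product(i, j, n):
    """Product s^i * s^j in {1, s, ..., s^n}: exponents add up and saturate at n."""
    return min(i + j, n)


def image(word, letter_exponent, n):
    """Exponent of phi(word), for a morphism given by the exponent of each letter."""
    result = 0  # exponent 0 stands for the neutral element 1
    for letter in word:
        result = cyclic_product(result, letter_exponent[letter], n)
    return result


# Hypothetical morphism without neutral letters: phi(a) = s, phi(b) = s^2, n = 3.
PHI = {"a": 1, "b": 2}
N = 3
```

With no neutral letters, only the empty word is mapped to exponent 0, and every sufficiently long word is mapped to the absorbing element $s^n$.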

For each $v \in \Gamma^*$ we have a constant function $f_v$ defined by $f_v(u) = v$ for all $u \in \Sigma^*$. Abusing notations, we simply denote the constant function $f_v$ by $v$. We denote by $\bot : \Sigma^* \to \Gamma^*$ the function with empty domain. These atomic functions are the most simple ones.

We will use two equivalent ways of defining a function by cases. First, the if-then-else construct is given by $h = L \,?\, f : g$ where $f, g : \Sigma^* \to \Gamma^*$ are functions and $L \subseteq \Sigma^*$ is a language. We have $\mathsf{dom}(h) = (\mathsf{dom}(f) \cap L) \cup (\mathsf{dom}(g) \setminus L)$. Then, for $w \in \mathsf{dom}(h)$ we have

$$h(w) = \begin{cases} f(w) & \text{if } w \in L \\ g(w) & \text{otherwise.} \end{cases}$$

We will often use this case definition with $L = \mathsf{dom}(f)$. To simplify notations we define $f + g = \mathsf{dom}(f) \,?\, f : g$. Note that $\mathsf{dom}(f + g) = \mathsf{dom}(f) \cup \mathsf{dom}(g)$ but the sum is not commutative and $g + f = \mathsf{dom}(g) \,?\, g : f$. For $w \in \mathsf{dom}(f) \cap \mathsf{dom}(g)$ we have $(f + g)(w) = f(w)$ and $(g + f)(w) = g(w)$. When the domains of $f$ and $g$ are disjoint then $f + g$ and $g + f$ are equivalent functions with domain $\mathsf{dom}(f) \uplus \mathsf{dom}(g)$. In all cases the sum is associative and the sum notation is particularly useful when applied to a sequence $f_1, \ldots, f_n$ of functions:

$$\sum_{1 \le i \le n} f_i = f_1 + \cdots + f_n = \mathsf{dom}(f_1) \,?\, f_1 : \mathsf{dom}(f_2) \,?\, f_2 : \cdots \mathsf{dom}(f_{n-1}) \,?\, f_{n-1} : f_n$$

If the domains of the functions are pairwise disjoint then this sum is associative and commutative.

Further, we let

$L \triangleright f = L \,?\, f : \bot$ the function $f$ restricted to $L \cap \mathsf{dom}(f)$. When $L = \{w\}$ is a singleton, we simply write $w \triangleright f$.

The Hadamard product of two functions $f, g : \Sigma^* \to \Gamma^*$ first applies $f$ and then applies $g$. It is denoted by $f \odot g$. Its domain is $\mathsf{dom}(f) \cap \mathsf{dom}(g)$ and $(f \odot g)(u) = f(u) g(u)$ for each input word $u$ in its domain.

Consider two functions $f, g : \Sigma^* \to \Gamma^*$. The unambiguous Cauchy product of $f$ and $g$ is the function $f \cdot g$ whose domain is the set of words $w \in \Sigma^*$ which admit a unique factorization $w = uv$ with $u \in \mathsf{dom}(f)$ and $v \in \mathsf{dom}(g)$, and in this case, the computed output is $f(u) g(v)$. Contrary to the Hadamard product which reads its full input word $w$ twice, first applying $f$ and then applying $g$, the Cauchy product splits unambiguously its input word $w$ as $uv$, applies $f$ on $u$ and then $g$ on $v$.

Sometimes we may want to reverse the output and produce $g(v) f(u)$. This reversed Cauchy product can be defined using the Hadamard product as

$$f \cdot_r g = ((\mathsf{dom}(f) \triangleright \varepsilon) \cdot g) \odot (f \cdot (\mathsf{dom}(g) \triangleright \varepsilon))$$

k-chained Kleene-star and its reverse

Let $L \subseteq \Sigma^*$ be a code, let $k \ge 1$ and let $f : \Sigma^* \to \Gamma^*$ be a partial function. We define the $k$-chained Kleene-star $[L, f]^{k\star} : \Sigma^* \to \Gamma^*$ and its reverse $[L, f]^{k\star}_r : \Sigma^* \to \Gamma^*$ as follows.

The domain of both these functions is contained in $L^*$, the set of words having a (unique) factorization over the code $L$. Let $w \in L^*$ and consider its unique factorization $w = u_1 u_2 \cdots u_n$ with $n \ge 0$ and $u_i \in L$ for all $1 \le i \le n$. Then, $w \in \mathsf{dom}([L, f]^{k\star}) = \mathsf{dom}([L, f]^{k\star}_r)$ if $u_{i+1} \cdots u_{i+k} \in \mathsf{dom}(f)$ for all $0 \le i \le n - k$ and in this case we set

$$[L, f]^{k\star}(w) = f(u_1 \cdots u_k) \cdot f(u_2 \cdots u_{k+1}) \cdots f(u_{n-k+1} \cdots u_n)$$
$$[L, f]^{k\star}_r(w) = f(u_{n-k+1} \cdots u_n) \cdots f(u_2 \cdots u_{k+1}) \cdot f(u_1 \cdots u_k).$$

Notice that when $n < k$, the right-hand side is an empty product and we get $[L, f]^{k\star}(w) = \varepsilon$ and $[L, f]^{k\star}_r(w) = \varepsilon$. When $k = 1$ and $L = \mathsf{dom}(f)$ is a code then we simply write $f^{\star} = [\mathsf{dom}(f), f]^{1\star}$ and $f^{\star}_r = [\mathsf{dom}(f), f]^{1\star}_r$. We have $\mathsf{dom}(f^{\star}) = \mathsf{dom}(f^{\star}_r) = L^*$.

The $k$-chained Kleene star was also defined in [3, 11]; however as we will see below, we use it in a restricted way for aperiodic functions.

SD-regular transducer expressions (SDRTEs) are obtained from classical regular transducer expressions (RTEs) [3, 11] by restricting the $k$-chained Kleene-star $[L, f]^{k\star}$ and its reverse $[L, f]^{k\star}_r$ to aperiodic languages $L$ that are prefix codes of bounded synchronisation delay. The if-then-else choice $L \,?\, f : g$ is also restricted to aperiodic languages $L$. Hence, the syntax of SDRTEs is given by the grammar:

$$C ::= \bot \mid v \mid L \,?\, C : C \mid C \odot C \mid C \cdot C \mid [L, C]^{k\star} \mid [L, C]^{k\star}_r$$

where $v \in \Gamma^*$, and $L \subseteq \Sigma^*$ ranges over aperiodic languages (or equivalently SD-regular expressions), which are also prefix codes with bounded synchronisation delay for $[L, C]^{k\star}$ and $[L, C]^{k\star}_r$.

The semantics of SDRTEs is defined inductively. $[\![\bot]\!]$ is the function which is nowhere defined, $[\![v]\!]$ is the constant function such that $[\![v]\!](u) = v$ for all $u \in \Sigma^*$, and the semantics of the other combinators has been defined in the above sections.

As discussed in Section 4.2, we will use binary sums $C_1 + C_2 = \mathsf{dom}(C_1) \,?\, C_1 : C_2$ and generalised sums $\sum_i C_i$. Also, we use the abbreviation $L \triangleright C = L \,?\, C : \bot$ and the reversed Cauchy product $C_1 \cdot_r C_2 = ((\mathsf{dom}(C_1) \triangleright \varepsilon) \cdot C_2) \odot (C_1 \cdot (\mathsf{dom}(C_2) \triangleright \varepsilon))$.

Lemma 14. If $C$ is an SDRTE, then $\mathsf{dom}(C)$ is an aperiodic language.

Proof.

We prove the statement by induction on the syntax of SDRTEs. We recall that aperiodic languages are closed under concatenation, union, intersection and complement.

$\mathsf{dom}(\bot) = \emptyset$ and $\mathsf{dom}(v) = \Sigma^*$ are aperiodic languages.

$C = L \,?\, C_1 : C_2$. By induction, $\mathsf{dom}(C_1)$ and $\mathsf{dom}(C_2)$ are aperiodic. We have $\mathsf{dom}(C) = (L \cap \mathsf{dom}(C_1)) \cup (\mathsf{dom}(C_2) \setminus L)$, which is aperiodic thanks to the closure properties of aperiodic languages.

$C = C_1 \odot C_2$. By induction, $\mathsf{dom}(C_1)$ and $\mathsf{dom}(C_2)$ are aperiodic. We deduce that $\mathsf{dom}(C) = \mathsf{dom}(C_1) \cap \mathsf{dom}(C_2)$ is aperiodic.

$C = C_1 \cdot C_2$. By induction, $L_1 = \mathsf{dom}(C_1)$ and $L_2 = \mathsf{dom}(C_2)$ are aperiodic. We have $\mathsf{dom}(C) \subseteq \mathsf{dom}(C_1) \cdot \mathsf{dom}(C_2)$. However, $C$ is undefined on words having more than one decomposition. A word which admits at least two decompositions can be written $uvw$ with $v \neq \varepsilon$, $u, uv \in L_1$ and $vw, w \in L_2$. Let $\varphi : \Sigma^* \to M$ be a morphism to a finite aperiodic monoid recognising both $L_1$ and $L_2$. We have $L_1 = \varphi^{-1}(P_1)$ and $L_2 = \varphi^{-1}(P_2)$ for some $P_1, P_2 \subseteq M$. The set $L_3$ of words having at least two decompositions is precisely

$$L_3 = \bigcup_{r, s, t \,\mid\, r, rs \in P_1 \,\wedge\, st, t \in P_2} \varphi^{-1}(r) \, (\varphi^{-1}(s) \setminus \{\varepsilon\}) \, \varphi^{-1}(t)$$

which is aperiodic. We deduce that $\mathsf{dom}(C) = (L_1 \cdot L_2) \setminus L_3$ is aperiodic.

$C = [L, C']^{k\star}$. By induction, $\mathsf{dom}(C')$ is aperiodic and by definition $L$ is an aperiodic SD prefix code. Hence $L^*$ is aperiodic. Notice that $\mathsf{dom}(C) \subseteq L^*$ but $C$ is undefined on words $w = u_1 \cdots u_n$ with $u_i \in L$ if there is a factor $u_{i+1} \cdots u_{i+k}$ which is not in $\mathsf{dom}(C')$. We deduce that $\mathsf{dom}(C) = L^* \setminus (L^* (L^k \setminus \mathsf{dom}(C')) L^*)$ which is aperiodic thanks to the closure properties given above.

Notice that $\mathsf{dom}([L, C']^{k\star}_r) = \mathsf{dom}([L, C']^{k\star})$, which is aperiodic as proved above.

Proposition 15.

Given an SDRTE $C$ and a letter $a \in \Sigma$,
1. we can construct an SDRTE $a^{-1}C$ such that $\mathsf{dom}(a^{-1}C) = a^{-1}\mathsf{dom}(C)$ and $[\![a^{-1}C]\!](w) = [\![C]\!](aw)$ for all $w \in a^{-1}\mathsf{dom}(C)$,
2. we can construct an SDRTE $Ca^{-1}$ such that $\mathsf{dom}(Ca^{-1}) = \mathsf{dom}(C)a^{-1}$ and $[\![Ca^{-1}]\!](w) = [\![C]\!](wa)$ for all $w \in \mathsf{dom}(C)a^{-1}$.

Proof.

We recall that aperiodic languages are closed under left and right quotients. The proof is by structural induction on the given SDRTE $C$ over alphabet $\Sigma$. We only construct below the SDRTEs for the left quotient. Formulas for the right quotient can be obtained similarly. A point to note is that, unlike the left quotient, the right quotient of a language might break its prefix code property, which could be a problem if applied to a parsing language $L$ used for $k$-star or its reverse. However, the quotient by a letter only modifies the first or last copy of $L$, which can be decoupled so that the remaining iterations are still performed with the same parsing language $L$.

Basic cases.

We define $a^{-1}\bot = \bot$ and $a^{-1}v = v$ for $v \in \Gamma^*$.

If-then-else.

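Let $C = L \,?\, C_1 : C_2$. We define $a^{-1}C = a^{-1}L \,?\, a^{-1}C_1 : a^{-1}C_2$.

Recall that $\mathsf{dom}(C) = (\mathsf{dom}(C_1) \cap L) \cup (\mathsf{dom}(C_2) \setminus L)$. We deduce that

$$a^{-1}\mathsf{dom}(C) = ((a^{-1}\mathsf{dom}(C_1)) \cap (a^{-1}L)) \cup ((a^{-1}\mathsf{dom}(C_2)) \setminus (a^{-1}L)) = \mathsf{dom}(a^{-1}L \,?\, a^{-1}C_1 : a^{-1}C_2)$$

Moreover, for $w \in a^{-1}\mathsf{dom}(C)$, we have

$$[\![C]\!](aw) = \begin{cases} [\![C_1]\!](aw) & \text{if } aw \in L \\ [\![C_2]\!](aw) & \text{otherwise} \end{cases} = \begin{cases} [\![a^{-1}C_1]\!](w) & \text{if } w \in a^{-1}L \\ [\![a^{-1}C_2]\!](w) & \text{otherwise} \end{cases} = [\![a^{-1}L \,?\, a^{-1}C_1 : a^{-1}C_2]\!](w)$$

Hadamard product.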

Let $C = C_1 \odot C_2$. We define $a^{-1}C = a^{-1}C_1 \odot a^{-1}C_2$.

Recall that $\mathsf{dom}(C) = \mathsf{dom}(C_1) \cap \mathsf{dom}(C_2)$. We deduce that

$$a^{-1}\mathsf{dom}(C) = (a^{-1}\mathsf{dom}(C_1)) \cap (a^{-1}\mathsf{dom}(C_2)) = \mathsf{dom}(a^{-1}C_1 \odot a^{-1}C_2)$$

Moreover, for $w \in a^{-1}\mathsf{dom}(C)$, we have

$$[\![C]\!](aw) = [\![C_1]\!](aw)\,[\![C_2]\!](aw) = [\![a^{-1}C_1]\!](w)\,[\![a^{-1}C_2]\!](w) = [\![a^{-1}C_1 \odot a^{-1}C_2]\!](w)$$

Cauchy product.

Let $C = C_1 \cdot C_2$. The SDRTE $a^{-1}C$ is the unambiguous sum of two expressions depending on whether the letter $a$ is removed from $C_1$ or from $C_2$. Hence, we let $C' = (a^{-1}C_1) \cdot C_2$ and $C'' = (\varepsilon \triangleright [\![C_1]\!](\varepsilon)) \cdot (a^{-1}C_2)$. Notice that $\mathsf{dom}(C'') = \emptyset$ when $\varepsilon \notin \mathsf{dom}(C_1)$ (i.e., $[\![C_1]\!](\varepsilon) = \bot$). Now, we define $a^{-1}C = (a^{-1}\mathsf{dom}(C)) \triangleright (C' + C'')$.

Let $w \in a^{-1}\mathsf{dom}(C)$. Then $aw$ admits a unique factorization $aw = uv$ with $u \in \mathsf{dom}(C_1)$ and $v \in \mathsf{dom}(C_2)$. There are two exclusive cases.

If $u \neq \varepsilon$ then $u = au'$ with $u' \in a^{-1}\mathsf{dom}(C_1)$. The word $w$ admits a unique factorization according to $\mathsf{dom}(a^{-1}C_1)\,\mathsf{dom}(C_2)$ which is $w = u'v$. Hence, $w \in \mathsf{dom}(C')$ and
$$[\![C]\!](aw) = [\![C_1]\!](u)\,[\![C_2]\!](v) = [\![a^{-1}C_1]\!](u')\,[\![C_2]\!](v) = [\![C']\!](w).$$

If $u = \varepsilon$ then $v = av'$ and $v' \in a^{-1}\mathsf{dom}(C_2)$. The word $w = v'$ admits a unique factorization according to $\{\varepsilon\} \cdot \mathsf{dom}(a^{-1}C_2)$ which is $w = \varepsilon \cdot w$. Hence, $w \in \mathsf{dom}(C'')$ and
$$[\![C]\!](aw) = [\![C_1]\!](\varepsilon)\,[\![C_2]\!](v) = [\![C_1]\!](\varepsilon)\,[\![a^{-1}C_2]\!](w) = [\![C'']\!](w).$$

We deduce that $a^{-1}\mathsf{dom}(C) \subseteq \mathsf{dom}(C') \cup \mathsf{dom}(C'') = \mathsf{dom}(C' + C'')$ and $\mathsf{dom}(a^{-1}C) = a^{-1}\mathsf{dom}(C)$ as desired.

Finally, assume that $w \in \mathsf{dom}(C') \cap \mathsf{dom}(C'')$. Then, $w$ admits two factorizations $w = u'v = \varepsilon v'$ with $u' \in \mathsf{dom}(a^{-1}C_1)$, $v \in \mathsf{dom}(C_2)$, $\varepsilon \in \mathsf{dom}(C_1)$ and $v' \in \mathsf{dom}(a^{-1}C_2)$. We deduce that $aw$ admits two distinct factorizations $aw = (au')v = \varepsilon(av')$ with $au', \varepsilon \in \mathsf{dom}(C_1)$ and $v, av' \in \mathsf{dom}(C_2)$. This is a contradiction with $aw \in \mathsf{dom}(C)$.

We deduce that in both cases above, we have
$$[\![C]\!](aw) = [\![(a^{-1}\mathsf{dom}(C)) \triangleright (C' + C'')]\!](w).$$

k-star. Let $L \subseteq \Sigma^*$ be an aperiodic prefix code with bounded synchronisation delay and let $C$ be an SDRTE. Notice that, since $L$ is a code, $\varepsilon \notin L$. Also, $a^{-1}\mathsf{dom}([L, C]^{k\star}) \subseteq a^{-1}L^* = (a^{-1}L)L^*$. Let $w \in a^{-1}L^*$. It admits a unique factorization $w = u'_1 u_2 \cdots u_n$ with $u_1 = au'_1 \in L$ and $u_2, \ldots, u_n \in L$. The unique factorization of $aw$ according to the code $L$ is $aw = u_1 u_2 \cdots u_n$.

Now, by definition of $k$-star, when $n < k$ we have $[\![[L, C]^{k\star}]\!](aw) = \varepsilon$.
Hence, we let C = (cid:0) ( a − L ) · L

SDRTE a − ([ L, C ] k? ) = ( a − L ∗ ) . (cid:0) C + ( C (cid:12) C ) (cid:1) . Notice that dom ( C ) = a − L
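The Cauchy-product case of the left quotient can be checked on a small instance. The Python sketch below models partial functions as callables returning `None` when undefined; all names (`restrict`, `plus`, `cauchy`, `left_quotient`, `quotient_of_cauchy`) and the sample functions `f` and `g` are hypothetical, and the domain test is done by brute force rather than by an aperiodic language as in the paper.

```python
from itertools import product


def restrict(lang, f):
    """L |> f: the function f restricted to the words accepted by the predicate lang."""
    return lambda w: f(w) if lang(w) else None


def plus(f, g):
    """f + g = dom(f) ? f : g, where None encodes 'undefined'."""
    return lambda w: f(w) if f(w) is not None else g(w)


def cauchy(f, g):
    """Unambiguous Cauchy product: defined only if the split w = uv is unique."""
    def h(w):
        outputs = [f(w[:i]) + g(w[i:]) for i in range(len(w) + 1)
                   if f(w[:i]) is not None and g(w[i:]) is not None]
        return outputs[0] if len(outputs) == 1 else None
    return h


def left_quotient(f, a):
    """Semantic left quotient: w -> f(aw)."""
    return lambda w: f(a + w)


def quotient_of_cauchy(f, g, a):
    """a^{-1}(f . g), assembled from the two cases of the proof."""
    first = cauchy(left_quotient(f, a), g)                    # a is removed from f
    eps_part = restrict(lambda w: w == "", lambda w: f(""))   # epsilon |> f(epsilon)
    second = cauchy(eps_part, left_quotient(g, a))            # a is removed from g
    in_domain = lambda w: cauchy(f, g)(a + w) is not None     # brute-force a^{-1} dom
    return restrict(in_domain, plus(first, second))


def f(w):
    """Hypothetical sample function, defined on a*."""
    return "<" + "1" * len(w) if set(w) <= {"a"} else None


def g(w):
    """Hypothetical sample function, defined on b*."""
    return "2" * len(w) + ">" if set(w) <= {"b"} else None


def words(alphabet, max_len):
    """All words over the alphabet up to the given length."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)
```

Comparing `quotient_of_cauchy(f, g, a)(w)` with `cauchy(f, g)(a + w)` over all short words exercises both cases: the quotient by `a` uses the first summand, while the quotient by `b` falls through to the second one.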

Given an SDRTE $C$ over an alphabet $\Sigma$ and a sub-alphabet $\Sigma' \subseteq \Sigma$, we can construct an SDRTE $C'$ over alphabet $\Sigma'$ such that $\mathsf{dom}(C') \subseteq \Sigma'^*$ and for any word $w$ in $\Sigma'^*$, $[\![C]\!](w) = [\![C']\!](w)$.

Proof.

The proof itself is rather straightforward, and simply amounts to get rid of lettersthat do not appear in Σ . We ﬁrst construct C by structural induction from C , and thenprove that it is indeed an SDRTE . Thus C is deﬁned as follows:if C = ⊥ then C = ⊥ ,if C = v then C = v , with dom ( v ) = Σ here since C is over Σ ,if C = L ? C : C then C = ( L ∩ Σ ) ? C : C ,if C = C (cid:12) C then C = C (cid:12) C ,if C = C · C then C = C · C ,if C = [ L, C ] k? then C = [ L ∩ Σ , C ] k? ,if C = [ L, C ] k?r then C = [ L ∩ Σ , C ] k?r .To prove that C is SD-regular, we construct, given an SD-expression E for L over Σ, anSD-expression E over Σ for L ∩ Σ . Again, the proof is an easy structural induction:if E = ∅ then E = ∅ ,if E = a ∈ Σ then E = a ,if E = a ∈ Σ \ Σ then E = ∅ ,if E = E + E then E = E + E ,if E = E · E then E = E · E ,if E = E ∗ then E = E .We conclude by stating that being a preﬁx code with bounded synchronisation delay is aproperty preserved by subsets, hence E is an SD-expression. (cid:74) It is known [11] that the 2-chained Kleene star can simulate the k -chained Kleene-star forregular functions. However, we believe that, contrary to the case of regular functions, the k -chained Kleene-star operator cannot be simulated by the 2-chained Kleene-star whilepreserving the aperiodicity of the expression. The key idea is that, in order to simulate a k -chained Kleene-star on a SD preﬁx code L using a 2-chained Kleene-star, one needs touse L d k/ e as a parser. However, for any given preﬁx code L , the language L n for n > { aa } is not, i.e., for v = ( aa ) d that we consider, ava belongs to ( aa ) ∗ but av does not).Intuitively, parsing L n reduces to count ing factors of L modulo n , which is a classical exampleof non-aperiodicity.As an example, consider the preﬁx code L = ( a + b ) ∗ c which has synchronisation delay 1.Deﬁne a function f with domain L by f ( u u u ) = u u when u , u , u ∈ L , which can bewritten using combinators as (cid:0) ( L . ε ) · ( L . 
id ) (cid:1) (cid:12) (cid:0) ( L . id ) · ( L . ε ) (cid:1) . The identity function id can itself be written as ( a . a + b . b + c . c ) ? (see also Figure 2, which is a simpliﬁcation ofthe same function, but neverthless has the same inexpressiveness with 2 chained star). Then we believe that the function [ L, f ] ? , which associates to a word u · · · u n ∈ L ∗ the word u u u u · · · u n u n − is not deﬁnable using only 2-chained Kleene-stars. While not a proof,the intuition behind this is that, in order to construct u i +1 u i − , we need to highlight wordsfrom L . In order to do this with a 2-chained Kleene-star, it seems necessary to apply achained star with parser L , which is a preﬁx code but not of bounded synchronisation delay.A similar argument would hold for any [ L, f ] k? , k ≥ f ( u u · · · u k ) = u k u . In this section, we prove the main result of the paper, namely the equivalence between

SDRTE and aperiodic 2DFT stated in Theorem 1. The first direction, constructing an equivalent aperiodic 2DFT $\mathcal{A}$ from a given SDRTE $C$, is given by Theorem 17, while Theorem 24 handles the converse.

Theorem 17. Given an SDRTE $C$, we can construct an equivalent aperiodic 2DFT $\mathcal{A}$ with $[\![C]\!] = [\![\mathcal{A}]\!]$.

Proof.

We construct A by induction on the structure of the SDRTE C . In the suitable cases,we will suppose thanks to induction that we have aperiodic transducers A i for expressions C i , i ≤

2. We also have a deterministic and complete aperiodic automaton A L for any aperiodiclanguage L . C = ⊥ . Then A is a single state transducer with no ﬁnal state so that its domain isempty. C = v . Then A is a single state transducer which produces v and accepts any input word.Clearly, A is aperiodic. C = L ? C : C . The transducer A ﬁrst reads its input, simulating A L . Upon reachingthe end of the input word, it goes back to the beginning of the word, and either executes A if the word was accepted by A L , or executes A otherwise. Since every machine wasaperiodic, so is A . C = C (cid:12) C . The transducer A does a ﬁrst pass executing A , then resets to thebeginning of the word and simulates A . Since both transducers are aperiodic, so is A . C = C · C . We express A as the composition of three functions f , f , f , each aperiodic.Since aperiodic functions are closed under composition, we get the result. The ﬁrst function f associates to each word w ∈ Σ ∗ the word u u · · · u n , such that w = u u · · · u n and for any preﬁx u of w , u belongs to the domain of C if, and only if, u = u · · · u i for some 1 ≤ i < n . Notice that u = ε iﬀ ε ∈ dom ( C ) and u n = ε iﬀ w ∈ dom ( C ).The other u i ’s must be nonempty. The second function f takes as input a word in(Σ ∪ { } ) ∗ , reads it from right to left, and suppresses all C . Then, f ( f ( w )) contains exactlyone w has a unique factorisation w = uv with u ∈ dom ( C ) and v ∈ dom ( C ). In this case, f ( f ( w )) = u v .Finally, the function f has domain Σ ∗ ∗ and ﬁrst executes A on the preﬁx of itsinput upto the a , and then executes A onthe second part, treating ‘ .The functions f and f can be realised by aperiodic transducers as they only simulateautomata for the aperiodic domains of C and the reverse of C respectively, and thefunction f executes A and A one after the other, and hence is also aperiodic. . Dartois, P. Gastin, S. 
Krishna 19 − end − − ‘ /ε, +1 a/a, +1 a/a, +1 a/a, +1 a/ε, − a/ε, − / , +1 /ε, +1 /ε, +1 / , +1 , a/ε, − /ε, − /ε, − /ε, +1 Figure 3

The transducer T for k = 3. C = [ L, C ] k? or C = [ L, C ] k?r . Here L ⊆ Σ ∗ is an aperiodic language which is also apreﬁx code with bounded synchronisation delay, and k ≥ f = [[ C ]] : Σ ∗ → Γ ∗ be the aperiodic function deﬁned by C . We write [ L, f ] k? = L
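The semantics of the $k$-chained Kleene-star handled in this case can be sketched in Python. This is a hedged illustration rather than the construction of the proof: the names `factorize`, `chained_star`, `in_code` and `swap_ends` are hypothetical, and for convenience the function applied to each window receives the list of its $k$ factors instead of their concatenation. The sample data follow the example given earlier in the text, with $L = (a+b)^*c$ and $f(u_1 u_2 u_3) = u_3 u_1$.

```python
def factorize(w, in_code):
    """Unique factorisation of w over a prefix code, or None if w is not in L*.

    For a prefix code, the shortest prefix that belongs to L is the only
    possible first factor, so a greedy left-to-right scan is enough.
    """
    factors, start = [], 0
    for end in range(1, len(w) + 1):
        if in_code(w[start:end]):
            factors.append(w[start:end])
            start = end
    return factors if start == len(w) else None


def chained_star(in_code, f, k, reverse=False):
    """[L, f]^{k*} (or its reverse): apply f to every window of k consecutive factors."""
    def star(w):
        factors = factorize(w, in_code)
        if factors is None:
            return None
        outputs = []
        for i in range(len(factors) - k + 1):   # empty range when n < k
            out = f(factors[i:i + k])
            if out is None:                      # some window is outside dom(f)
                return None
            outputs.append(out)
        if reverse:
            outputs.reverse()
        return "".join(outputs)
    return star


def in_code(u):
    """Membership in the prefix code L = (a+b)*c."""
    return u.endswith("c") and set(u[:-1]) <= {"a", "b"}


def swap_ends(window):
    """f(u1 u2 u3) = u3 u1 on a window of three factors."""
    return window[2] + window[0]
```

On the word with factors `ac`, `bc`, `c`, `abc`, the two windows produce `c·ac` and `abc·bc`, matching the pattern $u_3 u_1 u_4 u_2$ described in the text; words with fewer than $k$ factors yield the empty output.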

The functions f , f , f , f are realised by aperiodic 2DFTs. Proof.

The function f . First, since L is an aperiodic language which is a preﬁx codewith bounded synchronisation delay, L ∗ is aperiodic. Let A be an aperiodic deterministicautomaton that recognizes L ∗ . Let w be a word in L ∗ and w = u · · · u n with u i ∈ L . Since L is a code, this decomposition is unique. Notice that ε / ∈ L . We claim that the run of A over w reaches ﬁnal states exactly at the end of each u i . Should this hold, then we caneasily construct a (one-way) aperiodic transducer T realising f by simply simulating A and copying its input, adding A reaches a ﬁnal state.It remains to prove the claim. First, since for any 1 ≤ i ≤ n , u · · · u i belongs to L ∗ , A reaches a ﬁnal state after reading u i . Conversely, suppose A reaches a ﬁnal state afterreading some nonempty preﬁx v of w . Then v can be written u · · · u i u for some index0 ≤ i < n and some nonempty preﬁx u of u i +1 . But since A reaches a ﬁnal state on v ,we have v ∈ L ∗ . Hence, there is a unique decomposition v = v · · · v m with v j ∈ L . Since v = u · · · u i u = v · · · v m , either u is a preﬁx of v or conversely. Since L is a preﬁx code,and both u and v belong to L , we obtain u = v . By induction, we get that u j = v j for j ≤ i . Now, u = v i +1 · · · v m is a nonempty preﬁx of u i +1 . Using again that L is a preﬁxcode, we get m = i + 1 and u = v i +1 = u i +1 , which concludes the proof of the claim. The function f . The domain of f is the language K = ∗ ≥ k . We constructan aperiodic 2DFT T for f (see Figure 3 for T where k = 3). Let T = ( {− k, − k +1 , . . . , , . . . , k − , k } ∪ { end } , Σ ∪ { } , Σ ∪ { } , δ , γ , , { end } ) be the 2DFT realising f .The transition function δ is deﬁned as: δ (0 , ‘ ) = (0 , +1), δ ( i, a ) = ( i, +1) for 0 < i ≤ k and a ∈ Σ, δ ( i, a ) = ( i, −

1) for − k < i < a ∈ Σ, δ ( end, a ) = δ ( end, − k, − δ ( i, i + 1 , +1) for 0 ≤ i < k , δ ( k, end, +1), δ ( i, i + 1 , −

1) for − k ≤ i < − δ ( − , , +1).The production function γ is then simply γ ( i, a ) = a for i > γ ( i, i = 0and i = k , and is set to ε for all other transitions.The way the transducer T works is that it reads forward, in the strictly positive states,a factor of the input containing k k th end to check if it was the last k − T . First, notice that the (cid:121) and (cid:120) are always aperiodicrelations, for any ﬁnite 2DFT. This is due to the fact that if a (cid:121) step exists in some u = ε , italso appears in uv . So for ( v n ) n> , the (cid:121) and (cid:120) relations are monotone, and since we considerﬁnite state machines, they eventually stabilize. So we turn to traversal steps. These traversalsteps only depend on the number of v has k + 1 or more { ( → , , k ) , ( → , , end ) , ( → , , k ) , ( → , , end ) } , starting from 0 is possible onlyif v starts with end if the last letter of v is k . Noticethat both ( → , , end ) and ( → , , end ) are possible if v ∈ ∗ ≥ k . Similarly, both ( → , , k )and ( → , , k ) are possible if v ∈ ∗ ≥ k Σ + . Then given any word v ∈ (Σ ∪ { } ) + , both v k +1 and v k +2 have either no k + 1 The function f . The goal of f is to iteratively simulate f on each factor appearingbetween T is deﬁned as the transducer T realising f , with theexception that it reads the a , and in this case ends the run, orsimulates the move of T reading ‘ from the initial state. Note that q to states q ‘ (where q r (where T . If the input word v does not contain any (cid:121) , (cid:120) , → , ← )-runs of v n are the same as the ones in T , and since T is aperiodic thenwe get ϕ ( v n ) = ϕ ( v n +1 ) for some n , where ϕ is the syntactic morphism of T . . Dartois, P. Gastin, S. 
Krishna 21 Otherwise, let us remark that by design, once the reading head has gone right of a given T when going fromleft to right of a T .So given a word v with at least one u and u be the preﬁx and suﬃx of v upto the ﬁrst and from the last v = u w · · · w m u with m ≥ u , u , w , . . . , w m ∈ Σ ∗ . Then there exists no ← traversal of v n for n ≥ → traversals of v n , for n ≥

2, existif and only if u u , w , . . . , w m belong to the domain of T , and consist of all ( → , p, q ), where ϕ ( u ) contains ( → , p, f ) for some ﬁnal state f , and ϕ ( u ) contains ( → , ι, q ) where ι is theinitial state. These traversals are then the same for v and v , which concludes the proof ofaperiodicity of T . The function f . The transducer T realising f is similar to T . The main diﬀerenceis that it starts by reaching the end of the word, then goes back to the previous T . On reaching the end of the run in T (in a ﬁnal state of T when reading a and then enters into a special state which moves the reading head to the left,till the time it has ﬁnished reading two ε all along. When itreads the second T would, on reading ‘ from its initialstate, and continues simulating T . This goes on until it reaches the start symbol ‘ , and thenit goes to the ﬁnal state of T that only moves to the right outputting ε all along until theend of the input to a .The arguments for the aperiodicity of T are similar to the ones for T . (cid:74) In this section, we show that the runs of an aperiodic 2DFT have a “stabilising” property.This property crucially distinguishes aperiodic 2DFTs from non aperiodic ones, and we usethis in our proof to obtain

SDRTEs from aperiodic 2DFTs. In the remainder of this section, we fix an aperiodic 2DFT $\mathcal{A} = (Q, \Sigma, \Gamma, \delta, \gamma, q_0, F)$. Let $\varphi : (\Sigma \uplus \{\vdash, \dashv\})^* \to \mathsf{TrM}$ be the canonical surjective morphism to the transition monoid of $\mathcal{A}$.

Consider a code $L \subseteq \Sigma^*$ such that $X = \varphi(L)$ is $k$-stabilising for some $k > 0$. We will see that a run of $\mathcal{A}$ over a word $w \in L^*$ has some nice properties. Intuitively, if it moves forward through $k$ factors from $L$ then it never moves backward through more than $k$ factors.

More precisely, let $w = u_1 u_2 \cdots u_n$ be the unique factorisation of $w \in L^*$ with $u_i \in L$ for $1 \le i \le n$. We assume that $n \ge k$. We start with the easiest fact.

Lemma 19. If the left-left step $(\hookleftarrow, p, q)$ belongs to $\varphi(w)$ then the run of $\mathcal{A}$ over $w$ starting on the left in state $p$ only visits the first $k$ factors $u_1 \cdots u_k$ of $w$.

Proof. Since $X$ is $k$-stabilising, we have $\varphi(w) = \varphi(u_1 \cdots u_k)$. Hence, $(\hookleftarrow, p, q) \in \varphi(u_1 \cdots u_k)$ and the result follows since $\mathcal{A}$ is deterministic.

Notice that the right-right ($\hookrightarrow$) runs of $\mathcal{A}$ over $w$ need not visit the last $k$ factors only (see Lemma 22 below). This is due to the fact that stabilising is not a symmetric notion.

Next, we consider the left-right runs of $\mathcal{A}$ over $w$.

Lemma 20.

Assume that ( → , p, q ) ∈ ϕ ( w ) . Then the run ρ of A over w starting onthe left in state p has the following property, that we call k -forward-progressing: for each ≤ i < n − k , after reaching the suﬃx u i + k +1 · · · u n of w , the run ρ will never visit againthe preﬁx u · · · u i . See Figure 4 for a non-example and Figure 10 for an example. u u u i u i +1 u i + k u i + k +1 u n ρ ρ ρ ρ p q q q q Figure 4

A left-right run which is not k -forward-progressing u u u i u i +1 u i + k u i + k +1 u n ρ ρ ρ ρ pq q q q Figure 5

A right-left run which is not k -backward-progressing Proof.

Towards a contradiction, assume that for some 1 ≤ i < n − k , the run ρ visits u · · · u i after visiting u i + k +1 · · · u n (See Figure 4). Then, there exists a subrun ρ of ρ making some( (cid:121) , q , q )-step on u i +1 · · · u n and visiting u i + k +1 (on Figure 4 we have ρ = ρ ρ ). Hence( (cid:121) , q , q ) ∈ ϕ ( u i +1 · · · u n ) and by Lemma 19 we deduce that ρ visits u i +1 · · · u i + k only, acontradiction. (cid:74)(cid:73) Lemma 21.

Assume that ( ← , p, q ) ∈ ϕ ( w ) . Then the run ρ of A over w starting on theright in state p has the following property, that we call k -backward-progressing: for each ≤ i < n − k , after reaching the preﬁx u · · · u i of w , the run ρ will never visit again thesuﬃx u i + k +1 · · · u n . Proof.

This Lemma is a consequence of Lemma 19. Indeed, consider any part of ρ thatvisits u i +1 again (in some state q ) after visiting u i , for some 1 ≤ i < n − k . As ρ is a ← run, it will later cross from u i +1 to u i (reaching some state q ). Then ( (cid:121) , q , q ) is a run on u i +1 · · · u n . By Lemma 19, it does not visit u i + k +1 · · · u n , which concludes the proof (SeeFigure 5 for a non-example). (cid:74)(cid:73) Lemma 22.

Assume that ( (cid:120) , p, q ) ∈ ϕ ( w ) and let ρ be the run of A over w startingon the right in state p . Then, either ρ visits only the last k factors u n − k +1 · · · u n , or forsome ≤ i ≤ n − k the run ρ is the concatenation ρ ρ ρ of a k -backward-progressing run ρ over u i +1 · · · u n followed by a run ρ staying inside some u i · · · u i + k , followed by some k -forward-progressing run ρ over u i +1 · · · u n . See Figure 6. Proof.

Assume that ρ visits u · · · u n − k and let u i (1 ≤ i ≤ n − k ) be the left-most factorvisited by ρ . We split ρ in ρ ρ ρ (see Figure 6) where ρ is the preﬁx of ρ , starting on the right of w in state p and going until the ﬁrst time ρ crosses from u i +1 to u i . Hence, ρ is a run over u i +1 · · · u n starting on the right in . Dartois, P. Gastin, S. Krishna 23 u u u i u i +1 u i + k u i + k +1 u n ρ ρ ρ pq q q Figure 6

A right-right run ρ ρ ρ where ρ is k -backward-progressing, ρ is local to u i · · · u i + k and ρ is k -forward-progressing. state p and exiting on the left in some state q . We have ( ← , p, q ) ∈ ϕ ( u i +1 · · · u n ). ByLemma 21 we deduce that ρ is k -backward-progressing.Then, ρ goes until the last crossing from u i to u i +1 .Finally, ρ is the remaining suﬃx of ρ . Hence, ρ is a run over u i +1 · · · u n starting on theleft in some state q and exiting on the right in state q . We have ( → , q , q ) ∈ ϕ ( u i +1 · · · u n ).By Lemma 20 we deduce that ρ is k -forward-progressing.It remains to show that ρ stays inside u i · · · u i + k . Since u i is the left-most factor visited by ρ , we already know that ρ does not visit u · · · u i − . Similarly to Lemma 21, any maximalsubrun ρ of ρ that does not visit u i is a (cid:121) run on u i +1 · · · u n since ρ starts and ends atthe frontier between u i and u i +1 . By Lemma 19, the subrun ρ does not visit u i + k +1 · · · u n and thus ρ stays inside u i · · · u i + k . (cid:74)(cid:73) Example 23.

We illustrate the stabilising runs of an aperiodic 2DFT using the aperiodic2DFT A in Figure 2. Figure 7 depicts the run of A on words in b ( a ∗ b ) ≥ . We use the set Z computed in Example 9. Notice that a run of A on such words is 4-forward-progressing,as seen below. For each w = u u · · · u n with n > u = b and u i ∈ a ∗ b for 2 ≤ i ≤ n , wehave ϕ ( w ) = Z and one can see thateach ( (cid:121) , p, q ) ∈ Z , is such that, whenever the run of A starts at the left of w in state p ,it stays within u · · · u and never visits u · · · u n (as in Lemma 19).each ( → , p, q ) ∈ Z , is such that, whenever the run of A starts at the left of w in state p and reaches u i +5 , for i ≥

1, it no longer visits any of $u_1 \cdots u_i$ (4-forward-progressing as in Lemma 20).
each $(\curvearrowright, p, q) \in Z$ is such that, whenever the run of $\mathcal{A}$ starts at the right of $w$ in state $p$, it never visits $u_1 \cdots u_{n-4}$ (the easy case of Lemma 22).

In this section, we show how to construct SDRTEs which are equivalent to aperiodic 2DFTs. Recall that $\varphi : (\Sigma \uplus \{\vdash, \dashv\})^* \to \mathsf{TrM}$ is the canonical surjective morphism to the transition monoid of the 2DFT $\mathcal{A} = (Q, \Sigma, \Gamma, \delta, \gamma, q_0, F)$. Given a regular expression $E$ and a monoid element $s \in \mathsf{TrM}$, we let $L(E, s) = L(E) \cap \varphi^{-1}(s)$. The main construction of this section is given by Theorem 24.

Recall that $\mathsf{TrM}$ represents the transition monoid of a 2DFT, and consists of elements $\varphi(w)$ for all $w \in \Sigma^*$, where each $\varphi(w) = \{(d, p, q) \mid \text{there is a } (d, p, q)\text{-run on } w\} \subseteq \{\rightarrow, \curvearrowleft, \curvearrowright, \leftarrow\} \times Q^2$. The elements of $\varphi(w)$ are called steps, since any run over $w$ is obtained by a sequence of such steps. If the states $p, q$ in a step $(d, p, q)$ are clear from the context, or are immaterial for the discussion, we also refer to a step as a $d$ step, $d \in \{\curvearrowleft, \curvearrowright, \rightarrow, \leftarrow\}$. In this case we also refer to a step $(d, p, q)$ as a $d$ step having $p$ as the starting state and $q$ as the final state.

Figure 7: An accepting run of $\mathcal{A}$ on the words of Example 23.

Theorem 24.

Let $E$ be an unambiguous, stabilising, SD-regular expression over $\Sigma \uplus \{\vdash, \dashv\}$ and let $s \in \mathsf{TrM}$. For each step $x \in \{\rightarrow, \curvearrowleft, \curvearrowright, \leftarrow\} \times Q^2$, we can construct an SDRTE $C_{E,s}(x)$ such that:
$C_{E,s}(x) = \bot$ when $x \notin s$, and otherwise
$\mathrm{dom}([\![C_{E,s}(x)]\!]) = L(E, s)$ and for all words $w \in L(E, s)$, $[\![C_{E,s}(x)]\!](w)$ is the output produced by $\mathcal{A}$ running over $w$ according to step $x$.
When $w = \varepsilon$ and $s = \mathbf{1} = \varphi(\varepsilon)$ with $x \in \mathbf{1}$, this means $[\![C_{E,s}(x)]\!](\varepsilon) = \varepsilon$.

Proof.

The construction is by structural induction on $E$.

Atomic expressions

We first define $C_{E,s}(x)$ when $E$ is an atomic expression, i.e., $\emptyset$, $\varepsilon$ or $a$ for $a \in \Sigma$.
$E = \emptyset$: we simply set $C_{\emptyset,s}(x) = \bot$, which is the nowhere defined function.
$E = \varepsilon$: when $s = \mathbf{1}$ and $x \in s$ then we set $C_{\varepsilon,s}(x) = \varepsilon \triangleright \varepsilon$, and otherwise we set $C_{\varepsilon,s}(x) = \bot$.
$E = a \in \Sigma \uplus \{\vdash, \dashv\}$: again, we set $C_{a,s}(x) = \bot$ if $s \neq \varphi(a)$ or $x \notin s$. Otherwise, there are two cases. Either $x \in \{(\rightarrow, p, q), (\curvearrowright, p, q)\}$ for some states $p, q$ such that $\delta(p, a) = (q, +1)$, or $x \in \{(\leftarrow, p, q), (\curvearrowleft, p, q)\}$ for some states $p, q$ with $\delta(p, a) = (q, -1)$. In both cases the output produced is $\gamma(p, a)$ and we set $C_{a,s}(x) = a \triangleright \gamma(p, a)$.

Disjoint union

If the expression is $E \cup F$ with $L(E)$ and $L(F)$ disjoint, then we simply set $C_{E \cup F,s}(x) = C_{E,s}(x) + C_{F,s}(x)$.

Unambiguous concatenation $E \cdot F$

Here, we suppose that we have SDRTEs for $C_{E,s}(x)$ and $C_{F,s}(x)$ for all $s$ in $\mathsf{TrM}$ and all steps $x \in \{\rightarrow, \curvearrowleft, \curvearrowright, \leftarrow\} \times Q^2$. We show how to construct SDRTEs for $C_{E \cdot F,s}(x)$, assuming that the concatenation $L(E) \cdot L(F)$ is unambiguous.

Figure 8: Decomposition of a $(\rightarrow, p, q)$-run and a $(\curvearrowleft, p, q)$-run over the product $w = uv$.

A word $w \in L(E \cdot F)$ has a unique factorization $w = uv$ with $u \in L(E)$ and $v \in L(F)$. Let $s = \varphi(u)$ and $t = \varphi(v)$. A run $\rho$ over $w$ is obtained by stitching together runs over $u$ and runs over $v$ as shown in Figure 8. In the left figure, the run over $w$ follows step $x = (\rightarrow, p, q)$, starting on the left in state $p$ and exiting on the right in state $q$. The run $\rho$ splits as $\rho_0 \rho_1 \rho_2 \rho_3 \rho_4 \rho_5$ as shown in the figure. The output of the initial part $\rho_0$ is computed by $C_{E,s}((\rightarrow, p, p_1))$ over $u$ and the output of the final part $\rho_5$ is computed by $C_{F,t}((\rightarrow, p_5, q))$ over $v$. We focus now on the internal part $\rho_1 \rho_2 \rho_3 \rho_4$, which consists of an alternating sequence of left-left runs over $v$ and right-right runs over $u$. The corresponding sequence of steps $x_1 = (\curvearrowleft, p_1, p_2) \in t$, $x_2 = (\curvearrowright, p_2, p_3) \in s$, $x_3 = (\curvearrowleft, p_3, p_4) \in t$ and $x_4 = (\curvearrowright, p_4, p_5) \in s$ depends only on $s = \varphi(u)$ and $t = \varphi(v)$.

These internal zigzag runs will be frequently used when dealing with concatenation or Kleene star. They alternate left-left ($\curvearrowleft$) steps on the right word $v$ and right-right ($\curvearrowright$) steps on the left word $u$. They may start with a $\curvearrowleft$-step or a $\curvearrowright$-step. The sequence of steps in a maximal zigzag run is entirely determined by the monoid elements $s = \varphi(u)$, $t = \varphi(v)$, the starting step $d \in \{\curvearrowleft, \curvearrowright\}$ and the starting state $p'$ of step $d$. The final step of this maximal sequence is some $d' \in \{\curvearrowleft, \curvearrowright\}$ and reaches some state $q'$. We write $Z_{s,t}(p', d) = (d', q')$. For instance, on the left of Figure 8 we get $Z_{s,t}(p_1, \curvearrowleft) = (\curvearrowright, p_5)$, whereas on the right of Figure 8 we get $Z_{s,t}(p_1, \curvearrowleft) = (\curvearrowleft, p_4)$.
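Since the maximal zigzag sequence depends only on the two monoid elements, it can be computed by simply following steps. The following is a minimal sketch, not taken from the paper: it assumes that steps are encoded as tuples `(direction, start, end)`, that the direction tags `"LL"` (left-left) and `"RR"` (right-right) are hypothetical names, and that a zigzag which never terminates is reported as `None` (a case the text does not discuss). When no step can be taken at all, it returns the opposite direction together with the unchanged state.

```python
from typing import Optional, Set, Tuple

Step = Tuple[str, str, str]   # (direction, start state, end state)
LL, RR = "LL", "RR"           # hypothetical tags for left-left / right-right steps


def zigzag(s: Set[Step], t: Set[Step], p: str, d: str) -> Optional[Tuple[str, str]]:
    """Follow the maximal zigzag between a left word u (steps s) and a right word v (steps t)."""
    state, direction = p, d
    last = RR if d == LL else LL             # reported when no step can be taken
    seen = set()
    while (direction, state) not in seen:
        seen.add((direction, state))
        side = t if direction == LL else s   # LL steps run on v, RR steps run on u
        targets = [q for (dd, pp, q) in side if dd == direction and pp == state]
        if not targets:
            return (last, state)             # the maximal sequence ends here
        state, last = targets[0], direction  # determinism: at most one target
        direction = RR if direction == LL else LL
    return None                              # the zigzag loops forever


# Left of Figure 8: four alternating steps starting in p1.
s_left = {(RR, "p2", "p3"), (RR, "p4", "p5")}
t_left = {(LL, "p1", "p2"), (LL, "p3", "p4")}
print(zigzag(s_left, t_left, "p1", LL))      # ('RR', 'p5')
```

Dropping the step `("RR", "p4", "p5")` from `s_left` gives the situation on the right of Figure 8, where the sequence ends with a left-left step in state `p4`.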
By convention, if the sequence of zigzag steps is empty then we define $Z_{s,t}(p, \curvearrowleft) = (\curvearrowright, p)$ and $Z_{s,t}(p, \curvearrowright) = (\curvearrowleft, p)$.

Lemma 25.

We use the above notation. We can construct SDRTEs $ZC^{F,t}_{E,s}(p, d)$ for $p \in Q$ and $d \in \{\curvearrowleft, \curvearrowright\}$ such that $\mathrm{dom}([\![ZC^{F,t}_{E,s}(p, d)]\!]) = L(E, s) L(F, t)$ and for all $u \in L(E, s)$ and $v \in L(F, t)$ the value $[\![ZC^{F,t}_{E,s}(p, d)]\!](uv)$ is the output produced by the internal zigzag run of $\mathcal{A}$ over $(u, v)$ following the maximal sequence of steps starting in state $p$ with a $d$-step.

Proof.

We first consider the case $d = \curvearrowleft$ and $Z_{s,t}(p, \curvearrowleft) = (\curvearrowright, q)$ for some $q \in Q$, which is illustrated on the left of Figure 8. Since $\mathcal{A}$ is deterministic, there is a unique maximal sequence of steps (with $n \ge 0$, $p_1 = p$ and $p_{n+1} = q$): $x_1 = (\curvearrowleft, p_1, p_2) \in t$, $x_2 = (\curvearrowright, p_2, p_3) \in s$, ..., $x_{n-1} = (\curvearrowleft, p_{n-1}, p_n) \in t$, $x_n = (\curvearrowright, p_n, p_{n+1}) \in s$. The zigzag run $\rho$ following this sequence of steps over $uv$ splits as $\rho_1 \rho_2 \cdots \rho_n$ where $\rho_{2i}$ is the unique run on $u$ following step $x_{2i}$ and $\rho_{2i+1}$ is the unique run on $v$ following step $x_{2i+1}$. The outputs of these runs are given by $[\![C_{E,s}(x_{2i})]\!](u)$ and $[\![C_{F,t}(x_{2i+1})]\!](v)$. When $n = 0$ the zigzag run $\rho$ is empty and we simply set $ZC^{F,t}_{E,s}(p, \curvearrowleft) = (L(E, s) L(F, t)) \triangleright \varepsilon$. Assume now that $n > 0$. The required SDRTE computing the output of $\rho$ can be defined as

$$ZC^{F,t}_{E,s}(p, \curvearrowleft) = \big((L(E,s) \triangleright \varepsilon) \cdot C_{F,t}(x_1)\big) \odot \big(C_{E,s}(x_2) \cdot C_{F,t}(x_3)\big) \odot \cdots \odot \big(C_{E,s}(x_{n-2}) \cdot C_{F,t}(x_{n-1})\big) \odot \big(C_{E,s}(x_n) \cdot (L(F,t) \triangleright \varepsilon)\big).$$

Notice that each Cauchy product in this expression is unambiguous since the product $L(E) \cdot L(F)$ is unambiguous.

The other cases can be handled similarly. For instance, when $Z_{s,t}(p, \curvearrowleft) = (\curvearrowleft, q)$ as on the right of Figure 8, the sequence of steps ends with $x_{n-1} = (\curvearrowleft, p_{n-1}, p_n) \in t$ with $n > 0$, $p_n = q$, and the zigzag run $\rho$ is $\rho_1 \rho_2 \cdots \rho_{n-1}$. The SDRTE $ZC^{F,t}_{E,s}(p, \curvearrowleft)$ is given by

$$\big((L(E,s) \triangleright \varepsilon) \cdot C_{F,t}(x_1)\big) \odot \big(C_{E,s}(x_2) \cdot C_{F,t}(x_3)\big) \odot \cdots \odot \big(C_{E,s}(x_{n-2}) \cdot C_{F,t}(x_{n-1})\big).$$

The situation is symmetric for $ZC^{F,t}_{E,s}(p, \curvearrowright)$: the sequence starts with a right-right step $x_2 = (\curvearrowright, p_2, p_3) \in s$ with $p_2 = p$, and we obtain the SDRTE simply by removing the first factor $\big((L(E,s) \triangleright \varepsilon) \cdot C_{F,t}(x_1)\big)$ in the Hadamard products above.

We come back to the definition of the SDRTEs for $C_{E \cdot F,r}(x)$ with $r \in \mathsf{TrM}$ and $x \in r$. As explained above, the output produced by a run $\rho$ following step $x$ over a word $w = uv$ with $u \in L(E, s)$, $v \in L(F, t)$ and $r = st$ consists of an initial part, a zigzag internal part, and a final part. There are four cases depending on the step $x$.

$x = (\curvearrowleft, p, q)$. Either the run $\rho$ stays inside $u$ (zigzag part empty), or there is a zigzag internal part starting with $(p', \curvearrowleft)$ such that $(\rightarrow, p, p') \in \varphi(u)$ and ending with $(\curvearrowleft, q')$ such that $(\leftarrow, q', q) \in \varphi(u)$. Thus we define the SDRTE $C_{E \cdot F,r}(x)$ as

$$\sum_{st = r \,\mid\, x \in s} C_{E,s}(x) \cdot \big(L(F,t) \triangleright \varepsilon\big) \;+\; \sum_{\substack{st = r,\ (p',q') \,\mid \\ Z_{s,t}(p', \curvearrowleft) = (\curvearrowleft, q')}} \big(C_{E,s}((\rightarrow, p, p')) \cdot (L(F,t) \triangleright \varepsilon)\big) \odot ZC^{F,t}_{E,s}(p', \curvearrowleft) \odot \big(C_{E,s}((\leftarrow, q', q)) \cdot (L(F,t) \triangleright \varepsilon)\big)$$

Notice that all Cauchy products are unambiguous since the concatenation $L(E) \cdot L(F)$ is unambiguous. The sums are also unambiguous. Indeed, a word $w \in L(E \cdot F, r)$ has a unique factorization $w = uv$ with $u \in L(E)$ and $v \in L(F)$. Hence $s = \varphi(u)$ and $t = \varphi(v)$ are uniquely determined and satisfy $st = r$. Then, either $x \in s$ and $w$ is only in the domain of $C_{E,s}(x) \cdot \big(L(F,t) \triangleright \varepsilon\big)$, or there is a unique $p'$ with $(\rightarrow, p, p') \in s$ and a unique $q'$ with $Z_{s,t}(p', \curvearrowleft) = (\curvearrowleft, q')$ and $(\leftarrow, q', q) \in s$. Notice that if $(\rightarrow, p, p') \notin s$ then $C_{E,s}((\rightarrow, p, p')) = \bot$, and similarly if $(\leftarrow, q', q) \notin s$. Hence we could have added the condition $(\rightarrow, p, p'), (\leftarrow, q', q) \in s$ to the second sum, but do not, to reduce clutter.

$x = (\rightarrow, p, q)$. Here the run must cross from left to right. Thus we define the SDRTE $C_{E \cdot F,r}(x)$ as

$$\sum_{\substack{st = r,\ (p',q') \,\mid \\ Z_{s,t}(p', \curvearrowleft) = (\curvearrowright, q')}} \big(C_{E,s}((\rightarrow, p, p')) \cdot (L(F,t) \triangleright \varepsilon)\big) \odot ZC^{F,t}_{E,s}(p', \curvearrowleft) \odot \big((L(E,s) \triangleright \varepsilon) \cdot C_{F,t}((\rightarrow, q', q))\big)$$

$x = (\leftarrow, p, q)$. This case is similar. The SDRTE $C_{E \cdot F,r}(x)$ is

$$\sum_{\substack{st = r,\ (p',q') \,\mid \\ Z_{s,t}(p', \curvearrowright) = (\curvearrowleft, q')}} \big((L(E,s) \triangleright \varepsilon) \cdot C_{F,t}((\leftarrow, p, p'))\big) \odot ZC^{F,t}_{E,s}(p', \curvearrowright) \odot \big(C_{E,s}((\leftarrow, q', q)) \cdot (L(F,t) \triangleright \varepsilon)\big)$$

$x = (\curvearrowright, p, q)$. Finally, for right-right runs, the SDRTE $C_{E \cdot F,r}(x)$ is

$$\sum_{st = r \,\mid\, x \in t} (L(E,s) \triangleright \varepsilon) \cdot C_{F,t}(x) \;+\; \sum_{\substack{st = r,\ (p',q') \,\mid \\ Z_{s,t}(p', \curvearrowright) = (\curvearrowright, q')}} \big((L(E,s) \triangleright \varepsilon) \cdot C_{F,t}((\leftarrow, p, p'))\big) \odot ZC^{F,t}_{E,s}(p', \curvearrowright) \odot \big((L(E,s) \triangleright \varepsilon) \cdot C_{F,t}((\rightarrow, q', q))\big).$$

Example 26.

We go back to our running example of the aperiodic 2DFT A in Figure 2 andillustrate the unambiguous concatenation. Consider F = a + b , G = F and E = F = G .We know from Example 9 that ϕ ( F ) = Y , ϕ ( G ) = Y and ϕ ( E ) = Z . We compute belowsome steps of Y , Y and Z .First, we look at some steps in F for which the SDRTE are obtained directly by looking atthe automaton A in Figure 2 (we cannot give more details here since we have not explainedyet how to deal with Kleene-plus, hence we rely on intuition for these steps). C F,Y (( → , q , q )) = ( a + b . ε ) C F,Y (( → , q , q )) = ( a + b . ε ) C F,Y (( (cid:121) , q , q )) = ( a . a ) + · ( b . b ) C F,Y (( → , q , q )) = ( a . a ) + · ( b . b ) C F,Y (( ← , q , q )) = ( a + b . ε ) C F,Y (( ← , q , q )) = ( a + b . ε ) C F,Y (( (cid:120) , q , q )) = ( a + b . ε ) . Next, we compute some steps using the unambiguous concatenation G = F · F . We startwith step ( → , q , q ) for which the zigzag part is empty: ZC F,Y F,Y ( q , (cid:121) ) = F . ε . Hence, weget using the formula in the proof above C G,Y (( → , q , q )) = (cid:0) C F,Y (( → , q , q )) · ( F . ε ) (cid:1) (cid:12) ( F . ε ) (cid:12) (cid:0) ( F . ε ) · C F,Y (( → , q , q )) (cid:1) and after some simpliﬁcations C G,Y (( → , q , q )) = C F,Y (( → , q , q )) · C F,Y (( → , q , q )) = (( a + b ) . ε ) . Similarly, we can compute the following steps C G,Y (( → , q , q )) = C F,Y (( → , q , q )) · C F,Y (( → , q , q )) = ( a . a ) + · ( b . b ) · ( a + b . ε ) . For step ( (cid:121) , q , q ), the run only visits the ﬁrst factor since ( (cid:121) , q , q ) ∈ Y : C G,Y (( (cid:121) , q , q )) = C F,Y (( (cid:121) , q , q )) · ( F . ε ) = ( a . a ) + · ( b . b ) · ( a + b . ε ) . Now, for step ( (cid:121) , q , q ), the zigzag part is reduced to step ( (cid:121) , q , q ) and we get: C G,Y (( (cid:121) , q , q )) = (cid:0) C F,Y (( → , q , q )) · ( F . ε ) (cid:1) (cid:12) (cid:0) ( F . 
ε ) · C F,Y (( (cid:121) , q , q )) (cid:1) (cid:12) (cid:0) C F,Y (( ← , q , q )) · ( F . ε ) (cid:1) = ( a + b . ε ) · ( a . a ) + · ( b . b ) . Similarly, we compute C G,Y (( (cid:120) , q , q )) = (cid:0) ( F . ε ) · C F,Y (( ← , q , q )) (cid:1) (cid:12) (cid:0) C F,Y (( (cid:120) , q , q )) · ( F . ε ) (cid:1) (cid:12) (cid:0) ( F . ε ) · C F,Y (( → , q , q )) (cid:1) = ( a + b . ε ) · ( a . a ) + · ( b . b ) . Finally, we consider the unambiguous decomposition E = G · G in order to compute C E,Z ( x ) where x = ( → , q , q ). Notice that ( → , q , q ) , ( (cid:121) , q , q ) , ( (cid:120) , q , q ) , ( → , q , q ) ∈ Y = ϕ ( G ). Hence, Z Y ,Y ( q , (cid:121) ) = ( (cid:120) , q ) and applying the formulas in the proof above weobtain C E,Z ( x ) = (cid:0) C G,Y (( → , q , q )) · ( G . ε ) (cid:1) (cid:12) ZC G,Y G,Y ( q , (cid:121) ) (cid:12) (cid:0) ( G . ε ) · C G,Y (( → , q , q )) (cid:1) ZC G,Y G,Y ( q , (cid:121) ) = (cid:0) ( G . ε ) · C G,Y (( (cid:121) , q , q )) (cid:1) (cid:12) (cid:0) C G,Y (( (cid:120) , q , q )) · ( G . ε ) (cid:1) . Figure 9

Illustration for Example 26.
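The combinators that appear in these expressions can be read as operations on partial functions from words to words. The sketch below is only an illustration under stated assumptions: a partial function is modelled as a Python callable returning `None` where it is undefined, languages are given as Python regular expressions, and the helper names `simple`, `usum`, `cauchy` and `hadamard` are hypothetical. Where the unambiguity requirement of a sum or a Cauchy product is violated, the sketch chooses to return `None`.

```python
import re
from typing import Callable, Optional

Fn = Callable[[str], Optional[str]]   # partial function on words; None = undefined


def simple(lang: str, out: str) -> Fn:
    """Simple function: constant output `out` on words matching the regex `lang`."""
    pat = re.compile(lang)
    return lambda w: out if pat.fullmatch(w) else None


def usum(f: Fn, g: Fn) -> Fn:
    """Unambiguous sum: defined where exactly one of f, g is defined."""
    def h(w: str) -> Optional[str]:
        a, b = f(w), g(w)
        if (a is None) == (b is None):
            return None
        return a if a is not None else b
    return h


def cauchy(f: Fn, g: Fn) -> Fn:
    """Unambiguous Cauchy product: w = uv with a unique split, output f(u) g(v)."""
    def h(w: str) -> Optional[str]:
        results = []
        for i in range(len(w) + 1):
            a, b = f(w[:i]), g(w[i:])
            if a is not None and b is not None:
                results.append(a + b)
        return results[0] if len(results) == 1 else None
    return h


def hadamard(f: Fn, g: Fn) -> Fn:
    """Hadamard product: both functions read the whole word, outputs are concatenated."""
    def h(w: str) -> Optional[str]:
        a, b = f(w), g(w)
        return None if a is None or b is None else a + b
    return h


first = cauchy(simple("a+", "x"), simple("b", "y"))
both = hadamard(first, simple("a+b", "!"))
print(both("aab"))   # xy!
```

The Hadamard product is what lets a two-way run be described piecewise: each operand reads the same input word again and contributes the output of one segment of the run.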

Putting everything together and still after some simpliﬁcations, we get C E,Z ( x ) = (cid:0) ( a . a ) + · ( b . b ) · (( a + b ) . ε ) (cid:1) (cid:12) (cid:0) (( a + b ) . ε ) · ( a . a ) + · ( b . b ) (cid:1) (cid:12) (cid:0) ( a + b . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) . ε ) (cid:1) . For instance, applying C E,Z ( x ) to w = aba ba ba b we obtain aba ba b . (cid:74) SD-Kleene Star

The most interesting case is when $E = F^*$. Let $L = L(F) \subseteq \Sigma^*$. Since $E$ is a stabilising SD-regular expression, $L$ is an aperiodic prefix code of bounded synchronisation delay, and $X = \varphi(L)$ is $k$-stabilising for some $k > 0$. Hence, we may apply the results of Section 5.2.1.

By induction, we suppose that we have SDRTEs $C_{F,s}(x)$ for all $s$ in $\mathsf{TrM}$ and steps $x$. Since $L = L(F)$ is a code, for each fixed $\ell > 0$, the expression $F^\ell = F \cdot F \cdots F$ is an unambiguous concatenation. Hence, from the proof above for the unambiguous concatenation, we may also assume that we have SDRTEs $C_{F^\ell,s}(x)$ for all $s \in \mathsf{TrM}$ and steps $x$. Similarly, we have SDRTEs for $ZC^{F,t}_{F^k,s}(-)$ and $ZC^{F^k,t}_{F,s}(-)$. Notice that $F^0$ is equivalent to $\varepsilon$, hence we have $C_{F^0,s}(x) = C_{\varepsilon,s}(x)$.

We show how to construct SDRTEs $C_{E,s}(x)$ for $E = F^*$. There are four cases, which are dealt with below, depending on the step $x$. We fix some notation common to all four cases. Fix some $w \in L(E, s) = L^* \cap \varphi^{-1}(s)$ and let $w = u_1 \cdots u_n$ be the unique factorization of $w$ with $n \ge 0$ and $u_i \in L$ for $1 \le i \le n$. For a step $x \in s$, we denote by $\rho$ the unique run of $\mathcal{A}$ over $w$ following step $x$.

$x = (\curvearrowleft, p, q) \in s$. The easiest case is for left-left steps. If $n < k$ then the output of $\rho$ is $[\![C_{F^n,s}(x)]\!](w)$. Notice that here, $C_{F^0,s}(x) = \bot$ since $x \notin \mathbf{1} = \varphi(\varepsilon)$. Now, if $n \ge k$ then, by Lemma 19, the run $\rho$ stays inside $u_1 \cdots u_k$. We deduce that the output of $\rho$ is $[\![C_{F^k,s}(x)]\!](u_1 \cdots u_k)$. Therefore, we define $C_{E,s}(x) = \Big(\sum_{n}\cdots$

Figure 10: A left-right run which is 2-forward-progressing.

Notice that the sums are unambiguous since $L = L(F)$ is a code. The concatenation $F^k \cdot F^*$ is also unambiguous.

$x = (\rightarrow, p, q) \in s$. We turn now to the more interesting left-right steps. Again, if $n < k$ then the output of $\rho$ is $[\![C_{F^n,s}(x)]\!](w)$. Assume now that $n \ge k$. We apply Lemma 20 to deduce that the run $\rho$ is $k$-forward-progressing. See Figure 10 for a sample run which is 2-forward-progressing. We split $\rho$ in $\rho_0 \rho_1 \cdots \rho_{n-k}$ where $\rho_0$ is the prefix of $\rho$ going until the first crossing from $u_k$ to $u_{k+1}$. Then, $\rho_1$ goes until the first crossing from $u_{k+1}$ to $u_{k+2}$. Continuing in the same way, for $1 \le i < n-k$, $\rho_i$ goes until the first crossing from $u_{k+i}$ to $u_{k+i+1}$. Finally, $\rho_{n-k}$ is the remaining suffix, going until the run exits from $w$ on the right. Since the run $\rho$ is $k$-forward-progressing, we deduce that $\rho_i$ does not go back to $u_1 \cdots u_{i-1}$, hence it stays inside $u_i \cdots u_{i+k}$, starting on the left of $u_{i+k}$ and exiting on the right of $u_{i+k}$.

Since $X = \varphi(L)$ is $k$-stabilising, we have $\varphi(u_1 \cdots u_{k+i}) = \varphi(w)$ for all $0 \le i \le n-k$. Now, $\rho_0 \cdots \rho_i$ is a run on $u_1 \cdots u_{k+i}$ starting on the left in state $p$ and exiting on the right. Since $\mathcal{A}$ is deterministic and $x = (\rightarrow, p, q) \in \varphi(w) = \varphi(u_1 \cdots u_{k+i})$, we deduce that $\rho_i$ exits on the right of $u_{k+i}$ in state $q$. In particular, $\rho_0$ is a run on $u_1 \cdots u_k$ starting on the left in state $p$ and exiting on the right in state $q$. Moreover, for each $1 \le i \le n-k$, $\rho_i$ is the concatenation of a zigzag internal run over $(u_i \cdots u_{i+k-1}, u_{i+k})$ starting with $(q, \curvearrowleft)$ and ending with $(\curvearrowright, q_i) = Z_{s',s''}(q, \curvearrowleft)$, where $s' = \varphi(u_i \cdots u_{i+k-1})$ and $s'' = \varphi(u_{i+k})$, and a $(\rightarrow, q_i, q)$ run over $u_{i+k}$.

Let $v_i$ be the output produced by $\rho_i$ for $0 \le i \le n-k$. Then, using Lemma 25, the productions $v_i$ with $0 < i \le n-k$ are given by the SDRTE $f$ defined as

$$f = \sum_{s',s'',q' \,\mid\, (\curvearrowright, q') = Z_{s',s''}(q, \curvearrowleft)} ZC^{F,s''}_{F^k,s'}(q, \curvearrowleft) \odot \big((L(F^k, s') \triangleright \varepsilon) \cdot C_{F,s''}((\rightarrow, q', q))\big)$$

Then the product $v_1 \cdots v_{n-k}$ is produced by the $(k+1)$-chained Kleene-star $[L, f]^{(k+1)\star}(w)$. From the above discussion, we also deduce that $v_0 = [\![C_{F^k,s}(x)]\!](u_1 \cdots u_k)$. Therefore, we define $C_{E,s}(x) = \Big(\sum_{n}\cdots$

Figure 11: A right-left run which is 2-backward-progressing.

we split $\rho$ in $\rho_{n-k+1} \cdots \rho_2 \rho_1$ where $\rho_{n-k+1}$ is the prefix of $\rho$ going until the first crossing from $u_{n-k+1}$ to $u_{n-k}$. Then, $\rho_{n-k}$ goes until the first crossing from $u_{n-k}$ to $u_{n-k-1}$. Continuing in the same way, for $n-k > i > 1$, $\rho_i$ goes until the first crossing from $u_i$ to $u_{i-1}$. Finally, $\rho_1$ is the remaining suffix, going until the run exits from $u_1$ on the left. Since the run $\rho$ is $k$-backward-progressing, we deduce that, for $1 \le i \le n-k$, the run $\rho_i$ does not go back to $u_{i+k+1} \cdots u_n$. Hence it is the concatenation of a zigzag internal run over $(u_i, u_{i+1} \cdots u_{i+k})$, starting with some $(q_i, \curvearrowright)$ and exiting with $(\curvearrowleft, q'_i) = Z_{s',s''}(q_i, \curvearrowright)$ where $s' = \varphi(u_i)$ and $s'' = \varphi(u_{i+1} \cdots u_{i+k})$, and a $(\leftarrow, q'_i, q_{i-1})$-run over $u_i$ (see again Figure 11).

Let $v_i$ be the output produced by $\rho_i$ for $1 \le i \le n-k+1$. The output produced by $\rho$ is $v = v_{n-k+1} \cdots v_2 v_1$. Now, the situation is slightly more complicated than for left-right runs, where we could prove that $q_i = q$ for each $i$. Instead, let us remark that $q_i$ is the unique state (by determinacy of $\mathcal{A}$) such that there is a run over $u_{i+1} \cdots u_n$ following step $(\leftarrow, p, q_i)$. But since $X$ is $k$-stabilising, we know that $\varphi(u_{i+1} \cdots u_n) = \varphi(u_{i+1} \cdots u_{i+k})$. Then, given $s' = \varphi(u_i)$ and $s'' = \varphi(u_{i+1} \cdots u_{i+k})$, we get that $q_i$ is the unique state such that $(\leftarrow, p, q_i) \in s''$ and $q_{i-1}$ the one such that $(\leftarrow, p, q_{i-1}) \in s' s''$. Thus we define the function $g$ generating the $v_i$ with $1 \le i \le n-k$ by

$$g = \sum_{\substack{s',s'',p',q',q'' \,\mid\, (\leftarrow, p, p') \in s'', \\ (\curvearrowleft, q') = Z_{s',s''}(p', \curvearrowright),\ (\leftarrow, q', q'') \in s'}} ZC^{F^k,s''}_{F,s'}(p', \curvearrowright) \odot \big(C_{F,s'}((\leftarrow, q', q'')) \cdot (L(F^k, s'') \triangleright \varepsilon)\big)$$

Finally, a right-left run for $F^*$ is either a right-left run over $F^n$ for $n < k$, or the concatenation of a right-left run $\rho_{n-k+1}$ over the $k$ rightmost iterations of $F$, and a sequence of runs $\rho_i$ with $1 \le i \le n-k$ as previously and whose outputs are computed by $g$. Therefore, we define $C_{E,s}(x) = \Big(\sum_{n}\cdots$

Let us continue with our example in Figure 2. Here we illustrate computingan

SDRTE for some E = F ∗ and some left-right step. We consider F = a + b so that ϕ ( F ) = Y as computed in Example 9, where we also computed Z = Y . Consider x = ( → , q , q ) ∈ Z .We explain how to compute C E,Z ( x ). Since ϕ ( F ) = Y is 4-stabilising, the runs followingstep x over words in L ( E, Z ) = ( a + b ) ≥ are 4-forward-progressing, and the construction inthe proof above uses a 5-chained star .Let w = u u · · · u n with n ≥ u , u , . . . , u n ∈ a + b . As can be seen on Figure 12(with n = 8), the 4-forward-progressing run ρ over w following step x = ( → , q , q ) splits as ρ = ρ · · · ρ where ρ is the preﬁx from u till ﬁrst crossing from u to u , ρ is the partfrom u till the ﬁrst crossing from u to u , ρ is the part of the run from u till the ﬁrstcrossing from u to u , ρ is the part of the run from u till the ﬁrst crossing from u to u ,and ρ is the part from u till exiting at the right of u . Actually, the runs over step x are 3-forward-progressing. Hence, we could simplify the example below byusing a 4-chained star only. But we decided to use a 5-star in order to follow the construction describedin the proof above. Figure 12

Illustration for Example 27
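Operationally, the chained star used here is a sliding window over the factors of the input. The following sketch is illustrative only and rests on assumptions: it takes the semantics to be $[L, f]^{k\star}(u_1 \cdots u_n) = f(u_1 \cdots u_k)\, f(u_2 \cdots u_{k+1}) \cdots f(u_{n-k+1} \cdots u_n)$ with empty output when $n < k$, it represents the prefix code $L$ by a Python regular expression, and the helper names `factorize` and `chained_star` are hypothetical. Because $L$ is assumed to be a prefix code, the factor starting at each position is unique, so a left-to-right scan recovers the factorization.

```python
import re
from typing import Callable, List, Optional


def factorize(code: "re.Pattern[str]", w: str) -> Optional[List[str]]:
    """Split w into factors of the prefix code `code`; None if w is not in L*."""
    factors, pos = [], 0
    while pos < len(w):
        m = code.match(w, pos)
        if m is None or m.end() == pos:   # no factor here (or an empty one): give up
            return None
        factors.append(m.group())
        pos = m.end()
    return factors


def chained_star(code: "re.Pattern[str]",
                 f: Callable[[str], Optional[str]],
                 k: int, w: str) -> Optional[str]:
    """Apply f to every window of k consecutive factors and concatenate the outputs."""
    us = factorize(code, w)
    if us is None:
        return None
    pieces = []
    for i in range(len(us) - k + 1):      # empty range when fewer than k factors
        out = f("".join(us[i:i + k]))
        if out is None:                   # f undefined on a window: undefined overall
            return None
        pieces.append(out)
    return "".join(pieces)


L = re.compile(r"a+b")                    # the prefix code a^+ b
w = "ab" + "aab" + "aaab"                 # three factors
print(chained_star(L, lambda v: v, 2, w))  # abaabaabaaab
```

With $k = 2$ and the identity as `f`, the middle factor `aab` is read twice, once in each window, which is exactly the redundancy a two-way run exploits when it revisits earlier factors.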

We obtain C ( a + b ) ∗ ,Z (( → , q , q )) = ( C ( a + b ) ,Z (( → , q , q )) · (( a + b ) ∗ . ε )) (cid:12) [ a + b, f ] ? where f = ZC a + b,Y ( a + b ) ,Z ( q , (cid:121) ) (cid:12) (cid:16) ( a + b ) . ε ) · C a + b,Y (( → , q , q )) (cid:17) ZC a + b,Y ( a + b ) ,Z ( q , (cid:121) ) = (cid:16) (( a + b ) . ε ) · C a + b,Y (( (cid:121) , q , q )) (cid:17) (cid:12) (cid:16) C ( a + b ) ,Z (( (cid:120) , q , q )) · ( a + b . ε ) (cid:17) and the expressions below where computed in Example 26 C a + b,Y (( → , q , q )) = a + b . ε C a + b,Y (( (cid:121) , q , q )) = ( a . a ) + · ( b . b ) C ( a + b ) ,Z (( (cid:120) , q , q )) = (( a + b ) . ε ) · ( a . a ) + · ( b . b ) · ( a + b . ε ) C ( a + b ) ,Z (( → , q , q )) = (cid:16) ( a . a ) + · ( b . b ) · (( a + b ) . ε ) · ( a . a ) + · ( b . b ) (cid:17) (cid:12) (cid:16) ( a + b . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) . ε ) (cid:17) After simpliﬁcations, we obtain f = ZC a + b,Y ( a + b ) ,Z ( q , (cid:121) )= (cid:16) (( a + b ) . ε ) · ( a . a ) + · ( b . b ) (cid:17) (cid:12) (cid:16) (( a + b ) . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) . ε ) (cid:17) . For instance, consider w = u u · · · u with u i = a i b . Then, f ( u · · · u ) = a ba b f ( u · · · u ) = a ba bf ( u · · · u ) = a ba b f ( u · · · u ) = a ba b [[ C ( a + b ) ,Z (( → , q , q ))]]( u · · · u ) = aba ba b [[ C ( a + b ) ∗ ,Z (( → , q , q ))]]( u · · · u ) = aba ba ba ba ba ba ba ba ba ba b . . Dartois, P. Gastin, S. Krishna 33 We conclude the section by showing how to construct

SDRTEs equivalent to 2DFTs.

Theorem 28.

Let $\mathcal{A} = (Q, \Sigma, \Gamma, \delta, \gamma, q_0, F)$ be an aperiodic 2DFT. We can construct an equivalent SDRTE $C_{\mathcal{A}}$ over alphabet $\Sigma$ with $\mathrm{dom}([\![C_{\mathcal{A}}]\!]) = \mathrm{dom}([\![\mathcal{A}]\!])$ and $[\![\mathcal{A}]\!](w) = [\![C_{\mathcal{A}}]\!](w)$ for all $w \in \mathrm{dom}([\![\mathcal{A}]\!])$.

Proof.

We first construct below an SDRTE $C$ with $\mathrm{dom}([\![C]\!]) = {\vdash} \mathrm{dom}([\![\mathcal{A}]\!]) {\dashv}$ and such that $[\![\mathcal{A}]\!](w) = [\![C]\!]({\vdash} w {\dashv})$ for all $w \in \mathrm{dom}([\![\mathcal{A}]\!])$. Then, using Proposition 15, we obtain $C' = ({\vdash}^{-1} C) {\dashv}^{-1}$. Finally, we get rid of lingering endmarkers in $C'$ using Lemma 16 to obtain $C_{\mathcal{A}}$ as the projection of $C'$ on $\Sigma^*$.

Let $\varphi : (\Sigma \uplus \{\vdash, \dashv\})^* \to \mathsf{TrM}$ be the canonical surjective morphism to the transition monoid of $\mathcal{A}$. Since $\mathcal{A}$ is aperiodic, the monoid $\mathsf{TrM}$ is also aperiodic. We can apply Theorem 10 to the restriction of $\varphi$ to $\Sigma^*$: for each $s \in \mathsf{TrM}$, we get an unambiguous, stabilising, SD-regular expression $E_s$ with $L(E_s) = \varphi^{-1}(s) \cap \Sigma^*$. Let $E = {\vdash} \cdot \big(\bigcup_{s \in \mathsf{TrM}} E_s\big)$, which is an unambiguous, stabilising, SD-regular expression with $L(E) = {\vdash} \Sigma^*$. Applying Theorem 24, for each monoid element $s \in \mathsf{TrM}$ and each step $x \in \{\rightarrow, \curvearrowleft, \curvearrowright, \leftarrow\} \times Q^2$, we construct the corresponding SDRTE $C_{E,s}(x)$. We also apply Lemma 25 and construct for each state $p \in Q$ an SDRTE $ZC^{\dashv,t}_{E,s}(p, \curvearrowleft)$ where $t = \varphi(\dashv)$.

Figure 13: Removing end markers. On the left when there is a non-trivial zigzag until reaching a final state $q$; on the right when we have an empty zigzag. $q_0$ is the initial state and $q \in F$.

Finally, we define

$$C = \sum_{\substack{s,p,q \,\mid\, q \in F,\ (\rightarrow, q_0, p) \in s, \\ Z_{s,t}(p, \curvearrowleft) = (\curvearrowright, q)}} \big(C_{E,s}((\rightarrow, q_0, p)) \cdot ({\dashv} \triangleright \varepsilon)\big) \odot ZC^{\dashv,t}_{E,s}(p, \curvearrowleft)$$

See also Figure 13 illustrating $C$. We can easily check that $C$ satisfies the requirements stated above.

Example 29.

We complete the series of examples by giving an

SDRTE equivalent withthe transducer A of Figure 2 on words in E = b ( a + b ) ≥ ⊆ dom ( A ). Notice that byExample 9, we have ϕ ( E ) = Z = Y Z . We compute ﬁrst C E ,Z (( → , s, q )). We usethe unambiguous product E = b · E with E = ( a + b ) ≥ . From Example 9, we have( → , s, q ) , ( (cid:120) , q , q ) ∈ Y = ϕ ( b ) and ( (cid:121) , q , q ) , ( → , q , q ) ∈ Z = ϕ ( E ). We deduce that in the product, the zigzag part consists of the two steps ( (cid:121) , q , q ) and ( (cid:120) , q , q ). Thereforewe obtain: C E ,Z (( → , s, q )) = (cid:0) C b,Y (( → , s, q )) · ( E . ε ) (cid:1) (cid:12) ZC E ,Z b,Y ( q , (cid:121) ) (cid:12) (cid:0) ( b . ε ) · C E ,Z (( → , q , q )) (cid:1) ZC E ,Z b,Y ( q , (cid:121) ) = (cid:0) ( b . ε ) · C E ,Z (( (cid:121) , q , q )) (cid:1) (cid:12) (cid:0) C b,Y (( (cid:120) , q , q )) · ( E . ε ) (cid:1) . Since C b,Y (( → , s, q )) = C b,Y (( (cid:120) , q , q )) = b . ε , we obtain after simpliﬁcations C E ,Z (( → , s, q )) = (cid:0) ( b . ε ) · C E ,Z (( (cid:121) , q , q )) (cid:1) (cid:12) (cid:0) ( b . ε ) · C E ,Z (( → , q , q )) (cid:1) . Let E = ( a + b ) ∗ as in Example 27. Since L ( E , Z ) = L ( E, Z ) we have C E ,Z ( x ) = C E,Z ( x )for all steps x . Recall that in Example 27 we have computed C E,Z (( → , q , q )). Moreover, C E ,Z (( (cid:121) , q , q )) = C ( a + b ) ,Z (( (cid:121) , q , q )) · ( E . ε ) and a computation similar to Example 26gives C ( a + b ) ,Z (( (cid:121) , q , q )) = (( a + b ) . ε ) · ( a . a ) + · ( b . b ) · ( a + b . ε ) . Finally, with the notations of Example 27, we obtain C E ,Z (( → , s, q )) = (cid:0) ( b ( a + b ) . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) + . ε ) (cid:1) (cid:12) (cid:0) ( b . ε ) · C E,Z (( → , q , q )) (cid:1) = (cid:0) ( b ( a + b ) . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) + . ε ) (cid:1) (cid:12) (cid:0) ( b . ε ) · C ( a + b ) ,Z (( → , q , q )) · (( a + b ) ∗ . ε ) (cid:1) (cid:12) (cid:0) ( b . ε ) · [ a + b, f ] ? (cid:1) = (cid:0) ( b ( a + b ) . 
ε ) · ( a . a ) + · ( b . b ) · (( a + b ) + . ε ) (cid:1) (cid:12) (cid:0) ( b . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) ∗ . ε ) (cid:1) (cid:12) (cid:0) ( ba + b . ε ) · ( a . a ) + · ( b . b ) · (( a + b ) ≥ . ε ) (cid:1) (cid:12) (cid:0) ( b . ε ) · [ a + b, f ] ? (cid:1) . Note that for our example, we have a simple case of using Theorem 28, since ‘ is visitedonly once at state s , and there are no transitions deﬁned on a , i.e., ϕ ( a ) = ∅ . So for E = ‘ E ,we have ( → , s, q ) ∈ ϕ ( E ) and ZC a , ∅ E ,ϕ ( E ) ( q , (cid:121) ) = ε (zigzag part empty as seen in the rightof Figure 13). Thus, an expression equivalent to A on words in E eﬀectively boils down to C E ,Z (( → , s, q )). For the sake of simplicity, we stop the example here and do not providethe expression on the full domain dom ( A ) = b ( a ∗ b ) ≥ a ∗ . (cid:74) Since

SDRTEs are used to define functions over words, it seems natural to consider the composition of functions, as it is an easy to understand but powerful operator. In this section, we discuss other formalisms using composition as a basic operator, and having the same expressive power as SDRTEs.

Theorem 1 gives the equivalence between SDRTEs and aperiodic two-way transducers, the latter being known to be closed under composition. Hence, adding composition to SDRTEs does not add expressiveness, while making transformations easier to model. Moreover, we prove that, should we add composition of functions, then we can replace the $k$-chained star operator and its reverse by the simpler 1-star $[L, f]^{1\star}$ and its reverse, which in particular are one-way (left-to-right or right-to-left) operators when $f$ is also one-way.

Finally, we prove that we can furthermore get rid of the reverse operators as well as the Hadamard product by adding two basic functions: reverse and duplicate. The reverse function is straightforward as it reverses its input. The duplicate function is parameterised by a symbol, say $\#$, where $\mathit{dup}_{\#}(u) = u \# u$.

Theorem 30.

The following families of expressions have the same expressive power:
1. SDRTEs,
2. SDRTEs with composition of functions,
3. SDRTEs with composition and chained star restricted to 1,
4. Expressions with simple functions, unambiguous sum, Cauchy product, 1-star, duplicate, reverse and composition.

Proof.

It is trivial that 3 ⊆ ⊆ SDRTE s are equivalent to aperiodic two-way transducers that areclosed under composition. Hence, composition does not add expressive power and we have2 ⊆ ⊆

3, we simply have to prove that the duplicate and reverse functionscan be expressed with

SDRTE s using only the 1-star operator and its reverse. The duplicatefunction deﬁnition relies on the Hadamard and is given by the expression: dup = ( id Σ ∗ · ( ε . (cid:12) id Σ ∗ where id Σ ∗ is the identity function and can be written as [Σ , id Σ ] ? where id Σ = P a ∈ Σ a . a .The reverse function is also easy to deﬁne using the 1-star reverse: rev = [Σ , id Σ ] ?r To prove the last inclusion 1 ⊆

4, we need to express the Hadamard product and the (reverse) \(k\)-chained star, using duplicate, reverse and composition.

The Hadamard product \(f \odot g\) is easy to define using \(\mathsf{dup}_{\$}\), where
\[ f \odot g = \bigl( f \cdot (\$ \triangleright \varepsilon) \cdot g \bigr) \circ \mathsf{dup}_{\$} . \]

We show now how to reduce \(k\)-star to 1-star using duplicate and composition. The proof is by induction on \(k\). When \(k = 1\) there is nothing to do. Assume that \(k > 1\). We show how to express \([L, f]^{k\star}\) using \((k-1)\)-star and 1-star: we first use a 1-star over the factors of \(L\) in order to duplicate them, then use a \((k-1)\)-star to gain access to \(k\) factors of \(L\) (with some redundant information), and lastly use composition to prune the input to a form suitable to finally apply \(f\).

More formally, let
\[ f' = [L, \mathsf{dup}_{\$} \cdot (\varepsilon \triangleright \#)]^{1\star} \]
with domain \(L^*\) and, when applied to a word \(u = u_1 \cdots u_n\) with \(u_i \in L\), produces
\[ u_1 \$ u_1 \# u_2 \$ u_2 \# \cdots u_n \$ u_n \# \;\in\; \{\varepsilon\} \cup \Sigma^* \$ (\Sigma^* \# \Sigma^* \$)^* \Sigma^* \# . \]
Notice that \(\Sigma^* \# \Sigma^* \$\) is a 1-SD prefix code and that taking \(k-1\) consecutive factors from it gives access to \(k\) factors of \(L\). Then we define the function
\[ g = \bigl( \mathrm{id}_{\Sigma^*} \cdot (\# \Sigma^* \$ \triangleright \varepsilon) \bigr)^{k-2} \cdot \bigl( \mathrm{id}_{\Sigma^*} \cdot (\# \triangleright \varepsilon) \cdot \mathrm{id}_{\Sigma^*} \cdot (\$ \triangleright \varepsilon) \bigr) \]
with domain \((\Sigma^* \# \Sigma^* \$)^{k-1}\) and, when applied to a word \(v_1 \# v'_1 \$ v_2 \# v'_2 \$ \cdots v_{k-1} \# v'_{k-1} \$\), produces \(v_1 v_2 \cdots v_{k-1} v'_{k-1}\). In particular,
\[ g(u_{i+1} \# u_{i+2} \$ u_{i+2} \# u_{i+3} \$ \cdots u_{i+k-1} \# u_{i+k} \$) = u_{i+1} \cdots u_{i+k} . \]
Finally, we have
\[ [L, f]^{k\star} = \bigl( (\varepsilon \triangleright \varepsilon) + (\Sigma^* \$ \triangleright \varepsilon) \cdot [\Sigma^* \# \Sigma^* \$, f \circ g]^{(k-1)\star} \cdot (\Sigma^* \# \triangleright \varepsilon) \bigr) \circ f' . \]

The reverse \(k\)-star \([L, f]^{k\star}_{r}\) is not expressed in a straightforward fashion using reverse composed with \(k\)-star, because while reverse applies on all the input, the reverse \(k\)-star swaps the applications of function \(f\) while keeping the function \(f\) itself untouched. In order to express it, we reverse a \(k\)-star operator not on \(f\), but on \(f\) reversed. The result is that the applications of \(f\) are reversed twice, thus preserving them. Formally, we have:
\[ [L, f]^{k\star}_{r} = \mathsf{rev} \circ [L, \mathsf{rev} \circ f]^{k\star} \]
◀

We conclude with some interesting avenues for future work, arising from the open questions based on this paper.

We begin with complexity questions, and then move on to other directions for future work. The complexity of our procedure, especially when going from the declarative language SDRTE to the computing machine 2DFT, is open. This part relies heavily on the composition of 2DFTs which incurs at least one exponential blowup in the state space. A possibility to reduce the complexity incurred during composition is to obtain reversible SDRTE. Another open question is the efficiency of evaluation, i.e., given an SDRTE and an input word, what is the time complexity of obtaining the corresponding output. This is crucial for an implementation, along the lines of DReX [2].

Yet another direction is to extend our result to transformations over infinite words. While Perrin [19] generalized the SF=AP result of Schützenberger to infinite words in the mid 1980s, Diekert and Kufleitner [12, 13] generalized Schützenberger's SD=AP result to infinite words. One could use this SD=AP over infinite words and check how to adapt our proof to the setting of transformations over infinite words. Finally, a long standing open problem in the theory of transformations is to decide if a function given by a 2DFT is realizable by an aperiodic one. This question has been solved in the one-way case, or in the case when we have origin information [5], but the general case remains open. We believe that our characterisation of stabilising runs provided in Section 5.2.1 could lead to a forbidden pattern criterion to decide this question.
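The constructions used in the proof above (the Hadamard product via duplicate, one induction step of the reduction from k-star to (k-1)-star and 1-star, and the reverse k-star identity) can be checked on small examples. The following Python sketch is only an illustration under simplifying assumptions. Words are Python strings, and the unique factorisation of the input over L is supplied as a list rather than computed. The two markers, written $ and # here, are assumed not to occur in Σ. All function names are hypothetical and do not come from the paper.

```python
# Illustrative sketch only. Assumptions: the factorisation over L is given as a
# list of strings, and the markers "$" and "#" do not occur in the alphabet.

def k_star(factors, f, k):
    """Direct semantics of [L, f]^{k*}: f on each window of k consecutive factors."""
    return "".join(f("".join(factors[i:i + k]))
                   for i in range(len(factors) - k + 1))

def reverse_k_star(factors, f, k):
    """Direct semantics of the reverse k-star: same windows, taken right to left."""
    return "".join(f("".join(factors[i:i + k]))
                   for i in reversed(range(len(factors) - k + 1)))

def rev(s):
    return s[::-1]

def dup(u):
    """Duplicate with marker: u -> u$u."""
    return u + "$" + u

def hadamard(f, g):
    """(f . ($ |> eps) . g) o dup: split dup(u) at the marker, then concatenate."""
    def h(u):
        left, right = dup(u).split("$")
        return f(left) + g(right)
    return h

def f_prime(factors):
    """[L, dup . (eps |> #)]^{1*}: produces u_1$u_1# u_2$u_2# ... u_n$u_n#."""
    return "".join(dup(u) + "#" for u in factors)

def g(x):
    """Prune v_1#v'_1$ ... v_m#v'_m$ down to v_1 ... v_m v'_m."""
    pairs = [block.split("#") for block in x.split("$")[:-1]]
    return "".join(v for v, _ in pairs) + pairs[-1][1]

def k_star_reduced(factors, f, k):
    """One induction step (k > 1): k-star via a 1-star, a (k-1)-star and composition."""
    w = f_prime(factors)
    if w == "":                       # the (eps |> eps) summand
        return ""
    body = w[w.index("$") + 1:]       # (Sigma*$ |> eps) erases the leading u_1$
    pieces = body.split("$")          # the last piece is the trailing u_n#, erased too
    blocks = [p + "$" for p in pieces[:-1]]   # factors of Sigma*#Sigma*$
    return k_star(blocks, lambda x: f(g(x)), k - 1)

def reverse_k_star_via_rev(factors, f, k):
    """rev o [L, rev o f]^{k*}: each application of f is reversed twice."""
    return rev(k_star(factors, lambda x: rev(f(x)), k))
```

On a hypothetical factorisation such as `["ab", "c", "de", "f"]` with `k = 3`, `k_star_reduced` and `k_star` apply the given function to the same two windows, `abcde` and `cdef`. The sketch performs a single induction step; the inner (k-1)-star is evaluated directly rather than being reduced further, which would require fresh markers at each level.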

References

[1] Rajeev Alur and Pavol Černý. Expressiveness of streaming string transducers. In volume 8 of LIPIcs. Leibniz Int. Proc. Inform., pages 1–12. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2010.
[2] Rajeev Alur, Loris D'Antoni, and Mukund Raghothaman. DReX: A declarative language for efficiently evaluating regular string transformations. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, pages 125–137, 2015. doi:10.1145/2676726.2676981.
[3] Rajeev Alur, Adam Freilich, and Mukund Raghothaman. Regular combinators for string transformations. In Thomas A. Henzinger and Dale Miller, editors, Joint Meeting of the 23rd EACSL Annual Conference on Computer Science Logic (CSL) and the 29th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), CSL-LICS '14, Vienna, Austria, July 14 - 18, 2014, pages 9:1–9:10. ACM, 2014.
[4] Nicolas Baudru and Pierre-Alain Reynier. From two-way transducers to regular function expressions. In Mizuho Hoshi and Shinnosuke Seki, editors, volume 11088 of Lecture Notes in Computer Science, pages 96–108. Springer, 2018.
[5] Mikolaj Bojanczyk. Transducers with origin information. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part II, pages 26–37, 2014. doi:10.1007/978-3-662-43951-7_3.
[6] Mikolaj Bojanczyk, Laure Daviaud, and Shankara Narayanan Krishna. Regular and first-order list functions. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, July 09-12, 2018, pages 125–134, 2018. doi:10.1145/3209108.3209163.
[7] J. Richard Büchi. Weak second-order arithmetic and finite automata. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 6:66–92, 1960.
[8] Olivier Carton and Luc Dartois. Aperiodic two-way transducers and FO-transductions. In Stephan Kreutzer, editor, volume 41 of LIPIcs, pages 160–174. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2015. doi:10.4230/LIPIcs.CSL.2015.160.
[9] Bruno Courcelle. Monadic second-order definable graph transductions: a survey [see MR1251992 (94f:68009)]. Theoret. Comput. Sci., 126(1):53–75, 1994. Seventeenth Colloquium on Trees in Algebra and Programming (CAAP '92) and European Symposium on Programming (ESOP) (Rennes, 1992). URL: http://dx.doi.org/10.1016/0304-3975(94)90268-2, doi:10.1016/0304-3975(94)90268-2.
[10] Luc Dartois, Paulin Fournier, Ismaël Jecker, and Nathan Lhote. On reversible transducers. In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, volume 80 of LIPIcs, pages 113:1–113:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.ICALP.2017.113.
[11] Vrunda Dave, Paul Gastin, and Shankara Narayanan Krishna. Regular Transducer Expressions for Regular Transformations. In Martin Hofmann, Anuj Dawar, and Erich Grädel, editors, Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic In Computer Science (LICS'18), pages 315–324, Oxford, UK, July 2018. ACM Press.
[12] Volker Diekert and Manfred Kufleitner. Bounded synchronization delay in omega-rational expressions. In Computer Science - Theory and Applications - 7th International Computer Science Symposium in Russia, CSR 2012, Nizhny Novgorod, Russia, July 3-7, 2012. Proceedings, pages 89–98, 2012. doi:10.1007/978-3-642-30642-6_10.
[13] Volker Diekert and Manfred Kufleitner. Omega-rational expressions with bounded synchronization delay. Theory of Computing Systems, 56(4):686–696, 2015.
[14] Volker Diekert and Manfred Kufleitner. A survey on the local divisor technique. Theoretical Computer Science, 610:13–23, Jan 2016.
[15] Joost Engelfriet and Hendrik Jan Hoogeboom. MSO definable string transductions and two-way finite-state transducers. ACM Trans. Comput. Log., 2(2):216–254, 2001. URL: http://dx.doi.org/10.1145/371316.371512, doi:10.1145/371316.371512.
[16] Emmanuel Filiot, Shankara Narayanan Krishna, and Ashutosh Trivedi. First-order definable string transformations. In Venkatesh Raman and S. P. Suresh, editors, volume 29 of LIPIcs, pages 147–159. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.FSTTCS.2014.147, doi:10.4230/LIPIcs.FSTTCS.2014.147.
[17] Paul Gastin. Modular descriptions of regular functions. In Algebraic Informatics - 8th International Conference, CAI 2019, Niš, Serbia, June 30 - July 4, 2019, Proceedings, pages 3–9, 2019. doi:10.1007/978-3-030-21363-3_1.
[18] Robert McNaughton and Seymour Papert. Counter-Free Automata. The MIT Press, Cambridge, Mass., 1971.
[19] Dominique Perrin. Recent results on automata and infinite words. In Mathematical Foundations of Computer Science 1984, Praha, Czechoslovakia, September 3-7, 1984, Proceedings, pages 134–148, 1984. doi:10.1007/BFb0030294.
[20] Dominique Perrin and Jean-Eric Pin. Infinite Words: Automata, Semigroups, Logic and Games, volume 141. Elsevier, 2004.
[21] Marcel Paul Schützenberger. On finite monoids having only trivial subgroups. Information and Control, 8(2):190–194, 1965.
[22] Marcel-Paul Schützenberger. Sur certaines opérations de fermeture dans les langages rationnels. In