One-way resynchronizability of word transducers

Sougata Bose, S. N. Krishna, Anca Muscholl, and Gabriele Puppis
LaBRI, University of Bordeaux, France
Dept. of Computer Science & Engineering, IIT Bombay, India
Dept. of Mathematics, Computer Science, and Physics, Udine University, Italy
Abstract.
The origin semantics for transducers was proposed in 2014, and it led to various characterizations and decidability results that are in contrast with the classical semantics. In this paper we add a further decidability result for characterizing transducers that are close to one-way transducers in the origin semantics. We show that it is decidable whether a non-deterministic two-way word transducer can be resynchronized by a bounded, regular resynchronizer into an origin-equivalent one-way transducer. The result is in contrast with the usual semantics, where it is undecidable to know whether a non-deterministic two-way transducer is equivalent to some one-way transducer.
Keywords:
String transducers · Resynchronizers · One-way transducers
Regular word-to-word functions form a robust and expressive class of transformations, as they correspond to deterministic two-way transducers, to deterministic streaming string transducers [1], and to monadic second-order logical transductions [11]. However, the transition from word languages to functions over words is often quite tricky. One of the challenges is to come up with effective characterizations of restricted transformations. A first example is the characterization of functions computed by one-way transducers (known as rational functions). It turns out that it is decidable whether a regular function is rational [14], but the algorithm is quite involved [3]. In addition, non-determinism makes the problem intractable: it is undecidable whether the relation computed by a non-deterministic two-way transducer can also be computed by a one-way transducer [2]. A second example is the problem of knowing whether a regular word function can be described by a first-order logical transduction. This question is still open in general [16], and it is only known how to decide if a rational function is definable in first-order logic [13].

Word transducers with origin semantics were introduced by Bojańczyk [4] and shown to provide a machine-independent characterization of regular word-to-word functions. The origin semantics, as the name suggests, means tagging the output by the positions of the input that generated that output.

Fig. 1: On the left, an input-output pair for a transducer T that reads wd and outputs dw, d ∈ Σ, w ∈ Σ*, the arrows denoting origins. On the right, the same input-output pair, but with origins modified by a resynchronizer R. The resynchronized relation R(T) is order-preserving, and T is one-way resynchronizable.

A nice phenomenon is that origins can restore decidability for some interesting problems.
For example, the equivalence of word relations computed by one-way transducers, which is undecidable in the classical semantics [18,19], is PSPACE-complete for two-way non-deterministic transducers in the origin semantics [7]. Another, deeper, observation is that the origin semantics provides an algebraic approach that can be used to decide fragments. For example, [4] provides an effective characterization of first-order definable word functions under the origin semantics. As for the problem of knowing whether a regular word function is rational, it becomes almost trivial in the origin semantics.

A possible objection against the origin semantics is that the comparison of two transducers in the origin semantics is too strict. Resynchronizations were proposed in order to overcome this issue. A resynchronization is a binary relation between input-output pairs with origins that preserves the input and the output, changing only the origins. Resynchronizations were introduced for one-way transducers [15], and later for two-way transducers [7]. For one-way transducers, rational resynchronizations are transducers acting on the synchronization languages, whereas for two-way transducers, regular resynchronizations are described by regular properties over the input that restrict the change of origins. The class of bounded regular resynchronizations was shown to behave very nicely, preserving the class of transductions defined by non-deterministic two-way transducers: for any bounded regular resynchronization R and any two-way transducer T, the resynchronized relation R(T) can be computed by another two-way transducer [7]. In particular, non-deterministic two-way transducers can be effectively compared modulo bounded regular resynchronizations.

As mentioned above, it is easy to know if a two-way transducer is equivalent under the origin semantics to some one-way transducer [4], since this is equivalent to being order-preserving.
But what happens if this is not the case? Still, the given transducer T can be "close" to some order-preserving transducer. What we mean here by "close" is that there exists some bounded regular resynchronizer R such that R(T) is order-preserving and all input-output pairs with origins produced by T are in the domain of R. ("Bounded" refers here to the number of source positions that are mapped to the same target position; it rules out resynchronizations such as the universal one.) We call such transducers one-way resynchronizable. Figure 1 gives an example.

In this paper we show that it is decidable if a two-way transducer is one-way resynchronizable. We first solve the problem for bounded-visit two-way transducers. A bounded-visit transducer is one for which there is a uniform bound on the number of visits of any input position. Then, we use this result to show that one-way resynchronizability is decidable for arbitrary two-way transducers, so without the bounded-visit restriction. This is done by constructing, if possible, a bounded, regular resynchronization from the given transducer to a bounded-visit transducer with regular language outputs. Finally, we show that bounded regular resynchronizations are closed under composition, and this allows us to combine the previous construction with our decidability result for bounded-visit transducers.

Related work and paper overview.
The synthesis problem for resynchronizers asks to compute a resynchronizer from one transducer to another one, when the two transducers are given as input. The problem was studied in [6] and shown to be decidable for unambiguous two-way transducers (it is open for unrestricted transducers). The paper [21] shows that the containment version of the above problem is undecidable for unrestricted one-way transducers.

The origin semantics for streaming string transducers (SST) [1] has been studied in [5], providing a machine-independent characterization of the sets of origin graphs generated by SSTs. An open problem here is to characterize origin graphs generated by aperiodic streaming string transducers [10,16]. Going beyond words, [17] investigates decision problems of tree transducers with origin, and regains the decidability of the equivalence problem for non-deterministic top-down and MSO transducers by considering the origin semantics. An open problem for tree transducers with origin is that of synthesizing resynchronizers as in the word case.

We will recall regular resynchronizations in Section 3. Section 4 provides the proof ingredients for the bounded-visit case, and the proof of decidability of one-way resynchronizability in the bounded-visit case can be found in Section 5. Finally, in Section 6 we sketch the proof in the general case. Missing proofs can be found in the appendix.
Let Σ be a finite input alphabet. Given a word w ∈ Σ* of length |w| = n, a position is an element of its domain dom(w) = {1, ..., n}. For every position i, w(i) denotes the letter at that position. A cut of w is any number from 1 to |w| + 1, so a cut identifies a position between two consecutive letters of the input. The cut i = 1 represents the position just before the first input letter, and i = |w| + 1 the position just after the last letter of w.

Two-way transducers.
We use two-way transducers as defined in [3,6], with a slightly different presentation than in classical papers such as [22]. As usual for two-way machines, for any input w ∈ Σ*, w(0) = ⊢ and w(|w| + 1) = ⊣, where ⊢, ⊣ ∉ Σ are special markers used as delimiters.

A two-way transducer (or just transducer from now on) is a tuple T = (Q, Σ, Γ, Δ, I, F), where
Σ, Γ are respectively the input and output alphabets, Q = Q≺ ⊎ Q≻ is the set of states, partitioned into left-reading states from Q≺ and right-reading states from Q≻, I ⊆ Q≻ is the set of initial states, F ⊆ Q is the set of final states, and Δ ⊆ Q × (Σ ⊎ {⊢, ⊣}) × Γ* × Q is the finite transition relation. Left-reading states read the letter to the left, whereas right-reading states read the letter to the right. This partitioning will also determine the head movement during a transition, as explained below.

As usual, to define runs of transducers we first define configurations. Given a transducer T and a word w ∈ Σ*, a configuration of T on w is a state-cut pair (q, i), with q ∈ Q and 1 ≤ i ≤ |w| + 1. A configuration (q, i), 1 ≤ i ≤ |w| + 1, means that the automaton is in state q and its head is between the (i − 1)-th and the i-th letter of w. The transitions that depart from a configuration (q, i) and read a are denoted (q, i) --a--> (q′, i′), and must satisfy one of the following:
(1) q ∈ Q≻, q′ ∈ Q≻, a = w(i), (q, a, v, q′) ∈ Δ, and i′ = i + 1,
(2) q ∈ Q≻, q′ ∈ Q≺, a = w(i), (q, a, v, q′) ∈ Δ, and i′ = i,
(3) q ∈ Q≺, q′ ∈ Q≻, a = w(i − 1), (q, a, v, q′) ∈ Δ, and i′ = i,
(4) q ∈ Q≺, q′ ∈ Q≺, a = w(i − 1), (q, a, v, q′) ∈ Δ, and i′ = i − 1.
When T has only right-reading states (i.e. Q≺ = ∅), its head can only move rightward. In this case we call T a one-way transducer.

A run of T on w is a sequence ρ = (q_1, i_1) --a_{j_1}|v_1--> (q_2, i_2) --a_{j_2}|v_2--> ··· --a_{j_m}|v_m--> (q_{m+1}, i_{m+1}) of configurations connected by transitions. Note that the positions j_1, j_2, ..., j_m of the letters read do not need to be ordered from smaller to bigger, and can differ slightly (by +1 or −1) from the cuts i_1, i_2, ..., i_{m+1}, since cuts take values in between consecutive letters.

A configuration (q, i) on w is initial (resp. final) if q ∈ I and i = 1 (resp. q ∈ F and i = |w| + 1). A run is successful if it starts with an initial configuration and ends with a final configuration. The output associated with a successful run ρ as above is the word v_1 v_2 ··· v_m ∈ Γ*. A transducer T defines a relation [[T]] ⊆ Σ* × Γ* consisting of all the pairs (u, v) such that v is the output of some successful run ρ of T on u.

Origin semantics.

In the origin semantics for transducers [4] the output is tagged with information about the position of the input where it was produced. If reading the i-th letter of the input we output v, then all letters of v are tagged with i, and we say they have origin i. We use the notation (v, i), for v ∈ Γ*, to denote that all positions in the output word v have origin i, and we view (v, i) as a word over the alphabet Γ × ℕ. The outputs associated with a successful run ρ = (q_1, i_1) --b_1|v_1--> (q_2, i_2) --b_2|v_2--> ··· --b_m|v_m--> (q_{m+1}, i_{m+1}) in the origin semantics are the words of the form ν = (v_1, j_1)(v_2, j_2) ··· (v_m, j_m) over Γ × ℕ where, for all 1 ≤ k ≤ m, j_k = i_k if q_k ∈ Q≻, and j_k = i_k − 1 if q_k ∈ Q≺. Under the origin semantics, the relation defined by T, denoted [[T]]_o, is the set of pairs σ = (u, ν) (called synchronized pairs) such that u ∈ Σ* and ν ∈ (Γ × ℕ)* is the output of some successful run on u.

Equivalently, a synchronized pair (u, ν) can be described as a triple (u, v, orig), where v is the projection of ν on Γ, and orig : dom(v) → dom(u) associates with each position of v its origin in u. So for ν = (v_1, j_1)(v_2, j_2) ··· (v_m, j_m) as above, v = v_1 ··· v_m, and, for all positions i such that |v_1 ··· v_{k−1}| < i ≤ |v_1 ··· v_k|, we have orig(i) = j_k. Given two transducers T_1, T_2, we say that they are origin-equivalent if [[T_1]]_o = [[T_2]]_o. Note that two transducers T_1, T_2 can be equivalent in the classical semantics, [[T_1]] = [[T_2]], while having different origin semantics, so [[T_1]]_o ≠ [[T_2]]_o.

Bounded-visit transducers.
Let k > 0 and let ρ be some run of a two-way transducer T. We say that ρ is k-visit if, for every i ≥ 0, it has at most k occurrences of configurations from Q × {i}. We call a transducer T k-visit if for every σ ∈ [[T]]_o there is some successful k-visit run ρ of T with output σ (strictly speaking we should call the transducer k-visit in the origin semantics, but for simplicity we omit this). For example, the relation {(w, w̄) | w ∈ Σ*}, where w̄ denotes the reverse of w, can be computed by a 3-visit transducer. A transducer is called bounded-visit if it is k-visit for some k.

Common guess.
It is often useful to work with a variant of two-way transducers that can guess beforehand some annotation on the input and inspect it consistently when visiting portions of the input multiple times. This feature is called common guess [5], and strictly increases the expressive power of two-way transducers, including bounded-visit ones.
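To keep the preceding definitions concrete, a synchronized pair (u, v, orig) can be modeled directly as data. The sketch below (hypothetical Python, not from the paper; the function name is ours) builds the synchronized pair produced by the reversal transduction mentioned above, which is computable by a 3-visit transducer.

```python
def reversal_sync_pair(u):
    """Synchronized pair (u, v, orig) for the reversal transduction
    {(w, reverse(w))}: the i-th output letter (1-based) is produced
    while reading input position |u| - i + 1, which is its origin."""
    n = len(u)
    v = u[::-1]
    orig = {i: n - i + 1 for i in range(1, n + 1)}  # output pos -> input pos
    return u, v, orig

u, v, orig = reversal_sync_pair("abc")
# v is "cba"; the origin of output position 1 ('c') is input position 3.
```

Note that this synchronized pair is not order-preserving: output positions increase while their origins decrease, which is exactly why reversal requires a second pass.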
Resynchronizations are used to compare transductions in the origin semantics. A resynchronization is a binary relation
R ⊆ (Σ* × (Γ × ℕ)*)² over synchronized pairs such that (σ, σ′) ∈ R implies σ = (u, v, orig) and σ′ = (u, v, orig′) for some origin mappings orig, orig′ : dom(v) → dom(u). In other words, a resynchronization will only change the origin mapping, but neither the input nor the output. Given a relation S ⊆ Σ* × (Γ × ℕ)* with origins, the resynchronized relation R(S) is defined as R(S) = {σ′ | (σ, σ′) ∈ R, σ ∈ S}. For a transducer T we abbreviate R([[T]]_o) by R(T). The typical use of a resynchronization R is to ask, given two transducers T, T′, whether R(T) and T′ are origin-equivalent.

Regular resynchronizers (originally called MSO resynchronizers) were introduced in [7] as a resynchronization mechanism that preserves definability by two-way transducers. They were inspired by MSO (monadic second-order) transductions [9,12] and are formally defined as follows. A regular resynchronizer is a tuple R = (I, O, ipar, opar, (move_τ)_τ, (next_{τ,τ′})_{τ,τ′}) consisting of
– some monadic parameters (colors) I = (I_1, ..., I_m) and O = (O_1, ..., O_n),
– MSO sentences ipar, opar defining languages over expanded input and output alphabets, i.e. over Σ′ = Σ × {0,1}^m and Γ′ = Γ × {0,1}^n, respectively,
– MSO formulas move_τ(y, z) and next_{τ,τ′}(z, z′) with two free first-order variables, parametrized by expanded output letters τ, τ′ (called types, see below).

To apply a regular resynchronizer as above, one first guesses a valuation of all the predicates I_j, O_k, and uses it to interpret the parameters I and O. Based on the chosen valuation of the parameters O, each position x of the output v gets an associated type τ_x = (v(x), b_1, ..., b_n) ∈ Γ × {0,1}^n, where b_j is 1 or 0 depending on whether x ∈ O_j or not.
We refer to the output word together with the valuation of the output parameters as the annotated output, so a word over Γ × {0,1}^n. Similarly, the annotated input is a word over Σ × {0,1}^m. The annotated input and output words must satisfy the formulas ipar and opar, respectively.

The origins of output positions are constrained using the formulas move_τ and next_{τ,τ′}, which are parametrized by output types and evaluated over the annotated input. Intuitively, the formula move_τ(y, z) states how the origin of every output position of type τ changes from y to z. We refer to y and z as the source and target origin, respectively. The formula next_{τ,τ′}(z, z′) instead constrains the target origins z, z′ of any two consecutive output positions with types τ and τ′, respectively.

Formally, R = (I, O, ipar, opar, (move_τ), (next_{τ,τ′})) defines the resynchronization consisting of all pairs (σ, σ′), with σ = (u, v, orig), σ′ = (u, v, orig′), u ∈ Σ*, and v ∈ Γ*, for which there exist u′ ∈ Σ′* and v′ ∈ Γ′* such that
– π_Σ(u′) = u and π_Γ(v′) = v,
– u′ satisfies ipar and v′ satisfies opar,
– (u′, orig(x), orig′(x)) satisfies move_τ for all τ-labeled output positions x ∈ dom(v′), and
– (u′, orig′(x), orig′(x + 1)) satisfies next_{τ,τ′} for all x, x + 1 ∈ dom(v′) such that x and x + 1 have label τ and τ′, respectively.

Example 1.
Consider the following resynchronization R. A pair (σ, σ′) belongs to R if σ = (uv, uwv, orig), σ′ = (uv, uwv, orig′), with u, v, w ∈ Σ+. The origins orig and orig′ are both the identity over u and v. The origin of every position of w in σ (hence a source origin) is either the first or the last position of v. The origin of every position of w in σ′ (a target origin) is the first position of v.

This resynchronization is described by a regular resynchronizer that uses two input parameters I_1, I_2 to mark the last and the first positions of v in the input, and one output parameter O_1 to mark the factor w in the output. The formula move_τ(y, z) is either (I_1(y) ∨ I_2(y)) ∧ I_2(z) or (y = z), depending on whether the type τ describes a position inside w or a position outside w.

We now turn to describing some important restrictions on (regular) resynchronizers. Let R = (I, O, ipar, opar, (move_τ), (next_{τ,τ′})) be a resynchronizer.
– R is k-bounded (or just bounded) if for every annotated input u′ ∈ Σ′*, every output type τ ∈ Γ′, and every position z, there are at most k positions y such that (u′, y, z) satisfies move_τ. Recall that y, z are input positions.
– R is T-preserving, for a given transducer T, if every σ ∈ [[T]]_o belongs to the domain of R.
– R is partially bijective if each move_τ formula defines a partial, bijective function from source origins to target origins. Observe that this property implies that R is 1-bounded.

The boundedness restriction rules out resynchronizations such as the universal one, which imposes no restriction on the change of origins. It is a decidable restriction [7], and it guarantees that definability by two-way transducers is effectively preserved under regular resynchronizations, modulo common guess.
More precisely, Theorem 16 in [7] shows that, given a bounded regular resynchronizer R and a transducer T, one can construct a transducer T′ with common guess that is origin-equivalent to R(T).

Example 1 (continued).
Consider again the regular resynchronizer R described in the previous example. Note that R is 2-bounded, since at most two source origins are redirected to the same target origin. If we used an additional output parameter to distinguish, among the positions of w, those that have source origin in the first position of v from those that have source origin in the last position of v, we would get a 1-bounded regular resynchronizer.

We state below two crucial properties of regular resynchronizers (the second lemma is reminiscent of Lemma 11 from [21], which proves closure of bounded resynchronizers with vacuous next_{τ,τ′} relations).

Lemma 1.
Every bounded, regular resynchronizer is effectively equivalent to some 1-bounded, regular resynchronizer.

Lemma 2.
The class of bounded, regular resynchronizers is effectively closed under composition.
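At the level of the induced relations on synchronized pairs (ignoring the MSO machinery entirely), composing two resynchronizations is plain relational composition. The toy Python sketch below, with hypothetical finite relations over hashable synchronized pairs, only illustrates the statement of Lemma 2; the lemma itself is about the MSO presentations being effectively composable.

```python
def compose(r1, r2):
    """Relational composition of two resynchronizations, each given
    extensionally as a set of (sigma, sigma') pairs: the result relates
    sigma to sigma'' whenever some middle sigma' connects them."""
    return {(s, s2) for (s, s1) in r1 for (t, s2) in r2 if s1 == t}

# Synchronized pairs as (input, output, origins), origins as a tuple.
a = ("ab", "ba", (2, 1))   # original origins
b = ("ab", "ba", (1, 1))   # after redirecting the first origin
c = ("ab", "ba", (1, 2))   # order-preserving origins
assert compose({(a, b)}, {(b, c)}) == {(a, c)}
```

The non-trivial content of Lemma 2 is that boundedness and regularity survive this composition, which is not visible at this extensional level.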
Given a two-way transducer T one can ask if it is origin-equivalent to some one-way transducer. It was observed in [4] that this property holds if and only if all synchronized pairs defined by T are order-preserving, namely, for all σ = (u, v, orig) ∈ [[T]]_o and all y, y′ ∈ dom(v) with y < y′, we have orig(y) ≤ orig(y′). The decidability of the above question should be contrasted with the analogous question in the classical semantics: "is a given two-way transducer classically equivalent to some one-way transducer?" The latter problem turns out to be decidable for functional transducers [14,3], but is undecidable for arbitrary two-way transducers [2]. Here we are interested in a different, more relaxed notion:

Definition 1.
A transducer T is called one-way resynchronizable if there exists a bounded, regular resynchronizer R that is T-preserving and such that R(T) is order-preserving.

Note that if T′ is an order-preserving transducer, then one can rather easily construct a one-way transducer T″ such that [[T′]]_o = [[T″]]_o, by eliminating non-productive U-turns from accepting runs. Moreover, note that without the condition of being T-preserving every transducer T would be one-way resynchronizable, using the empty resynchronization.

Example 2.
Consider the transducer T that moves the last letter of the input wa to the front, by a first left-to-right pass that outputs the last letter a, followed by a right-to-left pass without output, and finally a left-to-right pass that produces the remaining w. Let R be the bounded regular resynchronizer that redirects the origin of the last a to the first position. Assuming an output parameter O_1, with an interpretation constrained by opar that marks the last position of the output, the formula move_τ(y, z), for types τ marking the last a, says that the target origin z (resp. source origin y) of the last a is the first (resp. last) position of the input. It is easy to see that R(T) is origin-equivalent to the one-way transducer that, on input wa, guesses a and outputs aw. Thus, T is one-way resynchronizable. See also Figure 1.

Example 3.
Consider the transducer T that reads inputs of the form uv and outputs vu in the obvious way, by a first left-to-right pass that outputs v, followed by a right-to-left pass, and finally a left-to-right pass that outputs u. Using the characterization based on the notion of cross-width that we introduce below, it can be shown that T is not one-way resynchronizable.

In order to give a flavor of our results, we anticipate here the two main theorems, before introducing the key technical concepts of cross-width and inversion (these will be defined further below).

Theorem 1.
For every bounded-visit transducer T, the following are equivalent:
(1) T is one-way resynchronizable,
(2) the cross-width of T is finite,
(3) no successful run of T has inversions,
(4) there is a partially bijective, regular resynchronizer R that is T-preserving and such that R(T) is order-preserving.
Moreover, condition (3) is decidable.

We will use Theorem 1 to show that one-way resynchronizability is decidable for arbitrary two-way transducers (not just bounded-visit ones).
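All four conditions of Theorem 1 revolve around R(T) being order-preserving. For an individual synchronized pair, this property is simply monotonicity of the origin mapping, as the following small Python check makes explicit (an illustration of the definition, not part of the paper's decision procedure).

```python
def order_preserving(orig):
    """orig maps output positions 1..|v| to input positions; the pair is
    order-preserving iff y < y' implies orig(y) <= orig(y')."""
    values = [orig[y] for y in sorted(orig)]
    return all(a <= b for a, b in zip(values, values[1:]))

assert order_preserving({1: 1, 2: 1, 3: 2})       # one-way behavior
assert not order_preserving({1: 3, 2: 1, 3: 2})   # origins go backwards
```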
Theorem 2.
It is decidable whether a given two-way transducer T is one-way resynchronizable.

Let us now introduce the first key concept, that of cross-width:
Definition 2 (cross-width).
Let σ = (u, v, orig) be a synchronized pair and let X_1, X_2 ⊆ dom(v) be sets of output positions such that, for all x_1 ∈ X_1 and x_2 ∈ X_2, x_1 < x_2 and orig(x_1) > orig(x_2). We call such a pair (X_1, X_2) a cross, and define its width as min(|orig(X_1)|, |orig(X_2)|), where orig(X) = {orig(x) | x ∈ X} is the set of origins corresponding to a set X of output positions. The cross-width of a synchronized pair σ is the maximal width of the crosses in σ. A transducer T has bounded cross-width if, for some integer k, all synchronized pairs associated with successful runs of T have cross-width at most k.

For instance, the transducer T from Example 3 has unbounded cross-width. In contrast, the transducer T from Example 2 has cross-width one.

The other key notion of inversion will be introduced formally in the next section, as it requires a few technical definitions. The notion however is very similar in spirit to that of cross, with the difference that a single inversion is sufficient for witnessing a family of crosses with arbitrarily large cross-width.

This section provides an overview of the proof of Theorem 1, and introduces the main ingredients. We will use flows (a concept inspired by crossing sequences [22,3] and revised in Section 4.1) in order to derive the key notion of inversion. Roughly speaking, an inversion in a run involves two loops that produce outputs in an order that is reversed compared to the order on origins. Inversions were also used in the characterization of one-way definability of two-way transducers under the classical semantics [3]. There, they were used for deriving some combinatorial properties of outputs. Here we are only interested in detecting inversions, and this is a simple task.

Flows will also be used to associate factorization trees with runs (the existence of factorization trees of bounded height was established by Simon's celebrated factorization theorem [23]).
We will use a structural induction on these factorization trees, together with the assumption that no successful run has an inversion, to construct a regular resynchronization witnessing one-way resynchronizability of the transducer at hand. Another important ingredient underlying the main characterization is the notion of dominant output interval (Section 4.2), which is used to formalize the invariant of our inductive construction.
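Definition 2 can also be read algorithmically: any cross is separated by some cut in the output and some threshold on origins, so the cross-width of a single synchronized pair can be computed by brute force over both. The Python sketch below only illustrates the definition (the paper never needs to compute cross-width this way, and the function name is ours).

```python
def cross_width(orig):
    """Cross-width of a synchronized pair, given its origin mapping
    orig: output position -> input position (1-based).  Any cross
    (X1, X2) is separated by an output cut p and an origin threshold t,
    so enumerating (p, t) suffices; the width of the best cross is
    min(#origins of X1, #origins of X2)."""
    xs = sorted(orig)
    best = 0
    for p in range(1, len(xs)):
        for t in sorted(set(orig.values())):
            o1 = {orig[x] for x in xs[:p] if orig[x] > t}   # X1: left, high origins
            o2 = {orig[x] for x in xs[p:] if orig[x] <= t}  # X2: right, low origins
            if o1 and o2:
                best = max(best, min(len(o1), len(o2)))
    return best

# Reversal of a 4-letter word: cross-width 2, and it grows with the length,
# matching the intuition that reversal-like behavior is not one-way.
assert cross_width({1: 4, 2: 3, 3: 2, 4: 1}) == 2
# An order-preserving pair has no crosses at all.
assert cross_width({1: 1, 2: 2, 3: 3}) == 0
```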
Intervals. An interval of a word is a set of consecutive positions in it. An interval is often denoted I = [i, i′), with i = min(I) and i′ = max(I) + 1. Given two intervals I = [i, i′) and J = [j, j′), we write I < J if i′ ≤ j, and we say that I, J are adjacent if i′ = j. The union of two adjacent intervals I = [i, i′) and J = [j, j′), denoted I · J, is the interval [i, j′) (if I, J are not adjacent, then I · J is undefined).

Subruns.
Given a run ρ of a transducer, a subrun is a factor of ρ. Note that a subrun of a two-way transducer may visit a position of the input several times. For an input interval I = [i, j) and a run ρ, we say that a subrun ρ′ of ρ spans over I if i (resp. j) is the smallest (resp. greatest) input position labeling some transition of ρ′. The left-hand side of the figure at page 11 gives an example of an interval I of an input word, together with the subruns α_1, α_2, α_3, β_1, β_2, β_3, γ that span over it. Subruns spanning over an interval can be left-to-right, left-to-left, right-to-left, or right-to-right, depending on where the starting and ending positions are w.r.t. the endpoints of the interval.

Flows.
Flows are used to summarize subruns of a two-way transducer that span over a given interval. The definition below is essentially taken from [3], except for replacing "functional" by "K-visit". Formally, a flow of a transducer T is a graph with vertices divided into two groups, L-vertices and R-vertices, labeled by states of T, and with directed edges also divided into two groups, productive and non-productive edges. The graph satisfies the following requirements. First, edge sources are either an L-vertex labeled by a right-reading state or an R-vertex labeled by a left-reading state, and symmetrically for edge destinations; moreover, edges are of one of the following types: LL, LR, RL, RR. Second, each node is the endpoint of exactly one edge. Finally, the L-vertices (resp. R-vertices) are totally ordered, in such a way that for every LL (resp. RR) edge (v, v′) we have v < v′. We will only consider flows of K-visiting transducers, so flows with at most 2K vertices. For example, the flow in the left-hand side of the figure at page 11 has six L-vertices on the left and six R-vertices on the right; the edges α_1, α_2, α_3 are LL, LR, and RR, respectively.

Given a run ρ of T and an interval I = [i, i′) on the input, the flow of ρ on I, denoted flow_ρ(I), is obtained by identifying every configuration at position i (resp. i′) with an L-vertex (resp. R-vertex), labeled by the state of the configuration, and every subrun spanning over I with an edge connecting the appropriate vertices (this subrun is called the witnessing subrun of the edge of the flow). An edge is said to be productive if its witnessing subrun produces non-empty output.

Flow monoid.
The composition of two flows F and G is defined when the R-vertices of F induce the same sequence of labels as the L-vertices of G. In this case, the composition results in the flow F · G that has as vertices the L-vertices of F and the R-vertices of G, and as edges the directed paths in the graph obtained by glueing the R-vertices of F with the L-vertices of G so that states are matched. Productiveness of edges is inherited by paths, meaning that an edge of F · G is productive if and only if the corresponding path contains at least one edge (from F or G) that is productive. When the composition is undefined, we simply write F · G = ⊥. The above definitions naturally give rise to a flow monoid associated with the transducer T, where the elements are the flows of T, extended with a dummy element ⊥, and the product operation is given by the composition of flows, with the convention that ⊥ is absorbing. It is easy to verify that for any two adjacent intervals I < J of a run ρ, flow_ρ(I) · flow_ρ(J) = flow_ρ(I · J). We denote by M_T the flow monoid of a K-visiting transducer T.

Let us estimate the size of M_T. If Q is the set of states of T, there are at most |Q|^{2K} possible sequences of L- and R-vertices, and the number of possible sets of edges (each edge marked as productive or not) is bounded by (2K + 1)^{2K}. Including the dummy element ⊥ in the flow monoid, we get |M_T| ≤ (|Q| · (2K + 1))^{2K} + 1 =: M.

Loops.
A loop of a run ρ over input w is an interval I = [i, j) whose flow F = flow_ρ(I) satisfies F · F = F (we call such an F idempotent). The run ρ can be pumped on a loop I = [i, j) as expected: given n > 0, we let pump_I^n(ρ) be the run obtained from ρ by glueing the subruns that span over the intervals [1, i) and [j, |w| + 1) with n copies of the subruns spanning over I (see the figure to the right).

The lemma below shows that the occurrence order relative to subruns witnessing LR or RL edges of a loop (called straight edges, for short) is preserved when pumping the loop. This seemingly straightforward lemma is needed for detecting inversions, and its proof is surprisingly non-trivial. For example, the external edge connecting the two L-vertices occurs before α_2, and also before every copy of α_2 in the run where the loop I is pumped.

Lemma 3.
Let ρ be a run of T on u, let J < I < K be a partition of the domain of u into intervals, with I a loop of ρ, and let F = flow_ρ(J), E = flow_ρ(I), and G = flow_ρ(K) be the corresponding flows. Consider an arbitrary edge f of either F or G, and a straight edge e of the idempotent flow E. Let ρ_f and ρ_e be the witnessing subruns of f and e, respectively. Then the occurrence order of ρ_f and ρ_e in ρ is the same as the occurrence order of ρ_f and any copy of ρ_e in pump_I^n(ρ).

We can now formalize the key notion of inversion:

Definition 3 (inversion). An inversion of ρ is a tuple (I, e, I′, e′) such that
– I, I′ are loops of ρ and I < I′,
– e, e′ are productive straight edges in flow_ρ(I) and flow_ρ(I′), respectively,
– the subrun witnessing e′ precedes the subrun witnessing e in the run order
(see the figure to the right).

In this section we identify some particular intervals of the output that play an important role in the inductive construction of the resynchronizer for a one-way resynchronizable transducer. Given n ∈ ℕ, we say that a set B of output positions is n-large if |orig(B)| > n; otherwise, we say that B is n-small. Recall that here we work with a K-visiting transducer T, for some constant K, and that M = (|Q| · (2K + 1))^{2K} + 1 is an upper bound on the size of the flow monoid M_T. We will extensively use the derived constant C = M^K to distinguish between large and small sets of output positions. The intuition behind this constant is that any set of output positions that is C-large must traverse a loop of ρ. This is formalized in the lemma below. The proof uses algebraic properties of the flow monoid M_T [20] (see also Theorem 7.2 in [3], which proves a similar result, but with a larger constant derived from Simon's factorization theorem):

Lemma 4.
Let I be an input interval and B a set of output positions with origins inside I. If B is C-large, then there is a loop J ⊆ I of ρ such that flow_ρ(J) contains a productive straight edge witnessed by a subrun that intersects B (in particular, out(J) ∩ B ≠ ∅). We need some more notation for outputs. Given an input interval I we denote by out_ρ(I) the set of output positions whose origins belong to I (note that this might not be an output interval). An output block of I is a maximal interval contained in out_ρ(I). The dominant output interval of I, denoted bigout_ρ(I), is the smallest output interval that contains all C-large output blocks of I. In particular, bigout_ρ(I) either is empty or begins with the first C-large output block of I and ends with the last C-large output block of I. We will often omit the subscript ρ from the notations flow_ρ(I), out_ρ(I), bigout_ρ(I), etc., when no confusion arises. We now fix a successful run ρ of the K-visiting transducer T. The rest of the section presents some technical lemmas that will be used in the inductive constructions for the proof of the main theorem. In the lemmas below, we assume that all successful runs of T (in particular, ρ) avoid inversions. Lemma 5. Let I₁ < I₂ be two input intervals and B₁, B₂ output blocks of I₁, I₂, respectively. If both B₁, B₂ are C-large, then B₁ < B₂. Proof (sketch). If the claim did not hold, then Lemma 4 would provide some loops J₁ ⊆ I₁ and J₂ ⊆ I₂, together with some productive edges in them, witnessing an inversion. ⊔⊓ Lemma 6.
Let I = I₁ · I₂, B = bigout(I), and Bᵢ = bigout(Iᵢ) for i = 1, 2. Then B \ (B₁ ∪ B₂) is 4KC-small. Proof (sketch). By Lemma 5, B₁ < B₂. Moreover, all C-large output blocks of I₁ or I₂ are also C-large output blocks of I, so B contains both B₁ and B₂. Suppose, by way of contradiction, that B \ (B₁ ∪ B₂) is 4KC-large. This means that there is a 2KC-large set S ⊆ B \ (B₁ ∪ B₂) with origins entirely to the left of I₂, or entirely to the right of I₁. Suppose, w.l.o.g., that the former case holds, and decompose S as a union of maximal output blocks B′₁, B′₂, . . . , B′ₙ with origins either entirely inside I₁, or entirely outside I. Since S ∩ B₁ = ∅, every block B′ᵢ with origins inside I₁ is C-small. Similarly, by Lemma C.1 in Appendix C, every block B′ᵢ with origins outside I is C-small too. Moreover, since ρ is K-visiting, we get n ≤ 2K. Altogether, this contradicts the assumption that S is 2KC-large. ⊔⊓ Lemma 7.
Let I = I₁ · I₂ · · · Iₙ, such that I₁ is a loop and flow(I₁) = flow(Iₖ) for all k. Then bigout(I) can be decomposed as B₁ · J₁ · B₂ · J₂ · . . . · Jₙ₋₁ · Bₙ, where 1. for all 1 ≤ k ≤ n, Bₖ = bigout(Iₖ) (with Bₖ possibly empty); 2. for all 1 ≤ k < n, the positions in Jₖ have origins inside Iₖ ∪ Iₖ₊₁ and Jₖ is 2KC-small. Proof (sketch). The proof idea is similar to the previous lemma. First, using properties of idempotent flows, one shows that all output positions strictly between Bₖ and Bₖ₊₁, for any k = 1, . . . , n − 1, have origin in Iₖ ∪ Iₖ₊₁. Then, one observes that every output block of Iₖ disjoint from Bₖ is C-small, and since T is K-visiting there are at most K such blocks. This shows that every output interval Jₖ between Bₖ and Bₖ₊₁ is 2KC-small. (In the accompanying figure, omitted here, the C-large blocks of I₁ are shown in red, those of I₂ in blue, and those of I₃ in purple; bigout(I₁) is the entire output between the two red dots, bigout(I₂) between the two blue dots, and bigout(I₃) between the purple dots. All three blocks are non-empty, and bigout(I₁ · I₂ · I₃) goes from the first red to the second purple dot. Black non-dashed arrows stand for C-small blocks.) ⊔⊓ Proof of Theorem 1
This section is devoted to proving the characterization of one-way resynchronizability in the bounded-visit case. We will use the notion of bounded traversal from [21], which was shown to characterize the class of bounded regular resynchronizers, much as bounded delay characterizes rational resynchronizers [15].
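As a concrete reading of the traversal notion defined next, one can picture the two origin mappings as integer arrays indexed by output positions. The following sketch is illustrative only (the names `traversed_by` and `has_k_bounded_traversal`, and the list-based encoding of origins, are our own assumptions, not notation from the paper): it computes which input positions traverse a given position y′, and checks k-bounded traversal.

```python
def traversed_by(orig, orig2, yprime):
    # Input positions y that traverse yprime when each output position x
    # has its origin moved from orig[x] (source) to orig2[x] (target).
    res = set()
    for x in range(len(orig)):
        y, z = orig[x], orig2[x]
        if y <= yprime and z > yprime:    # left-to-right traversal
            res.add(y)
        elif y >= yprime and z < yprime:  # right-to-left traversal
            res.add(y)
    return res

def has_k_bounded_traversal(orig, orig2, n, k):
    # n is the input length; every input position must be traversed
    # by at most k distinct source positions.
    return all(len(traversed_by(orig, orig2, yp)) <= k for yp in range(n))
```

Note that a position equal to the target origin z is, by definition, not traversed: the conditions above are strict on the z side.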
Definition 4 (traversal [21]).
Let σ = (u, v, orig) and σ′ = (u, v, orig′) be two synchronized pairs with the same input and output words. Given two input positions y, y′ ∈ dom(u), we say that y traverses y′ if there is a pair (y, z) of source and target origins associated with the same output position such that y′ is between y and z, with y′ ≠ z and possibly y′ = y. More precisely: – (y, y′) is a left-to-right traversal if y ≤ y′ and, for some output position x, orig(x) = y and z = orig′(x) > y′; – (y, y′) is a right-to-left traversal if y ≥ y′ and, for some output position x, orig(x) = y and z = orig′(x) < y′. A pair (σ, σ′) of synchronized pairs with input u and output v is said to have k-bounded traversal, with k ∈ N, if every y′ ∈ dom(u) is traversed by at most k distinct positions of dom(u). A resynchronizer R has bounded traversal if there is some k ∈ N such that every (σ, σ′) ∈ R has k-bounded traversal. Lemma 8 ([21]).
A regular resynchronizer is bounded if and only if it has bounded traversal. Proof (of Theorem 1).
First of all, observe that the implication 4 → 1 is trivial. For the implication 1 → 2, assume that there is a k-bounded, regular resynchronizer R that is T-preserving and such that R(T) is order-preserving. Lemma 8 implies that R has t-bounded traversal, for some constant t. We head towards proving that T has cross-width bounded by t + k. Consider two synchronized pairs σ = (u, v, orig) and σ′ = (u, v, orig′) such that σ ∈ [[T]]_o and (σ, σ′) ∈ R, and consider a cross (X₁, X₂) of σ. We claim that |orig(X₁)| or |orig(X₂)| is at most t + k. Let x₁ = min(orig(X₁)), x′₁ = max(orig′(X₁)), x₂ = max(orig(X₂)), and x′₂ = min(orig′(X₂)). Since (X₁, X₂) is a cross, we have x₁ > x₂, and since σ′ is order-preserving, we have x′₁ ≤ x′₂. Now, if x′₁ > x₂, then at least |orig(X₂)| − k input positions from orig(X₂) traverse x′₁ to the right (the −k term is due to the fact that at most k input positions can be resynchronized to x′₁). Symmetrically, if x′₁ ≤ x₂, then at least |orig(X₁)| − k input positions from orig(X₁) traverse x₂ to the left (the −k term accounts for the case where some positions are resynchronized to x′₁ and x′₁ = x₂). This implies min(|orig(X₁)|, |orig(X₂)|) ≤ t + k, as claimed. The remaining implications rely on the assumption that T is bounded-visit. The implication 2 → 3 is proved by contraposition: one considers a successful run ρ with an inversion, and shows that crosses of arbitrary width emerge after pumping the loops of the inversion (here Lemma 3 is crucial). The proof of 3 → 4 occupies the rest of this section: assuming that no successful run of T has inversions, we build a partially bijective, regular resynchronizer R that is T-preserving and such that R(T) is order-preserving. The resynchronizer R uses some parameters to guess a successful run ρ of T on u and a factorization tree of bounded height for ρ. Formally, a factorization tree for a sequence α of monoid elements (e.g. the flows flow_ρ([y, y]) for all input positions y) is an ordered, unranked tree whose yield is the sequence α.
The leaves ofthe factorization tree are labeled with the elements of α . All other nodes haveat least two children and are labeled by the monoid product of the child labels(in our case by the flows of ρ induced by the covered factors in the input). Inaddition, if a node has more than two children, then all its children must havethe same label, representing an idempotent element of the monoid. By Simon’sfactorization theorem [23], every sequence of monoid elements has some factor-ization tree of height at most linear in the size of the monoid (in our case, atmost 3 | M T | , see e.g. [8]). Parameters.
We use input parameters to encode the successful run ρ and a factorization tree for ρ of height at most H = 3|M_T|. These parameters specify, for each input interval corresponding to a subtree, the start and end positions of the interval and the label of the root of the subtree. Correctness of these annotations can be enforced by an MSO sentence ipar. The run and the factorization tree also need to be encoded over the output, using output parameters. More precisely, given a level in the tree and an output position, we need to be able to determine the flow and the productive edge that generated that position. The technical details for checking correctness of the output annotation using the formulas opar, move_τ and next_{τ,τ′} can be found in Appendix D. Moving origins.
For each level ℓ of the factorization tree, a partial resynchronization relation R_ℓ is defined. The relation is partial in the sense that some output positions may not have a source-target origin pair defined at a given level. But once a source-target pair is defined for some output position at a given level, it remains defined for all higher levels. In the following we write bigout(p) for the dominant output interval associated with the input interval I(p) corresponding to a node p in the tree. For every level ℓ of the factorization tree, the resynchronizer R_ℓ will be a partial function from source origins to target origins, and will satisfy the following: – the set of output positions for which R_ℓ defines target origins is the union of the intervals bigout(p) for all nodes p at level ℓ; – R_ℓ only moves origins within the same interval at level ℓ, that is, R_ℓ defines only pairs (y, z) of source-target origins such that y, z ∈ I(p) for some node p at level ℓ; – the target origins defined by R_ℓ are order-preserving within every interval at level ℓ, that is, for all output positions x < x′, if R_ℓ defines the target origins of x, x′ to be z, z′, respectively, and if z, z′ ∈ I(p) for some node p at level ℓ, then z ≤ z′; – R_ℓ is ℓ·4KC-bounded, namely, there are at most ℓ·4KC distinct source origins that are moved by R_ℓ to the same target origin. The construction of R_ℓ is by induction on ℓ. For a binary node p at level ℓ with children p₁, p₂, the resynchronizer R_ℓ inherits the source-origin pairs from level ℓ − 1 for the output positions inside bigout(p₁) ∪ bigout(p₂). Note that bigout(p₁) < bigout(p₂) by Lemma 5, so R_ℓ is order-preserving inside bigout(p₁) ∪ bigout(p₂).
Output positions inside bigout(p) \ (bigout(p₁) ∪ bigout(p₂)) are moved in an order-preserving manner to one of the extremities of I(p₁), or to the last position of I(p₂). Boundedness of R_ℓ is guaranteed by Lemma 6. The case where p is an idempotent node at level ℓ with children p₁, p₂, . . . , pₙ follows a similar approach. For brevity, let Iᵢ = I(pᵢ) and Bᵢ = bigout(pᵢ), and observe that, by Lemma 5, B₁ < B₂ < · · · < Bₙ. Lemma 7 provides a decomposition of bigout(p) as B₁ · J₁ · B₂ · J₂ · . . . · Jₙ₋₁ · Bₙ, for some 2KC-small output intervals Jₖ with origins inside Iₖ ∪ Iₖ₊₁, for k = 1, . . . , n − 1. As before, the resynchronizer R_ℓ behaves exactly as R_{ℓ−1} for the output positions inside the Bₖ's. For any other output position, say x ∈ Jₖ, the resynchronizer R_ℓ will move the origin either to the last position of Iₖ or to the first position of Iₖ₊₁, depending on whether the source origin of x belongs to Iₖ or Iₖ₊₁. ⊔⊓ The main obstacle towards dropping the bounded-visit restriction from Theorem 1, while maintaining the effectiveness of the characterization, is the lack of a bound on the number of flows. Indeed, for a transducer T that is not necessarily bounded-visit, there is no bound on the number of flows that encode successful runs of T, and thus the proofs of the implications 2 → 3 and 3 → 4 do not carry over to a transducer T that is not bounded-visit. The idea for proving Theorem 2 is to transform T into an equivalent bounded-visit transducer low(T), so that the property of one-way resynchronizability is preserved. More precisely, given a two-way transducer T, we construct: 1. a bounded-visit transducer low(T) that is classically equivalent to T; 2. a 1-bounded, regular resynchronizer R that is T-preserving and such that R(T) =_o low(T). We can apply our characterization of one-way resynchronizability in the bounded-visit case to the transducer low(T). If low(T) is one-way resynchronizable, then by Theorem 1 we obtain another partially bijective, regular resynchronizer R′ that is low(T)-preserving and such that R′(low(T)) is order-preserving. Thanks to Lemma 2, the resynchronizers R and R′ can be composed, so we conclude that the original transducer T is one-way resynchronizable. Otherwise, if low(T) is not one-way resynchronizable, we show that neither is T. This is precisely shown in the lemma below. Lemma 9.
For all transducers T, T′, with T′ bounded-visit, and for every partially bijective, regular resynchronizer R that is T-preserving and such that R(T) =_o T′, T is one-way resynchronizable if and only if T′ is one-way resynchronizable. There are however some challenges in the approach described above. First, as T may output arbitrarily many symbols with origin in the same input position, and low(T) is bounded-visit, we need low(T) to be able to produce arbitrarily long outputs within a single transition. For this reason, we allow low(T) to be a transducer with regular outputs. The transition relation of such a transducer consists of finitely many tuples of the form (q, a, L, q′), with q, q′ ∈ Q, a ∈ Σ, and L ⊆ Γ* a regular language over the output alphabet. The semantics of a transition rule (q, a, L, q′) is that, upon reading a, the transducer can switch from state q to state q′, and move its head accordingly, while outputting any word from L. We also need to use transducers with common guess. Both extensions, regular outputs and common guess, already appeared in prior works (cf. [5,7]), and the proof of Theorem 1 in the bounded-visit case can be easily adapted to these features. There is still another problem: we cannot always expect that there exists a bounded-visit transducer low(T) classically equivalent to T. Consider, for instance, the transducer that performs several passes on the input, and on each left-to-right pass, at an arbitrary input position, copies as output the letter under its head. It is easy to see that the Parikh image of the output is an exact multiple of the Parikh image of the input, and standard pumping arguments show that no bounded-visit transducer can realize such a relation. A solution to this second problem is as follows.
Before trying to construct low(T), we test whether T satisfies the following condition on vertical loops (these are runs starting and ending at the same position and in the same state). There should exist some K such that T is K-sparse, meaning that the number of different origins of outputs generated inside any vertical loop is at most K. If this condition is not met, then we show that T has unbounded cross-width, and hence, by the implication 1 → 2, T is not one-way resynchronizable. Otherwise, if the condition holds, then we show that a bounded-visit transducer low(T) equivalent to T can indeed be constructed. We discuss the effectiveness and complexity of our characterization. For a k-visit transducer T, the effectiveness of the characterization relies on detecting inversions in successful runs of T. It is not difficult to see that this can be decided in space that is polynomial in the size of T and the bound k. We can also show that one-way resynchronizability is Pspace-hard. For this we recall that the emptiness problem for two-way finite automata is Pspace-complete. Let A be a two-way automaton accepting some language L, and let Σ be a binary alphabet disjoint from that of L. The function {(w · a₁ . . . aₙ, aₙ . . . a₁) | w ∈ L, a₁ . . . aₙ ∈ Σ*, n ≥ 0} can be realized by a two-way transducer T of size polynomial in |A|, and T is one-way resynchronizable if and only if L is empty. In the unrestricted case, we showed that one-way resynchronizability is decidable (Theorem 2). We briefly outline the complexity of the decision procedure: 1. First one checks that T is K-sparse for some K. To do this, we construct from T the regular language L of all inputs with some positions marked that correspond to origins produced within the same vertical loop. Bounded sparsity is equivalent to having a uniform bound on the number of marked positions in every input from L. Standard techniques for two-way automata allow one to decide this in space that is polynomial in the size of T. Moreover, this also gives us a computable exponential bound on the largest constant K for which T can be K-sparse. 2. Next, we construct from the K-sparse transducer T a bounded-visit transducer T′ that is classically equivalent to T and has exponential size. 3. Finally, we decide one-way resynchronizability of T′ by detecting inversions in successful runs of T′ (Theorem 1). Summing up, one can decide one-way resynchronizability of unrestricted two-way transducers in exponential space. It is open whether this bound is optimal. We also do not have any interesting bound on the size of the resynchronizer that witnesses one-way resynchronizability, either in the bounded-visit case or in the unrestricted case. Similarly, we lack upper and lower bounds on the size of the resynchronized one-way transducers, when these exist.
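The hardness gadget above can be made concrete with a toy model. The sketch below is illustrative only (`gadget_pair` and the string/list encoding are our own assumptions, not the actual transducer construction): it produces the input-output pair of the relation {(w · a₁ . . . aₙ, aₙ . . . a₁)} together with its origins, which run strictly right-to-left over the suffix; this is exactly the inversion pattern that no order-preserving resynchronization can repair unless L is empty.

```python
def gadget_pair(w, suffix):
    # Input is w followed by the suffix; output is the reversed suffix.
    # The origin of output position i is the input position holding the
    # mirrored suffix letter, so origins strictly decrease along the output.
    u = w + suffix
    v = suffix[::-1]
    orig = [len(u) - 1 - i for i in range(len(v))]
    return u, v, orig
```

For instance, with w = "ab" and suffix = "01" the output "10" has origins [3, 2], i.e., the first output letter originates from the last input position.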
As the main contribution of this paper, we provided a characterization of the subclass of two-way transducers that are one-way resynchronizable, namely, those that can be transformed, by some bounded, regular resynchronizer, into an origin-equivalent one-way transducer. There are similar definability problems that emerge in the origin semantics. For instance, one could ask whether a given two-way transducer can be resynchronized, through some bounded, regular resynchronization, to a relation that is origin-equivalent to a first-order transduction. This can be seen as a relaxation of the first-order definability problem in the origin semantics, namely, the problem of telling whether a two-way transducer is origin-equivalent to some first-order transduction, shown decidable in [4]. It is worth contrasting the latter problem with the challenging open problem of whether a given transduction is equivalent to a first-order transduction in the classical setting.
Acknowledgments.
We thank the FoSSaCS reviewers for their constructive anduseful comments.
References
1. Rajeev Alur and Pavel Černý. Expressiveness of streaming string transducers. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'10), volume 8 of LIPIcs, pages 1–12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2010.
2. Félix Baschenis, Olivier Gauwin, Anca Muscholl, and Gabriele Puppis. One-way definability of sweeping transducers. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'15), volume 45 of LIPIcs, pages 178–191. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.
3. Félix Baschenis, Olivier Gauwin, Anca Muscholl, and Gabriele Puppis. One-way definability of two-way word transducers. Logical Methods in Computer Science, 14(4):1–54, 2018.
4. Mikolaj Bojańczyk. Transducers with origin information. In International Colloquium on Automata, Languages and Programming (ICALP'14), number 8572 in LNCS, pages 26–37. Springer, 2014.
5. Mikolaj Bojańczyk, Laure Daviaud, Bruno Guillon, and Vincent Penelle. Which classes of origin graphs are generated by transducers? In International Colloquium on Automata, Languages and Programming (ICALP'17), volume 80 of LIPIcs, pages 114:1–114:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
6. Sougata Bose, Shankara Narayanan Krishna, Anca Muscholl, Vincent Penelle, and Gabriele Puppis. On synthesis of resynchronizers for transducers. In International Symposium on Mathematical Foundations of Computer Science (MFCS'19), volume 138 of LIPIcs, pages 69:1–69:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
7. Sougata Bose, Anca Muscholl, Vincent Penelle, and Gabriele Puppis. Origin-equivalence of two-way word transducers is in PSPACE. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'18), volume 122 of LIPIcs, pages 1–18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
8. Thomas Colcombet. Factorisation forests for infinite words. In Fundamentals of Computation Theory (FCT), volume 4639 of LNCS, pages 226–237. Springer, 2007.
9. Bruno Courcelle and Joost Engelfriet. Graph Structure and Monadic Second-Order Logic - A Language-Theoretic Approach, volume 138 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2012.
10. Luc Dartois, Ismaël Jecker, and Pierre-Alain Reynier. Aperiodic string transducers. Int. J. Found. Comput. Sci., 29(5):801–824, 2018.
11. Joost Engelfriet and Hendrik Jan Hoogeboom. MSO definable string transductions and two-way finite-state transducers. ACM Trans. Comput. Log., 2(2):216–254, 2001.
12. Joost Engelfriet and Hendrik Jan Hoogeboom. Finitary compositions of two-way finite-state transductions. Fundamenta Informaticae, 80:111–123, 2007.
13. Emmanuel Filiot, Olivier Gauwin, and Nathan Lhote. Logical and algebraic characterizations of rational transductions. Logical Methods in Computer Science, 15(4), 2019.
14. Emmanuel Filiot, Olivier Gauwin, Pierre-Alain Reynier, and Frédéric Servais. From two-way to one-way finite state transducers. In ACM/IEEE Symposium on Logic in Computer Science (LICS'13), pages 468–477, 2013.
15. Emmanuel Filiot, Ismaël Jecker, Christof Löding, and Sarah Winter. On equivalence and uniformisation problems for finite transducers. In International Colloquium on Automata, Languages, and Programming (ICALP'16), number 125 in LIPIcs, pages 1–14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
16. Emmanuel Filiot, Shankara Narayanan Krishna, and Ashutosh Trivedi. First-order definable string transformations. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'14), LIPIcs, pages 147–159. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014.
17. Emmanuel Filiot, Sebastian Maneth, Pierre-Alain Reynier, and Jean-Marc Talbot. Decision problems of tree transducers with origin. Inf. Comput., 261(Part):311–335, 2018.
18. T. V. Griffiths. The unsolvability of the equivalence problem for lambda-free nondeterministic generalized machines. J. ACM, 15(3):409–413, 1968.
19. Oscar H. Ibarra. The unsolvability of the equivalence problem for e-free NGSM's with unary input (output) alphabet and applications. SIAM J. Comput., 7(4):524–532, 1978.
20. Ismaël Jecker. Personal communication.
21. Denis Kuperberg and Jan Martens. Regular resynchronizability of origin transducers is undecidable. In International Symposium on Mathematical Foundations of Computer Science (MFCS'20), volume 170 of LIPIcs, pages 1–14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
22. John C. Shepherdson. The reduction of two-way automata to one-way automata. IBM Journal of Research and Development, 3(2):198–200, 1959.
23. Imre Simon. Factorization forests of finite height. Theoretical Computer Science, 72(1):65–94, 1990.
Appendix
A Proofs from Section 3.1
Lemma 1.
Every bounded, regular resynchronizer is effectively equivalent to some 1-bounded, regular resynchronizer. Proof. Let R = (I, O, ipar, opar, (move_τ)_τ, (next_{τ,τ′})_{τ,τ′}) be a k-bounded, regular resynchronizer. Let û and v̂ be a pair of annotated input and output satisfying ipar and opar, respectively. To construct an equivalent 1-bounded regular resynchronizer R′ we introduce additional output parameters. Specifically, each output position will be annotated with an output type τ from R and an additional index in {1, . . . , k}. The intended meaning of the index is as follows: if (y, z) is the source/target origin pair associated with an output position labeled by (τ, i), i ∈ {1, . . . , k}, then there are exactly i − 1 positions y′ < y such that (û, y′, z) ⊨ move_τ. Note that this indexing depends on the choice of the target origin z. Therefore, different indexings are possible for different choices of the target origin z. Based on the resynchronizer R, we define the new resynchronizer as R′ = (I, O′, ipar, opar′, (move′_{(τ,i)})_{τ,i}, (next′_{(τ,i),(τ′,i′)})_{τ,i,τ′,i′}), where – O′ = O ⊎ {O′₁, . . . , O′ₖ} consists of the old output parameters O of R plus some new parameters O′₁, . . . , O′ₖ for representing indices in {1, . . . , k}; – opar′ defines the language of all output annotations whose projections over Γ′ (the output alphabet extended with the parameters of R) satisfy opar and where each position is marked by exactly one index; – given a type τ′ that encodes a type τ of R and an index i ∈ {1, . . . , k}, move′_{τ′}(y, z) states that y is the i-th position y′ satisfying move_τ(y′, z); this property can be expressed by the MSO formula ∃ y₁ < · · · < yᵢ = y . ⋀_j move_τ(y_j, z) ∧ ∀ y′ ≤ y (move_τ(y′, z) → ⋁_j y′ = y_j); – next′_{(τ,i),(τ′,i′)}(z, z′) enforces the same property as next_{τ,τ′}(z, z′). The resynchronizer R′ is 1-bounded by definition of move′_{(τ,i)}: if for positions y < y′, (û, y, z) ⊨ move′_{(τ,i)} and (û, y′, z) ⊨ move′_{(τ,i)}, then y and y′ are both the i-th source position in û satisfying move_τ with target z, which is a contradiction. We now prove that R and R′ define the same relation between synchronized pairs. First we show R′ ⊆ R. Consider ((u, v), (u, v′)) ∈ R′.
Therefore, there exist û ⊨ ipar and v̂ ⊨ opar′ such that move′ applied to the positions of v̂ gives the v′ witnessing ((u, v), (u, v′)) ∈ R′. By definition of opar′, the projection of v̂ over Γ′ satisfies opar. Suppose a position x of output type (τ, i) is moved from origin y in v to z in v′. This means (û, y, z) ⊨ move′_{(τ,i)}. Then, by definition of move′_{(τ,i)}, (û, y, z) ⊨ move_τ. This shows R′ ⊆ R. For the containment R ⊆ R′, consider ((u, v), (u, v′)) ∈ R. Therefore, there exist û ⊨ ipar and v̂ ⊨ opar such that move applied to each position in v̂ witnesses ((u, v), (u, v′)) ∈ R. This means that for every position x ∈ dom(v̂) with output type τ, there exist y, z such that (û, y, z) ⊨ move_τ, y = orig(v(x)) and z = orig(v′(x)). For such a position x ∈ dom(v̂) of output type τ, let i ∈ {1, . . . , k} be such that there are exactly i − 1 positions y₁ < y₂ < . . . < y_{i−1} < y with (û, y_j, z) ⊨ move_τ. Let v̂′ be the annotation of v̂ where every position x is annotated with the index i as above. Clearly v̂′ ⊨ opar′ and therefore ((u, v), (u, v′)) ∈ R′. We conclude R = R′. ⊔⊓ Lemma 2.
The class of bounded, regular resynchronizers is effectively closed under composition. Proof.
Let R = (I, O, ipar, opar, (move_τ)_τ, (next_{τ,τ′})_{τ,τ′}) and R′ = (I′, O′, ipar′, opar′, (move′_λ)_λ, (next′_{λ,λ′})_{λ,λ′}) be two bounded, regular resynchronizers. In view of Lemma 1, we can assume that both resynchronizers are 1-bounded. The composition R ◦ R′ can be defined by combining the effects of R and R′ almost component-wise. Some care should be taken, however, in combining the formulas next and next′. Formally, we define the composed resynchronizer R″ = (I″, O″, ipar″, opar″, (move″_{(τ,λ)})_{τ,λ}, (next″_{(τ,λ),(τ′,λ′)})_{τ,λ,τ′,λ′}), where – I″ is the union of the parameters I and I′; – O″ is the union of the parameters O and O′; – ipar″ is the conjunction of the formulas ipar and ipar′; – opar″ is the conjunction of the formulas opar and opar′; – move″_{(τ,λ)}(y, z) states the existence of some position t satisfying both formulas move_τ(t, z) and move′_λ(y, t); – next″_{(τ,λ),(τ′,λ′)}(z, z′) requires that next_{τ,τ′}(z, z′) holds and, moreover, that there exist some positions t, t′ satisfying move_τ(t, z), move_{τ′}(t′, z′), and next′_{λ,λ′}(t, t′); note that these positions t, t′ are uniquely determined from z, z′ since R is 1-bounded, and they act, at the same time, as source origins for R and as target origins for R′. By definition, move″_{(τ,λ)} is 1-bounded: z and τ determine a unique t, which together with λ determines a unique y.
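The component-wise combination of the move formulas can be pictured on finite relations. The following sketch is illustrative only (the name `compose_moves` and the set-of-pairs encoding are our own assumptions): `move1` stands in for move_τ of R and `move2` for move′_λ of R′, each given as a set of (source, target) origin pairs.

```python
def compose_moves(move1, move2):
    # move'' relates (y, z) iff some intermediate position t satisfies
    # both move1(t, z) (the move of R) and move2(y, t) (the move of R').
    return {(y, z)
            for (t, z) in move1
            for (y, t2) in move2
            if t2 == t}
```

When both input relations are 1-bounded (at most one source per target), the intermediate position t is uniquely determined by z, matching the uniqueness remark above.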
It is also easy to see that R″ is equivalent to R ◦ R′, as the positions corresponding to t in the formulas move″_{(τ,λ)} and next″_{(τ,λ),(τ′,λ′)} correspond to the source origins of R and the target origins of R′. ⊔⊓ B Proofs from Section 4.1
Lemma 3.
Let ρ be a run of T on u, let J < I < K be a partition of the domain of u into intervals, with I a loop of ρ, and let F = flow_ρ(J), E = flow_ρ(I), and G = flow_ρ(K) be the corresponding flows. Consider an arbitrary edge f of either F or G, and a straight edge e of the idempotent flow E. Let ρ_f and ρ_e be the witnessing subruns of f and e, respectively. Then the occurrence order of ρ_f and ρ_e in ρ is the same as the occurrence order of ρ_f and any copy of ρ_e in pump^n_I(ρ). Proof. It is convenient to rephrase the claim of the lemma in terms of a juxtaposition operation on flows and in terms of an induced accessibility relation on edges. Formally, given two flows F, G, we define the juxtaposition F G in a way similar to concatenation, with the only exception that in the result we maintain, as an additional group of vertices, the R-vertices of F, glued with the state-matching L-vertices of G (strictly speaking, the result of a juxtaposition of two flows is not a flow, since it has three distinguished groups of vertices). We denote by E . . . E the n-fold juxtaposition of the flow E with itself (this must not be confused with the n-fold concatenation E · . . . · E). Let F, E, G, f, e, ρ_f, ρ_e be as stated in the lemma, and let ⪯ denote the accessibility order between edges in a juxtaposition of flows, e.g. F E G (note that, due to the type of flows considered here, ⪯ turns out to always be a total order on F E G). Observe that the relative occurrence order of ρ_f and ρ_e inside ρ is the same as the accessibility order ⪯ of the edges f and e on the graph F E G. A similar claim holds for the occurrence order of ρ_f and any copy of ρ_e inside the pumped run pump^n_I(ρ), which corresponds to the accessibility order of f and any copy of e in the graph F E . . . E G. Thanks to these correspondences, to prove the lemma it suffices to consider any copy e′ of e in F E . . . E G, and show that f ⪯ e in F E G iff f ⪯ e′ in F E . . . E G.
We thus prove the above claim. Consider the maximal path π inside E ⋯ E that contains the edge e′. Note that this path starts and ends at some extremal vertices of E ⋯ E (otherwise the path could be extended while remaining inside E ⋯ E). Also recall that concatenation can be defined from juxtaposition by removing the intermediate groups of vertices, leaving only the extremal ones, and by shortcutting paths into edges. We call this operation flattening, for short. In particular, since E is idempotent, we have that E = E · ... · E can be obtained from the flattening of E ⋯ E, and this operation transforms the path π into an edge e″. By construction, we have that f ⪯ e′ in F E ⋯ E G if and only if f ⪯ e″ in F E G. So it remains to prove that f ⪯ e in F E G iff f ⪯ e″ in F E G.
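The flattening argument hinges on E being idempotent with respect to concatenation. This can be illustrated on a simplified abstraction (ours, not from the paper) that forgets the L/R vertex structure of flows and keeps only their reachability relation: concatenation becomes relational composition, and pumping an idempotent loop any number of times leaves the flow unchanged, which is exactly why flattening E ⋯ E back to E is well defined.

```python
def compose(f, g):
    """Relational composition f;g: (p, q) is in the result iff
    (p, r) is in f and (r, q) is in g for some intermediate r."""
    return {(p, q) for (p, r) in f for (r2, q) in g if r == r2}

def is_idempotent(e):
    """e plays the role of the flow E: idempotent means E . E = E."""
    return compose(e, e) == e

def pump(e, n):
    """n-fold concatenation e . ... . e (n >= 1)."""
    result = e
    for _ in range(n - 1):
        result = compose(result, e)
    return result
```

Under this abstraction, `pump(e, n) == e` holds for every n ≥ 1 whenever `is_idempotent(e)` does.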
Clearly, this latter claim holds if the edges e and e″ coincide. This is indeed the case when e is a straight edge and E is idempotent. The formal proof that this holds is rather tedious, but follows quite easily from a series of results we have already proven in [3]. Roughly speaking, one proves that:
– the edges of an idempotent flow E can be grouped into components (cf. Definition 6.4 from [3]), so that each component contains exactly one straight edge (cf. Lemma 6.6 from [3], see also the figure at page 11, where components are represented by colors);
– every path inside the juxtaposition E E, with E idempotent, consists of edges from the same component, say C; moreover, after the flattening from E E to E, this path becomes an edge of E that belongs again to the component C (cf. Claims 7.3 and 7.4 in the proof of Theorem 7.2 from [3]);
– every maximal path in E ⋯ E that contains a straight edge starts and ends at opposite sides of E ⋯ E (simple observation based on the definition of concatenation and Lemma 6.6 from [3]).
To conclude, recall that π is a path inside E ⋯ E that contains a copy e′ of the straight edge e, and that becomes the edge e″ after the flattening into E. The previous properties immediately imply that e = e″. □

C Proofs from Section 4.2
As explained in the body, the technical lemmas involving output blocks are only applicable to transducers that avoid inversions. Hereafter we assume that T is a transducer that avoids inversions, and we denote by ρ an arbitrary successful run of T.

Lemma 5.
Let I_1 < I_2 be two input intervals and B_1, B_2 output blocks of I_1, I_2, respectively. If both B_1, B_2 are C-large, then B_1 < B_2.

Proof. B_1 and B_2 are clearly disjoint. By way of contradiction, assume that B_1 and B_2 are C-large, but B_1 > B_2. By Lemma 4, we can find for both i = 1 and i = 2 a loop J_i ⊆ I_i and a productive straight edge e_i ∈ flow(J_i) that is witnessed by a subrun intersecting B_i. Clearly, we have J_1 < J_2, and since B_1 > B_2, the subrun witnessing e_1 follows the subrun witnessing e_2. Thus, (J_1, e_1, J_2, e_2) is an inversion of ρ, which contradicts the assumption that T avoids inversions. □

We now turn to the proofs of Lemmas 6 and 7, which both require auxiliary lemmas relying again on the assumption that T avoids inversions.

Lemma C.1.
Let I be an input interval, B_1 < B_2 two output blocks of I, and S the set of output positions strictly between B_1 and B_2 and with origins outside I. If B_1, B_2 are C-large, then S is 2C-small.

Proof. By way of contradiction, suppose that S is 2C-large. This means that |orig(S) ∩ I′| > C for some interval I′ disjoint from I, say I′ < I (the case of I′ > I is treated similarly). By Lemma 4, we can find two loops J ⊆ I and J′ ⊆ I′ and some productive straight edges e ∈ flow(J) and e′ ∈ flow(J′) that are witnessed by subruns intersecting B_1 and S, respectively. Since S > B_1, we know that the subrun witnessing e′ follows the subrun witnessing e. As in the previous proof, this shows the inversion (J′, e′, J, e), which contradicts the assumption that T avoids inversions. □

Lemma 6.
Let I = I_1 · I_2, B = bigout(I), and B_i = bigout(I_i) for i = 1, 2. Then B \ (B_1 ∪ B_2) is 4KC-small.

Proof. By Lemma 5, we have B_1 < B_2. Moreover, all C-large output blocks of I_1 or I_2 are also C-large output blocks of I, so B contains both B_1 and B_2. Let I_0 be the maximal interval to the left of I_1, and thus adjacent to it, and, similarly, let I_3 be the maximal interval to the right of I_2, and thus adjacent to it.

Suppose, by way of contradiction, that B \ (B_1 ∪ B_2) is 4KC-large. This means that there is a 2KC-large set S ⊆ B \ (B_1 ∪ B_2) with origins entirely inside I_0 · I_1 or entirely inside I_2 · I_3. Suppose, w.l.o.g., that the former case holds, and decompose S as a union of maximal output blocks B′_1, B′_2, ..., B′_n of either I_0 or I_1. Since S ∩ B_1 = ∅, we have that every block B′_i with origins inside I_1 is C-small. Similarly, by Lemma C.1, every block B′_i with origins inside I_0 is C-small too. Moreover, since ρ is K-visiting, the number n of maximal output blocks of either I_0 or I_1 that are contained in S is at most 2K. All together, this contradicts the assumption that S is 2KC-large. □

Lemma C.2.
Let I be a loop of ρ. Then flow(I) has at most one productive straight edge, and this edge must be LR.

Proof. Suppose, by way of contradiction, that there are two productive straight edges in flow(I), say e and f, with e before f in ρ (the reader may refer again to the figure at page 11, and think of e and f, for instance, as the edges labeled by α and γ, respectively). Suppose that we pump I twice, and let I_1 < I_2 be the copies of I in the pumped run ρ′. Let also e_1, e_2 (resp. f_1, f_2) be the corresponding copies of e (resp. f), so that e_j, f_j belong to the flow flow_{ρ′}(I_j). It is easy to check the following properties:
– if e is an LR edge, then the subrun witnessed by e_1 occurs in ρ′ before the subrun witnessed by e_2 (and the other way around if e is RL);
– the subruns witnessed by e_1 and e_2 occur in ρ′ before the subruns witnessed by f_1, f_2 (this property follows easily from the observation that when building the product flow(I) · flow(I), the edges e_1, e_2 will be "part" of the edge e in the product, whereas f_1, f_2 will be "part" of the edge f).
Let us assume first that e is an RL edge. Then observe that (I_1, e_1, I_2, e_2) is an inversion in ρ′. But this contradicts T being inversion-free. Therefore, both e, f are LR edges. But then, (I_1, f_1, I_2, e_2) is an inversion in ρ′, and we have again a contradiction. □

Remark C.1.
The statement of Lemma C.2 can be strengthened by observing the following property of productive edges in an idempotent flow. Assume that I is a loop and e is the unique productive straight edge in flow(I). Let f be some productive (non-straight) edge of flow(I) with f ≠ e. When I is pumped, the subruns witnessing the copies of f are part of the subrun witnessing e in the product flow. This means, for example, that in the figure on page 11 the productive edges are either all among the blue edges, or all among the gray edges (none of the red edges can be productive, because the straight edge is RL, and would result in a productive RL edge on pumping).

Lemma 7.
Let I = I_1 · I_2 ⋯ I_n, such that I_1 is a loop and flow(I_1) = flow(I_k) for all k. Then bigout(I) can be decomposed as B_1 · J_1 · B_2 · J_2 · ... · J_{n−1} · B_n, where

Fig. 2: Illustration for Lemma 7.
1. for all 1 ≤ k ≤ n, B_k = bigout(I_k) (with B_k possibly empty);
2. for all 1 ≤ k < n, the positions in J_k have origins inside I_k ∪ I_{k+1} and J_k is 2KC-small.

Proof. By Lemma C.2, we can assume that flow(I_1) = flow(I_k) has a unique productive straight edge e, which is an LR edge. Let B′_k be the output block corresponding to e in flow(I_k). Since flow(I_1) is idempotent, any output block of I has one of the following shapes (see also Remark C.1):
(a) a block B = B′_1 · J′_1 · ... · J′_{n−1} · B′_n, for some intervals J′_1, ..., J′_{n−1} such that out(I_k) is included in J′_{k−1} · B′_k · J′_k for all 1 < k < n;
(b) at most 2K output blocks L_1, ..., L_p, R_1, ..., R_s, where each L_i and R_j corresponds to an edge of flow(I_1) and flow(I_n), respectively: the blocks L_i, R_j appear before, respectively after, the straight edge.
Moreover, the order of the output blocks of I is L_1, ..., L_p, B, R_1, ..., R_s. To illustrate statement (a) above, the reader can take as example p = s = 2, L_1 = α, L_2 = β, R_1 = κ, R_2 = ζ, J′_1 = ⋯ = J′_{n−1} = ακβζ in Figure 2. For statement (b), notice that in I_1 · I_2 · I_3 we have the output blocks L_1 = α, L_2 = β of I_1, the straight edge (γακβζ)γ (the purple zigzag), followed by R_1 = κ, R_2 = ζ of I_3.

Note that B_k = bigout(I_k) is contained in J′_{k−1} · B′_k · J′_k for all 1 < k < n. Moreover, B_1 = bigout(I_1) is contained in L_1 ⋯ L_p · B′_1 · J′_1, and B_n = bigout(I_n) is contained in J′_{n−1} · B′_n · R_1 ⋯ R_s. Also, by Lemma 5, B_j precedes B_{j+1} for all j. If one of the L_k is C-large, then B_1 is non-empty, hence bigout(I) is non-empty and starts at the first position of B_1. Similarly, if one of the R_k is C-large, then B_n is non-empty, hence bigout(I) is non-empty and ends with the last position of B_n. Otherwise, if all L_j, R_j are C-small, then bigout(I) is either empty or equal to B. In all cases we can write bigout(I) = B_1 · J_1 · B_2 · J_2 · ... · J_{n−1} · B_n, with each J_k consisting of at most K C-small blocks of I_k and K C-small blocks of I_{k+1}, namely those left over after gathering the C-large blocks into bigout(I_k) and bigout(I_{k+1}), respectively. Therefore, each J_k is 2KC-small. □

Proof of Theorem 1.
Recall that the implications 4 → 1 and 1 → 2 were already established. It thus remains to prove the implications 2 → 3 and 3 → 4. For the implication 2 → 3 we reason by contraposition. Consider a run ρ of T on some input u and suppose there is an inversion: ρ has disjoint loops I < I′, whose flows contain productive straight edges, say e in flow_ρ(I) and e′ in flow_ρ(I′), such that e′ precedes e in the run order. Let u = u_1 w u_2 w′ u_3, so that w and w′ are the factors of the input delimited by the loops I and I′, respectively. Further let v and v′ be the outputs produced along the edges e and e′, respectively. Consider now the run ρ_k obtained from ρ by pumping the input an arbitrary number k of times on the loops I and I′. This run is over the input u_1 (w)^k u_2 (w′)^k u_3, and in the output produced by ρ_k there are k (possibly non-consecutive) occurrences of v and v′. By Lemma 3 all occurrences of v′ precede all occurrences of v. In particular, if X_1 (resp. X_2) is the set of positions corresponding to all the occurrences of v (resp. v′) in the output produced by ρ_k, then (X_1, X_2) is a cross of width at least k.

Now we prove the implication 3 → 4. We assume that no run of T has any inversion. We want to build a partially bijective, regular resynchronizer R that is T-preserving and such that R(T) is order-preserving. The resynchronizer R will use input and output parameters to guess a successful run ρ of T on the input u and a corresponding factorization tree for ρ of height at most H = 3|M_T| (see page 5 for the formal definition and the existence of a factorization tree).

The resynchronizer R that we will define is functional, which means here that every source origin is mapped by each move_τ formula to at most one target position.

Notations.
For a node p of a factorization tree we write I(p) for the input interval which is the yield of the subtree of p. Recall that the leaves of the factorization tree correspond to singleton intervals of the input. The set of output positions with origins in I(p) is denoted out(p) (note that this might not be an interval). Recall that an output block B of out(p) is a maximal interval of output positions with origins in I(p); hence the position just before and the position just after B have origins outside I(p). We also write bigout(p), instead of bigout(I(p)), for the dominant output interval of I(p) (see page 12 for the definition). Finally, given a position x in the output and a level ℓ of the factorization tree of ρ, we denote by p_{x,ℓ} the unique node at level ℓ such that I(p_{x,ℓ}) contains the source origin of x.

Input Parameters.
The successful run ρ together with its factorization tree of height at most H = 3|M_T| can be easily encoded over the input using input parameters ipar. The parameters describe each input interval I(p) and the label flow(I(p)) of each node p in the factorization tree. Formally, an input interval I(p) is described by marking its begin and end with two distinguished parameters for the specific level. The label flow(I(p)) annotates every position inside I(p). This accounts for H(2 + |M_T|) input parameters. Correctness of the annotations with the above input parameters can be expressed by a formula ipar. In particular, on the leaves, ipar checks that every interval is a singleton of the form {y} and its flow is the one induced by the letter u(y). On the internal nodes, ipar checks that the label of a node coincides with the monoid product of the labels of its children, which is a composition of flows. It also checks that for every node with more than two children, the node and the children are labelled by the same idempotent flow.

Output Parameters.
We also need to encode the run ρ on the output, because the resynchronizer will determine the target origin of an output position not only on the basis of the flow at the source origin, but also on the basis of the productive transition that generated that particular position. The annotation that encodes the run ρ on the output is done using output parameters (one for each transition in ∆), and its correctness will be enforced by a suitable combination of the formulas opar, move_τ, and next_{τ,τ′}. This will take a significant amount of technical details and will rely on specific properties of the formulas move_τ, so we prefer to temporarily postpone those details.

Below, we explain how the origins are transformed by a series of partial resynchronizers R_ℓ that "converge" in finitely many steps to a desired resynchronization, under the assumption that the output annotation correctly encodes the same run ρ that is represented in the input annotation.

Moving origins.
Here we will work with a fixed successful run ρ and a factorization tree for it, which we assume are correctly encoded by the input and output annotations. For every level ℓ of the factorization tree, we will define a functional, bounded, regular resynchronizer R_ℓ. Each resynchronizer R_ℓ will be partial, in the sense that for some output positions it will not define source-target origin pairs. However, the set of output positions with associated source-target origin pairs increases with the level ℓ, and the top level resynchronizer R_* will specify source-target origin pairs for all output positions. The latter resynchronizer will almost define the resynchronization that is needed to prove item (4) of the theorem; we will only need to modify it slightly in order to make it 1-bounded and to check that the output annotation is correct.

To enable the inductive construction, we need the resynchronizer R_ℓ to satisfy the following properties, for every level ℓ of the factorization tree:
– the set of output positions for which R_ℓ defines target origins is the union of the dominant output intervals bigout(p) of all nodes p at level ℓ;
– R_ℓ only moves origins within the same interval at level ℓ, that is, R_ℓ defines only pairs (y, z) of source-target origins such that y, z ∈ I(p) for some node p at level ℓ;
– the target origins defined by R_ℓ are order-preserving within the same interval at level ℓ, that is, for all output positions x < x′, if R_ℓ defines the target origins of x, x′ to be z, z′, respectively, and if z, z′ ∈ I(p) for some node p at level ℓ, then z ≤ z′.
– R_ℓ is ℓ·4KC-bounded, namely, there are at most ℓ·4KC distinct source origins that are moved by R_ℓ to the same target origin.
The inductive construction of R_ℓ will basically amount to defining appropriate formulas move_τ(y, z).

Base Case. The base case is ℓ = 0, namely, when the resynchronization is acting at the leaves of the factorization tree. In this case, the regular resynchronizer R_ℓ is vacuous, as the input intervals I(p) associated with the leaves p are singletons, and hence all dominant output intervals bigout(p) are empty. Formally, for this resynchronizer R_ℓ, we simply let move_τ(y, z) be false, independently of the underlying output type τ and of the source and target origins. This resynchronization is clearly functional, 0-bounded, and order-preserving.

Inductive Step. For the inductive step, we explain how the origins of an output position x ∈ bigout(p) are moved within the interval I(p), where p = p_{x,ℓ} is the node at level ℓ that "generates" x. Even though we explain this by mentioning the node p_{x,ℓ}, the definition of the resynchronization will not depend on it, but only on the level ℓ and the underlying input and output parameters. In particular, to describe how the origin of a τ-labeled output position x is moved, the formula move_τ(y, z) has to determine the productive edge that generated x in the flow that labels the node p_{x,ℓ}. This can be done by first determining from the output type τ the productive transition t_x that generated x, and then inspecting the annotation at the source origin y to "track" t_x inside the productive edges of the flow flow(I(p′)), for each node p′ along the unique path from the leaf p_{x,0} to the node p_{x,ℓ}. In the case distinction below, we implicitly rely on this type of computation, which can be easily implemented in MSO.

1. p_{x,ℓ} is a binary node. We first consider the case where p = p_{x,ℓ} is a binary node (the annotation on the source origin y will tell us whether this is the case). Let p_1, p_2 be the left and right children of p. If x belongs to one of the dominant output blocks bigout(p_1) and bigout(p_2) (again, this information is available in the input annotation), then the resynchronizer R_ℓ will inherit the source-target origin pairs associated with x from the lower level resynchronization R_{ℓ−1}. Note that bigout(p_1) < bigout(p_2) by Lemma 5, so R_ℓ is order-preserving at least for the output positions inside bigout(p_1) ∪ bigout(p_2).

We now describe the source-target origin pairs when x ∈ bigout(p) \ (bigout(p_1) ∪ bigout(p_2)). The idea is to move the origin of x to one of the following three input positions, depending on the relative order between x and the positions in bigout(p_1) and in bigout(p_2):
– the first position of I(p_1), if x < bigout(p_1);
– the last position of I(p_1), if bigout(p_1) < x < bigout(p_2);
– the last position of I(p_2), if x > bigout(p_2).
Which of the above cases holds can be determined, again, by inspecting the output type τ and the annotation of the source origin y, in a way similar to the computation of the productive edge that generated x at level ℓ. So the described resynchronization can be implemented by an MSO formula move_τ(y, z).

The resulting resynchronization R_ℓ is functional and order-preserving inside every interval at level ℓ. It remains to argue that R_ℓ is ℓ·4KC-bounded. To see why this holds, assume, by the inductive hypothesis, that R_{ℓ−1} is (ℓ−1)·4KC-bounded. Recall that the new source-target origin pairs that are added to R_ℓ are those associated with output positions in bigout(p) \ (bigout(p_1) ∪ bigout(p_2)).
Lemma 6 tells us that there are at most 4KC distinct positions that are source origins of such positions. So, in the worst case, at most (ℓ−1)·4KC source origins from R_{ℓ−1} and at most 4KC new source origins from R_ℓ are moved to the same target origin. This shows that R_ℓ is ℓ·4KC-bounded.

2. p_{x,ℓ} is an idempotent node. The case where p = p_{x,ℓ} is an idempotent node with children p_1, p_2, ..., p_n follows a similar approach. For brevity, let I_i = I(p_i) and B_i = bigout(p_i). By Lemma 5, we have B_1 < B_2 < ⋯ < B_n. Lemma 7 then provides a decomposition of bigout(p) as B_1 · J_1 · B_2 · J_2 · ... · J_{n−1} · B_n, for some 2KC-small output intervals J_k, for k = 1, ..., n−1, that have origins inside I_k ∪ I_{k+1}.

As before, the resynchronizer R_ℓ behaves exactly as R_{ℓ−1} for the output positions inside the B_k's. For any other output position, say x ∈ J_k for some k = 1, 2, ..., n−1, we first recall that the source origin y of x is either inside I_k or inside I_{k+1}. Depending on which of the two intervals contains y, the resynchronizer R_ℓ will define the target origin z to be either the last position of I_k or the first position of I_{k+1}. However, since we cannot determine using MSO the index k of the interval J_k that contains x, we proceed as follows. First observe that any block B_i can be identified by some flow edge at level ℓ−1, and the latter edge can be represented in MSO by suitable monadic predicates over the input. Let B, B′ be the two consecutive blocks among B_1, ..., B_n such that B < x < B′. Note that these blocks can be determined in MSO once the productive edge that generated x is identified. Further let I be the interval among I_1, ..., I_n that contains the origin y of x. By the previous arguments, we have that the interval I contains either all the origins of B or all the origins of B′. Moreover, which of the two sub-cases holds can again be determined in MSO by inspecting the annotations. The formula move_τ(y, z) can thus define the target origin z to be
– the last position of I, if I contains the origins of B;
– the first position of I, if I contains the origins of B′.
The above construction yields a functional regular resynchronization R_ℓ that associates with any two output positions x < x′ with source origins in the same interval I(p) some target origins z ≤ z′. In other words, the resynchronization R_ℓ is order-preserving in each interval at level ℓ.

It remains to show that R_ℓ is ℓ·4KC-bounded, under the inductive hypothesis that R_{ℓ−1} is (ℓ−1)·4KC-bounded. This is done using a similar argument as before, that is, by observing that the output positions in bigout(p) \ (⋃_{1≤i≤n} bigout(p_i)) belong to some J_k, and in the worst case all source origins y of positions from J_k are moved to the last position of I_k. By Lemma 7, there are at most 2KC such positions y.

Top level resynchronizer.
Let R_* be the resynchronizer R_ℓ obtained at the top level ℓ of the factorization tree. Based on the above constructions, R_* defines target origins for all output positions, unless the dominant output interval bigout(p) associated with the root p is empty (this can indeed happen when the number of different origins in the output is at most C, and thus not sufficient for having at least one C-large output factor). In particular, if bigout(p) ≠ ∅, then bigout(p) is the whole output, and R_* is basically the desired resynchronization, assuming that the output annotations are correct.

Let us now discuss briefly the degenerate case where bigout(p) = ∅, which of course can be detected in MSO. In this case, the appropriate resynchronizer R_* should be redefined so that it moves all source origins to the same target origin, say the first input position. Clearly, this gives a functional, regular resynchronizer that is order-preserving and C-bounded.

Correctness of output annotation.
Recall that the properties of the top level resynchronizer R_*, and in particular the claim that R_* is bounded, crucially relied on the assumption that every output position x is correctly annotated with the productive transition that generated it. This assumption cannot be guaranteed by the MSO sentence opar alone (the property intrinsically talks about a relation between input and output annotations). Below, we explain how to check correctness of the output annotation with the additional help of the formulas move_τ(y, z) (which will be modified for this purpose) and next_{τ,τ′}(z, z′).

Let ρ be the successful run as encoded by the input annotation. The idea is to check that the sequence of productive transitions t_x that annotate the positions x in the output is the maximal sub-sequence of ρ consisting only of productive transitions. Besides the straightforward conditions (concerning, for instance, the first and last productive transitions of ρ, or the possible multiple symbols that could be produced within a single transition), the important condition to be verified is the following:

(†) For every pair of consecutive output positions x, x+1 with source origins y, y′, respectively, on the run ρ that is annotated on the input one can move from transition t_x at position y to transition t_{x+1} at position y′ by using at the intermediate steps only non-productive transitions.

The above property is easily expressible by an MSO formula ϕ†_{τ,τ′}(y, y′), assuming that τ, τ′ are the output types of x, x+1 and the free variables y and y′ are interpreted by the source origins of x and x+1, with x ranging over all output positions.
This is very close to the type of constraint that can be enforced by the formula next_{τ,τ′} of a regular resynchronizer, with the only difference that the latter formula can only access the target origins z, z′ of x, x+1.

We thus need a way to uniquely determine, from the target origins z, z′ of x, x+1, the source origins y, y′. For this, we could rely on the formulas move_τ(y, z), if only they were defining partial bijections between y and z. Those formulas are in fact close to defining partial bijections, as they are functional and k-bounded, for k = H·4KC. The latter boundedness property, however, depends again on the assumption that the output annotation is correct. We overcome this problem by gradually modifying the resynchronizer R_* so as to make it functional and 1-bounded (i.e., partially bijective), independently of the output annotations.

We start by modifying the formulas move_τ(y, z) to make them "syntactically" k-bounded. Formally, we construct from move_τ(y, z) the formula

move′_τ(y, z) = move_τ(y, z) ∧ ∀ y_1, ..., y_k, y_{k+1} ( ⋀_i move_τ(y_i, z) ) → ( ⋁_{i≠j} y_i = y_j ).

Intuitively, the above formula is semantically equivalent to move_τ(y, z) when there are at most k input positions y′ that can be paired with z via the same formula move_τ, and it is false otherwise.

Let R′_* be the regular resynchronizer obtained from R_* by replacing the formulas move_τ by move′_τ, for every output type τ. By construction, R′_* is functional and k-bounded, independently of any assumption on the output annotations. We can then apply Lemma 1 and obtain from R′_* an equivalent regular resynchronizer R″_* = (I″, O″, ipar″, opar″, (move″_τ)_τ, (next″_{τ,τ′})_{τ,τ′}) that is 1-bounded.
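The effect of the syntactic k-boundedness trick can be mimicked on finite relations: a source-target pair survives only when its target has at most k distinct sources, and the cutoff is applied unconditionally, with no assumption on where the relation comes from. A minimal sketch (the function name is ours, not from the paper):

```python
from collections import defaultdict

def syntactically_bounded(move, k):
    """Filter a source-target relation the way move'_tau does:
    keep (y, z) only if at most k distinct sources relate to z."""
    sources = defaultdict(set)
    for y, z in move:
        sources[z].add(y)
    return {(y, z) for (y, z) in move if len(sources[z]) <= k}
```

On relations that are already k-bounded the filter is the identity, mirroring the semantic equivalence of move′_τ and move_τ in that case.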
So each move″_τ is a partial bijection.

We are now ready to verify the correctness of the output annotation. Recall that the idea is to enforce the property (†) by exploiting the previously defined formula ϕ†_{τ,τ′}(y, y′) and the partial bijection between the source origins y, y′ and the target origins z, z′, as defined by move″_τ(y, z) and move″_{τ′}(y′, z′). Formally, we define

next‴_{τ,τ′}(z, z′) = next″_{τ,τ′}(z, z′) ∧ ∃ y, y′ ( move″_τ(y, z) ∧ move″_{τ′}(y′, z′) ∧ ϕ†_{τ,τ′}(y, y′) ).

To conclude, by replacing in R″_* the formulas next″_{τ,τ′} with next‴_{τ,τ′}, we obtain a regular resynchronizer R that is partially bijective, T-preserving, and such that R(T) is order-preserving. This completes the proof of the implication 3 → 4. □

E Proof of Theorem 2
We provide here the missing details of the proof of Theorem 2, as sketched in Section 6. We recall that the goal is to construct, from a given arbitrary two-way transducer T:
1. a bounded-visit transducer low(T) that is classically equivalent to T,
2. a partially bijective, regular resynchronizer R that is T-preserving and such that R(T) =_o low(T).
We will reason with a fixed input u at hand and with an induced accessibility relation on productive transitions of T, tagged with origins. Formally, a tagged transition is any pair (t, y) consisting of a transition t ∈ ∆ and a position y on the input u, such that t occurs at position y in some successful run on u. The accessibility preorder on tagged transitions is such that (t, y) ⪯_u (t′, y′) whenever T has a run on u starting with transition t at position y and ending with transition t′ at position y′. This preorder induces an equivalence relation, denoted ∼_u. Intuitively, (t, y) ∼_u (t′, y′) means that T can cycle an arbitrary number of times between these two tagged transitions (possibly (t, y) = (t′, y′)). A ∼_u-equivalence class C is called realizable on u if there is a successful run on u that uses at least once a tagged transition from the class C.

We say that T is K-sparse if for every input u and every realizable ∼_u-equivalence class C, there are at most K productive tagged transitions in C (recall that a productive transition is one that produces non-empty output). Intuitively, bounded sparsity means that the number of origins of outputs produced by vertical loops in successful runs of T is uniformly bounded. If T is not K-sparse for any K, then we say that T has unbounded sparsity.

When T is K-sparse, the productive tagged transitions from the same realizable ∼_u-equivalence class can be lexicographically ordered and distinguished by means of numbers from a fixed finite range, say {1, ..., K}.
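On a finite abstraction, both ∼_u and K-sparsity are straightforward to compute: tagged transitions are nodes, the accessibility preorder is reachability along edges, two nodes are equivalent iff each reaches the other, and sparsity counts productive nodes per class. The following sketch (our own illustration; node labels stand for tagged transitions) computes the classes by a reachability fixpoint:

```python
def mutual_reachability_classes(nodes, edges):
    """Equivalence classes of the preorder induced by edges:
    two nodes are equivalent iff each can reach the other."""
    reach = {n: {n} for n in nodes}
    changed = True
    while changed:  # transitive closure by fixpoint iteration
        changed = False
        for a, b in edges:
            new = reach[b] - reach[a]
            if new:
                reach[a] |= new
                changed = True
    classes, seen = [], set()
    for n in nodes:
        if n not in seen:
            cls = frozenset(m for m in nodes
                            if m in reach[n] and n in reach[m])
            seen |= cls
            classes.append(cls)
    return classes

def is_k_sparse(classes, productive, k):
    """K-sparsity: every class has at most k productive elements."""
    return all(len(c & productive) <= k for c in classes)
```

For example, with edges 1→2, 2→1, 2→3, 3→4 the classes are {1, 2}, {3}, {4}; if nodes 1, 2, 3 are productive, the abstraction is 2-sparse but not 1-sparse.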
An important observation is that the equivalence ∼_u is a regular property, in the sense that one can construct, for instance, an MSO formula ϕ^{∼u}_{t,t′}(y, y′) that holds on input u if and only if (t, y) ∼_u (t′, y′). In particular, this implies that unbounded sparsity can be effectively tested: it suffices to construct the regular language consisting of every possible input u with a distinguished realizable ∼_u-equivalence class marked on it, and check whether this language contains words with arbitrarily many marked positions that correspond to productive tagged transitions (this boils down to detecting special loops in a classical finite-state automaton).

Lemma E.1. If T has unbounded sparsity, then T is not one-way resynchronizable.

Proof. The assumption that T has unbounded sparsity and the definition of ∼_u imply that, for every n ∈ ℕ, there exist an input u, a successful run ρ on u, and 2n tagged transitions (t_1, y_1), ..., (t_n, y_n), (t′_1, y′_1), ..., (t′_n, y′_n) such that the t_i's occur before the t′_j's in ρ and the y_i are to the right of the y′_j. Since n can grow arbitrarily, this witnesses precisely the fact that T has unbounded cross-width. Thus, by the implication 1 → 2 of Theorem 1, which does not require T to be bounded-visit, we know that T is not one-way resynchronizable. □

Let us now show how to construct a bounded-visit transducer low(T) with regular outputs and common guess that is equivalent to T, under the assumption that T is K-sparse for some constant K. Intuitively, low(T) simulates successful runs of T on input u by shortcutting maximal vertical loops. Formally, for an input u and a tagged transition (t, y), a vertical loop at (t, y) is any run on u that starts and ends with transition t at position y. We will tacitly focus on vertical loops that are realizable on u, exactly as we did for ∼_u-equivalence classes.
The output of a vertical loop is the word spelled out by the productive transitions in it. Of course, all tagged transitions in a vertical loop at $(t, y)$ are $\sim_u$-equivalent to $(t, y)$. In particular, as $T$ is $K$-sparse, there are at most $K$ productive tagged transitions in a (realizable) vertical loop, and hence the language $L_{t,y}$ of outputs of vertical loops at $(t, y)$ is regular. In addition, there are only finitely many languages $L_{t,y}$ for varying $(t, y)$. This can be seen as follows: we can assume an order on the elements of the $\sim_u$-class $C$ of $(t, y)$, and a strongly connected graph with nodes corresponding to $C$ and edges reflecting the accessibility preorder. The correctness of the graph can be checked with regular annotations on the input, and the graph itself can be turned into an automaton accepting $L_{t,y}$. Therefore, using common guess in $\mathsf{low}(T)$, we can assume that every position $y$ carries as annotation the language $L_{t,y}$ for each transition $t$. By definition, $L_{t,y}$ is non-empty if and only if there is some productive vertical loop at $(t, y)$.

Consider an arbitrary successful run $\rho$ of $T$ on $u$. Let $\mathsf{low}(\rho)$ be the run obtained by replacing, from left to right, every maximal vertical loop at $(t, y)$ by the single transition $t$. Here, maximality refers to the subrun relation. We call $\mathsf{low}(\rho)$ the normalization of $\rho$ and we observe that this is a successful, $|\Delta|$-visit run. This means that (i) $\mathsf{low}(\rho)$ can be finitely encoded on the input as a sequence of flows of height at most $|\Delta|$, and (ii) the language consisting of inputs annotated with such encodings is regular.

The transducer $\mathsf{low}(T)$ guesses the encoding of a normalization $\mathsf{low}(\rho)$ and uses it to simulate a possible run $\rho$ of $T$. In particular, every time $\mathsf{low}(T)$ traverses a transition $t$ from the flow of $\mathsf{low}(\rho)$ at position $y$, it outputs a word from the language $L_{t,y}$.
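The left-to-right shortcutting of maximal vertical loops admits a simple implementation once a run is abstracted as the mere sequence of its tagged transitions (a simplification for illustration only; the actual construction also tracks states and flow encodings). At each step we jump past the last later occurrence of the current tagged transition, so every tagged transition survives at most once, matching the $|\Delta|$-visit bound:

```python
def normalize(run):
    """Collapse maximal vertical loops in a run, modeled as a list of
    tagged transitions (e.g. (transition, position) pairs).
    Each tagged transition occurs at most once in the result."""
    # last occurrence of each tagged transition in the run
    last = {tt: j for j, tt in enumerate(run)}
    out, i = [], 0
    while i < len(run):
        out.append(run[i])
        # skip the maximal vertical loop at run[i]:
        # jump just past its last occurrence
        i = last[run[i]] + 1
    return out
```

For instance, `normalize([("t1", 1), ("t2", 2), ("t1", 1), ("t3", 2)])` collapses the loop at `("t1", 1)` and keeps a single copy of it, followed by `("t3", 2)`.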
However, in order to simplify later the construction of a resynchronizer $R$ such that $R(T) =_o \mathsf{low}(T)$, it is convenient that $\mathsf{low}(T)$ outputs the word from $L_{t,y}$ at a possibly different origin, which is uniquely determined by the $\sim_u$-equivalence class of $(t, y)$. Formally, we define the anchor of a $\sim_u$-equivalence class $C$, denoted $\mathsf{an}(C)$, to be the leftmost input position $z$ such that $(t', z) \in C$ for some transition $t'$. After traversing a transition $t$ from the flow at position $y$, and before outputting a word from $L_{t,y}$, the transducer $\mathsf{low}(T)$ moves to the anchor $\mathsf{an}([(t, y)]_{\sim_u})$. Then it outputs the appropriate word and moves back to position $y$, where it can resume the simulation of the normalized run $\mathsf{low}(\rho)$. Note that the position $y$ can be recovered from the anchor $\mathsf{an}([(t, y)]_{\sim_u})$, since the elements inside the equivalence class $[(t, y)]_{\sim_u}$ can be identified by numbers from $\{1, \ldots, K\}$ (recall that $T$ is $K$-sparse), and since the relationship between any two such elements is a regular property. It is routine to verify that the described transducer $\mathsf{low}(T)$ is equivalent to $T$ and bounded-visit.

Let us now explain how to construct a partially bijective, regular resynchronizer $R$ that is $T$-preserving and such that $R(T) =_o \mathsf{low}(T)$. We proceed as in the construction of $\mathsf{low}(T)$ by annotating the input word $u$ with flows that encode the normalization $\mathsf{low}(\rho)$ of a successful run $\rho$ of $T$ on $u$. As for the output word $v$, we annotate every position $x$ of $v$ with the productive transition $t = (q, a, v, q')$ of $\rho$ that generated $x$. For short, we call $t$ the transition of $x$. In addition, we fix an MSO-definable total ordering on tagged transitions (e.g. the lexicographic ordering).
Then, we determine from each output position $x$ the $\sim_u$-equivalence class $C = [(t, y)]_{\sim_u}$, where $u$ is the underlying input, $t$ is the productive transition that generated $x$, and $y$ is its origin, and we extend the annotation of $x$ with the index $i$ of the element $(t, y)$ inside the equivalence class $C$, according to the fixed total ordering on tagged transitions. This number $i$ is called the index of $x$.

The resynchronizer $R$ needs to redirect the source origin $y$ of any output position generated by a transition $t$ to a target origin $z$ that is the anchor of the $\sim_u$-equivalence class of $(t, y)$. To simplify the explanation, we temporarily assume that the input and output are correctly annotated as described above. By inspecting the type $\tau$ of an output position $x$, the formula $\mathit{move}_\tau(y, z)$ of $R$ can determine the transition $t$ of $x$, and enforce that $(t, y) \sim_u (t', z)$, for some transition $t'$, and that $(t, y) \not\sim_u (t'', z')$, for all $z' < z$ and all transitions $t''$. Under the assumption that the input and output annotations are correct, this would result in a bounded resynchronizer $R$. Indeed, for every position $z$, there exist at most $K \cdot |\Delta|$ positions $y$ that, paired with some productive transition, turn out to be $\sim_u$-equivalent to $(t', z)$ for some transition $t'$. Once again, we need to further constrain the relation $\mathit{move}_\tau(y, z)$ so that it describes a partial bijection between source and target origins (this will be useful later). For this, it suffices to additionally enforce that $(t, y)$ is the $i$-th element in its $\sim_u$-equivalence class, according to the fixed total ordering on tagged transitions, where $i$ is the index specified in the output type $\tau$ of $x$.
This latter modification also guarantees that $i$ is the correct index of $x$.

Unless we further refine our constructions, we cannot claim that they always result in a 1-bounded resynchronizer $R$, since the above arguments crucially rely on the assumption that the input and output annotations are correct. However, we can apply the same trick that we used in the proof of Theorem 1 to make the resynchronizer $R$ "syntactically" 1-bounded, even in the presence of badly-formed annotations. Formally, let $\mathit{move}_\tau(y, z)$ be the formula that transforms the origins in the way described above, and define

$\mathit{move}'_\tau(y, z) \;=\; \mathit{move}_\tau(y, z) \,\wedge\, \forall y' \, \big( \mathit{move}_\tau(y', z) \rightarrow y' = y \big).$

By construction, the above formula defines a partial bijection entailing the old relation $\mathit{move}_\tau$ (in the worst case, when the annotations are not correct, the above formula may not hold for some pairs of source and target origins). In addition, if the annotations are correct, then $\mathit{move}'_\tau(y, z)$ is semantically equivalent to $\mathit{move}_\tau(y, z)$, as desired. In this way, we obtain a regular resynchronizer $R = (I, O, \mathit{ipar}, \mathit{opar}, \mathit{move}'_\tau, \mathit{next})$ that is always 1-bounded, no matter how we define $\mathit{ipar}$, $\mathit{opar}$, and $\mathit{next}$.

We now explain how to check that the annotations are correct. The input annotation does not pose any particular problem, since the language of inputs annotated with normalized runs is regular, and can be checked using the first formula $\mathit{ipar}$ of the resynchronizer. As for the output annotation, correctness of the indices was already enforced by the $\mathit{move}'_\tau$ relation. It remains to enforce correctness of the transitions.
Once again, this boils down to verifying the following property (†):

For every pair of consecutive output positions $x, x+1$ with source origins $y, y'$, respectively, if $t, t'$ are the productive transitions specified in the output types of $x, x+1$, then on the flows that annotate the input, one can move from transition $t$ at position $y$ to transition $t'$ at position $y'$ by using as intermediate edges only non-productive transitions. (†)

From here we proceed exactly as in the proof of Theorem 1. We observe that Property (†) is expressible by an MSO formula $\varphi^{\dagger}_{\tau,\tau'}(y, y')$, assuming that $\tau, \tau'$ are the output types of $x, x+1$, that $y, y'$ are interpreted by the source origins of $x, x+1$, and that $x$ ranges over all output positions. We then recall that $\mathit{move}_\tau(y, z)$ and $\mathit{move}_{\tau'}(y', z')$ describe partial bijections between source and target origins, and exploit this to enforce (†) by means of the last formula of $R$:

$\mathit{next}_{\tau,\tau'}(z, z') \;=\; \exists y, y' \; \big( \mathit{move}_\tau(y, z) \,\wedge\, \mathit{move}_{\tau'}(y', z') \,\wedge\, \varphi^{\dagger}_{\tau,\tau'}(y, y') \big).$

This guarantees that all annotations are correct, and proves that $R$ is a partially bijective, regular resynchronizer satisfying $R(T) =_o \mathsf{low}(T)$. It is also immediate to see that $R$ is $T$-preserving.

We finally prove that one-way resynchronizability of $T$ reduces to one-way resynchronizability of $\mathsf{low}(T)$, which can be effectively tested using Theorem 1, since $\mathsf{low}(T)$ is bounded-visit:

Lemma 9.
For all transducers $T, T'$, with $T'$ bounded-visit, and for every partially bijective, regular resynchronizer $R$ that is $T$-preserving and such that $R(T) =_o T'$, $T$ is one-way resynchronizable if and only if $T'$ is one-way resynchronizable.

Proof. For the right-to-left implication, suppose that $T' =_o R(T)$ is bounded-visit and one-way resynchronizable. Since $T'$ is bounded-visit, we can use the implications of Theorem 1 to obtain a bounded, regular resynchronizer $R'$ that is $T'$-preserving and such that $R'(T')$ is order-preserving. By Lemma 2, there is a bounded, regular resynchronizer $R''$ that is equivalent to $R' \circ R$. In particular, $R''(T)$ is order-preserving. It remains to verify that $R''$ is also $T$-preserving. Consider any synchronized pair $\sigma \in [\![T]\!]_o$. Since $R$ is $T$-preserving, $\sigma$ belongs to the domain of $R$, and hence $(\sigma, \sigma') \in R$ for some synchronized pair $\sigma' \in [\![T']\!]_o$. Since $R'$ is $T'$-preserving, $\sigma'$ belongs to the domain of $R'$, and hence there is $(\sigma, \sigma'') \in (R' \circ R) = R''$. This shows that $R''$ is $T$-preserving, and hence $T$ is one-way resynchronizable.

For the converse direction, suppose that $T'$ is bounded-visit, but not one-way resynchronizable. We apply again Theorem 1, but now through the contrapositives of its implications, and obtain that $T'$ has unbounded cross-width (see Definition 2).

We also recall that $R = (I, O, \mathit{ipar}, \mathit{opar}, (\mathit{move}_\tau)_\tau, (\mathit{next}_{\tau,\tau'})_{\tau,\tau'})$ is partially bijective. This means that every formula $\mathit{move}_\tau(y, z)$ defines a partial bijection from source to target positions. A useful property of every MSO-definable partial bijection is that, for every position $t$, it can only define boundedly many pairs $(y, z)$ with either $y \le t < z$ or $z \le t < y$; for short, we call such a pair $(y, z)$ $t$-separated. This follows from compositional properties of regular languages. Indeed, let $A$ be a deterministic automaton equivalent to the formula that defines the partial bijection. For every pair $(y, z)$ in the partial bijection, let $q_{y,z}$ be the state visited at position $t$ by the successful run of $A$ on the input annotated with the pair $(y, z)$. If $A$ accepted more than $|Q|$ pairs that are $t$-separated, where $Q$ is the state space of $A$, then at least two of them, say $(y, z)$ and $(y', z')$, would satisfy $q_{y,z} = q_{y',z'}$. But this would imply that the pair $(y, z')$ is also accepted by $A$, which contradicts the assumption that $A$ defines a partial bijection.

We now exploit the above result to prove that the property of having unbounded cross-width transfers from $T'$ to $T$. Consider a cross $(X_1, X_2)$ of arbitrarily large width $h$ in some synchronized pair $\sigma = (u, v, \mathit{orig})$ of $T'$. Without loss of generality, assume that all positions in $X_1 \cup X_2$ have the same type $\tau$. Let $Z_i = \mathit{orig}(X_i)$, for $i = 1, 2$, and $t = \max(Z_2)$. By definition of cross, we have $X_1 < X_2$ and $Z_2 \le t < Z_1$. Recall that $\mathit{move}_\tau$ defines a partial bijection, and that this implies that there are only boundedly many pairs of source-target origins that are $t$-separated, say $(y_1, z_1), \ldots, (y_k, z_k)$ for a constant $k$ that only depends on $R$. Moreover, since $R(T) =_o T'$, the positions in $Z_i$ can be seen as target origins of the formula $\mathit{move}_\tau$ of $R$. Now, let $X'_i = X_i \setminus \mathit{orig}^{-1}(\{z_1, \ldots, z_k\})$ and $Y'_i = \mathit{orig}'(X'_i)$, for any synchronized pair $\sigma' = (u, v, \mathit{orig}')$ such that $(\sigma', \sigma) \in R$. By construction, we have $X'_1 < X'_2$ and $Y'_2 \le t < Y'_1$ (the latter condition follows from the fact that the source origins from $Y'_i$ can only be moved to target origins on the same side w.r.t. $t$). This means that $(X'_1, X'_2)$ is a cross of width $h - k$. As $h$ can be taken arbitrarily large and $k$ is constant, this proves that $T$ has unbounded cross-width as well.

Finally, by the contrapositive of the implication of Theorem 1 from one-way resynchronizability to bounded cross-width (which does not require that $T$ is bounded-visit), we conclude that $T$ is not one-way resynchronizable. ⊓⊔

Summing up, the algorithm that decides whether a given two-way transducer $T$ is one-way resynchronizable first verifies that $T$ is $K$-sparse for some $K$ (if not, it claims that $T$ is not one-way resynchronizable), then it constructs a bounded-visit transducer $\mathsf{low}(T)$ equivalent to $T$, and finally decides whether $\mathsf{low}(T)$ is one-way resynchronizable (which happens if and only if $T$ is one-way resynchronizable). This concludes the proof of Theorem 2.
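The chain of reductions established above can be summarized as follows (a restatement of the algorithm, under the assumption that $T$ is $K$-sparse for some $K$; otherwise $T$ is directly rejected by Lemma E.1). The first equivalence is Lemma 9, and the second is the characterization of Theorem 1, which applies because $\mathsf{low}(T)$ is bounded-visit:

```latex
T \text{ one-way resynchronizable}
  \;\Longleftrightarrow\;
\mathsf{low}(T) \text{ one-way resynchronizable}
  \;\Longleftrightarrow\;
\mathsf{low}(T) \text{ has bounded cross-width}
```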