Locality and Centrality: The Variety ZG
Antoine Amarilli
LTCI, Télécom Paris, Institut polytechnique de Paris, France
Charles Paperman
LINKS, CRIStAL, Université de Lille, INRIA, France
Abstract
We study the variety ZG of monoids where the elements that belong to a group are central, i.e., commute with all other elements. We show that ZG is local, that is, the semidirect product ZG ∗ D of ZG by definite semigroups is equal to LZG, the variety of semigroups where all local monoids are in ZG. Our main result is thus: ZG ∗ D = LZG. We prove this result using Straubing’s delay theorem, by considering paths in the category of idempotents. In the process, we obtain the characterization ZG = MNil ∨ Com, and also characterize the ZG languages, i.e., the languages whose syntactic monoid is in ZG: they are precisely the languages that are finite unions of disjoint shuffles of singleton languages and regular commutative languages.
ACM Subject Classification Theory of computation → Formal languages and automata theory
Keywords and phrases regular language, variety, locality
Acknowledgements
We thank Jean-Éric Pin and Jorge Almeida for their fruitful advice.
In this paper, we study a variety of monoids called ZG. It is defined by enforcing that the elements of the monoid that belong to a group are central, i.e., commute with all other elements of the monoid. The notation ZG thus stands for Zentral Group, inspired by the classical notion of centrality in group theory. We can also define ZG with the equation $x^{\omega+1} y = y x^{\omega+1}$ on all elements $x$ and $y$, where $\omega$ is the idempotent power of the monoid.
The variety ZG has been introduced by Auinger [5] as a subvariety of interest of the broader class ZE of semigroups where the idempotent elements are central. The study of ZE was initiated by Straubing [21]. Straubing shows in particular that the variety MNil (called simply V in the paper) of regular languages generated by finite languages is exactly the variety of aperiodic monoids in ZE. From this, a systematic investigation of the subclasses of ZE was started by Almeida and pursued by Auinger: see [2, page 211] and [5, 4].
Our specific motivation to explore ZG comes from our study of the dynamic membership problem for regular languages. In this problem [18], we want to handle update operations on an input word while maintaining the information of whether it belongs to a fixed regular language. In a companion paper also submitted to ICALP’21 [3], we identify a variant of ZG as a plausible tractability boundary characterizing the languages for which every update operation can be handled in $O(1)$. Specifically, this variant can be defined as the so-called semidirect product of ZG by definite (D) semigroups, which we denote ZG ∗ D.
This semidirect product operation on varieties, which we use to define ZG ∗ D, intuitively corresponds to composing finite automata via a kind of cascade operation. Its study is the subject of a large portion of semigroup theory, inspired by the classical study of the semidirect product in group theory. There are also known results to understand specifically the semidirect product by D.
For instance, the Derived category theorem [28] studies it as a decisive step towards proving the decidability of membership in an arbitrary semidirect product, i.e., deciding if a given monoid belongs to the product. The product by D also arises naturally in several other contexts: the dot-depth hierarchies [22], the circuit complexity of regular languages [23], or the study of the successor relations in first-order logic [26, 27, 14].
Understanding this product with D is notoriously complicated. For instance, it requires specific dedicated work for some varieties like J or Com [13, 12, 25]. Also, this product does not preserve the decidability of membership: Auinger [6] proved that there are varieties V such that membership in V is decidable, but the analogous problem for V ∗ D is undecidable. For the specific case of the varieties ZG, ZE, or even MNil, we are not aware of prior results describing their semidirect product with D.
Locality.
Existing work has nevertheless identified some cases where the ∗ D operator can be simplified to a much nicer local operator, which preserves the decidability of membership and is easier to understand. For any semigroup $S$, the local monoids of $S$ are the subsemigroups of $S$ of the shape $eSe$ with $e$ an idempotent element of $S$. For a variety V, we say that a semigroup belongs to LV if all its local monoids are in V. It is not hard to notice that the variety V ∗ D is always a subvariety of LV, i.e., that every semigroup in V ∗ D must also be in LV. In some cases, we can show a locality result stating that the other direction also holds, so that V ∗ D = LV. In those cases we say that the variety V is local. The locality of the variety of monoids DA [2] is a famous result that has deep implications in logic and complexity [26, 9, 11] and has inspired recent follow-up work [17]. Locality results are also known for other varieties, for instance the variety of semi-lattice monoids (monoids that are both idempotent and commutative) [15, 8], any subvariety of groups [22, Theorem 10.2], or the R-trivial variety [20, 19, 24]. This suggests an angle of attack to understanding the variety ZG ∗ D: establishing a locality result of this type for ZG.
Contributions.
Our main result in this paper is the locality of the variety ZG:
▶ Theorem 1.1. We have LZG = ZG ∗ D.
In the process of showing this result, we obtain a characterization of ZG-congruences, i.e., congruences $\sim$ on $\Sigma^*$ where the quotient $\Sigma^*/{\sim}$ is a monoid of ZG. We show that they are always refined by a so-called $n$-congruence, which identifies the number of occurrences of the frequent letters (the ones occurring $> n$ times in the word) modulo $n$, and also identifies the exact subword formed by the rare letters (the ones occurring $\le n$ times). Thanks to this (Theorem 3.4), we also obtain a characterization of the languages of ZG, i.e., the languages whose syntactic monoid is in ZG: they are exactly the finite unions of disjoint shuffles of singleton languages and commutative languages (Corollary 3.5). We also characterize ZG as a variety of monoids: ZG = MNil ∨ Com, for MNil defined in [21] and Com the variety of commutative monoids.
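As an informal illustration of the $n$-congruence just described, the following Python sketch tests whether two words are $n$-equivalent. It is only a reading aid under stated assumptions: words are Python strings, and the function names `rare_subword` and `n_equivalent` are hypothetical names introduced here, not notation from the paper.

```python
from collections import Counter


def rare_subword(w: str, n: int) -> str:
    """Keep only the letters of w that occur at most n times in w."""
    counts = Counter(w)
    return "".join(c for c in w if counts[c] <= n)


def n_equivalent(u: str, v: str, n: int) -> bool:
    """Test n-equivalence of u and v (hypothetical helper, n > 0)."""
    if n <= 0:
        raise ValueError("the threshold n must be positive")
    # The subwords formed by the rare letters must be identical.
    if rare_subword(u, n) != rare_subword(v, n):
        return False
    cu, cv = Counter(u), Counter(v)
    for a in set(cu) | set(cv):
        frequent_u, frequent_v = cu[a] > n, cv[a] > n
        # The same letters must be frequent in both words.
        if frequent_u != frequent_v:
            return False
        # Frequent letters must have the same number of occurrences modulo n.
        if frequent_u and (cu[a] - cv[a]) % n != 0:
            return False
    return True


print(n_equivalent("aaab", "baaaaa", 2))  # a is frequent (3 vs 5), b is rare
print(n_equivalent("ab", "ba", 2))        # rare subwords "ab" and "ba" differ
```

In the first call the position of the rare letter relative to the frequent ones is irrelevant, which reflects the commutation of frequent letters; in the second call both letters are rare, so their relative order matters.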
Paper structure.
We give preliminaries in Section 2 and formally define the variety ZG. We then give in Section 3 our characterizations of ZG via the so-called $n$-congruence. We then define in Section 4 the varieties ZG ∗ D and LZG used in Theorem 1.1, which we prove in the rest of the paper. We first introduce the framework of Straubing’s delay theorem used for our proof in Section 5, and rephrase our result as a claim (Claim 5.6) on paths in the category of idempotents. We then study in Section 6 how to pick a sufficiently large value of $n$ as a choice of our $n$-congruence, show in Section 7 two lemmas on paths in the idempotent category, and finish the proof in Section 8. We conclude in Section 9.
For a complete presentation of the basic concepts (automata, monoids, semigroups, groups, etc.) the reader can refer to the book of J. E. Pin [16] or to the more recent lecture notes [29].
All semigroups, groups, and monoids that we consider are finite.
Semigroups and varieties.
For a semigroup $S$, we call $x \in S$ idempotent if $x^2 = x$. We call the idempotent power of $x \in S$ the unique idempotent element which is a power of $x$. (This means that $x$ is idempotent iff it is its own idempotent power.) Now, the idempotent power of $S$ is an integer $\omega$ such that for any element $x \in S$, the element $x^{\omega}$ is the idempotent power of $x$. We write $x^{\omega+k}$ for any $k \in \mathbb{Z}$ to mean $x^{\omega+k'}$ where $k'$ is the remainder of $k$ in the integer division by $\omega$.
We will use notions from formal language theory for some of our definitions. We denote by $\Sigma$ an alphabet and by $\Sigma^*$ the set of all finite words on $\Sigma$. We denote by $\epsilon$ the empty word. For $w \in \Sigma^*$, we denote by $|w|$ the length of $w$. For $u, v \in \Sigma^*$, we say that $u$ is a subword of $v$ if there are $0 \le n \le |u|$ and $1 \le i_1 < \cdots < i_n \le |v|$ such that $u = v_{i_1} \cdots v_{i_n}$. For $w \in \Sigma^*$ and $a \in \Sigma$, we denote by $|w|_a$ the number of occurrences of $a$ in $w$. A language $L$ is a subset of $\Sigma^*$. A variety of (regular) languages is a class of regular languages which is closed under Boolean operations, left and right derivatives, and inverse homomorphisms. A non-erasing variety is closed under the same operations, except we only require closure under inverse non-erasing homomorphisms, that is, those that do not map any letter of $\Sigma$ to $\epsilon$.
A variety of monoids (resp., variety of semigroups) is a class of monoids (resp., semigroups) closed under direct product, quotient, and submonoid (resp., subsemigroup). We recall that Eilenberg’s theorem [10] gives a one-to-one correspondence between varieties of languages and varieties of monoids. A similar one-to-one correspondence exists between non-erasing varieties of languages and varieties of semigroups. In the following, we abuse notation and identify varieties of monoids with varieties of languages following this correspondence. To be more precise, we say that a monoid $M$ recognizes a language $L$ if there exists a morphism $\eta : \Sigma^* \to M$ such that $L = \eta^{-1}(\eta(L))$.
Eilenberg’s correspondence then says that, when considering a variety V of monoids, the languages recognized by monoids in V belong to the corresponding variety of languages. Eilenberg’s correspondence extends to a correspondence between varieties of semigroups and non-erasing varieties of languages.
Congruences. A finite index congruence on a finite alphabet $\Sigma$ is a congruence on $\Sigma^*$ that has a finite number of equivalence classes. For a given finite index congruence $\sim$, the quotient $\Sigma^*/{\sim}$ is a finite monoid, whose law corresponds to concatenation over $\Sigma^*$ and whose neutral element is the class of the empty word. The syntactic monoid of a regular language over $\Sigma$ is the quotient of $\Sigma^*$ by the syntactic congruence for the language, which is a finite index congruence because the language is regular. Letting V be a variety of monoids, we say that a finite index congruence $\sim$ on $\Sigma$ is a V-congruence if the quotient $\Sigma^*/{\sim}$ is a monoid in V. For a given V-congruence $\sim$, the map $\eta : \Sigma^* \to \Sigma^*/{\sim}$, defined by associating each word with its equivalence class, is a morphism onto a monoid of V. Hence, each equivalence class is a language of V, since it is recognized by $\Sigma^*/{\sim}$.
The variety ZG. In this paper, we study the variety of monoids ZG defined by the equation: $x^{\omega+1} y = y x^{\omega+1}$ for all $x, y \in M$. Intuitively, this says that the elements of the form $x^{\omega+1}$ are central, i.e., commute with all other elements. This clearly implies the same for elements of the form $x^{\omega+k}$ for any $k \in \mathbb{Z}$, as we will implicitly use throughout the paper:
▶ Claim 2.1.
For any monoid $M$ in ZG, for $x, y \in M$, and $k \in \mathbb{Z}$, we have: $x^{\omega+k} y = y x^{\omega+k}$.
Note that these elements are precisely the monoid elements that are within a (possibly trivial) subgroup. This motivates the name ZG, which stands for “Zentral Group”: it follows the traditional notation $Z(\cdot)$ for central subgroups, and extends the variety ZE introduced in [1, p211] which only requires idempotents to be central. Thus, we have ZG ⊊ ZE, and non-commutative groups are examples of monoids that are in ZE but not in ZG.
By Eilenberg’s theorem, ZG then defines a variety of regular languages, namely the languages whose syntactic monoid is in ZG. Note that any regular commutative language is in ZE and in ZG (as is clear from the equation), and any finite language is also in ZE and in ZG (apart from the identity, the unique group element of its syntactic monoid is a zero, so it commutes with everything).
3 Characterizations of ZG
In this section, we present our characterizations of ZG, which we will use to prove Theorem 1.1. We will show that ZG is intimately linked to a congruence on words called the $n$-congruence. Intuitively, two words are identified by this congruence if the subwords of the rare letters (occurring at most $n$ times) are the same, and the numbers of occurrences of the frequent letters (occurring more than $n$ times) are congruent modulo $n$. Formally:
▶ Definition 3.1 (Rare and frequent letters, $n$-congruence). Fix an alphabet $\Sigma$ and a word $w \in \Sigma^*$. Given a threshold $n \in \mathbb{N}$, we call $a \in \Sigma$ rare in $w$ if $|w|_a \le n$, and frequent in $w$ if $|w|_a > n$.
We define the rare subword $w_{\le n}$ to be the subword of $w$ obtained by keeping only the rare letters of $w$, i.e., the subword of $w$ where we keep precisely the letters of the (possibly empty) rare alphabet $\{a \in \Sigma \mid |w|_a \le n\}$.
For $n > 0$, the $n$-congruence $\sim_n$ is defined by writing $u \sim_n v$ for $u, v \in \Sigma^*$ iff:
- The rare subwords are equal: $u_{\le n} = v_{\le n}$;
- The rare alphabets are the same: for all $a \in \Sigma$, we have $|u|_a > n$ iff $|v|_a > n$;
- The numbers of occurrences modulo $n$ are the same: for all $a \in \Sigma$ such that $|u|_a > n$ (and $|v|_a > n$), we have that $|u|_a$ and $|v|_a$ are congruent modulo $n$.
We first remark that two $n$-equivalent words are also $m$-equivalent for any divisor $m$ of $n$:
▶ Claim 3.2.
For any alphabet $\Sigma$, for any $n > 0$, for any $m > 0$, if $m$ is a multiple of $n$ then the $m$-congruence refines the $n$-congruence.
What is more, observe that $n$-congruences are a particular case of ZG-congruences:
▶ Claim 3.3.
For any alphabet $\Sigma$ and $n > 0$, the $n$-congruence over $\Sigma^*$ is a ZG-congruence.
Proof sketch.
Considering any equivalence class of the $n$-congruence, we can enforce in ZG the (commutative) conditions on the number of occurrences of the frequent letters, and interleave this with the requirement on the rare subword. ◀
The goal of this section is to show the following result. Intuitively, it states that ZG-congruences are always refined by a sufficiently large $n$-congruence. Formally:
▶ Theorem 3.4.
Consider any ZG-congruence $\sim$ over $\Sigma^*$ and consider its associated monoid $M := \Sigma^*/{\sim}$. Let $n := (|M| + 1) \cdot \omega$ with $\omega$ the idempotent power of $M$. Then the congruence $\sim$ is refined by the $n$-congruence on $\Sigma$.
Before proving this result, we spell out some of its consequences. The most important one is that Theorem 3.4 implies a characterization of languages in ZG, which is similar to the one obtained by Straubing in [21] for the variety MNil. To define MNil, define a nilpotent semigroup $S$ to be a semigroup satisfying the equation $x^{\omega} y = y x^{\omega} = x^{\omega}$, and let $S^1$ be the monoid obtained from $S$ by adding an identity element $1$ (i.e., an element with $1x = x1 = x$ for all $x \in S$) if $S$ does not have one. The variety MNil is generated by semigroups of the form $S^1$ for $S$ a nilpotent semigroup. It was shown in [21] that the languages of MNil are disjoint monomials, that is, Boolean combinations of languages of the shape $B^* a_1 B^* a_2 \cdots a_k B^*$ with $B \cap \{a_1, \ldots, a_k\} = \emptyset$.
Our analogous characterization for ZG is the following, obtained via Theorem 3.4:
▶ Corollary 3.5.
Any ZG language $L$ can be expressed as a finite union of languages of the form $B^* a_1 B^* a_2 \cdots a_k B^* \cap K$ where $\{a_1, \ldots, a_k\} \cap B = \emptyset$ and $K$ is a regular commutative language.
Equivalently, we can say that every language of ZG is a finite union of disjoint shuffles of a singleton language (containing only one word) and of a regular commutative language, where the disjoint shuffle operator interleaves two languages (i.e., it describes the sets of words that can be achieved as interleavings of one word in each language) while requiring that the two languages are on disjoint alphabets. We sketch the proof of Corollary 3.5:
Proof sketch.
We know that the syntactic congruence of a ZG language is a ZG-congruence, so by Theorem 3.4 it is refined by an $n$-congruence; and the equivalence classes of an $n$-congruence can be expressed as stated. ◀
This corollary also implies a characterization of the variety of monoids ZG. To define it, we use the join of two varieties V and W, denoted by V ∨ W, which is the variety of monoids generated by the monoids of V and those of W. Alternatively, the join is the smallest variety containing both varieties. We then have:
▶ Corollary 3.6.
The variety ZG is generated by commutative monoids and monoids of the shape $S^1$ with $S$ a nilpotent semigroup. In other words, we have: ZG = MNil ∨ Com.
Proof.
Clearly ZG contains both Com and MNil. Furthermore, by Corollary 3.5, any language in ZG is a union of intersections of a language in MNil and a language in Com. Hence, it is in the variety generated by them, concluding the proof. ◀
On a different note, we will also use Theorem 3.4 to show a technical result that will be useful later. It intuitively allows us to regroup and move arbitrary elements:
▶ Corollary 3.7.
For any monoid $M$ in ZG, letting $n \ge (|M| + 1) \cdot \omega$, for any element $m$ of $M$ and elements $m_1, \ldots, m_n$ of $M$, we have $m \cdot m_1 \cdot m \cdot m_2 \cdot m \cdots m \cdot m_n \cdot m = m^{n+1} \cdot m_1 \cdots m_n$.
Having spelled out the consequences of Theorem 3.4, we turn to its proof. It crucially relies on a general result about ZG that we will use in several proofs, and which is shown simply by manipulating equations:
▶ Lemma 3.8.
Let $M$ be a monoid of ZG, let $\omega$ be the idempotent power, and let $x, y \in M$. Then we have: $(xy)^{\omega} = x^{\omega} y^{\omega}$.
We can then sketch the proof of Theorem 3.4:
Proof sketch.
From Lemma 3.8, we can rewrite any word, up to a ZG-congruence, to a normal form where frequent letters are moved to the end of the word. Looking at this form, we can show that $n$-equivalence implies equivalence by the ZG-congruence. ◀
4 ZG ∗ D and LZG, and Result Statement
We have given our characterizations of ZG and presented some preliminary results. We now move to the definition of LZG and ZG ∗ D, to show that LZG = ZG ∗ D (Theorem 1.1).
ZG ∗ D. We denote by D the variety of the definite semigroups, i.e., the semigroups satisfying the equation $y x^{\omega} = x^{\omega}$. The variety of semigroups ZG ∗ D is intuitively defined by taking the semidirect product of monoids in ZG and semigroups in D. Although we will not use directly its definition in this paper, we recall it for completeness. Given two semigroups $S$ and $T$, a semigroup action of $S$ on $T$ is defined by a map $\mathrm{act} : S \times T \to T$ such that $\mathrm{act}(s_1, \mathrm{act}(s_2, t)) = \mathrm{act}(s_1 s_2, t)$ and $\mathrm{act}(s, t_1 t_2) = \mathrm{act}(s, t_1)\,\mathrm{act}(s, t_2)$. We then define the product $\circ_{\mathrm{act}}$ on the set $T \times S$ as follows: for all $s_1, s_2$ in $S$ and $t_1, t_2$ in $T$, we have: $(t_1, s_1) \circ_{\mathrm{act}} (t_2, s_2) := (t_1\,\mathrm{act}(s_1, t_2), s_1 s_2)$. The set $T \times S$ equipped with the product $\circ_{\mathrm{act}}$ is a semigroup called the semidirect product of $S$ by $T$, denoted $T \circ_{\mathrm{act}} S$. The variety ZG ∗ D is then the variety generated by the semidirect products of monoids in ZG and semigroups in D. Remark that this operation is equivalent to the wreath product of varieties. Furthermore we could equivalently replace D by the variety of locally trivial semigroups. We refer to [22] for a detailed presentation on this subject.
LZG. Last, we introduce the variety LZG. This is the variety of semigroups $S$ such that, for every idempotent $e$ of $S$, the submonoid $eSe$ of elements that can be written as $ese$ for some $s \in S$ is in ZG. In other words, a semigroup is in LZG iff it satisfies the following equation: for any $x$, $y$ and $z$ in $S$, we have: $(z^{\omega} x z^{\omega})^{\omega+1} (z^{\omega} y z^{\omega}) = (z^{\omega} y z^{\omega}) (z^{\omega} x z^{\omega})^{\omega+1}$. Again, following Eilenberg’s theorem, we also see LZG as a non-erasing variety of languages.
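To make the definitions of ZG and LZG concrete, here is a minimal brute-force sketch in Python. It assumes that a finite semigroup is given as a multiplication table over the elements 0, ..., k-1 (an encoding chosen here for illustration), and the names `idempotent_power`, `is_in_zg`, and `is_in_lzg` are hypothetical. The check on local monoids tests the ZG equation inside each $eSe$ directly rather than through the single equation above.

```python
def idempotent_power(table, x):
    """Return the unique idempotent among the powers x, x^2, x^3, ..."""
    p = x
    while table[p][p] != p:
        p = table[p][x]  # move from x^k to x^(k+1)
    return p


def is_in_zg(table, elems=None):
    """Check x^(omega+1) * y == y * x^(omega+1) for all x, y in elems."""
    if elems is None:
        elems = range(len(table))
    elems = list(elems)
    for x in elems:
        g = table[idempotent_power(table, x)][x]  # this is x^(omega+1)
        if any(table[g][y] != table[y][g] for y in elems):
            return False
    return True


def is_in_lzg(table):
    """Check that every local monoid eSe (e idempotent) satisfies ZG."""
    size = len(table)
    for e in range(size):
        if table[e][e] != e:
            continue  # e is not idempotent
        local = {table[table[e][s]][e] for s in range(size)}
        if not is_in_zg(table, local):
            return False
    return True


# Illustrative tables, where table[x][y] is the product x * y.
Z2 = [[0, 1], [1, 0]]                             # cyclic group of order 2
RIGHT_ZERO = [[0, 1], [0, 1]]                     # x * y = y, a definite semigroup
LEFT_ZERO_WITH_ID = [[0, 1, 2], [1, 1, 1], [2, 2, 2]]  # 0 is the identity

print(is_in_zg(Z2), is_in_zg(RIGHT_ZERO), is_in_lzg(RIGHT_ZERO))
print(is_in_lzg(LEFT_ZERO_WITH_ID))
```

The right-zero semigroup fails the ZG equation globally, but each of its local monoids is trivial, so it passes the local check; this matches the intuition that definite semigroups lie in LZG.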
Main result.
Our main result, stated in Theorem 1.1, is that ZG ∗ D and LZG are actually the same variety. To prove this result, we will first present the general framework of Straubing’s delay theorem in the next section and show the easy inclusion ZG ∗ D ⊆ LZG, before embarking on the actual proof.
To show our main result, we will use Straubing’s delay theorem from [22]. We first give some prerequisites to recall this result. To this end, let us first define a general notion of category:
▶ Definition 5.1. A finite category on a set of objects $O$ is a finite multiset $C$ over $O \times O$ of arrows, each arrow going from an object to another object (possibly itself), equipped with a composition law: for any arrows $a, b \in C$ such that we can write $a = (o, o')$ and $b = (o', o'')$, the composition law gives us $ab$ which must be an arrow of the form $ab = (o, o'')$. Further, this composition law must be associative. What is more, we require that for any object $o$, there exists an arrow $(o, o)$ which is the identity for all elements with which it can be combined (hence these arrows are in particular unique).
We now define the notion of idempotent category of a semigroup. The idempotent category of a language is then defined as that of its syntactic semigroup.
▶ Definition 5.2 (Idempotent category). Let $S$ be a semigroup. The idempotent category $S_E$ of $S$ is the finite category defined as follows:
- The objects of $S_E$ are the idempotents of $S$.
- For any idempotents $e$ and $f$ and any element $x$ of $S$ such that $x \in eSf$, we have an arrow labeled by $x$ going from $e$ to $f$, which we will denote by $(e, x, f)$.
- The composition law of the category is $(e, x, f)(f, y, g) = (e, xy, g)$. Note that it is clearly associative thanks to the associativity of the composition law on $S$.
Let us now study the idempotent category of a semigroup $S$ in more detail. Let $\mathrm{Arrows}(S_E) = \{(e, x, f) \mid x \in eSf\}$ be the set of arrows of the idempotent category. For brevity, we denote this set simply as $B$.
A path of $S_E$ is a nonempty word of $B^*$ whose sequence of arrows is valid, i.e., the end object of each arrow except the last one is equal to the starting object of the next arrow. Because $S_E$ is a category, each path is equivalent to an element of the category, i.e., composing the arrows of the path according to the composition law of the category will give one arrow of the category, whose starting and ending objects will be the starting object of the path (i.e., that of the first arrow) and the ending object of the path (i.e., that of the last arrow). Two paths are coterminal if they have the same starting and ending objects. Two paths $p_1$ and $p_2$ are $S_E$-equal if they evaluate to the same category element, which we write $p_1 \equiv p_2$. Note that if two paths are $S_E$-equal then they must be coterminal. A loop is a path whose starting and ending objects are the same.
A congruence is an equivalence relation $\sim$ over $B^*$ which satisfies compositionality, i.e., it is compatible with the concatenation of words in the following sense: for any words $x$, $y$, $z$, and $t$ of $B^*$, if $x \sim y$ and $z \sim t$, then $xz \sim yt$. Note that the relation is also defined on words of $B^*$ that are not valid, i.e., do not correspond to paths; and compositionality also applies to such words.
▶ Definition 5.3 (Compatible congruence). A congruence $\sim$ on $B^*$ is compatible with $S_E$ iff for any two coterminal paths $p_1$ and $p_2$ of $S_E$ such that $p_1 \sim p_2$, we have $p_1 \equiv p_2$.
In other words, $\sim$ is compatible with $S_E$ iff, on words of $B^*$ that are coterminal paths, it refines $S_E$-equality. Recall the notion of a ZG-congruence from Section 2. We are now ready to state Straubing’s delay theorem. The theorem applies to any variety, but we state it specifically for ZG for our purposes. The theorem gives us an alternative characterization of ZG ∗ D:
▶ Theorem 5.4 (Straubing’s delay theorem (Theorem 5.2 of [22])).
A language $L$ is in ZG ∗ D iff, writing $S_E$ the idempotent category of $L$ and defining $B := \mathrm{Arrows}(S_E)$ as above, there exists a ZG-congruence on $B^*$ which is compatible with $S_E$.
Using our notion of $n$-congruence, via Theorem 3.4 and Claim 3.3, we rephrase it again:
▶ Corollary 5.5.
A language $L$ is in ZG ∗ D iff, writing $S_E$ and $B$ as above, there exists an $n$-congruence on $B^*$ which is compatible with $S_E$.
Before moving on to the full proof of our main theorem (Theorem 1.1), we conclude the section by noticing that the Straubing delay theorem implies the easy direction of our result, namely, if $L$ is in ZG ∗ D then $L$ is in LZG. This easy direction follows directly from [28], but we provide a self-contained argument in Appendix C for completeness.
In the rest of this paper, we show the much harder direction, i.e., if $L$ is in LZG then $L$ is in ZG ∗ D. To prove this, using Corollary 5.5, it suffices to show:
▶ Claim 5.6.
Let $S$ be a semigroup of LZG, write $S_E$ its idempotent category and $B := \mathrm{Arrows}(S_E)$. There is $n > 0$ such that the $n$-congruence on $B^*$ is compatible with $S_E$.
This result then implies, by our rephrasing of Straubing’s result (Corollary 5.5), that $S$ is in ZG ∗ D. So in the rest of this paper we prove Claim 5.6. The proof is structured in three sections. First, in Section 6, we carefully choose the value $n$ in the congruence to be “large enough”. Second, in Section 7, we show auxiliary results about paths in the category of idempotents. Third, in Section 8, we conclude the proof, first by an induction on the number of rare arrow occurrences, then by a decomposition of the category using ear decompositions of multigraphs.
In this section, we define our choice of the value of $n$ to prove Claim 5.6. Intuitively, we need to choose $n$ to be large enough so that the “gap” between the number of occurrences of rare letters and of frequent letters can be made sufficiently large:
▶ Definition 6.1.
For $\Sigma$ an alphabet, $u \in \Sigma^*$, $n \in \mathbb{N}$, and $m > 0$, we say that $n$ is an $m$-distant rare-frequent threshold if, letting $\Sigma_r := \{a \in \Sigma \mid |u|_a \le n\}$ be the rare letters for $n$, then their total number of occurrences in $u$ plus 1 multiplied by $m$ is at most $n$, formally $\left(\sum_{a \in \Sigma_r} |u|_a + 1\right) \times m \le n$.
In other words, as every frequent letter occurs strictly more than $n$ times, this guarantees that every frequent letter occurs strictly more than $(r + 1)m$ times, where $r$ is the total number of occurrences of rare letters. Since the $r$ rare occurrences split the word into at most $r + 1$ blocks, this means that there must be some contiguous subword containing no rare letter where the frequent letter occurs $> m$ times. This will prove useful in pumping arguments. Specifically, we will want an $m$-distant rare-frequent threshold with $m := |S|$.
Of course, we cannot pick an $n$ which will be an $m$-distant rare-frequent threshold for any path $u$: there will always be paths $u$ where the number of arrow occurrences is close to $n$. However, remember that $n$-equivalence implies $n'$-equivalence for all $n'$ that divide $n$ (Claim 3.2). This suggests that, by choosing a large and composite enough $n$, we can ensure that, given any pair of paths, we can pick a divisor $n'$ which is an $m$-distant rare-frequent threshold. We will also want to ensure in the sequel that $n'$ is also always a multiple of the idempotent power $\omega$ and of $|S| + 1$. Let us formally state that such a choice of $n$ exists:
▶ Lemma 6.2.
For any $m \ge 1$, for any semigroup $S$, letting $S_E$ be the category of idempotents of $S$, there exists an integer $n \ge 1$ with the following property: for any paths $u_1$, $u_2$ in $S_E$, there exists a divisor $n'$ of $n$ which is a multiple of $\omega \times (|S| + 1)$ (for $\omega$ the idempotent power of $S$) such that $n'$ is an $m$-distant rare-frequent threshold for $u_1$ and for $u_2$.
This value of $n$ will be the one we choose in our proof of Claim 5.6. Note that the result applies to arbitrary semigroups, not only those of ZG. Before we prove Lemma 6.2, we first observe for later that if a path has an $m$-distant rare-frequent threshold (actually it suffices to have a 1-distant rare-frequent threshold) then the frequent arrows in this path for this threshold must form a so-called union of strongly connected components (SCCs):
▶ Definition 6.3.
Given $S_E$ and a subset $B'$ of its arrows $B$, we say that $B'$ is a union of SCCs if, letting $G$ be the directed graph on the objects of $S_E$ formed of the arrows of $B'$, then all connected components of $G$ are strongly connected.
▶ Claim 6.4.
Fix $S$ and $S_E$ and $B$, let $w \in B^*$ be a path of $S_E$, and let $n' > 0$ be a 1-distant rare-frequent threshold of $w$. Then the set of frequent arrows of $w$ for $n'$ is a union of SCCs.
Proof sketch.
Any frequent arrow occurs $> n'$ times in $w$, so $w$ must contain $\ge n'$ occurrences of a return path. Now, by 1-distance, not all of these paths can contain a rare arrow. ◀
Hence, in the rest of this section, we prove Lemma 6.2. Let us show an abstract result that will give our choice of value $n$:
▶ Claim 6.5.
For any $d > 0$ and $k > 0$ and $m \ge 1$, there exists $n \ge m$ such that for any $d$-tuple $T$ of integers, there exists an $n'' \ge m$ such that, letting $n' := n'' k$, we have that $n'$ divides $n$ and that $m \sum_{i \in F} T_i \le n'$, where $F = \{i \mid T_i \le n'\}$.
Intuitively, $d$ is the cardinality of the alphabet $B$, $k$ is $\omega \times (|S| + 1)$ (ensuring that we always work with multiples of that value), $m$ enforces a sufficiently large gap (it is the parameter of a distant rare-frequent threshold), and $n$ is the threshold that we will choose, to ensure the existence of a suitable threshold $n'$. Let us sketch the proof of Claim 6.5:
Proof sketch.
By choosing a sufficiently large $n$, we can ensure that we have $2^d + 1$ divisors of $n$ that are candidate thresholds (are multiples of $k$) and are sufficiently far apart. As there are only $2^d$ possible partitions in rare and frequent alphabets, the pigeonhole principle ensures that two divisors achieve the same partition. Taking the larger one then ensures that the gap between rare and frequent letter occurrences is sufficiently large. ◀
With Claim 6.5, it is now easy to show Lemma 6.2; details are given in the appendix.
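The notion of distant threshold can be explored on small examples with the following Python sketch. It reads Definition 6.1 as the condition $(r + 1) \cdot m \le n$, with $r$ the total number of occurrences of rare letters, following the prose of the definition. The function names are hypothetical, words are modelled as Python strings, and the search over divisors is a naive brute force: it does not implement the construction of $n$ from Claim 6.5, and simply returns `None` when the given $n$ has no suitable divisor.

```python
from collections import Counter


def is_distant_threshold(u: str, n: int, m: int) -> bool:
    """Check that n is an m-distant rare-frequent threshold for u,
    read as (r + 1) * m <= n with r the number of rare occurrences."""
    rare_total = sum(c for c in Counter(u).values() if c <= n)
    return (rare_total + 1) * m <= n


def find_threshold(words, n: int, k: int, m: int):
    """Return the smallest divisor n' of n that is a multiple of k and an
    m-distant threshold for every word in words, or None if there is none."""
    for candidate in range(k, n + 1, k):
        if n % candidate == 0 and all(
            is_distant_threshold(u, candidate, m) for u in words
        ):
            return candidate
    return None


# Two illustrative words: 'a' is frequent, 'b' is rare.
u1 = "a" * 50 + "b"
u2 = "a" * 48 + "bbb"
print(find_threshold([u1], 12, 2, 2))
print(find_threshold([u1, u2], 24, 2, 2))
```

On these inputs, small divisors are rejected because the rare occurrences of `b` are too numerous relative to the candidate threshold, which is the reason why the argument needs an $n$ with many well-separated divisors.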
We now show several auxiliary results on the category of idempotents to be used in the sequel. We first show some combinatorial results on paths and loops. We then use them to establish two technical claims on paths: the loop insertion lemma, making it possible to insert any loop of frequent arrows to the power $n'$ (with $n'$ a sufficiently distant rare-frequent threshold) without affecting equivalence; and the prefix substitution lemma, which we can use to replace a prefix of frequent arrows by another up to inserting a loop later in the path.
Recall that $S_E$ denotes the idempotent category of the semigroup $S$ in LZG that we study, and $B$ denotes the set of arrows of $S_E$.
Basic combinatorial results.
We first apply the definition of ZG to the local monoid to get a trivial result about the commutation between loops:
▶ Claim 7.1.
Let $x$ and $y$ be two coterminal loops of $S_E$, let $k \in \mathbb{Z}$, and let $\omega$ be an idempotent power of $S$. We have: $x^{\omega+k} y \equiv y x^{\omega+k}$.
We then show that frequent loops can be “recombined” without changing the category image, simply by equation manipulation:
▶ Claim 7.2.
For $x$, $x'$ two coterminal paths in $S_E$ and $y$, $y'$ coterminal paths in $S_E$ such that $xy$ and $x'y'$ are valid loops, we have: $(xy)^{\omega} (x'y')^{\omega} \equiv (xy')^{\omega} (x'y)^{\omega} (xy)^{\omega} (x'y')^{\omega}$.
The previous claim implies that we can freely change the initial part of a path, even if it is not a loop, when there is a coterminal path under an $\omega$ with which we can swap it. We show this again by equation manipulation, and it will be crucial for the prefix substitution lemma that we show later in the section:
▶ Claim 7.3.
For $x$, $x'$ two coterminal paths in $S_E$ and $y$, $y'$ coterminal paths in $S_E$ such that $xy$ and $x'y'$ are valid loops, and for any path $t$ coterminal with $y$, the following equation holds: $x t (xy)^{\omega} (x'y')^{\omega} \equiv x' t (xy)^{\omega} x y' (x'y')^{\omega-1}$.
Loop insertion lemma.
We now argue that, when we have a sufficiently distant rare-frequent threshold $n'$, we can insert any arbitrary loop raised to the power $n'$ without changing the category element to which a path evaluates:
▶ Lemma 7.4 (Loop insertion lemma). Let $p$ be a path, and assume that $n'$ is a rare-frequent threshold for $p$ which is $|S|$-distant and a multiple of $\omega$. Let $p = rt$ be a decomposition of $p$ (with $r$ or $t$ potentially empty), let $o$ be the object between $r$ and $t$ (i.e., the final object of $r$, or the initial object of $t$ if $r$ is empty), and let $p'$ be a loop over $o$ that only uses frequent arrows. Then $p \equiv r (p')^{n'} t$ (note that they are also $n'$-equivalent by construction).
We only sketch the proof of this result, which is proved in the appendix.
Proof sketch.
Intuitively, we show that, at any object $o$ along a path, any idempotent $x$ that can be achieved as a loop of frequent arrows of the form $q_x^{n'}$ on $o$ can be added without affecting equivalence. This clearly preserves $n'$-equivalence by definition, so we must only argue that it also preserves equivalence. This claim implies Lemma 7.4, as spawning the loop $(p')^{n'}$ when indicated is then absorbed by one of these idempotents.
To establish the claim, we first show that, for any prefix $u$ of frequent arrows, we can spawn a loop starting by $u$ with some arbitrary return path. This is by induction on the length of $u$. The base case of a prefix $a$ of length 1 is shown by the pigeonhole principle: we consider all occurrences of $a$, and as it is frequent and the threshold is $|S|$-distant we can apply the pigeonhole principle to two occurrences of $a$ separated only by frequent arrows, which we then iterate to form the loop. The induction step is shown by spawning a loop with a shorter prefix, then spawning the missing arrow of the prefix within the first loop, and recombining. Thanks to this, all necessary loops can be spawned, proving the claim. ◀
Prefix substitution lemma.
We finally show that we can freely change any prefix of frequent arrows of a path, up to inserting a loop of frequent arrows elsewhere:
▶ Lemma 7.5. Let $p = xry$ be a path, and assume that $n'$ is a rare-frequent threshold for $p$ which is $|S|$-distant and a multiple of $\omega$. Let $x'$ be a path coterminal with $x$. Assume that every arrow in $x$ and in $x'$ is frequent. Assume that some object $o$ in the SCC of frequent arrows of the initial object of $r$ occurs again in $y$, say as the intermediate object of $y = y_1 y_2$. Then there exists $y' = y_1 y'' y_2$ for some loop $y''$ consisting only of frequent arrows such that $p \equiv x' r y'$ and such that $p \sim_{n'} x' r y'$.
This claim is shown in the appendix: it uses Claim 6.4 to argue that frequent arrows are a union of SCCs, and crucially relies on Claim 7.3.
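The property borrowed from Claim 6.4, namely that the frequent arrows form a union of SCCs, can be tested on small examples with the Python sketch below. This is an illustration only: the representation of arrows as `(source, label, target)` triples and all function names are assumptions of ours, and the check is the naive one (every frequent arrow from $u$ to $v$ must admit a return path from $v$ to $u$ made of frequent arrows).

```python
from collections import Counter

def frequent_arrows(path, n_prime):
    """Arrows occurring more than n_prime times in the path."""
    counts = Counter(path)
    return {arrow for arrow, c in counts.items() if c > n_prime}

def reachable(edges, start):
    """Objects reachable from `start` along the given (source, target) edges."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for (src, dst) in edges:
            if src == u and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def is_union_of_sccs(path, n_prime):
    """True if each frequent arrow u -> v has a return path from v to u
    that only uses frequent arrows."""
    edges = {(src, dst) for (src, _label, dst) in frequent_arrows(path, n_prime)}
    return all(u in reachable(edges, v) for (u, v) in edges)

# Hypothetical examples over two objects 0 and 1.
a = (0, "a", 1)
b = (1, "b", 0)
good_path = [a, b] * 5          # a and b both occur 5 times

# Here a is frequent, but each return arrow b0, ..., b4 occurs only once.
bad_path = []
for i in range(5):
    bad_path += [a, (1, f"b{i}", 0)]
```

With threshold 2, `is_union_of_sccs(good_path, 2)` holds while `is_union_of_sccs(bad_path, 2)` fails. The second path contains five occurrences of rare arrows, which is too many for the threshold to be 1-distant as we read the condition used in the proof of Claim 6.4, so it does not contradict the claim.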
LZG ⊆ ZG ∗ D: Claim 5.6
We are now ready to prove the second direction of Theorem 1.1, namely Claim 5.6. We have fixed the semigroup $S$ in LZG, its category of idempotents $S_E$, and $B := \mathrm{Arrows}(S_E)$. We take the $n$ given by Lemma 6.2. Our goal is to show that the $n$-congruence on $B^*$ is compatible with $S_E$. To do so, let $u_1$ and $u_2$ be two coterminal paths that are $n$-equivalent. We must show that $u_1 \equiv u_2$, i.e., the two paths $u_1$ and $u_2$ evaluate to the same category element in $S_E$.
To do so, we use the guarantee on $n$ ensured by Lemma 6.2 to pick an $n'$ with which to work. Considering the paths $u_1$ and $u_2$, the lemma ensures that there is a divisor $n'$ of $n$ which is a multiple of $\omega \times (|S|+1)$ and is an $|S|$-distant rare-frequent threshold for $u_1$ and for $u_2$. As $n'$ divides $n$ and $u_1$ and $u_2$ are $n$-equivalent, by Claim 3.2 we know that they are also $n'$-equivalent. So we will only consider the $n'$-congruence, denoted $\sim$, from now on. We know that $u_1$ and $u_2$ are $n'$-equivalent, that $n'$ is a multiple of $\omega \times (|S|+1)$, and that $n'$ is an $|S|$-distant rare-frequent threshold for $u_1$ and for $u_2$. Recall that, following Definition 3.1, now that we have fixed the threshold $n'$, we call an arrow of $B$ rare in $u_1$ (or in $u_2$) if it occurs $\le n'$ times, and frequent otherwise.
We will show that $u_1 \equiv u_2$, and in fact will show that $p_1 \equiv p_2$ for pairs of paths $p_1, p_2$ more generally. More specifically, we establish the following claim by finite induction on $r$: let $p_1$ and $p_2$ be two coterminal paths that are $n'$-equivalent and which contain $r$ rare arrows each. Then $p_1 \equiv p_2$. Showing this for all $r$ establishes in particular that $u_1 \equiv u_2$.
Base case: all arrows in $p_1$ and $p_2$ are frequent. The base case of the induction is:
▶ Claim 8.1.
Let $p_1$ and $p_2$ be two coterminal paths that are $n'$-equivalent and which contain no rare arrows. Then $p_1 \equiv p_2$.
The claim is shown in Appendix F.1, so we only sketch the proof here. We consider the multigraph $G$ of all arrows occurring in $p_1$ and $p_2$ (note that these arrows must be the same for $p_1$ and $p_2$). We prove that $p_1 \equiv p_2$ by another induction, this time on the number of frequent arrows, i.e., the number of edges of the multigraph $G$. Formally, we show the following by finite induction on the integer $\eta$: let $q_1$ and $q_2$ be two coterminal paths that are $n'$-equivalent, where all rare arrows have 0 occurrences, and where there are $\le \eta$ different frequent arrows. Then $q_1 \equiv q_2$. Showing this for all $\eta$ establishes in particular that $p_1 \equiv p_2$.
The base case is $\eta = 0$, in which case $q_1$ and $q_2$ must be empty and the claim is trivial, so what matters is the induction step on $G$. Assume that the claim is true for any $q_1$ and $q_2$ such that $G$ has $\le \eta$ edges. Consider $q_1$ and $q_2$ such that $G$ has $\eta + 1$ edges. Recall that $G$ is strongly connected: indeed, we know, as all arrows are frequent, that $G$ is a union of SCCs, and $p_1$ (or $p_2$) witnesses that $G$ is connected, so we know that $G$ is strongly connected. The induction case is shown using a decomposition result on strongly connected multigraphs following the notion of ear decomposition:
▶ Lemma 8.2.
Let $G$ be a strongly connected nonempty directed multigraph. We have:
- $G$ is a simple cycle; or
- $G$ contains a simple cycle $u_1 \to \cdots \to u_n \to u_1$ with $n \ge 1$, where all vertices $u_1, \ldots, u_n$ are pairwise distinct, such that all intermediate vertices $u_1, \ldots, u_{n-1}$ only occur in the edges of the cycle, and such that the removal of the cycle leaves the graph strongly connected (note that the case $n = 1$ corresponds to the removal of a self-loop); or
- $G$ contains a simple path $u_1 \to \cdots \to u_n$ with $n \ge 2$ where all vertices are pairwise distinct, such that all intermediate vertices $u_2, \ldots, u_{n-1}$ only occur in the edges of the path, and such that the removal of the path leaves the graph strongly connected (note that the case $n = 2$ corresponds to the removal of a single edge).
This is a known result [7], but we give a self-contained proof in Appendix F.1. We use it to distinguish three cases in the induction step of the induction on $G$, which we now sketch.
The first case is when $G$ is a simple cycle. In this case $n'$-equivalence ensures that the cycle is taken by $q_1$ and $q_2$ some number of times with the same remainder modulo $n'$, so they evaluate to the same element because $n'$ is a multiple of $\omega$.
The second case is when $G$ contains a simple cycle only connected to a single object. This time, we argue as in the previous case that the number of occurrences of the cycle must have the same remainder, and we can use Corollary 3.7 to merge all the occurrences together. However, to eliminate them, we need to use Lemma 7.5 to modify $q_1$ and $q_2$ to have the same prefix (up to and including the cycle occurrences), while preserving equivalence. This allows us to consider the rest of the paths (which contains no occurrence of the cycle), apply the induction hypothesis to them, and conclude by compositionality. A technicality is that we must ensure that removing the common prefix does not make some arrows insufficiently frequent relative to the distant rare-frequent threshold. We avoid this using Lemma 7.4 to spawn sufficiently many copies of a suitable loop.
The third case is when $G$ contains a simple path $\pi$ connecting two objects. The reasoning is similar, but we also use Lemma 7.4 to spawn a loop involving a return path for $\pi$ and a path that is parallel to $\pi$ (i.e., does not share any arrows with it). The return path in this loop can then be combined with $\pi$ to form a loop, which we handle like in the previous case.
Induction case: some arrows are rare.
Let us now show the induction step for the outer induction, namely, the one on the number of occurrences of rare arrows. We assume the claim of the outer induction for $r \in \mathbb{N}$. Consider two paths $p_1$ and $p_2$ that are $n'$-equivalent and that contain $r + 1$ rare letter occurrences. Let us partition them as $p_1 = q_1 a s_1$ and $p_2 = q_2 a s_2$ where $q_1$ and $q_2$ consist only of frequent arrows, and $a$ is the first rare arrow of $p_1$ and $p_2$ (note that $n'$-equivalence implies that the first rare arrow is the same in both paths). In this case, $q_1$ and $q_2$ are two coterminal paths consisting only of frequent arrows (or they are empty), and $s_1$ and $s_2$ are two coterminal paths (possibly empty) with $r$ rare letter occurrences.
The full proof is given in the appendix; we only sketch it. By Claim 6.4, either the SCC at the origin of $a$ occurs again in the rest of the path, or it does not. In the first case, we argue with Lemma 7.5 that the prefix $q_1$ can be substituted for $q_2$ without affecting equivalence, reason on the rest of the path by induction hypothesis, and conclude by compositionality. In the second case, $n'$-equivalence intuitively ensures that $q_1$ and $q_2$, and $s_1$ and $s_2$, must each be $n'$-equivalent, so we can apply the induction hypothesis to each of them. This concludes the induction step of the outer induction.
Concluding the proof.
We have established by induction that $p_1$ and $p_2$ evaluate to the same category element in all cases. This implies that $n$-equivalence for our choice of $n$ is compatible with $S_E$, so by Corollary 5.5 we know that $L$ is in ZG ∗ D. Thus, $L \in \mathbf{LZG}$ implies that $L \in \mathbf{ZG} * \mathbf{D}$. We have therefore established the locality result LZG = ZG ∗ D, concluding the proof of Claim 5.6 and hence of Theorem 1.1.
In this paper, we have given a characterization of the languages of ZG, and proved that the variety ZG is local. The methodology seems to be adaptable enough to tackle ZG ∩ A = MNil as well, but this would require a careful analysis of the proofs that we defer to future work.
The case of ZE is more complicated. As proved by Almeida [1], we have ZE = G ∨ Com, that is, ZE is the variety of monoids generated by both groups and commutative languages. Now, commutative languages are a specific example of a non-local variety, while G is a local variety. This being said, we do not know of general results showing the preservation or non-preservation of locality under such operators. Interestingly, however, one can check by computation that the counter-example language in LCom but not in Com ∗ D (illustrating that Com is not local), namely $e^* a f^* b e^* c f^*$, is in LZG.
We hope that extending our approach to a study of locality for centrally defined varieties in general could lead to such general results on the interplay of join operations and of the locality or non-locality for arbitrary varieties.
References
[1] Jorge Almeida. Finite semigroups and universal algebra, volume 3. World Scientific, 1994.
[2] Jorge Almeida. A syntactical proof of locality of DA. Int. J. Algebra Comput., 6(2):165–178, 1996. doi:10.1142/S021819679600009X.
[3] Antoine Amarilli, Louis Jachiet, and Charles Paperman. Dynamic membership for regular languages. Available online: https://a3nm.net/publications/amarilli2021dynamic.pdf. Also submitted to ICALP'21, 2021.
[4] K. Auinger. Join decompositions of pseudovarieties involving semigroups with commuting idempotents. Journal of Pure and Applied Algebra, 170(2):115–129, 2002.
[5] Karl Auinger. Semigroups with central idempotents. In Algorithmic problems in groups and semigroups, pages 25–33. Springer, 2000.
[6] Karl Auinger. On the decidability of membership in the global of a monoid pseudovariety. Int. J. Algebra Comput., 20(2):181–188, 2010. doi:10.1142/S0218196710005571.
[7] Jørgen Bang-Jensen and Gregory Z Gutin. Digraphs: theory, algorithms and applications. Springer Science & Business Media, 2008.
[8] J.A. Brzozowski and Imre Simon. Characterizations of locally testable events. Discrete Mathematics, 4(3):243–271, 1973. doi:10.1016/s0012-365x(73)80005-6.
[9] Luc Dartois and Charles Paperman. Alternation hierarchies of first order logic with regular predicates. In FCT, 2015.
[10] Samuel Eilenberg. Automata, languages, and machines. Vol. B. Academic Press [Harcourt Brace Jovanovich, Publishers], New York-London, 1976. With two chapters ("Depth decomposition theorem" and "Complexity of semigroups and morphisms") by Bret Tilson, Pure and Applied Mathematics, Vol. 59.
[11] Nathan Grosshans, Pierre McKenzie, and Luc Segoufin. The power of programs over monoids in DA. In MFCS, 2017.
[12] Robert Knast. A semigroup characterization of dot-depth one languages. RAIRO Theor. Informatics Appl., 17(4):321–330, 1983. doi:10.1051/ita/1983170403211.
[13] Robert Knast. Some theorems on graph congruences. RAIRO Theor. Informatics Appl., 17(4):331–342, 1983. doi:10.1051/ita/1983170403311.
[14] Manfred Kufleitner and Alexander Lauser. Quantifier alternation in two-variable first-order logic with successor is decidable. In STACS, 2013.
[15] Robert McNaughton. Algebraic decision procedures for local testability. Mathematical Systems Theory, 8(1):60–76, 1974. doi:10.1007/bf01761708.
[16] J.-E. Pin. Varieties of formal languages. Foundations of Computer Science. Plenum Publishing Corp., New York, 1986. With a preface by M.-P. Schützenberger, Translated from the French by A. Howie. doi:10.1007/978-1-4613-2215-3.
[17] Thomas Place and Luc Segoufin. Decidable characterization of FO2(<, +1) and locality of DA. abs/1606.03217, 2016.
[18] Gudmund Skovbjerg Frandsen, Peter Bro Miltersen, and Sven Skyum. Dynamic word problems. JACM, 44(2):257–271, 1997.
[19] Benjamin Steinberg. A modern approach to some results of Stiffler. In Semigroups and Languages. World Scientific, 2004. doi:10.1142/9789812702616_0013.
[20] Price Stiffler. Chapter 1. extension of the fundamental theorem of finite semigroups. Advances in Mathematics, 11(2):159–209, 1973. doi:10.1016/0001-8708(73)90007-8.
[21] Howard Straubing. The variety generated by finite nilpotent monoids. Semigroup Forum, 24(1):25–38, 1982. doi:10.1007/bf02572753.
[22] Howard Straubing. Finite semigroup varieties of the form V*D. Journal of Pure and Applied Algebra, 36:53–94, 1985.
[23] Howard Straubing. Finite automata, formal logic, and circuit complexity. Birkhäuser Boston Inc., Boston, MA, 1994.
[24] Howard Straubing. A new proof of the locality of R. International Journal of Algebra and Computation, 25(01n02):293–300, 2015. doi:10.1142/s0218196715400111.
[25] Denis Thérien and Alex Weiss. Graph congruences and wreath products. J. Pure Appl. Algebra, 36(2):205–215, 1985. doi:10.1016/0022-4049(85)90071-4.
[26] Denis Thérien and Thomas Wilke. Over words, two variables are as powerful as one quantifier alternation. In STOC, 1998.
[27] Denis Thérien and Thomas Wilke. Temporal logic and semidirect products: An effective characterization of the until hierarchy. SIAM J. Comput., 31(3):777–798, 2001. doi:10.1137/S0097539797322772.
[28] Bret Tilson. Categories as algebra: an essential ingredient in the theory of monoids. J. Pure Appl. Algebra, 48(1-2):83–198, 1987. doi:10.1016/0022-4049(87)90108-3.
[29] Jean Éric Pin. Mathematical foundations of automata theory. 2019.
A Proofs for Section 2 (Preliminaries)
▶ Claim 2.1. For any monoid $M$ in ZG, for $x, y \in M$, and $k \in \mathbb{Z}$, we have: $x^{\omega+k} y = y x^{\omega+k}$.
Proof. This is simply because we can always write $x^{\omega+k}$ as $(x^{\omega+k})^{\omega+1}$, because the latter is equal to $x^{(\omega+k) \times (\omega+1)}$ which is indeed equal to $x^{\omega+k}$. Thus, by setting $x' := x^{\omega+k}$ and applying the equation of ZG to $x'$ and $y$, we conclude. ◀
B Proofs for Section 3 (Characterizations of ZG)
B.1 Miscellaneous Results
▶ Claim 3.2.
For any alphabet $\Sigma$, for any $n > 0$, for any $m > 0$, if $m$ is a multiple of $n$ then the $m$-congruence refines the $n$-congruence.
Proof. The claim is trivial for $m = n$, so we assume $m > n$. As $m > n$, if two words $u$ and $v$ have the same rare alphabet for $m$, then they have the same rare alphabet for $n$, because the number of occurrences of all rare letters for $m$ is the same, so the same ones are also rare for $n$. Furthermore, if they have the same rare subword for $m$, the restriction of this same rare subword to the rare letters for $n$ yields the same word. Last, we show that the numbers of occurrences modulo $n$ are the same. For the letters that were frequent for $m$, this is the case because their number of occurrences is congruent modulo $m$, hence modulo $n$ because $n$ divides $m$. For the letters that were not frequent for $m$, this is because their number of occurrences has to be the same because the rare subwords for $m$ were the same. ◀
▶ Claim 3.3.
For any alphabet $\Sigma$ and $n > 0$, the $n$-congruence over $\Sigma^*$ is a ZG-congruence.
Proof. Let $E$ be an equivalence class of the $n$-congruence, which we see as a language of $\Sigma^*$, and let us show that $E$ is a language of ZG. Let $\Sigma = A \sqcup B$ be the partition of $\Sigma$ into rare and frequent letters for the class $E$, let $u$ be the word of $A^*$ associated to the class $E$, and let $\vec{k}$ be the $|B|$-tuple describing the modulo values for $E$. We know that the singleton language $\{u\}$ is a language of ZG, because it is finite. Hence, the language $U = B^* u_1 \cdots B^* u_n B^*$ is also in ZG, because it is the inverse image of $\{u\}$ by the morphism that erases the letters of $B$ and is the identity on $A$. Similarly, the language $C$ of words of $B^*$ where the modulo values of each letter are as prescribed by $\vec{k}$ and where every letter occurs at least $n$ times is a language of ZG, because it is commutative. For the same reason, the language $C'$ of words of $\Sigma^*$ whose restriction to $B$ is in $C$ is also a language of ZG, because it is the inverse image of $C$ by the morphism that erases the letters of $A$ and is the identity on $B$. Now, we remark that $E = C' \cap U$, so $E$ is in ZG, concluding the proof. ◀
B.2 Proofs of the Characterizations (Consequences of Theorem 3.4)
▶ Corollary 3.5.
Any ZG language $L$ can be expressed as a finite union of languages of the form $B^* a_1 B^* a_2 \cdots a_k B^* \cap K$ where $\{a_1, \ldots, a_k\} \cap B = \emptyset$ and $K$ is a regular commutative language.
Proof. Fix a language $L$ in ZG, and consider the syntactic congruence $\sim$ of $L$: it is a ZG-congruence. By Theorem 3.4, there exists $n \in \mathbb{N}$ such that $\sim$ is refined by an $n$-congruence $\sim'$. Now, by definition of the syntactic congruence, the set of words of $\Sigma^*$ that are in $L$ is a union of equivalence classes of $\sim$, hence of $\sim'$. This means that $L$ can be expressed as the union of the languages corresponding to these classes.
Now, an equivalence class of the $n$-congruence $\sim'$ can be expressed as the shuffle of two languages: the singleton language containing the rare word defining the class, and the language that imposes that all frequent letters are indeed frequent (so the rare alphabet is as required) and that the modulo of their number of occurrences is as specified. The second language is commutative, and the disjointness of rare and frequent letters guarantees that the shuffle is indeed disjoint.
Thus, we have shown that $L$ is a union of disjoint shuffles of a singleton language and a regular commutative language. The form stated in the corollary is equivalent, i.e., it is the shuffle of the singleton language $\{a_1 \cdots a_k\}$ and of the commutative language obtained by restricting $K$ to the subalphabet $B$. ◀
▶ Corollary 3.7.
For any monoid $M$ in ZG, letting $n \ge (|M| + 1) \cdot \omega$, for any element $m$ of $M$ and elements $m_1, \ldots, m_n$ of $M$, we have $m \cdot m_1 \cdot m \cdot m_2 \cdot m \cdots m \cdot m_{n-1} \cdot m \cdot m_n \cdot m = m^{n+1} \cdot m_1 \cdots m_n$.
Proof. We consider the free monoid $M^*$. Let $\eta : M^* \to M$ be the onto morphism defined by $\eta(m_1 \cdot m_2) = \eta(m_1) \cdot \eta(m_2)$. Let $\sim$ be the congruence it induces over $M^*$, i.e., for $u, v \in M^*$, we have $u \sim v$ if $\eta(u) = \eta(v)$. Remark that, as $M$ is in ZG, the congruence $\sim$ is a ZG-congruence by definition. Hence, by Theorem 3.4, $\sim$ is refined by an $n$-congruence where $n = (|M| + 1) \omega$. Now, consider the two words in the equation to be shown above: they are words of $M^*$. As the letter $m$ occurs $n + 1$ times in both words, it is a frequent letter, so we know that the two words are indeed $n$-congruent, which concludes. ◀
B.3 Proving Lemma 3.8
We show the lemma by establishing two claims:
▶ Claim B.1. We have: $(xy)^\omega = x^\omega y^\omega (xy)^\omega$.
Proof. We have $(xy)^\omega = xy (xy)^{\omega-1}$: as $(xy)^{\omega-1}$ is central, the right-hand side is equal to $x (xy)^{\omega-1} y$. By injecting an $(xy)^\omega$ in the latter, we obtain: $(xy)^\omega = x (xy)^{\omega-1} (xy)^\omega y$. Applying this equality $\omega$ times gives: $(xy)^\omega = (x (xy)^{\omega-1})^\omega (xy)^\omega y^\omega$. Now, note that we can expand $(x (xy)^{\omega-1})^\omega$, commuting the $(xy)^{\omega-1}$ to regroup the $x$ into $x^\omega$ and regroup the $(xy)^{\omega-1}$ into $((xy)^{\omega-1})^\omega$ which is equal to $(xy)^\omega$, so that the first factor of the right-hand side is equal to $x^\omega (xy)^\omega$. Thus, by commuting, we obtain: $(xy)^\omega = x^\omega y^\omega (xy)^\omega$, the desired result. ◀
▶ Claim B.2. We have: $x^\omega y^\omega = x^\omega y^\omega (xy)^\omega$.
Proof. We have $x^\omega y^\omega = x^{\omega-1} x y^{\omega-1} y$, so by the equation of ZG we get: $x^\omega y^\omega = x^{\omega-1} y^{\omega-1} xy$. Now, we have $x^{\omega-1} y^{\omega-1} = x^{\omega-1} x^\omega y^{\omega-1} y^\omega$, and by the equation of ZG we have: $x^{\omega-1} y^{\omega-1} = x^{\omega-1} y^{\omega-1} x^\omega y^\omega$. Inserting the second equality in the first, we have: $x^\omega y^\omega = x^{\omega-1} y^{\omega-1} x^\omega y^\omega xy$. Now, applying this equality $\omega$ times gives $x^\omega y^\omega = (x^{\omega-1} y^{\omega-1})^\omega x^\omega y^\omega (xy)^\omega$. As the first factor of the right-hand side is equal to $x^\omega y^\omega$, we get $x^\omega y^\omega = x^\omega y^\omega (xy)^\omega$, the desired result. ◀
Putting Claims B.1 and B.2 together immediately establishes Lemma 3.8.
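As a sanity check of the identity $(xy)^\omega = x^\omega y^\omega$ obtained by combining Claims B.1 and B.2, the Python sketch below verifies it by brute force on one small non-commutative monoid satisfying the ZG equation. The choice of example is ours and is not taken from the paper: we use the five-element monoid with elements $1, a, b, ab, 0$ in which $a \cdot b = ab$ and every other product of non-identity elements is $0$ (the syntactic monoid of the finite language $\{ab\}$). All function names are hypothetical.

```python
from itertools import product

ELEMENTS = ["1", "a", "b", "ab", "0"]

def mul(x, y):
    """Product in the example monoid: 1 is neutral, a*b = ab, all else is 0."""
    if x == "1":
        return y
    if y == "1":
        return x
    if (x, y) == ("a", "b"):
        return "ab"
    return "0"

def power(x, k):
    """k-th power of x, with power(x, 0) being the identity."""
    result = "1"
    for _ in range(k):
        result = mul(result, x)
    return result

def idempotent_power(elements):
    """Smallest k >= 1 such that x^k is idempotent for every element x."""
    k = 1
    while not all(mul(power(x, k), power(x, k)) == power(x, k) for x in elements):
        k += 1
    return k

def satisfies_zg(elements, omega):
    """Check the equation x^(omega+1) y = y x^(omega+1) on all pairs."""
    return all(
        mul(power(x, omega + 1), y) == mul(y, power(x, omega + 1))
        for x, y in product(elements, repeat=2)
    )

def satisfies_lemma_3_8(elements, omega):
    """Check (xy)^omega = x^omega y^omega on all pairs."""
    return all(
        power(mul(x, y), omega) == mul(power(x, omega), power(y, omega))
        for x, y in product(elements, repeat=2)
    )

omega = idempotent_power(ELEMENTS)
```

On this example both `satisfies_zg(ELEMENTS, omega)` and `satisfies_lemma_3_8(ELEMENTS, omega)` hold, even though the monoid is not commutative (`mul("a", "b")` and `mul("b", "a")` differ). A single example of course proves nothing in general; it only illustrates the statement.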
B.4 Concluding the Proof of Theorem 3.4
To conclude the proof of Theorem 3.4, we will now show a kind of "normal form" for ZG-congruences, by arguing that any word can be rewritten to a word where frequent letters are moved to the end of the word, without breaking equivalence for the ZG-congruence. This relies on Lemma 3.8 and allows us to get to the notion of $n$-equivalence. Specifically:
▶ Claim B.3. Let $\sim$ be a ZG-congruence on $\Sigma^*$. Let $n := (|M| + 1) \cdot \omega$ where $M$ is the monoid associated to $\sim$. Then, for all $w \in \Sigma^*$, for every letter $a \in \Sigma$ which is frequent in $w$ (i.e., $|w|_a > n$), writing $w'$ the restriction of $w$ to $\Sigma \setminus \{a\}$, and writing $w'' := w' a^{|w|_a}$, we have: $w \sim w''$.
Proof. Define $n$ as in the claim statement, and let $\mu : \Sigma^* \to M = \Sigma^* / \sim$ be the morphism associated to $\sim$. Remark that by definition, for any words $u, v$, we have $u \sim v$ iff $\mu(u) = \mu(v)$. Let us take an arbitrary $w$ and $a \in \Sigma$ such that $a$ is frequent in $w$. We can therefore write $w = w_1 a w_2 a \cdots w_m a w_{m+1}$ with $m = |w|_a > n > |M|$. Furthermore, letting $x_l = \mu(w_1 a \cdots w_l a)$ for each $1 \le l \le m$, as $m > |M|$ we know by the pigeonhole principle that there exist $1 \le i < j \le m$ such that $x_i = x_j$. But we have $x_j = x_i z \mu(a)$ where $z = \mu(w_{i+1} a \cdots w_j)$. By applying the equation $\omega$ times, we have that $x_i = x_i (z \mu(a))^\omega$.
Now, by Lemma 3.8, we have $(z \mu(a))^\omega = z^\omega \mu(a)^\omega$. This is equal to $z^\omega \mu(a)^\omega \mu(a)^\omega$, and by now applying the lemma in reverse we conclude that $(z \mu(a))^\omega = (z \mu(a))^\omega \mu(a)^\omega$. Finally, we obtain $x_i = x_i (z \mu(a))^\omega = x_i (z \mu(a))^\omega \mu(a)^\omega = x_i \mu(a)^\omega$.
Now, the equation of ZG ensures that $\mu(a)^\omega$ is central, so we can commute it in $\mu(w)$ and absorb all occurrences of $\mu(a)$ in $\mu(w)$, then move it at the end, while keeping the same $\mu$-image. Formally, from $x_i = x_i \mu(a)^\omega$, we have $\mu(w) = \mu(w_1) \mu(a) \cdots \mu(w_i) \mu(a) \mu(a)^\omega \mu(w_{i+1}) \mu(a) \cdots \mu(w_m) \mu(a) \mu(w_{m+1})$, and we commute $\mu(a)^\omega$ to merge it with all $\mu(a)$ and then commute the resulting $\mu(a)^{\omega + |w|_a}$. As $|w|_a \ge \omega$, this $\mu$-image is the same as the one that we obtain from $w' a^{|w|_a}$, with $w'$ as defined in the statement of the claim. This establishes that $w \sim w''$ and concludes the proof. ◀
We can now conclude the proof of Theorem 3.4:
Proof of Theorem 3.4.
Let $\sim$ be a ZG-congruence on $\Sigma^*$, $M$ its associated monoid, and fix $n := (|M| + 1) \cdot \omega$ as in the theorem statement. Let $u$ and $v$ be two $n$-congruent words of $\Sigma^*$; we need to prove that they are indeed $\sim$-equivalent. Let $\Sigma' = \{a_1, \ldots, a_r\}$ be the subset of letters in $\Sigma$ that are frequent in $u$ (hence in $v$, as they are $n$-congruent). By successive applications of Claim B.3 for every frequent letter in $\Sigma'$, starting with $u$, we know that $u \sim u_{\le n} a_1^{|u|_{a_1}} \cdots a_r^{|u|_{a_r}}$. Likewise, we have $v \sim v_{\le n} a_1^{|v|_{a_1}} \cdots a_r^{|v|_{a_r}}$. Now, we know that $\omega$ divides $n$. Thus, as for any $1 \le i \le r$, the values $|u|_{a_i}$ and $|v|_{a_i}$ are greater than $n$ and congruent modulo $n$ (by definition of the $n$-congruence), we have $a_i^{|u|_{a_i}} \sim a_i^{|v|_{a_i}}$. We also know by definition of the $n$-congruence that $u_{\le n} = v_{\le n}$. All of this together establishes that $u \sim v$. Thus, the $n$-congruence indeed refines the $\sim$-congruence, concluding the proof. ◀
C Proofs for Section 5 (Straubing's Delay Theorem)
In this appendix, we give the self-contained proof of the easy direction of our main result, namely:
▶ Claim C.1. We have ZG ∗ D ⊆ LZG.
Proof. If $L$ is in ZG ∗ D, then by Theorem 5.4, there exists a ZG-congruence $\sim$ compatible with $S_E$. Let us now show that $L$ is in LZG by showing that, writing $S$ for the syntactic semigroup of $L$, for any idempotent $e$, the local monoid $eSe$ is in ZG. Let $e$ be an idempotent. By definition of $S_E$, the local monoid $eSe$ is isomorphic to the subset of arrows of $S_E$ going from $e$ to $e$, with their composition law. Let us denote this subset by $B_e$. Define $\sim_e$ to be the specialization of the relation $\sim$ to $B_e^*$. Remark that $N = B_e^* / \sim_e$ is a submonoid of $B^* / \sim$, and is hence in ZG because $B^* / \sim$ is, and ZG is a variety. Remark that since all words in $B_e^*$ are valid paths in $S_E$, the local monoid $H := eSe$ defines a congruence $\sim_H$ over $B_e^*$ where two paths are equivalent if they evaluate to the same monoid element. We know that $\sim$, hence $\sim_e$, refines this congruence $\sim_H$. Hence, $H$ is a quotient of $N$. Thus, $eSe$ is a quotient of $B_e^* / \sim_e$, which is a submonoid of a monoid in ZG, concluding the proof. ◀
D Proofs for Section 6 (Choosing the Congruence)
D.1 Proving Claim 6.4 and Claim 6.5
▶ Claim 6.4.
Fix $S$ and $S_E$ and $B$, let $w \in B^*$ be a path of $S_E$, and let $n' > 0$ be a 1-distant rare-frequent threshold of $w$. Then the set of frequent arrows of $w$ for $n'$ is a union of SCCs.
Proof. Consider $G$ the directed graph of Definition 6.3. Let us assume by way of contradiction that $G$ has a connected component which is not strongly connected. This means that there exists an edge $(u, v)$ of $G$ such that there is no path from $v$ to $u$ in $G$. Consider any frequent arrow $a$ in $S_E$ achieving the edge $(u, v)$ of $G$. As $a$ is frequent in $w$, we know that $a$ occurs $> n'$ times in $w$, hence $w$ contains $\ge n'$ paths from the final object $v$ of $a$ back to the initial object $u$ of $a$. As there is no path from $v$ to $u$ in $G$, each one of these paths must contain an arrow of $B$ which is rare in $w$.
Hence, the total number of rare arrow occurrences in $w$ is at least $n'$. But the 1-distant rare-frequent threshold condition imposes that the total number of rare arrow occurrences in $w$ is $\le n' - 1$, a contradiction. ◀
▶ Claim 6.5.
For any d > and k > and m ≥ , there exists n ≥ m such that for any d -tuple T of integers, there exists an n ′′ ≥ m such that, letting n ′ := n ′′ k , we have that n ′ divides n and that m P i ∈ F T i ≤ n ′ , where F = { i | T i ≤ n ′ } . Proof.
Let us take $n := k \times ((md)^{d+1})!$, which ensures $n \ge m$. This ensures that $n$ has as divisors $kmd, k(md)^2, \ldots, k(md)^{d+1}$: these are all our possible choices of values for $n'$. Now take any $d$-tuple $T$. For any possible choice of $n''$, the rare set is the subset of coordinates of $T$ having values $\le k n''$. By the pigeonhole principle, we can choose two of the divisors above, say $(md)^i$ and $(md)^j$ with $i < j$, having the same rare set, i.e., letting $F_i = \{i' \mid T_{i'} \le (md)^i k\}$ and $F_j = \{i' \mid T_{i'} \le (md)^j k\}$, we have $F_i = F_j$.
Let us set $n'' := (md)^j$, and let $n' := n'' k$. By construction, we have $n' \ge m$, and $n'$ divides $n$. Now, consider the sum $m \times \sum_{i' \in F_j} T_{i'}$. As $F_j = F_i$, we know that for every $i' \in F_j$, we have $T_{i'} \le (md)^i k$. Thus, the sum is at most $d$ times this value because $T$ is a $d$-tuple, and multiplying by $m$ we know that the sum $m \times \sum_{i' \in F_j} T_{i'}$ is at most $(md) \times (md)^i k$, hence it is $\le (md)^{i+1} k$, so it is $\le n'' k = n'$ for $n'' := (md)^j$ because $i < j$. This concludes the proof. ◀
D.2 Concluding the proof of Lemma 6.2
We are now ready to show Lemma 6.2:
Proof of Lemma 6.2.
Fix S , S E , and B , and the desired m . Take n to be as given byClaim 6.5 with d := 2 | B | , with k := ω × ( | S | + 1), with ω being the idempotent power of thesemigroup, and with m being the desired m plus 1. Now consider any pair u , u of S E . Let T be the d -tuple of the letter occurrences of u , followed by those of u . The statement ofClaim 6.5 ensures that there exists n ′ > n ′ k divides n and such that the totalnumber of rare arrows in u plus in u is ≤ ( n ′ k ) / ( | S | + 1). As n ′ ≥ | S | + 1 and k ≥ | S | + 1,we have n ′ k ≥ | S | ( | S | + 1), so we have ( n ′ k ) / ( | S | + 1) ≤ ( n ′ k ) / | S | −
1. Hence, the totalnumber of rare arrows u plus in u is ≤ ( n ′ k ) / | S | −
1. So the same is true of the rarearrows in u , and of the rare arrows in u . By contrast, the frequent arrows in u occur > n ′ k times, and the same is true of the frequent arrows in u . Hence, by Definition 6.1, n ′ k is an | S | -distant rare-frequent threshold for u and for u . Now, note that n ′ k is a multipleof k , hence of ω × ( | S | + 1). Thus, we have achieved all desired conditions and showed theresult. ◀ E Proofs for Section 7 (The Loop Insertion and Prefix SubstitutionLemmas)E.1 Proof of Basic Combinatorial Results ▶ Claim 7.1.
Let $x$ and $y$ be two coterminal loops of $S_E$, let $k \in \mathbb{Z}$, and let $\omega$ be an idempotent power of $S$. We have: $x^{\omega+k} y \equiv y x^{\omega+k}$.
Proof. Recall that the equation of ZG implies: $x^{\omega+k} y = y x^{\omega+k}$ (Claim 2.1). By definition of LZG, the local monoid $eSe$, for $e$ the initial and final object of $x$ and $y$, is in ZG. As the loops $x$ and $y$ evaluate in the category to some arrow having $e$ as starting and ending object, the previous equation then concludes the proof. ◀
▶ Claim 7.2.
For $x, x'$ two coterminal paths in $S_E$ and $y, y'$ two coterminal paths in $S_E$ such that $xy$ and $x'y'$ are valid loops, we have: $(xy)^\omega (x'y')^\omega \equiv (xy')^\omega (x'y)^\omega (xy)^\omega (x'y')^\omega$.
Proof. Let us first show that:
$(xy)^\omega (x'y')^\omega \equiv x'y (xy)^{\omega-1} (x'y')^{\omega-1} xy'$ (1)
To show Equation 1, first rewrite $(xy)^\omega$ as $x (yx)^{\omega-1} y$ and likewise for $(x'y')^\omega$, to get:
$(xy)^\omega (x'y')^\omega \equiv x (yx)^{\omega-1} y x' (y'x')^{\omega-1} y'$
Then, we use Claim 7.1 to move $(yx)^{\omega-1}$, so the above is equal to:
$xyx' (yx)^{\omega-1} (y'x')^{\omega-1} y'$
We again rewrite $(yx)^{\omega-1}$ to $y (xy)^{\omega-2} x$, yielding:
$xyx'y (xy)^{\omega-2} x (y'x')^{\omega-1} y'$
We again use Claim 7.1 to move $(xy)^{\omega-2}$, merge it with the prefix $xy$, and move it back to its place, yielding:
$x'y (xy)^{\omega-1} x (y'x')^{\omega-1} y'$
We rewrite $(y'x')^{\omega-1}$ to $y' (x'y')^{\omega-2} x'$, yielding:
$x'y (xy)^{\omega-1} xy' (x'y')^{\omega-2} x'y'$
Again by Claim 7.1, we can merge $(x'y')^{\omega-2}$ with $x'y'$ and move it to finally get:
$x'y (xy)^{\omega-1} (x'y')^{\omega-1} xy'$
This establishes Equation 1.
Now, as $(xy)^\omega (x'y')^\omega \equiv ((xy)^\omega (x'y')^\omega)^\omega$ (it is a product of commuting idempotents), we can apply Equation 1 to each of the $\omega$ factors of the right-hand side and get:
$(xy)^\omega (x'y')^\omega \equiv (x'y)^\omega (xy)^\omega (x'y')^\omega (xy')^\omega$ (2)
As these elements commute (thanks to Claim 7.1), we have shown the desired equality. ◀
▶ Claim 7.3.
For $x, x'$ two coterminal paths in $S_E$ and $y, y'$ two coterminal paths in $S_E$ such that $xy$ and $x'y'$ are valid loops, and for any path $t$ coterminal with $y$, the following equation holds: $x t (xy)^\omega (x'y')^\omega \equiv x' t (xy)^\omega x y' (x'y')^{\omega-1}$.
Proof. We apply Claim 7.2 to show the following equality on the left-hand side:
$xt (xy)^\omega (x'y')^\omega \equiv xt (xy')^\omega (x'y)^\omega (xy)^\omega (x'y')^\omega$
By commutation of $(x'y)^\omega$ thanks to Claim 7.1, the right-hand side is equal to:
$(x'y)^\omega xt (xy')^\omega (x'y')^\omega (xy)^\omega$
By expanding $(x'y)^\omega = x' (yx')^{\omega-1} y$, we get:
$x' (yx')^{\omega-1} y xt (xy')^\omega (x'y')^\omega (xy)^\omega$
By commutation of $(xy)^\omega$ and expanding it to $x (yx)^{\omega-1} y$, we get:
$x' (yx')^{\omega-1} y x (yx)^{\omega-1} y xt (xy')^\omega (x'y')^\omega$
Combining $(yx)^{\omega-1}$ with what precedes and follows, we get:
$x' (yx')^{\omega-1} (yx)^{\omega+1} t (xy')^\omega (x'y')^\omega$
By expanding $(x'y')^\omega = x' (y'x')^{\omega-1} y'$, and commuting $(yx')^{\omega-1}$ and $(yx)^{\omega+1}$, we get:
$x't (xy')^\omega x' (yx')^{\omega-1} (yx)^{\omega+1} (y'x')^{\omega-1} y'$
Now, we have $x' (yx')^{\omega-1} = (x'y)^{\omega-1} x'$, so we get:
$x't (xy')^\omega (x'y)^{\omega-1} x' (yx)^{\omega+1} (y'x')^{\omega-1} y'$
Commuting $(y'x')^{\omega-1}$ and doing a similar transformation, we get:
$x't (xy')^\omega (x'y)^{\omega-1} (x'y')^{\omega-1} x' (yx)^{\omega+1} y'$
Now, expanding $(yx)^{\omega+1}$, we get:
$x't (xy')^\omega (x'y)^{\omega-1} (x'y')^{\omega-1} x'y (xy)^\omega xy'$
Commuting $(x'y)^{\omega-1}$ and merging it with $x'y$, we finally get:
$x't (xy')^\omega (x'y')^{\omega-1} (x'y)^\omega (xy)^\omega xy'$
Note that $(x'y')^{\omega-1} \equiv (x'y')^\omega (x'y')^{\omega-1}$, so applying commutation we get:
$x't (xy')^\omega (x'y')^\omega (x'y)^\omega (xy)^\omega (x'y')^{\omega-1} xy'$
Now, applying Claim 7.2 in reverse (using commutation again), we obtain:
$x't (x'y')^\omega (xy)^\omega (x'y')^{\omega-1} xy'$
And commuting $(x'y')^\omega$ and merging it yields:
$x't (xy)^\omega (x'y')^{\omega-1} xy'$
A final commutation of $(x'y')^{\omega-1}$ yields the desired right-hand side, establishing the result. ◀
E.2 Proof of the Loop Insertion Lemma (Lemma 7.4)
▶ Lemma 7.4 (Loop insertion lemma).
Let $p$ be a path, and assume that $n'$ is a rare-frequent threshold for $p$ which is $|S|$-distant and a multiple of $\omega$. Let $p = rt$ be a decomposition of $p$ (with $r$ or $t$ potentially empty), let $o$ be the object between $r$ and $t$ (i.e., the final object of $r$, or the initial object of $t$ if $r$ is empty), and let $p'$ be a loop over $o$ that only uses frequent arrows. Then $p \equiv r (p')^{n'} t$ (note that they are also $n'$-equivalent by construction).

We first rephrase the claim to the following auxiliary result:

▶ Claim E.1.
Let $p = rt$ be a path, assume that $n'$ is a rare-frequent threshold for $p$ which is $|S|$-distant and a multiple of $\omega$, let $o$ be the object between $r$ and $t$, and let $X$ be the set of elements of the local monoid on $o$ that can be achieved as a loop $q_x^{n'}$ on $x$ with frequent arrows (i.e., the loop evaluates to an arrow $(o, x, o)$), noting that this implies that $x$ is idempotent as $n'$ is a multiple of $\omega$. Then letting $q := \left( \prod_{x \in X} q_x^{n'} \right)$, we have that $p \equiv rqt$. (Note that they are also $n'$-equivalent by construction.)

We explain why Claim E.1 implies the desired claim. Indeed, when taking $p = rt$ and taking $r (p')^{n'} t$, choosing for $o$ the object between $r$ and $t$, the sets $X$ defined in the auxiliary result will be the same for both paths (because $X$ only depends on $o$), so the rephrased claim implies that there is a loop $q$ such that $rt \equiv rqt$ and $r (p')^{n'} t \equiv r q (p')^{n'} t$. Now, as $(p')^{n'}$ must correspond to an arrow of the form $(o, x, o)$ for $x \in X$, it must be the same idempotent as one of the idempotents achieved by one of the loops in the definition of $q$, and as the local monoid is in ZG these idempotents commute and $q \equiv q (p')^{n'}$. Hence, we have $rqt \equiv r q (p')^{n'} t$. We know that $rqt \equiv rt$, and $r q (p')^{n'} t \equiv r (p')^{n'} t$. Thus we obtain $rt \equiv r (p')^{n'} t$, so Lemma 7.4 is proved once we have shown Claim E.1.

All that remains is to show Claim E.1. We do so by establishing a sequence of claims. We first prove a preliminary claim which uses the pigeonhole principle to insert a loop containing an arbitrary arrow $x$ in a word where $x$ occurs more than $|S|$ times:

▶ Claim E.2.
Let $p = rt$ be a path, let $o$ be the object between $r$ and $t$, let $x$ be an arrow starting at $o$, and assume that $x$ occurs $> |S|$ times in $p$. Then we have $p \equiv r (xu)^\omega t$ for some return path $u$ using only the arrows of $p$.

Proof. As $x$ occurs $k > |S|$ times in $p$, this provides a decomposition of $p$ in the shape:
\[ p = p_1 x p_2 x \cdots p_k x s. \]
By the pigeonhole principle, there exist $i < j$ such that $p_1 x \cdots p_i x \equiv p_1 x \cdots p_j x$. Hence, iterating, we obtain:
\[ p_1 x \cdots p_j x \equiv p_1 x \cdots p_i x (p_{i+1} x \cdots p_j x)^\omega. \]
Moving the $\omega$, we get:
\[ p_1 x \cdots p_j x \equiv p_1 x \cdots p_{i-1} x p_i (x p_{i+1} x \cdots p_j)^\omega x. \]
This proves that $p$ and $h (xu)^\omega g$ achieve the same category element by taking $u := p_{i+1} x \cdots p_j$, $h := p_1 x \cdots p_{i-1} x p_i$ and $g := p_{j+1} x \cdots p_k x s$. Note that the terminal object of $h$ is the same as the terminal object of $r$. Furthermore, either $h$ is a prefix of $r$ or the converse. Assume first that $h = rw$ for some path $w$. Then, $w$ and $(xu)^\omega$ belong to the local monoid of the terminal object of $r$, which is in ZG. Since idempotents commute with all elements, we have $w (xu)^\omega \equiv (xu)^\omega w$, which gives $r w (xu)^\omega g \equiv r (xu)^\omega w g \equiv r (xu)^\omega t$ since $wg = t$. The other case is symmetrical. This concludes the proof of Claim E.2. ◀

Let us extend this to a claim using the notion of distant rare-frequent threshold (this is where we use the fact that the threshold is distant):

▶ Claim E.3.
Let $p = rt$ be a path with an $|S|$-distant rare-frequent threshold $n'$, let $o$ be the object between $r$ and $t$, and let $x$ be any frequent arrow starting at $o$. Then we have $p \equiv r (xu)^\omega t$ for some return path $u$ using only frequent arrows of $p$.

Proof. As $x$ is a frequent arrow and $n'$ is $|S|$-distant, it occurs $> (\rho + 1) |S|$ times in $p$, where $\rho$ is the total number of rare arrows of $p$. Hence, writing $p = p_1 a_1 \cdots p_\rho a_\rho p_{\rho+1}$ where the $a_i$ are the rare arrows and the $p_i$ are paths of frequent arrows, there must be a $p_i$ containing $> |S|$ occurrences of $x$. Write $p_i = r' t'$ where the object between $r'$ and $t'$ is the initial object of $x$; such a decomposition must exist as $x$ occurs in $p_i$. Applying Claim E.2 to that decomposition, we have $p_i \equiv r' (xu)^\omega t'$ for some return path $u$ using only arrows of $p_i$, hence only frequent arrows. Now, as idempotents commute with all elements, similarly to the end of the proof of Claim E.2, we deduce that $p \equiv r (xu)^\omega t$. ◀

We then prove a generalization of the previous claim, going from a single frequent arrow to an arbitrary path of frequent arrows:

▶ Claim E.4.
Let $p = rt$ be a path with a rare-frequent threshold $n'$ which is $|S|$-distant and a multiple of $\omega$. Let $o$ be the object between $r$ and $t$, and let $h$ be a path of frequent arrows of $p$ starting at $o$ (i.e., at the final object of $r$). Then we have $p \equiv r (hg)^\omega t$ for some return path $g$ using only frequent arrows of $p$.

Proof.
We show the claim by induction on the length of $h$. The base case of the induction, with $h$ of length 0, is trivial with $g$ also having length 0.

For the inductive claim, write $h = h'a$. By induction hypothesis, there exists a $g'$ using only frequent arrows of $p$ such that:
\[ p \equiv r (h'g')^\omega t. \]
Furthermore, by applying Claim E.3 to the decomposition $r' = r h'$ and $t' = g' (h'g')^{\omega-1} t$ and with the frequent arrow $a$, we get a return path $u$ using only frequent arrows of $p$ such that:
\[ r (h'g')^\omega t \equiv r h' (au)^\omega g' (h'g')^{\omega-1} t \]
So, iterating the $\omega$ power, and combining with the preceding equation, we get:
\[ p \equiv r h' ((au)^\omega)^\omega g' (h'g')^{\omega-1} t. \]
Now, by commuting each of the $\omega - 1$ copies of $(au)^\omega$ other than the first with each loop going from after this $(au)^\omega$ to the position between an occurrence of $h'$ and $g'$, we get that:
\[ p \equiv r h' (au)^\omega g' (h' (au)^\omega g')^{\omega-1} t. \]
Note that the right-hand side is equal to $r (h' (au)^\omega g')^\omega t$. So we have shown:
\[ p \equiv r (h' (au)^\omega g')^\omega t. \]
This establishes the inductive claim by taking $g := u (au)^{\omega-1} g'$. ◀

We can extend this to a claim about inserting arbitrary loops:

▶ Claim E.5.
Let $p = rt$ be a path with an $|S|$-distant rare-frequent threshold $n'$, let $o$ be the object between $r$ and $t$, and let $q$ and $q'$ be loops on $o$ using only frequent arrows. We have that $r q t \equiv r q (q')^{n'} (q'')^{n'} t$ for some loop $q''$ on $o$ using only frequent arrows (note that the two are also $n'$-equivalent).

Proof.
We use Claim E.4 with $h := q'$. This gives us the existence of a return path $g$ using only frequent arrows, which is then also a loop on $o$, such that $r q t \equiv r q (q'g)^\omega t$. Now, applying Lemma 3.8 to the local monoid on object $o$, we know that this evaluates to the same category element as $r q (q')^\omega g^\omega t$. Hence, as $n'$ is a multiple of $\omega$, it evaluates to the same category element as $r q (q')^{n'} g^{n'} t$, which now preserves $n'$-equivalence and concludes the proof of Claim E.5. ◀

The only step left is to argue that Claim E.5 implies our rephrasing of the result that we wish to prove, Claim E.1. To do this, let $o$ be the terminal object of $r$ and let $X$ be the set of idempotents definable from frequent arrows. For each idempotent $x \in X$, we can choose some loop $q_x$ on frequent arrows that achieves it. Now, by successive applications of Claim E.5 to each $x \in X$, and using the fact that the $(q')^{n'}$ and $(q'')^{n'}$ commute by Claim 7.1 (remember that $n'$ is a multiple of $\omega$), we know that $p = rt$ is $n'$-equivalent to, and evaluates to the same category element as, the path $r \left( \prod_{x \in X} q_x^{n'} (q'_x)^{n'} \right) t$, for some $q'_x$ for each $q_x$ (corresponding to the $q''$ in that application of Claim E.5), which also only consists of frequent arrows. Now, since $(q'_x)^{n'}$ is a loop on $o$ consisting of frequent arrows, it also achieves an idempotent in the local monoid on $o$. As this monoid is in ZG, idempotents commute, and so these idempotents can all be combined with some idempotent $q_x^{n'}$ and absorbed by them. Thus, we get that $rt$ is $n'$-equivalent to, and evaluates to the same category element as, the path $r \left( \prod_{x \in X} q_x^{n'} \right) t$. This concludes the proof of Claim E.1, and thus establishes our desired result, Lemma 7.4.

E.3 Proof of the Prefix Substitution Lemma (Lemma 7.5)

▶ Lemma 7.5.
Let $p = xry$ be a path, and assume that $n'$ is a rare-frequent threshold for $p$ which is $|S|$-distant and a multiple of $\omega$. Let $x'$ be a path coterminal with $x$. Assume that every arrow in $x$ and in $x'$ is frequent. Assume that some object $o$ in the SCC of frequent arrows of the initial object of $r$ occurs again in $y$, say as the intermediate object of $y = y_1 y_2$. Then there exists $y' = y_1 y'' y_2$ for some loop $y''$ consisting only of frequent arrows such that $p \equiv x' r y'$ and such that $p \sim_{n'} x' r y'$.

Proof. As $n'$ is an $|S|$-distant rare-frequent threshold, we know by Claim 6.4 that the frequent arrows occurring in $p$ are a union of SCCs. Thus, there is a return path $s$ for $x'$ (i.e., $x's$ is a loop, hence $xs$ also is) where $s$ only consists of frequent arrows.

By our hypothesis on the initial object of $r$, we can decompose $y = y_1 y_2$ such that the terminal object of $y_1$ and initial object of $y_2$ is the initial object of $r$. Now, take $p'$ to be the loop $s (xs)^\omega (x's)^\omega x$: note that all arrows of $p'$ are frequent. Hence, by Lemma 7.4, $xry$ evaluates to the same category element as, and is $n'$-equivalent to,
\[ x r y_1 (p')^{n'} y_2 = x r y_1 (s (xs)^\omega (x's)^\omega x)^{n'} y_2 \]
By unfolding the power $n'$, we get the following:
\[ x r y_1 (p')^{n'} y_2 = x r y_1 s (xs)^\omega (x's)^\omega x (s (xs)^\omega (x's)^\omega x)^{n'-1} y_2 = x r y_1 s (xs)^\omega (x's)^\omega x z \]
where we write $z := (s (xs)^\omega (x's)^\omega x)^{n'-1} y_2$ for convenience. We can therefore apply Claim 7.3 (with $t := r y_1 s$ and $y = y' := s$) to obtain that:
\[ (x (r y_1 s)) (xs)^\omega (x's)^\omega x z \equiv (x' (r y_1 s)) (xs)^\omega (xs) (x's)^{\omega-1} x z. \]
What is more, these two paths are clearly $n'$-equivalent, as they only differ in terms of frequent arrows (all arrows in $x$ and $x'$ being frequent) and the number of these arrows is unchanged by the transformation. The right-hand side is of the form given in the statement, taking $y' := y_1 s (xs)^\omega (xs) (x's)^{\omega-1} x z$, from which we can extract the right $y''$. This concludes the proof. ◀

F Proofs for Section 8 (Concluding the Proof of LZG ⊆ ZG ∗ D: Claim 5.6)

F.1 Proof of the Base Case: Claim 8.1

▶ Claim 8.1.
Let $p_1$ and $p_2$ be two coterminal paths that are $n'$-equivalent and which contain no rare arrows. Then $p_1 \equiv p_2$.

We first state and prove the lemma on graph decompositions:

▶ Lemma 8.2.
Let $G$ be a strongly connected nonempty directed multigraph. One of the following holds:
1. $G$ is a simple cycle; or
2. $G$ contains a simple cycle $u_1 \to \cdots \to u_n \to u_1$ with $n \geq 1$, where all vertices $u_1, \ldots, u_n$ are pairwise distinct, such that all intermediate vertices $u_2, \ldots, u_n$ only occur in the edges of the cycle, and such that the removal of the cycle leaves the graph strongly connected (note that the case $n = 1$ corresponds to the removal of a self-loop); or
3. $G$ contains a simple path $u_1 \to \cdots \to u_n$ with $n \geq 2$ where all vertices are pairwise distinct, such that all intermediate vertices $u_2, \ldots, u_{n-1}$ only occur in the edges of the path, and such that the removal of the path leaves the graph strongly connected (note that the case $n = 2$ corresponds to the removal of a single edge).

We repeat here that the result is standard. The proof given below is only for the reader's convenience, and follows [7].
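To make the three cases of Lemma 8.2 concrete, here is a minimal brute-force sketch in Python that, for a small strongly connected multigraph given as a list of directed edges, reports which case applies together with the edge indices of the cycle or path found. It is only an illustration under stated assumptions, not the ear-decomposition construction of [7]: the function names `strongly_connected` and `classify` and the edge-list encoding are hypothetical choices, the input is assumed to be strongly connected with at least one edge, and the search only tries maximal chains through vertices of in-degree and out-degree 1 (a shorter chain would leave an endpoint with a missing in-edge or out-edge after removal).

```python
from collections import defaultdict


def strongly_connected(vertices, edges):
    """True iff every vertex reaches every other vertex along directed edges."""
    if not vertices:
        return False

    def reach(adj):
        # Depth-first search from an arbitrary vertex of the set.
        start = next(iter(vertices))
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen == set(vertices)

    fwd, bwd = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
    # Strongly connected iff one vertex reaches all others forwards and backwards.
    return reach(fwd) and reach(bwd)


def classify(edges):
    """Return (case, ear) where case is 1, 2 or 3 as in Lemma 8.2 and ear lists
    the indices of the edges of the cycle or path found.
    Assumes `edges` describes a strongly connected multigraph with >= 1 edge."""
    vertices = {x for e in edges for x in e}
    indeg, outdeg, out_edge = defaultdict(int), defaultdict(int), {}
    for i, (u, v) in enumerate(edges):
        outdeg[u] += 1
        indeg[v] += 1
        out_edge[u] = i  # only used for vertices whose out-degree is exactly 1
    # Case 1: every vertex has exactly one incoming and one outgoing edge.
    if all(indeg[v] == 1 and outdeg[v] == 1 for v in vertices):
        return 1, list(range(len(edges)))
    for i, (start, cur) in enumerate(edges):
        ear, inner = [i], []
        # Follow the chain while the current vertex occurs only in chain edges.
        while cur != start and indeg[cur] == 1 and outdeg[cur] == 1:
            inner.append(cur)
            ear.append(out_edge[cur])
            cur = edges[ear[-1]][1]
        rest_edges = [e for j, e in enumerate(edges) if j not in ear]
        rest_vertices = vertices - set(inner)
        if strongly_connected(rest_vertices, rest_edges):
            # The chain closed on its start vertex: cycle (case 2); else path (case 3).
            return (2 if cur == start else 3), ear
    return None  # not expected for a valid input, by Lemma 8.2


# Triangle 0 -> 1 -> 2 -> 0 with an extra edge 0 -> 2.
print(classify([(0, 1), (1, 2), (2, 0), (0, 2)]))
```

On the triangle with the extra edge, the path $0 \to 1 \to 2$ can be removed while keeping the rest strongly connected, which is an instance of case 3; two parallel self-loops on one vertex give an instance of case 2 with $n = 1$.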
Proof.
This result is shown using the notion of an ear decomposition of a directed multigraph. Specifically, following Theorem 7.2.2 of [7], for any nonempty strongly connected multigraph $G$, we can build a copy of it (called $G'$) by the following sequence of steps, with the invariant that $G'$ remains strongly connected:
- First, take some arbitrary simple cycle in $G$ and copy it to $G'$;
- Second, while there are some vertices of $G$ that have not been copied to $G'$, pick some vertex $v$ of $G$ that was not copied, such that there is an edge $(v', v)$ in $G$ with $v'$ a vertex that was copied. Now take some shortest path (hence a simple path) $v \to \cdots \to v''$ from $v$ to the subset of the vertices of $G$ that had been copied to $G'$. This path ends at a vertex $v''$ which may or may not be equal to $v'$. If $v'' \neq v'$, then we have a simple path $v' \to v \to \cdots \to v''$, which we copy to $G'$; otherwise we have a simple cycle, which we copy to $G'$. Note that, in both cases, all intermediate vertices in the simple path or simple cycle that we copy only occur in the edges of the path or cycle (as they had not been previously copied to $G'$).
  Further, $G'$ clearly remains strongly connected after this addition.
- Third, once all vertices of $G$ have been copied to $G'$, take each edge of $G$ that has not been copied to $G'$ (including all self-loops), and copy it to $G'$ (as a simple path of length 1). These additions preserve the strong connectedness of $G'$.

At the end of this process, $G'$ is a copy of $G$.

Now, to show the result, take the graph $G$, consider how we can construct it according to the above process, and distinguish three cases:
- If the process stopped at the end of the first step, then $G$ is a simple cycle (case 1 of the statement).
- If the process stopped after performing a copy in the second step, then the last simple path or simple cycle that we added satisfies the conditions, and its removal from $G$ gives a graph which is still strongly connected (case 2 or case 3 of the statement).
- If the process stopped after performing a copy in the third step, then the last edge that we added is a simple path of length 1, and its removal from $G$ gives a graph which is still strongly connected (case 3 of the statement).

This concludes the proof. ◀

As explained in the body, we do an induction on the number of edges of the strongly connected multigraph $G$, whose base case is trivial. Here are the details of the three cases to consider in the induction step, following Lemma 8.2.

Case 1: $G$ is a simple cycle. If $G$ is a simple cycle, then distinguish the initial object of $q_1$ (hence, of $q_2$) as $o$, and let $\alpha$ be the category element corresponding to the cycle from $o$ to itself, and $p$ the category element corresponding to the path from $o$ to the common terminal object of $q_1$ and $q_2$. We have: $q_1 = \alpha^{n_1} p$ and $q_2 = \alpha^{n_2} p$ with $n_1$ and $n_2$ being $\geq n'$ and having the same remainder modulo $n'$. By definition of $\alpha$, there exists an idempotent $e$ and some element $m \in eSe$ such that $\alpha = (e, m, e)$. Hence, $q_1 = \alpha^{n_1} p \equiv (e, m^{n_1}, e) p$ (resp. $q_2 = \alpha^{n_2} p \equiv (e, m^{n_2}, e) p$).
Since $n'$ is a multiple of the idempotent power of $S$, we have $m^{n_1} \equiv m^{\omega + r}$ and $m^{n_2} \equiv m^{\omega + r}$ where $r$ is the remainder modulo $n'$. Thus, we have $q_1 \equiv q_2$, concluding this case.

Case 2: $G$ has a simple cycle. Recall that, in this case, we know that $G$ has a simple cycle whose intermediate objects have no other incident edges and such that the removal of the simple cycle leaves the graph strongly connected. Let $\alpha$ be the simple cycle, starting from the only object $e$ of the cycle having other incident edges. We can then decompose $q_1$ and $q_2$ to isolate the occurrences of the simple cycle (which must be taken in its entirety), i.e.:
\[ q_1 = x_1 \alpha x_2 \alpha x_3 \cdots x_{t-1} \alpha x_t \]
\[ q_2 = y_1 \alpha y_2 \alpha y_3 \cdots y_{t'-1} \alpha y_{t'} \]
We ensure that the edges of $\alpha$ do not occur elsewhere than in the $\alpha$ factors, except possibly in $x_1$, $y_1$ and in $x_t$, $y_{t'}$ if the paths $q_1$ and/or $q_2$ start and/or end in the simple cycle. However, in that case, we know that the prefixes of $q_1$ and $q_2$ containing this incomplete subset of the cycle must be equal (same sequence of arrows), and likewise for their suffixes. For this reason, it suffices to show the claim that $q_1$ and $q_2$ evaluate to the same category element under the assumption that both their initial and terminal objects are not intermediate vertices of the cycle: the claim then extends to the case when they can be (by adding the common prefixes and suffixes to the two paths that satisfy the condition, using the fact that a congruence is compatible with concatenation). Thus, in the rest of the proof for this case, we assume that the edges of $\alpha$ only occur in the $\alpha$ factors.

We will now argue that, to show that $q_1 \equiv q_2$, it suffices to show the same of two $n'$-equivalent coterminal paths from which all occurrences of the edges of the cycle have been removed and where all other edges still occur sufficiently many times.
As this deals with paths where the underlying multigraph contains fewer edges, the induction hypothesis will conclude.

To do this, by Lemma 7.5, as $x_1$ and $y_1$ are coterminal and consist only of frequent arrows, and as the initial object of $\alpha$ occurs again in both paths, the path $q_1$ is $n'$-equivalent to, and evaluates to the same category element as, some path:
\[ q_1' = y_1 \alpha x_2' \alpha x_3' \cdots x'_{t''-1} \alpha x_{t''} \]
For this reason, up to replacing $q_1$ by $q_1'$, we can assume that $x_1 = y_1$.

Now, furthermore, $x_2, \ldots, x_{t-1}$ (resp. $y_2, \ldots, y_{t'-1}$) and $\alpha$ are coterminal cycles over the object $e$ (which by definition corresponds to an idempotent of $S$). Hence, $\alpha x_2 \alpha x_3 \cdots x_{t-1} \alpha = (e, m m_2 m m_3 \cdots m_{t-1} m, e)$ where $\alpha = (e, m, e)$ and $x_i = (e, m_i, e)$ for $2 \leq i \leq t - 1$, and where $m$ and all the $m_i$ are in $eSe$, which is by hypothesis a monoid in ZG. Now, by Corollary 3.7, we know that $m m_2 m m_3 \cdots m_{t-1} m = m^{t-1} m_2 m_3 \cdots m_{t-1}$, because, as the arrows of $\alpha$ are frequent, the number of occurrences of $\alpha$ is $\geq n'$, and we have taken $n'$ to be a multiple of $(|S| + 1) \times \omega$ (where $\omega$ is the idempotent power of $S$), which is at least $(|eSe| + 1) \times k$, where $k$ is the idempotent power of $eSe$ (which divides $\omega$, hence is $\leq \omega$). By applying the same reasoning to $q_2$, it suffices to show that the two following paths evaluate to the same category element, where $x_1 = y_1$:
\[ x_1 \alpha^{t-1} x_2 x_3 \cdots x_{t-1} x_t \]
\[ y_1 \alpha^{t'-1} y_2 y_3 \cdots y_{t'-1} y_{t'} \]
Now, because these two paths are $n'$-equivalent, we know that $t - 1 \equiv t' - 1 \pmod{n'}$. By the same reasoning as in case 1, $\alpha^{t-1}$ and $\alpha^{t'-1}$ evaluate to the same category element as $\alpha^r$, where $r$ is the remainder. So it suffices to show that the two following paths evaluate to the same category element, with $x_1 = y_1$:
\[ x_1 \alpha^r x_2 x_3 \cdots x_{t-1} x_t \]
\[ y_1 \alpha^r y_2 y_3 \cdots y_{t'-1} y_{t'} \]
To ensure that the edges not in $\alpha$ still occur sufficiently many times, let $\beta$ be any loop on $e$ that visits all edges of $G$ except the ones in $\alpha$: this is doable because $G$ is still strongly connected after the removal of $\alpha$.
Up to exponentiating $\beta$, we can assume that $\beta$ traverses each edge sufficiently many times to satisfy the lower bound imposed by the requirement of $n'$ being an $|S|$-distant rare-frequent threshold. By Lemma 7.4, it suffices to show that the following paths evaluate to the same category element:
\[ q_1' = x_1 \alpha^r \beta^{n'} x_2 x_3 \cdots x_{t-1} x_t \]
\[ q_2' = y_1 \alpha^r \beta^{n'} y_2 y_3 \cdots y_{t'-1} y_{t'} \]
Now, observe that both paths start with $x_1 \alpha^r = y_1 \alpha^r$, and the arrows of $\alpha$ do not occur in the rest of the paths. Now consider the paths $\beta^{n'} x_2 x_3 \cdots x_t$ and $\beta^{n'} y_2 y_3 \cdots y_{t'}$. They are paths that are coterminal, $n'$-equivalent because $q_1$ and $q_2$ were, where the frequent letters that are used are a strict subset of the ones used in $p_1$ and $p_2$, and where all other frequent letters occur sufficiently many times for $n'$ to still be an $|S|$-distant rare-frequent threshold. Thus, by induction hypothesis, we know that these two paths evaluate to the same category element, so that $q_1'$ and $q_2'$ also do. This concludes case 2.

Case 3: $G$ has a simple path. Recall that, in this case, we know that $G$ has a simple path whose intermediate objects have no other incident edges and such that the removal of the simple path leaves the graph strongly connected. We denote by $x \neq y$ the starting and ending objects of the path. Let $\pi$ be the category element corresponding to the path. Since the removal of the path does not affect strong connectedness of the graph, there is a simple path from $x$ to $y$ sharing no edges with $\pi$; let $\kappa$ be the category element to which this "return path" evaluates. Furthermore, there is a simple path $\rho$ from $y$ to $x$ sharing no edges with $\pi$ (this is because all intermediate objects of $\pi$ only occur in the edges of $\pi$).

Like in the previous case, up to removing common prefixes and suffixes, it suffices to consider the case where $q_1$ and $q_2$ do not start or end in the intermediate vertices of $\pi$.
For that reason, we can now isolate all occurrences of the edges of $\pi$, and write:
\[ q_1 = x_1 \pi x_2 \pi x_3 \cdots x_{t-1} \pi x_t \]
\[ q_2 = y_1 \pi y_2 \pi y_3 \cdots y_{t'-1} \pi y_{t'} \]
Like in the previous case, by Lemma 7.5, we can assume that $x_1 = y_1$.

By Lemma 7.4, we spawn a loop $(\rho\kappa)^{n'}$ after every occurrence of $\pi$ without changing the category element and still respecting $n'$-equivalence. By expanding $(\rho\kappa)^{n'} = \rho\kappa (\rho\kappa)^{n'-1}$, it suffices to show that the following paths evaluate to the same category element, with $x_1 = y_1$:
\[ q_1' = x_1 \pi \rho\kappa (\rho\kappa)^{n'-1} \cdots x_{t-1} \pi \rho\kappa (\rho\kappa)^{n'-1} x_t \]
\[ q_2' = y_1 \pi \rho\kappa (\rho\kappa)^{n'-1} \cdots y_{t'-1} \pi \rho\kappa (\rho\kappa)^{n'-1} y_{t'} \]
We can now regroup the occurrences of $\pi\rho$, which are loops such that some edges (namely, the edges of $\pi$) only occur in these factors. This means that we can conclude as in case 2 for the cycle $\pi\rho$, as this cycle contains some edges that only occur there; we can choose $\beta$ at the end of the proof to be a loop on $x$ visiting all edges of $G$ except those of $\pi$, which is again possible because $G$ is still strongly connected even after the removal of $\pi$.

This establishes case 3 and concludes the induction step of the proof.

We have thus proved by induction that $q_1$ and $q_2$ evaluate to the same category element, in the base case of the outer induction where all edges of $q_1$ and $q_2$ are frequent.

F.2 Proof of the Induction Case
We now conclude the proof of the induction step for the outer induction. Recall from the body that we have taken two $n'$-equivalent paths $p_1 = q_1 a s_1$ and $p_2 = q_2 a s_2$ with $a$ being the first of the $r + 1$ rare letters of the paths.

Remember that, as $n'$ is an $|S|$-distant rare-frequent threshold for $p_1$ and $p_2$, we know that the frequent arrows of $p_1$ form a union of SCCs (Claim 6.4); note that, thanks to $n'$-equivalence, the same is true of $p_2$ with the same SCCs. Let $e$ be the source object of $a$, and consider the SCC $C$ of frequent arrows that contains $e$. There are two cases, depending on whether some object of $C$ occurs again in $s_1$ or not. Note that some object of $C$ occurs again in $s_1$ iff the same is true of $s_2$, because which frequent arrow components occur again is entirely determined by the terminal objects of the rare arrows of $s_1$ and $s_2$, which are identical thanks to $n'$-equivalence.

Case 1: $C$ occurs again after $a$. In this case, we apply Lemma 7.5, because $q_1$ and $q_2$ only consist of frequent arrows and some object of the SCC of the initial object of $a$ occurs again in $s_1$. The lemma tells us that there is a path:
\[ p_1' = q_2 a s_1' \]
which evaluates to the same category element as $p_1$ and is $n'$-equivalent to it. Hence, by compositionality, it suffices to show that $s_1'$ and $s_2$ evaluate to the same category element.

To apply the induction hypothesis, we simply need to ensure that $n'$ is still an $|S|$-distant rare-frequent threshold for $s_1'$ and $s_2$. To do this, we need to ensure that the arrows that are frequent in $p_1$ and $p_2$ are still frequent there, and still satisfy the $|S|$-distant condition. Fortunately, we can simply ensure this by inserting a loop using Lemma 7.4. Formally, write $s_1' = r_1 t_1$ where the intermediate object is the object of the SCC $C$ that occurred in $s_1$ (the existence of such a decomposition is a consequence of the statement of Lemma 7.5), and write $s_2 = r_2 t_2$ in the same way (which we already discussed must be possible with $s_2$).
Let $p'$ be an arbitrary loop of frequent arrows where all arrows of $C$ occur: this is possible because $C$ is strongly connected. We know by Lemma 7.4 that $s_1' = r_1 t_1$ and $r_1 (p')^{n'} t_1$ are $n'$-equivalent and evaluate to the same category element: this is also true with $w_1 := r_1 (p')^{kn'} t_1$ for a sufficiently large $k$ such that every frequent arrow of $C$ occurs as many times as it did in $p_1$. Likewise, $s_2 = r_2 t_2$ and $w_2 := r_2 (p')^{kn'} t_2$ are $n'$-equivalent and evaluate to the same category element, and we can take a sufficiently large $k$. So it suffices to consider $w_1$ and $w_2$.

Let us apply the induction hypothesis to them. They are two coterminal paths, and they are $n'$-equivalent because $w_1 \sim_{n'} p_1' \sim_{n'} p_1$ and $w_2 \sim_{n'} p_2$ and by hypothesis $p_1 \sim_{n'} p_2$. What is more, the arrows that were rare in $p_1$ and $p_2$ are still rare for them, and they have $r$ occurrences in total: this was true by construction of $s_1$ and $s_2$ and is true of $s_1'$ because $p_1 = q_1 a s_1 \sim_{n'} q_2 a s_1'$ and all arrows of $q_1$ and $q_2$ are frequent, so the rare subwords of $s_1$ and $s_1'$ are the same. The arrows that were frequent in $p_1$ and $p_2$ are still frequent in $s_1$ and $s_2$ and occur at least as many times as they did in $p_1$ and $p_2$ respectively: we have guaranteed this for the arrows of $C$ using Lemma 7.4, and this is clear for the arrows outside of $C$ as all their occurrences in $p_1$ and $p_2$ were in $s_1$ and $s_2$ respectively, and $s_1'$ has at least as many occurrences of every letter as $s_1$ does (this is a consequence of the statement of Lemma 7.5). This ensures that $s_1' \sim_{n'} s_2$, and that $n'$ is still an $|S|$-distant rare-frequent threshold for them.

Hence, by the induction hypothesis, we have $s_1' \equiv s_2$, so that by compositionality we have $p_1 \equiv p_2$.

Case 2: $C$ does not occur again after $a$. We first claim that $q_1 \equiv q_2$ by the base case of the outer induction. Indeed, first note that they are two coterminal paths. Now, every arrow $x$ which is frequent in $p_1$ and $p_2$ is either in the SCC of the initial object of $a$ or not.
In the first situation, all the occurrences of $x$ in $p_1$ must be in $q_1$, as any occurrence of $x$ in $s_1$ would witness that we are in Case 1; and likewise all its occurrences in $p_2$ must be in $q_2$. In the second situation, all its occurrences in $p_1$ must be in $s_1$ and all its occurrences in $p_2$ must be in $s_2$, for the same reason. Thus, $q_1$ and $q_2$ contain no letter which was rare in $p_1$ and $p_2$, some of the frequent letters of $p_1$ and $p_2$ (those of the other SCCs) do not occur there at all, and the others occur there with the same number of occurrences. Thus indeed $q_1 \sim_{n'} q_2$, they contain no rare arrows, and $n'$ is still an $|S|$-distant rare-frequent threshold for them. Thus, the base case of the outer induction concludes that they evaluate to the same category element.

We now claim that $s_1 \equiv s_2$ by the induction case of the outer induction. Indeed, they are again two coterminal paths. What is more, by the previous reasoning the arrows that are frequent in $p_1$ and $p_2$ either occur only in $s_1$ and $s_2$ or do not occur there at all. Thus, $s_1$ and $s_2$ contain $r$ rare arrows (for the arrows that were already rare in $p_1$ and $p_2$), and the frequent arrows either occur in $s_1$ and $s_2$ with the same number of occurrences as in $p_1$ and $p_2$ or not at all. This implies that $n'$ is still an $|S|$-distant rare-frequent threshold for $s_1$ and $s_2$. Thus, we have $s_1 \sim_{n'} s_2$ and the induction case of the outer induction establishes that $s_1 \equiv s_2$.

Thus, by compositionality, we know that $p_1$ and $p_2$