Dynamic Membership for Regular Languages
Antoine Amarilli
LTCI, Télécom Paris, Institut polytechnique de Paris, France
Louis Jachiet
LTCI, Télécom Paris, Institut polytechnique de Paris, France
Charles Paperman
LINKS, CRIStAL, Université de Lille, INRIA, France
Abstract
We study the dynamic membership problem for regular languages: fix a language L, read a word w, build in time O(|w|) a data structure indicating if w is in L, and maintain this structure efficiently under substitution edits on w. We consider this problem on the unit-cost RAM model with logarithmic word length, where the problem always has a solution in O(log |w| / log log |w|). We show that the problem is in O(log log |w|) for languages in an algebraically-defined class QSG, and that it is in O(1) for another class QLZG. We show that languages not in QSG admit a reduction from the prefix problem for a cyclic group, so that they require Ω(log n / log log n) operations in the worst case; and that QSG languages not in QLZG admit a reduction from the prefix problem for the monoid U_1, which we conjecture cannot be maintained in O(1). This yields a conditional trichotomy. We also investigate intermediate cases between O(1) and O(log log n). Our results are shown via the dynamic word problem for monoids and semigroups, for which we also give a classification. We thus solve open problems of the paper of Skovbjerg Frandsen, Miltersen, and Skyum [25] on the dynamic word problem, and additionally cover regular languages.

2012 ACM Subject Classification Theory of computation → Formal languages and automata theory
Keywords and phrases regular language, membership, RAM model, updates, dynamic
Acknowledgements
We thank Jean-Éric Pin and Jorge Almeida for their advice on ZG and SG.

Introduction

This paper studies how to handle substitution updates on a word while maintaining its membership to a regular language. Specifically, we fix a regular language L and we are given a word w, which we preprocess in linear time, in particular testing if w ∈ L. Now, edits are applied to w, and we want to update w and keep track at each step of whether w ∈ L. We study this dynamic membership task from a theoretical angle, but it is also practically useful for maintaining a Boolean condition (expressed as a regular language) on a user-edited word.

Dynamic membership was studied for various update operations, e.g., append operations for streaming algorithms or the sliding window model [12, 11, 13], substitutions for the dynamic word problem for monoids [25], or concatenations and splits [18]. It was also studied in the case of pattern matching, where we check if the word contains some target pattern [8], which is also assumed to be editable. It is also connected to the incremental validation problem, which has been studied for strings and for XML documents [3].

Our focus in this work is to identify classes of fixed regular languages for which dynamic membership can be solved extremely efficiently, e.g., in constant time or sublogarithmic time. Thus, our update language only allows substitutions to the input word, as insertions and deletions already make it challenging to efficiently maintain the word itself (see Section 7). We work within the computational model of the unit-cost RAM, with logarithmic cell size.

Dynamic word problem for monoids [25].
Our problem closely relates to the work by Skovbjerg Frandsen, Miltersen, and Skyum on the dynamic word problem for monoids [25]: fix a finite monoid, read a word which is a sequence of monoid elements, and maintain under substitution updates the image of these elements when composed according to the monoid law. Indeed, the dynamic membership problem for a language L reduces to the dynamic word problem for any monoid that recognizes L; but the converse is not true. Hence, studying the dynamic word problem for monoids is coarser than studying the dynamic membership problem for languages, although it is a natural first step and is already very challenging.

In the context of monoids, Skovbjerg Frandsen et al. [25] propose a general algorithm for the dynamic word problem in O(log n / log log n), for n the length of the word. This is a refinement of the elementary O(log n) algorithm that decomposes the word as a balanced binary tree whose nodes are annotated with the monoid image of the corresponding infix. They show that this bound is tight for some monoids, namely noncommutative groups, and a generalization of them defined via an equation. This is obtained by a reduction from the so-called prefix-Z_d problem, for which an Ω(log n / log log n) lower bound [10] is known in the cell probe model [14]. We will reuse this lower bound in our work.

They also show that the problem is easier for some monoids. For instance, commutative monoids can be maintained in O(1), simply by maintaining the count of element occurrences. They also show a trickier O(log log n) upper bound for group-free monoids: this is based on a so-called Krohn-Rhodes decomposition [24] and uses a predecessor data structure implemented as a Van Emde Boas tree [30].
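The elementary O(log n) balanced-tree algorithm mentioned above can be sketched as follows. This is our own illustration, not code from [25]; we pad the word to a power-of-two length with the identity so that the root holds the left-to-right product even for noncommutative monoids.

```python
class MonoidSegmentTree:
    """Maintain the product of a word over a monoid under substitution updates.

    Each node stores the monoid product of the letters below it, so an update
    costs O(log n) and the evaluation of the whole word is read off in O(1).
    """
    def __init__(self, word, op, identity):
        self.op, self.identity = op, identity
        self.size = 1
        while self.size < len(word):
            self.size *= 2              # pad to a power of two with the identity
        self.tree = [identity] * (2 * self.size)
        for i, a in enumerate(word):
            self.tree[self.size + i] = a
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = op(self.tree[2 * i], self.tree[2 * i + 1])

    def substitute(self, i, a):
        """Set the i-th letter (0-indexed) to a and repair the path to the root."""
        i += self.size
        self.tree[i] = a
        while i > 1:
            i //= 2
            self.tree[i] = self.op(self.tree[2 * i], self.tree[2 * i + 1])

    def evaluation(self):
        """Monoid product of the whole word."""
        return self.tree[1]
```

For instance, it can be instantiated with the (noncommutative) monoid of transformations of {0, 1, 2} under composition, which is the kind of monoid arising as a transition monoid of an automaton.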
However, there are non-commutative monoids for which the problem is in O(1) (as we will show), and there is still a gap between group-free monoids (having an upper bound in O(log log n)) and the monoids to which the Ω(log n / log log n) lower bound applies. This was claimed as open in [25] and not addressed afterwards; there is a more recent study by Pătrașcu and Tarniță [20], but it focuses on single-bit memory cells.

Our contributions.
In this paper, we attack these problems using algebraic monoid theory. This unlocks new results: first on the dynamic word problem for monoids, where we extend the results of [25], and then on the dynamic membership problem for regular languages.

We start with our results on the dynamic word problem for monoids, which are summarized in Figure 1 in the appendix for common monoid classes. First, in Section 3, we show how a more elaborate O(log log n) algorithm can cover all monoids to which the Ω(log n / log log n) lower bound of [25] does not apply: we dub this class SG and characterize it by the equation x^{ω+1} y x^ω = x^ω y x^{ω+1}, for any elements x and y, where ω denotes the idempotent power. Our algorithm shares some ideas with the O(log log n) algorithm of [25], in particular it uses Van Emde Boas trees, but it faces significant new challenges. For instance, we can no longer use a Krohn-Rhodes decomposition, and we proceed instead by a rather technical induction on the J-classes of the monoid. Thus, we have an unconditional dichotomy between monoids in SG, which are in O(log log n), and monoids outside SG, which are in Θ(log n / log log n).

Second, in Section 4, we generalize the O(1) result on commutative monoids to the monoid class ZG [2]. This class is defined via the equation x^{ω+1} y = y x^{ω+1}, i.e., only the elements that are part of a group are required to commute with all other elements. We show that the dynamic word problem for these monoids is in O(1): we intuitively describe them algebraically as the combination of commutative monoids with monoids obtained from nilpotent semigroups, for which we design a simple but somewhat surprising algorithm. We also show a conditional lower bound: for any monoid M not in ZG, we can reduce the prefix-U_1 problem to the dynamic word problem for M. This is the problem of maintaining a binary word under substitution updates while answering queries asking if a prefix contains a 0. It can be seen as a (slightly weakened) priority queue, so we conjecture that no O(1) data structure for this problem exists in the RAM model. Under this conjecture, ZG is exactly the class of monoids having a dynamic word problem in O(1).

We then extend our results in Section 5 from monoids to the dynamic word problem for semigroups. Our results for SG extend as-is, as the upper bound on SG also applies to semigroups in SG, and semigroups not in SG must contain a submonoid not in SG, so they are covered by the lower bound. For ZG, there are major complications, and we must study the class LZG of semigroups where all submonoids are in ZG. A semigroup not in LZG is covered by our conditional lower bound, but it is very tricky to show the converse, i.e., that imposing the condition on LZG suffices to ensure tractability. We do so by showing tractability for ZG ∗ D, the semigroups generated by semidirect products of ZG semigroups and so-called definite semigroups, and using a locality result establishing that ZG ∗ D = LZG.

Next, we extend our results in Section 6 from semigroups to languages. This is done using the notion of stable semigroup [4, 5], which we can use to lift our results and obtain:

▶ Theorem 1.1.
Let L be a regular language, and consider the dynamic membership problem for L on the unit-cost RAM with logarithmic word length under substitution edits:
- If L is in the class QLZG, then the problem is in O(1).
- If L is not in the class QLZG but is in the class QSG, then the dynamic membership problem is in O(log log n), with n the length of the word. Further, solving it in O(1) time gives an O(1) implementation of a prefix-U_1 structure.
- If L is not in the class QSG, then the dynamic membership problem is in Θ(log n / log log n).
QSG \ QLZG (as the O (log log n )bound is not shown to be tight), the complexity of deciding which case of the theorem applies,the support for insertion and deletion updates, and the support for infix queries.Note that this paper uses some new results about the class ZG , namely its characterizationvia commutative monoids and nilpotent semigroups, and the locality result ZG ∗ D = LZG .As these results are of possible independent interest, and have a technical proof usingunrelated tools (namely finite categories and Straubing’s delay theorem [27]), we chose todefer them to a separate ICALP’21 submission [1].
Computation model.
We work in the RAM model with unit cost, i.e., each cell can store integers of value at most polynomial in |w|, where |w| is the length of the input, and arithmetic operations (addition, successor, modulo, etc.) on two cells take unit time. As the integers have at most polynomial value, the memory usage is also polynomially bounded. We consider dynamic problems where we are given a data structure, preprocess it in linear time, and must then handle update operations on the data structure as well as query operations on the current structure. The complexity of the problem is the worst-case complexity of handling an update or answering a query.

Like in [25], the lower bounds that we show will actually all hold in the coarser cell probe model, which only considers the number of memory cells accessed during a computation. Furthermore, they will all hold even without the assumption that the preprocessing is linear.

Given two dynamic problems A and B, we say that A has a constant-time reduction to B if we can implement a data structure for problem A having constant-time complexity when using as oracle a data structure for problem B (built during the preprocessing). In other words, queries and updates on the structure for A, in addition to constant-time computations on its own memory, can use the data structure for B as an oracle, i.e., perform a constant number of queries and updates on it, which are considered to run in O(1). We similarly talk of a dynamic problem having a reduction to multiple problems, meaning we can use all of them as oracles. If problem A reduces to problems B_1, . . . , B_n, then any complexity upper bound that holds on all problems B_1, . . . , B_n also holds for A, and any complexity lower bound on A extends to at least one of the B_i.

Problem statement.
Our problems require some algebraic prerequisites. We refer the reader to the book of Pin [22] and his lecture notes [23] for basic concepts (monoids, semigroups, etc.), and only recall less common notions. All semigroups and monoids considered are finite. A semigroup element s ∈ S is idempotent if ss = s. For x ∈ S, we denote by x^ω the idempotent power of x, i.e., ω is the smallest positive integer such that x^ω is idempotent. A zero for S is an element 0 ∈ S such that 0x = x0 = 0 for all x ∈ S: if it exists, it is unique. A variety of monoids (resp., semigroups) is a class of monoids (resp., semigroups) closed under direct product, quotient and submonoid (resp., subsemigroup). The variety of monoids (resp., semigroups) generated by a class V of monoids (resp., of semigroups) is the least variety closed under these operations and containing V.

We consider dynamic problems where we maintain a word w on an alphabet Σ, every letter being stored in a cell. We allow substitution updates (i, a) for 1 ≤ i ≤ |w| and a ∈ Σ, setting the i-th letter of w to be a. The size |w| of the word never changes, the preprocessing runs in O(|w|), and we measure the worst-case complexity of an update or query as a function of |w|. We do not limit the memory used, but all our upper complexity bounds will have memory usage in O(|w|), while all our lower bounds hold without this assumption.

We focus on three related dynamic problems, allowing different query operations. The first is the dynamic word problem for monoids: fix a monoid M, the alphabet Σ is M, and the query returns the evaluation of the current word w, i.e., the product of the elements of w (it is an element of M). This is the problem studied in [25]. The second is the dynamic word problem for semigroups, which is the same but with a semigroup, and assuming that |w| > 0. The third is the dynamic membership problem for regular languages: we fix a regular language L on the alphabet Σ, and the query checks if the current word belongs to L or not.

We will study the complexity of these problems in the rest of this paper. Let us first observe that, for monoids and more generally for semigroups, the usual algebraic operators of quotient, subsemigroup, and product do not increase the complexity of the problem:

▶ Proposition 2.1.
Let S and T be finite semigroups. The dynamic word problem for subsemigroups or quotients of S reduces in constant time to the problem for S, and the dynamic word problem for S × T reduces to that of S and T.

Hard problems.
All our lower bounds are by reducing from the problem prefix-M, for M a fixed monoid. In this problem, we maintain a word of M* under substitution updates, and handle prefix queries: given a prefix length, return the evaluation of the prefix of that length. In particular, for d ≥ 2, we will consider the problem prefix-Z_d for Z_d the cyclic group of order d, i.e., Z_d = {0, . . . , d − 1} with addition modulo d, where we must compute the sum of the prefix modulo d. The following lower bound is known already in the cell probe model:

▶ Theorem 2.2 ([10, 25]). For any fixed d ≥ 2, any structure for prefix-Z_d on a word of length n has complexity at least Ω(log n / log log n).

We also consider the problem prefix-U_1, where U_1 = {0, 1} with composition being the logical AND, i.e., prefix queries check if the prefix contains a 0. Equivalently, we must maintain a subset S of a universe {1, . . . , n} (intuitively n is the length of the word) under insertions and deletions, and support threshold queries that ask, given 0 ≤ i ≤ n, whether S contains some element which is ≤ i (i.e., if some position ≤ i has a 0). The prefix-U_1 problem can be solved with a priority queue data structure in O(log log n) [29], or even in expected O(√(log log n)) if we allow randomization [15]. Note that prefix-U_1 is slightly weaker than a priority queue, as we can only compare the minimal value to a value given as input. We do not know of lower bounds on prefix-U_1, but conjecture [16] that it cannot be solved in O(1):

▶ Conjecture 2.3. There is no structure for prefix-U_1 with complexity O(1).

Note that the best algorithm for prefix-U_1 is by sorting small sets of large integers. This takes linear time in the cell probe model, so does not rule out the existence of an O(1) priority queue [29]. Hence, a lower bound for prefix-U_1 would need to apply to the RAM model specifically, which would require new techniques.

Our last hard problem is prefix-U_2, where U_2 is the monoid {1, a, b} with composition law xy = y for x, y ∈ {a, b}, i.e., queries check if the last non-neutral element is a or b (or nothing). By adapting known results on the colored predecessor problem [21], we have:

▶ Theorem 2.4 (Adapted from [21]). Any structure for prefix-U_2 on a word of length n must be in Ω(log log n).

General algorithms.
Of course, the "hard" prefix problems, and the three problems that we study, can all be solved in O(|w|) by re-reading the whole word at each update. We can improve this to O(log |w|) by maintaining a balanced binary tree on the letters of the word, with each node of the tree carrying the evaluation in the monoid of the letters reachable from that node. Any edit to the word can be propagated up to the root in logarithmic time, and the annotation of the root is the evaluation of the word. This algorithm has been implemented in practice [18]. A finer bound is given in [25] using a folklore technique of working with (log n)-ary trees rather than binary trees, and using the power of the RAM model. We recall it here for monoids (but it applies to all three problems):

▶ Theorem 2.5 ([25]). For any monoid M, the dynamic word problem and prefix problem for M are in O(log n / log log n).

Our goal in this paper is to solve the dynamic word problem and dynamic membership problem more efficiently for specific classes of monoids, semigroups, and languages. We start our study with monoids in the next two sections, by studying the varieties SG and ZG.

SG

The class SG of monoids is defined by the equation x^{ω+1} y x^ω = x^ω y x^{ω+1} for all x, y. It incidentally occurs in [9, Theorem 3.1], but to our knowledge was not otherwise studied. The name SG means switching group, as we can "switch" group elements between occurrences of their idempotent. We first recall the lower bound from [25] on the dynamic word problem for monoids not in SG, and then show an upper bound for monoids in SG.

Lower bound.
The monoids not in SG are in fact those covered by the lower bound of [25]. Namely, we have the following, implying the Ω(log n / log log n) lower bound by Theorem 2.2:

▶ Theorem 3.1 ([25], Theorem 2.5.1). For any monoid M not in SG, there is some d ≥ 2 such that the prefix-Z_d problem reduces in constant time to the dynamic word problem for M.
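To make the prefix-Z_d problem concrete, here is a simple logarithmic-time solution using a Fenwick tree (our own illustration, not from the paper); the point of the lower bound above is that no structure can improve on this by more than a log log n factor.

```python
class PrefixZd:
    """prefix-Z_d: substitutions on a word over Z_d, prefix-sum queries mod d."""
    def __init__(self, word, d):
        self.d = d
        self.n = len(word)
        self.word = list(word)
        self.fen = [0] * (self.n + 1)       # Fenwick tree over the letters
        for i, a in enumerate(word):
            self._add(i + 1, a)

    def _add(self, i, delta):
        # Fenwick-tree point update; values are kept reduced modulo d
        while i <= self.n:
            self.fen[i] = (self.fen[i] + delta) % self.d
            i += i & (-i)

    def substitute(self, i, a):
        """Set the i-th letter (0-indexed) to a, in O(log n)."""
        self._add(i + 1, a - self.word[i])
        self.word[i] = a

    def prefix(self, length):
        """Sum of w[0:length] modulo d, in O(log n)."""
        s, i = 0, length
        while i > 0:
            s = (s + self.fen[i]) % self.d
            i -= i & (-i)
        return s
```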
Upper bound.
The rest of this section presents our upper bound on monoids in SG. We will show a more general claim on the dynamic word problem for semigroups in SG, i.e., those satisfying the equation of SG: this covers in particular the monoids of SG. We show:

▶ Theorem 3.2.
The dynamic word problem for any semigroup in SG is in O(log log n).

This result extends the result of [25] on group-free monoids, because SG captures both aperiodic monoids and commutative monoids. Indeed, in the case where a monoid is aperiodic, it satisfies the equation x^{ω+1} = x^ω, and then x^{ω+1} y x^ω = x^ω y x^ω = x^ω y x^{ω+1}. Besides, commutative monoids clearly satisfy the equation. Of course, SG captures monoids not covered by [25], e.g., products of a commutative monoid and an aperiodic monoid.

The result of [25] is proved using van Emde Boas trees [30] (vEBs), which we will also use, and which we will extend to store values in an alphabet Σ. In this paper, fixing an alphabet Σ, a vEB tree is a data structure parametrized by an integer n called its span, which stores a set X ⊆ {1, . . . , n} and a mapping µ : X → Σ, and supports the following operations: inserting an integer x ∈ {1, . . . , n} with a label µ(x) := a, removing an integer x and its label, retrieving the label of x ∈ {1, . . . , n}, finding the next integer of X that follows an input x ∈ {1, . . . , n} (or a special value if none exists), and finding the previous integer. We can implement vEBs so that these five operations run in O(log log n) time in the worst case, and so that a vEB can be constructed in linear time from an ordered list.

We use vEBs in our inductive proof to represent words with "gaps": a vEB represents the word obtained by concatenating the labels of the elements of X in order. For a semigroup S and span n ∈ N, the dynamic word problem on vEBs for S is to maintain a vEB T of span n on alphabet S under insertions and deletions, while maintaining the evaluation in S of the word of T. The complexity is measured as a function of n (which never changes), and the preprocessing of T is in O(n).
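The five operations above can be summarized by the following interface sketch. For brevity this stand-in (our own illustration) is backed by binary search over a sorted list, giving O(log n) per operation; a genuine van Emde Boas tree implements the same interface in O(log log n).

```python
from bisect import bisect_left, insort

class LabelledVEB:
    """Interface of the labelled vEB of span n: a set X ⊆ {1,...,n} with
    labels µ : X → Σ. Stand-in implementation via a sorted list, so the
    operations run in O(log n) rather than the O(log log n) of a real vEB."""
    def __init__(self, n):
        self.n = n
        self.keys = []        # sorted elements of X
        self.label = {}       # the mapping µ

    def insert(self, x, a):
        """Insert x with label a (or relabel x if already present)."""
        if x not in self.label:
            insort(self.keys, x)
        self.label[x] = a

    def delete(self, x):
        """Remove x and its label."""
        del self.label[x]
        self.keys.pop(bisect_left(self.keys, x))

    def get(self, x):
        """Label of x, or None if x is not in X."""
        return self.label.get(x)

    def successor(self, x):
        """Least element of X that is > x, or None if none exists."""
        i = bisect_left(self.keys, x + 1)
        return self.keys[i] if i < len(self.keys) else None

    def predecessor(self, x):
        """Greatest element of X that is < x, or None if none exists."""
        i = bisect_left(self.keys, x)
        return self.keys[i - 1] if i > 0 else None
```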
Note that when T is empty, its evaluation is not expressible in the semigroup S: we then return a special value. It is then clear that Theorem 3.2 follows from its generalization to vEBs, as a word in the usual sense can be converted in linear time to a vEB where X = {1, . . . , n}:

▶ Theorem 3.3.
Let S be a semigroup in SG. The dynamic word problem for S on a vEB of span n is in O(log log n).

We show this result in the rest of the section. We assume without loss of generality that S has a zero, as otherwise we can simply add one. We first introduce some algebraic preliminaries, and then present the proof, which is an induction on the J-classes of the semigroup.

Preliminaries and proof structure.
The J-order of S is the preorder ≤_J on S defined by s ≤_J s′ if S^1 s S^1 ⊆ S^1 s′ S^1; its equivalence classes (for the relation where s J s′ iff s ≤_J s′ and s′ ≤_J s) are called J-classes. We lift the J-order to J-classes C, C′ by writing C ≤_J C′ if u ≤_J u′ for all u ∈ C and u′ ∈ C′. A J-class is maximal if it is maximal for this order.

We show Theorem 3.3 by induction on the number of J-classes of the semigroup. More precisely, at every step, we consider a maximal J-class C, and remove it by doing a constant-time reduction to the semigroup S \ C obtained by removing this class. Thus, as there are only constantly many classes, and as the constant number of operations on vEBs at each class each take time O(log log n), the overall bound is indeed in O(log log n).

The base case of the induction is that of a semigroup with a single J-class; from our assumption that the semigroup has a zero, that J-class must then consist of the zero, i.e., we have the trivial monoid {0}, and the image is always 0 (or undefined if the word is empty). We now show the induction step of Theorem 3.3. Take a semigroup S with more than one J-class, and fix a maximal J-class C of S: we know that S \ C is not empty. What is more:

▶ Claim 3.4.
For any x, y of S, if xy ∈ C then x ∈ C and y ∈ C.

Thus, whenever a combination of elements "falls" outside of the maximal class C, it remains in S \ C; and we can see S \ C as a semigroup, which still has a zero, and has strictly fewer J-classes. So we will study how to reduce to S \ C. We now make a case disjunction on whether C is regular, i.e., it contains an idempotent element, or whether it does not.

Non-regular maximal classes.
This case is easy, as the combination of any two elements in C always "falls" in S \ C. Formally, for a maximal J-class C of S, we call a word w on S* pair-collapsing for C if the product of any two adjacent letters of w is in S \ C. We show:

▶ Lemma 3.5.
Let C be a maximal J-class. If C is non-regular, then any word is pair-collapsing: for any x, y ∈ C, we have xy ∈ S \ C.

We can then show the following, which we will reuse for regular maximal classes:

▶ Lemma 3.6.
Let S be a semigroup and let C be a maximal J-class of S. Consider the dynamic word problem for S on vEBs of some span n where we assume that, at every step, the represented word is pair-collapsing for C. Then there is a constant-time reduction from that problem to the same problem for S \ C on vEBs of span n.

Proof sketch.
We group adjacent letters of the word into groups of at least two letters: as the word is pair-collapsing, the evaluation of each group falls in S \ C, and we can perform the evaluation with a structure for the dynamic word problem for S \ C. When handling edits, we maintain a constant bound on the size of the groups, without introducing singleton groups. ◀

Thanks to Lemma 3.5, this allows us to settle the case of a non-regular maximal J-class, using the induction hypothesis to maintain the problem on S \ C.

Case of a regular maximal class.
We now consider a maximal J-class C that is regular. Consider the semigroup C^0 := C ∪ {0} for a fresh zero 0, i.e., the multiplication is that of C except that products falling outside of C are mapped to 0. Note that this 0 is unrelated to the zero which S was assumed to have; intuitively, the 0 of C^0 stands for combinations of elements that "fall" outside of C. Another way to see C^0 is as the quotient of S by the ideal S \ C, i.e., we identify all elements of S \ C to 0. By Prop. 4.35 of Chapter V of [23], we know that C^0 is a so-called 0-simple semigroup. By the Rees-Sushkevich theorem (Theorem 3.33 of [23]), C^0 is isomorphic to some Rees matrix semigroup with 0. This is a semigroup M^0(G, I, J, P) with I and J two non-empty sets, G a group called the structuring group, and P a matrix indexed by J × I having values in G ∪ {0}. The elements of the semigroup are I × G × J and 0, with 0x = x0 = 0 for any element x ∈ I × G × J, and for (i, g, j) and (i′, g′, j′) two elements of I × G × J, their product is 0 if p_{j,i′} = 0, and (i, g p_{j,i′} g′, j′) otherwise.

With this representation, the idea is to collapse together the maximal runs of consecutive elements of C whose product is not 0, i.e., does not "fall" outside of C. Once this is done, the product of two adjacent elements always falls in S \ C, so we can conclude with Lemma 3.6. However, we cannot do this in a naive fashion. For instance, if we insert a letter in the vEB at the middle of such a maximal run, we cannot hope to split the run and know the exact group annotation of the two new runs: this could amount to solving a prefix-Z_d problem. Instead, we must now use the fact that S is in SG, and derive the consequences of the equation in terms of the Rees-Sushkevich representation. Intuitively, the equation ensures that the structuring group G is commutative, and that annotations in G can "move around" relative to other elements in S without changing the evaluation. Formally:

▶ Claim 3.7.
The structuring group G is commutative.

▶ Claim 3.8.
Let r, s, t ∈ S* and (i, g, j), (i′, g′, j′) ∈ I × G × J. Write w = r (i, g, j) s (i′, g′, j′) t and w′ = r (i, gg′, j) s (i′, e, j′) t, where e is the neutral element of G. Then eval(w) = eval(w′).

This allows us to reduce from the dynamic word problem on S to the same problem while guaranteeing that the target word is pair-collapsing:

▶ Claim 3.9.
There is a constant-time reduction from the dynamic word problem for S on vEBs (of some span n) to the same problem on vEBs of span n where we additionally enforce that, at every step, the represented word is pair-collapsing.

Proof sketch.
We maintain a mapping where all maximal runs of word elements evaluating to C are collapsed to a single element, which we can evaluate following the Rees-Sushkevich representation. The tricky case is whenever an edit breaks up a maximal run into two parts: we cannot recover the G-component of the annotation of each part, but we use Claim 3.8 to argue that we can simply put it on the left part without altering the evaluation in S. ◀

This claim together with Lemma 3.6 implies that the dynamic word problem for S reduces to the same problem for S \ C, for which we use the induction hypothesis. This establishes the induction step and concludes the proof of Theorem 3.2.

ZG

We pursue our study of the dynamic word problem for monoids with the class ZG, introduced in [2] and defined by the equation x^{ω+1} y = y x^{ω+1} for all x, y. This asserts that elements of the form x^{ω+1}, which are the ones belonging to a subgroup of the monoid, are central, i.e., commute with all other elements. By the equations, and recalling that x^{ω+1} = x^ω x^{ω+1}, clearly ZG ⊆ SG. In this section, we will show an upper bound on the dynamic word problem for monoids in ZG, and a conditional lower bound for any monoid not in ZG.

Upper bound.
Recall the result on commutative monoids from [25]:

▶ Theorem 4.1 ([25]). The dynamic word problem for any commutative monoid is in O(1).

Our goal is to generalize it to the following result:

▶ Theorem 4.2.
The dynamic word problem for any monoid in ZG is in O(1).

This generalizes Theorem 4.1 (as commutative monoids are clearly in ZG) and covers other monoids, e.g., the monoid M = {1, a, b, ab, 0} with a² = b² = ba = 0, where it intuitively suffices to track the positions of a's and b's and compare them if there is only one of each.

We now prove Theorem 4.2. A semigroup S is nilpotent if it has a zero and there is some k > 0 such that S^k = {0}, i.e., all products of k elements are equal to 0. Alternatively [23, Chapter X, Section 4], S is nilpotent iff it satisfies the equation x^ω y = y x^ω = x^ω. We then define a monoid S^1 by adding an identity to S, i.e., a fresh element 1 such that 1x = x1 = x for all x ∈ S. Note that the monoid M given above is of this form. The variety generated by such monoids was studied by Straubing [26]. We can show:

▶ Proposition 4.3.
For any nilpotent semigroup S, the dynamic word problem for S^1 is in O(1).

Proof sketch.
We maintain a (non-sorted) doubly-linked list L of the positions holding a non-neutral element. As S is nilpotent, the evaluation is 0 unless only constantly many non-neutral letters remain (fewer than k of them), and in that case we can find them in O(1) with L. ◀

In [1] we show that ZG is generated by such monoids S^1 and by commutative monoids:

▶ Proposition 4.4 (Corollary 3.6 of [1]). The variety ZG is generated by commutative monoids and monoids of the shape S^1 for S a nilpotent semigroup.

Theorem 4.1 and Proposition 4.3 allow us to conclude the proof of Theorem 4.2, thanks to the above result and to closure under the operations of a variety (Proposition 2.1).
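The idea behind Proposition 4.3 can be sketched as follows. This is our own illustration: the multiplication table is assumed given as input, and a Python set stands in for the paper's doubly-linked list (both give constant-time insertion and removal of a position).

```python
class NilpotentDWP:
    """O(1) dynamic word problem for S^1, where S is nilpotent with S^k = {0}.

    We track the set of positions holding a non-neutral letter: if there are
    at least k of them, the evaluation is 0 (any product of k elements of S
    is 0); otherwise only constantly many letters matter and we multiply them
    out directly.
    """
    def __init__(self, word, mult, one, zero, k):
        self.mult, self.one, self.zero, self.k = mult, one, zero, k
        self.word = list(word)
        self.support = {i for i, a in enumerate(word) if a != one}

    def substitute(self, i, a):
        """Set the i-th letter to a, in O(1)."""
        self.word[i] = a
        if a == self.one:
            self.support.discard(i)
        else:
            self.support.add(i)

    def evaluation(self):
        """Evaluation of the word in S^1, in O(1) since |support| < k below."""
        if len(self.support) >= self.k:
            return self.zero
        result = self.one
        for i in sorted(self.support):   # constant-size set, so O(1)
            result = self.mult[result][self.word[i]]
        return result
```

As a toy instance, take S = {a, 0} with aa = 0 (so k = 2): the evaluation of a word over S^1 = {1, a, 0} is a if exactly one letter is a, and 0 as soon as two letters are non-neutral.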
Lower bound.
We now show a conditional lower bound on the dynamic word problem for monoids outside of ZG. To do this, we will reduce from the prefix-U_1 problem:

▶ Theorem 4.5.
For any monoid M in SG \ ZG, there is a constant-time reduction from the prefix-U_1 problem to the dynamic word problem for M.

Proof sketch.
We show that ZG = SG ∩ ZE, and show for any monoid not in ZE that we can encode the prefix-U_1 problem using carefully chosen elements. ◀

By Conjecture 2.3, and together with Theorem 3.1 for the monoids not in SG, this implies a conditional super-constant lower bound for monoids outside ZG.

Semigroups

We have classified the complexity of the dynamic word problem for monoids: it is in O(log log n) for monoids in SG, in O(1) for monoids in ZG, in Ω(log n / log log n) for monoids not in SG, and non-constant for monoids not in ZG conditionally to Conjecture 2.3. In this section, we extend our results from monoids to semigroups.

Sub-monoids and local monoids. A sub-monoid of a semigroup S is a subset of the semigroup which is stable under its composition law and is a monoid. We first notice via Proposition 2.1 that a semigroup that contains a hard sub-monoid is also hard:

▶ Claim 5.1.
The dynamic word problem for any sub-monoid of a semigroup S reduces in constant time to the same problem for S.

We will investigate if studying the sub-monoids of a semigroup suffices to understand the complexity of its dynamic word problem. To do so, we focus on a certain kind of sub-monoids: the local monoids. A sub-monoid N of S is a local monoid if there exists an idempotent element e of S such that N = eSe, i.e., N is the set of elements that can be written as ese for some s ∈ S. The point of local monoids is that every sub-monoid T of S is a sub-monoid of a local monoid: indeed, taking e the neutral element of T, all elements of T can be written as eT e ⊆ eSe, and eSe is a local monoid. For V a variety of monoids, we denote by LV the class of semigroups such that all local monoids are in V. As we explained, this is equivalent to imposing that all sub-monoids are in V (using the fact that V is closed by sub-monoid).

Case of SG. We now revisit our results on monoids to extend them to semigroups, starting with SG. We show that a semigroup where all local monoids are in SG must itself be in SG:

▶ Claim 5.2.
We have
LSG = SG as varieties of semigroups. Semigroups in SG are already covered by our upper bound (Theorem 3.2), and semigroups not in LSG have a sub-monoid not in SG, so we can use Claim 5.1 and Theorem 3.1. Hence: ▶ Corollary 5.3.
Let S be a semigroup. If S is in SG, then the dynamic word problem for S is in O(log log(n)). Otherwise, the dynamic word problem for S is in Ω(log n / log log n). Case of ZG. The variety ZG is not equal to LZG. For instance, let S be the syntactic semigroup of a∗b∗, that is, the semigroup {a, b, ab, 0} defined by aa = a, bb = b, and ba = 0. It is not in ZG, since a and b are idempotents that do not commute. However, its local monoids are either trivial or U, so they are all in ZG, showing that this semigroup is in LZG. Still, we can extend our characterization from monoids to semigroups up to studying LZG: ▶ Theorem 5.4. Let S be a semigroup. If S is in LZG, then the dynamic word problem for S is in O(log log(n)). Otherwise, unless prefix-U is in O(1), the dynamic word problem for S is not in O(1). The second part of the claim is by Claim 5.1 and Theorem 3.1, but the first part is much trickier. It uses an alternative definition of LZG via the semidirect product operator [27], a very technical locality result asserting that this is indeed equal to LZG, and an algorithm for the alternative definition. We prove Theorem 5.4 in the rest of this section. Given two semigroups S and T, a semigroup action of S on T is given by a map act: S × T → T such that act(s1, act(s2, t)) = act(s1 s2, t) and act(s, t1 t2) = act(s, t1) act(s, t2). We then define the product ∘act on the set T × S as follows: for all s1, s2 in S and t1, t2 in T, we have: (t1, s1) ∘act (t2, s2) := (t1 act(s1, t2), s1 s2). The set T × S equipped with the product ∘act is a semigroup called the semidirect product of S by T, denoted T ∘act S. We say that a semigroup D is definite if there exists an integer k ∈ N such that for all y, x1, ..., xk in D, we have y x1 ⋯ xk = x1 ⋯ xk. Alternatively, a semigroup is definite iff it satisfies the equation y x^ω = x^ω [27, Proposition 2.2] for all x, y. In particular, every nilpotent semigroup is definite. We write D for the variety of definite semigroups. Our alternative definition of LZG will be the variety of semigroups ZG ∗ D generated by semigroups that are the semidirect product of a ZG monoid by a definite semigroup. The variety ZG ∗ D of semigroups is not immediately related to the variety LZG defined above. One can easily show that ZG ∗ D ⊆ LZG, but the other direction is much more challenging to establish. We show this as a so-called locality theorem, which we defer to a separate paper [1] because it uses different tools and is of possible independent interest: ▶ Theorem 5.5 ([1], Theorem 1.1). We have: ZG ∗ D = LZG. To conclude the proof of Theorem 5.4, by the locality theorem above, it suffices to solve the dynamic word problem for semigroups in ZG ∗ D. By Proposition 2.1, it suffices to consider the semigroups that generate the variety. We do this below, establishing Theorem 5.4: ▶ Proposition 5.6.
Let S be a definite semigroup, let T be a semigroup of ZG, and let act be an action of S on T. There is a constant-time reduction from the dynamic word problem for the semigroup T ∘act S to the same problem for T. Proof sketch. We express the product of the word elements as a product involving elements of T and prefix sums of elements of S, which we can maintain in O(1). ◀ We now turn to our results on the dynamic membership problem for regular languages, and show Theorem 1.1 using the three previous sections and some extra algebraic results.
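To make the semidirect product used above concrete, here is a minimal Python sketch; the instantiation of T, S and act below is our own toy assumption, not an example from the paper. It builds the law ∘act from multiplication tables and checks that it is associative, i.e., that T ∘act S is indeed a semigroup.

```python
from itertools import product

def semidirect_product(T, S, mul_T, mul_S, act):
    """Return the law (t1, s1) ∘_act (t2, s2) = (t1 · act(s1, t2), s1 s2)."""
    def mul(x, y):
        (t1, s1), (t2, s2) = x, y
        return (mul_T(t1, act(s1, t2)), mul_S(s1, s2))
    return mul

# Toy instance (our own illustrative choice):
# T = (Z_2, +), a commutative monoid, hence in ZG;
# S = {0, 1} with the right-zero law s1 · s2 = s2, a definite semigroup
#     (it satisfies y x = x, i.e., the definite equation with k = 1);
# act = the trivial action of S on T (each act(s, ·) is an endomorphism).
T = [0, 1]
S = [0, 1]
mul_T = lambda a, b: (a + b) % 2
mul_S = lambda a, b: b          # right-zero: the product forgets the left factor
act = lambda s, t: t            # trivial action

mul = semidirect_product(T, S, mul_T, mul_S, act)

# Sanity check: ∘_act is associative, so T ∘_act S is indeed a semigroup.
elems = list(product(T, S))
assert all(mul(mul(x, y), z) == mul(x, mul(y, z))
           for x in elems for y in elems for z in elems)
```

The same helper can be used to experiment with non-trivial actions, as long as the two action equations stated above are verified by `act`.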
Connection to the dynamic word problem.
A regular language L is recognized by a finite monoid M if there exists a morphism η: Σ∗ → M such that L = η⁻¹(η(L)). The syntactic monoid of L is the least monoid M that recognizes L, and the morphism mapping Σ∗ to M is called the syntactic morphism. The dynamic membership problem for a language clearly reduces to the dynamic word problem for its syntactic monoid. Yet, the converse is not true: the language L := (aa)∗ba∗ has a syntactic monoid that can be shown to be outside of SG, but we can solve dynamic membership in O(1) by counting the b's at even and odd positions. Intuitively, the syntactic monoid has a neutral element e so that its dynamic word problem has a reduction from prefix-Z, but e is not achieved by a letter of the alphabet, so dynamic membership is easier. We extend our results to languages using the stable semigroup [4, 5]. This allows us to remove the neutral element (as it is a semigroup, not a monoid) and ensures that all semigroup elements can be achieved by subwords of some constant length (the stability index). Formally, let L be a regular language and η: Σ∗ → M its syntactic morphism. The powerset of M is the monoid whose elements are the subsets of M, where for E, F ⊆ M the product EF is {xy | x ∈ E, y ∈ F}. The stability index of L is the idempotent power s of η(Σ) in the powerset monoid. Intuitively, the choice of s ensures that, for any two words w1, w2 ∈ Σ^s, the effect η(w1 w2) of their concatenation in the monoid can be achieved by another word of Σ^s, i.e., η(w1 w2) = η(w3) for some w3 ∈ Σ^s. The stable semigroup of L is then S := η(Σ^s), and it is a sub-semigroup of M, since (η(Σ^s))² = η(Σ^s). For any class of semigroups V, we denote by QV the class of languages whose stable semigroup is in V. Upper bounds.
We first show that the dynamic membership problem for a regular language reduces more specifically to the dynamic word problem for its stable semigroup: ▶ Proposition 6.1.
Let L be a regular language. The dynamic membership problem for L reduces in constant time to the dynamic word problem for the stable semigroup of L . Proof sketch.
We partition the input word into chunks of size s (plus one of size ≤ s), for s the stability index, and feed them to the data structure for the stable semigroup of L. ◀ Thanks to Corollary 5.3 and Theorem 5.4, this implies that languages in
QSG (resp., in
QLZG ) have a dynamic membership problem in O (log log n ) (resp., in O (1)). Lower bound.
We now show that languages whose stable semigroup is not in LV admit a reduction from the dynamic word problem of a monoid not in V.
Let V be a variety of monoids and let L be a regular language not in QLV. There is a monoid not in V whose dynamic word problem reduces in constant time to the dynamic membership problem for L. Proof sketch. If L is not in QLV, then its stable semigroup contains a sub-monoid M not in V, and all its elements can be achieved by blocks of s letters, for s the stability index. ◀ Again by Corollary 5.3 and Theorem 5.4, we deduce that languages outside of QSG have complexity at least Ω(log n / log log n), and languages outside of QLZG do not have complexity O(1) assuming Conjecture 2.3. Intermediate cases.
Our O(log log n) upper bound in Theorem 3.2 and its variants may not be tight. Still, we can identify a language L_U in QSG \ QLZG for which we show that the dynamic membership problem is in Θ(log log n) (even allowing randomization and allowing a probably correct answer), because the prefix-U problem reduces to it. We can also identify a language of QSG \ QLZG that reduces to prefix-U and so can be solved in expected O(√(log log n)). This shows that languages in QSG \ QLZG have different complexity regimes, at least when allowing randomization. ▶ Proposition 7.1. There is a language L_U in QSG \ QLZG which is equivalent to prefix-U under constant-time reductions, and a language L_U in QSG \ QLZG which is equivalent to prefix-U under constant-time reductions. Deciding which case applies.
A natural question about our results is that of efficiently identifying, given a regular language, which case of Theorem 1.1 applies, or in particular of determining, given an input monoid, whether it is in SG, or in ZG. This depends on how the input is represented. If we are given a monoid explicitly (as a table of its operations), then the equations of ZG and of SG can be checked in polynomial time. If the monoid is represented more concisely as the transition monoid of some automaton, then the verification can be performed in PSPACE. We do not know if the problems are PSPACE-hard, though this seems likely at least for SG because of its proximity to aperiodic monoids [6]. We leave open the precise complexities of this task, in particular for the L and Q operators. Handling insertions and deletions.
Another natural question is to handle insertion and deletion updates, i.e., inserting a letter a at position k transforms the word w1 ⋯ w(k−1) wk ⋯ wn into w1 ⋯ w(k−1) a wk ⋯ wn, and deleting at position k does the opposite. Any regular language can be maintained under such updates in O(log n) using a Fenwick tree, but they make the problem much harder for most languages. For example, if the alphabet has two letters a and b, just testing if the maintained word contains an a requires Ω(log n / log log n) by [16]. This is why we do not study such updates in this work. Interestingly, notice that our algorithm in Theorem 3.3 supports insertions and deletions on words represented as vEBs, but the semantics are different (they use explicit positions in a fixed range). Infix queries.
A natural extension of dynamic membership for a regular language L is the dynamic infix membership problem, where we can query any infix of the word (identified by its endpoints) and ask whether it is in L. The O(log n / log log n) algorithm of Theorem 2.5 supports this, and so does the O(log log n) algorithm of [25] for group-free monoids. However, the infix problem can be harder. Consider for instance the language L on {a, b} of words with an even number of a's. Dynamic membership has complexity O(1) because L is commutative, but infix queries (or even prefix queries) require Ω(log n / log log n), as prefix-Z reduces to them. We leave open the study of the complexity of the infix problem. We note, however, that this problem for a language L can be studied as the dynamic membership problem for a regular language defined from L. So our results cover the infix problem via this transformation; we leave to future work a characterization of the resulting classes. ▶ Claim 7.2. For any fixed regular language L, the dynamic infix membership problem is equivalent up to constant-time reductions to the dynamic membership problem for the language Σ∗ x L x Σ∗, where x is a fresh letter. Other open questions.
A natural question for future work would be to study the complexity of our problems in weaker models, e.g., pointer machines [28], or machines with counters. One could also extend our study to languages that are not regular: one result in this direction is a lower bound on maintaining the language of well-parenthesized strings [17, Proposition 1].
References
1. Antoine Amarilli and Charles Paperman. Locality and centrality: The variety ZG. Available online: https://a3nm.net/publications/amarilli2021locality.pdf. Also submitted to ICALP'21, 2021.
2. Karl Auinger. Semigroups with central idempotents. In Algorithmic problems in groups and semigroups. Springer, 2000.
3. Andrey Balmin, Yannis Papakonstantinou, and Victor Vianu. Incremental validation of XML documents. TODS, 29(4), 2004.
4. David A. Mix Barrington, Kevin J. Compton, Howard Straubing, and Denis Thérien. Regular languages in NC¹. J. Comput. Syst. Sci., 44(3), 1992.
5. Laura Chaubard, Jean-Éric Pin, and Howard Straubing. First order formulas with modular predicates. In LICS, 2006.
6. Sang Cho and Dung T. Huynh. Finite-automaton aperiodicity is PSPACE-complete. Theoretical Computer Science, 88(1), 1991. doi:10.1016/0304-3975(91)90075-d.
7. Christian Choffrut and Serge Grigorieff. Separability of rational relations in A∗ × N^m by recognizable relations is decidable. Inf. Process. Lett., 99(1), 2006. doi:10.1016/j.ipl.2005.09.018.
8. Raphael Clifford, Allan Grønlund, Kasper Green Larsen, and Tatiana Starikovskaya. Upper and lower bounds for dynamic data structures on strings. arXiv preprint arXiv:1802.06545, 2018.
9. Assis de Azevedo. The join of the pseudovariety J with permutative pseudovarieties. In Lattices, Semigroups, and Universal Algebra, pages 1–11. Springer, 1990.
10. Michael Fredman and Michael Saks. The cell probe complexity of dynamic data structures. In STOC, 1989.
11. Moses Ganardi, Danny Hucke, Daniel König, Markus Lohrey, and Konstantinos Mamouras. Automata theory on sliding windows. In STACS, 2018.
12. Moses Ganardi, Danny Hucke, and Markus Lohrey. Querying regular languages over sliding windows. In FSTTCS, 2016.
13. Moses Ganardi, Danny Hucke, Markus Lohrey, and Tatiana Starikovskaya. Sliding window property testing for regular languages. arXiv preprint arXiv:1909.10261, 2019.
14. Kasper Green Larsen, Jonathan Lindegaard Starup, and Jesper Steensgaard. Further unifying the landscape of cell probe lower bounds. In SOSA, 2021.
15. Yijie Han and Mikkel Thorup. Integer sorting in O(n √(log log n)) expected time and linear space. In FOCS, 2002.
16. Louis (https://cstheory.stackexchange.com/users/45122/louis). Computing and maintaining the minimum of a set S of integers while allowing updates on S. Theoretical Computer Science Stack Exchange. URL: https://cstheory.stackexchange.com/q/47831 (version: 2020-11-08).
17. Thore Husfeldt and Theis Rauhe. Hardness results for dynamic problems by extensions of Fredman and Saks' chronogram method. In ICALP, 1998.
18. Eugene Kirpichov. Incremental regular expressions. http://jkff.info/articles/ire/. Code available at: https://github.com/jkff/ire, 2012.
19. M. L. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. In SFCS, 1982. doi:10.1109/SFCS.1982.39.
20. Mihai Pătrașcu and Corina E. Tarniță. On dynamic bit-probe complexity. Theoretical Computer Science, 380(1-2), 2007.
21. Mihai Pătrașcu and Mikkel Thorup. Randomization does not help searching predecessors. In SODA, 2007.
22. J.-E. Pin. Varieties of formal languages. Foundations of Computer Science. Plenum Publishing Corp., New York, 1986. With a preface by M.-P. Schützenberger, translated from the French by A. Howie. doi:10.1007/978-1-4613-2215-3.
23. Jean-Éric Pin. Mathematical foundations of automata theory. 2019.
24. John Rhodes. The fundamental lemma of complexity for arbitrary finite semigroups. Bulletin of the American Mathematical Society, 74(6), 1968. doi:10.1090/s0002-9904-1968-12064-6.
25. Gudmund Skovbjerg Frandsen, Peter Bro Miltersen, and Sven Skyum. Dynamic word problems. JACM, 44(2), 1997.
26. Howard Straubing. The variety generated by finite nilpotent monoids. Semigroup Forum, 24(1), 1982. doi:10.1007/bf02572753.
27. Howard Straubing. Finite semigroup varieties of the form V∗D. Journal of Pure and Applied Algebra, 36, 1985.
28. Robert Endre Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. Journal of Computer and System Sciences, 18(2):110–127, 1979. doi:10.1016/0022-0000(79)90042-4.
29. Mikkel Thorup. Equivalence between priority queues and sorting. JACM, 54(6), 2007.
30. Peter van Emde Boas, Robert Kaas, and Erik Zijlstra. Design and implementation of an efficient priority queue. Mathematical systems theory, 10(1), 1976.
[Figure 1: a lattice of the classes ACom, Com, MNil, ZG, ZG ∨ A, A, ZE, SG, All, with their defining equations (Com: xy = yx; A: x^{ω+1} = x^ω; ACom = Com ∧ A; MNil = ZG ∧ A; ZE: x^ω y = y x^ω; ZG: x^{ω+1} y = y x^{ω+1}, with ZG = SG ∩ ZE = MNil ∨ Com; SG: x^{ω+1} y x^ω = x^ω y x^{ω+1}), partitioned into complexity regions: O(1); O(log log n) and prefix-U-hard; Θ(log n / log log n) and prefix-Z_d-hard.] Figure 1
Complexity of the dynamic word problem for common classes of monoids. Arrows denote inclusion and are labeled with languages (with an implicit neutral letter e) whose syntactic monoids separate the classes. The classes ZG and SG are maximal for the O(1) region and the O(log log n) region, respectively.
A Proofs for Section 2 (Preliminaries and Problem Statement)
A.1 Proof of Proposition 2.1
▶ Proposition 2.1.
Let S and T be finite semigroups. The dynamic word problem of subsemigroups or quotients of S reduces in constant time to the problem for S, and the dynamic word problem of S × T reduces to those of S and T. Proof.
In the case of subsemigroups, we can simply solve the dynamic word problem for the subsemigroup by performing the computations in S. In the case of quotients, we can represent each element of the quotient by some choice of representative element in S, perform the computation in S, and check in which equivalence class of the quotient the result in S falls. For the product, we can simply maintain the structures for both S and T simultaneously. ◀
A.2 Proof of Theorem 2.4
▶ Theorem 2.4 (Adapted from [21]). Any structure for prefix-U on a word of length n must be in Ω(log log n). To prove this result, we first introduce prerequisites on the colored predecessor problem. We then explain how it relates to the prefix-U problem. We then derive a lower bound for prefix-U. Colored predecessor problem.
The colored predecessor problem is a problem parameterized by n. In this problem, we have a set S ⊆ {1, ..., n} and a color (black or white) for each element of S, and we must answer queries asking for the color c_y of the biggest element y ∈ S such that y ≤ x, for x a parameter of the query. In the static version of the problem, the set S is given in advance and we are allowed to compute an index before receiving the query. The time complexity is then measured only in terms of the time required to answer the query. In [21], it has been proved that, in the worst case, the static colored predecessor problem for S a subset of {1, ..., n} cannot be solved in less than Ω(log log n) query time if the space allowed is bounded by n^{1+o(1)}. This lower bound holds in the cell probe model, and still holds even if we allow randomization and the answer to be only probably correct. Reduction from the static colored predecessor problem to prefix-U. Now let us suppose that we have a maintenance scheme for prefix-U with complexity d(n). Let S ⊆ {1, ..., n} be an instance of the static colored predecessor problem, and let us show that we can build a data structure for S that takes O(n × d(n)) space and can answer predecessor queries with O(d(n)) cell probes. If we manage to do that, then either d(n) = n^{o(1)}, in which case the space is n^{1+o(1)} and the static lower bound gives d(n) = Ω(log log n), or d(n) ≠ n^{o(1)}, in which case d(n) = Ω(log log n) trivially. In both cases we have d(n) = Ω(log log n). Lower bound for prefix-U. Since S ⊆ {1, ..., n}, we create a word w of length n where all letters are the neutral element of U. Then we perform n substitution updates on w so that the i-th letter of w is a when i ∈ S and i has color White, b when i ∈ S and i has color Black, and the neutral element of U when i ∉ S. The index for prefix-U on w now enables us to answer predecessor queries on S: the prefix of length k gives the color
of the predecessor query with parameter k (White when it is a, Black when it is b, and k has no predecessor in S when it is the neutral element). Since queries on this index take d(n) time, we do have a scheme for the static predecessor problem with query time d(n), but, before concluding, we need to make sure that the space usage is suitably bounded. This is not obvious for two reasons: first because the initialization of the dynamic algorithm might modify O(n) memory cells during the initialization (or even more if we allow a superlinear preprocessing time), and second because during each update the dynamic algorithm might access a memory cell at an arbitrarily large address. We now explain how to handle these two issues. Reducing the memory footprint.
To handle the first issue, we can notice that the whole preprocessing computation depends only on n, and therefore, if a cell has not been modified since the initialization of the dynamic algorithm, then we can recover its value from n. For the second issue, we can notice that each update takes d(n) time, and therefore the total number of memory cells that have been modified during one of the updates is bounded by d(n) × n. Therefore, using a perfect hashing scheme [19] with O(d(n) × n) memory cells, we can retrieve in O(1), for each address i, whether i was modified during an update and, if it has been modified, what its current value is. All in all, we can modify the query function for our dynamic problem in the following way: whenever the dynamic algorithm wants to read the cell at address i, we use the perfect hashing scheme to determine if i has been modified since the initialization; if it has been modified we retrieve its value, and otherwise we recompute the value, which can be deduced from n. Such an algorithm makes O(d(n)) cell probes over a memory of size O(n × d(n)). This is what we wanted to obtain, concluding the proof.
A.3 Proof of Theorem 2.5
▶ Theorem 2.5 ([25]). For any monoid M, the dynamic word problem and prefix problem for M are in O(log n / log log n). We first show the claim for the dynamic word problem. Let (M, ⊙) be a monoid and w = w1 ... wn be a word. Let us show that we can maintain the value ν(w) = w1 ⊙ ⋯ ⊙ wn after substitution updates on w in O(log n / log log n). M will be considered fixed here, and all complexities will thus only depend on n, the length of the word w to be maintained. We denote by G_M the (infinite) directed graph whose nodes are the elements of M∗ and where a node (m1, ..., mℓ) has ℓ × |M| outgoing edges, each labeled with a different element of {1, ..., ℓ} × M. For a node (m1, ..., mℓ), the edge labeled (i, v) goes to (m1, ..., m(i−1), v, m(i+1), ..., mℓ).
Finally, the value of the node (m1, ..., mℓ) is defined as m1 ⊙ ⋯ ⊙ mℓ. We denote by G^k_M the restriction of G_M to nodes (m1, ..., mℓ) with ℓ ≤ k. G^k_M has at most |M|^k × k nodes and the degree of each node is bounded by |M| × k, so there exists γ (independent of n) such that for k = ⌊γ × log(n)⌋, the graph G^k_M can be computed in O(n^{1/2}) and stored with O(n^{1/2}) space (in fact, for any ε > 0, we could achieve n^ε by changing γ). We can also ensure that the value of each node is pre-computed (and takes constant time to access) and that retrieving the (i, v)-labeled neighbor of a given node takes constant time. ▶ Claim A.1.
Once G^k_M is computed, we have a scheme S_k(n) with O(log_k(n)) update time that allows us to maintain the value ν(w) under substitution updates on w, a word composed of n letters. Proof.
This idea is based on a modification of the Fenwick tree data structure with a branching factor of k. The proof works by induction. Each scheme S_k(n) will maintain a node in G^k_M such that the product of the elements of the word is equal to the value of this node (which takes O(1) to retrieve). For 0 ≤ n ≤ k, we store the node (c1, ..., cn) where c1 ... cn is the word to maintain. We deal with substitutions using the transitions of the graph: to update the element at position j to the element v, we simply replace the current node with its (j, v)-neighbor. For n > k, we write R for the biggest power of k that is strictly less than n, and we split the word into w1, ..., wℓ such that |w1| = ⋯ = |w(ℓ−1)| = R (note that ℓ ≤ k). At each step the current node will correspond to (ν(w1), ..., ν(wℓ)). For each of the words w1, ..., wℓ, we use a scheme S_k(R) to maintain the value ν(w_i). To update w at position j, we update the subword w_i with i = ⌊j/R⌋ at position j − i × R, which gives us the new value of ν(w_i); then we update the current node by replacing it with its (i, ν(w_i))-neighbor. ◀ Notice that the initialization time is linear, and each update makes log_k(n) recursive calls, each in constant time. To finish, let us recall that, for k = γ × log(n), the computation of G^k_M is in O(n^{1/2}) and each query is in time O(log(n) / log(γ × log(n))) = O(log(n) / log(log(n))). This concludes the proof of Theorem 2.5. In terms of memory usage, the data structure uses O(n/k), that is, O(n / log n). Note that this is sublinear thanks to the power of the RAM model. This is because nodes at the lowest level compress a factor of size k of the original word.
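The recursive scheme of Claim A.1 can be sketched as follows; this is our own simplified illustration, not the paper's implementation. In particular, instead of precomputing the graph G^k_M (which makes each level of the recursion O(1)), each node here recomputes its at-most-k-ary product directly, so an update costs O(k) per level; the tree shape and the O(log_k n) recursion depth are as in the proof.

```python
class KaryProductTree:
    """Maintain nu(w) = w_1 ⊙ ... ⊙ w_n under substitutions, using a k-ary
    tree in the spirit of a Fenwick tree with branching factor k."""

    def __init__(self, word, k, mul, unit):
        self.k, self.mul, self.unit = k, mul, unit
        n = len(word)
        if n <= k:                      # base case: the node stores the word itself
            self.children = None
            self.block = list(word)
        else:
            R = k                       # R: the biggest power of k with R < n
            while R * k < n:
                R *= k
            self.R = R
            # split into <= k subwords of length R (the last may be shorter;
            # this relaxation does not affect the argument)
            self.children = [KaryProductTree(word[i:i + R], k, mul, unit)
                             for i in range(0, n, R)]
            self.block = [c.value() for c in self.children]
        self._recompute()

    def _recompute(self):
        # product of <= k elements; O(1) in the paper via the graph G^k_M
        v = self.unit
        for x in self.block:
            v = self.mul(v, x)
        self._value = v

    def value(self):
        return self._value

    def update(self, j, a):             # substitution: set w_j := a (0-indexed)
        if self.children is None:
            self.block[j] = a
        else:
            i = j // self.R             # subword containing position j
            self.children[i].update(j - i * self.R, a)
            self.block[i] = self.children[i].value()
        self._recompute()


# Example with the monoid (Z_3, +); any associative `mul` with a unit works.
word = [i % 3 for i in range(20)]
tree = KaryProductTree(word, k=3, mul=lambda a, b: (a + b) % 3, unit=0)
assert tree.value() == sum(word) % 3
tree.update(7, 2)
word[7] = 2
assert tree.value() == sum(word) % 3
```

Replacing the per-node loop by a lookup into a precomputed table indexed by the node contents recovers the O(log_k n) = O(log n / log log n) update time of the theorem.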
In particular, this memory usage is no more than linear in the size of the original word. We now turn to the prefix problem, and show that the structure can be used more generally to support prefix queries (see Section 7). By lowering the γ parameter, we can precompute, for each node (m1, ..., mℓ) of G^k_M and each 1 ≤ i ≤ j ≤ ℓ, the value m_i ⊙ ⋯ ⊙ m_j. Then, in a scheme S_k(n), to compute the element of the monoid corresponding to the factor (i, j), either positions i and j belong to the same subword, in which case we recurse on this subword, or the factor (i, j) can be decomposed into a strict suffix of some subword w_i′, followed by a possibly empty contiguous list of subwords, followed by a strict prefix of a subword w_j′. If they are not empty, we recurse on w_i′ and w_j′ to get the values of the strict suffix and prefix parts, and using our precomputation we get the value for the list of subwords. By composing the obtained values, we get the effect of the factor (i, j). Note that a subword query on a scheme might trigger two recursive calls, but this can only happen if the subword query is neither a prefix nor a suffix query, and the two recursive calls it makes are then a prefix query and a suffix query. Therefore the overall algorithm does have the expected O(log(n) / log(log(n))) complexity.
B Proofs for Section 3 (Dynamic Word Problem for Monoids in SG)
B.1 Details on Van Emde Boas Trees
In this appendix, we give some details about the vEB data structure. It supports the following operations: insert(x, a), which inserts the integer x ∈ {1, ..., n} into X (with x ∉ X) and sets µ(x) := a; delete(x), for x ∈ X, which removes the integer x from X (and from the domain of µ); retrieve(x), which returns the value µ(x) if x is in X and a special value otherwise;
findPrev(y), for y ∈ {1, ..., n}, which returns the biggest x ∈ X such that x ≤ y, and returns a special value if no such element exists, i.e., y is smaller than all elements of X; and findNext(y), which works like findPrev but returns the smallest x ∈ X such that x ≥ y. The original definition [30] of vEB creates a tree with O(n × log log n) cells and O(n × log log n) initialization time. We present here an adaptation of the vEB tree that reduces the memory usage and the initialization to O(n), to guarantee that our preprocessing is linear. Linear initialization of vEB.
For this, we will reduce the predecessor problem over the range {1, ..., n} to the predecessor problem over the range {1, ..., K(n)} with K(x) = ⌈x / log log n⌉. Note that the division here can be performed in O(1), for instance by filling sequentially a table of its results over {1, ..., n}, as the arguments will always be in that range. The idea is to create an array T of size n that will store the mapping µ, and a vEB V storing keys over the range {1, ..., K(n)}. If we want to initialize our modified vEB with an empty set, we start with V empty and T filled with a special value ⊥ indicating that the domain of the associative array is empty. We then perform the vEB operations on this structure in the following way: for an operation insert(x, a), we set T[x] := a and we insert into V the integer K(x) (if it is not already present); for an operation retrieve(x), we return T[x]; for an operation delete(x), we set T[x] := ⊥ and, if there is no y ∈ K⁻¹(K(x)) such that T[y] ≠ ⊥, we call delete(K(x)) on V; for findPrev(x), we start by looking among the y ∈ K⁻¹(K(x)) with y ≤ x and T[y] ≠ ⊥. If no such y exists, we set p to the result of findPrev(K(x) −
1) on V. If p = ⊥, it means that x has no predecessor, and if p is set, we find the predecessor of x among the elements of K⁻¹(p); a successor query can be done in a similar fashion. To initialize the modified vEB in O(n) with a set X, storing the function µ, we do something similar to performing |X| insertions, except that we first do all the modifications in T without inserting into V, and afterwards we do all the insertions into V, making sure that each key is inserted once. The total initialization time is thus O(n) for the modifications in T, followed by at most K(n) insertions into V, each taking O(log log n). Thus, the initialization time is O(n) overall. Note that, by construction, for all x we have |K⁻¹(x)| ≤ ⌈log log n⌉, and thus all operations on our new vEB run in O(log log n).
B.2 Proof of Theorem 3.2
▶ Theorem 3.3.
Let S be a semigroup in SG. The dynamic word problem for S on a vEB of span n is in O(log log n). We start with a preliminary remark pointing out that SG is not equal to Com ∨ A, as is illustrated in Figure 1: ▶ Remark B.1.
Remark that SG contains both Com and A, the variety of aperiodic monoids. From [25], we know that both of those varieties have a dynamic word problem in O(log log n). Hence, their join Com ∨ A, i.e., the variety that they generate (which is not illustrated in Figure 1), also has a word problem in O(log log n), since the complexity of this problem is preserved by the operations of a variety (Proposition 2.1). However, SG is in fact not equal to this variety. First remark that both Com and A monoids satisfy the equation (xyz)^{ω+1} (xzy)^ω = (xyz)^ω (xzy)^{ω+1}. Therefore, the variety they generate also satisfies this equation. However, the language ((abc)²)∗((acb)²)∗ does not satisfy it by construction. Still, it is in SG and thus has complexity in O(log log(n)) by our results. This remark will be refined in the next appendix section (see Remark C.1) to show that SG is also larger than Com ∨ ZG, for the class ZG defined in Section 4. In the rest of this appendix, we successively prove the results needed for the proof of Theorem 3.2. ▶ Claim 3.4.
For any x, y of S with xy ∈ C, we have x ∈ C and y ∈ C; moreover, S \ C is stable under products with arbitrary elements of S. Proof. We first show the first claim. Note that we have SxyS ⊆ SxS: this is because any element of SxyS, say s·xy·t for s, t ∈ S, can be written as sx(yt), so it is in SxS also. Thus, we have xy ≤_J x. But as C is a maximal J-class, we must have x ≤_J xy. So x and xy are in the same J-class, and x ∈ C. The same reasoning shows that y ∈ C. We now show the second claim. Let x ∈ S \ C and y ∈ S. Let us show that xy ∈ S \ C. Indeed, if xy ∈ C, then x ∈ C by the previous claim, a contradiction. The same reasoning shows that yx ∈ S \ C. ◀ ▶ Lemma 3.5.
Let C be a maximal J-class. If C is non-regular, then any word is pair-collapsing: for any x, y ∈ C, we have xy ∈ S \ C. Proof. We proceed by contraposition and show that if there exist x, y ∈ C such that xy ∈ C, then C is regular. Assume that we have x, y ∈ C such that xy ∈ C. We then know that x and xy are in the same J-class, so SxS = SxyS. Thus, as x ∈ SxS, there exist s, t ∈ S such that sxyt = x. By reinjecting the left-hand side into itself, this equation implies that s^i x (yt)^i = x for all i ∈ N. Let ω be the idempotent power of S. We have s^ω x (yt)^ω = x. As x ∈ C, Claim 3.4 implies that s^ω ∈ C. Hence, C contains an idempotent element, so it is regular. ◀ ▶ Lemma 3.6.
Let S be a semigroup and let C be a maximal J-class of S. Consider the dynamic word problem for S on vEBs of some span n, where we assume that, at every step, the represented word is pair-collapsing for C. Then there is a constant-time reduction from that problem to the same problem for S \ C on vEBs of span n. Proof.
We will always refer to w to mean the word on S (represented as a vEB with some span), and w′ the word on S \ C (represented as a vEB with the same span). Recall that the case where w is empty is special (as the image may not be representable in the semigroup S). The case where w contains only a single element is also special, as the result may not be in S \ C. Up to maintaining a count of the number of letters of w, we can handle this special case easily: when the number of letters of w becomes equal to 1, we locate the remaining letter using the vEB, and this gives us immediately the evaluation of w. So we assume in the sequel that w contains at least 2 elements.
We will maintain a function ψ, called a position mapping, from the positions of w to those of w′, with the following requirements:
- ψ is surjective: every position of w′ is reached by ψ;
- ψ is nondecreasing;
- ψ is bounded injective: there are at most 3 positions of w mapping to any position of w′;
- ψ is pair-grouping: there are at least 2 positions of w mapping to any position of w′;
- ψ preserves evaluations: for every position (i, a) of w′, the letter a (an element of S \ C) is exactly the composition of the letters a_1, …, a_n of the pairs (i_1, a_1), …, (i_n, a_n) of the vEB w that are mapped to (i, a) by ψ: because ψ is nondecreasing, this is a segment of successive letters in w (and 2 ≤ n ≤ 3);
- for every position (i, a) in the image of ψ, the index i is the index of the last letter of w that is mapped to i by ψ.
Thanks to the fourth condition, as the word is pair-collapsing, we know that the fifth condition indeed yields an evaluation in S \ C. Also note that the position mapping (specifically conditions 1, 2 and 5) guarantees that w and w′ indeed evaluate to the same element, ensuring the correctness of the reduction: the answer to the dynamic word problem for S on w is the same as for S \ C on w′.
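As an illustration, the grouping that underlies such a position mapping can be sketched in Python. This is a hypothetical helper (not from the paper's construction): the semigroup product is abstracted as a function `mult`, and the vEB machinery is omitted.

```python
def build_position_mapping(w, mult):
    # Group the letters of w into consecutive blocks of 2, merging a leftover
    # single letter into the last block (so the last block may have 3 letters,
    # and no letter is left alone). Returns psi (position -> group index) and
    # w_prime (one product per group). `mult` abstracts the semigroup product.
    assert len(w) >= 2
    groups = [list(range(i, i + 2)) for i in range(0, len(w) - len(w) % 2, 2)]
    if len(w) % 2 == 1:
        groups[-1].append(len(w) - 1)
    psi, w_prime = {}, []
    for g_index, group in enumerate(groups):
        prod = w[group[0]]
        for pos in group[1:]:
            prod = mult(prod, w[pos])
        w_prime.append(prod)
        for pos in group:
            psi[pos] = g_index
    return psi, w_prime
```

By construction, this mapping is surjective, nondecreasing, pair-grouping, and bounded injective; maintaining these invariants under insertions and deletions is the object of the rest of the proof.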
So it only remains to explain how we can construct ψ and w′ from w (preprocessing), and how we can maintain ψ and w′ under updates to w.
For the preprocessing, to initialize ψ, we traverse w sequentially (this can be done in linear time if we simply assume that vEBs on span {1, …, n} always store their contents in an array maintained in parallel to the structure), and we construct w′ and ψ sequentially, by grouping the letters of w in groups of 2, or possibly 3 for the last group (to avoid leaving one letter alone). This can clearly be done in time O(n), with n the span of w.
To maintain ψ under insertions, we find a predecessor or successor of the new letter in w using the vEB, we set the image of the new letter by ψ to be that of that predecessor or successor, and we update the image letter in w′ to reflect this, in constant time because ψ is bounded injective. The only problem is that now the new ψ may not be bounded injective anymore, because we could have 4 letters of w (including the new one) mapping to a position of w′. If this happens, we split the group by inserting a letter in w′ for the first two letters of w that mapped there (put at the position of the second letter, to satisfy condition 6), and by updating the letter of w′ at the position of the fourth letter (by condition 6) to be the composition of the last two letters. All of this can be performed with constantly many updates on w′ and constantly many operations on the vEB w.
To maintain ψ under deletions, we look at the image of the deleted letter by ψ, we find the other letters which are in the same group (i.e., constantly many neighbors, which we can find using the vEB w), we delete all letters of the group along with the group and modify ψ (preserving the invariant), and then we insert back the letters of the group that we did not intend to delete, as explained in the previous paragraph. This concludes the proof. ◀ ▶ Claim 3.7.
The structuring group G is commutative. Proof. As C is regular, let (i, g, j) be an idempotent of C. We have (i, g, j)² = (i, g, j), which implies that p_{j,i} ≠ 0, and in fact that the idempotent is of the form (i, p_{j,i}^{-1}, j). Let us write H_{i,j} = {(i, g, j) | g ∈ G}: this subset is closed under the semigroup operation because p_{j,i} ≠ 0, so it is a subsemigroup, and in fact it is a group, with neutral element (i, p_{j,i}^{-1}, j), and with (i, p_{j,i}^{-1} g^{-1} p_{j,i}^{-1}, j) the inverse of (i, g, j). Indeed, for any (i, g, j) we have (i, g, j)(i, p_{j,i}^{-1}, j) = (i, p_{j,i}^{-1}, j)(i, g, j) = (i, g, j), so indeed (i, p_{j,i}^{-1}, j) is neutral, and we have (i, g, j)(i, p_{j,i}^{-1} g^{-1} p_{j,i}^{-1}, j) = (i, p_{j,i}^{-1} g^{-1} p_{j,i}^{-1}, j)(i, g, j) = (i, p_{j,i}^{-1}, j), so indeed the inverses are as described.
Let us show that the group H_{i,j} is commutative. To see why, take any g, g′ ∈ G, and take x = (i, g, j) and y = (i, g′, j). Let us show that xy = yx. Applying the equation of SG to S, we know that x^{ω+1} y x^ω = x^ω y x^{ω+1}, where ω is the idempotent power of x. Now, (x^ω)² = x^ω by definition, and in the group H_{i,j} this implies that x^ω = e, with e the neutral element of the group, by composing the equation with (x^ω)^{-1}. Thus, injecting this in the equation yields xy = yx, as claimed. Thus, H_{i,j} is commutative.
We now show that this implies that G is a commutative group. Let us first note that, for any g ∈ G, as (i, g, j)(i, e, j) = (i, e, j)(i, g, j) because H_{i,j} is commutative, we have g p_{j,i} = p_{j,i} g. Now, take any g, g′ ∈ G, and let us show that gg′ = g′g. We have (i, g, j)(i, g′, j) = (i, g p_{j,i} g′, j) = (i, p_{j,i} g g′, j) by what we just argued. As H_{i,j} is commutative, the latter is also equal to (i, g′, j)(i, g, j), which is equal to (i, p_{j,i} g′ g, j). Identifying the two and composing by p_{j,i}^{-1}, which is possible because we argued that p_{j,i} ≠ 0, we obtain gg′ = g′g. So indeed G is commutative.
◀▶ Claim 3.8.
Let r, s, t ∈ S^* and (i, g, j), (i′, g′, j′) ∈ I × G × J. Write w = r (i, g, j) s (i′, g′, j′) t and w′ = r (i, gg′, j) s (i′, e, j′) t, where e is the neutral element of G. Then eval(w) = eval(w′). Proof. As D is regular, let us consider an idempotent in D. By the same reasoning as in the first paragraph of the proof of Claim 3.7, we know that the idempotent is of the form (a, p_{b,a}^{-1}, b) with p_{b,a} nonzero. What is more, as in the proof of Claim 3.7, we know that H_{a,b} = {(a, g, b) | g ∈ G} is a group.
In all equations that follow, we abuse notation and write equalities between words of S to mean that they evaluate to the same element (not that the words are the same).
Let us now observe that: (i′, g′, j′) = (i′, e, b)(a, p_{b,a}^{-1} g′, b)(a, p_{b,a}^{-1}, b)(a, p_{b,a}^{-1}, j′). Indeed, this is immediate by the law of the Rees semigroup with zero.
Let us now take x = (a, p_{b,a}^{-1} g′, b), and take ω its idempotent power. We have (x^ω)² = x^ω, and as this operation happens in the group H_{a,b}, we know that x^ω is the neutral element of this group, which is (a, p_{b,a}^{-1}, b).
So we can rewrite the above equation as: (i′, g′, j′) = (i′, e, b) x x^ω (a, p_{b,a}^{-1}, j′).
Similarly to the first equation, we have (i, g, j) = (i, p_{b,a}^{-1}, b)(a, g, b) x^ω (a, p_{b,a}^{-1}, j).
So let us take: y = (a, p_{b,a}^{-1}, j) s (i′, e, b). The equation defining SG, applied to x and y, now tells us that x^{ω+1} y x^ω = x^ω y x^{ω+1}.
So let us now write w, i.e.: w = r (i, p_{b,a}^{-1}, b)(a, g, b) x^ω (a, p_{b,a}^{-1}, j) s (i′, e, b) x x^ω (a, p_{b,a}^{-1}, j′) t.
We recognise the definition of y, so we have: w = r (i, p_{b,a}^{-1}, b)(a, g, b) x^ω y x^{ω+1} (a, p_{b,a}^{-1}, j′) t.
Using the equation, we have: w = r (i, p_{b,a}^{-1}, b)(a, g, b) x^{ω+1} y x^ω (a, p_{b,a}^{-1}, j′) t.
Injecting back the definition of y: w = r (i, p_{b,a}^{-1}, b)(a, g, b) x^{ω+1} (a, p_{b,a}^{-1}, j) s (i′, e, b) x^ω (a, p_{b,a}^{-1}, j′) t.
The part between r and s is: (i, p_{b,a}^{-1}, b)(a, g, b)(a, p_{b,a}^{-1} g′, b)(a, p_{b,a}^{-1}, b)(a, p_{b,a}^{-1}, j), which evaluates by the Rees law to (i, gg′, j); and the part between s and t evaluates, according to the same law, to (i′, e, j′). So we have obtained: w = r (i, gg′, j) s (i′, e, j′) t, which establishes that eval(w) = eval(w′) and concludes the proof. ◀ ▶ Claim 3.9.
There is a constant-time reduction from the dynamic word problem for S on vEBs (of some span n) to the same problem on vEBs of span n where we additionally enforce that, at every step, the represented word is pair-collapsing. Proof. A maximal run of a word of S^* is a non-empty maximal contiguous subsequence of elements of the word whose image is in C. We will maintain the target word w′ as a vEB, along with a function ψ from the positions of w to those of w′, with the following requirements:
- ψ is surjective: every position of w′ is reached by ψ;
- ψ is nondecreasing;
- for any letter of w not part of a maximal run, i.e., an element of S \ C, the image of this letter by ψ carries the same letter and has exactly this element as preimage;
- for any maximal run of w, all its letters have the same image by ψ, the maximal run is precisely the preimage of that element, and the label of this maximal run is of the form (i, g, j) for some g, where (i, g′, j) for some g′ is the actual evaluation of this maximal run in w;
- for any letter (i, a) in the image of ψ, the index i is the index of the last letter of w that is mapped to i by ψ.
We additionally require that the evaluation of w′ and of w in S is the same. It is crucial that this condition is enforced globally, not locally, as we intend to use Claim 3.8.
For the preprocessing, given the word w, we process it sequentially (again assuming that vEBs also store their contents as an array), and we create the mapping sequentially, using the Rees–Sushkevich representation to determine when a maximal run ends. This takes time O(n).
To handle updates on w, there are several cases. When we insert an element of S \ C, we check the target vEB to know if this insertion happens within a maximal run or not:
- If the insert is not within a maximal run, then we simply reflect the same change in w′ and ψ.
- When we insert an element x of S \ C within a maximal run, then the maximal run is broken.
We use the vEB w to find the preceding and succeeding elements (i_-, g_-, j_-) and (i_+, g_+, j_+) of w relative to the insertion: they are necessarily in C because we are within a maximal run. We now replace the element (i, g, j) of w′ which was the image of the maximal run by ψ by three elements: one of the form (i, g_1, j_-) corresponding to the remaining prefix of the maximal run, one corresponding to the inserted element, and one of the form (i_+, g_2, j) corresponding to the remaining suffix of the maximal run. We set g_1 := g p_{j_-,i_+}^{-1} and g_2 := e, for e the neutral element of G.
We must argue that the evaluation of w′ and of w is still the same. To do this, write the new w′ as r (i, g_1, j_-) x (i_+, g_2, j) t. Consider now (i, g′_1, j_-) and (i_+, g′_2, j), the respective actual evaluations of the prefix and suffix after the update. We know thanks to the invariant that w and w′ had the same evaluation before the update, i.e., w evaluates to the same in S as r (i, g, j) t; after the update, w evaluates to the same as r (i, g′_1, j_-) x (i_+, g′_2, j) t, and by Rees–Sushkevich we have g = g′_1 p_{j_-,i_+} g′_2. Instead, our definition of w′ evaluates to r (i, g_1, j_-) x (i_+, e, j) t. The values g′_1 and g′_2 are intuitively the ones that we cannot retrieve from our data structure. However, thanks to Claim 3.8, as g_1 = g p_{j_-,i_+}^{-1} = g′_1 g′_2 (using the commutativity of G, Claim 3.7), we know that w′ as we defined it evaluates to the correct value.
Thus, the invariant is preserved.
When we delete an element of S \ C:
- If the deletion does not connect together two maximal runs (i.e., it is preceded and succeeded in w′ by elements that are not both in C, or that are both in C but whose composition in C yields 0), then we simply reflect the change in w′.
- When we delete an element of S \ C and connect together two maximal runs, then we update w′ to delete the element, and we then replace the two elements (i, g, j) and (i′, g′, j′) corresponding to the two runs by an element (i, g p_{j,i′} g′, j′) corresponding to the new run.
When we insert an element (i, g, j) of C, we check w′ to distinguish several possible cases. If this happens within a maximal run (as ascertained using the vEB structure on w′), then let (i_-, g_-, j_-) and (i_+, g_+, j_+) be respectively the preceding and succeeding elements in w, obtained from the vEB of w: they are both in C. We update the G-annotation of the image by ψ of this maximal run to add p_{j_-,i_+}^{-1}, i.e., the inverse of the Rees matrix element obtained between the preceding and succeeding elements: call this step (*). Now:
∗ If both p_{j_-,i} and p_{j,i_+} are nonzero, then we simply update the G-annotation again to add p_{j_-,i}, g, and p_{j,i_+}.
∗ If both p_{j_-,i} and p_{j,i_+} are zero, then we break the maximal run in three parts: the part before the insertion, the insertion which is a maximal run of its own, and the part after the insertion. We replace the element (i′, g′, j′) in w′ (with g′ already modified by step (*)) that corresponded to the whole run by three elements (i′, g′, j_-), (i, g, j), and (i_+, e, j′). The preservation of the global invariant is again by Claim 3.8.
∗ The two cases where exactly one of p_{j_-,i} and p_{j,i_+} is zero and the other is nonzero are analogous to the above.
If this does not happen within a maximal run:
∗ If the preceding element is in S \ C or is an element of C which combined with (i, g, j) gives a zero according to the Rees matrix, and the same is true of the succeeding element, then (i, g, j) is a new maximal run of its own. We insert it as-is in w′.
∗ If the preceding element is in S \ C or an element of C which combined with (i, g, j) gives a zero, but the next element (i_+, g_+, j_+) is in C and combining (i, g, j) with it does not give a zero (i.e., p_{j,i_+} is nonzero), then we are extending the maximal run that follows. We modify its G-annotation to add g p_{j,i_+}, we remove the element in w′ corresponding to that consecutive run, and we add it back at the new end position of the extended maximal run, changed so that its I-index becomes i.
∗ The case of an insertion extending the maximal run that precedes is analogous, except that we can simply update the element in w′ without having to move it, and change its J-index instead of its I-index.
∗ If both the preceding element (i_-, g_-, j_-) and the succeeding element (i_+, g_+, j_+) are in C and neither gives a zero together with the newly inserted element, then by our assumption that we are not within a maximal run, it must be that p_{j_-,i_+} = 0, and we are merging together the two maximal runs of these elements.
We reflect this in the additional structure, in the G-annotation (adding the two Rees matrix terms p_{j_-,i} and p_{j,i_+}, which are nonzero by assumption, in addition to g), removing in w′ the element for the first run, and copying its I-index and group information to the element for the second run (which now stands for the merged run).
When we remove an element (i, g, j) of C, we distinguish between several cases:
- Removal within a maximal run which does not change the run endpoints, i.e., the preceding and succeeding elements do not combine to a zero in the Rees matrix semigroup. Then we simply update the G-annotation to add the inverses of the two Rees matrix entries that are no longer realized, add g^{-1}, and add the new nonzero Rees matrix entry which is realized.
- Removal within a maximal run which breaks up the run into two nonempty runs at the deletion point. We insert the information for these two runs, putting the group information of the split run on the first run (along with g^{-1} and the inverse of the p-term for the two elements that are no longer adjacent), and we again use Claim 3.8 to argue for correctness.
- Removal of the first letter of a maximal run. We update the G-annotation with g^{-1} and the inverse of the Rees matrix entry which is no longer realized, and we update the I-index of the element at the end of the run.
- Removal of the last letter of a maximal run. This is like the previous case, except that we change the J-index instead of the I-index, and we must move the element in w′ corresponding to the run to sit at the new ending position of the run.
- Removal that eliminates a singleton maximal run. We simply delete it from w′. ◀
C Proofs for Section 4 (Dynamic Word Problem for Monoids in ZG)
We start with another preliminary remark on the relationship between SG, ZG, and A (see Figure 1): ▶ Remark C.1.
Following Remark B.1, we could be tempted to claim that SG is equal to the variety generated by ZG and A. However, as monoids in ZG also satisfy the equation (xyz)^{ω+1} (xzy)^ω = (xyz)^ω (xzy)^{ω+1}, the same argument shows that this is not the case. We then provide the omitted proofs for the results in the main text. C.1 Proof of Upper Bound Results ▶ Theorem 4.1 ([25]). The dynamic word problem for any commutative monoid is in O(1). Proof.
We give a proof sketch for completeness. We easily maintain the vector of the number of occurrences of each letter. The problem boils down to testing the membership of this vector in a subset of ℕ^{|Σ|} defined following the monoid. It is known by [7, Proposition 9] that such recognizable subsets are defined by a finite number of congruence and threshold conditions, so this can be checked in O(1). ◀ ▶ Proposition 4.3.
For any nilpotent S, the dynamic word problem for S is in O(1). Proof.
Let k > 0 be such that S^k = 0. Given a word w ∈ S^*, we prepare in linear time from w a doubly-linked list L containing the positions of w having an element which is not the identity of S, along with a table T of size |w| where the i-th cell contains a pointer to the list element in L representing the i-th position if one exists, and a dummy value otherwise. We can construct L in linear time.
We will maintain the invariant that L contains all positions of w containing a non-neutral element (note that we do not assume that L is in sorted order), and that T contains one pointer per element of L leading to the cell corresponding to that element in L.
We can easily use L to determine the image of the current word in S. We first check if L contains ≥ k elements, which can be done in time O(k), hence O(1), by navigating the list. If this is the case, then as S^k = 0, we know that the word evaluates to 0. Otherwise, we know the < k non-neutral elements of w, and we can evaluate their product in O(1) to know the answer.
We now explain how to maintain L in constant time per edit. When a substitution replaces 1 by 1, or a non-neutral element by a non-neutral element, then we do nothing. When a substitution replaces a neutral element by a non-neutral element at position i, then we add i to L, and let T[i] be a pointer to the new list item, in time O(1). When a substitution replaces a non-neutral element by a neutral element at position i, we use T[i] to find the element for i in L, remove it in time O(1), and erase T[i], all in O(1). This concludes the proof. ◀
C.2 Proof of Lower Bound Results: Theorem 4.5
We prove Theorem 4.5 in the rest of this section. To do this, let us introduce ZE as the variety of monoids whose idempotents are central, i.e., the variety defined by the equation x^ω y = y x^ω. Note that ZG ⊆ ZE, as is clear from the equations. We show a useful claim: ▶ Claim C.2.
We have: ZG = SG ∩ ZE . Proof.
Let M be in ZG. We first show that M is in ZE. Consider arbitrary elements x and y. We have x^ω = (x^ω)^{ω+1} by definition of x^ω. Thus, applying the equation of ZG to x^ω, we get x^ω y = (x^ω)^{ω+1} y = y (x^ω)^{ω+1} = y x^ω. Thus, M is in ZE. Furthermore, as ZG ⊆ SG, clearly M is in SG.
Conversely, let M be in SG ∩ ZE. Then:
x^{ω+1} y = x^{ω+1} x^ω y (by definition of x^ω)
= x^{ω+1} y x^ω (by M ∈ ZE)
= x^ω y x^{ω+1} (by M ∈ SG)
= y x^ω x^{ω+1} (by M ∈ ZE)
= y x^{ω+1},
so M satisfies the equation of ZG. This concludes the proof of the claim. ◀ We can now conclude the proof of Theorem 4.5:
Proof.
Let M be a monoid in SG \ ZG. By Claim C.2, we know that M is not in ZE.
Let us now assume that M ∉ ZE. By definition, there exist x, y ∈ M such that x^ω y ≠ y x^ω. Notice that we cannot have both x^ω y = x^ω y x^ω and y x^ω = x^ω y x^ω, so one of these equations must be false. Without loss of generality, we can assume that y x^ω ≠ x^ω y x^ω. Indeed, if we have x^ω y ≠ x^ω y x^ω instead, then we can show the lower bound for the reversal M^t defined by x ·_{M^t} y = y ·_M x, with ·_{M^t} and ·_M the internal laws of M^t and M respectively. Now, we can reduce from the dynamic word problem for M^t to the same problem for M by an obvious constant-time reduction where we reverse the input word and perform the updates at the mirror position. Thus, it suffices to consider the case where y x^ω ≠ x^ω y x^ω.
We now show that any solution to the dynamic word problem for M can be used to solve the prefix-U₁ problem. To do so, we consider a word w on {0, 1} of length n, and encode it as a word w′ of length 2n + 2. All letters of w′ are the neutral element e of M, except that w′_{2n+2} = x^ω and w′_{2i} = x^ω whenever w_i = 0. This can be done in linear time during the preprocessing. Now, any edit that writes 0 or 1 in w_i is reflected by writing respectively x^ω or e to w′_{2i}.
Now, to perform a prefix-U₁ query with argument j, we write w′_{2j+1} := y, use the dynamic word problem data structure to get the evaluation of the word, then write back w′_{2j+1} := e. The evaluation result, after removing neutral elements, is x^{kω} y x^{k′ω}, where k is the number of 0's in the prefix of length j of w, and k′ is the number of 0's in the rest of w plus one (accounting for the letter w′_{2n+2}), so k′ ≥ 1. Because x^ω x^ω = x^ω, this is equal to x^ω y x^ω when the prefix contained a 0, or to y x^ω when it did not. We have shown that these two elements are different, so we can indeed recover the answer to the prefix query, concluding the proof. ◀
D Proofs for Section 5 (Dynamic Word Problem for Semigroups)
▶ Claim 5.1.
The dynamic word problem for any sub-monoid of a semigroup S reduces in constant time to the same problem for S. Proof.
This is immediate by Proposition 2.1 (and also intuitively): we simply solve the problem with a structure for S but where we only use elements of the sub-monoid. ◀ ▶ Claim 5.2.
We have
LSG = SG as varieties of semigroups. Proof.
Clearly SG ⊆ LSG. Let S be a semigroup of LSG. For all elements x, y of S, letting e := x^ω, we have that the local monoid N = eSe is in SG. Now, since x′ = exe and y′ = eye are in N, we have (x′)^{ω+1} y′ (x′)^ω = (x′)^ω y′ (x′)^{ω+1}. Furthermore, (x′)^{ω+1} = x^{ω+1} and (x′)^ω = x^ω, so expanding y′ = x^ω y x^ω and absorbing the powers of x^ω, we obtain x^{ω+1} y x^ω = x^ω y x^{ω+1}. ◀ ▶ Proposition 5.6.
Let S be a definite semigroup, let T be a semigroup of ZG, and let act be an action of S on T. There is a constant-time reduction from the dynamic word problem for the semigroup T ∘_act S to the same problem for T. Proof.
Given a word w = (t_1, s_1), …, (t_n, s_n) of (T ∘_act S)^*, we note that the second component of its evaluation is s_1 ⋯ s_n. As S is definite, we can compute this and maintain it in constant time, simply by looking at the k last elements, for k the integer witnessing that S is definite.
As for the first component, it can be shown to evaluate to the evaluation of the following word of T^*: t_1 · act(s_1, t_2) · act(s_1 s_2, t_3) ⋯ act(s_1 s_2 ⋯ s_{n−1}, t_n). We initialize a structure for the dynamic word problem on T with this word w′ to obtain the first component. Now, when the word w is updated at position i, we perform the updates on this word w′ by changing t_i in cell i (a single edit), and by changing the k cumulative products of the s_j that have changed, i.e., the products in the first component of act in the up to k cells starting at cell i: this amounts to k edits, and for each of them the new cumulative product can be computed by looking at the k last elements, so this is a constant-time computation and a constant number of operations. ◀
E Proofs for Section 6 (Dynamic Word Problem for Languages)
▶ Proposition 6.1.
Let L be a regular language. The dynamic membership problem for L reduces in constant time to the dynamic word problem for the stable semigroup of L. Proof.
Let L be a regular language with S its stable semigroup. We reduce the dynamic membership problem for L to the dynamic word problem for S as follows. For any word u ∈ Σ^*, we decompose u into u = u_1 u_2 ⋯ u_n v, with each u_i of length s and v having length ≤ s, where s denotes the stability index. Taking the image by η of each factor u_i gives a word of S^*. We use a maintenance scheme for this word of S^*, and propagate the updates on u to updates of this word in O(1) in the expected way: we compute the corresponding letter of the word of S^* by dividing the position of the update by a constant, we look at a constant number of neighboring positions to find the entire u_i in the decomposition of u, and we evaluate η to compute the new image. Note that the case of updates to v is immediate, as v has constant size. We can then use the structure for the dynamic word problem on S^* to obtain the image of u_1 ⋯ u_n in the stable semigroup, which we can compose with the image of v. ◀ We now prove Proposition 6.2 in the rest of the appendix. We first establish a general claim formalizing the connection between the syntactic monoid and the dynamic word problem: ▶ Proposition E.1.
Let L be a regular language and η its syntactic morphism. The dynamic membership problem for L is equivalent under constant-time reductions to the dynamic word problem for its syntactic monoid where we require that we only use elements of η(Σ). Proof.
Clearly, a maintenance scheme for the dynamic word problem provides a maintenance scheme for L, as it suffices to check that the resulting element belongs to η(L), and we will indeed only use elements of η(Σ).
In the other direction, we simply use that, for all m, the language L_m := η^{-1}(m) is in the Boolean algebra generated by L and closed under the quotient operator. In other words, any language L_m can be expressed using L, Boolean operations, and the quotient operator; none of these operations changes the alphabet. Thus, we can reduce the dynamic word problem to the problem of checking if the image is in L_m for every possible choice of m ∈ M. Now, each of the L_m reduces to L by Proposition 2.1 without changing the alphabet. So we can solve the dynamic word problem for the syntactic monoid under the assumption that we gave, by building data structures for these L_m: this uses the assumption that we are only using letters from η(Σ). ◀ We can now prove Proposition 6.2:
Proof of Proposition 6.2.
Assume L is not in QLV. Let η be its syntactic morphism, s its stability index, and S its stable semigroup. Then, by definition of QLV, there exists a submonoid N of S which is not in V. By definition of the stable semigroup, there exists a mapping ψ from N to Σ^s such that for all n ∈ N, we have η(ψ(n)) = n, for s the stability index. We simulate the dynamic word problem for N by writing, for each update at position i, the corresponding word of length s starting at position s × i. We can perform the evaluation in N by evaluating the image by η of the resulting word. We know that evaluating the image by η reduces in constant time to L by Proposition E.1, where we use the fact that the resulting word (formed of the blocks of size s) only consists of letters from the original alphabet. ◀
F Proofs for Section 7 (Extensions, Problem Variants, and Future Work)
▶ Proposition 7.1.
There is a language L_{U₂} in QSG \ QLZG which is equivalent to prefix-U₂ under constant-time reductions, and a language L_{U₁} in QSG \ QLZG which is equivalent to prefix-U₁ under constant-time reductions. Proof.
We show each claim of the statement separately.
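Both claims manipulate prefix problems; for the monoid U₂ = {1, a, b}, where every non-1 element is a right zero, the product of a word is simply its last non-1 letter. As a concrete picture, a deliberately naive stand-in for a prefix-U₂ structure could be mocked as follows (the class and method names are illustrative assumptions, and the O(n) query scan is far from the bounds discussed here):

```python
class NaivePrefixU2:
    # Naive stand-in for a prefix-U2 structure, for illustration only.
    # The word is over {'1', 'a', 'b'}; a query with parameter k returns the
    # evaluation in U2 of the prefix of length k, i.e., its last non-1 letter
    # (or '1' if the prefix contains only 1's). Positions are 1-indexed.
    def __init__(self, word):
        self.word = list(word)

    def substitute(self, i, letter):
        self.word[i - 1] = letter

    def query(self, k):
        for letter in reversed(self.word[:k]):
            if letter != '1':
                return letter
        return '1'
```

The real question, of course, is how fast such queries and substitutions can be supported; the naive scan above only fixes the intended semantics.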
Claim on L_{U₂}. Let L_{U₂} be the regular language over the alphabet Σ = {a, b, c, x} that contains all words where there is only one x and the closest preceding non-c letter exists and is a b, i.e., L_{U₂} can be defined through the regular expression (a + b + c)^* b c^* x (a + b + c)^*. This language is in QSG and hence can be maintained in O(log log n).
We first show a constant-time reduction from the prefix-U₂ problem to the dynamic membership problem for L_{U₂}, which implies that there is an Ω(log log n) lower bound on that problem.
Let w be the word over the alphabet {1, a, b} to maintain for prefix-U₂. The idea of the reduction consists in encoding w = w_1 ⋯ w_n into h(w) = h(w_1) ⋯ h(w_n), where h(1) = c, h(a) = a, and h(b) = b. Clearly h(w) ∉ L_{U₂}, as there is no x in the word.
Now, to handle a prefix-U₂ query with parameter k, we look at the k-th letter of w. If it is a or b, we can answer immediately with the value of w_k. In the remaining case of w_k = 1, we set w′_k to x. If we now have w′ ∈ L_{U₂}, it means that the answer of the prefix query is b; otherwise it means that the answer is either a or 1. To distinguish the two cases, we first look at w_1: if w_1 ≠ 1, then the answer is a; otherwise we set w′_1 to b, and if w′ then belongs to L_{U₂}, the answer is 1, otherwise it is a. In all these cases, after answering the query, we restore w′ to its previous state.
We then show a constant-time reduction from the dynamic membership problem for L_{U₂} to the prefix-U₂ problem. We simply encode a word of {a, b, c, x} by writing a as a, b as b, c as 1, and x as 1. We also maintain a doubly-linked list L, as in the proof of Proposition 4.3, to store all occurrences of x. Now, whenever the number of x's is different from 1, the word does not belong to the language. Otherwise, we use L to find the position of the one x.
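A minimal sketch of this second reduction, with a naive prefix scan standing in for the actual prefix structure (the class and method names are hypothetical, and a Python set plays the role of the doubly-linked list of x-occurrences):

```python
class MembershipLU2:
    # Sketch of the reduction: dynamic membership for L_U2 via prefix-U2
    # queries. The set self.xs stands in for the doubly-linked list of
    # occurrences of x, and _prefix_u2 is a naive stand-in for the actual
    # prefix-U2 structure on the encoded word (c and x are encoded as 1).
    def __init__(self, word):
        self.word = list(word)
        self.xs = {i for i, ch in enumerate(self.word) if ch == 'x'}

    def substitute(self, i, ch):
        self.xs.discard(i)
        if ch == 'x':
            self.xs.add(i)
        self.word[i] = ch

    def _prefix_u2(self, k):
        # evaluation in U2 of the encoded prefix of length k
        for ch in reversed(self.word[:k]):
            if ch not in ('c', 'x'):
                return ch
        return '1'

    def member(self):
        if len(self.xs) != 1:
            return False            # the word must contain exactly one x
        (pos,) = self.xs
        # member iff the closest preceding non-c letter exists and is a b
        return self._prefix_u2(pos) == 'b'
```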
The word is then in the language iff the closest preceding non-c letter exists and is a b, which we can determine with the prefix-U₂ data structure by querying the corresponding prefix and checking if the answer is b.
Claim on L_{U₁}. Let L_{U₁} be the regular language over the alphabet Σ = {a, c, x} defined by the expression c^* x (a + c)^*, i.e., there is exactly one x and it precedes all a's.
We first show a constant-time reduction from the dynamic membership problem for L_{U₁} to the prefix-U₁ problem. Indeed, we translate an input word w on {a, c, x} to a word on {0, 1} by writing 0 for a and 1 for c and x, and we maintain this under updates. We also maintain a doubly-linked list L, as in the proof of Proposition 4.3, to store all occurrences of x. Now, whenever the number of x's is different from 1, the word does not belong to the language. When the number becomes equal to 1, the list L allows us to know its position, and a prefix query on the prefix-U₁ structure allows us to know if there is an a before this x (in which case the word is not in the language) or if there is none (in which case it is).
We then show a constant-time reduction from the prefix-U₁ problem to the dynamic membership problem for L_{U₁}. Indeed, we simply encode 1 as c and 0 as a. When a prefix query arrives for a prefix of length i, we check if the i-th letter is a 0, in which case we return 0; otherwise we substitute an x at that position in the word on {a, c, x}. Now, the resulting word belongs to the language iff there is no preceding 0, which gives the answer to the prefix query; we then restore the letter at that position. ◀ ▶ Claim 7.2.
For any fixed regular language L, the dynamic infix membership problem is equivalent up to constant-time reductions to the dynamic membership problem for the language Σ^* x L x Σ^*, where x is a fresh letter. Proof.
Let L′ := Σ^* x L x Σ^*, where x is a fresh letter not in Σ. We first explain how to use a dynamic membership data structure for L′ to solve the dynamic infix membership problem for L. Given a word over Σ, we initialize the structure for L′ on this word, where we add one letter to the beginning of the word and one letter to the end of the word, say a. When substitutions are performed on the word for L, we perform them in the data structure by offsetting them by 1. Now, whenever we receive an infix query for a subword, we perform two substitution updates on L′ to replace the characters immediately before and after the subword by x, we check if the resulting word belongs to L′ using the data structure, and we undo these two operations to put back the correct characters. It is clear that the modified word is in L′ iff the infix is in L. Note that the addition of the two fixed letters at the beginning and end of the word guarantees that the characters immediately before and after the subword are indeed defined.
Second, we explain how to use a dynamic infix membership data structure for L to solve the dynamic membership problem for L′. Given a word over Σ ∪ {x}, we initialize the structure for L by replacing every occurrence of x by some arbitrary character of Σ. We also prepare a doubly-linked list, as in the proof of Proposition 4.3, storing all occurrences of the letter x, as well as a pointer from positions containing x to the element of the doubly-linked list storing this copy of x. All of this can be performed as part of the preprocessing.
Given updates to the word on L′, we maintain the doubly-linked list and pointers, and we replicate these updates on the word on L, except that occurrences of x are replaced by some arbitrary character of Σ.
Now, to know if the current word belongs to L′ or not, first observe that this is never the case if the number of occurrences of x is different from 2, which we can check in constant time using the doubly-linked list.
If there are exactly two x's, we know their positions, and we can perform an infix query on the data structure for L for this infix. The current word is in L′ iff this query returns true. Thus, by performing this infix query after every update to the word on L′ (whenever the current word contains exactly two x's), we obtain the desired information. This concludes the proof. ◀
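A minimal sketch of the first direction of this equivalence, with a naive regex test standing in for the dynamic membership structure for Σ^* x L x Σ^* (the names are hypothetical, `pad` is the arbitrary letter added at both ends, and we assume the fresh letter x does not occur in the input word):

```python
import re

def make_infix_tester(word, L_regex, pad='a'):
    # Reduction sketch: to test whether an infix of word belongs to L, place
    # the fresh letter x around it and test membership in Sigma* x L x Sigma*.
    # The regex fullmatch is a naive stand-in for the dynamic membership
    # structure; the padding letters ensure that the cells just before and
    # after any infix always exist. Assumes 'x' does not occur in word.
    w = [pad] + list(word) + [pad]

    def infix_in_L(i, j):
        # is word[i:j] in L (0-indexed, half-open)? word occupies w[1..n],
        # so the markers go to cells w[i] and w[j + 1]
        before, after = w[i], w[j + 1]
        w[i], w[j + 1] = 'x', 'x'
        answer = re.fullmatch('.*x(' + L_regex + ')x.*', ''.join(w)) is not None
        w[i], w[j + 1] = before, after   # undo the two substitutions
        return answer

    return infix_in_L
```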