Linear Time Runs over General Ordered Alphabets
Jonas Ellert, Department of Computer Science, Technical University of Dortmund, Germany
Johannes Fischer, Department of Computer Science, Technical University of Dortmund, Germany

This work has been submitted to ICALP 2021.
Abstract
A run in a string is a maximal periodic substring. For example, the string bananatree contains the runs anana = (an)^{5/2} and ee = e^2. There are fewer than n runs in any length-n string, and computing all runs for a string over a linearly-sortable alphabet takes O(n) time (Bannai et al., SODA 2015). Kosolobov conjectured that there also exists a linear time runs algorithm for general ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven by Crochemore et al., who presented an O(nα(n)) time algorithm (where α(n) is the extremely slowly growing inverse Ackermann function). We show how to achieve O(n) time by exploiting combinatorial properties of the Lyndon array, thus proving Kosolobov's conjecture.

2012 ACM Subject Classification: Theory of computation → Design and analysis of algorithms
Keywords and phrases: String algorithms, Lyndon array, runs, longest common extension, general ordered alphabets, combinatorics on words
Supplementary Material https://github.com/jonas-ellert/linear-time-runs/
A run in a string S is a maximal periodic substring. For example, the string S = bananatree contains exactly the runs anana = (an)^{5/2} and ee = e^2. Identifying such repetitive structures in strings is of great importance for applications like text compression, text indexing and computational biology (for a general overview see [7]). To name just one example, runs in human genes (called maximal tandem repeats) are involved in a number of neurological disorders [4]. In 1999, Kolpakov and Kucherov showed that the maximum number ρ(n) of runs in a length-n string is bounded by O(n), and provided a word RAM algorithm that outputs all runs in linear time [17]. The algorithm is based on the Lempel-Ziv factorization and only achieves O(n) time for linearly-sortable alphabets, i.e. alphabets that are totally ordered and for which a sequence of σ alphabet symbols can be sorted in O(σ) time. Since then, it has been an open question whether there exists a linear time runs algorithm for general ordered alphabets, i.e. totally ordered alphabets for which the order of any two symbols can be determined in constant time. Any such algorithm must not use the Lempel-Ziv factorization, since for general ordered alphabets of size σ it cannot be constructed in o(n lg σ) time [18].

Kolpakov and Kucherov also conjectured that the maximum number of runs is bounded by ρ(n) < n, which started a 15 year-long search for tighter upper bounds of ρ(n). Rytter was the first to give an explicit constant c with ρ(n) < cn [23]. After multiple incremental improvements of this bound (e.g. [6, 8, 22]), Bannai et al. [2] finally proved the conjecture by showing ρ(n) < n for arbitrary alphabets, which was subsequently even surpassed for binary texts [11]. (The currently best bound for binary alphabets is strictly smaller still [16].)
On the algorithmic side, Bannai et al. also provided a new linear time algorithm that computes all the runs [2]. While (just like the algorithm by Kolpakov and Kucherov) it only achieves the time bound for linearly-sortable alphabets, it no longer relies on the Lempel-Ziv factorization. Instead, the main effort of the algorithm lies in the computation of Θ(n) longest common extensions (LCEs); given two indices i, j ∈ [1, n], their LCE is the length of the longest common prefix of the suffixes S[i..n] and S[j..n]. For linearly-sortable alphabets, we can precompute a data structure in O(n) time that answers arbitrary LCE queries in constant time (see e.g. [10]), thus yielding a linear time runs algorithm. Kosolobov showed that for general ordered alphabets any batch of O(n) LCEs can be computed in O(n lg^{2/3} n) time, and conjectured the existence of a linear time runs algorithm for general ordered alphabets [19]. Gawrychowski et al. improved this result to O(n lg lg n) time [13]. Finally, Crochemore et al. noted that the required LCEs satisfy a special non-crossing property. They showed how to compute O(n) non-crossing LCEs in O(nα(n)) time, resulting in an O(nα(n)) time algorithm that computes all runs over general ordered alphabets [9] (where α is the inverse Ackermann function).
Our Contributions. We show how to compute the required LCEs in O(n) time and space, resulting in the first linear time runs algorithm for general ordered alphabets, and thus proving Kosolobov's conjecture. Our solution differs from all previous approaches in the sense that it cannot answer a sequence of arbitrary non-crossing LCE queries. Instead, our algorithm is designed exactly for the LCEs required by the runs algorithm. This allows us to utilize powerful combinatorial properties of the Lyndon array (a definition follows in Section 2) that do not generally hold for arbitrary non-crossing LCE sequences.

Even though the main contribution of our work is the improved asymptotic time bound, it is worth mentioning that our algorithm is also very fast in practice. On modern hardware, computing all runs for a text of length 10^7 (i.e. 10 MB) takes only one second.
A Note on the Model. As mentioned earlier, our algorithm runs in linear time for general ordered alphabets, whereas previous algorithms achieve this time bound only when the alphabet is linearly-sortable. This is comparable with the distinction between comparison-based and integer sorting: while in the comparison model sorting n items requires Ω(n lg n) time, integer sorting is faster (O(n √(lg lg n)) time [15], and sometimes even linear, e.g. when the word width w satisfies w = O(lg n) and one can use radix sort, or when w ≥ lg^{2+ε} n [1]). Whereas it is a major open problem whether integer sorting can always be done in linear time, this paper settles a symmetric open problem for the computation of runs.

The remainder of the paper is structured as follows: First, we introduce the basic notation, definitions, and auxiliary lemmas (Section 2). Then, we give a simplified description of the runs algorithm by Bannai et al. and show how the required LCEs relate to the Lyndon array (Section 3). Our linear time algorithm to compute the LCEs is described in Section 4. We discuss additional practical aspects and experimental results in Section 5.

Our analysis is performed in the word RAM model (see e.g. [14]), where we can perform fundamental operations (logical shifts, basic arithmetic operations etc.) on words of size w bits in constant time. For an input of size n we assume ⌈lg n⌉ ≤ w. We write [i, j] = [i, j + 1) = (i − 1, j] = (i − 1, j + 1) with i, j ∈ ℕ to denote the set of integers {x | x ∈ ℕ ∧ i ≤ x ≤ j}.
Strings. Let Σ be a finite and totally ordered set. A string S of length |S| = n over the alphabet Σ is a sequence S[1] . . . S[n] of n symbols (also called characters) from Σ. The alphabet is called a general ordered alphabet if order testing (i.e. evaluating σ1 < σ2 for σ1, σ2 ∈ Σ) is possible in constant time. For i, j ∈ [1, n], we use the interval notation S[i..j] = S[i..j + 1) = S(i − 1..j] = S(i − 1..j + 1) to denote the substring S[i] . . . S[j]. If however i > j, then S[i..j] denotes the empty string ϵ. The substring S[i..j] is called proper if S[i..j] ≠ S. A prefix of S is a substring S[1..j] (including S[1..0] = ϵ), while the suffix S_i is the substring S[i..n] (including S_{n+1} = ϵ). Given two strings S and T of length n and m respectively, their concatenation is defined as ST = S[1] . . . S[n] T[1] . . . T[m]. For any positive integer k, the k-times concatenation of S is denoted by S^k. Let ℓ_max = min(n, m). The longest common prefix (LCP) of S and T has length lcp(S, T) = max{ℓ | ℓ ∈ [0, ℓ_max] ∧ S[1..ℓ] = T[1..ℓ]}, while the longest common suffix (LCS) has length lcs(S, T) = max{ℓ | ℓ ∈ [0, ℓ_max] ∧ S_{n−ℓ+1} = T_{m−ℓ+1}}. For a string S of length n and indices i, j ∈ [1, n], we define the longest common right-extension (R-LCE) and left-extension (L-LCE) as lce_r(i, j) = lcp(S_i, S_j) and lce_ℓ(i, j) = lcs(S[1..i], S[1..j]), respectively. The total order on Σ induces a lexicographical order ≺ on the strings over Σ in the usual way: with ℓ′ = lcp(S, T), we have S ≺ T if and only if either S is a proper prefix of T, or ℓ′ < min(|S|, |T|) and S[ℓ′ + 1] < T[ℓ′ + 1]. Given three suffixes, we can deduce properties of their R-LCEs from their lexicographical order:

▶ Lemma 1.
Let S_i ≺ S_j ≺ S_k be suffixes of a string S. Then it holds lce_r(i, k) ≤ lce_r(i, j) and lce_r(i, k) ≤ lce_r(j, k).

Proof.
Assume ℓ = lce_r(i, j) < lce_r(i, k). Then S_i[1..ℓ] = S_j[1..ℓ] = S_k[1..ℓ] and S_j[ℓ + 1] ≠ S_i[ℓ + 1] = S_k[ℓ + 1]. This implies S_i ≺ S_j ⇔ S_k ≺ S_j, which contradicts S_i ≺ S_j ≺ S_k. The proof of lce_r(i, k) ≤ lce_r(j, k) works analogously. ◀
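To make the extension functions concrete, the following minimal C++ sketch computes lce_r and lce_ℓ by plain character comparisons. This is the naive scanning that the later algorithms wrap with a charging argument; the function and parameter names are ours, not taken from the paper's implementation.

    #include <cstdint>
    #include <string>

    // Naive R-LCE: longest common prefix of the suffixes S_i and S_j,
    // with 1-based positions as in the paper (S[p] is text[p - 1]).
    int64_t naive_lce_r(const std::string& text, int64_t i, int64_t j) {
        int64_t n = (int64_t)text.size(), ell = 0;
        while (i + ell <= n && j + ell <= n && text[i + ell - 1] == text[j + ell - 1]) ++ell;
        return ell;
    }

    // Naive L-LCE: longest common suffix of the prefixes S[1..i] and S[1..j],
    // scanning from right to left.
    int64_t naive_lce_l(const std::string& text, int64_t i, int64_t j) {
        int64_t ell = 0;
        while (i - ell >= 1 && j - ell >= 1 && text[i - ell - 1] == text[j - ell - 1]) ++ell;
        return ell;
    }

For example, for the string bananatree these give lce_r(2, 4) = 3 (the suffixes ananatree and anatree share the prefix ana) and lce_ℓ(2, 4) = 1.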
Repetitions and Runs. Let S be a string and let S[i..j] be a non-empty substring. We say that p ∈ ℕ+ is a period of S[i..j] if and only if ∀x ∈ [i, j − p] : S[x] = S[x + p]. If additionally (j − i + 1) ≥ p, then S[i..i + p) is called a string period of S[i..j]. Furthermore, p is called the shortest period of S[i..j] if there is no q ∈ [1, p) that is also a period of S[i..j]. Analogously, a string period of S[i..j] is called the shortest string period if there is no shorter string period of S[i..j]. A run is a triple ⟨i, j, p⟩ such that p is the shortest period of S[i..j], (j − i + 1) ≥ 2p (i.e. there are at least two consecutive occurrences of the shortest string period S[i..i + p)), and neither ⟨i − 1, j, p⟩ nor ⟨i, j + 1, p⟩ satisfies these properties (assuming i > 1 and j < n, respectively).
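As a direct reading of this definition, here is a small C++ sketch that verifies by brute force whether a triple ⟨i, j, p⟩ is a run of S. The helper names is_period and is_run are ours, and the quadratic-time check is meant for illustration (and for testing fast implementations), not for actual use.

    #include <cstdint>
    #include <string>

    // Is p a period of S[i..j]?  By definition: for all x in [i, j − p], S[x] = S[x + p].
    bool is_period(const std::string& S, int64_t i, int64_t j, int64_t p) {
        for (int64_t x = i; x <= j - p; ++x)
            if (S[x - 1] != S[x + p - 1]) return false;
        return true;
    }

    // Brute-force run check: p is the shortest period of S[i..j], the substring contains
    // at least two consecutive occurrences of the string period (j − i + 1 >= 2p), and the
    // periodicity cannot be extended by one position to the left or to the right.
    bool is_run(const std::string& S, int64_t i, int64_t j, int64_t p) {
        int64_t n = (int64_t)S.size();
        if (i < 1 || j > n || j - i + 1 < 2 * p || !is_period(S, i, j, p)) return false;
        for (int64_t q = 1; q < p; ++q)                       // p must be the shortest period
            if (is_period(S, i, j, q)) return false;
        if (i > 1 && S[i - 2] == S[i + p - 2]) return false;  // extendable to the left
        if (j < n && S[j] == S[j - p]) return false;          // extendable to the right
        return true;
    }

For instance, is_run("bananatree", 2, 6, 2) accepts the run anana = (an)^{5/2} from the introduction.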
Lyndon Words and Nearest Smaller Suffixes. For a length-n string S and i ∈ [1, n], the string S_i S[1..i) is called a cyclic shift of S, and a non-trivial cyclic shift if i > 1. A Lyndon word is a non-empty string that is lexicographically smaller than all of its non-trivial cyclic shifts, i.e. ∀i ∈ [2, n] : S ≺ S_i S[1..i). The Lyndon array of S identifies the longest Lyndon word starting at each position of S.

▶ Definition 2 (Lyndon Array). Given a string S of length n, its Lyndon array λ[1, n] is defined by ∀i ∈ [1, n] : λ[i] = max{j − i + 1 | j ∈ [i, n] ∧ S[i..j] is a Lyndon word}.

An alternative representation of the Lyndon array is the next-smaller-suffix array.

▶ Definition 3 (Next Smaller Suffixes). Given a string S of length n, its next-smaller-suffix (NSS) array is defined by ∀i ∈ [1, n] : nss[i] = min{j | j = n + 1 ∨ (j ∈ (i, n] ∧ S_i ≻ S_j)}. If nss[i] ≤ n, then S_{nss[i]} is called the next smaller suffix of S_i.

▶ Lemma 4 (Lemma 15 in [12]). The longest Lyndon word starting at any position i ∈ [1, n] of a length-n string S is exactly the substring S[i..nss[i]), i.e. ∀i ∈ [1, n] : λ[i] = nss[i] − i.

Figure 1: Lexicographically decreasing run ⟨5, 31, 7⟩ with S[5..31] = (abcabab)^{27/7}. The run has shortest string period α = abcabab, and is rooted in position 8 (with longest Lyndon word β = S[8..15) = abababc; in the depicted string, nss[8] = 15, λ[8] = 7, lce_ℓ(8, 15) = 4, and lce_r(8, 15) = 17).

An important property of next smaller suffixes is that they do not intersect:

▶ Lemma 5.
Let i ∈ [1, n] and i′ ∈ [i, nss[i]). Then it holds nss[i′] ≤ nss[i].

Proof.
Due to i′ ∈ [i, nss[i]) and Definition 3 it holds S_{i′} ≻ S_{nss[i]} (for i′ = i this is immediate from the definition of nss[i]; for i′ > i we have S_{i′} ≻ S_i ≻ S_{nss[i]}). Assume that the lemma does not hold; then we have nss[i] ∈ (i′, nss[i′]), and Definition 3 implies S_{i′} ≺ S_{nss[i]}, a contradiction. ◀

In this section, we recapitulate the main ideas of the algorithm by Bannai et al. [2], which is the basis of our solution for general ordered alphabets. The key insight is that every run is rooted in a longest Lyndon word, allowing us to compute all runs from the Lyndon array.

▶ Definition 6.
Let ⟨i, j, p⟩ be a run in a string S. We say that ⟨i, j, p⟩ is (lexicographically) decreasing if and only if S_i ≻ S_{i+p}. Otherwise, ⟨i, j, p⟩ is (lexicographically) increasing.

▶ Lemma 7.
Let ⟨i, j, p⟩ be a decreasing run. Then there is exactly one index i_r ∈ [i, i + p) such that λ[i_r] = p.

Proof.
Consider any i′ ∈ [i, i + p). By the definition of runs, we have S[i..i′) = S[i + p..i′ + p). Since the run is decreasing, it follows that S_i ≻ S_{i+p} ⟺ S[i..i′) S_{i′} ≻ S[i + p..i′ + p) S_{i′+p} ⟺ S_{i′} ≻ S_{i′+p}. This implies nss[i′] ≤ i′ + p, and due to Lemma 4 also λ[i′] ≤ p. Next, we show that there is at least one index i_r ∈ [i, i + p) such that S[i_r..i_r + p) is a Lyndon word. Let α = S[i..i + p). Assume that the described index i_r does not exist; then from S[i..i + 2p) = αα follows that no cyclic shift of α is a Lyndon word. Let β be a lexicographically minimal cyclic shift of α. This shift is not unique (otherwise β would be a Lyndon word), and thus β equals one of its non-trivial cyclic shifts, i.e. β = β_k β[1..k) for some k > 1. This however implies that β is of the form β = µ^e for some string µ and an integer e > 1, which contradicts the fact that α is the shortest string period of the run. Finally, let α_k α[1..k) with k ∈ [1, p] be the unique lexicographically smallest cyclic shift of α (and thus a Lyndon word); then it is easy to see that only i_r = i + k − 1 satisfies λ[i_r] = p. ◀

▶ Definition 8 (Root of a Run). Let ⟨i, j, p⟩ be a decreasing run, and let i_r ∈ [i, i + p) be the unique index with λ[i_r] = p (as described in Lemma 7). We say that ⟨i, j, p⟩ is rooted in i_r.

An example of a decreasing run and its root is provided in Figure 1. Note that our notion of a root differs from the L-roots introduced by Crochemore et al. [5]. While an L-root is any length-p Lyndon word contained in the run, our root is exactly the leftmost one.

Given a longest Lyndon word S[i_r..nss[i_r]) of length p = nss[i_r] − i_r = λ[i_r], it is easy to determine whether i_r is the root of a decreasing run. We simply try to extend the periodicity as far as possible to both sides by using the LCE functions. For this purpose, we only need to compute l = lce_ℓ(i_r, nss[i_r]) and r = lce_r(i_r, nss[i_r]). Let i = i_r − l + 1 and j = nss[i_r] + r − 1;
then clearly the substring S[i..j] has smallest period p, and we cannot extend the substring to either side without breaking the periodicity. Thus, if j − i + 1 ≥ 2p then ⟨i, j, p⟩ is a run. Note that this run is only rooted in i_r if additionally i_r ∈ [i, i + p) (or equivalently l ≤ p) holds. For the index i_r = 8 in Figure 1, we have l = lce_ℓ(8, 15) = 4 and r = lce_r(8, 15) = 17. Therefore, the run starts at position i = 8 − 4 + 1 = 5 and ends at position j = 15 + 17 − 1 = 31. From l = 4 ≤ p follows that 8 is actually the root.

Since each decreasing run is rooted in exactly one index, we can find all decreasing runs by checking for each index whether it is the root of a run. This procedure is outlined in Algorithm 1. First, we compute the NSS array (line 2). Then, we investigate one index i_r ∈ [1, n] at a time (line 3), and consider it as the root of a run with period p = nss[i_r] − i_r (line 4). If the left-extension covers an entire period (i.e. lce_ℓ(i_r, nss[i_r]) > p), then we have already investigated the root of the run in an earlier iteration of the for-loop, and no further action is required (line 5). Otherwise, we compute the left and right border of the potential run as described earlier (lines 6–7). If the resulting interval has length at least 2p, then we have discovered a run that is rooted in i_r (lines 8–9).
Algorithm 1: Compute all decreasing runs.
Input: String S of length n.
Output: Set R of all decreasing runs in S.
1: R ← ∅
2: compute array nss
3: for i_r ∈ [1, n] with nss[i_r] ≠ n + 1 do
4:     p ← nss[i_r] − i_r
5:     if lce_ℓ(i_r, nss[i_r]) ≤ p then
6:         i ← i_r − lce_ℓ(i_r, nss[i_r]) + 1
7:         j ← nss[i_r] + lce_r(i_r, nss[i_r]) − 1
8:         if j − i + 1 ≥ 2p then
9:             R ← R ∪ {⟨i, j, p⟩}
Time and space complexity. The NSS array can be computed in O(n) time and space for general ordered alphabets [3]. If we assume for now that we can answer L-LCE and R-LCE queries in constant time, then clearly the rest of the algorithm also requires only O(n) time and space. The correctness of the algorithm follows from Lemma 7 and the description above. We have shown:

▶ Lemma 9.
Let S be a string of length n over a general ordered alphabet, and let nss be its NSS array. We can compute all decreasing runs of S in O(n) + t(n) time and O(n) + s(n) space, where t(n) and s(n) are the time and space needed to compute lce_ℓ(i, nss[i]) and lce_r(i, nss[i]) for all i ∈ [1, n] with nss[i] ≠ n + 1.

In order to also find all increasing runs, we only need to rerun the algorithm with reversed alphabet order. This way, previously increasing runs become decreasing.
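To complement the pseudocode, here is a compact C++ sketch of Algorithm 1 that collects all decreasing runs. The array names, the Run struct, and the assumption that the LCE values are already precomputed are ours (the actual implementation interleaves these computations, see Section 5); this is a sketch, not the published code.

    #include <cstdint>
    #include <vector>

    struct Run { int64_t i, j, p; };                    // run <i, j, p>, 1-based positions

    // Sketch of Algorithm 1. It assumes that nss, lceL[i] = lce_l(i, nss[i]) and
    // lceR[i] = lce_r(i, nss[i]) have already been computed for all relevant i
    // (1-based arrays; nss uses the sentinel value n + 1 as in Definition 3).
    std::vector<Run> decreasing_runs(int64_t n, const std::vector<int64_t>& nss,
                                     const std::vector<int64_t>& lceL,
                                     const std::vector<int64_t>& lceR) {
        std::vector<Run> R;
        for (int64_t ir = 1; ir <= n; ++ir) {
            if (nss[ir] == n + 1) continue;             // no next smaller suffix
            int64_t p = nss[ir] - ir;                   // candidate period, p = lambda[ir]
            if (lceL[ir] > p) continue;                 // root was already handled earlier
            int64_t i = ir - lceL[ir] + 1;              // extend the periodicity to the left
            int64_t j = nss[ir] + lceR[ir] - 1;         // ... and to the right
            if (j - i + 1 >= 2 * p) R.push_back({i, j, p});
        }
        return R;
    }

Running the same function a second time with the alphabet order reversed yields the increasing runs, as described above.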
In this section, we show how to precompute the LCEs required by Algorithm 1 in linear time and space. Our approach is asymmetric in the sense that we require different algorithms for L-LCEs and R-LCEs (whereas previous approaches usually compute L-LCEs by applying the R-LCE algorithm to the reverse text). However, for both directions we use similar properties of the Lyndon array that are shown in Lemmas 10 and 11 and visualized in Figure 2a.

Figure 2: An edge from text position a to text position b indicates nss[a] = b. (a) Lemmas 10 and 11: the dotted edge follows from lce_r(i, j) ≥ (j − i) (Lemma 10), the dashed edge from lce_ℓ(i, j) > (j − i) (Lemma 11). (b) Relative order of R-LCE computations from first to last.

▶ Lemma 10.
Let i ∈ [1, n] and j = nss[i] ≠ n + 1. If lce_r(i, j) ≥ (j − i), then it holds lce_r(j, j + (j − i)) = lce_r(i, j) − (j − i) and nss[j] = j + (j − i).

Proof.
From lce_r(i, j) ≥ (j − i) follows lce_r(i, j) = (j − i) + lce_r(j, j + (j − i)), which is equivalent to lce_r(j, j + (j − i)) = lce_r(i, j) − (j − i). It remains to be shown that nss[j] = j + (j − i). Due to nss[i] = j it holds S_i ≻ S_j. Since S_i ≻ S_j and lce_r(i, j) ≥ (j − i), we have S_{i+(j−i)} ≻ S_{j+(j−i)}, which implies nss[j] ≤ j + (j − i). Note that nss[i] = j and Lemma 4 imply that S[i..j) = S[j..j + (j − i)) is a Lyndon word. Thus it holds λ[j] ≥ (j − i), or equivalently nss[j] ≥ j + (j − i). ◀
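As a concrete instance of Lemma 10 (using the values stated in Figure 1, and purely for illustration): for i = 8 and j = nss[8] = 15 we have lce_r(8, 15) = 17 ≥ 7 = j − i, so the lemma yields nss[15] = 15 + (15 − 8) = 22 and lce_r(15, 22) = 17 − 7 = 10.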
▶ Lemma 11. Let i ∈ [1, n] and j = nss[i] ≠ n + 1. If lce_ℓ(i, j) > (j − i), then it holds lce_ℓ(i − (j − i), i) = lce_ℓ(i, j) − (j − i) and nss[i − (j − i)] = i.

Proof.
Analogous to Lemma 10. ◀

First, we will briefly describe our general technique for computing LCEs, and our method of showing the linear time bound. Assume for this purpose that we want to compute ℓ = lce_r(i, j) with i < j. It is easy to see that we can determine ℓ by performing ℓ + 1 individual character comparisons (by simultaneously scanning the suffixes S_i and S_j from left to right until we find a mismatch). Whenever we use this naive way of computing an LCE, we charge one character comparison to each of the indices from the interval [j, j + ℓ). This way, we account for ℓ character comparisons. Since we want to compute O(n) R-LCE values in O(n) time, we can afford a constant time overhead (i.e. a constant number of unaccounted character comparisons) for each LCE computation. Thus, there is no need to charge the (ℓ + 1)-th comparison to any index. At the time at which we want to compute ℓ, we may already know some lower bound k ≤ ℓ. In such cases, we simply skip the first k character comparisons and compute ℓ = k + lce_r(i + k, j + k). This requires ℓ − k + 1 character comparisons, of which we charge ℓ − k to the interval [j + k, j + ℓ).

Ultimately, we will show that all R-LCE values lce_r(i, j) with i ∈ [1, n] and j = nss[i] ≠ n + 1 can be computed in a way such that each text position gets charged at most once, which results in the desired linear time bound. From now on, we refer to i as the left index and j as the right index of the R-LCE computation. Our algorithm computes the R-LCEs in the following order (a visualization is provided in Figure 2b): We consider the possible right indices j ∈ [2, n] one at a time and in increasing order. For each right index j, we then consider the corresponding left indices i with nss[i] = j in decreasing order (we will see how to efficiently deduce this order from the Lyndon array later).

Assume that we are computing the R-LCEs in the previously described order, and let ℓ = lce_r(i, j) with j = nss[i] ≠ n + 1 be the next value that we want to compute. The set of indices that we have already considered as left indices for LCE computations is I = {x | (nss[x] < j) ∨ ((nss[x] = j) ∧ (i < x))}. (For example, when we compute the fourth R-LCE in Figure 2b, the left indices of the first three computations are contained in I.) At this point in time, the rightmost text position that we have already inspected is →c = max_{x ∈ I}(nss[x] + lce_r(x, nss[x])) if I ≠ ∅, or →c = 1 otherwise. Due to the nature of our charging method, we have not charged any indices from the interval [→c, n] yet. Thus, in order to show that we can compute all LCEs without charging any index twice, it suffices to show how to compute ℓ = lce_r(i, j) without charging any index from the interval [1, →c). If j ≥ →c then we naively compute ℓ and charge the character comparisons to the interval [j, j + ℓ), thus only charging previously uncharged indices. The new value of →c is j + ℓ. If however j < →c, then the computation of ℓ depends on the previously computed LCEs, which we describe in the following.

Let ℓ′ = lce_r(i′, j′) with j′ = nss[i′] be the most recently computed R-LCE that satisfies j′ + ℓ′ = →c. Our strategy for computing ℓ depends on the position of i relative to i′ and j′. First, note that i ∉ [i′, j′), because otherwise Lemma 5 implies j ≤ j′, which contradicts our order of computation.
This leaves us with three possible cases (as before, a directed edge from text position a to text position b in the schematic drawings indicates nss[a] = b):

Case R1: i < i′ (possibly j′ = j).
Case R2: i = j′.
Case R3: i > j′.

Now we explain the cases in detail. Each case is accompanied by a schematic drawing. We strongly advise the reader to study the drawings alongside the description, since they are essential for an easy understanding of the matter.
Case R1: i < i′ (and j′ ≤ j < →c). (Schematic: |α| = j − j′, |β| = →c − j, ℓ′ = |αβ|, ℓ = |βγ|.)

Due to i < (i′ + j − j′) < j = nss[i] we have S_j ≺ S_i ≺ S_{i′+j−j′}. From Lemma 1 follows →c − j = lce_r(i′ + j − j′, j) ≤ lce_r(i, j) = ℓ, i.e. both S_i and S_j start with β. Since now we know a lower bound →c − j ≤ ℓ of the desired LCE value, we can skip character comparisons during its computation. Later, we will see that the same bound also holds for most of the other cases. Generally, whenever we can show →c − j ≤ ℓ we use the following strategy. We compute ℓ = (→c − j) + lce_r(i + (→c − j), →c) using ℓ − (→c − j) + 1 character comparisons, of which we charge ℓ − (→c − j) to the interval [→c, j + ℓ). Thus we only charge previously uncharged positions. We continue with i′ ← i, j′ ← j, ℓ′ ← ℓ, and →c ← j + ℓ.
Case R2: i = j′. We divide this case into two subcases.
Case R2a: ℓ′ < j′ − i′. (Schematic: |α| = j − j′, |β| = →c − j.)

From j < →c ⟹ j − i < →c − i = ℓ′ and ℓ′ < j′ − i′ follows i′ + j − i < j′ = i. Therefore, nss[i′] = i and Definition 3 imply S_i ≺ S_{i′+j−i}. Due to nss[i] = j we also have S_j ≺ S_i, such that it holds S_j ≺ S_i ≺ S_{i′+j−i}. It is easy to see that S_{i′+j−i} and S_j share a prefix β of length lce_r(i′ + j − i, j) = →c − j. In fact, also S_i has prefix β, because Lemma 1 implies that lce_r(i′ + j − i, j) ≤ lce_r(i, j) = ℓ. Thus it holds →c − j ≤ ℓ, which allows us to use the strategy from Case R1.

Case R2b: ℓ′ ≥ j′ − i′. (Schematic: |β| = j′ − i′, ℓ = ℓ′ − |β|.)

Due to ℓ′ ≥ j′ − i′, Lemma 10 implies j = i + (j′ − i′) and ℓ = ℓ′ − (j′ − i′). Since i′, j′, and ℓ′ are known, we can compute ℓ in constant time without performing any character comparisons. We continue with i′ ← i, j′ ← j, and ℓ′ ← ℓ (leaving →c unchanged).

Case R3: i > j′. This is the most complicated case, and it is best explained by dividing it into three subcases. Let d = j′ − i′, i″ = i − d, j″ = j − d, and ℓ″ = lce_r(i″, j″).

Case R3a: nss[i″] ≠ j″. (Schematic: |α| = ℓ′, |β| = |γ| = →c − j, ℓ″ ≥ |β|, ℓ ≥ |β|.)

First, note that S[i′..i′ + ℓ′) = S[j′..→c) implies S[i..j) = S[i″..j″). From nss[i] = j follows that S[i..j) = S[i″..j″) is a Lyndon word. Thus, due to Lemma 4 and nss[i″] ≠ j″ it holds nss[i″] > j″, which implies S_{i″} ≺ S_{j″}. Let β = S[i″..i″ + →c − j) = S[i..i + →c − j) and let γ = S[j″..i′ + ℓ′) = S[j..→c). From S_{i″} ≺ S_{j″} follows β ⪯ γ, while S_i ≻ S_j implies β ⪰ γ. Thus it holds β = γ, and therefore lce_r(i, j) ≥ |γ| = →c − j. This means that we can use the strategy from Case R1.

Case R3b: nss[i″] = j″ and (j″ + ℓ″) < (i′ + ℓ′). (Schematic: |α| = ℓ′, |β| = ℓ″ = ℓ.)

Due to ℓ″ = lce_r(i″, j″), there is a shared prefix β = S[i″..i″ + ℓ″) = S[j″..j″ + ℓ″) between S_{i″} and S_{j″}, and the first mismatch between the two suffixes is S[i″ + ℓ″] ≠ S[j″ + ℓ″]. Because of (j″ + ℓ″) < (i′ + ℓ′), both the shared prefix and the mismatch are contained in S[i′..i′ + ℓ′) (i.e. in the first occurrence of α). If we consider the substring S[j′..j′ + ℓ′) instead (i.e. the second occurrence of α), then S_i and S_j clearly also share the prefix β = S[i..i + ℓ″) = S[j..j + ℓ″), with the first mismatch occurring at S[i + ℓ″] ≠ S[j + ℓ″]. Thus it holds ℓ = ℓ″. Due to nss[i″] = j″ and our order of R-LCE computations, we have already computed ℓ″. Therefore, we can simply assign ℓ ← ℓ″ and continue without changing i′, j′, ℓ′, and →c.

Case R3c: nss[i″] = j″ and (j″ + ℓ″) ≥ (i′ + ℓ′). (Schematic: |α| = ℓ′, |β| = →c − j, |βγ| = ℓ″, ℓ ≥ |β|.)

This situation is similar to Case R3b. There is a shared prefix β = S[i″..i″ + →c − j) = S[j″..i′ + ℓ′) between the suffixes S_{i″} and S_{j″}.
They may share an even longer prefix βγ, but only the first |β| = →c − j symbols of their LCP are contained in S[i′..i′ + ℓ′) (i.e. in the first occurrence of α). If we consider the substring S[j′..j′ + ℓ′) instead (i.e. the second occurrence of α), then S_i and S_j clearly also share at least the prefix β = S[i..i + →c − j) = S[j..→c). Thus it holds →c − j ≤ ℓ, and we can use the strategy from Case R1.

We have shown how to compute ℓ without charging any index twice. It follows that the total number of character comparisons for all R-LCEs is O(n).
A Simple Algorithm for R-LCEs. While the detailed differentiation between the six subcases helps to show the correctness of our approach, it can be implemented in a significantly simpler way (see Algorithm 2). At all times, we keep track of j′, →c, and the distance d = j′ − i′ (line 1). We consider the indices j ∈ [2, n] in increasing order (line 2). For each index j, we then consider the indices i with nss[i] = j in decreasing order (line 3). Each time we want to compute an R-LCE value ℓ = lce_r(i, j), we first check whether Case R3b applies (line 4). If it does, then we simply copy the previously computed R-LCE value lce_r(i − d, j − d) (line 5). Otherwise, we either compute the LCE naively (if j ≥ →c), or we have to apply the strategy from Case R1 (since all other cases except for Case R2b use this strategy; in Case R2b it holds →c − j = ℓ, which means that it can also be solved with the strategy from Case R1). If j ≥ →c, then in lines 7–8 we have k = 0, and thus we naively compute lce_r(i, j) by scanning. If however j < →c, then we have k = →c − j, and we skip k character comparisons. In either case, we update the values j′, →c, and d accordingly (line 9).

The correctness of the algorithm follows from the description of Cases R1–R3. Since for each left index i we have to store at most one R-LCE, we can simply maintain the LCEs in a length-n array, where the i-th entry is lce_r(i, nss[i]). This way, we use linear space and can access the R-LCE that is required in line 5 in constant time. Apart from the at most n character comparisons that we charge to the indices, we only need a constant number of additional primitive operations per computed R-LCE. The order of iteration can be realized by first generating all (i, nss[i])-pairs, and then using a linear time radix sorter to sort the pairs in increasing order of their second component and decreasing order of their first component. We have shown:
Algorithm 2: Compute all R-LCEs.
Input: String S of length n and its NSS array nss.
Output: R-LCE value lce_r(i, nss[i]) for each index i ∈ [1, n] with nss[i] ≠ n + 1.
1: j′ ← →c ← d ← 1
2: for j ∈ [2, n] in increasing order do
3:     for i with nss[i] = j ≠ n + 1 in decreasing order do
4:         if i, j ∈ (j′, →c) ∧ nss[i − d] = j − d ∧ j + lce_r(i − d, j − d) < →c then
5:             lce_r(i, j) ← lce_r(i − d, j − d)
6:         else
7:             k ← max(→c, j) − j
8:             lce_r(i, j) ← k + naive-scan-lce_r(i + k, j + k)
9:             j′ ← j; →c ← j + lce_r(i, j); d ← j − i
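For illustration, here is one possible C++ rendering of Algorithm 2, including the bucket-based realization of the iteration order described above (→c is written as c; all names and the array layout are assumptions of this sketch rather than the paper's actual code).

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Sketch of Algorithm 2: lce_r(i, nss[i]) for every i with nss[i] != n + 1.
    // 1-based positions; nss is assumed to have size n + 1 (index 0 unused) and to
    // use the sentinel value n + 1 exactly as in Definition 3.
    std::vector<int64_t> all_rlce(const std::string& text, const std::vector<int64_t>& nss) {
        int64_t n = (int64_t)text.size();
        auto naive = [&](int64_t i, int64_t j) {        // naive-scan-lce_r
            int64_t e = 0;
            while (i + e <= n && j + e <= n && text[i + e - 1] == text[j + e - 1]) ++e;
            return e;
        };
        // Iteration order: right indices j increasing; for fixed j, left indices i decreasing.
        // Bucketing the (i, nss[i])-pairs by nss[i] realizes the required radix sort in O(n).
        std::vector<std::vector<int64_t>> bucket(n + 2);
        for (int64_t i = 1; i <= n; ++i)
            if (nss[i] != n + 1) bucket[nss[i]].push_back(i);   // i pushed in increasing order

        std::vector<int64_t> lceR(n + 1, 0);            // lceR[i] = lce_r(i, nss[i])
        int64_t jp = 1, c = 1, d = 1;                   // j', ->c, and d = j' - i'
        for (int64_t j = 2; j <= n; ++j) {
            for (auto it = bucket[j].rbegin(); it != bucket[j].rend(); ++it) {
                int64_t i = *it;                        // left indices in decreasing order
                if (i > jp && i < c && j > jp && j < c &&          // i, j in (j', ->c)
                    nss[i - d] == j - d && j + lceR[i - d] < c) {  // Case R3b: copy
                    lceR[i] = lceR[i - d];
                } else {
                    int64_t k = std::max(c, j) - j;     // skip the already known symbols
                    lceR[i] = k + naive(i + k, j + k);
                    jp = j; c = j + lceR[i]; d = j - i;
                }
            }
        }
        return lceR;
    }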
▶ Lemma 12. Given a string of length n and its NSS array nss, we can compute lce_r(i, nss[i]) for all indices i ∈ [1, n] with nss[i] ≠ n + 1 in O(n) time and space.

Our solution for the L-LCEs is similar to the one for R-LCEs, but differs in subtle details. We generally compute ℓ = lce_ℓ(i, j) by simultaneously scanning the prefixes S[1..i] and S[1..j] from right to left until we find the first mismatch. This takes ℓ + 1 character comparisons, of which we charge ℓ comparisons to the interval (i − ℓ, i]. As before, if some lower bound k ≤ ℓ is known, then we skip k character comparisons. In this case, we compute the L-LCE as ℓ = k + lce_ℓ(i − k, j − k), and charge ℓ − k comparisons to the interval (i − ℓ, i − k].

Again, we will show how to compute all values lce_ℓ(i, nss[i]) with i ∈ [1, n] and nss[i] ≠ n + 1 such that each index gets charged at most once. In contrast to the more complex R-LCE iteration order, we can simply compute the L-LCE values in decreasing order of i. Thus, when we want to compute ℓ = lce_ℓ(i, j) with j = nss[i] ≠ n + 1, we have already considered the indices I = {x | x ∈ (i, n] ∧ nss[x] ≠ n + 1} as left indices of L-LCE computations. The leftmost text position that we have already inspected at this point is ←c = min_{x ∈ I}(x − lce_ℓ(x, nss[x])) if I ≠ ∅, or ←c = n otherwise. Due to our charging method, we have not charged any index from the interval [1, ←c] yet. Thus, we only have to show how to compute ℓ without charging indices from (←c, n]. Let ℓ′ = lce_ℓ(i′, j′) be the most recently computed L-LCE that satisfies i′ − ℓ′ = ←c. If i ≤ ←c, then we compute ℓ naively and charge the character comparisons to the interval (i − ℓ, i] (thus only charging previously uncharged indices). If however i > ←c, then our strategy is more complicated. Before explaining it in detail, we show three important properties that hold in the present situation.

First, we show that i ≥ i′ − (j′ − i′). Assume the opposite (as visualized in Figure 3a); then from ←c = i′ − ℓ′ < i follows ℓ′ > j′ − i′. Thus, Lemma 11 implies nss[i′ − (j′ − i′)] = i′ (dashed edge in the figure) and lce_ℓ(i′ − (j′ − i′), i′) = ℓ′ − (j′ − i′). Due to our order of computation and i < i′ − (j′ − i′), we must have already computed this L-LCE. However, it holds i′ − (j′ − i′) − lce_ℓ(i′ − (j′ − i′), i′) = ←c, which contradicts the fact that ℓ′ = lce_ℓ(i′, j′) is the most recently computed L-LCE with i′ − ℓ′ = ←c.

Figure 3: Illustration of the proofs of the three properties in Section 4.2.
Next, we show that j ≤ i′. First, note that j ∉ (i′, j′), since due to i < i′ we would otherwise contradict Lemma 5. Thus we only have to show j < j′. Assume for this purpose that j ≥ j′ (as visualized in Figure 3b). From j′ − i′ + i ∈ (i, nss[i]) and Definition 3 follows S_i ≺ S_{j′−i′+i}. Because of lce_ℓ(i′, j′) > (i′ − i) it holds S[i..i′] = S[j′ − i′ + i..j′] (= β). Thus S_i ≺ S_{j′−i′+i} implies S_{i′} ≺ S_{j′}, which contradicts the fact that nss[i′] = j′.

Lastly, let d = j′ − i′, i″ = i + d, and j″ = j + d (as visualized in Figure 3c). Now we show that nss[i″] = j″ (dashed edge in the figure). Because of α = S(←c..i′] = S(j′ − ℓ′..j′] it holds S[i..j) = S[i″..j″). From nss[i] = j and Lemma 4 follows that S[i″..j″) is a Lyndon word, and thus nss[i″] ≥ j″. We have already shown that i ≥ i′ − (j′ − i′), which implies i″ ≥ i′. Due to nss[i′] = j′ and i″ ∈ [i′, j′), it follows from Lemma 5 that nss[i″] ≤ j′. Now assume nss[i″] ∈ (j″, j′]; then S[i″..nss[i″]) = S[i..j + (nss[i″] − j″)) is a Lyndon word, which contradicts the fact that S[i..j) is the longest Lyndon word starting at position i. Thus, we have ruled out all possible values of nss[i″] except for j″.

Now we show how to compute ℓ. We keep using the definition of i″ and j″ from the previous paragraph. Furthermore, let ℓ″ = lce_ℓ(i″, j″). There are two possible cases.

Case L1: (i″ − ℓ″) > (j′ − ℓ′). (Schematic: ℓ′ = |α|, ℓ = ℓ″ = |β|.)

Due to ℓ″ = lce_ℓ(i″, j″), the prefixes S[1..i″] and S[1..j″] share the suffix β = S(i″ − ℓ″..i″] = S(j″ − ℓ″..j″], and the first (from the right) mismatch between these prefixes is S[i″ − ℓ″] ≠ S[j″ − ℓ″]. Both the shared suffix and the mismatch are contained in S(j′ − ℓ′..j′] (i.e. in the right occurrence of α). If we consider the substring S(←c..i′] instead (i.e. the left occurrence of α), then S[1..i] and S[1..j] clearly also share the suffix β = S(i − ℓ″..i] = S(j − ℓ″..j], with the first mismatch occurring at S[i − ℓ″] ≠ S[j − ℓ″]. Thus it holds ℓ = ℓ″. Due to nss[i″] = j″ and our order of L-LCE computations, we have already computed ℓ″. Therefore, we can simply assign ℓ ← ℓ″ and continue without changing i′, j′, ℓ′, and ←c. (Note that possibly i″ ≠ i′ ∧ j″ = j′. We provide a sketch in Appendix A, Figure 4a.)

Case L2: (i″ − ℓ″) ≤ (j′ − ℓ′). (Schematic: ℓ′ = |α|, ℓ″ = |βγ|, ℓ ≥ |β|.)

This situation is similar to Case L1. There is a shared suffix β = S(j′ − ℓ′..i″] = S(j″ − (i − ←c)..j″] between the prefixes S[1..i″] and S[1..j″]. They may share an even longer suffix γβ, but only the rightmost |β| = i − ←c symbols of this suffix are contained in S(j′ − ℓ′..j′] (i.e. in the right occurrence of α). If we consider the substring S(←c..i′] instead (i.e. the left occurrence of α), then S[1..i] and S[1..j] clearly also share the suffix β = S(←c..i] = S(j − (i − ←c)..j]. Thus it holds i − ←c ≤ ℓ, and we can skip the first i − ←c character comparisons by computing the LCE as ℓ = (i − ←c) + lce_ℓ(←c, j + ←c − i).
We charge ℓ − (i − ←c) character comparisons to the previously uncharged interval (i − ℓ, ←c], and continue with i′ ← i, j′ ← j, ℓ′ ← ℓ, and ←c ← i − ℓ. (Note that possibly i″ ≠ i′ ∧ j″ = j′ or even i″ = i′ ∧ j″ = j′. We provide schematic drawings in Appendix A, Figures 4b and 4c.)

We have shown how to compute ℓ without charging any index twice. It follows that the total number of character comparisons for all L-LCEs is O(n). For completeness, we outline a simple implementation of our approach in Algorithm 3. Lines 4–5 correspond to Case L1. If i ≤ ←c, then lines 7–9 compute the LCE naively. Otherwise, they correspond to Case L2.
Algorithm 3: Compute all L-LCEs.
Input: String S of length n and its NSS array nss.
Output: L-LCE value lce_ℓ(i, nss[i]) for each index i ∈ [1, n] with nss[i] ≠ n + 1.
1: i′ ← ←c ← n; d ← 0
2: for i ∈ [1, n] with nss[i] ≠ n + 1 in decreasing order do
3:     j ← nss[i]
4:     if i ∈ (←c, i′) ∧ i − lce_ℓ(i + d, j + d) > ←c then
5:         lce_ℓ(i, j) ← lce_ℓ(i + d, j + d)
6:     else
7:         k ← i − min(←c, i)
8:         lce_ℓ(i, j) ← k + naive-scan-lce_ℓ(i − k, j − k)
9:         i′ ← i; ←c ← i − lce_ℓ(i, j); d ← j − i
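Symmetrically, a possible C++ rendering of Algorithm 3 (again, the names and the array layout are our own assumptions; the correctness of the copy branch relies on the three properties proved above, which guarantee nss[i + d] = j + d there).

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Sketch of Algorithm 3: lce_l(i, nss[i]) for every i with nss[i] != n + 1.
    // 1-based positions; <-c is written as c.
    std::vector<int64_t> all_llce(const std::string& text, const std::vector<int64_t>& nss) {
        int64_t n = (int64_t)text.size();
        auto naive = [&](int64_t i, int64_t j) {        // naive-scan-lce_l (right-to-left scan)
            int64_t e = 0;
            while (i - e >= 1 && j - e >= 1 && text[i - e - 1] == text[j - e - 1]) ++e;
            return e;
        };
        std::vector<int64_t> lceL(n + 1, 0);            // lceL[i] = lce_l(i, nss[i])
        int64_t ip = n, c = n, d = 0;                   // i', <-c, and d = j' - i'
        for (int64_t i = n; i >= 1; --i) {              // left indices in decreasing order
            if (nss[i] == n + 1) continue;
            int64_t j = nss[i];
            if (i > c && i < ip && i - lceL[i + d] > c) {   // Case L1: copy an earlier value
                lceL[i] = lceL[i + d];
            } else {                                    // skip the i - min(<-c, i) known symbols
                int64_t k = i - std::min(c, i);
                lceL[i] = k + naive(i - k, j - k);
                ip = i; c = i - lceL[i]; d = j - i;
            }
        }
        return lceL;
    }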
▶ Lemma 13. Given a string of length n and its NSS array nss, we can compute lce_ℓ(i, nss[i]) for all indices i ∈ [1, n] with nss[i] ≠ n + 1 in O(n) time and space.

▶ Corollary 14.
Given a string of length n over a general ordered alphabet, we can find all runs in the string in O(n) time and space.

Proof.
Computing the decreasing runs takes O(n) time and space due to Lemmas 9, 12, and 13. For increasing runs, we only have to reverse the order of the alphabet and rerun the algorithm. ◀

We implemented our algorithm for the runs computation in C++17 and evaluated it by computing all runs on texts from the natural, real repetitive, and artificial repetitive text collections of the Pizza-Chili corpus (http://pizzachili.dcc.uchile.cl/texts.html and http://pizzachili.dcc.uchile.cl/repcorpus.html). Additionally, we used the binary run-rich strings
proposed by Matsubara et al. [21] as input.

Table 1: Throughput achieved by our runs algorithm using an AMD EPYC 7452 processor. We repeated each experiment five times and use the median throughput as the final result. All numbers are truncated to one decimal place. [Columns: text, input size n in MiB, throughput in MiB/s, and runs per 100n; texts: sources, pitches, proteins, dna, english, xml, ecoli, cere, fib41, rs.13, tm29.]

Table 1 shows the throughput that we achieve, i.e. the number of input bytes (or equivalently input symbols) that we process per second. On the string tm29 we achieve the highest throughput (just over 15 MiB/s), while the lowest throughput is achieved on dna. Generally, we perform better for run-rich strings.

Lastly, it is noteworthy that our new method of LCE computation leads to a remarkably simple implementation of the runs algorithm. In fact, the entire implementation including the computation of the NSS array needs only 250 lines of code. We achieve this by interleaving the computation of the R-LCEs with the computation of the NSS array, which also improves the practical performance. For technical details we refer to the source code, which is publicly available on GitHub (https://github.com/jonas-ellert/linear-time-runs/).

We have shown the first linear time algorithm for computing all runs on a general ordered alphabet. The algorithm is also very fast in practice and remarkably easy to implement. It is an open question whether our techniques could be used for the computation of runs on tries, where the best known algorithms require superlinear time even for linearly-sortable alphabets (see e.g. [24]).
References
[1] Arne Andersson, Torben Hagerup, Stefan Nilsson, and Rajeev Raman. Sorting in linear time? Journal of Computer and System Sciences, 57(1):74–93, 1998. doi:10.1006/jcss.1998.1580.
[2] Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The "runs" theorem. SIAM Journal on Computing, 46(5):1501–1514, 2017. doi:10.1137/15M1011032.
[3] Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg. Space efficient construction of Lyndon arrays in linear time. In Proceedings of the 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020), pages 14:1–14:18, Saarbrücken, Germany, July 2020. doi:10.4230/LIPIcs.ICALP.2020.14.
[4] Helen Budworth and Cynthia T. McMurray. A Brief History of Triplet Repeat Diseases, volume 1010 of Methods in Molecular Biology, pages 3–17. Springer, 2013. doi:10.1007/978-1-62703-411-1_1.
[5] M. Crochemore, C.S. Iliopoulos, M. Kubica, J. Radoszewski, W. Rytter, and T. Waleń. Extracting powers and periods in a word from its runs structure. Theoretical Computer Science, 521:29–41, 2014. doi:10.1016/j.tcs.2013.11.018.
[6] Maxime Crochemore and Lucian Ilie. Maximal repetitions in strings. Journal of Computer and System Sciences, 74(5):796–807, 2008. doi:10.1016/j.jcss.2007.09.003.
[7] Maxime Crochemore, Lucian Ilie, and Wojciech Rytter. Repetitions in strings: Algorithms and combinatorics. Theoretical Computer Science, 410(50):5227–5235, 2009. doi:10.1016/j.tcs.2009.08.024.
[8] Maxime Crochemore, Lucian Ilie, and Liviu Tinta. The "runs" conjecture. Theoretical Computer Science, 412(27):2931–2941, 2011. doi:10.1016/j.tcs.2010.06.019.
[9] Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Ritu Kundu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Near-optimal computation of runs over general alphabet via non-crossing LCE queries. In Proceedings of the 23rd International Symposium on String Processing and Information Retrieval (SPIRE 2016), pages 22–34, Beppu, Japan, October 2016. doi:10.1007/978-3-319-46049-9_3.
[10] Johannes Fischer and Volker Heun. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM 2006), pages 36–48, Barcelona, Spain, July 2006. doi:10.1007/11780441_5.
[11] Johannes Fischer, Stepan Holub, Tomohiro I, and Moshe Lewenstein. Beyond the runs theorem. In Proceedings of the 22nd International Symposium on String Processing and Information Retrieval (SPIRE 2015), volume 9309 of Lecture Notes in Computer Science, pages 277–286. Springer, 2015. doi:10.1007/978-3-319-23826-5_27.
[12] Frantisek Franek, A. S. M. Sohidull Islam, Mohammad Sohel Rahman, and William F. Smyth. Algorithms to compute the Lyndon array. In Proceedings of the Prague Stringology Conference 2016 (PSC 2016).
[13] Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster longest common extension queries in strings over general alphabets. In Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), pages 5:1–5:13, Tel Aviv, Israel, June 2016. doi:10.4230/LIPIcs.CPM.2016.5.
[14] Torben Hagerup. Sorting and searching on the word RAM. In Proceedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science (STACS 98), pages 366–398, Paris, France, February 1998. doi:10.1007/BFb0028575.
[15] Yijie Han and M. Thorup. Integer sorting in O(n √(log log n)) expected time and linear space. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2002), pages 135–144, Vancouver, Canada, November 2002. doi:10.1109/SFCS.2002.1181890.
[16] Stepan Holub. Prefix frequency of lost positions. Theoretical Computer Science, 684:43–52, 2017. doi:10.1016/j.tcs.2017.01.026.
[17] R. Kolpakov and G. Kucherov. Finding maximal repetitions in a word in linear time. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS 1999), pages 596–604, New York, NY, USA, October 1999. doi:10.1109/SFFCS.1999.814634.
[18] Dmitry Kosolobov. Lempel-Ziv factorization may be harder than computing all runs. In Proceedings of the 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015), pages 582–593, Munich, Germany, March 2015. doi:10.4230/LIPIcs.STACS.2015.582.
[19] Dmitry Kosolobov. Computing runs on a general alphabet. Information Processing Letters, 116(3):241–244, 2016. doi:10.1016/j.ipl.2015.11.016.
[20] R. C. Lyndon and M. P. Schützenberger. The equation a^m = b^n c^p in a free group. Michigan Mathematical Journal, 9(4):289–298, 1962. doi:10.1307/mmj/1028998766.
[21] Wataru Matsubara, Kazuhiko Kusano, Hideo Bannai, and Ayumi Shinohara. A series of run-rich strings. In Proceedings of the 3rd International Conference on Language and Automata Theory and Applications (LATA 2009), pages 578–587, Tarragona, Spain, April 2009. doi:10.1007/978-3-642-00982-2_49.
[22] Simon J. Puglisi, Jamie Simpson, and W.F. Smyth. How many runs can a string contain? Theoretical Computer Science, 401(1):165–171, 2008. doi:10.1016/j.tcs.2008.04.020.
[23] Wojciech Rytter. The number of runs in a string: Improved analysis of the linear upper bound. In Proceedings of the 24th Annual Symposium on Theoretical Aspects of Computer Science (STACS 2006), pages 184–195, Marseille, France, February 2006. doi:10.1007/11672142_14.
[24] Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing runs on a trie. In