Words with unbounded periodicity complexity
ŠTĚPÁN HOLUB
Abstract.
If an infinite non-periodic word is uniformly recurrent or is of bounded repetition, then the limit of its periodicity complexity is infinity. Moreover, there are uniformly recurrent words with the periodicity complexity arbitrarily high at infinitely many positions.

1. Introduction
In [5], a new complexity function of infinite words, called periodicity complexity, was introduced. It gives, for any position in the word, the average value of the local periods up to that position. The authors construct an infinite word for which the periodicity complexity is bounded. Since the word is not uniformly recurrent, they ask whether there exist non-periodic uniformly recurrent words having bounded periodicity complexity (see Remark 3.8. in [5]). In Section 3, we define words with special lexicographic properties. In Section 4 we use those words to define a factorization of any non-periodic uniformly recurrent word. This factorization allows us to show, in Section 5, that the answer to the above question is negative. Moreover, in Section 6, we show that any word with bounded repetition has unbounded periodicity complexity too.

The authors of [5] also prove that the periodicity complexity can exceed any fixed function infinitely many times. Again, the witness word is not uniformly recurrent, and a problem is left open whether the same is true for uniformly recurrent words (see Remark 3.8. in [5]). In Section 7 we construct a uniformly recurrent word showing that the answer is positive. The method can be seen as a generalized Toeplitz construction.

2. Basic concepts
We first recall basic definitions and concepts. Let $w = a_1 a_2 a_3 \cdots$, where $a_i$ are letters, be a finite or infinite word over some alphabet $\Sigma$. A position in the word is any integer $1 \leq i \leq |w|$ (we have $|w| = \infty$ if $w$ is infinite). The position $i$ can be understood as the border between $a_i$ and $a_{i+1}$, but we will rather identify it with the pair $(u, v)$ of words such that $w = uv$ and $|u| = i$. Note that, unlike [5], we consider also the position $|w|$ for a finite word, which does not lie between two letters and corresponds to $(w, \varepsilon)$, where $\varepsilon$ denotes the empty word.

Key words and phrases. periodicity complexity, combinatorics on words.
Supported by the Czech Science Foundation grant number 13-01832S.

By a proper prefix of $w$ we mean any prefix, including the empty one, that is strictly shorter than $w$. The period of $w$, denoted by $p(w)$, is the least positive integer $p$ such that $a_i = a_{i+p}$ holds for all $i \geq 1$ with $i + p \leq |w|$.

A finite word $w$ is called unbordered if $p(w) = |w|$. Otherwise, $w$ is called bordered, and any proper nonempty prefix of $w$ that is also a suffix of $w$ is called its border. It is important and easy to see that the shortest border of a bordered word is itself unbordered.

If
\[ \sup \{ e : v^e \text{ is a factor of } u \text{ for some nonempty word } v \} \]
is finite for some infinite word $u$, we say that $u$ is of bounded repetition.

A word is primitive if it is not a power of a shorter word. A primitive word $w$ is a Lyndon word if it is lexicographically minimal within the set of all words conjugate with it. That is, $w \triangleright vu$ for any factorization $w = uv$ with $u$ and $v$ nonempty, where $\triangleright$ is a lexicographic order. It is well known that Lyndon words are unbordered.

A repetition word at the position $(u, v)$ is any nonempty word $r$ that is suffix comparable with $u$ and prefix comparable with $v$ (any word being prefix comparable with the empty word). The local period of $w$, denoted by $p_w(i)$, is defined by
\[ p_w(i) = \min \{ |r| : r \text{ is a repetition word at the position } i \}. \]
Note that an infinite word may have positions $(u, v)$ with no repetition word. This happens if and only if there is no repetition word of length at most $|u|$ and $u$ is not a factor of $v$. Then the corresponding local period is $\infty$ (in accordance with the usual definition of $\min \emptyset$). Note also that $p_w(|w|) = 1$ for any finite $w$.

The most important result concerning local periods is the Critical Factorization Theorem, which states that for any finite word there is a position $i$ such that $p_w(i) = p(w)$. Such a position is called critical.
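The definitions of the period, the local period and critical positions can be checked on short finite words. The following Python sketch is only an illustration and is not part of the original text; the function names `period`, `local_period` and `critical_positions` are our own. It relies on the observation that a repetition word of length $r$ exists at position $i$ exactly when all pairs of letters of $w$ at distance $r$ lying on opposite sides of the cut agree.

```python
def period(w: str) -> int:
    """Least p >= 1 such that w[i] == w[i + p] whenever both letters exist."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return 0  # only reached for the empty word


def local_period(w: str, i: int) -> int:
    """Local period of the finite word w at position i, where 1 <= i <= len(w).

    The position is the pair (u, v) = (w[:i], w[i:]).
    """
    n = len(w)
    for r in range(1, n + 1):
        # 0-based indices t inside u whose partner t + r lies inside v
        lo, hi = max(0, i - r), min(i, n - r)
        if all(w[t] == w[t + r] for t in range(lo, hi)):
            return r
    return n  # not reached for nonempty w: r = len(w) always succeeds


def critical_positions(w: str) -> list[int]:
    """Positions whose local period equals the period of w."""
    p = period(w)
    return [i for i in range(1, len(w) + 1) if local_period(w, i) == p]


w = "abaab"
print(period(w))                                            # 3
print([local_period(w, i) for i in range(1, len(w) + 1)])   # [2, 3, 1, 3, 1]
print(critical_positions(w))                                # [2, 4]
```

For the sample word the list of critical positions is nonempty, as the Critical Factorization Theorem guarantees, and the last position has local period 1 in accordance with the convention adopted above.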
For a proof of the Critical Factorization Theorem using lexicographic orderings, see [2].

Let $w = uvz$ be a factorization of $w$. We will say that the position $1 \leq i \leq |v|$ of $v$ corresponds to the position $|u| + i$ of $w$. We also say that the position $i$ of $w$ lies in $v$ if $|u| < i \leq |uv|$. This is rather informal since a particular occurrence of $v$ in $w$ is implicitly understood, but it will always be clear from the context.

Note that
\[ p_v(i) \leq p_{uvz}(|u| + i), \qquad 1 \leq i \leq |v|. \tag{1} \]
Informally, the local period of a word at a position is at least the local period of its factor at the corresponding position.

We say that $j \geq 0$ is an occurrence of a word $z$ in $w$ if $w = uv$, $|u| = j$ and $z$ is a prefix of $v$. If $j$ and $j'$ are two consecutive occurrences of $z$ in $w = a_1 a_2 a_3 \cdots$, then $a_{j+1} a_{j+2} \cdots a_{j'}$ is called a return word to $z$ in $w$. Return words were introduced and studied in [3].

An infinite word $w$ is called recurrent if any factor of $w$ has an infinite number of occurrences in $w$. It is called uniformly recurrent if for any factor $z$, the length of all return words to $z$ in $w$ is bounded. We call the length of the longest return word to $z$ in $w$ the maximal return time of $z$ in $w$. Note that if an infinite word is recurrent, then its local period is finite at all positions.

The property studied in this paper is the periodicity complexity of $w$, which is the function $h_w$ defined on positions of $w$ by
\[ h_w(i) = \frac{1}{i} \sum_{j=1}^{i} p_w(j). \]
The notion was introduced (for infinite words) in [5]. If $w$ is infinite, then it is reasonable to suppose that $w$ is recurrent, which guarantees that the values of $h_w$ are positive rationals (no infinity has to be dealt with). However, we can also put $h_w(i) = \infty$ if $p_w(j) = \infty$ for some $1 \leq j \leq i$. For a finite $w$, denote
\[ h(w) := h_w(|w|). \]
Note the following useful inequality, which follows from (1):
\[ h(uv) \geq \frac{|u| \cdot h(u) + |v| \cdot h(v)}{|uv|}. \tag{2} \]

3. Lexicographically minimal return words
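Since this section is built on return words, a small computational illustration may help. The Python sketch below is not part of the original text. It computes occurrences and return words in a finite prefix of the Thue–Morse word, which is our own choice of a uniformly recurrent non-periodic example; the names `thue_morse_prefix`, `occurrences`, `return_words` and `max_exponent` are hypothetical helpers. On a finite prefix the results only approximate the corresponding notions for the infinite word, and Python's built-in string order (with a before b) is assumed as the lexicographic order.

```python
def thue_morse_prefix(n: int) -> str:
    """First n letters of the Thue-Morse word over {a, b}."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(n))


def occurrences(w: str, z: str) -> list[int]:
    """All j >= 0 such that z is a prefix of w[j:]."""
    return [j for j in range(len(w) - len(z) + 1) if w.startswith(z, j)]


def return_words(w: str, z: str) -> list[str]:
    """Words a_{j+1} ... a_{j'} for consecutive occurrences j < j' of z in w."""
    occ = occurrences(w, z)
    return [w[j:k] for j, k in zip(occ, occ[1:])]


def max_exponent(w: str, v: str) -> int:
    """Largest e such that v^e is a factor of the finite word w."""
    e = 0
    while v * (e + 1) in w:
        e += 1
    return e


w = thue_morse_prefix(16)
print(w)                                  # abbabaabbaababba
print(sorted(set(return_words(w, "a"))))  # ['a', 'ab', 'abb']

# Largest power of the least letter seen in a longer prefix, and the
# lexicographically minimal return word to that power within the prefix.
w = thue_morse_prefix(64)
e = max_exponent(w, "a")
print(e, min(return_words(w, "a" * e)))
```

The last two lines mimic, on a prefix only, one step of the recursive definition given in this section: take the largest power of the current word that occurs, then select the lexicographically minimal return word to it.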
Let $\mathbf{u}$ be an infinite non-periodic uniformly recurrent word. Such a word has the following property.

Lemma 1.
Let $v$ be a factor of $\mathbf{u}$. Then the set of integers $e$ such that $v^e$ is a factor of $\mathbf{u}$ is finite.

Proof. Since $\mathbf{u}$ is non-periodic, it contains a word $w$ with the period greater than $|v|$. Let $m$ be the maximal return time of $w$ in $\mathbf{u}$. Then any factor of $\mathbf{u}$ of length $m + |w|$ contains $w$ as a factor. This implies that $v^e$, which has a period smaller than that of $w$, has to be shorter than $m + |w|$. □

We define recursively an infinite sequence of words $\alpha_k$, $k \geq 1$. Fix a lexicographic order $\triangleright$ with the least letter $a$, and let $\alpha_1 = a$. For $k \geq 1$, let $e_k$ be the largest integer such that $\alpha_k^{e_k}$ is a factor of $\mathbf{u}$. The definition of $e_k$ is correct by Lemma 1. Now $\alpha_{k+1}$ is defined as the lexicographically minimal return word to $\alpha_k^{e_k}$ in $\mathbf{u}$.

Lemma 2.
For each $k \geq 1$, the word $\alpha_k$ is the lexicographically minimal factor of $\mathbf{u}$ of length $|\alpha_k|$ and each return word to $\alpha_k^{e_k}$ is Lyndon. In particular, the word $\alpha_k$ is unbordered for each $k \geq 1$.

Proof. Proceed by induction. Clearly, $\alpha_1 = a$ is the lexicographically minimal word of length one and is unbordered. Let $k > 1$. Since $\alpha_{k-1}$ is unbordered and $e_{k-1}$ is maximal, two distinct occurrences of $\alpha_{k-1}^{e_{k-1}}$ in $\mathbf{u}$ do not overlap. Therefore $\alpha_{k-1}^{e_{k-1}}$ is a prefix of $\alpha_k$. From lexicographic minimality of $\alpha_{k-1}$, we deduce that $\alpha_{k-1}^{e_{k-1}}$ is a prefix of the lexicographically minimal factor of $\mathbf{u}$ of length $|\alpha_k|$. Such a word is therefore prefix comparable with some return word to $\alpha_{k-1}^{e_{k-1}}$. The definition of $\alpha_k$ now implies that $\alpha_k$ is the lexicographically minimal factor of $\mathbf{u}$ of its length.

Let $w$ be a return word to $\alpha_k^{e_k}$, $k \geq 1$. We first show that $w$ is unbordered. Suppose that $r$ is the shortest border of $w$ and let $j$ be the largest integer such that $\alpha_j$ is a prefix of $r$. Clearly, $1 \leq j < k$. The maximality of $e_j$ implies that $r \neq \alpha_j$, since $r\alpha_k$, and therefore also $r\alpha_j^{e_j}$, is a factor of $\mathbf{u}$. Recall that $r$ is unbordered since it is a shortest border. Therefore $\alpha_j^{e_j}$ is a proper prefix of $r$ (otherwise $r$ is bordered), and $r$ is a proper prefix of $\alpha_{j+1}$ (otherwise $j$ is not maximal). This implies that $r$ is a return word to $\alpha_j^{e_j}$ that is lexicographically smaller than $\alpha_{j+1}$, a contradiction.

Let now $w = uv$ be such that $vu$ is the Lyndon conjugate of $w$ and suppose that both $u$ and $v$ are nonempty. Since $w$ is a return word, the word $\alpha_k^{e_k}$ does not occur in $v$. The lexicographic minimality of $\alpha_k$ and $vu \triangleright w$ imply that $v$ is a prefix of $\alpha_k^{e_k}$. Therefore $w$ is bordered, a contradiction.

The word $\alpha_k$, $k > 1$, is unbordered since it is a return word to $\alpha_{k-1}^{e_{k-1}}$. □

4. Unbordered factorizations
For any factor, an infinite recurrent word admits a factorization defined by occurrences of that factor. Such a factorization was considered already in [3]. We will study factorizations of $\mathbf{u}$ given by $\alpha_k^{e_k}$. For each $k \geq 1$, let
\[ \mathbf{u} = w_{k,0}\, w_{k,1}\, w_{k,2}\, w_{k,3} \cdots, \]
where $w_{k,0}\,\alpha_k^{e_k}$ is the shortest prefix of $\mathbf{u}$ containing $\alpha_k^{e_k}$, and $w_{k,j}$ is the $j$th return word to $\alpha_k^{e_k}$ in $\mathbf{u}$, for $j \geq 1$. In other words, the integer $|w_{k,0} w_{k,1} \cdots w_{k,j-1}|$ is the $j$th occurrence of $\alpha_k^{e_k}$ in $\mathbf{u}$.

By Lemma 2, all words $w_{k,j}$, $k \geq 1$, $j \geq 1$, are unbordered. Moreover, the $k$th factorization is a refinement of the $(k+1)$th one by the definition of $\alpha_k$. In particular, for each $k < k'$ and each $j \geq 1$, there are numbers $s, t \geq 1$ such that
\[ w_{k',j} = w_{k,s}\, w_{k,s+1} \cdots w_{k,s+t}. \]
We have already seen in the proof of Lemma 2 that two distinct occurrences of $\alpha_k^{e_k}$ in $\mathbf{u}$ do not overlap. Therefore $\alpha_k^{e_k}$ is a prefix of $w_{k,j}$, for each $k, j \geq 1$. Denote
\[ h_k := \inf_{j \geq 1} \{ h(w_{k,j}) \}. \]
The following lemma is the core of the proof of Theorem 1.
Lemma 3.
The sequence $(h_k)$ is unbounded.

Proof. We will show that for each $k$ there is some $k' > k$ such that $h_{k'} \geq h_k + \frac{1}{2}$.

Since $\mathbf{u}$ is uniformly recurrent, the maximal return time of $\alpha_k^{e_k}$ in $\mathbf{u}$,
\[ m_k := \max_{j \geq 1} \{ |w_{k,j}| \}, \]
is finite for each $k \geq 1$. On the other hand, the sequence $(\mu_k)$, where
\[ \mu_k := \min_{j \geq 1} \{ |w_{k,j}| \}, \]
is strictly growing since $\alpha_k^{e_k}$ is a prefix of each $w_{k,j}$ and $|\alpha_k|$ is growing. Therefore, for each $k$, there is some $k'$ such that
\[ \mu_{k'} > 2 m_k. \tag{3} \]
We claim that $h_{k'} \geq h_k + \frac{1}{2}$ as required.

Choose $j \geq 1$ and let
\[ w_{k',j} = w_{k,s}\, w_{k,s+1} \cdots w_{k,s+t}. \]
In order to obtain a lower bound for $h(w_{k',j})$, estimate the local period at each position in $w_{k',j}$ by the local period at the corresponding position of a factor $w_{k,s'}$, $s \leq s' \leq s + t$, with only one exception: the critical position of $w_{k',j}$. At that position (we choose one of them if there are many) we shall insist on the actual value, which is $|w_{k',j}|$ since $w_{k',j}$ is unbordered.

Let $\ell$ be such that the chosen critical position of $w_{k',j}$ lies in $w_{k,\ell}$. Then the local period of $w_{k,\ell}$ at that position, which is at most $|w_{k,\ell}|$, is replaced by $|w_{k',j}|$. By (2), we obtain the following bound.
\[
\begin{aligned}
h(w_{k',j}) &\geq \frac{\sum_{i=0}^{t} |w_{k,s+i}| \cdot h(w_{k,s+i}) + |w_{k',j}| - |w_{k,\ell}|}{|w_{k',j}|} \\
&\geq \frac{\sum_{i=0}^{t} |w_{k,s+i}| \cdot h_k}{|w_{k',j}|} + 1 - \frac{|w_{k,\ell}|}{|w_{k',j}|} \\
&\geq h_k + 1 - \frac{|w_{k,\ell}|}{|w_{k',j}|} \geq h_k + \frac{1}{2},
\end{aligned}
\]
where the last inequality follows from (3), since $|w_{k,\ell}| \leq m_k$ and $|w_{k',j}| \geq \mu_{k'}$. This completes the proof. □

5. Uniformly recurrent words
We can now prove the first main result.
Theorem 1.
Let $\mathbf{u}$ be an infinite uniformly recurrent non-periodic word. Then
\[ \lim_{i \to \infty} h_{\mathbf{u}}(i) = \infty. \]

Proof. For given $n$, we want to find $i_n$ such that, for each $i \geq i_n$, we have $h_{\mathbf{u}}(i) \geq n$.

Let $k$ be such that $h_k \geq n + 1$. Denote
\[ m = \max_{j \geq 0} \{ |w_{k,j}| \} = \max \{ m_k, |w_{k,0}| \}. \]
Let $u$ be the prefix of $\mathbf{u}$ of length $i \geq 2nm$. The word $u$ can be factorized as
\[ u = w_{k,0}\, w_{k,1}\, w_{k,2} \cdots w_{k,d}\, u', \]
where $u'$ is a proper prefix of $w_{k,d+1}$. The sum of local periods at positions of $u$ lying either in $w_{k,0}$ or in $u'$ is at least $|w_{k,0} u'|$, and the sum of local periods for positions lying in $w_{k,1} w_{k,2} \cdots w_{k,d}$ is at least
\[ h_k \cdot |w_{k,1} w_{k,2} \cdots w_{k,d}| \geq (n + 1)(|u| - |w_{k,0} u'|). \]
Therefore
\[
\begin{aligned}
h_{\mathbf{u}}(i) \geq h(u) &\geq \frac{|w_{k,0} u'| + (n + 1)(|u| - |w_{k,0} u'|)}{|u|} \\
&= n + 1 - n \cdot \frac{|w_{k,0} u'|}{|u|} > n + 1 - n \cdot \frac{2m}{2nm} = n.
\end{aligned}
\]
The last inequality uses $|u| \geq 2nm$ and $|w_{k,0} u'| < 2m$. □

6. Bounded repetition
We shall now extend our result to words with bounded repetition. Let $\mathbf{u}$ be of bounded repetition and let $e < \infty$ be the largest integer such that $v^e$ is a factor of $\mathbf{u}$ for some nonempty $v$. For each $k \geq 0$, we define factorizations
\[ \mathbf{u} = z_{k,0}\, z_{k,1}\, z_{k,2} \cdots, \]
where $|z_{k,j}| = 2^k$ for each $j \geq 0$. Then
\[ z_{k+i,j} = z_{k,2^i j}\, z_{k,2^i j+1}\, z_{k,2^i j+2} \cdots z_{k,2^i j+2^i-1}. \]
Denote
\[ b_k := \inf_{j \geq 0} \{ h(z_{k,j}) \}. \]
We can prove an analogue to Lemma 3.
Lemma 4.
The sequence $(b_k)$ is unbounded.

Proof. For given $k$, let $k'$ be such that
\[ 2^{k'} \geq e \cdot 2^{k+1}. \tag{4} \]
We show that $b_{k'} \geq b_k + \frac{1}{4}$.

Choose $j \geq 0$ and let $p$ be the period of $z_{k',j}$. Then
\[ \frac{2^{k'}}{p} \leq e, \quad \text{whence} \quad p \geq \frac{2^{k'}}{e} \geq 2^{k+1}. \tag{5} \]
Let $z_{k',j} = v^s v'$, where $|v| = p$ and $v'$ is a proper prefix of $v$.

We bound the value $h(z_{k',j})$ from below similarly as in the proof of Lemma 3. Estimate the local period at each position of $z_{k',j}$ by the local period at the corresponding position of a factor $z_{k,j'}$, and then, for a chosen critical position of $v$ and each of the $s$ occurrences of $v$, replace the previous value with $p$. Such a replacement increases the value by at least $p - 2^k$ for each occurrence of $v$. This yields the following bound.
\[ h(z_{k',j}) \geq b_k + \frac{s(p - 2^k)}{2^{k'}}. \]
Since $p$ is the period of $z_{k',j}$, it is easy to see that $2sp > |z_{k',j}|$. From (5), we deduce
\[ \frac{s \cdot 2^k}{2^{k'}} \leq \frac{1}{2} \cdot \frac{sp}{2^{k'}}. \]
Altogether, we have the desired inequality
\[ h(z_{k',j}) \geq b_k + \frac{s(p - 2^k)}{2^{k'}} \geq b_k + \frac{1}{2} \cdot \frac{sp}{2^{k'}} > b_k + \frac{1}{4}. \]
This completes the proof. □

The following theorem is now easy to prove.

Theorem 2.
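Let $\mathbf{u}$ be an infinite word with bounded repetition. Then
\[ \lim_{i \to \infty} h_{\mathbf{u}}(i) = \infty. \]

Proof. For a given $n$, let $k$ be such that $b_k \geq 2n$. Consider the prefix $u$ of $\mathbf{u}$ of length $i \geq 2^k$. Then $u = z_{k',0}\, u'$, where $k' \geq k$ and $u'$ is a proper prefix of $z_{k',1}$. Since $z_{k',0}$ is a product of words $z_{k,j}$, inequality (2) gives $h(z_{k',0}) \geq b_k$, and $|u'| < |z_{k',0}|$ implies $|z_{k',0}| > \frac{1}{2}|u|$. We have
\[ h_{\mathbf{u}}(i) \geq h(u) \geq \frac{b_k \cdot |z_{k',0}| + |u'|}{|u|} \geq \frac{b_k}{2} \geq n. \]
This completes the proof. □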
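The growth asserted by Theorems 1 and 2 can be observed numerically on a concrete word. The following Python sketch is an illustration only and is not taken from the source. It uses the Thue–Morse word, our own choice of example, which is uniformly recurrent, non-periodic and overlap-free (hence of bounded repetition). It computes $h(u)$ for prefixes $u$ of length $2^k$; by the inequality $h_{\mathbf{u}}(i) \geq h(u)$ used in both proofs, these values are lower bounds for the periodicity complexity of the infinite word at $i = 2^k$. The helper names are hypothetical.

```python
from fractions import Fraction


def thue_morse_prefix(n: int) -> str:
    """First n letters of the Thue-Morse word over {a, b}."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(n))


def local_period(w: str, i: int) -> int:
    """Local period of the finite word w at position i, 1 <= i <= len(w)."""
    n = len(w)
    for r in range(1, n + 1):
        # letters at distance r on opposite sides of the cut must agree
        lo, hi = max(0, i - r), min(i, n - r)
        if all(w[t] == w[t + r] for t in range(lo, hi)):
            return r
    return n  # not reached for nonempty w


def h(w: str) -> Fraction:
    """Periodicity complexity h(w) = h_w(|w|) of a nonempty finite word."""
    total = sum(local_period(w, i) for i in range(1, len(w) + 1))
    return Fraction(total, len(w))


for k in range(1, 8):
    u = thue_morse_prefix(2 ** k)
    print(k, len(u), float(h(u)))
```

The first three values are $3/2$, $2$ and $23/8$. The brute-force computation is cubic in the prefix length in the worst case, so it is meant for small experiments only.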
Remark. Theorem 1 and Theorem 2 show that the construction of an infinite word with a bounded periodicity complexity as it is given in [5] is the only possible one in the following sense. Let $\mathbf{u}$ be an infinite word that is not ultimately periodic and whose periodicity complexity is bounded by $n$. Then $\mathbf{u}$ can be factorized as
\[ \mathbf{u} = v_0 u_0^{e_0}\, v_1 u_1^{e_1}\, v_2 u_2^{e_2} \cdots, \]
where $h(u_i) < n$ for each $i \geq 0$, and the sequence of exponents $(e_i)$ is unbounded.

7. High periodicity complexities
This section solves the problem left open in [5], Remark 3.23, by proving the following improvement of Theorem 3.20, ibidem.
Theorem 3.
For each function $f : \mathbb{N} \to \mathbb{N}$ there is a uniformly recurrent word $\mathbf{u}$ such that $h_{\mathbf{u}}(d) > f(d)$ for infinitely many integers $d$.

Proof. Let $n_i$, $i \geq 1$, be a growing sequence of positive integers with $n_1 \geq 2$. Define the sequence $u_i$, $i \geq 0$, of words by
\[ u_0 = \varepsilon, \qquad u_i = u_{i-1}\, a\, (u_{i-1} b)^{n_i}\, u_{i-1}, \quad i \geq 1, \]
and put
\[ \mathbf{u} := \lim_{i \to \infty} u_i. \]
We claim that $\mathbf{u}$ is uniformly recurrent, and for each $j \geq 1$, we have
\[ p_{\mathbf{u}}(d_j) = (n_j + 1)(|u_{j-1}| + 1), \tag{6} \]
where $d_j = |u_{j-1} a\, u_{j-2} a \cdots u_1 a\, a|$.

It is easy to verify inductively that
\[ u_i a,\ u_i b \in \{ u_{i-1} a,\ u_{i-1} b \}^{+}. \tag{7} \]
This implies that, for each $i \geq 1$, the word $\mathbf{u}$ can be factorized as a product of words $u_i a$ and $u_i b$. That is,
\[ \mathbf{u} = u_i c_1\, u_i c_2\, u_i c_3 \cdots, \]
where $c_j$, $j \geq 1$, are letters. Therefore each factor $z$ whose first occurrence lies within $u_i$ has the maximal return time bounded by $|u_i| + 1$, and $\mathbf{u}$ is uniformly recurrent.

We show that $\mathbf{u}$ contains only occurrences of $u_i$, $i \geq 1$, visible in the above factorization. More precisely, if $k$ is an occurrence of $u_i$, then $k$ is a multiple of $|u_i| + 1$. For $u_1 = ab^{n_1}$, the property is easily verified. Proceed by induction. Let $k$ be an occurrence of $u_i$ in $\mathbf{u}$ with $i \geq 2$. The induction assumption and (7) imply that
\[ k = k' \cdot (|u_i| + 1) + \ell \cdot (|u_{i-1}| + 1) \]
for some $k' \geq 0$ and $0 \leq \ell \leq n_i + 1$. The word $u_i c$, where $c \in \{a, b\}$, contains at most two occurrences of $u_{i-1} a$, and only one of them is followed by $u_{i-1} b$, the one for which $\ell$ is zero. This proves the claim since $u_{i-1} a\, u_{i-1} b$ is a prefix of $u_i$.

We are ready to prove (6). For each $j \geq 1$, denote
\[ s_j := u_{j-1} a\, u_{j-2} a \cdots u_1 a\, a, \]
which is a prefix of $\mathbf{u}$. Let $r_j$ be the shortest repetition word at the position $d_j$. Observe that $d_1 = 1$ and $r_1 = b^{n_1} a$, thus (6) holds for $j = 1$. We proceed by induction and consider two possibilities for $r_{j+1}$.

1. If $|r_{j+1}| < |s_j| + |u_{j-1}| + 1$, then $r_{j+1}$ is a prefix of
\[ s_j^{-1}\, u_{j-1} a\, (u_{j-1} b)^2\, u_{j-1}. \]
This implies that it is a repetition word at the position $d_j$ as well, which is a contradiction with the induction assumption.

[Figure: the factorization of $u_j a\, u_j b \cdots$ into the blocks $\cdots u_{j-1} a \mid u_{j-1} a\, u_{j-1} b\, u_{j-1} b\, u_{j-1} \cdots$ around the position $d_{j+1}$, with $s_{j+1}$, $s_j$ and two copies of $r_{j+1}$ marked.]
2. Let now $|r_{j+1}| \geq |s_j| + |u_{j-1}| + 1$. Then $(u_{j-1} a)^2$ is a factor of $r_{j+1}$. One can verify that the first occurrence of $(u_{j-1} a)^2$ greater than or equal to $d_{j+1}$ is in the second occurrence of $s_{j+1}$, the latter being $(n_{j+1} + 1)(|u_j| + 1)$. Therefore the word
\[ s_{j+1}^{-1}\, u_j a\, (u_j b)^{n_{j+1}}\, s_{j+1} \]
of length predicted by (6) is the shortest repetition word at the position $d_{j+1}$.

To conclude the proof of the theorem, it is enough to define $n_j > 2 f(d_j)$ since then we have, for each $j \geq 1$,
\[ h_{\mathbf{u}}(d_j) > \frac{1}{d_j}\, p_{\mathbf{u}}(d_j) > f(d_j)\, \frac{2(|u_{j-1}| + 1)}{d_j} > f(d_j). \]
□

We can also give an explicit formula for the $i$th letter of the word $\mathbf{u}$ constructed in the previous proof. Let $\mathbf{u} = a_1 a_2 a_3 \cdots$, where $a_i \in \{a, b\}$, and let $m_0 := 1$ and $m_j := n_j + 2$ for $j \geq 1$. Note that for each $j \geq 0$ we have
\[ |u_j| + 1 = m_0 m_1 m_2 \cdots m_j. \]
Then $a_i = a$ if and only if there is some $j \geq 0$ such that
\[ i \equiv m_0 m_1 m_2 \cdots m_j \mod m_0 m_1 m_2 \cdots m_j m_{j+1}. \]

Note that the word $\mathbf{u}$ can also be obtained by the Toeplitz construction (see [1, 4]). Consider the alphabet $\{a, b, ?\}$ and let $T(\mathbf{w}, \mathbf{v})$, where $\mathbf{w}$ and $\mathbf{v}$ are infinite words over $\{a, b, ?\}$, denote the infinite word obtained from $\mathbf{w}$ by replacing the sequence of all occurrences of $?$ by $\mathbf{v}$. Then we can define
\[ \mathbf{u}_0 = {?}^{\omega}, \qquad \mathbf{u}_i = T(\mathbf{u}_{i-1}, (a b^{n_i} ?)^{\omega}), \quad i \geq 1, \]
and
\[ \mathbf{u} = \lim_{i \to \infty} \mathbf{u}_i. \]
It is a task of further research to establish the level of control over the periodicity complexity function of resulting words given by the choice of the sequence $(n_i)$ in this construction.

References

[1] Julien Cassaigne and Juhani Karhumäki. Toeplitz words, generalized periodicity and periodically iterated morphisms. Eur. J. Comb., 18(5):497–510, 1997.
[2] Maxime Crochemore and Dominique Perrin. Two-way string matching. J. ACM, 38(3):651–675, 1991.
[3] Fabien Durand. A characterization of substitutive sequences using return words. Discrete Mathematics, 179(1–3):89–101, 1998.
[4] Michel Koskas. Complexités de suites de Toeplitz. Discrete Mathematics, 183(1–3):161–183, 1998.
[5] Filippo Mignosi and Antonio Restivo. A new complexity function for words based on periodicity. International Journal of Algebra and Computation, 23(04):963–987, 2013.
Department of Algebra, Charles University, Sokolovská 83, 175 86 Praha, Czech Republic
E-mail address : [email protected]@karlin.mff.cuni.cz