Words with unbounded periodicity complexity

Abstract

If an infinite non-periodic word is uniformly recurrent or is of bounded repetition, then the limit of its periodicity complexity is infinity. Moreover, there are uniformly recurrent words with the periodicity complexity arbitrarily high at infinitely many positions.

ŠTĚPÁN HOLUB

1. Introduction

In [5], a new complexity function of infinite words, called periodicity complexity, was introduced. It gives, for any position in the word, the average value of the local periods up to that position. The authors construct an infinite word for which the periodicity complexity is bounded. Since the word is not uniformly recurrent, they ask whether there exist non-periodic uniformly recurrent words having bounded periodicity complexity (see Remark 3.8 in [5]). In Section 3, we define words with special lexicographic properties. In Section 4, we use those words to define a factorization of any non-periodic uniformly recurrent word. This factorization allows us to show, in Section 5, that the answer to the above question is negative. Moreover, in Section 6, we show that any word with bounded repetition has unbounded periodicity complexity too.

The authors of [5] also prove that the periodicity complexity can exceed any fixed function infinitely many times. Again, the witness word is not uniformly recurrent, and the problem is left open whether the same is true for uniformly recurrent words (see Remark 3.8 in [5]). In Section 7, we construct a uniformly recurrent word showing that the answer is positive. The method can be seen as a generalized Toeplitz construction.

2. Basic concepts

We first recall basic definitions and concepts. Let $w = a_1 a_2 a_3 \cdots$, where $a_i$ are letters, be a finite or infinite word over some alphabet $\Sigma$. A position in the word is any integer $1 \le i \le |w|$ (we have $|w| = \infty$ if $w$ is infinite). The position $i$ can be understood as the border between $a_i$ and $a_{i+1}$, but we will rather identify it with the pair $(u, v)$ of words such that $w = uv$ and $|u| = i$. Note that, unlike [5], we also consider the position $|w|$ for a finite word, which does not lie between two letters and corresponds to $(w, \varepsilon)$, where $\varepsilon$ denotes the empty word.

Key words and phrases: periodicity complexity, combinatorics on words. Supported by the Czech Science Foundation grant number 13-01832S.

By a proper prefix of $w$ we mean any prefix, including the empty one, that is strictly shorter than $w$. The period of $w$, denoted by $p(w)$, is the least positive integer $p$ such that $a_i = a_{i+p}$ holds for all $i \ge 1$ with $i + p \le |w|$.

A finite word $w$ is called unbordered if $p(w) = |w|$. Otherwise, $w$ is called bordered, and any proper nonempty prefix of $w$ that is also a suffix of $w$ is called its border. It is important and easy to see that the shortest border of a bordered word is itself unbordered.

If
$$\sup\{\, e : v^e \text{ is a factor of } u \text{ for some nonempty word } v \,\}$$
is finite for some infinite word $u$, we say that $u$ is of bounded repetition.

A word is primitive if it is not a power of a shorter word. A primitive word $w$ is a Lyndon word if it is lexicographically minimal within the set of all words conjugate with it. That is, $w \rhd vu$ for any factorization $w = uv$, where $\rhd$ is a lexicographic order. It is well known that Lyndon words are unbordered.

A repetition word at the position $(u, v)$ is any nonempty word $r$ that is suffix comparable with $u$ and prefix comparable with $v$ (any word being prefix comparable with the empty word). The local period of $w$, denoted by $p_w(i)$, is defined by
$$p_w(i) = \min\{\, |r| : r \text{ is a repetition word at the position } i \,\}.$$
Note that an infinite word may have positions $(u, v)$ with no repetition word. This happens if and only if there is no repetition word of length at most $|u|$ and $u$ is not a factor of $v$. Then the corresponding local period is $\infty$ (in accordance with the usual definition of $\min \emptyset$). Note also that $p_w(|w|) = 1$ for any finite $w$.

The most important result concerning local periods is the Critical Factorization Theorem, which states that for any finite word there is a position $i$ such that $p_w(i) = p(w)$. Such a position is called critical.
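For finite words, the period, the local periods and the critical positions defined above can be computed by brute force. The following Python sketch is an illustration only and is not taken from the paper; the function names are hypothetical. It relies on the observation that a repetition word of length $n$ exists at position $i$ exactly when $a_j = a_{j+n}$ for every $j$ with $i - n < j \le i$ for which both letters exist.

```python
def period(w: str) -> int:
    """Least p >= 1 with w[i] == w[i + p] wherever both indices exist."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return 0  # only reached for the empty word


def local_period(w: str, i: int) -> int:
    """Local period p_w(i) of a finite word at position i, 1 <= i <= len(w).

    A repetition word of length n exists at (w[:i], w[i:]) exactly when
    w[j] == w[j + n] for every 0-based j in [i - n, i) such that both
    j and j + n are valid indices.
    """
    if not 1 <= i <= len(w):
        raise ValueError("position out of range")
    for n in range(1, len(w) + 1):
        lo = max(0, i - n)
        if all(w[j] == w[j + n] for j in range(lo, i) if j + n < len(w)):
            return n
    raise AssertionError("unreachable: n = len(w) always works")


def critical_positions(w: str) -> list[int]:
    """Positions whose local period equals the period of w."""
    p = period(w)
    return [i for i in range(1, len(w) + 1) if local_period(w, i) == p]


w = "abaab"
print(period(w))                                           # 3
print([local_period(w, i) for i in range(1, len(w) + 1)])  # [2, 3, 1, 3, 1]
print(critical_positions(w))                               # [2, 4]
```

For $w = abaab$ the local periods sum to 10, so the average of the local periods over all five positions is 2.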
For a proof of the Critical Factorization Theorem using lexicographic orderings, see [2].

Let $w = uvz$ be a factorization of $w$. We will say that the position $1 \le i \le |v|$ of $v$ corresponds to the position $|u| + i$ of $w$. We also say that the position $i$ of $w$ lies in $v$ if $|u| < i \le |uv|$. This is rather informal since a particular occurrence of $v$ in $w$ is implicitly understood, but it will always be clear from the context.

Note that
$$p_v(i) \le p_{uvz}(|u| + i), \qquad 1 \le i \le |v|. \tag{1}$$
Informally, the local period of a word at a position is at least the local period of its factor at the corresponding position.

We say that $j \ge 0$ is an occurrence of a word $z$ in $w$ if $w = uv$, $|u| = j$ and $z$ is a prefix of $v$. If $j$ and $j'$ are two consecutive occurrences of $z$ in $w = a_1 a_2 a_3 \cdots$, then $a_{j+1} a_{j+2} \cdots a_{j'}$ is called a return word to $z$ in $w$. Return words were introduced and studied in [3].

An infinite word $w$ is called recurrent if any factor of $w$ has an infinite number of occurrences in $w$. It is called uniformly recurrent if, for any factor $z$, the length of all return words to $z$ in $w$ is bounded. We call the length of the longest return word to $z$ in $w$ the maximal return time of $z$ in $w$. Note that if an infinite word is recurrent, then its local period is finite at all positions.

The property studied in this paper is the periodicity complexity of $w$, which is the function $h_w$ defined on positions of $w$ by
$$h_w(i) = \frac{1}{i} \sum_{j=1}^{i} p_w(j).$$
The notion was introduced (for infinite words) in [5]. If $w$ is infinite, then it is reasonable to suppose that $w$ is recurrent, which guarantees that the values of $h_w$ are positive rationals (no infinity has to be dealt with). However, we can also put $h_w(i) = \infty$ if $p_w(j) = \infty$ for some $1 \le j \le i$. For a finite $w$, denote
$$h(w) := h_w(|w|).$$
Note the following useful inequality, which follows from (1):
$$h(uv) \ge \frac{|u| \cdot h(u) + |v| \cdot h(v)}{|uv|}. \tag{2}$$

3. Lexicographically minimal return words

Let $u$ be an infinite non-periodic uniformly recurrent word. Such a word has the following property.

Lemma 1.

Let $v$ be a factor of $u$. Then the set of integers $e$ such that $v^e$ is a factor of $u$ is finite.

Proof. Since $u$ is non-periodic, it contains a word $w$ with the period greater than $|v|$. Let $m$ be the maximal return time of $w$ in $u$. Then any factor of $u$ of length $m + |w|$ contains $w$ as a factor. This implies that $v^e$, which has a period smaller than that of $w$, has to be shorter than $m + |w|$. □

We define recursively an infinite sequence of words $\alpha_k$, $k \ge 1$. Fix a lexicographic order $\rhd$ with the least letter $a$, and let $\alpha_1 = a$. For $k > 0$, let $e_k$ be the largest integer such that $\alpha_k^{e_k}$ is a factor of $u$. The definition of $e_k$ is correct by Lemma 1. Now $\alpha_{k+1}$ is defined as the lexicographically minimal return word to $\alpha_k^{e_k}$ in $u$.

Lemma 2.

For each $k \ge 1$, the word $\alpha_k$ is the lexicographically minimal factor of $u$ of length $|\alpha_k|$ and each return word to $\alpha_k^{e_k}$ is Lyndon. In particular, the word $\alpha_k$ is unbordered for each $k \ge 1$.

Proof. Proceed by induction. Clearly, $\alpha_1 = a$ is the lexicographically minimal word of length one and is unbordered. Let $k > 1$. Since $\alpha_{k-1}$ is unbordered and $e_{k-1}$ is maximal, two distinct occurrences of $\alpha_{k-1}^{e_{k-1}}$ in $u$ do not overlap. Therefore $\alpha_{k-1}^{e_{k-1}}$ is a prefix of $\alpha_k$. From the lexicographic minimality of $\alpha_{k-1}$, we deduce that $\alpha_{k-1}^{e_{k-1}}$ is a prefix of the lexicographically minimal factor of $u$ of length $|\alpha_k|$. Such a word is therefore prefix comparable with some return word to $\alpha_{k-1}^{e_{k-1}}$. The definition of $\alpha_k$ now implies that $\alpha_k$ is the lexicographically minimal factor of $u$ of its length.

Let $w$ be a return word to $\alpha_k^{e_k}$, $k \ge 1$. We first show that $w$ is unbordered. Suppose that $r$ is the shortest border of $w$ and let $j$ be the largest integer such that $\alpha_j$ is a prefix of $r$. Clearly, $1 \le j < k$. The maximality of $e_j$ implies that $r \ne \alpha_j$, since $r\alpha_k$, and therefore also $r\alpha_j^{e_j}$, is a factor of $u$. Recall that $r$ is unbordered since it is a shortest border. Therefore $\alpha_j^{e_j}$ is a proper prefix of $r$ (otherwise $r$ is bordered), and $r$ is a proper prefix of $\alpha_{j+1}$ (otherwise $j$ is not maximal). This implies that $r$ is a return word to $\alpha_j^{e_j}$ that is lexicographically smaller than $\alpha_{j+1}$, a contradiction.

Let now $w = uv$, with finite words $u$ and $v$, be such that $vu$ is the Lyndon conjugate of $w$, and suppose that both $u$ and $v$ are nonempty. Since $w$ is a return word, the word $\alpha_k^{e_k}$ does not occur in $v$. The lexicographic minimality of $\alpha_k$ and $vu \rhd w$ imply that $v$ is a prefix of $\alpha_k^{e_k}$. Therefore $w$ is bordered, a contradiction.

The word $\alpha_k$, $k > 1$, is unbordered since it is a return word to $\alpha_{k-1}^{e_{k-1}}$. □

4. Unbordered factorizations

For any factor, an infinite recurrent word admits a factorization defined by occurrences of that factor. Such a factorization was considered already in [3]. We will study factorizations of $u$ given by $\alpha_k^{e_k}$. For each $k \ge 1$, let
$$u = w_{k,0}\, w_{k,1}\, w_{k,2}\, w_{k,3} \cdots,$$
where $w_{k,0}\, \alpha_k^{e_k}$ is the shortest prefix of $u$ containing $\alpha_k^{e_k}$, and $w_{k,j}$ is the $j$th return word to $\alpha_k^{e_k}$ in $u$, for $j \ge 1$. In other words, the integer $|w_{k,0} w_{k,1} \cdots w_{k,j-1}|$ is the $j$th occurrence of $\alpha_k^{e_k}$ in $u$.

By Lemma 2, all words $w_{k,j}$, $k \ge 1$, $j \ge 1$, are unbordered. Moreover, the $k$th factorization is a refinement of the $(k+1)$th one by the definition of $\alpha_k$. In particular, for each $k < k'$ and each $j \ge 1$, there are numbers $s \ge 1$ and $t \ge 0$ such that
$$w_{k',j} = w_{k,s}\, w_{k,s+1} \cdots w_{k,s+t}.$$
We have already seen in the proof of Lemma 2 that two distinct occurrences of $\alpha_k^{e_k}$ in $u$ do not overlap. Therefore $\alpha_k^{e_k}$ is a prefix of $w_{k,j}$ for each $k, j \ge 1$. Denote
$$h_k := \inf_{j \ge 1} \{ h(w_{k,j}) \}.$$
Lemma 3 below is the core of the proof of Theorem 1.
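The words $\alpha_k$ and the factorization into return words can be explored experimentally on a finite prefix of a uniformly recurrent word. The sketch below is not part of the paper. It uses the Fibonacci word as an assumed example, and all function names are hypothetical. It also assumes that Python's string order (with $a < b$, and a proper prefix smaller than its extensions) plays the role of the fixed lexicographic order. Because only a prefix is inspected, the computed exponents $e_k$ are lower bounds for the true ones.

```python
def fibonacci_prefix(n: int) -> str:
    """Prefix of length n of the Fibonacci word abaababaab..."""
    prev, cur = "a", "ab"
    while len(cur) < n:
        prev, cur = cur, cur + prev
    return cur[:n]


def occurrences(w: str, z: str) -> list[int]:
    """All j such that z is a prefix of w[j:]."""
    return [j for j in range(len(w) - len(z) + 1) if w.startswith(z, j)]


def return_words(w: str, z: str) -> list[str]:
    """Return words to z in order of appearance (w_{k,1}, w_{k,2}, ...)."""
    occ = occurrences(w, z)
    return [w[s:t] for s, t in zip(occ, occ[1:])]


def max_power(w: str, v: str) -> int:
    """Largest e such that v repeated e times is a factor of w (v occurs in w)."""
    e = 1
    while v * (e + 1) in w:
        e += 1
    return e


def alpha_sequence(w: str, steps: int) -> list[tuple[str, int]]:
    """Pairs (alpha_k, e_k) for k = 1..steps, computed inside the finite word w."""
    alpha = min(w)  # alpha_1 is the least letter
    result = []
    for k in range(steps):
        e = max_power(w, alpha)
        result.append((alpha, e))
        if k + 1 < steps:
            alpha = min(return_words(w, alpha * e))  # alpha_{k+1}
    return result


def is_unbordered(w: str) -> bool:
    """True if no proper nonempty prefix of w is also a suffix of w."""
    return not any(w[:n] == w[-n:] for n in range(1, len(w)))


w = fibonacci_prefix(2000)
seq = alpha_sequence(w, 3)
print(seq)
alpha, e = seq[0]
print(sorted(set(return_words(w, alpha * e))))  # ['aab', 'aabab']
```

In this example $\alpha_1 = a$ and $e_1 = 2$, the return words to $aa$ are $aab$ and $aabab$, both unbordered as Lemma 2 predicts, and $\alpha_2 = aab$.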

Lemma 3.

The sequence $(h_k)$ is unbounded.

Proof. We will show that for each $k$ there is some $k' > k$ such that $h_{k'} \ge h_k + \frac{1}{2}$.

Since $u$ is uniformly recurrent, the maximal return time of $\alpha_k^{e_k}$ in $u$,
$$m_k := \max_{j \ge 1} \{ |w_{k,j}| \},$$
is finite for each $k \ge 1$. On the other hand, the sequence $(\mu_k)$, where
$$\mu_k := \min_{j \ge 1} \{ |w_{k,j}| \},$$
is strictly growing since $\alpha_k^{e_k}$ is a prefix of each $w_{k,j}$ and $|\alpha_k|$ is growing. Therefore, for each $k$, there is some $k'$ such that
$$\mu_{k'} > 2 m_k. \tag{3}$$
We claim that $h_{k'} \ge h_k + \frac{1}{2}$ as required.

Choose $j \ge 1$ and let
$$w_{k',j} = w_{k,s}\, w_{k,s+1} \cdots w_{k,s+t}.$$
In order to obtain a lower bound for $h(w_{k',j})$, estimate the local period at each position in $w_{k',j}$ by the local period at the corresponding position of a factor $w_{k,s'}$, $s \le s' \le s + t$, with only one exception: the critical position of $w_{k',j}$. At that position (we choose one of them if there are many) we shall insist on the actual value, which is $|w_{k',j}|$ since $w_{k',j}$ is unbordered.

Let $\ell$ be such that the chosen critical position of $w_{k',j}$ lies in $w_{k,\ell}$. Then the local period of $w_{k,\ell}$ at that position, which is at most $|w_{k,\ell}|$, is replaced by $|w_{k',j}|$. By (2), we obtain the following bound:
$$\begin{aligned}
h(w_{k',j}) &\ge \frac{\sum_{i=0}^{t} |w_{k,s+i}| \cdot h(w_{k,s+i}) + |w_{k',j}| - |w_{k,\ell}|}{|w_{k',j}|} \\
&\ge \frac{\sum_{i=0}^{t} |w_{k,s+i}| \cdot h_k}{|w_{k',j}|} + 1 - \frac{|w_{k,\ell}|}{|w_{k',j}|} \\
&\ge h_k + 1 - \frac{|w_{k,\ell}|}{|w_{k',j}|} \ge h_k + \frac{1}{2},
\end{aligned}$$
where the last inequality follows from (3). This completes the proof. □

5. Uniformly recurrent words

We can now prove the ﬁrst main result.

Theorem 1.

Let $u$ be an infinite uniformly recurrent non-periodic word. Then
$$\lim_{i \to \infty} h_u(i) = \infty.$$

Proof.

For a given $n$, we want to find $i_n$ such that, for each $i \ge i_n$, we have $h_u(i) \ge n$.

Let $k$ be such that $h_k \ge n + 1$. Denote
$$m = \max_{j \ge 0} \{ |w_{k,j}| \} = \max\{ m_k, |w_{k,0}| \}.$$
Let $\hat{u}$ be the prefix of $u$ of length $i \ge 2nm$. The word $\hat{u}$ can be factorized as
$$\hat{u} = w_{k,0}\, w_{k,1}\, w_{k,2} \cdots w_{k,d}\, u',$$
where $u'$ is a proper prefix of $w_{k,d+1}$. The sum of local periods at positions of $\hat{u}$ lying either in $w_{k,0}$ or in $u'$ is at least $|w_{k,0} u'|$, and the sum of local periods for positions lying in $w_{k,1} w_{k,2} \cdots w_{k,d}$ is at least
$$h_k \cdot |w_{k,1} w_{k,2} \cdots w_{k,d}| \ge (n+1)\,(|\hat{u}| - |w_{k,0} u'|).$$
Therefore
$$h_u(i) \ge h(\hat{u}) \ge \frac{|w_{k,0} u'| + (n+1)(|\hat{u}| - |w_{k,0} u'|)}{|\hat{u}|} = n + 1 - n \cdot \frac{|w_{k,0} u'|}{|\hat{u}|} > n + 1 - n \cdot \frac{2m}{2nm} = n.$$
The last inequality uses $|\hat{u}| \ge 2nm$ and $|w_{k,0} u'| < 2m$. □

6. Bounded repetition

We shall now extend our result to words with bounded repetition. Let $u$ be of bounded repetition and let $e < \infty$ be the largest integer such that $v^e$ is a factor of $u$ for some nonempty $v$. For each $k \ge 0$, we define factorizations
$$u = z_{k,0}\, z_{k,1}\, z_{k,2} \cdots,$$
where $|z_{k,j}| = 2^k$ for each $j \ge 0$. Then
$$z_{k+i,j} = z_{k,2^i j}\; z_{k,2^i j+1}\; z_{k,2^i j+2} \cdots z_{k,2^i j+2^i-1}.$$
Denote
$$b_k := \inf_{j \ge 0} \{ h(z_{k,j}) \}.$$
We can prove an analogue to Lemma 3.

Lemma 4.

The sequence $(b_k)$ is unbounded.

Proof. For a given $k$, let $k'$ be such that
$$2^{k'} \ge e \cdot 2^{k+1}. \tag{4}$$
We show that $b_{k'} \ge b_k + \frac{1}{4}$.

Choose $j \ge 0$ and let $p$ be the period of $z_{k',j}$. Then
$$\frac{2^{k'}}{p} \le e, \quad \text{whence} \quad p \ge \frac{2^{k'}}{e} \ge 2^{k+1}. \tag{5}$$
Let $z_{k',j} = v^s v'$, where $|v| = p$ and $v'$ is a proper prefix of $v$.

We bound the value $h(z_{k',j})$ from below similarly as in the proof of Lemma 3. Estimate the local period at each position of $z_{k',j}$ by the local period at the corresponding position of a factor $z_{k,j'}$, and then, for a chosen critical position of $v$ and each of the $s$ occurrences of $v$, replace the previous value with $p$. Such a replacement increases the value by at least $p - 2^k$ for each occurrence of $v$. This yields the following bound:
$$h(z_{k',j}) \ge b_k + \frac{s\,(p - 2^k)}{2^{k'}}.$$
Since $p$ is the period of $z_{k',j}$, it is easy to see that $2sp > |z_{k',j}|$. From (5), we deduce
$$\frac{s \cdot 2^k}{2^{k'}} \le \frac{1}{2} \cdot \frac{sp}{2^{k'}}.$$
Altogether, we have the desired inequality
$$h(z_{k',j}) \ge b_k + \frac{s\,(p - 2^k)}{2^{k'}} \ge b_k + \frac{1}{2} \cdot \frac{sp}{2^{k'}} > b_k + \frac{1}{4}.$$
This completes the proof. □

The following theorem is now easy to prove.

Theorem 2.

Let $u$ be an infinite word with bounded repetition. Then
$$\lim_{i \to \infty} h_u(i) = \infty.$$

Proof.

For a given $n$, let $k$ be such that $b_k \ge 2n$. Consider the prefix $\hat{u}$ of $u$ of length $i \ge 2^k$. Then $\hat{u} = z_{k',0}\, u'$, where $k' \ge k$ and $u'$ is a proper prefix of $z_{k',1}$. We have
$$h_u(i) \ge h(\hat{u}) \ge \frac{b_k \cdot |z_{k',0}| + |u'|}{|\hat{u}|} \ge \frac{b_k}{2} \ge n.$$
This completes the proof. □
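The quantities $b_k$ from Lemma 4 can be estimated numerically. The sketch below is not part of the paper. It uses the Thue-Morse word, which is overlap-free and hence of bounded repetition, as an assumed example, and the function names are hypothetical. Only the blocks inside a finite prefix are examined, so the result is the minimum over those blocks rather than a proven infimum.

```python
from fractions import Fraction


def local_period(w: str, i: int) -> int:
    """Local period of the finite word w at position i (1-based)."""
    for n in range(1, len(w) + 1):
        lo = max(0, i - n)
        if all(w[j] == w[j + n] for j in range(lo, i) if j + n < len(w)):
            return n
    raise ValueError("position out of range or empty word")


def h(w: str) -> Fraction:
    """Periodicity complexity h(w) of a finite word: mean of its local periods."""
    total = sum(local_period(w, i) for i in range(1, len(w) + 1))
    return Fraction(total, len(w))


def thue_morse_prefix(n: int) -> str:
    """Prefix of length n of the Thue-Morse word abbabaab..."""
    return "".join("ab"[bin(j).count("1") % 2] for j in range(n))


def block_minimum(w: str, k: int) -> Fraction:
    """Minimum of h over the aligned blocks of length 2**k that fit in w."""
    size = 2 ** k
    blocks = {w[s:s + size] for s in range(0, len(w) - size + 1, size)}
    return min(h(z) for z in blocks)


w = thue_morse_prefix(1024)
for k in range(6):
    print(k, block_minimum(w, k))
```

For $k = 0, 1, 2$ the blocks are single letters, $ab$ or $ba$, and $abba$ or $baab$, which give the values $1$, $3/2$ and $2$.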

Remark. Theorem 1 and Theorem 2 show that the construction of an infinite word with bounded periodicity complexity given in [5] is the only possible one in the following sense. Let $u$ be an infinite word that is not ultimately periodic and whose periodicity complexity is bounded by $n$. Then $u$ can be factorized as
$$u = v_0 u_0^{e_0}\, v_1 u_1^{e_1}\, v_2 u_2^{e_2} \cdots,$$
where $h(u_i) < n$ for each $i \ge 0$, and the sequence of exponents $(e_i)$ is unbounded.

7. High periodicity complexities

This section solves the problem left open in [5], Remark 3.23, by proving Theorem 3 below, which improves Theorem 3.20, ibidem.
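The proof of Theorem 3 rests on the recursively defined words $u_0 = \varepsilon$, $u_i = u_{i-1}\, a\, (u_{i-1} b)^{n_i}\, u_{i-1}$, and the section closes with a congruence formula for the individual letters of their limit. The Python sketch below is not part of the paper. It builds $u_3$ for the illustrative values $n_1, n_2, n_3 = 2, 3, 4$ (an arbitrary assumption), compares it with the congruence formula, and evaluates the local period of the finite word $u_3$ at position $d_2 = |u_1 a a|$. All function names are hypothetical. In general a local period of a finite prefix is only a lower bound for the local period of the infinite word, by inequality (1).

```python
def build_u(ns: list[int]) -> str:
    """u_i for i = len(ns), from u_i = u_{i-1} a (u_{i-1} b)^{n_i} u_{i-1}."""
    u = ""  # u_0 is the empty word
    for n in ns:
        u = u + "a" + (u + "b") * n + u
    return u


def letter(i: int, ns: list[int]) -> str:
    """Letter a_i (1-based) by the congruence formula, for i <= |u_len(ns)|."""
    prod = 1  # m_0 * m_1 * ... * m_j, starting with m_0 = 1
    for n in ns:  # n plays the role of n_{j+1}, so m_{j+1} = n + 2
        nxt = prod * (n + 2)
        if i % nxt == prod:
            return "a"
        prod = nxt
    return "b"


def local_period(w: str, i: int) -> int:
    """Local period of the finite word w at position i (1-based)."""
    for n in range(1, len(w) + 1):
        lo = max(0, i - n)
        if all(w[j] == w[j + n] for j in range(lo, i) if j + n < len(w)):
            return n
    raise ValueError("position out of range or empty word")


ns = [2, 3, 4]  # illustrative values only
u3 = build_u(ns)
formula_ok = all(letter(i, ns) == u3[i - 1] for i in range(1, len(u3) + 1))
d2 = len(build_u(ns[:1])) + 2  # d_2 = |u_1 a a|
print(len(u3), formula_ok)       # 119 True
print(d2, local_period(u3, d2))  # 5 16
```

Here $|u_3| + 1 = 4 \cdot 5 \cdot 6 = 120$, and the local period at $d_2 = 5$ equals $(n_2 + 1)(|u_1| + 1) = 16$, in agreement with equation (6) of the proof.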

Theorem 3.

For each function $f : \mathbb{N} \to \mathbb{N}$ there is a uniformly recurrent word $u$ such that $h_u(d) > f(d)$ for infinitely many integers $d$.

Proof. Let $n_i$, $i \ge 1$, be a growing sequence of positive integers with $n_1 \ge$ [...], and define a sequence $u_i$, $i \ge 0$, of words by
$$u_0 = \varepsilon, \qquad u_i = u_{i-1}\, a\, (u_{i-1} b)^{n_i}\, u_{i-1}, \quad i \ge 1,$$
and put
$$u := \lim_{i \to \infty} u_i.$$
We claim that $u$ is uniformly recurrent, and for each $j \ge 1$, we have
$$p_u(d_j) = (n_j + 1)\,(|u_{j-1}| + 1), \tag{6}$$
where $d_j = |u_{j-1} a\, u_{j-2} a \cdots u_1 a\, a|$.

It is easy to verify inductively that
$$u_i a,\; u_i b \in \{ u_{i-1} a,\; u_{i-1} b \}^+. \tag{7}$$
This implies that, for each $i \ge 1$, the word $u$ can be factorized as a product of words $u_i a$ and $u_i b$. That is,
$$u = u_i c_1\, u_i c_2\, u_i c_3 \cdots,$$
where $c_j$, $j \ge 1$, are letters. Therefore each factor $z$ whose first occurrence lies within $u_i$ has the maximal return time bounded by $|u_i| + 1$, and $u$ is uniformly recurrent.

We show that $u$ contains only the occurrences of $u_i$, $i \ge 1$, visible in the above factorization. More precisely, if $k$ is an occurrence of $u_i$, then $k$ is a multiple of $|u_i| + 1$. For $u_1 = ab^{n_1}$, the property is easily verified. Proceed by induction. Let $k$ be an occurrence of $u_i$ in $u$ with $i \ge 2$. The induction assumption and (7) imply that
$$k = k' \cdot (|u_i| + 1) + \ell \cdot (|u_{i-1}| + 1)$$
for some $k' \ge 0$ and $0 \le \ell \le n_i + 1$. The word $u_i c$, where $c \in \{a, b\}$, contains at most two occurrences of $u_{i-1} a$, and only one of them is followed by $u_{i-1} b$, the one for which $\ell$ is zero. This proves the claim since $u_{i-1} a\, u_{i-1} b$ is a prefix of $u_i$.

We are ready to prove (6). For each $j \ge 1$, denote
$$s_j := u_{j-1} a\, u_{j-2} a \cdots u_1 a\, a,$$
which is a prefix of $u$. Let $r_j$ be the shortest repetition word at the position $d_j$. Observe that $d_1 = 1$ and $r_1 = b^{n_1} a$, thus (6) holds for $j = 1$. We proceed by induction and consider two possibilities for $r_{j+1}$.

1. If $|r_{j+1}| < |s_j| + |u_{j-1}| + 1$, then $r_{j+1}$ is a prefix of
$$s_j^{-1}\, u_{j-1} a\, (u_{j-1} b)\, u_{j-1}.$$
This implies that it is a repetition word at the position $d_j$ as well, which is a contradiction with the induction assumption.

[Figure: the prefix $u_j a\, u_j b \cdots$ of $u$, refined into factors $u_{j-1} a$ and $u_{j-1} b$, with $s_{j+1}$, $s_j$ and the two occurrences of $r_{j+1}$ marked.]

2. Let now $|r_{j+1}| \ge |s_j| + |u_{j-1}| + 1$. Then $(u_{j-1} a)^2$ is a factor of $r_{j+1}$. One can verify that the first occurrence of $(u_{j-1} a)^2$ greater than or equal to $d_{j+1}$ is in the second occurrence of $s_{j+1}$, the latter being $(n_{j+1} + 1)(|u_j| + 1)$. Therefore the word
$$s_{j+1}^{-1}\, u_j a\, (u_j b)^{n_{j+1}}\, s_{j+1}$$
of length predicted by (6) is the shortest repetition word at the position $d_{j+1}$.

To conclude the proof of the theorem, it is enough to define $n_j > 2 f(d_j)$, since then we have, for each $j \ge 1$,
$$h_u(d_j) > \frac{1}{d_j}\, p_u(d_j) > f(d_j)\, \frac{2\,(|u_{j-1}| + 1)}{d_j} > f(d_j).$$
□

We can also give an explicit formula for the $i$th letter of the word $u$ constructed in the previous proof. Let $u = a_1 a_2 a_3 \cdots$, where $a_i \in \{a, b\}$, and let $m_0 := 1$ and $m_j := n_j + 2$ for $j \ge 1$. Note that for each $j \ge 0$ we have
$$|u_j| + 1 = m_0 m_1 m_2 \cdots m_j.$$
Then $a_i = a$ if and only if there is some $j \ge 0$ such that
$$i \equiv m_0 m_1 m_2 \cdots m_j \mod m_0 m_1 m_2 \cdots m_j m_{j+1}.$$

Note that the word $u$ can also be obtained by the Toeplitz construction (see [1, 4]). Consider the alphabet $\{a, b, ?\}$ and let $T(w, v)$, where $w$ and $v$ are infinite words over $\{a, b, ?\}$, denote the infinite word obtained from $w$ by replacing the sequence of all occurrences of $?$ by $v$. Then we can define
$$u_0 = {?}^{\omega}, \qquad u_i = T\bigl(u_{i-1}, (ab^{n_i}?)^{\omega}\bigr), \quad i \ge 1,$$
and $u = \lim_{i \to \infty} u_i$.

It is a task of further research to establish the level of control over the periodicity complexity function of resulting words given by the choice of the sequence $(n_i)$ in this construction.

References

[1] Julien Cassaigne and Juhani Karhumäki. Toeplitz words, generalized periodicity and periodically iterated morphisms. Eur. J. Comb., 18(5):497–510, 1997.
[2] Maxime Crochemore and Dominique Perrin. Two-way string matching. J. ACM, 38(3):651–675, 1991.
[3] Fabien Durand. A characterization of substitutive sequences using return words. Discrete Mathematics, 179(1–3):89–101, 1998.
[4] Michel Koskas. Complexités de suites de Toeplitz. Discrete Mathematics, 183(1–3):161–183, 1998.
[5] Filippo Mignosi and Antonio Restivo. A new complexity function for words based on periodicity. International Journal of Algebra and Computation, 23(04):963–987, 2013.

Department of Algebra, Charles University, Sokolovská 83, 175 86 Praha, Czech Republic

E-mail address : [email protected]@karlin.mff.cuni.cz