New results on pseudosquare avoidance
arXiv [cs.FL]
Tim Ng, School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada, [email protected]
Pascal Ochem, LIRMM, CNRS, Université de Montpellier, France, [email protected]
Narad Rampersad, Department of Math/Stats, University of Winnipeg, 515 Portage Ave., Winnipeg, MB R3B 2E9, Canada, [email protected]
Jeffrey Shallit, School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada, [email protected]
April 22, 2019
Abstract
We start by considering binary words containing the minimum possible numbers of squares and antisquares (where an antisquare is a word of the form xx̄), and we completely classify which possibilities can occur. We consider avoiding xp(x), where p is any permutation of the underlying alphabet, and xt(x), where t is any transformation of the underlying alphabet. Finally, we prove the existence of an infinite binary word simultaneously avoiding all occurrences of xh(x) for every nonerasing morphism h and all sufficiently large words x.

Let x, v be words. We say that v is a factor of x if there exist words u, w such that x = uvw. For example, "or" is a factor of "word".

By a square we mean a nonempty word of the form xx, like the French word couscous. The order of a square xx is |x|, the length of x. It is easy to see that every binary word of length at least 4 contains a square factor. However, in a classic paper from combinatorics on words, Entringer, Jackson, and Schatz [8] constructed an infinite binary word containing, as factors, only 5 distinct squares: 0², 1², (01)², (10)², and (11)². This bound of 5 squares was improved to 3 by Fraenkel and Simpson [10]; it is optimal. For some other constructions also achieving the bound 3, see [16, 15, 11, 2].

Instead of considering squares, one could consider antisquares: these are binary words of the form xx̄, where x̄ is the image of x under the coding that maps 0 → 1 and 1 → 0. For example, 01101001 is an antisquare. (They should not be confused with the different notion of antipower recently introduced by Fici, Restivo, Silva, and Zamboni [9].) Clearly it is possible to construct an infinite binary word that avoids all antisquares, but only in a trivial way: the only such words are 0^ω = 000··· and 1^ω = 111···. Similarly, the only infinite binary words with exactly one antisquare are 01^ω and 10^ω. However, it is easy to see that every word in { , }^ω has exactly two antisquares (namely 01 and 10), and hence there are infinitely many such words that are aperiodic.

Several writers have considered variations on these results. For example, Blanchet-Sadri, Choi, and Mercaş [3] considered avoiding large squares in partial words. Chiniforooshan, Kari, and Xu [4] studied avoiding words of the form xθ(x), where θ is an antimorphic involution. Their results implicitly suggest the general problem of simultaneously avoiding what we might call pseudosquares: patterns of the form xx′, where x′ belongs to some (possibly infinite) class of modifications of x.

This paper has two goals. First, for all integers a, b ≥ 0, we determine whether there is an infinite binary word having at most a squares and b antisquares. If this is not possible, we determine the length of the longest finite binary word with this property.

Second, we apply our results to discuss the simultaneous avoidance of xx′, where x′ belongs to some class of modifications of x. We consider three cases:
(a) where x′ = p(x) for a permutation p of the underlying alphabet;
(b) where x′ = t(x) for a transformation t of the underlying alphabet; and
(c) where x′ = h(x) for an arbitrary nonerasing morphism.
In particular, we prove the existence of an infinite binary word that avoids xh(x) simultaneously for all nonerasing morphisms h and all sufficiently long words x.

We are interested in binary words where the number of distinct factors that are squares and antisquares is bounded.
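The two quantities being bounded can be computed directly from the definitions. The following is a minimal sketch, not the authors' program: the function names are hypothetical, and the counting is done naively by comparing the two halves of every even-length factor.

```python
def complement(x: str) -> str:
    """Binary complement: swap 0 and 1 letter by letter."""
    return x.translate(str.maketrans("01", "10"))


def distinct_squares(w: str) -> set:
    """Distinct nonempty factors of w of the form xx."""
    found = set()
    for i in range(len(w)):
        # k is the order |x| of the candidate square starting at position i
        for k in range(1, (len(w) - i) // 2 + 1):
            if w[i:i + k] == w[i + k:i + 2 * k]:
                found.add(w[i:i + 2 * k])
    return found


def distinct_antisquares(w: str) -> set:
    """Distinct factors of w of the form x followed by its complement."""
    found = set()
    for i in range(len(w)):
        for k in range(1, (len(w) - i) // 2 + 1):
            if complement(w[i:i + k]) == w[i + k:i + 2 * k]:
                found.add(w[i:i + 2 * k])
    return found
```

For instance, `distinct_antisquares("0110")` collects 01, 10, and 0110, while a word over a single letter has no antisquares at all. The running time is cubic in the length of the word, which is adequate for short words only.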
More specifically, we completely solve this problem, determining in every case the length of the longest word having at most a distinct squares and at most b distinct antisquares. Our results are summarized in the following table. If (one-sided) infinite words are possible, this is denoted by writing ∞ for the length.

Figure 1: Length of longest binary word having at most a squares and b antisquares

The results in the first two columns and first three rows (that is, for a ≤ 2 or b ≤ 1) are explained as follows.

Proposition 1.
(a) For a ≥ 0, the longest binary word with a squares and 0 antisquares has length 2a + 1.
(b) For a ≥ 0, the longest binary word with a squares and 1 antisquare has length 2a + 2.

Proof.
(a) If a binary word has no antisquares, then in particular it has no occurrences of either 01 or 10. Thus it must contain only one type of letter. If it has length 2a + 2, then it has a + 1 squares, of orders 1, 2, ..., a + 1. If it has length 2a + 1, it has a squares. So 2a + 1 is optimal.
(b) If a length-n binary word w has only one antisquare, this antisquare must be either 01 or 10; without loss of generality, assume it is 01. Then w is either of the form 0^{n−1}1 or of the form 01^{n−1}. Such a word clearly has ⌊(n−1)/2⌋ squares.

We next explain the first three rows: if a binary word has no squares, its length is clearly bounded by 3, as we remarked earlier. If it has one square, a simple argument shows it has length at most 7. Finally, if it has two squares, already Entringer, Jackson, and Schatz [8, Thm. 2] observed that it has length at most 18.

For all the remaining finite entries, we obtained the result through the usual backtrack search method, and we omit the details.

It now remains to prove the results labeled ∞. First, we introduce some morphisms. Let the morphisms h_{3,13}, h′_{3,13}, h_{4,9}, h_{5,5}, h_{7,3}, h′_{7,3}, and h_{9,2} be defined as follows:
(a) h_{3,13}: 0 → → → This is a 216-uniform morphism.
(b) h′_{3,13}: 0 → → →
(c) h_{4,9}: 0 → → →
(d) h_{5,5}: 0 → → →
(e) h_{7,3}: 0 → → →
(f) h′_{7,3}: 0 → → →
(g) h_{9,2}: 0 → → →

Theorem 2. Let w be an infinite squarefree sequence over the alphabet {0, 1, 2}. Then
(a) h_{3,13}(w) has 3 squares and 13 antisquares. The squares are 0², 1², and (01)². The antisquares are , , , , , , , , , , , , and .
(b) h′_{3,13}(w) has 3 squares and 13 antisquares. The squares are 0², 1², and (01)². The antisquares are , , , , , , , , , , , , and .
(c) h_{4,9}(w) has exactly 4 squares and 9 antisquares. The squares are 0², 1², (00)², and (01)², and the antisquares are , , , , , , , , and .
(d) h_{5,5}(w) has exactly 5 squares and 5 antisquares. The squares are 0², 1², (00)², (01)², and (10)², and the antisquares are , , , , and .
(e) h_{7,3}(w) has 7 squares and 3 antisquares. The squares are 0², (00)², (01)², (10)², (001)², (010)², and (100)², and the antisquares are , , and .
(f) h′_{7,3}(w) has 7 squares and 3 antisquares. The squares are 0², (00)², (01)², (10)², (001)², (010)², and (100)², and the antisquares are , , and .
(g) h_{9,2}(w) has 9 squares and 2 antisquares. The squares are 0², (00)², (01)², (10)², (000)², (0001)², (0010)², (0100)², and (1000)², and the antisquares are 01 and 10.

Proof. Let h be any of the morphisms above. We first show that large squares are avoided. The h-images of the letters have been ordered such that |h(0)| ≥ |h(1)| ≥ |h(2)|. A computer check shows that for every letter i and every ternary word w, the factor h(i) appears in h(w) only as the h-image of i. Another computer check shows that for every ternary squarefree word w, the only squares uu with |u| ≤ 2|h(0)| − 2 occurring in h(w) are the ones we claim. If h(w) contains a square uu with |u| ≥ 2|h(0)| − 1, then u contains the full h-image of some letter. Thus, uu is a factor of h(avbvc) with a, b, c single letters and v a nonempty word. Moreover, a ≠ b and b ≠ c, since otherwise avbvc would contain a square. It follows that u = ph(v)s, so that p is a suffix of h(a), and h(b) = sp, and s is a prefix of h(c). Thus, h(abc) contains the square psps with period |ps| at least |h(2)| / |h(0)| / |h(0)| − f such that f is uniformly recurrent in h(w) and f is not a factor of h(w). We use f = 0101 for h , and f = 0 for the other morphisms.

Remark 3. The uniform morphisms were found as follows: for increasing values of q, our program looks (by backtracking) for a binary word of length 3q corresponding to the image h(012) of 012 by a suitable q-uniform morphism h. Given a candidate h, we check that h(w) has at most a squares and b antisquares for every squarefree word w up to some length. Standard optimizations are applied to the backtracking. Squares and antisquares are counted naively (recomputed from scratch at every step), which is sufficient since the morphisms found are not too large.

Remark 4. The reader can check that h , = h′ , ∘ m, where m is the 18-uniform morphism given by 0 → → → .

Corollary 5.
There exists an infinite binary word having at most ten distinct squares and antisquares as factors, but the longest binary word having nine or fewer distinct squares and antisquares is of length 45.

Remark 6. A word of length 45 with nine distinct squares and antisquares is
000001000000010100000010000101000000010000101.

Corollary 7. Every infinite word having at most ten distinct squares and antisquares has critical exponent at least , and there is such a word having -powers but no powers of higher exponent.

Proof. By the usual backtracking approach, we can easily verify that the longest finite word having at most ten distinct antisquares, and critical exponent < . On the other hand, if w is any squarefree ternary infinite word, then from above we know that the only possible squares that can occur in h_{5,5}(w) are of the form x² for x ∈ {0, 1, 00, 01, 10}. It is now easy to verify that the largest power of 0 that occurs in h_{5,5}(w) is 0 ; the largest power of 1 that occurs is 1 ; the largest power of 01 that occurs is (01) / ; and the largest power of 10 that occurs is (10) / .

In this section we discuss avoiding xx′ where x′ belongs to some large class of modifications of x. This is in the spirit of previous results [17, 6, 13], where one is interested in avoiding factors of low Kolmogorov complexity. The problems we study are not quite so general, but our results are effective, and we obtain explicit bounds.

3.1 Avoiding pseudosquares for permutations

Here we are interested in avoiding patterns of the form xp(x), for all codings p that are permutations of the underlying alphabet. Of course, this is impossible for words of length ≥ 2, since every word of length 2 is of the form ap(a), where p is a permutation sending the letter a to p(a). Thus it is reasonable to ask about avoiding xp(x) for all words x of length ≥ n. Our first result shows this is impossible for n = 2.

Theorem 8. For all finite alphabets Σ, and for all words w of length ≥ over Σ, there exists a permutation p of Σ and a factor of w of the form xx′, where x′ = p(x), and |x| ≥ 2.

Proof. Using the usual tree-traversal technique, where we extend the alphabet size at each length extension.

We now turn to the case of larger n. For n ≥ 3 and k = 2, we can avoid all factors of the form xp(x). Of course, this case is particularly simple, since there are only two permutations of the alphabet: the identity permutation that leaves letters invariant, and the map x → x̄, which changes 0 to 1 and vice versa.

Theorem 9. There exists an infinite word w over the binary alphabet Σ = {0, 1} that avoids xx and xx̄ for all x with |x| ≥ 3.

Proof. We can use the morphism in Theorem 2 (c). Alternatively, a simpler proof comes from the fixed point of the morphism 0 → → → → → → → → n → n mod 2. We can now use Walnut [14] to verify that the resulting 2-automatic word has the desired property. This word has exactly 5 distinct squares: 0², 1², (00)², (01)², (10)², and exactly 6 distinct antisquares: 01, 10, 0011, 0110, 1001, 1100.

3.2 Avoiding pseudosquares for transformations

In the previous subsection we considered permutations of the alphabet. We now generalize this to transformations of the alphabet, or, in other words, to arbitrary codings (letter-to-letter morphisms).
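The difference between the two notions is easy to test for a given pair of words. The sketch below uses hypothetical function names and assumes words are Python strings over an arbitrary finite alphabet. It relies on two elementary facts: y = t(x) for some transformation t exactly when the letter map x[i] → y[i] is well defined, and y = p(x) for some permutation p exactly when that map is also injective, since a partial injection on a finite alphabet extends to a permutation.

```python
def is_transformation_image(x: str, y: str) -> bool:
    """True if y = t(x) for some letter-to-letter map t."""
    if len(x) != len(y):
        return False
    image = {}
    for a, b in zip(x, y):
        # setdefault records t(a) = b on first sight, then checks consistency
        if image.setdefault(a, b) != b:
            return False
    return True


def is_permutation_image(x: str, y: str) -> bool:
    """True if y = p(x) for some permutation p of the alphabet."""
    # The letter map must be well defined in both directions (injective).
    return is_transformation_image(x, y) and is_transformation_image(y, x)


def find_pseudosquare(w: str, min_len: int, related):
    """Return the first pair (x, x') with xx' a factor of w, |x| >= min_len,
    and related(x, x') true; return None if no such factor exists."""
    for k in range(min_len, len(w) // 2 + 1):
        for i in range(len(w) - 2 * k + 1):
            x, y = w[i:i + k], w[i + k:i + 2 * k]
            if related(x, y):
                return x, y
    return None
```

A symmetric variant, allowing either x′ = t(x) or x = t(x′), can be obtained by passing `lambda x, y: is_transformation_image(x, y) or is_transformation_image(y, x)` as `related`.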
Theorem 10.
(a) For all finite alphabets Σ, and all words w of length ≥ over Σ, there exists a transformation t : Σ∗ → Σ∗ such that w contains a factor of the form xt(x) for |x| ≥ .
(b) For all finite alphabets Σ, and all words w of length ≥ over Σ, there exists a transformation t of Σ such that w contains a factor of the form xx′, where x′ = t(x) or x = t(x′) and |x| ≥ .

Proof. Using the usual tree-traversal technique, where we extend the alphabet size at each length extension.

We now specialize to the binary alphabet. This case is particularly simple, since in addition to the two permutations of the alphabet, the only other transformations are the ones sending both 0 and 1 to a single letter.

Theorem 11. There exists an infinite word w over the binary alphabet Σ = {0, 1} avoiding , , and xx and xx̄ for every x with |x| ≥ . In other words, w avoids both xt(x) and t(x)x for |x| ≥ and all transformations t. There is no such infinite word if is changed to .

Proof. Use the fixed point of the morphism 0 → → → → → → n → ⌊n/ ⌋. The result can now easily be verified with Walnut.

In this subsection we consider simultaneously avoiding all patterns of the form xh(x), for all morphisms h defined over Σ_k = {0, 1, ..., k − 1}. Clearly this is impossible if h is allowed to be erasing (that is, some images are allowed to be empty), or if x consists of a single letter. So once again we consider the question for sufficiently long x.

For this version of the problem, it is particularly hard to obtain experimental data, because the problem of determining, given x and y, whether there is a morphism h such that y = h(x), is NP-complete [1, 7].

Theorem 12. No infinite binary word avoids all factors of the form xh(x), for all nonerasing binary morphisms h, with |x| ≥ 4.

Proof. This can be checked by computer in less than a second. We give another proof that is a reduction to a more classical question of avoiding large squares and a finite set of factors.

Let w be a potential counter-example to Theorem 12. Without loss of generality, we can assume that w is uniformly recurrent (see, e.g., [5, Lemma 2.4]). Suppose, to get a contradiction, that w contains the factor 000. Since w ≠ 0^ω, the word w contains 1000. Since w is uniformly recurrent, the factor 1000 extends to a factor 1000u000, where u is a nonempty finite word, which is a forbidden occurrence of xh(x). So w avoids 000, and by symmetry, the word w also avoids 111. Suppose, to get a contradiction, that w contains both 0100 and 1011. The factor 0100 extends to 01001. Since w is uniformly recurrent and contains 11, the word w contains 01001u11, where u is a nonempty finite word, which is a forbidden occurrence of xh(x). So w does not contain both 0100 and 1011, and we assume without loss of generality that w avoids 0100.

Using the usual tree-traversal technique, we can now easily check that no infinite binary word avoids 000, 111, 0100, and every square xx with |x| ≥ 4. Thus, w does not exist.

Theorem 13.
There exists an infinite binary word that avoids all factors of the form xh(x), for all nonerasing binary morphisms h, with |x| ≥ 5.

Proof. Let u be any infinite ternary (7/4⁺)-free word, and consider the binary word w defined by w = m(u), where m is the 246-uniform morphism given below.
0 → → →

We use a and b to denote letters. We will use the concept of generalized repetition threshold [12]. Recall that a word is said to be (e, n)-free if it contains no factor of the form x^f where f ≥ e and |x| ≥ n. We will need the following properties of w.
(a) w is (11 / + , w are 00, 11, 0101, 1010, 010010, 101101, and 110110.
(b) The only cubes occurring in w are 000 and 111. Every cube bbb extends to the left to b̄b̄bbb.
(c) w does not contain any of the following factors: 01010, 10101, 00100, 1101100, 1011010010.
(d) Every factor of w of length 17 contains 00111 or 11000.
(e) Every factor of w of length 98 contains 11011.
(f) Every factor of w of length at least 5, except 00010, 11101, 111011, and 11011, contains a factor of the form bbbb , bbbb , bbbb , or bbbbb .

By [15, Lemma 2.1], it is sufficient to check the (11 / + , / + )-free ternary word of length smaller than × / / − / = 44. The other properties can be checked by inspecting factors of w with bounded length.

The following cases show that w contains no factor of the form xh(x) with |x| ≥ 5.
• We can rule out h(0) = h(1), as h(x) contains h(0) , which contradicts (a).
• We can rule out h(b) = b, as xh(x) = xx is a square with period at least 5, which contradicts (a).
• We can rule out |x| ≥ 17: By (d), x contains the factor b̄b̄bbb. By (b), |h(b)| = 1, say h(b) = a. By (a), a square of period at least two has either a or aā as a suffix. So if |h(b̄)| > 1, then h(b̄b̄bbb) has either aaaa or aāaaa as a suffix, which contradicts (b). Thus |h(b)| = |h(b̄)| = 1. By the previous cases, the only remaining possibility is h(b) = b̄. If |x| ≥ 98, then x contains 11011 by (e). Thus h(x) contains 00100, which contradicts (c). If 17 ≤ |x| ≤ 97, then a computer check shows that w contains no antisquare xh(x) = xx̄.
• If x = bbbb̄b, then xh(x) contains the factor b̄bh(bbb), which contradicts (b).
• If x = 111011 or x = 11011, then xh(x) contains the factor 11011h(11). We can check that every choice of h(11) leads to a contradiction with (a), (b), or (c).
• We can rule out the remaining cases. By (f) and the previous two cases, we can assume that x contains bbbb , bbbb , bbbb , or bbbbb . Since b is always contained in a square, |h(b)| ≤ b is contained in a square, then |h(b)| ≤ 3. Otherwise, x contains bbbb , or bbbbb . Notice that |h(bb)| ≤ |h(bbb)| = 3. Let s ∈ { , }. The repetition h(bb s b) in a (11 / + , |h(bb s b)| ≤ |h(bb s)|. This gives |h(b)| ≤ |h(b s)| ≤ 30. Thus, |h(0)| + |h(1)| ≤ 33.

So if w contains a factor of the form xh(x) with |x| ≥ 5, then |x| ≤ 16 and |h(0)| + |h(1)| ≤ 33. Finally, a computer check shows that w contains no such factor xh(x).

References
[1] D. Angluin. Finding patterns common to a set of strings. J. Comput. System Sci. (1980), 46–62.
[2] G. Badkobeh and M. Crochemore. Fewest repetitions in infinite binary words. RAIRO Inform. Théor. App. (2012), 17–31.
[3] F. Blanchet-Sadri, I. Choi, and R. Mercaş. Avoiding large squares in partial words. Theoret. Comput. Sci. (2011), 3752–3758.
[4] E. Chiniforooshan, L. Kari, and Z. Xu. Pseudopower avoidance.
Fund. Inform. (1) (2012), 55–72.
[5] A. de Luca and S. Varricchio. Finiteness and iteration conditions for semigroups. Theoret. Comput. Sci. (1991), 315–327.
[6] B. Durand, L. Levin, and A. Shen. Complex tilings. J. Symbolic Logic (2008), 593–613.
[7] A. Ehrenfeucht and G. Rozenberg. Finding a homomorphism between two words is NP-complete. Inform. Process. Lett. (1979), 86–88.
[8] R. C. Entringer, D. E. Jackson, and J. A. Schatz. On nonrepetitive sequences. J. Combin. Theory. Ser. A (1974), 159–164.
[9] G. Fici, A. Restivo, M. Silva, and L. Q. Zamboni. Anti-powers in infinite words. J. Combin. Theory. Ser. A (2018), 109–119.
[10] A. S. Fraenkel and J. Simpson. How many squares must a binary sequence contain? Electronic J. Combinatorics (1995),
Bull. European Assoc. Theor. Comput. Sci., No. 89, (2006), 164–166.
[12] L. Ilie, P. Ochem, and J. Shallit. A generalization of repetition threshold. Theoret. Comput. Sci. (2005), 359–369.
[13] J. S. Miller. Two notes on subshifts. Proc. Amer. Math. Soc. (2012), 1617–1622.
[14] H. Mousavi. Automatic theorem proving in Walnut. Available at http://arxiv.org/abs/1603.06017, 2016.
[15] P. Ochem. A generator of morphisms for infinite words. RAIRO Inform. Théor. App. (2006), 427–441.
[16] N. Rampersad, J. Shallit, and M.-w. Wang. Avoiding large squares in infinite binary words. Theoret. Comput. Sci. (2005), 19–34.
[17] A. Yu. Rumyantsev and M. A. Ushakov. Forbidden substrings, Kolmogorov complexity and almost periodic sequences. In STACS 2006, Vol. 3884 of