The Simplest Binary Word with Only Three Squares
Daniel Gabric and Jeffrey Shallit
School of Computer Science
University of Waterloo
Waterloo, ON N2L 3G1
Canada
[email protected]@uwaterloo.ca
Abstract
We re-examine previous constructions of infinite binary words containing few distinct squares with the goal of finding the “simplest”, in a certain sense. We exhibit several new constructions. Rather than using tedious case-based arguments to prove that the constructions have the desired property, we rely instead on theorem-proving software for their correctness.
One of the earliest results in combinatorics on words is that squares are unavoidable over a two-letter alphabet, but are avoidable over a three-letter alphabet [15, 16, 4]. Here a “square” is a nonempty word of the form $xx$, “unavoidable” means that every sufficiently long word contains a square subword, and “avoidable” means there exists an infinite word containing no squares.

Although squares are unavoidable over a two-letter alphabet, Entringer, Jackson, and Schatz [8] proved that there exist infinite binary words containing no squares of order $\geq 3$. (The order of a square $xx$ is $|x|$, the length of $x$.) This was later improved by Fraenkel and Simpson; they showed the existence of binary words having only three distinct squares.

The main tool for creating such words is the morphism: a map $h : \Sigma^* \to \Delta^*$ for alphabets $\Sigma, \Delta$ obeying the rule $h(xy) = h(x)h(y)$ for all $x, y \in \Sigma^*$. A morphism is $k$-uniform if $|h(a)| = k$ for all $a \in \Sigma$. If it is $k$-uniform for some $k$, then we say it is uniform. A 1-uniform morphism is called a coding. If $\Delta \subseteq \Sigma$ we can iterate $h$, writing $h^2(x)$ for $h(h(x))$, and so forth. If further $h(a) = ax$ for some $a \in \Sigma$, $x \in \Sigma^*$, and $h^i(x) \neq \epsilon$ for all $i$, then iterating $h$ infinitely produces an infinite word $h^\omega(a) = a\, x\, h(x)\, h^2(x) \cdots$ called a fixed point of $h$. If an infinite word is the image, under a coding, of a fixed point of a $k$-uniform morphism, it is called $k$-automatic. The weight of a morphism $h : \Sigma^* \to \Sigma^*$ is defined to be $\sum_{a \in \Sigma} |h(a)|$, and the weight of a $k$-automatic infinite word is defined to be the weight of its defining morphism.

In this note we find the “simplest” infinite binary word having at most three distinct squares.
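The definitions of morphism, uniformity, fixed point, and weight given above can be made concrete with a short Python sketch. The helper names (`apply_morphism`, `fixed_point_prefix`, `weight`, `uniform_length`) are hypothetical, and the Thue-Morse morphism $0 \to 01$, $1 \to 10$ is used purely as an illustration; it is not one of the constructions studied in this note. Letters are assumed to be single characters.

```python
def apply_morphism(h, word):
    """Apply the morphism h (dict: letter -> image) letter by letter,
    so that h(xy) = h(x)h(y) holds by construction."""
    return "".join(h[a] for a in word)


def fixed_point_prefix(h, a, n):
    """Return the length-n prefix of h^omega(a).

    Assumes h(a) begins with a and that iterating h keeps producing
    longer words (the condition h^i(x) != empty word in the text)."""
    if not h[a].startswith(a):
        raise ValueError("h(a) must begin with a")
    w = a
    while len(w) < n:
        nxt = apply_morphism(h, w)
        if len(nxt) <= len(w):
            raise ValueError("iteration does not grow; no infinite fixed point")
        w = nxt
    return w[:n]


def weight(h):
    """Weight of a morphism: the sum of the lengths of the images."""
    return sum(len(image) for image in h.values())


def uniform_length(h):
    """Return k if h is k-uniform, and None if h is not uniform."""
    lengths = {len(image) for image in h.values()}
    return lengths.pop() if len(lengths) == 1 else None


# Illustrative example only: the Thue-Morse morphism.
MU = {"0": "01", "1": "10"}

print(fixed_point_prefix(MU, "0", 16))
print("weight:", weight(MU), "uniform length:", uniform_length(MU))
```

For a $k$-uniform morphism over $s$ letters, `weight` returns $k \cdot s$, which is the quantity minimized in the rest of the note.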
Our criterion for simplicity is as follows:

(a) the word should be generated by a finite automaton of $s$ states taking the base-$k$ representation of $n$ as input (i.e., a $k$-automaton), most significant digit first; and

(b) the product $k \cdot s$ should be as small as possible.

By Cobham’s theorem [7], this is the same as saying the word is generated as the image, under a coding, of a fixed point of a $k$-uniform morphism over an alphabet of $s$ letters.

One practical advantage to restricting our attention to $k$-automatic words is that the property of having exactly three distinct square factors can be stated in first-order logic, thus reducing the verification to a completely routine calculation using a decision procedure [6].

We begin with a description of the construction of Entringer-Jackson-Schatz. In a form very slightly modified from the original, it starts with an arbitrary squarefree word $z$ over $\{0, 1, 2\}$ and applies the uniform morphism

$h(0) = 1100$
$h(1) = 0111$
$h(2) = 1010$

to it. They proved that the resulting word $h(z)$ has no squares of order $\geq 3$; in fact, the only squares that appear are $0^2$, $1^2$, $(01)^2$, $(10)^2$, and $(11)^2$.

Although this is indeed a simple construction, in terms of automatic sequences, it can be improved. The minimum automaton size for $h(z)$, over all 2-automatic squarefree words $z$, is 10, as can be verified by breadth-first search, with pruning if the prefix constructed so far requires 11 or more states.

This minimum number of states is achieved, for example, by applying $h$ to the famous squarefree word

$\mathbf{vtm} := \tau(g^\omega(0)) = 2102012101202102012021012102012 \cdots$,

where

$g(0) = 01 \qquad \tau(0) = 2$
$g(1) = 20 \qquad \tau(1) = 1$
$g(2) = 23 \qquad \tau(2) = 0$
$g(3) = 02 \qquad \tau(3) = 1$.

Remark. The word $\mathbf{vtm}$ is (up to renaming) the classical squarefree word of Thue [16]. It can be defined in many different ways [3], including as the fixed point of the morphism defined by $2 \to 210$, $1 \to 20$, $0 \to 1$. The name $\mathbf{vtm}$ for this word comes from [5].

A novel alternative construction (not necessarily an image of $\mathbf{vtm}$) needs only six states. This is the minimum possible number of states for a 2-automatic word containing no squares of order $\geq 3$.

Theorem 1.
Consider the infinite word $\rho(f^\omega(0))$, where

$f(0) = 01 \qquad \rho(0) = 0$
$f(1) = 23 \qquad \rho(1) = 0$
$f(2) = 45 \qquad \rho(2) = 0$
$f(3) = 02 \qquad \rho(3) = 0$
$f(4) = 05 \qquad \rho(4) = 1$
$f(5) = 25 \qquad \rho(5) = 1$.

This is the lexicographically least word generated by a 2-automaton of $\leq 6$ states, containing no squares of order $\geq 3$, and only 5 distinct squares.

The Entringer-Jackson-Schatz construction was optimally improved by Fraenkel and Simpson [9], as follows: they constructed an infinite binary word containing only 3 squares: $0^2$, $1^2$, and $(01)^2$.

Their construction is rather complicated, and also has a complicated proof. It starts with an infinite squarefree word $w$ over $\{0, 1, 2\}$ avoiding the subwords 020 and 121. (Although they do not say so, an example of such a word is given by renaming the letters in $\mathbf{vtm} := \tau(g^\omega(0))$ above.) Then replace every occurrence of 12 with 132. Next, replace every remaining occurrence of 21 with 241. Finally, apply the morphism $\alpha$ defined as follows:

$\alpha(0) = 011000111001$
$\alpha(1) = 011100011001$
$\alpha(2) = 011001110001$
$\alpha(3) = 01100010111001$
$\alpha(4) = 01110010110001$.

The resulting word avoids all squares except $0^2$, $1^2$, and $(01)^2$.

Because of the inherent complexity of this construction, it seems desirable to find simpler ones. An example using 24-uniform morphisms was given by Rampersad et al. [14]. Define

$p(0) = 012321012340121012321234$
$p(1) = 012101234323401234321234$
$p(2) = 012101232123401232101234$
$p(3) = 012321234323401232101234$
$p(4) = 012321234012101234321234$

and

$\beta(0) = 011100$
$\beta(1) = 101100$
$\beta(2) = 111000$
$\beta(3) = 110010$
$\beta(4) = 110001$.

Then $\beta(p^\omega(0))$ is an infinite word containing only the squares $0^2$, $1^2$, and $(01)^2$. This construction gives a 24-automatic sequence generated by an automaton of 18 states, so its weight is $24 \cdot 18 = 432$.
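Claims of the kind made so far ("only these squares occur") can be spot-checked on finite prefixes by brute force. The following Python sketch, with hypothetical helper names (`distinct_squares`, `vtm_prefix`) and the morphisms transcribed from the text, lists the distinct squares in a prefix of $h(\mathbf{vtm})$ for the Entringer-Jackson-Schatz morphism $h$ and in a prefix of $\beta(p^\omega(0))$. A finite check of this sort can only refute a claim, never prove it; the prefix length 100 is an arbitrary choice.

```python
def apply_morphism(h, word):
    """Apply the morphism h (dict: letter -> image) letter by letter."""
    return "".join(h[a] for a in word)


def fixed_point_prefix(h, a, n):
    """Length-n prefix of h^omega(a); assumes h(a) begins with a."""
    if not h[a].startswith(a):
        raise ValueError("h(a) must begin with a")
    w = a
    while len(w) < n:
        nxt = apply_morphism(h, w)
        if len(nxt) <= len(w):
            raise ValueError("iteration does not grow")
        w = nxt
    return w[:n]


def distinct_squares(word):
    """Set of all nonempty factors of word of the form xx (brute force)."""
    n = len(word)
    found = set()
    for order in range(1, n // 2 + 1):
        for i in range(n - 2 * order + 1):
            if word[i:i + order] == word[i + order:i + 2 * order]:
                found.add(word[i:i + 2 * order])
    return found


# vtm = tau(g^omega(0)), with g and tau as defined in the text.
G = {"0": "01", "1": "20", "2": "23", "3": "02"}
TAU = {"0": "2", "1": "1", "2": "0", "3": "1"}


def vtm_prefix(n):
    """Length-n prefix of vtm (tau is a coding, so lengths are preserved)."""
    return apply_morphism(TAU, fixed_point_prefix(G, "0", n))


# Entringer-Jackson-Schatz morphism h, as transcribed from the text.
H_EJS = {"0": "1100", "1": "0111", "2": "1010"}

# The 24-uniform morphism p and the map beta of Rampersad et al.
P = {
    "0": "012321012340121012321234",
    "1": "012101234323401234321234",
    "2": "012101232123401232101234",
    "3": "012321234323401232101234",
    "4": "012321234012101234321234",
}
BETA = {"0": "011100", "1": "101100", "2": "111000",
        "3": "110010", "4": "110001"}

ejs_prefix = apply_morphism(H_EJS, vtm_prefix(100))
ramp_prefix = apply_morphism(BETA, fixed_point_prefix(P, "0", 100))

print("squares in h(vtm) prefix:   ", sorted(distinct_squares(ejs_prefix), key=len))
print("squares in beta(p^w(0)) prefix:", sorted(distinct_squares(ramp_prefix), key=len))
```

The brute-force search is cubic in the prefix length in the worst case, which is adequate for prefixes of a few hundred letters but is no substitute for the decision-procedure approach used in this note.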
Ochem [13] provided a different construction in 2006:

$\sigma(0) = 00011001011000111001011001110001011100101100010111$
$\sigma(1) = 00011001011000101110010110011100010110001110010111$
$\sigma(2) = 00011001011000101110010110001110010111000101100111$

He showed that if $x$ is a $(7/4 + \epsilon)$-free word, then $\sigma(x)$ contains only three squares.

In fact, we can also successfully apply $\sigma$ to the word $\mathbf{vtm}$ above, even though it is not $(7/4 + \epsilon)$-free. Since $\sigma$ is a uniform map, we know that $\sigma(\mathbf{vtm})$ is 2-automatic.

Theorem 2.
This word $\sigma(\mathbf{vtm})$ is a 2-automatic word containing only three distinct squares. It is generated by an automaton with 109 states (and has weight $2 \cdot 109 = 218$).

Harju and Nowotka [10] generated an infinite binary word with three squares by defining the map

$\zeta(0) = 111000110010110001110010$
$\zeta(1) = 111000101100011100101100010$
$\zeta(2) = 111000110010110001011100101100$

and then applying it to $\mathbf{vtm}$.

The morphism $\zeta$ is clearly not uniform. However, the lengths of the images of 0, 1, and 2 (namely 24, 27, and 30) form an arithmetic progression. This is enough to show that $\zeta(\mathbf{vtm})$ is 2-automatic, as the following result shows.

Theorem 3.
Let $\mathbf{vtm} = \tau(g^\omega(0))$ where $g$ and $\tau$ are defined in Section 2. Let $h : \{0, 1, 2\}^* \to \Delta^*$ be a morphism. If the three lengths $|h(0)|$, $|h(1)|$, and $|h(2)|$ form an arithmetic progression, then $h(\mathbf{vtm})$ is 2-automatic.

Proof. Suppose $a, b$ are integers, with $a \geq [\ldots]$ and $a + 2b \geq 1$, such that $|h(i)| = a + ib$ for $i \in \{0, 1, 2\}$. Write $\mathbf{vtm} = c(0)c(1)c(2) \cdots$. An easy induction now shows that

$|h(c(0)c(1) \cdots c(n-1))| = (a + b)n + b\,t_n$

for $n \geq 0$, where $\mathbf{t} = t_0 t_1 t_2 \cdots$ is the Thue-Morse word. To compute the $n$'th symbol of $h(\mathbf{vtm})$, divide $n$ by $a + b$ to determine which block $h(c(i))$ it corresponds to; then adjust based on whether $t_i = 0$ or not. More precisely, define $n' := \lfloor n/(a+b) \rfloor$ and $m := n \bmod (a+b)$. Then

$$
(h(\mathbf{vtm}))[n] :=
\begin{cases}
(h(c(n')))[m], & \text{if } t_{n'} = 0; \\
(h(c(n'-1)))[m + a + b], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 0 \text{ and } m < b; \\
(h(c(n')))[m - b], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 0 \text{ and } m \geq b; \\
(h(c(n'-1)))[m + a], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 1 \text{ and } m < b; \\
(h(c(n')))[m - b], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 1 \text{ and } m \geq b.
\end{cases}
$$

For fixed $a$ and $b$, an automaton on input $n$ in base 2 can compute $n'$ and $m$ on the fly and do the required lookup.

Theorem 4.
The infinite word $\zeta(\mathbf{vtm})$ contains only three distinct squares: $0^2$, $1^2$, and $(01)^2$. It is generated by an automaton with 88 states, and has weight $2 \cdot 88 = 176$.

Yet another construction was given by Badkobeh and Crochemore [2, 1]. They defined the morphism

$\xi(0) = 000111$
$\xi(1) = 0011$
$\xi(2) = 01001110001101$

of weight 24. Although $\xi$ applied to a squarefree word can produce a word with more than three squares (consider 0102), it turns out that $\xi(\mathbf{vtm})$ contains only three distinct squares. Furthermore, although they do not mention it, $\xi$ is a morphism of lowest total weight with this property.

Incidentally, we found another morphism with the same properties, of the same weight; it is

$\kappa(0) = 110100111000110100$
$\kappa(1) = 1100$
$\kappa(2) = 01$.

However, for neither of these morphisms do the lengths of the images form an arithmetic progression, and so Theorem 3 does not apply. Indeed, we suspect (but did not prove) that neither $\xi(\mathbf{vtm})$ nor $\kappa(\mathbf{vtm})$ is a 2-automatic sequence. If they are 2-automatic, then more than 200 states are needed to generate them.

Our first construction

The previous section suggests looking for a morphism $\eta$ of lowest total weight, where the lengths of the images of 0, 1, 2 form an arithmetic progression and $\eta(\mathbf{vtm})$ has only 3 distinct squares. We found the following morphism, which is the smallest such, of weight 36.

$\eta(0) = 00011101$
$\eta(1) = 001110001101$
$\eta(2) = 0011000111001101$.

Theorem 5.
The infinite word $\eta(\mathbf{vtm})$ contains only three distinct squares: $0^2$, $1^2$, and $(10)^2$. It is 2-automatic, and can be generated by an automaton of 27 states, so its weight is $2 \cdot 27 = 54$.

Finally, instead of using the strategy of applying a morphism to $\mathbf{vtm}$, we can search directly for a $k$-automatic word of minimum total weight. It turns out that this minimum weight is 44, corresponding to a 2-automaton with 22 states:
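Figure 1: DFAO where accepting states have output 1 and all other states have output 0.

The corresponding representation is as the image, under the coding $\gamma$, of the fixed point of the morphism $q$ defined below over the alphabet $\{0, 1, \ldots, 21\}$. We use commas to separate letters in the image of $q$, because of the large alphabet size.

$q(0) = 0,\ [\ldots] \qquad \gamma(0) = 1$
$q(1) = 2,\ [\ldots] \qquad \gamma(1) = 1$
$q(2) = 4,\ [\ldots] \qquad \gamma(2) = 0$
$q(3) = 6,\ [\ldots] \qquad \gamma(3) = 1$
$q(4) = 8,\ [\ldots] \qquad \gamma(4) = 0$
$q(5) = 10,\ [\ldots] \qquad \gamma(5) = 0$
$q(6) = 12,\ [\ldots] \qquad \gamma(6) = 1$
$q(7) = 8,\ [\ldots] \qquad \gamma(7) = 1$
$q(8) = 13,\ [\ldots] \qquad \gamma(8) = 0$
$q(9) = 15,\ [\ldots] \qquad \gamma(9) = 0$
$q(10) = 2,\ [\ldots] \qquad \gamma(10) = 0$
$q(11) = 7,\ [\ldots] \qquad \gamma(11) = 1$
$q(12) = 6,\ [\ldots] \qquad \gamma(12) = 1$
$q(13) = 2,\ [\ldots] \qquad \gamma(13) = 1$
$q(14) = 18,\ [\ldots] \qquad \gamma(14) = 1$
$q(15) = 0,\ [\ldots] \qquad \gamma(15) = 0$
$q(16) = 6,\ [\ldots] \qquad \gamma(16) = 0$
$q(17) = 10,\ [\ldots] \qquad \gamma(17) = 1$
$q(18) = 20,\ [\ldots] \qquad \gamma(18) = 0$
$q(19) = 10,\ [\ldots] \qquad \gamma(19) = 1$
$q(20) = 13,\ [\ldots] \qquad \gamma(20) = 1$
$q(21) = 18,\ [\ldots] \qquad \gamma(21) = 0$

Theorem 6.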
The infinite word $\gamma(q^\omega(0)) = 11010011000111001101001110001101000111010011000 \cdots$ contains only three distinct squares: $0^2$, $1^2$, and $(10)^2$. It has total weight 44.

By exhaustive search we find that there are no $k$-automatic words containing only three distinct squares, with $s$ states, for $3 \leq k \leq 44$ and $ks \leq [\ldots]$

We used breadth-first search to find candidates for the minimal examples presented here. The number of states in the minimal automaton was determined using the Myhill-Nerode theorem (see, e.g., [11]). We used the theorem-proving software Walnut [12] to verify assertions about the squares contained in each word. For example, the claim about the 22-state automaton in the previous section can be proved as follows: create the automaton, and call it Q in Walnut, and then evaluate the following three statements:

eval qtest1 "Ei,n (n>=3) & At (t

The first predicate asserts that there is a square of order $\geq 3$. The second asserts that there is a square of the form $(00)^2$ or $(11)^2$. The third asserts that there is a square of the form $(01)^2$. Since all three queries return false, the word has the desired properties. The total computation time for this query is a few seconds on a laptop.

Each of Theorems 1, 2, 4, 5, 6 can be proved similarly, although some require significant memory resources and time. The Walnut code can be found on the website of the second author: https://cs.uwaterloo.ca/~shallit/papers.html .

References

[1] G. Badkobeh. Infinite words containing the minimal number of repetitions. J. Discrete Algorithms (2013), 38–42.
[2] G. Badkobeh and M. Crochemore. Fewest repetitions in infinite binary words. RAIRO Inform. Théor. App. (2012), 17–31.
[3] J. Berstel. Sur la construction de mots sans carré. Séminaire de Théorie des Nombres (1978–1979), 18.01–18.15.
[4] J. Berstel.
Axel Thue’s Papers on Repetitions in Words: a Translation. Number 20 in Publications du Laboratoire de Combinatoire et d’Informatique Mathématique. Université du Québec à Montréal, February 1995.
[5] F. Blanchet-Sadri, J. Currie, N. Rampersad, and N. Fox. Abelian complexity of fixed point of morphism $0 \mapsto 012$, $1 \mapsto 02$, $2 \mapsto 1$. INTEGERS: Elect. J. of Combin. Number Theory (2014), […]
[6] […] Internat. J. Found. Comp. Sci. (2012), 1035–1066.
[7] A. Cobham. Uniform tag sequences. Math. Systems Theory (1972), 164–192.
[8] R. C. Entringer, D. E. Jackson, and J. A. Schatz. On nonrepetitive sequences. J. Combin. Theory Ser. A (1974), 159–164.
[9] A. S. Fraenkel and J. Simpson. How many squares must a binary sequence contain? Electronic J. Combinatorics (1994), […]
[10] […] Bull. European Assoc. Theor. Comput. Sci., No. 89, (2006), 164–166.
[11] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
[12] H. Mousavi. Automatic theorem proving in Walnut. Available at http://arxiv.org/abs/1603.06017 , 2016.
[13] P. Ochem. A generator of morphisms for infinite words. RAIRO Inform. Théor. App. (2006), 427–441.
[14] N. Rampersad, J. Shallit, and M.-w. Wang. Avoiding large squares in infinite binary words. Theoret. Comput. Sci. (2005), 19–34.
[15] A. Thue. Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. (1906), 1–22. Reprinted in Selected Mathematical Papers of Axel Thue, T. Nagell, editor, Universitetsforlaget, Oslo, 1977, pp. 139–158.
[16] A. Thue. Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen.