The Simplest Binary Word with Only Three Squares

Abstract

We re-examine previous constructions of infinite binary words containing few distinct squares with the goal of finding the "simplest", in a certain sense. We exhibit several new constructions. Rather than using tedious case-based arguments to prove that the constructions have the desired property, we rely instead on theorem-proving software for their correctness.



Daniel Gabric and Jeffrey Shallit
School of Computer Science
University of Waterloo
Waterloo, ON N2L 3G1
Canada


One of the earliest results in combinatorics on words is that squares are unavoidable over a two-letter alphabet, but are avoidable over a three-letter alphabet [15, 16, 4]. Here a "square" is a nonempty word of the form $xx$, "unavoidable" means that every sufficiently long word contains a square subword, and "avoidable" means there exists an infinite word containing no squares.

Although squares are unavoidable over a two-letter alphabet, Entringer, Jackson, and Schatz [8] proved that there exist infinite binary words containing no squares of order $\geq 3$. (The order of a square $xx$ is $|x|$, the length of $x$.) This was later improved by Fraenkel and Simpson; they showed the existence of binary words having only three distinct squares.

The main tool for creating such words is the morphism: a map $h : \Sigma^* \to \Delta^*$ for alphabets $\Sigma, \Delta$ obeying the rule $h(xy) = h(x)h(y)$ for all $x, y \in \Sigma^*$. A morphism is $k$-uniform if $|h(a)| = k$ for all $a \in \Sigma$. If it is $k$-uniform for some $k$, then we say it is uniform. A 1-uniform morphism is called a coding. If $\Delta \subseteq \Sigma$ we can iterate $h$, writing $h^2(x)$ for $h(h(x))$, and so forth. If further $h(a) = ax$ for some $a \in \Sigma$, $x \in \Sigma^*$, and $h^i(x) \neq \epsilon$ for all $i$, then iterating $h$ infinitely produces an infinite word $h^\omega(a) = a\,x\,h(x)\,h^2(x) \cdots$ called a fixed point of $h$. If an infinite word is the image, under a coding, of a fixed point of a $k$-uniform morphism, it is called $k$-automatic. The weight of a morphism $h : \Sigma^* \to \Sigma^*$ is defined to be $\sum_{a \in \Sigma} |h(a)|$, and the weight of a $k$-automatic infinite word is defined to be the weight of its defining morphism.

In this note we find the "simplest" infinite binary word having at most three distinct squares.
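The definitions of morphism, fixed point, uniformity, and weight can be made concrete with a short Python sketch. It is only an illustration: the function names `iterate_morphism`, `weight`, and `is_uniform` are hypothetical, and the example morphism is the standard Thue-Morse morphism $0 \to 01$, $1 \to 10$, chosen here as an assumption because it is small, not because it is one of the constructions studied in this note.

```python
def iterate_morphism(h, start, length):
    """Return the first `length` letters of the fixed point h^omega(start).

    `h` maps single-character letters to strings; h[start] must begin
    with `start` so that each iterate is a prefix of the next one.
    """
    if not h[start].startswith(start):
        raise ValueError("h(start) must begin with start")
    word = start
    while len(word) < length:
        longer = "".join(h[c] for c in word)
        if len(longer) == len(word):
            raise ValueError("morphism does not grow from this letter")
        word = longer
    return word[:length]


def weight(h):
    """Weight of a morphism: the sum of the lengths of its images."""
    return sum(len(image) for image in h.values())


def is_uniform(h):
    """True if all images have the same length k (h is k-uniform)."""
    return len({len(image) for image in h.values()}) == 1


# Illustrative example (an assumption, not taken from the paper's constructions).
thue_morse = {"0": "01", "1": "10"}
prefix = iterate_morphism(thue_morse, "0", 16)
print(prefix)              # 0110100110010110
print(weight(thue_morse))  # 4
print(is_uniform(thue_morse))  # True
```

For a $k$-uniform morphism over $s$ letters the weight computed this way is exactly $k \cdot s$, which is the quantity minimized in what follows.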
Our criterion for simplicity is as follows:

(a) the word should be generated by a finite automaton of $s$ states taking the base-$k$ representation of $n$ as input (i.e., a $k$-automaton), most significant digit first; and

(b) the product $k \cdot s$ should be as small as possible.

By Cobham's theorem [7], this is the same as saying the word is generated as the image, under a coding, of a fixed point of a $k$-uniform morphism over an alphabet of $s$ letters.

One practical advantage to restricting our attention to $k$-automatic words is that the property of having exactly three distinct square factors can be stated in first-order logic, thus reducing the verification to a completely routine calculation using a decision procedure [6].

We begin with a description of the construction of Entringer-Jackson-Schatz. Here very slightly modified from the original, it starts with an arbitrary squarefree word $z$ over $\{0, 1, 2\}$ and applies the uniform morphism

$h(0) = 1100$
$h(1) = 0111$
$h(2) = 1010$

to it. They proved that the resulting word $h(z)$ has no squares of order $\geq 3$; in fact, the only squares that appear are $0^2$, $1^2$, $(01)^2$, $(10)^2$, and $(11)^2$.

Although this is indeed a simple construction, in terms of automatic sequences, it can be improved. The minimum automaton size for $h(z)$, over all 2-automatic squarefree words $z$, is 10, as can be verified by breadth-first search, with pruning if the prefix constructed so far requires 11 or more states.

This minimum number of states is achieved, for example, by applying $h$ to the famous squarefree word

$\mathbf{vtm} := \tau(g^\omega(0)) = 2102012101202102012021012102012 \cdots$,

where

$g(0) = 01$    $\tau(0) = 2$
$g(1) = 20$    $\tau(1) = 1$
$g(2) = 23$    $\tau(2) = 0$
$g(3) = 02$    $\tau(3) = 1$.

Remark. The word vtm is (up to renaming) the classical squarefree word of Thue [16]. It can be defined in many different ways [3], including as the fixed point of the morphism defined by $2 \to 210$, $1 \to 20$, $0 \to 1$. The name vtm for this word comes from [5].

A novel alternative construction (not necessarily an image of vtm) needs only six states. This is the minimum possible number of states for a 2-automatic word containing no squares of order $\geq 3$.

Theorem 1.

Consider the infinite word $\rho(f^\omega(0))$, where

$f(0) = 01$    $\rho(0) = 0$
$f(1) = 23$    $\rho(1) = 0$
$f(2) = 45$    $\rho(2) = 0$
$f(3) = 02$    $\rho(3) = 0$
$f(4) = 05$    $\rho(4) = 1$
$f(5) = 25$    $\rho(5) = 1$.

This is the lexicographically least word generated by a 2-automaton of $\leq 6$ states, containing no squares of order $\geq 3$, and only 5 distinct squares.

The Entringer-Jackson-Schatz construction was optimally improved by Fraenkel and Simpson [9], as follows: they constructed an infinite binary word containing only 3 squares: $0^2$, $1^2$, and $(10)^2$.

Their construction is rather complicated, and also has a complicated proof. It starts with an infinite squarefree word $w$ over $\{0, 1, 2\}$ avoiding the subwords 020 and 121. (Although they do not say so, an example of such a word is given by renaming the letters in $\mathbf{vtm} := \tau(g^\omega(0))$ above.) Then replace every occurrence of 12 with 132. Next, replace every remaining occurrence of 21 with 241. Finally, apply the morphism $\alpha$ defined as follows:

$\alpha(0) = 011000111001$
$\alpha(1) = 011100011001$
$\alpha(2) = 011001110001$
$\alpha(3) = 01100010111001$
$\alpha(4) = 01110010110001$.

The resulting word avoids all squares except $0^2$, $1^2$, and $(01)^2$.

Because of the inherent complexity of this construction, it seems desirable to find simpler ones. An example using 24-uniform morphisms was given by Rampersad et al. [14]. Define

$p(0) = 012321012340121012321234$
$p(1) = 012101234323401234321234$
$p(2) = 012101232123401232101234$
$p(3) = 012321234323401232101234$
$p(4) = 012321234012101234321234$

and

$\beta(0) = 011100$
$\beta(1) = 101100$
$\beta(2) = 111000$
$\beta(3) = 110010$
$\beta(4) = 110001$.

Then $\beta(p^\omega(0))$ is an infinite word containing only the squares $0^2$, $1^2$, and $(01)^2$. This construction gives a 24-automatic sequence generated by an automaton of 18 states, so its weight is $24 \cdot 18 = 432$.
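A construction of this kind is easy to experiment with on a finite prefix. The following Python sketch builds a prefix of $\beta(p^\omega(0))$ from the morphisms $p$ and $\beta$ as transcribed from the text and lists the distinct squares it contains. The helper names (`fixed_point_prefix`, `apply_morphism`, `distinct_squares`) and the prefix length of 100 letters are assumptions made for illustration. A finite check like this proves nothing about the infinite word; it is only a sanity test.

```python
def fixed_point_prefix(h, start, length):
    """First `length` letters of h^omega(start); h[start] must begin with start."""
    assert h[start].startswith(start)
    word = start
    while len(word) < length:
        word = "".join(h[c] for c in word)
    return word[:length]


def apply_morphism(h, word):
    """Apply the morphism h to a finite word, letter by letter."""
    return "".join(h[c] for c in word)


def distinct_squares(word):
    """Set of all distinct factors of `word` of the form xx with x nonempty."""
    found = set()
    n = len(word)
    for order in range(1, n // 2 + 1):
        for i in range(n - 2 * order + 1):
            if word[i:i + order] == word[i + order:i + 2 * order]:
                found.add(word[i:i + 2 * order])
    return found


# The 24-uniform morphism p and the 6-uniform morphism beta from the text.
p = {
    "0": "012321012340121012321234",
    "1": "012101234323401234321234",
    "2": "012101232123401232101234",
    "3": "012321234323401232101234",
    "4": "012321234012101234321234",
}
beta = {"0": "011100", "1": "101100", "2": "111000", "3": "110010", "4": "110001"}

# Hypothetical prefix length; any moderate value works for a quick check.
binary_prefix = apply_morphism(beta, fixed_point_prefix(p, "0", 100))
squares = distinct_squares(binary_prefix)
print(len(binary_prefix), sorted(squares, key=lambda s: (len(s), s)))
```

If the morphisms are transcribed correctly, the printed set should agree with the three squares named in the text. The brute-force search is cubic in the worst case, so it is suitable only for short prefixes.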

Ochem [13] provided a different construction in 2006:

$\sigma(0) = 00011001011000111001011001110001011100101100010111$
$\sigma(1) = 00011001011000101110010110011100010110001110010111$
$\sigma(2) = 00011001011000101110010110001110010111000101100111$

He showed that if $x$ is a $(7/4 + \epsilon)$-free word, then $\sigma(x)$ contains only three squares.

In fact, we can also successfully apply $\sigma$ to the word vtm above, even though it is not $(7/4 + \epsilon)$-free. Since $\sigma$ is a uniform map, we know that $\sigma(\mathbf{vtm})$ is 2-automatic.

Theorem 2.

This word $\sigma(\mathbf{vtm})$ is a 2-automatic word containing only three distinct squares. It is generated by an automaton with 109 states (and has weight $2 \cdot 109 = 218$).

Harju and Nowotka [10] generated an infinite binary word with three squares by defining the map

$\zeta(0) = 111000110010110001110010$
$\zeta(1) = 111000101100011100101100010$
$\zeta(2) = 111000110010110001011100101100$

and then applying it to vtm.

The morphism $\zeta$ is clearly not uniform. However, the lengths of the images of 0, 1, 2 are 24, 27, 30 and form an arithmetic progression. This is enough to show that $\zeta(\mathbf{vtm})$ is 2-automatic, as the following result shows.

Theorem 3.

Let $\mathbf{vtm} = \tau(g^\omega(0))$ where $g$ and $\tau$ are defined in Section 2. Let $h : \{0, 1, 2\}^* \to \Delta^*$ be a morphism. If the three lengths $|h(0)|$, $|h(1)|$, and $|h(2)|$ form an arithmetic progression, then $h(\mathbf{vtm})$ is 2-automatic.

Proof. Suppose $a, b$ are integers, with $a \geq 1$ and $a + 2b \geq 1$, such that $|h(i)| = a + ib$ for $i \in \{0, 1, 2\}$. Write $\mathbf{vtm} = c(0)c(1)c(2) \cdots$. An easy induction now shows that $|h(c(0)c(1) \cdots c(n-1))| = (a+b)n + b\,t_n$ for $n \geq 0$, where $\mathbf{t} = t_0 t_1 \cdots$ is the Thue-Morse word. To compute the $n$'th symbol of $h(\mathbf{vtm})$, divide $n$ by $a + b$ to determine which block $h(c(i))$ it corresponds to; then adjust based on whether $t_i = 0$ or not. More precisely, define $n' := \lfloor n/(a+b) \rfloor$ and $m := n \bmod (a+b)$. Then

$$
(h(\mathbf{vtm}))[n] :=
\begin{cases}
(h(c(n')))[m], & \text{if } t_{n'} = 0; \\
(h(c(n'-1)))[m+a+b], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 0 \text{ and } m < b; \\
(h(c(n')))[m-b], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 0 \text{ and } m \geq b; \\
(h(c(n'-1)))[m+a], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 1 \text{ and } m < b; \\
(h(c(n')))[m-b], & \text{if } t_{n'} = 1 \text{ and } t_{n'-1} = 1 \text{ and } m \geq b.
\end{cases}
$$

For fixed $a$ and $b$, an automaton on input $n$ in base 2 can compute $n'$ and $m$ on the fly and do the required lookup.

Theorem 4.

The infinite word $\zeta(\mathbf{vtm})$ contains only three distinct squares: $0^2$, $1^2$, and $(01)^2$. It is generated by an automaton with 88 states, and has weight $2 \cdot 88 = 176$.

Yet another construction was given by Badkobeh and Crochemore [2, 1]. They defined the morphism

$\xi(0) = 000111$
$\xi(1) = 0011$
$\xi(2) = 01001110001101$

of weight 24. Although $\xi$ applied to a squarefree word can produce a word with more than three squares (consider 0102), it turns out that $\xi(\mathbf{vtm})$ contains only three distinct squares. Furthermore, although they do not mention it, $\xi$ is a morphism of lowest total weight with this property.

Incidentally, we found another morphism with the same properties, of the same weight; it is

$\kappa(0) = 110100111000110100$
$\kappa(1) = 1100$
$\kappa(2) = 01$.

However, for neither of these morphisms are the lengths of the images in arithmetic progression, and so Theorem 3 does not apply. Indeed, we suspect (but did not prove) that neither $\xi(\mathbf{vtm})$ nor $\kappa(\mathbf{vtm})$ is a 2-automatic sequence. If they are 2-automatic, then more than 200 states are needed to generate them.

Our first construction

The previous section suggests looking for a morphism $\eta$ of lowest total weight, where the lengths of the images of 0, 1, 2 form an arithmetic progression and $\eta(\mathbf{vtm})$ has only 3 distinct squares. We found the following morphism, which is the smallest such, of weight 36.

$\eta(0) = 00011101$
$\eta(1) = 001110001101$
$\eta(2) = 0011000111001101$.

Theorem 5.

The infinite word $\eta(\mathbf{vtm})$ contains only three distinct squares: $0^2$, $1^2$, and $(10)^2$. It is 2-automatic, and can be generated by an automaton of 27 states, so its weight is $2 \cdot 27 = 54$.

Finally, instead of using the strategy of applying a morphism to vtm, we can search directly for a $k$-automatic word of minimum total weight. It turns out that this minimum weight is 44, corresponding to a 2-automaton with 22 states:

[Figure 1: DFAO where accepting states have output 1 and all other states have output 0.]

The corresponding representation is as the image, under the coding $\gamma$, of the fixed point of the morphism $q$ defined below over the alphabet $\{0, 1, \ldots, 21\}$. We use commas to separate letters in the image of $q$, because of the large alphabet size.

q(0) = 0,      γ(0) = 1
q(1) = 2,      γ(1) = 1
q(2) = 4,      γ(2) = 0
q(3) = 6,      γ(3) = 1
q(4) = 8,      γ(4) = 0
q(5) = 10,     γ(5) = 0
q(6) = 12,     γ(6) = 1
q(7) = 8,      γ(7) = 1
q(8) = 13,     γ(8) = 0
q(9) = 15,     γ(9) = 0
q(10) = 2,     γ(10) = 0
q(11) = 7,     γ(11) = 1
q(12) = 6,     γ(12) = 1
q(13) = 2,     γ(13) = 1
q(14) = 18,    γ(14) = 1
q(15) = 0,     γ(15) = 0
q(16) = 6,     γ(16) = 0
q(17) = 10,    γ(17) = 1
q(18) = 20,    γ(18) = 0
q(19) = 10,    γ(19) = 1
q(20) = 13,    γ(20) = 1
q(21) = 18,    γ(21) = 0

Theorem 6.

The infinite word

$\gamma(q^\omega(0)) = 11010011000111001101001110001101000111010011000 \cdots$

contains only three distinct squares: $0^2$, $1^2$, and $(10)^2$. It has total weight 44.

By exhaustive search we find that there are no $k$-automatic words containing only three distinct squares, with $s$ states, for $3 \leq k \leq 44$ and $ks \leq 44$.

We used breadth-first search to find candidates for the minimal examples presented here. The number of states in the minimal automaton was determined using the Myhill-Nerode theorem (see, e.g., [11]). We used the theorem-proving software Walnut [12] to verify assertions about the squares contained in each word. For example, the claim about the 22-state automaton in the previous section can be proved as follows: create the automaton, and call it Q in Walnut, and then evaluate the following three statements:

    eval qtest1 "Ei,n (n>=3) & At (t<n) => Q[i+t]=Q[i+t+n]":
    eval qtest2 "Ei (Q[i]=Q[i+1])&(Q[i]=Q[i+2])&(Q[i]=Q[i+3])":
    eval qtest3 "Ei (Q[i]=@0)&(Q[i+1]=@1)&(Q[i+2]=@0)&(Q[i+3]=@1)":
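For intuition, the three Walnut queries can be mirrored by a brute-force check on a finite prefix. The Python sketch below is not Walnut and proves nothing about an infinite word. As an assumption made only to keep the example self-contained, it tests a prefix of $\eta(\mathbf{vtm})$ from Theorem 5 rather than the automaton Q. The function names and the prefix length of 64 letters are hypothetical.

```python
def fixed_point_prefix(h, start, length):
    """First `length` letters of h^omega(start); h[start] must begin with start."""
    assert h[start].startswith(start)
    word = start
    while len(word) < length:
        word = "".join(h[c] for c in word)
    return word[:length]


def exists_square_of_order_at_least(word, min_order):
    """Finite analogue of qtest1: is there a factor xx with |x| >= min_order?"""
    n = len(word)
    return any(
        word[i:i + k] == word[i + k:i + 2 * k]
        for k in range(min_order, n // 2 + 1)
        for i in range(n - 2 * k + 1)
    )


# vtm = tau(g^omega(0)), with g and tau as defined earlier in the text.
g = {"0": "01", "1": "20", "2": "23", "3": "02"}
tau = {"0": "2", "1": "1", "2": "0", "3": "1"}
# The morphism eta of weight 36 from the text.
eta = {"0": "00011101", "1": "001110001101", "2": "0011000111001101"}

vtm_prefix = "".join(tau[c] for c in fixed_point_prefix(g, "0", 64))
word = "".join(eta[c] for c in vtm_prefix)

qtest1 = exists_square_of_order_at_least(word, 3)  # a square of order >= 3
qtest2 = "0000" in word or "1111" in word          # (00)^2 or (11)^2
qtest3 = "0101" in word                            # (01)^2
print(qtest1, qtest2, qtest3)
```

As with the Walnut queries, all three values being false on the prefix is what one would expect for a word whose only squares are $0^2$, $1^2$, and $(10)^2$.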

The first predicate asserts that there is a square of order $\geq 3$. The second asserts that there is a square of the form $(00)^2$ or $(11)^2$. The third asserts that there is a square of the form $(01)^2$. Since all three queries return false, the word has the desired properties. The total computation time for this query is a few seconds on a laptop.

Each of Theorems 1, 2, 4, 5, 6 can be proved similarly, although some require significant memory resources and time. The Walnut code can be found on the website of the second author: https://cs.uwaterloo.ca/~shallit/papers.html .

References

[1] G. Badkobeh. Infinite words containing the minimal number of repetitions. J. Discrete Algorithms (2013), 38–42.

[2] G. Badkobeh and M. Crochemore. Fewest repetitions in infinite binary words. RAIRO Inform. Théor. App. (2012), 17–31.

[3] J. Berstel. Sur la construction de mots sans carré. Séminaire de Théorie des Nombres (1978–1979), 18.01–18.15.

[4] J. Berstel. Axel Thue's Papers on Repetitions in Words: a Translation. Number 20 in Publications du Laboratoire de Combinatoire et d'Informatique Mathématique. Université du Québec à Montréal, February 1995.

[5] F. Blanchet-Sadri, J. Currie, N. Rampersad, and N. Fox. Abelian complexity of fixed point of morphism 0 ↦ 012, 1 ↦ 02, 2 ↦ 1. INTEGERS: Elect. J. of Combin. Number Theory (2014), …

[6] … Internat. J. Found. Comp. Sci. (2012), 1035–1066.

[7] A. Cobham. Uniform tag sequences. Math. Systems Theory (1972), 164–192.

[8] R. C. Entringer, D. E. Jackson, and J. A. Schatz. On nonrepetitive sequences. J. Combin. Theory Ser. A (1974), 159–164.

[9] A. S. Fraenkel and J. Simpson. How many squares must a binary sequence contain? Electronic J. Combinatorics (1994), …

[10] … Bull. European Assoc. Theor. Comput. Sci., No. 89 (2006), 164–166.

[11] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.

[12] H. Mousavi. Automatic theorem proving in Walnut. Available at http://arxiv.org/abs/1603.06017 , 2016.

[13] P. Ochem. A generator of morphisms for infinite words. RAIRO Inform. Théor. App. (2006), 427–441.

[14] N. Rampersad, J. Shallit, and M.-w. Wang. Avoiding large squares in infinite binary words. Theoret. Comput. Sci. (2005), 19–34.

[15] A. Thue. Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. (1906), 1–22. Reprinted in Selected Mathematical Papers of Axel Thue, T. Nagell, editor, Universitetsforlaget, Oslo, 1977, pp. 139–158.

[16] A. Thue. Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen.