How far away must forced letters be so that squares are still avoidable?
Matthieu Rosenfeld
February 10, 2020
Abstract
We describe a new non-constructive technique to show that squares are avoidable by an infinite word even if we force some letters from the alphabet to appear at certain positions. We show that as long as forced positions are at distance at least 19 (resp. 3, resp. 2) from each other, we can avoid squares over 3 letters (resp. 4 letters, resp. 6 or more letters). We can also deduce exponential lower bounds on the number of solutions. For our main theorem to be applicable, we need to check the existence of some languages, and we explain how to verify with a computer that they exist. We hope that this technique can be applied to other avoidability questions where the right approach seems to be non-constructive (e.g., the Thue-list coloring number of the infinite path).
A square is a word of the form uu where u is a non-empty word. We say that a word is square-free (or avoids squares) if none of its factors is a square. For instance, hotshots is a square while minimize is square-free. In 1906, Thue showed that there are arbitrarily long ternary words avoiding squares [8]. This result is often regarded as the starting point of combinatorics on words, and generalizations of this question have received a lot of attention.
The authors of [2] study three such questions asked by Harju [5]. They also introduced a stronger version of the third problem.
Problem 1 ([2, Problem 4]). Let p ≥ be an integer and let v = v v v . . . be any infinite ternary word. Does there exist an infinite ternary square-free word w = w w w . . . such that for all i, w_{p·i} = v_i?
They give a partial solution to this question and show that the answer is yes for any v if p ≥ . In fact, they showed something slightly stronger. Let d(Σ) be the smallest integer such that for all v ∈ Σ^ω and for every sequence of indices (p_i) ≤ i such that ∀i, p_{i+1} − p_i ≥ d(Σ), there is an infinite square-free word u ∈ Σ^ω such that v_i = u_{p_i}. They showed that ≤ d({0, 1, 2}) ≤ . Moreover, the fact that squares are avoidable over 3 letters can be used to show that d({0, 1, 2, 3}) ≤ . We show that ≤ d({0, 1, 2}) ≤ 19, d({0, 1, 2, 3}) = 3 and d({0, 1, 2, ..., k}) = 2 for k ≥ 5.
The main theorem of this paper gives sufficient conditions for the existence of square-free languages that fulfill some constraints. Kolpakov showed that there are more than . n square-free words of length n over a ternary alphabet using a new non-constructive technique [6]. One of the ideas behind Kolpakov's result is, roughly, to approximate (using a computer) the language of square-free words by the language of words avoiding squares of period less than l for large l, and to show that we do not lose too many words if we remove the larger squares from this language.
We use a similar idea in this paper. We also use ideas from the power series method (see for instance [1, 7]) even if we do not explicitly manipulate any power series. It seems to be a good approach to show that the Thue-list number of paths is (see [3, 4] for definitions and conjectures on this topic) or to tackle other problems that might require a non-constructive approach.
This paper is organized as follows. We start by fixing some notation in Section 2. In Section 3, we give a weaker version of Theorem 4 to present the ideas of the theorem without some of the technicalities. Then in Section 4, we give the proof of Theorem 4, our main theorem. In Section 5, we explain how to verify with a computer the existence of some languages that are required to apply Theorem 4. Finally, in Section 6, we use Theorem 4 to bound the values of d for different alphabet sizes.
We denote the set of non-negative integers (resp. positive integers) by N (resp. N_{>0}). For any word w ∈ Σ*, we denote the i-th letter of w by w_i and the length of w by |w|. Then for any w ∈ Σ*, w = w_1 w_2 . . . w_{|w|}. For any set of non-empty words W, we let W* (resp. W^ω) be the set of words obtained by catenation of finitely many (resp. infinitely many) elements of W. A language over an alphabet is a set of finite words over this alphabet. We use the convention that ∏_{x∈∅} x = 1 and max_{x∈∅} x = 0 (we could use −∞ for the second one, but it is slightly less convenient for the implementation).
A partial word over Σ is a (possibly infinite) word over the alphabet Σ ∪ {⋄}. For any partial word µ ∈ (Σ ∪ {⋄})* ∪ (Σ ∪ {⋄})^ω and word v ∈ Σ* ∪ Σ^ω, we say that v is compatible with µ if |v| ≤ |µ| and µ_i ≠ ⋄ ⟹ µ_i = v_i for all i such that v_i and µ_i are defined. We denote by S(µ) the set of square-free words that are compatible with the partial word µ.
The main Theorem of this paper is Theorem 4.
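As an illustration of the notions fixed in the notation (squares, square-freeness, and compatibility with a partial word), here is a minimal Python sketch. It is not taken from the paper's code: the function names and the encoding of the hole symbol as `None` are assumptions made for this example, and only finite words are handled.

```python
from typing import Optional, Sequence

HOLE = None  # assumed encoding of the hole symbol of a partial word


def has_square_of_period(word: Sequence, period: int) -> bool:
    """Return True if some factor of `word` is a square uu with |u| == period."""
    run = 0  # number of consecutive positions i with word[i] == word[i + period]
    for i in range(len(word) - period):
        run = run + 1 if word[i] == word[i + period] else 0
        if run == period:  # `period` consecutive matches form a square
            return True
    return False


def is_square_free(word: Sequence, max_period: Optional[int] = None) -> bool:
    """Check that `word` has no square of period <= max_period (all periods if None)."""
    limit = len(word) // 2
    if max_period is not None:
        limit = min(limit, max_period)
    return not any(has_square_of_period(word, q) for q in range(1, limit + 1))


def is_compatible(v: Sequence, mu: Sequence) -> bool:
    """v is compatible with the finite partial word mu:
    |v| <= |mu| and v agrees with mu on every non-hole position."""
    return len(v) <= len(mu) and all(m is HOLE or m == a for a, m in zip(v, mu))
```

With `max_period = p - 1`, `is_square_free` tests the weaker property "avoids squares of period less than p" that the languages considered later are required to satisfy.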
The main idea of this theorem is that if a language avoids short squares and is large enough, then it contains square-free words of any length. The statement and proof of this theorem are rather difficult to follow, so we give in this section a version of the theorem for the case where the set W is a singleton {w}. We hope that this helps to convey the ideas of the proof of Theorem 4. This is in fact very similar to the ideas of [7], but instead of building the word letter by letter, we construct it factor by factor. For that, we fix one factor size and count the words whose length is a multiple of this size.
Theorem 2.
Let Σ be an alphabet, w ∈ (Σ ∪ {(cid:5)} ) ∗ be a finite partial word and p ≥ | w | such that | w | divides p . Suppose that there are C ∈ N > and L alanguage such that:(I) ε ∈ L .(II) For all u ∈ L , u avoids squares of period less than p .(III) For any u ∈ L there are at least C different words v ∈ Σ | w | compatiblewith w such that uv ∈ L .(IV) There exists x ∈ ]0 , such that: C (cid:32) − x p | w | − | w | − x (cid:33) ≥ x − Then S ( w ω ) is infinite.Proof. Let µ = w ω . Let L ( µ ) be a set of words from L that are compatible with µ such that, for any u ∈ L ( µ ) of length divisible by | w | , there are exactly C different words v ∈ Σ ∗ compatible with w with uv ∈ L ( µ ) . Conditions (I),(II)and (III) imply that such a set can be obtained by removing words from L . Forall non-negative i , let s i = | S ( µ ) ∩ L ( µ ) ∩ { u ∈ Σ ∗ : | u | = i | w |}| be the numberof square-free words of L ( µ ) of length i | w | .We will show by induction on i that for all positive i , s i +1 ≥ x − s i . Let n be a positive integer such that: ∀ ≤ i < n, s i +1 ≥ x − s i (IH1)By definition of L ( µ ) , for any word w of S ( µ ) ∩ L ( µ ) there are exactly C different factors v of length | w | such that wv is in L ( µ ) . Let F be the setof words in L ( µ ) \ S ( µ ) of length ( n + 1) | w | whose prefix of length n | w | is in S ( µ ) ∩ L ( µ ) . Then by definition: s n +1 ≥ Cs n − | F | . (1)In order to bound | F | , let us introduce for all i < n + 1 , F i = { uvvy ∈ F : | w | ( i − < | uv | ≤ i | w | , | y | < | w |} . That is, F i is the set of words of F thatcontain a square whose midpoint (the middle of the square) is located betweenthe positions ( i − | w | and i | w | in the word. Clearly | F | ≤ (cid:80) ni =1 | F i | , so ournext task is to compute bounds on | F i | for all i .3 emma 3. We have the following inequalities: • for all i > n + 1 − p | w | , | F i | = 0 , • for all i ≤ n + 1 − p | w | , | F i | ≤ s i C | w | .Proof. 
If i > n + 1 − p | w | , then ( i − | w | + p ≥ ( n + 1) | w | . Since, L does notcontain squares of period less than p , F i = ∅ .Now, let i ≤ n +1 − p | w | . For any i and z ∈ S ( µ ) ∩ L ( µ ) ∩{ u ∈ Σ ∗ : | u | = i | w |} let F i ( z ) be the set of words of F i that admit z as a prefix.By definition of F i , any word F i ( z ) contains a square whose second halfstarts in position a + 1 and ends in position b where ( i − | w | < a ≤ i | w | and n | w | < b ≤ ( n + 1) | w | . Given z , a and b , we know the first half of the squareand thus the word is known at least up to position n | w | . By definition of L ( µ ) there are at most C possible values for the remaining | w | letters. By summingover all the values of a and b one gets: | F i ( z ) | ≤ C | w | . By summing over allthe values of z , we finally get | F i | ≤ s i C | w | .Now, by (IH1), for all i , | F i | ≤ C | w | x n − i s n and thus: | F | ≤ n (cid:88) i =1 | F i | ≤ n +1 − p | w | (cid:88) i =1 C | w | x n − i s n ≤ C | w | s n ∞ (cid:88) i = p | w | − x i | F | ≤ C | w | s n x p | w | − − x We can use this bound in inequality (1) and we get: s n +1 ≥ Cs n − C | w | s n x p | w | − − xs n +1 ≥ s n C (cid:32) − x p | w | − | w | − x (cid:33) s n +1 ≥ x − s n (By Theorem hypothesis IV)This concludes the proof that for all positive i , s i +1 ≥ x − s i . Since s = 1 ,we deduce that s i is unbounded and thus S ( w ω ) is infinite. This section is devoted to the proof of the main Theorem. As already mentionedthe ideas of the proof are the same as for the proof of Theorem 2. However, thisis more technical because W is not a singleton anymore. Moreover, we want theequivalent of condition (IV) to be as general as possible and for that, we needto bound the size of | F | as tightly as possible. Thus the equivalent of Lemma 3(Lemma 5) is much more technical and we delay its proof to a later subsection.4 heorem 4. 
Let Σ be an alphabet, W ⊆ (Σ ∪ {(cid:5)} ) ∗ be a finite set of finitepartial words, p ≥ {| w | : w ∈ W } be an integer. Suppose that there is alanguage L and a function f : N > → N > such that:(I) ε ∈ L .(II) For any u ∈ L and w ∈ W there are at least f ( | w | ) different words v ∈ Σ | w | compatible with w and such that uv ∈ L .(III) For all u ∈ L , u avoids squares of period less than p .(IV) For all u, v ∈ W and integer ≤ i ≤ | v | , let α ( | u | , | v | ) = | u | (cid:88) m =1 (cid:98) | v |− m (cid:99) (cid:88) j =0 min (cid:110) f ( | v | ) , ( | Σ | − | v |− − jm (cid:111) and α (cid:48) ( i, | v | ) = i − (cid:88) m =0 min { f ( | v | ) , ( | Σ | − m } . There exist x , x , . . . , x max {| w | : w ∈ W } ∈ ]0 , and β : { , . . . , p } → [0 , solution of the following system: ∀ w ∈ W,f ( | w | ) − max u,v ∈ W ≤ r ≤| v | (cid:110) β ( r + p − | w | − | v | ) (cid:16) α (cid:48) ( r, | w | ) + x | v | α ( | u | , | w | )1 − x | u | (cid:17)(cid:111) ≥ x − | w | ∀ j ≤ p, β ( j ) = max (cid:81) i ∈{| u | : u ∈ W } x n i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∀ i ∈ {| u | : u ∈ W } , n i ∈ N , and (cid:80) i ∈{| u | : u ∈ W } i · n i = j Then for any infinite partial word µ ∈ W ω , S ( µ ) is infinite.Proof. Let µ ∈ W ω and ( µ i ) i ∈ N > ∈ W N > be a sequence of elements of W suchthat µ = µ µ µ . . . . For any integer i , let l ( i ) = | µ . . . µ i | . Let L ( µ ) be aset of words from L that are compatible with µ such that, for any u ∈ L ( µ ) oflength | µ . . . µ j | , there are exactly f ( | µ j +1 | ) different words v ∈ Σ ∗ compatiblewith µ j +1 with uv ∈ L ( µ ) . That is, we remove words from L in order toreplace the “at least f ( | µ j +1 | ) ” by “exactly f ( | µ j +1 | ) ”. 
For all non-negative i ,let s i = | S ( µ ) ∩ L ( µ ) ∩ { v ∈ Σ ∗ : | v | = i }| be the number of square-free wordsof L ( µ ) of length i .We will show by induction on i that for all positive i , s l ( i +1) ≥ x − | µ i +1 | s l ( i ) .Let n be a positive integer such that: ∀ ≤ i < n, s l ( i +1) ≥ x − | µ i +1 | s l ( i ) (IH1)By definition of L ( µ ) , for any word w of S ( µ ) ∩ L ( µ ) of length l ( n ) there areexactly f ( | µ n +1 | ) different factors v of length | µ n +1 | such that wv is in L ( µ ) .5et F be the set of words in L ( µ ) \ S ( µ ) of length l ( n + 1) whose prefix of length l ( n ) is in S ( µ ) ∩ L ( µ ) . Then by definition: s l ( n +1) ≥ f ( | µ n +1 | ) s l ( n ) − | F | . (2)In order to bound | F | , let us introduce for all i < n + 1 , F i = { uvvw ∈ F : l ( i − < | uv | ≤ l ( i ) , | w | < | µ n +1 |} . That is, F i is the set of words of F thatcontain a square whose midpoint (the middle of the square) is located betweenthe positions | µ . . . µ i | and | µ . . . µ i − | in the word. Clearly | F | ≤ (cid:80) ni =1 | F i | ,so our next task is to compute the values of | F i | for all i. Let d be the smallestinteger such that | µ d +1 . . . µ n +1 | ≤ p and let r = | µ d µ d +1 . . . µ n +1 | − p . Remarkthat r > . Lemma 5.
We have the following inequalities:
• for all i > d, |F_i| = 0,
• |F_d| ≤ s_{l(d)} α′(r, |µ_{n+1}|),
• for all i ≤ d, |F_i| ≤ s_{l(i)} α(|µ_i|, |µ_{n+1}|).
The proof of this lemma is mostly a technical counting argument and is not especially informative, so we moved it to Section 4.1.
We can use the bounds on the sizes of the F_i to bound |F|:
Lemma 6.
We have
\[ |F| \le s_{l(d)} \max_{u \in W} \left\{ \alpha'(r, |\mu_{n+1}|) + \frac{x_{|\mu_d|}\,\alpha(|u|, |\mu_{n+1}|)}{1 - x_{|u|}} \right\}. \]
Proof.
First, let us show by induction on i that for all ≤ i < d : i (cid:88) j =0 s l ( j ) α ( | µ j | , | µ n +1 | ) ≤ s l ( i ) max u ∈ W (cid:26) α ( | u | , | µ n +1 | )1 − x | u | (cid:27) . (IH2)Let us first show that this is true with i = 0 , using the fact that µ ∈ ]0 , . s l (0) α ( | µ | , | µ n +1 | ) ≤ s l (0) α ( | µ | , | µ n +1 | )1 − x | µ | ≤ s l (0) max u ∈ W (cid:26) α ( | u | , | µ n +1 | )1 − x | u | (cid:27) . Now, let i + 1 be an integer such that (IH2) is true for i . i +1 (cid:88) j =0 s l ( j ) α ( | µ j | , | µ n +1 | ) ≤ s l ( i +1) α ( | µ i +1 | , | µ n +1 | ) + s l ( i ) max u ∈ W (cid:26) α ( | u | , | µ n +1 | )1 − x | u | (cid:27) ≤ s l ( i +1) (cid:18) α ( | µ i +1 | , | µ n +1 | ) + x | µ i +1 | max u ∈ W (cid:26) α ( | u | , | µ n +1 | )1 − x | u | (cid:27)(cid:19) (By (IH1)) ≤ s l ( i +1) max u ∈ W (cid:26) α ( | u | , | µ n +1 | )1 − x | u | (cid:27) i ≤ d and in particular for i = d − and weget: | F | ≤ | F d | + s l ( d − max u ∈ W (cid:26) α ( | u | , | µ n +1 | )1 − x | u | (cid:27) | F | ≤ s l ( d ) α (cid:48) ( r, | µ n +1 | ) + s l ( d ) x | µ d | max u ∈ W (cid:26) α ( | u | , | µ n +1 | )1 − x | u | (cid:27) | F | ≤ s l ( d ) max u ∈ W (cid:26) α (cid:48) ( r, | µ n +1 | ) + x | µ d | α ( | u | , | µ n +1 | )1 − x | u | (cid:27) This concludes the proof of this Lemma.By induction hypothesis (IH1) s l ( d ) ≤ s l ( n ) (cid:81) ni = d +1 x | µ j | . 
Let us bound theproduct on the right hand side: n (cid:89) i = d +1 x | µ j | ≤ max (cid:89) i ∈{| u | : u ∈ W } x n i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∀ i ∈ {| u | : u ∈ W } , n i ∈ N and (cid:80) i ∈{| u | : u ∈ W } i · n i = l ( n + 1) − l ( d ) − µ n +1 n (cid:89) i = d +1 x | µ j | ≤ β ( l ( n + 1) − l ( d ) − µ n +1 ) n (cid:89) i = d +1 x | µ j | ≤ β ( r + p − µ n +1 − µ d ) Now, using this equation with Lemma 6 gives | F | ≤ s l ( n ) β ( r + p − µ n +1 − µ d ) max u ∈ W (cid:26) α (cid:48) ( r, | µ n +1 | ) + x | µ d | α ( | u | , | µ n +1 | )1 − x | u | (cid:27) Now recall that r = | µ d µ d +1 . . . µ n +1 |− p and thus by definition of d , ≤ r ≤ µ d .We deduce: | F | ≤ s l ( n ) max u,v ∈ W ≤ r ≤| v | (cid:26) β ( r + p − µ n +1 − | v | ) (cid:18) α (cid:48) ( r, | µ n +1 | ) + x | v | α ( | u | , | µ n +1 | )1 − x | u | (cid:19)(cid:27) We can finally replace | F | by this bound in inequality (2) and we get: s l ( n +1) ≥ s l ( n ) (cid:32) f ( | µ n +1 | ) − max u,v ∈ Wr ∈{ ,..., | v |} (cid:26) β ( r + p − µ n +1 − | v | ) (cid:18) α (cid:48) ( r, | µ n +1 | ) + x | v | α ( | u | , | µ n +1 | )1 − x | u | (cid:19)(cid:27) (cid:33) s l ( n +1) ≥ s l ( n ) x − | µ n +1 | (By Theorem hypothesis (IV))Moreover s = 1 and thus for all i , s | µ ...µ i | ≥ (cid:81) ij =1 x − | µ j | . For all j , x − | µ j | > ,so we conclude that S ( µ ) is infinite. 7emark that Theorem 4 is far from sharp. One could improve the boundsgiven by Lemma 5. This could be done by lowering α and α (cid:48) or by introducinga third coefficient α (cid:48)(cid:48) for the second non-empty F i . However, we were not ableto obtain significant improvement that were worth the additional technicalities.In Section 5 we explain how to verify with a computer that there exists alanguage L that satistfies conditions (I),(II) and (III). We also need a way toverify condition (IV). In order to compute β , we can use that β (0) = 1 and, forall j ∈ { , . . . 
, p }, β(j) = max{ x_{|u|} β(j − |u|) : u ∈ W, |u| ≤ j }. Thus, given the values of the x_i, one can compute β by dynamic programming, and everything else is straightforward to compute. It is therefore easy to verify with a computer whether a given set of values of the x_i is a solution. We provide a C++ program that takes as input |Σ|, k, p, f and x_1, . . . , x_k and verifies whether this is a solution of the equations of condition (IV).
This subsection is dedicated to the proof of Lemma 5. Remark that the statement and proof are not self-contained, since some of the notation is defined in the proof of Theorem 4.
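The recurrence for β used to verify condition (IV) lends itself to a short dynamic program. The following is a minimal Python sketch of that recurrence only; it is not the C++ program provided with the paper. The function name `compute_beta` and the representation of the x_i as a dictionary keyed by the word lengths occurring in W are assumptions made for this example. Exact rationals are used to avoid rounding issues.

```python
from fractions import Fraction
from typing import Dict, List


def compute_beta(x: Dict[int, Fraction], p: int) -> List[Fraction]:
    """Return [beta(0), ..., beta(p)] where beta(0) = 1 and
    beta(j) = max{ x[l] * beta(j - l) : l a word length of W, l <= j }.

    `x` maps each length l in {|u| : u in W} to the value x_l.
    The maximum over an empty set is taken to be 0, following the
    convention fixed in the notation.
    """
    beta = [Fraction(1)] + [Fraction(0)] * p
    for j in range(1, p + 1):
        beta[j] = max(
            (x[length] * beta[j - length] for length in x if length <= j),
            default=Fraction(0),
        )
    return beta


# Hypothetical values, chosen only to exercise the recurrence.
example = compute_beta({1: Fraction(1, 2), 3: Fraction(1, 5)}, 4)
```

Each entry is computed from at most |{|u| : u ∈ W}| earlier entries, so the table costs O(p · |W|) multiplications.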
Lemma 7.
We have the following inequalities: • for all i > d , | F i | = 0 , • | F d | ≤ s l ( d ) α (cid:48) ( r, | µ n +1 | ) , • for all i ≤ d , | F i | ≤ s l ( i ) α ( | µ i | , | µ n +1 | ) .Proof. If i > d then by definition | µ i . . . µ n +1 | ≤ p . Moreover, L does not containsquares of period less than p and thus F i = ∅ .Now, let i ≤ d . By definition, any word from F i can be written uvvy with l ( i − < | uv | ≤ l ( i ) , | y | < | µ n +1 | . For any i and z ∈ S ( µ ) ∩ L ( µ ) ∩ { u ∈ Σ ∗ : | u | = l ( i ) } , let F i ( z ) be the set of words of F i that admit z as a prefix. Clearly F i = (cid:80) z ∈ S ( µ ) ∩ L ( µ ) ∩{ u ∈ Σ ∗ : | u | = l ( i ) } F i ( z ) .Let a, b (resp. a (cid:48) , b (cid:48) ) be integers such that there is an element z (cid:48) ∈ F i ( z ) that contains a square starting at a (resp. a (cid:48) ) and of period b (resp. b (cid:48) ) with l ( i −
1) + 1 < a (cid:48) + b (cid:48) = a + b ≤ | z | + 1 and a > a (cid:48) . Because of the square in z (cid:48) ,we know that for all ≤ j ≤ | z | − a − b , z a + j = z a + b + j = z a (cid:48) + b (cid:48) + j = z a (cid:48) + j . If a ≤ a (cid:48) + | z | − a − b + 1 then z contains a square which is not possible. Hence a > a (cid:48) + | z | − a − b + 1 a + b + b (cid:48) > a (cid:48) + b + b (cid:48) + | z | − a − b + 1 a (cid:48) + 2 b (cid:48) > a + 2 b + | z | − a − b + 1 since a + b = a (cid:48) + b (cid:48) (3)Let u ∈ F i ( z ) be a word that contains a square starting at a and of period b thenwe know its suffix of size a + 2 b − > l ( n ) . Thus there are at most f ( | µ n +1 | ) possibilities, moreover since the size of the unknown suffix is l ( n +1)+1 − a − b
p i + | z |− a − b +1 (cid:40) j (cid:88) i =1 min (cid:110) f ( | µ n +1 | ) , ( | Σ | − l ( n +1)+1 − p i (cid:111)(cid:41) = max j ∈ N , ≤ p ,...,p j ≤| µ n +1 |− , ∀ i 1) + 1 < a (cid:48) + b (cid:48) = a + b ≤ l ( n + 1) + 1 − p and a > a (cid:48) . By definition of d , l ( n + 1) − l ( d ) ≤ p . We can use equation (3)again and we get: a (cid:48) + 2 b (cid:48) > b + | z | + 1 ≥ p + | z | + 1 ≥ l ( n + 1) + 1 This is a contradiction with the fact that a (cid:48) + 2 b (cid:48) ≤ l ( n + 1) + 1 . Thus given thevalue of a + b there is at most one possible value for a and b . The number ofways for a fixed z and value of a + b to complete z with a suffix into an element9f F d is at most: max (cid:26) min (cid:110) f ( | µ n +1 | ) , ( | Σ | − l ( n +1)+1 − s (cid:111) : s ∈ N > , s ≤ l ( n + 1) + 1 ,a + b + p ≤ s, (cid:27) ≤ min (cid:110) f ( | µ n +1 | ) , ( | Σ | − l ( n +1)+1 − a − b − p (cid:111) Then by summing over all the possible values of a + b , we get: | F d ( z ) | ≤ l ( n +1)+1 − p (cid:88) a + b = l ( d − min (cid:110) f ( | µ n +1 | ) , ( | Σ | − l ( n +1)+1 − a − b − p (cid:111) We can use the variable substitution m = l ( n + 1) + 1 − a − b − p and remarkthat l ( n + 1) + 1 − p − ( l ( d − 1) + 2) = | µ d µ d +1 . . . µ n +1 | − p − r − and weget: | F d ( z ) | ≤ r − (cid:88) m =0 min { f ( | µ n +1 | ) , ( | Σ | − m } ≤ α (cid:48) ( r, | µ n +1 | ) We conclude that | F d | ≤ s l ( d ) α (cid:48) ( r, | µ n +1 | ) by summing over all z . L that satisfies Theorem 4 In this section, we explain how to verify the existence of a language that fulfillsconditions (I),(II) and (III) of Theorem 4.We consider some particular directed labeled graphs: G ( V, A ) is a set V ofvertices together with a set A ⊆ ( V × V × Σ) of labeled arcs. For any u, v ∈ V and a ∈ Σ , ( u, v, a ) ∈ A is an arc from u to v with label a . 
These graphs couldalso be seen as finite state machines where all the states are initial and final.The Rauzy graph of length n of a factorial language L over Σ is the graph G ( V, A ) where V = L ∩ Σ n and E = { ( au, ub, b ) : aub ∈ L, a, b ∈ Σ } . For anygraph G ( V, A ) and any set X ∈ V , we denote by G [ X ] the subgraph inducedby X .Let R p (Σ) be the Rauzy graph of length p − of the square-free words over Σ . Remark, that the factors of length p − of any walk on R p (Σ) correspondto edges of R p (Σ) and by definition they are square free. Thus, the sequenceof labels of any walk on R p (Σ) avoids squares of period less than p , but cancontain longer squares. We let S p (Σ) be the set of words that contains nosquare of period less than p (from the previous remark S p (Σ) can also be seenas the set of walks on R p (Σ) ).As an illustration, we give R ( { , , } ) in Fig. 1 without the arc labels.For this Section, we abuse the notation and allow ourself to identify wordsand sequences.For any graph G ( V, A ) and partial word w ∈ (Σ ∪{(cid:5)} ) ∗ , we define inductively10 Figure 1: The Rauzy graph R ( { , , } ) .for any integer i ∈ { , . . . , | w |} and vertex v ∈ V : p i,w,G ( v ) = if i = 0 , (cid:80) ( v,u,a ) ∈ Aa ∈ Σ p i − ,w,G ( u ) if w | w | +1 − i = (cid:5) , (cid:80) ( v,u,w | w | +1 − i ) ∈ A p i − ,w,G ( u ) otherwise.Intuitively, p i,w,G ( v ) gives the number of walks of length i starting from v thatare compatible with the i last letters of w . Indeed, there is one walk of length and we always take the transition that is labeled by the current letter of w andany transition if this letter is (cid:5) . Remark that in the third case, there are in facteither or summands in the sum. Lemma 8. Let W ⊆ (Σ ∪ {(cid:5)} ) ∗ , G ( V, A ) = R p (Σ) , f : N > → N > and anon-empty set X ⊆ V . 
If for all v ∈ X and w ∈ W , p | w | ,w,G [ X ] ( v ) ≥ f ( | w | ) ,then there exists a language L such that: • ε ∈ L , • for all u ∈ L , u avoids squares of period less than p , • for any u ∈ L and w ∈ W there are at least f ( | w | ) different words v ∈ Σ | w | compatible with w such that uv ∈ L .Proof. Let L be the set of sequences of labels that correspond to a walk in G [ X ] .By definition, the two first conditions on L are fulfilled.Let u ∈ L and w ∈ W . If | u | ≥ p − we let u (cid:48) be the suffix of length p − of u . Otherwise, we let u (cid:48) ∈ L such that | u (cid:48) | = 2 p − and w is a suffix of u (cid:48) (there is such an element in L ). Each walk of length | w | starting in u (cid:48) givesa unique sequence of labels u (cid:48)(cid:48) such that uu (cid:48)(cid:48) contains no square of period lessthan p .We easily deduce, by induction on i , that for all v the number of walks oflength i starting at v that are compatible with w | w |− i +1 w | w |− i +2 . . . w | w | is at11east p i,w,G [ X ] ( v ) . So, in particular, the number of walks of length | w | startingat u (cid:48) and compatible with w is at least p | w | ,w,G [ X ] ( u (cid:48) ) ≥ f ( | w | ) . This concludesthe proof.In fact, we need something stronger because for the values of p that we usethe graphs R p are too big to fit in a computer. We can exploit symmetries of R p (Σ) to work on a smaller equivalent graph.For any square-free word w ∈ Σ ∗ , we let Ψ( w ) be the shortest suffix of w suchthat for all i ∈ {(cid:100) | Ψ( w ) | (cid:101) + 1 , . . . , p − } there exists k ∈ { , , . . . , | Ψ( w ) | − i − } with Ψ( w ) | Ψ( w ) |− k (cid:54) = Ψ( w ) | Ψ( w ) |− k − i . If | w | = 2 p − , then w is such a suffix ofitself (since {(cid:100) | w | (cid:101) + 1 , . . . , p − } is empty) and thus there is always a shortestsuffix.For instance, with p = 5 we have Ψ(0210120) = 10120 . For any letter α ,the word α is square free if and only if α is square free. 
Indeed, anew square is necessarily a suffix of α , it is enough to look at the twoletters in bold in α to deduce that there is no square of length and allthe other possible suffix of even length of α are also suffixes of α .In fact, for any w of size p − , the word wa avoids squares if and only if Ψ( w ) a avoids squares (this is proven in the next lemma) and this is the mainmotivation behind the definition of Ψ (in particular, words with the same imageby Ψ can be extended in the same way).For any graph G ( V, A ) , let Ψ( G )(Ψ( V ) , A (cid:48) ) be the graph such that A (cid:48) =Ψ( A ) = { (Ψ( a ) , Ψ( b ) , c ) : ( a, b, c ) ∈ A } . The next lemma tells us that we onlyneed to consider the walks on Ψ( G ) instead of the walks on G . Lemma 9. Let p be a positive integer and w ∈ (Σ ∪ {(cid:5)} ) ∗ , G ( V, A ) = R p (Σ) and X ⊆ Ψ( V ) . Let Ψ − ( X ) = { x ∈ V : Ψ( x ) ∈ X } . Then for all v ∈ Ψ − ( X ) , p | w | ,w,G [Ψ − ( X )] ( v ) = p | w | ,w, Ψ( G )[ X ] (Ψ( v )) .Proof. Let us first show that for any a ∈ Σ and v ∈ S p (Σ) with | v | = 2 p − if Ψ( v ) a ∈ S p (Σ) then va ∈ S p (Σ) . Let us show that under these assumptions,for any i , va avoids squares of period i . Since v is square free, we only need toshow that no suffix of va is a square. We have to distinguish between two cases: • i ≤ | Ψ( v ) | + 1 . Suppose for the sake of contradiction that there is asquare of period i in va . We deduce that the suffix of length | Ψ( v ) | + 1 of va contains a square of period i . That is, Ψ( v ) a contains a square ofperiod i which is a contradiction. • i ≥ | Ψ( v ) | + 2 . Since i is an integer we get i ≥ (cid:100) | Ψ( v ) | (cid:101) + 1 . Moreover | va | = 2 p − and thus i ≤ p − . Thus by definition of Ψ( v ) , there exists k ∈ { , , . . . , | Ψ( v ) | − i − } such that Ψ( v ) | Ψ( v ) |− k (cid:54) = Ψ( v ) | Ψ( v ) |− k − i .Thus there is k ∈ { , . . . , | Ψ( v ) a | − i − } such that (Ψ( v ) a ) | Ψ( v ) a |− k (cid:54) =(Ψ( v ) a ) | Ψ( v ) a |− k − i . 
Remark, that | Ψ( v ) a | − i − | Ψ( v ) | − i ≤ i − − i = i − . We conclude that there is k ∈ { , . . . , i − } such that ( va ) | va |− k (cid:54) = ( va ) | va |− k − i . This implies that the suffix of length i of va isnot a square of period i . 12e deduce that for any a ∈ Σ and v ∈ S p (Σ) with | v | = 2 p − if Ψ( v ) a ∈ S p (Σ) then va ∈ S p (Σ) .Let u ∈ V , v ∈ Ψ( V ) and a ∈ Σ such that (Ψ( u ) , v, a ) ∈ Ψ( A ) . By def-inition of Ψ( A ) this implies that there is ( u (cid:48) , v (cid:48) , a ) ∈ A with Ψ( u (cid:48) ) = Ψ( u ) and Ψ( v (cid:48) ) = v . Thus u (cid:48) a ∈ S p (Σ) and Ψ( u ) a = Ψ( u (cid:48) ) a ∈ S p (Σ) . Fromthe previous paragraph, it implies that ua is square-free. Let us show that Ψ( u u . . . u | u | a ) = v . By definition, for all i ∈ {(cid:100) | Ψ( u (cid:48) ) | (cid:101) + 1 , . . . , p − } thereexists k ∈ { , , . . . , | Ψ( u (cid:48) ) | − i − } with Ψ( u (cid:48) ) | Ψ( u (cid:48) ) |− k (cid:54) = Ψ( u (cid:48) ) | Ψ( u (cid:48) ) |− k − i .We easily deduce that for all i ∈ {(cid:100) | Ψ( u (cid:48) ) a | (cid:101) + 1 , . . . , p − } there exists k ∈{ , , . . . , | Ψ( u (cid:48) a ) | − i − } with Ψ( u (cid:48) a ) | Ψ( u (cid:48) a ) |− k (cid:54) = Ψ( u (cid:48) a ) | Ψ( u (cid:48) a ) |− k − i . This im-plies that v = Ψ( v (cid:48) ) is a suffix of Ψ( u (cid:48) ) a . Since Ψ( u (cid:48) ) a = Ψ( u ) a , we deducethat v is also a suffix of u u . . . u | u | a . Since v = Ψ( v (cid:48) ) and v is a suffix of u u . . . u | u | a , we get that Ψ( u u . . . u | u | a ) = v . We showed that if there are u ∈ V , v ∈ Ψ( V ) and a ∈ Σ such that (Ψ( u ) , v, a ) ∈ Ψ( A ) , then there exists v (cid:48)(cid:48) such that ( u, v (cid:48)(cid:48) , a ) ∈ A and Ψ( v (cid:48)(cid:48) ) = v .We deduce that for all u ∈ V , { (Ψ( u ) , Ψ( v ) , a ) ∈ Ψ( A ) } ⊆ { (Ψ( u ) , Ψ( v ) , a ) :( u, v, a ) ∈ A } . 
The other inclusion is clear from the definition of Ψ( A ) and weget for all u ∈ V : { (Ψ( u ) , Ψ( v ) , a ) ∈ Ψ( A ) } = { (Ψ( u ) , Ψ( v ) , a ) : ( u, v, a ) ∈ A } (4)By definition of R p (Σ) , for any u there is at most one outgoing arc for everylabel in the set of the right. Since the two sets are equals, we deduce that everyvertex of the graph Ψ( G ) has at most one outgoing arc for any label. Intuitively,(4) implies (by induction on the length of the walk) that for any u, v ∈ V theset of labeled walks from u to v in G is equal to the set of labeled walks from Ψ( u ) to Ψ( v ) in Ψ( G ) . We are now ready to show by induction on i that forall i ∈ { , . . . , | w |} and v ∈ Ψ − ( X ) , p i,w,G [Ψ − ( X )] ( v ) = p i,w, Ψ( G )[ X ] (Ψ( v )) . Bydefinition p ,w,G [Ψ − ( X )] ( v ) = 1 = p ,w, Ψ( G )[ X ] (Ψ( v )) .Let n be a positive integer such that for all v ∈ Ψ − ( X ) , ∀ i < n, p i,w,G [Ψ − ( X )] ( v ) = p i,w, Ψ( G )[ X ] (Ψ( v )) . (IH)13hen, for all v ∈ Ψ − ( X ) , if w | w | +1 − i (cid:54) = (cid:5) we get: p i,w,G [Ψ − ( X )] ( v ) = (cid:88) ( v,u,w | w | +1 − i ) ∈ Au ∈ Ψ − ( X ) p i − ,w,G [Ψ − ( X )] ( u ) p i,w,G [Ψ − ( X )] ( v ) = (cid:88) ( v,u,w | w | +1 − i ) ∈ A Ψ( u ) ∈ X p i − ,w, Ψ( G )[ X ] (Ψ( u )) (From (IH)) p i,w,G [Ψ − ( X )] ( v ) = (cid:88) (Ψ( v ) , Ψ( u ) ,w | w | +1 − i ) ∈ Ψ( A )Ψ( u ) ∈ X p i − ,w, Ψ( G )[ X ] (Ψ( u )) (From (4)) p i,w,G [Ψ − ( X )] ( v ) = (cid:88) (Ψ( v ) ,u,w | w | +1 − i ) ∈ Ψ( A ) u ∈ X p i − ,w, Ψ( G )[ X ] ( u ) p i,w,G [Ψ − ( X )] ( v ) = p i,w, Ψ( G )[ X ] (Ψ( v )) The case where w | w | +1 − i = (cid:5) is similar.Using Lemma 8 together with Lemma 9, we get the following lemma: Lemma 10. Let W ⊆ (Σ ∪ {(cid:5)} ) ∗ , G ( V, A ) = R p (Σ) , f : N > → N > and X ⊆ Ψ( V ) be a non-empty set. 
If for all v ∈ X and w ∈ W, p_{|w|,w,Ψ(G)[X]}(v) ≥ f(|w|), then there exists a language L such that:
• ε ∈ L,
• for all u ∈ L, u avoids squares of period less than p,
• for any u ∈ L and w ∈ W there are at least f(|w|) different words v ∈ Σ^{|w|} compatible with w and such that uv ∈ L.
The graph Ψ(R_p(Σ)) is much smaller than R_p(Σ), and we can use a computer to check the conditions of this lemma for the values of p that we used. One should first find the graph Ψ(R_p(Σ)). The following fact allows us to compute the set of vertices of Ψ(R_p(Σ)) without computing R_p(Σ):
Lemma 11. Let w ∈ S_p(Σ). Then w ∈ Ψ(S_p(Σ) ∩ Σ^{2p−3}) if and only if w is the smallest non-empty suffix of w such that Ψ(w) = w.
Moreover, given a graph G, the definition of p_{|w|,w,G} gives a straightforward dynamic programming algorithm that computes p_{|w|,w,G} in time O(|Σ| · |w| · |G|). Starting with X equal to the vertex set of Ψ(R_p(Σ)) and repeatedly removing from X all the vertices for which p_{|w|,w,Ψ(R_p(Σ))[X]} < f(|w|) gives the largest subgraph that meets the conditions of Lemma 10. As long as this subgraph is not empty, one can then apply Lemma 10. Algorithm 1 computes the largest subgraph of Ψ(G) with the required property.
Algorithm 1: How to compute the subgraph of Ψ(G).
Input: The graph Ψ(G), the set W
Output: The largest set X ⊆ Ψ(V) such that for all v ∈ X and w ∈ W, p_{|w|,w,Ψ(G)[X]}(v) ≥ f(|w|)
X := Ψ(V);
todo := true;
while todo do
    todo := false;
    foreach w ∈ W do
        compute p_{|w|,w,Ψ(G)[X]};
        X′ := {v ∈ X : p_{|w|,w,Ψ(G)[X]}(v) ≥ f(|w|)};
        if X ≠ X′ then
            X := X′;
            todo := true;
return X;
In this section we apply Theorem 4. We provide a C++ implementation of Algorithm 1 that verifies the existence of a language that fulfills conditions (I), (II) and (III) of Theorem 4.
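To make the walk-counting recurrence and Algorithm 1 concrete, here is a small Python sketch run on a toy labeled graph. It is only an illustration under stated assumptions and not the C++ implementation mentioned in the text: the graph is an arbitrary dictionary of labeled arcs rather than Ψ(R_p(Σ)), holes are encoded as `None`, and the names `count_walks` and `largest_good_subgraph` are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Set, Tuple

HOLE = None  # assumed encoding of the hole symbol
Arcs = Dict[Hashable, List[Tuple[Hashable, str]]]  # vertex -> [(target, label), ...]


def count_walks(arcs: Arcs, X: Set[Hashable], w: Sequence) -> Dict[Hashable, int]:
    """For each v in X, count the walks of length |w| in the subgraph induced
    by X that start at v and are compatible with the partial word w."""
    counts = {v: 1 for v in X}  # one walk of length 0 from every vertex
    for i in range(1, len(w) + 1):
        letter = w[len(w) - i]  # letter w_{|w|+1-i} in 1-based indexing
        counts = {
            v: sum(
                counts[u]
                for (u, a) in arcs[v]
                if u in X and (letter is HOLE or a == letter)
            )
            for v in X
        }
    return counts


def largest_good_subgraph(
    arcs: Arcs, W: Iterable[Sequence], f: Callable[[int], int]
) -> Set[Hashable]:
    """Prune vertices until every remaining v has at least f(|w|) compatible
    walks for every w in W, following the loop structure of Algorithm 1."""
    words = list(W)
    X = set(arcs)
    todo = True
    while todo:
        todo = False
        for w in words:
            counts = count_walks(arcs, X, w)
            kept = {v for v in X if counts[v] >= f(len(w))}
            if kept != X:
                X = kept
                todo = True
    return X


# Toy graph with three vertices; every vertex must appear as a key.
toy: Arcs = {"A": [("B", "0"), ("A", "1")], "B": [("A", "0")], "C": [("C", "0")]}
```

An empty result means that no non-empty subgraph satisfies the requirement for the chosen f, in which case the lemma cannot be applied with these parameters.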
Condition (IV) can be easily verified (as longas solutions are given) and we also provide a C++ code to do that. Theorem 12. For any alphabet Σ , let d (Σ) be the smallest integer such thatfor all v ∈ Σ ω and for all sequence ( p i ) ≤ i such that ∀ i , p i +1 − p i ≥ d (Σ) , thereis an infinite square-free word u ∈ Σ ω such that v i = u p i . Then: • ≤ d ( { , , } ) ≤ , • d ( { , , , } ) = 3 , • ≤ d ( { , , , , } ) ≤ , • if | Σ | ≥ , d (Σ) = 2 .Proof. For any alphabet Σ , we have d (Σ) ≥ . Moreover, d is a decreasingfunction of the size of the alphabet. Thus the third statement can easily be de-duced from the second one. We show the remaining statements independentlyof each others. We will show the upper bounds using Algorithm 1, Lemma 10and Theorem 4. The lower bounds are verified by exhaustive search.If | Σ | ≥ , d ( Σ ) = : Let Σ = { , , , , , } and W = {(cid:5)} ∪ {(cid:5) a (cid:5) : a ∈ Σ } ∪ {(cid:5) a (cid:5) b : a, b ∈ Σ } . We can use Algorithm 1 to check that we can apply The codes can be found in the ancillary files of https://arxiv.org/abs/1903.04214 f (1) = 3 , f (3) = f (4) = 6 , p = 12 . Thus conditions (I),(II)and (III) of Theorem 4 are fullfilled. We can check with a computer that con-dition (IV) of Theorem 4 is also fullfilled with x = and x = x = . Thisimplies that for any µ ∈ W ω there are infinite square-free words over Σ com-patible with µ . Since {(cid:5) i a : i ≥ , a ∈ Σ } ω ⊆ W ω , we deduce that for any µ ∈ {(cid:5) i a : i ≥ , a ∈ Σ } ω there are infinite square-free words over Σ compatiblewith µ . We get d ( { , , , , , } ) ≤ . d ( { , , , } ) = : Let w = (0 (cid:5) (cid:5) (cid:5) (cid:5) ) ω . An exhaustive search confirms thatthere are only square-free words over { , , , } compatible with w . Thus d ( { , , , } ) ≥ .Let Σ = { , , , } and W = {(cid:5)} ∪ {(cid:5)(cid:5) a (cid:5) : a ∈ Σ } ∪ {(cid:5)(cid:5) a (cid:5)(cid:5) b : a, b ∈ Σ } . 
We can use Algorithm 1 to check that we can apply Lemma 10 with $f(1) = 2$, $f(4) = 5$ and $f(6) = 8$, $p = 18$. We can then apply Theorem 4 with $x_1 = [\ldots]$, $x_4 = [\ldots]$ and $x_6 = [\ldots]$, and we deduce that for any $\mu \in W^\omega$ there are infinite square-free words over $\Sigma$ compatible with $\mu$. Moreover, $\{\diamond^i a : i \ge 2, a \in \Sigma\}^\omega \subseteq W^\omega$. We deduce that for any $\mu \in \{\diamond^i a : i \ge 2, a \in \Sigma\}^\omega$ there are infinite square-free words over $\Sigma$ compatible with $\mu$. Thus $d(\{0,1,2,3\}) \le 3$.

$7 \le d(\{0,1,2\}) \le 19$: Let $w = (0 \diamond [\ldots] \diamond [\ldots] \diamond [\ldots])^\omega$. An exhaustive search confirms that there are only […] square-free words over $\{0,1,2\}$ compatible with $w$. Thus $d(\{0,1,2\}) \ge 7$.
Let $\Sigma = \{0,1,2\}$ and $W = \{\diamond\} \cup \{\diamond^i a : i \in \{[\ldots], \ldots, [\ldots]\}, a \in \Sigma\}$. We can use Algorithm 1 to check that we can apply Lemma 10 with $p = 61$ and the values of $f$ given in Table 1. Thus conditions (I), (II) and (III) of Theorem 4 are fulfilled. We can also check that the values of $x_{|w|}$ given in Table 1 fulfill condition (IV) of Theorem 4.

Table 1: The values of $f(|w|)$ and $x_{|w|}$ for the computation of $d(\{0,1,2\})$ (columns: $|w|$, $f(|w|)$, $x_{|w|}$).

We deduce that for any $\mu \in W^\omega$ there are infinite square-free words over $\Sigma$ compatible with $\mu$. Moreover, $\{\diamond^i a : i \ge 18, a \in \Sigma\}^\omega \subseteq W^\omega$. We deduce that for any $\mu \in \{\diamond^i a : i \ge 18, a \in \Sigma\}^\omega$ there are infinite square-free words over $\Sigma$ compatible with $\mu$. Thus $d(\{0,1,2\}) \le 19$.

The three applications of Algorithm 1 require between 30 and 100 GB of RAM (and around 5 hours of computation). We had to optimize the way strings are stored in memory in order to be able to compute the graphs for large enough values of $p$.
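The exhaustive searches behind the lower bounds can be illustrated on a toy case. The following Python sketch is a plain backtracking search, not the author's program: it looks for the longest square-free word compatible with a periodic partial word, where `*` is an assumed encoding of the free symbol and the function names are hypothetical. The pattern used in the example, a forced `0` at every second position over three letters, is chosen only because it is small enough to check by hand; it is not one of the witnesses used in the proof.

```python
WILDCARD = "*"  # assumed encoding of the free symbol


def ends_with_square(word: list) -> bool:
    """Return True if some non-empty suffix of `word` is a square uu."""
    n = len(word)
    for period in range(1, n // 2 + 1):
        if word[n - 2 * period:n - period] == word[n - period:]:
            return True
    return False


def longest_compatible(pattern: str, alphabet: str, limit: int) -> int:
    """Length of the longest square-free word compatible with the infinite
    repetition of `pattern`, capped at `limit` (reaching the cap proves
    nothing beyond that length)."""
    best = 0
    word: list = []

    def extend() -> bool:
        nonlocal best
        best = max(best, len(word))
        if len(word) == limit:
            return True  # cap reached, stop the whole search
        forced = pattern[len(word) % len(pattern)]
        candidates = alphabet if forced == WILDCARD else forced
        for letter in candidates:
            word.append(letter)
            # The prefix is square-free, so only suffix squares can appear.
            found = not ends_with_square(word) and extend()
            word.pop()
            if found:
                return True
        return False

    extend()
    return best


if __name__ == "__main__":
    # Forced 0 at every even position, ternary alphabet.
    print(longest_compatible("0*", "012", 50))  # 7
```

Here every compatible word of length 8 has the form $0a0b0a0b$ with $a \ne b$, which is a square, so the search stops at length 7 (for instance $0102010$). This only shows that forced positions at distance 2 cannot always be accommodated over three letters, a much weaker statement than the lower bound of the theorem.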
The rest of the computations (finding the solution to the system and the exhaustive search) easily run on a laptop in a few milliseconds.

Note that we showed something slightly stronger, since the results would still hold if an adversary were to tell us, at every choice of a letter, only the next forced letters with their positions (that is, we know the next element of $W$).

Experimental computations suggest that $d(\{0,1,2\})$ is closer to 7 than to 19 and that $d(\{0,1,2,3,4\}) = 2$.

Acknowledgement
Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and by the Walloon Region.

References
[1] J. P. Bell and T. L. Goh. Exponential lower bounds for the number of words of uniform length avoiding a pattern. Information and Computation, 205(9):1295–1306, 2007.
[2] J. Currie, T. Harju, P. Ochem, and N. Rampersad. Some further results on squarefree arithmetic progressions in infinite words. Theoretical Computer Science, 799:140–148, 2019.
[3] S. Czerwiński and J. Grytczuk. Nonrepetitive colorings of graphs. Electronic Notes in Discrete Mathematics, 28:453–459, 2007. 6th Czech-Slovak International Symposium on Combinatorics, Graph Theory, Algorithms and Applications.
[4] J. Grytczuk, J. Kozik, and P. Micek. New approach to nonrepetitive sequences. Random Structures & Algorithms, 42(2):214–225.
[5] T. Harju. On square-free arithmetic progressions in infinite words. Theoretical Computer Science, 2018.
[6] R. M. Kolpakov. On the number of repetition-free words. Journal of Applied and Industrial Mathematics, 1(4):453–462, 2007.
[7] P. Ochem. Doubled patterns are 3-avoidable. Electronic Journal of Combinatorics, 23(1), 2016.
[8] A. Thue. Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania