On a generalization of Abelian equivalence and complexity of infinite words

Abstract

In this paper we introduce and study a family of complexity functions of infinite words indexed by $k \in \mathbb{Z}^+ \cup \{+\infty\}$. Let $k \in \mathbb{Z}^+ \cup \{+\infty\}$ and let $A$ be a finite non-empty set. Two finite words $u$ and $v$ in $A^*$ are said to be $k$-Abelian equivalent if for all $x \in A^*$ of length less than or equal to $k$, the number of occurrences of $x$ in $u$ is equal to the number of occurrences of $x$ in $v$. This defines a family of equivalence relations $\sim_k$ on $A^*$, bridging the gap between the usual notion of Abelian equivalence (when $k = 1$) and equality (when $k = +\infty$). We show that the number of $k$-Abelian equivalence classes of words of length $n$ grows polynomially, although the degree is exponential in $k$. Given an infinite word $\omega \in A^{\mathbb{N}}$, we consider the associated complexity function $\mathcal{P}^{(k)}_\omega : \mathbb{N} \rightarrow \mathbb{N}$ which counts the number of $k$-Abelian equivalence classes of factors of $\omega$ of length $n$. We show that the complexity function $\mathcal{P}^{(k)}_\omega$ is intimately linked with periodicity. More precisely, we define an auxiliary function $q^{(k)} : \mathbb{N} \rightarrow \mathbb{N}$ and show that if $\mathcal{P}^{(k)}_\omega(n) < q^{(k)}(n)$ for some $k \in \mathbb{Z}^+ \cup \{+\infty\}$ and $n \geq 0$, then $\omega$ is ultimately periodic. Moreover, if $\omega$ is aperiodic, then $\mathcal{P}^{(k)}_\omega(n) = q^{(k)}(n)$ if and only if $\omega$ is Sturmian. We also study $k$-Abelian complexity in connection with repetitions in words. Using Szemerédi's theorem, we show that if $\omega$ has bounded $k$-Abelian complexity, then for every $D \subset \mathbb{N}$ with positive upper density and for every positive integer $N$, there exists a $k$-Abelian $N$ power occurring in $\omega$ at some position $j \in D$.


Juhani Karhumäki a,1, Aleksi Saarela a,1, Luca Q. Zamboni a,b,2

a Department of Mathematics and Statistics & FUNDIM, University of Turku, FI-20014 Turku, Finland
b Université de Lyon, Université Lyon 1, CNRS UMR 5208, Institut Camille Jordan, 43 boulevard du 11 novembre 1918, F69622 Villeurbanne Cedex, France

Keywords: Abelian equivalence, complexity of words, Sturmian words, Szemerédi's theorem.

1. Introduction

Abelian equivalence of words has long been a subject of great interest (see for instance Erdős's problem, [3, 4, 5, 6, 18, 25, 26, 27, 29]). Given a finite non-empty set $A$, let $A^*$ denote the set of all finite words over $A$. Two words $u$ and $v$ in $A^*$ are Abelian equivalent, denoted $u \sim_{ab} v$, if and only if $|u|_a = |v|_a$ for all $a \in A$, where $|u|_a$ and $|v|_a$ denote the number of occurrences of $a$ in $u$ and $v$, respectively. It is readily verified that $\sim_{ab}$ defines an equivalence relation (in fact a congruence) on $A^*$.

(Author footnotes: 1 Partially supported by the Academy of Finland under grants 251371 and 257857. 2 Partially supported by a FiDiPro grant (137991) from the Academy of Finland and by ANR grant SUB-TILE. Preprint submitted to Elsevier, June 29, 2018.)

We consider the following natural generalization: Fix $k \in \mathbb{Z}^+ \cup \{+\infty\}$. Two words $u$ and $v$ in $A^*$ are said to be $k$-Abelian equivalent, written $u \sim_k v$, if $|u|_x = |v|_x$ for each non-empty word $x$ with $|x| \leq k$ (where $|x|$ denotes the length of $x$, and $|u|_x$ and $|v|_x$ denote the number of occurrences of $x$ in $u$ and $v$, respectively). We note that $u \sim_{+\infty} v$ if and only if $u = v$, while $\sim_1$ corresponds to the usual notion of Abelian equivalence $\sim_{ab}$. Thus one may regard the notion of $k$-Abelian equivalence as gradually bridging the gap between Abelian equivalence ($k = 1$) and equality ($k = +\infty$). It is readily verified that $\sim_k$ defines an equivalence relation (in fact a congruence) on $A^*$. Clearly, if $u \sim_k v$, then $|u| = |v|$ and $u \sim_\ell v$ for each positive integer $\ell \leq k$.

The notion of $k$-Abelian equivalence was first introduced by the first author in [16] in connection with formal languages and decidability questions of various fundamental problems. It was shown that the well known Parikh Theorem on the equivalence of Parikh images of regular and context-free languages does not hold for $k$-Abelian equivalence. In contrast, various highly nontrivial decidability questions, including the D0L sequence equivalence problem [8] and the Post Correspondence Problem [24], turned out to be easily decidable in the context of $k$-Abelian equivalence. Recently $k$-Abelian equivalence has been studied in the context of avoidance of repetitions in words (see the discussion at the beginning of §5 on $k$-Abelian powers). In this paper we undertake an investigation of the complexity of infinite words in the framework of $k$-Abelian equivalence.
As is the case with various other notions of complexity of words, we will see that $k$-Abelian complexity is intimately linked with periodicity and can be used to detect the presence of repetitions.

Let $A$ be a finite non-empty set. For each infinite word $\omega = a_0a_1a_2\ldots$ with $a_i \in A$, we denote by $\mathcal{F}_\omega(n)$ the set of all factors of $\omega$ of length $n$, that is, the set of all finite words of the form $a_ia_{i+1}\cdots a_{i+n-1}$ with $i \geq 0$. We set $\rho_\omega(n) = \mathrm{Card}(\mathcal{F}_\omega(n))$. The function $\rho_\omega : \mathbb{N} \rightarrow \mathbb{N}$ is called the factor complexity function of $\omega$. Analogously, for each $k \in \mathbb{Z}^+ \cup \{+\infty\}$ we define

$$\mathcal{P}^{(k)}_\omega(n) = \mathrm{Card}\left(\mathcal{F}_\omega(n)/\sim_k\right).$$

The function $\mathcal{P}^{(k)}_\omega : \mathbb{N} \rightarrow \mathbb{N}$, which counts the number of $k$-Abelian equivalence classes of factors of $\omega$ of length $n$, is called the $k$-Abelian complexity of $\omega$. In case $k = +\infty$ we have $\mathcal{P}^{(+\infty)}_\omega(n) = \rho_\omega(n)$, while if $k = 1$, $\mathcal{P}^{(1)}_\omega(n)$, denoted $\rho^{ab}_\omega(n)$, corresponds to the usual Abelian complexity of $\omega$.

Most word complexity functions, including factor complexity [23], maximal pattern complexity [15], permutation complexity [1, 10], Abelian complexity [4], and Abelian maximal pattern complexity [14], may be used to detect (and in some cases characterize) ultimately periodic words. For instance, a celebrated result due to Morse and Hedlund [23] states that an infinite word $\omega \in A^{\mathbb{N}}$ is ultimately periodic if and only if $\rho_\omega(n) \leq n$ for some $n \in \mathbb{Z}^+$. The third author together with T. Kamae proved a similar result in the context of maximal pattern complexity with $n$ replaced by $2n - 1$. We show that these same results hold in the framework of $k$-Abelian complexity.
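The two complexity functions just defined can be explored numerically on a finite prefix of an infinite word. The following Python sketch is only an illustration: the helper names (`fibonacci_prefix`, `k_abelian_signature`, `k_abelian_complexity`) and the choice of the Fibonacci word as input are assumptions made for this example, not notation from the paper, and a finite prefix only approximates $\mathcal{F}_\omega(n)$, so the counts are meaningful only when $n$ is small compared with the prefix length.

```python
from collections import Counter
import math


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0 (illustrative input)."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def k_abelian_signature(word, k):
    """Occurrence counts of every non-empty factor of `word` of length <= k.

    Two words are k-Abelian equivalent exactly when their signatures agree.
    Passing k = math.inf compares all factors, i.e. tests equality of words.
    """
    max_len = len(word) if k == math.inf else min(k, len(word))
    counts = Counter(
        word[i:i + m]
        for m in range(1, max_len + 1)
        for i in range(len(word) - m + 1)
    )
    return frozenset(counts.items())


def k_abelian_complexity(prefix, n, k):
    """Number of k-Abelian classes among the length-n factors of a finite prefix."""
    factors = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return len({k_abelian_signature(f, k) for f in factors})


if __name__ == "__main__":
    fib = fibonacci_prefix(500)
    periodic = "01" * 250
    for n in range(1, 9):
        print(n,
              k_abelian_complexity(fib, n, math.inf),       # factor complexity
              k_abelian_complexity(fib, n, 1),              # Abelian complexity
              k_abelian_complexity(periodic, n, math.inf))  # periodic word
```

On the periodic input the factor complexity stays bounded, in line with the Morse-Hedlund criterion $\rho_\omega(n) \leq n$, whereas on the Fibonacci prefix it keeps growing.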
In order to formulate the precise link between aperiodicity and $k$-Abelian complexity, we define, for each $k \in \mathbb{Z}^+ \cup \{+\infty\}$, an auxiliary function $q^{(k)} : \mathbb{N} \rightarrow \mathbb{N}$ by

$$q^{(k)}(n) = \begin{cases} n + 1 & \text{for } n \leq 2k - 1, \\ 2k & \text{for } n \geq 2k. \end{cases}$$

We prove that for $\omega \in A^{\mathbb{N}}$, if $\mathcal{P}^{(k)}_\omega(n) < q^{(k)}(n)$ for some $k \in \mathbb{Z}^+ \cup \{+\infty\}$ and $n \geq 1$, then $\omega$ is ultimately periodic. This result is already well known in the special cases $k = +\infty$ and $k = 1$ (see [23] and [4] respectively). By the Morse-Hedlund result mentioned earlier, this condition gives a characterization of ultimately periodic words in the special case $k = +\infty$. In contrast, for finite $k$, $k$-Abelian complexity does not yield such a characterization. Indeed, both Sturmian words and the ultimately periodic word $01^\infty = 0111\cdots$ have the same constant Abelian complexity, equal to 2. More generally, we shall see that the ultimately periodic word $0^{2k-1}1^\infty$ has the same $k$-Abelian complexity as a Sturmian word. Nevertheless $k$-Abelian complexity gives a complete characterization of Sturmian words amongst all aperiodic words. More precisely, we prove that for an aperiodic word $\omega \in A^{\mathbb{N}}$, the following conditions are equivalent:

• $\omega$ is a balanced binary word, that is, Sturmian.
• $\mathcal{P}^{(k)}_\omega(n) = q^{(k)}(n)$ for each $k \in \mathbb{Z}^+ \cup \{+\infty\}$ and $n \geq 1$.

Again, the special cases of $k = +\infty$ and $k = 1$ were already known (see [23] and [4] respectively).

Finally we investigate the question of avoidance of $k$-Abelian $N$ powers: By a $k$-Abelian $N$ power we mean a word $U$ of the form $U = U_1U_2\ldots U_N$ such that $U_i \sim_k U_j$ for all $1 \leq i, j \leq N$. Using Szemerédi's theorem [30], we show that if $\omega$ has bounded $k$-Abelian complexity, then for every $D \subset \mathbb{N}$ with positive upper density and for every positive integer $N$, there exists a $k$-Abelian $N$ power occurring in $\omega$ at some position $j \in D$.

The paper is organized as follows: In §2 we study basic properties of $k$-Abelian equivalence of words. Also in §2 we estimate the number of $k$-Abelian equivalence classes of words in $A^n$. In §3 we establish the link between $k$-Abelian complexity and periodicity of words. In §4 we study the $k$-Abelian complexity of Sturmian words and show that it completely characterizes Sturmian words amongst all aperiodic words. Finally in §5 we study $k$-Abelian complexity in the context of repetitions in words.

(Footnote: With respect to maximal pattern complexity, and Abelian maximal pattern complexity, Sturmian words are not the only words of lowest complexity.)

2. $k$-Abelian equivalence

Given a finite non-empty set $A$, we denote by $A^*$ the set of all finite words over $A$ including the empty word, denoted by $\varepsilon$, by $A^+$ the set of all finite non-empty words over $A$, by $A^{\mathbb{N}}$ the set of (right) infinite words over $A$, and by $A^{\mathbb{Z}}$ the set of bi-infinite words over $A$. Given $u = a_1a_2\ldots a_n$ with $n \geq 1$ and $a_i \in A$, we denote the length $n$ of $u$ by $|u|$ (by convention we set $|\varepsilon| = 0$). For each $x \in A^+$, we let $|u|_x$ denote the number of occurrences of $x$ in $u$. For $u \in A^*$, we denote by $\bar{u}$ the reverse of $u$.

A factor $u$ of $\omega = a_0a_1a_2\ldots \in A^{\mathbb{N}}$ is called right special (respectively left special) if there exist distinct symbols $a, b \in A$ such that both $ua$ and $ub$ (respectively $au$ and $bu$) are factors of $\omega$. We say $u$ is bispecial if $u$ is both left and right special. An infinite word $\omega \in A^{\mathbb{N}}$ is said to be periodic if there exists a positive integer $p$ such that $a_{i+p} = a_i$ for all indices $i$. It is said to be ultimately periodic if $a_{i+p} = a_i$ for all sufficiently large $i$. It is said to be aperiodic if it is not ultimately periodic.

Sturmian words are the simplest aperiodic infinite words: they are the infinite words over a binary alphabet having exactly $n + 1$ factors of length $n$ for each $n \geq 0$. Their origin can be traced back to the astronomer J. Bernoulli III in 1772. A fundamental result due to Morse and Hedlund [23] states that each aperiodic (meaning non-ultimately periodic) infinite word must contain at least $n + 1$ factors of each length $n \geq 0$. Thus Sturmian words are those aperiodic words of lowest factor complexity. They arise naturally in many different areas of mathematics including combinatorics, algebra, number theory, ergodic theory, dynamical systems and differential equations. Sturmian words are also of great importance in theoretical physics and in theoretical computer science and are used in computer graphics as digital approximations of straight lines. If $\omega \in \{a, b\}^{\mathbb{N}}$ is Sturmian, then for each positive integer $n$ there exists a unique right special (respectively left special) factor of length $n$, and one is the reversal of the other. In particular, if $x$ is a bispecial factor, then $x$ is a palindrome, i.e., $x = \bar{x}$. For more on Sturmian words, we refer the reader to [19].
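The properties of Sturmian words recalled above can be checked on a concrete example. The sketch below uses the Fibonacci word (a standard Sturmian word, chosen here purely for illustration) and lists its special factors of small length; the function names are hypothetical, and the computation is done on a finite prefix, so it is only reliable for lengths that are small relative to the prefix.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def factors(prefix, n):
    """Set of factors of length n occurring in the finite prefix."""
    return {prefix[i:i + n] for i in range(len(prefix) - n + 1)}


def right_special(prefix, n, alphabet="01"):
    """Factors u of length n such that ua and ub both occur for distinct letters a, b."""
    longer = factors(prefix, n + 1)
    return sorted(u for u in factors(prefix, n)
                  if sum(u + a in longer for a in alphabet) >= 2)


def left_special(prefix, n, alphabet="01"):
    """Factors u of length n such that au and bu both occur for distinct letters a, b."""
    longer = factors(prefix, n + 1)
    return sorted(u for u in factors(prefix, n)
                  if sum(a + u in longer for a in alphabet) >= 2)


if __name__ == "__main__":
    fib = fibonacci_prefix(500)
    for n in range(1, 9):
        # Columns: length, number of factors, right special factors, left special factors.
        print(n, len(factors(fib, n)), right_special(fib, n), left_special(fib, n))
```

Each printed row should show $n + 1$ factors, exactly one right special and one left special factor, the two being reversals of each other.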

Deﬁnition 2.1.

Let $k \in \mathbb{Z}^+ \cup \{+\infty\}$. We say two words $u, v \in A^+$ are $k$-Abelian equivalent and write $u \sim_k v$, if $|u|_x = |v|_x$ for all words $x$ of length $|x| \leq k$.

We note that if $u, v \in A^+$ and $|u| = |v| \leq k$, then $u \sim_k v$ if and only if $u = v$.

Example 2.2.

The words $u = 010110$ and $v = 011010$ are 3-Abelian equivalent but not 4-Abelian equivalent since the prefix $0101$ of $u$ does not occur in $v$. The words $u = 0110$ and $v = 1101$ are not 2-Abelian equivalent (since they are not Abelian equivalent), yet for every word $x$ of length 2 we have $|u|_x = |v|_x$.

The next lemma gives different equivalent ways of defining $k$-Abelian equivalence. For example, item (1) corresponds to Definition 2.1 and item (3) corresponds to another common definition: words $u$ and $v$ of length at least $k - 1$ are $k$-Abelian equivalent if they share the same prefixes and suffixes of length $k - 1$ and $|u|_t = |v|_t$ for every word $t$ of length $k$.

Lemma 2.3.

Let $u$ and $v$ be words of length at least $k - 1$ and let $|u|_t = |v|_t$ for every word $t$ of length $k$. The following are equivalent:

1. $|u|_s = |v|_s$ for all $s \in A^{\leq k-1}$,
2. $|u|_s = |v|_s$ for all $s \in A^{k-1}$,
3. $\mathrm{pref}_{k-1}(u) = \mathrm{pref}_{k-1}(v)$ and $\mathrm{suff}_{k-1}(u) = \mathrm{suff}_{k-1}(v)$,
4. $\mathrm{pref}_{k-1}(u) = \mathrm{pref}_{k-1}(v)$,
5. $\mathrm{suff}_{k-1}(u) = \mathrm{suff}_{k-1}(v)$,
6. $\mathrm{pref}_i(u) = \mathrm{pref}_i(v)$ and $\mathrm{suff}_{k-1-i}(u) = \mathrm{suff}_{k-1-i}(v)$ for some $i \in \{0, \ldots, k-1\}$.

Proof. (1) ⇒ (2): Clear.

(2) ⇒ (3): Let $\{t_1, \ldots, t_n\}$ be the multiset of factors of $u$ (and of $v$) of length $k$. The multiset of factors of $u$ of length $k - 1$ is

$$\{\mathrm{pref}_{k-1}(u)\} \cup \{\mathrm{suff}_{k-1}(t_1), \ldots, \mathrm{suff}_{k-1}(t_n)\},$$

and the multiset of factors of $v$ of length $k - 1$ is

$$\{\mathrm{pref}_{k-1}(v)\} \cup \{\mathrm{suff}_{k-1}(t_1), \ldots, \mathrm{suff}_{k-1}(t_n)\}.$$

These multisets must be the same, so $\mathrm{pref}_{k-1}(u) = \mathrm{pref}_{k-1}(v)$. Similarly, $\mathrm{suff}_{k-1}(u) = \mathrm{suff}_{k-1}(v)$.

(3) ⇒ (4), (5): Clear.

(4) or (5) ⇒ (6): Clear.

(6) ⇒ (1): Let $\{t_1, \ldots, t_n\}$ be the multiset of factors of $u$ (and of $v$) of length $k$. Every $s \in A^{k-1} \setminus \{\mathrm{pref}_{k-1}(u), \mathrm{suff}_{k-1}(u)\}$ appears in the multiset

$$\{\mathrm{pref}_{k-1}(t_1), \ldots, \mathrm{pref}_{k-1}(t_n)\} \cup \{\mathrm{suff}_{k-1}(t_1), \ldots, \mathrm{suff}_{k-1}(t_n)\} \qquad (1)$$

$2|u|_s$ times. A word $s \in \{\mathrm{pref}_{k-1}(u), \mathrm{suff}_{k-1}(u)\}$ appears $2|u|_s - 1$ times if $\mathrm{pref}_{k-1}(u) \neq \mathrm{suff}_{k-1}(u)$, and $2|u|_s - 2$ times if $\mathrm{pref}_{k-1}(u) = \mathrm{suff}_{k-1}(u)$. Similarly, every $s \in A^{k-1} \setminus \{\mathrm{pref}_{k-1}(v), \mathrm{suff}_{k-1}(v)\}$ appears $2|v|_s$ times, and a word $s \in \{\mathrm{pref}_{k-1}(v), \mathrm{suff}_{k-1}(v)\}$ appears $2|v|_s - 1$ times if $\mathrm{pref}_{k-1}(v) \neq \mathrm{suff}_{k-1}(v)$, and $2|v|_s - 2$ times if $\mathrm{pref}_{k-1}(v) = \mathrm{suff}_{k-1}(v)$.

If some words appear an odd number of times in (1), then these must be $\mathrm{pref}_{k-1}(u)$ and $\mathrm{suff}_{k-1}(u)$, and they must also be $\mathrm{pref}_{k-1}(v)$ and $\mathrm{suff}_{k-1}(v)$. It follows that $|u|_s = |v|_s$ for every $s \in A^{k-1}$. (In this case the assumption (6) was not needed.)

If all words appear an even number of times in (1), then necessarily $\mathrm{pref}_{k-1}(u) = \mathrm{suff}_{k-1}(u)$ and $\mathrm{pref}_{k-1}(v) = \mathrm{suff}_{k-1}(v)$. From (6) it follows that $\mathrm{pref}_{k-1}(u) = \mathrm{pref}_{k-1}(v)$ and $\mathrm{suff}_{k-1}(u) = \mathrm{suff}_{k-1}(v)$, and thus $|u|_s = |v|_s$ for every $s \in A^{k-1}$.

The fact that $|u|_s = |v|_s$ also for every $s$ of length less than $k - 1$ follows by induction.

The next lemma collects some basic properties of $k$-Abelian equivalence:

Lemma 2.4.

Let $u, v \in A^*$ and $k \geq 1$.

• If $|u| = |v| \leq 2k - 1$ and $u \sim_k v$, then $u = v$.
• If $u \sim_k v$, then $u \sim_{k'} v$ for all $k' \leq k$.
• If $u_1 \sim_k v_1$ and $u_2 \sim_k v_2$, then $u_1u_2 \sim_k v_1v_2$.

The bound $2k - 1$ in the first item cannot be improved: for each $k$ there exist words $u \neq v$ of length $2k$ such that $u \sim_k v$. For example, the words $u = 0^{k-1}010^{k-1}$ and $v = 0^{k-1}100^{k-1}$ of length $2k$ are readily verified to be $k$-Abelian equivalent (see Proposition 2.8).

Lemma 2.5.

Fix $2 \leq k < +\infty$. Suppose $aub \sim_k cvd$ with $a, b, c, d \in A$ and $u, v \in A^*$. Then $u \sim_{k-1} v$.

Proof.

Let $x \in A^*$ with $|x| \leq k - 1$. We can assume that $|x| < |aub|$, for otherwise $0 = |u|_x = |v|_x$. If $x$ is neither a prefix nor a suffix of $aub$, then by Lemma 2.3 $x$ is neither a prefix nor a suffix of $cvd$ and hence $|u|_x = |aub|_x = |cvd|_x = |v|_x$. If $x$ is either a prefix of $aub$ or a suffix of $aub$ but not both, then $|u|_x = |aub|_x - 1 = |cvd|_x - 1 = |v|_x$. Finally, if $x$ is both a prefix and a suffix of $aub$, then $|u|_x = |aub|_x - 2 = |cvd|_x - 2 = |v|_x$.

The next theorem gives a complete classification of pairs of $k$-Abelian equivalent words of length $2k$ and establishes a first link to Sturmian words:

Theorem 2.6.

Fix a positive integer $k$, and let $u, v \in A^*$ be distinct words of length $2k$. Then $u \sim_k v$ if and only if there exist distinct letters $a, b \in A$, a Sturmian word $\omega \in \{a, b\}^{\mathbb{N}}$ and a right special factor $x$ of $\omega$ of length $k - 1$ (or empty in case $k = 1$) such that $u = xab\bar{x}$ and $v = xba\bar{x}$. In particular $u$ and $v$ are both factors of the same Sturmian word $\omega$.

Remark 2.7.

It follows that if $u$ and $v$ are distinct $k$-Abelian equivalent words of length $2k$, then both $u$ and $v$ are on a binary alphabet and in fact factors of the same Sturmian word $\omega$. In fact, if $B$ is a bispecial factor of $\omega$ then both $BabB$ and $BbaB$ are factors of $\omega$. Also, if $x$ is a right special factor of $\omega$, then there exists a bispecial factor $B$ of $\omega$ with $x$ a suffix of $B$ and $\bar{x}$ a prefix of $B$. Thus both $xab\bar{x}$ and $xba\bar{x}$ are factors of $\omega$.

We will need the next result applied to Sturmian words, but we prove it more generally for episturmian words. We refer the reader to [7] for the definition and basic properties of episturmian words.
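For small $k$ the classification in Theorem 2.6 can be checked by exhaustive search over a binary alphabet. The following sketch is an illustration only, with hypothetical helper names: it groups all binary words of length $2k$ by $k$-Abelian class, then tests that every pair of distinct equivalent words has the shape $xab\bar{x}$, $xba\bar{x}$ and that both words are balanced (a necessary condition for being a factor of a Sturmian word). It does not verify that $x$ is right special in an actual Sturmian word.

```python
from collections import Counter, defaultdict
from itertools import product


def k_abelian_signature(word, k):
    """Occurrence counts of all non-empty factors of length <= k (finite k only)."""
    counts = Counter(
        word[i:i + m]
        for m in range(1, min(k, len(word)) + 1)
        for i in range(len(word) - m + 1)
    )
    return frozenset(counts.items())


def is_balanced(word):
    """True if any two factors of equal length differ by at most 1 in their number of 1s."""
    for n in range(1, len(word) + 1):
        ones = [word[i:i + n].count("1") for i in range(len(word) - n + 1)]
        if max(ones) - min(ones) > 1:
            return False
    return True


def equivalent_pairs(k, alphabet="01"):
    """All unordered pairs of distinct k-Abelian equivalent words of length 2k."""
    classes = defaultdict(list)
    for letters in product(alphabet, repeat=2 * k):
        word = "".join(letters)
        classes[k_abelian_signature(word, k)].append(word)
    pairs = []
    for members in classes.values():
        members.sort()
        pairs.extend((u, v) for i, u in enumerate(members) for v in members[i + 1:])
    return sorted(pairs)


def has_theorem_shape(u, v, k):
    """Check u = x ab rev(x) and v = x ba rev(x) with a != b and |x| = k - 1."""
    x, (a, b) = u[:k - 1], u[k - 1:k + 1]
    return a != b and u == x + a + b + x[::-1] and v == x + b + a + x[::-1]


if __name__ == "__main__":
    for k in range(1, 5):
        pairs = equivalent_pairs(k)
        ok = all(has_theorem_shape(u, v, k) and is_balanced(u) and is_balanced(v)
                 for u, v in pairs)
        print(k, len(pairs), ok)
```

The brute-force enumeration grows like $2^{2k}$, so this is practical only for very small $k$; it is meant as a sanity check of the statement, not as a proof technique.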

Proposition 2.8.

Fix a positive integer $k \geq 2$. Let $u$ and $v$ be factors of the same episturmian word $\omega$. Then $u$ and $v$ are $k$-Abelian equivalent if and only if $u$ and $v$ are $(k-1)$-Abelian equivalent and share a common prefix and a common suffix of length $\min\{|u|, k-1\}$. Thus, $u$ and $v$ are $k$-Abelian equivalent if and only if $u$ and $v$ are Abelian equivalent and share a common prefix and a common suffix of length $\min\{|u|, k-1\}$.

Proof.

One direction follows immediately from Lemma 2.3. Next suppose that $u$ and $v$ are $(k-1)$-Abelian equivalent factors of $\omega$, and that $u$ and $v$ share a common prefix and a common suffix of length $\min\{|u|, k-1\}$. To prove that $u \sim_k v$ it suffices to show that whenever $axb \in \mathcal{F}_\omega(k)$ (with $a, b \in A$ and $x \in A^*$), we have $|u|_{axb} = |v|_{axb}$. First let us suppose that $ax$ is not a right special factor of $\omega$, so that every occurrence in $\omega$ of $ax$ is an occurrence of $axb$. Then, if $ax$ is not a suffix of $u$ (and hence not a suffix of $v$) we obtain

$$|u|_{axb} = |u|_{ax} = |v|_{ax} = |v|_{axb}.$$

On the other hand, if $ax$ is a suffix of $u$ (and hence also a suffix of $v$) we have

$$|u|_{axb} = |u|_{ax} - 1 = |v|_{ax} - 1 = |v|_{axb}.$$

Similarly, in case $xb$ is not a left special factor of $\omega$ we obtain $|u|_{axb} = |v|_{axb}$. Thus it remains to consider the case when $ax$ is right special in $\omega$ and $xb$ is left special in $\omega$. In this case $x$ is bispecial and $a = b$. For each $c \in A$, let $n_c = |u|_{axc}$ and $n'_c = |v|_{axc}$. We must show that $n_a = n'_a$. However we know that $n_c = n'_c$ for all $c \neq a$ since $xc$ is not left special in $\omega$. Now, if $ax$ is not a suffix of $u$ (and hence not a suffix of $v$) we have

$$\sum_{c \in A} n_c = |u|_{ax} = |v|_{ax} = \sum_{c \in A} n'_c,$$

whence $n_a = n'_a$. On the other hand, if $ax$ is a suffix of $u$ (and hence a suffix of $v$) then

$$\sum_{c \in A} n_c = |u|_{ax} - 1 = |v|_{ax} - 1 = \sum_{c \in A} n'_c,$$

whence $n_a = n'_a$ as required.

Remark 2.9.

The following example illustrates that the assumption in Proposition 2.8 that $u$ and $v$ are factors of the same Sturmian word is necessary: Let $u = aabb$ and $v = abab$. Then $u$ and $v$ are Abelian equivalent and share a common prefix and suffix of length 1, yet they are not 2-Abelian equivalent.

Proof of Theorem 2.6.

We start by showing that if $\omega \in \{a, b\}^{\mathbb{N}}$ is a Sturmian word, and $x$ a right special factor of $\omega$ of length $k - 1$, then $u = xab\bar{x}$ and $v = xba\bar{x}$ are $k$-Abelian equivalent. This follows from Proposition 2.8 since $u$ and $v$ share a common prefix and a common suffix of length $k - 1$ and are Abelian equivalent.

Next we suppose that $u$ and $v$ are distinct $k$-Abelian equivalent words of length $2k$ and show that both $u$ and $v$ have the required form. We proceed by induction on $k$. In case $k = 1$, we have that $u$ and $v$ are distinct Abelian equivalent words of length 2, whence $u$ and $v$ may be written in the form $u = ab$ and $v = ba$ for some $a \neq b$ in $A$. Next suppose the result of Theorem 2.6 is true for $k - 1$; we prove it for $k$. So let $u$ and $v$ be distinct $k$-Abelian equivalent words of length $2k$ with $k > 1$. Then by Lemma 2.3 we can write $u = a'u'b'$ and $v = a'v'b'$ for some $a', b' \in A$ and $u', v' \in A^*$ where $|u'| = |v'| = 2(k-1) \geq 2$. Since $u$ and $v$ are distinct, it follows that $u' \neq v'$. Also, by Lemma 2.5 it follows that $u' \sim_{k-1} v'$. Thus by the induction hypothesis, there exist distinct letters $a, b \in A$ and a Sturmian word $\omega \in \{a, b\}^{\mathbb{N}}$ such that $u'$ and $v'$ are both factors of $\omega$ of the form $u' = xab\bar{x}$ and $v' = xba\bar{x}$ for some right special factor $x$ of $\omega$ of length $k - 2$. Thus we can write $u = a'xab\bar{x}b'$ and $v = a'xba\bar{x}b'$. Since $u \sim_k v$, $|a'xa| = k$, and $a \neq b$, it follows that $a'x$ must occur in $v'$ and hence $a' \in \{a, b\}$. Similarly we deduce that $b' \in \{a, b\}$.

Let us first suppose that $x \neq \bar{x}$. Then $a'xa$ must occur in $v'$ and $a\bar{x}b'$ must occur in $u'$. Hence both $a'xa$ and $a\bar{x}b'$ are factors of $\omega$. Moreover, since $x \neq \bar{x}$ it follows that $x$ is not left special in $\omega$ and $\bar{x}$ is not right special in $\omega$. Hence every occurrence of $x$ in $\omega$ is preceded by $a'$ and every occurrence of $\bar{x}$ in $\omega$ is followed by $b'$. Since the factors of $\omega$ are closed under reversal, we deduce that $a' = b'$ and $a'x$ is a right special factor of $\omega$. Moreover, since $u'$ and $v'$ are both factors of $\omega$ beginning in $x$ and ending in $\bar{x}$, it follows that $u = a'xab\bar{x}a'$ and $v = a'xba\bar{x}a'$ are both factors of $\omega$.

Next suppose that $x = \bar{x}$, so that $x$ is a bispecial factor of $\omega$. We may write the increasing sequence of bispecial factors $\varepsilon = B_0, B_1, \ldots, x = B_n, B_{n+1}, \ldots$ so that $x$ is the $n$th bispecial factor of $\omega$. We recall that associated to $\omega$ is a sequence $(a_i)_{i \geq 0} \in A^{\mathbb{N}}$ (called the directive word of $\omega$) defined by the condition that $a_iB_i$ is right special in $\omega$ (see for instance [28]).

Without loss of generality we can suppose that $a' = a$. We claim $b' = a$. Suppose to the contrary that $b' = b$. Then both $axa$ and $b\bar{x}b = bxb$ are factors of $v'$, contradicting that $\omega$ is balanced. Hence we must have $a' = b' = a$ and so $u = axab\bar{x}a$ and $v = axba\bar{x}a$. Now $x$ is a bispecial factor of the Sturmian word $\omega$. If $ax$ is a right special factor of $\omega$ then we are done by Remark 2.7. Otherwise, if $bx$ is a right special factor of $\omega$, then this means that $a_n = b$ where $a_n$ is the $n$th entry of the directive word of $\omega$. Let $\omega'$ be a Sturmian word whose directive word $(b_i)_{i \geq 0}$ is defined by $b_i = a_i$ for $i \neq n$, and $b_n = a$. Then $x$ is a bispecial factor of $\omega'$ and $ax$ is a right special factor of $\omega'$. It follows from Remark 2.7 that both $u$ and $v$ are factors of $\omega'$.

As an immediate consequence of Theorem 2.6 we have:

Corollary 2.10.

Let $u \in A^*$ be of the form $u = vxab\bar{x}w$ where $x$ is a right special factor of length $k - 1$ of a Sturmian word. Set $u' = vxba\bar{x}w$. Then $u \sim_k u'$.

$k$-Abelian classes in $A^n$

Here we shall estimate the number of $k$-Abelian equivalence classes of words in $A^n$. Fix $k \geq 1$ and let $m \geq 2$ be the number of letters of $A$.

Lemma 2.11.

The number of $k$-Abelian equivalence classes of $A^{n+1}$ is at least as large as the number of $k$-Abelian equivalence classes of $A^n$.

Proof. If $k = 1$ or $n < k - 1$, then the claim is clear. Otherwise, let $B$ be a set of representatives of the $k$-Abelian equivalence classes of $A^n$. The set $AB$ has $m$ times as many words as $B$. To prove the lemma, we will show that there can be at most $m$ words in $AB$ that are $k$-Abelian equivalent.

Let $a \in A$ and let $au_0, \ldots, au_m \in AB$ be $k$-Abelian equivalent. It needs to be shown that some of these words are equal. Two of these words must have the same $k$th letter; let these be $au$ and $av$. Because also $\mathrm{pref}_{k-1}(au) = \mathrm{pref}_{k-1}(av)$, it follows that $\mathrm{pref}_k(au) = \mathrm{pref}_k(av)$. If $t \in A^k$, then either $|u|_t = |au|_t = |av|_t = |v|_t$ (if $t \neq \mathrm{pref}_k(au)$), or $|u|_t = |au|_t - 1 = |av|_t - 1 = |v|_t$ (if $t = \mathrm{pref}_k(au)$). Thus $u$ and $v$ are $k$-Abelian equivalent and, by the definition of $B$, $u = v$. This proves the claim.

Let $s_1, s_2 \in A^{k-1}$ and let

$$S(s_1, s_2, n) = A^n \cap s_1A^* \cap A^*s_2$$

be the set of words of length $n$ that start with $s_1$ and end with $s_2$. For every word $w \in S(s_1, s_2, n)$ we can define a function

$$f_w : A^k \rightarrow \{0, \ldots, n - k + 1\}, \qquad f_w(t) = |w|_t.$$

If $u, v \in S(s_1, s_2, n)$, then $u \sim_k v$ if and only if $f_u = f_v$. To count the number of $k$-Abelian equivalence classes, we need to count the number of the functions $f_w$. Not every function $f : A^k \rightarrow \{0, \ldots, n - k + 1\}$ is possible. It must be

$$\sum_{t \in A^k} f(t) = n - k + 1, \qquad (2)$$

and there are also other restrictions, which are determined in Lemma 2.12.

If a function $f : A^k \rightarrow \mathbb{N}$ is given, then a directed multigraph $G_f$ can be defined as follows: the set of vertices is $A^{k-1}$, and if $t = s_1a = bs_2$, where $a, b \in A$, then there are $f(t)$ edges from $s_1$ to $s_2$. If $f = f_w$, then this multigraph is related to the Rauzy graph of $w$. In the next lemma, $\deg^-$ denotes the indegree and $\deg^+$ the outdegree of a vertex in $G_f$.

Lemma 2.12.

For a function $f : A^k \rightarrow \mathbb{N}$ and words $s_1, s_2 \in A^{k-1}$, the following are equivalent:

(i) there is a number $n$ and a word $w \in S(s_1, s_2, n)$ such that $f = f_w$,
(ii) there is an Eulerian path from $s_1$ to $s_2$ in $G_f$,
(iii) the underlying graph of $G_f$ is connected, except possibly for some isolated vertices, and $\deg^-(s) = \deg^+(s)$ for every vertex $s$, except that if $s_1 \neq s_2$, then $\deg^-(s_1) = \deg^+(s_1) - 1$ and $\deg^-(s_2) = \deg^+(s_2) + 1$,
(iv) the underlying graph of $G_f$ is connected, except possibly for some isolated vertices, and

$$\sum_{a \in A} f(as) = \sum_{a \in A} f(sa) + c_s \qquad (s \in A^{k-1}), \qquad (3)$$

where

$$c_s = \begin{cases} -1, & \text{if } s = s_1 \neq s_2, \\ 1, & \text{if } s = s_2 \neq s_1, \\ 0, & \text{otherwise}. \end{cases}$$

Proof. (i) ⇔ (ii): $w = a_1 \ldots a_n \in S(s_1, s_2, n)$ and $f = f_w$ if and only if

$$s_1 = a_1 \ldots a_{k-1} \rightarrow a_2 \ldots a_k \rightarrow \cdots \rightarrow a_{n-k+2} \ldots a_n = s_2$$

is an Eulerian path in $G_f$.

(ii) ⇔ (iii): This is well known.

(iii) ⇔ (iv): (iv) is just a reformulation of (iii) in terms of the function $f$.

In the next lemma we consider the independence of homogeneous systems related to the equations (3) and (2).

Lemma 2.13.

Let $x_t$, where $t \in A^k$, be $m^k$ unknowns. The system of equations

$$\sum_{a \in A} x_{as} = \sum_{a \in A} x_{sa} \qquad (s \in A^{k-1}) \qquad (4)$$

is not independent, but all of its proper subsystems are. If we add the equation

$$\sum_{t \in A^k} x_t = 0 \qquad (5)$$

to one of these independent systems, then the system remains independent.

Proof. The sum of the equations (4) is a trivial identity $\sum_{t \in A^k} x_t = \sum_{t \in A^k} x_t$, so every one of these equations follows from the other $m^{k-1} - 1$ equations. If $s_1, s_2 \in A^{k-1}$ are two different words, then $x_t = |s_1s_2|_t$ for all $t$ is a solution of all the equations, except those with $s = s_1$ or $s = s_2$. This proves that all proper subsystems are independent. Addition of (5) keeps them independent, because $x_t = 1$ for all $t$ is a solution of the system (4) but not of (5).

Theorem 2.14.

Let $k \geq 1$ and $m \geq 2$ be fixed numbers and let $A$ be an $m$-letter alphabet. The number of $k$-Abelian equivalence classes of $A^n$ is $\Theta(n^{m^k - m^{k-1}})$.

Proof.

Let $n \geq k - 1$, $f : A^k \rightarrow \{0, \ldots, n - k + 1\}$ and $u, v \in A^{k-1}$. By Lemma 2.12, there is a word $w \in S(u, v, n)$ such that $f = f_w$ only if $f$ satisfies (2) and (3). Consider the system formed by these equations. The function $f_w$ satisfies the equations for every $w \in S(u, v, n)$, so the system has a solution. By Lemma 2.13, the rank of the coefficient matrix of the system is $m^{k-1}$, so the general solution of this system is of the form

$$f(r_i) = \sum_{j=1}^{m^k - m^{k-1}} a_{ij} f(s_j) + b_i \qquad (i = 1, \ldots, m^{k-1}),$$

where the words $r_i$ and $s_j$ form the set $A^k$ and $a_{ij}$, $b_i$ are rational numbers. Because $0 \leq f(s_j) \leq n - k + 1$, there are $O(n^{m^k - m^{k-1}})$ possible functions $f$.

Let $u = v$ and consider the system of equations (3). By Lemma 2.13, the general solution of this homogeneous system is of the form

$$f(r_i) = \sum_{j=1}^{m^k - m^{k-1} + 1} a_{ij} f(s_j) \qquad (i = 1, \ldots, m^{k-1} - 1), \qquad (6)$$

where the words $r_i$ and $s_j$ form the set $A^k$ and $a_{ij}$ are rational numbers. The coefficients $a_{ij}$ do not depend on $n$. Let

$$c = \max\left\{ \sum_{j=1}^{m^k - m^{k-1} + 1} |a_{ij}| \;:\; 1 \leq i \leq m^{k-1} - 1 \right\}$$

and let $d$ be the least common multiple of the denominators of the numbers $a_{ij}$. Every constant function $f$ satisfies the system of equations. In particular, $f(t) = \lfloor n/(2m^k) \rfloor$ for all $t$ is a solution of the system. If we let

$$f(s_j) = \left\lfloor \frac{n}{2m^k} \right\rfloor + b_j, \qquad \text{where } |b_j| < \frac{n}{2cm^k} - 1 \text{ and } d \mid b_j,$$

then the numbers

$$f(r_i) = \left\lfloor \frac{n}{2m^k} \right\rfloor + \sum_{j=1}^{m^k - m^{k-1} + 1} a_{ij} b_j$$

given by (6) are integers and $1 \leq f(t) \leq n/m^k - 1$ for all $t \in A^k$. Because $f(t) \geq 1$ for all $t$, the underlying graph of $G_f$ is connected, so by Lemma 2.12 there is a word $w \in S(u, v, |w|)$ such that $f = f_w$. Because $f(t) \leq n/m^k - 1$ for all $t$, we get

$$|w| = \sum_{t \in A^k} f(t) + k - 1 \leq n - m^k + k - 1 < n.$$

There are $\Omega(n^{m^k - m^{k-1} + 1})$ ways to choose the numbers $b_j$. Every choice gives a different function $f = f_w$ for some $w \in S(u, v, |w|)$ such that $|w| < n$. Let these words be $w_1, \ldots, w_N$. No two of them are $k$-Abelian equivalent.
Among these words there are at least $N/n$ words of equal length. By Lemma 2.11, there are at least $N/n$ words of length $n$ such that no two of them are $k$-Abelian equivalent, and $N/n = \Omega(n^{m^k - m^{k-1}})$.

3. $k$-Abelian complexity & periodicity

In this section we prove that if $\mathcal{P}^{(k)}_\omega(n) < q^{(k)}(n)$ for some $k \in \mathbb{Z}^+ \cup \{+\infty\}$ and $n \geq 1$, then $\omega$ is ultimately periodic (see Corollary 3.3 below). For this purpose we introduce an auxiliary family of equivalence relations $\mathcal{R}_k$ on $A^*$ defined as follows: Let $k \in \mathbb{Z}^+ \cup \{+\infty\}$. Given $u, v \in A^*$ we write $u \mathrel{\mathcal{R}_k} v$ if and only if $u \sim_1 v$ (i.e., $u \sim_{ab} v$) and $u$ and $v$ share a common prefix and a common suffix of length $k - 1$. In case $|u| < k - 1$, $u \mathrel{\mathcal{R}_k} v$ means $u = v$. It follows immediately from Lemma 2.3 that

$$u \sim_k v \implies u \mathrel{\mathcal{R}_k} v. \qquad (7)$$

In general the converse is not true: for example, taking $u = 0011$ and $v = 0101$ we see that $u \mathrel{\mathcal{R}_2} v$, yet $u$ and $v$ are not 2-Abelian equivalent. However, in view of Proposition 2.8 we have:

Corollary 3.1.

Let $u$ and $v$ be two factors of a Sturmian word $\omega$, and $k \in \mathbb{Z}^+ \cup \{+\infty\}$. Then $u \sim_k v$ if and only if $u \mathrel{\mathcal{R}_k} v$.

Let $\omega \in A^{\mathbb{N}}$. Associated to the relation $\mathcal{R}_k$ is a complexity function, denoted $\rho^{(k)}_\omega(n)$, which counts the number of distinct $\mathcal{R}_k$ equivalence classes of factors of $\omega$ of length $n$. It follows from (7) above that for each $n$ we have

$$\rho^{(k)}_\omega(n) \leq \mathcal{P}^{(k)}_\omega(n). \qquad (8)$$

We recall the function $q^{(k)} : \mathbb{N} \rightarrow \mathbb{N}$ ($k \in \mathbb{Z}^+ \cup \{+\infty\}$) defined by

$$q^{(k)}(n) = \begin{cases} n + 1 & \text{for } n \leq 2k - 1, \\ 2k & \text{for } n \geq 2k. \end{cases}$$

Theorem 3.2.

Let $\omega = a_0a_1a_2\ldots \in A^{\mathbb{N}}$ and $k \in \mathbb{Z}^+ \cup \{+\infty\}$. If $\rho^{(k)}_\omega(n) < q^{(k)}(n)$ for some $n \geq 1$, then $\omega$ is ultimately periodic.

Proof. The result is well known in case $k = +\infty$ (see [23]). For $k \in \mathbb{Z}^+$, we proceed by induction on $k$. In case $k = 1$, $\mathcal{R}_1$ is simply the usual notion of Abelian equivalence and the result follows from [4].

Now suppose $k > 1$ and $\rho^{(k)}_\omega(n) < q^{(k)}(n)$ for some $n \geq 1$. It follows immediately from the definition of $\mathcal{R}_k$ that if $u \mathrel{\mathcal{R}_k} v$ and $|u| \leq 2k - 1$, then $u = v$. Thus, if $\rho^{(k)}_\omega(n) < q^{(k)}(n)$ where $n \leq 2k - 1$, then $\rho_\omega(n) < n + 1$ and so $\omega$ is ultimately periodic by the well known result of Morse and Hedlund in [23].

Thus we suppose that $\rho^{(k)}_\omega(n) < 2k$ for some $n \geq 2k$. We claim that $\omega$ must be ultimately periodic. Suppose to the contrary that $\omega$ is aperiodic. We shall show that this implies that $\rho^{(k-1)}_\nu(n-2) < 2(k-1)$, where $\nu = a_0^{-1}\omega$ denotes the first shift of $\omega$, i.e., the word obtained from $\omega$ by removing the first letter of $\omega$. Since $n - 2 \geq 2(k-1)$ we deduce that $\rho^{(k-1)}_\nu(n-2) < q^{(k-1)}(n-2)$. But then by the induction hypothesis on $k$, it follows that $\nu$ (and hence $\omega$) is ultimately periodic, a contradiction.

Consider the map

$$\Psi : \mathcal{F}_\omega(n)/\mathcal{R}_k \longrightarrow \mathcal{F}_\nu(n-2)/\mathcal{R}_{k-1}$$

defined by $\Psi([aub]_k) = [u]_{k-1}$, where $a, b \in A$, $u \in A^*$ is of length $n - 2$, and $[u]_k$ denotes the $\mathcal{R}_k$ equivalence class of $u$. To see that $\Psi$ is well defined, suppose $aub \mathrel{\mathcal{R}_k} cvd$. Then since $k > 1$, it follows that $a = c$ and $b = d$ and thus that $u \mathrel{\mathcal{R}_1} v$. Moreover, as $aub$ and $cvd$ share a common prefix and suffix of length $k - 1$, it follows that $u$ and $v$ share a common prefix and suffix of length $k - 2$. Thus $u \mathrel{\mathcal{R}_{k-1}} v$ as required. Clearly the mapping $\Psi$ is surjective; in fact for each $u \in \mathcal{F}_\nu(n-2)$ there exist $a, b \in A$ such that $aub \in \mathcal{F}_\omega(n)$. This is the reason for replacing $\omega$ by $\nu$.

We now show that either there exist distinct classes $[u]_{k-1}, [v]_{k-1} \in \mathcal{F}_\nu(n-2)/\mathcal{R}_{k-1}$ for which

$$\min\left\{ \mathrm{Card}\left(\Psi^{-1}([u]_{k-1})\right), \mathrm{Card}\left(\Psi^{-1}([v]_{k-1})\right) \right\} \geq 2, \qquad (9)$$

or there exists a class $[u]_{k-1} \in \mathcal{F}_\nu(n-2)/\mathcal{R}_{k-1}$ for which

$$\mathrm{Card}\left(\Psi^{-1}([u]_{k-1})\right) \geq 3. \qquad (10)$$

In either case it follows that

$$\mathrm{Card}\left(\mathcal{F}_\nu(n-2)/\mathcal{R}_{k-1}\right) \leq \mathrm{Card}\left(\mathcal{F}_\omega(n)/\mathcal{R}_k\right) - 2 < 2k - 2.$$

Since $\omega$ is assumed to be aperiodic, $\omega$ contains both a left special factor of the form $uc$ and a right special factor of the form $dv$ of length $n - 1$, where $c, d \in A$ and $u, v \in A^*$. Thus there exist distinct letters $a, b \in A$ such that $auc$ and $buc$ are factors of $\omega$. Moreover, since $a \neq b$, it follows that $[auc]_k \neq [buc]_k$. Thus $\mathrm{Card}(\Psi^{-1}([u]_{k-1})) \geq 2$. Similarly, there exist distinct letters $a', b' \in A$ such that $dva'$ and $dvb'$ are factors of $\omega$, and since $a' \neq b'$, it follows that $[dva']_k \neq [dvb']_k$. Thus $\mathrm{Card}(\Psi^{-1}([v]_{k-1})) \geq 2$. In case $[u]_{k-1} \neq [v]_{k-1}$, we obtain the desired inequality (9). In case $[u]_{k-1} = [v]_{k-1}$, since $a \neq b$ and $a' \neq b'$ it follows that

$$\mathrm{Card}\{[auc]_k, [buc]_k, [dva']_k, [dvb']_k\} \geq 3,$$

which gives the desired inequality (10).

Corollary 3.3.

Let $\omega \in A^{\mathbb{N}}$ and $k \in \mathbb{Z}^+ \cup \{+\infty\}$. If $\mathcal{P}^{(k)}_\omega(n) < q^{(k)}(n)$ for some $n \ge 0$, then $\omega$ is ultimately periodic.

Proof. As a consequence of the inequality (8), if $\mathcal{P}^{(k)}_\omega(n) < q^{(k)}(n)$ then $\rho^{(k)}_\omega(n) < q^{(k)}(n)$, whence by Theorem 3.2 it follows that $\omega$ is ultimately periodic.

The same method of proof as in Theorem 3.2 can be used to prove the following:

Corollary 3.4.

Let $\omega$ be a bi-infinite word over the alphabet $A$ and $k \in \mathbb{Z}^+ \cup \{+\infty\}$. If $\mathcal{P}^{(k)}_\omega(n) < q^{(k)}(n)$ for some $n \ge 0$, then $\omega$ is periodic.
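As a numerical sanity check of Corollary 3.3, the following Python sketch counts the $k$-Abelian classes among the length-$n$ factors of a finite prefix and compares the count with $q^{(k)}(n)$. The helper names (`q`, `k_abelian_signature`, `k_abelian_complexity`, `thue_morse_prefix`) are illustrative and not taken from the paper. The sketch assumes $k$ is a positive integer and that the chosen prefix is long enough to contain every factor of length $n$; otherwise the count is only a lower bound for $\mathcal{P}^{(k)}_\omega(n)$. The Thue-Morse word is used as a sample aperiodic word, so by the contrapositive of the corollary its complexity should never fall below $q^{(k)}(n)$.

```python
from collections import Counter


def q(k, n):
    """q^(k)(n) for a positive integer k: n + 1 if n <= 2k - 1, else 2k."""
    return n + 1 if n <= 2 * k - 1 else 2 * k


def k_abelian_signature(u, k):
    """Occurrence counts of every factor of u of length 1..k.

    Two words are k-Abelian equivalent exactly when their signatures coincide.
    """
    counts = Counter(
        u[i:i + m]
        for m in range(1, k + 1)
        for i in range(len(u) - m + 1)
    )
    return frozenset(counts.items())


def k_abelian_complexity(word, n, k):
    """Number of k-Abelian classes among the length-n factors of `word`."""
    factors = {word[i:i + n] for i in range(len(word) - n + 1)}
    return len({k_abelian_signature(u, k) for u in factors})


def thue_morse_prefix(length):
    """First `length` letters of the Thue-Morse word (parity of binary digit sums)."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


tm = thue_morse_prefix(2048)
for k in (1, 2, 3):
    for n in range(1, 11):
        # Contrapositive of Corollary 3.3: an aperiodic word never drops below q.
        assert k_abelian_complexity(tm, n, k) >= q(k, n)
```

In the other direction, the periodic word $(01)^\infty$ has a single Abelian class of factors of length 2, which is below $q^{(1)}(2) = 2$, so `k_abelian_complexity("01" * 50, 2, 1)` falls under the threshold.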

We conclude this section with a few remarks:

Remark 3.5.

In the special case $k = +\infty$, the condition given in Corollary 3.3 gives a characterization of ultimately periodic words by means of factor complexity: $\omega \in A^{\mathbb{N}}$ is ultimately periodic if and only if $\rho_\omega(n) < n+1$ for some $n \ge 0$. However, $k$-Abelian complexity does not yield such a characterization. Indeed, both Sturmian words and the ultimately periodic word $01^\infty = 0111\cdots$ have the same Abelian complexity. More generally, the ultimately periodic word $0^{2k-1}1^\infty$ has the same $k$-Abelian complexity as a Sturmian word (see Theorem 4.1 below).

Remark 3.6.

The result of Corollary 3.4 is already known to be true in the special cases $k = +\infty$ (see [23]) and $k = 1$ (see Remark 4.07 in [4]). In these special cases, the converse is also true. But for general $2 \le k < +\infty$ the converse is false. For instance, let $\mathrm{Card}(A) = 5$, and let $u$ be a word containing at least one occurrence of every $x \in A$. Let $\omega$ be the periodic word $\omega = \ldots uuuu \ldots$. Then $\rho^{(2)}_\omega(n) \ge 5$ for each $n \ge 1$.

4. k-Abelian complexity of Sturmian words

In this section we determine the $k$-Abelian complexity of Sturmian words and show that for each $k$, the complexity function $\mathcal{P}^{(k)}$ completely characterizes Sturmian words amongst all aperiodic words. More precisely:

Theorem 4.1.

Fix $k \in \mathbb{Z}^+ \cup \{+\infty\}$. Let $\omega \in A^{\mathbb{N}}$ be an aperiodic word. The following conditions are equivalent:

• $\omega$ is a balanced binary word, that is, Sturmian.

• $\mathcal{P}^{(k)}_\omega(n) = q^{(k)}(n) = \begin{cases} n+1 & \text{for } 0 \le n \le 2k-1, \\ 2k & \text{for } n \ge 2k. \end{cases}$

Our proof of Theorem 4.1 will make use of the following functions $g_i$, which transform binary words by changing the letters around a specific point. For words $w \in \{0,1\}^n$ we define $g_1, \ldots, g_n$ as follows:
\[ g_i(w) = \begin{cases} u10v, & \text{if } i < n,\ w = u01v \text{ and } |u0| = i, \\ u1, & \text{if } i = n \text{ and } w = u0. \end{cases} \]

Lemma 4.2.

Let $n \ge 1$ and let $w \in \{0,1\}^{\omega}$ be Sturmian. There is a word $u_1 \in \{0,1\}^n$ and a permutation $\sigma$ of $\{1, \ldots, n\}$ such that if $u_{i+1} = g_{\sigma(i)}(u_i)$ for $i = 1, \ldots, n$, then $u_1, \ldots, u_{n+1}$ are the factors of $w$ of length $n$.

Proof. Let $u_1, \ldots, u_{n+1}$ be the factors of $w$ of length $n$ in lexicographic order. It follows from Theorem 1.1 in [2] that for every $i$ there is an $m$ such that $u_{i+1} = g_m(u_i)$. It needs to be proved that the $m$'s are all different. Let $u_{i+1} = g_m(u_i)$ and $u_{i'+1} = g_m(u_{i'})$. For every $j$,
\[ |\mathrm{pref}_m(u_j)|_1 \le |\mathrm{pref}_m(u_{j+1})|_1, \]
and for $j \in \{i, i'\}$,
\[ |\mathrm{pref}_m(u_j)|_1 < |\mathrm{pref}_m(u_{j+1})|_1. \]
If $i \ne i'$, then $|\mathrm{pref}_m(u_1)|_1 + 2 \le |\mathrm{pref}_m(u_{n+1})|_1$, which contradicts the balance property of Sturmian words.

Example 4.3.

The factors of the Fibonacci word of length six are
\[ u_1 = 001001, \quad u_2 = 001010 = g_5(u_1), \quad u_3 = 010010 = g_2(u_2), \quad u_4 = 010100 = g_4(u_3), \]
\[ u_5 = 100100 = g_1(u_4), \quad u_6 = 100101 = g_6(u_5), \quad u_7 = 101001 = g_3(u_6). \]
We have $u_2 \sim_2 u_3 \sim_2 u_4$ and $u_6 \sim_2 u_7$. There are no other 2-Abelian equivalences between these factors.

Proof of Theorem 4.1. First let us suppose $\omega \in \{0,1\}^{\mathbb{N}}$ is Sturmian and let $1 \le k \le +\infty$. Let $n \le 2k-1$. By Lemma 2.4 two factors $u$ and $v$ of $\omega$ of length $n$ are $k$-Abelian equivalent if and only if $u = v$. Thus $\mathcal{P}^{(k)}_\omega(n) = n+1$ as required.

Next let $n \ge 2k$ and let $u_1, \ldots, u_{n+1}$ and $\sigma$ be as in Lemma 4.2. If $k \le \sigma(i) \le n-k$, then there are words $s, t \in \{0,1\}^*$ and $u, v \in \{0,1\}^{k-1}$ so that $u_i = su01vt$ and $u_{i+1} = g_{\sigma(i)}(u_i) = su10vt$. We prove that $u_i \sim_k u_{i+1}$. The prefixes and suffixes of $u_i$ and $u_{i+1}$ of length $k-1$ coincide. The factors of $u_i$ of length $k$ are the factors of $su$, $u01v$ and $vt$ of length $k$, and the factors of $u_{i+1}$ of length $k$ are the factors of $su$, $u10v$ and $vt$ of length $k$. Because $u01v$ and $u10v$ are factors of $\omega$, it follows that $u$ is right special and $v$ is left special and hence equal to the reversal of $u$. By Theorem 2.6, $u01v$ and $u10v$ are $k$-Abelian equivalent. This proves that $u_i \sim_k u_{i+1}$ if $k \le \sigma(i) \le n-k$. Thus the words $u_1, \ldots, u_{n+1}$ are in at most $2k$ different $k$-Abelian equivalence classes and $\mathcal{P}^{(k)}_\omega(n) \le 2k$. By Corollary 3.3, $\mathcal{P}^{(k)}_\omega(n) = 2k$.

Next let $1 \le k \le +\infty$ and let $\omega \in A^{\mathbb{N}}$ be aperiodic and
\[ \mathcal{P}^{(k)}_\omega(n) = q^{(k)}(n) = \begin{cases} n+1 & \text{for } 0 \le n \le 2k-1, \\ 2k & \text{for } n \ge 2k. \end{cases} \]
Taking $n = 1$ we see that $\omega$ is binary (say $\omega \in \{0,1\}^{\mathbb{N}}$). We must show that $\omega$ is balanced.

We first recall some basic facts concerning factors of Sturmian words (see for instance [28]). Let $\eta \in \{0,1\}^{\mathbb{N}}$ be a Sturmian word, and let $F_\eta(n)$ denote the factors of $\eta$ of length $n$. The set $F_\eta(n+1)$ is completely determined from the set $F_\eta(n)$ unless $\eta$ has a bispecial factor $B$ of length $n-1$. In this case, both $0B$ and $1B$ are factors of $\eta$ and exactly one of the two is right special.
If $0B$ is right special, then every occurrence of $1B$ in $\eta$ is an occurrence of $1B0$. If $v$ is a factor of $\eta$ and $u$ a prefix of $v$, we write $u \vdash v$ if every occurrence of $u$ in $\eta$ is an occurrence of $v$. Thus if $0B$ is right special, then $1B \vdash 1B0$, and similarly if $1B$ is right special, then $0B \vdash 0B1$.

Now suppose to the contrary that the aperiodic binary word $\omega$ is not Sturmian. Then there exist a smallest positive integer $n \ge 2$ and a Sturmian word $\eta$ such that $F_\omega(n) = F_\eta(n)$ but $F_\omega(n+1) \ne F_{\eta'}(n+1)$ for every choice of Sturmian word $\eta'$. This means that $\omega$ has a bispecial factor $B$ of length $n-1$ such that both $0B$ and $1B$ are in $F_\omega(n)$ and one of the following must occur:

i) Neither $0B$ nor $1B$ is right special in $\omega$;

ii) There exists a unique $a \in \{0,1\}$ such that $aB$ is right special, and $(1-a)B \vdash (1-a)B(1-a)$;

iii) Both $0B$ and $1B$ are right special in $\omega$.

We will show that since $\omega$ is aperiodic, only case iii) is in fact possible. Clearly, if neither $0B$ nor $1B$ were right special, then $\mathrm{Card}(F_\omega(n)) = \mathrm{Card}(F_\omega(n+1))$, whence $\omega$ is ultimately periodic, a contradiction. Next suppose case ii) occurs. We may suppose without loss of generality that $0B$ is right special and $1B \vdash 1B1$. If $1 \vdash 1B$ (and hence $1 \vdash 1B1$), then we would have $1 \vdash (1B)^n$ for every $n \ge 1$, whence the tail of $\omega$ corresponding to the first occurrence of 1 in $\omega$ is periodic. Thus if $\neg(1 \vdash 1B)$, then there exists a bispecial factor $B'$ of $\omega$ with $0 < |B'| < |B|$ such that $1B'$ is right special and $1B'1 \vdash 1B$ and hence $1B'1 \vdash 1B1$. Writing $1B1 = 1B'1V$ we have $1B'1 \vdash 1B'1V$.

We next show by induction on $n$ that $1B'1V^n$ is a palindrome for each $n \ge 1$. Clearly this is true for $n = 1$ since $1B'1V = 1B1$. Next suppose $1B'1V^n$ is a palindrome. Writing $\overline{w}$ for the reversal of a word $w$, we have
\[ \overline{1B'1V^{n+1}} = \overline{V^{n+1}}\,1B'1 = \overline{V}\,\overline{V^{n}}\,1B'1 = \overline{V}\,\overline{1B'1V^{n}} = \overline{V}\,1B'1V^{n} = 1B'1VV^{n} = 1B'1V^{n+1}. \]
Having established that $1B'1V^n$ is a palindrome, it follows that $1B'1$ is a suffix of $1B'1V^n$ and hence $1B'1V^n \vdash 1B'1V^{n+1}$ for each $n \ge 1$. Whence as before $\omega$ is ultimately periodic.

Thus if $\omega$ is not Sturmian, case iii) must occur.
This implies that $F_\omega(n+1) = F_\eta(n+1) \cup \{0B0, 1B1\}$ and $\mathrm{Card}(F_\eta(n+1) \cap \{0B0, 1B1\}) = 1$. Since $\eta$ is Sturmian, the number of $k$-Abelian classes of factors of $\eta$ of length $n+1$ is equal to $q^{(k)}(n+1)$. But the additional factor $aBa$ of $\omega$ of length $n+1$ introduces a new $k$-Abelian class since it is not even Abelian equivalent to any other factor of $\eta$ (and hence $\omega$) of length $n+1$. Thus $\mathcal{P}^{(k)}_\omega(n+1) = q^{(k)}(n+1) + 1$, a contradiction. Thus $\omega$ is Sturmian.

Remark 4.4.

In view of Corollary 3.3, within the class of aperiodic words, Sturmian words have the lowest possible $k$-Abelian complexity. See [1, 14, 15, 23] for other instances in which Sturmian words have the lowest complexity amongst all aperiodic words.
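The statement of Theorem 4.1 can be checked by brute force on a prefix of the Fibonacci word. The following Python sketch is only an illustration: the function names are hypothetical, $k$ is assumed to be a positive integer, and the prefix length 2000 is an assumption chosen to be comfortably large enough to contain every factor of the lengths tested. It groups the length-$n$ factors into $k$-Abelian classes and compares the number of classes with $q^{(k)}(n)$; it also checks the word $0^{2k-1}1^\infty$ of Remark 3.5, for which the prefix $0^{2k-1}1^n$ already contains all factors of length $n$.

```python
from collections import Counter, defaultdict


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def k_abelian_signature(u, k):
    """Occurrence counts of every factor of u of length 1..k."""
    counts = Counter(
        u[i:i + m]
        for m in range(1, k + 1)
        for i in range(len(u) - m + 1)
    )
    return frozenset(counts.items())


def k_abelian_classes(word, n, k):
    """Group the length-n factors of `word` by k-Abelian equivalence."""
    classes = defaultdict(set)
    for i in range(len(word) - n + 1):
        u = word[i:i + n]
        classes[k_abelian_signature(u, k)].add(u)
    return sorted(sorted(c) for c in classes.values())


def q(k, n):
    """q^(k)(n) for a positive integer k: n + 1 if n <= 2k - 1, else 2k."""
    return n + 1 if n <= 2 * k - 1 else 2 * k


fib = fibonacci_prefix(2000)
for cls in k_abelian_classes(fib, 6, 2):
    print(cls)

for k in (1, 2, 3):
    for n in range(1, 13):
        assert len(k_abelian_classes(fib, n, k)) == q(k, n)
        # Remark 3.5: 0^(2k-1) 1^infinity has the same k-Abelian complexity.
        periodic_tail = "0" * (2 * k - 1) + "1" * n
        assert len(k_abelian_classes(periodic_tail, n, k)) == q(k, n)
```

For $n = 6$ and $k = 2$ the sketch groups the seven factors into the four classes $\{001001\}$, $\{001010, 010010, 010100\}$, $\{100100\}$ and $\{100101, 101001\}$, in agreement with Example 4.3 and with $q^{(2)}(6) = 4$.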

5. Bounded k-Abelian complexity & k-Abelian repetitions

There is great interest in avoidability of repetitions in infinite words. This originated with the classical work of Thue [31] and [32], in which he established the existence of an infinite binary (resp. ternary) word avoiding cubes (resp. squares). It was later shown that to avoid Abelian cubes or Abelian squares, one needs 3-letter or 4-letter alphabets respectively (see [6] and [18]). The corresponding problems for $k$-abelian repetitions turned out to be quite nontrivial. It follows easily that the size of the smallest alphabet over which $k$-abelian cubes can be avoided is either 2 or 3, and similarly the size of the smallest alphabet over which $k$-abelian squares can be avoided is either 3 or 4. In the latter case, for $k = 2$ a computer verification revealed that the correct value is 4, as in the case of Abelian repetitions: each ternary 2-abelian square-free word is of length at most 536 [12]. In the former case computer verification shows that there exist binary words of length 100000 which are 2-abelian cube-free [12]. It is still unknown whether there exists an infinite binary word which is 2-abelian cube-free. For some larger values of $k$ such infinite words exist. In the case of binary alphabets and cubes it was shown in a sequence of papers that an infinite word avoiding $k$-abelian cubes can be constructed for $k = 8$, $k = 5$ and for $k = 3$ (see [13], [20] and [21] respectively). So only the value $k = 2$ remains open. It would be extremely surprising if no such infinite words exist. For avoiding $k$-abelian squares in a ternary alphabet the situation is equally challenging. We know that for $k = 3$ there exist words of length 100000 avoiding 3-abelian squares. The avoidability in infinite words of $k$-abelian squares in a ternary alphabet is only known for large values of $k$ ($k \ge 64$) (see [11]).

In this section we prove that $k$-Abelian repetitions are unavoidable in words having bounded $k$-Abelian complexity. For each positive integer $k$ we set $A^{\le k} = \{x \in A^* : |x| \le k\}$. Given an infinite word $\omega = a_0a_1a_2\ldots \in A^{\mathbb{N}}$, for each $0 \le i < j < +\infty$ we denote by $\omega[i,j]$ the factor $a_ia_{i+1}\cdots a_j$.

Definition 5.1.

Let $k$ and $B$ be positive integers and $\omega \in A^{\mathbb{N}}$. We say $\omega$ is $(k,B)$-balanced if and only if for all factors $u$ and $v$ of $\omega$ of equal length, and for all $x \in A^{\le k}$, we have $\bigl| |u|_x - |v|_x \bigr| \le B$. We say $\omega$ is arbitrarily $k$-imbalanced if $\omega$ is not $(k,B)$-balanced for any positive integer $B$.

An elementary, but key observation is that

Lemma 5.2.

Let $k$ be a positive integer and $\omega \in A^{\mathbb{N}}$. Then $\omega$ has bounded $k$-Abelian complexity if and only if $\omega$ is $(k,B)$-balanced for some positive integer $B$.

Proof. Clearly if $\mathcal{P}^{(k)}_\omega$ is bounded, say by $B$, then $\omega$ is $(k, B-1)$-balanced. Conversely, if $\omega$ is $(k,B)$-balanced, then for each positive integer $n$ and for each $x \in A^*$ with $|x| \le k$ we have $\mathrm{Card}\{|u|_x : u \in F_\omega(n)\} \le B+1$. It follows that $\mathcal{P}^{(k)}_\omega(n) \le (B+1)^K$ where $K = \mathrm{Card}\, A^{\le k}$.

Fix a positive integer $k$. It follows from Theorem 4.1 and Lemma 5.2 that each Sturmian word is $(k,B)$-balanced for some positive integer $B$ (depending on $k$). Actually, I. Fagnot and L. Vuillon proved in [9] that every Sturmian word is $(k,k)$-balanced.

Definition 5.3.

Fix $k \in \mathbb{Z}^+ \cup \{+\infty\}$, and $N$ a positive integer. By a $k$-Abelian $N$-power we mean a word $U$ of the form $U = U_1U_2\cdots U_N$ such that $U_i \sim_k U_j$ for all $1 \le i, j \le N$.

In this section we shall prove the following result:

Theorem 5.4.

Fix $k \in \mathbb{Z}^+ \cup \{+\infty\}$. Let $\omega = a_0a_1a_2\ldots \in A^{\mathbb{N}}$ be an infinite word on a finite alphabet $A$ having bounded $k$-Abelian complexity. Let $D \subseteq \mathbb{N}$ be a set of positive upper density, that is
\[ \limsup_{n \to \infty} \frac{\mathrm{Card}(D \cap \{1, 2, \ldots, n\})}{n} > 0. \]
Then, for every positive integer $N$, there exist $i$ and $\ell$ such that $\{i, i+\ell, i+2\ell, \ldots, i+\ell N\} \subset D$ and the $N$ consecutive blocks $(\omega[i+j\ell,\, i+(j+1)\ell-1])_{0 \le j \le N-1}$ of length $\ell$ are pairwise $k$-Abelian equivalent. In particular, $\omega$ contains arbitrarily high $k$-Abelian powers.

Remark 5.5. The result in Theorem 5.4 is already known in the special case of $D = \mathbb{N}$ and $k = +\infty$ and $k = 1$ (see [23] and [27] respectively).

Before proving Theorem 5.4 we give some immediate consequences:

Corollary 5.6.

Let $k$ and $N$ be positive integers, and $\omega$ an infinite word avoiding $k$-Abelian $N$-powers. Then $\omega$ is arbitrarily $k$-imbalanced.

Proof. This follows immediately from Lemma 5.2 and Theorem 5.4.
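Definition 5.3 can be made concrete with a brute-force search for a $k$-Abelian $N$-power inside a finite word. The Python sketch below is a naive illustration under stated assumptions, not the method used in the proofs of this section: `find_k_abelian_power` and the other names are hypothetical, $k$ is assumed to be a positive integer, and the search is run on a prefix of the Fibonacci word, whose $k$-Abelian complexity is bounded by Theorem 4.1.

```python
from collections import Counter


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def k_abelian_signature(u, k):
    """Occurrence counts of every factor of u of length 1..k."""
    counts = Counter(
        u[i:i + m]
        for m in range(1, k + 1)
        for i in range(len(u) - m + 1)
    )
    return frozenset(counts.items())


def find_k_abelian_power(word, k, N):
    """Return (i, block_len) such that word[i:i + N * block_len] is a
    k-Abelian N-power, or None if the word contains none.

    Exhaustive search, shortest block length first.
    """
    for block_len in range(1, len(word) // N + 1):
        for i in range(len(word) - N * block_len + 1):
            signatures = {
                k_abelian_signature(
                    word[i + j * block_len:i + (j + 1) * block_len], k
                )
                for j in range(N)
            }
            if len(signatures) == 1:
                return i, block_len
    return None


fib = fibonacci_prefix(200)
print(find_k_abelian_power(fib, 1, 4))  # an Abelian 4-power
print(find_k_abelian_power(fib, 2, 3))  # a 2-Abelian cube
```

Both searches succeed on this prefix: the factor $10\,10\,01\,01$ is an Abelian 4-power and the ordinary cube $(010)^3$ is in particular a 2-Abelian cube.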

Corollary 5.7.

Let $\omega$ be a Sturmian word. Then $\omega$ contains $k$-Abelian $N$-powers for all positive integers $k$ and $N$.

Proof. This follows immediately from Theorems 4.1 and 5.4; in fact the $k$-Abelian complexity $\mathcal{P}^{(k)}_\omega$ is bounded (by $2k$) for each positive integer $k$.

Remark 5.8.

It is known that a Sturmian word $\omega$ contains an $N$-power for each positive integer $N$ if and only if the sequence of partial quotients in the continued fraction expansion of the slope of $\omega$ is unbounded. So, a Sturmian word whose corresponding slope has bounded partial quotients (e.g., the Fibonacci word) will not contain $N$-powers for $N$ sufficiently large (e.g., the Fibonacci word contains no 4-powers [17, 22]). However, every Sturmian word will contain arbitrarily high $k$-Abelian powers.

Our proof of Theorem 5.4 will make use of the following well known result first conjectured by Erdős and Turán and later proved by E. Szemerédi:

Theorem 5.9. [Szemerédi's theorem [30]]

Let $D \subseteq \mathbb{N}$ be a set of positive upper density. Then $D$ contains arbitrarily long arithmetic progressions.

Proof of Theorem 5.4. Let $D \subseteq \mathbb{N}$ be a set of positive upper density. First we consider the case $k = +\infty$. By assumption $\mathcal{P}^{(+\infty)}_\omega(n)$ is bounded. This is equivalent to saying that $\omega$ has bounded factor complexity. It follows by Morse-Hedlund that $\omega$ is ultimately periodic, i.e., $\omega = UV^\infty$ for some $U, V \in A^*$. For each $i \ge 0$, set $D_i = D \cap \{i + j|V| : j = 1, 2, 3, \ldots\}$. Pick $i > |U|$ such that the set $D_i$ has positive upper density. Then an arithmetic progression of length $N+1$ in $D_i$ (guaranteed by Szemerédi's theorem) determines the $N$th power of some cyclic conjugate of $V$.

Next let us fix positive integers $k$ and $N$ and assume that $\mathcal{P}^{(k)}_\omega(n)$ is bounded. It follows by Lemma 5.2 that $\omega$ is $(k,B)$-balanced for some positive integer $B$. We recall the following lemma proved in [27]:

Lemma 5.10. [Lemma 5.4 in [27]]

Let $k$ and $B$ be positive integers. There exist positive integers $\alpha_x$ for each $x \in A^{\le k}$ and a positive integer $M$ such that whenever
\[ \sum_{x \in A^{\le k}} c_x \alpha_x \equiv 0 \pmod{M} \]
for integers $c_x$ with $|c_x| \le B$ for each $x \in A^{\le k}$, then $c_x = 0$ for each $x \in A^{\le k}$.

Set $D' = (D - 1) \cap \{k, k+1, k+2, \ldots\}$. Then $D'$ is of positive upper density. We now define a finite coloring
\[ \Phi : D' \longrightarrow \{0, 1, 2, \ldots, M-1\} \times F_\omega(2k) \]
as follows:
\[ \Phi(n) = \Bigl( \sum_{x \in A^{\le k}} |\omega[1,n]|_x \, \alpha_x \pmod{M} \ ;\ \omega[n-k+1,\, n+k] \Bigr), \]
where $\alpha_x$ and $M$ are as in Lemma 5.10. Note that the second coordinate of $\Phi(n)$ is the suffix of length $2k$ of $\omega[1, n+k]$. We note also that if $\Phi(m) = \Phi(n)$ for some $m < n$, then by considering the first coordinate of $\Phi$ one has
\[ \sum_{x \in A^{\le k}} |\omega[1,n]|_x \, \alpha_x - \sum_{x \in A^{\le k}} |\omega[1,m]|_x \, \alpha_x \equiv 0 \pmod{M} \tag{11} \]
\[ \sum_{x \in A^{\le k}} \bigl( |\omega[1,n]|_x - |\omega[1,m]|_x \bigr) \alpha_x \equiv 0 \pmod{M} \tag{12} \]
\[ \sum_{x \in A^{\le k}} |\omega[m-|x|+2,\, n]|_x \, \alpha_x \equiv 0 \pmod{M}. \tag{13} \]

$\Phi$ defines a finite partition of $D'$ where two elements $r$ and $s$ in $D'$ belong to the same class of the partition if and only if $\Phi(r) = \Phi(s)$. Clearly at least one class of this partition of $D'$ has positive upper density. Thus by Szemerédi's theorem, there exist positive integers $r$ and $t$ with $r \ge k$ such that
\[ \{r, r+t, r+2t, \ldots, r+Nt\} \subset D' \]
and
\[ \Phi(r) = \Phi(r+t) = \Phi(r+2t) = \cdots = \Phi(r+Nt). \]
We now claim that the $N$ consecutive blocks of length $t$
\[ \omega[r+1, r+t],\ \omega[r+t+1, r+2t],\ \omega[r+2t+1, r+3t],\ \ldots,\ \omega[r+(N-1)t+1, r+Nt] \]
are pairwise $k$-Abelian equivalent. This would prove that $\omega$ contains a $k$-Abelian $N$-power in position $r+1 \in D$. To prove the claim, let $0 \le i, j \le N-1$. We will show that
\[ \omega[r+it+1,\, r+(i+1)t] \sim_k \omega[r+jt+1,\, r+(j+1)t]. \]
By (13), first taking $n = r+(i+1)t$ and $m = r+it$, then $n = r+(j+1)t$ and $m = r+jt$, we obtain
\[ \sum_{x \in A^{\le k}} |\omega[r+it-|x|+2,\, r+(i+1)t]|_x \, \alpha_x \equiv \sum_{x \in A^{\le k}} |\omega[r+jt-|x|+2,\, r+(j+1)t]|_x \, \alpha_x \equiv 0 \pmod{M} \]
and hence
\[ \sum_{x \in A^{\le k}} \bigl( |\omega[r+it-|x|+2,\, r+(i+1)t]|_x - |\omega[r+jt-|x|+2,\, r+(j+1)t]|_x \bigr) \alpha_x \equiv 0 \pmod{M}. \]
But since
\[ |\omega[r+it-|x|+2,\, r+(i+1)t]| = |\omega[r+jt-|x|+2,\, r+(j+1)t]| = |x|+t-1 \]
and $\omega$ is $(k,B)$-balanced, it follows that
\[ \bigl| |\omega[r+it-|x|+2,\, r+(i+1)t]|_x - |\omega[r+jt-|x|+2,\, r+(j+1)t]|_x \bigr| \le B, \]
whence by Lemma 5.10 we deduce that for each $x \in A^{\le k}$
\[ |\omega[r+it-|x|+2,\, r+(i+1)t]|_x = |\omega[r+jt-|x|+2,\, r+(j+1)t]|_x. \tag{14} \]
Since $\Phi(r+it) = \Phi(r+jt)$, the second coordinate of $\Phi$ gives
\[ \omega[r+it-k+1,\, r+it+k] = \omega[r+jt-k+1,\, r+jt+k]. \]
Together with (14) we deduce that for each $x \in A^{\le k}$,
\[ |\omega[r+it+1,\, r+(i+1)t]|_x = |\omega[r+jt+1,\, r+(j+1)t]|_x. \]
In other words
\[ \omega[r+it+1,\, r+(i+1)t] \sim_k \omega[r+jt+1,\, r+(j+1)t] \]
as required. This completes our proof of Theorem 5.4.

References

[1] S.V. Avgustinovich, A. Frid, T. Kamae, P. Salimov, Infinite permutations of lowest maximal pattern complexity, Theoret. Comput. Sci., 412 (2011) 2911–2921.
[2] M. Bucci, A. De Luca, L.Q. Zamboni, Some characterizations of Sturmian words in terms of the lexicographic order, Fund. Inform., 116 (2012) 25–33.
[3] J. Cassaigne, G. Richomme, K. Saari, L.Q. Zamboni, Avoiding Abelian powers in binary words with bounded Abelian complexity, Internat. J. Found. Comput. Sci., 22 (2011) 905–920.
[4] E.M. Coven, G.A. Hedlund, Sequences with minimal block growth, Math. Systems Theory, 7 (1973) 138–153.
[5] J. Currie, N. Rampersad, Recurrent words with constant Abelian complexity, Adv. in Appl. Math., 47 (2011) 116–124.
[6] F.M. Dekking, Strongly non-repetitive sequences and progression-free sets, J. Combin. Theory Ser. A, 27 (1979) 181–185.
[7] X. Droubay, J. Justin, G. Pirillo, Episturmian words and some constructions of de Luca and Rauzy, Theoret. Comput. Sci., 255 (2001) 539–553.
[8] A. Ehrenfeucht, G. Rozenberg, Elementary homomorphisms and a solution of the D0L sequence equivalence problem, Theoret. Comput. Sci.,
Discrete Appl. Math.,
European J. Combin., 28 (2007) 2106–2114.
[11] M. Huova, Existence of infinite ternary k-Abelian square free words, Preprint, 2013.
[12] M. Huova, J. Karhumäki, Observations and problems on k-abelian avoidability, In Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081), (2011) 2215–2219.
[13] M. Huova, J. Karhumäki, A. Saarela, Problems in between words and abelian words: k-abelian avoidability, Theoret. Comput. Sci., 454 (2012) 172–177.
[14] T. Kamae, S. Widmer, L.Q. Zamboni, Maximal pattern Abelian complexity, Preprint, 2013.
[15] T. Kamae, L.Q. Zamboni, Sequence entropy and the maximal pattern complexity of infinite words, Ergodic Theory Dynam. Systems, 22 (2002) 1191–1199.
[16] J. Karhumäki, Generalized Parikh mappings and homomorphisms, Information and Control, 47 (1980) 155–165.
[17] J. Karhumäki, On cube free ω-words generated by binary morphisms, Discrete Appl. Math.,
Proceedings of ICALP'1992 (International Conference on Automata, Languages and Programming - Vienna 1992), volume 623 of Lecture Notes in Comput. Sci., pages 41–52. Springer, Berlin, 1992.
[19] M. Lothaire, Combinatorics on Words, volume 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, 1983. Reprinted in the Cambridge Mathematical Library, Cambridge University Press, UK, 1997.
[20] R. Mercaş, A. Saarela, 5-abelian cubes are avoidable on binary alphabets, In Proceedings of the 14th Mons Days of Theoretical Computer Science, 2012.
[21] R. Mercaş, A. Saarela, 3-abelian cubes are avoidable on binary alphabets, Preprint, 2013.
[22] F. Mignosi, G. Pirillo, Repetitions in the Fibonacci infinite word, RAIRO Theor. Inform. Appl., 26 (1992) 199–204.
[23] M. Morse, G.A. Hedlund, Symbolic Dynamics II: Sturmian trajectories, Amer. J. Math., 62 (1940) 1–42.
[24] E. Post, A variant of a recursively unsolvable problem, Bull. Amer. Math. Soc., 52 (1946) 264–268.
[25] S. Puzynina, L.Q. Zamboni, Abelian returns in Sturmian words, J. Combin. Theory Ser. A, 120 (2013) 390–408.
[26] G. Richomme, K. Saari, L.Q. Zamboni, Balance and Abelian complexity of the Tribonacci word, Adv. Appl. Math., 45 (2010) 212–231.
[27] G. Richomme, K. Saari, L.Q. Zamboni, Abelian complexity of minimal subshifts, J. London Math. Soc. (2), 83 (2011) 79–95.
[28] R. Risley, L.Q. Zamboni, A generalization of Sturmian sequences; combinatorial structure and transcendence, Acta Arith., XCV.2 (2000) 167–184.
[29] A. Saarela, Ultimately constant abelian complexity of infinite words, J. Autom. Lang. Comb., 14 (2009) 255–258.
[30] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith., 27 (1975) 299–345.
[31] A. Thue, Über unendliche Zeichenreihen, Norske Vid. Selsk. Skr. I. Mat. Nat. Kl.,