On Abelian Closures of Infinite Non-binary Words

Abstract

Two finite words $u$ and $v$ are called abelian equivalent if each letter occurs equally many times in both $u$ and $v$. The abelian closure $\mathcal{A}(\mathbf{x})$ of an infinite word $\mathbf{x}$ is the set of infinite words $\mathbf{y}$ such that, for each factor $u$ of $\mathbf{y}$, there exists a factor $v$ of $\mathbf{x}$ which is abelian equivalent to $u$. The notion of an abelian closure gives a characterization of Sturmian words: among uniformly recurrent binary words, periodic and aperiodic Sturmian words are exactly those words for which $\mathcal{A}(\mathbf{x})$ equals the shift orbit closure $\Omega(\mathbf{x})$. Furthermore, for an aperiodic binary word that is not Sturmian, its abelian closure contains infinitely many minimal subshifts. In this paper we consider the abelian closures of well-known families of non-binary words, such as balanced words and minimal complexity words. We also consider abelian closures of general subshifts, make some initial observations about them, and pose some related open questions.


arXiv [math.CO]

Juhani Karhumäki (a), Svetlana Puzynina (b, c), Markus A. Whiteland (d)

(a) Department of Mathematics and Statistics, FI-20014 University of Turku, Finland
(b) St. Petersburg State University, Russia
(c) Sobolev Institute of Mathematics, Russia
(d) Max Planck Institute for Software Systems, Saarland Informatics Campus, Saarbrücken, Germany


Preprint submitted to Elsevier, January 1, 2021.

1. Introduction

Let $\mathbf{x} \in \Sigma^{\mathbb{N}}$ be an infinite word over an alphabet $\Sigma$. We define the language of $\mathbf{x}$, denoted by $\mathcal{L}(\mathbf{x})$, as the set of factors of $\mathbf{x}$, i.e., blocks of consecutive letters of $\mathbf{x}$. The subshift $\Omega(\mathbf{x})$ generated by an infinite word $\mathbf{x}$ can be defined as the set of infinite words whose languages are included in $\mathcal{L}(\mathbf{x})$: $\Omega(\mathbf{x}) = \{\mathbf{y} \in \Sigma^{\mathbb{N}} : \mathcal{L}(\mathbf{y}) \subseteq \mathcal{L}(\mathbf{x})\}$. In this paper, we consider an abelian version of the notion of a subshift. Two finite words $u$ and $v$ are called abelian equivalent, denoted by $u \sim_{ab} v$, if each letter occurs equally many times in both $u$ and $v$. Various abelian properties of words, such as abelian complexity, abelian powers and abelian periods, have been actively studied recently [4, 21, 22]. We define the abelian closure $\mathcal{A}(\mathbf{x})$ of an infinite word $\mathbf{x}$ as the set of infinite words $\mathbf{y}$ such that, for each factor $u$ of $\mathbf{y}$, there exists a factor $v$ of $\mathbf{x}$ with $u \sim_{ab} v$. Clearly, $\Omega(\mathbf{x}) \subseteq \mathcal{A}(\mathbf{x})$ for any word $\mathbf{x}$.

We start with two examples, Sturmian words and the Thue–Morse word, whose abelian closures have completely different structures. Sturmian words can be defined as infinite aperiodic words which have $n + 1$ distinct factors of each length $n$. They admit various characterizations. In particular, they are exactly the aperiodic binary balanced words, i.e., words in which the numbers of occurrences of 1 in factors of the same length differ by at most 1. It is not hard to see that, for a Sturmian word $\mathbf{x}$, $\Omega(\mathbf{x}) = \mathcal{A}(\mathbf{x})$. The abelian closure is therefore small, since it contains only the subshift of $\mathbf{x}$. Indeed, by balance there are exactly two abelian classes of factors of each length. Therefore, any word $\mathbf{y} \in \mathcal{A}(\mathbf{x})$ must be balanced. Further, the letter frequencies of $\mathbf{y}$ are uniquely determined by $\mathcal{A}(\mathbf{x})$. Thus $\mathbf{y}$ is Sturmian with the same letter frequencies as $\mathbf{x}$, i.e., $\mathbf{y} \in \Omega(\mathbf{x})$. In fact, the property $\Omega(\mathbf{x}) = \mathcal{A}(\mathbf{x})$ characterizes Sturmian words among uniformly recurrent binary words (see Theorem 2.6).

The Thue–Morse word $\mathbf{TM} = 011010011001\cdots$ can be defined as the fixed point starting with 0 of the morphism $\mu: 0 \mapsto 01,\ 1 \mapsto 10$. For odd lengths $\mathbf{TM}$ has two abelian classes of factors, and for even lengths it has three. Further, the number of occurrences of 1 in each factor differs by at most 1 from half of its length [22]. It is easy to see that every factor of every word in $\{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$ has the same property, i.e., $\{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}} \subseteq \mathcal{A}(\mathbf{TM})$. In fact, equality holds: $\mathcal{A}(\mathbf{TM}) = \{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$, so the abelian closure of $\mathbf{TM}$ is huge compared to its shift orbit closure. Indeed, let $\mathbf{x} \in \mathcal{A}(\mathbf{TM})$. Then every block of a single letter in $\mathbf{x}$ has length at most 2, since $\mathbf{TM}$ has no factors 000 and 111. Moreover, between two consecutive occurrences of 00 there must be an occurrence of 11, and vice versa. Otherwise $\mathbf{x}$ would contain a factor $00(10)^n0$, in which the number of occurrences of 1 differs by more than 1 from half of its length. Clearly, such a word is in $\{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$. In [20], we show that this fact generalizes to all binary words: each binary aperiodic uniformly recurrent word that is not Sturmian admits infinitely many minimal subshifts in its abelian closure. Moreover, when the letter frequencies are rational, the abelian closure always contains a morphic image of the full shift.

In general, the abelian closure of an infinite word can have a rather complicated structure. T. Hejda, W. Steiner, and L.Q. Zamboni studied the abelian shift of the Tribonacci word $\mathbf{T}$, defined as the fixed point of $\tau: 0 \mapsto 01,\ 1 \mapsto 02,\ 2 \mapsto 0$. They have announced that $\mathcal{A}(\mathbf{T})$ contains only one minimal subshift, namely $\Omega(\mathbf{T})$ itself, but that it contains other words as well [12, 24].

The study of abelian closures is motivated by the following question: given an infinite word $\mathbf{x}$, how strong is the bond between its abelian factors and its language? We quantify this bond by the size of the abelian closure. By size we do not mean the usual cardinality of a subshift. Rather, we mean the number of disjoint minimal subshifts contained in $\mathcal{A}(\mathbf{x})$. A shift orbit closure $\Omega(\mathbf{x})$ is minimal if it does not properly contain another shift orbit closure. If $\mathcal{A}(\mathbf{x})$ is huge, meaning it contains infinitely many minimal subshifts, then this bond is quite weak. On the other hand, if $\mathcal{A}(\mathbf{x})$ is small, meaning it contains finitely many minimal subshifts, then the bond is quite strong. The strongest bond is attained when $\mathcal{A}(\mathbf{x})$ is itself a minimal subshift. In this case we necessarily have $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$. It is not hard to see that the abelian closure of a purely periodic word is finite (see Proposition 2.3). On the other hand, the abelian closure of an ultimately periodic word can be huge (see Example 2.4). For this reason, when dealing with abelian closures of individual words, we shall assume that the words define minimal shift orbit closures.

There is no particular reason to restrict the definition of abelian closures to individual words. The abelian closure of a set of words $X$ comprises those infinite words $\mathbf{y}$ each of whose factors is abelian equivalent to some factor of one of the words in $X$. For example, the abelian closure of $\Omega(\mathbf{x})$ coincides with $\mathcal{A}(\mathbf{x})$. As mentioned previously, the shift orbit closure of an infinite word is a certain type of subshift.
In general, a non-empty set $X \subseteq \Sigma^{\mathbb{N}}$ of infinite words is called a subshift if it is closed and $\sigma(X) \subseteq X$. Here closedness refers to the compact metric space $\Sigma^{\mathbb{N}}$ with the usual product topology built from the discrete topology on $\Sigma$, and the shift map $\sigma$ is defined by $\sigma(\mathbf{x})_i = x_{i+1}$. (Subshifts are often defined over bi-infinite words, in which case one requires $\sigma(X) = X$ in the definition.) A subshift is called minimal if it does not contain a proper subshift. Hence a minimal subshift is always the shift orbit closure of some word $\mathbf{x}$. It is routinely checked that $\mathcal{A}(X)$ is a subshift for any set $X$ of words.

In this paper, we study how the characterization of Sturmian words as the aperiodic uniformly recurrent words with $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$ extends to non-binary alphabets. We then study the abelian closures of certain generalizations of Sturmian words. Some preliminary results were reported at the DLT 2018 conference [15]. Besides that, we discuss abelian closures of subshifts in general.

In Section 3 we characterize the abelian subshifts of aperiodic recurrent balanced words, which turn out to be finite unions of minimal subshifts. In Section 4 we consider abelian closures of words over a $k$-letter alphabet with factor complexity $n + k - 1$ for each $n$; these are the aperiodic words of minimal complexity involving $k$ letters. The behavior depends on $k$. For $k = 2$ we are in the case of Sturmian words, so $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$. Surprisingly, the most complicated behavior occurs over the ternary alphabet. We show that for $k = 3$, depending on the word $\mathbf{x}$, its abelian closure $\mathcal{A}(\mathbf{x})$ contains either exactly one or uncountably many minimal subshifts. For alphabets of size greater than 3, $\mathcal{A}(\mathbf{x})$ equals the union of exactly two minimal subshifts, $\Omega(\mathbf{x})$ and its "reversal".

Further, in Section 5, we show that the abelian closures of Arnoux–Rauzy words contain non-recurrent words, and hence $\mathcal{A}(\mathbf{x}) \neq \Omega(\mathbf{x})$. We then turn to general subshifts in Section 6, focusing on subshifts defined using notions from formal language theory. We show that the abelian closure of a subshift of finite type (resp., of a sofic subshift) is not necessarily a subshift of finite type (resp., a sofic subshift); see Section 6 for definitions. We conclude with open problems.
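The Thue–Morse discussion in the introduction can be tested numerically. The following Python sketch is not part of the paper, and its helper names (`thue_morse`, `parikh`, `abelian_classes`, `passes_abelian_test`, `sample_claimed_member`) are hypothetical. It compares only factors of finite prefixes up to a fixed length, so a positive answer is evidence for membership in $\mathcal{A}(\mathbf{x})$, not a proof. It relies on the first 1024 letters of $\mathbf{TM}$ containing every factor of length at most 32. This holds because every such factor lies in the image under $\mu^5$ of a two-letter factor, and all four two-letter factors already occur in $\mu^3(0)$.

```python
import random
from collections import Counter


def thue_morse(length):
    """Prefix of TM: t_i is the parity of the number of 1s in the binary expansion of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def parikh(w):
    """Abelian class of a finite word, as a hashable sorted tuple of (letter, count)."""
    return tuple(sorted(Counter(w).items()))


def abelian_classes(prefix, n):
    """Abelian classes of the length-n factors that occur in a finite prefix."""
    return {parikh(prefix[i:i + n]) for i in range(len(prefix) - n + 1)}


def passes_abelian_test(x_prefix, y_prefix, max_len):
    """Finite approximation of "y in A(x)": every factor of the y-prefix of length
    at most max_len must be abelian equivalent to some factor of the x-prefix."""
    return all(abelian_classes(y_prefix, n) <= abelian_classes(x_prefix, n)
               for n in range(1, max_len + 1))


def sample_claimed_member(length, seed):
    """Prefix of a random word in {eps, 0, 1} . {01, 10}^N (seeded, reproducible)."""
    rng = random.Random(seed)
    w = rng.choice(["", "0", "1"])
    while len(w) < length:
        w += rng.choice(["01", "10"])
    return w[:length]


tm = thue_morse(1024)  # mu^10(0): contains every factor of TM of length <= 32
print([len(abelian_classes(tm, n)) for n in range(1, 9)])  # [2, 3, 2, 3, 2, 3, 2, 3]
print(passes_abelian_test(tm, sample_claimed_member(500, seed=1), 32))  # True
print(passes_abelian_test(tm, "001" * 100, 32))  # False: 00100 has a single 1
```

The sampled word passes, as $\{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}} \subseteq \mathcal{A}(\mathbf{TM})$ predicts. The word $(001)^\omega$ is rejected at length 5, because its factor $00100$ has too few occurrences of 1.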

2. Notation and first observations

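We recall some notation and basic terminology from the literature on combinatorics on words, and refer the reader to [17, 18] for more on the subject. The set of finite words over an alphabet $\Sigma$ is denoted by $\Sigma^*$, and the set of non-empty words by $\Sigma^+$. The empty word is denoted by $\varepsilon$. We let $|w|$ denote the length of a word $w \in \Sigma^*$; by convention, $|\varepsilon| = 0$. A factor of a word $\mathbf{x}$ is any block of its consecutive letters, and we let $\mathcal{L}(\mathbf{x})$ denote the set of factors of $\mathbf{x}$. The set of factors of $\mathbf{x}$ of length $n$ is denoted by $\mathcal{L}_n(\mathbf{x})$, and the prefix of $\mathbf{x}$ of length $n$ by $\mathrm{pref}_n(\mathbf{x})$. The factor complexity function $\mathcal{P}_{\mathbf{x}}$ is defined by $\mathcal{P}_{\mathbf{x}}(n) = |\mathcal{L}_n(\mathbf{x})|$. An infinite word $\mathbf{x}$ is called recurrent if each factor of $\mathbf{x}$ occurs infinitely many times in $\mathbf{x}$. Further, $\mathbf{x}$ is uniformly recurrent if for each factor $u \in \mathcal{L}(\mathbf{x})$ there exists $N \in \mathbb{N}$ such that $u$ occurs in every factor of $\mathbf{x}$ of length $N$. For a finite word $u \in \Sigma^*$, we let $|u|_a$ denote the number of occurrences of the letter $a \in \Sigma$ in $u$. For a finite word $v$, we let $v^\omega$ denote the infinite word obtained by repeating $v$ infinitely many times.

For $\mathbf{x} \in \Sigma^{\mathbb{N}}$ and $a \in \Sigma$, the limits
$$\overline{\mathrm{freq}}_{\mathbf{x}}(a) := \lim_{n \to \infty} \sup_{v \in \mathcal{L}_n(\mathbf{x})} \frac{|v|_a}{n} \quad\text{and}\quad \underline{\mathrm{freq}}_{\mathbf{x}}(a) := \lim_{n \to \infty} \inf_{v \in \mathcal{L}_n(\mathbf{x})} \frac{|v|_a}{n}$$
exist. Moreover,
$$\overline{\mathrm{freq}}_{\mathbf{x}}(a) = \inf_{n \geq 1} \sup_{v \in \mathcal{L}_n(\mathbf{x})} \frac{|v|_a}{n} \quad\text{and}\quad \underline{\mathrm{freq}}_{\mathbf{x}}(a) = \sup_{n \geq 1} \inf_{v \in \mathcal{L}_n(\mathbf{x})} \frac{|v|_a}{n}.$$
Both statements follow from Fekete's lemma, because the function $n \mapsto \sup_{v \in \mathcal{L}_n(\mathbf{x})} |v|_a$ is subadditive in $n$ and the function $n \mapsto \inf_{v \in \mathcal{L}_n(\mathbf{x})} |v|_a$ is superadditive in $n$. It immediately follows that
$$\sup_{v \in \mathcal{L}_n(\mathbf{x})} |v|_a \geq n\,\overline{\mathrm{freq}}_{\mathbf{x}}(a) \quad\text{and}\quad \inf_{v \in \mathcal{L}_n(\mathbf{x})} |v|_a \leq n\,\underline{\mathrm{freq}}_{\mathbf{x}}(a) \qquad (1)$$
for all $n \geq 1$. If $\overline{\mathrm{freq}}_{\mathbf{x}}(a) = \underline{\mathrm{freq}}_{\mathbf{x}}(a)$, we denote the common limit by $\mathrm{freq}_{\mathbf{x}}(a)$ and say that $\mathbf{x}$ has uniform frequency of $a$.

A subshift $X \subseteq \Sigma^{\mathbb{N}}$, $X \neq \emptyset$, is a set that is closed with respect to the product topology on $\Sigma^{\mathbb{N}}$ and satisfies $\sigma(X) \subseteq X$. Here $\sigma$ is the shift operator, defined by $\sigma(a_0a_1a_2\cdots) = a_1a_2\cdots$. For a subshift $X \subseteq \Sigma^{\mathbb{N}}$ we let $\mathcal{L}(X) = \bigcup_{\mathbf{y} \in X} \mathcal{L}(\mathbf{y})$.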
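To illustrate inequality (1) and the subadditivity behind it, the following sketch computes the extreme counts of the letter 1 over the length-$n$ factors of a finite prefix of the Fibonacci word. The Fibonacci word is the fixed point of $0 \mapsto 01$, $1 \mapsto 0$, and its uniform frequency of 1 is $(3 - \sqrt{5})/2$. The code is an illustration, not taken from the paper, and the helper names are hypothetical. A finite prefix only approximates $\mathcal{L}_n(\mathbf{x})$, although for the small $n$ used here both extremes already occur.

```python
import math


def fibonacci_word(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def count_extremes(prefix, n, letter):
    """(max, min) of |v|_letter over the length-n factors of a finite prefix.

    On a long enough prefix these coincide with the sup and inf over L_n(x).
    """
    counts = [prefix[i:i + n].count(letter) for i in range(len(prefix) - n + 1)]
    return max(counts), min(counts)


f = fibonacci_word(3000)
alpha = (3 - math.sqrt(5)) / 2  # uniform frequency of the letter 1 in f
ext = {n: count_extremes(f, n, "1") for n in range(1, 61)}

for n in (1, 5, 10, 30, 60):
    hi, lo = ext[n]
    # Inequality (1): lo <= n * alpha <= hi; hi / n stays >= alpha and tends to alpha.
    print(n, lo, round(n * alpha, 3), hi, round(hi / n, 4))

# n -> max count is subadditive and n -> min count is superadditive (the Fekete step).
assert all(ext[m + n][0] <= ext[m][0] + ext[n][0] for m in range(1, 31) for n in range(1, 31))
assert all(ext[m + n][1] >= ext[m][1] + ext[n][1] for m in range(1, 31) for n in range(1, 31))
```

On a finite prefix, subadditivity holds exactly, because every window of length $m + n$ splits into windows of lengths $m$ and $n$ inside the same prefix.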
A subshift $X \subseteq \Sigma^{\mathbb{N}}$ is called minimal if $X$ does not contain any proper subshift. Observe that two minimal subshifts $X$ and $Y$ are either equal or disjoint. Let $\mathbf{x} \in \Sigma^{\mathbb{N}}$. We let $\Omega(\mathbf{x})$ denote the shift orbit closure of $\mathbf{x}$, which may be defined as the subshift $\{\mathbf{y} \in \Sigma^{\mathbb{N}} : \mathcal{L}(\mathbf{y}) \subseteq \mathcal{L}(\mathbf{x})\}$. Thus $\mathcal{L}(\Omega(\mathbf{x})) = \mathcal{L}(\mathbf{x})$ for any word $\mathbf{x} \in \Sigma^{\mathbb{N}}$. It is known that $\Omega(\mathbf{x})$ is minimal if and only if $\mathbf{x}$ is uniformly recurrent. Let $\varphi: \Sigma^* \to \Delta^*$ be a morphism, that is, $\varphi(uv) = \varphi(u)\varphi(v)$ for all $u, v \in \Sigma^*$. For a subshift $X \subseteq \Sigma^{\mathbb{N}}$, we define $\varphi(X) = \bigcup_{\mathbf{x} \in X} \Omega(\varphi(\mathbf{x}))$. When using erasing morphisms, i.e., morphisms that map some letter to $\varepsilon$, we make sure that no point of $X$ is mapped to a finite word. For more on this topic we refer the reader to [16].

We recall definitions and properties of Sturmian words from [18, Chapter 2]. We identify the interval $[0, 1)$ with the unit circle $\mathbb{T}$, so that the point 1 is identified with the point 0. For points $x, y \in \mathbb{T}$, we let $I(x, y)$ (resp., $\bar{I}(x, y)$) denote the half-open interval on $\mathbb{T}$ that contains $x$ (resp., $y$) and runs counter-clockwise from $x$ to $y$. We omit the bar whenever it does not matter which endpoint belongs to the interval. Let $\alpha \in \mathbb{T}$, rational or irrational, and let $\rho \in \mathbb{T}$. The map $R_\alpha: \mathbb{T} \to \mathbb{T}$, $x \mapsto \{x + \alpha\}$, defines a counter-clockwise rotation on $\mathbb{T}$; here $\{x\} = x - \lfloor x \rfloor$ denotes the fractional part of $x \in \mathbb{R}$. Divide $\mathbb{T}$ into the two half-open intervals $I_0 = I(0, 1 - \alpha)$ and $I_1 = I(1 - \alpha, 1)$ (resp., $I_0 = \bar{I}(0, 1 - \alpha)$ and $I_1 = \bar{I}(1 - \alpha, 1)$). Define the coding $\nu: \mathbb{T} \to \{0, 1\}$ by $\nu(x) = i$ if $x \in I_i$, for $i = 0, 1$. The rotation word $\mathbf{s}_{\alpha,\rho}$ (resp., $\bar{\mathbf{s}}_{\alpha,\rho}$) of slope $\alpha$ and intercept $\rho$ is the word $a_0a_1\cdots \in \{0, 1\}^{\mathbb{N}}$ defined by $a_n = \nu(R_\alpha^n(\rho))$ for all $n \in \mathbb{N}$. Note that 00 occurs in $\mathbf{s}_{\alpha,\rho}$ if and only if $\alpha < 1/2$. Clearly, $\mathbf{s}_{\alpha,\rho}$ is aperiodic if and only if $\alpha$ is irrational. Each aperiodic rotation word is a Sturmian word, and vice versa. We call periodic rotation words periodic Sturmian words.

The word $\mathbf{s}_{\alpha,\alpha}$ plays a special role: both $01\mathbf{s}_{\alpha,\alpha}$ and $10\mathbf{s}_{\alpha,\alpha}$ belong to $\Omega(\mathbf{s}_{\alpha,\alpha})$ for irrational $\alpha \in (0, 1)$. Every Sturmian word $\mathbf{s}$ is uniformly recurrent, so $\Omega(\mathbf{s})$ is minimal. Further, $\mathbf{s}' \in \Omega(\mathbf{s})$ if and only if $\mathbf{s}$ and $\mathbf{s}'$ have the same slope. In particular, neither the intercept nor the choice of endpoints of $I_0$ and $I_1$ affects the shift orbit closure of a Sturmian word. (These choices matter in Section 4.)

An infinite word $\mathbf{x} \in \Sigma^{\mathbb{N}}$ is called balanced if $\big||v|_a - |v'|_a\big| \leq 1$ for all $v, v' \in \mathcal{L}(\mathbf{x})$ with $|v| = |v'|$ and all $a \in \Sigma$. The periodic and aperiodic Sturmian words are exactly the recurrent balanced binary words [19]. It follows that, for each Sturmian word $\mathbf{s}$ (aperiodic or periodic), the set $\{|v|_1 : v \in \mathcal{L}_n(\mathbf{s})\}$ consists of at most two values $k$ and $k + 1$, where $k$ depends on $\mathbf{s}$ and $n$. Observe now that $\mathrm{freq}_{\mathbf{s}}(1) = \alpha$ for any Sturmian word $\mathbf{s}$ of slope $\alpha$. By (1), the value $k$ above equals $\lfloor n\alpha \rfloor$.

We turn to the main notion of this paper.

Definition 2.1.

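For $\mathbf{x} \in \Sigma^{\mathbb{N}}$ we define the abelian closure of $\mathbf{x}$ as
$$\mathcal{A}(\mathbf{x}) = \{\mathbf{y} \in \Sigma^{\mathbb{N}} \mid \forall u \in \mathcal{L}(\mathbf{y})\ \exists v \in \mathcal{L}(\mathbf{x}) : u \sim_{ab} v\}.$$
For any $\mathbf{x} \in \Sigma^{\mathbb{N}}$, the abelian closure $\mathcal{A}(\mathbf{x})$ is indeed a subshift. We now make some preliminary observations on abelian closures of infinite words.

Lemma 2.2.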

Assume that $\mathbf{x} \in \Sigma^{\mathbb{N}}$ has uniform frequency of a letter $a \in \Sigma$. Then every word $\mathbf{y} \in \mathcal{A}(\mathbf{x})$ has uniform frequency of $a$, and $\mathrm{freq}_{\mathbf{y}}(a) = \mathrm{freq}_{\mathbf{x}}(a)$.

Proof. For all $\mathbf{y} \in \mathcal{A}(\mathbf{x})$ and all $n \geq 1$, the definition of $\mathcal{A}(\mathbf{x})$ immediately gives
$$\sup_{v \in \mathcal{L}_n(\mathbf{x})} \frac{|v|_a}{n} \geq \sup_{v \in \mathcal{L}_n(\mathbf{y})} \frac{|v|_a}{n} \geq \inf_{v \in \mathcal{L}_n(\mathbf{y})} \frac{|v|_a}{n} \geq \inf_{v \in \mathcal{L}_n(\mathbf{x})} \frac{|v|_a}{n}.$$
Letting $n \to \infty$ gives the claims.

In particular, if $\mathbf{x}$ has an irrational uniform frequency of some letter $a$, then $\mathcal{A}(\mathbf{x})$ contains only aperiodic words. Next we observe how the abelian closures of periodic and ultimately periodic words can differ.

Proposition 2.3.

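For any periodic word $\mathbf{x}$, the abelian closure $\mathcal{A}(\mathbf{x})$ is finite.

Proof. A word $\mathbf{y}$ is (purely) periodic if and only if there is some $n \geq 1$ such that all factors of $\mathbf{y}$ of length $n$ are abelian equivalent. Indeed, the length-$n$ factors starting at positions $i$ and $i + 1$ are abelian equivalent exactly when $y_i = y_{i+n}$. Let $n$ be the least such integer for $\mathbf{x}$. Then all factors of length $n$ of any word $\mathbf{y} \in \mathcal{A}(\mathbf{x})$ are abelian equivalent. Thus $\mathbf{y} = v^\omega$ with $|v|$ dividing $n$, and there are only finitely many such words.

In general, the abelian closure of an ultimately periodic word can be huge.

Example 2.4.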

Let $\mathbf{x} = 0011(001101)^\omega$. It is readily verified that $\mathbf{x}$ has two abelian classes of factors of each odd length and three of each even length. Further, in each factor of $\mathbf{x}$, the number of occurrences of 1 differs by at most one from half of the factor's length. Thus, by the discussion in the introduction, $\mathbf{TM} \in \mathcal{A}(\mathbf{x})$, and therefore $\mathcal{A}(\mathbf{x}) = \mathcal{A}(\mathbf{TM}) = \{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$. A family of such examples is given in [15, Ex. 2].

At the end of this section we show that, for a uniformly recurrent binary word $\mathbf{x}$, $\mathcal{A}(\mathbf{x})$ contains exactly one minimal subshift if and only if $\mathbf{x}$ is a Sturmian word. We start with a crucial observation that holds only for binary words.

Lemma 2.5 (Corridor Lemma). Let $\mathbf{x}$ be a binary word. Then $\mathbf{y} \in \mathcal{A}(\mathbf{x})$ if and only if, for all $n \in \mathbb{N}$,
$$\inf_{u \in \mathcal{L}_n(\mathbf{y})} |u|_1 \geq \inf_{u \in \mathcal{L}_n(\mathbf{x})} |u|_1 \quad\text{and}\quad \sup_{u \in \mathcal{L}_n(\mathbf{y})} |u|_1 \leq \sup_{u \in \mathcal{L}_n(\mathbf{x})} |u|_1.$$

Proof.

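Let $\mathbf{z}$ be a binary word and $n \geq 1$. There exists a word $u \in \mathcal{L}_n(\mathbf{z})$ with $|u|_1 = m$ if and only if $\inf_{v \in \mathcal{L}_n(\mathbf{z})} |v|_1 \leq m \leq \sup_{v \in \mathcal{L}_n(\mathbf{z})} |v|_1$. This follows from a sliding-window argument: the counts of two consecutive windows of length $n$ differ by at most 1, so every intermediate value is attained. Apply this observation to $\mathbf{x}$ and $\mathbf{y}$. Since two binary words of the same length are abelian equivalent exactly when they contain the same number of 1s, we get the following. For each $v \in \mathcal{L}_n(\mathbf{y})$ there exists $u \in \mathcal{L}_n(\mathbf{x})$ with $v \sim_{ab} u$ if and only if
$$\inf_{u \in \mathcal{L}_n(\mathbf{x})} |u|_1 \leq \inf_{u \in \mathcal{L}_n(\mathbf{y})} |u|_1 \quad\text{and}\quad \sup_{u \in \mathcal{L}_n(\mathbf{y})} |u|_1 \leq \sup_{u \in \mathcal{L}_n(\mathbf{x})} |u|_1.$$

We now characterize Sturmian words in terms of abelian closures.

Theorem 2.6.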

Let $\mathbf{x} \in \{0, 1\}^{\mathbb{N}}$ be uniformly recurrent. Then $\mathcal{A}(\mathbf{x})$ contains exactly one minimal subshift if and only if $\mathbf{x}$ is Sturmian.

Theorem 2.7.

Let $\mathbf{x} \in \{0, 1\}^{\mathbb{N}}$ be aperiodic and uniformly recurrent. Then $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$ if and only if $\mathbf{x}$ is Sturmian.

Proof. First we show that $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$ for Sturmian words. Let $\mathbf{x}$ be a Sturmian word of slope $\alpha$, and let $\mathbf{y} \in \mathcal{A}(\mathbf{x})$. By the Corridor Lemma, $\mathbf{y}$ is balanced. By Lemma 2.2, its uniform letter frequencies equal those of $\mathbf{x}$. Thus $\mathbf{y} \in \Omega(\mathbf{x})$.

Conversely, assume that $\mathcal{A}(\mathbf{x})$ contains exactly one minimal subshift, namely $\Omega(\mathbf{x})$, and let $\alpha = \mathrm{freq}_{\mathbf{x}}(1)$. Take a Sturmian word $\mathbf{s}$ of slope $\alpha$, periodic or aperiodic. By the Corridor Lemma together with (1), $\Omega(\mathbf{s}) \subseteq \mathcal{A}(\mathbf{x})$. Since $\Omega(\mathbf{s})$ is also minimal, $\Omega(\mathbf{x}) = \Omega(\mathbf{s})$. Thus $\mathbf{x}$ is Sturmian, and $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$.

The following example shows that the assumption of uniform recurrence cannot be omitted from the above theorem.

Example 2.8.

Take the Champernowne word $\mathbf{C}$ over the binary alphabet. It is obtained by concatenating all finite words, ordered by length and, within each length, lexicographically:
$$\mathbf{C} = 0\,1\,00\,01\,10\,11\,000\,001\,010\,011\,100\,101\,110\,111\cdots$$
Clearly, both $\Omega(\mathbf{C})$ and $\mathcal{A}(\mathbf{C})$ equal the full shift, i.e., they contain all binary words.

Over arbitrary alphabets, neither the property $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$ nor the property that $\mathcal{A}(\mathbf{x})$ contains exactly one minimal subshift characterizes Sturmian words among uniformly recurrent words. Let $\mathbf{f}$ be the Fibonacci word, the Sturmian word defined as the fixed point of the morphism $0 \mapsto 01$, $1 \mapsto 0$. Let $\varphi: 0 \mapsto 02,\ 1 \mapsto 12$. Then $\mathcal{A}(\varphi(\mathbf{f})) = \Omega(\varphi(\mathbf{f}))$ (see Theorem 4.4).

In the next sections we investigate possible generalizations of the property $\mathcal{A}(\mathbf{x}) = \Omega(\mathbf{x})$ to non-binary alphabets.
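The rotation-word definition and the Corridor Lemma can be explored with a small experiment. The Python sketch below uses hypothetical helper names and plain floating-point arithmetic, which is adequate only for moderate prefix lengths. It builds $\mathbf{s}_{\alpha,\rho}$ from the identity $\lfloor t + \alpha \rfloor - \lfloor t \rfloor = 1 \iff \{t\} \geq 1 - \alpha$. It then checks that the Fibonacci word coincides with $\mathbf{s}_{\alpha,\alpha}$ for $\alpha = (3 - \sqrt{5})/2$ and applies the corridor criterion of Lemma 2.5 to finite prefixes. On prefixes this is a heuristic check, not a decision procedure for $\mathcal{A}(\mathbf{x})$.

```python
import math


def fibonacci_word(length):
    """Prefix of the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def rotation_word(alpha, rho, length):
    """Prefix of s_{alpha,rho}: a_n = 1 iff {rho + n*alpha} lies in I_1 = [1 - alpha, 1).

    Uses floor(t + alpha) - floor(t) = 1 iff {t} >= 1 - alpha. Plain floats suffice
    only for moderate lengths and intercepts that are not too close to a cut point.
    """
    return "".join(str(math.floor(rho + (n + 1) * alpha) - math.floor(rho + n * alpha))
                   for n in range(length))


def ones_range(prefix, n):
    """(min, max) number of 1s over the length-n factors of a finite prefix."""
    counts = [prefix[i:i + n].count("1") for i in range(len(prefix) - n + 1)]
    return min(counts), max(counts)


def corridor_test(x_prefix, y_prefix, max_len):
    """Corridor Lemma (Lemma 2.5) on finite prefixes of binary words: for each n,
    the counts of 1 in the length-n factors of y must stay inside x's [min, max]."""
    for n in range(1, max_len + 1):
        lo_x, hi_x = ones_range(x_prefix, n)
        lo_y, hi_y = ones_range(y_prefix, n)
        if lo_y < lo_x or hi_y > hi_x:
            return False
    return True


alpha = (3 - math.sqrt(5)) / 2
f = fibonacci_word(2000)
print(f == rotation_word(alpha, alpha, 2000))  # True: f = s_{alpha,alpha}
print(corridor_test(f, rotation_word(alpha, 0.3, 2000), 50))  # same slope: True
print(corridor_test(f, rotation_word(math.sqrt(2) - 1, 0.0, 2000), 50))  # other slope: False
```

Changing the intercept keeps the word inside the corridor, consistent with $\mathcal{A}(\mathbf{f}) = \Omega(\mathbf{f})$. Changing the slope pushes the counts outside it.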

3. Abelian closures of balanced words

In this section we study the abelian closures of non-binary aperiodic balanced words. We prove that the abelian closure of such a word is a finite union of minimal subshifts:

Theorem 3.1.

Let $\mathbf{u}$ be aperiodic, recurrent and balanced. Then $\mathcal{A}(\mathbf{u})$ is the union of finitely many minimal subshifts.

As we show below, the abelian closure of a recurrent balanced word can contain one or more minimal subshifts, but only finitely many. The number depends on the structure of the word, and we can in fact compute it. Our results rely heavily on the characterization of aperiodic recurrent balanced words by R. Graham [11] and P. Hubert [13]. This characterization also allows us to describe the abelian closures of slightly more general words. In particular, the techniques used in the proof of the above theorem give, for each $k$, an aperiodic word $\mathbf{x}_k$ over a three-letter alphabet such that $\mathcal{A}(\mathbf{x}_k)$ equals the union of $k$ distinct minimal subshifts (see Proposition 3.10).

We need some notation to state the characterization of aperiodic recurrent balanced words.

Definition 3.2.

A word is called constant gap if, for each letter, the distance between consecutive occurrences of that letter is constant. For example, $(abac)^\omega$ is a constant gap word.

Definition 3.3.

Let $x$ be a finite or infinite binary word, and let $\mathbf{z}_0 \in A^{\mathbb{N}}$ and $\mathbf{z}_1 \in B^{\mathbb{N}}$, where $A$ and $B$ are alphabets. Let $\mathcal{S}(x, \mathbf{z}_0, \mathbf{z}_1)$ denote the word obtained from $x$ by replacing the $n$th occurrence of 0 (resp., 1) in $x$ with the $n$th letter of $\mathbf{z}_0$ (resp., $\mathbf{z}_1$).

We illustrate this operation with an example.

Example 3.4.

Let $\mathbf{f} = 01001010010010100101\cdots$ be the Fibonacci word, $\mathbf{z}_0 = (0102)^\omega$, and $\mathbf{z}_1 = (ab)^\omega$. Then $\mathcal{S}(\mathbf{f}, \mathbf{z}_0, \mathbf{z}_1) = 0a10b2a01b02a0b10a2b\cdots$ is balanced.

The following theorem characterizes recurrent balanced words using constant gap words and the operation $\mathcal{S}$.

Theorem 3.5 ([13, Thm. 1], [11]). An aperiodic word $\mathbf{u} \in \Sigma^{\mathbb{N}}$ is recurrent and balanced if and only if there exist a partition $\{A, B\}$ of $\Sigma$, two constant gap words $\mathbf{z}_0 \in A^{\mathbb{N}}$ and $\mathbf{z}_1 \in B^{\mathbb{N}}$, and a Sturmian word $\mathbf{s}$ such that $\mathbf{u} = \mathcal{S}(\mathbf{s}, \mathbf{z}_0, \mathbf{z}_1)$.

Although the structure of aperiodic balanced words is clear, the structure of periodic balanced words is still a mystery. The following conjecture by Fraenkel (1973) remains open despite efforts by many researchers: up to a permutation of letters, the unique balanced word on $k \geq 3$ letters with pairwise distinct letter frequencies is $(F_k)^\omega = (F_{k-1}\,k\,F_{k-1})^\omega$, where $F_2 = 121$ [10]. The conjecture has been verified for $k \leq$ … $\mathbf{z}_0$ and $\mathbf{z}_1$ being constant gap. We obtain a characterization of the abelian closures of such words, from which the characterization of the abelian closures of recurrent balanced words follows.

We need the following lemma.

Lemma 3.6.

Let $\mathbf{u} = \mathcal{S}(\mathbf{s}, \mathbf{z}_0, \mathbf{z}_1)$, where $\mathbf{s}$ is Sturmian and $\mathbf{z}_0, \mathbf{z}_1$ are periodic. Then $\mathcal{L}(\mathbf{u}) = \{\mathcal{S}(x, \mathbf{z}_0', \mathbf{z}_1') : x \in \mathcal{L}(\mathbf{s}),\ \mathbf{z}_i' \in \Omega(\mathbf{z}_i)\}$. Further, $\mathbf{u}$ is uniformly recurrent.

Proof. This result is implicitly contained in [13], in a slightly weaker form. Theorem 2 and Proposition 3.1 of [13] essentially state the following. Let $\mathbf{v} = \mathcal{S}(\mathbf{s}, \mathbf{y}_0, \mathbf{y}_1)$, where $\mathbf{s}$ is a Sturmian word and $\mathbf{y}_0$, $\mathbf{y}_1$ are constant gap words with periods $l_0$ and $l_1$, respectively. Then the factor complexity of $\mathbf{v}$ satisfies $\mathcal{P}_{\mathbf{v}}(n) = l_0 l_1 (n + 1)$ for all sufficiently large $n$. Further, [13, Prop. 5.1] states that $\mathbf{v}$ is uniformly recurrent.

Observe that $\mathcal{L}_n(\mathbf{v}) \subseteq \{\mathcal{S}(x, \sigma^i(\mathbf{y}_0), \sigma^j(\mathbf{y}_1)) : x \in \mathcal{L}_n(\mathbf{s}),\ i, j \in \mathbb{N}\}$. For sufficiently large $n$, the latter set has cardinality $l_0 l_1 (n + 1)$, which coincides with $\mathcal{P}_{\mathbf{v}}(n)$ by Hubert's result. We deduce that $\mathcal{S}(x, \sigma^i(\mathbf{y}_0), \sigma^j(\mathbf{y}_1)) \in \mathcal{L}(\mathbf{v})$ for all $x \in \mathcal{L}(\mathbf{s})$ and $i, j \in \mathbb{N}$.

Now write $\mathbf{z}_i = u_i^\omega$ with $|u_i| = l_i$, where $l_i$ is the period length of $\mathbf{z}_i$. Take the constant gap words $\mathbf{y}_0 = (a_1 \cdots a_{l_0})^\omega$ and $\mathbf{y}_1 = (b_1 \cdots b_{l_1})^\omega$ over pairwise distinct letters. Define a coding $\tau$ such that $\tau(a_1 \cdots a_{l_0}) = u_0$ and $\tau(b_1 \cdots b_{l_1}) = u_1$. Then $\tau(\mathbf{v}) = \mathbf{u}$, so $\mathbf{u}$ is uniformly recurrent. Further, $\tau(\mathcal{S}(x, \sigma^i(\mathbf{y}_0), \sigma^j(\mathbf{y}_1))) = \mathcal{S}(x, \sigma^i(\mathbf{z}_0), \sigma^j(\mathbf{z}_1))$, and the claim follows immediately.

The above lemma allows the periodic words $\mathbf{z}_0$ and $\mathbf{z}_1$ to have letters in common. In the following proposition, by contrast, we assume that they have no common letters. Under this assumption we can characterize their abelian closures.

Proposition 3.7.

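Let $\mathbf{u} = \mathcal{S}(\mathbf{s}, \mathbf{z}_0, \mathbf{z}_1)$ for some Sturmian word $\mathbf{s}$ and periodic words $\mathbf{z}_0 \in A^{\mathbb{N}}$ and $\mathbf{z}_1 \in B^{\mathbb{N}}$, where $A$ and $B$ are disjoint alphabets. Then
$$\mathcal{A}(\mathbf{u}) = \bigcup_{\mathbf{t}_i \in \mathcal{A}(\mathbf{z}_i)} \Omega(\mathcal{S}(\mathbf{s}, \mathbf{t}_0, \mathbf{t}_1)),$$
and $\mathcal{A}(\mathbf{u})$ is a finite union of minimal subshifts.

Proof. First let $\mathbf{x} \in \Omega(\mathcal{S}(\mathbf{s}, \mathbf{t}_0, \mathbf{t}_1))$ for some $\mathbf{t}_i \in \mathcal{A}(\mathbf{z}_i)$. Every factor $x \in \mathcal{L}(\mathbf{x})$ can be written as $x = \mathcal{S}(y, \sigma^i(\mathbf{t}_0), \sigma^j(\mathbf{t}_1))$ for some $y \in \mathcal{L}(\mathbf{s})$ and $i, j \geq 0$. Let $u = \mathrm{pref}_n(\sigma^i(\mathbf{t}_0))$ and $v = \mathrm{pref}_m(\sigma^j(\mathbf{t}_1))$, where $n = |y|_0$ and $m = |y|_1$. By assumption, there exist $u' \in \mathcal{L}(\mathbf{z}_0)$ and $v' \in \mathcal{L}(\mathbf{z}_1)$ such that $u \sim_{ab} u'$ and $v \sim_{ab} v'$. Choose $r, r'$ such that $\sigma^r(\mathbf{z}_0)$ begins with $u'$ and $\sigma^{r'}(\mathbf{z}_1)$ begins with $v'$. Then $x \sim_{ab} \mathcal{S}(y, \sigma^r(\mathbf{z}_0), \sigma^{r'}(\mathbf{z}_1))$, and this word lies in $\mathcal{L}(\mathbf{u})$ by Lemma 3.6. Hence $\mathbf{x} \in \mathcal{A}(\mathbf{u})$.

Conversely, let $\mathbf{x} \in \mathcal{A}(\mathbf{u})$. Take the morphism $\varphi: \Sigma \to \{0, 1\}$ with $\varphi(a) = 0$ if and only if $a \in A$. Since the alphabets $A$ and $B$ are disjoint, $\varphi(\mathbf{x}) \in \mathcal{A}(\varphi(\mathbf{u})) = \Omega(\mathbf{s})$. Next take the morphism $\varphi_A: \Sigma^* \to A^*$ defined by $\varphi_A(a) = a$ for $a \in A$ and $\varphi_A(a) = \varepsilon$ otherwise, and define $\varphi_B: \Sigma^* \to B^*$ analogously. Again $\varphi_A(\mathbf{x}) \in \mathcal{A}(\varphi_A(\mathbf{u}))$, where $\varphi_A(\mathbf{u}) = \mathbf{z}_0$, and similarly $\varphi_B(\mathbf{x}) \in \mathcal{A}(\mathbf{z}_1)$. It follows that $\mathbf{x} = \mathcal{S}(\mathbf{s}', \mathbf{t}_0, \mathbf{t}_1)$ for some $\mathbf{s}' \in \Omega(\mathbf{s})$ and some $\mathbf{t}_i \in \mathcal{A}(\mathbf{z}_i)$, $i = 0, 1$. Lemma 3.6 then implies that $\mathbf{x} \in \Omega(\mathcal{S}(\mathbf{s}, \mathbf{t}_0, \mathbf{t}_1))$.

Since each $\mathcal{A}(\mathbf{z}_i)$ is finite by Proposition 2.3, $\mathcal{A}(\mathbf{u})$ is a finite union of minimal subshifts. This concludes the proof.

Theorem 3.1 is an immediate corollary of the above proposition.

The proposition also bounds the number of distinct minimal subshifts in $\mathcal{A}(\mathbf{u})$. It is at most the product of the number of minimal subshifts in $\mathcal{A}(\mathbf{z}_0)$ and the number in $\mathcal{A}(\mathbf{z}_1)$.

Example 3.8.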

Let $\mathbf{z}_0 = (0102)^\omega$ and $\mathbf{z}_1 = (34)^\omega$. It is readily verified that $\mathcal{A}(\mathbf{z}_i) = \Omega(\mathbf{z}_i)$. Thus, by the above proposition, $\mathcal{A}(\mathbf{u}) = \Omega(\mathbf{u})$ for $\mathbf{u} = \mathcal{S}(\mathbf{s}, \mathbf{z}_0, \mathbf{z}_1)$ with $\mathbf{s}$ Sturmian.

For recurrent balanced words, the periodic words in the construction are constant gap words. It is natural to ask whether this extra property limits the number of minimal subshifts in the abelian closure. We give a negative answer to this:

Example 3.9.

Let $\mathbf{z}_0 = (a_1a_2a_3a_4a_5A_1 \cdot a_1a_2a_3a_4a_5A_2 \cdot a_1a_2a_3a_4a_5A_3)^\omega$. Here each letter $a_i$ occurs with constant gap 6, and each letter $A_i$ with constant gap 18. Take any constant gap word $\mathbf{z}_1$ that is not closed under reversal, e.g., $(abc)^\omega$, and let $\mathbf{u} = \mathcal{S}(\mathbf{s}, \mathbf{z}_0, \mathbf{z}_1)$. We can independently reverse inside $\mathbf{z}_0$, inside $\mathbf{z}_1$, and inside the arithmetic progression carrying $A_1A_2A_3$ in $\mathbf{z}_0$. This yields eight minimal subshifts. The construction generalizes to produce $2^k$ minimal subshifts for any $k$.

The techniques used in the proof of the above theorem give the following proposition. Note that the words in question are not necessarily balanced.

Proposition 3.10. For each $k \geq 1$ there exists an aperiodic word $\mathbf{x}_k$ over a three-letter alphabet such that $\mathcal{A}(\mathbf{x}_k)$ equals the union of $k$ distinct minimal subshifts.

Proof. For $k = 1$ we may take any Sturmian word, so let $k \geq 2$. Consider first the abelian closure of the periodic word $\mathbf{z} = (0^{2k-1}11)^\omega$. It is readily verified that $\mathcal{A}(\mathbf{z})$ contains the minimal subshifts generated by the words $(0^{2k-1-i}10^i1)^\omega$ for $i = 0, \ldots, k-1$. To see that $\mathcal{A}(\mathbf{z})$ contains nothing else, we argue as follows. By Proposition 2.3, every word in $\mathcal{A}(\mathbf{z})$ is periodic with period dividing $2k + 1$, which is odd. The period cannot be smaller than $2k + 1$, because every factor of length $2k + 1$ contains exactly two occurrences of 1. On the other hand, every word of length $2k + 1$ containing exactly two occurrences of 1 already occurs in one of the subshifts above, and hence so do the periodic words these words generate.

For the claim we set $\mathbf{x}_k = \mathcal{S}(\mathbf{s}, \mathbf{z}, a^\omega)$, where $\mathbf{s}$ is an aperiodic Sturmian word. By Proposition 3.7,
$$\mathcal{A}(\mathbf{x}_k) = \bigcup_{i=0}^{k-1} \Omega\big(\mathcal{S}(\mathbf{s}, (0^{2k-1-i}10^i1)^\omega, a^\omega)\big),$$
which is a union of $k$ distinct minimal subshifts.
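The operation $\mathcal{S}$ and the finiteness argument of Proposition 2.3 can both be made concrete. The following Python sketch uses hypothetical names and is not code from the paper. For a period $u$, it enumerates by brute force all $v$ with $|v| = |u|$ such that $v^\omega \in \mathcal{A}(u^\omega)$. This enumeration is exhaustive because every word of $\mathcal{A}(u^\omega)$ has period $|u|$. Moreover, once $v$ and $u$ have the same Parikh vector, factors longer than $|u|$ only add whole periods, so it suffices to compare factor lengths $1, \ldots, |u|$. Minimal subshifts are then counted as rotation classes.

```python
from collections import Counter
from itertools import cycle, product


def substitute(x, z0, z1):
    """S(x, z0, z1): the n-th 0 of x becomes the n-th letter of z0, and the n-th 1
    the n-th letter of z1. z0 and z1 are iterables (itertools.cycle for periodic words)."""
    streams = {"0": iter(z0), "1": iter(z1)}
    return "".join(next(streams[c]) for c in x)


def parikh(w):
    """Abelian class of a finite word, as a sorted tuple of (letter, count)."""
    return tuple(sorted(Counter(w).items()))


def classes(v, m):
    """Abelian classes of the length-m factors of the periodic word v^omega."""
    ext = v * (m // len(v) + 2)
    return {parikh(ext[i:i + m]) for i in range(len(v))}


def periodic_closure(u, alphabet):
    """All v with |v| = |u| such that v^omega lies in A(u^omega), by brute force.

    The check at length |u| forces v and u to share a Parikh vector; longer
    factors then only add whole periods, so lengths 1..|u| suffice.
    """
    p = len(u)
    target = {m: classes(u, m) for m in range(1, p + 1)}
    return ["".join(t) for t in product(alphabet, repeat=p)
            if all(classes("".join(t), m) <= target[m] for m in range(1, p + 1))]


def count_minimal_subshifts(words):
    """Each v^omega generates the finite minimal subshift formed by its rotations."""
    return len({min(v[i:] + v[:i] for i in range(len(v))) for v in words})


print(substitute("01001010010010100101", cycle("0102"), cycle("ab")))  # 0a10b2a01b02a0b10a2b
for k in range(1, 5):
    z = "0" * (2 * k - 1) + "11"  # period of the word z in Proposition 3.10
    print(k, count_minimal_subshifts(periodic_closure(z, "01")))  # prints k next to k
```

For small $k$ this reproduces the count $k$ from Proposition 3.10. The same routine confirms $\mathcal{A}((0102)^\omega) = \Omega((0102)^\omega)$ from Example 3.8. The enumeration is exponential in $|u|$, so it is meant only for very short periods.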

4. Abelian closures of words of minimal complexity

First we study the abelian closures of aperiodic non-binary words of minimal factor complexity. Over an alphabet $\Sigma$ with at least two letters, the minimal complexity of an aperiodic word containing all letters of $\Sigma$ is $n + |\Sigma| - 1$. The structure of words of complexity $n + C$ is related to the structure of Sturmian words and is well understood ([7, 9, 14]). The main goal of this subsection is to prove the following. For aperiodic ternary words of minimal complexity, the abelian closure consists of either one or uncountably many minimal subshifts (Theorem 4.4). For alphabets of size greater than 3, the abelian closure contains exactly two minimal subshifts, provided the words are recurrent (Theorem 4.13). Recall that for the binary alphabet there is exactly one minimal subshift (Theorem 2.6).

A proof of the following lemma is given in [9], in the discussion following Lemma 1.

Lemma 4.1.

A minimal complexity word $\mathbf{u}$ over the alphabet $A$ has the form $a_1 \cdots a_t \mathbf{u}'$, where $\mathbf{u}'$ is a recurrent minimal complexity word over an alphabet $A' \subseteq A$ and $a_1, \ldots, a_t$ are distinct letters of $A \setminus A'$.

We shall consider the abelian subshifts of recurrent aperiodic minimal complexity words. The following lemma then extends those results to non-recurrent words as well.
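The shape described in Lemma 4.1 can be observed on a concrete example. The following sketch (hypothetical helper names, finite prefixes only) prepends distinct new letters to the Fibonacci word and counts distinct factors. Each added letter raises the complexity by exactly one, which gives $n + |\Sigma| - 1$. Counting the factors of a finite prefix yields a lower bound for $\mathcal{P}(n)$, and this bound is exact once the prefix contains every factor of length $n$, as it does here for small $n$.

```python
def fibonacci_word(length):
    """Prefix of the fixed point of 0 -> 01, 1 -> 0 (a recurrent Sturmian word)."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def complexity(prefix, n):
    """Number of distinct length-n factors in a finite prefix: a lower bound for
    P(n) that is exact once the prefix contains every factor of length n."""
    return len({prefix[i:i + n] for i in range(len(prefix) - n + 1)})


f = fibonacci_word(3000)
for new_letters in ("", "2", "32"):
    w = new_letters + f  # a_1 ... a_t u' with distinct new letters a_i and u' = f
    # Each entry is P(n) - n, expected to equal |alphabet| - 1 for every n.
    print(len(set(w)), [complexity(w, n) - n for n in range(1, 11)])
```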

Lemma 4.2.

Let $\mathbf{u} = a_1 \cdots a_t \mathbf{u}'$ be an aperiodic minimal complexity word as in the above lemma. Then $\mathcal{A}(\mathbf{u}) = \Omega(\mathbf{u}) \cup \mathcal{A}(\mathbf{u}')$.

Proof. Let $\mathbf{y} \in \mathcal{A}(\mathbf{u})$. Then $\mathbf{y}$ contains at most one occurrence of each letter $a_i$. If it contains none of them, then $\mathbf{y} \in \mathcal{A}(\mathbf{u}')$ and there is nothing to prove. Otherwise write $\mathbf{y} = p a_i \mathbf{y}'$, where $\mathbf{y}'$ contains none of the letters $a_j$, $j = 1, \ldots, t$. First of all, $i = t$, because $a_i$ would have to be followed by $a_{i+1}$ or by $a_{i-1}$. In the latter case we eventually reach $a_1$, after which no letter can be appended.

We show that $\mathbf{y}' = \mathbf{u}'$. Assume that $\mathbf{y}' \neq \mathbf{u}'$. Then $\mathbf{y}$ contains a factor $a_t w a$ while $\mathbf{u}$ contains $a_t w b$, for some letters $a \neq b$. This is impossible, since $a_t w b$ is the only factor of $\mathbf{u}$ of length $|w| + 2$ that contains $a_t$ and no other letter of $A \setminus A'$. Hence $\mathbf{y} = p a_t \mathbf{u}'$.

Now $p$ ends either with $a_{t-1}$ or with a letter $a \in A'$. In the latter case, $a$ must be the first letter of $\mathbf{u}'$, so $\mathbf{y}$ contains $a a_t a$. It follows that $\mathbf{u}'$ begins with $aa$, so $\mathbf{y}$ contains $a a_t a a$, and hence $\mathbf{u}'$ begins with $aaa$. Continuing this reasoning, we get $\mathbf{u}' = a^\omega$, which contradicts the assumption that $\mathbf{u}$ is aperiodic. Therefore $p$ ends with $a_{t-1}$. Moreover, $p$ cannot contain any letter from $A'$, since such a letter would be followed by some $a_i$ with $i < t$, a contradiction. The only remaining option is $\mathbf{y} = a_j a_{j+1} \cdots a_t \mathbf{u}'$ for some $1 \leq j \leq t$. Then $\mathbf{y}$ is a suffix of $\mathbf{u}$ and lies in $\Omega(\mathbf{u})$, which completes the proof.
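The two eliminations in the proof of Lemma 4.2 can be observed on $\mathbf{u} = 2\mathbf{f}$, i.e., with $t = 1$, $a_1 = 2$ and $\mathbf{u}' = \mathbf{f}$ the Fibonacci word. The sketch below has hypothetical helper names and uses a finite-prefix abelian test, so it gives evidence rather than proof. It compares several candidate words. Those the lemma keeps pass. The word $2\sigma(\mathbf{f})$, where $\mathbf{y}' \neq \mathbf{u}'$, is rejected at length 2, and the word $02\mathbf{f}$, where $p$ ends with a letter of $A'$, is rejected at length 3.

```python
from collections import Counter


def fibonacci_word(length):
    """Prefix of the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def parikh(w):
    """Abelian class of a finite word, as a sorted tuple of (letter, count)."""
    return tuple(sorted(Counter(w).items()))


def abelian_classes(prefix, n):
    """Abelian classes of the length-n factors occurring in a finite prefix."""
    return {parikh(prefix[i:i + n]) for i in range(len(prefix) - n + 1)}


def passes_abelian_test(x_prefix, y_prefix, max_len):
    """Finite approximation of "y in A(x)" up to factor length max_len."""
    return all(abelian_classes(y_prefix, n) <= abelian_classes(x_prefix, n)
               for n in range(1, max_len + 1))


f = fibonacci_word(3000)
u = "2" + f  # u = a_1 u' with t = 1, a_1 = "2" and u' = f
candidates = {
    "u": u,                      # in Omega(u)
    "f = u'": f,                 # in A(u') (and in Omega(u))
    "sigma^7(f)": f[7:],         # in Omega(u')
    "2 sigma(f)": "2" + f[1:],   # y' != u': factor 21 has no abelian match
    "02 f": "02" + f,            # p ends in a letter of A': factor 020 has no match
}
for name, y in candidates.items():
    print(name, passes_abelian_test(u, y[:2000], 12))  # True, True, True, False, False
```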

We start with infinite words for which $\mathcal{P}(n) = n + 2$ for all $n \geq 1$. In particular, such words are ternary. In [3], J. Cassaigne characterizes the words whose factor complexity is $n + C$ for all $n \geq n_0$, where $C$ is a constant. Here we consider the case $C = 2$ and $n_0 = 1$. We first recall the characterization, which can be deduced from [14] (see also [7]).

Theorem 4.3.

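A word $\mathbf{u} \in \{0, 1, 2\}^{\mathbb{N}}$ has factor complexity $\mathcal{P}(n) = n + 2$ for all $n \geq 1$ if and only if, up to permuting the letters, $\mathbf{u}$ has one of the following forms:
1. $\mathbf{u} = 2\mathbf{s}$ for some Sturmian word $\mathbf{s} \in \{0, 1\}^{\mathbb{N}}$;
2. $\mathbf{u} \in \Omega(\varphi(\mathbf{s}))$, where $\mathbf{s}$ is a Sturmian word and $\varphi: 0 \mapsto 02,\ 1 \mapsto 12$;
3. $\mathbf{u} \in \Omega(\varphi(\mathbf{s}))$, where $\mathbf{s}$ is a Sturmian word and $\varphi: 0 \mapsto 0,\ 1 \mapsto 12$.

In this subsection we study the abelian closures of these words. The main result is the following theorem.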

Theorem 4.4.

Let $\mathbf{u}$ be a word of factor complexity $n + 2$ for all $n \geq 1$. If $\mathbf{u}$ is as in Theorem 4.3 (1) or (2), then $\mathcal{A}(\mathbf{u}) = \Omega(\mathbf{u})$. If $\mathbf{u}$ is as in (3), then $\mathcal{A}(\mathbf{u})$ contains uncountably many minimal subshifts.

In fact, we are able to characterize the abelian closures of these words. We do this in parts. The first two cases are straightforward, and we prove them first. The last case requires some further notions.
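The following sketch performs two prefix-level checks. First, it checks that the three families of Theorem 4.3 have complexity $n + 2$ for small $n$. Second, it checks that the non-recurrent word $02\mathbf{u}$ of Example 4.7 passes a finite abelian test. The sketch assumes that the morphisms in cases (2) and (3) are $0 \mapsto 02$, $1 \mapsto 12$ and $0 \mapsto 0$, $1 \mapsto 12$, which is consistent with the proofs of Proposition 4.6 and Example 4.7. It takes $\mathbf{s}$ to be the Fibonacci word, the characteristic Sturmian word $\mathbf{s}_{\alpha,\alpha}$ with $\alpha = (3 - \sqrt{5})/2$. All helper names are hypothetical, and a passing abelian test on prefixes is evidence, not proof.

```python
from collections import Counter


def fibonacci_word(length):
    """Prefix of the fixed point of 0 -> 01, 1 -> 0, i.e. s_{alpha,alpha}."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def apply_morphism(word, images):
    """Image of a finite word under the morphism given by letter images."""
    return "".join(images[c] for c in word)


def complexity(prefix, n):
    """Distinct length-n factors in a finite prefix (exact for small n here)."""
    return len({prefix[i:i + n] for i in range(len(prefix) - n + 1)})


def abelian_classes(prefix, n):
    """Abelian classes (Parikh vectors) of the length-n factors of a prefix."""
    return {tuple(sorted(Counter(prefix[i:i + n]).items()))
            for i in range(len(prefix) - n + 1)}


def passes_abelian_test(x_prefix, y_prefix, max_len):
    """Finite approximation of "y in A(x)" up to factor length max_len."""
    return all(abelian_classes(y_prefix, n) <= abelian_classes(x_prefix, n)
               for n in range(1, max_len + 1))


PHI_CASE_2 = {"0": "02", "1": "12"}  # assumed morphism of case (2)
PHI_CASE_3 = {"0": "0", "1": "12"}   # assumed morphism of case (3)

f = fibonacci_word(3000)
families = {
    "(1) 2s": "2" + f,
    "(2)": apply_morphism(f, PHI_CASE_2),
    "(3)": apply_morphism(f, PHI_CASE_3),
}
for name, w in families.items():
    print(name, [complexity(w, n) - n for n in range(1, 11)])  # every entry is 2

u = families["(3)"]
x = "02" + u  # the non-recurrent word of Example 4.7
print(x.find("02", 1))  # -1: the factor 02 occurs only at the start of x
print(passes_abelian_test(u, x[:2000], 20))  # True
```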

Remark 4.5.

We often use the following argument. Assume that each letter of $\mathbf{x} \in \Sigma^{\mathbb{N}}$ occurs with bounded gaps, and let $\varphi$ be a morphism such that $|\varphi(a)| \leq 1$ for all $a \in \Sigma$. Then $\varphi(\mathcal{A}(\mathbf{x})) \subseteq \mathcal{A}(\varphi(\mathbf{x}))$.

Indeed, since each letter occurs in $\mathbf{x}$ with bounded gaps, the same holds for every $\mathbf{y} \in \mathcal{A}(\mathbf{x})$, so $\varphi(\mathbf{y})$ is infinite. Let $v$ be a factor of $\varphi(\mathbf{y})$. By the length assumption on $\varphi$, there exists a factor $v'$ of $\mathbf{y}$ with $\varphi(v') = v$. Since $\mathbf{y} \in \mathcal{A}(\mathbf{x})$, some factor $w'$ of $\mathbf{x}$ is abelian equivalent to $v'$. Hence $\varphi(\mathbf{x})$ contains the factor $\varphi(w')$, which is abelian equivalent to $v$. As $\mathbf{y}$ and $v$ were arbitrary, $\varphi(\mathbf{y}) \in \mathcal{A}(\varphi(\mathbf{x}))$. In particular, if $\varphi(\mathbf{x})$ is Sturmian, then $\varphi(\mathcal{A}(\mathbf{x})) \subseteq \Omega(\varphi(\mathbf{x}))$ by Theorem 2.7.

Proposition 4.6.

Let $u$ be as in Theorem 4.3 (1) or (2). Then $\mathcal{A}(u) = \Omega(u)$.

Proof. Assume first that $u = 2s$. The claim follows immediately from Lemma 4.2 together with Theorem 2.7. Assume then that $u$ is as in (2). Let $x \in \mathcal{A}(u)$. By applying the morphism $0 \mapsto 0$, $1 \mapsto 0$, $2 \mapsto 1$, we see that every second letter of $x$ is $2$. Further, by applying the map $2 \mapsto \varepsilon$, $i \mapsto i$ for $i = 0, 1$, we see that $x$ maps to a word in $\Omega(s)$. It is straightforward to see that now $x \in \Omega(u)$.

The rest of this subsection is devoted to the case where $u$ is as in Theorem 4.3 (3). This case is more intricate, as shown by the following example: as for the Tribonacci word, $\mathcal{A}(u)$ contains non-recurrent words.

Example 4.7.

Let $\alpha \in \mathbb{T}$, let $\varphi$ be as in Theorem 4.3 (3), and let $u = \varphi(s_{\alpha,\alpha})$. The words $u_1 = \varphi(01 s_{\alpha,\alpha}) = 012u$ and $u_2 = \varphi(10 s_{\alpha,\alpha}) = 120u$ are both in $\Omega(u)$. We claim that the non-recurrent word $x = 02u$ belongs to $\mathcal{A}(u)$. (It is non-recurrent since $02$ is not a factor of $u$: every $2$ in $u$ is preceded by $1$.) Indeed, $\sigma(x) = \sigma^2(u_1) \in \Omega(u)$ (recall that $\sigma$ is the shift map). Further, any prefix of $x$ of length at least 2 is abelian equivalent to the prefix of the same length of $\sigma(u_2) = 20u$. Thus $x \in \mathcal{A}(u)$.

We now analyze the structure of $\mathcal{A}(u)$. Without loss of generality we may take $u = \varphi(s)$, since $\varphi(s)$ is uniformly recurrent. Consider the images of $u$ under the morphisms $\varphi_1\colon 0 \mapsto 0,\ 1 \mapsto 1,\ 2 \mapsto 0$ and $\varphi_2\colon 0 \mapsto 0,\ 1 \mapsto 0,\ 2 \mapsto 1$. We have that $\varphi_1(u) = s_1$ is a Sturmian word of some slope $\alpha$ and intercept $\rho$. (Indeed, $\varphi_1(u) = G \circ E(s)$ using the notation of [18, Chap. 2, p. 72].) Symmetrically, $\varphi_2(u) = s_2$ is a Sturmian word of slope $\alpha$ and intercept $\rho'$. (Again, $\varphi_2(u) = D \circ E(s)$; see the above reference.) In fact, we can say more: $\rho' = \rho - \alpha$ or, equivalently, $s_1 = \sigma(s_2)$. Observe now that $s_1$ contains the factor $00$, meaning that $\alpha < 1/2$. Mapping $x \in \mathcal{A}(u)$ with the morphisms $\varphi_1$ and $\varphi_2$, we obtain two Sturmian words with the same slope $\alpha$. Further, by applying $0 \mapsto \varepsilon$ to $x$, we see that the image of $x$ lies in $\Omega((12)^{\omega})$. This implies that all words in $\mathcal{A}(u)$ are obtained by suitably "interleaving" two Sturmian words of the same slope (the 1s of $s_1$ are encoded by the letter 1 and the 1s of $s_2$ by the letter 2). We now introduce ternary codings of rotations, which capture this phenomenon.

Recall the definition of Sturmian words as codings of rotations on the torus $\mathbb{T}$ with the half-open intervals $I_0 = I(0, 1 - \alpha)$ and $I_1 = I(1 - \alpha, 1)$, where now $\alpha < 1/2$. Take $\zeta \in I(\alpha, 1 - \alpha)$ and split the torus $\mathbb{T}$ into four (three if $\zeta = \alpha$ or $\zeta = 1 - \alpha$) intervals defined by the points $0$, $\zeta - \alpha$, $\zeta$, and $1 - \alpha$ in increasing order. Define the disjoint intervals $J_2 = I(\zeta - \alpha, \zeta)$ (resp., $J_2 = \bar{I}(\zeta - \alpha, \zeta)$), $J_1 = I_1$, and $J_0 = I_0 \setminus J_2$. We must be careful with the value $\zeta = \alpha$ (resp., $\zeta = 1 - \alpha$): if $1 \in I_1$ (resp., $1 - \alpha \in I_1$), then $J_2 = \bar{I}(0, \alpha)$ (resp., $J_2 = I(1 - 2\alpha, 1 - \alpha)$). Take the rotation $R_\alpha$ and the encoding $\nu\colon \mathbb{T} \to \{0, 1, 2\}$, $x \mapsto i$ if and only if $x \in J_i$. The word $t_{\alpha,\zeta,x} = (\nu(R_\alpha^n(x)))_n$ is called the rotation word of slope $\alpha$, offset $\zeta$, and intercept $x$. See Figure 1a for an illustration. When needed, we indicate the choices of the endpoints of the $J_i$ as follows. If $1 \in J_1$ and $\zeta \in J_2$ (resp., $1 \notin J_1$, $\zeta \notin J_2$), we denote the obtained word by $t_{\alpha,\zeta,\rho}$ (resp., $t_{\alpha,\zeta,\rho}$). If $1 \in J_1$ and $\zeta \notin J_2$ (resp., $1 \notin J_1$, $\zeta \in J_2$), we denote this by $t_{\alpha,\zeta,\rho}$ (resp., $t_{\alpha,\zeta,\rho}$). Notice that $t_{\alpha,\alpha,\rho}$ and $t_{\alpha,1-\alpha,\rho}$ are not defined: this would imply that the intervals $J_1$ and $J_2$ overlap.

Observe now that, by the discussion following Example 4.7, for $u$ of factor complexity $n + 2$ as in Theorem 4.3 (3), we have $u = t_{\alpha,\alpha,\rho}$ for some $\rho \in \mathbb{T}$ (see Figure 1b). Further, any word $x \in \mathcal{A}(u)$ is of the form $t_{\alpha,\zeta,\rho'}$ for some $\zeta \in I(\alpha, 1 - \alpha)$ and $\rho' \in \mathbb{T}$. Our main goal is to show that $t_{\alpha,\zeta,\rho} \in \mathcal{A}(u)$ for all possible $\zeta \in [\alpha, 1 - \alpha]$.

Recall that Sturmian words are balanced, so that for each $n \in \mathbb{N}$ and for each $i = 1, 2$, the set $\{|v|_i : v \in L_n(u)\}$ comprises two values (depending on $n$ and $\alpha$). We say that a factor $v$ is 1-heavy (resp., 2-heavy) if $|v|_1$ (resp., $|v|_2$) attains the larger of the two possible values. Otherwise we say that $v$ is 1-light (resp., 2-light). If $v$ is 1-heavy and 2-heavy, we say that $v$ is 1-2-heavy. Similarly, $v$ is called 1-2-light if $v$ is 1-light and 2-light. We make use of the following result appearing in [23, part of Thm. 19].

[Figure 1(a): the torus cut at the points $\zeta - \alpha$, $\zeta$, and $\{-\alpha\}$ into four intervals $J_i$, with a point $x$ and its image $R_\alpha(x)$. Figure 1(b): the torus cut at the points $\alpha$ and $\{-\alpha\}$ into three intervals $J_i$, with a point $x$ and its image $R_\alpha(x)$.]
Figure 1: An illustration of a system of codings of rotations with more than 2 intervals. In 1a we have four intervals and in 1b three intervals. The word $u$ in Theorem 4.3, item 3, is a coding of the orbit of some point in the system 1b.

Proposition 4.8.

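Let $s$ be a Sturmian word of slope $\alpha$ and intercept $\rho$ and let $m \geq 1$. Then the prefix of length $m$ of $s$ is heavy if and only if $\rho \in I(R_\alpha^{-m}(0), 1)$. Here $I(R_\alpha^{-m}(0), 1)$ contains the point $R_\alpha^{-m}(0)$ if and only if $1 \notin I_1$.

We may apply Proposition 4.8 to determine whether a rotation word starts with a heavy factor of length $m$ or not. Indeed, for the letter 1 the proposition applies as is: $t_{\alpha,\zeta,\rho}$ begins with a 1-heavy factor of length $m$ if and only if $\rho \in I(\{-m\alpha\}, 1)$. For the letter 2, observe that $J_2$ is the translate of $J_1$ by $\zeta$, so the occurrences of 2 in $t_{\alpha,\zeta,\rho}$ are the occurrences of 1 in the Sturmian word of slope $\alpha$ and intercept $\rho - \zeta$. Thus the word $t_{\alpha,\zeta,\rho}$ begins with a 2-heavy factor of length $m$ if and only if $\rho \in I(\{-m\alpha + \zeta\}, \zeta)$. Here the interval $I(\{-m\alpha + \zeta\}, \zeta)$ contains the point $\{-m\alpha + \zeta\}$ (resp., $\zeta$) if and only if $\zeta \notin J_2$ (resp., $\zeta \in J_2$). We define the following distance on the torus: $\|x\| = \min\{x, 1 - x\}$. Thus, e.g., $\max\{x, 1 - x\} = 1 - \|x\|$.

Lemma 4.9.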

Let $x = t_{\alpha,\zeta,\rho}$. Then
1. $x$ contains a 1-heavy–2-light and a 2-heavy–1-light factor of each length.
2. There exists a 1-2-heavy factor $v \in L(x)$ of length $m$ if and only if $\{-m\alpha\} < 1 - \|\zeta\|$, or $\{-m\alpha\} = 1 - \|\zeta\|$ and $x = t_{\alpha,\{m\alpha\},\{-n\alpha\}}$ or $x = t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}}$ for some $n \geq 0$.
3. There exists a 1-2-light factor $v \in L(x)$ of length $m$ if and only if $\{-m\alpha\} > \|\zeta\|$, or $\{-m\alpha\} = \|\zeta\|$ and $x = t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}}$ or $x = t_{\alpha,\{m\alpha\},\{-n\alpha\}}$ for some $n \geq 0$.

Proof. We give a proof case by case. Consider factors of length $m$ and write $\mu = \{-m\alpha\}$ for short.
1. We first consider 1-heavy–2-light factors. By the preceding observations on 1-heavy and 2-heavy factors, $x$ has a 1-heavy–2-light factor if and only if $I(\mu, 1) \cap I(\zeta, \{\zeta + \mu\}) \neq \emptyset$. The interval $I(\max\{\zeta, \mu\}, \min\{\zeta + \mu, 1\})$ is always a non-empty part of the intersection, since $\zeta, \mu < 1$ and $\zeta, \mu < \zeta + \mu$. Since $(\{-n\alpha\})_n$ is dense in $[0, 1)$, some shift of $x$ corresponds to a coding of a point in this interval.
Similarly, $x$ has a 2-heavy–1-light factor if and only if $I(0, \mu) \cap I(\{\zeta + \mu\}, \zeta) \neq \emptyset$. The interval $I(\max\{0, \mu + \zeta - 1\}, \min\{\zeta, \mu\})$ is always a non-empty part of the intersection, since $\zeta, \mu > 0$ and $\zeta, \mu > \mu + \zeta - 1$.
2. We then consider 1-2-heavy factors. The word $x$ has a 1-2-heavy factor if and only if $I(\mu, 1) \cap I(\{\mu + \zeta\}, \zeta) \neq \emptyset$. Assume first that $\mu < 1 - \|\zeta\|$. If $\mu < \zeta$, then $I(\mu, \zeta)$ is contained in the intersection. If $\zeta \leq \mu < 1 - \|\zeta\|$, then necessarily $\|\zeta\| = \zeta$, so $\mu + \zeta < 1$, $\{\mu + \zeta\} = \mu + \zeta$, and $I(\mu + \zeta, 1)$ is contained in the intersection. The denseness of $(\{-n\alpha\})_{n \in \mathbb{N}}$ in $\mathbb{T}$ again implies that some shift of $x$ corresponds to a point in this interval.
Assume then that $\mu = 1 - \|\zeta\|$. If $\zeta = \|\zeta\|$, then $\{\mu + \zeta\} = 0$. Now $I(\{\mu + \zeta\}, \zeta)$ and $I(\mu, 1)$ can share at most one point, namely the point $1$ ($= 0$ on $\mathbb{T}$). The intersection is non-empty if and only if $1 \in J_1$ and $\zeta \notin J_2$. Further, $t_{\alpha,\zeta,0}$ is the only word starting with a 1-2-heavy factor. To hit the point $0$ in the orbit starting from $\rho \in \mathbb{T}$, we must have $\rho = \{-n\alpha\}$ for some $n \geq 0$. So $x$ contains a 1-2-heavy factor of length $m$ if and only if $x = t_{\alpha,\zeta,\{-n\alpha\}}$, where $\zeta = 1 - \mu = 1 - \{-m\alpha\} = \{m\alpha\}$.
Similarly, if $\zeta = 1 - \|\zeta\|$, then $I(\{\mu + \zeta\}, \zeta)$ and $I(\mu, 1)$ can share at most one point, namely $\zeta$. The intersection is non-empty if and only if $1 \notin J_1$ and $\zeta \in J_2$. In this case $t_{\alpha,\zeta,\zeta}$ is the only word starting with a 1-2-heavy factor. To hit the point $\zeta$ in the orbit of $\rho$, we must have $\rho = \{\zeta - n\alpha\}$ for some $n \geq 0$. Hence $x$ contains a 1-2-heavy factor if and only if $x = t_{\alpha,\zeta,\{\zeta - n\alpha\}}$, where $\zeta = 1 - \|\zeta\| = \mu = \{-m\alpha\}$.
Assume finally that $\mu > 1 - \|\zeta\|$. It follows that $\mu > \zeta$ and $\mu + \zeta > 1$, so $0 < \{\mu + \zeta\} < \zeta$. Therefore $I(\mu, 1) \cap I(\{\mu + \zeta\}, \zeta) = \emptyset$. This concludes the case of 1-2-heavy factors.
3. Let us finally consider 1-2-light factors. We proceed analogously to the previous case. The word $x$ has a 1-2-light factor of length $m$ if and only if $I(0, \mu) \cap I(\zeta, \{\zeta + \mu\}) \neq \emptyset$. Assume first that $\mu > \|\zeta\|$. If $\mu > \zeta$, then the interval $I(\zeta, \mu)$ is contained in the intersection. If $\zeta \geq \mu > \|\zeta\|$, then $\|\zeta\| = 1 - \zeta$, so $\mu + \zeta > 1$ and $\{\mu + \zeta\} = \mu + \zeta - 1 > 0$. Now $I(0, \{\mu + \zeta\})$ is contained in the intersection.
Assume then that $\mu = \|\zeta\|$. If $\zeta = \|\zeta\|$, then $\mu + \zeta < 1$. Now $I(\zeta, \mu + \zeta)$ and $I(0, \mu)$ can share at most one point, namely the point $\zeta$. By the observations preceding the lemma, the intersection is non-empty if and only if $1 \in J_1$ and $\zeta \notin J_2$. Now $t_{\alpha,\zeta,\zeta}$ is the only word starting with a 1-2-light factor. The only way to hit the point $\zeta$ in the orbit of $\rho$ is that $\rho = \{\zeta - n\alpha\}$ for some $n \geq 0$. It follows that $x$ contains a 1-2-light factor if and only if $x = t_{\alpha,\zeta,\{\zeta - n\alpha\}}$, where $\zeta = \{-m\alpha\}$.
Similarly, if $\zeta = 1 - \|\zeta\|$, then $\mu + \zeta = 1$. Now $I(\zeta, \{\mu + \zeta\})$ and $I(0, \mu)$ can share at most one point, namely the point $1$ ($= 0$). By the observations preceding the lemma, the intersection is non-empty if and only if $1 \notin J_1$ and $\zeta \in J_2$. Now $t_{\alpha,\zeta,0}$ is the only word starting with a 1-2-light factor. Again, $x$ contains a 1-2-light factor if and only if $x = t_{\alpha,\zeta,\{-n\alpha\}}$, where $\zeta = 1 - \mu = \{m\alpha\}$.
Finally, if $\mu < \|\zeta\|$, then $\mu < \zeta$ and $\mu + \zeta \leq \mu + 1 - \|\zeta\| < 1$. Thus $I(0, \mu)$ and $I(\zeta, \mu + \zeta)$ do not intersect. This concludes the proof.

As is evident from Lemma 4.9(2), the existence of a 1-2-heavy factor of a certain length depends not only on $\zeta$, but also on $\rho$ and on how the endpoints of the intervals are defined. For example, the word $t_{\alpha,\{m\alpha\},0}$ begins with a 1-2-heavy factor of length $m$, while $t_{\alpha,\{m\alpha\},\{-n\alpha\}}$ does not contain such a factor. Note further that $t_{\alpha,\{m\alpha\},0}$ contains only one occurrence of such a factor, and hence is non-recurrent. In fact, any word $t_{\alpha,\{m\alpha\},\{-n\alpha\}}$, $n \geq 0$, contains exactly one such factor of length $m$, namely at position $n$.

Lemma 4.10. If $\|\zeta\| > \|\zeta'\|$, then $t_{\alpha,\zeta,\rho} \in \mathcal{A}(t_{\alpha,\zeta',\rho'})$ but $t_{\alpha,\zeta',\rho'} \notin \mathcal{A}(t_{\alpha,\zeta,\rho})$.

Proof. Let $x = t_{\alpha,\zeta,\rho}$ and $u = t_{\alpha,\zeta',\rho'}$ for short. Observe that for any $w \in L_m(u)$ and $w' \in L_m(x)$ we have $\big||w|_1 - |w'|_1\big| \leq 1$ and $\big||w|_2 - |w'|_2\big| \leq 1$, since in both words the occurrences of 1 (resp., of 2) are coded by Sturmian words of slope $\alpha$. Hence the abelian class of a factor of length $m$ is determined by whether it is 1-heavy or 1-light and whether it is 2-heavy or 2-light. We first show that $x \in \mathcal{A}(u)$. By Lemma 4.9(1), both words contain both 1-heavy–2-light and 2-heavy–1-light factors of each length. We show that whenever $x$ contains a 1-2-heavy factor or a 1-2-light factor of length $m$, then $u$ contains such a factor as well, which suffices for the claim. To this end, let $w \in L_m(x)$. If $w$ is a 1-2-heavy factor, then by Lemma 4.9(2) we have $\{-m\alpha\} \leq 1 - \|\zeta\| < 1 - \|\zeta'\|$, so that $u$ contains a 1-2-heavy factor by the same lemma. If $w$ is a 1-2-light factor, then by Lemma 4.9(3) we have $\{-m\alpha\} \geq \|\zeta\| > \|\zeta'\|$, so that $u$ again contains a 1-2-light factor of length $m$.
We then show that $u \notin \mathcal{A}(x)$. Since $(\{-m\alpha\})_{m \geq 1}$ is dense in $[0, 1)$, there exists $m \in \mathbb{N}$ for which $\|\zeta\| > \{-m\alpha\} > \|\zeta'\|$. By Lemma 4.9(3), $u$ contains a 1-2-light factor of length $m$, while $x$ does not. It follows that $u \notin \mathcal{A}(x)$.

We may now characterize the abelian closure of words of factor complexity $n + 2$ via ternary codings of rotations.

Proposition 4.11.

Let $u = t_{\alpha,\alpha,\rho}$ for some $\rho \in \mathbb{T}$. Then $\mathcal{A}(u) = \bigcup_{\zeta \in [\alpha, 1-\alpha]} \Omega(t_{\alpha,\zeta,\rho})$.

Proof. By Lemma 4.10 we have $t_{\alpha,\zeta,\rho} \in \mathcal{A}(u)$ for all $\zeta \in (\alpha, 1 - \alpha)$. For $\zeta = \alpha$ or $\zeta = 1 - \alpha$, all words $t_{\alpha,\zeta,\rho}$ either contain or do not contain a 1-2-light (resp., 1-2-heavy) factor, regardless of $\rho$. (Recall that the words $t_{\alpha,\alpha,\rho}$ and $t_{\alpha,1-\alpha,\rho}$ are not defined.) As there are no other words in $\mathcal{A}(u)$, this concludes the proof.

In fact, using Lemma 4.10 we can characterize the abelian closure of any word $t_{\alpha,\zeta,\rho}$. The proof above, given for the setting $\|\zeta\| = \|\alpha\|$ (i.e., when $u$ is a minimal complexity word), carries over to arbitrary $\zeta$ with minor modifications:

Proposition 4.12.

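Let $u = t_{\alpha,\zeta,\rho}$ with $\|\zeta\| > \|\alpha\|$. Then
$$\mathcal{A}(u) = \Big(\bigcup_{\|\xi\| \geq \|\zeta\|,\ \rho' \in \mathbb{T}} \Omega(t_{\alpha,\xi,\rho'})\Big) \setminus S,$$
where $S$ is a countable set of words depending on $\zeta$ and $\rho$ as follows.
1. If $1 - \|\zeta\|, \|\zeta\| \notin \{\{-m\alpha\} : m \in \mathbb{N}\}$, then $S = \emptyset$.
2. Assume that $1 - \|\zeta\| = \{-m\alpha\}$ for some $m \geq 1$. If $u = t_{\alpha,\{m\alpha\},\{-n\alpha\}}$ or $u = t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}}$ for some $n \geq 0$, then $S = \emptyset$. Otherwise $S = \{t_{\alpha,\{m\alpha\},\{-n\alpha\}} : n \geq 0\} \cup \{t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}} : n \geq 0\}$.
3. Assume that $\|\zeta\| = \{-m\alpha\}$ for some $m \geq 1$. If $u = t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}}$ or $u = t_{\alpha,\{m\alpha\},\{-n\alpha\}}$ for some $n \geq 0$, then $S = \emptyset$. Otherwise $S = \{t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}} : n \geq 0\} \cup \{t_{\alpha,\{m\alpha\},\{-n\alpha\}} : n \geq 0\}$.

Proof.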

Notice that every word $y \in \mathcal{A}(u)$ is of the form $t_{\alpha,\xi,\rho'}$. Indeed, using the morphisms $\varphi_1$ and $\varphi_2$ as in the discussion following Example 4.7, $\varphi_1(y)$ and $\varphi_2(y)$ are Sturmian words of slope $\alpha$. We deduce that $y$ is an interleaving of Sturmian words, which gives the claimed form of $y$. Lemma 4.10 then gives that $\mathcal{A}(u)$ is a subset of $\bigcup_{\|\xi\| \geq \|\zeta\|} \Omega(t_{\alpha,\xi,\rho'})$, possibly a proper one. The same lemma shows that $\mathcal{A}(u)$ is in any case a superset of $\bigcup_{\|\xi\| > \|\zeta\|} \Omega(t_{\alpha,\xi,\rho})$.

Therefore, we may focus on words $y = t_{\alpha,\xi,\rho'}$ with offset $\xi$ satisfying $\|\xi\| = \|\zeta\|$. Notice that the three cases are mutually exclusive. Indeed, in 2. we assume that $1 - \|\zeta\| = \{-m\alpha\}$, which gives $\|\zeta\| = \{m\alpha\}$. Hence $\|\zeta\| \neq \{-m'\alpha\}$ for any $m' \geq 1$, as otherwise $\{(m + m')\alpha\} = 0$, which would make $\alpha$ rational.

To identify the set $S$ of words not in the abelian closure of $u$, we employ Lemma 4.9.
1. Assume that $1 - \|\zeta\|, \|\zeta\| \notin \{\{-m\alpha\} : m \in \mathbb{N}\}$. Lemma 4.9(2) and (3) then state that the existence of a 1-2-heavy or 1-2-light factor depends neither on the point whose orbit we encode nor on the choices of the endpoints of the intervals. That is to say, all words with offset $\xi$, $\|\xi\| = \|\zeta\|$, simultaneously either have or do not have a 1-2-heavy (resp., 1-2-light) factor of length $m$, independently of the starting point $\rho'$ of the orbit. This suffices to show that $S = \emptyset$ in this case.
2. Assume that $1 - \|\zeta\| = \{-m\alpha\}$ for some $m \geq 1$. There is only one length of factors for which the existence of a 1-2-heavy factor depends on the starting point $\rho$ and on the choice of the endpoints of the intervals, namely the length $m$. By Lemma 4.9(2), if $u = t_{\alpha,\{m\alpha\},\{-n\alpha\}}$ or $u = t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}}$ for some $n \geq 0$, then $u$ contains such a factor. In this case $S = \emptyset$. If $u$ is not of this form, then it does not contain such a factor, while all the words $t_{\alpha,\{m\alpha\},\{-n\alpha\}}$ and $t_{\alpha,\{-m\alpha\},\{-(m+n)\alpha\}}$, $n \geq 0$, do. The claim then follows.
3. This case is analogous to the one above.
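Lemma 4.9 can be explored numerically in the generic case, where $\{-m\alpha\}$ differs from both $\|\zeta\|$ and $1 - \|\zeta\|$. The sketch below is an illustration under stated assumptions, not part of the proof:

- It fixes the half-open convention $J_1 = [1-\alpha, 1)$ and $J_2 = [\zeta - \alpha, \zeta)$, so that $1 \notin J_1$ and $\zeta \notin J_2$.
- It uses floating-point arithmetic.
- It takes the illustrative values $\alpha = (3-\sqrt{5})/2$, $\zeta = 0.43$ and $\rho = 0.1$.

It compares the 1-2-heavy and 1-2-light factor types found in a finite prefix of $t_{\alpha,\zeta,\rho}$ with the thresholds in Lemma 4.9(2) and (3). All function names are hypothetical.

```python
import math

ALPHA = (3 - math.sqrt(5)) / 2  # illustrative irrational slope with alpha < 1/2

def torus_norm(t: float) -> float:
    """||t|| = min{t, 1 - t} for t in [0, 1)."""
    return min(t, 1.0 - t)

def rotation_word(alpha: float, zeta: float, rho: float, n: int) -> list:
    """Prefix of t_{alpha,zeta,rho} with J1 = [1 - alpha, 1), J2 = [zeta - alpha, zeta)
    and J0 the rest of the torus; assumes alpha < zeta < 1 - alpha."""
    word = []
    for k in range(n):
        point = (rho + k * alpha) % 1.0  # R_alpha^k(rho)
        if point >= 1.0 - alpha:
            word.append(1)
        elif zeta - alpha <= point < zeta:
            word.append(2)
        else:
            word.append(0)
    return word

def observed_types(word: list, alpha: float, m: int) -> tuple:
    """(1-2-heavy factor found, 1-2-light factor found) among factors of length m."""
    big = math.ceil(m * alpha)  # larger of the two possible counts of 1s (and of 2s)
    heavy = light = False
    for i in range(len(word) - m + 1):
        window = word[i:i + m]
        ones, twos = window.count(1), window.count(2)
        heavy = heavy or (ones == big and twos == big)
        light = light or (ones == big - 1 and twos == big - 1)
    return heavy, light

def predicted_types(alpha: float, zeta: float, m: int) -> tuple:
    """Generic case of Lemma 4.9(2) and (3): mu differs from ||zeta|| and 1 - ||zeta||."""
    mu = (-m * alpha) % 1.0  # {-m alpha}
    return mu < 1.0 - torus_norm(zeta), mu > torus_norm(zeta)

zeta, rho = 0.43, 0.1  # illustrative offset in (alpha, 1 - alpha) and intercept
x = rotation_word(ALPHA, zeta, rho, 5000)
for m in range(1, 21):
    print(m, observed_types(x, ALPHA, m), predicted_types(ALPHA, zeta, m))
```

A finite prefix can only miss factor types, never create them. The comparison is therefore meaningful only when the relevant intersections of intervals in the proof are not too short compared with the gaps of the finite orbit. For $m \leq 20$ and these parameters, that condition holds.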

Surprisingly, for alphabets of size greater than 3 there are always only finitely many minimal subshifts:

Theorem 4.13.

Let $u$ be a recurrent word of factor complexity $n + C$ for all $n \geq 1$, where $C > 2$. Then $\mathcal{A}(u)$ contains exactly two minimal subshifts.

The proof is based on the characterization of words of factor complexity $n + C$ for all $n \geq 1$.

Lemma 4.14 ([9, Lem. 4]). Let $u$ be a recurrent word of minimal complexity over an alphabet $A$. Then there exist distinct elements $e_1, \ldots, e_b, f_1, \ldots, f_c, g_1, \ldots, g_d$ in $A$ such that the sets $E = \{e_1, \ldots, e_b\}$, $F = \{f_1, \ldots, f_c\}$, and $G = \{g_1, \ldots, g_d\}$ are pairwise disjoint, $E \cup F \cup G = A$, $G \neq \emptyset$, and $E \cup F \neq \emptyset$, and there exists a Sturmian word $s$ on $\{0, 1\}$ such that, if $\sigma$ is the substitution $0 \mapsto g_1 \cdots g_d e_1 \cdots e_b$, $1 \mapsto g_1 \cdots g_d f_1 \cdots f_c$, then $\sigma(s) = Wu$, where $W$ is a (possibly empty) prefix of $\sigma(0)$ or $\sigma(1)$.

Lemma 4.15. Let $w$ be a recurrent word of minimal complexity over an alphabet $A$ of cardinality at least 4. Then each $w' \in \mathcal{A}(w)$ is a concatenation of blocks of the form $\sigma(0)$ and $\sigma(1)$ (or their reversals), where $\sigma$ is as in the previous lemma, preceded by a possibly empty suffix of $\sigma(0)$ or $\sigma(1)$ (or a reversal of a prefix).

Proof. The proof is quite direct. Since $|A| \geq 4$, at least one of the sets $G$, $E$, $F$ contains at least two letters. Let it be $E$ (the other cases are similar). First we show that for any $w' \in \mathcal{A}(w)$ the letters from $E$ must occur in blocks $e_1 \cdots e_b$ (or in blocks $e_b \cdots e_1$; this case is symmetric, with all the blocks reversed). For this, it is enough to consider only factors of length 2 and 3. Indeed, if $e_1$ occurs in $w'$, then the only factors of length 2 of $w$ containing $e_1$ are $e_1 e_2$ and $g_d e_1$. With the exception of the case $F = \emptyset$ and $|G| = 1$, the word $g_d e_1 g_d$ is not an abelian factor of $w$. Hence in $w'$ we must have $g_d e_1 e_2$ or $e_2 e_1 g_d$. Continuing this line of reasoning with $e_1, e_2, e_3$ instead of $g_d, e_1, e_2$, etc., we get that in $w'$ the letters from $E$ must occur in blocks $e_1 \cdots e_b$ (or $e_b \cdots e_1$). If $F = \emptyset$ and $|G| = 1$, then $|E| \geq 3$, and we start instead from the letter $e_2$: by considering factors of length 2 and 3, we see that it can occur only in the factors $e_1 e_2 e_3$ and $e_3 e_2 e_1$. The rest of the proof is the same.

In the same way we prove that each such block $e_1 \cdots e_b$ must be surrounded by $g_1 \cdots g_d$ from both sides, so we have $\sigma(0) g_1 \cdots g_d$. Now we show that this block $\sigma(0)$ must be followed by either $\sigma(0)$ or $\sigma(1)$. We already have the beginning $g_1 \cdots g_d$ of the next block. After it, one must have either $e_1$, or $f_1$, or $g_1$ in the case $F = \emptyset$ (again, it is enough to consider factors of length 2 and 3 containing $g_d$). In the cases of $e_1$ or $f_1$ it must continue with $e_1 \cdots e_b$ or $f_1 \cdots f_c$, respectively, thus finishing the block $\sigma(0)$ or $\sigma(1)$. In the case when $F$ is empty, we already have $\sigma(1)$. In the same way one can show that the block $\sigma(1)$ must also be followed by a full block $\sigma(0)$ or $\sigma(1)$.

We remark that the assumption that the alphabet has cardinality at least 4 is essential for the above lemma. In the case $F = \emptyset$, $|G| = 1$ and $|E| = 2$, the letters $e_1$ and $e_2$ can be separated by $g_d$, which corresponds to Theorem 4.4 (3), where we have uncountably many minimal subshifts. The fourth letter blocks this possibility: either $|E| \geq 3$, in which case $e_2$ "glues" the letters $e_1$ and $e_3$, or $|G| \geq 2$, so that the two letters $g_1$ and $g_2$ prevent mixing.

Let $u$ be uniformly recurrent. We define a word $u^R$ for which $L(u^R) = L(u)^R$, i.e., the set of reversals of the factors of $u$. Take the sequence $(p_n)_n$ of prefixes of $u$, and consider the sequence $(p_n^R)_n$ of their reversals. There is a subsequence which converges to an infinite word $u^R$. We claim that $L(u^R) = L(u)^R$. As $u^R$ is constructed using reversals of factors of $u$, we have $L(u^R) \subseteq L(u)^R$. Let $x \in L(u)$. Since $u$ is uniformly recurrent, $x$ must occur in $p_n$ for $n$ large enough. As $x$ occurs with bounded gaps, we conclude that the words in the converging subsequence of $(p_n^R)_n$ must have $x^R$ occurring for all $n$ large enough, with the first occurrence within a uniform bound. Hence $x^R \in L(u^R)$.

Notice that we immediately have $u^R \in \mathcal{A}(u)$, since a word and its reversal are abelian equivalent.

Proof of Theorem 4.13.

The two subshifts are $\Omega(u)$ and $\Omega(u^R)$. Suppose that there exists a word $w \in \mathcal{A}(u)$ that is not in $\Omega(u)$. By Lemma 4.15, cutting a short prefix of $w$, we get a word $w'$ such that $w' = \sigma(v')$ with $v' \in \{0,1\}^{\mathbb{N}}$, and $v'$ is not in the shift orbit closure of $v$, where $v$ is a Sturmian word as in Lemma 4.14. Since $\mathcal{A}(v) = \Omega(v)$ for a Sturmian word $v$, the word $v'$ contains a factor $z$ which is not abelian equivalent to any factor of $v$. It is straightforward to see that then $\sigma(z)$ is not abelian equivalent to any factor of $u$. Indeed, it cannot be abelian equivalent to a factor consisting of full blocks. If it happens to be abelian equivalent to a factor which does not consist of full blocks, then it is also abelian equivalent to a shift of this factor that consists of full blocks, which is not possible. So $\sigma(z)$ is not an abelian factor of $u$; hence $w$ is not in $\mathcal{A}(u)$, a contradiction.
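The block structure of Lemma 4.14 and the abelian obstruction used in the proof of Lemma 4.15 can be checked on a small instance. The sketch assumes the hypothetical choice $G = \{g\}$, $E = \{e_1, e_2\}$, $F = \{f\}$, written as the characters `g`, `e`, `E`, `f`. Then $\sigma(0) = g e_1 e_2$ and $\sigma(1) = g f$. The sketch applies $\sigma$ to a lower mechanical word of the illustrative slope $(3-\sqrt{5})/2$. It then counts factors of the resulting four-letter word, whose complexity should be $n + 3$, and tests whether $g_d e_1 g_d$ is an abelian factor. The helper names are hypothetical.

```python
import math

def sturmian(alpha: float, rho: float, n: int) -> str:
    """Prefix of length n of the lower mechanical word of slope alpha, intercept rho."""
    return "".join(
        str(math.floor((k + 1) * alpha + rho) - math.floor(k * alpha + rho))
        for k in range(n)
    )

def complexity(word: str, n: int) -> int:
    """Number of distinct factors of length n of a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

def is_abelian_factor(word: str, v: str) -> bool:
    """True if some factor of `word` has the same letters with the same counts as v."""
    target = sorted(v)
    return any(sorted(word[i:i + len(v)]) == target
               for i in range(len(word) - len(v) + 1))

# Hypothetical instance of Lemma 4.14: G = {g}, E = {e1, e2} written "e", "E", F = {f}.
sigma = {"0": "geE", "1": "gf"}
s = sturmian((3 - math.sqrt(5)) / 2, 0.0, 3000)
w = "".join(sigma[letter] for letter in s)

print([complexity(w, n) - n for n in range(1, 11)])  # the constant C in n + C
print(is_abelian_factor(w, "geg"))  # g_d e_1 g_d, the obstruction in Lemma 4.15
print(is_abelian_factor(w, "gEe"))  # abelian equivalent to the factor g e_1 e_2
```

With $|E| = 2$, the letter $e_1$ cannot be flanked by $g$ on both sides, even abelianly. This is exactly what forces the letters of $E$ into full blocks in $\mathcal{A}(w)$.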

5. Abelian closures of Arnoux–Rauzy words

In this section we discuss Arnoux–Rauzy words, which are another generalization of Sturmian words to larger alphabets. One of the ways to define Arnoux–Rauzy words is via palindromic closures. The following basics on Arnoux–Rauzy words are well known and mostly taken from [1, 8]. In fact, they generalize the facts about Sturmian words given for binary words in [6].

A finite word $v = v_0 \cdots v_{n-1}$ is a palindrome if it is equal to its reversal, i.e., $v = v_{n-1} \cdots v_0$. The right palindromic closure of a finite word $u$, denoted by $u^{(+)}$, is the shortest palindrome that has $u$ as a prefix. The iterated (right) palindromic closure operator $\psi$ is defined recursively by the rules $\psi(\varepsilon) = \varepsilon$ and $\psi(va) = (\psi(v)a)^{(+)}$ for all $v \in \Sigma^*$ and $a \in \Sigma$. The definition of $\psi$ may be extended to infinite words $u$ over $\Sigma$ as $\psi(u) = \lim_n \psi(\mathrm{pref}_n(u))$, i.e., $\psi(u)$ is the infinite word having $\psi(\mathrm{pref}_n(u))$ as its prefix for every $n \in \mathbb{N}$.

Let $\Delta$ be an infinite word over the alphabet $\Sigma$ such that every letter occurs infinitely often in $\Delta$. The word $c = \psi(\Delta)$ is then called a characteristic (or standard) Arnoux–Rauzy word, and $\Delta$ is called the directive sequence of $c$. An infinite word $u$ is called an Arnoux–Rauzy word if it has the same set of factors as a (unique) characteristic Arnoux–Rauzy word, which is called the characteristic word of $u$. The directive sequence of an Arnoux–Rauzy word is the directive sequence of its characteristic word. An example of an Arnoux–Rauzy word is the Tribonacci word $T$, which can be defined as the fixed point of the morphism $0 \mapsto 01$, $1 \mapsto 02$, $2 \mapsto 0$. It is not hard to see that the Tribonacci word is an Arnoux–Rauzy word with directive sequence $(012)^{\omega}$.

Apparently, the structure of abelian closures of Arnoux–Rauzy words is rather complicated. For example, it is not hard to see that for any Arnoux–Rauzy word with characteristic word $c$, its abelian closure contains $20c$ (here we assume that $0$ is the first letter of $\Delta$ and $2$ is the third distinct letter to occur in $\Delta$), while $20c \notin \Omega(c)$.

T. Hejda, W. Steiner, and L. Q. Zamboni studied the abelian shift of the Tribonacci word $T$. They announced that $\mathcal{A}(T) \setminus \Omega(T) \neq \emptyset$, but that $\Omega(T)$ is the only minimal subshift contained in $\mathcal{A}(T)$ [12, 24]. An interesting open question is to understand the general structure of abelian closures of Arnoux–Rauzy words (see Problem 7.2).
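The iterated palindromic closure $\psi$ translates directly into code. The sketch below uses a naive quadratic search for the longest palindromic suffix, rather than Justin's formula or a linear-time palindrome algorithm, and its function names are hypothetical. It compares $\psi((012)^4)$ with a prefix of the fixed point of $0 \mapsto 01$, $1 \mapsto 02$, $2 \mapsto 0$. It also counts factors, which illustrates that the Tribonacci word has directive sequence $(012)^{\omega}$ and factor complexity $2n + 1$.

```python
def right_palindromic_closure(w: str) -> str:
    """Shortest palindrome having w as a prefix (naive search for the longest palindromic suffix)."""
    for i in range(len(w) + 1):
        suffix = w[i:]
        if suffix == suffix[::-1]:
            # w = w[:i] + suffix with suffix a palindrome: append the reversal of w[:i].
            return w + w[:i][::-1]
    return w  # not reached: the empty suffix is always a palindrome

def psi(directive: str) -> str:
    """Iterated right palindromic closure: psi(va) = (psi(v) a)^(+)."""
    c = ""
    for a in directive:
        c = right_palindromic_closure(c + a)
    return c

def tribonacci_prefix(n: int) -> str:
    """Prefix of length n of the fixed point of 0 -> 01, 1 -> 02, 2 -> 0."""
    rules = {"0": "01", "1": "02", "2": "0"}
    w = "0"
    while len(w) < n:
        w = "".join(rules[letter] for letter in w)
    return w[:n]

def complexity(word: str, n: int) -> int:
    """Number of distinct factors of length n of a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

c = psi("012" * 4)
print(len(c), c[:20])
print(c == tribonacci_prefix(len(c)))
print([complexity(c, n) for n in range(1, 7)])
```

For instance, `psi("0120")` returns `"01020100102010"`, which is the Tribonacci prefix of length 14.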

6. Abelian closures of general subshifts

In this paper and in previous works on the topic, the focus has been on abelian closures of infinite words. It would be interesting to investigate properties of abelian closures of general subshifts.

We recall some definitions from [18, §…]. A bi-infinite word $x \in \Sigma^{\mathbb{Z}}$ is said to avoid a set of words $F \subseteq \Sigma^*$ if $L(x) \cap F = \emptyset$. Let $X_F$ denote the set of bi-infinite words avoiding $F$. A subshift is a set $X_F$ for some $F$. The shift operator is defined similarly to the case of infinite words. A set $X \subseteq \Sigma^{\mathbb{Z}}$ is a subshift if and only if $\sigma(X) = X$ and $X$ is closed in the usual topology on bi-infinite words. Let $X$ be a subshift and let $I(X) = \Sigma^* \setminus L(X)$. Define $F(X)$ as the set of elements of $I(X)$ which are minimal for the factor ordering, i.e., which have no proper factor in $I(X)$. Then $X = X_{F(X)}$.

Definition 6.1.

If $X = X_F$ for some finite set $F \subseteq \Sigma^*$, then $X$ is called a subshift of finite type (SFT). If, on the other hand, $F$ can be taken regular, then $X$ is called sofic. A subshift $X$ is an SFT if and only if $F(X)$ is finite. Similarly, $X$ is sofic if and only if $F(X)$ is regular.

We may define the abelian closure of a subshift straightforwardly.

Definition 6.2.

Let $X$ be a subshift. Then its abelian closure $\mathcal{A}(X)$ is defined as $\bigcup_{x \in X} \mathcal{A}(x)$.

We remark that in the previous sections we considered one-way infinite words, as is more customary in combinatorics on words, whereas for general subshifts it is more natural to consider bi-infinite words. Actually, there is no essential difference for our considerations, as all the results can easily be reformulated for one-way or two-way infinite words. We conclude this paper with a couple of examples of abelian closures of subshifts.

Example 6.3.

Clearly $\mathcal{A}(\Sigma^{\mathbb{Z}}) = \Sigma^{\mathbb{Z}}$. Let $F = \{11\} \subseteq \{0,1\}^*$ and set $X = X_F$. The subshift $X \subseteq \{0,1\}^{\mathbb{Z}}$ is called the golden mean subshift. Consider the abelian closure of $X_F$: it comprises those words in which all 1s are isolated, since $11$ is the only word with its Parikh vector. But this is just $X_F$ itself. Thus $\mathcal{A}(X_F) = X_F$.

In the above example, both subshifts are of finite type, and each turns out to be its own abelian closure. This property does not hold for SFTs in general, as the following example shows. In fact, the abelian closure of an SFT is not in general an SFT.

Example 6.4.

Consider the SFT $X = X_F$ with $F = \{aa, ac, ba, bb, cb\}$. It can be characterized as the set of two-way infinite walks on the following graph.

[Graph on the vertices $a$, $b$, $c$ with edges $a \to b$, $b \to c$, $c \to a$, and $c \to c$, i.e., the allowed two-letter words $ab$, $bc$, $ca$, $cc$.]

Assume, for a contradiction, that $\mathcal{A}(X)$ is an SFT, with $F(\mathcal{A}(X)) = F'$. There is an integer $n$ such that each element of $F'$ has length at most $n$. Consider the word $x = {}^{\omega}c \cdot ab \cdot c^n \cdot ba \cdot c^{\omega}$. Here, for a finite word $v$, ${}^{\omega}v$ denotes the left-infinite word obtained by repeating $v$ infinitely many times. Observe that the factors of length at most $n$ of this word occur either in ${}^{\omega}c \cdot ba \cdot c^{\omega}$ or in ${}^{\omega}c \cdot ab \cdot c^{\omega}$. Both of these words are in $\mathcal{A}(X)$ by inspection, so none of these factors can be in $F'$. Thus $x$ avoids all the forbidden factors. But $x \notin \mathcal{A}(X)$, as it contains the factor $bc^nb$. Indeed, any word in the language $L(\mathcal{A}(X))$ that contains two occurrences of $b$ must contain at least one occurrence of $a$.

The next example shows that this is also possible over a binary alphabet.

Example 6.5. Consider an SFT giving words of the form $\cdots\ \cdots$ (plus $(0011)^{\omega}$ and $(000111)^{\omega}$). It is an SFT of order 6, and its Rauzy graph contains two cycles with the same letter frequencies and a one-way path between them. It is readily verified that words of the form $\cdots\ \cdots$ are in the abelian closure, whereas words of the form $\cdots + \cdots$ are not. As in the previous example, this implies that the abelian closure is not an SFT.

Next we show that the abelian closure of a sofic shift is not necessarily sofic.

Example 6.6.

Let $\Sigma = \{a, b, c, d\}$ be the underlying alphabet. Set $F = \{a, b, d\}c \cup d\{a, b, c\} \cup cRd$, where $R = \{a, b\}^* \setminus (ab)^*$, and let $X = X_F$. Hence $X$ is of the form
$$X = \{a, b\}^{\mathbb{Z}} \cup \{{}^{\omega}c\,x,\ x^R d^{\omega} : x \in \{a, b\}^{\mathbb{N}}\} \cup \{{}^{\omega}c\,(ab)^n d^{\omega} : n \geq 0\} \cup \{{}^{\omega}c^{\omega}, {}^{\omega}d^{\omega}\}.$$
(Here $x^R$ is the left-infinite word defined by $x$, i.e., the letter at position $-n$ of $x^R$ equals the $n$th letter of $x$.)

Let $F' = F(\mathcal{A}(X))$. We show that $F' \cap c\{a, b\}^*d = \{cwd : |w|_a \neq |w|_b\}$. Since $c\{a, b\}^*d$ is regular and the language on the right-hand side is well known to be non-regular, it follows that $F'$ cannot be regular. Hence $\mathcal{A}(X)$ is not sofic.

Let us show the $\subseteq$ direction. Let $w$ satisfy $|w|_a = |w|_b$. We show that the word ${}^{\omega}c\,w\,d^{\omega}$ is in the abelian closure, and thus $cwd \notin F'$. All factors of the form $c^n x$ or $y d^m$, for $x$ a prefix and $y$ a suffix of $w$, occur in the words ${}^{\omega}c\,w^{\omega}$ or ${}^{\omega}w\,d^{\omega}$, which are elements of $X$. We may thus concentrate on factors of the form $c^n w d^m$. Now $c^n w d^m$ is abelian equivalent to $c^n (ab)^{|w|/2} d^m$, which is clearly in the language of $X$.

Let then $w \in \{a, b\}^*$ be such that $|w|_a \neq |w|_b$. Observe that the proper factors of $cwd$ are in the language of $X$. This means that either $cwd \in F'$ or $cwd \in L(\mathcal{A}(X))$. Assume, for a contradiction, that $cwd \in L(\mathcal{A}(X))$. Now $\Psi(cwd) = (|w|_a, |w|_b, 1, 1)$. In $L(X)$, any word whose Parikh vector has last two components equal to 1 has Parikh vector of the form $(m, m, 1, 1)$. As $|w|_a \neq |w|_b$, no word in $L(X)$ is abelian equivalent to $cwd$, so $cwd$ is not an element of $L(\mathcal{A}(X))$. We conclude that $cwd \in F'$.

An interesting open question is whether the abelian closure of a subshift of finite type is always sofic (see Problem 7.3).

7. Conclusions

In this paper, we introduced and studied a notion of abelian subshifts of infinite words. The main open problem we would like to state in this paper is the following:

Problem 7.1.

Characterize words for which $\mathcal{A}(x) = \Omega(x)$.

Among binary uniformly recurrent words, this property characterizes Sturmian words. However, the characterization does not extend to the usual generalizations of Sturmian words to non-binary alphabets: neither to balanced words, nor to words of minimal complexity, nor to Arnoux–Rauzy words. A variant of this question is to characterize the words for which $\mathcal{A}(x)$ contains exactly one minimal subshift. For Arnoux–Rauzy words, we showed that $\mathcal{A}(x) \neq \Omega(x)$, but their abelian closures seem to have a rather complicated structure; in particular, they always contain non-recurrent words. An interesting open question is to understand the general structure of abelian closures of Arnoux–Rauzy words:

Problem 7.2.

Characterize abelian closures of Arnoux–Rauzy words.

Finally, we propose the following open question about general abelian subshifts:

Problem 7.3.

Is the abelian closure of an SFT always soﬁc?
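Small cases related to Problem 7.3 and to Examples 6.3 and 6.4 can be explored by brute force. The test is whether every factor of a finite word is abelian equivalent to a word of the same length that avoids $F$. This is only a sketch, with two explicit assumptions:

- It treats every word avoiding $F$ as a word of $L(X)$. This holds in the two examples, where $F$ consists of two-letter words and every letter has both a predecessor and a successor in the graph.
- It pools factors coming from different points of $X$. It therefore gives a necessary condition for membership in $L(\mathcal{A}(X))$, not a characterization.

The function names are hypothetical.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def parikh_vectors(alphabet: str, forbidden: tuple, k: int) -> frozenset:
    """Parikh vectors of all length-k words over `alphabet` avoiding every word in
    `forbidden`. Assumption: each such word belongs to L(X_F)."""
    vectors = set()
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        if not any(f in word for f in forbidden):
            vectors.add(tuple(word.count(a) for a in alphabet))
    return frozenset(vectors)

def abelian_admissible(word: str, alphabet: str, forbidden: tuple) -> bool:
    """Necessary condition for word in L(A(X_F)): every factor is abelian equivalent
    to some word of the same length avoiding `forbidden`."""
    for k in range(1, len(word) + 1):
        allowed = parikh_vectors(alphabet, forbidden, k)
        for i in range(len(word) - k + 1):
            if tuple(word[i:i + k].count(a) for a in alphabet) not in allowed:
                return False
    return True

GOLDEN_MEAN = ("11",)                         # Example 6.3
EXAMPLE_6_4 = ("aa", "ac", "ba", "bb", "cb")  # Example 6.4

print(abelian_admissible("0101001", "01", GOLDEN_MEAN))
print(abelian_admissible("0110", "01", GOLDEN_MEAN))
print(abelian_admissible("cbac", "abc", EXAMPLE_6_4))  # a factor of ^w c ba c^w
print(abelian_admissible("bccb", "abc", EXAMPLE_6_4))  # two b's and no a between them
```

For the golden mean subshift, the admissible words of each length are exactly the words avoiding $11$, in line with $\mathcal{A}(X_F) = X_F$. For Example 6.4, `bccb` fails only at length 4, which mirrors the factor $bc^nb$ used there.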

Acknowledgements

We are grateful to Joonatan Jalonen, Ville Salo, and Luca Zamboni for fruitful discussions and helpful comments. Svetlana Puzynina is partially supported by the Russian Foundation of Basic Research (grant 20-01-00488) and by the Foundation for the Advancement of Theoretical Physics and Mathematics "BASIS". Part of the research was performed while Markus Whiteland was at the Department of Mathematics and Statistics, University of Turku, Finland.

References

[1] P. Arnoux and G. Rauzy. Représentation géométrique de suites de complexité 2n+1. Bulletin de la Société Mathématique de France, 119:199–215, 1991. doi:10.24033/bsmf.2164.
[2] J. Barát and P. Varjú. Partitioning the positive integers to seven Beatty sequences. Indag. Math., 14(2):149–161, 2003. ISSN 0019-3577. doi:10.1016/S0019-3577(03)90000-0.
[3] J. Cassaigne. Sequences with grouped factors. In Developments in Language Theory III, Publications of Aristotle University of Thessaloniki, pages 211–222, 1998.
[4] S. Constantinescu and L. Ilie. Fine and Wilf's theorem for abelian periods. EATCS Bull., 89:167–170, 2006.
[5] E. M. Coven and G. A. Hedlund. Sequences with Minimal Block Growth. Math. Syst. Theory, 7(2):138–153, 1973. doi:10.1007/BF01762232.
[6] A. de Luca. Sturmian words: structure, combinatorics, and their arithmetics. Theoretical Computer Science, 183:45–82, 1997. doi:10.1016/S0304-3975(96)00310-6.
[7] G. Didier. Caractérisation des N-écritures et application à l'étude des suites de complexité ultimement n + c^{ste}. Theoret. Comp. Sci., 215(1-2):31–49, 1999. doi:10.1016/S0304-3975(97)00122-9.
[8] X. Droubay, J. Justin, and G. Pirillo. Episturmian words and some constructions by de Luca and Rauzy. Theoretical Computer Science, 255:539–553, 2001.
[9] S. Ferenczi and C. Mauduit. Transcendence of numbers with a low complexity expansion. Journal of Number Theory, 67:146–161, 1997. doi:10.1006/jnth.1997.2175.
[10] A. S. Fraenkel. Complementing and exactly covering sequences. J. Comb. Theory Ser. A, 14(1):8–20, 1973. ISSN 0097-3165. doi:10.1016/0097-3165(73)90059-9.
[11] R. L. Graham. An efficient algorithm for determining the convex hull of a finite planar set. Inf. Process. Lett., 1(4):132–133, 1972. doi:10.1016/0020-0190(72)90045-2. URL https://doi.org/10.1016/0020-0190(72)90045-2.
[12] T. Hejda, W. Steiner, and L. Q. Zamboni. What is the Abelianization of the tribonacci shift?, 2015. Workshop on Automatic Sequences, Liège, May 2015.
[13] P. Hubert. Suites équilibrées. Theor. Comput. Sci., 242(1-2):91–108, 2000. doi:10.1016/S0304-3975(98)00202-3.
[14] I. Kaboré and T. Tapsoba. Combinatoire de mots récurrents de complexité n + 2. ITA, 41(4):425–446, 2007. doi:10.1051/ita:2007027.
[15] J. Karhumäki, S. Puzynina, and M. A. Whiteland. On abelian subshifts. In Developments in Language Theory 2018, volume 11088 of Lecture Notes in Computer Science, pages 453–464. Springer, 2018. doi:10.1007/978-3-319-98654-8_37.
[16] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Camb. Univ. Press, New York, NY, USA, 1995. ISBN 0-521-55900-6.
[17] M. Lothaire. Combinatorics on Words, volume 17 of Encycl. Math. Appl. Addison-Wesley, 1983. ISBN 978-0-201-13516-9.
[18] M. Lothaire. Algebraic combinatorics on words, volume 90 of Encycl. Math. Appl. Cambridge University Press, 2002. ISBN 0-521-81220-8. doi:10.1017/CBO9781107326019.
[19] M. Morse and G. A. Hedlund. Symbolic Dynamics II. Sturmian Trajectories. Am. J. Math., 62:1–42, 1940. ISSN 00029327, 10806377.
[20] S. Puzynina and M. A. Whiteland. Abelian closures of infinite binary words. CoRR, abs/2008.08125, 2020. URL https://arxiv.org/abs/2008.08125.
[21] S. Puzynina and L. Q. Zamboni. Abelian returns in Sturmian words. J. Comb. Theory Ser. A, 120(2):390–408, 2013. doi:10.1016/j.jcta.2012.09.002.
[22] G. Richomme, K. Saari, and L. Q. Zamboni. Abelian complexity of minimal subshifts. J. Lond. Math. Soc., 83(1):79–95, 2011. doi:10.1112/jlms/jdq063.
[23] M. Rigo, P. Salimov, and É. Vandomme. Some properties of abelian return words.
J. Lond. Math. Soc. , 83(1):79–95, 2011. doi:10.1112/jlms/jdq063.[23] M. Rigo, P. Salimov, and ´E. Vandomme. Some properties of abelian return words.