On Abelian Closures of Infinite Non-binary Words
arXiv preprint [math.CO]
Juhani Karhumäki (a), Svetlana Puzynina (b,c), Markus A. Whiteland (d)
(a) Department of Mathematics and Statistics, FI-20014 University of Turku, Finland
(b) St. Petersburg State University, Russia
(c) Sobolev Institute of Mathematics, Russia
(d) Max Planck Institute for Software Systems, Saarland Informatics Campus, Saarbrücken, Germany
Abstract
Two finite words $u$ and $v$ are called abelian equivalent if each letter occurs equally many times in both $u$ and $v$. The abelian closure $\mathcal{A}(x)$ of an infinite word $x$ is the set of infinite words $y$ such that, for each factor $u$ of $y$, there exists a factor $v$ of $x$ which is abelian equivalent to $u$. The notion of an abelian closure gives a characterization of Sturmian words: among uniformly recurrent binary words, periodic and aperiodic Sturmian words are exactly those words for which $\mathcal{A}(x)$ equals the shift orbit closure $\Omega(x)$. Furthermore, for an aperiodic binary word that is not Sturmian, its abelian closure contains infinitely many minimal subshifts. In this paper we consider the abelian closures of well-known families of non-binary words, such as balanced words and minimal complexity words. We also consider abelian closures of general subshifts, make some initial observations on them, and pose some related open questions.
1. Introduction
Let $x \in \Sigma^{\mathbb{N}}$ be an infinite word over an alphabet $\Sigma$. We define the language of $x$, denoted by $\mathcal{L}(x)$, as the set of factors of $x$, i.e., blocks of consecutive letters of $x$. A subshift $\Omega(x)$ generated by an infinite word $x$ can be defined as the set of infinite words whose languages are included in $\mathcal{L}(x)$: $\Omega(x) = \{ y \in \Sigma^{\mathbb{N}} : \mathcal{L}(y) \subseteq \mathcal{L}(x) \}$. In this paper, we consider an abelian version of the notion of a subshift. Two finite words $u$ and $v$ are called abelian equivalent, denoted by $u \sim_{\mathrm{ab}} v$, if each letter occurs equally many times in both $u$ and $v$. Various abelian properties of words have been actively studied recently, e.g., abelian complexity, abelian powers, abelian periods, etc. [4, 21, 22]. We define the abelian closure $\mathcal{A}(x)$ of an infinite word $x$ as the set of infinite words $y$ such that, for each factor $u$ of $y$, there exists a factor $v$ of $x$ with $u \sim_{\mathrm{ab}} v$. Clearly, $\Omega(x) \subseteq \mathcal{A}(x)$ for any word $x$.

We start with two examples showing completely different structures of abelian closures: Sturmian words and the Thue–Morse word. Sturmian words can be defined as infinite aperiodic words which have $n + 1$ distinct factors for each length $n$. They admit various characterizations; in particular, they are exactly the aperiodic binary balanced words (i.e., the numbers of occurrences of 1 in factors of the same length differ by at most 1). It is not hard to see that, for a Sturmian word $x$, $\Omega(x) = \mathcal{A}(x)$ (so the abelian closure is small: it contains only the subshift itself). Indeed, due to balance there are exactly two abelian classes of factors of each length. Therefore, any word $y \in \mathcal{A}(x)$ must be balanced. Further, the frequencies of letters of $y$ are uniquely defined by $\mathcal{A}(x)$. Thus $y$ is Sturmian with the same letter frequencies as $x$, i.e., $y \in \Omega(x)$. In fact, the property $\Omega(x) = \mathcal{A}(x)$ characterizes Sturmian words among uniformly recurrent binary words (see Theorem 2.6).

Preprint submitted to Elsevier, January 1, 2021

The Thue–Morse word $\mathbf{TM} = 011010011001\cdots$ can be defined as the fixed point starting with 0 of the morphism $\mu\colon 0 \mapsto 01,\ 1 \mapsto 10$. For odd lengths $\mathbf{TM}$ has two abelian classes of factors, and for even lengths three. Further, the number of occurrences of 1 in each factor differs by at most 1 from half of its length [22]. It is easy to see that any factor of any word in $\{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$ has the same property, i.e., $\{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}} \subseteq \mathcal{A}(\mathbf{TM})$. In fact, equality holds: $\mathcal{A}(\mathbf{TM}) = \{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$ (so the abelian closure of $\mathbf{TM}$ is huge compared to its shift orbit closure). Indeed, let $x \in \mathcal{A}(\mathbf{TM})$. Then $x$ has blocks of each letter of length at most 2 (since $\mathbf{TM}$ has no factors 000 and 111). Moreover, between two consecutive occurrences of 00 there must occur 11, and vice versa (otherwise we have a factor $00(10)^n 0$, where the number of occurrences of 1 differs by more than 1 from half of its length). Clearly, such a word is in $\{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$. In [20], we show that this fact can be generalized to all binary words: each binary aperiodic uniformly recurrent word which is not Sturmian admits infinitely many minimal subshifts in its abelian closure. Moreover, in the case of rational letter frequency, the abelian closure always contains a morphic image of the full shift.

In general, the abelian closure of an infinite word might have a rather complicated structure. T. Hejda, W. Steiner, and L.Q. Zamboni studied the abelian shift of the Tribonacci word $T$ defined as the fixed point of $\tau\colon 0 \mapsto 01,\ 1 \mapsto 02,\ 2 \mapsto 0$. They have announced that $\mathcal{A}(T)$ contains only one minimal subshift, namely $\Omega(T)$ itself, but that there exist other words in it as well [12, 24].

The study of abelian closures is motivated by the following question: given an infinite word $x$, how strong is the bond between its abelian factors and its language? We quantify this bond by the size of the abelian closure. By size we do not mean the usual cardinality of a subshift; rather, we mean the number of disjoint minimal subshifts contained in $\mathcal{A}(x)$. A shift orbit closure $\Omega(x)$ is minimal if it does not properly contain another shift orbit closure. If $\mathcal{A}(x)$ is huge (it contains infinitely many minimal subshifts), then this bond is quite weak. On the other hand, if $\mathcal{A}(x)$ is small (finitely many minimal subshifts), then the bond is quite strong. The strongest bond is attained when $\mathcal{A}(x)$ is a minimal subshift itself. In this case we necessarily have $\mathcal{A}(x) = \Omega(x)$. It is not hard to see that for purely periodic words, their abelian closure is finite (see Proposition 2.3). On the other hand, the abelian closure of an ultimately periodic word can be huge (see Example 2.4). For reasons stemming from this observation, when dealing with abelian closures of individual words, we shall assume the words to define minimal shift orbit closures.

There is indeed no particular reason to restrict the definition of abelian closures to just individual words; the abelian closure of a set of words $X$ comprises those infinite words $y$ each of whose factors is abelian equivalent to some factor of one of the words in $X$. For example, the abelian closure of $\Omega(x)$ coincides with $\mathcal{A}(x)$. As mentioned previously, the shift orbit closure of an infinite word is a certain type of a subshift.
In general, a non-empty set $X \subseteq \Sigma^{\mathbb{N}}$ of infinite words is called a subshift if it is closed (as a subset of the compact metric space $\Sigma^{\mathbb{N}}$ equipped with the usual product topology defined by the discrete topology on $\Sigma$) and $\sigma(X) \subseteq X$, where the shift map $\sigma$ is defined by $\sigma(x)_i = x_{i+1}$. (Subshifts are often defined over bi-infinite words, in which case we require $\sigma(X) = X$ in the definition.) A subshift is called minimal if it does not contain a proper subshift. Hence a minimal subshift is always the shift orbit closure of some word $x$. It is routinely checked that $\mathcal{A}(X)$ is a subshift for any set $X$ of words.

In this paper, we study how the characterization of Sturmian words as aperiodic uniformly recurrent words with $\mathcal{A}(x) = \Omega(x)$ extends to non-binary alphabets. We then study the abelian closures of certain generalizations of Sturmian words; some preliminary results have been reported at the DLT 2018 conference [15]. Besides that, we discuss abelian closures of subshifts in general. In Section 3 we characterize the abelian subshifts of aperiodic recurrent balanced words; they are a finite union of minimal subshifts. In Section 4 we consider abelian closures of words over a $k$-letter alphabet with factor complexity $n + k - 1$ for all $n$, which are aperiodic words of minimal complexity involving $k$ letters. The behaviour is different depending on $k$. For $k = 2$, we are in the case of Sturmian words, so we have $\mathcal{A}(x) = \Omega(x)$. Surprisingly, the most complicated behaviour is exhibited in the ternary alphabet. We show that for $k = 3$, depending on the word $x$, its abelian closure $\mathcal{A}(x)$ contains either exactly one, or uncountably many minimal subshifts. For alphabets of size greater than 3, $\mathcal{A}(x)$ equals the union of exactly two minimal subshifts, $\Omega(x)$ and its "reversal". Further, in Section 5, we show that for Arnoux–Rauzy words, their abelian closures contain non-recurrent words, and hence $\mathcal{A}(x) \neq \Omega(x)$. We then extend our interest to general subshifts in Section 6. Our focus is on subshifts defined using notions from formal language theory. We show that the abelian closure of a subshift of finite type (resp., sofic subshift) is not necessarily a subshift of finite type (resp., a sofic subshift) (see Section 6 for definitions). We then conclude with open problems.
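The claims about the Thue–Morse word in the introduction can be checked on finite prefixes. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and `looks_in_abelian_closure` only tests a necessary condition on a finite prefix, so it can refute membership in $\mathcal{A}(x)$ but never prove it.

```python
from collections import Counter


def parikh(word):
    """Parikh vector of a finite word, as a sorted tuple of (letter, count)."""
    return tuple(sorted(Counter(word).items()))


def abelian_equivalent(u, v):
    """True if every letter occurs equally often in u and v."""
    return parikh(u) == parikh(v)


def thue_morse_prefix(length):
    """Prefix of TM: letter n is the parity of the number of 1s in binary n."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def abelian_classes(word, n):
    """Parikh vectors of all length-n factors occurring in the finite word."""
    return {parikh(word[i:i + n]) for i in range(len(word) - n + 1)}


def looks_in_abelian_closure(y, x, max_len):
    """Finite check (hypothetical helper): every factor of y of length at
    most max_len is abelian equivalent to some factor of the prefix x."""
    return all(abelian_classes(y, n) <= abelian_classes(x, n)
               for n in range(1, max_len + 1))


tm = thue_morse_prefix(1024)
# Number of abelian classes of factors of lengths 1..8.
print([len(abelian_classes(tm, n)) for n in range(1, 9)])
# A prefix of a word in {eps,0,1}.{01,10}^N passes; 000 does not.
print(looks_in_abelian_closure("0" + "01" * 20, tm, 8))
print(looks_in_abelian_closure("000", tm, 3))
```

The prefix length 1024 and the bound 8 are arbitrary choices; they only need to be large enough that every short factor of the Thue–Morse word already occurs in the prefix.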
2. Notation and first observations
We recall some notation and basic terminology from the literature of combinatorics on words. We refer the reader to [17, 18] for more on the subject.

The set of finite words over an alphabet $\Sigma$ is denoted by $\Sigma^*$ and the set of non-empty words is denoted by $\Sigma^+$. The empty word is denoted by $\varepsilon$. We let $|w|$ denote the length of a word $w \in \Sigma^*$. By convention, $|\varepsilon| = 0$. A factor of a word $x$ is any block of its consecutive letters, and we let $\mathcal{L}(x)$ denote the set of factors of $x$. The set of length-$n$ factors of $x$ is denoted by $\mathcal{L}_n(x)$. The length-$n$ prefix of the word $x$ is denoted by $\mathrm{pref}_n(x)$. The factor complexity function $P_x$ is defined by $P_x(n) = |\mathcal{L}_n(x)|$. An infinite word $x$ is called recurrent if each factor of $x$ occurs infinitely many times in $x$. Further, $x$ is uniformly recurrent if for each factor $u \in \mathcal{L}(x)$ there exists $N \in \mathbb{N}$ such that $u$ occurs as a factor in each factor of length $N$ of $x$. For a finite word $u \in \Sigma^*$, we let $|u|_a$ denote the number of occurrences of the letter $a \in \Sigma$ in $u$. For a finite word $v$, we let $v^\omega$ denote the infinite word obtained by repeating $v$ infinitely many times.

For $x \in \Sigma^{\mathbb{N}}$ and $a \in \Sigma$, the limits
$$\overline{\mathrm{freq}}_x(a) := \lim_{n \to \infty} \frac{\sup_{v \in \mathcal{L}_n(x)} |v|_a}{n} \quad\text{and}\quad \underline{\mathrm{freq}}_x(a) := \lim_{n \to \infty} \frac{\inf_{v \in \mathcal{L}_n(x)} |v|_a}{n}$$
exist and, moreover,
$$\overline{\mathrm{freq}}_x(a) = \inf_{n \in \mathbb{N}} \frac{\sup_{v \in \mathcal{L}_n(x)} |v|_a}{n} \quad\text{and}\quad \underline{\mathrm{freq}}_x(a) = \sup_{n \in \mathbb{N}} \frac{\inf_{v \in \mathcal{L}_n(x)} |v|_a}{n}.$$
This follows from the fact that $\sup_{v \in \mathcal{L}_n(x)} |v|_a$ (resp., $\inf_{v \in \mathcal{L}_n(x)} |v|_a$) is subadditive (resp., superadditive) with respect to $n$. It immediately follows that
$$\sup_{v \in \mathcal{L}_n(x)} |v|_a \geq n\, \overline{\mathrm{freq}}_x(a) \quad\text{and}\quad \inf_{v \in \mathcal{L}_n(x)} |v|_a \leq n\, \underline{\mathrm{freq}}_x(a) \qquad (1)$$
for all $n \in \mathbb{N}$. If $\overline{\mathrm{freq}}_x(a) = \underline{\mathrm{freq}}_x(a)$, we denote the common limit by $\mathrm{freq}_x(a)$ and we say that $x$ has uniform frequency of $a$.

A subshift $X \subseteq \Sigma^{\mathbb{N}}$, $X \neq \emptyset$, is a closed set (with respect to the product topology on $\Sigma^{\mathbb{N}}$) satisfying $\sigma(X) \subseteq X$, where $\sigma$ is the shift operator (defined by $\sigma(a_0 a_1 a_2 \cdots) = a_1 a_2 \cdots$).
For a subshift $X \subseteq \Sigma^{\mathbb{N}}$ we let $\mathcal{L}(X) = \bigcup_{y \in X} \mathcal{L}(y)$. A subshift $X \subseteq \Sigma^{\mathbb{N}}$ is called minimal if $X$ does not contain any proper subshifts. Observe that two minimal subshifts $X$ and $Y$ are either equal or disjoint. Let $x \in \Sigma^{\mathbb{N}}$. We let $\Omega(x)$ denote the shift orbit closure of $x$, which may be defined as the subshift $\{ y \in \Sigma^{\mathbb{N}} : \mathcal{L}(y) \subseteq \mathcal{L}(x) \}$. Thus $\mathcal{L}(\Omega(x)) = \mathcal{L}(x)$ for any word $x \in \Sigma^{\mathbb{N}}$. It is known that $\Omega(x)$ is minimal if and only if $x$ is uniformly recurrent. For a morphism $\varphi\colon \Sigma^* \to \Delta^*$ (that is, $\varphi(uv) = \varphi(u)\varphi(v)$ for all $u, v \in \Sigma^*$) and a subshift $X \subseteq \Sigma^{\mathbb{N}}$, we define $\varphi(X) = \bigcup_{x \in X} \Omega(\varphi(x))$. When using erasing morphisms, that is, morphisms mapping some letter to $\varepsilon$, we make sure that no point in $X$ gets mapped to a finite word. For more on this topic we refer the reader to [16].

We recall definitions and properties related to Sturmian words from [18, Chapter 2]. We identify the interval $[0, 1)$ with the unit circle $\mathbb{T}$ (the point 1 is identified with the point 0). For points $x, y \in \mathbb{T}$, we let $I(x, y)$ (resp., $\bar{I}(x, y)$) denote the half-open interval on $\mathbb{T}$ containing $x$ (resp., $y$), starting from $x$ and ending at $y$ in counter-clockwise direction. We omit the bar whenever it does not matter which endpoint is in the interval. Let $\alpha \in \mathbb{T}$ be irrational or rational and let $\rho \in \mathbb{T}$. The map $R_\alpha\colon \mathbb{T} \to \mathbb{T}$, $x \mapsto \{x + \alpha\}$, where $\{x\} = x - \lfloor x \rfloor$ is the fractional part of $x \in \mathbb{R}$, defines a (counter-clockwise) rotation on $\mathbb{T}$. Divide $\mathbb{T}$ into two half-open intervals $I_0 = I(0, 1 - \alpha)$ and $I_1 = I(1 - \alpha, 1)$ (resp., $I_0 = \bar{I}(0, 1 - \alpha)$ and $I_1 = \bar{I}(1 - \alpha, 1)$), and define the coding $\nu\colon \mathbb{T} \to \{0, 1\}$, $x \mapsto i$ if $x \in I_i$, $i = 0, 1$. The rotation word $s_{\alpha,\rho}$ (resp., $\bar{s}_{\alpha,\rho}$) of slope $\alpha$ and intercept $\rho$ is the word $a_0 a_1 \cdots \in \{0, 1\}^{\mathbb{N}}$ defined by $a_n = \nu(R_\alpha^n(\rho))$ for all $n \in \mathbb{N}$. Note that 00 occurs in $s_{\alpha,\rho}$ if and only if $\alpha < 1/2$. Clearly, $s_{\alpha,\rho}$ is aperiodic if and only if $\alpha$ is irrational. Each aperiodic rotation word is a Sturmian word and vice versa. We call periodic rotation words periodic Sturmian. Observe the special role played by $s_{\alpha,\alpha} = s$: both $01s$ and $10s \in \Omega(s)$ for irrational $\alpha \in (0, 1)$. Each Sturmian word $s$ is uniformly recurrent, so that $\Omega(s)$ is minimal. Further, $s' \in \Omega(s)$ if and only if $s$ and $s'$ are of the same slope. In particular, the intercepts or the endpoints of $I_0$ and $I_1$ do not play any role when speaking of the shift orbit closure of a Sturmian word. (In Section 4 we pay attention to these choices.)

An infinite word $x \in \Sigma^{\mathbb{N}}$ is called balanced if, for all $v, v' \in \mathcal{L}(x)$ with $|v| = |v'|$ and for all $a \in \Sigma$, we have $\bigl| |v|_a - |v'|_a \bigr| \leq 1$. Periodic and aperiodic Sturmian words are exactly the recurrent balanced binary words [19]. It follows that, for each (aperiodic or periodic) Sturmian word $s$, the set $\{ |v|_1 : v \in \mathcal{L}_n(s) \}$ consists of at most two values $k$ and $k + 1$ for some $k$ depending on $s$ and $n$. Observe now that $\mathrm{freq}_s(1) = \alpha$ for any Sturmian word $s$ of slope $\alpha$. By (1), $k$ above equals $\lfloor n\alpha \rfloor$.

We turn to the main notion of this paper.

Definition 2.1.
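For $x \in \Sigma^{\mathbb{N}}$ we define the abelian closure of $x$ as
$$\mathcal{A}(x) = \{ y \in \Sigma^{\mathbb{N}} \mid \forall u \in \mathcal{L}(y)\ \exists v \in \mathcal{L}(x) : u \sim_{\mathrm{ab}} v \}.$$

For any $x \in \Sigma^{\mathbb{N}}$, the abelian closure $\mathcal{A}(x)$ is indeed a subshift. We make preliminary observations on abelian closures of infinite words.

Lemma 2.2.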
Assume $x \in \Sigma^{\mathbb{N}}$ has uniform frequency of a letter $a \in \Sigma$. Then any word $y \in \mathcal{A}(x)$ has uniform frequency of $a$ and $\mathrm{freq}_y(a) = \mathrm{freq}_x(a)$.

Proof. For all $y \in \mathcal{A}(x)$ and for all $n \in \mathbb{N}$, we have immediately from the definition of $\mathcal{A}(x)$ that
$$\frac{\sup_{v \in \mathcal{L}_n(x)} |v|_a}{n} \geq \frac{\sup_{v \in \mathcal{L}_n(y)} |v|_a}{n} \geq \frac{\inf_{v \in \mathcal{L}_n(y)} |v|_a}{n} \geq \frac{\inf_{v \in \mathcal{L}_n(x)} |v|_a}{n}.$$
Letting $n \to \infty$ gives our claims.

We immediately have that if $x$ has an irrational uniform frequency of some letter $a$, then $\mathcal{A}(x)$ contains only aperiodic words. We continue by observing how the abelian closures of periodic and ultimately periodic words can differ.

Proposition 2.3.
For any periodic word $x$, the abelian closure $\mathcal{A}(x)$ is finite.

Proof. A word $y$ is periodic if and only if all factors of length $n$ are abelian equivalent for some $n \geq 1$. Let $n$ be the least such integer for $x$. It follows that all factors of length $n$ of any word $y \in \mathcal{A}(x)$ are abelian equivalent. Thus $y = v^\omega$ with $|v|$ dividing $n$. There are finitely many such words.

In general, the abelian closure of an ultimately periodic word can be huge.

Example 2.4.
Let $x = 0011(001101)^\omega$. It is readily verified that for odd lengths $x$ has two abelian classes of factors, and for even lengths three. Further, for each factor of $x$, the number of occurrences of 1 differs by at most one from half of its length. Thus, by the discussion in the introduction, we have $\mathbf{TM} \in \mathcal{A}(x)$ so that $\mathcal{A}(x) = \mathcal{A}(\mathbf{TM}) = \{\varepsilon, 0, 1\} \cdot \{01, 10\}^{\mathbb{N}}$.

A family of such examples is given in [15, Ex. 2].

At the end of this section we show that, for a uniformly recurrent binary word $x$, $\mathcal{A}(x)$ contains exactly one minimal subshift if and only if $x$ is a Sturmian word. We start with a crucial observation, which is characteristic only for binary words.

Lemma 2.5 (Corridor Lemma). Let $x$ be a binary word. Then $y \in \mathcal{A}(x)$ if and only if, for all $n \in \mathbb{N}$, we have
$$\inf_{u \in \mathcal{L}_n(y)} |u|_1 \geq \inf_{u \in \mathcal{L}_n(x)} |u|_1 \quad\text{and}\quad \sup_{u \in \mathcal{L}_n(y)} |u|_1 \leq \sup_{u \in \mathcal{L}_n(x)} |u|_1.$$

Proof.
It is easy to see (e.g., by a sliding window argument) that, for any binary word $z$ and any $n \geq 1$, there exists a word $u \in \mathcal{L}_n(z)$ with $|u|_1 = m$ if and only if $\inf_{v \in \mathcal{L}_n(z)} |v|_1 \leq m \leq \sup_{v \in \mathcal{L}_n(z)} |v|_1$. Applying this observation to $x$ and $y$ we have that for each $v \in \mathcal{L}_n(y)$ there exists $u \in \mathcal{L}_n(x)$ such that $v \sim_{\mathrm{ab}} u$ if and only if $\inf_{u \in \mathcal{L}_n(x)} |u|_1 \leq \inf_{u \in \mathcal{L}_n(y)} |u|_1$ and $\sup_{u \in \mathcal{L}_n(y)} |u|_1 \leq \sup_{u \in \mathcal{L}_n(x)} |u|_1$.

We now characterize Sturmian words in terms of abelian closures.

Theorem 2.6.
Let $x \in \{0, 1\}^{\mathbb{N}}$ be uniformly recurrent. Then $\mathcal{A}(x)$ contains exactly one minimal subshift if and only if $x$ is Sturmian.

Theorem 2.7.
Let $x \in \{0, 1\}^{\mathbb{N}}$ be aperiodic and uniformly recurrent. Then $\mathcal{A}(x) = \Omega(x)$ if and only if $x$ is Sturmian.

Proof. First we show that $\mathcal{A}(x) = \Omega(x)$ for Sturmian words. Let $x$ be a Sturmian word of slope $\alpha$ and $y \in \mathcal{A}(x)$. By the Corridor Lemma, $y$ is balanced and, by Lemma 2.2, has uniform frequencies of letters equal to those of $x$. Thus $y \in \Omega(x)$.

Assume then that $\mathcal{A}(x)$ contains exactly one minimal subshift, namely $\Omega(x)$, and let $\alpha = \mathrm{freq}_x(1)$. Take a (periodic or aperiodic) Sturmian word $s$ of slope $\alpha$. By the Corridor Lemma, we have $\Omega(s) \subseteq \mathcal{A}(x)$ using (1). Since $\Omega(s)$ is also minimal, we have $\Omega(x) = \Omega(s)$. Thus $x$ is Sturmian, and $\mathcal{A}(x) = \Omega(x)$.

The following example shows that we cannot omit the assumption of uniform recurrence from the statement of the above theorem.

Example 2.8.
Take the Champernowne word $C$ (over the binary alphabet), which is obtained by concatenating all finite words ordered by length and, for the same length, by lexicographic order:
$$C = 0\ 1\ 00\ 01\ 10\ 11\ 000\ 001\ 010\ 011\ 100\ 101\ 110\ 111 \cdots$$
Clearly, both $\Omega(C)$ and $\mathcal{A}(C)$ are equal to the full shift, i.e., contain all binary words.

Note that neither the property $\mathcal{A}(x) = \Omega(x)$ nor the property that $\mathcal{A}(x)$ contains exactly one minimal subshift characterizes Sturmian words among uniformly recurrent words over arbitrary alphabets. Let $f$ be the Fibonacci word, which is a Sturmian word defined as the fixed point of the morphism $0 \mapsto 01$, $1 \mapsto 0$. Let then $\varphi\colon 0 \mapsto 02,\ 1 \mapsto 12$. Then $\mathcal{A}(\varphi(f)) = \Omega(\varphi(f))$ (see Theorem 4.4).

We investigate possible generalizations of the property $\mathcal{A}(x) = \Omega(x)$ to non-binary alphabets in the next sections.
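The Corridor Lemma reduces membership in the abelian closure of a binary word to comparing, for each length, the minimum and maximum number of 1s in a factor. The sketch below illustrates this on finite prefixes of the Fibonacci word; the helper names are hypothetical, and a finite prefix can only approximate the statement for infinite words.

```python
import math


def fibonacci_prefix(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def corridor(word, n):
    """(min, max) number of 1s over the length-n factors of a finite word."""
    counts = [word[i:i + n].count("1") for i in range(len(word) - n + 1)]
    return min(counts), max(counts)


def within_corridor(y, x, max_len):
    """Finite version of the Corridor Lemma test for lengths 1..max_len."""
    for n in range(1, max_len + 1):
        lo_y, hi_y = corridor(y, n)
        lo_x, hi_x = corridor(x, n)
        if lo_y < lo_x or hi_y > hi_x:
            return False
    return True


f = fibonacci_prefix(2000)
alpha = (3 - math.sqrt(5)) / 2  # frequency of the letter 1 in f

# Balance: the corridor at length n is (floor(n*alpha), floor(n*alpha) + 1).
for n in range(1, 11):
    print(n, corridor(f, n), math.floor(n * alpha))

# A periodic word with letter frequency 1/2 leaves the corridor of f.
print(within_corridor("01" * 50, f, 10))
```

The loop prints the corridor next to $\lfloor n\alpha \rfloor$, which is the value $k$ discussed after the definition of balanced words.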
3. Abelian closures of balanced words
In this section we study the abelian closures of non-binary aperiodic balanced words. Weprove that the abelian closure of such a word is a finite union of minimal subshifts:
Theorem 3.1.
Let $u$ be aperiodic, recurrent, and balanced. Then $\mathcal{A}(u)$ is the union of finitely many minimal subshifts.

As we will show below, the abelian closure of a recurrent balanced word can contain one or more (yet finitely many) minimal subshifts, depending on its structure, and we can in fact compute this number. Our results rely heavily on the characterization of aperiodic recurrent balanced words by R. Graham [11] and P. Hubert [13]. In fact, the characterization allows us to characterize the abelian closures of slightly more general words. In particular, the techniques used in the proof of the above theorem give us, for each $k$, an aperiodic word $x_k$ over a four-letter alphabet such that $\mathcal{A}(x_k)$ equals the union of $k$ distinct minimal subshifts (see Proposition 3.10).

We need some notation to give a characterization of aperiodic recurrent balanced words.

Definition 3.2.
A word is called constant gap if each letter occurs with a constant gap.

For example, $(abac)^\omega$ is a constant gap word.

Definition 3.3.
Let $x$ be a finite or infinite binary word, $z_0 \in A^{\mathbb{N}}$ and $z_1 \in B^{\mathbb{N}}$, where $A$ and $B$ are some alphabets. Let $\mathcal{S}(x, z_0, z_1)$ denote the word obtained from $x$ by substituting the $n$th occurrence of 0 (resp., 1) in $x$ by the $n$th letter of $z_0$ (resp., $z_1$).

We illustrate the above operation with an example.

Example 3.4.
Let $f = 01001010010010100101\cdots$ be the Fibonacci word, $z_0 = (0102)^\omega$, and $z_1 = (ab)^\omega$. Then $\mathcal{S}(f, z_0, z_1) = 0a10b2a01b02a0b10a2b\cdots$ is balanced.

The following theorem characterizes recurrent balanced words using constant gap words and the operation $\mathcal{S}$.

Theorem 3.5 ([13, Thm. 1], [11]). An aperiodic word $u \in \Sigma^{\mathbb{N}}$ is recurrent and balanced if and only if there exist a partition $\{A, B\}$ of $\Sigma$, two constant gap words $z_0 \in A^{\mathbb{N}}$ and $z_1 \in B^{\mathbb{N}}$, and a Sturmian word $s$, such that $u = \mathcal{S}(s, z_0, z_1)$.

We remark that although the structure of aperiodic balanced words is clear, the structure of periodic balanced words is a mystery: the following conjecture by Fraenkel from 1973 remains open despite considerable efforts. The unique (up to a permutation of letters) balanced word on $k \geq 3$ letters with pairwise distinct letter frequencies is $(F_k)^\omega = (F_{k-1}\, k\, F_{k-1})^\omega$, where $F_2 = 121$ [10]. The conjecture has been verified for $k \leq 7$.

In what follows we consider the slightly more general words $\mathcal{S}(s, z_0, z_1)$ with $z_0$ and $z_1$ periodic, instead of $z_0$ and $z_1$ being constant gap. We obtain a characterization of the abelian closures of such words, from which the characterization of abelian closures of recurrent balanced words follows.

We need the following lemma:

Lemma 3.6.
Let $u = \mathcal{S}(s, z_0, z_1)$ with $s$ Sturmian and $z_i$ periodic words. Then $\mathcal{L}(u) = \{ \mathcal{S}(x, z_0', z_1') : x \in \mathcal{L}(s),\ z_i' \in \Omega(z_i) \}$. Further, $u$ is uniformly recurrent.

Proof. In fact, this result is implicitly contained in [13], only stated in a slightly weaker form. Theorem 2 and Proposition 3.1 of [13] essentially state the following. Let $v = \mathcal{S}(s, y_0, y_1)$, where $s$ is a Sturmian word and $y_0$ and $y_1$ are constant gap words with periods $l_0$ and $l_1$, respectively. Then the factor complexity function of $v$ satisfies $P_v(n) = l_0 l_1 (n + 1)$ for all large enough $n$. Further, [13, Prop. 5.1] states that $v$ is uniformly recurrent.

Observe that $\mathcal{L}(v) \subseteq \{ \mathcal{S}(x, \sigma^i(y_0), \sigma^j(y_1)) : x \in \mathcal{L}(s),\ i, j \in \mathbb{N} \}$ and the number of words of length $n$ in the latter set equals $l_0 l_1 (n + 1)$ for large enough $n$. This coincides with the factor complexity $P_v(n)$ by Hubert's result. We deduce that for all $x \in \mathcal{L}(s)$, $i, j \in \mathbb{N}$, we have $\mathcal{S}(x, \sigma^i(y_0), \sigma^j(y_1)) \in \mathcal{L}(v)$.

Now we can take the constant gap words $y_0 = (a_1 \cdots a_{l_0})^\omega$ and $y_1 = (b_1 \cdots b_{l_1})^\omega$, where $l_0$ and $l_1$ are the lengths of the periods of $z_0$ and $z_1$: $z_i = u_i^\omega$, $|u_i| = l_i$. Define a coding $\tau$ so that $a_1 \cdots a_{l_0} \mapsto u_0$ and $b_1 \cdots b_{l_1} \mapsto u_1$. Then $\tau(v) = u$, so that $u$ is uniformly recurrent. Further, $\tau(\mathcal{S}(x, \sigma^i(y_0), \sigma^j(y_1))) = \mathcal{S}(x, \sigma^i(z_0), \sigma^j(z_1))$. The claim now follows immediately.

In the above lemma we allow the periodic words $z_0$ and $z_1$ to contain common letters. On the other hand, in the following proposition we assume that they do not share common letters. This puts us in the position of characterizing the abelian closures of such words.

Proposition 3.7.
Let $u = \mathcal{S}(s, z_0, z_1)$ for some Sturmian word $s$ and periodic words $z_0 \in A^{\mathbb{N}}$ and $z_1 \in B^{\mathbb{N}}$, where $A$ and $B$ are disjoint alphabets. Then
$$\mathcal{A}(u) = \bigcup_{t_i \in \mathcal{A}(z_i)} \Omega(\mathcal{S}(s, t_0, t_1)),$$
and $\mathcal{A}(u)$ is a finite union of minimal subshifts.

Proof. Let first $x \in \Omega(\mathcal{S}(s, t_0, t_1))$ for some $t_i \in \mathcal{A}(z_i)$. For any factor $w \in \mathcal{L}(x)$, we have $w = \mathcal{S}(y, \sigma^i(t_0), \sigma^j(t_1))$ for some $y \in \mathcal{L}(s)$ and $i, j \geq 0$. Let $p = \mathrm{pref}_n(\sigma^i(t_0))$ and $q = \mathrm{pref}_m(\sigma^j(t_1))$, where $n = |y|_0$ and $m = |y|_1$. By assumption, there exist $p' \in \mathcal{L}(z_0)$ and $q' \in \mathcal{L}(z_1)$ such that $p \sim_{\mathrm{ab}} p'$ and $q \sim_{\mathrm{ab}} q'$. Choose $r, r'$ such that $\sigma^r(z_0)$ begins with $p'$ and $\sigma^{r'}(z_1)$ begins with $q'$. It follows that $w \sim_{\mathrm{ab}} \mathcal{S}(y, \sigma^r(z_0), \sigma^{r'}(z_1)) \in \mathcal{L}(u)$. We thus have $x \in \mathcal{A}(u)$.

Let then $x \in \mathcal{A}(u)$. Take $\varphi\colon \Sigma \to \{0, 1\}$ such that $\varphi(a) = 0$ if and only if $a \in A$. It follows that $\varphi(x) \in \mathcal{A}(\varphi(u)) = \Omega(s)$ since the alphabets $A$ and $B$ are disjoint. Take then the morphism $\varphi_A\colon \Sigma \to A^*$ such that $\varphi_A(a) = a$ for $a \in A$, and $\varphi_A(a) = \varepsilon$ otherwise. Define the morphism $\varphi_B\colon \Sigma \to B^*$ analogously. We again have that $\varphi_A(x) \in \mathcal{A}(\varphi_A(u))$, where $\varphi_A(u) = z_0$. Similarly $\varphi_B(x) \in \mathcal{A}(z_1)$. It is now evident that $x = \mathcal{S}(s', t_0, t_1)$ for some $t_i \in \mathcal{A}(z_i)$, $i = 0, 1$, and $s' \in \Omega(s)$. The above lemma implies that $x \in \Omega(\mathcal{S}(s, t_0, t_1))$.

As $\mathcal{A}(z_i)$ is finite by Proposition 2.3, $\mathcal{A}(u)$ is a finite union of minimal subshifts. This concludes the proof.

The above proposition has Theorem 3.1 as an immediate corollary.

Let us consider what the above proposition says. The number of distinct minimal subshifts is bounded above by the product of the number of minimal subshifts in $\mathcal{A}(z_0)$ and the number of minimal subshifts in $\mathcal{A}(z_1)$.

Example 3.8.
Let $z_0 = (0102)^\omega$ and $z_1 = (34)^\omega$. Now $\mathcal{A}(z_i) = \Omega(z_i)$, as is readily verified. Thus $\mathcal{A}(u) = \Omega(u)$ for $u = \mathcal{S}(s, z_0, z_1)$, $s$ Sturmian, by the above proposition.

For the case of recurrent balanced words, the periodic words in the construction are constant gap words. It is natural to ask whether the extra property of constant gaps restricts the number of minimal subshifts in the abelian closure. We give a negative answer to this:
Example 3.9.
Let $z_0 = (a_1 a_2 a_3 a_4 a_5 A_1 \cdot a_1 a_2 a_3 a_4 a_5 A_2 \cdot a_1 a_2 a_3 a_4 a_5 A_3)^\omega$. Here the letters $a_i$ have constant gaps of length 6, and the letters $A_i$ have constant gaps of length 18. Take any constant gap sequence $z_1$ which is not closed under reversal (e.g., $(abc)^\omega$), and $u = \mathcal{S}(s, z_0, z_1)$. Then we can independently take reversal inside $z_0$, inside $z_1$, and inside the arithmetic progression given by $A_1 A_2 A_3$ in $z_0$. We thus get eight minimal subshifts. Note that this construction can be generalized to produce $2^k$ minimal subshifts for any $k$.

The techniques used in the proof of the above theorem give us the following proposition. We remark that the words in question are not necessarily balanced:

Proposition 3.10. For each $k \geq 1$ there exists an aperiodic word $x_k$ over a three-letter alphabet such that $\mathcal{A}(x_k)$ equals the union of $k$ distinct minimal subshifts.

Proof. For $k = 1$ we may take any Sturmian word. Let thus $k \geq 2$. Consider first the abelian closure of the periodic word $z = (0^{2k-1} 1 1)^\omega$. It is readily verified that $\mathcal{A}(z)$ contains the minimal subshifts generated by the words $(0^{2k-1-i}\, 1\, 0^i\, 1)^\omega$, $i = 0, \ldots, k - 1$. To see that there is nothing else in $\mathcal{A}(z)$, we observe the following. By Proposition 2.3, any word in $\mathcal{A}(z)$ is periodic with period dividing $2k + 1$, an odd number. The period cannot be less than $2k + 1$, as the number of 1s in factors of length $2k + 1$ is only 2. On the other hand, all words of length $2k + 1$ containing two occurrences of 1 (and hence the periodic words they generate) occur already in the subshifts above.

For the claim we set $x_k = \mathcal{S}(s, z, a^\omega)$, where $s$ is an aperiodic Sturmian word. By Proposition 3.7,
$$\mathcal{A}(x_k) = \bigcup_{i=0}^{k-1} \Omega(\mathcal{S}(s, (0^{2k-1-i}\, 1\, 0^i\, 1)^\omega, a^\omega)),$$
a union of $k$ distinct minimal subshifts.
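The operation $\mathcal{S}$ of Definition 3.3 is easy to experiment with on finite prefixes. The following sketch, with hypothetical helper names, applies it to a prefix of the Fibonacci word with the periodic words of Example 3.4 and checks balance up to a chosen factor length; it also shows that replacing the constant gap word $(0102)^\omega$ by $(0012)^\omega$, which is not constant gap, destroys balance.

```python
from itertools import cycle


def fibonacci_prefix(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def interleave(x, period0, period1):
    """S(x, z0, z1) for a finite binary word x, where z0 = period0^omega
    and z1 = period1^omega: the n-th 0 (resp. 1) of x is replaced by the
    n-th letter of z0 (resp. z1)."""
    it0, it1 = cycle(period0), cycle(period1)
    return "".join(next(it0) if c == "0" else next(it1) for c in x)


def is_balanced(word, max_len):
    """Check the balance condition on all factors of length <= max_len."""
    for n in range(1, max_len + 1):
        factors = [word[i:i + n] for i in range(len(word) - n + 1)]
        for a in set(word):
            counts = [v.count(a) for v in factors]
            if max(counts) - min(counts) > 1:
                return False
    return True


f = fibonacci_prefix(400)
print(interleave(f[:20], "0102", "ab"))               # prefix from Example 3.4
print(is_balanced(interleave(f, "0102", "ab"), 15))   # constant gap words
print(is_balanced(interleave(f, "0012", "ab"), 15))   # (0012)^omega is not constant gap
```

Only the periods of $z_0$ and $z_1$ are passed to `interleave`; `itertools.cycle` repeats them indefinitely, which matches substituting the $n$th occurrence of a letter by the $n$th letter of the periodic word.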
4. Abelian closures of words of minimal complexity
First we study the abelian closures of aperiodic non-binary words of minimal factor complexity. Over an alphabet $\Sigma$ with at least two letters, the minimal complexity is $n + |\Sigma| - 1$. The structure of words of complexity $n + C$ is related to the structure of Sturmian words and is well understood ([7, 9, 14]). The main goal of this subsection is to prove that for aperiodic ternary words of minimal complexity their abelian closure contains either one or uncountably many minimal subshifts (Theorem 4.4); for alphabets of size greater than 3 the abelian closure contains exactly two minimal subshifts (when the words are assumed to be recurrent, Theorem 4.13). Recall that for the binary alphabet, we have exactly one minimal subshift (Theorem 2.6).

A proof of the following is explained in [9] in the discussion following Lemma 1.

Lemma 4.1.
A minimal complexity word $u$ over the alphabet $A$ is of the form $a_1 \cdots a_t u'$, where $u'$ is a recurrent minimal complexity word over an alphabet $A' \subseteq A$, and $a_1, \ldots, a_t$ are distinct letters of $A \setminus A'$.

We shall consider the abelian closures of recurrent aperiodic minimal complexity words. The following lemma then extends those results to handle non-recurrent ones as well.
Lemma 4.2.
Let $u = a_1 \cdots a_t u'$ be an aperiodic minimal complexity word as in the above lemma. Then $\mathcal{A}(u) = \Omega(u) \cup \mathcal{A}(u')$.

Proof. Let $y \in \mathcal{A}(u)$, so it contains at most one occurrence of each of the letters $a_i$. If it contains none of them, there is nothing to prove.

So let us write $y = p a_i y'$, where $y'$ contains none of the letters $a_j$, $j = 1, \ldots, t$. First of all, $i = t$, as $a_i$ must be followed by $a_{i+1}$ (or $a_{i-1}$, in which case we reach $a_1$ at some point, after which we cannot add anything). We show that $y' = u'$. Assume that $y' \neq u'$, so $y$ contains the factor $a_t w a$, while $u$ contains $a_t w b$, for some $a \neq b$. This is impossible, as $a_t w b$ is the only factor of length $|w| + 2$ of $u$ that contains only letters from $A'$ (apart from $a_t$). Hence $y = p a_t u'$.

Now either $p$ ends with $a_{t-1}$ or with a letter $a \in A'$. In the latter case, we must have that $a$ is the first letter of $u'$, and so $y$ contains $a a_t a$. It follows that $u'$ must begin with $aa$, so $y$ contains $a a_t a a$ and hence $u'$ must begin with $aaa$. By continuing with this line of reasoning, we see that $u' = a^\omega$, which is contrary to the assumption that $u$ is aperiodic. We deduce that $p$ must end with $a_{t-1}$. Now $p$ cannot contain any letter from $A'$ (it would be followed by a letter $a_i$ with $i < t$, a contradiction). The only option is that $y = a_j a_{j+1} \cdots a_t u'$ for some $j \geq 1$, which suffices for the proof.
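The shape $a_1 \cdots a_t u'$ from Lemma 4.1 can be illustrated numerically: prepending $t$ fresh letters to a Sturmian word raises the factor complexity from $n + 1$ to $n + 1 + t$. The following is a minimal sketch on a finite prefix of the Fibonacci word; the helper names and the letters 2 and 3 are illustrative choices, not taken from the paper.

```python
def fibonacci_prefix(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def factor_complexity(word, n):
    """Number of distinct length-n factors occurring in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


s = fibonacci_prefix(1000)   # Sturmian part u'
u = "2" + s                  # one extra letter: complexity n + 2
w = "32" + s                 # two extra letters: complexity n + 3

for n in range(1, 6):
    print(n, factor_complexity(s, n), factor_complexity(u, n),
          factor_complexity(w, n))
```

Each prepended letter contributes exactly one new factor per length, namely the one starting at that letter, which is why these words are not recurrent.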
We start with infinite words for which $p(n) = n + 2$ for all $n \geq 1$. Observe that this implies that we deal with ternary words.

In [3], J. Cassaigne characterizes words having factor complexity $n + C$ for all $n \geq n_0$, $C$ a constant. Here we consider the case of $C = 2$ and $n_0 = 1$. We first recall their characterization, which can be deduced from [14] (see also [7]).

Theorem 4.3. A word $u \in \{0, 1, 2\}^{\mathbb{N}}$ has factor complexity $P(n) = n + 2$ for all $n \geq 1$ if and only if $u$ is of the form (up to permuting the letters)
1. $u = 2s$ for some Sturmian word $s \in \{0, 1\}^{\mathbb{N}}$, or
$u \in \Omega(\varphi(s))$, where $s$ is a Sturmian word and $\varphi$ is defined by
2. $0 \mapsto 02$, $1 \mapsto 12$;
3. $0 \mapsto 0$, $1 \mapsto 12$.

In this subsection we study the abelian closures of these words. The main result is the following theorem.
Theorem 4.4.
Let $u$ be a word of factor complexity $n + 2$ for all $n \geq 1$. If $u$ is as in Theorem 4.3 (1) or (2), then $\mathcal{A}(u) = \Omega(u)$. If $u$ is as in (3), then $\mathcal{A}(u)$ contains uncountably many minimal subshifts.

In fact we are able to characterize the abelian closures of these words. We do this in parts: the first two cases are straightforward and we prove them first. For the last case we need some further notions.
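The difference between cases (2) and (3) can be explored on finite prefixes. The sketch below assumes that the morphisms of Theorem 4.3 are $0 \mapsto 02,\ 1 \mapsto 12$ for case (2) and $0 \mapsto 0,\ 1 \mapsto 12$ for case (3); this reading is inferred from the proofs and examples in this section, and all helper names are hypothetical. It checks complexity $n + 2$ for both images of the Fibonacci word and then tests, up to a bounded factor length, that the word $02u$ of Example 4.7 passes the finite abelian-closure test even though 02 is not a factor of $u$.

```python
from collections import Counter


def fibonacci_prefix(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def apply_morphism(word, images):
    """Apply a morphism given as a dict from letters to words."""
    return "".join(images[c] for c in word)


def factor_complexity(word, n):
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def parikh_set(word, n):
    """Parikh vectors of the length-n factors of a finite word."""
    return {tuple(sorted(Counter(word[i:i + n]).items()))
            for i in range(len(word) - n + 1)}


s = fibonacci_prefix(800)
u2 = apply_morphism(s, {"0": "02", "1": "12"})  # assumed case (2)
u3 = apply_morphism(s, {"0": "0", "1": "12"})   # assumed case (3)

print([factor_complexity(u2, n) for n in range(1, 6)])
print([factor_complexity(u3, n) for n in range(1, 6)])

# Case (3): x = 02u is not in the orbit closure (02 is not a factor of u),
# yet every short factor of x is abelian equivalent to a factor of u.
x = "02" + u3[:200]
print("02" in u3)
print(all(parikh_set(x, n) <= parikh_set(u3, n) for n in range(1, 13)))
```

The bound 12 on the factor length and the prefix lengths are arbitrary; the check is only a finite approximation of membership in $\mathcal{A}(u)$.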
Remark 4.5.
In what follows, we often use the following argument. Assume that each letter in $x \in \Sigma^{\mathbb{N}}$ occurs with bounded gaps. Let $\varphi$ be a morphism such that $|\varphi(a)| \leq 1$ for each letter $a$. Then $\varphi(\mathcal{A}(x)) \subseteq \mathcal{A}(\varphi(x))$. Indeed, since in $x$ each letter occurs with bounded gaps, the same holds for any $y \in \mathcal{A}(x)$. Consequently $\varphi(y)$ is infinite. Letting $v$ be a factor of $\varphi(y)$, there exists a factor $v'$ of $y$ such that $\varphi(v') = v$ due to the length assumption on $\varphi$. As $y \in \mathcal{A}(x)$, there exists a factor $w'$ of $x$ abelian equivalent to $v'$. It thus follows that $\varphi(x)$ contains the factor $\varphi(w')$ abelian equivalent to $v$. As $y$ and $v$ were arbitrary, we conclude that $\varphi(y) \in \mathcal{A}(\varphi(x))$. In particular, if $\varphi(x)$ is Sturmian, then $\varphi(\mathcal{A}(x)) \subseteq \Omega(\varphi(x))$ by Theorem 2.6.

Proposition 4.6.
Let $u$ be as in Theorem 4.3 (1) or (2). Then $\mathcal{A}(u) = \Omega(u)$.

Proof. Assume first that $u = 2s$. The claim follows immediately from Lemma 4.2 together with Theorem 2.7.

Assume then that $u$ is as in (2). Let $x \in \mathcal{A}(u)$. By applying the morphism $2 \mapsto 2$, $1 \mapsto 0$, we see that every second letter of $x$ is 2. Further, by mapping $2 \mapsto \varepsilon$, $i \mapsto i$ for $i = 0, 1$, we see that $x$ maps into a word in $\Omega(s)$. It is straightforward to see that now $x \in \Omega(u)$.

The rest of this subsection is devoted to the case where $u$ is as in Theorem 4.3 (3). This case is more intricate, as shown by the following example: $\mathcal{A}(u)$ contains non-recurrent words, similarly to the case of the Tribonacci word.

Example 4.7.
Let α ∈ T, let ϕ be as in Theorem 4.3 (3), and u = ϕ(s_{α,α}). The words u_1 = ϕ(01 s_{α,α}) = 012u and u_2 = ϕ(10 s_{α,α}) = 120u are both in Ω(u). We claim that the non-recurrent word x = 02u ∈ A(u). Indeed, σ(x) = σ^2(u_1) ∈ Ω(u) (recall that σ is the shift map). Further, any prefix of x of length at least 2 is abelian equivalent to the prefix of σ(u_2) of the same length. Thus x ∈ A(u).
We now analyze the structure of A(u). Without loss of generality we may take u = ϕ(s), since ϕ(s) is uniformly recurrent. Consider the images of u under the morphisms
ϕ_1 : 0 ↦ 0, 1 ↦ 1, 2 ↦ 0 and ϕ_2 : 0 ↦ 0, 1 ↦ 0, 2 ↦ 1.
We have ϕ_1(u) = s_1, a Sturmian word of some slope α and intercept ρ. (Indeed, ϕ_1(u) = G ∘ E(s) using the notation of [18, Chap. 2, p. 72].) Symmetrically, ϕ_2(u) = s_2 is a Sturmian word of slope α and intercept ρ′. (Again, ϕ_2(u) = D ∘ E(s), see again the above reference.) In fact we can say more: ρ′ = ρ − α or, equivalently, s_1 = σ(s_2). Observe now that s_1 contains the factor 00, meaning that α < 1/2.
Mapping a word x ∈ A(u) with the morphisms ϕ_1 and ϕ_2, we obtain two Sturmian words with the same slope α. Further, by applying the morphism 0 ↦ ε, we see that the image of x lies in Ω((12)^ω). This implies that all words in A(u) are obtained by somehow "interleaving" two Sturmian words of the same slope (s_1 is encoded by the letter 1 and s_2 by the letter 2). We next define ternary codings of rotations, which capture this phenomenon.
Recall the definition of Sturmian words as codings of rotations on the torus T with the half-open intervals I_0 = I(0, 1 − α) and I_1 = I(1 − α, 1), where α < 1/2. Take ζ ∈ [α, 1 − α] and split the torus T into four (three if ζ = α or ζ = 1 − α) intervals defined by the points 0, ζ − α, ζ, and 1 − α in increasing order: define the disjoint intervals J_2 = I(ζ − α, ζ) (resp., J_2 = Ī(ζ − α, ζ)), J_1 = I_1, and J_0 = I_0 \ J_2. We must be careful with the value ζ = α (resp., ζ = 1 − α): if 1 ∈ I_1 (resp., 1 − α ∈ I_1), then J_2 = Ī(0, α) (resp., J_2 = I(1 − 2α, 1 − α)). Take the rotation R_α and the encoding ν : T → {0, 1, 2}, x ↦ i if and only if x ∈ J_i. The word t_{α,ζ,x} = (ν(R_α^n(x)))_{n≥0} is called the rotation word of slope α, offset ζ, and intercept x. See Figure 1a for an illustration.
We indicate the choices of endpoints of the intervals J_i in the notation. The four cases (1 ∈ J_1 and ζ ∈ J_2; 1 ∉ J_1 and ζ ∉ J_2; 1 ∈ J_1 and ζ ∉ J_2; 1 ∉ J_1 and ζ ∈ J_2) are each denoted by a differently decorated variant of t_{α,ζ,ρ}. Notice that for each of the offsets ζ = α and ζ = 1 − α one of these variants is not defined: it would imply that the intervals J_1 and J_2 overlap.
Observe now that, by the discussion following Example 4.7, for u of factor complexity n + 2 as in Theorem 4.3 (3), we have u = t_{α,α,ρ} for some ρ ∈ T (see Figure 1b). Further, any word x ∈ A(u) is of the form t_{α,ζ,ρ′} for some ζ ∈ [α, 1 − α], ρ′ ∈ T. Our main goal is to show that t_{α,ζ,ρ} ∈ A(u) for all possible ζ ∈ [α, 1 − α].
Figure 1: An illustration of a system of codings of rotations with more than 2 intervals. In 1a we have four intervals and in 1b three intervals. The word u in Theorem 4.3, item 3 is a coding of the orbit of some point in the system 1b.
Recall that Sturmian words are balanced, so that for each n ∈ N and for each i = 1, 2, the set {|v|_i : v ∈ L_n(u)} comprises two values (depending on n and α). We say that a factor v is 1-heavy (resp., 2-heavy) if |v|_1 (resp., |v|_2) attains the larger of the two possible values. Otherwise we say that v is 1-light (resp., 2-light). If v is 1-heavy and 2-heavy, we say that v is 1-2-heavy. Similarly, v is called 1-2-light if v is 1-light and 2-light. We make use of the following result appearing in [23, part of Thm. 19].
Proposition 4.8.
Let s be a Sturmian word of slope α and intercept ρ and let m ≥ 1. Then the prefix of length m of s is heavy if and only if ρ ∈ I(R_α^{−m}(0), 1). Here I(R_α^{−m}(0), 1) contains the point R_α^{−m}(0) if and only if 1 ∉ I_1.
We may apply Proposition 4.8 to determine whether the coding of the orbit of a point starts with a heavy factor of length m or not. Indeed, for the letter 1, the proposition stands as is: t_{α,ζ,ρ} begins with a 1-heavy factor of length m if and only if ρ ∈ I({−mα}, 1). For the letter 2, the interval J_2 = I(ζ − α, ζ) is the interval I_1 translated by ζ. Thus the word t_{α,ζ,ρ} begins with a 2-heavy factor of length m if and only if ρ ∈ I({−mα + ζ}, ζ). Here the interval I({−mα + ζ}, ζ) contains the point {−mα + ζ} (resp., ζ) if and only if ζ ∉ J_2 (resp., ζ ∈ J_2). We define the following distance on the torus: ‖x‖ = min{x, 1 − x}. Thus, e.g., max{x, 1 − x} = 1 − ‖x‖.
Lemma 4.9.
Let x = t_{α,ζ,ρ}. Then
1. x contains a 1-heavy–2-light and a 2-heavy–1-light factor for each length.
2. There exists a 1-2-heavy factor v ∈ L(x) of length m if and only if {−mα} < 1 − ‖ζ‖, or {−mα} = 1 − ‖ζ‖ and x = t_{α,{mα},{−nα}} or x = t_{α,{−mα},{−(m+n)α}} for some n ≥ 0.
3. There exists a 1-2-light factor v ∈ L(x) of length m if and only if {−mα} > ‖ζ‖, or {−mα} = ‖ζ‖ and x = t_{α,{−mα},{−(m+n)α}} or x = t_{α,{mα},{−nα}} for some n ≥ 0.
Proof. We give a proof case by case. Consider factors of length m and write µ = {−mα} for short.
1. We first consider 1-heavy–2-light factors. By the preceding observations on 1-heavy and 2-heavy factors, x has a 1-heavy–2-light factor if and only if I(µ, 1) ∩ I(ζ, {ζ + µ}) ≠ ∅. The interval I(max{ζ, µ}, min{ζ + µ, 1}) is always in the intersection, since ζ, µ < 1 and ζ, µ < ζ + µ. Since ({−nα})_n is dense in [0, 1), some shift of x corresponds to a coding of a point in this interval.
Similarly, x has a 2-heavy–1-light factor if and only if I(0, µ) ∩ I({ζ + µ}, ζ) ≠ ∅. The interval I(max{0, µ + ζ − 1}, min{ζ, µ}) is always in the intersection, since ζ, µ > 0 and ζ, µ > µ + ζ − 1.
2. Consider then 1-2-heavy factors. The word x has a 1-2-heavy factor if and only if I(µ, 1) ∩ I({µ + ζ}, ζ) ≠ ∅. Assume first that µ < 1 − ‖ζ‖. If µ < ζ, then I(µ, ζ) is in the intersection. If ζ ≤ µ < 1 − ‖ζ‖, then µ + ζ < 1, so {µ + ζ} = µ + ζ and I(µ + ζ, 1) is in the intersection. The denseness of ({−nα})_{n∈N} in T again implies that some shift of x corresponds to a point in this interval.
Assume then that µ = 1 − ‖ζ‖. If ζ = ‖ζ‖, then {µ + ζ} = 0. Now I({µ + ζ}, ζ) and I(µ, 1) can share at most one point in common, namely the point 1. Now the intersection is non-empty if and only if 1 ∈ J_1 and ζ ∉ J_2. Further, t_{α,ζ,0} is the only word starting with a 1-2-heavy factor. To hit the point 0 in the orbit starting from ρ ∈ T, we must have ρ = {−nα} for some n ≥ 0. So, x contains a 1-2-heavy factor of length m if and only if x = t_{α,ζ,{−nα}}, where ζ = 1 − µ = 1 − {−mα} = {mα}.
Similarly, if ζ = 1 − ‖ζ‖, then I({µ + ζ}, ζ) and I(µ, 1) can share at most one point in common, namely ζ. The intersection is not empty if and only if 1 ∉ J_1 and ζ ∈ J_2. In this case t_{α,ζ,ζ} is the only word starting with a 1-2-heavy factor. To hit the point ζ in the orbit of ρ, we must have ρ = {ζ − nα} for some n ≥ 0. Hence x contains a 1-2-heavy factor if and only if x = t_{α,ζ,{ζ−nα}}, where ζ = 1 − ‖ζ‖ = µ = {−mα}.
Assume finally that µ > 1 − ‖ζ‖. It follows that µ > ζ and µ + ζ > 1, so that 0 < {µ + ζ} < ζ. Therefore I(µ, 1) ∩ I({µ + ζ}, ζ) = ∅. This concludes the case of 1-2-heavy factors.
3. Let us then finally consider 1-2-light factors. We proceed analogously to the previous case. The word x has a 1-2-light factor of length m if and only if I(0, µ) ∩ I(ζ, {ζ + µ}) ≠ ∅. Assume first that µ > ‖ζ‖. If µ > ζ, then the interval I(ζ, µ) is in the intersection. If ζ ≥ µ > ‖ζ‖, then µ + ζ > 1, so {µ + ζ} > 0. Now I(0, {µ + ζ}) is in the intersection.
Assume then that µ = ‖ζ‖. If ζ = ‖ζ‖, then µ + ζ < 1. Now I(ζ, µ + ζ) and I(0, µ) can share at most one point in common, namely the point ζ. By the observations preceding the lemma, the intersection is not empty if and only if 1 ∈ J_1 and ζ ∉ J_2. Now t_{α,ζ,ζ} is the only word starting with a 1-2-light factor. The only way to hit the point ζ in the orbit of ρ is that ρ = {ζ − nα} for some n ≥ 0. It follows that x contains a 1-2-light factor if and only if x = t_{α,ζ,{ζ−nα}}, where ζ = {−mα}.
Similarly, if ζ = 1 − ‖ζ‖, then µ + ζ = 1. Now I(ζ, {µ + ζ}) and I(0, µ) can share at most one point in common, namely the point 1. By the observations preceding the lemma, the intersection is not empty if and only if 1 ∉ J_1 and ζ ∈ J_2. Now t_{α,ζ,0} is the only word starting with a 1-2-light factor. Again, x contains a 1-2-light factor if and only if x = t_{α,ζ,{−nα}}, where ζ = 1 − µ = {mα}.
Finally, if µ < ‖ζ‖, then µ < ζ and µ + ζ ≤ µ + 1 − ‖ζ‖ < 1. Thus I(0, µ) and I(ζ, µ + ζ) do not intersect. This concludes the proof.
As is evident from Lemma 4.9(2), the existence of a 1-2-heavy factor of a certain length depends not only on ζ, but also on ρ and on how the endpoints of the intervals are defined. For example, the word t_{α,{mα},0} begins with a 1-2-heavy factor of length m, while t_{α,{mα},{−nα}}, taken with a different choice of endpoints, does not contain such a factor. Note further that t_{α,{mα},0} contains only one occurrence of such a factor, and hence is non-recurrent. In fact, any word t_{α,{mα},{−nα}}, n ≥ 0, contains exactly one such factor of length m, namely at position n.
Lemma 4.10. If ‖ζ‖ > ‖ζ′‖, then t_{α,ζ,ρ} ∈ A(t_{α,ζ′,ρ′}) but t_{α,ζ′,ρ′} ∉ A(t_{α,ζ,ρ}).
Proof. Let x = t_{α,ζ,ρ} and u = t_{α,ζ′,ρ′} for short. Observe that for any w, w′, where w ∈ L_m(u) and w′ ∈ L_m(x), we have ||w|_1 − |w′|_1| ≤ 1 and ||w|_2 − |w′|_2| ≤ 1. We first show that x ∈ A(u). By Lemma 4.9(1), both words contain both 1-heavy–2-light and 2-heavy–1-light factors of each length. We show that whenever x contains a 1-2-heavy factor or a 1-2-light factor of length m, then u contains such a factor as well. This suffices for the claim, since the abelian class of a factor of a given length is determined by whether it is 1-heavy or 1-light and whether it is 2-heavy or 2-light. To this end, let w ∈ L_m(x). If w is a 1-2-heavy factor, then by Lemma 4.9(2), we have {−mα} ≤ 1 − ‖ζ‖ < 1 − ‖ζ′‖, so that u contains a 1-2-heavy factor by the same lemma. If w is a 1-2-light factor, then by Lemma 4.9(3), {−mα} ≥ ‖ζ‖ > ‖ζ′‖, so that u again contains a 1-2-light factor of length m.
We then show that u ∉ A(x). Since ({−mα})_{m≥1} is dense in [0, 1), there exists m ∈ N for which ‖ζ‖ > {−mα} > ‖ζ′‖. By Lemma 4.9(3), u contains a 1-2-light factor of length m, while x does not. It follows that u ∉ A(x).
We may now characterize the abelian closure of words of factor complexity n + 2 via ternary codings of rotations.
Proposition 4.11.
Let u = t_{α,α,ρ} for some ρ ∈ T. Then A(u) = ⋃_{ζ∈[α,1−α]} Ω(t_{α,ζ,ρ}).
Proof. By the above lemma we have t_{α,ζ,ρ} ∈ A(u) for all ζ ∈ (α, 1 − α). For ζ = α or ζ = 1 − α, all words t_{α,ζ,ρ} either contain or do not contain a 1-2-light (resp., 1-2-heavy) factor regardless of ρ. (Recall that, for these two offsets, the choice of endpoints for which J_1 and J_2 would overlap is not defined.) As there are no other words in A(u), this concludes the proof.
In fact, utilising Lemma 4.10 we can characterize the abelian closure of any word t_{α,ζ,ρ}. The proof above, applied to the setting ‖ζ‖ = ‖α‖, i.e., when u is a minimal complexity word, carries over to arbitrary ζ with minor modifications:
Proposition 4.12.
Let u = t_{α,ζ,ρ} with ‖ζ‖ > ‖α‖. Then
A(u) = ⋃_{‖ξ‖≥‖ζ‖, ρ′∈T} Ω(t_{α,ξ,ρ′}) \ S,
where S is a countable set of words depending on ζ and ρ as follows.
1. If 1 − ‖ζ‖, ‖ζ‖ ∉ {{−mα} : m ∈ N}, then S = ∅.
2. Assume that 1 − ‖ζ‖ = {−mα} for some m ≥ 1. If u = t_{α,{mα},{−nα}} or u = t_{α,{−mα},{−(m+n)α}} for some n ≥ 0, then S = ∅. Otherwise S = {t_{α,{mα},{−nα}} : n ≥ 0} ∪ {t_{α,{−mα},{−(m+n)α}} : n ≥ 0}.
3. Assume that ‖ζ‖ = {−mα} for some m ≥ 1. If u = t_{α,{−mα},{−(m+n)α}} or u = t_{α,{mα},{−nα}} for some n ≥ 0, then S = ∅. Otherwise S = {t_{α,{−mα},{−(m+n)α}} : n ≥ 0} ∪ {t_{α,{mα},{−nα}} : n ≥ 0}.
Proof. Notice that any word y ∈ A(u) is of the form t_{α,ξ,ρ′}. Indeed, using the mappings ϕ_1 and ϕ_2 as in the discussion following Example 4.7, ϕ_1(y) and ϕ_2(y) are Sturmian words with slope α. We deduce that y is an interleaving of two Sturmian words, giving rise to the claimed form of y.
Lemma 4.10 then gives that A(u) is a subset of ⋃_{‖ξ‖≥‖ζ‖} Ω(t_{α,ξ,ρ′}), but it is possibly a proper subset. The same lemma shows that A(u) is a superset of ⋃_{‖ξ‖>‖ζ‖} Ω(t_{α,ξ,ρ}) in any case.
Therefore, we may focus on words y = t_{α,ξ,ρ′} with offset ξ having ‖ξ‖ = ‖ζ‖. Notice that the three cases are mutually exclusive. Indeed, in 2., we assume that 1 − ‖ζ‖ = {−mα}, which gives ‖ζ‖ = {mα}. Hence ‖ζ‖ ≠ {−m′α} for any m′ ≥ 1, as otherwise {(m + m′)α} = 0, which would make α rational.
To identify the set of words S not in the abelian closure of u, we employ Lemma 4.9.
1. Assume that 1 − ‖ζ‖, ‖ζ‖ ∉ {{−mα} : m ∈ N}. Lemma 4.9(2) and (3) then state that the existence of a 1-2-heavy or 1-2-light factor does not depend on the point whose orbit we encode, nor on the choices of the endpoints of the intervals. That is to say, all words with offset ξ, ‖ξ‖ = ‖ζ‖, simultaneously either have or do not have a 1-2-heavy (resp., 1-2-light) factor of length m, independently of the choice of the starting point ρ′ of the orbit. This suffices to show that S = ∅ in this case.
2. Assume that 1 − ‖ζ‖ = {−mα} for some m ≥ 1. There is only one length of factors for which the existence of a 1-2-heavy factor depends on the starting point ρ and the choice of the endpoints of the intervals. This length is m. By Lemma 4.9(2), if u = t_{α,{mα},{−nα}} or u = t_{α,{−mα},{−(m+n)α}} for some n ≥ 0, then the word contains such a factor. In this case S = ∅. If u is not of this form, then it does not contain such a factor, while all the words t_{α,{mα},{−nα}} and t_{α,{−mα},{−(m+n)α}}, n ≥ 0, do. The claim then follows.
3. This is analogous to the one above.
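The ternary codings of rotations used above are easy to experiment with numerically. The following Python sketch is only an illustration under stated assumptions: it fixes the half-open convention [a, b) for all intervals (one of the endpoint choices discussed above), it uses floating-point arithmetic rather than exact arithmetic, and the names `rotation_word`, `project`, `is_balanced`, `PHI_1`, and `PHI_2` are ours. The two projections are our reading of the two morphisms from the discussion following Example 4.7. The sketch generates a prefix of a rotation word of slope α, offset ζ, and intercept ρ, and checks balance of the two binary projections on that finite prefix.

```python
import math

# Assumed projections: PHI_1 keeps the letter 1, PHI_2 turns the letter 2 into 1.
PHI_1 = {"0": "0", "1": "1", "2": "0"}
PHI_2 = {"0": "0", "1": "0", "2": "1"}


def rotation_word(alpha, zeta, rho, length):
    """Prefix of the ternary coding of the orbit of rho under x -> x + alpha (mod 1).

    Assumed convention: J_1 = [1 - alpha, 1), J_2 = [zeta - alpha, zeta),
    and J_0 is the rest of the torus; requires alpha <= zeta <= 1 - alpha.
    """
    letters = []
    for n in range(length):
        x = (rho + n * alpha) % 1.0
        if x >= 1.0 - alpha:
            letters.append("1")
        elif zeta - alpha <= x < zeta:
            letters.append("2")
        else:
            letters.append("0")
    return "".join(letters)


def project(word, morphism):
    """Apply a letter-to-letter morphism given as a dictionary."""
    return "".join(morphism[c] for c in word)


def is_balanced(word, max_len):
    """Check that the number of 1s in factors of equal length (up to max_len)
    differs by at most 1 within the given finite word."""
    for m in range(1, max_len + 1):
        counts = {word[i:i + m].count("1") for i in range(len(word) - m + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True


if __name__ == "__main__":
    alpha = (3 - math.sqrt(5)) / 2  # an irrational slope below 1/2
    x = rotation_word(alpha, 0.5, 0.1234, 500)
    print(x[:40])
    print(is_balanced(project(x, PHI_1), 40), is_balanced(project(x, PHI_2), 40))
```

Such finite experiments can only suggest statements about the infinite words; in particular, the floating-point orbit may misclassify a point lying extremely close to an interval endpoint.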
Surprisingly, for alphabets of size greater than 3 there are always only finitely many minimal subshifts:
Theorem 4.13.
Let u be a recurrent word of factor complexity n + C for all n ≥ 1, where C > 2. Then A(u) contains exactly two minimal subshifts.
The proof is based on the characterization of words of factor complexity n + C for all n ≥ 1.
Lemma 4.14 ([9, Lem. 4]). Let u be a recurrent word of minimal complexity over an alphabet A; then there exist distinct elements e_1, …, e_b, f_1, …, f_c, g_1, …, g_d in A such that the sets E = {e_1, …, e_b}, F = {f_1, …, f_c}, and G = {g_1, …, g_d} are pairwise disjoint, E ∪ F ∪ G = A, with G ≠ ∅ and E ∪ F ≠ ∅, and there exists a Sturmian word s on {0, 1} such that, if σ is the substitution
σ : 0 ↦ g_1 ⋯ g_d e_1 ⋯ e_b, 1 ↦ g_1 ⋯ g_d f_1 ⋯ f_c,
then σ(s) = Wu, where W is a (possibly empty) prefix of σ(0) or σ(1).
Lemma 4.15. Let w be a recurrent word of minimal complexity over an alphabet A of cardinality at least 4. Then each w′ ∈ A(w) is a concatenation of blocks of the form σ(0) and σ(1) (or their reversals), where σ is as in the previous lemma, preceded by a possibly empty suffix of σ(0) or σ(1) (or a reversal of a prefix).
Proof. The proof is quite direct. Since |A| ≥ 4, at least one of the sets G, E, F contains at least two letters. Let it be E (for the other sets the argument is similar). First we show that for any w′ ∈ A(w) the letters from E must occur in blocks e_1 ⋯ e_b (or in e_b ⋯ e_1; this case is symmetric, all the blocks are reversed). For this, it is enough to consider only factors of length 2 and 3. Indeed, if e_1 occurs in w′, then the only factors of length 2 containing e_1 in w are e_1 e_2 and g_d e_1. With the exception of the case F = ∅ and |G| = 1, we have that g_d e_1 g_d is not an abelian factor of w. Since g_d e_1 g_d is not an abelian factor of w, in w′ we must have g_d e_1 e_2 or e_2 e_1 g_d. Continuing this line of reasoning with e_2, e_1, e_3 instead of e_1, g_d, e_2, etc., we get that in w′ the letters from E must occur in blocks e_1 ⋯ e_b (or in e_b ⋯ e_1). If F = ∅ and |G| = 1, then |E| ≥ 3. In this case we start with the letter e_2: by considering factors of length 2 and 3, we see that it can occur only in factors e_1 e_2 e_3 and e_3 e_2 e_1. The rest of the proof is the same.
In the same way we prove that each such block e_1 ⋯ e_b must be surrounded by g_1 ⋯ g_d from both sides, so we have σ(0) g_1 ⋯ g_d. Now we show that this block σ(0) must be followed by either σ(0) or σ(1). We already have the beginning of the block, g_1 ⋯ g_d. After it, one must have either e_1, or f_1, or g_1 in the case F = ∅ (again, it is enough to consider factors of length 2 and 3 containing g_d). In the cases of e_1 or f_1 it must continue with e_1 ⋯ e_b or f_1 ⋯ f_c, respectively, thus finishing the block σ(0) or σ(1). In the case when F is empty, we already have σ(1). In the same way one can show that after the block σ(1) one must also have a full block σ(0) or σ(1).
We remark that the cardinality at least 4 of the alphabet is essential for the above lemma. In the case F = ∅, |G| = 1, and |E| = 2, the letters e_1 and e_2 can be separated by g_d, which corresponds to Theorem 4.4 (3), when we have uncountably many minimal subshifts. The fourth letter blocks this possibility: either we have |E| ≥ 3, in which case e_2 "glues" the letters e_1 and e_3, or |G| ≥ 2, so the two letters g_1 and g_2 prevent mixing.
Let u be uniformly recurrent. We define a word u^R for which L(u^R) = L(u)^R, i.e., the set of reversals of the factors of u. Indeed, take the sequence (p_n)_n of prefixes of u, and consider the sequence (p_n^R)_n of their reversals. There is a subsequence which converges to an infinite word u^R. We claim that L(u^R) = L(u)^R. As u^R is constructed using reversals of factors of u, we have that L(u^R) ⊆ L(u)^R. Let x ∈ L(u). Since u is uniformly recurrent, x must occur in p_n for n large enough. As x occurs within bounded gaps, we conclude that the words in the converging subsequence of (p_n^R)_n must have x^R occurring for all n large enough, the first occurrence occurring within a uniform bound. Hence x^R ∈ L(u^R).
Notice that we immediately have that u^R ∈ A(u).
Proof of Theorem 4.13.
The two subshifts are Ω(u) and Ω(u^R). Suppose that there exists a word w ∈ A(u) such that it is not from Ω(u). Due to Lemma 4.15, cutting a short prefix of w, we get a word w′ such that w′ = σ(v′), v′ ∈ {0, 1}^N, and v′ is not in the shift orbit closure of v, where v is a Sturmian word as in Lemma 4.14. So, v′ contains a factor z which is not abelian equivalent to any factor of v. It is straightforward to see that then σ(z) is not abelian equivalent to any factor of u. Indeed, it cannot be abelian equivalent to a factor consisting of full blocks. And if it happens to be abelian equivalent to a factor which does not consist of full blocks, then it is also abelian equivalent to a shift of this factor that consists of full blocks, which is not possible. So, σ(z) is not an abelian factor of u, hence w is not in A(u), a contradiction.
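The observation that u^R ∈ A(u) rests on the fact that reversing a finite word does not change its Parikh vector. The following Python sketch illustrates this on a finite word only; the helper names `factors`, `parikh_vectors`, and `abelian_factor_sets_agree` are hypothetical and not taken from the paper. It compares, length by length, the Parikh vectors of the factors of a word with those of its reversal.

```python
from collections import Counter


def factors(word, length):
    """Set of factors of the given length of a finite word."""
    return {word[i:i + length] for i in range(len(word) - length + 1)}


def parikh_vectors(word, length):
    """Set of Parikh vectors (as frozen letter counts) of the factors of a given length."""
    return {frozenset(Counter(f).items()) for f in factors(word, length)}


def abelian_factor_sets_agree(word, max_len):
    """Check that a word and its reversal have the same Parikh vectors
    of factors for every length up to max_len."""
    reversal = word[::-1]
    return all(
        parikh_vectors(word, n) == parikh_vectors(reversal, n)
        for n in range(1, max_len + 1)
    )


if __name__ == "__main__":
    w = "abcab"
    print(factors(w, 2), factors(w[::-1], 2))
    print(abelian_factor_sets_agree(w, len(w)))
```

For the word abcab the factors of length 2 differ from those of the reversal bacba, while the Parikh vectors of the factors coincide for every length.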
5. Abelian closures of Arnoux–Rauzy words
In this section we discuss Arnoux–Rauzy words, which are another generalization of Sturmian words to larger alphabets. One of the ways to define Arnoux–Rauzy words is via palindromic closures. The following basics on Arnoux–Rauzy words are well known and mostly taken from [1, 8]. In fact, this is a generalization of the facts about Sturmian words given for binary words in [6].
A finite word v = v_0 ⋯ v_{n−1} is a palindrome if it is equal to its reversal, i.e., v = v_{n−1} ⋯ v_0. The right palindromic closure of a finite word u, denoted by u^{(+)}, is the shortest palindrome that has u as a prefix. The iterated (right) palindromic closure operator ψ is defined recursively by the following rules: ψ(ε) = ε, ψ(va) = (ψ(v)a)^{(+)} for all v ∈ Σ* and a ∈ Σ. The definition of ψ may be extended to infinite words u over Σ as ψ(u) = lim_n ψ(pref_n(u)), i.e., ψ(u) is the infinite word having ψ(pref_n(u)) as its prefix for every n ∈ N.
Let ∆ be an infinite word on the alphabet Σ such that every letter occurs infinitely often in ∆. The word c = ψ(∆) is then called a characteristic (or standard) Arnoux–Rauzy word and ∆ is called the directive sequence of c. An infinite word u is called an Arnoux–Rauzy word if it has the same set of factors as a (unique) characteristic Arnoux–Rauzy word, which is called the characteristic word of u. The directive sequence of an Arnoux–Rauzy word is the directive sequence of its characteristic word. An example of an Arnoux–Rauzy word is given by the Tribonacci word T, which can be defined as the fixed point of the morphism 0 → 01, 1 → 02, 2 → 0. It is not hard to see that the Tribonacci word is an Arnoux–Rauzy word with the directive sequence (012)^ω.
It appears that the structure of abelian closures of Arnoux–Rauzy words is rather complicated. For example, it is not hard to see that for any Arnoux–Rauzy word with a characteristic word c, its abelian closure contains 20c (here we assume that 0 is the first letter of ∆ and 2 is the third letter occurring in ∆ for the first time, i.e., ∆ has a prefix of the form 0{0, 1}*1{0, 1}*2). Clearly 20c ∉ Ω(c).
T. Hejda, W. Steiner, and L. Q. Zamboni studied the abelian shift of the Tribonacci word T. They announced that A(T) \ Ω(T) ≠ ∅ but that Ω(T) is the only minimal subshift contained in A(T) [12, 24].
An interesting open question is to understand the general structure of abelian closures of Arnoux–Rauzy words (see Problem 7.2).
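The iterated palindromic closure is straightforward to compute for finite prefixes of a directive sequence. The following Python sketch is an illustration only; the function names `palindromic_closure`, `iterated_palindromic_closure`, and `morphism_prefix` are ours. It implements the operator ψ as defined above and compares ψ applied to a prefix of (012)^ω with the prefix of the Tribonacci word obtained by iterating the morphism 0 → 01, 1 → 02, 2 → 0.

```python
def palindromic_closure(u):
    """Shortest palindrome having u as a prefix (right palindromic closure)."""
    for i in range(len(u)):
        suffix = u[i:]
        if suffix == suffix[::-1]:
            # u[i:] is the longest palindromic suffix: mirror what precedes it.
            return u + u[:i][::-1]
    return u  # only reached for the empty word


def iterated_palindromic_closure(directive):
    """Apply psi(va) = (psi(v) a)^(+) along a finite directive word."""
    word = ""
    for letter in directive:
        word = palindromic_closure(word + letter)
    return word


def morphism_prefix(rules, start, length):
    """Prefix of the fixed point of a morphism, obtained by iterating from `start`
    (assumes the image of `start` begins with `start` and the words grow)."""
    word = start
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]


if __name__ == "__main__":
    tribonacci_rules = {"0": "01", "1": "02", "2": "0"}
    c = iterated_palindromic_closure("012" * 3)
    print(c[:14])
    print(c == morphism_prefix(tribonacci_rules, "0", len(c)))
```

The comparison is made on a finite prefix, so it illustrates rather than proves the statement about the directive sequence (012)^ω.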
6. Abelian closures of general subshifts
In this paper and the previous works on the topic, the focus has been on abelian closures of infinite words. It would be interesting to investigate properties of abelian closures of general subshifts.
We recall some definitions from [18]. A bi-infinite word x ∈ Σ^Z is said to avoid a set of words F ⊆ Σ* if L(x) ∩ F = ∅. Let X_F denote the set of bi-infinite words avoiding F. A subshift is a set X_F for some F. The shift operator is defined similarly to the case of infinite words. Now a set X ⊆ Σ^Z is a subshift if and only if σ(X) = X and X is closed in the usual topology on bi-infinite words.
Let X be a subshift and let I(X) = Σ* \ L(X). Define the set F(X) as the set of elements of I(X) which are minimal for the factor ordering, i.e., have no proper factor in I(X). Then X = X_{F(X)}.
Definition 6.1.
If a subshift X = X_F for some finite set F ⊆ Σ*, then X is called a subshift of finite type (SFT). If, on the other hand, F can be taken regular, then X is called sofic.
A set X is an SFT if and only if F(X) is finite. Similarly, X is sofic if and only if F(X) is regular.
We may define the abelian closure of a subshift straightforwardly.
Definition 6.2.
Let X be a subshift. Then its abelian closure A(X) is defined as ⋃_{x∈X} A(x).
We remark that in the previous text we considered one-way infinite words, as is more customary in combinatorics on words, whereas here for general subshifts it is more natural to consider bi-infinite words. Actually, there is no essential difference for our considerations, as all the results can easily be reformulated for one-way or two-way infinite words.
We conclude this paper with a couple of examples of abelian closures of subshifts.
Example 6.3.
Clearly A(Σ^Z) = Σ^Z. Let F = {11} ⊆ {0, 1}* and set X = X_F. The subshift X ⊆ {0, 1}^Z is called the golden mean subshift. Consider the abelian closure of X_F: it comprises those words in which all 1s are isolated. But this is just X_F itself. Thus A(X_F) = X_F.
In the above example, both subshifts are of finite type. It was concluded that they are, in fact, their own abelian closures. This property is of course not general for SFTs, as is shown by the following example. In fact, the abelian closure of an SFT is not in general an SFT.
Example 6.4.
Consider the SFT X = X_F with F = {aa, ac, ba, bb, cb}. It can be characterized as the set of two-way infinite walks on the following graph.
[Figure: directed graph on the vertices a, b, c with edges a → b, b → c, c → a, and a loop c → c.]
Assume, for a contradiction, that A(X) is an SFT, with F(A(X)) = F′. There is an integer n for which each element of F′ has length at most n. Consider the word x = ^ω c · ab · c^n · ba · c^ω. Here, for a finite word v, by ^ω v we mean the left-infinite word obtained by repeating v infinitely many times. Observe that the factors of length at most n of this word occur either in ^ω c · ba · c^ω or in ^ω c · ab · c^ω. Both of these words are in A(X) by inspection, so none of the factors can be in F′. Thus x avoids all the forbidden factors. But x ∉ A(X), as it contains the factor b c^n b. Indeed, any word in the language L(A(X)) that contains two occurrences of b must contain at least one occurrence of a.
The next example shows that this is also possible for a binary alphabet:
Example 6.5. Consider an SFT giving words of the form · · · · · · , plus (0011)^ω and (000111)^ω. It is an SFT of order 6, and its Rauzy graph contains two cycles with the same frequencies of letters and a one-way path between them.
It is readily verified that words of the form · · · · · · are in the abelian closure, whereas words of the form · · · + · · · are not. Similarly to the previous example, this implies that the abelian closure is not an SFT.
Next we show that the abelian closure of a sofic shift is not necessarily sofic.
Example 6.6.
Let Σ = {a, b, c, d} be the underlying alphabet. Set F = {a, b, d}c ∪ d{a, b, c} ∪ cRd, where R = {a, b}* \ (ab)*, and let X = X_F. Hence X is of the form
X = {a, b}^Z ∪ {^ω c x, x^R d^ω : x ∈ {a, b}^N} ∪ {^ω c (ab)^n d^ω : n ≥ 0} ∪ {^ω c^ω, ^ω d^ω}.
(Here x^R is the left-infinite word defined by x, i.e., the letter at position −n of x^R equals the nth letter of x.)
Let F′ = F(A(X)). We show that F′ ∩ c{a, b}*d = {cwd : |w|_a ≠ |w|_b}. It follows that F′ cannot be regular, as the language above is well known to be non-regular. Hence A(X) is not sofic.
Let us show the ⊆ direction. Let w have |w|_a = |w|_b. We show that the word ^ω c w d^ω is in the abelian closure, and thus cwd ∉ F′. Now all factors of the form c^n x or y d^m, for x a prefix and y a suffix of w, occur in the words ^ω c w^ω or ^ω w d^ω, which are elements of X. We may thus concentrate on factors of the form c^n w d^m. Now c^n w d^m is abelian equivalent to c^n (ab)^{|w|/2} d^m, which is clearly in the language of X.
Let then w ∈ {a, b}* be such that |w|_a ≠ |w|_b. Observe that the proper factors of cwd are in the language of X. This means that either cwd ∈ F′ or cwd ∈ L(A(X)). Assume, for a contradiction, that cwd ∈ L(A(X)). Now Ψ(cwd) = (|w|_a, |w|_b, 1, 1). In L(X), any word whose Parikh vector has its last two components equal to 1 has a Parikh vector of the form (m, m, 1, 1). Since |w|_a ≠ |w|_b, there is no word in L(X) which is abelian equivalent to cwd, so cwd is not an element of L(A(X)), a contradiction. We conclude that cwd ∈ F′.
An interesting open question is to find out whether the abelian closure of a subshift of finite type is always sofic (see Problem 7.3).

7. Conclusions
In this paper, we introduced and studied a notion of abelian subshifts of infinite words. The main open problem we would like to state in this paper is the following:
Problem 7.1.
Characterize words for which A(x) = Ω(x).
Among binary uniformly recurrent words, this property gives a characterization of Sturmian words, but the characterization does not extend to the usual generalizations of Sturmian words to non-binary alphabets: neither to balanced words, nor to words of minimal complexity, nor to Arnoux–Rauzy words. A modification of this question is to characterize words for which A(x) contains exactly one minimal subshift.
For Arnoux–Rauzy words, we showed that A(x) ≠ Ω(x), but their abelian closure seems to have a rather complicated structure; in particular, it always contains non-recurrent words. An interesting open question is to understand the general structure of abelian closures of Arnoux–Rauzy words:
Problem 7.2.
Characterize abelian closures of Arnoux–Rauzy words.
Finally, we propose the following open question about general abelian subshifts:
Problem 7.3.
Is the abelian closure of an SFT always sofic?
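Questions of this kind can be explored experimentally on finite approximations. The following Python sketch is an illustration under stated assumptions, not a method from the paper: the function names are ours, it treats every finite word avoiding F as a word of L(X_F) (which holds for the golden mean subshift and for the SFT of Example 6.4, where every finite walk extends to a two-way infinite walk), and it only computes finite words all of whose factors are abelian equivalent to words of L(X_F), which is a superset of the words of L(A(X_F)) of that length.

```python
def parikh(word, alphabet):
    """Parikh vector of a word with respect to an ordered alphabet."""
    return tuple(word.count(a) for a in alphabet)


def sft_words(alphabet, forbidden, max_len):
    """Words of each length up to max_len avoiding all forbidden factors."""
    words = {0: {""}}
    for n in range(1, max_len + 1):
        words[n] = {
            w + a
            for w in words[n - 1]
            for a in alphabet
            if not any((w + a).endswith(f) for f in forbidden)
        }
    return words


def abelian_closure_words(alphabet, forbidden, max_len):
    """Return (lang, closure): the words avoiding `forbidden`, and the words
    whose factors all share a Parikh vector with some word of lang."""
    lang = sft_words(alphabet, forbidden, max_len)
    vectors = {parikh(w, alphabet) for n in lang for w in lang[n]}
    closure = {0: {""}}
    for n in range(1, max_len + 1):
        # Extending w by a letter creates exactly the suffixes of w + a as new factors.
        closure[n] = {
            w + a
            for w in closure[n - 1]
            for a in alphabet
            if all(parikh((w + a)[i:], alphabet) in vectors for i in range(n))
        }
    return lang, closure


if __name__ == "__main__":
    lang, closure = abelian_closure_words("abc", ["aa", "ac", "ba", "bb", "cb"], 6)
    print(sorted(closure[4] - lang[4]))
```

For the golden mean subshift the two families coincide on the lengths tested, in line with Example 6.3, whereas for the SFT of Example 6.4 the second family is strictly larger.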
Acknowledgements
We are grateful to Joonatan Jalonen, Ville Salo, and Luca Zamboni for fruitful discussions and helpful comments.
Svetlana Puzynina is partially supported by the Russian Foundation of Basic Research (grant 20-01-00488) and by the Foundation for the Advancement of Theoretical Physics and Mathematics "BASIS". Part of the research was performed while Markus Whiteland was at the Department of Mathematics and Statistics, University of Turku, Finland.
References
[1] P. Arnoux and G. Rauzy. Représentation géométrique de suites de complexité 2n + 1. Bulletin de la Société Mathématique de France, 119:199–215, 1991. doi:10.24033/bsmf.2164.
[2] J. Barát and P. Varjú. Partitioning the positive integers to seven Beatty sequences. Indag. Math., 14(2):149–161, 2003. ISSN 0019-3577. doi:10.1016/S0019-3577(03)90000-0.
[3] J. Cassaigne. Sequences with grouped factors. In Developments in Language Theory III, Publications of Aristotle University of Thessaloniki, pages 211–222, 1998.
[4] S. Constantinescu and L. Ilie. Fine and Wilf's theorem for abelian periods. EATCS Bull., 89:167–170, 2006.
[5] E. M. Coven and G. A. Hedlund. Sequences with Minimal Block Growth. Math. Syst. Theory, 7(2):138–153, 1973. doi:10.1007/BF01762232.
[6] A. de Luca. Sturmian words: structure, combinatorics, and their arithmetics. Theoretical Computer Science, 183:45–82, 1997. doi:10.1016/S0304-3975(96)00310-6.
[7] G. Didier. Caractérisation des N-écritures et application à l'étude des suites de complexité ultimement n + c^{ste}. Theoret. Comp. Sci., 215(1-2):31–49, 1999. doi:10.1016/S0304-3975(97)00122-9.
[8] X. Droubay, J. Justin, and G. Pirillo. Episturmian words and some constructions of de Luca and Rauzy. Theoretical Computer Science, 255:539–553, 2001.
[9] S. Ferenczi and C. Mauduit. Transcendence of numbers with a low complexity expansion. Journal of Number Theory, 67:146–161, 1997. doi:10.1006/jnth.1997.2175.
[10] A. S. Fraenkel. Complementing and exactly covering sequences. J. Comb. Theory Ser. A, 14(1):8–20, 1973. ISSN 0097-3165. doi:10.1016/0097-3165(73)90059-9.
[11] R. L. Graham. An efficient algorithm for determining the convex hull of a finite planar set. Inf. Process. Lett., 1(4):132–133, 1972. doi:10.1016/0020-0190(72)90045-2. URL https://doi.org/10.1016/0020-0190(72)90045-2.
[12] T. Hejda, W. Steiner, and L. Q. Zamboni. What is the Abelianization of the tribonacci shift?, 2015. Workshop on Automatic Sequences, Liège, May 2015.
[13] P. Hubert. Suites équilibrées. Theor. Comput. Sci., 242(1-2):91–108, 2000. doi:10.1016/S0304-3975(98)00202-3.
[14] I. Kaboré and T. Tapsoba. Combinatoire de mots récurrents de complexité n + 2. ITA, 41(4):425–446, 2007. doi:10.1051/ita:2007027.
[15] J. Karhumäki, S. Puzynina, and M. A. Whiteland. On abelian subshifts. In Developments in Language Theory 2018, volume 11088 of Lecture Notes in Computer Science, pages 453–464. Springer, 2018. doi:10.1007/978-3-319-98654-8_37.
[16] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Camb. Univ. Press, New York, NY, USA, 1995. ISBN 0-521-55900-6.
[17] M. Lothaire. Combinatorics on Words, volume 17 of Encycl. Math. Appl. Addison-Wesley, 1983. ISBN 978-0-201-13516-9.
[18] M. Lothaire. Algebraic combinatorics on words, volume 90 of Encycl. Math. Appl. Cambridge University Press, 2002. ISBN 0-521-81220-8. doi:10.1017/CBO9781107326019.
[19] M. Morse and G. A. Hedlund. Symbolic Dynamics II. Sturmian Trajectories. Am. J. Math., 62:1–42, 1940. ISSN 00029327, 10806377.
[20] S. Puzynina and M. A. Whiteland. Abelian closures of infinite binary words. CoRR, abs/2008.08125, 2020. URL https://arxiv.org/abs/2008.08125.
[21] S. Puzynina and L. Q. Zamboni. Abelian returns in Sturmian words. J. Comb. Theory Ser. A, 120(2):390–408, 2013. doi:10.1016/j.jcta.2012.09.002.
[22] G. Richomme, K. Saari, and L. Q. Zamboni. Abelian complexity of minimal subshifts. J. Lond. Math. Soc., 83(1):79–95, 2011. doi:10.1112/jlms/jdq063.
[23] M. Rigo, P. Salimov, and É. Vandomme. Some properties of abelian return words.