Factor frequencies in languages invariant under more symmetries

Abstract

The number of frequencies of factors of length $n+1$ in a recurrent aperiodic infinite word does not exceed $3\Delta\mathcal{C}(n)$, where $\Delta\mathcal{C}(n)$ is the first difference of factor complexity, as shown by Boshernitzan. Pelantová together with the author derived a better upper bound for infinite words whose language is closed under reversal. In this paper, we further diminish the upper bound for uniformly recurrent infinite words whose language is invariant under all elements of a finite group of symmetries and we prove the optimality of the obtained upper bound.


arXiv [math.CO]

FACTOR FREQUENCIES IN LANGUAGES INVARIANT UNDER MORE SYMMETRIES

Ľ. Balková
Department of Mathematics FNSPE, Czech Technical University in Prague, Trojanova 13, 120 00 Praha 2, Czech Republic

1. Introduction

When studying factor frequencies, the Rauzy graph is a powerful tool. Using this tool, the following results have been obtained. Dekking [8] described the factor frequencies of two famous infinite words, the Fibonacci word and the Thue-Morse word. Using Rauzy graphs, it is readily seen that the frequencies of factors of a given length of any Arnoux-Rauzy word over an $m$-letter alphabet attain at most $m+1$ distinct values. Explicit values of factor frequencies have been derived by Berthé [4] for Sturmian words and by Wozny and Zamboni [16] for Arnoux-Rauzy words in general.

Queffélec [14] explored factor frequencies of fixed points of morphisms from another point of view: as a shift-invariant probability measure. She provided a rather complicated algorithm for the computation of values of such a measure. For some special classes of fixed points of morphisms (circular marked uniform morphisms), Frid [10] described their factor frequencies completely.

A simple idea concerning Rauzy graphs led Boshernitzan [5] to an upper bound on the number of different factor frequencies in an arbitrary recurrent aperiodic infinite word. He showed that the number of frequencies of factors of length $n+1$ does not exceed $3\Delta\mathcal{C}(n)$, where $\Delta\mathcal{C}(n)$ is the first difference of factor complexity. In [6], it was shown that $\Delta\mathcal{C}(n)$ is bounded for infinite words with sublinear complexity (for instance, fixed points of primitive substitutions form a subclass of infinite words with sublinear complexity); therefore, the number of different frequencies of factors of the same length is bounded.

In our previous paper [2], making use of the reflection symmetry of Rauzy graphs, we diminished Boshernitzan's upper bound for infinite words whose language is closed under reversal. This time, we generalize our result to infinite words whose language is invariant under all elements of a group of symmetries and whose Rauzy graphs are therefore invariant under all elements of a group of automorphisms.

In Section 2, we introduce basic notions, describe the main tool of our proofs, reduced Rauzy graphs, and summarize in detail the known upper bounds on the number of factor frequencies. Section 3 explains what is to be understood under a symmetry. In Section 4, we prove Theorem 4.1, which provides an optimal upper bound on the number of factor frequencies of infinite words whose language is invariant under all elements of a finite group of symmetries. Section 5 is devoted to the demonstration that the upper bound from the main theorem is indeed optimal.

Finally, let us mention that the idea to exploit symmetries of the Rauzy graph was already used in [3] in order to estimate the number of palindromes of a given length, and, recently, it has been used profoundly in [12, 13, 15] for the generalization of the so-called rich and almost rich words (see [11]) for languages invariant under more symmetries than just reversal.

e-mail: lubomira.balkova@fjfi.cvut.cz

2. Preliminaries

An alphabet $\mathcal{A}$ is a finite set of symbols, called letters. A concatenation of letters is a word. The length of a word $w$ is the number of letters in $w$ and is denoted $|w|$. The set $\mathcal{A}^*$ of all finite words (including the empty word $\varepsilon$) provided with the operation of concatenation is a free monoid. The set of all finite words but the empty word $\varepsilon$ is denoted $\mathcal{A}^+$. We will also deal with right-sided infinite words $u = u_0 u_1 u_2 \ldots$, where $u_i \in \mathcal{A}$. A finite word $w$ is called a factor of the word $u$ (finite or infinite) if there exist a finite word $p$ and a word $s$ (finite or infinite) such that $u = pws$. The factor $p$ is a prefix of $u$ and $s$ is a suffix of $u$. An infinite word $u$ is said to be recurrent if each of its factors occurs infinitely many times in $u$. An occurrence of a finite word $w$ in a finite word $v = v_0 v_1 \ldots v_m$ (in an infinite word $u$) is an index $i$ such that $w$ is a prefix of the word $v_i v_{i+1} \ldots v_m$ (of the word $u_i u_{i+1} \ldots$). An infinite word $u$ is called uniformly recurrent if for any factor $w$ the set $\{ j - i \mid i \text{ and } j \text{ are successive occurrences of } w \text{ in } u \}$ is bounded.

2.1. Complexity and special factors.

The language $\mathcal{L}(u)$ of an infinite word $u$ is the set of all factors of $u$. We denote by $\mathcal{L}_n(u)$ the set of factors of length $n$ of $u$. We define the factor complexity (or complexity) of $u$ as the mapping $\mathcal{C}: \mathbb{N} \to \mathbb{N}$ which associates to every $n$ the number of different factors of length $n$ of $u$, i.e., $\mathcal{C}(n) = \#\mathcal{L}_n(u)$.

An important role for the computation of factor complexity is played by special factors. We say that a letter $a$ is a right extension of a factor $w \in \mathcal{L}(u)$ if $wa$ is also a factor of $u$. We denote by $\mathrm{Rext}(w)$ the set of all right extensions of $w$ in $u$, i.e., $\mathrm{Rext}(w) = \{ a \in \mathcal{A} \mid wa \in \mathcal{L}(u) \}$. If $\#\mathrm{Rext}(w) \geq 2$, then the factor $w$ is called right special (RS for short). Analogously, we define left extensions, $\mathrm{Lext}(w)$, and left special factors (LS for short). Moreover, we say that a factor $w$ is bispecial (BS for short) if $w$ is both LS and RS.

With these notions in hand, we may introduce a formula for the first difference of complexity $\Delta\mathcal{C}(n) = \mathcal{C}(n+1) - \mathcal{C}(n)$ (taken from [7]):

(1) $\Delta\mathcal{C}(n) = \sum_{w \in \mathcal{L}_n(u)} \bigl( \#\mathrm{Rext}(w) - 1 \bigr) = \sum_{w \in \mathcal{L}_n(u)} \bigl( \#\mathrm{Lext}(w) - 1 \bigr), \quad n \in \mathbb{N}.$

2.2. Morphisms and antimorphisms.

A mapping $\varphi$ on $\mathcal{A}^*$ is called
• a morphism if $\varphi(vw) = \varphi(v)\varphi(w)$ for any $v, w \in \mathcal{A}^*$,
• an antimorphism if $\varphi(vw) = \varphi(w)\varphi(v)$ for any $v, w \in \mathcal{A}^*$.

We denote the set of all morphisms and antimorphisms on $\mathcal{A}^*$ by $AM(\mathcal{A}^*)$. Together with composition, it forms a monoid (the unit element is the identity mapping $\mathrm{Id}$). The mirror (also called reversal) mapping $R$ defined by $R(w_1 w_2 \ldots w_{m-1} w_m) = w_m w_{m-1} \ldots w_2 w_1$ is an involutive antimorphism, i.e., $R^2 = \mathrm{Id}$. It is obvious that any antimorphism is a composition of $R$ and a morphism.

A language $\mathcal{L}(u)$ is closed (invariant) under reversal if for every factor $w \in \mathcal{L}(u)$, also its mirror image $R(w)$ belongs to $\mathcal{L}(u)$. A factor $w$ which coincides with its mirror image $R(w)$ is called a palindrome. More generally, a language $\mathcal{L}(u)$ is closed (invariant) under an antimorphism or morphism $\Psi \in AM(\mathcal{A}^*)$ if for every factor $w \in \mathcal{L}(u)$, also $\Psi(w)$ belongs to $\mathcal{L}(u)$. If $\theta$ is an antimorphism on $\mathcal{A}^*$, then a word $w$ satisfying $w = \theta(w)$ is called a $\theta$-palindrome. It is not difficult to see that an infinite word whose language is closed under an antimorphism of finite order is recurrent.

We define the $\theta$-palindromic complexity of the infinite word $u$ as the mapping $\mathcal{P}_\theta: \mathbb{N} \to \mathbb{N}$ satisfying $\mathcal{P}_\theta(n) = \#\{ w \in \mathcal{L}_n(u) \mid w = \theta(w) \}$. If $\theta = R$, we write $\mathcal{P}(n)$ instead of $\mathcal{P}_R(n)$. Clearly, $\mathcal{P}(n) \leq \mathcal{C}(n)$ for all $n \in \mathbb{N}$. A non-trivial inequality between $\mathcal{P}(n)$ and $\mathcal{C}(n)$ can be found in [1]. Here, we use a result from [3].

Theorem 2.1. If the language of an infinite word is closed under reversal, then for all $n \in \mathbb{N}$, we have

(2) $\mathcal{P}(n) + \mathcal{P}(n+1) \leq \Delta\mathcal{C}(n) + 2.$

This result has been recently generalized in [13].
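As an illustration only (it is not part of the original text), the following Python sketch evaluates both sides of inequality (2) on a finite prefix of the Fibonacci word, whose language is closed under reversal. The function names and the choice of 15 iterations are assumptions of this sketch; it relies on the prefix being long enough to contain every factor of the short lengths inspected.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word obtained by iterating 0 -> 01, 1 -> 0."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word


def factors(word, n):
    """Set of factors of length n occurring in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def complexity(word, n):
    """Number of distinct factors of length n seen in the prefix."""
    return len(factors(word, n))


def palindromic_complexity(word, n):
    """Number of factors of length n that equal their mirror image."""
    return sum(1 for w in factors(word, n) if w == w[::-1])


def check_theorem_2_1(word, max_n):
    """Triples (n, P(n) + P(n+1), Delta C(n) + 2) for n = 0, ..., max_n."""
    rows = []
    for n in range(max_n + 1):
        delta_c = complexity(word, n + 1) - complexity(word, n)
        lhs = palindromic_complexity(word, n) + palindromic_complexity(word, n + 1)
        rows.append((n, lhs, delta_c + 2))
    return rows


u = fibonacci_prefix()
for n, lhs, rhs in check_theorem_2_1(u, 8):
    print(n, lhs, rhs, lhs <= rhs)
```

Only small values of $n$ are meaningful here, since a finite prefix cannot certify the factor sets of long lengths.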

Theorem 2.2. Let $G \subset AM(\mathcal{A}^*)$ be a finite group containing an antimorphism and let $u$ be an infinite word whose language is invariant under all elements of $G$. If there exists an integer $N \in \mathbb{N}$ such that any factor of $u$ of length $N$ contains all letters of $\mathcal{A}$, then

$\sum_{\theta \in G^{(2)}} \bigl( \mathcal{P}_\theta(n) + \mathcal{P}_\theta(n+1) \bigr) \leq \Delta\mathcal{C}(n) + \#G$ for all $n \geq N$,

where $G^{(2)}$ is the set of involutive antimorphisms in $G$.

Remark 2.3. Using Remark 23 from [13], the assumption on $N$ in Theorem 2.2 can be replaced with the following weaker assumption: there exists an integer $N$ such that
(1) for any two antimorphisms $\theta_1, \theta_2 \in G$, it holds $\theta_1 \neq \theta_2 \Rightarrow \theta_1(v) \neq \theta_2(v)$ for any $v$ with $|v| \geq N$,
(2) for any two morphisms $\varphi_1, \varphi_2 \in G$, it holds $\varphi_1 \neq \varphi_2 \Rightarrow \varphi_1(v) \neq \varphi_2(v)$ for any $v$ with $|v| \geq N$.

If $u$ is an infinite word whose language is closed under reversal, i.e., invariant under the morphism and the antimorphism of $G = \{\mathrm{Id}, R\}$, then the above weaker assumption is satisfied already for $N = 0$. Therefore, Theorem 2.1 is indeed a particular case of Theorem 2.2.

2.3. Factor frequency.

If $w$ is a factor of an infinite word $u$ and if the following limit exists

$\lim_{|v| \to \infty,\ v \in \mathcal{L}(u)} \frac{\#\{\text{occurrences of } w \text{ in } v\}}{|v|},$

then it is denoted by $\rho(w)$ and called the frequency of $w$.

Let us recall a result of Frid [10], which is useful for the calculation of factor frequencies in fixed points of primitive morphisms. In order to introduce the result, we need some further notions. Let $\varphi$ be a morphism on $\mathcal{A}^* = \{a_1, a_2, \ldots, a_m\}^*$. We associate with $\varphi$ the incidence matrix $M_\varphi$ given by $[M_\varphi]_{ij} = |\varphi(a_j)|_{a_i}$, where $|\varphi(a_j)|_{a_i}$ denotes the number of occurrences of $a_i$ in $\varphi(a_j)$. The morphism $\varphi$ is called primitive if there exists $k \in \mathbb{N}$ such that the power $M_\varphi^k$ has all entries strictly positive. As shown in [14], for fixed points of primitive morphisms,
• factor frequencies exist,
• it follows from the Perron-Frobenius theorem that the incidence matrix has one dominant eigenvalue $\lambda$, which is larger than the modulus of any other eigenvalue,
• the components of the unique eigenvector $(x_1, x_2, \ldots, x_m)^T$ corresponding to $\lambda$, normalized so that $\sum_{i=1}^{m} x_i = 1$, coincide with the letter frequencies, i.e., $x_i = \rho(a_i)$ for all $i \in \{1, 2, \ldots, m\}$.
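The three facts above can be tried out numerically. The sketch below is an illustration under stated assumptions: it builds the incidence matrix of a morphism given as a Python dictionary and approximates the dominant eigenvalue and the normalized eigenvector by plain power iteration (a technique chosen here for self-containedness, not one named in the text). The sample input is the morphism (13) studied in Section 5; the helper names and the number of iteration steps are hypothetical.

```python
import math

# Morphism (13) from Section 5, used here only as sample input.
PHI = {"0": "0130", "1": "1021", "2": "102", "3": "013"}


def incidence_matrix(morphism):
    """[M]_{ij} = number of occurrences of letter i in the image of letter j."""
    letters = sorted(morphism)
    return [[morphism[b].count(a) for b in letters] for a in letters]


def perron_pair(matrix, steps=200):
    """Power iteration: dominant eigenvalue and eigenvector with entries summing to 1."""
    size = len(matrix)
    x = [1.0 / size] * size
    eigenvalue = 0.0
    for _ in range(steps):
        y = [sum(row[j] * x[j] for j in range(size)) for row in matrix]
        eigenvalue = sum(y)  # x sums to 1, so the sum of Mx estimates lambda
        x = [value / eigenvalue for value in y]
    return eigenvalue, x


lam, frequencies = perron_pair(incidence_matrix(PHI))
print(lam, 2 + math.sqrt(3))
print(frequencies)
```

For a primitive matrix the iteration converges to the Perron eigenvector, so the printed vector approximates the letter frequencies $\rho(a_i)$.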

Let $\varphi$ be a morphism on $\mathcal{A}^*$. We denote by $\psi_{ij}: \mathcal{A}^+ \to \mathcal{A}^+$, where $i, j \in \mathbb{N}$, the mapping that associates with $v \in \mathcal{A}^+$ the word $\psi_{ij}(v)$ obtained from $\varphi(v)$ by erasing $i$ letters from the left and $j$ letters from the right, where $i + j < |\varphi(v)|$. We say that a word $v \in \mathcal{A}^+$ admits an interpretation $s = (b_0 b_1 \ldots b_m, i, j)$ if $v = \psi_{ij}(b_0 b_1 \ldots b_m)$, where $b_k \in \mathcal{A}$, $i < |\varphi(b_0)|$ and $j < |\varphi(b_m)|$. The word $a(s) = b_0 b_1 \ldots b_m$ is an ancestor of $s$. The set of all interpretations of $v$ is denoted $I(v)$. Now we can recall the promised result of Frid [10].

Proposition 2.4. Let $\varphi$ be a primitive morphism having a fixed point $u$ and let $\lambda$ be the dominant eigenvalue of the incidence matrix $M_\varphi$. Then for any factor $v \in \mathcal{L}(u)$, it holds

$\rho(v) = \frac{1}{\lambda} \sum_{s \in I(v)} \rho(a(s)).$

2.4. Reduced Rauzy graphs.

Assume throughout this section that the factor frequencies of the infinite words in question exist. The Rauzy graph of order $n$ of an infinite word $u$ is a directed graph $\Gamma_n$ whose set of vertices is $\mathcal{L}_n(u)$ and whose set of edges is $\mathcal{L}_{n+1}(u)$. An edge $e = w_0 w_1 \ldots w_n$ starts in the vertex $w = w_0 w_1 \ldots w_{n-1}$, ends in the vertex $v = w_1 \ldots w_{n-1} w_n$, and is labeled by its factor frequency $\rho(e)$.

It is easy to see that edge frequencies in a Rauzy graph $\Gamma_n$ behave similarly to the current in a circuit. We may formulate an analogy of Kirchhoff's current law: the sum of frequencies of edges ending in a vertex equals the sum of frequencies of edges starting in this vertex.

Observation 2.5 (Kirchhoff's law for frequencies). Let $w$ be a factor of an infinite word $u$ whose factor frequencies exist. Then

$\rho(w) = \sum_{a \in \mathrm{Lext}(w)} \rho(aw) = \sum_{a \in \mathrm{Rext}(w)} \rho(wa).$

Kirchhoff's law for frequencies has some useful consequences.

Corollary 2.6. Let $w$ be a factor of an infinite word $u$ whose frequency exists.
• If $w$ has a unique right extension $a$, then $\rho(w) = \rho(wa)$.
• If $w$ has a unique left extension $a$, then $\rho(w) = \rho(aw)$.

Corollary 2.7. Let $w$ be a factor of an aperiodic recurrent infinite word $u$ whose frequency exists. Let $v$ be the shortest BS factor containing $w$. Then $\rho(w) = \rho(v)$.

The assumption of recurrence and aperiodicity in Corollary 2.7 is needed in order to ensure that every factor can be extended to a BS factor.

Corollary 2.6 implies that if a Rauzy graph contains a vertex $w$ with only one incoming edge $aw$ and one outgoing edge $wb$, then $\rho := \rho(aw) = \rho(w) = \rho(wb) = \rho(awb)$. Therefore, we can replace this triplet (edge-vertex-edge) with only one edge $awb$ keeping the frequency $\rho$. If we reduce the Rauzy graph step by step applying the above described procedure, we obtain the so-called reduced Rauzy graph $\tilde{\Gamma}_n$, which simplifies the investigation of edge frequencies. In order to make this construction precise, we introduce the notion of a simple path.

Definition 2.8.

Let $\Gamma_n$ be the Rauzy graph of order $n$ of an infinite word $u$. A factor $e$ of length larger than $n$ such that its prefix and its suffix of length $n$ are special factors and $e$ does not contain any other special factors is called a simple path. We define the label of a simple path $e$ as $\rho(e)$.

Definition 2.9. The reduced Rauzy graph $\tilde{\Gamma}_n$ of $u$ of order $n$ is a directed graph whose set of vertices is formed by LS and RS factors of $\mathcal{L}_n(u)$ and whose set of edges is given in the following way. Vertices $w$ and $v$ are connected with an edge $e$ if there exists in $\Gamma_n$ a simple path starting in $w$ and ending in $v$. We assign to such an edge $e$ the label of the corresponding simple path.

For a recurrent word $u$, at least one edge starts and at least one edge ends in every vertex of $\Gamma_n$. If $u$ is moreover aperiodic, then all its Rauzy graphs contain at least one LS and one RS factor. It is thus not difficult to see that for recurrent aperiodic words, the set of edge labels in $\Gamma_n$ is equal to the set of edge labels in the reduced Rauzy graph $\tilde{\Gamma}_n$. The number of edge labels in the reduced Rauzy graph $\tilde{\Gamma}_n$ is clearly less than or equal to the number of edges in $\tilde{\Gamma}_n$. Let us calculate the number of edges in $\tilde{\Gamma}_n$ in order to get an upper bound on the number of frequencies of factors in $\mathcal{L}_{n+1}(u)$.

For every RS factor $w \in \mathcal{L}_n(u)$, it holds that $\#\mathrm{Rext}(w)$ edges begin in $w$, and for every LS factor $v \in \mathcal{L}_n(u)$ which is not RS, only one edge begins in $v$; thus we get the following formula

(3) $\#\{ e \mid e \text{ edge in } \tilde{\Gamma}_n \} = \sum_{w \text{ RS in } \mathcal{L}_n(u)} \#\mathrm{Rext}(w) + \sum_{v \text{ LS not RS in } \mathcal{L}_n(u)} 1.$

We rewrite the first term using (1) and the second term using the definition of BS factors in the following way

(4) $\#\{ e \mid e \text{ edge in } \tilde{\Gamma}_n \} = \Delta\mathcal{C}(n) + \sum_{v \text{ RS in } \mathcal{L}_n(u)} 1 + \sum_{v \text{ LS in } \mathcal{L}_n(u)} 1 - \sum_{v \text{ BS in } \mathcal{L}_n(u)} 1.$

Since $\#\mathrm{Rext}(w) - 1 \geq 1$ for any RS factor $w$ and, similarly, for LS factors, we have

(5) $\#\{ w \in \mathcal{L}_n(u) \mid w \text{ RS} \} \leq \Delta\mathcal{C}(n)$ and $\#\{ w \in \mathcal{L}_n(u) \mid w \text{ LS} \} \leq \Delta\mathcal{C}(n).$

By combining (4) and (5), we obtain

(6) $\#\{ e \mid e \text{ edge in } \tilde{\Gamma}_n \} \leq 3\Delta\mathcal{C}(n) - X,$

where $X$ is the number of BS factors of length $n$. This provides us with the result initially proved by Boshernitzan in [5].

Theorem 2.10. Let $u$ be an aperiodic recurrent infinite word such that the frequency $\rho(w)$ exists for every factor $w \in \mathcal{L}(u)$. Then for every $n \in \mathbb{N}$, it holds

$\#\{ \rho(e) \mid e \in \mathcal{L}_{n+1}(u) \} \leq 3\Delta\mathcal{C}(n).$

In the paper [2], we have considered infinite words with language closed under reversal and we have lowered the upper bound from Theorem 2.10 for them.

Theorem 2.11. Let $u$ be an infinite word whose language $\mathcal{L}(u)$ is closed under reversal and such that the frequency $\rho(w)$ exists for every factor $w \in \mathcal{L}(u)$. Then for every $n \in \mathbb{N}$, we have

(7) $\#\{ \rho(e) \mid e \in \mathcal{L}_{n+1}(u) \} \leq 2\Delta\mathcal{C}(n) + 1 - \frac{1}{2}X - \frac{1}{2}Y,$

where $X$ is the number of BS factors of length $n$ and $Y$ is the number of palindromic BS factors of length $n$.

Corollary 2.12. Let $u$ be an infinite word whose language $\mathcal{L}(u)$ is closed under reversal and such that the frequency $\rho(w)$ exists for every factor $w \in \mathcal{L}(u)$. Then the number of distinct factor frequencies obeys for all $n \in \mathbb{N}$,

(8) $\#\{ \rho(e) \mid e \in \mathcal{L}_{n+1}(u) \} \leq 2\Delta\mathcal{C}(n) + 1,$

where the equality is reached if and only if $u$ is purely periodic.
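To make the counting argument of this subsection concrete, the following Python sketch (an illustration, not the author's method) computes, from a finite prefix of the Fibonacci word, the number of edges of the reduced Rauzy graph via formula (3) and compares it with the right-hand sides of (6) and (7). It assumes that (7) reads $2\Delta\mathcal{C}(n) + 1 - X/2 - Y/2$, which is what Theorem 4.1 yields for $G = \{\mathrm{Id}, R\}$, and that the prefix contains every factor of the lengths inspected; all function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word obtained by iterating 0 -> 01, 1 -> 0."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word


def extensions(word, n):
    """Map each factor of length n to its sets of left and right extensions."""
    left, right = defaultdict(set), defaultdict(set)
    for i in range(len(word) - n):
        edge = word[i:i + n + 1]  # a factor of length n + 1
        right[edge[:-1]].add(edge[-1])
        left[edge[1:]].add(edge[0])
    return left, right


def analyse(word, n):
    """Return (edge count by (3), bound from (6), bound from (7)) for order n."""
    left, right = extensions(word, n)
    rs = {w for w, ext in right.items() if len(ext) >= 2}
    ls = {w for w, ext in left.items() if len(ext) >= 2}
    bs = rs & ls
    delta_c = sum(len(ext) - 1 for ext in right.values())  # formula (1)
    edges = sum(len(right[w]) for w in rs) + len(ls - rs)  # formula (3)
    x = len(bs)
    y = sum(1 for w in bs if w == w[::-1])  # palindromic BS factors
    return edges, 3 * delta_c - x, 2 * delta_c + 1 - Fraction(x + y, 2)


u = fibonacci_prefix()
for n in range(1, 9):
    print(n, analyse(u, n))
```

The number of distinct frequencies itself is not computed here; the sketch only evaluates the combinatorial quantities that bound it.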

As shown by Ferenczi and Zamboni [9], $m$-iet words attain the upper bound from (7) for all $n \in \mathbb{N}$. Since Sturmian words are 2-iet words, they reach the upper bound in (7) for all $n \in \mathbb{N}$, too. Consequently, the upper bound from (7) is optimal and cannot be improved while preserving the assumptions. However, as we will show in the sequel, if the language of an infinite word $u$ is invariant under more symmetries, the upper bound from (7) may be lowered considerably.

3. Symmetries preserving factor frequency

We will be interested in symmetries preserving in a certain way factor occurrences in $u$, and consequently, frequencies of factors of $u$. Let us call a symmetry on $\mathcal{A}^*$ any mapping $\Psi$ satisfying the following two properties:
(1) $\Psi$ is a bijection $\mathcal{A}^* \to \mathcal{A}^*$,
(2) for all $w, v \in \mathcal{A}^*$, $\#\{\text{occurrences of } w \text{ in } v\} = \#\{\text{occurrences of } \Psi(w) \text{ in } \Psi(v)\}$.

Theorem 3.1. Let $\Psi: \mathcal{A}^* \to \mathcal{A}^*$. Then $\Psi$ is a symmetry if and only if $\Psi$ is a morphism or an antimorphism such that $\Psi$ is a letter permutation when restricted to $\mathcal{A}$.

The proof of Theorem 3.1 is obtained by putting together the following two lemmas.
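Property (2) can be tested mechanically on sample words. The sketch below is only an illustration with hypothetical helper names: it builds a morphism and an antimorphism from a letter permutation and checks that occurrence counts are preserved, while a letter-doubling morphism (not a letter permutation) fails the test.

```python
def occurrences(w, v):
    """Number of indices i such that w is a prefix of v[i:]."""
    return sum(1 for i in range(len(v) - len(w) + 1) if v.startswith(w, i))


def make_symmetry(permutation, reverse):
    """Morphism (reverse=False) or antimorphism (reverse=True) induced by a letter permutation."""
    def psi(word):
        image = [permutation[letter] for letter in word]
        return "".join(reversed(image) if reverse else image)
    return psi


def preserves_occurrences(psi, words):
    """Check Property (2) for all pairs (w, v) taken from the sample words."""
    return all(
        occurrences(w, v) == occurrences(psi(w), psi(v))
        for w in words
        for v in words
    )


permutation = {"0": "1", "1": "0", "2": "3", "3": "2"}
sample = ["", "0", "01", "13", "0130", "013010210130130"]
print(preserves_occurrences(make_symmetry(permutation, False), sample))
print(preserves_occurrences(make_symmetry(permutation, True), sample))

# A morphism that doubles every letter is not a symmetry.
doubling = lambda word: "".join(letter + letter for letter in word)
print(preserves_occurrences(doubling, ["0", "00"]))
```

A finite sample can refute Property (2) but cannot prove it; the proof is the content of Lemmas 3.2 and 3.3.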

Lemma 3.2. Let $\Psi$ be a symmetry on $\mathcal{A}^*$ and let $w \in \mathcal{A}^*$. Then $|\Psi(w)| = |w|$.

Proof. Since $\#\{\text{occurrences of } \Psi(w) \text{ in } \Psi(\varepsilon)\} = \#\{\text{occurrences of } w \text{ in } \varepsilon\} = 0$ for every nonempty $w \in \mathcal{A}^*$, it follows that $\Psi(\varepsilon) = \varepsilon$.

Since $\Psi$ is a bijection, for every letter $a \in \mathcal{A}$, there exists a unique $w \in \mathcal{A}^*$ such that $\Psi(w) = a$, where $w \neq \varepsilon$. If we denote $\mathcal{A} = \{a_1, \ldots, a_m\}$, then using Property (2), it is easy to show that there exists a permutation $\pi \in S_m$ such that $\Psi(a_k) = a_{\pi(k)}$ for all $k \in \{1, \ldots, m\}$.

Let us now take an arbitrary $w \in \mathcal{A}^*$. Using the fact that $\Psi$ restricted to $\mathcal{A}$ is a letter permutation and applying Property (2), we have

$|w| = \sum_{a \in \mathcal{A}} \#\{\text{occurrences of } a \text{ in } w\} = \sum_{a \in \mathcal{A}} \#\{\text{occurrences of } \Psi(a) \text{ in } \Psi(w)\} = |\Psi(w)|.$ □

Using Lemma 3.2 and the definition of symmetry, it is seen for every $w_1 w_2 \ldots w_n \in \mathcal{A}^*$, $w_i \in \mathcal{A}$, that the following equation is valid

(9) $\Psi(w_1 w_2 \ldots w_n) = \Psi(w_{\sigma(1)}) \Psi(w_{\sigma(2)}) \ldots \Psi(w_{\sigma(n)})$

for some permutation $\sigma \in S_n$. The next lemma claims that the permutation $\sigma$ is necessarily either the identical permutation $(1\ 2\ \ldots\ n)$ or the symmetric permutation $(n\ \ldots\ 2\ 1)$.

Lemma 3.3. Let $\Psi$ be a symmetry on $\mathcal{A}^*$. Then $\Psi$ is either a morphism or an antimorphism.

Proof. We have to prove that $\Psi(w) = \Psi(w_1)\Psi(w_2) \ldots \Psi(w_n)$ for every $w = w_1 w_2 \ldots w_n \in \mathcal{A}^*$, $w_i \in \mathcal{A}$, or $\Psi(w) = \Psi(w_n) \ldots \Psi(w_2)\Psi(w_1)$ for every $w = w_1 w_2 \ldots w_n \in \mathcal{A}^*$, $w_i \in \mathcal{A}$.

Let us proceed by induction on the length $n$ of $w$. The case $n = 1$ is clear. Suppose that $\Psi(w) = \Psi(w_1)\Psi(w_2) \ldots \Psi(w_{n-1})$ for every $w = w_1 w_2 \ldots w_{n-1} \in \mathcal{A}^*$ of length $n - 1$, $n \geq 2$. Take an arbitrary $w = w_1 w_2 \ldots w_n \in \mathcal{A}^*$. Then, as $\Psi$ is a symmetry, $\Psi(w_2 \ldots w_n)$ is a factor of $\Psi(w_1 w_2 \ldots w_n)$; in more precise terms, $\Psi(w_2 \ldots w_n)$ is either a prefix or a suffix of $\Psi(w_1 \ldots w_n)$. Moreover, if $w_1$ occurs in $w_2 \ldots w_n$ $\ell$ times, then $w_1$ occurs in $w_1 w_2 \ldots w_n$ $(\ell + 1)$ times. Since $\Psi$ is a symmetry, it follows that $\Psi(w_1)$ occurs $\ell$ times in $\Psi(w_2 \ldots w_n)$ and $(\ell + 1)$ times in $\Psi(w_1 w_2 \ldots w_n)$. These two observations result in

$\Psi(w_1 w_2 \ldots w_n) = \Psi(w_1)\Psi(w_2 \ldots w_n) = \Psi(w_1)\Psi(w_2) \ldots \Psi(w_n)$

or

$\Psi(w_1 w_2 \ldots w_n) = \Psi(w_2 \ldots w_n)\Psi(w_1) = \Psi(w_2) \ldots \Psi(w_n)\Psi(w_1).$

The first case means that $\Psi$ is a morphism. Let us treat the second case. Similar reasoning as before leads to

$\Psi(w_1 w_2 \ldots w_n) = \Psi(w_1 \ldots w_{n-1})\Psi(w_n) = \Psi(w_1)\Psi(w_2) \ldots \Psi(w_n)$

or

$\Psi(w_1 w_2 \ldots w_n) = \Psi(w_n)\Psi(w_1 \ldots w_{n-1}) = \Psi(w_n)\Psi(w_1) \ldots \Psi(w_{n-1}).$

The first case again means that $\Psi$ is a morphism. The only case which remains is $\Psi(w) = \Psi(w_2) \ldots \Psi(w_n)\Psi(w_1) = \Psi(w_n)\Psi(w_1) \ldots \Psi(w_{n-1})$. Since $\Psi$ is a bijection, we get $w_1 = w_2 = \cdots = w_n$. Hence, again $\Psi(w) = \Psi(w_1)\Psi(w_2) \ldots \Psi(w_n)$.

With the same reasoning, we deduce that if $\Psi(w) = \Psi(w_{n-1}) \ldots \Psi(w_2)\Psi(w_1)$ for every $w = w_1 w_2 \ldots w_{n-1} \in \mathcal{A}^*$, $n \geq 2$, then for an arbitrary $w = w_1 w_2 \ldots w_n \in \mathcal{A}^*$, $w_i \in \mathcal{A}$, we get $\Psi(w) = \Psi(w_n) \ldots \Psi(w_2)\Psi(w_1)$. □

Observation 3.4. Let $u$ be an infinite word whose language is invariant under a symmetry $\Psi$. For every $w$ in $\mathcal{L}(u)$ whose frequency exists, it holds $\rho(w) = \rho(\Psi(w))$.

Remark 3.5. If a finite set $G$ is a submonoid of $AM(\mathcal{A}^*)$, then $G$ is a group and any of its members restricted to the set of words of length one is just a permutation on the alphabet $\mathcal{A}$. In other words, $G$ is a finite group of symmetries. Words with languages invariant under all elements of such a group $G$ of symmetries have been studied in [13].

4. Factor frequencies of languages invariant under more symmetries

Assume $u$ is an infinite word over an alphabet $\mathcal{A}$ with $\#\mathcal{A} \geq 2$ whose language is invariant under all elements of a finite group $G \subset AM(\mathcal{A}^*)$ of symmetries containing an antimorphism. Let us summarize some observations concerning the group $G$ of symmetries and reduced Rauzy graphs of $u$. These observations constitute all the tools we need for the proof of the main theorem of this paper, Theorem 4.1.

Observations:
(1) Let $\theta$ be an antimorphism in $G$. The mapping $\Psi \to \theta\Psi$ is a bijection on $G$ satisfying
$\Psi \in G$ is a morphism $\Leftrightarrow$ $\theta\Psi \in G$ is an antimorphism.
This implies that $G$, containing an antimorphism, has an even number of elements, i.e., $\#G = 2k$.
(2) For a factor $w$ containing all letters of $\mathcal{A}$, the following properties can be easily verified:
(a) for any distinct antimorphisms $\theta_1, \theta_2 \in G$, we have $\theta_1(w) \neq \theta_2(w)$,
(b) for any distinct morphisms $\varphi_1, \varphi_2 \in G$, we have $\varphi_1(w) \neq \varphi_2(w)$.
(3) If $w$ is a $\theta$-palindrome containing all letters of $\mathcal{A}$ for an antimorphism $\theta \in G$, then $\theta$ is an involution, i.e., $\theta^2 = \mathrm{Id}$.
(4) In a reduced Rauzy graph of $u$, if there is an edge $e$ between two vertices $w$ and $v$, where $w$ and $v$ contain all letters of $\mathcal{A}$, then
(a) either $e$ is a $\theta$-palindrome for some antimorphism $\theta \in G$, and then there exist at least $k$ distinct edges having the same label $\rho(e)$, namely the edges $\varphi(e)$ for all morphisms $\varphi$ in $G$;
(b) or $e$ is not a $\theta$-palindrome for any antimorphism $\theta \in G$, and then there exist at least $2k$ distinct edges having the same label $\rho(e)$, namely the edges $\varphi(e)$ for all morphisms $\varphi$ in $G$ and $\theta(e)$ for all antimorphisms $\theta$ in $G$.
(5) On one hand, if an edge $e$ in the reduced Rauzy graph $\tilde{\Gamma}_n$ is mapped by $\theta$ onto itself, then the corresponding simple path has a $\theta$-palindromic central factor of length $n$ or $n + 1$. On the other hand, every $\theta$-palindrome contained in $\mathcal{L}_{n+1}(u)$ is the central factor of a simple path mapped by $\theta$ onto itself and every $\theta$-palindrome of length $n$ is either the central factor of a simple path mapped by $\theta$ onto itself or is a special factor (thus, evidently, a BS factor).
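Observation (1) can be checked on a small example. In the sketch below, a symmetry is encoded as a pair (letter permutation, flag marking antimorphisms); composing two symmetries composes the permutations and toggles the flag exactly when one of the two is an antimorphism. The two generators, the transpositions $(0\,1)$ and $(2\,3)$ acting as antimorphisms on a four-letter alphabet, are an assumption made for illustration, as are the function names.

```python
def compose(f, g):
    """Composition f∘g of symmetries given as (letter permutation, is_antimorphism)."""
    (p, anti_f), (q, anti_g) = f, g
    return (tuple(p[q[i]] for i in range(len(q))), anti_f != anti_g)


def generate_group(generators):
    """Closure of the generators under composition (finite, hence a group)."""
    size = len(generators[0][0])
    identity = (tuple(range(size)), False)
    group = {identity}
    frontier = [identity]
    while frontier:
        element = frontier.pop()
        for generator in generators:
            candidate = compose(generator, element)
            if candidate not in group:
                group.add(candidate)
                frontier.append(candidate)
    return group


# Hypothetical generators: two involutive antimorphisms on the letters 0, 1, 2, 3.
theta_1 = ((1, 0, 2, 3), True)
theta_2 = ((0, 1, 3, 2), True)

group = generate_group([theta_1, theta_2])
morphisms = [g for g in group if not g[1]]
antimorphisms = [g for g in group if g[1]]
print(len(group), len(morphisms), len(antimorphisms))
```

In agreement with Observation (1), the morphisms and the antimorphisms of the generated group come in equal numbers.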
Theorem 4.1. Let $G \subset AM(\mathcal{A}^*)$ be a finite group containing an antimorphism and let $u$ be a uniformly recurrent aperiodic infinite word whose language is invariant under all elements of $G$ and such that the frequency $\rho(w)$ exists for every factor $w \in \mathcal{L}(u)$. Then there exists $N \in \mathbb{N}$ such that

$\#\{ \rho(e) \mid e \in \mathcal{L}_{n+1}(u) \} \leq \frac{1}{\#G} \Bigl( 4\Delta\mathcal{C}(n) + \#G - X - Y \Bigr)$ for all $n \geq N$,

where $X$ is the number of BS factors of length $n$ and $Y$ is the number of BS factors of length $n$ that are $\theta$-palindromes for an antimorphism $\theta \in G$.

Proof. Since $u$ is uniformly recurrent, we can find $N$ such that any factor of length $N$ contains all letters of $u$. Let $\tilde{\Gamma}_n$ be the reduced Rauzy graph of $u$ of order $n \geq N$. We know already that the set of edge labels of $\tilde{\Gamma}_n$ is equal to the set of edge labels of $\Gamma_n$. It is easy to see that any element of $G$ is an automorphism of $\tilde{\Gamma}_n$, i.e., $G$ maps the graph $\tilde{\Gamma}_n$ onto itself.

Let us denote by $A$ the number of edges $e$ in $\tilde{\Gamma}_n$ such that $e$ is mapped by a certain antimorphism of $G$ onto itself (such an antimorphism is involutive by Observation (3)) and by $B$ the number of edges $e$ in $\tilde{\Gamma}_n$ such that $e$ is not mapped by any antimorphism of $G$ onto itself. Then

(10) $\#\{ e \mid e \text{ edge in } \tilde{\Gamma}_n \} = A + B \leq 3\Delta\mathcal{C}(n) - X,$

where the upper bound is taken from (6). We get, using Observations (3) and (5), the following formula

(11) $A = \sum_{\theta \in G^{(2)}} \bigl( \mathcal{P}_\theta(n) + \mathcal{P}_\theta(n+1) \bigr) - \sum_{\theta \in G^{(2)}} \#\{ w \in \mathcal{L}_n(u) \mid w = \theta(w) \text{ and } w \text{ BS} \},$

where we subtract the number of BS factors of $\mathcal{L}_n(u)$ that are $\theta$-palindromes for a certain antimorphism $\theta$, in the statement denoted by $Y$, since they are not central factors of any simple path. If $\#G = 2k$, then for every edge $e$ in $\tilde{\Gamma}_n$ that is mapped by a certain antimorphism $\theta \in G$ onto itself, there are at least $k$ different edges with the same label $\rho(e)$ by Observation (4a). Now, let us turn our attention to those edges of $\tilde{\Gamma}_n$ which are not mapped by any antimorphism of $G$ onto themselves. For every such edge $e$, at least $2k$ edges have the same label $\rho(e)$ by Observation (4b). These considerations lead to the following estimate

(12) $\#\{ \rho(e) \mid e \in \mathcal{L}_{n+1}(u) \} \leq \frac{1}{k} A + \frac{1}{2k} B = \frac{1}{2k} A + \frac{1}{2k} (A + B).$

Putting together (11), (10), (12), and Theorem 2.2, the statement is proven. □
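For the reader's convenience, the final step of the proof can be written out as a single chain of inequalities. The following is our reading of how (10), (11), (12) and Theorem 2.2 combine, assuming that the subtracted sum in (11) equals $Y$ (each BS $\theta$-palindrome containing all letters is fixed by exactly one antimorphism, by Observation (2a)) and that $\#G = 2k$:

```latex
\begin{align*}
\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}(u)\}
  &\le \frac{1}{2k}\,A + \frac{1}{2k}\,(A+B)
     && \text{by (12)} \\
  &\le \frac{1}{2k}\Bigl(\sum_{\theta \in G^{(2)}}
        \bigl(\mathcal{P}_\theta(n)+\mathcal{P}_\theta(n+1)\bigr) - Y\Bigr)
     + \frac{1}{2k}\bigl(3\Delta\mathcal{C}(n) - X\bigr)
     && \text{by (11) and (10)} \\
  &\le \frac{1}{2k}\bigl(\Delta\mathcal{C}(n) + \#G - Y\bigr)
     + \frac{1}{2k}\bigl(3\Delta\mathcal{C}(n) - X\bigr)
     && \text{by Theorem 2.2} \\
  &= \frac{1}{\#G}\bigl(4\Delta\mathcal{C}(n) + \#G - X - Y\bigr)
     && \text{since } \#G = 2k .
\end{align*}
```

For $G = \{\mathrm{Id}, R\}$, i.e., $k = 1$, the last line becomes $2\Delta\mathcal{C}(n) + 1 - \frac{1}{2}X - \frac{1}{2}Y$.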

Remark 4.2. If the language of an infinite word $u$ is closed under reversal, then $G = \{\mathrm{Id}, R\}$ and the new upper bound from Theorem 4.1 coincides with the estimate from Theorem 2.11.

Remark 4.3. It is easy to show that Theorem 4.1 stays true if we replace the assumption of uniform recurrence with the weaker (however more technical) assumption from Remark 2.3.

Finally, if we want to have a simpler upper bound on the number of factor frequencies, we can use the following one, which is slightly rougher than the estimate from Theorem 4.1.

Corollary 4.4. Let $G \subset AM(\mathcal{A}^*)$ be a finite group containing an antimorphism and let $u$ be a uniformly recurrent infinite word whose language is invariant under all elements of $G$ and such that the frequency $\rho(w)$ exists for every factor $w \in \mathcal{L}(u)$. Then there exists $N \in \mathbb{N}$ such that

$\#\{ \rho(e) \mid e \in \mathcal{L}_{n+1}(u) \} \leq \frac{4}{\#G} \Delta\mathcal{C}(n) + 1$ for all $n \geq N$.

The equality holds for all sufficiently large $n$ if and only if $u$ is purely periodic.

5. Optimality of the upper bound

In this section, we will illustrate on an example taken from [13] that the upper bound from Theorem 4.1 is attained for every $n \in \mathbb{N}$, $n \geq 1$, thus it is an optimal upper bound. The infinite word $u$ in question is the fixed point starting in 0, which is obtained when we iterate the primitive morphism $\varphi$ given by

(13) $\varphi(0) = 0130, \quad \varphi(1) = 1021, \quad \varphi(2) = 102, \quad \varphi(3) = 013,$

i.e., for all $n \in \mathbb{N}$, the word $\varphi^n(0)$ is a prefix of $u$. The corresponding incidence matrix is of the form

$M_\varphi = \begin{pmatrix} 2 & 1 & 1 & 1 \\ 1 & 2 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix},$

its dominant eigenvalue is $\lambda = 2 + \sqrt{3}$ with the normalized eigenvector $\frac{1}{2} \bigl( \sqrt{3} - 1,\ \sqrt{3} - 1,\ 2 - \sqrt{3},\ 2 - \sqrt{3} \bigr)^T$, hence we get the letter frequencies

$\rho(0) = \rho(1) = \frac{\sqrt{3} - 1}{2}, \quad \rho(2) = \rho(3) = \frac{2 - \sqrt{3}}{2}.$

We also know that the frequencies of all factors exist because of the primitivity of $\varphi$. In [13], the following properties of $u$ have been shown:
(1) The language $\mathcal{L}(u)$ is closed under the finite group of symmetries $G = \{\mathrm{Id}, \theta_1, \theta_2, \theta_1\theta_2\}$, where $\theta_1, \theta_2$ are involutive antimorphisms acting on $\mathcal{A}$ as follows:
$\theta_1: 0 \to 1,\ 1 \to 0,\ 2 \to 2,\ 3 \to 3,$
$\theta_2: 0 \to 0,\ 1 \to 1,\ 2 \to 3,\ 3 \to 2.$
(2) The first increment of factor complexity satisfies $\Delta\mathcal{C}(n) = 2$ for all $n \in \mathbb{N}$, $n \geq 1$. Every LS factor $w$ is a prefix, for some $n \in \mathbb{N}$,
• of either $\varphi^n(0) = 013010210130130\ldots$ and $\mathrm{Lext}(w) = \{1, 3\}$,
• or of $\varphi^n(1) = 102101301021021\ldots$ and $\mathrm{Lext}(w) = \{0, 2\}$.
(3) A factor $w$ of $u$ is LS if and only if $\theta_i(w)$ is RS for $i \in \{1, 2\}$.

In order to find the set of frequencies of factors of any length, we need to describe BS factors of $u$. By Property (3), we deduce the relation between BS factors and $\theta_i$-palindromes.

Corollary 5.1.

Every nonempty BS factor is a $\theta_i$-palindrome for one of the indices $i \in \{1, 2\}$.

Proposition 5.2. If $v \in \mathcal{L}(u)$ is a BS factor of length greater than 5, then $v = \varphi(w) p_{w_n}$ for a BS factor $w \in \mathcal{L}(u)$, where $w_n$ is the last letter of $w$ and $p_0 = p_2 = 10210$ and $p_1 = p_3 = 01301$. Moreover, $\rho(v) = \frac{1}{\lambda} \rho(w)$.

Proof. By Property (2), every LS factor of length greater than 5 starts either in 01301 or in 10210. Similarly, by Property (3), every RS factor of length greater than 5 ends either in 01301 or in 10210. It follows from the definition of $\varphi$ in (13) that there exists $w \in \mathcal{L}(u)$ such that $v = \varphi(w)01301$ or $v = \varphi(w)10210$ and that $w$ is necessarily a BS factor. Consider $v = \varphi(w)01301$; the second case can be treated analogously. It is then not difficult to see that $w$ ends in $w_n = 1$ or $w_n = 3$, hence $p_{w_n} = 01301$. In order to prove the relation between frequencies, we need to determine the set of interpretations of $v$. It is readily seen that the set of interpretations is
• $\{ (w01, 0, 3), (w02, 0, 2), (w30, 0, 2) \}$ if $w_n = 1$ or $w_n = 3$,
• $\{ (w10, 0, 3), (w13, 0, 2), (w21, 0, 2) \}$ if $w_n = 0$ or $w_n = 2$.

Using Proposition 2.4, we obtain $\rho(v) = \frac{\rho(w01) + \rho(w02) + \rho(w30)}{\lambda} = \frac{\rho(w)}{\lambda}$ if $w_n = 1$ or $w_n = 3$, where the last equality follows from the fact that $w$ is always followed by 01, 02, or 30, and similarly, $\rho(v) = \frac{\rho(w10) + \rho(w13) + \rho(w21)}{\lambda} = \frac{\rho(w)}{\lambda}$ if $w_n = 0$ or $w_n = 2$. □

Proposition 5.2 implies that if we want to generate all BS factors of $u$, then it is enough to know BS factors of length less than or equal to 5 and to apply the mapping $w \to \varphi(w) p_{w_n}$ on them repeatedly. Nonempty BS factors of length less than or equal to 5 are:
(1) 0 and 1,
(2) 01 and 10,
(3) 01301 and 10210.

The aim of the rest of this section is to show that for any length $n \in \mathbb{N}$, $n \geq 1$, we have

$\#\{ \rho(e) \mid e \in \mathcal{L}_{n+1}(u) \} = \begin{cases} 2 & \text{if } \mathcal{L}_n(u) \text{ contains a BS factor}, \\ 3 & \text{otherwise}. \end{cases}$

Let us draw in Figure 1 reduced Rauzy graphs containing short BS factors. In order to describe factor frequencies, it suffices to consider reduced Rauzy graphs containing short BS factors together with the following observations concerning reduced Rauzy graphs of $u$.

Observation 5.3.
(1) Any reduced Rauzy graph has either four vertices (two LS factors and two RS factors) or two vertices (BS factors).
(2) Reduced Rauzy graphs of order larger than 5 whose vertices are BS factors are obtained from the graphs in Figure 1 by a repeated application of the mapping $w \to \varphi(w) p_{w_n}$ simultaneously to all vertices and edges.
(3) By Corollary 2.7, it is not difficult to see that if, for a reduced Rauzy graph $\tilde{\Gamma}_n$ whose vertices are not BS factors, we find the reduced Rauzy graph of minimal larger order, say $\tilde{\Gamma}_m$, whose vertices are BS factors, then
$\{ \rho(e) \mid e \text{ edge in } \tilde{\Gamma}_n \} = \{ \rho(e) \mid e \text{ edge in } \tilde{\Gamma}_m \} \cup \{ \rho(v) \mid v \text{ vertex in } \tilde{\Gamma}_m \}.$

The last step in the derivation of frequencies of factors of $u$ is to determine the frequencies of edges and vertices in the reduced Rauzy graphs depicted in Figure 1. In the sequel, we make use of Kirchhoff's law for frequencies (Observation 2.5), of the fact that symmetries preserve factor frequencies, and of the formula from Proposition 2.4.

Figure 1. Reduced Rauzy graphs of $u$ of order $n \in \{1, 2, 5\}$.

(1) $\tilde{\Gamma}_1$:
$\rho(0) = \rho(1) = \frac{\sqrt{3} - 1}{2} = \frac{\sqrt{3} + 1}{2\lambda},$
$\rho(130) = \rho(021) = \rho(2) = \frac{2 - \sqrt{3}}{2} = \frac{1}{2\lambda},$
$\rho(01) = \rho(10) = \rho(0) - \rho(130) = \frac{\sqrt{3}}{2\lambda}.$
In the second row, the first equality follows from the fact that symmetries preserve frequencies and $130 = \theta_2(021)$, and the second equality follows by Corollary 2.6 from the fact that 2 is neither LS nor RS. In the third row, the first equality is again due to symmetries and the second uses Kirchhoff's law for frequencies from Observation 2.5.

(2) $\tilde{\Gamma}_2$:
$\rho(01) = \rho(10) = \frac{\sqrt{3}}{2\lambda},$
$\rho(01301) = \rho(10210) = \rho(130) = \frac{1}{2\lambda},$
$\rho(010) = \rho(101) = \rho(01) - \rho(01301) = \frac{\sqrt{3} - 1}{2\lambda}.$

(3) $\tilde{\Gamma}_5$:
$\rho(01301) = \rho(10210) = \frac{1}{2\lambda},$
$\rho(\varphi(0)10210) = \rho(\varphi(1)01301) = \frac{\rho(0)}{\lambda} = \frac{\sqrt{3} - 1}{2\lambda},$
$\rho(01301301) = \rho(10210210) = \rho(01301) - \rho(\varphi(0)10210) = \frac{2 - \sqrt{3}}{2\lambda} = \frac{1}{2\lambda^2}.$

Putting together Proposition 5.2, properties of reduced Rauzy graphs summarized in Observation 5.3, and the knowledge of frequencies of vertices and edges in $\tilde{\Gamma}_1$, $\tilde{\Gamma}_2$, and $\tilde{\Gamma}_5$, we obtain the following corollary.

Corollary 5.4.

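Let $n \in \mathbb{N}$, $n \geq 1$.
(1) If $\mathcal{L}_n(u)$ contains a BS factor, then there exists $k \in \mathbb{N}$ such that the set $\{\rho(e) \mid e \in \mathcal{L}_{n+1}(u)\}$ is of one of the following forms:
(a) $\{\frac{1}{2}\lambda^{k+1}, \frac{\sqrt{3}}{2}\lambda^{k+1}\}$,
(b) $\{\frac{1}{2}\lambda^{k+1}, \frac{\sqrt{3}-1}{2}\lambda^{k+1}\}$,
(c) $\{\frac{\sqrt{3}-1}{2}\lambda^{k+1}, \frac{1}{2}\lambda^{k+2}\}$.
(2) If $\mathcal{L}_n(u)$ does not contain a BS factor, then there exists $k \in \mathbb{N}$ such that the set $\{\rho(e) \mid e \in \mathcal{L}_{n+1}(u)\}$ is of one of the following forms:
(a) $\{\frac{\sqrt{3}-1}{2}\lambda^{k}, \frac{1}{2}\lambda^{k+1}, \frac{\sqrt{3}}{2}\lambda^{k+1}\}$,
(b) $\{\frac{\sqrt{3}}{2}\lambda^{k+1}, \frac{1}{2}\lambda^{k+1}, \frac{\sqrt{3}-1}{2}\lambda^{k+1}\}$,
(c) $\{\frac{1}{2}\lambda^{k+1}, \frac{\sqrt{3}-1}{2}\lambda^{k+1}, \frac{1}{2}\lambda^{k+2}\}$.

A direct consequence of the previous corollary is the optimality of the upper bound from Theorem 4.1.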
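
The optimality claim amounts to a short piece of arithmetic, which the following sketch carries out exactly. It assumes that the upper bound of Theorem 4.1 (stated outside this section) has the form $\frac{1}{\#G}\left(4\Delta\mathcal{C}(n) + \#G - X - Y\right)$, and that $\#G = 4$ and $\Delta\mathcal{C}(n) = 2$ for the word $u$; the function name `symmetry_bound` is hypothetical.

```python
from fractions import Fraction


def symmetry_bound(delta_c: int, group_order: int, x: int, y: int) -> Fraction:
    """Evaluate (4*delta_c + group_order - x - y) / group_order exactly.

    ASSUMPTION: this is the assumed shape of the bound in Theorem 4.1.
    """
    return Fraction(4 * delta_c + group_order - x - y, group_order)


# ASSUMPTION: parameters of the example word u.
GROUP_ORDER = 4  # assumed order of the group of symmetries G
DELTA_C = 2      # assumed first difference of factor complexity

# No BS factor of length n: X = Y = 0.
bound_without_bs = symmetry_bound(DELTA_C, GROUP_ORDER, x=0, y=0)

# Two BS factors of length n, both counted in Y: X = Y = 2.
bound_with_bs = symmetry_bound(DELTA_C, GROUP_ORDER, x=2, y=2)

print(bound_without_bs, bound_with_bs)
```

Under these assumptions the two values coincide with the sizes of the frequency sets listed in Corollary 5.4 (three values without a BS factor, two values with one), which is what optimality of the bound means here.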

Proposition 5.5.

Let $u$ be the fixed point of $\varphi$ defined in (13). Then for every $n \in \mathbb{N}$, $n \geq 1$, it holds that
$\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}(u)\} = \frac{1}{\#G}\left(4\Delta\mathcal{C}(n) + \#G - X - Y\right)$,
where $X$ is the number of BS factors of length $n$ and $Y$ is the number of BS factors of length $n$ that are $\theta_1$- or $\theta_2$-palindromes.

Proof. First, consider $n$ such that $\mathcal{L}_n(u)$ does not contain a BS factor. Then, on one hand, Corollary 5.4 states that $\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}(u)\} = 3$. On the other hand, $\frac{1}{\#G}\left(4\Delta\mathcal{C}(n) + \#G - X - Y\right) = \frac{1}{4}\left(4 \cdot 2 + 4 - 0 - 0\right) = 3$. Second, let $\mathcal{L}_n(u)$ contain a BS factor. Then, on one hand, we have by Corollary 5.4 that $\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}(u)\} = 2$. On the other hand, by (1) of Observation 5.3, $\mathcal{L}_n(u)$ contains 2 BS factors, and by Corollary 5.1, one BS factor is a $\theta$-palindrome and one BS factor is a $\theta$-palindrome, thus $\frac{1}{\#G}\left(4\Delta\mathcal{C}(n) + \#G - X - Y\right) = \frac{1}{4}\left(4 \cdot 2 + 4 - 2 - 2\right) = 2$. $\square$

Remark 5.6.

There are also infinite words whose language is invariant under all elements of a finite group of symmetries but for which the upper bound from Theorem 4.1 is not reached for any $n \in \mathbb{N}$. Such an example is the famous Thue-Morse word. Its group of symmetries is $G = \{\mathrm{Id}, R, \Psi, \Psi \circ R\}$, where $\Psi$ is the morphism acting on $\{0, 1\}$ as follows: $\Psi : 0 \to 1,\ 1 \to 0$. As shown by Dekking [8], the Thue-Morse word $u_{TM}$ satisfies for $n \in \mathbb{N}$, $n \geq$ , $\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}(u_{TM})\} =$ { if $u_{TM}$ contains a BS factor of length $n$, otherwise. But the upper bound from Theorem 4.1 is of the following form for $n \in \mathbb{N}$, $n \geq$ , or if $u_{TM}$ contains a BS factor of length $n$, or otherwise.

Acknowledgement

I would like to thank E. Pelantová and Š. Starosta for careful reviewing and useful remarks. I acknowledge financial support by the Czech Science Foundation grant 201/09/0584 and by the grants MSM6840770039 and LC06002 of the Ministry of Education, Youth, and Sports of the Czech Republic.

References

[1] J.-P. Allouche, M. Baake, J. Cassaigne, D. Damanik, Palindrome complexity, Theoret. Comput. Sci. (2003), 9–31
[2] L. Balková, E. Pelantová, A Note on Symmetries in the Rauzy Graph and Factor Frequencies, Theoret. Comput. Sci. (2009), 2779–2783
[3] P. Baláži, Z. Masáková, E. Pelantová, Factor versus palindromic complexity of uniformly recurrent infinite words, Theoret. Comput. Sci. (2007), 266–275
[4] V. Berthé, Fréquences des facteurs des suites sturmiennes, Theoret. Comput. Sci. (1996), 295–309
[5] M. Boshernitzan, A condition for unique ergodicity of minimal symbolic flows, Ergodic Theory Dynam. Systems (1992), 425–428
[6] J. Cassaigne, Special factors of sequences with linear subword complexity, Developments in language theory, II (Magdeburg, 1995), World Sci. Publishing, Singapore (1996), 25–34
[7] J. Cassaigne, Complexité et facteurs spéciaux [Complexity and special factors], Journées Montoises (Mons, 1994), Bull. Belg. Math. Soc. Simon Stevin (1997), 67–88
[8] M. Dekking, On the Thue-Morse measure, Acta Univ. Carolin. Math. Phys. (1992), 35–40
[9] S. Ferenczi, L. Zamboni, Languages of k-interval exchange transformations, Bull. Lond. Math. Soc. 40 (2008), 705–714
[10] A. Frid, On the frequency of factors in a D0L word, Journal of Automata, Languages and Combinatorics (1998), 29–41
[11] A. Glen, J. Justin, S. Widmer, L. Q. Zamboni, Palindromic richness, Eur. J. Comb. (2009), 510–531
[12] E. Pelantová, Š. Starosta, Infinite words rich and almost rich in generalized palindromes, to appear in Proceedings of DLT 2011, Milano, arXiv:1102.4023v1 [math.CO]
[13] E. Pelantová, Š. Starosta, Languages invariant under more symmetries: overlapping factors versus palindromic richness, arXiv:1103.4051v1 [math.CO]
[14] M. Queffélec, Substitution dynamical systems - Spectral analysis, in Lecture Notes in Math., 1987
[15] Š. Starosta, On θ-palindromic richness, Theoret. Comput. Sci. (2011), 1111–1121
[16] N. Wozny, L. Q. Zamboni, Frequencies of factors in Arnoux-Rauzy sequences, Acta Arith. XCVI.3