A direct proof of Agafonov's theorem and an extension to shifts of finite type
Olivier Carton

May 14, 2020
Abstract
We provide a direct proof of Agafonov's theorem, which states that finite-state selection preserves normality. We also extend this result to the more general setting of shifts of finite type by defining selections which are compatible with the shift. A slightly more general statement is obtained, as we show that any Markov measure is preserved by finite-state compatible selection.
Normality was introduced by Borel in [5] more than one hundred years ago to formalize the most basic form of randomness for real numbers. A number is normal to a given integer base if its expansion in that base is such that all blocks of digits of the same length occur in it with the same limiting frequency.

Although normality is a purely combinatorial property, it has close links with finite-state machines. A fundamental theorem relates normality and finite automata: an infinite sequence is normal to a given alphabet if and only if it cannot be compressed by lossless finite transducers, which are deterministic finite automata with injective input-output behaviour. This result was first obtained by joining a theorem by Schnorr and Stimm [16] with a theorem by Dai, Lathrop, Lutz and Mayordomo [9]; Becher and Heiber gave a direct proof in [4]. Another astonishing result is Agafonov's theorem stating that selecting symbols in a normal sequence using a finite-state machine preserves normality [1]. Agafonov's publication [1] does not really include the proof, but O'Connor [13] provided one using predictors defined from finite automata, and Broglio and Liardet [7] generalized it to arbitrary alphabets. Later, Becher and Heiber gave another proof based on the characterization of normality by non-compressibility by lossless finite transducers [4]. In this paper, we provide a direct proof of Agafonov's theorem. The proof is almost elementary but it still relies on Markov chain arguments.

The notion of normality has been extended to broader contexts like the one of dynamical systems, and especially shifts of finite type [12]. When sofic shifts are irreducible and aperiodic, they have a measure of maximal entropy, and a sequence is then said to be normal if the frequency of each block equals its measure.
This extension to shifts meets the original aim of normality, namely studying expansions of numbers in bases, when the shift arises from a numeration system, like the β-shifts coming from the numeration in a non-integer base β. Normality can again be interpreted as the good distribution of the blocks of digits in the expansion of a number in base β. In this paper, we extend Agafonov's theorem to the setting of shifts of finite type. More precisely, we show that genericity for a Markov measure is preserved by selection with finite-state machines if the machines satisfy some compatibility condition with the measure. This result includes the case of shifts of finite type as their Parry measure is Markovian.

The paper is organized as follows. Section 2 is devoted to notation and main definitions. The link between selection and special finite-state machines called selectors is given in Section 3. Agafonov's theorem is stated and proved in Section 4. The extension of the theorem to Markov measures is given in Section 5. Note that the proof given in that section subsumes the one given in the previous one. We keep both proofs since we think that the one in Section 4 is a nice preparation for the reader to the one in Section 5.

We write N for the set of all non-negative integers. Let A be a finite set. We let A* and A^N respectively denote the sets of all finite and infinite sequences over the alphabet A. Similarly, A^k stands for the set of sequences of length k. Finite sequences are also called words. The empty word is denoted by λ and the length of a word w is denoted |w|. The positions in finite and infinite words are numbered starting from 1. For a word w and positions 1 ≤ i ≤ j ≤ |w|, we let w[i] and w[i:j] denote respectively the symbol a_i at position i and the word a_i a_{i+1} ··· a_j from position i to position j. A word of the form w[i:j] is called a block of w.
A word u is a prefix (respectively suffix) of a word w, denoted u ⊑ w, if w = uv (respectively w = vu) for some word v. For any finite set S we denote its cardinality by #S. We write log for the base-2 logarithm.

In this article we are going to work on shift spaces, in particular shifts of finite type (SFT). Let A be a given alphabet. The full shift is the set A^N of all (one-sided) infinite sequences (x_n)_{n>0} of symbols in A. The shift σ is the function from A^N to A^N which maps each sequence (x_n)_{n>0} to the sequence (x_{n+1})_{n>0} obtained by removing the first symbol.

A shift space of A^N, or simply a shift, is a subset X of A^N which is closed for the product topology and invariant under the shift operator, that is σ(X) = X. Let F ⊆ A* be a set of finite words called forbidden blocks. The shift X_F is the subset of A^N made of the sequences without any occurrence of a block in F. More formally, it is the set

  X_F = {x : x[m:n] ∉ F for all 1 ≤ m ≤ n}.

It is well known that a shift X is characterized by its forbidden blocks, that is X = X_F for some set F ⊆ A*. The shift X is said to be of finite type if X = X_F for some finite set F of forbidden blocks [11, Def. 2.1.1]. Up to a change of alphabet, any shift space of finite type is the same as a shift space X_F where every forbidden block has length 2, that is F ⊆ A². For simplicity, we always assume that each forbidden block has length 2. In that case, the set F is given by an A × A-matrix P = (p_{ab})_{a,b∈A} where p_{ab} = 0 if ab ∈ F and p_{ab} > 0 otherwise, and we write X = X_P. The shift X is called irreducible if the graph induced by the matrix P is strongly connected, that is, for all symbols a, b ∈ A, there exists an integer n (depending on a and b) such that P^n_{ab} > 0. The shift X is called irreducible and aperiodic if there exists an integer n such that P^n_{ab} > 0 for all a, b ∈ A.

Example 1 (Golden mean shift). The golden mean shift is the shift space X_F ⊆ {0,1}^N where the set of forbidden blocks is F = {11}. It is made of all sequences over {0,1} with no two consecutive 1s. This subshift is also equal to X_M where the matrix M is given by

  M = ( 1 1
        1 0 ).

Let x = a_1 a_2 a_3 ··· be a sequence over the alphabet A. Let L ⊆ A* be a set of finite words over A. The word obtained by oblivious prefix selection of x by L is x↾L = a_{i_1} a_{i_2} a_{i_3} ··· where i_1 < i_2 < i_3 < ··· is the enumeration in increasing order of all the integers i such that the prefix a_1 a_2 ··· a_{i−1} belongs to L. This selection rule is called oblivious because the symbol a_i is not included in the considered prefix. If L = A*1, then x↾L is made of all the symbols of x occurring after a 1, in the same order as they occur in x.

A probability measure on A* is a function µ : A* → [0,1] such that µ(λ) = 1 and

  Σ_{a∈A} µ(wa) = µ(w)

holds for each word w ∈ A*. The simplest example of a probability measure is a Bernoulli measure. It is a monoid morphism from A* to [0,1] (endowed with multiplication) such that Σ_{a∈A} µ(a) = 1. Among the Bernoulli measures is the uniform measure which maps each word w ∈ A* to (#A)^{−|w|}. In particular, each symbol a is mapped to µ(a) = 1/#A.

By the Carathéodory extension theorem, a measure µ on A* can be uniquely extended to a probability measure µ̂ on A^N such that µ̂(wA^N) = µ(w) holds for each word w ∈ A*. In the rest of the paper, we use the same symbol for µ and µ̂. A probability measure µ is said to be (shift) invariant if the equality

  Σ_{a∈A} µ(aw) = µ(w)

holds for each word w ∈ A*.

We now recall the definition of Markov measures. For a stochastic matrix P and a stationary distribution π, that is a row vector such that πP = π, the Markov measure µ_{π,P} is the invariant measure defined by the following formula [10, Lemma 6.2.1]:

  µ_{π,P}(a_1 a_2 ··· a_k) = π_{a_1} P_{a_1 a_2} ··· P_{a_{k−1} a_k}.

A measure µ is compatible with a shift X_F if it only puts weight on blocks of X, that is, µ(w) > 0 implies w ∉ F for each word w. For a shift of finite type, there is a unique compatible measure with maximal entropy [10, Thm. 6.2.20]. This measure is called the Parry measure and it is a Markov measure. It can be explicitly given as follows. The Parry measure of an SFT X_M is the Markov measure given by the stochastic matrix P = (P_{i,j}) where P_{i,j} = M_{i,j} r_j/(θ r_i) and the stationary probability distribution π defined by π_i = l_i r_i, where θ is the Perron eigenvalue of the matrix M and the vectors l and r are respectively the left and right eigenvectors of M for θ, normalized so that Σ_{i=1}^k l_i r_i = 1.

Example 2 (Parry measure of the golden mean shift). Consider again the golden mean shift X. Its Parry measure is the Markov measure µ_{π,P} where π is the distribution π = (λ²/(1+λ²), 1/(1+λ²)) and P is the stochastic matrix

  P = ( 1/λ  1/λ²
        1    0   ),

where λ is the golden mean.
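The formulas of Example 2 are easy to check numerically. The following is a small sketch (the encoding of states as the symbols 0 and 1 and all names are ours): it verifies that P is stochastic, that π is stationary, and that the resulting Markov measure vanishes on the forbidden block 11.

```python
# A numerical check of Example 2 (our own sketch; states are encoded as
# the symbols 0 and 1).  P is the stochastic matrix of the Parry measure
# of the golden mean shift and pi its stationary distribution.
from math import sqrt

lam = (1 + sqrt(5)) / 2                       # golden mean, Perron eigenvalue of M
P = [[1/lam, 1/lam**2],                       # rows sum to 1: 1/lam + 1/lam^2 = 1
     [1.0,   0.0]]
pi = [lam**2/(1 + lam**2), 1/(1 + lam**2)]    # stationary: pi P = pi

assert all(abs(sum(row) - 1) < 1e-12 for row in P)
piP = [sum(pi[i]*P[i][j] for i in range(2)) for j in range(2)]
assert all(abs(piP[j] - pi[j]) < 1e-12 for j in range(2))

def mu(word):
    """Markov measure of a block: pi_{a_1} P_{a_1 a_2} ... P_{a_{k-1} a_k}."""
    m = pi[int(word[0])]
    for a, b in zip(word, word[1:]):
        m *= P[int(a)][int(b)]
    return m

assert mu("11") == 0.0    # the forbidden block 11 has measure zero
```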
Conversely, the support of an invariant measure µ is the shift X_µ = X_F where F is the set of words of measure zero, that is F = {w : µ(w) = 0}. If µ is the Markov measure µ_{π,P}, then its support X_µ is a shift of finite type because it is equal to the shift X_P given by the matrix P.

We recall here the notion of normality and the notion of genericity. We start with the notation for the number of occurrences of a given word u within another word w. For two words u and w, the number |w|_u of occurrences of u in w is given by |w|_u = #{i : w[i : i+|u|−1] = u}. Borel's definition [5] of normality for a sequence x ∈ A^N is that x is normal if for each finite word w ∈ A*,

  lim_{n→∞} |x[1:n]|_w / n = (#A)^{−|w|}.

More generally, x is called generic for a measure µ (or merely µ-generic) if for each word w ∈ A*,

  lim_{n→∞} |x[1:n]|_w / n = µ(w).

Normality is then the special case of genericity when the measure µ is the uniform measure. There are other definitions of normality and genericity taking into account only some occurrences, called aligned occurrences, of each word w. More precisely, the sequence x is factorized x = w_1 w_2 w_3 ··· where |w_i| = |w| for each i > 0, and one requires that #{i ≤ n : w_i = w}/n converges to µ(w) when n goes to infinity for each word w. It is shown in [2] that the two notions coincide as long as the measure µ is Markovian.

In this section, we introduce the automata with output, also known as transducers, which are used to select symbols from a sequence. We consider deterministic transducers computing functions from sequences in a shift X to sequences in a shift Y, that is, for a given input sequence x ∈ X, there is at most one output sequence y ∈ Y. We focus on transducers that operate in real time, that is, they process exactly one input alphabet symbol per transition. We start with the definition of a transducer.

Definition 3. An input deterministic transducer T is a tuple ⟨Q, A, B, δ, I, F⟩, where

• Q is a finite set of states,
• A and B are the input and output alphabets, respectively,
• δ : Q × A → B* × Q is the transition function,
• I ⊆ Q and F ⊆ Q are the sets of initial and final states, respectively.

Input deterministic transducers are also called sequential in the literature [15]. The relation δ(p, a) = (v, q) is written p —a|v→ q and the tuple ⟨p, a, v, q⟩ is then called a transition of the transducer. A finite (respectively infinite) run is a finite (respectively infinite) sequence of consecutive transitions

  q_0 —a_1|v_1→ q_1 —a_2|v_2→ q_2 ··· q_{n−1} —a_n|v_n→ q_n.
Its input and output labels are respectively a_1 ··· a_n and v_1 ··· v_n. A finite run is written q_0 —a_1···a_n | v_1···v_n→ q_n. An infinite run is written q_0 —a_1a_2a_3··· | v_1v_2v_3···→ ∞.

An infinite run is accepting if its first state q_0 is initial. Note that there is no accepting condition. This is due to the fact that we always assume that the domain is a closed subset of A^N. Since transducers are assumed to be input deterministic, there is at most one run with input label x for each x in A^N. If the output label is the infinite sequence y, we write y = T(x). By a slight abuse of notation, we write T(x[m:n]) for the output of T along that run while reading the block x[m:n] of x. We always assume that all transducers are trim: each state occurs in at least one accepting run. Since transducers are input deterministic, the starting state and the input label determine the run and the ending state. For a state p and a word u, we let p ∗ u and p · u denote respectively the run p —u|v→ q and its ending state q.

A selector is a deterministic transducer such that each of its transitions has one of the two types p —a|a→ q (type I) or p —a|λ→ q (type II) for a symbol a ∈ A. In a selector, the output of a transition is either the symbol read by the transition (type I) or the empty word (type II). Therefore, it can always be assumed that the output alphabet B is the same as the input alphabet A. It follows that for each run p —u|v→ q, the output label v is a subword, that is a subsequence, of the input label u.

[Figure 1: a selector which is not oblivious.]

A selector is called oblivious if all transitions starting from a given state have the same type. The selector pictured in Figure 1 is not oblivious but the one pictured in Figure 2 is oblivious. The terminology is justified by the following relation between oblivious prefix selection and selectors. If L ⊆ A* is a rational set, the oblivious prefix selection by L can be performed by an oblivious selector.
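Before turning to selectors, the selection rule itself can be stated in a few lines of code. The following is a minimal sketch (the function name and the sample input are ours), assuming the sequence is given by a finite prefix x and the set L by a membership predicate on words:

```python
# A minimal sketch of oblivious prefix selection (names are ours): the
# symbol at position i is kept exactly when the strict prefix before it
# belongs to L; the selected symbol itself is not part of the tested
# prefix, hence "oblivious".
def oblivious_select(x, in_L):
    return "".join(a for i, a in enumerate(x) if in_L(x[:i]))

# L = A*1: select the symbols occurring just after a 1.
assert oblivious_select("011010", lambda p: p.endswith("1")) == "100"
```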
There is indeed an oblivious selector S such that for each input sequence x, the output S(x) is the result x↾L of the selection by L. This selector S can be obtained from any deterministic automaton A accepting L: replacing each transition p —a→ q of A by either p —a|a→ q if the state p is accepting, or by p —a|λ→ q otherwise, yields the selector S. It can be easily verified that the obtained transducer is an oblivious selector performing the oblivious prefix selection by L. Conversely, each oblivious selector performs the oblivious prefix selection by K where K is the set of words which are the input label of a run from the initial state to a state q such that the transitions starting from q have type I.

The transducer pictured in Figure 2 is an oblivious selector that selects the symbols occurring after a 1. It performs the oblivious prefix selection by L where L is the set A*1.

[Figure 2: an oblivious selector selecting the symbols occurring after a 1.]

We call automaton a transducer where the output labels of the transitions are removed. This means that the transition function δ is then a function from Q × A to Q, where Q is the state set and A the alphabet.

In this section, we consider normality in the full shift. We give an alternative proof of Agafonov's result [1] that finite-state selection preserves normality. This means that if the sequence x is normal and L is a regular set of finite words, then the sequence x↾L is still normal. Since it has been remarked that selecting by a regular set is the same as using an oblivious selector, the result means that if S is an oblivious selector and x is normal, then S(x) is also normal. The following theorem of Agafonov states that oblivious prefix selection by a regular set preserves normality.

Theorem 4 (Agafonov [1]). If x is normal and L is regular, then x↾L is still normal.

The strategy of the proof is the following. We consider an oblivious selector S performing selection by L. This means that if x and y are the input and output labels of a successful run, then y = x↾L.
We then show that if the input label x is a normal sequence, then the output of the run of S is also normal. We fix a state p of S and an integer ℓ. We show that for k large enough, the number of runs of length k starting from p and outputting less than ℓ symbols is negligible. Then we show that for all words w and w′ of length ℓ, the numbers of runs outputting w and w′ are almost the same. Finally, we show that all these runs of length k starting from p have the same frequency in a run whose input is a normal word.

The following lemma shows that the number of runs starting from a state p and outputting a fixed word w is not too large.

Lemma 5.
Let S be an oblivious selector. For each state p of S, each integer n ≥ 0, and each word w ∈ A* such that |w| ≤ n, there are at most (#A)^{n−|w|} runs p —u|v→ q of length n such that w is a prefix of the output label v.

Proof. The proof is carried out by induction on the integer n. If n = 0, the only possible word w is the empty word λ. Since there is only one run of length 0, the inequality is satisfied. We now suppose that n ≥ 1. Since the selector is oblivious, all transitions starting from state p have the same type, either type I or type II.

We first suppose that all transitions starting from state p have type I. Let us write w = aw′ where a is a symbol and w′ a word. Consider the transition p —a|a→ q. All runs starting from p such that w is a prefix of the output label must use this transition as a first transition. Applying the induction hypothesis to q, n − 1 and w′ gives the result.

We now suppose that all transitions starting from state p have type II, that is, have the form p —a|λ→ q_a for each symbol a. This implies that all runs of length n starting from p have an output label of length at most n − 1. If |w| = n, there is no run such that w is a prefix of its output label and the inequality is trivially satisfied. If |w| ≤ n − 1, applying the induction hypothesis to each q_a, n − 1, and w gives that the number of runs starting from q_a such that w is a prefix of their output label is at most (#A)^{n−1−|w|}. Summing up these inequalities for all q_a gives the required inequality for p.

Some of the bounds are obtained using the ergodic theorem for Markov chains [6, Thm 4.1]. For that purpose, we associate a Markov chain M to each strongly connected automaton A. For simplicity, we assume that the state set Q of A is the set {1, ..., #Q}. The state set of the Markov chain is the same set {1, ..., #Q}. The transition matrix of the Markov chain is the matrix P = (p_{i,j})_{1≤i,j≤#Q} where each entry p_{i,j} is equal to #{a : i —a→ j}/#A. Note that #{a : i —a→ j} is the number of transitions from i to j. Since the automaton is assumed to be deterministic and complete, the matrix P is stochastic. If the automaton A is strongly connected, the Markov chain is irreducible and it therefore has a unique stationary distribution π such that πP = π. The vector π is called the distribution of A.

By a slight abuse of notation, we let |p ∗ w|_q denote the number of occurrences of the state q in the finite run p ∗ w. The idea of the following lemma is borrowed from [16].

Lemma 6.
Let A be a strongly connected, deterministic and complete automaton and let π be its distribution. For all real numbers ε, δ > 0, there exists an integer N such that for each integer n ≥ N,

  #{w ∈ A^n : ∃ p, q ∈ Q, | |p ∗ w|_q/n − π_q | > δ} < ε (#A)^n.

Proof.
The proof is a mere application of the ergodic theorem for Markov chains [6, Thm 4.1].

The following corollary is also borrowed from [16].

Corollary 7.
Let A be a deterministic and strongly connected automaton and let π be its distribution. Let ρ be the run of A on a normal sequence x. Then for each state q,

  lim_{n→∞} |ρ[1:n]|_q / n = π_q,

where ρ[1:n] is the finite run made of the first n transitions of ρ.

Proof.
Since Σ_{q∈Q} π_q = 1, it suffices to prove that lim inf_{n→∞} |ρ[1:n]|_q/n ≥ π_q holds for each state q.

Let ε > 0. Applying Lemma 6 with δ = ε provides an integer k such that the set

  B = {w ∈ A^k : ∃ p, | |p ∗ w|_q/k − π_q | > ε}

has cardinality at most ε(#A)^k. The run ρ is then factorized

  ρ = p_0 —w_1→ p_1 —w_2→ p_2 —w_3→ p_3 ··· = (p_0 ∗ w_1)(p_1 ∗ w_2)(p_2 ∗ w_3) ···

where each word w_i has length k and x = w_1 w_2 w_3 ···. Since x is normal, there is, by Theorem 4 in [2], an integer N such that for each n ≥ N the cardinality of the set {i ≤ n : w_i = w} is greater than (1 − ε)n/(#A)^k for each word w ∈ A^k. Then

  lim inf_{n→∞} |ρ[1:n]|_q/n = lim inf_{n→∞} |ρ[1:nk]|_q/(nk)
    = lim inf_{n→∞} (1/(nk)) Σ_{i=1}^n |p_{i−1} ∗ w_i|_q
    ≥ lim inf_{n→∞} (1/(nk)) Σ_{w∈A^k} #{i ≤ n : w_i = w} × min_{p∈Q} |p ∗ w|_q
    ≥ (1/(nk)) Σ_{w∈A^k\B} ((1 − ε)n/(#A)^k)(k(π_q − ε))
    ≥ (1 − ε)²(π_q − ε).

Since this inequality holds for each real number ε > 0, we have proved that lim inf_{n→∞} |ρ[1:n]|_q/n ≥ π_q.

Using the terminology of Markov chains, a strongly connected component (SCC) of an automaton is called recurrent if it cannot be left. This means that there is no transition p —a→ q where p is in that component and q is not. The following lemma is Satz 2.5 in [16].

Lemma 8.
Let A be an automaton and let ρ be a run of A on a normal input sequence. The run ρ reaches a recurrent SCC of A.

Lemma 9.
Let S be a strongly connected selector. For each integer k and each real number ε > 0, there exists an integer N such that for each integer n ≥ N, each state p and each word w of length k, the number of runs p —u|v→ q of length n such that w is a prefix of the output label v is between (1 − ε)(#A)^{n−|w|} and (#A)^{n−|w|}.

Proof. Let p be any state. The upper bound (#A)^{n−|w|} has already been proved in Lemma 5. It remains to prove the lower bound.

Let us fix a state q such that the transitions starting from q are of type I. If no such state exists, all transitions of the selector output the empty word and the output label of any run is empty. Applying Lemma 6 with ε/(#A)^k and δ = π_q/2 provides an integer N_0 such that for each n ≥ N_0, the set

  B = {u ∈ A^n : | |p ∗ u|_q/n − π_q | > π_q/2}

has cardinality at most ε(#A)^{n−k}. Fix now N = max(N_0, 2k/π_q) and let n be such that n ≥ N. If a word u of length n does not belong to B, the run p ∗ u satisfies |p ∗ u|_q ≥ nπ_q/2 > k. This implies that the length of its output label is at least k. Indeed, the state q has at least k + 1 occurrences in the run and each transition starting from q outputs one symbol.

Consider the (#A)^n runs of the form p ∗ u for u of length n. Among these runs, at most ε(#A)^{n−k} of them do not have an output label of length at least k. For each w′ ≠ w of length k, w′ is the prefix of the output label of at most (#A)^{n−k} of them. It follows that w is the prefix of the output label of at least (1 − ε)(#A)^{n−k} of them.

Let A be an automaton with state set Q. We now define an automaton whose states are the runs of length n in A. We let A^n denote the automaton whose state set is {p ∗ w : p ∈ Q, w ∈ A^n} and whose set of transitions is defined by

  {(p ∗ bw) —a→ (q ∗ wa) : p —b→ q in A, a, b ∈ A and w ∈ A^{n−1}}.

The Markov chain associated with the automaton A^n is called the snake Markov chain.
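The snake construction can be illustrated on a toy example. The sketch below (the two-state automaton and all names are ours) builds, by power iteration, the Markov chains of a deterministic complete automaton A and of its snake automaton A^1, and checks on this example that the distribution of A^1 assigns the value π_p/#A to each run p ∗ w:

```python
# Toy illustration of the snake Markov chain (our own example): for a
# deterministic complete automaton, each letter is read with probability
# 1/#A.  The snake automaton A^1 has the runs of length 1 as states, and
# its transitions are (p * b) --a--> (delta(p, b) * a).
A = "01"                                        # alphabet
Q = [0, 1]                                      # states of the automaton
delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 0, (1, '1'): 0}

def stationary(states, step, iters=2000):
    """Power iteration for the chain where each letter has probability 1/#A."""
    v = {s: 1.0/len(states) for s in states}
    for _ in range(iters):
        w = {s: 0.0 for s in states}
        for s, mass in v.items():
            for a in A:
                w[step(s, a)] += mass/len(A)
        v = w
    return v

pi = stationary(Q, lambda q, a: delta[(q, a)])        # distribution of A
snake_states = [(p, b) for p in Q for b in A]          # runs p * b of length 1
xi = stationary(snake_states, lambda s, a: (delta[s], a))

for p, b in snake_states:                              # xi_{p*b} = pi_p / #A
    assert abs(xi[(p, b)] - pi[p]/len(A)) < 1e-9
```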
See Problems 2.2.4, 2.4.6 and 2.5.2 (page 90) in [6] for more details. It is pure routine to check that the distribution ξ of A^n is given by ξ_{p∗w} = π_p/(#A)^n for each state p and each word w of length n.

Proof of Theorem 4.
Let y be the output of the run of S on x. By Lemma 8, the run of S on x reaches a recurrent SCC. Therefore it can be assumed without loss of generality that the selector S is strongly connected.

Let k be a fixed integer. We claim that for each word w of length k,

  lim_{n→∞} |y[1:n]|_w / n = 1/(#A)^k.

With each occurrence of a word w of length k in y, we associate the occurrence of the state q in the run at which starts the transition that outputs the first symbol of w. Note that the transitions starting from q must be of type I. Conversely, with each occurrence in the run of such a state, we associate the block of length k of y starting from that position.

We fix a state p such that the transitions starting from p have type I. We first claim that for each integer n, all runs of length n starting from p have the same frequency in the run. To prove this claim, we apply Corollary 7 to the automaton A^n where A is the automaton obtained by removing the outputs from S.

Let ε > 0. Lemma 9 provides an integer n such that for each word w of length k, the number of runs of length n starting from p and outputting w as their first k symbols is between (1 − ε)(#A)^{n−k} and (#A)^{n−k}. Combining this result with the fact that all these runs of length n have the same frequency, we get that the frequency of each w is between (1 − ε)(#A)^{−k} and (#A)^{−k}. Since this is true for each ε > 0, all words of length k have the same frequency after an occurrence of p. Since this is true for each state p, we get that all words of length k have the same frequency in y.

In this section we extend Agafonov's result to the more general setting of shifts of finite type. In this context, normality is defined through the Parry measure, which is the unique invariant and compatible measure with maximal entropy. A sequence is said to be normal if it is generic for that measure. We actually prove a slightly stronger result by showing that genericity for any Markov measure is preserved by finite-state selection as long as the selection is compatible with the measure. This includes the case of shifts of finite type because their Parry measure is Markovian.

To obtain such a result, the selection must be performed in a way compatible with the measure and its support. This boils down to putting some constraints on the selector to guarantee that if the input sequence is in the support of the measure, then the output sequence is also in that support. Ensuring that the output is still in the support is not enough, as is shown by the following example. Consider the golden mean shift X and the selector pictured in Figure 2. This selector selects the symbols following a 1. If the input sequence x is in X, the sequence y of selected symbols is 0^N = 000··· since x has no two consecutive 1s. Therefore, y is always in X but genericity is lost. To prevent this problematic behaviour, the selector is only allowed to select the next symbol if the last read symbol and the last selected symbol coincide. This restriction rules out the previous selector because it does not satisfy this property.

We suppose that a Markov measure µ = µ_{π,P} is fixed and we let X_µ be its support. We introduce automata and selectors which are compatible with the shift X_µ.
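The failure described above is easy to reproduce (a quick sketch; the encoding and names are ours): in a word of the golden mean shift, every symbol following a 1 is a 0, so this selection always outputs zeros.

```python
# The counterexample above, in code (our own encoding): selecting by
# L = A*1 keeps x[i] exactly when x[i-1] == '1'.  In a word without the
# factor "11", every such symbol is a 0, so the output is 000...
def select_after_one(x):
    return "".join(x[i] for i in range(1, len(x)) if x[i-1] == '1')

x = "0101001010010100101"          # a word of the golden mean shift
assert "11" not in x
assert set(select_after_one(x)) == {"0"}   # only zeros are selected
```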
An automaton A is compatible with X_µ if there exists a function ι from its state set Q to A such that the following condition is fulfilled.

i) If p —a→ q is a transition of A, then P_{ι(p)a} > 0 and ι(q) = a.

The condition implies that all transitions arriving at a given state q have the same label ι(q) and that the label of any path is in the shift X_µ. Such an automaton is called X_µ-complete if for each pair (p, a) such that P_{ι(p)a} > 0, there is a transition p —a→ q for some state q.

We continue by defining selectors which are compatible with X_µ. A selector S is compatible with X_µ if there exist two functions ι and η from its state set Q to the alphabet A such that the following two conditions are fulfilled.

i) If p —a|a→ q is a transition of type I, then P_{ι(p)a} > 0, ι(q) = η(q) = a, and η(p) = ι(p).
ii) If p —a|λ→ q is a transition of type II, then P_{ι(p)a} > 0, ι(q) = a and η(q) = η(p).

The condition η(p) = ι(p) for the transition p —a|a→ q states that the last read and last selected symbols must coincide for the selector to be able to select. The other conditions state that ι(q) is always the last read symbol, and that η(q) is the last selected symbol if there is one and is equal to η(p) otherwise.

[Figure 3: a compatible selector whose states are the words prs where p ∈ {0,1} is the parity of the number of symbols read so far, r ∈ {0,1} is the last read symbol and s ∈ {0,1} is the last selected symbol.]

The two functions ι and η can then be defined by ι(prs) = r and η(prs) = s.

The following theorem states that selection with compatible selectors preserves genericity for Markov measures. The input sequence x must be assumed to be in the shift X_µ because compatible selectors only read sequences from X_µ.

Theorem 10.
Let µ be a Markov measure and let x be a sequence in X_µ which is µ-generic. For each oblivious selector S compatible with X_µ, the output S(x) of S on x belongs to X_µ and is µ-generic.

The previous theorem can be applied to the Parry measure µ of a shift X of finite type because the support of µ is actually X_µ = X.

We start with the definition of the conditional measures induced by µ. For each symbol a ∈ A, we let µ_a denote the conditional measure defined by

  µ_a(a_1 a_2 ··· a_n) = P_{a a_1} P_{a_1 a_2} ··· P_{a_{n−1} a_n}.

Note that the measures µ_a might not be invariant. Since π is the stationary distribution, the measure µ can be recovered from the measures µ_a by the formula µ = Σ_{a∈A} π_a µ_a.

The following lemma shows that the set of runs starting from a state p and outputting a fixed word w is not too large. This is the analogue of Lemma 5 in the context of Markov measures.

Lemma 11.
Let S be an oblivious selector compatible with µ. For each state p of S, each integer n ≥ 0, and each word w ∈ A* such that |w| ≤ n, the inequality

  µ_{ι(p)}({u ∈ A^n : p —u|v→ q and w ⊑ v}) ≤ µ_{η(p)}(w)

holds.

Proof. Let U be the set {u ∈ A^n : p —u|v→ q and w ⊑ v}. The proof is carried out by induction on the integer n. If n = 0, the set U is U = {λ} and w must be the empty word λ. The inequality is then satisfied because both measures are equal to 1. We now suppose that n ≥ 1. Since the selector is oblivious, all transitions starting from state p have the same type, either type I or type II. We distinguish two cases depending on the type of these transitions.

We first suppose that all transitions starting from state p have type I. Let us write w = aw′ where a is a symbol and w′ a word. Consider the transition p —a|a→ p′. The compatibility of S with µ implies that ι(p) = η(p) and ι(p′) = η(p′) = a. All runs starting from p such that w is a prefix of the output label must use this transition as a first transition. Applying the induction hypothesis to p′, n − 1 and w′ gives that µ_a(U′) ≤ µ_a(w′) where U′ = {u ∈ A^{n−1} : p′ —u|v→ q and w′ ⊑ v}. Since U = aU′, the result follows from the equalities µ_{ι(p)}(U) = P_{ι(p)a} µ_a(U′) and µ_{η(p)}(w) = P_{η(p)a} µ_a(w′).

We now suppose that all transitions starting from state p have type II, that is, have the form p —a|λ→ p_a for each symbol a. The compatibility of S with µ implies that ι(p_a) = a and η(p_a) = η(p) for each a ∈ A. All runs of length n starting from p have an output label of length at most n − 1. If |w| = n, there is no run such that w is a prefix of its output label and the inequality is trivially satisfied. If |w| ≤ n − 1, applying the induction hypothesis to each p_a, n − 1 and w gives that µ_a(U_a) ≤ µ_{η(p)}(w) where U_a = {u ∈ A^{n−1} : p_a —u|v→ q and w ⊑ v}. Since U = ∪_{a∈A} aU_a, the result follows from the equality µ_{ι(p)}(U) = Σ_{a∈A} P_{ι(p)a} µ_a(U_a) and from Σ_{a∈A} P_{ι(p)a} µ_{η(p_a)}(w) = µ_{η(p)}(w), which holds since η(p_a) = η(p) and the row ι(p) of P sums to 1.

Some of the bounds are again obtained using the ergodic theorem for Markov chains [6, Thm 4.1]. For that purpose, we associate a Markov chain M to each strongly connected automaton A which is compatible with X_µ and X_µ-complete. This means that there is a function ι from Q to A such that if p —a→ q is a transition, then ι(q) = a. For simplicity, we assume that the state set Q of A is the set {1, ..., #Q}. The state set of the Markov chain is the same set {1, ..., #Q}. The transition matrix of the Markov chain is the matrix P̂ = (P̂_{pq})_{1≤p,q≤#Q} where each entry P̂_{pq} is equal to P_{ι(p)a} = P_{ι(p)ι(q)} if p —a→ q is a transition of A, and to 0 otherwise. Since the automaton is assumed to be deterministic and X_µ-complete, the matrix P̂ is stochastic. If the automaton A is strongly connected, the Markov chain is irreducible and it therefore has a unique stationary distribution π̂ such that π̂P̂ = π̂. The vector π̂ is called the distribution of A. The matrix P̂ and its stationary distribution π̂ define a Markov measure µ̂ = µ_{π̂,P̂} on finite runs of A. The link between the measures µ and µ̂ is that µ̂(p ∗ u) = π̂_p µ_{ι(p)}(u) for each state p and each word u.

Lemma 12.
Let $\mathcal{A}$ be a strongly connected, deterministic and complete automaton and let $\hat{\pi}$ be its distribution. For all real numbers $\varepsilon, \delta > 0$, there exists an integer $N$ such that for each integer $n > N$,
\[
\mu\bigl(\bigl\{u \in A^n : \exists p, q \in Q \ \bigl||p * u|_q/n - \hat{\pi}_q\bigr| > \delta\bigr\}\bigr) < \varepsilon.
\]
The lemma is stated for the measure $\mu$ but the ergodic theorem is valid for any initial distribution. The result therefore also holds for the conditional measures $\mu_a$.

Proof.
The proof is a mere application of the ergodic theorem for Markov chains [6, Thm 4.1].
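As a concrete illustration of the chain $\hat{P}$ and of the ergodic behaviour the lemma relies on, here is a minimal Python sketch. Everything in it (the alphabet, the Markov measure $P$, the three-state automaton and all names) is a made-up toy example, not taken from the paper: it builds $\hat{P}$ from a deterministic complete automaton compatible with the measure, computes the stationary distribution $\hat{\pi}$ by power iteration, and compares it with the state-visit frequencies of a long simulated run.

```python
# Illustrative sketch (made-up example): the Markov chain associated with a
# deterministic, complete, strongly connected automaton compatible with a
# Markov measure, as in the construction before Lemma 12.
import random

A = ['a', 'b']                                   # alphabet
P = {('a', 'a'): 0.5, ('a', 'b'): 0.5,           # transition matrix of the
     ('b', 'a'): 0.25, ('b', 'b'): 0.75}         # Markov measure mu

# Compatibility: every transition entering a state q carries the same symbol
# iota(q).  States 0 and 2 both remember that the last symbol was 'a'
# (two alternating phases), state 1 that it was 'b'.
delta = {(0, 'a'): 2, (0, 'b'): 1,
         (2, 'a'): 0, (2, 'b'): 1,
         (1, 'a'): 0, (1, 'b'): 1}
iota = {0: 'a', 1: 'b', 2: 'a'}

# P_hat[p][q] = P_{iota(p), a} if p --a--> q is a transition, else 0.
states = sorted(iota)
P_hat = [[0.0] * len(states) for _ in states]
for (p, a), q in delta.items():
    P_hat[p][q] += P[(iota[p], a)]

# Stationary distribution pi_hat, i.e. pi_hat * P_hat = pi_hat,
# obtained by power iteration (the chain is irreducible and aperiodic).
pi_hat = [1.0 / len(states)] * len(states)
for _ in range(200):
    pi_hat = [sum(pi_hat[p] * P_hat[p][q] for p in states) for q in states]

# Ergodic theorem in action: state-visit frequencies of a long run of the
# automaton on a random sequence drawn from mu approach pi_hat.
random.seed(0)
n_steps = 100_000
counts, state, sym = [0] * len(states), 0, 'a'
for _ in range(n_steps):
    sym = random.choices(A, weights=[P[(sym, b)] for b in A])[0]
    state = delta[(state, sym)]
    counts[state] += 1
freqs = [c / n_steps for c in counts]
print(pi_hat, freqs)
```

For this toy chain the stationary distribution is exactly $(2/9, 2/3, 1/9)$, and the simulated frequencies land close to it, which is the phenomenon Lemma 12 quantifies uniformly over words.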
Corollary 13.
Let $\mathcal{A}$ be a deterministic and strongly connected automaton and let $\hat{\pi}$ be its distribution. Let $\rho$ be the run of $\mathcal{A}$ on a $\mu$-generic sequence $x$. Then for each state $q$,
\[
\lim_{n \to \infty} \frac{|\rho[1{:}n]|_q}{n} = \hat{\pi}_q,
\]
where $\rho[1{:}n]$ is the finite run made of the first $n$ transitions of $\rho$.

Proof.
Since $\sum_{q \in Q} \hat{\pi}_q = 1$, it suffices to prove that $\liminf_{n \to \infty} |\rho[1{:}n]|_q/n \ge \hat{\pi}_q$ holds for each state $q$.

Let $\varepsilon > 0$. Applying Lemma 12 with $\delta = \varepsilon$ provides an integer $k$ such that the set $B = \bigl\{u \in A^k : \exists p \ \bigl||p * u|_q/k - \hat{\pi}_q\bigr| > \varepsilon\bigr\}$ satisfies $\mu(B) < \varepsilon$. The run $\rho$ is then factorized as
\[
\rho = p_0 \xrightarrow{u_0} p_1 \xrightarrow{u_1} p_2 \xrightarrow{u_2} p_3 \cdots = (p_0 * u_0)(p_1 * u_1)(p_2 * u_2) \cdots
\]
where each word $u_i$ has length $k$ and $x = u_0u_1u_2\cdots$. Since $x$ is $\mu$-generic, there is an integer $N$ such that for each $n > N$ the cardinality of the set $\{i < n : u_i = u\}$ is greater than $(1-\varepsilon)n\mu(u)$ for each word $u \in A^k$. Moreover, for each word $u \notin B$, the run $p * u$ satisfies $|p * u|_q \ge k(\hat{\pi}_q - \varepsilon)$ for every state $p$. For each $n > N$, this yields
\[
\frac{|\rho[1{:}nk]|_q}{nk} = \frac{1}{nk}\sum_{i=0}^{n-1}|p_i * u_i|_q
\ge \frac{1}{nk}\sum_{u \in A^k}\#\{i < n : u_i = u\} \times \min_{p \in Q}|p * u|_q
\ge \frac{1}{nk}\sum_{u \in A^k \setminus B}\bigl((1-\varepsilon)n\mu(u)\bigr)\bigl(k(\hat{\pi}_q - \varepsilon)\bigr)
\ge (1-\varepsilon)^2(\hat{\pi}_q - \varepsilon),
\]
where the last inequality uses $\sum_{u \notin B}\mu(u) > 1 - \varepsilon$. Since $\liminf_{n \to \infty}|\rho[1{:}n]|_q/n = \liminf_{n \to \infty}|\rho[1{:}nk]|_q/nk$ and the inequality above holds for each real number $\varepsilon > 0$, we have proved that $\liminf_{n \to \infty} |\rho[1{:}n]|_q/n \ge \hat{\pi}_q$.

Lemma 14.
Let $\mathcal{A}$ be an automaton compatible with $\mu$ and let $\rho$ be a run in $\mathcal{A}$ on a $\mu$-generic sequence in $X_\mu$. The run $\rho$ reaches a recurrent strongly connected component of $\mathcal{A}$.

Proof. We claim that for each SCC $C$ which is not recurrent, there exist a symbol $a$ and a word $w = aw'$ with $\mu(w) > 0$ such that, from any state $q$ in $C$ with $P_{\iota(q)a} > 0$, the run $q * w$ leaves $C$.

We fix a symbol $a$. Let $\{q_1, \ldots, q_n\}$ be the set of states $q$ in $C$ such that $P_{\iota(q)a} > 0$. We construct a sequence $w_0, w_1, \ldots, w_n$ of words such that if $i \le j$, then the run $q_i * w_j$ leaves $C$. We set $w_0 = \lambda$, for which the statement is trivially true. Suppose that $w_0, \ldots, w_k$ have already been chosen and consider the state $p_k = q_{k+1} \cdot w_k$. If this state $p_k$ is already out of $C$, we set $w_{k+1} = w_k$. Otherwise, since $C$ is not recurrent, there is a word $v_k$ such that $p_k \cdot v_k$ is out of $C$: we set $w_{k+1} = w_k v_k$, so that $q_{k+1} \cdot w_{k+1} = p_k \cdot v_k$ is out of $C$. Note that a run which leaves $C$ never returns to it, so extending $w_k$ preserves the property for the indices $i \le k$.

The run $\rho$ reaches a last SCC $C$. Suppose by contradiction that $C$ is not recurrent. By the previous claim there is a word $w = aw'$ such that $\mu(w) > 0$ and the run $q * w$ leaves $C$ for each state $q$ in $C$ with $P_{\iota(q)a} > 0$. Since $x$ is $\mu$-generic, the word $w$ occurs infinitely often in $x$. Let $q$ be a state of $C$ reached by the run $\rho$ just before an occurrence of $w$. Since $x$ is in $X_\mu$, $P_{\iota(q)a} > 0$ and the run $q * w$ leaves $C$, contradicting the fact that $C$ is the last SCC reached by $\rho$.

Lemma 15.
Let $S$ be a strongly connected selector. For each integer $k$ and each real number $\varepsilon > 0$, there exists an integer $N$ such that for each integer $n > N$, each state $p$ and each word $w$ of length $k$, the inequalities
\[
(1-\varepsilon)\,\mu_{\eta(p)}(w) \le \mu_{\iota(p)}\bigl(\{u \in A^n : p \xrightarrow{u|v} q \text{ and } w \sqsubseteq v\}\bigr) \le \mu_{\eta(p)}(w)
\]
hold.

Proof. Let $p$ be any state. The upper bound $\mu_{\eta(p)}(w)$ has already been proved in Lemma 11. It remains to prove the lower bound.

Let us fix a state $q$ such that the transitions starting from $q$ are of type I. If no such state exists, all transitions of the selector output the empty word and the output label of any run is empty. Applying Lemma 12 with $\varepsilon\mu_{\eta(p)}(w)$ and $\delta = \hat{\pi}_q/2$ provides an integer $N_0$ such that for each $n > N_0$,
\[
\mu_{\iota(p)}\bigl(\bigl\{u \in A^n : \bigl||p * u|_q/n - \hat{\pi}_q\bigr| > \hat{\pi}_q/2\bigr\}\bigr) < \varepsilon\mu_{\eta(p)}(w).
\]
Fix now $N = \max(N_0, 2k/\hat{\pi}_q)$ and let $n$ be such that $n > N$. If a word $u$ of length $n$ does not belong to the small set above, the run $p * u$ satisfies $|p * u|_q > n\hat{\pi}_q/2 > k$ for each state $p$. This implies that the length of its output label is greater than $k$. Indeed, the state $q$ has at least $k+1$ occurrences in the run and each transition starting from $q$ outputs one symbol.

Consider the $(\#A)^n$ runs of the form $p * u$ for $u$ of length $n$. The measure of those having an output label shorter than $k$ is less than $\varepsilon\mu_{\eta(p)}(w)$. For each word $w' \ne w$ of length $k$, the measure of those having $w'$ as the prefix of length $k$ of their output label is at most $\mu_{\eta(p)}(w')$. It follows that the measure of those having $w$ as the prefix of length $k$ of their output label is at least $1 - \varepsilon\mu_{\eta(p)}(w) - (1 - \mu_{\eta(p)}(w)) = (1-\varepsilon)\,\mu_{\eta(p)}(w)$.

Let $\mathcal{A}$ be an automaton with state set $Q$. We now define an automaton whose states are the runs of length $n$ in $\mathcal{A}$.
We let $\mathcal{A}_n$ denote the automaton whose state set is $\{p * u : p \in Q, u \in A^n\}$ and whose set of transitions is defined by
\[
\bigl\{(p * bu) \xrightarrow{a} (q * ua) : p \xrightarrow{b} q \text{ in } \mathcal{A},\ a, b \in A \text{ and } u \in A^{n-1}\bigr\}.
\]
The Markov chain associated with the automaton $\mathcal{A}_n$ is called the snake Markov chain. See Exercises 2.2.4, 2.4.6 and 2.5.2 in [6] for more details. It is pure routine to check that the distribution $\hat{\xi}$ of $\mathcal{A}_n$ is given by $\hat{\xi}_{p*w} = \hat{\pi}_p\,\mu_{\iota(p)}(w)$ for each state $p$ and each word $w$ of length $n$.

Proof of Theorem 10. Let $y$ be the output of the run of $S$ on $x$. By Lemma 14, the run of $S$ on $x$ reaches a recurrent strongly connected component. It can therefore be assumed without loss of generality that the selector $S$ is strongly connected.

Let $k$ be a fixed integer. We claim that for each word $w$ of length $k$, $\lim_{n \to \infty} |y[1{:}n]|_w/n = \mu(w)$. With each occurrence of a word $w$ of length $k$ in $y$, we associate the occurrence of the state $q$ in the run from which starts the transition that outputs the first symbol of $w$. Note that the transitions starting from $q$ must be of type I. Conversely, with each occurrence in the run of such a state, we associate the block of length $k$ of $y$ starting from that position.

We fix a state $p$ such that the transitions starting from $p$ have type I. We first claim that for each integer $n$, each run $p * u$ of length $n$ starting from $p$ occurs after the state $p$ with frequency $\mu_{\iota(p)}(u)$. To prove this claim, we apply Corollary 13 to the automaton $\mathcal{A}_n$, where $\mathcal{A}$ is the automaton obtained by removing the outputs from $S$.

Let $\varepsilon > 0$. By Lemma 15, there is an integer $n$ such that for each word $w$ of length $k$, the measure $\mu_{\iota(p)}$ of all runs starting from $p$ outputting $w$ as their first $k$ symbols is between $(1-\varepsilon)\mu_{\eta(p)}(w)$ and $\mu_{\eta(p)}(w)$. Combining this result with the fact that each run $p * u$ of length $n$ occurs after the state $p$ with a frequency equal to $\mu_{\iota(p)}(u)$, we get that the frequency of each word $w$ after the state $p$ is between $(1-\varepsilon)\mu_{\eta(p)}(w)$ and $\mu_{\eta(p)}(w)$. Since this is true for each $\varepsilon > 0$, each word of length $k$ has a frequency after the state $p$ equal to its measure $\mu_{\eta(p)}(w)$. Since this is true for each state $p$, each word $w$ of length $k$ has frequency $\mu(w)$ in $y$.

Conclusion
As a conclusion, we would like to mention a few extensions of our results. Agafonov's theorem deals with prefix selection: a given digit is selected if the prefix of the word up to that digit belongs to a fixed set of finite words.
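Prefix selection is easy to make concrete. The following Python fragment is an illustrative sketch only (the DFA, the helper `select` and the input are made-up examples, not taken from the paper): the digit $x_{n+1}$ is selected exactly when the prefix $x_1 \cdots x_n$ is accepted by a finite automaton, here one accepting the words that end with the digit 1.

```python
# Made-up sketch of oblivious prefix selection by a deterministic automaton:
# digit x[n] is kept iff the prefix x[:n] read so far is accepted.
def select(x, delta, start, accepting):
    state, out = start, []
    for b in x:
        if state in accepting:   # prefix before b is in the fixed regular set
            out.append(b)        # so b is selected (obliviously: b itself is
        state = delta[(state, b)]  # not part of the tested prefix)
    return ''.join(out)

# DFA over {0,1} accepting exactly the nonempty words ending with 1.
delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 0, (1, '1'): 1}
x = '11010111001'
y = select(x, delta, 0, {1})
print(y)  # the digits of x that follow an occurrence of 1
```

Agafonov's theorem asserts that when $x$ is normal, the selected sequence $y$ is again normal; the sketch only shows the selection mechanism itself, and this particular rule (select the digit following each 1) is the classical example of a normality-preserving prefix selection.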
Suffix selection is defined similarly: a given digit is selected if the suffix of the word from that digit belongs to a fixed set of sequences. It has been shown in [3] that suffix selection also preserves normality as long as the fixed set of sequences is regular. Let us recall that a set of sequences is regular if it can be accepted by a non-deterministic Büchi or by a deterministic Muller automaton [14]. The proof given in [3] is based on the characterization of normality by non-compressibility. The proof techniques developed here to prove Agafonov's theorem can be adapted to also prove directly the result about suffix selection.

The prefix and suffix selections considered so far are usually called oblivious because the digit to be selected is not included in either the prefix or the suffix taken into account. Non-oblivious selection does not in general preserve normality, but it does for a restricted class of sets of finite words called group languages [8]. Group languages are sets of words which are accepted by deterministic automata in which each symbol induces a permutation of the states. This latter property means that for each symbol $a$, the function which maps each state $p$ to the state $q$ such that $p \xrightarrow{a} q$ is a permutation of the state set. The techniques presented in this paper can also be adapted to prove such a result.

References

[1] V. N. Agafonov. Normal sequences and finite automata. Soviet Mathematics Doklady, 9:324–325, 1968.
[2] N. Álvarez and O. Carton. On normality in shifts of finite type. CoRR, abs/1807.07208, 2018.
[3] V. Becher, O. Carton, and P. A. Heiber. Normality and automata. Journal of Computer and System Sciences, 81(8):1592–1613, 2015.
[4] V. Becher and P. A. Heiber. Normal numbers and finite automata. Theoretical Computer Science, 477:109–116, 2013.
[5] É. Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo, 27:247–271, 1909.
[6] P. Brémaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 2008.
[7] A. Broglio and P. Liardet. Predictions with automata. Symbolic dynamics and its applications. Contemporary Mathematics, 135:111–124, 1992. Also in Proceedings AMS Conference in honor of R. L. Adler, New Haven CT, USA, 1991.
[8] O. Carton and J. Vandehey. Preservation of normality by non-oblivious group selection. Theory of Computing Systems, 2020.
[9] J. Dai, J. Lathrop, J. Lutz, and E. Mayordomo. Finite-state dimension. Theoretical Computer Science, 310:1–33, 2004.
[10] B. P. Kitchens. Symbolic Dynamics. Springer, 1998.
[11] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1992.
[12] M. Madritsch. Normal numbers and symbolic dynamics. In Sequences, chapter 8. Cambridge University Press, 2018.
[13] M. G. O'Connor. An unpredictability approach to finite-state randomness. Journal of Computer and System Sciences, 37(3):324–336, 1988.
[14] D. Perrin and J.-É. Pin. Infinite Words. Elsevier, 2004.
[15] J. Sakarovitch. Elements of Automata Theory. Cambridge University Press, 2009.
[16] C. P. Schnorr and H. Stimm. Endliche Automaten und Zufallsfolgen. Acta Informatica, 1:345–359, 1971.