A direct proof of Agafonov's theorem and an extension to shifts of finite type
Olivier Carton

May 14, 2020
Abstract
We provide a direct proof of Agafonov's theorem, which states that finite-state selection preserves normality. We also extend this result to the more general setting of shifts of finite type by defining selections which are compatible with the shift. A slightly more general statement is obtained, as we show that any Markov measure is preserved by finite-state compatible selection.
Normality was introduced by Borel in [5] more than one hundred years ago to formalize the most basic form of randomness for real numbers. A number is normal to a given integer base if its expansion in that base is such that all blocks of digits of the same length occur in it with the same limiting frequency.

Although normality is a purely combinatorial property, it has close links with finite-state machines. A fundamental theorem relates normality and finite automata: an infinite sequence is normal to a given alphabet if and only if it cannot be compressed by lossless finite transducers, which are deterministic finite automata with injective input-output behaviour. This result was first obtained by joining a theorem by Schnorr and Stimm [16] with a theorem by Dai, Lathrop, Lutz and Mayordomo [9]; Becher and Heiber gave a direct proof in [4]. Another astonishing result is Agafonov's theorem stating that selecting symbols in a normal sequence using a finite-state machine preserves normality [1]. Agafonov's publication [1] does not really include the proof, but O'Connor [13] provided one using predictors defined from finite automata, and Broglio and Liardet [7] generalized it to arbitrary alphabets. Later, Becher and Heiber gave another proof based on the characterization of normality by non-compressibility by lossless finite transducers [4]. In this paper, we provide a direct proof of Agafonov's theorem. The proof is almost elementary but it still relies on Markov chain arguments.

The notion of normality has been extended to broader contexts like the one of dynamical systems, and especially shifts of finite type [12]. When sofic shifts are irreducible and aperiodic, they have a measure of maximal entropy, and a sequence is then said to be normal if the frequency of each block equals its measure.
This extension to shifts meets the original aim of normality, namely studying expansions of numbers in bases, when the shift arises from a numeration system, like the β-shifts coming from the numeration in a non-integer base β. Normality can again be interpreted as the good distribution of the blocks of digits in the expansion of a number in base β. In this paper, we extend Agafonov's theorem to the setting of shifts of finite type. More precisely, we show that genericity for a Markov measure is preserved by selection with finite-state machines if the machines satisfy some compatibility condition with the measure. This result includes the case of shifts of finite type as their Parry measure is Markovian.

The paper is organized as follows. Section 2 is devoted to notation and main definitions. The link between selection and special finite-state machines called selectors is given in Section 3. Agafonov's theorem is stated and proved in Section 4. The extension of the theorem to Markov measures is given in Section 5. Note that the proof given in that section subsumes the one given in the previous one. We keep both proofs since we think that the one in Section 4 is a nice preparation for the reader to the one in Section 5.

We write N for the set of all non-negative integers. Let A be a finite set. We let A* and A^N respectively denote the sets of all finite and infinite sequences over the alphabet A. Similarly, A^k stands for the set of sequences of length k. Finite sequences are also called words. The empty word is denoted by λ and the length of a word w is denoted |w|. The positions in finite and infinite words are numbered starting from 1. For a word w and positions 1 ≤ i ≤ j ≤ |w|, we let w[i] and w[i:j] denote respectively the symbol a_i at position i and the word a_i a_{i+1} ··· a_j from position i to position j. A word of the form w[i:j] is called a block of w.
A word u is a prefix (respectively suffix) of a word w, denoted u ⊑ w, if w = uv (respectively w = vu) for some word v. For any finite set S we denote its cardinality by #S. We write log for the base-2 logarithm.

In this article we are going to work on shift spaces, in particular shifts of finite type (SFT). Let A be a given alphabet. The full shift is the set A^N of all (one-sided) infinite sequences (x_n)_{n>0} of symbols in A. The shift σ is the function from A^N to A^N which maps each sequence (x_n)_{n>0} to the sequence (x_{n+1})_{n>0} obtained by removing the first symbol.

A shift space of A^N, or simply a shift, is a subset X of A^N which is closed for the product topology and invariant under the shift operator, that is σ(X) = X. Let F ⊆ A* be a set of finite words called forbidden blocks. The shift X_F is the subset of A^N made of the sequences without any occurrence of a block in F. More formally, it is the set

  X_F = {x : x[m:n] ∉ F for all 1 ≤ m ≤ n}.

It is well known that a shift X is characterized by its forbidden blocks, that is X = X_F for some set F ⊆ A*. The shift X is said to be of finite type if X = X_F for some finite set F of forbidden blocks [11, Def. 2.1.1]. Up to a change of alphabet, any shift space of finite type is the same as a shift space X_F where every forbidden block has length 2, that is F ⊆ A². For simplicity, we always assume that each forbidden block has length 2. In that case, the set F is given by an A × A-matrix P = (p_{ab})_{a,b∈A} where p_{ab} = 0 if ab ∈ F and p_{ab} > 0 otherwise, and we write X = X_P. The shift X is called irreducible if the graph induced by the matrix P is strongly connected, that is, for all symbols a, b ∈ A, there exists an integer n (depending on a and b) such that P^n_{ab} > 0. The shift X is called irreducible and aperiodic if there exists an integer n such that P^n_{ab} > 0 for all a, b ∈ A.

Example 1 (Golden mean shift). The golden mean shift is the shift space X_F ⊆ {0,1}^N where the set of forbidden blocks is F = {11}. It is made of all sequences over {0,1} with no two consecutive 1s. This subshift is also equal to X_M where the matrix M is given by

  M = ( 1 1
        1 0 ).

Let x = a_1 a_2 a_3 ··· be a sequence over the alphabet A. Let L ⊆ A* be a set of finite words over A. The word obtained by oblivious prefix selection of x by L is x↾L = a_{i_1} a_{i_2} a_{i_3} ··· where i_1 < i_2 < i_3 < ··· is the enumeration in increasing order of all the integers i such that the prefix a_1 a_2 ··· a_{i−1} belongs to L. This selection rule is called oblivious because the symbol a_i is not included in the considered prefix. If L = A*1, then x↾L is made of all the symbols of x occurring after a 1, in the same order as they occur in x.

A probability measure on A* is a function µ : A* → [0,1] such that µ(λ) = 1 and

  Σ_{a∈A} µ(wa) = µ(w)

holds for each word w ∈ A*. The simplest example of a probability measure is a Bernoulli measure. It is a monoid morphism from A* to [0,1] (endowed with multiplication) such that Σ_{a∈A} µ(a) = 1. Among the Bernoulli measures is the uniform measure which maps each word w ∈ A* to (#A)^{−|w|}. In particular, each symbol a is mapped to µ(a) = 1/#A.

By the Carathéodory extension theorem, a measure µ on A* can be uniquely extended to a probability measure µ̂ on A^N such that µ̂(wA^N) = µ(w) holds for each word w ∈ A*. In the rest of the paper, we use the same symbol for µ and µ̂. A probability measure µ is said to be (shift) invariant if the equality

  Σ_{a∈A} µ(aw) = µ(w)

holds for each word w ∈ A*.

We now recall the definition of Markov measures. For a stochastic matrix P and a stationary distribution π, that is a row vector such that πP = π, the Markov measure µ_{π,P} is the invariant measure defined by the following formula [10, Lemma 6.2.1]:

  µ_{π,P}(a_1 a_2 ··· a_k) = π_{a_1} P_{a_1 a_2} ··· P_{a_{k−1} a_k}.

A measure µ is compatible with a shift X_F if it only puts weight on blocks of X, that is, µ(w) > 0 implies w ∉ F for each word w. For a shift of finite type, there is a unique compatible measure with maximal entropy [10, Thm. 6.2.20]. This measure is called the Parry measure and it is a Markov measure. It can be explicitly given as follows. The Parry measure of an SFT X_M is the Markov measure given by the stochastic matrix P = (P_{i,j}) where P_{i,j} = M_{i,j} r_j/(θ r_i) and the stationary probability distribution π defined by π_i = l_i r_i, where θ is the Perron eigenvalue of the matrix M and the vectors l and r are respectively the left and right eigenvectors of M for θ, normalized so that Σ_{i=1}^k l_i r_i = 1.

Example 2 (Parry measure of the golden mean shift). Consider again the golden mean shift X. Its Parry measure is the Markov measure µ_{π,P} where π is the distribution π = (λ²/(1+λ²), 1/(1+λ²)) and P is the stochastic matrix

  P = ( 1/λ  1/λ²
        1    0   ),

where λ is the golden mean.
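The formulas of Example 2 are easy to check numerically. The following is a small sketch (the encoding of states as the symbols 0 and 1 and all names are ours): it verifies that P is stochastic, that π is stationary, and that the resulting Markov measure vanishes on the forbidden block 11.

```python
# A numerical check of Example 2 (our own sketch; states are encoded as
# the symbols 0 and 1).  P is the stochastic matrix of the Parry measure
# of the golden mean shift and pi its stationary distribution.
from math import sqrt

lam = (1 + sqrt(5)) / 2                       # golden mean, Perron eigenvalue of M
P = [[1/lam, 1/lam**2],                       # rows sum to 1: 1/lam + 1/lam^2 = 1
     [1.0,   0.0]]
pi = [lam**2/(1 + lam**2), 1/(1 + lam**2)]    # stationary: pi P = pi

assert all(abs(sum(row) - 1) < 1e-12 for row in P)
piP = [sum(pi[i]*P[i][j] for i in range(2)) for j in range(2)]
assert all(abs(piP[j] - pi[j]) < 1e-12 for j in range(2))

def mu(word):
    """Markov measure of a block: pi_{a_1} P_{a_1 a_2} ... P_{a_{k-1} a_k}."""
    m = pi[int(word[0])]
    for a, b in zip(word, word[1:]):
        m *= P[int(a)][int(b)]
    return m

assert mu("11") == 0.0    # the forbidden block 11 has measure zero
```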
Conversely, the support of an invariant measure µ is the shift X_µ = X_F where F is the set of words of measure zero, that is F = {w : µ(w) = 0}. If µ is the Markov measure µ_{π,P}, then its support X_µ is a shift of finite type because it is equal to the shift X_P given by the matrix P.

We recall here the notion of normality and the notion of genericity. We start with the notation for the number of occurrences of a given word u within another word w. For two words u and w, the number |w|_u of occurrences of u in w is given by |w|_u = #{i : w[i : i+|u|−1] = u}. Borel's definition [5] of normality for a sequence x ∈ A^N is that x is normal if for each finite word w ∈ A*,

  lim_{n→∞} |x[1:n]|_w / n = (#A)^{−|w|}.

More generally, x is called generic for a measure µ (or merely µ-generic) if for each word w ∈ A*,

  lim_{n→∞} |x[1:n]|_w / n = µ(w).

Normality is then the special case of genericity when the measure µ is the uniform measure. There are other definitions of normality and genericity taking into account only some occurrences, called aligned occurrences, of each word w. More precisely, the sequence x is factorized x = w_1 w_2 w_3 ··· where |w_i| = |w| for each i > 0, and one requires that #{i ≤ n : w_i = w}/n converges to µ(w) when n goes to infinity for each word w. It is shown in [2] that the two notions coincide as long as the measure µ is Markovian.

In this section, we introduce the automata with output, also known as transducers, which are used to select symbols from a sequence. We consider deterministic transducers computing functions from sequences in a shift X to sequences in a shift Y, that is, for a given input sequence x ∈ X, there is at most one output sequence y ∈ Y. We focus on transducers that operate in real time, that is, they process exactly one input alphabet symbol per transition. We start with the definition of a transducer.

Definition 3. An input deterministic transducer T is a tuple ⟨Q, A, B, δ, I, F⟩, where

• Q is a finite set of states,
• A and B are the input and output alphabets, respectively,
• δ : Q × A → B* × Q is the transition function,
• I ⊆ Q and F ⊆ Q are the sets of initial and final states, respectively.

Input deterministic transducers are also called sequential in the literature [15]. The relation δ(p, a) = (v, q) is written p —a|v→ q and the tuple ⟨p, a, v, q⟩ is then called a transition of the transducer. A finite (respectively infinite) run is a finite (respectively infinite) sequence of consecutive transitions

  q_0 —a_1|v_1→ q_1 —a_2|v_2→ q_2 ··· q_{n−1} —a_n|v_n→ q_n.
Its input and output labels are respectively a_1 ··· a_n and v_1 ··· v_n. A finite run is written q_0 —a_1···a_n | v_1···v_n→ q_n. An infinite run is written q_0 —a_1a_2a_3··· | v_1v_2v_3···→ ∞.

An infinite run is accepting if its first state q_0 is initial. Note that there is no accepting condition. This is due to the fact that we always assume that the domain is a closed subset of A^N. Since transducers are assumed to be input deterministic, there is at most one run with input label x for each x in A^N. If the output label is the infinite sequence y, we write y = T(x). By a slight abuse of notation, we write T(x[m:n]) for the output of T along that run while reading the block x[m:n] of x. We always assume that all transducers are trim: each state occurs in at least one accepting run. Since transducers are input deterministic, the starting state and the input label determine the run and the ending state. For a state p and a word u, we let p ∗ u and p · u denote respectively the run p —u|v→ q and its ending state q.

A selector is a deterministic transducer such that each of its transitions has one of the two types p —a|a→ q (type I) or p —a|λ→ q (type II) for a symbol a ∈ A. In a selector, the output of a transition is either the symbol read by the transition (type I) or the empty word (type II). Therefore, it can always be assumed that the output alphabet B is the same as the input alphabet A. It follows that for each run p —u|v→ q, the output label v is a subword, that is a subsequence, of the input label u.

[Figure 1: a selector which is not oblivious.]

A selector is called oblivious if all transitions starting from a given state have the same type. The selector pictured in Figure 1 is not oblivious but the one pictured in Figure 2 is oblivious. The terminology is justified by the following relation between oblivious prefix selection and selectors. If L ⊆ A* is a rational set, the oblivious prefix selection by L can be performed by an oblivious selector.
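Before turning to selectors, the selection rule itself can be stated in a few lines of code. The following is a minimal sketch (the function name and the sample input are ours), assuming the sequence is given by a finite prefix x and the set L by a membership predicate on words:

```python
# A minimal sketch of oblivious prefix selection (names are ours): the
# symbol at position i is kept exactly when the strict prefix before it
# belongs to L; the selected symbol itself is not part of the tested
# prefix, hence "oblivious".
def oblivious_select(x, in_L):
    return "".join(a for i, a in enumerate(x) if in_L(x[:i]))

# L = A*1: select the symbols occurring just after a 1.
assert oblivious_select("011010", lambda p: p.endswith("1")) == "100"
```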
There is indeed an oblivious selector S such that for each input sequence x, the output S(x) is the result x↾L of the selection by L. This selector S can be obtained from any deterministic automaton A accepting L: replacing each transition p —a→ q of A by either p —a|a→ q if the state p is accepting, or by p —a|λ→ q otherwise, yields the selector S. It can be easily verified that the obtained transducer is an oblivious selector performing the oblivious prefix selection by L. Conversely, each oblivious selector performs the oblivious prefix selection by K where K is the set of words which are the input label of a run from the initial state to a state q such that the transitions starting from q have type I.

The transducer pictured in Figure 2 is an oblivious selector that selects the symbols occurring after a 1. It performs the oblivious prefix selection by L where L is the set A*1.

[Figure 2: an oblivious selector selecting the symbols occurring after a 1.]

We call automaton a transducer where the output labels of the transitions are removed. This means that the transition function δ is then a function from Q × A to Q, where Q is the state set and A the alphabet.

In this section, we consider normality in the full shift. We give an alternative proof of Agafonov's result [1] that finite-state selection preserves normality. This means that if the sequence x is normal and L is a regular set of finite words, then the sequence x↾L is still normal. Since it has been remarked that selecting by a regular set is the same as using an oblivious selector, the result means that if S is an oblivious selector and x is normal, then S(x) is also normal. The following theorem of Agafonov states that oblivious prefix selection by a regular set preserves normality.

Theorem 4 (Agafonov [1]). If x is normal and L is regular, then x↾L is still normal.

The strategy of the proof is the following. We consider an oblivious selector S performing selection by L. This means that if x and y are the input and output labels of a successful run, then y = x↾L.
We then show that if the input label x is a normal sequence, then the output of the run of S is also normal. We fix a state p of S and an integer ℓ. We show that for k large enough, the number of runs of length k starting from p and outputting less than ℓ symbols is negligible. Then we show that for all words w and w′ of length ℓ, the numbers of runs outputting w and w′ are almost the same. Finally, we show that all these runs of length k starting from p have the same frequency in a run whose input is a normal word.

The following lemma shows that the number of runs starting from a state p and outputting a fixed word w is not too large.

Lemma 5.
Let S be an oblivious selector. For each state p of S, each integer n ≥ 0, and each word w ∈ A* such that |w| ≤ n, there are at most (#A)^{n−|w|} runs p —u|v→ q of length n such that w is a prefix of the output label v.

Proof. The proof is carried out by induction on the integer n. If n = 0, the only possible word w is the empty word λ. Since there is only one run of length 0, the inequality is satisfied. We now suppose that n ≥ 1. Since the selector is oblivious, all transitions starting from state p have the same type, either type I or type II.

We first suppose that all transitions starting from state p have type I. Let us write w = aw′ where a is a symbol and w′ a word. Consider the transition p —a|a→ q. All runs starting from p such that w is a prefix of the output label must use this transition as a first transition. Applying the induction hypothesis to q, n − 1 and w′ gives the result.

We now suppose that all transitions starting from state p have type II, that is, have the form p —a|λ→ q_a for each symbol a. This implies that all runs of length n starting from p have an output label of length at most n − 1. If |w| = n, there is no run such that w is a prefix of its output label and the inequality is trivially satisfied. If |w| ≤ n − 1, applying the induction hypothesis to each q_a, n − 1, and w gives that the number of runs starting from q_a such that w is a prefix of their output label is at most (#A)^{n−1−|w|}. Summing up these inequalities for all q_a gives the required inequality for p.

Some of the bounds are obtained using the ergodic theorem for Markov chains [6, Thm 4.1]. For that purpose, we associate a Markov chain M to each strongly connected automaton A. For simplicity, we assume that the state set Q of A is the set {1, ..., #Q}. The state set of the Markov chain is the same set {1, ..., #Q}. The transition matrix of the Markov chain is the matrix P = (p_{i,j})_{1≤i,j≤#Q} where each entry p_{i,j} is equal to #{a : i —a→ j}/#A. Note that #{a : i —a→ j} is the number of transitions from i to j. Since the automaton is assumed to be deterministic and complete, the matrix P is stochastic. If the automaton A is strongly connected, the Markov chain is irreducible and it therefore has a unique stationary distribution π such that πP = π. The vector π is called the distribution of A.

By a slight abuse of notation, we let |p ∗ w|_q denote the number of occurrences of the state q in the finite run p ∗ w. The idea of the following lemma is borrowed from [16].

Lemma 6.
Let A be a strongly connected, deterministic and complete automaton and let π be its distribution. For all real numbers ε, δ > 0, there exists an integer N such that for each integer n ≥ N,

  #{w ∈ A^n : ∃ p, q ∈ Q, | |p ∗ w|_q/n − π_q | > δ} < ε (#A)^n.

Proof.
The proof is a mere application of the ergodic theorem for Markov chains [6, Thm 4.1].

The following corollary is also borrowed from [16].

Corollary 7.
Let A be a deterministic and strongly connected automaton and let π be its distribution. Let ρ be the run of A on a normal sequence x. Then for each state q,

  lim_{n→∞} |ρ[1:n]|_q / n = π_q,

where ρ[1:n] is the finite run made of the first n transitions of ρ.

Proof.
Since Σ_{q∈Q} π_q = 1, it suffices to prove that lim inf_{n→∞} |ρ[1:n]|_q/n ≥ π_q holds for each state q.

Let ε > 0. Applying Lemma 6 with δ = ε provides an integer k such that the set

  B = {w ∈ A^k : ∃ p, | |p ∗ w|_q/k − π_q | > ε}

has cardinality at most ε(#A)^k. The run ρ is then factorized

  ρ = p_0 —w_1→ p_1 —w_2→ p_2 —w_3→ p_3 ··· = (p_0 ∗ w_1)(p_1 ∗ w_2)(p_2 ∗ w_3) ···

where each word w_i has length k and x = w_1 w_2 w_3 ···. Since x is normal, there is, by Theorem 4 in [2], an integer N such that for each n ≥ N the cardinality of the set {i ≤ n : w_i = w} is greater than (1 − ε)n/(#A)^k for each word w ∈ A^k. Then

  lim inf_{n→∞} |ρ[1:n]|_q/n = lim inf_{n→∞} |ρ[1:nk]|_q/(nk)
    = lim inf_{n→∞} (1/(nk)) Σ_{i=1}^n |p_{i−1} ∗ w_i|_q
    ≥ lim inf_{n→∞} (1/(nk)) Σ_{w∈A^k} #{i ≤ n : w_i = w} × min_{p∈Q} |p ∗ w|_q
    ≥ (1/(nk)) Σ_{w∈A^k\B} ((1 − ε)n/(#A)^k)(k(π_q − ε))
    ≥ (1 − ε)²(π_q − ε).

Since this inequality holds for each real number ε > 0, we have proved that lim inf_{n→∞} |ρ[1:n]|_q/n ≥ π_q.

Using the terminology of Markov chains, a strongly connected component (SCC) of an automaton is called recurrent if it cannot be left. This means that there is no transition p —a→ q where p is in that component and q is not. The following lemma is Satz 2.5 in [16].

Lemma 8.
Let A be an automaton and let ρ be a run of A on a normal input sequence. The run ρ reaches a recurrent SCC of A.

Lemma 9.
Let S be a strongly connected selector. For each integer k and each real number ε > 0, there exists an integer N such that for each integer n ≥ N, each state p and each word w of length k, the number of runs p —u|v→ q of length n such that w is a prefix of the output label v is between (1 − ε)(#A)^{n−|w|} and (#A)^{n−|w|}.

Proof. Let p be any state. The upper bound (#A)^{n−|w|} has already been proved in Lemma 5. It remains to prove the lower bound.

Let us fix a state q such that the transitions starting from q are of type I. If no such state exists, all transitions of the selector output the empty word and the output label of any run is empty. Applying Lemma 6 with ε/(#A)^k and δ = π_q/2 provides an integer N_0 such that for each n ≥ N_0, the set

  B = {u ∈ A^n : | |p ∗ u|_q/n − π_q | > π_q/2}

has cardinality at most ε(#A)^{n−k}. Fix now N = max(N_0, 2k/π_q) and let n be such that n ≥ N. If a word u of length n does not belong to B, the run p ∗ u satisfies |p ∗ u|_q ≥ nπ_q/2 > k. This implies that the length of its output label is at least k. Indeed, the state q has at least k + 1 occurrences in the run and each transition starting from q outputs one symbol.

Consider the (#A)^n runs of the form p ∗ u for u of length n. Among these runs, at most ε(#A)^{n−k} of them do not have an output label of length at least k. For each w′ ≠ w of length k, w′ is the prefix of the output label of at most (#A)^{n−k} of them. It follows that w is the prefix of the output label of at least (1 − ε)(#A)^{n−k} of them.

Let A be an automaton with state set Q. We now define an automaton whose states are the runs of length n in A. We let A^n denote the automaton whose state set is {p ∗ w : p ∈ Q, w ∈ A^n} and whose set of transitions is defined by

  {(p ∗ bw) —a→ (q ∗ wa) : p —b→ q in A, a, b ∈ A and w ∈ A^{n−1}}.

The Markov chain associated with the automaton A^n is called the snake Markov chain.
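The snake construction can be illustrated on a toy example. The sketch below (the two-state automaton and all names are ours) builds, by power iteration, the Markov chains of a deterministic complete automaton A and of its snake automaton A^1, and checks on this example that the distribution of A^1 assigns the value π_p/#A to each run p ∗ w:

```python
# Toy illustration of the snake Markov chain (our own example): for a
# deterministic complete automaton, each letter is read with probability
# 1/#A.  The snake automaton A^1 has the runs of length 1 as states, and
# its transitions are (p * b) --a--> (delta(p, b) * a).
A = "01"                                        # alphabet
Q = [0, 1]                                      # states of the automaton
delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 0, (1, '1'): 0}

def stationary(states, step, iters=2000):
    """Power iteration for the chain where each letter has probability 1/#A."""
    v = {s: 1.0/len(states) for s in states}
    for _ in range(iters):
        w = {s: 0.0 for s in states}
        for s, mass in v.items():
            for a in A:
                w[step(s, a)] += mass/len(A)
        v = w
    return v

pi = stationary(Q, lambda q, a: delta[(q, a)])        # distribution of A
snake_states = [(p, b) for p in Q for b in A]          # runs p * b of length 1
xi = stationary(snake_states, lambda s, a: (delta[s], a))

for p, b in snake_states:                              # xi_{p*b} = pi_p / #A
    assert abs(xi[(p, b)] - pi[p]/len(A)) < 1e-9
```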
See Problems 2.2.4, 2.4.6 and 2.5.2 (page 90) in [6] for more details. It is pure routine to check that the distribution ξ of A^n is given by ξ_{p∗w} = π_p/(#A)^n for each state p and each word w of length n.

Proof of Theorem 4.
Let y be the output of the run of S on x. By Lemma 8, the run of S on x reaches a recurrent SCC. Therefore it can be assumed without loss of generality that the selector S is strongly connected.

Let k be a fixed integer. We claim that for each word w of length k,

  lim_{n→∞} |y[1:n]|_w / n = 1/(#A)^k.

With each occurrence of a word w of length k in y, we associate the occurrence of the state q in the run at which starts the transition that outputs the first symbol of w. Note that the transitions starting from q must be of type I. Conversely, with each occurrence in the run of such a state, we associate the block of length k of y starting from that position.

We fix a state p such that the transitions starting from p have type I. We first claim that for each integer n, all runs of length n starting from p have the same frequency in the run. To prove this claim, we apply Corollary 7 to the automaton A^n where A is the automaton obtained by removing the outputs from S.

Let ε > 0. Lemma 9 provides an integer n such that for each word w of length k, the number of runs of length n starting from p and outputting w as their first k symbols is between (1 − ε)(#A)^{n−k} and (#A)^{n−k}. Combining this result with the fact that all these runs of length n have the same frequency, we get that the frequency of each w is between (1 − ε)(#A)^{−k} and (#A)^{−k}. Since this is true for each ε > 0, all words of length k have the same frequency after an occurrence of p. Since this is true for each state p, we get that all words of length k have the same frequency in y.

In this section we extend Agafonov's result to the more general setting of shifts of finite type. In this context, normality is defined through the Parry measure, which is the unique invariant and compatible measure with maximal entropy. A sequence is said to be normal if it is generic for that measure. We actually prove a slightly stronger result by showing that genericity for any Markov measure is preserved by finite-state selection as long as the selection is compatible with the measure. This includes the case of shifts of finite type because their Parry measure is Markovian.

To obtain such a result, the selection must be performed in a way compatible with the measure and its support. This boils down to putting some constraints on the selector to guarantee that if the input sequence is in the support of the measure, then the output sequence is also in that support. Ensuring that the output is still in the support is not enough, as is shown by the following example. Consider the golden mean shift X and the selector pictured in Figure 2. This selector selects the symbols following a 1. If the input sequence x is in X, the sequence y of selected symbols is 0^N = 000··· since x has no two consecutive 1s. Therefore, y is always in X but genericity is lost. To prevent this problematic behaviour, the selector is only allowed to select the next symbol if the last read symbol and the last selected symbol coincide. This restriction rules out the previous selector because it does not satisfy this property.

We suppose that a Markov measure µ = µ_{π,P} is fixed and we let X_µ be its support. We introduce automata and selectors which are compatible with the shift X_µ.
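The failure described above is easy to reproduce (a quick sketch; the encoding and names are ours): in a word of the golden mean shift, every symbol following a 1 is a 0, so this selection always outputs zeros.

```python
# The counterexample above, in code (our own encoding): selecting by
# L = A*1 keeps x[i] exactly when x[i-1] == '1'.  In a word without the
# factor "11", every such symbol is a 0, so the output is 000...
def select_after_one(x):
    return "".join(x[i] for i in range(1, len(x)) if x[i-1] == '1')

x = "0101001010010100101"          # a word of the golden mean shift
assert "11" not in x
assert set(select_after_one(x)) == {"0"}   # only zeros are selected
```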
An automaton A is compatible with X_µ if there exists a function ι from its state set Q to A such that the following condition is fulfilled.

i) If p —a→ q is a transition of A, then P_{ι(p)a} > 0 and ι(q) = a.

The condition implies that all transitions arriving at a given state q have the same label ι(q) and that the label of any path is in the shift X_µ. Such an automaton is called X_µ-complete if for each pair (p, a) such that P_{ι(p)a} > 0, there is a transition p —a→ q for some state q.

We continue by defining selectors which are compatible with X_µ. A selector S is compatible with X_µ if there exist two functions ι and η from its state set Q to the alphabet A such that the following two conditions are fulfilled.

i) If p —a|a→ q is a transition of type I, then P_{ι(p)a} > 0, ι(q) = η(q) = a, and η(p) = ι(p).
ii) If p —a|λ→ q is a transition of type II, then P_{ι(p)a} > 0, ι(q) = a and η(q) = η(p).

The condition η(p) = ι(p) for the transition p —a|a→ q states that the last read and last selected symbols must coincide for the selector to be able to select. The other conditions state that ι(q) is always the last read symbol, and that η(q) is the last selected symbol if there is one and is equal to η(p) otherwise.

[Figure 3: a compatible selector whose states are the words prs where p ∈ {0,1} is the parity of the number of symbols read so far, r ∈ {0,1} is the last read symbol and s ∈ {0,1} is the last selected symbol.]

The two functions ι and η can then be defined by ι(prs) = r and η(prs) = s.

The following theorem states that selection with compatible selectors preserves genericity for Markov measures. The input sequence x must be assumed to be in the shift X_µ because compatible selectors only read sequences from X_µ.

Theorem 10.
Let µ be a Markov measure and let x be a sequence in X_µ which is µ-generic. For each oblivious selector S compatible with X_µ, the output S(x) of S on x belongs to X_µ and is µ-generic.

The previous theorem can be applied to the Parry measure µ of a shift X of finite type because the support of µ is actually X_µ = X.

We start with the definition of the conditional measures induced by µ. For each symbol a ∈ A, we let µ_a denote the conditional measure defined by

  µ_a(a_1 a_2 ··· a_n) = P_{a a_1} P_{a_1 a_2} ··· P_{a_{n−1} a_n}.

Note that the measures µ_a might not be invariant. Since π is the stationary distribution, the measure µ can be recovered from the measures µ_a by the formula µ = Σ_{a∈A} π_a µ_a.

The following lemma shows that the set of runs starting from a state p and outputting a fixed word w is not too large. This is the analogue of Lemma 5 in the context of Markov measures.

Lemma 11.
Let S be an oblivious selector compatible with µ. For each state p of S, each integer n ≥ 0, and each word w ∈ A* such that |w| ≤ n, the inequality

  µ_{ι(p)}({u ∈ A^n : p —u|v→ q and w ⊑ v}) ≤ µ_{η(p)}(w)

holds.

Proof. Let U be the set {u ∈ A^n : p —u|v→ q and w ⊑ v}. The proof is carried out by induction on the integer n. If n = 0, the set U is U = {λ} and w must be the empty word λ. The inequality is then satisfied because both measures are equal to 1. We now suppose that n ≥ 1. Since the selector is oblivious, all transitions starting from state p have the same type, either type I or type II. We distinguish two cases depending on the type of these transitions.

We first suppose that all transitions starting from state p have type I. Let us write w = aw′ where a is a symbol and w′ a word. Consider the transition p —a|a→ p′. The compatibility of S with µ implies that ι(p) = η(p) and ι(p′) = η(p′) = a. All runs starting from p such that w is a prefix of the output label must use this transition as a first transition. Applying the induction hypothesis to p′, n − 1 and w′ gives that µ_a(U′) ≤ µ_a(w′) where U′ = {u ∈ A^{n−1} : p′ —u|v→ q and w′ ⊑ v}. Since U = aU′, the result follows from the equalities µ_{ι(p)}(U) = P_{ι(p)a} µ_a(U′) and µ_{η(p)}(w) = P_{η(p)a} µ_a(w′).

We now suppose that all transitions starting from state p have type II, that is, have the form p —a|λ→ p_a for each symbol a. The compatibility of S with µ implies that ι(p_a) = a and η(p_a) = η(p) for each a ∈ A. All runs of length n starting from p have an output label of length at most n − 1. If |w| = n, there is no run such that w is a prefix of its output label and the inequality is trivially satisfied. If |w| ≤ n − 1, applying the induction hypothesis to each p_a, n − 1 and w gives that µ_a(U_a) ≤ µ_{η(p)}(w) where U_a = {u ∈ A^{n−1} : p_a —u|v→ q and w ⊑ v}. Since U = ∪_{a∈A} aU_a, the result follows from the equality µ_{ι(p)}(U) = Σ_{a∈A} P_{ι(p)a} µ_a(U_a) and from Σ_{a∈A} P_{ι(p)a} µ_{η(p_a)}(w) = µ_{η(p)}(w), which holds since η(p_a) = η(p) and the row ι(p) of P sums to 1.

Some of the bounds are again obtained using the ergodic theorem for Markov chains [6, Thm 4.1]. For that purpose, we associate a Markov chain M to each strongly connected automaton A which is compatible with X_µ and X_µ-complete. This means that there is a function ι from Q to A such that if p —a→ q is a transition, then ι(q) = a. For simplicity, we assume that the state set Q of A is the set {1, ..., #Q}. The state set of the Markov chain is the same set {1, ..., #Q}. The transition matrix of the Markov chain is the matrix P̂ = (P̂_{pq})_{1≤p,q≤#Q} where each entry P̂_{pq} is equal to P_{ι(p)a} = P_{ι(p)ι(q)} if p —a→ q is a transition of A, and to 0 otherwise. Since the automaton is assumed to be deterministic and X_µ-complete, the matrix P̂ is stochastic. If the automaton A is strongly connected, the Markov chain is irreducible and it therefore has a unique stationary distribution π̂ such that π̂P̂ = π̂. The vector π̂ is called the distribution of A. The matrix P̂ and its stationary distribution π̂ define a Markov measure µ̂ = µ_{π̂,P̂} on finite runs of A. The link between the measures µ and µ̂ is that µ̂(p ∗ u) = π̂_p µ_{ι(p)}(u) for each state p and each word u.

Lemma 12.
Let $\mathcal{A}$ be a strongly connected, deterministic and complete automaton and let $\hat{\pi}$ be its distribution. For all real numbers $\varepsilon, \delta > 0$, there exists an integer $N$ such that for each integer $n > N$,
\[
\mu\bigl(\bigl\{u \in A^n : \exists p, q \in Q \ \bigl||p * u|_q/n - \hat{\pi}_q\bigr| > \delta\bigr\}\bigr) < \varepsilon.
\]
The lemma is stated for the measure $\mu$ but the ergodic theorem is valid for any initial distribution. The result therefore also holds for the conditional measures $\mu_a$.

Proof.
The proof is a mere application of the ergodic theorem for Markov chains [6, Thm 4.1].
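As a concrete illustration of the chain $\hat{P}$ and of the ergodic behaviour the lemma relies on, here is a minimal Python sketch. Everything in it (the alphabet, the Markov measure $P$, the three-state automaton and all names) is a made-up toy example, not taken from the paper: it builds $\hat{P}$ from a deterministic complete automaton compatible with the measure, computes the stationary distribution $\hat{\pi}$ by power iteration, and compares it with the state-visit frequencies of a long simulated run.

```python
# Illustrative sketch (made-up example): the Markov chain associated with a
# deterministic, complete, strongly connected automaton compatible with a
# Markov measure, as in the construction before Lemma 12.
import random

A = ['a', 'b']                                   # alphabet
P = {('a', 'a'): 0.5, ('a', 'b'): 0.5,           # transition matrix of the
     ('b', 'a'): 0.25, ('b', 'b'): 0.75}         # Markov measure mu

# Compatibility: every transition entering a state q carries the same symbol
# iota(q).  States 0 and 2 both remember that the last symbol was 'a'
# (two alternating phases), state 1 that it was 'b'.
delta = {(0, 'a'): 2, (0, 'b'): 1,
         (2, 'a'): 0, (2, 'b'): 1,
         (1, 'a'): 0, (1, 'b'): 1}
iota = {0: 'a', 1: 'b', 2: 'a'}

# P_hat[p][q] = P_{iota(p), a} if p --a--> q is a transition, else 0.
states = sorted(iota)
P_hat = [[0.0] * len(states) for _ in states]
for (p, a), q in delta.items():
    P_hat[p][q] += P[(iota[p], a)]

# Stationary distribution pi_hat, i.e. pi_hat * P_hat = pi_hat,
# obtained by power iteration (the chain is irreducible and aperiodic).
pi_hat = [1.0 / len(states)] * len(states)
for _ in range(200):
    pi_hat = [sum(pi_hat[p] * P_hat[p][q] for p in states) for q in states]

# Ergodic theorem in action: state-visit frequencies of a long run of the
# automaton on a random sequence drawn from mu approach pi_hat.
random.seed(0)
n_steps = 100_000
counts, state, sym = [0] * len(states), 0, 'a'
for _ in range(n_steps):
    sym = random.choices(A, weights=[P[(sym, b)] for b in A])[0]
    state = delta[(state, sym)]
    counts[state] += 1
freqs = [c / n_steps for c in counts]
print(pi_hat, freqs)
```

For this toy chain the stationary distribution is exactly $(2/9, 2/3, 1/9)$, and the simulated frequencies land close to it, which is the phenomenon Lemma 12 quantifies uniformly over words.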
Corollary 13.
Let $\mathcal{A}$ be a deterministic and strongly connected automaton and let $\hat{\pi}$ be its distribution. Let $\rho$ be the run of $\mathcal{A}$ on a $\mu$-generic sequence $x$. Then for each state $q$,
\[
\lim_{n \to \infty} \frac{|\rho[1{:}n]|_q}{n} = \hat{\pi}_q,
\]
where $\rho[1{:}n]$ is the finite run made of the first $n$ transitions of $\rho$.

Proof.
Since $\sum_{q \in Q} \hat{\pi}_q = 1$, it suffices to prove that $\liminf_{n \to \infty} |\rho[1{:}n]|_q/n \ge \hat{\pi}_q$ holds for each state $q$.

Let $\varepsilon > 0$. Applying Lemma 12 with $\delta = \varepsilon$ provides an integer $k$ such that the set $B = \bigl\{u \in A^k : \exists p \ \bigl||p * u|_q/k - \hat{\pi}_q\bigr| > \varepsilon\bigr\}$ satisfies $\mu(B) < \varepsilon$. The run $\rho$ is then factorized as
\[
\rho = p_0 \xrightarrow{u_0} p_1 \xrightarrow{u_1} p_2 \xrightarrow{u_2} p_3 \cdots = (p_0 * u_0)(p_1 * u_1)(p_2 * u_2) \cdots
\]
where each word $u_i$ has length $k$ and $x = u_0u_1u_2\cdots$. Since $x$ is $\mu$-generic, there is an integer $N$ such that for each $n > N$ the cardinality of the set $\{i < n : u_i = u\}$ is greater than $(1-\varepsilon)n\mu(u)$ for each word $u \in A^k$. Moreover, for each word $u \notin B$, the run $p * u$ satisfies $|p * u|_q \ge k(\hat{\pi}_q - \varepsilon)$ for every state $p$. For each $n > N$, this yields
\[
\frac{|\rho[1{:}nk]|_q}{nk} = \frac{1}{nk}\sum_{i=0}^{n-1}|p_i * u_i|_q
\ge \frac{1}{nk}\sum_{u \in A^k}\#\{i < n : u_i = u\} \times \min_{p \in Q}|p * u|_q
\ge \frac{1}{nk}\sum_{u \in A^k \setminus B}\bigl((1-\varepsilon)n\mu(u)\bigr)\bigl(k(\hat{\pi}_q - \varepsilon)\bigr)
\ge (1-\varepsilon)^2(\hat{\pi}_q - \varepsilon),
\]
where the last inequality uses $\sum_{u \notin B}\mu(u) > 1 - \varepsilon$. Since $\liminf_{n \to \infty}|\rho[1{:}n]|_q/n = \liminf_{n \to \infty}|\rho[1{:}nk]|_q/nk$ and the inequality above holds for each real number $\varepsilon > 0$, we have proved that $\liminf_{n \to \infty} |\rho[1{:}n]|_q/n \ge \hat{\pi}_q$.

Lemma 14.
Let $\mathcal{A}$ be an automaton compatible with $\mu$ and let $\rho$ be a run in $\mathcal{A}$ on a $\mu$-generic sequence in $X_\mu$. The run $\rho$ reaches a recurrent strongly connected component of $\mathcal{A}$.

Proof. We claim that for each SCC $C$ which is not recurrent, there exist a symbol $a$ and a word $w = aw'$ with $\mu(w) > 0$ such that, from any state $q$ in $C$ with $P_{\iota(q)a} > 0$, the run $q * w$ leaves $C$.

We fix a symbol $a$. Let $\{q_1, \ldots, q_n\}$ be the set of states $q$ in $C$ such that $P_{\iota(q)a} > 0$. We construct a sequence $w_0, w_1, \ldots, w_n$ of words such that if $i \le j$, then the run $q_i * w_j$ leaves $C$. We set $w_0 = \lambda$, for which the statement is trivially true. Suppose that $w_0, \ldots, w_k$ have already been chosen and consider the state $p_k = q_{k+1} \cdot w_k$. If this state $p_k$ is already out of $C$, we set $w_{k+1} = w_k$. Otherwise, since $C$ is not recurrent, there is a word $v_k$ such that $p_k \cdot v_k$ is out of $C$: we set $w_{k+1} = w_k v_k$, so that $q_{k+1} \cdot w_{k+1} = p_k \cdot v_k$ is out of $C$. Note that a run which leaves $C$ never returns to it, so extending $w_k$ preserves the property for the indices $i \le k$.

The run $\rho$ reaches a last SCC $C$. Suppose by contradiction that $C$ is not recurrent. By the previous claim there is a word $w = aw'$ such that $\mu(w) > 0$ and the run $q * w$ leaves $C$ for each state $q$ in $C$ with $P_{\iota(q)a} > 0$. Since $x$ is $\mu$-generic, the word $w$ occurs infinitely often in $x$. Let $q$ be a state of $C$ reached by the run $\rho$ just before an occurrence of $w$. Since $x$ is in $X_\mu$, $P_{\iota(q)a} > 0$ and the run $q * w$ leaves $C$, contradicting the fact that $C$ is the last SCC reached by $\rho$.

Lemma 15.
Let $S$ be a strongly connected selector. For each integer $k$ and each real number $\varepsilon > 0$, there exists an integer $N$ such that for each integer $n > N$, each state $p$ and each word $w$ of length $k$, the inequalities
\[
(1-\varepsilon)\,\mu_{\eta(p)}(w) \le \mu_{\iota(p)}\bigl(\{u \in A^n : p \xrightarrow{u|v} q \text{ and } w \sqsubseteq v\}\bigr) \le \mu_{\eta(p)}(w)
\]
hold.

Proof. Let $p$ be any state. The upper bound $\mu_{\eta(p)}(w)$ has already been proved in Lemma 11. It remains to prove the lower bound.

Let us fix a state $q$ such that the transitions starting from $q$ are of type I. If no such state exists, all transitions of the selector output the empty word and the output label of any run is empty. Applying Lemma 12 with $\varepsilon\mu_{\eta(p)}(w)$ and $\delta = \hat{\pi}_q/2$ provides an integer $N_0$ such that for each $n > N_0$,
\[
\mu_{\iota(p)}\bigl(\bigl\{u \in A^n : \bigl||p * u|_q/n - \hat{\pi}_q\bigr| > \hat{\pi}_q/2\bigr\}\bigr) < \varepsilon\mu_{\eta(p)}(w).
\]
Fix now $N = \max(N_0, 2k/\hat{\pi}_q)$ and let $n$ be such that $n > N$. If a word $u$ of length $n$ does not belong to the small set above, the run $p * u$ satisfies $|p * u|_q > n\hat{\pi}_q/2 > k$ for each state $p$. This implies that the length of its output label is greater than $k$. Indeed, the state $q$ has at least $k+1$ occurrences in the run and each transition starting from $q$ outputs one symbol.

Consider the $(\#A)^n$ runs of the form $p * u$ for $u$ of length $n$. The measure of those having an output label shorter than $k$ is less than $\varepsilon\mu_{\eta(p)}(w)$. For each word $w' \ne w$ of length $k$, the measure of those having $w'$ as the prefix of length $k$ of their output label is at most $\mu_{\eta(p)}(w')$. It follows that the measure of those having $w$ as the prefix of length $k$ of their output label is at least $1 - \varepsilon\mu_{\eta(p)}(w) - (1 - \mu_{\eta(p)}(w)) = (1-\varepsilon)\,\mu_{\eta(p)}(w)$.

Let $\mathcal{A}$ be an automaton with state set $Q$. We now define an automaton whose states are the runs of length $n$ in $\mathcal{A}$.
We let $\mathcal{A}_n$ denote the automaton whose state set is $\{p * u : p \in Q, u \in A^n\}$ and whose set of transitions is defined by
\[
\bigl\{(p * bu) \xrightarrow{a} (q * ua) : p \xrightarrow{b} q \text{ in } \mathcal{A},\ a, b \in A \text{ and } u \in A^{n-1}\bigr\}.
\]
The Markov chain associated with the automaton $\mathcal{A}_n$ is called the snake Markov chain. See Exercises 2.2.4, 2.4.6 and 2.5.2 in [6] for more details. It is pure routine to check that the distribution $\hat{\xi}$ of $\mathcal{A}_n$ is given by $\hat{\xi}_{p*w} = \hat{\pi}_p\,\mu_{\iota(p)}(w)$ for each state $p$ and each word $w$ of length $n$.

Proof of Theorem 10. Let $y$ be the output of the run of $S$ on $x$. By Lemma 14, the run of $S$ on $x$ reaches a recurrent strongly connected component. It can therefore be assumed without loss of generality that the selector $S$ is strongly connected.

Let $k$ be a fixed integer. We claim that for each word $w$ of length $k$, $\lim_{n \to \infty} |y[1{:}n]|_w/n = \mu(w)$. With each occurrence of a word $w$ of length $k$ in $y$, we associate the occurrence of the state $q$ in the run from which starts the transition that outputs the first symbol of $w$. Note that the transitions starting from $q$ must be of type I. Conversely, with each occurrence in the run of such a state, we associate the block of length $k$ of $y$ starting from that position.

We fix a state $p$ such that the transitions starting from $p$ have type I. We first claim that for each integer $n$, each run $p * u$ of length $n$ starting from $p$ occurs after the state $p$ with frequency $\mu_{\iota(p)}(u)$. To prove this claim, we apply Corollary 13 to the automaton $\mathcal{A}_n$, where $\mathcal{A}$ is the automaton obtained by removing the outputs from $S$.

Let $\varepsilon > 0$. By Lemma 15, there is an integer $n$ such that for each word $w$ of length $k$, the measure $\mu_{\iota(p)}$ of all runs starting from $p$ outputting $w$ as their first $k$ symbols is between $(1-\varepsilon)\mu_{\eta(p)}(w)$ and $\mu_{\eta(p)}(w)$. Combining this result with the fact that each run $p * u$ of length $n$ occurs after the state $p$ with a frequency equal to $\mu_{\iota(p)}(u)$, we get that the frequency of each word $w$ after the state $p$ is between $(1-\varepsilon)\mu_{\eta(p)}(w)$ and $\mu_{\eta(p)}(w)$. Since this is true for each $\varepsilon > 0$, each word of length $k$ has a frequency after the state $p$ equal to its measure $\mu_{\eta(p)}(w)$. Since this is true for each state $p$, each word $w$ of length $k$ has frequency $\mu(w)$ in $y$.

Conclusion
As a conclusion, we would like to mention a few extensions of our results. Agafonov's theorem deals with prefix selection: a given digit is selected if the prefix of the word up to that digit belongs to a fixed set of finite words.
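Prefix selection is easy to make concrete. The following Python fragment is an illustrative sketch only (the DFA, the helper `select` and the input are made-up examples, not taken from the paper): the digit $x_{n+1}$ is selected exactly when the prefix $x_1 \cdots x_n$ is accepted by a finite automaton, here one accepting the words that end with the digit 1.

```python
# Made-up sketch of oblivious prefix selection by a deterministic automaton:
# digit x[n] is kept iff the prefix x[:n] read so far is accepted.
def select(x, delta, start, accepting):
    state, out = start, []
    for b in x:
        if state in accepting:   # prefix before b is in the fixed regular set
            out.append(b)        # so b is selected (obliviously: b itself is
        state = delta[(state, b)]  # not part of the tested prefix)
    return ''.join(out)

# DFA over {0,1} accepting exactly the nonempty words ending with 1.
delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 0, (1, '1'): 1}
x = '11010111001'
y = select(x, delta, 0, {1})
print(y)  # the digits of x that follow an occurrence of 1
```

Agafonov's theorem asserts that when $x$ is normal, the selected sequence $y$ is again normal; the sketch only shows the selection mechanism itself, and this particular rule (select the digit following each 1) is the classical example of a normality-preserving prefix selection.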
Suffix selection is defined similarly: a given digit is selected if the suffix of the word from that digit belongs to a fixed set of sequences. It has been shown in [3] that suffix selection also preserves normality as long as the fixed set of sequences is regular. Let us recall that a set of sequences is regular if it can be accepted by a non-deterministic Büchi or by a deterministic Muller automaton [14]. The proof given in [3] is based on the characterization of normality by non-compressibility. The proof techniques developed here to prove Agafonov's theorem can be adapted to also prove directly the result about suffix selection.

The prefix and suffix selections considered so far are usually called oblivious because the digit to be selected is not included in either the prefix or the suffix taken into account. Non-oblivious selection does not in general preserve normality, but it does for a restricted class of sets of finite words called group languages [8]. Group languages are sets of words which are accepted by deterministic automata in which each symbol induces a permutation of the states. This latter property means that for each symbol $a$, the function which maps each state $p$ to the state $q$ such that $p \xrightarrow{a} q$ is a permutation of the state set. The techniques presented in this paper can also be adapted to prove such a result.

References

[1] V. N. Agafonov. Normal sequences and finite automata. Soviet Mathematics Doklady, 9:324–325, 1968.
[2] N. Álvarez and O. Carton. On normality in shifts of finite type. CoRR, abs/1807.07208, 2018.
[3] V. Becher, O. Carton, and P. A. Heiber. Normality and automata. Journal of Computer and System Sciences, 81(8):1592–1613, 2015.
[4] V. Becher and P. A. Heiber. Normal numbers and finite automata. Theoretical Computer Science, 477:109–116, 2013.
[5] É. Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo, 27:247–271, 1909.
[6] P. Brémaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 2008.
[7] A. Broglio and P. Liardet. Predictions with automata. Symbolic dynamics and its applications. Contemporary Mathematics, 135:111–124, 1992. Also in Proceedings AMS Conference in honor of R. L. Adler, New Haven CT, USA, 1991.
[8] O. Carton and J. Vandehey. Preservation of normality by non-oblivious group selection. Theory of Computing Systems, 2020.
[9] J. Dai, J. Lathrop, J. Lutz, and E. Mayordomo. Finite-state dimension. Theoretical Computer Science, 310:1–33, 2004.
[10] B. P. Kitchens. Symbolic Dynamics. Springer, 1998.
[11] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1992.
[12] M. Madritsch. Normal numbers and symbolic dynamics. In Sequences, chapter 8. Cambridge University Press, 2018.
[13] M. G. O'Connor. An unpredictability approach to finite-state randomness. Journal of Computer and System Sciences, 37(3):324–336, 1988.
[14] D. Perrin and J.-É. Pin. Infinite Words. Elsevier, 2004.
[15] J. Sakarovitch. Elements of Automata Theory. Cambridge University Press, 2009.
[16] C. P. Schnorr and H. Stimm. Endliche Automaten und Zufallsfolgen. Acta Informatica, 1:345–359, 1971.