An Embellished Account of Agafonov's Proof of Agafonov's Theorem
Thomas Seiller and Jakob G. Simonsen
Abstract
We give an account of Agafonov's original proof of his eponymous theorem. The original proof was only reported in Russian [11, 1] in a journal not widely available, and the work most commonly cited in western literature is instead the English translation [2] of a summary version containing no proofs [3]. The account contains some embellishments to Agafonov's original arguments, made in the interest of clarity, and provides some historical context to Agafonov's work.
We give an account of Agafonov's original proof of his eponymous theorem. The original proof was only reported in Russian [11, 1] in a journal not widely available, and the work most commonly cited in western literature is instead the English translation [2] of a summary version containing no proofs [3].

The account contains some embellishments to Agafonov's original arguments, made in the interest of clarity:

1. The original proof relies on results of Postnikova [14]. We detail Postnikova's contribution and provide some historic context to her result.

2. The original proof contained a mixture of arguments expressed both via running text and explicit lemmas and theorems. While we have retained the general flow of argumentation from the original, we have used explicit lemmas and propositions for a number of observations occurring in the running text.

3. We have made several arguments explicit and provided detailed arguments in places where Agafonov relied on immediate understanding from his specialist audience, but where we believe that non-expert readers with modern sensibilities might prefer more elaborate explanations. The most pertinent examples are:

(a) We explicitly prove why it suffices to prove that a connected finite automaton picks out b ∈ {0,1}^n for n = 1 with limiting frequency p from any p-distributed sequence (Lemma 13).

(b) We have appealed directly to probabilistic reasoning (using Chebyshev's Inequality) in the proof that, par abus de langage, the probability of deviation from probability p among the symbols selected by a finite automaton from sets of substrings picked from a p-distributed sequence tends to zero with increasing length of the strings (Lemma 16). In [1], this was essentially proved by a reference to the Strong Law of Large Numbers and a statement that the proof was similar to Lemma 3 of Loveland's paper [10].

Acknowledgements. The authors warmly thank Łukasz Czajka and Anastasia Volkova for their help in translating the Russian documents.
If α = a_1 a_2 ··· is a right-infinite sequence over an alphabet A and N is a positive integer, we denote by α|≤N the finite string a_1 a_2 ··· a_N. We denote by A* the set of (finite) words over A and by A^+ the set of finite non-empty words over A.

Definition 1.
A finite probability map (over an alphabet A) is a map p : A^+ → [0,1] such that, for all positive integers n, Σ_{a_1···a_n ∈ A^n} p(a_1···a_n) = 1. A finite probability map p is said to be:

• Bernoulli if, for all positive integers n and all a_1, ..., a_n ∈ A, p(a_1···a_n) = Π_{j=1}^n p(a_j).

• Equidistributed if, for any string a_1···a_n ∈ A^n, p(a_1···a_n) = |A|^{−n}.

Observe that an equidistributed p is also Bernoulli. For alphabets with |A| > 1, any map g : A → [0,1] with Σ_{a∈A} g(a) = 1 induces a Bernoulli finite probability map p_g by letting p_g(a_1···a_n) := Π_{j=1}^n g(a_j). This map is equidistributed iff g(a) = |A|^{−1} for every a ∈ A.

The use of the word "Bernoulli" is due to the fact that Bernoulli finite probability maps correspond directly to the measure of cylinders in Bernoulli shifts [20]; in the literature on normal numbers, the word Bernoulli is sometimes used slightly differently; for example, Schnorr and Stimm [18] use the term Bernoulli sequences for sequences distributed according to a finite probability map that is equidistributed in our terminology.

We are interested in the finite probability maps whose values can be realized as the limiting frequencies of finite words in right-infinite sequences over {0,1}.

Definition 2.
Let b = b_1···b_N and a = a_1···a_n be finite words over A. We denote by #_a(b) the number of occurrences of a in b, that is, the quantity
|{j : b_j b_{j+1} ··· b_{j+n−1} = a_1 a_2 ··· a_n}|
Let p be a finite probability map over A, and let α be a right-infinite sequence over A. If the limit
freq_a(α) = lim_{N→∞} #_a(α|≤N) / N
exists and is equal to some real number f, we say that a occurs in α with limiting frequency f. If every a ∈ A^+ occurs in α with limiting frequency p(a), we say that α is p-distributed.

Observe that a right-infinite sequence α is normal in the usual sense iff it is p-distributed for (the unique) equidistributed finite probability map p over A. Also observe that it is not the case that for every finite probability map p there exists a p-distributed sequence.

An example of a finite probability map that is not Bernoulli, but such that there is at least one p-distributed right-infinite sequence, is the map b defined by b(w) = 1/2 if w contains neither of the strings 00 and 11 (note that for each positive integer n, there are exactly two such strings of length n, namely the two alternating strings), and b(w) = 0 otherwise. Observe that the right-infinite sequence 010101··· is b-distributed.

In the remaining sections, we will work with the alphabet {0,1} unless otherwise specified.

2 Preliminaries and Historical aspects
The notion of p-distributed sequences can be traced back to a 1909 paper by Émile Borel [5]. In this work, Borel studies the decimal representation of numbers and introduces the following definitions.

Definition 3 (Borel normality). Consider an integer b > 1. Consider a number 0 < a < 1 and denote by α_b its decimal sequence a_{b,1}, ..., a_{b,n}, ··· ∈ {0, 1, ..., b−1}^ω in base b, i.e. a = Σ_n a_{b,n} b^{−n}. Then a is said to be:

1. simply normal w.r.t. the basis b when freq_c(α_b) = 1/b for all c ∈ {0, 1, ..., b−1};

2. entirely normal (or just normal) w.r.t. the basis b when for all integers n, k the number b^k a is simply normal w.r.t. the basis b^n;

3. absolutely normal if it is entirely normal w.r.t. every possible basis b.

Borel already remarks that normality corresponds to what we introduced as p-distribution:

   The characterising property of a normal number is the following: considering a sequence of p symbols, denoting by c_n the number of times this sequence is to be found within the n first decimal numbers, we have lim_{n→∞} c_n/n = 1/b^p.

The main result of Borel on normal numbers is the following theorem.

Theorem 1 (Borel [5]). The probability that a number is absolutely normal is equal to 1, i.e. almost all numbers are absolutely normal.
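As an illustration (the example and code are ours, not Borel's), the binary Champernowne sequence, obtained by concatenating the binary expansions of 1, 2, 3, ..., is a classical example of a sequence that is normal in base 2; a quick computation of digit frequencies over a finite prefix shows the frequency of 1s drifting towards 1/2:

```python
# Concatenate the binary expansions of 1..N and measure the frequency of 1s.
# The slight excess over 1/2 comes from the leading 1 of every expansion and
# vanishes as N grows.
N = 2000
prefix = "".join(format(i, "b") for i in range(1, N + 1))
freq_one = prefix.count("1") / len(prefix)
print(len(prefix), freq_one)
```

Of course, no finite computation proves normality; the sketch is only meant to make the definition tangible.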
As a consequence, the probability that a number is normal, or simply normal, is also equal to 1. In particular, the cardinality of the set of normal numbers is equal to the cardinality of the continuum, and normal numbers are dense in the set of all real numbers.

The notion of p-distributed sequences also appeared in connection with the notion of kollektiv introduced by von Mises in order to capture the concept of random sequence. The intuition behind von Mises' approach is that a random sequence is one that cannot be predicted, i.e. the frequency of each possible outcome is independent of the choice of a Spielsystem, i.e. a way to predict the outcome of successive trials. In other words, a sequence of trial outcomes is not random whenever there exists a strategy to select a subsequence of the trials in order to modify the frequency of the outcomes. This is expressed as the second condition in the following definition. As reported by Church [6], a sequence α = a_1, a_2, ..., a_n, ... in {0,1}^ω is a kollektiv according to von Mises [22, 23] when:

1. freq_1(α) is defined and equal to p;

(Footnote: The translation is ours, in which we replaced the basis considered by Borel with a parametrised basis b, i.e. entirely normal w.r.t. the basis b, where b = 10 in Borel's original paper.)
2. if β = a_{n_1}, a_{n_2}, ... is any infinite sub-sequence of α formed by deleting some of the terms of the latter sequence according to a rule which makes the deletion or retention of a_n depend only on n and a_1, a_2, ..., a_{n−1}, then freq_1(β) is defined and equal to p.

However, Church judges this definition to be "too inexact in form to serve satisfactorily as the basis of a mathematical theory" and proposes the following formalisation.

Definition 4 (von Mises kollektiv). Let α be a sequence a_1, a_2, ..., a_n, ... in {0,1}^ω. It is a kollektiv (in the sense of von Mises, as formalised by Church) when:

1. freq_1(α) is defined and equal to p;

2. if ϕ is any function of positive integers, if b_1 = 1, b_{n+1} = 2b_n + a_n, c_n = ϕ(b_n), and the integers n such that c_n = 1 form in order of magnitude an infinite sequence n_1, n_2, ..., then the sequence β = a_{n_1}, a_{n_2}, ... satisfies that freq_1(β) is defined and equal to p.

In this section, several other notions of kollektiv will be discussed and introduced, and we will therefore use the following definitions.

Definition 5 (Strategy). A strategy S is a predicate over the set of finite binary words, i.e. S ⊂ {0,1}* = ∪_{i=0}^∞ {0,1}^i.

Definition 6 (Selected Subsequence). Given a strategy S and an infinite sequence α = a_1, a_2, ..., a_n, ... in {0,1}^ω, we define the sequence S(α) as follows. Let i_1, i_2, ..., i_k, ... be the (increasing) sequence of indices j such that α|≤j−1 ∈ S. Then S(α)_j = a_{i_j}.

Definition 7 (Kollektiv). A sequence α = a_1, a_2, ..., a_n, ... in {0,1}^ω is a kollektiv w.r.t. a set of strategies 𝒮 when:

1. freq_1(α) is defined and equal to p;

2. for any strategy S ∈ 𝒮, freq_1(S(α)) is defined and equal to p.

With this definition, the notion of von Mises kollektiv coincides with that of kollektiv w.r.t. the set of all strategies. As discussed by several authors [21, 16, 9, 8], this notion of kollektiv is however inadequate, because it is too restrictive.
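To make Definitions 5 and 6 concrete, here is a small sketch (our code and names, not from any of the cited papers): a strategy is a predicate on finite words, and S(α) collects the symbols whose strict prefix satisfies it. The sample strategy selects the symbol immediately following each occurrence of the word 01.

```python
def select(strategy, alpha: str) -> str:
    """S(alpha): the symbols a_j whose strict prefix a_1 ... a_{j-1} is in S."""
    return "".join(alpha[j] for j in range(len(alpha)) if strategy(alpha[:j]))

# Sample strategy S_w for w = 01: the prefixes ending with 01.
strategy = lambda prefix: prefix.endswith("01")
alpha = "0110100110010110"  # a prefix of the Thue-Morse sequence
print(select(strategy, alpha))
```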
This is further explained by Church, who explains why no kollektiv can exist if one considers such a strong notion:

   [...] it makes the class of random sequences associated with any probability p other than 0 or 1 an empty class. For the failure of (2) may always be shown by taking ϕ(x) = a_{µ(x)} where µ(x) is the least positive integer m such that 2^m > x: the sequence a_{n_1}, a_{n_2}, ... will then consist of those and only those terms of a_1, a_2, ... which are 1's.

As a consequence, Church introduces a new notion of kollektiv, by factoring in the notion of computability, as formalised in the following definition.

(Footnote: Note that the terms b_n are written as follows in binary: b_n = 1 a_1 a_2 ... a_{n−1}. Indeed, the function defined by Church ensures that c_n = a_n.)

Definition 8 (Church kollektiv). Let α be a sequence a_1, a_2, ..., a_n, ... in {0,1}^ω. It is a kollektiv (in the sense of Church) when:

1. freq_1(α) is defined and equal to p;

2. if ϕ is any effectively calculable function of positive integers, if b_1 = 1, b_{n+1} = 2b_n + a_n, c_n = ϕ(b_n), and the integers n such that c_n = 1 form in order of magnitude an infinite sequence n_1, n_2, ..., then the sequence β = a_{n_1}, a_{n_2}, ... satisfies that freq_1(β) is defined and equal to p.

Towards the general purpose of defining mathematically the notion of random sequence, other notions were also considered at the time. For our purpose, the notion of "admissible number" introduced by Copeland [7], also studied by Reichenbach under the name "normal number" [16, 17], will be of interest.
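Returning briefly to Definition 8: the integers b_n simply encode the history read so far, since in binary b_n = 1 a_1 a_2 ... a_{n−1}, so an effectively calculable ϕ decides on each symbol from exactly the preceding ones. A minimal sketch (our code; the particular ϕ is our example, not Church's):

```python
def church_select(phi, alpha):
    """Select a_n whenever phi(b_n) == 1, where b_1 = 1 and
    b_{n+1} = 2*b_n + a_n, so that b_n encodes 1 a_1 ... a_{n-1} in binary."""
    selected, b = [], 1
    for a_n in alpha:
        if phi(b) == 1:
            selected.append(a_n)
        b = 2 * b + a_n
    return selected

# Example phi: read the last recorded bit, i.e. select a_n iff a_{n-1} = 1.
phi = lambda b: b & 1 if b > 1 else 0
print(church_select(phi, [0, 1, 1, 0, 1, 0, 0, 1]))
```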
Definition 9 (Copeland-admissible sequence). Let α = a_1, a_2, ..., a_n, ... be a sequence in {0,1}^ω. For all integers r, n, define the sequence
(r/n)α = a_r, a_{r+n}, ..., a_{r+kn}, ...
The sequence α is admissible (in the sense of Copeland) if the following are satisfied:

1. For all r, n, freq_1((r/n)α) is defined and equal to p.

2. (1/n)α, (2/n)α, ..., (n/n)α are independent numbers.

Note that Copeland remarks that this second item is a consequence of the assumption that the sequence is obtained by independent trials (i.e. "the probability of success is a constant and does not vary from one trial to the next"). Church notes the connection between this notion of normal numbers and that of "completely normal number" by Émile Borel [5]:

   These admissible numbers (to adopt Copeland's term) are closely related to the normal numbers of Borel – indeed an admissible number associated with the probability 1/2 is the same as a number entièrement normal to the base 2.

(Footnote: Today, one would rather use the terminology "computable". Independence here is understood in terms of probability theory, as is detailed in Copeland's paper in which he states that two numbers are independent if and only if p(x·y) = p(x)·p(y).)

2.5 Postnikov and Pyateskii

Around twenty years after Church's paper, a notion of
Bernoulli-normal sequences was introduced by Russian mathematicians, Postnikov and Pyateskii [13]. This notion coincides with the notion of p-distributed sequence defined above.

Definition 10 (Bernoulli-normal sequence). Let α ∈ {0,1}^ω be a sequence a_0, a_1, ..., a_n, ... and consider for every integer s > 0 the s-th caterpillar of α:
β_s = (a_0, ..., a_{s−1}), (a_1, ..., a_s), ..., (a_P, ..., a_{P+s−1}), ...
The sequence α is Bernoulli-normal if for any word w of length s with j ones, freq_w(β_s) exists and is equal to p^j (1−p)^{s−j}.

In subsequent work, Postnikov [12] considers the following alternative definition of admissible sequences. While this differs from Copeland's definition, we provide here a proof that the two notions coincide.

Definition 11 (Postnikov admissible sequence). Let α = a_1, a_2, ..., a_n, ... be a sequence in {0,1}^ω. This sequence is called admissible (in the sense of Postnikov) if for any word w of length m with r_1, r_2, ..., r_k the positions of its 1s, the sequence β[w] = b_1, b_2, ..., b_n, ..., defined by b_n = a_{nm+r_1} a_{nm+r_2} ··· a_{nm+r_k}, satisfies that freq_{1^k}(β[w]) exists and is equal to p^k.

Lemma 2.
A sequence α ∈ {0,1}^ω is admissible in the sense of Copeland if and only if it is admissible in the sense of Postnikov.

Proof. Let α be a Postnikov-admissible sequence. Let us define the word u^n_i = 0···010···0 of length n with a single 1 at position i ≤ n. Then freq_1(β[u^n_i]) exists and is equal to p. Since β[u^n_i] = (i/n)α, this proves α satisfies the first item in Copeland's definition. The second item, namely the independence of the sequences (1/n)α, (2/n)α, ..., (n/n)α, is obtained by considering words u^n_{i,j}, of length n with 1s exactly at the positions i and j. Indeed, we have freq_{11}(β[u^n_{i,j}]) = p² = freq_1(β[u^n_i]) freq_1(β[u^n_j]), which coincides with Copeland's formalisation of independence.

Conversely, let α be a sequence, w a word of length m and r_1, r_2, ..., r_k the positions of the 1s in w. If α is Copeland-admissible, freq_{1^k}(β[w]) = Π_{i=1}^k freq_1(β[u^m_{r_i}]) by the requirement of independence, and therefore
freq_{1^k}(β[w]) = Π_{i=1}^k freq_1((r_i/m)α) = p^k
using that β[u^m_{r_i}] = (r_i/m)α and the first property of Copeland-admissible sequences, namely that freq_1((i/n)α) = p for all i ≤ n.

Postnikov then shows how the two notions, i.e. Bernoulli-normality and admissibility, coincide. However, the proof of Postnikov's theorem is – to the authors' knowledge – not available in English. As this result is related to the proof of Agafonov's theorem, we expect to include a translation in a later version of this document.

Theorem 3 (Postnikov [12]). A sequence α ∈ {0,1}^ω is Bernoulli-normal if and only if it is admissible.

2.6 Postnikova

A few years later, a short and beautiful paper by Postnikova characterises Bernoulli-normal sequences as the sequences for which the distribution of 1s is preserved by selecting strategies depending only on a finite number of preceding bits. In fact, Postnikova's result is the first to introduce finiteness and opens wide the way to Agafonov's theorem.
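To make Definition 11 concrete, the following sketch (our code, using 0-based positions rather than the paper's indexing) builds the block sequence β[w]; with w = 10 it keeps the first symbol of every length-2 block, i.e. it extracts an arithmetic subsequence of the kind used in the proof of Lemma 2.

```python
def beta(alpha: str, w: str) -> list:
    """Blocks b_n = a_{n*m + r_1} ... a_{n*m + r_k}, where r_1 < ... < r_k
    are the (0-based) positions of the 1s of w and m = len(w)."""
    m = len(w)
    ones = [i for i, c in enumerate(w) if c == "1"]
    return ["".join(alpha[n * m + r] for r in ones)
            for n in range(len(alpha) // m)]

print(beta("01100111", "10"))  # keep the first symbol of every length-2 block
```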
Postnikova's characterisation is stated as a new, restricted, notion of kollektiv.
Definition 12 (Postnikova-kollektiv). Let α = a_1, a_2, ..., a_n, ··· ∈ {0,1}^ω be a sequence. The sequence α will be called a kollektiv (in the sense of Postnikova) if:

1. freq_1(α) exists and is equal to p;

2. for all words w of length s, w occurs in α an infinite number of times, and if a subsequence β is made up consisting of the values immediately following the appearances of w, then freq_1(β) exists and is equal to p.

Note that using our own definition of kollektiv w.r.t. sets of strategies, a Postnikova-kollektiv is a kollektiv w.r.t. the set of strategies defined by a single finite word used as postfix, i.e. strategies S_w defined as {v ∈ {0,1}* | ∃u, v = u·w}.

Theorem 4 (Postnikova). A sequence α ∈ {0,1}^ω is Bernoulli-normal if and only if it is a Postnikova-kollektiv.

The proof of this theorem can be found in the English translation [15] of Postnikova's paper [14]. Note the error in translation in the definition of Postnikov-admissible sequences: the translator mentions "the relative frequency of appearances of ones in the sequence (2)", while it should read the relative frequency of appearances of the word 1^k in the sequence (2). The confusion comes from the original Russian formulation (which can be traced back to Postnikov's work [12]) which is already ambiguous.

Agafonov's contribution was to relate this to the notion of automata. The main theorem of his original Russian paper [11] is stated as follows.
Theorem 5 (Agafonov [11]). A sequence α is normal if and only if it is a kollektiv w.r.t. the set of strategies computable by finite automata, i.e. it satisfies:

1. freq_1(α) exists and is equal to p;

2. for all automata M, the subsequence β consisting of the values immediately following the words accepted by M is such that freq_1(β) exists and is equal to p.

In fact, the proof of the implication from right to left in Agafonov's theorem is a consequence of Postnikova's theorem. Agafonov only refers to her work for this part of the proof. Indeed, if a sequence is a kollektiv in the sense of this theorem, it is also a Postnikova-kollektiv. Agafonov's contribution is therefore the proof of the converse implication, namely: if a sequence α is normal, it is a kollektiv w.r.t. the set of strategies computable by finite automata.

However, the notion of normality used by Agafonov is not the notion of p-distributed sequence (Definition 2), or equivalently of Bernoulli-normal sequence (Definition 10). Agafonov uses instead a notion of normality by blocks.

Definition 13 (Agafonov normal). Let α ∈ {0,1}^ω be a sequence a_0, a_1, ..., a_n, ... and consider for every integer s > 0 the s-th block sequence of α:
β_s = (a_0, ..., a_{s−1}), (a_s, ..., a_{2s−1}), ..., (a_{ks}, ..., a_{(k+1)s−1}), ...
The sequence α is Agafonov-normal if for any word w of length s with j ones, freq_w(β_s) exists and is equal to p^j (1−p)^{s−j}.

This definition can be shown to be equivalent to Postnikov-admissibility which, combined with Postnikov's theorem (Theorem 3), proves the notion coincides with the usual notion of normality.

Lemma 6.
A sequence α is Agafonov-normal if and only if it is Postnikov-admissible.

Proof. In fact, the proof of this appears in the proof of Postnikov's theorem, as Agafonov-normality is used as an intermediate notion. The proof of the right-to-left implication is taken from Postnikov's proof [12]. The key observation is that the quantity freq_{1^k}(β[w]) that appears in Postnikov-admissibility corresponds to the frequency of appearance of words in ∆ in the sequence of blocks defined from α, where ∆ is the set of words u of length k that have 1s at those positions in which w has 1s (but which may differ from w on other bits).

We first show that a Postnikov-admissible sequence α is Agafonov-normal. Let Σ be a set of all length-k words with a fixed bits equal to 1 and b fixed bits equal to 0, a + b ≤ k. Write T_l(Σ) for the number of occurrences of words of Σ among the first l blocks. Then by induction on b, using the definition of admissibility, we obtain:
lim_{l→∞} T_l(Σ)/l = p^a q^b.   (1)
This gives the result by fixing Σ to be a singleton, i.e. a + b = k.

Conversely, consider given an Agafonov-normal sequence. By definition, we know that the frequency of a word w of length k (with j bits equal to 1) is equal to p^j q^{k−j}. We want to sum this frequency over all words that have 1s at the same positions as w but in which some 0s may have become 1s, i.e. we have all combinations of putting 1s in k − j boxes. So the sum can be written as:
freq_{1^k}(β[w]) = p^j Σ_{i=0}^{k−j} (k−j choose i) p^i q^{k−j−i} = p^j (p + q)^{k−j} = p^j.

We fix once and for all the alphabet
Σ = {0,1}.

Definition 14.
Let a = a_1 a_2 ... a_n be a word over Σ. We define:
µ_p(a) = p^{#_1(a)} (1 − p)^{n − #_1(a)}
where #_1(a) denotes the number of occurrences of 1 in a.

Definition 15.
For M ⊆ {0,1}^N, we define µ_p(M) = Σ_{w∈M} µ_p(w).

Definition 16.
Let α = a_1, a_2, ..., a_n, ... be a sequence in {0,1}^ω. For every natural number n, we define the n-block decomposition of α as the sequence (α^{(n,r)})_{r>0} defined by
α^{(n,r)} = a_{n(r−1)+1} a_{n(r−1)+2} ... a_{nr}

Definition 17. Let α be a sequence in {0,1}^ω, w a finite word of length n, and k an integer. We define
freq_w(α; k) = (1/k) Card{r ≤ k | α^{(n,r)} = w}.
Notice that a sequence α is Agafonov-normal (Definition 13) if and only if for all finite words w of length n with j bits equal to 1, lim_{k→∞} freq_w(α; k) exists and is equal to p^j q^{n−j}.

Definition 18.
Let A be a strongly connected automaton with set of states Q. For all q ∈ Q, we write A_q for the automaton A in which the state q is chosen as initial.

Definition 19.
Let A be a strongly connected automaton with set of states Q, and q ∈ Q. Let w = w_1 w_2 ... w_n be a finite word. We write A_q[w] for the word picked out by the automaton A_q, i.e. the word w_{i_1} w_{i_2} ... w_{i_k} where i_1 < i_2 < ··· < i_k is the increasing sequence of indices j ≤ n such that w|≤j−1 is accepted by A_q.

Definition 20.
Let A be a strongly connected automaton with set of states Q. For all p ∈ [0,1], b ∈ [0,1], n ∈ ℕ and ε > 0, we define the sets:
D^p_n(b, ε) = {w ∈ {0,1}^n | ∀q ∈ Q, len(A_q[w]) > bn and |#_1(A_q[w])/len(A_q[w]) − p| < ε}

Claim 7.
For all ε > 0 and all p ∈ [0,1], lim_{n→∞} µ_p(D^p_n(b, ε)) = 1.

Proof. This claim is a consequence of Lemma 9 and Lemma 10 below, noting that D^p_n(b, ε) = {0,1}^n \ (E_n(b) ∪ G_n(b, ε)).

Theorem 8.
Let α be a normal sequence with ratio p ∈ [0,1], and A a strongly connected automaton. Then the sequence β = A[α] is normal with ratio p.

Proof. Write β = y_1 y_2 ···. We will show that for every ε > 0 there exists L_0 such that for all l > L_0, |(1/l) Σ_{i=1}^l y_i − p| < 2ε; since ε is arbitrary, this suffices.

Pick δ > 0 small enough (δ < bε). By Claim 7, we pick n ∈ ℕ such that µ_p(D^p_n(b, ε)) > 1 − δ. Now, we consider η < bε (i.e. sufficiently small); since α is normal, there exists S ∈ ℕ such that for all s > S and all a ∈ {0,1}^n, |freq_a(α; s) − µ_p(a)| < η/2^n, hence for all M ⊆ {0,1}^n, |freq_M(α; s) − µ_p(M)| < η.

We now consider the sequence β^{[n,r]} as the sequence of blocks of A[α] (of varying length between 0 and n) corresponding to the sequence of blocks α^{(n,r)}, and write θ for the frequency of 1s in the blocks picked out from the blocks in D^p_n(b, ε). Then |θ − p| < ε.

Now let L = Σ_{i=1}^s len(β^{[n,i]}) and ℓ = Σ_{i∈I} len(β^{[n,i]}) with I = {i ≤ s | α^{(n,i)} ∉ D^p_n(b, ε)}. We write θ = Σ_{i∉I} #_1(β^{[n,i]}) / Σ_{i∉I} len(β^{[n,i]}) and ρ = Σ_{i=1}^s #_1(β^{[n,i]}) / L. Then |ρ − θ| < ℓ/L.

We then show ℓ/L < ε and deduce that |ρ − p| < 2ε as follows. We consider a small enough δ > 0 and find S big enough to have
Card{i ≤ S | α^{(n,i)} ∈ D^p_n(b, ε)} / S > 1 − δ − η.
On the one hand, for all w ∈ D^p_n(b, ε) more than bn characters are picked out, therefore we have L > (1 − δ − η)Sbn. On the other hand, for all w ∈ {0,1}^n at most n characters are picked out and Card{i ≤ S | α^{(n,i)} ∉ D^p_n(b, ε)} / S < δ + η, thus ℓ < (δ + η)Sn. Hence ℓ/L < (δ + η)/((1 − η − δ)b) < ε.

Finally, |ρ − p| ≤ |ρ − θ| + |θ − p| < 2ε.

Lemma 9. Define E_n(b, q) = {w ∈ {0,1}^n | len(A_q[w]) ≤ bn}, and E_n(b) = ∪_{q∈Q} E_n(b, q). Then for all p ∈ [0,1] and all automata A, there exist c, d > 0 such that for all ε > 0, the following holds:
lim_{n→∞} µ_p(E_n((c − ε)/d)) = 0

Proof.
Let us consider the measure space (X, ℬ, µ_p) with X = {0,1}^ω, ℬ induced by cylinders, and µ_p({α | ∀j ∈ {1, 2, ..., n}, α_j = b_j}) = µ_p(b_1 b_2 ... b_n).

For a word v, define C(v) = {α ∈ {0,1}^ω | ∃β ∈ {0,1}^ω, α = v.β}. If R is a finite (prefix-free) set of words, then
µ_p(∪_{v∈R} C(v)) = µ_p(R).   (2)

Now, take A a finite automaton ({0,1}, Q, Q*, φ). This defines a Markov chain on the set of states Q:
p_{i,j} = 1      if φ(i,1) = φ(i,0) = j
p_{i,j} = p      if φ(i,1) = j and φ(i,0) ≠ j
p_{i,j} = 1 − p  if φ(i,0) = j and φ(i,1) ≠ j
p_{i,j} = 0      otherwise

If A is strongly connected, there exists a smallest n_{i,j} such that p^{(n_{i,j})}_{i,j} > 0. Define the period D as the least common multiple of the family (n_{i,j})_{i,j∈Q}. Let Q_0, Q_1, ..., Q_{D−1} be the classes of "periodical states". Given Q_r, we have a Markov chain with probabilities p^{(D)}_{i,j} for i, j ∈ Q_r. For all Q_r, there exists a family (c_i)_{i∈Q_r} such that Σ_{i∈Q_r} c_i = 1 and lim_{n→∞} p^{(Dn)}_{i,j} = c_j.

Consider q^{A_j}(α) = q_0 q_1 ... the realisation of the Markov process with α as input and initial state j ∈ Q. We have
µ_{p,A_j}({q^{A_j}(α) | α ∈ M}) = µ_p(M).   (3)
Let ν^{(n)}_i(q⃗) = Card{j ≤ n | q_j = i}. For all ε > 0 and all i, j,
lim_{n→∞} µ_{p,A_j}{q⃗ s.t. |(D/n) ν^{(n)}_i(q⃗) − c_i| > ε} = 0   (4)
by the law of large numbers for finite regular ergodic Markov chains.

From Equation (3) and Equation (4), we have
lim_{n→∞} µ_p{α s.t. |(D/n) ν^{(n)}_i(q^{A_j}(α)) − c_i| > ε} = 0
For a finite word a, write q^{A_j}(a) = q_0 q_1 ... q_n (n = len(a)). Using Equation (2),
lim_{n→∞} µ_p{a_1 a_2 ... a_n s.t. |(D/n) ν^{(n)}_i(q^{A_j}(a_1 a_2 ... a_n)) − c_i| > ε} = 0   (5)

If a state q_i in q^{A_j}(a) belongs to Q*, then A_j picks out a_{i+1} from a. Let c = min_{i∈Q*} c_i. From Equation (5), for all j ∈ Q, lim_{n→∞} µ_p(E_n((c − ε)/D, j)) = 0.

The lemma then follows from µ_p(E_n((c − ε)/D)) ≤ Σ_{j∈Q} µ_p(E_n((c − ε)/D, j)).

(Footnote: The precision that R be prefix-free is added by the authors.)

Lemma 10. Define G_n(b, ε, q) = {w ∈ {0,1}^n | len(A_q[w]) > bn, |#_1(A_q[w])/len(A_q[w]) − p| > ε}, and G_n(b, ε) = ∪_{q∈Q} G_n(b, ε, q). Then for all p, b, ε and all automata A, lim_{n→∞} µ_p(G_n(b, ε)) = 0.

Proof. (Similar to Lemma 3 from D.W. Loveland,
The Kleene hierarchy classification of recursively random sequences [10].)

By the strong law of large numbers, for all ε > 0,
lim_{n→∞} µ_p(∪_{ℓ>n} {y ∈ {0,1}^ℓ | |(1/ℓ) Σ_{i=1}^ℓ y_i − p| > ε}) = 0
Define F_n(b, ε) = ∪_{bn<ℓ≤n} {y ∈ {0,1}^ℓ | |(1/ℓ) Σ_{i=1}^ℓ y_i − p| > ε}. And define R_n(b, ε) as the set obtained from F_n(b, ε) by removing the words w such that there exists a word u in F_n(b, ε) with u ≺ w.

From the fact that
∪_{w∈R_n(b,ε)} C(w) ⊂ ∪_{ℓ>bn} {y | |(1/ℓ) Σ_{i=1}^ℓ y_i − p| > ε}
and µ_p(R_n(b, ε)) = µ_p(∪_{w∈R_n(b,ε)} C(w)) and the equation above, we deduce that lim_{n→∞} µ_p(R_n(b, ε)) = 0.

By Lemma 11 and the equality G_n(b, ε, q) = {w ∈ {0,1}^n | A_q[w] ∈ F_n(b, ε)}, we get that µ_p(G_n(b, ε, q)) ≤ µ_p(R_n(b, ε)). Consequently, lim_{n→∞} µ_p(G_n(b, ε, q)) = 0 for all q ∈ Q, hence lim_{n→∞} µ_p(G_n(b, ε)) = 0.

Lemma 11.
Let S be a strategy, and F a finite subset of {0,1}*. Let R be the set obtained from F by removing those words w such that there exists a word u ∈ F with u ≺ w (i.e. u is a proper prefix of w). Let M be the set {w ∈ {0,1}^n | S(w) ∈ F}. Then µ_p(M) ≤ µ_p(R).

Proof.
It is sufficient to prove that for a given word w = a_1 a_2 ... a_k the set M = {u ∈ {0,1}^n | w ⪯ S(u)} satisfies µ_p(M) ≤ µ_p(w). This is shown by induction on the length k of the word w.

The base case is w = a_1 = 1 (by symmetry – 1 becomes 0, p becomes 1 − p –, this is sufficient). Let α = x_1 x_2 ... x_n be a word in M, and write x_f for the first symbol picked out by S; in particular x_f = 1. Now, one can define ᾱ = x_1 x_2 ... x_{f−1} x̄_f x_{f+1} ... x_n, i.e. the word obtained from α by simply flipping the f-th bit. Then ᾱ ∉ M. One can then define the set M̄ = {ᾱ | α ∈ M}. As µ_p(ᾱ) = ((1−p)/p) µ_p(α) and ·̄ defines a one-to-one correspondence between M and M̄, we have µ_p(M̄) = ((1−p)/p) µ_p(M). Moreover, M and M̄ are disjoint subsets of {0,1}^n, hence µ_p(M) ≤ 1 − µ_p(M̄). We can then conclude from these two equations that µ_p(M) ≤ p.

Now, consider the word w = a_1 ... a_k a_{k+1} with a_{k+1} = 1. We have M = {α ∈ {0,1}^n | a_1 ... a_k a_{k+1} ⪯ S(α)}. Given α ∈ M, define ᾱ as the word obtained from α by flipping its (k+1)-th picked-out bit, i.e. ᾱ is the unique word obtained from α by flipping a single bit and such that a_1 ... a_k ⪯ S(ᾱ). Define M̄ as the set {ᾱ | α ∈ M}. Let N be the set {α ∈ {0,1}^n | a_1 ... a_k ⪯ S(α)}. Then N contains both M and M̄, and the latter two sets are disjoint. Moreover, the induction hypothesis implies that µ_p(N) ≤ µ_p(a_1 ... a_k). Hence µ_p(M) + µ_p(M̄) ≤ µ_p(a_1 ... a_k). Since µ_p(M̄) = ((1−p)/p) µ_p(M), we deduce that µ_p(M) ≤ p µ_p(a_1 ... a_k) = µ_p(a_1 ... a_k a_{k+1}).

We now give an embellished, modern account of Agafonov's proof; we have endeavoured to use pedagogical explanations and have extended the treatment to make the text more readily readable to the modern reader.
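The base case of Lemma 11, µ_p(M) ≤ p, can also be checked exhaustively for small n; the enumeration below (our code, with an arbitrarily chosen sample strategy) is only a numerical sanity check, not part of the proof.

```python
from itertools import product

def select(strategy, u: str) -> str:
    """S(u): the symbols of u whose preceding prefix satisfies the strategy."""
    return "".join(u[j] for j in range(len(u)) if strategy(u[:j]))

def mu(p: float, w: str) -> float:
    """Bernoulli measure of the word w."""
    return p ** w.count("1") * (1 - p) ** w.count("0")

p, n = 0.3, 6
strategy = lambda prefix: prefix.endswith("0")  # pick the symbol after each 0
# M: the words u of length n whose first selected symbol is 1 (case w = 1)
M = [u for u in ("".join(t) for t in product("01", repeat=n))
     if select(strategy, u).startswith("1")]
print(sum(mu(p, u) for u in M))  # at most p, as Lemma 11 predicts
```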
Definition 21. A finite-state selector over {0,1} is a deterministic finite automaton S = (Q, δ, q_s, Q_F) over {0,1}. A finite-state selector is strongly connected if its underlying directed graph (states are nodes, transitions are edges) is strongly connected. Denote by L(S) the language accepted by the automaton.

If α = a_1 a_2 ··· is a finite or right-infinite sequence over {0,1}, the subsequence selected by S is the (possibly empty) sequence of letters a_n such that the prefix a_1 ··· a_{n−1} ∈ L(S), that is, the automaton when started on the finite word a_1 ··· a_{n−1} in state q_s ends in an accepting state after having read the entire word.

For two words u, v, we write u ⪯ v if u is a prefix of v, and u ≺ v if u is a proper prefix of v.

Definition 22.
Let a = a_1 ··· a_n and b = b_1 ··· b_N be finite words over {0,1}. We denote by #_a(b) the number of occurrences of a in b, that is, the quantity
|{j : b_j b_{j+1} ··· b_{j+n−1} = a_1 a_2 ··· a_n}|

Definition 23.
Let a = a_1 a_2 ... a_n be a word over {0,1}, and p a probability distribution on {0,1}. We define:
µ_p(a) = Π_{i=1}^n p(a_i)
If M ⊆ {0,1}* is finite, we define µ_p(M) = Σ_{w∈M} µ_p(w) (and set µ_p(∅) = 0).

Definition 24.
Let α = x_1 x_2 ... x_n ··· be a sequence over {0,1}. We say that α is p-block-distributed if, for each n > 0 and every w ∈ {0,1}^n, the n-block decomposition (α^(n,r))_{r>0} of α satisfies:

lim_{k→∞} |{ i ≤ k : α^(n,i) = w }| / k = µ_p(w)

As already remarked above, this notion coincides with Agafonov-normality (Definition 13).

Remark. Like in Agafonov's original paper, for a finite-state selector A, we do not require that all cycles in the underlying directed graph of A contain at least one accepting state. This assumption is occasionally made in modern papers on Agafonov's Theorem to ensure that if w ∈ {0,1}^ω is a normal sequence, then A[w] is infinite as well. But just as in Agafonov's paper, the requirement turns out to be unnecessary (see Lemma 14).

However, in Agafonov's paper, the probability µ_p(1) of obtaining a 1 was assumed to satisfy 0 < µ_p(1) < 1 (i.e., both 0 and 1 occur with positive probability). Without this assumption, there are connected automata that fail to pick out infinite sequences from p-distributed ones. For example, define A = ({q_0, q_1}, {0,1}, δ, q_0, {q_1}) where

δ(q_0, 0) = q_0    δ(q_0, 1) = q_1
δ(q_1, 0) = q_0    δ(q_1, 1) = q_1

Define µ_p(0) = 1 and µ_p(1) = 0. Then, w = 10^ω is p-distributed, but A[w] = 0, hence is finite.
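The failure described in the remark can be checked by simulation. The sketch below is our illustration, using the transition table above; on a long prefix of w = 10^ω the automaton picks out only the single symbol 0.

```python
def select(delta, q0, accepting, word):
    # a symbol is selected iff the run on the preceding prefix is accepting
    out, q = [], q0
    for a in word:
        if q in accepting:
            out.append(a)
        q = delta[(q, a)]
    return "".join(out)

# The automaton A of the remark: the accepting state q1 is visited only
# once on w = 10^omega, so A[w] is the single-letter word "0".
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
prefix = "1" + "0" * 10_000      # a long prefix of 10^omega
print(select(delta, "q0", {"q1"}, prefix))  # prints "0"
```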
Motivated by Remark 1, we have the following definition:

Definition 25. A Bernoulli distribution p : {0,1} → [0,1] is said to be positive if, for all a ∈ {0,1}, p(a) > 0. The probability map µ_p : {0,1}* → [0,1] is positive if p is positive.

Proposition 12 (Finite-state selectors are compositional). Let A and B be DFAs over the same alphabet. Then there is a DFA C such that, for each sequence w, C[w] = B[A[w]].

Proof. Let A = (Q_A, {0,1}, δ_A, q_A, F_A) and B = (Q_B, {0,1}, δ_B, q_B, F_B). Define Q_C = Q_A × Q_B, and set q_C = (q_A, q_B) and F_C = F_A × F_B. For each q_B ∈ Q_B, define the set D_{q_B} = {(q, q_B) : q ∈ Q_A} ⊆ Q_C. Observe that Q_C = ⋃_{q_B∈Q_B} D_{q_B} and that for q_B, r_B ∈ Q_B with q_B ≠ r_B, we have D_{q_B} ∩ D_{r_B} = ∅; thus {D_{q_B} : q_B ∈ Q_B} is a partitioning of Q_C. Hence, the transition function δ_C of C may be defined separately on each subset D_{q_B}:

δ_C((q, q_B), a) = (r, q_B)   if q ∉ F_A and δ_A(q, a) = r
δ_C((q, q_B), a) = (r, r_B)   if q ∈ F_A and δ_A(q, a) = r and δ_B(q_B, a) = r_B

Thus, when C processes its input, it freezes the current state q_B of B (the freezing is represented by staying within D_{q_B}) and simulates A until an accepting state of A is reached (i.e., just before A would select the next symbol); on the next transition, C unfreezes the current state of B, moves it to the next state r_B of B, and then freezes it again and continues the simulation of A.

Observe that a symbol is picked out by C iff the current state is an element of F_C = F_A × F_B iff the symbol is the next symbol read after the simulation of A reaches an accepting state of A while the current frozen state of B is an accepting state of B.
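The freezing construction in the proof of Proposition 12 can be sketched directly; all helper names below are ours. The product automaton advances B's component exactly on the symbols that A selects, and the sketch checks C[w] = B[A[w]] exhaustively on short words.

```python
from itertools import product

def select(delta, q0, accepting, word):
    out, q = [], q0
    for a in word:
        if q in accepting:
            out.append(a)
        q = delta[(q, a)]
    return "".join(out)

def compose(QA, dA, qA0, FA, QB, dB, qB0, FB, alphabet="01"):
    """Product DFA C with C[w] = B[A[w]]: B's state is frozen while A is
    simulated, and advances only when A's current state is accepting."""
    dC = {}
    for qa, qb in product(QA, QB):
        for a in alphabet:
            ra = dA[(qa, a)]
            rb = dB[(qb, a)] if qa in FA else qb  # unfreeze B on selected symbols
            dC[((qa, qb), a)] = (ra, rb)
    return dC, (qA0, qB0), {(qa, qb) for qa in FA for qb in FB}

# A: select the symbol after each 1; B: select every second symbol.
dA = {("a0", "0"): "a0", ("a0", "1"): "a1",
      ("a1", "0"): "a0", ("a1", "1"): "a1"}
dB = {("b0", "0"): "b1", ("b0", "1"): "b1",
      ("b1", "0"): "b0", ("b1", "1"): "b0"}
dC, qC0, FC = compose({"a0", "a1"}, dA, "a0", {"a1"},
                      {"b0", "b1"}, dB, "b0", {"b1"})
for bits in product("01", repeat=10):
    w = "".join(bits)
    assert select(dC, qC0, FC, w) == \
        select(dB, "b0", {"b1"}, select(dA, "a0", {"a1"}, w))
```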
The following shows that, to prove that p-distributedness is preserved under finite-state selection, it suffices to prove that the limiting frequency of each a ∈ {0,1} exists and is equal to p(a).

Lemma 13. Let α be a p-distributed sequence. The following are equivalent:

• For all connected DFAs A, A[α] is p-distributed.
• For all connected DFAs A and all a ∈ {0,1}, the limiting frequency of a in A[α] exists and is equal to p(a).

Proof. If, for all A, A[α] is p-distributed, then in particular the limiting frequency of a in A[α] exists and is equal to p(a) for all A.

Conversely, suppose that, for all DFAs A and all a ∈ {0,1}, the limiting frequency of a in A[α] exists and is equal to p(a). We prove by induction on k ≥ 0 that the limiting frequency of every v_1 ··· v_k v_{k+1} ∈ {0,1}^{k+1} exists and equals p(v_1 ··· v_k v_{k+1}).

• k = 0: This is the supposition.

• k ≥ 1: Suppose that the result has been proved for k − 1. Let v_1 ··· v_k ∈ {0,1}^k; by the induction hypothesis, the limiting frequency of v_1 ··· v_k in A[w] is p(v_1 ··· v_k). We claim that there is a strongly connected DFA B that, from any sequence, selects the symbol after each occurrence of v_1 ··· v_k. To see that such a DFA exists, let there be a state for each element of {0,1}^k and regard the state as the current length-k string in a "sliding window" that moves over w one symbol at a time; when the window is moved one step, the DFA transits to the state representing the new length-k string in the window, i.e., from the state representing the word w_1 ··· w_k, there are transitions to w_2 ··· w_k 0 and w_2 ··· w_k 1. It is easy to see that each state is reachable from every other state in at most k transitions. The unique final state of B is the state representing v_1 ··· v_k; the start state of B can be chosen to be any state representing a string w_1 ··· w_k such that there are exactly k transitions to the final state.

By Proposition 12, there is a connected DFA C such that C[w] = B[A[w]]. For any a ∈ {0,1} and any sufficiently large positive integer N, we have

(a)_{C[w_{≤N}]} / |C[w_{≤N}]| = (a)_{B[A[w_{≤N}]]} / |B[A[w_{≤N}]]| = (v_1···v_k a)_{A[w_{≤N}]} / (v_1···v_k)_{A[w_{≤N}]}

As C is connected, there is a real number b with 0 < b ≤ 1 such that C selects at least bN symbols from w_{≤N}, and by the supposition applied to C, for every ǫ > 0, there is an M such that for all N > M/b,

| (a)_{C[w_{≤N}]} / |C[w_{≤N}]| − p(a) | < ǫ

and hence | (v_1···v_k a)_{A[w_{≤N}]} / (v_1···v_k)_{A[w_{≤N}]} − p(a) | < ǫ. But for all sufficiently large N, the induction hypothesis furnishes

| (v_1···v_k)_{A[w_{≤N}]} / |A[w_{≤N}]| − p(v_1···v_k) | < ǫ

But as

(v_1···v_k a)_{A[w_{≤N}]} / |A[w_{≤N}]| = ( (v_1···v_k a)_{A[w_{≤N}]} / (v_1···v_k)_{A[w_{≤N}]} ) · ( (v_1···v_k)_{A[w_{≤N}]} / |A[w_{≤N}]| )

we hence have (as p(v_1···v_k) p(a) = p(v_1···v_k a) because p is Bernoulli):

| (v_1···v_k a)_{A[w_{≤N}]} / |A[w_{≤N}]| − p(v_1···v_k a) | < ǫ² + ǫ ( (v_1···v_k a)_{A[w_{≤N}]} / (v_1···v_k)_{A[w_{≤N}]} + (v_1···v_k)_{A[w_{≤N}]} / |A[w_{≤N}]| ) ≤ ǫ² + 2ǫ

As ǫ was arbitrary, for each a ∈ {0,1} the limiting frequency of v_1 ··· v_k a in A[w] exists and equals p(v_1 ··· v_k a), as desired.
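The "sliding window" automaton B used in the proof can be built mechanically; the sketch below is our illustration. It has one state per word in {0,1}^k and, with a start state that cannot spuriously complete the pattern on the chosen input, selects precisely the symbol following each occurrence of the pattern.

```python
from itertools import product

def select(delta, q0, accepting, word):
    out, q = [], q0
    for a in word:
        if q in accepting:
            out.append(a)
        q = delta[(q, a)]
    return "".join(out)

def window_dfa(v):
    """States are the words of {0,1}^k; from state w1...wk the DFA moves
    to w2...wk b on input b. Unique accepting state: the pattern v itself.
    Start state 0^k (assumed distinct from v here)."""
    k = len(v)
    delta = {(w, b): w[1:] + b
             for w in ("".join(t) for t in product("01", repeat=k))
             for b in "01"}
    return delta, "0" * k, {v}

delta, start, acc = window_dfa("11")
w = "0010110111100101"
# selected symbols = the symbol following each occurrence of "11"
expected = "".join(w[i + 2] for i in range(len(w) - 2) if w[i:i + 2] == "11")
assert select(delta, start, acc, w) == expected
```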
Definition 26. A strategy S is a predicate over the set of finite words, i.e., S ⊆ {0,1}*. Given a strategy S and a right-infinite sequence x in {0,1}^ω, we define the sequence S(x) as follows. Let i_1, i_2, ..., i_k, ... be the (increasing) sequence of indices j such that x_1 ··· x_{j−1} ∈ S; then S(x) = x_{i_1} x_{i_2} ···. (For a finite word w, S(w) is defined in the same way.)

Definition 27. Let A = (Q, {0,1}, δ, q_0, F) be a connected DFA. For all q ∈ Q, we denote by A_q the automaton (Q, {0,1}, δ, q, F), i.e., the automaton where the state q is chosen as the initial state.

Definition 28. Let A = (Q, {0,1}, δ, q_0, F) be a connected DFA, and let q ∈ Q. Let α be a right-infinite sequence over {0,1}. We denote by A_q[α] the subsequence ᾱ of α picked out by A_q; that is, α_i occurs in ᾱ if and only if the run of A_q on α_1 ··· α_{i−1} ends in an accepting state.

Definition 29.
Let A = (Q, {0,1}, δ, q_0, F) be a strongly connected DFA. For all p ∈ [0,1], b ∈ [0,1], n ∈ N and ǫ > 0, we define sets D_n^p(b, ǫ), E_n(b, q) and G_n(b, ǫ, q) as follows:

D_n^p(b, ǫ, q) = { w ∈ {0,1}^n : |A_q[w]| > bn and max_{a∈{0,1}} | (a)_{A_q[w]} / |A_q[w]| − p(a) | < ǫ }   (6)

D_n^p(b, ǫ) = ⋂_{q∈Q} D_n^p(b, ǫ, q)   (7)

E_n(b, q) = { w ∈ {0,1}^n : |A_q[w]| ≤ bn }   (8)

E_n(b) = ⋃_{q∈Q} E_n(b, q)   (9)

G_n(b, ǫ, q) = { w ∈ {0,1}^n : |A_q[w]| > bn and max_{a∈{0,1}} | (a)_{A_q[w]} / |A_q[w]| − p(a) | ≥ ǫ }   (10)

G_n(b, ǫ) = ⋃_{q∈Q} G_n(b, ǫ, q)   (11)

Observe that, for all b, n, ǫ, we have {0,1}^n = E_n(b) ∪ D_n^p(b, ǫ) ∪ G_n(b, ǫ) (but E_n(b) and G_n(b, ǫ) are not necessarily disjoint).

Lemma 14. Let A = (Q, {0,1}, δ, q_0, F) be strongly connected and let p be a positive Bernoulli distribution. Then there exist real numbers c, d > 0 such that for all real numbers ǫ > 0:

lim_{n→∞} µ_p(E_n((c − ǫ)d)) = 0

Proof. The DFA A induces a stochastic |Q| × |Q| matrix P by setting

P_ij = ∑_{a∈{0,1}} µ_p(a) · [δ(i, a) = j].

Note in particular that P_ij = 0 iff there are no transitions from i to j on a symbol a ∈ {0,1} with µ_p(a) > 0. As A is strongly connected, there exists a path from state i to state j for each i, j ∈ Q, and as p is a positive Bernoulli distribution, we have µ_p(a) = p(a) > 0 for each a, whence for each i, j there is an integer n_ij such that P^{n_ij}_ij > 0; that is, P (and its associated Markov chains) is irreducible. As all states of a finite Markov chain with an irreducible transition matrix are positive recurrent, standard results (see, e.g., [19, Thm.
54]) yield that there is a unique positive stationary distribution π : Q → [0,1] (i.e., for all i ∈ Q, π(i) > 0 and π(i) = ∑_{j∈Q} π(j) P_ji). Furthermore, the expected return time M_i to state i satisfies M_i = 1/π(i) [19, Thm. 54].

Let (X_n)_{n≥0} = (X_0, X_1, ...) be a Markov chain with transition matrix P and some initial distribution λ on the states. Consider, for each i ∈ Q, the random variable V_i(n) counting the visits to i among the first n steps:

V_i(n) = ∑_{k=0}^{n−1} [X_k = i]

As P is irreducible, the Ergodic Theorem for Markov chains (see, e.g., [19, Thm. 75]) yields that

lim_{n→∞} Pr( | V_i(n)/n − π(i) | ≥ ǫ ) = lim_{n→∞} Pr( | V_i(n)/n − 1/M_i | ≥ ǫ ) = 0   (12)

Let α ∈ {0,1}^n and let q_j^A(α) = q_0 ··· q_{n−1} be the sequence of states visited when A is given α as input starting from state j (i.e., q_0 = j). Observe that the probability of observing the state sequence q_0 ··· q_{n−1} in a Markov chain with transition matrix P started in j is

Pr(q_0 ··· q_{n−1}) = µ_p({ α : q_j^A(α) = q_0 ··· q_{n−1} }),

and thus:

Pr( { q_0 ··· q_{n−1} : | (∑_{k=0}^{n−1} [q_k = i])/n − π(i) | ≥ ǫ } )   (13)

= µ_p( { α : q_j^A(α) = q_0 ··· q_{n−1} with | (∑_{k=0}^{n−1} [q_k = i])/n − π(i) | ≥ ǫ } )   (14)

= µ_p( { α : | V_i(n)/n − π(i) | ≥ ǫ } )   (15)

where, in (15), V_i(n) denotes the number of visits to i in the run of A on α started in j. Hence, by (Equation (12)) and the above, we have

lim_{n→∞} µ_p( { α : | V_i(n)/n − π(i) | ≥ ǫ } ) = 0   (16)

If q_j^A(w) = q_0 ··· q_{n−1} and one of the states q_i ∈ {q_0, . . .
, q_{n−1}} is an element of F, then A_j picks out the symbol of w read immediately after that state; in particular, |A_j[w]| is at least the number of visits of the run to states of F. Set c = min_{q∈F} π(q). Then, for all j ∈ Q, (Equation (16)) yields that lim_{n→∞} µ_p(E_n(c − ǫ, j)) = 0. The result now follows (taking d = 1) from µ_p(E_n(c − ǫ)) ≤ ∑_{j∈Q} µ_p(E_n(c − ǫ, j)).

Remark. In Lemma 14, the assumption that the DFA A is strongly connected can be omitted if we make the assumption that every cycle of A contains an accepting state. Let k be the maximal number of non-accepting states in any path in A from one accepting state to another that does not contain any other accepting states than the start and end states of the path. As every cycle of A contains an accepting state, k is well-defined. If w = w_1 w_2 ··· ∈ {0,1}^ω and A[w] is infinite, then, by construction, |A_q[w_1 ··· w_n]| ≥ d where n = d(k+1) + r and 0 ≤ r < k+1. As d = (n − r)/(k+1) > n/(k+1) − 1 ≥ n/(k+2) for n ≥ (k+1)(k+2), we have |A_q[w_1 ··· w_n]| > n/(k+2). Hence, for n ≥ (k+1)(k+2), E_n(1/(k+2), q) = ∅, and thus µ_p(E_n(1/(k+2), q)) = 0; setting c = 1 and d = 1/(k+2) then proves the lemma, since E_n((1 − ǫ)d) ⊆ E_n(d).

The assumption that every cycle of A contains an accepting state is occasionally made in the modern literature on Agafonov's Theorem, e.g. [4]. The reason for not making the assumption here is that it is unnecessary for strongly connected automata.
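The stochastic matrix P and stationary distribution π from the proof of Lemma 14 can be computed numerically; the sketch below (ours) uses plain power iteration on a small strongly connected example, for which the exact stationary distribution is (1/3, 2/3).

```python
def induced_matrix(states, delta, p):
    """P_ij = sum of p(a) over symbols a with delta(i, a) = j
    (the matrix defined in the proof of Lemma 14)."""
    idx = {s: k for k, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for i in states:
        for a, pa in p.items():
            P[idx[i]][idx[delta[(i, a)]]] += pa
    return P

def stationary(P, iters=200):
    """Power iteration pi <- pi P; converges for this irreducible,
    aperiodic example."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Strongly connected two-state DFA with p(0) = p(1) = 1/2.
delta = {("q0", "0"): "q1", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
P = induced_matrix(["q0", "q1"], delta, {"0": 0.5, "1": 0.5})
pi = stationary(P)   # exact answer: (1/3, 2/3)
assert abs(pi[0] - 1/3) < 1e-9 and abs(pi[1] - 2/3) < 1e-9
```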
Lemma 15. Let S be a strategy, and let F be a finite subset of {0,1}*. Let R = F \ {w : ∃u ∈ F. u ≺ w} be the set obtained from F by removing the words w that already have a proper prefix in F. Define, for each positive integer n, the set M_n = {w ∈ {0,1}^n : S(w) ∈ F}. Then, µ_p(M_n) ≤ µ_p(R).

Proof. Observe that

M_n = ⋃_{u∈R} { w : S(w) ∈ F ∧ u ⪯ S(w) }

and thus

µ_p(M_n) = ∑_{u∈R} µ_p({ w : S(w) ∈ F ∧ u ⪯ S(w) }) ≤ ∑_{u∈R} µ_p({ w : u ⪯ S(w) })

Thus, if, for any word u = a_1 a_2 ··· a_k, the set M_u = { w ∈ {0,1}^n : u ⪯ S(w) } satisfies µ_p(M_u) ≤ µ_p(u), it follows that

µ_p(M_n) ≤ ∑_{u∈R} µ_p({ w : u ⪯ S(w) }) ≤ ∑_{u∈R} µ_p(u) = µ_p(R)

as desired. We thus proceed to prove µ_p(M_u) ≤ µ_p(u) by induction on k = |u|.

• Base case: u = a ∈ {0,1}, so µ_p(u) = µ_p(a) = p(a). Let α = x_1 x_2 ... x_n be a word in M_u and let x_f ∈ {0,1} be the first symbol selected by S when applied to α; as α ∈ M_u, we have x_f = a. Now, for each b ∈ {0,1} \ {x_f}, define ᾱ_b = x_1 x_2 ··· x_{f−1} b x_{f+1} ... x_n; that is, ᾱ_b is the word obtained from α by changing the f-th symbol to b. Then, ᾱ_b ∉ M_u. We define the set M̄_u = { ᾱ_b : α ∈ M_u, b ∈ {0,1} \ {a} }. Observe that µ_p(ᾱ_b) = µ_p(α) p(b)/p(a), and hence:

µ_p(M̄_u) = ∑_{α∈M_u} ∑_{b∈{0,1}\{a}} µ_p(α) p(b)/p(a) = ∑_{α∈M_u} ( µ_p(α)/p(a) ) ∑_{b∈{0,1}\{a}} p(b) = ( (1 − p(a))/p(a) ) ∑_{α∈M_u} µ_p(α) = ( (1 − p(a))/p(a) ) µ_p(M_u)

As ᾱ_b ∉ M_u for any b ∈ {0,1} \ {a}, we have M_u ∩ M̄_u = ∅, whence µ_p(M_u) + µ_p(M̄_u) ≤ µ_p({0,1}^n) = 1, and therefore µ_p(M_u) ≤ 1 − µ_p(M̄_u). Thus,

µ_p(M_u) ≤ 1 − µ_p(M_u)(1 − p(a))/p(a),

that is,

µ_p(M_u) ≤ 1 / ( 1 + (1 − p(a))/p(a) ) = p(a) = µ_p(u)

as desired.

• Inductive case: u = a_1 a_2 ... a_k a_{k+1} with a_{k+1} = a for some a ∈ {0,1}. We have M_u = { α ∈ {0,1}^n : a_1 ··· a_k a ⪯ S(α) }. Given α ∈ M_u, let, for each b ∈ {0,1} \ {a}, ᾱ_b be the word obtained from α by changing the (k+1)-th symbol selected by S to b. Observe that ¬(u ⪯ S(ᾱ_b)).
Define M̄_u to be the set { ᾱ_b : α ∈ M_u, b ∈ {0,1} \ {a} }, and note that M_u ∩ M̄_u = ∅, and that µ_p(ᾱ_b) = µ_p(α) p(b)/p(a) and thus, as above, µ_p(M̄_u) = µ_p(M_u)(1 − p(a))/p(a).

Let N_u be the set { α ∈ {0,1}^n : a_1 ... a_k ⪯ S(α) }. Then N_u contains as subsets both M_u and M̄_u, whence µ_p(M_u) + µ_p(M̄_u) ≤ µ_p(N_u). The induction hypothesis furnishes that µ_p(N_u) ≤ µ_p(a_1 a_2 ... a_k), and thus µ_p(M_u) + µ_p(M̄_u) ≤ µ_p(a_1 a_2 ... a_k). As µ_p(M̄_u) = µ_p(M_u)(1 − p(a))/p(a), we deduce that

µ_p(M_u) ≤ µ_p(a_1 a_2 ... a_k) − µ_p(M̄_u) = µ_p(a_1 a_2 ... a_k) − µ_p(M_u)(1 − p(a))/p(a)

and thus that

µ_p(M_u) ≤ µ_p(a_1 a_2 ... a_k) / ( 1 + (1 − p(a))/p(a) ) = µ_p(a_1 a_2 ... a_k) p(a) = µ_p(a_1 a_2 ... a_k a)

as desired.
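The key estimate µ_p(M_u) ≤ µ_p(u) in the proof of Lemma 15 can be verified exhaustively for small parameters. In the sketch below (our illustration), the strategy S is the set of words ending in 1, i.e., the symbol after each 1 is selected.

```python
from itertools import product

def selected(w):
    # strategy S = words ending in "1": select the symbol after each 1
    return "".join(w[i] for i in range(1, len(w)) if w[i - 1] == "1")

def mu(p1, w):
    # Bernoulli measure of a word, with p(1) = p1 and p(0) = 1 - p1
    out = 1.0
    for c in w:
        out *= p1 if c == "1" else 1.0 - p1
    return out

p1, n, u = 0.3, 10, "01"
# mu_p(M_u) for M_u = { w in {0,1}^n : u is a prefix of S(w) }
m_u = sum(mu(p1, w) for w in ("".join(t) for t in product("01", repeat=n))
          if selected(w).startswith(u))
assert m_u <= mu(p1, u) + 1e-12   # mu_p(M_u) <= mu_p(u) = 0.7 * 0.3
```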
Lemma 16. Let S be a strategy, a ∈ {0,1}, and let b, ǫ be real numbers with 0 < b ≤ 1 and ǫ > 0, and define, for all positive integers n:

H_n(b, ǫ) = { w ∈ {0,1}^n : |S(w)| > bn ∧ | p(a) − (a)_{S(w)} / |S(w)| | ≥ ǫ } = ⋃_{bn<ℓ≤n} { w ∈ {0,1}^n : S(w) ∈ {0,1}^ℓ ∧ | p(a) − (a)_{S(w)} / ℓ | ≥ ǫ }

Then:

lim_{n→∞} µ_p(H_n(b, ǫ)) = 0

Proof. Define

F_n(b, ǫ) = ⋃_{bn<ℓ≤n} { y ∈ {0,1}^ℓ : | p(a) − (a)_y / ℓ | ≥ ǫ }

Observe that H_n(b, ǫ) = { w ∈ {0,1}^n : S(w) ∈ F_n(b, ǫ) }. Let R_n(b, ǫ) ⊆ {0,1}^{≤n} be the set obtained by removing from F_n(b, ǫ) all w such that there is u ∈ F_n(b, ǫ) with u ≺ w (i.e., remove all words from F_n(b, ǫ) that already have a proper prefix in F_n(b, ǫ)). Lemma 15 yields that µ_p(H_n(b, ǫ)) ≤ µ_p(R_n(b, ǫ)), and it thus suffices to prove that lim_{n→∞} µ_p(R_n(b, ǫ)) = 0.

Consider the stochastic variable X_a that is 1 when a is picked from {0,1} with probability p(a), and 0 otherwise. Then, the mean of X_a is p(a) and the variance of X_a is p(a)(1 − p(a)). Now consider performing ℓ ≥ 1 independent Bernoulli trials drawn according to X_a. Define q(1) = p(a), q(0) = 1 − p(a), and q(1c) = p(a) q(c) and q(0c) = (1 − p(a)) q(c) for c ∈ {0,1}^+, and consider the resulting probability distribution q̄ : {0,1}^ℓ → [0,1] on {0,1}^ℓ. Now, for any v ∈ {0,1}^ℓ, q̄(v) is the probability of obtaining v by performing ℓ repeated Bernoulli trials as above.

Define the stochastic variable X_a^ℓ = X_a + X_a + ··· + X_a (ℓ independent trials). Then, X_a^ℓ counts the number of occurrences of a in ℓ Bernoulli trials as above. By Chebyshev's inequality, X_a^ℓ satisfies:

Pr( | p(a) − X_a^ℓ/ℓ | ≥ ǫ ) ≤ p(a)(1 − p(a)) / (ℓ ǫ²)   (17)

Define the map g : {0,1} → {0,1} by g(a) = 1 and g(b) = 0 for all b ∈ {0,1} \ {a}; g extends homomorphically to a map g̃ : {0,1}^ℓ → {0,1}^ℓ by setting g̃(c_1 c_2 ··· c_ℓ) = g(c_1) g(c_2) ··· g(c_ℓ). Observe that, for any y ∈ {0,1}^ℓ, we have:

| p(a) − (1)_{g̃(y)}/ℓ | ≥ ǫ   iff   | p(a) − (a)_y/ℓ | ≥ ǫ   (18)

For any u ∈ {0,1}^ℓ,

q̄(u) = p(a)^{(1)_u} (1 − p(a))^{ℓ−(1)_u} = ∑_{y∈{0,1}^ℓ : g̃(y)=u} µ_p(y) = µ_p({ y ∈ {0,1}^ℓ : g̃(y) = u })

Hence, for any event U ⊆ {0,1}^ℓ, we have:

Pr(U) = ∑_{u∈U} q̄(u) = ∑_{u∈U} µ_p({ y ∈ {0,1}^ℓ : g̃(y) = u }) = µ_p({ y ∈ {0,1}^ℓ : g̃(y) ∈ U })   (19)

The event | p(a) − X_a^ℓ/ℓ | ≥ ǫ is shorthand for the set

{ u ∈ {0,1}^ℓ : | p(a) − (∑_{j=1}^ℓ u_j)/ℓ | ≥ ǫ } = { u ∈ {0,1}^ℓ : | p(a) − (1)_u/ℓ | ≥ ǫ }

We thus obtain:

Pr( | p(a) − X_a^ℓ/ℓ | ≥ ǫ ) = Pr( { u ∈ {0,1}^ℓ : | p(a) − (1)_u/ℓ | ≥ ǫ } )
= µ_p( { y ∈ {0,1}^ℓ : | p(a) − (1)_{g̃(y)}/ℓ | ≥ ǫ } )   by (Equation (19))
= µ_p( { y ∈ {0,1}^ℓ : | p(a) − (a)_y/ℓ | ≥ ǫ } )   by (Equation (18))   (20)

Observe that:

µ_p(R_n(b, ǫ)) = µ_p( ⋃_{bn<ℓ≤n} { y ∈ {0,1}^ℓ ∩ R_n(b, ǫ) : | p(a) − (a)_y/ℓ | ≥ ǫ } ) = ∑_{bn<ℓ≤n} µ_p( { y ∈ {0,1}^ℓ ∩ R_n(b, ǫ) : | p(a) − (a)_y/ℓ | ≥ ǫ } )   (21)

But as µ_p(a_1 ··· a_ℓ) ≥ µ_p(a_1 ··· a_ℓ a_{ℓ+1}) for any a_1, ..., a_ℓ, a_{ℓ+1} ∈ {0,1}, and as no element of R_n(b, ǫ) is a prefix of any other element, we have

∑_{bn<ℓ≤n} µ_p( { y ∈ {0,1}^ℓ ∩ R_n(b, ǫ) : | p(a) − (a)_y/ℓ | ≥ ǫ } ) ≤ µ_p( { y ∈ {0,1}^{⌊bn⌋} : | p(a) − (a)_y/⌊bn⌋ | ≥ ǫ } )   (22)

We thus have:

µ_p(R_n(b, ǫ)) ≤ µ_p( { y ∈ {0,1}^{⌊bn⌋} : | p(a) − (a)_y/⌊bn⌋ | ≥ ǫ } )   by (eq. (21)) and (eq. (22))
= Pr( | p(a) − X_a^{⌊bn⌋}/⌊bn⌋ | ≥ ǫ )
by (Equation (20))
≤ p(a)(1 − p(a)) / (⌊bn⌋ ǫ²)   by (Equation (17))

Thus, lim_{n→∞} µ_p(R_n(b, ǫ)) = 0, as desired.

Corollary 16.1. Let b, ǫ be real numbers with 0 < b ≤ 1 and ǫ > 0. Then,

lim_{n→∞} µ_p(G_n(b, ǫ)) = 0

Proof. By Lemma 16 with S = A_q, we obtain lim_{n→∞} µ_p(G_n(b, ǫ, q)) = 0, and as G_n(b, ǫ) = ⋃_{q∈Q} G_n(b, ǫ, q), we have µ_p(G_n(b, ǫ)) ≤ ∑_{q∈Q} µ_p(G_n(b, ǫ, q)). As Q is finite, we hence obtain lim_{n→∞} µ_p(G_n(b, ǫ)) = 0.

Lemma 17. There is a real number b with 0 < b ≤ 1 such that for all ǫ > 0, lim_{n→∞} µ_p(D_n^p(b, ǫ)) = 1.

Proof. Observe that, for all b with 0 < b ≤ 1:

{0,1}^n \ D_n^p(b, ǫ) = { w ∈ {0,1}^n : ∃q ∈ Q. |A_q[w]| ≤ bn } ∪ { w ∈ {0,1}^n : ∃q ∈ Q. |A_q[w]| > bn ∧ max_{a∈{0,1}} | (a)_{A_q[w]} / |A_q[w]| − p(a) | ≥ ǫ } = ⋃_{q∈Q} E_n(b, q) ∪ ⋃_{q∈Q} G_n(b, ǫ, q)

and thus,

µ_p({0,1}^n \ D_n^p(b, ǫ)) ≤ µ_p( ⋃_{q∈Q} E_n(b, q) ) + µ_p( ⋃_{q∈Q} G_n(b, ǫ, q) ) = µ_p(E_n(b)) + µ_p(G_n(b, ǫ))

Choose, by Lemma 14, real numbers c, d such that lim_{n→∞} µ_p(E_n((c − ǫ)d)) = 0, and set b = (c − ǫ)d. By Corollary 16.1, we obtain lim_{n→∞} µ_p(G_n(b, ǫ)) = 0, and thus lim_{n→∞} µ_p({0,1}^n \ D_n^p(b, ǫ)) = 0. The result now follows from µ_p(D_n^p(b, ǫ)) = 1 − µ_p({0,1}^n \ D_n^p(b, ǫ)).
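The Chebyshev estimate (17) used in the proof of Lemma 16 can be checked exactly for moderate ℓ by summing binomial probabilities; the sketch below is ours.

```python
from math import comb

def tail_prob(p, ell, eps):
    """Exact Pr(|X/ell - p| >= eps) for X ~ Binomial(ell, p)."""
    return sum(comb(ell, k) * p**k * (1 - p)**(ell - k)
               for k in range(ell + 1) if abs(k / ell - p) >= eps)

p, ell, eps = 0.3, 200, 0.1
bound = p * (1 - p) / (ell * eps**2)   # right-hand side of (17): 0.105
assert tail_prob(p, ell, eps) <= bound  # exact tail is far below the bound
```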
Theorem 18. Let α be a p-block-distributed right-infinite sequence, and A a strongly connected DFA. Then the sequence β = A[α] is p-distributed.

Proof. By Lemma 13 it suffices to show that, for all a ∈ {0,1}, the limiting frequency of a in A[α] exists and is equal to p(a). Let ǫ be any real number with 0 < ǫ < 1. By Lemma 17, pick a real number b with 0 < b ≤ 1 such that for all ǫ′ > 0, lim_{n→∞} µ_p(D_n^p(b, ǫ′)) = 1. Choose δ > 0 with δ < bǫ/8, and pick n ∈ N such that µ_p(D_n^p(b, ǫ/2)) > 1 − δ.

Consider the sequence (β^(n,r)) of blocks of A[α] corresponding to the sequence of blocks (α^(n,r)); that is, β^(n,r) is the sequence of symbols picked out from the block α^(n,r) when A is applied to α. Note that each β^(n,r) has length between 0 and n.

For each positive integer m, define L_m = ∑_{i=1}^m |β^(n,i)|, and for each a ∈ {0,1}, write

ρ_a^m = ( ∑_{i=1}^m (a)_{β^(n,i)} ) / L_m

Observe that, to prove the theorem, it suffices to show, for all sufficiently large m, that |ρ_a^m − p(a)| < ǫ.

Furthermore, set I_m = { i ≤ m : α^(n,i) ∉ D_n^p(b, ǫ/2) }, and set ℓ_m = ∑_{i∈I_m} |β^(n,i)|. Now, define θ_a^m by:

θ_a^m = ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / ( ∑_{i∈{1,...,m}\I_m} |β^(n,i)| ) = ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / (L_m − ℓ_m)

That is, θ_a^m is the frequency of occurrences of a when the blocks β^(n,i) picked out from blocks α^(n,i) ∈ D_n^p(b, ǫ/2) are concatenated. Observe that, by definition of D_n^p, we have |θ_a^m − p(a)| < ǫ/2.

We have:

ρ_a^m − θ_a^m = ( ∑_{i=1}^m (a)_{β^(n,i)} ) / L_m − ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / (L_m − ℓ_m)
= ( ( ∑_{i∈I_m} (a)_{β^(n,i)} ) / L_m + ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / L_m ) − ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / (L_m − ℓ_m)   (†)
= ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / L_m − ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / (L_m − ℓ_m) + ( ∑_{i∈I_m} (a)_{β^(n,i)} ) / L_m
≤ ( ∑_{i∈I_m} (a)_{β^(n,i)} ) / L_m ≤ ( ∑_{i∈I_m} |β^(n,i)| ) / L_m = ℓ_m / L_m   (23)

where the penultimate inequality in the last line follows because L_m ≥ L_m − ℓ_m implies

( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / L_m − ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / (L_m − ℓ_m) ≤ 0,

and the final inequality follows because ∑_{i∈I_m} (a)_{β^(n,i)} ≤ ∑_{i∈I_m} |β^(n,i)| = ℓ_m.

By basic algebra, we have:

( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / L_m − ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / (L_m − ℓ_m) = −ℓ_m ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / ( L_m (L_m − ℓ_m) )

and as

∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ≤ ∑_{i∈{1,...,m}\I_m} |β^(n,i)| = L_m − ℓ_m

we conclude that

−ℓ_m ( ∑_{i∈{1,...,m}\I_m} (a)_{β^(n,i)} ) / ( L_m (L_m − ℓ_m) ) ≥ −ℓ_m / L_m

and thus, by (†), that ρ_a^m − θ_a^m ≥ −ℓ_m/L_m, whence −ℓ_m/L_m ≤ ρ_a^m − θ_a^m, which combined with (Equation (23)) yields |ρ_a^m − θ_a^m| ≤ ℓ_m/L_m.
Now, pick a real number η with 0 < η < bǫ/8 (we write η for the quantity the original argument denotes by a letter clashing with the sequence α). Because α is p-block-distributed, there exists M ∈ N such that for all k ≥ M and all G ⊆ {0,1}^n, the prefix α_{≤kn} of α of length kn satisfies:

| |{ i ≤ k : α^(n,i) ∈ G }| / k − µ_p(G) | < η

In the particular case G = D_n^p(b, ǫ/2), we thus have:

| |{ i ≤ k : α^(n,i) ∈ D_n^p(b, ǫ/2) }| / k − µ_p(D_n^p(b, ǫ/2)) | < η

and thus

1 − δ − |{ i ≤ k : α^(n,i) ∈ D_n^p(b, ǫ/2) }| / k < µ_p(D_n^p(b, ǫ/2)) − |{ i ≤ k : α^(n,i) ∈ D_n^p(b, ǫ/2) }| / k < η

whence

|{ i ≤ k : α^(n,i) ∈ D_n^p(b, ǫ/2) }| > k(1 − δ − η)   (24)

By definition of D_n^p(b, ǫ/2), every α^(n,i) ∈ D_n^p(b, ǫ/2) satisfies |β^(n,i)| = |A_q[α^(n,i)]| > bn (where q is the state in which A enters the block), whence, for m ≥ M,

L_m = ∑_{i=1}^m |β^(n,i)| ≥ |{ i ≤ m : α^(n,i) ∈ D_n^p(b, ǫ/2) }| · bn > m(1 − δ − η) bn

Furthermore, by definition of I_m and (Equation (24)),

|I_m| = |{ i ≤ m : α^(n,i) ∉ D_n^p(b, ǫ/2) }| = m − |{ i ≤ m : α^(n,i) ∈ D_n^p(b, ǫ/2) }| < m − m(1 − δ − η) = m(δ + η)

But then,

ℓ_m = ∑_{i∈I_m} |β^(n,i)| ≤ |I_m| n < mn(δ + η)

and thus:

ℓ_m / L_m < mn(δ + η) / ( m(1 − δ − η) bn ) = (δ + η) / ( b(1 − δ − η) ) < (bǫ/8 + bǫ/8) / ( b(1 − bǫ/8 − bǫ/8) ) = (ǫ/4) / (1 − bǫ/4) ≤ ǫ/3 < ǫ/2

where we have used that bǫ ≤ 1 in the penultimate inequality. We now finally have, for all sufficiently large m,

|ρ_a^m − p(a)| ≤ |ρ_a^m − θ_a^m| + |θ_a^m − p(a)| < ℓ_m/L_m + ǫ/2 < ǫ/2 + ǫ/2 = ǫ,

concluding the proof.

References

[1] V. N. Agafonov. Normal sequences and finite automata. Problemy Kibernetiki, 20:123–129, 1968. In Russian.

[2] V. N. Agafonov. Normal sequences and finite automata. Sov. Math., Dokl., 9:324–325, 1968.
Originally published in Russian (vol. 179(2), pp. 255–256).

[3] V. N. Agafonov. Normal sequences and finite automata. Dokl. Akad. Nauk SSSR, 179(2):255–256, 1968.

[4] V. Becher and P. A. Heiber. Normal numbers and finite automata. Theoretical Computer Science, 477:109–116, 2013.

[5] E. Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Matem. Palermo, 27:247–271, 1909.

[6] A. Church. On the concept of a random sequence. Bulletin of the American Mathematical Society, 46(2):130–135, 1940.

[7] A. H. Copeland. Admissible numbers in the theory of probability. American Journal of Mathematics, 50(4):535–552, 1928.

[8] A. H. Copeland. Point set theory applied to the random selection of the digits of an admissible number. American Journal of Mathematics, 58(1):181–192, 1936.

[9] E. Kamke. Über neuere Begründungen der Wahrscheinlichkeitsrechnung. Jahresbericht der Deutschen Mathematiker-Vereinigung, 42:14–27, 1933.

[10] D. W. Loveland. The Kleene hierarchy classification of recursively random sequences. Transactions of the American Mathematical Society, 125(3):497–510, 1966.

[11] V. N. Agafonov. Normal'nye posledovatel'nosti i konechnye avtomaty [Normal sequences and finite automata]. Dokl. AN SSSR, 179(2):255–256, 1968. In Russian.

[12] A. G. Postnikov. Arifmeticheskoe modelirovanie sluchainykh protsessov [Arithmetic modeling of random processes]. Tr. MIAN SSSR, 57:3–84, 1960. In Russian.

[13] A. G. Postnikov and I. I. Pyatetskii. Normal'nye po Bernulli posledovatel'nosti znakov [Bernoulli-normal sequences of symbols]. Izv. AN SSSR, Ser. Matem., 21(4):501–514, 1957. In Russian.

[14] L. P. Postnikova. O svyazi ponyatii kollektiva Mizesa–Chercha i normal'noi po Bernulli posledovatel'nosti znakov [On the connection between the concepts of a Mises–Church collective and a Bernoulli-normal sequence of symbols]. Teoriya veroyatn. i ee primen., 6(2):232–234, 1961. In Russian.

[15] L. Postnikova. On the connection between the concepts of collectives of Mises–Church and normal Bernoulli sequences of symbols. Theory of Probability & Its Applications, 6(2):211–213, 1961. Translation of [14] by Eizo Nishiura.

[16] H. Reichenbach. Axiomatik der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 34(1):568–619, 1932.

[17] H. Reichenbach. Les fondements logiques du calcul des probabilités. In Annales de l'institut Henri Poincaré, volume 7, pages 267–348, 1937.

[18] C. Schnorr and H. Stimm. Endliche Automaten und Zufallsfolgen. Acta Inf., 1:345–359, 1972.

[19] R. Serfozo. Basics of Applied Stochastic Processes. Probability and Its Applications. Springer-Verlag, 2009.

[20] P. Shields. The Theory of Bernoulli Shifts. Univ. Chicago Press, 1973.

[21] E. Tornier. Wahrscheinlichkeitsrechnung und Zahlentheorie. Erste Mitteilung. Journal für die reine und angewandte Mathematik, 1929(160):177–198, 1929.

[22] R. von Mises. Grundlagen der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 5(191):52–99, 1919.

[23] R. von Mises.