Agafonov's Theorem for finite and infinite alphabets and probability distributions different from equidistribution
Thomas Seiller (CNRS & Université Sorbonne Paris Nord), Jakob Grue Simonsen (University of Copenhagen). November 2020
Abstract
An infinite sequence over a finite alphabet of symbols Σ is called normal iff the limiting frequency of every finite string w ∈ Σ* exists and equals |Σ|^{−|w|}. A celebrated theorem by Agafonov states that a sequence α is normal iff every finite-state selector (i.e., a DFA accepting or rejecting prefixes of α) selects a normal sequence from α.

Let μ : Σ* → [0,1] be a probability map (for every n ≥ 1, ∑_{w∈Σ^n} μ(w) = 1). Say that an infinite sequence α is μ-distributed if, for every w ∈ Σ*, the limiting frequency of w in α exists and equals μ(w). Thus, α is normal iff it is μ-distributed for the probability map μ(w) = |Σ|^{−|w|}.

Unlike normality, μ-distributedness is not preserved by finite-state selectors for all probability maps μ. This raises the question of how to characterize the probability maps μ for which μ-distributedness is preserved across finite-state selection, or equivalently, by selection by programs using constant space.

We prove the following result: for any finite or countably infinite alphabet Σ, every finite-state selector over Σ selects a μ-distributed sequence from every μ-distributed sequence α iff μ is induced by a Bernoulli distribution on Σ, that is, for every word a_1 ··· a_n ∈ Σ*, μ(a_1 ··· a_n) = ∏_{i=1}^n μ(a_i).

The primary, and remarkable, consequence of our main result is a complete characterization of the set of probability maps, on finite and infinite alphabets, for which Agafonov-type results hold. The main positive takeaway is that (the appropriate generalization of) Agafonov's Theorem holds for Bernoulli distributions (rather than just equidistributions) on both finite and countably infinite alphabets.

As a further consequence, we obtain a result in the area of symbolic dynamical systems: the shift-invariant measures ν on Σ^ω such that any finite-state selector preserves the property of genericity for ν are exactly the positive Bernoulli measures.
Contents

μ-distributedness for non-Bernoulli measures
Finite-state selectors preserve μ-distributedness for Bernoulli measures
μ_p-distributedness under finite-state selection
A.1 Automata and selectors
A.2 μ-distribution
A.3 Symbolic dynamical systems

Introduction
Let α = x_1 x_2 ··· be an infinite sequence over a finite alphabet Σ. A string w ∈ Σ* is said to occur in α with limiting frequency f if lim_{N→∞} occ_w(x_1 ··· x_N)/N = f, where occ_w(x_1 ··· x_N) is the number of times that w occurs as a contiguous substring of x_1 ··· x_N. α is said to be normal if every finite string of length n over Σ occurs with limiting frequency |Σ|^{−n} in α [12]. By standard results, the fractional part of the base-b expansion of almost all real numbers is a normal sequence for every integer b ≥ 2; so for base 10, almost all real numbers have the digit "0" occurring 1-in-10 times in all sufficiently long finite prefixes of their digit expansion, have "11" occurring 1-in-100 times, "110" occurring 1-in-1000 times, and so on. Concrete examples of normal sequences include Champernowne's sequence 123456789101112··· [20], the Copeland-Erdös sequence 23571113··· consisting of concatenating the prime numbers [24], and, for any polynomial f with positive integer coefficients, the sequence f(1)f(2)f(3)··· [25].

A finite-state selector is a DFA that selects those symbols x_m from α such that x_1 ··· x_{m−1} is accepted by the DFA. The sequence of selected symbols may thus be finite or infinite. Agafonov's Theorem states that a sequence α is normal iff any DFA that selects an infinite sequence from α selects a normal sequence. Colloquially, Agafonov's Theorem can be stated as: "any constant-space algorithm must preserve normality".

The purpose of this paper is twofold: (I) we study whether analogues of Agafonov's Theorem hold if the distribution of finite strings is different from equidistribution, i.e.
allowing distributions in which a finite string w may occur with frequency different from |Σ|^{−|w|}; and (II) we study extensions of Agafonov's Theorem to infinite alphabets (a setting in which the traditional statement of Agafonov's Theorem is meaningless, as there is no equidistributed probability distribution on a countably infinite set).

As an example, consider the (non-normal) sequence α = 010101···. Clearly, every finite bit string occurs in α with some well-defined frequency (the simplest way to see this is that for each n ≥ 1, there are exactly two distinct substrings of length n in α, one starting with 0 and one starting with 1), and the frequencies thus induce a probability distribution on {0,1}^n for each n. In particular, 0 and 1 each occur with limiting frequency 1/2, but any DFA that selects the symbols at even positions will select the sequence 111···, and thus the probability distribution on {0,1} is not preserved, showing that Agafonov's Theorem in general fails to hold.

In addition to being intrinsically interesting, our study of Agafonov's Theorem is motivated by the fact that constant-space algorithms are usually employed in reactive programming languages used for signal processing (see Section 1.2.2 below), both for transduction and selection, and Agafonov's Theorem is a strong guarantee that such algorithms will always preserve one notion of randomness for infinite strings, namely that every length-n string occurs with limiting frequency exactly |Σ|^{−n}. As the above example shows, selection from sequences where 0 and 1 are merely known to occur with frequency 1/2 is not enough; stronger guarantees such as normality must hold.
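The failure in the example above is easy to check directly. The following sketch (our own illustration, not from the paper) implements a two-state finite-state selector that accepts exactly the odd-length prefixes, and hence selects the symbols at even positions; applied to 010101···, the selection consists only of 1s.

```python
def select(alpha, accepting, delta, start=0):
    """Select a_n from alpha whenever the DFA accepts the prefix a_1 ... a_{n-1}."""
    state, out = start, []
    for a in alpha:
        if state in accepting:   # prefix read so far is accepted: select next symbol
            out.append(a)
        state = delta[state, a]
    return "".join(out)

alpha = "01" * 500  # a long prefix of 010101...

# Two-state DFA tracking the parity of the prefix length; state 1 (odd length)
# is accepting, so the symbols at even positions are selected.
delta = {(0, "0"): 1, (0, "1"): 1, (1, "0"): 0, (1, "1"): 0}
beta = select(alpha, accepting={1}, delta=delta)

print(beta[:8])                       # '11111111': only 1s are selected
print(alpha.count("0") / len(alpha))  # 0.5 in the original sequence
print(beta.count("0") / len(beta))    # 0.0 in the selected sequence
```

The single-symbol frequencies are thus not preserved, even though every finite string has a well-defined limiting frequency in α.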
Conversely, normality is a very strong requirement; in some infinite sequences, certain elements may occur with much higher frequency than others, and one tantalizing way of generating new sequences having the same distribution of finite subsequences could be to simply let a DFA select elements from the original sequence, which in general is only possible if (the appropriate analogue of) Agafonov's Theorem holds.

The motivation for studying infinite alphabets is that the study of normality is closely tied to the study of symbolic dynamics and (information-theoretic) coding theory [10, 9, 35, 36], and that both areas have witnessed recent advances using infinite alphabets [13, 29, 11, 61, 38]; in particular, the techniques of Madritsch and Mance [38] have allowed construction of Champernowne-like sequences for various distributions over infinite alphabets.

The formal statement of the main theorem can be found in Theorem 4 below. In plain language, we prove that:
Let Σ be a non-empty finite or countably infinite alphabet, and let μ : Σ* → [0,1] be a probability map (i.e., for all n ≥ 1, ∑_{w∈Σ^n} μ(w) = 1) such that there exists at least one α ∈ Σ^ω that is μ-distributed. Then, the following are equivalent:

1. μ is induced by a positive Bernoulli probability distribution p on Σ, i.e. for every a_1, ..., a_n ∈ Σ, μ(a_1 ··· a_n) = ∏_{i=1}^n p(a_i), and for every a ∈ Σ, p(a) > 0.

2. For every DFA A over Σ and every μ-distributed sequence α ∈ Σ^ω, if A selects an infinite sequence from α, then the selected sequence is μ-distributed.

The above result completely characterizes the probability maps preserved by selection by DFAs, both for finite and infinite alphabets, and Agafonov's Theorem follows immediately as a corollary. We briefly review the roadmap and techniques used for the proof of the main result in Section 1.3.

As the study of distributions associated to limiting frequencies of finite strings in (right-)infinite strings is cryptomorphic to the study of shift-invariant probability measures on the shift space (Σ^ω, s) equipped with the σ-algebra induced by the basis of cylinder sets on Σ, we obtain as a corollary a result in the field of symbolic dynamical systems, namely a complete characterization of the shift-invariant probability measures ν for which any finite-state selector preserves genericity for ν; see Section 6.

Agafonov's Theorem [46] was one of the end results of multiple efforts grappling with the two notions of (i) kollektiv (roughly, α ∈ {0,1}^ω is a kollektiv wrt. a set S of selection strategies if the limiting frequency of 1 is unchanged after applying any strategy in S to α), and (ii) admissible sequence, and their relation to the notion of normal sequence [22, 53, 54, 48, 47].
Agafonov's Theorem itself had a virtually unknown precursor in a beautiful result by Postnikova [52] that showed, in the terminology of the present paper, that α ∈ {0,1}^ω is normal iff the distribution of 1s is preserved by selection strategies depending only on a finite number of preceding bits.

Both Postnikova [52] and Agafonov [45] considered selection functions on sequences in {0,1}^ω where the limiting frequency of 1 was some p with 0 < p < 1 (i.e., considered a Bernoulli distribution on {0,1}), but the case of general Bernoulli distributions, as opposed to the special case of equidistribution, appears to have received little attention since. (Footnote: The exact definition of kollektiv differs subtly across different authors; compare e.g. [67], [21], and [52]. The original notion of kollektiv introduced by von Mises [67] had no constraints on the set S, but this turned out to be essentially fruitless [63, 53, 32, 23].)

For equidistribution, the earliest extension to arbitrary alphabets seems to be by Broglio and Liardet [15], and a number of authors have since re-proved Agafonov's Theorem in the special case of equidistribution using a variety of methods; for example, using predictors defined from finite automata (for Σ = {0,1}) [44], using compressibility arguments [6, 5, 58], and a combination of automata-theoretic and probabilistic methods similar to Agafonov's original reasoning [16].

Agafonov's Theorem itself has been generalized to treat selectors that are not necessarily (induced by prefix selection by) finite automata [2, 5, 66, 17], and some generalizations consider selectors based on relaxed finiteness criteria of the syntactic monoid of a language selecting prefixes of infinite sequences [31, 68]. Conversely, results by Merkle and Reimann show that adding just slight computational power to the selection strategies beyond finite automata, e.g. using a pushdown automaton with unary stack alphabet instead of a DFA, renders Agafonov's Theorem invalid [40].
Similarly, selection by finite automata has been extended, and analogues of Agafonov's Theorem proved, in settings other than selection from elements of the set Σ^ω, e.g. for shifts of finite type [16]. All of these results only consider normality rather than more general classes of distributions on finite strings.

Conversely, construction of normal sequences (as opposed to selecting normal sequences from other normal ones) has been investigated thoroughly for more than a hundred years [60, 20, 42, 65, 39, 51], including explicit construction of real numbers with normal expansion for any integer base b ≥ 2 [34, 55, 3], and real numbers with normal expansion in non-integer bases [64, 37]. Among this work, the result of most use to the present paper is the construction by Madritsch and Mance of generic sequences for any shift-invariant probability measure μ [38]; these are essentially sequences that are μ-distributed in the terminology of the present paper (see Definition 4).

In very recent work, Carton [16] proves that, for any Markov measure μ on Σ^ω induced by a pair (P, π) of a stochastic |Σ|×|Σ| matrix P and a stationary distribution π for P, any sequence selected from a μ-distributed sequence by a finite-state selector from a particular subset of μ-compatible selectors will be μ-distributed. Roughly, a finite-state selector is compatible if it can only read consecutive symbols of Σ with non-zero transition probability in P, and every state has only incoming transitions labeled by at most one symbol from Σ. In contrast, we consider the full set of finite-state selectors. Moreover, Carton's results are restricted to the case of finite alphabets.
Infinite streams are typically used to model situations where data elements arrive, no upper bound on the length of the stream is known a priori, and the focus is not on resource use as a function of the length of the stream; for example, infinite streams have been studied extensively in event-level differential privacy [26, 33], and in semantics of lazy programming languages such as
Haskell [50]. (Footnote: One possible reason for this is that only the short version (without proofs or explanation of techniques) of Agafonov's result [46] appeared in English as [1]; in contrast, the original longer paper in Russian [45] was published in a more obscure journal, and was never translated.) (Footnote: In fact, one of the strategies considered by Merkle and Reimann, which consists in computing the language { w·w^R | w ∈ Σ* }, where w^R is the reverse of w, can be computed by an arguably less expressive model of computation, namely a DFA(2), i.e. a two-way automaton with two heads [28].)

Selection of (substreams of) elements from infinite streams has been investigated from a practical perspective since the 1960s [62], and is typically performed by specialized stream processing languages, e.g. LUSTRE [18] and
ESTEREL [8], typically for use in reactive programming (e.g., for signal processing or circuit design). As they are designed for real-time processing, these languages typically allow only very constrained operations; any program in both
LUSTRE and
Esterel can be compiled to a finite-state transducer automaton (and a deterministic program selecting a subsequence from its input is hence a finite-state selector as in Agafonov's Theorem).

In typical algorithmic treatments of stream processing, one studies unordered, finite sequences of elements from a very large, or infinite, set [41]. The problems considered typically have strong constraints, e.g. that only a single pass over the stream is allowed and that each element can only be observed once, and often involve a sketch, a data structure that stores information about the elements seen in the stream and allows predefined queries to be answered. A classic example is estimating the frequency moments of the distribution of elements in the stream using sketches with low memory usage in both alphabet size and stream length [4, 30, 14]. Our work can be seen as a variation of streaming where the alphabet size may be infinite, the stream itself is infinite, and the distribution of interest is not merely a distribution on the set of elements, but also constrains the finite subsequences of elements in the stream; in this setting, our main result is that any constant-space sketch sampling an infinite stream in real time preserves the distribution of finite subsequences iff the distribution is induced by a Bernoulli distribution on the set of elements.
The main result has two directions: (I) proving that if μ-distributedness is preserved by selection by any DFA, then μ is necessarily induced by a Bernoulli distribution, and (II) proving that any μ induced by a Bernoulli distribution is preserved across selection by any DFA.

For (I), we prove the more general result that if μ is not induced by a Bernoulli distribution on Σ, selection by a particular Postnikova strategy (roughly, a Postnikova strategy selects an element of the sequence if and only if it follows a fixed finite word) will select a non-μ-distributed infinite sequence from a bespoke μ-distributed sequence. The Postnikova strategy contains prefixes of the form u · w ∈ Σ* for a fixed w chosen such that w · a ∈ Σ^{|w|+1} is a minimal witness string with μ(w · a) ≠ μ(w) · μ(a). Using basic constructions, we can then prove that the Postnikova strategy can be implemented by a DFA that simulates a sliding fixed-width window.

For (II), most of the modern methods of proving Agafonov's Theorem (e.g., [6, 5, 58]) are not immediately adaptable because they use methods that are particular to equidistributions on finite alphabets (e.g., lossless finite-state compressors [6] or automatic Kolmogorov complexity [58]), and we consider both Bernoulli distributions and infinite alphabets. Instead, we work along the general lines of Agafonov's original proof [46], which more heavily uses probabilistic reasoning.

The key insights in Agafonov's original proof were (i) that any strongly connected finite automaton (containing at least one accepting state) applied to a normal sequence must select (always, not just with probability 1) more than a constant fraction of elements from any sufficiently long finite substring of its input, and (ii) that selecting more than a constant fraction of sufficiently long substrings entails that each element of Σ must be selected with approximately equal probability, by the Law of Large Numbers.
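Direction (II) is easy to check empirically (our own illustration, not part of the proof): a sequence drawn i.i.d. from a Bernoulli distribution p over {a, b} is almost surely μ_p-distributed, and a constant-space selector whose state is simply the last letter read, and which selects the letter following each occurrence of "a", should preserve the single-letter frequencies.

```python
import random

random.seed(1)
p_a = 0.7  # assumed Bernoulli distribution for the illustration: p(a) = 0.7, p(b) = 0.3
alpha = random.choices("ab", weights=[p_a, 1 - p_a], k=200_000)

# Finite-state selector: the state is the last letter read, and the state "a"
# is accepting, so the selector picks exactly the letters following an "a".
selected = [alpha[i] for i in range(1, len(alpha)) if alpha[i - 1] == "a"]

freq = selected.count("a") / len(selected)
print(abs(freq - p_a) < 0.01)  # True: the selected letters have frequency close to p(a)
```

This matches the theorem's positive direction; the negative direction (I) shows that the same kind of selector breaks any non-Bernoulli μ.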
In Agafonov's original approach (for Σ = {0,1}), an appeal to the Strong Law of Large Numbers was used in conjunction with the product measure on the product topology on {0,1}^ω; this required reasoning about cylinder sets centered on sets A of finite strings, and to avoid "double-counting" the probabilities, these sets had to be prefix-free. We avoid this difficulty by using concentration bounds to tally the occurrences of elements a ∈ Σ in block decompositions of finite prefixes of α.

The proof that any DFA selects a μ-distributed infinite sequence from a μ-distributed infinite sequence then follows by observing (i) that any run of a DFA on an infinite sequence eventually reaches a strongly connected component C of the DFA that is recurrent (i.e., the run can never exit C), and (ii) that any such component induces an irreducible Markov chain, whence we can apply the Ergodic Theorem for Markov Chains to conclude that accepting states are reached infinitely often and with appropriate frequency.

The extension to infinite alphabets is surprisingly straightforward in most proofs: essentially, instead of using combinatorial estimates for finite sets, we have to ensure that series taken over infinite alphabets converge properly, but almost all instances involve series that (i) have non-negative elements, and (ii) are bounded above, whence the usual reasoning about absolutely convergent series can be employed. Similarly, the classic results for finite automata that we use need to be re-stated and re-proved in the case of infinite alphabets, but this in general turns out to be doable without too much legwork (e.g. Lemma 19). One caveat is that several important ancillary results have standard proofs that use combinatorial arguments on finite sets, and we thus need to provide alternative proofs using different methods.

Definition 1.
We assume a non-empty, possibly (countably) infinite, alphabet Σ and denote by λ the empty string; the sets of finite and right-infinite sequences of elements of Σ are denoted by Σ* and Σ^ω, respectively. Elements of Σ* are ranged over by v, w, ..., and elements of Σ^ω by α, β, .... If α = a_1 a_2 ··· ∈ Σ^ω and N is a positive integer, we denote by α|≤N the finite string a_1 a_2 ··· a_N.

Given v ∈ Σ* and w ∈ Σ* ∪ Σ^ω, we write v · w for the element of Σ* ∪ Σ^ω obtained by concatenation. For words v ∈ Σ* and w ∈ Σ* ∪ Σ^ω, v is said to be a prefix of w, written v ⪯ w, if there exists u ∈ Σ* ∪ Σ^ω such that w = v · u. If v ⪯ w and v ≠ w, v is said to be a proper prefix of w, written v ≺ w. For any w ∈ Σ*, the cylinder set of w, denoted [w], is the subset of Σ^ω defined by [w] = { α ∈ Σ^ω : α = w · β, β ∈ Σ^ω }, that is, the set of right-infinite sequences that have w as a prefix.

Definition 2.
Let Σ be a non-empty, possibly (countably) infinite, alphabet. A probability map (over Σ) is a map μ : Σ+ → [0,1] such that, for all positive integers n, the series ∑_{a_1···a_n ∈ Σ^n} μ(a_1 ··· a_n) is convergent with limit 1. Note that convergence implies absolute convergence here.

A probability map μ is said to be:

• induced by a Bernoulli distribution p : Σ → [0,1] if, for all positive integers n and all a_1, ..., a_n ∈ Σ, μ(a_1 ··· a_n) = ∏_{i=1}^n μ(a_i) = ∏_{i=1}^n p(a_i).

• invariant if, for all w ∈ Σ*, the series ∑_{a∈Σ} μ(w · a) and ∑_{a∈Σ} μ(a · w) are convergent with limit μ(w).

• (when Σ is finite) equidistributed if, for any w ∈ Σ^n, μ(w) = |Σ|^{−n}.

Observe that an equidistributed μ is also Bernoulli. For any alphabet Σ, any map p : Σ → [0,1] such that the series ∑_{a∈Σ} p(a) converges to 1 induces a probability map μ_p by setting μ_p(a_1 ··· a_n) ≜ ∏_{j=1}^n p(a_j). For finite alphabets Σ, this map is equidistributed iff p(a) = |Σ|^{−1} for every a ∈ Σ.

The expression "induced by a Bernoulli distribution" is justified by the fact that Bernoulli probability maps correspond directly to the measure of cylinders in Bernoulli shifts [59].

Proposition 1.
A probability map μ induced by a Bernoulli distribution is invariant.

Proof. For any w ∈ Σ*, ∑_{a∈Σ} μ(a · w) = ∑_{a∈Σ} μ(a)μ(w) = ∑_{a∈Σ} μ(w)μ(a) = ∑_{a∈Σ} μ(w · a), and ∑_{a∈Σ} μ(w)μ(a) = μ(w) ∑_{a∈Σ} μ(a) = μ(w).

We shall need probability maps to act as "measures" on (possibly infinite) sets of finite strings:

Definition 3.
Let Σ be a non-empty alphabet, let W ⊆ Σ*, and let μ be a probability map over Σ. If W = ∅, we define μ(W) = 0. If ∑_{w∈W} μ(w) converges, we define μ(W) = ∑_{w∈W} μ(w).

Observe that as μ(w) ≥ 0 for all w ∈ Σ*, if ∑_{w∈W} μ(w) converges, it is absolutely convergent (hence, we do not need to specify an ordering of W).

We are interested in the probability maps whose values can be realized as the limiting frequencies of finite words in right-infinite sequences over Σ.

Definition 4.
Let v = v_1 ··· v_N and w = w_1 ··· w_n be finite words over Σ. We denote by occ_w(v) the number of occurrences of w in v, that is, the quantity

occ_w(v) = |{ 1 ≤ j ≤ N + 1 − n : v_j v_{j+1} ··· v_{j+n−1} = w_1 w_2 ··· w_n }|.

Let μ be a probability map over Σ, and let α be a right-infinite sequence over Σ. If the limit lim_{N→∞} occ_w(α|≤N)/N exists and is equal to some real number f, we say that w occurs in α with limiting frequency f. If every w ∈ Σ+ occurs in α with limiting frequency μ(w), we say that α is μ-distributed.

Proposition 2.
Let μ be a probability map over Σ. If there exists a μ-distributed sequence, then μ is invariant.

Proof. Let μ be a probability map over Σ and α = a_1 a_2 ··· a μ-distributed sequence. We consider w = w_1 w_2 ··· w_k ∈ Σ^k and note that, for all N > 0:

|∑_{a∈Σ} occ_{w·a}(α|≤N) − occ_w(α|≤N)| ≤ 1.

Indeed, every occurrence of w as a_i a_{i+1} ··· a_{i+k−1} with i + k − 1 < N is followed by a (unique) letter a ∈ Σ, and is thus also an occurrence of w·a; hence the expressions occ_w(α|≤N) and ∑_{a∈Σ} occ_{w·a}(α|≤N) are equal unless a_{N−k+1} a_{N−k+2} ··· a_N = w, in which case their difference is equal to 1. Dividing by N, we obtain, for all N > 0:

|∑_{a∈Σ} occ_{w·a}(α|≤N)/N − occ_w(α|≤N)/N| ≤ 1/N.

(Footnote: In the literature on normal numbers, the word Bernoulli is sometimes used slightly differently; for example, Schnorr and Stimm [56] use the term "Bernoulli sequence" for sequences that are equidistributed in our terminology.)
We therefore obtain that:

|∑_{a∈Σ} occ_{w·a}(α|≤N)/N − μ(w)| ≤ |∑_{a∈Σ} occ_{w·a}(α|≤N)/N − occ_w(α|≤N)/N| + |occ_w(α|≤N)/N − μ(w)|.

Since both expressions on the right converge to 0, the left-hand side converges to zero, showing that

μ(w) = lim_{N→∞} ∑_{a∈Σ} occ_{w·a}(α|≤N)/N = ∑_{a∈Σ} lim_{N→∞} occ_{w·a}(α|≤N)/N = ∑_{a∈Σ} μ(w·a).

Similarly, for all N > 0:

|∑_{a∈Σ} occ_{a·w}(α|≤N) − occ_w(α|≤N)| ≤ 1,

by an argument similar to the one used above, noting that every occurrence of w as a_i a_{i+1} ··· a_{i+k−1} with i > 1 is also an occurrence of b·w for a (unique) b ∈ Σ, so that the two counts differ (by exactly 1) if and only if a_1 a_2 ··· a_k = w. We then conclude that μ(w) = ∑_{a∈Σ} μ(a·w) in the same way.

Observe that an infinite sequence α is normal in the usual sense iff it is μ-distributed for (the unique) equidistributed probability map μ over Σ. Also observe that not all probability maps μ admit a μ-distributed sequence.

Example 1.
An example of a probability map that is not Bernoulli, but such that there is at least one μ-distributed right-infinite sequence, is the map μ over Σ = {0,1} defined by μ(w) = 1/2 if w does not contain any of the strings 00 or 11 (note that for each positive integer n, there are exactly two such strings of length n, namely 0101··· and 1010···), and μ(w) = 0 otherwise. Observe that the right-infinite sequence 010101··· is μ-distributed.

In contrast to all previous work on Agafonov's Theorem, we allow countably infinite alphabets Σ. Alphabets of larger cardinality do not in general have probability measures realizable by considering limiting frequencies of elements of Σ^ω, simply because most elements of Σ cannot occur at all in a single element of Σ^ω.

One reason why previous generalizations of Agafonov's Theorem have not considered infinite alphabets is that there can be no equidistribution on a countably infinite set. However, there are Bernoulli measures μ on countably infinite alphabets Σ and μ-distributed infinite sequences over Σ.

Example 2.
An example of a countably infinite alphabet with a Bernoulli measure is Σ = N and p(n) = 6/(πn)² (note that we have ∑_{n∈Σ} p(n) = 1). In general, any convergent series ∑_{n=1}^∞ a_n where every a_n is non-negative (and at least one a_n is positive) induces a Bernoulli distribution on N by setting p(n) = a_n / ∑_{m=1}^∞ a_m. Each such Bernoulli distribution p induces an invariant probability map μ_p, and by a result of Madritsch and Mance [38], there exists a μ_p-distributed sequence.

Remark 1. As we consider possibly infinite alphabets, we often have to consider infinite series instead of finite sums in the proofs. In most cases, these series will have elements that are known to be non-negative, and the partial sums will be bounded above, whence the series will be absolutely convergent and the order of summation can thus be changed freely. A trivial example of use is to consider some B ⊆ Σ and note that ∑_{a∈B} p(a) = 1 − ∑_{a∈Σ\B} p(a) (as ∑_{a∈B} p(a) ≤ 1, ∑_{a∈Σ\B} p(a) ≤ 1, and p(a) ≥ 0, the two series are absolutely convergent, and ∑_{a∈B} p(a) + ∑_{a∈Σ\B} p(a) = ∑_{a∈Σ} p(a) = 1).

2.1 Strategies, selectors and DFAs

2.2 Strategies

Definition 5.
Let Σ be an alphabet. A strategy S over Σ is a subset S ⊆ Σ*.

Given a strategy S and α = a_1 a_2 ··· ∈ Σ^ω, we define the sequence selected by S, denoted S[α], as follows: if i_1, i_2, ..., i_k, ... is the (increasing) sequence of indices i such that the prefix a_1 ··· a_{i−1} of α lies in S (where the prefix for i = 1 is λ), then S[α] is the (possibly empty, finite or infinite) sequence a_{i_1} a_{i_2} a_{i_3} ···.
Definition 6. A finite-state selector over Σ is a DFA A = (Q, δ, q_s, Q_F), where Q is the set of states, q_s is the unique start state, Q_F is the set of accepting states, and δ : Q × Σ → Q is the transition function.

A DFA is strongly connected if its underlying directed graph (states are nodes, transitions are edges) is strongly connected.

Denote by L(A) the language accepted by the automaton. If α = a_1 a_2 ··· is a finite or right-infinite sequence over Σ, the subsequence selected by A is the (possibly empty) sequence of letters a_n such that the prefix a_1 ··· a_{n−1} ∈ L(A), that is, the automaton, when started on the finite word a_1 ··· a_{n−1} in state q_s, ends in an accepting state after having read the entire word.

The run of A on input α is the sequence of states visited when A is applied to α from the starting state. For (q, w) = (q, w_1 ··· w_n) ∈ Q × Σ*, we use the notation δ*(q, w) to denote the state δ(··· δ(δ(q, w_1), w_2), ..., w_n), that is, the state reached by starting from q and following the (unique) path induced by w.

Observe that a DFA may select an empty, finite or infinite sequence when run on a right-infinite word.

Definition 7.
Let A be a DFA. A strongly connected component C in (the underlying directed graph of) A is said to be recurrent if, for every state p in C and every a ∈ Σ, δ(p, a) is a state in C (i.e., once a run of A on some infinite word reaches a state in C, the run cannot leave C).

Definition 8.
Let A = (Q, Σ, δ, q_0, F) be a connected DFA. For all q ∈ Q, we denote by A_q the automaton (Q, Σ, δ, q, F), i.e. the automaton in which the state q is chosen as the initial state.

Definition 9. Let A = (Q, Σ, δ, q_0, F) be a connected DFA, and let q ∈ Q. Let α = a_1 a_2 ··· be a right-infinite sequence over Σ. We denote by A_q[α] the subsequence of α selected by A_q, that is, the sequence of letters a_i such that a_1 ··· a_{i−1} ∈ L(A_q).
Corollary 3.1. Let μ be a probability map induced by a positive Bernoulli distribution on Σ, let A be a DFA over Σ, and let α ∈ Σ^ω be μ-distributed. Then, the run of A on α eventually reaches a strongly connected recurrent component of A.

Proof. Let C be the strongly connected component and w the word obtained from Lemma 3. As α is μ-distributed, w appears in α, so write α = v w α′, and let q be the state of A reached after |v| transitions in the run of A on α. Then, after at most a further |w| transitions, the run reaches a state in C.

Corollary 3.1 ensures that we can assume wlog. that the finite-state selectors we treat are strongly connected. Note that the corollary does not imply that the strongly connected recurrent component contains an accepting state (indeed, the automaton may have an empty set of accepting states). Thus, some automata do not always select infinite sequences, and additional assumptions are needed if this is desirable (this is discussed in Remark 2 below). However, this is not an issue for our main result, which states that the output of a selector applied to a normal sequence is again normal as long as it is infinite.

Theorem 4.
Let Σ be a non-empty (finite or infinite) alphabet and μ a probability map such that there exists at least one α ∈ Σ^ω that is μ-distributed. Then, the following statements are equivalent:

1. μ is induced by a positive Bernoulli distribution p on Σ, i.e. for every a_1 ··· a_n ∈ Σ*, μ(a_1 ··· a_n) = ∏_{i=1}^n p(a_i), and p(a) > 0 for all a ∈ Σ;

2. (Postnikova property) for every finite word w ∈ Σ* and μ-distributed sequence α ∈ Σ^ω, if the sequence selected from α by the Postnikova strategy S_w = { u ∈ Σ* | ∃v s.t. u = v · w } is infinite, then it is μ-distributed;

3. (Agafonov property) for every DFA A over Σ and every μ-distributed sequence α ∈ Σ^ω, if the sequence selected from α by A is infinite, then it is μ-distributed.

Proof. For the implication 1 ⇒
3, Corollary 3.1 yields that any run of a finite-state selector on a μ-distributed sequence eventually reaches a strongly connected recurrent component; the restriction of any DFA to the state set of one of its recurrent components is also a DFA, and the result now follows by Lemma 15. The implication 3 ⇒ 2 holds because every Postnikova strategy can be implemented by a finite-state selector, and the implication 2 ⇒ 1 follows from Lemma 5 below.

(Footnote: The result in [56] is stated for finite alphabets, but the proof method carries through for infinite alphabets as well. We provide a proof in Appendix A.)

Remark 2. Theorem 4 addresses the case where a DFA or Postnikova strategy selects an infinite sequence from a μ-distributed sequence. If one wants to restrict attention to automata that always select an infinite subsequence from any μ-distributed sequence, extra conditions sometimes occur in the literature, e.g. that every cycle in the (underlying graph of the) DFA contains an accepting state [6], ensuring that an infinite subsequence is selected from any (not just μ-distributed) sequence. Another condition that ensures that an infinite subsequence is selected from any μ-distributed sequence is to consider only DFAs such that every strongly connected recurrent component contains at least one accepting state. In this case, Corollary 3.1 ensures that any run of the automaton on a μ-distributed sequence will reach a strongly connected recurrent component, and Lemma 11 below then ensures that the DFA selects an infinite subsequence from α.

μ-distributedness for non-Bernoulli measures

We first prove that if μ is a probability map such that any DFA selects a μ-distributed right-infinite sequence from any μ-distributed right-infinite sequence, then μ must be Bernoulli. This is an immediate consequence of a stronger property proved in Lemma 5 below.

The idea of the proof is that if μ is not Bernoulli, there is a word a_1 ··· a_k such that μ(a_1 ··· a_{k−1}) = ∏_{j=1}^{k−1} μ(a_j), but μ(a_1 ··· a_{k−1} a_k) ≠ μ(a_1 ··· a_{k−1}) · μ(a_k).
One can then construct a finite-state selector that acts like a "sliding window" of size k − 1, that is, remembers the last k − 1 letters scanned and accepts iff these are a_1 ⋯ a_{k−1}. This selector will select every letter following an occurrence of a_1 ⋯ a_{k−1}; after a prefix of length N of a right-infinite sequence has been scanned, approximately N · µ(a_1 ⋯ a_{k−1}) letters have been selected, and approximately N · µ(a_1 ⋯ a_{k−1} a_k) of these will be the symbol a_k. But then the limiting frequency of a_k in the sequence selected will be µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) ≠ µ(a_k), and the result follows.

Lemma 5.
Let µ : Σ^* → [0, 1] be a probability map. If µ is not induced by a Bernoulli distribution on Σ, there exists a finite word w ∈ Σ^* such that if α ∈ Σ^ω is µ-distributed, then the Postnikova strategy S_w = { u ∈ Σ^* | ∃v s.t. u = vw } selects from α an infinite sequence β ∈ Σ^ω that is not µ-distributed.

Proof. If no element of Σ^ω is µ-distributed, the lemma is vacuously true. Hence, assume that there is at least one α ∈ Σ^ω that is µ-distributed. If |Σ| = 1, then there is exactly one probability map on Σ^*, namely the one that assigns probability 1 to the unique element of Σ^k for every k ≥ 1; this probability map is clearly Bernoulli, and the lemma is thus vacuously true. Hence, in the remainder of the proof, assume that |Σ| ≥ 2.

Assume that µ is not induced by a Bernoulli distribution on Σ. Then there are k and a word a_1 ⋯ a_{k−1} a_k ∈ Σ^k such that µ(a_1 ⋯ a_{k−1} a_k) ≠ ∏_{j=1}^k µ(a_j). Observe that k = 1 is impossible (as µ(a_1) = ∏_{j=1}^1 µ(a_j)), and thus we must have k ≥ 2. Assume wlog. that k is minimal among such k, and hence that µ(a_1 ⋯ a_{k−1}) = ∏_{j=1}^{k−1} µ(a_j), and note that this implies µ(a_1 ⋯ a_{k−1} a_k) ≠ µ(a_1 ⋯ a_{k−1}) · µ(a_k).

Assume for contradiction that µ(a_1 ⋯ a_{k−1}) = 0. Then µ(a_i) = 0 for at least one a_i, and thus µ(a_1 ⋯ a_{k−1} a_k) = 0, because the fact that there is at least one µ-distributed right-infinite sequence entails that µ(a_1 ⋯ a_{k−1} a_k) > 0 implies µ(a_i) ≥ µ(a_1 ⋯ a_{k−1} a_k) > 0. But this is a contradiction, as we would then have µ(a_1 ⋯ a_{k−1} a_k) = 0 = ∏_{j=1}^k µ(a_j). Thus, µ(a_1 ⋯ a_{k−1}) > 0.
As µ(a_1 ⋯ a_{k−1} a_k) ≠ µ(a_1 ⋯ a_{k−1}) · µ(a_k), µ(a_1 ⋯ a_{k−1}) > 0, and µ(a_1 ⋯ a_{k−1} a_k) ≤ µ(a_1 ⋯ a_{k−1}) (because µ is invariant by Proposition 2), there is a real number γ with 0 < γ < 1 such that:

    | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − µ(a_k) | > γ

We now consider the Postnikova strategy S_w with w = a_1 ⋯ a_{k−1}, i.e. the strategy that selects exactly the symbols following the occurrences of a_1 ⋯ a_{k−1} in α.

Let α ∈ Σ^ω be µ-distributed. Then, for every ε > 0, there is an N_ε > 0 such that for all n > N_ε we have (writing occ_v(·) for the number of occurrences of a word v):

    | occ_{a_1⋯a_{k−1}}(α_{≤n})/n − µ(a_1 ⋯ a_{k−1}) | < ε

Hence

    nµ(a_1 ⋯ a_{k−1}) − nε ≤ occ_{a_1⋯a_{k−1}}(α_{≤n}) ≤ nµ(a_1 ⋯ a_{k−1}) + nε   (1)

and

    nµ(a_1 ⋯ a_{k−1} a_k) − nε ≤ occ_{a_1⋯a_{k−1}a_k}(α_{≤n}) ≤ nµ(a_1 ⋯ a_{k−1} a_k) + nε   (2)

As µ(a_1 ⋯ a_{k−1}) > 0 and S_w selects the symbol after each occurrence of a_1 ⋯ a_{k−1}, S_w selects an infinite sequence β from α. Let β^(n) ∈ Σ^* be the finite sequence selected by S_w from α_{≤n}. Observe that we have |β^(n)| = occ_{a_1⋯a_{k−1}}(α_{≤n}) and occ_{a_k}(β^(n)) = occ_{a_1⋯a_{k−1}a_k}(α_{≤n}). The fraction of occurrences occ_{a_k}(β^(n))/|β^(n)| of a_k in β^(n) thus satisfies:

    occ_{a_k}(β^(n))/|β^(n)| = occ_{a_1⋯a_{k−1}a_k}(α_{≤n})/occ_{a_1⋯a_{k−1}}(α_{≤n}) = (occ_{a_1⋯a_{k−1}a_k}(α_{≤n})/n) · (n/occ_{a_1⋯a_{k−1}}(α_{≤n}))

and hence, by (1) and (2), for all n > N_ε:

    (µ(a_1 ⋯ a_{k−1} a_k) − ε)/(µ(a_1 ⋯ a_{k−1}) + ε) ≤ occ_{a_k}(β^(n))/|β^(n)| ≤ (µ(a_1 ⋯ a_{k−1} a_k) + ε)/(µ(a_1 ⋯ a_{k−1}) − ε)   (3)

Consider an arbitrary δ with 0 < δ < γ/2.
By (3), for all sufficiently small ε we have, for all n > N_ε,

    | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − occ_{a_k}(β^(n))/|β^(n)| | < δ

and thus for all n > N_ε:

    γ < | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − µ(a_k) |
      ≤ | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − occ_{a_k}(β^(n))/|β^(n)| | + | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) |
      < δ + | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) |
      < γ/2 + | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) |

whence:

    | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) | > γ/2

and as the sequence (β^(n))_{n∈ℕ} consists of prefixes of the sequence S_w[α] selected by S_w from α, and is eventually increasing, the frequency of occurrences of a_k differs infinitely often from µ(a_k) by at least γ/2; hence S_w[α] cannot be µ-distributed.

Lemma 5 shows that if a probability map is not induced by a Bernoulli distribution on Σ, some Postnikova strategy will select a non-µ-distributed sequence from any µ-distributed sequence. In case µ is induced by a Bernoulli distribution, but not a positive Bernoulli distribution, we can show the weaker result that there will be a Postnikova strategy that selects a non-µ-distributed sequence from some µ-distributed sequences (and this is sufficient for our main theorem).

Lemma 6.
Let µ : Σ^* → [0, 1] be a probability map induced by a Bernoulli distribution on Σ that is not positive. Then there exists a finite word w ∈ Σ^* and a µ-distributed α ∈ Σ^ω such that the Postnikova strategy S_w = { u ∈ Σ^* | ∃v s.t. u = vw } selects from α an infinite sequence that is not µ-distributed.

Proof. As µ is not positive, pick b ∈ Σ such that µ(b) = 0, and let Γ ⊆ Σ be a maximal subset such that the restriction of µ to Γ is a positive Bernoulli distribution (observe that Γ is non-empty because µ is a probability map, and ∑_{a∈Σ} µ(a) = 1 thus implies µ(a) > 0 for some a ∈ Σ). By [38] there exists a µ-distributed infinite sequence β ∈ Γ^ω; notice that β contains no occurrences of b. Let α ∈ Σ^ω be obtained from β by inserting the string bb at exponentially increasing positions 2, 4, 8, 16, …. Then α is µ-distributed because (i) every v ∈ Γ^* occurs with the same limiting frequency as in β, and (ii) every v ∈ Σ^* that contains an element of Σ∖Γ occurs in α with limiting frequency 0. Set w = b; then the Postnikova strategy S_w = { u ∈ Σ^* | ∃v s.t. u = vw } selects from α a sequence β′ = S_w[α] such that, for every n > 0, occ_b(β′_{≤n}) ≥ n/2 − 1. Thus, the limiting frequency of b in β′ is not 0, and hence is not µ(b), proving that β′ is not µ-distributed.

Lemma 7.
Let w ∈ Σ^*. The Postnikova strategy { u ∈ Σ^* | ∃v s.t. u = vw } is computable by a strongly connected DFA over Σ.

Proof. Note that the alphabet can possibly be infinite in the following proof. We write m for the length of the word w, and w_1, w_2, …, w_m for the letters of w. We design a finite-state selector M_w with exactly 2^m states which will select a letter of the input if and only if it is preceded by the word w. Let M_w = ({0,1}^m, δ, q_s, Q_F) be defined as follows:

• {0,1}^m = { (b_1, b_2, …, b_m) : b_i ∈ {0,1} } is the set of binary sequences of length m; such a sequence represents which prefixes of w currently match: b_j = 1 if and only if the previous j letters of the input coincide with the first j letters of w;

• the initial state q_s is chosen to be the sequence (0, 0, …, 0) ∈ {0,1}^m;

• the set of accepting states Q_F is equal to the set of sequences { (b_1, b_2, …, b_m) ∈ {0,1}^m : b_m = 1 };

• the transition function δ is defined as δ(b_1, b_2, …, b_m; a) = (c_1, c_2, …, c_m) where, for j ≥ 2, c_j = 1 if and only if b_{j−1} = 1 and a = w_j, and c_1 = 1 if and only if a = w_1.

The fact that this automaton computes the Postnikova strategy is clear from the definition. We now show it is strongly connected by showing that any state (b_1, b_2, …, b_m) is reachable from an arbitrary state. For this, we consider a word u_{b_1,b_2,…,b_m} = u_1 ⋯ u_m defined by u_i = w_i if and only if b_i = 1 (and thus u_i ≠ w_i whenever b_i = 0). We then claim that the automaton, starting from any state c ∈ {0,1}^m, reaches the state (b_1, b_2, …, b_m) when given the word u_{b_1,b_2,…,b_m} as input.

(Footnote to the proof of Lemma 6: the key observation here is that since the ‘bb’s are inserted at exponentially increasing positions, the frequency of occurrence of all other strings is decreased by a very small (and quickly decaying) factor.)
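The construction in the proof of Lemma 7 can be made concrete; the following is an illustrative sketch (the function names and the sample binary alphabet are ours, not the paper's), representing states as bit-vectors (b_1, …, b_m) and checking by brute force that M_w selects exactly the symbols following occurrences of w:

```python
# Sketch of the automaton M_w from the proof of Lemma 7, over a concrete
# binary alphabet for illustration. A state is a tuple (b_1, ..., b_m) with
# b_j = 1 iff the last j letters read coincide with the first j letters of w.

def make_selector(w):
    m = len(w)

    def step(state, a):
        # c_1 = 1 iff a = w_1; for j >= 2, c_j = 1 iff b_{j-1} = 1 and a = w_j
        return tuple(
            (1 if a == w[0] else 0) if j == 0
            else (1 if state[j - 1] == 1 and a == w[j] else 0)
            for j in range(m)
        )

    def select(alpha):
        # A symbol is selected iff the state reached *before* reading it is
        # accepting (b_m = 1), i.e. the m previously read letters were w.
        state, out = (0,) * m, []
        for a in alpha:
            if state[m - 1] == 1:
                out.append(a)
            state = step(state, a)
        return "".join(out)

    return select

select = make_selector("01")
alpha = "0110100110010110"
# Defining property of the Postnikova strategy S_w: the selected word
# consists of exactly the symbols following occurrences of w.
expected = "".join(alpha[i] for i in range(2, len(alpha)) if alpha[i-2:i] == "01")
assert select(alpha) == expected
```

Running the same check on longer pseudorandom inputs and comparing symbol frequencies before and after selection gives a quick empirical sanity check of the Agafonov property for Bernoulli-distributed inputs.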
Finite-state selectors preserve µ-distributedness for Bernoulli measures

The sequence of auxiliary results of this section follows the general lines of Agafonov's original proof in Russian for the case
Σ = {0, 1} [46], but with multiple proofs needing more careful analysis and adapted techniques.

Definition 10.
Let Σ be an alphabet, α = x_1 x_2 ⋯ x_n ⋯ ∈ Σ^ω, and let n be a positive integer. The n-block decomposition of α is the sequence (α^(n,r))_{r≥1} where α^(n,r) = x_{(r−1)n+1} ⋯ x_{rn} ∈ Σ^n. Thus, α^(n,1) is the string of the first n symbols of α, α^(n,2) is the string of the next n symbols, and so forth.

Definition 11.
Let µ be a probability map over Σ and α = x_1 x_2 ⋯ x_n ⋯ ∈ Σ^ω. We say that α is µ-block-distributed if, for each n > 0 and every w ∈ Σ^n, the n-block decomposition (α^(n,r))_{r>0} of α satisfies:

    lim_{k→∞} |{ i ≤ k : α^(n,i) = w }|/k = µ(w)

For finite alphabets and the special case of p being an equidistribution on Σ, it is straightforward to prove that the properties of being µ_p-distributed and µ_p-block-distributed are equivalent [43, 19, 49]. For the present paper, we only use that µ_p-distributedness implies µ_p-block-distributedness, which follows by tedious, but standard, counting arguments on sufficiently large finite prefixes α_{≤N} of α, using the same reasoning as the original proof by Niven and Zuckerman for finite alphabets and normality [43], mutatis mutandis:

Proposition 8.
Let µ_p be a probability map induced by a Bernoulli distribution p on alphabet Σ. If α ∈ Σ^ω is µ_p-distributed, it is µ_p-block-distributed.

We now prove that finite-state selectors can be composed appropriately; this will later be a key ingredient in reducing the problem of selecting finite strings w ∈ Σ^* with frequency µ_p(w) to the problem of selecting single symbols a ∈ Σ with frequency p(a).

Proposition 9 (Finite-state selectors are compositional). Let A and B be DFAs over the same alphabet. Then there is a DFA C such that, for each sequence w, C[w] = B[A[w]]. If A and B are both strongly connected and A contains at least one accepting state, C can be chosen to be strongly connected.

Proof. Let A = (Q_A, Σ, δ_A, q_A, F_A) and B = (Q_B, Σ, δ_B, q_B, F_B). Define Q_C = Q_A × Q_B, and set q_C = (q_A, q_B) and F_C = F_A × F_B. For each q_B ∈ Q_B, define the set D_{q_B} = { (q, q_B) : q ∈ Q_A } ⊆ Q_C. Observe that Q_C = ⋃_{q_B∈Q_B} D_{q_B} and that for q_B, r_B ∈ Q_B with q_B ≠ r_B we have D_{q_B} ∩ D_{r_B} = ∅, and thus { D_{q_B} : q_B ∈ Q_B } is a partitioning of Q_C.
Hence, the transition relation δ_C of C may be defined by defining it separately on each subset D_{q_B}:

    δ_C((q, q_B), a) = (r, q_B)  if q ∉ F_A and δ_A(q, a) = r
    δ_C((q, q_B), a) = (r, r_B)  if q ∈ F_A, δ_A(q, a) = r, and δ_B(q_B, a) = r_B

Intuitively, as C processes its input, it freezes the current state q_B of B (the freezing is represented by staying within D_{q_B}) and simulates A until an accepting state of A is reached (i.e. just before A would select the next symbol); on the next transition, C unfreezes the current state of B, moves to the next state r_B of B, and then freezes it again and continues with a simulation of A.

Observe that a symbol is picked out by C iff the state is an element of F_C = F_A × F_B, iff the symbol is the next symbol read after the simulation of A reaches an accepting state of A while the current frozen state of B is an accepting state of B.

By construction, C is strongly connected if both A and B are: for any pair of states (q_A, q_B), (q_A′, q_B′) ∈ Q_C, strong connectivity of B implies that there is a directed path from q_B to q_B′ in B. Let q_B, q_{B,1}, q_{B,2}, …, q_{B,k} be the states along this path. Strong connectivity of A and the assumption that there is some q_F ∈ F_A imply that there is a directed path from (q_A, q_B) to (q_F, q_B) in C, and by definition of δ_C, there is a transition in C from (q_F, q_B) to a state of the form (q, q_{B,1}). A straightforward induction on k now completes the proof.

The following shows that, to prove that the property of being µ_p-distributed is preserved under finite-state selection, it suffices to prove that the limiting frequency of each a ∈ Σ exists and equals p(a).

Lemma 10.
Let µ_p be a probability map induced by a Bernoulli distribution p on Σ, and let α ∈ Σ^ω be µ_p-distributed. The following are equivalent:

• For all strongly connected DFAs A, if A[α] is infinite, then A[α] is µ_p-distributed.

• For all strongly connected DFAs A and all a ∈ Σ, if A[α] is infinite, then the limiting frequency of a in A[α] exists and equals p(a).

Proof. If, for all A such that A[α] is infinite, A[α] is µ_p-distributed, then in particular the limiting frequency of a in A[α] exists and is equal to p(a) for all A.

Conversely, suppose that, for all strongly connected DFAs A and all a ∈ Σ, if A[α] is infinite, then the limiting frequency of a in A[α] exists and equals p(a). Let A be such a DFA. If A has no accepting states, there is nothing to prove, so assume that A has at least one accepting state.

We will prove by induction on k ≥ 0 that the limiting frequency of every v_1 ⋯ v_k v_{k+1} ∈ Σ^{k+1} exists and equals µ_p(v_1 ⋯ v_k v_{k+1}).

• k = 0: This is the supposition.

• k ≥ 1: Suppose that the result has been proved for k − 1. Let v_1 ⋯ v_k ∈ Σ^k; by the induction hypothesis, the limiting frequency of v_1 ⋯ v_k in A[α] is µ_p(v_1 ⋯ v_k). We claim that there is a strongly connected DFA B that, from any sequence, selects the symbol after each occurrence of v_1 ⋯ v_k. To see that such a DFA exists, let there be a state for each element of Σ^k and think of the state as the current length-k string in a "sliding window" that moves over the input one symbol at a time; when the window is moved one step, the DFA transits to the state representing the new length-k string in the window, i.e. for any a ∈ Σ, from the state representing the word w_1 ⋯ w_k there is a transition to the state representing w_2 ⋯ w_k a; thus every state is reachable from every other state in at most k transitions.
The unique final state of B is the state representing v_1 ⋯ v_k; the start state of B can be chosen to be any state representing a string w_1 ⋯ w_k such that there are exactly k transitions to the final state; for example, let a ∈ Σ∖{v_1}. Then from the state representing a^k, the final state cannot be reached in k − 1 or fewer steps, but every state is reachable in k steps.

By Proposition 9, there is a strongly connected DFA C such that C[w] = B[A[w]] for all w ∈ Σ^*.

For any a ∈ Σ and any sufficiently large positive integer N, we have:

    occ_a(C[α_{≤N}])/|C[α_{≤N}]| = occ_a(B[A[α_{≤N}]])/|B[A[α_{≤N}]]| = occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}])

By the induction hypothesis, for every ε > 0 we have, for all sufficiently large N, that |occ_a(C[α_{≤N}])/|C[α_{≤N}]| − p(a)| < ε, and hence:

    | occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}]) − p(a) | < ε   (4)

But for all sufficiently large N, the induction hypothesis also furnishes that:

    | occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]| − µ_p(v_1 ⋯ v_k) | < ε   (5)

But as:

    occ_{v_1⋯v_k a}(A[α_{≤N}])/|A[α_{≤N}]| = (occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}])) · (occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]|)

Equation (4) and Equation (5) thus yield:

    | occ_{v_1⋯v_k a}(A[α_{≤N}])/|A[α_{≤N}]| − µ_p(v_1 ⋯ v_k a) |
     = | (occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}])) · (occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]|) − µ_p(v_1 ⋯ v_k) p(a) |
     < ε² + ε ( occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}]) + occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]| )
     ≤ ε² + 2ε

Hence, for all a ∈ Σ, the limiting frequency of v_1
⋯ v_k a in A[α] exists and equals µ_p(v_1 ⋯ v_k a), as desired.

µ_p-distributedness under finite-state selection

By Lemma 10 we may restrict our attention to proving that the frequency of single symbols from Σ is preserved under selection by DFAs. The strategy will be to consider an arbitrary strongly connected DFA A, split the set of finite words Σ^* into multiple classes that depend on the selection behaviour of A, and use a combination of concentration bounds and basic Markov chain theory applied to these classes to obtain upper and lower bounds on the frequency with which A selects each symbol from α.

Definition 12. Let A = (Q, Σ, δ, q_s, F) be a strongly connected DFA. For all probability distributions p : Σ → [0, 1], b ∈ [0, 1], n ∈ ℕ and ε > 0, we define sets D_n^p(b, ε), E_n(b, q) and G_n(b, ε, q) as follows:

    D_n^p(b, ε, q) = { w ∈ Σ^n : |A_q[w]| > bn and sup_{a∈Σ} |occ_a(A_q[w])/|A_q[w]| − p(a)| < ε }   (6)
    D_n^p(b, ε) = ⋂_{q∈Q} D_n^p(b, ε, q)   (7)
    E_n(b, q) = { w ∈ Σ^n : |A_q[w]| ≤ bn }   (8)
    E_n(b) = ⋃_{q∈Q} E_n(b, q)   (9)
    G_n(b, ε, q) = { w ∈ Σ^n : |A_q[w]| > bn and sup_{a∈Σ} |occ_a(A_q[w])/|A_q[w]| − p(a)| ≥ ε }   (10)
    G_n(b, ε) = ⋃_{q∈Q} G_n(b, ε, q)   (11)

Observe that, for all b, n, ε, Σ^n = E_n(b) ∪ D_n^p(b, ε) ∪ G_n(b, ε) (and also note that E_n(b) and G_n(b, ε) are not necessarily disjoint).

Lemma 11.
Let p be a positive Bernoulli distribution on Σ, and let S = (Q, Σ, δ, q_s, Q_F) be a strongly connected finite automaton with Q_F ≠ ∅. Then there exists a real number c > 0 such that for all real numbers ε > 0 we have lim_{n→∞} µ_p(E_n(c − ε)) = 0.

Proof. S induces a stochastic |Q| × |Q| matrix P by setting

    P_{ij} = ∑_{a∈Σ} p(a) · 1_{δ(i,a)=j}.

Observe that if Σ is infinite, the fact that (i) p(a) · 1_{δ(i,a)=j} ≥ 0, (ii) p(a) · 1_{δ(i,a)=j} ≤ p(a), and (iii) ∑_{a∈Σ} p(a) = 1 entails that the series ∑_{a∈Σ} p(a) · 1_{δ(i,a)=j} is absolutely convergent. Note also that P_{ij} = 0 iff there are no transitions from i to j in Q on a symbol a ∈ Σ with p(a) > 0. As S is strongly connected, there exists a path from state i to state j for each i, j ∈ Q. Let v be the word along this path; as p(a) > 0 for all a ∈ Σ, we have µ_p(v) > 0, whence for each i, j there is an integer n_{ij} such that (P^{n_{ij}})_{ij} > 0, that is, P (and its associated Markov chain) is irreducible. As all states of a finite Markov chain with irreducible transition matrix are positive recurrent, standard results (see, e.g., [57, Thm. 54]) yield that there is a unique positive stationary distribution π : Q → [0, 1] (s.t., for all i ∈ Q, we have π(i) > 0 and π(i) = ∑_{j∈Q} π(j)P_{ji}). Furthermore, the expected return time M_i to state i satisfies M_i = 1/π(i).

Let (X_n)_{n≥0} = (X_0, X_1, X_2, …) be the Markov chain with transition matrix P and some initial distribution λ on the states. Consider, for each i ∈ Q, the stochastic variable V_i, where

    V_i(n) = ∑_{k=0}^{n−1} 1_{X_k = i},

that is, V_i(n) is the number of times state i is visited in the first n elements of the Markov chain. As P is irreducible, the Ergodic Theorem for Markov chains (see, e.g., [57, Thm.
75]) yields that, independently of λ, we have for arbitrary ε > 0:

    lim_{n→∞} Pr( |V_i(n)/n − π(i)| ≥ ε ) = lim_{n→∞} Pr( |V_i(n)/n − 1/M_i| ≥ ε ) = 0   (12)

Let n be a positive integer, let w = w_1 ⋯ w_n ∈ Σ^n, and let q^{S_j}(w) = q^w_0 q^w_1 ⋯ q^w_n be the sequence of states visited in the run of S_j on w (i.e., q^w_0 = j). The probability of observing a state sequence q_0 ⋯ q_n in the Markov chain is (when the initial distribution λ has λ(q_0) = λ(j) = 1):

    Pr(q_0 ⋯ q_n) = ∏_{i=0}^{n−1} ∑_{a∈Σ} p(a) 1_{δ(q_i,a)=q_{i+1}} = ∑_{a_1,…,a_n∈Σ} p(a_1) 1_{δ(q_0,a_1)=q_1} ⋯ p(a_n) 1_{δ(q_{n−1},a_n)=q_n}

where we have used the fact that the Cauchy product of two absolutely convergent series is convergent.

As for all integers i with 1 ≤ i ≤ n we have δ(q^w_{i−1}, w_i) = q^w_i, we obtain:

    ∑_{a_1,…,a_n∈Σ} p(a_1) 1_{δ(q_0,a_1)=q_1} ⋯ p(a_n) 1_{δ(q_{n−1},a_n)=q_n} = µ_p({ a_1 ⋯ a_n : q^{S_j}(a_1 ⋯ a_n) = q_0 ⋯ q_n })

and hence

    Pr(q_0 ⋯ q_n) = µ_p({ w : q^{S_j}(w) = q_0 ⋯ q_n })   (13)

Thus, as S is deterministic and every w_1 ⋯ w_n ∈ Σ^n occurs along exactly one path of states in S, we have:

    Pr( |V_i(n)/n − π(i)| ≥ ε ) = ∑_{q_0 q_1 ⋯ q_n ∈ Q^{n+1}} Pr(q_0 ⋯ q_n) 1_{|V_i(n)/n − π(i)| ≥ ε}
     = ∑_{q_0 q_1 ⋯ q_n ∈ Q^{n+1}} µ_p({ w_1 ⋯ w_n : q^{S_j}(w_1 ⋯ w_n) = q_0 ⋯ q_n }) 1_{|V_i(n)/n − π(i)| ≥ ε}
     = µ_p({ w : |V_i(n)/n − π(i)| ≥ ε })   (14)

Hence, by Equation (12) and Equation (14), we have

    lim_{n→∞} µ_p({ w : |V_i(n)/n − π(i)| ≥ ε }) = 0   (15)

If q^{S_j}(w) = q_0 ⋯ q_n and q_k ∈ Q_F for some k with 0 ≤ k ≤ n − 1, then S_j selects w_{k+1}.
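As a numerical aside (not part of the proof), the induced stochastic matrix P and its stationary distribution π can be computed by power iteration; the two-state DFA and the uniform Bernoulli distribution below are made-up examples of ours:

```python
# Compute the stochastic matrix P induced by a strongly connected DFA and a
# positive Bernoulli distribution p, then its stationary distribution pi.
p = {"0": 0.5, "1": 0.5}             # positive Bernoulli distribution on Sigma
delta = {(0, "0"): 0, (0, "1"): 1,   # transition function of a strongly
         (1, "0"): 0, (1, "1"): 0}   # connected two-state DFA
states = [0, 1]

# P_ij = sum over a in Sigma of p(a) * 1_{delta(i, a) = j}
P = [[sum(p[a] for a in p if delta[(i, a)] == j) for j in states]
     for i in states]

# Power iteration: for an irreducible aperiodic chain, lambda * P^n -> pi
# for any initial distribution lambda.
pi = [1.0 / len(states)] * len(states)
for _ in range(1000):
    pi = [sum(pi[i] * P[i][j] for i in states) for j in states]

# Check stationarity pi = pi * P and normalization.
assert all(abs(pi[j] - sum(pi[i] * P[i][j] for i in states)) < 1e-9
           for j in states)
assert abs(sum(pi) - 1.0) < 1e-9
```

For this example π = (2/3, 1/3), so the constant c chosen in the next step of the proof would be the minimum of π over the accepting states, and the expected return time to state i is 1/π(i).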
Set c = min_{i∈Q_F} π(i) (c is well-defined as Q_F ≠ ∅), and let i ∈ Q_F be such that π(i) = c. Then, for all j ∈ Q:

    µ_p(E_n(c − ε, j)) = µ_p({ w ∈ Σ^n : |S_j[w]| ≤ (c − ε)n })
     ≤ µ_p({ w ∈ Σ^n : V_i(n) ≤ (c − ε)n })
     = µ_p({ w ∈ Σ^n : V_i(n)/n − c ≤ −ε })
     ≤ µ_p({ w ∈ Σ^n : |V_i(n)/n − c| ≥ ε })

And hence, by Equation (15), we have lim_{n→∞} µ_p(E_n(c − ε, j)) = 0, and as j ∈ Q was arbitrary, we obtain

    lim_{n→∞} µ_p(E_n(c − ε)) = lim_{n→∞} µ_p(⋃_{j∈Q} E_n(c − ε, j)) ≤ lim_{n→∞} ∑_{j∈Q} µ_p(E_n(c − ε, j)) = ∑_{j∈Q} lim_{n→∞} µ_p(E_n(c − ε, j)) = 0

as desired.

Lemma 12.
Let S be a strategy, a ∈ Σ, let b, ε be real numbers with 0 < b ≤ 1 and ε > 0, and let p : Σ → [0, 1] be a positive Bernoulli distribution. Define, for all positive integers n:

    H_n(b, ε) = { w ∈ Σ^n : |S(w)| > bn ∧ |p(a) − occ_a(S(w))/|S(w)|| ≥ ε }
     = ⋃_{bn<ℓ≤n} { w ∈ Σ^n : S(w) ∈ Σ^ℓ ∧ |p(a) − occ_a(S(w))/ℓ| ≥ ε }

Then:

    lim_{n→∞} µ_p(H_n(b, ε)) = 0

Proof.
Define

    F_n(b, ε) = ⋃_{bn<ℓ≤n} { y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε }

Observe that H_n(b, ε) = { w ∈ Σ^n : S(w) ∈ F_n(b, ε) }. Thus, µ_p(H_n(b, ε)) ≤ µ_p(F_n(b, ε)) for all n, and it thus suffices to prove that lim_{n→∞} µ_p(F_n(b, ε)) = 0.

Consider the stochastic variable X_a that is 1 when a is picked from Σ with probability p(a), and 0 otherwise. Then, the mean of X_a is p(a) and the variance of X_a is p(a)(1 − p(a)). Now consider performing ℓ ≥ 1 independent Bernoulli trials drawn according to X_a. Define q : {0,1}^+ → [0, 1] inductively by q(1) = p(a), q(0) = 1 − p(a), and q(1c) = p(a)q(c) and q(0c) = (1 − p(a))q(c) for c ∈ {0,1}^+, and observe that q induces a probability distribution q̄ on {0,1}^ℓ by setting q̄(w) = q(w). Now, for any v ∈ {0,1}^ℓ, q̄(v) is the probability of obtaining v by performing ℓ repeated Bernoulli trials as above.

Define the stochastic variable X^ℓ_a = X_a + X_a + ⋯ + X_a (ℓ independent copies). Then, X^ℓ_a counts the number of occurrences of a in the ℓ repeated Bernoulli trials. By the Chernoff bound, X^ℓ_a satisfies:

    Pr( |p(a) − X^ℓ_a/ℓ| ≥ ε ) ≤ 2e^{−ℓε²p(a)/3}   (16)

Define the map g : Σ → {0,1} by g(a) = 1 and g(b) = 0 for all b ∈ Σ∖{a}. Clearly, g extends homomorphically to a map g̃ : Σ^ℓ → {0,1}^ℓ by setting g̃(c_1 c_2 ⋯ c_ℓ) = g(c_1) g(c_2) ⋯ g(c_ℓ).

Claim:
For any u ∈ {0,1}^ℓ,

    q̄(u) = µ_p({ y ∈ Σ^ℓ : g̃(y) = u })   (17)

Proof of claim:
By induction on ℓ.

• If ℓ = 1, then if u = 0 we have { y ∈ Σ^ℓ : g̃(y) = u } = Σ∖{a} and thus:

    q̄(u) = q̄(0) = q(0) = 1 − p(a) = ∑_{b∈Σ∖{a}} p(b) = µ_p(Σ∖{a})

Similarly, if u = 1, we have { y ∈ Σ^ℓ : g̃(y) = u } = {a}, and thus q̄(u) = q̄(1) = q(1) = p(a) = µ_p({a}), as desired.

• If ℓ > 1, write u = b_1 ⋯ b_{ℓ−1} b_ℓ; by the induction hypothesis:

    q̄(b_1 ⋯ b_{ℓ−1}) = µ_p({ y′ ∈ Σ^{ℓ−1} : g̃(y′) = b_1 ⋯ b_{ℓ−1} }) = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} µ_p(y′)

If b_ℓ = 0, then:

    q̄(b_1 ⋯ b_{ℓ−1} b_ℓ) = q̄(b_1 ⋯ b_{ℓ−1}) q(0) = q̄(b_1 ⋯ b_{ℓ−1})(1 − p(a))
     = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} µ_p(y′)(1 − p(a))
     = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} µ_p(y′) ∑_{c∈Σ∖{a}} p(c)
     = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} ∑_{c∈Σ∖{a}} µ_p(y′) p(c)
     (†)= ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}, c∈Σ∖{a}} µ_p(y′c)
     = ∑_{y∈Σ^ℓ, g̃(y)=b_1⋯b_{ℓ−1}b_ℓ} µ_p(y)
     = µ_p({ y ∈ Σ^ℓ : g̃(y) = b_1 ⋯ b_{ℓ−1} b_ℓ })

where (†) follows as both series on the left- and right-hand sides of the equality are absolutely convergent. The proof for the case b_ℓ = 1 is symmetric, mutatis mutandis. (End of proof of claim.)

Observe that, for any y ∈ Σ^ℓ, we have:

    |p(a) − occ_1(g̃(y))/ℓ| ≥ ε  iff  |p(a) − occ_a(y)/ℓ| ≥ ε   (18)

Hence, by Equation (17), for any event U ⊆ {0,1}^ℓ, we have:

    Pr(U) = ∑_{u∈U} q̄(u) = ∑_{u∈U} µ_p({ y ∈ Σ^ℓ : g̃(y) = u }) = µ_p({ y ∈ Σ^ℓ : g̃(y) ∈ U })   (19)

The event |p(a) − X^ℓ_a/ℓ| ≥ ε is shorthand for the set

    { u ∈ {0,1}^ℓ : |p(a) − (∑_{j=1}^ℓ u_j)/ℓ| ≥ ε } = { u ∈ {0,1}^ℓ : |p(a) − occ_1(u)/ℓ| ≥ ε }
We thus obtain:

    Pr( |p(a) − X^ℓ_a/ℓ| ≥ ε ) = Pr({ u ∈ {0,1}^ℓ : |p(a) − occ_1(u)/ℓ| ≥ ε })
     = µ_p({ y ∈ Σ^ℓ : |p(a) − occ_1(g̃(y))/ℓ| ≥ ε })  by Equation (19)
     = µ_p({ y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε })  by Equation (18)   (20)

Observe that:

    µ_p(F_n(b, ε)) = µ_p( ⋃_{bn<ℓ≤n} { y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε } )
     = ∑_{bn<ℓ≤n} µ_p({ y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε })
     = ∑_{bn<ℓ≤n} Pr( |p(a) − X^ℓ_a/ℓ| ≥ ε )  by Equation (20)
     ≤ ∑_{bn<ℓ≤n} 2e^{−ℓε²p(a)/3}  by Equation (16)
     ≤ (1 − b)n · 2e^{−bnε²p(a)/3}

And thus lim_{n→∞} µ_p(F_n(b, ε)) = 0, as desired.

Corollary 12.1.
Let b, ε be real numbers with 0 < b ≤ 1 and ε > 0. Then,

    lim_{n→∞} µ_p(G_n(b, ε)) = 0

Proof.
By Lemma 12 with S the strategy defined by the automaton A_q, we obtain that lim_{n→∞} µ_p(G_n(b, ε, q)) = 0, and as G_n(b, ε) = ⋃_{q∈Q} G_n(b, ε, q), we have:

    µ_p(G_n(b, ε)) ≤ ∑_{q∈Q} µ_p(G_n(b, ε, q))

As Q is finite, we hence obtain lim_{n→∞} µ_p(G_n(b, ε)) = 0.

Lemma 13.
There is a real number b with 0 < b ≤ 1 such that for all ε > 0:

    lim_{n→∞} µ_p(D_n^p(b, ε)) = 1.

Proof. Observe that, for all b with 0 < b ≤ 1:

    Σ^n ∖ D_n^p(b, ε) = { w ∈ Σ^n : ∃q ∈ Q. |A_q[w]| ≤ bn } ∪ { w ∈ Σ^n : ∃q ∈ Q. |A_q[w]| > bn ∧ sup_{a∈Σ} |occ_a(A_q[w])/|A_q[w]| − p(a)| ≥ ε }
     = ⋃_{q∈Q} E_n(b, q) ∪ ⋃_{q∈Q} G_n(b, ε, q)

and thus:

    µ_p(Σ^n ∖ D_n^p(b, ε)) ≤ µ_p(⋃_{q∈Q} E_n(b, q)) + µ_p(⋃_{q∈Q} G_n(b, ε, q)) = µ_p(G_n(b, ε)) + µ_p(E_n(b))

By Lemma 11, choose a real number c > 0 such that lim_{n→∞} µ_p(E_n(c − ε)) = 0, and set b = c − ε. By Corollary 12.1, we obtain that lim_{n→∞} µ_p(G_n(b, ε)) = 0, and thus lim_{n→∞} µ_p(Σ^n ∖ D_n^p(b, ε)) = 0. The result now follows by µ_p(D_n^p(b, ε)) = 1 − µ_p(Σ^n ∖ D_n^p(b, ε)).

Lemma 14.
Let p : Σ → [0, 1] be a probability distribution, let α ∈ Σ^ω be µ_p-block-distributed, and let A be a strongly connected DFA over Σ. Then, for all a ∈ Σ, the limiting frequency of a in the sequence β = A[α] exists and equals p(a).

Proof. For each n, r, let β^(n,r) be the sequence of symbols picked out from block α^(n,r) when A is applied to α; note that each β^(n,r) has length between 0 and n.

For each positive integer m, define:

    L_m = ∑_{i=1}^m |β^(n,i)|

And for each a ∈ Σ, define ρ^m_a by:

    ρ^m_a = (∑_{i=1}^m occ_a(β^(n,i))) / L_m

To prove the lemma, it suffices to show that, for any real number ε > 0 and all sufficiently large m, we have |ρ^m_a − p(a)| < ε.

Define:

    I_m = { i ≤ m : α^(n,i) ∉ D_n^p(b, ε/2) }

And define:

    ℓ_m = ∑_{i∈I_m} |β^(n,i)|

Now, define θ^m_a by:

    θ^m_a = (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (∑_{i∈{1,…,m}∖I_m} |β^(n,i)|) = (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (L_m − ℓ_m)

θ^m_a is the frequency of occurrences of a when the blocks β^(n,i) picked out from blocks α^(n,i) ∈ D_n^p(b, ε/2) with i ≤ m are all concatenated. Observe that, by definition of D_n^p, we have |θ^m_a − p(a)| < ε/2.

We have:

    ρ^m_a − θ^m_a = (∑_{i=1}^m occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m)
     = ( (∑_{i∈I_m} occ_a(β^(n,i)))/L_m + (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m )
       − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m)
     (†)= (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m) + (∑_{i∈I_m} occ_a(β^(n,i)))/L_m
     ≤ (∑_{i∈I_m} occ_a(β^(n,i)))/L_m
     ≤ (∑_{i∈I_m} |β^(n,i)|)/L_m = ℓ_m/L_m   (21)

where the penultimate inequality in the last line above follows because L_m ≥ L_m − ℓ_m implies (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m) ≤ 0, and the final inequality follows because ∑_{i∈I_m} occ_a(β^(n,i)) ≤ ∑_{i∈I_m} |β^(n,i)| = ℓ_m.

By basic algebra, we have:

    (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m) = −ℓ_m (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (L_m(L_m − ℓ_m))

and as

    ∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)) ≤ ∑_{i∈{1,…,m}∖I_m} |β^(n,i)| = L_m − ℓ_m

we conclude that:

    −ℓ_m (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (L_m(L_m − ℓ_m)) ≥ −ℓ_m/L_m

and thus by (†) above that:

    ρ^m_a − θ^m_a ≥ −ℓ_m/L_m + (∑_{i∈I_m} occ_a(β^(n,i)))/L_m ≥ −ℓ_m/L_m

whence −ℓ_m/L_m ≤ ρ^m_a − θ^m_a, which combined with Equation (21) yields |ρ^m_a − θ^m_a| ≤ ℓ_m/L_m.

By Lemma 13, pick a b such that for all ε′ > 0 we have lim_{n→∞} µ_p(D_n^p(b, ε′)) = 1. Choose δ > 0 with δ < bε/8, and pick n ∈ ℕ such that µ_p(D_n^p(b, ε/2)) > 1 − δ.
Now, pick γ < bǫ .Because α is µ p -block-distributed, there exists M ∈ N such that for all k ≥ M and all B ⊆ Σ n ,the prefix α | ≤ kn satisfies: (cid:12)(cid:12)(cid:12)(cid:12) |{ i ≤ k : α ( n,i ) ∈ B }| k − µ p ( B ) (cid:12)(cid:12)(cid:12)(cid:12) < γ In the particular case B = D pn ( b, ǫ/ , we thus have: (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) |{ i ≤ k : α ( n,i ) ∈ D pn (cid:0) b, ǫ (cid:1) }| k − µ p (cid:16) D pn (cid:16) b, ǫ (cid:17)(cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < γ − δ − |{ i ≤ k : α ( n,i ) ∈ D pn ( b, ǫ ) }| k ≤ µ p (cid:16) D pn (cid:16) b, ǫ (cid:17)(cid:17) − |{ i ≤ k : α ( n,i ) ∈ D pn ( b, ǫ ) }| k < γ whence we conclude: (cid:12)(cid:12)(cid:12)n i ≤ k : α ( n,i ) ∈ D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) > k (1 − δ − γ ) (22)By definition of D pn ( b, ǫ )) , every α ( n,i ) ∈ D pn ( b, ǫ )) satisfies | A [ α ( n,i ) ] | > bn , and we thushave: L m = m X i =1 | y ( n,i ) | = m X i =1 | A [ α ( n,i ) ] | ≥ (cid:12)(cid:12)(cid:12)n i ≤ m : α ( n,i ) ∈ D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) bn > m (1 − δ − γ ) bn (23)Furthermore, by the definition of I m and (Equation (22)): | I m | = (cid:12)(cid:12)(cid:12)n i m : α ( n,i ) D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) = m − (cid:12)(cid:12)(cid:12)n i ≤ m : α ( n,i ) ∈ D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) < m − m (1 − δ − γ ) = m ( δ + γ ) But then, ℓ m = X i ∈ I m | y ( i,n ) | ≤ | I m | n < mn ( δ + γ ) (24)and thus by Equation (23) and Equation (24): ℓ m L m < mn ( δ + γ ) m (1 − δ − γ ) bn = δ + γb (1 − δ − γ ) < bǫ + bǫ b (cid:0) − bǫ − bǫ (cid:1) < ǫ − < ǫ where we have used that bǫ < in the penultimate inequality.We now finally have | ρ a − p ( a ) | ≤ | ρ ma − θ ma | + | θ a − p ( a ) | < ℓ m L m + ǫ < ǫ ǫ ǫ concluding the proof. Lemma 15.
Let Σ be an alphabet, p a positive Bernoulli distribution on Σ, let α ∈ Σ^ω be µ_p-distributed, and let A be a strongly connected DFA over Σ. Then, A[α] is µ_p-distributed.

Proof. By Lemma 10 it suffices to show for every a ∈ Σ and every strongly connected A that the limiting frequency of a in A[α] exists and equals p(a). As α is µ_p-distributed, it follows from Proposition 8 that it is µ_p-block-distributed, and the result then follows immediately from Lemma 14.

We now show an application of the main result to the area of symbolic dynamical systems. The following section recalls basic facts about symbolic dynamical systems, including establishing the correspondence between probability maps on Σ∗ and probability measures on full shifts.

Shift spaces and genericity

We briefly introduce basic notions; full accounts can be found in standard textbooks, e.g. [35].
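Lemma 15 can be checked empirically on pseudorandom data: a strongly connected DFA run as a prefix selector over a long sample from a Bernoulli distribution should select a subsequence with (approximately) the same letter frequencies. The sketch below is illustrative only and not part of the formal development; the helper `select` and the concrete two-state DFA are our own choices.

```python
import random

def select(delta, q0, accepting, seq):
    """Oblivious finite-state selection: symbol seq[i] is selected iff the
    DFA state reached after reading seq[:i] is accepting."""
    q, out = q0, []
    for a in seq:
        if q in accepting:
            out.append(a)
        q = delta[(q, a)]
    return out

random.seed(0)
p = {'0': 0.7, '1': 0.3}
alpha = random.choices(list(p), weights=list(p.values()), k=200_000)

# A strongly connected two-state DFA: flip state on reading '1', stay on '0'.
delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0}
beta = select(delta, 0, {0}, alpha)

freq = beta.count('0') / len(beta)  # should be close to p('0') = 0.7
```

Replacing the Bernoulli weights by the output of a stationary process that is not Bernoulli (e.g. a Markov chain with memory) will in general break this preservation, in line with the characterization proved in the paper.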
Definition 13.
Let Σ be a non-empty alphabet. The (one-sided) shift s : Σ^ω −→ Σ^ω is the map defined by s(a_1 a_2 a_3 ···) = a_2 a_3 ···. A shift space is a pair (X, s) where X ⊆ Σ^ω is a closed subset (in the product topology on Σ^ω when Σ is endowed with the discrete topology) such that s(X) = X, and s is the restriction of the shift to X.

As usual, we consider the σ-algebra C on Σ^ω generated by the set of cylinders { [w] : w ∈ Σ∗ }. All measures in the remainder of the paper are understood to be measures on (Σ^ω, C).

The standard example of probability measures on shift spaces is the set of Bernoulli measures [59]:

Definition 14.
A probability measure on the shift space (Σ^ω, s) is a probability measure on Σ^ω with the σ-algebra generated by the cylinder sets { [v] : v ∈ Σ∗ }. A probability measure µ̄ on the full shift is a Bernoulli measure if there is a probability distribution p : Σ −→ [0, 1] such that the measure of each cylinder satisfies µ̄([a_1 ··· a_n]) = Π_{i=1}^n p(a_i). In this case, we say that µ̄ is induced by p.

Definition 15.
Let (X, s) be a shift space. A probability measure µ̄ on X is said to be shift-invariant if µ̄(s^{−1}(A)) = µ̄(A) for all measurable A ⊆ X. A finite word w ∈ Σ^k is said to be admissible for µ̄ if µ̄([w]) > 0.

A right-infinite sequence α ∈ Σ^ω is said to be generic for µ̄ if, for all words w admissible for µ̄, we have:

    lim_{n→∞} #_w(α|≤n) / n = µ̄([w])

where #_w(x) denotes the number of occurrences of w in x. That is, w occurs in α with limiting frequency µ̄([w]).

The study of probability measures on the full shift is cryptomorphic to the study of invariant probability maps; this folklore result is contained in the following two propositions (proofs can be found in Appendix A).

Proposition 16.
Every invariant probability map µ : Σ∗ −→ [0, 1] induces a shift-invariant probability measure µ̄ on Σ^ω by setting µ̄([w]) = µ(w). Conversely, every probability measure ν on Σ^ω induces a probability map ν̲ : Σ∗ −→ [0, 1] by defining ν̲(w) = ν([w]); if ν is shift-invariant, then ν̲ is invariant. Furthermore, the two constructions are mutually inverse: the probability map induced by µ̄ is µ itself, and the measure induced by ν̲ is ν itself.

Proposition 17.
Let µ : Σ∗ −→ [0, 1] be a probability map. The following are equivalent:

1. There exists a µ-distributed α ∈ Σ^ω.
2. µ is invariant.
3. There exists a shift-invariant probability measure ν on Σ^ω such that µ̄ = ν.

Conversely, let ν be a probability measure on Σ^ω. The following are equivalent:

1. There exists α ∈ Σ^ω that is generic for ν.
2. ν is shift-invariant.
3. There exists an invariant probability map µ : Σ∗ −→ [0, 1] such that ν̲ = µ.

(Footnote: for one-sided shifts, some authors require only s(X) ⊆ X; we shall not do so here.)

It follows that the shift-invariant probability measures ν on the full shift such that genericity is preserved by finite-state selection are exactly the Bernoulli measures:

Theorem 18.
Let Σ be a non-empty alphabet, and let ν be a shift-invariant measure on the full shift (Σ^ω, s) such that there exists at least one α ∈ Σ^ω generic for ν. Then, every finite-state selector preserves genericity iff ν is a Bernoulli measure such that all words in Σ∗ are admissible.

Proof. Observe that for a Bernoulli measure µ̄ on the full shift on Σ, all words are admissible iff µ̄([a]) > 0 for all a ∈ Σ. The Theorem now follows from Theorem 4 and Proposition 17.

The most obvious extension of our main results is to attempt to relax the requirement that selection is done by a DFA, using methods similar to those of Kamae and Weiss [31] and Kamae and Wang [68], where reasoning combining density arguments with relaxed finiteness conditions on the syntactic monoid of the strategy (using our terminology) has been used for normal sequences over binary alphabets. We conjecture that some of these techniques can be adapted to positive Bernoulli distributions on arbitrary finite alphabets.

A different possible thrust is to consider generalizations of Agafonov's Theorem on domains different from infinite sequences over alphabets. However, some results in the (sparse) literature on selection from normal sequence-like objects in other contexts are negative; for example, normality is not preserved along arithmetic progressions (so, probably not by finite-state selectors in any reasonable sense) for continued fraction expansions [27]. On the other hand, very recent work by Bergelson et al. has successfully adapted the classical techniques of Kamae and Weiss [31] to show that certain Følner sequences preserve (the appropriate analogue of) normality in cancellative amenable semigroups [7].
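Theorem 18's positive direction can likewise be illustrated by simulation: a sequence sampled from a Bernoulli measure is almost surely generic for it, and word frequencies in a finite-state-selected subsequence should approach the corresponding cylinder measures. The sketch below is a hypothetical illustration (helper names ours), not part of the proof.

```python
import random

def occurrences(w, x):
    """Number of (possibly overlapping) occurrences of the word w in x."""
    return sum(x[i:i + len(w)] == w for i in range(len(x) - len(w) + 1))

def select(delta, q0, accepting, seq):
    """Prefix selection by a DFA, as in the main text."""
    q, out = q0, []
    for a in seq:
        if q in accepting:
            out.append(a)
        q = delta[(q, a)]
    return ''.join(out)

random.seed(1)
p = {'a': 0.6, 'b': 0.4}
alpha = random.choices('ab', weights=[0.6, 0.4], k=300_000)

# Strongly connected DFA remembering the last symbol read; select after an 'a'.
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0}
beta = select(delta, 0, {1}, alpha)

# Empirical frequency of the word 'ab' vs. its Bernoulli cylinder measure.
measure_ab = p['a'] * p['b']                  # = 0.24
freq_ab = occurrences('ab', beta) / len(beta)
```

The selected subsequence here consists of the symbols immediately following an 'a'; since the sampled symbols are independent, the subsequence is again Bernoulli-distributed, matching the preservation the theorem predicts.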
Auxiliary proofs and definitions
A.1 Automata and selectors
The following is a proof of the extension of Lemma 2.6 of [56]. The proof follows the original in most details.
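The construction given in this appendix is effective: the word w of Lemma 19 below can be computed by breadth-first search over the transition graph, trapping the states one at a time in recurrent components exactly as in the inductive proof. The following sketch assumes a total transition function given as a Python dict; all names are ours.

```python
from collections import deque

def run(delta, q, w):
    """delta*(q, w): run the automaton from state q on the word w."""
    for a in w:
        q = delta[(q, a)]
    return q

def recurrent_states(states, alphabet, delta):
    """States lying in recurrent (sink) strongly connected components."""
    def reachable(s):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for a in alphabet:
                v = delta[(u, a)]
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen
    r = {q: reachable(q) for q in states}
    # q is recurrent iff everything reachable from q can reach q back
    return {q for q in states if all(q in r[v] for v in r[q])}

def trapping_word(states, alphabet, delta):
    """A word w with delta*(q, w) in a recurrent component for every q,
    built state by state as in the proof of Lemma 19."""
    rec = recurrent_states(states, alphabet, delta)
    w = ''
    for q0 in states:
        q = run(delta, q0, w)      # image of q0 under the word built so far
        if q in rec:
            continue               # recurrent components are never left
        parent = {q: None}         # BFS towards a recurrent state
        frontier = deque([q])
        while frontier:
            u = frontier.popleft()
            if u in rec:
                path = ''
                while parent[u] is not None:
                    u, a = parent[u]
                    path = a + path
                w += path
                break
            for a in alphabet:
                v = delta[(u, a)]
                if v not in parent:
                    parent[v] = (u, a)
                    frontier.append(v)
    return w

# Example: state 2 forms the unique recurrent component.
states, alphabet = [0, 1, 2], 'ab'
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 2, (1, 'b'): 0,
         (2, 'a'): 2, (2, 'b'): 2}
w = trapping_word(states, alphabet, delta)
```

For infinite alphabets the BFS should, as in the proof, only ever explore the finitely many distinct successor states, so restricting the loop over `alphabet` to a finite set of representative letters would be needed in practice.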
Definition 16.
Let G = (V, E) be a directed multigraph, and denote by ∼ ⊆ V × V the equivalence relation such that v ∼ w iff v and w are in the same strongly connected component of G. For every v ∈ V, denote by [v]∼ the equivalence class containing v. Define the partial order < on V/∼ by V < W iff V ≠ W and there are v ∈ V and w ∈ W such that there is a directed path from w to v.

If G has a finite number of nodes, < is clearly well-founded. As < is clearly also transitive, every W ∈ V/∼ satisfies W ≥ V for some <-minimal V ∈ V/∼.

Also observe that every <-minimal V is a recurrent strongly connected component, because (i) it is strongly connected by definition, and (ii) <-minimality implies that no directed path from any node in V can reach a node in a strongly connected component distinct from V.

Lemma 19.
Let S = (Q, δ, q_s, Q_F) be a finite automaton over a (possibly infinite) alphabet Σ. Then there is a word w ∈ Σ∗ such that, for all states q ∈ Q, δ∗(q, w) is a state in a <-minimal element of Q/∼.

Proof. Write Q = {q_1, . . . , q_m}. We prove by induction on i ≤ m that there is a word w_i ∈ Σ∗ such that for all j ≤ i, δ∗(q_j, w_i) is a state in a <-minimal element of Q/∼.

i = 1: Let V be a <-minimal element of Q/∼ such that [q_1]∼ ≥ V. Choose q ∈ Q such that [q]∼ = V. Then there is a (possibly empty) directed path from q_1 to q. Let w_1 be the word along that path, and observe that δ∗(q_1, w_1) = q.

i > 1: Let V be a <-minimal element of Q/∼ such that [δ∗(q_{i+1}, w_i)]∼ ≥ V, and let q ∈ V, whence there is a (possibly empty) directed path from δ∗(q_{i+1}, w_i) to q. Let w′ ∈ Σ∗ be the word along that path, whence δ∗(δ∗(q_{i+1}, w_i), w′) = q. Define w_{i+1} = w_i · w′, and observe that δ∗(q_{i+1}, w_{i+1}) = q.

For j ≤ i, we claim that δ∗(q_j, w_{i+1}) is a state in a <-minimal element of Q/∼. For, by the Induction Hypothesis, δ∗(q_j, w_i) is in a <-minimal element V_j of Q/∼, and as <-minimal elements are recurrent strongly connected components, no directed path from δ∗(q_j, w_i) can end in a state outside V_j; in particular, δ∗(q_j, w_{i+1}) ∈ V_j.

Proof of Lemma 3.
Any <-minimal element of Q/∼ is recurrent. By Lemma 19, there is a word w such that from any state q ∈ Q, δ∗(q, w) is a state in a recurrent strongly connected component of the automaton. As p(a) > 0 for all a ∈ Σ, we have µ_p(w) > 0, and as α is µ_p-distributed, w thus occurs (infinitely often) in α. After the first occurrence of w, the run of A on α has entered a recurrent strongly connected component.

A.2 µ-distribution

Proof of Proposition 8.
We use exactly the same arguments as in the proof by Niven and Zuckerman [43], but using the notation of the present paper. Almost the entirety of the proof in [43] is devoted to counting arguments on finite prefixes of α, and involves neither the size of the alphabet Σ, nor the particular distribution on it; indeed, any consideration of those matters is isolated to a few observations in the beginning of the proof that are then used repeatedly when taking limits later on. We have clearly indicated those observations below, but give the entirety of the proof in the interest of completeness.

Let w = w_1 ··· w_v ∈ Σ^v be arbitrary. We introduce the following notation:

• #_u(x) denotes the number of occurrences of the word u in the finite string x.
• For any t ≥ 1, wΣ^t w is the set { wuw : u ∈ Σ^t }.
• #^i_w(n) is the number of times that w occurs in α|≤n at a position congruent to i (mod v).
• #^{i,j}_w(n) = #^i_w(n) − #^j_w(n).
• g : N −→ N is the function defined by g(n) = Σ_{i=0}^{v−1} #^i_w(n).
• θ_t(n) is the number of occurrences of any element from wΣ^t w in α|≤n.
• w′ is shorthand for any string of length between v + 1 and 2v − 1 whose first v digits are w and whose last v digits are w, i.e. an “overlap of w with itself”.
Such a string does not necessarily exist.

We now treat the part of the proof depending on the cardinality of Σ and on µ_p-distributedness (as opposed to finiteness of Σ and equidistribution).

As α is µ_p-distributed, we have

    lim_{n→∞} g(n)/n = µ_p(w)    (25)

and for each fixed t ≥ 1, we also have:

    lim_{n→∞} θ_t(n)/n = lim_{n→∞} (Σ_{a_1···a_t ∈ Σ^t} #_{w a_1···a_t w}(α|≤n)) / n
                       = Σ_{a_1···a_t ∈ Σ^t} lim_{n→∞} #_{w a_1···a_t w}(α|≤n) / n    (by the Dominated Convergence Theorem)
                       = Σ_{a_1···a_t ∈ Σ^t} µ_p(w a_1···a_t w)
                       = Σ_{a_1···a_t ∈ Σ^t} µ_p(w)² Π_{i=1}^t p(a_i)    (as µ_p is induced by a Bernoulli distribution)
                       = µ_p(w)² Σ_{a_1···a_t ∈ Σ^t} Π_{i=1}^t p(a_i)
                       = µ_p(w)² Π_{i=1}^t (Σ_{a ∈ Σ} p(a))    (by monotone convergence)
                       = µ_p(w)²    (as Σ_{a ∈ Σ} p(a) = 1)    (26)

We shall prove that:

    lim_{n→∞} #^{i,j}_w(n)/n = 0    (27)

By Equation (25) and Equation (27), it follows for any i with 0 ≤ i < v that:

    lim_{n→∞} #^i_w(n)/n = µ_p(w)/v

and, as v and w ∈ Σ^v were arbitrary, that α is µ_p-block-distributed.

The remainder of the proof is devoted to proving Equation (27) and is only concerned with counting arguments on finite prefixes of α. All arguments from here on are, modulo notation and use of Equation (26), completely identical to the proof in [43].

Let s ≥ 1 be an integer. Observe that #^i_w(n + s) − #^i_w(n) is the number of occurrences of w that (1) are in α|≤n+s at a position congruent to i mod v, but (2) are not entirely contained in α|≤n. Thus, Σ_i …

Proof of Proposition 16. The two identities of Proposition 16 (the probability map induced by µ̄ is µ itself, and the measure induced by ν̲ is ν itself) follow directly from the definitions. If µ is invariant, then for every cylinder [w], we have µ̄([w]) = µ(w) = Σ_{a ∈ Σ} µ(w · a) = Σ_{a ∈ Σ} µ̄([w · a]); from this, and the observation that µ(λ) = µ̄([λ]) = µ̄(Σ^ω) = 1, it follows that µ̄ is a probability measure on Σ^ω with the σ-algebra generated by the cylinder sets.
In addition, as µ is invariant, we have for any cylinder [w] that:

    µ̄(s^{−1}([w])) = µ̄(∪_{a ∈ Σ} [a · w]) = Σ_{a ∈ Σ} µ̄([a · w]) = Σ_{a ∈ Σ} µ(a · w) = µ(w) = µ̄([w])

whence µ̄ is shift-invariant.

Conversely, if ν is a shift-invariant probability measure on Σ^ω, we have for any w that:

    ν̲(w) = ν([w]) = ν(∪_{a ∈ Σ} [w · a]) = Σ_{a ∈ Σ} ν([w · a]) = Σ_{a ∈ Σ} ν̲(w · a)

and

    ν̲(w) = ν([w]) = ν(s^{−1}([w])) = ν(∪_{a ∈ Σ} [a · w]) = Σ_{a ∈ Σ} ν([a · w]) = Σ_{a ∈ Σ} ν̲(a · w)

showing that ν̲ is invariant.

Proof of Proposition 17. For the first part, we prove 1 ⇒ 2 ⇒ 3 ⇒ 1. If there exists a µ-distributed α ∈ Σ^ω, then for any w ∈ Σ∗ and any ε > 0, for all sufficiently large n we have sup_{b ∈ Σ∪{λ}} |#_{b·w}(α|≤n)/n − µ(b · w)| < ε. Observe that every occurrence of a word of the form a · w in α contains an occurrence of w, and hence #_w(α|≤n) ≥ Σ_{a ∈ Σ} #_{a·w}(α|≤n). Conversely, for every occurrence of w starting at some position i ≥ 2 in α, there is exactly one a ∈ Σ such that the word a · w occurs at position i − 1, whence #_w(α|≤n) ≤ 1 + Σ_{a ∈ Σ} #_{a·w}(α|≤n), and hence:

    |µ(w) − Σ_{a ∈ Σ} µ(a · w)| = |µ(w) − #_w(α|≤n)/n + #_w(α|≤n)/n − Σ_{a ∈ Σ} µ(a · w)|
                                ≤ |µ(w) − #_w(α|≤n)/n| + |#_w(α|≤n)/n − Σ_{a ∈ Σ} µ(a · w)|
                                < ε + 1/n + |(Σ_{a ∈ Σ} #_{a·w}(α|≤n))/n − Σ_{a ∈ Σ} µ(a · w)|
                                ≤ ε + 1/n + Σ_{a ∈ Σ} |#_{a·w}(α|≤n)/n − µ(a · w)|
                                < ε + 1/n + |Σ|ε

and as ε was arbitrary and n may be taken arbitrarily large, we thus have µ(w) = Σ_{a ∈ Σ} µ(a · w).
The case for µ(w) = Σ_{a ∈ Σ} µ(w · a) is symmetric, mutatis mutandis, and hence µ is invariant. If µ is invariant, then by Proposition 16, µ̄ is a shift-invariant probability measure on Σ^ω. If ν is a shift-invariant probability measure on Σ^ω such that µ̄ = ν, then by [38, Main Thm. 2.1] there exists α ∈ Σ^ω generic for µ̄, and thus for any admissible w ∈ Σ∗:

    lim_{n→∞} #_w(α|≤n)/n = ν([w]) = µ̄([w]) = µ(w)

Observe that any inadmissible word w = a_1 ··· a_n has Π_{i=1}^n µ(a_i) = µ(w) = 0, whence µ(a_i) = 0 for some i, and hence lim_{n→∞} #_w(α|≤n)/n ≤ lim_{n→∞} #_{a_i}(α|≤n)/n = 0. Hence, α is µ-distributed.

For the second part, we prove 1 ⇒ 3 ⇒ 2 ⇒ 1. Assume that α is generic for ν. By construction, ν̲ is a probability map such that α is ν̲-distributed, and by the first part of the proposition, ν̲ is invariant, as desired. If ν̲ is an invariant probability map, then as any measurable A can be written as a disjoint union of cylinder sets, and as we have s^{−1}([w]) = ∪_{a ∈ Σ} [a · w] for any cylinder [w], we obtain

    ν(s^{−1}([w])) = ν(∪_{a ∈ Σ} [a · w]) = Σ_{a ∈ Σ} ν([a · w]) = Σ_{a ∈ Σ} ν̲(a · w) = ν̲(w) = ν([w])

showing that ν is shift-invariant. Finally, if ν is shift-invariant, it follows from [38, Main Thm. 2.1] that there exists α ∈ Σ^ω generic for ν, as desired.

References

[1] V. N. Agafonov. “Normal sequences and finite automata”. English. In: Sov. Math., Dokl. issn: 0197-6788.
[2] D. Airey and B. Mance. “Normality preserving operations for Cantor series expansions and associated fractals, I”. In: Illinois J. Math.
[3] In: Acta Arithmetica 180 (2017), pp. 333–346.
[4] N. Alon, Y. Matias, and M. Szegedy. “The Space Complexity of Approximating the Frequency Moments”. In: Journal of Computer and System Sciences.
[5] In: Journal of Computer and System Sciences.
[6] In: Theoretical Computer Science 477 (2013), pp. 109–116.
[7] V. Bergelson, T. Downarowicz, and J. Vandehey.
Deterministic functions on amenable semigroups and a generalization of the Kamae-Weiss theorem on normality preservation. 2020. arXiv: .
[8] G. Berry and G. Gonthier. “The Esterel synchronous programming language: design, semantics, implementation”. In: Science of Computer Programming.
[9] In: Israel Journal of Mathematics 80 (1992), pp. 257–287.
[10] F. Blanchard. “Non literal transducers and some problems of normality”. In: Journal de Théorie des Nombres de Bordeaux.
[11] In: IEEE Transactions on Information Theory.
[12] É. Borel. “Les probabilités dénombrables et leurs applications arithmétiques”. In: Rend. Circ. Matem. Palermo 27 (1909), pp. 247–271.
[13] S. Boucheron, A. Garivier, and E. Gassiat. “Coding on Countably Infinite Alphabets”. In: IEEE Trans. Inf. Theory.
[14] In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Ed. by P. Raghavendra et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 58–70.
[15] A. Broglio and P. Liardet. “Predictions with automata. Symbolic dynamics and its applications”. In: Contemporary Mathematics 135 (1992), pp. 111–124. Also appeared in Proceedings of the AMS Conference in honor of R. L. Adler, New Haven, CT, USA, 1991.
[16] O. Carton. “A direct proof of Agafonov’s theorem and an extension to shifts of finite type”. In: Preprint (2020).
[17] O. Carton and J. Vandehey. “Preservation of Normality by Non-Oblivious Group Selection”. In: Theory of Computing Systems (2020).
[18] P. Caspi et al. “Lustre: A Declarative Language for Programming Synchronous Systems”. In: Conference Record of the Fourteenth Annual ACM Symposium on Principles of Programming Languages, Munich, Germany, January 21–23, 1987. 1987, pp. 178–188.
[19] J. W. S. Cassels. “On a paper of Niven and Zuckerman”. In: Pacific J. Math.
[20] D. G. Champernowne. “The Construction of Decimals Normal in the Scale of Ten”. In: Journal of the London Mathematical Society s1-8.4 (1933), pp. 254–260.
[21] A. Church. “On the concept of a random sequence”. In: Bulletin of the American Mathematical Society.
[22] In: American Journal of Mathematics.
[23] In: American Journal of Mathematics.
[24] In: Bull. Amer. Math. Soc.
[25] In: Canadian J. Math. (1952), pp. 58–63.
[26] C. Dwork.
“Differential Privacy in New Settings”. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17–19, 2010. 2010, pp. 174–183.
[27] B. Heersink and J. Vandehey. “Continued fraction normality is not preserved along arithmetic progressions”. In: Archiv der Mathematik 106 (Sept. 2015).
[28] M. Holzer, M. Kutrib, and A. Malcher. “Multi-Head Finite Automata: Characterizations, Concepts and Open Problems”. In: CSP. 2008, pp. 93–107.
[29] M. Hosseini and N. Santhanam. “On redundancy of memoryless sources over countable alphabets”. 2014, pp. 299–303.
[30] P. Indyk and D. P. Woodruff. “Optimal approximations of the frequency moments of data streams”. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22–24, 2005. 2005, pp. 202–208.
[31] T. Kamae and B. Weiss. “Normal numbers and selection rules”. In: Israel Journal of Mathematics (1975), pp. 101–110.
[32] E. Kamke. “Über neuere Begründungen der Wahrscheinlichkeitsrechnung”. In: Jahresbericht der Deutschen Mathematiker-Vereinigung 42 (1933), pp. 14–27.
[33] G. Kellaris et al. “Differentially Private Event Sequences over Infinite Streams”. In: Proc. VLDB Endow.
[34] In: Moscow Univ. Math. Bull.
[35] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1995.
[36] M. Madritsch. “Normal Numbers and Symbolic Dynamics”. In: Sequences, Groups, and Number Theory. Ed. by V. Berthé and M. Rigo. Cham: Springer International Publishing, 2018, pp. 271–329.
[37] M. Madritsch, A.-M. Scheerer, and R. Tichy. “Computable Absolutely Pisot Normal Numbers”. In: Acta Arithmetica 184 (2018), pp. 7–29.
[38] M. Madritsch and B. Mance. “Construction of µ-normal sequences”. In: Monatshefte für Mathematik 179 (2016), pp. 259–280.
[39] B. Mance. “Cantor series constructions of sets of normal numbers”. In: Acta Arithmetica 156 (2012), pp. 223–245.
[40] W. Merkle and J. Reimann. “Selection Functions that Do Not Preserve Normality”.
In: Theory Comput. Syst.
[41] S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers Inc., 2005.
[42] Y. Nakai and I. Shiokawa. “Discrepancy estimates for a class of normal numbers”. In: Acta Arithmetica.
[43] I. Niven and H. S. Zuckerman. “On the definition of normal numbers”. In: Pacific J. Math.
[44] In: Journal of Computer and System Sciences.
[45] V. N. Agafonov. “Normal'nye posledovatel'nosti i konechnye avtomaty” [Normal sequences and finite automata]. In: Problemy kibernetiki. Ed. by A. A. Lyapunov. Vol. 20. Nauka, Akademii nauk SSSR, 1968, pp. 123–129.
[46] V. N. Agafonov. “Normal'nye posledovatel'nosti i konechnye avtomaty” [Normal sequences and finite automata]. In: Dokl. AN SSSR.
[47] A. G. Postnikov. “Arifmeticheskoe modelirovanie sluchainykh protsessov” [Arithmetic modeling of random processes]. In: Tr. MIAN SSSR 57 (1960), pp. 3–84.
[48] A. G. Postnikov and I. I. Pyatetskii-Shapiro. “Normal'nye po Bernulli posledovatel'nosti znakov” [Sequences of symbols that are Bernoulli-normal]. In: Izv. AN SSSR. Ser. matem.
[49] L. P. Postnikova. “O svyazi ponyatii kollektiva Mizesa–Chercha i normal'noi po Bernulli posledovatel'nosti znakov” [On the connection between the concepts of a Mises–Church collective and a Bernoulli-normal sequence of symbols]. In: Teoriya veroyatn. i ee primen.
[50] Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 2003.
[51] P. Pollack and J. Vandehey. “Some Normal Numbers Generated by Arithmetic Functions”. In: Canadian Mathematical Bulletin 58 (Sept. 2013).
[52] L. Postnikova. “On the connection between the concepts of collectives of Mises–Church and normal Bernoulli sequences of symbols”. In: Theory of Probability & Its Applications.
[53] In: Mathematische Zeitschrift.
[54] In: Annales de l’institut Henri Poincaré. Vol. 7.5. 1937, pp. 267–348.
[55] A.-M. Scheerer. “Computable Absolutely Normal Numbers and Discrepancies”. In: Mathematics of Computation 86 (Nov. 2015).
[56] C. Schnorr and H. Stimm. “Endliche Automaten und Zufallsfolgen”. In: Acta Informatica.
[57] Basics of Applied Stochastic Processes. Probability and Its Applications. Springer-Verlag, 2009.
[58] A. Shen. “Automatic Kolmogorov Complexity and Normality Revisited”. In: Fundamentals of Computation Theory - 21st International Symposium, FCT 2017, Bordeaux, France, September 11–13, 2017, Proceedings. Ed. by R. Klasing and M. Zeitoun. Vol. 10472. Lecture Notes in Computer Science. Springer, 2017, pp. 418–430.
[59] P. Shields. The Theory of Bernoulli Shifts. Univ. Chicago Press, 1973.
[60] W. Sierpinski. “Démonstration élémentaire du théorème de M. Borel sur les nombres absolument normaux et détermination effective d’un tel nombre”. In: Bulletin de la Société Mathématique de France 45 (1917), pp. 125–132.
[61] J. F. Silva and P. Piantanida. “Almost lossless variable-length source coding on countably infinite alphabets”. 2016, pp. 1–5.
[62] R. Stephens. “A Survey of Stream Processing”. In: Acta Informatica.
[63] In: Journal für die reine und angewandte Mathematik.
[64] In: Journal of Number Theory 166 (2016), pp. 424–451.
[65] J. Vandehey. “The normality of digits in almost constant additive functions”. In: Monatshefte für Mathematik 171 (June 2012).
[66] J. Vandehey. “Uncanny subsequence selections that generate normal numbers”. In: Uniform Distribution Theory 12 (2017), pp. 65–75.
[67] R. Von Mises. “Grundlagen der Wahrscheinlichkeitsrechnung”. In: Mathematische Zeitschrift