Agafonov's Theorem for finite and infinite alphabets and probability distributions different from equidistribution
Thomas Seiller (CNRS & Université Sorbonne Paris Nord), Jakob Grue Simonsen (University of Copenhagen). November 2020
Abstract
An infinite sequence over a finite alphabet of symbols Σ is called normal iff the limiting frequency of every finite string w ∈ Σ* exists and equals |Σ|^{−|w|}. A celebrated theorem by Agafonov states that a sequence α is normal iff every finite-state selector (i.e., a DFA accepting or rejecting prefixes of α) selects a normal sequence from α.

Let μ : Σ* → [0,1] be a probability map (for every n ≥ 1, ∑_{w∈Σ^n} μ(w) = 1). Say that an infinite sequence α is μ-distributed if, for every w ∈ Σ*, the limiting frequency of w in α exists and equals μ(w). Thus, α is normal iff it is μ-distributed for the probability map μ(w) = |Σ|^{−|w|}.

Unlike normality, μ-distributedness is not preserved by finite-state selectors for all probability maps μ. This raises the question of how to characterize the probability maps μ for which μ-distributedness is preserved across finite-state selection, or equivalently, by selection by programs using constant space.

We prove the following result: for any finite or countably infinite alphabet Σ, every finite-state selector over Σ selects a μ-distributed sequence from every μ-distributed sequence α iff μ is induced by a Bernoulli distribution on Σ, that is, for every word a_1 ··· a_n ∈ Σ*, μ(a_1 ··· a_n) = ∏_{i=1}^n μ(a_i).

The primary, and remarkable, consequence of our main result is a complete characterization of the set of probability maps, on finite and infinite alphabets, for which Agafonov-type results hold. The main positive takeaway is that (the appropriate generalization of) Agafonov's Theorem holds for Bernoulli distributions (rather than just equidistributions) on both finite and countably infinite alphabets.

As a further consequence, we obtain a result in the area of symbolic dynamical systems: the shift-invariant measures ν on Σ^ω such that any finite-state selector preserves the property of genericity for ν are exactly the positive Bernoulli measures.
Contents

μ-distributedness for non-Bernoulli measures
Finite-state selectors preserve μ-distributedness for Bernoulli measures
μ_p-distributedness under finite-state selection
A.1 Automata and selectors
A.2 μ-distribution
A.3 Symbolic dynamical systems

Introduction
Let α = x_1 x_2 ··· be an infinite sequence over a finite alphabet Σ. A string w ∈ Σ* is said to occur in α with limiting frequency f if lim_{N→∞} occ_w(x_1 ··· x_N)/N = f, where occ_w(x_1 ··· x_N) is the number of times that w occurs as a contiguous substring of x_1 ··· x_N. α is said to be normal if every finite string of length n over Σ occurs with limiting frequency |Σ|^{−n} in α [12]. By standard results, the fractional part of the base-b expansion of almost all real numbers is a normal sequence for every integer b ≥ 2; so for base 10, almost all real numbers have the digit "0" occurring 1-in-10 times in all sufficiently long finite prefixes of their digit expansion, have "11" occurring 1-in-100 times, "110" occurring 1-in-1000 times, and so on. Concrete examples of normal sequences include Champernowne's sequence 123456789101112··· [20], the Copeland-Erdös sequence 23571113··· consisting of concatenating the prime numbers [24], and, for any polynomial f with positive integer coefficients, the sequence f(1)f(2)f(3)··· [25].

A finite-state selector is a DFA that selects those symbols x_m from α such that x_1 ··· x_{m−1} is accepted by the DFA. The sequence of selected symbols may thus be finite or infinite. Agafonov's Theorem states that a sequence α is normal iff any DFA that selects an infinite sequence from α selects a normal sequence. Colloquially, Agafonov's Theorem can be stated as: "any constant-space algorithm must preserve normality".

The purpose of this paper is twofold: (I) we study whether analogues of Agafonov's Theorem hold if the distribution of finite strings is different from equidistribution, i.e.
allowing distributions in which a finite string w may occur with frequency different from |Σ|^{−|w|}; and (II) we study extensions of Agafonov's Theorem to infinite alphabets (a setting in which the traditional statement of Agafonov's Theorem is meaningless, as there is no equidistributed probability distribution on a countably infinite set).

As an example, consider the (non-normal) sequence α = 010101···. Clearly, every finite bit string occurs in α with some well-defined frequency (the simplest way to see this is that for each n ≥ 1, there are exactly two distinct substrings of length n in α, one starting with 0 and one starting with 1), and the frequencies thus induce a probability distribution on {0,1}^n for each n. In particular, 0 and 1 each occur with limiting frequency 1/2, but any DFA that selects the symbols at even positions will select the sequence 111···, and thus the probability distribution on {0,1} is not preserved, showing that Agafonov's Theorem in general fails to hold.

In addition to being intrinsically interesting, our study of Agafonov's Theorem is motivated by the fact that constant-space algorithms are usually employed in reactive programming languages used for signal processing (see Section 1.2.2 below), both for transduction and selection, and Agafonov's Theorem is a strong guarantee that such algorithms will always preserve one notion of randomness for infinite strings, namely that every length-n string occurs with limiting frequency exactly |Σ|^{−n}. As the above example shows, selection from sequences where 0 and 1 are merely known to occur with frequency 1/2 is not enough; stronger guarantees such as normality must hold.
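The failure in the example above is easy to check directly. The following sketch (our own illustration, not from the paper) implements a two-state finite-state selector that accepts exactly the odd-length prefixes, and hence selects the symbols at even positions; applied to 010101···, the selection consists only of 1s.

```python
def select(alpha, accepting, delta, start=0):
    """Select a_n from alpha whenever the DFA accepts the prefix a_1 ... a_{n-1}."""
    state, out = start, []
    for a in alpha:
        if state in accepting:   # prefix read so far is accepted: select next symbol
            out.append(a)
        state = delta[state, a]
    return "".join(out)

alpha = "01" * 500  # a long prefix of 010101...

# Two-state DFA tracking the parity of the prefix length; state 1 (odd length)
# is accepting, so the symbols at even positions are selected.
delta = {(0, "0"): 1, (0, "1"): 1, (1, "0"): 0, (1, "1"): 0}
beta = select(alpha, accepting={1}, delta=delta)

print(beta[:8])                       # '11111111': only 1s are selected
print(alpha.count("0") / len(alpha))  # 0.5 in the original sequence
print(beta.count("0") / len(beta))    # 0.0 in the selected sequence
```

The single-symbol frequencies are thus not preserved, even though every finite string has a well-defined limiting frequency in α.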
Conversely, normality is a very strong requirement; in some infinite sequences, certain elements may occur with much higher frequency than others, and one tantalizing way of generating new sequences having the same distribution of finite subsequences could be to simply let a DFA select elements from the original sequence, which in general is only possible if (the appropriate analogue of) Agafonov's Theorem holds.

The motivation for studying infinite alphabets is that the study of normality is closely tied to the study of symbolic dynamics and (information-theoretic) coding theory [10, 9, 35, 36], and that both areas have witnessed recent advances using infinite alphabets [13, 29, 11, 61, 38]; in particular, the techniques of Madritsch and Mance [38] have allowed construction of Champernowne-like sequences for various distributions over infinite alphabets.

The formal statement of the main theorem can be found in Theorem 4 below. In plain language, we prove that:
Let Σ be a non-empty finite or countably infinite alphabet, and let μ : Σ* → [0,1] be a probability map (i.e., for all n ≥ 1, ∑_{w∈Σ^n} μ(w) = 1) such that there exists at least one α ∈ Σ^ω that is μ-distributed. Then, the following are equivalent:

1. μ is induced by a positive Bernoulli probability distribution p on Σ, i.e. for every a_1, ..., a_n ∈ Σ, μ(a_1 ··· a_n) = ∏_{i=1}^n p(a_i), and for every a ∈ Σ, p(a) > 0.

2. For every DFA A over Σ and every μ-distributed sequence α ∈ Σ^ω, if A selects an infinite sequence from α, then the selected sequence is μ-distributed.

The above result completely characterizes the probability maps preserved by selection by DFAs, both for finite and infinite alphabets, and Agafonov's Theorem follows immediately as a corollary. We briefly review the roadmap and techniques used for the proof of the main result in Section 1.3.

As the study of distributions associated to limiting frequencies of finite strings in (right-)infinite strings is cryptomorphic to the study of shift-invariant probability measures on the shift space (Σ^ω, s) equipped with the σ-algebra induced by the basis of cylinder sets on Σ, we obtain as a corollary a result in the field of symbolic dynamical systems, namely a complete characterization of the shift-invariant probability measures ν for which any finite-state selector preserves genericity for ν; see Section 6.

Agafonov's Theorem [46] was one of the end results of multiple efforts grappling with the two notions of (i) kollektiv (roughly, α ∈ {0,1}^ω is a kollektiv wrt. a set S of selection strategies if the limiting frequency of 1 is unchanged after applying any strategy in S to α), and (ii) admissible sequence, and their relation to the notion of normal sequence [22, 53, 54, 48, 47].
Agafonov's Theorem itself had a virtually unknown precursor in a beautiful result by Postnikova [52] that showed, in the terminology of the present paper, that α ∈ {0,1}^ω is normal iff the distribution of 1s is preserved by selection strategies depending only on a finite number of preceding bits.

Both Postnikova [52] and Agafonov [45] considered selection functions on sequences in {0,1}^ω where the limiting frequency of 1 was some p with 0 < p < 1 (i.e., considered a Bernoulli distribution on {0,1}), but the case of general Bernoulli distributions, as opposed to the special case of equidistribution, appears to have received little attention since. (Footnote: The exact definition of kollektiv differs subtly across different authors; compare e.g. [67], [21], and [52]. The original notion of kollektiv introduced by von Mises [67] had no constraints on the set S, but this turned out to be essentially fruitless [63, 53, 32, 23].)

For equidistribution, the earliest extension to arbitrary alphabets seems to be by Broglio and Liardet [15], and a number of authors have since re-proved Agafonov's Theorem in the special case of equidistribution using a variety of methods; for example, using predictors defined from finite automata (for Σ = {0,1}) [44], using compressibility arguments [6, 5, 58], and a combination of automata-theoretic and probabilistic methods similar to Agafonov's original reasoning [16].

Agafonov's Theorem itself has been generalized to treat selectors that are not necessarily (induced by prefix selection by) finite automata [2, 5, 66, 17], and some generalizations consider selectors based on relaxed finiteness criteria of the syntactic monoid of a language selecting prefixes of infinite sequences [31, 68]. Conversely, results by Merkle and Reimann show that adding just slight computational power to the selection strategies beyond finite automata, e.g. using a pushdown automaton with unary stack alphabet instead of a DFA, renders Agafonov's Theorem invalid [40].
Similarly, selection by finite automata has been extended, and analogues of Agafonov's Theorem proved, in settings other than selection from elements of the set Σ^ω, e.g. for shifts of finite type [16]. All of these results only consider normality rather than more general classes of distributions on finite strings.

Conversely, construction of normal sequences (as opposed to selecting normal sequences from other normal ones) has been investigated thoroughly for more than a hundred years [60, 20, 42, 65, 39, 51], including explicit construction of real numbers with normal expansion for any integer base b ≥ 2 [34, 55, 3], and real numbers with normal expansion in non-integer bases [64, 37]. Among this work, the result of most use to the present paper is the construction by Madritsch and Mance of generic sequences for any shift-invariant probability measure μ [38]; these are essentially sequences that are μ-distributed in the terminology of the present paper (see Definition 4).

In very recent work, Carton [16] proves that, for any Markov measure μ on Σ^ω induced by a pair (P, π) of a stochastic |Σ|×|Σ| matrix P and a stationary distribution π for P, any sequence selected from a μ-distributed sequence by a finite-state selector from a particular subset of μ-compatible selectors will be μ-distributed. Roughly, a finite-state selector is compatible if it can only read consecutive symbols of Σ with non-zero transition probability in P, and every state has only incoming transitions labeled by at most one symbol from Σ. In contrast, we consider the full set of finite-state selectors. Moreover, Carton's results are restricted to the case of finite alphabets.
Infinite streams are typically used to model situations where data elements arrive, no upper bound on the length of the stream is known a priori, and the focus is not on resource use as a function of the length of the stream; for example, infinite streams have been studied extensively in event-level differential privacy [26, 33], and in semantics of lazy programming languages such as
Haskell [50]. (Footnote: One possible reason for this is that only the short version (without proofs or explanation of techniques) of Agafonov's result [46] appeared in English as [1]; in contrast, the original longer paper in Russian [45] was published in a more obscure journal, and was never translated.) (Footnote: In fact, one of the strategies considered by Merkle and Reimann, which consists in computing the language { w·w^R | w ∈ Σ* }, where w^R is the reverse of w, can be computed by an arguably less expressive model of computation, namely a DFA(2), i.e. a two-way automaton with two heads [28].)

Selection of (substreams of) elements from infinite streams has been investigated from a practical perspective since the 1960s [62], and is typically performed by specialized stream processing languages, e.g. LUSTRE [18] and
ESTEREL [8], typically for use in reactive programming (e.g., for signal processing or circuit design). As they are designed for real-time processing, these languages typically allow only very constrained operations; any program in both
LUSTRE and
Esterel can be compiled to a finite-state transducer automaton (and a deterministic program selecting a subsequence from its input is hence a finite-state selector as in Agafonov's Theorem).

In typical algorithmic treatments of stream processing, one studies unordered, finite sequences of elements from a very large, or infinite, set [41]. The problems considered typically have strong constraints, e.g. that only a single pass over the stream is allowed and that each element can only be observed once, and often involve a sketch, a data structure that stores information about the elements seen in the stream and allows predefined queries to be answered. A classic example is estimating the frequency moments of the distribution of elements in the stream using sketches with low memory usage in both alphabet size and stream length [4, 30, 14]. Our work can be seen as a variation of streaming where the alphabet size may be infinite, the stream itself is infinite, and the distribution of interest is not merely a distribution on the set of elements, but also constrains the finite subsequences of elements in the stream; in this setting, our main result is that any constant-space sketch sampling an infinite stream in real time preserves the distribution of finite subsequences iff the distribution is induced by a Bernoulli distribution on the set of elements.
The main result has two directions: (I) proving that if μ-distributedness is preserved by selection by any DFA, then μ is necessarily induced by a Bernoulli distribution, and (II) proving that any μ induced by a Bernoulli distribution is preserved across selection by any DFA.

For (I), we prove the more general result that if μ is not induced by a Bernoulli distribution on Σ, selection by a particular Postnikova strategy (roughly, a Postnikova strategy selects an element of the sequence if and only if it follows a fixed finite word) will select a non-μ-distributed infinite sequence from a bespoke μ-distributed sequence. The Postnikova strategy contains prefixes of the form u · w ∈ Σ* for a fixed w chosen such that w · a ∈ Σ^{|w|+1} is a minimal witness string with μ(w · a) ≠ μ(w) · μ(a). Using basic constructions, we can then prove that the Postnikova strategy can be implemented by a DFA that simulates a sliding fixed-width window.

For (II), most of the modern methods of proving Agafonov's Theorem (e.g., [6, 5, 58]) are not immediately adaptable because they use methods that are particular to equidistributions on finite alphabets (e.g., lossless finite-state compressors [6] or automatic Kolmogorov complexity [58]), and we consider both Bernoulli distributions and infinite alphabets. Instead, we work along the general lines of Agafonov's original proof [46], which more heavily uses probabilistic reasoning.

The key insights in Agafonov's original proof were (i) that any strongly connected finite automaton (containing at least one accepting state) applied to a normal sequence must select (always, not just with probability 1) more than a constant fraction of elements from any sufficiently long finite substring of its input, and (ii) that selecting more than a constant fraction of sufficiently long substrings entails that each element of Σ must be selected with approximately equal probability, by the Law of Large Numbers.
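Direction (II) is easy to check empirically (our own illustration, not part of the proof): a sequence drawn i.i.d. from a Bernoulli distribution p over {a, b} is almost surely μ_p-distributed, and a constant-space selector whose state is simply the last letter read, and which selects the letter following each occurrence of "a", should preserve the single-letter frequencies.

```python
import random

random.seed(1)
p_a = 0.7  # assumed Bernoulli distribution for the illustration: p(a) = 0.7, p(b) = 0.3
alpha = random.choices("ab", weights=[p_a, 1 - p_a], k=200_000)

# Finite-state selector: the state is the last letter read, and the state "a"
# is accepting, so the selector picks exactly the letters following an "a".
selected = [alpha[i] for i in range(1, len(alpha)) if alpha[i - 1] == "a"]

freq = selected.count("a") / len(selected)
print(abs(freq - p_a) < 0.01)  # True: the selected letters have frequency close to p(a)
```

This matches the theorem's positive direction; the negative direction (I) shows that the same kind of selector breaks any non-Bernoulli μ.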
In Agafonov's original approach (for Σ = {0,1}), an appeal to the Strong Law of Large Numbers was used in conjunction with the product measure on the product topology on {0,1}^ω; this required reasoning about cylinder sets centered on sets A of finite strings, and to avoid "double-counting" the probabilities, these sets had to be prefix-free. We avoid this difficulty by using concentration bounds to tally the occurrences of elements a ∈ Σ in block decompositions of finite prefixes of α.

The proof that any DFA selects a μ-distributed infinite sequence from a μ-distributed infinite sequence then follows by observing (i) that any run of a DFA on an infinite sequence eventually reaches a strongly connected component C of the DFA that is recurrent (i.e., the run can never exit C), and (ii) that any such component induces an irreducible Markov chain, whence we can apply the Ergodic Theorem for Markov Chains to conclude that accepting states are reached infinitely often and with appropriate frequency.

The extension to infinite alphabets is surprisingly straightforward in most proofs: essentially, instead of using combinatorial estimates for finite sets, we have to ensure that series taken over infinite alphabets converge properly, but almost all instances involve series that (i) have non-negative elements, and (ii) are bounded above, whence the usual reasoning about absolutely convergent series can be employed. Similarly, the classic results for finite automata that we use need to be re-stated and re-proved in the case of infinite alphabets, but this in general turns out to be doable without too much legwork (e.g. Lemma 19). One caveat is that several important ancillary results have standard proofs that use combinatorial arguments on finite sets, and we thus need to provide alternative proofs using different methods.

Definition 1.
We assume a non-empty, possibly (countably) infinite, alphabet Σ and denote by λ the empty string; the sets of finite and right-infinite sequences of elements of Σ are denoted by Σ* and Σ^ω, respectively. Elements of Σ* are ranged over by v, w, ..., and elements of Σ^ω by α, β, .... If α = a_1 a_2 ··· ∈ Σ^ω and N is a positive integer, we denote by α|≤N the finite string a_1 a_2 ··· a_N.

Given v ∈ Σ* and w ∈ Σ* ∪ Σ^ω, we write v · w for the element of Σ* ∪ Σ^ω obtained by concatenation. For words v ∈ Σ* and w ∈ Σ* ∪ Σ^ω, v is said to be a prefix of w, written v ⪯ w, if there exists u ∈ Σ* ∪ Σ^ω such that w = v · u. If v ⪯ w and v ≠ w, v is said to be a proper prefix of w, written v ≺ w. For any w ∈ Σ*, the cylinder set of w, denoted [w], is the subset of Σ^ω defined by [w] = { α ∈ Σ^ω : α = w · β, β ∈ Σ^ω }, that is, the set of right-infinite sequences that have w as a prefix.

Definition 2.
Let Σ be a non-empty, possibly (countably) infinite, alphabet. A probability map (over Σ) is a map μ : Σ+ → [0,1] such that, for all positive integers n, the series ∑_{a_1···a_n ∈ Σ^n} μ(a_1 ··· a_n) is convergent with limit 1. Note that convergence implies absolute convergence here.

A probability map μ is said to be:

• induced by a Bernoulli distribution p : Σ → [0,1] if, for all positive integers n and all a_1, ..., a_n ∈ Σ, μ(a_1 ··· a_n) = ∏_{i=1}^n μ(a_i) = ∏_{i=1}^n p(a_i).

• invariant if, for all w ∈ Σ*, the series ∑_{a∈Σ} μ(w · a) and ∑_{a∈Σ} μ(a · w) are convergent with limit μ(w).

• (when Σ is finite) equidistributed if, for any w ∈ Σ^n, μ(w) = |Σ|^{−n}.

Observe that an equidistributed μ is also Bernoulli. For any alphabet Σ, any map p : Σ → [0,1] such that the series ∑_{a∈Σ} p(a) converges to 1 induces a probability map μ_p by setting μ_p(a_1 ··· a_n) ≜ ∏_{j=1}^n p(a_j). For finite alphabets Σ, this map is equidistributed iff p(a) = |Σ|^{−1} for every a ∈ Σ.

The expression "induced by a Bernoulli distribution" is justified by the fact that Bernoulli probability maps correspond directly to the measure of cylinders in Bernoulli shifts [59].

Proposition 1.
A probability map μ induced by a Bernoulli distribution is invariant.

Proof. For any w ∈ Σ*, ∑_{a∈Σ} μ(a · w) = ∑_{a∈Σ} μ(a)μ(w) = ∑_{a∈Σ} μ(w)μ(a) = ∑_{a∈Σ} μ(w · a), and ∑_{a∈Σ} μ(w)μ(a) = μ(w) ∑_{a∈Σ} μ(a) = μ(w).

We shall need probability maps to act as "measures" on (possibly infinite) sets of finite strings:

Definition 3.
Let Σ be a non-empty alphabet, let W ⊆ Σ*, and let μ be a probability map over Σ. If W = ∅, we define μ(W) = 0. If ∑_{w∈W} μ(w) converges, we define μ(W) = ∑_{w∈W} μ(w).

Observe that as μ(w) ≥ 0 for all w ∈ Σ*, if ∑_{w∈W} μ(w) converges, it is absolutely convergent (hence, we do not need to specify an ordering of W).

We are interested in the probability maps whose values can be realized as the limiting frequencies of finite words in right-infinite sequences over Σ.

Definition 4.
Let v = v_1 ··· v_N and w = w_1 ··· w_n be finite words over Σ. We denote by occ_w(v) the number of occurrences of w in v, that is, the quantity

occ_w(v) = |{ 1 ≤ j ≤ N + 1 − n : v_j v_{j+1} ··· v_{j+n−1} = w_1 w_2 ··· w_n }|.

Let μ be a probability map over Σ, and let α be a right-infinite sequence over Σ. If the limit lim_{N→∞} occ_w(α|≤N)/N exists and is equal to some real number f, we say that w occurs in α with limiting frequency f. If every w ∈ Σ+ occurs in α with limiting frequency μ(w), we say that α is μ-distributed.

Proposition 2.
Let μ be a probability map over Σ. If there exists a μ-distributed sequence, then μ is invariant.

Proof. Let μ be a probability map over Σ and α = a_1 a_2 ··· a μ-distributed sequence. We consider w = w_1 w_2 ··· w_k ∈ Σ^k and note that, for all N > 0:

|∑_{a∈Σ} occ_{w·a}(α|≤N) − occ_w(α|≤N)| ≤ 1.

Indeed, every occurrence of w as a_i a_{i+1} ··· a_{i+k−1} with i + k − 1 < N is followed by a (unique) letter a ∈ Σ, and is thus also an occurrence of w·a; hence the expressions occ_w(α|≤N) and ∑_{a∈Σ} occ_{w·a}(α|≤N) are equal unless a_{N−k+1} a_{N−k+2} ··· a_N = w, in which case their difference is equal to 1. Dividing by N, we obtain, for all N > 0:

|∑_{a∈Σ} occ_{w·a}(α|≤N)/N − occ_w(α|≤N)/N| ≤ 1/N.

(Footnote: In the literature on normal numbers, the word Bernoulli is sometimes used slightly differently; for example, Schnorr and Stimm [56] use the term "Bernoulli sequence" for sequences that are equidistributed in our terminology.)
We therefore obtain that:

|∑_{a∈Σ} occ_{w·a}(α|≤N)/N − μ(w)| ≤ |∑_{a∈Σ} occ_{w·a}(α|≤N)/N − occ_w(α|≤N)/N| + |occ_w(α|≤N)/N − μ(w)|.

Since both expressions on the right converge to 0, the left-hand side converges to zero, showing that

μ(w) = lim_{N→∞} ∑_{a∈Σ} occ_{w·a}(α|≤N)/N = ∑_{a∈Σ} lim_{N→∞} occ_{w·a}(α|≤N)/N = ∑_{a∈Σ} μ(w·a).

Similarly, for all N > 0:

|∑_{a∈Σ} occ_{a·w}(α|≤N) − occ_w(α|≤N)| ≤ 1,

by an argument similar to the one used above, noting that every occurrence of w as a_i a_{i+1} ··· a_{i+k−1} with i > 1 is also an occurrence of b·w for a (unique) b ∈ Σ, so that the two counts differ (by exactly 1) if and only if a_1 a_2 ··· a_k = w. We then conclude that μ(w) = ∑_{a∈Σ} μ(a·w) in the same way.

Observe that an infinite sequence α is normal in the usual sense iff it is μ-distributed for (the unique) equidistributed probability map μ over Σ. Also observe that not all probability maps μ admit a μ-distributed sequence.

Example 1.
An example of a probability map that is not Bernoulli, but such that there is at least one μ-distributed right-infinite sequence, is the map μ over Σ = {0,1} defined by μ(w) = 1/2 if w does not contain any of the strings 00 or 11 (note that for each positive integer n, there are exactly two such strings of length n, namely 0101··· and 1010···), and μ(w) = 0 otherwise. Observe that the right-infinite sequence 010101··· is μ-distributed.

In contrast to all previous work on Agafonov's Theorem, we allow countably infinite alphabets Σ. Alphabets of larger cardinality do not in general have probability measures realizable by considering limiting frequencies of elements of Σ^ω, simply because most elements of Σ cannot occur at all in a single element of Σ^ω.

One reason why previous generalizations of Agafonov's Theorem have not considered infinite alphabets is that there can be no equidistribution on a countably infinite set. However, there are Bernoulli measures μ on countably infinite alphabets Σ and μ-distributed infinite sequences over Σ.

Example 2.
An example of a countably infinite alphabet with a Bernoulli measure is Σ = N and p(n) = 6/(πn)² (note that we have ∑_{n∈Σ} p(n) = 1). In general, any convergent series ∑_{n=1}^∞ a_n where every a_n is non-negative (and at least one a_n is positive) induces a Bernoulli distribution on N by setting p(n) = a_n / ∑_{m=1}^∞ a_m. Each such Bernoulli distribution p induces an invariant probability map μ_p, and by a result of Madritsch and Mance [38], there exists a μ_p-distributed sequence.

Remark 1. As we consider possibly infinite alphabets, we often have to consider infinite series instead of finite sums in the proofs. In most cases, these series will have elements that are known to be non-negative, and the partial sums will be bounded above, whence the series will be absolutely convergent and the order of summation can thus be changed freely. A trivial example of use is to consider some B ⊆ Σ and note that ∑_{a∈B} p(a) = 1 − ∑_{a∈Σ\B} p(a) (as ∑_{a∈B} p(a) ≤ 1, ∑_{a∈Σ\B} p(a) ≤ 1, and p(a) ≥ 0, the two series are absolutely convergent, and ∑_{a∈B} p(a) + ∑_{a∈Σ\B} p(a) = ∑_{a∈Σ} p(a) = 1).

2.1 Strategies, selectors and DFAs

2.2 Strategies

Definition 5.
Let Σ be an alphabet. A strategy S over Σ is a subset S ⊆ Σ*.

Given a strategy S and α = a_1 a_2 ··· ∈ Σ^ω, we define the sequence selected by S, denoted S[α], as follows: if i_1, i_2, ..., i_k, ... is the (increasing) sequence of indices i such that the prefix a_1 ··· a_{i−1} of α lies in S (where the prefix for i = 1 is λ), then S[α] is the (possibly empty, finite or infinite) sequence a_{i_1} a_{i_2} a_{i_3} ···.
Definition 6. A finite-state selector over Σ is a DFA A = (Q, δ, q_s, Q_F), where Q is the set of states, q_s is the unique start state, Q_F is the set of accepting states, and δ : Q × Σ → Q is the transition function.

A DFA is strongly connected if its underlying directed graph (states are nodes, transitions are edges) is strongly connected.

Denote by L(A) the language accepted by the automaton. If α = a_1 a_2 ··· is a finite or right-infinite sequence over Σ, the subsequence selected by A is the (possibly empty) sequence of letters a_n such that the prefix a_1 ··· a_{n−1} ∈ L(A), that is, the automaton, when started on the finite word a_1 ··· a_{n−1} in state q_s, ends in an accepting state after having read the entire word.

The run of A on input α is the sequence of states visited when A is applied to α from the starting state. For (q, w) = (q, w_1 ··· w_n) ∈ Q × Σ*, we use the notation δ*(q, w) to denote the state δ(··· δ(δ(q, w_1), w_2), ..., w_n), that is, the state reached by starting from q and following the (unique) path induced by w.

Observe that a DFA may select an empty, finite or infinite sequence when run on a right-infinite word.

Definition 7.
Let A be a DFA. A strongly connected component C in (the underlying directed graph of) A is said to be recurrent if, for every state p in C and every a ∈ Σ, δ(p, a) is a state in C (i.e., once a run of A on some infinite word reaches a state in C, the run cannot leave C).

Definition 8.
Let A = (Q, Σ, δ, q_0, F) be a connected DFA. For all q ∈ Q, we denote by A_q the automaton (Q, Σ, δ, q, F), i.e. the automaton in which the state q is chosen as the initial state.

Definition 9. Let A = (Q, Σ, δ, q_0, F) be a connected DFA, and let q ∈ Q. Let α = a_1 a_2 ··· be a right-infinite sequence over Σ. We denote by A_q[α] the subsequence of α selected by A_q, that is, the sequence of letters a_i such that a_1 ··· a_{i−1} ∈ L(A_q).
Corollary 3.1. Let μ be a probability map induced by a positive Bernoulli distribution on Σ, let A be a DFA over Σ, and let α ∈ Σ^ω be μ-distributed. Then, the run of A on α eventually reaches a strongly connected recurrent component of A.

Proof. Let C be the strongly connected component and w the word obtained from Lemma 3. As α is μ-distributed, w appears in α, so write α = v w α′, and let q be the state of A reached after |v| transitions in the run of A on α. Then, after at most a further |w| transitions, the run reaches a state in C.

Corollary 3.1 ensures that we can assume wlog. that the finite-state selectors we treat are strongly connected. Note that the corollary does not imply that the strongly connected recurrent component contains an accepting state (indeed, the automaton may have an empty set of accepting states). Thus, some automata do not always select infinite sequences, and additional assumptions are needed if this is desirable (this is discussed in Remark 2 below). However, this is not an issue for our main result, which states that the output of a selector applied to a normal sequence is again normal as long as it is infinite.

Theorem 4.
Let Σ be a non-empty (finite or infinite) alphabet and μ a probability map such that there exists at least one α ∈ Σ^ω that is μ-distributed. Then, the following statements are equivalent:

1. μ is induced by a positive Bernoulli distribution p on Σ, i.e. for every a_1 ··· a_n ∈ Σ*, μ(a_1 ··· a_n) = ∏_{i=1}^n p(a_i), and p(a) > 0 for all a ∈ Σ;

2. (Postnikova property) for every finite word w ∈ Σ* and μ-distributed sequence α ∈ Σ^ω, if the sequence selected from α by the Postnikova strategy S_w = { u ∈ Σ* | ∃v s.t. u = v · w } is infinite, then it is μ-distributed;

3. (Agafonov property) for every DFA A over Σ and every μ-distributed sequence α ∈ Σ^ω, if the sequence selected from α by A is infinite, then it is μ-distributed.

Proof. For the implication 1 ⇒
3, Corollary 3.1 yields that any run of a finite-state selector on a μ-distributed sequence eventually reaches a strongly connected recurrent component; the restriction of any DFA to the state set of one of its recurrent components is also a DFA, and the result now follows by Lemma 15. The implication 3 ⇒ 2 holds because every Postnikova strategy can be implemented by a finite-state selector, and the implication 2 ⇒ 1 follows from Lemma 5 below.

(Footnote: The result in [56] is stated for finite alphabets, but the proof method carries through for infinite alphabets as well. We provide a proof in Appendix A.)

Remark 2. Theorem 4 addresses the case where a DFA or Postnikova strategy selects an infinite sequence from a μ-distributed sequence. If one wants to restrict attention to automata that always select an infinite subsequence from any μ-distributed sequence, extra conditions sometimes occur in the literature, e.g. that every cycle in the (underlying graph of the) DFA contains an accepting state [6], ensuring that an infinite subsequence is selected from any (not just μ-distributed) sequence. Another condition that ensures that an infinite subsequence is selected from any μ-distributed sequence is to consider only DFAs such that every strongly connected recurrent component contains at least one accepting state. In this case, Corollary 3.1 ensures that any run of the automaton on a μ-distributed sequence will reach a strongly connected recurrent component, and Lemma 11 below then ensures that the DFA selects an infinite subsequence from α.

μ-distributedness for non-Bernoulli measures

We first prove that if μ is a probability map such that any DFA selects a μ-distributed right-infinite sequence from any μ-distributed right-infinite sequence, then μ must be Bernoulli. This is an immediate consequence of a stronger property proved in Lemma 5 below.

The idea of the proof is that if μ is not Bernoulli, there is a word a_1 ··· a_k such that μ(a_1 ··· a_{k−1}) = ∏_{j=1}^{k−1} μ(a_j), but μ(a_1 ··· a_{k−1} a_k) ≠ μ(a_1 ··· a_{k−1}) · μ(a_k).
One can then construct a finite-state selector that acts like a "sliding window" of size k − 1, that is, remembers the last k − 1 letters scanned and accepts iff these are a_1 ⋯ a_{k−1}. This selector will select every letter following an occurrence of a_1 ⋯ a_{k−1}; after a prefix of length N of a right-infinite sequence has been scanned, approximately N · µ(a_1 ⋯ a_{k−1}) letters have been selected, and approximately N · µ(a_1 ⋯ a_{k−1} a_k) of these will be the symbol a_k. But then the limiting frequency of a_k in the sequence selected will be µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) ≠ µ(a_k), and the result follows.

Lemma 5.
Let µ : Σ^* → [0, 1] be a probability map. If µ is not induced by a Bernoulli distribution on Σ, there exists a finite word w ∈ Σ^* such that if α ∈ Σ^ω is µ-distributed, then the Postnikova strategy S_w = { u ∈ Σ^* | ∃v s.t. u = vw } selects from α an infinite sequence β ∈ Σ^ω that is not µ-distributed.

Proof. If no element of Σ^ω is µ-distributed, the lemma is vacuously true. Hence, assume that there is at least one α ∈ Σ^ω that is µ-distributed. If |Σ| = 1, then there is exactly one probability map on Σ^*, namely the one that assigns probability 1 to the unique element of Σ^k for every k ≥ 1; this probability map is clearly Bernoulli, and the lemma is thus vacuously true. Hence, in the remainder of the proof, assume that |Σ| ≥ 2.

Assume that µ is not induced by a Bernoulli distribution on Σ. Then there are k and a word a_1 ⋯ a_{k−1} a_k ∈ Σ^k such that µ(a_1 ⋯ a_{k−1} a_k) ≠ ∏_{j=1}^k µ(a_j). Observe that k = 1 is impossible (as µ(a_1) = ∏_{j=1}^1 µ(a_j)), and thus we must have k ≥ 2. Assume wlog. that k is minimal among such k, and hence that µ(a_1 ⋯ a_{k−1}) = ∏_{j=1}^{k−1} µ(a_j), and note that this implies µ(a_1 ⋯ a_{k−1} a_k) ≠ µ(a_1 ⋯ a_{k−1}) · µ(a_k).

Assume for contradiction that µ(a_1 ⋯ a_{k−1}) = 0. Then µ(a_i) = 0 for at least one a_i, and thus µ(a_1 ⋯ a_{k−1} a_k) = 0, because the fact that there is at least one µ-distributed right-infinite sequence entails that µ(a_1 ⋯ a_{k−1} a_k) > 0 implies µ(a_i) ≥ µ(a_1 ⋯ a_{k−1} a_k) > 0. But this is a contradiction, as we would then have µ(a_1 ⋯ a_{k−1} a_k) = 0 = ∏_{j=1}^k µ(a_j). Thus, µ(a_1 ⋯ a_{k−1}) > 0.
As µ(a_1 ⋯ a_{k−1} a_k) ≠ µ(a_1 ⋯ a_{k−1}) · µ(a_k), µ(a_1 ⋯ a_{k−1}) > 0, and µ(a_1 ⋯ a_{k−1} a_k) ≤ µ(a_1 ⋯ a_{k−1}) (because µ is invariant by Proposition 2), there is a real number γ with 0 < γ < 1 such that:

    | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − µ(a_k) | > γ

We now consider the Postnikova strategy S_w with w = a_1 ⋯ a_{k−1}, i.e. the strategy that selects exactly the symbols following the occurrences of a_1 ⋯ a_{k−1} in α.

Let α ∈ Σ^ω be µ-distributed. Then, for every ε > 0, there is an N_ε > 0 such that for all n > N_ε we have (writing occ_v(·) for the number of occurrences of a word v):

    | occ_{a_1⋯a_{k−1}}(α_{≤n})/n − µ(a_1 ⋯ a_{k−1}) | < ε

Hence

    nµ(a_1 ⋯ a_{k−1}) − nε ≤ occ_{a_1⋯a_{k−1}}(α_{≤n}) ≤ nµ(a_1 ⋯ a_{k−1}) + nε   (1)

and

    nµ(a_1 ⋯ a_{k−1} a_k) − nε ≤ occ_{a_1⋯a_{k−1}a_k}(α_{≤n}) ≤ nµ(a_1 ⋯ a_{k−1} a_k) + nε   (2)

As µ(a_1 ⋯ a_{k−1}) > 0 and S_w selects the symbol after each occurrence of a_1 ⋯ a_{k−1}, S_w selects an infinite sequence β from α. Let β^(n) ∈ Σ^* be the finite sequence selected by S_w from α_{≤n}. Observe that we have |β^(n)| = occ_{a_1⋯a_{k−1}}(α_{≤n}) and occ_{a_k}(β^(n)) = occ_{a_1⋯a_{k−1}a_k}(α_{≤n}). The fraction of occurrences occ_{a_k}(β^(n))/|β^(n)| of a_k in β^(n) thus satisfies:

    occ_{a_k}(β^(n))/|β^(n)| = occ_{a_1⋯a_{k−1}a_k}(α_{≤n})/occ_{a_1⋯a_{k−1}}(α_{≤n}) = (occ_{a_1⋯a_{k−1}a_k}(α_{≤n})/n) · (n/occ_{a_1⋯a_{k−1}}(α_{≤n}))

and hence, by (1) and (2), for all n > N_ε:

    (µ(a_1 ⋯ a_{k−1} a_k) − ε)/(µ(a_1 ⋯ a_{k−1}) + ε) ≤ occ_{a_k}(β^(n))/|β^(n)| ≤ (µ(a_1 ⋯ a_{k−1} a_k) + ε)/(µ(a_1 ⋯ a_{k−1}) − ε)   (3)

Consider an arbitrary δ with 0 < δ < γ/2.
By (3), for all sufficiently small ε we have, for all n > N_ε,

    | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − occ_{a_k}(β^(n))/|β^(n)| | < δ

and thus for all n > N_ε:

    γ < | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − µ(a_k) |
      ≤ | µ(a_1 ⋯ a_{k−1} a_k)/µ(a_1 ⋯ a_{k−1}) − occ_{a_k}(β^(n))/|β^(n)| | + | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) |
      < δ + | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) |
      < γ/2 + | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) |

whence:

    | occ_{a_k}(β^(n))/|β^(n)| − µ(a_k) | > γ/2

and as the sequence (β^(n))_{n∈ℕ} consists of prefixes of the sequence S_w[α] selected by S_w from α, and is eventually increasing, the frequency of occurrences of a_k differs infinitely often from µ(a_k) by at least γ/2; hence S_w[α] cannot be µ-distributed.

Lemma 5 shows that if a probability map is not induced by a Bernoulli distribution on Σ, some Postnikova strategy will select a non-µ-distributed sequence from any µ-distributed sequence. In case µ is induced by a Bernoulli distribution, but not a positive Bernoulli distribution, we can show the weaker result that there will be a Postnikova strategy that selects a non-µ-distributed sequence from some µ-distributed sequences (and this is sufficient for our main theorem).

Lemma 6.
Let µ : Σ^* → [0, 1] be a probability map induced by a Bernoulli distribution on Σ that is not positive. Then there exists a finite word w ∈ Σ^* and a µ-distributed α ∈ Σ^ω such that the Postnikova strategy S_w = { u ∈ Σ^* | ∃v s.t. u = vw } selects from α an infinite sequence that is not µ-distributed.

Proof. As µ is not positive, pick b ∈ Σ such that µ(b) = 0, and let Γ ⊆ Σ be a maximal subset such that the restriction of µ to Γ is a positive Bernoulli distribution (observe that Γ is non-empty because µ is a probability map, and ∑_{a∈Σ} µ(a) = 1 thus implies µ(a) > 0 for some a ∈ Σ). By [38] there exists a µ-distributed infinite sequence β ∈ Γ^ω; notice that β contains no occurrences of b. Let α ∈ Σ^ω be obtained from β by inserting the string bb at exponentially increasing positions 2, 4, 8, 16, …. Then α is µ-distributed because (i) every v ∈ Γ^* occurs with the same limiting frequency as in β, and (ii) every v ∈ Σ^* that contains an element of Σ∖Γ occurs in α with limiting frequency 0. Set w = b; then the Postnikova strategy S_w = { u ∈ Σ^* | ∃v s.t. u = vw } selects from α a sequence β′ = S_w[α] such that, for every n > 0, occ_b(β′_{≤n}) ≥ n/2 − 1. Thus, the limiting frequency of b in β′ is not 0, and hence is not µ(b), proving that β′ is not µ-distributed.

Lemma 7.
Let w ∈ Σ^*. The Postnikova strategy { u ∈ Σ^* | ∃v s.t. u = vw } is computable by a strongly connected DFA over Σ.

Proof. Note that the alphabet can possibly be infinite in the following proof. We write m for the length of the word w, and w_1, w_2, …, w_m for the letters of w. We design a finite-state selector M_w with exactly 2^m states which will select a letter of the input if and only if it is preceded by the word w. Let M_w = ({0,1}^m, δ, q_s, Q_F) be defined as follows:

• {0,1}^m = { (b_1, b_2, …, b_m) : b_i ∈ {0,1} } is the set of binary sequences of length m; such a sequence represents which prefixes of w currently match: b_j = 1 if and only if the previous j letters of the input coincide with the first j letters of w;

• the initial state q_s is chosen to be the sequence (0, 0, …, 0) ∈ {0,1}^m;

• the set of accepting states Q_F is equal to the set of sequences { (b_1, b_2, …, b_m) ∈ {0,1}^m : b_m = 1 };

• the transition function δ is defined as δ(b_1, b_2, …, b_m; a) = (c_1, c_2, …, c_m) where, for j ≥ 2, c_j = 1 if and only if b_{j−1} = 1 and a = w_j, and c_1 = 1 if and only if a = w_1.

The fact that this automaton computes the Postnikova strategy is clear from the definition. We now show it is strongly connected by showing that any state (b_1, b_2, …, b_m) is reachable from an arbitrary state. For this, we consider a word u_{b_1,b_2,…,b_m} = u_1 ⋯ u_m defined by u_i = w_i if and only if b_i = 1 (and thus u_i ≠ w_i whenever b_i = 0). We then claim that the automaton, starting from any state c ∈ {0,1}^m, reaches the state (b_1, b_2, …, b_m) when given the word u_{b_1,b_2,…,b_m} as input.

(Footnote to the proof of Lemma 6: the key observation here is that since the ‘bb’s are inserted at exponentially increasing positions, the frequency of occurrence of all other strings is decreased by a very small (and quickly decaying) factor.)
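The construction in the proof of Lemma 7 can be made concrete; the following is an illustrative sketch (the function names and the sample binary alphabet are ours, not the paper's), representing states as bit-vectors (b_1, …, b_m) and checking by brute force that M_w selects exactly the symbols following occurrences of w:

```python
# Sketch of the automaton M_w from the proof of Lemma 7, over a concrete
# binary alphabet for illustration. A state is a tuple (b_1, ..., b_m) with
# b_j = 1 iff the last j letters read coincide with the first j letters of w.

def make_selector(w):
    m = len(w)

    def step(state, a):
        # c_1 = 1 iff a = w_1; for j >= 2, c_j = 1 iff b_{j-1} = 1 and a = w_j
        return tuple(
            (1 if a == w[0] else 0) if j == 0
            else (1 if state[j - 1] == 1 and a == w[j] else 0)
            for j in range(m)
        )

    def select(alpha):
        # A symbol is selected iff the state reached *before* reading it is
        # accepting (b_m = 1), i.e. the m previously read letters were w.
        state, out = (0,) * m, []
        for a in alpha:
            if state[m - 1] == 1:
                out.append(a)
            state = step(state, a)
        return "".join(out)

    return select

select = make_selector("01")
alpha = "0110100110010110"
# Defining property of the Postnikova strategy S_w: the selected word
# consists of exactly the symbols following occurrences of w.
expected = "".join(alpha[i] for i in range(2, len(alpha)) if alpha[i-2:i] == "01")
assert select(alpha) == expected
```

Running the same check on longer pseudorandom inputs and comparing symbol frequencies before and after selection gives a quick empirical sanity check of the Agafonov property for Bernoulli-distributed inputs.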
Finite-state selectors preserve µ-distributedness for Bernoulli measures

The sequence of auxiliary results of this section follows the general lines of Agafonov's original proof in Russian for the case
Σ = {0, 1} [46], but with multiple proofs needing more careful analysis and adapted techniques.

Definition 10.
Let Σ be an alphabet, α = x_1 x_2 ⋯ x_n ⋯ ∈ Σ^ω, and let n be a positive integer. The n-block decomposition of α is the sequence (α^(n,r))_{r≥1} where α^(n,r) = x_{(r−1)n+1} ⋯ x_{rn} ∈ Σ^n. Thus, α^(n,1) is the string of the first n symbols of α, α^(n,2) is the string of the next n symbols, and so forth.

Definition 11.
Let µ be a probability map over Σ and α = x_1 x_2 ⋯ x_n ⋯ ∈ Σ^ω. We say that α is µ-block-distributed if, for each n > 0 and every w ∈ Σ^n, the n-block decomposition (α^(n,r))_{r>0} of α satisfies:

    lim_{k→∞} |{ i ≤ k : α^(n,i) = w }|/k = µ(w)

For finite alphabets and the special case of p being an equidistribution on Σ, it is straightforward to prove that the properties of being µ_p-distributed and µ_p-block-distributed are equivalent [43, 19, 49]. For the present paper, we only use that µ_p-distributedness implies µ_p-block-distributedness, which follows by tedious, but standard, counting arguments on sufficiently large finite prefixes α_{≤N} of α, using the same reasoning as the original proof by Niven and Zuckerman for finite alphabets and normality [43], mutatis mutandis:

Proposition 8.
Let µ_p be a probability map induced by a Bernoulli distribution p on alphabet Σ. If α ∈ Σ^ω is µ_p-distributed, it is µ_p-block-distributed.

We now prove that finite-state selectors can be composed appropriately; this will later be a key ingredient in reducing the problem of selecting finite strings w ∈ Σ^* with frequency µ_p(w) to the problem of selecting single symbols a ∈ Σ with frequency p(a).

Proposition 9 (Finite-state selectors are compositional). Let A and B be DFAs over the same alphabet. Then there is a DFA C such that, for each sequence w, C[w] = B[A[w]]. If A and B are both strongly connected and A contains at least one accepting state, C can be chosen to be strongly connected.

Proof. Let A = (Q_A, Σ, δ_A, q_A, F_A) and B = (Q_B, Σ, δ_B, q_B, F_B). Define Q_C = Q_A × Q_B, and set q_C = (q_A, q_B) and F_C = F_A × F_B. For each q_B ∈ Q_B, define the set D_{q_B} = { (q, q_B) : q ∈ Q_A } ⊆ Q_C. Observe that Q_C = ⋃_{q_B∈Q_B} D_{q_B} and that for q_B, r_B ∈ Q_B with q_B ≠ r_B we have D_{q_B} ∩ D_{r_B} = ∅, and thus { D_{q_B} : q_B ∈ Q_B } is a partitioning of Q_C.
Hence, the transition relation δ_C of C may be defined by defining it separately on each subset D_{q_B}:

    δ_C((q, q_B), a) = (r, q_B)  if q ∉ F_A and δ_A(q, a) = r
    δ_C((q, q_B), a) = (r, r_B)  if q ∈ F_A, δ_A(q, a) = r, and δ_B(q_B, a) = r_B

Intuitively, as C processes its input, it freezes the current state q_B of B (the freezing is represented by staying within D_{q_B}) and simulates A until an accepting state of A is reached (i.e. just before A would select the next symbol); on the next transition, C unfreezes the current state of B, moves to the next state r_B of B, and then freezes it again and continues with a simulation of A.

Observe that a symbol is picked out by C iff the state is an element of F_C = F_A × F_B, iff the symbol is the next symbol read after the simulation of A reaches an accepting state of A while the current frozen state of B is an accepting state of B.

By construction, C is strongly connected if both A and B are: for any pair of states (q_A, q_B), (q_A′, q_B′) ∈ Q_C, strong connectivity of B implies that there is a directed path from q_B to q_B′ in B. Let q_B, q_{B,1}, q_{B,2}, …, q_{B,k} be the states along this path. Strong connectivity of A and the assumption that there is some q_F ∈ F_A imply that there is a directed path from (q_A, q_B) to (q_F, q_B) in C, and by definition of δ_C, there is a transition in C from (q_F, q_B) to a state of the form (q, q_{B,1}). A straightforward induction on k now completes the proof.

The following shows that, to prove that the property of being µ_p-distributed is preserved under finite-state selection, it suffices to prove that the limiting frequency of each a ∈ Σ exists and equals p(a).

Lemma 10.
Let µ_p be a probability map induced by a Bernoulli distribution p on Σ, and let α ∈ Σ^ω be µ_p-distributed. The following are equivalent:

• For all strongly connected DFAs A, if A[α] is infinite, then A[α] is µ_p-distributed.

• For all strongly connected DFAs A and all a ∈ Σ, if A[α] is infinite, then the limiting frequency of a in A[α] exists and equals p(a).

Proof. If, for all A such that A[α] is infinite, A[α] is µ_p-distributed, then in particular the limiting frequency of a in A[α] exists and is equal to p(a) for all A.

Conversely, suppose that, for all strongly connected DFAs A and all a ∈ Σ, if A[α] is infinite, then the limiting frequency of a in A[α] exists and equals p(a). Let A be such a DFA. If A has no accepting states, there is nothing to prove, so assume that A has at least one accepting state.

We will prove by induction on k ≥ 0 that the limiting frequency of every v_1 ⋯ v_k v_{k+1} ∈ Σ^{k+1} exists and equals µ_p(v_1 ⋯ v_k v_{k+1}).

• k = 0: This is the supposition.

• k ≥ 1: Suppose that the result has been proved for k − 1. Let v_1 ⋯ v_k ∈ Σ^k; by the induction hypothesis, the limiting frequency of v_1 ⋯ v_k in A[α] is µ_p(v_1 ⋯ v_k). We claim that there is a strongly connected DFA B that, from any sequence, selects the symbol after each occurrence of v_1 ⋯ v_k. To see that such a DFA exists, let there be a state for each element of Σ^k and think of the state as the current length-k string in a "sliding window" that moves over the input one symbol at a time; when the window is moved one step, the DFA transits to the state representing the new length-k string in the window, i.e. for any a ∈ Σ, from the state representing the word w_1 ⋯ w_k there is a transition to the state representing w_2 ⋯ w_k a; thus every state is reachable from every other state in at most k transitions.
The unique final state of B is the state representing v_1 ⋯ v_k; the start state of B can be chosen to be any state representing a string w_1 ⋯ w_k such that there are exactly k transitions to the final state; for example, let a ∈ Σ∖{v_1}. Then from the state representing a^k, the final state cannot be reached in k − 1 or fewer steps, but every state is reachable in k steps.

By Proposition 9, there is a strongly connected DFA C such that C[w] = B[A[w]] for all w ∈ Σ^*.

For any a ∈ Σ and any sufficiently large positive integer N, we have:

    occ_a(C[α_{≤N}])/|C[α_{≤N}]| = occ_a(B[A[α_{≤N}]])/|B[A[α_{≤N}]]| = occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}])

By the induction hypothesis, for every ε > 0 we have, for all sufficiently large N, that |occ_a(C[α_{≤N}])/|C[α_{≤N}]| − p(a)| < ε, and hence:

    | occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}]) − p(a) | < ε   (4)

But for all sufficiently large N, the induction hypothesis also furnishes that:

    | occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]| − µ_p(v_1 ⋯ v_k) | < ε   (5)

But as:

    occ_{v_1⋯v_k a}(A[α_{≤N}])/|A[α_{≤N}]| = (occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}])) · (occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]|)

Equation (4) and Equation (5) thus yield:

    | occ_{v_1⋯v_k a}(A[α_{≤N}])/|A[α_{≤N}]| − µ_p(v_1 ⋯ v_k a) |
     = | (occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}])) · (occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]|) − µ_p(v_1 ⋯ v_k) p(a) |
     < ε² + ε ( occ_{v_1⋯v_k a}(A[α_{≤N}])/occ_{v_1⋯v_k}(A[α_{≤N}]) + occ_{v_1⋯v_k}(A[α_{≤N}])/|A[α_{≤N}]| )
     ≤ ε² + 2ε

Hence, for all a ∈ Σ, the limiting frequency of v_1
⋯ v_k a in A[α] exists and equals µ_p(v_1 ⋯ v_k a), as desired.

µ_p-distributedness under finite-state selection

By Lemma 10 we may restrict our attention to proving that the frequency of single symbols from Σ is preserved under selection by DFAs. The strategy will be to consider an arbitrary strongly connected DFA A, split the set of finite words Σ^* into multiple classes that depend on the selection behaviour of A, and use a combination of concentration bounds and basic Markov chain theory applied to these classes to obtain upper and lower bounds on the frequency with which A selects each symbol from α.

Definition 12. Let A = (Q, Σ, δ, q_s, F) be a strongly connected DFA. For all probability distributions p : Σ → [0, 1], b ∈ [0, 1], n ∈ ℕ and ε > 0, we define sets D_n^p(b, ε), E_n(b, q) and G_n(b, ε, q) as follows:

    D_n^p(b, ε, q) = { w ∈ Σ^n : |A_q[w]| > bn and sup_{a∈Σ} |occ_a(A_q[w])/|A_q[w]| − p(a)| < ε }   (6)
    D_n^p(b, ε) = ⋂_{q∈Q} D_n^p(b, ε, q)   (7)
    E_n(b, q) = { w ∈ Σ^n : |A_q[w]| ≤ bn }   (8)
    E_n(b) = ⋃_{q∈Q} E_n(b, q)   (9)
    G_n(b, ε, q) = { w ∈ Σ^n : |A_q[w]| > bn and sup_{a∈Σ} |occ_a(A_q[w])/|A_q[w]| − p(a)| ≥ ε }   (10)
    G_n(b, ε) = ⋃_{q∈Q} G_n(b, ε, q)   (11)

Observe that, for all b, n, ε, Σ^n = E_n(b) ∪ D_n^p(b, ε) ∪ G_n(b, ε) (and also note that E_n(b) and G_n(b, ε) are not necessarily disjoint).

Lemma 11.
Let p be a positive Bernoulli distribution on Σ, and let S = (Q, Σ, δ, q_s, Q_F) be a strongly connected finite automaton with Q_F ≠ ∅. Then there exists a real number c > 0 such that for all real numbers ε > 0 we have lim_{n→∞} µ_p(E_n(c − ε)) = 0.

Proof. S induces a stochastic |Q| × |Q| matrix P by setting

    P_{ij} = ∑_{a∈Σ} p(a) · 1_{δ(i,a)=j}.

Observe that if Σ is infinite, the fact that (i) p(a) · 1_{δ(i,a)=j} ≥ 0, (ii) p(a) · 1_{δ(i,a)=j} ≤ p(a), and (iii) ∑_{a∈Σ} p(a) = 1 entails that the series ∑_{a∈Σ} p(a) · 1_{δ(i,a)=j} is absolutely convergent. Note also that P_{ij} = 0 iff there are no transitions from i to j in Q on a symbol a ∈ Σ with p(a) > 0. As S is strongly connected, there exists a path from state i to state j for each i, j ∈ Q. Let v be the word along this path; as p(a) > 0 for all a ∈ Σ, we have µ_p(v) > 0, whence for each i, j there is an integer n_{ij} such that (P^{n_{ij}})_{ij} > 0, that is, P (and its associated Markov chain) is irreducible. As all states of a finite Markov chain with irreducible transition matrix are positive recurrent, standard results (see, e.g., [57, Thm. 54]) yield that there is a unique positive stationary distribution π : Q → [0, 1] (s.t., for all i ∈ Q, we have π(i) > 0 and π(i) = ∑_{j∈Q} π(j)P_{ji}). Furthermore, the expected return time M_i to state i satisfies M_i = 1/π(i).

Let (X_n)_{n≥0} = (X_0, X_1, X_2, …) be the Markov chain with transition matrix P and some initial distribution λ on the states. Consider, for each i ∈ Q, the stochastic variable V_i, where

    V_i(n) = ∑_{k=0}^{n−1} 1_{X_k = i},

that is, V_i(n) is the number of times state i is visited in the first n elements of the Markov chain. As P is irreducible, the Ergodic Theorem for Markov chains (see, e.g., [57, Thm.
75]) yields that, independently of λ, we have for arbitrary ε > 0:

    lim_{n→∞} Pr( |V_i(n)/n − π(i)| ≥ ε ) = lim_{n→∞} Pr( |V_i(n)/n − 1/M_i| ≥ ε ) = 0   (12)

Let n be a positive integer, let w = w_1 ⋯ w_n ∈ Σ^n, and let q^{S_j}(w) = q^w_0 q^w_1 ⋯ q^w_n be the sequence of states visited in the run of S_j on w (i.e., q^w_0 = j). The probability of observing a state sequence q_0 ⋯ q_n in the Markov chain is (when the initial distribution λ has λ(q_0) = λ(j) = 1):

    Pr(q_0 ⋯ q_n) = ∏_{i=0}^{n−1} ∑_{a∈Σ} p(a) 1_{δ(q_i,a)=q_{i+1}} = ∑_{a_1,…,a_n∈Σ} p(a_1) 1_{δ(q_0,a_1)=q_1} ⋯ p(a_n) 1_{δ(q_{n−1},a_n)=q_n}

where we have used the fact that the Cauchy product of two absolutely convergent series is convergent.

As for all integers i with 1 ≤ i ≤ n we have δ(q^w_{i−1}, w_i) = q^w_i, we obtain:

    ∑_{a_1,…,a_n∈Σ} p(a_1) 1_{δ(q_0,a_1)=q_1} ⋯ p(a_n) 1_{δ(q_{n−1},a_n)=q_n} = µ_p({ a_1 ⋯ a_n : q^{S_j}(a_1 ⋯ a_n) = q_0 ⋯ q_n })

and hence

    Pr(q_0 ⋯ q_n) = µ_p({ w : q^{S_j}(w) = q_0 ⋯ q_n })   (13)

Thus, as S is deterministic and every w_1 ⋯ w_n ∈ Σ^n occurs along exactly one path of states in S, we have:

    Pr( |V_i(n)/n − π(i)| ≥ ε ) = ∑_{q_0 q_1 ⋯ q_n ∈ Q^{n+1}} Pr(q_0 ⋯ q_n) 1_{|V_i(n)/n − π(i)| ≥ ε}
     = ∑_{q_0 q_1 ⋯ q_n ∈ Q^{n+1}} µ_p({ w_1 ⋯ w_n : q^{S_j}(w_1 ⋯ w_n) = q_0 ⋯ q_n }) 1_{|V_i(n)/n − π(i)| ≥ ε}
     = µ_p({ w : |V_i(n)/n − π(i)| ≥ ε })   (14)

Hence, by Equation (12) and Equation (14), we have

    lim_{n→∞} µ_p({ w : |V_i(n)/n − π(i)| ≥ ε }) = 0   (15)

If q^{S_j}(w) = q_0 ⋯ q_n and q_k ∈ Q_F for some k with 0 ≤ k ≤ n − 1, then S_j selects w_{k+1}.
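As a numerical aside (not part of the proof), the induced stochastic matrix P and its stationary distribution π can be computed by power iteration; the two-state DFA and the uniform Bernoulli distribution below are made-up examples of ours:

```python
# Compute the stochastic matrix P induced by a strongly connected DFA and a
# positive Bernoulli distribution p, then its stationary distribution pi.
p = {"0": 0.5, "1": 0.5}             # positive Bernoulli distribution on Sigma
delta = {(0, "0"): 0, (0, "1"): 1,   # transition function of a strongly
         (1, "0"): 0, (1, "1"): 0}   # connected two-state DFA
states = [0, 1]

# P_ij = sum over a in Sigma of p(a) * 1_{delta(i, a) = j}
P = [[sum(p[a] for a in p if delta[(i, a)] == j) for j in states]
     for i in states]

# Power iteration: for an irreducible aperiodic chain, lambda * P^n -> pi
# for any initial distribution lambda.
pi = [1.0 / len(states)] * len(states)
for _ in range(1000):
    pi = [sum(pi[i] * P[i][j] for i in states) for j in states]

# Check stationarity pi = pi * P and normalization.
assert all(abs(pi[j] - sum(pi[i] * P[i][j] for i in states)) < 1e-9
           for j in states)
assert abs(sum(pi) - 1.0) < 1e-9
```

For this example π = (2/3, 1/3), so the constant c chosen in the next step of the proof would be the minimum of π over the accepting states, and the expected return time to state i is 1/π(i).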
Set c = min_{i∈Q_F} π(i) (c is well-defined as Q_F ≠ ∅), and let i ∈ Q_F be such that π(i) = c. Then, for all j ∈ Q:

    µ_p(E_n(c − ε, j)) = µ_p({ w ∈ Σ^n : |S_j[w]| ≤ (c − ε)n })
     ≤ µ_p({ w ∈ Σ^n : V_i(n) ≤ (c − ε)n })
     = µ_p({ w ∈ Σ^n : V_i(n)/n − c ≤ −ε })
     ≤ µ_p({ w ∈ Σ^n : |V_i(n)/n − c| ≥ ε })

And hence, by Equation (15), we have lim_{n→∞} µ_p(E_n(c − ε, j)) = 0, and as j ∈ Q was arbitrary, we obtain

    lim_{n→∞} µ_p(E_n(c − ε)) = lim_{n→∞} µ_p(⋃_{j∈Q} E_n(c − ε, j)) ≤ lim_{n→∞} ∑_{j∈Q} µ_p(E_n(c − ε, j)) = ∑_{j∈Q} lim_{n→∞} µ_p(E_n(c − ε, j)) = 0

as desired.

Lemma 12.
Let S be a strategy, a ∈ Σ, let b, ε be real numbers with 0 < b ≤ 1 and ε > 0, and let p : Σ → [0, 1] be a positive Bernoulli distribution. Define, for all positive integers n:

    H_n(b, ε) = { w ∈ Σ^n : |S(w)| > bn ∧ |p(a) − occ_a(S(w))/|S(w)|| ≥ ε }
     = ⋃_{bn<ℓ≤n} { w ∈ Σ^n : S(w) ∈ Σ^ℓ ∧ |p(a) − occ_a(S(w))/ℓ| ≥ ε }

Then:

    lim_{n→∞} µ_p(H_n(b, ε)) = 0

Proof.
Define

    F_n(b, ε) = ⋃_{bn<ℓ≤n} { y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε }

Observe that H_n(b, ε) = { w ∈ Σ^n : S(w) ∈ F_n(b, ε) }. Thus, µ_p(H_n(b, ε)) ≤ µ_p(F_n(b, ε)) for all n, and it thus suffices to prove that lim_{n→∞} µ_p(F_n(b, ε)) = 0.

Consider the stochastic variable X_a that is 1 when a is picked from Σ with probability p(a), and 0 otherwise. Then, the mean of X_a is p(a) and the variance of X_a is p(a)(1 − p(a)). Now consider performing ℓ ≥ 1 independent Bernoulli trials drawn according to X_a. Define q : {0,1}^+ → [0, 1] inductively by q(1) = p(a), q(0) = 1 − p(a), and q(1c) = p(a)q(c) and q(0c) = (1 − p(a))q(c) for c ∈ {0,1}^+, and observe that q induces a probability distribution q̄ on {0,1}^ℓ by setting q̄(w) = q(w). Now, for any v ∈ {0,1}^ℓ, q̄(v) is the probability of obtaining v by performing ℓ repeated Bernoulli trials as above.

Define the stochastic variable X^ℓ_a = X_a + X_a + ⋯ + X_a (ℓ independent copies). Then, X^ℓ_a counts the number of occurrences of a in the ℓ repeated Bernoulli trials. By the Chernoff bound, X^ℓ_a satisfies:

    Pr( |p(a) − X^ℓ_a/ℓ| ≥ ε ) ≤ 2e^{−ℓε²p(a)/3}   (16)

Define the map g : Σ → {0,1} by g(a) = 1 and g(b) = 0 for all b ∈ Σ∖{a}. Clearly, g extends homomorphically to a map g̃ : Σ^ℓ → {0,1}^ℓ by setting g̃(c_1 c_2 ⋯ c_ℓ) = g(c_1) g(c_2) ⋯ g(c_ℓ).

Claim:
For any u ∈ {0,1}^ℓ,

    q̄(u) = µ_p({ y ∈ Σ^ℓ : g̃(y) = u })   (17)

Proof of claim:
By induction on ℓ.

• If ℓ = 1, then if u = 0 we have { y ∈ Σ^ℓ : g̃(y) = u } = Σ∖{a} and thus:

    q̄(u) = q̄(0) = q(0) = 1 − p(a) = ∑_{b∈Σ∖{a}} p(b) = µ_p(Σ∖{a})

Similarly, if u = 1, we have { y ∈ Σ^ℓ : g̃(y) = u } = {a}, and thus q̄(u) = q̄(1) = q(1) = p(a) = µ_p({a}), as desired.

• If ℓ > 1, write u = b_1 ⋯ b_{ℓ−1} b_ℓ; by the induction hypothesis:

    q̄(b_1 ⋯ b_{ℓ−1}) = µ_p({ y′ ∈ Σ^{ℓ−1} : g̃(y′) = b_1 ⋯ b_{ℓ−1} }) = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} µ_p(y′)

If b_ℓ = 0, then:

    q̄(b_1 ⋯ b_{ℓ−1} b_ℓ) = q̄(b_1 ⋯ b_{ℓ−1}) q(0) = q̄(b_1 ⋯ b_{ℓ−1})(1 − p(a))
     = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} µ_p(y′)(1 − p(a))
     = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} µ_p(y′) ∑_{c∈Σ∖{a}} p(c)
     = ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}} ∑_{c∈Σ∖{a}} µ_p(y′) p(c)
     (†)= ∑_{y′∈Σ^{ℓ−1}, g̃(y′)=b_1⋯b_{ℓ−1}, c∈Σ∖{a}} µ_p(y′c)
     = ∑_{y∈Σ^ℓ, g̃(y)=b_1⋯b_{ℓ−1}b_ℓ} µ_p(y)
     = µ_p({ y ∈ Σ^ℓ : g̃(y) = b_1 ⋯ b_{ℓ−1} b_ℓ })

where (†) follows as both series on the left- and right-hand sides of the equality are absolutely convergent. The proof for the case b_ℓ = 1 is symmetric, mutatis mutandis. (End of proof of claim.)

Observe that, for any y ∈ Σ^ℓ, we have:

    |p(a) − occ_1(g̃(y))/ℓ| ≥ ε  iff  |p(a) − occ_a(y)/ℓ| ≥ ε   (18)

Hence, by Equation (17), for any event U ⊆ {0,1}^ℓ, we have:

    Pr(U) = ∑_{u∈U} q̄(u) = ∑_{u∈U} µ_p({ y ∈ Σ^ℓ : g̃(y) = u }) = µ_p({ y ∈ Σ^ℓ : g̃(y) ∈ U })   (19)

The event |p(a) − X^ℓ_a/ℓ| ≥ ε is shorthand for the set

    { u ∈ {0,1}^ℓ : |p(a) − (∑_{j=1}^ℓ u_j)/ℓ| ≥ ε } = { u ∈ {0,1}^ℓ : |p(a) − occ_1(u)/ℓ| ≥ ε }
We thus obtain:

    Pr( |p(a) − X^ℓ_a/ℓ| ≥ ε ) = Pr({ u ∈ {0,1}^ℓ : |p(a) − occ_1(u)/ℓ| ≥ ε })
     = µ_p({ y ∈ Σ^ℓ : |p(a) − occ_1(g̃(y))/ℓ| ≥ ε })  by Equation (19)
     = µ_p({ y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε })  by Equation (18)   (20)

Observe that:

    µ_p(F_n(b, ε)) = µ_p( ⋃_{bn<ℓ≤n} { y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε } )
     = ∑_{bn<ℓ≤n} µ_p({ y ∈ Σ^ℓ : |p(a) − occ_a(y)/ℓ| ≥ ε })
     = ∑_{bn<ℓ≤n} Pr( |p(a) − X^ℓ_a/ℓ| ≥ ε )  by Equation (20)
     ≤ ∑_{bn<ℓ≤n} 2e^{−ℓε²p(a)/3}  by Equation (16)
     ≤ (1 − b)n · 2e^{−bnε²p(a)/3}

And thus lim_{n→∞} µ_p(F_n(b, ε)) = 0, as desired.

Corollary 12.1.
Let b, ε be real numbers with 0 < b ≤ 1 and ε > 0. Then,

    lim_{n→∞} µ_p(G_n(b, ε)) = 0

Proof.
By Lemma 12 with S the strategy defined by the automaton A_q, we obtain that lim_{n→∞} µ_p(G_n(b, ε, q)) = 0, and as G_n(b, ε) = ⋃_{q∈Q} G_n(b, ε, q), we have:

    µ_p(G_n(b, ε)) ≤ ∑_{q∈Q} µ_p(G_n(b, ε, q))

As Q is finite, we hence obtain lim_{n→∞} µ_p(G_n(b, ε)) = 0.

Lemma 13.
There is a real number b with 0 < b ≤ 1 such that for all ε > 0:

    lim_{n→∞} µ_p(D_n^p(b, ε)) = 1.

Proof. Observe that, for all b with 0 < b ≤ 1:

    Σ^n ∖ D_n^p(b, ε) = { w ∈ Σ^n : ∃q ∈ Q. |A_q[w]| ≤ bn } ∪ { w ∈ Σ^n : ∃q ∈ Q. |A_q[w]| > bn ∧ sup_{a∈Σ} |occ_a(A_q[w])/|A_q[w]| − p(a)| ≥ ε }
     = ⋃_{q∈Q} E_n(b, q) ∪ ⋃_{q∈Q} G_n(b, ε, q)

and thus:

    µ_p(Σ^n ∖ D_n^p(b, ε)) ≤ µ_p(⋃_{q∈Q} E_n(b, q)) + µ_p(⋃_{q∈Q} G_n(b, ε, q)) = µ_p(G_n(b, ε)) + µ_p(E_n(b))

By Lemma 11, choose a real number c > 0 such that lim_{n→∞} µ_p(E_n(c − ε)) = 0, and set b = c − ε. By Corollary 12.1, we obtain that lim_{n→∞} µ_p(G_n(b, ε)) = 0, and thus lim_{n→∞} µ_p(Σ^n ∖ D_n^p(b, ε)) = 0. The result now follows by µ_p(D_n^p(b, ε)) = 1 − µ_p(Σ^n ∖ D_n^p(b, ε)).

Lemma 14.
Let p : Σ → [0, 1] be a probability distribution, let α ∈ Σ^ω be µ_p-block-distributed, and let A be a strongly connected DFA over Σ. Then, for all a ∈ Σ, the limiting frequency of a in the sequence β = A[α] exists and equals p(a).

Proof. For each n, r, let β^(n,r) be the sequence of symbols picked out from block α^(n,r) when A is applied to α; note that each β^(n,r) has length between 0 and n.

For each positive integer m, define:

    L_m = ∑_{i=1}^m |β^(n,i)|

And for each a ∈ Σ, define ρ^m_a by:

    ρ^m_a = (∑_{i=1}^m occ_a(β^(n,i))) / L_m

To prove the lemma, it suffices to show that, for any real number ε > 0 and all sufficiently large m, we have |ρ^m_a − p(a)| < ε.

Define:

    I_m = { i ≤ m : α^(n,i) ∉ D_n^p(b, ε/2) }

And define:

    ℓ_m = ∑_{i∈I_m} |β^(n,i)|

Now, define θ^m_a by:

    θ^m_a = (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (∑_{i∈{1,…,m}∖I_m} |β^(n,i)|) = (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (L_m − ℓ_m)

θ^m_a is the frequency of occurrences of a when the blocks β^(n,i) picked out from blocks α^(n,i) ∈ D_n^p(b, ε/2) with i ≤ m are all concatenated. Observe that, by definition of D_n^p, we have |θ^m_a − p(a)| < ε/2.

We have:

    ρ^m_a − θ^m_a = (∑_{i=1}^m occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m)
     = ( (∑_{i∈I_m} occ_a(β^(n,i)))/L_m + (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m )
       − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m)
     (†)= (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m) + (∑_{i∈I_m} occ_a(β^(n,i)))/L_m
     ≤ (∑_{i∈I_m} occ_a(β^(n,i)))/L_m
     ≤ (∑_{i∈I_m} |β^(n,i)|)/L_m = ℓ_m/L_m   (21)

where the penultimate inequality in the last line above follows because L_m ≥ L_m − ℓ_m implies (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m) ≤ 0, and the final inequality follows because ∑_{i∈I_m} occ_a(β^(n,i)) ≤ ∑_{i∈I_m} |β^(n,i)| = ℓ_m.

By basic algebra, we have:

    (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/L_m − (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)))/(L_m − ℓ_m) = −ℓ_m (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (L_m(L_m − ℓ_m))

and as

    ∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i)) ≤ ∑_{i∈{1,…,m}∖I_m} |β^(n,i)| = L_m − ℓ_m

we conclude that:

    −ℓ_m (∑_{i∈{1,…,m}∖I_m} occ_a(β^(n,i))) / (L_m(L_m − ℓ_m)) ≥ −ℓ_m/L_m

and thus by (†) above that:

    ρ^m_a − θ^m_a ≥ −ℓ_m/L_m + (∑_{i∈I_m} occ_a(β^(n,i)))/L_m ≥ −ℓ_m/L_m

whence −ℓ_m/L_m ≤ ρ^m_a − θ^m_a, which combined with Equation (21) yields |ρ^m_a − θ^m_a| ≤ ℓ_m/L_m.

By Lemma 13, pick a b such that for all ε′ > 0 we have lim_{n→∞} µ_p(D_n^p(b, ε′)) = 1. Choose δ > 0 with δ < bε/8, and pick n ∈ ℕ such that µ_p(D_n^p(b, ε/2)) > 1 − δ.
Now, pick γ < bǫ .Because α is µ p -block-distributed, there exists M ∈ N such that for all k ≥ M and all B ⊆ Σ n ,the prefix α | ≤ kn satisfies: (cid:12)(cid:12)(cid:12)(cid:12) |{ i ≤ k : α ( n,i ) ∈ B }| k − µ p ( B ) (cid:12)(cid:12)(cid:12)(cid:12) < γ In the particular case B = D pn ( b, ǫ/ , we thus have: (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) |{ i ≤ k : α ( n,i ) ∈ D pn (cid:0) b, ǫ (cid:1) }| k − µ p (cid:16) D pn (cid:16) b, ǫ (cid:17)(cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < γ − δ − |{ i ≤ k : α ( n,i ) ∈ D pn ( b, ǫ ) }| k ≤ µ p (cid:16) D pn (cid:16) b, ǫ (cid:17)(cid:17) − |{ i ≤ k : α ( n,i ) ∈ D pn ( b, ǫ ) }| k < γ whence we conclude: (cid:12)(cid:12)(cid:12)n i ≤ k : α ( n,i ) ∈ D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) > k (1 − δ − γ ) (22)By definition of D pn ( b, ǫ )) , every α ( n,i ) ∈ D pn ( b, ǫ )) satisfies | A [ α ( n,i ) ] | > bn , and we thushave: L m = m X i =1 | y ( n,i ) | = m X i =1 | A [ α ( n,i ) ] | ≥ (cid:12)(cid:12)(cid:12)n i ≤ m : α ( n,i ) ∈ D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) bn > m (1 − δ − γ ) bn (23)Furthermore, by the definition of I m and (Equation (22)): | I m | = (cid:12)(cid:12)(cid:12)n i m : α ( n,i ) D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) = m − (cid:12)(cid:12)(cid:12)n i ≤ m : α ( n,i ) ∈ D pn (cid:16) b, ǫ (cid:17)o(cid:12)(cid:12)(cid:12) < m − m (1 − δ − γ ) = m ( δ + γ ) But then, ℓ m = X i ∈ I m | y ( i,n ) | ≤ | I m | n < mn ( δ + γ ) (24)and thus by Equation (23) and Equation (24): ℓ m L m < mn ( δ + γ ) m (1 − δ − γ ) bn = δ + γb (1 − δ − γ ) < bǫ + bǫ b (cid:0) − bǫ − bǫ (cid:1) < ǫ − < ǫ where we have used that bǫ < in the penultimate inequality.We now finally have | ρ a − p ( a ) | ≤ | ρ ma − θ ma | + | θ a − p ( a ) | < ℓ m L m + ǫ < ǫ ǫ ǫ concluding the proof. Lemma 15.
Let Σ be an alphabet, p a positive Bernoulli distribution on Σ, let α ∈ Σ^ω be µ_p-distributed, and let A be a strongly connected DFA over Σ. Then, A[α] is µ_p-distributed.

Proof. By Lemma 10 it suffices to show for every a ∈ Σ and every strongly connected A that the limiting frequency of a in A[α] exists and equals p(a). As α is µ_p-distributed, it follows from Proposition 8 that it is µ_p-block-distributed, and the result then follows immediately from Lemma 14.

We now show an application of the main result to the area of symbolic dynamical systems. The following section recalls basic facts about symbolic dynamical systems, including establishing the correspondence between probability maps on Σ∗ and probability measures on full shifts.

Shift spaces and genericity

We briefly introduce basic notions; full accounts can be found in standard textbooks, e.g. [35].
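Lemma 15 can be checked empirically on pseudorandom data: a strongly connected DFA run as a prefix selector over a long sample from a Bernoulli distribution should select a subsequence with (approximately) the same letter frequencies. The sketch below is illustrative only and not part of the formal development; the helper `select` and the concrete two-state DFA are our own choices.

```python
import random

def select(delta, q0, accepting, seq):
    """Oblivious finite-state selection: symbol seq[i] is selected iff the
    DFA state reached after reading seq[:i] is accepting."""
    q, out = q0, []
    for a in seq:
        if q in accepting:
            out.append(a)
        q = delta[(q, a)]
    return out

random.seed(0)
p = {'0': 0.7, '1': 0.3}
alpha = random.choices(list(p), weights=list(p.values()), k=200_000)

# A strongly connected two-state DFA: flip state on reading '1', stay on '0'.
delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0}
beta = select(delta, 0, {0}, alpha)

freq = beta.count('0') / len(beta)  # should be close to p('0') = 0.7
```

Replacing the Bernoulli weights by the output of a stationary process that is not Bernoulli (e.g. a Markov chain with memory) will in general break this preservation, in line with the characterization proved in the paper.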
Definition 13.
Let Σ be a non-empty alphabet. The (one-sided) shift s : Σ^ω −→ Σ^ω is the map defined by s(a_1 a_2 a_3 ···) = a_2 a_3 ···. A shift space is a pair (X, s) where X ⊆ Σ^ω is a closed subset (in the product topology on Σ^ω when Σ is endowed with the discrete topology) such that s(X) = X, and s is the restriction of the shift to X.

As usual, we consider the σ-algebra C on Σ^ω generated by the set of cylinders { [w] : w ∈ Σ∗ }. All measures in the remainder of the paper are understood to be measures on (Σ^ω, C).

The standard example of probability measures on shift spaces is the set of Bernoulli measures [59]:

Definition 14.
A probability measure on the shift space (Σ^ω, s) is a probability measure on Σ^ω with the σ-algebra generated by the cylinder sets { [v] : v ∈ Σ∗ }. A probability measure µ̄ on the full shift is a Bernoulli measure if there is a probability distribution p : Σ −→ [0, 1] such that the measure of each cylinder satisfies µ̄([a_1 ··· a_n]) = Π_{i=1}^n p(a_i). In this case, we say that µ̄ is induced by p.

Definition 15.
Let (X, s) be a shift space. A probability measure µ̄ on X is said to be shift-invariant if µ̄(s^{−1}(A)) = µ̄(A) for all measurable A ⊆ X. A finite word w ∈ Σ^k is said to be admissible for µ̄ if µ̄([w]) > 0.

A right-infinite sequence α ∈ Σ^ω is said to be generic for µ̄ if, for all words w admissible for µ̄, we have:

    lim_{n→∞} #_w(α|≤n) / n = µ̄([w])

where #_w(x) denotes the number of occurrences of w in x. That is, w occurs in α with limiting frequency µ̄([w]).

The study of probability measures on the full shift is cryptomorphic to the study of invariant probability maps; this folklore result is contained in the following two propositions (proofs can be found in Appendix A).

Proposition 16.
Every invariant probability map µ : Σ∗ −→ [0, 1] induces a shift-invariant probability measure µ̄ on Σ^ω by setting µ̄([w]) = µ(w). Conversely, every probability measure ν on Σ^ω induces a probability map ν̲ : Σ∗ −→ [0, 1] by defining ν̲(w) = ν([w]); if ν is shift-invariant, then ν̲ is invariant. Furthermore, the two constructions are mutually inverse: the probability map induced by µ̄ is µ itself, and the measure induced by ν̲ is ν itself.

Proposition 17.
Let µ : Σ∗ −→ [0, 1] be a probability map. The following are equivalent:

1. There exists a µ-distributed α ∈ Σ^ω.
2. µ is invariant.
3. There exists a shift-invariant probability measure ν on Σ^ω such that µ̄ = ν.

Conversely, let ν be a probability measure on Σ^ω. The following are equivalent:

1. There exists α ∈ Σ^ω that is generic for ν.
2. ν is shift-invariant.
3. There exists an invariant probability map µ : Σ∗ −→ [0, 1] such that ν̲ = µ.

(Footnote: for one-sided shifts, some authors require only s(X) ⊆ X; we shall not do so here.)

It follows that the shift-invariant probability measures ν on the full shift such that genericity is preserved by finite-state selection are exactly the Bernoulli measures:

Theorem 18.
Let Σ be a non-empty alphabet, and let ν be a shift-invariant measure on the full shift (Σ^ω, s) such that there exists at least one α ∈ Σ^ω generic for ν. Then, every finite-state selector preserves genericity iff ν is a Bernoulli measure such that all words in Σ∗ are admissible.

Proof. Observe that for a Bernoulli measure µ̄ on the full shift on Σ, all words are admissible iff µ̄([a]) > 0 for all a ∈ Σ. The Theorem now follows from Theorem 4 and Proposition 17.

The most obvious extension of our main results is to attempt to relax the requirement that selection is done by a DFA, using methods similar to those of Kamae and Weiss [31] and Kamae and Wang [68], where reasoning combining density arguments with relaxed finiteness conditions on the syntactic monoid of the strategy (using our terminology) has been used for normal sequences over binary alphabets. We conjecture that some of these techniques can be adapted to positive Bernoulli distributions on arbitrary finite alphabets.

A different possible thrust is to consider generalizations of Agafonov's Theorem on domains different from infinite sequences over alphabets. However, some results in the (sparse) literature on selection from normal sequence-like objects in other contexts are negative; for example, normality is not preserved along arithmetic progressions (so, probably not by finite-state selectors in any reasonable sense) for continued fraction expansions [27]. On the other hand, very recent work by Bergelson et al. has successfully adapted the classical techniques of Kamae and Weiss [31] to show that certain Følner sequences preserve (the appropriate analogue of) normality in cancellative amenable semigroups [7].
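Theorem 18's positive direction can likewise be illustrated by simulation: a sequence sampled from a Bernoulli measure is almost surely generic for it, and word frequencies in a finite-state-selected subsequence should approach the corresponding cylinder measures. The sketch below is a hypothetical illustration (helper names ours), not part of the proof.

```python
import random

def occurrences(w, x):
    """Number of (possibly overlapping) occurrences of the word w in x."""
    return sum(x[i:i + len(w)] == w for i in range(len(x) - len(w) + 1))

def select(delta, q0, accepting, seq):
    """Prefix selection by a DFA, as in the main text."""
    q, out = q0, []
    for a in seq:
        if q in accepting:
            out.append(a)
        q = delta[(q, a)]
    return ''.join(out)

random.seed(1)
p = {'a': 0.6, 'b': 0.4}
alpha = random.choices('ab', weights=[0.6, 0.4], k=300_000)

# Strongly connected DFA remembering the last symbol read; select after an 'a'.
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0}
beta = select(delta, 0, {1}, alpha)

# Empirical frequency of the word 'ab' vs. its Bernoulli cylinder measure.
measure_ab = p['a'] * p['b']                  # = 0.24
freq_ab = occurrences('ab', beta) / len(beta)
```

The selected subsequence here consists of the symbols immediately following an 'a'; since the sampled symbols are independent, the subsequence is again Bernoulli-distributed, matching the preservation the theorem predicts.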
Auxiliary proofs and definitions
A.1 Automata and selectors
The following is a proof of the extension of Lemma 2.6 of [56]. The proof follows the original in most details.
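The construction given in this appendix is effective: the word w of Lemma 19 below can be computed by breadth-first search over the transition graph, trapping the states one at a time in recurrent components exactly as in the inductive proof. The following sketch assumes a total transition function given as a Python dict; all names are ours.

```python
from collections import deque

def run(delta, q, w):
    """delta*(q, w): run the automaton from state q on the word w."""
    for a in w:
        q = delta[(q, a)]
    return q

def recurrent_states(states, alphabet, delta):
    """States lying in recurrent (sink) strongly connected components."""
    def reachable(s):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for a in alphabet:
                v = delta[(u, a)]
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen
    r = {q: reachable(q) for q in states}
    # q is recurrent iff everything reachable from q can reach q back
    return {q for q in states if all(q in r[v] for v in r[q])}

def trapping_word(states, alphabet, delta):
    """A word w with delta*(q, w) in a recurrent component for every q,
    built state by state as in the proof of Lemma 19."""
    rec = recurrent_states(states, alphabet, delta)
    w = ''
    for q0 in states:
        q = run(delta, q0, w)      # image of q0 under the word built so far
        if q in rec:
            continue               # recurrent components are never left
        parent = {q: None}         # BFS towards a recurrent state
        frontier = deque([q])
        while frontier:
            u = frontier.popleft()
            if u in rec:
                path = ''
                while parent[u] is not None:
                    u, a = parent[u]
                    path = a + path
                w += path
                break
            for a in alphabet:
                v = delta[(u, a)]
                if v not in parent:
                    parent[v] = (u, a)
                    frontier.append(v)
    return w

# Example: state 2 forms the unique recurrent component.
states, alphabet = [0, 1, 2], 'ab'
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 2, (1, 'b'): 0,
         (2, 'a'): 2, (2, 'b'): 2}
w = trapping_word(states, alphabet, delta)
```

For infinite alphabets the BFS should, as in the proof, only ever explore the finitely many distinct successor states, so restricting the loop over `alphabet` to a finite set of representative letters would be needed in practice.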
Definition 16.
Let G = (V, E) be a directed multigraph, and denote by ∼ ⊆ V × V the equivalence relation such that v ∼ w iff v and w are in the same strongly connected component of G. For every v ∈ V, denote by [v]∼ the equivalence class containing v. Define the partial order < on V/∼ by V < W iff V ≠ W and there are v ∈ V and w ∈ W such that there is a directed path from w to v.

If G has a finite number of nodes, < is clearly well-founded. As < is clearly also transitive, every W ∈ V/∼ satisfies W ≥ V for some <-minimal V ∈ V/∼.

Also observe that every <-minimal V is a recurrent strongly connected component, because (i) it is strongly connected by definition, and (ii) <-minimality implies that no directed path from any node in V can reach a node in a strongly connected component distinct from V.

Lemma 19.
Let S = (Q, δ, q_s, Q_F) be a finite automaton over a (possibly infinite) alphabet Σ. Then there is a word w ∈ Σ∗ such that, for all states q ∈ Q, δ∗(q, w) is a state in a <-minimal element of Q/∼.

Proof. Write Q = {q_1, . . . , q_m}. We prove by induction on i ≤ m that there is a word w_i ∈ Σ∗ such that for all j ≤ i, δ∗(q_j, w_i) is a state in a <-minimal element of Q/∼.

i = 1: Let V be a <-minimal element of Q/∼ such that [q_1]∼ ≥ V. Choose q ∈ Q such that [q]∼ = V. Then there is a (possibly empty) directed path from q_1 to q. Let w_1 be the word along that path, and observe that δ∗(q_1, w_1) = q.

i > 1: Let V be a <-minimal element of Q/∼ such that [δ∗(q_{i+1}, w_i)]∼ ≥ V, and let q ∈ V, whence there is a (possibly empty) directed path from δ∗(q_{i+1}, w_i) to q. Let w′ ∈ Σ∗ be the word along that path, whence δ∗(δ∗(q_{i+1}, w_i), w′) = q. Define w_{i+1} = w_i · w′, and observe that δ∗(q_{i+1}, w_{i+1}) = q.

For j ≤ i, we claim that δ∗(q_j, w_{i+1}) is a state in a <-minimal element of Q/∼. For, by the Induction Hypothesis, δ∗(q_j, w_i) is in a <-minimal element V_j of Q/∼, and as <-minimal elements are recurrent strongly connected components, no directed path from δ∗(q_j, w_i) can end in a state outside V_j; in particular, δ∗(q_j, w_{i+1}) ∈ V_j.

Proof of Lemma 3.
Any <-minimal element of Q/∼ is recurrent. By Lemma 19, there is a word w such that from any state q ∈ Q, δ∗(q, w) is a state in a recurrent strongly connected component of the automaton. As p(a) > 0 for all a ∈ Σ, we have µ_p(w) > 0, and as α is µ_p-distributed, w thus occurs (infinitely often) in α. After the first occurrence of w, the run of A on α has entered a recurrent strongly connected component.

A.2 µ-distribution

Proof of Proposition 8.
We use exactly the same arguments as in the proof by Niven and Zuckerman [43], but using the notation of the present paper. Almost the entirety of the proof in [43] is devoted to counting arguments on finite prefixes of α, and involves neither the size of the alphabet Σ, nor the particular distribution on it; indeed, any consideration of those matters is isolated to a few observations in the beginning of the proof that are then used repeatedly when taking limits later on. We have clearly indicated those observations below, but give the entirety of the proof in the interest of completeness.

Let w = w_1 ··· w_v ∈ Σ^v be arbitrary. We introduce the following notation:

• #_u(x) denotes the number of occurrences of the word u in the finite string x.
• For any t ≥ 1, wΣ^t w is the set { wuw : u ∈ Σ^t }.
• #^i_w(n) is the number of times that w occurs in α|≤n at a position congruent to i (mod v).
• #^{i,j}_w(n) = #^i_w(n) − #^j_w(n).
• g : N −→ N is the function defined by g(n) = Σ_{i=0}^{v−1} #^i_w(n).
• θ_t(n) is the number of occurrences of any element from wΣ^t w in α|≤n.
• w′ is shorthand for any string of length between v + 1 and 2v − 1 whose first v digits are w and whose last v digits are w, i.e. an “overlap of w with itself”.
Such a string does not necessarily exist.

We now treat the part of the proof depending on the cardinality of Σ and on µ_p-distributedness (as opposed to finiteness of Σ and equidistribution).

As α is µ_p-distributed, we have

    lim_{n→∞} g(n)/n = µ_p(w)    (25)

and for each fixed t ≥ 1, we also have:

    lim_{n→∞} θ_t(n)/n = lim_{n→∞} (Σ_{a_1···a_t ∈ Σ^t} #_{w a_1···a_t w}(α|≤n)) / n
                       = Σ_{a_1···a_t ∈ Σ^t} lim_{n→∞} #_{w a_1···a_t w}(α|≤n) / n    (by the Dominated Convergence Theorem)
                       = Σ_{a_1···a_t ∈ Σ^t} µ_p(w a_1···a_t w)
                       = Σ_{a_1···a_t ∈ Σ^t} µ_p(w)² Π_{i=1}^t p(a_i)    (as µ_p is induced by a Bernoulli distribution)
                       = µ_p(w)² Σ_{a_1···a_t ∈ Σ^t} Π_{i=1}^t p(a_i)
                       = µ_p(w)² Π_{i=1}^t (Σ_{a ∈ Σ} p(a))    (by monotone convergence)
                       = µ_p(w)²    (as Σ_{a ∈ Σ} p(a) = 1)    (26)

We shall prove that:

    lim_{n→∞} #^{i,j}_w(n)/n = 0    (27)

By Equation (25) and Equation (27), it follows for any i with 0 ≤ i < v that:

    lim_{n→∞} #^i_w(n)/n = µ_p(w)/v

and, as v and w ∈ Σ^v were arbitrary, that α is µ_p-block-distributed.

The remainder of the proof is devoted to proving Equation (27) and is only concerned with counting arguments on finite prefixes of α. All arguments from here on are, modulo notation and use of Equation (26), completely identical to the proof in [43].

Let s ≥ 1 be an integer. Observe that #^i_w(n + s) − #^i_w(n) is the number of occurrences of w that (1) are in α|≤n+s at a position congruent to i mod v, but (2) are not entirely contained in α|≤n. Thus, Σ_i …

Proof of Proposition 16. The two identities of Proposition 16 (the probability map induced by µ̄ is µ itself, and the measure induced by ν̲ is ν itself) follow directly from the definitions. If µ is invariant, then for every cylinder [w], we have µ̄([w]) = µ(w) = Σ_{a ∈ Σ} µ(w · a) = Σ_{a ∈ Σ} µ̄([w · a]); from this, and the observation that µ(λ) = µ̄([λ]) = µ̄(Σ^ω) = 1, it follows that µ̄ is a probability measure on Σ^ω with the σ-algebra generated by the cylinder sets.
In addition, as µ is invariant, we have for any cylinder [w] that:

    µ̄(s^{−1}([w])) = µ̄(∪_{a ∈ Σ} [a · w]) = Σ_{a ∈ Σ} µ̄([a · w]) = Σ_{a ∈ Σ} µ(a · w) = µ(w) = µ̄([w])

whence µ̄ is shift-invariant.

Conversely, if ν is a shift-invariant probability measure on Σ^ω, we have for any w that:

    ν̲(w) = ν([w]) = ν(∪_{a ∈ Σ} [w · a]) = Σ_{a ∈ Σ} ν([w · a]) = Σ_{a ∈ Σ} ν̲(w · a)

and

    ν̲(w) = ν([w]) = ν(s^{−1}([w])) = ν(∪_{a ∈ Σ} [a · w]) = Σ_{a ∈ Σ} ν([a · w]) = Σ_{a ∈ Σ} ν̲(a · w)

showing that ν̲ is invariant.

Proof of Proposition 17. For the first part, we prove 1 ⇒ 2 ⇒ 3 ⇒ 1. If there exists a µ-distributed α ∈ Σ^ω, then for any w ∈ Σ∗ and any ε > 0, for all sufficiently large n we have sup_{b ∈ Σ∪{λ}} |#_{b·w}(α|≤n)/n − µ(b · w)| < ε. Observe that every occurrence of a word of the form a · w in α contains an occurrence of w, and hence #_w(α|≤n) ≥ Σ_{a ∈ Σ} #_{a·w}(α|≤n). Conversely, for every occurrence of w starting at some position i ≥ 2 in α, there is exactly one a ∈ Σ such that the word a · w occurs at position i − 1, whence #_w(α|≤n) ≤ 1 + Σ_{a ∈ Σ} #_{a·w}(α|≤n), and hence:

    |µ(w) − Σ_{a ∈ Σ} µ(a · w)| = |µ(w) − #_w(α|≤n)/n + #_w(α|≤n)/n − Σ_{a ∈ Σ} µ(a · w)|
                                ≤ |µ(w) − #_w(α|≤n)/n| + |#_w(α|≤n)/n − Σ_{a ∈ Σ} µ(a · w)|
                                < ε + 1/n + |(Σ_{a ∈ Σ} #_{a·w}(α|≤n))/n − Σ_{a ∈ Σ} µ(a · w)|
                                ≤ ε + 1/n + Σ_{a ∈ Σ} |#_{a·w}(α|≤n)/n − µ(a · w)|
                                < ε + 1/n + |Σ|ε

and as ε was arbitrary and n may be taken arbitrarily large, we thus have µ(w) = Σ_{a ∈ Σ} µ(a · w).
The case for µ(w) = Σ_{a ∈ Σ} µ(w · a) is symmetric, mutatis mutandis, and hence µ is invariant. If µ is invariant, then by Proposition 16, µ̄ is a shift-invariant probability measure on Σ^ω. If ν is a shift-invariant probability measure on Σ^ω such that µ̄ = ν, then by [38, Main Thm. 2.1] there exists α ∈ Σ^ω generic for µ̄, and thus for any admissible w ∈ Σ∗:

    lim_{n→∞} #_w(α|≤n)/n = ν([w]) = µ̄([w]) = µ(w)

Observe that any inadmissible word w = a_1 ··· a_n has Π_{i=1}^n µ(a_i) = µ(w) = 0, whence µ(a_i) = 0 for some i, and hence lim_{n→∞} #_w(α|≤n)/n ≤ lim_{n→∞} #_{a_i}(α|≤n)/n = 0. Hence, α is µ-distributed.

For the second part, we prove 1 ⇒ 3 ⇒ 2 ⇒ 1. Assume that α is generic for ν. By construction, ν̲ is a probability map such that α is ν̲-distributed, and by the first part of the proposition, ν̲ is invariant, as desired. If ν̲ is an invariant probability map, then as any measurable A can be written as a disjoint union of cylinder sets, and as we have s^{−1}([w]) = ∪_{a ∈ Σ} [a · w] for any cylinder [w], we obtain

    ν(s^{−1}([w])) = ν(∪_{a ∈ Σ} [a · w]) = Σ_{a ∈ Σ} ν([a · w]) = Σ_{a ∈ Σ} ν̲(a · w) = ν̲(w) = ν([w])

showing that ν is shift-invariant. Finally, if ν is shift-invariant, it follows from [38, Main Thm. 2.1] that there exists α ∈ Σ^ω generic for ν, as desired.

References

[1] V. N. Agafonov. “Normal sequences and finite automata”. English. In: Sov. Math., Dokl. issn: 0197-6788.
[2] D. Airey and B. Mance. “Normality preserving operations for Cantor series expansions and associated fractals, I”. In: Illinois J. Math.
[3] In: Acta Arithmetica 180 (2017), pp. 333–346.
[4] N. Alon, Y. Matias, and M. Szegedy. “The Space Complexity of Approximating the Frequency Moments”. In: Journal of Computer and System Sciences.
[5] In: Journal of Computer and System Sciences.
[6] In: Theoretical Computer Science 477 (2013), pp. 109–116.
[7] V. Bergelson, T. Downarowicz, and J. Vandehey.
Deterministic functions on amenable semigroups and a generalization of the Kamae-Weiss theorem on normality preservation. 2020. arXiv: .
[8] G. Berry and G. Gonthier. “The Esterel synchronous programming language: design, semantics, implementation”. In: Science of Computer Programming.
[9] In: Israel Journal of Mathematics 80 (1992), pp. 257–287.
[10] F. Blanchard. “Non literal transducers and some problems of normality”. In: Journal de Théorie des Nombres de Bordeaux.
[11] In: IEEE Transactions on Information Theory.
[12] É. Borel. “Les probabilités dénombrables et leurs applications arithmétiques”. In: Rend. Circ. Matem. Palermo 27 (1909), pp. 247–271.
[13] S. Boucheron, A. Garivier, and E. Gassiat. “Coding on Countably Infinite Alphabets”. In: IEEE Trans. Inf. Theory.
[14] In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Ed. by P. Raghavendra et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 58–70.
[15] A. Broglio and P. Liardet. “Predictions with automata. Symbolic dynamics and its applications”. In: Contemporary Mathematics 135 (1992), pp. 111–124. Also appeared in Proceedings of the AMS Conference in honor of R. L. Adler, New Haven, CT, USA, 1991.
[16] O. Carton. “A direct proof of Agafonov’s theorem and an extension to shifts of finite type”. In: Preprint (2020).
[17] O. Carton and J. Vandehey. “Preservation of Normality by Non-Oblivious Group Selection”. In: Theory of Computing Systems (2020).
[18] P. Caspi et al. “Lustre: A Declarative Language for Programming Synchronous Systems”. In: Conference Record of the Fourteenth Annual ACM Symposium on Principles of Programming Languages, Munich, Germany, January 21–23, 1987. 1987, pp. 178–188.
[19] J. W. S. Cassels. “On a paper of Niven and Zuckerman”. In: Pacific J. Math.
[20] D. G. Champernowne. “The Construction of Decimals Normal in the Scale of Ten”. In: Journal of the London Mathematical Society s1-8.4 (1933), pp. 254–260.
[21] A. Church. “On the concept of a random sequence”. In: Bulletin of the American Mathematical Society.
[22] In: American Journal of Mathematics.
[23] In: American Journal of Mathematics.
[24] In: Bull. Amer. Math. Soc.
[25] In: Canadian J. Math. (1952), pp. 58–63.
[26] C. Dwork.
“Differential Privacy in New Settings”. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17–19, 2010. 2010, pp. 174–183.
[27] B. Heersink and J. Vandehey. “Continued fraction normality is not preserved along arithmetic progressions”. In: Archiv der Mathematik 106 (Sept. 2015).
[28] M. Holzer, M. Kutrib, and A. Malcher. “Multi-Head Finite Automata: Characterizations, Concepts and Open Problems”. In: CSP. 2008, pp. 93–107.
[29] M. Hosseini and N. Santhanam. “On redundancy of memoryless sources over countable alphabets”. 2014, pp. 299–303.
[30] P. Indyk and D. P. Woodruff. “Optimal approximations of the frequency moments of data streams”. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22–24, 2005. 2005, pp. 202–208.
[31] T. Kamae and B. Weiss. “Normal numbers and selection rules”. In: Israel Journal of Mathematics (1975), pp. 101–110.
[32] E. Kamke. “Über neuere Begründungen der Wahrscheinlichkeitsrechnung”. In: Jahresbericht der Deutschen Mathematiker-Vereinigung 42 (1933), pp. 14–27.
[33] G. Kellaris et al. “Differentially Private Event Sequences over Infinite Streams”. In: Proc. VLDB Endow.
[34] In: Moscow Univ. Math. Bull.
[35] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1995.
[36] M. Madritsch. “Normal Numbers and Symbolic Dynamics”. In: Sequences, Groups, and Number Theory. Ed. by V. Berthé and M. Rigo. Cham: Springer International Publishing, 2018, pp. 271–329.
[37] M. Madritsch, A.-M. Scheerer, and R. Tichy. “Computable Absolutely Pisot Normal Numbers”. In: Acta Arithmetica 184 (2018), pp. 7–29.
[38] M. Madritsch and B. Mance. “Construction of µ-normal sequences”. In: Monatshefte für Mathematik 179 (2016), pp. 259–280.
[39] B. Mance. “Cantor series constructions of sets of normal numbers”. In: Acta Arithmetica 156 (2012), pp. 223–245.
[40] W. Merkle and J. Reimann. “Selection Functions that Do Not Preserve Normality”.
In: Theory Comput. Syst.
[41] S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers Inc., 2005.
[42] Y. Nakai and I. Shiokawa. “Discrepancy estimates for a class of normal numbers”. In: Acta Arithmetica.
[43] I. Niven and H. S. Zuckerman. “On the definition of normal numbers”. In: Pacific J. Math.
[44] In: Journal of Computer and System Sciences.
[45] V. N. Agafonov. “Normal'nye posledovatel'nosti i konechnye avtomaty” [Normal sequences and finite automata]. In: Problemy kibernetiki. Ed. by A. A. Lyapunov. Vol. 20. Nauka, Akademii nauk SSSR, 1968, pp. 123–129.
[46] V. N. Agafonov. “Normal'nye posledovatel'nosti i konechnye avtomaty” [Normal sequences and finite automata]. In: Dokl. AN SSSR.
[47] A. G. Postnikov. “Arifmeticheskoe modelirovanie sluchainykh protsessov” [Arithmetic modeling of random processes]. In: Tr. MIAN SSSR 57 (1960), pp. 3–84.
[48] A. G. Postnikov and I. I. Pyatetskii-Shapiro. “Normal'nye po Bernulli posledovatel'nosti znakov” [Sequences of symbols that are Bernoulli-normal]. In: Izv. AN SSSR. Ser. matem.
[49] L. P. Postnikova. “O svyazi ponyatii kollektiva Mizesa–Chercha i normal'noi po Bernulli posledovatel'nosti znakov” [On the connection between the concepts of a Mises–Church collective and a Bernoulli-normal sequence of symbols]. In: Teoriya veroyatn. i ee primen.
[50] Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 2003.
[51] P. Pollack and J. Vandehey. “Some Normal Numbers Generated by Arithmetic Functions”. In: Canadian Mathematical Bulletin 58 (Sept. 2013).
[52] L. Postnikova. “On the connection between the concepts of collectives of Mises–Church and normal Bernoulli sequences of symbols”. In: Theory of Probability & Its Applications.
[53] In: Mathematische Zeitschrift.
[54] In: Annales de l’institut Henri Poincaré. Vol. 7.5. 1937, pp. 267–348.
[55] A.-M. Scheerer. “Computable Absolutely Normal Numbers and Discrepancies”. In: Mathematics of Computation 86 (Nov. 2015).
[56] C. Schnorr and H. Stimm. “Endliche Automaten und Zufallsfolgen”. In: Acta Informatica.
[57] Basics of Applied Stochastic Processes. Probability and Its Applications. Springer-Verlag, 2009.
[58] A. Shen. “Automatic Kolmogorov Complexity and Normality Revisited”. In: Fundamentals of Computation Theory - 21st International Symposium, FCT 2017, Bordeaux, France, September 11–13, 2017, Proceedings. Ed. by R. Klasing and M. Zeitoun. Vol. 10472. Lecture Notes in Computer Science. Springer, 2017, pp. 418–430.
[59] P. Shields. The Theory of Bernoulli Shifts. Univ. Chicago Press, 1973.
[60] W. Sierpinski. “Démonstration élémentaire du théorème de M. Borel sur les nombres absolument normaux et détermination effective d’un tel nombre”. In: Bulletin de la Société Mathématique de France 45 (1917), pp. 125–132.
[61] J. F. Silva and P. Piantanida. “Almost lossless variable-length source coding on countably infinite alphabets”. 2016, pp. 1–5.
[62] R. Stephens. “A Survey of Stream Processing”. In: Acta Informatica.
[63] In: Journal für die reine und angewandte Mathematik.
[64] In: Journal of Number Theory 166 (2016), pp. 424–451.
[65] J. Vandehey. “The normality of digits in almost constant additive functions”. In: Monatshefte für Mathematik 171 (June 2012).
[66] J. Vandehey. “Uncanny subsequence selections that generate normal numbers”. In: Uniform Distribution Theory 12 (2017), pp. 65–75.
[67] R. Von Mises. “Grundlagen der Wahrscheinlichkeitsrechnung”. In: Mathematische Zeitschrift