Bernoulli Randomness and Biased Normality
Andrew DeLapo ∗ [email protected] July 2020
Abstract
One can consider µ-Martin-Löf randomness for a probability measure µ on 2^ω, such as the Bernoulli measure µ_p given p ∈ (0, 1). We consider Bernoulli measures on n^ω with parameters p_0, p_1, . . . , p_{n−1}, and we introduce a biased version of normality. We prove that every Bernoulli random real is normal in the biased sense, and this has the corollary that the set of biased normal reals has full Bernoulli measure in n^ω. We give an algorithm for computing biased normal sequences from normal sequences, so that we can give explicit examples of biased normal reals. We investigate an application of randomness to iterated function systems. Finally, we list a few further questions relating to Bernoulli randomness and biased normality.

This paper roughly follows the historical development of normal numbers and algorithmic randomness. Borel [1] first described normal numbers in 1909, and Pillai [2] shortened Borel's definition in 1940. One decade later, Niven and Zuckerman [3] proved an equivalent formulation of normality in terms of blocks of digits. Although Borel also showed in 1909 that almost all real numbers are normal in every base, where the measure is the Lebesgue measure, the first explicit construction of a normal number did not appear until 1933, by Champernowne [4]. In 1966, Martin-Löf [5] defined randomness criteria in terms of geometrically shrinking and uniformly computably enumerable open sets, and it can be shown that, in the Lebesgue measure, all Martin-Löf-random numbers are normal in every base.

After introducing preliminary notation, definitions, and theorems in the remainder of this section, we begin in Section 2 with a description of normality with respect to given biases on each digit in the base. This definition is written to follow Borel's original definition of normality. We then prove a redundancy in our definition, as Pillai showed in Borel's definition. We follow this with a definition of biased normality in terms of blocks, as Niven and Zuckerman proved. The equivalences allow us to prove that, fixing b biases p = (p_0, p_1, . . . , p_{b−1}) adding up to 1 and using the Bernoulli measure µ_p on b^ω, all µ_p-Martin-Löf-random numbers are biased normal with respect to p. In Section 3, we give an algorithm which, given rational biases, uses a normal number to construct a biased normal number with respect to the biases. Section 4 describes an application of biased normal numbers to iterated function systems, and Section 5 lists further open questions.

∗ This work was the author's senior honors thesis, completed in the Department of Mathematics at the University of California, Berkeley, supervised by Professor Theodore Slaman.
A base is an integer n ≥ 2. Let n^ω denote the set of infinite n-ary sequences, where n is a base. We identify n^{<ω} as the set of finite n-ary sequences, which we also call blocks. For a given ℓ ∈ N, let n^ℓ be the set of n-ary sequences of length ℓ. If σ ∈ n^{<ω}, then let [[σ]] ⊆ n^ω be the set of infinite sequences which extend σ.

If σ is a (finite or infinite) n-ary sequence, we will index the entries in σ by σ[i], where σ[0] is the first entry of the sequence. The subsequence of σ from index i to index j, inclusive, is σ[i : j]. If σ is finite, then the length of σ is len(σ). If σ_1, σ_2 ∈ n^{<ω}, then σ_1σ_2 is the concatenation of σ_1 and σ_2. The number of occurrences of a base n block ρ inside σ is occ(σ, ρ). The empty sequence is denoted ε.

The base b representation of a real number r ∈ [0, 1] is denoted (r)_b and refers to the sequence in b^ω such that

r = Σ_{i=1}^{∞} (r)_b[i − 1] · b^{−i}

and such that (r)_b includes infinitely many instances of digits which are not b − 1.
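The counting conventions above are straightforward to make concrete. The following Python sketch (the helper names occ and base_b_digits are ours, not fixed by the paper) illustrates overlapping occurrence counting and a finite, floating-point approximation of a base b representation:

```python
def occ(sigma, rho):
    """Number of (possibly overlapping) occurrences of the block rho inside sigma."""
    r = len(rho)
    return sum(1 for i in range(len(sigma) - r + 1) if sigma[i:i + r] == rho)

def base_b_digits(r, b, n):
    """First n digits of a base-b expansion of r in [0, 1]; a floating-point approximation only."""
    digits = []
    for _ in range(n):
        r *= b
        d = min(int(r), b - 1)   # guard against floating-point overshoot
        digits.append(d)
        r -= d
    return digits

print(occ("0121011122021221", "01"))   # -> 2
print(base_b_digits(0.625, 2, 4))      # -> [1, 0, 1, 0]
```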
Definition 1.1. A Borel probability measure on n^ω is a countably additive, monotone function µ : F → [0, 1], where F is the Borel σ-algebra of n^ω and µ(n^ω) = 1. Since a Borel probability measure is uniquely determined by the values it takes on finite unions of basic open cylinders, when giving a Borel probability measure it is sufficient to specify a function ρ : n^{<ω} → [0, 1] satisfying ρ(ε) = 1, where ε is the empty sequence, and

ρ(σ) = Σ_{i=0}^{n−1} ρ(σi),

where σi denotes the concatenation of σ with i as a symbol in base n. The resulting measure sets µ([[σ]]) = ρ(σ). For this paper, we will refer to Borel probability measures as measures and only identify the underlying function on blocks, so that µ([[σ]]) is written as µ(σ).

Definition 1.2. The Lebesgue measure λ on n^ω is the measure given by setting

λ(σ) = 1 / n^{len(σ)}

for each σ ∈ n^{<ω}.

Definition 1.3. The Bernoulli measure µ_p on n^ω, with associated positive probabilities p = (p_0, p_1, . . . , p_{n−1}) satisfying Σ_{i=0}^{n−1} p_i = 1, is the measure given by setting

µ_p(σ) = p_{σ[0]} p_{σ[1]} · · · p_{σ[len(σ)−1]}

for each σ ∈ n^{<ω}. Note that the Lebesgue measure on n^ω is exactly the Bernoulli measure on n^ω obtained by setting p_i = 1/n for each i.

Definition 1.4 (Martin-Löf [5], see also [6]). Let µ be a measure on n^ω and z ∈ n^ω. A µ-Martin-Löf test relative to z is a uniformly computably enumerable (relative to z) sequence (U_i)_{i∈ω} of subsets of n^ω with µ(U_i) ≤ 2^{−i} for every i ∈ N. Say x ∈ n^ω passes the test if x ∉ ∩_{i∈ω} U_i. If x passes every µ-Martin-Löf test relative to z, then x is µ-Martin-Löf random relative to z.

Definition 1.5. If x ∈ n^ω is µ_p-Martin-Löf random for the Bernoulli measure µ_p with some probabilities p = (p_0, p_1, . . . , p_{n−1}), then x is Bernoulli random with respect to the parameters p. Bernoulli randomness for binary sequences has been studied by Porter in [7].
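As a quick illustration of Definitions 1.1 through 1.3, here is a small Python sketch (the helper name bernoulli_measure is ours); it also checks the cylinder condition ρ(σ) = Σ_i ρ(σi) numerically:

```python
from math import prod

def bernoulli_measure(sigma, p):
    """mu_p(sigma): the product of p[digit] over the digits of the block sigma."""
    return prod(p[int(ch)] for ch in sigma)

p = [0.5, 0.25, 0.25]        # biases p_0, p_1, p_2 on base-3 digits, summing to 1
lam = [1/3, 1/3, 1/3]        # the Lebesgue measure is the uniform special case

sigma = "201"
# cylinder condition: rho(sigma) = sum over i of rho(sigma concatenated with i)
total = sum(bernoulli_measure(sigma + str(i), p) for i in range(3))
print(bernoulli_measure(sigma, p), total)   # both are 0.03125
```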
Definition 1.6. A real number x is simply normal to base b if every base b digit d ∈ {0, 1, . . . , b − 1} appears with density 1/b in (x)_b. That is,

lim_{n→∞} occ((x)_b[0 : n − 1], d) / n = 1/b.

Borel characterized normality in the following way.
Definition 1.7 (Borel [1]). A real number x is normal to base b if for every natural number n and positive integer k, b^n x is simply normal to base b^k.
Example 1.8. In 1933, Champernowne [4] gave an explicit real number which is normal to base 10:

C_{10} = 0.0123456789101112131415 . . .

In general, let C_n denote the real number with the base n representation obtained by concatenating the base n representations of 0, 1, 2, 3, . . . in order. C_n is normal to base n.
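A short Python sketch (ours) generates a prefix of C_n and checks the digit densities empirically; following the convention used in the later examples of this paper, the concatenation starts from 0:

```python
def champernowne(n, length):
    """First `length` digits of the base-n sequence obtained by concatenating 0, 1, 2, ..."""
    digits, k = [], 0
    while len(digits) < length:
        rep = [0] if k == 0 else []
        m = k
        while m:                 # base-n representation of k, most significant digit first
            rep.append(m % n)
            m //= n
        if k:
            rep.reverse()
        digits.extend(rep)
        k += 1
    return digits[:length]

prefix = champernowne(10, 10**6)
for d in range(10):
    print(d, prefix.count(d) / len(prefix))
# densities are near 1/10; convergence is slow, especially for the digit 0
```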
Example 1.9. Among the results by Copeland and Erdős in [8] is the fact that the real number CE_n obtained by concatenating the primes in base n in order is normal to base n. Then

CE_{10} = 0.235711131719232931 . . .
CE_2 = 0.101110111110111101 . . .

In 1940, Pillai simplified Borel's definition with the following theorem.
Theorem 1.10 (Pillai [2]). A real number x is normal to base b if and only if for every positive integer k, x is simply normal to base b^k.

In 1950, another equivalence was proven by Niven and Zuckerman.

Theorem 1.11 (Niven and Zuckerman [3]). A real number x is normal to base b if and only if for every positive integer ℓ, every block w ∈ b^ℓ appears in (x)_b with frequency 1/b^ℓ:

lim_{n→∞} occ((x)_b[0 : n − 1], w) / n = 1 / b^ℓ.
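Theorem 1.11's block criterion is also easy to test empirically on a prefix of a concrete sequence; the following Python sketch (ours) counts length 3 blocks in a base 2 Champernowne-style prefix:

```python
from itertools import product

def to_base(k, b):
    if k == 0:
        return [0]
    rep = []
    while k:
        rep.append(k % b)
        k //= b
    return rep[::-1]

def block_frequencies(seq, ell, b):
    """Empirical frequency of every base-b block of length ell in the given prefix."""
    counts = {blk: 0 for blk in product(range(b), repeat=ell)}
    for i in range(len(seq) - ell + 1):
        counts[tuple(seq[i:i + ell])] += 1
    total = len(seq) - ell + 1
    return {blk: c / total for blk, c in counts.items()}

# base-2 Champernowne-style prefix: concatenate 0, 1, 10, 11, 100, ...
seq, k = [], 0
while len(seq) < 200000:
    seq.extend(to_base(k, 2))
    k += 1
seq = seq[:200000]

print(block_frequencies(seq, 3, 2))
# values cluster around 1/8 = 0.125 (convergence is slow)
```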
One important connection between normal numbers and algorithmic randomness is the following theorem.

Theorem 1.12. Every λ-Martin-Löf random real is absolutely normal, that is, normal in every base.

The goal of this section is to prove a version of Theorem 1.12 for Bernoulli random numbers. To do this, we define a notion of normality given biases on the digits. We will mirror the historical development of normality by generalizing Borel's original definitions of simply normal and normal to allow for given biases on the digits. In base b, the biases p_0, p_1, . . . , p_{b−1}, also called "densities" or "probabilities", will be assumed to be positive real numbers adding to 1.
Definition 2.1. A real number x is biased simply normal to the biases p_0, p_1, . . . , p_{b−1} if each base b digit d ∈ {0, 1, . . . , b − 1} appears with density p_d in (x)_b. That is,

lim_{n→∞} occ((x)_b[0 : n − 1], d) / n = p_d.
Definition 2.2. A real number x is biased normal with respect to the biases p_0, p_1, . . . , p_{b−1} if for every natural number n and positive integer k, b^n x is biased simply normal to base b^k with the biases p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]}

and where (i)_{b,k} denotes the base b representation of i with sufficient zero-padding so that it has exactly k digits.

Let k be any positive integer. Let p = (p_0, p_1, . . . , p_{b−1}) and p*_k = (p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}). Let v be a base b block, and let (v)_{b^k} be v considered as a block in base b^k. For any k, if k divides len(v), then the p*_k are such that µ_p(v) = µ_{p*_k}((v)_{b^k}).
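The induced biases p*_{k,i} can be computed directly from Definition 2.2; a small Python sketch (the function name star_biases is ours):

```python
from itertools import product
from math import prod, isclose

def star_biases(p, k):
    """p*_{k,i} for i = 0, ..., b^k - 1, where b = len(p): the product of the biases
    of the k base-b digits of i, zero-padded on the left to length k."""
    b = len(p)
    return [prod(p[d] for d in digits)
            for digits in product(range(b), repeat=k)]   # digits enumerate i = 0, 1, ..., b^k - 1

p = [0.5, 0.3, 0.2]
for k in (1, 2, 3):
    assert isclose(sum(star_biases(p, k)), 1.0)          # the p*_{k,i} are again biases
print(star_biases(p, 2))   # e.g. p*_{2,5} = p_1 * p_2 = 0.06, since 5 = (12) in base 3
```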
As shown for the case of normality in Theorems 1.10 and 1.11, the definition of biased normal can be simplified. To prove this, we will require the following definition.

Definition 2.3. Let w be a length ℓ block of digits in base b. Let p_0, p_1, . . . , p_{b−1} be biases. Then the simple discrepancy of w with respect to the biases is

max_{d ∈ {0,1,...,b−1}} | occ(w, d)/ℓ − p_d |.
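Definition 2.3 translates directly into code; a Python sketch (ours):

```python
def simple_discrepancy(w, p):
    """max over digits d of | occ(w, d)/len(w) - p_d |, where b = len(p)."""
    ell = len(w)
    return max(abs(w.count(d) / ell - p[d]) for d in range(len(p)))

p = [2/3, 1/3]
print(simple_discrepancy([0, 0, 1, 0, 1, 0], p))   # ~ 0: digit counts match the biases exactly
print(simple_discrepancy([1, 1, 1, 0, 0, 0], p))   # 1/6, about 0.1667
```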
Lemma 2.4. Fix a base b, a digit d, and a block length k. Let S_i ⊆ b^k be the set of blocks of length k containing exactly i instances of d. The Bernoulli measure of S_i is

µ_p(S_i) = \binom{k}{i} p_d^i (1 − p_d)^{k−i}.
Proof. We know that the number of blocks in S_i is

|S_i| = \binom{k}{i} (b − 1)^{k−i},

since there are \binom{k}{i} choices for where to put the i instances of d and k − i places where one of the other b − 1 digits appears. Without loss of generality, assume d = 0. For w ∈ S_i, let n_e = occ(w, e) for a digit e in base b. The measure of any such w is

µ_p(w) = p_0^i ∏_{m=1}^{b−1} p_m^{n_m}.

To find the measure of S_i, we can take the sum of the measures over all such w with digit counts n_1, n_2, . . . , n_{b−1} ∈ N such that Σ_{m=1}^{b−1} n_m = k − i. The number of such w is

Σ_{n_1 + n_2 + · · · + n_{b−1} = k−i} \binom{k}{i} \binom{k−i}{n_1, n_2, . . . , n_{b−1}},

where

\binom{k−i}{n_1, n_2, . . . , n_{b−1}} = (k − i)! / (n_1! n_2! · · · n_{b−1}!)

is the multinomial coefficient. This is because there are \binom{k}{i} many choices for the locations of d = 0, and for each sum n_1 + n_2 + · · · + n_{b−1} = k − i there are \binom{k−i}{n_1, . . . , n_{b−1}} different length k − i sequences w' with occ(w', e) = n_e for each e from 1 to b − 1. So the measure of S_i is

µ_p(S_i) = Σ_{n_1 + · · · + n_{b−1} = k−i} \binom{k}{i} \binom{k−i}{n_1, . . . , n_{b−1}} p_0^i ∏_{j=1}^{b−1} p_j^{n_j}
         = \binom{k}{i} p_0^i Σ_{n_1 + · · · + n_{b−1} = k−i} \binom{k−i}{n_1, . . . , n_{b−1}} ∏_{j=1}^{b−1} p_j^{n_j}.

By the multinomial theorem [9],

Σ_{n_1 + · · · + n_{b−1} = k−i} \binom{k−i}{n_1, . . . , n_{b−1}} ∏_{j=1}^{b−1} p_j^{n_j} = ( Σ_{j=1}^{b−1} p_j )^{k−i}.

Therefore

µ_p(S_i) = \binom{k}{i} p_0^i ( Σ_{j=1}^{b−1} p_j )^{k−i},

and we know Σ_{j=1}^{b−1} p_j = 1 − p_0, so

µ_p(S_i) = \binom{k}{i} p_0^i (1 − p_0)^{k−i},

which is the desired equality for d = 0.
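For small b and k, the identity in Lemma 2.4 can be checked by brute-force enumeration of all blocks; a Python sketch (ours, with arbitrary illustrative parameters):

```python
from itertools import product
from math import comb, prod, isclose

b, k, d = 3, 5, 0
p = [0.5, 0.3, 0.2]

for i in range(k + 1):
    # exact measure of S_i, enumerating every base-b block of length k
    brute = sum(prod(p[x] for x in w)
                for w in product(range(b), repeat=k)
                if w.count(d) == i)
    formula = comb(k, i) * p[d] ** i * (1 - p[d]) ** (k - i)
    assert isclose(brute, formula)
print("Lemma 2.4 verified for b = 3, k = 5")
```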
Lemma 2.5. Let 0 < ε < min(p_0, . . . , p_{b−1}). Fix a block length k. Say that a block w of length k is "bad" for a digit d if occ(w, d) ≤ (p_d − ε)k or occ(w, d) ≥ (p_d + ε)k. Let B be the set of such w:

B = { w ∈ b^k : | occ(w, d) − p_d k | ≥ εk }.

Then the Bernoulli measure of B in b^ω with parameters p_0, p_1, . . . , p_{b−1} is at most 2e^{−2ε²k}.
Proof. Let i be an integer such that 0 ≤ i ≤ k. Let B_i be the set of blocks of length k containing exactly i instances of the digit d. The Bernoulli measure of B_i in b^ω with parameters p_0, p_1, . . . , p_{b−1} is, by Lemma 2.4,

µ_p(B_i) = \binom{k}{i} (p_d)^i (1 − p_d)^{k−i}.

Notice that this is the binomial distribution with k trials and i successes, where the probability of success is p_d. To calculate µ_p(B), we have

B = ∪_{i=0}^{⌊(p_d − ε)k⌋} B_i  ∪  ∪_{i=⌈(p_d + ε)k⌉}^{k} B_i,

where all the unions are of pairwise disjoint sets. Then

µ_p(B) = Σ_{i=0}^{⌊(p_d − ε)k⌋} µ_p(B_i) + Σ_{i=⌈(p_d + ε)k⌉}^{k} µ_p(B_i).

We expand both appearances of µ_p(B_i) as above:

µ_p(B) = Σ_{i=0}^{⌊(p_d − ε)k⌋} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i} + Σ_{i=⌈(p_d + ε)k⌉}^{k} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i}.

Apply Hoeffding's inequality [10] on the tail ends of the binomial distribution to get that

Σ_{i=0}^{⌊(p_d − ε)k⌋} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i} ≤ e^{−2ε²k}

and

Σ_{i=⌈(p_d + ε)k⌉}^{k} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i} ≤ e^{−2ε²k}.

It follows that µ_p(B) ≤ 2e^{−2ε²k}.
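Likewise, the bound in Lemma 2.5 can be compared against the exact binomial tail for concrete parameters; a Python sketch (ours, with illustrative values of p_d, ε, and k):

```python
from math import comb, exp, floor, ceil

def exact_bad_measure(p_d, k, eps):
    """mu_p(B): exact binomial probability that |occ(w, d) - p_d*k| >= eps*k."""
    lo, hi = floor((p_d - eps) * k), ceil((p_d + eps) * k)
    return sum(comb(k, i) * p_d ** i * (1 - p_d) ** (k - i)
               for i in range(k + 1) if i <= lo or i >= hi)

p_d, eps = 0.3, 0.05
for k in (50, 200, 1000):
    print(k, exact_bad_measure(p_d, k, eps), 2 * exp(-2 * eps ** 2 * k))
# in each row the exact measure stays below the Hoeffding bound 2*e^{-2*eps^2*k}
```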
Definition 1.7, Theorem 1.10, and Theorem 1.11 give three equivalent definitions of normality. The next three lemmas accomplish the same task for biased normality.

Lemma 2.6. If x is biased normal to p_0, p_1, . . . , p_{b−1}, then for every positive integer k, x is biased simply normal to p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]}.

Proof. This lemma follows immediately from the definition of biased normal, as it is a special case of the definition.
Lemma 2.7. If for every positive integer k, x is biased simply normal to p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]},

then for each positive integer r and each block v ∈ b^r,

lim_{n→∞} occ((x)_b[0 : n − 1], v) / n = ∏_{j=0}^{r−1} p_{v[j]} = µ_p(v).
Proof. Fix r and v ∈ b^r, and let ε > 0. By Lemma 2.5, there is a sufficiently large positive integer N_0 such that for all N ≥ N_0, all but a µ_p-measure at most ε subset B of the length N base b blocks have simple discrepancy less than ε when parsed in length r intervals starting from index 0. Moreover, we argue that N can be made sufficiently large so that for each m from 0 to r − 1, all but a µ_p-measure at most b^m ε subset B_m of the length N base b blocks have simple discrepancy less than ε when parsed in length r intervals starting from index m. The µ_p-measure of each B_m is at most b^m ε because each length N − m sequence extends to a length N sequence in b^m many ways, and we know µ_p(B_0) ≤ ε. Thus the measure of ∪_{m=0}^{r−1} B_m is at most Σ_{m=0}^{r−1} b^m ε ≤ b^r ε.

We compute an upper bound on the eventual frequency of v in (x)_b. Parse (x)_b into length N subblocks starting from index 0, where N will be taken sufficiently large as determined by the following analysis. Because x is biased simply normal in base b^N, there is a positive integer ℓ_0 such that for all ℓ ≥ ℓ_0, every w ∈ b^N occurs within ε of its expected frequency in the first ℓ digits of (x)_{b^N}. That is,

| occ((x)_{b^N}[0 : ℓ − 1], w) / ℓ − µ_{p*_N}(w) | ≤ ε

for every w ∈ b^N, where p*_N = (p*_{N,0}, . . . , p*_{N,b^N−1}). Parsing (x)_b in length N blocks, instances of v in (x)_b can occur in three different ways. If an instance of v is not contained within a length N block when parsing (x)_b into length N subblocks starting from index 0, then v begins in one block and ends in the next block. All other instances of v lie entirely within one length N subblock, and we say such a block w is "good" if | occ(w, v)/N − µ_p(v) | ≤ ε, or "bad" otherwise. If an instance of v is contained in a length N block w, then we consider separately the cases that the block is good or bad.

There are ℓ(r − 1)/N many length r blocks that start in one length N block and end in the next. Some of those ℓ(r − 1)/N blocks could be instances of v, and none of them are counted in the above computation. Assume that all ℓ(r − 1)/N of these blocks are instances of v. Since N is made arbitrarily large, ℓ(r − 1)/N < εℓ.

Next, we bound the occurrences of v in bad length N subblocks. By Lemma 2.5, the subset B of bad length N blocks has µ_p-measure at most 2e^{−2ε²N}. Since N is made arbitrarily large, we can assume 2e^{−2ε²N} ≤ ε. Assume every bad length N block has N − r + 1 occurrences of v, the maximum possible number of occurrences. By the choice of ℓ, the number of digits in (x)_{b^N}[0 : ℓ − 1] which are bad base b length N blocks is at most εℓ. We are assuming each of these bad blocks contains N − r + 1 instances of v, so the number of instances of v in bad blocks is at most ε(N − r + 1)ℓ.

Similarly, let G be the set of good length N blocks. There are at most ℓ many elements of G among the digits of (x)_{b^N}[0 : ℓ − 1], and in each of them v occurs within ε of its expected frequency. The number of instances of v in good blocks is at most ℓ(µ_p(v) + ε)(N − r + 1).

We have counted the instances of v in (x)_b[0 : Nℓ − 1] between two length N blocks, inside bad blocks, and inside good blocks. Now we can compute an upper bound on the frequency of v in the first Nℓ digits of (x)_b. We have

occ((x)_b[0 : Nℓ − 1], v) / (Nℓ) ≤ [ εℓ + ε(N − r + 1)ℓ + ℓ(µ_p(v) + ε)(N − r + 1) ] / (Nℓ)

by the above. Additionally,

[ εℓ + ε(N − r + 1)ℓ + ℓ(µ_p(v) + ε)(N − r + 1) ] / (Nℓ) = [ ε + ε(N − r + 1) + (µ_p(v) + ε)(N − r + 1) ] / N,

and since N − r + 1 ≤ N,

[ ε + ε(N − r + 1) + (µ_p(v) + ε)(N − r + 1) ] / N ≤ [ ε + εN + (µ_p(v) + ε)N ] / N = ε/N + 2ε + µ_p(v).

Therefore

occ((x)_b[0 : Nℓ − 1], v) / (Nℓ) ≤ ε/N + 2ε + µ_p(v),

which approaches µ_p(v) as required. The computation for a lower bound on the eventual frequency of v in (x)_b can be made in a way analogous to the computation above; again parsing (x)_b in length N subblocks, assume that all occurrences of v are within good length N blocks. By Lemma 2.5, there are at least (1 − ε)ℓ many good length N blocks when ℓ is sufficiently large. Each good length N block must contain at least (N − r + 1)(µ_p(v) − ε) instances of v. Then the number of occurrences of v is at least (1 − ε)ℓ(N − r + 1)(µ_p(v) − ε), and one can check that the frequency of v in (x)_b[0 : Nℓ − 1] again approaches µ_p(v) as required.

Lemma 2.8. If x is such that for every positive integer r and every block v ∈ b^r,

lim_{n→∞} occ((x)_b[0 : n − 1], v) / n = ∏_{j=0}^{r−1} p_{v[j]} = µ_p(v),

then x is biased normal as in Definition 2.2.
Proof. This proof is similar to a proof by Cassels in [11] for the case of normality, and we use similar notation. Let f and g be base b blocks of lengths r and s respectively, with s ≥ r. For a given integer m from 0 to r − 1, R_m(g, f) is the number of solutions to g[n : n + r − 1] = f with n ≡ m (mod r). Then R_m(g, f) ≤ s − r + 1.

Let ε > 0. Fix a block v in base b of length r, and let s ≥ r be a positive integer, chosen large enough that 2e^{−2ε²⌊s/r⌋} ≤ ε. Consider v as a digit in base b^r. Let B be the set of length s base b blocks with simple discrepancy at least ε (with respect to p*_r) when parsed in length r intervals. By Lemma 2.5, µ_p(B) ≤ 2e^{−2ε²⌊s/r⌋} ≤ ε, and for a length s block g not in B and each m,

R_m(g, v) ≤ (µ_p(v) + ε)(s − r + 1)/r.     (2.1)

Let N be a large positive integer. For each m, let j_m be the number of occurrences of v at starting indices congruent to m modulo r inside the length s blocks of (x)_b[0 : N − 1], counted over all N − s + 1 such blocks, and let i_m = (s − r + 1) R_m((x)_b[0 : N − 1], v), which counts each occurrence of v once for each of the s − r + 1 length s blocks that could contain it. Occurrences of v near the ends of (x)_b[0 : N − 1] lie in fewer than s − r + 1 of the length s blocks, which contribute less than s − r + 1 to j_m. Then for each m, |i_m − j_m| ≤ s.

By the hypothesis on the block frequencies of (x)_b, for N sufficiently large at most 2εN of the length s blocks appearing in (x)_b[0 : N − 1] are members of B. Each of the 2εN blocks appearing in (x)_b[0 : N − 1] from B contributes at most s − r + 1 occurrences of v. For length s blocks appearing in (x)_b[0 : N − 1] which are not members of B, v appears at starting indices equivalent to m mod r with frequency at most (µ_p(v) + ε)/r by equation (2.1), so the number of these occurrences of v in such length s blocks is at most (µ_p(v) + ε)(s − r + 1)/r. There are at most N − s + 1 length s blocks. This gives the upper bound

j_m ≤ 2εN(s − r + 1) + (N − s + 1)(µ_p(v) + ε)(s − r + 1)/r

for each m. Then an upper bound on j_m/(s − r + 1) is

j_m/(s − r + 1) ≤ 2εN + (N − s + 1)µ_p(v)/r + ε(N − s + 1)

for each m, where, to match the bounds given by Cassels, we have used the fact that ε/r ≤ ε. Note that

| i_m/(s − r + 1) − j_m/(s − r + 1) | ≤ s/(s − r + 1)

and

i_m/(s − r + 1) = R_m((x)_b[0 : N − 1], v),

since |i_m − j_m| ≤ s and by definition of i_m. Thus

| R_m((x)_b[0 : N − 1], v) − (N − s + 1)µ_p(v)/r | ≤ s/(s − r + 1) + ε(N − s + 1) + 2εN

and

lim sup_{N→∞} | R_m((x)_b[0 : N − 1], v)/N − µ_p(v)/r | ≤ 3ε.

Since ε is arbitrarily small, we therefore have

lim_{N→∞} R_m((x)_b[0 : N − 1], v)/N = µ_p(v)/r

for each m from 0 to r − 1. Conclude that x is biased normal as in Definition 2.2.

Together, Lemmas 2.6, 2.7, and 2.8 prove the following corollary.

Corollary 2.9. Let x be a real number. Fix a base b and densities p_0, . . . , p_{b−1}. The following are equivalent.

(1) x is biased normal as in Definition 2.2.

(2) For every positive integer k, x is biased simply normal to p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]}.

(3) For each positive integer r and for each v ∈ b^r,

lim_{n→∞} occ((x)_b[0 : n − 1], v) / n = ∏_{j=0}^{r−1} p_{v[j]} = µ_p(v).

Theorem 2.10. Let x be a Bernoulli random real with biases p_0, p_1, . . . , p_{b−1}. Then x is biased normal with respect to p_0, p_1, . . . , p_{b−1}.

Proof. We will construct a µ_p-Martin-Löf test. Let 0 < ε < min(p_0, . . . , p_{b−1}). Let k_0 be the least integer such that the bound of Lemma 2.5 applies for ε and b. For each integer k ≥ k_0, let

B_k = ∪_{N > k} { w ∈ b^N : | occ(w, d) − p_d N | > εN for some digit d in base b }.

Then

µ_p(B_k) ≤ Σ_{N > k} 2e^{−2ε²N} ≤ ∫_k^∞ 2e^{−2ε²N} dN = e^{−2ε²k}/ε².

Suppose x is not biased normal to the densities p_0, . . . , p_{b−1}. By Corollary 2.9, x is equivalently not biased simply normal to base b^n for some positive integer n and densities p*_{n,0}, . . . , p*_{n,b^n−1} as defined in Corollary 2.9. Applying the construction above with base b^n and the densities p*_n in place of b and p, we get x ∈ ∩_{k ≥ k_0} B_k, and x fails the µ_p-Martin-Löf test.

Corollary 2.11. Fixing densities p_0, p_1, . . . , p_{b−1}, the set of biased normal reals has Bernoulli measure 1.

As another corollary of Theorem 2.10, we can prove Theorem 1.12.

Theorem 1.12. Every λ-Martin-Löf random real is absolutely normal, that is, normal in every base.

Proof. Let x be a λ-Martin-Löf-random real. Let b be any base, and let p = (p_0, p_1, . . . , p_{b−1}) where p_i = 1/b for all i. Because the Bernoulli measure with parameters p is the Lebesgue measure, and x is λ-Martin-Löf-random, it follows that x is Bernoulli random with parameters p. By Theorem 2.10, x is biased normal with respect to p. The parameters p are uniform, so equivalently, x is normal to base b. Since b was arbitrary, deduce that x is absolutely normal.
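The tail bound used for the test sets B_k in the proof of Theorem 2.10 can be sanity-checked numerically; a Python sketch (ours, truncating the infinite sum at a finite number of terms):

```python
from math import exp

def tail_sum(eps, k, terms=5000):
    """Partial sum of sum over N > k of 2*e^{-2*eps^2*N}."""
    return sum(2 * exp(-2 * eps ** 2 * N) for N in range(k + 1, k + 1 + terms))

eps = 0.1
for k in (100, 200, 400):
    print(k, tail_sum(eps, k), exp(-2 * eps ** 2 * k) / eps ** 2)
# each partial sum stays below the closed-form bound e^{-2*eps^2*k} / eps^2
```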
We present a simple algorithm for computing a biased normal sequence by using a normal sequence, but we must assume that the given probabilities are rational numbers.

Construction 3.1. Let p_0, p_1, p_2, . . . , p_{n−1} be positive rational probabilities adding up to 1. For each i ∈ {0, 1, 2, . . . , n − 1}, let p_i = a_i/b_i, with a_i, b_i being positive coprime integers. Let d = lcm(b_0, b_1, . . . , b_{n−1}). Then there is a base n block g of length d containing exactly p_i d instances of each digit i, as p_i d is an integer. Assume g has the base n digits in increasing order. Next, let ν ∈ d^ω be a base d normal sequence. Construct the sequence β ∈ n^ω from ν by setting β[k] = g[ν[k]].

Example 3.2. Let p_0 = 2/3 and p_1 = 1/3. Then d = 3, and we can let g = 001. This means that for each k ∈ N, β[k] will be 0 if ν[k] is 0 or 1, and β[k] will be 1 if ν[k] is 2. If ν is Champernowne's base 3 sequence,

ν = 0121011122021221 . . . ,

then β begins

β = 0010000011010110 . . . .
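Construction 3.1 and Example 3.2 can be put directly into code; a Python sketch (ours), using the base d Champernowne sequence as the normal input ν. The final print reproduces the prefix of β from Example 3.2:

```python
from fractions import Fraction
from math import lcm

def construction_3_1(probabilities, nu_digits):
    """Map a base-d normal sequence nu to the biased sequence beta via beta[k] = g[nu[k]]."""
    p = [Fraction(q) for q in probabilities]
    assert sum(p) == 1
    d = lcm(*[q.denominator for q in p])
    g = []                                   # block of length d with p_i*d copies of digit i
    for i, q in enumerate(p):
        g.extend([i] * int(q * d))           # digits in increasing order
    return d, g, [g[digit] for digit in nu_digits]

def champernowne(base, length):
    digits, k = [], 0
    while len(digits) < length:
        rep = [0] if k == 0 else []
        m = k
        while m:
            rep.append(m % base)
            m //= base
        if k:
            rep.reverse()
        digits.extend(rep)
        k += 1
    return digits[:length]

# Example 3.2: p_0 = 2/3, p_1 = 1/3, so d = 3 and g = 001
d, g, beta = construction_3_1(["2/3", "1/3"], champernowne(3, 16))
print(d, g)                        # 3 [0, 0, 1]
print("".join(map(str, beta)))     # 0010000011010110, as in Example 3.2
```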
Theorem 3.3. In Construction 3.1, β is biased normal with respect to p_0, p_1, p_2, . . . , p_{n−1}.

Proof. Let w ∈ n^ℓ. By Corollary 2.9, it is sufficient to show that w has its expected frequency µ_p(w) in β. Let ν be the base d normal sequence used to construct β. We will rely on the normality of ν.

Define A_w to be the set of length ℓ blocks u in base d such that g[u[i]] = w[i] for all i from 0 to ℓ − 1. In other words, a block u ∈ A_w appears starting at index k in ν if and only if w appears starting at index k in β. The number of blocks in A_w is

|A_w| = ∏_{i=0}^{ℓ−1} (p_{w[i]} d) = d^ℓ ∏_{i=0}^{ℓ−1} p_{w[i]} = d^ℓ µ_p(w)

by construction of g. By normality of ν and Theorem 1.11, every base d block u of length ℓ appears with frequency 1/d^ℓ in ν:

lim_{k→∞} occ(ν[0 : k − 1], u) / k = 1/d^ℓ.

Let ε > 0. Then there exists k_0 ∈ N such that for all k ≥ k_0 and each u ∈ d^ℓ,

| occ(ν[0 : k − 1], u) / k − 1/d^ℓ | < ε.

Consider k ≥ k_0. For each u ∈ d^ℓ, let δ_u be such that |δ_u| ≤ ε and

occ(ν[0 : k − 1], u) / k = 1/d^ℓ + δ_u.

By the construction of β, we can count instances of w in β in terms of instances of u ∈ A_w appearing in ν:

occ(β[0 : k − 1], w) = Σ_{u ∈ A_w} occ(ν[0 : k − 1], u).

Then

occ(β[0 : k − 1], w) / k = Σ_{u ∈ A_w} occ(ν[0 : k − 1], u) / k = Σ_{u ∈ A_w} ( 1/d^ℓ + δ_u ).

Since |δ_u| ≤ ε, we then have

Σ_{u ∈ A_w} ( 1/d^ℓ − ε ) < occ(β[0 : k − 1], w) / k < Σ_{u ∈ A_w} ( 1/d^ℓ + ε ),

and we calculated |A_w| = d^ℓ µ_p(w), so

d^ℓ µ_p(w) ( 1/d^ℓ − ε ) < occ(β[0 : k − 1], w) / k < d^ℓ µ_p(w) ( 1/d^ℓ + ε ),

that is,

µ_p(w) − ε d^ℓ µ_p(w) < occ(β[0 : k − 1], w) / k < µ_p(w) + ε d^ℓ µ_p(w).

Thus

| occ(β[0 : k − 1], w) / k − µ_p(w) | < ε d^ℓ µ_p(w).

Since ε is arbitrarily small and d^ℓ µ_p(w) is constant, deduce that

lim_{k→∞} occ(β[0 : k − 1], w) / k = µ_p(w)

and that, by Corollary 2.9, β is biased normal with respect to the probabilities.

Because the translation described in Construction 3.1 is measure-preserving, computable, and continuous, we have the following theorem.

Theorem 3.4. Let x be a λ-Martin-Löf-random real, let b be a base, and let p_0, p_1, . . . , p_{b−1} be rational densities with least common denominator d as in Construction 3.1. Let β be the result of running Construction 3.1 on (x)_d. Then β is Bernoulli random with parameters p_0, p_1, . . . , p_{b−1}.

In his book Fractals Everywhere [12] on the theory of iterated function systems, Michael Barnsley presents two algorithms for computing the attractor of an IFS. The first "deterministic algorithm" constructs the attractor directly in iterated steps. The second "random iteration algorithm" (or "chaos game") plots hundreds of thousands of points, where each point is the image of a randomly selected transformation on the previous point, and the collection of points approximates the attractor of the IFS. In particular, Barnsley uses a computer's pseudorandom number generator to select the transformations. A famous attractor of an IFS is the Barnsley fern, shown in Figure 1.

We begin by reintroducing iterated function systems (with probabilities) and the random iteration algorithm.

Figure 1: The Barnsley fern.

The illustrations appearing in this paper are the output of a program written in Processing by the author. It is important to note now that the illustrations are of plots in Cartesian coordinates, but with the convention that the origin (0, 0) appears at the top-left of the image and with the y-axis increasing downwards rather than upwards. The x-axis increases to the right as usual. The source code for the program, including a Python version with a user interface, can be found at [13].

Definition 4.1. An iterated function system with probabilities consists of a metric space (X, d), a finite collection of transformations f_1, f_2, . . . , f_n : X → X, and a corresponding collection of real probabilities p_1, p_2, . . . , p_n, where 0 < p_i < 1 for each i, and Σ_{i=1}^{n} p_i = 1. An iterated function system with probabilities, often abbreviated IFS, is often presented as {X; f_1, f_2, . . . , f_n; p_1, p_2, . . . , p_n}. When the probabilities are omitted, one can assume that the probabilities are uniform, and p_i = 1/n for all i.

Definition 4.2. Let (X, d) be a metric space. A transformation f : X → X is a contraction mapping if there is a constant 0 ≤ s < 1 such that for all x, y ∈ X,

d(f(x), f(y)) ≤ s · d(x, y).

Definition 4.3. Let {X; w_1, w_2, . . . , w_n} be an IFS where each w_i is a contraction mapping. Barnsley calls such an IFS hyperbolic. Let H(X) denote the space whose points are the compact subsets of X, not including the empty set. One can check (see [12]) that the transformation W : H(X) → H(X) defined by

W(B) = ∪_{i=1}^{n} w_i(B)

has a unique fixed point A ∈ H(X); we have W(A) = A, and A is given by

A = lim_{n→∞} W^n(B)

for any B ∈ H(X). Then A is called the attractor of the IFS.
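Definition 4.3 suggests Barnsley's deterministic algorithm directly: iterate W on a finite approximation of any starting set. A Python sketch (ours), using the three midpoint maps that appear in Example 4.5 below:

```python
def w1(pt): x, y = pt; return (x / 2, y / 2)                    # halfway toward (0, 0)
def w2(pt): x, y = pt; return (x / 2, (y + 100) / 2)            # halfway toward (0, 100)
def w3(pt): x, y = pt; return ((x + 100) / 2, (y + 100) / 2)    # halfway toward (100, 100)

def deterministic_step(points, maps):
    """One application of W(B) = union of w_i(B), on a finite approximation of B."""
    return {m(pt) for pt in points for m in maps}

B = {(37.0, 81.0)}              # any nonempty starting set works
for _ in range(8):
    B = deterministic_step(B, (w1, w2, w3))
print(len(B))                   # up to 3^8 = 6561 points approximating the attractor
```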
Definition 4.4. One can use the random iteration algorithm to approximate the attractor of an IFS {X; f_1, f_2, . . . , f_n; p_1, p_2, . . . , p_n}. The random iteration algorithm proceeds as follows. First, set x_0 ∈ X arbitrarily. In cases where X = R², we will set x_0 = (0, 0). For each k ≥ 1, choose recursively and independently

x_k ∈ { f_1(x_{k−1}), f_2(x_{k−1}), . . . , f_n(x_{k−1}) },

where the probability that x_k = f_i(x_{k−1}) is p_i. The result of the random iteration algorithm is {x_n : n ∈ N} ⊆ X. By "randomly," Barnsley is referring to an unspecified level of randomness, but one that is at least as random as the pseudorandom number generator on a computer.

Example 4.5. In R², consider the three transformations

f_1(x, y) = ( x/2, y/2 ),
f_2(x, y) = ( x/2, (y + 100)/2 ),
f_3(x, y) = ( (x + 100)/2, (y + 100)/2 ).

Then f_1 can be thought of as taking (x, y) to the point halfway between itself and the origin. Similarly, f_2 takes (x, y) halfway to (0, 100), and f_3 takes (x, y) halfway to (100, 100). The attractor of the IFS {R²; f_1, f_2, f_3} (where the probabilities are uniform) is a Sierpinski triangle, as seen in Figure 2a. On the right, we use probabilities 0.8, 0.1, and 0.1 for f_1, f_2, and f_3 respectively, as seen in Figure 2b.

(a) The result of one million iterations of the random iteration algorithm on the IFS {R²; f_1, f_2, f_3; 1/3, 1/3, 1/3} from Example 4.5 is the Sierpinski triangle, with vertices at (0, 0), (0, 100), and (100, 100). (b) The result of one million iterations with probabilities 0.8, 0.1, and 0.1 for f_1, f_2, and f_3, respectively.

Figure 2: Two results of the random iteration algorithm with the same transformations but different probabilities. In each picture, a color is associated to each transformation, so that f_i(x, y) is given the color associated with f_i.

We modify the random iteration algorithm to instead use a pre-determined sequence to choose from the n transformations at each step.

Definition 4.6. Let {X; f_0, f_1, . . . , f_{n−1}} be an IFS. Let σ ∈ n^ω. The determined iteration algorithm is a modified version of the random iteration algorithm. Pick x_0 ∈ X arbitrarily as in the random algorithm, and pick x_n = f_{σ[n−1]}(x_{n−1}) for each n ≥ 1. The result of the determined iteration algorithm is {x_n : n ∈ N}.
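The random iteration algorithm (Definition 4.4) and the determined iteration algorithm (Definition 4.6) differ only in how the next transformation is chosen; a Python sketch (ours), omitting the plotting:

```python
import random

def random_iteration(maps, probs, steps, x0=(0.0, 0.0)):
    """Chaos game: at each step apply a map chosen with the given probabilities."""
    pts, x = [], x0
    for _ in range(steps):
        x = random.choices(maps, weights=probs)[0](x)
        pts.append(x)
    return pts

def determined_iteration(maps, sigma, x0=(0.0, 0.0)):
    """Same, but the map at step k is dictated by the digit sigma[k]."""
    pts, x = [], x0
    for digit in sigma:
        x = maps[digit](x)
        pts.append(x)
    return pts

# midpoint maps toward the four corners of the unit square, as in Example 4.7
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
maps = [lambda pt, c=c: ((pt[0] + c[0]) / 2, (pt[1] + c[1]) / 2) for c in corners]

cloud1 = random_iteration(maps, [0.25] * 4, 10000)
cloud2 = determined_iteration(maps, [0, 1, 2, 3, 1, 0, 1, 1, 1, 2, 1, 3, 2, 0, 2])  # first digits of C_4
print(len(cloud1), len(cloud2), cloud2[:3])
```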
Example 4.7. Let v_0 = (0, 0), v_1 = (0, 1), v_2 = (1, 0), v_3 = (1, 1) ∈ R², and consider the IFS {R²; f_0, f_1, f_2, f_3}, where each f_i is the midpoint transformation taking (x, y) to the point halfway between (x, y) and v_i. The attractor of this IFS is the unit square, and when the probability of each f_i is p_i = 1/4, the square is uniformly covered with points when the random iteration algorithm is applied, as in Figure 3a. Champernowne's base 4 sequence produces the result in Figure 3b. Because the first 15 digits of C_4 are

012310111213202,

the first 15 transformations chosen in the determined iteration algorithm are, in order,

f_0, f_1, f_2, f_3, f_1, f_0, f_1, f_1, f_1, f_2, f_1, f_3, f_2, f_0, f_2.

(a) A result of one million iterations of the random iteration algorithm on the IFS {R²; f_0, f_1, f_2, f_3} from Example 4.7 using a pseudorandom number generator. (b) The result of one million iterations of the determined iteration algorithm on the same IFS as in (a), with the transformations determined by C_4. (c) The result of one million iterations of the determined iteration algorithm on the same IFS as in (a), with the transformations determined by CE_4.

Figure 3: Comparing the random iteration algorithm with the determined iteration algorithm.

By the definition of normal, each transformation has the same chance of being applied to x_n as every other transformation. Not all iterated function systems use uniform probabilities, however. Barnsley's fern, for example, uses four transformations with probabilities 0.85, 0.07, 0.07, and 0.01.

Finally, we list a few further questions relating to Bernoulli randomness and biased normality.

(1) One can characterize normality with respect to a probability measure as follows.

Definition 5.1. Let µ be a Borel probability measure, x ∈ [0, 1] a real number, and b a base. For each positive integer n and interval I ⊆ [0, 1], let

f_I(n, x) = | { k ∈ Z : 1 ≤ k ≤ n and there exists y ∈ I such that b^k x ≡ y mod 1 } |.

Say that x is µ-normal if for every interval I ⊆ [0, 1],

lim_{n→∞} f_I(n, x) / n = µ(I).

What are the necessary and sufficient conditions on µ such that every µ-Martin-Löf-random real x is µ-normal?

(2) Suppose x is a Bernoulli random real in base b. For every base b′ multiplicatively independent of b, do there exist densities to which (x)_{b′} is biased normal? If not, give a counterexample. For published progress on this question for the case of uniform biases, see [14]. Preliminary investigations suggest that the assumption of Bernoulli randomness cannot be weakened to biased normality, since it appears that there exist reals which are biased normal for all bases multiplicatively independent of b = 3 but not biased simply normal in base 3.

(3) Can Construction 3.1 be reversed to produce a normal real from a biased normal real? If so, does running this reversed construction on a Bernoulli random real produce a Martin-Löf-random real? In [7], Porter states that von Neumann's randomness extractor achieves the desired result for binary sequences.

(4) What are the necessary and sufficient conditions for a real number, using the determined iteration algorithm, to generate the same attractor as the random iteration algorithm? Is there a connection between the discrepancy of a real number and the rate at which the determined iteration algorithm approximates the attractor produced by the random iteration algorithm?

This senior thesis was advised by Professor Theodore Slaman. I am grateful for Professor Slaman's time, guidance, and patience. His patience in helping me develop the proof of Lemma 2.7 is particularly noteworthy.

Conversations with Druv Pai about the binomial distribution and probability were helpful in developing the proofs of Lemmas 2.4 and 2.5.

For their support of the undergraduate mathematics community at UC Berkeley, I dedicate this senior thesis to Berkeley's Mathematics Undergraduate Student Association.
References

[1] Émile Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo, 27(1):247–271, December 1909.

[2] S. S. Pillai. On normal numbers. Proceedings of the Indian Academy of Sciences - Section A, 12(2), August 1940.

[3] Ivan Niven and Herbert Zuckerman. On the definition of normal numbers. Pacific Journal of Mathematics, 1(1):103–109, 1951.

[4] D. G. Champernowne. The construction of decimals normal in the scale of ten. Journal of the London Mathematical Society, s1-8(4):254–260, October 1933.

[5] Per Martin-Löf. The definition of random sequences. Information and Control, 9(6):602–619, December 1966.

[6] André Nies. Computability and Randomness. Oxford University Press, January 2009.

[7] Christopher P. Porter. Effective aspects of Bernoulli randomness. Journal of Logic and Computation, 29(6):933–946, October 2019.

[8] Arthur Copeland and Paul Erdős. Note on normal numbers. Bulletin of the American Mathematical Society, 52(10):857–860, October 1946.

[9] William Feller. An Introduction to Probability Theory and Its Applications, Volume 1. A Wiley publication in mathematical statistics. Wiley, 1968.

[10] Roman Vershynin. High-Dimensional Probability. Cambridge University Press, September 2018.

[11] J. W. S. Cassels. On a paper of Niven and Zuckerman. Pacific Journal of Mathematics, 2(4):555–557, December 1952.

[12] Michael Barnsley. Fractals Everywhere. Academic Press, Inc., 1988.

[13] Andrew DeLapo. IFS visualization code. GitHub. https://github.com/adelapo/biased-normality-ifs, 2020.

[14] Yann Bugeaud.