Bernoulli Randomness and Biased Normality
Andrew DeLapo ∗ [email protected] July 2020
Abstract
One can consider µ-Martin-Löf randomness for a probability measure µ on 2^ω, such as the Bernoulli measure µ_p given p ∈ (0, 1). We consider Bernoulli measures on n^ω with parameters p_0, p_1, . . . , p_{n−1}, and we introduce a biased version of normality. We prove that every Bernoulli random real is normal in the biased sense, and this has the corollary that the set of biased normal reals has full Bernoulli measure in n^ω. We give an algorithm for computing biased normal sequences from normal sequences, so that we can give explicit examples of biased normal reals. We investigate an application of randomness to iterated function systems. Finally, we list a few further questions relating to Bernoulli randomness and biased normality.

This paper roughly follows the historical development of normal numbers and algorithmic randomness. Borel [1] first described normal numbers in 1909, and Pillai [2] shortened Borel's definition in 1940. One decade later, Niven and Zuckerman [3] proved an equivalent formulation of normality in terms of blocks of digits. Although Borel also showed in 1909 that almost all real numbers are normal in every base, where the measure is the Lebesgue measure, the first explicit construction of a normal number did not appear until 1933, by Champernowne [4]. In 1966, Martin-Löf [5] defined randomness criteria in terms of geometrically shrinking and uniformly computably enumerable open sets, and it can be shown that, in the Lebesgue measure, all Martin-Löf-random numbers are normal in every base.

After introducing preliminary notation, definitions, and theorems in the remainder of this section, we begin in Section 2 with a description of normality with respect to given biases on each digit in the base. This definition is written to follow Borel's original definition of normality. We then prove a redundancy in our definition, as Pillai showed in Borel's definition. We follow this with a definition of biased normality in terms of blocks, as Niven and Zuckerman proved. The equivalences allow us to prove that, fixing b biases p = (p_0, p_1, . . . , p_{b−1}) adding up to 1 and using the Bernoulli measure µ_p on b^ω, all µ_p-Martin-Löf-random numbers are biased normal with respect to p. In Section 3, we give an algorithm which, given rational biases, uses a normal number to construct a biased normal number with respect to the biases. Section 4 describes an application of biased normal numbers to iterated function systems, and Section 5 lists further open questions.

∗ This work was the author's senior honors thesis, completed in the Department of Mathematics at the University of California, Berkeley, supervised by Professor Theodore Slaman.
A base is an integer n ≥ 2. Let n^ω denote the set of infinite n-ary sequences, where n is a base. We identify n^{<ω} as the set of finite n-ary sequences, which we also call blocks. For a given ℓ ∈ N, let n^ℓ be the set of n-ary sequences of length ℓ. If σ ∈ n^{<ω}, then let [[σ]] ⊆ n^ω be the set of infinite sequences which extend σ.

If σ is a (finite or infinite) n-ary sequence, we will index the entries in σ by σ[i], where σ[0] is the first entry of the sequence. The subsequence of σ from index i to index j, inclusive, is σ[i : j]. If σ is finite, then the length of σ is len(σ). If σ_1, σ_2 ∈ n^{<ω}, then σ_1σ_2 is the concatenation of σ_1 and σ_2. The number of occurrences of a base n block ρ inside σ is occ(σ, ρ). The empty sequence is denoted ε.

The base b representation of a real number r ∈ [0, 1] is denoted (r)_b and refers to the sequence in b^ω such that

r = Σ_{i=1}^{∞} (r)_b[i − 1] · b^{−i}

and such that (r)_b includes infinitely many instances of digits which are not b − 1.
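The counting conventions above are straightforward to make concrete. The following Python sketch (the helper names occ and base_b_digits are ours, not fixed by the paper) illustrates overlapping occurrence counting and a finite, floating-point approximation of a base b representation:

```python
def occ(sigma, rho):
    """Number of (possibly overlapping) occurrences of the block rho inside sigma."""
    r = len(rho)
    return sum(1 for i in range(len(sigma) - r + 1) if sigma[i:i + r] == rho)

def base_b_digits(r, b, n):
    """First n digits of a base-b expansion of r in [0, 1]; a floating-point approximation only."""
    digits = []
    for _ in range(n):
        r *= b
        d = min(int(r), b - 1)   # guard against floating-point overshoot
        digits.append(d)
        r -= d
    return digits

print(occ("0121011122021221", "01"))   # -> 2
print(base_b_digits(0.625, 2, 4))      # -> [1, 0, 1, 0]
```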
Definition 1.1. A Borel probability measure on n^ω is a countably additive, monotone function µ : F → [0, 1], where F is the Borel σ-algebra of n^ω and µ(n^ω) = 1. Since a Borel probability measure is uniquely determined by the values it takes on finite unions of basic open cylinders, when giving a Borel probability measure it is sufficient to specify a function ρ : n^{<ω} → [0, 1] satisfying ρ(ε) = 1, where ε is the empty sequence, and

ρ(σ) = Σ_{i=0}^{n−1} ρ(σi),

where σi denotes the concatenation of σ with i as a symbol in base n. The resulting measure sets µ([[σ]]) = ρ(σ). For this paper, we will refer to Borel probability measures as measures and only identify the underlying function on blocks, so that µ([[σ]]) is written as µ(σ).

Definition 1.2. The Lebesgue measure λ on n^ω is the measure given by setting

λ(σ) = 1 / n^{len(σ)}

for each σ ∈ n^{<ω}.

Definition 1.3. The Bernoulli measure µ_p on n^ω, with associated positive probabilities p = (p_0, p_1, . . . , p_{n−1}) satisfying Σ_{i=0}^{n−1} p_i = 1, is the measure given by setting

µ_p(σ) = p_{σ[0]} p_{σ[1]} · · · p_{σ[len(σ)−1]}

for each σ ∈ n^{<ω}. Note that the Lebesgue measure on n^ω is exactly the Bernoulli measure on n^ω obtained by setting p_i = 1/n for each i.

Definition 1.4 (Martin-Löf [5], see also [6]). Let µ be a measure on n^ω and z ∈ n^ω. A µ-Martin-Löf test relative to z is a uniformly computably enumerable (relative to z) sequence (U_i)_{i∈ω} of subsets of n^ω with µ(U_i) ≤ 2^{−i} for every i ∈ N. Say x ∈ n^ω passes the test if x ∉ ∩_{i∈ω} U_i. If x passes every µ-Martin-Löf test relative to z, then x is µ-Martin-Löf random relative to z.

Definition 1.5. If x ∈ n^ω is µ_p-Martin-Löf random for the Bernoulli measure µ_p with some probabilities p = (p_0, p_1, . . . , p_{n−1}), then x is Bernoulli random with respect to the parameters p. Bernoulli randomness for binary sequences has been studied by Porter in [7].
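As a quick illustration of Definitions 1.1 through 1.3, here is a small Python sketch (the helper name bernoulli_measure is ours); it also checks the cylinder condition ρ(σ) = Σ_i ρ(σi) numerically:

```python
from math import prod

def bernoulli_measure(sigma, p):
    """mu_p(sigma): the product of p[digit] over the digits of the block sigma."""
    return prod(p[int(ch)] for ch in sigma)

p = [0.5, 0.25, 0.25]        # biases p_0, p_1, p_2 on base-3 digits, summing to 1
lam = [1/3, 1/3, 1/3]        # the Lebesgue measure is the uniform special case

sigma = "201"
# cylinder condition: rho(sigma) = sum over i of rho(sigma concatenated with i)
total = sum(bernoulli_measure(sigma + str(i), p) for i in range(3))
print(bernoulli_measure(sigma, p), total)   # both are 0.03125
```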
Definition 1.6. A real number x is simply normal to base b if every base b digit d ∈ {0, 1, . . . , b − 1} appears with density 1/b in (x)_b. That is,

lim_{n→∞} occ((x)_b[0 : n − 1], d) / n = 1/b.

Borel characterized normality in the following way.
Definition 1.7 (Borel [1]). A real number x is normal to base b if for every natural number n and positive integer k, b^n x is simply normal to base b^k.
Example 1.8. In 1933, Champernowne [4] gave an explicit real number which is normal to base 10:

C_{10} = 0.0123456789101112131415 . . .

In general, let C_n denote the real number with the base n representation obtained by concatenating the base n representations of 0, 1, 2, 3, . . . in order. C_n is normal to base n.
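A short Python sketch (ours) generates a prefix of C_n and checks the digit densities empirically; following the convention used in the later examples of this paper, the concatenation starts from 0:

```python
def champernowne(n, length):
    """First `length` digits of the base-n sequence obtained by concatenating 0, 1, 2, ..."""
    digits, k = [], 0
    while len(digits) < length:
        rep = [0] if k == 0 else []
        m = k
        while m:                 # base-n representation of k, most significant digit first
            rep.append(m % n)
            m //= n
        if k:
            rep.reverse()
        digits.extend(rep)
        k += 1
    return digits[:length]

prefix = champernowne(10, 10**6)
for d in range(10):
    print(d, prefix.count(d) / len(prefix))
# densities are near 1/10; convergence is slow, especially for the digit 0
```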
Example 1.9. Among the results by Copeland and Erdős in [8] is the fact that the real number CE_n obtained by concatenating the primes in base n in order is normal to base n. Then

CE_{10} = 0.235711131719232931 . . .
CE_2 = 0.101110111110111101 . . .

In 1940, Pillai simplified Borel's definition with the following theorem.
Theorem 1.10 (Pillai [2]). A real number x is normal to base b if and only if for every positive integer k, x is simply normal to base b^k.

In 1950, another equivalence was proven by Niven and Zuckerman.

Theorem 1.11 (Niven and Zuckerman [3]). A real number x is normal to base b if and only if for every positive integer ℓ, every block w ∈ b^ℓ appears in (x)_b with frequency 1/b^ℓ:

lim_{n→∞} occ((x)_b[0 : n − 1], w) / n = 1 / b^ℓ.
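Theorem 1.11's block criterion is also easy to test empirically on a prefix of a concrete sequence; the following Python sketch (ours) counts length 3 blocks in a base 2 Champernowne-style prefix:

```python
from itertools import product

def to_base(k, b):
    if k == 0:
        return [0]
    rep = []
    while k:
        rep.append(k % b)
        k //= b
    return rep[::-1]

def block_frequencies(seq, ell, b):
    """Empirical frequency of every base-b block of length ell in the given prefix."""
    counts = {blk: 0 for blk in product(range(b), repeat=ell)}
    for i in range(len(seq) - ell + 1):
        counts[tuple(seq[i:i + ell])] += 1
    total = len(seq) - ell + 1
    return {blk: c / total for blk, c in counts.items()}

# base-2 Champernowne-style prefix: concatenate 0, 1, 10, 11, 100, ...
seq, k = [], 0
while len(seq) < 200000:
    seq.extend(to_base(k, 2))
    k += 1
seq = seq[:200000]

print(block_frequencies(seq, 3, 2))
# values cluster around 1/8 = 0.125 (convergence is slow)
```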
One important connection between normal numbers and algorithmic randomness is the following theorem.

Theorem 1.12. Every λ-Martin-Löf random real is absolutely normal, that is, normal in every base.

The goal of this section is to prove a version of Theorem 1.12 for Bernoulli random numbers. To do this, we define a notion of normality given biases on the digits. We will mirror the historical development of normality by generalizing Borel's original definitions of simply normal and normal to allow for given biases on the digits. In base b, the biases p_0, p_1, . . . , p_{b−1}, also called "densities" or "probabilities", will be assumed to be positive real numbers adding to 1.
Definition 2.1. A real number x is biased simply normal to the biases p_0, p_1, . . . , p_{b−1} if each base b digit d ∈ {0, 1, . . . , b − 1} appears with density p_d in (x)_b. That is,

lim_{n→∞} occ((x)_b[0 : n − 1], d) / n = p_d.
Definition 2.2. A real number x is biased normal with respect to the biases p_0, p_1, . . . , p_{b−1} if for every natural number n and positive integer k, b^n x is biased simply normal to base b^k with the biases p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]}

and where (i)_{b,k} denotes the base b representation of i with sufficient zero-padding so that it has exactly k digits.

Let k be any positive integer. Let p = (p_0, p_1, . . . , p_{b−1}) and p*_k = (p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}). Let v be a base b block, and let (v)_{b^k} be v considered as a block in base b^k. For any k, if k divides len(v), then the p*_k are such that µ_p(v) = µ_{p*_k}((v)_{b^k}).
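The induced biases p*_{k,i} can be computed directly from Definition 2.2; a small Python sketch (the function name star_biases is ours):

```python
from itertools import product
from math import prod, isclose

def star_biases(p, k):
    """p*_{k,i} for i = 0, ..., b^k - 1, where b = len(p): the product of the biases
    of the k base-b digits of i, zero-padded on the left to length k."""
    b = len(p)
    return [prod(p[d] for d in digits)
            for digits in product(range(b), repeat=k)]   # digits enumerate i = 0, 1, ..., b^k - 1

p = [0.5, 0.3, 0.2]
for k in (1, 2, 3):
    assert isclose(sum(star_biases(p, k)), 1.0)          # the p*_{k,i} are again biases
print(star_biases(p, 2))   # e.g. p*_{2,5} = p_1 * p_2 = 0.06, since 5 = (12) in base 3
```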
As shown for the case of normality in Theorems 1.10 and 1.11, the definition of biased normal can be simplified. To prove this, we will require the following definition.

Definition 2.3. Let w be a length ℓ block of digits in base b. Let p_0, p_1, . . . , p_{b−1} be biases. Then the simple discrepancy of w with respect to the biases is

max_{d ∈ {0,1,...,b−1}} | occ(w, d)/ℓ − p_d |.
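Definition 2.3 translates directly into code; a Python sketch (ours):

```python
def simple_discrepancy(w, p):
    """max over digits d of | occ(w, d)/len(w) - p_d |, where b = len(p)."""
    ell = len(w)
    return max(abs(w.count(d) / ell - p[d]) for d in range(len(p)))

p = [2/3, 1/3]
print(simple_discrepancy([0, 0, 1, 0, 1, 0], p))   # ~ 0: digit counts match the biases exactly
print(simple_discrepancy([1, 1, 1, 0, 0, 0], p))   # 1/6, about 0.1667
```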
Lemma 2.4. Fix a base b, a digit d, and a block length k. Let S_i ⊆ b^k be the set of blocks of length k containing exactly i instances of d. The Bernoulli measure of S_i is

µ_p(S_i) = \binom{k}{i} p_d^i (1 − p_d)^{k−i}.
Proof. We know that the number of blocks in S_i is

|S_i| = \binom{k}{i} (b − 1)^{k−i},

since there are \binom{k}{i} choices for where to put the i instances of d and k − i places where one of the other b − 1 digits appears. Without loss of generality, assume d = 0. For w ∈ S_i, let n_e = occ(w, e) for a digit e in base b. The measure of any such w is

µ_p(w) = p_0^i ∏_{m=1}^{b−1} p_m^{n_m}.

To find the measure of S_i, we can take the sum of the measures over all such w with digit counts n_1, n_2, . . . , n_{b−1} ∈ N such that Σ_{m=1}^{b−1} n_m = k − i. The number of such w is

Σ_{n_1 + n_2 + · · · + n_{b−1} = k−i} \binom{k}{i} \binom{k−i}{n_1, n_2, . . . , n_{b−1}},

where

\binom{k−i}{n_1, n_2, . . . , n_{b−1}} = (k − i)! / (n_1! n_2! · · · n_{b−1}!)

is the multinomial coefficient. This is because there are \binom{k}{i} many choices for the locations of d = 0, and for each sum n_1 + n_2 + · · · + n_{b−1} = k − i there are \binom{k−i}{n_1, . . . , n_{b−1}} different length k − i sequences w' with occ(w', e) = n_e for each e from 1 to b − 1. So the measure of S_i is

µ_p(S_i) = Σ_{n_1 + · · · + n_{b−1} = k−i} \binom{k}{i} \binom{k−i}{n_1, . . . , n_{b−1}} p_0^i ∏_{j=1}^{b−1} p_j^{n_j}
         = \binom{k}{i} p_0^i Σ_{n_1 + · · · + n_{b−1} = k−i} \binom{k−i}{n_1, . . . , n_{b−1}} ∏_{j=1}^{b−1} p_j^{n_j}.

By the multinomial theorem [9],

Σ_{n_1 + · · · + n_{b−1} = k−i} \binom{k−i}{n_1, . . . , n_{b−1}} ∏_{j=1}^{b−1} p_j^{n_j} = ( Σ_{j=1}^{b−1} p_j )^{k−i}.

Therefore

µ_p(S_i) = \binom{k}{i} p_0^i ( Σ_{j=1}^{b−1} p_j )^{k−i},

and we know Σ_{j=1}^{b−1} p_j = 1 − p_0, so

µ_p(S_i) = \binom{k}{i} p_0^i (1 − p_0)^{k−i},

which is the desired equality for d = 0.
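For small b and k, the identity in Lemma 2.4 can be checked by brute-force enumeration of all blocks; a Python sketch (ours, with arbitrary illustrative parameters):

```python
from itertools import product
from math import comb, prod, isclose

b, k, d = 3, 5, 0
p = [0.5, 0.3, 0.2]

for i in range(k + 1):
    # exact measure of S_i, enumerating every base-b block of length k
    brute = sum(prod(p[x] for x in w)
                for w in product(range(b), repeat=k)
                if w.count(d) == i)
    formula = comb(k, i) * p[d] ** i * (1 - p[d]) ** (k - i)
    assert isclose(brute, formula)
print("Lemma 2.4 verified for b = 3, k = 5")
```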
Lemma 2.5. Let 0 < ε < min(p_0, . . . , p_{b−1}). Fix a block length k. Say that a block w of length k is "bad" for a digit d if occ(w, d) ≤ (p_d − ε)k or occ(w, d) ≥ (p_d + ε)k. Let B be the set of such w:

B = { w ∈ b^k : | occ(w, d) − p_d k | ≥ εk }.

Then the Bernoulli measure of B in b^ω with parameters p_0, p_1, . . . , p_{b−1} is at most 2e^{−2ε²k}.
Proof. Let i be an integer such that 0 ≤ i ≤ k. Let B_i be the set of blocks of length k containing exactly i instances of the digit d. The Bernoulli measure of B_i in b^ω with parameters p_0, p_1, . . . , p_{b−1} is, by Lemma 2.4,

µ_p(B_i) = \binom{k}{i} (p_d)^i (1 − p_d)^{k−i}.

Notice that this is the binomial distribution with k trials and i successes, where the probability of success is p_d. To calculate µ_p(B), we have

B = ∪_{i=0}^{⌊(p_d − ε)k⌋} B_i  ∪  ∪_{i=⌈(p_d + ε)k⌉}^{k} B_i,

where all the unions are of pairwise disjoint sets. Then

µ_p(B) = Σ_{i=0}^{⌊(p_d − ε)k⌋} µ_p(B_i) + Σ_{i=⌈(p_d + ε)k⌉}^{k} µ_p(B_i).

We expand both appearances of µ_p(B_i) as above:

µ_p(B) = Σ_{i=0}^{⌊(p_d − ε)k⌋} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i} + Σ_{i=⌈(p_d + ε)k⌉}^{k} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i}.

Apply Hoeffding's inequality [10] on the tail ends of the binomial distribution to get that

Σ_{i=0}^{⌊(p_d − ε)k⌋} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i} ≤ e^{−2ε²k}

and

Σ_{i=⌈(p_d + ε)k⌉}^{k} \binom{k}{i} (p_d)^i (1 − p_d)^{k−i} ≤ e^{−2ε²k}.

It follows that µ_p(B) ≤ 2e^{−2ε²k}.
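Likewise, the bound in Lemma 2.5 can be compared against the exact binomial tail for concrete parameters; a Python sketch (ours, with illustrative values of p_d, ε, and k):

```python
from math import comb, exp, floor, ceil

def exact_bad_measure(p_d, k, eps):
    """mu_p(B): exact binomial probability that |occ(w, d) - p_d*k| >= eps*k."""
    lo, hi = floor((p_d - eps) * k), ceil((p_d + eps) * k)
    return sum(comb(k, i) * p_d ** i * (1 - p_d) ** (k - i)
               for i in range(k + 1) if i <= lo or i >= hi)

p_d, eps = 0.3, 0.05
for k in (50, 200, 1000):
    print(k, exact_bad_measure(p_d, k, eps), 2 * exp(-2 * eps ** 2 * k))
# in each row the exact measure stays below the Hoeffding bound 2*e^{-2*eps^2*k}
```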
Definition 1.7, Theorem 1.10, and Theorem 1.11 give three equivalent definitions of normality. The next three lemmas accomplish the same task for biased normality.

Lemma 2.6. If x is biased normal to p_0, p_1, . . . , p_{b−1}, then for every positive integer k, x is biased simply normal to p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]}.

Proof. This lemma follows immediately from the definition of biased normal, as it is a special case of the definition.
Lemma 2.7. If for every positive integer k, x is biased simply normal to p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]},

then for each positive integer r and each block v ∈ b^r,

lim_{n→∞} occ((x)_b[0 : n − 1], v) / n = ∏_{j=0}^{r−1} p_{v[j]} = µ_p(v).
Proof. Fix r and v ∈ b^r, and let ε > 0. By Lemma 2.5, there is a sufficiently large positive integer N_0 such that for all N ≥ N_0, all but a µ_p-measure at most ε subset B of the length N base b blocks have simple discrepancy less than ε when parsed in length r intervals starting from index 0. Moreover, we argue that N can be made sufficiently large so that for each m from 0 to r − 1, all but a µ_p-measure at most b^m ε subset B_m of the length N base b blocks have simple discrepancy less than ε when parsed in length r intervals starting from index m. The µ_p-measure of each B_m is at most b^m ε because each length N − m sequence extends to a length N sequence in b^m many ways, and we know µ_p(B_0) ≤ ε. Thus the measure of ∪_{m=0}^{r−1} B_m is at most Σ_{m=0}^{r−1} b^m ε ≤ b^r ε.

We compute an upper bound on the eventual frequency of v in (x)_b. Parse (x)_b into length N subblocks starting from index 0, where N will be taken sufficiently large as determined by the following analysis. Because x is biased simply normal in base b^N, there is a positive integer ℓ_0 such that for all ℓ ≥ ℓ_0, every w ∈ b^N occurs within ε of its expected frequency in the first ℓ digits of (x)_{b^N}. That is,

| occ((x)_{b^N}[0 : ℓ − 1], w) / ℓ − µ_{p*_N}(w) | ≤ ε

for every w ∈ b^N, where p*_N = (p*_{N,0}, . . . , p*_{N,b^N−1}). Parsing (x)_b in length N blocks, instances of v in (x)_b can occur in three different ways. If an instance of v is not contained within a length N block when parsing (x)_b into length N subblocks starting from index 0, then v begins in one block and ends in the next block. All other instances of v lie entirely within one length N subblock, and we say such a block w is "good" if | occ(w, v)/N − µ_p(v) | ≤ ε, or "bad" otherwise. If an instance of v is contained in a length N block w, then we consider separately the cases that the block is good or bad.

There are ℓ(r − 1)/N many length r blocks that start in one length N block and end in the next. Some of those ℓ(r − 1)/N blocks could be instances of v, and none of them are counted in the above computation. Assume that all ℓ(r − 1)/N of these blocks are instances of v. Since N is made arbitrarily large, ℓ(r − 1)/N < εℓ.

Next, we bound the occurrences of v in bad length N subblocks. By Lemma 2.5, the subset B of bad length N blocks has µ_p-measure at most 2e^{−2ε²N}. Since N is made arbitrarily large, we can assume 2e^{−2ε²N} ≤ ε. Assume every bad length N block has N − r + 1 occurrences of v, the maximum possible number of occurrences. By the choice of ℓ, the number of digits in (x)_{b^N}[0 : ℓ − 1] which are bad base b length N blocks is at most εℓ. We are assuming each of these bad blocks contains N − r + 1 instances of v, so the number of instances of v in bad blocks is at most ε(N − r + 1)ℓ.

Similarly, let G be the set of good length N blocks. There are at most ℓ many elements of G among the digits of (x)_{b^N}[0 : ℓ − 1], and in each of them v occurs within ε of its expected frequency. The number of instances of v in good blocks is at most ℓ(µ_p(v) + ε)(N − r + 1).

We have counted the instances of v in (x)_b[0 : Nℓ − 1] between two length N blocks, inside bad blocks, and inside good blocks. Now we can compute an upper bound on the frequency of v in the first Nℓ digits of (x)_b. We have

occ((x)_b[0 : Nℓ − 1], v) / (Nℓ) ≤ [ εℓ + ε(N − r + 1)ℓ + ℓ(µ_p(v) + ε)(N − r + 1) ] / (Nℓ)

by the above. Additionally,

[ εℓ + ε(N − r + 1)ℓ + ℓ(µ_p(v) + ε)(N − r + 1) ] / (Nℓ) = [ ε + ε(N − r + 1) + (µ_p(v) + ε)(N − r + 1) ] / N,

and since N − r + 1 ≤ N,

[ ε + ε(N − r + 1) + (µ_p(v) + ε)(N − r + 1) ] / N ≤ [ ε + εN + (µ_p(v) + ε)N ] / N = ε/N + 2ε + µ_p(v).

Therefore

occ((x)_b[0 : Nℓ − 1], v) / (Nℓ) ≤ ε/N + 2ε + µ_p(v),

which approaches µ_p(v) as required. The computation for a lower bound on the eventual frequency of v in (x)_b can be made in a way analogous to the computation above; again parsing (x)_b in length N subblocks, assume that all occurrences of v are within good length N blocks. By Lemma 2.5, there are at least (1 − ε)ℓ many good length N blocks when ℓ is sufficiently large. Each good length N block must contain at least (N − r + 1)(µ_p(v) − ε) instances of v. Then the number of occurrences of v is at least (1 − ε)ℓ(N − r + 1)(µ_p(v) − ε), and one can check that the frequency of v in (x)_b[0 : Nℓ − 1] again approaches µ_p(v) as required.

Lemma 2.8. If x is such that for every positive integer r and every block v ∈ b^r,

lim_{n→∞} occ((x)_b[0 : n − 1], v) / n = ∏_{j=0}^{r−1} p_{v[j]} = µ_p(v),

then x is biased normal as in Definition 2.2.
Proof. This proof is similar to a proof by Cassels in [11] for the case of normality, and we use similar notation. Let f and g be base b blocks of lengths r and s respectively, with s ≥ r. For a given integer m from 0 to r − 1, R_m(g, f) is the number of solutions to g[n : n + r − 1] = f with n ≡ m (mod r). Then R_m(g, f) ≤ s − r + 1.

Let ε > 0. Fix a block v in base b of length r, and let s ≥ r be a positive integer, chosen large enough that 2e^{−2ε²⌊s/r⌋} ≤ ε. Consider v as a digit in base b^r. Let B be the set of length s base b blocks with simple discrepancy at least ε (with respect to p*_r) when parsed in length r intervals. By Lemma 2.5, µ_p(B) ≤ 2e^{−2ε²⌊s/r⌋} ≤ ε, and for a length s block g not in B and each m,

R_m(g, v) ≤ (µ_p(v) + ε)(s − r + 1)/r.     (2.1)

Let N be a large positive integer. For each m, let j_m be the number of occurrences of v at starting indices congruent to m modulo r inside the length s blocks of (x)_b[0 : N − 1], counted over all N − s + 1 such blocks, and let i_m = (s − r + 1) R_m((x)_b[0 : N − 1], v), which counts each occurrence of v once for each of the s − r + 1 length s blocks that could contain it. Occurrences of v near the ends of (x)_b[0 : N − 1] lie in fewer than s − r + 1 of the length s blocks, which contribute less than s − r + 1 to j_m. Then for each m, |i_m − j_m| ≤ s.

By the hypothesis on the block frequencies of (x)_b, for N sufficiently large at most 2εN of the length s blocks appearing in (x)_b[0 : N − 1] are members of B. Each of the 2εN blocks appearing in (x)_b[0 : N − 1] from B contributes at most s − r + 1 occurrences of v. For length s blocks appearing in (x)_b[0 : N − 1] which are not members of B, v appears at starting indices equivalent to m mod r with frequency at most (µ_p(v) + ε)/r by equation (2.1), so the number of these occurrences of v in such length s blocks is at most (µ_p(v) + ε)(s − r + 1)/r. There are at most N − s + 1 length s blocks. This gives the upper bound

j_m ≤ 2εN(s − r + 1) + (N − s + 1)(µ_p(v) + ε)(s − r + 1)/r

for each m. Then an upper bound on j_m/(s − r + 1) is

j_m/(s − r + 1) ≤ 2εN + (N − s + 1)µ_p(v)/r + ε(N − s + 1)

for each m, where, to match the bounds given by Cassels, we have used the fact that ε/r ≤ ε. Note that

| i_m/(s − r + 1) − j_m/(s − r + 1) | ≤ s/(s − r + 1)

and

i_m/(s − r + 1) = R_m((x)_b[0 : N − 1], v),

since |i_m − j_m| ≤ s and by definition of i_m. Thus

| R_m((x)_b[0 : N − 1], v) − (N − s + 1)µ_p(v)/r | ≤ s/(s − r + 1) + ε(N − s + 1) + 2εN

and

lim sup_{N→∞} | R_m((x)_b[0 : N − 1], v)/N − µ_p(v)/r | ≤ 3ε.

Since ε is arbitrarily small, we therefore have

lim_{N→∞} R_m((x)_b[0 : N − 1], v)/N = µ_p(v)/r

for each m from 0 to r − 1. Conclude that x is biased normal as in Definition 2.2.

Together, Lemmas 2.6, 2.7, and 2.8 prove the following corollary.

Corollary 2.9. Let x be a real number. Fix a base b and densities p_0, . . . , p_{b−1}. The following are equivalent.

(1) x is biased normal as in Definition 2.2.

(2) For every positive integer k, x is biased simply normal to p*_{k,0}, p*_{k,1}, . . . , p*_{k,b^k−1}, where for each i ∈ {0, . . . , b^k − 1},

p*_{k,i} = ∏_{j=0}^{k−1} p_{(i)_{b,k}[j]}.

(3) For each positive integer r and for each v ∈ b^r,

lim_{n→∞} occ((x)_b[0 : n − 1], v) / n = ∏_{j=0}^{r−1} p_{v[j]} = µ_p(v).

Theorem 2.10. Let x be a Bernoulli random real with biases p_0, p_1, . . . , p_{b−1}. Then x is biased normal with respect to p_0, p_1, . . . , p_{b−1}.

Proof. We will construct a µ_p-Martin-Löf test. Let 0 < ε < min(p_0, . . . , p_{b−1}). Let k_0 be the least integer such that the bound of Lemma 2.5 applies for ε and b. For each integer k ≥ k_0, let

B_k = ∪_{N > k} { w ∈ b^N : | occ(w, d) − p_d N | > εN for some digit d in base b }.

Then

µ_p(B_k) ≤ Σ_{N > k} 2e^{−2ε²N} ≤ ∫_k^∞ 2e^{−2ε²N} dN = e^{−2ε²k}/ε².

Suppose x is not biased normal to the densities p_0, . . . , p_{b−1}. By Corollary 2.9, x is equivalently not biased simply normal to base b^n for some positive integer n and densities p*_{n,0}, . . . , p*_{n,b^n−1} as defined in Corollary 2.9. Applying the construction above with base b^n and the densities p*_n in place of b and p, we get x ∈ ∩_{k ≥ k_0} B_k, and x fails the µ_p-Martin-Löf test.

Corollary 2.11. Fixing densities p_0, p_1, . . . , p_{b−1}, the set of biased normal reals has Bernoulli measure 1.

As another corollary of Theorem 2.10, we can prove Theorem 1.12.

Theorem 1.12. Every λ-Martin-Löf random real is absolutely normal, that is, normal in every base.

Proof. Let x be a λ-Martin-Löf-random real. Let b be any base, and let p = (p_0, p_1, . . . , p_{b−1}) where p_i = 1/b for all i. Because the Bernoulli measure with parameters p is the Lebesgue measure, and x is λ-Martin-Löf-random, it follows that x is Bernoulli random with parameters p. By Theorem 2.10, x is biased normal with respect to p. The parameters p are uniform, so equivalently, x is normal to base b. Since b was arbitrary, deduce that x is absolutely normal.
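The tail bound used for the test sets B_k in the proof of Theorem 2.10 can be sanity-checked numerically; a Python sketch (ours, truncating the infinite sum at a finite number of terms):

```python
from math import exp

def tail_sum(eps, k, terms=5000):
    """Partial sum of sum over N > k of 2*e^{-2*eps^2*N}."""
    return sum(2 * exp(-2 * eps ** 2 * N) for N in range(k + 1, k + 1 + terms))

eps = 0.1
for k in (100, 200, 400):
    print(k, tail_sum(eps, k), exp(-2 * eps ** 2 * k) / eps ** 2)
# each partial sum stays below the closed-form bound e^{-2*eps^2*k} / eps^2
```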
We present a simple algorithm for computing a biased normal sequence by using a normal sequence, but we must assume that the given probabilities are rational numbers.

Construction 3.1. Let p_0, p_1, p_2, . . . , p_{n−1} be positive rational probabilities adding up to 1. For each i ∈ {0, 1, 2, . . . , n − 1}, let p_i = a_i/b_i, with a_i, b_i being positive coprime integers. Let d = lcm(b_0, b_1, . . . , b_{n−1}). Then there is a base n block g of length d containing exactly p_i d instances of each digit i, as p_i d is an integer. Assume g has the base n digits in increasing order. Next, let ν ∈ d^ω be a base d normal sequence. Construct the sequence β ∈ n^ω from ν by setting β[k] = g[ν[k]].

Example 3.2. Let p_0 = 2/3 and p_1 = 1/3. Then d = 3, and we can let g = 001. This means that for each k ∈ N, β[k] will be 0 if ν[k] is 0 or 1, and β[k] will be 1 if ν[k] is 2. If ν is Champernowne's base 3 sequence,

ν = 0121011122021221 . . . ,

then β begins

β = 0010000011010110 . . . .
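Construction 3.1 and Example 3.2 can be put directly into code; a Python sketch (ours), using the base d Champernowne sequence as the normal input ν. The final print reproduces the prefix of β from Example 3.2:

```python
from fractions import Fraction
from math import lcm

def construction_3_1(probabilities, nu_digits):
    """Map a base-d normal sequence nu to the biased sequence beta via beta[k] = g[nu[k]]."""
    p = [Fraction(q) for q in probabilities]
    assert sum(p) == 1
    d = lcm(*[q.denominator for q in p])
    g = []                                   # block of length d with p_i*d copies of digit i
    for i, q in enumerate(p):
        g.extend([i] * int(q * d))           # digits in increasing order
    return d, g, [g[digit] for digit in nu_digits]

def champernowne(base, length):
    digits, k = [], 0
    while len(digits) < length:
        rep = [0] if k == 0 else []
        m = k
        while m:
            rep.append(m % base)
            m //= base
        if k:
            rep.reverse()
        digits.extend(rep)
        k += 1
    return digits[:length]

# Example 3.2: p_0 = 2/3, p_1 = 1/3, so d = 3 and g = 001
d, g, beta = construction_3_1(["2/3", "1/3"], champernowne(3, 16))
print(d, g)                        # 3 [0, 0, 1]
print("".join(map(str, beta)))     # 0010000011010110, as in Example 3.2
```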
Theorem 3.3. In Construction 3.1, β is biased normal with respect to p_0, p_1, p_2, . . . , p_{n−1}.

Proof. Let w ∈ n^ℓ. By Corollary 2.9, it is sufficient to show that w has its expected frequency µ_p(w) in β. Let ν be the base d normal sequence used to construct β. We will rely on the normality of ν.

Define A_w to be the set of length ℓ blocks u in base d such that g[u[i]] = w[i] for all i from 0 to ℓ − 1. In other words, a block u ∈ A_w appears starting at index k in ν if and only if w appears starting at index k in β. The number of blocks in A_w is

|A_w| = ∏_{i=0}^{ℓ−1} (p_{w[i]} d) = d^ℓ ∏_{i=0}^{ℓ−1} p_{w[i]} = d^ℓ µ_p(w)

by construction of g. By normality of ν and Theorem 1.11, every base d block u of length ℓ appears with frequency 1/d^ℓ in ν:

lim_{k→∞} occ(ν[0 : k − 1], u) / k = 1/d^ℓ.

Let ε > 0. Then there exists k_0 ∈ N such that for all k ≥ k_0 and each u ∈ d^ℓ,

| occ(ν[0 : k − 1], u) / k − 1/d^ℓ | < ε.

Consider k ≥ k_0. For each u ∈ d^ℓ, let δ_u be such that |δ_u| ≤ ε and

occ(ν[0 : k − 1], u) / k = 1/d^ℓ + δ_u.

By the construction of β, we can count instances of w in β in terms of instances of u ∈ A_w appearing in ν:

occ(β[0 : k − 1], w) = Σ_{u ∈ A_w} occ(ν[0 : k − 1], u).

Then

occ(β[0 : k − 1], w) / k = Σ_{u ∈ A_w} occ(ν[0 : k − 1], u) / k = Σ_{u ∈ A_w} ( 1/d^ℓ + δ_u ).

Since |δ_u| ≤ ε, we then have

Σ_{u ∈ A_w} ( 1/d^ℓ − ε ) < occ(β[0 : k − 1], w) / k < Σ_{u ∈ A_w} ( 1/d^ℓ + ε ),

and we calculated |A_w| = d^ℓ µ_p(w), so

d^ℓ µ_p(w) ( 1/d^ℓ − ε ) < occ(β[0 : k − 1], w) / k < d^ℓ µ_p(w) ( 1/d^ℓ + ε ),

that is,

µ_p(w) − ε d^ℓ µ_p(w) < occ(β[0 : k − 1], w) / k < µ_p(w) + ε d^ℓ µ_p(w).

Thus

| occ(β[0 : k − 1], w) / k − µ_p(w) | < ε d^ℓ µ_p(w).

Since ε is arbitrarily small and d^ℓ µ_p(w) is constant, deduce that

lim_{k→∞} occ(β[0 : k − 1], w) / k = µ_p(w)

and that, by Corollary 2.9, β is biased normal with respect to the probabilities.

Because the translation described in Construction 3.1 is measure-preserving, computable, and continuous, we have the following theorem.

Theorem 3.4. Let x be a λ-Martin-Löf-random real, let b be a base, and let p_0, p_1, . . . , p_{b−1} be rational densities with least common denominator d as in Construction 3.1. Let β be the result of running Construction 3.1 on (x)_d. Then β is Bernoulli random with parameters p_0, p_1, . . . , p_{b−1}.

In his book Fractals Everywhere [12] on the theory of iterated function systems, Michael Barnsley presents two algorithms for computing the attractor of an IFS. The first "deterministic algorithm" constructs the attractor directly in iterated steps. The second "random iteration algorithm" (or "chaos game") plots hundreds of thousands of points, where each point is the image of a randomly selected transformation on the previous point, and the collection of points approximates the attractor of the IFS. In particular, Barnsley uses a computer's pseudorandom number generator to select the transformations. A famous attractor of an IFS is the Barnsley fern, shown in Figure 1.

We begin by reintroducing iterated function systems (with probabilities) and the random iteration algorithm.

Figure 1: The Barnsley fern.

The illustrations appearing in this paper are the output of a program written in Processing by the author. It is important to note now that the illustrations are of plots in Cartesian coordinates, but with the convention that the origin (0, 0) appears at the top-left of the image and with the y-axis increasing downwards rather than upwards. The x-axis increases to the right as usual. The source code for the program, including a Python version with a user interface, can be found at [13].

Definition 4.1. An iterated function system with probabilities consists of a metric space (X, d), a finite collection of transformations f_1, f_2, . . . , f_n : X → X, and a corresponding collection of real probabilities p_1, p_2, . . . , p_n, where 0 < p_i < 1 for each i, and Σ_{i=1}^{n} p_i = 1. An iterated function system with probabilities, often abbreviated IFS, is often presented as {X; f_1, f_2, . . . , f_n; p_1, p_2, . . . , p_n}. When the probabilities are omitted, one can assume that the probabilities are uniform, and p_i = 1/n for all i.

Definition 4.2. Let (X, d) be a metric space. A transformation f : X → X is a contraction mapping if there is a constant 0 ≤ s < 1 such that for all x, y ∈ X,

d(f(x), f(y)) ≤ s · d(x, y).

Definition 4.3. Let {X; w_1, w_2, . . . , w_n} be an IFS where each w_i is a contraction mapping. Barnsley calls such an IFS hyperbolic. Let H(X) denote the space whose points are the compact subsets of X, not including the empty set. One can check (see [12]) that the transformation W : H(X) → H(X) defined by

W(B) = ∪_{i=1}^{n} w_i(B)

has a unique fixed point A ∈ H(X); we have W(A) = A, and A is given by

A = lim_{n→∞} W^n(B)

for any B ∈ H(X). Then A is called the attractor of the IFS.
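Definition 4.3 suggests Barnsley's deterministic algorithm directly: iterate W on a finite approximation of any starting set. A Python sketch (ours), using the three midpoint maps that appear in Example 4.5 below:

```python
def w1(pt): x, y = pt; return (x / 2, y / 2)                    # halfway toward (0, 0)
def w2(pt): x, y = pt; return (x / 2, (y + 100) / 2)            # halfway toward (0, 100)
def w3(pt): x, y = pt; return ((x + 100) / 2, (y + 100) / 2)    # halfway toward (100, 100)

def deterministic_step(points, maps):
    """One application of W(B) = union of w_i(B), on a finite approximation of B."""
    return {m(pt) for pt in points for m in maps}

B = {(37.0, 81.0)}              # any nonempty starting set works
for _ in range(8):
    B = deterministic_step(B, (w1, w2, w3))
print(len(B))                   # up to 3^8 = 6561 points approximating the attractor
```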
Definition 4.4. One can use the random iteration algorithm to approximate the attractor of an IFS {X; f_1, f_2, . . . , f_n; p_1, p_2, . . . , p_n}. The random iteration algorithm proceeds as follows. First, set x_0 ∈ X arbitrarily. In cases where X = R², we will set x_0 = (0, 0). For each k ≥ 1, choose recursively and independently

x_k ∈ { f_1(x_{k−1}), f_2(x_{k−1}), . . . , f_n(x_{k−1}) },

where the probability that x_k = f_i(x_{k−1}) is p_i. The result of the random iteration algorithm is {x_n : n ∈ N} ⊆ X. By "randomly," Barnsley is referring to an unspecified level of randomness, but one that is at least as random as the pseudorandom number generator on a computer.

Example 4.5. In R², consider the three transformations

f_1(x, y) = ( x/2, y/2 ),
f_2(x, y) = ( x/2, (y + 100)/2 ),
f_3(x, y) = ( (x + 100)/2, (y + 100)/2 ).

Then f_1 can be thought of as taking (x, y) to the point halfway between itself and the origin. Similarly, f_2 takes (x, y) halfway to (0, 100), and f_3 takes (x, y) halfway to (100, 100). The attractor of the IFS {R²; f_1, f_2, f_3} (where the probabilities are uniform) is a Sierpinski triangle, as seen in Figure 2a. On the right, we use probabilities 0.8, 0.1, and 0.1 for f_1, f_2, and f_3 respectively, as seen in Figure 2b.

(a) The result of one million iterations of the random iteration algorithm on the IFS {R²; f_1, f_2, f_3; 1/3, 1/3, 1/3} from Example 4.5 is the Sierpinski triangle, with vertices at (0, 0), (0, 100), and (100, 100). (b) The result of one million iterations with probabilities 0.8, 0.1, and 0.1 for f_1, f_2, and f_3, respectively.

Figure 2: Two results of the random iteration algorithm with the same transformations but different probabilities. In each picture, a color is associated to each transformation, so that f_i(x, y) is given the color associated with f_i.

We modify the random iteration algorithm to instead use a pre-determined sequence to choose from the n transformations at each step.

Definition 4.6. Let {X; f_0, f_1, . . . , f_{n−1}} be an IFS. Let σ ∈ n^ω. The determined iteration algorithm is a modified version of the random iteration algorithm. Pick x_0 ∈ X arbitrarily as in the random algorithm, and pick x_n = f_{σ[n−1]}(x_{n−1}) for each n ≥ 1. The result of the determined iteration algorithm is {x_n : n ∈ N}.
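The random iteration algorithm (Definition 4.4) and the determined iteration algorithm (Definition 4.6) differ only in how the next transformation is chosen; a Python sketch (ours), omitting the plotting:

```python
import random

def random_iteration(maps, probs, steps, x0=(0.0, 0.0)):
    """Chaos game: at each step apply a map chosen with the given probabilities."""
    pts, x = [], x0
    for _ in range(steps):
        x = random.choices(maps, weights=probs)[0](x)
        pts.append(x)
    return pts

def determined_iteration(maps, sigma, x0=(0.0, 0.0)):
    """Same, but the map at step k is dictated by the digit sigma[k]."""
    pts, x = [], x0
    for digit in sigma:
        x = maps[digit](x)
        pts.append(x)
    return pts

# midpoint maps toward the four corners of the unit square, as in Example 4.7
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
maps = [lambda pt, c=c: ((pt[0] + c[0]) / 2, (pt[1] + c[1]) / 2) for c in corners]

cloud1 = random_iteration(maps, [0.25] * 4, 10000)
cloud2 = determined_iteration(maps, [0, 1, 2, 3, 1, 0, 1, 1, 1, 2, 1, 3, 2, 0, 2])  # first digits of C_4
print(len(cloud1), len(cloud2), cloud2[:3])
```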
Example 4.7. Let v_0 = (0, 0), v_1 = (0, 1), v_2 = (1, 0), v_3 = (1, 1) ∈ R², and consider the IFS {R²; f_0, f_1, f_2, f_3}, where each f_i is the midpoint transformation taking (x, y) to the point halfway between (x, y) and v_i. The attractor of this IFS is the unit square, and when the probability of each f_i is p_i = 1/4, the square is uniformly covered with points when the random iteration algorithm is applied, as in Figure 3a. Champernowne's base 4 sequence produces the result in Figure 3b. Because the first 15 digits of C_4 are

012310111213202,

the first 15 transformations chosen in the determined iteration algorithm are, in order,

f_0, f_1, f_2, f_3, f_1, f_0, f_1, f_1, f_1, f_2, f_1, f_3, f_2, f_0, f_2.

(a) A result of one million iterations of the random iteration algorithm on the IFS {R²; f_0, f_1, f_2, f_3} from Example 4.7 using a pseudorandom number generator. (b) The result of one million iterations of the determined iteration algorithm on the same IFS as in (a), with the transformations determined by C_4. (c) The result of one million iterations of the determined iteration algorithm on the same IFS as in (a), with the transformations determined by CE_4.

Figure 3: Comparing the random iteration algorithm with the determined iteration algorithm.

By the definition of normal, each transformation has the same chance of being applied to x_n as every other transformation. Not all iterated function systems use uniform probabilities, however. Barnsley's fern, for example, uses four transformations with probabilities 0.85, 0.07, 0.07, and 0.01.

Finally, we list a few further questions relating to Bernoulli randomness and biased normality.

(1) One can characterize normality with respect to a probability measure as follows.

Definition 5.1. Let µ be a Borel probability measure, x ∈ [0, 1] a real number, and b a base. For each positive integer n and interval I ⊆ [0, 1], let

f_I(n, x) = | { k ∈ Z : 1 ≤ k ≤ n and there exists y ∈ I such that b^k x ≡ y mod 1 } |.

Say that x is µ-normal if for every interval I ⊆ [0, 1],

lim_{n→∞} f_I(n, x) / n = µ(I).

What are the necessary and sufficient conditions on µ such that every µ-Martin-Löf-random real x is µ-normal?

(2) Suppose x is a Bernoulli random real in base b. For every base b′ multiplicatively independent of b, do there exist densities to which (x)_{b′} is biased normal? If not, give a counterexample. For published progress on this question for the case of uniform biases, see [14]. Preliminary investigations suggest that the assumption of Bernoulli randomness cannot be weakened to biased normality, since it appears that there exist reals which are biased normal for all bases multiplicatively independent of b = 3 but not biased simply normal in base 3.

(3) Can Construction 3.1 be reversed to produce a normal real from a biased normal real? If so, does running this reversed construction on a Bernoulli random real produce a Martin-Löf-random real? In [7], Porter states that von Neumann's randomness extractor achieves the desired result for binary sequences.

(4) What are the necessary and sufficient conditions for a real number, using the determined iteration algorithm, to generate the same attractor as the random iteration algorithm? Is there a connection between the discrepancy of a real number and the rate at which the determined iteration algorithm approximates the attractor produced by the random iteration algorithm?

This senior thesis was advised by Professor Theodore Slaman. I am grateful for Professor Slaman's time, guidance, and patience. His patience in helping me develop the proof of Lemma 2.7 is particularly noteworthy.

Conversations with Druv Pai about the binomial distribution and probability were helpful in developing the proofs of Lemmas 2.4 and 2.5.

For their support of the undergraduate mathematics community at UC Berkeley, I dedicate this senior thesis to Berkeley's Mathematics Undergraduate Student Association.
References

[1] Émile Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo, 27(1):247–271, December 1909.

[2] S. S. Pillai. On normal numbers. Proceedings of the Indian Academy of Sciences - Section A, 12(2), August 1940.

[3] Ivan Niven and Herbert Zuckerman. On the definition of normal numbers. Pacific Journal of Mathematics, 1(1):103–109, 1951.

[4] D. G. Champernowne. The construction of decimals normal in the scale of ten. Journal of the London Mathematical Society, s1-8(4):254–260, October 1933.

[5] Per Martin-Löf. The definition of random sequences. Information and Control, 9(6):602–619, December 1966.

[6] André Nies. Computability and Randomness. Oxford University Press, January 2009.

[7] Christopher P. Porter. Effective aspects of Bernoulli randomness. Journal of Logic and Computation, 29(6):933–946, October 2019.

[8] Arthur Copeland and Paul Erdős. Note on normal numbers. Bulletin of the American Mathematical Society, 52(10):857–860, October 1946.

[9] William Feller. An Introduction to Probability Theory and Its Applications, Volume 1. A Wiley publication in mathematical statistics. Wiley, 1968.

[10] Roman Vershynin. High-Dimensional Probability. Cambridge University Press, September 2018.

[11] J. W. S. Cassels. On a paper of Niven and Zuckerman. Pacific Journal of Mathematics, 2(4):555–557, December 1952.

[12] Michael Barnsley. Fractals Everywhere. Academic Press, Inc., 1988.

[13] Andrew DeLapo. IFS visualization code. GitHub. https://github.com/adelapo/biased-normality-ifs, 2020.

[14] Yann Bugeaud.