Binarization Trees and Random Number Generation
Sung-il Pae
Abstract. An m-extracting procedure produces unbiased random bits from a loaded dice with m faces. A binarization takes inputs from an m-faced dice and produces bit sequences to be fed into a (binary) extracting procedure to obtain random bits. Thus, binary extracting procedures give rise to m-extracting procedures via a binarization. An entropy-preserving binarization is to be called complete, and such a procedure has been proposed by Zhou and Bruck. We show that complete binarizations exist in abundance, arising naturally from binary trees with m leaves. The well-known leaf entropy theorem and a closely related structure lemma play important roles in the arguments.

Index Terms
Random number generation, binarization, extracting procedures, coin flipping, loaded dice, Peres algorithm, leaf entropy theorem.
I. INTRODUCTION

An m-extracting procedure produces unbiased random bits using a sequence from an i.i.d. source over an alphabet {0, 1, ..., m−1}, regardless of its probability distribution ⟨p_0, p_1, ..., p_{m−1}⟩. When m = 2, the source is a biased coin, and the famous von Neumann trick is 2-extracting: take a pair of coin flips and return random bits by the following rule [1]:

    00 → λ,   01 → 0,   10 → 1,   11 → λ,        (1)

where λ indicates "no output." Because Pr(01) = Pr(10) = p_0 p_1, the resulting bit is unbiased, and the output rate, the average number of output bits per input bit, is p_0 p_1 ≤ 1/4. Elias [2] and Peres [3] extend it by taking inputs of length n ≥ 2 and returning more than one bit at a time. Both methods are asymptotically optimal; as the input size n increases, the output rate approaches the information-theoretic upper bound H(p), the Shannon entropy [4], [5]. Elias's method generalizes naturally from 2-extracting to m-extracting procedures for each m > 2, as discussed in Elias's original paper [2]. However, a similar generalization of Peres's method had been unknown for quite a while and was found only recently [6]. In the meantime, Zhou and Bruck proposed a very interesting scheme that transforms any binary extracting procedure into an m-extracting procedure [7]. For example, Peres's method is turned into an m-extracting procedure via a simple process called "binarization." If the above-mentioned generalizations of Elias and Peres are to be called direct generalizations, their scheme is rather a meta-generalization.
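As a concrete illustration, rule (1) can be sketched in a few lines of Python (the function name is ours):

```python
def von_neumann(bits: str) -> str:
    """Apply rule (1) to consecutive pairs: 01 -> 0, 10 -> 1, 00/11 -> no output.
    A trailing unpaired bit is ignored."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:          # 01 or 10: emit the first bit of the pair
            out.append(a)
    return "".join(out)
```

For example, von_neumann("0110") returns "01", while von_neumann("0011") returns the empty string, since both pairs fall in the no-output cases.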
Moreover, the resulting m-extracting procedure is claimed to be asymptotically optimal if the given 2-extracting procedure is asymptotically optimal. In this paper, such entropy-preserving processes will be called complete binarizations and will be shown to exist in abundance, arising naturally from binary trees with m leaves; the Zhou-Bruck scheme is an instance of them. The main tools in our argument are the well-known leaf entropy theorem and a technical fact that we call the structure lemma. Consider the following binary tree with 5 internal nodes and 6 leaves:
[Figure: a binary tree whose internal nodes are numbered 1-5, with node 1 at the root, and whose leaves are labeled 0-5.]   (2)
Sung-il Pae is with the Department of Computer Engineering, Hongik University, Seoul, Korea (email: [email protected]). This work was presented in part at the 2016 IEEE International Symposium on Information Theory (ISIT 2016), July 10-15, 2016, Barcelona, Spain. This research was supported in part by a Hongik University grant and by the National Research Foundation of Korea (NRF) grant funded by the Korean government (No. 2016R1D1A1B01016531).
The leaf entropy theorem states that, given a probability distribution p = ⟨p_0, ..., p_5⟩ on the leaves, the Shannon entropy H(p) is equal to the weighted sum Σ_{i=1}^{5} P_i H(π_i) of the branching entropies H(π_i) of the nodes, where the weight P_i of node i is the sum of the probabilities of the leaves under it [8], [5], [9]. For example, P_2 = p_0 + p_1 + p_3 + p_4, and π_2 = ⟨(p_0 + p_1 + p_4)/P_2, p_3/P_2⟩. As an interpretation of the theorem, consider a loaded dice X with the probability distribution p on the 6 faces. Each roll of X generates, according to the tree (2), five possible coin tosses X_i with biases π_i, and X_i has an output with probability P_i. For example, if the dice roll X is 1, then the coins X_1, X_2, and X_4, those on the path from the root to leaf 1, give an output, as the tree is conveniently represented by squares (leaf, dice roll) and circles (node, coin toss). The leaf entropy theorem tells us that the amount of information in the dice roll and in the 5 coin tosses is the same. This suggests that the X_i's may be used as sources of randomness to generate unbiased and independent random bits, possibly combined together, at a rate as high as the entropy of X. The mapping X ↦ (X_1, ..., X_5) is a complete binarization: if Ψ is 2-extracting, then Ψ'(X) = Ψ(X_1) ∗ ··· ∗ Ψ(X_5) is 6-extracting. Note that the X_i's are not independent. However, the Ψ(X_i)'s are independent, and therefore we can concatenate them. Moreover, if Ψ is asymptotically optimal, then Ψ' is also asymptotically optimal. If one or more of the X_i's are omitted, then the resulting Ψ' is still 6-extracting, but no longer asymptotically optimal. And the same story holds true of any binary tree.

II. EXTRACTING PROCEDURES AND BINARIZATION
A. Extracting Procedures
Our dice X has m faces with values 0, 1, ..., m−1 and probability distribution ⟨p_0, ..., p_{m−1}⟩. A sequence x = x_1 ... x_n ∈ {0, 1, ..., m−1}^n is considered to be taken from n repeated throws of the dice. Summarized below are some necessary facts on extracting procedures. Refer to [10] and [6] for details.

Definition 1 ([3], [10]). A function f : {0, 1, ..., m−1}^n → {0, 1}^* is m-extracting if for each pair z_1, z_2 in {0, 1}^* such that |z_1| = |z_2|, we have Pr(f(x) = z_1) = Pr(f(x) = z_2), regardless of the distribution ⟨p_0, ..., p_{m−1}⟩.

Definition 2.
A function
Ψ : {0, 1, ..., m−1}^* → {0, 1}^* is called an m-extracting procedure if its restriction to {0, 1, ..., m−1}^n is extracting for every n ≥ 1.

Define Ψ_0 on {0, 1}^2 by the rule (1) and call it the von Neumann function. Extend it by, for the empty string, Ψ_0(λ) = λ; for a nonempty even-length input, Ψ_0(x_1 x_2 ... x_{2n}) = Ψ_0(x_1 x_2) ∗ ··· ∗ Ψ_0(x_{2n−1} x_{2n}), where ∗ is concatenation; and for an odd-length input, drop the last bit and take the remaining even-length bits. Then the resulting function Ψ_0 is a 2-extracting procedure. Of course, there are more interesting extracting procedures. Asymptotically optimal 2-extracting procedures like Elias's [2], [11], [10] and Peres's [3], [12], [6] also extend the von Neumann function but do not simply repeat it.

Denote by S_{(n_0, n_1, ..., n_{m−1})} the subset of {0, 1, ..., m−1}^n that consists of sequences with n_i i's. Then

    {0, 1, ..., m−1}^n = ∪_{n_0 + n_1 + ··· + n_{m−1} = n} S_{(n_0, n_1, ..., n_{m−1})},

and each S_{(n_0, n_1, ..., n_{m−1})} is an equiprobable subset of elements whose probability of occurrence is p_0^{n_0} p_1^{n_1} ··· p_{m−1}^{n_{m−1}}. The size of an equiprobable set is given by a multinomial coefficient:

    (n; n_0, n_1, ..., n_{m−1}) = n! / (n_0! n_1! ··· n_{m−1}!).

When m = 2, an equiprobable set S_{(l,k)} is also written as S_{n,k}, where n = l + k, and its size can be written as an equivalent binomial coefficient as well as the multinomial one: (n; k) = (n; l, k).

Extracting functions can be characterized using the concept of multiset. A multiset is a set with repeated elements; formally, a multiset M on a set S is a pair (S, ν), where ν : S → N is a multiplicity function and ν(s) is called the multiplicity, or the number of occurrences, of s ∈ S. The size |M| of M = (S, ν) is Σ_{s ∈ S} ν(s). For multisets A and B, A ⊎ B is the multiset such that an element occurring a times in A and b times in B occurs a + b times in A ⊎ B.
So |A ⊎ B| = |A| + |B|, and the operation ⊎ is associative. When we write x ∈ M = (S, ν), it simply means that x ∈ S. However, when we use the expression "x ∈ M" as an index, the multiplicity of the elements is taken into account. For example, for multisets A and B, the multiset A ⊎ B can be redefined as {x | x ∈ A or x ∈ B}. By Definition 1, the image of an extracting function consists of multiple copies of {0, 1}^N, the exact full set of binary strings, for various lengths N. For example, the von Neumann procedure defined above sends {0, 1}^6 to 12 copies of {0, 1}, 6 copies of {0, 1}^2, one copy of {0, 1}^3, and 8 copies of {λ}.

Definition 3 ([6]). A multiset A of bit strings is extracting if, for each z that occurs in A, all the bit strings of length |z| occur in A the same number of times as z occurs in A.

For multisets A and B of bit strings, define a new multiset A ∗ B = {s ∗ t | s ∈ A, t ∈ B}; this operation is associative, too. If A and B are extracting, both A ∗ B and A ⊎ B are extracting. Denote by f((C)) the multiset {f(x) | x ∈ C}, or equivalently, (f(C), ν) with ν(z) = |f^{−1}(z) ∩ C| for z ∈ f(C). Note that |f((C))| = |C|. For a disjoint union C ∪ D, we have f((C ∪ D)) = f((C)) ⊎ f((D)). With this notation, Ψ_0(({0, 1}^6)) = 8 · {λ} ⊎ 12 · {0, 1} ⊎ 6 · {0, 1}^2 ⊎ {0, 1}^3. The following lemma reinterprets the definition of extracting function in terms of equiprobable sets and their images.

Lemma 4 ([6]). A function f : {0, 1, ..., m−1}^n → {0, 1}^* is extracting if and only if f((S_{(n_0, n_1, ..., n_{m−1})})) is extracting for each tuple (n_0, n_1, ..., n_{m−1}) of nonnegative integers such that n_0 + n_1 + ··· + n_{m−1} = n.

B. Binarization

Given a function φ : {0, 1, ..., m−1} → {0, 1, λ}, φ(X) is a Bernoulli random variable with distribution ⟨p, q⟩, where

    p = Σ_{φ(i)=0} p_i / s,   q = Σ_{φ(i)=1} p_i / s,   and   s = Σ_{φ(i) ≠ λ} p_i.

Extend φ to {0, 1, ..., m−1}^n by letting, for x = x_1 ... x_n, φ(x) = φ(x_1) ∗ ··· ∗ φ(x_n).
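Under the convention that λ disappears under concatenation, the extension of φ to sequences can be sketched as follows (a minimal illustration; encoding φ as a Python dict, with None playing the role of λ, is our choice):

```python
def extend_phi(phi, xs):
    """Extend a symbol map phi: symbol -> '0' / '1' / None (None plays the
    role of lambda) to a sequence: apply phi symbolwise, dropping lambdas."""
    return "".join(phi[x] for x in xs if phi[x] is not None)

# Example phi: sends 0 and 1 to '0', sends 3 to '1', and is lambda on 2.
phi = {0: "0", 1: "0", 2: None, 3: "1"}
```

For instance, extend_phi(phi, [1, 0, 2, 2, 3]) gives "001": the two occurrences of 2 contribute nothing.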
Then, for an equiprobable set S = S_{(n_0, ..., n_{m−1})}, its image under φ is also equiprobable, that is, φ(S) = S_{(l,k)}, where

    l = Σ_{φ(i)=0} n_i,   k = Σ_{φ(i)=1} n_i.

A binarization takes a sequence over {0, 1, ..., m−1} and outputs several binary sequences that are to be separately fed into a binary extracting procedure and then concatenated together to obtain random bits.

Definition 5.
A collection of functions
Φ = {Φ_i : {0, 1, ..., m−1} → {0, 1, λ} | i = 1, ..., M} is called a binarization if, when extended to {0, 1, ..., m−1}^n, for any given 2-extracting procedure Ψ, the mapping x ↦ Ψ'(x) = Ψ(Φ_1(x)) ∗ ··· ∗ Ψ(Φ_M(x)) is an m-extracting function. Here, each Φ_i is called a component of Φ, and we often regard Φ as a mapping on {0, 1, ..., m−1}^* given by Φ(x) = (Φ_1(x), ..., Φ_M(x)). For an asymptotically optimal 2-extracting procedure Ψ, if the resulting Ψ' is asymptotically optimal, then Φ is called a complete binarization.

Now, for a function φ : {0, 1, ..., m−1} → {0, 1, λ}, let

    supp_0(φ) = {x | φ(x) = 0},
    supp_1(φ) = {x | φ(x) = 1},
    supp(φ) = {x | φ(x) ≠ λ} = supp_0(φ) ∪ supp_1(φ),

and call them the 0-support, 1-support, and support of φ, respectively. Call φ degenerate if its 0-support or 1-support is empty, so that φ(X) is a degenerate Bernoulli random variable.

Consider a binary tree with m external nodes labeled uniquely with 0, 1, ..., m−1. For an internal node v, define a function φ_v : {0, 1, ..., m−1} → {0, 1, λ} as follows:

    φ_v(x) = 0 if x ∈ leaf_0(v),  1 if x ∈ leaf_1(v),  λ otherwise,

where leaf_0(v) (leaf_1(v), respectively) is the set of external nodes in the left (right, respectively) subtree of v. Since there are exactly m−1 internal nodes, we name them uniquely with 1, ..., m−1, with 1 the root node, and denote the corresponding functions by Φ_1, ..., Φ_{m−1}. Call such trees m-binarization trees. For example, the tree (2) that we considered in the introduction is a 6-binarization tree and defines the following functions:

    x   Φ_1(x)  Φ_2(x)  Φ_3(x)  Φ_4(x)  Φ_5(x)
    0   0       0       λ       1       1
    1   0       0       λ       0       λ
    2   1       λ       0       λ       λ
    3   0       1       λ       λ       λ
    4   0       0       λ       1       0
    5   1       λ       1       λ       λ

Theorem 6.
For an m-binarization tree, the set of associated functions Φ = {Φ_1, ..., Φ_{m−1}} is a complete binarization. Also, any nonempty subset of Φ is a binarization.

For a proof, we use the leaf entropy theorem together with a technical lemma that we call
the Structure Lemma.
The coin X_i = Φ_i(X) has an output with probability P_i = Σ_{j ∈ supp(Φ_i)} p_j, and its distribution is π_i = ⟨p, q⟩, where

    p = Σ_{j ∈ supp_0(Φ_i)} p_j / P_i,   q = Σ_{j ∈ supp_1(Φ_i)} p_j / P_i.

Stated below is the leaf entropy theorem in our context of m-binarization trees.

Theorem 7 (Leaf Entropy Theorem). The branching entropies of the Φ_i(X), weighted by the probabilities P_i, sum up to the entropy of X:

    H(X) = Σ_{i=1}^{m−1} P_i H(π_i).

The following is the main technical tool of this work, and we prove it in Section IV.
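Theorem 7 is easy to check numerically. The sketch below assumes our reading of tree (2): node 1 has nodes 2 and 3 as children, node 2 has node 4 and leaf 3, node 4 has leaf 1 and node 5, node 5 has leaves 4 and 0, and node 3 has leaves 2 and 5. The theorem itself holds for any binary tree, so the check is insensitive to that reading.

```python
import math

def shannon(dist):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def leaf_sum(t, p):
    """Total probability of the leaves of subtree t (ints are leaf labels)."""
    return p[t] if isinstance(t, int) else leaf_sum(t[0], p) + leaf_sum(t[1], p)

def weighted_branch_entropies(t, p):
    """Sum of P_v * H(pi_v) over the internal nodes v of tree t."""
    if isinstance(t, int):
        return 0.0
    P = leaf_sum(t, p)
    l = leaf_sum(t[0], p)
    return (P * shannon((l / P, 1 - l / P))
            + weighted_branch_entropies(t[0], p)
            + weighted_branch_entropies(t[1], p))

# Tree (2) as nested pairs: node 5 = (4, 0), node 4 = (1, node 5),
# node 2 = (node 4, 3), node 3 = (2, 5), root = (node 2, node 3).
TREE = (((1, (4, 0)), 3), (2, 5))
p = [0.1, 0.2, 0.15, 0.25, 0.2, 0.1]
```

Here weighted_branch_entropies(TREE, p) agrees with shannon(p) up to floating-point error, as the leaf entropy theorem asserts, and the same holds for any other binary tree on the six leaves.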
Lemma 8 (Structure Lemma). Let
Φ = {Φ_1, ..., Φ_{m−1}} be the set of functions defined by an m-binarization tree. Then the mapping Φ : x ↦ Φ(x) = (Φ_1(x), ..., Φ_{m−1}(x)) gives a one-to-one correspondence between an equiprobable subset S = S_{(n_0, n_1, ..., n_{m−1})} and Φ_1(S) × ··· × Φ_{m−1}(S).

Proof of Theorem 6. Let Ψ be a 2-extracting procedure. For an equiprobable set S, each S_i = Φ_i(S) is equiprobable, and thus Ψ((S_i)) is extracting, by Lemma 4. Now, by Lemma 8, Ψ'((S)) = Ψ((S_1)) ∗ ··· ∗ Ψ((S_{m−1})). Since each Ψ((S_i)) is extracting, their concatenation Ψ'((S)) is extracting, by the associativity of concatenation of multisets and the fact that a concatenation of extracting multisets is extracting. The same holds true even if we omit some components of Φ. Since the coin X_i = Φ_i(X) has the distribution π_i and outputs with probability P_i, if Ψ is asymptotically optimal, then the output rate of Ψ(X_i) converges to P_i H(π_i) as the input size n → ∞. Therefore, the output rate of Ψ' approaches Σ P_i H(π_i), which equals H(X) by the leaf entropy theorem.

III. EXAMPLES
A. An Entropy-Preserving Binarization
For a symbol x ∈ {0, 1, ..., m−1} and 1 ≤ i ≤ m−1, consider

    x^(i) = 0 if x < i,  1 if x = i,  λ if x > i.

When m = 6, their values are as follows:

    x   Pr(x)  x^(1)  x^(2)  x^(3)  x^(4)  x^(5)
    0   p_0    0      0      0      0      0
    1   p_1    1      0      0      0      0
    2   p_2    λ      1      0      0      0
    3   p_3    λ      λ      1      0      0
    4   p_4    λ      λ      λ      1      0
    5   p_5    λ      λ      λ      λ      1

These functions are associated with the following 6-binarization tree:

[Figure: the 6-binarization tree for x^(1), ..., x^(5); each internal node i separates the leaves {0, ..., i−1} from the leaf {i}.]
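The components x^(i) can be transcribed directly; the 0/1 orientation follows the table above, and the function names are ours:

```python
def component(x: int, i: int):
    """x^(i): '0' if x < i, '1' if x == i, None (lambda) if x > i."""
    if x < i:
        return "0"
    if x == i:
        return "1"
    return None

def binarize(xs, i):
    """x^(i) extended to a sequence: apply symbolwise and drop the lambdas."""
    out = []
    for x in xs:
        b = component(x, i)
        if b is not None:
            out.append(b)
    return "".join(out)
```

For example, binarize([0, 1, 2], 1) gives "01" (the symbol 2 contributes nothing), while binarize([0, 1, 2], 2) gives "001".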
For x = x_1 ... x_n ∈ {0, 1, ..., m−1}^n, define x^(i) = x_1^(i) ∗ ··· ∗ x_n^(i). So for a sequence x of length n, x^(i) is a binary sequence of length at most n. For a binary extracting procedure Ψ, the function Ψ' : {0, 1, ..., m−1}^n → {0, 1}^*, defined by Ψ'(x) = Ψ(x^(1)) ∗ ··· ∗ Ψ(x^(m−1)), is m-extracting, and if Ψ is asymptotically optimal, then so is Ψ'.

To illustrate the structure lemma, for m = 3, consider an equiprobable subset S = S_{(1,2,1)} ⊂ {0, 1, 2}^4, and let S^(i) = {x^(i) | x ∈ S}. Then S^(i) is another equiprobable set in {0, 1}^{n'}. For S = S_{(1,2,1)}, each x ∈ S yields, for example,

    x = 0112:  x^(1) = 011,  x^(2) = 0001,

and we can see that, as multiset images of x ↦ x^(1) and x ↦ x^(2),

    S((1)) = 4 · S_{(1,2)},   S((2)) = 3 · S_{(3,1)}.

Note that

    |S_{(1,2,1)}| = 4!/(1! 2! 1!) = 3!/(1! 2!) × 4!/(3! 1!) = |S_{(1,2)}| × |S_{(3,1)}|.

Of course, by the structure lemma, S is in one-to-one correspondence with S^(1) × S^(2).

B. Zhou-Bruck Binarization
The following method was proposed by Zhou and Bruck [7]. For x ∈ {0, 1, ..., m−1}, let x' be the ⌈lg m⌉-bit binary expansion of x, and for α ∈ {0, 1}^*, let

    x_α = a if αa is a prefix of x', and λ otherwise.

That is, x_α is the bit that immediately follows the prefix α in the standard binary expansion of x. For example, when m = 6, we have the following functions:

    x   x'    x_λ  x_0  x_1  x_00  x_01  x_10
    0   000   0    0    λ    0     λ     λ
    1   001   0    0    λ    1     λ     λ
    2   010   0    1    λ    λ     0     λ
    3   011   0    1    λ    λ     1     λ
    4   100   1    λ    0    λ     λ     0
    5   101   1    λ    0    λ     λ     1
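The table can be generated mechanically; a minimal sketch (the function name is ours):

```python
import math

def zb_component(x: int, alpha: str, m: int = 6):
    """x_alpha: the bit of x's ceil(lg m)-bit binary expansion that
    immediately follows the prefix alpha, or None (lambda) if the
    expansion does not properly extend alpha."""
    w = max(1, math.ceil(math.log2(m)))
    s = format(x, "0{}b".format(w))     # e.g., 4 -> "100" for m = 6
    if s.startswith(alpha) and len(s) > len(alpha):
        return s[len(alpha)]
    return None
```

For example, zb_component(1, "00") returns "1", and zb_component(2, "1") returns None; ranging over x = 0, ..., 5 with alpha = "1" produces only "0" or None, exhibiting the degeneracy of x_1 noted next.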
After the degenerate x_1 is removed, the remaining functions are associated with the following 6-binarization tree:

[Figure: the 6-binarization tree corresponding to x_λ, x_0, x_00, x_01, and x_10.]

The mapping x ↦ Ψ'(x) = Ψ(x_λ) ∗ Ψ(x_0) ∗ ··· ∗ Ψ(x_10) is an asymptotically optimal m-extracting procedure if Ψ is asymptotically optimal.

IV. THE STRUCTURE LEMMA
Given a binarization tree and its subtree T, let X_T be the restriction of X to the leaf set of T. The leaf entropy theorem is proved by induction using the following recursion:

    H(X_T) = 0, if T is a leaf;
    H(X_T) = H(π) + p H(X_{T_0}) + q H(X_{T_1}), otherwise,        (3)

where, for nonempty T, T_0 and T_1 are the left and right subtrees and π = ⟨p, q⟩ is the branching distribution of the root of T. The structure lemma holds for a similar reason.

Proof of Structure Lemma.
For an equiprobable subset S = S_{(n_0, ..., n_{m−1})} and a subtree T of the given binarization tree, let S_T be the restriction of S to the leaf set of T. Then we have a similar recursion:

    S_T ≅ a singleton, if T is a leaf;
    S_T ≅ S_{(l,k)} × S_{T_0} × S_{T_1}, otherwise,        (4)

where, for nonempty T and φ the branching function associated with the root of T, T_0 and T_1 are the left and right subtrees and

    l = Σ_{φ(i)=0} n_i,   k = Σ_{φ(i)=1} n_i.

Recall that a binary tree is recursively defined to be a set of nodes that is either an empty set (a terminal node) or consists of a root node, a left subtree, and a right subtree, both of which are binary trees.
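The correspondence behind (4) and its inverse can be sketched directly. Here the branching function φ is encoded as a dict from symbols to '0'/'1' (symbols outside the subtree do not occur in x_T), and the example data follow the worked example of this section:

```python
def split(x_T: str, phi: dict):
    """x_T -> (phi(x_T), x_T0, x_T1): the branch bits together with the
    restrictions of x_T to the left and right subtrees."""
    bits = "".join(phi[c] for c in x_T)
    left = "".join(c for c in x_T if phi[c] == "0")
    right = "".join(c for c in x_T if phi[c] == "1")
    return bits, left, right

def merge(bits: str, left: str, right: str) -> str:
    """Inverse map: rebuild x_T by drawing from left on 0 and right on 1."""
    i0, i1 = iter(left), iter(right)
    return "".join(next(i0) if b == "0" else next(i1) for b in bits)

# Branching function at node 2 of tree (2): 0, 1, 4 go left; 3 goes right.
phi = {"0": "0", "1": "0", "4": "0", "3": "1"}
```

With this phi, split("10331401", phi) gives ("00110000", "101401", "33"), and merge reverses the split exactly.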
First, if T is a leaf with label i, then S_T is a singleton set that consists of the single string of n_i i's; hence the first part of (4). When T is not a leaf, the correspondence S_T → S_{(l,k)} × S_{T_0} × S_{T_1} is given by x ↦ (φ(x), x_{T_0}, x_{T_1}), where x_{T_0} and x_{T_1} are restrictions of x. This correspondence is one-to-one because φ(x) encodes the branching with which x is recovered from x_{T_0} and x_{T_1}, giving an inverse mapping S_{(l,k)} × S_{T_0} × S_{T_1} → S_T. For example, consider tree (2) and suppose that T is the subtree rooted at node 2. For x = 102235315401, the following shows the restrictions of x and the Φ_i(x)'s.
    x_T = 10331401          (Φ_2(x) = 00110000)
    x_{T_0} = 101401        (Φ_4(x) = 010110)       x_{T_1} = 33
    leaf 1: 111             node 5: 040              (Φ_5(x) = 101)
    leaf 4: 4               leaf 0: 00

By taking symbols one by one from x_{T_0} = 101401 and x_{T_1} = 33, according to Φ_2(x) = 00110000 = (b_i)_{i=1}^{8} (if b_i is 0, take the next symbol from x_{T_0}, and otherwise from x_{T_1}), we recover x_T = 10331401. Induction on subtrees proves the lemma. See [13] for an alternative proof.

V. REMARKS
A. Leaf Entropy Theorem and Structure Lemma
The leaf entropy theorem is well known in information theory, and it follows from the grouping rule of entropy (see, e.g., the defining property 3 of entropy in Shannon's original work [4, p. 49], or Problem 2.27 of [5]), which is essentially the recursion (3) in Section IV. As we saw, the structure lemma is proved similarly, hinting that the two are closely related. In fact, using the asymptotic equipartition property (AEP) [5], the structure lemma implies the leaf entropy theorem.

For a large n, the typical set A^(n) consists of the sequences x = (x_1, x_2, ..., x_n) that contain about n_0 = p_0 n 0's, n_1 = p_1 n 1's, ..., n_{m−1} = p_{m−1} n (m−1)'s. Let S = S_{(n_0, ..., n_{m−1})}. The asymptotic equipartition property implies that

    lim_{n→∞} (1/n) log |S| = H(X).

On the other hand, by the Structure Lemma, S ≅ S_1 × ··· × S_{m−1}, where S_i = Φ_i(S). Note that S_i = S_{(l_i, k_i)}, where

    l_i = Σ_{j ∈ supp_0(Φ_i)} n_j,   k_i = Σ_{j ∈ supp_1(Φ_i)} n_j,

and (l_i + k_i)/n → P_i and ⟨l_i/(l_i + k_i), k_i/(l_i + k_i)⟩ → π_i as n → ∞. Since (1/(l_i + k_i)) log |S_i| → H(π_i), we have (1/n) log |S_i| → P_i H(π_i), and

    (1/n) log |S| = (1/n) Σ_{i=1}^{m−1} log |S_i| → Σ_{i=1}^{m−1} P_i H(π_i),

as n → ∞.

B. Generalization of the Structure Lemma to Non-Binary Trees
The leaf entropy theorem holds for general trees. The structure lemma can also be generalized to trees whose nodes are not necessarily of degree 2 and whose leaves have unique labels, although in that case the name "binarization tree" might not be appropriate.

C. m-ary Asymptotically Optimal Extracting Algorithm

As an immediate application, take the original binary Peres procedure Ψ and apply Theorem 6. The resulting Ψ' is an m-ary asymptotically optimal extracting procedure. As with the original Peres algorithm and its generalization, Ψ' runs in O(n log n) time, for a fixed m, because Φ_i(x) is computed in linear time and |Φ_i(x)| ≤ n for each i.

D. Other Applications of Binarization Trees
The Peres algorithm is a simple extracting algorithm defined recursively using the famous von Neumann trick as a base, and its output rate approaches the information-theoretic upper bound [3]. However, it is relatively hard to explain why it works, and it appears to be partly due to this difficulty that its generalization to many-valued sources was discovered only recently [6]. Binarization trees provide a new, unified way to understand the original Peres algorithm and its generalizations, and they facilitate finding many new Peres-style recursive algorithms [14]. By coming up with an appropriate binarization tree (not necessarily based on a binary tree but possibly a general tree), a Peres-style recursion follows. As with our main result, Theorem 6, the Peres-style recursive algorithms are extracting by the corresponding structure lemma, and asymptotically optimal by the leaf entropy theorem. The structure lemma also gives many different ways to factorize a set of m-combinations into sets of binary combinations. We can use this idea to give a ranking on m-combinations, which can be seen as a mixed-radix number system whose radices are binomial coefficients [15].

E. Binarization Trees and DDG-trees
DDG-trees (discrete distribution generation trees) work in the opposite way of binarization trees [16], [17], [11], [10]. With a binarization tree, the leaves correspond to the source and various coins are produced. With DDG-trees, the nodes correspond to the source and target symbols on the leaves are produced. The essential differences are that a DDG-tree has the same branching distribution at every node and that its leaves need not have unique labels. If the various source coins with distributions π_i are provided, and the coins are tossed starting from the root in the fashion of DDG-trees, then we arrive at the leaves with the target probability distribution ⟨p_0, ..., p_{m−1}⟩. Therefore, a binarization tree can be regarded as a generalization of a DDG-tree with more than one source and unique labels on the leaves.

REFERENCES

[1] J. von Neumann, "Various techniques for use in connection with random digits. Notes by G. E. Forsythe," in
Monte Carlo Method, Applied Mathematics Series, vol. 12. U.S. National Bureau of Standards, Washington D.C., 1951, pp. 36-38; reprinted in von Neumann's Collected Works (Pergamon Press, 1963), pp. 768-770.
[2] P. Elias, "The efficient construction of an unbiased random sequence," The Annals of Mathematical Statistics, vol. 43, no. 3, pp. 865-870, 1972.
[3] Y. Peres, "Iterating von Neumann's procedure for extracting random bits," Annals of Statistics, vol. 20, no. 1, pp. 590-597, 1992.
[4] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana: The University of Illinois Press, 1964.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006.
[6] S. Pae, "A generalization of Peres's algorithm for generating random bits from loaded dice," IEEE Transactions on Information Theory, vol. 61, no. 2, 2015.
[7] H. Zhou and J. Bruck, "A universal scheme for transforming binary algorithms to generate random bits from loaded dice," CoRR, vol. abs/1209.0726, 2012. [Online]. Available: http://arxiv.org/abs/1209.0726
[8] J. L. Massey, "The entropy of a rooted tree with probabilities," in Proceedings of the 1983 IEEE International Symposium on Information Theory, 1983.
[9] D. E. Knuth, The Art of Computer Programming, Sorting and Searching, 2nd ed. Addison-Wesley, 1998, vol. 3.
[10] S. Pae and M. C. Loui, "Randomizing functions: Simulation of discrete probability distribution using a source of unknown distribution," IEEE Transactions on Information Theory, vol. 52, no. 11, pp. 4965-4976, November 2006.
[11] ——, "Optimal random number generation from a biased coin," in Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 2005, pp. 1079-1088.
[12] S. Pae, "Exact output rate of Peres's algorithm for random number generation," Inf. Process. Lett., vol. 113, no. 5-6, pp. 160-164, 2013.
[13] ——, "Binarizations in random number generation," in IEEE International Symposium on Information Theory, ISIT 2016, Barcelona, Spain, July 10-15, 2016, pp. 2923-2927. [Online]. Available: https://doi.org/10.1109/ISIT.2016.7541834
[14] ——, "Peres-style recursive algorithms," 2018, submitted.
[15] ——, "Recursive enumerations of combinations," 2018, in preparation.
[16] D. E. Knuth and A. C.-C. Yao, "The complexity of nonuniform random number generation," in Algorithms and Complexity: New Directions and Recent Results. Proceedings of a Symposium, J. F. Traub, Ed., Carnegie-Mellon University, Computer Science Department. New York, NY: Academic Press, 1976, pp. 357-428; reprinted in Knuth's Selected Papers on Analysis of Algorithms (CSLI, 2000).
[17] T. S. Han and M. Hoshi, "Interval algorithm for random number generation,"