An Improved Exact Sampling Algorithm for the Standard Normal Distribution
Yusong Du, Baoying Fan, and Baodian Wei

School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
Guangdong Key Laboratory of Information Security Technology, Guangzhou 510006, China
[email protected]
Abstract
In 2016, Karney proposed an exact sampling algorithm for the standard normal distribution. In this paper, we study the computational complexity of this algorithm under the random deviate model. Specifically, Karney's algorithm requires access to an infinite sequence of independently and uniformly random deviates over the range (0, 1). We estimate the expected number of uniform deviates that it consumes, and present improved algorithms that lower this consumption, both for sampling the discrete Gaussian distribution D_{Z+,1} and for sampling the Bernoulli distribution B_{exp(−x(2k+x)/2)}.

∗ This work was supported by National Key R&D Program of China (2017YFB0802500), the Major Program of Guangdong Basic and Applied Research (2019B030302008), National Natural Science Foundations of China (Grant Nos. 61672550, 61972431), and the Fundamental Research Funds for the Central Universities (Grant No. 19lgpy217).

1 Introduction
We denote the set of real numbers by R. The Gaussian function on R with parameter σ > 0 and µ ∈ R evaluated at x ∈ R can be defined by

ρ_{σ,µ}(x) = exp(−(x − µ)^2/(2σ^2)).

Normalizing ρ_{σ,µ}(x) by its total measure ∫_{x∈R} ρ_{σ,µ}(x) dx = √(2π)·σ over R, we obtain the probability density function of the (continuous) Gaussian distribution N(µ, σ^2), namely the normal distribution of mean µ and variance σ^2.

In 2016, Karney [6] proposed an exact sampling algorithm for the standard normal distribution N(0, 1) of mean 0 and variance 1. This algorithm uses rejection sampling [1], requires no floating-point arithmetic, and generates sample values that conform to N(0, 1) without any statistical discrepancy, provided that we have a source of perfectly uniform deviates over the range (0, 1).

One of its key subroutines is sampling from the discrete Gaussian distribution of mean µ = 0 and variance 1 over the set of all the non-negative integers. We denote this distribution by D_{Z+,1}, where Z+ is the set of non-negative integers. This distribution is simply the standard normal distribution restricted so that its support is Z+. Generally, for all k ∈ Z+, a discrete Gaussian distribution of mean µ and variance σ^2 over the set of non-negative integers Z+ can be defined by

D_{Z+,σ,µ}(k) = ρ_{σ,µ}(k)/ρ_{σ,µ}(Z+),

where ρ_{σ,µ}(Z+) = Σ_{x∈Z+} ρ_{σ,µ}(x). By convention, the subscript µ is omitted when it is taken to be 0. Karney's algorithm also involves exactly sampling from the Bernoulli distribution B_{exp(−x(2k+x)/2)} with an integer k ∈ Z+ and a real number x ∈ (0, 1), i.e., generating a Boolean value that is true with probability exp(−x(2k+x)/2).

In this paper, we study the computational complexity of Karney's exact sampling algorithm for the standard normal distribution under the random deviate model. In this model, we have access to an infinite sequence of independently and uniformly random deviates over the range (0, 1), and the complexity is measured by the expected number of uniform deviates consumed. We find that the procedure of sampling the proposal distribution D_{Z+,1}(k) = ρ_1(k)/ρ_1(Z+) in Karney's algorithm is not optimal. It uses about 4.826 Bernoulli random values from B_{1/√e} on average (see Proposition 2), and each Bernoulli random value from B_{1/√e} requires √e uniform deviates on average. We will give an improved algorithm with lower uniform deviate consumption for sampling D_{Z+,1}, which only requires about 3.
684 Bernoulli random values from B_{1/√e} on average to output an integer from D_{Z+,1} (see Proposition 3). We also find that the suggested way of sampling from the Bernoulli distribution B_{exp(−x(2k+x)/2)} in Karney's algorithm consumes about 2.194 uniform deviates on average. This is a relatively large number. We will present an alternative algorithm so that one may avoid generating the Bernoulli value in the suggested way and reduce the expected number of uniform deviates used to about 2.02.

In fact, as a rejection sampling algorithm, its acceptance/rejection rate can be regarded as a measure of its computational complexity. In Karney's paper [6], it has been indicated that the rejection rate of the sampling algorithm for the standard normal distribution is √(2/π)/(1 − 1/√e) ≈ 2.03, that is, the algorithm needs to run about 2.
03 times on average before it outputs a sample value. However, the rejection rate is only a very "coarse-grained" complexity model, which could not accurately reflect the actual computational cost. Therefore, in order to analyze the computational complexity of Karney's algorithm more accurately, we consider the expected number of uniform deviates used, namely, the random deviate model, which not only covers the computational complexity of sampling the proposal distribution D_{Z+,1} but also reflects the computational cost required by the rejection operation. The impact of the rejection rate on the complexity of the sampling algorithm can also be fully reflected in the random deviate model.

The computational complexity of von Neumann's exact sampling algorithm for the exponential distribution e^{−x} with x ≥ 0 was considered by Karney, who reduced the expected number of uniform deviates used from e/(1 − e^{−1}) ≈ 4.30 to √e/(1 − 1/√e) ≈ 4.19 (see Algorithm E in [6]). Our work in this paper is of a similar nature. We study the computational complexity by analyzing and reducing the expected number of uniform deviates used by Karney's exact sampling algorithm for the standard normal distribution, as we note that there is no theoretical estimate of the expected number of uniform deviates for this algorithm.

Another common complexity model for random sampling algorithms is the random bit model, in which the complexity is measured by the expected number of uniformly random bits used by the sampling algorithm [7, 4, 2]. Karney also indicated that the expected number of random bits consumed by his algorithm for the standard normal distribution is about 30.
0, but this was only an empirical value obtained by experiment [6]. A uniformly random deviate is always made up of an indefinite number of uniformly random bits, so the random bit model is a more "fine-grained" complexity model as compared to the random deviate model. Nonetheless, we will only use the random deviate model to analyze the computational complexity in this work, because the random source required by Karney's algorithm is mainly in the form of uniform deviates over the range (0, 1).

2 Karney's Sampling Algorithm

Algorithm 1 is Karney's algorithm for sampling exactly from the standard (continuous) normal distribution N(0, 1) of mean 0 and variance 1.
Algorithm 1 [6] Sampling from the standard normal distribution N(0, 1)
Output: a sample value from N(0, 1)
1. select k ∈ Z+ with probability exp(−k/2) · (1 − exp(−1/2)).
2. accept k with probability exp(−k(k − 1)/2), otherwise goto step 1.
3. sample a uniformly random number x ∈ (0, 1).
4. accept x with probability exp(−x(2k + x)/2), otherwise goto step 1.
5. set s ← ±1 with equal probabilities.
6. return s(k + x).

From the perspective of rejection sampling, in Algorithm 1, we can see that step 1 and step 2 form a rejection sampling procedure, which samples k ∈ Z+ from the discrete Gaussian distribution D_{Z+,1}(k) = ρ_1(k)/ρ_1(Z+). Step 4 is equivalent to sampling from the Bernoulli distribution B_{exp(−x(2k+x)/2)}. The correctness can be seen from the fact that

ρ_1(k) · exp(−x(2k + x)/2) = exp(−(k + x)^2/2).

In Algorithm 1, in order to exactly sample k ∈ Z+ with relative probability density ρ_1(k) = exp(−k^2/2), one generates Bernoulli random values from B_{1/√e} by applying Algorithm 2. Specifically, one applies Algorithm 2 repeatedly (k + 1) times to select an integer k ≥ 0 with probability exp(−k/2) · (1 − exp(−1/2)) (step 1 in Algorithm 1), and then applies it k(k −
1) times to accept or reject k with probability exp(−k(k − 1)/2) (step 2 in Algorithm 1).

Algorithm 2 [6] Generating a Bernoulli random value which is true with probability 1/√e
Output: a Boolean value according to B_{1/√e}
1. sample uniform deviates u_1, u_2, . . . with u_i ∈ (0, 1) and determine the maximum value n ≥ 0 such that 1/2 > u_1 > u_2 > . . . > u_n.
2. return true if n is even, otherwise return false.

Algorithm 2 is adapted from von Neumann's algorithm [9] for exactly sampling from the exponential distribution e^{−x} for real x > 0. More precisely, for x ∈ (0, 1), the probability that the length of the longest decreasing sequence x > u_1 > u_2 > . . . > u_n is n is x^n/n! − x^{n+1}/(n + 1)!, and the probability that n is even is exactly equal to

(1 − x) + (x^2/2! − x^3/3!) + . . . = Σ_{n=0}^∞ (x^{2n}/(2n)! − x^{2n+1}/(2n + 1)!) = e^{−x}.

In Algorithm 1, step 4 will also be implemented by using a specifically designed algorithm, so that it produces no statistical discrepancy. The main idea is to sample two sets of uniform deviates u_1, u_2, . . . and v_1, v_2, . . . from (0, 1), and to determine the maximum value n ≥ 0 such that x > u_1 > u_2 > . . . > u_n and v_i < (2k + x)/(2k + 2) for 1 ≤ i ≤ n. If n is even, it returns true, and the probability is exactly equal to

1 − x · (2k + x)/(2k + 2) + (x^2/2!) · ((2k + x)/(2k + 2))^2 − (x^3/3!) · ((2k + x)/(2k + 2))^3 + . . . = exp(−x(2k + x)/(2k + 2)).

This procedure can be summarized as Algorithm 3.
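As an illustration, the two-sequence construction just described, and Algorithm 1 built on top of it, can be sketched in floating-point arithmetic as follows (the function names are ours, and the comparisons against floating-point probabilities are only a stand-in: the exact algorithms compare lazily generated digit streams and never evaluate exp):

```python
import math
import random

def bernoulli_exp_ratio_sketch(k, x, rng=random.random):
    # True with probability exp(-x*(2k+x)/(2k+2)): return the parity of the
    # longest chain x > u_1 > ... > u_n whose links all pass v_i < (2k+x)/(2k+2).
    ratio = (2 * k + x) / (2 * k + 2)
    y, n = x, 0
    while True:
        u = rng()
        if u >= y or rng() >= ratio:   # chain broken, or the v-test failed
            break
        y, n = u, n + 1
    return n % 2 == 0

def karney_normal_sketch(rng=random.random):
    # Floating-point sketch of Algorithm 1.  Step 4 applies the construction
    # above (k + 1) times, since exp(-x(2k+x)/2) = exp(-x(2k+x)/(2k+2))^(k+1).
    c = math.exp(-0.5)                 # success probability 1/sqrt(e)
    while True:
        k = 0                          # step 1: P(k) = c^k * (1 - c)
        while rng() < c:
            k += 1
        if any(rng() >= c for _ in range(k * (k - 1))):
            continue                   # step 2: accept k w.p. exp(-k(k-1)/2)
        x = rng()                      # step 3
        if not all(bernoulli_exp_ratio_sketch(k, x, rng) for _ in range(k + 1)):
            continue                   # step 4: accept x w.p. exp(-x(2k+x)/2)
        return (k + x) if rng() < 0.5 else -(k + x)   # steps 5 and 6
```

For example, with k = 1 and x = 1/2 the helper returns true with probability exp(−0.3125) ≈ 0.732, and the sampler's output is standard normal.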
Algorithm 3 [6] Generating a Bernoulli random value which is true with probability exp(−x(2k + x)/(2k + 2)), with an integer k ≥ 0 and a real number x ∈ (0, 1)
Output: a Boolean value according to B_{exp(−x(2k+x)/(2k+2))}
1. set y ← x, n ← 0.
2. sample a uniform deviate z ∈ (0, 1); goto step 6 unless z < y.
3. set f ← C(2k + 2); if f < 0, goto step 6.
4. sample a uniform deviate r ∈ (0, 1) if f = 0, and goto step 6 unless r < x.
5. set y ← z, n ← n + 1; goto step 2.
6. return true if n is even, otherwise return false.

In Algorithm 3, C(m) with m = 2k + 2 is a random selector, which returns −
1, 0 and 1 with probability 1/m, 1/m and 1 − 2/m, respectively. By applying Algorithm 3 at most (k + 1) times, since

exp(−x(2k + x)/2) = (exp(−x(2k + x)/(2k + 2)))^{k+1},

one can obtain a Bernoulli random value which is true with probability exp(−x(2k + x)/2) for given k and x.

3 Sampling from D_{Z+,1}

In this section, we analyze the expected number of uniform deviates used by step 1 and step 2 in Algorithm 1, which generate a non-negative integer according to D_{Z+,1}. Then, we give an improved sampling algorithm with lower uniform deviate consumption.

3.1 The Complexity of Karney's Algorithm for Sampling from D_{Z+,1}

Proposition 1.
Algorithm 2 uses e^{1/2} = √e uniform deviates on average to output a Bernoulli value according to B_{1/√e}.

Proof. Let x = 1/2. Algorithm 2 uses a uniform deviate for each comparison, and the probability that the length of the longest decreasing sequence is n is x^n/n! − x^{n+1}/(n + 1)!. Thus, the expected number of uniform deviates used can be given by

(1 − x) · 1 + (x − x^2/2!) · 2 + (x^2/2! − x^3/3!) · 3 + . . . = Σ_{n=1}^∞ (x^{n−1}/(n − 1)! − x^n/n!) · n = e^x.

That is, Algorithm 2 uses e^x = e^{1/2} = √e ≈ 1.6487 uniform deviates on average to output a Bernoulli value.

Let x be an arbitrary number over the range (0, 1). Replacing 1/2 with x in Algorithm 2, we can get a Bernoulli random value which is true with probability e^{−x}, namely sampling from B_{e^{−x}} with x ∈ (0, 1). That is, we sample uniform deviates u_1, u_2, . . . with u_i ∈ (0, 1) and determine the maximum value n ≥ 0 such that x > u_1 > u_2 > . . . > u_n; then we get true if n is even, or false if n is odd. This can be viewed as a generalized version of Algorithm 2, which has been used implicitly in Karney's algorithms. It is clear that Proposition 1 also holds for any x ∈ (0, 1).

Corollary 1. Let x ∈ (0, 1). Replacing 1/2 with x in Algorithm 2, we can use e^x uniform deviates on average to obtain a Bernoulli value which is true with probability e^{−x}.

Since the expected number of uniform deviates used by Algorithm 2 is a constant, it suffices to consider the expected number of Bernoulli random values from B_{1/√e} that are required for step 1 and step 2 in Algorithm 1.

Proposition 2.
The expected number of Bernoulli random values from B_{1/√e} used by step 1 and step 2 for sampling from D_{Z+,1} in Algorithm 1 is about 4.826.

Proof. Let p = 1/√e and q = 1 − p. Assume that k ≥ 0 is the integer selected in step 1. The expected number of Bernoulli random values from B_{1/√e} used by Algorithm 1 is

Σ_{k=0}^∞ (k + 1 + k(k − 1)) · p^{k(k−1)} · p^k · q

if k is accepted in step 2, and the expected number of Bernoulli random values is

Σ_{k=2}^∞ p^k q · Σ_{j=1}^{k(k−1)} (k + 1 + j) · p^{j−1} · q

if k is rejected in step 2. In particular, if k = 0 or k = 1, then k is directly accepted in step 2. Therefore, the expected number of Bernoulli random values from B_{1/√e} used by Algorithm 1 for executing step 1 and step 2 once can be given by

Σ_{k=0}^∞ (k + 1 + k(k − 1)) · p^{k(k−1)} · p^k · q + Σ_{k=2}^∞ p^k q · Σ_{j=1}^{k(k−1)} (k + 1 + j) · p^{j−1} · q ≈ 3.330.

Since step 1 and step 2 output an integer with probability Σ_{k=0}^∞ p^{k^2} q ≈ 0.690, the expected number of Bernoulli random values from B_{1/√e} used by Algorithm 1 for successfully generating an integer from D_{Z+,1} is about 3.330/0.690 ≈ 4.826.

3.2 An Improved Algorithm for Sampling from D_{Z+,1}

Karney's algorithm for sampling from D_{Z+,1} can be easily extended to the case of the discrete Gaussian distribution D_{Z+,σ} with a rational-valued σ > 1/√
2. So, we obtain Algorithm 4.
Algorithm 4
Sampling from D_{Z+,σ} with a rational-valued σ > 1/√2
Output: an integer k that conforms to D_{Z+,σ}
1. select k ∈ Z+ with probability exp(−k/(2σ^2)) · (1 − exp(−1/(2σ^2))).
2. return k with probability exp(−k(k − 1)/(2σ^2)), otherwise goto step 1.

Both steps can be performed by generating Bernoulli random values from B_{exp(−1/(2σ^2))}. Since 0 < 1/(2σ^2) < 1, as mentioned in Section 3.1, by replacing 1/2 with 1/(2σ^2) in Algorithm 2, one can get a Bernoulli random value which is true with probability exp(−1/(2σ^2)). By Corollary 1, it uses exp(1/(2σ^2)) uniform deviates on average to obtain a Bernoulli value from B_{exp(−1/(2σ^2))}.

Following the proof of Proposition 2 with p = exp(−1/(2σ^2)) and q = 1 − p, one can easily estimate the expected number of Bernoulli random values from B_{exp(−1/(2σ^2))} used by Algorithm 4. Here, we give an improved version of Algorithm 4, namely Algorithm 5. It uses a smaller number of Bernoulli random values from B_{exp(−1/(2σ^2))}, and thus has lower computational complexity compared to Algorithm 4.

Algorithm 5
Sampling from D_{Z+,σ} with a rational-valued σ > 1/√2
Output: an integer k that conforms to D_{Z+,σ}
1. sample b ← B_{exp(−1/(2σ^2))} and return k = 0 if b is false.
2. sample b ← B_{exp(−1/(2σ^2))} and return k = 1 if b is false.
3. set k ← 2 and t ← 2(k − 1).
4. while t > 0 do
5.   sample b ← B_{exp(−1/(2σ^2))}; if b is false, then goto step 1, otherwise set t ← t − 1.
6. sample b ← B_{exp(−1/(2σ^2))}.
7. return k if b is false, otherwise set k ← k + 1, t ← 2(k − 1) and goto step 4.

It is not hard to see that Algorithm 5 also selects k ∈ Z+ with probability exp(−k/(2σ^2)) · (1 − exp(−1/(2σ^2))), and then accepts k with probability exp(−k(k − 1)/(2σ^2)), so that the output k has the desired (relative) probability density exp(−k/(2σ^2)) · exp(−k(k − 1)/(2σ^2)) = exp(−k^2/(2σ^2)). The event that b is false in step 1 or step 2 is equivalent to k(k − 1) = 0. Thus, Algorithm 5 directly outputs k = 0 or k = 1 in these two cases.

The difference between Algorithm 4 and Algorithm 5 is that the latter decides to accept or reject k at once as long as k ≥ 2, while the former determines the final value of k firstly, and then decides to accept or reject it. Specifically, Algorithm 4 needs (k + 1) Bernoulli random values from B_{exp(−1/(2σ^2))} to determine the value of k in step 1. However, the last Bernoulli value will be wasted if k is rejected in step 2. So, the basic idea of Algorithm 5 is to accept or reject k earlier (before the final value of k is determined), and then to decide whether or not to return k by using one more Bernoulli random value from B_{exp(−1/(2σ^2))}, or to try to find a larger k if necessary. In other words, a Bernoulli random value will be saved in Algorithm 5 if the selected k is rejected. Therefore, the expected number of uniform deviates used by Algorithm 5 will be smaller as compared to Algorithm 4. The actual number for a given σ can be estimated as follows.

Proposition 3.
The expected number of Bernoulli random values from B_{1/√e} used by Algorithm 5 with σ = 1 for sampling from D_{Z+,1} is about 3.684.

Proof. Let p = exp(−1/(2σ^2)) = exp(−1/2) and q = 1 − p. The expected number of Bernoulli random values from B_p used by Algorithm 5 for outputting some integer k ≥ 0 is

Σ_{k=0}^∞ (1 + k + k(k − 1)) · p^{k(k−1)} · p^k · q.

The expected number of Bernoulli random values from B_p used by Algorithm 5 for rejecting some integer k ≥ 2 is

p^2 (3q + 4pq) + p^5 (6q + 7pq + 8p^2 q + 9p^3 q) + . . . = Σ_{k=2}^∞ p^{(k−1)^2+1} · Σ_{j=1}^{2(k−1)} (1 + (k − 1)^2 + j) · p^{j−1} · q.

In particular, if k = 0 or k = 1, then k will directly be accepted in step 1 or step 2. Thus, if Algorithm 5 outputs some integer k, or if it rejects some integer k before going back to step 1, the expected number of Bernoulli random values from B_p used can be given by

Σ_{k=0}^∞ (1 + k + k(k − 1)) · p^{k(k−1)} · p^k · q + Σ_{k=2}^∞ p^{(k−1)^2+1} · Σ_{j=1}^{2(k−1)} (1 + (k − 1)^2 + j) · p^{j−1} · q,

which is approximately equal to 2.542. Since Algorithm 5 outputs an integer with probability Σ_{k=0}^∞ p^{k^2} q ≈ 0.690, the expected number of Bernoulli random values from B_p used by Algorithm 5 for successfully generating an integer from D_{Z+,1} is about 2.542/0.690 ≈ 3.684.

4 The Computational Complexity of Sampling from B_{exp(−x(2k+x)/2)}
1, 0 and 1 with probability 1 /m ,1 /m and 1 − /m respectively, where m = 2 k + 2. It is not hard to see that the random selector C ( m ) can be exactly implemented by sampling two Bernoulli random values from B /m and B / respectively. Sampling a Bernoulli random value from B /m is equivalent to sampling oneuniform deviate u ∈ (0 ,
1) and deciding whether 2 /m > u or 2 /m < u . In particular, if k = 0,i.e., m = 2, then the random selector C ( m ) = C (2) only uses a random bit and returns − C ( m ) uses one uniformdeviate if m ≥
4, and in particular it is “free of charge” if m = 2. Lemma 1.
For a given k ≥ 1, the probability that Algorithm 3 restarts (goes back to step 2) n times can be given by

(x/m + 2k/m)^n · (x^n/n!),

where m = 2k + 2 and n is a positive integer.

Proof. For a given k ≥ 1, there are two cases in which the algorithm goes back to step 2: (1) z < y and f = 1, with probability x · (2k/m); (2) z < y, f = 0 and r < x, with probability x · (x/m). Thus, the probability that the algorithm goes back to step 2 is equal to

x · (x/m) + x · (2k/m) = x · (x/m + 2k/m).

After restarting one time, the probability that the algorithm goes back to step 2 once again is equal to

x · (x/m + 2k/m) · (x/2) · (x/m) + x · (x/m + 2k/m) · (x/2) · (2k/m) = (x^2/2!) · (x/m + 2k/m)^2,

where the factor x/2 accounts for the probability that the new deviate z_2 falls below z_1, given x > z_1. Generally, the probability that the algorithm restarts n times (n ≥ 1) can be given by

(x/m + 2k/m)^n · (x^n/n!),

where x^n/n! is the probability that a set of uniform deviates z_1, z_2, . . . , z_n over the range (0, 1) satisfies x > z_1 > z_2 > . . . > z_n. The proof can be completed by using mathematical induction on n.

Proposition 4.
Let x ∈ (0, 1). For a given k ≥ 1, the expected number of uniform deviates used by Algorithm 3 for sampling from B_{exp(−x(2k+x)/(2k+2))} can be given by

((4k + x + 3) · τ_k(x) − 2k − 3)/(2k + x),

where τ_k(x) = exp(x(2k + x)/(2k + 2)).

Proof. We can see that each round of Algorithm 3 uses at most three uniform deviates: the first one is for step 2 (the deviate z), the second one is used by the random selector C(m) (note that m = 2k + 2 ≥ 4 for k ≥ 1), and the third one (the deviate r) is required only when f = 0. By Lemma 1, the probability of restarting n − 1 times (n ≥ 1) is

(x/m + 2k/m)^{n−1} · (x^{n−1}/(n − 1)!).

A restart caused by f = 1 consumes two uniform deviates, while a restart caused by f = 0 and r < x consumes three. Expanding (x/m + 2k/m)^{n−1} by the binomial theorem, the term of index i corresponds to exactly i restarts of the latter type, so that n − 1 restarts consume 2(n − 1) + i uniform deviates in total. For a given k ≥ 1, there are three cases in which the algorithm goes to step 6 and returns the result before it goes back to step 2: (1) z > y, at a cost of one new uniform deviate; (2) f = −1, at a cost of two new uniform deviates; (3) f = 0 and r > x, at a cost of three new uniform deviates. Then, after restarting n − 1 times (n ≥ 1):

(1) z > y with probability

(x/m + 2k/m)^{n−1} · (x^{n−1}/(n − 1)! − x^n/n!),

and at a cost of

Σ_{n=1}^∞ Σ_{i=0}^{n−1} (n − 1 choose i) · (x/m)^i · (2k/m)^{n−1−i} · (2n − 1 + i) · (x^{n−1}/(n − 1)! − x^n/n!)    (1)

uniform deviates, where x^{n−1}/(n − 1)! − x^n/n! is the probability that the length of the longest decreasing sequence x > z_1 > z_2 > . . . is n − 1.

(2) f = −1 with probability

(x/m + 2k/m)^{n−1} · (x^n/n!) · (1/m),

and at a cost of

Σ_{n=1}^∞ Σ_{i=0}^{n−1} (n − 1 choose i) · (x/m)^i · (2k/m)^{n−1−i} · (2n + i) · (x^n/n!) · (1/m)    (2)

uniform deviates.

(3) f = 0 and r > x with probability

(x/m + 2k/m)^{n−1} · (x^n/n!) · (1/m) · (1 − x),

and at a cost of

Σ_{n=1}^∞ Σ_{i=0}^{n−1} (n − 1 choose i) · (x/m)^i · (2k/m)^{n−1−i} · (2n + 1 + i) · (x^n/n!) · (1/m) · (1 − x)    (3)

uniform deviates. Therefore, for a given k ≥ 1, by combining the above three cases, i.e., summing Equations (1)-(3), we can obtain the expected number of uniform deviates used by Algorithm 3, which can be reduced to a function of x:

((4k + x + 3) · τ_k(x) − 2k − 3)/(2k + x),

where τ_k(x) = exp(x(2k + x)/(2k + 2)).

When k = 0, the random selector C(2k + 2) = C(2) returns −1 or 0 with equal probabilities and uses no uniform deviate, so the analysis is slightly different. With the order of step 2 and step 3 switched, the probability that Algorithm 3 restarts n times is

(1/2)^n · (x^n/n!) · x^n,

where x^n is the probability that a set of uniform deviates r_1, r_2, . . . , r_n over the range (0, 1) satisfies r_i < x for 1 ≤ i ≤ n. Following the idea of Proposition 4, we can also give an estimate for Algorithm 3 with k = 0.

Proposition 5.
Let x ∈ (0, 1). When k = 0, the expected number of uniform deviates used by Algorithm 3 (with the order of step 2 and step 3 switched) for sampling from B_{exp(−x(2k+x)/(2k+2))} = B_{exp(−x^2/2)} can be given by

((x + 2) · τ_0(x) − 2)/(2x),

where τ_0(x) = exp(x^2/2).

Proof. If k = 0, after restarting n − 1 times (n ≥ 1), there are three cases in which the algorithm returns the result before it goes back to step 2 again.

(1) f = −1 with probability

(1/2)^n · (x^{n−1}/(n − 1)!) · x^{n−1},

and at a cost of

Σ_{n=1}^∞ (1/2)^n · (x^{n−1}/(n − 1)!) · x^{n−1} · (2n − 2)    (4)

uniform deviates.

(2) z > y with probability

(1/2)^n · x^{n−1} · (x^{n−1}/(n − 1)! − x^n/n!),

and at a cost of

Σ_{n=1}^∞ (1/2)^n · x^{n−1} · (x^{n−1}/(n − 1)! − x^n/n!) · (2n − 1)    (5)

uniform deviates.

(3) f = 0 and r > x with probability

(1/2)^n · x^{n−1} · (x^n/n!) · (1 − x),

and at a cost of

Σ_{n=1}^∞ (1/2)^n · x^{n−1} · (x^n/n!) · (1 − x) · (2n)    (6)

uniform deviates. Therefore, when k = 0, by summing Equations (4)-(6), we have the expected number of uniform deviates used by Algorithm 3, which can be reduced to a function of x:

((x + 2) · τ_0(x) − 2)/(2x),

where τ_0(x) = exp(x^2/2).

Let us go back to Algorithm 1. For a given k ≥
0, the average number of times that step 4 invokes Algorithm 3, denoted by t_k(x), is equal to

Σ_{i=1}^k i · (exp(−x(2k + x)/(2k + 2)))^{i−1} · (1 − exp(−x(2k + x)/(2k + 2))) + (k + 1) · (exp(−x(2k + x)/(2k + 2)))^k.

In particular, when k = 0, we have t_0(x) = 1. The value of k conforms to the discrete Gaussian distribution D_{Z+,1}, and x is a uniformly random number over the range (0, 1). Therefore, the expected number of uniform deviates used by Algorithm 1 for sampling from B_{exp(−x(2k+x)/2)} in step 4 can be given by

D_{Z+,1}(0) · ∫_0^1 ((x + 2) · τ_0(x) − 2)/(2x) dx + Σ_{k=1}^∞ D_{Z+,1}(k) · ∫_0^1 (((4k + x + 3) · τ_k(x) − 2k − 3)/(2k + x)) · t_k(x) dx ≈ 2.194,

where τ_0(x) and τ_k(x) with k ≥ 1 are given in Proposition 5 and Proposition 4 respectively.

4.2 An Alternative Method for Sampling from B_{exp(−x(2k+x)/2)}

In this subsection, we show that one may not need to use Algorithm 3 to sample from the Bernoulli distribution B_{exp(−x(2k+x)/2)} with an integer k ≥ 0 and a real number x ∈ (0, 1). The sampling of B_{exp(−x(2k+x)/2)} can be simply decomposed into two sampling procedures: sampling from B_{e^{−kx}} and from B_{e^{−x^2/2}} respectively, since

exp(−x(2k + x)/2) = e^{−kx} · e^{−x^2/2}.

It is clear that we can repeatedly use the generalized version of Algorithm 2 (replacing 1/2 with x) at most k times to sample from B_{e^{−kx}}. When sampling from B_{e^{−x^2/2}}, however, it is hard for us to directly compare a uniform deviate with the value of x^2/2 given only the digits of x. So, we use the following algorithm, namely Algorithm 6, to address this problem.

Algorithm 6
Generating a Bernoulli random value which is true with probability e^{−xy}
Input: x, y ∈ (0, 1)
Output: a Bernoulli random value from B_{e^{−xy}}
1. set w ← x, n ← 0.
2. sample a uniform deviate u ∈ (0, 1); goto step 5 unless u < w.
3. sample a uniform deviate v ∈ (0, 1); goto step 5 unless v < y.
4. set w ← u, n ← n + 1 and goto step 2.
5. return true if n is even, otherwise return false.

In fact, Algorithm 6 can be viewed as a special version of Algorithm 3, as the basic idea behind it follows from Algorithm 3. It samples two sets of uniform deviates u_1, u_2, . . . and v_1, v_2, . . . over the range (0, 1), and determines the maximum value n ≥ 0 such that x > u_1 > u_2 > . . . > u_n and v_i < y for 1 ≤ i ≤ n. If n is even, it returns true, and the probability is exactly equal to

(1 − xy) + (x^2 y^2/2! − x^3 y^3/3!) + . . . = Σ_{n=0}^∞ (x^{2n} y^{2n}/(2n)! − x^{2n+1} y^{2n+1}/(2n + 1)!) = e^{−xy}.

Proposition 6.
Let x, y ∈ (0, 1). The expected number of uniform deviates used by Algorithm 6 for sampling from B_{e^{−xy}} can be given by (e^{xy} · (1 + y) − 1)/y.

Proof. For a given n ≥ 1, the probability that Algorithm 6 restarts (goes back to step 2) n − 1 times is

(x^{n−1}/(n − 1)!) · y^{n−1}.

Then, after restarting n − 1 times, there are two cases in which Algorithm 6 goes to step 5 and returns the result before it goes back to step 2 again.

(1) u > w with probability

(x^{n−1}/(n − 1)! − x^n/n!) · y^{n−1}.

(2) u < w and v > y with probability

(x^n/n!) · y^{n−1} · (1 − y).

Algorithm 6 uses (2n − 1) uniform deviates in the first case, while it uses 2n uniform deviates in the second case. Therefore, the expected number of uniform deviates can be given by

Σ_{n=1}^∞ ((x^{n−1}/(n − 1)! − x^n/n!) · y^{n−1} · (2n − 1) + (x^n/n!) · y^{n−1} · (1 − y) · (2n)),

which can be reduced to (e^{xy} · (1 + y) − 1)/y.

For sampling from B_{e^{−x^2/2}} with x ∈ (0, 1), we apply Algorithm 6 with inputs x ← x/2 and y ← x, so that the expected number of uniform deviates used, averaged over x, is

∫_0^1 ((1 + x) · e^{x^2/2} − 1)/x dx.

Here, we set y ← x and x ← x/2 (rather than y ← x/2 and x ← x), since (e^{xy}(1 + y) − 1)/y < (e^{xy}(1 + x) − 1)/x if and only if x < y.

5 An Improved Exact Sampling Algorithm for N(0, 1)

Combining the idea presented in Section 4.2, namely Algorithm 6, we give Algorithm 7, which is to sample from the standard normal distribution N(0, 1).

Algorithm 7
Sampling from the standard normal distribution N(0, 1)
Output: a sample value that conforms to N(0, 1)
1. sample k ∈ Z+ from D_{Z+,1}, either (a) by using steps 1 and 2 from Algorithm 1, or (b) by applying Algorithm 5 with σ = 1.
2. sample a uniformly random number x ∈ (0, 1).
3. sample a Boolean random value from B_{e^{−kx}}, and goto step 1 if it is false.
4. sample a Boolean random value from B_{e^{−x^2/2}}, and goto step 1 if it is false.
5. set s ← ±1 with equal probabilities.
6. return s(k + x).

The correctness of Algorithm 7 follows from Karney's algorithm for sampling from the standard normal distribution. According to Proposition 2 and Proposition 3, it is better to use Algorithm 5 with σ = 1 to sample k from D_{Z+,1}, since Algorithm 5 uses a smaller number of uniform deviates on average. In this subsection, for sampling from B_{exp(−x(2k+x)/2)}, we show that the expected number of uniform deviates used by step 3 and step 4 in Algorithm 7 is also slightly smaller as compared to using Algorithm 3.

When sampling from B_{e^{−kx}} in step 3, the average number of invocations of the generalized version of Algorithm 2, denoted by t_k(x), is equal to

Σ_{i=1}^{k−1} i · (e^{−x})^{i−1} · (1 − e^{−x}) + k · (e^{−x})^{k−1}.
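Putting the pieces together, Algorithm 7 admits the following floating-point sketch (the function names are ours; each comparison of a uniform deviate against a floating-point probability stands in for the exact digit-stream tests realized by Algorithm 5, the generalized version of Algorithm 2, and Algorithm 6):

```python
import math
import random

def sample_k_sketch(rng=random.random):
    # Floating-point sketch of Algorithm 5 with sigma = 1: every call to
    # bern() stands for one Bernoulli random value from B_{1/sqrt(e)},
    # and each attempt ends exactly at its first false draw.
    p = math.exp(-0.5)
    bern = lambda: rng() < p
    while True:
        if not bern():
            return 0                 # step 1
        if not bern():
            return 1                 # step 2
        k, t = 2, 2                  # step 3: t = 2(k - 1)
        while True:
            if t > 0:                # steps 4-5: the while-loop draws
                if not bern():
                    break            # reject k and restart from step 1
                t -= 1
            else:                    # steps 6-7
                if not bern():
                    return k
                k, t = k + 1, 2 * k  # t <- 2(k - 1) for the incremented k

def normal_sketch_alg7(rng=random.random):
    # Floating-point sketch of Algorithm 7 with option (b) in step 1.
    while True:
        k = sample_k_sketch(rng)                     # step 1
        x = rng()                                    # step 2
        if any(rng() >= math.exp(-x) for _ in range(k)):
            continue                                 # step 3: B_{e^{-kx}}
        if rng() >= math.exp(-0.5 * x * x):
            continue                                 # step 4: B_{e^{-x^2/2}}
        return (k + x) if rng() < 0.5 else -(k + x)  # steps 5 and 6
```

Under this sketch, k = 0 should appear with frequency 1/Σ_{k≥0} exp(−k^2/2) ≈ 0.570, and the returned values are standard normal.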
In particular, when k = 0, we have t_0(x) = 0. The value of k conforms to the discrete Gaussian distribution D_{Z+,1}, and x is a uniformly random number from (0, 1). For a given k ≥ 0, the algorithm executes step 4 with probability (e^{−x})^k. Therefore, according to Corollary 1 and Proposition 6, the expected number of uniform deviates used by Algorithm 7 for sampling from B_{exp(−x(2k+x)/2)} in step 3 and step 4 can be given by

Σ_{k=1}^∞ D_{Z+,1}(k) · ∫_0^1 t_k(x) · e^x dx + Σ_{k=0}^∞ D_{Z+,1}(k) · ∫_0^1 (((1 + x) · e^{x^2/2} − 1)/x) · (e^{−x})^k dx ≈ 2.02,

which is slightly smaller than 2.194, the expected number of uniform deviates used for sampling from B_{exp(−x(2k+x)/2)} with Algorithm 3.

6 Implementation and Experiments

On a laptop computer (Intel i7-8550U, 16GB RAM, the g++ compiler with the -O3 optimization option enabled), and following the implementation of Algorithm 1 in Karney's C++ library RandomLib, we implemented our proposed algorithms. The implementation of Algorithm 5 and Algorithm 7 is simply based on the adaptation of ExactNormal.hpp as well as the runtime environment provided by RandomLib.

We tested the average numbers of Bernoulli random values consumed by Algorithm 1 and Algorithm 7 for outputting an integer from D_{Z+,1}. The average quantities measured in practice are consistent with the expected values estimated in Proposition 2 and Proposition 3. We also verified that the average numbers of uniform deviates used by Algorithm 1 and Algorithm 7 in practice for sampling from B_{exp(−x(2k+x)/2)} are about 2.
19 and 2.02 respectively.

A uniform deviate could be made up of one or several digits, and it will only be treated as one deviate no matter how many digits it actually uses. The comparison of two deviates is realized digit by digit, and each digit of a deviate is generated online according to the actual needs. In fact, one digit could consist of only one bit or a small number of bits, such as 4 bits, 8 bits or 16 bits. We call the number of bits in each digit the digit size, which can be specified through assigning an integer value to the variable bits in the source code. If uniformly random bits can be produced in batches efficiently (say, by a typical software approach), then a larger bits could bring better actual sampling performance.

Table 1: The performance of Algorithm 1 and Algorithm 7 (10^6 sample values per second)

                 bits = 1   bits = 4   bits = 8   bits = 16
Alg. 1 [6]       2.
109      3.248      3.470      3.…
Alg. 7 (a)       ….159      3.298      3.503      3.…
Alg. 7 (b)       ….532      3.929      4.017      4.…

('RandomLib' is available at http://randomlib.sourceforge.net/.)

Using Algorithm 5 with σ = 1, one can get about 26 × 10^6 sample values per second from the discrete Gaussian distribution D_{Z+,1}, while using Algorithm 4 with σ = 1 one can obtain only about 21 × 10^6 sample values per second from D_{Z+,1}. Table 1 shows the performance of Algorithm 1 and Algorithm 7. One can see that the performance advantage of Algorithm 7 can be shown even without using Algorithm 5. This could be well explained by the fact that the expected number of uniform deviates used by Algorithm 7 for sampling from B_{exp(−x(2k+x)/2)} is slightly smaller as compared to using Algorithm 3 in Algorithm 1.

7 Conclusion

In this paper, under the random deviate model, we discuss the computational complexity of Karney's exact sampling algorithm for the standard normal distribution. We present an improved algorithm for sampling D_{Z+,1}, and an alternative sampling method for the Bernoulli distribution B_{exp(−x(2k+x)/2)} with an integer k ∈ Z+ and a real number x ∈ (0, 1). A discrete Gaussian distribution can also be defined over the whole set of integers Z, namely D_{Z,σ,µ}(k) = ρ_{σ,µ}(k)/ρ_{σ,µ}(Z) for all k ∈ Z, where ρ_{σ,µ}(Z) = Σ_{x∈Z} ρ_{σ,µ}(x).

In recent years, the issue of sampling from discrete Gaussian distributions over the integers has received increasing attention because of its application in cryptography [5, 3, 8, 10]. The methods of sampling from a continuous Gaussian distribution are not trivially applicable to the discrete case. As a discretization version of the algorithm for sampling exactly from the standard normal distribution, Karney also presented an exact sampling algorithm for the discrete Gaussian distribution over the integers Z, whose parameters (σ and µ) are rational numbers (see Algorithm D in [6]). Sampling from D_{Z+,1} and sampling from B_{exp(−x(2k+x)/2)} are also two key subroutines in Karney's sampling algorithm for discrete Gaussian distributions.
The computational complexity and the actual performance of this algorithm could also be improved by our work in this paper.

Our complexity analysis of Karney's algorithms is based on estimating the expected number of uniform deviates used, namely the random deviate model, which is still a "coarse-grained" complexity model as compared to the random bit model, although it can be used to well explain the complexity advantage of our improved algorithms. It would also be interesting to investigate Karney's algorithms under the random bit model, in which the complexity is measured by the expected number of random bits used by the sampling algorithm. In our opinion, following the argument presented in this work, the complexity of Karney's algorithms under the random bit model could be well estimated if a theoretical estimate of the expected number of random bits used by Algorithm 2, as well as its generalized version, can be given.

References

[1] Luc Devroye.
Non-Uniform Random Variate Generation. Springer, January 1986.

[2] Luc Devroye and Claude Gravel. The expected bit complexity of the von Neumann rejection algorithm. Stat. Comput., 27(3):699–710, 2017.

[3] Nagarjun C. Dwarakanath and Steven D. Galbraith. Sampling from discrete Gaussians for lattice-based cryptography on a constrained device. Applicable Algebra in Engineering, Communication and Computing, 25(3):159, June 2014.

[4] Philippe Flajolet and Nasser Saheb. The complexity of generating an exponentially distributed variate. J. Algorithms, 7(4):463–488, 1986.

[5] Craig Gentry, Chris Peikert, and Vinod Vaikuntanathan. Trapdoors for hard lattices and new cryptographic constructions. In Cynthia Dwork, editor, STOC 2008, Victoria, British Columbia, Canada, May 17-20, 2008, pages 197–206. ACM, 2008.

[6] Charles F. Karney. Sampling exactly from the normal distribution. ACM Trans. Math. Softw., 42(1):3:1–3:14, 2016.

[7] D. Knuth and A. Yao. Algorithms and Complexity: New Directions and Recent Results, chapter The complexity of nonuniform random number generation. Academic Press, 1976.

[8] Daniele Micciancio and Michael Walter. Gaussian sampling over the integers: Efficient, generic, constant-time. In Jonathan Katz and Hovav Shacham, editors, CRYPTO 2017, Santa Barbara, CA, USA, August 20-24, 2017, Proceedings, Part II, volume 10402 of LNCS, pages 455–485. Springer, 2017.

[9] John von Neumann. Various techniques used in connection with random digits. In A. S. Householder, G. E. Forsythe, and H. H. Germond, editors, Monte Carlo Method, volume 12 of
National Bureau of Standards Applied Mathematics Series, chapter 13, pages 36–38. US Government Printing Office, Washington, DC, 1951.

[10] Raymond K. Zhao, Ron Steinfeld, and Amin Sakzad. FACCT: fast, compact, and constant-time discrete Gaussian sampler over integers. IEEE Transactions on Computers, 69(1):126–137, 2020.