A table of short-period Tausworthe generators for Markov chain quasi-Monte Carlo
aa r X i v : . [ s t a t . C O ] A ug A table of short-period Tausworthe generators forMarkov chain quasi-Monte Carlo
Shin Harase a, ∗ a College of Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi,Kusatsu, Shiga, 525-8577, Japan.
Abstract
We consider the problem of estimating expectations by using Markov chainMonte Carlo methods and improving the accuracy by replacing IID uni-form random points with quasi-Monte Carlo (QMC) points. Recently, it hasbeen shown that Markov chain QMC remains consistent when the drivingsequences are completely uniformly distributed (CUD). However, the defi-nition of CUD sequences is not constructive, so an implementation methodusing short-period Tausworthe generators (i.e., linear feedback shift regis-ter generators over the two-element field) that approximate CUD sequenceshas been proposed. In this paper, we conduct an exhaustive search of short-period Tausworthe generators for Markov chain QMC in terms of the t -value,which is a criterion of uniformity widely used in the study of QMC meth-ods. We provide a parameter table of Tausworthe generators and show theeffectiveness in numerical examples using Gibbs sampling. Keywords:
Pseudorandom number generation, Quasi-Monte Carlo,Markov chain Monte Carlo, Polynomial lattice point set, Continued fractionexpansion
1. Introduction
We consider the problem of estimating the expectation E π [ f ( X )] by usingMarkov chain Monte Carlo (MCMC) methods for a target distribution π and ∗ Corresponding author
Email address: [email protected] (Shin Harase)
Preprint submitted to Elsevier August 25, 2020 ome function f . For this problem, we want to improve the accuracy by re-placing independent and identically distributed (IID) uniform random pointswith quasi-Monte Carlo (QMC) points. However, typical QMC points (e.g.,Sobol’, Faure, and Niederreiter–Xing) are not applicable in general. Moti-vated by a simulation study by Liao [20], Owen and Tribble [24] and Chenet al. [2] proved that Markov chain QMC remains consistent when the driv-ing sequences are completely uniformly distributed (CUD). Here, a sequence u , u , u , . . . ∈ [0 ,
1) is said to be CUD if overlapping s -blocks ( u i , u i +1 ,. . . , u i + s − ), i = 0 , , , . . . , are uniformly distributed for every dimension s ≥ F := { , } ) that run for the entire period. Chen et al. [3] implementedshort-period Tausworthe generators optimized in terms of the equidistribu-tion property, which is a coarse criterion used in the area of pseudorandomnumber generation (see [1, § t, m, s )-nets and ( t, s )-sequences, the t -value is a central crite-rion of uniformity. In fact, typical QMC points (e.g., Sobol’, Faure, andNiederreiter–Xing) are optimized in terms of the t -value (see [23, 5]).The aim of this paper is to conduct an exhaustive search of short-periodTausworthe generators for Markov chain QMC in terms of the t -value andto provide a parameter table of Tausworthe generators. It is known thatTausworthe generators can be viewed as polynomial Korobov lattice pointsets with a denominator polynomial p ( x ) and a numerator polynomial q ( x )over F (e.g., see [17, 18]). For dimension s = 2, there is a connection betweenthe t -value and continued fraction expansions, that is, the t -value is optimal(i.e., the t -value is zero) if and only if the partial quotients in the continuedfraction of q ( x ) /p ( x ) are all of degree one. To satisfy the definition of CUDsequences approximately, we want to search for parameters ( p ( x ) , q ( x )) whose t -values are optimal for s = 2 and as small as possible for s ≥
3. As a previousstudy, in 1993, Tezuka and Fushimi [30] proposed an algorithm to search forsuch parameters using a polynomial analogue of Fibonacci numbers from theviewpoint of continued fraction expansions. Thus, we refine their algorithmon modern computers, and conduct an exhaustive search again. In addition,we report numerical examples using Gibbs sampling in which the resulting2MC point sets perform better than the existing point sets developed byChen et al. [3].One might consider searching for parameters ( p ( x ) , q ( x )) with t -valuezero for s = 3. Kajiura et al. [12] proved that there exists no maximal-periodTausworthe generator with this property.The remainder of this paper is organized as follows: In Section 2, webriefly recall the definition of CUD sequences, Tausworthe generators, andthe t -value and equidistribution property. Section 3 is devoted to our mainresults: we describe an exhaustive search algorithm and provide a table ofshort-period Tausworthe generators for Markov chain QMC. We also compareour new generators with existing generators developed by Chen et al. [3] interms of the t -value and equidistribution property. In Section 4, we presentnumerical examples using Gibbs sampling. In Section 5, we conclude thispaper.
2. Preliminaries
We refer the reader to [23, 5, 17, 16] for general information.
Let P s = { u , u , . . . , u N − } ⊂ [0 , s be an s -dimensional point set of N elements in the sense of a “multiset”. We recall the definition of thediscrepancy as a criterion of uniformity of P s . Definition 1 (Discrepancy).
For a point set P s = { u , u , . . . , u N − } ⊂ [0 , s , the ( star ) discrepancy is defined as D ∗ sN ( P s ) := sup J (cid:12)(cid:12)(cid:12)(cid:12) ν ( J ; P s ) N − vol( J ) (cid:12)(cid:12)(cid:12)(cid:12) , where the supremum is taken over every sub-interval J = [0 , t ) × · · · × [0 , t s ) ⊂ [0 , s , ν ( J ; P s ) is the number of points from P s that belong to J ,and vol( J ) := t · · · t s is the volume of J .If D ∗ sN ( P s ) is close to zero, we regard P s as highly uniformly distributed.Next, we define the CUD property for a one-dimensional infinite sequence { u i } ∞ i =0 ⊂ [0 , efinition 2 (CUD sequences). A one-dimensional infinite sequence u , u , u ,. . . ∈ [0 ,
1) is said to be completely uniformly distributed ( CUD ) if overlap-ping s -blocks satisfylim N →∞ D ∗ sN (( u , . . . , u s − ) , ( u , . . . , u s ) , . . . , ( u N − , . . . , u N + s − )) = 0for every dimension s ≥
1, that is, the sequence of s -blocks ( u i , . . . , u i + s − ) , i =0 , , . . . , is uniformly distributed in [0 , s for every dimension s ≥ D ∗ sN converges to zero fast if N → ∞ ;see [7, 6] for details. As a necessary and sufficient condition of Definition 2,Chentsov [4] showed that non-overlapping blocks satisfylim N →∞ D ∗ sN (cid:0) ( u , . . . , u s − ) , ( u s , . . . , u s − ) , . . . , ( u s ( N − , . . . , u Ns − ) (cid:1) = 0for every dimension s ≥
1. Thus, we use a sequence { u i } ∞ i =0 ⊂ [0 ,
1) forMarkov chain QMC in this order.
We recall some results of Tausworthe generators. Let F := { , } bethe two-element field, and perform addition and multiplication over F (ormodulo 2). Definition 3 (Tausworthe generators [27, 14, 15]).
Let p ( x ) := x m − c x m − − · · · − c m − x − c m ∈ F [ x ]. Consider the linear recurrence a i := c a i − + · · · + c m a i − m ∈ F , (1)whose characteristic polynomial is p ( x ). Let σ be a step size with 0 < σ < m − u i := w − X j =0 a iσ + j − j − ∈ [0 ,
1) (2)be the output at step i , where w is the word size of the intended machine. If p ( x ) is primitive, ( a , . . . , a m − ) = (0 , . . . , σ, m −
1) = 1, then thesequences (1) and (2) are both purely periodic with maximal period 2 m − σ ≥ w . A generator in such a class iscalled a Tausworthe generator (or a linear feedback shift register generator ).4et N = 2 m and consider a sequence u , u , . . . , u N − , u N − = u , . . . ∈ [0 ,
1) (3)generated from a Tausworthe generator with the period length N −
1. We con-sider s -dimensional overlapping points u i = ( u i , . . . , u i + s − ) for i = 0 , , . . . , N −
2, that is, u = ( u , . . . , u s − ) , u = ( u , . . . , u s ) , . . . , u N − = ( u N − , u , . . . , u s − ).Adding the origin { } , we regard a point set P s = { } ∪ { u i } N − i =0 ⊂ [0 , s (4)as a QMC point set. Note that the cardinality is | P s | = 2 m .Moreover, Tausworthe generators can be represented as a polynomial ana-logue of linear congruential generators: q ( x ) := x σ mod p ( x ) (5) X i ( x ) := q ( x ) X i − ( x ) mod p ( x ) (6) X i ( x ) /p ( x ) = a iσ x − + a iσ +1 x − + a iσ +2 x − + · · · ∈ F (( x − )) . (7)Then, the sequence (2) is expressed as u i = ν w ( X i ( x ) /p ( x )), where a map ν w : F (( x − )) → [0 ,
1) is given by P ∞ j = j k j x − j − P w − j =max { ,j } k j − j − ,which is obtained by substituting x = 2 into (7) and truncating the valuewith the word size w . Furthermore, according to [17, § P s in (4) can also be represented as a polynomial Korobov lattice pointset: P s = (cid:26) ν w (cid:18) h ( x ) p ( x ) (1 , q ( x ) , q ( x ) , . . . , q ( x ) s − ) (cid:19) (cid:12)(cid:12)(cid:12) deg( h ( x )) < m (cid:27) , (8)where m = deg( p ( x )) and the map ν w is applied component-wise. A pair ofpolynomials ( p ( x ) , q ( x )) is a parameter set of P s . Thus, to construct a pointset that approximates CUD sequences in Definition 2, we want to find a pair( p ( x ) , q ( x )) with small discrepancies D ∗ sN ( P s ) for each s ≥ Generally, calculating D ∗ sN ( P s ) is NP-hard [11]. A point set P s in (4)generated from a Tausworthe generator is a digital net , so we can computethe t -value closely related to D ∗ sN ( P s ) for N = 2 m .5 efinition 4 ( ( t, m, s ) -nets). Let s ≥ ≤ t ≤ m be integers. Then,a point set P s consisting of 2 m points in [0 , s is called a ( t, m, s ) -net (in base2) if every subinterval E = Q sj =1 [ r j / d j , ( r j + 1) / d j ) in [0 , s with integers d j ≥ ≤ r j < d j for 1 ≤ j ≤ s and of volume 2 t − m contains exactly2 t points of P s .For dimension s , the smallest value t for which P s is a ( t, m, s )-net is calledthe t -value . D ∗ sN ( P s ) = O (2 t (log N ) s − /N ) holds, where the implied constantin the O -notation only depends on s , so a small t -value is desirable. Thus,we want to find Tausworthe generators with pairs of polynomials ( p ( x ) , q ( x ))whose t -values are optimal (i.e., t = 0) for s = 2 and as small as possible for s ≥
3. Note that all Tausworthe generators have the t -value zero for s = 1.Conversely, Chen et al. [3] used the following equidistribution propertyas a criterion of uniformity: Definition 5 ( s -dimensional equidistribution with l -bit accuracy). For1 ≤ s ≤ m and 1 ≤ l ≤ m , a point set P s consisting of 2 m points in [0 , s issaid to be s -dimensionally equidistributed with l -bit accuracy if we can parti-tion the s -dimensional unit cube [0 , s into congruent cubic boxes of volume2 − sl by dividing each axis [0 ,
1) into 2 l intervals, and can obtain an equalnumber of points from P s in each box.For dimension s , the largest value of l for which this definition holds is calledthe resolution of P s and denoted by l s . We have a trivial upper bound l s ≤⌊ m/s ⌋ . As a criterion of uniformity, a high resolution l s is desirable. Thus,we define the resolution gap d s = ⌊ m/s ⌋ − l s and the sum of resolution gaps ∆ = P ms =1 d s . If ∆ = 0, the generator is said to be fully equidistributed ( FE ).Note that P s contains the origin { } and the output values of a Tausworthegenerator for the entire period of 2 m −
1. Chen et al. [3] implemented FETausworthe generators for Markov chain QMC.
3. Main result
To construct a point set that approximates CUD sequences in Defini-tion 2, we search for a pair of polynomials ( p ( x ) , q ( x )) whose t -values areoptimal for s = 2 and as small as possible for s ≥
3. Thus, we refine thealgorithm of Tezuka and Fushimi [30].6or dimension s = 2, there is a connection between the t -value of poly-nomial Korobov lattice point sets (8) and continued fraction expansion of q ( x ) /p ( x ). Let q ( x ) p ( x ) = A ( x ) + 1 A ( x ) + 1 A ( x ) + 1. . . + 1 A v ( x ) =: [ A ( x ); A ( x ) , A ( x ) , . . . , A v ( x )]be the continued fraction expansion of the rational function q ( x ) /p ( x ) with apolynomial part A ( x ) ∈ F [ x ] and partial quotients A k ( x ) ∈ F [ x ] satisfyingdeg( A k ( x )) ≥ ≤ k ≤ v . Theorem 1 ([23, 30]).
Let p ( x ) ∈ F [ x ] with m = deg( p ( x )) and q ( x ) ∈ F [ x ] with deg( q ( x )) < m . Assume gcd( p ( x ) , q ( x )) = 1 . Then, the two-dimensional point set P = (cid:26) ν w (cid:18) h ( x ) p ( x ) (1 , q ( x )) (cid:19) (cid:12)(cid:12)(cid:12) deg( h ( x )) < m (cid:27) is a (0 , m, -net (i.e., the t -value is zero) if and only if the partial quotientsin the continued fraction expansion [0; A ( x ) , A ( x ) , . . . , A v ( x )] of q ( x ) /p ( x ) all have degree one, so v = m . The next theorem asserts the existence of q ( x ) with the above property forevery irreducible polynomial p ( x ). Theorem 2 ([22]).
Let p ( x ) be an irreducible polynomial with m = deg( p ( x )) and q ( x ) ∈ F [ x ] with deg q (( x )) < m . For each p ( x ) , there are exactly twopolynomials q ( x ) for which the partial quotients of the continued fractionexpansion of q ( x ) /p ( x ) all have degree one. In fact, the two polynomials are q ( x ) and q − ( x ) mod p ( x ), which meanthat we generate Tausworthe generators in normal order and reverse order,respectively. Hence, they yield essentially the same polynomial lattice pointset P s . 7o obtain ( p ( x ) , q ( x )) satisfying the above theorems, Tezuka and Fushimi[30] defined a polynomial analogue of Fibonacci numbers as follows: F k ( x ) = A k ( x ) F k − ( x ) + F k − ( x ) ( k ≥ , (9) F ( x ) = 1 , F ( x ) = A ( x ) , (10) A k ( x ) = x or x + 1 ( k ≥ . (11)They called a pair of polynomials ( F k ( x ) , F k − ( x )) a pair of “Fibonacci poly-nomials” because the partial quotients in the continued fraction of F k − ( x ) /F k ( x )are all of degree one. Figure 1 shows the initial part of a tree of Fibonaccipolynomials, which was originally illustrated in [28, Figure 4.5]. Note thatthere are 2 m different pairs ( F m ( x ) , F m − ( x )) for Fibonacci polynomials withdegree m . From them, we choose a suitable pair ( p ( x ) , q ( x )) that approxi-mates CUD sequences in Definition 2. Figure 1: A tree of Fibonacci polynomials.
Now we refine the algorithm of Tezuka and Fushimi [30]. Our exhaustivesearch algorithm proceeds as follows: 8 lgorithm 1
An exhaustive search algorithm Generate all the pairs ( F m ( x ) , F m − ( x )) using the recurrence relation ofFibonacci polynomials (9)–(11). Check the primitivity of F m ( x ). Find σ such that x σ ≡ F m − ( x ) mod F m ( x ) and 0 < σ < m −
1. Checkgcd( σ, m −
1) = 1 and σ ≥ w . Choose pairs ( F m ( x ) , F m − ( x )) whose t -value is equal to or smaller than3 for s = 3. Let t ( s ) be a t -value for dimension s . For each ( F m ( x ) , F m − ( x )), make avector ( t (4) , t (5) , t (6) , . . . , t ( m ) ) of the t -values. Sort pairs ( F m ( x ) , F m − ( x )) in lexicographic order based on( t (4) , t (5) , t (6) , . . . , t ( m ) ) starting from dimension 4. Choose one of the best (or smallest) pairs ( F m ( x ) , F m − ( x )) in Step 6. Set ( p ( x ) , q ( x )) ← ( F m ( x ) , F m − ( x )).In Step 4, this criterion means that the t -value is sufficiently small for s = 3; see Remark 2 for details. In Steps 4 and 5, we calculate the t -valuesby using Gaussian elimination [25] instead of solving Diophantine equationsin [30, Theorem 1]. Remark 1.
In the original paper [30], before Step 2, Tezuka and Fushimichecked the condition F m − ( x ) m + F m − ( x ) n + 1 = 0 mod F m ( x ) , where 0 < n < m , to obtain fast Tausworthe generators using trinomialgeneralized feedback shift register generators. They also restricted the calcu-lation of the t -values to only 3 ≤ s ≤
6. A reason for these conditions mightbe the difficulty of checking from Steps 2–5 on computers around 1990. Asa result, in the range 3 ≤ m ≤
32, there exist pairs ( F m ( x ) , F m − ( x )) onlyfor m = 3 , , , , , , , , , ,
28, and 31; otherwise, there exists nopair. In the related paper [29], the authors found pairs ( F m ( x ) , F m − ( x )) forall 3 ≤ m ≤
21 under a pentanomial condition. Currently, it is not difficultto remove these conditions when we conduct an exhaustive search on mod-ern computers. In Remark 3, we note a reasonably fast generation methodinstead of the direct use of Definition 3.
Remark 2.
In Step 4, we observed that the smallest t -values are 2 or 3for 10 ≤ s ≤
32 by exhaustive search. More precisely, there exist pairs9 p ( x ) , q ( x )) with t -value two only for 10 ≤ s ≤
14 and s = 16 and 17,and the number of them are quite few, compared with the number of pairswith t -value three. For example, in the case where s = 17, there exist fourpairs with t -value two but 464 pairs with t -value three. Thus, to find a pair( p ( x ) , q ( x )) with smaller t -value even for s ≥
4, we adopted this criterion.
Table 1 lists specific parameters for w = 32 ,
64 and 10 ≤ m ≤
32. InTable 1, each first and second row shows the coefficients of p ( x ) and q ( x ) re-spectively; for example, 1 1 0 1 means 1 + x + x . We also note the step size σ corresponding to q ( x ). For m = 21 and 28, we obtained the pairs of polyno-mials ( p ( x ) , q ( x )) with somewhat large defects ∆ = 6 and 4, respectively, sowe replaced them by the second-best pairs. Table 2 summarizes the t -valuesand sum of resolution gaps ∆ for our new Tausworthe generators (labeled“New”) and the existing Tausworthe generators developed by Chen et al. [3](labeled “Chen”) in the range of 2 ≤ s ≤
20. For 2 ≤ s ≤
5, our new genera-tors have the t -values equal to or smaller than the existing generators (exceptfor m = 32). It is known that QMC are successful in high-dimensional prob-lems, particularly in the case in which problems are dominated by the firstfew variables, so we focus on the optimization of leading dimensions. Con-versely, from the viewpoint of the FE property, our generators are not FE.We can also optimize both the t -values and FE property, but the t -valuesslightly increase. Thus, we prioritized the t -values over the FE property forsimplicity. The code in C is available at https://github.com/sharase/cud . Remark 3.
We note a reasonably fast generation method for Tausworthegenerators. Let x i = ( a iσ , a iσ +1 , . . . , a iσ + m − , a iσ + m , . . . , a iσ + w − ) T be a w -bitstate vector at step i for m ≤ w . We can define a state transition x i +1 = Bx i ,where B := (cid:0) b . . . b m − . . . (cid:1) is a w × w state transition matrixconsisting of w -bit column vectors b , . . . , b m − and w − m w -bit zero columnvectors . Then, we have the recurrence relation x i +1 = a iσ b ⊕ a iσ +1 b ⊕· · · ⊕ a iσ + m − b m − , which can be calculated by adding column vectors b j if a iσ + j = 1 holds for j = 0 , . . . , m −
1, where the symbol ⊕ denotes the bitwiseexclusive-or operation. Using this method, we can generate { u i } ∞ i =0 in (2)with reasonable speed. See [16, § B .10 able 1: Specific parameters of pairs of polynomials ( p ( x ) , q ( x )) and step sizes σ . m = 10 1 0 0 0 0 0 1 1 0 1 10 1 0 1 1 1 0 1 0 1 ( σ = 70) m = 11 1 1 0 0 1 0 0 1 1 0 1 10 1 0 0 0 0 1 1 1 0 1 ( σ = 179) m = 12 1 1 1 1 1 0 0 1 0 0 1 1 10 0 1 0 0 1 1 1 1 0 1 1 ( σ = 146) m = 13 1 1 1 0 1 0 0 0 1 0 1 1 1 11 0 1 0 1 1 1 1 1 0 0 1 1 ( σ = 139) m = 14 1 0 1 0 1 1 0 1 1 1 1 0 1 1 11 0 1 1 1 1 0 1 0 0 1 0 1 1 ( σ = 5192) m = 15 1 1 0 1 1 0 0 1 1 1 0 1 0 1 1 10 0 1 1 0 1 1 1 0 0 0 0 0 1 1 ( σ = 1028) m = 16 1 1 0 1 0 1 1 1 1 1 0 0 1 0 0 1 11 0 0 1 1 1 0 1 0 0 1 1 0 1 1 1 ( σ = 12749) m = 17 1 0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 1 11 1 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 ( σ = 20984) m = 18 1 1 0 1 0 1 1 0 1 0 1 0 0 0 1 1 0 1 11 1 1 0 0 1 1 1 0 0 0 0 0 1 1 1 0 1 ( σ = 72349) m = 19 1 0 1 1 0 1 1 1 1 0 0 0 1 1 0 0 1 0 0 10 0 0 0 1 1 1 1 0 0 0 0 1 1 1 0 1 0 1 ( σ = 92609) m = 20 1 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 0 0 1 0 10 1 0 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 0 1 ( σ = 226826) m = 21 1 1 1 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 10 1 0 1 1 1 0 0 1 1 0 0 1 1 0 1 0 0 1 0 1 ( σ = 1127911) m = 22 1 1 0 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 1 0 1 10 0 1 1 0 1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 1 1 ( σ = 629680) m = 23 1 1 1 0 0 1 1 0 0 1 0 1 0 1 1 0 0 1 1 1 0 0 0 11 0 1 0 0 1 0 0 1 1 0 0 1 0 1 1 1 1 0 0 0 1 1 ( σ = 1796311) m = 24 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1 0 11 1 0 0 0 0 1 1 1 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 ( σ = 7017398) m = 25 1 1 1 0 1 0 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 10 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 0 1 1 0 0 1 1 ( σ = 2947446) m = 26 1 1 1 0 1 0 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 1 1 1 11 1 0 1 1 1 0 1 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 1 1 ( σ = 19101221) m = 27 1 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 1 1 0 1 1 1 0 1 0 10 1 0 1 0 0 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 1 0 1 1 1 1 ( σ = 4397933) m = 28 1 0 0 0 1 0 1 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 10 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 1 0 1 0 0 1 1 ( σ = 167713336) m = 29 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 11 1 1 1 1 0 1 0 0 1 1 1 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 0 1 ( σ = 83189117) m = 30 1 0 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 0 10 1 0 1 1 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0 1 ( σ = 315800840) m = 31 1 0 1 1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 1 1 1 0 1 1 1 1 0 1 1 0 1 10 0 0 0 1 1 1 1 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 0 1 1 0 1 0 1 ( σ = 36109125) m = 32 1 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 10 1 0 0 0 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 ( σ = 686019401) able 2: Comparison of the t -values and ∆ for our new Tausworthe generators and theexisting Tausworthe generators developed by Chen et al. [3]. m dim. s . Numerical examples In this section, we provide numerical examples to confirm the performanceof Markov chain QMC.
Our first example is a systematic Gibbs sampler to generate the two-dimensional Gaussian distribution X = (cid:18) X X (cid:19) ∼ N (cid:18)(cid:18) (cid:19) , (cid:18) ρρ (cid:19)(cid:19) for correlation ρ ∈ ( − , X i, ← ρX i − , + p − ρ Φ − ( u i − ) ,X i, ← ρX i, + p − ρ Φ − ( u i − ) , where Φ is the cumulative distribution function for the standard normal dis-tribution. For the output values (3) generated from Tausworthe generators,we define two-dimensional non-overlapping points starting from the origin:(0 , , ( u , u ) , ( u , u ) , . . . , ( u N − , u ) , ( u , u ) , . . . , ( u N − , u N − ) , (12)where N = 2 m . We apply digital shifts, that is, we add ( z , z ) to each pointin (12) using bitwise exclusive-or ⊕ , where z and z are IID samples fromU(0, 1).We estimate E ( X ) and E ( X ) by taking the sample mean. Hence, thetrue values are zero. We compare the following driving sequences:1. New: our new Tausworthe generators;2. Chen et al. (2012): Tausworthe generators developed by Chen et al. [3];and3. IID: Mersenne Twister [21].Figure 4.1 shows a summary of standard deviations (in log scale) for ρ =0 , . . ≤ m ≤
25 using 100 digital shifts. Our new generatorsoutperformed Chen’s generators for no correlation ρ = 0 and weak correlation ρ = 0 .
3. Even for strong correlation ρ = 0 .
9, our new generators were stillbetter than Chen’s generators. In Figure 4.1, we generated scatter plots ofsampling ( X , X ) from our new and Chen’s Tausworthe generators for ρ = 013nd m = 12. In the scatter plots, Chen’s generator has a pattern of wigglystrips of points, which is optimized in terms of 64 ×
64 grids for s = 2, butour generator seems to be highly balanced both for X and X . Therefore, itcan be expected that our new generators have better marginal distributionsthan the existing generators.In addition, as a test function, we estimated E ( X X ), which has the truevalue ρ . Figure 4.1 shows a summary of standard deviations (in log scale)for ρ = 0 , . . ≤ m ≤
25 using 100 digital shifts. We obtainedthe results in which our new generators were superior to Chen’s generatorsespecially for ρ = 0 and 0 . Our second example is a hierarchical Bayesian model [9] used in [24,31, 20]. Following [26, Example 7.12], we explain the problem setting. Weconsider multiple failures of ten pumps in a nuclear plant, with the datagiven in Table 3. The modeling is based on the assumption that the numberof failures of the j th pump follows a Poisson process with parameter λ j ( j = 1 , . . . , t j , the number of failures X j is thusa Poisson P ( λ j t j ) random variable. The standard prior distributions aregamma distributions G ( α, β ) with shape parameter α and rate parameter β ,which lead to the hierarchical model X j ∼ P ( λ j t j ) , j = 1 , . . . , ,λ j ∼ G ( α, β ) , j = 1 , . . . , ,β ∼ G ( γ, δ ) , where the hyperparameter values are α = 1 . , γ = 0 .
1, and δ = 1. Ourgoal is to estimate the posterior means E [ λ j ] and E [ β ] by taking the samplemean. For this purpose, we use a Gibbs sampler based on the full conditionaldistributions λ j | β, t j , x j ∼ G ( x j + α, t j + β ) , j = 1 , . . . , ,β | λ , . . . , λ ∼ G γ + 10 α, δ + X j =1 λ j ! . Note that the state vector ( λ , . . . , λ , β ) has eleven dimensions. The start-ing point uses the maximum likelihood estimates x j /t j for λ j together with14 l og ( S t anda r d D e v i a t i on ) mE(X ) for r = 0 NewChen et al. (2012)IID -30-25-20-15-10-5 12 14 16 18 20 22 24 l og ( S t anda r d D e v i a t i on ) mE(X ) for r = 0 NewChen et al. (2012)IID-30-25-20-15-10-5 12 14 16 18 20 22 24 l og ( S t anda r d D e v i a t i on ) mE(X ) for r = 0.3 NewChen et al. (2012)IID -30-25-20-15-10-5 12 14 16 18 20 22 24 l og ( S t anda r d D e v i a t i on ) mE(X ) for r = 0.3 NewChen et al. (2012)IID-24-22-20-18-16-14-12-10-8-6-4 12 14 16 18 20 22 24 l og ( S t anda r d D e v i a t i on ) mE(X ) for r = 0.9 NewChen et al. (2012)IID -24-22-20-18-16-14-12-10-8-6-4 12 14 16 18 20 22 24 l og ( S t anda r d D e v i a t i on ) mE(X ) for r = 0.9 NewChen et al. (2012)IID Figure 2: Estimation of E ( X ) and E ( X ) for ρ = 0 , . . X X New -4-3-2-1 0 1 2 3 4-4 -3 -2 -1 0 1 2 3 4 X X Chen et al. (2012)
Figure 3: Scatter plots of sampling ( X , X ) for ρ = 0 and m = 12. l og ( S t anda r d D e v i a t i on ) mE(X X ) for r = 0 NewChen et al. (2012)IID-24-22-20-18-16-14-12-10-8-6-4 12 14 16 18 20 22 24 l og ( S t anda r d D e v i a t i on ) mE(X X ) for r = 0.3 NewChen et al. (2012)IID -22-20-18-16-14-12-10-8-6-4 12 14 16 18 20 22 24 l og ( S t anda r d D e v i a t i on ) mE(X X ) for r = 0.9 NewChen et al. (2012)IID Figure 4: Estimation of E ( X X ) for ρ = 0 , . . β , given the starting λ j . The Gibbs sam-pling is driven by inversion of gamma cumulative density functions. Sim-ilarly to (12), for the output values (3), we define eleven-dimensional non-overlapping points ( u , . . . , u ) , ( u , . . . , u ) , . . . , ( u N − , . . . , u N − − ),starting from the origin (0 , . . . , N = 2 m and gcd(2 m − ,
11) = 1.Table 4 shows a summary of sample variances of posterior mean estimatesfor m = 12 , ,
16, and 18 using 300 digital shifts. Our new Tausworthe gen-erators were comparable to or even better than Chen’s Tausworthe generatorswith a few exceptions (e.g., λ , λ , and λ for m = 14). Such exceptions oc-curred in pumps for short monitoring periods, and this implies that it mightbe difficult to estimate those parameters with high accuracy from the per-spective of Bayesian inference. In any case, our new generators were at leastsuperior to IID uniform random number sequences generated by MersenneTwister. Table 3: Number of failures and times of observation of ten pumps in a nuclear plant [8].
Pump j x j t j Remark 4.
In our experiments, we set w = 32. In fact, Chen et al. [3]originally defined Tausworthe generators in (2) with m -bit precision, that is, u i = P m − j =0 a iσ + j − j − ∈ [0 , w bits as in (2), and then the differences became clear. Remark 5.
Sequential Monte Carlo (SMC) can be used to perform Bayesianinference when the data are accumulated sequentially rather than being givena priori. Recently, Gerber and Chopin [10] developed a class of algorithmscombining SMC and randomized QMC to accelerate convergence.
5. Conclusion
We conducted an exhaustive search of short-period Tausworthe generatorsfor Markov chain QMC in terms of the t -value. Our key technique was touse the continued fraction expansion of q ( x ) /p ( x ) by refining the algorithm18 able 4: Variance of posterior mean estimates for pump failure data. m = 12Parameter λ λ λ λ λ IID 1.77e-07 1.98e-06 4.12e-07 1.96e-07 2.40e-05Chen 4.77e-11 7.18e-10 8.91e-11 4.69e-11 7.44e-09New
Parameter λ λ λ λ λ β IID 4.14e-06 9.79e-05 9.00e-05 1.05e-04 4.80e-05 2.29e-04Chen 1.09e-09 m = 14Parameter λ λ λ λ λ IID 4.33e-08 5.44e-07 9.21e-08 6.67e-08 5.46e-06Chen 4.86e-12 1.07e-10 1.05e-11 5.15e-12 5.64e-09New
Parameter λ λ λ λ λ β IID 1.12e-06 2.21e-05 2.32e-05 2.40e-05 1.21e-05 6.28e-05Chen 9.75e-11 m = 16Parameter λ λ λ λ λ IID 1.08e-08 1.42e-07 2.42e-08 1.21e-08 1.44e-06Chen 4.03e-13 8.28e-12 1.07e-12 4.73e-13 7.05e-11New
Parameter λ λ λ λ λ β IID 3.01e-07 5.34e-06 5.65e-06 6.79e-06 2.61e-06 1.67e-05Chen 8.74e-12 5.11e-10 5.34e-10 3.90e-10 9.90e-11 2.20e-09New m = 18Parameter λ λ λ λ λ IID 2.48e-09 3.21e-08 7.49e-09 3.47e-09 3.88e-07Chen 2.50e-14 1.05e-12 5.12e-14 2.58e-14 1.86e-11New
Parameter λ λ λ λ λ β IID 8.30e-08 1.34e-06 1.64e-06 1.65e-06 6.65e-07 4.24e-06Chen 1.39e-12 7.52e-11 1.83e-10 9.83e-11 1.68e-11 9.01e-10New
19f Tezuka and Fushimi [30] on modern computers. As a result, we obtainedthe point sets with t -values optimal for s = 2 and small for s ≥
3. Wealso reported numerical examples using Gibbs sampling in which our newgenerators performed better than the existing generators of Chen et al. [3].The code in C is available at https://github.com/sharase/cud .As a future work, we will attempt more realistic numerical examples as in[1, 32, 31]. For this purpose, we believe that the next task is to embed our newand existing generators into several programming languages for statisticalcomputing; for example, R, Stan, and Python. Thus, we are now planning asoftware implementation of Markov chain QMC.
Acknowledgements
This work was supported by JSPS KAKENHI Grant Numbers JP18K18016,JP26730015, JP26310211, JP15K13460. The author would like to thank theanonymous reviewers for many valuable comments and suggestions.
References [1] S. Chen, Consistency and convergence rate of Markov chain quasi-MonteCarlo with examples, 2011. Thesis (Ph.D.)–Stanford University.[2] S. Chen, J. Dick, A.B. Owen, Consistency of Markov chain quasi-MonteCarlo on continuous state spaces, Ann. Statist. 39 (2011) 673–701.[3] S. Chen, M. Matsumoto, T. Nishimura, A.B. Owen, New inputs andmethods for Markov chain quasi-Monte Carlo, in: Monte Carlo andquasi-Monte Carlo methods 2010, volume 23 of