Aperiodic pseudorandom number generators based on infinite words

Abstract

In this paper we study how certain families of aperiodic infinite words can be used to produce aperiodic pseudorandom number generators (PRNGs) with good statistical behavior. We introduce the well distributed occurrences (WELLDOC) combinatorial property for infinite words, which guarantees absence of the lattice structure defect in related pseudorandom number generators. An infinite word $u$ on a $d$-ary alphabet has the WELLDOC property if, for each factor $w$ of $u$, positive integer $m$, and vector $v \in \mathbb{Z}_m^d$, there is an occurrence of $w$ such that the Parikh vector of the prefix of $u$ preceding such occurrence is congruent to $v$ modulo $m$. (The Parikh vector of a finite word $v$ over an alphabet $\mathcal{A}$ has its $i$-th component equal to the number of occurrences of the $i$-th letter of $\mathcal{A}$ in $v$.) We prove that Sturmian words, and more generally Arnoux-Rauzy words and some morphic images of them, have the WELLDOC property. Using the TestU01 and PractRand statistical tests, we moreover show that not only is the lattice structure absent, but other important properties of PRNGs are also improved when linear congruential generators are combined using infinite words having the WELLDOC property.


Ľubomíra Balková, Michelangelo Bucci, Alessandro De Luca, Jiří Hladký, and Svetlana Puzynina

Introduction

Pseudorandom number generators aim to produce random numbers using a deterministic process. No wonder they suffer from many defects. The most usual ones, linear congruential generators, are known to produce periodic sequences with a defect called the lattice structure. Guimond et al. [15] proved that when two linear congruential generators are combined using infinite words coding certain classes of quasicrystals or, equivalently, of cut-and-project sets, the resulting sequence is aperiodic and has no lattice structure. For some other related results concerning aperiodic pseudorandom generators we refer to [13, 14]. We mention that although the lattice structure is considered a defect of a random number generator, it can be useful in some applications for approximation of the uniform distribution [10].

We have found a combinatorial condition, well distributed occurrences, or WELLDOC for short, that also guarantees absence of the lattice structure in related pseudorandom generators. The WELLDOC property for an infinite word $u$ over an alphabet $\mathcal{A}$ means that for any integer $m$ and any factor $w$ of $u$, the set of Parikh vectors modulo $m$ of prefixes of $u$ preceding the occurrences of $w$ coincides with $\mathbb{Z}_m^{|\mathcal{A}|}$ (see Definition 2.1). In other words, among Parikh vectors modulo $m$ of such prefixes one has all possible vectors. Besides giving generators without lattice structure, the WELLDOC property is an interesting combinatorial property of infinite words in itself. We prove that the WELLDOC property holds for the family of Sturmian words, and more generally for Arnoux-Rauzy words.

Sturmian words constitute a well studied family of infinite aperiodic words. Let $u$ be an infinite word, i.e., an infinite sequence of elements from a finite set called an alphabet. The (factor) complexity function counts the number of distinct factors of $u$ of length $n$. A fundamental result of Morse and Hedlund [18] states that a word $u$ is eventually periodic if and only if for some $n$ its complexity is less than or equal to $n$. Infinite words of complexity $n + 1$ for all $n$ are called Sturmian words, and hence they are aperiodic words of the smallest complexity. The most studied Sturmian word is the so-called Fibonacci word

$$01001010010010100101001001010010\ldots$$

fixed by the morphism $0 \mapsto 01$ and $1 \mapsto 0$. (See Section 2 for formal definitions.) The first systematic study of Sturmian words was given by Morse and Hedlund in [19]. Such sequences arise naturally in many contexts, and admit various types of characterizations of geometric and combinatorial nature (see, e.g., [16]).

Arnoux-Rauzy words were introduced in [1] as natural extensions of Sturmian words to multiliteral alphabets (see Definition 4.4). Despite the fact that they were introduced as generalizations of Sturmian words, Arnoux-Rauzy words display a much more complex behavior. In particular, we have two different proofs of the WELLDOC property for Sturmian words, and only one of them can be generalized to Arnoux-Rauzy words. In the sequel we provide both of them.

An infinite word with the WELLDOC property is then used to combine two linear congruential generators and form an infinite aperiodic sequence with good statistical behavior. Using the TestU01 [12] and PractRand [6] statistical tests, we have moreover shown that not only is the lattice structure absent, but other important properties of PRNGs are also improved when linear congruential generators are combined using infinite words having the WELLDOC property.

The paper is organized as follows. In the next section, we give some background on pseudorandom number generation. Next, in Section 2, we give the basic combinatorial definitions needed for our main results, including the WELLDOC property, and we prove that the WELLDOC property of $u$ guarantees absence of the lattice structure of the PRNG based on $u$. In Sections 3 and 4, we prove that the property holds for Sturmian and Arnoux-Rauzy words. Finally, in Section 5, we present results of empirical tests of PRNGs based on words having the WELLDOC property.

A preliminary version of this paper [2], using the acronym WDO instead of WELLDOC, was presented at the WORDS 2013 conference.

1. Pseudorandom Number Generators and Lattice Structure

For the sake of our discussion, any infinite sequence of integers can be understood as a pseudorandom number generator (PRNG); see also [15]. The generators most widely used in the past, linear congruential generators, are known to suffer from a defect called the lattice structure (they possess it already from dimension 2, as shown in [17]).

Let $Z = (Z_n)_{n \in \mathbb{N}}$ be a PRNG whose output is a finite set $M \subset \mathbb{N}$. We say that $Z$ has the lattice structure if there exists $t \in \mathbb{N}$ such that the set

$$\{(Z_i, Z_{i+1}, \ldots, Z_{i+t-1}) \mid i \in \mathbb{N}\}$$

is covered by a family of parallel equidistant hyperplanes and, at the same time, this family does not cover the whole lattice

$$M^t = \{(A_1, A_2, \ldots, A_t) \mid A_i \in M \text{ for all } i \in \{1, \ldots, t\}\}.$$

Recall that a linear congruential generator (LCG) $(Z_n)_{n \in \mathbb{N}}$ is given by parameters $a, m, c \in \mathbb{N}$ and defined by the recurrence relation $Z_{n+1} = (aZ_n + c) \bmod m$. Let us mention a famous example of an LCG whose lattice structure is striking. For $t = 3$, the set of triples of RANDU, i.e., $\{(Z_i, Z_{i+1}, Z_{i+2}) \mid i \in \mathbb{N}\}$, is covered by only 15 parallel equidistant hyperplanes, see Figure 1.

Figure 1. The triples of RANDU, the LCG with $a = 2^{16} + 3$, $m = 2^{31}$, $c = 0$, are covered by as few as 15 parallel equidistant planes.

A restricted version of the sufficient condition for the absence of the lattice structure stated in Proposition 1.1 was formulated by Guimond et al. [15].
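The LCG recurrence and the RANDU example can be checked numerically. The following Python sketch is illustrative only: the function name `lcg`, the seed, and the sample size are assumptions, not taken from the paper. It relies on the elementary identity $(2^{16}+3)^2 \equiv 6(2^{16}+3) - 9 \pmod{2^{31}}$, which gives $9Z_i - 6Z_{i+1} + Z_{i+2} \equiv 0 \pmod{2^{31}}$ for consecutive RANDU outputs.

```python
def lcg(a, c, m, seed, count):
    """Return Z_1, ..., Z_count for Z_{n+1} = (a * Z_n + c) mod m with Z_0 = seed."""
    outputs = []
    z = seed
    for _ in range(count):
        z = (a * z + c) % m
        outputs.append(z)
    return outputs


# RANDU parameters as stated in the text: a = 2^16 + 3, m = 2^31, c = 0.
RANDU_A, RANDU_C, RANDU_M = 2**16 + 3, 0, 2**31

# Seed 1 and 10_000 outputs are arbitrary illustrative choices (assumptions).
zs = lcg(RANDU_A, RANDU_C, RANDU_M, seed=1, count=10_000)

# Index k of the plane 9x - 6y + z = k * 2^31 containing each consecutive triple.
plane_indices = set()
for x, y, z in zip(zs, zs[1:], zs[2:]):
    value = 9 * x - 6 * y + z
    assert value % RANDU_M == 0  # every triple lies on one of the planes
    plane_indices.add(value // RANDU_M)

print(sorted(plane_indices))
```

Since every output lies in $[0, 2^{31})$, the integer $k = (9Z_i - 6Z_{i+1} + Z_{i+2})/2^{31}$ can only take the values $-5, \ldots, 9$, that is, at most 15 values, which is consistent with the 15 planes of Figure 1.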

Proposition 1.1.

Let $Z$ be a PRNG whose output is a finite set $M \subset \mathbb{N}$ containing at least two elements. Assume that for any $A, B \in M$ and for any $\ell \in \mathbb{N}$ there exists an $\ell$-tuple $(A_1, A_2, \ldots, A_\ell)$ such that both $(A_1, A_2, \ldots, A_\ell, A)$ and $(A_1, A_2, \ldots, A_\ell, B)$ are $(\ell + 1)$-tuples of the generator $Z$. Then $Z$ does not have the lattice structure.

Remark 1.2. Proposition 1.1 can be reformulated in terms of combinatorics on words (see Section 2) as follows: Let $Z$ be a PRNG whose output is a finite set $M \subset \mathbb{N}$ containing at least two elements. If for any $A, B \in M$ and any length $\ell$, $Z$ has a right special factor of length $\ell$ with right extensions $A$ and $B$, then $Z$ does not have the lattice structure.

Since Proposition 1.1 is formulated for a restricted class of generators in [15] (see Lemma 2.3 ibidem), we will provide its proof. However, we point out that all ideas of the proof are taken from [15]. We start with an auxiliary lemma.

Let us denote $\lambda = \gcd\{A - B \mid A, B \in M\}$.

Lemma 1.3.

Let $Z$ be a PRNG satisfying all assumptions of Proposition 1.1. Let $\bar{n}$ be the unit normal vector of a family of parallel equidistant hyperplanes covering all $t$-tuples of $Z$. Assume $\bar{e}_i$ (the $i$-th vector of the canonical basis of the Euclidean space $\mathbb{R}^t$) is not orthogonal to $\bar{n}$. Then the distance $d_i$ of adjacent hyperplanes in the family along $\bar{e}_i$ is of the form $\lambda/k$ for some $k \in \mathbb{N}$.

Remark 1.4. The distance $d_i$ of adjacent hyperplanes $W_0$, $W_1$ along $\bar{e}_i$ means $|x_i - y_i|$ for any $\bar{x} \in W_0$ and $\bar{y} \in W_1$, where the $j$-th components of $\bar{x}$ and $\bar{y}$ satisfy $x_j = y_j$ for all $j \in \{1, \ldots, t\}$, $j \neq i$. This is a well defined term because the hyperplanes in the family are of the form $W_j \equiv \bar{x} \cdot \bar{n} = \alpha + jd$, $j \in \mathbb{Z}$, where $d$ is the distance of adjacent hyperplanes in the family and $\cdot$ denotes the standard scalar product. Thus, without loss of generality, consider the adjacent hyperplanes $W_0 \equiv \bar{x} \cdot \bar{n} = \alpha$ and $W_1 \equiv \bar{x} \cdot \bar{n} = \alpha + d$. Then for any $\bar{x} \in W_0$ and $\bar{y} = \bar{x} + s\bar{e}_i$ from $W_1$, we have

$$\bar{y} \cdot \bar{n} = \alpha + d = \bar{x} \cdot \bar{n} + d, \qquad \bar{y} \cdot \bar{n} = \bar{x} \cdot \bar{n} + s\,\bar{e}_i \cdot \bar{n} = \bar{x} \cdot \bar{n} + s n_i,$$

where $n_i$ is the $i$-th component of $\bar{n}$. Consequently, $d_i = |s| = \left|\frac{d}{n_i}\right|$, and it is the same for any choice of $\bar{x}$ and $\bar{y}$ which differ only in their $i$-th component and belong to adjacent hyperplanes.

Proof of Lemma 1.3.

Let us start with a useful observation. Let $\bar{z}$ belong to a hyperplane $W$ of the family in question.

(1) If $\bar{e}_j$ is orthogonal to $\bar{n}$, then we may change the $j$-th component of $\bar{z}$ in an arbitrary way and the resulting vector will belong to the same hyperplane, i.e., if $W \equiv \bar{x} \cdot \bar{n} = \alpha$, then clearly $(\bar{z} + \beta\bar{e}_j) \cdot \bar{n} = \bar{z} \cdot \bar{n} = \alpha$ for any $\beta \in \mathbb{R}$, thus $\bar{z} + \beta\bar{e}_j$ belongs to $W$.

(2) If $\bar{e}_j$ is not orthogonal to $\bar{n}$ and the distance $d_j$ of adjacent hyperplanes along $\bar{e}_j$ in the family is of the form $\lambda/k$ for some $k \in \mathbb{N}$, then $\bar{z} + r\lambda\bar{e}_j$ belongs to the family for any $r \in \mathbb{Z}$. This follows from a repeated application of the fact that if $\bar{z}$ belongs to a hyperplane $W$, then $\bar{z} + \frac{\lambda}{k}\bar{e}_j$ belongs to an adjacent hyperplane of $W$.

Let us proceed by contradiction, i.e., we assume that there exists $i \in \{1, \ldots, t\}$ such that $\bar{e}_i$ is not orthogonal to $\bar{n}$ and the distance along $\bar{e}_i$ of adjacent hyperplanes of the family in question is not of the form $\lambda/k$, $k \in \mathbb{N}$. Take the largest of such indices and denote it by $\ell$. Choose $A, B \in M$ arbitrarily. According to the assumptions, there exists an $(\ell - 1)$-tuple $(A_1, A_2, \ldots, A_{\ell-1})$ such that both $(A_1, A_2, \ldots, A_{\ell-1}, A)$ and $(A_1, A_2, \ldots, A_{\ell-1}, B)$ are $\ell$-tuples of $Z$. It is therefore possible to find two $t$-tuples of $Z$ such that the first one is of the form $(A_1, A_2, \ldots, A_{\ell-1}, A, A_{\ell+1}, \ldots, A_t)$ and the second one of the form $(A_1, A_2, \ldots, A_{\ell-1}, B, \hat{A}_{\ell+1}, \ldots, \hat{A}_t)$. These two $t$-tuples, considered as vectors in $\mathbb{R}^t$, belong by the assumption of Lemma 1.3 to some hyperplanes in the family. Since for every $j \in \{\ell + 1, \ldots, t\}$ either $\bar{e}_j$ is orthogonal to $\bar{n}$ or the distance of adjacent hyperplanes along $\bar{e}_j$ is of the form $\lambda/k$ for some $k \in \mathbb{N}$, we can change the last $t - \ell$ coordinates $\hat{A}_{\ell+1}, \ldots, \hat{A}_t$ of the second vector to arbitrary values from $M$ (we transform them into $A_{\ell+1}, \ldots, A_t$) and it will still belong to a hyperplane in the family. This is a consequence of the observation at the beginning of this proof. Hence, both vectors $(A_1, A_2, \ldots, A_{\ell-1}, A, A_{\ell+1}, \ldots, A_t)$ and $(A_1, A_2, \ldots, A_{\ell-1}, B, A_{\ell+1}, \ldots, A_t)$ belong to some hyperplanes of the family. Their distance along $\bar{e}_\ell$ equals $|A - B|$, i.e., $d_\ell$ divides $A - B$. Since $A, B$ have been chosen arbitrarily, it follows that $d_\ell$ divides $\lambda$, i.e., $\lambda = k d_\ell$ for some $k \in \mathbb{N}$, which contradicts the choice of $\bar{e}_\ell$. □

Proof of Proposition 1.1.

Let $\bar{n}$ be the unit normal vector of a family of parallel equidistant hyperplanes covering all $t$-tuples of $Z$. Suppose without loss of generality that $\bar{e}_1, \ldots, \bar{e}_\ell$ are not orthogonal to $\bar{n}$ and $\bar{e}_{\ell+1}, \ldots, \bar{e}_t$ are orthogonal to $\bar{n}$. Let $\bar{z} = (Z_n, Z_{n+1}, \ldots, Z_{n+t-1})$ be a $t$-tuple of $Z$, thus $\bar{z}$ belongs to one of the hyperplanes. Take any vector $\bar{y} \in M^t$ and let us show that it belongs to a hyperplane in the family.

(1) Any vector from $M^t$ which differs from $\bar{z}$ only in the first $\ell$ components belongs to a hyperplane of the family. This comes from Lemma 1.3 because when we change, for $i \in \{1, \ldots, \ell\}$, the $i$-th component of $\bar{z}$ by $d_i = \frac{\lambda}{k}$, then we jump onto the adjacent parallel hyperplane. So, any transformation of the $i$-th component of $\bar{z}$ into another value from $M$ means a finite number of jumps from one hyperplane onto another. Hence, we may transform $\bar{z}$ so that its first $\ell$ components are equal to those of $\bar{y}$, and the obtained vector $\bar{x}$ belongs to a hyperplane in the family.

(2) Any vector from $M^t$ which differs from $\bar{x}$ only in the last $t - \ell$ components belongs to the same hyperplane as $\bar{x}$. This comes from the orthogonality $\bar{e}_i \perp \bar{n}$ for $i > \ell$ (the argument is the same as in the proof of Lemma 1.3). Since $\bar{y}$ differs from $\bar{x}$ only in the last $t - \ell$ components, $\bar{y}$ belongs to a hyperplane in the family. □

2. Combinatorics on Words and the WELLDOC Property

2.1. Background on Combinatorics on Words.

In the following, $\mathcal{A}$ denotes a finite set of symbols called letters; the set $\mathcal{A}$ is therefore called an alphabet. A finite word is a finite string $w = w_1 w_2 \ldots w_n$ of letters from $\mathcal{A}$; its length is denoted by $|w| = n$ and $|w|_a$ denotes the number of occurrences of $a \in \mathcal{A}$ in $w$. The empty word, a neutral element for concatenation of finite words, is denoted $\varepsilon$ and it is of zero length. The set of all finite words over the alphabet $\mathcal{A}$ is denoted by $\mathcal{A}^*$.

By an infinite word we understand an infinite sequence $u = u_0 u_1 u_2 \ldots$ of letters from $\mathcal{A}$. A finite word $w$ is a factor of a word $v$ (finite or infinite) if there exist words $p$ and $s$ such that $v = pws$. If $p = \varepsilon$, then $w$ is said to be a prefix of $v$; if $s = \varepsilon$, then $w$ is a suffix of $v$. The sets of factors and prefixes of $v$ are denoted by $\mathrm{Fact}(v)$ and $\mathrm{Pref}(v)$, respectively. If $v = ps$ for finite words $v, p, s$, then we write $p = vs^{-1}$ and $s = p^{-1}v$.

An infinite word $u$ over the alphabet $\mathcal{A}$ is called eventually periodic if it is of the form $u = vw^\omega$, where $v$, $w$ are finite words over $\mathcal{A}$ and $\omega$ denotes an infinite repetition. An infinite word is called aperiodic if it is not eventually periodic.

For any factor $w$ of an infinite word $u$, every index $i$ such that $w$ is a prefix of the infinite word $u_i u_{i+1} u_{i+2} \ldots$ is called an occurrence of $w$ in $u$. An infinite word $u$ is recurrent if each of its factors has infinitely many occurrences in $u$.

The factor complexity of an infinite word $u$ is a map $C_u : \mathbb{N} \to \mathbb{N}$ defined by $C_u(n) :=$ the number of factors of length $n$ contained in $u$. The factor complexity of eventually periodic words is bounded, while the factor complexity of an aperiodic word $u$ satisfies $C_u(n) \geq n + 1$ for all $n \in \mathbb{N}$. A right extension of a factor $w$ of $u$ over the alphabet $\mathcal{A}$ is any letter $a \in \mathcal{A}$ such that $wa$ is a factor of $u$. Of course, any factor of $u$ has at least one right extension. A factor $w$ is called right special if $w$ has at least two right extensions. Similarly, one can define a left extension and a left special factor. A factor is bispecial if it is both right and left special. An aperiodic word contains right special factors of any length.

The Parikh vector of a finite word $w$ over an alphabet $\{0, 1, \ldots, d - 1\}$ is defined as $(|w|_0, |w|_1, \ldots, |w|_{d-1})$. For a finite or infinite word $u = u_0 u_1 u_2 \ldots$, $\mathrm{Pref}_n u$ will denote the prefix of length $n$ of $u$, i.e., $\mathrm{Pref}_n u = u_0 u_1 \ldots u_{n-1}$.

Some of the examples we consider are morphic words. A morphism is a function $\varphi : \mathcal{A}^* \to \mathcal{B}^*$ such that $\varphi(\varepsilon) = \varepsilon$ and $\varphi(wv) = \varphi(w)\varphi(v)$ for all $w, v \in \mathcal{A}^*$. Clearly, a morphism is completely defined by the images of the letters in the domain. A morphism is prolongable on $a \in \mathcal{A}$ if $|\varphi(a)| \geq 2$ and $a$ is a prefix of $\varphi(a)$. If $\varphi$ is prolongable on $a$, then $\varphi^n(a)$ is a proper prefix of $\varphi^{n+1}(a)$ for all $n \in \mathbb{N}$. Therefore, the sequence $(\varphi^n(a))_{n \geq 0}$ of words defines an infinite word $u$ that is a fixed point of $\varphi$. Such a word $u$ is a (pure) morphic word.

Let us introduce a combinatorial condition on infinite words that, as we will see later, guarantees no lattice structure for the associated PRNGs.

Definition 2.1 (The WELLDOC property). We say that an aperiodic infinite word $u$ over the alphabet $\{0, 1, \ldots, d - 1\}$ has well distributed occurrences (or has the WELLDOC property) if for any $m \in \mathbb{N}$ and any factor $w$ of $u$ the word $u$ satisfies the following condition. If $i_0, i_1, \ldots$ denote the occurrences of $w$ in $u$, then

$$\left\{ \left( |\mathrm{Pref}_{i_j} u|_0, \ldots, |\mathrm{Pref}_{i_j} u|_{d-1} \right) \bmod m \;\middle|\; j \in \mathbb{N} \right\} = \mathbb{Z}_m^d;$$

that is, the Parikh vectors of $\mathrm{Pref}_{i_j} u$ for $j \in \mathbb{N}$, when reduced modulo $m$, give the whole set $\mathbb{Z}_m^d$.

We define the WELLDOC property for aperiodic words since it clearly never holds for periodic ones. It is easy to see that if a recurrent infinite word $u$ has the WELLDOC property, then for every vector $v \in \mathbb{Z}_m^d$ there are infinitely many values of $j$ such that the Parikh vector of $\mathrm{Pref}_{i_j} u$ is congruent to $v$ modulo $m$.

Example 2.2.

The Thue-Morse word

$$u = 01101001100101101001011001101001\cdots,$$

which is a fixed point of the morphism $0 \mapsto 01$, $1 \mapsto 10$, does not satisfy the WELLDOC property. Indeed, take $m = 2$ and $w = 00$; then $w$ occurs only in odd positions $i_j$, so that $|\mathrm{Pref}_{i_j} u|_0 + |\mathrm{Pref}_{i_j} u|_1 = i_j$ is odd. Thus, e.g., $\left( |\mathrm{Pref}_{i_j} u|_0, |\mathrm{Pref}_{i_j} u|_1 \right) \bmod 2 \neq (0, 0)$, and hence

$$\left\{ \left( |\mathrm{Pref}_{i_j} u|_0, |\mathrm{Pref}_{i_j} u|_1 \right) \bmod 2 \;\middle|\; j \in \mathbb{N} \right\} \neq \mathbb{Z}_2^2.$$

Example 2.3.

We say that an infinite word $u$ over an alphabet $\mathcal{A}$, $|\mathcal{A}| = d$, is universal if it contains all finite words over $\mathcal{A}$ as its factors. It is easy to see that any universal word satisfies the WELLDOC property. Indeed, for any word $w \in \mathcal{A}^*$ and any $m$ there exists a finite word $v$ such that if $i_0, i_1, \ldots, i_k$ denote the occurrences of $w$ in $v$, then

$$\left\{ \left( |\mathrm{Pref}_{i_j} v|_0, \ldots, |\mathrm{Pref}_{i_j} v|_{d-1} \right) \bmod m \;\middle|\; j \in \{0, 1, \ldots, k\} \right\} = \mathbb{Z}_m^d.$$

Since $u$ is universal, $v$ is a factor of $u$. Denoting by $i$ an occurrence of $v$ in $u$, one gets that the positions $i + i_j$ are occurrences of $w$ in $u$. Hence

$$\left\{ \left( |\mathrm{Pref}_{i+i_j} u|_0, \ldots, |\mathrm{Pref}_{i+i_j} u|_{d-1} \right) \bmod m \;\middle|\; j \in \{0, 1, \ldots, k\} \right\}$$
$$= \left( |\mathrm{Pref}_i u|_0, \ldots, |\mathrm{Pref}_i u|_{d-1} \right) + \left\{ \left( |\mathrm{Pref}_{i_j} v|_0, \ldots, |\mathrm{Pref}_{i_j} v|_{d-1} \right) \bmod m \;\middle|\; j \in \{0, 1, \ldots, k\} \right\} = \mathbb{Z}_m^d.$$

Therefore, $u$ satisfies the WELLDOC property.

2.2. Combination of PRNGs.

In order to eliminate the lattice structure, it helps to combine PRNGs in a smart way. Such a method was introduced in [14]. Let $X = (X_n)_{n \in \mathbb{N}}$ and $Y = (Y_n)_{n \in \mathbb{N}}$ be two PRNGs with the same output $M \subset \mathbb{N}$ and the same period $m \in \mathbb{N}$, and let $u = u_0 u_1 u_2 \ldots$ be a binary infinite word over the alphabet $\{0, 1\}$. The PRNG $Z = (Z_n)_{n \in \mathbb{N}}$ based on $u$ is obtained by the following algorithm:

(1) Read step by step the letters of $u$.
(2) When you read 0 for the $i$-th time, copy the $i$-th symbol from $X$ to the end of the constructed sequence $Z$.
(3) When you read 1 for the $i$-th time, copy the $i$-th symbol from $Y$ to the end of the constructed sequence $Z$.

This construction can be generalized for non-binary alphabets: using infinite words over a multiliteral alphabet, one can combine more than two PRNGs. Note that, following the terminology of [3], the sequence $Z$ is obtained as a shuffle of the sequences $X$ and $Y$ with the steering word $u$.

In order to distinguish between generators and infinite words used for their combination, we always denote generators with capital letters $X, Y, Z, \ldots$ and words with lower-case letters $u, v, w$. The same convention is applied for their outputs: $A, B, \ldots$ for output values of generators (elements of $M$), and $a, b, \ldots$ for letters of words. Finite sequences of successive elements $\bar{x} = (X_i, X_{i+1}, \ldots, X_{i+t-1})$ of a PRNG $X$ are called $t$-tuples, or vectors, while in the case of an infinite word $u$, we call $u_i u_{i+1} \ldots u_{i+t-1}$ a factor of length $t$.

2.3. The WELLDOC Property and Absence of the Lattice Structure.

Guimond et al. [15] have shown that PRNGs based on infinite words coding a certain class of cut-and-project sets have no lattice structure. In the sequel, we will generalize their result and find larger classes of words guaranteeing no lattice structure for associated generators. We focus on the binary alphabet because the proofs become more technical in the non-binary case, although everything works for multiliteral words as well (and therefore for combinations of more than two generators).
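The combination rule of Section 2.2 is short enough to sketch in code. The Python below is a minimal illustration under stated assumptions: the name `combine` is hypothetical, the letters of the steering word are assumed to be the digit characters `0`, `1`, ..., and the generators are passed as ordinary iterables with labeled toy outputs rather than actual LCGs.

```python
def combine(steering_word, generators):
    """Shuffle several generators according to a steering word.

    Reading the letter k for the i-th time emits the i-th output of generators[k].
    """
    sources = [iter(g) for g in generators]
    for letter in steering_word:
        yield next(sources[int(letter)])


# Toy "generators" with labeled outputs, chosen only to make the shuffle visible.
X = ["X1", "X2", "X3", "X4", "X5"]
Y = ["Y1", "Y2", "Y3"]
fibonacci_prefix = "01001010"  # first letters of the Fibonacci word

Z = list(combine(fibonacci_prefix, [X, Y]))
print(Z)
```

Because `combine` accepts any number of iterables, the same sketch covers the multiliteral case: a steering word over $\{0, 1, 2\}$ would shuffle three generators.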

Theorem 2.4.

Let $Z$ be the PRNG based on a binary infinite word $u$ with the WELLDOC property. Then $Z$ has no lattice structure.

Proof. According to Proposition 1.1, it suffices to check that its assumptions are met. Let $A, B \in M$ and $\ell \in \mathbb{N}$. Assume $A = X_i$ and $B = Y_j$, where $X = (X_n)_{n \in \mathbb{N}}$ and $Y = (Y_n)_{n \in \mathbb{N}}$ are the two combined PRNGs with the same output $M \subset \mathbb{N}$ and the same period $m \in \mathbb{N}$. Consider a right special factor $w$ of $u$ of length $\ell$, i.e., both words $w0$ and $w1$ are factors of $u$ (such a factor $w$ exists since $u$ is an aperiodic word because of the WELLDOC property). By Definition 2.1, it is possible to find an occurrence $i_k$ of $w0$ in $u$ such that

$$|\mathrm{Pref}_{i_k} u|_0 = i - |w|_0 - 1 \bmod m, \qquad |\mathrm{Pref}_{i_k} u|_1 = j - |w|_1 - 1 \bmod m.$$

Reading the word $w$ at the occurrence $i_k$, the corresponding $\ell$-tuple $(A_1, A_2, \ldots, A_\ell)$ of the generator $Z$ consists of the symbols

$$X_{(i - |w|_0) \bmod m}, \ldots, X_{(i-1) \bmod m} \quad \text{and} \quad Y_{(j - |w|_1) \bmod m}, \ldots, Y_{(j-1) \bmod m}.$$

When reading 0 after $w$, the symbol $X_i = A$ from the first generator follows $(A_1, A_2, \ldots, A_\ell)$.

Again, by Definition 2.1, there exists an occurrence $i_s$ of $w1$ in $u$ such that

$$|\mathrm{Pref}_{i_s} u|_0 = i - |w|_0 - 1 \bmod m, \qquad |\mathrm{Pref}_{i_s} u|_1 = j - |w|_1 - 1 \bmod m.$$

When reading the word $w$ at the occurrence $i_s$, the same $\ell$-tuple $(A_1, A_2, \ldots, A_\ell)$ of $Z$ as previously occurs. This time, however, $(A_1, A_2, \ldots, A_\ell)$ is followed by $B$ because we read $w1$ and $Y_j = B$. Thus, we have found an $\ell$-tuple $(A_1, A_2, \ldots, A_\ell)$ of $Z$ followed in $Z$ by both $A$ and $B$. □

Remark 2.5. The WELLDOC property is sufficient, but not necessary for absence of the lattice structure. For example, consider a modified Fibonacci word $\hat{u}$ where the letter 2 is inserted after each letter, i.e., $\hat{u} = 0212020212021202\ldots$. It is easy to verify that $\hat{u}$ does not have well distributed occurrences. However, we will show the following: Let $Z$ be the PRNG combining three generators $X = (X_n)_{n \in \mathbb{N}}$, $Y = (Y_n)_{n \in \mathbb{N}}$ and $V = (V_n)_{n \in \mathbb{N}}$ with the same output $M \subset \mathbb{N}$ and the same period $m \in \mathbb{N}$ according to the modified Fibonacci word $\hat{u}$. Then $Z$ has no lattice structure.

It suffices to verify the assumptions of Proposition 1.1. Let $A, B \in M$ and $\ell \in \mathbb{N}$, $\ell$ an even number (the proof is analogous for odd $\ell$). Assume $A = X_i$ and $B = Y_j$. Consider a right special factor $w$ of the Fibonacci word $u$ of length $\ell/2$. Since $u$ has the WELLDOC property, there exists an occurrence $i_k$ of $w0$ in $u$ such that

$$|\mathrm{Pref}_{i_k} u|_0 = i - |w|_0 - 1 \bmod m, \qquad |\mathrm{Pref}_{i_k} u|_1 = j - |w|_1 - 1 \bmod m.$$

Then if we insert the letter 2 after each letter of $w$, we obtain a right special factor $\hat{w}$ of the modified Fibonacci word $\hat{u}$ of length $\ell$. It holds then that

$$|\mathrm{Pref}_{2i_k} \hat{u}|_0 = i - |w|_0 - 1 \bmod m = i - |\hat{w}|_0 - 1 \bmod m,$$
$$|\mathrm{Pref}_{2i_k} \hat{u}|_1 = j - |w|_1 - 1 \bmod m = j - |\hat{w}|_1 - 1 \bmod m,$$
$$|\mathrm{Pref}_{2i_k} \hat{u}|_2 = i - |w|_0 - 1 + j - |w|_1 - 1 \bmod m = i + j - |\hat{w}|_2 - 2 \bmod m.$$

When reading the word $\hat{w}$ at the occurrence $2i_k$, the corresponding $\ell$-tuple $(A_1, A_2, \ldots, A_\ell)$ of the generator $Z$ is followed by the symbol $X_i = A$ from the first generator.

Again, by the WELLDOC property of $u$, there exists an occurrence $i_s$ of $w1$ in $u$ such that

$$|\mathrm{Pref}_{i_s} u|_0 = i - |w|_0 - 1 \bmod m, \qquad |\mathrm{Pref}_{i_s} u|_1 = j - |w|_1 - 1 \bmod m.$$

It holds then that

$$|\mathrm{Pref}_{2i_s} \hat{u}|_0 = i - |w|_0 - 1 \bmod m = i - |\hat{w}|_0 - 1 \bmod m,$$
$$|\mathrm{Pref}_{2i_s} \hat{u}|_1 = j - |w|_1 - 1 \bmod m = j - |\hat{w}|_1 - 1 \bmod m,$$
$$|\mathrm{Pref}_{2i_s} \hat{u}|_2 = i - |w|_0 - 1 + j - |w|_1 - 1 \bmod m = i + j - |\hat{w}|_2 - 2 \bmod m.$$

When reading the word $\hat{w}$ at the occurrence $2i_s$, the same $\ell$-tuple $(A_1, A_2, \ldots, A_\ell)$ of $Z$ as previously occurs. This time, however, $(A_1, A_2, \ldots, A_\ell)$ is followed by $B$ because we read $\hat{w}1$ and $Y_j = B$. Thus, we have found an $\ell$-tuple $(A_1, A_2, \ldots, A_\ell)$ of $Z$ followed in $Z$ by both $A$ and $B$. Therefore $Z$ has no lattice structure.

Remark 2.6. In the proof of Theorem 2.4, the modulus $m$ from the WELLDOC property is set to be equal to the period of the combined generators. Therefore, if we require absence of the lattice structure for a PRNG obtained when combining PRNGs with a fixed period $\hat{m}$, then it is sufficient to use an infinite word $u$ that satisfies the WELLDOC property for the modulus $m = \hat{m}$. This means for instance that the Thue-Morse word is not completely out of the game, but it cannot be used to combine periodic PRNGs with the period being a power of 2.

We have formulated a combinatorial condition, well distributed occurrences, guaranteeing no lattice structure of the associated generator. It is now important to find classes of words satisfying such a condition.

3. Sturmian Words

In this section we show that Sturmian words have well distributed occurrences.
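A finite computation cannot prove the WELLDOC property, but it can illustrate Definition 2.1 on prefixes. The following Python sketch is only a sanity check under stated assumptions: the function names are hypothetical, and the prefix length 1000 and the choices of $w$ and $m$ are arbitrary. It collects the Parikh vectors modulo $m$ of the prefixes preceding the occurrences of $w$, for the Fibonacci word and for the Thue-Morse word of Example 2.2.

```python
from itertools import product


def fixed_point_prefix(morphism, start, length):
    """Prefix of the fixed point of a morphism prolongable on `start`."""
    word = start
    while len(word) < length:
        word = "".join(morphism[c] for c in word)
    return word[:length]


def parikh_residues(u, w, m, alphabet="01"):
    """Parikh vectors mod m of the prefixes of u preceding each occurrence of w."""
    counts = [0] * len(alphabet)  # Parikh vector of u[:i]
    seen = set()
    for i in range(len(u) - len(w) + 1):
        if u.startswith(w, i):
            seen.add(tuple(c % m for c in counts))
        counts[alphabet.index(u[i])] += 1
    return seen


def covers_all_residues(u, w, m, alphabet="01"):
    """True if the finite word u already exhibits every vector of Z_m^d for w."""
    everything = set(product(range(m), repeat=len(alphabet)))
    return parikh_residues(u, w, m, alphabet) == everything


fib = fixed_point_prefix({"0": "01", "1": "0"}, "0", 1000)
tm = fixed_point_prefix({"0": "01", "1": "10"}, "0", 1000)

print(covers_all_residues(fib, "01", 2))
print(sorted(parikh_residues(tm, "00", 2)))
```

For the Thue-Morse prefix, every residue found for $w = 00$ and $m = 2$ has odd coordinate sum, as Example 2.2 predicts, so $(0, 0)$ is never reached.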

Definition 3.1.

An aperiodic infinite word $u$ is called Sturmian if its factor complexity satisfies $C_u(n) = n + 1$ for all $n \in \mathbb{N}$.

So, Sturmian words are by definition binary and they have the lowest possible factor complexity among aperiodic infinite words. Sturmian words admit various types of characterizations of geometric and combinatorial nature. One such characterization is via irrational rotations on the unit circle. In [19] Hedlund and Morse showed that each Sturmian word may be realized measure-theoretically by an irrational rotation on the circle. That is, every Sturmian word is obtained by coding the symbolic orbit of a point on the circle of circumference one under a rotation $R_\alpha$ by an irrational angle $\alpha$, $0 < \alpha < 1$, where the circle is partitioned into two complementary intervals, one of length $\alpha$ and the other of length $1 - \alpha$. Conversely, each such coding gives rise to a Sturmian word.
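The rotation coding can be tried out numerically. The Python sketch below is illustrative only: the function names are hypothetical, the letter 1 is assumed to label the interval of length $\alpha$ placed at $[1 - \alpha, 1)$ (one possible labeling), and floating-point arithmetic stands in for exact real arithmetic, which is adequate for short prefixes but is not a proof. With $\alpha = \rho = (3 - \sqrt{5})/2$, the first letters agree with the Fibonacci word.

```python
import math


def rotation_word(alpha, rho, length):
    """Code the orbit of rho under rotation by alpha on the unit circle.

    Letter 1 is emitted when the point lies in [1 - alpha, 1), letter 0 otherwise.
    """
    letters = []
    for n in range(length):
        x = (rho + n * alpha) % 1.0
        letters.append("1" if x >= 1.0 - alpha else "0")
    return "".join(letters)


def factor_complexity(u, n):
    """Number of distinct factors of length n in the finite word u."""
    return len({u[i:i + n] for i in range(len(u) - n + 1)})


alpha = (3 - math.sqrt(5)) / 2          # irrational angle, about 0.382
u = rotation_word(alpha, alpha, 2000)   # rho = alpha; length 2000 is arbitrary

print(u[:32])
print([factor_complexity(u, n) for n in range(1, 11)])
```

On this prefix the number of factors of length $n$ equals $n + 1$ for $n = 1, \ldots, 10$, in line with Definition 3.1.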

Definition 3.2.

The rotation by angle $\alpha$ is the mapping $R_\alpha$ from $[0, 1)$ (identified with the unit circle) to itself defined by $R_\alpha(x) = \{x + \alpha\}$, where $\{x\} = x - \lfloor x \rfloor$ is the fractional part of $x$. Considering a partition of $[0, 1)$ into $I_0 = [0, 1 - \alpha)$, $I_1 = [1 - \alpha, 1)$, define a word

$$s_{\alpha,\rho}(n) = \begin{cases} 0 & \text{if } R_\alpha^n(\rho) = \{\rho + n\alpha\} \in I_0, \\ 1 & \text{if } R_\alpha^n(\rho) = \{\rho + n\alpha\} \in I_1. \end{cases}$$

One can also define $I'_0 = (0, 1 - \alpha]$, $I'_1 = (1 - \alpha, 1]$ and the corresponding word $s'_{\alpha,\rho}$.

Note that some but not all Sturmian words are morphic. In fact, it is known that a characteristic Sturmian word (i.e., $\rho = \alpha$) is morphic if and only if the continued fraction expansion of $\alpha$ is periodic. For more information on Sturmian words we refer to [16, Chapter 2].

Theorem 3.3.

Let $u$ be a Sturmian word. Then $u$ has the WELLDOC property.

Proof. In the proof we use the definition of a Sturmian word via rotation. The main idea is to control the number of 1's modulo $m$ by taking a circle of length $m$, and to control the length by taking the rotation by $m\alpha$.

For the proof we will use an equivalent reformulation of the theorem: Let $u$ be a Sturmian word on $\{0, 1\}$; for any natural number $m$ and any factor $w$ of $u$, let $i_0, i_1, \ldots$ denote the occurrences of $w$ in $u$. Then

$$\left\{ \left( i_j, |\mathrm{Pref}_{i_j} u|_1 \right) \bmod m \;\middle|\; j \in \mathbb{N} \right\} = \mathbb{Z}_m^2.$$

That is, we control the number of 1's and the length instead of the number of 0's.

Since a Sturmian word can be defined via rotations by an irrational angle on a unit circle (the angle being measured by arc length, thus equivalent to $2\pi\alpha$ radians), without loss of generality we may assume that $u = s_{\alpha,\rho}$ for some $0 < \alpha < 1$, $0 \leq \rho < 1$, $\alpha$ irrational (see Definition 3.2). Equivalently, we can consider $m$ copies of the circle connected into one circle of length $m$ with $m$ intervals $I_1^i$ of length $\alpha$ corresponding to 1. The Sturmian word is obtained by rotation by $\alpha$ on this circle of length $m$ (see Fig. 2).

Figure 2. Illustration to the proof of Theorem 3.3: the example for $m = 5$.

Namely, we define the rotation $R_{\alpha,m}$ as the mapping from $[0, m)$ (identified with the circle of length $m$) to itself defined by $R_{\alpha,m}(x) = \{x + \alpha\}_m$, where $\{x\}_m = x - \lfloor x/m \rfloor m$, which for $m = 1$ coincides with the fractional part of $x$. A partition of $[0, m)$ into $2m$ intervals $I_0^i = [i, i + 1 - \alpha)$, $I_1^i = [i + 1 - \alpha, i + 1)$, $i = 0, \ldots, m - 1$, defines the Sturmian word $u = s_{\alpha,\rho}$:

$$s_{\alpha,\rho}(n) = \begin{cases} 0 & \text{if } R_{\alpha,m}^n(\rho) = \{\rho + n\alpha\}_m \in I_0^i \text{ for some } i = 0, \ldots, m - 1, \\ 1 & \text{if } R_{\alpha,m}^n(\rho) = \{\rho + n\alpha\}_m \in I_1^i \text{ for some } i = 0, \ldots, m - 1. \end{cases}$$

It is well known that any factor $w = w_0 \cdots w_{k-1}$ of $u$ corresponds to an interval $I_w$ in $[0, 1)$, so that whenever you start rotating from the interval $I_w$, you obtain $w$. Namely, $x \in I_w$ if and only if $x \in I_{w_0}$, $R_\alpha(x) \in I_{w_1}$, $\ldots$, $R_\alpha^{|w|-1}(x) \in I_{w_{|w|-1}}$.

Similarly, we can define $m$ intervals corresponding to $w$ in $[0, m)$ (the circle of length $m$), so that if $I_w = [x_1, x_2)$, then $I_w^i = [x_1 + i, x_2 + i)$, $i = 0, \ldots, m - 1$.

Fix a factor $w$ of $u$ and take an arbitrary $(j, i) \in \mathbb{Z}_m^2$. Now let us organize $(j, i)$ among the occurrences of $w$, i.e., find $l$ such that $u_l \ldots u_{l+|w|-1} = w$, $l \bmod m = j$ and $|\mathrm{Pref}_l u|_1 \bmod m = i$.

Consider the rotation $R_{m\alpha,m}(x)$ by $m\alpha$ instead of the rotation by $\alpha$, and start $m$-rotating from $j\alpha + \rho$. Formally, $R_{m\alpha,m}(x) = \{x + m\alpha\}_m$, where, as above, $\{x\}_m = x - \lfloor x/m \rfloor m$. This rotation will put us at positions $mk + j$, $k \in \mathbb{N}$, in the Sturmian word: for $a \in \{0, 1\}$ one has $s_{\alpha,\rho}(mk + j) = a$ if $R_{m\alpha,m}^k(j\alpha + \rho) = \{j\alpha + \rho + km\alpha\}_m \in I_a^i$ for some $i = 0, \ldots, m - 1$.

The points in the orbit of a point of the $m$-circle under the $m$-rotation are dense, and hence the rotation comes infinitely often to each interval. So pick $k$ such that $\{j\alpha + mk\alpha + \rho\}_m \in I_w^i \subset [i, i + 1)$ (actually there exist infinitely many such $k$). Then the length $l$ of the corresponding prefix is equal to $km + j$, and the number of 1's in it is $i + mp$, where $p$ is the number of complete circles you made, i.e., $p = \lfloor (j\alpha + mk\alpha + \rho)/m \rfloor$. □

4. Arnoux-Rauzy Words

In this section we show that Arnoux-Rauzy words [1], which are natural extensions of Sturmian words to larger alphabets, also satisfy the WELLDOC property. Note that the proof for Sturmian words cannot be generalized to Arnoux-Rauzy words, because it is based on the geometric interpretation of Sturmian words via rotations, and this interpretation does not extend to Arnoux-Rauzy words.

4.1. Basic Definitions.

The definitions and results we recall in this subsection are well known; they are mostly taken from [1, 9] and generalize the ones given for binary words in [5].

Definition 4.1.

Let $A$ be a finite alphabet. The reversal operator is the operator $\sim : A^* \to A^*$ defined recursively in the following way: $\tilde{\varepsilon} = \varepsilon$, $\widetilde{va} = a\tilde{v}$ for all $v \in A^*$ and $a \in A$. The fixed points of the reversal operator are called palindromes.

Definition 4.2.

Let $v \in A^*$ be a finite word over the alphabet $A$. The right palindromic closure of $v$, denoted by $v^{(+)}$, is the shortest palindrome that has $v$ as a prefix. It is readily verified that if $p$ is the longest palindromic suffix of $v = wp$, then $v^{(+)} = wp\tilde{w}$.

Definition 4.3.

We call the iterated (right) palindromic closure operator the operator $\psi$ recursively defined by the following rules: $\psi(\varepsilon) = \varepsilon$, $\psi(va) = (\psi(v)a)^{(+)}$ for all $v \in A^*$ and $a \in A$. The definition of $\psi$ may be extended to infinite words $u$ over $A$ as $\psi(u) = \lim_n \psi(\mathrm{Pref}_n u)$, i.e., $\psi(u)$ is the infinite word having $\psi(\mathrm{Pref}_n u)$ as its prefix for every $n \in \mathbb{N}$.

Definition 4.4.

Let $\Delta$ be an infinite word on the alphabet $A$ such that every letter occurs infinitely often in $\Delta$. The word $c = \psi(\Delta)$ is then called a characteristic (or standard) Arnoux-Rauzy word and $\Delta$ is called the directive sequence of $c$. An infinite word $u$ is called an Arnoux-Rauzy word if it has the same set of factors as a (unique) characteristic Arnoux-Rauzy word, which is called the characteristic word of $u$. The directive sequence of an Arnoux-Rauzy word is the directive sequence of its characteristic word.

Let us also recall the following well-known characterization (see e.g. [9]):

Theorem 4.5.

Let $u$ be an aperiodic infinite word over the alphabet $A$. Then $u$ is a standard Arnoux-Rauzy word if and only if the following hold:
(1) $\mathrm{Fact}(u)$ is closed under reversal (that is, if $v$ is a factor of $u$, so is $\tilde{v}$).
(2) Every left special factor of $u$ is also a prefix.
(3) If $v$ is a right special factor of $u$, then $va$ is a factor of $u$ for every $a \in A$.

From the preceding theorem, it can be easily verified that the bispecial factors of a standard Arnoux-Rauzy word correspond to its palindromic prefixes (including the empty word), and hence to the iterated palindromic closure of the prefixes of its directive sequence. That is, if $\varepsilon = b_0, b_1, b_2, \dots$ is the sequence, ordered by length, of bispecial factors of the standard Arnoux-Rauzy word $u$, and $\Delta = \Delta_0\Delta_1 \cdots$ is its directive sequence (with $\Delta_i \in A$ for every $i$), we have $b_{i+1} = (b_i\Delta_i)^{(+)}$.

A direct consequence of this, together with the preceding definitions, is the following statement, which will be used in the sequel.

Lemma 4.6.

Let $u$ be a characteristic Arnoux-Rauzy word and let $\Delta$ and $(b_i)_{i \ge 0}$ be defined as above. If $\Delta_i$ does not occur in $b_i$, then $b_{i+1} = b_i\Delta_i b_i$. Otherwise let $j < i$ be the largest integer such that $\Delta_j = \Delta_i$. Then $b_{i+1} = b_i b_j^{-1} b_i$.

4.2. Parikh Vectors and Arnoux-Rauzy Factors.

Where no confusion arises, given an Arnoux-Rauzy word $u$, we will denote by $\varepsilon = b_0, b_1, \dots, b_n, \dots$ the sequence of bispecial factors of $u$ ordered by length and, for any $i \in \mathbb{N}$, by $\bar{b}_i$ the Parikh vector of $b_i$.

Remark 4.7. By the pigeonhole principle, for every $m \in \mathbb{N}$ there exists an integer $N \in \mathbb{N}$ such that, for every $i \ge N$, the set $\{ j > i \mid \bar{b}_j \equiv_m \bar{b}_i \}$ is infinite. Where no confusion arises and with a slight abuse of notation, for fixed $m$ we will always denote by $N$ the smallest such integer.

Lemma 4.8.

Let $u$ be a characteristic Arnoux-Rauzy word and let $m \in \mathbb{N}$. Let $\alpha_1\bar{b}_{j_1} + \cdots + \alpha_k\bar{b}_{j_k} \equiv_m \bar{v} \in \mathbb{Z}_m^d$ be a linear combination of Parikh vectors such that $\sum_{i=1}^k \alpha_i = 0$, with $j_i \ge N$ and $\alpha_i \in \mathbb{Z}$ for all $i \in \{1, \dots, k\}$. Then, for any $\ell \in \mathbb{N}$, there exists a prefix $v$ of $u$ such that the Parikh vector of $v$ is congruent to $\bar{v}$ modulo $m$ and $vb_\ell$ is also a prefix of $u$.

Proof. Without loss of generality, we can assume $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_k$, hence there exists $k'$ such that $\alpha_1 \ge \cdots \ge \alpha_{k'} \ge 0 \ge \alpha_{k'+1} \ge \cdots \ge \alpha_k$. We will prove the result by induction on $\beta = \sum_{j=1}^{k'} \alpha_j$. If $\beta = 0$, we can trivially take $v = \varepsilon$ and the statement holds. Let us assume the statement true for all $0 \le \beta < n$ and let us prove it for $\beta = n$. By the remark preceding this lemma, for every $\ell$ we can choose $i' > j' > \ell$ such that $\bar{b}_{j_1} \equiv_m \bar{b}_{i'}$ and $\bar{b}_{j_k} \equiv_m \bar{b}_{j'}$. Since every bispecial factor is a prefix and a suffix of all the longer ones, in particular $b_{j'}$ is a suffix of $b_{i'}$ and $b_\ell$ is a prefix of $b_{j'}$; this implies that $b_{i'}b_{j'}^{-1}b_\ell$ is actually a prefix of $b_{i'}$. By assumption, the Parikh vector of $b_{i'}b_{j'}^{-1}$ is $\bar{b}_{i'} - \bar{b}_{j'} \equiv_m \bar{b}_{j_1} - \bar{b}_{j_k}$. Since $\alpha_1 \ge 1$ and $\alpha_k \le -1$, the induction hypothesis gives a prefix $w$ of $u$ such that the Parikh vector of $w$ is congruent modulo $m$ to $(\alpha_1 - 1)\bar{b}_{j_1} + \cdots + (\alpha_k + 1)\bar{b}_{j_k}$ and $wb_{i'}$ is a prefix of $u$. Hence $wb_{i'}b_{j'}^{-1}b_\ell$ is also a prefix of $u$ and, by a simple computation, the Parikh vector of $v = wb_{i'}b_{j'}^{-1}$ is congruent modulo $m$ to $\bar{v} = \alpha_1\bar{b}_{j_1} + \cdots + \alpha_k\bar{b}_{j_k}$. □

Definition 4.9.

Let $n \in \mathbb{Z}$. We will say that an integer linear combination of integer vectors is an $n$-combination if the sum of all the coefficients equals $n$.

Lemma 4.10.

Let $u$ be a characteristic Arnoux-Rauzy word and let $n \in \mathbb{N}$. Every $n$-combination of Parikh vectors of bispecial factors can be expressed as an $n$-combination of Parikh vectors of arbitrarily large bispecials. In particular, for every $K, L \in \mathbb{N}$, it is possible to find a finite number of integers $\alpha_1, \dots, \alpha_k$ such that $\bar{b}_K = \alpha_1\bar{b}_{j_1} + \cdots + \alpha_k\bar{b}_{j_k}$ with $j_i > L$ for every $i$ and $\alpha_1 + \cdots + \alpha_k = 1$.

Proof. A direct consequence of Lemma 4.6 is that for every $i$ such that $\Delta_i$ appears in $b_i$, we have $\bar{b}_{i+1} = 2\bar{b}_i - \bar{b}_j$, where $j < i$ is the largest integer such that $\Delta_j = \Delta_i$. Since every letter of $\Delta$ appears infinitely many times by the definition of an Arnoux-Rauzy word, this implies that for every non-negative integer $j$ there exists a positive $k$ such that $\bar{b}_j = 2\bar{b}_{j+k} - \bar{b}_{j+k+1}$; that is, we can substitute each Parikh vector of a bispecial with a 1-combination of Parikh vectors of strictly larger bispecials. Iterating the process, we obtain the statement. □

In the following we will assume the set $A$ to be a finite alphabet of cardinality $d$. For every set $X \subseteq A^*$ of finite words, we will denote by $\mathrm{PV}(X) \subseteq \mathbb{Z}^d$ the set of Parikh vectors of elements of $X$, and for every $m \in \mathbb{N}$ we will denote by $\mathrm{PV}_m(X) \subseteq \mathbb{Z}_m^d$ the set of elements of $\mathrm{PV}(X)$ reduced modulo $m$.

For an infinite word $u$ over $A$ and a factor $v$ of $u$, let $S_v(u)$ denote the set of all prefixes of $u$ followed by an occurrence of $v$. In other words, $S_v(u) = \{ p \in \mathrm{Pref}(u) \mid pv \in \mathrm{Pref}(u) \}$.

Definition 4.11.

For any set of finite words $X \subseteq A^*$, we will say that $u$ has the property $P_X$ (or, for short, that $u$ has $P_X$) if, for every $m \in \mathbb{N}$ and for every $v \in X$, we have $\mathrm{PV}_m(S_v(u)) = \mathbb{Z}_m^d$. That is to say, for every vector $\bar{w} \in \mathbb{Z}_m^d$ there exists a word $w \in S_v(u)$ such that the Parikh vector of $w$ is congruent to $\bar{w}$ modulo $m$.

With this notation, an infinite word $u$ has the WELLDOC property if and only if it has the property $P_{\mathrm{Fact}(u)}$.

Proposition 4.12.

Let $u$ be a characteristic Arnoux-Rauzy word over the $d$-letter alphabet $A$. Then $u$ has the property $P_{\mathrm{Pref}(u)}$.

Proof. Let us fix an arbitrary $m \in \mathbb{N}$. We want to show that, for every $v \in \mathrm{Pref}(u)$, $\mathrm{PV}_m(S_v(u)) = \mathbb{Z}_m^d$. Let then $\bar{v} \in \mathbb{Z}^d$ and let $\ell$ be the smallest number such that $v$ is a prefix of $b_\ell$. Let $i_1 < i_2 < \cdots < i_d$ be such that $\Delta_{i_j}$ does not appear in $b_{i_j}$, where $\Delta$ is the directive word of $u$. Without loss of generality, we can rearrange the letters so that each $\Delta_{i_j}$ is lexicographically smaller than $\Delta_{i_{j+1}}$. With this assumption, if for every $j$ we set $\bar{v}_j = \bar{b}_{i_j+1}$, i.e., the Parikh vector of $b_{i_j+1}$, which by the first part of Lemma 4.6 equals $b_{i_j}\Delta_{i_j}b_{i_j}$, we can find $j-1$ integers $\mu_1, \dots, \mu_{j-1}$ such that $\bar{v}_j = (\mu_1, \mu_2, \dots, \mu_{j-1}, 1, 0, \dots, 0)$. The matrix with columns $\bar{v}_1, \dots, \bar{v}_d$ is thus triangular with ones on the diagonal, so the set $V = \{\bar{v}_1, \dots, \bar{v}_d\}$ generates $\mathbb{Z}^d$; hence there exists an integer $n$ such that $\bar{v}$ can be expressed as an $n$-combination of elements of $V$ (which are Parikh vectors of bispecial factors of $u$). Trivially, then, $\bar{v} = \bar{v} - n\bar{0} = \bar{v} - n\bar{b}_0$; thus, it is possible to express $\bar{v}$ as a 0-combination of Parikh vectors of bispecial factors of $u$, which by Lemma 4.10 can be taken arbitrarily large. By Lemma 4.8, there exists a prefix $p$ of $u$ whose Parikh vector $\bar{p}$ satisfies $\bar{p} \equiv_m \bar{v}$ and such that $pb_\ell$ is a prefix of $u$. Since we picked $\ell$ such that $v$ is a prefix of $b_\ell$, we have $p \in S_v(u)$. From the arbitrariness of $v$, $\bar{v}$ and $m$, we obtain the statement. □

As a corollary of Proposition 4.12, we obtain the main result of this section.
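The objects used in this section are easy to experiment with. The following Python sketch is an illustration only and is not the authors' code; all function names are hypothetical. It implements the right palindromic closure of Definition 4.2 and the operator ψ of Definition 4.3, and then collects, for a finite prefix of a characteristic Arnoux-Rauzy word, the Parikh vectors modulo m of the prefixes that precede occurrences of a given factor. Reaching every vector of Z_m^d on a finite prefix is only numerical evidence consistent with the WELLDOC property; it does not prove anything.

```python
from itertools import product


def palindromic_closure(v: str) -> str:
    """Right palindromic closure v^(+): the shortest palindrome having v as a prefix."""
    for i in range(len(v) + 1):
        p = v[i:]
        if p == p[::-1]:            # first hit is the longest palindromic suffix p, v = w p
            return v + v[:i][::-1]  # w p w~
    raise AssertionError("unreachable: the empty suffix is a palindrome")


def iterated_closure(directive: str) -> str:
    """psi(directive), using psi(empty) = empty and psi(va) = (psi(v) a)^(+)."""
    word = ""
    for letter in directive:
        word = palindromic_closure(word + letter)
    return word


def parikh_residues(u: str, w: str, m: int, alphabet: str) -> set:
    """Parikh vectors mod m of all prefixes of u that are followed by an occurrence of w."""
    index = {a: k for k, a in enumerate(alphabet)}
    counts = [0] * len(alphabet)    # Parikh vector of u[:pos]
    seen = set()
    for pos in range(len(u) - len(w) + 1):
        if u.startswith(w, pos):
            seen.add(tuple(c % m for c in counts))
        counts[index[u[pos]]] += 1
    return seen


def covers_all_residues(u: str, w: str, m: int, alphabet: str) -> bool:
    """True if every vector of Z_m^d is reached within the finite word u."""
    everything = set(product(range(m), repeat=len(alphabet)))
    return parikh_residues(u, w, m, alphabet) == everything


if __name__ == "__main__":
    fib = iterated_closure("01" * 8)      # palindromic prefix of the Fibonacci word
    trib = iterated_closure("012" * 5)    # palindromic prefix of the Tribonacci word
    print(len(fib), covers_all_residues(fib, "0", 2, "01"))
    print(len(trib), covers_all_residues(trib, "01", 2, "012"))
```

The directive sequences (01)^ω and (012)^ω used in the demo are the standard ones for the Fibonacci and Tribonacci words. For larger m or longer factors, a longer prefix may be needed before all residues appear.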

Theorem 4.13.

Let $u$ be an Arnoux-Rauzy word over the $d$-letter alphabet $A$. Then $u$ has the property $P_{\mathrm{Fact}(u)}$, or equivalently, $u$ has the WELLDOC property.

Proof. Let $m$ be a positive integer and let $c$ be the characteristic word of $u$. Let $v$ be a factor of $u$ and let $xvy$ be the shortest bispecial factor containing $v$. By Proposition 4.12, we have $\mathrm{PV}_m(S_{xv}(c)) = \mathbb{Z}_m^d$ and, since the set is finite, we can find a prefix $p$ of $c$ such that $\mathrm{PV}_m(S_{xv}(p)) = \mathbb{Z}_m^d$. Let $w$ be a prefix of $u$ such that $wp$ is a prefix of $u$. If $\bar{x}$ and $\bar{w}$ are the Parikh vectors of $x$ and $w$, respectively, it is easy to see that

$$ \bar{w} + \bar{x} + \mathrm{PV}(S_{xv}(p)) \subseteq \bar{w} + \mathrm{PV}(S_v(p)) \subseteq \mathrm{PV}(S_v(u)). $$

Since we have chosen $p$ such that $\mathrm{PV}_m(S_{xv}(p)) = \mathbb{Z}_m^d$, we obtain $\mathrm{PV}_m(S_v(u)) = \mathbb{Z}_m^d$ and hence, by the arbitrariness of $v$ and $m$, the statement. □

Remark 4.14. We now introduce a simple method of obtaining words satisfying the WELLDOC property. Take a word $u$ with the WELLDOC property over an alphabet $\{0, 1, \dots, d-1\}$, $d > 2$, and apply the morphism $\varphi : d-1 \mapsto 0$, $i \mapsto i$ for $i = 0, \dots, d-2$, i.e., $\varphi$ joins two letters into one. It is straightforward that $\varphi(u)$ has the WELLDOC property. So, taking Arnoux-Rauzy words and joining some letters, we obtain words other than Sturmian and Arnoux-Rauzy words that satisfy the WELLDOC property.

Remark 4.15. We now introduce another class of morphisms preserving the WELLDOC property. Recall that the adjacency matrix $\Phi$ of a morphism $\varphi : A^* \to A^*$, with $A = \{0, 1, \dots, d-1\}$, is defined by $\Phi_{i,j} = |\varphi(j-1)|_{i-1}$ for $1 \le i, j \le d$. By definition, if $\bar{v}$ is the Parikh vector of $v \in A^*$, then $\Phi\bar{v}$ is the Parikh vector of $\varphi(v)$.

Let us show that if $\det\Phi = \pm 1$ and $u$ has the WELLDOC property, then so does $\varphi(u)$. Indeed, let $w$ be any factor of $\varphi(u)$, and suppose $xwy = \varphi(v)$ for some $v \in \mathrm{Fact}(u)$ and $x, y \in A^*$. We then have $S_w(\varphi(u)) \supseteq \varphi(S_v(u))x$, so that, writing $\bar{x}$ for the Parikh vector of $x$, we have for any $m > 0$

$$ \mathrm{PV}_m(S_w(\varphi(u))) \supseteq \Phi \cdot \mathrm{PV}_m(S_v(u)) + \bar{x} \bmod m. $$

Since $u$ has the WELLDOC property, $\mathrm{PV}_m(S_v(u)) = \mathbb{Z}_m^d$. As $\det\Phi = \pm 1$, $\Phi$ is invertible (even modulo $m$), so that $\Phi \cdot \mathbb{Z}_m^d + \bar{x} \bmod m = \mathbb{Z}_m^d$. Hence $\mathrm{PV}_m(S_w(\varphi(u))) = \mathbb{Z}_m^d$, showing that $\varphi(u)$ has the WELLDOC property by the arbitrariness of $w$ and $m$.

5. Statistical Tests of PRNGs

In the previous part, we have explained that PRNGs based on infinite words with well distributed occurrences have no lattice structure. In the sequel we support this claim by empirical statistical tests. We have chosen LCGs as underlying generators precisely because of their known weaknesses. We will show how mixing based on aperiodic infinite words copes with these weaknesses and whether statistical tests show any significant improvement.

5.1. Computer Generation of Morphic Words.

Any real computer is a finite state machine and hence it can generate only finite prefixes of infinite words. From a practical point of view it is important to find algorithms that are efficient both in memory footprint and CPU time. In [20] an efficient algorithm for generating the Fibonacci word was introduced: the prefix of length $n$ is generated in $O(\log n)$ space and $O(n)$ time. We generalize this method to any Sturmian and Arnoux-Rauzy word that is a fixed point of a morphism $\varphi$. The main ingredient is that we consider $\varphi^n$ instead of $\varphi$; we precompute and store in memory $\varphi^n(a)$ for every $a \in A$. The runtime to generate 10^(…) letters of the Fibonacci and the Tribonacci word is summarized in Table 1. We would like to point out the following observations:

(1) There is no need to store the first $n$ letters in memory to generate the $(n+1)$-th letter. Letters are generated on the fly and only the nodes of the traversal tree are kept in memory. The memory consumption needed to generate the first 10^(…) letters is shown in Table 1. The algorithm also supports leap frogging: generation can be started at any position in the word. Consequently, the algorithm can be easily parallelized to produce multiple streams [11].

(2) Using the method from [20] together with our improvement for the generation of Sturmian and Arnoux-Rauzy words, the speed of generation of their prefixes is much higher than the speed of generation of LCG output values. For example, generation of 10^(…) … takes 14.… … letters of a fixed point of a morphism on the same hardware. Thus, using a fixed point to combine LCGs causes only a negligible runtime penalty.

(3) The speed of generation can be further improved by using a larger initial memory footprint and a CPU that can effectively copy such larger chunks of memory (the size of the L1 data cache is a limiting factor). Thus the new method scales nicely and can benefit from future CPUs with larger L1 caches. The only requirement is to precompute $\varphi^n(a)$, $a \in A$, for larger $n$. Our program does this automatically based on the limit on the initial memory consumption provided by the user.

Word                       Fibonacci            Tribonacci
φ morphism rule            115 s / 336 Bytes    107 s / 256 Bytes
φ^n morphism rule          0.41 s / 32 Bytes    0.36 s / 32 Bytes

Table 1. The comparison of time in seconds and memory consumption to hold the traversal tree state needed to generate the first 10^(…) letters of the Fibonacci and the Tribonacci word using the original algorithm [20] (1st line) and the new algorithm (2nd line). The iteration $n$ in the $\varphi^n$ rule was chosen so that the length of $\varphi^n(a)$ does not exceed 4096 bytes for any $a \in A$. The measurement was done on an Intel Core i7-3520M CPU running at 2.90 GHz.

5.2. Testing PRNGs Based on Sturmian and Arnoux-Rauzy Words.

We will present results for PRNGs based on:
• the Fibonacci word (as an example of a Sturmian word), i.e., the fixed point of the morphism $0 \mapsto 01$, $1 \mapsto 0$,
• the modified Fibonacci word (Fibonacci2), with the letter 2 inserted after each letter (see Remark 2.5),
• the Tribonacci word (as the simplest example of a ternary Arnoux-Rauzy word), i.e., the fixed point of $0 \mapsto 01$, $1 \mapsto 02$, $2 \mapsto 0$.

5.2.1. Combining LCGs.

Instead of combining plain LCGs, we apply some modifications before their combination. These modifications are motivated by the known weaknesses of LCGs.

We have chosen LCGs with the period $m$ in the range from $2^{47} - 115$ to $2^{64}$, but we use only their upper 32 bits as the output since the statistical tests require 32-bit sequences as the input. Their output is thus in all cases in $M = \{0, 1, \dots, 2^{32} - 1\}$.

We use two batteries of random tests: TestU01 BigCrush and PractRand. They operate differently. The first one includes 160 statistical tests, many of them tailored to specific classes of PRNGs. It is a reputable battery; its drawback is that it works with a fixed amount of data and discards the least significant bit (for some tests even two bits) of the 32-bit numbers being tested. The second battery consists of three different tests: one targets short-range correlations, one reveals long-range violations, and the last one is a variation on the classical Gap test. Details can be found in [7, 8]. Moreover, the PractRand battery automatically applies various filters to the input data. For our purpose the lowbit filter is interesting: it passes a varying number of the least significant bits to the statistical tests. As we have already mentioned, the lower bits of LCGs with $m = 2^\ell$ have a much shorter period than the LCG itself. Therefore the lowbit filter is useful to check whether this weakness disappears when LCGs are combined according to an infinite word. The PractRand tests are able to treat very long input sequences, up to a few exabytes. To control the runtime we have limited the length of input sequences to 16 TB.

The first column of Table 2 shows the list of tested LCGs. The BigCrush column shows how many tests of the TestU01 BigCrush battery failed. The PractRand column gives the $\log_2$ of the sample data size in bytes for which the results of the PractRand tests started to be “very suspicious” (p-values smaller than 10^(−…)). One LCG did not show any failures in the PractRand tests, which is denoted as > 44; this means that it passed 16 TB of input data and the test was stopped there. The last column provides the time in seconds to generate the first 10^(…) values.

Generator                  Legend     BigCrush   PractRand   Time
LCG(2^47 − 115, …, 0)      L47-115    14         40          281
LCG(2^63 − 25, …, 0)       L63-25     2          >44         277
LCG(2^59, …, 0)            L59        19         27          14.1
LCG(2^63, …, 1)            L63        19         33          14.4
LCG(2^64, …, 1)            L64_28     18         35          14.0
LCG(2^64, …, 1)            L64_32     14         34          14.1
LCG(2^64, …, 1)            L64_39     13         33          14.0

Table 2. List of the used LCGs with parameters LCG(m, a, c). Results in the BigCrush (number of failed tests) and in the PractRand ($\log_2$ of the sample size for which the test started to fail) batteries of statistical tests. Time in seconds to generate the first 10^(…) values.

From Table 2 it can be seen that the LCGs with $m \in \{2^{47} - 115, 2^{63} - 25\}$ have the best statistical properties among the chosen LCGs. At the same time, these LCGs are 20 times slower than the other LCGs used. This is because we have used 128-bit integer arithmetic to compute their internal state and because an explicit modulo operation cannot be avoided. As the CPU used does not have 128-bit integer arithmetic, it has to be implemented in software (in this case via GCC's __int128 type), which is much slower than the 64-bit arithmetic wired on the CPU.

5.2.2. Results in Statistical Tests.

We will present results for the PRNGs based on the Fibonacci, Fibonacci2 and Tribonacci words using different combinations of LCGs from Table 2. This also includes the cases where instances of the same LCG are used. Each instance has its own state. The LCGs were seeded with the value 1. The PRNGs were warmed up by generating 10^(…) values before the statistical tests started. Since the relative frequencies of the letters in the aperiodic words differ a lot (for example, for the Fibonacci word the ratio of zeroes to ones is given by $\tau = \frac{1+\sqrt{5}}{2}$), the warming procedure guarantees that the states of the LCG instances differ even when the same LCGs are used. Even more importantly, the distance between the LCGs grows as new output of the PRNG is generated.

A summary of the results is given in Table 3. The BigCrush column uses the following notation: the first number indicates how many tests from the BigCrush battery have clearly failed, and the optional second number in parentheses denotes how many tests have a suspiciously low p-value in the range from 10^(−…) to 10^(−…). The PractRand column gives the $\log_2$ of the sample data size in bytes for which the results of the PractRand tests started to be “very suspicious” (p-values smaller than 10^(−…)). The maximum sample data size used was 16 TB ≐ $2^{44}$ B. The Time column gives the runtime in seconds to generate the first 10^(…) values.

Word   Group   0         1         2         BigCrush   PractRand   Time
Fib    A       L64_28    L64_28    -         0          41          30.2
               L64_32    L64_28    -         0(1)       41          29.3
               L64_39    L64_28    -         0(2)       41          31
               L64_28    L64_32    -         0          41          30.2
               L64_32    L64_32    -         0          41          30.1
               L64_39    L64_32    -         0          41          30.1
               L64_28    L64_39    -         0          42          30.2
               L64_32    L64_39    -         0          40          30.5
               L64_39    L64_39    -         0          42          30.1
       B       L47-115   L47-115   -         1(1)       >44         302
               L63-25    L63-25    -         0(1)       >44         299
               L59       L59       -         0(1)       34          28.7
               L63       L63       -         0          40          29.8
       C       L63-25    L59       -         0          38          198
               L59       L63-25    -         0(1)       35          134
               L63-25    L64_39    -         0          >44         199
               L64_39    L63-25    -         0          41          135
               L59       L64_39    -         0          35          30.4
               L64_39    L59       -         0          37          31.3
Fib2   A       L64_28    L64_28    L64_28    0          40          28.4
               L64_39    L64_28    L64_28    0(2)       40          27.9
               L64_39    L64_32    L64_28    0          39          27.5
               L64_28    L64_39    L64_28    0          40          27.3
               L64_32    L64_39    L64_28    0          40          27.5
               L64_39    L64_39    L64_28    0          40          27.4
               L64_39    L64_28    L64_32    0          40          27.3
               L64_28    L64_39    L64_32    0          40          27.9
               L64_28    L64_28    L64_39    0(1)       40          27.4
               L64_32    L64_28    L64_39    0          39          27.7
               L64_39    L64_28    L64_39    0          40          27.3
               L64_28    L64_32    L64_39    0          40          27.3
               L64_28    L64_39    L64_39    0          40          27.3
               L64_39    L64_39    L64_39    0          40          27.4
       B       L47-115   L47-115   L47-115   0(2)       >44         297.0
               L63-25    L63-25    L63-25    0(2)       >44         293.0
               L59       L59       L59       0(1)       32          27.4
               L63       L63       L63       0          38          27.3
       C       L63-25    L59       L64_39    0(1)       39          113.0
               L63-25    L64_39    L59       0          32          113.0
               L59       L63-25    L64_39    0          38          81.1
               L59       L64_39    L63-25    0          39          158.3
               L64_39    L63-25    L59       0          31          81.0
               L64_39    L59       L63-25    0          42          159.0
Trib   A       L64_28    L64_28    L64_28    0(2)       42          27.2
               L64_39    L64_28    L64_28    0          43          27.1
               L64_39    L64_32    L64_28    0(1)       42          28.0
               L64_28    L64_39    L64_28    0(1)       42          28.1
               L64_32    L64_39    L64_28    0          42          27.1
               L64_39    L64_39    L64_28    0(1)       42          27.2
               L64_39    L64_28    L64_32    0          43          27.1
               L64_28    L64_39    L64_32    0(1)       42          27.1
               L64_28    L64_28    L64_39    0          42          28.0
               L64_32    L64_28    L64_39    0          42          27.2
               L64_39    L64_28    L64_39    0(1)       43          27.1
               L64_28    L64_32    L64_39    0          43          27.1
               L64_28    L64_39    L64_39    0(2)       42          27.3
               L64_39    L64_39    L64_39    0          43          27.1
       B       L47-115   L47-115   L47-115   1          >44         299.0
               L63-25    L63-25    L63-25    0(1)       >44         298.0
               L59       L59       L59       0          35          27.2
               L63       L63       L63       0(1)       41          27.2
       C       L63-25    L59       L64_39    0(1)       39          172.0
               L63-25    L64_39    L59       0(1)       41          173.0
               L59       L63-25    L64_39    0          35          106.0
               L59       L64_39    L63-25    0          34          70.5
               L64_39    L63-25    L59       0          41          107.0
               L64_39    L59       L63-25    0(1)       40          74.3

Table 3. Summary of results of statistical tests for PRNGs based on the Fibonacci, Fibonacci2 and Tribonacci word and different combinations of LCGs from Table 2.
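To make the kind of generator tested in Table 3 concrete, here is a minimal Python sketch of a word-driven combination of LCGs. It is not the authors' implementation [4]. It assumes the combination rule suggested by the discussion of Table 3: the n-th output is taken from the generator assigned to the n-th letter of the infinite word, and each generator advances only when it is used. The LCG constants are Knuth's MMIX parameters, chosen purely for illustration (they are not claimed to be the multipliers of Table 2), the seeds are arbitrary, and the word is produced by a simple queue-based iteration of the morphism that needs O(n) memory, unlike the O(log n) algorithm of [20]. All function names are hypothetical.

```python
from collections import deque
from itertools import islice
from typing import Dict, Iterable, Iterator

FIBONACCI = {"0": "01", "1": "0"}
TRIBONACCI = {"0": "01", "1": "02", "2": "0"}


def fixed_point(morphism: Dict[str, str], start: str = "0") -> Iterator[str]:
    """Lazily yield the fixed point of a non-erasing morphism prolongable on `start`."""
    image = morphism[start]
    if len(image) < 2 or image[0] != start:
        raise ValueError("morphism must be prolongable on the start letter")
    pending = deque(image[1:])      # emitted letters whose images are not yet emitted
    yield from image
    while True:
        image = morphism[pending.popleft()]
        pending.extend(image)
        yield from image


def lcg(seed: int, a: int = 6364136223846793005, c: int = 1442695040888963407,
        modulus: int = 2**64, drop_bits: int = 32) -> Iterator[int]:
    """LCG x -> (a*x + c) mod modulus, returning only the upper bits of each state."""
    state = seed % modulus
    while True:
        state = (a * state + c) % modulus
        yield state >> drop_bits    # discard the weak low-order bits


def combine(word: Iterable[str], generators: Dict[str, Iterator[int]]) -> Iterator[int]:
    """The letter read from `word` selects which generator supplies the next output."""
    for letter in word:
        yield next(generators[letter])


if __name__ == "__main__":
    # Two instances of the same LCG with different (arbitrary) seeds.
    generators = {"0": lcg(seed=1), "1": lcg(seed=2)}
    stream = combine(fixed_point(FIBONACCI), generators)
    for value in islice(stream, 5):
        print(f"{value:08x}")
```

Passing `TRIBONACCI` together with three generators keyed by "0", "1" and "2" gives the ternary variant. Swapping which generator is assigned to which letter changes how often each one is consulted, which is the effect examined in group C of Table 3.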


We can make the following observations based on the results of the statistical tests:

(1) The quality of LCGs improved substantially when we combined them according to infinite words with the WELLDOC property. This can be seen in the TestU01 BigCrush results. While for the plain LCGs 13 to 19 tests clearly failed (the only exception is the generator L63-25 with two failures, see Table 2), almost all of the BigCrush tests passed for the combined generators. The worst result was one failed BigCrush test for the Tribonacci combination and one for the Fibonacci combination of L47-115 generators. The likely reason is that the generator L47-115 has the shortest period of all tested LCGs.

(2) The results of the PractRand battery confirm the above findings. For instance, in the case of LCGs with modulus $2^{64}$, the test started to find irregularities in the distribution of the least significant bit of the tested PRNG output at around 2 TB sample size. Compare this with the sample size of 8 GB to 32 GB at which the fast plain LCGs started to fail the test. The PractRand battery applies different filters to the input stream, and all failures appeared for the Low1/32 filter, where only the least significant bit of the PRNG output is used. This corresponds to a known weakness of LCGs with power-of-2 modulus: the lower bits of the output have a significantly smaller period than the LCG itself. The quality of the PRNGs can therefore be further improved by combining LCGs that do not show flaws in the least significant bits, or by using, for example, just the 16 upper bits of the LCG output.

(3) The quality of the PRNG is linked to the quality of the underlying LCG. Looking at group B in Table 3, we observe that the PractRand results of the arising PRNGs are closely related to the success of the LCGs from Table 2 in the PractRand tests.

(4) Another interesting observation is that using instances of the same LCG (with only sufficiently distinct seeds) produces results as good as a combination of different LCGs (where the multipliers and shifts are different, but the modulus is the same). It is just important to make sure that the starting states of the LCGs are far enough apart. Refer to group A in Table 3.

(5) The lower-quality LCG dictates the quality of the resulting PRNG. When mixing LCGs of different quality, use the better ones as replacements for the more frequent letters in the aperiodic word. Please refer to group C in Table 3. For example, for the Fibonacci word compare the first two rows in group C: the order of the LCGs is merely swapped, but the difference in the sample size for which PractRand starts to fail is 8×. This is even more significant for the Tribonacci-based generators, where the difference between the worst and the best PractRand results when reordering the underlying LCGs is a factor of 128×.

(6) On the other hand, results from group A in Table 3 demonstrate that when using generators of similar quality (same modulus, similar deficiencies), the order in which generators are used to substitute the letters of the infinite word does not influence the quality of the resulting generator.

(7) We can also see that the modified Fibonacci word (see Remark 2.5) does not produce better results than the Fibonacci word. Clearly, a regular structure of 2's on every other position does not help to produce a better random sequence, even though we now mix three LCGs instead of two as in the case of the Fibonacci word.

(8) Results for the Tribonacci word are better than for the Fibonacci word. (We have observed this fact for all ternary Arnoux-Rauzy words in comparison to Sturmian words.) It seems therefore that mixing three LCGs is better than using just two LCGs, assuming that an infinite word with the WELLDOC property is used for mixing. We naturally expect that the better the LCGs (or even some other modern fast linear PRNGs, e.g. mt19937, or nonlinear PRNGs based on the AES cipher) that we combine according to an infinite word with the WELLDOC property, the better their results in statistical tests will be.

(9) We have also tested LCGs with m = 2^(…) − 1. This revealed that if the underlying generators have poor statistical properties, then the PRNG will not be able to mask it. In particular, one cannot expect that the PRNGs, despite their infinite aperiodic nature, will fix the short-period problem. Once the period of the underlying LCG is exhausted, statistical tests will find irregularities in the output of the PRNG.

In conclusion, we summarize the main results from the user's point of view:
• Using different instances of the same LCG to form a new generator based on an infinite word with the WELLDOC property gives a generator with improved statistical properties.
• The introduced method of generation of morphic words is very fast and supports parallel processing.
• The period of the underlying generators has to be large enough, i.e., much larger than the number of needed values.
• When using different types of underlying LCGs to form a PRNG, close attention has to be paid to the right order of the combined LCGs. The generator with the worst properties should be used to replace the least frequent letter of the aperiodic word. Moreover, the statistical properties of the resulting PRNG are ruled by the deficiencies of the worst generator used.
• We have used the LCGs only for study reasons. Instead of LCGs, modern generators (of the user's choice) could be used as underlying PRNGs to obtain better results. We have done testing with two instances (respectively three for the Tribonacci and other Arnoux-Rauzy words) of the Mersenne twister 19937 as the underlying generator. The newly constructed generator has passed all the empirical tests of randomness we have executed (in contrast to the Mersenne twister 19937 itself, which fails two tests from TestU01's BigCrush battery). For practical usage, Arnoux-Rauzy (AR) words are very appealing since there are infinitely many AR words and we have an implementation in place to create AR words based on user input (which can be thought of as the seed). Thus, we recommend creating new PRNGs based on one's favorite modern PRNGs and a custom AR word.

6. Open problems and future research

Concerning the combinatorial part of our paper, an interesting open question is to find large families of infinite words satisfying the WELLDOC property. For example, which morphic words have the WELLDOC property? Also, it seems meaningful to study a weaker WELLDOC property where, in Definition 2.1, instead of every $m \in \mathbb{N}$ we consider only a particular $m$. For instance, one can search for words satisfying such a modified WELLDOC condition for $m = 2$, $m = 2^\ell$, etc. Another question is how to construct words with the WELLDOC property over larger alphabets using words with this property over smaller alphabets. Regarding statistical tests, it remains to explain why PRNGs based on infinite words with the WELLDOC property succeed in the tests and to compare their results with other comparably fast generators.

Acknowledgements

The first author was supported by the Czech Science Foundation grant GAČR 13-03538S, and thanks L'Oréal Czech Republic for the Fellowship Women in Science. The third author was partially supported by the Italian Ministry of Education (MIUR), under the PRIN 2010–11 project “Automi e Linguaggi Formali: Aspetti Matematici e Applicativi”. The fifth author was supported in part by the Academy of Finland under grant 251371 and by the Russian Foundation of Basic Research (grants 12-01-00089 and 12-01-00448).

References

1. P. Arnoux, G. Rauzy, Représentation géométrique de suites de complexité 2n+1, Bull. Soc. Math. France (1991), 199–215.
2. L. Balková, M. Bucci, A. De Luca, S. Puzynina, Infinite Words with Well Distributed Occurrences. In: J. Karhumäki, A. Lepistö, L. Zamboni (Eds.), Combinatorics on Words, LNCS (2013), 46–57, Springer.
3. E. Charlier, T. Kamae, S. Puzynina, L. Zamboni, Self-shuffling infinite words, in preparation. Preliminary version: Self-shuffling words, ICALP 2013, Part II, LNCS (2013), 113–124, arXiv:1302.3844.
4. J. Hladký, Random number generators based on the aperiodic infinite words, https://github.com/jirka-h/aprng
5. A. de Luca, Sturmian words: structure, combinatorics, and their arithmetics, Theoret. Comput. Sci. (1997), 45–82.
6. Ch. Doty-Humphrey, Practically Random: C++ library of statistical tests for RNGs, https://sourceforge.net/projects/pracrand
7. Ch. Doty-Humphrey, Practically Random: Specific tests in PractRand, http://pracrand.sourceforge.net/Tests engines.txt
8. Ch. Doty-Humphrey, J. Hladký, Practically Random: Discussion of testing results, http://sourceforge.net/p/pracrand/discussion/366935/thread/a2eaad12
9. X. Droubay, J. Justin, G. Pirillo, Episturmian words and some constructions by de Luca and Rauzy, Theoret. Comput. Sci. (2001), 539–553.
10. P. L'Ecuyer, Random number generation. In J. E. Gentle, W. Haerdle, and Y. Mori, editors, Handbook of Computational Statistics, 35–71. Springer-Verlag, Berlin, second edition, 2012.
11. P. L'Ecuyer, B. Oreshkin, and R. Simard, Random numbers for parallel computers: Requirements and methods, ∼lecuyer/myftp/papers/parallel-rng-imacs.pdf
12. P. L'Ecuyer, R. Simard, TestU01: A C library for empirical testing of random number generators, ACM Trans. Math. Softw. (2007).
13. L.-S. Guimond, Jiří Patera, Proving the deterministic period breaking of linear congruential generators using two tile quasicrystals, Math. Comput. (2002), 319–332.
14. L.-S. Guimond, Jan Patera, Jiří Patera, Combining random number generators using cut-and-project sequences, Czechoslovak Journal of Physics (2001), 305–311.
15. L.-S. Guimond, Jan Patera, Jiří Patera, Statistical properties and implementation of aperiodic pseudorandom number generators, Applied Numerical Mathematics (2003), 295–318.
16. M. Lothaire, Algebraic combinatorics on words, Encyclopedia of Mathematics and its Applications 90, Cambridge University Press, 2002.
17. G. Marsaglia, Random numbers fall mainly in the planes, Proc. Natl. Acad. Sci. (1968), 25–28.
18. M. Morse, G. A. Hedlund, Symbolic dynamics, Amer. J. Math. (1938), 815–866.
19. M. Morse, G. A. Hedlund, Symbolic dynamics II: Sturmian trajectories, Amer. J. Math. 62 (1) (1940), 1–42.
20. J. Patera, Generating the Fibonacci chain in O(log n) space and O(n) time, Phys. Part. Nuclei (2002), 118–122.

Department of Mathematics, FNSPE, Czech Technical University in Prague, Trojanova 13, 120 00 Praha 2, Czech Republic

Department of Mathematics, University of Liège, Grande traverse 12 (B37), B-4000 Liège, Belgium

DIETI, Università degli Studi di Napoli Federico II, via Claudio, 21, 80125 Napoli, Italy

Department of Mathematics, FNSPE, Czech Technical University in Prague, Trojanova 13, 120 00 Praha 2, Czech Republic

LIP, ENS Lyon, France, and Sobolev Institute of Mathematics, Russia
Current address: LIP, ENS Lyon, 46 Allée d'Italie, Lyon 69364, France