Efficient constructions of the Prefer-same and Prefer-opposite de Bruijn sequences

Abstract

The greedy Prefer-same de Bruijn sequence construction was first presented by Eldert et al. [AIEE Transactions 77 (1958)]. As a greedy algorithm, it has one major downside: it requires an exponential amount of space to store the length $2^n$ de Bruijn sequence. Though de Bruijn sequences have been heavily studied over the last 60 years, finding an efficient construction for the Prefer-same de Bruijn sequence has remained a tantalizing open problem. In this paper, we unveil the underlying structure of the Prefer-same de Bruijn sequence and solve the open problem by presenting an efficient algorithm to construct it using $O(n)$ time per bit and only $O(n)$ space. Following a similar approach, we also present an efficient algorithm to construct the Prefer-opposite de Bruijn sequence.



Evan Sala

School of Computer Science, University of Guelph, Canada

Joe Sawada

School of Computer Science, University of Guelph, Canada

Abbas Alhakim

Department of Mathematics, American University of Beirut, Lebanon

Greedy algorithms often provide some of the nicest algorithms to exhaustively generate combinatorial objects, especially in terms of the simplicity of their descriptions. An excellent discussion of such algorithms is given by Williams [30] with examples given for a wide range of combinatorial objects including permutations, set partitions, binary trees, and de Bruijn sequences. A downside to greedy constructions is that they generally require exponential space to keep track of which objects have already been visited. Fortunately, the listings produced by most greedy constructions can also be generated efficiently by either an iterative successor-rule approach, or by applying a recursive technique. Such efficient constructions often provide extra underlying insight into both the combinatorial objects and the actual listing of the objects being generated.

A de Bruijn sequence of order $n$ is a sequence of bits that when considered cyclically contains every length $n$ binary string as a substring exactly once; each such sequence has length $2^n$. They have been studied as far back as 1894 with the work by Flye Sainte-Marie [13], receiving more significant attention starting in 1946 with the work of de Bruijn [7]. Since then, many different de Bruijn sequence constructions have been presented in the literature (see surveys in [15] and [20]).
Generally, they fall into one of the following categories: (i) greedy approaches, (ii) iterative successor-rule based approaches, which include linear (and non-linear) feedback shift registers, (iii) string concatenation approaches, and (iv) recursive approaches. Underlying all of these algorithms is the fact that every de Bruijn sequence is in 1-1 correspondence with an Euler cycle in a related de Bruijn graph.

Perhaps the most well-known de Bruijn sequence is the one that is the lexicographically largest. It has the following greedy Prefer-1 construction [26].

Prefer-1 construction
1. Seed with $0^{n-1}$
2. Repeat until no new bit is added: Append 1 if it does not create a duplicate length $n$ substring; otherwise append 0 if it does not create a duplicate length $n$ substring
3. Remove the seed

© Evan Sala, Joe Sawada and Abbas Alhakim; licensed under Creative Commons License CC-BY. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany. arXiv [cs.DM].

For example, applying this construction for $n = 4$ we obtain the string 000 1111011001010000, where the leading 000 is the seed that is removed in the final step. Like all greedy de Bruijn sequence constructions, this algorithm has a major downside: it requires an exponential amount of space to remember which substrings have already been visited. Fortunately, the resulting sequence can also be constructed efficiently by applying an $O(n)$ time per bit successor-rule which requires $O(n)$ space [14]. By applying a necklace concatenation approach, it can even be generated in amortized $O(1)$ time per bit and $O(n)$ space [17].

Two other interesting greedy constructions take into account the last bit generated. They are known as the Prefer-same and Prefer-opposite constructions and their resulting sequences are respectively the lexicographically largest and smallest with respect to a run-length encoding [3]. The Prefer-same construction was first presented by Eldert et al. [10] in 1958 and was revisited with a proof of correctness by Fredricksen [15] in 1982. Recently, the description of the algorithm was simplified [3] as follows:

Prefer-same construction
1. Seed with the length $n-1$ string $\cdots 01010$
2. Append 1
3. Repeat until no new bit is added: Append the same bit as the last if it does not create a duplicate length $n$ substring; otherwise append the opposite bit as the last if it does not create a duplicate length $n$ substring
4. Remove the seed
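The two greedy constructions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' C implementation: the helper name `greedy_sequence` and its parameters are hypothetical, and the set `seen` is exactly the exponential-space bookkeeping that the text identifies as the downside of greedy methods. The sketch assumes $n \ge 2$.

```python
def greedy_sequence(n, seed, first_bits, preference):
    """Generic greedy construction (hypothetical helper, assumes n >= 2).

    seed        -- the length n-1 seed string
    first_bits  -- bits appended unconditionally after the seed
    preference  -- maps the last bit to the candidate bits, most preferred first
    """
    seq = seed + first_bits
    # Every length-n substring visited so far (exponential space).
    seen = {seq[i:i + n] for i in range(len(seq) - n + 1)}
    while True:
        suffix = seq[-(n - 1):]
        for bit in preference(seq[-1]):
            if suffix + bit not in seen:
                seen.add(suffix + bit)
                seq += bit
                break
        else:
            # No bit could be appended: remove the seed and stop.
            return seq[len(seed):]


def prefer_one(n):
    # Seed 0^(n-1); always try 1 before 0.
    return greedy_sequence(n, "0" * (n - 1), "", lambda last: "10")


def prefer_same(n):
    # Seed is the alternating string ...01010 of length n-1, followed by a 1.
    seed = ("10" * n)[-(n - 1):]
    flip = {"0": "1", "1": "0"}
    return greedy_sequence(n, seed, "1", lambda last: last + flip[last])


print(prefer_one(4))   # 1111011001010000
print(prefer_same(4))  # 1111000011010010
```

Both functions take time and space proportional to $2^n$, which is what the successor rules developed later in the paper avoid.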

For $n = 4$, the sequence generated by this Prefer-same construction is 010 1111000011010010, where the leading 010 is the seed that is removed. It has run-length encoding 44211211, which is the lexicographically largest amongst all de Bruijn sequences for $n = 4$.

The Prefer-opposite construction is not greedy in the strictest sense since there is a special case when the current suffix is $1^{n-1}$. Details about this special case are provided in the next section. The construction presented below produces a shift of the sequence produced by the original presentation in [1]. Here, the initial seed of $0^{n-1}$ is rotated to the end so the resulting sequence is the lexicographically smallest with respect to a run-length encoding.

Prefer-opposite construction
1. Seed with $0^{n-1}$
2. Append 0
3. Repeat until no new bit is added:
   - If the current suffix is $1^{n-1}$: append 1 if it is the first time $1^{n-1}$ has been seen; otherwise append 0
   - Otherwise: append the opposite bit as the last if it does not create a duplicate length $n$ substring; otherwise append the same bit as the last
4. Remove the seed
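The Prefer-opposite steps, including the special case for the suffix $1^{n-1}$, can be sketched as follows. This is a hedged illustration only: the function name `prefer_opposite` and the counter `ones_visits` are hypothetical, the sketch assumes $n \ge 2$, and it interprets "append the same bit" as also being subject to the no-duplicate test, since the loop must stop once no new bit can be added.

```python
def prefer_opposite(n):
    """Greedy Prefer-opposite construction as described above (assumes n >= 2)."""
    flip = {"0": "1", "1": "0"}
    seed = "0" * (n - 1)
    ones = "1" * (n - 1)
    seq = seed + "0"          # steps 1 and 2: seed, then append 0
    seen = {seq}              # length-n substrings visited so far
    ones_visits = 0           # how many times the suffix 1^(n-1) has occurred
    while True:
        suffix, last = seq[-(n - 1):], seq[-1]
        if suffix == ones:
            # Special case: append 1 the first time, 0 afterwards.
            ones_visits += 1
            candidates = "1" if ones_visits == 1 else "0"
        else:
            # Usual case: prefer the opposite bit, then the same bit.
            candidates = flip[last] + last
        for bit in candidates:
            if suffix + bit not in seen:
                seen.add(suffix + bit)
                seq += bit
                break
        else:
            return seq[len(seed):]   # step 4: remove the seed


print(prefer_opposite(4))  # 0101001101111000
```

For $n = 4$ the output has runs of lengths 1, 1, 1, 1, 2, 2, 1, 4, 3, matching the run-length encoding 111122143 quoted in the text.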

For $n = 4$, the sequence generated by this Prefer-opposite construction is 000 0101001101111000, where the leading 000 is the seed that is removed. The run-length encoding of this sequence is given by 111122143.

To simplify our discussion, let:
- $S_n$ = the de Bruijn sequence of order $n$ generated by the Prefer-same construction, and
- $O_n$ = the de Bruijn sequence of order $n$ generated by the Prefer-opposite construction.

Unlike the Prefer-1 sequence, and despite the vast research on de Bruijn sequences, $S_n$ and $O_n$ have no known efficient construction. For $S_n$, finding an efficient construction has remained an elusive open problem for over 60 years. The closest attempt came in 1977 when Fredricksen and Kessler devised a construction based on lexicographic compositions [16] that we discuss further in Section 8.

The main results of this paper solve these open problems by providing successor-rule based constructions for $S_n$ and $O_n$. They generate the respective sequences in $O(n)$ time per bit using only $O(n)$ space. The discovery of these efficient constructions hinged on the following idea:

Every interesting de Bruijn sequence is the result of joining together smaller cycles induced by simple feedback shift registers.

The initial challenge was to find such a simple underlying feedback function. After careful study, the following function was revealed:

$f(w_1 w_2 \cdots w_n) = w_1 \oplus w_2 \oplus w_n$,

where $\oplus$ denotes addition modulo 2. We demonstrate in Section 4.3 that this feedback function has nice run-length properties when used to partition the set of all binary strings of length $n$. The next challenge was to find appropriate representatives for each cycle induced by $f$ in order to apply the framework from [20] to join the cycles together.

Outline of paper.

Before introducing our main results, we first provide an insight into greedy constructions for de Bruijn sequences that we feel has not been properly emphasized in the recent literature. In particular, we demonstrate how all such constructions, which are generalized by the notion of preference or look-up tables [2, 31], are in fact just special cases of a standard Euler cycle algorithm on the de Bruijn graph. This discussion is found in Section 2, which also outlines a second Euler cycle algorithm underlying the cycle joining approach applied in our main result. In Section 3, we present background on run-length encodings. In Section 4, we discuss feedback functions and de Bruijn successors and introduce the function $f(w_1 w_2 \cdots w_n) = w_1 \oplus w_2 \oplus w_n$ critical to our main results. In Section 5, we present two generic de Bruijn successors based on the framework from [20]. In Section 6 we present our first main result: an efficient successor-rule to generate $S_n$. In Section 7 we present our second main result: an efficient successor-rule to generate $O_n$. In Section 8 we discuss the lexicographic composition algorithm from [16] and a related open problem. In Section 9 we discuss implementation details and analyze the efficiency of our algorithms. In Section 10 and Section 11 we detail the technical aspects required to prove our main results. Implementations of the algorithms presented in this paper, written in C, can be found in the appendices and are available for download at http://debruijnsequence.org.

Applications.

One of the first instances of de Bruijn sequences is found in works of Sanskrit prosody by the ancient mathematician Pingala dating back to the 2nd century BCE. Since then, de Bruijn sequences and their related theory have had a rich history of application. One of their more prominent applications, due to their random-like properties [22], is in the generation of pseudo-random bit sequences which are used in stream ciphers [25]. In particular, linear feedback shift register constructions (that omit the string of all 0s) allow for efficient hardware embeddings which have been classically applied to represent different maps in video games including Pitfall [4]. Another application uses de Bruijn sequences to crack cipher locks in an efficient manner [15]. More recently, the related de Bruijn graph has been applied to genome assembly [6, 27]. Given the vast literature on de Bruijn sequences and their various methods of construction, the more interesting new results may relate to sequences with specific properties. This makes the de Bruijn sequences $S_n$ and $O_n$ of special interest since they are, respectively, the lexicographically largest and smallest sequences with respect to a run-length encoding [3]. Moreover, recently it was noted they have a relatively small discrepancy when compared to the sequences generated by the Prefer-1 construction [19].

The de Bruijn graph of order $n$ is the directed graph $G(n) = (V, E)$ where $V$ is the set of all binary strings of length $n$ and there is a directed edge from $u = u_1 u_2 \cdots u_n$ to $v = v_1 v_2 \cdots v_n$ if $u_2 \cdots u_n = v_1 \cdots v_{n-1}$. Each edge $e$ is labeled by $v_n$. Outputting the edge labels in a Hamilton cycle of $G(n)$ produces a de Bruijn sequence. Figure 1(a) illustrates a Hamilton cycle in the de Bruijn graph $G(3)$. Starting from 000, its corresponding de Bruijn sequence is 10111000.

Figure 1 (a) A Hamilton cycle in $G(3)$ starting from 000 corresponding to the de Bruijn sequence 10111000 of order 3. (b) An Euler cycle in $G(3)$ starting from 000 corresponding to the de Bruijn sequence 0111101011001000 of order 4.

Each de Bruijn graph is connected and the in-degree and the out-degree of each vertex is two; thus the graph $G(n)$ is Eulerian. $G(n)$ is the line graph of $G(n-1)$, which means an Euler cycle in $G(n-1)$ corresponds to a Hamilton cycle in $G(n)$. Thus, the sequence of edge labels visited in an Euler cycle of $G(n-1)$ is a de Bruijn sequence of order $n$. Figure 1(b) illustrates an Euler cycle in $G(3)$. The corresponding de Bruijn sequence of order four when starting from the vertex 000 is 0111101011001000.

Finding an Euler cycle in an Eulerian graph is linear-time solvable with respect to the size of the graph. However, since the graph must be stored, applying such an algorithm to find a de Bruijn sequence requires $O(2^n)$ space. One of the most well-known Euler cycle algorithms for directed graphs is the following due to Fleury [12] with details in [15]. The basic idea is to not burn bridges; in other words, do not visit (and use up) an edge if it leaves the remaining graph disconnected.

Fleury's Euler cycle algorithm (do not burn bridges)
1. Pick a root vertex and compute a spanning in-tree $T$
2. Make each edge of $T$ (the bridges) the last edge on the adjacency list of the corresponding vertex
3. Starting from the root, traverse edges in a depth-first manner by visiting the first unused edge in the current vertex's adjacency list

Finding a spanning in-tree $T$ can be done by reversing the direction of the edges in the Eulerian graph and computing a spanning out-tree with a standard depth-first search on the resulting graph. The corresponding edges in the original graph will be a spanning in-tree. Using this approach, all de Bruijn sequences can be generated by considering all possible spanning in-trees.

Although not well documented, this algorithm is the basis for all greedy de Bruijn sequence constructions along with their generalizations using preference tables [2] or look-up tables [31]. Specifically, a preference table specifies the precise order that the edges are visited for each vertex when performing Step 3 in Fleury's Euler cycle algorithm. Thus given a preference table and a root vertex, Step 3 in the algorithm can be applied to construct a de Bruijn sequence if combining the last edge from each non-root vertex forms a spanning in-tree to the root. For example, the preference tables and corresponding spanning in-trees for the Prefer-1 (rooted at 000), the Prefer-same (rooted at 010), and the Prefer-opposite (rooted at 000) constructions are given in Figure 2 for $G(3)$. For the Prefer-1, the only valid root is 000. For the Prefer-same, either 010 or 101 could be chosen as root. The Prefer-opposite has a small nuance. By a strict greedy definition, the edges will not create a spanning in-tree for any root. But by changing the preference for the single string 111, a spanning in-tree is created when rooted at 000. This accounts for the special case required in the Prefer-opposite algorithm. Notice how these strings relate to the seeds in their respective greedy constructions. For the Prefer-same, a root of 101 could also have been chosen, and doing so will yield the complement of the Prefer-same sequence when applying this Euler cycle algorithm.
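The spanning in-tree condition on preference tables can be checked mechanically for $G(3)$. The sketch below is illustrative only; the function name `forms_in_tree` and the four "last edge" rules are hypothetical encodings of the preference tables described above (each rule returns the least preferred bit at a vertex, so it selects that vertex's last edge).

```python
from itertools import product


def forms_in_tree(n, last_bit, root):
    """True if the last-preference edges of all non-root vertices of G(n)
    form a spanning in-tree to root, i.e. following them always reaches root."""
    for bits in product("01", repeat=n):
        v = "".join(bits)
        visited = set()
        while v != root:
            if v in visited:      # trapped in a cycle that avoids the root
                return False
            visited.add(v)
            v = v[1:] + last_bit(v)
    return True


flip = {"0": "1", "1": "0"}
prefer_one_last = lambda v: "0"                 # 1 is preferred, so 0 is last
prefer_same_last = lambda v: flip[v[-1]]        # opposite bit is last
prefer_opp_strict_last = lambda v: v[-1]        # same bit is last
# Modified table: only the vertex 111 has its preference swapped.
prefer_opp_fixed_last = lambda v: "0" if v == "111" else v[-1]

vertices = ["".join(b) for b in product("01", repeat=3)]
print([r for r in vertices if forms_in_tree(3, prefer_one_last, r)])         # ['000']
print([r for r in vertices if forms_in_tree(3, prefer_same_last, r)])        # ['010', '101']
print([r for r in vertices if forms_in_tree(3, prefer_opp_strict_last, r)])  # []
print(forms_in_tree(3, prefer_opp_fixed_last, "000"))                        # True
```

The strict Prefer-opposite table fails for every root because both 000 and 111 have a self-loop as their last edge, which is the nuance the text points out.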
Figure 2 (a) A preference table corresponding to the Prefer-1 greedy construction along with its corresponding spanning in-tree rooted at 000. (b) A preference table corresponding to the Prefer-same greedy construction along with its corresponding spanning in-tree rooted at 010. (c) A preference table corresponding to the Prefer-opposite greedy construction along with its corresponding spanning in-tree rooted at 000.

A second well-known Euler cycle algorithm for directed graphs, attributed to Hierholzer [23], is as follows:

Hierholzer's Euler cycle algorithm (cycle joining)
1. Start at an arbitrary vertex $v$ visiting edges in a depth-first manner until returning to $v$, creating a cycle.
2. Repeat until all edges are visited: Start from any vertex $u$ on the current cycle and visit remaining edges in a DFS manner until returning to $u$, creating a new cycle. Join the two cycles together.

This cycle-joining approach is the basis for all successor-rule constructions of de Bruijn sequences. A general framework for joining smaller cycles together based on an underlying feedback shift register is given for the binary case in [20], and then more generally for larger alphabets in [21]. It is the basis for the efficient algorithm presented in this paper, where the initial cycles are induced by a specific feedback function.
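To make the Euler cycle connection concrete, the following sketch finds an Euler cycle in $G(n-1)$ and outputs its edge labels. It uses the common iterative, stack-based formulation of Hierholzer's idea, in which the joining of cycles happens implicitly as the stack unwinds; it is not the successor-rule method of this paper. The names `de_bruijn_by_euler_cycle` and `is_de_bruijn` are hypothetical, and the sketch assumes $n \ge 2$. Note that it stores the whole graph state, so it uses $O(2^n)$ space.

```python
def de_bruijn_by_euler_cycle(n):
    """Edge labels of an Euler cycle in G(n-1), found with a stack-based
    variant of Hierholzer's algorithm (assumes n >= 2)."""
    start = "0" * (n - 1)
    unused = {}                      # vertex -> labels of its unused out-edges
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        labels = unused.setdefault(v, ["0", "1"])
        if labels:
            # Follow an unused edge v -> v[1:] + label.
            stack.append(v[1:] + labels.pop())
        else:
            # Dead end: this vertex is finished, emit it.
            circuit.append(stack.pop())
    circuit.reverse()                # vertices of the Euler cycle in order
    # Each edge is labeled by the last bit of the vertex it enters.
    return "".join(v[-1] for v in circuit[1:])


def is_de_bruijn(seq, n):
    """True if every length-n binary string occurs exactly once cyclically."""
    wrapped = seq + seq[:n - 1]
    windows = {wrapped[i:i + n] for i in range(len(seq))}
    return len(seq) == 2 ** n and len(windows) == 2 ** n


print(is_de_bruijn(de_bruijn_by_euler_cycle(4), 4))  # True
```

Which de Bruijn sequence is produced depends on the order in which out-edges are tried; the sketch makes no attempt to reproduce any particular sequence from the text.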

The sequences $S_n$ and $O_n$ both have properties based on a run-length encoding of binary strings. The run-length encoding (RLE) of a string $\omega = w_1 w_2 \cdots w_n$ is a compressed representation that stores consecutively the lengths of the maximal runs of each symbol. The run-length of $\omega$ is the length of its RLE. For example, the string 11000110 has RLE 2321 and run-length 4. Note that 00111001 also has RLE 2321. Since we are dealing with binary strings, we require knowledge of the starting symbol to obtain a given binary string from its RLE. As a further example:

$S_5$ = 11111000001110110011010001001010 has RLE 5531222113121111.

The following facts are proved in [3].

Proposition 1. The sequence $S_n$ is the de Bruijn sequence of order $n$ starting with 1 that has the lexicographically largest RLE.

Proposition 2. The sequence $O_n$ is the de Bruijn sequence of order $n$ starting with 1 that has the lexicographically smallest RLE.

Let alt($n$) denote the alternating sequence of 0s and 1s of length $n$ that ends with 0. For example, alt(6) = 101010. The following facts are also immediate from [3].

Proposition 3. $S_n$ has prefix $1^n$ and has suffix alt($n-1$).

Proposition 4. $O_n$ has length $n$ prefix $0101\cdots$ and has suffix $0^{n-1}$.

The sequence based on lexicographic compositions [16] also has run-length properties: it is constructed by concatenating lexicographic compositions which are represented using an RLE. A brief discussion of this sequence is provided in Section 8.
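The run-length encoding used throughout can be computed with a few lines of Python. This is a small illustrative sketch; the function names `rle`, `from_rle` and `alt` are chosen here for convenience and are not taken from the authors' code.

```python
from itertools import groupby


def rle(bits):
    """Run-length encoding: lengths of the maximal runs, left to right."""
    return [len(list(run)) for _, run in groupby(bits)]


def from_rle(lengths, first_bit):
    """Rebuild the binary string from its RLE and its starting symbol."""
    out, bit = [], first_bit
    for length in lengths:
        out.append(bit * length)
        bit = "1" if bit == "0" else "0"
    return "".join(out)


def alt(n):
    """Alternating string of length n that ends with 0."""
    return ("10" * n)[-n:]


S5 = "11111000001110110011010001001010"
print(rle("11000110"))            # [2, 3, 2, 1]
print(rle("00111001"))            # [2, 3, 2, 1]
print("".join(map(str, rle(S5)))) # 5531222113121111
print(alt(6))                     # 101010
```

Python compares lists lexicographically, so `rle(a) > rle(b)` directly expresses "a has a lexicographically larger RLE than b" for strings of equal length.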

Let $B(n)$ denote the set of all binary strings of length $n$. We call a function $f : B(n) \to \{0, 1\}$ a feedback function. Let $\omega = w_1 w_2 \cdots w_n$ be a string in $B(n)$. A feedback shift register is a function $F : B(n) \to B(n)$ that takes the form $F(\omega) = w_2 w_3 \cdots w_n f(w_1 w_2 \cdots w_n)$ for a given feedback function $f$.

A feedback function $g : B(n) \to \{0, 1\}$ is a de Bruijn successor if there exists a de Bruijn sequence of order $n$ such that each string $\omega \in B(n)$ is followed by $g(\omega)$ in the given de Bruijn sequence. Given a de Bruijn successor $g$ and a seed string $\omega = w_1 w_2 \cdots w_n$, the following function DB($g, \omega$) will return a de Bruijn sequence of order $n$ with suffix $\omega$:

    function DB(g, ω)
        for i ← 1 to 2^n do
            x_i ← g(ω)
            ω ← w_2 w_3 ⋯ w_n x_i
        return x_1 x_2 ⋯ x_{2^n}

A linearized de Bruijn sequence is a linear string that contains every string in $B(n)$ as a substring exactly once. Such a string has length $2^n + n - 1$. Note that the length $n$ suffix of a de Bruijn sequence $D_n$ = DB($g, w_1 \cdots w_n$) is $w_1 \cdots w_n$. Thus, $w_2 \cdots w_n D_n$ is a linearized de Bruijn sequence.

For each of the upcoming feedback functions, selecting appropriate representatives for the cycles they induce is an important step to developing efficient de Bruijn successors for $S_n$ and $O_n$. In particular, consider two representatives for a given cycle based on their RLE.

- RL-rep: The string with the lexicographically largest RLE; if there are two such strings, it is the one beginning with 1.
- RL2-rep: The string with the lexicographically smallest RLE; if there are two such strings, it is the one beginning with 0.

For our upcoming discussion, define the period of a string $\omega = w_1 w_2 \cdots w_n$ to be the smallest integer $p$ such that $\omega = (w_1 \cdots w_p)^j$ for some integer $j$. If $j > 1$ we say that $\omega$ is periodic; otherwise, we say it is aperiodic (or primitive).

The pure cycling register, denoted PCR, is the feedback shift register with the feedback function $f(\omega) = w_1$. Thus, PCR($w_1 w_2 \cdots w_n$) = $w_2 \cdots w_n w_1$. It is well-known that the PCR partitions $B(n)$ into cycles of strings that are equivalent under rotation. The following example illustrates the cycles induced by the PCR for $n = 5$ along with their corresponding RL-reps and RL2-reps.

Example 1

The PCR partitions $B(5)$ into the following eight cycles $P_1, P_2, \ldots, P_8$ where the top string in bold is the RL-rep for the given cycle. The underlined string is the RL2-rep.

[Table of the eight cycles $P_1, P_2, \ldots, P_8$]

The PCR is the underlying feedback function used to construct the Prefer-1 greedy construction corresponding to the lexicographically largest de Bruijn sequence. It has also been applied in some of the simplest and most efficient de Bruijn sequence constructions [8, 20, 29]. In these constructions, the cycle representatives relate to the lexicographically smallest (or largest) strings in each cycle and they can be determined in $O(n)$ time using $O(n)$ space using standard techniques [5, 9]. We also apply these methods to efficiently determine the RL-reps and the RL2-reps.

Clearly $0^n$ and $1^n$ are both RL-reps. Consider a string $\omega = w_1 w_2 \cdots w_n$ in a cycle $P$ with RLE $r_1 r_2 \cdots r_\ell$ where $\ell > 1$. If $\omega$ is an RL-rep, then $w_1 \neq w_n$ because otherwise $w_n w_1 \cdots w_{n-1}$ has a larger RLE than $\omega$. All strings in $P$ that differ in the first and last bits form an equivalence class under rotation with respect to their RLE. By definition, the RL-rep will be one whose RLE is lexicographically largest amongst all its rotations. As noted above, such a test can be performed in $O(n)$ time using $O(n)$ space. There is one special case to consider: when both a string beginning with 0 and its complement beginning with 1 belong to the same cycle. For example, consider 00101101 and 11010010, which both have RLE 211211. Note this RLE has period $p = 3$ and it is maximal amongst its rotations. By definition, the string beginning with 0 is not an RL-rep. It is not difficult to see that such a string occurs precisely when $w_1 = 0$, $p$ is odd, and $p < \ell$, where $p$ is the period of $r_1 r_2 \cdots r_\ell$.

Proposition 5.

Let $\omega = w_1 w_2 \cdots w_n$ be a string with RLE $r_1 r_2 \cdots r_\ell$, where $\ell > 1$, in a cycle $P$ induced by the PCR. Let $p$ be the period of $r_1 r_2 \cdots r_\ell$. Then $\omega$ is the RL-rep for $P$ if and only if
1. $w_1 \neq w_n$,
2. $r_1 r_2 \cdots r_\ell$ is lexicographically largest amongst all its rotations, and
3. either $w_1 = 1$ or $p = \ell$ or $p$ is even.
Moreover, testing whether or not $\omega$ is an RL-rep can be done in $O(n)$ time using $O(n)$ space.

In a similar manner we consider RL2-reps. Again $0^n$ and $1^n$ are both clearly RL2-reps. Consider a string $\omega = w_1 w_2 \cdots w_n$ in a cycle $P$ with run-length greater than one. If $\omega$ is an RL2-rep, then $w_1 \neq w_2$ because otherwise $w_2 \cdots w_n w_1$ has a smaller RLE than $\omega$. Thus, consider all strings $s_1 s_2 \cdots s_n$ in a cycle $P$ such that $s_1 \neq s_2$. One of these strings is the RL2-rep. Now consider all left rotations of these strings taking the form $s_2 \cdots s_n s_1$. Notice that a string in the latter set with the smallest RLE will correspond to the RL2-rep after rotating the string back to the right. As noted in the RL-case, the set of rotated strings form an equivalence class under rotation with respect to their RLE, since their first and last bits differ. Again, the same special case arises as with RL-reps: when both a string beginning with 0 and its complement beginning with 1 belong to the same cycle. For example, consider the cycle containing both 10100101 and 01011010. In each string the first two bits differ. The set of all strings in its cycle where the first two bits differ is {10100101, 01001011, 10010110, 01011010, 10110100, 01101001}. Rotating each string to the left we get the set {01001011, 10010110, 00101101, 10110100, 01101001, 11010010}. The corresponding RLEs for this latter set are {112112, 121121, 211211, 112112, 121121, 211211}. In this case there are two strings 01001011 and 10110100 that both have RLE 112112. Rotating these strings back to the right we have 10100101 and 01011010, which both have the lexicographically smallest RLE of 1112111 in their cycle induced by the PCR. By definition, the string beginning with 0 will be the RL2-rep. Thus $\omega$ is not an RL2-rep if $w_1 = 1$, $p$ is odd, and $p < \ell$, where $p$ is the period of the RLE $r_1 r_2 \cdots r_\ell$ for the string $w_2 \cdots w_n w_1$.

Proposition 6.

Let $\omega = w_1 w_2 \cdots w_n$ be a string in a cycle $P$ induced by the PCR and let $r_1 r_2 \cdots r_\ell$ be the RLE of $w_2 \cdots w_n w_1$, where $\ell > 1$. Let $p$ be the period of $r_1 r_2 \cdots r_\ell$. Then $\omega$ is the RL2-rep for $P$ if and only if
1. $w_1 \neq w_2$,
2. $r_1 r_2 \cdots r_\ell$ is lexicographically smallest amongst all its rotations, and
3. either $w_1 = 0$ or $p = \ell$ or $p$ is even.
Moreover, testing whether or not $\omega$ is an RL2-rep can be done in $O(n)$ time using $O(n)$ space.

The complementing cycling register, denoted CCR, is the FSR with the feedback function $f(\omega) = \overline{w_1}$, where $\overline{w_1}$ denotes the complement of $w_1$. Thus, CCR($w_1 w_2 \cdots w_n$) = $w_2 \cdots w_n \overline{w_1}$. A string and its complement will belong to the same cycle induced by the CCR.

Example 2

The CCR partitions $B(5)$ into the following four cycles $C_1, C_2, C_3, C_4$ where the top string in bold is the RL-rep for the given cycle. The underlined string is the RL2-rep.

[Table of the four cycles $C_1, C_2, C_3, C_4$]

The CCR has been applied to efficiently construct de Bruijn sequences in a variety of ways [11, 20, 24]. An especially efficient construction applies a concatenation scheme to construct a de Bruijn sequence with discrepancy (maximum difference between the number of 0s and 1s in any prefix) bounded above by $n$ [18, 19].

As with the PCR, we discuss how to efficiently determine whether or not a given string is an RL-rep or an RL2-rep for a cycle $C$ induced by the CCR. Consider a string $\omega = w_1 w_2 \cdots w_n$ in a cycle $C$. If $\omega$ is an RL-rep, then $w_1 = w_n$ because otherwise $\overline{w_n} w_1 \cdots w_{n-1}$, which is also in $C$, has a larger RLE than $\omega$. All strings in $C$ that agree in the first and last bits form an equivalence class under rotation with respect to their RLE (that includes strings starting with both 0 and 1 for each RLE). By definition, the RL-rep will be one whose RLE is lexicographically largest amongst all its rotations. As noted in the previous subsection, such a test can be performed in $O(n)$ time using $O(n)$ space. There are no special cases to consider here since a string and its complement always belong to the same cycle. Thus, every RL-rep must begin with 1.

Proposition 7.

Let $\omega = w_1 w_2 \cdots w_n$ be a string with RLE $r_1 r_2 \cdots r_\ell$ in a cycle $C$ induced by the CCR. Then $\omega$ is the RL-rep for $C$ if and only if
1. $w_1 = w_n = 1$ and
2. $r_1 r_2 \cdots r_\ell$ is lexicographically largest amongst all its rotations.
Moreover, testing whether or not $\omega$ is an RL-rep can be done in $O(n)$ time using $O(n)$ space.

In a similar manner we consider RL2-reps. Again, consider a string $\omega = w_1 w_2 \cdots w_n$ in a cycle $C$. If $\omega$ is an RL2-rep, then $w_1 \neq w_2$ because otherwise $w_2 \cdots w_n \overline{w_1}$ has a smaller RLE than $\omega$. Consider all such strings $w_2 \cdots w_n \overline{w_1}$ in a cycle $C$ such that $w_1 \neq w_2$. As noted in the RL-case, all such strings form an equivalence class under rotation with respect to their RLE. Clearly, such a string that has the lexicographically smallest RLE will be the RL2-rep. There are no special cases to consider here since a string and its complement always belong to the same cycle. Thus, every RL2-rep must begin with 0 and hence $w_2 = 1$.

Proposition 8.

Let $\omega = w_1 w_2 \cdots w_n$ be a string with RLE $r_1 r_2 \cdots r_\ell$ in a cycle $C$ induced by the CCR. Then $\omega$ is the RL2-rep for $C$ if and only if
1. $w_1 = 0$,
2. $w_2 = 1$, and
3. $r_1 r_2 \cdots r_\ell$ is lexicographically smallest amongst all its rotations.
Moreover, testing whether or not $\omega$ is an RL2-rep can be done in $O(n)$ time using $O(n)$ space.

The feedback function of particular focus in this paper is $f(\omega) = w_1 \oplus w_2 \oplus w_n$. We will demonstrate that the FSR based on this feedback function partitions $B(n)$ into cycles of strings with the same run-length. Because of this property, we call this FSR the pure run-length register and denote it by PRR. Thus,

PRR($w_1 w_2 \cdots w_n$) = $w_2 \cdots w_n (w_1 \oplus w_2 \oplus w_n)$.

This follows the naming of the pure cycling register (PCR) and the pure summing register (PSR), which is based on the feedback function $f(\omega) = w_1 \oplus w_2 \oplus \cdots \oplus w_n$ [22].

Let $R_1, R_2, \ldots, R_t$ denote the cycles induced by the PRR on $B(n)$. The following example illustrates how the cycles induced by the PRR relate to the cycles induced by the PCR and CCR.

Example 3

The PRR partitions $B(6)$ into the following 12 cycles $R_1, R_2, \ldots, R_{12}$ where the top string in bold is the RL-rep for the given cycle. The underlined string is the RL2-rep. The cycles are ordered in non-increasing order with respect to the run-lengths of their RL-reps.

[Table of the 12 cycles $R_1, R_2, \ldots, R_{12}$]

By omitting the last bit of each string, the columns are precisely the cycles of the PCR and CCR for $n = 5$. The cycles $R_1, R_4, R_5, R_{10}$ relating to the CCR start and end with different bits. The remaining cycles relate to the PCR; each string in these cycles starts and ends with the same bit.

In the example above, note that all the strings in a given cycle $R_i$ have the same run-length.

Lemma 9. All the strings in a given cycle $R_i$ have the same run-length.

Proof. Consider a string $\omega = w_1 w_2 \cdots w_n$ and the feedback function $f(\omega) = w_1 \oplus w_2 \oplus w_n$. It suffices to show that $w_2 \cdots w_n f(\omega)$ has the same run-length as $\omega$. This is easily observed since if $w_1 = w_2$ then $w_n = f(\omega)$ and if $w_1 \neq w_2$ then $w_n \neq f(\omega)$. In the first case no run is lost at the front and none is created at the end; in the second case the run lost at the front is replaced by a new run at the end. □

Based on this lemma, if the strings in $R_i$ have run-length $\ell$, we say that $R_i$ has run-length $\ell$. Each cycle $R_i$ has another interesting property: either all the strings start and end with the same bit, or all the strings start and end with different bits. If the strings start and end with the same bit, then $R_i$ must have odd run-length and if we remove the last bit of each string we obtain a cycle induced by the PCR of order $n-1$. In this case we say that $R_i$ is a PCR-related cycle. If the strings start and end with different bits, then $R_i$ must have even run-length and if we remove the last bit of each string we obtain a cycle induced by the CCR of order $n-1$. In this case we say that $R_i$ is a CCR-related cycle. These observations were first made in [28] and are illustrated in Example 3. Based on these observations, we can apply the RL-rep and RL2-rep testers for cycles induced by the PCR and CCR to determine whether or not a string $\omega$ is an RL-rep or an RL2-rep for a cycle $R_i$. These testers will be critical to the efficiency of our upcoming de Bruijn successors.

Proposition 10.

Let $\omega = w_1 w_2 \cdots w_n$ be a string in a cycle $R$ induced by the PRR. Then $\omega$ is the RL-rep for $R$ if and only if
1. $w_1 = w_n$ and $w_1 w_2 \cdots w_{n-1}$ is an RL-rep with respect to the PCR, or
2. $w_1 \neq w_n$ and $w_1 w_2 \cdots w_{n-1}$ is an RL-rep with respect to the CCR.
Moreover, testing whether or not $\omega$ is an RL-rep for $R$ can be done in $O(n)$ time using $O(n)$ space.

Proposition 11.

Let $\omega = w_1 w_2 \cdots w_n$ be a string in a cycle $R$ induced by the PRR. Then $\omega$ is the RL2-rep for $R$ if and only if
1. $w_1 = w_n$ and $w_1 w_2 \cdots w_{n-1}$ is an RL2-rep with respect to the PCR, or
2. $w_1 \neq w_n$ and $w_1 w_2 \cdots w_{n-1}$ is an RL2-rep with respect to the CCR.
Moreover, testing whether or not $\omega$ is an RL2-rep for $R$ can be done in $O(n)$ time using $O(n)$ space.

In this section we provide two generic de Bruijn successors that are applied to derive specific de Bruijn successors for $S_n$ and $O_n$ in the subsequent sections. The results relate specifically to the PRR and we assume that $R_1, R_2, \ldots, R_t$ denote the cycles induced by the PRR on $B(n)$.

Let $\omega = w_1 w_2 \cdots w_n$ be a binary string. Define the conjugate of $\omega$ to be $\hat{\omega} = \overline{w_1} w_2 \cdots w_n$. Similar to Hierholzer's cycle-joining approach discussed in Section 2, Theorem 3.5 from [20] can be applied to systematically join together the ordered cycles $R_1, R_2, \ldots, R_t$ given certain representatives $\alpha_i$ for each $R_i$. This theorem is restated as follows when applied to the PRR and the function $f(\omega) = w_1 \oplus w_2 \oplus w_n$.

Theorem 12.

For each $1 < i \le t$, if the conjugate $\hat{\alpha}_i$ of the representative $\alpha_i$ for cycle $R_i$ belongs to some $R_j$ where $j < i$, then

$$g(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \{\alpha_2, \alpha_3, \ldots, \alpha_t\}; \\ f(\omega) & \text{otherwise} \end{cases}$$

is a de Bruijn successor.

Together, the ordering of the cycles and the sequence $\alpha_2, \alpha_3, \ldots, \alpha_t$ correspond to a rooted tree, where the nodes are the cycles $R_1, R_2, \ldots, R_t$ with $R_1$ designated as the root. There is an edge between two nodes $R_i$ and $R_j$, where $i > j$, if and only if $\hat{\alpha}_i$ is in $R_j$. Each edge represents the joining of two cycles, similar to the technique used in Hierholzer's Euler cycle algorithm (see Section 2). An example of such a tree for $n = 6$ is given in the following example.

Example 4

Consider the cycles $R_1, R_2, \ldots, R_{12}$ for $n = 6$ from Example 3 along with their corresponding RL-reps $\alpha_i$ for each $R_i$. For each $i > 1$, $\hat{\alpha}_i$ belongs to some $R_j$ where $j < i$. Thus, we can apply Theorem 12 to obtain a de Bruijn successor $g(\omega)$ based on these representatives. The following tree illustrates the joining of these cycles based on $g$:

[Figure: tree of the twelve cycles $R_1, \ldots, R_{12}$ rooted at $R_1$; an edge is labeled with a representative $\alpha$ and its conjugate $\hat{\alpha}$.]

Starting with 101010 from $R_1$ and repeatedly applying the function $g(\omega)$ yields a de Bruijn sequence. Note that one of the RL-reps is $\alpha_i = 001010$, and its conjugate $\hat{\alpha}_i = 101010$ is found in $R_1$. The last string visited in each cycle $R_i$, for $i > 1$, is its representative $\alpha_i$.
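The cycle-joining step of Example 4 can be reproduced by brute force. The sketch below is only an illustration, not the efficient method developed later: it enumerates the PRR cycles explicitly (exponential space), picks each RL-rep by direct comparison of run-length encodings, and builds $g$ as in Theorem 12. The function names and the output convention (emit the first bit of each visited string) are assumptions made here for illustration.

```python
from itertools import product

def prr_feedback(w):
    """PRR feedback f(w) = w1 xor w2 xor wn for a tuple of bits w."""
    return w[0] ^ w[1] ^ w[-1]

def rle(w):
    """Run-length encoding of w, e.g. (0, 0, 1, 0, 1, 0) -> (2, 1, 1, 1, 1)."""
    runs = [1]
    for prev, cur in zip(w, w[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return tuple(runs)

def prr_cycles(n):
    """Partition all length-n bit tuples into the cycles induced by the PRR."""
    seen, cycles = set(), []
    for w in product((0, 1), repeat=n):
        cycle = []
        while w not in seen:
            seen.add(w)
            cycle.append(w)
            w = w[1:] + (prr_feedback(w),)
        if cycle:
            cycles.append(cycle)
    return cycles

def conjugate(w):
    """Flip the first bit of w."""
    return (1 - w[0],) + w[1:]

def rl_successor(n):
    """Brute-force g from Theorem 12 with the RL-reps as representatives."""
    reps = set()
    for cycle in prr_cycles(n):
        # RL-rep: largest RLE; a tie is broken in favour of the string starting with 1.
        rep = max(cycle, key=lambda w: (rle(w), w[0]))
        if len(rle(rep)) < n:  # skip the root cycle, whose strings have run-length n
            reps.add(rep)

    def g(w):
        f = prr_feedback(w)
        return 1 - f if (w in reps or conjugate(w) in reps) else f

    return g

def generate(n, start):
    """Apply g repeatedly from `start`, emitting the first bit of each visited string."""
    g, w, out = rl_successor(n), start, []
    for _ in range(2 ** n):
        out.append(str(w[0]))
        w = w[1:] + (g(w),)
    return "".join(out)

sequence = generate(6, (1, 0, 1, 0, 1, 0))
windows = {(sequence + sequence)[k:k + 6] for k in range(len(sequence))}
print(sequence)
print(len(windows) == 2 ** 6)  # every 6-bit string occurs exactly once cyclically
```

Because the set of representatives is stored explicitly, this version is only practical for small $n$; it is meant as a check on the definitions rather than as a usable generator.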

The following observations, which will be applied later in our more technical proofs, follow from thetree interpretation of the ordered cycles rooted at R from Theorem 12 as illustrated in the previousexample. (cid:73) Observation 13.

Let $g$ be a de Bruijn successor from Theorem 12 based on representatives $\alpha_2, \alpha_3, \ldots, \alpha_t$. Let $D_n = DB(g, w_1 w_2 \cdots w_n)$ and let $D'_n = w_2 \cdots w_n D_n$ denote a linearized de Bruijn sequence. If the length $n$ prefix of $D'_n$ is in $R_1$, then for each $1 < i \le t$:

1. $\hat{\alpha}_i$ appears before all strings in $R_i$,
2. the $m$ strings of $R_i$ appear in the following order: $PRR(\alpha_i), PRR^2(\alpha_i), \ldots, PRR^m(\alpha_i) = \alpha_i$,
3. if $R_i$ and $R_k$ are on the same level in the corresponding tree of cycles rooted at $R_1$, then either every string in $R_i$ comes before every string in $R_k$ or vice-versa,
4. the strings in all descendant cycles of $R_i$ appear after $\hat{\alpha}_i$ and before $\alpha_i$, and
5. if $\hat{\alpha}_i = a_1 a_2 \cdots a_n$, then $a_2 \cdots a_n g(\hat{\alpha}_i)$ is in $R_i$.

As an application of Theorem 12, consider the cycles $R_1, R_2, \ldots, R_t$ to be ordered in non-increasing order based on the run-length of each cycle. Such an ordering is given in Example 3 for $n = 6$. Using this ordering, let $\alpha_i = a_1 a_2 \cdots a_n$ be any string in $R_i$, for $i > 1$, such that $a_1 = a_2$. Note that $\hat{\alpha}_i$ has run-length that is one more than the run-length of $\alpha_i$ and thus $\hat{\alpha}_i$ belongs to some $R_j$ where $j < i$. Thus, Theorem 12 can be applied to describe the following generic de Bruijn successor based on the PRR.

▶ Theorem 14.

Let $R_1, R_2, \ldots, R_t$ be listed in non-increasing order with respect to the run-length of each cycle. Let $\alpha_i = a_1 a_2 \cdots a_n$ denote a representative in $R_i$ such that $a_1 = a_2$, for each $1 < i \le t$. Let $\omega = w_1 w_2 \cdots w_n$ and let $f(\omega) = w_1 \oplus w_2 \oplus w_n$. Then the function

$$g(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \{\alpha_2, \alpha_3, \ldots, \alpha_t\}; \\ f(\omega) & \text{otherwise} \end{cases}$$

is a de Bruijn successor.

Now consider the cycles $R_1, R_2, \ldots, R_t$ to be ordered in non-decreasing order based on the run-length of each cycle. This means the first two cycles $R_1$ and $R_2$ will be the cycles containing $0^n$ and $1^n$. But given this ordering, there is no way to satisfy Theorem 12, since the conjugate of any representative for $R_2$ will not be found in $R_1$. However, if we let $R_t = \{1^n\}$ and order the remaining cycles in non-decreasing order based on the run-length of each cycle, then we obtain a result similar to Theorem 14. Observe that this relates to the special case described for the Prefer-opposite greedy construction illustrated in Figure 2. Using this ordering, let $\alpha_i = a_1 a_2 \cdots a_n$ be any string in $R_i$, for $1 < i < t$, such that $a_1 \neq a_2$. Such a string exists since $R_1 = \{0^n\}$ and $R_t = \{1^n\}$. This means $\hat{\alpha}_i$ has run-length that is one less than the run-length of $\alpha_i$ and thus $\hat{\alpha}_i$ belongs to some $R_j$ where $j < i$. For the special case when $i = t$, the conjugate of $1^n$ clearly is found in some $R_j$ where $j < t$. Thus, Theorem 12 can be applied again to describe another generic de Bruijn successor based on the PRR.

▶ Theorem 15.

Let $R_t = \{1^n\}$ and let the remaining cycles $R_1, R_2, \ldots, R_{t-1}$ be listed in non-decreasing order with respect to the run-length of each cycle. Let $\alpha_i = a_1 a_2 \cdots a_n$ denote a representative in $R_i$ such that $a_1 \neq a_2$, for each $1 < i < t$. Let $\omega = w_1 w_2 \cdots w_n$ and let $f(\omega) = w_1 \oplus w_2 \oplus w_n$. Then the function

$$g(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \{\alpha_2, \alpha_3, \ldots, \alpha_t\}; \\ f(\omega) & \text{otherwise} \end{cases}$$

is a de Bruijn successor.

When Theorem 14 and Theorem 15 are applied naïvely, the resulting de Bruijn successors are not efficient, since storing the set $\{\alpha_2, \alpha_3, \ldots, \alpha_t\}$ requires exponential space. However, if a membership tester for the set can be defined efficiently, then there is no need for the set to be stored. Such sets of representatives are presented in the next two sections.

In this section we define a de Bruijn successor for $S_n$. Recall the partition $R_1, R_2, \ldots, R_t$ of $B(n)$ induced by the PRR. In addition to the RL-rep, we define a new representative for each cycle, called the LC-rep, where LC stands for Lexicographic Compositions, which are further discussed in Section 8. Then, considering these two representatives along with a small set of special strings, we define a third representative, called the same-rep. For each representative, we can apply Theorem 14 to produce a new de Bruijn successor. The definitions for these three representatives are as follows:

RL-rep: The string with the lexicographically largest RLE; if there are two such strings, it is the one beginning with 1.

LC-rep: The strings $0^n$ and $1^n$ for the classes $\{0^n\}$ and $\{1^n\}$ respectively. For all other classes, it is the string $\omega$ with RLE $2\,1^{i-1} r_{i+1} \cdots r_\ell$, where $r_{i+1} \neq 1$, such that $PRR^{i+1}(\omega)$ is the RL-rep.

same-rep: the RL-rep if the RL-rep is special, and the LC-rep otherwise.

We say an RL-rep is special if it belongs to the set $SP(n)$ defined as follows: $SP(n)$ is the set of length $n$ binary strings that begin and end with 0 and have RLE of the form $(2\,1^{2x})^y 1^z$, where $x \ge 0$, $y \ge 2$, and $z \ge 2$.

The RL-reps have already been illustrated in Section 4. There are relatively few special RL-reps, and they all have odd run-length since they must begin and end with 0.

Example 5

The RLEs of the special RL-reps for $n = 10, 11, 12, 13$:

n = 10: 2221111
n = 11: 2222111, 221111111, 211211111
n = 12: 2222211, 222111111
n = 13: 222211111, 22111111111, 21121111111

To illustrate an LC-rep, consider the string $\omega = 110101111011$ with RLE 2111412. The string $\omega$ is an LC-rep since $PRR^5(\omega) = 111101110101$, which is an RL-rep with RLE 4131111. Note that another way to define the LC-rep is as follows: if the RLE of an RL-rep ends with $i$ consecutive 1s, then the corresponding LC-rep is the string $\omega$ such that $PRR^{i+1}(\omega)$ is the RL-rep.

Let $\mathbf{RL}(n)$, $\mathbf{LC}(n)$, and $\mathbf{Same}(n)$ denote the sets of all length $n$ RL-reps, LC-reps, and same-reps, respectively, not including the representative with run-length $n$. Consider the following feedback functions, where $\omega = w_1 w_2 \cdots w_n$ and $f(\omega) = w_1 \oplus w_2 \oplus w_n$:

$$RL(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \mathbf{RL}(n); \\ f(\omega) & \text{otherwise.} \end{cases}$$

$$LC(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \mathbf{LC}(n); \\ f(\omega) & \text{otherwise.} \end{cases}$$

$$S(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \mathbf{Same}(n); \\ f(\omega) & \text{otherwise.} \end{cases}$$

▶

Theorem 16. The feedback functions $RL(\omega)$, $LC(\omega)$ and $S(\omega)$ are de Bruijn successors.

Proof.

Let the partition $R_1, R_2, \ldots, R_t$ of $B(n)$ induced by the PRR be listed in non-increasing order with respect to the run-length of each cycle. Observe that $R_1$ is the cycle whose strings have run-length $n$, and thus any representative of $R_1$ will have run-length $n$. By definition, this representative is not in the sets $\mathbf{RL}(n)$, $\mathbf{LC}(n)$, and $\mathbf{Same}(n)$. Now consider $R_i$ for $i > 1$. Clearly the RL-rep for $R_i$ will begin with 00 or 11 and, by definition, the LC-rep for $R_i$ also begins with 00 or 11. Together these results imply that each same-rep for $R_i$ will also begin with 00 or 11. Thus, it follows directly from Theorem 14 that $RL(\omega)$, $LC(\omega)$ and $S(\omega)$ are de Bruijn successors. ◀

Recall that $alt(n)$ denotes the alternating sequence of 0s and 1s of length $n$ that ends with 0. Let $X_n = x_1 x_2 \cdots x_{2^n}$ be the de Bruijn sequence returned by $DB(S, 0\,alt(n-1))$; it will have suffix equal to the seed $0\,alt(n-1)$. Let $X'_n$ denote the linearized de Bruijn sequence $alt(n-1) X_n$. Our goal is to show that $X_n = S_n$. Our proof requires the following proposition, which is proved later in Section 10.

▶ Proposition 17. If $\beta$ is a string in $B(n)$ such that the run-length of $\beta$ is one more than the run-length of $\hat{\beta}$ and neither $\beta$ nor $\hat{\beta}$ are same-reps, then $\hat{\beta}$ appears before $\beta$ in $X'_n$.

The following proposition follows from $n$ applications of the successor $S$ to the seed $0\,alt(n-1)$.

▶ Proposition 18. $X_n$ has prefix $1^n$.

▶ Theorem 19.

The de Bruijn sequences $S_n$ and $X_n$ are the same.

Proof.

Let $S_n = s_1 s_2 \cdots s_{2^n}$ and let $X_n = x_1 x_2 \cdots x_{2^n}$. Recall that $X_n$ ends with $0\,alt(n-1)$. From Proposition 3 and Proposition 18, $x_1 x_2 \cdots x_n = s_1 s_2 \cdots s_n = 1^n$, and moreover $S_n$ and $X_n$ share the same length $n-1$ suffix. Suppose there exists some smallest $t$, where $n < t \le 2^n - n$, such that $s_t \neq x_t$. Let $\beta = x_{t-n} \cdots x_{t-1}$ denote the length $n$ substring of $X_n$ ending at position $t-1$. Then $x_t \neq x_{t-1}$, because otherwise the RLE of $X_n$ is lexicographically larger than that of $S_n$, contradicting Proposition 1. We claim that $\hat{\beta}$ comes before $\beta$ in $X'_n$, by considering two cases, recalling $f(\omega) = w_1 \oplus w_2 \oplus w_n$:

If $x_t = f(\beta)$, then by the definition of $S$, neither $\beta$ nor $\hat{\beta}$ are in $\mathbf{Same}(n)$. By the definition of $f$ and since $x_t \neq x_{t-1}$, the first two bits of $\beta$ must differ from each other. Thus, the run-length of $\beta$ is one more than the run-length of $\hat{\beta}$, and the claim holds by Proposition 17.

If $x_t = \bar{f}(\beta)$, then either $\beta$ or $\hat{\beta}$ is in $\mathbf{Same}(n)$. Let $\beta = b_1 b_2 \cdots b_n$. Then $PRR(\beta) = b_2 \cdots b_n s_t$ and $PRR(\hat{\beta}) = b_2 \cdots b_n x_t$. From Lemma 9, the strings $\beta$ and $b_2 \cdots b_n s_t$ have the same run-length, and the strings $\hat{\beta}$ and $b_2 \cdots b_n x_t$ have the same run-length. Since $b_2 \cdots b_n x_t$ has run-length one greater than that of $b_2 \cdots b_n s_t$, it must be that $\hat{\beta}$ has run-length one greater than that of $\beta$. This means that $\hat{\beta}$ must begin with 10 or 01, and hence is not a same-rep, which can be inferred by definition. Thus $\beta$ is a same-rep and the claim holds by Observation 13 (item 1).

Since $\hat{\beta}$ appears before $\beta$ in $X'_n$, $\hat{\beta}$ must be a substring of $alt(n-1)\, x_1 \cdots x_{t-1}$. Thus, either $x_{t-n+1} \cdots x_{t-1} x_t$ or $x_{t-n+1} \cdots x_{t-1} s_t$ must be in $alt(n-1)\, x_1 \cdots x_{t-1}$, which contradicts the fact that both $X_n$ and $S_n$ are de Bruijn sequences. Thus, there is no $n < t \le 2^n$ such that $s_t \neq x_t$, and hence $S_n = X_n$.
◀

To develop an efficient de Bruijn successor for $O_n$, we follow an approach similar to that for $S_n$, except this time we focus on the lexicographically smallest RLEs and RL2-reps. Again, we consider three different representatives for the cycles $R_1, R_2, \ldots, R_t$ of $B(n)$ induced by the PRR.

RL2-rep: The string with the lexicographically smallest RLE; if there are two such strings, it is the one beginning with 0.

LC2-rep: The strings $0^n$ and $1^n$ for the classes $\{0^n\}$ and $\{1^n\}$ respectively. For all other classes, it is the string $\omega$ with RLE $r_1 r_2 \cdots r_\ell$ such that $r_1 = 1$ and $r_2$ applications of the PRR starting with $\omega$ yield the RL2-rep.

opp-rep: the RL2-rep if the RL2-rep is special2, and the LC2-rep otherwise.

We say an RL2-rep is special2 if it belongs to the set $SP2(n)$ defined as follows: $SP2(n)$ is the set of length $n$ binary strings that begin with 1 and have RLE of the form $1\,x^z y$, where $z$ is odd and $y > x$.

The RL2-reps have already been illustrated in Section 4. There are relatively few special2 RL2-reps, and they all have odd run-length.

Example 6

The RLEs of the special2 RL2-reps for $n = 10, 11, 12, 13$:

n = 10: 111111112, 1111114, 11116, 118, 12223, 127, 136, 145
n = 11: 111111113, 1111115, 11117, 119, 12224, 128, 137, 146
n = 12: 11111111112, 111111114, 1111116, 11118, 11(10), 12225, 129, 138, 147, 156
n = 13: 11111111113, 111111115, 1111117, 11119, 11(11), 12226, 12(10), 139, 148, 157

Except for the special cases $0^n$ and $1^n$, each LC2-rep begins with 10 or 01. As an example, consider $\omega = 10000101001$, which has RLE $r_1 r_2 r_3 r_4 r_5 r_6 r_7 = 1411121$. It is an LC2-rep since $r_2 = 4$ applications of the PRR to $\omega$ yield the RL2-rep $01010010000$ with RLE 1111214. Note that the last value of this RLE corresponds to $r_2$.

Let $\mathbf{RL2}(n)$, $\mathbf{LC2}(n)$, and $\mathbf{OPP}(n)$ denote the sets of all length $n$ RL2-reps, LC2-reps, and opp-reps, respectively, not including the representative $0^n$. Consider the following feedback functions, where $\omega = w_1 w_2 \cdots w_n$ and $f(\omega) = w_1 \oplus w_2 \oplus w_n$:

$$RL2(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \mathbf{RL2}(n); \\ f(\omega) & \text{otherwise.} \end{cases}$$

$$LC2(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \mathbf{LC2}(n); \\ f(\omega) & \text{otherwise.} \end{cases}$$

$$O(\omega) = \begin{cases} \bar{f}(\omega) & \text{if } \omega \text{ or } \hat{\omega} \text{ is in } \mathbf{OPP}(n); \\ f(\omega) & \text{otherwise.} \end{cases}$$

▶ Theorem 20.

The feedback functions $RL2(\omega)$, $LC2(\omega)$ and $O(\omega)$ are de Bruijn successors.

Proof.

Let the partition $R_1, R_2, \ldots, R_t$ of $B(n)$ induced by the PRR be listed such that $R_t = \{1^n\}$ and the remaining $t-1$ cycles are ordered in non-decreasing order with respect to the run-length of each cycle. This means that $R_1 = \{0^n\}$ and its representative, which must be $0^n$, is not in the sets $\mathbf{RL2}(n)$, $\mathbf{LC2}(n)$, and $\mathbf{OPP}(n)$ by their definition. Now consider $R_i$ for $1 < i < t$. Clearly the RL2-rep for $R_i$, which is a string with the lexicographically smallest RLE, will begin with 01 or 10. Similarly, the LC2-rep for $R_i$ must begin with 01 or 10 by its definition. Together these results imply that each opp-rep for $R_i$ will also begin with 01 or 10. Thus, it follows directly from Theorem 15 that $RL2(\omega)$, $LC2(\omega)$ and $O(\omega)$ are de Bruijn successors. ◀

Recall from Proposition 4 that the length $n$ suffix of $O_n$ is $10^{n-1}$. Let $Y_n = y_1 y_2 \cdots y_{2^n}$ be the de Bruijn sequence returned by $DB(O, 10^{n-1})$; it will have suffix $10^{n-1}$. Let $Y'_n$ denote the linearized de Bruijn sequence $0^{n-1} Y_n$. Our goal is to show that $Y_n = O_n$. Our proof requires the following proposition, which is proved later in Section 11.

▶

Proposition 21. If $\beta$ is a string in $B(n)$ such that the run-length of $\beta$ is one less than the run-length of $\hat{\beta}$ and neither $\beta$ nor $\hat{\beta}$ are opp-reps, then $\hat{\beta}$ appears before $\beta$ in $Y'_n$.

The following proposition follows from $n$ applications of the successor $O$ to the seed $10^{n-1}$.

▶ Proposition 22. $Y_n$ has length $n$ prefix $0101\cdots$.

▶ Theorem 23.

The de Bruijn sequences $O_n$ and $Y_n$ are the same.

Proof.

Let $O_n = o_1 o_2 \cdots o_{2^n}$ and let $Y_n = y_1 y_2 \cdots y_{2^n}$. Recall that $Y_n$ ends with $10^{n-1}$. From Proposition 4 and Proposition 22, $y_1 y_2 \cdots y_n = o_1 o_2 \cdots o_n = 0101\cdots$, and moreover $O_n$ and $Y_n$ share the same length $n-1$ suffix $0^{n-1}$. Suppose there exists some smallest $t$, where $n < t \le 2^n - n$, such that $o_t \neq y_t$. Let $\beta = y_{t-n} \cdots y_{t-1}$ denote the length $n$ substring of $Y_n$ ending at position $t-1$. Then $y_t = y_{t-1}$, because otherwise the RLE of $Y_n$ is lexicographically smaller than that of $O_n$, contradicting Proposition 2. We claim that $\hat{\beta}$ comes before $\beta$ in $Y'_n$, by considering two cases, recalling $f(\omega) = w_1 \oplus w_2 \oplus w_n$:

If $y_t = f(\beta)$, then by the definition of $O$, neither $\beta$ nor $\hat{\beta}$ are in $\mathbf{OPP}(n)$. By the definition of $f$ and since $y_t = y_{t-1}$, the first two bits of $\beta$ are the same. Thus, the run-length of $\beta$ is one less than the run-length of $\hat{\beta}$, and the claim holds by Proposition 21.

If $y_t = \bar{f}(\beta)$, then either $\beta$ or $\hat{\beta}$ is in $\mathbf{OPP}(n)$. Let $\beta = b_1 b_2 \cdots b_n$. Then $PRR(\beta) = b_2 \cdots b_n o_t$ and $PRR(\hat{\beta}) = b_2 \cdots b_n y_t$. From Lemma 9, this means $\beta$ and $b_2 \cdots b_n o_t$ have the same run-length, and $\hat{\beta}$ and $b_2 \cdots b_n y_t$ have the same run-length. Since $b_2 \cdots b_n y_t$ has run-length one less than that of $b_2 \cdots b_n o_t$, it must be that $\hat{\beta}$ has run-length one less than that of $\beta$. This means $\hat{\beta}$ must begin with 00 or 11 and hence is not an opp-rep, which can be inferred by definition. Thus $\beta$ is an opp-rep and the claim holds by Observation 13 (item 1).

Since $\hat{\beta}$ appears before $\beta$ in $Y'_n$, $\hat{\beta}$ must be a substring of $0^{n-1} y_1 \cdots y_{t-1}$. Thus, either $y_{t-n+1} \cdots y_{t-1} y_t$ or $y_{t-n+1} \cdots y_{t-1} o_t$ must be in $0^{n-1} y_1 \cdots y_{t-1}$, which contradicts the fact that both $Y_n$ and $O_n$ are de Bruijn sequences. Thus, there is no $n < t \le 2^n$ such that $o_t \neq y_t$, and hence $O_n = Y_n$. ◀

As mentioned earlier, Fredricksen and Kessler devised a construction based on lexicographic compositions [16].
Let $L_n$ denote the de Bruijn sequence of order $n$ that results from this construction. The sequences $S_n$ and $L_n$ first differ at $n = 7$ (as noted below), and for $n \ge 7$ they were conjectured to match for a significant prefix [15, 16]:

$S_7$ = 11111110000000111110111100111101000001000011000010111000111001000110111011000100111010110011001011011010011010100010100100101010,
$L_7$ = 11111110000000111110111100111101000001000011000010111000111001000110111011000100111010110011001011011010100010100110100100101010.

After discovering the de Bruijn successor for $S_n$, we observed that the de Bruijn sequence resulting from the de Bruijn successor $LC(\omega)$ corresponded to $L_n$. Recall that $alt(n)$ denotes the alternating sequence of 0s and 1s of length $n$ that ends with 0. Let $LC_n$ be the de Bruijn sequence returned by $DB(LC, 0\,alt(n-1))$.

▶ Conjecture 24.

The de Bruijn sequences LC n and L n are the same. We have veriﬁed that LC n is the same as L n for all n < . However, as the description of thealgorithm to construct L n is rather detailed [16], we did not attempt to prove this conjecture. . Sala, J. Sawada and A. Alhakim XX:17 Given a membership tester for RL ( n ) , testing whether or not a string is an LC-rep or a same-rep caneasily be done in O ( n ) time and O ( n ) space. Similarly, given the membership tester for RL2 ( n ) ,testing whether or not a string is an LC2-rep or a opp-rep can easily be done in O ( n ) time and O ( n ) space. Thus, by applying Proposition 10 and Proposition 11, we can implement each of our six deBruijn successors in O ( n ) time using O ( n ) space. (cid:73) Theorem 25.

The six de Bruijn successors $RL(\omega)$, $LC(\omega)$, $S(\omega)$, $RL2(\omega)$, $LC2(\omega)$ and $O(\omega)$ can be implemented in $O(n)$ time using $O(n)$ space.
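As a sanity check on the definitions behind the successor $S(\omega)$, the following sketch builds the set of same-reps by brute force and runs the successor from the seed $0\,alt(n-1)$. It is not the $O(n)$-time, $O(n)$-space implementation claimed above: it stores every PRR cycle, so it is exponential in $n$. All function names are hypothetical, the special-string test follows the RLE form $(2\,1^{2x})^y 1^z$ with $x \ge 0$, $y \ge 2$, $z \ge 2$, and the output convention (emit the successor bit at each step) is an assumption.

```python
from itertools import product

def f(w):
    """PRR feedback w1 xor w2 xor wn."""
    return w[0] ^ w[1] ^ w[-1]

def rle(w):
    """Run-length encoding of a bit tuple."""
    runs = [1]
    for prev, cur in zip(w, w[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return tuple(runs)

def prr_cycles(n):
    """Cycles of the PRR, each listed in the order the PRR visits its strings."""
    seen, cycles = set(), []
    for w in product((0, 1), repeat=n):
        cycle = []
        while w not in seen:
            seen.add(w)
            cycle.append(w)
            w = w[1:] + (f(w),)
        if cycle:
            cycles.append(cycle)
    return cycles

def is_special(w):
    """Begins and ends with 0, and the RLE has the form (2 1^{2x})^y 1^z, y >= 2, z >= 2."""
    if w[0] != 0 or w[-1] != 0:
        return False
    r = list(rle(w))
    for x in range(len(r)):
        block = [2] + [1] * (2 * x)
        for y in range(2, len(r) // len(block) + 1):
            head, tail = r[:y * len(block)], r[y * len(block):]
            if head == block * y and len(tail) >= 2 and all(v == 1 for v in tail):
                return True
    return False

def same_rep(cycle):
    """RL-rep if it is special; otherwise the LC-rep, found i+1 steps before the RL-rep."""
    s = max(range(len(cycle)), key=lambda k: (rle(cycle[k]), cycle[k][0]))
    sigma = cycle[s]
    if is_special(sigma):
        return sigma
    r = rle(sigma)
    i = 0
    while i < len(r) and r[-1 - i] == 1:  # count the trailing 1s of the RLE
        i += 1
    return cycle[(s - (i + 1)) % len(cycle)]

def alt(n):
    """Alternating bits of length n ending with 0."""
    return tuple((n - k) % 2 for k in range(1, n + 1))

def prefer_same_by_successor(n):
    reps = {same_rep(c) for c in prr_cycles(n)}
    reps = {a for a in reps if len(rle(a)) < n}  # exclude the rep with run-length n
    w, out = (0,) + alt(n - 1), []
    for _ in range(2 ** n):
        hat = (1 - w[0],) + w[1:]
        bit = 1 - f(w) if (w in reps or hat in reps) else f(w)
        out.append(str(bit))
        w = w[1:] + (bit,)
    return "".join(out)

print(prefer_same_by_successor(4))
```

For $n = 7$ the output can be compared by eye against the sequence $S_7$ listed in the discussion of lexicographic compositions; that comparison is left to the reader rather than asserted here.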

10 Proof of Proposition 17

Recall that $X_n = DB(S, 0\,alt(n-1))$ and $X'_n = alt(n-1) X_n$. We begin by restating Proposition 17 by reversing the roles of $\beta$ and $\hat{\beta}$ in its original statement for convenience:

If $\beta$ is a string in $B(n)$ such that the run-length of $\beta$ is one less than the run-length of $\hat{\beta}$ and neither $\beta$ nor $\hat{\beta}$ are same-reps, then $\beta$ appears before $\hat{\beta}$ in $X'_n$.

The first step is to further refine the ordering of the cycles $R_1, R_2, \ldots, R_t$ used in the proof of Theorem 16 to prove that $S$ was a de Bruijn successor. In particular, let $R_1, R_2, \ldots, R_t$ be the cycles of $B(n)$ induced by the PRR ordered in non-increasing order with respect to the run-lengths of each cycle, additionally refined so the cycles with the same run-lengths are ordered in decreasing order with respect to the run-length encoding (RLE) of the RL-rep. If two RL-reps have the same RLE, then the cycle with RL-rep starting with 1 comes first. Note that this refinement satisfies the ordering of the cycles required in the proof.

Let $\alpha_i$, $\gamma_i$, $\sigma_i$ denote the same-rep, LC-rep, and the RL-rep, respectively, for $R_i$. If $\beta$ is in $R_1$ then it has run-length $n$, which is one more than that of $\hat{\beta}$. Thus assume $\beta$ is in some $R_i$, where $i > 1$, such that the run-length of $\beta$ is one less than the run-length of $\hat{\beta}$, where neither $\beta$ nor $\hat{\beta}$ are same-reps. The run-length constraint implies that the RLE of $\beta$ must begin with a value greater than 1. We separate two special cases for $\beta$ that are illustrated in Figure 3 (a): $\beta = \gamma_j$ when $\sigma_j$ ($= \alpha_j$) is special, and $\beta = \sigma_i$ when $\bar{\sigma}_i = \sigma_j$ is special, in which case $\alpha_i = \gamma_i$. All other possible $\beta$ are illustrated in Figure 3 (b). We begin by looking at an example for the special cases.

[Figure 3: (a) The case when $\sigma_j$ is special and $\beta = \gamma_j$, together with the case when $\bar{\sigma}_i = \sigma_j$ is special and $\beta = \sigma_i$. (b) All other possible $\beta$.]

Example 7

Consider R i and R j with RL-reps σ i = 11010010101 and σ j = 00101101010 . Bothhave RLE 211211111. Note that σ i is in SP (11) . The corresponding LC-reps are γ i = 00101011010 and γ j = 11010100101 . The conjugates of all four strings belong to the same cycle R k . This only happens forthese special RLEs. Below is the order that the strings from R k appear in X , based on Observation 13(item 2). In particular take notice of the positions of the four conjugates.01010101011101010101100101010110110101011010 ← ˆ γ i ← ˆ σ j ← σ k , the RL-rep, with RLE 211111111110101010100010101010011010101001001010100101 ← ˆ γ j ← ˆ σ i ← α k = γ k , the same-rep and LC-rep for this cycle The ordering of the four conjugates from this example are formalized in the second item of thefollowing lemma. As a result of the lemma, and observing Figure 3 (a), we see that β comes before ˆ β in X n for the two special cases. (cid:73) Lemma 26.

Let R i and R j be cycles such that σ i and σ j have the same RLE, where i < j .(i) If σ j is not special then ˆ α i and ˆ α j belong to the same cycle and appear in that relative orderwithin X n .(ii) If σ j is special, then ˆ γ i , ˆ σ j , ˆ γ j and ˆ σ i all belong to the same cycle and they appear in that relativeorder within X n . Proof.

By the ordering of the cycles, since i < j it must be that σ i begins with 1 and σ j begins with0; they belong to PCR-related cycles. Note that σ i = σ j and similarly γ i = γ j . Thus ˆ σ i = ˆ σ j and ˆ γ i = ˆ γ j and each pair, respectively, will belong to the same CCR-related cycle. Case (i) : If σ j is not special, then α i = γ i and α j = γ j . Suppose the RLE for σ i is r r · · · r m v ,where r m ≥ . Note also that r ≥ by the deﬁnition of RL-rep. Also m + v is odd, since R i is aPCR-related cycle. Then γ i has RLE v − r r . . . r m − ( r m − and PRR v +1 ( γ i ) = σ i , ˆ γ i has RLE v +1 r r . . . r m − ( r m − , σ k has RLE r r . . . r m − ( r m − v +1 where ˆ γ i ∈ R k .The third item is obtained by applying the deﬁnition of an RL-rep, using the fact that σ i is an RL-rep.Since R k is a CCR-related cycle (it has even run-length m + v + 1 ), its RL-rep σ k begins with 1.Note then that PRR v +1 (ˆ γ i ) = σ k . As a special case, if σ k has RLE n , then k = 1 and moreover σ i has RLE n − . Note σ i = γ i = α i . Since X n begins with n (Proposition 18), the ﬁrst length n substring in X n is · · ·

01 = ˆ α i . Thus ˆ α i clearly comes before ˆ α j within X n . For all remaining . Sala, J. Sawada and A. Alhakim XX:19 cases, if r m > , then PRR v +2 ( γ k ) = σ k . Otherwise, if m = 2 , let v denote the length of thelongest sufﬁx of r r . . . r m − ( r m − consisting only of 1s. Note this number is less than m sincewe already handled the special case where σ i has RLE n − . In this case PRR v +2+ v ( γ k ) = σ k . Ifthere are z strings in R k , clearly v + 2 + v will be less than z . Moreover, PRR z (ˆ γ i ) = ˆ γ j since ˆ γ i = ˆ γ j . Thus by Observation 13 (item 2), ˆ γ i comes before σ k which comes before ˆ γ j in X n . Case (ii) : If σ j is special, then α i = γ i and α j = σ j . Since σ j is special it begins and ends with 0and it has RLE of the form (21 x ) y z , where x ≥ , y ≥ , and z ≥ . Thus: ˆ σ j has RLE x +2 (21 x ) y − z , and begins with 1, γ j has RLE x + z − (21 x ) y − , and ˆ γ j has RLE x + z +1 (21 x ) y − , σ k has RLE (21 x ) y − x + z +2 where ˆ σ j ∈ R k , and begins with 1 since it is a CCR-relatedcycle.The ﬁnal item is obtained by applying the deﬁnition of an RL-rep, using the fact that σ i is an RL-rep.Since R k is a CCR-related cycle we have α k = γ k and based on the RLE of σ k , the cycle willcontain n − distinct strings. Thus for every string ω ∈ R k , PRR n − ( ω ) = ω . By the deﬁnitionof LC-rep, PRR (2 x + z +2)+2 x +1 ( γ k ) = σ k . Note also that PRR x +2 (ˆ σ j ) = σ k . Observe now thatPRR x + z +1 (ˆ γ j ) will have RLE (21 x ) y − z +2 x +2 and begin with 0; it is the complement of σ k .Thus, PRR x + z +1 (ˆ γ i ) = σ k . Putting it all together we have:PRR x +2 ( α k ) = ˆ γ i PRR x + z +1 ( α k ) = ˆ σ j PRR x + z +2+2 x +1 ( α k ) = σ k PRR n − x +2 ( α k ) = ˆ γ j PRR n − x + z +1 ( α k ) = ˆ σ i The result now follows from Observation 13 (item 2). (cid:74)(cid:73)

Corollary 27. If R i and R j are cycles such that σ i and σ j have the same RLE, where i < j ,then every string from R i appears before every string from R j in X n . Proof.

In case (ii) from Lemma , since σ j is special then α j = σ j and α i = γ i . Thus, an immediateconsequence of Lemma is that ˆ α i appears before ˆ α j in X n . Then by Observation 13 (item 5 and item3), every string in R i appears before every string in R j in X n . (cid:74) For all β ∈ R i other than these two special cases, assume that ˆ α i belongs to R j and ˆ β belongs to R k – see Figure 3 (b). We will show that j < k and subsequently that all strings in R j come beforeall strings in R k in X n . Suppose the RLE for σ i is r r · · · r m v , where r m ≥ . Then γ i has RLE v − r r . . . r m − ( r m − , ˆ γ i has RLE v +1 r r . . . r m − ( r m − , σ j has RLE r r . . . r m − ( r m − v +1 where ˆ γ i ∈ R j .The third item is obtained by applying the deﬁnition of an RL-rep, using the fact that σ i is an RL-rep.Moreover, note that σ j begins with 1 if R j corresponds to a CCR-related cycle. Otherwise, R i mustcorrespond to a CCR-related cycle which means that σ i begins with 1 and hence again σ j will beginwith 1 based on the RLEs described above. We now consider two cases depending on whether or not β = α i .If β = α i , then R i must be a CCR-related cycle. Thus α i = γ i begins with 1 and hence β beginswith 0. It is easy to see from the RLE of σ j noted above that it will also begin with 1. Thus σ k , whichwill have the same RLE as σ j , begins with 0. From the ordering deﬁned on the cycles, j < k . Thusby Lemma 10, ˆ α j appears before ˆ α k which implies that all strings in R j appear before all stringsin R k in X n by Observation 13 (item 3). Furthermore, Observation 13 (item 4) implies that β willappear before ˆ β in X n . X:20 Efﬁcient constructions of the Prefer-same and Prefer-opposite de Bruijn sequences If β = α i , then we ﬁrst consider the case where either α i or α i is special. We have alreadyhandled the two special cases where β = γ i or β = σ i . 
Since the RLE for β cannot begin with 1, β must be of the form (21 x ) y − q z (21 x ) q , where x ≥ , y ≥ , z ≥ , and ≤ q ≤ y − . Thus ˆ β has RLE x +2 (21 x ) y − q − z (21 x ) q . It is not hard to see σ k will have a smaller RLE compared to σ j which is detailed in the proof of Lemma 10. A similar analysis can be done when neither α i nor α i . For these cases, it is a relatively straightforward task to observe that the RLE for σ k is less thanthe RLE for σ j which means j < k . We can now apply the following lemma. (cid:73) Lemma 28. If R j and R k have the same run-length where j < k and σ j and σ k both begin withthe same symbol, then every string in R j appears in X n before any string in R k . Proof.

The proof is by induction on the levels of the related tree of cycles rooted by R , which isthe unique cycle with run-length n . The base case trivially holds for cycles with run-length n sincethere is only one such cycle R . Now assume that the result holds for all cycles at levels with runlength greater than ‘ < n , and consider two cycles R j and R k with run-length ‘ such that σ j and σ k both begin with the same symbol. By the ordering of the cycles the RLE of σ j is greater than theRLE of σ k . From Lemma 10, if σ j is special, then ˆ α j belongs to the same cycle as ˆ γ j . Similarlyfor σ k . Thus we need only focus on the RLE of the RL-reps σ x and σ y for the cycles R x and R y containing ˆ γ j and ˆ γ k respectively. From our earlier analysis (case (i) in the proof of Lemma 10), weanalyzed the RLE of these strings, and it can be observed that the RLE for σ x is greater than the RLEfor σ y since the RLE for σ j is greater than the RLE for σ k . Thus by the ordering of the cycles x < y .As noted earlier both R x and R y (any non-leaf in the related tree) must begin with 1. By induction,this means that the every string from the cycle containing R x appears before every string from R y in X n , and hence by Observation 13 (item 4), we have our result. (cid:74) Recall that σ j begins with 1 and σ j < σ k . Thus if σ k begins with 1, then the above lemma impliesthat all strings in R j appear before all strings in R k . Otherwise if σ k begins with 0, then it mustcorrespond to a PCR-related cycle. Consider R k containing RL-rep σ k which begins with 1 and hasthe same RLE as σ k . From the above lemma all strings in R j will appear before all strings in R k which in turn come before all strings in R k in X n by Corollary 27. By applying Observation 13 (item4), as we did earlier, we have that all strings in R i including β will appear before all strings in R k including ˆ β in X n . This completes the proof of Proposition 17.
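The ordering claimed by Proposition 17 can also be checked exhaustively for small $n$. The sketch below is a brute-force experiment, not part of the proof: it rebuilds the same-reps from explicit PRR cycles (same assumptions as in the definitions of the RL-rep, LC-rep and special strings, with hypothetical helper names), forms the linearized sequence $alt(n-1) X_n$, and compares the positions of $\beta$ and $\hat{\beta}$ for every qualifying $\beta$.

```python
from itertools import product

def f(w):
    """PRR feedback w1 xor w2 xor wn."""
    return w[0] ^ w[1] ^ w[-1]

def rle(w):
    """Run-length encoding of a bit tuple."""
    runs = [1]
    for prev, cur in zip(w, w[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return tuple(runs)

def is_special(w):
    """Begins and ends with 0, RLE of the form (2 1^{2x})^y 1^z with y >= 2, z >= 2."""
    if w[0] != 0 or w[-1] != 0:
        return False
    r = list(rle(w))
    for x in range(len(r)):
        block = [2] + [1] * (2 * x)
        for y in range(2, len(r) // len(block) + 1):
            tail = r[y * len(block):]
            if r[:y * len(block)] == block * y and len(tail) >= 2 and set(tail) == {1}:
                return True
    return False

def same_reps(n):
    """Same-rep of every PRR cycle, computed from the explicit cycles."""
    seen, reps = set(), set()
    for w in product((0, 1), repeat=n):
        cycle = []
        while w not in seen:
            seen.add(w)
            cycle.append(w)
            w = w[1:] + (f(w),)
        if not cycle:
            continue
        s = max(range(len(cycle)), key=lambda k: (rle(cycle[k]), cycle[k][0]))
        r, i = rle(cycle[s]), 0
        while i < len(r) and r[-1 - i] == 1:
            i += 1
        lc = cycle[(s - (i + 1)) % len(cycle)]
        reps.add(cycle[s] if is_special(cycle[s]) else lc)
    return reps

def conj(w):
    """Flip the first bit of w."""
    return (1 - w[0],) + w[1:]

def check_proposition_17(n):
    all_reps = same_reps(n)
    joined = {a for a in all_reps if len(rle(a)) < n}  # the set Same(n)
    alt = tuple((n - 1 - k) % 2 for k in range(1, n))  # alt(n-1), ends with 0
    w, bits = (0,) + alt, list(alt)
    for _ in range(2 ** n):
        bit = 1 - f(w) if (w in joined or conj(w) in joined) else f(w)
        bits.append(bit)
        w = w[1:] + (bit,)
    # position of every length-n window in the linearized sequence alt(n-1) X_n
    pos = {tuple(bits[k:k + n]): k for k in range(2 ** n)}
    if len(pos) != 2 ** n:
        return False
    for beta in product((0, 1), repeat=n):
        hat = conj(beta)
        one_more = len(rle(beta)) == len(rle(hat)) + 1
        if one_more and beta not in all_reps and hat not in all_reps:
            if pos[hat] > pos[beta]:
                return False
    return True

print([check_proposition_17(n) for n in range(3, 9)])
```

A passing check for small $n$ does not replace the argument above; it only guards against misreading the definitions.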

11 Proof of Proposition 21

The proof of this proposition follows the same technical steps as the proof for Proposition 17. Recall that $Y_n = DB(O, 10^{n-1})$ and $Y'_n = 0^{n-1} Y_n$. We begin by restating Proposition 21 by reversing the roles of $\beta$ and $\hat{\beta}$ in its original statement for convenience:

If $\beta$ is a string in $B(n)$ such that the run-length of $\beta$ is one more than the run-length of $\hat{\beta}$ and neither $\beta$ nor $\hat{\beta}$ are opp-reps, then $\beta$ appears before $\hat{\beta}$ in $Y'_n$.

The first step is to further refine the ordering of the cycles $R_1, R_2, \ldots, R_t$ used in the proof of Theorem 20 to prove that $O$ was a de Bruijn successor. To begin, recall that $R_1, R_2, \ldots, R_t$ was listed such that $R_t = \{1^n\}$, with the remaining $t-1$ cycles ordered in non-decreasing order with respect to the run-length of each cycle. Thus, $R_1 = \{0^n\}$. This listing is refined so that the cycles with the same run-lengths are ordered in increasing order with respect to the RLE of the RL2-rep. If two RL2-reps have the same RLE, then the cycle with RL2-rep starting with 0 comes first. Note that this refinement still satisfies the ordering of the cycles required in the proof.

Let $\alpha_i$, $\gamma_i$, $\sigma_i$ denote the opp-rep, LC2-rep, and the RL2-rep, respectively, for $R_i$. The only strings in $R_1$ and $R_t$ are opp-reps. Thus assume $\beta$ is in some $R_i$, where $1 < i < t$, such that the run-length of $\beta$ is one more than the run-length of $\hat{\beta}$ and neither $\beta$ nor $\hat{\beta}$ are opp-reps. This run-length constraint implies that the RLE of $\beta$ must begin with 1. As before, we have two special cases that are illustrated in the following example. In general, the special cases can be visualized by Figure 3 (a).

Example 8

Consider R i and R j with RL2-reps σ i = 01100110000 σ j = 10011001111 . Both haveRLE 12224. Note that σ j is in SP2 (11) . The corresponding LC2-reps are γ i = 10000110011 and γ j = 01111001100 . The conjugates of all four strings belong to the same cycle R k . Below is the orderthat the strings from R k appear in Y , based on Observation 13 (item 2). In particular take notice of thepositions of the four conjugates.0000001100100000110011 ← ˆ γ i ← ˆ σ j ← σ k , the RL2-rep, with RLE 1226110011111101001111110000111111001011111100111111110011011111001100 ← ˆ γ j ← ˆ σ i ← σ k ← α k = γ k , the opp-rep and LC2-rep for this cycle The ordering of the four conjugates from this example are formalized in the second item of thefollowing lemma. As a result of the lemma, and observing Figure 3 (a), we see that β comes before ˆ β in Y n for the two special cases: β = γ j when σ j = α j is special and β = σ i when σ i = σ j . (cid:73) Lemma 29.

Let $R_i$ and $R_j$ be cycles such that $\sigma_i$ and $\sigma_j$ have the same RLE, where 1 < i

Proof. By the ordering of the cycles, since $i < j$ it must be that $\sigma_i$ begins with 0 and $\sigma_j$ begins with 1; they belong to PCR-related cycles. Note that $\sigma_i = \overline{\sigma_j}$ and similarly $\gamma_i = \overline{\gamma_j}$. Thus $\hat{\sigma}_i = \overline{\hat{\sigma}_j}$ and $\hat{\gamma}_i = \overline{\hat{\gamma}_j}$, and each pair, respectively, belongs to the same CCR-related cycle.

Case (i): If $\sigma_j$ is not special, then $\alpha_i = \gamma_i$ and $\alpha_j = \gamma_j$. Since $1 < i < j < t$, the run-lengths of $R_i$ and $R_j$ must be greater than one. Since there is only one cycle with run-length $n$, the run-lengths of $R_i$ and $R_j$ must be less than $n$. Thus, suppose the RLE for $\sigma_i$ is $1^v r_1 r_2 \cdots r_m$, where $v > 0$ and $r_1 \geq 2$. Note also that $r_m \geq 2$, since otherwise there is a string with RLE $1^{v+1} r_1 \cdots r_{m-1}$ in $R_i$, which contradicts the fact that $\sigma_i$ is the RL2-rep. Also $m + v$ is odd, since $R_i$ is a PCR-related cycle. Then

    $\gamma_i$ has RLE $1\, r_m\, 1^{v-1} r_1 r_2 \cdots r_{m-1}$,
    $\hat{\gamma}_i$ has RLE $(r_m+1)\, 1^{v-1} r_1 r_2 \cdots r_{m-1}$,
    $\sigma_k$ has RLE $1^v r_1 r_2 \cdots (r_{m-1}+r_m)$, where $\hat{\gamma}_i \in R_k$.

The third item is obtained by applying the definition of an RL2-rep, using the fact that $\sigma_i$ is an RL2-rep. Since $R_k$ is a CCR-related cycle (it has even run-length m + v + 1), its RL2-rep $\sigma_k$ begins with 0. Note then that $\mathrm{PRR}^{r_m}(\hat{\gamma}_i) = \sigma_k$. From the discussion of LC2-reps, $\mathrm{PRR}^{r_{m-1}+r_m}(\gamma_k) = \sigma_k$. If there are $z$ strings in $R_k$, clearly $r_{m-1}+r_m$ will be less than $z$. Moreover, $\mathrm{PRR}^{z/2}(\hat{\gamma}_i) = \hat{\gamma}_j$ since $\overline{\hat{\gamma}_i} = \hat{\gamma}_j$. Thus by Observation 13 (item 2), $\hat{\gamma}_i$ comes before $\sigma_k$, which comes before $\hat{\gamma}_j$ in $Y_n$.

Case (ii): If $\sigma_j$ is special, then $\alpha_i = \gamma_i$ and $\alpha_j = \sigma_j$. Since $\sigma_j$ is special it begins with 1 and has RLE of the form $1\,x^z\,y$, where $z$ is odd and $y > x$.
Thus:

    $\hat{\sigma}_j$ has RLE $(x+1)\,x^{z-1}\,y$ and begins with 0,
    $\gamma_j$ has RLE $1\,y\,x^{z}$, and $\hat{\gamma}_j$ has RLE $(y+1)\,x^{z}$,
    $\sigma_k$ has RLE $1\,x^{z-1}\,(y+x)$, where $\hat{\sigma}_j \in R_k$, and begins with 0 since $R_k$ is a CCR-related cycle.

The final item is obtained by applying the definition of an RL2-rep, using the fact that $\sigma_i$ is an RL2-rep. Since $R_k$ is a CCR-related cycle we have $\alpha_k = \gamma_k$ and, based on the RLE of $\sigma_k$, the cycle contains $2n-2$ distinct strings. Thus for every string $\omega \in R_k$, $\mathrm{PRR}^{n-1}(\omega) = \overline{\omega}$. From the discussion of LC2-reps, $\mathrm{PRR}^{y+x}(\gamma_k) = \sigma_k$. Note also that $\mathrm{PRR}^{x}(\hat{\sigma}_j) = \sigma_k$. Observe now that $\mathrm{PRR}^{y}(\hat{\gamma}_j)$ has RLE $1\,x^{z-1}\,(y+x)$ and begins with 1; it is the complement of $\sigma_k$. Thus, $\mathrm{PRR}^{y}(\hat{\gamma}_i) = \sigma_k$. Putting it all together, recalling $\gamma_k = \alpha_k$ and $n - 1 = xz + y$, we have:

    $\mathrm{PRR}^{x}(\alpha_k) = \hat{\gamma}_i$
    $\mathrm{PRR}^{y}(\alpha_k) = \hat{\sigma}_j$
    $\mathrm{PRR}^{x+y}(\alpha_k) = \sigma_k$
    $\mathrm{PRR}^{x(z+1)+y}(\alpha_k) = \hat{\gamma}_j$
    $\mathrm{PRR}^{xz+2y}(\alpha_k) = \hat{\sigma}_i$

In Example 8, where $x = 2$, $z = 3$ and $y = 4$, these exponents are 2, 4, 6, 12 and 14. The result now follows from Observation 13 (item 2). ◀
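The positions used in Case (ii) can be checked by direct computation on the strings of Example 8. The following C sketch applies the feedback rule that appears in the appendix code (next bit = (a[1] + a[2] + a[n]) mod 2) and counts how many applications of PRR are needed to reach a given string. The helper names `prr_step` and `steps_to`, the fixed order `N = 11`, and the 0-indexed character-string representation are illustrative assumptions, not part of the paper's implementation.

```c
#include <assert.h>
#include <string.h>

#define N 11  /* order used in Example 8 (assumption: fixed for this sketch) */

/* One application of PRR to the N-bit string s (characters '0'/'1'):
   drop s[0] and append (s[0] + s[1] + s[N-1]) mod 2. */
static void prr_step(char s[N + 1]) {
    int bit = ((s[0] - '0') + (s[1] - '0') + (s[N - 1] - '0')) % 2;
    memmove(s, s + 1, N - 1);      /* shift left by one position */
    s[N - 1] = (char)('0' + bit);  /* append the feedback bit */
    s[N] = '\0';
}

/* Smallest k in 1..limit with PRR^k(start) == target, or -1 if there is none. */
static int steps_to(const char *start, const char *target, int limit) {
    char s[N + 1];
    int k;
    assert(strlen(start) == N && strlen(target) == N);
    strcpy(s, start);
    for (k = 1; k <= limit; k++) {
        prr_step(s);
        if (strcmp(s, target) == 0) return k;
    }
    return -1;
}
```

Starting from $\alpha_k = 10000001100$, `steps_to` returns 2, 4 and 6 for $\hat{\gamma}_i$, $\hat{\sigma}_j$ and $\sigma_k$, then 12 and 14 for $\hat{\gamma}_j$ and $\hat{\sigma}_i$, and the orbit returns to $\alpha_k$ after 20 steps.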

▶ Corollary 30. If $R_i$ and $R_j$ are cycles such that $\sigma_i$ and $\sigma_j$ have the same RLE, where $i < j$, then every string from $R_i$ appears before every string from $R_j$ in $Y_n$.

Proof.

In case (ii) from Lemma 29, since $\sigma_j$ is special, $\alpha_j = \sigma_j$ and $\alpha_i = \gamma_i$. Thus, an immediate consequence of Lemma 29 is that $\hat{\alpha}_i$ appears before $\hat{\alpha}_j$ in $Y_n$. Then by Observation 13 (items 5 and 3), every string in $R_i$ appears before every string in $R_j$ in $Y_n$. ◀

For all $\beta \in R_i$ other than these two special cases, assume that $\hat{\alpha}_i$ belongs to $R_j$ and $\hat{\beta}$ belongs to $R_k$ (see Figure 3 (b)). We will show that $j < k$ and subsequently that all strings in $R_j$ come before all strings in $R_k$ in $Y_n$. Suppose the RLE for $\sigma_i$ is $1^v r_1 r_2 \cdots r_m$, where $r_1 \geq 2$. Since $1 < i < t$, clearly $v > 0$. Then

    $\gamma_i$ has RLE $1\, r_m\, 1^{v-1} r_1 r_2 \cdots r_{m-1}$,
    $\hat{\gamma}_i$ has RLE $(r_m+1)\, 1^{v-1} r_1 r_2 \cdots r_{m-1}$,
    $\sigma_j$ has RLE $1^v r_1 r_2 \cdots (r_{m-1}+r_m)$, where $\hat{\gamma}_i \in R_j$.

The third item is obtained by applying the definition of an RL2-rep, using the fact that $\sigma_i$ is an RL2-rep. Moreover, note that $\sigma_j$ begins with 0 if $R_j$ corresponds to a CCR-related cycle. Otherwise, $R_i$ must correspond to a CCR-related cycle, which means that $\sigma_i$ begins with 0 and hence again $\sigma_j$ will begin with 0 based on the RLEs described above. We now consider two cases depending on whether or not $\beta = \overline{\alpha_i}$.

If $\beta = \overline{\alpha_i}$, then $R_i$ must be a CCR-related cycle. Thus $\alpha_i = \gamma_i$ begins with 0 and hence $\beta$ begins with 1. It is easy to see from the RLE of $\sigma_j$ noted above that it will also begin with 0. Thus $\sigma_k$, which will have the same RLE as $\sigma_j$, begins with 1. From the ordering defined on the cycles, $j < k$. Thus by Lemma 11, $\hat{\alpha}_j$ appears before $\hat{\alpha}_k$, which implies that all strings in $R_j$ appear before all strings in $R_k$ in $X_n$ by Observation 13 (item 3). Furthermore, Observation 13 (item 4) implies that $\beta$ will appear before $\hat{\beta}$ in $X_n$.

If $\beta \neq \overline{\alpha_i}$, then we first consider the case where either $\alpha_i$ or $\overline{\alpha_i}$ is special. We have already handled the two special cases where $\beta = \gamma_i$ or $\beta = \sigma_i$.
Since the RLE for $\beta$ must begin with 1 and α = β , $\beta$ must be of the form q y z − q +1 where z is odd and y > z and ≤ q ≤ z − . Thus $\hat{\beta}$ has RLE q − y z − q +1 . It is not hard to see that $\sigma_k$ will have a smaller RLE compared to $\sigma_j$, which is detailed in the proof of Lemma 11. A similar analysis can be done when neither $\alpha_i$ nor $\overline{\alpha_i}$ is special. For these cases, it is a relatively straightforward task to observe that the RLE for $\sigma_k$ is less than the RLE for $\sigma_j$, which means $j < k$. We can now apply the following lemma.

▶ Lemma 31. If $R_j$ and $R_k$ have the same run-length, where $j < k$, and $\sigma_j$ and $\sigma_k$ both begin with the same symbol, then every string in $R_j$ appears in $Y_n$ before any string in $R_k$.

Proof.

The proof is by induction on the levels of the related tree of cycles rooted at $R_1$. Recall $R_1$ = { n } and $R_t$ = { n }. The base case trivially holds for cycles with run-length 1, since we previously demonstrated that Y n begins with n . Now assume the result holds for all cycles at levels with run-length less than $\ell > 1$, and consider two cycles $R_j$ and $R_k$ with run-length $\ell$ such that $\sigma_j$ and $\sigma_k$ both begin with the same symbol. By the ordering of the cycles, the RLE of $\sigma_j$ is less than the RLE of $\sigma_k$. From Lemma 11, if $\sigma_j$ is special, then $\hat{\alpha}_j$ belongs to the same cycle as $\hat{\gamma}_j$; similarly for $\sigma_k$. Thus we need only focus on the RLE of the RL2-reps $\sigma_x$ and $\sigma_y$ for the cycles $R_x$ and $R_y$ containing $\hat{\gamma}_j$ and $\hat{\gamma}_k$, respectively. In our earlier analysis (case (i) in the proof of Lemma 11) we determined the RLE of these strings, and it can be observed that the RLE for $\sigma_x$ is less than the RLE for $\sigma_y$, since the RLE for $\sigma_j$ is less than the RLE for $\sigma_k$. Thus by the ordering of the cycles, $x < y$. As noted earlier, the RL2-reps of both $R_x$ and $R_y$ (any non-leaf in the related tree) must begin with 0. By induction, this means that every string from $R_x$ appears before every string from $R_y$ in $X_n$, and hence by Observation 13 (item 4), we have our result. ◀

Recall that $\sigma_j$ begins with 0 and $\sigma_j < \sigma_k$. Thus if $\sigma_k$ begins with 0, then the above lemma implies that all strings in $R_j$ appear before all strings in $R_k$. Otherwise, if $\sigma_k$ begins with 1, then it must correspond to a PCR-related cycle. Consider $R_{k'}$ containing the RL2-rep $\sigma_{k'}$, which begins with 0 and has the same RLE as $\sigma_k$. From the above lemma, all strings in $R_j$ will appear before all strings in $R_{k'}$, which in turn come before all strings in $R_k$ in $Y_n$ by Corollary 30. By applying Observation 13 (item 4), as we did earlier, we have that all strings in $R_i$, including $\beta$, will appear before all strings in $R_k$, including $\hat{\beta}$, in $Y_n$. This completes the proof of Proposition 21.

A Implementation of the de Bruijn successors $RL(\omega)$, $LC(\omega)$, and $S(\omega)$

    #include <stdio.h>
    #include <math.h>

    #define N_MAX 50

    int n;

    // =============================================================================
    // Compute the RLE of a[1..m] in run[1..r], returning r = run length
    // =============================================================================
    int RLE(int a[], int run[], int m) {
        int i,j,r,old;

        old = a[m+1];
        a[m+1] = 1 - a[m];      // sentinel that closes the final run
        r = j = 0;
        for (i=1; i<=m; i++) {
            if (a[i] == a[i+1]) j++;
            else { run[++r] = j+1; j = 0; }
        }
        a[m+1] = old;
        return r;
    }

    // ===============================================================================
    // Check if a[1..n] is a "special" RL representative. It must be that a[1] = a[n]
    // and the RLE of a[1..n] is of the form (21^j)^s1^t where j is even, s >=2, t>=2
    // ===============================================================================
    int Special(int a[]) {
        int i,j,r,s,t,run[N_MAX];

        if (a[1] != 0 || a[n] != 0) return 0;
        r = RLE(a,run,n);                       // (assumed) RLE of a[1..n]

        // Compute j of prefix 21^j
        if (run[1] != 2) return 0;
        j = 0;                                  // (assumed)
        while (j+2 <= r && run[j+2] == 1) j++;

        // Compute s of prefix (21^j)^s
        s = 1;
        while (s <= r/(1+j) -1 && run[s*(j+1)+1] == 2) {
            for (i=1; i<=j; i++) if (run[s*(j+1)+1+i] != 1) return 0;
            s++;                                // (assumed)
        }

        // Test that the remainder of the string after (21^j)^s is 1^t
        for (i=s*(j+1)+1; i<=r; i++) if (run[i] != 1) return 0;
        t = r - s*(j+1);                        // (assumed)

        if (s >= 2 && t >= 2 && j%2 == 0) return 1;
        return 0;
    }

    // =============================================================================
    // Apply PRR^{t+1} to a[1..n] to get b[1..n], where t is the length of the
    // prefix before the first 00 or 11 in a[2..n] up to n-2
    // =============================================================================
    int Shift(int a[], int b[]) {
        int i,t = 0;

        while (a[t+2] != a[t+3] && t < n-2) t++;
        for (i=1; i<=n; i++) b[i] = a[i];
        for (i=1; i<=n; i++) b[i+n] = (b[i] + b[i+1] + b[n+i-1]) % 2;
        for (i=1; i<=n; i++) b[i] = b[i+t+1];
        return t;
    }
    // =============================================================================
    // Test if b[1..len] is the lex largest rep (under rotation), if so, return the
    // period p; otherwise return 0. Eg. (411411, p=3)(44211, p=5) (411412, p=0).
    // =============================================================================
    int IsLargest(int b[], int len) {
        int i, p=1;

        for (i=2; i<=len; i++) {
            if (b[i-p] < b[i]) return 0;
            if (b[i-p] > b[i]) p = i;
        }
        if (len % p != 0) return 0;
        return p;
    }

    // =============================================================================
    // Membership testers not including the cycle containing 0101010...
    // =============================================================================
    int RLrep(int a[]) {
        int p,r,rle[N_MAX];

        r = RLE(a,rle,n-1);
        p = IsLargest(rle,r);

        // PCR-related cycle
        if (a[1] == a[n]) {
            if (r == n-1 && a[1] == 1) return 0;  // Ignore root a[1..n] = 1010101..
            if (r == 1) return 1;                 // (assumed) Special case: a[1..n] = 000..0 or 111..1
            if (p > 0 && a[1] != a[n-1] && (p == r || a[1] == 1 || p%2 == 0)) return 1;
        }
        // CCR-related cycle
        if (a[1] != a[n]) {
            if (p > 0 && a[1] == 1 && (a[n-1] == 1)) return 1;
        }
        return 0;
    }

    // =============================================================================
    int LCrep(int a[]) {
        int b[N_MAX];

        if (a[1] != a[2]) return 0;
        Shift(a,b);                               // (assumed)
        return RLrep(b);
    }

    // =============================================================================
    int SameRep(int a[]) {
        int b[N_MAX];

        Shift(a,b);
        if (Special(a) || (LCrep(a) && !Special(b))) return 1;
        return 0;
    }

    // =============================================================================
    // Repeatedly apply the Prefer-Same or LC or RL successor rule starting with 1^n
    // =============================================================================
    void DB(int type) {
        int i,j,v,a[N_MAX],REP;

        for (i=1; i<=n; i++) a[i] = 1;            // Initial string

        for (j=1; j<=pow(2,n); j++) {
            printf("%d", a[1]);
            v = (a[1] + a[2] + a[n]) % 2;
            REP = 0;

            // Membership testing of a[1..n]
            if (type == 1 && SameRep(a)) REP = 1;
            if (type == 2 && LCrep(a)) REP = 1;
            if (type == 3 && RLrep(a)) REP = 1;

            // Membership testing of conjugate of a[1..n]
            a[1] = 1 - a[1];
            if (type == 1 && SameRep(a)) REP = 1;
            if (type == 2 && LCrep(a)) REP = 1;
            if (type == 3 && RLrep(a)) REP = 1;

            // Shift String and add next bit
            for (i=1; i<n; i++) a[i] = a[i+1];    // (assumed)
            a[n] = (v + REP) % 2;                 // (assumed) complement v when a rep is found
        }
    }

    // (assumed) minimal driver: reads type (1 = Prefer-same, 2 = LC, 3 = RL) and n
    int main(void) {
        int type;

        if (scanf("%d %d", &type, &n) != 2) return 1;
        DB(type);
        printf("\n");
        return 0;
    }

B Implementation of the de Bruijn successors $RL2(\omega)$, $LC2(\omega)$, and $O(\omega)$

    #include <stdio.h>
    #include <math.h>

    #define N_MAX 50

    int n;

    // =============================================================================
    // Compute the RLE of a[s..m] in run[1..r], returning r = run length
    // =============================================================================
    int RLE(int a[], int run[], int s, int m) {
        int i,j,r,old;

        old = a[m+1];
        a[m+1] = 1 - a[m];      // sentinel that closes the final run
        r = j = 0;
        for (i=s; i<=m; i++) {
            if (a[i] == a[i+1]) j++;
            else { run[++r] = j+1; j = 0; }
        }
        a[m+1] = old;
        return r;
    }

    // ===============================================================================
    // Check if a[1..n] is a "special" RL representative: the RLE of a[1..n] is of
    // the form 1 x^j y where y > x and j is odd. Eg. 12224, 1111113 (PCR-related)
    // ===============================================================================
    int Special(int a[]) {
        int i,r,rle[N_MAX];

        r = RLE(a,rle,1,n);
        if (r%2 == 0) return 0;
        // (assumed) all middle runs must equal x = rle[2]
        for (i=3; i<r; i++) if (rle[i] != rle[2]) return 0;
        // (assumed) a[1] == 1 and rle[1] == 1; the test y > x is the surviving part
        if (a[1] == 1 && rle[1] == 1 && rle[r] > rle[2]) return 1;
        return 0;
    }

    // =============================================================================
    // Apply PRR^{t} to a[1..n] to get b[1..n], where t is the length of the
    // prefix in a[1..n] before the first 01 or 10 in a[2..n]
    // =============================================================================
    int Shift(int a[], int b[]) {
        int i,t=1;

        while (a[t+1] == a[t+2] && t < n-1) t++;
        for (i=1; i<=n; i++) b[i] = a[i];
        for (i=1; i<=n; i++) b[i+n] = (b[i] + b[i+1] + b[n+i-1]) % 2;
        for (i=1; i<=n; i++) b[i] = b[i+t];
        return t;
    }

    // =============================================================================
    // Test if b[1..len] is the lex smallest rep (under rotation), if so, return the
    // period p; otherwise return 0. Eg. (114114, p=3)(11244, p=5)(124114, p=0).
    // =============================================================================
    int IsSmallest(int b[], int len) {
        int i, p=1;

        for (i=2; i<=len; i++) {
            if (b[i-p] > b[i]) return 0;
            if (b[i-p] < b[i]) p = i;
        }
        if (len % p != 0) return 0;
        return p;
    }

    // =============================================================================
    // Membership testers with special case for 111111...1 (run length for a[2..n])
    // =============================================================================
    int RL2rep(int a[]) {
        int p,r,rle[N_MAX];

        r = RLE(a,rle,2,n);
        if (r == 1) return 1;           // (assumed) Special case: a[1..n] = 000..0 or 111..1
        if (a[1] == a[2]) return 0;
        p = IsSmallest(rle,r);          // (assumed)

        if (a[1] == a[n] && p > 0 && (p == r || a[1] == 0 || p%2 == 0)) return 1;  // PCR-related
        if (a[1] != a[n] && p > 0 && a[1] == 0) return 1;                          // CCR-related
        return 0;
    }

    // =============================================================================
    int LC2rep(int a[]) {
        int t,b[N_MAX];

        if (a[1] == a[2]) return 0;
        t = Shift(a,b);                 // (assumed)
        (void)t;
        return RL2rep(b);
    }

    // =============================================================================
    int OppRep(int a[]) {
        int b[N_MAX];

        Shift(a,b);
        if (Special(a) || (LC2rep(a) && !Special(b))) return 1;
        return 0;
    }

    // =============================================================================
    // Repeatedly apply the Prefer Opp or LC or RL successor rule starting with 0101...
    // =============================================================================
    void DB(int type) {
        int i,j,v,a[N_MAX],REP;

        // Initial string
        for (i=1; i<=n; i+=2) a[i] = 0;
        for (i=2; i<=n; i+=2) a[i] = 1;

        for (j=1; j<=pow(2,n); j++) {
            printf("%d", a[1]);
            v = (a[1] + a[2] + a[n]) % 2;
            REP = 0;

            // Membership testing of a[1..n]
            if (type == 1 && OppRep(a)) REP = 1;
            if (type == 2 && LC2rep(a)) REP = 1;
            if (type == 3 && RL2rep(a)) REP = 1;

            // Membership testing of conjugate of a[1..n]
            a[1] = 1 - a[1];
            if (type == 1 && OppRep(a)) REP = 1;
            if (type == 2 && LC2rep(a)) REP = 1;
            if (type == 3 && RL2rep(a)) REP = 1;

            // Shift String and add next bit
            for (i=1; i<n; i++) a[i] = a[i+1];    // (assumed)
            a[n] = (v + REP) % 2;                 // (assumed) complement v when a rep is found
        }
    }

    // (assumed) minimal driver: reads type (1 = Prefer-opposite, 2 = LC2, 3 = RL2) and n
    int main(void) {
        int type;

        if (scanf("%d %d", &type, &n) != 2) return 1;
        DB(type);
        printf("\n");
        return 0;
    }
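To see the run-length encoding and the rotation test in isolation, the sketch below restates the two helpers and runs them on the examples quoted in the `IsSmallest` comment (114114, 11244, 124114). The names `rle_of` and `is_smallest`, and the 0-indexed arrays without a sentinel, are illustrative assumptions; the listings above use 1-indexed arrays and temporarily overwrite `a[m+1]`.

```c
#include <assert.h>

/* Run-length encode bits[0..len-1] into run[0..r-1]; returns the number of runs r. */
static int rle_of(const int bits[], int len, int run[]) {
    int r = 0, i;
    for (i = 0; i < len; i++) {
        if (i > 0 && bits[i] == bits[i - 1]) run[r - 1]++;  /* extend current run */
        else run[r++] = 1;                                  /* start a new run */
    }
    return r;
}

/* If b[0..len-1] is lexicographically smallest among its rotations, return its
   period p; otherwise return 0. Same comparisons as IsSmallest, shifted to
   0-based indices. */
static int is_smallest(const int b[], int len) {
    int i, p = 1;
    for (i = 1; i < len; i++) {
        if (b[i - p] > b[i]) return 0;
        if (b[i - p] < b[i]) p = i + 1;
    }
    return (len % p == 0) ? p : 0;
}
```

For instance, encoding $\sigma_i = 01100110000$ from Example 8 with `rle_of` gives the five runs 1, 2, 2, 2, 4, and `is_smallest` returns 3, 5 and 0 on 114114, 11244 and 124114, matching the values in the comment.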