Extending de Bruijn sequences to larger alphabets
Verónica Becher and Lucas Cortés
Departamento de Computación, Facultad de Ciencias Exactas y Naturales & ICC, Universidad de Buenos Aires & CONICET, Argentina
July 2, 2019
Abstract
A circular de Bruijn sequence of order n in an alphabet of k symbols is a sequence in which each sequence of length n occurs exactly once. In this work we show that for each circular de Bruijn sequence v of order n in an alphabet of k symbols there is another circular de Bruijn sequence w, also of order n, in an alphabet with one more symbol, that is, an alphabet of k + 1 symbols, such that v is a subsequence of w and in between any two successive occurrences of the new symbol in w there are at most n + 2k − 2 consecutive symbols of v. We give an algorithm that receives as input such a sequence v and outputs a sequence w. We also give a much faster algorithm that receives as input such a sequence v and outputs a sequence w, but the new symbol may not be evenly spread out.

A rotation is the operation that moves the final symbol of a finite sequence to the first position while shifting all other symbols to the next position, or it is the composition of this operation with itself an arbitrary number of times. A circular sequence is the equivalence class of a sequence under rotations. We write [abc] to denote the circular sequence formed by the rotations of abc.

We say that a subsequence of a sequence a_1 a_2 … a_n is a sequence b_1 b_2 … b_k defined by b_i = a_{n_i} for 1 ≤ i ≤ k, where n_1 < n_2 < … < n_k is an increasing sequence of indices. The same applies to circular words, assuming any starting position. For example, [5, 6, 1, 2] is a subsequence of [1, 2, 3, 4, 5, 6].

A circular de Bruijn sequence of order n on a size-k alphabet A is a circular sequence of size k^n in which every possible size-n sequence on A occurs exactly once as a contiguous subsequence [8, 13]. See [5] for a fine presentation and history. To denote the set of circular de Bruijn sequences of order n in an alphabet of k symbols we write B(k, n). For example, [0, 0, 1, 1] is in B(2, 2).

In this note we show that for each circular de Bruijn sequence v of order n in an alphabet of k symbols there is another circular de Bruijn sequence w of order n but in an alphabet of k + 1 symbols such that v is a subsequence of w and such that in between two successive occurrences of the new symbol in w there are at most n + 2k − 2 consecutive symbols of v. We provide an algorithm that given such an input sequence v produces the output sequence w. And we give a much faster algorithm that also receives as input such a sequence v and outputs a sequence w without the guarantee of the fair distribution of the new symbol. Thus, Theorems 1 and 2 stated below are the main results of this note:

Theorem 1.
Given a circular de Bruijn sequence v in B(k, n) there is a circular de Bruijn sequence w in B(k + 1, n) such that v is a subsequence of w and for any 2k + n − 1 consecutive symbols in w there is at least one occurrence of the new symbol s. Moreover, there is an algorithm that given as input such a sequence v generates the sequence w after performing O(k^{3n−1}) mathematical operations and it uses O((k + 1)^n) space.

For example, take k = 2 and n = 3 in the alphabet A = {0, 1}, where the new symbol s is the symbol 2. Given the B(2, 3) sequence v = [1, …] we obtain the B(3, 3) sequence w = [1, …]. The symbol 2 occurs (k + 1)^{n−1} = 9 times in w and given any n + 2k − 1 consecutive symbols in w there is at least one occurrence of the symbol 2.

It is not hard to see that given a sequence v in B(k, n) there is a sequence w in B(k + 1, n) such that v is a subsequence of w. But we aim to guarantee that the new symbol s is fairly distributed along the extended de Bruijn sequence w. The first difficulty is to mathematically define this condition. The second difficulty is to prove the existence of such an extended sequence w and to provide an elegant and fast algorithm to construct it.

In addition to classical elements from graph theory such as de Bruijn graphs, Eulerian cycles and graph transformations, we use the Edmonds-Karp algorithm [9, 6]. The output sequence obtained by our algorithm has size (k + 1)^n. Thus, Theorem 1 states that the algorithm is practically cubic in the output size, and this time complexity is dominated by the Edmonds-Karp O(V E^2) time complexity when operating on a graph with V vertices and E edges.

In case we ask for no guarantee on the distribution of the new symbol in the extended sequence, we obtain a faster algorithm.

Theorem 2.
There is an algorithm that given a circular de Bruijn sequence v in B(k, n) generates a circular de Bruijn sequence w in B(k + 1, n) such that v is a subsequence of w, after performing at most O(n^2 (k + 1)^n) mathematical operations and it uses O((k + 1)^n) space.

The sequence w generated by the algorithm given in Theorem 2 has size (k + 1)^n. Thus, the time complexity of this solution is just above the size of the output. Precisely, for each symbol of the generated sequence w this second algorithm performs a number of operations that is the square of the logarithm of the size of the output sequence. The proof of Theorem 2 is elementary and it formalizes a natural intuition on how to extend a de Bruijn sequence to a larger alphabet. We shall see that the algorithm is greedy, making just some computations on each step.

The extension problem to a larger alphabet is dual to the extension problem studied by Becher and Heiber in [3], where they considered the problem of extending a sequence v in B(k, n) to a sequence w in B(k, n + 1) such that v is a suffix of w. Theorems 1 and 2 in this note appear in [7].

It is possible to conceive the problem of extension to a larger alphabet for particular families of de Bruijn sequences. Gabriel Thibeault in [15] proved that the lexicographically greatest de Bruijn sequence v in B(k, n) is the suffix of the lexicographically greatest sequence w in B(k + 1, n). Thus, for the lexicographically greatest de Bruijn sequence in B(k, n) there is a very simple solution to the problem stated in Theorem 2, which is to construct the lexicographically greatest de Bruijn sequence in B(k + 1, n), and this can be done with a greedy algorithm. A fast version of the algorithm for the lexicographically greatest de Bruijn sequence was obtained by Amram, Ashlagi, Rubin, Svoray, Schwartz and Weiss [2].
Schwartz, Svoray and Weiss recently considered in [14] the extension to a larger alphabet for the lexicographically greatest de Bruijn sequence.

It seems interesting to study the extension problem to a larger alphabet for other salient families. For instance, the semi-perfect de Bruijn sequences of Repke and Rytter [12], which satisfy that each of the prefixes (large enough) has the largest possible number of distinct words. Or the perfect sequences of Alvarez, Becher, Ferrari and Yuhjtman [1] which, for order n, contain each word of length n exactly n times but each one starting at different positions modulo n. Or the subtler nested perfect sequences of Mordechay Levin [11, Theorem 2], see also [4].

The document is organized as follows. In Section 2 we present the classical material on de Bruijn graphs and we fix the notation. In Section 3 we give the proof of Theorem 2 because it is simpler than that of Theorem 1. In Section 4 we elaborate the definition of fair distribution of the new symbol in the extended sequence and we devote Section 5, the last section of the paper, to the proof of Theorem 1.

Fix a finite alphabet A. Without loss of generality, when we consider an alphabet A of k symbols we assume A = {0, 1, …, k − 1}. As usual, we write A^n to denote the set of words of size n whose symbols belong to A. In the sequel we use the terms word and sequence interchangeably.

A de Bruijn graph G(k, n) is a directed graph (V, E) where V is the set of words of size n on a size-k alphabet A and whose set of edges E is the set of pairs (u, v) for u = a_1 a_2 … a_n and v = a_2 … a_n b with b ∈ A. Thus, the graph has k^n vertices and k^{n+1} edges, it is strongly connected and every vertex has the same in-degree and out-degree. Each circular de Bruijn sequence in B(k, n) can be constructed by taking a Hamiltonian cycle on the G(k, n) graph, given that each vertex of the graph is a word of size n in an alphabet of k symbols.
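The correspondence between B(k, n) and Hamiltonian cycles of G(k, n) can be checked mechanically. The following Python sketch is only an illustration: the function names de_bruijn_graph and is_de_bruijn are hypothetical, symbols are assumed to be the integers 0, …, k − 1, and vertices are represented as tuples.

```python
from itertools import product


def de_bruijn_graph(k, n):
    """Adjacency map of G(k, n): vertex a_1..a_n -> all words a_2..a_n b."""
    return {u: [u[1:] + (b,) for b in range(k)]
            for u in product(range(k), repeat=n)}


def is_de_bruijn(seq, k, n):
    """True if the circular sequence `seq` belongs to B(k, n)."""
    m = len(seq)
    if m != k ** n:
        return False
    graph = de_bruijn_graph(k, n)
    # Read the circular sequence as a closed walk over its length-n windows.
    windows = [tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)]
    steps_ok = all(windows[(i + 1) % m] in graph.get(windows[i], ())
                   for i in range(m))
    # Hamiltonian cycle: every vertex of G(k, n) is visited exactly once.
    return steps_ok and set(windows) == set(graph)
```

For instance, is_de_bruijn([0, 0, 1, 1], 2, 2) holds, whereas [0, 1, 0, 1] is rejected because the word 00 never occurs.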
Moreover, since the line graph of G(k, n) is G(k, n + 1), each circular de Bruijn sequence in B(k, n + 1) can be constructed as an Eulerian cycle in G(k, n). For example, in the G(2, 2) graph if one traverses the edge labelled 1 from 00, one arrives at 01, thereby indicating the presence of the contiguous subsequence 001 in the de Bruijn sequence.

Notice that G(k, n) is a subgraph of G(k + 1, n). To see that, observe that the vertices of the first graph are all the possible size-n words in a size-k alphabet and the vertices of the second graph are those of the first one plus all the possible size-n words in an alphabet of size k + 1 with at least one occurrence of the new symbol. Also, the edges of the second graph are the same as the ones from the first graph plus the ones representing words with at least one occurrence of the new symbol. This means that we can add vertices and edges to G(k, n) and obtain G(k + 1, n). This motivates the following definition of the augmenting graph D(k + 1, n).

Figure 1: The edges of the graph D(3, 2) are shown in dashed lines.

If w is a word on alphabet A and a is a symbol of A we write |w|_a to denote the number of occurrences of a in w. Similarly, if u is a word we write |w|_u to denote the number of occurrences of u in w.

Definition 3 (Augmenting graph). Let Â = A ∪ {s} where A has k symbols and s is a symbol not in A. We define the augmenting graph D(k + 1, n) = (V, E) where
V = Â^n,
E = {(v, w) : if v = a_1 … a_n then w = a_2 … a_n b where b ∈ Â and (|v|_s > 0 or |w|_s > 0)}.

To prove Theorems 1 and 2 we have to transform a given de Bruijn sequence in B(k, n) into a de Bruijn sequence in B(k + 1, n) in such a way that the first one is a subsequence of the second one. Thus, given an Eulerian cycle c in G(k, n − 1) we need to construct an Eulerian cycle in G(k + 1, n − 1) where we preserve the relative order of the edges in c. In the augmenting graph D(k + 1, n − 1) each of the vertices of G(k, n − 1) has exactly one incoming and one outgoing edge. Also observe that the outgoing edge is always labelled with the new symbol. So, the only way to define the expected Eulerian cycle in G(k + 1, n − 1) is by interleaving disjoint cycles of the augmenting graph D(k + 1, n − 1) on each of the vertices of G(k, n − 1). These are cycles in D(k + 1, n − 1) that we call petals. In order to do that, we use the following proposition.
Proposition 4. Fix an alphabet size k. The set of edges in G(k, n) can be partitioned into a set of cycles identified by the circular words of size n + 1.

Proof. First observe that we can unequivocally identify an edge of G(k, n) by concatenating the label of the vertex it leaves with the label of that edge. Thus, each edge of G(k, n) is identified with a word of size n + 1. Also this word identifies a circular word of size n + 1, which is the class of all the rotations of this word. Now notice that each circular word of size n + 1 corresponds to exactly one cycle in G(k, n). Thus the partition of the set of words of size n + 1 into the equivalence classes given by the rotations of these words determines a partition of the set of edges in G(k, n) into cycles.

Figure 2: Given a size-2 alphabet, there are 4 circular words of size 3: [000], [100], [110] and [111], each one associated with a cycle in G(2, 2).

Figure 3: Petals of the G(2, 2) graph. The left figure has a petal for the vertex 01 that only contains one cycle, the one associated to the circular word [012]. The right figure has a petal for the vertex 10 that contains three cycles associated to the circular words [102], [022] and [222].

In the following proposition we write ⊔ to denote the disjoint union of two sets. It states that the augmenting graph D(k + 1, n) contains the set of cycles associated to circular words of size n + 1 with at least one occurrence of the new symbol. It is immediate to verify that the proposition holds.

Proposition 5.
Let C be the set of cycles in G(k, n) associated to the circular words of size n + 1 in an alphabet of k symbols. Let Ĉ be the set of cycles in G(k + 1, n) associated to the circular words of size n + 1 in an alphabet of k + 1 symbols. Then Ĉ = C ⊔ P, where P is the set of cycles associated to the circular words of size n + 1 with at least one occurrence of the new symbol.

We are now ready to define a petal.

Definition 6 (Petal). A petal of G(k, n) is a cycle of cycles in D(k + 1, n) associated to circular words of size n + 1 that traverses only one vertex of G(k, n).

We aim to define the wanted Eulerian cycle in G(k + 1, n) as the given cycle c in G(k, n) interleaved with the petals of the augmenting graph D(k + 1, n).

Figure 4: A possible t(3, 3, 2) tree. The root r determines four petals, one for each branch. The first petal has the circular word [002], the second has [012], the third has [021], [022], [122] and [222] and the fourth has [112].

The difficulty lies in determining how to define petals using every edge of D(k + 1, n) and also how to interleave these petals in c to make sure that the occurrences of the new symbol are fairly distributed to satisfy the requirement of Theorem 1.

Definition 7 (Petals tree). Let A be an alphabet with cardinality k, r a circular de Bruijn sequence in B(k − 1, n) and s ∈ A such that s ∉ r. We define the Petals tree t(k, n, s) as a rooted tree subgraph of the directed graph (V ∪ {r}, E) where
V = {[w] : w ∈ A^n and |w|_s ≥ 1},
E = {([v], [w]) : v, w ∈ A^n, ∃ u ∈ A^{n−1}, |v|_u > 0, |w|_u > 0, |w|_s = |v|_s + 1} ∪ {(r, [v]) : |v|_s = 1}.
The vertices with distance 1 to the root have exactly one occurrence of the symbol s, and each vertex of t(k, n, s) with distance d to the root has exactly d occurrences of the new symbol.

When two vertices are connected in t(k, n, s) they have a common contiguous subsequence of size n − 1. We shall define a cycle that goes through several connected cycles. In order to compose two cycles u and v we traverse the first circular word u until we find a common vertex w such that the next edge in u is not labelled with the new symbol s. Observe that w has the same number of occurrences of s as u. Consequently, an edge labelled with s that starts from w corresponds to a circular word with more occurrences of the symbol s.

Figure 5: On the left we have the circular words [021] and [022] from the Petals tree and their associated cycles on the right. The first circular word has one occurrence of the symbol 2 and the second one has two. Their associated cycles have the common vertex 02. Suppose we traverse the first cycle starting from the vertex 21. We would go through the edges 0 and 2 until we get to the common vertex 02. At that point, we start traversing the second cycle starting with the symbol 2, which guarantees a circular word with two occurrences of the symbol 2. We traverse 2, 0 and 2. After that, we finish the first cycle with the label 1.

We give an algorithm that formalizes a common intuition on how to extend a de Bruijn sequence to a larger alphabet. Consider the graph for de Bruijn sequences of order n and alphabet in k + 1 symbols.

Every de Bruijn sequence v in B(k, n) is associated to an Eulerian cycle in G(k, n − 1). We follow this cycle inside G(k + 1, n − 1) and use the edges labelled with the new symbol s to extend the cycle. We already introduced a tool to traverse the de Bruijn graph G(k + 1, n − 1) using a Petals tree starting with a B(k, n) de Bruijn sequence q that represents an Eulerian cycle in the de Bruijn graph G(k, n − 1). At each step we look at the edges of the circular word determined by the current vertex w and the edge s. One possibility is that they have not been traversed. In this case, we start traversing the new cycle. Another possibility is that they are the current circular word. In this case we keep traversing the same circular word. A last possibility is that they have already been traversed. In this case we ignore this circular word.

In the following example we perform the first six steps of the algorithm just described. Assume as input a B(2, 3) de Bruijn sequence [00101110]. We begin the traversal in vertex 10 and immediately try to add a circular word with one occurrence of the symbol 2. The circular word of the vertex 10 and label 2 is [210]. We traverse the edge 2 to the vertex 02. Then again we try to find a new circular word by traversing another edge labelled 2. The vertex 02 with the edge labelled 2 determines the circular word [202] and we traverse the edge 2 to the vertex 22. Again, we search for a new circular word. The vertex 22 with the edge labelled 2 determines the circular word [222]. We traverse the edge 2 and get to the same vertex. Now, the circular word [222] is already used, so we have to go to the next edge of the current circular word. We traverse the edge labelled 0 to the vertex 20. Again, the circular word [220] is already in use, so we continue. When we get to the vertex 21 we again can start a new circular word, [221].

Figure 6: First steps of the algorithm with a B(2, 3) de Bruijn sequence [00101110] as input. We show the circular words added on the Petals tree.

Figure 7: Next steps of the algorithm with a B(2, 3) de Bruijn sequence [00101110] as input.

Algorithm 8
function extendDeBruijn(originalSequence: [Int])
    let alphabetSize = getSize(originalSequence)
    let newAlphabetSize = alphabetSize + 1
    let order = getOrder(originalSequence)
    let newSymbol = alphabetSize
    var sequence = originalSequence
    var pos = 0
    var vertex = originalSequence.last(order - 1)
    var visitedVertices = [false, ...]
    while pos < sequence.count do
        // label of the edge at the current position
        let edgeValue = sequence[pos]
        vertex = vertex.last(vertex.count - 1) + [edgeValue]
        pos += 1
        if !visitedVertices[vertex] then
            let edge = vertex + [newSymbol]
            var cycle = []
            for i in 0 ..< edge.count do
                // i-th rotation of the word that identifies the cycle
                let newEdge = edge.last(edge.count - i) + edge.first(i)
                // stop as soon as the rotation is back at the starting word
                if i > 0 && newEdge == edge then
                    break
                end if
                let newVertex = newEdge.first(order - 1)
                cycle += newEdge.last(1)
                if newEdge.last == newSymbol then
                    visitedVertices[newVertex] = true
                end if
            end for
            sequence = sequence.first(pos) + cycle + sequence.last(sequence.count - pos)
        end if
    end while
    return sequence
end function

3.2 An Algorithm to prove Theorem 2

Algorithm 8 takes a B(k, n) de Bruijn sequence v and returns a B(k + 1, n) de Bruijn sequence w such that v is a subsequence of w where the new symbol s occurs (k + 1)^{n−1} times in w.

The main idea of the algorithm is to traverse an array with the original sequence adding petals and cycles whenever possible. We first determine the alphabet size and the order of the de Bruijn sequence. To find the alphabet size we just have to count how many different symbols the sequence has. We can get the order of the sequence by solving k^n = edges, where edges is the length of the sequence. Then we make a copy of the original sequence that we will modify to get the extended sequence. There are several variables to keep track of things. The variable pos keeps track of the current position in the array and represents the edges that we already traversed. The variable vertex indicates in which vertex we are placed at each step. To make sure that we do not traverse any cycle more than once we have to keep track of every edge of a traversed cycle.
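For readers who want to experiment, Algorithm 8 can be transcribed into Python roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the alphabet size k and the order n are passed explicitly instead of being recovered with getSize and getOrder, a set of tuples plays the role of the array that marks used vertices, and the loop over rotations is assumed to stop when the rotation equals the starting word again. The names extend_de_bruijn, is_de_bruijn and is_subsequence are illustrative.

```python
def extend_de_bruijn(original, k, n):
    """Greedy extension of a B(k, n) sequence to B(k + 1, n), after Algorithm 8."""
    s = k                                    # the new symbol
    seq = list(original)
    vertex = tuple(original[len(original) - (n - 1):])
    visited = set()                          # vertices whose s-edge is already used
    pos = 0
    while pos < len(seq):
        # Follow the edge at the current position to its head vertex.
        vertex = (vertex + (seq[pos],))[1:]
        pos += 1
        if vertex not in visited:
            edge = vertex + (s,)             # word identifying the new cycle
            cycle = []
            for i in range(len(edge)):
                rot = edge[i:] + edge[:i]    # i-th rotation of the word
                if i > 0 and rot == edge:    # assumed exit test: cycle is closed
                    break
                cycle.append(rot[-1])
                if rot[-1] == s:
                    visited.add(rot[:-1])
            seq[pos:pos] = cycle             # splice the cycle in at this vertex
    return seq


def is_de_bruijn(seq, k, n):
    """True if every word of length n over k symbols occurs once, circularly."""
    m = len(seq)
    windows = {tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)}
    return m == k ** n and len(windows) == m and all(0 <= x < k for x in seq)


def is_subsequence(v, w):
    """True if v can be read inside w from left to right."""
    it = iter(w)
    return all(x in it for x in v)
```

On the B(2, 3) sequence [00101110] used in the example, extend_de_bruijn([0, 0, 1, 0, 1, 1, 1, 0], 2, 3) returns a list of 27 symbols that passes is_de_bruijn(w, 3, 3) and contains the input as a subsequence. Splicing with list slices keeps the sketch short but costs more than the bound of Lemma 9; a linked list would avoid that.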
We can reduce space by just keeping track of the vertices such that their outgoing edge labelled with the new symbol belongs to a traversed cycle. We keep track of them in the array visitedVertices. In this way we can unequivocally decide whether or not we should add a cycle at each vertex.

The main loop of Algorithm 8 iterates through every edge of the original sequence adding cycles. On each vertex v in position pos of the array we have two possibilities. If we already added the circular word determined by the concatenation of v and the new symbol s, we ignore that circular word, increment pos and go to the next vertex in the sequence. If we have not yet added that circular word, we have to add it. To do that, we traverse each edge of the cycle, add them to the sequence at the current position, and for those labelled with s we mark their outgoing vertex as visited.

Extra care is taken in writing the edges of the cycles. Notice that the cycle associated to a word of size n does not always have n edges. There are as many edges as distinct rotations of the word. The algorithm keeps adding edges of the cycle until it reaches the original vertex. Once the cycle is formed we place it in the current position and keep moving forward. This process continues until we reach the last position of the array.

Figure 8: The first steps of the algorithm for the B(2, 3) sequence 11000101.

Lemma 9.
Algorithm 8 has time complexity O(n^2 (k + 1)^n) and space complexity O((k + 1)^n) where k is the size of the alphabet and n is the order of the input de Bruijn sequence.

Proof. To calculate the space complexity observe that there are two big arrays. The visitedVertices array has size (k + 1)^{n−1} since it has a slot for each vertex of the (k + 1)-sized alphabet. But the actual output, the B(k + 1, n) sequence, will grow up to size (k + 1)^n. To calculate the time complexity of the main loop observe that we iterate (k + 1)^n times, which is the number of edges for the increased alphabet and also the final size of the sequence array. Then, for each vertex there can be a cycle to add. Adding a cycle has time complexity O(n^2). This is because we iterate through the edges of the cycle (up to n edges) and for each of those edges we check for the equality of words of size n. Then the main loop has time complexity O(n^2 (k + 1)^n).

Given an Eulerian cycle c in graph G(k, n) we created an Eulerian cycle c′ in the graph G(k + 1, n) with the property that c′ preserves the order of the edges in c. We achieved this by placing petals of the augmenting graph D(k + 1, n) on each vertex of G(k, n). Remember that each vertex in G(k, n) has k incoming and k outgoing edges. That means that we have k + 1 options to place a petal for each vertex in the Eulerian cycle. Only petals have edges labelled with the new symbol s and no edge in G(k, n) is labelled with s. So in order to have a fair distribution of the symbol s we need to interleave each petal in an appropriate part of the cycle. This motivates the following definition.

Definition 10 (Section of a cycle).
Given an Eulerian cycle c = e_0 → e_1 → ⋯ → e_n in G(k, n), the section j of c is a list of vertices of c composed by the head of each edge e_i of c such that ⌊i/k⌋ = j.

A G(k, n) de Bruijn graph g has k^n vertices and k^{n+1} edges, so an Eulerian cycle in g has k^n sections with k vertices each. Given that there are the same number of sections and vertices, we would like to choose one vertex from each section to place the petal in a way that every vertex is used exactly once. Each section has k vertices and each vertex in g belongs to k sections, not necessarily different.

Definition 11 (Petals Distribution graph). Given an Eulerian cycle c in a G(k, n) de Bruijn graph g, the Petals Distribution graph PD(k, n) is a k-regular bipartite graph in which the vertices of g and the sections of c are the two vertex classes and the edges of PD(k, n) are the set of (v, j) such that the vertex v belongs to the section j.

Given a graph G, a matching M in G is a set of edges such that no two edges share a common vertex. A vertex is matched if it is an endpoint of one of the edges in the matching. A perfect matching is a matching which matches all vertices of the graph.

Lemma 12.
For every Petals Distribution graph there is a perfect matching.

Proof. Let G be a finite bipartite graph with bipartite sets X and Y. For a set W of vertices in X, let N_G(W) denote the neighborhood of W in G, that is, the set of all vertices in Y adjacent to some element of W. Hall's marriage theorem [10] states that there is a matching that entirely covers X if and only if for every subset W of X, |W| ≤ |N_G(W)|. Let X be the set of vertices of the original graph and Y the set of vertices for the sections. For any W such that |W| = r, the sum of the degrees of the r vertices is rk. Given that the degree of any vertex in Y is k, these rk edges must reach at least r distinct vertices of Y, so |N_G(W)| ≥ r. Then there is a matching that entirely covers X. Furthermore, as |X| = |Y|, the matching is perfect.

Figure 9: The de Bruijn sequence [11000101] has four sections: the section 0 has the vertex 11 twice, the section 1 has the vertices 10 and 00, the section 2 has the vertices 00 and 01, and the section 3 has the vertices 10 and 01.

Figure 10: The Petals Distribution graph for the de Bruijn sequence [11000101]. The left figure shows the possible sections for each vertex. The right figure shows a possible assignment of those vertices and sections.

Figure 11: Flow network for PD(2, 2) where each edge has capacity 1.

In order to compute the perfect matching in a Petals Distribution graph we can use any method for computing the maximum flow in a network. We introduce two vertices s and t for the source and sink and add an edge from s to each vertex of X and an edge from each vertex of Y to t. We assign capacity 1 to each of the edges of the flow network. We can see that the maximum flow of the network is |X|, so this flow has the edges of a perfect matching.

Lemma 13.
Given a de Bruijn sequence v in B(k, n), for any 2k + n − 1 consecutive symbols in the new sequence w there is at least one occurrence of s.

Proof. First notice that each section of v has k vertices. Two petals can be at most 2k − 1 edges apart. Now, for a vertex v = a_1 a_2 … a_{n−1} of G(k, n − 1), in D(k + 1, n − 1) the outgoing edge is labelled with the new symbol s and that determines a cycle associated with the circular word [a_1 a_2 … a_{n−1} s]. In consequence, the tail vertex of the last edge in that cycle is s a_1 a_2 … a_{n−2}. This means that there is an edge labelled s exactly n edges before the end of the petal. In consequence, between the last occurrence of s in a petal and the first occurrence of s in the next petal there can be at most 2k + n − 2 symbols different from s, therefore we are guaranteed that given any 2k + n − 1 consecutive symbols there is at least one occurrence of s.

Algorithm 14 takes a B(k, n) de Bruijn sequence v and returns a B(k + 1, n) de Bruijn sequence w such that v is a subsequence of w, the new symbol s occurs (k + 1)^{n−1} times in w and given any 2k + n − 1 consecutive symbols in w there is at least one occurrence of s. This algorithm is similar to Algorithm 8, but balances the occurrences of the new symbol s. For this purpose, we have to find a maximum flow for the Petals Distribution graph. We use the Edmonds-Karp algorithm as described before to determine a vertex from each section to start a petal. Then we store that in the vertexForSection array.

In addition to the steps of Algorithm 8, we now keep track of the position in the original sequence, that is, how many edges of the original sequence we have already traversed. That is used to determine which is the current section and therefore what petal should be placed next.

In the main loop of Algorithm 14 we iterate through every edge of the original sequence adding cycles. On each vertex we check if we can add a cycle. But in this case, if the current vertex belongs to the original graph then adding a cycle implies starting a petal. For that reason, in those cases we have to check if the current vertex can start a petal for the current section, otherwise we do not add the cycle. If the vertex does not belong to the original graph then to add a cycle we just have to check that such cycle has not been already used, because we are not starting a petal.
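The assignment stored in vertexForSection can be prototyped in a few lines. The Python sketch below is an illustration under assumptions: it computes the sections of Definition 10 for a sequence given as a list of integers, and it finds the perfect matching of Lemma 12 with a plain augmenting-path search (Kuhn's algorithm) instead of the Edmonds-Karp maximum flow used in the text, which yields a matching of the same size on this unit-capacity network. The names sections and vertex_for_section are hypothetical.

```python
def sections(seq, k, n):
    """Sections of the Eulerian cycle of G(k, n - 1) spelled by a B(k, n) sequence."""
    vertex = tuple(seq[len(seq) - (n - 1):])
    heads = []
    for label in seq:
        vertex = (vertex + (label,))[1:]     # head of the edge with this label
        heads.append(vertex)
    # Edge e_i belongs to section floor(i / k).
    return [heads[j * k:(j + 1) * k] for j in range(len(seq) // k)]


def vertex_for_section(seq, k, n):
    """Choose one vertex per section so that every vertex is chosen exactly once."""
    secs = sections(seq, k, n)
    owner = {}                               # vertex -> index of its section

    def try_assign(j, seen):
        # Look for an augmenting path that starts at section j.
        for v in secs[j]:
            if v in seen:
                continue
            seen.add(v)
            if v not in owner or try_assign(owner[v], seen):
                owner[v] = j
                return True
        return False

    for j in range(len(secs)):
        if not try_assign(j, set()):
            raise ValueError("no perfect matching; input is not in B(k, n)")
    chosen = [None] * len(secs)
    for v, j in owner.items():
        chosen[j] = v
    return chosen
```

For the sequence [11000101] of Figure 9, sections returns the four lists described in that caption, and vertex_for_section picks a distinct vertex from each of them.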
The rest of the algorithm works in the same way as Algorithm 8.

Lemma 15.
For an input B(k, n) circular de Bruijn sequence, Algorithm 14 produces a B(k + 1, n) circular de Bruijn sequence performing at most O(k^{3n−1}) operations and using O((k + 1)^n) space.

Algorithm 14
function extendDeBruijn(originalSequence: [Int])
    let alphabetSize = getSize(originalSequence)
    let newAlphabetSize = alphabetSize + 1
    let order = getOrder(originalSequence)
    let newSymbol = alphabetSize
    let vertexForSection = EdmondsKarp(originalSequence)
    var sequence = originalSequence
    var originalSequencePos = 0
    var pos = 0
    var vertex = originalSequence.last(order - 1)
    var visitedVertices = [false, ...]
    while pos < sequence.count do
        let edgeValue = sequence[pos]
        // only edges without the new symbol advance the original position
        originalSequencePos += (vertex + [edgeValue]).contains(newSymbol) ? 0 : 1
        vertex = vertex.last(vertex.count - 1) + [edgeValue]
        pos += 1
        let section = floor(originalSequencePos / alphabetSize)
        let shouldStartPetal = !vertex.contains(newSymbol) && vertex == vertexForSection[section] && !visitedVertices[vertex]
        let shouldAddcycle = vertex.contains(newSymbol) && !visitedVertices[vertex]
        if shouldStartPetal || shouldAddcycle then
            let edge = vertex + [newSymbol]
            var cycle = []
            for i in 0 ..< edge.count do
                let newEdge = edge.last(edge.count - i) + edge.first(i)
                // stop as soon as the rotation is back at the starting word
                if i > 0 && newEdge == edge then
                    break
                end if
                let newVertex = newEdge.first(order - 1)
                cycle += newEdge.last(1)
                if newEdge.last == newSymbol then
                    visitedVertices[newVertex] = true
                end if
            end for
            sequence = sequence.first(pos) + cycle + sequence.last(sequence.count - pos)
        end if
    end while
    return sequence
end function

Proof of Lemma 15.
The space complexity of Algorithm 14 is the same as the one for Algorithm 8 given that the only addition in space is the vertexForSection array that has size k^{n−1}, which is smaller than visitedVertices. Regarding time complexity note that the search of the maximum flow is the most expensive operation of the algorithm. To see this remember that the Edmonds-Karp algorithm has running time O(V E^2), see [9, 6]. In our case, the vertices of the flow graph are the vertices of the original de Bruijn graph and the section vertices, so V = 2k^{n−1}. Also notice that there are k + 2 edges in the flow graph associated to each vertex of the original de Bruijn graph. So E = (k + 2) k^{n−1} and then the Edmonds-Karp time complexity is
O((2k^{n−1}) · ((k + 2) k^{n−1})^2) = O(k^{3n−1}).
This is higher than the main loop time complexity O(n^2 (k + 1)^n). This completes the proof.
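The distribution guarantee of Theorem 1 can be tested on any candidate output by measuring the longest circular run of symbols different from s. The following Python sketch is illustrative only; the names longest_run_without and every_window_has are hypothetical, and the window length is left as a parameter.

```python
def longest_run_without(w, s):
    """Length of the longest circular run of symbols of w that differ from s."""
    m = len(w)
    if s not in w:
        return m
    start = w.index(s)                  # begin the scan at an occurrence of s
    best = run = 0
    for i in range(1, m + 1):
        if w[(start + i) % m] == s:
            best = max(best, run)
            run = 0
        else:
            run += 1
    return best


def every_window_has(w, s, length):
    """True if every circular window of `length` symbols of w contains s."""
    return longest_run_without(w, s) < length
```

For the B(3, 2) sequence [0, 2, 2, 0, 0, 1, 2, 1, 1] the longest run without the symbol 2 has length 3, so every window of 4 consecutive symbols contains a 2, while some window of 3 symbols does not.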
Acknowledgements.
This research was supported by grant PICT-2014-3260 from Agencia Nacional de Promoción Científica y Tecnológica, Argentina. Becher is a researcher in Laboratoire International Associé SINFIN Université Paris Diderot-CNRS/Universidad de Buenos Aires-CONICET.
References

[1] Nicolás Álvarez, Verónica Becher, Pablo Ferrari, and Sergio Yuhjtman. Perfect necklaces. Advances in Applied Mathematics, 80:48–61, 2016.
[2] Gal Amram, Yair Ashlagi, Amir Rubin, Yotam Svoray, Moshe Schwartz, and Gera Weiss. An efficient shift rule for the prefer-max de Bruijn sequence. Discrete Mathematics, 342(1):226–232, 2019.
[3] Verónica Becher and Pablo Ariel Heiber. On extending de Bruijn sequences. Information Processing Letters, 111(18):930–932, 2011.
[4] Verónica Becher and Olivier Carton. Normal numbers and nested perfect necklaces. Journal of Complexity, in press, 2019.
[5] Jean Berstel and Dominique Perrin. The origins of combinatorics on words. European Journal of Combinatorics, 28(3):996–1022, 2007.
[6] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2009.
[7] Lucas Cortés. Extending de Bruijn sequences to larger alphabets, 13 December 2018. Tesis de Licenciatura en Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires. Director: Verónica Becher.
[8] Nicolaas G. de Bruijn. A combinatorial problem. Nederl. Akad. Wetensch., Proc., 49:758–764 = Indagationes Math. 8, 461–467 (1946), 1946.
[9] Jack Edmonds and Richard M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM, 19(2):248–264, 1972.
[10] Philip Hall. On representatives of subsets. Journal of the London Mathematical Society, 10, 1935.
[11] Mordechay B. Levin. On the discrepancy estimate of normal numbers. Acta Arithmetica, 88(2):99–111, 1999.
[12] Damian Repke and Wojciech Rytter. On semi-perfect de Bruijn words. Theoretical Computer Science, 720:55–63, 2018.
[13] Camille Flye Sainte-Marie. Question 48. L'interm. des math., 1:107–110, 1894.
[14] Moshe Schwartz, Yotam Svoray, and Gera Weiss. On embedding de Bruijn sequences by increasing the alphabet size. arXiv:1906.06157, 2019.
[15] Gabriel Thibeault. Greatest de Bruijn sequences in many colors, ongoing 2018.