Speeding up the AIFV-2 dynamic programs by two orders of magnitude using Range Minimum Queries
Mordecai Golin¹ (Hong Kong UST, [email protected]) and Elfarouk Harb (Hong Kong UST, [email protected])
Abstract
AIFV-2 codes are a new method for constructing lossless codes for memoryless sources that provide better worst-case redundancy than Huffman codes. They do this by using two code trees instead of one and also allowing some bounded delay in the decoding process. Known algorithms for constructing AIFV-2 codes are iterative; at each step they replace the current code tree pair with a "better" one. The current state of the art for performing this replacement is a pair of Dynamic Programming (DP) algorithms that use $O(n^5)$ time to fill in two tables, each of size $O(n^3)$ (where $n$ is the number of different characters in the source).

This paper describes how to reduce the time for filling in the DP tables by two orders of magnitude, down to $O(n^3)$. It does this by introducing a grouping technique that permits separating the $\Theta(n^3)$-space tables into $\Theta(n)$ groups, each of size $O(n^2)$, and then using Two-Dimensional Range Minimum Queries (RMQs) to fill in each group's table entries in $O(n^2)$ time. This RMQ speedup technique seems to be new and might be of independent interest.

Keywords:
AIFV Codes, Dynamic Programming Speedups, Range Minimum Queries
1. Introduction
Almost Instantaneous Fixed to Variable-2 (AIFV-2) codes were introduced recently in a series of papers [9, 10, 14, 15]. Similar to Huffman Codes, these provide lossless encoding for a fixed probabilistic memoryless source. They differ from Huffman codes in that they use a pair of coding trees instead of just one tree, sometimes coding using the first and sometimes using the second. They also no longer provide instantaneous decoding. Instead, decoding might require a bounded delay. That is, it might be necessary to read up to 2 extra characters after a codeword ends before certifying the completion (and decoding) of the codeword. The advantage of AIFV-2 codes over Huffman codes is that they guarantee redundancy of at most 1/2.

¹ Work partially supported by Hong Kong RGC CERG Grant 16213318
Preprint submitted to Elsevier (arXiv [cs.DS]).

Known algorithms for constructing AIFV-2 codes are iterative; at each step they replace the current code tree pair with a better pair. The original paper [15] only proved that its iterative algorithm terminated. This was improved to polynomially many steps by [7], which used only $O(b)$ iterations, where $b$ is the maximum number of bits used to encode one of the input source probabilities.

Each iterative step of [15]'s algorithm was originally implemented using an exponential time Integer Linear Program. This was later improved by [10] to $O(n^5)$ time, using Dynamic Programming (DP) to replace the ILP; $n$ is the number of different characters in the original source.

The purpose of this paper is to show how the DP method can be sped up to $O(n^3)$ time. Combined with [7], this yields an $O(n^3 b)$ time algorithm for constructing AIFV-2 codes.

Historically, there have been two major approaches to speeding up DPs. The first is the Knuth-Yao Quadrangle-Inequality method [11, 12, 16, 17]. The second is the use of "monotonicity" or the "Monge Property" and the application of the SMAWK algorithm [1] [3, Section 3.8] ([13] provides a good example of this approach). There are also variations, e.g., [5], that, while not exactly one or the other, share many of their properties. [2] provides a recent overview of the techniques available.

Both methods improve running times by "grouping" calculations. More specifically, they all essentially fill in a DP table of size $\Theta(n^k)$, for some $k$, in which calculating an individual table entry requires $\Theta(n)$ work. Thus, a priori, filling in the table seems to require $\Theta(n^{k+1})$ time. The speedups work by grouping the entries in sets of size $\Theta(n)$ and calculating all entries in the group in $\Theta(n)$ time. The Quadrangle-Inequality approach does this via amortization while the SMAWK approach does this by a transformation into another problem (matrix row-minima calculation).
Both approaches lead to a $\Theta(n)$ speedup, permitting filling in the table in an optimal $\Theta(n^k)$ time.

Both DPs in [10] have $O(n^3)$ size tables with each entry requiring $\Theta(n^2)$ individual evaluation time, leading to the $O(n^5)$ time algorithms. The main contribution of this paper is the development of new grouping techniques that permit speeding up the DPs by $\Theta(n^2)$, decreasing the running times to $O(n^3)$. More specifically, the table entries are now partitioned into $\Theta(n)$ groups, each containing $\Theta(n^2)$ entries. For each group, a $\Theta(n) \times \Theta(n)$ sized rectangular matrix $M$ is then built; calculating the value of each table entry in the group is shown to be equivalent to performing a Two-Dimensional Range Minimum (2D RMQ) query on $M$ (along with $O(1)$ extra work). Known results [18] on 2D RMQ queries imply that $O(n^2)$ queries can be implemented using a total of $O(n^2)$ time. Thus all entries in each group of size $\Theta(n^2)$ can be evaluated in $O(n^2)$ time, leading to an $O(n^3)$ time algorithm.

To the best of our knowledge this is the first time 2D RMQs have been used for speeding up Dynamic Programming in this fashion, so this technique might be of independent interest.

Section 2 quickly reviews known facts about 2D RMQs. It also introduces the two specialized versions of RMQs that will be needed and shows that they can be solved even more simply (practically) than standard RMQs. Section 3 is the main result of the paper. It states (before derivation) the two DPs of interest and then describes the new technique to reduce their evaluation from $\Theta(n^5)$ to $\Theta(n^3)$. The remainder of the paper then provides the backstory. Section 4 defines the motivating AIFV-2 problem and the technique for solving it. Finally, Section 5 describes the derivation of the AIFV-2 DPs that were solved in Section 3.
We emphasize that while these DPs are not exactly the ones introduced in [10], they are very similar and were derived using the same observations and basic tools (the top-down signature technique of [6, 4]). The derivation of these new DPs was necessary, though. Their slightly different structure is what permits successfully applying the 2D RMQ technique to them.

We conclude by noting that AIFV-2 codes were later extended to AIFV-$m$ codes by [9]. These replace the pair of coding trees by an $m$-tuple. The iterative algorithms for constructing these codes use $O(n^{2m+1})$ time DP algorithms that fill in size $O(n^{m+1})$ DP tables as subroutines. An interesting direction for future work is whether it is possible to reduce the running times of evaluating those DP tables by a factor of $\Theta(n^m)$ via the use of the corresponding $m$D RMQ algorithms from [18]. This would require a much better understanding of the structure of those DPs in [9] than currently exists.
2. Range Minimum Queries
As mentioned, the speedup in evaluating the DPs will result from grouping and then using Range Minimum Queries (RMQs). This section quickly reviews facts about RMQs for later use.
Definition 1 (2D RMQ). Let $M = (M_{i,j})$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. The two-dimensional range minimum query (2D RMQ) problem is, for $0 \le a_1 \le a_2 \le m$ and $0 \le b_1 \le b_2 \le n$, to return the value
$$RMQ(M : a_1, a_2, b_1, b_2) \triangleq \min \{ M_{i,j} : a_1 \le i \le a_2,\ b_1 \le j \le b_2 \}$$
and indices $i', j'$, $a_1 \le i' \le a_2$, $b_1 \le j' \le b_2$, such that $M_{i',j'} = RMQ(M : a_1, a_2, b_1, b_2)$. This can be solved using
Lemma 1 ([18]). Let $M = (M_{i,j})$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. There is an $O(mn)$ time algorithm to preprocess $M$ that permits answering any subsequent 2D RMQ query in $O(1)$ time.

While theoretically optimal, the algorithm in [18] is quite complicated. To make the speedup more practical to implement, we note in advance that all of the RMQ queries used later will be one of the two following specialized types:
Definition 2. Let $M = (M_{i,j})$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. Let $0 \le a \le m$, $0 \le b \le n$. See Figure 1.
• Define a restricted column query as $RCQ(M : a, b) \triangleq RMQ(M : a, m, b, b)$.
• Define a restricted RMQ query as $RRMQ(M : a, b) \triangleq RMQ(M : a, m, 0, b)$.
Figure 1: Illustration of Definition 2. $M$ is an $(m+1) \times (n+1)$ matrix. $RCQ(M : a', b')$ is the minimum of the entries in the long thin blue column descending down from $(a', b')$. $RRMQ(M : a, b)$ is the minimum of the entries in the blue rectangle with upper-right corner $(a, b)$.

Directly from the definition,
$$\forall b, \quad RCQ(M : a, b) = \begin{cases} M_{m,b} & \text{if } a = m, \\ \min\left(M_{a,b},\ RCQ(M : a+1, b)\right) & \text{if } a < m. \end{cases}$$
Thus, the values of all of the $\Theta(mn)$ possible $RCQ(M : a, b)$ queries (and the associated indices at which the minimization occurs) can be easily calculated in $\Theta(mn)$ time.

Also directly from the definitions,
$$RRMQ(M : a, b) = \begin{cases} RCQ(M : a, 0) & \text{if } b = 0, \\ \min\left(RRMQ(M : a, b-1),\ RCQ(M : a, b)\right) & \text{if } b > 0. \end{cases}$$
Thus, once all of the $RCQ(M : a, b)$ values have been precalculated, the values of all of the $\Theta(mn)$ possible $RRMQ(M : a, b)$ queries (and the associated indices at which the minimization occurs) can also be easily calculated in $\Theta(mn)$ time.

For later use we collect this in a lemma.

Lemma 2.
Let $M$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. There is an $O(mn)$ time algorithm that calculates the answers to all of the possible $RCQ(M : a, b)$ and $RRMQ(M : a, b)$ queries.
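The two recurrences above translate directly into code. The following Python sketch (0-indexed, values only; carrying the minimizing indices along is analogous) precomputes every RCQ and RRMQ answer in $\Theta(mn)$ total time. The function name is ours:

```python
def precompute_rcq_rrmq(M):
    """Precompute all RCQ and RRMQ answers for a 0-indexed matrix M.

    RCQ[a][b]  = min of M[i][b]  over a <= i <= last row  (restricted column query)
    RRMQ[a][b] = min of M[i][j]  over a <= i <= last row, 0 <= j <= b
    Both tables are filled in Theta(mn) total time via the two recurrences."""
    rows, cols = len(M), len(M[0])
    INF = float("inf")
    RCQ = [[INF] * cols for _ in range(rows)]
    RRMQ = [[INF] * cols for _ in range(rows)]
    for b in range(cols):
        RCQ[rows - 1][b] = M[rows - 1][b]
        for a in range(rows - 2, -1, -1):        # work up each column
            RCQ[a][b] = min(M[a][b], RCQ[a + 1][b])
    for a in range(rows):
        RRMQ[a][0] = RCQ[a][0]
        for b in range(1, cols):                 # sweep left to right
            RRMQ[a][b] = min(RRMQ[a][b - 1], RCQ[a][b])
    return RCQ, RRMQ
```

After this $\Theta(mn)$ precomputation, every restricted query used later in the paper is a single table lookup.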
3. The Dynamic Program and its speedup

Definition 3.
Let $p_1, \ldots, p_n$ be given such that $\forall i,\ p_i > 0$ and $\sum_{i=1}^n p_i = 1$. Set
$$W_m \triangleq \sum_{j \le m} p_j, \qquad W'_m \triangleq \sum_{j > m} p_j = 1 - W_m$$
and, for $m' < m$,
$$W_{m',m} \triangleq \sum_{m' < j \le m} p_j.$$

Definition 4 (The Signature Set and costs). Let $C$ ($0 \le C \le 1$) be fixed.
• Define $S_n \triangleq \{ (m; p; z) : 0 \le z \le m \le n \text{ and } 0 \le p \le n \}$ to be the signature set for the problem of size $n$.
• Let $(m'; p'; z') \ne (m; p; z) \in S_n$. We say $(m'; p'; z')$ can be expanded into $(m; p; z)$, denoted by $(m'; p'; z') \to (m; p; z)$, if there exist $e_0, e_1$ satisfying
$$e_0, e_1 \ge 0, \quad 0 \le e_0 + e_1 \le p' \tag{1}$$
and
$$m = m' + e_0 + e_1, \tag{2}$$
$$z = e_1, \tag{3}$$
$$p = z' + 2(p' - e_0 - e_1). \tag{4}$$
• For $\alpha \in S_n$, define the immediate predecessor set of $\alpha$ to be $P(\alpha) \triangleq \{ \alpha' \in S_n : \alpha' \to \alpha \}$.
• Let $\alpha_1, \alpha_2 \in S_n$. We say that $\alpha_1$ leads to $\alpha_2$, denoted by $\alpha_1 \leadsto \alpha_2$, if there exists a path from $\alpha_1$ to $\alpha_2$ using "$\to$".
• Let $I \subset S_n$ and $\alpha \in S_n$. We say that $I \leadsto \alpha$ if $\alpha \notin I$ and there exists $\alpha' \in I$ such that $\alpha' \leadsto \alpha$.
• Let $\alpha' = (m'; p'; z')$ and $\alpha = (m; p; z)$ where $\alpha' \to \alpha$. The associated expansion costs are
$$c_0(\alpha', \alpha) \triangleq W'_{m'} + C\,W_{m'-z',m'}, \qquad c_1(\alpha', \alpha) \triangleq W'_{m'} - C\,W_{m',m-z}.$$

The two dynamic programs used in the construction of AIFV-2 codes are given in the next definition.

Definition 5 (The $OPT_s(\alpha)$ tables).
• Let $I_0 \subset S_n$ be a given initial set (independent of $n$) for the $OPT_0$ table with known values $\bar{c}_0(\alpha)$ for $\alpha \in I_0$.
Now define
$$OPT_0(\alpha) = \begin{cases} \bar{c}_0(\alpha) & \text{if } \alpha \in I_0, \\ \min_{\alpha' \in P(\alpha)} \{ OPT_0(\alpha') + c_0(\alpha', \alpha) \} & \text{if } I_0 \leadsto \alpha, \\ \infty & \text{otherwise.} \end{cases}$$
• Let $I_1 \subset S_n$ be a given initial set (independent of $n$) for the $OPT_1$ table with known values $\bar{c}_1(\alpha)$ for $\alpha \in I_1$. Now define
$$OPT_1(\alpha) = \begin{cases} \bar{c}_1(\alpha) & \text{if } \alpha \in I_1, \\ \min_{\alpha' \in P(\alpha)} \{ OPT_1(\alpha') + c_1(\alpha', \alpha) \} & \text{if } I_1 \leadsto \alpha, \\ \infty & \text{otherwise.} \end{cases}$$
• Furthermore, for $s \in \{0, 1\}$, for $\alpha \notin I_s$ with $I_s \leadsto \alpha$, set
$$Pred_s(\alpha) \triangleq \operatorname{argmin}_{\alpha' \in P(\alpha)} \{ OPT_s(\alpha') + c_s(\alpha', \alpha) \}.$$

The $\bar{c}_s(\alpha)$ for $\alpha \in I_s$ are the initial conditions for the corresponding dynamic programs. For intuition, let $G_s(n)$ be the directed graph with vertices $\alpha \in S_n$, with the cost of edge $(\alpha', \alpha)$ being the expansion cost $c_s(\alpha', \alpha)$, except that edges from $(0; 0; 0)$ to $\alpha \in I_s$ have cost $\bar{c}_s(\alpha)$ and edges that are not expansions have costs set to $\infty$. Then $OPT_s(\alpha)$ is just the cost of the shortest path from $(0; 0; 0)$ to $\alpha$ in $G_s(n)$. The actual path can be found by following the $Pred_s(\alpha)$ pointers backward from $\alpha$. By definition, the expansion costs $c_s(\alpha', \alpha)$ are all non-negative, so the $OPT_s(\alpha)$ values are all well-defined.

The next set of lemmas will imply that $G_s(n)$ is a Directed Acyclic Graph, so the recurrences define a Dynamic Program. They will also suggest an efficient grouping mechanism, leading to fast evaluation.

Lemma 3. Let $(m'; p'; z'), (m; p; z) \in S_n$. Then $(m'; p'; z') \to (m; p; z)$ if and only if all of
$$2m' + 2p' + z' = 2m + p, \tag{5}$$
$$m' + p' \ge m, \tag{6}$$
$$m' \le m - z, \tag{7}$$
$$(p', z') \ne (0, 0) \tag{8}$$
are satisfied.

Proof. First assume that $(m'; p'; z') \to (m; p; z)$. Let $e_0, e_1$ be the unique pair that satisfies (1)-(4).
Then (5) follows from
$$2m' + 2p' = 2(m - e_0 - e_1) + (p - z' + 2e_0 + 2e_1) = 2m + p - z';$$
(6) follows from
$$m' + p' \ge m' + e_0 + e_1 = m;$$
(7) follows from
$$m - z = m - e_1 = m' + e_0 \ge m'.$$
(8) follows from the fact that the combination of $(p', z') = (0, 0)$ and Definition 4 would imply $p = -2(e_0 + e_1)$. Since $p \ge 0$, this further implies $e_0 = e_1 = 0$ and thus $m = m'$ and $p = z = 0$. This would contradict $(m'; p'; z') \ne (m; p; z)$.

For the other direction assume that Equations (5)-(8) all hold. We will show that Equations (1)-(4) with $(m'; p'; z') \ne (m; p; z)$ also all hold with $e_0 = m - m' - z$ and $e_1 = z$. Equations (2) and (3) are trivially satisfied. (4) follows from
$$p = 2m' + 2p' + z' - 2m = z' - 2(m - m') + 2p' = z' - 2(e_0 + e_1) + 2p' = z' + 2(p' - e_0 - e_1).$$
Next note that $e_1 = z \ge 0$, that $e_0 = m - z - m' \ge 0$ by (7), and that by (6), $p' \ge m - m' = e_0 + e_1$, so Equation (1) holds.

It only remains to show that $(m'; p'; z') \ne (m; p; z)$. Suppose not, and $(m'; p'; z') = (m; p; z)$. Then from (2), $e_0 = e_1 = 0$, so from (3), $z' = z = 0$, and thus from (4), $p = 2p'$, implying $p' = p = 0$. But this contradicts (8).

Definition 6. For $d \ge 1$, define
$$I(d) \triangleq \{ (m; p; z) \in S_n : 2m + p = d \},$$
$$I'(d) \triangleq \{ (m'; p'; z') \in S_n : 2m' + 2p' + z' = d \text{ and } (p', z') \ne (0, 0) \}.$$

Now note that Lemma 3 can be rewritten as

Corollary 1. If $\alpha = (m; p; z) \in I(d)$ then
$$P(\alpha) = \left\{ (m'; p'; z') \in I'(d) : m' + p' \ge m \text{ and } m' \le m - z \right\} \subseteq I'(d).$$

Next note

Lemma 4. Let $d > 0$.
Then
$$I'(d) \subseteq \bigcup_{d' < d} I(d'). \tag{9}$$

Proof. Let $\alpha = (m'; p'; z') \in I'(d)$. Since the $I(d')$ partition $S_n$, there must exist some $d'$ such that $\alpha \in I(d')$. Suppose that $d \le d'$. Then
$$2m' + 2p' + z' = d \le d' = 2m' + p',$$
implying $p' + z' \le 0$, so $(p', z') = (0, 0)$, contradicting $\alpha \in I'(d)$. Thus $d' < d$. Since this is true for all $\alpha \in I'(d)$, Equation (9) follows.

Corollary 1 and Lemma 4 together imply that the $OPT_s(\alpha)$ tables can be evaluated in the order $\alpha \in I(d)$ for $d = 1, 2, \ldots$. This ordering guarantees that when $OPT_s(\alpha)$ is being calculated, all of the $OPT_s(\alpha')$ entries for which $\alpha' \in P(\alpha)$ have been previously calculated.

For many $\alpha$, $|P(\alpha)| = \Theta(n^2)$, so calculating $OPT_s(\alpha)$ would require $\Theta(n^2)$ time. Since $|S_n| = \Theta(n^3)$, this would imply an $O(n^5)$ time algorithm for filling in the entire table. This is similar to the $O(n^5)$ derivation in [10]. We now show how to reduce this down to $O(n^3)$ using RMQs and Lemma 2.

Figure 2: The transformation from $(m', p')$ to $(j, i)$ described in the text. From Definition 6, if $(m'; p'; z') \in I'(d)$ then $z' = d - 2m' - 2p'$ is uniquely determined by $(m', p')$. In (a), the right triangle bounded by vertices $(0, 0)$, $(0, r)$ and $(r, 0)$ with $r = \lfloor d/2 \rfloor$ is the location of all $(p', m')$ pairs such that $(m'; p'; z') \in I'(d)$. The blue shaded parallelogram is the location of all $(p', m')$ pairs such that $(m'; p'; z') \in P(\alpha)$ for some $\alpha = (m; p; z) \in I(d)$. (b) illustrates the transformation $(j, i) = (m', m' + p')$. Note how the blue parallelogram becomes a rectangle, permitting the use of a 2D RRMQ query.

The sped-up $O(n^3)$ algorithm works in batched stages. In stage $d$, the algorithm calculates $OPT_s(\alpha)$ for all $\alpha \in I(d)$. It first spends $O(n^2)$ time building an associated matrix $M^{(d)}$ and then reduces the calculation of each $OPT_s(\alpha)$ to a 2D RMQ query (and possibly $O(1)$ extra work).

Before starting we quickly note a small technical issue concerning the DP initial conditions. Let
$$\bar{d}_s = \max_{\alpha = (m; p; z) \in I_s} 2m + p.$$
The starting stage of the algorithm is just to calculate $OPT_s(\alpha)$ for all $\alpha \in I(d)$ with $d = 1, \ldots, \bar{d}_s$. Calculating all of these requires only $O(1)$ time.

We now first describe the complete solution for $OPT_0$, which will be easier, and then discuss the modifications needed for $OPT_1$.

Assume then that, for some $d > \bar{d}_0$, $OPT_0(\alpha')$ is already known for all $\alpha' \in I(d')$, where $d' < d$. If $\alpha = (m; p; z) \in I(d)$ then, by definition,
$$OPT_0(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_0(\alpha') + W'_{m'} + C\,W_{m'-z',m'} \right\} \tag{10}$$
where all the $OPT_0(\alpha')$ for $\alpha' \in P(\alpha)$ are already known.

Recall that there are $O(n^2)$ signatures $\alpha' = (m'; p'; z') \in I'(d)$. The idea is to arrange the corresponding $O(n^2)$ values $OPT_0(m'; p'; z') + W'_{m'} + C\,W_{m'-z',m'}$ in an array $M^{(d)}_{i,j}$ in such a way that, for each individual $\alpha \in I(d)$, the minimization in Equation (10) can be performed using just one 2D RMQ query in $M^{(d)}_{i,j}$. The arrangement will use the invertible transformation (see Figure 2)
$$j = m' \quad \text{and} \quad i = m' + p'.$$
Note that $p' \ge 0$ implies $j \le i$.
Furthermore,
$$(m'; p'; z') \in I'(d) \Rightarrow d = 2m' + 2p' + z' = 2i + z',$$
which in turn implies $2i \le d$ and $z' = d - 2i$. Set $r = \lfloor d/2 \rfloor$. Then
$$(m'; p'; z') \in I'(d) \Rightarrow 0 \le j \le i \le r \text{ and } (m'; p'; z') = (j;\, i - j;\, d - 2i). \tag{11}$$
Furthermore, working backwards,
$$0 \le j \le i \le r \text{ and } (i - j,\, d - 2i) \ne (0, 0) \Rightarrow (j;\, i - j;\, d - 2i) \in I'(d), \tag{12}$$
where the second condition comes from the fact that $(m'; p'; z') \notin I'(d)$ if $(p', z') = (0, 0)$.

Now define the $(r + 1) \times (r + 1)$ matrix (indices $i$ and $j$ start at 0)
$$M^{(d)}_{i,j} \triangleq \begin{cases} \infty & \text{if } j > i \text{ or } (j;\, i - j;\, d - 2i) \notin S_n, \\ \infty & \text{if } i = j = d/2, \\ OPT_0(j;\, i - j;\, d - 2i) + W'_j + C\,W_{j - (d - 2i),\, j} & \text{otherwise.} \end{cases}$$
Since all the values referenced are already known, this matrix can be built in $O(r^2) = O(n^2)$ time.

Then, if $\alpha = (m; p; z) \in I(d)$, from Corollary 1,
$$OPT_0(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_0(\alpha') + W'_{m'} + C\,W_{m'-z',m'} \right\} = \min \left\{ M^{(d)}_{i,j} : i = m' + p' \ge m \text{ and } j = m' \le m - z \right\} = RMQ\left(M^{(d)} : m, r, 0, m - z\right) = RRMQ\left(M^{(d)} : m, m - z\right).$$
Note that the RRMQ query result also provides the indices of the minimizing entry, which provides the corresponding $Pred_0(\alpha)$ value as well.

Lemma 2 permits calculating all the $O(r^2)$ values $RRMQ\left(M^{(d)} : a, b\right)$ in $O(r^2) = O(n^2)$ time. Thus, all of the $OPT_0(\alpha)$ for $\alpha \in I(d)$ (and their corresponding $Pred_0(\alpha)$ values) can be calculated in $O(n^2)$ total time. Doing this for all $O(n)$ values of $d > \bar{d}_0$ in increasing order yields the required $O(n^3)$ time algorithm for filling in the $OPT_0$ matrix.

We next describe the more complicated algorithm for the $OPT_1$ case. Assume that $OPT_1(\alpha')$ is already known for all $\alpha' \in I(d')$, $d' < d$.
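To see concretely why the change of variables $(j, i) = (m', m' + p')$ turns Corollary 1's predecessor set into an axis-aligned rectangle, one can enumerate $P(\alpha)$ both ways and check that the two scans agree. A small Python sketch (the function names are ours; cells holding signatures outside $S_n$ are skipped, mirroring the $\infty$ entries of $M^{(d)}$):

```python
def preds_by_definition(n, m, p, z):
    """P((m;p;z)) computed straight from Definition 4: for each candidate
    signature (m';p';z'), search for witnesses e0 (e1 = z is forced by (3))."""
    preds = set()
    for mp in range(n + 1):
        for pp in range(n + 1):
            for zp in range(mp + 1):
                if (mp, pp, zp) == (m, p, z):
                    continue
                for e0 in range(pp + 1):
                    if (e0 + z <= pp and m == mp + e0 + z
                            and p == zp + 2 * (pp - e0 - z)):
                        preds.add((mp, pp, zp))
    return preds

def preds_by_rectangle(n, m, p, z):
    """Same set via Corollary 1 after the change of variables j = m',
    i = m' + p': a rectangle scan m <= i <= r, 0 <= j <= m - z."""
    d = 2 * m + p
    r = d // 2
    preds = set()
    for i in range(m, r + 1):
        for j in range(0, m - z + 1):
            mp, pp, zp = j, i - j, d - 2 * i
            if pp >= 0 and pp <= n and 0 <= zp <= mp and (pp, zp) != (0, 0):
                preds.add((mp, pp, zp))
    return preds
```

For example, for $\alpha = (2; 2; 0)$ with $n = 4$ both scans yield $\{(2;0;2), (0;3;0), (1;2;0), (2;1;0)\}$.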
If $\alpha = (m; p; z) \in I(d)$ then, similar to the $OPT_0$ case,
$$OPT_1(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\}$$
where all the $OPT_1(\alpha')$ for $\alpha' \in P(\alpha)$ are already known.

Following the approach in the $OPT_0$ algorithm, for fixed $d$, we would like to arrange the $O(n^2)$ values $OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z}$ for $\alpha' = (m'; p'; z') \in I'(d)$ appropriately in an array so that each $OPT_1(\alpha)$ entry could be resolved using one 2D RMQ query. The difficulty is that the values of the array entries depend upon both $\alpha$ and $\alpha'$. More specifically, the $C\,W_{m',m-z}$ term would have to be reprocessed for each $(m, z)$ pair. Thus, no fixed $M_{i,j}$ array, independent of $(m, z)$, could be defined.

Instead, we utilize a relationship between different queries. More specifically, let $\alpha = (m; p; z) \in I(d)$. From Equation (7), $z \le m - m' \le m$. If $z = m$, then $m' = 0$, so $W_{m',m-z} = W_{0,0} = 0$ and
$$OPT_1(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\} = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_1(\alpha') + W'_{m'} \right\}.$$
If $z < m$ then, splitting into the cases $m' = m - z$ and $m' \le m - z - 1$, $OPT_1(\alpha) = \min(A, B)$ where
$$A \triangleq \min_{\substack{\alpha' = (m'; p'; z') \in P(\alpha) \\ m' = m - z}} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\}, \qquad B \triangleq \min_{\substack{\alpha' = (m'; p'; z') \in P(\alpha) \\ m' \le m - z - 1}} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\}.$$
First note that if $m' = m - z$, then $W_{m',m-z} = 0$, so
$$A = \min_{\substack{\alpha' = (m'; p'; z') \in I'(d) \\ m' + p' \ge m,\ m' = m - z}} \left\{ OPT_1(\alpha') + W'_{m'} \right\}.$$
Next note that, from Corollary 1,
$$P((m; p; z)) \cap \{ (m'; p'; z') : m' \le m - z - 1 \} = \left\{ (m'; p'; z') \in I'(2m + p) : m' + p' \ge m \text{ and } m' \le m - z - 1 \right\} = P((m; p; z + 1))$$
and from Definition 3, $W_{m',m-z} = p_{m-z} + W_{m',m-(z+1)}$. Thus
$$B = -C\,p_{m-z} + \min_{\alpha' = (m'; p'; z') \in P((m; p; z + 1))} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-(z+1)} \right\} = OPT_1(m; p; z + 1) - C\,p_{m-z}.$$

Again use the same transformation $j = m'$ and $i = m' + p'$, so that Equations (11) and (12) apply. Set $r = \lfloor d/2 \rfloor$ and define the $(r + 1) \times (r + 1)$ array
$$M^{[d]}_{i,j} \triangleq \begin{cases} \infty & \text{if } j > i \text{ or } (j;\, i - j;\, d - 2i) \notin S_n, \\ \infty & \text{if } i = j = d/2, \\ OPT_1(j;\, i - j;\, d - 2i) + W'_j & \text{otherwise.} \end{cases}$$
By Lemma 2 we can calculate all the $O(r^2)$ values $RCQ\left(M^{[d]} : a, b\right)$ in $O(r^2) = O(n^2)$ time.

Let $\alpha = (m; p; z) \in I(d)$. Then, from the discussion above, if $z = m$,
$$OPT_1(\alpha) = \min_{\substack{\alpha' = (m'; p'; z') \in I'(d) \\ m' + p' \ge m,\ m' = 0}} \left\{ OPT_1(\alpha') + W'_{m'} \right\} = \min \left\{ M^{[d]}_{i,j} : i \ge m \text{ and } j = 0 \right\} = RMQ\left(M^{[d]} : m, r, 0, 0\right) = RCQ\left(M^{[d]} : m, 0\right),$$
which is already known. If $z < m$,
$$A = \min_{\substack{\alpha' = (m'; p'; z') \in I'(d) \\ m' + p' \ge m,\ m' = m - z}} \left\{ OPT_1(\alpha') + W'_{m'} \right\} = \min \left\{ M^{[d]}_{i,j} : i \ge m \text{ and } j = m - z \right\} = RMQ\left(M^{[d]} : m, r, m - z, m - z\right) = RCQ\left(M^{[d]} : m, m - z\right).$$
Thus, for $\alpha = (m; p; z)$ with $z < m$,
$$OPT_1(\alpha) = \min(A, B) = \min\left( RCQ\left(M^{[d]} : m, m - z\right),\ OPT_1(m; p; z + 1) - C\,p_{m-z} \right), \tag{13}$$
which, since $RCQ\left(M^{[d]} : m, m - z\right)$ is already known, can be calculated in $O(1)$ time if $OPT_1(m; p; z + 1)$ has already been calculated.
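The resulting stage can be sketched schematically in Python. The inputs below are assumptions standing in for quantities the surrounding algorithm provides: `rcq[a][b]` holds the precomputed $RCQ(M^{[d]} : a, b)$ values for this stage's matrix (Lemma 2), `probs[i]` holds $p_i$ (1-indexed, `probs[0]` unused), and `opt1` is the table being filled, keyed by signatures $(m, p, z)$:

```python
def opt1_stage(d, n, probs, C, rcq, opt1):
    """Schematic sketch of one batched OPT_1 stage.

    For each fixed (m, p) with 2m + p = d, the z = m entry is a single
    restricted column query; the remaining entries are filled in
    decreasing z, so Equation (13) costs O(1) work per entry."""
    for m in range(d // 2 + 1):
        p = d - 2 * m
        if p > n:                            # outside the signature set S_n
            continue
        opt1[(m, p, m)] = rcq[m][0]          # the z = m case
        for z in range(m - 1, -1, -1):       # Equation (13)
            A = rcq[m][m - z]
            B = opt1[(m, p, z + 1)] - C * probs[m - z]
            opt1[(m, p, z)] = min(A, B)
    return opt1
```

The two nested loops touch each signature of the stage exactly once, which is the source of the claimed $O(n^2)$ per-stage bound.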
The associated $Pred_1(\alpha)$ can be found appropriately. This permits calculating $OPT_1(\alpha)$ (and the corresponding $Pred_1(\alpha)$ values) for all $\alpha = (m; p; z) \in I(d)$ in a total of $O(n^2)$ time as follows:
1. First spend $O(n^2)$ time calculating all the $RCQ\left(M^{[d]} : a, b\right)$ values.
2. For each of the $O(n)$ possible fixed pairs $m, p$ satisfying $2m + p = d$:
(a) Set $OPT_1(m; p; m) = RCQ\left(M^{[d]} : m, 0\right)$.
(b) Then, for $z = m - 1, m - 2, \ldots, 0$, calculate $OPT_1(m; p; z)$ in $O(1)$ time from $OPT_1(m; p; z + 1)$ using Equation (13).

Since this is $O(n^2)$ time for fixed $d$, doing this for all $O(n)$ values of $d > \bar{d}_1$ in increasing order yields the required $O(n^3)$ time algorithm for filling in the $OPT_1$ matrix.

Figure 3: A binary AIFV-2 code for $X = \{a, b, c, d\}$ with associated probabilities, showing the trees $T_0, T_1$, their master and slave nodes, the switching probabilities $q(T_0), q(T_1)$, and the average codeword lengths $L(T_0), L(T_1)$. In the encoding of $b\,d\,b\,c\,a\,a$, the symbols $d$, $c$ and the first $a$ were encoded using $T_0$ while the other letters were encoded using $T_1$. The resulting cost $L_{AIFV}(T_0, T_1)$ is lower than $L(\text{Huffman}_X)$, the cost of the optimal Huffman code for the same source.

4. A Quick Introduction to AIFV-2 codes

Note: This introduction is copied, with some small modifications, from [8].

Let $X$ be a memoryless source over a finite alphabet $\mathcal{X}$ of size $n$. $\forall a_i \in \mathcal{X}$, let $p_i = P_X(a_i)$ denote the probability of $a_i$ occurring. Without loss of generality we assume that
$$p_1 \ge p_2 \ge \cdots \ge p_n > 0, \qquad \sum_{i=1}^n p_i = 1.$$
A codeword $c$ of a binary AIFV code is a string in $\{0, 1\}^*$. $|c|$ will denote the length of codeword $c$.

We now briefly describe the structure of Binary AIFV-2 codes using the terminology of [9].
See [9] for more details and Figure 3 for an example. Codes are represented via binary trees with left edges labelled by "0" and right edges by "1". A Binary AIFV-2 code is a pair of binary code trees, $T_0, T_1$, satisfying:
• Complete internal nodes in $T_0$ and $T_1$ have both left and right children. Incomplete internal nodes (with the unique exception of the left child of the root of $T_1$) have only a "0" (left) child. Incomplete internal nodes are labelled as either master nodes or slave nodes.
• A master node must be an incomplete node with an incomplete child. The child of a master node is a slave node. This implies that a master node is connected to its unique grandchild via "00", with the intermediate node being a slave node.
• Each source symbol is assigned to one node in $T_0$ and one node in $T_1$. The nodes to which they are assigned are either leaves or master nodes. Symbols are not assigned to complete internal nodes or slave nodes.
• The root of $T_1$ is complete and its "0" child is a slave node. The root of $T_0$ has no "00" grandchild.

Let $c_s(a)$, $s \in \{0, 1\}$, denote the codeword of $a \in \mathcal{X}$ encoded by $T_s$. The encoding procedure for a sequence $x_1, x_2, \ldots$ of source symbols works as follows:
0. Set $s_1 = 0$ and $j = 1$.
1. Encode $x_j$ as $c_{s_j}(x_j)$.
2. If $c_{s_j}(x_j)$ is a leaf in $T_{s_j}$, then set $s_{j+1} = 0$; else set $s_{j+1} = 1$. (The latter occurs when $c_{s_j}(x_j)$ is a master node in $T_{s_j}$.)
3. Set $j = j + 1$ and go to 1.

Note that a symbol is encoded using $T_0$ if and only if its predecessor was encoded using a leaf node, and it is encoded using $T_1$ if and only if its predecessor was encoded using a master node. The decoding procedure is a straightforward reversal of the encoding procedure. Details are provided in [14] and [10].
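The encoding procedure above is a simple two-state machine. A minimal Python sketch, assuming hypothetical input dictionaries describing a code-tree pair (`codewords[s][x]` gives the codeword of symbol `x` in tree $T_s$ and `is_master[s][x]` says whether `x` sits at a master node; the concrete values used in the test are invented placeholders, not a valid AIFV-2 code):

```python
def aifv2_encode(symbols, codewords, is_master):
    """Run the AIFV-2 encoding loop: emit each symbol's codeword from the
    current tree, then switch to T_1 iff the symbol sat at a master node."""
    s = 0                               # step 0: start with tree T_0
    out = []
    for x in symbols:                   # step 1: encode x_j with T_s
        out.append(codewords[s][x])
        # step 2: next tree is T_1 iff x was encoded by a master node
        s = 1 if is_master[s][x] else 0
    return "".join(out)
```

The state variable `s` plays the role of $s_j$; decoding reverses the same state machine.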
The important observation is that identifying the end of a codeword might first require reading an extra two bits past its ending, resulting in a two-bit delay, so decoding is not instantaneous.

Following [14], we can now derive the average codeword length of a binary AIFV-2 code defined by trees $T_0, T_1$. The average codeword length $L(T_s)$ of $T_s$, $s \in \{0, 1\}$, is
$$L(T_s) = \sum_{i=1}^n |c_s(a_i)|\, p_i.$$
If the current symbol $x_j$ is encoded by a leaf (resp. a master node) of $T_{s_j}$, then the next symbol $x_{j+1}$ is encoded by $T_0$ (resp. $T_1$). This process can be modelled as a two-state Markov chain with the state being the current encoding tree. Denote the transition probabilities for switching from code tree $T_s$ to $T_{s'}$ by $q_{s'}(T_s)$. Then, from the definition of the code trees and the encoding/decoding protocols:
$$q_0(T_s) = \sum_{a \in \mathcal{L}_{T_s}} P_X(a) \quad \text{and} \quad q_1(T_s) = \sum_{a \in \mathcal{M}_{T_s}} P_X(a),$$
where $\mathcal{L}_{T_s}$ (resp. $\mathcal{M}_{T_s}$) denotes the set of source symbols $a \in \mathcal{X}$ that are assigned to a leaf node (resp. a master node) in $T_s$.

Given a binary AIFV-2 code $T_0, T_1$, as the number of symbols being encoded approaches infinity, the stationary probability of using code tree $T_s$ can then be calculated to be
$$P(s \mid T_0, T_1) = \frac{q_s(T_{\hat{s}})}{q_1(T_0) + q_0(T_1)} \tag{14}$$
where $\hat{s} \in \{0, 1\}$, $s \ne \hat{s}$. The average codeword length (asymptotically, as the number of characters encoded goes to infinity) of a binary AIFV-2 code is then
$$L_{AIFV}(T_0, T_1) = P(0 \mid T_0, T_1)\, L(T_0) + P(1 \mid T_0, T_1)\, L(T_1). \tag{15}$$

Algorithm 1: Iterative algorithm to construct an optimal binary AIFV-2 code [15, 10]
1: $m \leftarrow 0$; $C^{(0)} = 2 - \log_2 3$
2: repeat
3: $\quad m \leftarrow m + 1$
4: $\quad T_0^{(m)} = \operatorname{argmin}_{T_0 \in \mathcal{T}_0(n)} \{ L(T_0) + C^{(m-1)} q(T_0) \}$
5: $\quad T_1^{(m)} = \operatorname{argmin}_{T_1 \in \mathcal{T}_1(n)} \{ L(T_1) - C^{(m-1)} q(T_1) \}$
6: $\quad$ Update $C^{(m)} = \dfrac{L\left(T_1^{(m)}\right) - L\left(T_0^{(m)}\right)}{q\left(T_0^{(m)}\right) + q\left(T_1^{(m)}\right)}$
7: until $C^{(m)} = C^{(m-1)}$
8: Set $C^* = C^{(m)}$.
The optimal binary AIFV-2 code is $T_0^{(m)}, T_1^{(m)}$.

[14, 15] showed that the binary AIFV-2 code $T_0, T_1$ minimizing Equation (15) can be obtained by Algorithm 1, in which $\mathcal{T}_0(n)$ (resp. $\mathcal{T}_1(n)$) is the set of all possible $T_0$ (resp. $T_1$) coding trees. [15] implemented the minimizations (over all coding trees) in lines 4 and 5 as an ILP. In a later paper [10], the authors replaced this ILP with an $O(n^5)$ time and $O(n^3)$ space DP that modified a top-down tree-building DP from [6, 4].

[10, 15] proved algebraically that Algorithm 1 would terminate after a finite number of steps and that the resulting tree pair $T_0^{(m)}, T_1^{(m)}$ is an optimal Binary AIFV-2 code. They were unable, though, to provide any bounds on the number of steps needed for termination. [7] then gave two new iterative algorithms that provably terminated in $O(b)$ iterations, where $b$ is the maximum number of bits required to store any of the probabilities $p_i$ (so these were weakly polynomial algorithms). More formally, let $o_i, b_i$ be such that $p_i = o_i\, 2^{-b_i}$ where $o_i < 2^{b_i}$ is an odd positive integer. Then $b = \max_i b_i$.

Each iteration step of [7]'s algorithm ran $O(1)$ of the DPs from [10], so its full algorithm for constructing optimal AIFV-2 codes ran in $O(n^5 b)$ time. The results of this paper replace the $O(n^5)$-time DPs with $O(n^3)$-time DPs, leading to $O(n^3 b)$-time algorithms for constructing optimal AIFV-2 codes.

We conclude this section by noting that the correctness of the DPs defined in both [10] and the next section assumes that $0 \le C^{(i)} \le 1$. The need for this assumption was implicit in [10] and is made explicit in Lemma 5 in the next section. The validity of this assumption was proven in [8].

5. Deriving the DP

Each iteration step in both [10] and [7] requires finding trees that satisfy
$$T_0(C) \triangleq \operatorname{argmin}_{T_0 \in \mathcal{T}_0(n)} \{ Cost_0(T_0 : C) \}, \tag{16}$$
$$T_1(C) \triangleq \operatorname{argmin}_{T_1 \in \mathcal{T}_1(n)} \{ Cost_1(T_1 : C) \}, \tag{17}$$
where
$$Cost_0(T_0 : C) \triangleq L(T_0) + C\, q(T_0), \tag{18}$$
$$Cost_1(T_1 : C) \triangleq L(T_1) - C\, q(T_1). \tag{19}$$
(19)

Since $C$ is fixed at any iteration stage, we simplify our notation by assuming $C$ fixed and writing $\mathrm{Cost}_0(T)$ and $\mathrm{Cost}_1(T)$ to denote Equations (18) and (19).

Definition 7. Let $T$ be a binary AIFV coding tree. Define, $\forall a_i \in \mathcal{X}$,
\[ c_T(a_i) \triangleq \text{the codeword in } T \text{ associated with } a_i, \qquad d_T(i) \triangleq |c_T(a_i)|. \]
By the natural correspondence, $d_T(i)$ is the depth of the node in $T$ associated with $a_i$, so $L(T) = \sum_{i=1}^{n} d_T(i) p_i$. Further define, $\forall a_i \in \mathcal{X}$,
\[ m_T(i) \triangleq \begin{cases} 1 & \text{if } c_T(a_i) \text{ is a master node in } T, \\ 0 & \text{if } c_T(a_i) \text{ is a leaf in } T, \end{cases} \qquad \ell_T(i) \triangleq \begin{cases} 0 & \text{if } m_T(i) = 1, \\ 1 & \text{if } m_T(i) = 0. \end{cases} \]
$m_T(i)$ and $\ell_T(i)$ are indicator functions as to whether $a_i$ is encoded by a master node or a leaf in $T$, so, $\forall i$, $m_T(i) + \ell_T(i) = 1$. Note that, using this new notation,
\[ \mathrm{Cost}_0(T) = \sum_{i=1}^{n} d_T(i) p_i + C \sum_{i=1}^{n} m_T(i) p_i, \qquad \mathrm{Cost}_1(T) = \sum_{i=1}^{n} d_T(i) p_i - C \sum_{i=1}^{n} \ell_T(i) p_i. \]

We now show that when $0 \le C \le 1$, $T_0(C)$ and $T_1(C)$ can be assumed to possess a nice ordered structure.

Lemma 5. Let $0 \le C \le 1$. Then, if $s = 0$ (resp. $s = 1$) there exists a tree $T_0(C) \in \mathcal{T}_0(n)$ (resp. $T_1(C) \in \mathcal{T}_1(n)$) satisfying Equation (16) (resp. Equation (17)) that, for all $i < j$, satisfies the following two properties:

(P1) $d_{T_s}(i) \le d_{T_s}(j)$.

(P2) If $d_{T_s}(i) = d_{T_s}(j)$ and $m_{T_s}(i) = 1$ then $m_{T_s}(j) = 1$.

Proof. We say that $T_0 = T_0(C)$ (resp. $T_1 = T_1(C)$) is a minimum cost tree (for $s = 0$, resp. $s = 1$) if it satisfies Equation (16) (resp. (17)). The proof follows from swapping arguments. "Swapping" $i$ and $j$ means assigning the old codeword $c_{T_s}(a_i)$ to $a_j$ and vice-versa.
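As a quick numerical sanity check of this swapping operation, the sketch below (a hypothetical list encoding of codeword depths and master flags, not the paper's code) compares $\mathrm{Cost}_0$ before and after a swap and verifies the cost-change identity used in this proof:

```python
# Sanity check of the swapping argument (hypothetical encoding, not the
# paper's code): a tree is modelled by depths d, master flags m and
# probabilities p; swapping i and j exchanges their codewords.

def cost0(d, m, p, C):
    # Cost_0(T) = sum_i d_T(i) p_i + C * sum_i m_T(i) p_i
    return sum(di * pi for di, pi in zip(d, p)) + \
        C * sum(pi for mi, pi in zip(m, p) if mi)

def swap(seq, i, j):
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return out

d = [1, 3, 2]                 # codeword depths
m = [False, True, False]      # a_2 is encoded by a master node
p = [0.5, 0.3, 0.2]           # probabilities, sorted decreasingly
C = 0.7
i, j = 1, 2                   # an inversion: i < j but d[i] > d[j]

before = cost0(d, m, p, C)
after = cost0(swap(d, i, j), swap(m, i, j), p, C)
# identity: Cost_0(T') = Cost_0(T) - (d(i) - d(j))(p_i - p_j) + delta(i, j);
# here m(i) = 1 and l(j) = 1, so delta(i, j) = -C (p_i - p_j)
delta = -C * (p[i] - p[j])
assert abs(after - (before - (d[i] - d[j]) * (p[i] - p[j]) + delta)) < 1e-12
```

Since $p_i \ge p_j$ and $0 \le C \le 1$, both correction terms are non-positive in this example, which is exactly why removing inversions never increases the cost.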
Let $T'_s$ be the tree resulting from swapping $i$ and $j$. The following observation is a straightforward calculation:
\[ \mathrm{Cost}_s(T'_s) = \mathrm{Cost}_s(T_s) - (d_{T_s}(i) - d_{T_s}(j))(p_i - p_j) + \delta(i,j), \]
where
\[ \delta(i,j) \triangleq \begin{cases} 0 & \text{if } m_{T_s}(i) = m_{T_s}(j), \\ -C(p_i - p_j) & \text{if } m_{T_s}(i) = 1 \text{ and } \ell_{T_s}(j) = 1, \\ C(p_i - p_j) & \text{if } \ell_{T_s}(i) = 1 \text{ and } m_{T_s}(j) = 1. \end{cases} \]

We say that $(i,j)$ is an inversion for $T_s$ if $i < j$ and $d_{T_s}(i) > d_{T_s}(j)$. The calculations above and the fact that $0 \le C \le 1$ immediately imply that if $(i,j)$ is an inversion for $T_s$ then
\[ \mathrm{Cost}_s(T'_s) \le \mathrm{Cost}_s(T_s). \]

Now let $T_s$ be a minimum cost tree for $s$ that has the minimum number of inversions among all such trees. If no inversion exists, then $T_s$ satisfies (P1). Otherwise, let $(i,j)$ be the inversion that minimizes $j - i$. Swapping $i$ and $j$ decreases the number of inversions by 1 while not increasing the cost of the tree, contradicting the definition of $T_s$. We may therefore assume that $T_s$ contains no inversion and satisfies (P1).

Now say that $(i,j)$ is an $m\ell$-inversion in $T_s$ if $i < j$, $d_{T_s}(i) = d_{T_s}(j)$, $m_{T_s}(i) = 1$ and $\ell_{T_s}(j) = 1$. Let $T_s$ be a minimum cost tree for $s$ that satisfies (P1) and has the fewest $m\ell$-inversions. If no $m\ell$-inversion exists, then $T_s$ also satisfies (P2), so the lemma is correct. Otherwise, let $(i,j)$ be an $m\ell$-inversion that minimizes $j - i$. Let $T'_s$ be the tree that results from swapping $i$ and $j$. Then $T'_s$ still satisfies (P1), but the number of $m\ell$-inversions decreases by 1 while
\[ \mathrm{Cost}_s(T'_s) = \mathrm{Cost}_s(T_s) - C(p_i - p_j) \le \mathrm{Cost}_s(T_s). \]
This contradicts the definition of $T_s$. We may therefore assume that $T_s$ contains no $m\ell$-inversions and satisfies both (P1) and (P2). □

The consequences of Lemma 5 can be seen in Figure 4. The Lemma implies that the optimization in Equation (16) (resp.
Equation (17)) can be restricted to trees that satisfy Properties (P1) and (P2). In particular, the indices of codewords on a level are smaller than the indices of codewords on deeper levels. Also, on any given level, the indices of the leaves are smaller than the indices of the master nodes. We therefore henceforth assume that all trees in $\mathcal{T}_0(n)$ and $\mathcal{T}_1(n)$ satisfy these properties.

Definition 8 (Partial Trees and Truncation). See Figure 5.

• A partial binary AIFV code tree (partial tree for short) $T$ is one that satisfies all of the conditions of a binary AIFV code tree and properties (P1), (P2) except that it contains $m \le n$ codewords. By (P1), the $m \le n$ codewords it contains are $c_T(a_1), \ldots, c_T(a_m)$.

• For $s \in \{0,1\}$, let $\bar{\mathcal{T}}_s(n)$ denote the set of partial trees that satisfy the conditions of $T_s$ trees. For notational convenience, also set $\mathcal{T}(n) \triangleq \mathcal{T}_0(n) \cup \mathcal{T}_1(n)$ and $\bar{\mathcal{T}}(n) \triangleq \bar{\mathcal{T}}_0(n) \cup \bar{\mathcal{T}}_1(n)$.

• $T \in \bar{\mathcal{T}}(n)$ is $i$-level if $\mathrm{depth}(T) \le i + 1$. Set $\bar{\mathcal{T}}_s(i{:}n) \triangleq \{ T_s \in \bar{\mathcal{T}}_s(n) : T_s \text{ is } i\text{-level} \}$ and $\bar{\mathcal{T}}(i{:}n) \triangleq \bar{\mathcal{T}}_0(i{:}n) \cup \bar{\mathcal{T}}_1(i{:}n)$.

• Let $T \in \mathcal{T}(n)$. The $i$-level truncation of $T$, denoted by $\mathrm{Trunc}^{(i)}(T)$, is the partial tree that remains after removing all nodes at depth $j > i + 1$ from $T$.

Figure 4: Black nodes are leaves, gray nodes master nodes and blue ones slave nodes.
Note that on every level, the indices of the leaves are smaller than the indices of the master nodes. Also note that in all cases, if $\mathrm{sig}^{(i)}(T_s) = (m'; p'; z')$ and $\mathrm{sig}^{(i+1)}(T_s) = (m; p; z)$ then $2m' + 2p' + z' = 2m + p$, $m' + p' \ge m$ and $m' \le m - z$, as required by Lemma 3.

Note: $\forall T \in \bar{\mathcal{T}}(n)$, $\mathrm{Trunc}^{(i)}(T) \in \bar{\mathcal{T}}(i{:}n)$.

Definition 9 (Signatures and Costs). See Figures 4 and 5.

(a) $i$-level Signatures: The $i$-level signature of $T$ is the ordered triple $\mathrm{sig}^{(i)}(T) \triangleq (m; p; z)$ where
\[ m \triangleq |\{ j : d_T(j) \le i \}| = \text{the number of codewords on levels} \le i \text{ of } T, \]
\[ p \triangleq \text{the number of non-slave nodes on level } i + 1 \text{ of } T, \]
\[ z \triangleq |\{ j : d_T(j) = i \text{ and } m_T(j) = 1 \}| = \text{the number of master nodes on level } i \text{ of } T. \]
Note that $\mathrm{sig}^{(i)}(T) = \mathrm{sig}^{(i)}\bigl(\mathrm{Trunc}^{(i)}(T)\bigr)$.

(b) $i$-level Costs: Let $\mathrm{sig}^{(i)}(T) = (m; p; z)$. The $i$-level costs of $T$ are
\[ \mathrm{Cost}_0^{(i)}(T) \triangleq i W'_m + \sum_{j=1}^{m} d_T(j) p_j + C \sum_{j=1}^{m-z} m_T(j) p_j \]
and
\[ \mathrm{Cost}_1^{(i)}(T) \triangleq i W'_m + \sum_{j=1}^{m} d_T(j) p_j - C \sum_{j=1}^{m} \ell_T(j) p_j. \]

Figure 5: Illustrations of the Trunc and Expand operations and Lemma 9. The $p_i$ are given in the table above the trees. As examples of the Trunc operation, note that $\mathrm{Trunc}^{(2)}(T'') = \mathrm{Trunc}^{(2)}(T') = T$ and $\mathrm{Trunc}^{(3)}(T'') = T'$.

Suppose $T_s \in \bar{\mathcal{T}}_s(n)$, with $\mathrm{depth}(T_s) = d$.
An interesting peculiarity of this definition is that $T_s$ is an $i$-level tree for all $i \ge d - 1$, with
\[ \mathrm{Cost}_0^{(d-1)}(T) < \mathrm{Cost}_0^{(d)}(T) < \mathrm{Cost}_0^{(d+1)}(T) < \cdots \]
for some indeterminate-length chain. The important observation, though, is that $\mathrm{Cost}_s^{(i)}(T)$ collapses to $\mathrm{Cost}_s(T)$ in the interesting cases.

Lemma 6.
(a) Let $T_s \in \mathcal{T}_s(n)$, with $\mathrm{depth}(T_s) = d$. Then $\mathrm{sig}^{(d)}(T_s) = (n; 0; 0)$ and $\mathrm{Cost}_s^{(d)}(T_s) = \mathrm{Cost}_s(T_s)$.
(b) Let $T_s \in \bar{\mathcal{T}}_s(n)$ be an $i$-level tree with $\mathrm{sig}^{(i)}(T_s) = (n; 0; 0)$. Then $T_s \in \mathcal{T}_s(n)$ with $\mathrm{depth}(T_s) = i$.

Proof. (a) By definition, $T_s$ is a $d$-level tree with no nodes on level $d + 1$. Let $(m, p, z) = \mathrm{sig}^{(d)}(T_s)$. Since $T_s$ contains $n$ codewords, $m = n$. $T_s$ contains no nodes on level $d + 1$, so $p = 0$. Furthermore, it contains no slave nodes on level $d + 1$, so it contains no master nodes on level $d$, i.e., $z = 0$. Since $W'_n = 0$,
\[ \mathrm{Cost}_0^{(d)}(T) = d W'_n + \sum_{i=1}^{n} d_T(i) p_i + C \sum_{i=1}^{n} m_T(i) p_i = \mathrm{Cost}_0(T). \]
Similarly,
\[ \mathrm{Cost}_1^{(d)}(T) = d W'_n + \sum_{i=1}^{n} d_T(i) p_i - C \sum_{i=1}^{n} \ell_T(i) p_i = \mathrm{Cost}_1(T). \]

Figure 6: The initial trees introduced in Definition 10. Note that the definition of $T_0$ trees permits the root to be a master node or an internal node, while the definition of $T_1$ trees requires that the root be an internal node.

(b) $T_s$ contains no master nodes on level $i$, so it contains no slave nodes on level $i + 1$. It also contains no non-slave nodes on level $i + 1$. So it contains no nodes on level $i + 1$ and $\mathrm{depth}(T_s) = i$. $T_s \in \mathcal{T}_s(n)$ by definition. □

The definitions and lemmas immediately imply

Corollary 2.
\[ T_s(C) = \operatorname*{argmin}_{\substack{T_s \in \bar{\mathcal{T}}(n):\ \exists i \text{ s.t. } T_s \in \bar{\mathcal{T}}(i:n) \\ \text{and } \mathrm{sig}^{(i)}(T_s) = (n;0;0)}} \mathrm{Cost}_s^{(i)}(T_s) \tag{20} \]

The next definition introduces the initial conditions for the dynamic programs.

Definition 10. See Figure 6. Set
\[ I_0 = \{ (0; 2; 0),\ (1; 0; 1) \}, \qquad I_1 = \{ (0; 3; 0),\ (1; 1; 0),\ (1; 1; 1) \}. \]
Note that if $(m; p; z) \in I_0$, there exists a unique $0$-level tree $T_0 \in \bar{\mathcal{T}}_0(n)$ satisfying $\mathrm{sig}^{(0)}(T_0) = (m; p; z)$. Similarly, if $(m; p; z) \in I_1$, there exists a unique $1$-level tree $T_1 \in \bar{\mathcal{T}}_1(n)$ satisfying $\mathrm{sig}^{(1)}(T_1) = (m; p; z)$. Let $T_s(m; p; z)$ denote this unique tree and $\bar{c}_s(m; p; z) = \mathrm{Cost}_s^{(s)}(T_s(m; p; z))$.

The following lemma is true by observation.

Lemma 7. Let $n > 1$. If $T \in \bar{\mathcal{T}}_0(n)$ with $\mathrm{depth}(T) \ge 1$, then $\mathrm{sig}^{(0)}(T) \in I_0$. If $T \in \bar{\mathcal{T}}_1(n)$ with $\mathrm{depth}(T) \ge 2$, then $\mathrm{sig}^{(1)}(T) \in I_1$.

Note: The reason for starting with $\mathrm{sig}^{(1)}(T_1)$ instead of $\mathrm{sig}^{(0)}(T_1)$ is that the root of a $T_1$ tree is "unusual", being a complete node with a slave child, the only time this combination can occur. By definition, $\mathrm{sig}^{(0)}(T_1) = (0; 1; 0)$. This is misleading because it loses the information about the unusual slave node on level $1$. We therefore only start looking at signatures of $T_1$ trees from level $1$.

Definition 11. See Figure 5. Let $T' \in \bar{\mathcal{T}}(i{:}n)$ satisfy $\mathrm{sig}^{(i)}(T') = (m'; p'; z')$ and let
\[ e_\ell, e_m \ge 0, \qquad e_\ell + e_m \le p'. \tag{21} \]
Define the $(e_\ell, e_m)$-expansion of $T'$ as the unique tree $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ in which

• the first $i$ levels of $T$ are identical to those of $T'$;

• $e_\ell$ of the $p'$ non-slave nodes on level $i + 1$ of $T'$ are set as leaves associated with $a_{m'+1}, \ldots, a_{m'+e_\ell}$;

• $e_m$ non-slave nodes on level $i + 1$ of $T'$ are set as master nodes associated with $a_{m'+e_\ell+1}, \ldots, a_{m'+e_\ell+e_m}$ (with corresponding slave nodes created on level $i + 2$).
• the remaining $p' - e_\ell - e_m$ non-slave nodes on level $i + 1$ of $T'$ become complete internal nodes, creating $2(p' - e_\ell - e_m)$ non-slave nodes on level $i + 2$. These are in addition to the $z'$ non-slave children on level $i + 2$ of the $z'$ slave nodes on level $i + 1$.

Note that this definition implies that $\mathrm{sig}^{(i+1)}(T) = (m; p; z)$ where
\[ m = m' + e_\ell + e_m, \tag{22} \]
\[ z = e_m, \tag{23} \]
\[ p = z' + 2(p' - e_\ell - e_m). \tag{24} \]

Lemma 8.
(a) Let $T' \in \bar{\mathcal{T}}(i{:}n)$. If $\mathrm{sig}^{(i)}(T') = (m'; p'; z')$ and $(e_\ell, e_m)$ satisfies Equation (21), then $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m) \in \bar{\mathcal{T}}(i+1{:}n)$.
(b) Let $T \in \bar{\mathcal{T}}(n)$. For $i \ge 0$, set $\bigl(m^{(i)}; p^{(i)}; z^{(i)}\bigr) = \mathrm{sig}^{(i)}\bigl(\mathrm{Trunc}^{(i)}(T)\bigr)$. Then
\[ \mathrm{Trunc}^{(i+1)}(T) = \mathrm{Expand}^{(i)}\bigl(\mathrm{Trunc}^{(i)}(T), e_\ell, e_m\bigr) \]
where $e_\ell = m^{(i+1)} - m^{(i)} - z^{(i+1)}$ and $e_m = z^{(i+1)}$.

Proof. (a) follows from the fact that Definition 11 maintains the validity of properties (P1) and (P2) of Lemma 5 and that $\mathrm{depth}(T) \le i + 2$. (b) just follows directly from the definitions. □

Part (b) implies that any tree $T \in \bar{\mathcal{T}}(n)$ can be grown level by level via expansion operations.

Now recall from Definition 4 the definition of the signature set $S_n$ and the operation $\to$.

Lemma 9. Let $T' \in \bar{\mathcal{T}}(i{:}n)$ with $\mathrm{sig}^{(i)}(T') = \alpha' = (m'; p'; z')$.
(a) Let $(e_\ell, e_m)$ satisfy Equation (21). Let $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ and $\alpha = (m; p; z) = \mathrm{sig}^{(i+1)}(T)$. Then $\alpha' \to \alpha$.
(b) Let $\alpha = (m; p; z)$.
If $\alpha' \to \alpha$, let $e_\ell, e_m$ be the unique values satisfying Equations (1)-(4) and set $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$. Then $\alpha = \mathrm{sig}^{(i+1)}(T)$.
(c) If $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ with $\alpha = (m; p; z) = \mathrm{sig}^{(i+1)}(T)$, then
\[ \mathrm{Cost}_0^{(i+1)}(T) = \mathrm{Cost}_0^{(i)}(T') + c_0(\alpha', \alpha), \]
\[ \mathrm{Cost}_1^{(i+1)}(T) = \mathrm{Cost}_1^{(i)}(T') + c_1(\alpha', \alpha). \]

Proof. (a) This follows directly from the definition of $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$.

(b) From the definition of $\alpha' \to \alpha$ there exist appropriate $e_\ell, e_m$ satisfying Equations (1)-(4). Then $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ has $\mathrm{sig}^{(i+1)}(T) = (m; p; z)$.

(c) From the definitions of signatures and expansions,
\[ \sum_{j=1}^{m'} d_T(j) p_j = \sum_{j=1}^{m'} d_{T'}(j) p_j \quad\text{and}\quad \sum_{j=m'+1}^{m} d_T(j) p_j = (i+1) \sum_{j=m'+1}^{m} p_j = (i+1) W_{m',m}. \]
Furthermore, from Lemma 5 (P1), (P2), the master nodes on level $i$ correspond to $a_{m'-z'+1}, \ldots, a_{m'}$. Thus (again also using the definition of expansion),
\[ \sum_{j=1}^{m'-z'} m_T(j) p_j = \sum_{j=1}^{m'-z'} m_{T'}(j) p_j \quad\text{and}\quad \sum_{j=m'-z'+1}^{m-z} m_T(j) p_j = \sum_{j=m'-z'+1}^{m'} p_j = W_{m'-z',m'}. \]
Then
\[ \begin{aligned} \mathrm{Cost}_0^{(i+1)}(T) &= (i+1) W'_m + \sum_{j=1}^{m} d_T(j) p_j + C \sum_{j=1}^{m-z} m_T(j) p_j \\ &= (i+1) W'_m + (i+1) W_{m',m} + \sum_{j=1}^{m'} d_{T'}(j) p_j + C \sum_{j=1}^{m'-z'} m_{T'}(j) p_j + C \sum_{j=m'-z'+1}^{m-z} m_T(j) p_j \\ &= (i+1) W'_{m'} + \sum_{j=1}^{m'} d_{T'}(j) p_j + C \sum_{j=1}^{m'-z'} m_{T'}(j) p_j + C \sum_{j=m'-z'+1}^{m-z} m_T(j) p_j \\ &= i W'_{m'} + \sum_{j=1}^{m'} d_{T'}(j) p_j + C \sum_{j=1}^{m'-z'} m_{T'}(j) p_j + W'_{m'} + C W_{m'-z',m'} \\ &= \mathrm{Cost}_0^{(i)}(T') + c_0(\alpha', \alpha). \end{aligned} \]

From Lemma 5 (P1), (P2), the leaves on level $i + 1$ of $T$ correspond to $a_{m'+1}, \ldots, a_{m-z}$. Thus (again also using the definition of expansion),
\[ \sum_{j=1}^{m'} \ell_T(j) p_j = \sum_{j=1}^{m'} \ell_{T'}(j) p_j \quad\text{and}\quad \sum_{j=m'+1}^{m} \ell_T(j) p_j = \sum_{j=m'+1}^{m-z} p_j = W_{m',m-z}. \]
Then
\[ \begin{aligned} \mathrm{Cost}_1^{(i+1)}(T) &= (i+1) W'_m + \sum_{j=1}^{m} d_T(j) p_j - C \sum_{j=1}^{m} \ell_T(j) p_j \\ &= (i+1) W'_m + (i+1) W_{m',m} + \sum_{j=1}^{m'} d_{T'}(j) p_j - C \sum_{j=1}^{m'} \ell_{T'}(j) p_j - C \sum_{j=m'+1}^{m} \ell_T(j) p_j \\ &= i W'_{m'} + \sum_{j=1}^{m'} d_{T'}(j) p_j - C \sum_{j=1}^{m'} \ell_{T'}(j) p_j + W'_{m'} - C W_{m',m-z} \\ &= \mathrm{Cost}_1^{(i)}(T') + c_1(\alpha', \alpha). \end{aligned} \] □

Combining Lemmas 7 to 9 immediately implies a direct relationship between paths in the Signature Graph and building a tree level-by-level.

Corollary 3. Fix $s \in \{0, 1\}$.
(a) Let $T \in \bar{\mathcal{T}}_s(i{:}n)$ and, for all $s \le j \le i$, set $T^{(j)} = \mathrm{Trunc}^{(j)}(T)$ and $\alpha^{(j)} = \mathrm{sig}^{(j)}\bigl(T^{(j)}\bigr)$. Then

• $\alpha^{(s)} \in I_s$; $T^{(s)} = T_s\bigl(\alpha^{(s)}\bigr)$; $\mathrm{Cost}_s^{(s)}\bigl(T^{(s)}\bigr) = \bar{c}_s\bigl(\alpha^{(s)}\bigr)$;

• $\forall s \le j < i$, $\alpha^{(j)} \to \alpha^{(j+1)}$;

• $\mathrm{Cost}_s^{(i)}\bigl(T^{(i)}\bigr) = \bar{c}_s\bigl(\alpha^{(s)}\bigr) + \sum_{j=s}^{i-1} c_s\bigl(\alpha^{(j)}, \alpha^{(j+1)}\bigr)$.

(b) Let $\bigl\{\alpha^{[j]}\bigr\}_{j=s}^{i} \subset S_n$ be such that $\alpha^{[s]} \in I_s$ and, for all $s \le j < i$, $\alpha^{[j]} \to \alpha^{[j+1]}$. Then there exists an $i$-level tree $T \in \bar{\mathcal{T}}_s(n)$ such that, using the definitions from part (a), $\alpha^{(j)} = \alpha^{[j]}$.

Note: the condition $s \le j$ reflects the fact that, from Definition 10, Lemma 7 and the explanatory note following Lemma 7, the initial condition for $T_0$ requires $j \ge 0$ and the initial condition for $T_1$ requires $j \ge 1$.

This Corollary motivates the original definition of the $\mathrm{OPT}_s(\alpha)$ tables.

Lemma 10. Fix $s \in \{0,1\}$ and define initial signatures $I_s$ with associated $\bar{c}_s(\alpha)$ for $\alpha \in I_s$ as in Definition 10. Let $\mathrm{OPT}_s(\alpha)$ and $\mathrm{Pred}_s(\alpha)$ be as introduced in Definition 5. Then, for all $\alpha \in S_n$,
\[ \mathrm{OPT}_s(\alpha) = \min \bigcup_{i \ge s} \left\{ \mathrm{Cost}_s^{(i)}(T_s) : T_s \in \bar{\mathcal{T}}_s(i{:}n) \text{ and } \mathrm{sig}^{(i)}(T_s) = \alpha \right\}. \tag{25} \]
Furthermore, an $i \ge s$ and a $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ satisfying
\[ \mathrm{sig}^{(i)}(T_s) = \alpha \quad\text{and}\quad \mathrm{Cost}_s^{(i)}(T_s) = \mathrm{OPT}_s(\alpha) \tag{26} \]
can be constructed in $O(i)$ time using the $\mathrm{Pred}_s(\,)$ entries.

Proof. Recall the interpretation of $\mathrm{OPT}_s(\alpha)$ given after Definition 5. Consider the $\alpha$ as nodes in a directed graph with edge costs defined by $c_s(\alpha', \alpha)$, except that edges from $(0;0;0)$ to $\alpha \in I_s$ have cost $\bar{c}_s(\alpha)$ and all other undefined edge costs are set to $\infty$. Then $\mathrm{OPT}_s(\alpha)$ is just the cost of the shortest path from $(0;0;0)$ to $\alpha$.

Corollary 3(a) then implies that if $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ with $\mathrm{sig}^{(i)}(T_s) = \alpha$, then there exists a path from $(0;0;0)$ to $\alpha$ with cost $\mathrm{Cost}_s^{(i)}(T_s)$.
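To make the shortest-path interpretation concrete, here is a small, simplified sketch for the $s = 0$ case (a hypothetical model, not the paper's $O(n^3)$ DP): signature nodes are generated by the transition rules (22)-(24), both initial signatures in $I_0$ enter with base cost $0$ (their trees carry no weighted codewords, cf. Figure 6), the edge cost $W'_{m'} + C\,W_{m'-z',m'}$ is the increment $c_0(\alpha', \alpha)$ from the proof of Lemma 9(c), and Dijkstra's algorithm stands in for the table-filling:

```python
import heapq

def opt0(p, C):
    """Shortest path from the initial signatures of I_0 to (n; 0; 0)."""
    n = len(p)
    Wp = lambda m: sum(p[m:])        # W'_m    = p_{m+1} + ... + p_n
    W = lambda a, b: sum(p[a:b])     # W_{a,b} = p_{a+1} + ... + p_b
    dist = {(0, 2, 0): 0.0, (1, 0, 1): 0.0}   # I_0, base costs 0
    heap = [(c, s) for s, c in dist.items()]
    heapq.heapify(heap)
    while heap:
        c, (m, q, z) = heapq.heappop(heap)    # q is the p-entry of (m; p; z)
        if c > dist.get((m, q, z), float("inf")):
            continue                          # stale heap entry
        if (m, q, z) == (n, 0, 0):
            return c                          # OPT_0(n; 0; 0)
        for el in range(q + 1):               # e_l new leaves
            for em in range(q + 1 - el):      # e_m new master nodes
                if m + el + em > n:
                    continue
                nxt = (m + el + em, z + 2 * (q - el - em), em)  # (22)-(24)
                if nxt[1] > 2 * n:            # cap p to keep the graph finite
                    continue
                nc = c + Wp(m) + C * W(m - z, m)   # edge cost c_0(alpha', alpha)
                if nc < dist.get(nxt, float("inf")):
                    dist[nxt] = nc
                    heapq.heappush(heap, (nc, nxt))
    return float("inf")
```

For example, `opt0([0.5, 0.3, 0.2], 0.7)` returns `1.5`, realized by the path $(0;2;0) \to (1;2;0) \to (3;0;0)$, i.e., the tree with $a_1$ on level 1 and $a_2, a_3$ on level 2 and no master nodes. The predecessor bookkeeping ($\mathrm{Pred}_s$) is omitted here but would be one extra entry per relaxation.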
In the other direction, Corollary 3(b) implies that if $P$ is an $i$-edge path from $(0;0;0)$ to $\alpha$, then there exists $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ with $\mathrm{sig}^{(i)}(T_s) = \alpha$ and $\mathrm{Cost}_s^{(i)}(T_s)$ equal to the cost of the path. This proves Equation (25).

The actual tree $T_s$ satisfying Equation (25) can be found by following the $\mathrm{Pred}_s(\,)$ values backwards from $\alpha$ until reaching some $\alpha' \in I_s$. This provides a path from $(0;0;0)$ to $\alpha$ with cost $\mathrm{OPT}_s(\alpha)$. This path can be translated into $T_s$ via Corollary 3(b). □

Corollary 2 then immediately implies

Corollary 4. Fix $s \in \{0,1\}$. Then
\[ \min_{T_s \in \mathcal{T}_s(n)} \{ \mathrm{Cost}_s(T_s : C) \} = \mathrm{OPT}_s(n; 0; 0). \]
Furthermore, if $i \ge s$ and $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ are such that $\mathrm{Cost}_s^{(i)}(T_s) = \mathrm{OPT}_s(n; 0; 0)$, then $T_s(C) = T_s$.

In words, the Corollary states that $T_s(C)$ can be found by filling in the $\mathrm{OPT}_s(\,)$ table and then using the $\mathrm{Pred}_s(\,)$ entries to construct the tree corresponding to $\mathrm{OPT}_s(n; 0; 0)$. Since Section 3 gives an $O(n^3)$ algorithm for filling in the $\mathrm{OPT}_s(\,)$ and $\mathrm{Pred}_s(\,)$ tables, this leads to the desired $O(n^3)$ algorithm for solving the original problem.

References

[1] Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter Shor, and Robert Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica, 2(1-4):195-208, 1987.
[2] Wolfgang Bein. Advanced techniques for dynamic programming. In Handbook of Combinatorial Optimization, pages 41-92. 2013.
[3] Rainer E. Burkard, Bettina Klinz, and Rüdiger Rudolf. Perspectives of Monge properties in optimization. Discrete Applied Mathematics, 70(2):95-161, 1996.
[4] Sze-Lok Chan and Mordecai J. Golin. A dynamic programming algorithm for constructing optimal "1"-ended binary prefix-free codes. IEEE Transactions on I.T., 46(4):1637-1644, 2000.
[5] David Eppstein, Zvi Galil, and Raffaele Giancarlo. Speeding up dynamic programming. In [Proceedings 1988] 29th Annual Symposium on Foundations of Computer Science, pages 488-496. IEEE, 1988.
[6] M. J. Golin and G. Rote. A dynamic programming algorithm for constructing optimal prefix-free codes with unequal letter costs. IEEE Transactions on I.T., 44(5):1770-1781, Sept 1998.
[7] Mordecai Golin and Elfarouk Harb. Polynomial time algorithms for constructing optimal AIFV codes. In , pages 231-240. IEEE, 2019.
[8] Mordecai Golin and Elfarouk Harb. Polynomial time algorithms for constructing optimal binary AIFV-2 codes. ArXiv:2001.11170 [cs.IT], 2020.
[9] W. Hu, H. Yamamoto, and J. Honda. Worst-case redundancy of optimal binary AIFV codes and their extended codes. IEEE Transactions on I.T., 63(8):5074-5086, Aug 2017.
[10] K. Iwata and H. Yamamoto. A dynamic programming algorithm to construct optimal code trees of AIFV codes. In , pages 641-645, Oct 2016.
[11] Donald E. Knuth. Optimum binary search trees. Acta Informatica, 1(1):14-25, 1971.
[12] Michelle L. Wachs. On an efficient dynamic programming technique of F. F. Yao. Journal of Algorithms, 10(4):518-530, 1989.
[13] Gerhard J. Woeginger. Monge strikes again: Optimal placement of web proxies in the internet. Operations Research Letters, 27(3):93-96, 2000.
[14] H. Yamamoto and X. Wei. Almost instantaneous FV codes. In , pages 1759-1763, July 2013.
[15] Hirosuke Yamamoto, Masato Tsuchihashi, and Junya Honda. Almost instantaneous fixed-to-variable length codes. IEEE Transactions on I.T., 61(12):6432-6443, 2015.
[16] F. F. Yao. Efficient dynamic programming using quadrangle inequalities. In Proceedings of the Twelfth Annual ACM Symposium on Theory of Computing (STOC'80), pages 429-435, 1980.
[17] F. F. Yao. Speed-up in dynamic programming. SIAM Journal on Algebraic Discrete Methods, 3(4):532-540, 1982.
[18] Hao Yuan and Mikhail J. Atallah. Data structures for range minimum queries in multidimensional arrays. In