Speeding up the AIFV-2 dynamic programs by two orders of magnitude using Range Minimum Queries
Mordecai Golin¹ (Hong Kong UST, [email protected]) and Elfarouk Harb (Hong Kong UST, [email protected])
Abstract
AIFV-2 codes are a new method for constructing lossless codes for memoryless sources that provide better worst-case redundancy than Huffman codes. They do this by using two code trees instead of one and also allowing some bounded delay in the decoding process. Known algorithms for constructing AIFV-2 codes are iterative; at each step they replace the current code tree pair with a "better" one. The current state of the art for performing this replacement is a pair of Dynamic Programming (DP) algorithms that use $O(n^5)$ time to fill in two tables, each of size $O(n^3)$ (where $n$ is the number of different characters in the source).

This paper describes how to reduce the time for filling in the DP tables by two orders of magnitude, down to $O(n^3)$. It does this by introducing a grouping technique that permits separating the $\Theta(n^3)$-space tables into $\Theta(n)$ groups, each of size $O(n^2)$, and then using Two-Dimensional Range Minimum Queries (RMQs) to fill in each group's table entries in $O(n^2)$ time. This RMQ speedup technique seems to be new and might be of independent interest.

Keywords:
AIFV Codes, Dynamic Programming Speedups, Range Minimum Queries
1. Introduction
Almost Instantaneous Fixed to Variable-2 (AIFV-2) codes were introduced recently in a series of papers [9, 10, 14, 15]. Similar to Huffman Codes, these provide lossless encoding for a fixed probabilistic memoryless source. They differ from Huffman codes in that they use a pair of coding trees instead of just one tree, sometimes coding using the first and sometimes using the second. They also no longer provide instantaneous decoding. Instead, decoding might require a bounded delay. That is, it might be necessary to read up to 2 extra characters after a codeword ends before certifying the completion (and decoding) of the codeword. The advantage of AIFV-2 codes over Huffman codes is that they guarantee redundancy of at most 1/2.

¹ Work partially supported by Hong Kong RGC CERG Grant 16213318
Preprint submitted to Elsevier (arXiv [cs.DS]).

Known algorithms for constructing AIFV-2 codes are iterative; at each step they replace the current code tree pair with a better pair. The original paper [15] only proved that its iterative algorithm terminated. This was improved to polynomially many steps by [7], which used only $O(b)$ iterations, where $b$ is the maximum number of bits used to encode one of the input source probabilities.

Each iterative step of [15]'s algorithm was originally implemented using an exponential time Integer Linear Program. This was later improved by [10] to $O(n^5)$ time, using Dynamic Programming (DP) to replace the ILP; $n$ is the number of different characters in the original source.

The purpose of this paper is to show how the DP method can be sped up to $O(n^3)$ time. Combined with [7], this yields an $O(n^3 b)$ time algorithm for constructing AIFV-2 codes.

Historically, there have been two major approaches to speeding up DPs. The first is the Knuth-Yao Quadrangle-Inequality method [11, 12, 16, 17]. The second is the use of "monotonicity" or the "Monge Property" and the application of the SMAWK algorithm [1] [3, Section 3.8] ([13] provides a good example of this approach). There are also variations, e.g., [5], that, while not exactly one or the other, share many of their properties. [2] provides a recent overview of the techniques available.

Both methods improve running times by "grouping" calculations. More specifically, they all essentially fill in a DP table of size $\Theta(n^k)$, for some $k$, in which calculating an individual table entry requires $\Theta(n)$ work. Thus, a priori, filling in the table seems to require $\Theta(n^{k+1})$ time. The speedups work by grouping the entries in sets of size $\Theta(n)$ and calculating all entries in the group in $\Theta(n)$ time. The Quadrangle-Inequality approach does this via amortization while the SMAWK approach does this by a transformation into another problem (matrix row-minima calculation).
Both approaches lead to a $\Theta(n)$ speedup, permitting filling in the table in an optimal $\Theta(n^k)$ time.

Both DPs in [10] have $O(n^3)$ size tables with each entry requiring $\Theta(n^2)$ individual evaluation time, leading to the $O(n^5)$ time algorithms. The main contribution of this paper is the development of new grouping techniques that permit speeding up the DPs by $\Theta(n^2)$, decreasing the running times to $O(n^3)$. More specifically, the table entries are now partitioned into $\Theta(n)$ groups, each containing $\Theta(n^2)$ entries. For each group, a $\Theta(n) \times \Theta(n)$ sized rectangular matrix $M$ is then built; calculating the value of each table entry in the group is shown to be equivalent to performing a Two-Dimensional Range Minimum (2D RMQ) query on $M$ (along with $O(1)$ extra work). Known results [18] on 2D RMQ queries imply that $O(n^2)$ queries can be implemented using a total of $O(n^2)$ time. Thus all entries in each group of size $\Theta(n^2)$ can be evaluated in $O(n^2)$ time, leading to an $O(n^3)$ time algorithm.

To the best of our knowledge this is the first time 2D RMQs have been used for speeding up Dynamic Programming in this fashion, so this technique might be of independent interest.

Section 2 quickly reviews known facts about 2D RMQs. It also introduces the two specialized versions of RMQs that will be needed and shows that they can be solved even more simply (practically) than standard RMQs. Section 3 is the main result of the paper. It states (before derivation) the two DPs of interest and then describes the new technique to reduce their evaluation from $\Theta(n^5)$ to $\Theta(n^3)$. The remainder of the paper then provides the backstory. Section 4 defines the motivating AIFV-2 problem and the technique for solving it. Finally, Section 5 describes the derivation of the AIFV-2 DPs that were solved in Section 3.
We emphasize that while these DPs are not exactly the ones introduced in [10], they are very similar and were derived using the same observations and basic tools (the top-down signature technique of [6, 4]). The derivation of these new DPs was necessary, though. Their slightly different structure is what permits successfully applying the 2D RMQ technique to them.

We conclude by noting that AIFV-2 codes were later extended to AIFV-$m$ codes by [9]. These replace the pair of coding trees by an $m$-tuple. The iterative algorithms for constructing these codes use $O(n^{2m+1})$ time DP algorithms that fill in size $O(n^{m+1})$ DP tables as subroutines. An interesting direction for future work is whether it is possible to reduce the running times of evaluating those DP tables by a factor of $\Theta(n^m)$ via the use of the corresponding $m$D RMQ algorithms from [18]. This would require a much better understanding of the structure of those DPs in [9] than currently exists.
2. Range Minimum Queries
As mentioned, the speedup in evaluating the DPs will result from grouping and then using Range Minimum Queries (RMQs). This section quickly reviews facts about RMQs for later use.
Definition 1 (2D RMQ). Let $M = (M_{i,j})$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. The two-dimensional range minimum query (2D RMQ) problem is, for $0 \le a_1 \le a_2 \le m$ and $0 \le b_1 \le b_2 \le n$, to return the value
$$RMQ(M : a_1, a_2, b_1, b_2) \triangleq \min \{ M_{i,j} : a_1 \le i \le a_2,\ b_1 \le j \le b_2 \}$$
and indices $i', j'$, $a_1 \le i' \le a_2$, $b_1 \le j' \le b_2$, such that $M_{i',j'} = RMQ(M : a_1, a_2, b_1, b_2)$. This can be solved using
Lemma 1 ([18]). Let $M = (M_{i,j})$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. There is an $O(mn)$ time algorithm to preprocess $M$ that permits answering any subsequent 2D RMQ query in $O(1)$ time.

While theoretically optimal, the algorithm in [18] is quite complicated. To make the speedup more practical to implement, we note in advance that all of the RMQ queries used later will be one of the two following specialized types:
Definition 2. Let $M = (M_{i,j})$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. Let $0 \le a \le m$, $0 \le b \le n$. See Figure 1.
• Define a restricted column query as $RCQ(M : a, b) \triangleq RMQ(M : a, m, b, b)$.
• Define a restricted RMQ query as $RRMQ(M : a, b) \triangleq RMQ(M : a, m, 0, b)$.
Figure 1: Illustration of Definition 2. $M$ is an $(m+1) \times (n+1)$ matrix. $RCQ(M : a', b')$ is the minimum of the entries in the long thin blue column descending down from $(a', b')$. $RRMQ(M : a, b)$ is the minimum of the entries in the blue rectangle with upper-right corner $(a, b)$.

Directly from the definition,
$$\forall b, \quad RCQ(M : a, b) = \begin{cases} M_{m,b} & \text{if } a = m, \\ \min\left(M_{a,b},\ RCQ(M : a+1, b)\right) & \text{if } a < m. \end{cases}$$
Thus, the values of all of the $\Theta(mn)$ possible $RCQ(M : a, b)$ queries (and the associated indices at which the minimization occurs) can be easily calculated in $\Theta(mn)$ time.

Also directly from the definitions,
$$RRMQ(M : a, b) = \begin{cases} RCQ(M : a, 0) & \text{if } b = 0, \\ \min\left(RRMQ(M : a, b-1),\ RCQ(M : a, b)\right) & \text{if } b > 0. \end{cases}$$
Thus, once all of the $RCQ(M : a, b)$ values have been precalculated, the values of all of the $\Theta(mn)$ possible $RRMQ(M : a, b)$ queries (and the associated indices at which the minimization occurs) can also be easily calculated in $\Theta(mn)$ time.

For later use we collect this in a lemma.

Lemma 2.
Let $M$ be a given $m \times n$ matrix; $0 \le i \le m$, $0 \le j \le n$. There is an $O(mn)$ time algorithm that calculates the answers to all of the possible $RCQ(M : a, b)$ and $RRMQ(M : a, b)$ queries.
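The two recurrences above translate directly into code. The following Python sketch (0-indexed, values only; carrying the minimizing indices along is analogous) precomputes every RCQ and RRMQ answer in $\Theta(mn)$ total time. The function name is ours:

```python
def precompute_rcq_rrmq(M):
    """Precompute all RCQ and RRMQ answers for a 0-indexed matrix M.

    RCQ[a][b]  = min of M[i][b]  over a <= i <= last row  (restricted column query)
    RRMQ[a][b] = min of M[i][j]  over a <= i <= last row, 0 <= j <= b
    Both tables are filled in Theta(mn) total time via the two recurrences."""
    rows, cols = len(M), len(M[0])
    INF = float("inf")
    RCQ = [[INF] * cols for _ in range(rows)]
    RRMQ = [[INF] * cols for _ in range(rows)]
    for b in range(cols):
        RCQ[rows - 1][b] = M[rows - 1][b]
        for a in range(rows - 2, -1, -1):        # work up each column
            RCQ[a][b] = min(M[a][b], RCQ[a + 1][b])
    for a in range(rows):
        RRMQ[a][0] = RCQ[a][0]
        for b in range(1, cols):                 # sweep left to right
            RRMQ[a][b] = min(RRMQ[a][b - 1], RCQ[a][b])
    return RCQ, RRMQ
```

After this $\Theta(mn)$ precomputation, every restricted query used later in the paper is a single table lookup.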
3. The Dynamic Program and its speedup

Definition 3.
Let $p_1, \ldots, p_n$ be given such that $\forall i,\ p_i > 0$ and $\sum_{i=1}^n p_i = 1$. Set
$$W_m \triangleq \sum_{j \le m} p_j, \qquad W'_m \triangleq \sum_{j > m} p_j = 1 - W_m$$
and, for $m' < m$,
$$W_{m',m} \triangleq \sum_{m' < j \le m} p_j.$$

Definition 4 (The Signature Set and costs). Let $C$ ($0 \le C \le 1$) be fixed.
• Define $S_n \triangleq \{ (m; p; z) : 0 \le z \le m \le n \text{ and } 0 \le p \le n \}$ to be the signature set for the problem of size $n$.
• Let $(m'; p'; z') \ne (m; p; z) \in S_n$. We say $(m'; p'; z')$ can be expanded into $(m; p; z)$, denoted by $(m'; p'; z') \to (m; p; z)$, if there exist $e_0, e_1$ satisfying
$$e_0, e_1 \ge 0, \quad 0 \le e_0 + e_1 \le p' \tag{1}$$
and
$$m = m' + e_0 + e_1, \tag{2}$$
$$z = e_1, \tag{3}$$
$$p = z' + 2(p' - e_0 - e_1). \tag{4}$$
• For $\alpha \in S_n$, define the immediate predecessor set of $\alpha$ to be $P(\alpha) \triangleq \{ \alpha' \in S_n : \alpha' \to \alpha \}$.
• Let $\alpha_1, \alpha_2 \in S_n$. We say that $\alpha_1$ leads to $\alpha_2$, denoted by $\alpha_1 \leadsto \alpha_2$, if there exists a path from $\alpha_1$ to $\alpha_2$ using "$\to$".
• Let $I \subset S_n$ and $\alpha \in S_n$. We say that $I \leadsto \alpha$ if $\alpha \notin I$ and there exists $\alpha' \in I$ such that $\alpha' \leadsto \alpha$.
• Let $\alpha' = (m'; p'; z')$ and $\alpha = (m; p; z)$ where $\alpha' \to \alpha$. The associated expansion costs are
$$c_0(\alpha', \alpha) \triangleq W'_{m'} + C\,W_{m'-z',m'}, \qquad c_1(\alpha', \alpha) \triangleq W'_{m'} - C\,W_{m',m-z}.$$

The two dynamic programs used in the construction of AIFV-2 codes are given in the next definition.

Definition 5 (The $OPT_s(\alpha)$ tables).
• Let $I_0 \subset S_n$ be a given initial set (independent of $n$) for the $OPT_0$ table with known values $\bar{c}_0(\alpha)$ for $\alpha \in I_0$.
Now define
$$OPT_0(\alpha) = \begin{cases} \bar{c}_0(\alpha) & \text{if } \alpha \in I_0, \\ \min_{\alpha' \in P(\alpha)} \{ OPT_0(\alpha') + c_0(\alpha', \alpha) \} & \text{if } I_0 \leadsto \alpha, \\ \infty & \text{otherwise.} \end{cases}$$
• Let $I_1 \subset S_n$ be a given initial set (independent of $n$) for the $OPT_1$ table with known values $\bar{c}_1(\alpha)$ for $\alpha \in I_1$. Now define
$$OPT_1(\alpha) = \begin{cases} \bar{c}_1(\alpha) & \text{if } \alpha \in I_1, \\ \min_{\alpha' \in P(\alpha)} \{ OPT_1(\alpha') + c_1(\alpha', \alpha) \} & \text{if } I_1 \leadsto \alpha, \\ \infty & \text{otherwise.} \end{cases}$$
• Furthermore, for $s \in \{0, 1\}$, for $\alpha \notin I_s$ with $I_s \leadsto \alpha$, set
$$Pred_s(\alpha) \triangleq \operatorname{argmin}_{\alpha' \in P(\alpha)} \{ OPT_s(\alpha') + c_s(\alpha', \alpha) \}.$$

The $\bar{c}_s(\alpha)$ for $\alpha \in I_s$ are the initial conditions for the corresponding dynamic programs. For intuition, let $G_s(n)$ be the directed graph with vertices $\alpha \in S_n$, with the cost of edge $(\alpha', \alpha)$ being the expansion cost $c_s(\alpha', \alpha)$, except that edges from $(0; 0; 0)$ to $\alpha \in I_s$ have cost $\bar{c}_s(\alpha)$ and edges that are not expansions have costs set to $\infty$. Then $OPT_s(\alpha)$ is just the cost of the shortest path from $(0; 0; 0)$ to $\alpha$ in $G_s(n)$. The actual path can be found by following the $Pred_s(\alpha)$ pointers backward from $\alpha$. By definition, the expansion costs $c_s(\alpha', \alpha)$ are all non-negative, so the $OPT_s(\alpha)$ values are all well-defined.

The next set of lemmas will imply that $G_s(n)$ is a Directed Acyclic Graph, so the recurrences define a Dynamic Program. They will also suggest an efficient grouping mechanism, leading to fast evaluation.

Lemma 3. Let $(m'; p'; z'), (m; p; z) \in S_n$. Then $(m'; p'; z') \to (m; p; z)$ if and only if all of
$$2m' + 2p' + z' = 2m + p, \tag{5}$$
$$m' + p' \ge m, \tag{6}$$
$$m' \le m - z, \tag{7}$$
$$(p', z') \ne (0, 0) \tag{8}$$
are satisfied.

Proof. First assume that $(m'; p'; z') \to (m; p; z)$. Let $e_0, e_1$ be the unique pair that satisfies (1)-(4).
Then (5) follows from
$$2m' + 2p' = 2(m - e_0 - e_1) + (p - z' + 2e_0 + 2e_1) = 2m + p - z';$$
(6) follows from
$$m' + p' \ge m' + e_0 + e_1 = m;$$
(7) follows from
$$m - z = m - e_1 = m' + e_0 \ge m'.$$
(8) follows from the fact that the combination of $(p', z') = (0, 0)$ and Definition 4 would imply $p = -2(e_0 + e_1)$. Since $p \ge 0$, this further implies $e_0 = e_1 = 0$ and thus $m = m'$ and $p = z = 0$. This would contradict $(m'; p'; z') \ne (m; p; z)$.

For the other direction assume that Equations (5)-(8) all hold. We will show that Equations (1)-(4) with $(m'; p'; z') \ne (m; p; z)$ also all hold with $e_0 = m - m' - z$ and $e_1 = z$. Equations (2) and (3) are trivially satisfied. (4) follows from
$$p = 2m' + 2p' + z' - 2m = z' - 2(m - m') + 2p' = z' - 2(e_0 + e_1) + 2p' = z' + 2(p' - e_0 - e_1).$$
Next note that $e_1 = z \ge 0$, that $e_0 = m - z - m' \ge 0$ by (7), and that by (6), $p' \ge m - m' = e_0 + e_1$, so Equation (1) holds.

It only remains to show that $(m'; p'; z') \ne (m; p; z)$. Suppose not, and $(m'; p'; z') = (m; p; z)$. Then from (2), $e_0 = e_1 = 0$, so from (3), $z' = z = 0$, and thus from (4), $p = 2p'$, implying $p' = p = 0$. But this contradicts (8).

Definition 6. For $d \ge 1$, define
$$I(d) \triangleq \{ (m; p; z) \in S_n : 2m + p = d \},$$
$$I'(d) \triangleq \{ (m'; p'; z') \in S_n : 2m' + 2p' + z' = d \text{ and } (p', z') \ne (0, 0) \}.$$

Now note that Lemma 3 can be rewritten as

Corollary 1. If $\alpha = (m; p; z) \in I(d)$ then
$$P(\alpha) = \left\{ (m'; p'; z') \in I'(d) : m' + p' \ge m \text{ and } m' \le m - z \right\} \subseteq I'(d).$$

Next note

Lemma 4. Let $d > 0$.
Then
$$I'(d) \subseteq \bigcup_{d' < d} I(d'). \tag{9}$$

Proof. Let $\alpha = (m'; p'; z') \in I'(d)$. Since the $I(d')$ partition $S_n$, there must exist some $d'$ such that $\alpha \in I(d')$. Suppose that $d \le d'$. Then
$$2m' + 2p' + z' = d \le d' = 2m' + p',$$
implying $p' + z' \le 0$, so $(p', z') = (0, 0)$, contradicting $\alpha \in I'(d)$. Thus $d' < d$. Since this is true for all $\alpha \in I'(d)$, Equation (9) follows.

Corollary 1 and Lemma 4 together imply that the $OPT_s(\alpha)$ tables can be evaluated in the order $\alpha \in I(d)$ for $d = 1, 2, \ldots$. This ordering guarantees that when $OPT_s(\alpha)$ is being calculated, all of the $OPT_s(\alpha')$ entries for which $\alpha' \in P(\alpha)$ have been previously calculated.

For many $\alpha$, $|P(\alpha)| = \Theta(n^2)$, so calculating $OPT_s(\alpha)$ would require $\Theta(n^2)$ time. Since $|S_n| = \Theta(n^3)$, this would imply an $O(n^5)$ time algorithm for filling in the entire table. This is similar to the $O(n^5)$ derivation in [10]. We now show how to reduce this down to $O(n^3)$ using RMQs and Lemma 2.

Figure 2: The transformation from $(m', p')$ to $(j, i)$ described in the text. From Definition 6, if $(m'; p'; z') \in I'(d)$ then $z' = d - 2m' - 2p'$ is uniquely determined by $(m', p')$. In (a), the right triangle bounded by vertices $(0, 0)$, $(0, r)$ and $(r, 0)$ with $r = \lfloor d/2 \rfloor$ is the location of all $(p', m')$ pairs such that $(m'; p'; z') \in I'(d)$. The blue shaded parallelogram is the location of all $(p', m')$ pairs such that $(m'; p'; z') \in P(\alpha)$ for some $\alpha = (m; p; z) \in I(d)$. (b) illustrates the transformation $(j, i) = (m', m' + p')$. Note how the blue parallelogram becomes a rectangle, permitting the use of a 2D RRMQ query.

The sped-up $O(n^3)$ algorithm works in batched stages. In stage $d$, the algorithm calculates $OPT_s(\alpha)$ for all $\alpha \in I(d)$. It first spends $O(n^2)$ time building an associated matrix $M^{(d)}$ and then reduces the calculation of each $OPT_s(\alpha)$ to a 2D RMQ query (and possibly $O(1)$ extra work).

Before starting we quickly note a small technical issue concerning the DP initial conditions. Let
$$\bar{d}_s = \max_{\alpha = (m; p; z) \in I_s} 2m + p.$$
The starting stage of the algorithm is just to calculate $OPT_s(\alpha)$ for all $\alpha \in I(d)$ with $d = 1, \ldots, \bar{d}_s$. Calculating all of these requires only $O(1)$ time.

We now first describe the complete solution for $OPT_0$, which will be easier, and then discuss the modifications needed for $OPT_1$.

Assume then that, for some $d > \bar{d}_0$, $OPT_0(\alpha')$ is already known for all $\alpha' \in I(d')$, where $d' < d$. If $\alpha = (m; p; z) \in I(d)$ then, by definition,
$$OPT_0(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_0(\alpha') + W'_{m'} + C\,W_{m'-z',m'} \right\} \tag{10}$$
where all the $OPT_0(\alpha')$ for $\alpha' \in P(\alpha)$ are already known.

Recall that there are $O(n^2)$ signatures $\alpha' = (m'; p'; z') \in I'(d)$. The idea is to arrange the corresponding $O(n^2)$ values $OPT_0(m'; p'; z') + W'_{m'} + C\,W_{m'-z',m'}$ in an array $M^{(d)}_{i,j}$ in such a way that, for each individual $\alpha \in I(d)$, the minimization in Equation (10) can be performed using just one 2D RMQ query in $M^{(d)}_{i,j}$. The arrangement will use the invertible transformation (see Figure 2)
$$j = m' \quad \text{and} \quad i = m' + p'.$$
Note that $p' \ge 0$ implies $j \le i$.
Furthermore,
$$(m'; p'; z') \in I'(d) \Rightarrow d = 2m' + 2p' + z' = 2i + z',$$
which in turn implies $2i \le d$ and $z' = d - 2i$. Set $r = \lfloor d/2 \rfloor$. Then
$$(m'; p'; z') \in I'(d) \Rightarrow 0 \le j \le i \le r \text{ and } (m'; p'; z') = (j;\, i - j;\, d - 2i). \tag{11}$$
Furthermore, working backwards,
$$0 \le j \le i \le r \text{ and } (i - j,\, d - 2i) \ne (0, 0) \Rightarrow (j;\, i - j;\, d - 2i) \in I'(d), \tag{12}$$
where the second condition comes from the fact that $(m'; p'; z') \notin I'(d)$ if $(p', z') = (0, 0)$.

Now define the $(r + 1) \times (r + 1)$ matrix (indices $i$ and $j$ start at 0)
$$M^{(d)}_{i,j} \triangleq \begin{cases} \infty & \text{if } j > i \text{ or } (j;\, i - j;\, d - 2i) \notin S_n, \\ \infty & \text{if } i = j = d/2, \\ OPT_0(j;\, i - j;\, d - 2i) + W'_j + C\,W_{j - (d - 2i),\, j} & \text{otherwise.} \end{cases}$$
Since all the values referenced are already known, this matrix can be built in $O(r^2) = O(n^2)$ time.

Then, if $\alpha = (m; p; z) \in I(d)$, from Corollary 1,
$$OPT_0(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_0(\alpha') + W'_{m'} + C\,W_{m'-z',m'} \right\} = \min \left\{ M^{(d)}_{i,j} : i = m' + p' \ge m \text{ and } j = m' \le m - z \right\} = RMQ\left(M^{(d)} : m, r, 0, m - z\right) = RRMQ\left(M^{(d)} : m, m - z\right).$$
Note that the RRMQ query result also provides the indices of the minimizing entry, which provides the corresponding $Pred_0(\alpha)$ value as well.

Lemma 2 permits calculating all the $O(r^2)$ values $RRMQ\left(M^{(d)} : a, b\right)$ in $O(r^2) = O(n^2)$ time. Thus, all of the $OPT_0(\alpha)$ for $\alpha \in I(d)$ (and their corresponding $Pred_0(\alpha)$ values) can be calculated in $O(n^2)$ total time. Doing this for all $O(n)$ values of $d > \bar{d}_0$ in increasing order yields the required $O(n^3)$ time algorithm for filling in the $OPT_0$ matrix.

We next describe the more complicated algorithm for the $OPT_1$ case. Assume that $OPT_1(\alpha')$ is already known for all $\alpha' \in I(d')$, $d' < d$.
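To see concretely why the change of variables $(j, i) = (m', m' + p')$ turns Corollary 1's predecessor set into an axis-aligned rectangle, one can enumerate $P(\alpha)$ both ways and check that the two scans agree. A small Python sketch (the function names are ours; cells holding signatures outside $S_n$ are skipped, mirroring the $\infty$ entries of $M^{(d)}$):

```python
def preds_by_definition(n, m, p, z):
    """P((m;p;z)) computed straight from Definition 4: for each candidate
    signature (m';p';z'), search for witnesses e0 (e1 = z is forced by (3))."""
    preds = set()
    for mp in range(n + 1):
        for pp in range(n + 1):
            for zp in range(mp + 1):
                if (mp, pp, zp) == (m, p, z):
                    continue
                for e0 in range(pp + 1):
                    if (e0 + z <= pp and m == mp + e0 + z
                            and p == zp + 2 * (pp - e0 - z)):
                        preds.add((mp, pp, zp))
    return preds

def preds_by_rectangle(n, m, p, z):
    """Same set via Corollary 1 after the change of variables j = m',
    i = m' + p': a rectangle scan m <= i <= r, 0 <= j <= m - z."""
    d = 2 * m + p
    r = d // 2
    preds = set()
    for i in range(m, r + 1):
        for j in range(0, m - z + 1):
            mp, pp, zp = j, i - j, d - 2 * i
            if pp >= 0 and pp <= n and 0 <= zp <= mp and (pp, zp) != (0, 0):
                preds.add((mp, pp, zp))
    return preds
```

For example, for $\alpha = (2; 2; 0)$ with $n = 4$ both scans yield $\{(2;0;2), (0;3;0), (1;2;0), (2;1;0)\}$.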
If $\alpha = (m; p; z) \in I(d)$ then, similar to the $OPT_0$ case,
$$OPT_1(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\}$$
where all the $OPT_1(\alpha')$ for $\alpha' \in P(\alpha)$ are already known.

Following the approach in the $OPT_0$ algorithm, for fixed $d$, we would like to arrange the $O(n^2)$ values $OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z}$ for $\alpha' = (m'; p'; z') \in I'(d)$ appropriately in an array so that each $OPT_1(\alpha)$ entry could be resolved using one 2D RMQ query. The difficulty is that the values of the array entries depend upon both $\alpha$ and $\alpha'$. More specifically, the $C\,W_{m',m-z}$ term would have to be reprocessed for each $(m, z)$ pair. Thus, no fixed $M_{i,j}$ array, independent of $(m, z)$, could be defined.

Instead, we utilize a relationship between different queries. More specifically, let $\alpha = (m; p; z) \in I(d)$. From Equation (7), $z \le m - m' \le m$. If $z = m$, then $m' = 0$, so $W_{m',m-z} = W_{0,0} = 0$ and
$$OPT_1(\alpha) = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\} = \min_{\alpha' = (m'; p'; z') \in P(\alpha)} \left\{ OPT_1(\alpha') + W'_{m'} \right\}.$$
If $z < m$ then, splitting into the cases $m' = m - z$ and $m' \le m - z - 1$, $OPT_1(\alpha) = \min(A, B)$ where
$$A \triangleq \min_{\substack{\alpha' = (m'; p'; z') \in P(\alpha) \\ m' = m - z}} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\}, \qquad B \triangleq \min_{\substack{\alpha' = (m'; p'; z') \in P(\alpha) \\ m' \le m - z - 1}} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-z} \right\}.$$
First note that if $m' = m - z$, then $W_{m',m-z} = 0$, so
$$A = \min_{\substack{\alpha' = (m'; p'; z') \in I'(d) \\ m' + p' \ge m,\ m' = m - z}} \left\{ OPT_1(\alpha') + W'_{m'} \right\}.$$
Next note that, from Corollary 1,
$$P((m; p; z)) \cap \{ (m'; p'; z') : m' \le m - z - 1 \} = \left\{ (m'; p'; z') \in I'(2m + p) : m' + p' \ge m \text{ and } m' \le m - z - 1 \right\} = P((m; p; z + 1))$$
and from Definition 3, $W_{m',m-z} = p_{m-z} + W_{m',m-(z+1)}$. Thus
$$B = -C\,p_{m-z} + \min_{\alpha' = (m'; p'; z') \in P((m; p; z + 1))} \left\{ OPT_1(\alpha') + W'_{m'} - C\,W_{m',m-(z+1)} \right\} = OPT_1(m; p; z + 1) - C\,p_{m-z}.$$

Again use the same transformation $j = m'$ and $i = m' + p'$, so that Equations (11) and (12) apply. Set $r = \lfloor d/2 \rfloor$ and define the $(r + 1) \times (r + 1)$ array
$$M^{[d]}_{i,j} \triangleq \begin{cases} \infty & \text{if } j > i \text{ or } (j;\, i - j;\, d - 2i) \notin S_n, \\ \infty & \text{if } i = j = d/2, \\ OPT_1(j;\, i - j;\, d - 2i) + W'_j & \text{otherwise.} \end{cases}$$
By Lemma 2 we can calculate all the $O(r^2)$ values $RCQ\left(M^{[d]} : a, b\right)$ in $O(r^2) = O(n^2)$ time.

Let $\alpha = (m; p; z) \in I(d)$. Then, from the discussion above, if $z = m$,
$$OPT_1(\alpha) = \min_{\substack{\alpha' = (m'; p'; z') \in I'(d) \\ m' + p' \ge m,\ m' = 0}} \left\{ OPT_1(\alpha') + W'_{m'} \right\} = \min \left\{ M^{[d]}_{i,j} : i \ge m \text{ and } j = 0 \right\} = RMQ\left(M^{[d]} : m, r, 0, 0\right) = RCQ\left(M^{[d]} : m, 0\right),$$
which is already known. If $z < m$,
$$A = \min_{\substack{\alpha' = (m'; p'; z') \in I'(d) \\ m' + p' \ge m,\ m' = m - z}} \left\{ OPT_1(\alpha') + W'_{m'} \right\} = \min \left\{ M^{[d]}_{i,j} : i \ge m \text{ and } j = m - z \right\} = RMQ\left(M^{[d]} : m, r, m - z, m - z\right) = RCQ\left(M^{[d]} : m, m - z\right).$$
Thus, for $\alpha = (m; p; z)$ with $z < m$,
$$OPT_1(\alpha) = \min(A, B) = \min\left( RCQ\left(M^{[d]} : m, m - z\right),\ OPT_1(m; p; z + 1) - C\,p_{m-z} \right), \tag{13}$$
which, since $RCQ\left(M^{[d]} : m, m - z\right)$ is already known, can be calculated in $O(1)$ time if $OPT_1(m; p; z + 1)$ has already been calculated.
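The resulting stage can be sketched schematically in Python. The inputs below are assumptions standing in for quantities the surrounding algorithm provides: `rcq[a][b]` holds the precomputed $RCQ(M^{[d]} : a, b)$ values for this stage's matrix (Lemma 2), `probs[i]` holds $p_i$ (1-indexed, `probs[0]` unused), and `opt1` is the table being filled, keyed by signatures $(m, p, z)$:

```python
def opt1_stage(d, n, probs, C, rcq, opt1):
    """Schematic sketch of one batched OPT_1 stage.

    For each fixed (m, p) with 2m + p = d, the z = m entry is a single
    restricted column query; the remaining entries are filled in
    decreasing z, so Equation (13) costs O(1) work per entry."""
    for m in range(d // 2 + 1):
        p = d - 2 * m
        if p > n:                            # outside the signature set S_n
            continue
        opt1[(m, p, m)] = rcq[m][0]          # the z = m case
        for z in range(m - 1, -1, -1):       # Equation (13)
            A = rcq[m][m - z]
            B = opt1[(m, p, z + 1)] - C * probs[m - z]
            opt1[(m, p, z)] = min(A, B)
    return opt1
```

The two nested loops touch each signature of the stage exactly once, which is the source of the claimed $O(n^2)$ per-stage bound.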
The associated $Pred_1(\alpha)$ can be found appropriately. This permits calculating $OPT_1(\alpha)$ (and the corresponding $Pred_1(\alpha)$ values) for all $\alpha = (m; p; z) \in I(d)$ in a total of $O(n^2)$ time as follows:
1. First spend $O(n^2)$ time calculating all the $RCQ\left(M^{[d]} : a, b\right)$ values.
2. For each of the $O(n)$ possible fixed pairs $m, p$ satisfying $2m + p = d$:
(a) Set $OPT_1(m; p; m) = RCQ\left(M^{[d]} : m, 0\right)$.
(b) Then, for $z = m - 1, m - 2, \ldots, 0$, calculate $OPT_1(m; p; z)$ in $O(1)$ time from $OPT_1(m; p; z + 1)$ using Equation (13).

Since this is $O(n^2)$ time for fixed $d$, doing this for all $O(n)$ values of $d > \bar{d}_1$ in increasing order yields the required $O(n^3)$ time algorithm for filling in the $OPT_1$ matrix.

Figure 3: A binary AIFV-2 code for $X = \{a, b, c, d\}$ with associated probabilities, showing the trees $T_0, T_1$, their master and slave nodes, the switching probabilities $q(T_0), q(T_1)$, and the average codeword lengths $L(T_0), L(T_1)$. In the encoding of $b\,d\,b\,c\,a\,a$, the symbols $d$, $c$ and the first $a$ were encoded using $T_0$ while the other letters were encoded using $T_1$. The resulting cost $L_{AIFV}(T_0, T_1)$ is lower than $L(\text{Huffman}_X)$, the cost of the optimal Huffman code for the same source.

4. A Quick Introduction to AIFV-2 codes

Note: This introduction is copied, with some small modifications, from [8].

Let $X$ be a memoryless source over a finite alphabet $\mathcal{X}$ of size $n$. $\forall a_i \in \mathcal{X}$, let $p_i = P_X(a_i)$ denote the probability of $a_i$ occurring. Without loss of generality we assume that
$$p_1 \ge p_2 \ge \cdots \ge p_n > 0, \qquad \sum_{i=1}^n p_i = 1.$$
A codeword $c$ of a binary AIFV code is a string in $\{0, 1\}^*$. $|c|$ will denote the length of codeword $c$.

We now briefly describe the structure of Binary AIFV-2 codes using the terminology of [9].
See [9] for more details and Figure 3 for an example. Codes are represented via binary trees with left edges labelled by "0" and right edges by "1". A Binary AIFV-2 code is a pair of binary code trees, $T_0, T_1$, satisfying:
• Complete internal nodes in $T_0$ and $T_1$ have both left and right children. Incomplete internal nodes (with the unique exception of the left child of the root of $T_1$) have only a "0" (left) child. Incomplete internal nodes are labelled as either master nodes or slave nodes.
• A master node must be an incomplete node with an incomplete child. The child of a master node is a slave node. This implies that a master node is connected to its unique grandchild via "00", with the intermediate node being a slave node.
• Each source symbol is assigned to one node in $T_0$ and one node in $T_1$. The nodes to which they are assigned are either leaves or master nodes. Symbols are not assigned to complete internal nodes or slave nodes.
• The root of $T_1$ is complete and its "0" child is a slave node. The root of $T_0$ has no "00" grandchild.

Let $c_s(a)$, $s \in \{0, 1\}$, denote the codeword of $a \in \mathcal{X}$ encoded by $T_s$. The encoding procedure for a sequence $x_1, x_2, \ldots$ of source symbols works as follows:
0. Set $s_1 = 0$ and $j = 1$.
1. Encode $x_j$ as $c_{s_j}(x_j)$.
2. If $c_{s_j}(x_j)$ is a leaf in $T_{s_j}$, then set $s_{j+1} = 0$; else set $s_{j+1} = 1$. (The latter occurs when $c_{s_j}(x_j)$ is a master node in $T_{s_j}$.)
3. Set $j = j + 1$ and go to 1.

Note that a symbol is encoded using $T_0$ if and only if its predecessor was encoded using a leaf node, and it is encoded using $T_1$ if and only if its predecessor was encoded using a master node. The decoding procedure is a straightforward reversal of the encoding procedure. Details are provided in [14] and [10].
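The encoding procedure above is a simple two-state machine. A minimal Python sketch, assuming hypothetical input dictionaries describing a code-tree pair (`codewords[s][x]` gives the codeword of symbol `x` in tree $T_s$ and `is_master[s][x]` says whether `x` sits at a master node; the concrete values used in the test are invented placeholders, not a valid AIFV-2 code):

```python
def aifv2_encode(symbols, codewords, is_master):
    """Run the AIFV-2 encoding loop: emit each symbol's codeword from the
    current tree, then switch to T_1 iff the symbol sat at a master node."""
    s = 0                               # step 0: start with tree T_0
    out = []
    for x in symbols:                   # step 1: encode x_j with T_s
        out.append(codewords[s][x])
        # step 2: next tree is T_1 iff x was encoded by a master node
        s = 1 if is_master[s][x] else 0
    return "".join(out)
```

The state variable `s` plays the role of $s_j$; decoding reverses the same state machine.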
The important observation is that identifying the end of a codeword might first require reading an extra two bits past its ending, resulting in a two-bit delay, so decoding is not instantaneous.

Following [14], we can now derive the average codeword length of a binary AIFV-2 code defined by trees $T_0, T_1$. The average codeword length $L(T_s)$ of $T_s$, $s \in \{0, 1\}$, is
$$L(T_s) = \sum_{i=1}^n |c_s(a_i)|\, p_i.$$
If the current symbol $x_j$ is encoded by a leaf (resp. a master node) of $T_{s_j}$, then the next symbol $x_{j+1}$ is encoded by $T_0$ (resp. $T_1$). This process can be modelled as a two-state Markov chain with the state being the current encoding tree. Denote the transition probabilities for switching from code tree $T_s$ to $T_{s'}$ by $q_{s'}(T_s)$. Then, from the definition of the code trees and the encoding/decoding protocols:
$$q_0(T_s) = \sum_{a \in \mathcal{L}_{T_s}} P_X(a) \quad \text{and} \quad q_1(T_s) = \sum_{a \in \mathcal{M}_{T_s}} P_X(a),$$
where $\mathcal{L}_{T_s}$ (resp. $\mathcal{M}_{T_s}$) denotes the set of source symbols $a \in \mathcal{X}$ that are assigned to a leaf node (resp. a master node) in $T_s$.

Given a binary AIFV-2 code $T_0, T_1$, as the number of symbols being encoded approaches infinity, the stationary probability of using code tree $T_s$ can then be calculated to be
$$P(s \mid T_0, T_1) = \frac{q_s(T_{\hat{s}})}{q_1(T_0) + q_0(T_1)} \tag{14}$$
where $\hat{s} \in \{0, 1\}$, $s \ne \hat{s}$. The average codeword length (asymptotically, as the number of characters encoded goes to infinity) of a binary AIFV-2 code is then
$$L_{AIFV}(T_0, T_1) = P(0 \mid T_0, T_1)\, L(T_0) + P(1 \mid T_0, T_1)\, L(T_1). \tag{15}$$

Algorithm 1: Iterative algorithm to construct an optimal binary AIFV-2 code [15, 10]
1: $m \leftarrow 0$; $C^{(0)} = 2 - \log_2 3$
2: repeat
3: $\quad m \leftarrow m + 1$
4: $\quad T_0^{(m)} = \operatorname{argmin}_{T_0 \in \mathcal{T}_0(n)} \{ L(T_0) + C^{(m-1)} q(T_0) \}$
5: $\quad T_1^{(m)} = \operatorname{argmin}_{T_1 \in \mathcal{T}_1(n)} \{ L(T_1) - C^{(m-1)} q(T_1) \}$
6: $\quad$ Update $C^{(m)} = \dfrac{L\left(T_1^{(m)}\right) - L\left(T_0^{(m)}\right)}{q\left(T_0^{(m)}\right) + q\left(T_1^{(m)}\right)}$
7: until $C^{(m)} = C^{(m-1)}$
8: Set $C^* = C^{(m)}$.
The optimal binary AIFV-2 code is $T_0^{(m)}, T_1^{(m)}$.

[14, 15] showed that the binary AIFV-2 code $T_0, T_1$ minimizing Equation (15) can be obtained by Algorithm 1, in which $\mathcal{T}_0(n)$ (resp. $\mathcal{T}_1(n)$) is the set of all possible $T_0$ (resp. $T_1$) coding trees. [15] implemented the minimizations (over all coding trees) in lines 4 and 5 as an ILP. In a later paper [10], the authors replaced this ILP with an $O(n^5)$ time and $O(n^3)$ space DP that modified a top-down tree-building DP from [6, 4].

[10, 15] proved algebraically that Algorithm 1 would terminate after a finite number of steps and that the resulting tree pair $T_0^{(m)}, T_1^{(m)}$ is an optimal Binary AIFV-2 code. They were unable, though, to provide any bounds on the number of steps needed for termination. [7] then gave two new iterative algorithms that provably terminated in $O(b)$ iterations, where $b$ is the maximum number of bits required to store any of the probabilities $p_i$ (so these were weakly polynomial algorithms). More formally, let $o_i, b_i$ be such that $p_i = o_i\, 2^{-b_i}$ where $o_i < 2^{b_i}$ is an odd positive integer. Then $b = \max_i b_i$.

Each iteration step of [7]'s algorithm ran $O(1)$ of the DPs from [10], so its full algorithm for constructing optimal AIFV-2 codes ran in $O(n^5 b)$ time. The results of this paper replace the $O(n^5)$-time DPs with $O(n^3)$-time DPs, leading to $O(n^3 b)$-time algorithms for constructing optimal AIFV-2 codes.

We conclude this section by noting that the correctness of the DPs defined in both [10] and the next section assumes that $0 \le C^{(i)} \le 1$. The need for this assumption was implicit in [10] and is made explicit in Lemma 5 in the next section. The validity of this assumption was proven in [8].

5. Deriving the DP

Each iteration step in both [10] and [7] requires finding trees that satisfy
$$T_0(C) \triangleq \operatorname{argmin}_{T_0 \in \mathcal{T}_0(n)} \{ Cost_0(T_0 : C) \}, \tag{16}$$
$$T_1(C) \triangleq \operatorname{argmin}_{T_1 \in \mathcal{T}_1(n)} \{ Cost_1(T_1 : C) \}, \tag{17}$$
where
$$Cost_0(T_0 : C) \triangleq L(T_0) + C\, q(T_0), \tag{18}$$
$$Cost_1(T_1 : C) \triangleq L(T_1) - C\, q(T_1). \tag{19}$$
(19)

Since $C$ is fixed at any iteration stage, we simplify our notation by assuming $C$ fixed and writing $\mathrm{Cost}_0(T)$ and $\mathrm{Cost}_1(T)$ to denote Equations (18) and (19).

Definition 7. Let $T$ be a binary AIFV coding tree. Define, $\forall a_i \in \mathcal{X}$,
\[ c_T(a_i) \triangleq \text{the codeword in } T \text{ associated with } a_i, \qquad d_T(i) \triangleq |c_T(a_i)|. \]
By the natural correspondence, $d_T(i)$ is the depth of the node in $T$ associated with $a_i$, so $L(T) = \sum_{i=1}^{n} d_T(i) p_i$. Further define, $\forall a_i \in \mathcal{X}$,
\[ m_T(i) \triangleq \begin{cases} 1 & \text{if } c_T(a_i) \text{ is a master node in } T, \\ 0 & \text{if } c_T(a_i) \text{ is a leaf in } T, \end{cases} \qquad \ell_T(i) \triangleq \begin{cases} 0 & \text{if } m_T(i) = 1, \\ 1 & \text{if } m_T(i) = 0. \end{cases} \]
$m_T(i)$ and $\ell_T(i)$ are indicator functions as to whether $a_i$ is encoded by a master node or a leaf in $T$, so, $\forall i$, $m_T(i) + \ell_T(i) = 1$. Note that, using this new notation,
\[ \mathrm{Cost}_0(T) = \sum_{i=1}^{n} d_T(i) p_i + C \sum_{i=1}^{n} m_T(i) p_i, \qquad \mathrm{Cost}_1(T) = \sum_{i=1}^{n} d_T(i) p_i - C \sum_{i=1}^{n} \ell_T(i) p_i. \]

We now show that when $0 \le C \le 1$, $T_0(C)$ and $T_1(C)$ can be assumed to possess a nice ordered structure.

Lemma 5. Let $0 \le C \le 1$. Then, if $s = 0$ (resp. $s = 1$) there exists a tree $T_0(C) \in \mathcal{T}_0(n)$ (resp. $T_1(C) \in \mathcal{T}_1(n)$) satisfying Equation (16) (resp. Equation (17)) that, for all $i < j$, satisfies the following two properties:

(P1) $d_{T_s}(i) \le d_{T_s}(j)$.

(P2) If $d_{T_s}(i) = d_{T_s}(j)$ and $m_{T_s}(i) = 1$ then $m_{T_s}(j) = 1$.

Proof. We say that $T_0 = T_0(C)$ (resp. $T_1 = T_1(C)$) is a minimum cost tree (for $s = 0$, resp. $s = 1$) if it satisfies Equation (16) (resp. (17)). The proof follows from swapping arguments. "Swapping" $i$ and $j$ means assigning the old codeword $c_{T_s}(a_i)$ to $a_j$ and vice-versa.
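As a quick numerical sanity check of this swapping operation, the sketch below (a hypothetical list encoding of codeword depths and master flags, not the paper's code) compares $\mathrm{Cost}_0$ before and after a swap and verifies the cost-change identity used in this proof:

```python
# Sanity check of the swapping argument (hypothetical encoding, not the
# paper's code): a tree is modelled by depths d, master flags m and
# probabilities p; swapping i and j exchanges their codewords.

def cost0(d, m, p, C):
    # Cost_0(T) = sum_i d_T(i) p_i + C * sum_i m_T(i) p_i
    return sum(di * pi for di, pi in zip(d, p)) + \
        C * sum(pi for mi, pi in zip(m, p) if mi)

def swap(seq, i, j):
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return out

d = [1, 3, 2]                 # codeword depths
m = [False, True, False]      # a_2 is encoded by a master node
p = [0.5, 0.3, 0.2]           # probabilities, sorted decreasingly
C = 0.7
i, j = 1, 2                   # an inversion: i < j but d[i] > d[j]

before = cost0(d, m, p, C)
after = cost0(swap(d, i, j), swap(m, i, j), p, C)
# identity: Cost_0(T') = Cost_0(T) - (d(i) - d(j))(p_i - p_j) + delta(i, j);
# here m(i) = 1 and l(j) = 1, so delta(i, j) = -C (p_i - p_j)
delta = -C * (p[i] - p[j])
assert abs(after - (before - (d[i] - d[j]) * (p[i] - p[j]) + delta)) < 1e-12
```

Since $p_i \ge p_j$ and $0 \le C \le 1$, both correction terms are non-positive in this example, which is exactly why removing inversions never increases the cost.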
Let $T'_s$ be the tree resulting from swapping $i$ and $j$. The following observation is a straightforward calculation:
\[ \mathrm{Cost}_s(T'_s) = \mathrm{Cost}_s(T_s) - (d_{T_s}(i) - d_{T_s}(j))(p_i - p_j) + \delta(i,j), \]
where
\[ \delta(i,j) \triangleq \begin{cases} 0 & \text{if } m_{T_s}(i) = m_{T_s}(j), \\ -C(p_i - p_j) & \text{if } m_{T_s}(i) = 1 \text{ and } \ell_{T_s}(j) = 1, \\ C(p_i - p_j) & \text{if } \ell_{T_s}(i) = 1 \text{ and } m_{T_s}(j) = 1. \end{cases} \]

We say that $(i,j)$ is an inversion for $T_s$ if $i < j$ and $d_{T_s}(i) > d_{T_s}(j)$. The calculations above and the fact that $0 \le C \le 1$ immediately imply that if $(i,j)$ is an inversion for $T_s$ then
\[ \mathrm{Cost}_s(T'_s) \le \mathrm{Cost}_s(T_s). \]

Now let $T_s$ be a minimum cost tree for $s$ that has the minimum number of inversions among all such trees. If no inversion exists, then $T_s$ satisfies (P1). Otherwise, let $(i,j)$ be the inversion that minimizes $j - i$. Swapping $i$ and $j$ decreases the number of inversions by 1 while not increasing the cost of the tree, contradicting the definition of $T_s$. We may therefore assume that $T_s$ contains no inversion and satisfies (P1).

Now say that $(i,j)$ is an $m\ell$-inversion in $T_s$ if $i < j$, $d_{T_s}(i) = d_{T_s}(j)$, $m_{T_s}(i) = 1$ and $\ell_{T_s}(j) = 1$. Let $T_s$ be a minimum cost tree for $s$ that satisfies (P1) and has the fewest $m\ell$-inversions. If no $m\ell$-inversion exists, then $T_s$ also satisfies (P2), so the lemma is correct. Otherwise, let $(i,j)$ be an $m\ell$-inversion that minimizes $j - i$. Let $T'_s$ be the tree that results from swapping $i$ and $j$. Then $T'_s$ still satisfies (P1), but the number of $m\ell$-inversions decreases by 1 while
\[ \mathrm{Cost}_s(T'_s) = \mathrm{Cost}_s(T_s) - C(p_i - p_j) \le \mathrm{Cost}_s(T_s). \]
This contradicts the definition of $T_s$. We may therefore assume that $T_s$ contains no $m\ell$-inversions and satisfies both (P1) and (P2). □

The consequences of Lemma 5 can be seen in Figure 4. The Lemma implies that the optimization in Equation (16) (resp.
Equation (17)) can be restricted to trees that satisfy Properties (P1) and (P2). In particular, the indices of codewords on a level are smaller than the indices of codewords on deeper levels. Also, on any given level, the indices of the leaves are smaller than the indices of the master nodes. We therefore henceforth assume that all trees in $\mathcal{T}_0(n)$ and $\mathcal{T}_1(n)$ satisfy these properties.

Definition 8 (Partial Trees and Truncation). See Figure 5.

• A partial binary AIFV code tree (partial tree for short) $T$ is one that satisfies all of the conditions of a binary AIFV code tree and properties (P1), (P2) except that it contains $m \le n$ codewords. By (P1), the $m \le n$ codewords it contains are $c_T(a_1), \ldots, c_T(a_m)$.

• For $s \in \{0,1\}$, let $\bar{\mathcal{T}}_s(n)$ denote the set of partial trees that satisfy the conditions of $T_s$ trees. For notational convenience, also set $\mathcal{T}(n) \triangleq \mathcal{T}_0(n) \cup \mathcal{T}_1(n)$ and $\bar{\mathcal{T}}(n) \triangleq \bar{\mathcal{T}}_0(n) \cup \bar{\mathcal{T}}_1(n)$.

• $T \in \bar{\mathcal{T}}(n)$ is $i$-level if $\mathrm{depth}(T) \le i + 1$. Set $\bar{\mathcal{T}}_s(i{:}n) \triangleq \{ T_s \in \bar{\mathcal{T}}_s(n) : T_s \text{ is } i\text{-level} \}$ and $\bar{\mathcal{T}}(i{:}n) \triangleq \bar{\mathcal{T}}_0(i{:}n) \cup \bar{\mathcal{T}}_1(i{:}n)$.

• Let $T \in \mathcal{T}(n)$. The $i$-level truncation of $T$, denoted by $\mathrm{Trunc}^{(i)}(T)$, is the partial tree that remains after removing all nodes at depth $j > i + 1$ from $T$.

Figure 4: Black nodes are leaves, gray nodes master nodes and blue ones slave nodes.
Note that on every level, the indices of the leaves are smaller than the indices of the master nodes. Also note that in all cases, if $\mathrm{sig}^{(i)}(T_s) = (m'; p'; z')$ and $\mathrm{sig}^{(i+1)}(T_s) = (m; p; z)$ then $2m' + 2p' + z' = 2m + p$, $m' + p' \ge m$ and $m' \le m - z$, as required by Lemma 3.

Note: $\forall T \in \bar{\mathcal{T}}(n)$, $\mathrm{Trunc}^{(i)}(T) \in \bar{\mathcal{T}}(i{:}n)$.

Definition 9 (Signatures and Costs). See Figures 4 and 5.

(a) $i$-level Signatures: The $i$-level signature of $T$ is the ordered triple $\mathrm{sig}^{(i)}(T) \triangleq (m; p; z)$ where
\[ m \triangleq |\{ j : d_T(j) \le i \}| = \text{the number of codewords on levels} \le i \text{ of } T, \]
\[ p \triangleq \text{the number of non-slave nodes on level } i + 1 \text{ of } T, \]
\[ z \triangleq |\{ j : d_T(j) = i \text{ and } m_T(j) = 1 \}| = \text{the number of master nodes on level } i \text{ of } T. \]
Note that $\mathrm{sig}^{(i)}(T) = \mathrm{sig}^{(i)}\bigl(\mathrm{Trunc}^{(i)}(T)\bigr)$.

(b) $i$-level Costs: Let $\mathrm{sig}^{(i)}(T) = (m; p; z)$. The $i$-level costs of $T$ are
\[ \mathrm{Cost}_0^{(i)}(T) \triangleq i W'_m + \sum_{j=1}^{m} d_T(j) p_j + C \sum_{j=1}^{m-z} m_T(j) p_j \]
and
\[ \mathrm{Cost}_1^{(i)}(T) \triangleq i W'_m + \sum_{j=1}^{m} d_T(j) p_j - C \sum_{j=1}^{m} \ell_T(j) p_j. \]

Figure 5: Illustrations of the Trunc and Expand operations and Lemma 9. The $p_i$ are given in the table above the trees. As examples of the Trunc operation, note that $\mathrm{Trunc}^{(2)}(T'') = \mathrm{Trunc}^{(2)}(T') = T$ and $\mathrm{Trunc}^{(3)}(T'') = T'$.

Suppose $T_s \in \bar{\mathcal{T}}_s(n)$, with $\mathrm{depth}(T_s) = d$.
An interesting peculiarity of this definition is that $T_s$ is an $i$-level tree for all $i \ge d - 1$, with
\[ \mathrm{Cost}_0^{(d-1)}(T) < \mathrm{Cost}_0^{(d)}(T) < \mathrm{Cost}_0^{(d+1)}(T) < \cdots \]
for some indeterminate-length chain. The important observation, though, is that $\mathrm{Cost}_s^{(i)}(T)$ collapses to $\mathrm{Cost}_s(T)$ in the interesting cases.

Lemma 6.
(a) Let $T_s \in \mathcal{T}_s(n)$, with $\mathrm{depth}(T_s) = d$. Then $\mathrm{sig}^{(d)}(T_s) = (n; 0; 0)$ and $\mathrm{Cost}_s^{(d)}(T_s) = \mathrm{Cost}_s(T_s)$.
(b) Let $T_s \in \bar{\mathcal{T}}_s(n)$ be an $i$-level tree with $\mathrm{sig}^{(i)}(T_s) = (n; 0; 0)$. Then $T_s \in \mathcal{T}_s(n)$ with $\mathrm{depth}(T_s) = i$.

Proof. (a) By definition, $T_s$ is a $d$-level tree with no nodes on level $d + 1$. Let $(m, p, z) = \mathrm{sig}^{(d)}(T_s)$. Since $T_s$ contains $n$ codewords, $m = n$. $T_s$ contains no nodes on level $d + 1$, so $p = 0$. Furthermore, it contains no slave nodes on level $d + 1$, so it contains no master nodes on level $d$, i.e., $z = 0$. Since $W'_n = 0$,
\[ \mathrm{Cost}_0^{(d)}(T) = d W'_n + \sum_{i=1}^{n} d_T(i) p_i + C \sum_{i=1}^{n} m_T(i) p_i = \mathrm{Cost}_0(T). \]
Similarly,
\[ \mathrm{Cost}_1^{(d)}(T) = d W'_n + \sum_{i=1}^{n} d_T(i) p_i - C \sum_{i=1}^{n} \ell_T(i) p_i = \mathrm{Cost}_1(T). \]

Figure 6: The initial trees introduced in Definition 10. Note that the definition of $T_0$ trees permits the root to be a master node or an internal node, while the definition of $T_1$ trees requires that the root be an internal node.

(b) $T_s$ contains no master nodes on level $i$, so it contains no slave nodes on level $i + 1$. It also contains no non-slave nodes on level $i + 1$. So it contains no nodes on level $i + 1$ and $\mathrm{depth}(T_s) = i$. $T_s \in \mathcal{T}_s(n)$ by definition. □

The definitions and lemmas immediately imply

Corollary 2.
\[ T_s(C) = \operatorname*{argmin}_{\substack{T_s \in \bar{\mathcal{T}}(n):\ \exists i \text{ s.t. } T_s \in \bar{\mathcal{T}}(i:n) \\ \text{and } \mathrm{sig}^{(i)}(T_s) = (n;0;0)}} \mathrm{Cost}_s^{(i)}(T_s) \tag{20} \]

The next definition introduces the initial conditions for the dynamic programs.

Definition 10. See Figure 6. Set
\[ I_0 = \{ (0; 2; 0),\ (1; 0; 1) \}, \qquad I_1 = \{ (0; 3; 0),\ (1; 1; 0),\ (1; 1; 1) \}. \]
Note that if $(m; p; z) \in I_0$, there exists a unique $0$-level tree $T_0 \in \bar{\mathcal{T}}_0(n)$ satisfying $\mathrm{sig}^{(0)}(T_0) = (m; p; z)$. Similarly, if $(m; p; z) \in I_1$, there exists a unique $1$-level tree $T_1 \in \bar{\mathcal{T}}_1(n)$ satisfying $\mathrm{sig}^{(1)}(T_1) = (m; p; z)$. Let $T_s(m; p; z)$ denote this unique tree and $\bar{c}_s(m; p; z) = \mathrm{Cost}_s^{(s)}(T_s(m; p; z))$.

The following lemma is true by observation.

Lemma 7. Let $n > 1$. If $T \in \bar{\mathcal{T}}_0(n)$ with $\mathrm{depth}(T) \ge 1$, then $\mathrm{sig}^{(0)}(T) \in I_0$. If $T \in \bar{\mathcal{T}}_1(n)$ with $\mathrm{depth}(T) \ge 2$, then $\mathrm{sig}^{(1)}(T) \in I_1$.

Note: The reason for starting with $\mathrm{sig}^{(1)}(T_1)$ instead of $\mathrm{sig}^{(0)}(T_1)$ is that the root of a $T_1$ tree is "unusual", being a complete node with a slave child, the only time this combination can occur. By definition, $\mathrm{sig}^{(0)}(T_1) = (0; 1; 0)$. This is misleading because it loses the information about the unusual slave node on level $1$. We therefore only start looking at signatures of $T_1$ trees from level $1$.

Definition 11. See Figure 5. Let $T' \in \bar{\mathcal{T}}(i{:}n)$ satisfy $\mathrm{sig}^{(i)}(T') = (m'; p'; z')$ and let
\[ e_\ell, e_m \ge 0, \qquad e_\ell + e_m \le p'. \tag{21} \]
Define the $(e_\ell, e_m)$-expansion of $T'$ as the unique tree $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ in which

• the first $i$ levels of $T$ are identical to those of $T'$;

• $e_\ell$ of the $p'$ non-slave nodes on level $i + 1$ of $T'$ are set as leaves associated with $a_{m'+1}, \ldots, a_{m'+e_\ell}$;

• $e_m$ non-slave nodes on level $i + 1$ of $T'$ are set as master nodes associated with $a_{m'+e_\ell+1}, \ldots, a_{m'+e_\ell+e_m}$ (with corresponding slave nodes created on level $i + 2$).
• the remaining $p' - e_\ell - e_m$ non-slave nodes on level $i + 1$ of $T'$ become complete internal nodes, creating $2(p' - e_\ell - e_m)$ non-slave nodes on level $i + 2$. These are in addition to the $z'$ non-slave children on level $i + 2$ of the $z'$ slave nodes on level $i + 1$.

Note that this definition implies that $\mathrm{sig}^{(i+1)}(T) = (m; p; z)$ where
\[ m = m' + e_\ell + e_m, \tag{22} \]
\[ z = e_m, \tag{23} \]
\[ p = z' + 2(p' - e_\ell - e_m). \tag{24} \]

Lemma 8.
(a) Let $T' \in \bar{\mathcal{T}}(i{:}n)$. If $\mathrm{sig}^{(i)}(T') = (m'; p'; z')$ and $(e_\ell, e_m)$ satisfies Equation (21), then $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m) \in \bar{\mathcal{T}}(i+1{:}n)$.
(b) Let $T \in \bar{\mathcal{T}}(n)$. For $i \ge 0$, set $\bigl(m^{(i)}; p^{(i)}; z^{(i)}\bigr) = \mathrm{sig}^{(i)}\bigl(\mathrm{Trunc}^{(i)}(T)\bigr)$. Then
\[ \mathrm{Trunc}^{(i+1)}(T) = \mathrm{Expand}^{(i)}\bigl(\mathrm{Trunc}^{(i)}(T), e_\ell, e_m\bigr) \]
where $e_\ell = m^{(i+1)} - m^{(i)} - z^{(i+1)}$ and $e_m = z^{(i+1)}$.

Proof. (a) follows from the fact that Definition 11 maintains the validity of properties (P1) and (P2) of Lemma 5 and that $\mathrm{depth}(T) \le i + 2$. (b) just follows directly from the definitions. □

Part (b) implies that any tree $T \in \bar{\mathcal{T}}(n)$ can be grown level by level via expansion operations.

Now recall from Definition 4 the definition of the signature set $S_n$ and the operation $\to$.

Lemma 9. Let $T' \in \bar{\mathcal{T}}(i{:}n)$ with $\mathrm{sig}^{(i)}(T') = \alpha' = (m'; p'; z')$.
(a) Let $(e_\ell, e_m)$ satisfy Equation (21). Let $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ and $\alpha = (m; p; z) = \mathrm{sig}^{(i+1)}(T)$. Then $\alpha' \to \alpha$.
(b) Let $\alpha = (m; p; z)$.
If $\alpha' \to \alpha$, let $e_\ell, e_m$ be the unique values satisfying Equations (1)-(4) and set $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$. Then $\alpha = \mathrm{sig}^{(i+1)}(T)$.
(c) If $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ with $\alpha = (m; p; z) = \mathrm{sig}^{(i+1)}(T)$, then
\[ \mathrm{Cost}_0^{(i+1)}(T) = \mathrm{Cost}_0^{(i)}(T') + c_0(\alpha', \alpha), \]
\[ \mathrm{Cost}_1^{(i+1)}(T) = \mathrm{Cost}_1^{(i)}(T') + c_1(\alpha', \alpha). \]

Proof. (a) This follows directly from the definition of $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$.

(b) From the definition of $\alpha' \to \alpha$ there exist appropriate $e_\ell, e_m$ satisfying Equations (1)-(4). Then $T = \mathrm{Expand}^{(i)}(T', e_\ell, e_m)$ has $\mathrm{sig}^{(i+1)}(T) = (m; p; z)$.

(c) From the definitions of signatures and expansions,
\[ \sum_{j=1}^{m'} d_T(j) p_j = \sum_{j=1}^{m'} d_{T'}(j) p_j \quad\text{and}\quad \sum_{j=m'+1}^{m} d_T(j) p_j = (i+1) \sum_{j=m'+1}^{m} p_j = (i+1) W_{m',m}. \]
Furthermore, from Lemma 5 (P1), (P2), the master nodes on level $i$ correspond to $a_{m'-z'+1}, \ldots, a_{m'}$. Thus (again also using the definition of expansion),
\[ \sum_{j=1}^{m'-z'} m_T(j) p_j = \sum_{j=1}^{m'-z'} m_{T'}(j) p_j \quad\text{and}\quad \sum_{j=m'-z'+1}^{m-z} m_T(j) p_j = \sum_{j=m'-z'+1}^{m'} p_j = W_{m'-z',m'}. \]
Then
\[ \begin{aligned} \mathrm{Cost}_0^{(i+1)}(T) &= (i+1) W'_m + \sum_{j=1}^{m} d_T(j) p_j + C \sum_{j=1}^{m-z} m_T(j) p_j \\ &= (i+1) W'_m + (i+1) W_{m',m} + \sum_{j=1}^{m'} d_{T'}(j) p_j + C \sum_{j=1}^{m'-z'} m_{T'}(j) p_j + C \sum_{j=m'-z'+1}^{m-z} m_T(j) p_j \\ &= (i+1) W'_{m'} + \sum_{j=1}^{m'} d_{T'}(j) p_j + C \sum_{j=1}^{m'-z'} m_{T'}(j) p_j + C \sum_{j=m'-z'+1}^{m-z} m_T(j) p_j \\ &= i W'_{m'} + \sum_{j=1}^{m'} d_{T'}(j) p_j + C \sum_{j=1}^{m'-z'} m_{T'}(j) p_j + W'_{m'} + C W_{m'-z',m'} \\ &= \mathrm{Cost}_0^{(i)}(T') + c_0(\alpha', \alpha). \end{aligned} \]

From Lemma 5 (P1), (P2), the leaves on level $i + 1$ of $T$ correspond to $a_{m'+1}, \ldots, a_{m-z}$. Thus (again also using the definition of expansion),
\[ \sum_{j=1}^{m'} \ell_T(j) p_j = \sum_{j=1}^{m'} \ell_{T'}(j) p_j \quad\text{and}\quad \sum_{j=m'+1}^{m} \ell_T(j) p_j = \sum_{j=m'+1}^{m-z} p_j = W_{m',m-z}. \]
Then
\[ \begin{aligned} \mathrm{Cost}_1^{(i+1)}(T) &= (i+1) W'_m + \sum_{j=1}^{m} d_T(j) p_j - C \sum_{j=1}^{m} \ell_T(j) p_j \\ &= (i+1) W'_m + (i+1) W_{m',m} + \sum_{j=1}^{m'} d_{T'}(j) p_j - C \sum_{j=1}^{m'} \ell_{T'}(j) p_j - C \sum_{j=m'+1}^{m} \ell_T(j) p_j \\ &= i W'_{m'} + \sum_{j=1}^{m'} d_{T'}(j) p_j - C \sum_{j=1}^{m'} \ell_{T'}(j) p_j + W'_{m'} - C W_{m',m-z} \\ &= \mathrm{Cost}_1^{(i)}(T') + c_1(\alpha', \alpha). \end{aligned} \] □

Combining Lemmas 7 to 9 immediately implies a direct relationship between paths in the Signature Graph and building a tree level-by-level.

Corollary 3. Fix $s \in \{0, 1\}$.
(a) Let $T \in \bar{\mathcal{T}}_s(i{:}n)$ and, for all $s \le j \le i$, set $T^{(j)} = \mathrm{Trunc}^{(j)}(T)$ and $\alpha^{(j)} = \mathrm{sig}^{(j)}\bigl(T^{(j)}\bigr)$. Then

• $\alpha^{(s)} \in I_s$; $T^{(s)} = T_s\bigl(\alpha^{(s)}\bigr)$; $\mathrm{Cost}_s^{(s)}\bigl(T^{(s)}\bigr) = \bar{c}_s\bigl(\alpha^{(s)}\bigr)$;

• $\forall s \le j < i$, $\alpha^{(j)} \to \alpha^{(j+1)}$;

• $\mathrm{Cost}_s^{(i)}\bigl(T^{(i)}\bigr) = \bar{c}_s\bigl(\alpha^{(s)}\bigr) + \sum_{j=s}^{i-1} c_s\bigl(\alpha^{(j)}, \alpha^{(j+1)}\bigr)$.

(b) Let $\bigl\{\alpha^{[j]}\bigr\}_{j=s}^{i} \subset S_n$ be such that $\alpha^{[s]} \in I_s$ and, for all $s \le j < i$, $\alpha^{[j]} \to \alpha^{[j+1]}$. Then there exists an $i$-level tree $T \in \bar{\mathcal{T}}_s(n)$ such that, using the definitions from part (a), $\alpha^{(j)} = \alpha^{[j]}$.

Note: the condition $s \le j$ reflects the fact that, from Definition 10, Lemma 7 and the explanatory note following Lemma 7, the initial condition for $T_0$ requires $j \ge 0$ and the initial condition for $T_1$ requires $j \ge 1$.

This Corollary motivates the original definition of the $\mathrm{OPT}_s(\alpha)$ tables.

Lemma 10. Fix $s \in \{0,1\}$ and define initial signatures $I_s$ with associated $\bar{c}_s(\alpha)$ for $\alpha \in I_s$ as in Definition 10. Let $\mathrm{OPT}_s(\alpha)$ and $\mathrm{Pred}_s(\alpha)$ be as introduced in Definition 5. Then, for all $\alpha \in S_n$,
\[ \mathrm{OPT}_s(\alpha) = \min \bigcup_{i \ge s} \left\{ \mathrm{Cost}_s^{(i)}(T_s) : T_s \in \bar{\mathcal{T}}_s(i{:}n) \text{ and } \mathrm{sig}^{(i)}(T_s) = \alpha \right\}. \tag{25} \]
Furthermore, an $i \ge s$ and a $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ satisfying
\[ \mathrm{sig}^{(i)}(T_s) = \alpha \quad\text{and}\quad \mathrm{Cost}_s^{(i)}(T_s) = \mathrm{OPT}_s(\alpha) \tag{26} \]
can be constructed in $O(i)$ time using the $\mathrm{Pred}_s(\,)$ entries.

Proof. Recall the interpretation of $\mathrm{OPT}_s(\alpha)$ given after Definition 5. Consider the $\alpha$ as nodes in a directed graph with edge costs defined by $c_s(\alpha', \alpha)$, except that edges from $(0;0;0)$ to $\alpha \in I_s$ have cost $\bar{c}_s(\alpha)$ and all other undefined edge costs are set to $\infty$. Then $\mathrm{OPT}_s(\alpha)$ is just the cost of the shortest path from $(0;0;0)$ to $\alpha$.

Corollary 3(a) then implies that if $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ with $\mathrm{sig}^{(i)}(T_s) = \alpha$, then there exists a path from $(0;0;0)$ to $\alpha$ with cost $\mathrm{Cost}_s^{(i)}(T_s)$.
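To make the shortest-path interpretation concrete, here is a small, simplified sketch for the $s = 0$ case (a hypothetical model, not the paper's $O(n^3)$ DP): signature nodes are generated by the transition rules (22)-(24), both initial signatures in $I_0$ enter with base cost $0$ (their trees carry no weighted codewords, cf. Figure 6), the edge cost $W'_{m'} + C\,W_{m'-z',m'}$ is the increment $c_0(\alpha', \alpha)$ from the proof of Lemma 9(c), and Dijkstra's algorithm stands in for the table-filling:

```python
import heapq

def opt0(p, C):
    """Shortest path from the initial signatures of I_0 to (n; 0; 0)."""
    n = len(p)
    Wp = lambda m: sum(p[m:])        # W'_m    = p_{m+1} + ... + p_n
    W = lambda a, b: sum(p[a:b])     # W_{a,b} = p_{a+1} + ... + p_b
    dist = {(0, 2, 0): 0.0, (1, 0, 1): 0.0}   # I_0, base costs 0
    heap = [(c, s) for s, c in dist.items()]
    heapq.heapify(heap)
    while heap:
        c, (m, q, z) = heapq.heappop(heap)    # q is the p-entry of (m; p; z)
        if c > dist.get((m, q, z), float("inf")):
            continue                          # stale heap entry
        if (m, q, z) == (n, 0, 0):
            return c                          # OPT_0(n; 0; 0)
        for el in range(q + 1):               # e_l new leaves
            for em in range(q + 1 - el):      # e_m new master nodes
                if m + el + em > n:
                    continue
                nxt = (m + el + em, z + 2 * (q - el - em), em)  # (22)-(24)
                if nxt[1] > 2 * n:            # cap p to keep the graph finite
                    continue
                nc = c + Wp(m) + C * W(m - z, m)   # edge cost c_0(alpha', alpha)
                if nc < dist.get(nxt, float("inf")):
                    dist[nxt] = nc
                    heapq.heappush(heap, (nc, nxt))
    return float("inf")
```

For example, `opt0([0.5, 0.3, 0.2], 0.7)` returns `1.5`, realized by the path $(0;2;0) \to (1;2;0) \to (3;0;0)$, i.e., the tree with $a_1$ on level 1 and $a_2, a_3$ on level 2 and no master nodes. The predecessor bookkeeping ($\mathrm{Pred}_s$) is omitted here but would be one extra entry per relaxation.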
In the other direction, Corollary 3(b) implies that if $P$ is an $i$-edge path from $(0;0;0)$ to $\alpha$, then there exists $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ with $\mathrm{sig}^{(i)}(T_s) = \alpha$ and $\mathrm{Cost}_s^{(i)}(T_s)$ equal to the cost of the path. This proves Equation (25).

The actual tree $T_s$ satisfying Equation (25) can be found by following the $\mathrm{Pred}_s(\,)$ values backwards from $\alpha$ until reaching some $\alpha' \in I_s$. This provides a path from $(0;0;0)$ to $\alpha$ with cost $\mathrm{OPT}_s(\alpha)$. This path can be translated into $T_s$ via Corollary 3(b). □

Corollary 2 then immediately implies

Corollary 4. Fix $s \in \{0,1\}$. Then
\[ \min_{T_s \in \mathcal{T}_s(n)} \{ \mathrm{Cost}_s(T_s : C) \} = \mathrm{OPT}_s(n; 0; 0). \]
Furthermore, if $i \ge s$ and $T_s \in \bar{\mathcal{T}}_s(i{:}n)$ are such that $\mathrm{Cost}_s^{(i)}(T_s) = \mathrm{OPT}_s(n; 0; 0)$, then $T_s(C) = T_s$.

In words, the Corollary states that $T_s(C)$ can be found by filling in the $\mathrm{OPT}_s(\,)$ table and then using the $\mathrm{Pred}_s(\,)$ entries to construct the tree corresponding to $\mathrm{OPT}_s(n; 0; 0)$. Since Section 3 gives an $O(n^3)$ algorithm for filling in the $\mathrm{OPT}_s(\,)$ and $\mathrm{Pred}_s(\,)$ tables, this leads to the desired $O(n^3)$ algorithm for solving the original problem.

References

[1] Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter Shor, and Robert Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica, 2(1-4):195-208, 1987.
[2] Wolfgang Bein. Advanced techniques for dynamic programming. In Handbook of Combinatorial Optimization, pages 41-92. 2013.
[3] Rainer E. Burkard, Bettina Klinz, and Rüdiger Rudolf. Perspectives of Monge properties in optimization. Discrete Applied Mathematics, 70(2):95-161, 1996.
[4] Sze-Lok Chan and Mordecai J. Golin. A dynamic programming algorithm for constructing optimal "1"-ended binary prefix-free codes. IEEE Transactions on I.T., 46(4):1637-1644, 2000.
[5] David Eppstein, Zvi Galil, and Raffaele Giancarlo. Speeding up dynamic programming. In [Proceedings 1988] 29th Annual Symposium on Foundations of Computer Science, pages 488-496. IEEE, 1988.
[6] M. J. Golin and G. Rote. A dynamic programming algorithm for constructing optimal prefix-free codes with unequal letter costs. IEEE Transactions on I.T., 44(5):1770-1781, Sept 1998.
[7] Mordecai Golin and Elfarouk Harb. Polynomial time algorithms for constructing optimal AIFV codes. In , pages 231-240. IEEE, 2019.
[8] Mordecai Golin and Elfarouk Harb. Polynomial time algorithms for constructing optimal binary AIFV-2 codes. ArXiv:2001.11170 [cs.IT], 2020.
[9] W. Hu, H. Yamamoto, and J. Honda. Worst-case redundancy of optimal binary AIFV codes and their extended codes. IEEE Transactions on I.T., 63(8):5074-5086, Aug 2017.
[10] K. Iwata and H. Yamamoto. A dynamic programming algorithm to construct optimal code trees of AIFV codes. In , pages 641-645, Oct 2016.
[11] Donald E. Knuth. Optimum binary search trees. Acta Informatica, 1(1):14-25, 1971.
[12] Michelle L. Wachs. On an efficient dynamic programming technique of F. F. Yao. Journal of Algorithms, 10(4):518-530, 1989.
[13] Gerhard J. Woeginger. Monge strikes again: Optimal placement of web proxies in the internet. Operations Research Letters, 27(3):93-96, 2000.
[14] H. Yamamoto and X. Wei. Almost instantaneous FV codes. In , pages 1759-1763, July 2013.
[15] Hirosuke Yamamoto, Masato Tsuchihashi, and Junya Honda. Almost instantaneous fixed-to-variable length codes. IEEE Transactions on I.T., 61(12):6432-6443, 2015.
[16] F. F. Yao. Efficient dynamic programming using quadrangle inequalities. In Proceedings of the Twelfth Annual ACM Symposium on Theory of Computing (STOC'80), pages 429-435, 1980.
[17] F. F. Yao. Speed-up in dynamic programming. SIAM Journal on Algebraic Discrete Methods, 3(4):532-540, 1982.
[18] Hao Yuan and Mikhail J. Atallah. Data structures for range minimum queries in multidimensional arrays. In