On Compressing Permutations and Adaptive Sorting
Jérémy Barbay    Gonzalo Navarro
Dept. of Computer Science, University of Chile
Abstract
Previous compact representations of permutations have focused on adding a small index on top of the plain data ⟨π(1), π(2), ..., π(n)⟩, in order to efficiently support the application of the inverse or the iterated permutation. In this paper we initiate the study of techniques that exploit the compressibility of the data itself, while retaining efficient computation of π(i) and its inverse. In particular, we focus on exploiting runs, which are subsets (contiguous or not) of the domain where the permutation is monotonic. Several variants of those types of runs arise in real applications such as inverted indexes and suffix arrays. Furthermore, our improved results on compressed data structures for permutations also yield better adaptive sorting algorithms.

∗ Partially funded by Fondecyt Grant 1-110066, Chile. An early partial version of this paper appeared in STACS [BN09].

1 Introduction

Permutations of the integers [1..n] = {1, ..., n} are not only a fundamental mathematical structure, but also a basic building block for the succinct encoding of integer functions [MR04], strings [Kär99, GMR06, GV06, ANS06, MN07, CHSV08], binary relations [BHMR07], and geometric grids [BLNS09], among others. A permutation π can be trivially encoded in n⌈lg n⌉ bits, which is within O(n) bits of the information theory lower bound of lg(n!) bits, where lg x = log₂ x denotes the logarithm in base two.

In most of those applications, efficient computation is required both for the value π(i) at any point i ∈ [1..n] of the permutation, and for the position π⁻¹(j) of any value j ∈ [1..n] (i.e., the value of the inverse permutation). The only alternative we are aware of to storing explicitly both π and π⁻¹ is by Munro et al. [MRRR03], who add a small structure on top of the plain representation of π so that, by spending εn lg n extra bits, any π⁻¹(j) can be computed in time O(1/ε). This is extended to any positive or negative power of π, πᵏ(i). They give another solution using O(n) extra bits and computing any πᵏ(j) in time O(lg n / lg lg n).

The lower bound of lg(n!) bits yields a lower bound of Ω(n lg n) comparisons to sort such a permutation in the comparison model, in the worst case over all permutations of n elements. Yet, a large body of research has been dedicated to finding better sorting algorithms which can take advantage of specificities of each permutation to sort. Some examples are permutations composed of a few sorted blocks [Man85] (e.g., (1, 3, 5, 7, 9, 2, 4, 6, 8, 10) or (6, 7, 8, 9, 10, 1, 2, 3, 4, 5)), or permutations containing few sorted subsequences [LP94] (e.g., (1, 6, 2, 7, 3, 8, 4, 9, 5, 10)). Algorithms performing possibly o(n lg n) comparisons on such permutations, yet still O(n lg n) comparisons in the worst case, are called adaptive sorting algorithms. Since a sorting algorithm in the comparison model encodes the permutation it sorts through the results of its comparisons, and an adaptive algorithm performs o(n lg n) comparisons on a class of "easy" permutations, each adaptive algorithm yields a compression scheme for permutations, at the cost of losing a constant factor on the complementary class of "hard" permutations. Yet such compression schemes do not necessarily support efficiently the computation of the value π⁻¹(j) of the inverse permutation for an arbitrary value j ∈ [1..n], or even the simple application of the permutation, π(i).

This is the topic of our study: the interplay between adaptive sorting algorithms and compressed representations of permutations that support efficient application of π(i) and π⁻¹(j). In particular we focus on classes of permutations that can be decomposed into a small number of runs, that is, monotone subsequences of π, either contiguous or not.

Our results include compressed representations of permutations whose space and time to compute any π(i) and π⁻¹(j) are proportional to the entropy of the distribution of the sizes of the runs. As far as we know, this is the first compressed representation of permutations with similar capabilities.

We also develop the corresponding sorting algorithms, which in general refine the known complexities to sort those classes of permutations: While there exist sorting algorithms taking advantage of the number of runs of various kinds, ours take advantage of their size distribution and are strictly better (or equal, at worst).

Finally, we obtain a representation for strings that improves upon the state of the art [FMMN07, GRR08] in the average case, while retaining their space and worst-case performance for operations access, rank, and select.

At the end of the article we describe some applications where the classes of permutations compressible with the techniques we develop here naturally arise, and conclude with a more general perspective on the meaning of those results and the research directions they suggest.

2 Preliminaries

2.1 Entropy

We define the entropy of a distribution [CT91], a measure that will be useful to evaluate compressibility results.
Definition 1
The entropy of a sequence of positive integers X = ⟨n₁, n₂, ..., n_r⟩ adding up to n is H(X) = Σ_{i=1}^{r} (n_i/n) lg(n/n_i). By concavity of the logarithm, it holds that (r − 1) lg n ≤ nH(X) ≤ n lg r, and that H(⟨n₁, n₂, ..., n_r⟩) > H(⟨n₁ + n₂, ..., n_r⟩).
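As a concrete illustration, H(X) can be computed directly from Definition 1; the following minimal Python sketch (the function name is ours) does so:

    import math

    def entropy(X):
        """H(X) = sum_i (n_i / n) * lg(n / n_i), for positive integers X summing to n."""
        n = sum(X)
        return sum((ni / n) * math.log2(n / ni) for ni in X)

    print(entropy([5, 5, 5, 5]))   # maximal for r = 4: lg 4 = 2.0
    print(entropy([1, 1, 1, 17]))  # skewed, hence much smaller (about 0.85)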
Here ⟨n₁, n₂, ..., n_r⟩ is a distribution of values adding up to n, and H(X) measures how even the distribution is. H(X) is maximal (lg r) when all n_i = n/r, and minimal (((r−1)/n) lg n + ((n−r+1)/n) lg(n/(n−r+1))) when the values are most skewed (X = ⟨1, 1, ..., 1, n−r+1⟩).

This measure is related to the entropy of random variables and of sequences as follows. If a random variable P takes the value i with probability n_i/n, for 1 ≤ i ≤ r, then its entropy is H(⟨n₁, n₂, ..., n_r⟩). Similarly, if a string S[1..n] contains n_i occurrences of character c_i, then its empirical zero-order entropy is H(S) = H(⟨n₁, n₂, ..., n_r⟩). H(X) is then a lower bound to the average number of bits needed to encode an instance of P, or to encode a character of S (if we model S statistically with a zero-order model, that is, ignoring the context of characters).

2.2 Huffman Coding

The Huffman algorithm [Huf52] receives frequencies X = ⟨n₁, n₂, ..., n_r⟩ adding up to n, and outputs in O(r lg r) time a prefix-free code for the symbols [1..r]. If ℓ_i is the bit length of the code assigned to the i-th symbol, then L = Σ ℓ_i n_i is minimal. Moreover, L < n(1 + H(X)). For example, given S[1..n] over alphabet [1..r], with symbol frequencies X, one can compress S by concatenating the codewords of the successive symbols S[i], achieving total length L < n(1 + H(S)). (One also has to encode the usually negligible codebook of O(r lg r) bits.)

Huffman's algorithm starts with a forest of r leaves corresponding to the frequencies {n₁, n₂, ..., n_r}, and outputs a binary trie with those leaves, in some order. This so-called Huffman tree describes the optimal encoding as follows: The sequence of left/right choices (interpreted as 0/1) in the path from the root to each leaf is the prefix-free encoding of that leaf, of length ℓ_i equal to the leaf depth.

A generalization of this encoding is multiary Huffman coding [Huf52], in which the tree is given arity t, and then the Huffman codewords are sequences over an alphabet [1..t]. In this case the algorithm also produces the optimal code, of length L < n(1 + H(X)/lg t).
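The following Python sketch (ours, with a hypothetical function name) computes the code lengths ℓ_i with the classical heap-based Huffman construction; the resulting L = Σ ℓ_i n_i is minimal and satisfies L < n(1 + H(X)):

    import heapq

    def huffman_lengths(freqs):
        """Code length of each symbol under an optimal binary Huffman code."""
        if len(freqs) == 1:
            return [1]                      # degenerate single-symbol case
        heap = [(f, i) for i, f in enumerate(freqs)]
        heapq.heapify(heap)
        children = {}                       # internal node id -> (left, right)
        nxt = len(freqs)
        while len(heap) > 1:                # merge the two lightest nodes
            fa, a = heapq.heappop(heap)
            fb, b = heapq.heappop(heap)
            children[nxt] = (a, b)
            heapq.heappush(heap, (fa + fb, nxt))
            nxt += 1
        lengths = [0] * len(freqs)
        stack = [(heap[0][1], 0)]           # (node id, depth)
        while stack:
            node, depth = stack.pop()
            if node < len(freqs):
                lengths[node] = depth       # leaf: code length = depth
            else:
                for child in children[node]:
                    stack.append((child, depth + 1))
        return lengths

    print(huffman_lengths([5, 5, 5, 5]))    # [2, 2, 2, 2]
    print(huffman_lengths([1, 1, 1, 17]))   # [3, 3, 2, 1]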
2.3 Rank and Select on Sequences

Let S[1..n] be a sequence of symbols from the alphabet [1..r]. This includes bitmaps when r = 2 (where, for convenience, the alphabet will be {0, 1} rather than {1, 2}). We will make use of succinct representations of S that support the rank and select operators over strings and over binary vectors: rank_c(S, i) gives the number of occurrences of c in S[1..i], and select_c(S, j) gives the position in S of the j-th occurrence of c.

When r = 2, S requires n bits, and rank and select can be supported in constant time using O(n lg lg n / lg n) = o(n) bits on top of S [Mun96, Gol06]. Raman et al. [RRR02] devised a bitmap representation that takes nH(S) + o(n) bits, while maintaining the constant time for supporting the operators. For the binary case nH(S) is just m lg(n/m) + (n−m) lg(n/(n−m)) = m lg(n/m) + O(m), where m is the number of bits set to 1 in S. Golynski et al. [GGG+07] reduced the o(n)-bit redundancy in space to O(n lg lg n / lg n).

When m is much smaller than n, the o(n)-bit term may dominate. Gupta et al. [GHSV06] showed how to achieve space m lg(n/m) + O(m lg lg(n/m) + lg n) bits, which largely reduces the dependence on n, but now rank and select are supported in O(lg m) time via binary search [Gup07, Theorem 17 p. 153].

For larger alphabets, of size r = O(polylog(n)), Ferragina et al. [FMMN07] showed how to represent the sequence within nH(S) + o(n lg r) bits and support rank and select in constant time. Golynski et al. [GRR08, Lemma 9] improved the space to nH(S) + o(n lg r / lg n) bits while retaining constant times.

Grossi et al. [GGV03] introduced the so-called wavelet tree, which decomposes an arbitrary sequence into several bitmaps. By representing the bitmaps in compressed form [GGG+07], the total space is nH(S) + o(n), and rank and select are supported in time O(lg r). Multiary wavelet trees decompose the sequence into subsequences over a sublogarithmic-sized alphabet and reduce the time to O(1 + lg r / lg lg n) [FMMN07, GRR08].

In this article n will generally denote the length of the permutation. All of our o() expressions, even those including several variables, will be asymptotic in n.
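To illustrate how a wavelet tree reduces string operations to bitmap operations, here is a naive Python sketch; the bitmap rank is computed by scanning, whereas the structures cited above answer it in constant time with o(n) extra bits:

    class WaveletTree:
        """Recursively split the alphabet range [lo, hi] in two; each internal
        node stores the bitmap saying to which half every symbol goes."""
        def __init__(self, seq, lo=None, hi=None):
            self.lo = lo if lo is not None else min(seq)
            self.hi = hi if hi is not None else max(seq)
            if not seq or self.lo == self.hi:
                self.bits = None                  # leaf: a single symbol
                return
            mid = (self.lo + self.hi) // 2
            self.bits = [int(x > mid) for x in seq]
            self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
            self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

        def rank(self, c, i):
            """Occurrences of symbol c in seq[0..i-1], via O(lg r) bitmap ranks."""
            node = self
            while node.bits is not None:
                mid = (node.lo + node.hi) // 2
                if c <= mid:
                    i = node.bits[:i].count(0)    # rank_0 on this node's bitmap
                    node = node.left
                else:
                    i = node.bits[:i].count(1)    # rank_1 on this node's bitmap
                    node = node.right
            return i

    wt = WaveletTree([1, 3, 2, 1, 3, 1])
    print(wt.rank(1, 4))   # 2 occurrences of symbol 1 in the first 4 positions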
2.4 Adaptive Sorting

The complexity of adaptive algorithms, for problems such as searching, sorting, merging sorted arrays or computing convex hulls, is studied in the worst case over instances of fixed size and difficulty, for a definition of difficulty that is specific to each analysis. Even though sorting a permutation in the comparison model requires Θ(n lg n) comparisons in the worst case over permutations of n elements, better results can be achieved for some parameterized classes of permutations. We describe some of those below; see the survey by Moffat and Petersson [MP92] for others.

Knuth [Knu98] considered runs (contiguous ascending subsequences) of a permutation π, counted by nRuns = 1 + |{i : 1 ≤ i < n, π(i+1) < π(i)}|. Levcopoulos and Petersson [LP94] introduced Shuffled UpSequences and their generalization Shuffled Monotone Sequences, respectively counted by nSUS = min{k : π is covered by k increasing subsequences} and nSMS = min{k : π is covered by k monotone subsequences}. By definition, nSMS ≤ nSUS ≤ nRuns.

Munro and Spira [MS76] took an orthogonal approach, considering the task of sorting multisets through various algorithms such as MergeSort, showing that they can be adapted to perform in time O(n(1 + H(⟨m₁, ..., m_r⟩))), where m_i is the number of occurrences of i in the multiset (note this is totally different from our results, which depend on the distribution of the lengths of monotone runs).

Each adaptive sorting algorithm in the comparison model yields a compression scheme for permutations, but the encoding thus defined does not necessarily support the simple application of the permutation to a single element without decompressing the whole permutation, nor the application of the inverse permutation.

3 Contiguous Monotone Runs

Our most fundamental representation takes advantage of permutations that are formed by a few monotone (ascending or descending) runs.
Definition 2 A down step of a permutation π over [1..n] is a position 1 ≤ i < n such that π(i+1) < π(i). An ascending run in a permutation π is a maximal range of consecutive positions [i..j] that does not contain any down step. Let d₁, d₂, ..., d_k be the list of consecutive down steps in π. Then the number of ascending runs of π is noted nRuns = k + 1, and the sequence of the lengths of the ascending runs is noted vRuns = ⟨n₁, n₂, ..., n_nRuns⟩, where n₁ = d₁, n₂ = d₂ − d₁, ..., n_{nRuns−1} = d_k − d_{k−1}, and n_nRuns = n − d_k. (If k = 0 then nRuns = 1 and vRuns = ⟨n₁⟩ = ⟨n⟩.) The notions of up step and descending run are defined similarly.

For example, the permutation (1, 3, 5, 7, 9, 2, 4, 6, 8, 10) contains nRuns = 2 ascending runs, of lengths forming the vector vRuns = ⟨5, 5⟩. We now describe a data structure that represents a permutation partitioned into nRuns ascending runs, and is able to compute any π(i) and π⁻¹(i).
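The decomposition is immediate to compute in linear time; a minimal Python sketch (names ours):

    def ascending_runs(pi):
        """Return vRuns, the lengths of the maximal ascending runs of pi."""
        runs, length = [], 1
        for i in range(1, len(pi)):
            if pi[i] < pi[i - 1]:   # down step: the current run ends here
                runs.append(length)
                length = 1
            else:
                length += 1
        runs.append(length)
        return runs

    print(ascending_runs([1, 3, 5, 7, 9, 2, 4, 6, 8, 10]))  # [5, 5]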
3.1 Structure Construction

We find the down steps of π in linear time, obtaining nRuns runs of lengths vRuns = ⟨n₁, ..., n_nRuns⟩, and then apply the Huffman algorithm to the vector vRuns. When we set up the leaves v of the Huffman tree, we store their original index in vRuns, idx(v), and the starting position in π of their corresponding run, pos(v). After the tree is built, we use idx(v) to compute a permutation φ over [1..nRuns] so that φ(i) = j if the leaf corresponding to n_i is placed at the j-th left-to-right leaf in the Huffman tree. We also compute φ⁻¹. We also precompute a bitmap C[1..n] that marks the beginning of the runs in π, and give constant-time support for rank and select. Since C contains only nRuns bits set out of n, it is represented in compressed form [GGG+07] within nRuns lg(n/nRuns) + o(n) bits.
Now we set up a new permutation π′ over [1..n] where the runs are written in the order given by φ⁻¹: We first copy from π the run whose endpoints are those of the leftmost tree leaf, then the run pointed by the second leftmost leaf, and so on. Simultaneously, we compute pos′(v) for the leaves v, denoting the starting position of the area they cover in π′. After creating π′, the original permutation π can be deleted. We say that an internal node v covers the contiguous area of π′ formed by concatenating the runs of all the leaves that descend from v. We compute, for all nodes v, pos′(v), the starting position of the area covered by v in π′; length(v), the size of that area; and leaves(v), the number of leaves that descend from v.

Now we enhance the Huffman tree into a wavelet-tree-like structure [GGV03] without altering its shape, as follows. Starting from the root, we first process recursively each child. For the leaves we do nothing. Once the left and right children, v_l and v_r, of an internal node v have been processed, the invariant is that the areas they cover have already been sorted. We create a bitmap for v, of size length(v). Now we merge the areas of v_l and v_r in time O(length(v)). As we do the merging, each time we take an element from v_l we append a 0 bit to the node bitmap, and a 1 bit when we take an element from v_r. When we finish, π′ has been sorted and we can delete it. The Huffman-shaped wavelet tree (only with fields leaves and pos), φ, and C represent π.
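The whole construction can be condensed into the following simplified Python sketch. It is only an illustration under several simplifications: plain lists stand in for the compressed bitmaps of [GGG+07], the permutation φ is kept implicit (each run keeps a pointer to its leaf), and the merged sequences are retained only to drive the construction (the real structure keeps only the bitmaps and the fields pos and leaves):

    import heapq

    def build_runs_wavelet_tree(pi):
        """Cut pi into its maximal ascending runs, then merge the runs in
        Huffman order (two lightest first), recording at each internal node
        the bitmap of merge decisions: 0 = from left child, 1 = from right.
        Returns (root, leaves, C), where leaves[j] is the leaf of the j-th
        run and C marks the run beginnings in pi."""
        runs, start = [], 0
        for i in range(1, len(pi) + 1):
            if i == len(pi) or pi[i] < pi[i - 1]:   # down step ends the run
                runs.append(pi[start:i])
                start = i
        C = [0] * len(pi)
        heap, leaves, pos = [], [], 0
        for j, r in enumerate(runs):
            C[pos] = 1
            leaf = {'pos': pos + 1, 'seq': r, 'parent': None}  # 1-based start
            leaves.append(leaf)
            heap.append((len(r), j, leaf))
            pos += len(r)
        heapq.heapify(heap)
        nxt = len(runs)
        while len(heap) > 1:
            _, _, a = heapq.heappop(heap)
            _, _, b = heapq.heappop(heap)
            bits, merged, i, j = [], [], 0, 0
            while i < len(a['seq']) or j < len(b['seq']):   # stable merge
                if j == len(b['seq']) or (i < len(a['seq'])
                                          and a['seq'][i] <= b['seq'][j]):
                    merged.append(a['seq'][i]); bits.append(0); i += 1
                else:
                    merged.append(b['seq'][j]); bits.append(1); j += 1
            node = {'left': a, 'right': b, 'bits': bits,
                    'seq': merged, 'parent': None}
            a['parent'] = b['parent'] = node
            heapq.heappush(heap, (len(merged), nxt, node))
            nxt += 1
        return heap[0][2], leaves, C    # root['seq'] is now (1, 2, ..., n)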
3.2 Space and Construction Cost

Note that each of the n_i elements of leaf i (at depth ℓ_i) is merged ℓ_i times, contributing ℓ_i bits to the bitmaps of its ancestors, and thus the total number of bits in all bitmaps is Σ n_i ℓ_i. Thus the total number of bits in the Huffman-shaped wavelet tree is at most n(1 + H(vRuns)). Those bitmaps, however, are represented in compressed form [GGG+07], which removes the O(n) extra bits added by the Huffman encoding.

Let us call m_j = n_{φ⁻¹(j)} the length of the run corresponding to the j-th left-to-right leaf, and m_{i,j} = m_i + ... + m_j. The compressed representation [GGG+07] takes, on a bitmap of length n with m 1s, m lg(n/m) + (n−m) lg(n/(n−m)) bits, plus a redundancy of O(n lg lg n / lg n) bits. We prove by induction (see also Grossi et al. [GGV03]) that the compressed space allocated for all the bitmaps descending from a node covering leaves [i..k] is Σ_{i≤r≤k} m_r lg(m_{i,k}/m_r) (we consider the redundancy later). Consider two sibling leaves merging two runs of m_i and m_{i+1} elements. Their parent bitmap contains m_i 0s and m_{i+1} 1s, and is thus compressed into m_i lg((m_i + m_{i+1})/m_i) + m_{i+1} lg((m_i + m_{i+1})/m_{i+1}) bits. Now consider a general Huffman tree node merging a left subtree covering leaves [i..j] and a right subtree covering leaves [j+1..k]. Then the bitmap of the node will be compressed into m_{i,j} lg(m_{i,k}/m_{i,j}) + m_{j+1,k} lg(m_{i,k}/m_{j+1,k}) bits. By the inductive hypothesis, all the bitmaps on the left child and its subtrees add up to Σ_{i≤r≤j} m_r lg(m_{i,j}/m_r), and those on the right add up to Σ_{j+1≤r≤k} m_r lg(m_{j+1,k}/m_r). Adding up the three formulas we get the inductive thesis.

Therefore, a compressed representation of the bitmaps requires nH(vRuns) bits, plus the redundancy. The latter, added over all the bitmaps, is O(n(1 + H(vRuns)) lg lg n / lg n) = o(n) because H(vRuns) ≤ lg n. To this we must add the O(nRuns lg n) bits of the tree pointers and extra data like pos and leaves, the O(nRuns lg nRuns) bits for φ, and the nRuns lg(n/nRuns) + o(n) bits for C.

The construction time is O(nRuns lg nRuns) for the Huffman algorithm, plus O(nRuns) for computing φ and filling the node fields like pos and leaves, plus O(n) for constructing π′ and C, plus the total number of bits appended to all bitmaps, which includes the merging cost. The extra structures for rank are built in linear time on those bitmaps (while the linear construction time is not obvious from the article [GGG+07], it can be derived from the way their structures are built). All this adds up to O(n(1 + H(vRuns))), because nRuns lg nRuns ≤ nH(vRuns) + lg n by concavity, recall Definition 1.

3.3 Computing π and π⁻¹

One can regard the wavelet tree as a device that tracks the evolution of a merge-sorting of π′, so that at the bottom we have (conceptually) the sequence π′ (with one run per leaf) and at the top we have (conceptually) the sorted permutation (1, 2, ..., n).

To compute π⁻¹(j) we start at the top and find out where that position came from in π′. We start at offset j′ = j of the root bitmap B. If B[j′] = 0, then position j′ came from the left subtree in the merging. Thus we go down to the left child with j′ ← rank₀(B, j′), which is the position of j′ in the array of the left child before the merging. Otherwise we go down to the right child with j′ ← rank₁(B, j′). We continue recursively until we reach a leaf v. At this point we know that j came from the corresponding run, at offset j′, that is, π⁻¹(j) = pos(v) + j′ − 1.

To compute π(i) we do the reverse process, but we must first determine the leaf v and the offset i′ within v corresponding to position i: We compute l = φ(rank₁(C, i)), so that i falls at the l-th left-to-right leaf. Then we traverse the Huffman tree down so as to find the l-th leaf. This is easily done as we have leaves(v) stored at internal nodes. Upon arriving at leaf v, we know that the offset is i′ = i − pos(v) + 1. We now start an upward traversal from v using the nodes that are already in the recursion stack. If v is a left child of its parent u, then we set i′ ← select₀(B, i′) to locate it in the merged array of the parent, else we set i′ ← select₁(B, i′), where B is the bitmap of u. Then we set v ← u and continue until reaching the root, where we answer π(i) = i′.
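Continuing the sketch given earlier, both traversals can be written as follows; the naive rank and select scans stand in for the constant-time operations of the actual structure:

    def select(bits, b, k):
        """0-based position of the k-th (k >= 1) occurrence of bit b."""
        seen = 0
        for p, x in enumerate(bits):
            if x == b:
                seen += 1
                if seen == k:
                    return p
        raise ValueError("not enough occurrences")

    def inverse(root, j):
        """pi^{-1}(j): walk down from the root (sorted order) to the run
        that contributed the j-th smallest element."""
        node, off = root, j - 1                    # 0-based offset at the root
        while 'bits' in node:
            if node['bits'][off] == 0:
                off = node['bits'][:off].count(0)  # rank_0(B, off)
                node = node['left']
            else:
                off = node['bits'][:off].count(1)  # rank_1(B, off)
                node = node['right']
        return node['pos'] + off                   # position inside pi

    def apply_pi(structure, i):
        """pi(i): find i's run via C, then climb to the root using select."""
        root, leaves, C = structure
        run = sum(C[:i]) - 1                       # rank_1(C, i) - 1
        node = leaves[run]
        off = i - node['pos']                      # 0-based offset in the run
        while node['parent'] is not None:
            parent = node['parent']
            b = 0 if parent['left'] is node else 1
            off = select(parent['bits'], b, off + 1)
            node = parent
        return off + 1                             # the value pi(i)

    structure = build_runs_wavelet_tree([1, 3, 5, 7, 9, 2, 4, 6, 8, 10])
    print(inverse(structure[0], 4))   # 7: value 4 sits at position 7
    print(apply_pi(structure, 7))     # 4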
Query time. In both queries the time is O(ℓ), where ℓ is the depth of the leaf arrived at. If i is chosen uniformly at random in [1..n], then the average cost is (1/n) Σ n_i ℓ_i = O(1 + H(vRuns)). However, the worst case can be O(nRuns) in a fully skewed tree. We can ensure ℓ = O(lg nRuns) in the worst case, while maintaining the average case, by slightly rebalancing the Huffman tree [ML01]: Given any constant x > 0, the height of the Huffman tree can be bounded by (1 + x) lg nRuns so that the total number of bits added to the encoding is at most n · nRuns^{−x lg φ}, where φ ≈ 1.618 is the golden ratio. This is o(n) if nRuns = ω(1), and otherwise the cost was O(nRuns) = O(1) anyway. Similarly, the average time stays O(1 + H(vRuns)), as it increases at most by O(nRuns^{−x lg φ}) = O(1). This rebalancing takes just O(nRuns) time if the frequencies are already sorted.

Note also that the space required by the query is O(lg nRuns). This can be made constant by storing parent pointers in the wavelet tree, which does not change the asymptotic space. (To make sure the redundancy stays o(n) even if there are many short bitmaps, we can concatenate all the bitmaps into a single one, and replace pointers to bitmaps by offsets into this single bitmap. Operations rank and select translate easily into a concatenated bitmap.)

Theorem 1 There is an encoding scheme using at most nH(vRuns) + O(nRuns lg n) + o(n) bits to represent a permutation π over [1..n] covered by nRuns contiguous ascending runs of lengths forming the vector vRuns. It can be built within time O(n(1 + H(vRuns))), and supports the computation of π(i) and π⁻¹(i) in time O(1 + lg nRuns) for any value of i ∈ [1..n]. If i is chosen uniformly at random in [1..n] then the average computation time is O(1 + H(vRuns)).

We note that the space analysis leading to nH(vRuns) + o(n) bits works for any tree shape. We could have used a balanced tree, yet we would not achieve O(1 + H(vRuns)) average time. On the other hand, by using Hu-Tucker codes instead of Huffman, as in our previous work [BN09], we would not need the permutation φ and, by using compact tree representations [SN10], we would be able to reduce the space to nH(vRuns) + O(nRuns lg(n/nRuns)) + o(n). This is interesting for large values of nRuns, as it is always nH(vRuns) + o(n(1 + H(vRuns))) even if nRuns = Θ(n). (We do not follow this path because we are more interested in multiary codes, see Section 3.5, and, to the best of our knowledge, there is no efficient (i.e., O(nRuns lg nRuns) time) algorithm for building multiary Hu-Tucker codes [Knu98].)

We can easily extend Theorem 1 to mix ascending and descending runs.
Corollary 2
Theorem 1 holds verbatim if π is partitioned into a sequence of nRuns contiguous monotone (i.e., ascending or descending) runs of lengths forming the vector vRuns.

Proof.
We mark in a bitmap of length nRuns whether each run is ascending or descending, and then reverse the descending runs in π, so as to obtain a new permutation π_asc, which is represented using Theorem 1 (some runs of π could now be merged in π_asc, but this only reduces H(vRuns), recall Definition 1).

The values π(i) and π⁻¹(j) are easily computed from π_asc: If π_asc⁻¹(j) = i, we use C to determine that i is within run π_asc(ℓ..r), that is, ℓ = select₁(C, rank₁(C, i)) and r = select₁(C, rank₁(C, i) + 1) − 1. If that run is reversed in π, then π⁻¹(j) = ℓ + r − i, else π⁻¹(j) = i. For π(i), we use C to determine that i belongs to run π(ℓ..r). If the run is descending, then we return π_asc(ℓ + r − i), else we return π_asc(i). The operations on C require only constant time. The extra construction time is just O(n), and no extra space is needed apart from the O(nRuns) = o(nRuns lg n) bits of the direction bitmap. □

Note that, unlike the case of ascending runs, where there is an obviously optimal way of partitioning (that is, maximize the run lengths), we have some freedom when partitioning into ascending or descending runs, at the endpoints of the runs: If an ascending (resp. descending) run is followed by a descending (resp. ascending) run, the limiting element can be moved to either run; if two ascending (resp. descending) runs are consecutive, one can create a new descending (resp. ascending) run with the two endpoint elements. While finding the optimal partitioning might not be easy, we note that these decisions cannot affect more than O(nRuns) elements, and thus the entropy of the partition cannot be modified by more than O(nRuns lg n), which is absorbed by the redundancy of our representation.

3.4 Improved Adaptive Sorting

One of the best known sorting algorithms is MergeSort, based on a simple linear procedure to merge two already sorted arrays, and with a worst case complexity of n⌈lg n⌉ comparisons and O(n lg n) running time. It had already been noted [Knu98] that finding the down steps of the array in linear time allows improving the time of MergeSort to O(n(1 + lg nRuns)) (the down-step concept can be applied to general sequences, where consecutive equal values do not break runs). We now show that the construction process of our data structure sorts the permutation and, applied on a general sequence, achieves a refined sorting time of O(n(1 + H(vRuns))) ⊂ O(n(1 + lg nRuns)) (since H(vRuns) ≤ lg nRuns); a sketch of the resulting algorithm is shown below.
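A self-contained Python sketch of that algorithm: cut the array into maximal ascending runs and repeatedly merge the two shortest remaining pieces (a Huffman merging order), so that the total merging work is Σ n_i ℓ_i = O(n(1 + H(vRuns))):

    import heapq

    def runs_adaptive_sort(a):
        """Sort a by merging its maximal ascending runs in Huffman order."""
        if not a:
            return a
        runs, start = [], 0
        for i in range(1, len(a) + 1):
            if i == len(a) or a[i] < a[i - 1]:   # down step ends the run
                runs.append(a[start:i])
                start = i
        heap = [(len(r), j, r) for j, r in enumerate(runs)]
        heapq.heapify(heap)
        nxt = len(runs)
        while len(heap) > 1:                     # merge the two shortest runs
            _, _, x = heapq.heappop(heap)
            _, _, y = heapq.heappop(heap)
            merged, i, j = [], 0, 0
            while i < len(x) and j < len(y):
                if x[i] <= y[j]:
                    merged.append(x[i]); i += 1
                else:
                    merged.append(y[j]); j += 1
            merged += x[i:]
            merged += y[j:]
            heapq.heappush(heap, (len(merged), nxt, merged))
            nxt += 1
        return heap[0][2]

    print(runs_adaptive_sort([6, 7, 8, 9, 10, 1, 2, 3, 4, 5]))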
Theorem 3 There is an algorithm sorting an array of length n covered by nRuns contiguous monotone runs of lengths forming the vector vRuns in time O(n(1 + H(vRuns))), which is worst-case optimal in the comparison model.

Proof. Our wavelet tree construction of Theorem 1 (and Corollary 2) indeed sorts π within this time, and it also works if the array is not a permutation. This is optimal because, even considering just ascending runs, there are n!/(n₁! n₂! ⋯ n_nRuns!) different permutations that can be covered with runs of lengths forming the vector vRuns = ⟨n₁, n₂, ..., n_nRuns⟩. Thus lg(n!/(n₁! n₂! ⋯ n_nRuns!)) comparisons are necessary. Using Stirling's approximation to the factorial we have lg(n!/(n₁! n₂! ⋯ n_nRuns!)) = (n + 1/2) lg n − Σ_i (n_i + 1/2) lg n_i − O(lg nRuns). Since Σ_i lg n_i ≤ nRuns lg(n/nRuns), this is nH(vRuns) − O(nRuns lg(n/nRuns)) = nH(vRuns) − O(n). The term Ω(n) is also necessary to read the input, hence implying a lower bound of Ω(n(1 + H(vRuns))).

Note, however, that the set of permutations that can be covered with nRuns runs of lengths vRuns may contain permutations that can be covered with fewer runs (as two consecutive runs could be merged), and that thus have entropy less than H(vRuns), recall Definition 1. We have proved that the lower bound applies to the union of two classes: class (1) contains (some) permutations of entropy H(vRuns) (other permutations, with vectors distinct from vRuns, could also have entropy H(vRuns)), and class (2) contains (some) permutations of entropy less than H(vRuns). Obviously the bound does not hold for class (2) alone, as we can sort it in less time. Since we can tell the class of a permutation in O(n) time by counting the down steps, it follows that the bound also applies to class (1) alone (otherwise O(n) + o(nH(vRuns)) would be achievable for (1) + (2)). □

3.5 Faster Operations with Multiary Huffman Codes

The time performance achieved in Theorem 1 (and Corollary 2) can be boosted by an O(lg lg n) time factor by using Huffman codes of higher arity.

Given the run lengths vRuns, we build the t-ary Huffman tree for vRuns, with t = √(lg n). Since now we merge t children to build the parent, the sequence stored in the parent to indicate the child each element comes from is not binary, but over alphabet [1..t]. In addition, we set up nRuns pointers to provide direct access to the leaves, and parent pointers.

The total length of all the sequences stored at all the Huffman tree nodes is < n(1 + H(vRuns)/lg t) [Huf52]. To reduce the redundancy, we represent each sequence S[1..m] stored at a node using the compressed representation of Golynski et al. [GRR08, Lemma 9], which yields space mH(S) + O(m lg t lg lg m / lg m) bits.
For the string S[1..m] corresponding to a node covering run lengths m₁, ..., m_t, we have mH(S) = Σ m_i lg(m/m_i). From there we can carry out exactly the same analysis done in Section 3.2 for binary trees, to conclude that the sum of the mH(S) bits for all the strings S over all the tree nodes is nH(vRuns). On the other hand, the redundancies add up to O(n(1 + H(vRuns)/lg t) lg t lg lg n / lg n) = o(n) bits. (Again, we can concatenate all the sequences to make sure this redundancy is asymptotic in n.)

The advantage of the t-ary representation is that the average leaf depth is H(vRuns)/lg t = O(1 + H(vRuns)/lg lg n). The algorithms to compute π(i) and π⁻¹(i) are similar, except that rank and select are carried out on sequences S over alphabets of size √(lg n). Those operations can still be carried out in constant time on the representation we have chosen [GRR08]. The only detail is that, for π(i), we previously moved from the root to the leaf using the field leaves(v). This does not anymore allow us to process a node in constant time, and thus we have opted for storing an array of pointers to the leaves, plus parent pointers.

For the worst case, if nRuns = ω(1), we can again limit the depth of the Huffman tree to O(lg nRuns / lg lg n) and maintain the same average time. The multiary case is far less understood than the binary case. Recently, an algorithm to find the optimal length-restricted t-ary code has been presented whose running time is linear once the lengths are sorted [Bae07]. To analyze the increase in redundancy, consider the sub-optimal method that simply takes any node v of depth more than ℓ = 4 lg nRuns / lg t and balances its subtree (so that height O(lg nRuns / lg t) is guaranteed). Since any node at depth ℓ covers a total length of at most n/t^⌊ℓ/2⌋ (see the next paragraph), the sum of all the lengths covered by these nodes is at most nRuns · n/t^⌊ℓ/2⌋. By forcing those subtrees to be balanced, the average leaf depth increases by at most (lg nRuns / lg t) · nRuns/t^⌊ℓ/2⌋ ≤ lg(nRuns)/(nRuns lg t) = O(1). Hence the worst case is limited to O(1 + lg nRuns / lg lg n) while the average case stays within O(1 + H(vRuns)/lg lg n). For the space we need a finer consideration: As nRuns = ω(1), the increase in average leaf depth is o(1/lg t). Since increasing by one the depth of a leaf covering m elements costs m lg t further bits, the total increase in space redundancy is o(n).

The bound on the covered lengths is obtained as follows. Consider a node v in the t-ary Huffman tree. Then length(u) ≥ length(v) for any uncle u of v, as otherwise switching v and u would improve the already optimal Huffman tree. Hence w, the grandparent of v (i.e., the parent of u), must cover an area of size length(w) ≥ t · length(v). Thus the covered length is multiplied at least by t when moving from a node to its grandparent. Conversely, it is divided at least by t as we move from a node to any grandchild. As the total length at the root is n, the length covered by any node v at depth ℓ is at most length(v) ≤ n/t^⌊ℓ/2⌋.

This yields our final result for contiguous monotone runs.
Theorem 4 There is an encoding scheme using at most nH(vRuns) + O(nRuns lg n) + o(n) bits to encode a permutation π over [1..n] covered by nRuns contiguous monotone runs of lengths forming the vector vRuns. It can be built within time O(n(1 + H(vRuns)/lg lg n)), and supports the computation of π(i) and π⁻¹(i) in time O(1 + lg nRuns / lg lg n) for any value of i ∈ [1..n]. If i is chosen uniformly at random in [1..n] then the average computation time is O(1 + H(vRuns)/lg lg n).

The only missing part is the construction time, since now we have to build the strings S[1..m] by merging t increasing runs. This can be done in O(m) time by using atomic heaps [FW94]. The compressed sequence representations are built in linear time [GRR08]. Note this implies that we can sort an array of length n covered by nRuns contiguous monotone runs of lengths forming the vector vRuns in time O(n(1 + H(vRuns)/lg lg n)), yet we are not anymore within the comparison model.

Interestingly, the previous result yields almost directly a new representation of sequences that, compared to the state of the art [FMMN07, GRR08], provides improved average time performance.
Theorem 5
Given a string S[1..n] over alphabet [1..σ] with zero-order entropy H(S), there is an encoding for S using at most nH(S) + O(σ lg n) + o(n) bits and answering queries S[i], rank_c(S, i), and select_c(S, i) in time O(1 + lg σ / lg lg n) for any c ∈ [1..σ] and i ∈ [1..n]. When i is chosen at random in query S[i], or c is chosen with probability n_c/n in queries rank_c(S, i) and select_c(S, i), where n_c is the frequency of c in S, the average query time is O(1 + H(S)/lg lg n).

Proof.
We build exactly the same t-ary Huffman tree used in Theorem 4, using the frequencies n_c instead of run lengths. The sequences at each internal node are formed so as to indicate how the symbols in the child nodes are interleaved in S. This is precisely a multiary Huffman-shaped wavelet tree [GGV03, FMMN07], and our previous analysis shows that the space used by the tree is exactly as in Theorem 4, where now the entropy is H(S) = Σ_c (n_c/n) lg(n/n_c). The three queries are solved by going down or up the tree and using rank and select on the sequences stored at the nodes [GGV03, FMMN07]. Under the conditions stated for the average case, one arrives at the leaf of symbol c with probability n_c/n, and then the average case complexities follow. □

4 Strict Runs

Some classes of permutations can be covered by a small number of runs of a stricter type. We present an encoding scheme that takes advantage of them.
Definition 3 A strict ascending run in a permutation π is a maximal range of positions satisfying π(i + k) = π(i) + k. The head of such a run is its first position. The number of strict ascending runs of π is noted nSRuns, and the sequence of the lengths of the strict ascending runs is noted vSRuns. We will call vHRuns the sequence of contiguous monotone run lengths of the sequence formed by the strict run heads of π. Similarly, the notion of a strict descending run can be defined, as well as that of a strict (monotone) run encompassing both.

For example, the permutation (6, 7, 8, 9, 10, 1, 2, 3, 4, 5) contains nSRuns = 2 strict runs, of lengths vSRuns = ⟨5, 5⟩. The run heads are ⟨6, 1⟩, which form 1 monotone run, of lengths vHRuns = ⟨2⟩. Instead, the permutation (1, 3, 5, 7, 9, 2, 4, 6, 8, 10) contains nSRuns = 10 strict runs, each of length 1.
Theorem 6 Assume there is an encoding P for a permutation over [1..n] with nRuns contiguous monotone runs of lengths forming the vector vRuns, which requires s(n, nRuns, vRuns) bits of space and can apply the permutation and its inverse in time t(n, nRuns, vRuns). Now consider a permutation π over [1..n] covered by nSRuns strict runs and by nRuns ≤ nSRuns monotone runs, and let vHRuns be the vector formed by the nRuns monotone run lengths in the permutation of strict run heads. Then there is an encoding scheme using at most s(nSRuns, nRuns, vHRuns) + O(nSRuns lg(n/nSRuns)) + o(n) bits for π. It can be computed in O(n) time on top of that for building P. It supports the computation of π(i) and π⁻¹(i) in time O(t(nSRuns, nRuns, vHRuns)) for any value i ∈ [1..n].

Proof.
We first set up a bitmap R of length n, marking with a 1 bit the beginning of the strict runs. We set up a second bitmap R_inv such that R_inv[i] = R[π⁻¹(i)]. Now we create a new permutation π′ over [1..nSRuns] which collapses the strict runs of π, π′(i) = rank₁(R_inv, π(select₁(R, i))). All this takes O(n) time, and the bitmaps take nSRuns lg(n/nSRuns) + O(nSRuns) + o(n) bits in compressed form [GGG+07], where rank and select are supported in constant time.

Now we build the structure P for π′. The number of monotone runs in π is the same as for the sequence of strict run heads in π, and in turn the same as the number of runs in π′. So the number of runs in π′ is also nRuns and their lengths are vHRuns. Thus we require s(nSRuns, nRuns, vHRuns) further bits.

To compute π(i), we find i′ ← rank₁(R, i) and then compute j′ ← π′(i′). The final answer is select₁(R_inv, j′) + i − select₁(R, i′). To compute π⁻¹(j), we find j′ ← rank₁(R_inv, j) and then compute i′ ← (π′)⁻¹(j′). The final answer is select₁(R, i′) + j − select₁(R_inv, j′). The structure requires only constant time on top of that to support the operator π′() and its inverse π′⁻¹(). □

The theorem can be combined with previous results, for example Theorem 4, in order to obtain concrete data structures. This representation is interesting because its space could be much less than n if nSRuns is small enough. However, it still retains an o(n) term that can be dominant. The following corollary describes a compressed data structure where the o(n) term is significantly reduced.
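A small Python sketch of the collapsing step in the proof above, restricted to strict ascending runs for simplicity (plain lists stand in for the compressed bitmaps, and rank is computed by prefix counting):

    def collapse_strict_runs(pi):
        """Build R, R_inv and the collapsed permutation pi_prime of Theorem 6."""
        n = len(pi)
        # R marks the heads of the strict (ascending) runs of pi.
        R = [1] + [int(pi[i] != pi[i - 1] + 1) for i in range(1, n)]
        inv = [0] * n                       # inv[v-1] = position of value v
        for i, v in enumerate(pi):
            inv[v - 1] = i
        R_inv = [R[inv[j]] for j in range(n)]
        rank_inv, c = [], 0                 # prefix sums: rank_1 over R_inv
        for b in R_inv:
            c += b
            rank_inv.append(c)
        pi_prime = [rank_inv[pi[i] - 1] for i in range(n) if R[i] == 1]
        return R, R_inv, pi_prime

    print(collapse_strict_runs([6, 7, 8, 9, 10, 1, 2, 3, 4, 5])[2])  # [2, 1]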
Corollary 7 The o(n) term in the space of Theorem 6 can be replaced by O(nSRuns lg lg(n/nSRuns) + lg n), at the cost of O(1 + lg nSRuns) extra time for the queries.

Proof.
Replace the structure of Golynski et al. [GGG+07] by the binary searchable gap encoding of Gupta et al. [GHSV06], which takes O(1 + lg nSRuns) time for rank and select (recall Section 2.3). □

Other tradeoffs for the bitmap encodings are possible, such as the one described by Gupta [Gup07, Theorem 18 p. 155].
5 Shuffled Sequences

Up to now our runs have been contiguous in π. Levcopoulos and Petersson [LP94] introduced the more sophisticated concept of partitions formed by interleaved runs, such as Shuffled UpSequences (SUS) and Shuffled Monotone Sequences (SMS). We now show how to take advantage of permutations formed by shuffling (interleaving) a small number of runs.
Definition 4
A decomposition of a permutation π over [1..n] into Shuffled UpSequences is a set of, not necessarily consecutive, subsequences of increasing numbers that have to be removed from π in order to reduce it to the empty sequence. The number of shuffled upsequences in such a decomposition of π is noted nSUS, and the vector formed by the lengths of the involved shuffled upsequences, in arbitrary order, is noted vSUS. When the subsequences can be of increasing or decreasing numbers, we call them Shuffled Monotone Sequences, call nSMS their number, and call vSMS the vector formed by their lengths.

For example, the permutation (1, 6, 2, 7, 3, 8, 4, 9, 5, 10) contains nSUS = 2 shuffled upsequences of lengths forming the vector vSUS = ⟨5, 5⟩, but nRuns = 5 runs, all of length 2. Interestingly, we can reduce the problem of representing shuffled sequences to that of representing strings and contiguous runs.

5.1 Reduction to Strings and Contiguous Runs

We first show how a permutation with a small number of shuffled monotone sequences can be represented using strings over a small alphabet and permutations with a small number of contiguous monotone sequences.
Theorem 8
Assume there exists an encoding P for a permutation over [1..n] with nRuns contiguous monotone runs of lengths forming the vector vRuns, which requires s(n, nRuns, vRuns) bits of space and supports the application of the permutation and its inverse in time t(n, nRuns, vRuns). Assume also that there is a data structure S for a string S[1..n] over an alphabet of size nSMS with symbol frequencies vSMS, using s′(n, nSMS, vSMS) bits of space and supporting operators rank, select, and access to values S[i], in time t′(n, nSMS, vSMS). Now consider a permutation π over [1..n] covered by nSMS shuffled monotone sequences of lengths vSMS. Then there exists an encoding of π using at most s(n, nSMS, vSMS) + s′(n, nSMS, vSMS) + O(nSMS lg(n/nSMS)) + o(n) bits. Given the covering into SMSs, the encoding can be built in time O(n), in addition to that of building P and S. It supports the computation of π(i) and π⁻¹(i) in time t(n, nSMS, vSMS) + t′(n, nSMS, vSMS) for any value of i ∈ [1..n]. The result is also valid for shuffled upsequences, in which case P is just required to handle ascending runs.

Proof.
Given the partition of π into nSMS monotone subsequences, we create a string S[1..n] over alphabet [1..nSMS] that indicates, for each element of π, the label of the monotone sequence it belongs to. We encode S[1..n] using the data structure S. We also store an array A[1..nSMS] so that A[ℓ] is the accumulated length of all the sequences with label less than ℓ.

Now consider the permutation π′ formed by the sequences taken in label order: π′ can be covered with nSMS contiguous monotone runs of lengths vSMS, and hence can be encoded using s(n, nSMS, vSMS) additional bits using P. This supports the operators π′() and π′⁻¹() in time t(n, nSMS, vSMS) (again, some of the runs could be merged in π′, which only improves time and space in P). Thus π(i) = π′(A[S[i]] + rank_{S[i]}(S, i)) can be computed in time t(n, nSMS, vSMS) + t′(n, nSMS, vSMS). Similarly, π⁻¹(i) = select_ℓ(S, (π′)⁻¹(i) − A[ℓ]), where ℓ is such that A[ℓ] < (π′)⁻¹(i) ≤ A[ℓ + 1], can also be computed in time t(n, nSMS, vSMS) + t′(n, nSMS, vSMS), plus the time to find ℓ. The latter is reduced to constant by representing A with a bitmap A′[1..n] with the bits set at the values A[ℓ] + 1, so that A[ℓ] = select₁(A′, ℓ) − 1, and the binary search is replaced by ℓ = rank₁(A′, (π′)⁻¹(i)). With the structure of Golynski et al. [GGG+07], A′ uses O(nSMS lg(n/nSMS)) + o(n) bits and operates in constant time. □

We will now obtain concrete results by using specific representations for P and S, and specific methods to find the decomposition into shuffled sequences.

5.2 Shuffled UpSequences

Given an arbitrary permutation, one can decompose it in linear time into contiguous runs in order to minimize H(vRuns), where vRuns is the vector of run lengths. However, decomposing the same permutation into shuffled up (resp. monotone) sequences so as to minimize either nSUS or H(vSUS) (resp. nSMS or H(vSMS)) is computationally harder.

Fredman [Fre75] gave an algorithm to compute a partition of minimum size nSUS into upsequences, claiming a worst case complexity of O(n lg n). Even though he did not claim it at the time, it is easy to observe that his algorithm is adaptive in nSUS and takes O(n(1 + lg nSUS)) time. We give here an improvement of his algorithm that computes the partition itself within time O(n(1 + H(vSUS))), no worse than the time of his original algorithm, as H(vSUS) ≤ lg nSUS.
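The greedy procedure, in a minimal Python sketch: a sorted list of sequence tails, searched by binary search, stands in for the splay tree, so this version only guarantees the O(n(1 + lg nSUS)) bound and not the refined entropy-adaptive one:

    from bisect import bisect_right

    def sus_partition(D):
        """Partition D into a minimum number of shuffled upsequences."""
        tails, seqs = [], []          # tails kept sorted, in sync with seqs
        for x in D:
            k = bisect_right(tails, x) - 1    # rightmost tail <= x
            if k < 0:                         # no sequence can take x:
                tails.insert(0, x)            # open a new one (x is the new
                seqs.insert(0, [x])           # smallest tail)
            else:
                seqs[k].append(x)
                tails[k] = x                  # still sorted: tails[k+1] > x
        return seqs

    print(sus_partition([1, 6, 2, 7, 3, 8, 4, 9, 5, 10]))
    # [[2, 3, 4, 5], [1, 6, 7, 8, 9, 10]]: an optimal covering of size 2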
Theorem 9 If an array D[1..n] can be optimally covered by nSUS shuffled upsequences (equal values do not break an upsequence), then there is an algorithm finding a covering of size nSUS in time O(n(1 + H(vSUS))) ⊂ O(n(1 + lg nSUS)), where vSUS is the vector formed by the lengths of the upsequences found.

Proof.
Initialize a sequence S = (D[1]), and a splay tree T [ST85] with the node (S), ordered by the rightmost value of the sequence contained by each node. For each further array element D[i], search for the sequence with the maximum ending point no larger than D[i]. If it exists, add D[i] to this sequence, otherwise create a new sequence and add it to T.

Fredman [Fre75] already proved that this algorithm finds a partition of minimum size nSUS. Note that, although the rightmost values of the splay tree nodes change when we insert a new element in their sequence, their relative position with respect to the other nodes remains the same, since all the nodes at the right hold larger values than the one inserted. This implies in particular that only searches and insertions are performed in the splay tree.

A simple analysis, valid both for the plain sorted array in Fredman's proof and for the splay tree of our own proof, yields an adaptive complexity of O(n(1 + lg nSUS)) comparisons, since both structures contain at most nSUS elements at any time. The additional linear term (relevant when nSUS = 1) corresponds to the cost of reading each element once.

The analysis of the algorithm using the splay tree refines the complexity to O(n(1 + H(vSUS))), where vSUS is the vector formed by the lengths of the upsequences found. These lengths correspond to the frequencies of access to each node of the splay tree, which yields the total access time of O(n(1 + H(vSUS))) [ST85, Theorem 2]. □

The theorem obviously applies to the particular case where the array is a permutation. For permutations and, in general, integer arrays over a universe [1..m], we can deviate from the comparison model and find the partition within time O(n lg lg m), by using y-fast tries [Wil83] instead of splay trees.

We can now give a concrete representation for shuffled upsequences. The complete description of the permutation requires encoding both the computation of the partitioning and the comparisons performed by the sorting algorithm. This time the encoding cost of partitioning is as important as that of merging.
Theorem 10 Let π be a permutation over [1..n] that can be optimally covered by nSUS shuffled upsequences, and let vSUS be the vector formed by the lengths of the decomposition found by the algorithm of Theorem 9. Then there is an encoding scheme for π using at most nH(vSUS) + O(nSUS lg n) + o(n) bits. It can be computed in time O(n(1 + H(vSUS))), and supports the computation of π(i) and π⁻¹(i) in time O(1 + lg nSUS / lg lg n) for any value of i ∈ [1..n]. If i is chosen uniformly at random in [1..n] the average query time is O(1 + H(vSUS)/lg lg n).

Proof.
We first use Theorem 9 to find the SUS partition of optimal size nSUS, and the corresponding vector vSUS formed by the sizes of the subsequences of this partition. Then we apply Theorem 8: For the data structure S we use Theorem 5, whereas for P we use Theorem 4. Note that H(vSUS) is both H(S) and H(vRuns) for permutation π′. The result follows immediately. □

One would be tempted to consider the case of a permutation π covered by nSUS upsequences which form strict runs as a particular case. Yet, this is achieved by resorting directly to Theorem 4. The corollary extends verbatim to shuffled monotone sequences.
Corollary 11 There is an encoding scheme using at most nH(vSUS) + O(nSUS lg n) + o(n) bits to encode a permutation π over [1..n] optimally covered by nSUS shuffled upsequences, of lengths forming the vector vSUS, and made up of strict runs. It can be built within time O(n(1 + H(vSUS)/lg lg n)), and supports the computation of π(i) and π⁻¹(i) in time O(1 + lg nSUS / lg lg n) for any value of i ∈ [1..n]. If i is chosen uniformly at random in [1..n] then the average query time is O(1 + H(vSUS)/lg lg n).

Proof.
It is sufficient to invert π and represent π⁻¹ using Theorem 4, since in this case π⁻¹ is covered by nSUS ascending runs of lengths forming the vector vSUS: If i₀ < i₁ < ... < i_{m−1} forms a strict upsequence, so that π(i_t) = π(i₀) + t, then calling j = π(i₀) we have the ascending run π⁻¹(j + t) = i_t for 0 ≤ t ≤ m − 1. □

Once more, our construction translates into an improved sorting algorithm, improving on the complexity O(n(1 + lg nSUS)) of the algorithm by Levcopoulos and Petersson [LP94].
Corollary 12 We can sort an array of length n, optimally covered by nSUS shuffled upsequences, in time O(n(1 + H(vSUS))), where vSUS are the lengths of the decomposition found by the algorithm of Theorem 9.

Proof.
Our construction in Theorem 10 finds and separates the subsequences of π, and sorts them, all within this time (we do not need to build the string S). □

Open problem.
Note that the algorithm of Theorem 9 finds a partition of minimal size nSUS (this is what we refer to with "optimally covered"), but the entropy H(vSUS) of this partition is not necessarily minimal: There could be another partition, even of size larger than nSUS, with lower entropy. Our results are only in function of the entropy of the partition of minimal size nSUS found. This is unsatisfactory, as the ideal would be to speak in terms of the minimum possible H(vSUS), just as we could do for H(vRuns).

As an example, consider the permutation (1, 2, ..., n/2 − 1, n, n/2, n/2 + 1, ..., n − 1), for some even integer n. The algorithm of Theorem 9 yields the partition {(1, 2, ..., n/2 − 1, n), (n/2, n/2 + 1, ..., n − 1)}, of entropy nH(⟨n/2, n/2⟩) = n lg 2 = n. This is suboptimal, as the partition {(1, 2, ..., n/2 − 1, n/2, n/2 + 1, ..., n − 1), (n)} is of much smaller entropy, nH(⟨n − 1, 1⟩) = (n − 1) lg(n/(n − 1)) + lg n = O(lg n).

On the other hand, a greedy online algorithm cannot minimize the entropy of a SUS partitioning. As an example, consider the permutation (2, 3, ..., n/2, 1, n, n/2 + 1, ..., n − 1), for some even integer n. A greedy online algorithm that, after processing a prefix of the sequence, minimizes the entropy of such prefix produces the partition {(1, n/2 + 1, ..., n − 1), (2, 3, ..., n/2, n)}, of size 2 and entropy nH(⟨n/2, n/2⟩) = n. However, a much better partition is {(1, n), (2, 3, ..., n − 1)}, of size 2 and entropy nH(⟨2, n − 2⟩) = O(lg n).

We doubt that the SUS partition minimizing H(vSUS) can be found within time O(n(1 + H(vSUS))) or even O(n(1 + lg nSUS)). Proving this right or wrong is an open challenge.

5.3 Shuffled Monotone Sequences

No efficient algorithm is known to compute the minimum number nSMS of shuffled monotone sequences composing a permutation, let alone finding a partition minimizing the entropy H(vSMS) of the lengths of the subsequences. The problem is NP-hard, by reduction to the computation of the "cochromatic" number of the graph corresponding to the permutation [KSW96]. Yet, should such a partition into monotone subsequences be available, and be of smaller entropy than the partitions considered in the previous sections, this would yield an improved encoding by doing just as in Theorem 10 for SUS.

Note that it takes a difference by a superpolynomial margin between the values of nSUS and nSMS to yield a noticeable difference between lg nSUS and lg nSMS, and hence between the values of H(vSUS) and H(vSMS). It seems unlikely that such a difference would justify the difference in computing time between the two types of partitions, also different by a superpolynomial margin to the best of current knowledge (i.e., if P ≠ NP).
6 Conclusions

Relation between space and time. Bentley and Yao [BY76] introduced a family of search algorithms adaptive to the position of the element sought (also known as the "unbounded search" problem) through the definition of a family of adaptive codes for unbounded integers, hence proving that the link between algorithms and encodings was not limited to the complexity lower bounds suggested by information theory. Such a relation between "time" and "space" can be found in other contexts: algorithms to merge two sets define an encoding for sets [AL09], and the binary results of the comparisons of any deterministic sorting algorithm in the comparison model yield an encoding of the permutation being sorted.

We have shown that some concepts originally defined for adaptive variants of the algorithm MergeSort, such as runs and shuffled sequences, are useful in terms of the compression of permutations, and conversely, that concepts originally defined for data compression, such as the entropy of the sets of run lengths, are a useful addition to the set of difficulty measures previously considered in the study of adaptive sorting algorithms.

Much more work is required to explore the application to the compression of permutations and strings of the many other measures of preorder introduced in the study of adaptive sorting algorithms. Figure 1 represents graphically some of those measures of presortedness (adding, to those described by Moffat and Petersson [MP92], those described in this and other recent work [BFN11]) and a preorder on them based on optimality implication in terms of the number of comparisons performed. This is relevant for the space of the corresponding permutation encodings, and for the space used by the potential corresponding compressed data structures for permutations.

[Figure 1: Partial order on some measures of disorder for adaptive sorting. New results are on the bottom line.]

Note that the reductions in this graph do not represent reductions in terms of optimality of the running time to find the partitions. For instance, we saw that H(vSMS)-optimality implies H(vSUS)-optimality in terms of the number of comparisons performed, but not in terms of the running time. In terms of data structures, this relates to the construction time of the compressed data structure (as opposed to the space it takes).

Adaptive operators.
It is worth noticing that, in many cases, the time to support the operators on the compressed permutations decreases as the permutation gets more compressed, in contrast with the traditional setting, where one needs to decompress part or all of the data in order to support the operators. This behavior, incidental in our study, is a strong incentive to further develop the study of difficulty or compressibility measures: measures such that "easy" instances can both be compressed and be manipulated in better time capture the essence of the data.
Compressed indices
Interestingly enough, our encoding techniques for permutations compress both the permutation and its index (i.e., the extra data to speed up the operators). This is opposed to previous work [MRRR03] on the encoding of permutations, whose index size varied with the size of the cycles of the permutation, but whose data encoding was fixed; and to previous work [BHMR07] where the data itself can be compressed but not the index, to the point where the space used by the index dominates that used by the data itself. This direction of research is promising, as in practice it is more interesting to compress the whole succinct data structure, or at least its index, rather than just the data.
Applications
Permutations are everywhere, so compressing their representation helps compress many other forms of data, and supporting the operators on permutations in reasonable time yields support for other operators.

As a first example, consider a natural language text tokenized into word identifiers. Its word-based inverted index stores, for each distinct word, the list of its occurrences in the tokenized text, in increasing order. This is a popular data structure for text indexing [BYRN11, WMB99]. By regarding the concatenation of the lists of occurrences of all the words, a permutation π is obtained that is formed by ν contiguous ascending runs, where ν is the vocabulary size of the text. The lengths of those runs correspond to the frequencies of the words in the text. Therefore our representation achieves the zero-order word-based entropy of the text, which in practice compresses the text to about 25% of its original size [BCW90]. With π(i) we can access any position of any inverted list, and with π⁻¹(j) we can find the word that is at any text position j. Thus the representation contains the text and its inverted index within the space of the compressed text.

A second example is given by compressed suffix arrays (CSAs), which are data structures for indexing general texts. A family of CSAs builds on a function called Ψ [GV06, Sad03, GGV03], which is actually a permutation. Much effort was spent in compressing Ψ to the zero- or higher-order entropy of the text while supporting direct access to it. It turns out that Ψ contains σ contiguous increasing runs, where σ is the alphabet size of the text, and that the run lengths correspond to the symbol frequencies. Thus our representation of Ψ would reach the zero-order entropy of the text. It supports not only access to Ψ but also to its inverse Ψ⁻¹, which enables so-called bidirectional indexes [RNOM09], which have several interesting properties. Furthermore, Ψ contains a number of strict ascending runs that depends on the high-order entropy of the text, and this allows compressing it further [NM07].

From a practical point of view, our encoding schemes are simple enough to be implemented. Some preliminary results on inverted indexes and compressed suffix arrays show good performance on practical data sets. As an external test, the techniques were successfully used to handle scalability problems in MPI applications [KMW10].
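As a toy illustration of the first application above (the tokenization and the names are ours): concatenating the occurrence lists of the distinct words yields a permutation with one ascending run per word, so a runs-based representation stores the inverted index, and its inverse recovers the text:

    from collections import defaultdict

    def text_to_permutation(tokens):
        """Concatenate the inverted lists of all words into a permutation pi."""
        lists = defaultdict(list)
        for pos, w in enumerate(tokens, start=1):
            lists[w].append(pos)              # occurrences in increasing order
        vocab = sorted(lists)                 # fix an order on the vocabulary
        pi = [p for w in vocab for p in lists[w]]   # one ascending run per word
        return pi, vocab

    pi, vocab = text_to_permutation("to be or not to be".split())
    print(vocab)  # ['be', 'not', 'or', 'to']
    print(pi)     # [2, 6, 4, 3, 1, 5]: pi(i) is a text position, and the
                  # inverse of pi maps a text position back to its word's run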
Our preliminary results [BN09] have stimulated further research. This is just a glimpse of the work that lies ahead on this topic.

While developing, with J. Fischer, compressed indexes for Range Minimum Queries based on Left-to-Right Minima (LRM) trees [Fis10, SN10], we realized that LRM trees yield a technique to rearrange in linear time nRuns contiguous ascending runs of lengths forming vector vRuns into a partition of nLRM = nRuns ascending subsequences of lengths forming a new vector vLRM, of smaller entropy H(vLRM) ≤ H(vRuns) [BFN11]. Compared to a SUS partition, the LRM partition can have larger entropy, but it is much cheaper to compute and encode. We represent it on Figure 1 between H(vRuns) and H(vSUS).

While developing, with T. Gagie and Y. Nekrich, an elegant combination of previously known compressed string data structures to attain superior space/time trade-offs [BGNN10], we realized that this yields various compressed data structures for permutations π such that the times for π() and π⁻¹() are improved to log-logarithmic. While those results subsume our initial findings [BN09], the improved results now presented in Theorem 4 are incomparable, and in particular superior when the number of runs is polylogarithmic in n.

Acknowledgements
Acknowledgements

We thank Ian Munro, Ola Petersson, and Alistair Moffat for interesting discussions.
References

[AL09] Bruno T. Ávila and Eduardo S. Laber. Merge source coding. In Proc. 2009 IEEE International Symposium on Information Theory (ISIT), pages 214–218, Piscataway, NJ, USA, 2009. IEEE Press.

[ANS06] D. Arroyuelo, G. Navarro, and K. Sadakane. Reducing the space requirement of LZ-index. In Proc. 17th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 4009, pages 319–330, 2006.

[Bae07] M. Baer. D-ary bounded-length Huffman coding. CoRR, abs/cs/0701012, 2007.

[BCW90] T. Bell, J. Cleary, and I. Witten. Text Compression. Prentice Hall, 1990.

[BFN11] Jérémy Barbay, Johannes Fischer, and Gonzalo Navarro. LRM-trees: Compressed indices, adaptive sorting, and compressed permutations. In Proc. 22nd Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 6661, pages 285–298, 2011.

[BGNN10] Jérémy Barbay, Travis Gagie, Gonzalo Navarro, and Yakov Nekrich. Alphabet partitioning for compressed rank/select and applications. In O. Cheong, K.-Y. Chwa, and K. Park, editors, Proc. 21st International Symposium on Algorithms and Computation (ISAAC), LNCS 6507, pages 315–326, 2010.

[BHMR07] Jérémy Barbay, Meng He, J. Ian Munro, and S. Srinivasa Rao. Succinct indexes for strings, binary relations and multi-labeled trees. In Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 680–689. ACM, 2007.

[BLNS09] N. Brisaboa, M. Luaces, G. Navarro, and D. Seco. A new point access method based on wavelet trees. In Proc. 3rd International Workshop on Semantic and Conceptual Issues in GIS (SeCoGIS), LNCS 5833, pages 297–306, 2009.

[BN09] J. Barbay and G. Navarro. Compressed representations of permutations, and applications. In Proc. 26th International Symposium on Theoretical Aspects of Computer Science (STACS), pages 111–122. Schloss Dagstuhl, Leibniz-Zentrum für Informatik, Germany, 2009.

[BY76] Jon Louis Bentley and Andrew Chi-Chih Yao. An almost optimal algorithm for unbounded searching. Information Processing Letters, 5(3):82–87, 1976.

[BYRN11] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 2nd edition, 2011.

[CHSV08] Y.-F. Chien, W.-K. Hon, R. Shah, and J. Vitter. Geometric Burrows-Wheeler transform: Linking range searching and text indexing. In Proc. Data Compression Conference (DCC), pages 252–261, 2008.

[CT91] T. Cover and J. Thomas. Elements of Information Theory. Wiley, 1991.

[Fis10] J. Fischer. Optimal succinctness for range minimum queries. In Proc. 9th Symposium on Latin American Theoretical Informatics (LATIN), LNCS 6034, pages 158–169, 2010.

[FMMN07] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro. Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG), 3(2):article 20, 2007.

[Fre75] M. L. Fredman. On computing the length of longest increasing subsequences. Discrete Mathematics, 11:29–35, 1975.

[FW94] M. Fredman and D. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. Journal of Computer and System Sciences, 48(3):533–551, 1994.

[GGG+07] A. Golynski, R. Grossi, A. Gupta, R. Raman, and S. S. Rao. On the size of succinct indices. In Proc. 15th Annual European Symposium on Algorithms (ESA), LNCS 4698, pages 371–382, 2007.

[GGV03] R. Grossi, A. Gupta, and J. Vitter. High-order entropy-compressed text indexes. In Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841–850, 2003.

[GHSV06] A. Gupta, W.-K. Hon, R. Shah, and J. S. Vitter. Compressed data structures: Dictionaries and data-aware measures. In Proc. 16th Data Compression Conference (DCC), pages 213–222, 2006.

[GMR06] Alexander Golynski, J. Ian Munro, and S. Srinivasa Rao. Rank/select operations on large alphabets: a tool for text indexing. In Proc. 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 368–373. ACM, 2006.

[Gol06] A. Golynski. Optimal lower bounds for rank and select indexes. In Proc. 33rd International Colloquium on Automata, Languages and Programming (ICALP), LNCS 4051, pages 370–381, 2006.

[GRR08] A. Golynski, R. Raman, and S. Rao. On the redundancy of succinct data structures. In Proc. 11th Scandinavian Workshop on Algorithm Theory (SWAT), LNCS 5124, pages 148–159, 2008.

[Gup07] A. Gupta. Succinct Data Structures. PhD thesis, Dept. of Computer Science, Duke University, 2007.

[GV06] R. Grossi and J. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing, 35(2):378–407, 2006.

[Huf52] D. Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the I.R.E., 40(9):1098–1101, 1952.

[Kär99] J. Kärkkäinen. Repetition-Based Text Indexes. PhD thesis, Dept. of Computer Science, University of Helsinki, Finland, 1999. Also available as Report A-1999-4, Series A.

[KMW10] H. Kamal, S. Mirtaheri, and A. Wagner. Scalability of communicators and groups in MPI. In Proc. 19th ACM International Symposium on High Performance Distributed Computing (HPDC), pages 264–275, 2010.

[Knu98] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd edition). Addison-Wesley Professional, April 1998.

[KSW96] André E. Kézdy, Hunter S. Snevily, and Chi Wang. Partitioning permutations into increasing and decreasing subsequences. Journal of Combinatorial Theory, Series A, 73(2):353–359, 1996.

[LP94] Christos Levcopoulos and Ola Petersson. Sorting shuffled monotone sequences. Information and Computation, 112(1):37–50, 1994.

[Man85] Heikki Mannila. Measures of presortedness and optimal sorting algorithms. IEEE Transactions on Computers, 34:318–325, 1985.

[ML01] R. L. Milidiú and E. S. Laber. Bounding the inefficiency of length-restricted prefix codes. Algorithmica, 31(4):513–529, 2001.

[MN07] V. Mäkinen and G. Navarro. Rank and select revisited and extended. Theoretical Computer Science, 387(3):332–347, 2007.

[MP92] Alistair Moffat and Ola Petersson. An overview of adaptive sorting. Australian Computer Journal, 24(2):70–77, 1992.

[MR04] J. Ian Munro and S. Srinivasa Rao. Succinct representations of functions. In Proc. 31st International Colloquium on Automata, Languages and Programming (ICALP), LNCS 3142, pages 1006–1015. Springer-Verlag, 2004.

[MRRR03] J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct representations of permutations. In Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP), LNCS 2719, pages 345–356. Springer-Verlag, 2003.

[MS76] J. Ian Munro and Philip M. Spira. Sorting and searching in multisets. SIAM Journal on Computing, 5(1):1–8, 1976.

[Mun96] I. Munro. Tables. In Proc. 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), LNCS 1180, pages 37–42, 1996.

[NM07] G. Navarro and V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys, 39(1):article 2, 2007.

[Păt08] M. Pătrașcu. Succincter. In Proc. 49th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 305–313, 2008.

[RNOM09] L. Russo, G. Navarro, A. Oliveira, and P. Morales. Approximate string matching with compressed indexes. Algorithms, 2(3):1105–1136, 2009.

[RRR02] R. Raman, V. Raman, and S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 233–242, 2002.

[Sad03] K. Sadakane. New text indexing functionalities of the compressed suffix arrays. Journal of Algorithms, 48(2):294–313, 2003.

[SN10] K. Sadakane and G. Navarro. Fully-functional succinct trees. In Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 134–149, 2010.

[ST85] D. Sleator and R. Tarjan. Self-adjusting binary search trees. Journal of the ACM, 32(3):652–686, 1985.

[Wil83] D. Willard. Log-logarithmic worst case range queries are possible in space Θ(n). Information Processing Letters, 17:81–84, 1983.

[WMB99] I. Witten, A. Moffat, and T. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, 2nd edition, 1999.