Range Minimum Queries in Minimal Space
arXiv preprint [cs.DS]
Luís M. S. Russo [email protected]
INESC-ID and Department of Computer Science and Engineering, Instituto Superior Técnico, Universidade de Lisboa.
Abstract
We consider the problem of computing a sequence of range minimum queries. We assume a sequence of commands that contains values and queries. Our goal is to quickly determine the minimum value that exists between the current position and a previous position i. Range minimum queries are used as a sub-routine of several algorithms, namely those related to string processing. We propose a data structure that can process these command sequences. We obtain efficient results for several variations of the problem; in particular we obtain O(1) time per command for the offline version and O(α(n)) amortized time for the online version, where α(n) is the inverse Ackermann function and n is the number of values in the sequence. This data structure also has very small space requirements, namely O(ℓ), where ℓ is the maximum number of active positions i. We implemented our data structure and show that it is competitive against existing alternatives. We obtain comparable command processing time, in the nanosecond range, and much smaller space requirements.

Keywords: Range Minimum Queries, Union Find, Disjoint Sets, Bulk Queries, String Processing, Longest Common Extension.
Given a sequence of integers, usually stored in an array A, a range minimum query (RMQ) is a pair of indexes (i, j). We assume that i ≤ j. The solution to the query consists in finding the minimum value that occurs in A between the indexes i and j. Formally, the solution is min{A[k] | i ≤ k ≤ j}. There exist several efficient solutions for this problem in this static offline context, see Section 5. In this paper we consider the case where A is not necessarily stored. Instead we assume that the elements of A are streamed in a sequential fashion. Likewise we assume that the corresponding queries are intermixed with the values of A and that the answers to the operations are computed online. Hence we assume that the input to our algorithm consists in a sequence of the following commands:

Value - represented by V, is followed by an integer, or float, value v and indicates that v is the next entry of A, i.e., A[j] = v.

Query - represented by Q, is followed by an integer that indicates a previous index of the sequence. The given integer corresponds to the element i in the query. The element j is the position of the last given value of A, hence it is only necessary to specify i. This command can only be issued if an M command was given at position i and no Close command was given with argument i.

Mark - represented by M, indicates that future queries may use the current position j as element i, i.e., as the beginning of the query.

Close - represented by C, is also followed by an integer i that represents an index of the sequence. This command essentially nullifies the effect of an M command issued at position i. Hence the command indicates that the input contains no more queries that use i. Any information that is being kept about position i can be purged.

For simplicity we assume that the sequence of commands is not designed to hack our data structure. Hence we assume that no pathological sequences are given as input. Examples of pathological sequences would be: issuing the Mark command twice or more, or mixed with Query; issuing a Close command for an index that was not marked; issuing Mark commands for positions that have been closed; etc.

Consider the following example sequence. We will use this sequence throughout the paper.
V 22 M V 23 M V 26 M V 28 M V 32 M V 27 M V 35 M Q 4 C 3
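To fix the semantics of these commands, the following is a deliberately naive reference implementation (our own sketch, not the data structure proposed in this paper): it stores the whole array A and answers each query by a linear scan, so it uses O(n) space and O(n) time per query. The class and method names are illustrative.

```python
# Naive reference semantics of the command stream: store A in full and
# answer each Query by scanning.  For illustration only; the paper's
# data structure computes the same answers without storing A.

class NaiveRMQ:
    def __init__(self):
        self.A = []          # the streamed values; position j = len(A)
        self.marked = set()  # currently active marked positions

    def value(self, v):
        self.A.append(v)     # A[j] = v

    def mark(self):
        self.marked.add(len(self.A))   # mark the current position j

    def close(self, i):
        self.marked.discard(i)         # no further queries will use i

    def query(self, i):
        assert i in self.marked
        j = len(self.A)
        return min(self.A[i - 1:j])    # min{A[k] | i <= k <= j}

rmq = NaiveRMQ()
for v in (22, 23, 26, 28, 32, 27, 35):
    rmq.value(v)
    rmq.mark()
print(rmq.query(4))   # the Q 4 command of the example: prints 27
rmq.close(3)          # the C 3 command
```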
In this paper we study this type of sequence. Our contributions are the following:

• We propose a new algorithm that can efficiently process this type of input sequence. We show that our algorithm produces the correct solution.

• We analyze the algorithm and show that it obtains a fast running time and requires only a very small amount of space. Specifically, the space requirements are shown to be at most O(q), where q is the number of queries. Recall that we do not store the array A. We further reduce this bound to O(ℓ). Consider, at some instant, the number of marked positions that have not yet been closed. We refer to these positions as active. The maximum number of active positions over all instants is ℓ. The query time is shown to be O(1) in the offline version of the problem and O(α(ℓ)) in the online version, where α is the inverse Ackermann function; see Theorem 2 and Corollary 1 in Section 3.2. We also discuss the use of this data structure for real-time applications. We obtain a high-probability O(log n) time for all operations, Theorem 3. We also discuss a trade-off that can reduce this bound to O(log log n) for some operations, Theorem 4.

• We implemented the online version of our algorithm and show experimentally that it is very efficient both in time and space.
Let us now discuss how to solve this problem, by gradually considering the challenge at hand. We start by describing a simple structure. We then proceed to improve its performance, first by selecting fast data structures which provide good time bounds and second by reducing the space requirements from O(q) to O(ℓ).

Consider again the sequence in Section 1. Our first data structure is a stack, which we use in the same way as for building a Cartesian tree, see Crochemore and Russo [2020]. The process is simple. We start by pushing a −∞ value into the stack; this value will be used as a sentinel. To start the discussion we will assume, for now, that every Value command is followed by a Mark command, meaning that every position is relevant for future queries.

An important invariant of this stack is that the values form an increasing sequence. Whenever a value is received it is compared with the top of the stack. While the value at hand is smaller the stack gets popped. At some point the input value will be larger than the top of the stack, even if it is necessary for the sentinel to reach the top. When the input value is larger than the top value it gets pushed onto the stack. Another important property of this data structure is that the values in the stack are the only possible solutions for range minimum queries (i, j), where j is the current position of the sequence being processed and i is some previous position.

To identify the corresponding i it is useful to keep, associated with each stack item, the set of positions that yield the corresponding item as the RMQ solution. Maintaining this set of positions is fairly simple. Whenever an item is inserted into the stack it is inserted with the current position. We number positions starting at 1. When an item is popped from the stack the set of positions associated with that item is transferred into the set of positions of the item below it. In our example the Value 27 command puts the positions 4 and 5 into the same set. The second gray rectangle in Figure 1 illustrates the state of this data structure after processing the commands V 35 M of our sample sequence. To process a Close command we remove the corresponding position from whatever set it belongs to, i.e., command C followed by i removes i from a position set.

Figure 1 illustrates the configuration of this data structure as it processes the following sequence of commands:

V 22 M V 23 M V 26 M V 28 M V 32 M V 27 M V 35 M Q 4 C 3
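Before walking through Figure 1 configuration by configuration, the behaviour of this stack can be reproduced with a short simulation (our own sketch, assuming every Value is followed by Mark; the position sets are plain Python sets and merging is done eagerly, so this is the naive version without union–find):

```python
# Simulation of the stack of increasing values with attached position
# sets, assuming every value is marked.  Popped sets are transferred to
# the entry that receives the new value.  Names are ours; this is the
# naive version, without the union-find structure introduced later.

NEG_INF = float("-inf")

stack = [[NEG_INF, set()]]   # sentinel; each entry is [value, positions]
j = 0                        # current position in A

def value_mark(v):
    global j
    j += 1
    merged = set()
    while stack[-1][0] >= v:          # pop larger values,
        merged |= stack.pop()[1]      # transferring their position sets
    stack.append([v, merged | {j}])

for v in (22, 23, 26, 28, 32, 27, 35):
    value_mark(v)

# state after "V 35 M": the second configuration of Figure 1
print([(val, sorted(ps)) for val, ps in stack[1:]])
# prints [(22, [1]), (23, [2]), (26, [3]), (27, [4, 5, 6]), (35, [7])]

def query(i):                 # locate the set containing i (linear scan)
    return next(val for val, ps in stack if i in ps)

print(query(4))   # prints 27
```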
Each gray rectangle shows a different configuration. The leftmost configuration is obtained after the V 32 M commands. The second configuration after the V 35 M commands. The rightmost configuration is the final one after the C 3. The solution to the Q 4 command is 27, because it is the stack item associated with the position 4 in the rightmost configuration; these values are highlighted in bold.

[Figure 1: Illustration of the structure configuration at different instants. Each gray rectangle shows the stack on the left and the corresponding sets of positions on the right. Leftmost, after V 32 M: stack −∞, 22, 23, 26, 28, 32 with sets {1}, {2}, {3}, {4}, {5}. Second, after V 35 M: stack −∞, 22, 23, 26, 27, 35 with sets {1}, {2}, {3}, {4, 5, 6}, {7}. Rightmost, after C 3: the set {3} becomes ∅.]

Using a standard stack implementation it is possible to guarantee O(1) time for the push and pop operations. Hence, ignoring the time required to process the sets of positions, the pairs of Value and Mark operations require only constant amortized time to compute. In the worst case a Value operation may need to discard a big stack, i.e., it may require popping O(n) items, where n is the total amount of positions in A. However, since each operation executes at most one push operation, the amortized time becomes O(1). Hence the main challenge for this data structure is how to represent the sets of positions. To answer this question we must first consider how to compute the Query operation. Given this command, followed by a value i, we proceed to find the set that contains i and report the corresponding stack element. For example, to process the Q 4 command in the input sequence we must locate the set that contains position 4. In this case the set is {4, 5, 6} and the corresponding element is 27. Hence the essential operations that are required for the sets of positions are the union and the find operations. Union is used when merging sets in the Mark and Value operations and find is used to identify sets in the
Query operation. A naive implementation requires O(n) time for each operation. Instead we use a dedicated data structure that supports both operations in O(α(n)) amortized time, where α(n) is the inverse Ackermann function. Note that although conceptually the Close command removes elements from the position sets, this data structure essentially ignores these operations. They do not alter the Union-Find (UF) data structure. Hence, once an element is assigned to a set, it can no longer be removed. Fortunately the resulting procedure is still sound, albeit it requires more space. This version does require a large amount of space, specifically O(n) space.

Let us now focus on reducing the space to O(m), where m is the total number of Mark commands, which should be equal to the total number of Close commands. We must also have that m ≤ q, where q is the number of Query commands, as there is no point in issuing redundant Mark commands. Note that m may be much smaller than n, as there might be many more Value commands than Mark commands.

To guarantee that the size of the stack is at most O(m) we now consider the situation where not all the Value commands are followed by Mark commands; otherwise n and m would be similar. In this case only the marked positions need to be stored in the stack, thus reducing its size. This separation of commands means that our operating procedure also gets divided. The Mark command only pushes elements onto the stack. The Value commands only perform the popping operations. Hence in this scenario both the Mark and Value commands require O(α(n)) amortized time.

To illustrate the division we have just described, consider the following sequence of commands:

V 22 M V 23 V 26 M V 28 M V 32 M V 27 M V 35 M Q 4 C 3

[Figure 2: Illustration of the structure configuration at different instants. In this sequence of commands there is no M command after V 23. Each gray rectangle shows the stack on the left and the corresponding sets of positions on the right. Leftmost, after V 32 M: stack −∞, 22, 26, 28, 32 with sets {1}, {3}, {4}, {5}. Second, after V 35 M: stack −∞, 22, 26, 27, 35 with sets {1}, {3}, {4, 5, 6}, {7}. Rightmost, after C 3: the set {3} becomes ∅.]

We illustrate the state of the resulting data structure in Figure 2. Notice that in this sequence there is no M command after V 23. Therefore this value never gets inserted into the stack.

To reduce the size of the UF data structure we add a hash table to it. Without this table every one of the n position values is an element of the UF data structure. Using a hash table we can filter out only the marked positions. When a Mark command is issued we insert the current position j as the hash key and the value is the current number of UF elements. This reduces the size of the UF data structure to O(m). Moreover the hash table also requires only O(m) space. Hence this data structure requires only O(m) space and can process any sequence of commands in at most O(α(n)) amortized time per command. When a Close i command is issued we mark the position i as deleted in the hash table, but we do not actually remove it from memory. The reason for this is that a stack item might still point to position i and removing it would break the data structure. For the O(m) space bound this is not an issue as inactive markings count towards the overall total.

In the next section we discuss several nuances of this data structure, including how to further reduce the space requirements to O(ℓ) and alternative implementations.

In this section we will prove that the algorithm is correct and analyze its performance.
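Before turning to the proofs, the hash-based filtering of marked positions described above can be sketched as follows (our own simplification, where a Python dict plays the role of the hash table and union–find elements are represented only by their indices):

```python
# Sketch of the position filter: only marked positions get a union-find
# slot.  A dict plays the role of the hash table, mapping a sequence
# position j to a dense union-find index; Close flags the entry as
# deleted without freeing it.  Names are ours, not the paper's.

uf_index = {}          # hash table: position j -> union-find element
uf_count = 0           # number of union-find elements allocated
DELETED = -1

def on_mark(j):
    global uf_count
    if j not in uf_index:
        uf_index[j] = uf_count   # value is the current number of elements
        uf_count += 1

def on_close(i):
    uf_index[i] = DELETED        # kept in memory, only marked as deleted

# seven Value commands, but only positions 1, 3 and 6 are marked:
for j in (1, 3, 6):
    on_mark(j)
on_close(3)
print(uf_count, uf_index)   # prints: 3 {1: 0, 3: -1, 6: 2}
```

The union–find structure thus holds m elements rather than n, which is the whole point of the filter.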
We start off by giving a pseudo-code description of the algorithms used for each command, Algorithms 3, 4, 5 and 6. In these algorithms we make some simplifying assumptions and use some extra commands that we will now define.

For simplicity we describe the data structure that does not use a hash table. We use S to represent the stack data structure, but we also use S[k′] to reference the element at position k′. In general the top of the stack is at position k, which also corresponds to the number of elements in the stack. We use k as a global variable. We also use k as a bound variable in the Lemma statements. Hence the value of k must be derived from context. This is usually not a problem and in fact it is handy for the proofs, which most of the time only need to consider the case when k is the top of the stack. We also use the notation Top(S) to refer to the top of the stack; this value is equal to S[k]. Note that this means that the element S[k − 1] is the one just below the Top element. Algorithms 1 and 2 are used to manipulate the stack status and are given for completeness. The sets of positions associated with each stack item are denoted with the letter P. In our example we have that P[4] = {4, 5, 6}, see Figure 1.

In Algorithm 3 we assume that the result of the Find command is directly a position index of S, hence the expression S[Find(i)] in Algorithm 3. The NOP command does nothing; it is used to highlight that without a hash table there is nothing for the Close command to execute.

The Make-Set function is used to create a set in the UF data structure; the first argument indicates the element that is stored in the set (position j) and the second argument the level of the last element on the stack S, i.e., k. It is the value given in this second argument that we expect Find to return. Likewise the Union function receives three arguments: the sets that we want to unite and again the top of the stack. Note that in Algorithm 6 we use {j} as one of the arguments to the Union operation. In this case we are assuming that this operation performs the corresponding Make-Set operation.

Besides k we have a few global variables: j, which indicates the current position in A, and v, which is not an argument of the Mark command but is used in that command. At that point it is assumed that v is the last value given in the Value command.
Algorithm 1
procedure Push(v)            ⊲ Insert element in stack
    k ← k + 1
    S[k] ← v
end procedure

Algorithm 2
procedure Pop                ⊲ Remove element from stack
    k ← k − 1
end procedure

Algorithm 3
procedure Query(i)           ⊲ Return RMQ(i, j)
    return S[Find(i)]
end procedure

Algorithm 4
procedure Close(i)           ⊲ Ignore command
    NOP
end procedure

Algorithm 5
procedure Value(v)           ⊲ Put value into the stack
    if S[k] > v then                       ⊲ Test element at the Top.
        while S[k − 1] ≥ v do              ⊲ Test element below the Top.
            Union(P[k], P[k − 1], k − 1)   ⊲ Unite top position sets.
            Pop()
        end while
        S[k] ← v
    end if
    j ← j + 1
end procedure

Algorithm 6
procedure Mark               ⊲ Put position into the stack
    if S[k] < v then
        Push(v)              ⊲ Insert v into S.
        Make-Set(j, k)       ⊲ Associate with k.
    else
        Union(P[k], {j}, k)  ⊲ Assume it calls Make-Set.
    end if
end procedure

In this section we establish that our algorithm is correct, meaning that the values obtained from our data structure actually correspond to the solutions of the given range minimum queries. We state several invariant properties that the structure always maintains.
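For concreteness, the pseudocode above can be transcribed into Python as sketched below. It combines Algorithms 1–6 with a standard union–find using path compression and union by size; each set root stores the stack level that Find should report. Variable names follow the pseudocode where possible, but the bookkeeping details are our own.

```python
# Possible transcription of Algorithms 1-6.  Each union-find root
# records, in `level`, the stack level its set currently points to.

parent, size, level = {}, {}, {}   # union-find over marked positions
S = [float("-inf")]                # stack of values; S[0] is the sentinel
j = 0                              # current position in A
last_v = None                      # value of the last Value command

def find(i):                       # root of i, with path compression
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(a, b):                   # union by size; returns the new root
    ra, rb = find(a), find(b)
    if ra == rb:
        return ra
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]
    return ra

roots = []                         # roots[k-1]: set attached to S[k]

def value(v):                      # Algorithm 5
    global j, last_v
    if S[-1] > v:
        while len(S) >= 2 and S[-2] >= v:
            r = union(roots[-1], roots[-2])   # unite top position sets
            S.pop(); roots.pop(); roots[-1] = r
            level[r] = len(S) - 1             # set now lives at level k-1
        S[-1] = v
    j += 1
    last_v = v

def mark():                        # Algorithm 6
    parent[j] = j; size[j] = 1     # Make-Set for position j
    if S[-1] < last_v:
        S.append(last_v)           # Push
        roots.append(j)
        level[j] = len(S) - 1      # associate with the top level k
    else:
        r = union(roots[-1], j)
        roots[-1] = r
        level[r] = len(S) - 1

def query(i):                      # Algorithm 3: return RMQ(i, j)
    return S[level[find(i)]]

for cmd in "V22 M V23 M V26 M V28 M V32 M V27 M V35 M".split():
    if cmd == "M":
        mark()
    else:
        value(int(cmd[1:]))
print(query(4))   # the Q 4 command: prints 27
```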
We consider the version of the data structure that consists of a stack and a UF structure. The version containing a hash table is relevant for obtaining an efficient structure but does not alter the underlying operation logic. Hence the correctness of the algorithm is preserved; only its description is more elaborate.

We prove the invariant properties by structural induction, meaning that we assume that they are true before a command is processed and only need to prove that the property is maintained by the corresponding processing. For this kind of argument to hold it is necessary to verify that the given properties are also true when the structure is initialized; this is in general trivially true, so we omit this verification from the following proofs. Another decluttering observation is that the Query and Close commands do not alter our data structure and are therefore also omitted from the following proofs.

Let us start by establishing some simple properties.
Lemma 1.
The stack S always contains at least two elements.

Proof. In this particular proof it is relevant to mention the initial state of the stack S. The stack is initialized with two sentinel values, −∞ followed by +∞. Hence it initially contains at least two elements.

• The Mark command. This command does not use the Pop operation and therefore never reduces the number of elements. The result follows by the induction hypothesis.

• The Value command. For the Pop operation in line 5 of Algorithm 5 to execute, the while guard in line 3 must be true. Note that when k = 2 this guard consists in testing whether −∞ = S[1] ≥ v, which is never the case, and therefore a Pop operation is never executed in a stack that contains 2 elements.
Lemma 2. If v was the argument of the last Value command and k is the top level of the stack S then S[k] ≤ v.

Proof.

• The Mark command. When the if condition of Algorithm 6 is true, line 3 executes, after which S[k] = v and the Lemma condition is verified. Otherwise the if condition is false and the stack is kept unaltered, in which case the result follows by the induction hypothesis.

• The Value command. When the if condition of Algorithm 5 fails the Lemma property is immediate. Hence we only need to check the case when the if condition holds. In this case line 7 must eventually execute, at which point we have S[k] = v and the Lemma condition is verified.

Let us now focus on more global properties. Next we show that the values stored in S are in increasing order.

Lemma 3.
For any indexes k and k′ of the stack S we have that if k′ < k then S[k′] < S[k].

Proof.

• The Value command. This command does not push elements onto the stack; instead it pops elements. This means that, in general, a few relations are discarded. The remaining relations are preserved by the induction hypothesis. The only change that we need to verify is whether the Top of the stack S changes, line 7 of Algorithm 5. Hence we need to check the case when k is the top level of the stack. Note that line 7 occurs immediately after the while cycle, which means that its guard is false, i.e., we have that S[k − 1] < v = S[k]. Hence the desired property is established for k′ = k − 1. For any other k′ < k − 1 we have S[k′] < S[k − 1] by the induction hypothesis, and therefore S[k′] < S[k] by transitivity.

• The Mark command. The only operation performed by this command is to push the last element onto the stack. Hence when k is below the top of the stack the property holds by induction. Let us analyze the case when the top of the stack changes, i.e., when k is the top level of the stack. The change occurs in line 3 of Algorithm 6, in which case we have that S[k − 1] < v = S[k]. Hence we extend the argument to k′ < k − 1 as in the Value command, by the induction hypothesis and transitivity.

Likewise the converse of this Lemma can now be established.
Lemma 4.
For any indexes k and k′ of the stack S we have that if S[k] < S[k′] then k < k′.

Proof. Assume by contradiction that there are k and k′ such that S[k] < S[k′] and k′ ≤ k. Because S[k] ≠ S[k′] we have that k ≠ k′, since we are using S as an array. Hence we must have that k′ < k and can now apply Lemma 3 to conclude that S[k′] < S[k], which contradicts the order relation in our hypothesis.

This sorted property also gives structure to the sets of positions.

Lemma 5.
For any indexes k′ < k and positions p′ ∈ P[k′] and p ∈ P[k] we have that p′ < p.

Proof.

• The Mark command. This operation inserts the current position j into the set that corresponds to the top of the stack. The top might have been preserved or created by the operation; both cases can be justified in the same way. We only need to consider the case when Top(S) = S[k] and p = j; any other instantiation of the variables in the Lemma will correspond to relations that were established before the structure was modified. Hence we only need to show that p′ < j for any p′ in any P[k′]. This is trivial because j represents the current position in A, which is therefore larger than any previous position of A that may be represented by p′.

• The Value command. As this command pops elements from the stack, it has the side effect of merging the position sets. Hence the only new relations are for positions at the top of the stack, i.e., when p ∈ P[k] and Top(S) = S[k]. We only need to consider where position p was before the operation, i.e., p ∈ P_b[k_b], where P_b[k_b] represents a set of positions before the operation is executed. Because the Value command merges the position sets which are highest on the stack we have that k ≤ k_b. Now, for any k′ < k and p′ ∈ P[k′], we have that P[k′] = P_b[k′], because the sets of positions below the top of the stack are not altered by the operation. In essence we have that k′ < k_b and p′ ∈ P_b[k′] and p ∈ P_b[k_b]; therefore by the induction hypothesis we obtain p′ < p, as desired.

We can now state our final invariant, which establishes that our algorithm is correct.

Theorem 1.
At any given instant, when j is the current position over A, we have that if i ∈ P[k′] then RMQ(i, j) = S[k′].

Proof.

• The Mark command. This command does not alter the sequence A. Therefore none of the RMQ(i, j) values change. Since almost all positions and position sets P[k′] are preserved, the implication is also preserved. The only new position is j ∈ P[k]; therefore the only case we need to consider is when i = j and k′ is the top level of the stack S, i.e., k′ = k. In this case we have that RMQ(j, j) = A[j] = v, where v is the argument given in the last Value command. Now let us consider the if condition in line 2 of Algorithm 6. This further divides the argument into two cases:

– When this condition holds, line 3 of Algorithm 6 executes and makes S[k] = v. Hence RMQ(j, j) = S[k].

– When this condition fails we have v ≤ S[k]. Applying Lemma 2 we obtain S[k] ≤ v and therefore conclude that S[k] = v. Hence RMQ(j, j) = S[k].

• The Value command. This command essentially adds a new value v at the end of A, i.e., it sets A[j] = v, where j is now the last position of A. This implies that j is not yet a marked position. Therefore for this command we do not need to consider i = j, because j is not a member of a position set P[k′]. Thus we only need to consider cases when i < j. Consider such an index i, which moreover belongs to the position set P[k′], i.e., i ∈ P[k′]. The position i must necessarily occur in some set P_b[k′_b], which is a set of positions that exists before the Value operation alters the stack. In this case we have by the induction hypothesis that RMQ(i, j − 1) = S_b[k′_b]. We now divide the proof into two cases:

– When S_b[k′_b] ≤ v, in which case RMQ(i, j) = S_b[k′_b]. In this case we only need to show that the Value command does not alter the index k′_b of the stack, i.e., that i ∈ P[k′_b] and that S_b[k′_b] = S[k′_b]; then the desired property holds for k′ = k′_b. This is immediate, as the case hypothesis means that even if the Value operation happens to extrude level k′_b to the top of the stack it does not eliminate it, because Lemma 3 implies that S_b[k′_b − 1] < S_b[k′_b] ≤ v, and therefore the while guard in line 3 fails.

– When v < S_b[k′_b], in which case RMQ(i, j) = v. In this case the value S_b[k′_b] will be discarded by the Value command. Let k correspond to the level that is at the top of the stack after the command. By Lemma 2 we have that S[k] ≤ v; combining both these inequalities yields S[k] < S_b[k′_b]. Using Lemma 3 we have that S[k − 1] < S[k]; note that Lemma 1 guarantees that the level k − 1 exists. As k is the top level of S after the command we have S_b[k − 1] = S[k − 1], hence S_b[k − 1] < S_b[k′_b], to which we apply Lemma 4 to conclude that k − 1 < k′_b. Therefore either k = k′_b or the level k′_b was excluded from the stack. In both cases position i must be in P[k], either because it was already there or because it was eventually transferred by the union commands in line 4. Hence we only need to check that S[k] = v. Let k_b be the Top of the stack S_b before the command is executed. Hence k′_b ≤ k_b and by Lemma 3 we obtain S_b[k′_b] ≤ S_b[k_b]. Using the case hypothesis and transitivity we obtain that v < S_b[k_b]. This implies that the condition of the if in line 2 of Algorithm 5 is true. Therefore line 7 eventually executes and obtains the condition S[k] = v, as desired.

In this section we discuss several issues related to the performance of our data structure. Namely, we start off by reducing the space requirements from O(m) to O(ℓ). First we need to notice in which ways our data structure can waste space. In particular the Close command wastes space in the stack itself. In the rightmost structure of Figure 1 we have that the set P[3] becomes empty after the C 3 command. This set corresponds to S[3] = 26 on the stack. In essence the item S[3] is no longer necessary in the stack. However, it is kept inactive in the stack, the hash table and the UF data structure. It is marked as inactive in the hash table, but it still occupies memory.

Recall that our data structure consists of three components: a stack, a hash table and a Union-Find data structure. These structures are linked as follows: the stack contains values and pointers to the hash table; the hash table uses sequence positions as keys and UF elements as values; the Union-Find data structure is used to manipulate sets of reduced positions and each set in turn points back to a stack position.

Let us now use an amortizing technique to bound the space requirements of this structure.
We start off by allocating a data structure that can contain atmost a elements, where a is a small initial constant. Allocating a structure withthis value implies the following guarantees: • It is possible to insert a elements into the stack without overflow.11 It is possible to insert a elements into the hash table and the overalloccupation is always less than half. This guarantees average and highprobability efficient insertions and searches. • It is possible to use a positions for Union-Find operations.Hence we can use this data structure until we reach the limit a . When the limitis reached we consider the number of currently active marked positions, i.e.,the number of positions i such that M was issued at position i , but up to thecurrent position no Close i was never issued. To determine this value it is bestto keep a counter c . This counter is increased when a Mark command is issued,unless the previous command was also a Mark command, in which case it is arepeated marking for a certain position. The counter is decreased when a
Close i is issued, provided position i is currently active, i.e., it was activated by some Mark command and it has not yet been closed by any other
Close command.Hence by consulting this counter c we can determine in O (1) time the numberof active positions at this instant. We can now alloc a new data structure with a ′ = 2 c , i.e., a data structure that can support twice as many elements as thenumber of current active positions. Then we transfer all the active elementsfrom the old data structure to the new data structure. The process is fairlyinvolved, but in essence it requires O ( a × α ( a )) time and when it finishes thenew data structure contains all the active positions, which occupy exactly halfof the new data structure. This factor is crucial as it implies that the amortizedtime of this transfer is in fact O ( α ( a )) and moreover that the allocated size isat most O (2 ℓ ).We now describe how to transfer only the active elements from the old datastructure to the new data structure. First we mark all the elements in the oldstack as inactive. In our implementation we make all the values negative, as thetest input sequences contained no negative values but other marking schemesmay be used. This is also the scheme we used to mark inactive hash entries.Now traverse the old hash table and copy all the active values to the newhash table. Also initilize the pointers from the new hash table to the new UFdata structure. The new UF positions are initialized incrementally, starting at1. Hence every insertion into the new hash function creates a new UF position,that is obtained incrementally from the last one. We also look up the old UFpositions that are given by active entries of the old hash table. We use thoseold active sets to reactivate the old stack entries. This process allowed us toidentify which stack entries are actually relevant in the old stack. With thisinformation we can compact the old stack by removing the inactive positions.We compact the old stack directly to the new stack, so the new stack containsonly active positions. 
We also add pointers from the old stack to the new stack.Each active entry of the old stack points to its correspondent in the new stack.In our implementation this was done by overriding the pointers to the old hashtable, as they are no longer necessary.At this point the new stack contains the active values, but it still has notinitialized the pointers to the new hash table. These pointers are in fact positionvalues, because positions are used as keys in the hash-table. To initialize thesepointers we again traverse the active entries of the old hash table and mapthem to the old UF positions and to the corresponding old stack items. Wenow use the pointer from the old stack item to the new stack item and updatethe position pointer of the new stack to the key of the active entry of the new12ash that we are processing. This assignment works because positions are keptinvariant from the old data structure to the new one. Therefore these positionsare also keys of the new hash. We finish this process by updating the pointersof the new UF data structure to point to the corresponding items of the newstack. Since we now know the active items in the new stack and have pointersfrom the new stack to the new hash and from the new hash to the new UFposition, we can simply assign the link from the new UF set back to the itemof the new stack item. Thus closing this reference loop.At this point almost all of the data structure is linked up. The new stackpoints to the new hash table, the new hash table points to the new UF structureand the sets of the new UF structure point to the new stack. The only missingingredient is that the sets of the new UF structure are still singletons, becauseno Union operations have yet been issued. The main observation to recover thisinformation is that several positions in the new UF structure point to the sameitem in the new stack. Those positions need to be united into the same set.To establish these unions we traverse the new UF data structure. 
For each UF position we determine its corresponding stack item; note that this requires a Find operation. We then follow its pointer to an item in the new hash, and the pointer from that item back to a position in the new UF data structure. Finally we unite two UF sets: the one that contained the initial position and the one that contains the position obtained by passing through the stack and the hash.
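As a small illustration of the size invariant, the compaction step can be sketched as follows (a simplified model: a structure is flattened into a list of (position, value) stack entries plus a set of open positions; the name transfer and this representation are ours, not the actual multi-pointer implementation):

```python
def transfer(old_stack, active):
    """Keep only entries whose position is still active, and size the
    new structure as a' = 2c so that it starts exactly half full."""
    new_stack = [(p, v) for (p, v) in old_stack if p in active]
    capacity = 2 * len(new_stack)
    return new_stack, capacity

# Positions 2 and 3 were closed, so only two entries survive.
entries, cap = transfer([(1, 22), (2, 23), (3, 26), (4, 27)], {1, 4})
assert entries == [(1, 22), (4, 27)] and cap == 4
```

The returned capacity is twice the number of surviving entries, which is what makes the amortized accounting work.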
Theorem 2.
It is possible to process online a sequence of RMQ commands in O(ℓ) space using O(α(ℓ)) expected amortized time per command.

Proof. The discussion in this section essentially establishes this result. We only need to point out the complexities of the data structures that we are using. As mentioned before, the UF structure requires O(α(n)) amortized time. The stack is implemented over an array and therefore requires O(1) time per Push and
Pop command. In theory we consider a hash table with separate chaining and a maximum load factor of 50%, which obtains O(1) expected time per operation; in practice we implemented a linear probing approach.

The final argument is to show that the transfer process requires O(α(ℓ)) amortized time. Whenever a transfer process terminates the resulting structure is exactly half full. As the algorithm progresses, elements are inserted into the structure until it becomes full. Whenever an element is inserted we store 2 credits. Hence when the structure is full there is a credit for each element it contains, and therefore there are enough credits to amortize a full transfer process. We assume that these credits are actually multiplied by α(ℓ) and by whatever the constant of the transfer procedure is.

One important variation of the above procedure is the offline version of the problem, meaning that we are given the complete sequence of commands and are allowed to preprocess them to obtain better performance. In this case we can use a more efficient variant of the Union-Find data structure, proposed by Gabow and Tarjan [1985], and obtain O(1) time per operation. Corollary 1.
It is possible to process offline a sequence of RMQ commands in O ( ℓ ) space using O (1) expected amortized time per command.
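The credit argument behind these bounds can be checked numerically (a toy simulation of our own, assuming for simplicity that no positions are closed, so every element stays active):

```python
def simulate(inserts):
    """Each insertion deposits 2 credits; a transfer of a structure
    holding c live elements costs c and rebuilds it half full
    (capacity 2c).  Total transfer cost stays within the credits."""
    capacity, size, transfer_cost = 2, 0, 0
    for _ in range(inserts):
        if size == capacity:          # full: transfer the live elements
            transfer_cost += size
            capacity = 2 * size       # new structure is exactly half full
        size += 1
    return transfer_cost

# Transfers happen at sizes 2, 4, 8, ..., so the total cost is < 2n.
assert simulate(1000) <= 2 * 1000
```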
At the other extreme of applications we may be interested in real-time applications, meaning that we need to focus on minimizing the worst-case time necessary to process a given command. In this case we can modify our data structure to avoid excessively long operations, i.e., obtain stricter bounds for the worst-case time. As an initial result let us de-amortize the transfer procedure, assuming the same conditions as in Theorem 2.
Lemma 6.
Given a sequence of RMQ commands it is possible to process them so that the transfer procedures require an overhead of O(α(ℓ)) expected amortized time per command.

Proof. Note that the transfer process requires O(a × α(a)) amortized time to transfer a structure that supports a elements. We modify the transfer procedure so that it transfers two full structures at the same time, by merging their active elements into a new structure. The process is essentially similar to the previous transfer procedure, with a few key differences.

An element can only be considered active if it is not marked as inactive in one of the old hashes. More precisely: if it is marked as active in one hash and as inactive in the other then it is inactive; if it is marked as active in one hash and does not exist in the other then it is active; if it is marked as active in both then it is active.

Once the active elements of the old stacks are identified they are merged into the new stack, using the same merging procedure as in the merge sort algorithm, with the proviso that there should be only one copy of the sentinel in the merged stack. The third important synchronization point is the Union commands: before starting this process it is necessary that all the information from the old structures has been transferred to the new one; recall that this process generally iterates over the new structure, not the old ones.

When the old structures can support a₁ and a₂ elements respectively, the merging process requires O(a₁ + a₂) operations. Note that we do not mean time; instead we mean primitive operations on the data structures that compose the overall structure, namely accessing the hash function, following pointers, or calling Union or Find. Given this merging primitive we can now de-amortize our transfer process. Instead of immediately discarding a structure that hits its full occupancy, we keep it around because we cannot afford to do an immediate transfer.
Instead, when we have at least two full structures we initiate the transfer process. Again, to avoid exceeding real-time requirements, this process is kept running in parallel, or interleaved, with the processing of the remaining commands in the sequence. Since this procedure requires O(a₁ + a₂) operations, it can be tuned to guarantee that it terminates by the time at most (a₁ + a₂)/2 further commands have been processed, by dedicating O(1) operations of the merging process to each command. Each operation requires O(α(ℓ)) expected time, which yields the claimed value.

Hence, at any given instant, we can have several structures in memory. In fact we can have at most four, which serve the following purposes: • One active structure. This is the only structure that is currently active, meaning that it is the only structure that still supports
Mark and
Value commands. • Two static full structures that are currently being merged. •
One destination structure that will store the result of the merged structures. In general this structure is in some inconsistent state and does not process
Query commands. The only command that it accepts is
Close. At any point of the execution some or all of the previous structures may be in memory; the only one that is always guaranteed to exist is the active structure. Now let us discuss how to process commands with these structures. • The
Query command is processed by all structures, except the destination structure, which is potentially inconsistent. From the three possible values we return the overall minimum. In this case we are assuming that if the query position i is smaller than the minimum position index stored in a structure then it returns its minimum value, i.e., the value above the −∞ sentinel. • The
Mark and
Value commands modify only the active structure. • The
Close command is applied to all the structures, including the destination structure. This causes no conflict or inconsistency; recall that elements are not removed from the hashes, they are only marked as inactive.

If we have only the active structure in memory, we use it to process the
Mark and
Value commands. When this active structure gets full we mark it as static and ask for a new structure that supports the same number a of elements. This structure becomes the new active structure. Note that requesting memory may require O(a) time, assuming we need to clean it. This can be mitigated by using approaches such as Briggs and Torczon [1993], or by assuming that this process was previously executed, which is possible in our approach.

As soon as the second structure becomes full we start the merging process into a new destination structure. We consult the number of active elements in each one, c₁ and c₂, and request the destination structure to support exactly c₁ + c₂ elements. This implies that once the merge procedure is over the destination structure is full and no further elements can be inserted into it, at which point we need to request another active structure. If the full structures have sizes a₁ and a₂ we ask for an active structure that can support (a₁ + a₂)/2 elements. The key property of (a₁ + a₂)/2 is that its iteration yields a geometric series that does not exceed 2ℓ, hence implying that none of the structures needs to support more than 2ℓ elements. This can also be verified by induction. Assuming that the original allocation size is also less than 2ℓ, we have by the induction hypothesis that a₁ ≤ 2ℓ and a₂ ≤ 2ℓ, therefore (a₁ + a₂)/2 ≤ (2ℓ + 2ℓ)/2 ≤ 2ℓ. Also, by the definition of ℓ, we have that c₁ ≤ ℓ and c₂ ≤ ℓ, which implies that the destination structures also support at most 2ℓ elements. Since the algorithm uses at most 4 structures simultaneously, we can thus conclude that the overall space requirement of the procedure is O(ℓ).

Note that in the worst case the time bound of the UF structures is O(log ℓ) rather than O(α(ℓ)). Also note that a strict worst-case analysis would yield an O(ℓ) worst-case time for our complete data structure, because it contains a hash table. To avoid this pathological analysis we instead consider a high-probability upper bound.
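The dispatch of a Query across the coexisting structures can be sketched as follows (a hypothetical stand-in: the Part class flattens a structure into a block of values answering suffix-minimum queries, mimicking the behaviour described above for query positions that precede the block):

```python
class Part:
    """Stand-in for one structure: holds the values of a contiguous
    block of positions and answers a suffix-minimum query.  A query
    position before the block returns the block's overall minimum,
    i.e., the value above the -inf sentinel."""
    def __init__(self, start, values):
        self.start, self.values = start, values

    def query(self, i):
        k = max(0, i - self.start)
        if k >= len(self.values):
            return None          # i is newer than anything stored here
        return min(self.values[k:])

def dispatch_query(i, structures):
    """Ask every consistent structure (the destination is skipped by
    simply not listing it) and return the overall minimum."""
    answers = [s.query(i) for s in structures if s is not None]
    return min(a for a in answers if a is not None)

# Three consistent structures covering positions 1-3, 4-6 and 7-8.
parts = [Part(1, [5, 2, 9]), Part(4, [7, 3, 8]), Part(7, [6, 4])]
assert dispatch_query(2, parts) == 2   # minimum over positions 2..8
assert dispatch_query(5, parts) == 3   # minimum over positions 5..8
```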
In this context we obtain an O(log ℓ) time bound with high probability, for all commands except the Value command. Hence let us now address this command.
Theorem 3.
It is possible to process, in real time, a sequence of RMQ commands in O(ℓ) space and in O(log ℓ) time per operation with high probability.

Proof. Given the previous observations we can account O(log ℓ) time for the UF structure and the hash table, with high probability, see Mitzenmacher and Upfal [2017]. Lemma 6 de-amortized the transfer operation, hence in this proof we only need to explain how to de-amortize the Value operation.

Algorithm 5 specifies that, given an argument v, this procedure removes from the stack S the elements that are strictly larger than v. This process may end up removing all the elements from the stack, except obviously the −∞ sentinel. Hence its worst-case time is O(m), where m is the maximum number of elements in the stack. The transfer procedure guarantees that the stack does not accumulate deactivated items and therefore we have that m = O(ℓ). This is still too much time for a real-time operation. Instead we can replace this procedure by a binary search over S, i.e., we assume that the stack is implemented on an array and therefore we have direct access to its elements in constant time. As shown in Lemma 3 the elements of S are sorted. Therefore we can perform a binary search for the position of v and discard all the elements in S that are larger than v in O(log ℓ) time. Recall that we use the variable k to indicate the top of the stack; once the necessary position is identified we update k.

However Algorithm 5 also specifies that each element removed from the stack invokes a Union operation, line 4. To perform these unions in real time we need a different UF data structure. Most UF structures work by choosing a representative element for each set. The representative is the element that is returned by the
Find operation. This representative is usually an element of the set it represents. The representative either possesses, or is assigned, some distinct feature that makes it easy to identify. In the UF structure by Tarjan and van Leeuwen [1984] a representative is stored at the root of a tree. Lemma 5 essentially states that the sets we are interested in can be sorted, without inconsistencies among elements of different sets. Hence this provides a natural way of choosing a representative: each set can be represented by its minimum element. With this representation the
Find(p) operation consists in finding the largest representative that is still less than or equal to p, i.e., the Predecessor. The Union operation simply discards the largest representative and keeps the smallest one. Hence we do not require an extra data structure; it is enough to store the minimums along with the values within the stack items. To compute the Predecessors we perform a binary search over the minimums. This process requires O(log ℓ) time. Moreover, the variable k allows us to perform multiple Union operations at once. Let us illustrate how to use this data structure for our goals. Recall the sample command sequence:
V 22 M V 23 M V 26 M V 28 M V 32 M V 27 M V 35 M Q 4 C 3

Figure 3: Illustration of the structure configuration using minimums to represent position sets. Left: the configuration after the initial sequence, where the stack values 35, 27, 26, 23, 22 are associated with the position sets {7}, {4, 5, 6}, ∅, {2}, {1}. Middle: the configuration actually stored in memory, i.e., the (value, minimum position) pairs (35, 7), (27, 4), (26, 3), (23, 2), (22, 1) above the −∞ sentinel. Right: after the command V 10, only the pair (10, 1) remains above the sentinel.

Now assume that after this sequence we also execute the command
V 10. We illustrate how a representation based on minimums processes these commands in Figure 3. The structure on the left is the configuration after the initial sequence of commands. The structure in the middle represents the actual configuration that is stored in memory; note that for each set we store only its minimum element. In particular, the set associated with value 26 is represented by 3, even though position 3 was already marked as closed. As mentioned, the hash table keeps track of which positions are still open, and closed positions are removed during transfer operations. This means that until then it is necessary to use all positions, closed or not, in our UF data structure. Hence the representative of a set is the minimum over all positions that are related to the set, closed or not. The structure on the right represents the structure after processing the
V 10 command. Note that in this final configuration the set of active positions associated with value 10 should be {1, 2, 4, 5, 6, 7}; however, it is represented only by the value 1. This set is obtained by the sequence of Union operations {1} ∪ {2} ∪ {4, 5, 6} ∪ {7}, which amounts to removing the numbers 2, 4 and 7; this is obtained automatically when we alter the variable k.

Summing up, our data structure consists of the following elements: • An array storing the stack S. Each element in the stack contains a value v and a position i, which is the minimum of the position set it represents. • A hash table to identify the active positions. In this configuration no mapping is required; it is enough to identify the active positions.

The general procedure for executing commands and the respective time bounds are the following: • The
Value command needs to truncate the stack by updating the variable k. This process requires O(log ℓ) time because of the binary search procedure, but it can actually be improved to O(1 + log d) time, where d is the number of positions removed from the stack, by using an exponential search that starts at the top of the stack. Using an exponential search the expected amortized time of this operation is O(1). • The
Mark command needs to add an element to the hash table and an element to the stack S. This requires O(log ℓ) time with high probability; the Make-Set or Union operations require only O(1) time, hence the overall time is dominated by O(log ℓ). The expected time of this operation is O(1). • The
Query command needs to search for an element in the hash table and compute a
Find operation. The
Find operation is computed with a binary search over the minimums stored in the items of the stack. This operation requires O(log ℓ) time with high probability. The expected amortized time is also O(log ℓ), but it can be improved to O(1 + log(j − i + 1)) for a query with indexes (i, j), by using an exponential search from the top of the stack. • The
Close command needs to remove an element from the hash table. This requires O(log ℓ) time with high probability and O(1) expected time.

The data structure of the previous theorem is simple because most of the complex de-amortizing procedure is handled in Lemma 6. We now focus on how to further reduce the high-probability time bounds to O(log log n). A simple way to obtain this is to have ℓ = O(log n), i.e., at most O(log n) active positions at each time. This may be achieved if Query positions are not necessarily exact, meaning that the data structure actually returns the solution for a query (i′, j) instead of (i, j). The goal is that j − i′ is similar in size to j − i, namely j − i ≤ j − i′ < 2(j − i). In this scenario it is enough to keep O(log n) active positions, i.e., positions i′ for which j − i′ = 2^c for some integer c. Since the data structure of Theorem 3 does not use the hash table to reduce the position range, we can bypass its use in these queries: it is enough to directly determine the predecessor of i among the minimums stored in the stack S, which is computed with a binary search or exponential search as explained in the proof.

The problem with this specific set of positions is that when j increases the active positions no longer provide exact powers of two. This is not critical because we can adopt an update procedure that provides similar results. Let i₀ < i₁ < i₂ represent three consecutive positions that are currently active. When j increases we check whether to keep i₁ or discard it: it is kept if j − i₀ > 2(j − i₂), and otherwise it is discarded. Hence we keep a list of active positions that gets updated by adding the new position j and checking two triples of active positions. We keep an index that indicates which triple to check, and at each step use it to check two triples, moving from smaller to larger position values. The extremes of the list are not checked.
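Putting the pieces of Theorem 3 together, the behaviour of the stack of (value, minimum position) pairs can be sketched as follows (a simplified model in which every Value command is immediately followed by a Mark, as in the sample sequence; the class name OnlineRMQ and the flat Python representation are ours, not the paper's implementation):

```python
import bisect

class OnlineRMQ:
    """Stack of (value, minimum position) pairs, sorted in both
    components; each pair stands for a set of marked positions,
    represented by the set's minimum position."""

    def __init__(self):
        self.stack = [(float('-inf'), 0)]  # sentinel
        self.pos = 0                       # last position assigned by Mark
        self.active = set()                # hash-table role: open positions

    def value(self, v):
        # Pop every entry strictly larger than v; the popped sets are
        # united, so v inherits the minimum position of the deepest pop.
        # (A binary or exponential search can locate the cut in O(log l).)
        min_pos = None
        while self.stack[-1][0] > v:
            min_pos = self.stack.pop()[1]
        # If nothing was popped, we assume the next Mark claims this entry.
        self.stack.append((v, min_pos if min_pos is not None else self.pos + 1))

    def mark(self):
        self.pos += 1
        self.active.add(self.pos)

    def query(self, i):
        # Find(i): predecessor search over the sorted minimum positions.
        assert i in self.active
        mins = [m for _, m in self.stack]  # a real implementation searches in place
        return self.stack[bisect.bisect_right(mins, i) - 1][0]

    def close(self, i):
        self.active.discard(i)

# The sample sequence: V 22 M V 23 M V 26 M V 28 M V 32 M V 27 M V 35 M Q 4 C 3
s = OnlineRMQ()
for v in (22, 23, 26, 28, 32, 27, 35):
    s.value(v)
    s.mark()
assert s.query(4) == 27
s.close(3)
s.value(10)                 # pops everything down to the sentinel
s.mark()
assert s.stack[-1] == (10, 1)
```

Note how the final configuration matches Figure 3: after V 10 the whole stack above the sentinel collapses into the single pair (10, 1).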
We show the resulting list of positions in Table 1, where the bold numbers indicate the triples that will be checked in the next iteration. Whenever the triples to check reach the end of the list, the size of the list is at most 2 log n, because the verification guarantees that the value j − i is divided in half for every other position i. Therefore it takes at most 2 log n steps to traverse the list. Hence this list can contain at most 4 log n = O(log n) positions, and each time j is updated only O(1) time is used.
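The update rule just described can be sketched as follows (our own simplification: instead of advancing an index and checking only two triples per step, this version rescans the list after each increment of j, trading the O(1) update for brevity):

```python
def update_positions(positions, j):
    """Append the new position j, then drop the middle position i1 of
    any consecutive triple i0 < i1 < i2 whenever j - i0 <= 2*(j - i2),
    so that j - i roughly halves every other kept position."""
    positions.append(j)
    k = 1
    while k + 1 < len(positions):
        i0, i2 = positions[k - 1], positions[k + 1]
        if j - i0 <= 2 * (j - i2):
            del positions[k]          # i1 is no longer needed
        else:
            k += 1
    return positions

positions = []
for j in range(1, 1025):
    update_positions(positions, j)
# The list stays logarithmic in size and satisfies the halving invariant.
assert len(positions) <= 30
assert all(1024 - positions[k - 1] > 2 * (1024 - positions[k + 1])
           for k in range(1, len(positions) - 1))
```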
Table 1: Sequence of active position lists.

Another alternative for obtaining O(log log n) high-probability time is to change the UF structure. In this case we use the same approach as Theorem 3, which relies on predecessor searches to compute the Find operation. This time we consider the van Emde Boas tree, which supports this operation efficiently but takes longer to update.
Theorem 4.
It is possible to process, in real time, a sequence of RMQ commands in O(ℓ) space and in O(log log ℓ) time with high probability, for all operations except Value, which requires O(√ℓ) time with high probability.

Proof. First note that the
Value command is not used in the de-amortized transfer procedure described in Lemma 6, thus guaranteeing that the overhead per command will be only O(log log ℓ) time once the statement of the theorem is established.

One important consideration is to reduce the high-probability time of the hash table to O(log log ℓ) instead of O(log ℓ). For this goal we change the separate chaining to the 2-way chaining approach proposed by Azar, Broder, Karlin, and Upfal [1999], also with a maximum load factor of 50%.

We can now analyze the van Emde Boas tree (VEB). This data structure is used as in Theorem 3 to store the minimum values of each set. Hence the underlying universe consists of the positions over A. Since this structure uses space linear in the universe size, this would yield O(n) space; however, in this case we can use the hash table to reduce the position range, and thus the required space becomes O(ℓ). Note that the reduced positions are also integers, so we can correctly use this data structure.

Given that the time to compute a predecessor with this data structure is O(log log ℓ), this implies this bound for the RMQ operations except Value. For this operation we have two caveats. First, the binary search over the values in the stack S still requires O(log ℓ) time. Second, the Union operations in Algorithm 5 imply that it is necessary to remove elements from the VEB tree. This is not a problem for the
Mark operation, Algorithm 6, because a single removal in this tree also requires O(log log ℓ) time. The issue for Value is that it may perform several such operations; in particular, when d elements are removed from the stack it requires O(d log log ℓ) time. Recall the example in the proof of Theorem 3, where several Union operations were executed to produce the set {1} ∪ {2} ∪ ∅ ∪ {4, 5, 6} ∪ {7}. In that theorem this was done automatically by modifying k, but in this case it is necessary to actually remove the elements 2, 3, 4 and 7 from the VEB tree. Note that the element 3 is the representative of the empty set: even though it is not active, it was still in the VEB tree.

This consists in removing from the VEB tree all the elements that are larger than 1. The VEB tree does not have a native operation for this process, hence we have thus far assumed that this was obtained by iterating the delete operation. Still, it is possible to implement this bulk delete operation directly within the structure, much like it can be done over binary search trees. In essence the procedure is to directly mark the necessary first-level structures as empty and then do a double recursion, which is usually strictly avoided in this data structure. Given a variable u that identifies the logarithm of the universe size, ℓ = 2^u, this yields the time recursion T(u) = 2^{u/2} + 2T(u/2), where 2^{u/2} = √ℓ is the number of structures that exist in the first level and potentially need to be modified. This recursion is bounded by O(2^{u/2}) = O(√ℓ): unrolling it gives T(u) = 2^{u/2} + 2·2^{u/4} + 4·2^{u/8} + ···, and the ratio between consecutive terms is 2^{1 − u/2^{k+2}} ≤ 1 while u/2^{k+2} ≥ 1, so the sum is dominated by its first term up to an o(2^{u/2}) tail.

As a final remark about this last result, note that the time bound for the Value command is also O(log log ℓ) amortized; only the high-probability bound is O(√ℓ). This is because the iterated deletion bound O(d log log ℓ) mentioned in the proof amortizes to O(log log ℓ), and for each instance of the Value command we can choose between O(d log log ℓ) and O(√ℓ). This closes the theoretical analysis of the data structure.
Further discussion is given in Section 6.

Let us now focus on testing the performance of this structure experimentally. We implemented the data structure that is described in Theorem 2. We also designed a generator that produces random sequences of RMQ commands. In these generated sequences the array A contained 2 integers, i.e., n = 2 . Each integer was chosen uniformly between 0 and 2 −
1, with the arc4random_uniform function. We first implemented the version of our algorithm described in Section 2, i.e., without using a hash table or the transfer process. We refer to this prototype as the vanilla version and use the letter V to refer to it in our tables. We also implemented the version described in Theorem 2, which includes a hash table and requires a transfer process; we use the label T2 to refer to this prototype. For a baseline comparison we used the ST-RMQ-CON algorithm by Alzamel, Charalampopoulos, Iliopoulos, and Pissis [2018]. We obtained the implementation from their github repository https://github.com/solonas13/rmqo.

Our RMQ command sequence generator proceeds as follows. First it generates n = 2 integers uniformly between 0 and 2 −
1. Then it chooses a position to
Mark, uniformly among the n positions available. This process is repeated q times. Note that the choices are made with repetition, therefore the same position can be chosen several times. Each marked position in turn forces a query command. All query intervals have the same length l = j − i + 1. Under these conditions it is easy to verify that the expected number of open positions at a given time is l × q/n, and the actual number should be highly concentrated around this value. Hence we assume that this value corresponds to our ℓ parameter and therefore determine l as ℓ × n/q.

The tests were performed on a 64-bit machine, running Linux mem 4.19.0-12, which contained 32 cores in
Intel(R) Xeon(R) CPU E7-4830 @ 2.13GHz
CPUs. The system has 256 Gb of RAM and of swap. Our prototypes were compiled with gcc 8.3.0 and the baseline prototype with g++. All prototypes were compiled with -O3. We measured the average execution time per command and the peak memory used by the prototypes; these values were both obtained with the system time command. The results are shown in Tables 2 and 3.

The results show that our prototypes are very efficient. In terms of time both V and T2 obtain similar results, see Table 2. As expected T2 is slightly slower than V, but in practice this difference is less than a factor of 2. The time performance of B is also very similar; in fact V and T2 are faster, which was not expected, as B has O(1) performance per operation while V and T2 have O(α(n)). Even though in practice this difference was expected to be very small, we were not expecting to obtain faster performance. This is possibly a consequence of the memory hierarchy, as B works by keeping A and all the queries in memory.

Concerning memory our prototypes also obtained very good performance, see Table 3. In particular we can clearly show a significant difference between using O(q) and O(ℓ) extra space. Consider for example q = 2 and ℓ = 2 . For these values V uses more than one gigabyte of memory, whereas T2 requires only 17Mb, a very large difference. In general T2 uses less memory than V, except when q and ℓ become similar. For example when q = ℓ = 2 , V uses around one gigabyte of memory, whereas T2 requires three, but this is expected, up to a fixed factor. The baseline B requires much more memory as it stores more items in memory, namely a compacted version of the array A and the solutions to all of the queries. Our prototypes V and T2 do not store query solutions; instead, whenever a query is computed its value is written to a volatile variable. This guarantees that all the necessary computation is performed, instead of being optimized away by the compiler.
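The generator just described can be sketched as follows (our own simplified model with hypothetical names; n, q and the value bound are parameters here, since the exact powers of two used in the experiments are fixed in the text above):

```python
import random

def generate_commands(n, q, ell, seed=0):
    """Emit a command sequence: n values, q marks chosen uniformly with
    repetition, and one query (followed by a close) per mark, all with
    the same interval length l = ell * n / q."""
    rng = random.Random(seed)
    l = max(1, ell * n // q)
    marks = {}
    for _ in range(q):                       # positions may repeat
        i = rng.randrange(1, n - l + 2)
        marks[i] = marks.get(i, 0) + 1
    cmds = []
    for j in range(1, n + 1):
        cmds.append(('V', rng.randrange(2**31)))
        for _ in range(marks.get(j, 0)):     # mark position j
            cmds.append(('M', j))
        i = j - l + 1                        # interval [i, j] has length l
        for _ in range(marks.get(i, 0)):
            cmds.append(('Q', i))
            cmds.append(('C', i))
    return cmds

cmds = generate_commands(n=64, q=16, ell=4)
counts = {}
for c, _ in cmds:
    counts[c] = counts.get(c, 0) + 1
assert counts == {'V': 64, 'M': 16, 'Q': 16, 'C': 16}
```

With these parameters roughly ell positions are open at any time, which is how the experiments control the ℓ parameter.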
However, it also means that previous solutions are overwritten by newer results. We deemed this solution adequate for an online algorithm, which in practice will most likely pass its results to a calling process. Moreover, storing the query solutions would bound the experimental results to Ω(q) space, thus not being a fair test of O(ℓ) space.

https://github.com/freedesktop/libbsd

Table 2: Execution time per command in nanoseconds, for varying ℓ (columns) and q (rows), for the prototypes T2, V and B. The values are obtained by dividing total execution time by n + q.

Table 3: Total memory peak in Megabytes, or in Gigabytes when indicated by Gb, for varying ℓ (columns) and q (rows), for the prototypes T2, V and B.
The Range Minimum Query problem has been exhaustively studied. This problem was shown to be linearly equivalent to the Lowest Common Ancestor problem in a static tree by Gabow, Bentley, and Tarjan [1984]. A recent perspective on this result was given by Bender and Farach-Colton [2000]. The first major solution to the LCA problem, by Berkman and Vishkin [1993], obtained O(α(n)) time using Union-Find data structures, similarly to our data structure. In fact this initial result was a fundamental inspiration for the data structure we propose in this paper. A constant-time solution was proposed by Harel and Tarjan [1984], and a simplified algorithm was proposed by Schieber and Vishkin [1988]. A simplified exposition of these algorithms, and of the linear equivalence reductions, was given by Bender and Farach-Colton [2000].

Even though these algorithms were simpler to understand and implement, they still required O(n) space to store auxiliary data structures, such as Cartesian trees. Moreover, the constants associated with these data structures were large, limiting the practical application of these algorithms. To improve this limitation, optimal direct algorithms for RMQ were proposed by Fischer and Heun [2006]. The authors also showed that their proposal improved previous results by a factor of two. However, they also observed that for several common problem sizes, asymptotically slower variants obtained better performance. Hence a practical approach, which obtained a 5-fold speedup, was proposed by Ilie, Navarro, and Tinta [2010]. Their approach was geared towards the Longest Common Extension on strings and leveraged the use of its average value. A line of research focused on reducing constants by using succinct and compressed representations was initiated by Sadakane [2007a] and successively improved by Sadakane [2007b], Sadakane and Navarro [2010] and Fischer and Heun [2011].
The last authors provide a systematic comparison of the different results up to 2011. Their solution provides a 2n + o(n) bits data structure that answers queries in O(1) time.

Still, several engineering techniques can be used to obtain more practical efficient solutions. An initial technique was proposed by Grossi and Ottaviano [2013]. A simplification implemented by Ferrada and Navarro [2017] used 2. n bits and answered queries in 1 to 3 microseconds per query. Another proposal by Baumstark, Gog, Heuer, and Labeit [2017] obtained around 1 microsecond per query (timings vary depending on query parameters) on a single core of the Intel Xeon E5-4640 CPU.

A new approach was proposed by Alzamel, Charalampopoulos, Iliopoulos, and Pissis [2018], where no index data structure is created by a preprocessing step. Instead, all the RMQs are batched together and solved in n + O(q) time and O(q) space. This space is used to store a contracted version of the input array A and the solutions to the queries. This is essentially the approach we follow in this paper. Therefore in Table 2 we independently verify that their query times are in the nanoseconds; Table 3 reports the memory requirements of their structure.

In a recent result, Kowalski and Grabowski [2018] proposed a heuristic idea, without constant worst-case time, and a hybrid variation with O(1) time and 3n bits. Their best result is competitive against existing solutions, except possibly for small queries. Their results show query times essentially equal to ours and to the algorithm of Alzamel, Charalampopoulos, Iliopoulos, and Pissis [2018] for large queries, but they also obtain 10 times slower performance for small queries.

For completeness we also include references to the data structures we used, or mentioned, in our approach. The technique by Briggs and Torczon [1993] provides a way to use memory without the need to initialize it.
Moreover, each time a given memory position needs to be used for the first time, only O(1) time is required to register this change. The trade-off with this data structure is that it triples the space requirements. Since, for now, we do not have an implementation of Lemma 6, the claimed result can use this technique, also explained by Bentley [2016] and Aho and Hopcroft [1974]. For our particular implementation this can be overcome. For the destination structure this is not a problem, because we can assume that the whole merge process includes the time for the initial clean-up; by spreading the clean-up over the merge operations it is possible to finish it within the merge's amortized budget. The first data structure supporting Union and
Find operations was given by Galler and Fisher [1964]. Its complexity was bounded by O(log* n) amortized time per operation by Hopcroft and Ullman [1973]. The analysis of the time bound was later refined to O(α(n)) by Tarjan and van Leeuwen [1984]. Lower bound analyses guarantee that these bounds are optimal, see Tarjan [1979] and Fredman and Saks [1989]. However, in the case where the sequence of operations is known a priori, it is possible to obtain O(1) amortized time per operation, as shown by Gabow and Tarjan [1985]. An exhaustive survey was given by Galil and Italiano [1991]. Elementary descriptions of this data structure were provided by Cormen, Leiserson, Rivest, and Stein [2009] and by Sedgewick and Wayne [2011]. Hash tables date back to the origin of computers. A history of the subject and the first theoretical analysis were given by Knuth [1963]. This analysis established a constant expected time bound. The high probability bound for separate chaining can be derived from the balls-and-bins model, see Mitzenmacher and Upfal [2017]. A better bound was obtained by Gonnet [1981]. The 2-way chaining hash table was proposed by Azar, Broder, Karlin, and Upfal [1999], who also established its constant expected time and high probability bounds. Exponential searches were proposed by Bentley and Yao [1976] and Baeza-Yates and Salinger [2010]; they can be used to speed up binary search when the desired element is close to the beginning or end of a list. For an introduction to binary search see Cormen et al. [2009]. The data structure by Boas, Kaas, and Zijlstra [1976] provides support for Predecessor queries over integers in O(log log n) time, by recursively dividing the tree at half its height. An elementary description, of a variant which requires less space, was given by Cormen, Leiserson, Rivest, and Stein [2009]. The y-fast trie data structure was proposed by Willard [1983] to reduce the large space requirements of the Van Emde Boas tree.
This data structure obtains the O(log log n) time bound, although only in an amortized sense. For this reason we did not consider it in Theorem 4. In the process, that result also describes x-fast tries. We can now discuss our results in context. In this paper we started by defining a set of commands that can be used to form sequences. Although these commands are fairly limited, they can still be used for several important applications. First, notice that if we are given a list of (i, j) RMQs we can reduce this classical problem to our command setting. This can be achieved with two hash tables: in the first table we store the queries indexed by i, and in the second indexed by j. We use the first table to issue Mark commands and the second to issue
Query commands. This requires some overhead, but it allows our approach to be used to solve classical RMQ problems. In particular, it significantly increases the memory requirements, as occurs in Table 3 between T2 and B. Our data structures can be used in online and real-time applications. Note in particular that we can use our commands to maintain the marked positions in a sliding window fashion, meaning that at any instant we can issue
Query commands for any of the previous ℓ positions. The extremely small memory requirements of our approach make our data structure suitable for use in routers, switches, or embedded computation devices with low memory and CPU resources. The simplest configuration of our data structure consists of a stack combined with a Union-Find data structure. For this structure we can formally prove that our procedures correctly compute the desired result, Theorem 1. We then focused on obtaining the data structure configuration that yielded the best performance. We started by obtaining O(α(n)) amortized time and O(q) space, see Theorem 2. This result is in theory slower than the result by Alzamel, Charalampopoulos, Iliopoulos, and Pissis [2018], which obtained O(1) amortized query time. We compared these approaches experimentally in Section 4. The results showed that our approach was competitive, both in terms of time and space; our prototype V was actually faster than the prototype B of Alzamel et al. [2018]. We also showed that it is possible for our data structure to obtain O(1) amortized query time (Corollary 1), mostly for theoretical competitiveness. We did not implement this solution. We described how to reduce the space requirements down to O(ℓ), by transferring information among structures and discarding structures that become full, see Lemma 6. In theory this obtains the same O(α(n)) amortized time but significantly reduces the space requirements. We also implemented this version of the data structure. In practice the time penalty was less than a factor of 2. Moreover, for some configurations, the memory reduction was considerable, see Table 3. Lastly, we focused on obtaining real-time performance. We obtained a high probability bound of O(log n) amortized time per query, see Theorem 3. This bound guarantees real-time performance. We then investigated alternatives to reduce this time bound to O(log log n). We proposed two solutions.
In one case we considered approximate queries, thus reducing the necessary number of active positions to O(log n). In the other case we used the Van Emde Boas tree, which provides an O(log log n) high probability time bound for all commands except Value, see Theorem 4. In this latter configuration the
Value command actually obtained an O(√ℓ) bound, which is large, but the corresponding amortized value is only O(log log n).

Acknowledgements. The work reported in this article was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UIDB/50021/2020 and project NGPHYLO PTDC/CCI-BIO/29676/2017.
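As a minimal illustration of the Value/Query command interface discussed above, the sketch below keeps only the stack component: a monotone stack stores the positions whose values are suffix minima of the sequence seen so far, and a binary search over it answers each Query. The class and method names are ours, and the binary search merely stands in for the Union-Find machinery of the actual data structure, so this is a sketch of the interface rather than of our algorithm.

```python
from bisect import bisect_left

class RMQStream:
    """Illustrative sketch of the Value/Query command interface.

    The monotone stack holds positions p1 < p2 < ... with strictly
    increasing values, where val[t] = min of the values from pos[t] to
    the current end.  Query(i) returns the minimum from position i to
    the current end of the stream.
    """

    def __init__(self):
        self.pos, self.val = [], []  # parallel stacks, increasing bottom to top
        self.n = 0                   # number of Value commands processed

    def value(self, v):
        # Pop dominated entries; they can never answer a future query.
        while self.val and self.val[-1] >= v:
            self.pos.pop()
            self.val.pop()
        self.pos.append(self.n)
        self.val.append(v)
        self.n += 1

    def query(self, i):
        # Leftmost stack position >= i holds min of values from i onwards.
        return self.val[bisect_left(self.pos, i)]
```

The binary search makes each Query cost O(log ℓ) in this sketch, whereas the paper's stack plus Union-Find configuration achieves O(α(n)) amortized time.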
References
Maxime Crochemore and Luís M. S. Russo. Cartesian and Lyndon trees. Theoretical Computer Science, 806:1–9, February 2020.
Harold N. Gabow and Robert Endre Tarjan. A linear-time algorithm for a special case of disjoint set union. Journal of Computer and System Sciences, 30(2):209–221, April 1985.
Preston Briggs and Linda Torczon. An efficient representation for sparse sets. ACM Letters on Programming Languages and Systems, 2(1-4):59–69, March 1993.
Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2017.
Robert E. Tarjan and Jan van Leeuwen. Worst-case analysis of set union algorithms. Journal of the ACM, 31(2):245–281, March 1984.
Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. Balanced allocations. SIAM Journal on Computing, 29(1):180–200, January 1999.
Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, and Solon P. Pissis. How to answer a small batch of RMQs or LCA queries in practice. In Lecture Notes in Computer Science, pages 343–355. Springer International Publishing, 2018.
Harold N. Gabow, Jon Louis Bentley, and Robert E. Tarjan. Scaling and related techniques for geometry problems. In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing - STOC '84. ACM Press, 1984.
Michael A. Bender and Martín Farach-Colton. The LCA problem revisited. In Lecture Notes in Computer Science, pages 88–94. Springer Berlin Heidelberg, 2000.
Omer Berkman and Uzi Vishkin. Recursive star-tree parallel data structure. SIAM Journal on Computing, 22(2):221–242, April 1993.
Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338–355, May 1984.
Baruch Schieber and Uzi Vishkin. On finding lowest common ancestors: Simplification and parallelization. SIAM Journal on Computing, 17(6):1253–1262, December 1988.
Johannes Fischer and Volker Heun. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In Combinatorial Pattern Matching, pages 36–48. Springer Berlin Heidelberg, 2006.
Lucian Ilie, Gonzalo Navarro, and Liviu Tinta. The longest common extension problem revisited and applications to approximate string searching. Journal of Discrete Algorithms, 8(4):418–428, December 2010.
Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory of Computing Systems, 41(4):589–607, February 2007a.
Kunihiko Sadakane. Succinct data structures for flexible text retrieval systems. Journal of Discrete Algorithms, 5(1):12–22, March 2007b.
Kunihiko Sadakane and Gonzalo Navarro. Fully-functional succinct trees. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, January 2010.
Johannes Fischer and Volker Heun. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing, 40(2):465–492, January 2011.
Roberto Grossi and Giuseppe Ottaviano. Design of practical succinct data structures for large data collections. In Experimental Algorithms, pages 5–17. Springer Berlin Heidelberg, 2013.
Héctor Ferrada and Gonzalo Navarro. Improved range minimum queries. Journal of Discrete Algorithms, 43:72–80, March 2017.
Niklas Baumstark, Simon Gog, Tobias Heuer, and Julian Labeit. Practical range minimum queries revisited. In volume 75 of Leibniz International Proceedings in Informatics (LIPIcs), pages 12:1–12:16, Dagstuhl, Germany, 2017. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. ISBN 978-3-95977-036-1.
Tomasz M. Kowalski and Szymon Grabowski. Faster range minimum queries. Software: Practice and Experience, 48(11):2043–2060, 2018.
Jon Bentley. Programming Pearls. Addison-Wesley Professional, 2016.
Alfred V. Aho and John E. Hopcroft. The Design and Analysis of Computer Algorithms. Pearson Education India, 1974.
Bernard A. Galler and Michael J. Fisher. An improved equivalence algorithm. Communications of the ACM, 7(5):301–303, May 1964.
J. E. Hopcroft and J. D. Ullman. Set merging algorithms. SIAM Journal on Computing, 2(4):294–303, December 1973.
Robert Endre Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. Journal of Computer and System Sciences, 18(2):110–127, April 1979.
M. Fredman and M. Saks. The cell probe complexity of dynamic data structures. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing - STOC '89. ACM Press, 1989.
Zvi Galil and Giuseppe F. Italiano. Data structures and algorithms for disjoint set union problems. ACM Computing Surveys, 23(3):319–344, September 1991.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2009.
Robert Sedgewick and Kevin Wayne. Algorithms. Addison-Wesley Professional, 2011.
Don Knuth. Notes on "open" addressing. 1963.
Gaston H. Gonnet. Expected length of the longest probe sequence in hash code searching. Journal of the ACM, 28(2):289–304, April 1981.
Jon Louis Bentley and Andrew Chi-Chih Yao. An almost optimal algorithm for unbounded searching. Information Processing Letters, 5(3):82–87, 1976. ISSN 0020-0190.
Ricardo Baeza-Yates and Alejandro Salinger. Fast intersection algorithms for sorted sequences. In Algorithms and Applications, pages 45–61. Springer Berlin Heidelberg, 2010.
P. van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue. Mathematical Systems Theory, 10(1):99–127, December 1976.
Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(n).