arXiv [cs.DS]

Oblivious Sorting and Queues
Johannes Schneider
University of Liechtenstein, Vaduz, Liechtenstein
Abstract
We present a deterministic oblivious LIFO (stack), FIFO, double-ended and double-ended priority queue as well as an oblivious mergesort and quicksort algorithm. Our techniques and ideas include concatenating queues end-to-end, size balancing of multiple arrays, and several multi-level partitionings of an array. Our queues are the first to enable executions of pop and push operations without any change of the data structure (controlled by a parameter). This enables interesting applications in computing on encrypted data, such as hiding confidential expressions. Mergesort becomes practical using our LIFO queue, i.e., it improves prior work (STOC '14) by a factor of (more than) 1000 in terms of comparisons for all practically relevant queue sizes. We are the first to present double-ended (priority) and LIFO queues as well as an oblivious quicksort which is asymptotically optimal. Aside from theoretical analysis, we also provide an empirical evaluation of all queues.
Keywords: sorting, queues, complexity, oblivious algorithms, privacy preserving, computation on encrypted data, secure computing, fully homomorphic encryption, secure multi-party computation
1. Introduction
Advances in computing on encrypted data such as Fully Homomorphic Encryption (FHE) and secure multi-party computation (SMC) might make outsourcing computation securely practically feasible. Memory access must also be secured. For example, accessing the i-th element of an array of length n needs O(1) operations on RAM machines. But for a program running on encrypted data, the same access mechanism reveals access patterns. The knowledge of seemingly simple access patterns can help to disclose sensitive information such as stock trading patterns [17] or encryption keys [10]. A simple solution accesses all array elements, requiring O(n) instead of O(1) time. Oblivious RAM (ORAM) secures memory access more efficiently using multiple parties. Often, relying on more than one party is not desirable. Current solutions for oblivious data structures also do not hide (high-level) operations, which makes them unsuitable for omnipresent 'if-then-else' statements with private conditions and queue accesses in branches. Evaluating a confidential expression, keeping data as well as the expression itself secret, becomes straightforward using our LIFO queue and known techniques for computing on encrypted data. Such a scenario is important for cloud computing, i.e., a cloud provider might host data for customers, which run their own analytics functionality. The customers wish to keep their data and algorithms private – in case of industrial automation an algorithm often means a mathematical expression on time-series sensor data. To summarize, the main contributions are:
1. We present oblivious LIFO, FIFO and double-ended (priority) queues. The amortized overhead of an operation on the LIFO queue is O(log n) in the maximal length n of the queue. Prior LIFO queues (based on priority queues [21]) required O(log^2 n). For a wide range of applications such as the producer-consumer problem in a streaming context our FIFO queue has only O(log n) overhead, which improves prior work [21] by a factor log n. We are the first to introduce double-ended queues. Our double-ended queue needs O(log^2 n).
2. We are the first to derive oblivious data structures that support push and pop operations which might not alter the stored elements (depending on a parameter).
Email address: [email protected] (Johannes Schneider)
In fact, a request from industry motivated this feature.
Preprint submitted to Journal of Theoretical Computer Science, June 25, 2018

3. Our deterministic mergesort algorithm improves on [9] for all relevant list sizes, e.g., by two orders of magnitude for sorting of 10 billion elements.

4. We state the first oblivious quicksort algorithm. It is asymptotically optimal. The Monte Carlo algorithm succeeds with high probability, i.e., 1 − 1/n^c for an arbitrary constant c.

We structure the array representing the queue in subarrays (SAs) of increasing size. A SA might itself be a queue. SAs are organized into parts that are merged and split when they are shifted between different SAs. Moving elements between SAs can cause some of the push and pop operations to require linear run-time in the maximal queue length. But the time is amortized across many operations, so that the average overhead is only (poly)logarithmic. Moving of parts between SAs happens based on the number of pops and pushes. It is not dependent on the data held in the queue. We develop a deterministic calling pattern that does not require knowing the number of stored elements in a queue. This allows us to hide the number of operations together with another idea: we permit the pop and push of a special (empty) element that does not alter the number of stored elements in the data structure. Put differently, this disguises whether an operation on the data structure changed the stored elements or not. Furthermore, to ensure efficient access to both ends of a queue, e.g., as needed for FIFO and double-ended queues, we concatenate two ends of a (LIFO) queue.

We first discuss our model and some notation (Section 2). The main data structures are given in Section 3 (Stack) with a detailed explanation of core ideas and analysis, Section 4 (FIFO) and Section 5 (double-ended queue). Detailed case studies are given in Section 9 after explaining the technique thoroughly. This includes an explanation how obliviousness (and operation hiding) helps in securing code.
Performance evaluation can be found in Section 11.
2. Preliminaries and Limitations
We assume knowledge of an upper bound on the maximal number of elements n that can be kept in the data structure, i.e., a queue is represented by an array of fixed size n. This assumption is common for oblivious data structures. Adjusting the size of the data structure exactly to the actual number of elements is impossible, since our goal is to conceal the number of elements contained in the queue. Our queues support two operations: Push (allowing empty elements) and Pop (allowing conditional popping). For obliviousness we provide a definition analogous to [6]. Essentially, obliviousness implies that memory access patterns are the same for any input.

Definition 1.
A data structure is oblivious if the sequence of memory accesses depends only on the number of push and pop operations. A sorting algorithm is oblivious if the sequence of memory accesses is the same regardless of the input.
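For intuition, the trivial "touch everything" array access mentioned in the introduction already satisfies this definition for a single read: every cell is accessed regardless of the secret index, at O(n) cost. A minimal sketch (the function name and the 0/1 selector encoding are our own illustration, not from the paper):

```python
def oblivious_read(arr, secret_index):
    """Read arr[secret_index] while touching every cell, so the sequence of
    memory accesses is independent of secret_index (linear-scan baseline)."""
    result = 0
    for i in range(len(arr)):
        b = 1 if i == secret_index else 0        # on encrypted data this bit stays hidden
        result = b * arr[i] + (1 - b) * result   # branchless select
    return result
```

Every iteration performs the same reads and writes, so an observer of the access pattern learns nothing about `secret_index`.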
We use a special (empty) element "∅", also denoted by a dash '−', indicating that an element in the queue is unused. Its bit representation must be different from any data item stored in the queue. We use variants of compare-and-exchange operations. The simplest form takes as input a binary bit b and two variables A and B. It assigns A := B if the bit b is 1, otherwise A is not changed, i.e., it computes A := b · B + (1 − b) · A. The compare-exchange-and-erase CmpExEr(b, A, B) performs a compare and exchange as described and, additionally, it might erase B, i.e., it sets variable B to ∅ if b is 1 and leaves it unchanged otherwise (see pseudocode CmpExEr in Algorithm 1). For the analysis we distinguish between input-sensitive operations, involving parameters of the push and pop elements as well as data of the queue, and operations that do not directly depend on any input data (but potentially on the number of operations). The motivation is that for secure computation these distinctions are meaningful, since the former correspond to (slower) operations on encrypted data. For our algorithms input-sensitive operations always dominate the time complexity – even when using non-encrypted data. They are split into elementary operations (+, −, ·), called E-Ops, and comparisons
C-Ops, which are composed of elementary operations. The distinction is motivated since comparisons are used to measure performance of sorting algorithms. For encrypted operations, comparisons might have different time complexities, e.g., for SMC such as [19] it is not clear how to perform a comparison in less than Ω(n_b) E-Ops, where n_b is the number of bits of a compared number.

3. LIFO (Stack)

For a Last-In-First-Out (LIFO) queue (also called stack) a pop operation returns the element most recently pushed onto the data structure. To ensure obliviousness we access the same array elements independent of the data contained in the queue. Our queue always accesses the first element. A newly pushed element is stored in the first position of the array. This implies that upon every insertion, we must shift elements to the right to avoid overwriting a previously inserted element. It is easy to shift the entire queue to the right, but this requires linear run-time. To improve efficiency, we logically split the array representing the queue into subarrays (SAs) of exponentially growing size. We only shift parts of size at most 2^k of a SA after every 2^k push or pop operations.
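The CmpExEr primitive from Section 2, on which all shifting below relies, can be sketched as follows (a sketch under our own assumptions: EMPTY = 0 encodes ∅, so payload values must be nonzero):

```python
EMPTY = 0  # assumed encoding of the empty element ∅; real data must be nonzero

def cmp_ex_er(b, a, i, c, j):
    """CmpExEr(b, A, B): A := b*B + (1-b)*A and B := (1-b)*B + b*EMPTY,
    computed arithmetically so both values of b cost the same operations."""
    a[i], c[j] = b * c[j] + (1 - b) * a[i], (1 - b) * c[j] + b * EMPTY

A, B = [5], [7]
cmp_ex_er(1, A, 0, B, 0)  # exchange and erase: A becomes [7], B becomes [EMPTY]
cmp_ex_er(0, A, 0, B, 0)  # b = 0: nothing changes
```

Note that no branch depends on the data: the same multiplications and additions are executed whether b is 0 or 1.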
LIFO Queue A with 3 subarrays S(i):  A = [S(0) || S(1) || S(2)]
Subarray S(i) with 4 parts P(i,j):   S(i) = P(i,0) | P(i,1) | P(i,2) | P(i,3)
Part P(i,j) with 2^i elements:       P(i,j) = E(i,j,0) E(i,j,1) ... E(i,j,2^i-1)
Legend: "||" separates subarrays, "|" parts and " " elements

Elements 1-12 pushed to (=>) LIFO Queue:
 1 => [ 1| -| -| -|| - -| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 2 => [ 2| 1| -| -|| - -| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 3 => [ 3| 2| 1| -|| - -| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 4 => [ 4| 3| -| -|| 2 1| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 5 => [ 5| 4| 3| -|| 2 1| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 6 => [ 6| 5| -| -|| 4 3| 2 1| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 7 => [ 7| 6| 5| -|| 4 3| 2 1| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 8 => [ 8| 7| -| -|| 6 5| 4 3| 2 1| - -|| - - - -| - - - -| - - - -| - - - -]
 9 => [ 9| 8| 7| -|| 6 5| 4 3| 2 1| - -|| - - - -| - - - -| - - - -| - - - -]
10 => [10| 9| -| -|| 8 7| 6 5| 4 3| 2 1|| - - - -| - - - -| - - - -| - - - -]
11 => [11|10| 9| -|| 8 7| 6 5| 4 3| 2 1|| - - - -| - - - -| - - - -| - - - -]
12 => [12|11| -| -||10 9| 8 7| 6 5| - -|| 4 3 2 1| - - - -| - - - -| - - - -]

Elements popped from (<=) LIFO Queue:
12 <= [ -|11| -| -||10 9| 8 7| 6 5| - -|| 4 3 2 1| - - - -| - - - -| - - - -]
11 <= [10| 9| -| -|| - -| 8 7| 6 5| - -|| 4 3 2 1| - - - -| - - - -| - - - -]
10 <= [ -| 9| -| -|| 8 7| 6 5| - -| - -|| 4 3 2 1| - - - -| - - - -| - - - -]
 9 <= [ 8| 7| -| -|| - -| 6 5| - -| - -|| 4 3 2 1| - - - -| - - - -| - - - -]
 8 <= [ -| 7| -| -|| 6 5| - -| - -| - -|| 4 3 2 1| - - - -| - - - -| - - - -]
 7 <= [ 6| 5| -| -|| - -| - -| - -| - -|| 4 3 2 1| - - - -| - - - -| - - - -]
 6 <= [ -| 5| -| -|| - -| - -| - -| - -|| 4 3 2 1| - - - -| - - - -| - - - -]
 5 <= [ 4| 3| -| -|| - -| 2 1| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 4 <= [ -| 3| -| -|| 2 1| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 3 <= [ 2| 1| -| -|| - -| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 2 <= [ -| 1| -| -|| - -| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]
 1 <= [ -| -| -| -|| - -| - -| - -| - -|| - - - -| - - - -| - - - -| - - - -]

Figure 1: Pushes and pops onto a LIFO queue

More formally, a queue is implemented as an array A that is split into s subarrays (SAs) S_i growing exponentially in size with i. The total length n of the array is n := Σ_{i=0}^{s−1} |S_i|. Each SA S_i is itself partitioned into q parts P_{i,0}, P_{i,1}, ..., P_{i,q−1} of equal size |P_{i,j}| = |S_i|/q. The size of a part varies for different SAs. We denote the k-th element in P_{i,j} by E_{i,j,k}. Figure 1 shows the structure of a queue.

We explain the shifting procedure shown in Figure 1 for a sequence of push operations. We always push an element onto the first position in the array A (or pop an element from there). After every modification of the queue, we restructure some SAs as described next.

Steps for a Push of 6 onto LIFO Queue:
Original queue:                              [ 5| 4| 3| -|| - -| 2 1| - -| - -|| - - - -| - - - -| - - - -| - - - -]
after ShiftPartsRight(Sublist 0):            [ -| 5| 4| 3|| - -| 2 1| - -| - -|| - - - -| - - - -| - - - -| - - - -]
after E(0,0,0)=6:                            [ 6| 5| 4| 3|| - -| 2 1| - -| - -|| - - - -| - - - -| - - - -| - - - -]
final queue after EmptyTwoParts(Sublist 0):  [ 6| 5| -| -|| 4 3| - -| 2 1| - -|| - - - -| - - - -| - - - -| - - - -]

Executing EmptyTwoParts(Sublist 0):
Original queue:                              [ 6| 5| 4| 3|| - -| 2 1| - -| - -|| - - - -| - - - -| - - - -| - - - -]
after ShiftPartsRight(Sublist 1):            [ 6| 5| 4| 3|| - -| - -| 2 1| - -|| - - - -| - - - -| - - - -| - - - -]
after shifting P(0,2) to P(1,0):             [ 6| 5| -| 3|| 4 -| - -| 2 1| - -|| - - - -| - - - -| - - - -| - - - -]
final queue after shifting P(0,3) to P(1,0): [ 6| 5| -| -|| 4 3| - -| 2 1| - -|| - - - -| - - - -| - - - -| - - - -]

Figure 2: Steps for pushing an element onto a LIFO queue
Algorithm 1 LIFO

Initialization(Number of SAs s with s ≥ 1)
  q := 4 {number of parts per SA}
  E_{i,j,k} := ∅ for all i ∈ [0, s−1], j ∈ [0, q−1], k ∈ [0, 2^i − 1]
  n_pu := n_po := 0 {counters for pushes and pops}

CmpExEr(b, A, B)
  A := b · B + (1 − b) · A {exchange A, B based on b}
  B := (1 − b) · B + b · ∅ {delete B based on b}

ShiftPartsRight(SA i, doOp)
  emptyAndDoOp := doOp if E_{i,0,0} ≠ ∅ else 0
  for part j := q−1 downto 1 do
    doShift := emptyAndDoOp if E_{i,j,0} = ∅ else 0
    for element k := 0 to |S_i|/q − 1 do
      CmpExEr(doShift, E_{i,j,k}, E_{i,j−1,k})

EmptyTwoParts(SA i)
  isFull := 1 if ∧_{j=0}^{q−1} (E_{i,j,0} ≠ ∅) else 0
  ShiftPartsRight(SA i+1, isFull)
  for part j := q−2 to q−1 do
    o := (j − q + 2) · |S_i|/q {offset for last 2 parts}
    for element k := 0 to |S_i|/q − 1 do
      CmpExEr(isFull, E_{i+1,0,k+o}, E_{i,j,k})

Push(Element x)
  b := 1 if x ≠ ∅ ∧ E_{0,0,0} ≠ ∅ else 0
  ShiftPartsRight(SA 0, b)
  E_{0,0,0} := x if x ≠ ∅ else E_{0,0,0}
  n_pu := MoveBetweenSAs(n_pu, EmptyTwoParts)

MoveBetweenSAs(nOps, Operation Op)
  mi := 0 {find maximal SA to empty/refill}
  while (nOps + 1) mod 2^{mi+1} = 0 do mi := mi + 1
  for SA i := min(mi, s−2) downto 0 do
    apply operation Op on SA i
  return (nOps + 1) mod 2^{max(0, s−1)}

ShiftPartsLeft(SA i, doOp)
  fullAndDoOp := doOp if E_{i,0,0} = ∅ else 0
  for part j := 0 to q−2 do
    doShift := fullAndDoOp if E_{i,j,0} = ∅ else 0
    for element k := 0 to |S_i|/q − 1 do
      CmpExEr(doShift, E_{i,j,k}, E_{i,j+1,k})

RefillTwoParts(SA i)
  isEmpty := 1 if ∧_{j=0}^{q−1} (E_{i,j,0} = ∅) else 0
  ShiftPartsLeft(SA i+1, 1)
  for part j := 0 to 1 do
    o := j · |S_i|/q {offset for first 2 parts}
    for element k := 0 to |S_i|/q − 1 do
      CmpExEr(isEmpty, E_{i,j,k}, E_{i+1,0,k+o})

Pop(doPop)
  ShiftPartsLeft(SA 0, 1)
  result := E_{0,0,0} if doPop else ∅
  E_{0,0,0} := ∅ if doPop else E_{0,0,0}
  n_po := MoveBetweenSAs(n_po, RefillTwoParts)
  return result
We modify (some) SAs to ensure that there is space for further pushes in the first SA. We shift elements to the right. Shifting is only done on a part level, i.e., either we shift all elements of a part or none. We perform frequent shifts to overwrite empty small parts near the beginning of the array, and less frequent shifts are conducted for larger parts situated towards the end of the array. We shift parts within a SA but also move parts between SAs, i.e., either we merge two parts into one or we split a part into two parts. The subroutines for a push shown in Algorithm 1 are discussed next.

ShiftPartsRight and EmptyTwoParts: ShiftPartsRight shifts elements from one part to the next part (on the right) within a SA. It avoids overwriting of filled parts by checking if the part to be overwritten is indeed empty. To this end, we only check if the first position of a part is empty. No parts are moved, if the first part of the SA is empty. A parameter indicates whether shifting should take place or not. This is necessary to enable executions of push operations that do not modify the queue. If the parameter is false, i.e., zero, then no elements are moved. The order of shifting is from back to front, i.e., elements of the second to last part are shifted to the last part (given it is empty), then the third to last part is shifted to the second to last (if empty) and so on. EmptyTwoParts empties the last two parts of a SA i by merging them to form the first part of SA i+1. It first empties the first part of SA i+1 by shifting the parts of SA i+1 to the right. Simply appending the merged part at the first empty position of SA i+1 instead would lead to empty SAs followed by (partially) full SAs. As a consequence, for pop operations we would have to undo the shifting (or search the entire array).

Push: A push operation first ensures that the first position of the array is empty. Then, it inserts the pushed element at the first position. A push and its suboperations are illustrated in Figure 2.

MoveBetweenSAs: Restructuring is done after every operation, starting from some initial SA (down) to the very first SA at the beginning of the queue. The (index of the) initial SA depends on the number of operations and not on the number of actual elements in the queue, which we wish to disguise. Parts of a SA are moved to the next SA once a SA is full. It might seem reasonable to move all parts of a full SA to the next. However, for alternating pushes and pops this might trigger large performance penalties, since parts are continuously moved back and forth between SAs. To disguise the number of elements in the queue (and thus parts), we access all parts in the same deterministic manner for any sequence of pushes of fixed length. Since we allow pushes of a special (empty) element that has no impact on the number of stored elements, the number of operations (as an indicator for the actual number of elements contained) is not exact. We assume that the array grows at a maximal rate, i.e., every push is done using a non-empty element. Since we always empty two parts of a SA, we must create space in a SA by moving elements whenever a sequence of operations could have resulted in the filling of two parts of that SA. For example, every push potentially fills one part in the first SA, since its parts are of size one. Thus, we would empty the first SA after every second push operation. For SA i with parts of size 2^i, we would move two parts to the next SA after every 2^{i+1} operations.
But this approach fails for an arbitrary interleaving of pops and pushes of empty and non-empty elements. For example, for the following sequence of pushed elements 1, 2, 3, ∅, 4, 5, the algorithm would attempt to empty the first SA after having pushed 1, 2, 3, ∅. The first SA contains 1, 2, 3 and misses one element to be full. Thus, the SA would not be emptied, and two more elements could be (attempted to be) pushed onto the SA before trying to empty it again, but the SA becomes full after pushing one more element. Therefore, we perform restructuring operations more frequently, i.e., for SA i we execute EmptyTwoParts after every 2^i operations (rather than after every 2^{i+1}). The last SA that can be emptied is the second to last, i.e., the one with index s − 2. The restructuring is done in Algorithm MoveBetweenSAs, which for a push operation executes EmptyTwoParts on all SAs as described. It takes as input the counter of the current operations and returns the next value for the counter, which is (usually) the counter incremented by 1. However, once the maximal possible SA has been shifted, the operation counter is reset to zero, i.e., the counter is kept modulo 2^{s−1}.

Pop: A pop operation works analogously to a push; a parameter states whether we actually perform the operation or not. ShiftPartsLeft only shifts a part, if the first part is empty. RefillTwoParts moves the first part of SA i+1 to the first two parts of SA i. One full part in SA i+1 thus yields two full parts in SA i. As for emptying of parts and right shifts, no non-empty parts are overwritten.

3.2. Analysis

Theorem 1.
The LIFO queue is oblivious.
Proof.
According to Definition 1 we require that memory accesses are independent of the input. (They are allowed to depend on the number of operations.) None of the procedures in Algorithm 1 accesses a memory cell dependent on an input value, i.e., all loop conditions do not depend on the input, and any conditional access to memory cells of the form 'cell0 := a if cell1 = x else b' can be expressed using multiplications (Section 2).

We analyze push and pop operations with respect to time complexity (Theorem 2) and correctness (Theorem 3). In the worst case a single operation might take Ω(n), where n is the maximal length of the queue. We prove that on average, the time is only logarithmic in n.

Theorem 2.
For the LIFO queue a pop and push operation requires amortized O(log n) time, i.e., 28q · log(n/q) E-Ops and 8q + 2 C-Ops.
The proof uses that two parts of SA i of length 2^i are refilled (emptied) after every 2^i push (pop) operations. Since there are O(log n) SAs, we get time Σ_{i=0}^{O(log n)} 2^{i+1}/2^i = O(log n).

Proof. SA i is refilled (emptied) after every 2^i pop (push) operations. After refilling (emptying) all SAs from index s−2 down to 0 within 2^{s−1} pop (push) operations, we start over by considering SA 0 only. The average run-time increases up to the point where SA s−2 is refilled (emptied). Therefore, it suffices to compute the average number of operations for a sequence of 2^{s−1} pop (push) operations. We analyze pop operations by counting E-Ops followed by C-Ops. CmpExEr needs 7 E-Ops (2 additions, 1 subtraction, 4 multiplications). ShiftPartsLeft for SA i needs 2^i · (q−1) CmpExEr operations. RefillTwoParts on SA i performs one shift in SA i+1 and moves 2^{i+1} elements, i.e., it needs 2^{i+1} · (q−1) + 2^{i+1} = 2 · 2^i · q CmpExEr operations. Since RefillTwoParts on SA i is called after every 2^{i−1} pops, on average refilling of SA i contributes 7 · 2 · 2^i · q / 2^{i−1} = 28q E-Ops. By definition we have n = Σ_{i=0}^{s−1} |S_i| = Σ_{i=0}^{s−1} q · 2^i = q · (2^s − 1), yielding s = log(n/q) + 1. Summing over all SAs gives

Σ_{i=0}^{s−2} 28q = 28q · (s−1) = 28q · log(n/q).

The analysis of C-Ops is analogous. CmpExEr contains zero comparisons. In ShiftPartsLeft we perform one comparison (line 1) and one in each of the q−1 iterations. A refill of SA i takes 2q comparisons (q to compute isEmpty in RefillTwoParts (line 1) and q within ShiftPartsLeft). Therefore, the number of comparisons becomes Σ_{i=0}^{s−2} 2q/2^{i−1} ≤ 8q. Adding two C-Ops due to lines 1-2 in Algorithm Pop completes the proof for pop. The push operation is analyzed in the same manner.

Lemma 1.
Each part P_{i,j} can only be in one of two states: empty (all elements being ∅) or full (no elements being ∅). This follows since we modify either all or none of the elements of a part.
Proof.
Initially, all parts are empty. Parts of the first SA can only be full or empty, since they contain at most one element. Parts of SA i > 0 change only when parts are merged into or split out of them. If all parts of SA i are full, the last two parts of the SA, i.e., P_{i,q−2} and P_{i,q−1}, each of size 2^i, are shifted to the next SA, i.e., they become the first part P_{i+1,0} of size 2^{i+1}. This part is filled completely. A filled part in SA i (see procedure RefillTwoParts), split into two parts of the same size, yields two full parts in SA i−1.

Theorem 3.
The LIFO queue works correctly.
We show that no elements are overwritten and no empty elements are returned if the array is non-empty, since we refill and empty parts of SAs sufficiently often.

Proof. In Algorithm 1 no parts are overwritten if the first element of a part is non-empty – see the definition and usage of the variables fullAndDoOp, emptyAndDoOp in ShiftPartsLeft/Right; isFull, isEmpty in Empty/RefillTwoParts; and line 2 of Push with E_{0,0,0} ≠ ∅. Since all elements of a part are either the empty element or differ from it (Lemma 1), checking the first element suffices to avoid overwriting of non-empty parts.

We first show that there is no interleaving of empty and non-empty SAs. Let the t-th SA be the largest SA such that at least one part is full. All SAs i < t contain at least one non-empty part. An arbitrary sequence of pushes cannot reduce the number of full parts in a SA below two. This follows since we only empty two parts of a SA if all four parts are full. An arbitrary sequence of pops cannot completely empty a SA except the last, since SA i, being of size q · 2^i, is refilled with elements from SA i+1 after 2^i pops (see MoveBetweenSAs in Algorithm 1).

Next we show that there is no interleaving of SAs with some non-empty parts and SAs with only full parts. EmptyTwoParts executes on SA i before it executes on SA j < i. Upon execution there are two possibilities: either no or two parts are moved to SA i+1. In the first case at most 3 parts are full in SA i and thus we could insert one more part; in the second case the SA is full and two parts are emptied. Either way, it suffices to empty SA i after two parts in SA i−1 have been filled. Since two parts of SA i−1 hold 2^i elements, our choice of calling EmptyTwoParts on SA i after every 2^i operations suffices (see MoveBetweenSAs in Algorithm 1). Therefore, not all parts of a SA can be full, if there is space in a larger SA. For refilling parts an analogous argument applies.
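The deterministic calling pattern used throughout this section can be isolated into a small helper (a sketch; the function name is our own). It returns the largest SA index on which EmptyTwoParts (for pushes) or RefillTwoParts (for pops) runs after a given operation, so SA i is restructured after every 2^i operations regardless of the queue's contents:

```python
def restructure_depth(n_ops, s):
    """Largest SA index restructured after the (n_ops+1)-th operation,
    for a queue with s subarrays; SAs with smaller index are handled too."""
    mi = 0
    while (n_ops + 1) % 2 ** (mi + 1) == 0:
        mi += 1
    return min(mi, s - 2)

depths = [restructure_depth(t, s=5) for t in range(8)]
# ruler-like pattern: SA 0 every op, SA 1 every 2nd op, SA 2 every 4th op, ...
```

The resulting "ruler" sequence depends only on the operation counter, which is exactly what obliviousness demands.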
4. FIFO
A First-In-First-Out (FIFO) queue needs fast access to the first and the last element. We use an array of LIFO queue variants of increasing lengths, i.e., each SA of the FIFO queue is itself a LIFO queue. Each LIFO queue stores elements in 'reverse' order, meaning the first element to be popped from the LIFO queue is the oldest element the LIFO queue contains. In this way we can efficiently access the oldest element of each LIFO queue. The array structure is visualized in Figure 3. Each LIFO queue matches one SA.

For a pop operation the LIFO queue with the largest index that is non-empty is identified. Then an element is popped from that queue. To make the algorithm oblivious we execute a pop operation on every LIFO queue within the FIFO queue. We start from the back and pop an element from each LIFO queue, i.e., SA, until the first non-empty LIFO queue has been identified. For the remaining queues we execute pops using a parameter to indicate that, in fact, no element should be popped. The key point is that independent of the value of the parameter the same memory cells are accessed.

PopperQueue: A LIFO queue offers more functionality than is needed, since we do not push elements at the front but only pop them, except for the first queue, which is just a single element. Opposed to a LIFO queue, we can therefore refill a SA completely. We reduce the number of parts from four to two. Using more parts per SA is slower, since we must shift the same elements multiple times rather than moving them less often in bigger chunks, i.e., larger SAs. We can reuse most LIFO procedures (Algorithm 1) without modification, i.e., ShiftPartsLeft, RefillTwoParts and Pop. We call this LIFO variant "PopperQueue". It is a special case of the LIFO queue from Section 3. It has the same (asymptotic) properties, but it is roughly a factor of two faster, since it uses fewer parts and therefore requires fewer shifts within a SA, i.e., compare Theorem 2 for q = 2 and q = 4.

Due to the more involved array organization of a FIFO queue, the emptying and refilling of parts needs careful attention. It is not possible to concatenate two parts to get a larger part without extra processing, i.e., two arrays (of PopperQueues) placed after each other generally do not yield an array representing a larger PopperQueue with a valid structure. The concatenation could give partially filled parts. For example, concatenating two queues with one SA and two parts, e.g., [1|−] and [−|−], yields a structure containing a partially filled part. Furthermore, we have to ensure a correct ordering of the elements within LIFO queues when moving elements between them.

If the last part of the PopperQueue stored in SA i is full, we move 2^i elements from queue i to the very last part of queue i+1. We pop one element after the other from queue i and put it directly into the last part of queue i+1; the j-th pop is put at the j-th position of the last part. At this point the whole queue i+1 (except the last part that was just inserted) might be empty, which would cause subsequent pops on queue i+1 to return empty elements even though queue i+1 is non-empty; therefore the inserted elements are shifted towards the front of queue i+1.
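The back-to-front scan of the FIFO pop can be illustrated with plain Python lists standing in for the oblivious PopperQueues (only the flag logic is shown, not the oblivious access pattern; the names are ours). Each inner list keeps its oldest element last:

```python
EMPTY = 0  # assumed encoding of ∅

def fifo_pop(queues):
    """Pop from the non-empty queue with the largest index; every queue is
    visited and receives a (possibly dummy) pop request."""
    result, done = EMPTY, 0
    for q in reversed(queues):                 # scan from the back
        do_pop = 1 if (q and not done) else 0  # a real pop happens only once
        val = q.pop() if do_pop else EMPTY     # oblivious version: Pop(do_pop)
        result = do_pop * val + (1 - do_pop) * result
        done = done or do_pop
    return result

qs = [[6], [5, 4], [3, 2, 1]]  # the oldest element of each list is last
```

Popping repeatedly from `qs` returns 1, 2, 3, 4, 5, 6, i.e., global FIFO order, even though each individual structure behaves like a stack.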
FIFO Queue F with 3 LIFO queues A(m); each LIFO queue A(m) consists of subarrays S(i) with 2 parts P(i,j), each part holding 2^i elements: S(i) = P(i,0) | P(i,1), P(i,j) = E(i,j,0) ... E(i,j,2^i-1).
[The accompanying traces of elements pushed to and popped from the FIFO queue are not recoverable from the source.]
Figure 3: Sequence of pushes and pops onto a FIFO queue

The push operation for the FIFO queue appends elements to the end of the very first LIFO queue. Since it is of length two, we shift its second element to the left and then set the second position to the newly inserted element.
Corollary 1.
For the FIFO queue a pop operation requires O(log^2 n) and a push O(log n) time on average.

Proof. For a pop of the FIFO queue we do a pop for each of the PopperQueues, giving Σ_{i=0}^{s−1} O(i) = O(s^2) = O(log^2 n). For a push we move blocks of size 2^i from SA i, i.e., PopperQueue i, to SA i+1 after 2^i operations, which needs time linear in the queue length. Summation gives Σ_{i=0}^{s−1} O(2^i/2^i) = O(s) = O(log n).

FIFO queues are often used as buffers to distribute peak loads across a longer timespan. Commonly, a producer pushes elements onto the queue continuously (as a stream), while a consumer repeatedly takes an element and processes it. Buffering always introduces some delay in processing. Thus, usually an additional delay is tolerable. A pop on the fast FIFO queue only returns an element given the queue has been filled partially, i.e., it is at least half full. This yields a FIFO queue that has only amortized O(log n) overhead rather than O(log^2 n). The idea is to use two queues "back to back": one for popping and one for pushing. The two queues share the last part, i.e., both treat this part as belonging to them. Thus, elements are pushed onto one of the queues and are continuously shifted to the right with newly inserted elements until they reach the queue for popping. A pop only returns an element after the last part of its last SA (shared with the pushing queue) has been filled. The same ideas also apply to double-ended queues. For the Fast FIFO Queue (B2B Queue) the time complexity of a push and pop matches the corresponding operations for the LIFO queue.
Corollary 2.
For the B2B-FIFO queue a pop and a push operation require amortized O(log n) time.
5. Double-Ended Queue
A double-ended queue supports popping elements at the head and tail as well as prepending elements at the beginning and appending them at the end. We combine ideas for LIFO and FIFO queues. We use an array of queues (as for FIFO queues) to address the need to push elements to the head of the array and pop them from the tail. Since elements can also be pushed at the back, we use LIFO queues, ie. SA_i of the double-ended queue is given by a LIFO queue with i + 1 SAs. We push an element onto a queue if it is non-full, otherwise we push it onto the next queue. Popping elements from the front might trigger refilling of SAs. In turn, we have to move the newest elements of one SA to another. Identifying the newest elements of a LIFO queue (with elements sorted by age, ie. ascending insertion order) is cumbersome, since there is only efficient access to the oldest element. To reverse the order, we remove all elements from the array (using a sequence of pops) and insert them into a temporary LIFO queue. This yields a queue sorted from newest to oldest elements. Then we move elements by popping them from the temporary queue onto the queue to refill, ie. for queue i we move 2^{i+1} elements. The remaining elements are pushed back onto the emptied queue (used to create the temporary LIFO queue).

Theorem 4.
Any operation on the double-ended queue has amortized time O (log n ) . Operations are similar to the LIFO queue, except for refilling and emptying that needs an additional logarithmic factordue to the popping and pushing of elements rather than direct access in O(1).
Proof.
Pushing and popping to the front works the same as for LIFO queues except for the refilling and emptying of full SAs. We require an additional logarithmic factor, since we cannot just copy the elements of one SA, ie. queue, to another, but first pop them from the LIFO queue onto a temporary queue. More precisely, each element access using a pop requires amortized O(log n) as shown in Theorem 2 rather than O(1). Pushing and popping to the back requires executing a constant number of push and pop operations for all parts constituting LIFO queues. Since we have O(log n) queues and each operation on a LIFO queue requires O(log n) (see Theorem 2), a single push and pop operation requires O(log² n).

6. Double-Ended Priority Queue

In this scenario, each data item has a priority. A double-ended priority queue can return either the data element with the smallest or the largest priority. The queue structure is the same as for the double-ended queue. We ensure that each
SA, ie. LIFO queue, contains elements sorted in descending priority. When moving elements from one queue to another, ie. to empty a full queue or refill a queue, we first create one single sorted array containing all elements from both queues and then refill the smaller queue up to half of its capacity with the elements of smallest priority and put the other elements in the larger queue. The sorting can be done by merging both arrays.

Popping the element of minimum priority requires finding the smallest element in SA_0. Popping the element of maximum priority requires checking all parts, since we do not know which parts contain elements and which do not, nor which part contains the element with the largest priority. More precisely, we first peek at all parts and find the element and part with the maximum element. After that we perform a pop on the (first) queue containing the maximum element. This is done by executing a pop for all parts. The parameter of the pop operation, determining whether the operation indeed removes an element from the queue, must be set accordingly, ie. it is ∅ for all but the queue containing the maximum element.

The restructuring is somewhat more involved. Upon a push that requires restructuring, eg. either refilling or emptying queue i, we first create one sorted array in increasing order by merging both queues as done for ordinary mergesort (see also Section 7). We then refill SA_i until it is half full with the smallest elements (in reversed order) and insert the remaining ones into the next SA (in reversed order).

Theorem 5.
Any operation on the double-ended priority queue has amortized time O(log² n).

Proof. We discuss time followed by correctness. Pushing an element to the front (or popping the element of minimum priority) works the same as for LIFO queues except for the emptying and refilling of full SAs. We require an additional logarithmic factor to move elements from queue i to queue i + 1, since each element access using a pop requires amortized O(log n) as shown in Theorem 2 rather than O(1). Moving the elements from the temporary queue onto the (new) queues i and i + 1 also requires amortized O(log n) time per element. Popping the maximum priority element requires executing a pop operation for all LIFO queues (plus restructuring). Since we have O(log n) queues and each operation on a LIFO queue requires O(log n) (see Theorem 2), this requires O(log² n).

Correctness of a pop of maximum priority follows, since we maintain all queues in descending order of priority. Thus, the element of maximum priority is the first element in one of the queues. Since we consider the first elements of all queues and return the one of maximum priority, correctness follows. For the minimum we only investigate the first queue. Since upon every restructuring operation on queue i we keep the smallest half of both queues i and i + 1 in queue i, it holds that after a restructuring all elements in SA_i are smaller than any element in SA_j with j > i. Using induction, we have that the smallest element is in SA_0.
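Assuming ascending priority lists as the in-memory picture of the paper's sorted queues, the restructuring step can be pictured with a small non-oblivious Python sketch (the helper name and the explicit capacity parameter are ours): merge two sorted lists with the usual two-pointer scan, refill the smaller queue with the smallest priorities up to half its capacity, and hand the rest to the larger queue.

```python
def merge_split(q_small, q_large, cap_small):
    """Merge two ascending priority lists, then refill q_small up to half
    its capacity cap_small with the smallest priorities; the remaining
    elements go to q_large. Non-oblivious sketch of the restructuring."""
    merged, i, j = [], 0, 0
    # Standard two-pointer merge of two ascending lists.
    while i < len(q_small) and j < len(q_large):
        if q_small[i] <= q_large[j]:
            merged.append(q_small[i]); i += 1
        else:
            merged.append(q_large[j]); j += 1
    merged += q_small[i:] + q_large[j:]
    half = cap_small // 2  # refill the smaller queue to half capacity
    return merged[:half], merged[half:]

small, large = merge_split([1, 4, 6], [2, 3, 9], 4)
assert small == [1, 2] and large == [3, 4, 6, 9]
```

After the split every element handed back to the smaller queue is at most every element of the larger one, which is exactly the invariant the minimum pop relies on.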
7. Oblivious Mergesort
Our oblivious mergesort algorithm (O-Mergesort) divides an unsorted array (or list) into SAs of one element. It repeatedly merges two sorted arrays of equal length to obtain a new sorted array of double the length until only one array remains. This array is sorted. Making the sorting procedure oblivious requires a queue that supports a conditional pop, ie. we pop the element of the array only if it is smaller than another element. For short arrays (of length 1), we use a naive sort. Otherwise, two PopperQueues are merged by repeatedly comparing the first element of each queue A and B and appending the smaller one to the result array C. Note that since A and B are sorted, the element put into C is the smallest element in both arrays. We pop an element from the array whose element we just appended to C (see Algorithm 2, O-Merge).

Theorem 6.
Sorting of an array of n elements requires at most 43 n log n C-Ops and a total of n · (3 + 33 log n + 28 log² n) E-Ops.

Algorithm 2 O-Merge
Input: Sorted PopperQueues A and B of length l
Output:
Merged LIFO queue C
if l = 1 then
  b := 1 if A[0] ≤ B[0] else 0
  C[0] := A[0] · b + B[0] · (1 − b)
  C[1] := B[0] · b + A[0] · (1 − b)
else
  eleA := A.pop(1) {returns smallest element in A}
  eleB := B.pop(1)
  for k = 0 to 2 · l − 2 do {set C[k] to the smallest element in A ∪ B and remove that element}
    b := 1 if eleA ≤ eleB else 0
    C[k] := eleA · b + (1 − b) · eleB
    eleA := A.pop(b) · b + (1 − b) · eleA
    eleB := B.pop(1 − b) · (1 − b) + b · eleB
  end for
  C[2 · l − 1] := eleA · b + (1 − b) · eleB
end if

Proof.
The merger of two arrays of size l each requires 4l pop operations, each requiring 18 comparisons using Theorem 2 with q = 2. Additionally, we need one more comparison per iteration. This gives a total of 85l S-Ops for merging two arrays. In total we get the bound Σ_{j=0}^{log n − 1} 2^{log n − j − 1} · 85 · 2^j ≤ 43 n log n S-Ops.

The naive sort of two arrays of size one, comparing the two elements, requires 5 E-Ops; in total there are n/2 such mergers of arrays of size 1. The merger of two arrays of size l > 1 requires 4l pop operations, each requiring 28(log(l/2) + 2) = 28(log l + 1) E-Ops, giving a total of 112 l (log l + 1). Additionally, we need 5 E-Ops for each of the 2l operations, giving a total of 10l E-Ops. Overall we get l · (122 + 112 log l). Summing across all levels yields

n/2 · 5 + Σ_{j=1}^{log n − 1} 2^{log n − j − 1} · 2^j · (122 + 112 j) ≤ n · (3 + 33 log n + 28 log² n).

The analysis uses that at level j we merge n/2^{j+1} pairs of arrays of length 2^j and Theorem 2 to bound the time to merge two arrays. We improve on [9] by a factor of more than 1000 in terms of the number of comparisons, ie. C-Ops. Comparisons are often used to analyze sorting algorithms, since typically the total number of operations involved is proportional to the number of comparisons. In our case, this does not necessarily hold, since we only require one comparison for shifting a large number of elements. Therefore, the costs for shifting might dominate the costs for comparisons. To ensure a fair and objective comparison among algorithms we also analyzed the number of other operations, ie. E-Ops, since they are the dominant factor in our algorithm. With respect to the total number of operations O-Mergesort is asymptotically worse by a factor log n. However, due to the extremely large constants used in the state-of-the-art [9] we use fewer operations for all practically relevant scenarios, ie. for arrays of length up to roughly 2^5600. For illustration, when sorting 10 billion elements we need more than 100x fewer E-Ops. Furthermore, E-Ops (or XORs, ANDs) are generally less complex than comparisons, therefore in practice the speed-up might be even larger.

(We have 28(log(l/2) + 2) E-Ops per pop using Theorem 2 with q = 2: for l = 2^x we only support mergers of arrays of length 2^y − 1, thus we need y = x + 1 = log(l/2) + 2.)

8. Oblivious Quicksort

Our oblivious quicksort algorithm (O-Quicksort) is a comparison-based divide-and-conquer algorithm. Small arrays of size at most 4 log n are sorted using O-Mergesort. Larger arrays are recursively split into two smaller (sub)arrays. An array is split using a pivot element. All elements less than or equal to the pivot are put in one array and all larger elements in the other array. Ideally, both arrays are of the same size. However, naive splitting likely leads to badly balanced arrays and an O(n²) run-time, since an oblivious algorithm must treat both parts as potentially large. However, when choosing the median as pivot, it is possible to ensure that both arrays are of equal size. We compute an approximate median of all elements (Section 8.1). Unfortunately, choosing an approximate median still leaves some uncertainty with respect to the exact array lengths after the splitting. Therefore, in the partition process (Section 8.1), rather than swapping elements within one array, we create two arrays of fixed length, one for elements larger than the pivot and one for all other elements. Since the length of each of the two arrays must be fixed using conservative upper bounds, their sum of lengths exceeds the length of the array to be split. To get a single sorted array requires a special reunification of both arrays (Section 8.2). For simplicity, we assume that all elements to be sorted are distinct. This assumption is removed in Section 8.3.

Algorithm RandomPivot chooses several elements uniformly at random, sorts them and picks the median. By choosing the median of a sufficiently large sample of elements we ensure that the chance of a split resulting in very unbalanced arrays is small. We pick a fixed number of samples n_p, sort them, eg. using the O-MergeSort algorithm, and then pick the middle element S_P[n_p/2] of the sorted samples as pivot.

Algorithm 3 RandomPivot
Input:
Array A of length l, number of samples n_p
Output: Pivot p
P := set of n_p elements of A chosen uniformly at random
S_P := sorted samples P {eg. using O-MergeSort}
p := S_P[n_p/2] {choose middle element (= median of the samples) as pivot}

For the partitioning, the entire array is split into two arrays, one with all elements being smaller than the pivot and one with all elements being larger. The two arrays are given by two LIFO queues. We push elements that are smaller than the approximated median onto one of the queues and the larger elements onto the other queue. We discuss the case of duplicates in Section 8.3.
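A minimal Python sketch of this partition step on plaintext integers (`EMPTY`, `select` and `oblivious_partition` are our own illustrative names): both output arrays get the fixed, conservative length of the input, every element causes one write to each of them, and an arithmetic selection b·x + (1 − b)·y decides which write carries the real value, so the access pattern is independent of the data. On encrypted data the comparison below would be a secure comparison.

```python
EMPTY = 0  # stands in for the special element ∅ (assuming positive data values)

def select(b, x, y):
    # Oblivious selection: x if b == 1 else y, computed without branching.
    return b * x + (1 - b) * y

def oblivious_partition(A, pivot):
    """Fixed-length partition: each element triggers a write to BOTH output
    arrays; the bit b decides which write is the real one."""
    smaller, larger = [], []
    for x in A:
        b = 1 if x <= pivot else 0
        smaller.append(select(b, x, EMPTY))     # real push iff b == 1
        larger.append(select(1 - b, x, EMPTY))  # real push iff b == 0
    return smaller, larger

assert oblivious_partition([5, 1, 7, 3], 4) == ([0, 1, 0, 3], [5, 0, 7, 0])
```

Note how the sum of the two output lengths is twice the input length, mirroring the conservative fixed-length arrays described above; the unused slots hold ∅.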
Theorem 7.
Algorithm RandomPivot returns a pivot p such that at least a fraction c_f = 1/4 · (1 + √(c(log n)/n_p)) ≥ 1/4 of the elements of an array of length n are larger than p and the same fraction is smaller than p, with probability 1 − 1/n^c for n_p ≥ 64 · c · log n.

We obtain tail estimates using carefully applied Chernoff bounds.

Theorem 8 (Chernoff Bound). The probability that the number X of occurred independent events X_i ∈ {0, 1}, ie. X := Σ_i X_i, is not in [(1 − c₁)E[X], (1 + c₂)E[X]] with c₁ ∈ ]0, 1] and c₂ ∈ ]0, 1[ can be bounded by p(X ≤ (1 − c₁)E[X] ∨ X ≥ (1 + c₂)E[X]) < e^{−E[X] · min(c₁, c₂)²/3}.

Proof of Theorem 7:
Proof.
Since c_f = 1/4 · (1 + √(c(log n)/n_p)), we have c_f ≥ 1/4. Inserting the smallest admissible value n_p = 64 · c · log n yields c_f = 1/4 · (1 + √(1/64)) = 9/32.

The theorem holds if the pivot does not stem from the c_f · n smallest or largest elements. If we pick fewer than c_f · n_p < n_p/2 samples S ⊆ A from the c_f · n smallest elements and fewer than c_f · n_p < n_p/2 samples L ⊆ A from the c_f · n largest elements, this is the case. The reason is that the pivot p is the element at position n_p/2 and therefore cannot lie among the c_f · n smallest or largest elements. We expect to pick c_f · n_p samples S out of the c_f · n smallest elements (and analogously for the largest), ie. E[|S|] = c_f · n_p. We seek the smallest factor f > 1 such that if the expectation is exceeded by factor f the pivot is not chosen correctly. We have f · c_f · n_p = n_p/2, ie. f = 1/(2 · c_f). The probability that the expectation is exceeded by a factor f > 1 is bounded using the Chernoff bound (see Theorem 8) by

prob(|S| > f · E[|S|]) < 1/2^{((f − 1)²/3) · c_f · n_p}
 ≤ 1/2^{((f − 1)²/12) · n_p}   (using c_f ≥ 1/4)
 = 1/2^{((1/(2 · c_f) − 1)²/12) · n_p}   (using f = 1/(2 · c_f))
 ≤ 1/2^{2 · c · log n} = 1/n^{2c}   (using n_p ≥ 64 · c · log n, which gives f − 1 = (1 − √(c log n/n_p))/(1 + √(c log n/n_p)) ≥ 7/9)

In the same manner we can compute prob(|L| > f · E[|L|]). Therefore the probability of both events becomes, for n sufficiently large:

prob(|L| ≤ f · E[|L|] ∧ |S| ≤ f · E[|S|]) ≥ 1 − (prob(|L| > f · E[|L|]) + prob(|S| > f · E[|S|])) ≥ 1 − 2/n^{2c} ≥ 1 − 1/n^c
(Figure content: two sorted subarrays, eg. 11 17 29 32 33 39 42 44 46 and 91 87 84 71 61 51 48, are combined into one sorted array; elements up to the guaranteed minimum length are copied from both ends first, the remaining elements step by step, with annotations for minimum, expected and maximum length.)

Figure 4: Merger of two subarrays within O-Quicksort
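The copying scheme of Figure 4 can be sketched in plain Python (the `reunify` helper and the use of None for ∅ are our simplifications; in the real algorithm the emptiness test is itself performed obliviously): the guaranteed min_len slots of B fill the final array from the left and those of C from the right, then the uncertain remainder is copied conditionally.

```python
def reunify(B, C, l, min_len):
    """B: elements <= pivot in ascending order, C: elements > pivot in
    descending order; both padded with None (standing in for ∅).
    Produce the single sorted array of length l."""
    A = [None] * l
    for k in range(min_len):
        A[k] = B[k]          # guaranteed prefix of B fills A from the left
        A[l - 1 - k] = C[k]  # guaranteed prefix of C fills A from the right
    for k in range(min_len, len(B)):
        if B[k] is not None:  # conditional copy of the uncertain remainder
            A[k] = B[k]
    for k in range(min_len, len(C)):
        if C[k] is not None:
            A[l - 1 - k] = C[k]
    return A

assert reunify([1, 2, 3, None], [9, 8, None, None], 5, 2) == [1, 2, 3, 8, 9]
```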
So far we have obtained well-balanced, but unsorted SAs. Since we do not have access to their exact lengths we use a conservative bound given by the analysis of the partition process. O-Quicksort recurses on two separate arrays stemming from the partitioning. We sort the array B of elements smaller than the pivot in ascending order and the array C of larger elements in descending order. At the end, both arrays must be merged to get a single final sorted array. This requires some care since we do not know their exact lengths. Due to the partitioning process we can bound the minimum length of B and C by l/2 · (1 − ǫ) with ǫ := √(64 · c · (log n)/l). We copy the elements (up to the guaranteed minimum length bound) to the final array, so that these elements appear sorted. This means we fill the final array with B from the left end towards the right and with C from the right end.

Algorithm 4 O-Quicksort
Input:
Array A of length l, sort ascending: asc
Output: Sorted array A
ǫ := √(64 · c · (log n)/l)
if l ≥ 4 log n then
  B, C := Partition(A)
  O-Quicksort(B, l/2 · (1 + ǫ), True)
  O-Quicksort(C, l/2 · (1 + ǫ), False)
  for k = 0 to l/2 · (1 − ǫ) do
    A[k] := B[k] {copy elements from B to A}
    A[l − 1 − k] := C[k] {copy elements from C to A}
  end for
  for k = l/2 · (1 − ǫ) to l/2 · (1 + ǫ) do
    if B[k] ≠ ∅ then A[k] := B[k] {copy elements from B to A}
    if C[k] ≠ ∅ then A[l − 1 − k] := C[k] {copy elements from C to A}
  end for
else
  A := sort A using O-MergeSort or another algorithm, ascending or descending depending on asc
end if

The entire process is illustrated in Figure 4. The remaining elements are handled in the same fashion, but before setting an array element in A to be an element from B or C, we check whether the element in A is still empty.

Theorem 9.
O-Quicksort needs O(n log n) C-Ops and E-Ops. It produces a correct sorting with probability 1 − 1/n^c for an arbitrary constant c. The recurrences are somewhat involved, since the sum of the lengths of both arrays used for recursion exceeds the original length of the array being split. We conduct a staged analysis to obtain (asymptotically tight) bounds.
Proof.
The complexity T(n) of O-Quicksort in terms of comparisons can be stated using a recurrence. For one call (ignoring recursion) on an array of length l > 4 log n the complexity is given by the partitioning of the array, being O(l), plus the reunification of both sorted arrays, ie. the copying of elements, being also O(l). Thus, we get a total of O(l) = c₀ · l for some constant c₀. We obtain the following recurrence for an array of length l using ǫ = √(a/l) with a := 64 · c · log n:

First call: T(l) = T(l/2 · (1 + √(a/l))) + c₀ · l
Second call: T(l/2 · (1 + √(a/l))) ≤ T(l/4 · (1 + √(a/l)) · (1 + √(a/(l/2)))) + c₀ · l/2 · (1 + √(a/l)) ≤ T(l/4 · (1 + √(a/(l/2)))²) + c₀ · l/2 · (1 + √(a/l))
Third call: T(l/4 · (1 + √(a/(l/2)))²) ≤ T(l/8 · (1 + √(a/(l/4)))³) + c₀ · l/4 · (1 + √(a/(l/4)))³
i-th call: T(l/2^i · (1 + √(a/(l/2^{i−1})))^i) ≤ T(l/2^{i+1} · (1 + √(a/(l/2^i)))^{i+1}) + c₀ · l/2^i · (1 + √(a/(l/2^i)))^{i+1}   (1)

Assume we start splitting the entire array A with l = n. The total number of operations (C-Ops and E-Ops) at recursion depth i is given by the additive term in Equation (1) multiplied by the number of calls to O-Quicksort, being 2^i, ie.

2^i · c₀ · n/2^i · (1 + √(a/(n/2^i)))^{i+1} = c₀ · n · (1 + √(a/(n/2^i)))^{i+1}.

The total number of operations for the first r := log n − 4 log log n recursions is given by:

Σ_{i=0}^{r−1} c₀ · n · (1 + √(a/(n/2^i)))^{i+1} ≤ c₀ · n · Σ_{i=0}^{r−1} (1 + √(a/log⁴ n))^{log n} ≤ c₁ · n · log n

After r recursions the size of the input sequences for the recursive calls is at most n/2^r · (1 + √(a/(n/2^{r−1})))^r ≤ 2 log⁴ n (for n sufficiently large). For another at most 6 log log n recursions on arrays of polylogarithmic length the same bound on the additive terms yields at most c₂ · n · log log n operations, until the size of the remaining arrays is 4 log n, using the same derivation as above.

To sort such an array using O-MergeSort requires O(log n · log log n) C-Ops and O(log n · (log log n)²) E-Ops (see Theorem 6). There are 2^{log n − log log n} = n/log n such arrays, giving a total of O(n log log n) C-Ops and O(n (log log n)²) E-Ops. To obtain a correctly sorted queue all executions of RandomPivot must be successful. We perform at most log n − log log n recursions. Thus, in total there are at most n calls to RandomPivot, each succeeding with probability at least 1 − 1/n^{c′} for an arbitrary constant c′. The probability that all succeed is at least (1 − 1/n^{c′})^n ≥ 1 − 1/n^{c′−1}. Choosing c′ = c + 1 completes the proof.

So far we focused on arrays of distinct elements. For non-distinct elements our algorithm can fail to compute balanced arrays in case the chosen median is not unique. In the most extreme case all elements are the same and the split would result in one empty array and one array being the same as the array to be split. Elements can always be made distinct by appending a unique number at the end, eg. by appending a counter. Alternatively, we can distribute the elements equal to the pivot p to both arrays such that their lengths remain balanced. In a first phase we create two arrays B, C and maintain a counter l_p for the elements equal to p by distinguishing three cases for an element x that is compared to the pivot p, ie. x < p, x > p and x = p. In the first case, we assign x to array B and increment the length counter l_B. In the second case we assign x to C and increment the length counter l_C of C. In the third case, we increment just the counter l_p. In the second phase we distribute the l_p copies of p to the arrays B and C such that their difference in length is as small as possible. We perform l iterations, where l is the number of elements in the array to be partitioned, ie. A. In each iteration we subtract one from l_p. If l_p is zero the arrays remain the same.
Otherwise, if the length l_B is less than l_C, we append a copy of the pivot to B and increment the length counter l_B; otherwise we do the same for C. The (asymptotic) complexity remains the same.
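The two-phase scheme reads as follows in plain, non-oblivious Python (the function name is ours; in the oblivious version the branches become conditional pushes and the counters are maintained arithmetically):

```python
def partition_with_duplicates(A, p):
    """Phase 1: three-way comparison distributes elements < p and > p and
    merely counts the copies of p. Phase 2: hand the counted copies to
    whichever side is shorter, keeping the lengths balanced."""
    B, C, l_p = [], [], 0
    for x in A:  # phase 1
        if x < p:
            B.append(x)
        elif x > p:
            C.append(x)
        else:
            l_p += 1
    for _ in range(l_p):  # phase 2: balance the lengths
        if len(B) < len(C):
            B.append(p)
        else:
            C.append(p)
    return B, C

B, C = partition_with_duplicates([4, 4, 4, 4, 1, 9], 4)
assert abs(len(B) - len(C)) <= 1  # balanced even though the median repeats
```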
9. Applications
Confidential expressions, hiding the data as well as the operations on the data, are a rather straightforward application of our LIFO queue with conditional operations together with basic operations, eg. addition and multiplication, from secure computing schemes such as fully homomorphic encryption (FHE) or secure multi-party computation (MPC). We first discuss the evaluation of non-confidential expressions. For brevity we only discuss the evaluation of expressions involving numbers, additions and multiplications. We focus on evaluating postfix expressions. When using encrypted values, the expression remains confidential. That is to say, despite computing the expression we do not learn anything about the expression except its length. The key to achieving this is the conditional push, ie. we execute a push operation but it only has an impact given that the element to be pushed is different from the special element ∅, which is not appended to the stack. Algorithm 5 requires linear run-time in terms of the number of elements in the expression (or, more precisely, in the bound we get for the length of the expression).

To evaluate confidential expressions, all array elements of the input A must be encrypted as well as all variables that depend on the array elements. Variables that indicate array lengths or the number of operations do not have to be encrypted. (In Algorithm 5 the values of n and s do not have to be encrypted. In the LIFO queue and its sub-procedures, n_pu, n_po, q, o, mi and s remain unencrypted. All other variables are encrypted.) To this end, one can use any of the known schemes for computing on encrypted data such as FHE or MPC, eg. [5] or [19]. These schemes provide basic operations such as addition and multiplication that allow constructing other operations such as comparisons, subtractions and more. However, certain operations like accessing an array element using an encrypted index might incur linear overhead in the length of the array. In our case, we keep the asymptotic running time, since we only have to directly substitute addition, subtraction, multiplication and comparison operations.

Algorithm 5 Case Study: Postfix expressions
Input: LIFO queue A of postfix symbols of length at most n
Output: Result of evaluation
st := LIFO(s) {choose the number of SAs s such that the queue can hold at least n elements}
for i := 0 to n − 1 do
  symb := A.pop(1)
  toPush := symb if symb is a number else ∅
  st.push(toPush)
  isAdd := 1 if symb = "+" else 0
  resAdd := st.pop(isAdd) + st.pop(isAdd)
  isMul := 1 if symb = "∗" else 0
  resMul := st.pop(isMul) · st.pop(isMul)
  toPush := isMul · resMul + (1 − isMul) · ∅
  toPush := isAdd · resAdd + (1 − isAdd) · toPush
  st.push(toPush)
end for
return st.pop(1)

The Stock Span Problem is motivated by financial analysis of stocks. The span of a stock's price on day d is the maximum number of consecutive days until day d where the price of the stock has been at most its price on d. The well-known textbook solution is given in Algorithm 6, taking linear time in the number of prices n. A straightforward oblivious solution gives a quadratic run-time algorithm due to the nested loops, ie. due to the worst-case behavior of the inner loop. This renders the solution impractical for larger datasets. The total number of iterations of the inner loop (summed across all outer loop iterations) is only n. However, a single iteration of the inner loop could perform all n iterations for some inputs. To ensure obliviousness we would have to execute (asymptotically) n iterations of the inner loop for every execution of the outer loop. Furthermore, the code contains direct array accesses, eg. price[i]. In the obvious manner, this would also incur linear run-time overhead. However, it is possible to transform the nested loop by essentially changing the inner loop into an 'if'-conditional without changing the number of iterations of the outer loop.
Then we make the loop oblivious using a conditional if-then-else expression. Essentially, in Algorithm 6 we replace the while and do keywords in line 5 by an if and then; lines 6 to 8 form the else part. We only show the final pseudocode after the translation of the 'if' into oblivious code in Algorithm 7. Since we must execute both branches of the if to keep the condition confidential, the algorithm requires that we can execute the pop operation without impacting the data structure, ie. without actually performing a pop. This is supported by our data structure by using a special element in case the condition evaluates to true. The algorithm uses a peek operation, which returns the first element without removing it. It can be implemented using a combination of pop and push operations, eg. x := pop(1), push(x).
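The mechanics shared by Algorithms 5 and 7, executing every operation in every iteration and letting selection bits pick the real result, can be sketched on plaintext values (names are ours; the plain Python list branches where the oblivious LIFO queue would absorb ∅ without revealing anything):

```python
def eval_postfix_oblivious(tokens):
    """Evaluate a postfix expression of ints, '+' and '*'. Every iteration
    runs the same sequence of conditional pops, an addition and a
    multiplication; selection bits pick the real result."""
    st = []

    def cond_pop(bit):
        # With bit == 0 the pop must leave the queue unchanged and yield a
        # neutral value; the oblivious LIFO hides which case occurred.
        return st.pop() if bit else 0

    for symb in tokens:
        is_add = 1 if symb == "+" else 0
        is_mul = 1 if symb == "*" else 0
        is_num = 1 - is_add - is_mul
        if is_num:  # conditional push; the oblivious queue absorbs ∅ instead
            st.append(symb)
        res_add = cond_pop(is_add) + cond_pop(is_add)
        res_mul = cond_pop(is_mul) * cond_pop(is_mul)
        res = is_add * res_add + is_mul * res_mul
        if is_add or is_mul:  # conditional push of the computed result
            st.append(res)
    return st.pop()

assert eval_postfix_oblivious([2, 3, "+", 4, "*"]) == 20  # (2 + 3) * 4
```

On encrypted data the equality tests against "+" and "*" become secure comparisons and the selection bits stay encrypted, so an observer learns only the length of the expression.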
10. Related Work
In 2009 Gentry [5] introduced a fully homomorphic encryption (FHE) scheme based on lattices. Since then the field of computing on encrypted data (and circuits) has evolved rapidly, as summarized in [15]. All approaches for FHE are based on either addition and multiplication or XOR and AND. Secure computations can also be carried out using multiple parties, such that no party learns a secret (as long as it does not collude with other parties). Secure multi-party computation was introduced three decades ago [24, 7] and is still subject to extensive research, eg. [19].

Algorithm 6 Case Study: Stock Span
Input: LIFO queue price of prices of length at most n
Output: LIFO S with spans (in reverse order)
st := LIFO(s) {choose the number of SAs s such that the queue can hold at least n elements}
st.push(0)
S.push(1) {span of first element is 1}
for i = 1 to n − 1 do
  while st.peek() ≠ ∅ ∧ price[st[0]] ≤ price[i] do
    st.pop()
  end while
  span := i + 1 if st.peek() = ∅ else i − st[0]
  S.push(span)
  st.push(i)
end for

Algorithm 7 Case Study: Oblivious Stock Span
Input: LIFO queue price of prices of length at most n
Output: LIFO S with spans (in reverse order)
st := LIFO(s) {choose the number of SAs s such that the queue can hold at least n elements}
pi := price.pop(1)
st.push((0, pi))
S.push(1) {span of first element is 1}
i := 1
for k := 1 to 2 · n − 1 do
  (sti, stp) := st.pop(1)
  popNext := 1 if sti ≠ ∅ ∧ stp ≤ pi else 0
  pi := price.pop(1 − popNext) · (1 − popNext) + popNext · pi
  i := i + (1 − popNext)
  span := i + 1 if sti = ∅ else i − sti
  span := ∅ if popNext else span
  S.push(span)
  pushi := (i, pi) if (1 − popNext) else (∅, ∅)
  st.push(pushi)
end for

Both
A single access to a parent node in the position map returns pointers to multiple children. Theyhide whether the operation is a read or write to a memory cell. However, assuming one knows that a write mustoccur in a function, one knows that some memory cell is modified. We do not use traditional ORAM techniques. Furthermore, in our scenario, knowing that a certain operation is performed, ie. a pop or push, still gives no hintwhether the data structure was modified or not.Other work designed oblivious data structures particularly for SMC, eg. [12, 21]. The work [12] uses ORAMstructures and secret sharing among parties to achieve obliviousness. In contrast, [21] presents a deterministic schemefor priority queues using the bucket heap concept for priority queues [3] coming with O(log n ) overhead. Bucket heaps partition memory into bu ff ers of size 2 i + and signal blocks of size 2 i [3]. Buckets store actual elements,whereas bu ff ers store overflowing elements. Once a bu ff er is full it is merged into a bucket. [21] adjusted this settingto use blocks of equal size. Our queue shares the common idea of organizing data in blocks of increasing size that isalso found in other work, eg. [14]. We di ff er from prior work [14, 21, 3] in several aspects, eg. we perform a morefine-grained partitioning using multiple blocks, eg. in the view of [21] we introduce one more level of partitioning for bu ff er and data blocks. We have also come up with a deterministic oblivious sequence of restructuring operationsto handle empty and full blocks rather than counting the number of elements in the queue, eg. [21]. In contrast toour work, prior work also does not hide the impact of an operation (ie. they do not hide the number of elements in abucket), which is essential for securing control structures. Our fast B2B-FIFO queue introduces novel ideas such asblock sharing not found in prior work. The paper [1] shows how to compute the k -th ranked element for SMC. 
The paper [13] discusses median computa-tion. Such operations might prove valuable also for sorting, eg. selecting the median element for quicksort. However,both protocols [1, 13] disclose the outcome of secure comparisons, which might require non-desirable client interac-tion and is not considered secure. The SMC protocol for sorting in [25] runs in constant rounds but needs to knowthe product of the range of numbers R and it has communication and computational complexity that is proportional to product of the range of numbers times the number of elements, ie. O ( n · R ) (an improved version has O ( n )). Toachieve constant rounds it relies on the evaluation of unbounded fan-in gates.Sorting networks are naturally oblivious, since they use a fixed sequence of comparisons among elements in anarray that is not related to the data stored in the array. They have been used for secure sorting in [11, 9]. The work [9]is based on a network with 19600 · n log n comparators. A comparator can be implemented by a comparison yielding a bit b followed by an exchange of two elements A , B , ie. A : = b · B + (1 − b ) · A and B : = b · A + (1 − b ) · B .Therefore a comparator needs 7 E-Ops in addition to the comparison, yielding 156800 · n log n operations. Thoughthis is asymptotically optimal, it is of little practical relevance due to the number of comparators needed. Additionally,the depth (of the sorting network) is of order n log n , which makes it non-parallelizable. Our algorithms improve onit for all relevant scenarios (see Section 7 for a detailed comparison). The oblivious randomized Shellshort algorithm [8] is asymptotically optimal in terms of the number of comparisons using several techniques such as permutation ofthe array as well as shaker and brick passes of the array.Oblivious algorithms for geometric problems are presented in [4]. Algorithms for graphs incurring overhead up tolinear factor (in the number of nodes) are given in [2]. 
Other work [18] based on ORAM designed data structures for maps. They allow for history independence, ie. different sequences of operations lead to indistinguishable (memory layouts of the physical) data structures.
11. Evaluation
We shed light on two aspects that are not immediate from the asymptotic analysis. First, while our oblivious data structures are more involved than a naive oblivious implementation traversing the entire array for each operation, we have shown that asymptotically they outperform the naive implementation. The key question is whether our oblivious queues already outperform it for queues of small capacity or only for those with large capacity. Therefore, we compared our implementation against a simple 'linear' oblivious queue that accesses all elements (that could be stored) for each operation; its run-time is thus linear in the capacity. Second, how much slower are our queues compared to a non-oblivious queue? We have shown that the asymptotic factors are of order O(log n) and O(log² n), depending on the queue type. Here, we give more precise insights.

We implemented the benchmarks in Python. The evaluation was run on a machine equipped with an Intel 2.5 GHz quad-core CPU with 8 GB RAM on Windows 7. For the non-oblivious queue we ran 1 million operations. For the oblivious linear queue, ie. the naive oblivious queue traversing the entire array for each operation, we attempted to run 1 million operations, but stopped after one hour if the computation was still ongoing and estimated the time it would take to compute 1 million operations. For the oblivious data structures we executed a number of operations that is a multiple of the capacity (at least 100000), since the maximal run-time is achieved when executing a multiple of the capacity. Each operation was chosen randomly between a push and a pop operation. Due to the obliviousness it does not matter what parameters we use for the push and pop operations.
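The 'linear' baseline can be sketched as follows (a simplified plain-Python model of a naive oblivious LIFO; the class and function names are ours, and in a real FHE or SMC setting the flag is_push and the stored values would be encrypted). Every operation rewrites all capacity cells and takes a flag selecting push or pop, so the memory access pattern reveals neither the operation type nor the fill level:

```python
def oselect(b, x, y):
    """Data-oblivious selection: returns x if bit b == 1 else y,
    using only arithmetic so no branch depends on secret data."""
    return b * x + (1 - b) * y

class LinearObliviousStack:
    """Naive 'linear' oblivious LIFO: every operation rewrites all
    capacity cells, so the run-time is linear in the capacity."""

    def __init__(self, capacity):
        self.cap = capacity
        self.cells = [0] * capacity  # top of the stack lives at index 0

    def operate(self, is_push, value):
        """One combined operation; the secret bit is_push selects
        push (1) or pop (0). Branching on the public index i is
        harmless; only data-dependent branching would leak."""
        result = self.cells[0]  # meaningful only for a pop; always read
        new_cells = [0] * self.cap
        for i in range(self.cap):
            # push: shift everything right and place value at slot 0
            shifted_right = value if i == 0 else self.cells[i - 1]
            # pop: shift everything left, vacating the last slot
            shifted_left = self.cells[i + 1] if i + 1 < self.cap else 0
            new_cells[i] = oselect(is_push, shifted_right, shifted_left)
        self.cells = new_cells
        return result
```

A push thus costs as much as a pop, and both cost O(capacity), which is exactly the behavior of the linear baseline in the benchmarks above.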
[Figure 5: four panels plotting time [s] against queue capacity. (a) FIFO Queues: ObliviousFIFO, ObliviousFastFIFO, ObliviousLinearQueue, NonObliviousQueue. (b) LIFO Queues: ObliviousLIFO, ObliviousLinearQueue, NonObliviousQueue. (c) Double-Ended Queues: ObliviousDoubleEndedQueue, ObliviousLinearQueue, NonObliviousQueue. (d) Double-Ended Priority Queues: ObliviousDoubleEndedPriorityQueue, ObliviousLinearQueue, NonObliviousQueue.]
Figure 5: Running time results for 1 million operations for our oblivious queues compared to linear oblivious and non-oblivious queues

The plots in Figures 5a, 5b, 5c and 5d show the run-times comparing all queue variants for increasing maximum queue sizes. Qualitatively, all queues behave as predicted by the asymptotic analysis. For small queue sizes (LIFO and FastFIFO up to about 60, FIFO up to about 500) a simple linear oblivious queue has an edge over our more complex queues. For double-ended queues performance is somewhat worse, but simple linear queues are also outperformed for moderate queue sizes. With growing queue sizes the exponential gap between the linear oblivious queue and our implementations becomes clearly visible. The LIFO and FastFIFO queues are more than 100x faster for queues of capacity about 10000. For FIFO queues we reach the boundary of our measurement range. For the non-oblivious queue (a wrapper), which uses memory proportional to the actual array size, the asymptotic behavior is well visible. Our LIFO and FastFIFO queues both have an asymptotic overhead of log n compared to non-oblivious queues that directly access queue elements. This results in close to parallel lines in Figures 5a and 5b. The overhead is roughly a factor of 40 for queues of size 10000. For FIFO queues the asymptotic overhead is larger, ie. log² n. The overhead is a factor of 200 for arrays of the same size. In light of the overhead that typically comes with secure computation, eg. FHE or SMC, which can reach more than 5-6 orders of magnitude [16], our overhead is very modest. We also want to emphasize that, to the best of our knowledge, no prior work has compared against a non-oblivious implementation. We believe this is a very important benchmark.
12. Conclusions
We have presented oblivious queues, accompanied by a theoretical and practical investigation, having only (poly)logarithmic overhead. Since queues are an essential part of everyday programming, we believe that they will play a major role in enabling computation on encrypted data, in particular with a focus on expression hiding. Still, many more common data structures and operations have to be realized efficiently before any of the existing technologies such as FHE and MPC become practical for a large range of applications.

References

[1] G. Aggarwal, N. Mishra, and B. Pinkas. Secure computation of the median (and other elements of specified ranks). Journal of Cryptology, 23(3):373-401, 2010.
[2] M. Blanton, A. Steele, and M. Alisagari. Data-oblivious graph algorithms for secure computation and outsourcing. In Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pages 207-218. ACM, 2013.
[3] G. S. Brodal, R. Fagerberg, U. Meyer, and N. Zeh. Cache-oblivious data structures and algorithms for undirected breadth-first search and shortest paths. In Algorithm Theory - SWAT 2004, pages 480-492, 2004.
[4] D. Eppstein, M. T. Goodrich, and R. Tamassia. Privacy-preserving data-oblivious geometric algorithms for geographic data. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 13-22. ACM, 2010.
[5] C. Gentry. Fully homomorphic encryption using ideal lattices. In Proc. 41st ACM Symposium on Theory of Computing, pages 169-178, 2009.
[6] O. Goldreich. Towards a theory of software protection and simulation by oblivious RAMs. In Proc. of the 19th Symposium on Theory of Computing (STOC), pages 182-194, 1987.
[7] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. In Proc. of the 19th Symposium on Theory of Computing, pages 218-229, 1987.
[8] M. T. Goodrich. Randomized Shellsort: A simple oblivious sorting algorithm. In Proceedings of the 21st Symposium on Discrete Algorithms (SODA), pages 1262-1277, 2010.
[9] M. T. Goodrich. Zig-zag sort: A simple deterministic data-oblivious sorting algorithm running in O(n log n) time. In Proc. of the 46th Symposium on Theory of Computing, pages 684-693, 2014.
[10] M. S. Islam, M. Kuzu, and M. Kantarcioglu. Access pattern disclosure on searchable encryption: Ramification, attack and mitigation. In NDSS, volume 20, page 12, 2012.
[11] K. V. Jonsson, G. Kreitz, and M. Uddin. Secure multi-party sorting and applications. IACR Cryptology ePrint Archive, 2011:122, 2011.
[12] M. Keller and P. Scholl. Efficient, oblivious data structures for MPC. In Advances in Cryptology - ASIACRYPT 2014, pages 506-525, 2014.
[13] Y. Lindell and B. Pinkas. Secure multiparty computation for privacy-preserving data mining. Journal of Privacy and Confidentiality, 1(1):5, 2009.
[14] J. C. Mitchell and J. Zimmerman. Data-oblivious data structures. In LIPIcs - Leibniz International Proceedings in Informatics, volume 25. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014.
[15] C. Moore, M. O'Neill, E. O'Sullivan, Y. Doroz, and B. Sunar. Practical homomorphic encryption: A survey. In Int. Symp. on Circuits and Systems (ISCAS), pages 2792-2795, 2014.
[16] M. Naehrig, K. Lauter, and V. Vaikuntanathan. Can homomorphic encryption be practical? In Proceedings of the 3rd ACM Workshop on Cloud Computing Security, pages 113-124. ACM, 2011.
[17] B. Pinkas and T. Reinman. Oblivious RAM revisited. In Advances in Cryptology (CRYPTO), pages 502-519, 2010.
[18] D. S. Roche, A. J. Aviv, and S. G. Choi. A practical oblivious map data structure with secure deletion and history independence. arXiv preprint arXiv:1505.07391, 2015.
[19] J. Schneider. Lean and fast secure multi-party computation: Minimizing communication and local computation using a helper. SECRYPT, 2016. Extended version: arXiv:1508.07690, https://arxiv.org/abs/1508.07690.
[20] E. Stefanov, M. Van Dijk, E. Shi, C. Fletcher, L. Ren, X. Yu, and S. Devadas. Path ORAM: An extremely simple oblivious RAM protocol. In Proc. of the SIGSAC Conference on Computer & Communications Security, pages 299-310, 2013.
[21] T. Toft. Secure data structures based on multi-party computation. In Proceedings of the 30th Annual Symposium on Principles of Distributed Computing, pages 291-292, 2011.
[22] M. Van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan. Fully homomorphic encryption over the integers. In Advances in Cryptology - EUROCRYPT 2010, pages 24-43, 2010.
[23] X. S. Wang, K. Nayak, C. Liu, T. Chan, E. Shi, E. Stefanov, and Y. Huang. Oblivious data structures. In Proc. of the Conference on Computer and Communications Security, pages 215-226, 2014.
[24] A. C.-C. Yao. How to generate and exchange secrets. In Foundations of Computer Science (FOCS), 1986.
[25] B. Zhang. Generic constant-round oblivious sorting algorithm for MPC. In Provable Security, pages 240-256, 2011.