Bucket Oblivious Sort: An Extremely Simple Oblivious Sort∗

Gilad Asharov†, T-H. Hubert Chan‡, Kartik Nayak§, Rafael Pass¶, Ling Ren‖, Elaine Shi∗∗
Abstract

We propose conceptually simple oblivious sort and oblivious random permutation algorithms called bucket oblivious sort and bucket oblivious random permutation. Bucket oblivious sort uses 6n log n time (measured by the number of memory accesses) and 2Z client storage, with an error probability exponentially small in Z. The above runtime is only 3× slower than a non-oblivious merge sort baseline; for 2^30 elements, it is 5× faster than bitonic sort, the de facto oblivious sorting algorithm in practical implementations.

∗ The paper was presented at the 3rd Symposium on Simplicity in Algorithms, SOSA@SODA 2020. This version is identical to the SOSA'20 conference version modulo typo corrections.
† Bar-Ilan University. Part of the work was done while the author was a post-doctoral fellow at Cornell Tech, supported by a Junior Fellow award from the Simons Foundation, and while at J.P. Morgan AI Research.
‡ The University of Hong Kong. Partially supported by the Hong Kong RGC under grant 17200418.
§ Duke University. Part of the work was done while the author was at the University of Maryland.
¶ Cornell Tech.
‖ University of Illinois Urbana-Champaign. Part of the work was done while the author was at MIT.
∗∗ Cornell University.

1 Introduction

With the increased use of outsourced storage and computation, privacy of the outsourced data has been of paramount importance. A canonical setting is one where a client with a small local storage outsources its encrypted data to an untrusted server. In this setting, encryption alone is not sufficient to preserve privacy: the access patterns to the data may reveal sensitive information. Two fundamental building blocks for oblivious storage and computation [GO96, GM11, SS13] are oblivious sorting and oblivious random permutation. In these two problems, an array of n elements is stored on an untrusted server, encrypted under a trusted client's secret key. The client wishes to sort or permute the n elements in a data-oblivious fashion. That is, the sequence of accesses it makes to the server should not reveal any information about the n elements (e.g., their relative ranking). The client has a small amount of local storage, the access pattern to which cannot be observed by the server. This work presents simple and efficient algorithms for these two problems, named bucket oblivious sort and bucket oblivious random permutation.

For oblivious sort, it is well known that one can leverage sorting networks such as AKS [AKS83] and Zig-zag sort [Goo14] to obliviously sort n elements in O(n log n) time. Unfortunately, these algorithms are complicated and incur enormous constants, rendering them completely impractical. Thus, almost all known practical implementations [SS13, LWN+15, NWI+15] instead employ the simple bitonic sort algorithm [Bat68]. While asymptotically worse, due to its small leading constants, bitonic sort performs much better in practice.

Oblivious random permutation (ORP) can be realized by assigning a sufficiently long random key to each element, and then obliviously sorting the elements by their keys. To the best of our knowledge, this remains the most practical solution for ORP. It then follows that while O(n log n) algorithms exist in theory, practical instantiations resort to the O(n log^2 n) bitonic sort. There exist algorithms such as the Melbourne shuffle [OGTU14] that do not rely on oblivious sort, but they require O(√n) client storage to permute n elements. Other approaches include the famous Thorp shuffle [CV14] and random permutation networks [Czu15], but none of these solutions are competitive in performance, either asymptotically or concretely.

Let Z be a statistical security parameter that controls the error probability. Our bucket oblivious sort runs in 6n log n time (4n log n for bucket ORP) and has an error probability around e^{-Z/6} when the client can store 2Z elements locally. This is at most 3× slower than non-oblivious merge sort, and is at least 5× faster than bitonic sort for n = 2^30 (cf. Table 1). Therefore, we recommend bucket oblivious sort and bucket ORP as attractive alternatives to bitonic sort in practical implementations.

The core of our algorithms is to assign each element to a random bin and then route the elements through a butterfly network to their assigned random bins. This part is inspired by Bucket ORAM [FNR+15]. The oblivious random bin assignment algorithm first distributes the n elements into B = 2n/Z buckets of size Z, where each bucket holds Z/2 real elements padded with Z/2 dummy elements. These B buckets form the inputs of a butterfly network — for simplicity, assume B is a power of two. Each element is uniformly randomly assigned to one of the B output buckets, represented by a key of log B bits. The elements are then routed through the butterfly network to their respective destinations.
Assuming the client can store two buckets locally at a time, at level i the client simply reads elements from two buckets that are distance 2^i away in level i and writes them to two adjacent buckets in level i + 1, using the i-th bit of each element's key to make the routing decision. We refer readers to Figure 1 for a graphical illustration.

The above algorithm is clearly oblivious, as the order in which the client reads and writes the buckets is fixed and independent of the input array. If no bucket overflows, all elements reach their assigned destinations. By setting Z appropriately, we can bound the overflow probability.

Our bucket oblivious sort and bucket ORP algorithms are derived from the above oblivious random bin assignment building block.

From oblivious random bin assignment to ORP and oblivious sort.
To obtain a random permutation, we simply remove all dummy elements and randomly permute each bucket of the final layer. Since the client can hold Z elements, permuting each bucket can be done locally. We show that the algorithm is oblivious and gives a random permutation despite revealing the number of dummy elements in each destination bucket. To get oblivious sort, we can first perform ORP on the input array and then apply any non-oblivious, comparison-based sorting algorithm (e.g., quick sort or merge sort). We show that the composition of ORP and a non-oblivious sort results in an oblivious sort.

Figure 1: Oblivious random bin assignment with 8 buckets. The
MergeSplit procedure takes elements from two buckets at level i and puts them into two buckets at level i + 1, according to the (i + 1)-th most significant bit of the keys. At level i, every 2^i consecutive buckets are semi-sorted by the most significant i bits of the keys.

Algorithm                      Oblivious   Client storage   Runtime               Error probability
Merge sort                     No          O(1)             2 n log n             0
Bitonic sort [Bat68]           Yes         O(1)             n log^2 n             0
AKS sort [AKS83]               Yes         O(1)             5.3 × 10^4 n log n    0
Zig-zag sort [Goo14]           Yes         O(1)             8 × 10^4 n log n      0
Randomized Shellsort [Goo10]   Yes         O(1)             24 n log n            ≈ n^{-c}
Bucket oblivious sort          Yes         2Z               6 n log n             ≈ e^{-Z/6}
Bucket oblivious sort          Yes         O(1)             ≈ n log n log^2 Z     ≈ e^{-Z/6}

Table 1: Runtime of bucket oblivious sort and classic non-oblivious and oblivious sort algorithms.
Bitonic sort requires (1/4) n log^2 n comparisons. The numbers of comparisons for AKS sort and zig-zag sort are cited from [Goo14]. Runtime represents the number of memory accesses, which is four times the number of comparisons.

Dealing with small client storage.
In Section 4.1, we extend our algorithms to support O(1) client storage. We can rely on bitonic sort to realize the MergeSplit operation that operates on 4 buckets at a time, which results in O(n log n · log^2 Z) runtime.

Locality.
Algorithmic performance when the data is stored on disk has been studied in the external memory model (e.g., [RW94, AFGV97, Vit01, Vit06] and references within). Recently, Asharov et al. [ACN+19] extended this study to oblivious algorithms. We discuss how our algorithms can be made locality-friendly in Section 4.3.
Subsequent work.
The work of Ramachandran and Shi [RS20] improved the algorithm to be cache-oblivious and cache-efficient in a binary fork-join model of computation.
2 Preliminaries
Notations and conventions.
Let [n] denote the set {1, . . . , n}. Throughout this paper, we will use n to denote the size of the instance and λ to denote the security parameter. For an ensemble of distributions {D_λ} (parametrized by λ), we denote by x ← D_λ a sampling of an instance from the distribution D_λ. We say two ensembles of distributions {X_λ} and {Y_λ} are ε(λ)-statistically-indistinguishable, denoted {X_λ} ≡_{ε(λ)} {Y_λ}, if for any unbounded adversary A,

    | Pr_{x ← X_λ}[ A(1^λ, x) = 1 ] − Pr_{y ← Y_λ}[ A(1^λ, y) = 1 ] | ≤ ε(λ).

Random-access machines.
A RAM is an interactive Turing machine that consists of a memory and a CPU. The memory is denoted mem[N, b], and is indexed by the logical address space [N] = {1, 2, . . . , N}. We refer to each memory word also as a block, and we use b to denote the bit-length of each block. The memory supports read/write instructions (op, addr, data), where op ∈ {read, write}, addr ∈ [N] and data ∈ {0, 1}^b ∪ {⊥}. If op = read, then data = ⊥ and the returned value is the content of the block at logical address addr in the memory. If op = write, then the memory content at logical address addr is updated to data. We use the standard setting where b = Θ(log N) (so a word can store an address).

Obliviousness.
Intuitively, a RAM program M obliviously simulates a RAM program f if: (1) it has the same input/output behavior as f; and (2) there exists a simulator Sim(|x|) that produces an access pattern statistically close to the access pattern of M(x), i.e., it can simulate all memory addresses accessed by M during the execution on x without knowing x. In case the access pattern and the functionality are randomized, we have to consider the joint distribution of the simulator and the output of the RAM program or the functionality.

For a RAM machine M and input x, let AccPtrn(M(x)) denote the distribution of memory addresses the machine M produces on an input x.

Definition 2.1.
A RAM algorithm M obliviously implements the functionality f with ε-obliviousness if the following holds:

    { Sim(1^λ), f(x) }_{x ∈ {0,1}^λ}   ≡_{ε(λ)}   { AccPtrn(M(x)), M(x) }_{x ∈ {0,1}^λ}.

If ε(·) = 0, we say M is perfectly oblivious.

The two main functionalities that we focus on in this paper are the following:
Oblivious sort:
This is a deterministic functionality in which the input is an array A[1, . . . , n] of memory blocks (i.e., each A[i] ∈ {0, 1}^b, representing a key). The goal is to output an array A'[1, . . . , n] which is some permutation π : [n] → [n] of the array A, i.e., A'[i] = A[π(i)], such that A'[1] ≤ . . . ≤ A'[n].

Oblivious permutation:
This is a randomized functionality in which the input is an array A[1, . . . , n] of memory blocks. The functionality chooses a random permutation π : [n] → [n] and outputs an array A'[1, . . . , n] such that A'[i] = A[π(i)] for every i.

3 Our Construction
We first present the oblivious random bin assignment algorithm (Section 3.1), and then use it to implement our bucket oblivious random permutation (Section 3.2) and bucket oblivious sort (Section 3.3).

3.1 Oblivious Random Bin Assignment
Algorithm 3.1: Oblivious Random Bin Assignment

Input: an array X of size n.

    Choose a bucket size Z and let B be the smallest power of two that is ≥ 2n/Z.
    Define (log B + 1) arrays, each containing B buckets of size Z. Denote the j-th bucket of the i-th array A^(i)_j.
    For each element in X, assign a uniformly random key in [0, B − 1]. Evenly divide X into B groups. Put the j-th group into A^(0)_j and pad with dummy elements to have size Z.
    for i = 0, . . . , log B − 1 do
        for j = 0, . . . , B/2 − 1 do
            (A^(i+1)_{2j}, A^(i+1)_{2j+1}) ← MergeSplit(A^(i)_{j' + (j mod 2^i)}, A^(i)_{j' + (j mod 2^i) + 2^i}, i) where j' = ⌊j/2^i⌋ · 2^{i+1}
                ▷ Input: the j-th pair of buckets at distance 2^i in A^(i); Output: the j-th pair of buckets in A^(i+1)
        end for
    end for
    Output: A^(log B) = A^(log B)_0 ‖ . . . ‖ A^(log B)_{B−1}.

function (A'_0, A'_1) ← MergeSplit(A_0, A_1, i)
    A'_0 receives all real elements in A_0 ∪ A_1 for which the (i + 1)-st MSB of the key is 0
    A'_1 receives all real elements in A_0 ∪ A_1 for which the (i + 1)-st MSB of the key is 1
    If either A'_0 or A'_1 receives more than Z real elements, the procedure aborts with overflow
    Pad A'_0 and A'_1 to size Z with dummy elements and return (A'_0, A'_1)
end function

The input to the oblivious random bin assignment algorithm is an array X of n elements. The goal is to obliviously and uniformly randomly distribute the elements into a set of bins. Each element is assigned to an independent random bin, and the elements are then routed into the bins obliviously. The algorithm first chooses a bucket size Z, which can be set to the security parameter λ. Then, it constructs B = ⌈2n/Z⌉ buckets, each of size Z. Without loss of generality, assume B is a power of 2 — if not, pad it to the next power of 2.
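The following Python sketch of Algorithm 3.1 is for illustration only: it runs entirely in local memory and omits the client/server split, encryption, and the two-buckets-at-a-time access pattern; the names merge_split, random_bin_assignment, and DUMMY are ours.

```python
import random

DUMMY = None  # stand-in for dummy (padding) elements

def merge_split(a0, a1, i, Z, logB):
    """Route the real elements of two level-i buckets into two level-(i+1)
    buckets according to the (i+1)-st most significant bit of their keys."""
    real = [e for e in a0 + a1 if e is not DUMMY]
    bit = logB - 1 - i                       # (i+1)-st MSB of a logB-bit key
    out0 = [e for e in real if (e[0] >> bit) & 1 == 0]
    out1 = [e for e in real if (e[0] >> bit) & 1 == 1]
    if len(out0) > Z or len(out1) > Z:
        raise OverflowError("bucket overflow")  # happens w.p. ~ e^{-Z/6}
    # Pad with dummies to size Z (real elements stay in front).
    return out0 + [DUMMY] * (Z - len(out0)), out1 + [DUMMY] * (Z - len(out1))

def random_bin_assignment(X, Z):
    """Sketch of Algorithm 3.1: tag each element of X with a uniform key in
    [0, B-1] and route it through a butterfly network to the bucket key."""
    n = len(X)
    B = 1
    while B * Z < 2 * n:                     # smallest power of two >= 2n/Z
        B *= 2
    logB = B.bit_length() - 1
    tagged = [(random.randrange(B), x) for x in X]
    # Evenly divide into B input buckets; pad each to size Z with dummies.
    per = (n + B - 1) // B
    level = [tagged[j * per:(j + 1) * per] for j in range(B)]
    level = [bkt + [DUMMY] * (Z - len(bkt)) for bkt in level]
    for i in range(logB):
        nxt = [None] * B
        for j in range(B // 2):
            jp = (j >> i) << (i + 1)         # j' = floor(j / 2^i) * 2^(i+1)
            lo = jp + (j % (1 << i))         # paired with bucket lo + 2^i
            nxt[2 * j], nxt[2 * j + 1] = merge_split(
                level[lo], level[lo + (1 << i)], i, Z, logB)
        level = nxt
    return level  # bucket j now holds exactly the elements with key j
```

For example, n = 256 and Z = 128 yields B = 4 buckets; every real element ends up in the output bucket matching its key, unless a low-probability overflow aborts the run.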
Note that the algorithm introduces n dummy elements, and the output is twice the size of the input array.

Figure 1 gives a graphical illustration of the algorithm for 8 input buckets, and Algorithm 3.1 gives the pseudocode. Each element in X is assigned a random key in [0, B − 1] which represents a destination bucket. Next, the algorithm repeatedly calls the MergeSplit subroutine to exchange elements between bucket pairs in log B levels to distribute elements into their destination buckets. The operation (A'_0, A'_1) ← MergeSplit(A_0, A_1, i) involves four buckets at a time, distributing the elements in the two input buckets A_0 and A_1 into two output buckets A'_0 and A'_1. A'_0 receives all the keys whose (i + 1)-th most significant bit (MSB) is 0, and A'_1 receives all the keys whose (i + 1)-th MSB is 1.

For now, assume the client can locally store two buckets. For each MergeSplit, it reads (and decrypts) the two input buckets, swaps elements between the two buckets according to the above rule, and writes the two output buckets (after re-encryption). It is then easy to see that Algorithm 3.1 is oblivious, since the order in which the client reads and writes the buckets is fixed and independent of the input array.

When no bucket overflows, all real elements are correctly put into their assigned bins. We now show that the probability of overflow is exponentially small in Z. Intuitively, this is because each bucket contains (in expectation) half dummy elements that serve as a form of "slack" to disallow overflow.

Lemma 3.2.
Overflow happens with at most ε(n, Z) = 2n/Z · log(2n/Z) · e^{-Z/6} probability.

Proof. Consider a bucket A^(i)_b at level i. Observe that this bucket can receive real elements from 2^i initial buckets, each containing Z/2 real elements. Each such element ends up in A^(i)_b only when the most significant i bits of its key match b, which happens with probability exactly 2^{-i}. Thus, the expected number of real elements in A^(i)_b is Z/2, and a Chernoff bound shows that A^(i)_b overflows (i.e., receives more than Z real elements) with less than e^{-Z/6} probability. Hence, a union bound over all levels and all buckets shows that overflow happens with less than B · log B · e^{-Z/6} = ε(n, Z) probability.

3.2 Bucket Oblivious Random Permutation

After performing the oblivious random bin assignment, ORP can be achieved simply as follows: scan the array and delete the dummy elements from each bin (note that within each bin it is guaranteed that the real elements appear before the dummy elements). Then obliviously permute each bin, and finally concatenate all bins. We have:
Lemma 3.3.
Bucket ORP obliviously implements the random permutation functionality except with ε(n, Z) probability.

Proof. We first describe the simulator. The access pattern of the oblivious bin assignment algorithm is deterministic and the same for every input, and the overflow event is independent of the input itself. Therefore, it is easy to simulate the bin assignment. The simulator then pretends to randomly permute each bin. Next, the simulator chooses random loads k⃗ = (k_0, k_1, . . . , k_{B−1}), where k_i is the number of real elements in the i-th bin. This is done by simply throwing n elements into B bins ("in the head"). If there is some i for which k_i > Z, the simulator aborts. The removal of the dummy elements is equivalent to revealing these loads.

Clearly, k⃗ is distributed the same as in the real execution. The only difference between the simulated access pattern and the real one is the case where the algorithm aborts as a result of an overflow before the last level, which occurs with at most ε(n, Z) probability.

We next show that the output of the algorithm is a random permutation, conditioned on the access pattern. As previously described, it is enough to condition on the vector of random loads k⃗ = (k_0, k_1, . . . , k_{B−1}). We show that given any such vector, all permutations are equally likely.

Fix a particular load vector k⃗ = (k_0, k_1, . . . , k_{B−1}). The algorithm works by first assigning the real elements into the bins, and then permuting within each bin. For every input, there are exactly n!/(k_0! · · · k_{B−1}!) ways to distribute the real elements into the bins while achieving the load vector k⃗. Then, each bin is individually permuted, i.e., within each bin i there are k_i! possible orderings. Overall, the total number of possible outputs with that load vector is

    n!/(k_0! · · · k_{B−1}!) · k_0! · . . . · k_{B−1}! = n!.

That is, even conditioned on some specific loads k⃗ = (k_0, k_1, . . . , k_{B−1}), all permutations are still equally likely. Therefore, for every π,

    Pr[Π = π | K⃗ = k⃗] = 1/n!,   and   Pr[Π = π] = Σ_k⃗ Pr[Π = π | K⃗ = k⃗] · Pr[K⃗ = k⃗] = 1/n!.

Our algorithm fails to implement the ORP only when some bin overflows during the oblivious random bin assignment, which happens with ε(n, Z) probability by Lemma 3.2.

3.3 Bucket Oblivious Sort

Once we have ORP, it is easy to achieve oblivious sort: just invoke any non-oblivious comparison-based sort after ORP. Since the functionality is deterministic, it is enough to consider correctness and simulation separately. Correctness follows directly from the correctness of the ORP and the non-oblivious sort. As for obliviousness, given any input array, one can easily simulate the algorithm by first randomly permuting the array and then running the comparison-based non-oblivious sort. The access patterns of a comparison-based sort depend only on the relative ranking of the input elements, which is independent of the input array once the array has been randomly permuted.
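The counting step in the proof of Lemma 3.3 — that the number of bin assignments realizing a given load vector, times the number of within-bin orderings, is exactly n! — can be checked mechanically (the helper names below are ours):

```python
from math import factorial

def ways_to_realize(loads):
    """Multinomial coefficient n!/(k_0! ... k_{B-1}!): the number of ways to
    distribute n distinct elements into bins with the given loads."""
    n = sum(loads)
    out = factorial(n)
    for k in loads:
        out //= factorial(k)
    return out

def total_outputs(loads):
    """Realizations of the load vector, times orderings within each bin."""
    out = ways_to_realize(loads)
    for k in loads:
        out *= factorial(k)
    return out  # always equals n!: every permutation is equally likely
```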
3.4 Efficiency

We analyze the efficiency of our algorithms and compare them to classic non-oblivious and oblivious sorting algorithms in Table 1. We measure runtime by the number of memory accesses the client needs to perform on the server.

For our algorithms, assuming the client can store 2Z elements locally, each 2n-sized array is read and written once, and there are log(2n/Z) < log n of them. So oblivious bin assignment and bucket ORP run in (less than) 4n log n time. Note that the last step of ORP, i.e., permuting each output bucket, can be incorporated into the last level of oblivious bin assignment. Bucket oblivious sort additionally invokes a non-oblivious sort, and thus runs in 6n log n time. This is within 3× of merge sort and beats bitonic sort when n is moderately large; for example, it is 5× faster than bitonic sort for n = 2^30. For an overflow probability of 2^{-80} and most reasonable values of n, Z = 512 suffices.

4 Extensions

4.1 Dealing with small client storage

We now discuss how to extend our algorithms to the case where the client can only store O(1) elements locally.

Each MergeSplit can be realized with a single invocation of bitonic sort. Concretely, we first scan the two input buckets to count how many real elements should go to bucket A'_0 vs. A'_1, then tag the correct number of dummy elements as going to either bucket, and finally perform a bitonic sort.

Next, we need to permute each output bucket obliviously with O(1) local storage. This can be done as follows. First, assign each element in a bucket a uniformly random label of Θ(log n) bits. Then, obliviously sort the elements by their random labels using bitonic sort. Since the labels are "short" (i.e., logarithmic in size), we may have collisions with n^{-c} probability for some constant c, in which case we simply retry. In expectation, the procedure succeeds within 1 + o(1) trials.

Since we invoke B/2 instances of MergeSplit on 2Z elements at each of the log B levels, the runtime is roughly log B · (B/2) · 2Z log^2(2Z) ≈ n log n log^2 Z.
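The bitonic sorter used here has a compare-exchange schedule fixed by the array length alone, which is what makes it data-oblivious. A standard in-place version can be sketched as follows (ours, for illustration; it assumes the length is a power of two):

```python
def bitonic_sort(a):
    """In-place ascending bitonic sort. The sequence of compared index pairs
    depends only on len(a), never on the data, so the access pattern is
    input-independent."""
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:               # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:           # compare-exchange distance within a merge
            for i in range(n):
                l = i ^ j
                if l > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[l]) == ascending:
                        a[i], a[l] = a[l], a[i]
            j //= 2
        k *= 2
```

To realize MergeSplit with it, one would sort the 2Z elements of the two input buckets by a composite key (destination bit first, with appropriately tagged dummies), as described above.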
4.2 Better asymptotic performance

Our algorithms can also be extended to have better asymptotic performance. For this instantiation, we use a primitive called oblivious tight compaction. Oblivious tight compaction receives n elements, each marked as either 0 or 1, and outputs a permutation of the n elements such that all elements marked 0 appear before the elements marked 1. It should not be hard to see that oblivious tight compaction can be used to achieve MergeSplit. Using the O(1)-client-storage and O(n)-time oblivious tight compaction construction from [AKL+18], we obtain bucket oblivious sort with O(n log n + n log Z) runtime and O(1) client storage. Setting Z = ω(1) · log n, bucket oblivious sort achieves O(n log n) runtime, O(1) client storage, and an error probability negligible in n.

4.3 Locality

Algorithmic performance when the data is stored on disk has been studied in the external memory model (e.g., [RW94, AFGV97, Vit01, Vit06] and references within). Recently, Asharov et al. [ACN+19] extended this study to oblivious algorithms. They say an algorithm has (p, ℓ)-locality if it has access to p disks and accesses in total ℓ discontiguous memory regions on all disks combined. As an example, it is not hard to see that merge sort is a non-oblivious sorting algorithm that sorts an array of size n in O(n log n) time and with (3, log n)-locality, whereas quick sort is not local for any reasonable p. This locality metric is motivated by the fact that real-world storage media such as disks support sequential accesses much faster than random seeks. Thus, an algorithm that makes mostly sequential accesses would execute much faster in practice than one that makes mostly random accesses — even if the two have the same runtime in a standard word-RAM model.

Guided by this new metric, Asharov et al. [ACN+19] consider how to design oblivious algorithms and ORAM schemes that achieve good locality. Since sorting is one of the most important building blocks in the design of oblivious algorithms, Asharov et al. [ACN+19] inevitably also show a locality-friendly sorting algorithm. Concretely, they show that there is a specific way to implement the bitonic sort meta-algorithm such that the entire algorithm requires accessing O(log^2 n) distinct memory regions (i.e., as many as the depth of the sorting network) and requires only 2 disks to be available — in other words, the algorithm achieves (2, O(log^2 n))-locality.

We observe that our algorithm, when implemented properly, is a locality-friendly oblivious sorting algorithm, and it outperforms the locality-friendly bitonic sort of Asharov et al. [ACN+19]. The key observation is that one can execute all n/Z instances of MergeSplit in the same layer of the butterfly network while accessing a small number of discontiguous regions. Specifically, the MergeSplit operation works on 4 buckets at a time, reading two buckets from the input layer and writing to two consecutive buckets in the output layer. Moreover, the different invocations of MergeSplit on the same layer deal with consecutive buckets. By carefully distributing the buckets among the different disks, and by using bitonic sort to implement the MergeSplit operation, we conclude:

Corollary 4.1.
There exists a statistically oblivious sort algorithm which, except with ≈ e^{-Z/6} probability, completes in O(n log n log^2 Z) work and with (3, O(log n log^2 Z)) locality.

References

[ACN+19] Gilad Asharov, T-H. Hubert Chan, Kartik Nayak, Rafael Pass, Ling Ren, and Elaine Shi. Locality-preserving oblivious RAM. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 214–243. Springer, 2019.

[AFGV97] Lars Arge, Paolo Ferragina, Roberto Grossi, and Jeffrey Scott Vitter. On sorting strings in external memory (extended abstract). In ACM Symposium on the Theory of Computing (STOC '97), pages 540–548, 1997.

[AKL+18] Gilad Asharov, Ilan Komargodski, Wei-Kai Lin, Kartik Nayak, Enoch Peserico, and Elaine Shi. OptORAMa: Optimal oblivious RAM. Cryptology ePrint Archive, 2018.

[AKS83] Miklós Ajtai, János Komlós, and Endre Szemerédi. An O(n log n) sorting network. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pages 1–9. ACM, 1983.

[Bat68] Kenneth E. Batcher. Sorting networks and their applications. In Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, pages 307–314. ACM, 1968.

[CV14] Artur Czumaj and Berthold Vöcking. Thorp shuffling, butterflies, and non-Markovian couplings. In ICALP (1), volume 8572 of Lecture Notes in Computer Science, pages 344–355. Springer, 2014.

[Czu15] Artur Czumaj. Random permutations using switching networks. In STOC, pages 703–712. ACM, 2015.

[FNR+15] Christopher W. Fletcher, Muhammad Naveed, Ling Ren, Elaine Shi, and Emil Stefanov. Bucket ORAM: Single online roundtrip, constant bandwidth oblivious RAM. Cryptology ePrint Archive, 2015.

[GM11] Michael T. Goodrich and Michael Mitzenmacher. Privacy-preserving access of outsourced data via oblivious RAM simulation. In International Colloquium on Automata, Languages, and Programming, pages 576–587. Springer, 2011.

[GO96] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM, 43(3):431–473, 1996.

[Goo10] Michael T. Goodrich. Randomized Shellsort: A simple oblivious sorting algorithm. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1262–1277. SIAM, 2010.

[Goo14] Michael T. Goodrich. Zig-zag sort: A simple deterministic data-oblivious sorting algorithm running in O(n log n) time. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 684–693. ACM, 2014.

[LWN+15] Chang Liu, Xiao Shaun Wang, Kartik Nayak, Yan Huang, and Elaine Shi. ObliVM: A programming framework for secure computation. In Symposium on Security and Privacy. IEEE, 2015.

[NWI+15] Kartik Nayak, Xiao Shaun Wang, Stratis Ioannidis, Udi Weinsberg, Nina Taft, and Elaine Shi. GraphSC: Parallel secure computation made easy. In Symposium on Security and Privacy. IEEE, 2015.

[OGTU14] Olga Ohrimenko, Michael T. Goodrich, Roberto Tamassia, and Eli Upfal. The Melbourne shuffle: Improving oblivious storage in the cloud. In International Colloquium on Automata, Languages, and Programming, pages 556–567. Springer, 2014.

[RS20] Vijaya Ramachandran and Elaine Shi. Data oblivious algorithms for multicores. CoRR, abs/2008.00332, 2020.

[RW94] Chris Ruemmler and John Wilkes. An introduction to disk drive modeling. IEEE Computer, 27(3):17–28, 1994.

[SS13] Emil Stefanov and Elaine Shi. ObliviStore: High performance oblivious cloud storage. In Symposium on Security and Privacy. IEEE, 2013.

[Vit01] Jeffrey Scott Vitter. External memory algorithms and data structures. ACM Computing Surveys, 33(2):209–271, 2001.

[Vit06] Jeffrey Scott Vitter. Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science, 2(4):305–474, 2006.