Secrecy: Secure collaborative analytics on secret-shared data

Abstract

We study the problem of composing and optimizing relational query plans under secure multi-party computation (MPC). MPC enables mutually distrusting parties to jointly compute arbitrary functions over private data, while preserving data privacy from each other and from external entities. In this paper, we propose a relational MPC framework based on replicated secret sharing. We define a set of oblivious operators, explain the secure primitives they rely on, and provide an analysis of their costs in terms of operations and inter-party communication. We show how these operators can be composed to form end-to-end oblivious queries, and we introduce logical and physical optimizations that dramatically reduce the space and communication requirements during query execution, in some cases from quadratic to linear with respect to the cardinality of the input. We provide an efficient implementation of our framework, called Secrecy, and evaluate it using real queries from several MPC application areas. Our results demonstrate that the optimizations we propose can result in up to 1000x lower execution times compared to baseline approaches, enabling Secrecy to outperform state-of-the-art frameworks and compute MPC queries on millions of input rows with a single thread per party.


John Liagouris †‡, Vasiliki Kalavri †, Muhammad Faisal †, Mayank Varia †
† Boston University, ‡ Hariri Institute for Computing
{liagos, vkalavri, mfaisal, varia}@bu.edu
arXiv [cs.DB]


Cryptographically secure multi-party computation, or MPC for short, enables mutually distrusting parties to make queries of their collective data while keeping their own sensitive data siloed from each other and from external adversaries. Several MPC software libraries have been designed over the past decade that offer some combination of speed, scale, and programming flexibility (e.g., [1, 17, 20, 66, 81, 96]). MPC has been deployed to protect healthcare data like disease surveillance, educational data like student GPAs, financial data like credit modeling, advertising data like conversion rates, public interest data like the gender wage gap, and more [11, 16, 19, 29]. Nevertheless, adoption of MPC is rare, in part due to the challenge of developing and deploying MPC without domain-specific expertise [49].

To make secure computation more accessible to data analysts, systems like Conclave [88], ObliDB [36], OCQ [31], Opaque [98], SAQE [15], SDB [51, 92], Senate [79], Shrinkwrap [14], and SMCQL [13] are designed to compute relational queries while providing strong security guarantees. Despite their particular differences, these works aim to improve query performance either by sidestepping expensive MPC operations or by relaxing the full MPC security guarantees (or both).

We distinguish three main lines of work in this space: (i) works that rely on trusted hardware (e.g., secure enclaves [36, 98]) to avoid the inherent communication cost of MPC protocols, (ii) works that employ hybrid execution (e.g., [13, 88]) and split the query plan into a plaintext part (executed by the data owners) and an oblivious part (executed under MPC), and (iii) works that trade off secure query performance with controlled information leakage, e.g., by revealing information about intermediate result sizes to untrusted parties, either with noise [14, 15] or not [51, 88, 92]. More recently, Senate [79] combines hybrid execution in the spirit of SMCQL and Conclave with a technique that reduces joint computation under MPC by leveraging information about data ownership. Table 1 summarizes the features of the most prominent software solutions for relational analytics under MPC (we discuss hardware-based approaches in Section 7).

Figure 1: MPC setting overview for secure collaborative analytics

Although the frameworks listed in Table 1 propose various types of optimizations, these are applicable under certain conditions on data sensitivity, input ownership, and the role of data owners in the computation (cf. Optimization Conditions). For example, minimizing the use of secure computation via hybrid execution is only feasible when data owners can compute part of the query locally on their plaintext data (i.e. outside the MPC boundary). Moreover, SMCQL, SDB, and Conclave can sidestep MPC when attributes are annotated as non-sensitive, Shrinkwrap and SAQE calibrate leakage based on user-provided privacy budgets, and Senate reduces joint computation when some relations are owned by subsets of the computing parties.

In this paper, we study the fundamental problem of composing and optimizing MPC queries in a more challenging setting, where all data are sensitive and data owners may not have their own private resources to participate in the computation. In contrast to existing work that has sought to improve MPC query performance by either avoiding secure computation or relaxing its guarantees, we propose a set of optimizations for end-to-end oblivious queries that retain the full security guarantees of MPC. We contribute Secrecy, a framework for secure collaborative analytics that applies these optimizations, and we find that they can improve MPC query performance by orders of magnitude. To the best of our knowledge, this is the first work to report results for oblivious queries on relations with up to millions of input rows, entirely under MPC, and without any information leakage or need for trusted hardware.

| Framework | MPC Protocol | Information Leakage | Trusted Party | Query Execution | Optimization Objective | Optimization Conditions |
| Conclave [88] | Secret Sharing / Garbled Circuits | Controlled (Hybrid operators) | Yes | Hybrid | Minimize the use of secure computation | 1. Data owners participate in computation; 2. Data owners provide privacy annotations; 3. There exists a (fourth) trusted party |
| SMCQL [13] | Garbled Circuits / ORAM | No | No | Hybrid | Minimize the use of secure computation | 1. Data owners participate in computation; 2. Data owners provide privacy annotations |
| Shrinkwrap [14] | Garbled Circuits / ORAM | Controlled (Diff. Privacy) | No | Hybrid | Calibrate padding of intermediate results | 1. Data owners participate in computation; 2. Data owners provide privacy annotations and intermediate result sensitivities |
| SAQE [15] | Garbled Circuits | Controlled (Diff. Privacy) | No | Hybrid | Choose sampling rate for approximate answers | 1. Data owners participate in computation; 2. Data owners provide privacy annotations and privacy budget |
| Senate [79] | Garbled Circuits | No | No | Hybrid | Reduce joint computation to subsets of parties | 1. Data owners participate in computation; 2. Input or intermediate relations are owned by subsets of the computing parties |
| SDB [51, 92] | Secret Sharing | Yes (operator dependent) | No | Hybrid | Reduce data encryption and decryption costs | 1. Data owner participates in computation; 2. Data owner provides privacy annotations |
| Secrecy | Repl. Secret Sharing | No | No | End-to-end under MPC | Reduce MPC costs (Section 4) | None |

Shrinkwrap and SAQE build on top of SMCQL's information flow analysis and inherit its optimizations along with their conditions. Senate provides security against malicious parties whereas all other systems adopt a semi-honest model. SDB adopts a typical DBaaS model with one data owner and does not support collaborative analytics.

Table 1: Summary of MPC-based software solutions for relational analytics. Hybrid query execution is feasible when data owners participate in the computation. The rest of the optimizations supported by each system are applicable under one or more of the listed conditions.

Figure 1 gives an overview of our MPC setting. A set of $k$ data owners that wish to compute a public query on their private data distribute secret shares of the data to three untrusted computing parties. We adopt replicated secret sharing protocols (cf. Section 2.2), according to which each party receives two shares per input. The computing parties execute the query under MPC and open their results to a learner. Making such architectures for secure database services practical has been a long-standing challenge in the data management community [3, 4]. We design our MPC framework, Secrecy, on the following principles:

1. Decoupling data owners from computing parties. Contrary to existing works, Secrecy decouples the role of a computing party from that of a data owner. Our optimizations do not make any assumptions about data ownership and are all applicable even when none of the data owners participates in the computation.

2. No information leakage. Secrecy retains the full MPC security guarantees, that is, it reveals nothing about the data and the execution metadata to untrusted parties. It completely hides access patterns and intermediate result sizes.

3. No reliance on trusted execution environments. Secrecy does not rely on any (semi-)trusted party, honest broker, or specialized secure hardware. To make our techniques accessible and remove barriers for adoption, we target general-purpose compute and cloud.

4. End-to-end MPC execution. Secrecy does not require data owners to annotate attributes as sensitive or non-sensitive and does not try to reduce the amount of secure computation. Instead, it executes all query operators under MPC and protects all attributes to prevent inference attacks that exploit correlations or functional dependencies in the data.

We define a set of oblivious operators based on replicated secret sharing and describe how they can be composed to build complex MPC query plans. Our contributions are summarized as follows:

• We analyze the cost of oblivious operators and their composition with respect to the number of required operations, messages, and communication rounds under MPC.
• Based on this cost analysis, we propose a rich set of optimizations that significantly reduce the cost of oblivious queries: (i) database-style logical transformations, such as operator re-ordering and decomposition, (ii) physical optimizations, including operator fusion and message batching, and (iii) secret-sharing optimizations that leverage knowledge about the MPC protocol.
• We provide efficient implementations of the oblivious operators and corresponding optimizations in a new relational MPC framework called Secrecy.
• We evaluate Secrecy's performance and the effectiveness of the proposed optimizations using real and synthetic queries. Our experiments show that Secrecy outperforms state-of-the-art MPC frameworks and scales to much larger datasets.

We will release Secrecy as open-source and make our experiments publicly available. This work aims to make MPC more accessible to the data management community and catalyze collaborations between cryptographers and database experts.

Each party in MPC has one or more of the following roles:

• Input party or data owner that provides some input data.
• Computing party, e.g. a cloud provider that provides resources (machines) to perform the secure computation.
• Result party or learner, e.g. a data analyst who learns the output of the computation.

A party may have any combination of the above roles; in fact, it is quite common to have data owners acting as computing and/or result parties at the same time. This is also supported by Secrecy without affecting the security guarantees or the proposed optimizations. In addition, a party in MPC is a logical entity and does not necessarily correspond to a single compute node. For example, a cloud or IaaS provider can play the role of a single computing party that internally distributes its own part of the computation across a cluster of machines. Secrecy does not make any assumption about the parties' actual deployment, so it is possible to deploy each party at competing providers or to have multiple providers in the same datacenter in a federated cloud.

Before using MPC, data owners must agree on the computation, in our setting a relational query, that they want to execute over the union of their private data. This query is public, i.e., known to all parties regardless of their role. To evaluate the query, computing parties execute an identical computation and exchange messages with each other.

MPC broadly offers two types of security guarantees: privacy, meaning that nobody learns more than (what they can infer from) their own inputs and outputs, and correctness, meaning that the parties are convinced that the output of the calculation is accurate. These guarantees hold even in the presence of a dishonest adversary who controls a (strict) subset of the computing parties; different MPC protocols can withstand different adversary sizes and threat postures.

Most MPC protocols consider an adversary who corrupts an arbitrary threshold $T$ of the $N$ computing parties, although more complicated access control policies are possible. Also, most protocols consider an adversary who either passively attempts to break privacy while following the protocol (a "semi-honest" adversary) or one who is actively malicious and is therefore willing to deviate from the prescribed protocol arbitrarily. In this work, we focus on the setting of a semi-honest adversary, noting that there exist general transformations to the stronger malicious setting [37]. We discuss malicious security further in Section 8.

Concretely, the threat model of this work is as follows: we consider three computing parties, where the adversary has complete visibility into and control over the network through which these parties exchange messages. The adversary may add, drop, or modify packets at any time. Additionally, the adversary can passively monitor 1 of the 3 computing parties of their choice from the beginning of the protocol execution. Here, "passive monitoring" means that the adversary can view the contents of all messages received by this party and any data stored on the machine, but they cannot alter the execution of the corrupted party. We also assume that the software faithfully and securely implements the MPC protocol; that is, formal verification is out of scope for this work.

MPC protocols follow one of two general techniques: obscuring the truth table of each operation using Yao's garbled circuits [95], or interactively performing operations over encoded data using secret sharing [83]. Garbled circuits are an effective method to securely compute Boolean circuits in high-latency environments because they only need a few rounds of communication between computing parties. Secret sharing-based approaches require less overall bandwidth and support more data types and operators.

This work follows the approach of 3-party replicated secret sharing by Araki et al. [8]. We encode an $\ell$-bit string of sensitive data $s$ (the secret) by splitting it into 3 shares $s_1$, $s_2$, and $s_3$ that individually have the uniform distribution over all possible $\ell$-bit strings (for privacy) and collectively suffice to specify $s$ (for correctness). Next, we give each party $P_i$ two of the shares, $s_i$ and $s_{i+1}$ (with indices wrapping around, so $P_3$ holds $s_3$ and $s_1$). Hence, any 2 parties can reconstruct a secret, but any single party cannot.

We consider two secret sharing formats: boolean secret sharing in which $s = s_1 \oplus s_2 \oplus s_3$, where $\oplus$ denotes the boolean XOR operation, and additive or arithmetic secret sharing in which $s = s_1 + s_2 + s_3 \bmod 2^{\ell}$.

The computing parties are placed on a logical 'ring,' as shown in Figure 1. Given boolean secret sharings of two strings $s$ and $t$ or additive secret sharings of two values $u$ and $v$, we describe next how the parties can collectively compute secret shares of many operations, without learning anything about the secrets. In this section, we briefly explain all primitives we use in our work.
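Before turning to the individual primitives, the two sharing formats described above can be illustrated with a minimal single-process sketch. It is not the Secrecy implementation: the function names, the 0-based party indices, and the choice of $\ell = 64$ are assumptions made for illustration only.

```python
import secrets

L = 64                    # assumed share length in bits
MASK = (1 << L) - 1       # used for arithmetic modulo 2**L


def share_boolean(s):
    """Split s into three shares with s == s1 ^ s2 ^ s3."""
    s1, s2 = secrets.randbits(L), secrets.randbits(L)
    return [s1, s2, s ^ s1 ^ s2]


def share_arithmetic(s):
    """Split s into three shares with s == (s1 + s2 + s3) mod 2**L."""
    s1, s2 = secrets.randbits(L), secrets.randbits(L)
    return [s1, s2, (s - s1 - s2) & MASK]


def replicate(shares):
    """Party i (0-based here) holds the pair (s_i, s_{i+1}); indices wrap around."""
    return [(shares[i], shares[(i + 1) % 3]) for i in range(3)]


def _collect(views, i, j):
    """Gather all three shares from the views of two distinct parties i and j."""
    if (i + 1) % 3 != j:          # order the pair so that j is the ring neighbour of i
        i, j = j, i
    a, b = views[i]
    _, c = views[j]
    return a, b, c


def reconstruct_boolean(views, i, j):
    a, b, c = _collect(views, i, j)
    return a ^ b ^ c


def reconstruct_arithmetic(views, i, j):
    a, b, c = _collect(views, i, j)
    return (a + b + c) & MASK
```

Any two of the three views suffice to recover the secret, while a single view contains only two of the three shares and therefore misses the one needed to complete the XOR or the sum.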

Boolean operations. The parties can compute shares of $s \oplus t$ locally, i.e. without communication, by simply XORing their shares $s_i \oplus t_i$. To compute shares of the bitwise AND operation between $s$ and $t$, denoted with $s \cdot t$ or simply $st$, one round of communication is required. Observe that $st = (s_1 \oplus s_2 \oplus s_3) \cdot (t_1 \oplus t_2 \oplus t_3)$. After distributing the AND over the XOR and doing some rearrangement we have

$$st = (s_1 t_1 \oplus s_1 t_2 \oplus s_2 t_1) \oplus (s_2 t_2 \oplus s_2 t_3 \oplus s_3 t_2) \oplus (s_3 t_3 \oplus s_3 t_1 \oplus s_1 t_3).$$

In our replicated secret sharing scheme, each party has two shares for $s$ and two shares for $t$. More precisely, $P_1$ has $s_1, s_2, t_1, t_2$ whereas $P_2$ has $s_2, s_3, t_2, t_3$, and $P_3$ has $s_3, s_1, t_3, t_1$. Using its shares, each party can locally compute one of the three terms (in parentheses) of the last equation and this term corresponds to its boolean share of $st$. The parties then XOR this share with a fresh sharing of 0 (which is created locally [8]) so that the final share is uniformly distributed. In the end, each party must send the computed share to its successor on the ring (clockwise) so that all parties have two shares of $st$ (without knowing the actual value of $st$) and the replicated secret sharing property is preserved. Logical OR and NOT operations are based on the XOR and AND primitives.

Equality/Inequality. The parties can collectively form a secret sharing of the bit $b$ that equals 1 if and only if $s = t$ by first computing a sharing of $s \oplus t$, complementing it (a local operation), and then taking the boolean AND of all bits of the resulting string. Similarly, the parties can compare whether $s < t$ by checking equality of bits from left to right and taking the value of $t_i$ at the first bit $i$ in which the two strings differ.

By arranging the fan-in-2 AND gates in a log-depth tree, the number of communication rounds required for secure equality (=, <>) and inequality (<, >, ≥, ≤) is $\lceil \log \ell \rceil$ and $\lceil \log(\ell + 1) \rceil$ respectively, where $\ell$ is the length of the operands in number of bits. For example, to check equality (resp. inequality) between two 64-bit integers, we need $\lceil \log 64 \rceil = 6$ (resp. $\lceil \log 65 \rceil = 7$) rounds. Note that it is possible to compute (in)equality in a constant number of rounds [30], but the constants are worse for typical string lengths.

Some special cases of (in)equality operators can be further optimized. Less-than-zero checks ($s < 0$) require a secret sharing of the most significant bit of $s$, which the parties already possess, so no communication is needed. Equality with a public constant $s \stackrel{?}{=} c$ can also be optimized by having the data owners compute two subtractions $s - c$ and $c - s$ locally (in the clear) and secret share the results. This way, checking $s = c$ is reduced to two oblivious inequalities, $s - c < 0$ and $c - s < 0$, both of which are local ($s = c$ holds exactly when neither inequality does). This optimization exists in other MPC frameworks as well [1], and we show later how we use it to evaluate selection locally.
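The AND protocol and the log-depth equality check described above can be simulated in a single process as follows. This is a sketch under stated assumptions rather than the Secrecy code: all three shares live in one Python list, the helper names are hypothetical, the zero-sharing is generated centrally instead of from pairwise keys, the operand length is assumed to be a power of two, and the result bit is 1 when the operands are equal.

```python
import secrets

L = 64  # assumed operand length in bits (a power of two in this sketch)


def share(x, bits=L):
    s1, s2 = secrets.randbits(bits), secrets.randbits(bits)
    return [s1, s2, x ^ s1 ^ s2]


def reveal(sh):
    return sh[0] ^ sh[1] ^ sh[2]


def xor_shares(a, b):
    # Local operation: every party XORs the shares it already holds.
    return [a[i] ^ b[i] for i in range(3)]


def not_shares(a, bits=L):
    # Complementing the secret only requires flipping one of the three shares.
    return [a[0] ^ ((1 << bits) - 1), a[1], a[2]]


def and_shares(a, b, bits=L):
    # rho[i] ^ rho[i+1] over the three parties is a fresh sharing of 0.
    rho = [secrets.randbits(bits) for _ in range(3)]
    z = []
    for i in range(3):
        j = (i + 1) % 3
        # Party i holds a[i], a[j], b[i], b[j] and computes its term locally.
        term = (a[i] & b[i]) ^ (a[i] & b[j]) ^ (a[j] & b[i])
        z.append(term ^ rho[i] ^ rho[j])
    # In a real deployment each party would now send one share to a neighbour
    # (one communication round) to restore the replicated format.
    return z


def equals(a, b, bits=L):
    """Return (shares of the equality bit, number of AND rounds used)."""
    x = not_shares(xor_shares(a, b), bits)   # all ones iff the secrets are equal
    width, rounds = bits, 0
    while width > 1:
        half = width // 2
        hi = [v >> half for v in x]                  # shifts and masks are local
        lo = [v & ((1 << half) - 1) for v in x]
        x = and_shares(hi, lo, half)                 # one round per tree level
        width, rounds = half, rounds + 1
    return x, rounds
```

Each loop iteration halves the string and ANDs the two halves bitwise, so a 64-bit comparison uses six AND rounds, matching $\lceil \log 64 \rceil = 6$.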

Compare-and-swap. The parties can calculate the min and max of two strings. Setting $b = s \stackrel{?}{<} t$, we can use a multiplexer to compute $s' = \min\{s, t\} = bs \oplus (1 \oplus b)t$ and $t' = \max\{s, t\} = (1 \oplus b)s \oplus bt$. Evaluating these formulas requires $\lceil \log(\ell + 1) \rceil$ rounds for the inequality plus two more rounds: one for exchanging shares of the computed bit $b$, and a second one to exchange the shares of the results of the four ANDs required by the multiplexer. Compare-and-swap overwrites the original strings $s$ and $t$.

Sort and shuffle. A bitonic sorter, such as Batcher's sort [61], combines $O(n \log^2 n)$ compare-and-swap operators with a data-independent control flow. We can obliviously shuffle values in a similar fashion: each party appends a new attribute that is populated with locally generated random values, sorts the values on this new attribute, and then discards the new attribute (although we remark that faster oblivious shuffle algorithms are possible).

Boolean addition. In case $s$ and $t$ are integers, computing the share of $s + t$ can be done in $\ell$ rounds of communication using a ripple-carry adder [59]. Rounds can be further reduced to $O(\log \ell)$ with a parallel prefix adder, at the cost of exchanging more data.

Arithmetic operations. Addition using additive shares is more efficient. Given additive shares of two secrets $u$ and $v$, parties can compute $u + v$ locally. Multiplication $u \cdot v$ is equivalent to a logical AND using boolean shares, so it requires one round of communication as explained above. Scalar multiplication is local.

Conversion. We can convert between additive and boolean sharings [68] by securely computing all of the XOR and AND gates in a ripple-carry adder. One special case of conversion that is useful in many cases is the boolean-to-arithmetic conversion of shares for single-bit secrets. This conversion can be done in two rounds with the simple protocol used in [1]. We explain later how we leverage this optimization to speed up oblivious aggregations.
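To make the multiplexer and the data-independent control flow of the sorter concrete, here is a plaintext sketch. It operates on cleartext integers rather than on secret shares, so it only illustrates the structure of the computation; the function names, the 64-bit default, and the power-of-two input length are assumptions of this example. The round counts simply evaluate the formulas given in the text.

```python
from math import ceil, log2


def compare_and_swap(s, t, bits=64):
    """Branch-free (min, max) using the multiplexer b*s ^ (1^b)*t."""
    full = (1 << bits) - 1
    b = int(s < t)            # under MPC this bit comes from a secure inequality
    m = -b & full             # b replicated over all bit positions
    n = m ^ full              # (1 ^ b) replicated over all bit positions
    return (m & s) ^ (n & t), (n & s) ^ (m & t)


def rounds_equality(bits):
    return ceil(log2(bits))


def rounds_inequality(bits):
    return ceil(log2(bits + 1))


def rounds_compare_and_swap(bits):
    return rounds_inequality(bits) + 2


def bitonic_sort(values):
    """Sort ascending; returns (sorted list, number of stages).

    The pairs (i, p) that are compared depend only on the indices,
    never on the data, which is what makes the network oblivious.
    """
    xs = list(values)
    n = len(xs)               # assumed to be a power of two
    stages = 0
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            stages += 1
            for i in range(n):
                p = i ^ j
                if p > i:
                    if i & k == 0:    # ascending block
                        xs[i], xs[p] = compare_and_swap(xs[i], xs[p])
                    else:             # descending block
                        xs[p], xs[i] = compare_and_swap(xs[i], xs[p])
            j //= 2
        k *= 2
    return xs, stages
```

For 64-bit operands the helpers give 6, 7, and 9 rounds for equality, inequality, and compare-and-swap respectively, and sorting 8 values takes 6 stages of independent compare-and-swap operations.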

In this section, we define the oblivious operators of Secrecy, analyze their cost, and describe how they can be composed. At a high level, oblivious selection requires a linear scan over the input relation, join and semi-join operators require a nested-loop over the two inputs, whereas order-by, distinct, and group-by are based on oblivious sorting. In all cases, the operator's predicate is evaluated under MPC using the primitives of Section 2.3.

Our oblivious operators hide both access patterns and output size from the computing parties. We hide access patterns by implementing the operator in a way that makes its control-flow independent of the input data so that it incurs exactly the same accesses for all inputs of the same size. In practice, this means that the implementation does not include any if statements that depend either directly or indirectly on the input data. Also, all operators except PROJECT and ORDER-BY introduce a new single-bit attribute that stores the (secret-shared) result of a logical or arithmetic expression evaluated under MPC. This extra attribute denotes whether the respective tuple belongs to the output of an oblivious operator and is always discarded before opening the final result to the learner(s). Along with 'masking' that we describe below, the single-bit attribute enables the computing parties to jointly apply each operator without learning the actual size of any intermediate or output relation.

Let $R$, $S$, and $T$ be relations with cardinalities $|R|$, $|S|$, and $|T|$ respectively. Let also $t[a_i]$ be the value of attribute $a_i$ in tuple $t$. To simplify the presentation, we describe how each operator is computed over the logical (i.e. secret) relations and not the actual shares distributed across parties. That is, when we say that "a computation is applied to a relation $R$ and defines another relation $T$", in practice this means that each computing party begins with shares of $R$, performs some MPC operations, and ends with shares of $T$.

PROJECT. Oblivious projection has the same semantics as the non-oblivious operation.

SELECT. An oblivious selection with predicate $\varphi$ on a relation $R$ defines a new relation:

$$T = \{\, t \cup \{\varphi(t)\} \mid t \in R \,\}$$

with the same cardinality as $R$, i.e. $|T| = |R|$, and one more attribute for each tuple $t \in R$ that contains $\varphi$'s result when applied to $t$ (each party has two shares of the actual result according to the replicated secret sharing protocol). The result is a single bit denoting whether the tuple $t$ is included in $T$ (1) or not (0). The predicate $\varphi$ can be an arbitrary logical expression with atoms that may also include arithmetic expressions ($+$, $*$, $=$, $>$, $<$, $\neq$, $\geq$, $\leq$). Such expressions are evaluated under MPC using the primitives of Section 2.3. Note that, in contrast to a typical selection in the clear, oblivious selection defines a relation with the same cardinality as the input relation, i.e., it does not remove tuples from the input so that the size of the output remains hidden to the computing parties.

JOIN. An oblivious $\theta$-join between two relations $R$ and $S$, denoted with $R \bowtie_\theta S$, defines a new relation:

$$T = \{\, (t \cup t' \cup \{\theta(t, t')\}) \mid t \in R \wedge t' \in S \,\}$$

where $t \cup t'$ is a new tuple that contains all attributes of $t \in R$ along with all attributes of $t' \in S$, and $\theta(t, t')$ is $\theta$'s result when applied to the pair of tuples $(t, t')$. This result is a cartesian product of the input relations ($R \times S$), where each tuple is augmented with a single bit (0/1) denoting whether the tuple $t$ "matches" with tuple $t'$ according to $\theta$. Generating the cartesian product is inherent to general oblivious join algorithms (we discuss special join instances in Section 7). Like selections, the join predicate can be an arbitrary logical expression with atoms that may also include arithmetic expressions. Join is the only oblivious operator in Secrecy that generates a relation with cardinality larger than the cardinalities of its inputs.
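The output shapes defined above for SELECT and JOIN can be illustrated with a small plaintext model. It is only a sketch of the semantics: tuples are plain Python tuples, the predicates run in the clear, and the function names are hypothetical. Under MPC the appended bit would stay secret-shared and the predicate would be evaluated with the primitives of Section 2.3.

```python
def oblivious_select(R, phi):
    """Append the predicate bit to every tuple; no tuple is removed."""
    return [t + (int(phi(t)),) for t in R]


def oblivious_join(R, S, theta):
    """Cartesian product of R and S, each pair tagged with the match bit."""
    return [t + u + (int(theta(t, u)),) for t in R for u in S]


# Example usage with made-up data.
R = [(1, 10), (2, 20), (3, 30)]
S = [(2, "x"), (4, "y")]

selected = oblivious_select(R, lambda t: t[1] > 15)
joined = oblivious_join(R, S, lambda t, u: t[0] == u[0])
```

The selection keeps all three tuples and the join always produces $|R| \cdot |S| = 6$ tuples, so neither output size depends on how many tuples actually satisfy the predicate.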

SEMI-JOIN. An oblivious (left) semi-join between two relations $R$ and $S$ on a predicate $\theta$, denoted with $R \ltimes_\theta S$, defines a new relation:

$$T = \Big\{\, \Big(t \cup \Big\{ \bigvee_{\forall t' \in S} \theta(t, t') \Big\}\Big) \;\Big|\; t \in R \,\Big\}$$

with the same cardinality as $R$, i.e. $|T| = |R|$, and one more attribute that stores the result of the formula $\bigvee_{\forall t' \in S} \theta(t, t')$ indicating whether the row in $R$ "matches" with any row in $S$.

ORDER-BY. Oblivious order-by on attribute $a_k$ has the same semantics as the non-oblivious operator, where each tuple is assigned an index $i$ such that:

$$\forall t_i, t_j \in R,\; i < j \iff \begin{cases} t_i[a_k] < t_j[a_k] & (\text{ASC}) \\ t_i[a_k] > t_j[a_k] & (\text{DESC}) \end{cases}$$

The tuple ordering is computed under MPC using oblivious compare-and-swap operations (cf. Section 2.3). Hereafter, sorting a relation $R$ with $m$ attributes on ascending (resp. descending) order of an attribute $a_k$, $1 \leq k \leq m$, is denoted as $s_{\uparrow a_k}(R) = T$ (resp. $s_{\downarrow a_k}(R) = T$). We define order-by on multiple attributes using the standard semantics. For example, sorting a relation $R$ first on attribute $a_k$ (ascending) and then on $a_n$ (descending) is denoted as $s_{\uparrow a_k \downarrow a_n}(R)$.

An order-by operator is often followed by a LIMIT that defines the number of tuples the operator must output. Limit in the oblivious setting has the same semantics. Order-by with limit is the only operator in Secrecy that may output a relation with cardinality smaller than the cardinality of its input.
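The semi-join definition given earlier in this section can be modelled in plaintext as an OR over all match bits of a row. The sketch below arranges the OR as a pairwise tree, in the same way the equality primitive arranges its AND gates; this layout, the function names, and the example data are assumptions of the illustration, and everything runs in the clear.

```python
def or_tree(bits):
    """OR a list of bits pairwise, level by level; returns (result, levels)."""
    level = list(bits)
    if not level:
        return 0, 0
    levels = 0
    while len(level) > 1:
        nxt = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # an odd element is carried to the next level
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


def oblivious_semi_join(R, S, theta):
    """Every tuple of R is kept and tagged with OR over theta(t, t') for t' in S."""
    out = []
    for t in R:
        bit, _ = or_tree([int(theta(t, u)) for u in S])
        out.append(t + (bit,))
    return out


# Example usage with made-up data.
R = [(1,), (2,), (3,)]
S = [(2,), (3,), (3,), (9,), (7,)]
result = oblivious_semi_join(R, S, lambda t, u: t[0] == u[0])
```

The output has exactly $|R|$ tuples regardless of how many of them match, and the tree over five bits has three levels.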

GROUP-BY with aggregation. An oblivious group-by aggregation on a relation $R$ with $m$ attributes defines a new relation $T = \{\, f(t') \mid t' = t \cup \{a_g, a_v\},\; t \in R \,\}$ with the same cardinality as $R$, i.e. $|T| = |R|$, and two more attributes: $a_g$ that stores the result of the aggregation, and $a_v$ that denotes whether the tuple $t$ is 'valid', i.e., included in the output. Let $a_k$ be the group-by key and $a_w$ the attribute whose values are aggregated. Let also $S = \langle t_1[a_w], t_2[a_w], \ldots, t_u[a_w] \rangle$ be the list of values for attribute $a_w$ for all tuples $t_1, t_2, \ldots, t_u \in R$ that belong to the same group, i.e., $t_1[a_k] = t_2[a_k] = \ldots = t_u[a_k]$, $1 \leq u \leq |R|$. The function $f$ in $T$'s definition above is defined as:

$$f(t_i) = \begin{cases} t_i[a_g] = agg(S),\; t_i[a_v] = 1, & i = u',\; 1 \leq u' \leq u \\ t_{inv}, & i \neq u',\; 1 \leq i \leq u \end{cases}$$

where $t_{inv}$ is a tuple with $t_{inv}[a_v] = 0$ and $agg(S)$ is the aggregation function, e.g. MIN, MAX, COUNT, SUM, AVG. Put simply, oblivious aggregation sets the value of $a_g$ for one tuple per group equal to the result of the aggregation for that group and updates (in-place) all other tuples with "garbage." Groups can be defined on multiple attributes (keys) using the standard semantics. Global oblivious aggregation on attributes of $R$ is defined by assigning all tuples in $R$ to a single group.

DISTINCT. The oblivious distinct operator is a special case of group-by with aggregation, assuming that $a_k$ is not the group-by key as before but the attribute where distinct is applied. For distinct, there is no $a_g$ attribute and the function $f$ is defined as follows:

$$f(t_i) = \begin{cases} t_i[a_v] = 1, & i = u',\; 1 \leq u' \leq u \\ t_i[a_v] = 0, & i \neq u',\; 1 \leq i \leq u \end{cases}$$

In simple words, distinct marks one tuple per 'group' as 'valid' and the rest as 'invalid'.
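A plaintext sketch can make the group-by and distinct semantics above tangible. It uses a sort followed by a scan over adjacent tuples and replaces data-dependent branches with arithmetic multiplexing; this is one possible way to realize the definitions, shown in the clear, and the function names, the SUM aggregate, integer-valued attributes, and the sentinel `invalid=-1` are assumptions of the example. The final oblivious shuffle that would hide group boundaries is omitted here.

```python
def group_by_sum(R, key, val, invalid=-1):
    """Return |R| tuples extended with a_g (sum) and a_v (valid bit)."""
    if not R:
        return []
    # Each row: original attributes + a_g (running aggregate) + a_v (valid bit).
    rows = [list(t) + [t[val], 1] for t in sorted(R, key=lambda t: t[key])]
    ag, av = len(rows[0]) - 2, len(rows[0]) - 1
    for i in range(len(rows) - 1):
        cur, nxt = rows[i], rows[i + 1]
        b = int(cur[key] == nxt[key])          # same group? (secure equality under MPC)
        # Carry the running aggregate forward when the group continues.
        nxt[ag] = b * (cur[ag] + nxt[ag]) + (1 - b) * nxt[ag]
        cur[av] = 1 - b                        # only the last tuple of a group stays valid
        for a in range(len(cur)):              # mask every attribute except a_v
            if a != av:                        # depends on the index, not on the data
                cur[a] = b * invalid + (1 - b) * cur[a]
    return [tuple(r) for r in rows]


def distinct(R, key):
    """Tag one tuple per distinct key value as valid; all tuples are kept."""
    rows = sorted(R, key=lambda t: t[key])
    out = []
    for i, t in enumerate(rows):
        has_next = i + 1 < len(rows)           # position-based, independent of the data
        b = int(has_next and t[key] == rows[i + 1][key])
        out.append(t + (1 - b,))
    return out


# Example usage with made-up data: (key, value) pairs.
R = [(1, 10), (2, 5), (1, 20), (2, 7), (3, 4)]
grouped = group_by_sum(R, key=0, val=1)
marked = distinct(R, key=0)
```

Both operators return exactly $|R|$ tuples; only the valid bit (and, for group-by, the aggregate attribute) distinguishes the one representative of each group from the masked tuples.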

MASK. Let $t_{inv}$ be a special tuple with invalid attribute values. A mask operator with predicate $p$ on a relation $R$ defines a new relation $T = \{\, f(t) \mid t \in R \,\}$, where:

$$f(t) = \begin{cases} t, & p(t) = 0 \\ t_{inv}, & p(t) = 1 \end{cases}$$

We now describe the implementation of oblivious operators and analyze their individual costs before discussing plan composition in the next section. In Secrecy, we have chosen to provide general implementations that favor composability. Building a full-fledged MPC planner that considers alternative operator implementations and their costs is out of the scope of this paper but certainly an exciting opportunity for follow-up work (cf. Section 8).

We consider two types of costs for individual operators: (i) operation costs defined in terms of the total number of MPC operations per party, which include local computation and message exchange, and (ii) synchronization costs for inter-party communication, which we measure by the number of communication rounds across parties. All secret-shared data in our framework reside in main memory, therefore, we do not consider disk I/O costs.

A communication round corresponds to a single clockwise data exchange on the ring between the 3 computing parties. In practice, this is a barrier, i.e. a synchronization point in the distributed computation, where parties must exchange data in order to proceed. In general, the fewer rounds an operation needs the faster it reaches completion since each party can make more progress without being blocked on other parties. Table 2 shows the number of operations as well as the communication rounds required by each individual operator with respect to the input size. Throughout this section, we use $n$, $m$ to refer to the cardinalities of input relations and $\ell$ to denote the length (in bits) of a secret-shared value.

PROJECT. The cost of an oblivious PROJECT is the same as its plaintext counterpart: it does not require any communication, as each party can locally disregard the shares corresponding to the filtered attributes.

SELECT. In terms of operations, oblivious SELECT performs a linear scan of the input relation $R$. Because predicate evaluation can be computed independently for an arbitrary number of rows, the number of rounds (i.e., synchronization barriers) to perform the SELECT equals the number of rounds required to evaluate the selection predicate on a single row; it is independent of the size of $R$. In Section 4.4, we describe a technique we use in Secrecy that can reduce selections to local operations.

| Operator | #operations (#messages) | #communication rounds |
| SELECT | $O(n)$ | $O(1)$ |
| JOIN | $O(n \cdot m)$ | $O(1)$ |
| SEMI-JOIN | $O(n \cdot m)$ | $O(\log m)$ |
| ORDER-BY | $O(n \cdot \log^2 n)$ | $O(\log^2 n)$ |
| DISTINCT | $O(n \cdot \log^2 n)$ | $O(\log^2 n)$ |
| GROUP-BY | $O(n \cdot \log^2 n)$ | $O(n)$ |
| MASK | $O(n)$ | $O(1)$ |

Table 2: Summary of operation and synchronization costs for general oblivious relational operators w.r.t. the cardinalities ($n$, $m$) of the input relation(s). The asymptotic number of operations equals the asymptotic number of messages per computing party, as each individual operation on secret shares involves a constant number of message exchanges under MPC. These messages can be batched in rounds as shown in the rightmost column. JOIN is the most expensive operator in number of operations/messages whereas GROUP-BY is the most expensive operator in number of rounds.

JOIN.

Oblivious JOIN is the most expensive operation in terms of operation cost, as it requires a nested loop over the input relations to check all possible pairs (𝑛 · 𝑚); however, the number of communication rounds in the oblivious JOIN is independent of the input sizes 𝑛 and 𝑚. As in the case of SELECT, the number of rounds only depends on the join predicate. For equality joins, each one of the 𝑛 · 𝑚 equality checks requires ⌈log ℓ⌉ rounds (where ℓ is the length of the join attributes in bits) and is independent of the others; hence, the whole join can be done in ⌈log ℓ⌉ rounds. Range joins are more expensive. A range join with a predicate of the form 𝑅.𝑎 ≤ 𝑆.𝑏, where 𝑎, 𝑏 are attributes of the input relations 𝑅 and 𝑆, requires ⌈log(ℓ + 1)⌉ rounds in total. The constant asymptotic complexity with respect to the input size holds for any 𝜃-join.

SEMI-JOIN.

Oblivious semi-joins require the same number of operations as the 𝜃-joins but the number of communication rounds is different. A semi-join 𝑅 ⋉_𝜃 𝑆 requires O(log |𝑆|) communication rounds to evaluate the formula ⋁_{𝑡′ ∈ 𝑆} 𝜃(𝑡, 𝑡′) from Section 3.1. This formula requires ORing |𝑆| bits, which can be done in ⌈log |𝑆|⌉ communication rounds by using a binary tree of logical operators, as in the case of equality and inequality (cf. Section 2.3).

ORDER-BY.

Oblivious ORDER-BY relies on Bitonic sort that performs O(𝑛 · log² 𝑛) compare-and-swap operations in (1/2) · log 𝑛 · (1 + log 𝑛) stages, where each stage involves 𝑛/2 independent compare-and-swap operations that can be performed in bulk. In this case, the number of messages required by each oblivious compare-and-swap is linear in the number of attributes in the input relation; however, the number of rounds depends only on the cardinality of the input. Given the number of rounds of each compare-and-swap operation (cf. Section 2.3), the total number of rounds required by ORDER-BY is:

$$\log n \cdot (1 + \log n) \cdot \left(1 + \tfrac{1}{2} \cdot \lceil \log(\ell + 1) \rceil\right)$$

where 𝑛 is the cardinality of the input relation, and ℓ is the length of the sort attribute in bits. The analysis assumes one sorting attribute. Adding more sorting attributes increases the number of rounds in each comparison by a small constant factor.

    1  sort input relation 𝑅 on 𝑎_𝑘;
    2  for each pair of adjacent tuples (𝑡_𝑖, 𝑡_{𝑖+1}), 1 ≤ 𝑖 < |𝑅|, do
    3      let 𝑏 ← 𝑡_𝑖[𝑎_𝑘] ?= 𝑡_{𝑖+1}[𝑎_𝑘];                                   // Are tuples in the same group?
    4      𝑡_{𝑖+1}[𝑎_𝑔] ← 𝑏 · agg(𝑡_𝑖[𝑎_𝑔], 𝑡_{𝑖+1}[𝑎_𝑔]) + (1 − 𝑏) · 𝑡_{𝑖+1}[𝑎_𝑔];    // Aggregation
    5      𝑡_𝑖[𝑎_𝑣] ← ¬𝑏;
    6      for each attribute 𝑎 ≠ 𝑎_𝑣 of 𝑡_𝑖 do                               // Masking
    7          let 𝑟 be an invalid value;
    8          𝑎 ← 𝑏 · 𝑟 + (1 − 𝑏) · 𝑎;
    9  shuffle 𝑅;

Algorithm 1: Main control-flow of oblivious group-by
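As a rough illustration, the control flow of Algorithm 1 can be simulated on plaintext values. The Python sketch below is not oblivious and is not Secrecy's implementation: it assumes integer-valued attributes, SUM as the aggregation function, and an illustrative sentinel `INVALID` for masked values. The names `group_by_sum`, `INVALID`, and the dictionary-based row format are hypothetical. Under MPC, every comparison and multiplexing step would instead be evaluated on secret shares.

```python
import random

INVALID = -1  # hypothetical sentinel used to mask rows that do not represent a group


def group_by_sum(rows, key, value, rng=None):
    """Plaintext simulation of Algorithm 1 with SUM as the aggregation function."""
    # Append the aggregation attribute a_g (initialized with a_w) and the 'valid' bit a_v.
    rel = [dict(r, agg=r[value], valid=1) for r in rows]
    rel.sort(key=lambda r: r[key])                       # line 1: sort on the group-by key
    for i in range(len(rel) - 1):                        # line 2: scan adjacent pairs
        t, u = rel[i], rel[i + 1]
        b = int(t[key] == u[key])                        # line 3: same group?
        u["agg"] = b * (t["agg"] + u["agg"]) + (1 - b) * u["agg"]  # line 4: aggregate
        t["valid"] = 1 - b                               # line 5: set the 'valid' bit
        for a in t:                                      # lines 6-8: mask the other attributes
            if a != "valid":
                t[a] = b * INVALID + (1 - b) * t[a]
    (rng or random).shuffle(rel)                         # line 9: hide the group "traces"
    return rel
```

In this sketch the last row of each group carries the aggregate and keeps its 'valid' bit, while all other rows of the group are masked; the output always has as many rows as the input.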

GROUP-BY.

The logic of oblivious group-by is given in Algorithm 1. Let 𝑎_𝑘 be the group-by key, 𝑎_𝑤 the aggregated attribute, 𝑎_𝑔 the extra attribute that stores the aggregation result (initialized with 𝑎_𝑤), and 𝑎_𝑣 the ‘valid’ bit (same notation as in Section 3.1). The first step is to sort the input relation on the group-by key (line 1). Then, the operator scans the sorted relation and, for each pair of adjacent tuples, applies an oblivious equality comparison on 𝑎_𝑘 (line 3). The result of this comparison (𝑏) is used to aggregate (line 4), set the ‘valid’ bit (line 5), and “mask” (lines 6-8) obliviously. Aggregation is updated incrementally based on the values of the last pair of tuples (line 4). MIN, MAX, COUNT, and SUM can be easily evaluated this way but for AVG we need to keep the sum (numerator) and count (denominator) separate. When the scan is over, the algorithm requires a final shuffling (line 9) to hide the group “traces” in case the relation (or a part of it) is opened to the learner; this step is only needed if no subsequent sorting is required in the query plan, which would obliviously re-order 𝑅 anyway.

This operator is the most expensive in terms of communication rounds because the aggregation function is applied sequentially on each pair of adjacent tuples. Accounting for the initial sorting and final shuffling, the total number of rounds required by GROUP-BY is:

$$(n - 1) \cdot c_{agg} + \log n \cdot (1 + \log n) \cdot \left(2 + \lceil \log(\ell + 1) \rceil\right)$$

where 𝑐_agg is the number of rounds required to apply the aggregation function to a pair of rows (independent of 𝑛).

Aggregations.

Aggregations can be used without a GROUP-BY clause. In this case, applying the aggregation function requires 𝑛 − 1 function evaluations, but the number of communication rounds can be reduced to O(log 𝑛) by building a binary tree of function evaluations. This optimization makes aggregations efficient in practice, and other works have used it to reduce the number of rounds in GROUP-BY if the data owners agree to reveal the group sizes [22, 55].
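The gap between a sequential scan and a tree-shaped evaluation can be sketched by counting dependent levels. The helper name `tree_aggregate` is hypothetical, the input is assumed to be non-empty, and each level of the tree is assumed to cost a constant number of rounds once its independent evaluations are batched.

```python
def tree_aggregate(values, agg):
    """Combine values pairwise, level by level; returns (result, number of levels)."""
    level, depth = list(values), 0
    while len(level) > 1:
        # All evaluations within one level are independent and can share a round.
        nxt = [agg(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # an unpaired element is carried to the next level
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth
```

For 𝑛 inputs the loop runs ⌈log 𝑛⌉ times while still performing 𝑛 − 1 evaluations in total, whereas a sequential scan would need 𝑛 − 1 dependent steps.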

DISTINCT.

Distinct is a special case of group-by where 𝑎_𝑘 is the distinct attribute. As such, it follows a slightly different version of Algorithm 1 where, for each pair of adjacent tuples, we apply the equality comparison on 𝑎_𝑘 (line 3) and set the distinct bit 𝑡_{𝑖+1}[𝑎_𝑣] to ¬𝑏 (the bit 𝑎_𝑣 of the first tuple is set to 1). The aggregation, masking, and shuffling steps are simply omitted in this case because distinct does not require them. Crucially, each evaluation of the loop is independent, so the communication rounds of the equality comparisons (line 3) can be performed in bulk for all pairs of tuples. Hence, oblivious DISTINCT requires the same asymptotic number of operations as ORDER-BY because its operation cost is dominated by the initial sort (line 1). DISTINCT’s communication cost is also dominated by that of ORDER-BY; the only extra effort is to compute the 𝑛 − 1 equality comparisons in bulk, so the total number of rounds is:

$$\log n \cdot (1 + \log n) \cdot \left(1 + \tfrac{1}{2} \cdot \lceil \log(\ell + 1) \rceil\right) + \lceil \log \ell \rceil$$

Figure 2: Example composition of two oblivious selections and a 𝜃-join. The single-bit attributes 𝜙_1, 𝜙_2, and 𝜃 are used to compute the final attribute 𝑒_𝑐 that denotes whether a row belongs to the result 𝑇 and is secret-shared amongst computing parties. The cost of the composition in this case is the cost of evaluating the logical expression 𝑒_𝑐 = 𝜙_1 ∧ 𝜙_2 ∧ 𝜃 under MPC, for each tuple in 𝑇. All |𝑇| expressions are independent and can be evaluated in bulk within two communication rounds, one for each logical AND (∧) in 𝑒_𝑐’s formula.

MASK.

The cost of MASK is similar to the cost of SELECT; it requires 𝑛 operations and a constant number of communication rounds to apply the masking function.

Consider the composition of two operators defined as applying the second operator to the output of the first operator. One merit of our approach is that all operators of Section 3.1 reveal nothing about their output or access patterns, so they can be arbitrarily composed into an end-to-end oblivious query plan without special treatment.

Let 𝑜𝑝_1 and 𝑜𝑝_2 be two oblivious operators. In general, the composition 𝑜𝑝_2(𝑜𝑝_1(𝑅)) has an extra cost (additional to the cost of applying the operators 𝑜𝑝_1 and 𝑜𝑝_2) because it requires evaluating under MPC a logical expression 𝑒_𝑐 for each generated tuple. We define the composition cost of 𝑜𝑝_2(𝑜𝑝_1(𝑅)) as the cost of evaluating 𝑒_𝑐 on all tuples generated by 𝑜𝑝_2. The expression 𝑒_𝑐 depends on the types of operators, as described below. Table 3 summarizes the composition costs for different operator pairs in Secrecy.

Composing selections and joins.

Recall that selections, joins, and semi-joins append a single-bit attribute to their input relation that indicates whether the tuple is included in the output. To compose a pair of such operators, we compute both single-bit attributes and take their conjunction under MPC. For example, for two selection operators 𝜎_1 and 𝜎_2 with predicates 𝜑_1, 𝜑_2, the composition 𝜎_2(𝜎_1(𝑅)) defines a new relation 𝑇 = { 𝑡 ∪ {𝑒_𝑐 = 𝜑_1(𝑡) ∧ 𝜑_2(𝑡)} | 𝑡 ∈ 𝑅 }. The cost of composition in this case is the cost of evaluating the expression 𝜑_1(𝑡) ∧ 𝜑_2(𝑡) for each tuple in 𝑇. This includes |𝑇| boolean ANDs, all of which are independent and can be evaluated in one round. An example of composing two oblivious selections with an oblivious 𝜃-join is given in Figure 2.

Composing distinct with other operators.

Applying a selection or a (semi-)join to the result of DISTINCT requires a single communication round in order to compute the conjunction of the selection or (semi-)join bit with the bit 𝑎_𝑣 generated by distinct. However, applying DISTINCT to a relation derived by a selection, a (semi-)join or a group-by operator requires some care. Consider the case where DISTINCT is applied to the output of a selection. Let 𝑎_𝜙 be the attribute added by the selection and 𝑎_𝑘 be the distinct attribute. To set the distinct bit 𝑎_𝑣 at each tuple, we need to make sure there are no other tuples with the same attribute 𝑎_𝑘, with 𝑎_𝜙 = 1, and whose distinct bit 𝑎_𝑣 is already set. More formally:

$$t_i[a_v] = \begin{cases} 1, & \text{iff } \nexists\, t_j,\ i \neq j : t_i[a_k] = t_j[a_k] \wedge t_j[a_\phi] = 1 \wedge t_j[a_v] = 1 \\ 0, & \text{otherwise} \end{cases}$$

To evaluate the above formula, the distinct operator must process tuples sequentially and the composition itself requires 𝑛 rounds, where 𝑛 is the cardinality of the input. This results in a significant increase over the constant number of rounds required by distinct when applied to a base relation (cf. Table 2). Applying distinct to the output of a group-by or (semi-)join incurs a linear number of rounds for the same reason. In Section 4.3, we propose an optimization that reduces the cost of these compositions to a logarithmic factor.

Composing group-by with other operators.

To perform a group-by on the result of a selection or (semi-)join, the group-by operator must apply the aggregation function to all tuples in the same group that are also included in the output of the previous operator. Consider the case of applying group-by to a selection result. To identify the aforementioned tuples, we need to evaluate the formula:

$$b \leftarrow b \wedge t_i[a_\phi] \wedge t_{i+1}[a_\phi]$$

at each step of the for-loop in Algorithm 1, where 𝑏 is the bit that denotes whether the tuples 𝑡_𝑖 and 𝑡_{𝑖+1} belong to the same group (line 3 in Algorithm 1) and 𝑎_𝜙 is the selection bit. This formula includes two logical ANDs that require two communication rounds. Applying group-by to the output of a (semi-)join has the same composition cost; in this case, we replace 𝑎_𝜙 in the above formula with the (semi-)join attribute 𝑎_𝜃.

To apply a selection to the result of GROUP-BY, we must compute a logical AND between the selection bit 𝑎_𝜙 and the ‘valid’ bit 𝑎_𝑣 of each tuple generated by the group-by. The cost of composition in number of rounds is independent of the group-by result cardinality, as all logical ANDs can be applied in bulk. The same holds when applying a (semi-)join to the output of group-by. Finally, composing two group-by operators has the same cost as applying GROUP-BY to the result of a selection, as described above.

Composing order-by with other operators.

Composing ORDER-BY with other operators is straightforward. Applying an operator to the output of order-by has zero composition cost. The converse operation, applying ORDER-BY to the output of an operator, requires a few more boolean operations per oblivious compare-and-swap (due to the attribute/s appended by the previous operator), but does not incur additional communication rounds.

| Operator pair(s)                            | Rounds |
|---------------------------------------------|--------|
| {SELECT, (SEMI-)JOIN} → DISTINCT            | O(𝑛)   |
| DISTINCT → {SELECT, (SEMI-)JOIN}            | O(1)   |
| SELECT ↔ (SEMI-)JOIN                        | O(1)   |
| GROUP-BY → {SELECT, (SEMI-)JOIN}            | O(1)   |
| {SELECT, (SEMI-)JOIN} → GROUP-BY            | O(𝑛)   |
| {GROUP-BY, DISTINCT} ↔ {GROUP-BY, DISTINCT} | O(𝑛)   |

Table 3: Summary of composition costs in number of rounds for pairs of oblivious operators in Secrecy w.r.t. the number of generated tuples (𝑛). Arrows denote the order of applying the two operators. Composition incurs a small constant number of boolean operations per tuple, so the cost in number of operations is O(𝑛) for all pairs.

In this section, we present the set of optimizations in Secrecy: logical transformation rules, such as operator reordering and decomposition (Section 4.2), physical optimizations, such as message batching and operator fusion (Section 4.3), and secret-sharing optimizations that further reduce the number of communication rounds for certain operators (Section 4.4). Finally, in Section 4.5, we show concrete examples of the cost reduction our optimizations achieve when applied on real-world queries.

Target queries.

In this work, we focus on collaborative analytics under MPC where two or more data owners want to make queries on their collective data without compromising privacy. We consider all query inputs as sensitive and assume that data owners wish to protect their raw data and avoid revealing attributes of base relations in query results. For example, employing collaborative MPC to compute a query that includes a patient’s name along with their diagnosis in the SELECT clause would be pointless. Thus, we target queries that return global or per-group aggregates and/or distinct results.

Cost-based query optimization on plaintext data relies on selectivity estimation to reduce the size of intermediate results. The oblivious operators in Secrecy, however, hide the true size of their results by producing fixed-size outputs for all inputs of the same cardinality. As a consequence, traditional cost-based optimization techniques for relational queries are not always effective when optimizing plans under MPC. Consider, for instance, the case of the ubiquitous “filter push-down” transformation rule. Since oblivious selections do not reduce the size of intermediate data, this transformation does not improve the cost of operators following the filter.

To define optimizations that are effective under MPC, we instead aim to minimize the cost of oblivious queries. The total cost of a query plan can be computed as a function of the individual costs provided in Tables 2 and 3. In particular:

• The operation cost, which is determined by the total number of operations and messages per party (Section 3.2).
• The synchronization cost, given by the number of communication rounds across parties (Section 3.2).
• The cost of composition, which is also measured in number of operations and communication rounds (Section 3.3).

Observations.

The optimization rules we present in this section are guided by the following observations:

(1) With the exception of LIMIT, oblivious operators never reduce the size of intermediate data.
(2) JOIN is the only operator that produces an output larger than its input.
(3) The synchronization cost of the blocking operators, ORDER-BY, GROUP-BY, and DISTINCT, depends on the size of their input.
(4) When DISTINCT follows a selection, a (semi-)join or a group-by, the total asymptotic cost of composition increases from a constant to a linear number of rounds w.r.t. the input size.

Guided by observations (1)-(3), we propose three logical transformation rules that reorder and decompose pairs of operators to lower the cost of oblivious query plans. Although non-standard, the rules we describe in this section are valid algebraic transformations for plaintext queries and there are no special applicability conditions in the secure setting.

Blocking oblivious operators (GROUP-BY, DISTINCT, ORDER-BY) materialize and sort their entire input before producing any output tuple. Contrary to a plaintext optimizer that would most likely place sorting after selective operators, in MPC we have an incentive to push blocking operators down, as close to the input as possible. Since oblivious operators do not reduce the size of intermediate data, sorting the input is clearly the best option. Blocking operator push-down reduces all three cost factors and can provide significant performance improvements in practice, even if the asymptotic costs do not change. As an example, consider the case of applying ORDER-BY before a selection. Recall that the number of operations and messages required by the oblivious ORDER-BY depends on the cardinality and the number of attributes of the input relation (cf. Section 3.2). Applying the selection after the order-by reduces the actual (but not the asymptotic) operation cost, as selection appends one attribute to its input.

The second transformation rule is guided by observation (2) that JOIN is the only operator whose output is larger than its input. Based on this, we have an incentive to perform joins as late as possible in the query plan so that we avoid applying other operators to join results, especially those operators whose synchronization cost depends on the input size. For example, placing a blocking operator after a join requires sorting the cartesian product of the input relations, which increases the synchronization cost of a subsequent GROUP-BY to O(𝑛²) and the operation cost of any following blocking operator to O(𝑛² log² 𝑛).

Similar re-orderings have been proposed for plaintext queries [26, 94]; however, in the MPC setting this transformation does not reduce the size of intermediate data. Note that, under MPC, a plan that applies ORDER-BY on a JOIN input produces exactly the same amount of intermediate data as a plan where ORDER-BY is placed after JOIN, yet the latter plan has a higher cost.
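To see why sorting a join output is costlier even though the amount of intermediate data is unchanged, one can count the stages of a bitonic sorting network, which for a power-of-two input of size 𝑛 has (1/2) · log 𝑛 · (1 + log 𝑛) stages. The sketch below assumes |𝑅| = |𝑆| = 𝑛 with 𝑛 a power of two; the function name `bitonic_stages` is hypothetical, and the per-stage round cost is left out because it is the same in both plans.

```python
def bitonic_stages(n):
    """Number of compare-and-swap stages of a bitonic sorting network on n = 2^k inputs."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    k = n.bit_length() - 1           # k = log2(n)
    return k * (k + 1) // 2


n = 1024
before_join = bitonic_stages(n)      # ORDER-BY applied to a join input of n rows
after_join = bitonic_stages(n * n)   # ORDER-BY applied to the n^2-row join output
```

Because log 𝑛² = 2 · log 𝑛, the ratio between the two stage counts approaches 4 as 𝑛 grows, while the asymptotic class of the number of rounds stays the same.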

Example.

Consider the following query:

Q1:
    SELECT DISTINCT R.id
    FROM R, S
    WHERE R.id = S.id

    1   sort input relation 𝑅 on 𝑎_𝜃, 𝑎_𝑘;
    2   for each pair of adjacent tuples (𝑡_𝑖, 𝑡_{𝑖+1}), 1 ≤ 𝑖 < |𝑅|, do
    3       let 𝑏 ← 𝑡_𝑖[𝑎_𝑘] ?= 𝑡_{𝑖+1}[𝑎_𝑘];                               // Are tuples in the same group?
    4       let 𝑏_𝑐 ← 𝑏 ∧ 𝑡_𝑖[𝑎_𝜃] ∧ 𝑡_{𝑖+1}[𝑎_𝜃];                           // Are tuples in the semi-join output too? 𝑏_𝑐 is a single bit
    5       𝑡_{𝑖+1}[𝑎_𝑔] ← 𝑏_𝑐 · (𝑡_𝑖[𝑎_𝑔] + 𝑡_{𝑖+1}[𝑎_𝑔]) + (1 − 𝑏_𝑐) · 𝑡_{𝑖+1}[𝑎_𝑔];   // Aggregation
    6       𝑡_𝑖[𝑎_𝑣] ← ¬𝑏_𝑐;                                              // 𝑎_𝑣 is the ‘valid’ bit
    7       for each attribute 𝑎 ≠ 𝑎_𝑣 of 𝑡_𝑖 do                           // Masking
    8           let 𝑟 be a random value;
    9           𝑎 ← 𝑏_𝑐 · 𝑟 + (1 − 𝑏_𝑐) · 𝑎;
    10  shuffle 𝑅;

Algorithm 2: Second phase of the Join-Aggregation decomposition.

Let 𝑅 and 𝑆 have the same cardinality 𝑛. A plan that applies DISTINCT after the join operator requires O(𝑛² log² 𝑛) operations and messages per party. On the other hand, pushing DISTINCT before JOIN reduces the operation cost to O(𝑛 log² 𝑛) and the composition cost from O(𝑛²) to O(𝑛) (in number of operations) and O(1) (in number of rounds). The asymptotic synchronization cost is the same for both plans, i.e., O(log² 𝑛), but the actual number of rounds when DISTINCT is pushed before JOIN is 4× lower.

Consider a query plan where a JOIN on attribute 𝑎_𝑗 is followed by a GROUP-BY on another attribute 𝑎_𝑘 ≠ 𝑎_𝑗. In this case, pushing the GROUP-BY down does not produce a semantically equivalent plan. Still, we can optimize the plan by decomposing the aggregation into two phases and pushing the first (and most expensive) phase before the JOIN.

Let 𝑅, 𝑆 be the join inputs, where 𝑅 includes the group-by key 𝑎_𝑘. The first phase of the decomposition sorts 𝑅 on 𝑎_𝑘 and computes a semi-join (IN) on 𝑎_𝑗, which appends two attributes to 𝑅: an attribute 𝑎_𝜃 introduced by the semi-join, and a second attribute 𝑎_𝑔 introduced by the group-by (cf. Section 3.1). During this step, 𝑎_𝑔 is initialized with a partial aggregation for each tuple in 𝑅. The partial aggregation depends on the aggregation function in the query (we provide an example below).

In the second phase, we compute the final aggregates per 𝑎_𝑘 using Algorithm 2, which takes into account the attribute 𝑎_𝜃 and updates the partial aggregates 𝑎_𝑔 in-place with a single scan over 𝑅. The decomposition essentially replaces the join with an equivalent semi-join and a partial aggregation in order to avoid performing the aggregation on the cartesian product 𝑅 × 𝑆. This way, we significantly reduce the number of operations and communication rounds, but also ensure that the space requirements remain bounded by |𝑅| since the join output is not materialized. Note that this optimization is fundamentally different from performing a partial aggregation in plaintext (at the data owners) and then computing the global aggregates under MPC [13, 79]; in our case, all data are secret-shared amongst parties and both phases are under MPC.

The decomposition rule works for all common SQL aggregations (SUM, COUNT, MIN/MAX, AVG). In case the aggregation function is AVG, we need to keep the sum (numerator) and count (denominator) as separate attributes in 𝑅. The rule can also be used to push down DISTINCT in queries like Q1 when the distinct attribute is different from the join attribute. In this case, there is no partial aggregation; we simply do the semi-join that appends the attribute 𝑎_𝜃 (as above) and, in the second phase, we apply the distinct operator to 𝑅 by taking into account 𝑎_𝜃.

Example.

Consider the following query:

Q2:
    SELECT 𝑅.𝑎_𝑘, COUNT(*)
    FROM R, S
    WHERE R.id = S.id
    GROUP BY 𝑅.𝑎_𝑘

Let 𝑅 and 𝑆 have the same cardinality 𝑛. The plan that applies GROUP-BY to the join output requires O(𝑛² log² 𝑛) operations and O(𝑛²) communication rounds. When decomposing the aggregation into two phases, the operation cost is reduced to O(𝑛 log² 𝑛) (due to oblivious sorting of 𝑅) and the synchronization cost is reduced to O(𝑛) rounds (due to the final grouping on 𝑅). The space requirements are also reduced from O(𝑛²) to O(𝑛). In this example, the partial aggregation amounts to summing (under MPC) the |𝑆| bits produced by the semi-join in the first phase of the decomposition.

In this section, we describe a set of physical optimizations in Secrecy that further reduce the cost of oblivious plans.

Fusion is a common optimization in plaintext query planning, where the predicates of multiple filters can be merged and executed by a single operator. Fusion is also applicable to oblivious selections and joins with equality predicates, and is essentially reduced to identifying independent operations that can be executed within the same communication round. For example, if the equality check of an equi-join and a selection are independent of each other, a fused operator requires ⌈log ℓ⌉ + 1 rounds instead of 2 · ⌈log ℓ⌉ + 1. Next, we describe a somewhat more interesting fusion.

Recall that applying DISTINCT after SELECT requires 𝑛 communication rounds (cf. Section 4.1, Observation (4)). We can avoid this overhead by fusing the two operators in a different way, that is, sorting the input relation on the selection bit first and then on the distinct attribute. Sorting on two (instead of one) attributes adds a small constant factor to each oblivious compare-and-swap operation; hence, the asymptotic complexity of the sorting step remains the same. When DISTINCT is applied to the output of other operators, including selections and (semi-)joins, this physical optimization keeps the number of rounds required for the composition low.
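One plausible way to realize this fusion can be simulated on plaintext data. The sketch below is an assumption, not Secrecy's code: after sorting on the pair (selection bit, distinct attribute), it sets the distinct bit of a row to 1 when the row is selected and its predecessor is not a selected row with the same key. The function name `fused_select_distinct` and the row format are hypothetical.

```python
def fused_select_distinct(rows, key, pred):
    """Return (key value, distinct bit) pairs; the bit is 1 for one selected row per key."""
    # Sort on the selection bit first and then on the distinct attribute.
    tagged = sorted((int(pred(r)), r[key]) for r in rows)
    out = []
    for i, (phi, k) in enumerate(tagged):
        if i == 0:
            bit = phi
        else:
            prev_phi, prev_k = tagged[i - 1]
            duplicate = prev_phi & int(prev_k == k)   # selected neighbour with the same key
            bit = phi & (1 - duplicate)
        out.append((k, bit))
    return out
```

Each distinct bit depends only on one pair of adjacent rows of the sorted input, so all pairs could be evaluated in bulk instead of sequentially, which is what removes the linear number of rounds.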

Example.

Consider the following query:

Q3:
    SELECT DISTINCT id
    FROM R
    WHERE 𝑎_𝑘 = 'c'

Fusing the distinct and selection operators reduces the number of communication rounds from O(𝑛) to O(log² 𝑛), as if the distinct operator was applied only to 𝑅 (without a selection). DISTINCT can be fused with a join or a semi-join operator in a similar way. In this case, the distinct operator takes into account the equality or inequality predicate of the (semi-)join.

In communication-intensive MPC tasks, each non-local operation requires exchanging a constant number of messages, which in practice are very small in size (i.e., a few bytes). Grouping and exchanging small independent messages in bulk improves performance significantly. Consider applying a selection with an equality predicate on a relation with 𝑛 tuples. Performing oblivious equality on one tuple requires ⌈log ℓ⌉ rounds (cf. Section 2.3). Applying the selection tuple-by-tuple and sending messages eagerly (as soon as they are generated) results in 𝑛 · ⌈log ℓ⌉ communication rounds. Instead, if we apply independent selections across the entire relation and exchange messages in bulk, we can reduce the total synchronization cost to ⌈log ℓ⌉. We apply this optimization by default to all oblivious operators in Secrecy. Costs in Tables 2 and 3 already take message batching into account.
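The effect of batching on the synchronization cost of an equality selection can be made concrete with a small calculation. The helper names `equality_rounds` and `select_rounds` are hypothetical; the sketch only counts rounds and assumes that an oblivious equality on ℓ-bit values takes ⌈log ℓ⌉ rounds, as stated above.

```python
def equality_rounds(bits):
    """Rounds of one oblivious equality on `bits`-bit values: ceil(log2(bits))."""
    return (bits - 1).bit_length()


def select_rounds(n_rows, bits, batched):
    """Rounds of an equality selection over n_rows tuples, with or without batching."""
    per_row = equality_rounds(bits)
    # Eager exchange pays the per-row rounds once per tuple; batching pays them once overall.
    return per_row if batched else n_rows * per_row
```

For example, with ℓ = 64 and 1000 tuples, `select_rounds(1000, 64, batched=False)` counts 6000 rounds, whereas the batched variant counts 6.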

Secrecy uses boolean sharing by default; however, computing arithmetic expressions or aggregations, e.g., COUNT and SUM, on boolean shares requires using a ripple-carry adder, which in turn requires inter-party communication. On the other hand, the same operations on additive shares are local to each computing party. In this section, we describe two optimizations that avoid the ripple-carry adder in aggregations and predicates with constants.
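The difference between the two representations can be illustrated with a simplified model. The sketch below uses plain three-way sharing (one share per party) over 64-bit values rather than the replicated scheme used by Secrecy, and Python's non-cryptographic `random` module stands in for a secure generator; all function names are hypothetical.

```python
import random

MOD = 1 << 64   # ring size for 64-bit shares


def additive_shares(x, rng):
    """Split x into three shares that sum to x modulo 2^64."""
    s1, s2 = rng.randrange(MOD), rng.randrange(MOD)
    return [s1, s2, (x - s1 - s2) % MOD]


def boolean_shares(x, rng):
    """Split x into three shares whose bitwise XOR equals x."""
    s1, s2 = rng.randrange(MOD), rng.randrange(MOD)
    return [s1, s2, x ^ s1 ^ s2]


def open_additive(shares):
    return sum(shares) % MOD


def open_boolean(shares):
    out = 0
    for s in shares:
        out ^= s
    return out


def add_locally(xs, ys):
    """Each party adds its own additive shares; no messages are exchanged."""
    return [(a + b) % MOD for a, b in zip(xs, ys)]
```

Adding additive shares component-wise reconstructs the sum of the secrets, whereas combining boolean shares component-wise with XOR only reconstructs the XOR of the secrets; obtaining a sum from boolean shares is what requires the adder circuit and its communication.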

The straightforward approach of switching from boolean to additive shares (and vice versa) based on the type of operation does not pay off; the conversion itself relies on the ripple-carry adder (cf. Section 2.3), which has to be applied twice to switch to the other representation and back. The cost-effective way would be to evaluate logical expressions using boolean shares and arithmetic expressions using additive shares. However, this is not always possible because arithmetic and boolean expressions in oblivious queries often need to be composed into the same formula. We mitigate this problem using a dual secret-sharing scheme.

Recall the example query Q2 from Section 4.2.3 that applies an aggregation function to the output of a join according to Algorithm 2. The attribute 𝑎_𝜃 in Algorithm 2 is a single-bit attribute denoting that the respective row is included in the join result. During oblivious evaluation, each party has a boolean share of this bit that is used to compute the arithmetic expression of the aggregation step. The naïve approach is to evaluate the following equivalent logical expression directly on the boolean shares of 𝑏_𝑐, 𝑡_𝑖[𝑎_𝑔], and 𝑡_{𝑖+1}[𝑎_𝑔]:

$$t_{i+1}[a_g] \leftarrow b_\ell \wedge \mathrm{RCA}\left(t_i[a_g],\, t_{i+1}[a_g]\right) \oplus \overline{b_\ell} \wedge t_{i+1}[a_g]$$

where RCA is the oblivious ripple-carry adder primitive, $b_\ell$ is a string of ℓ bits (the length of 𝑎_𝑔) all of which are set equal to 𝑏_𝑐, and $\overline{b_\ell}$ is the binary complement of $b_\ell$. Evaluating the above expression requires ℓ communication rounds for RCA plus two more rounds for the logical ANDs (∧). In contrast, Secrecy evaluates the equivalent arithmetic formula of Algorithm 2 in four rounds (independent of ℓ) as follows. First, parties use arithmetic shares for the attribute 𝑎_𝑔 to compute the addition locally. Second, each time they compute the bit 𝑏_𝑐, they exchange boolean as well as arithmetic shares of its value. To do this efficiently, we rely on the single-bit conversion protocol used also in CrypTen [1], which only requires two rounds of communication. Having boolean and arithmetic shares of 𝑏_𝑐 allows us to use it in boolean and arithmetic expressions without paying the cost of RCA.

Figure 3: Optimized query plans for three real queries: (a) Comorbidity, (b) Recurrent C. Diff., (c) Aspirin Count.

The previous optimization relies on 𝑏_𝑐 being a single bit. In many cases, however, we need to compose boolean and additive shares of arbitrary values. Representative examples are join predicates with arithmetic expressions on boolean shares, e.g., (𝑅.𝑎 − 𝑆.𝑎 ≥ 𝑐), where 𝑎 is an attribute and 𝑐 is a constant. We can speed up the oblivious evaluation of such predicates by proactively asking the data owners to send shares of the expression results. In the previous example, if parties receive boolean shares of 𝑆.𝑎 + 𝑐, they can avoid computing the boolean addition with the ripple-carry adder. A similar technique is also applicable for selection predicates with constants. In this case, to compute 𝑎 > 𝑐, if parties receive shares of 𝑎 − 𝑐 and 𝑐 − 𝑎, they can transform the binary equality to a local comparison with zero (cf. Section 2.3). Note that proactive sharing is fundamentally different from having data owners perform local filters or pre-aggregations prior to sharing. In the latter case, the computing parties might learn the selectivity of a filter or the number of groups in an aggregation (if results are not padded). In our case, parties simply receive additional shares and will not learn anything about the intermediate query results.

We now showcase the applicability of Secrecy’s optimizations on three queries from clinical studies [52, 74, 84] that have also been used in other MPC works [13–15, 79, 88]. We experimentally evaluate the performance benefits on a larger set of queries in Section 6.

Comorbidity.

This query returns the ten most common diagnoses of individuals in a cohort.

    SELECT diag, COUNT(*) cnt
    FROM diagnosis
    WHERE pid IN cdiff_cohort
    GROUP BY diag
    ORDER BY cnt DESC
    LIMIT 10

This query lends itself to join-aggregation decomposition and dual sharing, producing the plan shown in Figure 3a. Let 𝑛 be the cardinality of diagnosis. The number of operations needed to evaluate this query is O(𝑛 log² 𝑛) (due to oblivious sorting) whereas the number of communication rounds is O(𝑛) (due to the oblivious group-by). The space requirements are bounded by the size of diagnosis, i.e., O(𝑛).

Recurrent Clostridium Difficile.

This query returns the distinct ids of patients who have been diagnosed with cdiff and have two consecutive infections between 15 and 56 days apart.

    WITH rcd AS (
        SELECT pid, time, row_no() OVER (PARTITION BY pid ORDER BY time)
        FROM diagnosis
        WHERE diag=cdiff)
    SELECT DISTINCT pid
    FROM rcd r1 JOIN rcd r2 ON r1.pid = r2.pid
    WHERE r2.time - r1.time >= 15 DAYS
      AND r2.time - r1.time <= 56 DAYS
      AND r2.row_no = r1.row_no + 1

Two optimizations are applicable in this case. First, we apply blocking operator push-down to sort on diagnosis before applying the selection. Second, we use distinct fusion (𝜎-𝛿) to evaluate the inequality predicates along with DISTINCT. The optimized plan is shown in Figure 3b and it requires O(𝑛 log² 𝑛) operations and O(log² 𝑛) communication rounds. Note that an end-to-end oblivious implementation of the plan used in [13] requires O(𝑛² log² 𝑛) operations and 4× more communication rounds, i.e., O(log² 𝑛²) = O(4 log² 𝑛) = O(log² 𝑛). This is because PARTITION BY is not possible under MPC without revealing the number of partitions and, thus, the self-join will generate and materialize the cartesian product rcd × rcd before applying the final DISTINCT operation.

Aspirin Count.

The third query returns the number of patients who have been diagnosed with heart disease and have been prescribed aspirin after the diagnosis was made.

    SELECT count(DISTINCT pid)
    FROM diagnosis as d, medication as m on d.pid = m.pid
    WHERE d.diag = hd AND m.med = aspirin
      AND d.time <= m.time

Here, we use blocking operator push-down and join push-up. We push the blocking distinct operator after the join to avoid materializing and sorting the join output. The optimized plan is shown in Figure 3c. Let diagnosis and medication have the same cardinality 𝑛. The number of operations needed to evaluate the query is O(𝑛²) whereas the number of communication rounds is O(log² 𝑛). In contrast, an end-to-end oblivious implementation of the plan in [13] requires O(𝑛² log² 𝑛) operations and 4× more rounds, since it applies distinct to the materialized join output.

Figure 4: Overview of the Secrecy architecture.

    /** Comorbidity Query **/
    BTable t1 = get_shares(diagnosis);
    BTable t2 = get_shares(cohort);
    // Sort t1 on diag (at index 2)
    bitonic_sort(&t1, 2, ASC);
    in(&t1, &t2, 0, 0);        // Semi-join on pid
    group_by_count(&t1, 2);    // Group-by on diag
    // Sort t1 on count (at index 4)
    bitonic_sort(&t1, 4, ASC);
    open(t1, 10);              // Open first 10 rows

Figure 5: Comorbidity query with Secrecy’s API.

SECRECY IMPLEMENTATION

Even though there exist various open-source MPC frameworks [49],we decided to implement

Secrecy entirely from scratch. As a result,we were able to design and implement secure low-level primitivesand oblivious operators that are optimized to process shares of tables instead of single attributes. In this section, we provide a briefdescription of the most important

Secrecy implementation aspects.

Architecture overview. Figure 4 shows an overview of the Secrecy framework. Secrecy is implemented in C and can be deployed on local clusters or machines in the cloud. The distributed runtime and communication layer are based on MPI. Secrecy currently does not encrypt data in transit between parties but it can be easily combined with any TLS implementation or other networking library that does so. Each computing party is a separate MPI process and we currently use a single thread per party to handle both local computation and communication with other parties. Parties are logically placed on a ring as shown in Figure 1. The middle layers of Secrecy include our implementation of the replicated secret sharing protocol, a library of secure computation and communication primitives, and the random number generation protocols. We built the latter with the libsodium library (https://libsodium.gitbook.io). The upper two layers of the stack provide optimized implementations of the oblivious relational operators and a declarative relational API.

Query execution. Upon startup, the parties establish connections to each other and learn the process ids of their successor and predecessor parties. Then, they construct a random sharing of zero generator, so that they can jointly create random shares of the value 0 for the various secure primitives. To achieve that at scale, each party generates a random seed and shares it with its successor in the ring. This way, each party has access to one local and one remote pseudo-random number generator, 𝑟𝑎𝑛𝑑₁ and 𝑟𝑎𝑛𝑑₂, and computes its share of zero as 𝑠𝑧 = 𝑟𝑎𝑛𝑑₁.𝑔𝑒𝑡_𝑛𝑒𝑥𝑡() − 𝑟𝑎𝑛𝑑₂.𝑔𝑒𝑡_𝑛𝑒𝑥𝑡(). Next, the parties receive input shares for each base relation from the data owners.

Queries are specified in a declarative API that allows composing operators seamlessly and abstracts the communication and MPC details. To compute the result of a query, parties execute an identical piece of code on their data shares. As an example, Figure 5 shows the Secrecy code that implements the Comorbidity query from Section 4.5 (we omit two function calls that convert boolean shares to arithmetic for brevity). We use a 64-bit data representation for shares, so in our implementation ℓ = 64 (cf. Section 3).
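The zero-sharing setup described above can be illustrated with a minimal single-process sketch in C. This is not Secrecy's code: the `prg_t` generator (splitmix64) is a non-cryptographic stand-in for the libsodium-based generators, and the names `party_t`, `setup_ring`, `next_zero_share`, and `zero_sharing_holds` are hypothetical. The sketch assumes three parties and arithmetic modulo 2^64 (ℓ = 64), and checks that the three locally computed values sum to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Non-cryptographic stand-in PRG (splitmix64), for illustration only. */
typedef struct { uint64_t state; } prg_t;

static uint64_t prg_next(prg_t *g) {
    uint64_t z = (g->state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* One party's view: a generator it seeded itself and one seeded by its predecessor. */
typedef struct { prg_t local; prg_t remote; } party_t;

/* Ring setup: party i picks seed[i] and sends it to its successor (i + 1) % 3. */
void setup_ring(party_t p[3], const uint64_t seed[3]) {
    for (int i = 0; i < 3; i++) {
        p[i].local.state = seed[i];
        p[(i + 1) % 3].remote.state = seed[i];
    }
}

/* sz = rand_local.get_next() - rand_remote.get_next(), modulo 2^64. */
uint64_t next_zero_share(party_t *p) {
    return prg_next(&p->local) - prg_next(&p->remote);
}

/* Returns 1 if the three shares sum to zero in each of `rounds` draws. */
int zero_sharing_holds(uint64_t s0, uint64_t s1, uint64_t s2, int rounds) {
    assert(rounds > 0);
    const uint64_t seed[3] = {s0, s1, s2};
    party_t p[3];
    setup_ring(p, seed);
    for (int k = 0; k < rounds; k++) {
        uint64_t a = next_zero_share(&p[0]);
        uint64_t b = next_zero_share(&p[1]);
        uint64_t c = next_zero_share(&p[2]);
        if ((uint64_t)(a + b + c) != 0) return 0;
    }
    return 1;
}
```

Each generator output appears once with a plus sign (at the party that owns the seed) and once with a minus sign (at its successor), so the shares cancel without any communication after the initial seed exchange. For boolean shares, the same idea would use XOR in place of subtraction.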

Configurable batching. Primitives and relational operators in Secrecy operate in batched mode, that is, they provide the ability to process multiple table rows in bulk and batch independent messages into a single round of communication (cf. Section 4.3.3). The batch size is configurable and allows Secrecy to compute expensive operators, such as joins, with full control over memory requirements. While batching does not reduce the total number of operations, we leverage it to compute on large inputs without running out of memory or switching to an expensive disk-based evaluation.
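To make the memory argument concrete, the following plaintext sketch in C evaluates the 𝑛 · 𝑚 comparisons of a join in batches of configurable size. It is an illustration under stated assumptions rather than Secrecy's implementation: `eq_batch` is a hypothetical stand-in for a batched secure equality primitive, it works on cleartext keys instead of shares, and `batched_join_count` and `demo_join` are names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t matches;  /* pairs (i, j) with equal keys */
    size_t batches;  /* number of batches processed */
} join_stats_t;

/* Stand-in for a batched equality: fills out[k] for pairs first .. first+len-1,
   enumerated in row-major order over the n x m cartesian product. */
static void eq_batch(const uint64_t *r, const uint64_t *s, size_t m,
                     size_t first, size_t len, uint8_t *out) {
    for (size_t k = 0; k < len; k++) {
        size_t i = (first + k) / m, j = (first + k) % m;
        out[k] = (uint8_t)(r[i] == s[j]);
    }
}

/* Evaluates all n*m comparisons while holding at most `batch` results in memory. */
join_stats_t batched_join_count(const uint64_t *r, size_t n,
                                const uint64_t *s, size_t m, size_t batch) {
    join_stats_t st = {0, 0};
    size_t total = n * m;
    assert(batch > 0);
    if (total == 0) return st;
    uint8_t *buf = (uint8_t *)malloc(batch);
    if (buf == NULL) return st;
    for (size_t first = 0; first < total; first += batch) {
        size_t len = (total - first < batch) ? total - first : batch;
        eq_batch(r, s, m, first, len, buf);
        for (size_t k = 0; k < len; k++) st.matches += buf[k];
        st.batches++;
    }
    free(buf);
    return st;
}

/* Small fixed example: 4 x 3 = 12 pairs, three of which match. */
join_stats_t demo_join(size_t batch) {
    const uint64_t r[4] = {1, 2, 3, 4};
    const uint64_t s[3] = {2, 4, 4};
    return batched_join_count(r, 4, s, 3, batch);
}
```

The total number of comparisons is the same for every batch size; only the peak buffer size and the number of batches change, which mirrors the trade-off described in the text.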

Our experimental evaluation is structured into five parts:

Performance on real and synthetic queries. In Section 6.2, we evaluate Secrecy on eight real and synthetic queries. We show that Secrecy’s implementation is efficient and its optimizations effectively reduce the runtime of complex queries by up to three orders of magnitude. In contrast to the baseline plans that fail to scale for inputs beyond a few thousand records, Secrecy can process hundreds of thousands and up to millions of input rows, entirely under MPC, in reasonable time.

Comparison with state-of-the-art frameworks. In Section 6.3, we compare Secrecy with two state-of-the-art MPC frameworks: SMCQL [13] and EMP [91]. We show that Secrecy outperforms both of them and can comfortably process much larger datasets within the same amount of time.

Benefits of optimizations. In Section 6.4, we evaluate the benefits of Secrecy’s logical, physical, and secret-sharing optimizations on the three queries of Sections 4.2-4.4. Our results demonstrate that pushing down blocking operators reduces execution time by up to 1000× and enables queries to scale to 100× larger inputs. Further, we show that operator fusion and Secrecy’s dual sharing improve execution time by 2×.

Performance of relational operators. In Section 6.5, we present performance results for individual relational operators. We show that Secrecy’s batched operator implementations are efficient and that by properly adjusting the batch size, they can comfortably scale to millions of input rows without running out of memory.

Micro-benchmarks. Finally, in Section 6.6, we drill down and evaluate individual secure computations and communication primitives that relational operators rely upon. We empirically verify the theoretical cost analysis of Section 2.3, evaluate the scalability of primitives, and quantify the positive effect that message batching has on the performance of communication-heavy operations.

Figure 6: Performance gains of Secrecy’s optimizations over baseline plans for real and synthetic queries. Logical and physical optimizations result in over × lower execution times, while secret-sharing optimizations improve query performance by up to ×.

We run all experiments on a three-node cluster of VMs in the Massachusetts Open Cloud (MOC) [2]. Each VM has 32GB of memory and 16 vCPUs and runs Ubuntu, C99, gcc 5.4.0, and MPICH 1.4. Each MPC party is assigned to a different VM and runs as a single MPI process. For the purpose of our experiments, we designate one party as the data owner that distributes shares and reveals results at the end of the computation. Reported measurements are averaged over at least three runs and are plotted in log-scale, unless otherwise specified.

Queries. We use 11 queries in total. Five of them are real-world queries that have also been used in previous MPC works [13–15, 79, 88]. We use the three medical queries from [13] (Comorbidity, Recurrent C.Diff., and Aspirin Count) and two queries from different MPC application areas [79]: the first query (Password Reuse) asks for users with the same password across different websites, while the second (Credit Score) asks for persons whose credit scores across different agencies have significant discrepancies in a particular year. To showcase the applicability of our optimizations in other domains, we also use three TPC-H queries (Q4, Q6, Q13) that include aggregations along with selections or joins (in Q13 we replace LIKE with an equality since the former is not yet supported by Secrecy). Finally, to evaluate the performance gains from each optimization in isolation, we use the three example queries (Q1, Q2, Q3) of Sections 4.2-4.4.

Input data. In all experiments, we use randomly generated tables with 64-bit values. Note that the MPC protocols we use assume a fixed-size representation of shares. The data representation size is implementation-specific and could be increased to any 2^𝑘 value without modifying the protocols. We also highlight that using randomly generated inputs is no different than using real data, as all operators are oblivious and the data distribution does not affect the amount of computation or communication. No matter whether the input values are real or random, parties compute on shares, which are by definition random.

Figure 7: Scaling behavior of optimized real and synthetic queries on Secrecy. Panels: (a) Category A, (b) Category B, (c) Category C.

In this section, we evaluate Secrecy’s performance on eight queries with and without the optimizations of Section 4. For each query, we implement both the optimized and the non-optimized (baseline) plan using Secrecy’s efficient batched operators. Although this favors the baseline, the communication cost of MPC is prohibitive without message batching and queries cannot scale to more than a few hundred input rows in reasonable time.

Comparison with baseline. We execute each query plan with 1K rows per input relation and present the results in Figure 6. For Comorbidity, we use a cohort of 256 patients. For Q4 (resp. Q13), we use 1K rows for LINEITEM (resp. ORDERS) and maintain the size ratio with the other input relation as specified in the TPC-H benchmark. The optimized plans for Recurrent C.Diff., Aspirin Count, and Q13 achieve the highest speedups over non-optimized plans, that is, 1868×, 134×, and 6486× lower execution times respectively. Optimized plans for these queries leverage logical and physical optimizations to push blocking operators before joins (Aspirin Count), fuse operators (Recurrent C.Diff.), or decompose join with aggregation (Q13). The optimized plans for Comorbidity, Password Reuse, Q4, and Q6 leverage secret sharing optimizations that result in up to 71× lower execution times compared to non-optimized plans. Finally, the Credit Score query leverages dual sharing optimizations, which, in this case, do not provide significant performance improvement.

Scaling behavior. We now run the optimized plans with increasing input sizes and measure total execution time. For these experiments, we group queries into three categories of increasing complexity. Category A includes queries with selections and global aggregations, Category B includes queries with select and group-by or distinct operators, and Category C includes queries with select, group-by and (semi-)join operators. Figure 7 presents the results.

The only query that falls in Category A is Q6. This query includes five selections plus a global aggregation and requires very limited inter-party communication that does not depend on the size of the input relation. As a result, Q6 scales comfortably to large inputs and takes a bit less than 13 s for 8M rows. Queries in Category B scale to millions of input rows as well, but with higher execution times compared to Q6. The cost of queries in this category is dominated by the oblivious group-by and distinct operators that rely on oblivious sort. For large inputs, the most expensive of the four queries is Recurrent C.Diff., which completes in ∼ ℎ for 2M input rows.

Finally, queries in Category C scale to tens or hundreds of thousands of input rows, depending on the particular operators in the plan. The cost of queries in this category is dominated by the oblivious join and semi-join operators. All three queries have two input relations but with different size ratios: for Q4 and Q13, we use the ratio specified in the TPC-H benchmark whereas for Aspirin Count we use inputs of equal size. For each query in Figure 7c, we start with 1K rows for the smaller input relation (scaling factor 1×) and increase the size of the two inputs up to 32×, always keeping their ratio fixed. The most expensive query is Aspirin Count, as it includes an oblivious 𝜃-join with both equality and inequality predicates. Recall that join needs to perform 𝑂(𝑛 · 𝑚) comparisons, that is, over 1B for 64K rows (32K per input). Nevertheless, due to Secrecy’s ability to push down blocking operators and perform joins in batches, it successfully completes in ∼ . ℎ. Q4 requires ∼ ℎ on 164K rows, and Q13 is able to complete in ∼ . ℎ on 295K rows due to the join-aggregation decomposition.

While MPC protocols remain highly expensive for real-time queries, our results demonstrate that offline collaborative analytics on medium-sized datasets entirely under MPC are viable. To the best of our knowledge, Secrecy is the first framework capable of evaluating real-world queries on inputs of such scale, while ensuring no information leakage and no reliance on trusted hardware.

In this section, we compare Secrecy with two state-of-the-art MPC frameworks: SMCQL [13] and the 2-party semi-honest version of EMP [91]. We choose SMCQL (the ORAM-based version) as the only open-source relational framework with a semi-honest model and no information leakage (cf. Table 1). More recent systems, such as Shrinkwrap [14], SAQE [15], and a new version of SMCQL, although not publicly available, build on top of EMP. Senate [79] also relies on EMP, albeit its malicious version.

Comparison with SMCQL. In the first set of experiments, we aim to reproduce the results presented in the SMCQL paper (Figure 7) [13] on our experimental setup. We run the three medical queries on SMCQL and Secrecy, using a sample of 25 tuples per data owner (50 in total), and present the results in Table 4. We use the plans and default configuration of protected and public attributes, as in the SMCQL project repository (https://github.com/smcql/smcql). As we can see, Secrecy is over 2000× faster than SMCQL in all queries, even though SMCQL pushes operators outside the MPC boundary by allowing data owners to execute part of the computation on their plaintext data. In the SMCQL experiment, each computing party is also a data owner and, although it provides 25 tuples per relation to a query, only 8 of those enter the oblivious part of the plan; the rest are filtered out before entering the MPC circuit.

Figure 8: Performance comparison between EMP and Secrecy on oblivious equi-join (left) and sort (right). Secrecy evaluates the join on 𝐾 rows per input in ℎ whereas EMP requires . ℎ for 𝐾 rows per input. Secrecy is up to . × faster than EMP on oblivious sort.

            Comorbidity    Recurrent C. Diff.    Aspirin Count
SMCQL       𝑠              𝑠                     𝑠
Secrecy     . 𝑠            . 𝑠                   . 𝑠

Table 4: SMCQL and Secrecy execution times for the three medical queries of Section 4.5 on 25 tuples per input relation.

Comparison with EMP. EMP is a general-purpose MPC framework and does not provide implementations of relational operators out-of-the-box. For this set of experiments, we implemented an equi-join operator using the sample program available in the SoK project and we also use the oblivious sort primitive provided in the EMP repository. Figure 8 presents the results. For joins, we use inputs of the same cardinality (𝑛 = 𝑚) and increase the size from 10K to 100K rows per input. We cap the time of these experiments to 15 h. Within the experiment duration, EMP can evaluate joins on up to 40K rows per input (in 14 . ℎ). Secrecy is 7 . × faster for the same input size and can process up to 100K rows per input in a bit less than 12 h. The performance gap between Secrecy and EMP on oblivious sort is less dramatic but still considerable. In this case, both frameworks scale to much larger inputs and Secrecy is up to 1 . × faster (3 . ℎ vs 4 . ℎ for 4M input rows).

We now use the example queries of Section 4 (Q1, Q2, Q3) to evaluate the performance impact of Secrecy’s optimizations. We run each query with and without the particular optimization and measure total execution time. The results are shown in Figure 9.

Distinct-Join reordering. Q1 applies DISTINCT to the result of an equi-join. The baseline plan executes the oblivious join first, then sorts the materialized cartesian product 𝑅 × 𝑆 and applies DISTINCT. In the optimized plan, DISTINCT is pushed before the JOIN and, thus, Secrecy sorts a relation of 𝑛 rows instead of 𝑛². Figure 9a shows that the optimized plan is up to two orders of magnitude faster than the baseline, which runs out of memory for even modest input sizes.

Join-Aggregation decomposition. Q2 performs a grouped aggregation on the result of an equi-join. The baseline plan performs the join first, materializes the result, and then applies the grouping and aggregation. Instead, the optimized plan decomposes the aggregation in two phases (cf. Section 4.2.3) and transforms the equi-join into a pipelined semi-join. As shown in Figure 9b, this optimization provides up to three orders of magnitude lower execution time than that of the baseline plan. Further, the materialized join causes the baseline plan to run out of memory for inputs larger than 1K rows.

SoK sample program: https://github.com/MPC-SoK/frameworks/blob/master/emp/sh_test/test/xtabs.cpp
EMP repository: https://github.com/emp-toolkit/emp-sh2pc

Figure 9: Benefits of optimizations on Secrecy. Panels: (a) Distinct-Join reordering in Q1, (b) Join-Aggreg. decomposition in Q2, (c) Select-Distinct fusion in Q3, (d) Dual sharing in Group-by-count. Operator reordering (a) and decomposition (b) result in over × lower execution times compared to alternative plans, while physical (c) and secret-sharing (d) optimizations improve performance by up to ×. All optimizations enable Secrecy to scale to much larger inputs (up to ×) without running out of memory.

Operator fusion. Q3 applies DISTINCT on the result of a selection. The baseline plan applies the oblivious selection and then sorts its output and applies DISTINCT sequentially. As we explain in Section 4.3.2, Secrecy fuses the two operators and performs the DISTINCT computation in bulk. Figure 9c (plot in linear scale) shows that this optimization provides up to 2× speedup for large inputs.

Dual sharing. We also evaluate Secrecy’s ability to switch between arithmetic and boolean sharing to reduce communication costs for certain operations. For this experiment, we compare the run-time of the optimized GROUP-BY-COUNT operator (Section 4.4) to that of a baseline operator that uses boolean sharing only and, hence, relies on the ripple-carry adder to compute the COUNT. Figure 9d plots the results. The baseline operator is 2× slower than the optimized one, as it requires 64 additional rounds of communication per input row.

The next set of experiments evaluates the performance of oblivious relational operators in Secrecy. We perform DISTINCT, GROUP-BY, ORDER-BY, IN, and JOIN (equality and range) on relations of increasing size and measure the total execution time per operator. We empirically verify the cost analysis of Section 3 and show that our batched implementations are efficient and scale to millions of input rows with a single thread. Figure 10 shows the results.
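ORDER-BY, and the sorting phase of DISTINCT and GROUP-BY, rely on bitonic sort (cf. Figure 5). As a point of reference for the measurements, here is a minimal plaintext sketch in C of a bitonic sorting network with a branch-free compare-exchange. It is an illustration only: it sorts cleartext 64-bit keys rather than secret shares, it assumes the input length is a power of two, and the names `cmp_swap`, `bitonic_sort_plain`, and `demo_bitonic` are hypothetical and unrelated to Secrecy's `bitonic_sort` API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free compare-exchange: afterwards *a <= *b if asc, else *a >= *b. */
static void cmp_swap(uint64_t *a, uint64_t *b, int asc) {
    uint64_t x = *a, y = *b;
    uint64_t swap = (uint64_t)((x > y) == (asc != 0)); /* 1 if out of order */
    uint64_t mask = (uint64_t)0 - swap;                /* all ones or zero */
    uint64_t d = (x ^ y) & mask;
    *a = x ^ d;
    *b = y ^ d;
}

/* Sorts v[0..n-1] in ascending order; n must be a power of two.
   Returns the number of levels; the n/2 compare-exchanges of one level are
   independent of each other, so they could be evaluated as a single batch. */
size_t bitonic_sort_plain(uint64_t *v, size_t n) {
    size_t levels = 0;
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t k = 2; k <= n; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            for (size_t i = 0; i < n; i++) {
                size_t l = i ^ j;
                /* Which pairs are compared depends only on indices, never on data. */
                if (l > i) cmp_swap(&v[i], &v[l], (i & k) == 0);
            }
            levels++;
        }
    }
    return levels;
}

/* Sorts a fixed 8-element array; returns the level count, or 0 if unsorted. */
size_t demo_bitonic(void) {
    uint64_t v[8] = {7, 3, 9, 1, 4, 4, 0, 8};
    size_t levels = bitonic_sort_plain(v, 8);
    for (size_t i = 1; i < 8; i++)
        if (v[i - 1] > v[i]) return 0;
    return levels;
}
```

For 𝑛 = 8 the network has log 𝑛 · (log 𝑛 + 1) / 2 = 6 levels. The sequence of index pairs is fixed in advance, which is what makes the algorithm suitable for oblivious execution once the comparison itself is replaced by a secure primitive.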

Unary operators. In Figure 10a, we plot the execution time of unary operators vs the input size. Recall from Section 3.1 that DISTINCT and GROUP-BY are both based on sorting and, thus, their cost includes the cost of ORDER-BY for unsorted inputs of the same cardinality. To shed more light on the performance of DISTINCT and GROUP-BY, Figure 10a only shows the execution time of their second phase, that is, after the input is sorted and, for GROUP-BY, before the final shuffling (which has identical performance to sorting). For an input relation with 𝑛 rows, DISTINCT performs 𝑛 − 1 equality comparisons on adjacent rows, and our batched implementation applies DISTINCT to the entire input in six rounds of communication (the number of rounds required for oblivious equality on pairs of 64-bit shares). As a result, DISTINCT scales well with the input size and can process 10M rows in 45 s. GROUP-BY is slower than DISTINCT, as it requires significantly more rounds of communication, linear to the input size. Finally, ORDER-BY relies on our implementation of bitonic sort, where all 𝑛 comparisons at each level are batched within the same communication round.

Joins. The oblivious join operators in Secrecy hide the size of their output, thus, they compute the cartesian product between the two input relations and produce a bit share for all pairs of records, resulting in an output with 𝑛 · 𝑚 entries. We run both operators with 𝑛 = 𝑚, for increasing input sizes, and plot the results in Figure 10b. The figure includes equi-join results for up to 100K rows per input and range-join results for up to 40K rows per input, as we capped the duration of this experiment to 15 h. Secrecy executes joins in batches without materializing their entire output at once. As a result, it can perform 10B equality comparisons and 1 . 𝐵 inequality comparisons under MPC within the experiment duration limit.

We also run experiments with semi-joins (IN) and present the results in Figure 10c. In this case, we vary the left and right input sizes independently, as they affect the cost of the semi-join differently. Each line corresponds to an experiment where we keep one of the inputs fixed to 1K rows and increase the size of the other input from 1K to 1M rows (in powers of two). The two lines overlap when inputs are small (up to 256K rows) but they diverge significantly for larger inputs. This difference arises because the number of communication rounds in the semi-join depends only on the size of the right input (cf. Table 2). Although a semi-join between 1M (left) and 1K (right) rows incurs the same asymptotic number of operations as a semi-join between 1K (left) and 1M (right) rows, the latter has a higher synchronization cost, which in practice causes a latency increase of ∼ 𝑠.

To better understand the results of the previous sections, we now use a set of micro-benchmarks and evaluate the performance of Secrecy’s MPC primitives.
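One of the primitives evaluated in this section, boolean addition with a ripple-carry adder, illustrates why the choice of sharing matters. The following plaintext sketch in C is an illustration under stated assumptions, not Secrecy's code: it simulates the adder on cleartext bits, counts one sequential AND step per bit (under MPC, each AND on boolean shares needs communication), and contrasts it with additive sharing modulo 2^64, where addition is a local operation. All function names are hypothetical, and the fixed share values stand in for random ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds a and b using only XOR and AND, one bit position at a time.
   The carry into bit i+1 depends on the carry into bit i, so the 64 AND
   steps are sequential. */
uint64_t ripple_carry_add(uint64_t a, uint64_t b, unsigned *and_steps) {
    uint64_t sum = 0, carry = 0;
    unsigned steps = 0;
    assert(and_steps != NULL);
    for (unsigned i = 0; i < 64; i++) {
        uint64_t ai = (a >> i) & 1, bi = (b >> i) & 1;
        sum |= (ai ^ bi ^ carry) << i;
        /* Majority of (ai, bi, carry) with a single AND. */
        carry = carry ^ ((ai ^ carry) & (bi ^ carry));
        steps++;
    }
    *and_steps = steps;
    return sum;
}

/* With additive shares modulo 2^64, each party adds its own shares locally. */
void add_arith_shares(const uint64_t x[3], const uint64_t y[3], uint64_t z[3]) {
    for (int i = 0; i < 3; i++) z[i] = x[i] + y[i];
}

/* Returns 1 if the adder agrees with native addition and used 64 steps. */
int adder_matches(uint64_t a, uint64_t b) {
    unsigned steps = 0;
    uint64_t s = ripple_carry_add(a, b, &steps);
    return s == (uint64_t)(a + b) && steps == 64;
}

/* Returns 1 if locally added shares reconstruct to a + b (mod 2^64). */
int arith_add_matches(uint64_t a, uint64_t b) {
    uint64_t x[3] = {a - 11 - 22, 11, 22}; /* fixed stand-ins for random shares */
    uint64_t y[3] = {b - 33 - 44, 33, 44};
    uint64_t z[3];
    add_arith_shares(x, y, z);
    return (uint64_t)(z[0] + z[1] + z[2]) == (uint64_t)(a + b);
}
```

The contrast matches the reported numbers: boolean addition needs 64 rounds, whereas an operator that keeps its counter in arithmetic shares can add without any interaction.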

Effect of message batching on communication latency. In the first experiment, we measure the latency of inter-party communication using two messaging strategies. Recall that, during a message exchange, each party sends one message to its successor and receives one message from its predecessor on the ‘ring’. Eager exchanges data among parties as soon as they are generated, thus, producing a large number of small messages. The Batched strategy, on the other hand, collects data into batches and exchanges them only when computation cannot otherwise make progress, thus, producing as few as possible, albeit large messages.

We run this experiment with increasing data sizes and measure the total time from initiating the exchange until all parties complete the exchange. Figure 11a shows the results. We see that batching provides two to three orders of magnitude lower latency than eager messaging. Using batching in our experimental setup, parties can exchange 100 𝑀 𝑠. These results reflect the network performance in our cloud testbed. We expect better performance in dedicated clusters with high-speed networks and higher latencies if the computing parties communicate over the internet.

Performance of secure computation primitives. We now evaluate the performance of oblivious primitives that require communication among parties. These include equality, inequality, and addition with the ripple-carry adder. In Figure 11b we show the execution time of oblivious primitives as we increase the input size from 1K rows to 10M rows. All primitives scale well with the input size as they all depend on a constant number of communication rounds. Equality requires six rounds. Inequality requires seven rounds and more memory than equality. Boolean addition is not as memory- and computation-intensive as inequality, but requires a higher number of rounds (64).

Figure 10: Performance of oblivious relational operators in Secrecy. Panels: (a) Unary operators, (b) Join operators, (c) Semi-join operator.

Figure 11: Performance of oblivious primitives in Secrecy. Panels: (a) Eager vs. batched communication, (b) Comparison and addition.

Enclave-based approaches. In this line of work, parties apply the oblivious operators on the actual data (rather than secret shares) within a physically-protected environment, such as a trusted server, a hardware enclave or a cryptographic coprocessor. This is a fundamentally different approach to achieve security, where parties send their (encrypted) data to other parties and the oblivious computation happens inside the trusted environment without paying the communication cost of MPC (cf. Table 2). In Secrecy, computing parties execute an identical computation on commodity hardware and must communicate multiple times to apply each operator, thus, the main objective is to optimize communication. By contrast, the main objectives within enclave-based approaches are to operate with a small amount of RAM, properly pad intermediate query results, and hide access patterns while reading/writing (encrypted) data from/to untrusted storage. The theoretical works by Agrawal et al. [5] and Arasu et al. [10] focus on secure database queries in this setting. ObliDB [36], Opaque [98], and StealthDB [87] are three recent systems that rely on secure hardware (e.g. Intel’s SGX) to support a wide range of database operators, including joins and group-by with aggregation. OCQ [31] builds on Opaque and introduces additional optimizations that reduce intermediate result padding by leveraging foreign-key constraints between private and public relations. Enclave-based systems typically achieve better performance than MPC-based systems but they require different trust assumptions (as an alternative to cryptography) and are susceptible to various side-channel attacks, including branching [65], cache-timing [21, 47, 90], and other attacks [23, 24, 64, 93].

ORAM-based approaches. Oblivious RAM [45, 46] allows for compiling arbitrary programs into oblivious ones by carefully distorting access patterns to eliminate leaks. ORAM-based systems like SMCQL [13] and Obladi [28] hide access patterns but the flexibility of ORAM comes at high cost to throughput and latency. Two-server distributed ORAM systems like Floram [33] and SisoSPIR [56] are faster but require the same non-collusion assumption as in this work. Secrecy does not rely on ORAM; instead, we implement specific database operators with a data-independent control flow.
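To illustrate what a data-independent control flow looks like for a concrete operator, the sketch below implements the second phase of DISTINCT (after sorting) on cleartext keys in C. It is a plaintext illustration under stated assumptions, not Secrecy's implementation: in an MPC setting the equality test would be a secure primitive over shares and the `keep` bits would remain secret-shared. The names `distinct_sorted` and `demo_distinct` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks the first occurrence of each key in a sorted array.
   The loop touches every row exactly once and performs the same operations
   regardless of the values; the output always has n entries, so neither
   the control flow nor the output size depends on the data. */
size_t distinct_sorted(const uint64_t *v, size_t n, uint8_t *keep) {
    size_t count = 0;
    assert(n == 0 || (v != NULL && keep != NULL));
    for (size_t i = 0; i < n; i++) {
        uint64_t prev = v[i - (i > 0)];          /* v[i] itself when i == 0 */
        uint8_t differs = (uint8_t)(v[i] != prev);
        keep[i] = (uint8_t)(differs | (i == 0)); /* first row is always kept */
        count += keep[i];
    }
    return count; /* returned here only to check the sketch */
}

/* Returns the number of distinct keys, or 0 if the marking is wrong. */
size_t demo_distinct(void) {
    const uint64_t v[7] = {1, 1, 2, 5, 5, 5, 9};
    const uint8_t expected[7] = {1, 0, 1, 1, 0, 0, 1};
    uint8_t keep[7];
    size_t count = distinct_sorted(v, 7, keep);
    for (size_t i = 0; i < 7; i++)
        if (keep[i] != expected[i]) return 0;
    return count;
}
```

Because the 𝑛 − 1 adjacent comparisons are independent of one another, they can all be issued together, which is consistent with DISTINCT needing only the rounds of a single batched equality.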

Hybrid query processing. In addition to the frameworks in Table 1, two other works that employ hybrid query execution and let data owners execute as many operators as possible on their plaintext data are those by Aggarwal et al. [4] and Chow et al. [27]. The latter also leverages a semi-trusted party that learns metadata and must not collude with any other party.

Oblivious operators. Related works in the cryptographic and database communities focus on standalone oblivious operators, e.g. building group-by from oblivious sorting [57], building equi-joins [6, 62, 69, 77], or calculating common aggregation operators like MIN, MAX, SUM, and AVG [35]. Our work is driven by real-world applications that typically require oblivious evaluation of queries with multiple operators. Two recent works in this direction are [22, 55]; however, they focus on specific queries and do not employ any of the optimizations we introduce in this paper.

Outsourced databases. Secure database outsourcing is an active area of research and there are many approaches proposed in the literature. Existing practical solutions [42] use “leaky” cryptographic primitives that reveal information to the database server. Systems based on property-based encryption like CryptDB [80] offer full SQL support and legacy compliance, but each query reveals information that can be used in reconstruction attacks [48, 60, 71]. Systems based on structural encryption [58] like Arx [78], BlindSeer [73], and OSPIR-OXT [25] provide semantic security for data at rest and better protection, but do not eliminate access pattern leaks. SDB [51, 92] uses secret-sharing in the typical client-server model but its protocol leaks information to the database server. KafeDB [97] uses a new encryption scheme that leaks less information compared to prior works. Finally, Cipherbase [9] is a database system that relies on a secure coprocessor (trusted machine).

FHE-based approaches. Fully Homomorphic Encryption (FHE) protocols [43] allow arbitrary computations directly on encrypted data with strong security guarantees. Although many implementations exist [12, 34, 44, 54, 67, 82], this approach is still too computationally expensive for the applications we consider in this work.

Differential privacy. Systems like DJoin [70], DStress [72], and the work of He et al. [50] use the concept of differential privacy to ensure that the output of a query reveals little about any one input record. This property is independent of (yet symbiotic with) MPC’s security guarantee that the act of computing the query reveals no more than what may be inferred from its output, and Secrecy could be augmented to provide differentially private outputs if desired.

Shrinkwrap [14] and SAQE [15] achieve better efficiency by relaxing security for the computing parties only up to differentially private leakage. This is effectively the same guarantee as above when the computing and result parties are identical, but is weaker when they are different. For this reason, Secrecy does not leak anything to computing parties.

MPC frameworks. The recent advances in MPC have given rise to many practical general-purpose MPC frameworks like ABY [32], ABY3 [68], Jiff [20], Obliv-C [96], ObliVM [66], SCALE-MAMBA [63], and ShareMind [18]; we refer readers to Hastings et al. [49] for an overview of these frameworks. Some of these frameworks support standalone database operators (e.g. [11, 18, 68]) but do not address query costs under MPC. Splinter [89] uses function secret sharing to protect private queries on public data. This system supports a subclass of SQL queries that do not include private joins.

WHAT’S NEXT?

We see several exciting research directions for the database and systems communities:

MPC query optimizers. Several of our examples showcase that optimal plans in a cleartext evaluation are not necessarily optimal under MPC (and vice versa). Building robust MPC query optimizers that take into account alternative oblivious operators and public information about the data schema is a promising research avenue. The optimizations in Section 4 are by no means exhaustive and there are many opportunities for continued research in this space. For example, Krastnikov et al. [62] and Mohassel et al. [69] recently introduced oblivious algorithms for joins on unique keys with linear (rather than quadratic) worst-case runtime. These algorithms could be extended to avoid materializing intermediate state and applied to other settings like foreign-key joins.

Parallelism and oblivious hashing. Task and data parallelism offer the potential for improved performance and scalability. Extending oblivious operators to work in a task-parallel fashion is straightforward (e.g. for bitonic sort) but data-parallel execution requires additional care. In a plaintext data-parallel computation, data are often partitioned using hashing: the data owners agree on a hash function 𝑓 and hash the input records into buckets, so that subsequent join and group-by operations only need to compare records within the same bucket. In MPC, data parallelism can be achieved via oblivious hashing, with care taken to ensure that the bucket sizes do not reveal the data distribution or access patterns. Indeed, many private set intersection algorithms leverage this technique in a setting where the input and computing parties are identical [76]. To achieve better load balancing of keys across buckets and keep the bucket size low, one can use Cuckoo hashing, as in [75, 77]. It is an interesting direction to design oblivious hashing techniques in the outsourced setting, where data owners generate and distribute secret shares along with their corresponding bucket IDs to reduce the cost of oblivious join and group-by operators.

Efficient MPC primitives and HW acceleration. There exist opportunities to improve upon the efficiency of the underlying MPC building blocks used in our operators. First, while we strove to minimize Secrecy’s codebase and thus to repurpose oblivious bitonic sort for as many operators as possible, one can achieve even better performance by adding support for more primitives, e.g. a fast oblivious shuffle with linear (rather than quasi-linear) work and constant rounds. Second, while Secrecy takes a software-only approach, one could implement special MPC primitives on modern hardware [38–41, 53, 85, 86] to further improve computation and communication latency.

Malicious security. While the current work focuses on semi-honest security, it provides a strong foundation for achieving malicious security in the future. Secrecy protects data using the replicated secret sharing scheme of Araki et al. [8], which can be extended to provide malicious security with low computational cost [7]. By optimizing MPC rather than sidestepping it, our approach has an advantage over prior work [79]: we do not need to take additional non-trivial measures to protect the integrity of local pre-processing steps.

ACKNOWLEDGMENTS

The authors are grateful to Kinan Dak Albab, Azer Bestavros, and Ben Getchell for their valuable feedback, and to the Mass Open Cloud for providing access to their cloud for experiments. The fourth author’s work is supported by the DARPA SIEVE program under Agreement No. HR00112020021 and the National Science Foundation under Grants No. 1414119, 1718135, 1801564, and 1931714.

REFERENCES

[1] CrypTen. https://github.com/facebookresearch/CrypTen. Last access: January 2021.

[2] Massachusetts Open Cloud. https://massopen.cloud/. Last access: January 2021.

[3] Daniel Abadi, Anastasia Ailamaki, David Andersen, Peter Bailis, Magdalena Balazinska, Philip Bernstein, Peter Boncz, Surajit Chaudhuri, Alvin Cheung, AnHai Doan, Luna Dong, Michael J. Franklin, Juliana Freire, Alon Halevy, Joseph M. Hellerstein, Stratos Idreos, Donald Kossmann, Tim Kraska, Sailesh Krishnamurthy, Volker Markl, Sergey Melnik, Tova Milo, C. Mohan, Thomas Neumann, Beng Chin Ooi, Fatma Ozcan, Jignesh Patel, Andrew Pavlo, Raluca Popa, Raghu Ramakrishnan, Christopher Ré, Michael Stonebraker, and Dan Suciu. 2020. The Seattle Report on Database Research. SIGMOD Rec. 48, 4 (Feb. 2020), 44–53. https://doi.org/10.1145/3385658.3385668

[4] Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-Molina, Krishnaram Kenthapadi, Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas, and Ying Xu. 2005. Two Can Keep a Secret: A Distributed Architecture for Secure Database Services. In The Second Biennial Conference on Innovative Data Systems Research (CIDR 2005). http://ilpubs.stanford.edu:8090/659/

[5] Rakesh Agrawal, Dmitri Asonov, Murat Kantarcioglu, and Yaping Li. 2006. Sovereign Joins. In Proceedings of the 22nd International Conference on Data Engineering (ICDE ’06). IEEE Computer Society, USA, 26. https://doi.org/10.1109/ICDE.2006.144

[6] Rakesh Agrawal, Alexandre Evfimievski, and Ramakrishnan Srikant. 2003. Information Sharing across Private Databases. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, California) (SIGMOD ’03). Association for Computing Machinery, New York, NY, USA, 86–97. https://doi.org/10.1145/872757.872771

[7] Toshinori Araki, Assi Barak, Jun Furukawa, Tamar Lichter, Yehuda Lindell, Ariel Nof, Kazuma Ohara, Adi Watzman, and Or Weinstein. 2017. Optimized Honest-Majority MPC for Malicious Adversaries – Breaking the 1 Billion-Gate Per Second Barrier. In Proceedings of the 38th IEEE Symposium on Security and Privacy (SP). 843–862. https://doi.org/10.1109/SP.2017.15

[8] Toshinori Araki, Jun Furukawa, Yehuda Lindell, Ariel Nof, and Kazuma Ohara. 2016. High-Throughput Semi-Honest Secure Three-Party Computation with an Honest Majority. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS) (Vienna, Austria). 805–817. https://doi.org/10.1145/2976749.2978331

[9] Arvind Arasu, Spyros Blanas, Ken Eguro, Raghav Kaushik, Donald Kossmann, Ravi Ramamurthy, and Ramarathnam Venkatesan. 2013. Orthogonal Security With Cipherbase. In

Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014, Nicole Schweikardt, Vassilis Christophides, and Vincent Leroy (Eds.). OpenProceedings.org, 26–37. https://doi.org/10.5441/002/icdt.2014.07

[11] David W. Archer, Dan Bogdanov, Yehuda Lindell, Liina Kamm, Kurt Nielsen, Jakob Illeborg Pagter, Nigel P. Smart, and Rebecca N. Wright. 2018. From Keys to Databases - Real-World Applications of Secure Multi-Party Computation. Comput. J. 61, 12 (2018), 1749–1771.

[12] David W. Archer, José Manuel Calderón Trilla, Jason Dagit, Alex J. Malozemoff, Yuriy Polyakov, Kurt Rohloff, and Gerard W. Ryan. 2019. RAMPARTS: A Programmer-Friendly System for Building Homomorphic Encryption Applications. In WAHC@CCS. ACM, 57–68.

[13] Johes Bater, Gregory Elliott, Craig Eggen, Satyender Goel, Abel N. Kho, and Jennie Rogers. 2017. SMCQL: Secure Query Processing for Private Data Networks. Proc. VLDB Endow. 10, 6 (2017), 673–684. https://doi.org/10.14778/3055330.3055334

[14] Johes Bater, Xi He, William Ehrich, Ashwin Machanavajjhala, and Jennie Rogers. 2018. Shrinkwrap: efficient sql query processing in differentially private data federations. Proceedings of the VLDB Endowment 12, 3 (2018), 307–320.

[15] Johes Bater, Yongjoo Park, Xi He, Xiao Wang, and Jennie Rogers. 2020. SAQE: practical privacy-preserving approximate query processing for data federations. Proceedings of the VLDB Endowment 13, 12 (2020), 2691–2705.

[16] Dan Bogdanov, Liina Kamm, Baldur Kubo, Reimo Rebane, Ville Sokk, and Riivo Talviste. 2016. Students and Taxes: a Privacy-Preserving Study Using Secure Computation. Proceedings on Privacy Enhancing Technologies (PoPETS)

Computer Security - ESORICS 2008, 13th European Symposium on Research in Computer Security, Málaga, Spain, October 6-8, 2008. Proceedings (Lecture Notes in Computer Science, Vol. 5283), Sushil Jajodia and Javier López (Eds.). Springer, 192–206. https://doi.org/10.1007/978-3-540-88313-5_13

[18] Dan Bogdanov, Sven Laur, and Jan Willemson. 2008. Sharemind: A Framework for Fast Privacy-Preserving Computations. In Computer Security - ESORICS 2008, 13th European Symposium on Research in Computer Security, Málaga, Spain, October 6-8, 2008. Proceedings (Lecture Notes in Computer Science, Vol. 5283), Sushil Jajodia and Javier López (Eds.). Springer, 192–206. https://doi.org/10.1007/978-3-540-88313-5_13

[19] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. [n.d.]. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In ACM Conference on Computer and Communications Security. ACM, 1175–1191.

[20] Boston University. [n.d.]. Javascript Implementation of Federated Functionalities. https://github.com/multiparty/jiff. [Online; accessed September 2020].

[21] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. 2017. Software Grand Exposure: SGX Cache Attacks Are Practical. In

NDSS. The Internet Society.

[26] Surajit Chaudhuri and Kyuseok Shim. 1994. Including Group-By in Query Optimization. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago de Chile, Chile

Proceedings of the Network and Distributed System Security Symposium, NDSS 2009, San Diego, California, USA, 8th February - 11th February 2009

Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (Carlsbad, CA, USA) (OSDI’18). USENIX Association, USA, 727–743.

[29] Ivan Damgård, Kasper Damgård, Kurt Nielsen, Peter Sebastian Nordholt, and Tomas Toft. 2016. Confidential Benchmarking Based on Multiparty Computation. In Financial Cryptography (Lecture Notes in Computer Science, Vol. 9603). Springer, 169–187.

[30] Ivan Damgård, Matthias Fitzi, Eike Kiltz, Jesper Buus Nielsen, and Tomas Toft. 2006. Unconditionally Secure Constant-Rounds Multi-party Computation for Equality, Comparison, Bits and Exponentiation. In TCC (Lecture Notes in Computer Science, Vol. 3876). Springer, 285–304.

[31] Ankur Dave, Chester Leung, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. 2020. Oblivious coopetitive analytics using hardware enclaves. In EuroSys. ACM, 39:1–39:17.

[32] Daniel Demmler, Thomas Schneider, and Michael Zohner. 2015. ABY - A Framework for Efficient Mixed-Protocol Secure Two-Party Computation. In

CCS. ACM, 523–535.

[34] Duality Technologies. [n.d.]. PALISADE. https://gitlab.com/palisade/palisade-release. [Online; accessed September 2020].

[35] F. Emekci, D. Agrawal, A. E. Abbadi, and A. Gulbeden. 2006. Privacy Preserving Query Processing Using Third Parties. In . 27–27.

[36] Saba Eskandarian and Matei Zaharia. 2019. ObliDB: oblivious query processing for secure databases. Proceedings of the VLDB Endowment 13, 2 (2019), 169–183.

[37] David Evans, Vladimir Kolesnikov, and Mike Rosulek. 2018. A Pragmatic Introduction to Secure Multi-Party Computation. Found. Trends Priv. Secur. 2, 2-3 (2018), 70–246.

[38] Xin Fang, Stratis Ioannidis, and Miriam Leeser. 2017. Secure Function Evaluation Using an FPGA Overlay Architecture. In FPGA. ACM, 257–266.

[39] Xin Fang, Stratis Ioannidis, and Miriam Leeser. 2019. SIFO: Secure Computational Infrastructure Using FPGA Overlays. Int. J. Reconfigurable Comput.

SCN (Lecture Notes in Computer Science, Vol. 8642). Springer, 358–379.

[41] Tore Kasper Frederiksen and Jesper Buus Nielsen. 2013. Fast and Maliciously Secure Two-Party Computation Using the GPU. In ACNS (Lecture Notes in Computer Science, Vol. 7954). Springer, 339–356.

[42] Benjamin Fuller, Mayank Varia, Arkady Yerukhimovich, Emily Shen, Ariel Hamlin, Vijay Gadepally, Richard Shay, John Darby Mitchell, and Robert K. Cunningham. 2017. SoK: Cryptographically Protected Database Search. In . IEEE Computer Society, 172–191. https://doi.org/10.1109/SP.2017.10

[43] Craig Gentry. 2009. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing (Bethesda, MD, USA) (STOC ’09). Association for Computing Machinery, New York, NY, USA, 169–178. https://doi.org/10.1145/1536414.1536440

[44] Craig Gentry and Shai Halevi. 2011. Implementing Gentry’s Fully-Homomorphic Encryption Scheme. In Advances in Cryptology – EUROCRYPT 2011, Kenneth G. Paterson (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 129–148.

[45] O. Goldreich. 1987. Towards a Theory of Software Protection and Simulation by Oblivious RAMs. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing (New York, New York, USA) (STOC ’87). Association for Computing Machinery, New York, NY, USA, 182–194. https://doi.org/10.1145/28395.28416

[46] Oded Goldreich and Rafail Ostrovsky. 1996. Software Protection and Simulation on Oblivious RAMs. J. ACM 43, 3 (May 1996), 431–473. https://doi.org/10.1145/233551.233553

[47] Johannes Götzfried, Moritz Eckert, Sebastian Schinzel, and Tilo Müller. 2017. Cache Attacks on Intel SGX. In Proceedings of the 10th European Workshop on Systems Security (Belgrade, Serbia) (EuroSec’17). Association for Computing Machinery, New York, NY, USA, Article 2, 6 pages. https://doi.org/10.1145/3065913.3065915

[48] Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, and Kenneth G. Paterson. 2018. Pump up the Volume: Practical Database Reconstruction from Volume Leakage on Range Queries. In CCS. ACM, 315–331.

[49] Marcella Hastings, Brett Hemenway, Daniel Noble, and Steve Zdancewic. 2019. SoK: General Purpose Compilers for Secure Multi-Party Computation. In IEEE Symposium on Security and Privacy. IEEE, 1220–1237.

[50] Xi He, Ashwin Machanavajjhala, Cheryl J. Flynn, and Divesh Srivastava. 2017. Composing Differential Privacy and Secure Computation: A Case Study on Scaling Private Record Linkage. In ACM Conference on Computer and Communications Security. ACM, 1389–1406.

[51] Zhian He, Wai Kit Wong, Ben Kao, David Wai Lok Cheung, Rongbin Li, Siu Ming Yiu, and Eric Lo. 2015. Sdb: A secure query processing system with data interoperability. Proceedings of the VLDB Endowment 8, 12 (2015), 1876–1879.

[52] Adrian F. Hernandez, Rachael L. Fleurence, and Russell L. Rothman. 2015. The ADAPTABLE Trial and PCORnet: Shining Light on a New Research Paradigm. Ann Intern Med.

DAC. ACM, 33:1–33:6.

[54] IBM Research. [n.d.]. HElib. https://github.com/homenc/HElib. [Online; accessed September 2020].

[55] Mihaela Ion, Ben Kreuter, Ahmet Erhan Nergiz, Sarvar Patel, Mariana Raykova, Shobhit Saxena, Karn Seth, David Shanahan, and Moti Yung. 2019. On Deploying Secure Computing Commercially: Private Intersection-Sum Protocols and their Business Applications. IACR Cryptology ePrint Archive

CT-RSA (Lecture Notes in Computer Science, Vol. 9610). Springer, 90–107.

[57] Kristján Valur Jónsson, Gunnar Kreitz, and Misbah Uddin. 2011. Secure Multi-Party Sorting and Applications. IACR Cryptol. ePrint Arch. http://eprint.iacr.org/2011/122

[58] Seny Kamara and Tarik Moataz. 2018. SQL on Structurally-Encrypted Databases. In Advances in Cryptology – ASIACRYPT 2018, Thomas Peyrin and Steven Galbraith (Eds.). Springer International Publishing, Cham, 149–180.

[59] Randy Howard Katz and Gaetano Borriello. 2005. Contemporary logic design (2. ed.). Pearson Education.

[60] Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O’Neill. 2016. Generic Attacks on Secure Outsourced Databases. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (Vienna, Austria) (CCS ’16). Association for Computing Machinery, New York, NY, USA, 1329–1340. https://doi.org/10.1145/2976749.2978386

[61] Donald E. Knuth. 1998. The Art of Computer Programming, Volume 3: (2nd Ed.) Sorting and Searching. Addison Wesley Longman Publishing Co., Inc., USA.

[62] Simeon Krastnikov, Florian Kerschbaum, and Douglas Stebila. 2020. Efficient Oblivious Database Joins. Proc. VLDB Endow.

Proceedings of the 2015 IEEE Symposium on Security and Privacy (SP) (San Jose, California, USA). 359–376. https://doi.org/10.1109/SP.2015.29

[67] Microsoft Research. [n.d.]. SEAL. https://github.com/Microsoft/SEAL. [Online; accessed September 2020].

[68] Payman Mohassel and Peter Rindal. 2018. ABY3: A Mixed Protocol Framework for Machine Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada) (CCS ’18). Association for Computing Machinery, New York, NY, USA, 35–52. https://doi.org/10.1145/3243734.3243760

[69] Payman Mohassel, Peter Rindal, and Mike Rosulek. 2020. Fast Database Joins for Secret Shared Data.

[70] Arjun Narayan and Andreas Haeberlen. 2012. DJoin: Differentially Private Join Queries over Distributed Databases. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI) (Hollywood, California, USA). 149–162. http://dl.acm.org/citation.cfm?id=2387880.2387895

[71] Muhammad Naveed, Seny Kamara, and Charles V. Wright. 2015. Inference Attacks on Property-Preserving Encrypted Databases. In ACM Conference on Computer and Communications Security. ACM, 644–655.

[72] Antonis Papadimitriou, Arjun Narayan, and Andreas Haeberlen. 2017. DStress: Efficient Differentially Private Computations on Distributed Data. In Proceedings of the 12th European Conference on Computer Systems (EuroSys) (Belgrade, Serbia). 560–574. https://doi.org/10.1145/3064176.3064218

[73] V. Pappas, F. Krell, B. Vo, V. Kolesnikov, T. Malkin, S. G. Choi, W. George, A. Keromytis, and S. Bellovin. 2014. Blind Seer: A Scalable Private DBMS. In . 359–374.

[74] Patient-Centered Outcomes Research Institute (PCORI). 2015. Characterizing the Effects of Recurrent Clostridium Difficile Infection on Patients. IRB Protocol, ORA: 14122.

[75] Benny Pinkas, Mike Rosulek, Ni Trieu, and Avishay Yanai. 2020. PSI from PaXoS: Fast, Malicious Private Set Intersection. In EUROCRYPT (2) (Lecture Notes in Computer Science, Vol. 12106). Springer, 739–767.

[76] Benny Pinkas, Thomas Schneider, Gil Segev, and Michael Zohner. 2015. Phasing: Private Set Intersection Using Permutation-based Hashing. In USENIX Security Symposium. USENIX Association, 515–530.

[77] Benny Pinkas, Thomas Schneider, Christian Weinert, and Udi Wieder. 2018. Efficient Circuit-Based PSI via Cuckoo Hashing. In Advances in Cryptology - EUROCRYPT 2018 - 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tel Aviv, Israel, April 29 - May 3, 2018 Proceedings, Part III (Lecture Notes in Computer Science, Vol. 10822), Jesper Buus Nielsen and Vincent Rijmen (Eds.). Springer, 125–157. https://doi.org/10.1007/978-3-319-78372-7_5

[78] Rishabh Poddar, Tobias Boelter, and Raluca Ada Popa. 2019. Arx: An Encrypted Database Using Semantically Secure Encryption. Proc. VLDB Endow. 12, 11 (July 2019), 1664–1678. https://doi.org/10.14778/3342263.3342641

[79] Rishabh Poddar, Sukrit Kalra, Avishay Yanai, Ryan Deng, Raluca Ada Popa, and Joseph M Hellerstein. 2021. Senate: A Maliciously-Secure MPC Platform for Collaborative Analytics. In

Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP) (Cascais, Portugal). 85–100. https://doi.org/10.1145/2043556.2043566

[81] Aseem Rastogi, Matthew A. Hammer, and Michael Hicks. 2014. Wysteria: A Programming Language for Generic, Mixed-Mode Multiparty Computations. In Proceedings of the 2014 IEEE Symposium on Security and Privacy. Washington, DC, USA, 655–670. https://doi.org/10.1109/SP.2014.48

[82] Kurt Rohloff and David Bruce Cousins. 2014. A Scalable Implementation of Fully Homomorphic Encryption Built on NTRU. In Financial Cryptography and Data Security, Rainer Böhme, Michael Brenner, Tyler Moore, and Matthew Smith (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 221–234.

[83] Adi Shamir. 1979. How to Share a Secret. Commun. ACM 22, 11 (Nov. 1979), 612–613. https://doi.org/10.1145/359168.359176

[84] Jung Hoon Song and You Sun Kim. 2019. Recurrent Clostridium difficile Infection: Risk Factors, Treatment, and Prevention. Gut and liver 13, 1 (2019), 16–24. https://doi.org/10.5009/gnl18071

[85] Ebrahim M. Songhori, M. Sadegh Riazi, Siam U. Hussain, Ahmad-Reza Sadeghi, and Farinaz Koushanfar. 2019. ARM2GC: Succinct Garbled Processor for Secure Computation. In DAC. ACM, 112.

[86] Ebrahim M. Songhori, Shaza Zeitouni, Ghada Dessouky, Thomas Schneider, Ahmad-Reza Sadeghi, and Farinaz Koushanfar. 2016. GarbledCPU: a MIPS processor for secure computation in hardware. In DAC. ACM, 73:1–73:6.

[87] Dhinakaran Vinayagamurthy, Alexey Gribov, and Sergey Gorbunov. 2019. StealthDB: a Scalable Encrypted Database with Full SQL Query Support. Proc. Priv. Enhancing Technol.

Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, George Candea, Robbert van Renesse, and Christof Fetzer (Eds.). ACM, 3:1–3:18. https://doi.org/10.1145/3302424.3303982

[89] Frank Wang, Catherine Yun, Shafi Goldwasser, Vinod Vaikuntanathan, and Matei Zaharia. 2017. Splinter: Practical Private Queries on Public Data. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation (Boston, MA, USA) (NSDI’17). USENIX Association, USA, 299–313.

[90] Wenhao Wang, Guoxing Chen, Xiaorui Pan, Yinqian Zhang, XiaoFeng Wang, Vincent Bindschaedler, Haixu Tang, and Carl A. Gunter. 2017. Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Dallas, Texas, USA) (CCS ’17). Association for Computing Machinery, New York, NY, USA, 2421–2434. https://doi.org/10.1145/3133956.3134038

[91] Xiao Wang, Alex J. Malozemoff, and Jonathan Katz. 2016. EMP-toolkit: Efficient MultiParty computation toolkit. https://github.com/emp-toolkit.

[92] Wai Kit Wong, Ben Kao, David Wai Lok Cheung, Rongbin Li, and Siu Ming Yiu. 2014. Secure query processing with data interoperability in a cloud database environment. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 1395–1406.

[93] Yuanzhong Xu, Weidong Cui, and Marcus Peinado. 2015. Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems. In Proceedings of the 2015 IEEE Symposium on Security and Privacy (SP ’15). IEEE Computer Society, USA, 640–656. https://doi.org/10.1109/SP.2015.45

[94] Weipeng P. Yan and Per-Åke Larson. 1994. Performing Group-By before Join. In Proceedings of the Tenth International Conference on Data Engineering, February 14-18, 1994, Houston, Texas, USA. IEEE Computer Society, 89–100. https://doi.org/10.1109/ICDE.1994.283001

[95] Andrew Chi-Chih Yao. 1986. How to Generate and Exchange Secrets. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science (SFCS ’86). IEEE Computer Society, USA, 162–167. https://doi.org/10.1109/SFCS.1986.25

[96] Samee Zahur and David Evans. 2015. Obliv-C: A Language for Extensible Data-Oblivious Computation. arXiv:2015/1153 http://eprint.iacr.org/2015/1153.

[97] Zheguang Zhao, Seny Kamara, Tarik Moataz, and Zdonik Stan. 2021. Encrypted Databases: From Theory to Systems. In Proceedings of the 11th Annual Conference on Innovative Data Systems Research.

[98] Wenting Zheng, Ankur Dave, Jethro G Beekman, Raluca Ada Popa, Joseph E Gonzalez, and Ion Stoica. 2017. Opaque: An oblivious and encrypted distributed analytics platform. In USENIX Symposium on Networked Systems Design and Implementation (NSDI). 283–298.