An Efficient Secure Dynamic Skyline Query Model

Abstract

It is now cost-effective to outsource large datasets and perform queries over the cloud. However, this scenario raises serious security and privacy issues, since sensitive information contained in the dataset can be leaked. The most effective way to address this is to encrypt the data before outsourcing. Nevertheless, it remains a grand challenge to process queries over ciphertext efficiently. In this work, we focus on solving one representative query task, namely the dynamic skyline query, in a secure manner over the cloud. This query is difficult to perform on encrypted data because its dynamic domination criteria require both subtraction and comparison, which cannot be directly supported by a single encryption scheme efficiently. To this end, we present a novel framework called SCALE. It works by transforming traditional dynamic skyline domination into pure comparisons. The whole process can be completed in a single round of interaction between the user and the cloud. We theoretically prove that the outsourced database, query requests, and returned results are all kept secret under our model. Moreover, we also present an efficient strategy for dynamic insertion and deletion of stored records. An empirical study over a series of datasets demonstrates that our framework improves the efficiency of query processing by nearly three orders of magnitude compared to the state-of-the-art.


Weiguo Wang, Hui Li ⋆, Yanguo Peng, Sourav S Bhowmick, Peng Chen, Xiaofeng Chen, and Jiangtao Cui

School of Cyber Engineering, Xidian University, China
{wgwang,pchen97}@stu.xidian.edu.cn, {hli,xfchen}@xidian.edu.cn
School of Computer Science and Technology, Xidian University, China
{ygpeng,cuijt}@xidian.edu.cn
School of Computer Science and Engineering, Nanyang Technological University, Singapore
[email protected]


Keywords: skyline, secure, cloud, query

With the rapid expansion in data volumes, many individuals and organizations are increasingly inclined to outsource their data to public cloud services, since they provide a cost-effective way to support large-scale data storage and query processing. As a major type of query and a fundamental building block for various applications, the skyline query [7] has become an important topic in database research for extracting interesting objects from multi-dimensional datasets. Skyline query processing is widely adopted in applications that require multi-criteria decision making, such as market research [13], location-based systems [17], web services study [3], etc. The skyline operator filters out a set of interesting points, based on a group of evaluation criteria, from a large set of points. A point is considered interesting if there does not exist a point that is at least as good in all criteria and better in at least one criterion.

⋆ Corresponding author: Hui Li, [email protected]

However, similar to other types of query, outsourcing the skyline query workload to a public cloud will inevitably raise privacy issues. Since a real-world database may often contain sensitive information such as personal electronic mails, health records, financial transactions, etc., a cloud service provider may illegally spy on the data and invade the privacy of the data owner and users.

In this paper, we focus on the problem of secure skyline querying on the cloud platform, aiming to protect the security of the outsourced data, query requests and results. Secure query processing on encrypted data is an important research field in outsourced computation and has been extensively studied in recent years [30,20]. For instance, fully homomorphic encryption schemes [14] ensure strong security while enabling arbitrary computations on the encrypted data. Modular order-preserving encryption [5,9] provides an intuitive security model which supports comparison over the ciphertext without decryption.
Despite the promising achievements in the area of secure query processing, it remains a grand challenge to process dynamic skyline queries over ciphertext, where the skyline operator is executed with respect to some query point [25]. The main reason is as follows. Given a query request, a dynamic skyline query requires performing both comparison and distance evaluation online simultaneously. Unfortunately, accomplishing this task over ciphertext cannot be realized efficiently via existing encryption schemes.

For instance, suppose that a medical institution wishes to outsource its electronic diabetes records to some public cloud service without leaking the content of the records to the cloud server. An electronic diabetes record consists of a series of attributes, including ID, age, FBGL (fasting blood glucose level), etc. Let P = {p_1, ..., p_n} denote a set of electronic diabetes records. When the medical institution receives a new record q, it expects the cloud server to retrieve a similar record to enhance and personalize the treatment for the new patient q. However, it is usually difficult or even impossible to uniformly assign weights to all the attributes in order to return the nearest neighbor (e.g., p_i is the nearest if only age is involved, while p_j is the nearest if only FBGL is taken into account). In light of that, a dynamic skyline query provides all possible Pareto records that are not dominated by any other ones. Given a query q, we can compute the difference between each attribute of p_i and q. Let t_i be the difference tuple between p_i and q, with t_i[j] = |p_i[j] − q[j]| for each dimension j. An object t_i dominates t_j if it is better than t_j in at least one dimension and not worse in every other dimension. If an object cannot be dominated by any other object, it is one of the skyline points that need to be returned. As shown in Fig. 1, there are five patient records p_1, ..., p_5.
Given a query record q, we calculate t_1, ..., t_5 and can easily identify the two skyline points among them. The two corresponding records are therefore the results of the dynamic skyline query w.r.t. q.

[Figure: raw samples p_1, ..., p_5, the query q, and the projections t_1, ..., t_5 w.r.t. q; axes: age and fasting blood glucose level (mmol/L).]

Fig. 1: Dynamic Skyline Query Example.

Notably, in the above example, a dynamic skyline query requires performing both subtraction and comparison online. As there is no practical encryption scheme supporting both operators over ciphertext, the existing model employs secure multiparty computation over at least two non-colluding third-party clouds and processes the query with multiple rounds of interaction. In this work, we present a novel framework called SCALE (SeCure dynAmic skyLine quErying) that transforms the traditional skyline domination criteria, which require both subtraction and comparison, into comparison only. In this way, we are able to present a new scheme that supports dynamic skyline queries over ciphertext without any help from a second cloud and that can be completed in a single round of interaction between the user and the cloud. We theoretically prove that the outsourced database, query requests, and returned results are all kept secret under our model. An empirical study over four datasets, including both synthetic and real-world ones, demonstrates that our framework outperforms a state-of-the-art method by nearly three orders of magnitude. Notably, as a special case of the dynamic skyline query, skyline computation can also be processed securely and efficiently under our model (with trivial modification).

In summary, this work makes the following contributions.

– We propose a new scheme to encrypt the outsourced database and query request. Based on the scheme, a dynamic skyline query can be answered without decrypting the database or the query. Within the scheme, the cloud server and the data user need only one interaction during the query.
– We theoretically prove that our model is secure if the cloud is curious-but-honest.
– In addition to the secure query scheme, we also present an efficient mechanism for modifications over existing database records, including insertion, deletion and updating.
– We also theoretically show that the skyline points can be computed efficiently and correctly.
– An empirical study over both synthetic and real-world datasets shows that our model is superior to the state-of-the-art w.r.t. the query response time.

The rest of this paper is organized as follows. In Section 2 we review the related work. In Section 3 we formally present the problem definition and system model for this work. The detailed designs of the encryption scheme and query framework are discussed in Section 4. The empirical study and corresponding results are shown in Section 5. In Section 6, we conclude this work.

Related Work

The skyline query is particularly important for several applications involving multi-criteria decision making. The computation of the skyline is equivalent to the maximal vector problem in computational geometry [18,29], or equivalently the Pareto optimal set problem [29] in operations research. Since [18] first studied the methods and complexity of skyline computation (i.e., static skyline query), it has been extensively studied in the database field. Block-Nested-Loop [7], Divide-and-Conquer [12], Nearest Neighbor [16], Branch-and-Bound Skyline [25] and a series of subsequent works have progressively improved the efficiency of the general version of skyline computation [22,27,31]. In the remainder of this work, we refer to the static skyline query as skyline computation.

A dynamic skyline query is a variation of skyline computation that was first introduced in [25,26]. Instead of computing the skyline points purely from the given dataset, a dynamic skyline query returns the points that are not dominated by any others with respect to q. In other words, skyline computation can be viewed as a special case of the dynamic skyline query where q is fixed as the origin and only comparison (without distance evaluation) is required.

With the development of cryptography, encryption technology is gradually being applied in the database field.

Bothe et al. [8] presented an approach for skyline computation over encrypted data. It provided an efficiency analysis and an empirical study for computing skyline points and decrypting the results. However, it failed to provide any formal security guarantee. Moreover, as discussed above, skyline computation only requires comparison without distance computation; processing it in the encrypted domain can be easily performed through Order-Preserving Encryption [2] or Order-Revealing Encryption [5,9]. Another work [10] proposed three novel schemes that enable efficient verification of skyline query results returned by an unauthentic cloud server. That work focuses on verification rather than privacy, and does not operate on ciphertext. It is orthogonal to the scope of this paper.

Liu et al. [20] proposed the first semantically secure protocol for dynamic skyline queries over the cloud platform. The scheme adopts both the Paillier cryptosystem [23] and Secure Multi-party Computation (SMP) as building blocks. Although it is proved to be semantically secure, the protocol suffers from a huge computation cost and a strict system model. In fact, for a query framework, the response time is the most important factor for the success of the application, but the performance of [20] is far from satisfactory in this respect.

Order-Revealing Encryption

An Order-Preserving Encryption (OPE) scheme [2], whose ciphertexts preserve the original ordering of the plaintexts, has been extensively applied to range queries over encrypted databases.

The ideal security goal for an order-preserving scheme, IND-OCPA [4], is to reveal no additional information about the plaintext values besides their order. Boldyreva et al. [4] were the first to provide a rigorous solution to the problem. They settled on a weaker security guarantee, which was shown to leak at least half of the plaintext bits [5,9]. Popa et al. [28] presented the ideal-secure order-preserving encoding scheme. [15] showed how to achieve the even stronger notion of frequency-hiding OPE. However, these ideal-secure OPE schemes require rounds of interaction between client and server.

To improve the security of OPE, Boneh et al. [6] presented Order-Revealing Encryption (ORE) schemes, another method for circumventing the lower bound deduced by Boldyreva et al. [4]. In an ORE scheme, the numerical order of two ciphertexts does not necessarily reflect that of the original messages, as it does in OPE. Instead, the order of the original messages can only be decided by a carefully designed function over the corresponding ciphertexts. Pandey and Rouselakis [24] previously considered this type of relaxation in the context of property-preserving encryption. In a property-preserving encryption scheme, there is a publicly computable function that can be evaluated on ciphertexts to determine the value of some property of the underlying plaintexts. OPE can thus be viewed as a property-preserving encryption scheme where the computable function is the comparison operation. Pandey and Rouselakis introduced and explored several indistinguishability-based notions of security for property-preserving encryption. However, they did not construct an order-revealing encryption scheme. Chenette et al. [11] built an efficiently implementable order-revealing encryption scheme based on pseudorandom functions. Lewi et al. [19] improved the above scheme. The ORE scheme in [19] is adopted for this work, and it will be discussed further in Section 4.

In this section, we shall introduce a group of key concepts for the skyline query, and then describe the system and security models used in this paper. For ease of discussion, the key notations used throughout this paper are summarized in Table 1.

In this part, we shall introduce a series of key concepts for the skyline problem that are important for the following discussion.

Definition 1 (Domination). Given two points p_α, p_β in d-dimensional space, we say p_α dominates p_β (denoted by p_α ≺ p_β) if ∀ i ∈ {1, ..., d}, p_α[i] ≤ p_β[i], and ∃ i ∈ {1, ..., d}, p_α[i] < p_β[i].

Definition 2 (Skyline Computation). Given a dataset P = {p_1, ..., p_n} in d-dimensional space, skyline computation returns the point set S ⊆ P such that ∀ p ∈ S, ∄ p′ ∈ P such that p′ ≺ p (i.e., ∀ p ∈ S, p′ ∈ P, p′ cannot dominate p).

Table 1: Summary of notations

Notation                Definition
n                       The number of tuples in the database
d                       The dimension of the database
q                       The query tuple
Enc(q)                  Ciphertext of the query tuple
Enc(2q)                 Ciphertext of the doubled query tuple
P = {p_1, ..., p_n}     A database with n tuples
E(P)                    Ciphertexts of tuples for P
E(Φ)                    Ciphertexts of the pairwise sums for tuples in P
p_i[j]                  The j-th attribute of p_i
keys[·]                 The set of private keys
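Definitions 1 and 2 can be illustrated with a short plaintext sketch in Python. This is only a minimal illustration on unencrypted tuples, assuming that smaller values are better in every dimension; the function names `dominates` and `skyline` are chosen here for illustration and are not taken from the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Definition 1: a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Definition 2: keep every point that no point of the dataset dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


if __name__ == "__main__":
    data = [(1, 4), (2, 2), (3, 3), (4, 1)]
    # (3, 3) is dominated by (2, 2); the other three are incomparable.
    print(skyline(data))
```

The quadratic filter is chosen for readability only; it makes no claim about efficiency.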

Definition 3 (Dynamic Domination). Given two points p_α, p_β and a query point q in d-dimensional space, we say p_α dynamically dominates p_β with respect to q (denoted by p_α ≺_q p_β) if ∀ i ∈ {1, ..., d}, |p_α[i] − q[i]| ≤ |p_β[i] − q[i]|, and ∃ i ∈ {1, ..., d}, |p_α[i] − q[i]| < |p_β[i] − q[i]|.

Definition 4 (Dynamic Skyline Query). Given a dataset P = {p_1, ..., p_n} and a query q in d-dimensional space, a dynamic skyline query returns the set S ⊆ P such that ∀ p ∈ S, ∄ p′ ∈ P such that p′ ≺_q p (i.e., ∀ p ∈ S, p′ ∈ P, p′ cannot dynamically dominate p with respect to q).

A common algorithm (i.e., BNL [7]) for the dynamic skyline query is shown in Algorithm 1. It first calculates the differences (i.e., t_i) between each tuple (i.e., p_i) and the query request (i.e., q) in every dimension (Lines 1-3). When a tuple p_i is read from P, it is added to S if S is empty (Lines 5-6). Otherwise, we compare p_i's difference tuple with respect to q, namely t_i, with that of each tuple in S. In case t_i ≺ t_j, where p_j ∈ S, we delete p_j from S. If there is no p_j ∈ S such that t_j ≺ t_i, we add p_i to S (Lines 10-11, 16-18). The algorithm repeats this process for the remaining tuples in P, and finally returns S (Line 21).

We shall use this as the basis for our secure skyline model. Notably, this is not the most efficient algorithm for plaintext skyline queries. We select this method as our building block for the following reasons. Firstly, the state-of-the-art solution for secure dynamic skyline queries, [20], adopts BNL [7] as its basic building block. In line with it, and to make a fair comparison, our solution is constructed according to the same query framework. Secondly, BNL is a common and popular iterative algorithm for answering dynamic skyline queries in plaintext. Thirdly, as discussed in Section 1, the key challenge in secure dynamic skyline queries lies in performing both subtraction and comparison over ciphertext. A secure model building on any other (plaintext) dynamic skyline query algorithm inevitably has to address that. In other words, although our solution in this work adopts Algorithm 1 as the foundation, it can be easily adapted to other (plaintext) dynamic skyline query algorithms.

Algorithm 1 Basic Skyline Query Algorithm
Require: The dataset P and a query tuple q
Ensure: The result set of skyline points S
 1: for i in 1, ..., n and j in 1, ..., d do
 2:   let t_i[j] = |p_i[j] − q[j]|
 3: end for
 4: for i in 1, ..., n do
 5:   if S is empty then
 6:     add p_i to S
 7:   else
 8:     flag ← True
 9:     for each p_j ∈ S do
10:       if t_j ≺ t_i then
11:         flag ← False
12:       else if t_i ≺ t_j then
13:         delete p_j from S
14:       end if
15:     end for
16:     if flag == True then
17:       add p_i to S
18:     end if
19:   end if
20: end for
21: return S

Our system model involves three types of participants: a data owner, a cloud server and a group of query users. The cloud server is assumed to have large storage and computation capacity, and it provides outsourced storage and computation services. As Fig. 2 shows, the data owner employs the cloud service and stores his private database in the cloud server. To preserve data privacy, the data owner encrypts his dataset and only outsources the encrypted dataset to the cloud. Every query user may submit a query point (i.e., q) to the system. The query request may be locally encrypted before being sent to the cloud server. Then, the cloud server performs the dynamic skyline query over the encrypted database and query request without decryption. Afterwards, it returns the encrypted results to the user. Finally, the user decrypts these results using their own private keys.

Security model

We parameterize the security model by a collection of leakage functions L = (L_Encrypt, L_Query, L_Insert, L_Delete). The functions describe what information the protocol leaks to the adversary. The definition ensures that the scheme does not reveal any information beyond what can be inferred from the leakage functions.

[Figure: the data owner encrypts the dataset with its keys and sends the encrypted dataset to the cloud server; query users send registration information to the data owner and receive keys, send encrypted query data to the cloud server, and the cloud server runs the skyline query and returns the encrypted skyline points.]

Fig. 2: The system model of secure skyline query

We define two games, Game_{R,A} and Game_{S,A}, as follows. The adversary repeatedly encrypts data and queries skyline points, and receives either the transcripts generated by the Encrypt() and Query() algorithms in the real game Game_{R,A}, or the transcripts generated by the simulators S(L_Encrypt) and S(L_Query) in the ideal game Game_{S,A}. Eventually, A outputs a bit: 0 (Game_{R,A}) or 1 (Game_{S,A}).

Definition 5 (Adaptively secure). A scheme is L-adaptively-secure if for all probabilistic polynomial-time algorithms A, there exists an efficient simulator S such that the following holds:

|Pr[Game_{R,A}(λ) = 1] − Pr[Game_{S,A}(λ) = 1]| ≤ negl(λ).

Design goals

Our design goals cover both efficiency and privacy, including database privacy, query privacy, and result privacy. The details are as follows.

– Data owners need to encrypt the database before it is sent to the cloud server, so that the content of the database is not leaked to the cloud server.
– The query request, as well as the results, should not be revealed to the cloud server throughout query processing.
– As a query processing framework, efficiency should be considered one of the most important measures of its success. Although the entire query processing is performed over ciphertext here, the additional cost associated with it should be minimized.
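As a plaintext baseline for these goals, the dynamic skyline query of Definition 4 can be sketched along the lines of Algorithm 1. The following Python is a minimal sketch on unencrypted data, not the secure protocol itself; the names `dominates` and `dynamic_skyline` and the sample values are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a is no worse than b in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dynamic_skyline(points: List[Point], q: Point) -> List[Point]:
    """Block-nested-loop dynamic skyline of `points` with respect to `q`."""
    # Difference tuples t_i[j] = |p_i[j] - q[j]| (Lines 1-3 of Algorithm 1).
    diffs: List[Tuple[float, ...]] = [
        tuple(abs(x - y) for x, y in zip(p, q)) for p in points
    ]
    window: List[int] = []  # indices of the current candidate set S
    for i, t_i in enumerate(diffs):
        keep = True
        survivors: List[int] = []
        for j in window:
            if dominates(diffs[j], t_i):
                keep = False          # p_i is dominated, p_j stays
                survivors.append(j)
            elif not dominates(t_i, diffs[j]):
                survivors.append(j)   # incomparable, p_j stays
            # otherwise t_i dominates t_j and p_j is dropped
        window = survivors
        if keep:
            window.append(i)
    return [points[i] for i in window]


if __name__ == "__main__":
    records = [(4, 9), (6, 6), (1, 5), (8, 8), (5, 2)]
    print(dynamic_skyline(records, (5, 5)))
```

The secure framework has to produce the same result set while the cloud sees neither the tuples nor the query in the clear.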

The SCALE Framework

In this section, we shall introduce the SCALE framework for secure dynamic skyline querying under the proposed system model. As discussed above, processing a dynamic skyline query given a query point q requires performing both subtraction and comparison. Addressing both tasks in ciphertext form is challenging, as there is no practical encryption scheme that supports both operations simultaneously.

To address this challenge, we reinvestigate the entire dynamic skyline query workflow described in Definition 3.1 and Algorithm 1. Our investigation revealed an important fact that leads to an effective solution. Notably, to answer a dynamic skyline query given a request q, quantifying the differences between each point p_i and q in all dimensions is not mandatory. Instead, what we need is the relative order of such differences for a group of different p_i.

Observation 1. In order to evaluate whether p_α dynamically dominates p_β with respect to q, we do not need to know the exact values of the difference vectors T_α and T_β, where T_i[j] = |p_i[j] − q[j]| for j ∈ [1, ..., d]. In fact, what we really need to know is whether T_α[j] ≤ T_β[j] or T_α[j] < T_β[j] for j ∈ [1, ..., d]. Put simply, for an arbitrary dimension j, we need to know whether p_α[j] or p_β[j] is closer to q[j]. To answer that, we have to consider two possible cases, depending on whether q[j] falls in the interval between p_α[j] and p_β[j]. Fig. 3a and Fig. 3b depict the cases. In the case of Fig. 3a, the order between T_α[j] and T_β[j] can be interpreted as the relationship between p_α[j] and p_β[j]. In the case of Fig. 3b, the order between T_α[j] and T_β[j] can be interpreted as the relationship between p_α[j] + p_β[j] and q[j] + q[j].

From the above study, we notice that the multi-type-operation requirement (i.e., both subtraction and comparison) in the dynamic skyline query can be transformed into a uni-type operation involving only comparison. Inspired by this critical point, current encryption schemes that support comparison over ciphertext can be adopted in our framework to realize our design goals.
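The transformation in Observation 1 can be checked on plaintext values with a small sketch. In the Python below, plain integers stand in for ciphertexts and the helper `compare` stands in for an order-revealing comparison; both, as well as the name `closer_by_comparison`, are assumptions made for illustration. The sketch also assumes that a query value lying exactly on an endpoint is treated as inside the interval, a boundary choice made here only so that the result matches the absolute differences.

```python
def compare(x: int, y: int) -> int:
    """Stand-in for an order-revealing comparison: -1, 0 or 1."""
    return (x > y) - (x < y)


def closer_by_comparison(pa: int, pb: int, q: int, pair_sum: int, two_q: int) -> int:
    """Return -1, 0 or 1 if pa is closer to, as close to, or farther from q than pb.

    Only comparisons are used; pair_sum = pa + pb and two_q = 2 * q are
    assumed to be supplied precomputed, as in the encrypted setting.
    """
    order = compare(pa, pb)
    if order == 0:
        return 0
    lo, hi = (pa, pb) if order < 0 else (pb, pa)
    # Assumption of this sketch: the interval is closed.
    inside = compare(lo, q) <= 0 and compare(q, hi) <= 0
    if order < 0:  # pa < pb
        return compare(two_q, pair_sum) if inside else compare(q, pa)
    # pa > pb
    return compare(pair_sum, two_q) if inside else compare(pb, q)


if __name__ == "__main__":
    # Exhaustive check against the subtraction-based definition on a small range.
    for pa in range(8):
        for pb in range(8):
            for q in range(8):
                expected = compare(abs(pa - q), abs(pb - q))
                assert closer_by_comparison(pa, pb, q, pa + pb, 2 * q) == expected
    print("comparison-only test agrees with |p - q| on all small cases")
```

No subtraction or absolute value appears inside `closer_by_comparison`, which is the property the framework relies on.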

In our scheme, we adopt a state-of-the-art encryption scheme that supports comparison, namely order-revealing encryption [19]. We first present the formal definition of order-revealing encryption.

Definition 6 (Order-Revealing Encryption). An order-revealing encryption (ORE) scheme [19] is a tuple of three algorithms, Setup, Encrypt and Compare, defined over a well-ordered domain D with the following properties:

– Setup(1^λ) → sk: On input a security parameter λ, the setup algorithm outputs a secret key sk.
– Encrypt(sk, m) → ct: On input a secret key sk and a message m ∈ D, the encryption algorithm outputs a ciphertext ct.
– Compare(ct_1, ct_2) → b: On input two ciphertexts ct_1, ct_2, the compare algorithm outputs a value b ∈ {−1, 0, 1}.

With the help of the ORE scheme, evaluating the dynamic domination relation between p_α and p_β can be carried out securely in ciphertext form, as outlined in Algorithm 2. For ease of subsequent discussion, we shall denote by Enc(x) the ORE ciphertext of the original message x.

[Figure: (a) Case 1: q[j] lies outside the interval between p_α[j] and p_β[j]; (b) Case 2: q[j] lies inside it.]

Fig. 3: Cases for the relationship between q and (p_α, p_β)

Algorithm 2 SecureCompare Algorithm
Require: The ORE ciphertexts Enc(p_α[j]), Enc(p_β[j]), Enc(q[j]), as well as Enc(p_α[j] + p_β[j]), Enc(2q[j]).
Ensure: The comparison result −1, 0 or 1, denoting that p_α[j] is closer to (resp., equidistant with, farther from) q[j] than p_β[j].
 1: if ORE.Compare(Enc(p_α[j]), Enc(p_β[j])) == 0 then
 2:   return 0
 3: else if ORE.Compare(Enc(p_α[j]), Enc(p_β[j])) == −1 then
 4:   if Enc(q[j]) falls outside the interval then
 5:     return ORE.Compare(Enc(q[j]), Enc(p_α[j]))
 6:   else
 7:     return ORE.Compare(Enc(2q[j]), Enc(p_α[j] + p_β[j]))
 8:   end if
 9: else
10:   if Enc(q[j]) falls outside the interval then
11:     return ORE.Compare(Enc(p_β[j]), Enc(q[j]))
12:   else
13:     return ORE.Compare(Enc(p_α[j] + p_β[j]), Enc(2q[j]))
14:   end if
15: end if

Minimizing the number of keys

Following Observation 1, a data owner needs to encrypt the database P and the sum of any two tuples in P in each dimension, namely p_α[j] + p_β[j], where α ≠ β, α, β ∈ [1, n], j ∈ [1, d]. The above two sets of ciphertexts are denoted as E(P) and E(Φ), respectively. In this step, if we use the same private key for both E(P) and E(Φ), the sums of paired tuples in E(Φ), although encrypted, will leak more information about the plaintext than the order alone.

For example, assume that P contains five tuples, whose values in a particular dimension are a, b, c, d, e, respectively. Suppose that after sorting the values in ascending order, we get b, c, a, e, d. Then their sums can be listed as b + c, b + a, b + e, b + d, c + a, c + e, c + d, a + e, a + d, e + d. For ease of discussion, in the following we shall refer to these values as pairs of sums. If we encrypt these pairs of sums using the same key as E(P), an attacker can obtain the ordering of the plaintext sums. Therefore, he may learn that b + e ≤ c + a, and then infer that e − a ≤ c − b. In this way, besides the order, the distribution of values in the plaintext tuples is also leaked.

[Figure: the sorted data b, c, a, e, d and the pairs of sums b+c, b+a, b+e, b+d, c+a, c+e, c+d, a+e, a+d, e+d, connected by lines into groups whose internal order is known; each group is encrypted under its own key.]

Fig. 4: A novel encryption scheme for pairs of tuples

However, according to the security model in this work, except for the order of tuples in some dimensions, the cloud should not be able to infer the content of the tuples. Therefore, we have to avoid leaking the distribution of the data by adopting different keys in ORE. Intuitively, an ideal method is to encrypt each pair of sums using a different key, as Algorithm 2 does not require comparing the sums of different pairs (p_α[j], p_β[j]) with one another. However, the increased number of keys introduces key management and storage problems. We propose a novel method to address this problem, which is shown in Fig. 4.

As shown in Fig. 4, b, c, a, e, d are the sorted values of five tuples in P in a particular dimension. According to Algorithm 2, these values should be encrypted using the same key, as comparisons over their ciphertexts are required. As a result, given that Enc(b), ..., Enc(d) are encrypted using the same key under ORE, any adversary can easily infer that b + c < b + a < b + e < b + d, regardless of whether b + c, ..., b + d are encrypted with different keys or not. Therefore, it is not beneficial to use multiple keys for such a group of sums.

Definition 7 (Order-Obvious Class). Given the order of a set of n elements whose exact values are unknown, if the order of two summations over paired elements can be inferred, we call them Order-Obvious. All the n(n − 1)/2 paired summations can be divided into several disjoint subsets accordingly, such that all the summations in each subset are Order-Obvious. We refer to each subset as an Order-Obvious Class (abbrev. OOC).

Generally, we can find all OOCs, which are marked by the lines in Fig. 4. The relations among sums in the same OOC (e.g., line) can be inferred purely from E(P). In light of that, we can use the same key to encrypt the sums in the same OOC, and adopt different keys across OOCs. In this way, an adversary cannot obtain additional information from the ciphertexts besides the order, and we can effectively minimize the number of keys. In particular, the minimum number of keys, denoted as κ (e.g., the number of lines in Fig. 4), has to satisfy the following theorem.

Theorem 1. In order to satisfy the predefined security model, the minimum number of encryption keys in a dimension should be κ = ⌈ ∗ n − ⌉.

Proof. See Appendix A.

Remark. Through the above strategy, we have minimized the required number of encryption keys. In spite of that, κ still grows linearly with n, which may introduce a key management burden if n is very large. To address this, we suggest the following implementation. For each row in Fig. 4, we assign a random Id_i. The data owner only needs to store one master key mk and a series of random Id_i. Then, key_i for encrypting each row is generated as mk ⊕ Id_i. In this way, we can effectively generate κ different keys based on mk.

Accessing the pairs of sums

As required by Algorithm 2, in order to compare t α [ j ] and t β [ j ] , it is always required to retrieve the ciphertext of p α [ j ] + p β [ j ] . Therefore, itis necessary to build a map between the elements of E ( P ) with the corresponding sumsin E ( Φ ) . That is, we need to build an index that maps Enc ( p α [ j ]) and Enc ( p β [ j ]) to Enc ( p α [ j ] + p β [ j ]) . To this end, we present an index based on hash function. Formally,we deﬁne a hash function as h : N → N , where N denote the set of natural num-bers. The hash function h should satisfy the following property, ∀ x , y , x , y ∈ N , h ( x , y ) = h ( x , y ) if and only if x = x and y = y .Assume the indices for Enc ( p α [ j ]) and Enc ( p β [ j ]) in E ( P ) are denoted as α and β , respectively. Then the index of Enc ( p α [ j ] + p β [ j ]) in E ( Φ ) can be easily acquiredas h ( α, β ) . Fig. 5 presents an example for the hash function. There are ﬁve encryptedvalues in E ( P ) , namely a, . . . , e . The hash function in this example is simply designedas a regular traversal order for the corresponding sums. In fact, any hash function thatsatisﬁes the aforementioned property can be adopted here. Indexing the pairs of sums

Additionally, as all the pairs of sums within a particular OOC are encrypted by ORE using the same key, we propose additional index structures for efficient retrieval of the entries corresponding to these pairs of sums. We therefore also design an index scheme to manage these ORE-encrypted pairs of sums. In SCALE, we adopt an AVL-Tree based indexing structure, as it provides the best efficiency when querying for a particular range. Specifically, we build one AVL-Tree to index all the encrypted sums in the same OOC. Notably, each AVL-Tree is rooted at the median of its OOC, and all the nodes in an AVL-Tree are the ciphertexts of pairs of sums in the same OOC.

For instance, given the records in Fig. 4, there are two OOCs. We build two different AVL-Trees to index the ciphertexts of each OOC, respectively. That is, the first OOC, centered at $b + d$, corresponds to an AVL-Tree rooted at $Enc(b + d)$; the other OOC, centered at $c + e$, corresponds to another AVL-Tree rooted at $Enc(c + e)$ (as shown in Fig. 5).

In fact, data structures other than AVL-Tree can also be adopted to index the ORE ciphertexts of each OOC. We select AVL-Tree as the default setting in SCALE as it provides the best query response time among the choices we considered. A detailed comparison between AVL-Tree and the other choices is given in Section 4.3.
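To make the role of the pair index and of the stored sums concrete, the following is a minimal plaintext sketch. It is not the authors' implementation: the Cantor pairing function is only one assumed choice of an injective $h$, plain integers stand in for ORE ciphertexts, and the local helper `cmp` stands in for ORE.Compare. The names `pair_index`, `sum_table` and `closer_to_query` are hypothetical.

```python
def pair_index(alpha: int, beta: int) -> int:
    """Cantor pairing: one assumed injective choice for h(alpha, beta)."""
    return (alpha + beta) * (alpha + beta + 1) // 2 + beta


def sum_table(values):
    """Map h(alpha, beta) -> values[alpha] + values[beta] for all alpha < beta."""
    return {
        pair_index(a, b): values[a] + values[b]
        for a in range(len(values))
        for b in range(a + 1, len(values))
    }


def cmp(x, y) -> int:
    """Three-way comparison; a plaintext stand-in for ORE.Compare."""
    return (x > y) - (x < y)


def closer_to_query(p_a, p_b, q, sum_ab, double_q) -> int:
    """Compare |p_a - q| with |p_b - q| using order comparisons only.

    Returns -1 if p_a is closer to q, 1 if p_b is closer, 0 on a tie.
    """
    side_a, side_b = cmp(p_a, q), cmp(p_b, q)
    if side_a >= 0 and side_b >= 0:
        # Both values lie at or above q: the smaller value is closer.
        return cmp(p_a, p_b)
    if side_a <= 0 and side_b <= 0:
        # Both values lie at or below q: the larger value is closer.
        return cmp(p_b, p_a)
    if side_a < 0:
        # p_a < q < p_b: q - p_a < p_b - q  iff  2q < p_a + p_b.
        return cmp(double_q, sum_ab)
    # p_b < q < p_a: p_a - q < q - p_b  iff  p_a + p_b < 2q.
    return cmp(sum_ab, double_q)
```

The third and fourth branches are the only ones that need the stored sum and the doubled query value, which is why each pair of entries must be mapped to its sum through the index.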

Database encryption

We now have all the ingredients in place to describe the entire process of encrypting the database (Algorithm 3). First, the data owner generates $d + \lceil (2n-3)/4 \rceil \cdot d$ keys (Line 1), and for each column (i.e., attribute) in $P$ encrypts the entries using the same key (Lines 3-5), resulting in $E(P)$. Then, the data owner sorts the entries in each column (Line 8) and computes the sums for pairs of entries in each dimension. Afterwards, the sums are encrypted using the corresponding keys as shown in Fig. 4 (Lines 9-17), resulting in $E(\Phi)$. Finally, the data owner sends $E(P)$ and $E(\Phi)$ to the cloud server.

Fig. 5: The complete ciphertext storage structure

Besides, the data owner also creates a hash function $h$ that maps each pair of elements in $E(P)$ to the corresponding sum in $E(\Phi)$, and sends $h$ to the cloud server. The cloud can then quickly locate the ciphertext of the sum for each pair $(p_\alpha[j], p_\beta[j])$.

A data user needs to register with the data owner and securely obtain the keys. The user then encrypts the request according to Algorithm 4 before sending it to the cloud server. As shown in Algorithm 4, the user encrypts each dimension of the query tuple using the corresponding keys (Line 2) and encrypts the doubled entries of the query tuple using the other keys (Lines 4-5). Finally, the user sends $Enc(q)$ and $Enc(2q)$ to the cloud server.

As mentioned in Algorithm 1, given an encrypted query $q$ produced by Algorithm 4, the cloud server needs to perform comparisons and computations over encrypted data. Following the approach shown in Fig. 3, the cloud server can perform the skyline query through order comparisons among the encrypted tuples, the encrypted query request, the encrypted sums, and the encrypted doubled request. As a result, the process described in Algorithm 1 can now be performed over ciphertext without decryption, as shown in Algorithm 5. To illustrate the entire protocol, we provide a running example in the following.

Example 1.

For the convenience of presentation, we assume that $P$ contains five tuples, whose entries in dimension 1 are sorted as 7, 13, 21, 32, 53. According to Algorithm 3, we first compute the sums for all pairs of values, e.g., 7 + 13 = 20, 7 + 21 = 28, 7 + 32 = 39, 7 + 53 = 60, 13 + 21 = 34, 13 + 32 = 45, 13 + 53 = 66, 21 + 32 = 53, 21 + 53 = 74, 32 + 53 = 85. As shown in Theorem 1, the number of encryption keys required for these sums is $\lceil (2 \cdot 5 - 3)/4 \rceil = 2$. Therefore, we use two keys to encrypt the above sums, resulting in $Enc(20), Enc(28), \ldots, Enc(85)$ and $Enc(34), Enc(45), Enc(53)$. Besides, we also need another key to encrypt the original tuples, e.g., $Enc(7), Enc(13), \ldots, Enc(53)$. Suppose that a user submits a query with $q[1] = 23$. Then $q$ and $2q$ need to be encrypted according to our scheme, resulting in $Enc(46), Enc(46), Enc(23)$. These ciphertexts are then sent to the cloud server. The cloud server compares ciphertexts one by one according to the protocol. Through ORE.Compare and Algorithm 2, the cloud server can easily determine that $Enc(32)$ dominates $Enc(53)$, following the case shown in Fig. 3a. Similarly, $Enc(21)$ dominates $Enc(7)$ and $Enc(13)$. In the case shown in Fig. 3b, $Enc(21)$ dominates $Enc(32)$ because ORE.Compare$(Enc(53), Enc(46)) = 1$. Algorithm 5 iteratively repeats this process for all dimensions and remaining tuples.

Algorithm 3 Dataset Encryption
Require: The dataset $P$
Ensure: The ciphertext sets $E(P)$, $E(\Phi)$
 1: generate $d + \lceil (2n-3)/4 \rceil \cdot d$ keys with ORE.Setup as keys[]
 2: for $p \in P$ and $j$ in $1, \ldots, d$ do
 3:     $Enc(p[j]) \leftarrow$ ORE.Encrypt(keys[$j$], $p[j]$)
 4:     let $Enc(p) = \{Enc(p[1]), \ldots, Enc(p[d])\}$ and add $Enc(p)$ to $E(P)$
 5: end for
 6: let $m = 1$
 7: for $j$ in $1, \ldots, d$ do
 8:     $\Lambda = (p_{(1)}[j], \ldots, p_{(n)}[j]) \leftarrow$ sort $p_1[j], \ldots, p_n[j]$ in ascending order
 9:     while $\Lambda$ is not empty do
10:         for $i$ in $2, \ldots, len(\Lambda)$ do
11:             add ORE.Encrypt(keys[$d + m$], $p_{(1)}[j] + p_{(i)}[j]$) to $E(\Phi)$
12:         end for
13:         for $i$ in $2, \ldots, len(\Lambda) - 1$ do
14:             add ORE.Encrypt(keys[$d + m$], $p_{(n)}[j] + p_{(i)}[j]$) to $E(\Phi)$
15:         end for
16:         remove the first and last elements in $\Lambda$, let $m = m + 1$
17:     end while
18: end for
19: return $E(P)$, $E(\Phi)$

Algorithm 4 Query Request Encryption
Require: The query data $q$, keys from data owner keys[]
Ensure: The ciphertexts $Enc(q)$, $Enc(2q)$
 1: for $j$ in $1, \ldots, d$ do
 2:     $Enc(q[j]) \leftarrow$ ORE.Encrypt(keys[$j$], $q[j]$)
 3:     for $m$ in $1, \ldots, \lceil (2n-3)/4 \rceil$ do
 4:         let key_num $= d + (j - 1) \cdot \lceil (2n-3)/4 \rceil + m$
 5:         $Enc(2q_m[j]) \leftarrow$ ORE.Encrypt(keys[key_num], $2 \cdot q[j]$)
 6:     end for
 7: end for
 8: return $Enc(q)$, $Enc(2q)$

Modifications over database records (insert, delete, update) are fundamental requirements in database applications. In light of that, we discuss the strategies to support these operations in our framework. As depicted in Fig. 4, the cloud server stores the encrypted sums of data values in different OOCs, which account for the most expensive maintenance cost. Hence, the way these encrypted sums are stored is fundamentally important. In fact, many index structures can be used to accomplish this task. In SCALE, we adopt AVL-Tree [1], as it offers the best efficiency in searching for an entry due to its strictly balanced structure. In fact, we have considered and compared several different structures including Linked List, AVL-Tree and Red-black Tree. Table 2 shows the functional comparison of the advantages and disadvantages of these data structures in our model.

Table 2: Functional comparison over different structures for insertion and deletion

| Data Structure | Advantages | Disadvantages |
| Linked List | Easy implementation | Expensive cost |
| AVL-Tree | Shorter query time than Red-black Tree | Longer response time for the insert and delete operations |
| Red-black Tree | Shorter response time for the insert and delete operations | Longer query time than AVL-Tree |

In the following, we describe in turn how insertion and deletion are supported in SCALE using AVL-Tree.
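Since the maintenance operations work per OOC, it may help to see how the pairwise sums of one dimension fall into OOCs. The sketch below mirrors the loop structure of Algorithm 3 on plaintext integers; encryption is omitted, and the function names `group_sums_into_oocs` and `expected_ooc_count` are hypothetical. It also assumes the key count $\lceil (2n-3)/4 \rceil$ per dimension used in this section.

```python
from math import ceil


def group_sums_into_oocs(values):
    """Group all pairwise sums of one dimension into OOCs.

    Each round pairs the current smallest and largest remaining values with
    everything in between, then drops both, as in the while loop of Algorithm 3.
    """
    remaining = sorted(values)
    oocs = []
    while remaining:
        lo, hi = remaining[0], remaining[-1]
        inner = remaining[1:-1]
        ooc = [lo + v for v in inner]      # smallest value + each inner value
        if len(remaining) > 1:
            ooc.append(lo + hi)            # smallest + largest (centre of the OOC)
        ooc += [hi + v for v in inner]     # largest value + each inner value
        if ooc:
            oocs.append(sorted(ooc))
        remaining = inner
    return oocs


def expected_ooc_count(n: int) -> int:
    """Number of OOCs (and keys) per dimension for n tuples."""
    return ceil((2 * n - 3) / 4)
```

With the values of Example 1 (7, 13, 21, 32, 53), the first group holds the seven sums that involve 7 or 53 and the second holds 34, 45 and 53. The middle element of each sorted group is the sum of that round's smallest and largest values, which corresponds to the root described for each AVL-Tree.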

Insertion

As described in Section 4.1, the entries of each record are encrypted in multiple copies. Therefore, any newly inserted record must undergo the same procedure. For example, assume that the data owner adds a new tuple $f$ to the existing database described in Fig. 5, which contains $a, b, c, d, e$, and that $b < c < f < a < e < d$. For the two OOCs in the left part of Fig. 6, which shows the OOCs for the existing records, we implement two corresponding AVL-Trees: one tree rooted at $b + d$ that contains all the sums in the same OOC, and the other tree rooted at $c + e$ that contains two other sums, namely $c + a$ and $a + e$. As shown in Fig. 6 and 7b, $b + f$ and $f + d$ belong to the same OOC as $b + d$ and are inserted into the corresponding positions in the AVL-Tree rooted at $b + d$. Similarly, the ciphertexts for $c + f$ and $f + e$ are inserted into the other AVL-Tree rooted at $c + e$. Moreover, $f + a$ is inserted into a new AVL-Tree rooted at $f + a$.

Algorithm 5 Secure Skyline Query Algorithm
Require: The ciphertext for dataset $E(P)$, query request $Enc(q)$, sums for tuples $E(\Phi)$ and doubled query request $Enc(2q)$
Ensure: The encrypted result set of skyline points $S$
 1: for $i$ in $1, \ldots, n$ do
 2:     if $S$ is empty then
 3:         add $Enc(p_i)$ to $S$
 4:     else
 5:         flag_cur $\leftarrow$ True
 6:         for each $Enc(p_j) \in S$ do
 7:             for $m$ in $1, \ldots, d$ do
 8:                 flag[$m$] $\leftarrow$ SecureCompare($Enc(p_i[m])$, $Enc(p_j[m])$, $Enc(q[m])$, $Enc(p_i[m] + p_j[m])$, $Enc(2q[m])$)
 9:             end for
10:             if $\forall m$, flag[$m$] $\geq 0$, and $\exists k$ such that flag[$k$] $> 0$ then
11:                 flag_cur $\leftarrow$ False
12:             else if $\forall m$, flag[$m$] $\leq 0$, and $\exists k$ such that flag[$k$] $< 0$ then
13:                 delete $Enc(p_j)$ from $S$
14:             end if
15:         end for
16:         if flag_cur is True then
17:             add $Enc(p_i)$ to $S$
18:         end if
19:     end if
20: end for
21: return $S$

Fig. 6: Insert a new tuple f (sorted order before insertion: b, c, a, e, d; after insertion: b, c, f, a, e, d)

Fig. 7: Inserting b + f and f + d into the AVL-Tree. (a) Before insertion. (b) After insertion.

Deletion

On the other hand, data owners may also have to delete tuples from the existing database. In this scenario, SCALE also needs to update the indices associated with the deleted records. For example, assume that the data owner wants to delete a tuple $e$ from $b, c, a, e, d$. In the manner described in Section 4.1, the data owner has encrypted the tuple in each dimension using different keys. Besides, the data owner has already computed the sums $b + e$, $c + e$, $a + e$, $d + e$ and encrypted them with different groups of keys (Fig. 8). All the corresponding ciphertexts have been uploaded and stored in the cloud server.

As depicted in Fig. 9a, in SCALE the cloud server uses an AVL-Tree to store the sums of each OOC encrypted under the same key. The AVL-Tree structure supports efficient deletion. Whenever $e$ is deleted from the database, the sums involving $e$, i.e., $b + e$, $e + d$, $c + e$ and $a + e$, are also removed from the corresponding AVL-Trees. In particular, $b + e$ and $e + d$ are removed from the AVL-Tree rooted at $b + d$ (as shown in Fig. 9b), which is then rebalanced accordingly; similarly, $c + e$ and $a + e$ are removed from the other AVL-Tree rooted at $c + e$.

Update

Notably, all the existing data records are stored in ciphertext form according to our framework. In ciphertext space, an update operation cannot be applied directly. Instead, it is interpreted as deleting an existing encrypted record and then inserting a new encrypted record.
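The three maintenance operations can be illustrated on a single per-OOC index. The sketch below is only an illustration under stated assumptions: Python's standard library has no AVL-Tree, so a sorted list maintained with `bisect` stands in for it, plain integers stand in for ORE ciphertexts, and the class name `SortedSumIndex` is hypothetical. A record-level update would apply `delete` and `insert` to every sum that involves the record.

```python
import bisect


class SortedSumIndex:
    """Keeps the sums of one OOC in sorted order (list-based stand-in for an AVL-Tree)."""

    def __init__(self, sums=()):
        self._items = sorted(sums)

    def insert(self, s):
        # Binary search for the position, then insert while keeping the order.
        bisect.insort(self._items, s)

    def delete(self, s):
        # Locate the entry first; raise if it is not stored.
        i = bisect.bisect_left(self._items, s)
        if i == len(self._items) or self._items[i] != s:
            raise KeyError(s)
        del self._items[i]

    def update(self, old, new):
        # An update is a deletion followed by an insertion.
        self.delete(old)
        self.insert(new)

    def items(self):
        return list(self._items)
```

For instance, removing the record with value 32 from the first OOC of Example 1 amounts to calling `delete(39)` and `delete(85)` on the index holding 20, 28, 39, 60, 66, 74, 85.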

Insert and delete operations on an AVL-Tree may provide opportunities for side-channel attacks. However, this is not the focus of this paper, and existing work can address this problem, so it is not discussed here. The presented SCALE framework is constructed on the ORE scheme proposed in [19], which is secure with leakage function $\mathcal{L}_{BLK}$. The particular lemma is stated as follows.

Lemma 1. The ORE scheme is secure with leakage function $\mathcal{L}_{BLK}$, assuming that the adopted pseudo random function (PRF) is secure and the adopted hash functions are modeled as random oracles. Here, $\mathcal{L}_{BLK}(m_1, \ldots, m_t) = \{(i, j, BLK(m_i, m_j)) \mid 1 \leq i < j \leq t\}$ and $BLK(m_i, m_j) = (\text{ORE.Compare}(m_i, m_j), ind_{diff}(m_i, m_j))$, in which $ind_{diff}$ is the first differing block function, i.e., it returns the first index $i \in [n]$ such that $x_j = x'_j$ for all $j < i$ and $x_i \neq x'_i$, where $x$ and $x'$ denote the block representations of the two messages. (The proof of this lemma is in Appendix 4.1 in [19] and is omitted here.)

Fig. 8: Delete a tuple e (sorted order before deletion: b, c, a, e, d; after deletion: b, c, a, d)

Fig. 9: Deleting b + e and e + d from the AVL-Tree. (a) Before deletion. (b) After deletion.

In order to formally prove the security of SCALE, we extend the $\mathcal{L}$-adaptively-secure model for keyword searching schemes, as shown in Definition 5.

Theorem 2. Let the adopted PRF in ORE be secure. The presented SCALE framework is $\mathcal{L}$-adaptively-secure in the (programmable) random oracle model, where the leakage function collection $\mathcal{L} = (\mathcal{L}_{Encrypt}, \mathcal{L}_{Query}, \mathcal{L}_{Insert}, \mathcal{L}_{Delete})$ is defined as follows:

$$\mathcal{L}_{Encrypt} = \mathcal{L}_{BLK}\Big(\bigcup_{k=1}^{d} X_k^{(n)}\Big), \quad \mathcal{L}_{Query} = \mathcal{L}_{BLK}\Big(\bigcup_{k=1}^{d} X_k^{(n)\prime}\Big), \quad \mathcal{L}_{Insert} = \mathcal{L}_{BLK}\Big(\bigcup_{k=1}^{d} X_k^{(n+1)}\Big), \quad \mathcal{L}_{Delete} = \mathcal{L}_{BLK}\Big(\bigcup_{k=1}^{d} X_k^{(n)}\Big),$$

where

$$X_k^{(n)} = \bigcup_{t=1}^{\lceil (2n-3)/4 \rceil} Y_{kt}, \qquad X_k^{(n)\prime} = \bigcup_{t=1}^{\lceil (2n-3)/4 \rceil} \big(Y_{kt} \cup Enc_t(q) \cup Enc_t(2q)\big),$$

and

$$Y_{kt} = \{Enc(p_t[k] + p_j[k]) \mid t < j < n - t + 1\} \cup \{Enc(p_j[k] + p_{(n-t+1)}[k]) \mid t < j < n - t + 1\}.$$

See Appendix B for the proof.

Table 3: The time complexities of inserting and deleting a record using linked list, AVL-Tree, and Red-black Tree

| Data Structure | Insertion | Deletion |
| Linked list | $O(n^2)$ | $O(n^2)$ |
| AVL-Tree | $O(n \log n)$ | $O(n \log n)$ |
| Red-Black tree | $O(n \log n)$ | $O(n \log n)$ |

Fig. 10: Response time by varying the number of tuples (with d = 3, block = 16, K = 256). Panels: (a) CORR, (b) ANTI, (c) INDE, (d) NBA. Y-axis: time (seconds). Series: BSSP, SCALE.

In the encryption phase, the plaintext data from the data owner can be sorted and encrypted in advance. We need $O(d + \lceil (2n-3)/4 \rceil)$ encryption operations each time a user submits a query following Algorithm 4.

During the querying phase, our scheme replaces the original plaintext subtraction and comparison operations with a limited number of comparisons over ciphertext. The time taken for encryption and ciphertext comparisons is affected only by the block size in ORE, the key length in AES, and the plaintext length. We have not changed the main logic of dynamic skyline query processing. Hence, the complexity of the query processing phase in our scheme is consistent with that of [7]; that is, $O(n^2)$ in the worst case.

Next, we discuss the complexity of insertion and deletion. The main cost of inserting and deleting records lies in updating the corresponding AVL-Trees for the sums of record pairs. Notably, the number of AVL-Trees storing these sums equals the number of OOCs, which, as described above, is $\lceil (2n-3)/4 \rceil$. The time complexity of inserting or deleting a record is therefore $\lceil (2n-3)/4 \rceil$ times the complexity of inserting or deleting an element in the chosen data structure. Note that we need to find the corresponding positions before deleting and inserting elements. Table 3 shows the time complexity of inserting and deleting elements when adopting different data structures.

In this section, we evaluate the performance and scalability of SCALE under different parameter settings over four datasets, including both real-world and synthetic ones. We also compare our model with a baseline, namely BSSP [20,21], which is the only solution for secure dynamic skyline query.

Fig. 11: The effects of different parameters (n = 2500). (a) Effect of d (with block = 16, K = 256). (b) Effect of block size (with d = 3, K = 256). (c) Effect of key length (with block = 16, d = 3). Y-axis: time (seconds). Series: CORR, ANTI, INDE, NBA.

Fig. 12: Performance of different indexing structures during insertion. Panels: (a) INDE, (b) CORR, (c) ANTI, (d) NBA. X-axis: number of tuples n. Y-axis: time (milliseconds). Series: Linked List, AVL-Tree, Red-Black Tree.

All algorithms are implemented in C and tested on a platform with a 2.7GHz Intel Core i5 processor and 8GB of memory running macOS. We use both synthetic datasets and a real-world dataset in our experiments. In particular, we generated independent (INDE), correlated (CORR), and anti-correlated (ANTI) datasets following the seminal work in [7]. In line with [20,21], we also adopt a dataset that contains 2500 NBA players who are league leaders of playoffs. Each player is associated with six attributes that measure the player's performance: Points, Offensive Rebounds, Defensive Rebounds, Assists, Steals, and Blocks.

In this subsection, we evaluate our protocols by varying the number of tuples ($n$), the number of dimensions ($d$), the ORE block setting, and the length of the key ($K$).

Varying the number of tuples.

Fig. 10 shows the time cost when varying the number of tuples, $n$, on the four datasets. In this group of experiments, we fix the number of dimensions, the ORE block size and the key length as 3, 16 and 256, respectively. We observe that for all datasets, the time cost increases almost linearly with respect to $n$. This phenomenon is consistent with our complexity study in Section 4.5. Notably, for the real-world dataset (i.e., NBA), the query response time is less than . seconds, which is efficient enough in practice. Compared to the state-of-the-art [20], SCALE is more than 3 orders of magnitude faster. In the following, we fix $n = 2500$ and focus on evaluating the effects of the three parameters in our scheme.

Footnote: The NBA dataset is acquired from https://stats.nba.com/alltime-leaders/?SeasonType=Playoffs .

Fig. 13: Performance of different indexing structures during deletion. Panels: (a) INDE, (b) CORR, (c) ANTI, (d) NBA. X-axis: number of tuples n. Y-axis: time (milliseconds). Series: Linked List, AVL-Tree, Red-Black Tree.

Impact of d. Fig. 11a shows the time cost for different $d$ on the four datasets, where we fix the ORE block size and key length as 16 and 256, respectively. For all datasets, as $d$ increases from 2 to 6, the response time increases almost exponentially. This is consistent with ordinary dynamic skyline querying in plaintext, because an increase in $d$ leads to more comparison operations when deciding the dynamic dominance criteria.

Impact of ORE block. Since the plaintext is encrypted based on a block cipher, different block sizes may take different amounts of time. Fig. 11b plots the time cost when varying the block size used in the ORE scheme, where $d$ and $K$ are fixed as 3 and 256, respectively. As mentioned in [19], this ORE scheme leaks the first block of $\delta$ bits that differs; therefore, increasing the block size brings higher security. Observe that the response time increases slightly with the ORE block size. That is, a higher security level in ORE sacrifices some response time.

Impact of K. Fig. 11c shows the time cost when varying the length of the keys in the ORE scheme. This ORE scheme uses AES as the building block; therefore, increasing the encryption key size brings higher security. Similar to the block size, the response time also increases linearly with the size of the encryption keys. Comparing Fig. 11c against 11b, we make the following observations. First, increasing the security level of the ORE scheme sacrifices some efficiency. Second, the key length in AES has a more significant impact on efficiency than the ORE block size.

Maintenance cost.

We perform another group of experiments to test the performance of different index structures for managing the encrypted pairs of sums whenever insertion or deletion occurs. In particular, we compare three structures: Linked List, Red-black Tree and AVL-Tree. Notably, as [20,21] do not support modifications over existing database records, that method is not included in this group of experiments.

Firstly, for each dataset, we sequentially insert (encrypt and upload) new records into the database and evaluate the average response time for inserting the corresponding ciphertexts into the existing indices, which includes both the time for searching the target position within the Lists or Trees and the time for rebalancing the Trees afterwards, if needed. The average response times, taken from 10 independent runs, for inserting different numbers of records using the three index structures are shown in Fig. 12. In all tests for each dataset and index structure, we fix the parameters as $d = 3$, block $= 16$, $K = 256$. Across all the datasets, Linked List performs the worst, with a running time about one order of magnitude larger than that of the two Tree-based competitors. AVL-Tree and Red-Black Tree exhibit similar performance, much better than the Linked List. These observations are consistent with the complexities of these structures shown in Table 3.

Secondly, given all the existing datasets encrypted and stored using SCALE, we further test the response time for deleting arbitrary records. Similar to the previous experiments, the response time includes both the time for searching the target position within the Lists or Trees and the time for rebalancing the Trees afterwards, if needed. The average response times, taken from 10 independent runs, for deleting different numbers of records using the three index structures are shown in Fig. 13. We again fix the parameters as $d = 3$, block $= 16$, $K = 256$ across all the datasets. According to the results, Linked List again performs the worst, with a running time about one order of magnitude larger than that of the two Tree-based competitors. AVL-Tree and Red-Black Tree exhibit similar performance. The cost of all the methods increases with the number of deleted records.

Notably, throughout all the datasets in both Fig. 12 and 13, Red-Black Tree is slightly better than AVL-Tree for inserting new records or deleting existing ones, as it does not need to strictly rebalance the tree after each insertion or deletion. However, a less strictly balanced tree structure leads to worse query performance. Hence, there is a trade-off between maintenance cost and query cost when selecting between the two tree-based index structures. In particular, when query performance matters more, AVL-Tree should be adopted (all other experiments in this section are conducted with this structure); when maintenance cost matters more, Red-Black Tree is the better choice.

In this paper, we have presented a new framework called SCALE to address the secure dynamic skyline query problem in the cloud platform. A distinguishing feature of our framework is that it converts the requirement of both subtraction and comparison operations into comparisons only. As a result, we are able to use ORE to realize the dynamic domination protocol over ciphertext. Based on this feature, we built SCALE on top of BNL. In fact, our framework can be easily adapted to other plaintext dynamic skyline query models. We theoretically show that the proposed scheme is secure under our system model and is efficient enough for practical applications. Moreover, there is only one interaction between the user and the cloud, which minimizes the communication cost and the corresponding threats. Besides, we also present a mechanism for modifying existing stored records. Based on the proposed mechanism, insertion, deletion and update of records can all be efficiently supported by SCALE. An experimental study over both synthetic and real-world datasets demonstrates that SCALE improves the efficiency by at least three orders of magnitude compared to the state-of-the-art method. As part of our future work, we plan to further enhance the security of our scheme and explore how the scheme can be adapted to support other variations of the skyline query.

A Proof of Theorem 1

As shown in Fig. 4, the relations among sums in the same class (e.g., line) can be inferred easily from $E(P)$. The $n$ elements can be divided into $\kappa$ groups in the way shown in Fig. 4. In detail, given $n$ elements, we can organize all their pairwise sums into a matrix whose rows and columns correspond to the elements, arranged in descending order. Each entry is the sum of the corresponding row and column elements. As the matrix is symmetric, all the pairwise sums can be found in the upper-right corner, excluding the diagonal. All the entries along the borders (upper and right) of the corner belong to the same OOC. By removing these entries from the matrix, we can iteratively find several inner borders (i.e., OOCs). The first OOC contains $2n - 3$ entries, and each subsequent class has 4 entries fewer than the previous one. Hence, the total number of classes is $\kappa = \lceil (2n-3)/4 \rceil$. Assuming that we use $\kappa - \epsilon$ secret keys for encryption, there must exist two OOCs sharing the same key. Then, the cloud may infer the distribution of values in plaintext tuples in the aforementioned way. In this case, besides the order, the distribution of the values in a dimension is also leaked, which deviates from the security model defined in Section 3.2. Therefore, to realize the predefined security requirements, the number of keys for pairs of sums in a dimension should be at least $\kappa$.

B Proof of Theorem 2

Let the leakage function collection be $\mathcal{L} = (\mathcal{L}_{Encrypt}, \mathcal{L}_{Query}, \mathcal{L}_{Insert}, \mathcal{L}_{Delete})$.

Assumption. The real game $\mathrm{Game}_{\mathcal{R},\mathcal{A}}$ strictly follows the presented SCALE construction. In the ideal game $\mathrm{Game}_{\mathcal{S},\mathcal{A}}$, hash functions are replaced by random oracles. Concretely, $\mathcal{S}$ maintains several hash tables for the adopted ORE scheme and another hash table consisting of tuples $(in, out) \in (\{0,1\}^* \times \{0,1\}^*) \times \{0,1\}^\lambda$. The first hash tables work as stated in [19], and the second hash table works as shown in Algorithm 6. Specifically, $h$ in Section 4.1 is replaced by $RO$, which works as shown in Algorithm 6. The rest of the ideal secure skyline query follows the presented SCALE construction.
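The behaviour of the random oracle that replaces $h$ in the ideal game can be pictured as a lazily filled table: a fresh random string is sampled the first time an input is queried and the same string is returned afterwards. The sketch below is only an illustration; the class name `RandomOracle`, the default output length of 128 bits for $\lambda$, and the use of byte strings as inputs are assumptions made for the example.

```python
import secrets


class RandomOracle:
    """Lazily sampled random function from byte strings to lambda-bit strings."""

    def __init__(self, out_bits: int = 128):
        if out_bits % 8 != 0:
            raise ValueError("out_bits must be a multiple of 8")
        self._out_bytes = out_bits // 8
        self._table = {}  # stores the (in, out) pairs answered so far

    def query(self, data: bytes) -> bytes:
        # Answer repeated queries consistently; sample fresh output otherwise.
        if data not in self._table:
            self._table[data] = secrets.token_bytes(self._out_bytes)
        return self._table[data]

    def __len__(self) -> int:
        return len(self._table)
```

Two distinct inputs collide only if their independently sampled outputs happen to coincide, which is the event bounded in the analysis that follows.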

Analysis.

Notably, the leakage function of the adopted ORE scheme is L BLK . Then, byintroducing the conclusion in [19], the advantage for any probabilistic polynomial-timeadversary breaking ORE is | Pr[Game

ORE R , A ( λ ) = 1] − Pr[Game

ORE S , A ( λ ) = 1] | ≤ negl ORE ( λ ) . By convention, the security and information leakage is derived fourfold as follows.In the following convention, we begin by the case that p i is one-dimensional. Then, theconclusion for multi-dimensional can be facilely generalized since the secret keys foreach dimension are completely independently selected. lgorithm 6 Work ﬂow of random oracle (RO).

Require: in Ensure: out if ∃ ( in i , out i ) ∈ RO such that in i = in then

2: Return out i ;3: else out R ←− { , } λ ;5: Add ( in, out ) into RO ;6: Return out ;7: end if Encrypt S . In the process of database encryption, the database is originally sorted inascending order.On one hand, for any p i , p j ∈ P and i = j , the result of p i + p j is encrypted byORE scheme. During the encryption, ⌈ ∗ n − ⌉ secret keys are independently selected tominimize the information leakage. By convention of the encryption for pairs of tuples asillustrated in Fig. 5, there are no valid encrypted pairs of tuple such that p i + p j < p k + p ℓ and i < k < j < ℓ . In such case, for each OOC of encrypted pairs of tuples, there isno effective algorithm to infer any distributions in a single

OOC . The leakage functionfor t -th OOC is L BLK ( X ) such that X = { Enc ( p t + p i ) | t < i < n − t + 1 } ∪{ Enc ( p i + p ( n − t +1) ) | t < i < n − t + 1 } . Additionally, all OOC s are encrypted withindependent secret keys. Hence, the complete leakage function is L BLK ( X ) , where X = ∪ ( ⌈ ∗ n − ⌉ ) t =1 ( Y t ) and Y t = { Enc ( p t + p j ) | t < j < n − t + 1 } ∪ { Enc ( p j + p ( n − t +1) ) | t < j < n − t + 1 } . Because, such a leakage cannot put any effect onbreaking ORE, the advantage for any probabilistic polynomial-time adversary breaking SCALE from this point of view is less than or equal to negl

ORE ( λ ) .On the other hand, the searching process is speedup by introducing h as shownin Section 4.1. In the simulation, h is replaced by RO . The input of RO is Enc ( p i + p j ) , and the output is a random number with λ bits. If there exist k and ℓ such that p k + p λ = p i + p j , the probability for RO ( Enc ( p i + p j )) = RO ( Enc ( p i + p j )) is atmost poly( λ ) / λ . The largest OOC contains n − ciphertexts, so the probability ofbreaking SCALE from this perspective is at most (2 n − λ ) / λ .The probability of breaking SCALE is negl ORE ( λ ) + (2 n − λ ) / λ duringdatabase encryption. The leakage function is L BLK ( X ) , where X = ∪ ( ⌈ ∗ n − ⌉ ) t =1 ( Y t ) and Y t = { Enc ( p t + p j ) | t < j < n − t + 1 } ∪ { Enc ( p j + p ( n − t +1) ) | t < j < n − t + 1 } . Query S . During query, according to the query request encryption as shown in Algori-thm 4, for each chain of encrypted pairs of tuples, q and q are encrypted and temporar-ily inserted into the chain. Since cloud is assumed be semi-honest and will store alltemporary results ( i.e., Enc ( q ) and Enc (2 q ) ), then the leakage function is L BLK ( X ∪ Enc ( q ) ∪ Enc (2 q )) such that X = { Enc ( p t + p i | t < i < n − t + 1) } ∪ { Enc ( p i + p ( n − t +1) ) | t < i < n − t + 1 } . Hence, the complete leakage function is L BLK ( X ) ,where X = ∪ ( ⌈ ∗ n − ⌉ ) i =1 ( Y t ∪ Enc t ( q ) ∪ Enc t (2 q )) , Y t = { Enc ( p t + p j ) | t < j

ORE$(\lambda)$. Additionally, from the view of the random oracle, since no additional data are inserted into the ciphertext storage structure, the probability of breaking SCALE from this point of view is still at most $(2n - \lambda)/\lambda$. In summary, during query, the probability of breaking SCALE is $\mathrm{negl}_{\mathrm{ORE}}(\lambda) + (2n - \lambda)/\lambda$, and the leakage function is $L_{\mathrm{BLK}}(X)$, where $X = \bigcup_{t=1}^{\lceil \ast n - \rceil} (Y_t \cup \mathrm{Enc}_t(q) \cup \mathrm{Enc}_t(2q))$ and $Y_t = \{\mathrm{Enc}(p_t + p_j) \mid t < j < n-t+1\} \cup \{\mathrm{Enc}(p_j + p_{(n-t+1)}) \mid t < j < n-t+1\}$.

$\mathrm{Insert}_S$. During insertion, $n$ encrypted pairs of tuples are inserted into the chains for a single new tuple. The result follows from the same argument as for $\mathrm{Encrypt}_S$. During insertion, the probability of breaking SCALE is $\mathrm{negl}_{\mathrm{ORE}}(\lambda) + (2n - \lambda)/\lambda$, and the leakage function is $L_{\mathrm{BLK}}(X)$, where $X = \bigcup_{t=1}^{\lceil \ast n - \rceil} Y_t$ and $Y_t = \{\mathrm{Enc}(p_t + p_j) \mid t < j < n-t+2\} \cup \{\mathrm{Enc}(p_j + p_{(n-t+2)}) \mid t < j < n-t+2\}$.

$\mathrm{Delete}_S$. During deletion, $n$ encrypted pairs of tuples are removed from the chains for each deleted tuple. However, a semi-honest cloud may not remove such pairs. So the probability of breaking SCALE and the leakage function are the same as those for $\mathrm{Encrypt}_S$.

Therefore, for one-dimensional data, we can conclude that the leakage functions during the different phases are as follows:

\[
L_{\mathrm{Encrypt}} = L_{\mathrm{BLK}}(X^{(n)}), \quad
L_{\mathrm{Query}} = L_{\mathrm{BLK}}(X^{(n)\prime}), \quad
L_{\mathrm{Insert}} = L_{\mathrm{BLK}}(X^{(n+1)}), \quad
L_{\mathrm{Delete}} = L_{\mathrm{BLK}}(X^{(n)}),
\]

where $X^{(n)} = \bigcup_{t=1}^{\lceil \ast n - \rceil} Y_t$, $X^{(n)\prime} = \bigcup_{t=1}^{\lceil \ast n - \rceil} (Y_t \cup \mathrm{Enc}_t(q) \cup \mathrm{Enc}_t(2q))$ and $Y_t = \{\mathrm{Enc}(p_t + p_j) \mid t < j < n-t+1\} \cup \{\mathrm{Enc}(p_j + p_{(n-t+1)}) \mid t < j < n-t+1\}$. Then, the advantage for breaking SCALE is at most $\mathrm{negl}_{\mathrm{ORE}}(\lambda) + (2n - \lambda)/\lambda$.

In fact, each record may have $d$ dimensions. The secret keys for each dimension are selected independently, so the leakage function is the union of the leakage for each dimension, and the bound on the advantage for breaking SCALE is $d$ times that for a single dimension.

Conclusion.

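The leakage functions in the different phases are as follows:

\[
L_{\mathrm{Encrypt}} = L_{\mathrm{BLK}}\Big(\bigcup_{k=1}^{d} X^{(n)}_k\Big), \quad
L_{\mathrm{Query}} = L_{\mathrm{BLK}}\Big(\bigcup_{k=1}^{d} X^{(n)\prime}_k\Big), \quad
L_{\mathrm{Insert}} = L_{\mathrm{BLK}}\Big(\bigcup_{k=1}^{d} X^{(n+1)}_k\Big), \quad
L_{\mathrm{Delete}} = L_{\mathrm{BLK}}\Big(\bigcup_{k=1}^{d} X^{(n)}_k\Big),
\]

where $X^{(n)}_k = \bigcup_{t=1}^{\lceil \ast n - \rceil} Y^k_t$, $X^{(n)\prime}_k = \bigcup_{t=1}^{\lceil \ast n - \rceil} (Y^k_t \cup \mathrm{Enc}_t(q) \cup \mathrm{Enc}_t(2q))$ and $Y^k_t = \{\mathrm{Enc}(p_t[k] + p_j[k]) \mid t < j < n-t+1\} \cup \{\mathrm{Enc}(p_j[k] + p_{(n-t+1)}[k]) \mid t < j$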

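To make the index set behind each $Y_t$ concrete, the following Python sketch enumerates the index pairs $(a, b)$ for which $\mathrm{Enc}(p_a + p_b)$ appears in $Y_t$, following the set definition used in the proof for one dimension. It is an illustration only: the function name `chain_pair_indices` and the use of 1-based integer indices are assumptions made here, no encryption is performed, and the range of $t$ in the outer union is not modelled.

```python
from typing import List, Tuple


def chain_pair_indices(n: int, t: int) -> List[Tuple[int, int]]:
    """Index pairs (a, b) such that Enc(p_a + p_b) belongs to Y_t.

    Hypothetical helper for illustration; indices are 1-based as in the proof.
    """
    last = n - t + 1
    inner = range(t + 1, last)          # all j with t < j < n - t + 1
    left = [(t, j) for j in inner]      # pairs from {Enc(p_t + p_j)}
    right = [(j, last) for j in inner]  # pairs from {Enc(p_j + p_(n-t+1))}
    return left + right


if __name__ == "__main__":
    # Print the index pairs leaked by the first chain of a 6-tuple database.
    for pair in chain_pair_indices(6, 1):
        print(pair)
```

For $n = 6$ and $t = 1$ the sketch lists the pairs $(1,2), \dots, (1,5)$ followed by $(2,6), \dots, (5,6)$, which are the sums whose ciphertexts the leakage function exposes for that chain.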
1. Adel’Son-Vel’Skii, G.M., Landis, E.M.: An algorithm for the organization of information. Dokl. Akad. Nauk SSSR (2), 263–266 (1962)
2. Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Order preserving encryption for numeric data. In: SIGMOD. pp. 563–574. ACM (2004)
3. Alrifai, M., Skoutas, D., Risse, T.: Selecting skyline services for qos-based web service composition. In: WWW. pp. 11–20 (2010)
4. Boldyreva, A., Chenette, N., Lee, Y., O’Neill, A.: Order-preserving symmetric encryption. In: EUROCRYPT. pp. 224–241 (2009)
5. Boldyreva, A., Chenette, N., O’Neill, A.: Order-preserving encryption revisited: Improved security analysis and alternative solutions. In: CRYPTO. pp. 578–595 (2011)
6. Boneh, D., Lewi, K., Raykova, M., Sahai, A., Zhandry, M., Zimmerman, J.: Semantically secure order-revealing encryption: Multi-input functional encryption without obfuscation. In: EUROCRYPT. pp. 563–594 (2015)
7. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE. pp. 421–430 (2001)
8. Bothe, S., Cuzzocrea, A., Karras, P., Vlachou, A.: Skyline query processing over encrypted data: An attribute-order-preserving-free approach. In: PSBD@CIKM. pp. 37–43 (2014)
9. Chatterjee, S., Das, M.P.L.: Property preserving symmetric encryption revisited. In: ASIACRYPT. pp. 658–682 (2015)
10. Chen, W., Liu, M., Zhang, R., Zhang, Y., Liu, S.: Secure outsourced skyline query processing via untrusted cloud service providers. In: INFOCOM. pp. 1–9 (2016)
11. Chenette, N., Lewi, K., Weis, S.A., Wu, D.J.: Practical order-revealing encryption with limited leakage. In: FSE. pp. 474–493 (2016)
12. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: ICDE. pp. 717–719 (2003)
13. Dellis, E., Seeger, B.: Efficient computation of reverse skyline queries. In: VLDB. pp. 291–302 (2007)
14. Gentry, C.: A fully homomorphic encryption scheme. Stanford University (2009)
15. Kerschbaum, F.: Frequency-hiding order-preserving encryption. In: CCS. pp. 656–667. ACM (2015)
16. Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: an online algorithm for skyline queries. In: VLDB. pp. 275–286 (2002)
17. Kriegel, H., Renz, M., Schubert, M.: Route skyline queries: A multi-preference path planning approach. In: ICDE. pp. 261–272 (2010)
18. Kung, H.T., Luccio, F., Preparata, F.P.: On finding the maxima of a set of vectors. J. ACM (4), 469–476 (1975)
19. Lewi, K., Wu, D.J.: Order-revealing encryption: New constructions, applications, and lower bounds. In: CCS. pp. 1167–1178 (2016)
20. Liu, J., Yang, J., Xiong, L., Pei, J.: Secure skyline queries on cloud platform. In: ICDE. pp. 633–644 (2017)
21. Liu, J., Yang, J., Xiong, L., Pei, J.: Secure and efficient skyline queries on encrypted data. CoRR abs/1806.01168 (2018)
22. Mullesgaard, K., Pederseny, J.L., Lu, H., Zhou, Y.: Efficient skyline computation in mapreduce. In: EDBT. pp. 37–48 (2014)
23. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: EUROCRYPT. pp. 223–238 (1999)
24. Pandey, O., Rouselakis, Y.: Property preserving symmetric encryption. In: EUROCRYPT. pp. 375–391 (2012)
25. Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: SIGMOD. pp. 467–478 (2003)
26. Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. (1), 41–82 (2005)
27. Park, Y., Min, J., Shim, K.: Efficient processing of skyline queries using mapreduce. IEEE Trans. Knowl. Data Eng. (5), 1031–1044 (2017)
28. Popa, R.A., Li, F.H., Zeldovich, N.: An ideal-security protocol for order-preserving encoding. In: SP. pp. 463–477 (2013)
29. Preparata, F.P., Shamos, M.I.: Computational Geometry - An Introduction. Springer (1985)
30. Sun, W., Zhang, N., Lou, W., Hou, Y.T.: When gene meets cloud: Enabling scalable and efficient range query on encrypted genomic data. In: INFOCOM. pp. 1–9 (2017)
31. Zhou, X., Li, K., Zhou, Y., Li, K.: Adaptive processing for distributed skyline queries over uncertain data. IEEE Trans. Knowl. Data Eng. 28