An Efficient Secure Dynamic Skyline Query Model
Weiguo Wang, Hui Li, Yanguo Peng, Sourav S Bhowmick, Peng Chen, Xiaofeng Chen, Jiangtao Cui
School of Cyber Engineering, Xidian University, China
{wgwang,pchen97}@stu.xidian.edu.cn, {hli,xfchen}@xidian.edu.cn
School of Computer Science and Technology, Xidian University, China
{ygpeng,cuijt}@xidian.edu.cn
School of Computer Science and Engineering, Nanyang Technological University, Singapore
[email protected]
Abstract.
It is now cost-effective to outsource large datasets and perform queries over the cloud. However, in this scenario, there exist serious security and privacy issues, as sensitive information contained in the dataset can be leaked. The most effective way to address this is to encrypt the data before outsourcing. Nevertheless, it remains a grand challenge to process queries over ciphertext efficiently. In this work, we focus on solving one representative query task, namely the dynamic skyline query, in a secure manner over the cloud. This query is difficult to perform on encrypted data, as its dynamic domination criteria require both subtraction and comparison, which cannot be directly supported by a single encryption scheme efficiently. To this end, we present a novel framework called SCALE. It works by transforming traditional dynamic skyline domination into pure comparisons. The whole process can be completed in a single round of interaction between the user and the cloud. We theoretically prove that the outsourced database, query requests, and returned results are all kept secret under our model. Moreover, we also present an efficient strategy for dynamic insertion and deletion of stored records. An empirical study over a series of datasets demonstrates that our framework improves the efficiency of query processing by nearly three orders of magnitude compared to the state-of-the-art.
Keywords: skyline, secure, cloud, query
⋆ Corresponding author: Hui Li, [email protected]
With the rapid expansion in data volumes, many individuals and organizations are increasingly inclined to outsource their data to public cloud services, since they provide a cost-effective way to support large-scale data storage and query processing. As a major type of query and a fundamental building block for various applications, the skyline query [7] has become an important topic in database research for extracting interesting objects from multi-dimensional datasets. Skyline query processing is widely adopted in many applications that require multi-criteria decision making, such as market research [13], location-based systems [17], web services study [3], etc. The skyline operator selects a set of interesting points from a large set of points based on a group of evaluation criteria. A point is considered interesting if there does not exist a point that is at least as good in all criteria and better in at least one criterion.
However, similar to other types of query, outsourcing a skyline query workload to a public cloud will inevitably raise privacy issues. Since a real-world database may often contain sensitive information such as personal electronic mails, health records, financial transactions, etc., a cloud service provider may illegally spy on the data and invade the privacy of the data owner and users.
In this paper, we focus on the problem of secure skyline querying on the cloud platform, aiming to protect the security of the outsourced data, query requests and results. Secure query processing on encrypted data is an important research field in outsourced computation and has been extensively studied in recent years [30,20]. For instance, fully homomorphic encryption schemes [14] ensure strong security while enabling arbitrary computations on the encrypted data. Modular order-preserving encryption [5,9] provides an intuitive security model which supports comparison over the ciphertext without decryption.
Despite the promising achievements in the area of secure query processing, it remains a grand challenge to process dynamic skyline queries over ciphertext, where the skyline operator is executed with respect to some query point [25]. The main reason is as follows. Given a query request, a dynamic skyline query requires performing both comparison and distance evaluation online simultaneously. Unfortunately, this task cannot be accomplished efficiently over ciphertext via existing encryption schemes.
For instance, suppose that a medical institution wishes to outsource its electronic diabetes records to some public cloud service without leaking the content of the records to the cloud server. An electronic diabetes record consists of a series of attributes, including ID, age, FBGL (fasting blood glucose level), etc. Let $P = \{p_1, \ldots, p_n\}$ denote a set of electronic diabetes records. When the medical institution receives a new record $q$, it expects the cloud server to retrieve a similar record to enhance and personalize the treatment for the new patient $q$. However, it is usually difficult or even impossible to uniformly assign weights to all the attributes in order to return the nearest neighbor (e.g., $p_i$ is the nearest if only age is involved, while $p_j$ is the nearest if only FBGL is taken into account). In light of that, the dynamic skyline query provides all possible Pareto-optimal records, i.e., those that are not dominated by any other record. Given a query $q$, we can compute the difference between each attribute of $p_i$ and $q$. Let $t_i$ be the difference tuple between $p_i$ and $q$, with $t_i[j] = |p_i[j] - q[j]|$ for each dimension $j$. An object $t_i$ dominates $t_j$ if it is better than $t_j$ in at least one dimension and not worse in every other dimension. If an object is not dominated by any other object, it is one of the skyline points that need to be returned. As shown in Fig. 1, there are five patient records $p_1, \ldots, p_5$.
Given a query record $q$, we calculate $t_1, \ldots, t_5$ and can easily identify two of them as the skyline points. The two corresponding records are therefore the results for the dynamic skyline query w.r.t. $q$.
Fig. 1: Dynamic Skyline Query Example. (Axes: age vs. fasting blood glucose level (mmol/L); the plot shows the raw samples $p_1, \ldots, p_5$, the query $q$, and the projections $t_1, \ldots, t_5$ w.r.t. $q$.)
Notably, in the above example, a dynamic skyline query requires performing both subtraction and comparison online. As there is no practical encryption scheme supporting both operators over ciphertext, the existing model employs secure multiparty computation over at least two third-party non-colluding clouds and processes the query with multiple rounds of interaction. In this work, we present a novel framework called SCALE (SeCure dynAmic skyLine quErying) by transforming the traditional skyline domination criteria, which require both subtraction and comparison, into comparison only. In this way, we are able to present a new scheme that supports dynamic skyline queries over ciphertext without any help from a second cloud and can be completed in a single round of interaction between the user and the cloud. We theoretically prove that the outsourced database, query requests, and returned results are all kept secret under our model. An empirical study over four datasets, including both synthetic and real-world ones, demonstrates that our framework outperforms a state-of-the-art method by nearly three orders of magnitude. Notably, as a special case of the dynamic skyline query, skyline computation can also be processed securely and efficiently under our model (with trivial modification).
In summary, this work makes the following contributions.
– We propose a new scheme to encrypt the outsourced database and query request. Based on the scheme, a dynamic skyline query can be answered without decrypting the database or the query. Within the scheme, the cloud server and data user need only one interaction during the query.
– We theoretically prove that our model is secure if the cloud is curious-but-honest.
– In addition to the secure query scheme, we also present an efficient mechanism for modifications over existing database records, including insertion, deletion and updating.
– We also theoretically show that the skyline points can be computed efficiently and correctly.
– An empirical study over both synthetic and real-world datasets shows that our model is superior to the state-of-the-art w.r.t. the query response time.
The rest of this paper is organized as follows. In Section 2 we conduct a literature review of the related work. In Section 3 we formally present the problem definition and system model for this work.
The detailed designs of the encryption scheme and query framework are discussed in Section 4. The empirical study and corresponding results are shown in Section 5. In Section 6, we conclude this work.
Related Work
The skyline query is particularly important for several applications involving multi-criteria decision making. The computation of the skyline is equivalent to the maximal vector problem in computational geometry [18,29], or equivalently the Pareto optimal set problem [29] in operations research. Since [18] first studied the method and complexity of skyline computation (i.e., static skyline query), it has been extensively studied in the database field. Block-Nested-Loop [7], Divide-and-Conquer [12], Nearest Neighbor [16], Branch-and-Bound Skyline [25] and a series of subsequent works have progressively improved the efficiency of the general version of skyline computation [22,27,31].
A dynamic skyline query is a variation of skyline computation that was first introduced in [25,26]. Instead of computing the skyline points purely from the given dataset, a dynamic skyline query returns the points that are not dominated by any others with respect to $q$. In other words, skyline computation can be viewed as a special case of the dynamic skyline query where $q$ is fixed as the origin and only comparison (without distance evaluation) is required.
With the development of cryptography, encryption technology is gradually being applied in the database field.
Bothe et al. [8] presented an approach for skyline computation over encrypted data. It provided an efficiency analysis and empirical study for computing skyline points and decrypting the results. However, it failed to provide any formal security guarantee. Moreover, as discussed above, skyline computation only requires comparison without distance computation; processing it in the encrypted domain can be easily performed through Order-Preserving Encryption [2] or Order-Revealing Encryption [5,9]. Another work [10] proposed three novel schemes that enable efficient verification of skyline query results returned by an untrusted cloud server. Their work focuses on verification rather than privacy issues, and does not work on ciphertext. It is orthogonal to the scope of this paper.
Liu et al. [20] proposed the first semantically secure protocol for dynamic skyline queries over the cloud platform. The scheme adopts both the Paillier cryptosystem [23] and Secure Multi-party Computation (SMP) as building blocks. Although it is proved to be semantically secure, the protocol suffers from a huge computation cost and a strict system model. In fact, for a query framework, the response time is the most important issue for the success of the application, but the performance of [20] is far from satisfactory in this respect.
Note: In the remainder of this work, we shall refer to the static skyline query as skyline computation.
Order-Revealing Encryption
Order-Preserving Encryption (OPE) [2], whose ciphertexts preserve the original ordering of the plaintexts, has been extensively applied to range queries over encrypted databases.
The ideal security goal for an order-preserving scheme, IND-OCPA [4], is to reveal no additional information about the plaintext values besides their order. Boldyreva et al. [4] were the first to provide a rigorous solution to the problem. They settled on a weaker security guarantee, which was shown to leak at least half of the plaintext bits [5,9]. Popa et al. [28] presented an ideal-secure order-preserving encoding scheme. [15] showed how to achieve the even stronger notion of frequency-hiding OPE. However, these ideal-secure OPE schemes require multiple rounds of interaction between client and server.
To improve the security of OPE, Boneh et al. [6] presented Order-Revealing Encryption (ORE) schemes, another method for circumventing the lower bound deduced by Boldyreva et al. [4]. In an ORE scheme, the numerical order of two ciphertexts does not necessarily reflect that of the original messages as it does in OPE. Instead, the order of the original messages can only be decided by a carefully designed function over the corresponding ciphertexts. Pandey and Rouselakis [24] previously considered this type of relaxation in the context of property-preserving encryption. In a property-preserving encryption scheme, there is a publicly computable function that can be evaluated on ciphertexts to determine the value of some property of the underlying plaintexts. OPE can thus be viewed as a property-preserving encryption scheme where the computable function is the comparison operation. Pandey and Rouselakis introduced and explored several indistinguishability-based notions of security for property-preserving encryption. However, they did not construct an order-revealing encryption scheme. Chenette et al. [11] built an efficiently implementable order-revealing encryption scheme based on pseudorandom functions. Lewi et al. [19] improved the above scheme. The ORE scheme in [19] is adopted in this work, and it will be discussed further in Section 4.
In this section, we shall introduce a group of key concepts for skyline queries, and then describe the system and security models of this paper. For ease of discussion, the key notations used throughout this paper are summarized in Table 1.
In this part, we shall introduce a series of key concepts of the skyline problem that are important for the following discussion.
Definition 1 (Domination). Given two points $p_\alpha, p_\beta$ in $d$-dimensional space, we say $p_\alpha$ dominates $p_\beta$ (denoted by $p_\alpha \prec p_\beta$) if $\forall i \in \{1, \ldots, d\}$, $p_\alpha[i] \le p_\beta[i]$, and $\exists i \in \{1, \ldots, d\}$, $p_\alpha[i] < p_\beta[i]$.
Definition 2 (Skyline Computation). Given a dataset $P = \{p_1, \ldots, p_n\}$ in $d$-dimensional space, skyline computation returns the point set $S \subseteq P$ such that $\forall p \in S$, $\nexists p' \in P$ such that $p' \prec p$ (i.e., for all $p \in S$ and $p' \in P$, $p'$ cannot dominate $p$).

Table 1: Summary of notations
Notation                          Definition
$n$                               The number of tuples in the database
$d$                               The dimension of the database
$q$                               The query tuple
$Enc(q)$                          Ciphertext of the query tuple
$Enc(2q)$                         Ciphertext of the doubled query tuple
$P = \{p_1, \ldots, p_n\}$        A database with $n$ tuples
$E(P)$                            Ciphertexts of the tuples of $P$
$E(\Phi)$                         Ciphertexts of the pairwise sums of tuples in $P$
$p_i[j]$                          The $j$-th attribute of $p_i$
$keys[\cdot]$                     The set of private keys

Definition 3 (Dynamic Domination). Given two points $p_\alpha, p_\beta$ and a query point $q$ in $d$-dimensional space, we say $p_\alpha$ dynamically dominates $p_\beta$ with respect to $q$ (denoted by $p_\alpha \prec_q p_\beta$) if $\forall i \in \{1, \ldots, d\}$, $|p_\alpha[i] - q[i]| \le |p_\beta[i] - q[i]|$, and $\exists i \in \{1, \ldots, d\}$, $|p_\alpha[i] - q[i]| < |p_\beta[i] - q[i]|$.
Definition 4 (Dynamic Skyline Query). Given a dataset $P = \{p_1, \ldots, p_n\}$ and a query $q$ in $d$-dimensional space, a dynamic skyline query returns the set $S \subseteq P$ such that $\forall p \in S$, $\nexists p' \in P$ such that $p' \prec_q p$ (i.e., for all $p \in S$ and $p' \in P$, $p'$ cannot dynamically dominate $p$ with respect to $q$).
A common algorithm (i.e., BNL [7]) for the dynamic skyline query is shown in Algorithm 1. It first calculates the differences (i.e., $t_i$) between each tuple (i.e., $p_i$) and the query request (i.e., $q$) in every dimension (Lines 1-3). When a tuple $p_i$ is read from $P$, it is added to $S$ if $S$ is empty (Lines 5-6). Otherwise, we compare $p_i$'s difference tuple with respect to $q$, namely $t_i$, with that of each tuple in $S$. In case $t_i \prec t_j$, where $p_j \in S$, we delete $p_j$ from $S$. If there is no $p_j \in S$ such that $t_j \prec t_i$, we add $p_i$ to $S$ (Lines 10-11, 16-18). The algorithm repeats this process for the remaining tuples in $P$, and finally returns $S$ (Line 21).
We shall use this as the basis for our secure skyline model. Notably, this is not the most efficient algorithm for plaintext skyline queries. We select this method as our building block for the following reasons. Firstly, the state-of-the-art solution for secure dynamic skyline queries is [20], which adopts BNL [7] as the basic building block. In line with it and to make a fair comparison, our solution is constructed according to the same query framework. Secondly, BNL is a common and popular iterative algorithm for answering dynamic skyline queries in plaintext. Thirdly, as discussed in Section 1, the key challenge in secure dynamic skyline querying lies in performing both subtraction and comparison over ciphertext. A secure model building on any other (plaintext) dynamic skyline query algorithm inevitably has to address that as well. In other words, although our solution in this work adopts Algorithm 1 as the foundation, it can be easily adapted to other (plaintext) dynamic skyline query algorithms.

Algorithm 1 Basic Skyline Query Algorithm
Require: The dataset $P$ and a query tuple $q$
Ensure: The result set of skyline points $S$
 1: for $i$ in $1, \ldots, n$ and $j$ in $1, \ldots, d$ do
 2:   let $t_i[j] = |p_i[j] - q[j]|$
 3: end for
 4: for $i$ in $1, \ldots, n$ do
 5:   if $S$ is empty then
 6:     add $p_i$ to $S$
 7:   else
 8:     flag ← True
 9:     for each $p_j \in S$ do
10:       if $t_j \prec t_i$ then
11:         flag ← False
12:       else if $t_i \prec t_j$ then
13:         delete $p_j$ from $S$
14:       end if
15:     end for
16:     if flag == True then
17:       add $p_i$ to $S$
18:     end if
19:   end if
20: end for
21: return $S$

Our system model involves three types of participants: a data owner, a cloud server and a group of query users. The cloud server is assumed to have large storage and computation capacity, and it provides outsourced storage and computation services. As Fig. 2 shows, the data owner employs the cloud service and stores his private database on the cloud server. To preserve data privacy, the data owner encrypts his dataset and only outsources the encrypted dataset to the cloud. Every query user may submit a query point (i.e., $q$) to the system. The query request is encrypted locally before being sent to the cloud server. Then, the cloud server performs the dynamic skyline query over the encrypted database and query request without decryption. Afterwards, it returns the encrypted results to the user. Finally, the user decrypts these results using their own private keys.
Security model
We parameterize the security model by a collection of leakage functions $\mathcal{L} = (\mathcal{L}_{Encrypt}, \mathcal{L}_{Query}, \mathcal{L}_{Insert}, \mathcal{L}_{Delete})$. The functions describe what information the protocol leaks to the adversary. The definition ensures that the scheme does not reveal any information beyond what can be inferred from the leakage functions.
Fig. 2: The system model of secure skyline query. (Participants: Data Owner, Query Users, Cloud Server. The data owner encrypts the dataset with its keys and sends the encrypted dataset to the cloud server; query users send registration information to the data owner and receive keys; a query user encrypts the query data and sends it to the cloud server, which runs the skyline query and returns the encrypted skyline points.)
We define two games $Game_{\mathcal{R},\mathcal{A}}$ and $Game_{\mathcal{S},\mathcal{A}}$ as follows. The adversary $\mathcal{A}$ repeatedly encrypts data and queries skyline points, and receives the transcripts generated by the Encrypt() and Query() algorithms in the real game $Game_{\mathcal{R},\mathcal{A}}$, or receives the transcripts generated by the simulators $\mathcal{S}(\mathcal{L}_{Encrypt})$ and $\mathcal{S}(\mathcal{L}_{Query})$ in the ideal game $Game_{\mathcal{S},\mathcal{A}}$. Eventually, $\mathcal{A}$ outputs a bit: 0 ($Game_{\mathcal{R},\mathcal{A}}$) or 1 ($Game_{\mathcal{S},\mathcal{A}}$).
Definition 5 (Adaptively secure). A scheme is $\mathcal{L}$-adaptively-secure if for all probabilistic polynomial-time algorithms $\mathcal{A}$, there exists an efficient simulator $\mathcal{S}$ such that the following holds:
$\left| \Pr[Game_{\mathcal{R},\mathcal{A}}(\lambda) = 1] - \Pr[Game_{\mathcal{S},\mathcal{A}}(\lambda) = 1] \right| \le negl(\lambda)$.
Design goals
Our design goals cover both efficiency and privacy, the latter including database privacy, query privacy, and result privacy. The details are as follows.
– Data owners need to encrypt the database before it is sent to the cloud server, so that the content of the database is not leaked to the cloud server.
– The query request, as well as the results, should not be revealed to the cloud server throughout query processing.
– For a query processing framework, efficiency should be considered one of the most important measures of its success. Although the entire query processing is performed over ciphertext here, the additional cost associated with it should be minimized.
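To make Definitions 3 and 4 and the BNL procedure of Algorithm 1 concrete, the following minimal plaintext sketch in Python may help. It is an illustration only and not part of SCALE: no encryption is involved, and the function names (diff_tuple, dominates, dynamic_skyline) as well as the sample records are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def diff_tuple(p: Point, q: Point) -> List[float]:
    """t[j] = |p[j] - q[j]| for every dimension j (Lines 1-3 of Algorithm 1)."""
    return [abs(pj - qj) for pj, qj in zip(p, q)]


def dominates(t_a: Point, t_b: Point) -> bool:
    """Definition 1: t_a is no worse in every dimension and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(t_a, t_b))
    strictly_better = any(x < y for x, y in zip(t_a, t_b))
    return no_worse and strictly_better


def dynamic_skyline(points: Sequence[Point], q: Point) -> List[int]:
    """Block-nested-loop dynamic skyline; returns indices of the skyline points."""
    skyline = []  # pairs (index, difference tuple) that are mutually non-dominated
    for i, p in enumerate(points):
        t_i = diff_tuple(p, q)
        # p_i is discarded if some current skyline member dominates it.
        if any(dominates(t_j, t_i) for _, t_j in skyline):
            continue
        # Otherwise p_i evicts every member it dominates and joins the skyline.
        skyline = [(j, t_j) for j, t_j in skyline if not dominates(t_i, t_j)]
        skyline.append((i, t_i))
    return [i for i, _ in skyline]


if __name__ == "__main__":
    # Hypothetical (age, FBGL) records and query, for illustration only.
    records = [(30, 6.0), (45, 9.5), (52, 7.2), (60, 5.5), (38, 11.0)]
    query = (40, 7.0)
    print(dynamic_skyline(records, query))
```

Setting the query to the origin on non-negative data reduces this sketch to plain skyline computation, which matches the special-case remark made in Section 2.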
The SCALE Framework
In this section, we shall introduce the SCALE framework for secure dynamic skyline querying under the proposed system model. As discussed above, processing a dynamic skyline query for a query point $q$ requires performing both subtraction and comparison. Addressing both tasks in ciphertext form is challenging, as there is no practical encryption scheme that supports both operations simultaneously.
To address this challenge, we reinvestigate the entire dynamic skyline query workflow described in Definition 3 and Algorithm 1. Our investigation reveals an important fact that leads to an effective solution. Notably, to answer a dynamic skyline query for a request $q$, quantifying the differences between each point $p_i$ and $q$ in all dimensions is not mandatory. Instead, what we need is the relative order of such differences for a group of different $p_i$.
Observation 1. In order to evaluate whether $p_\alpha$ dynamically dominates $p_\beta$ with respect to $q$, we do not need to know the exact values of the difference vectors $T_\alpha$ and $T_\beta$, where $T_i[j] = |p_i[j] - q[j]|$ for $j \in [1, \ldots, d]$. In fact, what we really need to know is whether $T_\alpha[j] \le T_\beta[j]$ or $T_\alpha[j] < T_\beta[j]$ for $j \in [1, \ldots, d]$. Put simply, for an arbitrary dimension $j$, we need to know whether $p_\alpha[j]$ or $p_\beta[j]$ is closer to $q[j]$. To answer that, we have to consider two possible cases depending on whether $q[j]$ falls in the interval between $p_\alpha[j]$ and $p_\beta[j]$. Fig. 3a and Fig. 3b depict the two cases. In the case of Fig. 3a, the order between $T_\alpha[j]$ and $T_\beta[j]$ can be interpreted as the relationship between $p_\alpha[j]$ and $p_\beta[j]$. In the case of Fig. 3b, the order between $T_\alpha[j]$ and $T_\beta[j]$ can be interpreted as the relationship between $p_\alpha[j] + p_\beta[j]$ and $q[j] + q[j]$.
From the above, we notice that the multi-type-operation requirement (i.e., both subtraction and comparison) of the dynamic skyline query can be transformed into a uni-type-operation requirement involving only comparison. Inspired by this critical point, current encryption schemes that support comparison over ciphertext can be adopted in our framework to realize our design goals.
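The case analysis of Observation 1 can be checked on plaintext with a short Python sketch. The key step is that, when $p_\alpha[j] \le q[j] \le p_\beta[j]$, the condition $q[j] - p_\alpha[j] \le p_\beta[j] - q[j]$ is equivalent to $2q[j] \le p_\alpha[j] + p_\beta[j]$, so no subtraction is needed. The helper names (cmp, closer, dyn_dominates) are hypothetical, and cmp is only a plaintext stand-in for a comparison that, in an encrypted setting, would be evaluated over ciphertexts.

```python
def cmp(a, b):
    """Three-way comparison: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)


def closer(p_a, p_b, q, pair_sum, double_q):
    """Return -1, 0 or 1 if p_a is closer to, as close as, or farther from q than p_b.

    Only comparisons are used; pair_sum = p_a + p_b and double_q = 2 * q are
    assumed to be supplied by the caller as precomputed values.
    """
    order = cmp(p_a, p_b)
    if order == 0:
        return 0
    lo, hi = (p_a, p_b) if order < 0 else (p_b, p_a)
    outside = cmp(q, lo) < 0 or cmp(q, hi) > 0  # q is not between the two values
    if order < 0:  # p_a < p_b
        # Outside: the endpoint on q's side wins. Inside: compare 2q with the sum.
        return cmp(q, p_a) if outside else cmp(double_q, pair_sum)
    # p_a > p_b: the same tests with the roles of the two points swapped.
    return cmp(p_b, q) if outside else cmp(pair_sum, double_q)


def dyn_dominates(p_a, p_b, q):
    """Dynamic domination (Definition 3) decided purely by comparisons."""
    results = [closer(a, b, x, a + b, 2 * x) for a, b, x in zip(p_a, p_b, q)]
    return all(r <= 0 for r in results) and any(r < 0 for r in results)
```

A brute-force check against the definition based on absolute differences, for example over a small integer grid, is an easy way to gain confidence in the case split.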
In our scheme, we adopt a state-of-the-art encryption scheme that supports comparison, namely order-revealing encryption [19]. We first present the formal definition of order-revealing encryption.
Definition 6 (Order-Revealing Encryption). An order-revealing encryption (ORE) scheme [19] is a tuple of three algorithms, Setup, Encrypt and Compare, defined over a well-ordered domain $D$ with the following properties:
– Setup$(1^\lambda) \to sk$: On input a security parameter $\lambda$, the setup algorithm outputs a secret key $sk$.
– Encrypt$(sk, m) \to ct$: On input a secret key $sk$ and a message $m \in D$, the encryption algorithm outputs a ciphertext $ct$.
– Compare$(ct_1, ct_2) \to b$: On input two ciphertexts $ct_1, ct_2$, the compare algorithm outputs a value $b \in \{-1, 0, 1\}$.
Fig. 3: Cases for the relationship between $q$ and $(p_\alpha, p_\beta)$. (a) Case 1: $q[j]$ lies outside the interval between $p_\alpha[j]$ and $p_\beta[j]$. (b) Case 2: $q[j]$ lies inside the interval.

Algorithm 2 SecureCompare Algorithm
Require: The ORE ciphertexts $Enc(p_\alpha[j])$, $Enc(p_\beta[j])$, $Enc(q[j])$, as well as $Enc(p_\alpha[j] + p_\beta[j])$, $Enc(2q[j])$.
Ensure: The comparison result $-1$, $0$, $1$, denoting that $p_\alpha[j]$ is closer to (resp., equally close to, farther from) $q[j]$ than $p_\beta[j]$.
 1: if ORE.Compare$(Enc(p_\alpha[j]), Enc(p_\beta[j])) == 0$ then
 2:   return 0
 3: else if ORE.Compare$(Enc(p_\alpha[j]), Enc(p_\beta[j])) == -1$ then
 4:   if $Enc(q[j])$ falls outside the interval then
 5:     return ORE.Compare$(Enc(q[j]), Enc(p_\alpha[j]))$
 6:   else
 7:     return ORE.Compare$(Enc(2q[j]), Enc(p_\alpha[j] + p_\beta[j]))$
 8:   end if
 9: else
10:   if $Enc(q[j])$ falls outside the interval then
11:     return ORE.Compare$(Enc(p_\beta[j]), Enc(q[j]))$
12:   else
13:     return ORE.Compare$(Enc(p_\alpha[j] + p_\beta[j]), Enc(2q[j]))$
14:   end if
15: end if

With the help of the ORE scheme, evaluating the dynamic domination relation between $p_\alpha$ and $p_\beta$ can be carried out securely in ciphertext form, as outlined in Algorithm 2. For ease of subsequent discussion, we shall denote by $Enc(x)$ the ORE ciphertext of the original message $x$.
Minimizing the number of keys
Following Observation 1, a data owner needs to encrypt the database $P$ and the sum of any two tuples in $P$ in each dimension, namely $p_\alpha[j] + p_\beta[j]$, where $\alpha \neq \beta$, $\alpha, \beta \in [1, n]$, $j \in [1, d]$. The two resulting sets of ciphertexts are denoted as $E(P)$ and $E(\Phi)$, respectively. In this step, if we use the same private key for both $E(P)$ and $E(\Phi)$, the sums of paired tuples in $E(\Phi)$, although encrypted, will leak more information about the plaintext beyond the order.
For example, assume that $P$ contains five tuples, whose values in a particular dimension are $a, b, c, d, e$, respectively. Suppose that after sorting the values in ascending order, we get $b, c, a, e, d$. Then their sums can be listed as $b+c, b+a, b+e, b+d, c+a, c+e, c+d, a+e, a+d, e+d$. For ease of discussion, in the following we shall refer to these values as pairs of sums. If we encrypt these pairs of sums using the same key as $E(P)$, an attacker can obtain the ordering of their plaintexts. He may therefore learn that $b + e \le c + a$, and then infer that $e - a \le c - b$. In this way, besides the order, the distribution of values in the plaintext tuples is also leaked.
Fig. 4: A novel encryption scheme for pairs of tuples. (The figure shows the sorted data $b, c, a, e, d$, the ten pairwise sums arranged in rows $b+c, b+a, b+e, b+d$; $c+a, c+e, c+d$; $a+e, a+d$; $e+d$, the order relations that link them along lines, and the two keys used.)
However, according to the security model of this work, except for the order of tuples in some dimensions, the cloud should not be able to infer the content of the tuples. Therefore, we have to avoid leaking the distribution of the data by adopting different keys in ORE. Intuitively, an ideal method is to encrypt each pair of sums using a different key, as Algorithm 2 never requires comparing two such sums with each other. However, the increased number of keys introduces key management and storage problems. We propose a novel method to address this problem, which is shown in Fig. 4.
As shown in Fig. 4, $b, c, a, e, d$ are the sorted values of five tuples in $P$ in a particular dimension. According to Algorithm 2, these values should be encrypted using the same key, as comparisons over their ciphertexts are required. As a result, given that $Enc(b), \ldots, Enc(d)$ are encrypted using the same key under ORE, any adversary can easily infer that $b + c < b + a < b + e < b + d$, regardless of whether $b + c, \ldots, b + d$ are encrypted with different keys or not. Therefore, it is not beneficial to use multiple keys for such a group of sums.
Definition 7 (Order-Obvious Class). Given the order of a set of $n$ elements whose exact values are unknown, if the order of two summations over paired elements can be inferred, we call them Order-Obvious. All the $n(n-1)/2$ paired summations can be divided into several disjoint subsets accordingly, such that all the summations in each subset are Order-Obvious. We refer to each subset as an Order-Obvious Class (abbrev. OOC).
Generally, we can find all OOCs, which are marked by lines in Fig. 4. The relations between sums in the same OOC (e.g., line) can be inferred easily purely from $E(P)$. In light of that, we can use the same key to encrypt the sums in the same OOC, and adopt different keys across OOCs. In this way, an adversary cannot get additional information from the ciphertexts besides the order, and we can effectively minimize the number of keys. In particular, the minimum number of keys, denoted as $\kappa$ (e.g., the number of lines in Fig. 4), has to satisfy the following theorem.
Theorem 1.
In order to satisfy the predefined security model, the minimum number of encryption keys in a dimension should be $\kappa = \lceil (2n - 3)/4 \rceil$.
Proof. See Appendix A.
Remark.
Through the above strategy, we have minimized the required number of encryption keys. In spite of that, $\kappa$ still grows linearly with $n$, which may introduce a key management burden if $n$ is very large. To address this, we suggest the following implementation. For each row in Fig. 4, we assign a random $Id_i$. The data owner only needs to store one master key $mk$ and a series of random $Id_i$. Then, $key_i$ for encrypting each row is generated as $mk \oplus Id_i$. In this way, we can effectively generate $\kappa$ different keys based on $mk$.
Accessing the pairs of sums
As required by Algorithm 2, in order to compare $t_\alpha[j]$ and $t_\beta[j]$, it is always required to retrieve the ciphertext of $p_\alpha[j] + p_\beta[j]$. Therefore, it is necessary to build a map between the elements of $E(P)$ and the corresponding sums in $E(\Phi)$. That is, we need to build an index that maps $Enc(p_\alpha[j])$ and $Enc(p_\beta[j])$ to $Enc(p_\alpha[j] + p_\beta[j])$. To this end, we present an index based on a hash function. Formally, we define a hash function $h: \mathbb{N} \times \mathbb{N} \to \mathbb{N}$, where $\mathbb{N}$ denotes the set of natural numbers. The hash function $h$ should satisfy the following property: $\forall x_1, y_1, x_2, y_2 \in \mathbb{N}$, $h(x_1, y_1) = h(x_2, y_2)$ if and only if $x_1 = x_2$ and $y_1 = y_2$.
Assume the indices of $Enc(p_\alpha[j])$ and $Enc(p_\beta[j])$ in $E(P)$ are denoted as $\alpha$ and $\beta$, respectively. Then the index of $Enc(p_\alpha[j] + p_\beta[j])$ in $E(\Phi)$ can be easily acquired as $h(\alpha, \beta)$. Fig. 5 presents an example of the hash function. There are five encrypted values in $E(P)$, namely $a, \ldots, e$. The hash function in this example is simply designed as a regular traversal order over the corresponding sums. In fact, any hash function that satisfies the aforementioned property can be adopted here.
Indexing the pairs of sums
Additionally, as all the pairs of sums within a particular OOC are encrypted by ORE using the same key, we propose to use additional index structures for efficient retrieval of the corresponding entries for these pairs of sums. Therefore, we also design an index scheme for managing these ORE-encrypted pairs of sums. In SCALE, we adopt an AVL-Tree based structure to construct the index, as it provides the best efficiency when querying for a particular range. Specifically, we build an AVL-Tree to index all the encrypted sums in the same OOC. Notably, each AVL-Tree is rooted at the median of its OOC, and all the nodes in an AVL-Tree are the ciphertexts of pairs of sums in the same OOC.
For instance, given the records in Fig. 4, there are two OOCs. We shall build two different AVL-Trees for indexing the corresponding ciphertexts of each OOC, respectively. That is, the first OOC, centered at $b + d$, corresponds to an AVL-Tree rooted at $Enc(b + d)$; the other OOC, centered at $c + e$, corresponds to another AVL-Tree rooted at $Enc(c + e)$ (as shown in Fig. 5).
In fact, data structures other than the AVL-Tree can also be adopted to index the ORE ciphertexts of each OOC. We select the AVL-Tree as the default setting in SCALE as it provides the best query response time among all the choices. A detailed comparison between the AVL-Tree and other choices will be discussed in Section 4.3.
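The bookkeeping described above (an injective pair index $h$, the grouping of sums into OOCs, and per-class keys derived from a master key) can be sketched in plaintext Python as follows. This is only an illustration under stated assumptions: values are distinct, sums are kept in plaintext instead of being ORE-encrypted, a sorted list stands in for the AVL-Tree, pairs are treated as unordered, and the grouping rule in ooc_class is one partition that reproduces the two classes described for the Fig. 4 example rather than a quotation of the authors' construction. All function names are hypothetical.

```python
from itertools import combinations


def pair_index(alpha: int, beta: int, n: int) -> int:
    """Injective index of the unordered pair {alpha, beta}, 0 <= alpha != beta < n.

    Row-major traversal of the upper triangle, one possible choice for h.
    """
    i, j = sorted((alpha, beta))
    if not 0 <= i < j < n:
        raise ValueError("expected two distinct ids in range(n)")
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def ooc_class(rank_a: int, rank_b: int, n: int) -> int:
    """Class id of a pair from the 0-based ranks of its two elements.

    Sums in one class form a chain whose order follows from the ranks alone.
    """
    lo, hi = sorted((rank_a, rank_b))
    return min(lo, n - 1 - hi)


def build_classes(values):
    """Map class id -> sorted list of (sum, pair index) for one dimension."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    rank = {tid: r for r, tid in enumerate(order)}
    classes = {}
    for a, b in combinations(range(n), 2):
        cls = ooc_class(rank[a], rank[b], n)
        classes.setdefault(cls, []).append((values[a] + values[b], pair_index(a, b, n)))
    return {cls: sorted(entries) for cls, entries in classes.items()}


def derive_key(master_key: bytes, class_id: bytes) -> bytes:
    """key_i = mk XOR Id_i, following the remark on key management."""
    if len(master_key) != len(class_id):
        raise ValueError("master key and id must have the same length")
    return bytes(m ^ r for m, r in zip(master_key, class_id))


if __name__ == "__main__":
    # Hypothetical values for a, b, c, d, e whose sorted order is b, c, a, e, d.
    for cls, entries in build_classes([31, 10, 22, 57, 40]).items():
        print(cls, entries)
```

With the sample values, the middle element of each sorted class is the sum that the text names as the center of the corresponding OOC ($b + d$ and $c + e$), which is where a balanced tree over that class would be rooted.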
Database encryption
We now have all the ingredients in place to describe the entire process of encrypting the database (Algorithm 3). First, the data owner generates d + ⌈ ∗ n − ⌉ ∗ d keys (Line 1), i.e., one key per attribute plus the keys for the encrypted sums in each dimension. For each column (i.e., attribute) in $P$, the entries are encrypted using the same key (Lines 3-5), resulting in $E(P)$. Then, the data owner sorts the entries in each column (Line 8) and computes the sums of pairs of entries in each dimension. The sums are then encrypted using the corresponding keys as shown in Fig. 4 (Lines 9-17), resulting in $E(\Phi)$. Finally, the data owner sends $E(P)$ and $E(\Phi)$ to the cloud server.

Besides, the data owner also creates a hash function $h$ that maps each pair of elements in $E(P)$ to the corresponding sum in $E(\Phi)$, and sends $h$ to the cloud server. The cloud can then quickly locate the ciphertext of the corresponding sum for each pair $(p_\alpha[j], p_\beta[j])$.

Fig. 5: The complete ciphertext storage structure (raw entries in a dimension, their order, the hash function, and the ORE-encrypted sums).

A data user needs to register with the data owner and securely obtain the keys. The user then encrypts the query request according to Algorithm 4 before sending it to the cloud server.

As shown in Algorithm 4, the user encrypts each dimension of the query tuple using the corresponding key (Line 2) and encrypts the doubled entries of the query tuple using the other keys (Lines 4-5). Finally, the user sends $Enc(q)$ and $Enc(2q)$ to the cloud server.

As mentioned in Algorithm 1, given an encrypted query $q$ produced by Algorithm 4, the cloud server needs to perform comparisons and computations over encrypted data. Following the approach shown in Fig. 3, the cloud server can perform the skyline query using only the comparison relationships among the encrypted tuples, the encrypted query request, the encrypted sums, and the encrypted doubled request. As a result, the process described in Algorithm 1 can now be performed over ciphertexts without decryption, as shown in Algorithm 5. To illustrate the entire protocol, we provide a running example in the following.

Example 1.
For convenience of presentation, we assume that $P$ contains five tuples, whose entries in dimension 1 are sorted as 7, 13, 21, 32, 53. According to Algorithm 3, we first compute the sums of all pairs of values, i.e., 7 + 13 = 20, 7 + 21 = 28, 7 + 32 = 39, 7 + 53 = 60, 13 + 21 = 34, 13 + 32 = 45, 13 + 53 = 66, 21 + 32 = 53, 21 + 53 = 74, 32 + 53 = 85. By Theorem 1, two encryption keys are required for these sums. Therefore, we use two keys to encrypt the above sums, resulting in $Enc(20), Enc(28), \ldots, Enc(85)$ and $Enc(34), Enc(45), Enc(53)$. Besides, we also need another key to encrypt the original values, e.g., $Enc(7), Enc(13), \ldots, Enc(53)$.

Suppose that a user submits a query with $q[1] = 23$. Then $q$ and $2q$ need to be encrypted according to our scheme, resulting in $Enc(46)$, $Enc(46)$, and $Enc(23)$, where the two copies of $Enc(46)$ are produced under the two keys used for the sums. These ciphertexts are then sent to the cloud server, which compares ciphertexts one by one according to the protocol. Through ORE.Compare and Algorithm 2, the cloud server can easily determine that $Enc(32)$ dominates $Enc(53)$, following the case shown in Fig. 3a. Similarly, $Enc(21)$ dominates $Enc(7)$ and $Enc(13)$. In the case shown in Fig. 3b, $Enc(21)$ dominates $Enc(32)$ because ORE.Compare$(Enc(53), Enc(46)) = 1$, i.e., the encrypted sum 21 + 32 = 53 exceeds the encrypted doubled query value 46. Algorithm 5 iteratively repeats this process for all dimensions and remaining tuples.

Algorithm 3 Dataset Encryption
Require: The dataset $P$
Ensure: The ciphertext sets $E(P)$, $E(\Phi)$
 1: generate d + ⌈ ∗ n − ⌉ ∗ d keys with ORE.Setup as $keys[]$
 2: for $p \in P$ and $j$ in $1, \ldots, d$ do
 3:     $Enc(p[j]) \leftarrow$ ORE.Encrypt$(keys[j], p[j])$
 4:     let $Enc(p) = \{Enc(p[1]), \ldots, Enc(p[d])\}$ and add $Enc(p)$ to $E(P)$
 5: end for
 6: let $m = 1$
 7: for $j$ in $1, \ldots, d$ do
 8:     $\Lambda = (p_{(1)}[j], \ldots, p_{(n)}[j]) \leftarrow$ sort $p_1[j], \ldots, p_n[j]$ in ascending order
 9:     while $\Lambda$ is not empty do
10:         for $i$ in $2, \ldots, len(\Lambda)$ do
11:             add ORE.Encrypt$(keys[d + m], p_{(1)}[j] + p_{(i)}[j])$ to $E(\Phi)$
12:         end for
13:         for $i$ in $2, \ldots, len(\Lambda) - 1$ do
14:             add ORE.Encrypt$(keys[d + m], p_{(n)}[j] + p_{(i)}[j])$ to $E(\Phi)$
15:         end for
16:         remove the first and last elements in $\Lambda$, let $m = m + 1$
17:     end while
18: end for
19: return $E(P)$, $E(\Phi)$

Modifications over database records (insert, delete, update) are fundamental requirements in database applications. In light of that, we discuss the strategies to support these operations in our framework. As depicted in Fig. 4, the cloud server stores the encrypted sums of data values in different OOCs, which incur the most expensive maintenance cost. Hence, the way these encrypted sums are stored is fundamentally important. In fact, many index structures can be used to accomplish this task. In
SCALE, we adopt AVL-Tree [1], as it offers the best efficiency in searching for an entry due to its strictly balanced structure. In fact, we have considered and compared several different structures, including Linked List, AVL-Tree, and Red-black Tree. Table 2 summarizes the advantages and disadvantages of these data structures in our model.

Table 2: Functional comparison over different structures for insertion and deletion
Data Structure | Advantages | Disadvantages
Linked List | Easy implementation | Expensive cost
AVL-Tree | Shorter query time than Red-black Tree | Longer response time for the insert and delete operations
Red-black Tree | Shorter response time for the insert and delete operations | Longer query time than AVL-Tree

Algorithm 4 Query Request Encryption
Require: The query data $q$, keys from data owner $keys[]$
Ensure: The ciphertexts $Enc(q)$, $Enc(2q)$
1: for $j$ in $1, \ldots, d$ do
2:     $Enc(q[j]) \leftarrow$ ORE.Encrypt$(keys[j], q[j])$
3:     for $m$ in $1, \ldots,$ ⌈ ∗ n − ⌉ do
4:         let $key\_num = d + (j - 1)$ ∗ ⌈ ∗ n − ⌉ $+ m$
5:         $Enc(2q_m[j]) \leftarrow$ ORE.Encrypt$(keys[key\_num], 2q[j])$
6:     end for
7: end for
8: return $Enc(q)$, $Enc(2q)$

In the following, we describe in turn how insertion and deletion are supported in SCALE using AVL-Tree.
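To make the trade-off in Table 2 concrete, the following is a minimal sketch of a per-OOC index that supports insertion, deletion, and comparison-based lookup. It is deliberately not an AVL-Tree: a sorted Python list with `bisect` serves as a simple stand-in, so a search needs O(log n) comparisons, but each insertion or deletion shifts O(n) elements, which is the cost a balanced tree avoids. The class names `Ct` and `OOCIndex` and their methods are hypothetical, and `Ct` merely wraps a plaintext integer to mimic a ciphertext that exposes nothing but order comparison.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Ct:
    """Stand-in for an ORE ciphertext: callers rely only on comparisons."""
    hidden: int


class OOCIndex:
    """Sorted container for the encrypted sums of one OOC (illustrative)."""

    def __init__(self) -> None:
        self._items = []

    def insert(self, ct: Ct) -> None:
        # Binary search for the position, then shift the tail to make room.
        bisect.insort(self._items, ct)

    def delete(self, ct: Ct) -> bool:
        # Remove one occurrence of ct; report whether it was present.
        pos = bisect.bisect_left(self._items, ct)
        if pos < len(self._items) and self._items[pos] == ct:
            del self._items[pos]
            return True
        return False

    def count_greater(self, ct: Ct) -> int:
        # Number of stored sums strictly greater than ct.
        return len(self._items) - bisect.bisect_right(self._items, ct)


# The three sums of the second OOC in Example 1, inserted out of order.
index = OOCIndex()
for s in (45, 34, 53):
    index.insert(Ct(s))

above_before = index.count_greater(Ct(46))  # only 53 exceeds 46
removed = index.delete(Ct(53))
above_after = index.count_greater(Ct(46))
```

Replacing the list with a balanced tree keeps the same three operations but brings insertion and deletion down to logarithmic time, which is the setting the remainder of this section assumes.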
Insertion