LPTD: Achieving Lightweight and Privacy-Preserving Truth Discovery in CIoT
Chuan Zhang, Liehuang Zhu, Chang Xu, Kashif Sharif, Xiaojiang Du, Mohsen Guizani
Chuan Zhang (a), Liehuang Zhu (a), Chang Xu (a,∗), Kashif Sharif (a), Xiaojiang Du (b), Mohsen Guizani (c)
(a) Beijing Engineering Research Center of Massive Language Information Processing and Cloud Computing Application, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
(b) Department of Computer and Information Sciences, Temple University, Philadelphia, USA.
(c) Department of Electrical and Computer Engineering, University of Idaho, Moscow, Idaho, USA.
Abstract
In recent years, cognitive Internet of Things (CIoT) has received considerable attention because it can extract valuable information from various Internet of Things (IoT) devices. In CIoT, truth discovery plays an important role in identifying truthful values from large-scale data to help CIoT provide deeper insights and value from collected information. However, the privacy concerns of IoT devices pose a major challenge in designing truth discovery approaches. Although existing truth discovery schemes can be executed with strong privacy guarantees, they are not efficient or cannot be applied in real-life CIoT applications. This article proposes a novel framework for lightweight and privacy-preserving truth discovery called LPTD-I, which is implemented by incorporating fog and cloud platforms and adopting the homomorphic Paillier encryption and one-way hash chain techniques. This scheme not only protects devices' privacy, but also achieves high efficiency. Moreover, we introduce a fault-tolerant framework (LPTD-II) which can effectively overcome malfunctioning CIoT devices. Detailed security analysis indicates the proposed schemes are secure under a comprehensively designed threat model. Experimental simulations are also carried out to demonstrate the efficiency of the proposed schemes.

∗ Corresponding author
Email address: [email protected] (Chang Xu)
Preprint submitted to Journal of LaTeX Templates, April 9, 2018
Keywords: CIoT; truth discovery; lightweight; privacy-preserving.
1. Introduction
Cognitive Internet of Things (CIoT) is a specialized IoT model that capitalizes on the increasing capabilities of mobile devices (with built-in comprehensive sensor sets) and uses cognitive computing techniques to find valuable information in large-scale sensing data [1, 2, 3]. By analyzing the big data created by various IoT devices, CIoT is able to provide deeper insights and high-level intelligence, and to create further value for people.

Despite the proliferation of CIoT, some growing concerns may impede its wide adoption. For example, the sensory data captured and provided by different devices is usually not directly usable or reliable, as it may be distorted by a lack of sensor calibration, poor sensor quality, background noise, or even the intent to deceive. Therefore, an important task of CIoT applications is to discover truthful information from the sensory data. This task, called truth discovery, has drawn significant attention [4, 5, 6]. Typically, truth discovery relies on weighted aggregation, which assigns a higher weight to a device if the data it reports is closer to the aggregated result from all devices. Moreover, a device's data counts more toward the aggregated result if the device has a higher weight due to its past performance [7, 8]. By performing truth discovery, accurate sensory data can be obtained, and such data greatly promotes the effectiveness of CIoT applications.

Although truth discovery significantly improves data accuracy, it faces the challenge that sensory data is highly sensitive and should be well protected, especially considering that it may contain personal information [9, 10, 11]. For example, geo-tagging services can publish timely and accurate locations of specific objects (e.g., potholes, automated external defibrillators, litter). However, this may expose participating users' sensitive geo-location and/or movement patterns.
Aggregated health statistics (i.e., treatment outcomes) may provide valuable information regarding the effects of medical devices or new drugs, but may threaten the privacy of participating patients. Meanwhile, user reliability (i.e., weight) is another piece of private information which should be well protected. From user reliability information, the attacker may infer details of participating users' education, skills, and personality traits. For example, aggregating opinions regarding challenging social problems may lead to a better solution. However, the leakage of reliability may disclose users' education and intellectual level.

Several studies have tried to preserve users' privacy in the applications of truth discovery [7, 8, 12]. However, most of them are not efficient or cannot be applied in real-life CIoT applications. For example, Du et al. [13] sought a reliable key management scheme, and Miao et al. [7] proposed a cloud-based privacy-preserving truth discovery scheme to protect users' sensory data. However, because it uses the threshold Paillier cryptosystem [14], their scheme is not efficient. To improve efficiency, Xu et al. [8] proposed a lightweight and privacy-preserving discovery scheme by using additive homomorphic privacy-preserving techniques. Miao et al. [12] further designed a lightweight truth discovery framework by using two non-colluding cloud platforms. Although their schemes achieve better efficiency, they cannot be applied in CIoT applications, especially in scenarios where some IoT devices may not deliver their data in time [15]. Moreover, none of the above schemes can defend against external attackers who inject false data into the system.
Hence, there is a need for an efficient truth discovery scheme which not only protects users' privacy, but is also able to mitigate false data injection attacks and provide fault tolerance.

In this paper, to address these challenges, we present a lightweight privacy-preserving truth discovery scheme in CIoT, called LPTD-I, to protect devices' privacy (i.e., sensory data and reliability information) and resist false data injection attacks. The framework is implemented by involving fog and cloud platforms and adopting homomorphic Paillier encryption and one-way hash chain techniques. In this framework, the fog node authenticates the data submitted by devices and aggregates the data before delivering it to the cloud. In addition, we exploit the properties of modular arithmetic to design a data aggregation algorithm which is efficient and privacy-preserving.

Although LPTD-I can defend against the false data injection attack launched by external attackers, it is not fault-tolerant. Thus, we exploit the modified Paillier cryptosystem and propose a framework (LPTD-II) suitable for scenarios where some IoT devices may stop delivering data to the fog node due to device failure. In this framework, the secret key is split into two parts, and the fog devices can cooperate with the cloud to recover the aggregated results successfully.

In summary, the contributions of this paper are:
• We propose a novel lightweight and privacy-preserving truth discovery scheme in CIoT, called LPTD-I. This scheme not only preserves the privacy of users (i.e., sensory data and reliability information), but also achieves high efficiency.
• For scenarios where some IoT devices stop reporting sensory data to the fog node, an upgraded technique called LPTD-II is proposed to achieve fault tolerance.
• Detailed security analysis indicates the proposed schemes are secure under an elaborate threat model.
Additionally, experimentation shows the efficiency of both proposed schemes.

The rest of this paper is organized as follows. In Section 2, we give the problem definition, which includes the system model, security model, and design goals. In Section 3, we describe some preliminaries. The details of the proposed LPTD schemes are described in Section 4, followed by the security analysis and performance analysis in Section 5 and Section 6, respectively. In Section 7, we discuss the related work. Finally, we draw the conclusion in the last section.

2. Problem Definition
The system model, security model, and design goals are outlined in the following sections.
The system model shown in Fig. 1 comprises four entities: IoT devices, the fog node, the cloud, and a trusted authority.
• IoT devices: Each IoT device is equipped with sensing, communication, and computing capabilities, which enable the device to collect sensory data, report data, and perform simple computations. Note that, since most IoT devices are resource-constrained, the computational costs of operations performed at these devices should be minimal.
• Fog node: The fog node acts as a middle layer between the IoT devices and the cloud, and is deployed at the edge of the network. It can process/deliver data for the devices and/or the cloud. In our schemes, it also aggregates all reports from IoT devices and forwards the resulting data to the cloud.
• Cloud: It receives all data from the IoT devices through the fog node. For each object, it generates an initial ground truth, and iteratively updates the truth in cooperation with the fog node.
• Trusted authority (TA): TA is a trusted third party, and it bootstraps the whole system. It generates keys and assigns them to all entities. Once the system is up and running, the TA remains offline.

We formalize the truth discovery approach as follows. Suppose there are $K$ IoT devices and $M$ objects. We use $x_m^k$ to denote the observed value of device $k$ for object $m$. For all devices, $\{w_1, w_2, \cdots, w_K\}$ denote their reliabilities (i.e., weights). Each object is assigned an initial ground truth. The goal of the proposed scheme is to calculate the ground truths $\{x_m^*\}_{m=1}^{M}$ for all objects while protecting the observed value and weight of each device from being disclosed to others. Table 1 summarizes the main notations used in this work.

Figure 1: System model. [Diagram labels: IoT devices, fog node, cloud, trusted authority; data flows: encrypted data, encrypted weighted data and weight, aggregated data, truths.]
Table 1: Summary of notations.

Symbol | Definition
$K$ | Number of devices
$k$ | Index of devices, $k \in \{1, \ldots, K\}$
$w_k$ | Weight of device $k$
$M$ | Number of objects
$m$ | Index of objects, $m \in \{1, \ldots, M\}$
$x_m^k$ | Observed value of device $k$ for object $m$
$x_m^*$ | Truth for the object $m$
$std_m$ | The standard deviation for the $m$-th object

The security model rests on the following assumptions:
• TA is considered to be fully trusted, and it cannot be breached by any attacker.
• The fog and cloud elements are honest-but-curious. This means that they will follow the protocol, but are also curious regarding device/user details. Note that, in our threat model, they do not collude with each other.
• The honest-but-curious IoT devices will follow the protocols. They can collude with other entities (i.e., other IoT devices, the fog, and the cloud), but we emphasize that they cannot collude with the fog and the cloud simultaneously.
• Since the focus of this work is to design a privacy-preserving truth discovery approach, internal attacks are not considered, i.e., all entities cannot be compromised at the same time. However, we do allow that some IoT devices may malfunction or stop reporting data intermittently. Moreover, external attackers may also launch false data injection attacks. Hence, the fog node should filter such data before transmitting it to the cloud.
The goal of the proposed scheme is to design an efficient and privacy-preserving truth discovery approach which can protect devices' privacy and reduce computational costs. Security issues such as those studied in [16, 17, 18] should be addressed in our work. To achieve this, the following design goals must be guaranteed:
• Privacy: The proposed scheme should preserve privacy. The fog node and the cloud can obtain the truthful values, but they cannot obtain individual IoT devices' information (i.e., sensory data and reliability information).
• Security: The scheme should be resistant to false data injection attacks launched by external attackers. In other words, the fog node should authenticate the IoT devices and filter the false data before transmitting it to the cloud.
• Fault Tolerance: In cases where some IoT devices malfunction and stop reporting data, the cloud should still be able to obtain acceptable levels of aggregated data.
• Efficiency: The computational cost at each system element should be as low as possible.
3. Preliminaries
In order to better explain the proposed schemes, we first introduce the general process of truth discovery and the cryptographic tools in the following parts.
Truth discovery in large-scale sensory data has been widely studied. Although the algorithmic details of different solutions differ, the fundamental principle of assigning device weights and estimating ground truths is the same. At the initialization of the truth discovery algorithm, random ground truths are assigned, and they are then iteratively updated until convergence is achieved. Algorithm 1 shows the general truth discovery process.
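The iterative process can be illustrated with a minimal Python sketch for continuous data, using the CRH-style weight update (Eq. 1), the normalised squared distance (Eq. 2), and the weighted-average truth update (Eq. 4) that are detailed in the remainder of this section. The function name `truth_discovery`, the `init` and `seed` parameters, the fixed iteration count, and the small constant `tiny` are assumptions made for illustration and are not part of the paper. The sketch works on plaintext values and assumes at least two devices.

```python
import math
import random


def truth_discovery(obs, max_iter=10, init=None, seed=0):
    """Plaintext truth discovery for continuous data (illustrative sketch).

    obs[k][m] is the value reported by device k for object m.
    Returns (truths, weights).
    """
    K, M = len(obs), len(obs[0])
    columns = [[obs[k][m] for k in range(K)] for m in range(M)]

    # Standard deviation of all observations of each object (used in Eq. 2).
    std = []
    for col in columns:
        mean = sum(col) / K
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / K)
        std.append(sd if sd > 0 else 1.0)  # guard: all reports identical

    # Random initial truths, unless the caller supplies a starting point.
    rng = random.Random(seed)
    if init is not None:
        truths = list(init)
    else:
        truths = [rng.uniform(min(col), max(col)) for col in columns]

    weights = [1.0] * K
    tiny = 1e-12  # assumption: keeps the logarithm finite on an exact match
    for _ in range(max_iter):
        # Weight update (Eq. 1) with the distance function of Eq. 2.
        dist = [
            sum((obs[k][m] - truths[m]) ** 2 / std[m] for m in range(M)) + tiny
            for k in range(K)
        ]
        total = sum(dist)
        weights = [math.log(total / d) for d in dist]

        # Truth update (Eq. 4): weighted average of the observations.
        w_sum = sum(weights)
        truths = [
            sum(weights[k] * obs[k][m] for k in range(K)) / w_sum
            for m in range(M)
        ]
    return truths, weights
```

With two devices that agree closely and one outlier, starting from the per-object mean, the outlier ends up with the smallest weight and the estimated truths move toward the agreeing devices. The outcome depends on the starting point, which is why the sketch lets the caller pass `init`.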
Weight Update:
In this step, the ground truth of each object is assumed to be fixed. Typically, a device is assigned a higher weight if it provides data closer to the ground truth, and vice versa. Inspired by CRH [4] (as it gives good practical performance), we calculate the weight as follows:

$$w_k = \log\left(\frac{\sum_{k'=1}^{K}\sum_{m=1}^{M} d(x_m^{k'}, x_m^*)}{\sum_{m=1}^{M} d(x_m^{k}, x_m^*)}\right) \tag{1}$$

where $d(\cdot)$ is a distance function used to measure the difference between the ground truth and the observations of devices. Moreover, $d(\cdot)$ depends on the application use case. The two most common types of data (i.e., continuous and categorical) are considered in this work.

In applications such as environmental monitoring, sensory data (e.g., temperature, humidity) is continuous in nature. Hence, the following distance function is adopted:

$$d(x_m^k, x_m^*) = \frac{(x_m^k - x_m^*)^2}{std_m} \tag{2}$$

where $std_m$ represents the standard deviation of all the users' observations for object $m$.

Other use cases, like public opinion polls, collect data that is categorical in nature, that is, based on a selection among choices. In these applications, only one of the multiple candidate choices is correct. Thus, an observation vector $x_m^k = (0, \ldots, \overset{q}{1}, \ldots, 0)^T$ is defined to denote that the $k$-th device selects the $q$-th candidate choice for object $m$. The following function is used to measure the distance between the observation vector and the ground truth vector:

$$d(x_m^k, x_m^*) = (x_m^k - x_m^*)^T (x_m^k - x_m^*) \tag{3}$$

Truth Update:
In this step, weights are assumed to be fixed. We calculate the ground truth for the $m$-th object as follows:

$$x_m^* \leftarrow \frac{\sum_{k=1}^{K} w_k \cdot x_m^k}{\sum_{k=1}^{K} w_k} \tag{4}$$

If the data is continuous, $x_m^*$ is the ground truth. If the data is categorical, $x_m^*$ is a probability vector in which each element represents the probability of a choice being true. In this case, the final ground truth is the choice with the highest probability.

Algorithm 1: Truth Discovery Algorithm
Input: Observations from $K$ devices: $\{x_m^k\}_{m,k=1}^{M,K}$
Output: Ground truths for $M$ objects: $\{x_m^*\}_{m=1}^{M}$
1: Randomly initialize the ground truth $x_m^*$;
2: for iteration $= 1, 2, \cdots, iteration_{max}$ do
3:     for $k = 1, 2, \ldots, K$ do
4:         Update device weight (see Eq. (1));
5:     for $m = 1, 2, \ldots, M$ do
6:         Update ground truth (see Eq. (4));
7: return $\{x_m^*\}_{m=1}^{M}$;

In order to perform encryption, we make use of the following algorithms.

A modified Paillier cryptosystem [19] is used to encrypt devices' sensitive information and realize privacy-preserving truth discovery. This modified Paillier cryptosystem consists of the following four components:
• Key Generation: Given a security parameter $\kappa$, two large safe primes $p$ and $q$ are chosen as $p = 2p' + 1$ and $q = 2q' + 1$, where $|p| = |q| = \kappa$ and $p'$ and $q'$ are also large primes. Then compute $n = pq$ and $\lambda = \mathrm{lcm}(p - 1, q - 1) = 2p'q'$. Choose a random value $\mu \in \mathbb{Z}_{n^2}$ and a random number $x \in [1, \lambda(n^2)/2]$. The public key is $pk = (n, g = \mu^{2n} \bmod n^2, h = g^x)$, and the secret key is $x$.
• Encryption: Suppose there is a message $m \in \mathbb{Z}_n$ to be encrypted. Select a random value $r \in \mathbb{Z}_n$, and calculate the ciphertext $(c_1, c_2)$ as $c_1 = g^r \bmod n^2$ and $c_2 = h^r (1 + n \cdot m) \bmod n^2$.
• Decryption: Given $(c_1, c_2)$, the message $m$ can be decrypted by computing $m = \frac{c_2 / (c_1)^x - 1 \bmod n^2}{n}$.
• Proxy Re-encryption: Split the secret key $x$ into two random shares $x_1, x_2$ such that $x = x_1 + x_2$. Then, the ciphertext $(c_1, c_2)$ can be partially decrypted as $(\tilde{c}_1, \tilde{c}_2)$ by using $x_1$, where $\tilde{c}_1 = c_1$ and $\tilde{c}_2 = c_2 / (c_1)^{x_1} \bmod n^2$. Lastly, $(\tilde{c}_1, \tilde{c}_2)$ can be decrypted using $x_2$ to recover $m$.

The one-way hash chain is a common cryptographic tool that has been used in various applications [20]. In this work, we use this technique to authenticate the IoT devices. Given a secure hash function $h: \{0, 1\}^* \rightarrow \{0, 1\}^l$, a one-way hash chain can be defined as a set of values $(m_0, m_1, \cdots, m_n)$, where $m_n \in \{0, 1\}^l$ is randomly chosen, and $m_i = h(m_{i+1})$ for $i = 0$ to $n - 1$. Note that, given $m_y$, it is easy to compute $m_x$ for $x < y$, but it is computationally infeasible to compute $m_z$ for $y < z$. Fig. 2 depicts the structure of a one-way hash chain.

Figure 2: One-way hash chain structure ($m_0, m_1, m_2, \ldots, m_{n-1}, m_n$).

In the modified Paillier cryptosystem, for any messages $m_i \in \mathbb{Z}_n$, $i = 1, 2, \cdots, n$, the following equation holds:

$$\prod_{i=1}^{n} (1 + n \cdot m_i) \equiv \left(1 + n \cdot \sum_{i=1}^{n} m_i\right) \bmod n^2. \tag{5}$$

This property can be easily proven by mathematical induction; the proof can be found in [15].
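To make the four components and the property in Eq. 5 concrete, the following Python sketch implements them with toy parameters. It is an illustration under stated assumptions, not the authors' implementation: the function names (`keygen`, `encrypt`, `strip`, `decrypt`, `eq5_holds`) are hypothetical, the safe primes in the usage note are far too small for real use, and the form $g = \mu^{2n} \bmod n^2$ follows the key generation as read from the text above. Python 3.8 or later is assumed, since `pow` is called with a negative exponent to obtain a modular inverse.

```python
import math
import secrets


def keygen(p, q):
    """Key generation for safe primes p = 2p'+1 and q = 2q'+1."""
    n = p * q
    n2 = n * n
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = 1
    while g == 1:  # retry on non-invertible mu or a trivial g
        mu = secrets.randbelow(n2 - 2) + 2
        if math.gcd(mu, n) == 1:
            g = pow(mu, 2 * n, n2)
    x = secrets.randbelow(n * lam // 2) + 1  # x in [1, lambda(n^2)/2]
    return (n, g, pow(g, x, n2)), x


def encrypt(pk, m):
    """Return (c1, c2) = (g^r, h^r * (1 + n*m)) mod n^2 for m in Z_n."""
    n, g, h = pk
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    return pow(g, r, n2), pow(h, r, n2) * (1 + n * m) % n2


def strip(pk, key, c):
    """Divide c2 by c1^key; this is the partial decryption step."""
    n2 = pk[0] ** 2
    c1, c2 = c
    return c1, c2 * pow(c1, -key, n2) % n2


def decrypt(pk, key, c):
    """Remove the remaining mask with `key` and read m from 1 + n*m."""
    n = pk[0]
    _, u = strip(pk, key, c)
    return (u - 1) // n


def eq5_holds(n, msgs):
    """Check Eq. 5: prod(1 + n*m_i) == 1 + n*sum(m_i) (mod n^2)."""
    n2 = n * n
    prod = 1
    for m in msgs:
        prod = prod * (1 + n * m) % n2
    return prod == (1 + n * sum(msgs)) % n2
```

For example, with `pk, x = keygen(1019, 1187)` and a random split `x1 + x2 = x`, the call `decrypt(pk, x2, strip(pk, x1, encrypt(pk, 4242)))` returns 4242, which mirrors the two-stage decryption of the proxy re-encryption component. Multiplying ciphertexts component-wise yields an encryption of the sum of the messages, which is the additive property the schemes rely on.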
4. Proposed LPTD Schemes
In this section, we give the details of the two proposed LPTD schemes in CIoT, which mainly include the following parts: system initialization, design overview, the LPTD-I scheme, and the LPTD-II scheme.
TA is considered to be fully trusted, and it bootstraps the whole system. Given a security parameter $\kappa$, TA selects two large safe primes $p, q$, where $|p| = |q| = \kappa$. It then generates the public key $pk$ and private key $sk$ of the modified Paillier cryptosystem as $pk = (n, g, h)$, $sk = x$, where $n = p \cdot q$ and $h = g^x \bmod n^2$. Then, TA randomly splits $sk$ into two shares $x_1$ and $x_2$ such that $x = x_1 + x_2$. Suppose there are $K$ IoT devices in the network. TA generates $K + 2$ vectors $[s_0, s_1, \cdots, s_{K+1}]$, each containing $w$ random numbers, such that

$$\sum_{k=0}^{K+1} s_{kj} \equiv 0 \bmod n^2 \tag{6}$$

where $j \in [1, w]$.

TA selects a secure cryptographic hash function $h: \{0, 1\}^* \rightarrow \{0, 1\}^l$. Since the truth and weight are iteratively updated, we set the number of iterations to $w$, and at every iteration, each device reports its observation or weighted data. TA generates $K$ one-way hash chains $HC_1, HC_2, \cdots, HC_K$, where $HC_k = (h_{k0}, h_{k1}, \cdots, h_{kw})$, $h_{kw} \in \{0, 1\}^*$, and $h_{kj} = h(h_{k(j+1)} \| j)$, $1 \le k \le K$, $0 \le j \le w - 1$. TA then distributes the key material as follows:
• For the device $k$, TA computes $S_k = \{(g^{s_{k1}}, h^{s_{k1}}), (g^{s_{k2}}, h^{s_{k2}}), \cdots, (g^{s_{kw}}, h^{s_{kw}})\}$ and assigns $S_k$, the hash chain $HC_k = \{h_{k0}, h_{k1}, \cdots, h_{kw}\}$, and the public key $pk$.
• For the fog, TA assigns a share of the private key $x_1$, the hash chain heads of the $K$ devices $(h_{10}, h_{20}, \cdots, h_{K0})$, the secret key vector $S_{K+1} = \{h^{s_{(K+1)1}}, h^{s_{(K+1)2}}, \cdots, h^{s_{(K+1)w}}\}$, the public key $pk$, and the shared key $ss$.
• For the cloud, TA assigns the other share of the private key $x_2$, the secret key vector $S_0 = \{h^{s_{01}}, h^{s_{02}}, \cdots, h^{s_{0w}}\}$, together with the public key $pk$ and the same shared key $ss$.

Once the devices obtain the observed values, LPTD carries out the following two phases:
• Phase 1: Secure weight update.
First, every IoT device encrypts the observed value by using the cryptographic tool. Then, these ciphertexts are submitted to the fog node for aggregation, and the aggregated value is further submitted to the cloud to calculate the standard deviation of the observed values, which is then sent to every device. After that, every device computes the distances between the observed values and the ground truths. Finally, the fog and the cloud cooperatively and iteratively update the weights.
• Phase 2: Secure truth update.
When each device receives the aggregated differences from the fog device, it first calculates the weight and the weighted observed values, and then sends them to the fog device as ciphertexts. Lastly, the fog and the cloud calculate the ground truth $x_m^*$.

During the LPTD procedure, all operations are executed on ciphertexts. Hence, an entity knows only its own information, and the devices' sensitive information (i.e., observed values and weights) is not leaked to other entities.

In this subsection, we first describe the details of LPTD-I, which is able to protect the devices' privacy and resist external false data injection attacks.

It is important to note that the sensory data from IoT devices may not be integers, whereas the cryptosystem used in this scheme is defined for integer values. To deal with this problem, a scaling parameter $T$ (a magnitude of 10) is used to round off the observed values. As an example, suppose device $k$ gets the observed value $x_m^k$ for the object $m$. We multiply $x_m^k$ by $T$ to obtain $\lfloor x_m^k \cdot T \rfloor$, and the final result can be recovered by dividing by $T$. For ease of understanding, all observed values and intermediate results in this work are assumed to be preprocessed as above.

Step W1.
The cloud delivers the estimated ground truth $x_m^*$ for object $m$ to all devices. In the first iteration, the estimated ground truth is randomly initialized. Otherwise, it is obtained from the previous iteration.

Step W2.
When the device $k$ obtains $x_m^*$, it first computes the difference between $x_m^k$ and $x_m^*$ according to Eq. 2, and then aggregates the differences of the $M$ objects as $Dist_k = \sum_{m=1}^{M} d(x_m^k, x_m^*)$. Before submitting $Dist_k$ to the fog node, the device uses its secret key $S_{kj}$ to compute

$$C_{kj} = (1 + n \cdot Dist_k) \cdot h^{s_{kj}} \bmod n^2, \tag{7}$$

and then uses the hash value $h_{kj}$ to compute

$$mac_{kj} = h(C_{kj} \| h_{kj}), \tag{8}$$

where $j$ denotes the iteration number. After that, the device submits $(C_{kj}, h_{kj}, mac_{kj})$ to the fog. These operations may not seem time-efficient, but they can be executed efficiently, as $h^{s_{kj}}$ has been calculated by TA in advance.

Step W3.
After receiving $(C_{kj}, h_{kj}, mac_{kj})$ in the $j$-th iteration, the fog node checks the validity of the IoT device, and aggregates the reports as follows:
• Check hash chain node $h_{kj}$: Assume that the fog has authenticated $h_{k(j-1)}$ in the previous $(j-1)$-th iteration. It checks $h_{kj}$ according to $h_{k(j-1)} \stackrel{?}{=} h(h_{kj} \| j)$. If it holds, $h_{kj}$ is accepted. Otherwise, it is rejected.
• Check $mac_{kj}$: If $h_{kj}$ is valid, the fog node further verifies $mac_{kj}$ by computing

$$mac'_{kj} = h(C_{kj} \| h_{kj}), \tag{9}$$

and checking whether $mac'_{kj} \stackrel{?}{=} mac_{kj}$. If it holds, $mac_{kj}$ is accepted. Otherwise, it is rejected.
• Data aggregation: After receiving $(C_{1j}, C_{2j}, \cdots, C_{Kj})$ from all devices, the fog node uses its secret key $S_{(K+1)j}$ to obtain the aggregated result as

$$C_j = \prod_{k=1}^{K} (C_{kj}) \cdot h^{s_{(K+1)j}} \bmod n^2, \tag{10}$$

and then uses the shared secret key $ss$ to compute

$$mac_j = h(C_j \| j \| ss). \tag{11}$$

Following this, the fog device delivers $(C_j, mac_j)$ to the cloud.

Step W4. Upon receiving $(C_j, mac_j)$ in the $j$-th iteration, the cloud first checks data validity according to $mac_j = h(C_j \| j \| ss)$. If it holds, the cloud executes the following operations to obtain the aggregated results.
• The cloud uses its secret key $S_{0j}$ to compute

$$\begin{aligned}
C'_j &= C_j \cdot h^{s_{0j}} \bmod n^2 \\
&= \Big(\prod_{k=1}^{K} C_{kj}\Big) \cdot h^{s_{0j} + s_{(K+1)j}} \bmod n^2 \\
&= \Big(\prod_{k=1}^{K} (1 + n \cdot Dist_k) \cdot h^{s_{kj}}\Big) \times h^{s_{0j} + s_{(K+1)j}} \bmod n^2 \\
&= \prod_{k=1}^{K} (1 + n \cdot Dist_k) \cdot \prod_{k=0}^{K+1} h^{s_{kj}} \bmod n^2 \\
&= \prod_{k=1}^{K} (1 + n \cdot Dist_k) \cdot h^{\sum_{k=0}^{K+1} s_{kj}} \bmod n^2 \\
&= \prod_{k=1}^{K} (1 + n \cdot Dist_k) \bmod n^2 && \text{(by Eq. 6)} \\
&= 1 + n \cdot \sum_{k=1}^{K} Dist_k \bmod n^2. && \text{(by Eq. 5)}
\end{aligned} \tag{12}$$

• The cloud can obtain $\sum_{k=1}^{K} Dist_k$ by computing

$$sum_d = \sum_{k=1}^{K} Dist_k = \frac{C'_j - 1}{n}. \tag{13}$$

The cloud then selects a random number $r_{j1} \in \mathbb{Z}_n$ to blind $sum_d$ as $\log(r_{j1} \cdot sum_d)$ before forwarding it to the fog node.

Step W5.
After receiving $\log(r_{j1} \cdot sum_d)$, the fog node selects a random number $r_{j2} \in \mathbb{Z}_n$, and computes

$$\begin{aligned}
\log(\widetilde{sum}_d) &= \log(r_{j1} \cdot sum_d) + \log(r_{j2}) \\
&= \log(r_{j1} r_{j2} \cdot sum_d).
\end{aligned} \tag{14}$$

After that, the fog delivers $\log(\widetilde{sum}_d)$ to the device. The device can calculate its blinded weight as

$$\begin{aligned}
\widetilde{w}_k &= \log(\widetilde{sum}_d) - \log(Dist_k) \\
&= \log\Big(r_{j1} r_{j2} \cdot \sum_{k=1}^{K} Dist_k\Big) - \log(Dist_k) \\
&= \log\Big(\frac{r_{j1} r_{j2} \cdot \sum_{k=1}^{K} Dist_k}{Dist_k}\Big) \\
&= r_j \cdot w_k,
\end{aligned} \tag{15}$$

where $r_j = r_{j1} \cdot r_{j2}$.

As shown in Eq. 2, the standard deviation $std_m$ is needed to calculate the difference between the observed value and the ground truth. Thus, it should be computed first. The calculation proceeds as follows:
• The IoT device $k$ encrypts the observed value $x_m^k$ according to Eq. 7, and forwards the ciphertext to the fog node.
• On reception of the ciphertexts, the fog node and the cloud cooperatively calculate $sum_m = \sum_{k=1}^{K} x_m^k$ and $\bar{x}_m = sum_m / K$ following the above operations, and then send $\bar{x}_m$ to all devices.
• The device $k$ calculates $d_{km} = (x_m^k - \bar{x}_m)^2$, and encrypts $d_{km}$ before uploading it to the fog node.
• Upon receiving all the ciphertexts, the fog and the cloud cooperatively calculate $sum_d = \sum_{k=1}^{K} d_{km}$, and further obtain $std_m$ as $std_m = \sqrt{sum_d / K}$. Finally, $std_m$ is forwarded to all devices.

Upon updating the weights, it is time to update the ground truth. The details are shown as follows.
Step T1.
The device $k$ calculates the weighted data as $r_j \cdot x_m^k \cdot w_k$, and then encrypts the weighted data and the weight as

$$\begin{aligned}
W_{kj,1} &= (1 + n \cdot (r_j \cdot x_m^k \cdot w_k)) \cdot h^{s_{kj}} \bmod n^2 \\
W_{kj,2} &= (1 + n \cdot (r_j \cdot w_k)) \cdot h^{s_{kj}} \bmod n^2
\end{aligned} \tag{16}$$

Then, following the same operations as in the secure weight update, $k$ generates $mac_{kj} = h(W_{kj,1} \| W_{kj,2} \| h_{kj})$, and uploads $(W_{kj,1}, W_{kj,2}, h_{kj}, mac_{kj})$ to the fog node.

Step T2.
After checking the data validity, the fog uses its secret key $S_{(K+1)j}$ and runs the aggregation operations according to Eq. 10. It then uploads $(W_j, mac_j)$ to the cloud.

Step T3.
The cloud uses its secret key $S_{0j}$, and computes $r_j \cdot \sum_{k=1}^{K} (x_m^k \cdot w_k)$ and $r_j \cdot \sum_{k=1}^{K} w_k$ according to Eq. 12. The cloud then updates the ground truth as

$$x_m^* = \frac{r_j \cdot \sum_{k=1}^{K} (x_m^k \cdot w_k)}{r_j \cdot \sum_{k=1}^{K} w_k}. \tag{17}$$

Note that we only consider continuous data in the proposed scheme. The distance functions for continuous and categorical data differ, but the distance between the observed vector $x_m^k$ and the ground truth vector $x_m^*$ can be easily computed according to Eq. 3, so categorical data can be seen as a special case of the proposed LPTD schemes.

Combining the above two procedures, the privacy-preserving truth discovery algorithm is shown in Algorithm 2.

In real-life CIoT applications, an IoT device $l$ may not submit its data in time due to malfunctions, low battery, network delay, etc. In that case, the aggregated result obtained by the previous operations is not accurate, because $\sum_{k=0}^{K+1} s_k \equiv 0 \bmod n^2$ no longer holds for the received ciphertexts. To achieve fault tolerance, we design another efficient and privacy-preserving truth discovery approach, called LPTD-II. In the following, we only show how to recover the aggregated results from the ciphertexts in the cloud. Other details are omitted, as they are similar to LPTD-I.

When submitting ciphertexts to the fog node, besides $C_{kj}$, $W_{k,1}$, $W_{k,2}$, the device $k$ needs to submit another ciphertext $G_{kj} = g^{s_{kj}} \bmod n^2$. Note that this ciphertext is also pre-computed by TA, and delivered to the fog node in advance to save computational cost and communication overhead.

After receiving $G_{kj}$ from all devices except the device $l$, the fog node first aggregates them as

$$G_j = \prod_{k=1, k \neq l}^{K} G_{kj}, \tag{18}$$

and then uses its share of the secret key $x_1$ to partially decrypt the aggregated ciphertexts as

$$C_{t,1} = \frac{C_j}{(G_j)^{x_1}} \bmod n^2. \tag{19}$$

The cloud further computes

$$C_{t,2} = \frac{C_{t,1}}{(G_j)^{x_2}} \bmod n^2 \tag{20}$$

with $x_2$, and obtains the aggregated result $M$ by calculating

$$M = \Big(\frac{C_{t,2} - 1}{n}\Big) \bmod n^2. \tag{21}$$
5. Security Analysis
The security properties of the proposed LPTD schemes are of prime importance. Here, we show how the proposed schemes achieve privacy preservation and effectively defend against false data injection attacks.
Defense against false data injection:
To authenticate the validity of data in each iteration, the one-way hash chain technique is applied in the LPTD schemes. For each device, if the hash value $h_{k(j-1)}$ is authenticated in the $(j-1)$-th iteration, $h_{kj}$ can be authenticated according to $h_{k(j-1)} = h(h_{kj} \| j)$, as it is hard to obtain $h_{kj}$ from $h_{k(j-1)}$ due to the properties of the one-way hash function. In fact, the fog can get a fresh $h_{kj}$ only if a device reports its data in the $j$-th iteration. If $h_{kj}$ is not fresh in the $j$-th iteration, the report can be considered false data produced by replaying $h_{kj}$. The fog can identify and filter this data. Thus, the proposed LPTD schemes can defend against the false data injection attack.

Privacy preservation:
In the LPTD schemes, the observed value of a device $k$ is encrypted as $C_{kj} = (1 + n \cdot m) \cdot h^{s_{kj}}$, if we regard $Dist_{kj}$ as a message $m$. Note that $(1 + n \cdot m) \cdot h^{s_{kj}}$ is a valid Paillier ciphertext. An external attacker cannot obtain $m$, as the Paillier encryption achieves IND-CPA security (i.e., indistinguishability under chosen-plaintext attack). The fog node is also curious about $m$. However, without knowing the other share of the secret key, $x_2$, it is not able to recover the sensitive data. For the weight information, $x_m^k \cdot w_k$ and $w_k$ are encrypted as $W_{k,1}$ and $W_{k,2}$, respectively. As $W_{k,1}$ and $W_{k,2}$ are both Paillier ciphertexts, an external attacker cannot recover the weight information. Notice that the attacker may perform the following operation to calculate the weight:

$$\frac{W_{kj,1}}{W_{kj,2}} = \frac{1 + n \cdot (r_j \cdot x_m^k \cdot w_k)}{1 + n \cdot (r_j \cdot w_k)}. \tag{22}$$

However, since $x_m^k$, $w_k$, and $r_j$ are unknown, the attacker cannot calculate them from Eq. 22. The attacker may build more equations to recover $x_m^k$ as

$$\begin{aligned}
W_{k1,1} &= (1 + n \cdot (r_1 \cdot x_m^k \cdot w_k)) \cdot h^{s_{k1}} \bmod n^2 \\
W_{k1,2} &= (1 + n \cdot (r_1 \cdot w_k)) \cdot h^{s_{k1}} \bmod n^2 \\
W_{k2,1} &= (1 + n \cdot r_2 \cdot x_m^k \cdot w_k) \cdot h^{s_{k2}} \bmod n^2 \\
W_{k2,2} &= (1 + n \cdot r_2 \cdot w_k) \cdot h^{s_{k2}} \bmod n^2 \\
&\cdots
\end{aligned} \tag{23}$$

From Eq. 23, we can see that with more equations introduced, more random numbers (i.e., $r_j$) are introduced. Since $r_j = r_{j1} \cdot r_{j2}$, the attacker can obtain $r_j$ only if the fog node colludes with the cloud. Nevertheless, under our security model, there is no collusion between the fog and the cloud. Hence, the scheme preserves privacy and satisfies the security model.

Algorithm 2: Privacy-Preserving Truth Discovery Algorithm
Input: Observations from $K$ devices: $\{x_m^k\}_{m,k=1}^{M,K}$
Output: Ground truths for $M$ objects: $\{x_m^*\}_{m=1}^{M}$
1: The cloud randomly initializes the ground truth $\{x_m^*\}_{m=1}^{M}$.
2: Each device encrypts the observed value $x_m^k$ as $Enc(x_m^k)$ and $Enc((x_m^k)^2)$, and sends both to the fog.
3: After receiving all ciphertexts, the fog cooperates with the cloud to calculate the standard deviation $std_m$ for object $m$, and delivers it to all devices.
4: for iteration $= 1, 2, \cdots, iteration_{max}$ do
5:     for $k = 1, 2, \ldots, K$ do
6:         Each device calculates the difference between $x_m^k$ and $x_m^*$, and the sum of differences for $M$ objects, $Dist_k$. Then, $Dist_k$ is encrypted as $Enc(Dist_k)$ and submitted to the fog node.
7:         After obtaining $Enc(Dist_k)$, the fog cooperates with the cloud to recover $\log(sum_d)$, and further blinds $\log(sum_d)$ by choosing two random values $r_{j1}$ and $r_{j2}$. Then, $\log(\widetilde{sum}_d)$ is delivered to all devices.
8:         After obtaining $\log(\widetilde{sum}_d)$, each device calculates its weight and weighted data. Both are uploaded to the fog node after encryption.
9:     for $m = 1, 2, \ldots, M$ do
10:         When the fog receives $Enc(x_m^k \cdot w_k)$ and $Enc(w_k)$, it cooperates with the cloud to calculate the ground truth $x_m^*$, and then sends the truth to all devices.
11: return the ground truths $\{x_m^*\}_{m=1}^{M}$;
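The hash-chain authentication discussed above under the defense against false data injection can be sketched in Python as follows. This is an illustration under stated assumptions: SHA-256 stands in for the hash function $h$, the iteration counter is encoded as four big-endian bytes, and the names `make_chain`, `mac`, and `FogVerifier` are hypothetical. The counter bound to each link follows the chain definition $h_{kj} = h(h_{k(j+1)} \| j)$ from the system initialization, so the link revealed in iteration $j$ is checked against the counter $j - 1$.

```python
import hashlib
import hmac
import secrets


def H(data: bytes) -> bytes:
    """Stand-in for the secure hash function h (assumption: SHA-256)."""
    return hashlib.sha256(data).digest()


def counter(j: int) -> bytes:
    return j.to_bytes(4, "big")


def make_chain(w: int):
    """TA side: return [h_0, ..., h_w] with h_j = H(h_{j+1} || j)."""
    chain = [b""] * (w + 1)
    chain[w] = secrets.token_bytes(32)   # random tail h_w
    for j in range(w - 1, -1, -1):
        chain[j] = H(chain[j + 1] + counter(j))
    return chain


def mac(cipher: int, link: bytes) -> bytes:
    """Device side: mac = H(C || h_j), as in Eq. 8."""
    size = max(1, (cipher.bit_length() + 7) // 8)
    return H(cipher.to_bytes(size, "big") + link)


class FogVerifier:
    """Fog side: holds the last authenticated link, starting from the head h_0."""

    def __init__(self, head: bytes):
        self.last = head
        self.j = 0

    def accept(self, cipher: int, link: bytes, tag: bytes) -> bool:
        j = self.j + 1
        # Hash chain check: the new link must hash back to the previous one.
        if not hmac.compare_digest(H(link + counter(j - 1)), self.last):
            return False
        # MAC check (Eq. 9): recompute and compare.
        if not hmac.compare_digest(mac(cipher, link), tag):
            return False
        self.last, self.j = link, j
        return True
```

A device that holds the chain reveals `chain[1]`, `chain[2]`, and so on, one link per iteration. Replaying a link that was already accepted fails the first check, because it no longer hashes to the stored value.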
6. Performance Analysis
In addition to security model evaluation, we also perform an experimental evaluation of the communication and computational costs of both proposed schemes.
To show the communication overhead of LPTD, we compare the proposed schemes with PPDP [7], which encrypts the data by calculating $c = g^m r^n \bmod n^2$, under the same setting. Here, we assume the bit length $|n^2|$ is set as $U$. For fairness, we omit the cost of authentication for all schemes. During the weight update in LPTD-I, each device needs to submit $Enc(Dist_k)$, which costs $U$ bits. In PPDP, device $k$ needs to submit $Enc(Dist_k)$ and $Enc(\log(Dist_k))$, which cost $2U$ bits. In the truth update, PPDP and LPTD-I need to submit $M$ ciphertexts $Enc(x^k_m \cdot w_k)$ and one ciphertext $Enc(w_k)$, which cost $(M + 1)U$ bits, where $M$ is the number of objects. Compared with LPTD-I, LPTD-II needs to submit one more value, $g^{s_{kj}} \bmod n^2$, to execute the decryption operation. However, in practice, $g^{s_{kj}} \bmod n^2$ can be submitted to the fog in advance to reduce the communication overhead, as it is constant. Table 2 summarizes the communication overhead of all schemes in each phase for each device.

Table 2: Comparison of communication overhead for each CIoT device.
Scheme     Phase of weight update     Phase of truth update
PPDP       2U                         (M + 1)U
LPTD-I     U                          (M + 1)U
LPTD-II    2U                         (M + 2)U

We compare the computational costs of the LPTD and PPDP schemes by implementing all schemes in Java and running several experiments on a system with a 2.5 GHz Intel Core i7 and 16 GB RAM. The number of iterations is set to 10, and the average result of 10 experiments is used for comparison.

As shown in Fig. 3(a), we compare the run time of LPTD and PPDP with 100 devices and a varying number of objects. It can be observed that as the number of objects increases, the run time of LPTD remains far less than that of PPDP. For example, when the number of objects is 800, LPTD-I and LPTD-II take 8.098 s and 8.696 s to finish the truth discovery, respectively, while PPDP takes 71.172 s. This is because PPDP needs to perform time-consuming modular exponentiation operations, while only multiplication operations are required in LPTD. The single modular multiplication operation can be done in advance, which provides an added benefit. Note that LPTD-I performs better than LPTD-II, since LPTD-II needs to execute 2 decryption operations to recover the aggregated results, while LPTD-I only needs to perform 2 multiplication operations.

Similarly, from Fig. 3(b), we can see that the total running time of LPTD is less than that of PPDP when the number of devices ranges from 100 to 700, while the number of objects is fixed at 100. When the number of devices reaches 700, LPTD-I and LPTD-II take 34.079 s and 37.606 s to finish the truth discovery, respectively, while PPDP needs 136.754 s. This also confirms the efficiency of our scheme.
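The per-device costs in Table 2 and the speedups implied by the reported run times can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names `VALUES_SENT`, `overhead_bits`, and `speedup` are hypothetical, and the values $U = 2048$ and $M = 100$ used in the usage note are assumed purely for the example, not taken from the experiments.

```python
# Number of U-bit values each device sends, following Table 2.
# First entry: values sent in the weight update.
# Second entry: values sent in the truth update in addition to the
# M weighted-data ciphertexts.
VALUES_SENT = {
    "PPDP": (2, 1),
    "LPTD-I": (1, 1),
    "LPTD-II": (2, 2),
}


def overhead_bits(scheme, M, U):
    """Return (weight-update bits, truth-update bits) for one device."""
    weight_values, extra_truth_values = VALUES_SENT[scheme]
    return weight_values * U, (M + extra_truth_values) * U


def speedup(baseline_seconds, scheme_seconds):
    """Ratio of the baseline run time to the scheme's run time."""
    return baseline_seconds / scheme_seconds
```

For instance, with the assumed $U = 2048$ and $M = 100$, `overhead_bits("LPTD-I", 100, 2048)` gives 2048 bits for the weight update and 206848 bits for the truth update. Using the reported times, `speedup(71.172, 8.098)` is roughly 8.8 for 800 objects and `speedup(136.754, 34.079)` is roughly 4.0 for 700 devices.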
[Figure 3: plots comparing PPDP, LPTD-I, and LPTD-II. Panel (a): total running time (s) vs. number of objects. Panel (b): total running time (s) vs. number of users.]
Figure 3: (a) Total running time with varying number of objects. (b) Total running time with varying number of devices.
Fig. 4 shows the run time of the weight update and the truth update with a varying number of objects. Here, we set the number of devices to 100. As can be observed from Fig. 4(a), the run times of PPDP and LPTD are relatively stable. The reason is that, although more objects are introduced, each device only needs to perform 2 encryption operations in PPDP and 1 encryption operation in LPTD (i.e., $(Enc(Dist_k), Enc(\log Dist_k))$ vs. $Enc(Dist_k)$) in the weight update phase. Since PPDP needs to execute modular exponentiation operations, it takes longer than LPTD-I and LPTD-II. In Fig. 4(b), the running time of all schemes grows linearly. The reason is that more truths need to be updated as the number of objects increases. It can also be seen that PPDP takes more time to finish the same computations.
[Figure 4: plots comparing PPDP, LPTD-I, and LPTD-II. Panel (a): running time of weight update (s) vs. number of objects. Panel (b): running time of truth update (s) vs. number of objects.]
Figure 4: (a) Running time of weight update with varying number of objects. (b) Running time of truth update with varying number of objects.
Similar observations can be made in Fig. 5. For the weight update, since more $Dist_k$ values need to be encrypted as the number of devices increases, the run time of all schemes grows linearly. In the truth update, as all schemes need to perform more aggregation operations to calculate $\sum_{k=1}^{K} x^k_m \cdot w_k$ and $\sum_{k=1}^{K} w_k$, the run time grows linearly with the number of devices. Based on these results, we can conclude that the LPTD schemes are more efficient than existing solutions.
[Figure 5: plots comparing PPDP, LPTD-I, and LPTD-II. Panel (a): running time of weight update (s) vs. number of users. Panel (b): running time of truth update (s) vs. number of users.]
Figure 5: (a) Running time of weight update with varying number of devices. (b) Running time of truth update with varying number of devices.

7. Related Work

A number of truth discovery schemes have been studied previously [4, 5, 6, 21, 22, 23, 24, 25, 26, 27], which makes truth discovery an attractive solution for CIoT applications. Among them, CRH [4], AcuSim [5], and TruthFinder [25] are representative schemes which, compared to traditional voting or averaging approaches, provide more reliable results by considering device reliability in the aggregation process. However, these systems fail to take important privacy issues into consideration and may disclose sensitive personal information [28, 29, 30].

To protect devices' privacy, many privacy-preserving approaches have been proposed recently. For example, anonymization-based schemes are presented in [14, 31] to protect devices' private information. However, these cannot be used in truth discovery scenarios, since they are not designed to protect the data values. Cryptography-based schemes are another option to effectively protect devices' privacy. For example, Miao et al. [7] proposed a privacy-preserving truth discovery scheme that utilizes the threshold Paillier cryptosystem to protect users' privacy. However, their system is based on the assumption that there is no collusion between the cloud server and other parties. When such collusion occurs, the devices' private data can be inferred. Moreover, such cryptographic schemes are not efficient, especially considering the battery and computation limitations of mobile devices. Another scheme [27] integrated incentives with truth discovery approaches. However, their scheme assumes a trusted platform, which may impede its wide adoption.

To improve efficiency, Xu et al. [8] proposed an efficient and privacy-preserving truth discovery scheme by using an additive homomorphic data aggregation technique. Specifically, each device is assigned a random value and a secret key, and the sensory data is blinded before being delivered to the cloud. Finally, the authorized receivers can use the secret key and the aggregated random values to decrypt the ciphertexts. However, in real-life CIoT applications, device failure or missing data is a common issue. In such cases, this scheme does not work, since some of the random values are missing. Miao et al. [12] further proposed a lightweight and privacy-preserving truth discovery scheme by using two non-colluding cloud platforms. Specifically, each device is assigned random values to perturb the sensory data, the weighted data, and the weight. All the perturbed data is submitted to one cloud, while the perturbation values are submitted to the other cloud. The two clouds can cooperatively compute the truths without disclosing the sensitive information. However, similar to [8], their scheme cannot achieve fault tolerance. Moreover, if one of the clouds eavesdrops on the devices, it may decrypt the sensitive data by using the corresponding perturbation values. Finally, none of these schemes can resist external false data injection attacks.
8. Conclusion
This article proposes two lightweight and privacy-preserving truth discovery schemes for CIoT. LPTD-I uses fog nodes to resist false data injection and achieves efficient truth discovery with minimal overhead. LPTD-II extends LPTD-I: in addition to attack resistance and efficient privacy preservation, it provides fault tolerance. Detailed security analysis shows that the proposed LPTD schemes are secure under a comprehensive security model. Experimental evaluation shows a significant reduction in computation time compared to other schemes.
ACKNOWLEDGMENT
This research is supported by the National Natural Science Foundation of China (Grant Nos. 61402037, 61272512).
References

[1] Q. Wu, G. Ding, Y. Xu, S. Feng, Z. Du, J. Wang, K. Long, Cognitive internet of things: A new paradigm beyond connection, IEEE Internet of Things Journal 1 (2) (2014) 129–143. doi:10.1109/JIOT.2014.2311513. URL https://doi.org/10.1109/JIOT.2014.2311513
[2] N. Mishra, C. Lin, H. Chang, A cognitive adopted framework for iot big-data management and knowledge discovery prospective, IJDSN 11 (2015) 718390:1–718390:12. doi:10.1155/2015/718390. URL https://doi.org/10.1155/2015/718390
[3] S. Feng, P. Setoodeh, S. Haykin, Smart home: Cognitive interactive people-centric internet of things, IEEE Communications Magazine 55 (2) (2017) 34–39. doi:10.1109/MCOM.2017.1600682CM. URL https://doi.org/10.1109/MCOM.2017.1600682CM
[4] Q. Li, Y. Li, J. Gao, B. Zhao, W. Fan, J. Han, Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation, in: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, 2014, pp. 1187–1198. doi:10.1145/2588555.2610509. URL http://doi.acm.org/10.1145/2588555.2610509
[5] X. Li, X. L. Dong, K. Lyons, W. Meng, D. Srivastava, Truth finding on the deep web: Is the problem solved?, PVLDB 6 (2) (2012) 97–108.
[6] Y. Li, Q. Li, J. Gao, L. Su, B. Zhao, W. Fan, J. Han, On the discovery of evolving truth, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, 2015, pp. 675–684. doi:10.1145/2783258.2783277. URL http://doi.acm.org/10.1145/2783258.2783277
[7] C. Miao, W. Jiang, L. Su, Y. Li, S. Guo, Z. Qin, H. Xiao, J. Gao, K. Ren, Cloud-enabled privacy-preserving truth discovery in crowd sensing systems, in: Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys 2015, Seoul, South Korea, November 1-4, 2015, 2015, pp. 183–196. doi:10.1145/2809695.2809719. URL http://doi.acm.org/10.1145/2809695.2809719
[8] G. Xu, H. Li, C. Tan, D. Liu, Y. Dai, K. Yang, Achieving efficient and privacy-preserving truth discovery in crowd sensing systems, Computers & Security 69 (2017) 114–126. doi:10.1016/j.cose.2016.11.014. URL https://doi.org/10.1016/j.cose.2016.11.014
[9] Y. Xiao, V. K. Rayi, B. Sun, X. Du, F. Hu, M. Galloway, A survey of key management schemes in wireless sensor networks, Computer Communications 30 (11-12) (2007) 2314–2341. doi:10.1016/j.comcom.2007.04.009. URL https://doi.org/10.1016/j.comcom.2007.04.009
[10] X. Du, H. Chen, Security in wireless sensor networks, IEEE Wireless Commun. 15 (4) (2008) 60–66. doi:10.1109/MWC.2008.4599222. URL https://doi.org/10.1109/MWC.2008.4599222
[11] X. Du, Y. Xiao, M. Guizani, H. Chen, An effective key management scheme for heterogeneous sensor networks, Ad Hoc Networks 5 (1) (2007) 24–34. doi:10.1016/j.adhoc.2006.05.012. URL https://doi.org/10.1016/j.adhoc.2006.05.012
[12] C. Miao, L. Su, W. Jiang, Y. Li, M. Tian, A lightweight privacy-preserving truth discovery framework for mobile crowd sensing systems, in: 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, May 1-4, 2017, 2017, pp. 1–9. doi:10.1109/INFOCOM.2017.8057114. URL https://doi.org/10.1109/INFOCOM.2017.8057114
doi:10.1007/3-540-44987-6_18. URL https://doi.org/10.1007/3-540-44987-6_18
[15] R. Lu, K. Heung, A. H. Lashkari, A. A. Ghorbani, A lightweight privacy-preserving data aggregation scheme for fog computing-enhanced iot, IEEE Access 5 (2017) 3302–3312. doi:10.1109/ACCESS.2017.2677520. URL https://doi.org/10.1109/ACCESS.2017.2677520
[16] L. Wu, X. Du, J. Wu, Mobifish: A lightweight anti-phishing scheme for mobile phones, in: Computer Communication and Networks (ICCCN), 2014 23rd International Conference on, IEEE, 2014, pp. 1–8.
[17] L. Wu, X. Du, X. Fu, Security threats to mobile multimedia applications: Camera-based attacks on mobile phones, IEEE Communications Magazine 52 (3) (2014) 80–87.
[18] X. Huang, X. Du, Achieving big data privacy via hybrid cloud, in: Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on, IEEE, 2014, pp. 512–517.
[19] X. Liu, R. H. Deng, K. R. Choo, J. Weng, An efficient privacy-preserving outsourced calculation toolkit with multiple keys, IEEE Trans. Information Forensics and Security 11 (11) (2016) 2401–2414. doi:10.1109/TIFS.2016.2573770. URL https://doi.org/10.1109/TIFS.2016.2573770
doi:10.1109/SECPRI.2000.848446. URL https://doi.org/10.1109/SECPRI.2000.848446
[21] F. Ma, Y. Li, Q. Li, M. Qiu, J. Gao, S. Zhi, L. Su, B. Zhao, H. Ji, J. Han, Faitcrowd: Fine grained truth discovery for crowdsourced data aggregation, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, 2015, pp. 745–754. doi:10.1145/2783258.2783314. URL http://doi.acm.org/10.1145/2783258.2783314
[22] C. Meng, W. Jiang, Y. Li, J. Gao, L. Su, H. Ding, Y. Cheng, Truth discovery on crowd sensing of correlated entities, in: Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys 2015, Seoul, South Korea, November 1-4, 2015, 2015, pp. 169–182. doi:10.1145/2809695.2809715. URL http://doi.acm.org/10.1145/2809695.2809715
[23] L. Su, Q. Li, S. Hu, S. Wang, J. Gao, H. Liu, T. F. Abdelzaher, J. Han, X. Liu, Y. Gao, L. M. Kaplan, Generalized decision aggregation in distributed sensing systems, in: Proceedings of the IEEE 35th IEEE Real-Time Systems Symposium, RTSS 2014, Rome, Italy, December 2-5, 2014, 2014, pp. 1–10. doi:10.1109/RTSS.2014.40. URL https://doi.org/10.1109/RTSS.2014.40
[24] D. Wang, L. M. Kaplan, H. K. Le, T. F. Abdelzaher, On truth discovery in social sensing: a maximum likelihood estimation approach, in: The 11th International Conference on Information Processing in Sensor Networks (co-located with CPS Week 2012), IPSN 2012, Beijing, China, April 16-19, 2012, 2012, pp. 233–244. doi:10.1145/2185677.2185737. URL http://doi.acm.org/10.1145/2185677.2185737
[25] X. Yin, J. Han, P. S. Yu, Truth discovery with multiple conflicting information providers on the web, IEEE Trans. Knowl. Data Eng. 20 (6) (2008) 796–808. doi:10.1109/TKDE.2007.190745. URL https://doi.org/10.1109/TKDE.2007.190745
[26] F. Zhang, L. He, W. He, X. Liu, Data perturbation with state-dependent noise for participatory sensing, in: Proceedings of the IEEE INFOCOM 2012, Orlando, FL, USA, March 25-30, 2012, 2012, pp. 2246–2254. doi:10.1109/INFCOM.2012.6195610. URL https://doi.org/10.1109/INFCOM.2012.6195610
[27] H. Jin, L. Su, H. Xiao, K. Nahrstedt, INCEPTION: incentivizing privacy-preserving data aggregation for mobile crowd sensing systems, in: Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing, MobiHoc 2016, Paderborn, Germany, July 4-8, 2016, 2016, pp. 341–350. doi:10.1145/2942358.2942375. URL http://doi.acm.org/10.1145/2942358.2942375
[28] X. Du, M. Guizani, Y. Xiao, H. Chen, Secure and efficient time synchronization in heterogeneous sensor networks, IEEE Trans. Vehicular Technology 57 (4) (2008) 2387–2394. doi:10.1109/TVT.2007.912327. URL https://doi.org/10.1109/TVT.2007.912327
[29] X. Hei, X. Du, J. Wu, F. Hu, Defending resource depletion attacks on implantable medical devices, in: Proceedings of the Global Communications Conference, 2010. GLOBECOM 2010, 6-10 December 2010, Miami, Florida, USA, 2010, pp. 1–5. doi:10.1109/GLOCOM.2010.5685228. URL https://doi.org/10.1109/GLOCOM.2010.5685228
[30] X. Hei, X. Du, Biometric-based two-level secure access control for implantable medical devices during emergencies, in: INFOCOM 2011. 30th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 10-15 April 2011, Shanghai, China, 2011, pp. 346–350. doi:10.1109/INFCOM.2011.5935179. URL https://doi.org/10.1109/INFCOM.2011.5935179
[31] L. Sweeney, k-anonymity: A model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (5) (2002) 557–570. doi:10.1142/S0218488502001648. URL https://doi.org/10.1142/S0218488502001648

arXiv [cs.CR]
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
PTBI: An Efficient Privacy-Preserving Biometric Identification Based on Perturbed Term in the Cloud
Chang Xu, Chuan Zhang, and Liehuang Zhu,
Member, IEEE,
Abstract—Biometric identification has become increasingly popular for authenticating individuals' identities. For efficiency and economic savings, biometric data owners are motivated to outsource the identification to a third party, which brings a tradeoff between efficiency and privacy protection. In this paper, we propose a new privacy-preserving biometric identification scheme which can release the database owner from a heavy computation burden. In the proposed scheme, we design a new biometric data encryption and matching algorithm by exploiting inherent structures of biometric data and introducing perturbed terms. A thorough analysis indicates that our scheme is secure and offers a higher level of privacy protection than existing biometric identification outsourcing works. The experimental results further show that our proposed scheme meets the efficiency need well.
Index Terms—Biometric identification; data outsourcing; privacy-preserving; cloud computing.
Chang Xu, Chuan Zhang and Liehuang Zhu are with the Beijing Engineering Research Center of Massive Language Information Processing and Cloud Computing Application, the School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]; [email protected]; [email protected]).
Manuscript received April 19, 2005; revised August 26, 2015.

I. INTRODUCTION

Biometric identification is a task to authenticate users' identities with biometric data, which includes fingerprints, irises, facial patterns, etc. Compared with traditional authentication methods such as passwords and identification cards, biometric identification searches the collection of traits to find the best match for a given biometric trait [1]. As biometric sensors (e.g., fingerprint sensors) are becoming smaller and cheaper, automatic identification based on biometric data is becoming an attractive alternative to the traditional methods of identification [7].

A typical biometric identification system consists of two parties: a database owner and users. The database owner stores a set of biometric data, and users can submit a candidate biometric trait to the database owner for identification. To release the database owner from expensive local storage and a heavy computation burden, more and more companies and governments are motivated to upload their data to the cloud server for economic and storage savings [2]. When the cloud is introduced to the system, sensitive biometric data has to be encrypted before outsourcing. Specifically, the database owner encrypts the biometric data and then sends it to the cloud server. Whenever a user (e.g., a partner of the national departments, such as a bank) wants to verify an individual's identity (e.g., that of a banker), the bank will submit a query to the database owner. Upon receiving the query, the database owner executes the query encryption and further turns to the cloud server for identification.

However, realizing such a biometric identification outsourcing system is challenging considering the requirements of data privacy and matching efficiency. Several solutions [3], [4], [6], [9], [10], [11], [12], [14], [15], [16] have been proposed to achieve a good tradeoff between efficiency and privacy protection. However, most of them suffer from efficiency issues (e.g., they rely on complex homomorphic encryption) or security drawbacks (e.g., they are not secure, or are secure only under weak attack models). In [11] and [16], homomorphic encryption and oblivious transfer were utilized to protect biometric data privacy. However, because of the computation costs introduced, these schemes failed to support a large database. Recently, Huang et al. [3] proposed a privacy-preserving biometric identification scheme based on homomorphic encryption and garbled circuits. Compared with [11] and [16], Huang et al.'s scheme can support a larger database of up to 1 GB. However, as a secure two-party system, their scheme cannot be applied in the outsourcing model directly. To suit the outsourcing demands, Yuan and Yu [4] proposed a cloud-based privacy-preserving biometric identification scheme. However, Zhu et al. [5] and Wang et al. [6] pointed out that Yuan and Yu's scheme was not secure. To address these drawbacks, Wang et al. [6] proposed a biometric identification scheme that introduces random diagonal matrices. Note that Wang et al.'s scheme was based on a weaker attack model compared with [4].
If the attacker is able to collude with the cloud server and simultaneously observe some biometric data and queries, their scheme can be completely broken.

In this paper, for the first time, we propose a scheme which achieves a higher level of privacy protection than existing works and obtains high identification efficiency. Specifically, in our scheme, the pre-processed biometric data is encrypted and outsourced to the cloud server. When a user needs to identify a biometric trait, the user submits the query to the database owner, where the query is extended and encrypted. After receiving the encrypted query, the cloud server searches the encrypted database and returns the index of the matching ciphertext to the database owner, where the FingerCodes' Euclidean distance can be efficiently computed. Different from previous works, we exploit inherent structures of the biometric data and introduce some perturbed terms into the data before performing encryption. Our main contributions can be summarized as follows:

• This paper proposes efficient privacy-preserving biometric identification solutions for high privacy requirements. We enable our scheme to securely outsource the biometric database to the cloud server and efficiently perform the identification without compromising data privacy. As shown in Fig. 1, compared with previous works, our scheme achieves a higher level of privacy protection, as most existing works require that the attacker cannot observe the biometric data and the corresponding ciphertexts. Wang et al. [6] and Zhu et al. [5] even claim that an attack based on this ability is so strong that no effective scheme exists to defend against it. Our scheme is designed to omit this strong assumption. Moreover, as shown in Section IV, the identification efficiency of our scheme is higher than that of existing solutions. To the best of our knowledge, the proposed scheme is the first to balance efficiency and privacy protection well in the biometric outsourcing identification system.

• This paper establishes a set of strict privacy requirements; see Section II-B. To the best of our knowledge, such requirements result in the strictest attack model in the biometric outsourcing identification system. To defend against the collusion attack of a Level-II attacker, we exploit the structure of the query data and design the PTBI-I scheme. To resist the strong attack of a Level-III attacker, we insert some variables into the biometric data to increase the randomness and further design the PTBI-II scheme. The security analysis indicates that our PTBI schemes are secure under the Level-II and Level-III attacks, respectively.

• This paper proposes an efficient biometric data encryption and secure outsourced matching scheme. To release the database owner from the tremendous computation burden, we use random matrices and vectors to execute the data encryption and matching. Compared with the works [4], [6] that utilize the same encryption method, the proposed scheme is tailored to suit higher privacy and efficiency requirements. For example, when encrypting the biometric data, we execute fewer matrix multiplication operations than [4], which results in less data encryption time. Moreover, since we transform the matrix multiplications into vector-matrix multiplications, the identification time in our scheme is reduced by as much as 73.4% compared with [6].

The remainder of this paper is organized as follows: Section II presents the problem formulation, including the system model, the threat model, and our design goals. In Section III, we provide our construction, including two schemes followed by correctness and security analysis. Performance analysis is presented in Section IV. In Section V, we give the related work, and our conclusion is presented in Section VI.

II. PROBLEM FORMULATION
A. System Model
We consider a cloud-based biometric identification system that involves three different entities, as shown in Fig. 2: the database owner, users, and the cloud server. The database owner outsources the encrypted database to the cloud server. When a user's identity is to be verified, a query is transmitted to the database owner and further uploaded to the cloud server.
Fig. 1. Architecture of the cloud-based biometric-identification system.
More specifically, the database owner owns a set of biometric data (e.g., fingerprints, voice patterns, facial patterns, etc.). For convenience of database search, the database owner builds an index $I$ for each biometric data item. Then the index and the encrypted biometric data are both outsourced to the cloud server. To identify a candidate biometric trait, a user submits a query to the database owner. After receiving the query, the database owner executes the encryption and then uploads the ciphertext to the cloud server. Upon receiving the ciphertext, the cloud server is responsible for finding the best match and returns the corresponding index to the database owner. Subsequently, the database owner computes the Euclidean distance between the candidate biometric data and the plaintext corresponding to the returned index. Finally, the database owner compares the distance with the defined threshold and returns the final result to the user.

We assume the biometric data (e.g., fingerprint data), on either the user side or the database owner side, has been processed such that its representation is fit for encryption and matching. In this work, we focus on fingerprint identification and obtain the fingerprint data following the feature extraction algorithm [7]. In our scheme, a FingerCode with $n$ elements (typically $n = 640$) is utilized to represent a fingerprint image.

Given two FingerCodes $b_1 = [b_{11}, b_{12}, \cdots, b_{1n}]$ and $b_2 = [b_{21}, b_{22}, \cdots, b_{2n}]$, their Euclidean distance is defined as:

$$dist = \sqrt{\sum_{j=1}^{n} (b_{1j} - b_{2j})^2} \quad (1)$$

If the Euclidean distance is below the defined threshold, the two FingerCodes can be considered to be from the same person. Therefore, the process of identifying a candidate biometric trait can be described as follows: candidate FingerCode encryption, secure Euclidean distance computation, best match finding, and result retrieval. The database owner executes the first and the last steps, and the others are executed on the cloud server side. In our cloud-based biometric identification scheme, to improve the efficiency, the time-consuming matching operations are outsourced to the cloud server.
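The final plaintext check performed by the database owner can be illustrated with a short sketch of Eq. (1). The function names `euclidean_distance` and `same_person` and the threshold values in the usage note are hypothetical; the paper does not fix a concrete threshold in this section.

```python
import math


def euclidean_distance(b1, b2):
    """Eq. (1): Euclidean distance between two equal-length FingerCodes."""
    if len(b1) != len(b2):
        raise ValueError("FingerCodes must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)))


def same_person(b1, b2, threshold):
    """Accept the match only when the distance is below the threshold."""
    return euclidean_distance(b1, b2) < threshold
```

For example, `same_person([10, 20, 30], [11, 19, 30], 2.0)` accepts the pair because the distance is $\sqrt{2}$, whereas a pair at distance exactly equal to the threshold is rejected, since the text requires the distance to be below it.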
Fig. 2. Architecture of the cloud-based biometric-identification system.
B. Threat Model
In our scheme, users and the cloud server are both considered semi-honest: they honestly execute the tasks of the designed protocol but try to disclose as much private information as possible. We assume the adversary knows the encrypted database, the encrypted queries, and all the values computed in the cloud server. Based on what the adversary knows, we consider our threat model as follows:
• Level-I: The adversary can observe the encrypted database and the encrypted queries in the cloud server.
• Level-II: On the basis of Level-I, the adversary can observe some biometric data in the biometric database, but does not know the corresponding ciphertexts.
• Level-III: On the basis of Level-II, the adversary can observe some plaintexts in the database and knows the corresponding ciphertexts. Moreover, the adversary can be a valid user and construct queries of his interest.
C. Design Goals
To enable efficient identification in the cloud server under the aforementioned model, our scheme should achieve the following privacy protection and efficiency goals:
• Privacy protection: The system should prevent the cloud server and the adversary from learning any information beyond what they already know. Specifically, the system should defend against the Level-III attack.
• Efficiency: The system should outsource the most time-consuming identification operations to the cloud server.

III. OUR CONSTRUCTION: THE PTBI-I AND PTBI-II SCHEME
To efficiently achieve candidate FingerCode identification, the “inner product similarity” [8] is employed to quantitatively formalize the efficient matching. In this section, we first propose a privacy-preserving and efficient biometric identification scheme under the Level-II attack, named PTBI-I. Then, we present an enhanced scheme, named PTBI-II, which achieves security under the Level-III attack.
A. PTBI-I: The Basic Scheme

1) Biometric Database Encryption Phase: As described in Section II-A, the fingerprint image is assumed to be pre-processed using the extraction algorithm and represented as a FingerCode $b_i$. The FingerCode $b_i = [b_{i1}, b_{i2}, \cdots, b_{in}]$ is an $n$-dimensional vector in which each element has a size of $l$ bits (typically, $n = 640$ and $l = 8$). To facilitate the identification matching, the FingerCode is extended to an $(n+2)$-dimensional vector $B_i$, where the $(n+1)$-th element is set to $-0.5(b_{i1}^2 + b_{i2}^2 + \cdots + b_{in}^2)$ and the $(n+2)$-th element is 1. For biometric data protection, the encryption operations are performed as follows:

Step 1: The database owner randomly generates secret keys consisting of two $(n+2) \times (n+2)$ invertible matrices $\{M_1, M_2\}$ and one $(n+2)$-dimensional vector $H$, where each element in the secret keys is a random value with the same size as the elements in the FingerCode.

Step 2: To protect each extended FingerCode $B_i$, a random $(n+2) \times (n+2)$ matrix $D_i$ is generated to hide the biometric data as:

$$D_i = \begin{bmatrix}
A_{11} * b_{i1} & A_{12} * b_{i1} & \cdots & A_{1(n+2)} * b_{i1} \\
A_{21} * b_{i2} & A_{22} * b_{i2} & \cdots & A_{2(n+2)} * b_{i2} \\
\vdots & \vdots & \ddots & \vdots \\
A_{(n+2)1} * b_{i(n+2)} & A_{(n+2)2} * b_{i(n+2)} & \cdots & A_{(n+2)(n+2)} * b_{i(n+2)}
\end{bmatrix} \quad (2)$$

where $A_k = [A_{k1}, A_{k2}, \cdots, A_{k(n+2)}]$ ($k \in [1, n+2]$) is a random vector that satisfies the requirement $A_k \times H^T = 1$, and $b_{ik}$ denotes the $k$-th element of $B_i$. More specifically, the FingerCode $B_i^T$ can be recovered by using the secret key $H$ and the matrix $D_i$ as $D_i \times H^T = B_i^T$.

Step 3: After hiding the FingerCode, the database owner further executes the encryption as follows:

$$C_i = M_1^{-1} \times D_i \times M_2 \quad (3)$$

After encryption, the database owner builds an index $I_i$ and associates it with the FingerCode $b_i$ and its encrypted form $C_i$. Then, the tuple $\{C_i, I_i\}$ is uploaded to the cloud server for storage.
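The hiding step (Step 2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it uses exact fractions so that the identity $D_i \times H^T = B_i^T$ holds without rounding, and it obtains each random vector $A_k$ by drawing all but the last entry at random and solving for the last one. That construction, the value ranges, and the helper names `extend_fingercode`, `random_key_vector`, `hide`, and `recover` are all assumptions made for illustration.

```python
from fractions import Fraction
import random


def extend_fingercode(b):
    """B_i = [b_1, ..., b_n, -0.5 * sum(b_j^2), 1]."""
    return ([Fraction(x) for x in b]
            + [Fraction(-sum(x * x for x in b), 2), Fraction(1)])


def random_key_vector(size, rng):
    """Secret vector H; entries are non-zero so that A_k can be solved for."""
    return [Fraction(rng.randint(1, 255)) for _ in range(size)]


def hide(B, H, rng):
    """Build D_i: row k equals A_k * B[k], with A_k random and A_k . H = 1."""
    size = len(B)
    D = []
    for k in range(size):
        A = [Fraction(rng.randint(-255, 255)) for _ in range(size - 1)]
        # Solve for the last entry so that the dot product with H is exactly 1.
        partial = sum(a * h for a, h in zip(A, H))
        A.append((1 - partial) / H[-1])
        D.append([a * B[k] for a in A])
    return D


def recover(D, H):
    """Compute D_i x H^T, which gives back the extended FingerCode B_i^T."""
    return [sum(d * h for d, h in zip(row, H)) for row in D]
```

Because every row of $D_i$ is the same scalar $b_{ik}$ times a vector whose inner product with $H$ is 1, multiplying by $H^T$ returns $b_{ik}$ row by row, which is what `recover` reproduces.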
2) Biometric Data Matching Phase: In this phase, the query is encrypted. Before describing query encryption, we first define the secure Euclidean distance, which serves as the similarity measure in our scheme.

Definition 1 (secure Euclidean distance): The goal is to find the FingerCode that has the minimum Euclidean distance to the query. However, it is not necessary to compute every Euclidean distance to identify the closest one. For example, given two FingerCodes $b_1$, $b_2$ and a query $b_c$, their secure Euclidean distance $S_{12}$ is computed as follows:

$$
\begin{aligned}
S_{12} &= dist_{1c} - dist_{2c} \\
&= \sum_{j=1}^{n} (b_{1j} - b_{cj})^2 - \sum_{j=1}^{n} (b_{2j} - b_{cj})^2 \\
&= \sum_{j=1}^{n} (b_{1j}^2 - b_{2j}^2) + 2 \sum_{j=1}^{n} (b_{2j} - b_{1j}) b_{cj}
\end{aligned} \tag{4}
$$

Based on equation 4, the FingerCode that is closer to the query can be identified by checking the sign of $S_{12} = \sum_{j=1}^{n} (b_{1j}^2 - b_{2j}^2) + 2 \sum_{j=1}^{n} (b_{2j} - b_{1j}) b_{cj}$, without knowing either Euclidean distance.

Step 4: To identify a candidate FingerCode, a user submits a query FingerCode $b_c$ to the database owner. The database owner extends the query to $B_c = [b_{c1}, b_{c2}, \cdots, b_{cn}, 1, r_c]$, where $r_c$ is a random positive value that is chosen differently for each query. The database owner then computes

$$
C_F = B_c \times M_1 \tag{5}
$$

After encrypting the query, the database owner further encrypts $H$ as

$$
C_H = M_2^{-1} \times H^T \tag{6}
$$

where $M_2^{-1}$ is the inverse of $M_2$. The tuple $\{C_F, C_H\}$ is then uploaded to the cloud server for identification.

Step 5: Upon receiving the encrypted query, the cloud server computes the similarity between the query and each encrypted biometric record. Let $P_i$ denote the similarity score, which is computed as follows:

$$
\begin{aligned}
P_i &= C_F \times C_i \times C_H \\
&= B_c \times M_1 \times M_1^{-1} \times D_i \times M_2 \times M_2^{-1} \times H^T \\
&= B_c \times B_i^T \\
&= \sum_{j=1}^{n+1} b_{cj} b_{ij} + r_c
\end{aligned} \tag{7}
$$

The cloud server then ranks the similarity scores $P_i$ and returns the top-1 ranked index to the database owner.
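Steps 1 to 5 can be checked end to end on a toy database. The sketch below is an illustration under stated assumptions rather than the paper's implementation: all function names are hypothetical, exact fractions replace the fixed-size key elements so that the cancellation in equation 7 holds exactly, and the key matrices are simply re-drawn until an invertible one is found.

```python
from fractions import Fraction
import random

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(M):
    """Gauss-Jordan inverse over exact fractions; raises ValueError if singular."""
    n = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def random_invertible(size, rng):
    """Draw random matrices until one is invertible; return it with its inverse."""
    while True:
        M = [[Fraction(rng.randint(1, 255)) for _ in range(size)] for _ in range(size)]
        try:
            return M, inverse(M)
        except ValueError:
            continue

def extend_record(b):
    return [Fraction(x) for x in b] + [Fraction(-sum(x * x for x in b), 2), Fraction(1)]

def extend_query(q, r_c):
    return [Fraction(x) for x in q] + [Fraction(1), Fraction(r_c)]

def hide(B, H, rng):
    """Row k of D is A_k * B[k] with A_k . H == 1, so D x H^T = B^T."""
    rows = []
    for Bk in B:
        A = [Fraction(rng.randint(1, 255)) for _ in H[:-1]]
        A.append((1 - sum(a * h for a, h in zip(A, H))) / H[-1])
        rows.append([a * Bk for a in A])
    return rows

rng = random.Random(1)
n = 4
M1, M1_inv = random_invertible(n + 2, rng)
M2, M2_inv = random_invertible(n + 2, rng)
H = [Fraction(rng.randint(1, 255)) for _ in range(n + 2)]

database = [[5, 0, 2, 7], [1, 1, 1, 1], [4, 1, 2, 6]]
# C_i = M1^{-1} x D_i x M2 (equation 3)
encrypted_db = [matmul(matmul(M1_inv, hide(extend_record(b), H, rng)), M2) for b in database]

query, r_c = [4, 1, 2, 7], 9
B_c = extend_query(query, r_c)
C_F = matmul([B_c], M1)                      # equation 5, a 1 x (n+2) row
C_H = matmul(M2_inv, [[h] for h in H])       # equation 6, an (n+2) x 1 column

# Cloud side: P_i = C_F x C_i x C_H (equation 7)
scores = [matmul(matmul(C_F, C_i), C_H)[0][0] for C_i in encrypted_db]
expected = [sum(x * y for x, y in zip(B_c, extend_record(b))) for b in database]
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy run the third record, which differs from the query in a single element by 1, receives the largest score, and each score computed from ciphertexts equals the plain inner product $B_c \times B_i^T$.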
3) Final Matching Computation Phase: The index returned by the cloud server identifies the FingerCode in the database that has the minimum Euclidean distance to the query. Since the exact Euclidean distance is not known, the database owner needs to compute the exact distance between $b_i$ and $b_c$, as shown in equation 1, to decide whether the two FingerCodes belong to the same person.

Step 6: After receiving the index $I_i$, the database owner retrieves the corresponding biometric data $b_i$ and computes the Euclidean distance $dist_{ic}$ between $b_i$ and $b_c$. If $dist_{ic}$ is below the predefined threshold, $b_c$ is identified; otherwise, it is denied. Finally, the database owner returns the result to the user.

Correctness Analysis: As shown in equation 7, $P_i$ is a scalar, and the sign of $P_i - P_z$ can be computed as follows:

$$
\begin{aligned}
P_i - P_z &= \Big( \sum_{j=1}^{n+2} b_{ij} b_{cj} \Big) - \Big( \sum_{j=1}^{n+2} b_{zj} b_{cj} \Big) \\
&= \Big( \sum_{j=1}^{n} b_{ij} b_{cj} - 0.5 \sum_{j=1}^{n} b_{ij}^2 + r_c \Big) - \Big( \sum_{j=1}^{n} b_{zj} b_{cj} - 0.5 \sum_{j=1}^{n} b_{zj}^2 + r_c \Big) \\
&= 0.5 \Big( \sum_{j=1}^{n} (b_{zj}^2 - b_{ij}^2) + 2 \sum_{j=1}^{n} (b_{ij} - b_{zj}) b_{cj} \Big) \\
&= 0.5 (dist_{zc} - dist_{ic}) \\
&= -0.5\, S_{iz}
\end{aligned} \tag{8}
$$

According to Definition 1, $S_{iz}$ is the secure Euclidean distance of $b_i$ and $b_z$ with respect to the query. The cloud server can therefore compare candidates by checking the sign of $P_i - P_z$, where $1 \le i, z \le m$ and $i \ne z$. More specifically, if $P_i - P_z > 0$, the cloud server learns that $dist_{zc} > dist_{ic}$, which indicates that $b_i$ matches the query better. Otherwise, the cloud server learns that $dist_{zc} < dist_{ic}$. Therefore, the largest similarity score indicates the minimum Euclidean distance. After repeating the matching process over the entire encrypted database, the cloud server only needs to find the largest similarity score and return the corresponding index.

Security Analysis:

Theorem 1. The PTBI-I scheme is secure under the Level-II attack.

Proof of Theorem 1. See Appendix A.
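The identity in equation 8 can be verified numerically on random data. The following sketch works in the plain domain only (no encryption) and is an illustration: the function names, the toy sizes `n = 16` and `m = 50`, and the use of squared Euclidean distance for `dist` (matching the expansion in equation 4) are assumptions of this example.

```python
from fractions import Fraction
import random

def sq_dist(b, q):
    """Squared Euclidean distance, as expanded in equation 4."""
    return sum((x - y) ** 2 for x, y in zip(b, q))

def score(b, q, r_c):
    """Plain-domain value of P_i = B_c x B_i^T for the extended vectors."""
    return sum(x * y for x, y in zip(b, q)) - Fraction(sum(x * x for x in b), 2) + r_c

rng = random.Random(7)
n, m = 16, 50
db = [[rng.randint(0, 255) for _ in range(n)] for _ in range(m)]
q = [rng.randint(0, 255) for _ in range(n)]
r_c = rng.randint(1, 1000)

P = [score(b, q, r_c) for b in db]
dist = [sq_dist(b, q) for b in db]

# Equation 8: P_i - P_z = 0.5 * (dist_zc - dist_ic) for every pair (i, z).
identity_holds = all(
    P[i] - P[z] == Fraction(dist[z] - dist[i], 2)
    for i in range(m) for z in range(m)
)
by_score = max(range(m), key=lambda i: P[i])     # what the cloud server returns
by_dist = min(range(m), key=lambda i: dist[i])   # the true nearest record
```

Since $r_c$ is the same for every record within one query, it cancels in each difference, which is why the ranking by score coincides with the ranking by distance.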
B. PTBI-II: The Enhanced Scheme

The PTBI-I scheme achieves efficient identification and protects privacy under the Level-II attack, but it leaks private data under the Level-III attack. Specifically, the cloud server obtains all similarity scores through the equation $P_i = C_F \times C_i \times C_H$. When the attacker can observe the pairs $\langle B_i, C_i \rangle$ ($i \in [1, m]$) and construct a query $b_c$, $r_c$ can be recovered as $r_c = P_i - \sum_{j=1}^{n} b_{ij} b_{cj} + 0.5 \sum_{j=1}^{n} b_{ij}^2$. In the same way, the attacker can construct queries $b'_l$ and obtain the corresponding random values $r_l$, where $l \in [1, t]$. After that, for unknown biometric data $b_k$, the cloud server can compute $b_k$ from the equations $P_k = \sum_{j=1}^{n} b_{kj} b'_{lj} - 0.5 \sum_{j=1}^{n} b_{kj}^2 + r_l$. To achieve a higher level of privacy protection, we propose an enhanced scheme that introduces more randomness when encrypting the biometric data.

The key difference between the PTBI-I and PTBI-II schemes is that the database owner introduces randomness into the similarity score. In addition to the randomness in the query, the database owner inserts a random variable into each biometric record. All vectors are extended to $(n+3)$ dimensions instead of $(n+2)$, and all matrices are extended to $(n+3) \times (n+3)$. More specifically, $B_i = [b_{i1}, b_{i2}, \cdots, b_{in}, -0.5 \sum_{j=1}^{n} b_{ij}^2, 1, \varepsilon_i]$ and $B_c = [b_{c1}, b_{c2}, \cdots, b_{cn}, 1, r_c, 1]$, where $\varepsilon_i$ is a random variable.

The remaining operations of the biometric database encryption phase and the biometric data matching phase are the same as in the PTBI-I scheme.
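The Level-III leakage of PTBI-I described at the start of this subsection can be made concrete with a small simulation. This is a sketch under assumptions: the attacker's choice of queries (the zero vector plus the $n$ unit vectors) is one convenient option that is not taken from the paper, the `score` function models only the value the cloud server sees in PTBI-I, and all names are hypothetical.

```python
from fractions import Fraction
import random

def score(b, q, r_c):
    """PTBI-I similarity score as observed by the cloud server (equation 7)."""
    return sum(x * y for x, y in zip(b, q)) - Fraction(sum(x * x for x in b), 2) + r_c

rng = random.Random(3)
n = 6
known = [rng.randint(0, 255) for _ in range(n)]    # plaintext b_i the attacker knows
secret = [rng.randint(0, 255) for _ in range(n)]   # unknown record b_k to recover

# Attacker-chosen queries b'_l: the zero vector and the n unit vectors.
queries = [[0] * n] + [[int(j == l) for j in range(n)] for l in range(n)]

unblinded = []
for q in queries:
    r = rng.randint(1, 10**6)                      # fresh r_l, hidden from the attacker
    P_known, P_secret = score(known, q, r), score(secret, q, r)
    # Recover r_l from the known record: r_l = P_i - <b_i, q> + 0.5 * sum(b_ij^2).
    r_rec = (P_known - sum(x * y for x, y in zip(known, q))
             + Fraction(sum(x * x for x in known), 2))
    unblinded.append(P_secret - r_rec)             # <b_k, q> - 0.5 * sum(b_kj^2)

half_norm = unblinded[0]                           # zero query isolates -0.5 * sum(b_kj^2)
recovered = [int(u - half_norm) for u in unblinded[1:]]
```

With $n + 1$ queries the attacker obtains every element of $b_k$, which illustrates why PTBI-I cannot be considered secure once plaintext-ciphertext pairs and chosen queries are both available.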
Correctness Analysis: In the PTBI-II scheme, the cloud server computes the similarity score as $P_i = \sum_{j=1}^{n+1} b_{cj} b_{ij} + r_c + \varepsilon_i$. Because the randomness $\varepsilon_i$ becomes part of the similarity score, the search result may not be as accurate as in the PTBI-I scheme. However, since different FingerCodes differ markedly from one another, the search result can still be expected to be the correct one if $\varepsilon_i$ is kept within an appropriate range. In this scheme, we let $\varepsilon_i$ follow a normal distribution $N(0, \sigma^2)$.

Security Analysis: The introduction of the random variable $\varepsilon_i$ does not compromise the security properties of PTBI-I, so PTBI-II remains secure under the Level-II attack. For the Level-III attack, we have the following theorem.

Theorem 2. The PTBI-II scheme is secure under the Level-III attack.

Proof of Theorem 2. See Appendix B.

Moreover, we compare the security of our schemes with that of two other schemes under our threat models. As Table I shows, only our PTBI-II scheme achieves security under all three attack levels.

IV. PERFORMANCE ANALYSIS
To evaluate the performance of our schemes, we implement PTBI-I and PTBI-II in C. The cloud server consists of 2 nodes, each with a 6-core 2.10 GHz Intel(R) Xeon(R) E5-2620 v2 CPU and 32 GB of memory. For the database owner, we use a laptop with a 2.40 GHz Intel(R) Core(TM) CPU and 8 GB of memory. We randomly generate 640-dimensional vectors as FingerCodes to construct the biometric database, and randomly select some of these FingerCodes as queries for the identification task.
A. Complexity Analysis
Before evaluating our implementation, we first analyze the complexity of the PTBI-I and PTBI-II schemes. As described in Section III, our schemes can be decomposed into three stages. In stage 1, the whole biometric database is encrypted. For each biometric record, the database owner executes matrix multiplications, each of which has a time complexity of $O(n^3)$, where $n$ is the dimension of the FingerCode. Assuming that $m$ FingerCodes need to be encrypted, the total complexity of stage 1 is $O(mn^3)$. In stage 2, a query is submitted to the database owner. To encrypt the query, the database owner performs a vector-matrix multiplication, which costs $O(n^2)$. Similarly, the encryption of $H$ also costs $O(n^2)$. On the cloud server side, computing a similarity score requires matrix-vector multiplications, which cost $O(n^2)$. With $m$ encrypted biometric records, finding the FingerCode that has the minimum Euclidean distance to the query has a total computational complexity of $O(mn^2 + m \log m)$. Note that the identification phase can be executed in parallel on the cloud server, which keeps our scheme efficient. In stage 3, the database owner computes the Euclidean distance between the query and the FingerCode indicated by the returned index, which costs $O(n)$. As shown in Table II, compared with other schemes, the complexity of our scheme is the lowest in all stages.

B. Experimental Evaluation
Preparation phase:
Fig. 3 and Fig. 4 show the time cost and the bandwidth consumption in the preparation phase. Biometric data encryption is a one-time cost, and the preparation time of all schemes grows linearly as the number of FingerCodes increases. As shown in Fig. 3, the preparation time is almost the same for PTBI-I, PTBI-II, and Wang et al.’s scheme, which confirms the theoretical analysis in Table II. Because Yuan and Yu’s scheme executes more encryption operations, it takes more time than the other three schemes. The bandwidth consumption of all four schemes, shown in Fig. 4, is almost the same. Note that this is a one-time cost, which can be avoided by shipping the data on hard disk drives instead of transferring it over the network.
Fig. 3. Time costs for different numbers of FingerCodes in the preparation phase.
Fig. 4. Bandwidth costs for different numbers of FingerCodes in the preparation phase.
TABLE I
SECURITY COMPARISON WITH OTHER SCHEMES.
Schemes | Level-I attack | Level-II attack | Level-III attack
Yuan and Yu’s scheme [4] | Yes | Yes | No
Wang et al.’s scheme [6] | Yes | Yes | No
PTBI-I scheme | Yes | Yes | No
PTBI-II scheme | Yes | Yes | Yes
TABLE II
A SUMMARY OF COMPLEXITY COSTS: $m$ DENOTES THE NUMBER OF BIOMETRIC RECORDS; $n$ DENOTES THE DIMENSION OF THE FINGERCODE, $n \ll m$.
Schemes | Preparation Phase | Query Encryption | Identification Phase | Retrieval
Yuan and Yu’s scheme [4] | $O(mn^3)$ | $O(n^3)$ | $O(mn^2 + m \log m)$ | $O(n)$
Wang et al.’s scheme [6] | $O(mn^3)$ | $O(n^3)$ | $O(mn^3 + m \log m)$ | $O(n)$
PTBI-I scheme | $O(mn^3)$ | $O(n^2)$ | $O(mn^2 + m \log m)$ | $O(n)$
PTBI-II scheme | $O(mn^3)$ | $O(n^2)$ | $O(mn^2 + m \log m)$ | $O(n)$

Identification phase:
Fig. 5 and Fig. 6 show the time cost and the bandwidth consumption in the identification phase. As shown in Fig. 5, since PTBI-I and PTBI-II have the same complexity and perform the same computations, their time costs are almost the same. Because Yuan and Yu’s scheme performs more vector multiplications, it takes slightly more time than ours. Compared with Wang et al.’s scheme, our schemes reduce the time cost by as much as 73.4%, because the matrix multiplications are replaced by vector-matrix multiplications when computing the similarity scores. As for the bandwidth consumption of a query, Fig. 6 shows that the cost of our PTBI schemes, about 1.25 KB, does not grow with the number of FingerCodes. The bandwidth consumption in [4] and [6] is also constant, but it is about 400 KB. The reason is that, during identification, our schemes only need to transmit two vectors, whereas the other two schemes need to upload a matrix.
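The two bandwidth figures are consistent with a back-of-the-envelope estimate. The calculation below is a rough sketch, assuming one byte per transmitted element ($l = 8$ bits) and ignoring the two or three extra dimensions added by the extension as well as any growth in element size after encryption; neither assumption is stated in the paper.

```python
n = 640                   # FingerCode dimension used in the experiments
bytes_per_element = 1     # assumption: l = 8 bits per transmitted element

# PTBI: the query consists of two vectors (C_F and C_H).
two_vectors_kb = 2 * n * bytes_per_element / 1024

# Schemes that upload an n x n matrix per query.
matrix_kb = n * n * bytes_per_element / 1024
```

Under these assumptions `two_vectors_kb` is 1.25 and `matrix_kb` is 400.0, a ratio of $n / 2 = 320$ between the matrix-based and vector-based queries.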
Fig. 5. Time costs for different numbers of FingerCodes in the identification phase.
Fig. 6. Bandwidth costs for different numbers of FingerCodes in the identification phase.
V. RELATED WORK
Recently, privacy protection and efficiency in biometric identification have been well studied [3], [4], [6], [9], [10], [11], [12], [14], [15], [16]; most of these works try to find a tradeoff between efficiency and privacy protection. Wang and Hatzinakos [9] proposed a privacy-preserving face recognition scheme in which the expected identity is found by measuring the similarity between stored index number vectors. Wong and Kim [10] presented a privacy-preserving biometric identification scheme. However, their scheme is computationally infeasible if a malicious client impersonates an honest user. To enhance privacy protection, Barni et al. [11] proposed a new privacy-preserving biometric identification protocol. By using homomorphic encryption, their scheme can guarantee biometric data privacy. Nevertheless, computing the distances between the query and all matched fingerprints introduces a heavy computational burden for a large biometric database. Osadchy et al. [16] introduced a privacy-preserving face identification scheme based on oblivious transfer. It also achieves a higher level of privacy protection, but still suffers from efficiency problems. To better balance efficiency and privacy protection, Huang et al. [3] and Blanton and Gasti [12] proposed biometric identification schemes that combine homomorphic encryption and garbled circuits. Specifically, they use homomorphic encryption to compute the Euclidean distance and garbled circuits to find the minimum distance. However, as client-led systems, their schemes need to transmit the entire encrypted database from the database owner to the client side for each query. Like the earlier solutions [9], [10], [11], these schemes are still two-party protocols, which rely heavily on the hardware performance of both the owner side and the client side. To overcome local hardware limitations, outsourcing the identification operations to a third party (e.g., a cloud server) is considered promising, and many solutions [4], [6], [13], [14], [15] have been proposed. Wong et al. [13] proposed a kNN-based identification scheme that provides a new way to search an encrypted database securely. Chun et al. [14] proposed a new outsourcing scheme that achieves database security and privacy-preserving outsourcing separately. However, all these schemes assume that there is no collusion between the third party and the client side; when this assumption fails, private data may be disclosed. To achieve a higher security level, Elmehdwi et al. [15] proposed a secure kNN query scheme, but it suffers from problems such as leakage of secret keys and low efficiency.

In 2013, Yuan and Yu [4] developed an efficient privacy-preserving biometric identification scheme for cloud computing. They use matrices to design an encryption scheme in the outsourcing model, and their evaluation indicates that its computational cost is several orders of magnitude lower than that of previous works. They claimed that their scheme can resist the known-plaintext attack (KPA) and the chosen-plaintext attack (CPA). Unfortunately, Zhu et al. [5] and Wang et al. [6] pointed out that the scheme can be completely broken if the client side colludes with the cloud server. Moreover, Wang et al. [6] presented a new practical cloud-based scheme for privacy-preserving outsourcing of biometric identification, which introduces more random diagonal matrices to resist KPA and CPA. However, Wang et al.’s scheme is based on a weaker attack model than [4]. Specifically, they assume that the attacker cannot simultaneously collude with the cloud server, observe some plaintexts of the database, and construct queries. They claim that this attack is so strong that no effective scheme can defend against it. In this paper, we remove this assumption by introducing perturbation terms into each biometric record. Compared with previous works, our scheme achieves a higher level of privacy protection while maintaining efficiency.

VI. CONCLUSIONS
In this paper, we propose a scheme that achieves a higher level of privacy protection and identification efficiency than state-of-the-art biometric identification outsourcing schemes. Among the various encryption methods for biometric traits, we use matrix operations and perturbation terms to protect data privacy, and we design a new encryption scheme that efficiently finds the best match on the cloud server. The security analysis and the experiments indicate that our scheme balances privacy protection and efficiency well. In the future, we will work on designing more efficient privacy-preserving biometric identification schemes.

APPENDIX A
PROOF OF THEOREM 1
In a Level-II attack, the attacker can observe the encrypted data $\{C_i, C_F, C_H\}$ on the cloud server. We first consider $C_i$. The database owner generates $C_i$ from a random matrix $D_i$ and the secret keys $M_1$, $M_2$. Since $D_i$ is randomly generated, the attacker cannot recover the biometric data from what is known. $C_F$ and $C_H$ are both encrypted with the secret keys; without knowing the query $B_c$, the attacker cannot learn additional sensitive information from the encrypted database.

In a Level-II attack, the attacker can also obtain some plaintexts held by the database owner, but does not know the mapping between the plaintexts and the ciphertexts. Without knowing this mapping, the adversary has no way to recover the biometric data.

APPENDIX B
PROOF OF THEOREM 2
In a Level-III attack, besides the abilities mentioned in Level-II, the attacker can i) know the mapping between the plaintexts and the ciphertexts in the biometric database, and ii) act as a valid user and construct a query $b_c$.

As discussed in the proof of Theorem 1, the attacker cannot recover the biometric data from the encrypted database. We first consider the case where the mapping is known to the adversary, which means the attacker knows a plaintext $b_i$ and the corresponding ciphertext $C_i$. Based on what is obtained from the cloud server, the attacker has:

$$
C_F \times C_i = B_c \times M_1 \times M_1^{-1} \times D_i \times M_2 = B_c \times D_i \times M_2 \tag{9}
$$

Because $D_i$ is randomly generated, the attacker cannot recover the query $B_c$ and $M_2$.

The attacker can also compute the product of $C_i$ and $C_H$, denoted as $C_{Hi}$:

$$
C_{Hi} = C_i \times C_H = M_1^{-1} \times B_i^T \tag{10}
$$

In this equation, $M_1^{-1}$ is a matrix with $(n+3) \times (n+3)$ unknown elements. To recover $M_1^{-1}$, the attacker can select $b_i$ ($i \in [1, t]$) and carry out the computation in equation 10. For the $j$-th row vector $m_j^{-1} = [m_{j1}^{-1}, m_{j2}^{-1}, \cdots, m_{j(n+3)}^{-1}]$ of $M_1^{-1}$, $1 \le j \le n+3$, the attacker has:

$$
\begin{cases}
C_{(H1)j} = m_{j1}^{-1} b_{11} + m_{j2}^{-1} b_{12} + \cdots + m_{j(n+3)}^{-1} \varepsilon_1 \\
C_{(H2)j} = m_{j1}^{-1} b_{21} + m_{j2}^{-1} b_{22} + \cdots + m_{j(n+3)}^{-1} \varepsilon_2 \\
\quad \cdots \\
C_{(Ht)j} = m_{j1}^{-1} b_{t1} + m_{j2}^{-1} b_{t2} + \cdots + m_{j(n+3)}^{-1} \varepsilon_t
\end{cases} \tag{11}
$$

where $C_{(Hi)j}$ denotes the $j$-th element of $C_{Hi}$, and there are $t$ equations with strictly $t + n + 3$ unknowns. Thus, the attacker cannot compute $m_j^{-1}$, which means $M_1^{-1}$ cannot be recovered.

We further consider the case where the attacker acts as a valid user and constructs a query $b_c$. Note that $b_c$ is an $n$-dimensional vector ($B_c$ is its extended form with $(n+3)$ elements, where the last three are set to 1, the random variable $r_c$, and 1). By linear algebra, at most $n$ linearly independent vectors $\{b_{c1}, b_{c2}, \ldots, b_{cn}\}$ can be generated, and $b_c$ can be represented as

$$
b_c = x_1 b_{c1} + x_2 b_{c2} + \cdots + x_n b_{cn} \tag{12}
$$

where $\{x_1, x_2, \cdots, x_n\}$ is a set of coefficients.
After $b_c$ is extended to $B_c$, the attacker has

$$
B_c = x_1 B_{c1} + x_2 B_{c2} + \cdots + x_n B_{cn} \tag{13}
$$

Thus, $C_F$ can be represented as

$$
C_F = (x_1 B_{c1} + x_2 B_{c2} + \cdots + x_n B_{cn}) M_1 = x_1 B_{c1} M_1 + x_2 B_{c2} M_1 + \cdots + x_n B_{cn} M_1 \tag{14}
$$

From this equation, we can see that at most $n$ linearly independent pairs $(B_{cj}, C_{Fj})$ can be built by an ideal attacker, where $1 \le j \le n$. Without loss of generality, we assume the attacker chooses the basis $B_{c1} = [1, 0, 0, \ldots, 0, 1, r_{c1}, 1]$, $B_{c2} = [0, 1, 0, \ldots, 0, 1, r_{c2}, 1]$, $\ldots$, $B_{cn} = [0, 0, 0, \ldots, 1, 1, r_{cn}, 1]$ in the $n$-dimensional vector space. Then the attacker has:

$$
\begin{aligned}
C_{F1} &= B_{c1} M_1 = [1, 0, 0, \cdots, 0, 1, r_{c1}, 1]
\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1(n+1)} & p_{1(n+2)} & p_{1(n+3)} \\
p_{21} & p_{22} & \cdots & p_{2(n+1)} & p_{2(n+2)} & p_{2(n+3)} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
p_{(n+1)1} & p_{(n+1)2} & \cdots & p_{(n+1)(n+1)} & p_{(n+1)(n+2)} & p_{(n+1)(n+3)} \\
p_{(n+2)1} & p_{(n+2)2} & \cdots & p_{(n+2)(n+1)} & p_{(n+2)(n+2)} & p_{(n+2)(n+3)} \\
p_{(n+3)1} & p_{(n+3)2} & \cdots & p_{(n+3)(n+1)} & p_{(n+3)(n+2)} & p_{(n+3)(n+3)}
\end{bmatrix} \\
&= [\, p_{11} + p_{(n+1)1} + r_{c1} p_{(n+2)1} + p_{(n+3)1},\ p_{12} + p_{(n+1)2} + r_{c1} p_{(n+2)2} + p_{(n+3)2},\ \cdots,\ p_{1(n+3)} + p_{(n+1)(n+3)} + r_{c1} p_{(n+2)(n+3)} + p_{(n+3)(n+3)} \,] \\
&\ \ \vdots \\
C_{Fn} &= B_{cn} M_1 \\
&= [\, p_{n1} + p_{(n+1)1} + r_{cn} p_{(n+2)1} + p_{(n+3)1},\ p_{n2} + p_{(n+1)2} + r_{cn} p_{(n+2)2} + p_{(n+3)2},\ \cdots,\ p_{n(n+3)} + p_{(n+1)(n+3)} + r_{cn} p_{(n+2)(n+3)} + p_{(n+3)(n+3)} \,]
\end{aligned} \tag{15}
$$

The attacker will try to recover the elements $p_{ij}$ of $M_1$. For example, the attacker chooses $C_{F1}$ and has

$$
\begin{cases}
C_{(F1)1} = p_{11} + p_{(n+1)1} + r_{c1} p_{(n+2)1} + p_{(n+3)1} \\
C_{(F1)2} = p_{12} + p_{(n+1)2} + r_{c1} p_{(n+2)2} + p_{(n+3)2} \\
\quad \cdots \\
C_{(F1)(n+3)} = p_{1(n+3)} + p_{(n+1)(n+3)} + r_{c1} p_{(n+2)(n+3)} + p_{(n+3)(n+3)}
\end{cases} \tag{16}
$$

where there are $n + 3$ equations with $4(n+3) + 1$ unknowns. Thus, the ideal attacker cannot recover $M_1$.

As for the matching process on the cloud server, according to equation 7, the similarity score in the PTBI-II scheme is computed as follows:

$$
P_i = C_F \times C_i \times C_H = \sum_{j=1}^{n} b_{ij} b_{cj} - 0.5 \sum_{j=1}^{n} b_{ij}^2 + r_c + \varepsilon_i \tag{17}
$$

The attacker can use the same attack method to bypass the computation of the secret keys and try to derive the unknown query FingerCode of other honest users directly. By selecting $t$ biometric data $\{b_1, b_2, \cdots, b_t\}$ and the corresponding ciphertexts $\{C_1, C_2, \cdots, C_t\}$, the attacker has:

$$
\begin{cases}
P_1 = \sum_{j=1}^{n} b_{1j} b_{cj} - 0.5 \sum_{j=1}^{n} b_{1j}^2 + r_c + \varepsilon_1 \\
P_2 = \sum_{j=1}^{n} b_{2j} b_{cj} - 0.5 \sum_{j=1}^{n} b_{2j}^2 + r_c + \varepsilon_2 \\
\quad \cdots \\
P_t = \sum_{j=1}^{n} b_{tj} b_{cj} - 0.5 \sum_{j=1}^{n} b_{tj}^2 + r_c + \varepsilon_t
\end{cases} \tag{18}
$$

Following the same analysis as above, there are $t$ equations with $t + 1$ unknowns. Thus, the ideal attacker cannot recover the query either.

We then consider the case where the attacker acts as a valid user and tries to recover the unknown biometric data $b_k$ by constructing queries $b'_l$ ($l \in [1, t]$). Following the same analysis, after constructing $t$ queries, the attacker has:

$$
\begin{cases}
P_1 = \sum_{j=1}^{n} b_{kj} b'_{1j} - 0.5 \sum_{j=1}^{n} b_{kj}^2 + r_{c1} + \varepsilon_k \\
P_2 = \sum_{j=1}^{n} b_{kj} b'_{2j} - 0.5 \sum_{j=1}^{n} b_{kj}^2 + r_{c2} + \varepsilon_k \\
\quad \cdots \\
P_t = \sum_{j=1}^{n} b_{kj} b'_{tj} - 0.5 \sum_{j=1}^{n} b_{kj}^2 + r_{ct} + \varepsilon_k
\end{cases} \tag{19}
$$

where there are $t$ equations with $t + 1$ unknowns, so the attacker cannot recover the unknown biometric data either.

Based on the above analysis, the attacker cannot access the private biometric data or recover the secret keys by accumulating knowledge. Therefore, the PTBI-II scheme is secure under the Level-III attack.

ACKNOWLEDGMENT

This research is supported by the National Natural Science Foundation of China (Grant Nos. 61402037, 61272512).

REFERENCES

[1] A. Jain, L. Hong, and S. Pankanti, “Biometric identification,”
Commun. of the ACM, vol. 43, no. 2, pp. 90-98, 2000.
[2] P. Mell and T. Grance, “The NIST working definition of cloud computing,” Communications of the ACM, vol. 53, no. 6, pp. 50, 2010.
[3] D. Evans, Y. Huang, and J. Katz, “Efficient privacy-preserving biometric identification,” NDSS, 2011.
[4] J.W. Yuan and S.C. Yu, “Efficient privacy-preserving biometric identification in cloud computing,” INFOCOM, pp. 2652-2660, 2013.
[5] Y.W. Zhu, T. Takagi, and R. Hu, “Security Analysis of Collusion-Resistant Nearest Neighbor Query Scheme on Encrypted Cloud Data,” IEICE Transactions on Information and Systems, vol. 97, no. 2, pp. 326-330, 2014.
[6] Q. Wang, S. Hu, K. Ren, et al., “CloudBI: Practical privacy-preserving outsourcing of biometric identification in the cloud,” Computer Security–ESORICS, pp. 186-205, 2015.
[7] A.K. Jain, S. Prabhakar, L. Hong, et al., “Filterbank-based fingerprint matching,” IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 846-859, 2000.
[8] N. Cao, C. Wang, M. Li, et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222-233, 2014.
[9] Y. Wang and D. Hatzinakos, “Face recognition with enhanced privacy protection,” ICASSP, pp. 885-888, 2009.
[10] K.S. Wong and M.H. Kim, “A Privacy-Preserving Biometric Matching Protocol for Iris Codes Verification,” MUSIC, pp. 120-125, 2012.
[11] M. Barni, T. Bianchi, D. Catalano, et al., “Privacy-preserving fingercode authentication,” Proceedings of the 12th ACM Workshop on Multimedia and Security, pp. 231-240, 2010.
[12] M. Blanton and P. Gasti, “Secure and efficient protocols for iris and fingerprint identification,” Computer Security–ESORICS, pp. 190-209, 2011.
[13] W.K. Wong, D.W.L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139-152, 2009.
[14] H. Chun, Y. Elmehdwi, F. Li, P. Bhattacharya, and W. Jiang, “Outsourceable two-party privacy-preserving biometric authentication,” Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, pp. 401-412, 2014.
[15] Y. Elmehdwi, B.K. Samanthula, and W. Jiang, “Secure k-nearest neighbor query over encrypted data in outsourced environments,” Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pp. 664-675, 2014.
[16] M. Osadchy, B. Pinkas, A. Jarrous, and B. Moskovich, “SCiFI - a system for secure face identification,” Security and Privacy (SP), 2010 IEEE Symposium on, pp. 239-254, 2010.