Deep Hashing for Secure Multimodal Biometrics
Veeru Talreja, Student Member, IEEE, Matthew Valenti, Fellow, IEEE, and Nasser Nasrabadi, Fellow, IEEE
Abstract—When compared to unimodal systems, multimodal biometric systems have several advantages, including lower error rate, higher accuracy, and larger population coverage. However, multimodal systems have an increased demand for integrity and privacy because they must store multiple biometric traits associated with each user. In this paper, we present a deep learning framework for feature-level fusion that generates a secure multimodal template from each user's face and iris biometrics. We integrate a deep hashing (binarization) technique into the fusion architecture to generate a robust binary multimodal shared latent representation. Further, we employ a hybrid secure architecture by combining cancelable biometrics with secure sketch techniques and integrate it with the deep hashing framework, which makes it computationally prohibitive to forge a combination of multiple biometrics that passes the authentication. The efficacy of the proposed approach is shown using a multimodal database of face and iris, and it is observed that the matching performance is improved due to the fusion of multiple biometrics. Furthermore, the proposed approach also provides cancelability and unlinkability of the templates along with improved privacy of the biometric data. Additionally, we also test the proposed hashing function for an image retrieval application using a benchmark dataset. The main goal of this paper is to develop a method for integrating multimodal fusion, deep hashing, and biometric security, with an emphasis on structural data from modalities like face and iris. The proposed approach is in no way a general biometric security framework that can be applied to all biometric modalities, as further research is needed to extend the proposed framework to other unconstrained biometric modalities.
Index Terms—Channel coding, hashing, multibiometrics, secure sketch, template security.
I. INTRODUCTION

BIOMETRICS are difficult to forge, and unlike in traditional password-based access control systems, they do not have to be remembered. As much as these characteristics provide an advantage, they also create challenges related to protecting biometrics in the event of identity theft or a database compromise, because each biometric characteristic is distinct and cannot be replaced by a newly generated arbitrary biometric. The proliferation of biometric usage raises serious concerns about the security and privacy of the individual. These concerns cannot be alleviated by using conventional cryptographic hashing, as is done for alphanumeric passwords, because cryptographic hashes are extremely sensitive to noise and are therefore unsuitable for protecting biometrics, which are inherently variable and noisy. The leakage of biometric information to an adversary constitutes a serious threat to security and privacy: if an adversary gains access to a biometric database, he can potentially obtain the stored user information. The attacker can use this information to gain unauthorized access to the system by reverse engineering the system and creating a physical spoof. Furthermore, an attacker can abuse the biometric information for unintended purposes and violate user privacy [1].

Multimodal biometric systems use a combination of different biometric traits, such as face and iris, or face and fingerprint. Multimodal systems are generally more resistant to spoofing attacks [2]. Moreover, multimodal systems can be made more universal than unimodal systems, since the use of multiple modalities can compensate for missing modalities in a small portion of the population. Multimodal systems also have the advantage of lower error rates and higher accuracy when compared to unimodal systems [1]. Consequently, multimodal systems have been deployed in many large-scale biometric applications, including the FBI's Next Generation Identification (NGI), the Department of Homeland Security's US-VISIT, and the Government of India's UID. However, multimodal systems have an increased demand for integrity and privacy because the system stores multiple biometric traits of each user. Hence, multimodal template protection is the main focus of this paper.

The fundamental challenge in designing a biometric template protection scheme is to manage the intra-user variability that occurs due to signal variations in the multiple acquisitions of the same biometric trait. With respect to biometric template protection, four main architectures are widely used: fuzzy commitment, secure sketch, secure multiparty computation, and cancelable biometrics [3]. Fuzzy commitment and secure sketch are biometric cryptosystem methods; they are usually implemented with error correcting codes and provide information-theoretic guarantees of security and privacy (e.g., [4]–[8]). Secure multiparty computation architectures are distance based and use cryptographic tools. Cancelable biometrics use revocable and non-invertible user-specific transformations for distorting the enrollment biometric (e.g., [9]–[12]), with the matching typically performed in the transformed domain. For a template to be secure, it must satisfy the important properties of noninvertibility and revocability.
Noninvertibility implies that, given a template, it must be computationally difficult to recover the original biometric data from the template. Revocability implies that if a template gets compromised, it should be possible to revoke the compromised template and generate a new template using a different transformation. Moreover, it should be difficult to identify that the new template and the old compromised template are generated from the same underlying biometric data.

One important issue for multimodal systems is that the multiple biometric traits generally do not have the same feature-level representation. Furthermore, it is difficult to characterize multiple biometric traits using compatible feature-level representations, as required by a template protection scheme [1]. To counter this issue, many fusion techniques for combining multiple biometrics have been proposed [1], [13], [14]. One possible approach is to apply a separate template protection scheme for each trait followed by decision-level fusion. However, such an approach may not be highly secure, since it is limited by the security of the individual traits. This issue motivated our proposed approach of using multimodal biometric security to perform a joint feature-level fusion and classification.

Another important issue is that biometric cryptosystem schemes are usually implemented using error control codes. In order to apply error control codes, the biometric feature vectors must be quantized, for instance by binarizing. One method of binarizing the feature vectors is thresholding, for example, against the population mean or against zero. However, thresholding causes a quantization loss and does not preserve the semantic properties of the data structure in Hamming space. In order to avoid thresholding and minimize the quantization loss, we have used the idea of hashing [15], [16], which is used in the image and data retrieval literature to achieve fast search by binarizing real-valued image features. The basic idea of hashing is to map each visual object into a compact binary feature vector that approximately preserves the data structure in the original space. Owing to its storage and retrieval efficiency, hashing has been used for large-scale visual search and image retrieval.

Recent progress in image classification, object detection, face recognition, speech recognition, and many other computer vision tasks demonstrates the impressive learning ability of convolutional neural networks (CNNs). The robustness of features generated by CNNs has led to a surge in the application of deep learning for generating binary codes from raw image data. Deep hashing [17]–[20] is the technique of integrating hashing and deep learning to generate compact binary vectors from raw image data. There is a rich literature related to the application of optimized deep learning for converting raw image data to binary hash codes.

Inspired by the recent success of deep hashing methods, the objective of this work is to examine the feasibility of integrating deep hashing with a secure architecture to generate a secure multimodal template for face and iris biometrics. Contributions include:

• We use deep hashing to generate a binary latent shared representation from a user's face and iris biometrics.
• We combine cancelable biometrics and secure sketch schemes to create a hybrid secure architecture.
• We integrate the hybrid secure architecture with the deep hashing framework to generate a multimodal secure sketch, which is cryptographically hashed to generate the secure multimodal template.
• We analyze the trade-off between genuine accept rate (GAR) and security for the proposed secure multimodal scheme using an actual multimodal database.
• Additionally, we perform an information-theoretic privacy analysis and an unlinkability analysis for the proposed secure system.

The proposed approach represents a biometric security framework integrated with multimodal fusion and deep hashing, and is particularly well suited for structural data from modalities like face and iris. Our approach is not a general biometric security framework that can be applied to all biometric modalities, but rather a proposal that needs further study and validation.

The rest of the paper is organized as follows. Section II provides a background on deep hashing techniques and the various multibiometric template security schemes proposed in the literature. The proposed framework and the associated algorithms are introduced in Section III. Implementation details are presented in Section IV. In Section V, we present a performance evaluation of the cancelable biometric module, which is a part of the overall proposed system. The performance evaluation of the overall proposed system is discussed in Section VI. The conclusions are summarized in Section VII.

II. RELATED WORK
A. Deep Learning
Deep learning has emerged as a new area of machine learning and is being extensively applied to solve problems that have resisted the best attempts of the machine learning and artificial intelligence community for many years. It has turned out to be very good at discovering intricate structures in high-dimensional data and is therefore applicable to many domains of science, business, and government.

Deep learning has been extensively implemented and applied to image recognition tasks. Krizhevsky et al. [21] provided a breakthrough in the field of object recognition and ImageNet classification by applying a CNN for object recognition, reducing the error rate by almost half. The neural network implemented in [21] is now known as AlexNet and triggered the rapid endorsement of deep learning by the computer vision community. Simonyan et al. [22] increased the depth of the convolutional network but reduced the size of the filters used for convolution. The main contribution of [22] was a thorough evaluation of networks of increasing depth using an architecture with very small 3 × 3 convolution filters, which represented a compelling advancement over the prior-art configurations. Szegedy et al. [23] advanced the architecture of CNNs by making them deeper, similar to [21], and wider by introducing a CNN termed Inception; one particular incarnation of this architecture is known as GoogLeNet, which is 22 layers deep. He et al. [24] developed a very deep 152-layer convolutional neural network architecture named ResNet. The novelty of ResNet lies not only in creating a very deep network but also in the use of a residual architecture that reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

In addition to improving performance in image and speech recognition [21], [22], [24], [25], deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering, and language translation.
B. Deep Hashing
Many hashing methods [16], [26]–[30] have been proposed to enable efficient approximate nearest neighbor search with low space and time complexity. These traditional hashing methods can be categorized into data-independent and data-dependent methods; a comprehensive survey of hashing techniques is presented in [31]. Initial research on hashing was mainly focused on data-independent methods, such as locality sensitive hashing (LSH). LSH methods [15] generate hashing bits by using random projections. However, LSH methods demand a significant amount of memory because they require long codes to achieve satisfactory performance.

To learn compact binary codes, data-dependent hashing methods have been proposed in the literature. Data-dependent methods learn similarity-preserving hashing functions from a training set and can be categorized as unsupervised [16], [32], [33] or supervised [28], [29]. These methods have achieved some success by using handcrafted features for learning hash functions. However, handcrafted features do not preserve the semantic similarities of image pairs or the non-linear variation in real-world data [20]. This has led to a surge of deep hashing methods [17]–[20], [34], [35], in which deep neural networks encode non-linear hash functions, enabling effective end-to-end learning of feature representation and hash coding. Xia et al. [17] adopted a two-stage learning strategy wherein the first stage computes hash codes from the pairwise similarity matrix and the second stage trains a deep neural network to fit the hash codes generated in the first stage. The model proposed by Lai et al. [18] simultaneously captures the intermediate image features and trains the hashing function in a joint learning process. The hash function in [18] uses a divide-and-encode module, which splits the image features derived from the deep network into multiple blocks, with each block encoded into one hash bit. Liu et al. [20] present a deep hashing model that learns the hash codes by simultaneously optimizing a contrastive loss function for input image pairs and imposing a regularization on the real-valued outputs to approximate the binary values. Zhu et al. [36] proposed a deep hashing method that learns hash codes by optimizing a pairwise cross-entropy quantization loss to preserve pairwise similarity and minimize the quantization error simultaneously.
C. Secure Biometrics
The leakage of biometric template information to an adversary constitutes a serious threat to the security and privacy of the user, because an adversary who gains access to the biometric database can potentially obtain the stored biometric information of a user. To alleviate the security and privacy concerns in biometric usage, secure biometric architectures have been developed to allow authentication without requiring that the reference biometric template be stored in its raw format at the access control device. Secure biometric architectures include biometric cryptosystems (e.g., fuzzy commitment and secure sketch) [4], [5], [7], [8] and transformation-based methods (e.g., cancelable biometrics) [3].

Fuzzy commitment, a classical method of biometric protection, was first proposed in 1999 [5]. Forward error correction (FEC) based fuzzy commitment can also be viewed as a method of extracting a secret code by means of polynomial interpolation [6]. An implementation example of such a fuzzy commitment scheme appears in [8], wherein a BCH code is employed for polynomial interpolation; experiments show that when the degree of the interpolated polynomial is increased, the matching becomes more stringent, reducing the false accept rate (FAR) but increasing the false reject rate (FRR).

Cancelable biometrics was first proposed by Ratha et al. [9], after which various methods of generating cancelable biometric templates have been developed. Some popular methods use non-invertible transforms [9], bio-hashing [10], salting [11], and random projections [12]. Literature surveys on cancelable biometrics can be found in [3] and [37].
D. Secure Multimodal Biometrics
Secure biometric frameworks have been extended to include multiple biometric traits of a user [1], [13], [14], [38]. In [13], face and fingerprint templates are concatenated to form a single binary string, and this concatenated string is used as input to a secure sketch scheme. Kelkboom et al. [39] provided results for decision-level, feature-level, and score-level fusion of templates by using the number of errors corrected in a biometric cryptosystem as a measure of the matching score. Nagar et al. [1] developed a multimodal cryptosystem based on feature-level fusion using two different security architectures, fuzzy commitment and fuzzy vault. Fu et al. [40] theoretically analyzed four different versions of the multibiometric cryptosystem, no-split, MN-split, package, and biometric model, using template security and recognition accuracy as performance metrics. In the first three versions, the biometric templates are secured individually with decision-level fusion, while the last version uses feature-level fusion.

Research has also been directed towards integrating cancelable biometric techniques into multimodal systems. Canuto et al. [38] combined voice and iris using cancelable transformations and decision-level fusion. Paul and Gavrilova [41] used random projections and transformation-based feature extraction and selection to generate cancelable biometric templates for face and ear. There are also studies on multi-feature biometric fusion, which involves combining different features of the same biometric trait [42].

However, none of the above papers present a secure architecture that combines multiple secure schemes to protect multiple biometrics of a user. In this paper, we integrate a deep hashing framework with a hybrid secure architecture that combines cancelable biometric templates and secure sketch, which makes it computationally prohibitive to forge a combination of multiple biometrics that passes the authentication.

III. PROPOSED SECURE MULTIBIOMETRIC SYSTEM
A. System Overview
In this section, we present a system overview, including descriptions of the enrollment and authentication procedures. We propose a feature-level fusion and hashing framework for the secure multibiometric system. The general framework for the proposed secure multibiometric system is shown in Fig. 1.

During enrollment, the user provides their biometrics (e.g., face and iris) as input to the deep feature extraction and binarization (DFB) block. The output of the DFB block is a J-dimensional binarized joint feature vector e. A random selection of G feature components (bits) from the binarized joint feature vector e is performed. The indices of these randomly selected G components form the enrollment key k_e, which is given to the user. The cancelable multimodal template r_e is formed by selecting the values from the vector e at the locations (indices) specified by the user-specific key k_e. This random selection of G components from the binarized joint feature vector e helps in achieving revocability, because if a key is compromised, a new key can be issued with a different set of random indices. In the next step, r_e is passed through a forward error correction (FEC) decoder to generate the multimodal sketch s_e. The cryptographic hash of this sketch, f_hash(s_e), is stored as the secure template in the database.

During authentication, the probe user presents the biometrics and a key k_p, where k_p is the same as the enrollment key k_e in the case of a genuine probe, or a synthesized key in the case of an impostor probe. The probe biometrics are passed through the DFB block to obtain a binary vector p, which is the joint feature vector corresponding to the probe. Using the key k_p provided by the user, the multimodal probe template r_p is generated by selecting the values from p at the locations given by k_p. In the next step, r_p is passed through a FEC decoder with the same code used during enrollment to generate the probe multimodal sketch s_p. If the cryptographic hash of the enrolled sketch f_hash(s_e) matches the cryptographic hash of the probe sketch f_hash(s_p), access is granted; otherwise, access is denied.

The proposed secure multibiometric system consists of two basic modules: the Cancelable Template Module (CTM) and the Secure Sketch Template Module (SSTM), which are described more fully in the following subsections.
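To make the enrollment and authentication flow concrete, the sketch below mirrors the pipeline described above in plain Python. The function names (enroll, authenticate, fec_decode_to_sketch) and the use of SHA-256 as the cryptographic hash are illustrative assumptions, bits are represented as 0/1, and the repetition-code decoder is only a toy stand-in for the Reed-Solomon decoder used in the paper.

```python
import hashlib
import numpy as np

def fec_decode_to_sketch(r: np.ndarray, rep: int = 3) -> np.ndarray:
    """Toy stand-in for the FEC decoder: treat r as a rate-1/rep repetition
    code and decode by majority vote. The paper uses a Reed-Solomon decoder;
    this is only to make the sketch executable."""
    blocks = r[: (len(r) // rep) * rep].reshape(-1, rep)
    return (blocks.sum(axis=1) > rep // 2).astype(np.uint8)

def enroll(e: np.ndarray, G: int, rng: np.random.Generator):
    """Enrollment: pick G random bit positions from the J-dimensional binary
    joint feature vector e, decode to a sketch, and store only the hash."""
    J = e.shape[0]
    k_e = rng.choice(J, size=G, replace=False)       # user-specific key (indices)
    r_e = e[k_e]                                     # cancelable template
    s_e = fec_decode_to_sketch(r_e)                  # multimodal secure sketch
    stored_hash = hashlib.sha256(s_e.tobytes()).hexdigest()
    return k_e, stored_hash

def authenticate(p: np.ndarray, k_p: np.ndarray, stored_hash: str) -> bool:
    """Authentication: repeat the selection and decoding on the probe vector p
    and compare cryptographic hashes."""
    r_p = p[k_p]
    s_p = fec_decode_to_sketch(r_p)
    return hashlib.sha256(s_p.tobytes()).hexdigest() == stored_hash
```

In this sketch, a genuine probe whose selected bits differ from the enrollment template by fewer errors than the code can correct decodes to the same sketch, so the stored and probe hashes match and access is granted.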
B. Cancelable Template Module
The cancelable template module (CTM) consists of two blocks: the DFB block and the random-bit selection block. The primary function of the CTM is non-linear feature extraction, fusion, and binarization using the proposed DFB architecture shown in Figs. 2 and 3. The DFB consists of two layers: the domain-specific layer (DSL) and the joint representation layer (JRL).

Domain-Specific Layer: The DSL consists of a CNN for encoding the face ("Face-CNN") and a CNN for encoding the iris ("Iris-CNN"). For each CNN, we use VGG-19 [22] pre-trained on ImageNet [43] as a starting point and then fine-tune it with an additional fully connected layer fc3, as described in Sec. IV-B and IV-C. There are multiple reasons for using VGG-19 pre-trained on the ImageNet dataset for encoding the face and iris. In the proposed method, VGG-19 is only used as a feature extractor for the face and iris modalities. The previous literature [44]–[49] shows that the features provided by a VGG-19 pre-trained on ImageNet and fine-tuned on face/iris images are very discriminative and can therefore be used for face/iris recognition. Moreover, starting with a well-known architecture and using the same architecture for both modalities makes the work highly reproducible.
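A minimal PyTorch sketch of one domain-specific branch is shown below, assuming a recent torchvision and its VGG-19 weights. The size of the added fc3 layer (1024 for FCA, 64 for BLA, per Sec. IV-B) is passed as a parameter, the 294-class softmax head is taken from the common-subject count in Sec. IV-D purely for illustration, and the module structure is an illustrative reading rather than the authors' exact code.

```python
import torch
import torch.nn as nn
from torchvision import models

class DomainSpecificBranch(nn.Module):
    """One branch of the domain-specific layer (Face-CNN or Iris-CNN):
    an ImageNet-pretrained VGG-19 with an extra fully connected layer fc3."""
    def __init__(self, fc3_dim: int = 1024, num_classes: int = 294):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.features = vgg.features                       # conv1 ... pool5
        self.avgpool = vgg.avgpool
        self.fc1_fc2 = nn.Sequential(*list(vgg.classifier.children())[:-1])  # fc1, fc2
        self.fc3 = nn.Linear(4096, fc3_dim)                 # added layer
        self.classifier = nn.Linear(fc3_dim, num_classes)   # softmax head for fine-tuning

    def forward(self, x: torch.Tensor):
        x = self.features(x)                                # expects 3 x 224 x 224 input
        x = torch.flatten(self.avgpool(x), 1)
        feat = self.fc3(self.fc1_fc2(x))                    # modality feature vector
        return feat, self.classifier(feat)
```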
Fig. 1: Block diagram of the proposed system.
Fig. 2: Proposed deep feature extraction and binarization (DFB) model for the fully concatenated architecture (FCA).

Joint Representation Layer: The output feature vectors of the Face-CNN and Iris-CNN are fused and binarized in the JRL, which is split into two sub-layers: the fusion layer and the hashing layer. The main function of the fusion layer is to fuse the individual face and iris representations from the domain-specific layer into a shared multimodal feature embedding. The hashing layer binarizes the shared multimodal feature representation generated by the fusion layer.
Fusion layer: We have implemented two different architectures for the fusion layer: (1) the fully concatenated architecture (FCA), and (2) the bilinear architecture (BLA). These two architectures differ in the way the face and iris feature vectors are fused together to generate the joint feature vector.

In the FCA, shown in Fig. 2, the outputs of the Face-CNN and Iris-CNN are concatenated vertically using a concatenation layer. The concatenated feature vector is passed through a fully connected layer (referred to hereafter as the joint fully connected layer), which reduces the feature dimensionality and also fuses the iris and face features. In the FCA, the concatenation layer and the joint fully connected layer together constitute the fusion layer.

In the BLA, shown in Fig. 3, the outputs of the Face-CNN and Iris-CNN are combined using the matrix outer product; i.e., the bilinear combination of the column face feature vector f_face and the column iris feature vector f_iris is given by f_face f_iris^T. Similar to the FCA, the bilinear feature vector is also passed through a joint fully connected layer. In the BLA, the outer product layer and the joint fully connected layer together constitute the fusion layer.
Fig. 3: Proposed deep feature extraction and binarization (DFB) model for the bilinear architecture (BLA).

In addition to the two techniques (FCA and BLA) used in this paper, other fusion techniques for combining multiple modalities exist [50]. The rationale behind implementing the FCA is that we wanted a fusion technique that involves simple concatenation, where there is no interaction between the two modalities before the joint fully connected layer (Joint fc). As evident from Fig. 2, the extracted iris and face features do not interact with each other and have their own network parameters before passing through the joint fully connected layer. On the other hand, we also wanted to test a fusion technique that involves high interaction between the feature vectors of the two modalities at every element before they are passed through the joint fully connected layer. That is the reason we use the BLA, which is based on bilinear fusion [51]. Bilinear fusion exploits the higher-level dependencies of the modalities being combined by considering the pairwise multiplicative interactions between the modalities at each feature element (i.e., the matrix outer product of the modality feature vectors). Moreover, bilinear fusion is widely used in many CNN applications such as fine-grained visual recognition and video action recognition [50], [51].

Hashing layer: The output of the fusion layer is a J-dimensional shared multimodal feature vector of real values. We could directly binarize the output of the fusion layer by thresholding at any numerical value or at the population mean. However, this kind of thresholding leads to a quantization loss, which results in sub-optimal binary codes. To account for this quantization loss, we include another latent layer after the fusion layer, known as the hashing layer (shown in orange in Figs. 2 and 3). The main function of the hashing layer is to binarize (hash) the shared multimodal feature representation generated by the fusion layer.

One key challenge of implementing deep learning to hash end-to-end is converting deep representations, which are real-valued and continuous, to exactly binary codes. The sign activation function h = sgn(z) could be used by the hashing layer to generate the binary hash codes. However, the use of the non-smooth sign activation function makes standard back-propagation impracticable, as the gradient of the sign function is zero for all non-zero inputs. The problem of zero gradient at the hashing layer due to the non-smooth sign activation can be diminished by using the idea of continuation methods [52]. We circumvent the zero-gradient problem by starting with a smooth activation function y = tanh(βx) and making it sharper by increasing the bandwidth β as the training proceeds. We utilize a key relationship between the sign activation function and the scaled tanh function:

lim_{β→∞} tanh(βx) = sgn(x),   (1)

where β > 0 is a scaling parameter. The scaled function tanh(βx) becomes sharper and more saturated as β increases during training, and in the limit β → ∞ it converges to the original, difficult-to-optimize sign activation function. For training the network, we start with a tanh(βx) activation for the hashing layer with β = 1 and continue training until the network converges to zero loss. We then increase the value of β while holding the other training parameters equal to the previously converged network parameters, and retrain the network to convergence. This process is repeated several times, increasing the bandwidth of the tanh activation as β → ∞, until the hashing layer can generate binary codes; a sketch of this schedule is given below. In addition to using this continuation method for training the network, we use additional cost functions to obtain efficient binary codes. The overall objective function used for training is discussed in Sec. IV-A.
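The sketch below shows one way to implement this continuation schedule in PyTorch. The hashing layer keeps β as a buffer that is increased between training rounds; the doubling schedule, the number of rounds, and the assumption that the model exposes its hashing layer as `model.hashing` are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn as nn

class HashingLayer(nn.Module):
    """Fully connected hashing layer with a scaled tanh activation tanh(beta * x).
    As beta grows, the activation approaches sgn(x), as in Eq. (1)."""
    def __init__(self, in_dim: int, code_dim: int = 1024):
        super().__init__()
        self.fc = nn.Linear(in_dim, code_dim)
        self.register_buffer("beta", torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.beta * self.fc(x))

def continuation_training(model, train_round, num_rounds: int = 5):
    """Train to convergence, then sharpen tanh by increasing beta, and repeat.
    `train_round` is any routine that trains `model` until its loss converges."""
    for _ in range(num_rounds):
        train_round(model)                 # converge with the current beta
        model.hashing.beta *= 2.0          # sharpen tanh towards sgn (illustrative schedule)
```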
Random-Bit Selection: One of the most prevalent methods for generating a cancelable template involves random projections of the biometric feature vector [12], in which the random projection is a revocable transformation. Similarly, the DFB architecture can be viewed as a projection of the biometric images into a J-dimensional space. Randomness and revocability are added by performing a random selection of G bits from the J-dimensional output vector e of the DFB. After the selection, these random bits are arranged in descending order of reliability. The reliability of each bit is computed as (1 − p_e^g) p_e^i, where p_e^i and p_e^g are the impostor and genuine bit error probabilities, respectively [1]. A different set of random bits is selected for every user; these randomly selected G bits form the cancelable multimodal template r_e, and the indices of the selected bits form the key k_e for that user. This key is revocable, and a new set of random bits can be selected if the key gets compromised. Selecting a new set of bits requires that either the original vector e be retrieved from a secure location or the user be re-enrolled, thereby presenting a new instance of e. This method of using the DFB architecture with a random bit selection is analogous to using a random projection as a revocable transformation to generate a cancelable template [12].

It is important to note that even if multiple users end up having the same key k_e (i.e., the same indices of G random bits), their final templates will still be distinct, because the template depends on the values at those G bits (i.e., r_e) from the enrollment vector e, and not only on the indices of the G bits. A second user having the same key k_e is equivalent to the stolen key scenario, which is analyzed in Sec. V-B.
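A compact NumPy sketch of this random-bit selection with reliability ordering is given below; the genuine and impostor bit error probabilities are assumed to have been estimated beforehand on a training set, and the function name is illustrative.

```python
import numpy as np

def select_cancelable_bits(e: np.ndarray, G: int,
                           p_err_genuine: np.ndarray,
                           p_err_impostor: np.ndarray,
                           rng: np.random.Generator):
    """Randomly pick G of the J bits of e, then order the picked indices by
    descending reliability (1 - p_e^g) * p_e^i as in [1]."""
    J = e.shape[0]
    picked = rng.choice(J, size=G, replace=False)
    reliability = (1.0 - p_err_genuine[picked]) * p_err_impostor[picked]
    key = picked[np.argsort(-reliability)]      # user-specific key k_e (ordered indices)
    template = e[key]                           # cancelable multimodal template r_e
    return key, template
```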
C. Secure Sketch Template Module

As shown in Fig. 1, the cancelable template r_e (the output of the CTM) is an intermediate template and is not stored in the database. The cancelable template is passed through the SSTM to generate the secure multimodal template, which is stored in the database.
As the name suggests, the SSTM is related to the secure sketch biometric template protection scheme. The SSTM contains two important blocks: FEC decoding and cryptographic hashing. The main function of the SSTM is to generate a multimodal secure sketch by using the cancelable template as the input to an FEC decoder. This multimodal secure sketch (the output of the FEC decoder) is cryptographically hashed to generate the secure multimodal template, which is stored in the database.

The FEC decoding implemented in our framework is the equivalent of a secure sketch template protection scheme. In a secure sketch scheme, a sketch or helper data is generated from the user's biometrics, and this sketch is stored in the access-control database. A common method of implementing a secure sketch is to use error control coding: error control coding is applied to the biometrics or the feature vector to generate a sketch, which is stored in the database. Similarly, in our framework, the FEC decoding is considered to be the error control coding required to generate the secure sketch. Our approach differs from other secure sketch approaches using error correcting codes (ECC) in that we do not have to present any other side information, such as a syndrome or a saved message key, to the decoder [53].

The cancelable template r_e from the CTM is considered to be a noisy codeword of an ECC that we can select. This noisy codeword is decoded with an FEC decoder, and the output of the decoder is the multimodal secure sketch s_e that corresponds to the codeword closest to the cancelable template. This multimodal sketch s_e is cryptographically hashed to generate f_hash(s_e), which is stored in the database.

During authentication, the same process is performed. The probe user provides the biometrics and the key, which are used to generate the probe template r_p. The probe template r_p is passed through an FEC decoder for the same error correcting code used during enrollment. The output of the FEC decoder is the probe multimodal sketch s_p, which is cryptographically hashed; access is granted only if this hash matches the enrolled hash. For a genuine probe, the enrollment template r_e and the probe template r_p will usually decode to the same codeword, in which case the hashes match and access is granted.

IV. IMPLEMENTATION
A. Objective Function for Training the Deep Hashing Network
In this section, the objective function used for training the deep hashing network is described.
Semantics-preserving binary codes: In order to construct semantics-preserving binary codes, we model the relationship between the labels and the binary codes. Every input image is associated with a semantic label, which is derived from the hashing layer's binary-valued outputs, and the classification of each image depends on these binary outputs. Consequently, we can ensure that semantically similar images belonging to the same subject are mapped to similar binary codes through the optimization of a loss function defined on the classification error. The classification formulation is incorporated into the deep hashing framework by adding a softmax layer, as shown in Fig. 2 and Fig. 3. Let E_1 denote the objective function required for the classification formulation:

E_1(w) = (1/N) Σ_{n=1}^{N} L_n(f(x_n, w), y_n) + λ ||w||²,   (2)

where the first term L_n(·) is the classification loss for a training instance n and is described below, and N is the number of training images in a mini-batch. f(x_n, w) is the predicted softmax output of the network and is a function of the input training image x_n and the weights of the network w. The second term is the regularization term, where λ governs the relative importance of the regularization.

The choice of the loss function L_n(·) depends on the application itself. We use a classification loss function over the softmax outputs that minimizes the cross-entropy error. Let the predicted softmax output f(x_n, w) be denoted by ŷ_n. The classification loss for the n-th training instance is:

L_n(ŷ_n, y_n) = − Σ_{m=1}^{M} y_{n,m} ln ŷ_{n,m},   (3)

where y_{n,m} and ŷ_{n,m} are the ground truth and the prediction for the m-th unit of the n-th training instance, respectively, and M is the number of output units.

Additional cost constraints for efficient binary codes: The continuation method described in Sec. III-B2 forces the activations of the hashing layer closer to −1 and 1. However, we need to include additional cost constraints to obtain more efficient binary codes.

Let the J-dimensional output of the hashing layer be denoted by o_n^H for the n-th input image, and let the i-th element of this vector be denoted by o_{n,i}^H (i = 1, 2, ..., J). The value of o_{n,i}^H lies in the range [−1, 1] because it has been activated by the tanh activation. To push the codes closer to either −1 or 1, we add a constraint that maximizes the sum of squared errors between the hashing layer activations and 0, given by Σ_{n=1}^{N} ||o_n^H − 0||², where N is the number of training images in a mini-batch and 0 is the J-dimensional vector with all elements equal to 0. This is equivalent to maximizing the squared length of the vector formed by the hashing layer activations, that is, Σ_{n=1}^{N} ||o_n^H − 0||² = Σ_{n=1}^{N} ||o_n^H||². Let E_2(w) denote this constraint, which pushes the activations of the units in the hashing layer closer to −1 or 1:

E_2(w) = −(1/J) Σ_{n=1}^{N} ||o_n^H||².   (4)

In addition to forcing the codes to become binarized, we also require that the codes satisfy a balance property whereby they contain an equal number of −1's and 1's, which maximizes the entropy of the discrete distribution and results in hash codes with better discrimination. To achieve the balance property, we want each bit to fire 50% of the time, which we encourage by minimizing the sum of squared errors between the mean of the hashing layer activations and 0.
This is given by Σ_{n=1}^{N} (mean(o_n^H) − 0)², which is equivalent to Σ_{n=1}^{N} (mean(o_n^H))², where mean(·) computes the average of the elements of the vector. This criterion helps to obtain binary codes with an equal number of −1's and 1's. Let E_3(w) denote this constraint, which forces the output of each node to have a 50% chance of being −1 or 1:

E_3(w) = Σ_{n=1}^{N} (mean(o_n^H))².   (5)

Combining the above two constraints (the binarization and balance-property constraints) makes o_n^H close to a length-J binary string with a 50% chance of each bit being −1 or 1.

Overall objective function: The overall objective function to be minimized to obtain semantics-preserving, efficient binary codes is:

α E_1(w) + β E_2(w) + γ E_3(w),   (6)

where α, β, and γ are the tuning parameters of each term. The optimization performed to minimize the overall objective function is:

w* = arg min_w (α E_1(w) + β E_2(w) + γ E_3(w)).   (7)

The optimization in (7) is a sum of losses and can be performed efficiently via stochastic gradient descent (SGD) by dividing the training samples into batches. For training the JRL we adopt a two-step training procedure: we first train only the JRL using the objective function in (6) greedily with softmax, freezing the Face-CNN and Iris-CNN. After training the JRL, the entire model is fine-tuned end-to-end using the same objective function with back-propagation at a relatively small learning rate.

For tuning the hyper-parameters α, β, and γ of the objective function (6), we used an iterative grid search. Consider a cubic grid with all possible values for each parameter; each point on this grid (α, β, γ) represents a combination of the three hyper-parameters. Because exhaustively searching over all combinations is computationally expensive, we adopted an iterative and adaptive grid search.

In the iterative and adaptive grid search, for each hyper-parameter we considered the set of values S = {1, 2i} for i = {1, ..., 15}; i.e., the set containing 1 and all positive even integers from 2 to 30. This grid search is performed iteratively, where each iteration consists of three steps. In the first step, we fixed α and γ to 1 and chose β from the set S. The set of points considered in this step is therefore:

(α, β, γ) = (1, β_i, 1), where β_i ∈ S.   (8)

For each point (1, β_i, 1) in this set, we trained our DFB network and calculated the genuine accept rate (GAR) of the overall system at a security of 104 bits using 5-fold cross-validation. Using this method, we found the value of β that gave the highest GAR with α and γ set to 1; this best value is denoted β¹ (in general, β^t, where the superscript t signifies the iteration number). In the second step, we repeated the same process with α and β fixed at 1 and chose γ from the set S:

(α, β, γ) = (1, 1, γ_i), where γ_i ∈ S.   (9)

Again using 5-fold cross-validation, we found the best value of γ, denoted γ¹, that gave the highest GAR with α and β fixed at 1. In the third step, the same procedure was performed keeping β and γ fixed at 1 to find the best value of α, denoted α¹, from the set S.
These three steps together complete one iteration of the iterative grid search. In the next iteration, we again performed the above three steps, but instead of fixing the two non-searched parameters to 1, we fixed them to the best values found in the previous iteration. To illustrate, consider the best values of the three parameters found in the first iteration, denoted α¹, β¹, γ¹. In the first step of the second iteration, we fixed α and γ to α¹ and γ¹, respectively, and chose β from the set S. The set of points is therefore:

(α, β, γ) = (α¹, β_i, γ¹), where β_i ∈ S.   (10)

Again, using 5-fold cross-validation, we found the best value of β with the other parameters set to α¹ and γ¹; this value is denoted β², since this is the second iteration. Similarly, we performed the second and third steps of the second iteration to find γ² and α², respectively. We continued these iterations until the parameters converged, i.e., until the best value of each parameter did not change from one iteration to the next: α^t = α^{t−1}, β^t = β^{t−1}, γ^t = γ^{t−1}.

Using the above procedure for hyper-parameter tuning, we found the values of α^t, β^t, and γ^t to be 8, 2, 2 for FCA and 6, 4, 2 for BLA, respectively. The importance of each term is further discussed in the ablation study in Section VI-D.
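The combined objective in (6) can be written compactly as a loss module; the PyTorch sketch below is an illustrative reading of Eqs. (2)–(6), in which the weight-decay term of E_1 is assumed to be handled by the optimizer and the per-batch sums of Eqs. (4) and (5) are replaced by batch means. It is not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepHashingLoss(nn.Module):
    """alpha*E1 + beta*E2 + gamma*E3 from Eq. (6):
    E1: softmax cross-entropy (semantics-preserving term),
    E2: negative mean squared activation magnitude (pushes codes towards +/-1),
    E3: squared mean activation (balances the number of -1's and 1's).
    Defaults use the FCA values {8, 2, 2} reported in the text."""
    def __init__(self, alpha=8.0, beta=2.0, gamma=2.0):
        super().__init__()
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def forward(self, logits, labels, hash_activations):
        # hash_activations: (N, J) outputs of the hashing layer, in [-1, 1]
        J = hash_activations.shape[1]
        e1 = F.cross_entropy(logits, labels)
        e2 = -(hash_activations.pow(2).sum(dim=1) / J).mean()   # Eq. (4), batch-averaged
        e3 = hash_activations.mean(dim=1).pow(2).mean()          # Eq. (5), batch-averaged
        return self.alpha * e1 + self.beta * e2 + self.gamma * e3
```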
B. Network parameters for the Face-CNN

The network used for the Face-CNN is VGG-19 with an added fully connected layer fc3 (shown in Fig. 2). The Face-CNN is fine-tuned end-to-end with CASIA-WebFace [54], which contains 494,414 facial images corresponding to 10,575 subjects. After fine-tuning with CASIA-WebFace, the Face-CNN is fine-tuned with the 2013 session of the WVU-Multimodal face 2012-2013 dataset [55]. The WVU-Multimodal face datasets for 2012 and 2013 together contain a total of 119,700 facial images corresponding to 2,263 subjects, with 294 common subjects. All the raw facial images are first aligned in 2-D and resized to a fixed size of 224 × 224 before passing through the network [56]. The only other pre-processing is subtracting the mean RGB value, computed on the training set, from each pixel. The training is carried out by optimizing the multinomial logistic regression objective using mini-batch gradient descent with momentum. The batch size was set to 40, and the momentum to 0.9. The training was regularized by weight decay (the L2 penalty multiplier set to 0.0005) and by dropout regularization for the first three fully connected layers (dropout ratio set to 0.5). We used batch normalization for fast convergence. The learning rate was initially set to 0.1 and then decreased every 10 epochs. The number of nodes in the last fully connected layer fc3 before the softmax layer is 1024 for the FCA and 64 for the BLA. This implies that the feature vector extracted from the Face-CNN and fused with the feature vector from the Iris-CNN has 1024 dimensions for the FCA and 64 dimensions for the BLA.
C. Network parameters for the Iris-CNN
The network used for the Iris-CNN is VGG-19 with an added fully connected layer fc3. First, the Iris-CNN is fine-tuned end-to-end using the combination of CASIA-Iris-Thousand [57] and ND-Iris-0405 [58], with about 84,000 iris images corresponding to 1,355 subjects. Next, the Iris-CNN is fine-tuned using the 2013 session of the WVU-Multimodal iris 2012-2013 dataset [55]. The WVU-Multimodal iris datasets for 2012 and 2013 together contain a total of 257,800 iris images corresponding to 2,263 subjects, with 294 common subjects. All the raw iris images are segmented and normalized to a fixed size using OSIRIS (Open Source for IRIS), an open-source iris recognition system developed in the framework of the BioSecure project [59]. There is no other pre-processing for the iris images. The other hyper-parameters are consistent with the fine-tuning of the Face-CNN. The iris network has an output of 1024 dimensions for the FCA and 64 for the BLA.

D. Network parameters for the Joint Representation Layer
The details of the network parameters for the two JRL architectures are discussed in this subsection:
1) Fully Concatenated Architecture:
In the FCA, the 1024-dimensional outputs of the Face-CNN and Iris-CNN are concatenated vertically to give a 2048-dimensional vector. The concatenated feature vector is then passed through a fully connected layer, which reduces the feature dimensionality from 2048 to 1024 and also fuses the iris and face features. The hashing layer is also a fully connected layer; it outputs a 1024-dimensional vector and uses a tanh activation.

For training the DFB model, we used a two-step training procedure. First, only the JRL was trained, for 65 epochs with a batch size of 32. The learning rate was initially set to 0.1 and then decreased every 20 epochs. The other hyper-parameters are consistent with the fine-tuning of the Face-CNN. After training the joint representation layer, the entire DFB model was fine-tuned end-to-end for 25 epochs with a batch size of 32. The learning rate was initialized to 0.07, which is the final learning rate reached while training the joint fully connected layer in the first step, and was decreased every 5 epochs. For this two-step training process, we used the 2013 session of the subjects common to the 2012 and 2013 sessions of the WVU-Multimodal dataset. This common subset consists of the 294 overlapping subjects, with the same number of face and iris images per subject.
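A sketch of the FCA joint representation layer with these dimensions is given below in PyTorch. The module and attribute names, the ReLU after the joint fully connected layer, and the 294-class softmax head are illustrative assumptions; the scaled-tanh hashing layer follows the continuation scheme of Sec. III-B.

```python
import torch
import torch.nn as nn

class JointRepresentationFCA(nn.Module):
    """FCA joint representation layer: concatenate the 1024-d face and iris
    features, fuse with a joint fc layer (2048 -> 1024), then hash to a
    1024-bit code with a scaled tanh."""
    def __init__(self, feat_dim=1024, code_dim=1024, num_classes=294):
        super().__init__()
        self.joint_fc = nn.Linear(2 * feat_dim, 1024)    # fusion layer
        self.hashing = nn.Linear(1024, code_dim)         # hashing layer
        self.softmax_head = nn.Linear(code_dim, num_classes)
        self.register_buffer("beta", torch.tensor(1.0))  # continuation parameter

    def forward(self, face_feat, iris_feat):
        fused = torch.relu(self.joint_fc(torch.cat([face_feat, iris_feat], dim=1)))
        codes = torch.tanh(self.beta * self.hashing(fused))   # values in [-1, 1]
        return codes, self.softmax_head(codes)
```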
2) Bilinear architecture:
For the BLA, we do not add fc3 (i.e., the additional fully connected layer) to either the Face-CNN or the Iris-CNN. In addition, the number of nodes in the first and second fully connected layers fc1 and fc2 is reduced to 512 and 64, respectively. This means that the output feature vectors of the face and iris networks have 64 dimensions rather than the 1024 dimensions used in the FCA. The 64-dimensional outputs of the Face-CNN and Iris-CNN are combined in the bilinear (outer product) layer using the matrix outer product, as explained in Sec. III-B2. The bilinear layer produces an output of dimension 64 × 64 = 4096, fusing the iris and face features. The bilinear feature vector is then passed through a fully connected layer, which reduces the feature dimension from 4096 to 1024, followed by a hashing layer that produces a binary output of 1024 dimensions.

In the first step of the two-step training process, only the joint representation layer was trained, for 80 epochs with a batch size of 32. The momentum was set to 0.9. The learning rate was initially set to 0.1 and then decreased by a factor of 0.1 every two epochs. The other hyper-parameters and the input image sizes are consistent with the training process used in the FCA. After training the joint representation layer, the entire DFB model was fine-tuned for 30 epochs with a batch size of 32. The learning rate was initialized to 0.0015, which is the final learning rate reached while training the joint representation layer in the first step, and was decreased by a factor of 0.1 every five epochs. The other hyper-parameters are consistent with the training of the JRL in the FCA.
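The corresponding sketch for the BLA joint representation layer is shown below; as in the FCA sketch, the layer names, the ReLU after the joint fully connected layer, and the softmax head size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointRepresentationBLA(nn.Module):
    """BLA joint representation layer: outer product of the 64-d face and
    iris features (64 x 64 = 4096), joint fc (4096 -> 1024), then a
    1024-bit scaled-tanh hashing layer."""
    def __init__(self, feat_dim=64, code_dim=1024, num_classes=294):
        super().__init__()
        self.joint_fc = nn.Linear(feat_dim * feat_dim, 1024)
        self.hashing = nn.Linear(1024, code_dim)
        self.softmax_head = nn.Linear(code_dim, num_classes)
        self.register_buffer("beta", torch.tensor(1.0))

    def forward(self, face_feat, iris_feat):
        bilinear = torch.einsum('bi,bj->bij', face_feat, iris_feat).flatten(1)
        fused = torch.relu(self.joint_fc(bilinear))
        codes = torch.tanh(self.beta * self.hashing(fused))
        return codes, self.softmax_head(codes)
```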
E. Parameters for the FEC Decoding
The cancelable template generated from the CTM is considered to be a noisy codeword of an error correcting code that we can select. Due to their maximum distance separable (MDS) property, we selected Reed-Solomon (RS) codes and used an RS decoder for the FEC decoding in the SSTM. The G-dimensional cancelable template is passed through a Reed-Solomon decoder to identify the closest codeword, which yields the multimodal secure sketch.

RS codes use symbols of length m bits. The input to the RS decoder is of length N′ = 2^m − 1 symbols, which means the number of bits per input codeword to the decoder is n′ = mN′. For example, if the symbol size is m = 6, then N′ = 63 is the codeword length in symbols and n′ = 378 is the codeword length in bits. Assume the size of the cancelable template is G = 378 bits, which is the number of bits at the input to the RS decoder. This 378-dimensional vector is decoded to generate a secure sketch whose length is K′ symbols or, equivalently, k′ = mK′ bits. K′ can be varied depending on the error correcting capability required for the code, and k′ also signifies the security of the system in bits [60].

We have used shortened RS codes. A shortened RS code is one in which the codeword length is less than 2^m − 1 symbols. In standard error control coding, shortening of an RS code is achieved by setting a number of data symbols to zero at the encoder, not transmitting them, and then re-inserting them at the decoder. A shortened [N, K] Reed-Solomon code essentially uses an [N′, K′] encoder, where N′ = 2^m − 1, m is the number of bits per symbol (symbol size), and K′ = K + (N′ − N). In our experiments we used m = 8 and N′ = 255. When using shortened RS codes, the size of the cancelable template is considered equal to N symbols rather than N′ symbols. For example, the output of the cancelable template block could be 768 bits, which equals N = 768/8 = 96 symbols. The security of the secure multimodal template depends on the selected value of K, implying that the security of the system is k bits, where k = mK. The output of the decoder is a length-k binary message, which is cryptographically hashed and stored as the secure multimodal template in the database. When a query is presented for authentication, the system approves the authentication only if the cryptographic hash of the query matches that of the specific enrolled identity.
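The parameter relationships in this subsection can be checked with a few lines of Python. The helper below only evaluates the shortened-code arithmetic (K′ = K + (N′ − N), security k = mK, correction capability ⌊(N − K)/2⌋ symbols); it is not an RS decoder, and the 104-bit example is taken from the security level used elsewhere in the paper.

```python
def shortened_rs_parameters(m: int, n_bits: int, k_bits: int):
    """Relate the cancelable-template length (n_bits) and security level
    (k_bits) to the shortened Reed-Solomon code parameters."""
    N_full = 2 ** m - 1            # N': full codeword length in symbols
    N = n_bits // m                # shortened codeword length in symbols
    K = k_bits // m                # message length in symbols (security = m*K bits)
    K_full = K + (N_full - N)      # K': message length of the underlying [N', K'] code
    t = (N - K) // 2               # correctable symbol errors
    return {"N_full": N_full, "N": N, "K": K, "K_full": K_full, "t_symbols": t}

# Example from the text: m = 8, a 768-bit cancelable template, 104-bit security.
print(shortened_rs_parameters(m=8, n_bits=768, k_bits=104))
# N' = 255, N = 96, K = 13, K' = 172, t = 41 symbols
```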
V. EXPERIMENTAL RESULTS FOR THE CANCELABLE MULTIMODAL TEMPLATE
We evaluated the matching performance and the security of our proposed secure multibiometric system using the WVU multimodal database [55], which contains images for the face and iris modalities. All experiments were performed with optimized hyper-parameters: we used {α, β, γ} = {8, 2, 2} for FCA and {6, 4, 2} for BLA, respectively.

In this section, we analyze the cancelable multimodal template, which is the output of the CTM. Analyzing the output of the CTM provides insight into the requirements and the strength of the error correcting code to be used in the secure sketch template module (SSTM). In the next section, we analyze the secure multimodal template, which is the output of the overall secure multimodal system.

A. Evaluation Protocol
For the cancelable multimodal template, the equal error rate (EER) is used as one of the metrics to evaluate the matching performance for various levels of random bit selection (values of G). The EER is the operating point at which the proportion of false acceptances equals the proportion of false rejections; the lower the EER, the higher the accuracy of the biometric system. We also use the genuine and impostor distribution curves, along with receiver operating characteristic (ROC) curves, to evaluate the matching performance of the cancelable template.
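For reference, the EER can be computed from genuine and impostor score distributions as sketched below; this is a generic computation using normalized Hamming distances as scores, not code from the paper.

```python
import numpy as np

def equal_error_rate(genuine_dist: np.ndarray, impostor_dist: np.ndarray) -> float:
    """EER from genuine/impostor normalized Hamming distances: sweep the
    decision threshold and return the point where FAR and FRR cross."""
    thresholds = np.linspace(0.0, 1.0, 1000)
    far = np.array([(impostor_dist <= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine_dist > t).mean() for t in thresholds])    # false rejects
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)
```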
B. Performance Evaluation

After fine-tuning the entire DFB, we test the network by extracting features from the JRL of the DFB. In both the FCA and BLA architectures, the output is a 1024-dimensional joint binarized feature vector. For testing, we used 50 subjects from the WVU-Multimodal 2012 dataset. The training and testing sets are completely disjoint, meaning these 50 subjects were never used in the training set. Twenty face and twenty iris images are chosen randomly for each of these 50 subjects, giving 20 face-iris pairs per subject with no repetitions. These 1,000 pairs (50 × 20) are forward passed through the DFB, and 1,000 fused 1024-dimensional feature vectors are extracted. A user-specific random-bit selection is performed on each fused feature vector to generate the cancelable multimodal template. The number of randomly selected bits G used in our experiments is 128, 256, 512, or 768 bits out of the 1024-dimensional binary fused vector.

In this section, we present the results of the statistical analysis of the cancelable multimodal template using the two architectures (FCA and BLA) for fusing the face and iris features, and we discuss the performance evaluation of each architecture.

Two scenarios are considered for the evaluation of the secure templates. The first is the unknown key scenario, in which the impostor does not have access to the key of the legitimate user. The impostor tries to break into the system by posing as a genuine user, presenting an artificially synthesized key (different from the actual key of the genuine user) along with impostor biometrics. This means that the impostor presents random indices for our random-bit selection method in the CTM, different from the indices that were selected during enrollment for the legitimate user. The second is the stolen key scenario, in which the impostor has access to the actual key of the genuine user and tries to break into the system by presenting the actual key along with impostor biometrics.
Fig. 4: Genuine and impostor distributions of cancelable template distances using FCA for varying numbers of random bits: (a) 256 bits, (b) 768 bits, (c) 1024 bits.

The genuine and impostor distributions of the cancelable template for FCA in the unknown key and stolen key scenarios, generated by varying the number of random bits selected by the CTM, are given in Fig. 4. The distributions shown in Fig. 4 were generated by fitting a normal distribution curve to the histogram. We first observe that there is no overlap between the inter-user (impostor) and intra-user (genuine) distributions; these distributions assume that every user employs his own key. Also plotted is an attacker (stolen key) distribution in which a user (attacker) uses the key of another user (victim). In this case, the attacker distribution slightly overlaps with the genuine distribution, but the overlap between the two is still reasonably small.
Fig. 5: EER curves for face, iris, and joint-FCA modalities in unknown key (dashed lines) and stolen key (solid lines) scenarios using different sizes of cancelable template.
Fig. 6: EER curves for face, iris, and joint-BLA modalities in unknown key (dashed lines) and stolen key (solid lines) scenarios for different sizes of cancelable template.

In addition, observe that as the number of randomly selected bits grows from 256 to 768, the overlap between the genuine and impostor distributions is reduced in both scenarios; however, when all 1024 bits are used, the overlap increases again. This clearly shows the trade-off between security (the selection of G random bits) and matching performance (the overlap of the distributions). Notice that there is no "stolen key" curve in Fig. 4(c), as all 1024 bits are used with no down-selection of bits and hence no key.

The EER plots for FCA and BLA are given in Fig. 5 and Fig. 6, respectively. Each EER plot is obtained by calculating the EER while varying the length of the cancelable template (the number of randomly selected bits). In general, the EER plots show an increase in performance from using additional biometric features, and the multimodal (joint) template performs better than the individual modalities: the EER for the joint modality is lower than the EER for face or iris. For example, at 512 bits in the stolen key scenario, the joint modality using either FCA or BLA yields a lower EER than face or iris alone, which clearly shows the improvement obtained by fusing multiple modalities.
Fig. 7: ROC curves (GAR versus FAR) for face, iris, joint-FCA, and joint-BLA in the unknown key scenario for a random selection of 768 bits.
Fig. 8: ROC curves (GAR versus FAR) for face, iris, joint-FCA, and joint-BLA in the stolen key scenario for a random selection of 768 bits.

The ROC curves for the two architectures are compared in Fig. 7 and Fig. 8 for the unknown key and stolen key scenarios, respectively, when the number of randomly selected values (security) is 768 bits. Again, we can clearly observe that the joint modality performs better than the individual modalities. For a false accept rate (FAR) of . , the genuine accept rate (GAR) in the stolen key scenario using FCA and BLA is . and . , respectively; for face and iris, the GAR is . and . , respectively, at the same FAR. As observed from the plots, the matching performance is not compromised at high security, and multimodality gives better performance than unimodality.

VI. EXPERIMENTAL RESULTS FOR THE OVERALL SYSTEM
In this section, we analyze the performance at the output of the overall system, i.e., at the secure multimodal template that is stored in the database.
A. Evaluation Protocol
We evaluate the trade-off between the matching performance and the security of the proposed secure multimodal system using curves that relate the GAR to the security in bits (i.e., the G-S curves). The G-S curve is acquired by varying the error correcting capability of the Reed-Solomon code used for FEC decoding in the SSTM.
Fig. 9: G-S curves (GAR versus security in bits, k) using FCA in the unknown key (dashed) and stolen key (solid) scenarios for different values of n bits.

The error correcting capability of a code signifies the number of bits (or symbols) that a given ECC can correct. The error correcting capability of a Reed-Solomon code is (N − K)/2 symbols, or (n − k)/2 bits. We vary the error correcting capability of the code by using different code rates (K/N).
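To make the relationship between the code parameters and the operating points on the G-S curves concrete, the short sketch below computes the shortened Reed-Solomon settings used in this section (symbols of m = 8 bits); the helper name and structure are ours and are not part of the system implementation.

    def rs_tradeoff(n_bits, k_bits, m=8):
        # Shortened RS parameters: n_bits = m*N codeword bits, k_bits = m*K message bits.
        # The code corrects up to (N - K)/2 symbols, i.e., (n - k)/2 bits of mismatch.
        N, K = n_bits // m, k_bits // m
        return {"N": N, "K": K,
                "t_symbols": (N - K) // 2,
                "t_bits": (n_bits - k_bits) // 2,
                "code_rate": K / N}

    print(rs_tradeoff(768, 104))
    # {'N': 96, 'K': 13, 't_symbols': 41, 't_bits': 332, 'code_rate': 0.135...}

The t_bits value of 332 for n = 768 and k = 104 matches the error correcting capability reported in Table I.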
B. Performance Evaluation

As explained in Sec. IV-E, the output of the cancelable template block (n bits) is decoded in order to generate a multimodal secure sketch of length k bits, where k also represents the security of the proposed secure multibiometric system. This multimodal sketch is cryptographically hashed and stored as the secure multimodal template in the database. When a query is presented for authentication, the system authenticates the user only if the cryptographic hash of the query matches that of the specific enrolled identity.

We have experimented with different values of N symbols with m = 8 and N′ = 255 symbols using shortened RS codes. The G-S curves for different values of n bits (equivalent to N symbols) in the unknown and stolen key scenarios using FCA and BLA are given in Fig. 9 and Fig. 10, respectively. We can observe from the curves that as the size of the cancelable template in bits (n) increases, the GAR for a given level of security in bits (k) also increases. For example, at a security (k) of 104 bits (equivalent to K = 13 symbols) using FCA in the stolen key scenario, the GAR for n = 128, 256, 512, and 768 bits is . , . , . , and . , respectively. Similarly, for the unknown key scenario and FCA, the GAR for n = 128, 256, 512, and 768 bits is . , . , . , and . , respectively. It can be observed that the use of a larger cancelable template results in better performance. This improvement can be attributed to the fact that an increase in n at a fixed value of k (security) improves the error correcting capability of the RS code, which is given by (n − k)/2, and hence yields better matching performance.

Fig. 10: G-S curves (GAR versus security in bits, k) using BLA in the unknown key (dashed) and stolen key (solid) scenarios for different values of n bits.

Table I summarizes the GAR for different values of n at security levels of 56, 80, and 104 bits using both FCA and BLA. The error correcting capabilities in bits, (n − k)/2, of the RS codes at the different security levels are also given in the table.
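To make the hash-then-compare authentication decision described above concrete, the sketch below stores only a cryptographic digest of the decoded multimodal sketch at enrollment and accepts a probe only on an exact digest match. The choice of SHA-256 and the function names are illustrative assumptions; the paper only specifies that a cryptographic hash of the k-bit sketch is stored.

    import hashlib

    def enroll(sketch_bits: bytes) -> str:
        # Store only a cryptographic hash of the decoded k-bit multimodal secure sketch.
        return hashlib.sha256(sketch_bits).hexdigest()

    def authenticate(stored_digest: str, probe_sketch_bits: bytes) -> bool:
        # Accept only if the probe's decoded sketch hashes to the enrolled digest,
        # i.e., only if RS decoding has corrected all intra-class mismatches.
        return hashlib.sha256(probe_sketch_bits).hexdigest() == stored_digest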
Fig. 11: G-S curves for the face, iris, joint-FCA, and joint-BLA modalities in the unknown key (dashed lines) and stolen key (solid lines) scenarios for n = 768 bits.

From Table I, it can be observed that, for a given size of the cancelable template in bits (n), the error correcting capability decreases as the required security level in bits (k) increases, which results in a decrease in GAR. This implies that the code cannot correct the intra-class variations at high code rates (k/n), i.e., at higher values of k, which reduces the GAR. This is the trade-off between the matching performance (GAR) and the security (k) of the system. We have chosen a minimum security level of 56 bits for comparison in Table I, which is higher than the levels reported in the literature [1].

The plot in Fig. 11 compares the G-S curves for the face, iris, joint-FCA, and joint-BLA modalities using m = 8, N′ = 255, and n = 768 bits (equivalent to N = 96 symbols) in the unknown and stolen key scenarios, respectively. The security for the iris modality in the stolen key scenario at a GAR of . is 20 bits. However, by incorporating additional biometric features (face), the security of the multibiometric system using FCA increases to 128 bits at the same GAR.

C. Comparison with State-of-the-Art Hashing Techniques
As a further experiment, we compare the proposed hashing technique with other hashing techniques. This is done by replacing our hashing method with two other hashing methods, [61] and [52], and then training and testing the multimodal authentication system using the same WVU multimodal dataset.
TABLE I: GARs of FCA and BLA in the unknown and stolen key scenarios at security levels of 56, 80, and 104 bits, for different cancelable template sizes (N).

N (symbols)  n (bits)  Security K (symbols)  Security k (bits)  (n-k)/2 (bits)  FCA-GAR Unknown  FCA-GAR Stolen  BLA-GAR Unknown  BLA-GAR Stolen
32           256        7                     56                100             82.30%           82.15%          82.25%           80.66%
32           256       10                     80                 88             31.32%           32.68%          36.67%           35.92%
32           256       13                    104                 76              4.3%             4.33%           6.77%            6.07%
64           512        7                     56                228             99.65%           99.68%          98.95%           99.77%
64           512       10                     80                216             97.85%           94.95%          94.63%           94%
64           512       13                    104                204             84.63%           82.05%          84.41%           85.15%
96           768        7                     56                356             99.93%           99.99%          99.55%           99.22%
96           768       10                     80                344             99.37%           99.44%          99.04%           99.04%
96           768       13                    104                332             98.95%           99.16%          96.51%           96.75%
Fig. 12: G-S curves (GAR versus security in bits) comparing the proposed hashing with two other hashing techniques for FCA in the unknown key (dashed lines) and stolen key (solid lines) scenarios.

The rest of the system is kept the same for comparison purposes. We have compared our hashing technique with supervised semantics-preserving deep hashing (SSDH) [61] and HashNet [52], and evaluated the overall system to produce G-S curves. We use the FCA system for this comparison. We denote the system with our proposed hashing technique as "FCA", use "FCA+SSDH" to denote the FCA architecture with our hashing function replaced by SSDH hashing, and use "FCA+HashNet" to denote our FCA architecture with the HashNet hashing function. Fig. 12 shows the G-S curves for the stolen key and unknown key scenarios. It can clearly be seen that our proposed hashing method performs better than the other two deep hashing techniques for the given multimodal biometric security application. Compared to the other two hashing techniques, our proposed method improves the GAR by at least . at a high security level of 104 bits. A comparison of our hashing technique against others for an image-retrieval application can be found in the Appendix.

D. Ablation Study
The objective function defined in (6) contains three constraints: one for semantics-preserving binary codes (i.e., for classification) and two for efficient binary codes (i.e., for binarization and entropy maximization). In this section, we study the relative importance of each of these terms.

First, we measure the influence of the classification term by setting α = 1, β = 0, and γ = 0. Using this setting, we train our DFB model and evaluate the overall system by calculating the GAR at a security of k = 56, 80, and 104 bits for n = 768 bits (similar to Table I) on the test data of the WVU-Multimodal 2012 dataset. We also study the effect of the binarization constraint together with the classification term by setting α = 1, β = 1, and γ = 0, training our DFB model, and again evaluating the overall system by calculating the GARs. Finally, we set α = 1, β = 1, and γ = 1, train the DFB model, and evaluate the overall system. We performed this experiment for both the FCA and BLA architectures, and only for the stolen key scenario, because Table I shows that the unknown key and stolen key scenarios give very similar results. The GAR results for this experiment are shown in Table II.

It can be observed from Table II that the classification term is the most important. However, adding the binarization and entropy constraints (i.e., α = 1, β = 1, γ = 1) definitely helps to improve the matching performance (i.e., GAR), by at least . at a high security level of 104 bits in our proposed system. This performance improvement is evident for both the FCA and BLA architectures. Therefore, using all the terms proves beneficial to the matching performance, particularly at the higher security levels, for both FCA and BLA architectures.
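The sketch below illustrates how such a three-term weighted objective can be assembled during training, with the ablation corresponding to zeroing individual weights. It is only a sketch: the concrete forms of the classification, binarization, and entropy terms are those defined in (6) earlier in the paper, and the stand-in losses and names used here (PyTorch-style) are our own illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def dfb_objective(logits, labels, h, alpha=1.0, beta=1.0, gamma=1.0):
        # Weighted sum alpha*E1 + beta*E2 + gamma*E3 over a batch, where h holds the
        # pre-binarization hashing-layer activations. E1-E3 below are common stand-ins
        # for classification, binarization, and bit-balance (entropy) constraints.
        e1 = F.cross_entropy(logits, labels)          # semantics-preserving (classification)
        e2 = torch.mean((torch.abs(h) - 1.0) ** 2)    # push activations toward +/-1
        e3 = torch.mean(torch.mean(h, dim=0) ** 2)    # keep each bit balanced around zero
        return alpha * e1 + beta * e2 + gamma * e3

    # Ablation settings as in Table II, e.g., classification term only:
    # loss = dfb_objective(logits, labels, h, alpha=1.0, beta=0.0, gamma=0.0)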
TABLE II: GARs of FCA and BLA in the stolen key scenario showing the influence of each term in the objective function (n = 768 bits).

Hyper-parameters          Security k (bits)   FCA-GAR   BLA-GAR
α = 1, β = 0, γ = 0        56                 99.16%    98.71%
                           80                 98.32%    96.87%
                          104                 95.26%    93.29%
α = 1, β = 1, γ = 0        56                 99.73%    98.76%
                           80                 98.8%     97.14%
                          104                 95.72%    94.72%
α = 1, β = 0, γ = 1        56                 99.52%    98.70%
                           80                 98.41%    97.02%
                          104                 95.43%    93.98%
α = 1, β = 1, γ = 1        56                 99.9%     99%
                           80                 99.8%     97.6%
                          104                 96.5%     95.6%
E. Privacy Analysis
The objective of our work is to design a multimodal authentication system that maximizes the matching performance while keeping the biometric data secure. However, the problem is complicated by the possibility that the adversary may gain access to the enrollment key k_e, the multimodal secure sketch s_e, the enrollment feature vector e, or any combination thereof. Using this information, the adversary could not only compromise the authentication integrity of the system but may also extract information about the biometric data. The system should be robust in these scenarios, and the system design should minimize the privacy leakage, that is, the leakage of the user's biometric information from the compromised data, while preserving the authentication integrity of the system.
The G-S curves discussed in Sec. VI-B quantify the security of the system. In this subsection, we quantify the privacy leakage of the user's biometric information for our proposed system. The privacy of the user is compromised if the adversary gains access to the enrollment feature vector e, as we assume that the enrollment feature vector can be de-convolved to recover the biometric data of the user. The information leaked about the user's enrollment feature vector e can be quantified as the mutual information

I(e; V) = H(e) − H(e | V),   (11)

where e represents the enrollment feature vector and V represents the information that the adversary has access to. V could be the enrollment key k_e and/or the multimodal secure sketch s_e. H(e) represents the entropy of e and quantifies the number of bits required to specify e. In particular, H(e) = J because the optimization described in Sec. IV-A is designed to ensure that the J bits in the encoded template are independent and equally likely to be 0 or 1. H(e | V) is the entropy of e given V and quantifies the remaining uncertainty about e given knowledge of V. The mutual information I(e; V) is the reduction in uncertainty about e given V [3].

Let us assume that the adversary gains access to the enrollment key k_e. In this case V = k_e and the mutual information is

I(e; k_e) = H(e) − H(e | k_e) = 0,   (12)

because H(e | k_e) = H(e) = J: the key k_e does not give any information about the enrollment feature vector e. It only gives the indices of the random values selected from e, but does not provide the values at those indices.

The information leakage when s_e or the pair (k_e, s_e) is compromised can be quantified using the conditional mutual information, because s_e depends on r_e, which is driven by k_e. Hence, the information leakage when the secure sketch is compromised is conditionally dependent on k_e and is given by

I(e; s_e | k_e) = H(e | k_e) − H(e | s_e, k_e),   (13)

where H(e | k_e) quantifies the remaining uncertainty about e given knowledge of k_e, and H(e | s_e, k_e) quantifies the remaining uncertainty about e given knowledge of both k_e and s_e. This conditional mutual information is measured under the two scenarios discussed below.

Both s_e and k_e are compromised: In this scenario the adversary gains access to both s_e and k_e. As previously discussed, H(e | k_e) = H(e) = J because knowing k_e does not provide any information about e. If the adversary knows s_e, the information leakage about r_e due to s_e is equal to the length of s_e, which is k bits. The adversary can use this information about r_e, together with the additional knowledge of the enrollment key k_e, to determine exactly the indices and the values of the k bits in the enrollment vector e. However, there is still uncertainty about the remaining J − k bits of the enrollment feature vector e, which implies H(e | s_e, k_e) = J − k. Therefore, the information leakage about the enrollment feature vector when both the secure sketch and the enrollment key are compromised is

I(e; s_e | k_e) = H(e | k_e) − H(e | s_e, k_e) = J − (J − k) = k.   (14)

Only s_e is compromised: In this scenario the adversary gains access only to s_e. Even in this case, if the adversary knows s_e, the information leakage about r_e due to s_e is k bits.
However, the adversary does not have any information about the enrollment key k_e, which means that there is added uncertainty about the enrollment feature vector e, because the adversary does not know the exact locations of the k bits given by s_e. This added uncertainty is measured by H(k_e), which is calculated using combinatorics as

H(k_e) = log C(J, n),   (15)

where n is the size of the key and C(J, n) is the binomial coefficient counting all the combinations in which n bits could be selected from J. Therefore, the conditional mutual information is given by

I(e; s_e | k_e) = H(e | k_e) − H(e | s_e, k_e) = J − (J − k + log C(J, n)) = k − log C(J, n) → max(0, k − log C(J, n)),   (16)

where the max function is applied in the last step because information leakage cannot be negative. We have evaluated (16) using different values of n and k for J = 1024 bits, where n ranges up to J, depending on the number of random bits selected from the enrollment feature vector e, and k ranges up to n, depending on the rate of the error correcting code. We found that the information leakage is zero for all values of k when n is at most a certain threshold; for n above this threshold, there is positive information leakage for sufficiently large k. From (14) and (16), we conclude that, for J = 1024, the value of n should be kept below this threshold and the value of k should be small. This keeps the information leakage zero or small if s_e or the pair (s_e, k_e) is compromised. These values of n and k also keep the matching performance high, as shown in Fig. 11.
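A quick way to check the leakage bound in (16) numerically is sketched below (base-2 logarithm, since H(e) = J is measured in bits); the helper is ours and only evaluates the formula.

    import math

    def leakage_bits(J=1024, n=768, k=104):
        # Leakage about the J-bit enrollment vector when only the secure sketch is
        # compromised, per (16): max(0, k - log2 C(J, n)). If the key is also
        # compromised, (14) gives exactly k bits.
        return max(0.0, k - math.log2(math.comb(J, n)))

    print(leakage_bits(n=768, k=104))   # 0.0: log2 C(1024, 768) far exceeds k = 104
    print(leakage_bits(n=1020, k=104))  # positive once C(J, n) becomes small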
F. Unlinkability Analysis

According to ISO/IEC International Standard 24745 [62], transformed templates generated from the same biometric references should not be linkable across applications or databases. Using the protocol defined in [63], we have evaluated the unlinkability of the proposed system. The protocol in [63] is based on the mated (H_m) and non-mated (H_nm) sample distributions. Mated samples correspond to templates extracted from samples of the same subject using different user-specific keys; non-mated samples correspond to templates extracted from samples of different subjects using different keys. For an unlinkable system, there must be significant overlap between the mated and non-mated score distributions [63]. Using these distributions, two measures of unlinkability are specified: i) the local measure D↔(s), which evaluates the linkability of the system at each specific linkage score s and depends on the likelihood ratio between the score distributions. D↔(s) ∈ [0, 1] and is defined over the entire score domain. D↔(s) = 0 denotes full unlinkability, while D↔(s) = 1 denotes full linkability of two transformed templates at score s.
All values of D↔(s) between 0 and 1 indicate an increasing degree of linkability. ii) The global measure D_sys↔ provides an overall measure of the linkability of the system, independent of the score domain, and is a fairer benchmark for unlinkability comparisons between two or more systems. D_sys↔ ∈ [0, 1], where D_sys↔ = 1 indicates full linkability for all the scores of the mated samples distribution and D_sys↔ = 0 indicates full unlinkability over the whole score domain. All values of D_sys↔ between 0 and 1 indicate an increasing degree of linkability.

Following the benchmark protocol defined in [63], six transformed databases were generated from the WVU Multimodal face and iris test dataset by using a different set of random bits (enrollment key) in the CTM for each template of a subject. The linkage score we use is the Hamming distance between s_e and s_p. The mated and non-mated sample distributions were computed across these six databases. These score distributions are used to calculate the local measure D↔(s), which is in turn used to compute the global measure D_sys↔ (the overall linkability of the system). Fig. 13 shows the unlinkability curves when transformed templates are generated for the joint-FCA and joint-BLA modalities using m = 8, N′ = 255, and n = 768. We have tested with two quantities of security bits, k = 104 and k = 128. With significant overlap between the distributions, the overall linkability of the system is close to zero for both joint-FCA (D_sys↔ = 0. ) and joint-BLA (D_sys↔ = 0. ). Based on this discussion, the proposed system can be considered unlinkable.

Fig. 13: Unlinkability analysis of the proposed system for FCA and BLA for different quantities of security bits: (a) FCA-104, (b) FCA-128, (c) BLA-104, (d) BLA-128.
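For completeness, the sketch below shows one way to estimate the local and global unlinkability measures from mated and non-mated linkage scores. The specific estimator (histogram densities, equal priors for the mated and non-mated hypotheses) reflects our reading of the framework in [63] and is not the exact evaluation code used for Fig. 13.

    import numpy as np

    def unlinkability_measures(mated, nonmated, bins=100):
        # Estimate D(s) (local) and Dsys (global) from arrays of linkage scores,
        # assuming equal priors p(Hm) = p(Hnm); our reading of the protocol in [63].
        edges = np.linspace(min(mated.min(), nonmated.min()),
                            max(mated.max(), nonmated.max()), bins + 1)
        p_m, _ = np.histogram(mated, bins=edges, density=True)      # p(s | Hm)
        p_nm, _ = np.histogram(nonmated, bins=edges, density=True)  # p(s | Hnm)
        denom = p_m + p_nm
        post_m = np.divide(p_m, denom, out=np.full_like(p_m, 0.5), where=denom > 0)
        d_local = np.clip(2.0 * post_m - 1.0, 0.0, 1.0)        # 0 where non-mated dominates
        d_sys = float(np.sum(d_local * p_m * np.diff(edges)))  # expectation under p(s | Hm)
        return d_local, d_sys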
VII. CONCLUSION

We have presented a feature-level fusion and binarization framework using deep hashing to design a multimodal template protection scheme that generates a single secure template from each user's multiple biometrics. We have employed a hybrid secure architecture combining the secure primitives of cancelable biometrics and secure sketch and have integrated it with a deep hashing framework, which makes it computationally prohibitive to forge a combination of multiple biometrics that passes the authentication. We have also proposed two deep learning based fusion architectures, a fully connected architecture and a bilinear architecture, that could be used to combine more than two modalities. Moreover, we have analyzed the matching performance and the security, and have also performed an unlinkability analysis of the proposed secure multibiometric system. Experiments using the WVU multimodal dataset, which contains face and iris modalities, demonstrate that the matching performance does not deteriorate with the proposed protection scheme. In fact, both the matching performance and the template security are improved when using the proposed secure multimodal system. However, we want to clarify that, while the proposed solution is an interesting biometric security framework, in particular for structured data from modalities like face and iris, further validation is required to show how well it works with other biometric modalities. Finally, the goal of this paper is to motivate researchers to investigate how to generate secure, compact multimodal templates.

TABLE III: Mean average precision (MAP %) comparison with other hashing methods for 32, 48, and 64 bits on ImageNet.

Methods        32      48      64
LSH [15]       25.42   33.74   36.18
ITQ [16]       46.96   53.23   57.05
CCA-ITQ [16]   47.1    55.67   58.80
DHN [36]       49.17   57.19   59.82
HashNet [52]
APPENDIX
IMAGE-RETRIEVAL EFFICIENCY ON THE IMAGENET DATASET
In order to test the effectiveness of the hashing layer in our proposed methods, we have also tested our deep hashing method for image retrieval on the ImageNet (ILSVRC 2015) [43] dataset and compared its retrieval performance against several baseline hashing methods. The ImageNet dataset contains over 1.2 million images in the training set and about 50 thousand images in the validation set, corresponding to 1000 categories. For comparison, we follow the same setting as in [52]: we randomly select 100 categories, use all the corresponding training-set images as our database, use the corresponding validation-set images as our query points, and select 100 images per category from the database as training points.

For evaluation, we use mean average precision (MAP@1000), precision curves within Hamming radius 2 (P@r = 2), and precision curves for different numbers of top returned samples (P@K). We compare our proposed hashing method with six state-of-the-art hashing methods, including the shallow hashing methods LSH [15], ITQ [16], and CCA-ITQ [16], and the deep hashing methods DHN [36], HashNet [52], and SSDH [61]. We report results using the source code provided by the respective authors, except for DHN, for which we report the result published in [52]. For all the shallow hashing methods, we use VGG-19 fc7 features as input, and for the deep hashing methods, we use raw images as input. For a fair comparison, we use VGG-19 for all the deep hashing methods.

We can observe from the MAP comparison in Table III that our hashing technique is better than the shallow hashing methods for all hash code lengths.
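The sketch below shows one common way such Hamming-ranking metrics (MAP@K and precision within a Hamming radius) are computed from binary codes and labels; it is an illustrative reference implementation, not the evaluation script used for Table III and Fig. 14.

    import numpy as np

    def retrieval_metrics(db_codes, db_labels, q_codes, q_labels, top_k=1000, radius=2):
        # MAP@top_k and precision within a Hamming radius for binary codes in {0, 1}.
        aps, prec_r = [], []
        for code, label in zip(q_codes, q_labels):
            dist = np.count_nonzero(db_codes != code, axis=1)   # Hamming distances
            order = np.argsort(dist, kind="stable")[:top_k]     # rank database by distance
            rel = (db_labels[order] == label).astype(np.float64)
            if rel.sum() > 0:
                prec_at_i = np.cumsum(rel) / np.arange(1, rel.size + 1)
                aps.append(float((prec_at_i * rel).sum() / rel.sum()))
            within = dist <= radius                              # hits inside the Hamming ball
            prec_r.append(float((db_labels[within] == label).mean()) if within.any() else 0.0)
        return float(np.mean(aps)), float(np.mean(prec_r))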
Fig. 14: Experimental precision results on the ImageNet dataset for different deep hashing methods (Our_Method, SSDH, HashNet, DHN): (a) precision within Hamming radius 2 (P@r = 2) versus code length; (b) precision at different numbers of top retrieved samples (P@K) for 64-bit codes.

Our hashing method is also competitive with the other state-of-the-art deep hashing methods when the hash code length is 32 bits, and at the longer hash code lengths of 48 and 64 bits it is slightly better than the other deep hashing methods. Fig. 14(a) shows the Hamming precision curves for Hamming radius r = 2 (P@r = 2) at different hash code lengths for the deep hashing methods, and Fig. 14(b) shows the precision for a hash code length of 64 bits at different numbers of top retrieved results (P@K), again for the deep hashing methods only. Our hashing technique consistently provides better precision than all the other hashing methods for the same number of retrieved results. It is also noted from Fig. 14(a) that the precision at 32 bits is better than the precision at 48 and 64 bits. This is because, when longer binary codes are used, the data distribution in Hamming space becomes progressively sparser and fewer samples fall within the set Hamming radius [34].

ACKNOWLEDGMENT
This research was funded by the Center for Identification Technology Research (CITeR), a National Science Foundation (NSF) Industry/University Cooperative Research Center (I/UCRC).
REFERENCES

[1] A. Nagar, K. Nandakumar, and A. K. Jain, "Multibiometric cryptosystems based on feature-level fusion," IEEE Transactions on Information Forensics and Security, vol. 7, no. 1, pp. 255-268, Feb. 2012.
[2] A. Ross and A. K. Jain, "Multimodal biometrics: An overview," in Proc. European Signal Processing Conference, Sept. 2004, pp. 1221-1224.
[3] S. Rane, Y. Wang, S. C. Draper, and P. Ishwar, "Secure biometrics: Concepts, authentication architectures, and challenges," IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 51-64, Sept. 2013.
[4] Y. Sutcu, Q. Li, and N. Memon, "Protecting biometric templates with sketch: Theory and practice," IEEE Transactions on Information Forensics and Security, vol. 2, no. 3, pp. 503-512, Sept. 2007.
[5] A. Juels and M. Wattenberg, "A fuzzy commitment scheme," Nov. 1999, pp. 28-36.
[6] A. Juels and M. Sudan, "A fuzzy vault scheme," in Proc. IEEE International Symposium on Information Theory, July 2002, p. 408.
[7] K. Nandakumar, A. K. Jain, and S. Pankanti, "Fingerprint-based fuzzy vault: Implementation and performance," IEEE Transactions on Information Forensics and Security, vol. 2, no. 4, pp. 744-757, Dec. 2007.
[8] A. Nagar, K. Nandakumar, and A. K. Jain, "Securing fingerprint template: Fuzzy vault with minutiae descriptors," in Proc. 19th International Conference on Pattern Recognition, Dec. 2008.
[9] N. K. Ratha, S. Chikkerur, J. H. Connell, and R. M. Bolle, "Generating cancelable fingerprint templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 561-572, April 2007.
[10] A. Kong, K.-H. Cheung, D. Zhang, M. Kamel, and J. You, "An analysis of biohashing and its variants," Pattern Recognition, vol. 39, no. 7, pp. 1359-1368, July 2006.
[11] J. Zuo, N. K. Ratha, and J. H. Connell, "Cancelable iris biometric," in Proc. IEEE International Conference on Pattern Recognition, Dec. 2008, pp. 1-4.
[12] A. B. Teoh, Y. W. Kuan, and S. Lee, "Cancellable biometrics and annotations on biohash," Pattern Recognition, vol. 41, no. 6, pp. 2034-2044, June 2008.
[13] Y. Sutcu, Q. Li, and N. Memon, "Secure biometric templates from fingerprint-face features," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2007.
[14] K. Nandakumar and A. K. Jain, "Multibiometric template security using fuzzy vault," in Proc. IEEE International Conference on Biometrics: Theory, Applications and Systems, Oct. 2008.
[15] A. Gionis, P. Indyk, R. Motwani et al., "Similarity search in high dimensions via hashing," in Proc. International Conference on Very Large Data Bases, vol. 99, no. 6, Sept. 1999, pp. 518-529.
[16] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, "Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2916-2929, Dec. 2013.
[17] R. Xia, Y. Pan, H. Lai, C. Liu, and S. Yan, "Supervised hashing for image retrieval via image representation learning," in Proc. AAAI Conference on Artificial Intelligence, July 2014.
[18] H. Lai, Y. Pan, Y. Liu, and S. Yan, "Simultaneous feature learning and hash coding with deep neural networks," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2015, pp. 3270-3278.
[19] K. Lin, J. Lu, C. S. Chen, and J. Zhou, "Learning compact binary descriptors with unsupervised deep neural networks," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp. 1183-1192.
[20] H. Liu, R. Wang, S. Shan, and X. Chen, "Deep supervised hashing for fast image retrieval," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp. 2064-2072.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Proc. Advances in Neural Information Processing Systems, Dec. 2012, pp. 1097-1105.
[22] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, Sept. 2014.
[23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2015.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp. 770-778.
[25] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915-1929, Aug. 2013.
[26] K. He, F. Wen, and J. Sun, "K-means hashing: An affinity-preserving quantization method for learning binary compact codes," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2013, pp. 2938-2945.
[27] P. Jain, B. Kulis, and K. Grauman, "Fast image search for learned metrics," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2008.
[28] B. Kulis and T. Darrell, "Learning to hash with binary reconstructive embeddings," in Proc. Advances in Neural Information Processing Systems, Dec. 2009, pp. 1042-1050.
[29] M. Norouzi and D. M. Blei, "Minimal loss hashing for compact binary codes," in Proc. 28th International Conference on Machine Learning, July 2011, pp. 353-360.
[30] M. Raginsky and S. Lazebnik, "Locality-sensitive binary codes from shift-invariant kernels," in Proc. Advances in Neural Information Processing Systems, Dec. 2009, pp. 1509-1517.
[31] J. Wang, H. T. Shen, J. Song, and J. Ji, "Hashing for similarity search: A survey," CoRR, vol. abs/1408.2927, Aug. 2014.
[32] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in Proc. Advances in Neural Information Processing Systems, Dec. 2009, pp. 1753-1760.
[33] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[34] X. Yuan, L. Ren, J. Lu, and J. Zhou, "Relaxation-free deep hashing via policy gradient," in Proc. European Conference on Computer Vision (ECCV), September 2018.
[35] Z. Chen, X. Yuan, J. Lu, Q. Tian, and J. Zhou, "Deep hashing via discrepancy minimization," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6838-6847.
[36] H. Zhu, M. Long, J. Wang, and Y. Cao, "Deep hashing network for efficient similarity retrieval," in Proc. AAAI Conference on Artificial Intelligence, Feb. 2016, pp. 2415-2421.
[37] V. M. Patel, N. K. Ratha, and R. Chellappa, "Cancelable biometrics: A review," IEEE Signal Processing Magazine, vol. 32, no. 5, pp. 54-65, Sept. 2015.
[38] A. M. Canuto, F. Pintro, and J. C. Xavier-Junior, "Investigating fusion approaches in multi-biometric cancellable recognition," Expert Systems with Applications, vol. 40, no. 6, pp. 1971-1980, May 2013.
[39] E. J. C. Kelkboom, X. Zhou, J. Breebaart, R. N. J. Veldhuis, and C. Busch, "Multi-algorithm fusion with template protection," in Proc. IEEE International Conference on Biometrics: Theory, Applications, and Systems, Sept. 2009.
[40] B. Fu, S. X. Yang, J. Li, and D. Hu, "Multibiometric cryptosystem: Model structure and performance analysis," IEEE Transactions on Information Forensics and Security, vol. 4, no. 4, pp. 867-882, Dec. 2009.
[41] P. P. Paul and M. Gavrilova, "Multimodal cancelable biometrics," in Proc. IEEE International Conference on Cognitive Informatics & Cognitive Computing, Aug. 2012, pp. 43-49.
[42] C. Rathgeb and C. Busch, "Cancelable multi-biometrics: Mixing iris-codes based on adaptive bloom filters," Computers & Security, vol. 42, pp. 1-12, May 2014.
[43] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, 2015.
[44] K. D. Nguyen, C. Fookes, A. Ross, and S. Sridharan, "Iris recognition with off-the-shelf CNN features: A deep learning perspective," IEEE Access, vol. 6, pp. 18848-18855, 2017.
[45] Z. Zhao and A. Kumar, "Towards more accurate iris recognition using deeply learned spatially corresponding features," in Proc. IEEE International Conference on Computer Vision, 2017, pp. 3809-3818.
[46] S. Minaee, A. Abdolrashidiy, and Y. Wang, "An experimental study of deep convolutional features for iris recognition," 2016, pp. 1-6.
[47] F. Schroff, D. Kalenichenko, and J. Philbin, "Facenet: A unified embedding for face recognition and clustering," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815-823.
[48] Y. Sun, D. Liang, X. Wang, and X. Tang, "Deepid3: Face recognition with very deep neural networks," CoRR, vol. abs/1502.00873, 2015.
[49] O. M. Parkhi, A. Vedaldi, and A. Zisserman, "Deep face recognition," in Proc. British Machine Vision Conference (BMVC), September 2015, pp. 41.1-41.12.
[50] C. Feichtenhofer, A. Pinz, and A. Zisserman, "Convolutional two-stream network fusion for video action recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[51] T. Y. Lin, A. RoyChowdhury, and S. Maji, "Bilinear CNN models for fine-grained visual recognition," in Proc. IEEE International Conference on Computer Vision, Dec. 2015, pp. 1449-1457.
[52] Z. Cao, M. Long, J. Wang, and P. S. Yu, "Hashnet: Deep learning to hash by continuation," in Proc. IEEE International Conference on Computer Vision, Oct. 2017, pp. 5609-5618.
[53] Y. Sutcu, S. Rane, J. S. Yedidia, S. C. Draper, and A. Vetro, "Feature extraction for a Slepian-Wolf biometric system using LDPC codes," in Proc. IEEE International Symposium on Information Theory, July 2008, pp. 2297-2301.
[54] D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning face representation from scratch," CoRR, vol. abs/1411.7923, Nov. 2014.
[55] "WVU multimodal dataset." [Online]. Available: http://biic.wvu.edu/
[56] D. E. King, "Dlib-ml: A machine learning toolkit," J. Mach. Learn. Res., vol. 10, pp. 1755-1758, Dec. 2009.
[57] "CASIA-iris-thousand." [Online]. Available: http://biometrics.idealtest.org/
[58] K. W. Bowyer and P. J. Flynn, "The ND-IRIS-0405 iris image dataset," CVRL, Dept. Comp. Sci. Eng., Univ. Notre Dame, 2010.
[59] G. Sutra, B. Dorizzi, S. Garcia-Salitcetti, and N. Othman, "A biometric reference system for iris (osiris)," 2013.
[60] V. Talreja, M. C. Valenti, and N. M. Nasrabadi, "Multibiometric secure system based on deep learning," in Proc. IEEE Global Conference on Signal and Information Processing, Nov. 2017, pp. 298-302.
[61] H. Yang, K. Lin, and C. Chen, "Supervised learning of semantics-preserving hash via deep convolutional neural networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 2, pp. 437-451, Feb. 2018.
[62] Information Technology — Security Techniques — Biometric Information Protection, ISO/IEC 24745:2011, ISO/IEC JTC1 SC27 Security Techniques, ISO, 2011.
[63] M. Gomez-Barrero, J. Galbally, C. Rathgeb, and C. Busch, "General framework to evaluate unlinkability in biometric template protection systems," IEEE Transactions on Information Forensics and Security, vol. 13, no. 6, pp. 1406-1420, 2018.
Veeru Talreja is a Ph.D. candidate at West Virginia University (WVU), Morgantown, WV, USA. He received the M.S.E.E. degree from West Virginia University and the B.Engg. degree from Osmania University, Hyderabad, India. From 2010 to 2013 he worked as a Geospatial Software Developer with the West Virginia University Research Corporation. His research interests include applied machine learning, deep learning, coding theory, multimodal biometric recognition and security, and image retrieval.
Matthew C. Valenti (M'92 - SM'07 - F'18) received the M.S.E.E. degree from the Johns Hopkins University, Baltimore, MD, USA, and the B.S.E.E. and Ph.D. degrees from Virginia Tech, Blacksburg, VA, USA. He has been a faculty member with West Virginia University since 1999, where he is currently a Professor and the Director of the Center for Identification Technology Research. His research interests are in wireless communications, cloud computing, and biometric identification. He is the recipient of the 2019 MILCOM Award for Sustained Technical Achievement. He is active in the organization and oversight of several ComSoc-sponsored IEEE conferences, including MILCOM, ICC, and Globecom. He was Chair of the ComSoc Communication Theory Technical Committee from 2015 to 2016, was TPC chair for MILCOM'17, is Chair of the Globecom/ICC Technical Content (GITC) Committee (2018-2019), and is TPC co-chair for ICC'21 (Montreal). He was previously an Electronics Engineer with the U.S. Naval Research Laboratory, Washington, DC, USA. Dr. Valenti is registered as a Professional Engineer in the state of West Virginia.