Cross-Spectral Iris Matching Using Conditional Coupled GAN
Moktari Mostofa, Fariborz Taherkhani, Jeremy Dawson, Nasser M. Nasrabadi
West Virginia University
{mm0251, ft0009}@mix.wvu.edu, {jeremy.dawson, nasser.nasrabadi}@mail.wvu.edu

Abstract
Cross-spectral iris recognition is emerging as a promising biometric approach to authenticating the identity of individuals. However, matching iris images acquired in different spectral bands shows significant performance degradation compared to single-band near-infrared (NIR) matching due to the spectral gap between iris images obtained in the NIR and visible-light (VIS) spectra. Although researchers have recently focused on deep-learning-based approaches to recover invariant representative features for more accurate recognition performance, the existing methods cannot achieve the accuracy required for commercial applications. Hence, in this paper, we propose a conditional coupled generative adversarial network (CpGAN) architecture for cross-spectral iris recognition that projects the VIS and NIR iris images into a low-dimensional embedding domain to explore the hidden relationship between them. The conditional CpGAN framework consists of a pair of GAN-based networks, one responsible for retrieving images in the visible domain and the other responsible for retrieving images in the NIR domain. Both networks map the data into a common embedding subspace to ensure maximum pair-wise similarity between the feature vectors of the two iris modalities of the same subject. To demonstrate the usefulness of our proposed approach, extensive experimental results obtained on the PolyU dataset are compared to existing state-of-the-art cross-spectral recognition methods.
1. Introduction
Iris recognition has received considerable attention in personal identification [4, 9] due to the highly distinctive spatial texture patterns of the iris. It is considered one of the most reliable and secure identity verification methods in biometrics [5, 12]. The human iris pattern exhibits unique and distinctive textures due to the process of chaotic morphogenesis that forms it in early childhood, exhibiting variation even among identical twins.

Figure 1. VIS and NIR iris images from the PolyU bi-spectral iris database.

Therefore, iris recognition has been extensively used in ID authentication tasks. Many applications require both probe and gallery iris images to be captured in the same optical spectrum, under either near-infrared (NIR) or visible light (VIS), for homogeneous iris recognition. Recently, high-resolution visible surveillance cameras that can capture usable opportunistic iris images have enabled biometric systems that could potentially compare these visible iris images to a NIR gallery using cross-spectral matching. Cross-spectral iris matching is defined as the ability to match iris images acquired in different spectral bands (e.g., VIS at 400-750 nm wavelength and NIR at 750-1400 nm wavelength) [2].

Therefore, to facilitate effective iris matching, cross-spectral iris recognition systems have recently been developed [3, 14, 16, 18]. However, existing methods still suffer from significant performance degradation [23], and the spectral difference is believed to be the major cause of the poor recognition performance. As shown in Fig. 1, the visual differences between the VIS and NIR iris images make it obvious that the choice of illumination spectrum plays a vital role in emphasizing imaged iris patterns. For instance, iris textures are clearly visible in the VIS spectrum, and complex patterns are even highlighted under VIS illumination. However, the recognition performance is highly affected by reflections that occlude the iris pattern in certain regions. On the other hand, although almost all of the prominent iris texture patterns are missing in the NIR images, iris recognition in the NIR spectrum is more reliable than in the VIS spectrum due to fewer reflections. Therefore, matching iris images across spectral domains is a challenging task that needs to be explored to achieve high accuracy in cross-spectral iris matching.

Previous research shows that the most essential inner properties of an image can be mapped to a reduced low-dimensional latent subspace. A latent subspace is a compressed representation of the image space, which contains the most relevant and useful features of the raw data. In this paper, we hypothesize that iris images in the VIS domain are connected to iris images in the NIR domain in a low-dimensional latent embedded feature subspace. Our goal is to explore this hidden correlation by projecting VIS and NIR iris images into a common latent embedding subspace. Moreover, we posit that if we perform verification in the latent domain, matching results will be more accurate due to the shared common features in that domain. Therefore, we propose a deep coupled learning framework for cross-spectral iris matching, which utilizes a conditional coupled generative adversarial network (CpGAN) to learn a common embedded feature vector by exploring the correlation between the NIR and VIS iris images in a reduced-dimensional latent embedding feature subspace.
The key benefits of our proposed iris recognition approach can be summarized as follows:

• A novel framework for cross-spectral iris matching using a coupled generative adversarial network is proposed.

• Comprehensive experiments on the benchmark PolyU Bi-Spectral dataset, with comparable results against the baseline methods, ascertain the validity of the proposed CpGAN framework.

• The proposed framework investigates the potential of GAN-based networks to improve the performance of traditional cross-spectral iris recognition methods.
2. Literature Review
In recent years, cross-spectral iris matching has gained significant interest in the biometric research community for security, national ID programs, and personal identity verification purposes [3, 14, 16, 18]. The accuracy of an iris recognition system depends most importantly on the feature extraction approach; hence, a robust feature extraction method for representing iris texture patterns is essential in cross-spectral iris matching. Oktiana et al. [15] provide a description of several feature representation methods based on the VIS and NIR imaging systems. Among them, LBP and BSIF have been found [15] to be the best feature descriptors for accurately extracting iris texture patterns for cross-spectral matching.

In [22], the authors proposed a feature descriptor that applies a 2D Gabor filter bank to compute the iris pattern at multiple scales and orientations (see the illustrative sketch below). Iris images captured in the VIS spectrum often suffer from noise due to illumination, occlusions, and position shifting; the authors therefore utilized difference-of-variance (DoV) features, which are invariant to noise, to divide the iris template into sub-blocks. However, this method could not achieve the high accuracy required for practical applications (a high EER of 31.08%) because it is unable to relate the information contained in the NIR and VIS images.

In the work of Abdullah et al. [1], the matching accuracy increased with a 24.28% decrease in EER. They employed a 1D log-Gabor filter with three different descriptors, namely the Gabor difference of Gaussian (G-DoG), Gabor binarized statistical image features (G-BSIF), and Gabor multiscale weberface (G-MSW), and achieved a much lower EER of 6.8%. According to the report in [20], this is also considered the most accurate cross-spectral iris recognition method.

With the advent of convolutional neural networks (CNNs), cross-spectral iris recognition research efforts have concentrated more on feature learning through convolutional layers [23]. In [23], the authors observed that CNN-based features carry sparse information and offer a compact representation for the iris template, which is significantly reduced in size. Moreover, their approach incorporates supervised discrete hashing on the learned features to achieve excellent results compared to other CNN-based iris recognition methods. Their proposed method resulted in an EER of 5.39%.
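To make the classical pipeline concrete, the following is an illustrative sketch of a multi-scale, multi-orientation 2D Gabor filter bank of the kind used by the hand-crafted descriptors above, written with OpenCV. The filter parameters (`scales`, `sigma`, `lambd`, etc.) are arbitrary example values of ours, not those used in [22].

```python
# Illustrative only: a classical multi-scale, multi-orientation Gabor
# filter bank for iris texture responses; parameters are example values.
import cv2
import numpy as np

def gabor_features(iris, scales=(9, 17, 25), n_orient=6):
    """iris: 2-D grayscale array (e.g., an unrolled iris template)."""
    feats = []
    for ksize in scales:                      # one kernel size per scale
        for k in range(n_orient):             # evenly spaced orientations
            theta = k * np.pi / n_orient
            kern = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                      lambd=10.0, gamma=0.5, psi=0)
            feats.append(cv2.filter2D(iris, cv2.CV_32F, kern))
    return np.stack(feats)                    # one response map per scale/orientation
```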
3. Generative Adversarial Network
Recently, GANs have received considerable attention from the deep learning research community due to their significant contributions to image generation tasks. The basic GAN framework consists of two modules: a generator module, $G$, and a discriminator module, $D$. The objective of the generator, $G$, is to learn a mapping, $G : z \rightarrow y$, so that it can produce synthesized samples from a noise variable, $z$, with a prior noise distribution, $p_z(z)$, that are difficult for the discriminator, $D$, to distinguish from the real data distribution, $p_{data}$, over $y$. The generator, $G(z; \theta_g)$, is a differentiable function trained with parameters $\theta_g$ to map the noise variable, $z$, to the actual data space, $y$. Simultaneously, the discriminator, $D$, is trained as a binary classifier with parameters $\theta_d$ such that it can distinguish the real samples, $y$, from the fake ones, $G(z)$. Both the generator and discriminator networks compete with each other in a two-player minimax game. We calculate the following loss function, $L(D, G)$, for the GAN:

$$L(D,G) = \mathbb{E}_{y \sim p_{data}(y)}[\log D(y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \quad (1)$$

The objective function of the GAN defines the term "two-player minimax game" by optimizing the loss function, $L(D,G)$, as follows:

$$\min_G \max_D L(D,G) = \min_G \max_D \left[ \mathbb{E}_{y \sim p_{data}(y)}[\log D(y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \right]. \quad (2)$$

One of the variants of the GAN, introduced in [13], is the conditional GAN (cGAN), which expands the scope of synthesized image generation by setting a condition for both the generative and discriminative networks. The cGAN applies an auxiliary variable, $x$, as a condition, which could be any kind of useful information such as text [19], images [8], or discrete labels [13]. The loss function for the cGAN, $L_c(D,G)$, can be represented as follows:

$$L_c(D,G) = \mathbb{E}_{y \sim p_{data}(y)}[\log D(y|x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|x)))]. \quad (3)$$

Similar to (2), the objective function of the cGAN is optimized in a two-player minimax manner, which is denoted as $L_{cGAN}(D,G,y,x)$ and defined by:

$$L_{cGAN}(D,G,y,x) = \min_G \max_D \left[ \mathbb{E}_{y \sim p_{data}(y)}[\log D(y|x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|x)))] \right]. \quad (4)$$
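As a concrete illustration of the objectives in (1)-(4), the PyTorch sketch below implements one discriminator update and one generator update for a conditional GAN. `G` and `D` are placeholder modules of ours (a conditional `D` here takes the image and the condition as two arguments), and the non-saturating generator objective is the usual practical substitute for minimizing the $\log(1 - D(G(z|x)))$ term.

```python
# Minimal sketch of one cGAN training step, assuming hypothetical
# modules G(z, x_cond) and D(y, x_cond) where D ends in a sigmoid.
import torch
import torch.nn as nn

bce = nn.BCELoss()

def d_step(D, G, y_real, x_cond, z):
    """Discriminator update: maximize log D(y|x) + log(1 - D(G(z|x)))."""
    real_score = D(y_real, x_cond)
    fake_score = D(G(z, x_cond).detach(), x_cond)   # detach: do not update G here
    return (bce(real_score, torch.ones_like(real_score)) +
            bce(fake_score, torch.zeros_like(fake_score)))

def g_step(D, G, x_cond, z):
    """Generator update via the non-saturating trick: maximize log D(G(z|x))."""
    fake_score = D(G(z, x_cond), x_cond)
    return bce(fake_score, torch.ones_like(fake_score))
```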
4. Proposed Method
Our proposed method aims to further advance cross-spectral iris matching systems by utilizing the capabilities of GAN-based approaches. We do not generate a synthesized NIR image of a VIS counterpart before matching. Instead, we specifically focus on projecting both the NIR and VIS iris images into a common latent low-dimensional embedding subspace using a generative network. We explore this low-dimensional latent feature subspace for matching iris images in the cross-spectral domain with the help of an adversarial network, owing to its great success in finding optimal solutions for synthetic image generation.
Our proposed conditional CpGAN for iris matching in the cross-spectral domain consists of two conditional GAN (cGAN) modules, as shown in Fig. 2. One of them is dedicated to reconstructing the VIS iris images, and we therefore refer to it as the VIS cGAN module. Similarly, the other module is dedicated to synthesizing the NIR iris images and is referred to as the NIR cGAN module. In this work, we use a U-Net architecture for the generator to obtain the low-dimensional embedded subspace for cross-spectral iris matching via a contrastive loss along with the standard adversarial loss. In addition to the adversarial loss and contrastive loss [6], the perceptual loss [10] and an $L_2$ reconstruction loss are also used to guide the generators towards the optimal solutions. The perceptual loss is measured via a pre-trained VGG-16 network, which helps in the sharp and realistic reconstruction of the images.

Our prime goal is to match a VIS iris probe against a gallery of NIR iris images that have not been seen by the network during training. To perform this matching in the cross-spectral domain, a discriminative model is required to produce a domain-invariant representation. Therefore, we focus on learning iris feature representations in a common embedding subspace by incorporating a U-Net auto-encoder architecture that uses a class-specific contrastive loss to match the iris patterns in the latent domain.

As previously mentioned, we use a U-Net auto-encoder architecture in our generator for its structural ability to extract features in the latent embedding subspace. More specifically, the contracting path of the "U-shaped" structure of the U-Net captures contextual information, which is passed directly across all the layers, including the bottleneck. Also, the high-dimensional features of the contracting path of the U-Net, combined with the corresponding upsampled features of the symmetric expanding path, provide a means to share useful information throughout the network. Moreover, during domain transformation, a significant amount of low-level information needs to be shared between input and output, which can be accomplished by leveraging a U-Net-like architecture.

We have followed the architecture of patch-based discriminators [8] to design the discriminators of our proposed model. The discriminators are trained simultaneously along with their respective generators. It is worth mentioning that the $L_2$ loss performs very well at preserving low-frequency details but fails to preserve high-frequency information, whereas a patch-based discriminator ensures the preservation of high-frequency details, since it penalizes structure at the scale of patches.

Although the VIS and NIR iris images are in different domains, they gradually build a connection in the common embedding feature subspace. The features are domain invariant in the embedded subspace, which lends it credibility to discriminate images based on identity. Our final objective is to find a set of domain-invariant features in a common embedding subspace by coupling the two generators via a contrastive loss function, $L_{cont}$ [6].

Figure 2. Architecture of our proposed conditional CpGAN framework. During training, the contrastive loss function is used in the latent embedding subspace to optimize the network parameters so that latent features of iris images from different spectral domains of the same identity are close to each other, while the features of different identities are pushed further apart.

The contrastive loss function, $L_{cont}$, is defined as a distance-based loss metric computed over a set of pairs in the common embedding subspace such that images belonging to the same identity (genuine pairs, i.e., a VIS iris image of a subject with its corresponding NIR iris image) are embedded as close as possible, and images of different identities (impostor pairs, i.e., a VIS iris image of a subject with a NIR iris image of a different subject) are pushed further apart from each other. The contrastive loss function is formulated as:

$$L_{cont}(z_1(x_{VIS}^i), z_2(x_{NIR}^j), Y) = (1 - Y)\,\tfrac{1}{2}(D_z)^2 + Y\,\tfrac{1}{2}\left(\max(0, m - D_z)\right)^2, \quad (5)$$

where $x_{VIS}^i$ and $x_{NIR}^j$ denote the input VIS and NIR iris images, respectively. The variable $Y$ is a binary label, which is set to 0 if $x_{VIS}^i$ and $x_{NIR}^j$ belong to the same class (i.e., a genuine pair), and to 1 if they belong to different classes (i.e., an impostor pair). $z_1(\cdot)$ and $z_2(\cdot)$ denote the encoding functions of the U-Net auto-encoders, which transform $x_{VIS}^i$ and $x_{NIR}^j$, respectively, into the common latent embedding subspace. Here, $m$ is used as the contrastive margin to "tighten" the constraint. The Euclidean distance, $D_z$, between the outputs of the functions $z_1(x_{VIS}^i)$ and $z_2(x_{NIR}^j)$ is given by:

$$D_z = \left\| z_1(x_{VIS}^i) - z_2(x_{NIR}^j) \right\|_2. \quad (6)$$

Therefore, if $Y = 0$ (i.e., a genuine pair), the contrastive loss function, $L_{cont}$, is given as:

$$L_{cont}(z_1(x_{VIS}^i), z_2(x_{NIR}^j), Y) = \frac{1}{2} \left\| z_1(x_{VIS}^i) - z_2(x_{NIR}^j) \right\|_2^2, \quad (7)$$

and if $Y = 1$ (i.e., an impostor pair), the contrastive loss function, $L_{cont}$, is:

$$L_{cont}(z_1(x_{VIS}^i), z_2(x_{NIR}^j), Y) = \frac{1}{2} \max\left(0,\, m - \left\| z_1(x_{VIS}^i) - z_2(x_{NIR}^j) \right\|_2\right)^2. \quad (8)$$

Thus, the total loss for coupling the VIS generator and the NIR generator is denoted by $L_{cpl}$ and is given as:

$$L_{cpl} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} L_{cont}(z_1(x_{VIS}^i), z_2(x_{NIR}^j), Y), \quad (9)$$

where $N$ is the number of training samples. The contrastive loss in the above equation could be replaced by some other distance-based metric, such as the Euclidean distance. However, the main aim of using the contrastive loss is to use the class labels implicitly and find a discriminative embedding subspace, which may not be the case with another metric such as the Euclidean distance. This discriminative embedding subspace is useful for matching the VIS iris images against the gallery of NIR iris images.
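The contrastive coupling in (5)-(9) translates directly into code. Below is a minimal PyTorch version, where `z_vis` and `z_nir` are batches of latent codes produced by the two U-Net encoders; the function and variable names are ours, not the authors'.

```python
# Direct transcription of Eqs. (5)-(8); the batch mean approximates
# the double sum of Eq. (9). Y = 0 for genuine pairs, 1 for impostors.
import torch

def contrastive_loss(z_vis, z_nir, Y, m=1.0):
    """z_vis, z_nir: (B, d) latent codes; Y: (B,) float labels; m: margin."""
    d = torch.norm(z_vis - z_nir, p=2, dim=1)           # Euclidean distance D_z
    genuine = 0.5 * d.pow(2)                            # pull same-identity pairs together
    impostor = 0.5 * torch.clamp(m - d, min=0).pow(2)   # push impostors beyond margin m
    return ((1 - Y) * genuine + Y * impostor).mean()
```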
5. Loss Functions
Here, we denote $G_{VIS}$ and $G_{NIR}$ as the VIS and NIR generators that reconstruct the corresponding VIS and NIR iris images from the input VIS and NIR iris images, respectively. $D_{VIS}$ and $D_{NIR}$ denote the patch-based discriminators used for the VIS and NIR iris GANs. Since we use a conditional GAN in our proposed method, we condition the generator networks, $G_{VIS}$ and $G_{NIR}$, on the input VIS and NIR iris images, respectively. In addition, we train the generators and the corresponding discriminators with the conditional GAN loss function [13] to ensure the reconstruction of real-looking natural images, such that the discriminators cannot distinguish the generated images from the real ones. Let $L_{VIS}$ and $L_{NIR}$ denote the conditional GAN loss functions for the VIS and NIR GANs, respectively, where $L_{VIS}$ and $L_{NIR}$ are given as:

$$L_{VIS} = L_{cGAN}(D_{VIS}, G_{VIS}, y_{VIS}^i, x_{VIS}^i), \quad (10)$$

$$L_{NIR} = L_{cGAN}(D_{NIR}, G_{NIR}, y_{NIR}^j, x_{NIR}^j), \quad (11)$$

where $L_{cGAN}$ is the conditional GAN objective function defined in (4). The term $x_{VIS}^i$ denotes the VIS iris image that serves as the condition for the VIS GAN, and $y_{VIS}^i$ denotes the real VIS iris image. It is worth mentioning that the real VIS iris image, $y_{VIS}^i$, is the same as the network condition given by $x_{VIS}^i$. Similarly, $x_{NIR}^j$ denotes the NIR iris image that is used as the condition for the NIR GAN and, like the VIS case, the real NIR iris image, $y_{NIR}^j$, is the same as the network condition given by $x_{NIR}^j$. The total objective function for the coupled conditional GAN is given by:

$$L_{GAN} = L_{VIS} + L_{NIR}. \quad (12)$$
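For the generator side of (10)-(12), a minimal sketch is the following, where `d_vis_out` and `d_nir_out` are assumed to be sigmoid patch-discriminator score maps for the two reconstructed images; the names are ours, not from a released implementation.

```python
# Sketch of Eq. (12): the two conditional adversarial losses are summed.
# BCE against all-ones on the patch score maps drives each generator
# toward reconstructions the patch discriminators accept as real.
import torch
import torch.nn as nn

bce = nn.BCELoss()

def coupled_gan_loss(d_vis_out, d_nir_out):
    l_vis = bce(d_vis_out, torch.ones_like(d_vis_out))   # L_VIS, patch-wise
    l_nir = bce(d_nir_out, torch.ones_like(d_nir_out))   # L_NIR, patch-wise
    return l_vis + l_nir                                 # L_GAN = L_VIS + L_NIR
```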
$L_2$ Reconstruction Loss

For both the VIS and NIR GANs, we consider the $L_2$ reconstruction loss as a classical constraint to ensure better results. The $L_2$ reconstruction loss is measured in terms of the Euclidean distance between the reconstructed iris image and the corresponding real iris image. We denote the reconstruction loss for the VIS GAN as $L_{2_{VIS}}$ and define it as follows:

$$L_{2_{VIS}} = \left\| G_{VIS}(z|x_{VIS}^i) - y_{VIS}^i \right\|_2, \quad (13)$$

where $y_{VIS}^i$ is the ground-truth VIS iris image and $G_{VIS}(z|x_{VIS}^i)$ is the output of the VIS generator. Similarly, let us denote the reconstruction loss for the NIR GAN as $L_{2_{NIR}}$:

$$L_{2_{NIR}} = \left\| G_{NIR}(z|x_{NIR}^j) - y_{NIR}^j \right\|_2, \quad (14)$$

where $y_{NIR}^j$ is the ground-truth NIR iris image and $G_{NIR}(z|x_{NIR}^j)$ is the output of the NIR generator. The total $L_2$ reconstruction loss is given by the following equation:

$$L_2 = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( L_{2_{VIS}} + L_{2_{NIR}} \right). \quad (15)$$

Although the GAN loss and the reconstruction loss are used to guide the generators, they fail to reconstruct perceptually pleasing images, i.e., images with perceptual features defined by the visually deterministic properties of objects. Hence, we have also used the perceptual loss introduced in [10] for style transfer and super-resolution. The perceptual loss function measures high-level differences, such as content and style dissimilarity, between images. It is based on high-level representations from a pre-trained VGG-16 [21] CNN. Moreover, it helps the network generate better and sharper high-quality images [10]. As a result, it can be a good alternative to solely using the $L_1$ or $L_2$ reconstruction error.

In our proposed approach, we have added the perceptual loss to both the VIS and NIR GAN modules using a pre-trained VGG-16 [21] network. It involves extracting the high-level features (ReLU3-3 layer) of VGG-16 for both the real input image and the reconstructed output of the U-Net generator. The perceptual loss calculates the $L_2$ distance between the features of the real and reconstructed images to guide the generators $G_{VIS}$ and $G_{NIR}$. The perceptual loss for the VIS GAN network is defined as:

$$L_{P_{VIS}} = \frac{1}{C_p W_p H_p} \sum_{c=1}^{C_p} \sum_{w=1}^{W_p} \sum_{h=1}^{H_p} \left\| V(G_{VIS}(z|x_{VIS}^i))_{c,w,h} - V(y_{VIS}^i)_{c,w,h} \right\|, \quad (16)$$

where $V(\cdot)$ denotes a particular layer of VGG-16 and $C_p$, $W_p$, and $H_p$ denote the layer dimensions. Likewise, the perceptual loss for the NIR GAN network is:

$$L_{P_{NIR}} = \frac{1}{C_p W_p H_p} \sum_{c=1}^{C_p} \sum_{w=1}^{W_p} \sum_{h=1}^{H_p} \left\| V(G_{NIR}(z|x_{NIR}^j))_{c,w,h} - V(y_{NIR}^j)_{c,w,h} \right\|. \quad (17)$$

The total perceptual loss function is given by:

$$L_P = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( L_{P_{VIS}} + L_{P_{NIR}} \right). \quad (18)$$

We sum up all the loss functions defined above to obtain the overall objective function for our proposed method:

$$L_{tot} = L_{cpl} + \lambda_1 L_{GAN} + \lambda_2 L_P + \lambda_3 L_2, \quad (19)$$

where $L_{cpl}$ is the coupling loss, $L_{GAN}$ is the total generative adversarial loss, $L_P$ is the total perceptual loss, and $L_2$ is the total reconstruction error. The hyper-parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weight factors that numerically balance the magnitudes of the different loss terms.
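A hedged PyTorch sketch of the perceptual term in (16)-(18) and the weighted sum in (19) is shown below. The slice index follows torchvision's VGG-16 layer layout for relu3_3, and the individual loss tensors (`loss_cpl`, `loss_gan`, `loss_perc`, `loss_l2`) are assumed to be computed as in the previous sections; none of this is the authors' released code.

```python
# Perceptual loss via a frozen VGG-16 truncated at relu3_3 (index 15 in
# torchvision's feature stack), normalized by the layer dimensions.
import torch
from torchvision import models

vgg = models.vgg16(pretrained=True).features[:16].eval()   # up to relu3_3
for p in vgg.parameters():
    p.requires_grad = False                                 # feature extractor only

def perceptual_loss(fake, real):
    """fake, real: (B, 3, H, W) tensors; grayscale iris images would need
    channel replication and ImageNet normalization before this call."""
    f_fake, f_real = vgg(fake), vgg(real)
    c, h, w = f_fake.shape[1:]
    return torch.norm(f_fake - f_real, p=2) / (c * h * w)

# Overall objective of Eq. (19), with the weights reported in Section 6:
# loss_tot = loss_cpl + 1.0 * loss_gan + 0.3 * loss_perc + 0.3 * loss_l2
```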
6. Experiments
In this section, we first describe the dataset and the training details of our implementation. To show the efficiency of our method for the task of iris recognition in the cross-spectral domain, we compare its performance with other existing cross-spectral iris recognition methods.
Datasets: We conduct the experiments using the PolyU Bi-Spectral database [14, 23] (see Figure 1), which contains iris images of 209 subjects obtained simultaneously in both the VIS and NIR wavelengths. The data for each subject consist of 15 different instances of right-eye and left-eye images in both the VIS and NIR spectra; the dataset therefore contains 12,540 images in total. For the experiments, we split the dataset into training and testing sets: we choose the images of the last 168 of the identities as the training set and all the images of the remaining identities as the testing set.

Implementation Details: We have implemented our CpGAN architecture using the U-Net architecture as the generator module. We follow the typical CNN architecture for the implementation of both the encoder and decoder sections of the U-Net model. The encoder section applies two 3 × 3 convolutions, each followed by a rectified linear unit (ReLU), and uses 2 × 2 max pooling with stride 2 for downsampling, doubling the number of feature channels at each downsampling step. Similarly, each step in the decoder section upscales the feature map by applying a 2 × 2 transpose convolution ("deconvolution") and halves the number of feature channels. After upsampling, each feature map is concatenated with the corresponding feature map from the encoder, followed by two 3 × 3 convolutions with a ReLU activation function. A sketch of these building blocks is given after this section.

The proposed framework has been implemented in PyTorch. We trained the network with a batch size of 16 and a learning rate of 0.0002. We used the Adam optimizer [11] with a first-order momentum of 0.5 and a second-order momentum of 0.999. We used the Leaky ReLU activation function with a slope of 0.35 for the discriminator. For network convergence, we set $\lambda_1 = 1$ and $\lambda_2 = \lambda_3 = 0.3$.

For training, genuine/impostor pairs are created from the VIS and NIR iris images of the same/different subjects. During the experiments, we ensure that the training set is balanced by using the same number of genuine and impostor pairs.
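The encoder/decoder steps described above can be sketched as follows. This is our reconstruction of the stated design (3 × 3 convolutions with ReLU, 2 × 2 max pooling, 2 × 2 transpose convolutions with skip concatenation), not the authors' released code; the optimizer settings at the end are the ones reported above.

```python
# Minimal U-Net building blocks consistent with the stated design.
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    """Two 3x3 convolutions, each followed by ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class Down(nn.Module):
    """Contracting step: 2x2 max pool (stride 2), channels double."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(nn.MaxPool2d(2), double_conv(c_in, c_out))
    def forward(self, x):
        return self.block(x)

class Up(nn.Module):
    """Expanding step: 2x2 transpose conv halves channels, then the skip
    feature map from the encoder is concatenated before two 3x3 convs."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_in, c_out, kernel_size=2, stride=2)
        self.conv = double_conv(c_in, c_out)      # c_in channels again after concat
    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([skip, x], dim=1)           # skip connection from encoder
        return self.conv(x)

# Training setup reported in Section 6 (G and D assumed to exist):
# opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
# opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```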
7. Evaluation on PolyU Bi-Spectral Database
We have evaluated our proposed method on the PolyU Bi-Spectral benchmark iris dataset, which contains co-registered eye images in both the VIS and NIR spectra. We conduct several experiments to show the efficacy of our proposed scheme. In all experiments, each probe image of the test set is matched against a gallery of images from a different domain (e.g., VIS or NIR). As a consequence, we obtain genuine and impostor scores, which guide the calculation of the essential recognition performance parameters, such as the genuine acceptance rate (GAR), the false acceptance rate (FAR), and the equal error rate (EER). In addition, we plot receiver operating characteristic (ROC) curves to analyze the GAR with respect to the FAR; a sketch of how these metrics can be derived from the match scores is given below.
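The following is a minimal sketch of how GAR/FAR and the EER can be computed from genuine and impostor score distributions; these are the standard definitions, not the authors' evaluation code.

```python
# Standard threshold sweep over pooled match scores (higher = more similar).
import numpy as np

def eer(genuine, impostor):
    """genuine, impostor: 1-D arrays of similarity scores."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    idx = np.argmin(np.abs(far - frr))        # operating point where FAR ~ FRR
    return 0.5 * (far[idx] + frr[idx])        # EER; note GAR = 1 - FRR
```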
We have studied the following cases for cross-spectral iris matching:

(a) Matching High-Resolution VIS iris images against a gallery of High-Resolution NIR iris images
In this experiment, we train our network with the unrolled high-resolution VIS and NIR iris images such that the VIS and NIR generators are trained to obtain domain-invariant features in a common embedding subspace. Our purpose is to use the trained network for matching high-resolution (HR) VIS iris images against a gallery of high-resolution (HR) NIR iris images, which were unseen by the network during training. We evaluate the performance of this network on the PolyU Bi-Spectral dataset. To show the comparative performance, we consider other state-of-the-art deep learning approaches (Wang et al. [23, 24] and Oktiana et al. [16]), which apply different types of feature extraction techniques. In addition, we have plotted ROC curves comparing our proposed approach with the baseline algorithms mentioned above. The results are summarized in Table 1.

Figure 3. ROC curves showing the results obtained on the PolyU Bi-Spectral database.

From Fig. 3 and Table 1, we observe that our proposed CpGAN framework performs much better than the other baseline matching algorithms (Wang et al. [23, 24] and Oktiana et al. [16]). In this setting, our method achieves 1.67% higher identification accuracy with a 4.37% decrease in EER compared to the most recent cross-spectral iris recognition method [23]. Additionally, it outperforms the methods described in [16, 24] by a significant decrease of 0.67% and 16.01% in EER, respectively. This significant improvement clearly indicates that using a CpGAN framework to project the VIS and NIR iris images into the latent embedding subspace and retrieve domain-invariant features is better than the other existing deep learning methods.
(b) Matching High-Resolution VIS iris images against a gallery of Low-Resolution NIR iris images
Here, we analyze our network by considering a realistic scenario for cross-spectral iris recognition systems. As mentioned, in surveillance-based iris recognition systems, surveillance cameras capture high-resolution images under the visible spectrum, while the images already stored in the gallery are in the NIR domain and have a lower resolution. Therefore, it has become a challenging issue for existing cross-spectral iris recognition systems to ascertain the correlation between iris images at different resolutions as well as in different spectra; their limitations in retrieving accurate semantic similarity from iris images of different resolutions and spectra have resulted in significant performance degradation. One probable way to resolve this issue is to train the CpGAN network with unrolled high-resolution (HR) VIS and low-resolution (LR) NIR iris images, which ensures the retrieval of the contextual and semantic features of the iris images in a common embedding subspace. To verify the usefulness of this network, we match HR VIS iris images against a gallery of LR NIR iris images using the publicly available PolyU Bi-Spectral dataset. The results summarized in Fig. 3 and Table 1 indicate that the network remains robust enough to outperform the methods described in [16, 24], which may contribute significantly to real-life applications.

(c) Matching Low-Resolution VIS iris images against a gallery of High-Resolution NIR iris images
In most cross-spectral iris recognition systems, researchers have focused on matching high-resolution VIS iris images against a gallery of low-resolution NIR iris images. They did not consider the scenario where matching could be performed using low-resolution VIS probe iris images. To illustrate the point, it is worth considering that surveillance cameras in public areas, having a large field of view, often capture images of subjects at a large standoff distance from the camera [7]. Due to this fact, the captured images are expected to be of low resolution and suffer from poor quality. On the other hand, the gallery images are of high resolution and are generally collected in the NIR spectrum. Matching with such a modality gap between probe and gallery images makes the cross-spectral recognition problem even more challenging. Hence, we emphasize this surveillance scenario and train the VIS and NIR generators of our network with the unrolled LR VIS iris images and HR NIR iris images, respectively. The matching is performed in the latent embedded subspace, as it contains all the information about the iris texture patterns irrespective of the resolution.

Table 1. Comparative performances on the PolyU Bi-Spectral database. The symbol '—' indicates that the metric is not available for that protocol.

| Algorithm | Matching | GAR@FAR=0.01 | GAR@FAR=0.001 | EER |
|---|---|---|---|---|
| Wang et al. [24] | HR VIS vs HR NIR | 59.10 | 37.00 | 17.03 |
| CNN with SDH [23] | HR VIS vs HR NIR | 90.71 | 84.50 | 5.39 |
| Garg et al. [7] | HR VIS vs HR NIR | — | — | 48.86 |
| GRF BSIF [16] | HR VIS vs HR NIR | 82.92 | 79.12 | 1.69 |
| GRF LBP [16] | HR VIS vs HR NIR | 69.23 | 67.04 | 4.2 |
| Ours (CpGAN) | HR VIS vs HR NIR | 92.38 | 84.98 | 1.02 |
| Ours (CpGAN) | HR VIS vs LR NIR | 89.89 | 81.21 | 1.21 |
| Ours (CpGAN) | HR NIR vs LR VIS | 84.75 | 73.45 | 1.26 |
| Ours (CpGAN) | LR NIR vs LR VIS | 70.10 | 59.97 | 2.51 |

From Fig. 3 and Table 1, we observe that our proposed algorithm achieves an 84.75% genuine acceptance rate (GAR) at 0.01 FAR and a 1.26% equal error rate (EER), which, as in the previous test cases, outperforms the results reported by Oktiana et al. [16] and Wang et al. [24]. The network obtains 4.13% and 15.77% lower EER compared to the results reported in [23] and [24], respectively.

(d) Matching Low-Resolution VIS iris images against a gallery of Low-Resolution NIR iris images

In addition to the studies mentioned above, we have also investigated the matching performance of our network when the gallery images are in the low-resolution NIR domain. To train the network, we feed both the VIS and NIR generators with the unrolled LR VIS and LR NIR iris images, respectively. The experimental results, reported in Table 1 and Fig. 3, indicate that even though this network achieves an EER of 2.51%, which is much lower than several comparable methods, the verification performance is not as satisfactory as in our previous experiments.
(e) Cross-Spectral Iris Matching in the Synthesized Domain
In order to achieve accurate iris recognition performance in the cross-spectral domain, many researchers have applied domain transformation techniques before matching. However, in our proposed method, we perform matching in a modality-invariant embedded subspace utilizing its latent feature vectors. A range of experiments has been conducted to validate the efficacy of our proposed method. Moreover, we have also investigated the impact of the domain transformation technique followed by cross-spectral iris matching. Several experiments have been performed to quantify the performance of our model in the following scenarios:

(1) We have utilized the feature vectors generated in the embedding subspace of the VIS cGAN generator to produce synthesized NIR iris images. In more detail, the VIS cGAN generator is trained with the unrolled HR VIS iris images such that the generated feature vectors can be fed to the decoder section of the NIR cGAN generator network to synthesize the corresponding HR NIR iris images. These synthesized HR NIR iris images are then matched against the HR NIR iris gallery. We used the OSIRIS [17] software for matching.

(2) Similarly, the feature vectors generated from the NIR cGAN generator are used as input to the decoder of the VIS cGAN network to obtain synthesized HR VIS iris images, which are matched against the HR VIS iris gallery.

(3) To show the iris matching performance in cross-spectral as well as cross-resolution domains, we conduct additional experiments. More specifically, we train both the VIS and NIR cGAN networks with the unrolled low-resolution VIS and NIR iris images, respectively, so that the representative feature vectors generated in the latent embedding subspace can be employed to reconstruct HR synthesized iris images in the cross-spectral domain in the manner described above.

We used the OSIRIS software to perform matching for the set of experiments described above; the cross-decoding step involved is sketched below.

Figure 4. Reconstruction of synthesized HR NIR iris images from the output of the NIR cGAN generator, with the HR VIS iris as input to the VIS cGAN generator.

Figure 5. Reconstruction of synthesized HR VIS iris images from the output of the VIS cGAN generator, with the HR NIR iris as input to the NIR cGAN generator.

Figure 6. ROC curves showing the performance of our proposed conditional CpGAN network for iris matching in the synthesized domain on the PolyU Bi-Spectral iris database.

Table 2. Results summary of our proposed conditional CpGAN network for iris matching in the synthesized domain on the PolyU Bi-Spectral database. The term in parentheses indicates the input domain.

| Algorithm | Matching | GAR@FAR=0.01 | EER |
|---|---|---|---|
| Ours (CpGAN) | Synthesized HR NIR (HR VIS) vs HR NIR | 83.15 | 1.12 |
| Ours (CpGAN) | Synthesized HR VIS (HR NIR) vs HR VIS | 82.16 | 1.19 |
| Ours (CpGAN) | Synthesized HR NIR (LR VIS) vs HR NIR | 77.31 | 1.47 |
| Ours (CpGAN) | Synthesized HR VIS (LR NIR) vs HR VIS | 73.51 | 2.72 |

The ROC curves from this set of experiments are shown in Fig. 6, while the verification performance with EER results is summarized in Table 2. We have also shown the synthesized HR results in Figs. 4 and 5. The experimental results indicate that following the domain transformation technique to conduct matching in the same domain as the gallery images does not offer as much improvement as our approach, which conducts matching in the latent embedding subspace. However, one of the scenarios, HR VIS input to synthesized HR NIR matched against the HR NIR gallery, still outperforms the baseline methods [16, 24] by 0.23% and 24.05% GAR at 0.01 FAR, respectively.
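For clarity, scenarios (1) and (2) above amount to the following cross-decoding, sketched here with hypothetical handles `enc_vis` and `dec_nir` for the trained U-Net halves; the interfaces (including the skip connections being passed across domains) are our assumption, not the authors' released code.

```python
# Sketch of cross-spectral synthesis: encode in one domain, decode in
# the other via the shared latent embedding learned with L_cpl.
import torch

@torch.no_grad()
def synthesize_nir(x_vis, enc_vis, dec_nir):
    """x_vis: VIS iris batch; returns synthesized NIR images for
    subsequent OSIRIS-based matching against the NIR gallery."""
    latent, skips = enc_vis(x_vis)     # shared embedding (hypothetical interface)
    return dec_nir(latent, skips)      # synthesized NIR reconstruction
```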
8. Conclusion
In this paper, we have investigated the cross-domain iris recognition problem and introduced a new approach for more accurate cross-spectral iris matching. We developed a conditional coupled GAN (CpGAN) framework that projects modality-invariant iris texture features into a latent embedding subspace and performs matching in the embedded domain. The matching results on the publicly available PolyU cross-spectral iris database, illustrated in Section 7, outperform other methods reported in the literature and validate the superiority and effectiveness of our approach.

References

[1] M. A. Abdullah, S. S. Dlay, W. L. Woo, and J. A. Chambers. A novel framework for cross-spectral iris matching. IPSJ Transactions on Computer Vision and Applications, 8(1):9, 2016.
[2] C. A. Aguilera Carrasco. Local feature description in cross-spectral imagery. Ph.D. dissertation, Univ. Autonoma de Barcelona, Bellaterra, Spain, Sep. 2017.
[3] S. S. Behera, M. Gour, V. Kanhangad, and N. Puhan. Periocular recognition in cross-spectral scenario. In Proc. IEEE International Joint Conference on Biometrics (IJCB), pages 681-687, 2017.
[4] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn. Image understanding for iris biometrics: A survey. Computer Vision and Image Understanding, 110(2):281-307, 2008.
[5] Y. Chen, Y. Liu, X. Zhu, F. He, H. Wang, and N. Deng. Efficient iris recognition based on optimal subfeature selection and weighted subregion fusion. The Scientific World Journal, 2014, 2014.
[6] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 539-546, 2005.
[7] R. Garg, Y. Baweja, M. Vatsa, and R. Singh. Heterogeneous deep metric learning for cross-modal biometric recognition. Senior Project thesis, IIIT-Delhi, 2018.
[8] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5967-5976, 2017.
[9] A. K. Jain, K. Nandakumar, and A. Ross. 50 years of biometric research: Accomplishments, challenges, and opportunities. Pattern Recognition Letters, 79:80-105, 2016.
[10] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Proc. European Conference on Computer Vision (ECCV), 2016.
[11] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[12] Y.-H. Li and P.-J. Huang. An accurate and efficient user authentication mechanism on smart glasses based on iris recognition. Mobile Information Systems, 2017, 2017.
[13] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[14] P. R. Nalla and A. Kumar. Toward more accurate iris recognition using cross-spectral matching. IEEE Transactions on Image Processing, 26(1):208-221, 2016.
[15] M. Oktiana, F. Arnia, Y. Away, and K. Munadi. Features for cross spectral image matching: A survey. Bulletin of Electrical Engineering and Informatics, 7(4):552-560, 2018.
[16] M. Oktiana, K. Saddami, F. Arnia, Y. Away, K. Hirai, T. Horiuchi, and K. Munadi. Advances in cross-spectral iris recognition using integrated gradientface-based normalization. IEEE Access, 7:130484-130494, 2019.
[17] N. Othman, B. Dorizzi, and S. Garcia-Salicetti. OSIRIS: An open source iris recognition software. Pattern Recognition Letters, 82:124-131, 2016.
[18] N. P. Ramaiah and A. Kumar. On matching cross-spectral periocular images for accurate biometrics identification. In Proc. IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1-6, 2016.
[19] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. In Proc. International Conference on Machine Learning (ICML), 2016.
[20] A. Sequeira, L. Chen, P. Wild, J. Ferryman, F. Alonso-Fernandez, K. B. Raja, R. Raghavendra, C. Busch, and J. Bigun. Cross-Eyed: Cross-spectral iris/periocular recognition database and competition. In Proc. International Conference of the Biometrics Special Interest Group (BIOSIG), pages 1-5, 2016.
[21] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[22] R. Vyas, T. Kanumuri, and G. Sheoran. Cross spectral iris recognition for surveillance based applications. Multimedia Tools and Applications, 78(5):5681-5699, 2019.
[23] K. Wang and A. Kumar. Cross-spectral iris recognition using CNN and supervised discrete hashing. Pattern Recognition, 86:85-98, 2019.
[24] K. Wang and A. Kumar. Toward more accurate iris recognition using dilated residual features. IEEE Transactions on Information Forensics and Security, 14(12):3233-3245, 2019.