Unravelling Robustness of Deep Learning based Face Recognition Against Adversarial Attacks

Gaurav Goswami, Nalini Ratha, Akshay Agarwal, Richa Singh, Mayank Vatsa
IIIT-Delhi, India; IBM IRL, Bangalore, India; IBM TJ Watson Research Center, USA
{gauravgs, akshaya, rsingh, mayank}@iiitd.ac.in, [email protected]

Abstract
Deep neural network (DNN) architecture based models have high expressive power and learning capacity. However, they are essentially a black box method since it is not easy to mathematically formulate the functions that are learned within its many layers of representation. Realizing this, many researchers have started to design methods to exploit the drawbacks of deep learning based algorithms, questioning their robustness and exposing their singularities. In this paper, we attempt to unravel three aspects related to the robustness of DNNs for face recognition: (i) assessing the impact of deep architectures for face recognition in terms of vulnerabilities to attacks inspired by commonly observed distortions in the real world that are well handled by shallow learning methods, along with learning based adversaries; (ii) detecting the singularities by characterizing abnormal filter response behavior in the hidden layers of deep networks; and (iii) making corrections to the processing pipeline to alleviate the problem. Our experimental evaluation using multiple open-source DNN-based face recognition networks, including OpenFace and VGG-Face, and two publicly available databases (MEDS and PaSC) demonstrates that the performance of deep learning based face recognition algorithms can suffer greatly in the presence of such distortions. The proposed method is also compared with existing detection algorithms, and the results show that it is able to detect the attacks with very high accuracy by suitably designing a classifier using the responses of the hidden layers in the network. Finally, we present several effective countermeasures to mitigate the impact of adversarial attacks and improve the overall robustness of DNN-based face recognition.
Introduction
The deep learning paradigm has seen significant proliferation in face recognition due to the convenience of obtaining large training data, the availability of inexpensive computing power and memory, and the utilization of cameras at multiple places. Several algorithms such as DeepFace (Taigman et al. 2014), DeepID (Sun, Wang, and Tang 2015), FaceNet (Schroff, Kalenichenko, and Philbin 2015), and Liu et al. (2015) are successful examples of the coalescence of deep learning and face recognition. However, it is also known that machine learning algorithms
are susceptible to adversaries which can cause the classifier to yield incorrect results. Most of the time these adversaries are unintentional and are in the form of outliers. Recently, it has been shown that fooling images can be generated in such a manner that humans can correctly classify the images but deep learning algorithms misclassify them (Goodfellow, Shlens, and Szegedy 2015), (Nguyen, Yosinski, and Clune 2015). As shown in Table 1, such images can be generated via evolutionary algorithms (Nguyen, Yosinski, and Clune 2015) or adversarial sample crafting using the fast gradient sign method (Goodfellow, Shlens, and Szegedy 2015). Sharif et al. (2016) explored threat models by creating perturbed eye-glasses to fool face recognition algorithms. An adversarial attack on face recognition is not acceptable as the face biometric is used in many high security applications such as passports, visas, and other law enforcement documents. It is our assertion that it is not required to attack the system with sophisticated learning based attacks; even attacks such as adding random noise or horizontal and vertical black grid lines in the face image cause a reduction in face verification accuracies. Sample images in Figure 1 show a glimpse of the effect of image processing operations on two state-of-the-art deep learning based face recognition algorithms.

Figure 1: We show that deep learning based OpenFace (OF) and VGG-Face can be deceived even by image processing operations that mimic real world distortions.

Table 1: Literature review of adversarial attack generation and detection algorithms.

Generation:
- Szegedy et al., 2013: L-BFGS, minimize L(x + ρ, l) + λ||ρ|| s.t. x_i + ρ_i ∈ [b_min, b_max]
- Goodfellow, Shlens, and Szegedy, 2015: FGSM, x + ε · sign(∇_x L(x, l))
- Kurakin, Goodfellow, and Bengio, 2016: I-FGSM, x_{k+1} = x_k + ε · sign(∇_x L(x_k, l))
- Papernot et al., 2016: Saliency Map, l_0 distance optimization
- Moosavi-Dezfooli, Fawzi, and Frossard, 2016: DeepFool, for each class l' ≠ l, minimize d(l, l')
- Carlini and Wagner, 2017: C&W, l_p distance metric optimization
- Moosavi-Dezfooli et al., 2017: Universal, distribution based perturbation
- Rauber, Brendel, and Bethge, 2017: Blackbox, uniform, Gaussian, salt and pepper, Gaussian blur, contrast

Detection:
- Grosse et al., 2017: Statistical test for attack and genuine data distributions
- Gong, Wang, and Ku, 2017; Metzen et al., 2017: Neural network based classification
- Feinman et al., 2017: Randomized network using Dropout at both training and testing
- Bhagoji, Cullina, and Mittal, 2017: PCA based dimensionality reduction
- Liang et al., 2017: Quantization and smoothing based image processing
- Lu, Issaranon, and Forsyth, 2017: Quantized ReLU output for discrete codes + RBF-SVM
- Das et al., 2017: JPEG compression to reduce the effect of the adversary

To the best of our knowledge, this is the first reported research on finding singularities in deep learning based face recognition engines along with detection and mitigation of such attacks. We believe that being able to not only automatically detect but also correct adversarial samples at runtime is a crucial ability for a deep network that is deployed for real world applications.
With this research, we aim to present a new perspective on potential attacks as well as a different methodology to limit their performance impact beyond simply including adversarial samples in the training data. The objective of this paper is three-fold: (i) We demonstrate that the performance of deep learning based face recognition algorithms can be significantly affected by adversarial attacks, both image processing based adversarial attacks and adversarial samples generated in the context of the recognition architecture. (ii) The first key step in taking countermeasures against such adversarial attacks is to be able to reliably determine which images contain such distortions. We propose and evaluate a methodology for automatic detection of such attacks using the responses from hidden layers of the DNN. (iii) Once identified, the distorted images may be rejected for further processing or rectified using appropriate preprocessing techniques to prevent degradation in performance. To address this challenge without increasing the failure to process rate (by rejecting the samples), the third contribution of this research is a novel technique of selective dropout in the DNN to mitigate these adversarial attacks. While we have showcased results with multiple deep face networks in this paper, we have used VGG to report the detection and mitigation results for DeepFool and Universal adversarial perturbations since it is the only network for which the authors have provided pre-computed models.
Adversarial Attacks on Deep Learning based Face Recognition
In this section, we discuss the proposed adversarial distortions that are able to degrade the performance of deep learning based face recognition algorithms. Let x be the input to a deep learning based face recognition algorithm and l be the output class label (in the case of identification, it is an identity label; for verification, it is same or different). An adversarial attack function a(·), when applied to the input face image, falsely changes the predicted identity label. In other words, if the network assigns label l′ to a(x), where l ≠ l′, then a is a successful adversarial attack on the network. While adversarial learning has been used in the literature to show that the function a(·) can be obtained via optimization based on network gradients, in this research, we explore a different approach. We evaluate the robustness of deep learning based face recognition in the presence of image processing based distortions. Based on the information required in their design, these distortions can be considered at image-level or face-level. We propose two image-level distortions: (a) grid based occlusion, and (b) most significant bit based noise, along with three face-level distortions: (a) forehead and brow occlusion, (b) eye region occlusion, and (c) beard-like occlusion.
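The success condition for a(·) above can be sketched as a small predicate. This is only an illustrative stand-in: `predict` and `a` below are hypothetical placeholders for a trained face recognition model and a distortion function, not part of the paper's implementation.

```python
def is_successful_attack(predict, a, x):
    """Return True if the attack function a(.) changes the label the
    model assigns to x, i.e. the network outputs l' != l for a(x)."""
    l = predict(x)           # label l on the clean input
    l_prime = predict(a(x))  # label l' on the attacked input
    return l_prime != l
```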
Image-level Distortions

Distortions that are not specific to faces and can be applied to an image of any object are categorized as image-level distortions. In this research, we have utilized two such distortions: grid based occlusion and most significant bit change based noise addition. Figures 2(b) and 2(c) present sample outputs of the image-level distortions.
Grid based Occlusion
For the grid based occlusion (termed as Grids) distortion, we select a number of points P = {p_1, p_2, ..., p_n} along the upper (y = 0) and left (x = 0) boundaries of the image according to a parameter ρ_grids. The parameter ρ_grids determines the number of grids that are used to distort each image, with higher values resulting in a denser grid, i.e., more grid lines. For each point p_i = (x_i, y_i), we select a point on the opposite boundary of the image, p′_i = (x′_i, y′_i), with the condition that if y_i = 0 then y′_i = H, and if x_i = 0 then x′_i = W, where W × H is the size of the input image. Once a set of pairs corresponding to points P and P′ has been selected for the image, one pixel wide line segments are created to connect each pair, and each pixel lying on these lines is set to grayscale value 0 (black).

Most Significant Bit based Noise

For the most significant bit based noise (xMSB) distortion, we select three sets of pixels X_1, X_2, X_3 from the image stochastically such that |X_i| = φ_i × W × H, where W × H is the size of the input image. The parameter φ_i denotes the fraction of pixels where the i-th most significant bit is flipped. The higher the value of φ_i, the more pixels are distorted in the i-th most significant bit. For each P_j ∈ X_i, ∀ i ∈ [1, 3], we perform the following operation:

P_j^i = P_j^i ⊕ 1    (1)

where P_j^k denotes the k-th most significant bit of the j-th pixel in the set and ⊕ denotes the bitwise XOR operation. It is to be noted that the sets X_i are not mutually exclusive and may overlap. Therefore, the total number of pixels affected by the noise is at most |X_1| + |X_2| + |X_3| but may also be lower depending on the stochastic selection.

Figure 2: Sample images representing the (b) grid based occlusion (Grids), (c) most significant bit based noise (xMSB), (d) forehead and brow occlusion (FHBO), (e) eye region occlusion (ERO), and (f) beard-like occlusion (Dhamecha et al. 2014) (Beard) distortions when applied to the (a) original images. (g) shows Universal perturbed (Moosavi-Dezfooli et al. 2017) images from the PaSC and MEDS databases.

Face-level Distortions

Face-level distortions specifically require face-specific information, e.g., the locations of facial landmarks. The three face-level region based occlusion distortions are applied after performing automatic face and facial landmark detection. In this research, we have utilized the open source DLIB library (King 2009) to obtain the facial landmarks. Once facial landmarks are identified, they are used along with their boundaries for masking. To obscure the eye region, a singular occlusion band is drawn on the face image as follows:

I{x, y} = 0, ∀ x ∈ [0, W], y ∈ [y_e − d_eye/ψ, y_e + d_eye/ψ]    (2)

Here, y_e = (y_le + y_re)/2, and (x_le, y_le) and (x_re, y_re) are the locations of the left eye center and the right eye center, respectively. The inter-eye distance d_eye is calculated as x_re − x_le, and ψ is a parameter that determines the width of the occlusion band. Similar to the eye region occlusion (ERO), the forehead and brow occlusion (FHBO) is created using facial landmarks on the forehead and brow regions to form a mask. For the beard-like occlusion, outer facial landmarks along with nose and mouth coordinates are utilized to create the mask as combinations of individually occluded regions. Figure 2 (d), (e), and (f) illustrate samples of the face-level distortions.

Table 2: Characteristics of the databases used for adversarial attack generation and detection.

Database                       Subjects   Images
PaSC (Beveridge et al. 2013)   293        4,688
MEDS-II (Founds et al. 2011)   518        858
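The three parameterized distortions described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the parameter defaults are our own assumptions, and the grid sketch draws straight vertical/horizontal lines, whereas the paper pairs each boundary point with an arbitrary point on the opposite boundary.

```python
import numpy as np

def grid_occlusion(img, n_lines=4, rng=None):
    """Grids: draw one pixel wide black lines between points on
    opposite image boundaries (simplified to straight lines here)."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = out.shape[:2]
    out[:, rng.choice(w, n_lines, replace=False)] = 0  # top-to-bottom lines
    out[rng.choice(h, n_lines, replace=False), :] = 0  # left-to-right lines
    return out

def xmsb_noise(img, phis=(0.03, 0.05, 0.10), rng=None):
    """xMSB (Eq. 1): for i = 1..3, XOR the i-th most significant bit
    of a random fraction phi_i of the 8-bit pixels."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    flat = out.reshape(-1)  # view into out, so in-place edits apply
    for i, phi in enumerate(phis, start=1):
        idx = rng.choice(flat.size, int(phi * flat.size), replace=False)
        flat[idx] ^= np.uint8(1 << (8 - i))  # flip the i-th MSB
    return out

def eye_region_occlusion(img, left_eye, right_eye, psi=6.0):
    """ERO (Eq. 2): zero a horizontal band centred on the mean eye
    height y_e, of half-height d_eye / psi."""
    out = img.copy()
    (x_le, y_le), (x_re, y_re) = left_eye, right_eye
    y_e = (y_le + y_re) / 2.0
    d_eye = x_re - x_le
    top = max(0, int(y_e - d_eye / psi))
    bottom = min(out.shape[0], int(y_e + d_eye / psi) + 1)
    out[top:bottom, :] = 0
    return out
```

In practice the eye centers passed to `eye_region_occlusion` would come from a landmark detector such as DLIB, as described above.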
Learning based Adversaries

Along with the proposed image-level and face-level distortions, we also analyze the effect of adversarial samples generated using two existing adversarial models: DeepFool (Moosavi-Dezfooli, Fawzi, and Frossard 2016) and Universal Adversarial Perturbations (Moosavi-Dezfooli et al. 2017).
Adversarial Distortions: Results and Analysis
In this section, we first provide a brief overview of the deep face recognition networks, databases, and respective experimental protocols that are used to conduct the face verification evaluations. We then assess how the deep networks perform in the presence of the different proposed distortions to emphasize the need for addressing such attacks.
Databases
We use two publicly available face databases for our experiments, namely, the Point and Shoot Challenge (PaSC) database (Beveridge et al. 2013) and the Multiple Encounters Dataset (MEDS) (Founds et al. 2011). The PaSC database contains still-to-still and video-to-video matching protocols. We use the frontal subset of the still-to-still protocol, which contains 4,688 images pertaining to 293 individuals, divided into equal size target and query sets. Each image in the target set is matched to each image in the query set and the resulting 2,344 × 2,344 score matrix is used to determine the verification performance.

The MEDS-II database contains a total of 1,309 faces pertaining to 518 individuals. Similar to the case of PaSC, we utilize the metadata provided with the MEDS release 2 database to obtain a subset of 858 frontal face images. Each of these images is matched to every other image and the resulting 858 × 858 score matrix is utilized to evaluate the verification performance. For evaluating performance under the effect of distortions, we randomly select 50% of the total images from each database and corrupt them with each of the proposed distortions separately. These distorted sets of images are utilized to compute the new score matrices for each case.

Table 3: Verification performance of existing face recognition algorithms in the presence of different distortions on the PaSC and MEDS databases. All values indicate genuine accept rate (%) at 1% false accept rate.

            MEDS                                        PaSC
System      Original Grids  xMSB  FHBO  ERO   Beard     Original Grids  xMSB  FHBO  ERO   Beard
COTS        24.1     20.9   14.5  19.0  0.0   24.8      40.3     24.3   19.1  13.0  0.0   6.2
OpenFace    66.7     49.5   43.8  47.9  16.4  48.2      39.4     10.1   10.1  14.9  6.5   22.6
VGG-Face    78.4     50.3   45.0  25.7  10.9  47.7      54.3     3.2    1.3   15.2  8.8   24.0
LightCNN    89.3     80.1   71.5  62.8  26.7  70.7      60.1     24.6   29.5  31.9  24.4  38.1
L-CSSE      89.1     81.9   83.4  55.8  27.3  70.5      61.2     43.1   36.9  29.4  39.1  39.8
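The all-vs-all protocol above reduces to computing the genuine accept rate at a fixed false accept rate from a score matrix. The helper below is an illustrative sketch of that computation, not the evaluation code used in the paper; it assumes similarity scores where a higher score means a better match.

```python
import numpy as np

def gar_at_far(scores, labels_a, labels_b, far=0.01):
    """Genuine accept rate at the threshold where a fraction `far`
    of impostor pairs would be (falsely) accepted.
    scores[i, j] compares image i of one set with image j of the other."""
    la = np.asarray(labels_a)[:, None]
    lb = np.asarray(labels_b)[None, :]
    genuine = scores[la == lb]    # same-identity comparisons
    impostor = scores[la != lb]   # different-identity comparisons
    threshold = np.quantile(impostor, 1.0 - far)
    return float((genuine > threshold).mean())
```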
Existing Networks and Systems
In this research, we utilize the OpenFace (Amos et al.), VGG-Face (Parkhi, Vedaldi, and Zisserman 2015), LightCNN (Wu et al. 2015), and L-CSSE (Majumdar, Singh, and Vatsa 2017) networks to gauge the performance of deep face recognition algorithms in the presence of the aforementioned distortions. The OpenFace library is an open source implementation of FaceNet (Schroff, Kalenichenko, and Philbin 2015) and is openly available to all members of the research community for modification and experimental usage. The VGG deep face network is a deep convolutional neural network (CNN) with 11 convolutional blocks where each convolution layer is followed by non-linearities such as ReLU and max pooling. LightCNN is another publicly available deep network architecture for face recognition; it is a CNN with maxout activations in each convolutional layer and achieves good results with just five convolutional layers. L-CSSE is a supervised autoencoder formulation that utilizes a class sparsity based supervision penalty in the loss function to improve the classification capabilities of autoencoder based deep networks. In order to assess the relative performance of deep face recognition against a non-deep learning based approach, we compare these deep learning based algorithms with a commercial-off-the-shelf (COTS) matcher. No fine-tuning is performed for any of these algorithms before evaluating their performance on the test databases.
Results and Analysis
Table 3 summarizes the effect of image processing based adversarial distortions on OpenFace, VGG-Face, LightCNN, L-CSSE, and COTS. On the PaSC database, as shown in Table 3, while OpenFace and COTS perform comparably to each other at about 1% false accept rate (FAR), OpenFace performs better than the COTS algorithm at all further operating points when no distortions are present. However, we observe a sharp drop in OpenFace performance when any distortion is introduced in the data. For instance, with the grids attack, at 1% FAR, the GAR of OpenFace drops by 29.3% and that of VGG-Face by 28.1%, whereas the performance of COTS only drops by 16%, about half the drop experienced by OpenFace and VGG-Face. We notice a similar scenario in the presence of the noise attack, where the performance of OpenFace and VGG-Face drops by about 29% as opposed to the loss of 21.2% observed for COTS. LightCNN and L-CSSE both show higher performance on original images; however, as shown in Table 3, similar drops are observed. It is to be noted that for the xMSB and grid attacks, L-CSSE achieves relatively better performance because L-CSSE is a supervised version of an autoencoder, which can handle noise better. Overall, deep learning based algorithms experience a higher performance drop than the non-deep learning based COTS. In the case of occlusions, however, deep learning based algorithms suffer less than COTS. It is our assessment that the COTS algorithm fails to perform accurate recognition with the highly limited facial region available in the low-resolution PaSC images in the presence of occlusions. Similar performance trends are observed on the MEDS database, on which, for original images, the deep learning based algorithms outperform the COTS matcher with GARs of 60-89% at 1% FAR as opposed to 24.1% by COTS.

Figure 3: Bar graph showing the effect of perturbation on the VGG-Face model. Verification accuracy is reported at 1% FAR.
The accuracy of the deep learning algorithms drops significantly more than the accuracy of COTS. We next performed a similar analysis with learning based adversaries on the PaSC database. The results of the VGG-Face model with original and perturbed images are shown in Figure 3. It is interesting to observe that the drop in accuracy obtained by simple image processing operations is equivalent to the reduction achieved by learned adversaries. This clearly shows that deep models are not resilient even to simple perturbations and, therefore, it is very important to devise effective strategies for detection and mitigation of attacks.
Detection and Mitigation of Adversarial Attacks
As seen in the previous section, adversarial attacks can substantially reduce the performance of usually accurate
deep neural network based face recognition methods. Therefore, it is essential to address such singularities in order to make face recognition algorithms more robust and useful in real world applications. In this section, we propose novel methodologies for detecting and mitigating adversarial attacks. First, we provide a brief overview of a deep network, followed by the proposed algorithms and their corresponding results.

Each layer in a deep neural network essentially learns a function or representation of the input data. The final feature computed by a deep network is derived from all of the intermediate representations in the hidden layers. In an ideal scenario, the internal representation at any given layer for an input image should not change drastically with minor changes to the input image. However, that is not the case in practice, as proven by the existence of adversarial examples. The final features obtained for a distorted and an undistorted image are measurably different from one another since these features map to different classes. Therefore, it is implied that the intermediate representations also vary for such cases. It is our assertion that the internal representations computed at each layer are different for distorted images as compared to undistorted images. Therefore, in order to detect whether an incoming image is perturbed in an adversarial manner, we decide that it is distorted if its layer-wise internal representations deviate substantially from the corresponding mean representations. The overall flow of the detection and mitigation algorithms is summarized in Figure 4.

Figure 4: Flow chart for the proposed detection and mitigation methodology.
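The detect-then-mitigate flow of Figure 4 can be sketched as glue code. All five callables below are hypothetical placeholders for the components described in this section, not APIs from the paper.

```python
def robust_matching_pipeline(image, network, is_attacked, mitigate, match, gallery):
    """Figure 4 flow: extract hidden-layer responses, test them for
    adversarial distortion, mitigate only when an attack is flagged,
    then perform matching on the (possibly corrected) features."""
    responses = network(image)
    if is_attacked(responses):    # attack detected?
        image = mitigate(image)   # e.g. selective dropout + denoising
        responses = network(image)
    return match(responses, gallery)
```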
Network Analysis and Detection
In order to develop an adversarial attack detection mechanism, we first analyze the filter responses in the CNN architecture. Network visualization analysis showcases the filter responses for a distorted image at selected intermediate layers that demonstrate the most sensitivity towards noisy data. We can see that many of the filter outputs primarily encode the noise instead of the input signal. We observe that the deep network based representation is more sensitive to the input; while that sensitivity results in a more expressive representation that offers higher performance in the case of undistorted data, it also compromises robustness towards noise such as the proposed distortions. Since each layer in a deep network learns increasingly more complicated functions of the input data based on the functions learned by the previous layer, any noise in the input data is also encoded in the features, thus leading to a higher reduction in the discriminative capacity of the final learned representation. Similar conclusions can also be drawn from the results of other existing adversarial attacks on deep networks, where the addition of a noise pattern leads to spurious classification (Goodfellow, Shlens, and Szegedy 2015).

To counteract the impact of such attacks and ensure practical applicability of deep face recognition, the networks must either be made more robust towards noise at a layer level during training, or it must be ensured that any input is preprocessed to filter out any such distortion prior to computing its deep representation for recognition.

In order to detect distortions, we compare the pattern of the intermediate representations for undistorted images with that for distorted images at each layer. The differences in these patterns are used to train a classifier that can categorize an unseen input as an undistorted/distorted image. In this research, we use the VGG-Face (Parkhi, Vedaldi, and Zisserman 2015) and LightCNN (Wu et al. 2015) networks to devise and evaluate our detection methodology. From the 50,248 frontal face images in the CMU Multi-PIE database (Gross et al. 2010), 40,000 are randomly selected and used to compute a set of layer-wise mean representations, μ, as follows:

μ_i = (1 / N_train) Σ_{j=1}^{N_train} φ_i(I_j)    (3)

where I_j is the j-th image in the training set, N_train is the total number of training images, μ_i is the mean representation for the i-th layer of the network, and φ_i(I_j) denotes the representation obtained at the i-th layer of the network when I_j is the input. Once μ is computed, the intermediate representations computed for an arbitrary image I can be compared with the layer-wise means as follows:

Ψ_i(I, μ) = Σ_{z=1}^{λ_i} |φ_i(I)_z − μ_i,z| / (|φ_i(I)_z| + |μ_i,z|)    (4)

where Ψ_i(I, μ) denotes the Canberra distance between φ_i(I) and μ_i, λ_i denotes the length of the feature representation computed at the i-th layer of the network, and μ_i,z denotes the z-th element of μ_i. If the number of intermediate layers in the network is N_layers, we obtain N_layers distances for each image I. These distances are used as features to train a Support Vector Machine (SVM) (Suykens and Vandewalle 1999) for two-class classification.
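Equations (3) and (4) can be sketched directly in NumPy; this is a minimal illustration under the assumption that each layer's response has already been flattened into a vector, and the small `eps` guard against zero denominators is our own addition.

```python
import numpy as np

def layer_means(train_reps):
    """Eq. (3): mean representation per layer over the training set.
    train_reps[i] is an (N_train, lambda_i) array of layer-i
    responses phi_i(I_j) for all training images."""
    return [reps.mean(axis=0) for reps in train_reps]

def canberra_features(image_reps, mus, eps=1e-12):
    """Eq. (4): Canberra distance between phi_i(I) and mu_i for each
    layer; the resulting N_layers-dimensional vector is the feature
    given to the two-class SVM."""
    feats = []
    for phi, mu in zip(image_reps, mus):
        num = np.abs(phi - mu)
        den = np.abs(phi) + np.abs(mu)
        feats.append(float((num / np.maximum(den, eps)).sum()))
    return np.array(feats)
```

An off-the-shelf SVM (e.g., `sklearn.svm.SVC`) can then be trained on these distance vectors for the distorted-vs-undistorted decision.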
Mitigation: Selective Dropout

An ideal automated solution should not only automatically detect but also mitigate the effect of an adversarial attack so as to maintain as high a performance as possible. Therefore, the next step in defending against an adversarial attack is mitigation. This can be achieved by discarding or preprocessing (e.g., denoising) the affected regions. In order to accomplish these objectives, we again utilize the characteristics of the output produced in the intermediate layers of the network. We select 10,000 images from the Multi-PIE database and partition them into 5 mutually exclusive and exhaustive subsets of 2,000 images each. Each subset is processed using a different distortion. The set of 10,000 distorted images thus obtained contains 2,000 images pertaining to each of the five proposed distortions. We use a smaller, separate Multi-PIE subset of 1,680 faces (5 per subject) for training the algorithm on the DeepFool and Universal perturbations. Using this data, we compute a filter-wise score per layer that estimates each filter's sensitivity towards distortion as follows:

ε_ij = Σ_{k=1}^{N_dis} ||φ_ij(I_k) − φ_ij(I′_k)||    (5)

where N_dis is the number of distorted images in the training set, ε_ij denotes the score, φ_ij(·) denotes the response of the j-th filter in the i-th layer, I_k is the k-th distorted image in the dataset, and I′_k is the undistorted version of I_k. Once these values are computed, the top η layers are selected based on the aggregated ε values for each layer. These are the layers identified to contain the most filters that are adversely affected by the distortions in the data. For each of the selected η layers, the top κ fraction of affected filters are disabled by modifying their corresponding weights before computing the features. We also apply a median filter to denoise the image before extracting the features. We term this approach selective dropout.
It is aimed at increasing the network's robustness towards noisy data by removing the most problematic filters from the pipeline. We determine the values of the parameters η and κ via grid search optimization on the training data with verification performance as the criterion.
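The scoring and selection steps of selective dropout can be sketched as follows. This is an illustrative sketch only: the parameter defaults for η and κ are placeholders (the paper tunes them by grid search), and actually zeroing the selected filters' weights is framework-specific and omitted here.

```python
import numpy as np

def filter_sensitivity(clean_reps, distorted_reps):
    """Eq. (5): epsilon_ij = sum over the N_dis distorted images of
    ||phi_ij(I_k) - phi_ij(I'_k)||. reps[i] has shape
    (N_dis, n_filters_i, response_size)."""
    return [np.linalg.norm(dist - clean, axis=2).sum(axis=0)
            for clean, dist in zip(clean_reps, distorted_reps)]

def selective_dropout(eps, eta=2, kappa=0.25):
    """Select the eta layers with the largest aggregate epsilon and,
    within each, the top kappa fraction of filters; returns a
    {layer_index: filter_indices} map of filters whose weights are
    to be zeroed before feature extraction."""
    layer_scores = np.array([e.sum() for e in eps])
    worst_layers = np.argsort(layer_scores)[::-1][:eta]
    to_disable = {}
    for i in worst_layers:
        k = max(1, int(kappa * eps[i].size))
        to_disable[int(i)] = np.argsort(eps[i])[::-1][:k]
    return to_disable
```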
Experimental Protocol

For training the detection model, we use the remaining 10,000 frontal face images from the CMU Multi-PIE database as undistorted samples. We generate 10,000 distorted samples using all five distortions, with 2,000 images per distortion, also randomly selected from the CMU Multi-PIE database. We use the same training data for Universal perturbations, with 10,000 distorted and 10,000 undistorted samples. For DeepFool, we use a subset of 1,680 face images from the CMU Multi-PIE database, with 5 images from each of the 336 subjects, with both distorted and undistorted versions for training the detection algorithm. Since the VGG-Face network has 20 intermediate layers, we obtain a feature vector of 20 distances for each image. We perform a grid search based parameter optimization using the 20,000 × 20 training matrix to optimize and learn the SVM model. For DeepFool, the size of the training data is 3,360 × 20. Once the model is learned, any given test image is characterized by the distance vector and processed by the SVM. The score given by the model for the image to belong to the distorted class is used as a distance metric. We observe that the metric thus obtained is able to classify distorted images on unseen databases. The mitigation algorithm is evaluated with both the LightCNN and VGG-Face networks on both the PaSC and MEDS databases with the same experimental protocol as used in obtaining the verification results.
First, we present the results of the proposed algorithm indetecting whether an image contains adversarial distortionsor not using the VGG and LightCNN networks. We choosethese two as the model definition and weights are publiclyavailable. Table 4 presents the results of adversarial attackdetection. Each distortion based subset comprises of a 50%split of distorted and undistorted faces. These are the samesets that have been used for evaluating the performance ofthe three face recognition systems. As mentioned previously,the model is trained on a separate database which does nothave any overlap with the test set.The proposed detection algorithm performs almost per-fectly for the PaSC database with the VGG networkand maintains accuracies of 80-90% with the LightCNNnetwork. The lowest performance is observed on theMEDS database (classification accuracy of 68.4% withthe LightCNN network). The lower accuracies with theLightCNN can be attributed to the smaller network depthwhich results in smaller size features to be utilized by thedetection algorithm. It is to be noted that the proposed al-gorithm maintains high true positive rates even at very lowfalse positive rates across all distortions on both databaseswhich is desirable when the cost of accepting a distorted im-age is much higher than a false reject for the system. Be-sides exceptionally poor quality images that are naturallyquite distorted, we observe that high or low illumination re-sults in false rejects by the algorithm, i.e., falsely detectedas distorted. This shows the scope of further improvementand refinement in the detection methodology. This is also an-other reason for lower performance with the MEDS databasewhich has more extreme illumination cases as compared toPaSC. We also test using the Viola Jones face detector (Vi-ola and Jones 2004) and find that, on average, approximately60% of the distorted faces pass face detection. 
Therefore, the distorted face images cannot be differentiated from undistorted faces simply on the basis of failing face detection. We also attempt to reduce the feature dimensionality and identify the most important features using sequential feature selection based on the classification loss of an SVM model learned on a given subset of features. For the VGG-Face based model, using just the top 6 features for detection, we obtain an average accuracy of 81.7% on MEDS and 96.9% on PaSC across all distortions. Using only the single most discriminative feature, we obtain 79.3% on MEDS and 95.8% on PaSC on average across all distortions. This signifies that comparing the representations computed by the network in its intermediate layers indeed provides a good indicator of the presence of distortions in a given image.

In addition to the proposed adversarial attacks, we have also evaluated the efficacy of the proposed detection methodology on two existing attacks that utilize network architecture information to generate adversarial perturbations, i.e., DeepFool (Moosavi-Dezfooli, Fawzi, and Frossard 2016) and universal adversarial perturbations (Moosavi-Dezfooli et al. 2017). We have also compared the proposed detection algorithm with two recent adversarial detection techniques based on adaptive noise reduction (Liang et al. 2017) and Bayesian uncertainty (Feinman et al. 2017). The same training data and protocol were used to train and test all three detection approaches.

Table 4: Performance (accuracy %) of the proposed detection methodology (using LightCNN and VGG-Face as the target networks) compared to two existing detection algorithms. Grids = grid based occlusion, xMSB = most significant bit based noise, FHBO = forehead and brow occlusion, ERO = eye region occlusion, and Beard = beard like occlusion. For each of MEDS and PaSC, columns report the accuracies of LightCNN, VGG, (Liang et al. 2017), and (Feinman et al. 2017); rows correspond to the individual distortions.

Figure 5: Summarizing the results of the proposed and existing detection algorithms on the PaSC (left) and MEDS (right) databases.

The detection results are presented in Table 4 and Figure 5. We observe that the proposed methodology is at least 11% better at detecting DNN architecture based adversarial attacks than the existing algorithms in all cases except detecting DeepFool-perturbed images from the MEDS database, where it still outperforms the other approaches by more than 3%. We believe this is because MEDS has higher overall image quality than PaSC, so even the impact of these near-imperceptible perturbations (DeepFool and Universal) on verification performance is minimal for this database; it is therefore harder for all the tested detection algorithms to distinguish original from perturbed images for these distortions.

Table 5 presents the results of the mitigation algorithm. Mitigation is a two-step process designed for better performance and computational efficiency. Figure 3 shows the effect of the DeepFool and Universal adversaries on verification performance with the VGG-Face model. First, using the proposed detection algorithm, we perform selective mitigation of only those images that the learned model considers adversarial. Face verification results after applying the proposed mitigation algorithm on the MEDS and PaSC databases are presented in Table 5. We observe that the mitigation model improves verification performance on both databases with either network and brings it closer to the original.
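The sequential feature selection used to reduce dimensionality can be sketched as a greedy forward search that repeatedly adds the feature yielding the lowest classification loss. This is a minimal sketch: a nearest-class-mean rule stands in for the SVM loss used in our experiments, and the toy data are assumptions.

```python
import numpy as np

def loss(X, y, cols):
    """Classification error of a nearest-class-mean rule on the chosen
    feature columns (a dependency-free stand-in for the SVM loss)."""
    Xs = X[:, cols]
    m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - m1, axis=1) <
            np.linalg.norm(Xs - m0, axis=1)).astype(int)
    return np.mean(pred != y)

def sequential_forward_selection(X, y, k):
    """Greedily add, one at a time, the feature that most reduces loss."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: loss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: feature 2 is discriminative, the rest are noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = (rng.random(200) < 0.5).astype(int)
X[:, 2] += 3.0 * y  # separate the classes along feature 2

# The single most discriminative feature is found first.
assert sequential_forward_selection(X, y, 1) == [2]
```

The same search, run with the actual SVM loss over the hidden-layer deviation features, produces the top-6 and top-1 feature subsets reported above.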
Thus, even discarding the fraction of the intermediate network output that is most affected by adversarial distortions results in better recognition than incorporating it into the obtained feature vector.

Table 5: Mitigation results (GAR (%) at 1% FAR) on the MEDS and PaSC databases.

Algorithm   Database   Original   Distorted   Corrected
LCNN        PaSC       60.5       25.9
LCNN        MEDS       89.3       41.6
VGG-Face    PaSC       54.3       14.6
VGG-Face    MEDS       78.4       30.5
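The mitigation idea of discarding the most-affected intermediate responses can be sketched as follows: rank the responses by their deviation from the mean responses seen on undistorted faces and drop the top fraction before forming the feature vector. The deviation criterion, drop fraction, and toy data below are illustrative assumptions.

```python
import numpy as np

def mitigated_feature(responses, mean_responses, drop_frac=0.25):
    """Drop the fraction of intermediate responses that deviate most
    from the mean responses seen on undistorted faces; the remaining
    responses form the feature vector used for matching."""
    deviation = np.abs(responses - mean_responses)
    n_keep = int(len(responses) * (1 - drop_frac))
    keep = np.sort(deviation.argsort()[:n_keep])  # least-deviating indices
    return responses[keep]

# Toy example: four responses are heavily affected by the distortion.
rng = np.random.default_rng(3)
mean = rng.standard_normal(16)
resp = mean.copy()
resp[:4] += 10.0

feat = mitigated_feature(resp, mean, drop_frac=0.25)
# Exactly the four most-affected responses are discarded.
assert np.allclose(feat, mean[4:])
```

Because the surviving responses are close to their undistorted statistics, the resulting feature vector matches the enrolled template more reliably than the full, partially corrupted representation.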
Conclusion and Future Research Directions
To summarize, our work has three main contributions: (i) a framework to evaluate the robustness of deep learning based face recognition engines, (ii) a scheme to detect adversarial attacks on the system, and (iii) methods to mitigate adversarial attacks when detected. Playing the role of an expert-level adversary, we propose five classes of image distortions in the evaluation experiment. Using an open source implementation of FaceNet, i.e., OpenFace, and the recently proposed VGG-Face, LightCNN, and L-CSSE networks, we conduct a series of experiments on the publicly available PaSC and MEDS databases. We observe a substantial loss in the performance of the deep learning based systems compared with a non-deep-learning based COTS matcher on the same evaluation data. To detect the attacks, we propose a method based on analyzing network activations in the hidden layers of the network. When this stage reports an attack, we invoke the mitigation methods described in the paper and show that we can recover from the attacks in many situations. In the future, we will build more complex mitigation frameworks to restore performance to its normal level. It is our assertion that, with these findings, future research can aim at correcting such adversarial samples and incorporating various other countermeasures into deep neural networks to further increase their robustness.
Acknowledgements
Goswami was partly supported through the IBM PhD Fellowship, Agarwal is partly supported by the Visvesvaraya PhD Fellowship, and Vatsa and Singh are partly supported through CAI@IIIT-Delhi.
References
Amos, B.; Ludwiczuk, B.; Harkes, J.; Pillai, P.; Elgazzar, K.; and Satyanarayanan, M. OpenFace: Face recognition with deep neural networks. http://github.com/cmusatyalab/openface. Accessed: 2017-10-10.
Beveridge, J.; Phillips, P.; Bolme, D.; Draper, B.; Given, G.; Lui, Y. M.; Teli, M.; Zhang, H.; Scruggs, W.; Bowyer, K.; Flynn, P.; and Cheng, S. 2013. The challenge of face recognition from digital point-and-shoot cameras. In IEEE Conference on Biometrics: Theory, Applications and Systems, 1-8.
Bhagoji, A. N.; Cullina, D.; and Mittal, P. 2017. Dimensionality reduction as a defense against evasion attacks on machine learning classifiers. arXiv preprint arXiv:1704.02654.
Carlini, N., and Wagner, D. 2017. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 39-57.
Das, N.; Shanbhogue, M.; Chen, S.-T.; Hohman, F.; Chen, L.; Kounavis, M. E.; and Chau, D. H. 2017. Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression. arXiv preprint arXiv:1705.02900.
Dhamecha, T. I.; Singh, R.; Vatsa, M.; and Kumar, A. 2014. Recognizing disguised faces: Human and machine evaluation. PLOS ONE.
Feinman, R.; Curtin, R. R.; Shintre, S.; and Gardner, A. B. 2017. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410.
Founds, A. P.; Orlans, N.; Genevieve, W.; and Watson, C. I. 2011. NIST Special Database 32: Multiple Encounter Dataset II (MEDS-II). NIST Interagency/Internal Report (NISTIR) 7807.
Gong, Z.; Wang, W.; and Ku, W.-S. 2017. Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960.
Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and harnessing adversarial examples. In ICLR; arXiv preprint arXiv:1412.6572.
Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; and Baker, S. 2010. Multi-PIE. Image and Vision Computing.
Grosse, K.; Manoharan, P.; Papernot, N.; Backes, M.; and McDaniel, P. 2017. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280.
King, D. E. 2009. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research.
Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
Liang, B.; Li, H.; Su, M.; Li, X.; Shi, W.; and Wang, X. 2017. Detecting adversarial examples in deep networks with adaptive noise reduction. CoRR abs/1705.08378.
Liu, J.; Deng, Y.; Bai, T.; and Huang, C. 2015. Targeting ultimate accuracy: Face recognition via deep embedding. CoRR abs/1506.07310.
Lu, J.; Issaranon, T.; and Forsyth, D. 2017. SafetyNet: Detecting and rejecting adversarial examples robustly. arXiv preprint arXiv:1704.00103.
Majumdar, A.; Singh, R.; and Vatsa, M. 2017. Face verification via class sparsity based supervised encoding. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Metzen, J. H.; Genewein, T.; Fischer, V.; and Bischoff, B. 2017. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267.
Moosavi-Dezfooli, S.-M.; Fawzi, A.; Fawzi, O.; and Frossard, P. 2017. Universal adversarial perturbations. In IEEE Conference on Computer Vision and Pattern Recognition.
Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. DeepFool: A simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2574-2582.
Nguyen, A.; Yosinski, J.; and Clune, J. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition, 427-436.
Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z. B.; and Swami, A. 2016. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, 372-387.
Parkhi, O. M.; Vedaldi, A.; and Zisserman, A. 2015. Deep face recognition. In British Machine Vision Conference, volume 1, 6.
Rauber, J.; Brendel, W.; and Bethge, M. 2017. Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models. CoRR abs/1707.04131.
Schroff, F.; Kalenichenko, D.; and Philbin, J. 2015. FaceNet: A unified embedding for face recognition and clustering. In IEEE Conference on Computer Vision and Pattern Recognition, 815-823.
Sharif, M.; Bhagavatula, S.; Bauer, L.; and Reiter, M. K. 2016. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In ACM SIGSAC Conference on Computer and Communications Security, 1528-1540.
Sun, Y.; Wang, X.; and Tang, X. 2015. Deeply learned face representations are sparse, selective, and robust. In IEEE Conference on Computer Vision and Pattern Recognition, 2892-2900.
Suykens, J. A., and Vandewalle, J. 1999. Least squares support vector machine classifiers. Neural Processing Letters.
Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2014. Intriguing properties of neural networks. In ICLR; arXiv preprint arXiv:1312.6199.
Taigman, Y.; Yang, M.; Ranzato, M.; and Wolf, L. 2014. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition, 1701-1708.
Viola, P., and Jones, M. J. 2004. Robust real-time face detection. International Journal of Computer Vision.
Wu, X.; He, R.; Sun, Z.; and Tan, T. 2015. A light CNN for deep face representation with noisy labels. arXiv preprint arXiv:1511.02683.