Oriole: Thwarting Privacy against Trustworthy Deep Learning Models
Liuqiao Chen, Hu Wang, Benjamin Zi Hao Zhao, Minhui Xue, Haifeng Qian
East China Normal University, Shanghai, China; The University of Adelaide, Adelaide, Australia; The University of New South Wales and Data61-CSIRO, Australia
[email protected]
Abstract.
Deep Neural Networks have achieved unprecedented success in the field of face recognition, such that any individual can crawl the data of others from the Internet without their explicit permission for the purpose of training high-precision face recognition models, creating a serious violation of privacy. Recently, a well-known system named Fawkes [34] (published in USENIX Security 2020) claimed this privacy threat can be neutralized by uploading cloaked user images instead of their original images. In this paper, we present Oriole, a system that combines the advantages of data poisoning attacks and evasion attacks, to thwart the protection offered by Fawkes, by training the attacker's face recognition model with multi-cloaked images generated by Oriole. Consequently, the face recognition accuracy of the attack model is maintained and the weaknesses of Fawkes are revealed. Experimental results show that our proposed Oriole system is able to effectively interfere with the performance of the Fawkes system to achieve promising attacking results. Our ablation study highlights multiple principal factors that affect the performance of the Oriole system, including the DSSIM perturbation budget, the ratio of leaked clean user images, and the number of multi-cloaks for each uncloaked image. We also identify and discuss at length the vulnerabilities of Fawkes. We hope that the new methodology presented in this paper will inform the security community of a need to design more robust privacy-preserving deep learning models.
Keywords:
Data poisoning · Deep learning privacy · Facial Recognition · Multi-cloaks
1 Introduction

Facial recognition is one of the most important biometrics of mankind and is frequently used in daily human communication [1]. Facial recognition, as an emerging technology composed of detection, capture, and matching, has been successfully adapted to various fields: photography [30], video surveillance [3], and mobile payments [38]. With the tremendous success gained by deep learning techniques, current deep neural facial recognition models map an individual's biometric information into a feature space and store it as faceprints. Consequently, features of a live captured image are extracted for comparison with the stored faceprints. Currently, many prominent vendors offer high-quality facial recognition tools or services, including NEC [28], Aware [2], Google [15], and Face++ [11] (a Chinese tech giant, Megvii). According to the industry research report "Market Analysis Report" [31], the global facial recognition market was valued in the billions of dollars.

In this paper, we present Oriole, a system designed to render the Fawkes system ineffective. In Fawkes, the target class is selected from the public dataset. In contrast, Oriole implements a white-box attack to artificially choose multiple targets and acquire the corresponding multiple cloaked images of leaked user photos. With the help of the proposed multi-cloaks, the protection of Fawkes becomes fragile. To do so, the attacker utilizes the multi-cloaks to train the face recognition model. During the test phase, after the original user images are collected, the attacker inputs the Fawkes-cloaked image into the model for face recognition. As a result, in the feature space, the features of cloaked photos will inevitably fall into the range of marked multi-cloaks. Therefore, the user images can still be recognized even if they are cloaked by Fawkes. We also highlight the intrinsic weakness of Fawkes: the imperceptibility of images before and after cloaking is limited when encountering high-resolution images, as cloaked images may include spots, acne, and even disfigurement. This will result in the reluctance of users to upload their disfigured photos.

In summary, our main contributions in this paper are as follows:

– The Proposal of Oriole.
We design, implement, and evaluate
Oriole, a neural-based system that makes attack models indifferent to the protection of Fawkes. Specifically, in the training phase, we produce the most relevant multi-cloaks according to the leaked user photos and mix them into the training data to obtain a face recognition model. During the testing phase, when encountering uncloaked images, we first cloak them with Fawkes and then feed them into the attack model. By doing so, the user images can still be recognized even if they are protected by Fawkes.

Fig. 1. The differences between data poisoning attacks and decision-time attacks. Data poisoning attacks modify the training data before the model training process. In contrast, decision-time attacks are performed after model training to induce the model to make erroneous predictions.

– Empirical Results.
We provide experimental results to show the effectiveness of Oriole in interfering with Fawkes. We also identify multiple principal factors that affect the performance of the Oriole system, including the DSSIM perturbation budget, the ratio of leaked clean user images, and the number of multi-cloaks for each uncloaked image. Furthermore, we identify and discuss at length the intrinsic vulnerability of Fawkes when dealing with high-resolution images.
2 Background

In this section, we briefly introduce defense strategies against data poisoning attacks and decision-time attacks. Figure 1 highlights the differences between data poisoning attacks and decision-time attacks. We then introduce white-box attacks. The Fawkes system is detailed at the end of this section.
2.1 Data Poisoning Attacks

In the scenario of data poisoning attacks, the model's decision boundary is shifted by the injection of adversarial data points into the training set: the adversary deliberately manipulates the training data so that the added poisoned data has a vastly different distribution from the original training data. Prior research primarily involves two common defense strategies.
First, anomaly detection models [40] function efficiently if the injected data has obvious differences compared to the original training data. Unfortunately, anomaly detection models become ineffective if the adversarial examples are inconspicuous. Similar ideas have been utilized in digital watermarking and data hiding [45]. Second, it is common to analyze the impact of newly added training samples on the accuracy of the model. For example, Reject On Negative Impact (RONI) was proposed against spam filter poisoning attacks, while target-aware RONI (tRONI) builds on the observation that RONI fails to mitigate targeted attacks [35]. Other notable methods include TRIM [22], STRIP [13], and, more simply, human analysis of training data likely to be attacked [26].
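To make the RONI defense concrete, the following is a minimal sketch under our own assumptions (the `train_fn`/`eval_fn` callables and the `tolerance` threshold are placeholders, not the implementation from [35]): a candidate sample is kept only if adding it does not reduce validation accuracy.

```python
from typing import Callable, List, Sequence

def roni_filter(candidates: Sequence, base_train: List, val_set,
                train_fn: Callable, eval_fn: Callable,
                tolerance: float = 0.0) -> List:
    """Reject On Negative Impact (RONI), sketched: retrain with each
    candidate and reject it if validation accuracy drops."""
    baseline_acc = eval_fn(train_fn(base_train), val_set)
    accepted = []
    for sample in candidates:
        acc = eval_fn(train_fn(base_train + [sample]), val_set)
        if baseline_acc - acc <= tolerance:  # no negative impact observed
            accepted.append(sample)
    return accepted
```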
2.2 Decision-Time Attacks

In decision-time attacks, assuming that the model has already been learned, the attacker leads the model to produce erroneous predictions by making reactive changes to the input. Decision-time attacks can be divided into several categories, the most common of which is the evasion attack. The conventional evasion attack can be further broken down into five categories: gradient-based attacks [6, 8, 25], confidence score attacks [21, 9], hard label attacks [4], surrogate model attacks [47], and brute-force attacks [10, 17, 12]. Adversarial training is presently one of the most effective defenses: adversarial samples, correctly labeled, are added to the training set to enhance model robustness. Input modification [24], extra classes [19], and detection [27, 16] are also common defense techniques against evasion attacks. Alternative defenses against decision-time attacks involve iterative retraining [23, 37] and decision randomization [33].
2.3 White-Box Attacks

In white-box attacks, the adversary has full access to the target DNN model's parameters and architecture. For any specified input, the attacker can calculate the intermediate computations of each step as well as the corresponding output. Therefore, the attacker can leverage the outputs and the intermediate results of the hidden layers of the target model to implement a successful attack. Goodfellow et al. [14] introduce the fast gradient sign method (FGSM), which attacks neural network models with adversarial examples perturbed according to the gradient of the loss with respect to the input image. The adversarial attack proposed by Carlini and Wagner is by far one of the most efficient white-box attacks [7].
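As an illustration, a minimal PyTorch sketch of FGSM follows; it assumes inputs normalized to [0, 1] and is a simplification of the method in [14] rather than the authors' code.

```python
import torch

def fgsm_attack(model: torch.nn.Module, loss_fn, x: torch.Tensor,
                y: torch.Tensor, eps: float) -> torch.Tensor:
    """One-step FGSM: perturb x by eps along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```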
2.4 Fawkes

Fawkes [34] provides privacy protection against the unauthorized training of models on user images collected without consent by the attacker. Fawkes achieves this by providing a simple means for users to add imperceptible perturbations to their original photos before uploading them to social media or the public web. When processed by Fawkes, the cloaked and uncloaked images are hugely different in the feature space but perceptually similar. The Fawkes system cloaks images by choosing (in advance) a specific target class that differs greatly from the original image; it then cloaks the clean images to obtain cloaked images with greatly altered feature representations that remain indistinguishable to the naked eye. When trained on these cloaked images, the attacker's model produces incorrect outputs when encountering clean images. However, Fawkes may be at risk of white-box attacks: if the adversary can obtain full knowledge of the target model's parameters and architecture, then for any specified input the attacker can calculate any intermediate computation and the corresponding output, and can leverage the results of each step to implement a successful attack.
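Conceptually, the cloaking step can be written as a small feature-space optimization. The sketch below is our own simplification: it penalizes the perturbation with an L2 term (weight `lam`), whereas Fawkes itself bounds the perturbation with a DSSIM budget; `feature_extractor` stands for the feature projector.

```python
import torch

def cloak(x, x_target, feature_extractor, steps=100, lr=0.01, lam=10.0):
    """Push Phi(x + delta) toward Phi(x_target) with a small perturbation."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target_feat = feature_extractor(x_target).detach()
    for _ in range(steps):
        opt.zero_grad()
        feat = feature_extractor(x + delta)
        # feature-space distance to the target, plus a perturbation penalty
        loss = ((feat - target_feat) ** 2).sum() + lam * (delta ** 2).sum()
        loss.backward()
        opt.step()
    return (x + delta).detach()
```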
3 Overview of Oriole

For a clean image x of a user Alice, Oriole produces multi-cloaks by adding pixel-level perturbations to x while choosing multiple targets dissimilar to Alice in the feature space. That is, we first need to determine the target classes and their number for each user; then, we generate multi-cloaks with these selected classes. The process is detailed in Section 4.1.

Figure 2 illustrates the overview of the proposed Oriole system, together with both its connections to and its differences from Fawkes. In the proposed
Oriole, the implementation is divided into two stages: training and testing. In the training phase, the attacker inserts the multi-cloaks generated by the Oriole system into their training set. After model training, upon encountering clean user images, we use Fawkes to generate cloaked images; the cloaked images are then fed into the trained face recognition model to complete the recognition process.

Oriole has significant differences from Fawkes. On one hand, we adopt a data poisoning attack scheme against the face recognition model by modifying images with generated multi-cloaks. On the other hand, an evasion attack (to evade the protection) is applied during testing by converting clean images to their cloaked version before feeding them into the unauthorized face recognition model. Although the trained face recognition model cannot identify users in clean images, it can correctly recognize the cloaked images generated by Fawkes and then map them back to their "true" labels.
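At a high level, the two stages can be summarized as below; the callables are placeholders for the components detailed in Section 4, so this is a sketch of the control flow rather than a reference implementation.

```python
from typing import Callable, List, Sequence

def oriole_train(attacker_images: List, leaked_user_images: Sequence,
                 generate_multicloaks: Callable, train_model: Callable,
                 m: int, rho: float):
    """Training phase: poison the attacker's training set with multi-cloaks
    generated from the user's leaked clean images."""
    multicloaks = generate_multicloaks(leaked_user_images, m, rho)
    return train_model(attacker_images + list(multicloaks))

def oriole_identify(model, clean_test_image, fawkes_cloak: Callable):
    """Testing phase: cloak the clean probe first, then classify; the cloak
    falls into a feature region the poisoned model already ties to the user."""
    return model.predict(fawkes_cloak(clean_test_image))
```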
4 Design of Oriole

We now elaborate on the design details of Oriole, referring to the illustration of the Oriole process in Figure 3. Recall that the application of Oriole is divided into a training phase and a testing phase. The training phase can be further broken down into two steps. In the first step, the attacker A launches a data poisoning attack to mix the multi-cloaks into the training data (recall that the training data is collected without consent and has been protected by Fawkes).
Fig. 2.
The proposed Oriole system is able to successfully recognize faces even with the protection of Fawkes. Oriole achieves this by combining the concepts of data poisoning attacks and evasion attacks.
Then, in the second step, the unauthorized facial recognition model M is trained on the mixed training data. At test time, as an evasion attack, the attacker A first converts the clean testing images to their cloaked version by applying Fawkes, and the cloaked version is presented to the trained model M for identification.

In Figure 3, the images making up the attacker database D_A can be downloaded from the Internet as training data, while the user database D_U provides the user U's leaked and testing data. After obtaining the input images from the databases, we adopt MTCNN [46] for accurate face detection and localization as the preprocessing module [46, 42]. It outputs standardized images of a fixed size that contain only human faces. In the training phase, the attacker A mixes the processed images A′ and the multi-cloaks S_O of the user U into the training set to train the face recognition model M. In the testing phase, the attacker A first converts the preprocessed clean images U′_B into the cloaked images S_F, following the same procedure as described in Fawkes; then, the attacker A pipes S_F into the trained model M to fetch the results.

We assume that a user U has converted his/her clean images U_B into their cloaked form for privacy protection. However, the attacker A has collected some leaked clean images of the user U in advance, denoted as U_A. As shown in Figure 3, the user dataset U consists of U_A and U_B. In the proposed Oriole system, U_A is utilized for obtaining the multi-cloaks S_O, which involves a target set T_M with m categories out of the N categories of the public dataset (available at http://mirror.cs.uchicago.edu/fawkes/files/target_data/). Here, we denote G(X, m) as the set composed of the target classes corresponding to the m largest element values in the set X, where X contains the minimum distance between the feature vectors of the user and the centroids of the N categories (see Eq. 2). The L2 distances are measured between the image features in the projected space Φ(·) and the centroids of the N categories, and then the top m targets are selected.
Fig. 3.
The overall process of the proposed Oriole system, including both the training and testing stages. Images U taken from the leaked user database D_U are divided into two parts (U′_A and U′_B) after preprocessing. In the training phase, the attacker A mixes the generated multi-cloaks S_O into the training data. After training, the face recognition model M is obtained. During the testing phase, the attacker A first converts the clean images U′_B into cloaked images S_F and then pipes them into the trained model M to obtain a correct prediction.

$$X = \bigcup_{k=1}^{N} \Big\{ d \;\Big|\; d = \min_{x \in U_B} \mathrm{Dist}\big(\Phi(x), C_k\big) \Big\}, \qquad (1)$$

$$T_M = G(X, m) = \{T_1, T_2, \cdots, T_m\} = \bigcup_{i=1}^{m} T_i, \qquad (2)$$

where C_k represents the centroid of a certain target and Φ is the feature projector [34]. The distance function Dist(·) adopts the L2 distance. Next, the calculation of a cloak δ(x, x_{T_i}) is defined as:

$$\delta(x, x_{T_i}) = \min_{\delta} \mathrm{Dist}\big(\Phi(x_{T_i}), \Phi(x \oplus \delta(x, x_{T_i}))\big), \qquad (3)$$

where δ is subject to |δ(x, x_{T_i})| < ρ; here |δ(x, x_{T_i})| is calculated by DSSIM (Structural Dis-Similarity Index) [39, 41] and ρ is the perturbation budget. Then we can obtain the multi-cloaks S_O as follows:

$$S_O = \bigcup_{i=1}^{m} \big\{ s \;\big|\; s = x \oplus \delta(x, x_{T_i}) \big\}, \qquad (4)$$

where the multi-value m is a tunable hyper-parameter; m decides the number of multi-cloaks produced for each clean image.

Instead of training the model M with clean data, the attacker A mixes the multi-cloaks S_O calculated from Equation 4 with the preprocessed images U′_A to form the training set, and the deep convolutional face recognition model M is trained [32].
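The target-selection step of Eqs. (1)-(2) reduces to a nearest-centroid computation; a NumPy sketch follows (the array names are ours). The cloak of Eq. (3) is then computed once per selected target with the Fawkes optimizer under the DSSIM budget ρ, yielding the union in Eq. (4).

```python
import numpy as np

def select_targets(user_feats: np.ndarray, centroids: np.ndarray, m: int):
    """Eqs. (1)-(2): for every class centroid C_k, take the minimum L2
    distance to any user feature vector, then keep the indices of the m
    classes with the largest such distance (the most dissimilar targets).

    user_feats: (n_user, d) projected features Phi(x) of the user's images.
    centroids:  (N, d) centroids C_k of the public dataset's classes.
    """
    pairwise = np.linalg.norm(
        user_feats[:, None, :] - centroids[None, :, :], axis=-1)  # (n_user, N)
    X = pairwise.min(axis=0)        # the set X of Eq. (1), one value per class
    return np.argsort(X)[-m:]       # T_1, ..., T_m of Eq. (2)
```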
The last stage of Oriole is model testing. Unlike Fawkes, we do not directly apply clean images to the attack model. Instead, Oriole first makes subtle changes to the clean images before face identification inference. Specifically, we implement the subtle changes by cloaking the processed user images U′_B. Conceptually, the feature vectors of the cloaked images S_F will fall into the marked feature space of the multi-cloaks S_O. Then, the trained model M is able to correctly identify users through the cloaked images S_F.

Figure 4 illustrates the intuition behind the Oriole system. For the purposes of demonstration, we assume the multi-value m equals four. Put differently, we assume that Fawkes will select one of four targets for cloaking, from which the proposed Oriole system will attempt to obtain multi-cloaks associated with all four targets using a small number of the user U's leaked photos. In this scenario, we successfully link the feature spaces of our four target classes (T_1, T_2, T_3, and T_4) with the user U. Thus, when it comes to a new, clean image of U, we first cloak it with Fawkes. The cloaked version of the user image will inevitably fall into one of the marked feature spaces of the multi-cloaks (one target has been chosen for illustration in Figure 4(b); see the hollow green and red triangles for the clean and cloaked image features, respectively). As the cloaked image features lie in that target's region, and the multi-cloak-trained model now associates all four targets with U, the attacker can correctly identify the user's identity even with the protection of Fawkes.

We finally discuss the performance of Oriole when target classes are and are not included in the training data. We observe that, whether or not the m target classes are included in the training set, the Oriole system still functions effectively to thwart the protections offered by Fawkes. In Figure 4, assume that the feature vectors of the cloaked testing image are located in the high-dimensional feature space of some target class T_i. We first consider the case where users of T_i are not included in the attack model training process: we are able to map the user U to the feature space of T_i through the leaked images of the user U that were used to generate the multi-cloaks. Furthermore, Oriole still works when images of the target class T_i are included in the training set: even if the cloaked images of U are detected as T_i, the setting of Fawkes ensures that the cloaks of T_i occupy another area within the feature space that does not overlap with T_i itself. Thus, this special case does not interfere with the effectiveness of Oriole.

5 Experimental Evaluation

We implemented our
Oriole system on three popular image datasets against the Fawkes system.
Fig. 4.
The intuition behind why Oriole can help the attacker A successfully identify the user U even with the protection of Fawkes. We depict the process in a simplified 2D feature space with user classes B, C, D, T_1, T_2, T_3, T_4, and U. Figures (a) and (b) represent the decision boundaries of the model trained on U's clean photos and on multi-cloaks, respectively (with four targets). The white triangles represent the multi-cloaked images of U and the red triangles are the cloaked images of U. Oriole works as long as cloaked testing images fall into the same feature space as the multi-cloaked leaked images of U.

In our implementation, considering the size of the three datasets, we took the smallest, PubFig83 [29], as the user dataset, while the larger VGGFace2 [5] and CASIA-WebFace [44] were prepared for the attacker to train two face recognition models. In addition, we artificially created a high-definition face dataset to benchmark the data constraints surrounding the imperceptibility of the Fawkes system.

PubFig83 [29].
PubFig83 is a well-known dataset for face recognition research. It contains 13,838 cropped facial images belonging to 83 celebrities, each of whom has at least 100 pictures. In our experiment, we treat PubFig83 as a database for user sample selection, due to its relatively small number of tags and consistent picture resolution.
CASIA-WebFace [44].
The CASIA-WebFace dataset is the largest known public dataset for face recognition, consisting of a total of 903,304 images in 38,423 categories.
VGGFace2 [5].
VGGFace2 is a large-scale dataset containing 3.31 million images from 9,131 subjects, with an average of 362.6 images per subject. All images in VGGFace2 were collected from Google Image Search and are distributed as evenly as possible across gender, occupation, race, etc.
Models: M_V and M_CW. We chose VGGFace2 and CASIA-WebFace to train face recognition models separately for a real-world attacker simulation. In the preprocessing stage, MTCNN [46] is adopted for face alignment and Inception-ResNet-V1 [36] is selected as our model architecture; we then completed the model training process on a Tesla P100 GPU with TensorFlow r1.7. An Adam optimizer with a learning rate of $10^{-1}$ is used to train the models over 500 epochs. Here, we denote the models trained on the VGGFace2 and CASIA-WebFace datasets as M_V and M_CW; the LFW accuracies of these models reached 99.05% and above 99%, respectively.
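For reference, the detection-and-embedding pipeline can be reproduced with the facenet-pytorch package; this is our own PyTorch approximation (the paper's setup uses TensorFlow r1.7), and the file name is hypothetical.

```python
from facenet_pytorch import MTCNN, InceptionResnetV1
from PIL import Image

mtcnn = MTCNN(image_size=160)      # MTCNN [46]: detect, align, crop faces
embedder = InceptionResnetV1(pretrained='vggface2').eval()  # backbone as in [36]

img = Image.open('photo.jpg')      # hypothetical input image
face = mtcnn(img)                  # fixed-size face tensor, or None
if face is not None:
    faceprint = embedder(face.unsqueeze(0))  # (1, 512) feature vector
```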
Similar to the Fawkes system, the proposed Oriole system is designed for a user-attacker scenario, whereby the attacker trains a powerful model on a huge number of images collected from the Internet. The key difference is that Oriole assumes the attacker A is able to obtain a small percentage of leaked clean images of the user U. Through the evaluation of the Oriole system, we discover the relevant variables affecting its attack capability. In this case, we define a formula for facial recognition accuracy evaluation in Equation 5, where R represents the ratio of the user's multi-cloaks in the training data. The ranges of R and ρ are both set to [0, 1], and m (the number of multi-cloaks) is subject to the inequality 0 < m ≪ N, where N = 18,947 is the total number of target classes in the public dataset.
$$\mathrm{Accuracy} = k(R, m, \rho). \qquad (5)$$

Throughout our experimental evaluation, the ratio between the training data and the testing data is fixed at 1:1 (see Section 5.2 for the motivation behind this ratio).
Comparison between Fawkes and Oriole. We start by reproducing the Fawkes system against unauthorized face recognition models. Next, we employ the proposed Oriole scheme to invalidate the Fawkes system. We shall emphasize that the leaked data associated with the user is not directly used for training the attack model. Instead, we insert multi-cloaks actively produced by Oriole into the training process, which presents a significant difference in the way adversary training schemes deal with leaked data.

In particular, we randomly select a user U with 100 images from PubFig83 and divide their images equally into two non-intersecting parts, U_A and U_B, each of which contains 50 images. We evaluate both Fawkes and Oriole in two settings for comparison. In the first setting, we mix the multi-cloaks of the processed U′_A into the training data to train the face recognition model M and test the accuracy of this model M with the processed U′_B in the testing phase (see Figure 3). In the second setting, we replace the clean images of U_A with the corresponding cloaked images (by applying Fawkes) to obtain a secondary measure of accuracy.

Fig. 5.
Evaluation of the impact of Oriole against Fawkes through two models, M_V and M_CW. The two figures depict the performance of the face recognition model M with Fawkes alone and equipped with Oriole. There are clear observations from the two figures: the larger the DSSIM perturbation budget ρ, the higher the resulting face recognition accuracy obtained from model M. Additionally, it demonstrates that our proposed Oriole system can successfully bypass protections offered by Fawkes.

Figure 5 shows the variation in facial recognition accuracy with the DSSIM perturbation budget, and displays the performance of Oriole against Fawkes protection. We implement this process on two different models, M_V and M_CW. The former's training data consists of the leaked images U_A and all images in VGGFace2, while the latter's contains the leaked images U_A and all images in CASIA-WebFace. All experiments were repeated three times and the results presented are averages.

It can be seen from Figure 5 that there is a clear trend: the facial recognition ratio of the two models rises significantly as the DSSIM perturbation budget ρ increases from 0.1 to 1. Specifically, Oriole improves the accuracy of the face recognition model M_V from 12.0% to 87.5%, while the accuracy of the model M_CW increases from 0.111 to 0.763 when the parameter ρ is set to 0.008. We notice that the accuracy of the two models M_V and M_CW improves nearly seven-fold compared to the scenario where Fawkes is used to protect privacy. From these results, we empirically find that Oriole can neutralize the protections offered by Fawkes, invalidating its protection of images in unauthorized deep learning models. Figure 6 shows an uncloaked image and its related multi-cloaks (ρ = 0.008, m = 20). The feature representation of the clean image framed by a red outline is dissimilar from that of the remaining 20 images. Figure 7 shows the two-dimensional Principal Component Analysis (PCA) of the face recognition system, validating our theoretical analysis (for ρ = 0.008, m = 4). The feature representations of the clean images are mapped to the feature space of the four target classes through multi-cloaks. We then mark the corresponding feature spaces as part of identity U and identify the test images of U by cloaking them.

Fig. 6.
An example of a clean image of the user U and 20 multi-cloaks produced by Oriole . The uncloaked image has been framed by a red outline.
Table 1.
The four models used in our verification and their classification accuracy on PubFig83. The "Basic" column represents conventional face recognition. The "Fawkes" column represents the case where only Fawkes is used to fool the face recognition model for privacy protection. The "Oriole" column represents the performance of Oriole.

Dataset         Model Architecture    Basic  Fawkes  Oriole
CASIA-WebFace   Inception-ResNet-V1   0.973  0.111   0.763
CASIA-WebFace   DenseNet-121          0.982  0.214   0.753
VGGFace2        Inception-ResNet-V1   0.976  0.120   0.875
VGGFace2        DenseNet-121          0.964  0.117   0.714
We show the general effectiveness of the proposed Oriole system in Table 1. We build four models with two different architectures, Inception-ResNet-V1 [36] and DenseNet-121 [20], on the two aforementioned datasets. The model equipped with Oriole significantly outperforms the model without it across different setups. The experimental results demonstrate that the Oriole system retains test accuracy above 70% across all listed settings, even with the protection of Fawkes. For instance, on the VGGFace2 dataset with Inception-ResNet-V1 as the backbone architecture, Oriole increases the attack success rate from 12.0% to 87.5%, significantly boosting the attack effectiveness.
Main factors contributing to the performance of
Oriole. There are three main factors influencing the performance of
Oriole: 1) the DSSIM perturbation budget ρ, 2) the ratio of leaked clean images R, and 3) the number of multi-cloaks for each uncloaked image m. Different DSSIM perturbation budgets ρ have already been discussed in the previous paragraph. We now explore the impact of the R and m values on the model's performance. Up until this point we have performed experiments with the default values of R, m, and ρ as 1, 20, and 0.008, respectively, to enable a fair comparison.

Fig. 7. Two-dimensional PCA of the Oriole system's feature space. Triangles are the user's leaked images (solid) and testing data (hollow); dots represent multi-cloaks of the leaked images (magenta) and images from the target classes (black); red crosses are cloaked images of the testing data; blue squares are images from another class.
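A projection like Figure 7 can be produced by fitting one shared PCA over all feature groups and scattering each group; a short sketch, with our own variable names, follows.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_feature_space(groups: dict) -> None:
    """groups maps a label (e.g. 'multi-cloaks of U') to an (n, d) array of
    feature vectors; every group is projected with the same fitted PCA."""
    pca = PCA(n_components=2).fit(np.vstack(list(groups.values())))
    for label, feats in groups.items():
        xy = pca.transform(feats)
        plt.scatter(xy[:, 0], xy[:, 1], label=label, s=12)
    plt.xlabel('Dimension 1')
    plt.ylabel('Dimension 2')
    plt.legend()
    plt.show()
```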
Oriole system’s performance. We observe that the facialrecognition success ratio increases monotonically as the number of multi-cloaks m increases, and this rise occurs until m reaches 20, whereby the success ratioplateaus. We can conclude that the facial recognition success ratio grows withthe ratio of leaked clean images R . The ratio increases at least three times when R increases from 0.1 to 1. Model validation.
In order to ensure the validity of
Oriole, as a comparative experiment, we evaluate the models M_V and M_CW on PubFig83. We divide PubFig83 into 10 training-testing set pairs with different proportions and build classifiers with the help of the two pre-trained models. We obtained 20 experimental results depending on which model, M_V or M_CW, was used, with ratios selected between 0.1 and 1, as shown in Table 2. The experimental results show that the accuracy of the FaceNet-based models M_V and M_CW increases monotonically as the ratio of the training set to the testing set increases. Both models exceed 96% recognition accuracy on PubFig83 when the selected ratio between training and testing sets is 0.5. Consequently, models M_V and M_CW are capable of verifying the performance of Oriole.
Fig. 8.
The facial recognition accuracy changes with different ratios of leaked clean images R and numbers of multi-cloaks for each uncloaked image m.

Table 2.
The test accuracy of models M_V (trained on VGGFace2) and M_CW (trained on CASIA-WebFace) across different rates of PubFig83. The rate in the first column represents the ratio of the sizes of the training and test sets. The test accuracy is the overall correct classification score for clean images.

Rate | Test Accuracy of M_V | Test Accuracy of M_CW

6 Discussion

Shan et al. [34] claim that cloaked images with small added perturbations are indistinguishable to the naked human eye. However, we show that the imperceptibility of Fawkes is limited due to an inherent imperfection, and that it is vulnerable to white-box attacks. For practical applications, users tend to upload clear, high-resolution pictures to better share their life experiences. Through our empirical study, we find that Fawkes is able to make imperceptible changes to low-resolution images, such as those in the PubFig83 dataset. However, when it comes to high-resolution images, the perturbation between cloaked photos and their originals is plainly apparent.
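The perceptual distortion discussed here is measured by DSSIM; a common formulation is (1 - SSIM)/2 [39, 41]. The sketch below uses scikit-image's single-scale SSIM as an approximation of the multiscale variant.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def dssim(original: np.ndarray, cloaked: np.ndarray) -> float:
    """Structural dis-similarity between an image and its cloaked version;
    the Fawkes constraint keeps this value below the budget rho."""
    s = ssim(original, cloaked, channel_axis=-1, data_range=255)
    return (1.0 - s) / 2.0
```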
To demonstrate the limitations of Fawkes for high-resolution images, we manually collect 54 high-quality pictures covering different genders, ages, and regions, whose resolution is more than 300 times that of PubFig83 images (width × height is at least 3,000,000 pixels). We conduct an experiment setting the perturbation budget ρ to 0.007 and running the optimization process for 1,000 iterations with a learning rate of 0.5, in the same experimental setting as described in Fawkes [34]. A sample of the resulting images from this experiment is displayed in Figure 9; these figures show images of the same users before (a) and after (b) being cloaked by Fawkes. From these figures, we can easily observe significant differences with and without cloaking. Notably, there are many wrinkles, shadows, and irregular purple spots on the boy's face in the cloaked image. This protection may result in the reluctance of users to post the cloaked images online.

Sybil accounts are fake or bogus identities created by a malicious user to inflate their resources and influence in a target community [43]. A Sybil account, existing in the same online community, is a separate account from the user U's original one, but the account can be crafted to bolster cloaking effectiveness and boost privacy protection in Fawkes when clean, uncloaked images are leaked for training [34]. Fawkes modifies the Sybil images to protect the user's original images from being recognized. These Sybil images induce the model to misclassify because they occupy the same area of the feature space as U's uncloaked images. Against Oriole, however, Sybil accounts are ineffective, since the clean images are first cloaked before testing: the feature space of cloaked images is vastly different from that of the originals, and these cloaked photos occupy a different area of the feature space from both the Sybil images and the clean images. To put it differently, no defense is offered irrespective of how many Sybil accounts the user owns, as cloaked images and uncloaked images occupy different feature spaces. We are also able to increase the number of multi-cloaks m in step with Fawkes to ensure the robustness of Oriole, owing to the white-box nature of the attack.
7 Conclusion

In this work, we present Oriole, a novel system that combines the advantages of data poisoning attacks and evasion attacks to invalidate the privacy protection of Fawkes. To achieve our goals, we first train the face recognition model with multi-cloaked images and then test the trained model with cloaked images. Our empirical results demonstrate the effectiveness of the proposed Oriole system. We have also identified multiple principal factors affecting the performance of the Oriole system. Moreover, we lay out the limitations of Fawkes and discuss them at length. We hope that the attack methodology developed in this paper will inform the security and privacy community of a pressing need to design better privacy-preserving deep neural models.
Fig. 9.
Comparison between the cloaked and the uncloaked versions of high-resolution images. Note that there are wrinkles, shadows, and irregular purple spots on the faces in the cloaked images.

References

[1] Akbari, R., Mozaffari, S.: Performance enhancement of PCA-based face recognition system via gender classification method. In: 2010 6th Iranian Conference on Machine Vision and Image Processing, pp. 1–6. IEEE (2010)
[2] Aware Nexa Face™, https://aware.com/biometrics/nexa-facial-recognition/
[3] Bashbaghi, S., Granger, E., Sabourin, R., Parchami, M.: Deep learning architectures for face recognition in video surveillance. In: Deep Learning in Object Detection and Recognition, pp. 133–154. Springer (2019)
[4] Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248 (2017)
[5] Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: A dataset for recognising faces across pose and age. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 67–74. IEEE (2018)
[6] Carlini, N., Wagner, D.: Adversarial examples are not easily detected: Bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14 (2017)
[7] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)
[8] Chen, P., Sharma, Y., Zhang, H., Yi, J., Hsieh, C.: EAD: Elastic-net attacks to deep neural networks via adversarial examples. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 10–17. AAAI Press (2018)
[9] Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J.: ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26 (2017)
[10] Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: Exploring the landscape of spatial robustness. In: International Conference on Machine Learning, pp. 1802–1811. PMLR (2019)
[11] Face++ Face Searching API, https://faceplusplus.com/face-searching/
[12] Ford, N., Gilmer, J., Carlini, N., Cubuk, E.D.: Adversarial examples are a natural consequence of test error in noise. CoRR abs/1901.10513 (2019), http://arxiv.org/abs/1901.10513
[13] Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., Nepal, S.: STRIP: A defence against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113–125 (2019)
[14] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
[15] Google Cloud Vision AI, https://cloud.google.com/vision/
[24] Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., Zhu, J.: Defense against adversarial attacks using high-level representation guided denoiser. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1778–1787 (2018)
[25] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
[26] Mei, S., Zhu, X.: Using machine teaching to identify optimal training-set attacks on machine learners. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 2871–2877. AAAI Press (2015)
[27] Meng, D., Chen, H.: MagNet: A two-pronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147 (2017)
[28] NEC Face Recognition API, https://nec.com/en/global/solutions/biometrics/face/
[29] Pinto, N., Stone, Z., Zickler, T., Cox, D.: Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on Facebook. In: CVPR 2011 Workshops, pp. 35–42. IEEE (2011)
[30] Rasti, P., Uiboupin, T., Escalera, S., Anbarjafari, G.: Convolutional neural network super resolution for face recognition in surveillance monitoring. In: International Conference on Articulated Motion and Deformable Objects, pp. 175–184. Springer (2016)
[31] Research, G.V.: Facial Recognition Market Size, Share & Trends Analysis Report By Technology (2D, 3D), By Application (Emotion Recognition, Attendance Tracking & Monitoring), By End-use, And Segment Forecasts, 2020–2027
[32] Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
[33] Shah, R., Gaston, J., Harvey, M., McNamara, M., Ramos, O., You, Y., Alhajjar, E.: Evaluating evasion attack methods on binary network traffic classifiers. In: Proceedings of the Conference on Information Systems Applied Research, vol. 2167, p. 1508 (2019)
[34] Shan, S., Wenger, E., Zhang, J., Li, H., Zheng, H., Zhao, B.Y.: Fawkes: Protecting privacy against unauthorized deep learning models. In: 29th USENIX Security Symposium (USENIX Security 20), pp. 1589–1604 (2020)
[35] Suciu, O., Marginean, R., Kaya, Y., Daume III, H., Dumitras, T.: When does machine learning FAIL? Generalized transferability for evasion and poisoning attacks. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 1299–1316 (2018)
[36] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)
[37] Tong, L., Li, B., Hajaj, C., Xiao, C., Zhang, N., Vorobeychik, Y.: Improving robustness of ML classifiers against realizable evasion attacks using conserved features. In: 28th USENIX Security Symposium (USENIX Security 19), pp. 285–302 (2019)
[38] Vazquez-Fernandez, E., Gonzalez-Jimenez, D.: Face recognition for authentication on mobile devices. Image and Vision Computing, 31–33 (2016)
[39] Wang, B., Yao, Y., Viswanath, B., Zheng, H., Zhao, B.Y.: With great training comes great vulnerability: Practical attacks against transfer learning. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 1281–1297 (2018)
[40] Wang, H., Pang, G., Shen, C., Ma, C.: Unsupervised representation learning by predicting random distances. arXiv preprint arXiv:1912.12186 (2019)
[41] Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, pp. 1398–1402 (2003). https://doi.org/10.1109/ACSSC.2003.1292216
[42] Xiang, J., Zhu, G.: Joint face detection and facial expression recognition with MTCNN. In: 2017 4th International Conference on Information Science and Control Engineering (ICISCE), pp. 424–427. IEEE (2017)
[43] Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network sybils in the wild. ACM Transactions on Knowledge Discovery from Data (TKDD) (1), 1–29 (2014)
[44] Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch. arXiv preprint arXiv:1411.7923 (2014)
[45] Zhang, H., Wang, H., Li, Y., Cao, Y., Shen, C.: Robust watermarking using inverse gradient attention. arXiv preprint arXiv:2011.10850 (2020)
[46] Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23