Obfuscation of Images via Differential Privacy: From Facial Images to General Images
William L. Croft, Jörg-Rüdiger Sack, and Wei Shi
Carleton University, School of Computer Science, Ottawa, Canada
Carleton University, School of Information Technology, Ottawa, Canada
February 23, 2021

∗ We gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grants No. RGPIN-2020-06482, No. RGPIN-2016-06253 and No. CGSD2-503941-2017.
Abstract
Due to the pervasiveness of image capturing devices in everyday life, images of individuals are routinely captured. Although this has enabled many benefits, it also infringes on personal privacy. A promising direction in research on obfuscation of facial images has been the work in the k-same family of methods which employ the concept of k-anonymity from database privacy. However, there are a number of deficiencies of k-anonymity that carry over to the k-same methods, detracting from their usefulness in practice. In this paper, we first outline several of these deficiencies and discuss their implications in the context of facial obfuscation. We then develop a framework through which we obtain a formal differentially private guarantee for the obfuscation of facial images in generative machine learning models. Our approach provides a provable privacy guarantee that is not susceptible to the outlined deficiencies of k-same obfuscation and produces photo-realistic obfuscated output. In addition, we demonstrate through experimental comparisons that our approach can achieve comparable utility to k-same obfuscation in terms of preservation of useful features in the images. Furthermore, we propose a method to achieve differential privacy for any image (i.e., without restriction to facial images) through the direct modification of pixel intensities. Although the addition of noise to pixel intensities does not provide the high visual quality obtained via generative machine learning models, it offers greater versatility by eliminating the need for a trained model. We demonstrate that our proposed use of the exponential mechanism in this context is able to provide superior visual quality to pixel-space obfuscation using the Laplace mechanism.

Keywords: Privacy protection, Facial obfuscation, Differential privacy, Neural networks
1 Introduction

With the ever expanding presence of devices used to capture photos and video, visual privacy has become increasingly important. Images and video frames containing faces are routinely captured, e.g., through cameras, closed-circuit television systems [1], visual sensor networks [2] and a host of other devices and methods [3]. These systems have many benefits including mitigation of crime [1, 2], improved care in assisted-living [4], and useful services such as Google Street View [5]. However, despite the benefits of the legitimate applications, the potential for infringement on personal privacy must be taken seriously.

Although many systems require only visual monitoring of behaviour, identities are often captured as well [2]. In some areas, the degree of public surveillance is reaching levels where it becomes possible to profile and track much of the population [4]. In cases where visual information is disseminated to the public, such as with Google Street View, it is imperative to hide the identities of individuals before the images are published [6]. Failure to sufficiently protect privacy may allow undesirable inferences to be drawn about individuals or enable malicious activities such as voyeurism or stalking. Users of mobile devices have also expressed strong aversion to the collection of images from their mobile devices via the applications they use [7]. Even in scenarios where users willingly share images to online platforms, they have expressed concerns over who is able to view their images [8]. In a similar context, the privacy of individuals captured accidentally or without authorization in the backgrounds of images uploaded to such platforms should be taken into consideration. Rich visual information from these sources combined with the great advances in machine learning approaches to facial recognition (e.g., VGGFace [9]) makes the exploitation of unprotected visual data a relatively easy task. While such machine learning algorithms are no doubt beneficial in many contexts, it is essential for privacy protection approaches to be resistant to them.

To protect the privacy of individuals, methods for hiding identity via manipulation of the data can be employed. Many methods focus on obfuscation of facial identity since the face is often the most identifiable piece of information [4, 3]. In this context, facial identity refers to a visual representation of an individual's face which may potentially be exploited for unique identification of the individual. Trivially, the face could be covered by a uniformly coloured rectangle. This destroys all information about the face, guaranteeing that it can no longer be exploited to reveal an identity. However, this also destroys a great deal of utility and visual appeal. In scenarios where images are shared on online platforms, users have expressed a strong aversion to the use of such rectangles with respect to the visual quality and information content of the images [10, 11].

Less severe methods of obfuscation present trade-offs between the level of privacy attainable and the utility of the data. It is well-known that methods of privacy protection necessitate such a trade-off [12, 13].
For example, a study on privacy-preserving systems used for visual awareness of co-workers in remote locations found that the level of privacy protection applied to hide identities directly impacted the value of the data to those using the systems [14]. Preservation of utility is especially important for machine learning and data mining. Visual data can be used to learn about customers in retail environments [15, 16] and to detect anomalous or illegal events [17, 18]. It is therefore essential for a good method of obfuscation to preserve as much of the non-sensitive information as possible.

The exact specification of how utility is defined and what information is considered non-sensitive depends upon the context in which obfuscation is employed. For example, if data mining is to be employed for retail analytics, customer identity must be protected; however, certain demographic information may be deemed acceptable for release. This may lead to an interpretation of utility based on the degree of preservation of demographic information required for a data mining algorithm. In the case of images shared on social media, utility may instead be focused on a measure of visual aesthetic to ensure that obfuscation of bystanders in photos does not detract from the user experience. Regardless of the scenario, the goal of facial obfuscation is to provide an acceptable trade-off between protection of facial identity and preservation of non-sensitive information beneficial for utility in the obfuscated output.

A number of research directions have been explored for the obfuscation of visual identity in images, e.g., pixelization, blurring, etc. [4]. While many approaches lack a formal privacy guarantee, the k-same [19] family of approaches has gained a great deal of traction, largely thanks to its guarantee that for a chosen privacy parameter k, obfuscated individuals are indistinguishable within a group of k potential true identities. Although this privacy guarantee is appealing, it suffers from susceptibilities (e.g., composition attacks [20]) carried over from the disclosure control method of k-anonymity on which it is based. In this paper, we outline these susceptibilities in the context of privacy in images and propose an alternative, based on differential privacy, that addresses these susceptibilities.

Note that the task of facial obfuscation, which we address in this work, differs from privacy-preserving machine learning performed on facial images. Although both have the goal of preventing the leakage of sensitive information on facial identity, the former releases obfuscated facial images while the latter releases a model trained on facial images. Facial obfuscation is a more general goal which can be applied in various ways. The obfuscated images may be used to enable privacy-preserving machine learning, may be shared publicly online, or may be used in other settings that require the protection of facial identity.

Our contributions in this work are as follows:

• We examine susceptibilities of k-same obfuscation to composition attacks and inferences using background knowledge. We discuss theoretically and demonstrate empirically how the privacy guarantee can be violated. We discuss the implications this has on privacy in images and examine certain difficulties in the practical usage of k-same obfuscation.

• We propose the use of the formal privacy guarantee of differential privacy as a means to address the deficiencies of k-same obfuscation.
We develop the first framework to apply differential privacy for the obfuscation of facial identity in images via generative machine learning models.

• We additionally propose a method to enforce differential privacy via the direct modification of pixel intensities. By giving up the high visual quality provided by generative models, this allows for a much more versatile approach that can obfuscate any image. We employ a process guided by an image quality function in order to improve the visual quality over the existing results for differential privacy in pixel-space.

• We conduct a series of experiments to compare differential privacy to k-same obfuscation on two well-known datasets for facial images. We demonstrate the resilience of differential privacy to composition and parrot attacks. Furthermore, the results of our experiments suggest that differential privacy offers a comparable level of utility in the obfuscated images to k-same obfuscation while providing a stronger privacy guarantee.

• We provide recommendations for the implementation of generative models for facial obfuscation as well as for image obfuscation in pixel-space to obtain effective and practical privacy protection.

We provide a review of existing work on the obfuscation of facial images in Section 2. We then cover the deficiencies of k-same in Section 3 and lay out a framework for differentially private obfuscation of images in Section 4. In Section 5, we propose a method to obfuscate images in pixel-space using the exponential mechanism. Finally, we describe our experiments, provide comparisons between the methods of obfuscation and give an analysis of the results in Section 6.

2 Related Work

Perhaps the most well-known and earliest studied alterations to images for the prevention of human recognition of faces are pixelization [21] and blurring [22]. Pixelization decreases the information conveyed in an image by dividing the image into a grid of cells and setting all pixels within each cell to a common pixel intensity. Blurring involves convolving the image with a smoothing filter, typically Gaussian. While these methods have been successful at foiling human recognition, they have been shown to be highly ineffective against machine recognition [19].

Other ad hoc methods of privacy protection involving variations on blurring [23], warping [24], morphing [25] and face swapping [26, 27] have been studied. However, the methods that have gained the most momentum are those which offer a formal guarantee of privacy. This trend has been reinforced by the legal and legislative demands in the broader context of the release of sensitive data [28, 29, 3]. To this end, k-same approaches have been quite successful. These approaches use an adaptation of k-anonymity [30], a concept from the field of database privacy that guarantees that an anonymized database record is linkable to at least k possible identities. The first adaptation of this concept to image obfuscation worked by aligning a set of input images on their facial features, partitioning the set into clusters of k or more similar images, and then averaging the pixels within each cluster to produce an averaged face which would replace each of the original faces in the cluster [19].
By releasing only the averaged faces, it could be guaranteed that neither human nor machine recognition could do better than identifying the cluster of identities that produced the image, thus limiting the probability of successful re-identification by an upper bound of 1/k.

One issue with the original k-same averaging of pixels was poor visual quality due to inexact alignment of facial features, leading to superimposed features. The k-same-m [31] approach improved upon this by using an active appearance model (AAM) [32] to obfuscate faces. AAMs are generative machine learning models for the approximation of visual representations of a particular class of objects (e.g., human faces). A model is trained on a set of images in order to learn about visual patterns and minimize differences with respect to shape and texture between the original images and the generated output of the model. The k-same-m approach first trains an AAM and then performs the clustering and averaging process within the parameter space of the model representations of faces to be obfuscated, thus eliminating the issue of superimposed features. Subsequent variants of k-same obfuscation via AAMs have since been proposed such as k-same-furthest [33] which expressly selects clusters of images having highly dissimilar model parameters in order to make the re-identification of the clustered identities more challenging.

More recently, generative neural networks (NNs) have been applied for k-same obfuscation [34, 35]. Generative NNs are machine learning models that have shown great success in the generation of visual representations of input class labels [36]. A generative NN passes the input labels through a sequence of convolutional layers, transforming them into features of finer granularity at each layer until reaching a pixel-space output. A training process adjusts weights used by filters in each convolutional layer in order to learn feature representations that minimize a loss function measuring the quality of the output. When trained on a set of images using identities as class labels, a generative NN is able to produce a visual approximation of an identity based on an input class vector. By providing input vectors in which k identities are specified, the generative NN produces k-anonymous output.

Efforts have also been devoted to the preservation of utility in the obfuscated images. The k-same-select approach [37] proposed partitioning the input images into classes based on the information to be preserved (e.g., male and female identities). By running a separate clustering process within each partition, images within each cluster would share the same class, thus preserving this information in the averaged version. This idea has been extended to the k-same-m model by training a different AAM for each combination over the demographic attributes of age, gender and race [38]. By using the appropriately trained AAM for obfuscation, the attributes for which it was trained can be preserved in the output. An alternative approach to the explicit specification of classes to be preserved involves the use of multimodal discriminant analysis to allow for the representation of identity and other attributes in orthogonal subspaces [39].
This allows for k-anonymity to be applied within the identity subspace while preserving or modifying other attributes relevant to utility, such as age, gender and race, as desired within each of their subspaces.

In the context of preservation of facial expressions, an approach has been proposed to calculate the difference between AAM parameters of original instances having a neutral expression and the target expression (e.g., happiness) and then add this difference to an obfuscated instance with a neutral expression. In this way, the target expression can be transferred to the obfuscated image [40]. Preservation of facial expressions has also been investigated for generative NNs. By designing the network architecture to allow for multiple input vectors over different types of classes, various combinations of these classes can be targeted in the output [36]. This method has been applied to generate k-same output having specific facial expressions [35].

Recently, differential privacy has been applied to preserve privacy in images by adding noise directly to the pixel intensities [41]. While the use of sufficient noise can provide a strong guarantee of privacy, this renders the output images unrecognizable as human faces. To improve upon the visual quality, an alternate approach has been explored in which singular value decomposition is employed to add noise to the singular values of a matrix of pixel intensities [42]. Although visual quality is improved, the obfuscated images remain far from being photo-realistic. Furthermore, the application of noise only to the singular values, as opposed to all matrices of the decomposition, leaves a potential for information leakage which is not investigated. To the best of our knowledge, our work is the first to study differential privacy applied to generative models for the obfuscation of facial images.

2.1 Privacy-Preserving Machine Learning

For completeness, we review here some of the most relevant works on privacy-preserving machine learning, primarily in the context of differential privacy. While most of these works differ fundamentally in their goals and motivation from our own research, this review serves to clarify where our work is situated within the existing literature.

In privacy-preserving machine learning, it is necessary to prevent the model from leaking sensitive information. This has been studied in the context of various attacks such as the inference of training data (i.e., membership inference) [44], the inference of participant data in collaborative learning [45, 46], and the use of trained models to draw unintended inferences or reconstruct training data (i.e., model-inversion attacks) [47]. In some scenarios, differential privacy can be applied during training, e.g., by adding noise to gradients used to update model parameters [48], to produce a model that protects details about its training data. In other cases, such as with collaborative learning, it has been shown that differential privacy is not appropriate for the granularity of protection required, and an active adversary can thwart attempts at privacy protection by crafting their shared parameters to force others to reveal more sensitive information [45]. While these attacks and applications of differential privacy can all be related to the protection of sensitive information in facial images, this only holds in the context of training data for machine learning models and relies on analysis related to the training and usage of said models.
Contrary to this, we focus on the task of protecting facial images in any context, abstracted from their intended usage.

Differential privacy has also been applied to protect sensitive data used to train generative adversarial networks [49]. When trained in a privacy-preserving manner, the network is able to learn about the distribution of the training data and subsequently generate new instances of data in such a way that the generated instances do not enable accurate inferences about the specific training instances. Variations on this concept have been applied to generate synthetic datasets of facial images in a differentially private manner [50, 51]. Although such approaches allow for the generation of facial images in a privacy-preserving manner, this too falls outside of the domain of facial obfuscation. The images produced from a generative network are drawn from a learned distribution, in this case a distribution of facial images, and are thus intended only to look as though they may have been drawn from the training set. This differs from the task of facial obfuscation in which the output image should retain as much information from the input image as possible short of revealing the aspects deemed sensitive.

A different direction explored in some works is the design of neural networks that reduce the leakage of sensitive information in the representations of images used for tasks such as classification. Here, the goal is not the protection of the training data but rather the protection of the input images provided to the model at inference time. This goal has been cast in an adversarial training process in which the network aims to maximize accuracy for the intended classification task while minimizing leakage of sensitive information as measured by entropy [52]. A similar adversarial training approach has been explored in the context of minimizing identity classification accuracy in facial images while maximizing facial expression classification accuracy [53] and while maximizing action detection accuracy [54]. Although these approaches suit the goal of protecting a specific facial image, they are highly tailored to specific tasks and do not produce obfuscated images as part of their output. An alternate approach, Privacy-Protective-GAN [55], combines the framework of adversarial training for privacy with an auto-encoder style generative architecture to produce obfuscated facial images. While this fits the task of facial obfuscation, the adversarial training objective provides no formal guarantee regarding the level of privacy that is achieved, nor is there any means to adjust the level of privacy enforced by the trained model.
3 Deficiencies of k-Same Obfuscation

Given the importance of preserving privacy in images, a good method of obfuscation must assert a meaningful guarantee about the level of privacy it provides. Without such a guarantee, it is impossible to formally assess the effectiveness of the obfuscation. Empirical results may help to gain intuition on which approaches appear promising. However, without a formal guarantee to back up the results, it is impossible to assert that privacy will remain protected in untested scenarios against unknown attacks. For this reason, we focus our attention only on methods of obfuscation which offer a formal privacy guarantee.

The necessity of this restriction is underscored by the concept of parrot attacks [19]. A parrot attack uses a neural network to classify identities using labeled instances of obfuscated images as the training set. Having learned about patterns in the obfuscation during training, the network is made much more effective at defeating the obfuscation. Despite pixelization being reasonably effective against human recognition and even naive machine recognition, it can be completely defeated by a parrot attack. This formed a strong basis for the need of a formal privacy guarantee such as that provided by the k-same family.

The k-same approaches employ a privacy guarantee derived from k-anonymity [30] which asserts that the original identity for any obfuscated image is indistinguishable from at least k − 1 other identities. This is achieved by clustering sets of k or more images to produce averaged instances as replacements for all images in each cluster. This makes it impossible for any software to achieve a better probability of re-identification than 1/k.

However, the k-same guarantee relies on assumptions about the nature of the attack. In this section, we discuss these assumptions. We show why they are often unrealistic in practice, making the guarantee weaker than it appears to be.

3.1 Background Knowledge

A well-known deficiency of k-anonymity is its susceptibility to attacks that employ background knowledge [56]. This refers to cases where the attacker uses prior knowledge about the sensitive information to draw inferences that violate the privacy guarantee. For example, in the context of a database of hospital records, k-anonymity would typically be applied to create groups of database records that are indistinguishable on their demographic attributes but with the original medical condition (i.e., the sensitive information) of each record preserved for statistical analysis. If an attacker already knows the medical conditions of one or more individuals (e.g., friends or family members), they can eliminate the corresponding records from the anonymized groups by finding the matching demographic information and medical condition. If the removal of records reduces the size of an anonymized group to less than k, this violates the privacy guarantee.

This concept carries directly over to the k-same privacy guarantee. If, via prior knowledge, the attacker knows with certainty that some of the k individuals could not be in the obfuscated image, they can discount them from the set of k identities. An attacker could come by this knowledge in a number of ways: personal knowledge about friends and family, information scraped from other data sources such as social media, etc. The simple combination of knowledge about the time at which a photo was taken and the approximate locations of some of the k individuals at that time can be enough to derive a proper subset of the k individuals, which violates the privacy guarantee.

Contextual information in an image can often enable these types of inferences.
Using signs, architecture or landscapes in an image, an attacker might recognize the location or employ software to determine it. Knowledge about locations that individuals frequent may greatly increase the probability of some possibilities over others. Similarly, if some of the k identities are known to live in different cities than where the photo was taken, or worse yet, different continents, these identities become much less probable. Other cues such as accessories or clothing on obfuscated individuals may also greatly impact the probabilities accorded to the k possible identities. Since the privacy guarantee asserts that each of the k identities is equally probable, this is also in violation of the guarantee.

We note that the original k-same paper does acknowledge this vulnerability to contextual information and asserts that the privacy guarantee applies strictly to the information contained within the face, not to the image as a whole [19]. While this important distinction allows for the privacy guarantee to be upheld, it is a major restriction on the practical applicability of the k-same guarantee. Most contexts in which facial obfuscation is applied will be rich with contextual information, making the privacy guarantee much less meaningful.

3.2 Composition Attacks

Another deficiency of k-anonymity is a susceptibility to composition attacks [20]. This is a class of attacks that exploit information from multiple, potentially uncoordinated, obfuscated releases to violate the privacy guarantee. A simple instance of this is the intersection attack. An attacker first identifies the clusters in which a particular individual exists from two different releases. If the releases were uncoordinated, the clusters likely differ, allowing the attacker to take their intersection to achieve a new set with a cardinality less than k.

This attack again carries directly over to the k-same approach. Consider a scenario where an individual takes a photo that they wish to upload to social media. Privacy protection might be applied to the individual or perhaps to bystanders who were captured in the background of the photo. Should the individual decide to upload the same photo to two or more social media platforms, the issue of uncoordinated obfuscation immediately arises. An attacker needs only scrape these platforms for similar photos to apply an intersection attack.

Intersection attacks may even be effective for multiple releases from the same organization if care is not taken. For example, an individual may take consecutive photos and then upload all of them. Algorithms for k-same determine clusters based on the similarity of faces, but many factors beyond facial identity (e.g., pose, angle and lighting) could impact similarity. It is therefore not unlikely that multiple images of the same individual will result in different clusters. Sequences of images uploaded in this way would be an ideal target for intersection attacks.

Most k-same approaches require each individual to appear only once in the gallery of images to be obfuscated. This prevents intersection attacks for releases from the same organization but does not protect against uncoordinated releases across multiple organizations. Furthermore, enforcing this restriction may be very challenging in practice. While the primary subject in a photo might be determined based on the account used to upload the photo, other individuals in the photo cannot be correctly identified 100% of the time. Face recognition software has not yet reached this level of accuracy.
Without manual labeling, such a policy cannot be enforced. Beyond this, the restriction of one image per identity is very severe and does not match typical use cases for image sharing.

3.3 Practical Difficulties

We discuss here two other difficulties that arise when using k-same obfuscation in practice. Although these difficulties do not violate the privacy guarantee, they hinder meaningful applications of k-same obfuscation in some contexts.

The first problem arises from the requirement of an input gallery of images. This may be appropriate for scenarios where batches of images are obfuscated, but it is awkward to apply to cases where images are sporadically uploaded (e.g., in social media platforms). One might consider the use of a preloaded static gallery or even a dynamic gallery that gets updated as new images are uploaded. This, however, is not a good solution since identities can then participate in more than one cluster. Furthermore, if an attacker records information about identities known to be in the gallery, those identities can be discounted when an image is uploaded for a new identity. An alternative solution could rely on buffering uploaded images to form a gallery that can eventually be used to release a batch of obfuscated images. However, this necessitates a trade-off between the size of the gallery (and thus the quality of the output) and the ability to deliver a timely service. In an era where users expect images to be uploaded instantly, this is not likely to be a manageable trade-off. The release of multiple batches also increases the chances of enabling composition attacks.

The second problem relates to the preservation of utility in the obfuscated output. Approaches that partition the gallery according to classes to be preserved (e.g., combinations of age and gender) place an even greater strain on the input gallery requirement. Working separately with the subset of images from each class greatly reduces the number of images available for clustering. Such an approach is not scalable for large numbers of classes that would be needed for finely grained attention to utility. In the worst case, some classes may be outliers in the overall distribution and could lack sufficient images to form a cluster. These classes would have to be merged with others in order to achieve the k-same guarantee, thus failing to achieve the desired granularity of classes.

4 Differentially Private Obfuscation of Facial Images

Due to the deficiencies of the k-same privacy guarantee in practical applications of facial obfuscation, we argue that a more robust privacy guarantee is required. Following the advances in the field of database privacy, we consider the potential of differential privacy to provide a stronger privacy guarantee. Differential privacy has real world applications in organizations including the US Bureau of the Census, Google and Apple [57] and has been studied as a means to comply with legal definitions of privacy such as FERPA [58] and GDPR [59]. In this section, we first review basic theory of differential privacy. We then adapt the privacy guarantee to fit the context of generative machine learning models for images and we formalize a framework to apply differential privacy to facial images. We discuss how the derived privacy guarantee addresses the issues identified with the k-same approach. Finally, we apply our framework to implement differentially private facial obfuscation using a generative NN.

4.1 Differential Privacy

A privacy guarantee that offers an absolute bound on re-identification risk necessitates restrictive assumptions about the attacker.
This is due to the fact that it is impossible to prevent an attacker from learning about the sensitive information through means other than the obfuscated release [60]. Differential privacy recognizes this difficulty and instead adopts a privacy guarantee that limits the increase in an attacker's knowledge about the sensitive information. In the context of databases, the goal is to release aggregate information about the database while preventing that information from being exploited to derive sensitive details about the individual records. Differential privacy functions by using a randomization mechanism to add controlled noise to database query responses in order to release useful responses while achieving a desired level of indistinguishability between potential configurations of the database contents.

Two databases are considered to be adjacent if they differ by a single record. Informally, the privacy guarantee enforces that any pair of adjacent databases must be bounded within a multiplicative factor of $e^{\epsilon}$ (where $\epsilon$ is the privacy parameter) in their probabilities of producing the same noisy query response. This is often interpreted as a ratio of $e^{\epsilon}$ between these probabilities. With a sufficiently small ratio, similar databases have similar probability distributions over their noisy query responses, causing them to behave similarly with respect to the noisy query responses they produce. This limits the usefulness of the noisy responses as a means to distinguish between potential configurations of the database. The privacy guarantee [61] in Formula 1 formally states this requirement in terms of any pair of adjacent databases $D_1, D_2 \in \mathcal{D}$, where $\mathcal{D}$ is the set of valid database configurations, and a randomization mechanism $K : \mathcal{D} \rightarrow \mathbb{R}^n$, where $n \in \mathbb{Z}^+$.

$$\Pr(K(D_1) = R) \le e^{\epsilon} \Pr(K(D_2) = R) \quad \forall R \in \mathbb{R}^n. \qquad (1)$$

To achieve this privacy guarantee, the mechanism $K$ must take into account the value of $\epsilon$ and the query sensitivity. The sensitivity $\Delta F$ of a query $f : \mathcal{D} \rightarrow \mathbb{R}^n$ is defined as the maximum possible $L_1$ distance between the query responses for any pair of adjacent databases. The guarantee can be achieved by adding to the query response a vector of $n$ continuous random variables, each drawn independently from a Laplace distribution with $\frac{\Delta F}{\epsilon}$ as its scaling parameter [61]. The exponential decay of probability density in the Laplace distribution benefits the utility of the mechanism by limiting the expected perturbation of the query responses. Through the selection of an appropriate value for $\epsilon$, a data custodian can control how much information is revealed about the contents of the database.
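To make the mechanism concrete, the following minimal sketch shows the Laplace mechanism for a numeric query response; the function name and the use of NumPy are our own illustrative choices rather than part of [61].

```python
import numpy as np

def laplace_mechanism(true_response, sensitivity, eps):
    """Release an eps-differentially private version of a numeric query
    response by adding Laplace noise with scale Delta_F / eps."""
    true_response = np.asarray(true_response, dtype=float)
    scale = sensitivity / eps
    noise = np.random.laplace(loc=0.0, scale=scale, size=true_response.shape)
    return true_response + noise

# Example: a counting query (sensitivity 1) answered with eps = 0.1.
noisy_count = laplace_mechanism(42, sensitivity=1.0, eps=0.1)
```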
4.2 Differential Privacy for Generative Models

We now consider how differential privacy can be applied to generative models for images. A generative model can represent images of instances from specific classes (e.g., human faces) using a numeric representation that abstracts from pixel intensities. Our goal is to protect the privacy of individuals in images by modifying these numeric representations to prevent facial identification while maintaining utility and visual quality. Differential privacy is ideal for this purpose as it provides a robust guarantee against the accuracy of the inferences an attacker can make about the original data. The application of noise to the numeric representation of the model allows for the generation of photo-realistic instances of novel human faces. This avoids the significant degradation in visual quality that results from the addition of noise to pixel intensities.

When moving from the domain of databases to that of generative model representations, the concepts of adjacency and query sensitivity can no longer be applied for the configuration of a mechanism. In place of a database where each record is an individual, we have a numeric representation of a single individual (e.g., features extracted by the model). To protect sensitive data in this form, one can apply a generalization of differential privacy to arbitrary secrets [62], where a secret is any numeric representation of data. In our case, the secret is the generative model representation of an individual. This generalization substitutes the notion of adjacency between databases with distance between secrets. By controlling noise according to an appropriate distance metric, the privacy guarantee is adapted to ensure that similar secrets are highly indistinguishable while very different secrets remain distinguishable. For a pair of databases, the distance between them is the number of records by which they differ. For other types of secrets, the distance metric must be carefully chosen in order to provide an appropriate privacy guarantee. An example of a well-studied instantiation of this generalization is geo-indistinguishability [63], which protects users' geographic locations, represented as two-dimensional coordinates, using Euclidean distance as the metric.

The notion of distance between secrets is appropriate for the representation of images within a generative model. Any model that employs a numeric representation of images allows for the calculation of distance between images. While the exact representation of an image differs from model to model, they can generally be mapped to a vector of fixed length with little difficulty. We provide details on how this concept can be applied to generative NNs in Section 4.5. To develop a general framework here, we consider the representation of an image to be a vector $X \in \mathbb{R}^n$ and the randomization mechanism to be a function $K : \mathbb{R}^n \rightarrow \mathbb{R}^n$ used to produce an obfuscated instance of the image. Although the differential privacy generalization only deals explicitly with one and two-dimensional secrets [62], its generalization to an $n$-dimensional vector is straightforward. We therefore adapt the privacy guarantee to suit this purpose in Formula 2, using a distance function $d : \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$.

$$\Pr(K(X_1) = R) \le e^{\epsilon d(X_1, X_2)} \Pr(K(X_2) = R) \quad \forall X_1, X_2, R \in \mathbb{R}^n. \qquad (2)$$
Comparing this to Formula 1, the databases $D_1$ and $D_2$ have been replaced by secrets $X_1$ and $X_2$ and the distance function now appears in the exponent of the multiplicative factor $e^{\epsilon}$. The distance between any pair of secrets acts as a coefficient to $\epsilon$ when interpreting the ratio of their probabilities. Intuitively, the meaning is that the more similar a pair of images are to each other, the harder it is to determine which of them led to a given obfuscated instance. This hampers the accuracy with which attempts at re-identification can be made. To achieve this guarantee, we must first determine an appropriate distance metric to measure the distinguishability of the numeric representations of images.

A natural choice for the distance metric is $L_1$ distance; however, we must be wary of the meaning of each element in the vectors. Should certain elements have differently sized ranges, they should be obfuscated using different magnitudes of noise. If one element has a much larger range than the others, the addition of noise configured to the smaller range would do little to prevent an inference of high accuracy on the original value of the element. We therefore apply normalization such that the distance between any pair of elements in the $i$th position of a pair of vectors falls within the range $[0, 1]$. Letting $R_i = [i_{min}, i_{max}]$ be the range of elements in the $i$th position of a model representation vector, we define a normalized, element-wise distance metric as follows:

$$d_i(x, x') = \frac{|x - x'|}{i_{max} - i_{min}} \quad \forall x, x' \in R_i. \qquad (3)$$

A distance metric for vectors defined as the sum of the element-wise distances for each position would be appropriate for images represented by the same model. However, a more useful framework would allow for reasoning about the level of privacy across different models. Ideally, the meaning of a privacy parameter $\epsilon$ applied to one model should have a similar meaning for a different model. For this, we require another normalization to account for models having vectors of different lengths. We therefore define the distance metric for vectors as follows:

$$d(X_1, X_2) = \frac{\sum_{i=1}^{n} d_i(X_{1i}, X_{2i})}{n} \quad \forall X_1, X_2 \in \mathbb{R}^n. \qquad (4)$$

By using this distance metric in combination with Formula 2, we obtain a meaningful privacy guarantee for the model representations of images. Although this type of metric is, in general, not novel, its use in this context is. We must therefore address how to configure a mechanism to satisfy this instantiation of the privacy guarantee. This leads to our main result in the development of a framework for the application of differential privacy to generative models for images.
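A minimal sketch of this distance metric, assuming the element-wise ranges have been estimated in advance (e.g., from the model's training data), could look as follows:

```python
import numpy as np

def normalized_distance(x1, x2, mins, maxs):
    """Distance between two model representation vectors (Formulas 3-4).

    mins, maxs: arrays giving [i_min, i_max] for each vector position."""
    elementwise = np.abs(x1 - x2) / (maxs - mins)  # Formula 3: each in [0, 1]
    return elementwise.mean()                      # Formula 4: normalize by n
```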
Theorem 1. Any image $X \in \mathbb{R}^n$ can be protected by $\epsilon$-differential privacy through the addition of a vector $(Y_1, ..., Y_n) \in \mathbb{R}^n$ where each $Y_i$ is a random variable independently drawn from a Laplace distribution using a scaling parameter $\sigma_i = \frac{n(i_{max} - i_{min})}{\epsilon}$.

Proof. We must satisfy the privacy guarantee (Formula 2) using our proposed distance metric (Formula 4). The form this privacy guarantee takes is our starting point in Formula 5. Through manipulation of this inequality and the substitution of mechanism probabilities with a Laplace distribution, we prove that the selection of an appropriate scaling parameter for each instance of the Laplace distribution allows for the privacy guarantee to be satisfied.

$$\prod_{i=1}^{n} \Pr(K(X_{1i}) = R_i) \le e^{\frac{\epsilon \sum_{i=1}^{n} d_i(X_{1i}, X_{2i})}{n}} \prod_{i=1}^{n} \Pr(K(X_{2i}) = R_i) \quad \forall X_1, X_2, R \in \mathbb{R}^n. \qquad (5)$$

$$\prod_{i=1}^{n} \Pr(K(X_{1i}) = R_i) \le \prod_{i=1}^{n} e^{\frac{\epsilon d_i(X_{1i}, X_{2i})}{n}} \prod_{i=1}^{n} \Pr(K(X_{2i}) = R_i) \quad \forall X_1, X_2, R \in \mathbb{R}^n. \qquad (6)$$

$$\prod_{i=1}^{n} \frac{e^{-\frac{|X_{1i} - R_i|}{\sigma}}}{2\sigma} \le \prod_{i=1}^{n} e^{\frac{\epsilon d_i(X_{1i}, X_{2i})}{n}} \prod_{i=1}^{n} \frac{e^{-\frac{|X_{2i} - R_i|}{\sigma}}}{2\sigma} \quad \forall X_1, X_2, R \in \mathbb{R}^n. \qquad (7)$$

$$\prod_{i=1}^{n} e^{\frac{|X_{2i} - R_i| - |X_{1i} - R_i|}{\sigma}} \le \prod_{i=1}^{n} e^{\frac{|X_{1i} - X_{2i}|}{\sigma}} \le \prod_{i=1}^{n} e^{\frac{\epsilon d_i(X_{1i}, X_{2i})}{n}} \quad \forall X_1, X_2, R \in \mathbb{R}^n. \qquad (8)$$

$$\prod_{i=1}^{n} e^{\frac{\epsilon d_i(X_{1i}, X_{2i})}{n}} = \prod_{i=1}^{n} e^{\frac{\epsilon |X_{1i} - X_{2i}|}{n(i_{max} - i_{min})}} \quad \forall X_1, X_2. \qquad (9)$$

From Formula 9, it becomes clear that the inequality holds when using an independent Laplace distribution for each pair of elements $X_{1i}, X_{2i}$, substituting the scaling parameter $\sigma$ with a corresponding value $\sigma_i = \frac{n(i_{max} - i_{min})}{\epsilon}$.

Using the generalization of differential privacy, the notion of query sensitivity is implicitly captured in the distance metric. Since the distance metric of Formula 4 has a range of $[0, 1]$, the strongest bound asserted between any pair of images is a ratio of $e^{\epsilon}$. This is akin to the meaning of the privacy guarantee for a pair of databases that differ on every record. In order to select an appropriate value of $\epsilon$, a data custodian must keep in mind that similar images will have a very small distance between them, requiring much larger values of $\epsilon$ to provide a reasonable ratio. In Section 6, we demonstrate the implications of the choice of $\epsilon$ on the levels of privacy and utility.
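A sketch of the resulting mechanism follows directly from Theorem 1; the range-snapping step anticipates the post-processing discussed in Section 4.5, and the function name is our own:

```python
import numpy as np

def obfuscate_representation(x, mins, maxs, eps):
    """Apply the Laplace mechanism of Theorem 1 to a model vector x."""
    n = x.shape[0]
    scales = n * (maxs - mins) / eps         # sigma_i = n(i_max - i_min)/eps
    noisy = x + np.random.laplace(0.0, scales, size=n)
    # Post-processing: snap out-of-range values back to the valid range;
    # this cannot violate the privacy guarantee.
    return np.clip(noisy, mins, maxs)
```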
4.3 Addressing the Deficiencies of k-Same Obfuscation

We now describe the improvements we obtain from the use of differential privacy for each of the problems identified in Section 3.

4.3.1 Background Knowledge

By removing dependence of the attack model on an absolute level of re-identification risk, we are able to reason about the level of privacy in the presence of attackers with background knowledge. If the location in a photo is identified as a particular city, no facial obfuscation can prevent the inference that individuals living in the identified city have a higher probability of being the obfuscated identity than individuals living elsewhere. Yet, the differential privacy guarantee continues to hold as the background knowledge does not impact the conditional probability distribution used by the randomization mechanism. Since the privacy guarantee concerns only the change in the attacker's knowledge when presented with the obfuscated data (e.g., the face), it is unaffected by other sources of information the attacker may gain access to.
4.3.2 Composition Attacks

Another very important property of differential privacy is its resilience to composition attacks. The composition theorem [61] states that for two differentially private releases using privacy parameters $\epsilon_1$ and $\epsilon_2$ respectively, the privacy guarantee holds for a privacy parameter $\epsilon = \epsilon_1 + \epsilon_2$. For example, two independent releases of the same face obfuscated at $\epsilon = 0.5$ are together no more revealing than a single release at $\epsilon = 1$. Thus, even in the case of uncoordinated releases, we still have a valid privacy guarantee. Furthermore, this removes the restriction on the same individual appearing only once in the release of obfuscated images.

4.3.3 Input Image Gallery

Differentially private image obfuscation has no need for a gallery of images in order to perform obfuscation. Since noise is added on a per-image basis, there is no computation of clusters required. Given a trained model, obfuscation of a single image or a batch of images can be performed with ease. This makes the obfuscation process much more versatile.
4.4 Interpretation of the Privacy Guarantee

Although the interpretation of the differential privacy guarantee is relatively well understood in the context of databases, the distance-generalized guarantee which we employ changes the interpretation of the privacy parameter $\epsilon$. To assist users and data curators in understanding the implications of a chosen privacy parameter, we provide a brief discussion here on the generalized privacy guarantee.

Recall that the generalized privacy guarantee replaces databases with arbitrary secrets and scales $\epsilon$ by the distance between any pair of secrets for which the guarantee is to be interpreted. The distance between any pair of databases can in fact be interpreted as a Hamming distance (i.e., the number of records by which a pair of databases differ), allowing for the generalization to capture the standard interpretation of differential privacy. At distance 1, this corresponds to the basic privacy guarantee, and a distance $d > 1$ corresponds to $d$ transitive applications of the privacy guarantee, resulting in a multiplicative bound of $e^{d\epsilon}$. For databases with $n$ records, the range of the distance metric is $[1, n]$.

In contrast, the distance metric that we employ is bounded by the range $[0, 1]$, and a pair of similar images will lie at only a small fraction of this range from each other, yielding a bound well below $e^{\epsilon}$. While this may seem like a rather small distance, this is in fact a much larger fraction of the total range than is often used in the context of databases where there may be thousands or even hundreds of thousands of records.

4.5 Implementation with Generative Neural Networks

Generative NNs have the useful property of producing photo-realistic images. We now describe how our framework can be applied to these models. Provided that the addition of noise is properly controlled, the output will be a photo-realistic image of any newly created identity.

We consider network architectures that take one or more class vectors as input and employ up-convolution to transform the input into a visual representation in pixel space [36]. By considering each identity to be a different class, an input vector can specify the individual to be generated. The identity class vector is an obvious choice as the model vector to be obfuscated. However, this leads to some form of interpolation between the identities. To apply a finer degree of modification to the identity, we propose the application of obfuscation at the second layer of the network. Typically, the second layer applies convolution to the class vector and transforms it into a vector of high-level numeric facial features. By applying obfuscation to these features instead, we can achieve a richer variety in the potential modifications to the face. We therefore apply obfuscation to the output of the first convolutional layer of the network and pass the obfuscated feature vector on as the input to the next layer of the network. A sample architecture is shown in Figure 1.

Our proposed implementation can be interpreted as an extra layer added to the network which is used only after training is complete, when obfuscation is to be applied. The layer has no weights and is simply an application of the Laplace mechanism, configured as described in Theorem 1. The noisy output, then conforming to the differential privacy guarantee, is passed on to the next layer of the network. All other propagation through the network occurs as normal.

[Figure 1: Visualization of the layer architecture in an up-convolutional neural network using differential privacy. Input vectors for identity, facial expression and gender pass through convolution and concatenation, the differential privacy layer is applied to the identity features, and up-convolution produces the output image. Noise is applied to the output of the second identity layer. The numbers and shapes of the convolutional layers shown here are not exact and represent only the general structure of such a network.]

By ensuring that the encoding of the facial identity has been passed through the layer implementing the Laplace mechanism, we are able to produce an output image depicting a facial identity that has been obfuscated in a differentially private manner.

Information about the range of each model vector element can be used as a means to preserve the visual quality of the obfuscated output. Noisy elements that have gone too far beyond the valid range may lead to visual artifacts or distortions in the output image. To prevent this, we snap any out-of-bounds noisy value back to the nearest valid value. Since differential privacy is resistant to any form of post-processing [61] and the ranges of the elements are non-sensitive information, this step cannot violate the privacy guarantee.
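As an illustration of how such a weightless layer could be realized, the following PyTorch-style sketch applies the mechanism of Theorem 1 to feature vectors between two layers of a network. This is our own hypothetical rendering for exposition; the actual implementation in this work builds on DeconvFaces (see Section 6).

```python
import torch

class DPObfuscationLayer(torch.nn.Module):
    """Hypothetical weightless layer applying the Laplace mechanism of
    Theorem 1 to a batch of feature vectors at inference time."""

    def __init__(self, mins, maxs, eps):
        super().__init__()
        self.register_buffer("mins", mins)  # per-element feature ranges
        self.register_buffer("maxs", maxs)
        self.eps = eps

    def forward(self, features):
        if self.training:  # obfuscation is applied only after training
            return features
        n = features.shape[-1]
        scales = n * (self.maxs - self.mins) / self.eps
        noise = torch.distributions.Laplace(
            torch.zeros_like(features), scales).sample()
        noisy = features + noise
        # Snap out-of-range values back to the valid feature range.
        return torch.minimum(torch.maximum(noisy, self.mins), self.maxs)
```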
4.6 Obfuscation of Unknown Identities

When designing a system for the obfuscation of identities in images, an important consideration is the ability to obfuscate any identity. However, the generative NN architecture we employ does not directly allow for the representation of classes that were not learned during the training process. Since each identity is a different class, this means that the model cannot directly represent unknown identities and thus cannot obfuscate them.

To solve this problem, we propose to formulate the approximation of an appropriate input vector for an unknown identity as an optimization problem. Since the generative NN has learned a representation of each training identity, a new identity can be approximated as a weighted sum of the known identities. When provided with these weights as the input identity vector, the generative NN produces an interpolation between the identities which can act as an approximate visual representation of the unknown identity. In order to formalize this concept as an optimization problem, we must select an appropriate representation for the identities. This cannot be done using the generative NN feature vectors since the representation of the target identity is unknown. We therefore employ a secondary neural network that has been trained for classification of facial identity. By removing the final layer of the classification network, its output becomes a high-level vector of facial features which we can use as the representation of an identity. The significance of using a classification network for this purpose is that it need not have seen any of the target identities in its training data in order to produce feature vectors for them. Such a network is therefore ideal to provide feature vectors for the identities from the generative NN training data as well as for the target unknown identity.

We now formalize the optimization problem using real-valued feature vector representations of identities from a classification network. Let $X$ be a set of $n$-dimensional feature vectors representing $m$ identities on which the generative NN was trained (i.e., $|X| = m$) and let $Y$ be an $n$-dimensional feature vector representing an identity that is unknown to the generative NN. Our goal is to determine a set $W$ of weights that minimizes the distance between a weighted sum of the vectors in $X$ and the vector $Y$:

$$\min_{W} \left\| \sum_{i=1}^{m} W_i X_i - Y \right\|_1 \qquad (10)$$

While this optimization problem is similar to the format of an objective function for a linear program, the necessity of absolute value calculations for the $L_1$ distance between the vectors prevents this from being written as a linear function. However, this problem, known as least absolute deviations, can be rewritten in an alternate but equivalent formulation that avoids the need for absolute value functions through the introduction of additional variables [64]. This formulation is as follows:

$$\min \sum_{i=1}^{n} u_i \qquad (11)$$

Subject to the following constraints:

$$u_i \ge y_i - \sum_{j=1}^{m} W_j X_{j,i} \quad i = 1, ..., n \qquad (12)$$

$$u_i \ge -y_i + \sum_{j=1}^{m} W_j X_{j,i} \quad i = 1, ..., n \qquad (13)$$

The added constraints ensure that the values assigned to the new variables respect the absolute value functions from the original problem. This formulation can be given to any linear programming solver in order to find the optimal weights for the approximation of the unknown identity.
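As a sketch of how this linear program could be handed to a solver in practice, the following uses SciPy's linprog with decision variables z = [W, u]; the function name is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def approximate_identity_weights(X, y):
    """Solve the least absolute deviations problem of Formulas 11-13.

    X: (m, n) array, one row per known-identity feature vector.
    y: (n,) feature vector of the unknown identity.
    Returns the (m,) weight vector W."""
    m, n = X.shape
    c = np.concatenate([np.zeros(m), np.ones(n)])    # minimize sum of u_i
    # Formula 13: u_i >= -y_i + (X^T W)_i  ->   (X^T W)_i - u_i <=  y_i
    # Formula 12: u_i >=  y_i - (X^T W)_i  ->  -(X^T W)_i - u_i <= -y_i
    A_ub = np.vstack([np.hstack([X.T, -np.eye(n)]),
                      np.hstack([-X.T, -np.eye(n)])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * m + [(0, None)] * n    # W free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:m]
```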
5 Differentially Private Obfuscation in Pixel-Space

While we have thus far considered the application of differential privacy to the numeric representations of generative models, it can also be applied directly to the pixel intensities of an image. Such an approach suffers in the visual quality of the obfuscated images. Noise added in this way is no longer guided to obfuscate only specific aspects such as identity. Yet, by discarding the use of a generative model, the randomization mechanism can be applied to any image, regardless of what is depicted. This versatility allows for obfuscation to be applied to images that are not readily captured by available models. For example, it may be desirable to obfuscate signs, license plates or complete vehicles. A pixel-space randomization mechanism provides a means to directly achieve privacy protection on any image.
5.1 Pixel-Space Obfuscation via the Laplace Mechanism

Differential privacy via a Laplace mechanism has previously been applied in combination with pixelization to achieve pixel-space obfuscation [41]. In this work, the authors defined two images to be adjacent if they differed by $n$ pixels, where the value of $n$ is chosen by the user based on the size (in pixels) of a window that covers the portion of the image deemed to be sensitive information. For example, if a face covers roughly a 100 × 100 pixel space and the user wishes to obfuscate the face, $n$ would be set to 10000. Following the standard framework for differential privacy, the query sensitivity is then defined as the maximum possible difference between any pair of adjacent images. For an image with $c$ channels, each of which covers a range of $k$ possible pixel intensities, the query sensitivity is $(k - 1)nc$. Given this query sensitivity, $\epsilon$-differential privacy can be provided through an independent application of the Laplace mechanism to each pixel in each channel using a scaling parameter of $\frac{(k-1)nc}{\epsilon}$ [41]. To reduce the query sensitivity, the authors propose to apply pixelization prior to obfuscation. For a specified level of pixelization $b \in \mathbb{Z}^+$, the image is divided into $b \times b$ grids of pixels where each grid is set to the average of its pixel intensities. By treating each grid as a single uniform value to be obfuscated, the required scaling parameter for the Laplace mechanism is reduced to $\frac{(k-1)nc}{\epsilon b^2}$.
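A compact sketch of this pixelization-plus-Laplace approach (our own rendering of the method of [41], assuming image dimensions divisible by b) is:

```python
import numpy as np

def dp_pixelize_laplace(img, eps, n, b, k=256):
    """Pixelize in b x b cells, then add Laplace noise with scale
    (k - 1) * n * c / (eps * b^2), following [41].

    img: (H, W, C) float array with intensities in [0, k-1]."""
    h, w, c = img.shape
    out = np.empty_like(img, dtype=float)
    scale = (k - 1) * n * c / (eps * b * b)
    for y in range(0, h, b):
        for x in range(0, w, b):
            avg = img[y:y+b, x:x+b].mean(axis=(0, 1))     # pixelization
            noisy = avg + np.random.laplace(0.0, scale, size=c)
            out[y:y+b, x:x+b] = np.clip(noisy, 0, k - 1)  # post-processing
    return out
```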
5.2 Pixel-Space Obfuscation via the Exponential Mechanism

While the Laplace mechanism is configured based on differences in pixel intensities, we propose to instead apply the exponential mechanism [65] in order to control the distribution of the obfuscated output using a measure of visual quality for potential outputs of the mechanism. Given an input $I$ (in our case an original image), the probability distribution over the set of potential obfuscated outputs $\mathcal{I}$ is defined as:

$$\Pr(I') \propto e^{\epsilon q(I, I')} \quad \forall I' \in \mathcal{I}, \qquad (14)$$

where $q$ is a quality function that measures the utility of $I'$ with respect to $I$. In other words, $q$ is a measure that reflects the usefulness of an obfuscated output $I'$ given its original value $I$. Due to this, the resulting distribution over the potential obfuscated outputs explicitly gives preference to outputs with higher utility as measured by $q$.

We propose the use of the structural similarity index measure (SSIM) [66] as the utility function used to control the mechanism. This provides a human-centric measure of image quality which is designed to match up with how the human visual system perceives information. SSIM incorporates aspects of luminance, contrast and structure into a measure that is averaged over a sliding window intended to mimic how human eyes scan a large area but focus only on local areas. The use of SSIM allows us to produce a mechanism which achieves differential privacy while explicitly favouring obfuscated results with higher visual quality, as perceived by the human visual system. For instance, while a contrast-shifted image may yield a relatively high change in pixel intensities, the visual change in the image is far less significant than an equivalent change in pixel intensities distributed at random across the image. The Laplace mechanism obfuscates images using a probability distribution based on differences in pixel intensities, which fails to capture this concept of visual quality. Contrarily, the exponential mechanism using SSIM as its quality function directly reflects this notion of visual quality in its distribution over obfuscated outputs. A human-centric notion of utility is of particular importance for obfuscation employed in contexts such as social media and image sharing platforms where user experience is a key concern.

5.3 Reduction of the State Space

A difficulty arises with the exponential mechanism in its implementation since the creation of the distribution used by the mechanism often requires explicit calculation of the quality score associated with each potential obfuscated output.
Given an image containing n pixels over c channels, each of which covers k possible pixel intensities, there are a total of k^(nc) possible states an image could take on. Enumeration of these states is computationally intractable, even for very small images. For example, a 10 × 10 pixel RGB image has 256^300 possible states. Therefore, to implement an SSIM-based exponential mechanism, we must reduce the number of states to a tractable size.

Since SSIM is intended to simulate the way the human visual system focuses only on small areas at any one time, it is calculated for a small window of pixels, typically 11 × 11. The overall measure is calculated by sliding this window over the image in single-pixel steps and averaging the values measured at each step. Similar to this process, we propose to apply the exponential mechanism to a p × p pixel window which moves over the image in steps of p pixels in order to ensure that there are no overlapping applications of obfuscation. We then coarsen the granularity of the intensities that the pixels are allowed to take on to k′ < k possible values, where k is the original number of possible intensities. Since differentially private obfuscation reduces the accuracy of the sensitive data by design, it is often unnecessary to preserve a high degree of precision. The reduction in the granularity of pixel intensities therefore has little impact on the quality of the obfuscated output.

Under these specifications, there are k′^(p²) possible states that an application of the mechanism must consider. We have determined experimentally that p = 3 and k′ = 4 acts as a reasonable configuration. Further increase of either value quickly renders the mechanism too computationally expensive for a standard desktop computer while further reduction of either value gives a poor approximation of SSIM. Although the use of 4 possible pixel intensities may seem very restrictive, it is important to note that the mechanism is applied independently to each channel of the image, just as SSIM is averaged over each channel. As a result, for an RGB image, each pixel can assume 4³ = 64 possible colour states. Given the large amount of noise typically required to obfuscate the images, we find this to be an acceptable level of precision.

We now explain how to configure and utilize the exponential mechanism in order to achieve ε-differential privacy for image obfuscation. A sketch of the resulting procedure is given below; Theorem 2 then establishes the corresponding privacy guarantee.
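The windowed mechanism just described can be sketched as follows, assuming p = 3 and k′ = 4 on a single channel with intensities scaled to [0, 1]. The per-window budget uses the calibration ε′ = εp²/(2nc) established in Theorem 2 below (with c = 1 here), and the quality of each candidate window is computed directly from the standard single-window form of SSIM; the variable names and the toy image are ours.

    import itertools
    import numpy as np

    P, K_PRIME = 3, 4                         # window size p and coarsened intensity count k'
    LEVELS = np.linspace(0.0, 1.0, K_PRIME)   # coarsened intensities on a [0, 1] scale
    C1, C2 = 0.01 ** 2, 0.03 ** 2             # standard SSIM stabilizers for data range 1

    # All k'^(p*p) candidate window states (4^9 = 262,144 rows of 9 values).
    CANDIDATES = np.array(list(itertools.product(LEVELS, repeat=P * P)))

    def window_ssim(x, Y):
        """Single-window SSIM between a flattened p*p patch x and candidate rows Y."""
        mx, my = x.mean(), Y.mean(axis=1)
        vx, vy = x.var(), Y.var(axis=1)
        cov = ((Y - my[:, None]) * (x - mx)).mean(axis=1)
        return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx**2 + my**2 + C1) * (vx + vy + C2))

    def obfuscate_window(patch, eps_app, rng):
        """Apply the exponential mechanism to one p x p window with budget eps_app."""
        q = window_ssim(patch.ravel(), CANDIDATES)
        w = np.exp(eps_app * (q - q.max()))    # numerically stable weighting
        idx = rng.choice(len(CANDIDATES), p=w / w.sum())
        return CANDIDATES[idx].reshape(P, P)

    def obfuscate_channel(channel, eps_total, rng):
        """Slide the mechanism over non-overlapping windows of one image channel."""
        h, w = channel.shape
        n = h * w
        eps_app = eps_total * P**2 / (2 * n)   # per-window budget (Theorem 2, c = 1)
        out = channel.copy()
        for i in range(0, h - P + 1, P):
            for j in range(0, w - P + 1, P):
                out[i:i+P, j:j+P] = obfuscate_window(channel[i:i+P, j:j+P], eps_app, rng)
        return out

    rng = np.random.default_rng(0)
    image = rng.random((9, 9))                 # toy single-channel image in [0, 1]
    print(obfuscate_channel(image, eps_total=5000.0, rng=rng))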
Theorem 2. Given an input image with c channels and n pixels in each channel, our proposed implementation of the exponential mechanism is able to produce an obfuscated image satisfying ε-differential privacy through the application of the mechanism to each non-overlapping p × p pixel grid in each channel of the image using a privacy parameter of ε′ = εp²/(2nc) for each such application.

Proof. An application of the exponential mechanism using a privacy parameter of ε′ provides 2ε′∆q-differential privacy [65], where ∆q is the maximum possible change in the value of the quality function. Since the measure of SSIM takes on a value in the range [0, 1], ∆q is 1. To obfuscate n pixels over c channels using an exponential grid covering p × p pixels on each application, we require nc/p² applications of the mechanism. By the composition theorem [61], this results in 2ncε′/p²-differential privacy. Thus, to enforce differential privacy for a user-specified privacy budget of ε, we configure the mechanism to use ε′ = εp²/(2nc) at each application.

For instances where an image has dimensions w × h such that one or both of the dimensions are not a multiple of p, we first apply the exponential mechanism within a w′ × h′ space where w′ and h′ are the largest multiples of p such that w′ ≤ w and h′ ≤ h. Then, in the remaining w − w′ columns and h − h′ rows, we apply the Laplace mechanism of [41].

As with the Laplace mechanism, our proposed application of the exponential mechanism requires a high degree of noise to achieve differential privacy. We therefore employ the pixelization trick of [41] to help mitigate this. To do so, we first perform pixelization to create b × b grids of uniform pixel intensities. We then consider each such grid as a single cell of the p × p grid used by the exponential mechanism. While the mechanism still operates on p² cells, the exponential grid covers a pb × pb pixel space. In this way, using the same privacy budget ε, we are able to increase the allocation of the budget for each application of the mechanism to ε′ = εp²b²/(2nc).

In this section, we run a series of experiments to gain insight into the performance of our proposed methods of obfuscation in practice. In the context of obfuscation using generative models, we compare our proposed implementation of Section 4.5 to a k-same implementation following the design of k-same-net [35]. We employ these comparisons to observe the relative performances of differential privacy and k-same obfuscation in terms of a trade-off between re-identification risk and utility. We additionally test the resilience of differential privacy against parrot attacks and composition attacks. Finally, we compare our implementations of the more general pixel-space exponential mechanism to the Laplace mechanism [41].

For our generative NN implementation, we have built on top of the DeconvFaces [67] network which implements the concept of up-convolution for the generation of images of input classes [36]. We apply differential privacy as described in Sections 4.2 and 4.5. For the k-same obfuscation, we use the same generative NN implementation and follow the approach of k-same-net [35], using clustering as described for k-same-m [31]. This deviates from the use of a proxy gallery as described for k-same-net.
It is important to note that, while a proxy gallery can reduce re-identification risk, it involves a step that is not captured by the k-same privacy guarantee. Thus, in the absence of a privacy guarantee that incorporates this detail, we omit the use of a proxy gallery in order to focus our experiments on the formalized aspects of the privacy guarantees. Similarly, although we have shown in Section 4.6 how a generative NN can be used to apply differential privacy to unknown identities, we use only identities from the training set in our experiments. Deviations from an original identity induced by the approximation process may lead to a further decrease in re-identification risk in a manner that is not captured by the formal privacy guarantee. We emphasize that this does not imply that obfuscated images of unknown identities are not protected by the privacy guarantee; merely that imperfect approximations of unknown identities may lead to empirical results that suggest a stronger level of privacy than what is in fact guaranteed by differential privacy. We have therefore made these choices in the interest of comparing strictly the privacy protection achieved due to the formalized aspects of the methods of obfuscation.

We have additionally compared differential privacy against k-same obfuscation using an AAM as the generative model. However, the results showed that an AAM that modifies only a tightly-cropped portion of the face serves as a poor generative model for facial obfuscation. As such, we do not include AAMs in our current experiments. The reader is referred to [43] for these experiments.

We apply each method of obfuscation to two different datasets: RAFD [68] and KDEF [69]. These datasets provide frontal facial images of subjects wearing same-coloured shirts. The use of same-coloured shirts prevents bias in re-identification from the exploitation of information in unique clothing. The RAFD and KDEF datasets contain images of 67 and 70 subjects, respectively, and provide a variety of facial expressions. Due to apparent issues with lens exposure in the KDEF dataset, we have removed two of the subjects from our experiments.

The generative NN architecture accepts class vectors for identity and facial expression as input. The RAFD and KDEF datasets are therefore highly suitable for this network. We have trained the network for 1000 epochs on each of the datasets to obtain models capable of reproducing these identities. This training process is represented by Step 1 of Figure 2. An example of obfuscated output is shown in Figure 3.

Figure 2: Illustration of the steps required for the training and testing process of the generative model and facial identity classification model used in our experiments.

Although we do not employ the generative NN approximation of unknown identities in our experiments, we include an example of some approximations in Figure 4 for reference. These approximations were produced by setting aside 10 identities from the RAFD dataset and training the generative NN on the remaining 57. We then ran a linear program for each of the 10 omitted identities to approximate them as a weighted sum of the 57 training identities. As the training set is relatively small, we expect that the quality of the approximation can be improved through the use of larger and more diverse training sets. We also note that any minor deviations in the approximated identity are not of great significance since the depicted identity is ultimately to be obfuscated.

Figure 3: Obfuscation via the generative NN on the RAFD dataset. The top row employs differential privacy and the bottom row employs k-same obfuscation.
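The approximation of the omitted identities described above reduces to a linear program. The following is a minimal sketch of one such formulation, assuming the identities are compared in some fixed feature space; the feature dimension, the L1 objective, and the convex-combination constraint are illustrative assumptions of ours, implemented with scipy.optimize.linprog.

    import numpy as np
    from scipy.optimize import linprog

    def approximate_identity(target, train_feats):
        """Find convex weights w minimizing the L1 error || w @ train_feats - target ||_1.

        target:      (d,) feature vector of the unknown identity.
        train_feats: (m, d) matrix whose rows are the training identities.
        Returns the weight vector w of length m.
        """
        m, d = train_feats.shape
        # Decision variables: [w_1..w_m, t_1..t_d]; minimize the sum of the slacks t.
        c = np.concatenate([np.zeros(m), np.ones(d)])
        # |(w @ F)_j - target_j| <= t_j, expressed as two linear inequalities.
        A_ub = np.block([[train_feats.T, -np.eye(d)],
                         [-train_feats.T, -np.eye(d)]])
        b_ub = np.concatenate([target, -target])
        # Weights form a convex combination: sum(w) = 1, w >= 0.
        A_eq = np.concatenate([np.ones(m), np.zeros(d)])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (m + d))
        return res.x[:m]

    rng = np.random.default_rng(1)
    F = rng.random((57, 16))      # 57 training identities in a toy 16-d feature space
    f = rng.random(16)            # an unknown identity to approximate
    w = approximate_identity(f, F)
    print(w.round(3), abs(w.sum() - 1.0) < 1e-6)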
A good method of facial obfuscation must be able to produce obfuscated images which cannot be accurately re-identified. In other words, the risk that an obfuscated image can be associated with the identity of the originally depicted individual must be kept acceptably low. To measure re-identification risk, we have employed VGGFace D [9], a deep convolutional neural network which has been shown to achieve excellent facial identity classification accuracy. This simulates how an attacker might leverage machine learning models to launch an attack on obfuscated images. We have trained a separate model for each dataset, using the neutral and sad expressions for each identity for validation and the remaining expressions for training. Following a later-released note about the network training [70], we employ Xavier initialization [71] for the layer weights. To improve the robustness of the models, we have also augmented the datasets by creating two additional versions of each image: one with increased contrast and one with decreased contrast. The training of this network is represented by Step 2 of Figure 2. For reference, we provide the network architecture in Table 1 of Appendix A.

In our experiments, we generate obfuscated images having a neutral facial expression. We measure re-identification risk based on the accuracy of the top-1 guesses of the VGGFace network. Given that differential privacy is a stochastic process, for each combination of a privacy parameter and an identity to be protected, we have generated 10 obfuscated instances over which we take the average of the re-identification risk. We measure overall re-identification risk for a given privacy parameter as the average risk over all individuals in the dataset. The obfuscation and re-identification processes are respectively shown by Steps 3 and 4 in Figure 2. Since the k-same approaches are deterministic, we produce only a single output image per identity and then take the average re-identification risk over the whole dataset. We additionally measure the baseline identity classification accuracy on the original (i.e., unobfuscated) data for reference. The results are shown in Figure 5.

Figure 4: Approximation of unknown identities. Original images are shown in the top row with the corresponding approximations in the bottom row.

Figure 5: Identity classification accuracy for the methods of obfuscation.

Under differential privacy, a lower value of ε implies a stronger level of privacy whereas with k-same obfuscation, a higher value of k implies stronger privacy. The plotted data in Figure 5 shows the expected trend of reduced re-identification risk as the level of privacy, determined by the respective privacy parameters, is strengthened. In contrast to the typical ε values applied to differentially private mechanisms for databases, the values used in our experiments may appear unusually high. The larger magnitude is simply a side-effect of the normalization for the model vector, resulting in the interpretation of ε on a different scale. Refer back to Section 4.4 for discussion on the interpretation of ε in this context.

To compare the methods of obfuscation in terms of utility, we focus on the ability to extract useful, non-sensitive information from the obfuscated output. Specifically, we measure classification accuracy for gender and facial expression in the obfuscated images.
A favourable trade-off between privacy and utility occurs when facial identity is protected while simultaneously preserving other useful information. Thus, in our experimental setting, it is desirable to achieve low re-identification risk with high classification accuracy for the selected attributes of gender and facial expression.

We begin with the popular task of gender recognition [72]. As forms of demographic classification may be desirable for data mining purposes, we consider high classification accuracy to reflect good utility. To this end, we employ a pre-trained convolutional neural network model for the classification of gender in facial images [73]. Our intent is to compare differential privacy and k-same obfuscation in terms of a privacy-utility trade-off. Yet, the two methods of obfuscation use proprietary privacy parameters which cannot be directly compared as a measure of privacy. We therefore plot the gender classification accuracy as a function of identity classification error in order to abstract away from the proprietary privacy parameters.

To highlight the ability of generative NNs to incorporate properties relevant to image utility into the network architecture, we have also created a modified version of the architecture that preserves gender in the obfuscated output. To do so, we have created an input layer having two classes that specify the gender in the image. By training a model with gender labels, it learns to separate features relevant to gender from those relevant to identity. This enables us to focus obfuscation only on the features relevant to identity while leaving the gender feature vector untouched; a sketch of this separation is given below. An example of gender-preserving obfuscation is shown in Figure 6. In the interest of a fair comparison between the methods of obfuscation, we employ the modified architecture both for differential privacy as well as k-same obfuscation.

Figure 6: Gender-preserving obfuscation via the generative NN on the RAFD dataset. The top row employs differential privacy and the bottom row employs k-same obfuscation.

We note that very recent work [74] has shown the potential for learned representations of high-level features in images to leak sensitive information that can be exploited for unintended inferences. This opens the door to potential leakage of facial identity via features released in an unobfuscated format. Yet, this has been studied in the context of image representations extracted from network layers prior to the final classification output of the model. Such layers retain relatively rich representations of the input which may be used for multiple purposes. We expect that any leakage associated with the extremely coarse features we leave unobfuscated (e.g., gender as a binary input) would have a negligible impact on re-identification risk beyond the explicit revelation (e.g., narrowing candidates based on their gender). To provide some empirical evidence to this effect, we have plotted both the basic and gender-preserving models of the generative NN in the identity classification graphs shown in Figure 5. In all cases, the plot for the gender-preserving model demonstrates only minor deviation from the original model. This deviation is much more likely due to the explicit depiction of gender than the leakage of any additional information.

The results of the gender classification comparisons are shown in Figure 7.
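Before turning to those results, we sketch the input-vector separation described above. The Laplace perturbation, clipping, and renormalization shown here are stand-ins for the model-vector mechanism of Section 4.2, and the sensitivity value and vector sizes are placeholders; only the separation of the identity and gender vectors is the point being illustrated.

    import numpy as np

    def gender_preserving_input(identity_onehot, gender_onehot, epsilon, sensitivity, rng):
        """Build the generator input: perturb only the identity vector, pass gender through."""
        noisy = identity_onehot + rng.laplace(0.0, sensitivity / epsilon, identity_onehot.shape)
        noisy = np.clip(noisy, 0.0, None)
        noisy = noisy / noisy.sum() if noisy.sum() > 0 else np.full_like(noisy, 1.0 / noisy.size)
        # The gender vector is left untouched, so the trained model reproduces it exactly.
        return np.concatenate([noisy, gender_onehot])

    rng = np.random.default_rng(2)
    identity = np.eye(57)[12]       # one-hot identity vector (toy: 57 identities)
    gender = np.array([1.0, 0.0])   # two-class gender input, preserved as-is
    print(gender_preserving_input(identity, gender, epsilon=300.0, sensitivity=2.0, rng=rng))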
From these results, we see that the basic models for differential privacy and k-same obfuscation suffer a degradation in classification accuracy as the level of privacy is strengthened (i.e., as identity classification error increases). Comparing the gender-preserved models to their basic counterparts, we see a large improvement in the classification accuracy, suggesting that this is an effective approach for the preservation of specific properties in the obfuscated output. In some cases, the classification accuracy of the obfuscated images has surpassed that of the original data. This is a result of the explicit specification of gender labels in the network input which can lead to obfuscated identities that more prominently display these features.

Figure 7: Gender classification accuracy for the methods of obfuscation.

Another common task is the detection of facial expressions in images [75]. To compare utility in this context, we measure the classification accuracy for facial expressions by using pre-trained neural network models [76] intended for this task. For these experiments, we perform classification for six expressions: happiness, sadness, surprise, disgust, fear and a neutral expression. As the pre-trained models have different strengths and weaknesses with respect to their abilities to classify different expressions, we have employed them in combination to achieve high levels of classification accuracy on our data. We have found that the best results are achieved by first using the Model-4 [76] architecture trained on the RAFDB [77] dataset to detect disgust, followed by the Model-4 architecture trained on the SFEW [78] dataset to detect fear. If neither of these expressions was detected, we then take the sum of the two vectors of model predictions and select the highest prediction as the detected expression; a sketch of this decision rule is given below. As the models are not trained to detect contempt and give very poor accuracies for detection of anger in our datasets, we exclude these two expressions from our experiments.

For each expression, we generate a full set of obfuscated images as described in the experiments for re-identification risk (Section 6.2), providing the expression label as input to the network in order to apply the chosen expression to the obfuscated output. An example of expression-preserving output is shown in Figure 8. We measure both identity classification accuracy and expression classification accuracy as the average over all six expressions for each privacy parameter used in both differential privacy and k-same obfuscation. We then plot the expression classification accuracy as a function of the identity classification accuracy in order to once again abstract away from the privacy parameters and compare the utility of the two methods of obfuscation. The results are shown in Figure 9.

Figure 8: Expression-preserving obfuscation via the generative NN on the RAFD dataset. The top row employs differential privacy and the bottom row employs k-same obfuscation.

Figure 9: Facial expression classification accuracy for the methods of obfuscation.

As with the experiments on gender classification accuracy, when the chosen attributes are expressly preserved by the network, we observe classification accuracies that exceed the baseline accuracies measured on the unobfuscated datasets. This is again likely due to the network displaying the relevant features more prominently than the original images, making the task of the classification network easier.
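The expression ensemble described above can be summarized as follows. This is a minimal sketch in which we interpret "detecting" an expression as that expression receiving the model's highest prediction; the vector names and example values are ours.

    import numpy as np

    EXPRESSIONS = ["happiness", "sadness", "surprise", "disgust", "fear", "neutral"]

    def classify_expression(pred_rafdb, pred_sfew):
        """Ensemble rule: disgust from the RAFDB model, then fear from the SFEW model,
        otherwise the argmax of the summed prediction vectors."""
        if np.argmax(pred_rafdb) == EXPRESSIONS.index("disgust"):
            return "disgust"
        if np.argmax(pred_sfew) == EXPRESSIONS.index("fear"):
            return "fear"
        # Neither special case fired: sum the two prediction vectors and take the argmax.
        return EXPRESSIONS[int(np.argmax(pred_rafdb + pred_sfew))]

    print(classify_expression(np.array([0.30, 0.05, 0.10, 0.25, 0.10, 0.20]),
                              np.array([0.20, 0.10, 0.15, 0.10, 0.25, 0.20])))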
In these experiments, differential privacy generally shows better utility than k-same obfuscation. Across both utility experiments, however, the overall comparison between the information preserved by differential privacy and by k-same obfuscation appears to be inconclusive. Some experiments show better results for differential privacy while other experiments show better results for k-same obfuscation. The k-same results are also more difficult to assess given the sporadic nature of the plots. This is likely due to changes in clusters between each level of obfuscation, which can greatly impact classification accuracy. It is clear that the utility is also data-dependent given the variations in the results seen on the two datasets. Notably, many subjects in the KDEF dataset, including some males, have long hair whereas all subjects in the RAFD dataset have short hair. The males with long hair in KDEF may have contributed to the lower gender classification accuracy.

Methods of k-same obfuscation are resistant to parrot attacks as a direct consequence of the process of obfuscation. Differential privacy, due to its resistance to breaches via post-processing of obfuscated data, similarly possesses a theoretical resistance to parrot attacks. Yet, the implications of parrot attacks in practical terms are less clear. A standard classification network applied to images obfuscated via differential privacy is unlikely to perform as well as it could if it exploited public knowledge about how randomization mechanisms function. An attacker could instead train a classification network for facial recognition using instances of (differentially private) obfuscated images as the training set. In this way, the network might achieve higher classification accuracy than a network trained on unobfuscated instances as it has learned to better identify features in the presence of noise. Yet, differential privacy is a stochastic method of obfuscation, so beyond the presence of noise and its approximate magnitude, there is little the network can learn in terms of predictability of differentially private output. We therefore hypothesize that a sufficiently high degree of noise induced by a differentially private mechanism can render the practical implications of a parrot attack negligible.

To observe the degree to which re-identification risk on differentially private output is impacted by parrot attacks, we have trained the VGGFace network for classification of obfuscated instances at specific privacy parameter (ε) values. This requires training a separate model for each privacy parameter value on each dataset. We then compare the classification accuracy achieved by the parrot attacks to the accuracy of the models trained on the unobfuscated instances (Figure 10).

Figure 10: Identity classification accuracy for a parrot attack.

At high values of ε, there is an increase in classification accuracy for the parrot attacks as the network has become better suited to ignoring small amounts of noise. However, as the value of ε decreases, the gap rapidly closes and the trend reverses (at roughly ε = 300 for the datasets we have employed), with the parrot attack showing lower classification accuracy than the model trained on unobfuscated data. This is likely due to the higher magnitude of noise destroying many of the useful patterns that the network otherwise learns in the training data.
As a result, we expect that for reasonable configurations of privacy parameters that would be used in practice, parrot attacks would provide little, if any, advantage to an attacker.

Figure 11: Identity classification accuracy for a composition attack.

One of the key advantages that differential privacy provides over k-anonymity is the property of secure composition, which ensures that the privacy guarantee is never violated in the scenario of uncoordinated releases of sensitive data. To demonstrate the resilience of our proposed method of obfuscation against composition attacks, we simulate such an attack and measure the identity classification accuracy for both our implementation of differential privacy and k-same obfuscation.

We consider a scenario in which an image is uploaded to two different platforms, each of which applies facial obfuscation in an uncoordinated manner. Through observation of the non-obfuscated portions of the image (e.g., the background of the image), an attacker could determine that the two obfuscated images originally depicted the same individual, enabling them to perform a composition attack. To simulate this, we split a dataset into two subsets of equal size. We select the subsets such that they have a non-empty intersection but also contain many identities that are not found in the other subset. We then train a generative network model on each of the subsets of individuals. This provides us with two models which have some identities in common but which will obfuscate those identities in different ways due to having been trained on differing subsets. These models represent two organizations which will perform obfuscation in an uncoordinated manner. We then use the models to obfuscate only the identities that are present in both subsets. The two obfuscated images produced for each identity represent the two images that an attacker would examine in a composition attack.

As the RAFD and KDEF datasets are too small to achieve any reasonable diversity of identities once they are further reduced to subsets, we combine both datasets together and draw the subsets from the combined dataset. To mitigate bias in identity classification from differences in the controlled settings of the two original datasets, we first crop all images to capture only the faces and then adjust the saturation and contrast of the images to match more closely. From the combined dataset of 135 individuals, we select 50 individuals to be shared across both subsets and split the remaining individuals evenly between them. We match the male-to-female ratios of both subsets to the full set of images but otherwise select the individuals for each subset at random.

To perform a composition attack on a pair of obfuscated images, we first provide both images as input to the VGGFace network and add together the two resultant vectors of prediction values for the full set of identities. For k-same obfuscation, we then take the intersection of the identities from each cluster used by the two models during obfuscation and select the identity within this intersection that has the highest prediction value from the summed vectors. In practice, an attacker will not necessarily know the exact identities used in each cluster with certainty but could likely determine them with reasonable accuracy by taking the top k predictions from their facial recognition network on each of the obfuscated images.
By using the exact identities in our experiments, we effectively test the worst-case scenario.

When launching a composition attack against differentially private output, the attacker no longer has the concept of clusters of identities to use to their advantage. The original image may have depicted any of the potential identities. The attacker may still use the two images to increase the accuracy of their prediction, but the additional information does not provide them with any means to violate the differential privacy guarantee. To simulate an attacker using the additional information in this context, we select the highest prediction value from the summed vectors of predictions for each pair of images. Since we generate ten instances of obfuscated output for each identity, we average the prediction accuracy over all ten pairs of images for each identity.

The results are shown in Figure 11, comparing the re-identification risk in each method of obfuscation for a single obfuscated image to the risk from a pair of images on which a composition attack has been performed. With the k-same obfuscation, we observe a significant increase in re-identification risk which greatly exceeds the theoretical maximum value of 1/k. This clearly demonstrates a violation of the k-same privacy guarantee. With differential privacy, we also observe an increase in re-identification risk, as would be expected due to the additional information provided to the attacker; however, the property of secure composition ensures that the privacy guarantee is preserved. Furthermore, the gap between the single-image re-identification risk and the composition risk is less significant on the differentially private images than on the k-same images, and becomes marginal at high levels of privacy (e.g., when ε < 300 in Figure 11). Since privacy parameters that are conducive to a high level of privacy would typically be selected in most realistic scenarios, we expect that the degradation in privacy due to multiple instances of differentially private output being distributed in practice would be minimal.
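The scoring performed by the simulated attacker can be sketched as follows. The toy prediction vectors and cluster sets are placeholders, but the two decision rules mirror the procedure described above.

    import numpy as np

    def compose_predictions(pred_a, pred_b):
        """Sum the two prediction vectors obtained from the pair of obfuscated images."""
        return np.asarray(pred_a) + np.asarray(pred_b)

    def attack_k_same(pred_a, pred_b, cluster_a, cluster_b):
        """k-same composition attack: restrict guesses to the clusters' intersection."""
        summed = compose_predictions(pred_a, pred_b)
        candidates = sorted(set(cluster_a) & set(cluster_b))
        return max(candidates, key=lambda i: summed[i])

    def attack_differential_privacy(pred_a, pred_b):
        """Against DP output there are no clusters to intersect; use the global argmax."""
        return int(np.argmax(compose_predictions(pred_a, pred_b)))

    # Toy example with 6 identities; clusters are the identity sets used by each model.
    pa = np.array([0.10, 0.30, 0.05, 0.25, 0.20, 0.10])
    pb = np.array([0.05, 0.25, 0.10, 0.35, 0.15, 0.10])
    print(attack_k_same(pa, pb, cluster_a={1, 3, 5}, cluster_b={0, 3, 5}))  # -> 3
    print(attack_differential_privacy(pa, pb))                              # -> 3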
In this section, we compare our proposed use of a pixel-space exponential mechanism to an existing implementation [41] of a Laplace mechanism in pixel space. In these experiments, we employ the FaceScrub dataset [79] to reflect the ability of pixel-space obfuscation to handle diverse image content. FaceScrub consists of a collection of facial images spanning roughly 500 individuals in diverse conditions with respect to background content, lighting, pose, etc. We resize all images to 128 × 128 pixels and randomly select ten images per individual to act as a training dataset for a facial classification network and three images per individual to act as the testing set. Given that differentially private obfuscation is a stochastic process, we produce three obfuscated instances per test image, resulting in a total of nine obfuscated images per identity. We reiterate that pixel-space obfuscation requires neither training data nor a generative model for the data. The use of training data in our experiments is strictly to train a classification network in order to measure re-identification risk and report on the performance of the obfuscation mechanisms.

The work in [41] proposes the combination of pixelization with differential privacy in order to better manage the privacy budget. We therefore incorporate this into our experiments by using three different pixelization grid sizes: 4 × 4, 8 × 8, and 16 × 16. As it is known that blurring a pixelized image can improve the ability of humans to recognize the image content [22], we additionally test versions of the images that have been blurred as a post-processing step after obfuscation has been applied. We blur the images using a Gaussian kernel with a standard deviation of 1. Recall that post-processing cannot impact the differential privacy guarantee [61]. However, this does not preclude the possibility of an impact on both re-identification risk and utility.
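For concreteness, the following sketch combines pixelization, per-cell Laplace noise, and the Gaussian blur post-processing step. The noise scale follows the sensitivity-based calibration of [41] summarized earlier (with m playing the role of the pixel-difference parameter n); the parameter names and the choice to draw one noise value per pixelization cell are our own simplifications.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def laplace_pixelization(img, epsilon, m, b, k=256, rng=None):
        """Pixelize into b x b cells, then add Laplace noise to each cell value.

        Image dimensions are assumed to be divisible by b.
        """
        rng = rng or np.random.default_rng()
        h, w, c = img.shape
        cells = img.reshape(h // b, b, w // b, b, c).mean(axis=(1, 3))  # pixelization
        scale = (k - 1) * m * c / (epsilon * b ** 2)
        cells = cells + rng.laplace(0.0, scale, cells.shape)            # one draw per cell
        noisy = np.repeat(np.repeat(cells, b, axis=0), b, axis=1)
        return np.clip(noisy, 0, k - 1)

    rng = np.random.default_rng(3)
    image = rng.integers(0, 256, size=(128, 128, 3)).astype(float)
    obfuscated = laplace_pixelization(image, epsilon=0.5, m=10000, b=16, rng=rng)
    # Blurring is a post-processing step and cannot weaken the DP guarantee [61].
    blurred = gaussian_filter(obfuscated, sigma=(1.0, 1.0, 0.0))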
We begin by examining the impact of pixelization and blurring on the obfuscated output. The three pixelization settings combined with a boolean option for blurring result in six potential configurations for each of the mechanisms. We first compare the variants of the exponential mechanism separately from the variants of the Laplace mechanism in order to clearly observe the impact of these settings within the same class of mechanism. The results of this comparison are shown in Figure 12. Here we plot SSIM as a function of the privacy budget ε (i.e., the composition of the privacy parameters over all applications of the mechanism needed to obfuscate an image). The SSIM values are calculated as the average score over all obfuscated instances. Lower values on the x-axis represent a stronger privacy guarantee and higher values on the y-axis represent better utility. We additionally show examples of obfuscated images in Figure 13.

Figure 12: A comparison of the SSIM achieved by the mechanisms over a wide range of privacy budgets. The left graph compares variants of the exponential mechanism and the right graph compares variants of the Laplace mechanism.

From the results of Figure 12, we note two main trends that hold for both the exponential and the Laplace mechanism. First, for any given pixelization setting, the use of blurring always offers a higher level of utility than the non-blurred configuration. This can be seen by the dashed plots representing the blur variants, which universally result in higher SSIM than their non-blur counterparts shown by the solid line plots. The second common trend is that stronger pixelization (i.e., larger pixelization grid sizes) provides higher SSIM for strong levels of privacy (i.e., low values of ε), while weaker pixelization provides higher SSIM for weak levels of privacy. The better performance of strong pixelization for small values of ε follows from the usefulness of pixelization as a means to reduce the query sensitivity, leading to better management of the privacy budget. The reversal of this trend for large values of ε is a result of the minimal amount of noise that is added by the mechanisms for such weak privacy parameters. Due to this, the predominant modification to the images is the pixelization rather than the application of noise, leading to worse utility in the stronger levels of pixelization. Similar results can be seen in Figure 16 of Appendix B using the mean squared error (MSE) of pixel intensities as an alternate measure of utility. Note that in the case of MSE, lower values indicate better utility.

Given that the intent is to provide a meaningful level of privacy through differentially private obfuscation, these large privacy budgets are of little practical value. This can be easily confirmed by visual inspection of the obfuscated examples at high values of ε in Figure 13. Consequently, strong pixelization with small values of ε should be used in practice to achieve meaningful levels of privacy while attaining the best trade-off with respect to utility. Furthermore, the blurring operation appears to always be beneficial with respect to utility in the obfuscated output.

We next turn to a comparison between the exponential mechanism and the Laplace mechanism. Using only the blurred variants, we compare the two mechanisms at all three levels of pixelization. The results are shown in Figure 14-a. For any given privacy budget ε, the highest plotted mechanism on the y-axis reflects the best performance.
The exponential mechanisms, plotted as dashed lines, almost exclusively make up the upper envelope of the plots. This demonstrates a consistently stronger performance from our proposed mechanism in comparison to the Laplace mechanism. Furthermore, since we are primarily interested in the privacy/utility trade-off for low values of ε which provide a meaningful privacy guarantee, we show a zoomed-in comparison of the two mechanisms using a pixelization setting of 16 in Figure 14-b. This comparison shows a notable improvement in SSIM from the exponential mechanism in comparison to the Laplace mechanism for strong levels of privacy (i.e., small values of ε). Note that, while we refer to these budgets as small, the scale on which ε is interpreted remains quite large. Yet, given the obfuscated examples of Figure 13, it is clear that these values of ε provide a strong level of obfuscation. These results suggest that the interpretation of the privacy guarantee using concepts of adjacency and query sensitivity may not be appropriate for pixel-space obfuscation. As with our proposed approach to obfuscation via generative models, the privacy guarantee is likely better interpreted in pixel space using a distance-based generalization of differential privacy. We leave further investigation of this topic as an open question.

Figure 13: A visual comparison of obfuscated instances produced by the blur variants of both mechanisms at all three pixelization settings.

Figure 14: A comparison between the blur variants of the exponential and Laplace mechanisms with respect to utility measured using SSIM. The left graph compares all three pixelization settings for the mechanisms and the right graph focuses on the mechanisms using a pixelization grid of size 16 for efficient use of low privacy budgets.

Figure 15: A comparison of the re-identification risk and utility trade-off of the mechanisms. Re-identification risk is measured using the average classification accuracy of a facial identity classification network.
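The utility measurement behind Figure 12 can be sketched as follows: average SSIM between each original image and its obfuscated instances across a grid of privacy budgets. The sketch assumes a recent scikit-image (the channel_axis keyword) and uses a simple additive-noise mechanism as a stand-in for the actual mechanisms.

    import numpy as np
    from skimage.metrics import structural_similarity

    def average_ssim(original, obfuscate, budgets, trials=3, rng=None):
        """Average SSIM between an image and its obfuscated instances per budget.

        `obfuscate` is any mechanism with signature (image, epsilon, rng) -> image.
        """
        rng = rng or np.random.default_rng()
        results = {}
        for eps in budgets:
            scores = [structural_similarity(original,
                                            obfuscate(original, eps, rng),
                                            data_range=255.0,
                                            channel_axis=2)
                      for _ in range(trials)]
            results[eps] = float(np.mean(scores))
        return results

    # Toy usage with a stand-in mechanism: additive Laplace noise in pixel space.
    rng = np.random.default_rng(4)
    img = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
    noise = lambda im, eps, r: np.clip(im + r.laplace(0, 255.0 / eps, im.shape), 0, 255)
    print(average_ssim(img, noise, budgets=[10.0, 100.0, 1000.0], rng=rng))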
As a final comparison of the mechanisms, we use a facial identity classification network to measure re-identification risk. To this end, we employ FaceNet [80], a network which has been demonstrated to provide high levels of identity classification accuracy even when presented with low-resolution or otherwise poor-quality images. This property makes it a good candidate for re-identification of images obfuscated in pixel space, which are subject to a reduction in resolution due to pixelization and heavy distortion due to the addition of noise. For reference, we provide the network architecture in Table 2 of Appendix A. For each combination of a mechanism type, pixelization setting, and privacy budget, we train the network using the training partition of the dataset subject to obfuscation via the selected mechanism configuration. As with the testing data, we produce three obfuscated instances per image, leading to a total of 30 training instances per identity. This allows us to use the classification network to launch a parrot attack on the obfuscated data.

Results are shown in Figure 15. We plot utility as a function of re-identification risk, which is measured as the average identity classification accuracy over the obfuscated testing images. A desirable privacy/utility trade-off is reflected by low re-identification risk and high utility. In Figure 15-a, we measure utility as SSIM, where high values on the y-axis indicate higher utility, and in Figure 15-b, we measure utility using MSE, where low values on the y-axis indicate higher utility. In both graphs, we show the mechanisms using pixelization settings of 4 and 16. We omit the pixelization setting of 8 for better visual clarity due to the clutter caused by the closeness of the plots. Note that the plots for the variants using a pixelization setting of 16 all remain beneath 2% re-identification risk. This is due to the strong level of pixelization which prevents effective re-identification even in the absence of noise added by the mechanisms.

Interestingly, the variants using a pixelization setting of 4 provide the best utility, with the exponential mechanism providing the best performance with respect to SSIM and the Laplace mechanism providing the best performance with respect to MSE. However, as we have noted in the previous experiments, the variants using a pixelization setting of 4 have a worse trade-off between the privacy budget and the level of utility. Due to this, the better performance observed in Figure 15 should be considered with caution as the higher values of ε imply worse theoretical properties of privacy protection despite the low re-identification risk. It is, for example, possible that the use of different classification networks which exploit different aspects of the obfuscated images may show different trends in the trade-off between re-identification risk and utility. We leave further investigation into the measure of re-identification risk in practice as future work.
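The parrot-attack evaluation protocol can be sketched as below. A logistic regression over flattened pixels stands in for FaceNet, and the obfuscation mechanism is a placeholder; only the protocol (training and testing on obfuscated instances generated at the same budget) follows the description above.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def parrot_attack_accuracy(train_imgs, train_ids, test_imgs, test_ids,
                               obfuscate, epsilon, per_image=3, rng=None):
        """Train on obfuscated instances and classify obfuscated test images."""
        rng = rng or np.random.default_rng()
        X = [obfuscate(im, epsilon, rng).ravel()
             for im in train_imgs for _ in range(per_image)]
        y = [i for i in train_ids for _ in range(per_image)]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        X_test = [obfuscate(im, epsilon, rng).ravel()
                  for im in test_imgs for _ in range(per_image)]
        y_test = [i for i in test_ids for _ in range(per_image)]
        return clf.score(X_test, y_test)

    rng = np.random.default_rng(5)
    imgs = [rng.random((16, 16)) for _ in range(4)]   # toy grayscale faces
    ids = [0, 0, 1, 1]
    noise = lambda im, eps, r: im + r.laplace(0, 1.0 / eps, im.shape)
    print(parrot_attack_accuracy(imgs, ids, imgs, ids, noise, epsilon=5.0, rng=rng))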
Based on the results of our experimental comparisons, we recommend the use of the exponential mechanism along with a strong level of pixelization (e.g., a pixelization grid size of 16) to achieve high levels of utility in both SSIM and MSE while using a low privacy budget in order to enforce a strong level of privacy. Furthermore, blurring can be applied to the obfuscated results to improve the level of utility. If one is strictly interested in a practical interpretation of privacy as re-identification risk, the use of a weaker level of pixelization appears to have the potential to provide a better risk/utility trade-off. However, we caution readers that this comes at the cost of the strong theoretical properties of privacy protection that would otherwise be achieved via a low value of ε.

We have studied how to obtain a formalized privacy guarantee for the obfuscation of facial images in practice. We have identified shortcomings of the k-same privacy guarantee, including susceptibilities to background knowledge and composition attacks as well as the awkwardness of the requirement for a gallery of input images. To improve upon this, we have proposed the use of differential privacy in the context of obfuscation applied to generative models for images. We have developed a framework that provides a meaningful privacy guarantee for such models and we have derived the configuration of the Laplace mechanism that can achieve this privacy guarantee. Our approach preserves the privacy guarantee in the presence of attackers with background knowledge, provides resistance to composition attacks and removes the requirement for a gallery of input images. We have also proposed the use of a more general mechanism to obfuscate any image directly in pixel space. This allows for greater versatility in the obfuscation of images.

We have implemented our proposed mechanisms as well as the competing approaches discussed in the paper for experimental comparisons. In our experiments on pixel-space mechanisms, we have demonstrated improvements in measures of visual quality for our proposed use of an exponential mechanism over the Laplace mechanism. In our experiments using generative models, we have implemented both our proposed framework as well as k-same obfuscation. Through our comparisons, we have demonstrated the resilience of differential privacy against parrot and composition attacks. Furthermore, we have shown that this application of differential privacy can achieve comparable utility to k-same obfuscation. We conclude that the key improvements in the privacy guarantee combined with comparable levels of utility make differential privacy a much more appropriate choice for the obfuscation of facial images.

In this work, we have focused on introducing a framework and examining its theoretical and practical properties for frontal-facing images taken in a controlled environment using standard facial image datasets. In future work, this could be extended to studying generative adversarial networks for the generation of detailed images drawn from more complex distributions. Furthermore, we posit that the approximation of unknown identities can be improved upon through the use of auto-encoder style architectures seen in some generative networks.

We also leave as an open problem the potential for improvements to the exponential mechanism for pixel-space obfuscation.
Given that our proposed implementation is limited both in the size of the exponential grid and the precision of the pixel intensities, there is room for improvement in the quality of the obfuscated output if one can find more effective methods to handle the computational complexity of the mechanism. Additionally, investigation of SSIM variants for use as the quality function may lead to improved preservation of visual quality in the obfuscated output. For example, the complex wavelet SSIM [81], which is robust to small changes in translation and scaling, could be explored for use in this context.

References

[1] A. Cavallaro, "Privacy in Video Surveillance [In the Spotlight]," IEEE Signal Processing Magazine, vol. 24, no. 2, pp. 168–166, 2007.
[2] T. Winkler and B. Rinner, "Security and Privacy Protection in Visual Sensor Networks: A Survey," ACM Comput. Surv., vol. 47, no. 1, pp. 1–42, 2014.
[3] S. Ribaric, A. Ariyaeeinia, and N. Pavesic, "De-identification for Privacy Protection in Multimedia Content: A Survey," Signal Processing: Image Communication, vol. 47, pp. 131–151, 2016.
[4] J. R. Padilla-López, A. A. Chaaraoui, and F. Flórez-Revuelta, "Visual Privacy Protection Methods: A Survey," Expert Syst. Appl., vol. 42, no. 9, pp. 4177–4195, 2015.
[5] Google, "Google Maps." Accessed: February 27, 2019.
[6] A. Frome, G. Cheung, A. Abdulkader, M. Zennaro, B. Wu, A. Bissacco, H. Adam, H. Neven, and L. Vincent, "Large-Scale Privacy Protection in Google Street View," in IEEE 12th International Conference on Computer Vision, pp. 2373–2380, 2009.
[7] K. Martin and K. Shilton, "Putting Mobile Application Privacy in Context: An Empirical Study of User Privacy Expectations for Mobile Devices," The Information Society, vol. 32, no. 3, pp. 200–216, 2016.
[8] X. Hu, D. Hu, S. Zheng, W. Li, F. Chen, Z. Shu, and L. Wang, "How People Share Digital Images in Social Networks: A Questionnaire-Based Study of Privacy Decisions and Access Control," Multimedia Tools and Applications, vol. 77, no. 14, pp. 18163–18185, 2018.
[9] O. M. Parkhi, A. Vedaldi, and A. Zisserman, "Deep Face Recognition," in British Machine Vision Conference, 2015.
[10] Y. Li, N. Vishwamitra, B. P. Knijnenburg, H. Hu, and K. Caine, "Blur vs. Block: Investigating the Effectiveness of Privacy-Enhancing Obfuscation for Images," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1343–1351, 2017.
[11] Y. Li, N. Vishwamitra, B. P. Knijnenburg, H. Hu, and K. Caine, "Effectiveness and Users' Experience of Obfuscation As a Privacy-Enhancing Technology for Sharing Photos," Proc. ACM Hum.-Comput. Interact., vol. 1, pp. 1–24, 2017.
[12] G. Cormode, C. M. Procopiuc, E. Shen, D. Srivastava, and T. Yu, "Empirical Privacy and Empirical Utility of Anonymized Data," in IEEE 29th International Conference on Data Engineering Workshops, pp. 77–82, 2013.
[13] M. M. Almasi, T. R. Siddiqui, N. Mohammed, and H. Hemmati, "The Risk-Utility Tradeoff for Data Privacy Models," pp. 1–5, 2016.
[14] S. E. Hudson and I. Smith, "Techniques for Addressing Fundamental Privacy and Disruption Tradeoffs in Awareness Support Systems," in ACM Conference on Computer Supported Cooperative Work, pp. 248–257, 1996.
[15] J. Kröckel and F. Bodendorf, "Customer Tracking and Tracing Data as a Basis for Service Innovations at the Point of Sale," in Annual SRII Global Conference, pp. 691–696, 2012.
[16] X. Liu, N. Krahnstoever, T. Yu, and P. Tu, "What Are Customers Looking at?," in IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 405–410, 2007.
[17] F. Anwar, I. Petrounias, T. Morris, and V. Kodogiannis, "Mining Anomalous Events Against Frequent Sequences in Surveillance Videos from Commercial Environments," Expert Syst. Appl., vol. 39, no. 4, pp. 4511–4531, 2012.
[18] P. L. Venetianer, Z. Zhang, A. Scanlon, Y. Hu, and A. J. Lipton, "Video Verification of Point of Sale Transactions," in IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 411–416, 2007.
[19] E. M. Newton, L. Sweeney, and B. Malin, "Preserving Privacy by De-Identifying Face Images," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 232–243, 2005.
[20] S. R. Ganta, S. P. Kasiviswanathan, and A. Smith, "Composition Attacks and Auxiliary Information in Data Privacy," in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 265–273, 2008.
[21] L. Harmon and B. Julesz, "Masking in Visual Recognition: Effects of Two-Dimensional Filtered Noise," Science, vol. 180, no. 4091, pp. 1194–1197, 1973.
[22] L. Harmon, "The Recognition of Faces," Scientific American, vol. 229, no. 5, pp. 71–82, 1973.
[23] G. Letournel, A. Bugeau, V.-T. Ta, and J.-P. Domenger, "Face De-Identification with Expressions Preservation," in IEEE International Conference on Image Processing, pp. 4366–4370, 2015.
[24] P. Korshunov and T. Ebrahimi, "Using Warping for Privacy Protection in Video Surveillance," pp. 1–6, 2013.
[25] P. Korshunov and T. Ebrahimi, "Using Face Morphing to Protect Privacy," pp. 208–213, 2013.
[26] D. Bitouk, N. Kumar, S. Dhillon, P. Belhumeur, and S. K. Nayar, "Face Swapping: Automatically Replacing Faces in Photographs," ACM Trans. Graph., vol. 27, no. 3, pp. 39:1–39:8, 2008.
[27] S. Mosaddegh, L. Simon, and F. Jurie, "Photorealistic Face De-Identification by Aggregating Donors' Face Components," in Asian Conference on Computer Vision, pp. 159–174, 2015.
[28] A. Brand and J. A. Lal, "European Best Practice for Quality Assurance, Provision and Use of Genome-based Information and Technologies," Drug Metabol Drug Interact, vol. 27, pp. 177–182, 2012.
[29] U.S. Department of Health & Human Services, "Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule," 2015. Accessed: February 9, 2018.
[30] P. Samarati and L. Sweeney, "Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement through Generalization and Suppression," tech. rep., Computer Science Laboratory, SRI International, 1998.
[31] R. Gross, L. Sweeney, F. de la Torre, and S. Baker, "Model-Based Face De-Identification," in Computer Vision and Pattern Recognition Workshop, pp. 161–161, 2006.
[32] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active Appearance Models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, 2001.
[33] L. Meng and Z. Sun, "Face De-identification with Perfect Privacy Protection," pp. 1234–1239, 2014.
[34] H. Chi and Y. H. Hu, "Face De-Identification Using Facial Identity Preserving Features," in IEEE Global Conference on Signal and Information Processing, pp. 586–590, 2015.
[35] B. Meden, Z. Emersic, V. Struc, and P. Peer, "k-Same-Net: Neural-Network-Based Face Deidentification," in International Conference and Workshop on Bioinspired Intelligence, pp. 1–7, 2017.
[36] A. Dosovitskiy, J. T. Springenberg, M. Tatarchenko, and T. Brox, "Learning to Generate Chairs, Tables and Cars with Convolutional Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 692–705, 2017.
[37] R. Gross, E. Airoldi, B. Malin, and L. Sweeney, "Integrating Utility into Face De-identification," in Privacy Enhancing Technologies, pp. 227–242, 2006.
[38] L. Du, M. Yi, E. Blasch, and H. Ling, "GARP-Face: Balancing Privacy Protection and Utility Preservation in Face De-Identification," in IEEE International Joint Conference on Biometrics, pp. 1–8, 2014.
[39] T. Sim and L. Zhang, "Controllable Face Privacy," vol. 04, pp. 1–8, 2015.
[40] L. Meng, Z. Sun, A. Ariyaeeinia, and K. L. Bennett, "Retaining Expressions on De-Identified Faces," pp. 1252–1257, 2014.
[41] L. Fan, "Image Pixelization with Differential Privacy," in Data and Applications Security and Privacy, pp. 148–162, 2018.
[42] L. Fan, "Practical Image Obfuscation with Provable Privacy," in IEEE International Conference on Multimedia and Expo, pp. 784–789, 2019.
[43] W. L. Croft, J. Sack, and W. Shi, "Differentially Private Obfuscation of Facial Images," in Machine Learning and Knowledge Extraction, pp. 229–249, 2019.
[44] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, "Membership Inference Attacks Against Machine Learning Models," in IEEE Symposium on Security and Privacy, pp. 3–18, 2017.
[45] B. Hitaj, G. Ateniese, and F. Pérez-Cruz, "Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning," in ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618, 2017.
[46] L. Melis, C. Song, E. D. Cristofaro, and V. Shmatikov, "Exploiting Unintended Feature Leakage in Collaborative Learning," in IEEE Symposium on Security and Privacy, pp. 691–706, 2019.
[47] M. Fredrikson, S. Jha, and T. Ristenpart, "Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures," in ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333, 2015.
[48] M. Abadi, A. Chu, I. J. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, "Deep Learning with Differential Privacy," in ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318, 2016.
[49] L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou, "Differentially Private Generative Adversarial Network," CoRR, 2018.
[50] X. Zhang, S. Ji, and T. Wang, "Differentially Private Releasing via Deep Generative Model," CoRR, 2018.
[51] A. Triastcyn and B. Faltings, "Generating Differentially Private Datasets Using GANs," CoRR, 2018.
[52] P. C. Roy and V. N. Boddeti, "Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2586–2594, 2019.
[53] J. Chen, J. Konrad, and P. Ishwar, "VGAN-Based Image Representation Learning for Privacy-Preserving Facial Expression Recognition," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1570–1579, 2018.
[54] Z. Ren, Y. J. Lee, and M. S. Ryoo, "Learning to Anonymize Faces for Privacy Preserving Action Detection," in European Conference on Computer Vision, pp. 639–655, 2018.
[55] Y. Wu, F. Yang, Y. Xu, and H. Ling, "Privacy-Protective-GAN for Privacy Preserving Face De-Identification," Journal of Computer Science and Technology, vol. 34, no. 1, pp. 47–60, 2019.
[56] A. Basu, T. Nakamura, S. Hidano, and S. Kiyomoto, "k-anonymity: Risks and the Reality," in IEEE Trustcom/BigDataSE/ISPA, vol. 1, pp. 983–989, 2015.
[57] K. Nissim and A. Wood, "Is Privacy Privacy?," Philosophical Transactions of the Royal Society A, vol. 376, 2018.
[58] K. Nissim, A. Bembenek, A. Wood, M. Bun, M. Gaboardi, U. Gasser, D. R. O'Brien, and S. Vadhan, "Bridging the Gap between Computer Science and Legal Approaches to Privacy," Harvard Journal of Law & Technology, vol. 31, pp. 687–780, 2018.
[59] R. Cummings and D. Desai, "The Role of Differential Privacy in GDPR Compliance," 2018. Accessed: October 2, 2019.
[60] C. Dwork, "Differential Privacy," in International Colloquium on Automata, Languages and Programming, pp. 1–12, 2006.
[61] C. Dwork and A. Roth, "The Algorithmic Foundations of Differential Privacy," Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, 2014.
[62] K. Chatzikokolakis, M. E. Andrés, N. E. Bordenabe, and C. Palamidessi, "Broadening the Scope of Differential Privacy Using Metrics," in Privacy Enhancing Technologies, pp. 82–102, 2013.
[63] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi, "Geo-indistinguishability: Differential Privacy for Location-based Systems," in ACM SIGSAC Conference on Computer & Communications Security, pp. 901–914, 2013.
[64] T. S. Ferguson, "Linear Programming: A Concise Introduction." Accessed: September 9, 2019.
[65] F. McSherry and K. Talwar, "Mechanism Design via Differential Privacy," in IEEE Symposium on Foundations of Computer Science, pp. 94–103, 2007.
[66] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[67] M. Flynn, "Generating Faces with Deconvolution Networks," 2016. Accessed: November 1, 2018.
[68] O. Langner, R. Dotsch, G. Bijlstra, D. Wigboldus, S. Hawk, and A. van Knippenberg, "Presentation and Validation of the Radboud Faces Database," Cognition and Emotion, vol. 24, no. 8, pp. 1377–1388, 2010.
[69] D. Lundqvist, A. Flykt, and A. Öhman, "The Karolinska Directed Emotional Faces – KDEF," 1998. ISBN 91-630-7164-9.
[70] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in International Conference on Learning Representations, 2015.
[71] X. Glorot and Y. Bengio, "Understanding the Difficulty of Training Deep Feedforward Neural Networks," in International Conference on Artificial Intelligence and Statistics, vol. 9, pp. 249–256, 2010.
[72] C. B. Ng, Y. H. Tay, and B.-M. Goi, "Recognizing Human Gender in Computer Vision: A Survey," in PRICAI 2012: Trends in Artificial Intelligence, pp. 335–346, 2012.
[73] G. Levi and T. Hassner, "Age and Gender Classification Using Convolutional Neural Networks," in IEEE Computer Vision and Pattern Recognition Workshops, pp. 34–42, 2015.
[74] C. Song and V. Shmatikov, "Overlearning Reveals Sensitive Attributes," in International Conference on Learning Representations, 2020.
[75] Y.-L. Tian, T. Kanade, and J. F. Cohn, "Facial Expression Analysis," in Handbook of Face Recognition (S. Li and A. Jain, eds.), ch. 11, pp. 247–275, Springer Science+Business Media Inc., 2005.
[76] D. Acharya, Z. Huang, D. P. Paudel, and L. Van Gool, "Covariance Pooling for Facial Expression Recognition," in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 480–4807, 2018.
[77] S. Li, W. Deng, and J. Du, "Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2584–2593, 2017.
[78] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon, "Static Facial Expression Analysis in Tough Conditions: Data, Evaluation Protocol and Benchmark," in IEEE International Conference on Computer Vision Workshops, pp. 2106–2112, 2011.
[79] H. Ng and S. Winkler, "A Data-Driven Approach to Cleaning Large Face Datasets," in IEEE International Conference on Image Processing, pp. 343–347, 2014.
[80] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823, 2015.
[81] M. P. Sampat, Z. Wang, S. Gupta, A. C. Bovik, and M. K. Markey, "Complex Wavelet Structural Similarity: A New Image Similarity Index," IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2385–2401, 2009.
[82] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going Deeper with Convolutions," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.

Appendix A - Facial Classification Network Architectures
Layer Type | Settings
Conv       | Size: 3x3, Filters: 64, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 64, Stride: 1, Pad: 1
ReLU       |
Max Pool   | Size: 2x2, Stride: 2, Pad: 0
Conv       | Size: 3x3, Filters: 128, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 128, Stride: 1, Pad: 1
ReLU       |
Max Pool   | Size: 2x2, Stride: 2, Pad: 0
Conv       | Size: 3x3, Filters: 256, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 256, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 256, Stride: 1, Pad: 1
ReLU       |
Max Pool   | Size: 2x2, Stride: 2, Pad: 0
Conv       | Size: 3x3, Filters: 512, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 512, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 512, Stride: 1, Pad: 1
ReLU       |
Max Pool   | Size: 2x2, Stride: 2, Pad: 0
Conv       | Size: 3x3, Filters: 512, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 512, Stride: 1, Pad: 1
ReLU       |
Conv       | Size: 3x3, Filters: 512, Stride: 1, Pad: 1
ReLU       |
Max Pool   | Size: 2x2, Stride: 2, Pad: 0
FC         | Size: 7x7, Filters: 4096, Stride: 1, Pad: 0
ReLU       |
Dropout    | Rate: 0.5
FC         | Size: 1x1, Filters: 4096, Stride: 1, Pad: 0
ReLU       |
Dropout    | Rate: 0.5
FC         | Size: 1x1, Filters: IDs, Stride: 1, Pad: 0
Softmax    |

Table 1: VGG D architecture [9]. We set the number of filters on the final fully connected layer to the number of identities in the dataset on which the network is applied.

Layer Type   | Settings
Conv         | Kernel: 7x7, Filters: 64, Stride: 2
ReLU         |
Max Pool     | Size: 3x3, Stride: 2
Inception    |
Max Pool     | Size: 3x3, Stride: 2
Inception    | Uses L2 Pooling
Inception    | Uses L2 Pooling
Inception    |
Max Pool     | Size: 3x3, Stride: 2
Inception    | Uses L2 Pooling
Inception    | Uses L2 Pooling
Inception    | Uses L2 Pooling
Inception    | Uses L2 Pooling
Inception    |
Max Pool     | Size: 3x3, Stride: 2
Inception    | Uses L2 Pooling
Inception    |
Average Pool | Size: 7x7, Stride: 1
FC           | Size: 1x1, Filters: 128, Stride: 1
L2 Norm      |

Table 2: FaceNet architecture [80]. Inception refers to the use of inception blocks as described in [82]. Inception blocks listed as using L2 pooling do so in place of the standard max pooling.