Exposing GAN-generated Faces Using Inconsistent Corneal Specular Highlights
Shu Hu, Yuezun Li, and Siwei Lyu
Computer Science and Engineering, University at Buffalo, State University of New York, USA
{shuhu, yuezunli, siweilyu}@buffalo.edu

ABSTRACT
Sophisticated generative adversarial network (GAN) models are now able to synthesize highly realistic human faces that are difficult to discern from real ones visually. In this work, we show that GAN-synthesized faces can be exposed through the inconsistent corneal specular highlights between the two eyes. The inconsistency is caused by the lack of physical/physiological constraints in the GAN models. We show that such artifacts exist widely in high-quality GAN-synthesized faces, and we further describe an automatic method to extract and compare the corneal specular highlights of the two eyes. Qualitative and quantitative evaluations of our method demonstrate its simplicity and effectiveness in distinguishing GAN-synthesized faces.
1. INTRODUCTION
The rapid advancement of AI technology, easier access to large volumes of online personal media, and the increasing availability of high-throughput computing hardware have revolutionized the manipulation and synthesis of digital audio, images, and videos. A quintessential example of AI-synthesized media is the highly realistic human faces generated with generative adversarial network (GAN) models [1, 2, 3, 4] (Figure 1). As GAN-synthesized faces have passed the "uncanny valley" and are challenging to distinguish from images of real human faces, they have quickly become a new form of online disinformation. In particular, GAN-synthesized faces have been used as profile images for fake social media accounts to lure or deceive unaware users [5, 6, 7, 8].

Correspondingly, there has been a rapid development of detection methods targeting GAN-synthesized faces [9, 10]. The majority of GAN-synthesized image detection methods extract signal-level cues and then train classifiers, such as SVMs or deep neural networks, to distinguish synthesized images from real ones. Although high performance has been reported for these methods, they suffer from some common drawbacks, including the lack of interpretability of the detection results, low robustness to laundering operations and adversarial attacks [11], and poor generalization across different synthesis methods. A different type of detection method takes advantage
Fig. 1: Examples of GAN-synthesized images of realistic human faces. These images are obtained from http://thispersondoesnotexist.com, generated with the StyleGAN2 model [4].

of the inadequacy of GAN synthesis models in representing the more semantic aspects of human faces and their interactions with the physical world [12, 9, 13, 14]. Such physiological/physical based detection methods are more robust to adversarial attacks and afford intuitive interpretations.

In this work, we propose a new physiological/physical based detection method for GAN-synthesized faces that uses the inconsistency of the corneal specular highlights between the two synthesized eyes. The corneal specular highlights are the images, formed on the surface of the cornea, of light-emitting or light-reflecting objects in the environment at the time of capture. When the subject's eyes look straight at the camera and the light sources or reflections in the surrounding environment are relatively far away from the subject (i.e., the "portrait setting"), the two eyes see the same scene, and their corresponding corneal specular highlights exhibit strong similarities (Figure 2, left image). We observe that GAN-synthesized faces also comply with the portrait setting (Figure 1), possibly inherited from the real face images used to train the GAN models. However, we also note the striking inconsistencies between the corneal specular highlights of the two eyes (Figure 2, right image).

Fig. 2: Corneal specular highlights for a real human face (left) and a GAN-synthesized face (right). The corneal regions are isolated and scaled for better visibility. Note that the corneal specular highlights of the real face have strong similarities, while those of the GAN-synthesized face are different.
Our method automatically extracts and aligns the corneal specular highlights of the two eyes and compares their similarity. Our experiments show that there is a clear separation between the distributions of the similarity scores of real and GAN-synthesized faces, which can therefore be used as a quantitative feature to differentiate them.
2. BACKGROUND

Anatomy of Human Eyes. The human eye provides the optics and photo-reception for the visual system. Figure 3 shows the main anatomic parts of a human eye. At the center of the eye are the iris and the pupil. The transparent cornea is the outer layer that covers the iris and dissolves into the white sclera at the circular band known as the corneal limbus. The cornea has a spherical shape, and its surface exhibits mirror-like reflection characteristics, which generate the corneal specular highlights when the cornea is illuminated by light emitted or reflected in the environment at the time of capture.
GAN Synthesis of Human Faces. A series of recent works known as StyleGANs [2, 3, 4] have demonstrated the superior capacity of GAN models [1], trained on large sets of real human faces, in generating high-resolution realistic human faces. A GAN model consists of two neural networks trained in tandem. The generator takes random noise as input and synthesizes an image, and the discriminator aims to differentiate synthesized images from real ones. During training, the two networks compete with each other: the generator aims to create more realistic images to defeat the discriminator, while the discriminator aims to improve its accuracy in differentiating the two types of images. Training ends when the two networks reach an equilibrium.

Fig. 3: (left) Anatomy of a human eye. (right) The portrait setting with the corneal specular highlights.

Despite these successes, GAN-synthesized faces are not perfect. The early StyleGAN model was shown to generate faces with facial asymmetries [12] and inconsistent eye colors [14]. The more recent StyleGAN2 model [4] further improves the synthesis quality and eliminates such artifacts. However, visible artifacts and inconsistencies can still be observed in the background, the hair, and the eye regions. One fundamental reason for the existence of such global and semantic artifacts in GAN-synthesized faces is the models' lack of understanding of human face anatomy, especially the geometrical relations among the facial parts.
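The alternating generator/discriminator optimization described above can be illustrated on a toy problem. The sketch below is only a didactic illustration, not the StyleGAN training procedure: the "generator" is an affine map G(z) = a*z + b on 1-D noise, the "discriminator" is a logistic unit D(x) = sigmoid(w*x + c), the gradients are derived by hand, and the target distribution, initial parameters, and learning rate are arbitrary choices for the example.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Real data: samples from N(3, 1). Generator G(z) = a*z + b maps noise
# z ~ N(0, 1) to samples; discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0   # generator parameters
w, c = 0.0, 0.0   # discriminator parameters
lr = 0.05

for step in range(5000):
    x_real = random.gauss(3.0, 1.0)
    z = random.gauss(0.0, 1.0)
    x_fake = a * z + b

    # Discriminator gradient ascent on log D(x_real) + log(1 - D(x_fake)).
    dr = sigmoid(w * x_real + c)
    df = sigmoid(w * x_fake + c)
    w += lr * ((1.0 - dr) * x_real - df * x_fake)
    c += lr * ((1.0 - dr) - df)

    # Generator gradient ascent on the non-saturating objective log D(G(z)).
    df = sigmoid(w * x_fake + c)
    g = (1.0 - df) * w      # d log D(x) / dx evaluated at x_fake
    a += lr * g * z
    b += lr * g

# The generator offset b should drift from 0 toward the data mean (3.0).
print("generator offset b =", round(b, 2))
```

At equilibrium the generated distribution matches the data distribution and the discriminator output saturates near 0.5, which is the balance the paragraph above describes.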
3. RELATED WORKS
Methods detecting GAN-synthesized faces fall into three categories. Those in the first category focus on signal traces or artifacts left by the GAN synthesis model. For example, earlier works, e.g., [15, 13], use the color differences of the first generation of GAN images. As color differences can easily be fixed, more sophisticated detection methods, e.g., [10, 16], seek more abstract signal-level traces or fingerprints in the noise residuals to differentiate GAN-synthesized faces. More recent works such as [17, 18, 19] extend the analysis to the frequency domain, where the upsampling step in the GAN generation leaves specific artifacts. The second category of GAN-synthesized face detection methods is data-driven in nature [20, 21, 22, 23, 24]: a deep neural network model is trained and employed to classify real and GAN-synthesized faces. Methods in the third category look for physical/physiological inconsistencies introduced by GAN models. The work in [12] distinguishes GAN-synthesized faces by analyzing the distributions of facial landmarks, and [9] exposes fake videos by detecting inconsistent head poses. The method in [14] inspects further visual aspects to expose GAN-synthesized faces. Such physiological/physical based detection methods are more robust to adversarial attacks and afford intuitive interpretations.

Fig. 4: Overall process to obtain the corneal specular highlights. (a) The input high-resolution face image. (b) Detection of facial landmarks around the eyes. (c) Hough circle detection of the corneal area. (d) Intersection of the eye region and the circular corneal region. (e) Extracted corneal specular highlight area.

Because of its unique geometrical regularity, the corneal region of the eyes has been used in the forensic analysis of digital images.
The work of [25] estimates the internal camera parameters and light source directions from the perspective distortion of the corneal limbus and the locations of the corneal specular highlights of the two eyes, which are used to reveal digital images composited from real human faces photographed under different illumination. The work of [14] identifies early generations of GAN-synthesized faces [2] by noticing that they may have inconsistent iris colors, and that the specular reflection from the eyes is either missing or appears simplified as a white blob. However, such inconsistencies have been largely removed in the current state-of-the-art GAN synthesis models (e.g., [4]); see examples in Figure 1.
4. METHOD
In this work, we explore the use of the corneal specular highlights as a cue to expose GAN-synthesized human faces. The rationale of our method can be understood as follows. In an image of a real human face captured by a camera, the corneal specular highlights of the two eyes are related, as they are the results of the same light environment. Specifically, they are related by a transform that is determined by (1) the anatomic parameters of the two eyes, including the distance between the centers of the pupils and the diameters of the corneal limbus; (2) the poses of the two eyeballs relative to the camera coordinate system, i.e., their relative location as a result of head orientation; and (3) the location and distance of the light sources relative to the two eyes, measured in camera coordinates.

Under the following conditions, which we term the portrait setting, as it is often the case in practice when shooting closeup portrait photographs, the corneal specular highlights of the two eyes have approximately the same shape. To be more specific, the portrait setting consists of the following conditions, which are also graphically illustrated in the right panel of Figure 3.

• The two eyes have a frontal pose, i.e., the line connecting the centers of the eyeballs is parallel to the image plane of the camera.
• The eyes are distant from the light or reflection sources.
• All light sources or reflectors in the environment are visible to both eyes.

To highlight such artifacts and quantify them as a cue to expose GAN-synthesized faces, we develop a method to automatically compare the corneal specular highlights of the two eyes and evaluate their similarity. Figure 4 illustrates the major steps of our analysis for an input image. We first run a face detector to locate the face, followed by a landmark extractor to obtain facial landmarks (Figure 4(b)): important locations on the face, such as the face contour and the tips of the eyes, mouth, nose, and eyebrows, that carry important shape information.
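As a concrete illustration of the Hough circle detection used to locate the corneal limbus (Figure 4(c)), the following toy sketch implements brute-force Hough voting in pure Python on a synthetic edge map: every edge point votes for all candidate circle centers lying at a candidate radius from it, and the most-voted (cx, cy, r) bin wins. The actual analysis uses the scikit-image implementation on real Canny edge maps; the point grid, radius range, and angular step here are arbitrary choices for the example.

```python
import math
from collections import Counter

def hough_circle(edge_points, radii):
    """Brute-force Hough transform for circles: each edge point votes for
    every candidate centre lying at distance r from it."""
    votes = Counter()
    for (x, y) in edge_points:
        for r in radii:
            for theta_deg in range(0, 360, 5):
                t = math.radians(theta_deg)
                cx = round(x - r * math.cos(t))
                cy = round(y - r * math.sin(t))
                votes[(cx, cy, r)] += 1
    # Return the (cx, cy, r) accumulator bin with the most votes.
    return votes.most_common(1)[0][0]

# Synthetic "edge map": points on a circle of radius 9 centred at (20, 15),
# standing in for the Canny edges of a corneal limbus.
pts = [(20 + round(9 * math.cos(math.radians(d))),
        15 + round(9 * math.sin(math.radians(d)))) for d in range(0, 360, 10)]
cx, cy, r = hough_circle(pts, radii=range(7, 12))
print(cx, cy, r)
```

Because the limbus is approximately circular under the portrait setting, a circle parameterization with a single radius per candidate is sufficient; no ellipse fitting is needed.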
The regions corresponding to the two eyes are cropped out using the landmarks. We then extract the corneal limbus, which takes a circular form under the portrait setting. To this end, we first apply a Canny edge detector followed by a Hough transform to find the corneal limbus (Figure 4(c)), and use its intersection with the eye region provided by the landmarks as the corneal region (Figure 4(d)).

We then separate the corneal specular highlights using an adaptive image thresholding method [26]. Because the specular highlights tend to have brighter intensities than the background iris, we keep only the pixel locations above the adaptive threshold (Figure 4(e)). We align the extracted corneal specular highlights of the two eyes (denoted as R_L and R_R) with a translation, and use their IoU score, |R_L ∩ R_R| / |R_L ∪ R_R|, as a similarity metric. The IoU score takes values in [0, 1], with a smaller value suggesting lower similarity of R_L and R_R, and hence that the face is more likely created with a GAN model.
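The translation-and-IoU comparison can be sketched as follows. This is a toy version, not the paper's implementation: it represents each highlight region as a set of integer pixel coordinates and, as one simple choice of alignment translation, matches the centroids of the two sets before computing |R_L ∩ R_R| / |R_L ∪ R_R|.

```python
def align_and_iou(mask_left, mask_right):
    """Translate the right-eye highlight mask so its centroid matches the
    left one, then compute the IoU of the two pixel sets."""
    def centroid(pts):
        n = len(pts)
        return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)
    lx, ly = centroid(mask_left)
    rx, ry = centroid(mask_right)
    dx, dy = round(lx - rx), round(ly - ry)
    shifted = {(x + dx, y + dy) for x, y in mask_right}
    inter = mask_left & shifted
    union = mask_left | shifted
    return len(inter) / len(union)

# Toy example: two identical 3x3 highlight blobs at different image
# positions are perfectly matched after alignment.
blob = {(x, y) for x in range(3) for y in range(3)}
moved = {(x + 10, y + 4) for x, y in blob}
print(align_and_iou(blob, moved))  # 1.0
```

Identical highlight shapes give an IoU of 1 regardless of where they sit in the two corneas, while highlights that differ in count, shape, or relative layout lose overlap after alignment and score lower.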
5. EXPERIMENTS
The images of real human faces are obtained from the Flickr-Faces-HQ (FFHQ) dataset [3], and the GAN-synthesized human faces are from http://thispersondoesnotexist.com, created by the StyleGAN2 method [4]. The images have a resolution of 1,024 × 1,024 pixels. We use the face detector and landmark extractor provided in DLib [27], and the Canny edge detector and Hough transform from scikit-image [28].

Figure 5 shows examples of the analysis results for images of both real and GAN-synthesized human eyes. As described in the previous section, real human eyes captured by a camera under the portrait setting exhibit strong resemblance between

Fig. 5: Corneal specular highlights from real human eyes (top) and GAN-generated human faces (bottom). The right column corresponds to the detected corneal region (blue) and the specular highlights of the two eyes (green and red). The IoU scores of the two corneal specular highlights are shown alongside the detections.
Fig. 6: (a) Distributions of the IoU scores between the detected corneal specular highlights of the two eyes for real and GAN-synthesized faces. (b) The ROC curve based on the IoU scores.

the corneal specular highlights of the two eyes, which is reflected in the higher IoU scores. On the other hand, the corneal specular highlights of the two GAN-synthesized eyes may exhibit various types of inconsistencies, such as different numbers, geometric shapes, or relative locations of the specular highlight regions in the two eyes. These artifacts lead to significantly lower IoU scores. Figure 6(a) shows the distributions of the IoU scores of the two eyes' corneal specular highlights for the real and GAN-generated images we collected. Consistent with the visual examples, there is a clear separation between the distributions, indicating that the consistency of corneal specular highlights is an effective measure for differentiating real and GAN-generated faces. We also show the receiver operating characteristic (ROC) curve in Figure 6(b), which corresponds to an AUC (area under the ROC curve) score of 0.94, indicating that corneal specular highlights are effective for identifying GAN-synthesized faces.
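The AUC of a score-based detector like this one can be computed directly from the two sets of IoU scores via the rank statistic: the probability that a randomly chosen real face receives a higher score than a randomly chosen GAN face (ties counting one half). The scores below are made-up illustrative numbers, not the paper's measurements.

```python
def auc_from_scores(real_scores, fake_scores):
    """AUC = P(score of a random real face > score of a random fake face),
    with ties counted as 1/2. Equivalent to the Mann-Whitney U statistic."""
    wins = 0.0
    for r in real_scores:
        for f in fake_scores:
            if r > f:
                wins += 1.0
            elif r == f:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))

# Hypothetical IoU scores: real faces cluster high, GAN faces low.
real = [0.9, 0.8, 0.85, 0.6]
fake = [0.2, 0.5, 0.65, 0.1]
print(auc_from_scores(real, fake))  # 0.9375
```

An AUC of 1.0 would mean the two score distributions are perfectly separated by a single IoU threshold; 0.5 would mean the scores carry no discriminative information.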
6. DISCUSSION
In this work, we show that GAN-synthesized faces can be exposed through the inconsistent corneal specular highlights between the two eyes. Although inconsistencies of specular patterns can be fixed with manual post-processing, doing so is expected to be non-trivial. Our method has several limitations. We only compare pixel differences, without considering inconsistencies in geometry and scene. Also, when the portrait setting is not obeyed, we may have false positives, e.g., when a light source is very close to the subject or when a peripheral light source is not visible to both eyes. The method does not apply to images where specular patterns are not present. In the future, we will investigate these aspects and further improve the effectiveness of our method.

7. REFERENCES

[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, "Generative adversarial nets," in
NIPS, 2014.
[2] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," arXiv preprint arXiv:1710.10196, 2017.
[3] Tero Karras, Samuli Laine, and Timo Aila, "A style-based generator architecture for generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
[4] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila, "Analyzing and improving the image quality of StyleGAN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110–8119.
[5] "Experts: Spy used AI-generated face to connect with targets."
[6] "A high school student created a fake 2020 US candidate. Twitter verified it."
[7] "How fake faces are being weaponized online."
[8] "These faces are not real," https://graphics.reuters.com/CYBER-DEEPFAKE/ACTIVIST/nmovajgnxpa/index.html.
[9] Xin Yang, Yuezun Li, and Siwei Lyu, "Exposing deep fakes using inconsistent head poses," in ICASSP, 2019.
[10] Francesco Marra, Diego Gragnaniello, Luisa Verdoliva, and Giovanni Poggi, "Do GANs leave artificial fingerprints?," IEEE, 2019, pp. 506–511.
[11] Nicholas Carlini and Hany Farid, "Evading deepfake-image detectors with white- and black-box attacks," IEEE, 2020, pp. 2804–2813.
[12] Xin Yang, Yuezun Li, Honggang Qi, and Siwei Lyu, "Exposing GAN-synthesized faces using landmark locations," in ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec), 2019.
[13] Haodong Li, Bin Li, Shunquan Tan, and Jiwu Huang, "Detection of deep network generated images using disparities in color components," arXiv preprint arXiv:1808.07276, 2018.
[14] Falko Matern, Christian Riess, and Marc Stamminger, "Exploiting visual artifacts to expose deepfakes and face manipulations," IEEE, 2019, pp. 83–92.
[15] Scott McCloskey and Michael Albright, "Detecting GAN-generated imagery using color cues," arXiv preprint arXiv:1812.08247, 2018.
[16] Ning Yu, Larry S. Davis, and Mario Fritz, "Attributing fake images to GANs: Learning and analyzing GAN fingerprints," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 7556–7566.
[17] Xu Zhang, Svebor Karaman, and Shih-Fu Chang, "Detecting and simulating artifacts in GAN fake images," IEEE, 2019, pp. 1–6.
[18] Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz, "Leveraging frequency analysis for deep fake image recognition," arXiv preprint arXiv:2003.08685, 2020.
[19] Ricard Durall, Margret Keuper, and Janis Keuper, "Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7890–7899.
[20] Francesco Marra, Cristiano Saltori, Giulia Boato, and Luisa Verdoliva, "Incremental learning for the detection and classification of GAN-generated images," IEEE, 2019, pp. 1–6.
[21] Michael Goebel, Lakshmanan Nataraj, Tejaswi Nanjundaswamy, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, and B. S. Manjunath, "Detection, attribution and localization of GAN generated images," arXiv preprint arXiv:2007.10466, 2020.
[22] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros, "CNN-generated images are surprisingly easy to spot... for now," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, vol. 7.
[23] Zhengzhe Liu, Xiaojuan Qi, and Philip H. S. Torr, "Global texture enhancement for fake face detection in the wild," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8060–8069.
[24] Nils Hulzebosch, Sarah Ibrahimi, and Marcel Worring, "Detecting CNN-generated facial images in real-world scenarios," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 642–643.
[25] Micah K. Johnson and Hany Farid, "Exposing digital forgeries through specular highlights on the eye," in Information Hiding, Teddy Furon, François Cayre, Gwenaël J. Doërr, and Patrick Bas, Eds., 2008, vol. 4567 of Lecture Notes in Computer Science, pp. 311–325.
[26] Jui-Cheng Yen, Fu-Juay Chang, and Shyang Chang, "A new criterion for automatic multilevel thresholding," IEEE Transactions on Image Processing, vol. 4, no. 3, pp. 370–378, 1995.
[27] Davis E. King, "Dlib-ml: A machine learning toolkit," Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
[28] Stéfan Van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, and Tony Yu, "scikit-image: image processing in Python," PeerJ, vol. 2, p. e453, 2014.

Fig. 7: More analysis examples of real human eyes.