Domain Generalization for Document Authentication against Practical Recapturing Attacks
A Database for Digital Image Forensics of Recaptured Documents

A Preprint
Changsheng Chen∗
College of EIE, Shenzhen University, China 518060
[email protected]

Shuzheng Zhang
College of EIE, Shenzhen University, China 518060

Fengbo Lan
Department of EE & CS, York University, ON, Canada M3J 1P3

January 6, 2021

Abstract
Recapturing attacks on document images have received little research attention. However, such an attack can be employed as a simple but effective anti-forensic tool against digital document images. In this work, we present a high quality captured and recaptured image dataset of representative identity documents to facilitate the study of this important issue. To highlight the risks posed by such attacks, we evaluate some popular off-the-shelf machine learning-based approaches on our database under different experimental protocols. Experimental results reveal the risks of existing document recapture detection algorithms under uncontrolled application scenarios.
Keywords: Document Image · Recapturing Attack
Authentication of hardcopy documents with digitally acquired document images is a forensic research topic of broad interest. Due to the COVID-19 pandemic, we have observed an unprecedented demand for online document authentication in e-commerce and e-government applications. Many important document images have been uploaded to online platforms for various purposes. However, security loopholes in existing authentication schemes put these systems at risk. As shown in Fig. 1 (a), some texts have been added to the identity document (ID) image to prevent illegal usage. However, the image can be tampered with using photo editing software. To cover the editing traces, the edited ID image is reacquired through a print-and-scan cycle. The resulting image in Fig. 1 (b) is therefore more realistic than one edited only in the digital domain. It should be noted that such a recapture attack can also be launched against other important documents, such as business licenses and certificates. Attacks with recaptured images have posed a new threat to document authentication systems. Worse still, with the rapid advancement of deep learning-based techniques, there are recent works on editing characters and words in document images with convolutional neural networks Wu et al. [2019], Roy et al. [2020], Yang et al. [2020] in an end-to-end fashion.

Existing document authentication techniques with digital images have found applications in various fields. These techniques can be divided into active and passive categories. As an example of the active forensic techniques, digital watermarking Cox et al. [2007] can be applied to a certificate as protection against illegal alterations or re-acquisition. However, the active techniques require control over the document generation process, which limits their application to documents from various parties.
In contrast, the passive techniques do not have such a requirement. For instance, a questioned document image can be examined through some inherent characteristics of the printing and acquisition processes Chiang et al. [2009], Mayer and Stamm [2020] for tampering detection. However, the existing passive forensic techniques on digital images have not considered a low-cost and popular attack, i.e., the recapture attack. Under such an attack, the original or tampered image of a given document is printed and re-acquired with an imaging device to generate a recaptured version of the image.

∗ Author Homepage: https://sites.google.com/site/changshengshomepage/home
Figure 1: An example of illegal use of an identity document (ID) image. (a) An authentic ID image with texts to prevent illegal usage. (b) A tampered ID image obtained from (a) with a print-and-scan operation. The printing and scanning devices used in generating (b) are an Epson L805 and a Kyocera M2530dn, respectively.

Figure 2: The block diagram of collecting genuine document images, recaptured document images and forge-and-recapture document images.

It should be noted that the recaptured document image has been through a complete image acquisition chain, and no post-processing (or forgery) is carried out after the acquisition steps. By definition, the image will be considered an original copy by the existing passive tampering detection techniques. To fight against such attacks, image recapturing detection has attracted global research attention. However, most of the existing recapture detection schemes focus on natural images; only a few works consider the recapturing attack on hardcopy documents. Moreover, to the best of our knowledge, there is currently no research on detecting recaptured document images forged with the latest deep learning-based approaches Wu et al. [2019], Roy et al. [2020], Yang et al. [2020].

In this work, we aim at evaluating the difficulties in the problem of recaptured document detection. To investigate this problem, a high quality recaptured document image database is established. The dataset consists of 1104 document images (including 132 captured and 972 recaptured document images) collected with 14 different device combinations. It should be noted that these two parts involve two different sets of devices. To evaluate the performance of existing machine learning-based classifiers, a generic framework for document spoofing detection with image-based features extracted from both deep learning-based and handcrafted descriptors is considered. The effectiveness of this detection framework is evaluated under both intra-dataset and cross-dataset experimental protocols with our database. Experimental results reveal the risks of existing document recapture detection algorithms under uncontrolled application scenarios.
To investigate the problem of document recapturing detection, a high quality database consisting of captured and recaptured document images is needed. First and foremost, the content of the documents should be chosen carefully. Some legal documents (e.g., passports, ID cards, certificates) contain sensitive private information and are not suitable to be shared publicly. Student ID cards from 5 universities are therefore synthesized with CorelDRAW, and serve as the original document images in our experiment.

Our database contains two datasets. Dataset I collects 1104 document images which are captured or recaptured by 14 different combinations of devices. As shown in Fig. 2, the original document is printed by an authorized party to generate the genuine document, which is then scanned/captured to yield the captured document images. To collect the recaptured document images, the captured document images are printed and re-acquired (by scanner or camera). Dataset II follows the same data collection procedure but with a different set of devices. As shown in Table 1, we have employed 4 phones, 3 scanners and 1 printer in collecting Dataset I, while 2 phones (including a high quality camera phone, the Oppo Reno, with a resolution of 48 MP), 2 scanners (including a high-end scanner, the Epson V850, with an optical resolution of 6400 DPI), and 2 printers (including a high-end printer, the Epson L805, with a resolution of 5760 × 1440 DPI) are used in collecting Dataset II.
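A collection following the pipeline in Fig. 2 is straightforward to index for classifier training. The sketch below illustrates one way to pair each image with a binary label; the `captured`/`recaptured` folder layout is a hypothetical convention for illustration, not the actual organization of the database.

```python
from pathlib import Path

def build_index(root):
    """Pair each image with a binary label: 0 = captured, 1 = recaptured.
    The folder names 'captured' and 'recaptured' are hypothetical."""
    index = []
    for label, folder in enumerate(("captured", "recaptured")):
        # Sort for a deterministic ordering across runs.
        for path in sorted(Path(root, folder).glob("*.jpg")):
            index.append((str(path), label))
    return index
```

Such an index can then be split 8:1:1 into training, validation and testing sets, as done in the experiments below.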
Figure 3: The original ID images synthesized with CorelDRAW for our experiment.

To collect a high quality dataset, we follow a few rules of thumb in our experiment:

• Camera phone: set to the highest supported resolution; the captured images are saved in JPEG format with the highest quality factor;
• Illumination: the environmental light is controlled by a lamp to avoid introducing distortions (such as shadowing, geometric distortion, and defocusing);
• Scanner: set to a resolution of 1200 DPI, except the Epson V850, which remains at its default value of 3200 DPI;
• Printer: set to color mode with the finest printing resolution;
• Printing substrate: paper of 120 g/m².

To study the challenges of forge-and-recapture attacks, we construct a generic framework for document spoofing detection with a single image, following the network architecture in Agarwal et al. [2018]. Some popular CNNs, including ResNet 34/50/101/152 He et al. [2016], ResNeXt 50/101 Xie et al. [2017], DenseNet 121/169/201 Huang et al. [2017], VGG 16/19 Simonyan and Zisserman [2014], MobileNet Howard et al. [2017], and Inception V3 Szegedy et al. [2016], are considered in our experiment. These CNNs serve as feature extractors in our recapture detection framework. The models pre-trained on the ImageNet database are adopted and frozen, and only the parameters in the fully connected (FC) layers of our framework are trainable.

Table 1: The devices used for collecting Datasets I and II.

Dataset I — 1st/2nd imaging devices: phones XiaoMi 8, RedMi Note 5, Huawei P9, Apple iPhone 6; scanners Brother DCP-1519, Epson V330, Benq K810. Printer: HP OfficeJet 258.
Dataset II — 1st/2nd imaging devices: phones Apple iPhone 6s, Oppo Reno; scanners Epson V850, HP LaserJet m176n. Printers: HP LaserJet m176n, Epson L805.

Figure 4: The generic framework based on CNNs for document spoofing detection with a single image.

The dimensions of both FC layers are 256. The batch size is set to 128. The learning rate is initially set to 1 × 10⁻⁴ and the number of iterations is 20 epochs. Cross entropy loss and the Adam optimizer are chosen in our implementation. This generic framework is coded with Tensorflow 1.13.1 and Pytorch 1.10, and runs on an NVIDIA 2080Ti GPU.

Referring to the literature on face spoofing detection, there are some prior works Tirunagari et al. [2015], Boulkenafet et al. [2016], Patel et al. [2016] that detect spoofed face images without using depth information. Within these works, the local binary pattern (LBP) Ojala et al. [2002] descriptor has been included as a benchmark feature. To allow a more complete picture of the performance of different features, LBP with an SVM (both linear and RBF kernels) is chosen as a representative machine learning scheme with handcrafted features in our recapture detection framework.

As shown in Table 2, the performance of the generic recapture detection framework is satisfactory in scenarios where training and testing data are sampled from the same subset. For example, a majority of the CNN-based schemes achieve EER = 0.0001 and AUC = 1.0000 in the experiments conducted within M_I, S_I and D_I, respectively. It should also be noted that the LBP-based classifiers perform less accurately than the CNN-based approaches.

However, the recapture detection performance degrades significantly when the training and testing data are inhomogeneous. Such degradation is shown under the M_I → S_I and S_I → M_I experimental conditions in Table 2, as well as the D_I → D_II and D_II → D_I experimental conditions in Table 3.

References
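The frozen-backbone design of Fig. 4 can be sketched in PyTorch. The small convolutional backbone below is a toy stand-in for the ImageNet-pretrained CNNs (ResNet, DenseNet, etc.) actually benchmarked, and all layer sizes except the two 256-dimensional FC layers are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RecaptureDetector(nn.Module):
    """Generic recapture detector: frozen feature extractor + trainable FC head."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Toy backbone standing in for an ImageNet-pretrained CNN.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone frozen; only the FC head is trained
        # Two fully connected layers of dimension 256, then a binary output.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 2),  # captured vs. recaptured
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x).flatten(1)
        return self.head(feats)

# Loss and optimizer matching the text: cross entropy and Adam,
# updating only the trainable (FC) parameters.
model = RecaptureDetector()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```

Swapping in a real pretrained backbone only changes the `self.backbone` definition and `feat_dim`; the freeze-then-train-the-head pattern stays the same.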
Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, and Xiang Bai. Editing text in the wild. In Proceedings of the 27th ACM International Conference on Multimedia, pages 1500–1508, 2019.

Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, and Umapada Pal. STEFANN: Scene text editor using font adaptive neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
Table 2: Experimental results in Dataset I (D_I). M_I and S_I stand for the subsets of samples captured (in the last imaging process) by mobile phones and scanners, respectively. The notation 8:1:1 means that the samples (in M_I, S_I or D_I) are divided into 80%, 10% and 10% for the training, validation and testing sets, respectively.

Methods (EER / AUC under the conditions M_I (8:1:1), S_I (8:1:1), M_I → S_I, S_I → M_I, D_I (8:1:1)):
LBP+SVM (Linear): 0.2143 / 0.8782
DenseNet169
DenseNet201
MobileNet: 0.0293 / 0.9974
ResNeXt50
ResNet152
ResNet101
ResNet50
ResNet34
VGG16: 0.0294 / 0.9991
Table 3: Cross dataset evaluation in D_I and D_II.

Methods            D_I → D_II (EER / AUC)   D_II → D_I (EER / AUC)
LBP+SVM (Linear)   0.3333 / 0.7934          0.3939 / 0.6157
LBP+SVM (RBF)      0.3157 / 0.7509          0.4149 / 0.6615
DenseNet121        0.1250 / 0.9378          0.0561 / 0.9844
DenseNet169        0.1536 / 0.9130          0.0714 / 0.9822
DenseNet201        0.2031 / 0.9024          0.1139 / 0.9538
MobileNet          0.2500 / 0.7953          0.3809 / 0.6732
ResNeXt101
ResNet101          0.2318 / 0.8550          0.0527 / 0.9905
ResNet50           0.1172 / 0.9463          0.0867 / 0.9740
ResNet34           0.1666 / 0.8831          0.1190 / 0.9532
VGG16              0.2499 / 0.8227          0.2772 / 0.7627
VGG19              0.2499 / 0.8914          0.3793 / 0.6270
InceptionV3        0.1979 / 0.8914          0.1246 / 0.9398
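The EER and AUC figures in Tables 2 and 3 can be computed from raw classifier scores. The NumPy sketch below uses a simple threshold sweep for the EER and the rank-sum (Mann-Whitney) statistic for the AUC; it is a minimal illustration, not the exact evaluation code behind the tables.

```python
import numpy as np

def compute_eer(labels, scores):
    """Equal error rate: sweep thresholds and return the operating point where
    the false accept rate (FAR) and false reject rate (FRR) are closest."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)  # negatives accepted
        frr = np.mean(scores[labels == 1] < t)   # positives rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

def compute_auc(labels, scores):
    """Area under the ROC curve via the rank-sum statistic (ties not handled)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    ranks = scores.argsort().argsort() + 1  # 1-based ranks of each score
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A perfectly separating classifier yields EER = 0 and AUC = 1; the cross-dataset rows of Table 3 sit far from that ideal, which is the degradation discussed in the text.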
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14700–14709, 2020.

Ingemar Cox, Matthew Miller, Jeffrey Bloom, Jessica Fridrich, and Ton Kalker. Digital watermarking and steganography. Morgan Kaufmann, 2007.

Pei-Ju Chiang, Nitin Khanna, Aravind K Mikkilineni, Maria V Ortiz Segovia, Sungjoo Suh, Jan P Allebach, George T-C Chiu, and Edward J Delp. Printer and scanner forensics. IEEE Signal Processing Magazine, 26(2):72–83, 2009.

O. Mayer and M. C. Stamm. Forensic similarity for digital images. IEEE Transactions on Information Forensics and Security, 15:1331–1346, 2020.

Shruti Agarwal, Wei Fan, and Hany Farid. A diverse large-scale dataset for evaluating rebroadcast attacks. Pages 1997–2001. IEEE, 2018.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1492–1500, 2017.

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.

Santosh Tirunagari, Norman Poh, David Windridge, Aamo Iorliam, Nik Suki, and Anthony TS Ho. Detection of face spoofing using visual dynamics. IEEE Transactions on Information Forensics and Security, 10(4):762–777, 2015.

Zinelabidine Boulkenafet, Jukka Komulainen, and Abdenour Hadid. Face spoofing detection using colour texture analysis. IEEE Transactions on Information Forensics and Security, 11(8):1818–1830, 2016.

Keyurkumar Patel, Hu Han, and Anil K Jain. Secure face unlock: Spoof detection on smartphones. IEEE Transactions on Information Forensics and Security, 11(10):2268–2283, 2016.

Timo Ojala, Matti Pietikainen, and Topi Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns.