Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks
Luiz G. Hafemann, Robert Sabourin
Lab. d'imagerie, de vision et d'intelligence artificielle
École de technologie supérieure, Université du Québec, Montreal
[email protected], [email protected]
Luiz S. Oliveira
Department of Informatics
Federal University of Parana
Curitiba, PR
[email protected]
Accepted as a conference paper for IJCNN 2016
Abstract—Automatic Offline Handwritten Signature Verification has been researched over the last few decades from several perspectives, using insights from graphology, computer vision, signal processing, among others. In spite of the advancements in the field, building classifiers that can separate between genuine signatures and skilled forgeries (forgeries made targeting a particular signature) is still hard. We propose approaching the problem from a feature learning perspective. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, instead of using hand-crafted features that have no resemblance to the signature generation process. To this end, we use Deep Convolutional Neural Networks to learn features in a writer-independent format, and use this model to obtain a feature representation on another set of users, where we train writer-dependent classifiers. We tested our method on two datasets: GPDS-960 and Brazilian PUC-PR. Our experimental results show that the features learned on a subset of the users are discriminative for the other users, including across different datasets, reaching close to the state-of-the-art on the GPDS dataset, and improving the state-of-the-art on the Brazilian PUC-PR dataset.
I. INTRODUCTION
Biometrics technology is used in a wide variety of security applications. The aim of such systems is to recognize a person based on physiological traits (e.g. fingerprint, iris) or behavioral traits (e.g. voice, handwritten signature) [1]. The handwritten signature is a particularly important type of biometric trait, mostly due to its widespread use to verify a person's identity in legal, financial and administrative areas. One of the reasons for its extensive use is that the process to collect handwritten signatures is non-invasive, and people are familiar with their use in daily life [2].

Research in signature verification is divided between online (dynamic) and offline (static) scenarios. In the online case, the signature is captured using a special input device (such as a tablet), and the dynamic information of the signing process is captured (the pen's position, inclination, among others). In this work, we focus on the offline (static) signature verification problem, where the signature is acquired after the writing process is completed, by scanning the document containing the signature. In this case, the signature is represented as a digital image.

Most of the research effort in this area has been devoted to obtaining a good feature representation for signatures, that is, designing good feature extractors. To this end, researchers have used insights from graphology, computer vision, signal processing, among other areas [3]. As with several problems in computer vision, it is often hard to design good feature extractors, and the choice of which feature descriptors to use is problem-dependent. Ideally, the features should reflect the process used to generate the data - for instance, neuromotor models of the hand movement.
Although this approach has been explored in the context of online signature verification [4], there is no widely accepted "best" way to model the problem, especially for offline (static) signature verification, where the dynamic information of the signature generation process is not available.

In spite of the advancements in the field, systems proposed in the literature still struggle to distinguish genuine signatures from skilled forgeries. These are forgeries made by a person with access to a user's signature, who practices imitating it (see Figure 1). Experimental results show somewhat large error rates when testing on public datasets (such as GPDS [5]), even when the number of samples for training is around 10-15 (results are worse with 1-3 samples per user, which is a common scenario in banks and other institutions).

In this work we propose using feature learning (also called representation learning) for the problem of offline signature verification, in order to obtain better feature representations. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, rather than using hand-crafted features that have no resemblance to how the signatures are created, which is the case for the best performing systems proposed in the literature. For example, recent offline signature verification systems are based on texture descriptors, such as Local Binary Patterns [6], and interest-point matching, such as SURF [7], among others. We base our research on recent successful applications of purely supervised learning models for computer vision (such as image recognition [8]).

Figure 1. Samples from the GPDS-960 dataset. Each row contains three genuine signatures from the same user and a skilled forgery. We notice that each genuine signature is different (showing high intra-class variability), while skilled forgeries resemble the genuine signatures to a large extent (showing low inter-class variability).
In particular, we use Deep Convolutional Neural Networks (CNN) trained with a supervised criterion, in order to learn good representations for the signature verification problem. This type of architecture is interesting for our problem, since it scales better than fully-connected models for larger input sizes, having a smaller number of trainable parameters. This is a desirable property for the problem at hand, since we cannot rescale signature images too much without risking losing the details that enable discriminating between skilled forgeries and genuine signatures.

The most common formulation of the signature verification problem is called Writer-Dependent classification. In this formulation, one classifier is built for each user in the system. Using a supervised feature learning approach directly in this case is not practical, since the number of samples per user is very small (usually around 1-14 samples). Instead, we propose a two-phase approach: a Writer-Independent feature learning phase followed by Writer-Dependent classification. The feature learning phase uses a surrogate classification task for learning feature representations, where we train a CNN to discriminate between signatures from users not enrolled in the system. We then use this CNN as a feature extractor and train a Writer-Dependent classifier for each user. Note that in this formulation, adding a new user to the system requires training only a Writer-Dependent classifier.

We tested this method using two datasets: the GPDS-960 corpus [5] and the Brazilian PUC-PR dataset [9]. The first is the largest publicly available corpus for offline signature verification, while the second is a smaller dataset that has been used for several studies in the area.

Our main contributions are the following: We propose a two-stage framework for offline signature verification, where we learn features in a Writer-Independent way, and build Writer-Dependent classifiers.
Our results show that we have enough data in signature datasets to learn relevant features for the task, and the proposed method achieves state-of-the-art performance. We also investigate how the features learned in one dataset transfer to another dataset, and the impact on performance of the number of samples available for WD training.

II. RELATED WORK
Feature learning methods have not yet been broadly researched for the task of offline signature verification. Murshed et al. [10], [11] used autoencoders (called Identity-Mapping Backpropagation in their work) to perform dimensionality reduction, followed by a Fuzzy ARTMAP classifier. This work, however, considered only a single hidden layer, with fewer units than the input. In contrast, in recent successful applications of autoencoders, multiple layers of representations are learned, often in an over-complete format (more hidden units than visible units), where the idea is not to reduce dimensionality, but to "disentangle" the factors of variation in the inputs [12]. Ribeiro et al. [13] used unsupervised learning for learning representations, in particular Restricted Boltzmann Machines (RBMs). In this work, the authors tested with a small subset of users (10 users), and only reported a visual representation of the learned weights, not the results of using such features to discriminate between genuine signatures and forgeries. Khalajzadeh [14] used Convolutional Neural Networks (CNNs) for Persian signature verification, but did not consider skilled forgeries.

A similar strategy to our work was used by Sun et al. [15] for the task of face verification. They trained CNNs on a large dataset of faces and used these networks to extract features on another face dataset. In their work, the verification process consisted in distinguishing between faces from different users. In signature verification, distinguishing between different writers is one of the objectives (when we consider "random forgeries"), but the main challenge is to distinguish between genuine signatures and skilled forgeries. In this work we evaluate the method for both types of forgery.

The framework we propose is also similar to previous work by Eskander et al. [16], where a Writer-Independent set is used for feature selection, and a Writer-Dependent set is used for training and evaluation.
However, in that work the authors used hand-crafted feature extractors, while in the present work we use the Writer-Independent set for feature learning, instead of feature selection.

III. PROPOSED METHOD
We propose a two-stage approach, consisting of a writer-independent feature learning phase followed by writer-dependent classification. We start by partitioning the dataset into two distinct sets: a development set D and an exploitation set E. The set D is used to learn the feature representation for signatures; we consider this a separate dataset from the enrolled users. The exploitation set E contains the users enrolled in the system, and is used to train Writer-Dependent classifiers (using only genuine signatures) and to evaluate the performance of the system.

The proposed system is illustrated in Figure 2. We first use the set D to learn the feature representations.
Figure 2. The proposed architecture for writer-independent feature learning and writer-dependent classification.
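The two-stage pipeline of Figure 2 (feature learning on D, then one binary classifier per user in E) can be sketched as follows. This is a minimal stand-in, not the paper's implementation: the learned CNN feature extractor φ is replaced by a fixed random projection, and the per-user SVM by a nearest-centroid rule; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned feature extractor phi(.): a fixed
# random projection from pixel space (20x20 = 400 pixels, flattened) to an
# m = 64 dimensional feature space. In the paper, phi(X) is the activation
# of the last layer before the softmax of the CNN trained on the set D.
W_proj = rng.normal(size=(400, 64))

def phi(x):
    """Map flattened signature images to the feature space."""
    return x @ W_proj / np.sqrt(400)

# Writer-dependent phase: one binary classifier per enrolled user.
# Stand-in classifier: nearest centroid (the paper uses linear/RBF SVMs).
def train_wd(genuine_feats, forgery_feats):
    return genuine_feats.mean(axis=0), forgery_feats.mean(axis=0)

def decide(model, feat):
    c_pos, c_neg = model
    # accept (True) if closer to the genuine centroid than to the forgery one
    return np.linalg.norm(feat - c_pos) < np.linalg.norm(feat - c_neg)

# Toy "images" for one user: genuine signatures cluster around +1,
# random forgeries (signatures from other users) around -1.
genuine = rng.normal(loc=1.0, size=(5, 400))
forgeries = rng.normal(loc=-1.0, size=(20, 400))

model = train_wd(phi(genuine), phi(forgeries))
accepted = decide(model, phi(rng.normal(loc=1.0, size=400)))
rejected = decide(model, phi(rng.normal(loc=-1.0, size=400)))
```

Note that, as in the paper, only genuine signatures (of the user and of other users) are used for training; skilled forgeries appear only at test time.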
We train a Deep Convolutional Neural Network on D (detailed in the next section). The result is a function φ(·), learned from data, that projects the input images X to another feature space: φ(X) ∈ R^m, where m is the dimensionality of the projected feature space. Our expectation is that the features learned using D will be useful to separate genuine signatures and forgeries from other users.

After the CNN is trained, we create a training dataset for each user in set E, using a subset of the user's genuine signatures and random forgeries. We use the CNN as a feature extractor, obtaining a feature vector φ(X) for each signature X in the user's dataset. This new representation is then used to train a binary classifier f. For a new sample X_new, we first use the CNN to "extract features" (i.e. obtain the feature vector φ(X_new)) and feed this feature vector to the binary classifier, obtaining a final decision f(φ(X_new)). The next sections detail the WI and WD training procedures.

A. Pre-processing
For all signatures from both datasets (D and E), we apply the same pre-processing strategy. The signatures from the GPDS dataset have variable sizes, ranging from 153 x 258 pixels to 819 x 1137 pixels. Since training a neural network requires all inputs to have the same size, we need to normalize the signature images. We evaluated two approaches.

In the simplest approach, we resized the images to a fixed size, using bi-linear interpolation. We perform rescaling without deformations, that is, when the original image had a different width-to-height ratio, we cropped the excess in the larger dimension.

The second approach consisted in first normalizing the images to the largest image size, by padding the images with white background. In this case, we centered the signatures in a canvas of size 840 x 1360 pixels, aligning the center of mass of the signature to the center of the image, similar to previous approaches in the literature, e.g. [17]. We then rescaled the images to the desired input size of the neural network.

With the first approach, less fine-grained information is lost during the rescaling, especially for users that have small signatures. On the other hand, the width of the pen strokes becomes inconsistent: for the smaller signatures the pen strokes become much thicker than the pen strokes from the larger signatures.

Besides resizing the images to a standard size, we also performed the following pre-processing steps:
• Removed the background: we used Otsu's algorithm [18] to find the optimum threshold between foreground and background pixel intensities. Pixels with intensity larger than the threshold were set to white (intensity 255). The signature pixels (with intensity less than the threshold) remain unchanged in this step.
• Inverted the images: we inverted the images so that the white background corresponds to pixel intensity 0. That is, each pixel of the image is calculated as: I_inverted(i, j) = 255 − I(i, j).
• Normalized the input: we normalized the input to the neural network by dividing each pixel by the standard deviation of all pixel intensities (from all images in D). We do not normalize the data to have mean 0 (another common pre-processing step), since we want the background pixels to be zero-valued.

B. Writer-Independent feature learning
For learning the representation for signatures, we used Deep Convolutional Neural Networks. We note that directly modeling the problem of interest is not feasible in practice: our ultimate goal is to separate genuine signatures from skilled forgeries of the users enrolled in the system, but in a realistic scenario we only have genuine signatures, provided during an enrollment phase, and do not have forgeries for these users. Therefore, we need to consider a surrogate classification objective. In this work, we use a separate set of users (the development set D) to learn the features, by learning a classification task that considers each user in D as a different class. The objective function is to minimize a cross-entropy classification loss. The expectation is that, by learning to distinguish between signatures from different users in this dataset, the network will learn features that are relevant for our problem of interest: separating genuine signatures and forgeries from the exploitation set E.

We used a CNN architecture similar to the one defined by Krizhevsky et al. [8] for an image recognition problem. Initial tests showed that the capacity of this network seems to be too large for the problem at hand, particularly considering the fully-connected layers (which contain most of the weights in the network). We obtained better results with 2 fully-connected layers after the convolutions, instead of the three layers from the original model. For the purpose of replicating our experiment, we provide a full list of the parameters used in our tests. Table I lists the definition of the CNN layers. For convolution and pooling layers, we list the size as N x H x W, where N is the number of filters, H is the height and W is the width of the convolution and pooling windows, respectively. Stride refers to the distance between applications of the convolution (or pooling) operation, and pad refers to adding padding (borders) to the input, with value 0.
Local Response Normalization is applied according to [8], with the parameters listed in the table. For the first two fully-connected layers we use dropout [19], with rate 0.5. We use Rectified Linear Units (ReLUs) as the activation function for all convolutional and fully-connected layers, except the last one. The last layer uses a softmax activation and has N neurons, where N is the number of users in the set D, indicating the probability of the sample belonging to each of the users.

We initialize the weights of the model according to the work of Glorot and Bengio [20], and the biases to 0. We trained the model with Nesterov Momentum, using a momentum rate of 0.9 and mini-batches of size 100. We started with a learning rate of 0.01, and divided it by 10 twice (after 20 epochs, and after 40 epochs). We used L2 regularization with a weight decay factor of 0.0005. These values are consolidated in Table II. The networks were trained using the libraries Theano [21] and Lasagne [22], and took around 5h to train on a GPU Tesla C2050.
Table I
SUMMARY OF THE CNN LAYERS

Layer                      Size       Other Parameters
Convolution                96x11x11   stride = 4, pad = 0
Local Response Norm.       -          α = 10^−4, β = 0.75, k = 2, n = 5
Pooling                    96x3x3     stride = 2
Convolution                256x5x5    stride = 1, pad = 2
Local Response Norm.       -          α = 10^−4, β = 0.75, k = 2, n = 5
Pooling                    256x3x3    stride = 2
Convolution                384x3x3    stride = 1, pad = 1
Convolution                256x3x3    stride = 1, pad = 1
Pooling                    256x3x3    stride = 2
Fully Connected + Dropout  4096       p = 0.5
Fully Connected + Softmax  N
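The feature-map sizes implied by Table I can be checked with standard convolution/pooling arithmetic (floor division), assuming the 155 x 220 input size used in the experimental protocol. The intermediate sizes below are derived from these formulas, not quoted from the paper:

```python
# Feature-map size bookkeeping for the layers in Table I (H x W, 1 channel).
def conv_out(size, kernel, stride, pad):
    """Output size of a convolution/pooling layer: floor((n + 2p - k)/s) + 1."""
    h, w = size
    kh, kw = kernel
    return ((h + 2 * pad - kh) // stride + 1,
            (w + 2 * pad - kw) // stride + 1)

size = (155, 220)
size = conv_out(size, (11, 11), stride=4, pad=0)   # Conv 96x11x11
size = conv_out(size, (3, 3),  stride=2, pad=0)    # Pool 96x3x3
size = conv_out(size, (5, 5),  stride=1, pad=2)    # Conv 256x5x5
size = conv_out(size, (3, 3),  stride=2, pad=0)    # Pool 256x3x3
size = conv_out(size, (3, 3),  stride=1, pad=1)    # Conv 384x3x3
size = conv_out(size, (3, 3),  stride=1, pad=1)    # Conv 256x3x3
size = conv_out(size, (3, 3),  stride=2, pad=0)    # Pool 256x3x3
flattened = 256 * size[0] * size[1]                # input to the 4096-unit layer
```

Under these assumptions, the last pooling layer outputs 256 maps of size 3 x 5, so the first fully-connected layer sees a 3840-dimensional input.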
Table II
TRAINING HYPERPARAMETERS

Parameter                   Value
Initial Learning Rate (LR)  0.01
Learning Rate schedule      LR ← LR × 0.1 (every 20 epochs)
Weight Decay                0.0005
Momentum                    0.9
Batch size                  100

C. Writer-dependent classification
After the CNN is trained on the set D, we use it to extract features for the Writer-Dependent training. Similar to previous work in transfer learning [23], [24], we use the representation obtained by performing forward propagation of an input image until the last layer before the softmax. In the notation defined above, we consider our feature extractor function φ(X) to be the representation of the network at the last layer before the softmax, after forward-propagating the input X. As noted in Table I, this representation has 4096 dimensions (φ(X) ∈ R^4096). The hypothesis is that the features learned on the set D, during the CNN training, will be relevant for signatures from other users (from the exploitation set).

For training the Writer-Dependent classifiers, no skilled forgeries are used during training or validation, to simulate the scenario of a real application. Following previous work on Writer-Dependent classification, we create a dataset for each user, consisting of genuine signatures and random forgeries (using signatures from other users, from D).

For each user in E, we build a Writer-Dependent training and testing set. The training set is composed of a subset of genuine signatures from the user (as the positive examples), as well as genuine signatures from other users from the development dataset (as the negative examples). The testing set consists of genuine signatures from the user (not used for training), and the skilled forgeries made for the user. With this dataset, we first use the CNN to extract the features for each signature image (that is, compute φ(X) for each signature X). We then train a standard two-class classifier f for each user.

For the WD classification, we test both linear SVMs and SVMs with the RBF kernel [25]. For the linear SVM, we
used the hyperparameter C = 1, while for the SVM with RBF kernel we optimize the parameters C and γ with a subset of users from the set D, using a grid search. We select the hyperparameters that best classify genuine and skilled forgeries from these users.

During generalization, for a new signature X_new, we first use the CNN to obtain the representation of the signature (i.e. calculate φ(X_new)), and then feed this representation to the classifier to obtain a final decision on the sample, f(φ(X_new)).

Figure 3. The separation of the GPDS-960 dataset into Development set D and Exploitation set E.

IV. EXPERIMENTAL PROTOCOL
Feature learning for complex tasks has been shown to work better with large datasets. The largest publicly available signature dataset is GPDS-960 [5], and therefore it is particularly suitable for our proposed method. This dataset contains 24 genuine signatures and 30 forgeries per user, from 881 users, which were captured in a single session [5]. We also tested with a smaller dataset that has also been extensively used for offline signature verification: the Brazilian PUC-PR dataset [9]. This dataset contains signatures from 168 users, with forgeries for the first 60 users.

The first step is to split the datasets into a development set D and an exploitation set E. For GPDS, in order to allow comparison with previous work, we tested with the set E consisting of the first 160 users, and of the first 300 users (previously published as GPDS-160 and GPDS-300). Figure 3 shows how the dataset is split. The remaining users are used for the writer-independent feature learning phase. For the Brazilian set, we consider the first 60 users as the set E, and the remaining 108 users as the set D.

After splitting the dataset into sets D and E, we preprocess the signature images to a standard size of 155 x 220 pixels, considering the two preprocessing options listed in the previous section. This size was chosen to be large enough to keep details from the pen strokes in the signatures, while still small enough to enable training on the GPU. We use the set D to train a CNN that learns to classify input signatures to the different users in this set.

To assess whether the learned features generalize to other datasets, we used the CNN trained on the GPDS dataset to extract features for the Brazilian dataset.
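The per-image pre-processing of Section III-A (background removal with Otsu's method, inversion, and division by the dataset standard deviation) can be sketched as below. Resizing/canvas normalization is omitted for brevity, and the value of the dataset-wide standard deviation is a placeholder:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method [18]: pick the threshold maximizing between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0 = cum[t] / total                   # background class weight
        w1 = 1.0 - w0                         # foreground class weight
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t] / cum[t]
        mu1 = (cum_mean[-1] - cum_mean[t]) / (total - cum[t])
        var = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def preprocess(img, std):
    """Background removal, inversion and scaling, as in Section III-A."""
    t = otsu_threshold(img)
    img = img.copy()
    img[img > t] = 255        # background -> white; signature pixels unchanged
    img = 255 - img           # invert: background becomes 0
    return img / std          # divide by the dataset-wide pixel std (placeholder)

# toy "signature": dark strokes on a light background
img = np.full((155, 220), 230, dtype=np.uint8)
img[40:50, 30:180] = 20
out = preprocess(img, std=64.0)
```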
This cross-dataset experiment serves two purposes: to analyze whether the learned features generalize to other datasets, and to evaluate whether we can obtain better performance on the Brazilian set (which is smaller) by leveraging data from a larger dataset (GPDS).

For the Writer-Dependent training, we have slightly different protocols for GPDS and the Brazilian dataset, to correspond to protocols used in other work on these datasets. For GPDS, we selected up to 14 genuine signatures as positive samples (from E), and 14 genuine signatures from each user in the set D as negative samples. For testing, we selected 10 genuine signatures from the user, ensuring they were not used for training, and all 30 skilled forgeries. For the Brazilian dataset, we selected up to 30 genuine samples as positive samples (from E), and 30 genuine samples from the users in set D as negative samples. For testing, we selected 10 genuine signatures from the user, 10 signatures from other users in E (i.e. not used for training) as random forgeries, and all 10 simple forgeries and 10 skilled forgeries available for each user.

To evaluate the impact of different numbers of sample signatures per user, we trained the WD classifiers using a variable number of signatures from the enrolled users. This set-up is summarized in Table III.

For optimizing the hyperparameters for the SVM training (for the WD classifiers), we performed a grid search on the parameters C and γ. We used 10 users from D, building WD classifiers with the same protocol as above. We selected the hyperparameters that performed best in separating genuine signatures and skilled forgeries for these 10 users, by measuring the classification error of each classifier. Before training the SVM models, we rescale the inputs to have unit standard deviation (in each dimension). This slightly improved performance and significantly decreased the SVM training time.
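The unit-standard-deviation rescaling applied before SVM training can be sketched as follows. The data here is a toy stand-in (3 dimensions instead of the 4096-dimensional CNN features); the important point is that the scaling statistics are estimated on the training set only and reused at test time:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy features with very different per-dimension scales, standing in for
# the 4096-d CNN features of one user's WD training and test sets
train_feats = rng.normal(scale=[1.0, 10.0, 0.1], size=(100, 3))
test_feats = rng.normal(scale=[1.0, 10.0, 0.1], size=(10, 3))

# per-dimension standard deviation, estimated on the training set only
stds = train_feats.std(axis=0)
stds[stds == 0] = 1.0               # guard against constant feature dimensions
train_scaled = train_feats / stds
test_scaled = test_feats / stds     # reuse the training-set statistics
```

Note that the mean is deliberately not subtracted, mirroring the pre-processing choice of keeping background (zero-valued) inputs at zero.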
Similar to [16], in order to have a balanced dataset for training, we duplicated the genuine examples in the training set to match the number of random forgeries (equivalent to using a different C for the positive and negative classes).

In this work we conducted experiments with two datasets, and authors from different studies have reported different metrics. For GPDS, some authors report two metrics: False Rejection Rate (FRR) and False Acceptance Rate for skilled forgeries (FAR_skilled). The first metric is the fraction of genuine signatures that were classified as forgery, while the second is the fraction of skilled forgeries that were classified as genuine signatures. Other authors report simply the Equal Error Rate (EER), which is the point on a ROC curve where FAR and FRR are equal. For the results on GPDS, we report these three metrics, and also the mean of the Area Under the Curve (AUC) - that is, we build a ROC curve for each user, and report the average of the AUC. For calculating the EER, we considered the ROC curves created for each user (thresholds specific to each user).

For the Brazilian PUC-PR dataset, authors commonly report FRR and FAR for three types of forgeries: Random, Simple and Skilled. Authors also report an average error rate (AER), which is the average of the four types of error (FRR, FAR_random, FAR_simple, FAR_skilled). To allow comparison with the results on GPDS, we also report metrics considering only

Table III
TRAINING AND TESTING SET-UP

Dataset              Training set                                    Testing set
                     genuine               forgeries (random)        genuine    forgeries
Brazilian (PUC-PR)   1, 5, 10, 15, 30      108 x 30 = 3240 samples   10         10 random, 10 simple, 10 skilled
GPDS-160             4, 8, 12, 14          721 x 14 = 10094 samples  10         30 skilled
GPDS-300             4, 8, 12, 14          581 x 14 = 8134 samples   10         30 skilled
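The per-user EER computation described above can be sketched as follows. This is a stand-in implementation (the paper computes the EER from per-user ROC curves; the sweep below simply returns the operating point where FAR and FRR are closest), with toy scores in place of real classifier outputs:

```python
import numpy as np

def eer_user_threshold(genuine_scores, forgery_scores):
    """Equal Error Rate with a user-specific threshold: sweep candidate
    thresholds and return the error at the point where FAR and FRR are
    closest (scores: higher means more likely genuine)."""
    thresholds = np.sort(np.concatenate([genuine_scores, forgery_scores]))
    best = (1.0, 1.0)
    best_gap = np.inf
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine signatures rejected
        far = np.mean(forgery_scores >= t)   # forgeries accepted
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            best = (far, frr)
    return (best[0] + best[1]) / 2

# toy per-user scores standing in for SVM decision values
rng = np.random.default_rng(2)
genuine = rng.normal(1.0, 1.0, size=10)      # 10 genuine test signatures
forgeries = rng.normal(-1.0, 1.0, size=30)   # 30 skilled forgeries
eer = eer_user_threshold(genuine, forgeries)
```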
Table IV
CLASSIFICATION ERRORS ON GPDS (%) AND MEAN AUC

Dataset   Features       Classifier    FRR    FAR    EER    Mean AUC
GPDS-160  CNN_GPDS       SVM (Linear)  26.62  9.65   14.35  0.9153
GPDS-160  CNN_GPDS       SVM (RBF)     37.25  3.66   14.64  0.9097
GPDS-160  CNN_GPDS_norm  SVM (Linear)  11.12  16.77  11.32  0.9381
GPDS-160  CNN_GPDS_norm  SVM (RBF)     19.81  5.99   10.70  0.9459
GPDS-300  CNN_GPDS       SVM (Linear)  25.43  12.80  16.40  0.8968
GPDS-300  CNN_GPDS       SVM (RBF)     36.27  5.00   16.22  0.9014
GPDS-300  CNN_GPDS_norm  SVM (Linear)  11.93  25.58  16.07  0.8957
GPDS-300  CNN_GPDS_norm  SVM (RBF)     20.60  9.08   12.83  0.9257
FRR and FAR_skilled: AER_genuine+skilled, EER_genuine+skilled and Mean AUC_genuine+skilled.

V. RESULTS AND DISCUSSION
We first report the results of the search for the best hyperparameters for the SVM with RBF kernel used for Writer-Dependent classification. After training classifiers for 10 users in the development set, we noticed that the best hyperparameters were the same for most users (8/10 users): γ = 2^−11, C = 1. For the other two users, this was the second-best configuration of the parameters. Therefore, we used these hyperparameters for the subsequent experiments.

Table IV presents the results of our experiments with the GPDS dataset. The column "Features" lists the method we used to extract features - in our work, this column lists the CNN trained on the set D. We considered both alternatives defined in the Pre-processing section - simply resizing the signature images (CNN_GPDS), and first normalizing the signatures in a canvas with a standard size before resizing them (CNN_GPDS_norm). We notice that this normalization was essential to obtain good classification results on this dataset, with a boost in performance from 14.64% EER to 10.70% on the GPDS-160 dataset. We also noticed that the best results were achieved with the SVM trained with an RBF kernel. Lastly, we noted a drop in performance between the experiments with GPDS-160 and GPDS-300. This can be partially explained by the fact that we use more data in the set D for GPDS-160.

Table V shows the results of our tests with the Brazilian PUC-PR dataset. We noticed the same characteristics as with the GPDS test, with improved results with the non-linear RBF kernel for the classifier. On this dataset we tested both a CNN trained on the Brazilian dataset and the CNN trained above on the GPDS dataset. The results were similar, suggesting that the features learned in one dataset generalize well to other datasets. On the other hand, we expected the

Figure 4. Performance on the GPDS-160 dataset varying the number of samples per user for WD training. The error bars show the smallest and largest AUC of users in the exploitation dataset.
performance with the CNN trained on GPDS to be better, since the development set for the Brazilian dataset is much smaller (108 users in the Brazilian dataset vs. 721 users for GPDS-160), and therefore there is much more data in GPDS to learn a good feature representation.

We evaluated the performance of the system considering different numbers of samples per user in the exploitation set. For these tests, we used the configuration that performed best in the tests above: using the normalized GPDS development set to learn the features, and using an SVM with RBF kernel for training the WD classifiers. Figures 4 and 5 present the evolution of the AUC and the Equal Error Rate for the GPDS and Brazilian datasets. We notice that even with a small number of samples the performance is reasonable, achieving 15.05% EER with 4 signatures on the GPDS dataset, and 9.83% EER with 5 signatures on the Brazilian dataset. However, we notice that in the extreme case, when a single signature is available, the performance of the entire system is much worse (around 17% EER), and some users have very poor performance (for one user, the AUC is below 0.5).

We compare our results with the state-of-the-art in Tables VI and VII. For GPDS, the method achieves state-of-the-art performance in terms of Equal Error Rate when compared with systems that use a single feature extractor. However, the performance is worse compared to systems where multiple feature extractors / classifiers are used. Future work can analyze whether the features learned from data are complementary to hand-crafted features.

For the Brazilian PUC-PR dataset, authors use other metrics
Table V
CLASSIFICATION ERRORS ON THE BRAZILIAN PUC-PR DATASET (%) AND MEAN AUC

Features       Classifier    FRR   FAR_random  FAR_simple  FAR_skilled  EER_genuine+skilled  Mean AUC_genuine+skilled
CNN_Brazilian  SVM (Linear)  1.00  0.00        1.67        27.17        7.33                 0.9668
CNN_Brazilian  SVM (RBF)     2.83  0.17        0.17        14.17        4.17                 0.9837
CNN_GPDS       SVM (Linear)  1.83  0.00        1.33        27.83        11.50                0.9413
CNN_GPDS       SVM (RBF)     6.50  0.17        1.17        15.17        8.50                 0.9601
CNN_GPDS_norm  SVM (Linear)  0.17  0.00        1.67        29.00        6.67                 0.9653
CNN_GPDS_norm  SVM (RBF)     2.17  0.17        0.50        13.00        4.17                 0.9800
Table VI
COMPARISON WITH THE STATE-OF-THE-ART ON THE BRAZILIAN PUC-PR DATASET (ERRORS IN %)

Reference             Features       Classifier  FRR    FAR_random  FAR_simple  FAR_skilled  AER   AER_genuine+skilled  EER_genuine+skilled
Bertolini et al. [26] Graphometric   SVM (RBF)   10.16  3.16        2.8         6.48         5.65  8.32                 -
Batista et al. [27]   Pixel density  HMM + SVM   7.5    0.33        0.5         13.5         5.46  10.5                 -
Rivard et al. [28]    ESC + DPDF     Adaboost    11     0           0.19        11.15        5.59  11.08                -
Eskander et al. [16]  ESC + DPDF     Adaboost    7.83   0.02        0.17        13.5         5.38  10.67                -
Present Work          CNN_GPDS_norm  SVM (RBF)   2.17   0.17        0.50        13.00        3.96  7.59                 4.17
Figure 5. Performance on the Brazilian PUC-PR dataset varying the number of samples per user for WD training. The error bars show the smallest and largest AUC of users in the exploitation dataset.

Table VII
COMPARISON WITH THE STATE-OF-THE-ART ON GPDS-160 (ERRORS IN %)

Reference            Features            Classifier        FRR    FAR   EER
Hu and Chen [29]     LBP, GLCM, HOG      Adaboost          -      -     7.66
Yilmaz [30]          LBP                 SVM (RBF)         -      -     9.64
Yilmaz [30]          LBP, HOG            Ensemble of SVMs  -      -     6.97
Guerbai et al. [31]  Curvelet transform  OC-SVM            12.5   19.4  -
Present work         CNN_GPDS_norm       SVM (RBF)         19.81  5.99  10.70

to compare: the False Acceptance Rates for the different types of forgery, and the Average Error Rate among all types of error. Besides using these metrics, we also compare with an average error rate considering only genuine signatures and skilled forgeries, which is more comparable to the results on GPDS. On this dataset, the proposed method achieves state-of-the-art performance. The large gap between AER_genuine+skilled and EER_genuine+skilled also shows that optimizing user-specific decision thresholds is necessary to obtain a good system: in the present work the decision thresholds were kept at the default (scores larger than 0 were considered forgeries). We notice that, for GPDS, this default threshold achieved a large FRR with low FAR, while for the Brazilian dataset we obtained the opposite. This suggests that a global threshold is not sufficient, and user-specific thresholds should be considered. Better user-specific thresholds will be explored in future work.

It is worth noting that in the present work we trained the WD classifiers with a combination of genuine signatures and random forgeries. This relies on the hypothesis that separating random forgeries from genuine signatures will also make the classifier separate genuine signatures from skilled forgeries. This is a weak hypothesis, as we expect the skilled forgeries to have much more resemblance to the genuine signatures, whereas random forgeries should be quite different. However, given that we only have genuine signatures available for training, this is a reasonable option, and it has been used extensively in the literature for Writer-Dependent classification. An alternative is to use one-class classification to model only the distribution of the genuine signatures (e.g.
[31]), which can be explored as future work.
We would like to point out that, although the EER (Equal Error Rate) metric is useful for comparing different systems with a single number, it relies on implicitly selecting the decision thresholds using information from the test set: it reports the error rate that can be achieved with the optimal decision threshold for each user. In a real application, the decision thresholds can only be defined using data from the enrolled users (i.e., using only genuine signatures from the training/validation set), or in a writer-independent way (a single global threshold). Therefore, besides reporting EER, we consider it beneficial to also report FAR and FRR, stating the procedure used to select the thresholds.
Lastly, we would like to point out that the WD training datasets are significantly imbalanced: we have only a few positive samples (1-30) and a large number of random forgeries (up to 10 thousand for GPDS-160). Methods better suited for this scenario can also be explored in future work to improve the performance of the system.

VI. CONCLUSION
We presented a two-stage framework for offline signature verification, based on writer-independent feature learning and writer-dependent classification. This method does not rely on hand-crafted features, but instead learns them from data in a writer-independent format. Experiments conducted on the GPDS and Brazilian PUC-PR datasets demonstrate that the method is promising, achieving performance close to the state of the art on GPDS and surpassing the state of the art on the Brazilian PUC-PR dataset. We have also shown that the features generalize well: features learned on the GPDS dataset achieved good results on the Brazilian PUC-PR dataset. Experiments with a reduced number of signatures per user further demonstrated that the method can be effective even with few samples (4-5 per user).
Lastly, we note that although these methods achieve low Equal Error Rates, the actual False Rejection and False Acceptance rates are very imbalanced, and not stable across multiple users and datasets. This highlights the importance of a good method for defining user-specific thresholds, which we intend to explore in future work.

ACKNOWLEDGMENT
This research has been supported by the CNPq grant
REFERENCES
[1] A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 14, no. 1, pp. 4–20, 2004.
[2] R. Plamondon and S. N. Srihari, "Online and off-line handwriting recognition: a comprehensive survey," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 1, pp. 63–84, 2000.
[3] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, "Offline Handwritten Signature Verification - Literature Review," arXiv preprint arXiv:1507.07909, 2015.
[4] M. Ferrer, M. Diaz-Cabrera, and A. Morales, "Static Signature Synthesis: A Neuromotor Inspired Approach for Biometrics," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 37, no. 3, pp. 667–680, Mar. 2015.
[5] J. Vargas, M. Ferrer, C. Travieso, and J. Alonso, "Off-line Handwritten Signature GPDS-960 Corpus," in Document Analysis and Recognition, 9th International Conference on, vol. 2, Sep. 2007, pp. 764–768.
[6] Y. Serdouk, H. Nemmour, and Y. Chibani, "Off-line handwritten signature verification using variants of local binary patterns," Networking and Advanced Systems, 2nd International Conference on, p. 75, 2015.
[7] S. Pal, S. Chanda, U. Pal, K. Franke, and M. Blumenstein, "Off-line signature verification using G-SURF," in Intelligent Systems Design and Applications, 12th International Conference on. IEEE, 2012, pp. 586–591.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, 2012, pp. 1097–1105.
[9] C. Freitas, M. Morita, L. Oliveira, E. Justino, A. Yacoubi, E. Lethelier, F. Bortolozzi, and R. Sabourin, "Bases de dados de cheques bancarios brasileiros," in XXVI Conferencia Latinoamericana de Informatica, 2000.
[10] N. A. Murshed, F. Bortolozzi, and R. Sabourin, "Binary image compression using identity mapping backpropagation neural network," in Electronic Imaging'97. International Society for Optics and Photonics, 1997, pp. 29–35.
[11] N. A. Murshed, R. Sabourin, and F. Bortolozzi, "A cognitive approach to off-line signature verification," International Journal of Pattern Recognition and Artificial Intelligence, vol. 11, no. 05, pp. 801–825, 1997.
[12] Y. Bengio, "Learning Deep Architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, Jan. 2009.
[13] B. Ribeiro, I. Gonçalves, S. Santos, and A. Kovacec, "Deep learning networks for off-line handwritten signature recognition," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, 2011, pp. 523–532.
[14] H. Khalajzadeh, M. Mansouri, and M. Teshnehlab, "Persian Signature Verification using Convolutional Neural Networks," in International Journal of Engineering Research and Technology, vol. 1. ESRSA Publications, 2012.
[15] Y. Sun, Y. Chen, X. Wang, and X. Tang, "Deep Learning Face Representation by Joint Identification-Verification," in Advances in Neural Information Processing Systems, 2014, pp. 1988–1996.
[16] G. Eskander, R. Sabourin, and E. Granger, "Hybrid writer-independent-writer-dependent offline signature verification system," IET Biometrics, vol. 2, no. 4, pp. 169–181, Dec. 2013.
[17] M. R. Pourshahabi, M. H. Sigari, and H. R. Pourreza, "Offline handwritten signature identification and verification using contourlet transform," in Soft Computing and Pattern Recognition, International Conference of. IEEE, 2009, pp. 670–673.
[18] N. Otsu, "A threshold selection method from gray-level histograms," Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.
[19] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv e-print 1207.0580, Jul. 2012.
[20] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Artificial Intelligence and Statistics, International Conference on, 2010, pp. 249–256.
[21] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, "Theano: a CPU and GPU math expression compiler," in Proceedings of the Python for Scientific Computing Conference (SciPy), vol. 4. Austin, TX, 2010, p. 3.
[22] S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, D. Maturana, M. Thoma, E. Battenberg, J. Kelly, J. D. Fauw, M. Heilman, diogo149, B. McFee, H. Weideman, takacsg84, peterderivaz, Jon, instagibbs, D. K. Rasul, CongLiu, Britefury, and J. Degrave, "Lasagne: First release," Aug. 2015.
[23] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks," in Computer Vision and Pattern Recognition, IEEE Conference on, Jun. 2014, pp. 1717–1724.
[24] L. G. Hafemann, L. S. Oliveira, P. R. Cavalin, and R. Sabourin, "Transfer Learning between Texture Classification Tasks using Convolutional Neural Networks," in Neural Networks, The 2015 International Joint Conference on, 2015.
[25] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995.
[26] D. Bertolini, L. S. Oliveira, E. Justino, and R. Sabourin, "Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers," Pattern Recognition, vol. 43, no. 1, pp. 387–396, Jan. 2010.
[27] L. Batista, E. Granger, and R. Sabourin, "Dynamic selection of generative-discriminative ensembles for off-line signature verification," Pattern Recognition, vol. 45, no. 4, pp. 1326–1340, Apr. 2012.
[28] D. Rivard, E. Granger, and R. Sabourin, "Multi-feature extraction and selection in writer-independent off-line signature verification," International Journal on Document Analysis and Recognition, vol. 16, no. 1, pp. 83–103, 2013.
[29] J. Hu and Y. Chen, "Offline Signature Verification Using Real Adaboost Classifier Combination of Pseudo-dynamic Features," in Document Analysis and Recognition, 12th International Conference on, Aug. 2013, pp. 1345–1349.
[30] M. B. Yilmaz, "Offline Signature Verification With User-Based And Global Classifiers Of Local Features," Ph.D. dissertation, Sabancı University, 2015.
[31] Y. Guerbai, Y. Chibani, and B. Hadjadji, "The effective use of the one-class SVM classifier for handwritten signature verification based on writer-independent parameters,"