Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks
Luiz G. Hafemann, Robert Sabourin
Lab. d'imagerie, de vision et d'intelligence artificielle
École de technologie supérieure, Université du Québec, Montreal
[email protected], [email protected]
Luiz S. Oliveira
Department of Informatics
Federal University of Parana
Curitiba, PR
[email protected]
Accepted as a conference paper for IJCNN 2016
Abstract—Automatic Offline Handwritten Signature Verification has been researched over the last few decades from several perspectives, using insights from graphology, computer vision, signal processing, among others. In spite of the advancements in the field, building classifiers that can separate between genuine signatures and skilled forgeries (forgeries made targeting a particular signature) is still hard. We propose approaching the problem from a feature learning perspective. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, instead of using hand-crafted features that have no resemblance to the signature generation process. To this end, we use Deep Convolutional Neural Networks to learn features in a writer-independent format, and use this model to obtain a feature representation on another set of users, where we train writer-dependent classifiers. We tested our method on two datasets: GPDS-960 and Brazilian PUC-PR. Our experimental results show that the features learned on a subset of the users are discriminative for the other users, including across different datasets, reaching close to the state-of-the-art on the GPDS dataset, and improving the state-of-the-art on the Brazilian PUC-PR dataset.
I. INTRODUCTION
Biometrics technology is used in a wide variety of security applications. The aim of such systems is to recognize a person based on physiological traits (e.g. fingerprint, iris) or behavioral traits (e.g. voice, handwritten signature) [1]. The handwritten signature is a particularly important type of biometric trait, mostly due to its widespread use to verify a person's identity in legal, financial and administrative areas. One of the reasons for its extensive use is that the process to collect handwritten signatures is non-invasive, and people are familiar with their use in daily life [2].

Research in signature verification is divided between online (dynamic) and offline (static) scenarios. In the online case, the signature is captured using a special input device (such as a tablet), and the dynamic information of the signing process is captured (the pen's position, inclination, among others). In this work, we focus on the offline (static) signature verification problem, where the signature is acquired after the writing process is completed, by scanning the document containing the signature. In this case, the signature is represented as a digital image.

Most of the research effort in this area has been devoted to obtaining a good feature representation for signatures, that is, designing good feature extractors. To this end, researchers have used insights from graphology, computer vision, signal processing, among other areas [3]. As with several problems in computer vision, it is often hard to design good feature extractors, and the choice of which feature descriptors to use is problem-dependent. Ideally, the features should reflect the process used to generate the data - for instance, neuromotor models of the hand movement.
Although this approach has been explored in the context of online signature verification [4], there is no widely accepted "best" way to model the problem, especially for offline (static) signature verification, where the dynamic information of the signature generation process is not available.

In spite of the advancements in the field, systems proposed in the literature still struggle to distinguish genuine signatures from skilled forgeries. These are forgeries made by a person with access to a user's signature, who practices imitating it (see Figure 1). Experimental results show somewhat large error rates when testing on public datasets (such as GPDS [5]), even when the number of samples for training is around 10-15 (results are worse with 1-3 samples per user, which is a common scenario in banks and other institutions).

In this work we propose using feature learning (also called representation learning) for the problem of offline signature verification, in order to obtain better feature representations. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, rather than using hand-crafted features that have no resemblance to how the signatures are created, which is the case for the best performing systems proposed in the literature. For example, recent offline signature verification systems are based on texture descriptors, such as Local Binary Patterns [6], and interest-point matching, such as SURF [7], among others. We base our research on recent successful applications of purely supervised learning models for computer vision (such as image recognition [8]).

Figure 1. Samples from the GPDS-960 dataset. Each row contains three genuine signatures from the same user and a skilled forgery. We notice that each genuine signature is different (showing high intra-class variability), while skilled forgeries resemble the genuine signatures to a large extent (showing low inter-class variability).
In particular, we use Deep Convolutional Neural Networks (CNN) trained with a supervised criterion, in order to learn good representations for the signature verification problem. This type of architecture is interesting for our problem, since it scales better than fully-connected models for larger input sizes, having a smaller number of trainable parameters. This is a desirable property for the problem at hand, since we cannot rescale signature images too much without risking losing the details that enable discriminating between skilled forgeries and genuine signatures.

The most common formulation of the signature verification problem is called Writer-Dependent classification. In this formulation, one classifier is built for each user in the system. Using a supervised feature learning approach directly in this case is not practical, since the number of samples per user is very small (usually around 1-14 samples). Instead, we propose a two-phase approach: a Writer-Independent feature learning phase followed by Writer-Dependent classification. The feature learning phase uses a surrogate classification task for learning feature representations, where we train a CNN to discriminate between signatures from users not enrolled in the system. We then use this CNN as a feature extractor and train a Writer-Dependent classifier for each user. Note that in this formulation, adding a new user to the system requires training only a Writer-Dependent classifier.

We tested this method using two datasets: the GPDS-960 corpus [5] and the Brazilian PUC-PR dataset [9]. The first is the largest publicly available corpus for offline signature verification, while the second is a smaller dataset that has been used for several studies in the area.

Our main contributions are the following: We propose a two-stage framework for offline signature verification, where we learn features in a Writer-Independent way, and build Writer-Dependent classifiers.
Our results show that we have enough data in signature datasets to learn relevant features for the task, and the proposed method achieves state-of-the-art performance. We also investigate how the features learned in one dataset transfer to another dataset, and the impact on performance of the number of samples available for WD training.

II. RELATED WORK
Feature learning methods have not yet been broadly researched for the task of offline signature verification. Murshed et al. [10], [11] used autoencoders (called Identity-Mapping Backpropagation in their work) to perform dimensionality reduction, followed by a Fuzzy ARTMAP classifier. This work, however, considered only a single hidden layer, with fewer units than the input. In contrast, in recent successful applications of autoencoders, multiple layers of representations are learned, often in an over-complete format (more hidden units than visible units), where the idea is not to reduce dimensionality, but to "disentangle" the factors of variation in the inputs [12]. Ribeiro et al. [13] used unsupervised learning for learning representations, in particular Restricted Boltzmann Machines (RBMs). In this work, the authors tested with a small subset of users (10 users), and only reported a visual representation of the learned weights, not the results of using such features to discriminate between genuine signatures and forgeries. Khalajzadeh [14] used Convolutional Neural Networks (CNNs) for Persian signature verification, but did not consider skilled forgeries.

A similar strategy to our work was used by Sun et al. [15] for the task of face verification. They trained CNNs on a large dataset of faces and used these networks to extract features on another face dataset. In their work, the verification process consisted in distinguishing between faces from different users. In signature verification, distinguishing between different writers is one of the objectives (when we consider "random forgeries"), but the main challenge is to distinguish between genuine signatures and skilled forgeries. In this work we evaluate the method for both types of forgery.

The framework we propose is also similar to previous work by Eskander et al. [16], where a Writer-Independent set is used for feature selection, and a Writer-Dependent set is used for training and evaluation.
However, in that work the authors used hand-crafted feature extractors, while in the present work we use the Writer-Independent set for feature learning, instead of feature selection.

III. PROPOSED METHOD
We propose a two-stage approach, consisting of a writer-independent feature learning phase followed by writer-dependent classification. We start by partitioning the dataset into two distinct sets: a development set D and an exploitation set E. The set D is used to learn the feature representation for signatures; we consider this a separate dataset from the enrolled users. The exploitation set E contains the users enrolled in the system, and is used to train Writer-Dependent classifiers (using only genuine signatures) and to evaluate the performance of the system.

The proposed system is illustrated in Figure 2. We first use the set D to learn the feature representations.
Figure 2. The proposed architecture for writer-independent feature learning and writer-dependent classification.
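The two-stage pipeline of Figure 2 (feature learning on D, then one binary classifier per user in E) can be sketched as follows. This is a minimal stand-in, not the paper's implementation: the learned CNN feature extractor φ is replaced by a fixed random projection, and the per-user SVM by a nearest-centroid rule; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned feature extractor phi(.): a fixed
# random projection from pixel space (20x20 = 400 pixels, flattened) to an
# m = 64 dimensional feature space. In the paper, phi(X) is the activation
# of the last layer before the softmax of the CNN trained on the set D.
W_proj = rng.normal(size=(400, 64))

def phi(x):
    """Map flattened signature images to the feature space."""
    return x @ W_proj / np.sqrt(400)

# Writer-dependent phase: one binary classifier per enrolled user.
# Stand-in classifier: nearest centroid (the paper uses linear/RBF SVMs).
def train_wd(genuine_feats, forgery_feats):
    return genuine_feats.mean(axis=0), forgery_feats.mean(axis=0)

def decide(model, feat):
    c_pos, c_neg = model
    # accept (True) if closer to the genuine centroid than to the forgery one
    return np.linalg.norm(feat - c_pos) < np.linalg.norm(feat - c_neg)

# Toy "images" for one user: genuine signatures cluster around +1,
# random forgeries (signatures from other users) around -1.
genuine = rng.normal(loc=1.0, size=(5, 400))
forgeries = rng.normal(loc=-1.0, size=(20, 400))

model = train_wd(phi(genuine), phi(forgeries))
accepted = decide(model, phi(rng.normal(loc=1.0, size=400)))
rejected = decide(model, phi(rng.normal(loc=-1.0, size=400)))
```

Note that, as in the paper, only genuine signatures (of the user and of other users) are used for training; skilled forgeries appear only at test time.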
We train a Deep Convolutional Neural Network on D (detailed in the next section). The result is a function φ(·), learned from data, that projects the input images X to another feature space: φ(X) ∈ R^m, where m is the dimensionality of the projected feature space. Our expectation is that the features learned using D will be useful to separate genuine signatures and forgeries from other users.

After the CNN is trained, we create a training dataset for each user in set E, using a subset of the user's genuine signatures and random forgeries. We use the CNN as a feature extractor, obtaining a feature vector φ(X) for each signature X in the user's dataset. This new representation is then used to train a binary classifier f. For a new sample X_new, we first use the CNN to "extract features" (i.e. obtain the feature vector φ(X_new)) and feed this feature vector to the binary classifier, obtaining a final decision f(φ(X_new)). The next sections detail the WI and WD training procedures.

A. Pre-processing
For all signatures from both datasets (D and E), we apply the same pre-processing strategy. The signatures from the GPDS dataset have variable sizes, ranging from 153 x 258 pixels to 819 x 1137 pixels. Since training a neural network requires all inputs to have the same size, we need to normalize the signature images. We evaluated two approaches.

In the simplest approach, we resized the images to a fixed size, using bi-linear interpolation. We perform rescaling without deformations, that is, when the original image had a different width-to-height ratio, we cropped the excess in the larger dimension.

The second approach consisted in first normalizing the images to the largest image size, by padding the images with white background. In this case, we centered the signatures in a canvas of size 840 x 1360 pixels, aligning the center of mass of the signature to the center of the image, similar to previous approaches in the literature, e.g. [17]. We then rescaled the images to the desired input size of the neural network.

With the first approach, less fine-grained information is lost during the rescaling, especially for users that have small signatures. On the other hand, the width of the pen strokes becomes inconsistent: for the smaller signatures the pen strokes become much thicker than the pen strokes from the larger signatures.

Besides resizing the images to a standard size, we also performed the following pre-processing steps:
• Removed the background: we used Otsu's algorithm [18] to find the optimum threshold between foreground and background pixel intensities. Pixels with intensity larger than the threshold were set to white (intensity 255). The signature pixels (with intensity less than the threshold) remain unchanged in this step.
• Inverted the images: we inverted the images so that the white background corresponds to pixel intensity 0. That is, each pixel of the image is calculated as: I_inverted(i, j) = 255 − I(i, j).
• Normalized the input: we normalized the input to the neural network by dividing each pixel by the standard deviation of all pixel intensities (from all images in D). We do not normalize the data to have mean 0 (another common pre-processing step), since we want the background pixels to be zero-valued.

B. Writer-Independent feature learning
For learning the representation for signatures, we used Deep Convolutional Neural Networks. We note that directly modeling the problem of interest is not feasible in practice: our ultimate goal is to separate genuine signatures from skilled forgeries of the users enrolled in the system, but in a realistic scenario we only have genuine signatures, provided during an enrollment phase, and do not have forgeries for these users. Therefore, we need to consider a surrogate classification objective. In this work, we use a separate set of users (the development set D) to learn the features, by learning a classification task that considers each user in D as a different class. The objective function is to minimize a cross-entropy classification loss. The expectation is that, by learning to distinguish between signatures from different users in this dataset, the network will learn features that are relevant for our problem of interest: separating genuine signatures and forgeries from the exploitation set E.

We used a CNN architecture similar to the one defined by Krizhevsky et al. [8] for an image recognition problem. Initial tests showed that the capacity of this network seems to be too large for the problem at hand, particularly considering the fully-connected layers (which contain most of the weights in the network). We obtained better results with 2 fully-connected layers after the convolutions, instead of the three layers from the original model. For the purpose of replicating our experiment, we provide a full list of the parameters used in our tests. Table I lists the definition of the CNN layers. For convolution and pooling layers, we list the size as N x H x W, where N is the number of filters, H is the height and W is the width of the convolution and pooling windows, respectively. Stride refers to the distance between applications of the convolution (or pooling) operation, and pad refers to adding padding (borders) to the input, with value 0.
Local Response Normalization is applied according to [8], with the parameters listed in the table. For the first two fully-connected layers we use dropout [19], with rate 0.5. We use Rectified Linear Units (ReLUs) as the activation function for all convolutional and fully-connected layers, except the last one. The last layer uses a softmax activation and has N neurons, where N is the number of users in the set D, indicating the probability of the sample belonging to each of the users.

We initialize the weights of the model according to the work of Glorot and Bengio [20], and the biases to 0. We trained the model with Nesterov Momentum, using a momentum rate of 0.9 and mini-batches of size 100. We started with a learning rate of 0.01, and divided it by 10 twice (after 20 epochs, and after 40 epochs). We used L2 regularization with a weight decay factor of 0.0005. These values are consolidated in Table II. The networks were trained using the libraries Theano [21] and Lasagne [22], and took around 5h to train on a GPU Tesla C2050.
Table I
SUMMARY OF THE CNN LAYERS

Layer                      Size       Other Parameters
Convolution                96x11x11   stride = 4, pad = 0
Local Response Norm.       -          α = 10^−4, β = 0.75, k = 2, n = 5
Pooling                    96x3x3     stride = 2
Convolution                256x5x5    stride = 1, pad = 2
Local Response Norm.       -          α = 10^−4, β = 0.75, k = 2, n = 5
Pooling                    256x3x3    stride = 2
Convolution                384x3x3    stride = 1, pad = 1
Convolution                256x3x3    stride = 1, pad = 1
Pooling                    256x3x3    stride = 2
Fully Connected + Dropout  4096       p = 0.5
Fully Connected + Softmax  N
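The feature-map sizes implied by Table I can be checked with standard convolution/pooling arithmetic (floor division), assuming the 155 x 220 input size used in the experimental protocol. The intermediate sizes below are derived from these formulas, not quoted from the paper:

```python
# Feature-map size bookkeeping for the layers in Table I (H x W, 1 channel).
def conv_out(size, kernel, stride, pad):
    """Output size of a convolution/pooling layer: floor((n + 2p - k)/s) + 1."""
    h, w = size
    kh, kw = kernel
    return ((h + 2 * pad - kh) // stride + 1,
            (w + 2 * pad - kw) // stride + 1)

size = (155, 220)
size = conv_out(size, (11, 11), stride=4, pad=0)   # Conv 96x11x11
size = conv_out(size, (3, 3),  stride=2, pad=0)    # Pool 96x3x3
size = conv_out(size, (5, 5),  stride=1, pad=2)    # Conv 256x5x5
size = conv_out(size, (3, 3),  stride=2, pad=0)    # Pool 256x3x3
size = conv_out(size, (3, 3),  stride=1, pad=1)    # Conv 384x3x3
size = conv_out(size, (3, 3),  stride=1, pad=1)    # Conv 256x3x3
size = conv_out(size, (3, 3),  stride=2, pad=0)    # Pool 256x3x3
flattened = 256 * size[0] * size[1]                # input to the 4096-unit layer
```

Under these assumptions, the last pooling layer outputs 256 maps of size 3 x 5, so the first fully-connected layer sees a 3840-dimensional input.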
Table II
TRAINING HYPERPARAMETERS

Parameter                   Value
Initial Learning Rate (LR)  0.01
Learning Rate schedule      LR ← LR × 0.1 (every 20 epochs)
Weight Decay                0.0005
Momentum                    0.9
Batch size                  100

C. Writer-dependent classification
After the CNN is trained on the set D, we use it to extract features for the Writer-Dependent training. Similar to previous work in transfer learning [23], [24], we use the representation obtained by performing forward propagation of an input image until the last layer before the softmax. In the notation defined above, we consider our feature extractor function φ(X) to be the representation of the network at the last layer before the softmax, after forward-propagating the input X. As noted in Table I, this representation has 4096 dimensions (φ(X) ∈ R^4096). The hypothesis is that the features learned on the set D, during the CNN training, will be relevant for signatures from other users (from the exploitation set).

For training the Writer-Dependent classifiers, no skilled forgeries are used during training or validation, to simulate the scenario of a real application. Following previous work on Writer-Dependent classification, we create a dataset for each user, consisting of genuine signatures and random forgeries (using signatures from other users, from D).

For each user in E, we build a Writer-Dependent training and testing set. The training set is composed of a subset of genuine signatures from the user (as the positive examples), as well as genuine signatures from other users from the development dataset (as the negative examples). The testing set consists of genuine signatures from the user (not used for training), and the skilled forgeries made for the user. With this dataset, we first use the CNN to extract the features for each signature image (that is, compute φ(X) for each signature X). We then train a standard two-class classifier f for each user.

For the WD classification, we test both linear SVMs and SVMs with the RBF kernel [25]. For the linear SVM, we
used the hyperparameter C = 1, while for the SVM with RBF kernel we optimize the parameters C and γ with a subset of users from the set D, using a grid search. We select the hyperparameters that best classify genuine and skilled forgeries from these users.

During generalization, for a new signature X_new, we first use the CNN to obtain the representation of the signature (i.e. calculate φ(X_new)), and then feed this representation to the classifier to obtain a final decision on the sample, f(φ(X_new)).

Figure 3. The separation of the GPDS-960 dataset into Development set D and Exploitation set E.

IV. EXPERIMENTAL PROTOCOL
Feature learning for complex tasks has been shown to work better with large datasets. The largest publicly available signature dataset is GPDS-960 [5], and therefore it is particularly suitable for our proposed method. This dataset contains 24 genuine signatures and 30 forgeries per user, from 881 users, which were captured in a single session [5]. We also tested with a smaller dataset that has also been extensively used for offline signature verification: the Brazilian PUC-PR dataset [9]. This dataset contains signatures from 168 users, with forgeries for the first 60 users.

The first step is to split the datasets into a development set D and an exploitation set E. For GPDS, in order to allow comparison with previous work, we tested with the set E consisting of the first 160 users, and of the first 300 users (previously published as GPDS-160 and GPDS-300). Figure 3 shows how the dataset is split. The remaining users are used for the writer-independent feature learning phase. For the Brazilian set, we consider the first 60 users as the set E, and the remaining 108 users as the set D.

After splitting the dataset into sets D and E, we preprocess the signature images to a standard size of 155 x 220 pixels, considering the two preprocessing options listed in the previous section. This size was chosen to be large enough to keep details from the pen strokes in the signatures, while still small enough to enable training on the GPU. We use the set D to train a CNN that learns to classify input signatures to the different users in this set.

To assess whether the learned features generalize to other datasets, we used the CNN trained on the GPDS dataset to extract features for the Brazilian dataset.
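The per-image pre-processing of Section III-A (background removal with Otsu's method, inversion, and division by the dataset standard deviation) can be sketched as below. Resizing/canvas normalization is omitted for brevity, and the value of the dataset-wide standard deviation is a placeholder:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method [18]: pick the threshold maximizing between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0 = cum[t] / total                   # background class weight
        w1 = 1.0 - w0                         # foreground class weight
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t] / cum[t]
        mu1 = (cum_mean[-1] - cum_mean[t]) / (total - cum[t])
        var = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def preprocess(img, std):
    """Background removal, inversion and scaling, as in Section III-A."""
    t = otsu_threshold(img)
    img = img.copy()
    img[img > t] = 255        # background -> white; signature pixels unchanged
    img = 255 - img           # invert: background becomes 0
    return img / std          # divide by the dataset-wide pixel std (placeholder)

# toy "signature": dark strokes on a light background
img = np.full((155, 220), 230, dtype=np.uint8)
img[40:50, 30:180] = 20
out = preprocess(img, std=64.0)
```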
This cross-dataset experiment serves two purposes: to analyze whether the learned features generalize to other datasets, and to evaluate whether we can obtain better performance on the Brazilian set (which is smaller) by leveraging data from a larger dataset (GPDS).

For the Writer-Dependent training, we have slightly different protocols for GPDS and the Brazilian dataset, to correspond to protocols used in other work on these datasets. For GPDS, we selected up to 14 genuine signatures as positive samples (from E), and 14 genuine signatures from each user in the set D as negative samples. For testing, we selected 10 genuine signatures from the user, ensuring they were not used for training, and all 30 skilled forgeries. For the Brazilian dataset, we selected up to 30 genuine samples as positive samples (from E), and 30 genuine samples from the users in set D as negative samples. For testing, we selected 10 genuine signatures from the user, 10 signatures from other users in E (i.e. not used for training) as random forgeries, and all 10 simple forgeries and 10 skilled forgeries available for each user.

To evaluate the impact of different numbers of sample signatures per user, we trained the WD classifiers using a variable number of signatures from the enrolled users. This set-up is summarized in Table III.

For optimizing the hyperparameters for the SVM training (for the WD classifiers), we performed a grid search on the parameters C and γ. We used 10 users from D, building WD classifiers with the same protocol as above. We selected the hyperparameters that performed best in separating genuine signatures and skilled forgeries for these 10 users, by measuring the classification error of each classifier. Before training the SVM models, we rescale the inputs to have unit standard deviation (in each dimension). This slightly improved performance and significantly decreased the SVM training time.
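The unit-standard-deviation rescaling applied before SVM training can be sketched as follows. The data here is a toy stand-in (3 dimensions instead of the 4096-dimensional CNN features); the important point is that the scaling statistics are estimated on the training set only and reused at test time:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy features with very different per-dimension scales, standing in for
# the 4096-d CNN features of one user's WD training and test sets
train_feats = rng.normal(scale=[1.0, 10.0, 0.1], size=(100, 3))
test_feats = rng.normal(scale=[1.0, 10.0, 0.1], size=(10, 3))

# per-dimension standard deviation, estimated on the training set only
stds = train_feats.std(axis=0)
stds[stds == 0] = 1.0               # guard against constant feature dimensions
train_scaled = train_feats / stds
test_scaled = test_feats / stds     # reuse the training-set statistics
```

Note that the mean is deliberately not subtracted, mirroring the pre-processing choice of keeping background (zero-valued) inputs at zero.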
Similar to [16], in order to have a balanced dataset for training, we duplicated the genuine examples in the training set to match the number of random forgeries (equivalent to using a different C for the positive and negative classes).

In this work we conducted experiments with two datasets, and authors from different studies have reported different metrics. For GPDS, some authors report two metrics: False Rejection Rate (FRR) and False Acceptance Rate for skilled forgeries (FAR_skilled). The first metric is the fraction of genuine signatures that were classified as forgery, while the second is the fraction of skilled forgeries that were classified as genuine signatures. Other authors report simply the Equal Error Rate (EER), which is the point on a ROC curve where FAR and FRR are equal. For the results on GPDS, we report these three metrics, and also the mean of the Area Under the Curve (AUC) - that is, we build a ROC curve for each user, and report the average of the AUC. For calculating the EER, we considered the ROC curves created for each user (thresholds specific to each user).

For the Brazilian PUC-PR dataset, authors commonly report FRR and FAR for three types of forgeries: Random, Simple and Skilled. Authors also report an average error rate (AER), which is the average of the four types of error (FRR, FAR_random, FAR_simple, FAR_skilled). To allow comparison with the results on GPDS, we also report metrics considering only

Table III
TRAINING AND TESTING SET-UP

Dataset              Training set                                    Testing set
                     genuine               forgeries (random)        genuine    forgeries
Brazilian (PUC-PR)   1, 5, 10, 15, 30      108 x 30 = 3240 samples   10         10 random, 10 simple, 10 skilled
GPDS-160             4, 8, 12, 14          721 x 14 = 10094 samples  10         30 skilled
GPDS-300             4, 8, 12, 14          581 x 14 = 8134 samples   10         30 skilled
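The per-user EER computation described above can be sketched as follows. This is a stand-in implementation (the paper computes the EER from per-user ROC curves; the sweep below simply returns the operating point where FAR and FRR are closest), with toy scores in place of real classifier outputs:

```python
import numpy as np

def eer_user_threshold(genuine_scores, forgery_scores):
    """Equal Error Rate with a user-specific threshold: sweep candidate
    thresholds and return the error at the point where FAR and FRR are
    closest (scores: higher means more likely genuine)."""
    thresholds = np.sort(np.concatenate([genuine_scores, forgery_scores]))
    best = (1.0, 1.0)
    best_gap = np.inf
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine signatures rejected
        far = np.mean(forgery_scores >= t)   # forgeries accepted
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            best = (far, frr)
    return (best[0] + best[1]) / 2

# toy per-user scores standing in for SVM decision values
rng = np.random.default_rng(2)
genuine = rng.normal(1.0, 1.0, size=10)      # 10 genuine test signatures
forgeries = rng.normal(-1.0, 1.0, size=30)   # 30 skilled forgeries
eer = eer_user_threshold(genuine, forgeries)
```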
Table IV
CLASSIFICATION ERRORS ON GPDS (%) AND MEAN AUC

Dataset   Features       Classifier    FRR    FAR    EER    Mean AUC
GPDS-160  CNN_GPDS       SVM (Linear)  26.62  9.65   14.35  0.9153
GPDS-160  CNN_GPDS       SVM (RBF)     37.25  3.66   14.64  0.9097
GPDS-160  CNN_GPDS_norm  SVM (Linear)  11.12  16.77  11.32  0.9381
GPDS-160  CNN_GPDS_norm  SVM (RBF)     19.81  5.99   10.70  0.9459
GPDS-300  CNN_GPDS       SVM (Linear)  25.43  12.80  16.40  0.8968
GPDS-300  CNN_GPDS       SVM (RBF)     36.27  5.00   16.22  0.9014
GPDS-300  CNN_GPDS_norm  SVM (Linear)  11.93  25.58  16.07  0.8957
GPDS-300  CNN_GPDS_norm  SVM (RBF)     20.60  9.08   12.83  0.9257
FRR and FAR_skilled: AER_genuine+skilled, EER_genuine+skilled and Mean AUC_genuine+skilled.

V. RESULTS AND DISCUSSION
We first report the results of the search for the best hyperparameters for the SVM with RBF kernel used for Writer-Dependent classification. After training classifiers for 10 users in the development set, we noticed that the best hyperparameters were the same for most users (8/10 users): γ = 2^−11, C = 1. For the other two users, this was the second-best configuration of the parameters. Therefore, we used these hyperparameters for the subsequent experiments.

Table IV presents the results of our experiments with the GPDS dataset. The column "Features" lists the method we used to extract features - in our work, this column lists the CNN trained on the set D. We considered both alternatives defined in the Pre-processing section - simply resizing the signature images (CNN_GPDS), and first normalizing the signatures in a canvas with a standard size before resizing them (CNN_GPDS_norm). We notice that this normalization was essential to obtain good classification results on this dataset, with a boost in performance from 14.64% EER to 10.70% on the GPDS-160 dataset. We also noticed that the best results were achieved with the SVM trained with an RBF kernel. Lastly, we noted a drop in performance between the experiments with GPDS-160 and GPDS-300. This can be partially explained by the fact that we use more data in the set D for GPDS-160.

Table V shows the results of our tests with the Brazilian PUC-PR dataset. We noticed the same characteristics as with the GPDS test, with improved results with the non-linear RBF kernel for the classifier. On this dataset we tested both a CNN trained on the Brazilian dataset and the CNN trained above on the GPDS dataset. The results were similar, suggesting that the features learned in one dataset generalize well to other datasets. On the other hand, we expected the

Figure 4. Performance on the GPDS-160 dataset varying the number of samples per user for WD training. The error bars show the smallest and largest AUC of users in the exploitation dataset.
performance with the CNN trained on GPDS to be better, since the development set for the Brazilian dataset is much smaller (108 users in the Brazilian dataset vs. 721 users for GPDS-160), and therefore there is much more data in GPDS to learn a good feature representation.

We evaluated the performance of the system considering different numbers of samples per user in the exploitation set. For these tests, we used the configuration that performed best in the tests above: using the normalized GPDS development set to learn the features, and using an SVM with RBF kernel for training the WD classifiers. Figures 4 and 5 present the evolution of the AUC and the Equal Error Rate for the GPDS and Brazilian datasets. We notice that even with a small number of samples the performance is reasonable, achieving 15.05% EER with 4 signatures on the GPDS dataset, and 9.83% EER with 5 signatures on the Brazilian dataset. However, we notice that in the extreme case, when a single signature is available, the performance of the entire system is much worse (around 17% EER), and some users have very poor performance (for one user, the AUC is below 0.5).

We compare our results with the state-of-the-art in Tables VI and VII. For GPDS, the method achieves state-of-the-art performance in terms of Equal Error Rate when compared with systems that use a single feature extractor. However, the performance is worse compared to systems where multiple feature extractors / classifiers are used. Future work can analyze whether the features learned from data are complementary to hand-crafted features.

For the Brazilian PUC-PR dataset, authors use other metrics
Table V
CLASSIFICATION ERRORS ON THE BRAZILIAN PUC-PR DATASET (%) AND MEAN AUC

Features       Classifier    FRR   FAR_random  FAR_simple  FAR_skilled  EER_genuine+skilled  Mean AUC_genuine+skilled
CNN_Brazilian  SVM (Linear)  1.00  0.00        1.67        27.17        7.33                 0.9668
CNN_Brazilian  SVM (RBF)     2.83  0.17        0.17        14.17        4.17                 0.9837
CNN_GPDS       SVM (Linear)  1.83  0.00        1.33        27.83        11.50                0.9413
CNN_GPDS       SVM (RBF)     6.50  0.17        1.17        15.17        8.50                 0.9601
CNN_GPDS_norm  SVM (Linear)  0.17  0.00        1.67        29.00        6.67                 0.9653
CNN_GPDS_norm  SVM (RBF)     2.17  0.17        0.50        13.00        4.17                 0.9800
Table VI
COMPARISON WITH THE STATE-OF-THE-ART ON THE BRAZILIAN PUC-PR DATASET (ERRORS IN %)

Reference             Features       Classifier  FRR    FAR_random  FAR_simple  FAR_skilled  AER   AER_genuine+skilled  EER_genuine+skilled
Bertolini et al. [26] Graphometric   SVM (RBF)   10.16  3.16        2.8         6.48         5.65  8.32                 -
Batista et al. [27]   Pixel density  HMM + SVM   7.5    0.33        0.5         13.5         5.46  10.5                 -
Rivard et al. [28]    ESC + DPDF     Adaboost    11     0           0.19        11.15        5.59  11.08                -
Eskander et al. [16]  ESC + DPDF     Adaboost    7.83   0.02        0.17        13.5         5.38  10.67                -
Present Work          CNN_GPDS_norm  SVM (RBF)   2.17   0.17        0.50        13.00        3.96  7.59                 4.17
Figure 5. Performance on the Brazilian PUC-PR dataset varying the number of samples per user for WD training. The error bars show the smallest and largest AUC of users in the exploitation dataset.

Table VII
COMPARISON WITH THE STATE-OF-THE-ART ON GPDS-160 (ERRORS IN %)

Reference            Features            Classifier        FRR    FAR   EER
Hu and Chen [29]     LBP, GLCM, HOG      Adaboost          -      -     7.66
Yilmaz [30]          LBP                 SVM (RBF)         -      -     9.64
Yilmaz [30]          LBP, HOG            Ensemble of SVMs  -      -     6.97
Guerbai et al. [31]  Curvelet transform  OC-SVM            12.5   19.4  -
Present work         CNN_GPDS_norm       SVM (RBF)         19.81  5.99  10.70

to compare: the False Acceptance Rates for the different types of forgery, and the Average Error Rate among all types of error. Besides using these metrics, we also compare with an average error rate considering only genuine signatures and skilled forgeries, which is more comparable to the results on GPDS. On this dataset, the proposed method achieves state-of-the-art performance. The large gap between AER_genuine+skilled and EER_genuine+skilled also shows that optimizing user-specific decision thresholds is necessary to obtain a good system: in the present work the decision thresholds were kept at the default (scores larger than 0 were considered forgeries). We notice that, for GPDS, this default threshold achieved a large FRR with low FAR, while for the Brazilian dataset we obtained the opposite. This suggests that a global threshold is not sufficient, and user-specific thresholds should be considered. Better user-specific thresholds will be explored in future work.

It is worth noting that in the present work we trained the WD classifiers with a combination of genuine signatures and random forgeries. This relies on the hypothesis that separating random forgeries from genuine signatures will also make the classifier separate genuine signatures from skilled forgeries. This is a weak hypothesis, as we expect the skilled forgeries to have much more resemblance to the genuine signatures, whereas random forgeries should be quite different. However, given that we only have genuine signatures available for training, this is a reasonable option, and it has been used extensively in the literature for Writer-Dependent classification. An alternative is to use one-class classification to model only the distribution of the genuine signatures (e.g.
[31]), which can be explored as future work.
We would like to point out that, although the EER (Equal Error Rate) metric is useful for comparing different systems with a single number, it relies on implicitly selecting the decision thresholds using information from the test set: it reports the error rate that can be achieved with the optimal decision threshold for each user. In a real application, the decision thresholds can only be defined using data from the enrolled users (i.e., using only genuine signatures from the training/validation set), or in a writer-independent way (a single global threshold). Therefore, besides reporting EER, we consider it beneficial to also report FAR and FRR, stating the procedure used to select the thresholds.
Lastly, we would like to point out that the WD training datasets are significantly imbalanced: we have only a few positive samples (1-30) and a large number of random forgeries (up to 10 thousand for GPDS-160). Methods better suited for this scenario can also be explored in future work to improve the performance of the system.

VI. CONCLUSION
We presented a two-stage framework for offline signature verification, based on writer-independent feature learning and writer-dependent classification. This method does not rely on hand-crafted features, but instead learns them from data in a writer-independent format. Experiments conducted on the GPDS and Brazilian PUC-PR datasets demonstrate that the method is promising, achieving performance close to the state of the art on GPDS and surpassing the state of the art on the Brazilian PUC-PR dataset. We have also shown that the features generalize well: features learned on the GPDS dataset achieved good results on the Brazilian PUC-PR dataset. Experiments with a reduced number of signatures per user further demonstrated that the method can be effective even with few samples (4-5 per user).
Lastly, we note that although these methods achieve low Equal Error Rates, the actual False Rejection and False Acceptance rates are very imbalanced, and not stable across multiple users and datasets. This highlights the importance of a good method for defining user-specific thresholds, which we intend to explore in future work.

ACKNOWLEDGMENT
This research has been supported by the CNPq grant
REFERENCES
[1] A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 14, no. 1, pp. 4–20, 2004.
[2] R. Plamondon and S. N. Srihari, "Online and off-line handwriting recognition: a comprehensive survey," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 1, pp. 63–84, 2000.
[3] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, "Offline Handwritten Signature Verification - Literature Review," arXiv preprint arXiv:1507.07909, 2015.
[4] M. Ferrer, M. Diaz-Cabrera, and A. Morales, "Static Signature Synthesis: A Neuromotor Inspired Approach for Biometrics," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 37, no. 3, pp. 667–680, Mar. 2015.
[5] J. Vargas, M. Ferrer, C. Travieso, and J. Alonso, "Off-line Handwritten Signature GPDS-960 Corpus," in Document Analysis and Recognition, 9th International Conference on, vol. 2, Sep. 2007, pp. 764–768.
[6] Y. Serdouk, H. Nemmour, and Y. Chibani, "Off-line handwritten signature verification using variants of local binary patterns," Networking and Advanced Systems, 2nd International Conference on, p. 75, 2015.
[7] S. Pal, S. Chanda, U. Pal, K. Franke, and M. Blumenstein, "Off-line signature verification using G-SURF," in Intelligent Systems Design and Applications, 12th International Conference on. IEEE, 2012, pp. 586–591.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, 2012, pp. 1097–1105.
[9] C. Freitas, M. Morita, L. Oliveira, E. Justino, A. Yacoubi, E. Lethelier, F. Bortolozzi, and R. Sabourin, "Bases de dados de cheques bancarios brasileiros," in XXVI Conferencia Latinoamericana de Informatica, 2000.
[10] N. A. Murshed, F. Bortolozzi, and R. Sabourin, "Binary image compression using identity mapping backpropagation neural network," in Electronic Imaging'97. International Society for Optics and Photonics, 1997, pp. 29–35.
[11] N. A. Murshed, R. Sabourin, and F. Bortolozzi, "A cognitive approach to off-line signature verification," International Journal of Pattern Recognition and Artificial Intelligence, vol. 11, no. 05, pp. 801–825, 1997.
[12] Y. Bengio, "Learning Deep Architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, Jan. 2009.
[13] B. Ribeiro, I. Gonçalves, S. Santos, and A. Kovacec, "Deep learning networks for off-line handwritten signature recognition," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, 2011, pp. 523–532.
[14] H. Khalajzadeh, M. Mansouri, and M. Teshnehlab, "Persian Signature Verification using Convolutional Neural Networks," in International Journal of Engineering Research and Technology, vol. 1. ESRSA Publications, 2012.
[15] Y. Sun, Y. Chen, X. Wang, and X. Tang, "Deep Learning Face Representation by Joint Identification-Verification," in Advances in Neural Information Processing Systems, 2014, pp. 1988–1996.
[16] G. Eskander, R. Sabourin, and E. Granger, "Hybrid writer-independent-writer-dependent offline signature verification system," IET Biometrics, vol. 2, no. 4, pp. 169–181, Dec. 2013.
[17] M. R. Pourshahabi, M. H. Sigari, and H. R. Pourreza, "Offline handwritten signature identification and verification using contourlet transform," in Soft Computing and Pattern Recognition, International Conference of. IEEE, 2009, pp. 670–673.
[18] N. Otsu, "A threshold selection method from gray-level histograms," Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.
[19] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv e-print 1207.0580, Jul. 2012.
[20] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Artificial Intelligence and Statistics, International Conference on, 2010, pp. 249–256.
[21] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, "Theano: a CPU and GPU math expression compiler," in Proceedings of the Python for Scientific Computing Conference (SciPy), vol. 4. Austin, TX, 2010, p. 3.
[22] S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, D. Maturana, M. Thoma, E. Battenberg, J. Kelly, J. D. Fauw, M. Heilman, diogo149, B. McFee, H. Weideman, takacsg84, peterderivaz, Jon, instagibbs, D. K. Rasul, CongLiu, Britefury, and J. Degrave, "Lasagne: First release," Aug. 2015.
[23] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks," in Computer Vision and Pattern Recognition, IEEE Conference on, Jun. 2014, pp. 1717–1724.
[24] L. G. Hafemann, L. S. Oliveira, P. R. Cavalin, and R. Sabourin, "Transfer Learning between Texture Classification Tasks using Convolutional Neural Networks," in Neural Networks, The 2015 International Joint Conference on, 2015.
[25] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995.
[26] D. Bertolini, L. S. Oliveira, E. Justino, and R. Sabourin, "Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers," Pattern Recognition, vol. 43, no. 1, pp. 387–396, Jan. 2010.
[27] L. Batista, E. Granger, and R. Sabourin, "Dynamic selection of generative-discriminative ensembles for off-line signature verification," Pattern Recognition, vol. 45, no. 4, pp. 1326–1340, Apr. 2012.
[28] D. Rivard, E. Granger, and R. Sabourin, "Multi-feature extraction and selection in writer-independent off-line signature verification," International Journal on Document Analysis and Recognition, vol. 16, no. 1, pp. 83–103, 2013.
[29] J. Hu and Y. Chen, "Offline Signature Verification Using Real Adaboost Classifier Combination of Pseudo-dynamic Features," in Document Analysis and Recognition, 12th International Conference on, Aug. 2013, pp. 1345–1349.
[30] M. B. Yilmaz, "Offline Signature Verification With User-Based And Global Classifiers Of Local Features," Ph.D. dissertation, Sabancı University, 2015.
[31] Y. Guerbai, Y. Chibani, and B. Hadjadji, "The effective use of the one-class SVM classifier for handwritten signature verification based on writer-independent parameters,"