Meta-learning for fast classifier adaptation to new users of Signature Verification systems
Luiz G. Hafemann, Robert Sabourin, Member, IEEE, and Luiz S. Oliveira
Abstract—Offline handwritten signature verification presents a challenging pattern recognition problem, where only knowledge of the positive class is available for training. While classifiers have access to a few genuine signatures for training, during generalization they also need to discriminate forgeries. This is particularly challenging for skilled forgeries, where a forger practices imitating the user's signature, and is often able to create forgeries visually close to the original signatures. Most work in the literature addresses this issue by training for a surrogate objective: discriminating genuine signatures of a user and random forgeries (signatures from other users). In this work, we propose a solution for this problem based on meta-learning, where there are two levels of learning: a task level (where a task is to learn a classifier for a given user) and a meta-level (learning across tasks). In particular, the meta-learner guides the adaptation (learning) of a classifier for each user, which is a lightweight operation that only requires genuine signatures. The meta-learning procedure learns what is common for the classification across different users. In a scenario where skilled forgeries from a subset of users are available, the meta-learner can guide classifiers to be discriminative of skilled forgeries even if the classifiers themselves do not use skilled forgeries for learning. Experiments conducted on the GPDS-960 dataset show improved performance compared to Writer-Independent systems, and results comparable to state-of-the-art Writer-Dependent systems in the regime of few samples per user (5 reference signatures).
Index Terms—Meta-Learning, Signature Verification, Biometrics
I. INTRODUCTION
Offline handwritten signature verification remains a challenging problem in the presence of skilled forgeries, where the forger has access to the user's signature and practices imitating it [1]. This problem is particularly challenging since, in a practical application scenario, we cannot expect to have access to skilled forgeries for every user in the system for training the classifiers.

This problem is mainly addressed in three ways in the literature: (i) training a classifier for each user using a surrogate objective, where the negative samples are genuine signatures from other users (called random forgeries in this context) [2], [3], [4]; (ii) training a one-class classifier for each user [5]; (iii) training a global, writer-independent classifier [6], [7], [8].
L. G. Hafemann and R. Sabourin are with the Laboratoire d'imagerie, de vision et d'intelligence artificielle, École de technologie supérieure, Université du Québec, Montreal, Canada (e-mail: [email protected], [email protected]). L. S. Oliveira is with the Department of Informatics, Federal University of Parana, Curitiba, Brazil (e-mail: [email protected]). This work was supported by the Fonds de recherche du Québec - Nature et technologies (FRQNT) and by a CNPq grant.
The first alternative (Writer-Dependent (WD) classification) optimizes a surrogate objective, which can therefore be sub-optimal. The second alternative (one-class Writer-Dependent classification) is an appropriate formulation of the problem, but empirical results show that this approach performs worse than the first. A possible reason is that for signature verification tasks we normally have only a small number of samples per user, which makes it hard to estimate the support (or probability density) of the positive class. For instance, recent work considers a high-dimensional feature space, while the number of signatures from one individual can be as low as 1-5 in practical applications [1], [4]. Lastly, the third alternative (Writer-Independent (WI) classification) alleviates the problem of a small number of samples per user by transforming the problem into a binary classification problem: comparing a query signature with a reference (template) signature, where the same classifier is used for all users [9], [7]. However, empirically these approaches also show worse performance than WD classification, at least when the number of signatures available for training (per user) is larger than 1 [4]. We hypothesize that a reason for this gap in performance is that the WI classifiers compare a query signature with a reference signature one at a time, while the WD classifiers are trained with multiple references at the same time, and therefore can better estimate the invariances in a person's signature (intra-class variation).

Considering the different approaches, WD classification (alternative (i) above) shows the best empirical performance [1]. However, this approach has other shortcomings compared to WI approaches: it requires training a classifier for each user, which is not desirable in some scenarios. For instance, when the number of users is very large and each user does not use the system often, many classifiers are trained but almost never used. Also, in the cases where features are learned from data (e.g. [4]), if we want to change the feature representation, for instance by training with new data, it is not straightforward to incorporate the new features without re-training all WD classifiers in the system, while a global (WI) classifier would not require any extra step. WI systems also naturally handle the issue of adding more signatures to the reference set.

In this work, we propose to formulate the task as a meta-learning problem, inspired by the work of a forensic handwriting expert: the expert acquires knowledge by examining genuine signatures and forgeries from several people along his/her training and work experience. For a new case, along with knowledge of signatures from the individual, this previous experience is also used when analyzing a signature of interest.

The main contributions of this paper are:
• We formulate the signature verification task as a meta-learning problem, considering a meta-learner that learns across tasks (classification for specific individuals) and that is subsequently adapted to a particular user in order to make a prediction on a query signature.
• We extend Model-Agnostic Meta-Learning (MAML) [10] to consider different loss functions during classifier adaptation and meta-learning, to address the issue of partial knowledge during training.
• The resulting system is as scalable as a WI system (there is a single meta-model), but is also adaptable to individual users with a lightweight operation (a few gradient descent steps). Additionally, contrary to other work that learns representations to train WD classifiers ([4]), not only the final classification layer is adapted to the new user, but the feature representation is also adapted.
• We evaluate the approach on four widely used datasets, achieving results comparable to the state-of-the-art on the GPDS-960 dataset. Finally, we discuss the limitations of the approach, most notably the requirement of data from a large number of users for training, and worse results when transferring the meta-learner to the other datasets. Code to reproduce the experiments can be found at https://github.com/luizgh/sigver.

The paper is organized as follows: section II reviews the related work on signature verification and meta-learning. Section III introduces the formulation of signature verification as a meta-learning problem, and the proposed algorithm. Section IV describes the experimental protocol, and section V presents and discusses our results. Finally, section VI concludes the paper.

II. RELATED WORK
The objective of signature verification systems is to classify a query signature as being genuine (produced by the claimed individual) or a forgery (produced by another person). In the pattern recognition community, different types of forgery are considered: random forgeries, in which the forger has no knowledge of the user's signature and uses his own signature instead; simple forgeries, in which the forger knows the person's name, but not their signature; and skilled forgeries, where the forger has access to the user's signature and practices imitating it. While the problem of distinguishing random and simple forgeries is relatively easy (i.e. state-of-the-art classifiers obtain low error rates), skilled forgeries still present a significant challenge for classification.

These systems can be broadly categorized as Writer-Dependent (WD, also called User-Dependent) and Writer-Independent (WI, also called User-Independent). For Writer-Dependent classifiers, we consider a dataset for each user {x_i, y_i}_{i=1}^n, where x are signatures and y indicates whether they are genuine signatures from the user (y = 1) or random forgeries (y = 0) [2], [3], [4]. Some works consider one-class WD classifiers, in which only genuine signatures from the user are used for training (only y = 1) [5]. For WI classifiers, there are two main approaches: training a single classifier in a dissimilarity space, and metric-learning approaches.

Fig. 1: Common dataset separation for feature learning followed by WD classification, on the GPDS dataset. Features are learned in D. Model selection is conducted in V_v. The system is evaluated by training WD classifiers for the exploitation set E. [4]

In the first case, the training samples are differences of feature vectors: |φ(x_1) − φ(x_2)|, with y = 1 if both signatures are from the same user, and y = 0 otherwise [6], [7]. The metric-learning approaches use a siamese network architecture [11], which takes two signatures (x_1, x_2) as input and outputs a metric (distance) between them.

Recent work on signature verification relies on feature learning methods [12], [4], [13], [14], [8], in which learning is conducted directly from signature pixels, instead of relying on handcrafted feature extractors. In this case, a function φ(x) is learned to extract features from signature images x, by training with a surrogate objective, e.g. dictionary learning [15], [14], or classifying the user that produced the signatures [4]. For instance, the SigNet model [4] is a Convolutional Neural Network trained with the following objective:

L = −∑_j y_ij log P(y_j | X_i)    (1)

where X_i is a signature and y_i is the user that wrote the signature. Therefore, the network is trained to obtain a representation space where signatures from different people are linearly separable [4]. This feature representation is learned on a development dataset D, which is then used to extract features and train Writer-Dependent classifiers for a disjoint set of users (exploitation set E); a diagram of the dataset separation is shown in Figure 1. While this approach achieved state-of-the-art verification performance, we note that the feature learning process does not directly optimize for the final objective of the system, which is to distinguish genuine signatures and forgeries. This is addressed to some extent in the SigNet-F model, by also classifying whether or not the signature is a forgery. However, in that case, the neuron classifying forgeries does not use a reference signature from the user. While this was shown to be helpful in obtaining a good feature representation, this neuron did not generalize to classifying forgeries for unseen users [4].

Fig. 2: Illustration of the data available for one task (user). Left: the reference (support) set. Right: query samples.

Such methods using feature learning followed by WD classification also have other shortcomings: they require training one classifier for each user, which may be an expensive operation (e.g. the best results in [12], [4] were reported with an SVM trained with the RBF kernel for each user). If the feature extractor is updated (e.g. trained with more data), then all classifiers need to be retrained. Also, these systems use a fixed representation for all users, and it is possible that adapting the representation for each user would yield improvements in classification performance.

It is also worth noting that, for WI classification, signature verification systems can be trained jointly (feature extraction and classification) [8]. Despite being jointly trained, such WI systems still perform worse than WD classifiers trained with features learned with surrogate objectives, at least when more than one reference signature is used [4].
A possible reason for this gap is the fact that WI systems compare the query signature to each reference individually (or to the centroid of the signatures), which is less powerful than training a classifier for the user in capturing the invariances of the person's signature.
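As an illustration of this family of approaches, the following is a minimal PyTorch sketch of the surrogate objective in Eq. (1): a network trained to classify the writer of each signature, whose penultimate layer later serves as the feature extractor φ(x). All names and sizes here (SigFeatureNet, feat_dim, the dummy batch) are illustrative assumptions for this sketch, not the implementation of [4].

```python
import torch
import torch.nn as nn

class SigFeatureNet(nn.Module):
    # Backbone trained to classify the writer of each signature; after
    # training, `features` is kept and used as the extractor phi(x).
    def __init__(self, n_users, feat_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.user_head = nn.Linear(feat_dim, n_users)

    def forward(self, x):
        return self.user_head(self.features(x))

model = SigFeatureNet(n_users=531)
_ = model(torch.zeros(1, 1, 150, 220))      # materialize the lazy layer
criterion = nn.CrossEntropyLoss()           # the objective of Eq. (1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(8, 1, 150, 220)             # a batch of signature images
y = torch.randint(0, 531, (8,))             # writer identity labels
loss = criterion(model(x), y)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```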
A. Meta-learning
In a broad sense, meta-learning is concerned with the problem of learning to learn, with origins in the 80's and 90's [16], [17]. More recently, algorithms based on meta-learning have achieved state-of-the-art results in tasks such as hyperparameter optimization [18], neural network architecture search [19], and few-shot learning [20], [10]. Few-shot learning considers a scenario where only a few samples from each class are available for training, which is similar to actual application scenarios in handwritten signature verification. The goal of these meta-learning approaches for few-shot learning is to train a model that can quickly (i.e. in a few iterations) adapt to a new task using only a few samples. A new task in this context refers, for instance, to classifying a new object for which only a few samples are known. Ravi and Larochelle [20] proposed learning an optimizer and initialization for the tasks (Meta Nets); they propose using a Long Short-Term Memory (LSTM) model to learn the update rule for adapting the network parameters to a new task. Finn et al. [10] proposed a Model-Agnostic Meta-Learning (MAML) procedure that does not require any extra parameters. This model optimizes the sensitivity of the weights, that is, it obtains a feature representation that is highly adaptive, such that a single (or a few) gradient descent iterations are sufficient to optimize for new tasks.

III. PROPOSED METHOD
In this work we propose a meta-learning approach for signature verification. This formulation considers a meta-learner that guides the adaptation of a classifier for each user. We consider that each user describes a task: discriminating between genuine signatures (created by the user) and forgeries. Figure 2 illustrates the data available for one task: we consider a reference (support) dataset that is used for training a classifier that can classify new queries as genuine or forgery.

In a meta-learning setting, we consider that training a classifier for a particular user is guided by a meta-learner that leverages data from multiple tasks for learning. For this we consider a dataset D_meta-train, and then evaluate the generalization performance on unseen users D_meta-test.

Fig. 3: Example of the meta-learning setup. Each user represents an episode, where D_u is used for classifier adaptation and D′_u is used for meta-update.

TABLE I: Table of symbols

  T             Distribution of tasks (i.e. users)
  T_u           Task for user u
  D_meta-train  Training set for the meta-learner
  D_meta-test   Testing set for the meta-learner
  D_u           Samples for weight adaptation for user u
  D′_u          Samples for meta-update for user u
  G_u           Genuine signatures for user u
  S_u           Skilled forgeries for user u
  θ             Network parameters
  θ(u)_k        Parameters adapted to user u after k descent steps
  L             Loss function for weight adaptation
  L′            Loss function for meta-update

We note that this approach has a direct correspondence to previous work that used feature learning followed by WD classification (section II), and here we make the association between the terminology in meta-learning research and previous work on signature verification. In both cases we use a separate set of users for feature learning (D_meta-train is analogous to the development set in section II), which is then used to train and test classifiers on a new set of users (D_meta-test is analogous to the exploitation set). The key differences of meta-learning are that: (i) the loss optimized for feature learning is directly related to the final objective (separating genuine signatures and forgeries); (ii) training a classifier for a new user is a lightweight process (a few gradient descent iterations); (iii) not only the classifier, but also the features are adapted for each user.

In the next section we formalize the problem of signature verification as a meta-learning task.

A. Problem formulation
We consider that each user describes a task T_u ∈ T, where the task consists in classifying a signature image as genuine (created by the user) or forgery (not created by the user). A collection of users therefore describes a distribution of tasks T, and the aim of the meta-learner is to explore the structure present in this distribution. We consider a dataset D_meta-train containing tasks from T, which is used for meta-learning. For each user we consider a set D_u, which is used to adapt the classifier, and a set D′_u, which is used for updating the meta-learner. Lastly, to verify the generalization to unseen users, we consider a set D_meta-test that contains data from a disjoint set of users (D_meta-train ∩ D_meta-test = ∅). Figure 3 illustrates the meta-learning setup, and the symbols used in this paper are listed in Table I for clarity.

Fig. 4: Overview of the meta-learning system for signature verification: meta-training (Alg. 1), classifier adaptation (Alg. 2) and classification, with meta-learned weights θ and adapted weights θ′.
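To illustrate this formulation, the sketch below shows how one episode (task) could be assembled, following Figure 3 and Table I. The data layout, field names and sampling sizes are assumptions for this sketch, not the authors' code; images are assumed to be tensors of shape (1, 150, 220).

```python
import torch
from dataclasses import dataclass

@dataclass
class Episode:
    adapt_x: torch.Tensor        # D_u: images for classifier adaptation
    adapt_y: torch.Tensor        # 1 = genuine, 0 = (random) forgery
    meta_x: torch.Tensor         # D'_u: images for the meta-update
    meta_y: torch.Tensor
    meta_skilled: torch.Tensor   # marks skilled forgeries S'_u within D'_u

def make_episode(genuine, rand_forgeries, skilled, n_ref=5, n_rand=5):
    # genuine: tensor (N, 1, H, W) of one user's signatures;
    # rand_forgeries: genuine signatures of other users; skilled: S_u.
    g = genuine[torch.randperm(len(genuine))]
    # D_u: references plus random forgeries (one-class adaptation would
    # simply omit the forgery part).
    adapt_x = torch.cat([g[:n_ref], rand_forgeries[:n_rand]])
    adapt_y = torch.cat([torch.ones(n_ref), torch.zeros(n_rand)])
    # D'_u: a disjoint set of genuine signatures, random forgeries and
    # (when available for this user) skilled forgeries.
    meta_x = torch.cat([g[n_ref:2 * n_ref],
                        rand_forgeries[n_rand:2 * n_rand], skilled])
    meta_y = torch.cat([torch.ones(n_ref),
                        torch.zeros(n_rand + len(skilled))])
    meta_skilled = torch.cat([torch.zeros(n_ref + n_rand),
                              torch.ones(len(skilled))]).bool()
    return Episode(adapt_x, adapt_y, meta_x, meta_y, meta_skilled)
```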
B. Model-Agnostic Meta-Learning for signature verification

In this work we propose an extended version of Model-Agnostic Meta-Learning (MAML) [10], considering different criteria for classifier adaptation and meta-learning. An overview of the system can be seen in Figure 4. We consider a development set for meta-training, which consists in learning the weights θ of a Convolutional Neural Network that are highly adaptable to new tasks. During generalization, for a user u, a reference set D_u is used to adapt the classifier to this user (using K gradient descent steps), obtaining weights θ(u)_K. This adapted classifier is then used to classify a query image x_q, obtaining P(y = 1 | x_q, θ(u)_K).

Algorithm 1 describes the full meta-training algorithm. Meta-training is conducted in episodes (Figure 3). In each episode, the classifier is adapted to a particular user using D_u (lines 7 to 10), and the adapted classifier is used to classify the set D′_u. The loss is then back-propagated through all intermediate steps of the classifier adaptation (lines 11 and 12), and is used to update the meta-learner weights θ (line 14). Therefore, instead of having a feature representation that is directly applicable to any user, the features are learned to work well for new users after K gradient descent steps on the user's signatures. For stability during training, we train on "mini-batches" of episodes, by accumulating the gradients for M episodes before updating θ.
Algorithm 1: Meta-Training algorithm

Input: M: meta-batch size
Input: K: number of gradient descent steps
Input: α, β: learning rates
Output: θ: meta-learned weights

 1: Randomly initialize θ
 2: while not done do
 3:   Sample a batch of tasks {T_u}_{u=1}^M ∼ T
 4:   θ_grad ← 0
 5:   for u ← 1 to M do
 6:     Sample D_u                                    ▷ Genuine only
 7:     θ′_0 ← θ
 8:     for k ← 1 to K do                             ▷ Adapt weights to u
 9:       θ′_k ← θ′_{k−1} − α ∇_{θ′_{k−1}} L(D_u, θ′_{k−1})
10:     end for
11:     Sample D′_u                                   ▷ Genuine and forgeries
12:     θ_grad ← θ_grad + (1/M) ∇_θ L′(D′_u, θ′_K)
13:   end for
14:   θ ← θ − β θ_grad                                ▷ Meta-update
15: end while
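The following PyTorch sketch illustrates one meta-training step of Algorithm 1, back-propagating through the K adaptation steps (create_graph=True keeps the inner-loop graph so that the outer backward pass reaches the initial θ). It assumes the Episode structure sketched in section III-A and treats the losses L and L′ as callables; it is a simplified illustration (e.g. without the multi-step loss stabilization described in section IV), not the authors' implementation.

```python
import torch
from torch.func import functional_call

def inner_adapt(model, params, x, y, alpha, K, adapt_loss):
    # K gradient descent steps on the adaptation loss L (lines 7-10 of
    # Algorithm 1). create_graph=True keeps the inner-loop graph so the
    # meta-update can backpropagate through the whole chain (Fig. 5).
    for _ in range(K):
        logits = functional_call(model, params, (x,))
        grads = torch.autograd.grad(adapt_loss(logits, y),
                                    tuple(params.values()),
                                    create_graph=True)
        params = {name: p - alpha * g
                  for (name, p), g in zip(params.items(), grads)}
    return params

def meta_step(model, meta_opt, episodes, alpha, K, adapt_loss, meta_loss):
    # One meta-update (lines 3-14 of Algorithm 1) over a meta-batch of
    # M episodes; the M backward() calls accumulate the gradient w.r.t.
    # the initial weights theta, which the optimizer then applies.
    meta_opt.zero_grad()
    for ep in episodes:
        params = dict(model.named_parameters())          # theta
        adapted = inner_adapt(model, params, ep.adapt_x, ep.adapt_y,
                              alpha, K, adapt_loss)      # theta'_K
        logits = functional_call(model, adapted, (ep.meta_x,))
        loss = meta_loss(logits, ep.meta_y, ep.meta_skilled)
        (loss / len(episodes)).backward()
    meta_opt.step()   # theta <- theta - beta * theta_grad

# meta_opt would be, e.g., torch.optim.SGD(model.parameters(), lr=beta).
```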
Figure 5 illustrates the classifier adaptation procedure. In this work, we adapt the MAML algorithm to use different loss functions for the classifier adaptation and for the final loss (used for the meta-update). In particular, we consider a loss function L that only requires genuine signatures (from the user of interest and, as random forgeries, from other users) for the classifier adaptation, and a loss function L′ that uses both genuine signatures and forgeries. Let D_u = G_u ∪ G_{i≠u} be the training set consisting of genuine signatures from the user (G_u) and random forgeries (G_{i≠u}). We consider the following loss for classifier adaptation:

L(D_u, θ) = −(1/|G_u|) ∑_{x∈G_u} log P(y|x, θ) − (1/|G_{i≠u}|) ∑_{x∈G_{i≠u}} log P(y|x, θ)    (2)

where |G_u| and |G_{i≠u}| are the numbers of signatures in each set, which are used to correct for the imbalance between the two classes.

Let D′_u = G′_u ∪ G′_{i≠u} ∪ S′_u be a disjoint set of signatures for user u: genuine signatures (G′_u), random forgeries (G′_{i≠u}) and, if available, skilled forgeries (S′_u). We define the loss function for the meta-update as follows:

L′(D′_u, θ) = −(1/|G′_u|) ∑_{x∈G′_u} log P(y|x, θ(u)_K) − (1/|G′_{i≠u}|) ∑_{x∈G′_{i≠u}} log P(y|x, θ(u)_K) − (1/|S′_u|) ∑_{x∈S′_u} log P(y|x, θ(u)_K)    (3)

During generalization, for a new user we first adapt the weights to this user using a set of reference signatures D_u, and then classify a new query signature using the adapted weights. Algorithm 2 describes the classifier adaptation to a new user.
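As an illustration, Eqs. (2) and (3) can be implemented as class-balanced cross-entropy terms, averaging the per-sample losses within each subset. The function names and the logit-based formulation are assumptions of this sketch; these are the `adapt_loss` and `meta_loss` callables assumed in the meta-training sketch above.

```python
import torch
import torch.nn.functional as F

def balanced_adapt_loss(logits, labels):
    # L (Eq. 2): average the per-sample cross-entropy separately over
    # genuine signatures (y=1) and random forgeries (y=0), so the two
    # unbalanced subsets contribute equally to the gradient.
    bce = F.binary_cross_entropy_with_logits(
        logits.view(-1), labels.float(), reduction='none')
    genuine, forgery = bce[labels == 1], bce[labels == 0]
    loss = genuine.mean()
    if forgery.numel() > 0:          # one-class adaptation: D_u = G_u
        loss = loss + forgery.mean()
    return loss

def balanced_meta_loss(logits, labels, is_skilled):
    # L' (Eq. 3): one balanced term each for genuine signatures G'_u,
    # random forgeries G'_{i!=u} and, when present, skilled forgeries S'_u.
    bce = F.binary_cross_entropy_with_logits(
        logits.view(-1), labels.float(), reduction='none')
    loss = bce[labels == 1].mean() + bce[(labels == 0) & ~is_skilled].mean()
    if is_skilled.any():
        loss = loss + bce[is_skilled].mean()
    return loss
```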
Fig. 5: Illustration of one iteration of meta-training for one task T_u. Starting with parameters θ, the weights are specialized for the task in K gradient descent steps. Each step involves computing the loss (1), back-propagating the loss w.r.t. θ′_{k−1} (2) and updating the weights (3). For the meta-update, the loss L′ is backpropagated through the entire chain (from L′ back to the initial θ), computing ∇_θ L′(D′_u, θ(u)_K).

Algorithm 2: Classifier adaptation

Input: K: number of gradient descent steps
Input: α: learning rate
Input: θ: meta-learned weights
Input: D_u: reference set for user u
Output: θ′_K: weights adapted to the user after K steps

1: θ′_0 ← θ
2: for k ← 1 to K do
3:   θ′_k ← θ′_{k−1} − α ∇_{θ′_{k−1}} L(D_u, θ′_{k−1})
4: end for

We note that only the loss function L is used, and therefore only genuine signatures are used when adapting a classifier for a new user.
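At verification time, Algorithm 2 followed by a single forward pass could look as follows, reusing the inner_adapt and balanced_adapt_loss helpers from the earlier sketches (hypothetical names; at test time the graph-retaining inner loop of the meta-training sketch is unnecessary overhead, kept here for brevity).

```python
import torch
from torch.func import functional_call

def verify(model, references, query, alpha=0.01, K=5, threshold=0.5):
    # Algorithm 2: adapt the meta-learned weights theta to the claimed
    # user with K steps on the reference set (genuine signatures only,
    # so only the loss L is needed), then score the query signature.
    params = dict(model.named_parameters())
    labels = torch.ones(len(references))     # one-class: D_u = G_u
    adapted = inner_adapt(model, params, references, labels,
                          alpha, K, balanced_adapt_loss)
    with torch.no_grad():
        logit = functional_call(model, adapted, (query.unsqueeze(0),))
        p_genuine = torch.sigmoid(logit).item()
    return p_genuine >= threshold, p_genuine
```

Note that the adapted weights can be cached per user, trading storage for the cost of re-running the K adaptation steps at every verification.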
C. Meta-learning for one-class classification

The approach defined above can also be extended to one-class classification, where the classifier adaptation is done with only genuine signatures from the user of interest. This is easily implemented by considering D_u = G_u. It is worth noting that similarity-based methods and one-class methods that involve feature learning often suffer from the problem of collapsing representations onto a single point [21]. This is often addressed by adding a penalty in the loss function that requires dissimilar items to be far apart in the feature space. In our formulation, while the user's classifier is only trained with data from one class, we observe that training does not collapse to a single point, since the meta-training procedure directly optimizes the performance on separating forgeries in D′_u.

IV. EXPERIMENTAL PROTOCOL
We conducted most experiments on the GPDS-960 dataset [22], which consists of 881 users, with 24 genuine signatures per user and 30 skilled forgeries. We follow the same dataset separation as previous work (Figure 1), with users 350-881 as D_meta-train, users 300-350 as D_meta-val and users 0-300 as D_meta-test. We used the same pre-processing method from previous work [12], [4]: removing the background noise using OTSU's method, centering the images in a fixed-size canvas and resizing them to the network input size (Table II).

We analyze the impact of the hyperparameters on the classifier's performance, measured on D_meta-val. We consider experiments varying the following factors:
• the number of gradient descent steps K in the classifier adaptation;
• one-class classification vs. adaptation using genuine signatures and random forgeries;
• the fraction of users with skilled forgeries available for training;
• performance as we vary the number of reference genuine signatures.

We compare the results on D_meta-val with a baseline using feature learning followed by WD classification [4]. As in [4], we evaluate each model with repeated random subsampling: we randomly partition the validation set into training (D_u) and testing (D′_u), repeating the experiment 10 times with different partitions. We report the mean and standard deviation of the metrics.

In all experiments, we train the meta-classifier for a total of 100 epochs, considering a meta-batch size M = 4. We consider an initial meta-learning rate β that is reduced (with cosine annealing) by the last epoch. We used early stopping, keeping the meta-learner weights that performed best on the validation set. Following [23], we used Multi-Step Loss optimization (MSL) to improve training stability. For the first 20 epochs, instead of computing the loss function L′ only after K steps (line 12 of Algorithm 1), we compute the loss function for all intermediate θ′_k and consider a weighted average of the losses. In the first epoch the loss using each θ′_k contributes equally to the loss function, and the weights are annealed to give more weight to the last step until epoch 20, after which only the loss function at the final step K contributes to the loss. We found this procedure effective in stabilizing training (measured by the variation in validation accuracy across epochs). We also attempted to use the learnable per-step task learning rates (LSLR) described in [23], without success. Empirically, we also noticed that when using only genuine signatures, the task learning rate needs to be larger than in the case where skilled forgeries are available for training: if the fraction of users with skilled forgeries was less than 10%, we used a larger task learning rate α than for the other experiments.
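The multi-step loss schedule described above might be implemented as in the sketch below. The exact annealing shape is an assumption of this sketch (the paper follows [23]); only the property stated in the text is guaranteed: equal weights in the first epoch, and all weight on the final step K from epoch 20 onward.

```python
import torch

def msl_weights(epoch, K, anneal_epochs=20):
    # Weight w[k] multiplies L'(D'_u, theta'_{k+1}); the weighted sum
    # replaces the final-step loss of line 12 in Algorithm 1 during
    # the first `anneal_epochs` epochs of meta-training.
    if epoch >= anneal_epochs:
        w = torch.zeros(K)
        w[-1] = 1.0                               # only the final step K
    else:
        frac = epoch / anneal_epochs
        w = torch.full((K,), (1.0 - frac) / K)    # equal shares decay...
        w[-1] += frac                             # ...as mass moves to K
    return w                                      # weights sum to 1
```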
TABLE II: Base architecture used in this work

  Layer                      Size
  Input                      1x150x220
  Convolution (C1)           32x5x5
  Max Pooling                32x5x5
  Convolution (C2)           32x5x5
  Pooling                    32x5x5
  Fully Connected (FC3)      1024
  Fully Connected (FC4)      256
  Fully Connected + Sigmoid  1

In order to evaluate the transferability of the features to other operating conditions, we conducted experiments on other datasets (that were collected in different regions and followed different collection processes): MCYT-75 [24], CEDAR [25] and the Brazilian PUC-PR [26]. We conducted two experiments: (i) using the meta-learner trained on GPDS directly for new users of these datasets; (ii) training a meta-learner with data from the four datasets. It is worth noting that, with the exception of GPDS, the datasets are relatively small, with 55, 75 and 60 users for CEDAR, MCYT and Brazilian PUC-PR, respectively. We observed that the formulations from this work require a large number of users for training, and for this reason we conducted 10-fold cross-validation. We divide each dataset into 10 folds (by users), and for each run we consider 1 fold as meta-test and the remaining folds for meta-training and validation. As in the previous experiments, we further use repeated subsampling for evaluating the adaptation to the new users. In total, for experiment (ii), we train 10 CNN models and perform 10 adaptations for each user. We report the mean error rates over all runs, and the standard deviation across the 10 different adaptations (each based on different train/test splits of the repeated subsampling).

The CNN architecture used in the experiments is listed in Table II. We found that using a smaller network, compared to previous work using feature learning followed by WD classification, was successful in the meta-learning setting. This network has a total of 1.4M weights and uses 0.1 GFLOPS for forward propagation, while SigNet [4] has 15.8M weights and uses 0.6 GFLOPS. That is, the CNN used in this work is 10x smaller and 6x faster.

We evaluate the performance using the following metrics: False Rejection Rate (FRR), the fraction of genuine signatures rejected as forgeries; False Acceptance Rate (FAR_random and FAR_skilled), the fraction of forgeries accepted as genuine (considering random forgeries and skilled forgeries, respectively). We also report the Equal Error Rate (EER), which is the error when FAR = FRR. We considered two ways of calculating the EER: EER global-τ, using a global decision threshold, and EER user-τ, using user-specific decision thresholds. In both cases, to calculate the Equal Error Rate we only considered skilled forgeries. For FRR and FAR, we report the values with a threshold of 0.5 (i.e. if P(y = 1 | x, θ′_K) ≥ 0.5 we consider the model to predict x as a genuine signature).
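These metrics can be computed from the classifier scores as in the sketch below (illustrative helper names; user-specific thresholds repeat the same EER search per user before averaging).

```python
import numpy as np

def frr_far(genuine_scores, forgery_scores, threshold=0.5):
    # FRR: fraction of genuine signatures rejected;
    # FAR: fraction of forgeries accepted.
    frr = np.mean(genuine_scores < threshold)
    far = np.mean(forgery_scores >= threshold)
    return frr, far

def eer_global(genuine_scores, skilled_scores):
    # EER with a global threshold: scan candidate thresholds over the
    # pooled scores and return the error where FAR and FRR are closest.
    best_gap, eer = np.inf, None
    for t in np.sort(np.concatenate([genuine_scores, skilled_scores])):
        frr, far = frr_far(genuine_scores, skilled_scores, t)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```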
Fig. 6: ROC curves on D_meta-val comparing the one-class and two-class formulations with the baselines (EER: meta-learning one-class 3.48, meta-learning two-class 2.86, SigNet-F* 4.60, SigNet* 17.03).

V. RESULTS
A. System design
In this section we report the results on D_meta-val (GPDS users 300-350), considering the experiments defined in section IV. The objective is to evaluate different aspects of the system, such as the number of gradient steps (which trades off computational complexity and accuracy), as well as to investigate the performance of the model in different data scenarios.

In a first experiment we consider the results of the one-class formulation and the two-class formulation as we vary the number of random forgeries used for classifier adaptation (K = 5 gradient descent steps); for meta-training we considered that skilled forgeries were available on D_meta-train (users 350-881). Note that for validation, no skilled forgeries were used for training. Table III reports the results of these experiments. We observe similar verification performance for the two formulations. Note that the formulation using random forgeries is more computationally expensive, as the classifier adaptation involves a larger batch of images (e.g. the one-class adaptation loss uses only the 5 reference images, while the two-class adaptation additionally uses the r random forgeries). The baselines SigNet* and SigNet-F* used the same approach proposed in [4], but with the CNN architecture defined for this work (Table II). We note that the meta-learning formulation performed much better, while being a simpler model (a single model for all users). A comparison with the SigNet CNN architecture from [4] is conducted in section V-B, where we compare to the state-of-the-art. Figure 6 presents ROC curves for the one-class and two-class formulations, along with the baselines.

In a second experiment, we vary the number of update steps K (Figure 7). For each value of K, we meta-trained a network and evaluated its performance on D_meta-val. We observed improved performance with a larger number of steps, but with diminishing returns. We note a high variance of the errors in these experiments, and therefore we cannot determine a particular K as being optimal. As we increase the number of steps, we also increase the computational cost. If we consider that forward propagation and backward propagation have a similar cost, the classifier adaptation for a new user takes about 2K times the time of a single forward pass. A higher K also requires more memory (in the order of K) during meta-training, since the whole update sequence needs to be stored in memory in order to compute the gradient for the meta-update (as can be seen in Figure 5).

TABLE III: Performance on D_meta-val with one-class and two-class formulations (errors in %)

  Type                     #Genuine  #Random  FRR    FAR_random  FAR_skilled  EER (global τ)  EER (user τ)
  SigNet* + WD             5         7434     10.48  0.03        24.67        17.03           13.17
  SigNet-F* + WD           5         7434     18.08  0.16        1.55         4.6             3.08
  Meta-learning One-class  5         -        2.54   2.74        4.24         3.48            1.69
  Meta-learning Two-class  5         5        2.82   1.98        4.18         3.8             2.04
  Meta-learning Two-class  5         10       5.1    1.94        2.66         3.56            1.85
  Meta-learning Two-class  5         20       2.84   1.98        3.1          2.86            1.78
  Meta-learning Two-class  5         30       2.62   2.48        3.46         3.17            1.4

Fig. 7: Performance on D_meta-val as we vary the number of update steps K.

Fig. 8: Performance on D_meta-val as we vary the number of users in D_meta-train for which skilled forgeries are available.
Fig. 9: Performance on D_meta-val as we vary the number of users available for meta-training, with 0%, 10%, 50% and 100% of users having skilled forgeries. (a): one-class formulation; (b): two-class formulation.

In Figures 8 and 9 we analyze the impact on performance as we vary the size of the D_meta-train set. As noted in section III-B, if skilled forgeries from a subset of users are available, we can incorporate them into the meta-update loss function L′. In this experiment we considered that D_meta-train contains all 531 users, and varied the number of users for which skilled forgeries are available. For each case, we build a dataset consisting of genuine signatures for all users and skilled forgeries for the selected users, and train a model. Figure 8 shows the performance as we vary the number of users for which skilled forgeries are available. We re-iterate that we evaluate the performance on a disjoint set of users (D_meta-val), for which only genuine signatures are used.
TABLE IV: Comparison with the state-of-the-art on the GPDS dataset (errors in %)

  Reference            Type   Dataset   #Ref  Model                      EER
  Hafemann et al. [4]  WD     GPDS-300  5     SigNet-F (global τ)        5.25
  Hafemann et al. [4]  WD     GPDS-300  5     SigNet-F (user τ)          2.42
  Hafemann et al. [4]  WD     GPDS-300  12    SigNet-F (global τ)        3.74
  Hafemann et al. [4]  WD     GPDS-300  12    SigNet-F (user τ)          1.69
  Souza et al. [30]    WI     GPDS-300  5     SigNet (global τ)          9.05
  Souza et al. [30]    WI     GPDS-300  5     SigNet (user τ)            4.40
  Souza et al. [30]    WI     GPDS-300  12    SigNet (global τ)          7.96
  Souza et al. [30]    WI     GPDS-300  12    SigNet (user τ)            3.34
  Present work         WI/WD  GPDS-300  5     MAML one-class (global τ)  5.52
  Present work         WI/WD  GPDS-300  5     MAML one-class (user τ)    3.35
  Present work         WI/WD  GPDS-300  5     MAML two-class (global τ)  5.16
  Present work         WI/WD  GPDS-300  5     MAML two-class (user τ)    2.94
  Present work         WI/WD  GPDS-300  12    MAML one-class (global τ)  4.70
  Present work         WI/WD  GPDS-300  12    MAML one-class (user τ)    2.93
  Present work         WI/WD  GPDS-300  12    MAML two-class (global τ)  4.39
  Present work         WI/WD  GPDS-300  12    MAML two-class (user τ)    2.68

We observed that the meta-learning formulation of the problem is well suited to incorporating information from skilled forgeries (when they are available), and that this generalizes well to unseen users, for which we only have genuine signatures. However, we observed that the performance is not very good when there are only genuine signatures for meta-training: the one-class formulation achieves 14.15% EER when only genuine signatures are available, and 3.48% EER when skilled forgeries are available for all 531 users in meta-training.

In Figure 9, we evaluate the performance of the system as we vary the number of users in D_meta-train. We also consider 4 levels of availability of skilled forgeries in the meta-training set: 0% (genuine only), 10%, 50% and 100%, where the percentages refer to the number of users for which skilled forgeries are available (e.g. 10% with 100 users means that forgeries for 10 users are considered, while the remaining 90 users have only genuine signatures). For a given number of users and skilled-forgery percentage, we construct a dataset with randomly selected users (taken from the 531 users in the development set), with genuine signatures from all the selected users and skilled forgeries for a fraction of the users. We then use this dataset for meta-training a model, and evaluate its performance on D_meta-val. We observed improved performance both as more users are available for meta-training and as more knowledge of skilled forgeries is available. Most surprisingly, we observed that for the two-class formulation, a classifier trained with 100 users with 100% forgeries (i.e. forgeries for every user in meta-training) performed better than a model trained with 531 users with forgeries for only 100 users (comparing Figures 9b and 8): 6.07% EER vs 9.14% EER. We re-iterate that this measures the performance on discriminating genuine signatures and skilled forgeries; the model that has access to more users (with the same number of users with skilled forgeries) has better performance on discriminating random forgeries, since its optimization consisted mostly of this problem.

B. Comparison with the state-of-the-art
We now compare our results with the state-of-the-art on the GPDS-300 dataset. For these comparisons, we considered a model trained with the one-class formulation, and a model trained with the two-class formulation with r = 30 random forgeries. In both cases, we used the whole dataset D_meta-train for training the meta-classifier, and used 5 genuine signatures for classifier adaptation, with K = 5 updates. While training was conducted with 5 reference signatures, we evaluate the performance of the system with different numbers of references.

Table IV compares our results with the state-of-the-art. We observe an improved performance compared to other WI systems, achieving 5.16% EER (global τ) with 5 reference signatures, compared to 9.05% from [30]. Compared to WD systems, we observed similar performance in some scenarios (5 reference signatures), and worse results otherwise. With 12 reference signatures, the proposed system obtained 4.39% EER (global τ), compared to 3.74% for the WD system [4]. However, the proposed system is more scalable, as a single model is stored for all users.

Figure 10 shows the performance on GPDS-300 as we vary the number of reference samples available for each user. As commonly observed in WD systems (e.g. [4]), the performance greatly improves as more reference samples are available for training: for the one-class formulation, performance with a single reference is 9.09% EER (global τ) and 5.81% EER (user τ). With 12 references, we obtain 4.70% EER (global τ) and 2.93% EER (user τ).

Fig. 10: Performance on GPDS-300 (EER with global and user thresholds) as we vary the number of reference signatures available for each user. (a): one-class formulation; (b): two-class formulation.

TABLE V: Transfer performance to the other datasets
  Target Dataset    Training Dataset  EER (global)  EER (user)
  MCYT              GPDS              15.48
  MCYT              All datasets
  CEDAR             GPDS
  CEDAR             All datasets
  Brazilian PUC-PR  GPDS
  Brazilian PUC-PR  All datasets

TABLE VI: Comparison with the state-of-the-art on MCYT (errors in %)
  Reference                                EER (global τ)  EER (user τ)
  Hafemann et al. [4] (SigNet, WD)         3.58            2.87
  Present work (meta-learner, GPDS)        15.37           12.77
  Present work (meta-learner, all data)    14.50           12.44

C. Transfer to other datasets
We now consider results on the three other datasets: MCYT, CEDAR and the Brazilian PUC-PR. Table V shows the performance in two scenarios: (i) the meta-learner trained only on GPDS, evaluating its generalization to new operating conditions; and (ii) the meta-learner trained on all four datasets (using 10-fold cross-validation, as described in section IV). While the method generalized well to unseen GPDS users, we see that the generalization performance to other datasets is much worse. Furthermore, we notice that even when training with a subset of users from all datasets, the performance does not improve for all datasets. A possible explanation is that the GPDS dataset is still much larger (10 times larger than the others) and dominates training.

TABLE VII: Comparison with the state-of-the-art on CEDAR (errors in %)
  Reference                                EER (global τ)  EER (user τ)
  Present work (meta-learner, GPDS)        11.06           8.27
  Present work (meta-learner, all data)    10.21           7.07

TABLE VIII: Comparison with the state-of-the-art on the Brazilian PUC-PR dataset (errors in %)
  Reference              Type  #Refs  Features / Model  AER genuine + skilled / EER
  Bertolini et al. [37]  WI    15     Graphometric      8.32
  Batista et al. [38]    WD    30     Pixel density     10.5
  Rivard et al. [9]      WI    15     ESC + DPDF        11.08
  Eskander et al. [7]    WD    30     ESC + DPDF        10.67
  Hafemann et al. [4]    WD    5      SigNet (user τ)   2.92

Overall, this suggests that the proposed method requires a large amount of data from the target application, and is sensitive to changes in operating conditions. Finally, Tables VI, VII and VIII compare the results with the state-of-the-art on MCYT, CEDAR and the Brazilian PUC-PR, respectively.

It is worth noting that the meta-learner does generalize to new users of the GPDS dataset, as verified in sections V-A and V-B, since we evaluate on a D_meta-test that contains a set of users disjoint from those used to train the meta-learner. What we observed, however, is that this meta-learner does not transfer well to other datasets. This has been observed in more recent work on meta-learning [39], which shows that although these models perform well for new classes from the same distribution (e.g. the same dataset), the performance deteriorates when evaluating on new datasets (i.e. a shift in the task distribution). This is still an active area of research in meta-learning.

VI. CONCLUSION
In this paper we proposed to formulate signature verification as a meta-learning problem, where each user defines a task. This formulation enables directly optimizing for the objective (separating genuine signatures and forgeries) even when forgeries are not available for all users. The resulting system is scalable and yet adaptable to individual users: a single meta-classifier is learned and stored, and for the verification of a given signature, the classifier is adapted to the claimed user and subsequently used for verification. The proposed method is also able to naturally incorporate new reference signatures for a user, and enables adapting the representation as more training data becomes available. The drawbacks of this solution are twofold: increased computational cost and worse transferability to new conditions. The method is about 2K times slower when using K updates for the classifier adaptation, although it allows the option to trade storage for computational cost: the adapted weights for a given user can be stored for faster classification.

In our experiments with the GPDS-960 dataset, the proposed method obtains better results than WI systems in the literature, and approaches the performance of WD systems, especially when few samples are available for training. With 5 reference signatures, the proposed method obtains 5.16% EER (using a global threshold), compared to 9.05% for a WI system and 5.25% for a WD system. For a larger number of references the WD system still performs better, but the gap in performance is greatly reduced: considering 12 reference signatures, the method obtains 4.39% EER (with a global threshold) vs 3.74% for the WD system, while being more scalable (a single meta-classifier). Our experiments transferring the meta-learner to other datasets show reduced performance, highlighting the need for better adaptation to new conditions, which will be explored in future work. Future work also includes considering a dynamic scenario, where the meta-classifier is updated as new training data becomes available.

REFERENCES

[1] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, "Offline handwritten signature verification — Literature review," in International Conference on Image Processing Theory, Tools and Applications (IPTA), Nov. 2017, pp. 1–8.
[2] J. Vargas, C. Travieso, J. Alonso, and M. Ferrer, "Off-line Signature Verification Based on Gray Level Information Using Wavelet Transform and Texture Features," Nov. 2010, pp. 587–592.
[3] M. B. Yılmaz and B. Yanıkoğlu, "Score level fusion of classifiers in off-line signature verification," Information Fusion, vol. 32, Part B, pp. 109–119, Nov. 2016.
[4] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, "Learning features for offline handwritten signature verification using deep convolutional neural networks," Pattern Recognition, vol. 70, pp. 163–176, Oct. 2017.
[5] Y. Guerbai, Y. Chibani, and B. Hadjadji, "The effective use of the one-class SVM classifier for handwritten signature verification based on writer-independent parameters," Pattern Recognition, vol. 48, no. 1, pp. 103–113, Jan. 2015.
[6] R. Kumar, J. D. Sharma, and B. Chanda, "Writer-independent off-line signature verification using surroundedness feature," Pattern Recognition Letters, vol. 33, no. 3, pp. 301–308, Feb. 2012.
[7] G. Eskander, R. Sabourin, and E. Granger, "Hybrid writer-independent-writer-dependent offline signature verification system," IET Biometrics, vol. 2, no. 4, pp. 169–181, Dec. 2013.
[8] H. Rantzsch, H. Yang, and C. Meinel, "Signature embedding: Writer independent offline signature verification with deep metric learning," in Advances in Visual Computing, ser. Lecture Notes in Computer Science. Springer International Publishing, pp. 616–625, DOI: 10.1007/978-3-319-50832-0_60.
[9] D. Rivard, E. Granger, and R. Sabourin, "Multi-feature extraction and selection in writer-independent off-line signature verification," International Journal on Document Analysis and Recognition, vol. 16, no. 1, pp. 83–103, 2013.
[10] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in Proceedings of the 34th International Conference on Machine Learning, vol. 70. International Convention Centre, Sydney, Australia: PMLR, 06–11 Aug 2017, pp. 1126–1135.
[11] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, "Signature Verification using a 'Siamese' Time Delay Neural Network," 1994.
[12] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, "Writer-independent feature learning for offline signature verification using convolutional neural networks," in Neural Networks, The 2016 International Joint Conference on, 2016.
[13] L. G. Hafemann, L. S. Oliveira, and R. Sabourin, "Fixed-sized representation learning from offline handwritten signatures of different sizes," International Journal on Document Analysis and Recognition (IJDAR), pp. 1–14, 2018.
[14] E. N. Zois, M. Papagiannopoulou, D. Tsourounis, and G. Economou, "Hierarchical Dictionary Learning and Sparse Coding for Static Signature Verification," p. 11.
[15] E. N. Zois, I. Theodorakopoulos, D. Tsourounis, and G. Economou, "Parsimonious Coding and Verification of Offline Handwritten Signatures," Jul. 2017, pp. 636–645.
[16] J. Schmidhuber, "Evolutionary principles in self-referential learning," PhD Thesis, Technische Universität München, München, 1987.
[17] Y. Bengio, S. Bengio, and J. Cloutier, "Learning a synaptic learning rule," in IJCNN-91-Seattle International Joint Conference on Neural Networks, vol. ii, Jul. 1991, pp. 969 vol. 2.
[18] D. Maclaurin, D. Duvenaud, and R. Adams, "Gradient-based hyperparameter optimization through reversible learning," in Proceedings of the 32nd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, F. Bach and D. Blei, Eds., vol. 37. Lille, France: PMLR, 07–09 Jul 2015, pp. 2113–2122.
[19] B. Baker, O. Gupta, N. Naik, and R. Raskar, "Designing Neural Network Architectures using Reinforcement Learning," in International Conference on Learning Representations, 2017.
[20] S. Ravi and H. Larochelle, "Optimization as a model for few-shot learning," 2016.
[21] P. Perera and V. M. Patel, "Learning Deep Features for One-Class Classification," arXiv:1801.05365 [cs], Jan. 2018.
[22] J. Vargas, M. Ferrer, C. Travieso, and J. Alonso, "Off-line Handwritten Signature GPDS-960 Corpus," in Document Analysis and Recognition, 9th International Conference on, vol. 2, Sep. 2007, pp. 764–768.
[23] A. Antoniou, H. Edwards, and A. Storkey, "How to train your MAML," in International Conference on Learning Representations, 2019.
[24] J. Ortega-Garcia, J. Fierrez-Aguilar, D. Simon, J. Gonzalez, M. Faundez-Zanuy, V. Espinosa, A. Satue, I. Hernaez, J.-J. Igarza, C. Vivaracho, and others, "MCYT baseline corpus: a bimodal biometric database," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, no. 6, pp. 395–401, 2003.
[25] M. K. Kalera, S. Srihari, and A. Xu, "Offline signature verification and identification using distance statistics," International Journal of Pattern Recognition and Artificial Intelligence, vol. 18, no. 07, pp. 1339–1360, Nov. 2004.
[26] C. Freitas, M. Morita, L. Oliveira, E. Justino, A. Yacoubi, E. Lethelier, F. Bortolozzi, and R. Sabourin, "Bases de dados de cheques bancarios brasileiros," in XXVI Conferencia Latinoamericana de Informatica, 2000.
[27] J. Hu and Y. Chen, "Offline Signature Verification Using Real Adaboost Classifier Combination of Pseudo-dynamic Features," in Document Analysis and Recognition, 12th International Conference on, Aug. 2013, pp. 1345–1349.
[28] Y. Serdouk, H. Nemmour, and Y. Chibani, "New gradient features for off-line handwritten signature verification," Sep. 2015, pp. 1–4.
[29] A. Soleimani, B. N. Araabi, and K. Fouladi, "Deep Multitask Metric Learning for Offline Signature Verification," Pattern Recognition Letters, vol. 80, pp. 84–90, Sep. 2016.
[30] V. L. F. Souza, A. L. I. Oliveira, and R. Sabourin, "A Writer-Independent Approach for Offline Signature Verification using Deep Convolutional Neural Networks Features," Oct. 2018, pp. 212–217.
[31] J. Wen, B. Fang, Y. Y. Tang, and T. Zhang, "Model-based signature verification with rotation invariant features," Pattern Recognition, vol. 42, no. 7, pp. 1458–1466, Jul. 2009.
[32] J. F. Vargas, M. A. Ferrer, C. M. Travieso, and J. B. Alonso, "Off-line signature verification based on grey level information using texture features," Pattern Recognition, vol. 44, no. 2, pp. 375–385, Feb. 2011.
[33] S. Y. Ooi, A. B. J. Teoh, Y. H. Pang, and B. Y. Hiew, "Image-based handwritten signature verification using hybrid methods of discrete Radon transform, principal component analysis and probabilistic neural network," Applied Soft Computing, vol. 40, pp. 274–282, 2016.
[34] S. Chen and S. Srihari, "A New Off-line Signature Verification Method based on Graph," vol. 2, 2006, pp. 869–872.
[35] R. Kumar, L. Kundu, B. Chanda, and J. D. Sharma, "A Writer-independent Off-line Signature Verification System Based on Signature Morphology," in Proceedings of the First International Conference on Intelligent Interactive Technologies and Multimedia, ser. IITM '10. New York, NY, USA: ACM, 2010, pp. 261–265.
[36] R. Bharathi and B. Shekar, "Off-line signature verification based on chain code histogram and Support Vector Machine," Aug. 2013, pp. 2063–2068.
[37] D. Bertolini, L. S. Oliveira, E. Justino, and R. Sabourin, "Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers," Pattern Recognition, vol. 43, no. 1, pp. 387–396, Jan. 2010.
[38] L. Batista, E. Granger, and R. Sabourin, "Dynamic selection of generative-discriminative ensembles for off-line signature verification," Pattern Recognition, vol. 45, no. 4, pp. 1326–1340, Apr. 2012.
[39] E. Triantafillou, T. Zhu, V. Dumoulin, P. Lamblin, K. Xu, R. Goroshin, C. Gelada, K. Swersky, P.-A. Manzagol, and H. Larochelle, "Meta-dataset: A dataset of datasets for learning to learn from few examples," arXiv preprint arXiv:1903.03096, 2019.
Luiz G. Hafemann received his B.S. degree in Computer Science in 2008 and his M.Sc. degree in Informatics in 2014, both from the Federal University of Paraná, Curitiba, PR, Brazil. He received his Ph.D. degree in Systems Engineering in 2019 from the École de Technologie Supérieure, Université du Québec, in Montreal, QC, Canada. He is currently a researcher at Sportlogiq, applying computer vision models for sports analytics. His current interests include meta-learning, adversarial machine learning and group activity recognition.

Luiz S. Oliveira received his B.S. degree in Computer Science from Unicenp, Curitiba, PR, Brazil, the M.Sc. degree in electrical engineering and industrial informatics from the Centro Federal de Educação Tecnológica do Paraná (CEFET-PR), Curitiba, PR, Brazil, and the Ph.D. degree in Computer Science from the École de Technologie Supérieure, Université du Québec, in 1995, 1998 and 2003, respectively. From 2004 to 2009 he was a professor of the Computer Science Department at the Pontifical Catholic University of Paraná, Curitiba, PR, Brazil. In 2009, he joined the Federal University of Paraná, Curitiba, PR, Brazil, where he is a professor of the Department of Informatics and head of the Graduate Program in Computer Science. His current interests include Pattern Recognition, Machine Learning, Image Analysis, and Evolutionary Computation.