Learning Meta Model for Zero- and Few-shot Face Anti-spoofing
Yunxiao Qin, Chenxu Zhao, Xiangyu Zhu, Zezheng Wang, Zitong Yu, Tianyu Fu, Feng Zhou, Jingping Shi, Zhen Lei
Northwestern Polytechnical University, Xi'an, China
AIBEE, Beijing, China
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
CMVS, University of Oulu, Oulu, Finland
Winsense Technology Ltd, Beijing, China
[email protected], {cxzhao; zezhengwang; fzhou}@aibee.com, {xiangyu.zhu; zlei}@nlpr.ia.ac.cn, [email protected], [email protected], [email protected]

Abstract
Face anti-spoofing is crucial to the security of face recognition systems. Most previous methods formulate face anti-spoofing as a supervised learning problem to detect various predefined presentation attacks, which needs large-scale training data to cover as many attacks as possible. However, the trained model easily overfits several common attacks and remains vulnerable to unseen attacks. To overcome this challenge, the detector should: 1) learn discriminative features that can generalize to unseen spoofing types from predefined presentation attacks; 2) quickly adapt to new spoofing types by learning from both the predefined attacks and a few examples of the new spoofing types. Therefore, we define face anti-spoofing as a zero- and few-shot learning problem. In this paper, we propose a novel Adaptive Inner-update Meta Face Anti-Spoofing (AIM-FAS) method to tackle this problem through meta-learning. Specifically, AIM-FAS trains a meta-learner focusing on the task of detecting unseen spoofing types by learning from predefined living and spoofing faces and a few examples of new attacks. To assess the proposed approach, we propose several benchmarks for zero- and few-shot FAS. Experiments show its superior performance on the presented benchmarks over existing methods under existing zero-shot FAS protocols.

Introduction

Face recognition is a ubiquitous technology used in industrial applications and commercial products. However, face recognition systems are easily fooled by presentation attacks (PAs), such as a printed face (print attack), a face replayed on a digital device (replay attack), a face covered by a mask (3D-mask attack), etc. As a result, a face anti-spoofing (FAS) system, which detects whether the presented face is live or not, becomes essential to keep the recognition system safe. Until now, researchers have proposed many hand-crafted-feature based (Boulkenafet, Komulainen, and Hadid 2016; Gan et al. 2017; Lucena et al. 2017) and deep-learning based methods (Lucena et al. 2017; Xu, Li, and Deng 2015; Shao, Lan, and Yuen 2017) to discriminate spoofing faces from living faces. Most of them train the detector to learn how to

∗ Corresponding Author.
Figure 1: A zero- and few-shot FAS example. The train set contains several predefined living and spoofing types. The test set contains several faces of newly emerged living and spoofing types. Zero-shot FAS trains the detector only on the train set and tests it on the test set. Few-shot FAS utilizes both the train set and a few collected faces (the blue box) to update the detector.

discriminate living and spoofing faces from numerous predefined living and spoofing faces in a supervised way. The detectors are satisfactory at detecting the predefined PAs due to their data-driven training manner. However, when deployed in real scenarios, FAS systems encounter the following practical challenges.
• A variety of application scenarios and unpredictable novel PAs keep evolving. Data-driven models may give unpredictable results when faced with out-of-distribution living examples captured in new application scenarios and spoofing examples with new PAs.
• When adapting the anti-spoofing model to new attacks, existing methods need to collect sufficient samples for training. However, it is expensive to collect labeled data for every new attack since spoofing keeps evolving.
To overcome these challenges, we propose that FAS should be treated as an open-set zero- and few-shot learning problem. As shown in Fig. 1, zero-shot learning aims to learn, from the predefined PAs, general discriminative features that are effective for detecting unpredicted new PAs.
Few-shot learning aims to quickly adapt the anti-spoofing model to new attacks by learning from both the predefined PAs and very few collected examples of the new attack. The zero-shot FAS problem has been studied in (Liu et al. 2019; Arashloo, Kittler, and Christmas 2017), neglecting the few-shot scene. As mentioned above, the FAS detector should solve both zero- and few-shot FAS problems. To this end, inspired by model-agnostic meta-learning (MAML) (Finn, Abbeel, and Levine 2017), we propose a novel meta-learning based FAS method: Adaptive Inner-update Meta Face Anti-spoofing (AIM-FAS).

AIM-FAS solves the zero- and few-shot FAS problem by Fusion Training (FT) a meta-learner on zero- and few-shot FAS tasks with an Adaptive Inner-Update (AIU) learning rate strategy. FT means the meta-learner is forced to focus on simultaneously learning: 1) general discriminative features to detect unseen PAs from predefined PAs, if no instance of the new PA has been collected; 2) better discriminative features to adapt to a new PA from both the predefined PAs and the few instances of the new PA, once a few instances of the new PA are collected. AIU means the meta-learner inner-updates with a learnable, regular inner-update step size.

To evaluate zero- and few-shot FAS, we propose three benchmarks to assess the FAS model's capability of learning to detect new PAs from the same domain, from different domains, and from different modalities.

The main contributions of this paper are:
• To the best of our knowledge, we are the first to formulate FAS as a zero- and few-shot learning problem.
• To solve the zero- and few-shot FAS problem, we propose a novel meta-learning based approach: Adaptive Inner-update Meta Face Anti-spoofing (AIM-FAS), which Fusion Trains (FT) a meta-learner on zero- and few-shot FAS tasks with a newly developed Adaptive Inner-Update (AIU) strategy.
• We propose three novel zero- and few-shot FAS benchmarks to validate the efficacy of AIM-FAS.
• Comprehensive experiments show that AIM-FAS achieves state-of-the-art results on zero- and few-shot anti-spoofing benchmarks.
Related Work

Face Anti-spoofing

Traditional FAS methods (de Freitas Pereira et al. 2012; 2013; Määttä, Hadid, and Pietikäinen 2011; Patel, Han, and Jain 2016b; Boulkenafet, Komulainen, and Hadid 2017; Komulainen, Hadid, and Pietikainen 2013) usually extract hand-crafted features from facial images and train a binary classifier to detect spoofing faces. Recently, deep learning based FAS methods (Lucena et al. 2017; Nagpal and Dubey 2018; Li et al. 2016; Patel, Han, and Jain 2016a) have attracted more attention. These methods commonly train a deep network to learn static discrimination between living and spoofing faces, with binary classification or depth regression supervision. Recent research shows that depth regression supervised methods (Atoum et al. 2017; Liu, Jourabloo, and Liu 2018) outperform binary classification methods, mainly because they provide the network with more detailed information for studying spoofing cues. However, both traditional and deep learning based approaches remain sensitive to various conditions, such as illumination, blur pattern, capture camera, and presentation attack instruments. A slight change in these conditions can significantly affect the performance of the FAS detector.

Few-shot Learning

Few-shot learning (Vinyals et al. 2016; Snell, Swersky, and Zemel 2017), which aims at learning from very few instances, has attracted a lot of attention. Metric learning based methods are popular for solving the few-shot learning problem. These methods train a non-linear mapping function projecting images into an embedding space, and classify an image with a nearest-neighbor or linear classifier. Recently, meta-learning based methods (Bengio et al. 1992; Finn, Abbeel, and Levine 2017; Nichol, Achiam, and Schulman 2018; Duan et al. 2016; Mishra et al. 2018; Grant et al. 2018; Qin et al. 2018) solve few-shot learning by training a meta-learner on few-shot learning tasks. Given a few examples of new object categories, these methods train the meta-learner to recognize the new categories by memorizing the few examples of the new categories (Mishra et al. 2018; Duan et al. 2016) or by updating its weights (Finn, Abbeel, and Levine 2017; Nichol, Achiam, and Schulman 2018; Qin et al. 2018).

A few-shot learning task is usually referred to as an N-way K-shot learning task, which contains N unseen categories for the model to recognize. Compared to the conventional classification problem, each way in the task has a relatively small number (K) of labeled examples provided for training. In a nutshell, an N-way K-shot task provides a support set of N×K labeled examples for the model to learn from. In evaluation, a query set that contains several other examples from the N unseen categories is used to test the model.

Zero-shot Learning

Zero-shot learning aims to recognize an unseen category with only a description or semantic attributes of the new category. Similar to metric learning based few-shot learning, traditional zero-shot learning methods train a model to learn a visual-semantic embedding (Lampert, Nickisch, and Harmeling 2009; 2014; Norouzi et al. 2013). Once the embedding is trained, an instance of an unseen class can be classified in two steps. First, the instance is projected into the semantic space. Second, it is labeled with the class that has the most similar semantic attributes.

For a zero-shot learning task, the model is required to recognize unseen categories by learning only from the description or semantic information of these unseen categories. In other words, the support set of a zero-shot learning task contains only the description or semantic information of these unseen categories. In this paper, we aim to solve both the zero- and few-shot FAS problems simultaneously.
Approach

In this section, we detail the proposed Adaptive Inner-update Meta Face Anti-spoofing (AIM-FAS) method.

Figure 2: (a) The fine-grained FAS dataset contains several living (L1, L2, ..., Ll) and spoofing (S1, S2, ..., Ss) categories, and generates N zero- and few-shot FAS tasks. (b) The meta-learner inner-updates itself on the support set for u steps (the pink arrow), updating its weight θ to θ^(u). We then obtain the meta-learner's zero- and few-shot learning performance and meta-learning loss by testing the updated meta-learner on the query set. Finally, we optimize the meta-learner with the meta-learning loss. The Lj(Q) in the query set of the K-shot FAS task means the query set contains Q faces from the living face category Lj.

Zero- and few-shot FAS task
We propose that there exist general discriminative features among predefined PAs and unpredicted new PAs. In other words, the knowledge in predefined living and spoofing faces can be transferred to detect new living faces (e.g., living faces recorded in new application scenarios) and new spoofing types. Therefore, we define the zero- and few-shot FAS task differently from the traditional zero- and few-shot learning task. In zero-shot FAS, the model learns features to recognize new living and spoofing categories from predefined living and spoofing categories; the support set of a zero-shot FAS task contains only predefined living and spoofing faces. In a few-shot FAS task, the model learns features to detect new spoofing types not only from the predefined types but also from a few examples of new living and spoofing types; the support set of a few-shot FAS task contains faces of both new living and spoofing types and predefined types.
Task generation
To generate zero- and few-shot FAS tasks, we split the living and spoofing faces in a fine-grained manner, and show the fine-grained dataset structure in Fig. 2(a). We show an example of a K-shot FAS task in Fig. 2(b), and generate the K-shot (K ≥ 0) FAS tasks in the following way: 1) sample one fine-grained living category Li and one spoofing category Sm from the train set; 2) sample M − K faces from each of Li and Sm; 3) sample another fine-grained living category Lj and spoofing category Sn (for training tasks, Lj and Sn are sampled from the train set; for testing tasks, they are sampled from the test set); 4) sample K + Q faces from each of Lj and Sn; 5) build the query set with Q faces each from Lj and Sn, and build the support set with the other 2(M − K) + 2K = 2M faces. In other words, Li and Sm can be seen as the predefined categories, and Lj and Sn can be seen as the newly emerged categories. In this way, we generate both zero-shot and few-shot learning tasks. When K = 0 (zero-shot FAS), the meta-learner learns from Li and Sm, and predicts faces from Lj and Sn. When K > 0 (few-shot FAS), the meta-learner learns from Li, Lj, Sm and Sn, and predicts faces from Lj and Sn.

To tackle the zero- and few-shot FAS problem, we develop our Adaptive Inner-update Meta Face Anti-Spoofing (AIM-FAS) by training a meta-learner on zero- and few-shot FAS training tasks. Furthermore, an Adaptive Inner-Update (AIU) strategy is presented to improve performance further, as the meta-learner inner-updates more accurately on the support set with AIU. Specifically, on a given zero- or few-shot FAS task, one training iteration of the meta-learner consists of two stages.
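The five sampling steps above can be sketched as follows (a minimal illustration with hypothetical list-based categories; this is not the authors' implementation):

```python
import random

def generate_task(living_cats, spoof_cats, test_living, test_spoof,
                  M=10, K=1, Q=15, training=True):
    """Build one K-shot FAS task (K = 0 gives a zero-shot task).

    Each category is a list of face samples. Returns (support, query),
    where each element is a (sample, label) pair: 1 = living, 0 = spoofing.
    """
    # 1) predefined living/spoofing categories L_i, S_m from the train set
    L_i = random.choice(living_cats)
    S_m = random.choice(spoof_cats)
    # 3) "newly emerged" categories L_j, S_n (from the test set when testing)
    L_j = random.choice(living_cats if training else test_living)
    S_n = random.choice(spoof_cats if training else test_spoof)

    # 2) M - K predefined faces per category
    support = [(x, 1) for x in random.sample(L_i, M - K)] + \
              [(x, 0) for x in random.sample(S_m, M - K)]
    # 4) K + Q faces from each new category
    new_liv = random.sample(L_j, K + Q)
    new_spf = random.sample(S_n, K + Q)
    support += [(x, 1) for x in new_liv[:K]] + [(x, 0) for x in new_spf[:K]]
    # 5) query set: Q faces each from L_j and S_n; support has 2M faces
    query = [(x, 1) for x in new_liv[K:]] + [(x, 0) for x in new_spf[K:]]
    assert len(support) == 2 * M and len(query) == 2 * Q
    return support, query
```

For K = 0 the support set contains only predefined faces, matching the zero-shot definition above.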
Inner-update stage
The meta-learner with weight θ inner-updates itself on the support set for several steps, which can be formulated as:

$$L_{s(\tau_i)}(\theta_i^{(j)}) \leftarrow \frac{1}{\|s(\tau_i)\|}\sum_{x,y\in s(\tau_i)} l\big(f_{\theta_i^{(j)}}(x), y\big), \quad (1)$$

$$\theta_i^{(j+1)} \leftarrow \theta_i^{(j)} - \alpha\cdot\gamma^{j}\cdot\nabla_{\theta_i^{(j)}} L_{s(\tau_i)}(\theta_i^{(j)}), \quad (2)$$

where τi is a randomly selected zero- or few-shot FAS training task, and θ_i^(j) is the meta-learner's weight after j inner-update steps. Note that, for each task τi, θ_i^(0) = θ when j = 0. x and y are a pair of instance and label sampled from the support set of τi, and ‖s(τi)‖ is the number of instances in the support set; if not otherwise specified, ‖s(τi)‖ = 2M. f_{θ_i^(j)}(x) is the meta-learner's prediction on instance x, and L_{s(τi)}(θ_i^(j)) is the meta-learner's loss on the support set. The scalar parameters α and γ in Eq. 2 are the keys to achieving AIU; both are trainable. The product α·γ^j is the inner-update learning rate (IULR), where j is the inner-update step, so the IULR changes as j increases. For example, when the meta-learner inner-updates itself on the support set for the first step (j = 0), the IULR is α itself; for the second step (j = 1), the IULR becomes α·γ. With trainable α and γ, the meta-learner inner-updates with an adaptive step size. After u inner-update steps, the meta-learner has updated its weight from θ to θ_i^(u) on the support set with Eq. 1 and Eq. 2.

Optimizing stage
The meta-learner is evaluated and optimized on the query set, which contains faces of unseen living and spoofing categories. The optimization can be formulated as:

$$L_{q(\tau_i)}(\theta_i^{(u)}) \leftarrow \frac{1}{\|q(\tau_i)\|}\sum_{x,y\in q(\tau_i)} l\big(f_{\theta_i^{(u)}}(x), y\big), \quad (3)$$

$$(\theta,\alpha,\gamma) \leftarrow (\theta,\alpha,\gamma) - \beta\cdot\nabla_{(\theta,\alpha,\gamma)} L_{q(\tau_i)}(\theta_i^{(u)}), \quad (4)$$

Algorithm 1: AIM-FAS in the training stage
input: K-shot (K ≥ 0) FAS training tasks Ψt, learning rate β, number of inner-update steps u, initial values of AIU parameters α and γ.
output: Meta-learner's weight θ, AIU parameters α and γ.
1: initialize θ and AIU parameters α and γ
2: pre-train the meta-learner on the train set
3: while not done do
4:   sample a batch of tasks τi ∈ Ψt
5:   for each τi do
6:     θ_i^(0) = θ
7:     while j < u do
8:       L_{s(τi)}(θ_i^(j)) ← (1/‖s(τi)‖) Σ_{x,y∈s(τi)} l(f_{θ_i^(j)}(x), y)
9:       θ_i^(j+1) ← θ_i^(j) − α·γ^j·∇_{θ_i^(j)} L_{s(τi)}(θ_i^(j))
10:      L_{q(τi)}(θ_i^(j+1)) ← (1/‖q(τi)‖) Σ_{x,y∈q(τi)} l(f_{θ_i^(j+1)}(x), y)
11:      j = j + 1
12:    end while
13:  end for
14:  (θ, α, γ) ← (θ, α, γ) − β·∇_{(θ,α,γ)} Σ_{τi} L_{q(τi)}(θ_i^(u))
15: end while

where x and y are a pair of instance and label from the query set of task τi, and ‖q(τi)‖ is the number of instances in the query set; if not otherwise specified, ‖q(τi)‖ is Q. Note that when the meta-learner is evaluated on the query set, its weight is θ_i^(u), which has been updated from θ with Eq. 2 for u inner-update steps. Furthermore, in Eq. 4, ∇_{(θ,α,γ)} L_{q(τi)}(θ_i^(u)) uses the meta-learner's loss on the query set to compute the gradient of θ, α and γ, not of θ_i^(u). β is the learning rate of the optimizing stage. By constantly training the meta-learner on a large number of these zero- and few-shot learning tasks, the meta-learner is forced to learn an easily fine-tunable weight θ and appropriate α and γ. With weight θ and the adaptive IULR α·γ^j, the meta-learner updates itself accurately on the support set and learns discriminative features to detect unseen spoofing types.

The training process of AIM-FAS is shown in Algorithm 1 and Fig. 2(b). We first pre-train the meta-learner to learn prior knowledge about FAS on the train set (line 2 in Algorithm 1), and then meta-train it on zero- and few-shot FAS training tasks. The testing process of AIM-FAS is shown in Algorithm 2, in which P_{q(τi)} is the meta-learner's performance on the query set of τi, and X_{q(τi)} and Y_{q(τi)} are the faces and labels in the query set of task τi.

Difference between AIM-FAS and traditional FAS methods
The difference between AIM-FAS and other traditional FAS methods is that AIM-FAS trains the meta-learner to focus on learning the discrimination needed to detect a new spoofing category from a support set that contains predefined living and spoofing faces and few or no examples of the new living and spoofing categories, whereas traditional FAS methods train a detector to learn the discrimination for detecting predefined spoofing faces.
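The two training stages (Eq. 1-2 and Eq. 3-4) can be illustrated end-to-end on a toy scalar regression model, where gradients are available in closed form. The sketch below is a first-order approximation: the paper backpropagates the query loss through the inner loop to update θ, α, and γ, while here α and γ are held fixed for clarity, and the toy loss and all names are ours:

```python
import numpy as np

def mse_grad(theta, x, y):
    """Gradient of mean((theta*x - y)^2) w.r.t. theta for a scalar model."""
    return np.mean(2.0 * (theta * x - y) * x)

def meta_train_step(theta, alpha, gamma, tasks, beta=0.05, u=3):
    """One training iteration: inner-update stage + optimizing stage.

    Each task is ((sx, sy), (qx, qy)). theta is first adapted on the
    support set with the adaptive IULR alpha * gamma**j (Eq. 2), then
    theta is updated with the query-set gradient at the adapted weight
    (a first-order stand-in for Eq. 4).
    """
    meta_grad = 0.0
    for (sx, sy), (qx, qy) in tasks:
        theta_j = theta
        for j in range(u):                       # inner-update stage
            theta_j -= alpha * gamma**j * mse_grad(theta_j, sx, sy)
        meta_grad += mse_grad(theta_j, qx, qy)   # loss on the query set
    return theta - beta * meta_grad / len(tasks)
```

Repeated calls move θ toward a weight that adapts well after only u inner-update steps, which is the behavior the meta-training loop in Algorithm 1 seeks.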
Algorithm 2: AIM-FAS in the testing stage
input: K-shot FAS testing tasks Ψv, number of inner-update steps u, meta-learner's weight θ, AIU parameters α and γ.
output: Meta-learner's performance P.
1: for each τi ∈ Ψv do
2:   θ_i^(0) = θ
3:   while j < u do
4:     L_{s(τi)}(θ_i^(j)) ← (1/‖s(τi)‖) Σ_{x,y∈s(τi)} l(f_{θ_i^(j)}(x), y)
5:     θ_i^(j+1) ← θ_i^(j) − α·γ^j·∇_{θ_i^(j)} L_{s(τi)}(θ_i^(j))
6:     j = j + 1
7:   end while
8:   P_{q(τi)} ← p(f_{θ_i^(u)}(X_{q(τi)}), Y_{q(τi)})
9: end for
10: P ← (1/‖Ψv‖) Σ_{τi∈Ψv} P_{q(τi)}

Figure 3: Network structure of AIM-FAS. The pink cubes are convolution layers; the number on each cube is the number of channels of its filter.

Fusion Train (FT)

Traditionally, meta-learning methods train meta-learners independently for different K-shot (K > 0) learning problems. For example, to solve the 1-shot learning problem, they usually train a meta-learner on 1-shot training tasks, and to solve the 5-shot learning problem, they train another meta-learner on 5-shot training tasks. In contrast, our goal is to train one meta-learner that solves both zero- and few-shot FAS tasks. Therefore, in AIM-FAS, we train the meta-learner in a Fusion Training (FT) manner, which means the meta-learner is simultaneously trained on different K-shot (K ≥ 0) FAS tasks, i.e., on both zero- and few-shot FAS tasks. In our experiments, we show that FT improves AIM-FAS on both zero- and few-shot FAS tasks.
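The FT manner amounts to mixing shot counts when sampling each meta-batch; a minimal sketch (the helper name and the example shot set [0, 1, 5] in the test below are our own illustration, not the paper's exact configuration):

```python
import random

def sample_fusion_batch(generate_task, shot_set, batch_size=8):
    """Sample one meta-batch for Fusion Training.

    Each task in the batch draws its own K from `shot_set`, so a single
    meta-learner sees zero- and few-shot tasks mixed together.
    `generate_task(K)` is assumed to build one K-shot FAS task.
    """
    return [generate_task(random.choice(shot_set)) for _ in range(batch_size)]
```

At test time, K is instead fixed, so the meta-learner is evaluated on one specific shot scene.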
Network
Depth-supervised FAS methods (Liu, Jourabloo, and Liu 2018) take advantage of the 3D-shape based discrimination between spoofing and living faces, and provide more detailed information for the FAS model to capture spoofing cues. Motivated by this, AIM-FAS trains the meta-learner to solve depth-regression based zero- and few-shot FAS tasks. We build a depth regression network for AIM-FAS and name it FAS-DR. The structure and details of FAS-DR are shown in Fig. 3. There are three cascaded blocks in the network backbone, and all their features are concatenated for predicting the facial depth. We formulate the facial depth prediction process as D̃ = f_θ(x), where x is the RGB facial image, D̃ is the predicted facial depth, and θ denotes the network's weights. The Contrastive Depth Loss (CDL) (Wang et al. 2018) is utilized to help the meta-learner predict vivid facial depth:

$$L_{contrast} = \sum_i \big\| k_i^{contrast} \cdot \tilde{D} - k_i^{contrast} \cdot D \big\|, \quad (5)$$

where D is the generated "ground truth" facial depth label and k_i^{contrast} is the i-th kernel of the CDL.

Zero- and Few-shot FAS Benchmarks

To verify AIM-FAS, we propose three zero- and few-shot FAS benchmarks: OULU-ZF, Cross-ZF, and
SURF-ZF.

OULU-ZF is a single-domain zero- and few-shot FAS benchmark built on OULU-NPU. In OULU-NPU, there are 6 image capture devices, 3 kinds of living faces (living faces captured in 3 different sessions), 2 kinds of print attacks, and 2 kinds of replay attacks. All living and spoofing faces are captured from 55 people. We reorganize OULU-NPU into OULU-ZF and show the structure of OULU-ZF in Tab. 1. There is no overlap between the train (seen categories) and test (unseen categories) sets. The train set contains 2 kinds of living faces (living 2 and 3), 1 kind of print face, and 1 kind of replay face; all living and spoofing faces in the train set are captured with devices 1, 2, 4, 5 and 6. In the test set, all living faces belong to the living 1 category.

Cross-ZF is a cross-domain zero- and few-shot FAS benchmark that is more challenging than OULU-ZF, as it contains more varied living and spoofing categories. We build Cross-ZF based on several public FAS datasets; Tab. 2 shows its structure. The train set contains 7 kinds of living faces, 4 kinds of printed faces, and 7 kinds of replayed faces from three public datasets: CASIA-MFSD, MSU-MFSD, and SiW. The test set contains living and spoofing faces from three other datasets: 3DMAD, Oulu-NPU, and Replay-Attack. There is no overlap between the train set and test set, and the test set contains 3D-mask faces, which differ greatly from printed and replayed faces.

SURF-ZF is a cross-modal zero- and few-shot FAS benchmark. We build SURF-ZF based on the CASIA-SURF dataset; its structure is shown in Tab. 3. We extract several samples from CASIA-SURF and split them into train, validation, and test sets. The train set contains the RGB and depth modalities, while the validation and test sets contain the IR and depth modalities. Each set contains all PSAIs (Living 1; Print 1; Cut 1-5). Based on SURF-ZF, we can test the model's ability to learn quickly from new modalities.
Performance Metrics
In our experiments, AIM-FAS is evaluated by: 1) Attack Presentation Classification Error Rate (APCER); 2) Bona Fide Presentation Classification Error Rate (BPCER); 3) ACER (international organization for standardization 2016), the mean of APCER and BPCER; and 4) Area Under Curve (AUC).
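Following ISO/IEC 30107-3 conventions, APCER is the rate of attack presentations accepted as bona fide, and BPCER the rate of bona fide presentations rejected; a minimal sketch (the threshold and the score convention are our assumptions):

```python
def fas_metrics(scores, labels, threshold=0.5):
    """APCER, BPCER, and ACER from liveness scores.

    labels: 1 = bona fide (living), 0 = attack (spoofing).
    A score above `threshold` is classified as bona fide.
    """
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    bonafide = [s for s, y in zip(scores, labels) if y == 1]
    # attack presentations wrongly accepted as bona fide
    apcer = sum(s > threshold for s in attacks) / len(attacks)
    # bona fide presentations wrongly rejected as attacks
    bpcer = sum(s <= threshold for s in bonafide) / len(bonafide)
    return apcer, bpcer, (apcer + bpcer) / 2
```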
Evaluation Process
On all benchmarks, we evaluate the meta-learner's zero- and few-shot FAS performance in the following way: 1) train the meta-learner on the training tasks generated from the train set; 2) test the meta-learner on zero- and few-shot FAS testing tasks generated from the test set; 3) calculate the meta-learner's performance with Eq. 6.
$$ACER_{avg} = \frac{1}{T}\sum_{i=1}^{T} ACER_i, \qquad ACER = ACER_{avg} \pm 1.96\cdot\sigma/\sqrt{T} \quad (6)$$

Table 1: Zero- and few-shot FAS benchmark: OULU-ZF.

Set    Device           Subjects  PSAI
Train  Phone 1,2,4,5,6  1-20      Living 2,3; Print 1; Replay 1
Val    Phone 3          21-35     Living 1-3; Print 1,2; Replay 1,2
Test   Phone 1,2,4,5,6  36-55     Living 1; Print 2; Replay 2
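Assuming the ± term in Eq. 6 is the usual 95% normal-approximation interval (the 1.96 factor is our assumption), the aggregation over test tasks can be sketched as:

```python
import math

def aggregate_acer(acers):
    """Mean ACER over T test tasks with a 95% confidence half-width (Eq. 6).

    sigma is the (population) standard deviation of the per-task ACERs;
    the 1.96 factor assumes a normal approximation.
    """
    T = len(acers)
    avg = sum(acers) / T
    sigma = math.sqrt(sum((a - avg) ** 2 for a in acers) / T)
    return avg, 1.96 * sigma / math.sqrt(T)
```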
Table 2: Zero- and few-shot FAS benchmark: Cross-ZF.

Set    Domains                         PSAI
Train  CASIA-MFSD, MSU-MFSD, SiW       Living 1-7; Print 1-4; Replay 1-7
Val    MSU-USSA                        Living 1; Print 1-2; Replay 1-6
Test   3DMAD, Oulu-NPU, Replay-Attack  Living 1-9; 3D Mask; Print 1-3; Replay 1-4

Table 3: Zero- and few-shot FAS benchmark: SURF-ZF.

Set    Modalities  PSAI
Train  RGB; Depth  Living 1; Print 1; Cut 1-5
Val    IR; Depth   Living 1; Print 1; Cut 1-5
Test   IR; Depth   Living 1; Print 1; Cut 1-5

σ is the standard deviation of the ACER over all test tasks, and T is the quantity of test tasks.

Implementation Details
In our experiments, we generate the ground-truth depth label of a living face with PRNet (Feng et al. 2017), and normalize the generated facial depth to [0, 1]. To distinguish spoofing faces from living faces, we set the ground-truth depth label of a spoofing face to all zeros. The generated facial depths are shown in Fig. 4. All facial depth maps are resized to 32 × 32 resolution.

We generate 100,000 training tasks on the train set and 100 (T = 100) testing tasks on the test set. For each K-shot training task, K is randomly sampled from a predefined set of shot numbers. For testing tasks, K is a specified number indicating that the meta-learner is tested on specified K-shot tasks. For example, to evaluate the meta-learner's performance on zero-shot FAS tasks, we set K = 0 and generate 100 such zero-shot testing tasks to test the meta-learner. We set Q to 15 and M to 10. The meta batch size is set to 8, and the meta learning rate β is set to 0.0001. The AIU parameters α and γ are initialized to 0.001 and 1, respectively.

Compared Methods
To validate the performance of AIM-FAS on the zero- and few-shot FAS problem, we compare AIM-FAS with three FAS detectors: Resnet-10, FAS-DR, and DTN*. The detector FAS-DR is the network of AIM-FAS trained with traditional supervised learning; since its network is the same as that of AIM-FAS, we treat FAS-DR as the baseline of AIM-FAS. The detector Resnet-10 is a binary-classification FAS model, also trained with traditional supervised learning. DTN (Liu et al. 2019) is a zero-shot FAS detector; we re-implement DTN with all experimental settings the same as the original paper and name it DTN*. For fair comparison, we set up the evaluation protocol for all methods, shown in Tab. 4. For example, the detector Resnet-10 is trained on the train set; to evaluate its 0-shot performance, we evaluate it directly on the query set of 0-shot FAS tasks without finetuning on the support set, and to evaluate its 1-shot performance, we first finetune it on the support set of the 1-shot tasks and then evaluate it on the corresponding query set.
Figure 4: Generated depth label of living and spoofing faces.Table 4: Evaluation detail of compared methods and AIM-FAS.
Method Train 0-shot Test 1- or 5-shot TestFinetune Evaluate Finetune EvaluateCompared Train set / Query Support Query
AIM-FAS
Training tasks Support Query Support Query
The corresponding experimental results on the proposed benchmarks are shown in Tab. 5. It can be seen that AIM-FAS outperforms the other detectors by a clear margin on all benchmarks. Note that, as the original DTN is designed for zero-shot FAS, we follow the same practice and evaluate DTN* on zero-shot rather than few-shot tasks. Compared with FAS-DR, the ACER of AIM-FAS decreases by 25%, 17%, and 28% on zero-, 1-, and 5-shot tasks on OULU-ZF, respectively; by 38%, 30%, and 38% on Cross-ZF; and by 12%, 13%, and 16% on SURF-ZF. Since AIM-FAS trains the meta-learner to focus on learning the discrimination of new spoofing types from predefined faces, or from predefined faces plus a few examples of new living and spoofing types, it learns more generalized discriminative features for detecting new attack types, whereas FAS-DR only focuses on learning the discrimination that distinguishes predefined spoofing faces from living faces.

Another phenomenon is that the margin between AIM-FAS and the other methods is clearer on Cross-ZF than on the other benchmarks. The likely reason is that Cross-ZF contains more diverse fine-grained living and spoofing categories, which makes it more suitable than the other benchmarks for AIM-FAS to learn general discrimination for detecting new attack types.
To further evaluate the advantages of AIM-FAS, we test it on the protocol proposed by (Arashloo, Kittler, and Christmas 2017). In this protocol, CASIA, Replay-Attack, and MSU-MFSD are used to evaluate the FAS model's zero-shot performance across replay and print attacks. As shown in Tab. 6, AIM-FAS performs better than the other methods on most sub-protocols, raising the average AUC by at least 0.52%. This result further reveals that AIM-FAS is effective for not only few-shot but also zero-shot FAS. AIM-FAS does not perform best on the sub-protocols CASIA Video, Replay-Attack Video, and MSU Printed Photo. The likely reason is that the training spoofing categories of these sub-protocols are unitary and thus unsuitable for AIM-FAS to learn the discrimination needed to detect the testing spoofing category. For example, on CASIA, the Cut Photo and Warped Photo spoofing categories are not varied enough, so a meta-learner trained on these categories can hardly summarize and capture general discrimination effective for detecting the Video category.

Table 5: Experimental results on the three proposed benchmarks. DTN* is our re-implementation of DTN (Liu et al. 2019) with the same settings as its original paper.

Benchmark  Method     ACER (%)
                      0-shot   1-shot  5-shot
OULU-ZF    Resnet-10  7.27±    ±       ±
           DTN*       ±        /       /
           FAS-DR     6.60±    ±       ±
           AIM-FAS    4.97±    ±       ±
Cross-ZF   Resnet-10  26.51±   ±       ±
           DTN*       ±        /       /
           FAS-DR     13.49±   ±       ±
           AIM-FAS    8.43±    ±       ±
SURF-ZF    Resnet-10  45.60±   ±       ±
           DTN*       ±        /       /
           FAS-DR     34.61±   ±       ±
           AIM-FAS    30.97±   ±       ±

AIM-FAS for Binary Supervision
We validate the effectiveness of AIM-FAS on a binary-supervised architecture by taking Resnet-10 as the backbone, named AIM-FAS (Resnet); Resnet-10 trained in the traditional supervised manner is set as its baseline. The comparison of these two methods on Cross-ZF is shown in Tab. 7. Compared with Resnet-10, AIM-FAS (Resnet) decreases the ACER by 45.08%, 54.72%, and 41.55% on 0-shot, 1-shot, and 5-shot tasks, respectively. This demonstrates the generality of AIM-FAS across different network structures and supervision manners.
Effectiveness of Predefined Living and Spoofing Faces in Support Set
Here we verify whether predefined living and spoofing faces in the support set help AIM-FAS learn the discrimination needed to detect a new spoofing category. In this experiment, during the testing stage, we generate K-shot FAS tasks without predefined living and spoofing categories in the support set; in other words, the support set of these K-shot FAS tasks contains no categories Li and Sm from Fig. 2(b). For K-shot (K > 0) FAS tasks, AIM-FAS without predefined living and spoofing faces (denoted AIM-FAS w/o PD) inner-updates the meta-learner with only the K faces of each new living/spoofing type, and then tests the meta-learner on the query set. Note that we do not test AIM-FAS w/o PD on zero-shot FAS tasks, since the support set of such zero-shot FAS tasks would be empty. In Tab. 7, we can see that AIM-FAS w/o PD increases the ACER by 11% and 42% on 1-shot and 5-shot FAS, respectively. The worse performance of AIM-FAS w/o PD indicates that the predefined living and spoofing faces indeed benefit the meta-learner in learning discrimination for detecting new attacks.

Effectiveness of Fusion Training (FT)
For the trained meta-learner to be capable of solving both zero- and few-shot FAS problems, we present an FT strategy in AIM-FAS for training the meta-learner simultaneously on all K-shot tasks.

Table 6: Performance of AIM-FAS on CASIA, Replay-Attack, and MSU-MFSD. The evaluation metric is AUC (%).

Method              CASIA (Video / Cut Photo / Warped Photo)   Replay-Attack (Video / Digital Photo / Printed Photo)   MSU (Printed Photo / HR Video / Mobile Video)   Overall
OC-SVM RBF + BSIF   70.7 / 60.7 / 95.9                         84.3 / 88.1 / 73.7                                      64.8 / 87.4 / 74.7                              78.7
+ BSIF              91.5 / 91.7 / 84.5                         99.1 / 98.2 / 87.3                                      47.7 / 99.5 / 97.6                              88.6
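The FT strategy can be sketched as follows; the helper names, the uniform sampling over shot scenes, and the label convention are our assumptions for illustration, not details specified in the text:

```python
import random

SHOT_SCENES = [0, 1, 5]  # assumption: the 0-/1-/5-shot settings evaluated above

def build_fusion_task(predefined, new_live, new_spoof, n_query=15):
    """Sketch: each meta-training task draws its shot count K from all
    shot scenes, and its support set mixes the predefined living/spoofing
    faces with K living and K spoofing examples of the new attack type.
    Labels: 1 = living, 0 = spoofing."""
    k = random.choice(SHOT_SCENES)
    support = list(predefined)
    support += [(x, 1) for x in random.sample(new_live, k)]
    support += [(x, 0) for x in random.sample(new_spoof, k)]
    query = [(x, 1) for x in new_live[:n_query]]
    query += [(x, 0) for x in new_spoof[:n_query]]
    return k, support, query
```

With k == 0 the support set degenerates to the predefined faces only, which is the zero-shot setting; dropping `predefined` instead gives the AIM-FAS w/o PD variant discussed above.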
Figure 5: The learning curve of α, γ, and the meta-learner's ACER. The left Y-axis is for γ; the right Y-axis is for α. The X-axis is the training iteration.

Table 7: Ablation experiment on Cross-ZF.

Method            ACER (%), 0-shot
Resnet-10         26.51
AIM-FAS w/o AIU   11.67
AIM-FAS           8.43

To assess FT, we conduct an experiment that trains the meta-learner without FT. AIM-FAS w/o FT trains one meta-learner on 0-shot tasks for 0-shot testing, another meta-learner on 1-shot tasks for 1-shot testing, and so on. Tab. 7 shows the performance of AIM-FAS w/o FT. Compared with AIM-FAS w/o FT, AIM-FAS performs better on all shot scenes. The likely reason is that FT provides the meta-learner with diverse K-shot FAS tasks, so the AIM-FAS meta-learner generalizes better from training tasks to testing tasks; in other words, with FT, each testing shot scene is a subset of the training shot scenes.

Impact of Adaptive Inner-Update (AIU)
In this experiment, we discard the Adaptive Inner-Update (AIU) from the complete AIM-FAS. Tab. 7 shows the comparison of AIM-FAS and AIM-FAS w/o AIU. We find that AIU improves AIM-FAS by a large margin: AIM-FAS with AIU decreases the ACER (%) by more than 3.0 on 0-, 1-, and 5-shot tasks. Furthermore, in Fig. 5, we show the curves of α and γ of Eq. 2 during the meta-training process. Both α and γ present a rising tendency while the ACER falls. This indicates that AIM-FAS prefers a larger inner-update learning rate, and that with the learned α and γ, AIM-FAS performs better than AIM-FAS w/o AIU.

In this subsection, the feature distribution of the meta-learner (the feature of the penultimate layer) is illustrated in Fig. 6.

Figure 6: Visualization of the distribution of living and spoofing faces in the query set of a 5-shot FAS testing task. Color used: red = living, blue = spoofing.

We randomly generate a 5-shot FAS testing task and update the meta-learner for 50 inner-update steps on the support set. The feature distribution of the query set is then visualized with t-SNE (Maaten and Hinton 2008). Fig. 6a shows the feature distribution of the query set before the meta-learner updates itself on the support set. Fig. 6b and Fig. 6c show the distributions after the meta-learner updates itself for 50 inner-update steps without and with AIU, respectively. We also show the category inner distance (L1) and the inter distance (L2). From left to right, the distinction between the distributions of living and spoofing faces becomes clearer and clearer: L1 declines gradually, whereas L2 rises. The visualization clearly reveals that the meta-learner learns the discrimination between new living and spoofing categories on the support set, and that AIU helps the meta-learner learn better discrimination.
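The L1/L2 statistics shown in Fig. 6 can be computed along these lines. Their exact definitions are not spelled out in the text, so this is a minimal sketch assuming L1 is the mean distance of features to their class centroid and L2 is the distance between the living and spoofing centroids:

```python
def centroid(feats):
    """Mean feature vector of one class."""
    dim = len(feats[0])
    return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def inner_distance(feats):
    """Assumed L1: mean distance of each feature to its class centroid."""
    c = centroid(feats)
    return sum(euclidean(f, c) for f in feats) / len(feats)

def inter_distance(live_feats, spoof_feats):
    """Assumed L2: distance between the living and spoofing centroids."""
    return euclidean(centroid(live_feats), centroid(spoof_feats))

live = [[0.0, 0.0], [2.0, 0.0]]
spoof = [[10.0, 0.0], [12.0, 0.0]]
print(inner_distance(live), inter_distance(live, spoof))  # → 1.0 10.0
```

Under these definitions, a well-adapted meta-learner should drive `inner_distance` down and `inter_distance` up, matching the left-to-right trend described for Fig. 6.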
Conclusion

In this paper, we redefine face anti-spoofing (FAS) as a simultaneous zero- and few-shot learning problem. To address this problem, we develop a novel method, Adaptive Inner-update Meta Face Anti-Spoofing (AIM-FAS), and propose three zero- and few-shot FAS benchmarks. To validate AIM-FAS, we conduct experiments on both the proposed benchmarks and existing zero-shot protocols. All experiments show that AIM-FAS outperforms existing methods by a clear margin on both zero- and few-shot FAS. In the future, we will extend AIM-FAS to more challenging and practical application scenes.
Acknowledgments

This work was supported by the Chinese National Natural Science Foundation Projects (Grant No. 61876178, 61872367, and 61806196).

References
Arashloo, S. R.; Kittler, J.; and Christmas, W. J. 2017. An anomaly detection approach to face spoofing detection: A new formulation and evaluation protocol. IEEE Access.
Bengio, S.; Bengio, Y.; Cloutier, J.; and Gecsei, J. 1992. On the optimization of a synaptic learning rule. 6-8. Univ. of Texas.
Boulkenafet, Z.; Komulainen, J.; and Hadid, A. 2016. Face spoofing detection using colour texture analysis. IEEE Transactions on Information Forensics and Security.
de Freitas Pereira, T.; Anjos, A.; De Martino, J. M.; and Marcel, S. 2013. Can face anti-spoofing countermeasures work in a real world scenario? In ICB, 1-8.
Duan, Y.; Schulman, J.; Chen, X.; Bartlett, P. L.; Sutskever, I.; and Abbeel, P. 2016. RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779.
Feng, Y.; Wu, F.; Shao, X.; Wang, Y.; and Zhou, X. 2017. Joint 3d face reconstruction and dense alignment with position map regression network. In CVPR.
Finn, C.; Abbeel, P.; and Levine, S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400.
Gan, J.; Li, S.; Zhai, Y.; and Liu, C. 2017. 3d convolutional neural network based on face anti-spoofing. In ICMIP, 1-5.
Grant, E.; Finn, C.; Levine, S.; Darrell, T.; and Griffiths, T. 2018. Recasting gradient-based meta-learning as hierarchical bayes. arXiv preprint arXiv:1801.08930.
International Organization for Standardization. 2016. ISO/IEC JTC 1/SC 37 Biometrics: Information technology, biometric presentation attack detection, part 1: Framework.
Komulainen, J.; Hadid, A.; and Pietikainen, M. 2013. Context based face anti-spoofing. In BTAS, 1-8.
Lampert, C. H.; Nickisch, H.; and Harmeling, S. 2009. Learning to detect unseen object classes by between-class attribute transfer. 951-958.
Lampert, C. H.; Nickisch, H.; and Harmeling, S. 2014. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Liu, Y.; Stehouwer, J.; Jourabloo, A.; and Liu, X. 2019. Deep tree learning for zero-shot face anti-spoofing.
Liu, Y.; Jourabloo, A.; and Liu, X. 2018. Learning deep models for face anti-spoofing: Binary or auxiliary supervision. In CVPR, 389-398.
Lucena, O.; Junior, A.; Moia, V.; Souza, R.; Valle, E.; and Lotufo, R. 2017. Transfer learning using convolutional neural networks for face anti-spoofing. In International Conference Image Analysis and Recognition, 27-34.
Maaten, L. v. d., and Hinton, G. 2008. Visualizing data using t-sne. Journal of Machine Learning Research.
Määttä, J.; Hadid, A.; and Pietikäinen, M. 2011. Face spoofing detection from single images using micro-texture analysis. In IJCB, 1-7.
Mishra, N.; Rohaninejad, M.; Chen, X.; and Abbeel, P. 2018. A simple neural attentive meta-learner.
Nagpal, C., and Dubey, S. R. 2018. A performance evaluation of convolutional neural networks for face anti spoofing. arXiv preprint arXiv:1805.04176.
Nichol, A.; Achiam, J.; and Schulman, J. 2018. On first-order meta-learning algorithms.
Norouzi, M.; Mikolov, T.; Bengio, S.; Singer, Y.; Shlens, J.; Frome, A.; Corrado, G.; and Dean, J. A. 2013. Zero-shot learning by convex combination of semantic embeddings. arXiv.
Patel, K.; Han, H.; and Jain, A. K. 2016a. Cross-database face antispoofing with robust feature representation. In Chinese Conference on Biometric Recognition, 611-619.
Patel, K.; Han, H.; and Jain, A. K. 2016b. Secure face unlock: Spoof detection on smartphones. IEEE Transactions on Information Forensics and Security.
Shao, R.; Lan, X.; and Yuen, P. C. 2017. Deep convolutional dynamic texture learning with adaptive channel-discriminability for 3d mask face anti-spoofing. In IJCB, 748-755.
Snell, J.; Swersky, K.; and Zemel, R. 2017. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, 4077-4087.
Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D.; et al. 2016. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, 3630-3638.
Wang, Z.; Zhao, C.; Qin, Y.; Zhou, Q.; and Lei, Z. 2018. Exploiting temporal and depth information for multi-frame face anti-spoofing. arXiv preprint arXiv:1811.05118.
Xu, Z.; Li, S.; and Deng, W. 2015. Learning temporal features using lstm-cnn architecture for face anti-spoofing. In