Fast Matching by 2 Lines of Code for Large Scale Face Recognition Systems
Dong Yi, Zhen Lei, Yang Hu and Stan Z. Li∗
Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
dyi, zlei, yhu, [email protected]
Abstract
In this paper, we propose a method that applies the popular cascade classifier to face recognition, improving the computational efficiency while keeping a high recognition rate. In large scale face recognition systems, because the probability that two feature templates come from different subjects is very high, most matching pairs will be rejected by the early stages of the cascade; therefore, the cascade can improve the matching speed significantly. On the other hand, using the nested structure of the cascade, we can drop some stages at the end of the feature to reduce memory and bandwidth usage in resource-intensive systems without sacrificing too much performance. The cascade is learned in two steps. First, the prepared features are grouped into several nested stages. Then, the threshold of each stage is learned to achieve a user-defined verification rate (VR). In this paper, we take a landmark-based Gabor+LDA face recognition system as the baseline to illustrate the process and the advantages of the proposed method. However, the method is very generic and not limited to face recognition: it can easily be generalized to other biometrics as a post-processing module. Experiments on the FERET database show the good performance of our baseline, and an experiment on a self-collected large scale database illustrates that the cascade can improve the matching speed significantly.
1. Introduction
Recently, face recognition technologies have been applied in more and more big-data applications, such as face recognition for large scale social network services [1] and face recognition systems for surveillance. These large scale systems usually contain many millions of templates in the database, deployed centralized or distributed, and they need to handle requests from users or other image sources (e.g., surveillance cameras) continuously. Big data and highly intensive requests impose many strict requirements on the efficiency of face recognition systems. The efficiency of these systems is mainly determined by two factors: the size of the feature template (storage and transmission) and the speed of template matching (computation). In this paper, we focus on these two factors and propose a generic and lightweight method for efficient large scale face recognition.

∗ Stan Z. Li is the corresponding author.

The computational complexity of the most popular matching algorithms, such as Euclidean distance and Cosine similarity, is linear with respect to the number of templates n and the feature dimension d. There are two kinds of methods to reduce the computation: approximate nearest neighbor methods [26] and partial-feature-based filtering [23]; the details of these methods are described in the next section. In this paper, mainly inspired by [10], we propose a novel fast matching method that applies the cascade classifier [21] from face detection to face recognition. For face recognition, especially in large scale applications, the matching between a probe and the gallery set is an asymmetric classification problem, in which the majority of matching pairs are negative (two images with different identities). This is exactly the case in which the cascade classifier works efficiently.

Once the feature template has been grouped into several stages, we can drop several stages at the end of the cascade to obtain a smaller feature template. A small template obviously saves storage space and improves the transmission speed of the system, but it also reduces the recognition rate. Therefore, we need to consider the trade-off between feature length and performance for specific needs. The spirit is very similar to Scalable Video Coding (SVC) [4] in the H.264 video compression standard.
SVC standardizes the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. A subset video bitstream is derived by dropping packets from the larger video to reduce the bandwidth required for the subset bitstream. The subset bitstream can represent a lower spatial resolution, lower temporal resolution, or lower quality video signal. Similarly, the cascaded feature template contains subset feature bits, and a subset has a lower recognition rate.

In summary, the cascade provides two advantages for large scale face recognition systems: (1) it quickly rejects negative face pairs to improve the matching speed; (2) it supplies a scalable structure that lets the user select the size of the feature according to specific storage and computation requirements. To illustrate these advantages, we propose an EBGM-like [22] baseline system. First, a face detector and ASM are used to localize several facial landmarks. Then, multi-scale and multi-orientation Gabor [12] features are extracted at the landmarks. Finally, LDA is applied to enhance the discrimination and reduce the dimensionality of the feature. The similarity of features is evaluated by the Cosine metric.

The main contributions of the paper are as follows:

1. By studying the asymmetric property of the large scale face recognition problem, we apply the cascade classifier for fast face matching. As a generic and lightweight post-processing module, the cascade classifier can be widely used to improve the speed of other biometric systems;

2. We propose a new concept, "feature subset", for biometrics by cutting off several stages of the cascade, and give the relationship between feature length and recognition rate by experiments;

3. To illustrate the advantages of our method, we propose an EBGM-like [22] baseline face recognition method, whose performance is comparable to many state-of-the-art methods [19, 25, 20].
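As a concrete reference for the linear matching cost discussed above, a brute-force cosine scan over a gallery touches every dimension of every template, i.e., O(n·d) work per probe. A minimal sketch; the gallery size and random data are illustrative, not from the paper:

```python
import numpy as np

def linear_scan(probe, gallery):
    """Cosine similarity of one probe against every gallery template.

    probe:   (d,) feature vector
    gallery: (n, d) matrix of gallery templates
    Cost is O(n * d): every dimension of every template is touched.
    """
    probe = probe / np.linalg.norm(probe)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return gallery @ probe  # (n,) similarity scores

rng = np.random.default_rng(0)
scores = linear_scan(rng.normal(size=428), rng.normal(size=(1000, 428)))
print(scores.shape)  # (1000,)
```

The cascade described later keeps this loop but aborts most comparisons after the first few dimensions.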
2. Related Work
Fast algorithms for large scale pattern recognition problems have a long history in the field of image retrieval, but have not received much attention in the face recognition community. The most popular fast algorithms for large scale image retrieval are approximate nearest neighbor methods, such as k-d trees [8] and Locality-Sensitive Hashing (LSH) [7, 4]. However, [23] found that these approximate nearest neighbor search methods do not work well with high-dimensional face features, i.e., their performance degrades quickly in face recognition as the database grows.

In [23], Wu et al. proposed a multi-reference re-ranking approach for large scale face recognition. The main idea originates from query expansion techniques in text information retrieval. First, many local features of face components are used to obtain a small candidate set from the large gallery. Then a binary global feature is used to re-rank the candidate set and produce the final result. Experiments show that the performance of their algorithm is comparable with a linear-scan system using a state-of-the-art face feature. On a database containing one million face images, the speed-up ratio is about 8x compared to the linear-scan system. [13] also used a similar approach to improve the speed of face recognition by sifting the gallery according to rank. Our method is closely related to these two methods; the common idea is to use partial or simple features to reject irrelevant samples as early as possible.

Inspired by the ideas of popular hashing-based methods [7, 4], Yan et al. [26] proposed Similarity Hashing (SH) for large scale face recognition and obtained good performance on a database containing 100,000 face images. Although SH achieved a 30x speed-up ratio in their experiments, it is very memory consuming and needs tens of extra gigabytes to store the hash index for the samples in the gallery.
Because our method exploits the asymmetric structure of the data in large scale face recognition, compared to the above methods our cascade structure not only obtains a high speed-up ratio but also has the lowest performance loss.

A cascade is a kind of extremely unbalanced tree for dealing with asymmetric two-class problems. The most successful application of the cascade is face detection [21], in which the cascade rejects non-face samples at each node (or stage) while allowing nearly all face samples to pass. In [10] the cascade was used for face recognition, but its advantage was not noticed and analyzed in the context of large scale face recognition. In this paper, we treat face matching as a two-class problem; large scale face recognition is then exactly an asymmetric classification problem: given a probe, the majority of samples in the gallery have a different identity from the probe. Therefore, we borrow the cascade structure from face detection for large scale face recognition, and we will see that the cascade also works well here. The process of cascade learning in face recognition is simpler than in face detection, because we only need to learn the threshold of each stage rather than the strong classifiers.
3. Baseline Face Recognition Method
To illustrate the concept and the advantages of the proposed method, we discuss the steps and algorithms based on a baseline face recognition method. The baseline is similar to the popular EBGM and Bochum/USC systems [22, 15]. First, we detect 76 facial landmarks on a face image and normalize the face image according to some reference points (e.g., the two eyes). Then a Difference of Gaussian (DoG) filter [19] is used for illumination normalization. Second, we extract multi-scale Gabor features at the 76 landmarks. Finally, holistic subspace learning is applied to the Gabor features to enhance their discriminative ability. The detailed steps are described in the following subsections.
Figure 1. Two face images in the FERET database and their 76 facial landmarks localized by our ASM landmarker.
Because face detection is relatively independent of this paper, we skip this step. After face detection, we localize the facial landmarks by an Active Shape Model (ASM) [6]. An ASM is composed of three parts: a shape model, local experts, and an optimization strategy. In most ASM variants the shape model is PCA [9], and we follow this choice. For the local experts, we use an LBP [2] feature and a Boosting classifier for each landmark, similar to the method in [27]. Based on the outputs of the Boosting classifiers, we obtain a confidence map for each landmark. These confidence maps are fed into a Landmark Mean-Shift procedure [18], which gives the final positions of all facial landmarks. For robustness and efficiency, the optimization process is repeated several times on two scales.

The training set of our landmarker is constructed from the MUCT database [14]. Three views (a, d and e) with small pose variations are used for training. Because the backgrounds of the MUCT images are almost uniform, we replace them with random backgrounds and mirror all images to augment the dataset (see Figure 2). The uniform backgrounds are segmented by GrabCut [17], initialized by the results of face detection. Figure 1 shows two example images in the FERET database [16] and their 76 facial landmarks localized by the landmarker, from which we can see that the landmarks are robust to small pose variations and precise enough for the subsequent steps.

For geometry normalization, the two eyes among the 76 landmarks are selected as reference points. Based on the reference points, the face images are normalized to a standard frame with eye distance = 60 pixels by a similarity transformation. Note that the same similarity transformation is also applied to the 76 landmarks. Then the DoG filter [19] is used to reduce the influence of illumination on the face images; the DoG is parameterized by γ, σ₀ and σ₁, with do_norm = 0. Figure 3 shows an aligned face image and its corresponding DoG-filtered image.

Figure 2. Sample images in the MUCT training set for the ASM landmarker. Left: a color face image in the MUCT database, which has a uniform background. Right: a face image converted to gray scale with the background replaced by a random image from the Internet.

Figure 3. An aligned face image and its corresponding DoG-filtered image.
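The eye-based geometry normalization can be sketched as a plain similarity transform. Only the 60-pixel eye distance comes from the text; the target eye location and the input eye coordinates below are illustrative assumptions:

```python
import numpy as np

def eye_align_transform(left_eye, right_eye, target_dist=60.0,
                        target_left=(60.0, 80.0)):
    """Similarity transform (scale + rotation + translation) mapping the
    detected eye centers into a standard frame with a fixed eye distance.

    target_left is a hypothetical choice of where the left eye lands.
    Returns a 2x3 affine matrix usable with any image-warping routine.
    """
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    d = right_eye - left_eye
    scale = target_dist / np.hypot(d[0], d[1])
    angle = np.arctan2(d[1], d[0])      # rotate the eye line onto the x-axis
    c = scale * np.cos(-angle)
    s = scale * np.sin(-angle)
    R = np.array([[c, -s], [s, c]])     # scaled rotation
    t = np.asarray(target_left, dtype=float) - R @ left_eye
    return np.hstack([R, t[:, None]])   # 2x3 matrix [R | t]

M = eye_align_transform(left_eye=(100, 120), right_eye=(160, 150))
```

The same 2×3 matrix is applied to the 76 landmark coordinates so that the landmarks stay aligned with the warped image.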
Given an aligned face image and the 76 landmarks, we extract local features at the landmarks by a Gabor wavelet, as described in [15]:

ψ_{k,σ}(x) = (k²/σ²) e^{−k²x²/(2σ²)} ( e^{ik·x} − e^{−σ²/2} )   (1)

The wavelet is a plane wave with wave vector k, restricted by a Gaussian envelope whose size relative to the wavelength is parameterized by σ. The second term in the parentheses removes the DC component. Following the popular practice, we sample the space of wave vectors k and scales σ in a discrete hierarchy of 5 resolutions (differing by half-octaves) and 8 orientations at each resolution (see Figure 4), thus giving 5 × 8 = 40 complex values for each landmark. Because the phase information is sensitive to image shift and misalignment, we drop the phase and use only the amplitude as the feature for face recognition.

Merging the feature values at all landmarks together, we obtain a feature vector with 76 × 40 = 3040 dimensions. Holistic LDA is then applied to enhance the discrimination and reduce the dimensionality of this vector.

Figure 4. The real part of the Gabor wavelet in 5 resolutions and 8 orientations.

The similarity of two feature templates x and y is evaluated by the Cosine metric:

s(x, y) = xᵀy / √((xᵀx)(yᵀy))   (2)

In practice, we usually normalize the feature vectors x and y to unit length as x′ and y′. Then Equation (2) can be written as

s(x, y) = s(x′, y′) = x′ᵀy′.   (3)
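Equations (2) and (3) reduce the Cosine metric to a plain dot product once templates are unit-normalized; a minimal sketch of that identity:

```python
import numpy as np

def cosine_similarity(x, y):
    """Equation (2): s(x, y) = x^T y / sqrt((x^T x)(y^T y))."""
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def normalize(x):
    """Pre-normalize a template to unit length, as in Equation (3)."""
    return x / np.linalg.norm(x)

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])
# After normalization the Cosine metric is just a dot product.
assert abs(cosine_similarity(x, y) - normalize(x) @ normalize(y)) < 1e-12
print(cosine_similarity(x, y))  # 0.96
```

Pre-normalizing the stored templates once is what makes the incremental, stage-wise accumulation of Section 4 possible.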
4. Cascade Learning
As described in Section 1, the objective of the cascade is to improve the speed of large scale face recognition and to supply a flexible way to trade off feature length against recognition rate. How to construct the cascade depends on the feature and the matching algorithm. In this paper, we construct a specific cascade based on the Gabor-LDA feature and the Cosine metric of the baseline method (see Section 3).
The cascade has been widely used in face detection since the famous work of Viola and Jones [21]. After that, many cascade variants arose for specific applications, such as the Nested cascade [11], the Soft cascade [5], the Boosting Chain [24] and so on. To use the existing features efficiently, we adopt the nested structure. As shown in Figure 5, we divide the feature template into several nested stages from coarse to fine. The first level, "stage1", is actually the original feature template, which has the highest recognition rate. "stage2" and "stage3" are coarser levels with lower recognition rates, and the feature template can be divided further.

Once the structure of the cascade is determined, we can group the Gabor-LDA features accordingly. In the experiments of this paper, the feature size is 428, and the feature template is divided into a seven-stage structure; the sizes of the stages are 6, 13, 26, 53, 107, 214 and 428. For other applications, the number of stages and the size of each stage can be chosen by experience or by experiments. The recognition rate of each stage is reported in the experiments. In resource-constrained applications, we can drop some stages at the end of the feature template to save storage and transmission overhead.

Figure 5. The structure of the cascade classifier.
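Under the nested structure, a stage is just a prefix of the LDA-ordered feature vector, so "dropping stages" is a slice. A sketch using the cumulative stage sizes from the text:

```python
import numpy as np

# Cumulative stage boundaries from the paper: using k stages keeps
# the first STAGE_SIZES[k-1] dimensions of the 428-dim template.
STAGE_SIZES = [6, 13, 26, 53, 107, 214, 428]

def truncate_template(template, n_stages):
    """Keep only the first n_stages stages of a full feature template.

    Dropping trailing stages shrinks storage and bandwidth at the cost
    of some recognition rate, as analyzed in the experiments.
    """
    return np.asarray(template)[:STAGE_SIZES[n_stages - 1]]

full = np.arange(428, dtype=float)
half = truncate_template(full, 6)   # 214 dims: half the template
```

This is the "feature subset" idea: a truncated template remains directly comparable against the same prefix of any other template.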
The target of the cascade classifier is to improve the feature matching speed. To train the cascade, the multi-class samples are first converted into positive samples (pairs from the same subject) and negative samples (pairs from different subjects) by cross matching. Given two normalized samples x′ and y′, a training sample is constructed by Equation (4):

b = x′ ⊙ y′,   (4)

where ⊙ denotes the element-wise product. Given the nested structure of the cascade and the training samples, we need to learn the threshold of each stage to achieve a user-defined Verification Rate (VR) (see Alg. 1). To reduce the false rejection of positive samples, we set the VR of each of the seven stages close to 100%.

Algorithm 1
Cascade learning procedure.
Input: positive samples {P_ij}, i = 1, ..., d, j = 1, ..., n, where n is the sample count and d is the feature dimension; the {P_ij} are constructed from the training set by Equation (4). m is the cumulative size of each stage; v is the user-defined VR of each stage; sn is the stage count.
Output: threshold t of each stage.
Let the cumulative similarity s = 0 and i = 0, where the length of s is n.
for k = 0 to sn − 1 do
  while i < m[k] do
    s = s + P_i∗, where P_i∗ is the i-th dimension of all samples;
    i = i + 1;
  end while
  Sort s and find a threshold t[k] such that #{j : s_j > t[k]} / n > v[k].
end for

In the testing phase, two feature templates are matched from coarse to fine by Equation (3), i.e., from stage7 to stage1. If the partial similarity score is smaller than the threshold of the current stage, the matching process is terminated. Because the probability that the two feature templates come from different subjects is very high, most matching pairs are rejected by the early stages of the cascade; therefore, the cascade improves the matching speed significantly. The testing procedure is shown in Alg. 2. Compared with the ordinary linear-scan approach, the cascade only needs two extra lines of code, which supplies an easy way to improve the speed of existing large scale face recognition systems.
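Algorithm 1 can be sketched in NumPy terms, assuming unit-normalized positive pairs combined by Equation (4); the synthetic data below is only for illustration:

```python
import numpy as np

def learn_thresholds(pos_pairs, stage_sizes, target_vr):
    """Per-stage thresholds from positive pairs (sketch of Algorithm 1).

    pos_pairs:   (n, d) rows b = x' * y' (element-wise, Equation (4))
    stage_sizes: cumulative stage boundaries, e.g. [6, 13, ..., 428]
    target_vr:   verification rate to keep at each stage, e.g. [0.999] * 7
    Returns one threshold per stage.
    """
    thresholds = []
    for m, v in zip(stage_sizes, target_vr):
        # Cumulative similarity of each positive pair up to dimension m.
        s = np.sort(pos_pairs[:, :m].sum(axis=1))
        # Threshold chosen so that a fraction v of positives stays above it.
        idx = int(np.floor((1.0 - v) * len(s)))
        thresholds.append(float(s[idx]))
    return thresholds

rng = np.random.default_rng(1)
pos_pairs = rng.normal(loc=0.01, size=(1000, 428))
thresholds = learn_thresholds(pos_pairs,
                              [6, 13, 26, 53, 107, 214, 428],
                              [0.999] * 7)
```

Because only thresholds are learned (no strong classifiers, as noted in Section 2), training costs a single sort per stage.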
Algorithm 2
Fast feature matching by the cascade classifier.
Input: two normalized feature templates x′ and y′, feature size d, cumulative feature size of each stage m, thresholds t, stage count sn.
Output: similarity s.
Calculate the similarity in an incremental way:
Let s = 0 and i = 0
for k = 0 to sn − 1 do
  while i < m[k] do
    s = s + x′[i] × y′[i];
    i = i + 1;
  end while
  if s < t[k] then
    break;
  end if
end for
return s.
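Algorithm 2 is the ordinary incremental dot product plus the two-line early-exit check that gives the paper its title. A minimal sketch (stage sizes from the paper; thresholds would come from Algorithm 1):

```python
import numpy as np

def cascade_match(x, y, stage_sizes, thresholds):
    """Incremental cosine matching with early rejection (Algorithm 2).

    x, y are unit-normalized templates; stage_sizes holds cumulative
    boundaries. Most negative pairs fail a threshold after only the
    first few dimensions, which is where the speed-up comes from.
    """
    s, i = 0.0, 0
    for m, t in zip(stage_sizes, thresholds):
        while i < m:
            s += x[i] * y[i]
            i += 1
        if s < t:   # <- these two lines are the only change
            break   #    relative to a plain linear-scan matcher
    return s

sizes = [6, 13, 26, 53, 107, 214, 428]
rng = np.random.default_rng(2)
x = rng.normal(size=428)
x /= np.linalg.norm(x)
# With non-rejecting thresholds the cascade returns the full similarity.
full = cascade_match(x, x, sizes, [float("-inf")] * 7)
```

In a gallery scan, only the pairs that survive every threshold pay the full O(d) cost; the rest are rejected after a handful of multiplications.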
5. Experiments
We use the FERET database to illustrate the advantages of our fast matching method, following the standard test protocols [16]. The results of the four popular experiments are reported: fafb, fafc, fadup1 and fadup2. In the experiments, fa is always used as the gallery, and fb, fc, dup1 and dup2 are used as probes respectively. The first two experiments, fafb and fafc, are already saturated by most state-of-the-art algorithms, but fadup1 and fadup2 are still challenging due to the appearance variations caused by expression and aging.

To simulate large scale face recognition applications, we collected a large database containing 200,000 face images with frontal pose, uniform illumination and good quality (these images cannot be disclosed due to privacy issues). We use the self-collected database to enlarge the gallery of the FERET database and test the baseline method in a big-data setting. With a big gallery, we can observe the variation of recognition rate and
speed-up ratio of the cascade more easily. Furthermore, we analyze the relationship between performance and feature size to provide a reference for feature cutting in resource-constrained applications.

Table 1. The baseline rank-1 recognition rates of the four experiments on the FERET database.

Experiment           fafb     fafc    fadup1   fadup2
Tan & Triggs [19]    98.0%    98.0%   90.0%    85.0%
S-LGBP+LGXP [25]     99.0%    99.0%   94.0%    93.0%
G-LQP [20]           99.9%    100%    93.2%    91.0%
Proposed Baseline    99.67%   100%    93.35%   92.74%
First, all face images in the training set, fa, fb, dup1 and dup2 are processed by the face detector and the facial landmarker, which gives 76 facial landmarks for each face image. Then, the face images are normalized to a standard frame with eye distance = 60 pixels and processed by the DoG filter [19] to alleviate the effect of illumination. Note that, in the face normalization step, the 76 facial landmarks are transformed in the same way. At the transformed 76 facial landmarks, the 40 Gabor amplitudes per landmark are extracted and concatenated, and LDA reduces them to the 428-dimensional feature template.

To illustrate the performance in the big-data setting, we mix the 200,000 face images into the existing gallery as "Extended-fa", while the probe sets remain unchanged. Like the fa set, Extended-fa is also processed by the pipeline described in Section 3. Figure 6 shows the impact of the gallery size on the recognition rate. We can see that the last two experiments are more easily affected by the gallery size than the first two. However, even when Extended-fa contains the 200,000 extra images, the drop is moderate, i.e., Extended-fadup1: from 93.35% to about 89%; Extended-fadup2: from 92.74% to about 89%.
Figure 6. The performance comparison of the baseline method before and after the gallery expansion, where "Extended-" denotes that the gallery fa is extended by the extra 200,000 face images.
In the following, the performance of the full cascade and the partial cascade are also evaluated on the large extended gallery "Extended-fa".
The cascade is trained using the training set of FERET and the settings described in Section 4. Its performance is assessed in two aspects: recognition rate loss and computation speed-up. All experiments are conducted on the extended FERET database, on a normal PC with a 2.4 GHz CPU and 4 GB of memory. Table 2 lists the rank-1 recognition rates of the normal and cascade-based matching methods. Remarkably, the recognition rate of the cascade remains exactly the same as that of the original Cosine metric, i.e., zero performance loss. The timing columns of the table show the average time of one full gallery scan over all experiments, which indicates that the average matching speed is improved significantly, by 7.55 times.

On one hand, the cascade provides a fast matching method for large scale face recognition systems. On the other hand, it also provides a flexible structure for the system to cut off feature length to save storage and transmission bandwidth. Figure 7 gives the relationship between the recognition rate and the feature length for reference. We test the four experiments using the partial cascade, i.e., we evaluate the similarity of two samples by only the first k stages of the cascade and neglect the remaining stages. The x-axis in Figure 7 is the number of stages used. From the figure, we can see that the improvement of recognition rate with respect to the stage count (or feature length) diminishes: as the feature length increases, the recognition rate becomes harder and harder to improve. Therefore, we can cut off a large part at the end of the feature to obtain a smaller template while not sacrificing much recognition rate. For example, when the feature template is cut from 428 to 214 dimensions, i.e., using 6 stages, the recognition rate of the experiment "fadup2" drops by only about 3%.

Figure 7. The relationship between the recognition rate and feature length on the extended FERET database.
6. Conclusions
Computation speed, storage capacity and transmission bandwidth are three key factors for large scale face recognition systems. To improve the efficiency of such systems in terms of these factors, we propose a cascade classifier and its learning algorithm. Through extensive experiments on FERET and a large self-collected database, we find that the cascade improves the face matching speed by 7.55 times with zero recognition rate loss. The cascade also supplies a flexible structure for resource-constrained applications, by which we can drop half of the features with only minor performance loss. Furthermore, the proposed baseline method (landmark-based Gabor+LDA) is comparable to many state-of-the-art methods on the FERET database; it will be studied further in the future and applied to unconstrained face recognition problems, e.g.,
LFW.
Acknowledgements
The authors would like to acknowledge the following funding sources: the Chinese National Natural Science Foundation Project and the Technology Support Program Project.

Table 2. Evaluation of recognition rate and speed-up ratio of the cascade classifier on the extended FERET.

Method    Extended-fafb  Extended-fafc  Extended-fadup1  Extended-fadup2  Speed-up Ratio
Normal    99.58%         98.97%         89.89%           89.–%            1x
Cascade   99.58%         98.97%         89.89%           89.–%            7.55x
References
Proceedings of the European Conference on Computer Vision, pages 469–481, Prague, Czech Republic, 2004.
[3] P. Belhumeur, J. Hespanha, and D. Kriegman. “Eigenfaces vs. fisherfaces: recognition using class specific linear projection”. IEEE Trans. PAMI, 19(7):711–720, 1997.
[4] B. Bennett and C. Dee. “Scalable video coding across heterogeneous networks”. In MILCOM, Military Communications Conference, 2008.
[5] L. Bourdev and J. Brandt. “Robust object detection via soft cascade”. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 236–243, 2005.
[6] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. “Active shape models: Their training and application”. CVGIP: Image Understanding, 61:38–59, 1995.
[7] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. “Locality-sensitive hashing scheme based on p-stable distributions”. In Proceedings of the twentieth annual symposium on Computational geometry, SCG ’04, pages 253–262, New York, NY, USA, 2004. ACM.
[8] J. H. Friedman, J. L. Bentley, and R. A. Finkel. “An algorithm for finding best matches in logarithmic expected time”.
ACM Trans. Math. Softw., 3(3):209–226, September 1977.
[9] K. Fukunaga. Introduction to statistical pattern recognition. Academic Press, Boston, 2nd edition, 1990.
[10] G.-D. Guo and H.-J. Zhang. “Boosting for fast face recognition”. In Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 2001. Proceedings. IEEE ICCV Workshop on, pages 96–100, 2001.
[11] C. Huang, H. Ai, B. Wu, and S. Lao. “Boosting nested cascade detector for multi-view face detection”. ICPR, pages 415–418, 2004.
[12] M. Lades, J. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R. P. Wurtz, and W. Konen. “Distortion invariant object recognition in the dynamic link architecture”. IEEE Transactions on Computers, 42:300–311, 1993.
[13] Y. M. Lui and J. R. Beveridge. “Grassmann registration manifolds for face recognition”. In ECCV, pages 44–57, 2008.
[14] S. Milborrow, J. Morkel, and F. Nicolls. The MUCT Landmarked Face Database.
Pattern Recognition Association of South Africa.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090–1104, 2000.
[17] C. Rother, V. Kolmogorov, and A. Blake. “GrabCut: interactive foreground extraction using iterated graph cuts”. ACM Trans. Graph., 23(3):309–314, Aug. 2004.
[18] J. Saragih, S. Lucey, and J. Cohn. “Deformable model fitting by regularized landmark mean-shift”. International Journal of Computer Vision, 91:200–215, 2011.
[19] X. Tan and B. Triggs. “Enhanced local texture feature sets for face recognition under difficult lighting conditions”. IEEE Transactions on Image Processing, 19:1635–1650, June 2010.
[20] S. ul Hussain, Wheeler, T. Napoléon, and F. Jurie. “Face recognition using local quantized patterns”. In Proc. British Machine Vision Conference, volume 1, pages 52–61, 2012.
[21] P. Viola and M. Jones. “Robust real-time face detection”.
International Journal of Computer Vision, 57:137–154, 2004.
[22] L. Wiskott, J. Fellous, N. Kruger, and C. v. d. Malsburg. “Face recognition by elastic bunch graph matching”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775–779, 1997.
[23] Z. Wu, Q. Ke, J. Sun, and H.-Y. Shum. “Scalable face image retrieval with identity-based quantization and multireference reranking”. IEEE Trans. Pattern Anal. Mach. Intell., 33(10):1991–2001, Oct. 2011.
[24] R. Xiao, L. Zhu, and H.-J. Zhang. “Boosting chain learning for object detection”. In Proceedings of the Ninth IEEE International Conference on Computer Vision, ICCV ’03, pages 709–715, Nice, France, 2003.
[25] S. Xie, S. Shan, X. Chen, and J. Chen. “Fusing local patterns of Gabor magnitude and phase for face recognition”. IEEE Transactions on Image Processing, 19(5):1349–1361, 2010.
[26] J. Yan, Z. Lei, D. Yi, and S. Z. Li. “Towards incremental and large scale face recognition”. In International Joint Conference on Biometrics (IJCB), pages 33–39, Washington, DC, USA, Oct. 11–13, 2011.
[27] D. Yi, Z. Lei, and S. Z. Li. “A robust eye localization method for low quality face images”. In