Robust Degraded Face Recognition Using Enhanced Local Frequency Descriptor and Multi-scale Competition
Guangling Sun, Guoqing Li, Xinpeng Zhang
School of Communication and Information Engineering, Shanghai University, Shanghai, China
[email protected]
Abstract - Recognizing degraded faces from low-resolution and blurred images is a common yet challenging task. The Local Frequency Descriptor (LFD) has proved effective for this task, yet it is extracted from the spatial neighborhood of a pixel in each frequency plane independently, regardless of the correlations between frequencies. In addition, it uses a fixed window size, that is, a single scale, for the short-term Fourier transform (STFT). To exploit the frequency correlations while preserving insensitivity to low resolution and blur, we propose the Enhanced LFD, in which information in space and frequency is jointly utilized so that the descriptor is more descriptive and discriminative than LFD. We also propose a multi-scale competition strategy that extracts multiple descriptors corresponding to multiple STFT window sizes and takes the identity corresponding to the maximum confidence as the final recognition result. Experiments conducted on the Yale and FERET databases demonstrate that promising results are achieved by the proposed Enhanced LFD and multi-scale competition strategy.
Keywords - face recognition; low resolution and blurred; Enhanced LFD; multi-scale competition; frequency correlation

Ⅰ. INTRODUCTION
Due to a wide range of potential applications as well as academic challenges, face recognition has attracted much attention during the last decade. Although great progress has been made in designing schemes robust to expressions and aging of subjects, partial occlusions, illumination and inaccurate registration, most of them aim at recognizing faces in high-quality images. When coping with images degraded by blur, low resolution, noise and so on, performance declines dramatically. Hence, in this paper, we focus on robust recognition of blurred and low-resolution faces.

There are roughly three categories of frameworks in the literature for face recognition from blurred and low-resolution images. The first category deblurs or super-resolves an image and then feeds the restored image to the recognition engine [1, 2]. While this separated scheme is straightforward, it is not the best choice, because the goal of image restoration is not consistent with that of recognition. Even worse, especially for blurred images, if the blur model is unknown or complex, notable artifacts introduced by deblurring will in fact degrade recognition performance.

The second category recognizes directly from the blurred or low-resolution image without deblurring or super-resolving. Zhang et al. [3] presented a joint blind restoration and recognition framework based on sparse representation. Once the blur kernel is estimated, it is used to blur the training set to generate a blur dictionary, and the sparse coding of the blurred face over the blur dictionary gives the recognition result. Moreover, the kernel is estimated iteratively in a closed loop. Sun et al. [4] also explored blind blurred image recognition and investigated two frameworks. One first infers the kernel as a separate step; the kernel is then used to generate a data dictionary, and an adaptive SIFT feature dictionary is obtained accordingly. The other integrates the kernel estimation and the adaptive SIFT dictionary inference into a common model, and the two steps are executed alternately until a stop criterion is reached. The main drawback of the works in [3] and [4] is low efficiency, since the blurring operation is time-consuming. Li et al. [5] learned two coupled mapping matrices that map a pair of high- and low-resolution images to a unified feature space. The target of the coupled mapping matrices is to make the distance between two points in the feature space as small as possible, provided that they correspond to the high- and low-resolution versions of the same image. The approach is efficient and needs no super-resolving, but the mapped feature is global, which is not beneficial for recognition.

The last category extracts blur-invariant or blur-insensitive features. Heikkilä et al. analyzed the Local Phase Quantization (LPQ) descriptor, which is robust to centrally symmetric blur [6]. LPQ relies on the short-term Fourier transform (STFT). They noticed that the locally quantized phase is nearly invariant in the low-frequency band. Clearly, phase information alone is not sufficient, since magnitude is also very useful, and even more important, for recognition, as demonstrated in [7]. Lei et al. proposed the Local Frequency Descriptor (LFD), in which both magnitude and phase are extracted [8]. Similar to the Local Binary Pattern, which encodes relative relations between two pixels [9], LFD is defined in terms of relations between the STFT of two neighboring pixels and is claimed to be insensitive to arbitrary types of blur kernel. Our idea stems from the work in [8]. It has been shown that LFD is effective for recognizing low-resolution faces to a certain extent.
We notice that the correlations between frequencies are useful, especially for improving the performance of low-resolution and blurred face recognition. Furthermore, for a given test image, the descriptor that is most insensitive relative to the original image, which depends on the most suitable STFT scale, should be favored. To serve these two purposes, we propose the Enhanced LFD and a multi-scale competition strategy. The former encodes the joint information in space and frequency into the descriptor, and the latter selects the winner among multiple scales according to the corresponding recognition confidences.

The paper is organized as follows. Section 2 reviews the LFD. Section 3 gives a detailed discussion of the Enhanced LFD and multi-scale competition. Section 4 presents empirical results on the Yale and FERET databases. Conclusions and future work are provided in Section 5.

Ⅱ. REVIEW ON LFD
LFD is based on the STFT, which is calculated over a local area $N_{\mathbf{x}}$ centered at $\mathbf{x}$ of an image $f(\mathbf{x})$ as follows:

$$F_{\mathbf{x}}(\mathbf{u}) = \sum_{\mathbf{y}_i \in N_{\mathbf{x}}} f(\mathbf{y}_i)\, \omega^{*}(\mathbf{y}_i - \mathbf{x})\, e^{-j 2\pi \mathbf{u}^{T} \mathbf{y}_i} \quad (1)$$

where $\mathbf{u} \in \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_L\}$ denotes a set of two-dimensional frequencies, $\omega(\mathbf{x})$ denotes a window function and $\omega^{*}(\mathbf{x})$ is its conjugate. A window example of size 5 × 5 and the four frequencies used are given in Fig. 1. The local magnitude descriptor (lmd) and the local phase descriptor (lpd) are calculated from the magnitude and phase of the STFT, respectively. Both lmd and lpd depend on binary strings describing the relative relations between the value at a position and those of its 8 neighbors. Once a binary string is obtained, it is encoded into an integer. Finally, all integers in an area are pooled into a histogram.

Figure 1. $\mathbf{u}_1 = (1/5, 0)$; $\mathbf{u}_2 = (0, 1/5)$; $\mathbf{u}_3 = (1/5, 1/5)$; $\mathbf{u}_4 = (1/5, -1/5)$.

Figure 2. (a) Original face image. (b)-(e) Magnitudes and (f)-(i) phases at frequencies $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$ and $\mathbf{u}_4$, from left to right.

Ⅲ. ENHANCED LFD AND MULTI-SCALE COMPETITION
3.1 Enhanced LFD Using Joint Information in Space and Frequency
While the LFD descriptor has been demonstrated to be effective for recognizing low-resolution and blurred faces, the correlations among frequencies have not been explored, since LFD only encodes the spatial neighboring relation in each single frequency plane (FP) independently. In fact, a joint representation in space and frequency is more descriptive and discriminative for recognition. Meanwhile, the insensitivity to low resolution and blur should be preserved. To achieve the joint representation and the insensitivity to degradation simultaneously, we propose a new descriptor, named Enhanced LFD, that concatenates the binary relations corresponding to correlated frequencies at the same spatial location. For a good trade-off between performance and efficiency, we take any two of the frequencies as a pair of correlated frequencies. As mentioned in Section 2, four frequencies $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$ and $\mathbf{u}_4$ are considered. Accordingly, a total of 12 ordered two-frequency combinations $(\mathbf{u}_i, \mathbf{u}_j)$, $i \neq j$, are produced. In each pair of correlated frequencies, the former is the principal FP and the latter is its correlated FP. For any pair of correlated frequencies and an identical spatial location, the extended binary relations contain the 8-neighborhood at the principal FP and the 4-neighborhood at the correlated FP, as illustrated in Fig. 3. The motivation for computing features from $(\mathbf{u}_i, \mathbf{u}_j)$ and $(\mathbf{u}_j, \mathbf{u}_i)$ separately and differently is to achieve a proper trade-off between performance and the length of the extended binary relation string.
Based on the magnitude of the STFT, $M(\mathbf{u}, \mathbf{x})$ at $\mathbf{u}$ and $\mathbf{x}$, the enhanced lmd (elmd) is defined through the binary relation

$$T\big(M(\mathbf{u}, \mathbf{k}), M(\mathbf{u}, \mathbf{m})\big) = \begin{cases} 1, & M(\mathbf{u}, \mathbf{k}) \geq M(\mathbf{u}, \mathbf{m}) \\ 0, & \text{otherwise} \end{cases} \quad (2)$$

where $\mathbf{k}$ denotes the spatial position in focus and $\mathbf{m}$ denotes the position of one of the neighbors of the pixel at $\mathbf{k}$. Depending on the binary relations, elmd is encoded as an integer:

$$elmd(\mathbf{k}, \mathbf{u}_p, \mathbf{u}_c) = \sum_{w=0}^{7} T\big(M(\mathbf{u}_p, \mathbf{k}), M(\mathbf{u}_p, \mathbf{m}_w)\big)\, 2^{w} + \sum_{l=0}^{3} T\big(M(\mathbf{u}_c, \mathbf{k}), M(\mathbf{u}_c, \mathbf{m}_l)\big)\, 2^{8+l} \quad (3)$$

where $\mathbf{u}_p$ denotes the principal FP and $\mathbf{u}_c$ denotes its correlated FP, and $\mathbf{m}_w$ and $\mathbf{m}_l$ index the 8 neighbors at the principal FP and the 4 neighbors at the correlated FP. The bit positions follow Fig. 3; any fixed assignment of the 12 relations to bit positions gives an equivalent histogram up to a permutation of bins.

Similarly, based on the phase of the STFT, $P(\mathbf{u}, \mathbf{x})$ at $\mathbf{u}$ and $\mathbf{x}$, the enhanced lpd (elpd) is defined through the binary relation

$$T\big(P(\mathbf{u}, \mathbf{k}), P(\mathbf{u}, \mathbf{m})\big) = \begin{cases} 1, & \ldots \\ 0, & \ldots \end{cases} \quad (4)$$

Depending on the binary relations, elpd is also encoded as an integer:

$$elpd(\mathbf{k}, \mathbf{u}_p, \mathbf{u}_c) = \sum_{w=0}^{7} T\big(P(\mathbf{u}_p, \mathbf{k}), P(\mathbf{u}_p, \mathbf{m}_w)\big)\, 2^{w} + \sum_{l=0}^{3} T\big(P(\mathbf{u}_c, \mathbf{k}), P(\mathbf{u}_c, \mathbf{m}_l)\big)\, 2^{8+l} \quad (5)$$

The encoded integers of all positions compose a labeled image, and the 12 labeled magnitude images and 12 labeled phase images are shown in Fig. 4. Because the code is 12 bits long, the encoded integer takes a value between 0 and 4095, leading to a histogram with 4096 bins. In our experiment, each of the 24 labeled images is divided empirically into 4 × 4 sub-regions.

Figure 3. Enhanced LFD at a location and a couple of correlated FPs. The two worked examples in the figure sum the powers of 2 at bit positions {11, 10, 5, 4, 3, 2} and {11, 10, 9, 5, 4, 3, 1, 0}.

Figure 4. Labeled magnitude and phase images of Enhanced LFD, one per ordered frequency pair: (a) 12 labeled magnitude images; (b) 12 labeled phase images.
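The encoding in Section 3.1 can be illustrated with a small standard-library Python sketch. It is not the authors' implementation: the rectangular window, the neighbor ordering in `N8` and `N4`, the bit layout (principal-FP relations in bits 0 to 7, correlated-FP relations in bits 8 to 11), the threshold relation (1 when the center value is greater than or equal to the neighbor's), and all function names are assumptions made for illustration only.

```python
import cmath
import math
from itertools import permutations

# Four frequencies as listed in the Figure 1 caption.
FREQS = [(1 / 5, 0.0), (0.0, 1 / 5), (1 / 5, 1 / 5), (1 / 5, -1 / 5)]

# The 12 ordered (principal, correlated) index pairs.
PAIRS = list(permutations(range(len(FREQS)), 2))

# Neighbor offsets as (row, col); the ordering is an assumption of this sketch.
N8 = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
N4 = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def stft_magnitude(img, u, half=2):
    """Magnitude plane of the STFT at frequency u.

    Uses a rectangular (2*half+1)-square window (assumed) that is
    truncated at the image border.
    """
    rows, cols = len(img), len(img[0])
    plane = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0j
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    phase = -2j * math.pi * (u[0] * rr + u[1] * cc)
                    acc += img[rr][cc] * cmath.exp(phase)
            plane[r][c] = abs(acc)
    return plane


def relation(a, b):
    """Binary relation T: 1 if a >= b, else 0 (assumed direction)."""
    return 1 if a >= b else 0


def elmd_code(m_p, m_c, r, c):
    """12-bit code at interior pixel (r, c) from two magnitude planes."""
    code = 0
    for w, (dr, dc) in enumerate(N8):   # 8 relations at the principal FP
        code |= relation(m_p[r][c], m_p[r + dr][c + dc]) << w
    for b, (dr, dc) in enumerate(N4):   # 4 relations at the correlated FP
        code |= relation(m_c[r][c], m_c[r + dr][c + dc]) << (8 + b)
    return code


def elmd_label_image(img, p, q, half=2):
    """Labeled image for the pair (FREQS[p], FREQS[q]); borders skipped."""
    m_p = stft_magnitude(img, FREQS[p], half)
    m_c = stft_magnitude(img, FREQS[q], half)
    return [[elmd_code(m_p, m_c, r, c) for c in range(1, len(img[0]) - 1)]
            for r in range(1, len(img) - 1)]
```

Calling `elmd_label_image(img, p, q)` for each `(p, q)` in `PAIRS` would give the 12 labeled magnitude images; histograms over sub-regions of these images would then form the feature vector.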
3.2 Multi-scale Competition
Another limitation of LFD is that it uses a fixed STFT window size; in other words, it is single-scale. This is not reasonable, since the degradation varies greatly from one test image to another, so the most insensitive scale can differ considerably between test images. In the following, we analyze the role of scale in recognition performance.

A low-resolution or blurred image $g(\mathbf{x})$ can be modeled as the convolution of a high-quality image $f(\mathbf{x})$ with a blur kernel function $k(\mathbf{x})$:

$$g(\mathbf{x}) = f(\mathbf{x}) \otimes k(\mathbf{x}) \quad (6)$$

Assume we focus on two positions $\mathbf{x}_i$ and $\mathbf{x}_j$ and the two local regions centered at them. In terms of the STFT, the Fourier transforms of the two local regions in $f(\mathbf{x})$ are:

$$F_{\mathbf{x}_i}(\mathbf{u}) = \mathcal{F}\big[f(\mathbf{x})\, \omega(\mathbf{x} - \mathbf{x}_i)\big], \qquad F_{\mathbf{x}_j}(\mathbf{u}) = \mathcal{F}\big[f(\mathbf{x})\, \omega(\mathbf{x} - \mathbf{x}_j)\big] \quad (7)$$

where $\omega(\mathbf{x})$ refers to the window function. Now let the two local images be blurred by $k(\mathbf{x})$. By the convolution theorem of the Fourier transform:

$$\tilde{G}_{\mathbf{x}_i}(\mathbf{u}) = \mathcal{F}\big[k(\mathbf{x}) \otimes [f(\mathbf{x})\, \omega(\mathbf{x} - \mathbf{x}_i)]\big] = K(\mathbf{u}) \cdot F_{\mathbf{x}_i}(\mathbf{u}), \qquad \tilde{G}_{\mathbf{x}_j}(\mathbf{u}) = \mathcal{F}\big[k(\mathbf{x}) \otimes [f(\mathbf{x})\, \omega(\mathbf{x} - \mathbf{x}_j)]\big] = K(\mathbf{u}) \cdot F_{\mathbf{x}_j}(\mathbf{u}) \quad (8)$$

where $K(\mathbf{u})$ denotes the Fourier transform of $k(\mathbf{x})$. For the same frequency $\mathbf{u}_k$, blur invariance holds because $\tilde{G}_{\mathbf{x}_i}(\mathbf{u}_k) / \tilde{G}_{\mathbf{x}_j}(\mathbf{u}_k) = F_{\mathbf{x}_i}(\mathbf{u}_k) / F_{\mathbf{x}_j}(\mathbf{u}_k)$. Nevertheless, in practice the blur operation is followed by the local area extraction, which means that:

$$G_{\mathbf{x}_i}(\mathbf{u}) = \mathcal{F}\big[g(\mathbf{x})\, \omega(\mathbf{x} - \mathbf{x}_i)\big] = \mathcal{F}\big[\omega(\mathbf{x} - \mathbf{x}_i)\, [k(\mathbf{x}) \otimes f(\mathbf{x})]\big], \qquad G_{\mathbf{x}_j}(\mathbf{u}) = \mathcal{F}\big[g(\mathbf{x})\, \omega(\mathbf{x} - \mathbf{x}_j)\big] = \mathcal{F}\big[\omega(\mathbf{x} - \mathbf{x}_j)\, [k(\mathbf{x}) \otimes f(\mathbf{x})]\big] \quad (9)$$

Since windowing and convolution do not commute, $G$ no longer factors as $K(\mathbf{u})$ times $F$, and the blur invariance is destroyed. However, we can make $G_{\mathbf{x}_i}(\mathbf{u})$ and $G_{\mathbf{x}_j}(\mathbf{u})$ approximate $F_{\mathbf{x}_i}(\mathbf{u})$ and $F_{\mathbf{x}_j}(\mathbf{u})$, respectively, as closely as possible by letting multiple scales compete; the winner is regarded as the most insensitive scale for a given degraded test image.

We propose a straightforward but effective strategy: within a reasonable scale range, the confidence of the first candidate is calculated for every scale, and the identity corresponding to the maximum confidence is taken as the final recognition result. This procedure is intuitively a competition among multiple scales, and the scale that obtains the highest confidence wins. Naturally, the features and classifiers corresponding to all scales must be extracted and constructed in advance from the original high-quality samples. We adopt the generalized confidence presented in [10]:

$$e(c_i \mid \mathbf{x}) = 1 - \frac{d_{c_i}(\mathbf{x})}{\min_{k \neq i} d_{c_k}(\mathbf{x})}$$

where $\mathbf{x}$ denotes a test sample, $d_{c_i}(\mathbf{x})$ denotes the distance of $\mathbf{x}$ to category $c_i$, and $e(c_i \mid \mathbf{x})$ denotes the generalized confidence of $\mathbf{x}$ for category $c_i$.

To show the feasibility of the scheme, the first-candidate confidences at all scales for four correctly recognized samples and four wrongly recognized samples are shown in Fig. 5. Fig. 5(a) shows that, for correctly recognized samples, the maximum confidence or the two largest confidences are significantly larger than the others, whereas for the wrongly recognized samples in Fig. 5(b) the discrepancy between the maximum confidence and the others is trivial.

Fig. 6 illustrates the effectiveness of the confidence-based competition strategy from another aspect. An original image and the image degraded by a blur kernel, which is shown in the bottom-right corner of the image, are given in Fig. 6(a). The magnitude histograms of a sub-region, marked by a red rectangle, at three different scales are illustrated in the right-most columns of Fig. 6, in which the top row corresponds to the original image and the bottom row to the blurred one. It can be seen that the histogram similarity between the original and blurred sub-region is highest for the scale obtaining the maximum confidence, lower for the scale obtaining the second largest confidence, and lowest for the single scale. The more insensitive the descriptor is, the more similar the two histograms are. Thus, this example supports, to a certain degree, the reliability of multi-scale competition based on confidence.

Figure 5. Confidences of correctly and wrongly recognized samples: (a) confidences of correctly recognized samples; (b) confidences of wrongly recognized samples.
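The competition rule of Section 3.2 can be sketched as follows, assuming that a classifier has already produced, for each scale, a distance from the test sample to every category. The function names, the dictionary layout, and the example distances are hypothetical; only the confidence formula (one minus the ratio of the smallest to the second smallest distance, for the first candidate) and the maximum-confidence selection come from the text.

```python
def first_candidate_confidence(distances):
    """Return (best_class, confidence) for one scale.

    `distances` maps a class label to the distance d_c(x); smaller is
    better. For the first candidate, the generalized confidence is
    1 - d_best / d_second_best.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    (best, d_first), (_, d_second) = ranked[0], ranked[1]
    if d_second <= 0:          # degenerate case: no separation measurable
        return best, 0.0
    return best, 1.0 - d_first / d_second


def multiscale_competition(distances_per_scale):
    """Pick the identity from the scale with the highest confidence.

    `distances_per_scale` maps a scale (window size) to the per-class
    distance dictionary obtained at that scale.
    Returns (identity, winning_scale, confidence).
    """
    results = {scale: first_candidate_confidence(d)
               for scale, d in distances_per_scale.items()}
    winner = max(results, key=lambda scale: results[scale][1])
    identity, confidence = results[winner]
    return identity, winner, confidence


# Hypothetical distances for two classes at three window sizes.
example = {
    11: {"A": 0.9, "B": 1.0},
    15: {"A": 0.8, "B": 0.5},
    19: {"A": 0.2, "B": 1.0},
}
identity, scale, confidence = multiscale_competition(example)
```

In this made-up example the 19 × 19 scale separates the first candidate from the runner-up most clearly, so its decision is the one returned. The sketch evaluates every scale exhaustively, which matches the full search described in the text.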
Figure 6. Magnitude histograms of a sub-region at different scales (top row: original image; bottom row: image degraded by a blur kernel). (a) The images; (b) 11 × 11, the single scale; (c) 15 × 15, the scale obtaining the second largest confidence; (d) 19 × 19, the scale obtaining the maximum confidence.

Ⅳ. EXPERIMENT RESULTS AND ANALYSIS
The performance of the proposed Enhanced LFD and multi-scale competition scheme is evaluated on two public face databases: Yale and FERET. The FERET database used here is a random subset of the original FERET containing 40 persons. All samples of each class are partitioned randomly into two parts. For Yale, one part includes five training samples and the other includes six testing samples; for FERET, the numbers of training and testing samples are the same. The degradations applied are two low-resolution degradations with down-sampling factors of 2 and 4, two parametric blur kernels, namely a Gaussian kernel (standard deviation 3, size 7 × 7) and a linear motion kernel (length 7 pixels, direction 45 degrees), and eight complex non-parametric kernels [11]. The resulting twelve degradations of the original image in Fig. 2(a) are shown in Fig. 7. A Gaussian window is adopted in the STFT.
Figure 7. Twelve degradations of an original image: LR 2, LR 4, Gaussian, linear motion, and kernel1 to kernel8.
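The parametric degradations listed above could be simulated along the following lines. This is a sketch under stated assumptions rather than the experimental code: the text does not say how down-sampling was performed (block averaging is assumed here), how image borders were handled (edge replication is assumed), or which diagonal "45 degrees" refers to (the anti-diagonal, pointing up and to the right, is assumed). All function names are hypothetical.

```python
import math


def gaussian_kernel(size=7, sigma=3.0):
    """Normalized size x size Gaussian kernel (defaults: 7 x 7, std 3)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def motion_kernel_45(length=7):
    """Linear motion kernel of `length` pixels along the anti-diagonal."""
    k = [[0.0] * length for _ in range(length)]
    for i in range(length):
        k[length - 1 - i][i] = 1.0 / length
    return k


def convolve(img, kernel):
    """Same-size 2-D convolution with replicated borders (assumed)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Flipped indexing makes this a true convolution.
                    rr = min(max(r + kh // 2 - i, 0), h - 1)
                    cc = min(max(c + kw // 2 - j, 0), w - 1)
                    acc += kernel[i][j] * img[rr][cc]
            out[r][c] = acc
    return out


def downsample(img, factor):
    """Down-sample by averaging factor x factor blocks (assumed scheme)."""
    h, w = len(img) // factor, len(img[0]) // factor
    area = float(factor * factor)
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / area
             for c in range(w)]
            for r in range(h)]
```

For a grayscale image stored as a list of rows, `convolve(img, gaussian_kernel())`, `convolve(img, motion_kernel_45())`, `downsample(img, 2)` and `downsample(img, 4)` would correspond to the Gaussian, linear motion, LR 2 and LR 4 cases.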
4.1 The Classifier for Face Recognition
Though we do not focus on the classifier in this paper, the performance of the adopted classification approach is rather important. Hence, we implement a classification scheme that differs slightly from [12], in that we take only the reconstruction errors as the recognition distance, as follows:

Step 1: Calculate the optimal coding $\hat{\boldsymbol{\alpha}}$ of the test sample $\mathbf{y}$ over the dictionary $D$ with l2-norm regularization:

$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - D\boldsymbol{\alpha}\|_2^2 + \lambda \|\boldsymbol{\alpha}\|_2^2$$

where $\lambda$ is the regularization factor. In all experiments, we set $\lambda = 0.01$.

Step 2: Classify according to the reconstruction error associated with each category:

$$\mathrm{identity}(\mathbf{y}) = \arg\min_i \|\mathbf{y} - D_i \hat{\boldsymbol{\alpha}}_i\|_2$$

where $\hat{\boldsymbol{\alpha}}_i \in \hat{\boldsymbol{\alpha}} = \{\hat{\boldsymbol{\alpha}}_1, \ldots, \hat{\boldsymbol{\alpha}}_c\}$ is the part of the coding associated with category $i$ and $D_i$ is the corresponding part of the dictionary.

4.2 Parameters Setting
The single scale is set to 11 × 11, and the multiple scales range from 11 × 11 to 31 × 31. For magnitude and phase, all feature planes (for instance, 4 feature planes for LFD and 12 feature planes for Enhanced LFD) are concatenated to compose a complete feature vector that is fed to the classifier. The optimal valid number of bins is 48 for lmd and lpd, so the dimension of LFD is 64 × 48 = 3072. To reduce the dimension of elmd and elpd while retaining their good performance, the valid number of bins is selected as 16 for them, which gives 192 × 16 = 3072. Consequently, the dimensions of LFD and Enhanced LFD are both 3072. The performance of LFD and Enhanced LFD is compared using multi-scale competition. All results are listed in Table Ⅰ and Table Ⅱ. In both tables, lmd and lpd with suffix "s" refer to single scale and with suffix "c" to multi-scale competition. Accordingly, elmdc and elpdc refer to elmd and elpd with multi-scale competition, respectively.

TABLE Ⅰ. ACCURACY RATES ON YALE (%)
TABLE Ⅱ. ACCURACY RATES ON FERET (%)

• Single scale versus multi-scale competition
Comparing single scale with multi-scale competition, the large improvements achieved by the latter (for example, the average accuracy of lmd rises from 88.79 to 95.21 on Yale and from 84.24 to 92.50 on FERET) indicate the feasibility and necessity of this strategy.
• LFD versus Enhanced LFD using multi-scale competition
On average, under multi-scale competition, the results on both databases show that elmd is superior to lmd (96.17 versus 95.21 on Yale, 93.26 versus 92.50 on FERET), whereas elpd is only slightly better than lpd on FERET (90.07 versus 89.65) and is inferior to lpd on Yale (83.33 versus 85.54).
This may be owing to the much smaller number of histogram bins of elpd and the lack of discriminant analysis. This result suggests that, if adequate discriminant analysis is implemented, the performance of elpd could surpass that of lpd and the advantage of elmd could be further increased.

Ⅴ. CONCLUSIONS AND FUTURE WORK
A novel local face descriptor robust to low-resolution and blur degradation, called Enhanced LFD, and a multi-scale competition strategy are proposed. Enhanced LFD improves the performance of LFD by utilizing the correlations among different frequencies, so as to present a joint local descriptor of two correlated frequencies at identical spatial locations. In addition, the descriptor that is most insensitive for a given test image is found during recognition by multi-scale competition, and an in-depth discussion of this strategy is presented. Encouraging results have been obtained on the publicly available Yale and FERET databases. Future work will complement the proposed descriptor with discriminant analysis instead of using it directly, and will develop faster and smarter search approaches instead of a full search over multiple scales.

REFERENCES
[1] Masashi Nishiyama, Abdenour Hadid, et al. Facial Deblur Inference Using Subspace Analysis for Recognition of Blurred Faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4), 2011, pp. 838-845.
[2] C. Liu, H. Y. Shum, and W. T. Freeman. Face Hallucination: Theory and Practice. International Journal of Computer Vision, 75(1), 2007, pp. 115-134.
[3] Haichao Zhang, Jianchao Yang, et al. Close the Loop: Joint Blind Image Restoration and Recognition with Sparse Representation Prior. IEEE International Conference on Computer Vision, 2011, pp. 770-777.
[4] Guangling Sun, Guoqing Li. Blurred Image Classification Using Adaptive Dictionary. The 3rd IEEE International Conference on Intelligent Computing and Intelligent Systems, Vol. 3, 2011, pp. 419-423.
[5] Bo Li, Hong Chang, Shiguang Shan, Xilin Chen. Low-Resolution Face Recognition via Coupled Locality Preserving Mappings. IEEE Signal Processing Letters, 17(1), 2010, pp. 20-23.
[6] Janne Heikkilä and Ville Ojansivu. Methods for Local Phase Quantization in Blur-Insensitive Image Analysis. Local and Non-Local Approximation in Image Processing, 2009, pp. 104-111.
[7] Shufu Xie, Shiguang Shan, Xilin Chen, and Jie Chen. Fusing Local Patterns of Gabor Magnitude and Phase for Face Recognition. IEEE Transactions on Image Processing, 19, 2010, pp. 1349-1361.
[8] Zhen Lei, T. Ahonen, et al. Local Frequency Descriptor for Low-Resolution Face Recognition. IEEE International Conference on Automatic Face & Gesture Recognition and Workshops, 2011, pp. 161-166.
[9] T. Ahonen, A. Hadid, and M. Pietikainen. Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 2006, pp. 2037-2041.
[10] Xiaofan Lin, Xiaoqing Ding. Adaptive Confidence Transform based on Classifier Combination for Chinese Character Recognition. Pattern Recognition Letters, 19, 1998, pp. 975-988.
[11] A. Levin, Y. Weiss, F. Durand, and W. Freeman. Understanding and Evaluating Blind Deconvolution

Average accuracy rates (%) from Tables Ⅰ and Ⅱ over the twelve degradations (LR2, LR4, Gaussian, motion, kernel1 to kernel8):

          lmds    lmdc    elmdc   lpds    lpdc    elpdc
Yale      88.79   95.21   96.17   65.71   85.54   83.33
FERET     84.24   92.50   93.26   76.94   89.65   90.07