Blurred Image Classification based on Adaptive Dictionary
Guangling Sun, Guoqing Li, Jie Yin
School of Communication and Information Engineering, Shanghai University, Shanghai, China
Abstract—Two frameworks for blurred image classification based on an adaptive dictionary are proposed. Given a blurred image, the image is not deblurred. Instead, its semantic category is determined by blur-insensitive sparse coefficients computed with an adaptive dictionary. The dictionary is adapted to the Point Spread Function (PSF) estimated from the input blurred image. The PSF is assumed to be space invariant. In one framework it is inferred in a separate step. In the other it is updated jointly with the sparse coefficients by an alternating, iterative algorithm. Three types of blur are evaluated, namely defocus blur, simple motion blur and camera shake blur. The experimental results confirm the effectiveness of the proposed frameworks.
Keywords - blurred image classification; sparse representation; adaptive dictionary
I. INTRODUCTION
Image semantic classification remains one of the most challenging problems in computer vision, pattern recognition and statistical learning, and significant progress has been made in this area. However, most image classification strategies focus on issues such as a wide range of viewpoints, varying scales or illumination, and occlusion. Much less attention has been devoted to images degraded by blur, noise, fog, etc. Blur is a very common degradation, so recognizing blurred images is of considerable practical value. In this paper, we address specifically the classification of images degraded by blur.
Compared with general image classification, little work exists on blurred image classification. Published approaches can be partitioned into three categories.
The first is to extract blur-insensitive features. J. Heikkila proposed Local Phase Quantization (LPQ), which is robust to centrally symmetric blur [1]. In [2], the authors claimed that the improved LPQ can be applied to any blur regardless of the point spread function. H. Zhang used orthogonal Legendre moments to construct a set of invariants to centrally symmetric blur, simple motion blur and noise [3].
The second is to deblur the image and then classify it [4]. M. Nishiyama designed a blurred face recognition framework called FADEIN, composed of two stages. First, a blur PSF is inferred using a Frequency-Magnitude-Based feature space and subspace analysis. Then, the deblurred face is recognized using features designed for high-quality image recognition. M. Nishiyama further showed that LPQ extracted from the deblurred image actually outperformed both FADEIN and LPQ extracted from the blurred image.
The third is to closely combine image restoration and recognition [5]. Although the face image is deblurred, recognition is still performed in the blur space produced by the estimated PSF rather than in the deblurred space.
While these methods are successful to some degree, they all have limitations. For the first category, the tested PSFs are simple, and the motion blur considered has a single direction. However, some blur, such as camera shake blur, is complex and cannot be modeled well by a simple motion blur PSF. For the second category, any image deblurring algorithm inevitably introduces additional artifacts and noise, which in turn harm classification. As for the third category, its performance has only been evaluated on face recognition, not on general image classification.
Our idea belongs to the first category: blur-insensitive features are extracted without deblurring the image. Specifically, we build on the framework proposed by Yang, in which the sparse coefficients of image patches are pooled and used as features to train an SVM classifier [6]. The sparse coefficients of blurred image patches are adopted directly as features for classification, without retraining the SVM. These coefficients can reasonably be regarded as blur insensitive because the dictionary used is adapted to the specific blur. Hence, learning an adaptive blurred dictionary is a critical component, and PSF estimation is also an important issue. Once the PSF is inferred properly, we can deal with any type of blur resulting from camera defocus, camera shake or simple relative motion between camera and object.
1.1 Overview of the proposed frameworks
As analyzed above, dictionary learning is an essential part of the proposed frameworks. The sparse coefficients of a blurred image computed with a sharp dictionary drift far from those of the sharp image. Therefore, we use dictionary learning to force sharp and blurred image patches to have identical sparse coefficients, which yields blur-insensitive features. We propose two frameworks.
The first proposed framework works as follows. First, a linear SPM SVM classifier is constructed, using as features the sparse coefficients of SIFT descriptors extracted from sharp images. Then a PSF is inferred from the input blurred image using the method proposed by R. Fergus [7]. Next, the estimated PSF is applied to blur the training patches. The adaptive blurred dictionary is then obtained from the SIFT features of the blurred training patches together with the sparse coefficients of their sharp versions. Finally, the sparse coefficients of the input image are computed with the adaptive dictionary, transformed and used to recognize the image.
The second proposed framework aims to improve efficiency and obtain a better PSF estimate. It works as follows. A joint feature of gradient and SIFT, describing a sharp image patch, is used to learn a joint dictionary D. The sparse coefficients of the training images with respect to D serve as features to train the SVM classifier. Given an input blurred image, the unknown PSF, the adaptive dictionary and the sparse coefficients of the patches are updated alternately and iteratively. The final sparse coefficients are transformed and used to recognize the image. Both frameworks are analyzed in detail in Section 2.2. Furthermore, to improve computational efficiency, a selection rule is designed that chooses a small subset of all patches for learning the adaptive dictionary, as discussed in Section 2.3.
II. PROBLEM FORMULATION
2.1 Linear SPM SVM classifier using sparse coefficients
First, an image is partitioned into overlapping dense grids and a SIFT feature is extracted from each grid. Then the SIFT features of all grids are collected to learn a dictionary, and the sparse coefficients of each grid are obtained accordingly. Further, a three-layer spatial pyramid is built for each image, and layer l is partitioned equally into 2^l × 2^l parts, l = 0, 1, 2. A 'max' pooling strategy over the sparse coefficients is applied to each part. This gives 1 + 4 + 16 = 21 pooling results, which are concatenated into a high-dimensional feature vector representing the image. Finally, the feature vectors of the training images are used to train a linear SVM classifier. The authors reported state-of-the-art performance [6].
2.2 Blurred image classification based on adaptive dictionary
Our work focuses on dictionary learning rather than classifier design. The goal is for the sparse coefficients of the same feature, computed for a sharp patch and for its blurred version, to be inferable from each other. We therefore directly adopt the linear SPM SVM as the base classifier in this investigation.
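The spatial-pyramid max pooling of Section 2.1, on which this base classifier relies, can be sketched as follows. This is a minimal NumPy illustration. It assumes the sparse codes and the grid centres have already been computed, and the function name and argument layout are hypothetical. Pooling the maximum absolute coefficient follows the definition used in [6].

```python
import numpy as np

def spm_max_pool(codes, xy, width, height, levels=(0, 1, 2)):
    """Spatial-pyramid max pooling of sparse codes (illustrative sketch).

    codes : (N, K) sparse coefficients of the N grid descriptors of one image
    xy    : (N, 2) (x, y) centre of each grid in pixels
    Returns a vector of length K * sum(4**l), i.e. 21 * K for l = 0, 1, 2.
    """
    k = codes.shape[1]
    pooled = []
    for l in levels:
        cells = 2 ** l  # layer l is split into 2^l x 2^l cells
        cx = np.minimum((xy[:, 0] * cells // width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells // height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                if mask.any():
                    # max of absolute coefficient values within the cell
                    pooled.append(np.abs(codes[mask]).max(axis=0))
                else:
                    pooled.append(np.zeros(k))  # empty cell
    return np.concatenate(pooled)
```

The 21 pooled K-dimensional vectors are simply concatenated. Any normalization of the final vector before it is fed to the linear SVM is left out here.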
1) Framework I
There is an intuitive solution to blurred image recognition. The training images are blurred with the PSF estimated from the input blurred image, and these blurred training images are then used to learn a new classifier. However, this is impractical because the classifier would have to be retrained for every unknown image. We therefore propose a trade-off strategy. A sharp dictionary $D$ and the corresponding sparse coefficient matrix $A^{tr}_{sh}$ are obtained from a large set of sharp training image patches, using the K-Singular Value Decomposition (KSVD) and Orthogonal Matching Pursuit (OMP) algorithms [8]. Then, a classifier is trained based on $D$ and kept fixed once training is finished. For any input blurred image, a new dictionary is learned that is adapted to the specific PSF inferred from that image. Two essential issues must therefore be addressed: PSF estimation and adaptive dictionary learning. We adopt the Ensemble Learning method presented in [7] to infer the PSF. The adaptive dictionary should have the following property: the sparse coefficients of a blurred image patch computed with it can be used to infer those of its sharp version computed with the sharp dictionary. To achieve this, we design the following model:
$$\hat{D}_b = \arg\min_{D_b} \left\| P^{tr}_b - D_b A^{tr}_{sh} \right\|_F^2 \qquad (1)$$
where $P^{tr}_b$ denotes the SIFT features extracted from the blurred training image patches produced with the estimated PSF. Our goal is to find the optimal $\hat{D}_b$ that minimizes the approximation error in (1). Given a full-row-rank matrix $A^{tr}_{sh}$, this problem is solved in closed form by the Method of Optimal Directions (MOD):
$$\hat{D}_b = P^{tr}_b \left(A^{tr}_{sh}\right)^T \left[A^{tr}_{sh}\left(A^{tr}_{sh}\right)^T\right]^{-1} \qquad (2)$$
In terms of $\hat{D}_b$ and $\alpha^{tr}_{sh}$, a training patch $p^{tr}_b$ can be approximated as follows:
$$p^{tr}_b \approx \hat{D}_b\,\alpha^{tr}_{sh} = \tilde{D}_b\left[\alpha^{tr}_{sh,1}\|\hat{d}_{b,1}\|_2,\ \alpha^{tr}_{sh,2}\|\hat{d}_{b,2}\|_2,\ \ldots,\ \alpha^{tr}_{sh,K}\|\hat{d}_{b,K}\|_2\right]^T \qquad (3)$$
where $K$ is the number of dictionary atoms.
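The dictionary adaptation of (2) and the normalization in (3) take only a few lines of linear algebra. The following is a minimal NumPy sketch. It assumes that features are stored column-wise and that $A^{tr}_{sh}$ is already available (for example, from OMP over the sharp dictionary) and has full row rank. The function names are hypothetical, and solving a linear system instead of forming the explicit inverse in (2) is our implementation choice.

```python
import numpy as np

def mod_adaptive_dictionary(P_b, A_sh):
    """MOD closed form of Eq. (2): D_b = P_b A^T (A A^T)^{-1}.

    P_b  : (m, N) features of blurred training patches, one column per patch
    A_sh : (K, N) sparse codes of the sharp versions (full row rank assumed)
    """
    G = A_sh @ A_sh.T  # K x K Gram matrix, symmetric
    # Solve G X = A P_b^T, so X = D_b^T; avoids an explicit inverse.
    return np.linalg.solve(G, A_sh @ P_b.T).T

def normalize_atoms(D):
    """Split D into unit-norm atoms (D_tilde) and their l2-norms, as in Eq. (3)."""
    norms = np.linalg.norm(D, axis=0)
    return D / norms, norms

def sharp_codes_from_blurred(alpha_b, norms):
    """Undo the scaling of Eq. (3): alpha_sh_j = alpha_b_j / ||d_b,j||_2."""
    return alpha_b / norms
```

In Framework I, `alpha_b` would come from sparse-coding a blurred test patch over `D_tilde`, for example with OMP at sparsity L. The rescaled codes then stand in for the sharp codes in the pooling step. The OMP solver itself is not shown.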
Here $\tilde{D}_b$ denotes the normalized dictionary, each atom of which is a unit vector, and $\|\hat{d}_{b,j}\|_2$ denotes the $\ell_2$-norm of the $j$th atom $\hat{d}_{b,j}$ of $\hat{D}_b$. Most methods for computing sparse coefficients require $\hat{D}_b$ to be normalized. Moreover, we assume that the same relation also holds for a testing patch:
$$p^{te}_b \approx \tilde{D}_b\left[\alpha^{te}_{sh,1}\|\hat{d}_{b,1}\|_2,\ \alpha^{te}_{sh,2}\|\hat{d}_{b,2}\|_2,\ \ldots,\ \alpha^{te}_{sh,K}\|\hat{d}_{b,K}\|_2\right]^T \approx \tilde{D}_b\left[\alpha^{te}_{b,1},\ \alpha^{te}_{b,2},\ \ldots,\ \alpha^{te}_{b,K}\right]^T \qquad (4)$$
This means that during recognition, $\alpha^{te}_{sh}$ can be deduced from $\alpha^{te}_b$ without deblurring the blurred patch. Each element of $\alpha^{te}_b$ is simply divided by the $\ell_2$-norm of the corresponding atom of $\hat{D}_b$.
2) Framework II
PSF estimation based on Ensemble Learning is very time-consuming [7]. To be more efficient and obtain a better PSF estimate, we propose a second framework that closely combines PSF estimation with sparse coefficient calculation. This framework still uses blur-insensitive sparse coefficients for recognition. However, SIFT features are not suitable for representing the image content and cannot be used to infer the PSF. To address this issue, we introduce a joint feature of gradient and SIFT that bridges the gap between recognition and representation. The gradient feature thus plays two roles: it is used to infer the sparse coefficients for recognition, and it represents the image so that the PSF can be estimated. Accordingly, Framework II is composed of two phases. The first phase uses sharp training images to learn a joint dictionary that represents an image patch from two aspects: the SIFT feature and the gradient feature. The joint dictionary learning model is as follows:
$$\{\hat{D}, \hat{A}^{tr}\} = \arg\min_{D,\,A^{tr}} \left\| P^{tr} - D A^{tr} \right\|_F^2 \quad \text{s.t.}\ \ d_j^T d_j = 1,\ j = 1, 2, \ldots, K;\ \ \|\alpha^{tr}_i\|_0 \le L,\ i \in \Omega \qquad (5)$$
where $P^{tr}$ denotes the joint data composed of SIFT and gradient features, $A^{tr}$ denotes the corresponding sparse coefficients, and $i \in \Omega$ indexes the patches. Once $\hat{D}$ is obtained, its gradient part $\hat{D}_{grad}$ is separated out and used to approximate $p^{tr}_{grad}$ as follows:
$$p^{tr}_{grad} \approx \hat{D}_{grad}\,\alpha^{tr} = \tilde{D}_{grad}\left[\alpha^{tr}_1\|\hat{d}_{grad,1}\|_2,\ \alpha^{tr}_2\|\hat{d}_{grad,2}\|_2,\ \ldots,\ \alpha^{tr}_K\|\hat{d}_{grad,K}\|_2\right]^T \qquad (6)$$
As in (3), $\tilde{D}_{grad}$ denotes the normalized dictionary, each atom of which is a unit vector. $\hat{D}$ is used to train the SVM classifier, and $\tilde{D}_{grad}$ is used to represent the image and infer the PSF in the second phase. The second phase infers the PSF and computes the sparse coefficients used for recognition.
$$\{\hat{k}, \hat{A}^{te}\} = \arg\min_{k,\,A^{te}} \left\| \nabla B - \left[\sum_{i\in\Omega} R_i^T R_i\right]^{-1}\left[\sum_{i\in\Omega} R_i^T \tilde{D}_{grad}\,\alpha^{te}_i\right] \otimes k \right\|_2^2 + \eta\|k\|_2^2 \quad \text{s.t.}\ \ \|\alpha^{te}_i\|_0 \le L,\ i\in\Omega \qquad (7)$$
Here $k$ is the PSF, $\nabla B$ is the gradient of the input blurred image (horizontal and vertical derivatives), $R_i$ is the matrix that extracts the $i$th patch from the image, and $\otimes$ denotes convolution. The term $\|k\|_2^2$ is a Tikhonov regularizer providing a smooth PSF prior, and $\eta$ is the regularization factor. With an alternating minimization scheme, model (7) is split into two sub-problems: estimation of $k$ and calculation of $A^{te}$. One of the two variables must be initialized before the sub-problems are solved iteratively. We initialize $A^{te}$ as follows:
$$\hat{\alpha}^{te}_i = \arg\min_{\alpha^{te}_i} \left\| p^{te}_{b,i} - \tilde{D}_{grad}\,\alpha^{te}_i \right\|_2^2 \quad \text{s.t.}\ \ \|\alpha^{te}_i\|_0 \le L,\ i\in\Omega \qquad (8)$$
where $p^{te}_{b,i}$ denotes the gradients of the $i$th patch of the input blurred image. The two sub-problems are then solved alternately until the stopping condition is satisfied. We use a fixed number of iterations as the stopping condition, and usually only a few iterations are required.
a) PSF estimation
Given the current $\hat{A}^{te,(n-1)}$, $k$ is updated by minimizing the following model:
$$\hat{k}^{(n)} = \arg\min_k \left\| \nabla B - \left[\sum_{i\in\Omega} R_i^T R_i\right]^{-1}\left[\sum_{i\in\Omega} R_i^T \tilde{D}_{grad}\,\hat{\alpha}^{te,(n-1)}_i\right] \otimes k \right\|_2^2 + \eta\|k\|_2^2 = \arg\min_k \left\| \nabla B - \nabla X \otimes k \right\|_2^2 + \eta\|k\|_2^2 \qquad (9)$$
where $\nabla X$ is the gradient of the latent sharp image, assembled from the current patch estimates. $\hat{k}^{(n)}$ is given in closed form as follows:
$$\hat{k}^{(n)} = F^{-1}\left( \frac{\overline{F(\partial_x X)}\,F(\partial_x B) + \overline{F(\partial_y X)}\,F(\partial_y B)}{\overline{F(\partial_x X)}\,F(\partial_x X) + \overline{F(\partial_y X)}\,F(\partial_y X) + \eta} \right) \qquad (10)$$
where $F(\cdot)$ and $F^{-1}(\cdot)$ denote the FFT and inverse FFT, respectively, $\overline{F(\cdot)}$ denotes the complex conjugate, and $\partial_x$ and $\partial_y$ denote the horizontal and vertical derivatives.
b) Sparse coefficient calculation
Given the current $\hat{k}^{(n)}$, $A^{te}$ is updated in two steps. First, $\hat{k}^{(n)}$ is used to blur all training image patches, and the adaptive dictionary $\hat{D}^{(n)}_{b,grad}$ is obtained in the same way as in (2). Then $\hat{A}^{te,(n)}$ is updated as follows:
$$\hat{\alpha}^{te,(n)}_{b,i} = \arg\min_{\alpha^{te}_{b,i}} \left\| p^{te}_{b,i} - \tilde{D}^{(n)}_{b,grad}\,\alpha^{te}_{b,i} \right\|_2^2 \quad \text{s.t.}\ \ \|\alpha^{te}_{b,i}\|_0 \le L,\ i\in\Omega \qquad (11)$$
$$\hat{\alpha}^{te,(n)}_i = \left[\hat{\alpha}^{te,(n)}_{b,i,1}/\|\hat{d}^{(n)}_{b,grad,1}\|_2,\ \hat{\alpha}^{te,(n)}_{b,i,2}/\|\hat{d}^{(n)}_{b,grad,2}\|_2,\ \ldots,\ \hat{\alpha}^{te,(n)}_{b,i,K}/\|\hat{d}^{(n)}_{b,grad,K}\|_2\right]^T \qquad (12)$$
where $\tilde{D}^{(n)}_{b,grad}$ denotes the normalized blurred dictionary after the $n$th iteration. Once the preset iteration number $T$ is reached, the final sparse coefficients used for recognition are obtained according to (6):
$$\hat{\alpha}_i = \left[\hat{\alpha}^{te,(T)}_{i,1}/\|\hat{d}_{grad,1}\|_2,\ \hat{\alpha}^{te,(T)}_{i,2}/\|\hat{d}_{grad,2}\|_2,\ \ldots,\ \hat{\alpha}^{te,(T)}_{i,K}/\|\hat{d}_{grad,K}\|_2\right]^T \qquad (13)$$
In sum, two sets of scale factors link the coefficients. First, the gradient-feature coefficients of a blurred patch (computed with $\hat{D}^{(n)}_{b,grad}$) and of its sharp version (computed with $\tilde{D}_{grad}$) are related by a set of scale factors. Second, the coefficients of the joint SIFT-gradient feature (computed with $\hat{D}$) and of the gradient feature alone (computed with $\tilde{D}_{grad}$) are also related by a set of scale factors. Hence, the sparse coefficients of the gradient feature of a blurred patch obtained from (11) can be used to predict those of the joint feature of its sharp version, as described in (12) and (13).
2.3 Efficiency improvement consideration
In both frameworks, a large set of training patches is used to learn the dictionary for classifier design. However, adaptive blurred dictionary learning does not need all the training patches. Moreover, an SVM needs only its support vectors to classify a pattern, so it is reasonable to assume that the support vectors contain most of the information useful for recognition. We therefore propose an acceleration scheme that achieves a good trade-off between efficiency and performance. Only the training patches from the training images that correspond to support vectors are blurred and used to learn the adaptive blurred dictionary.
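The closed-form PSF update (10) of Framework II can be sketched in NumPy as follows. The sketch makes several assumptions: circular boundary conditions (so that the FFT diagonalizes convolution), derivative images of equal size, and a known odd kernel support `ksize`. The cropping, clipping and renormalization steps, and the name `psf_update`, are our own and are not details given in the text. Assembling $\nabla X$ from the patch estimates is not shown.

```python
import numpy as np

def psf_update(Bx, By, Xx, Xy, eta, ksize):
    """Tikhonov-regularized closed-form PSF estimate, following Eq. (10).

    Bx, By : horizontal / vertical derivatives of the blurred image B
    Xx, Xy : horizontal / vertical derivatives of the current latent image X
    Assumes periodic (circular) convolution, so the FFT diagonalizes it.
    """
    FBx, FBy = np.fft.fft2(Bx), np.fft.fft2(By)
    FXx, FXy = np.fft.fft2(Xx), np.fft.fft2(Xy)
    num = np.conj(FXx) * FBx + np.conj(FXy) * FBy
    den = np.conj(FXx) * FXx + np.conj(FXy) * FXy + eta
    k_full = np.real(np.fft.ifft2(num / den))
    # The estimate has its origin at index (0, 0); move it to the centre
    # and crop a ksize x ksize support (assumed known and odd).
    k_full = np.fft.fftshift(k_full)
    cy, cx = k_full.shape[0] // 2, k_full.shape[1] // 2
    h = ksize // 2
    k = k_full[cy - h:cy + h + 1, cx - h:cx + h + 1]
    # Derivative filters remove the DC term, so (10) does not fix the sum of k.
    # Clipping negatives and renormalizing to unit sum is an assumed post-step.
    k = np.clip(k, 0, None)
    return k / k.sum()
```

Because only image gradients enter (10), the zero-frequency component of the kernel is unconstrained. This is why the sketch renormalizes the cropped kernel instead of trusting its raw sum.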
III. EXPERIMENTS AND RESULTS
We implement the proposed frameworks and carry out the experiments on the MATLAB platform.
3.1 Image database
The tested database is Caltech101. All 102 categories are used for training, with 20 randomly selected images of each category as training samples. Ten categories are used for testing: accordion, pizza, buddha, car-side, leopards, lotus, pyramid, rooster, gramophone and Windsor-chair. Twenty samples of each of these categories are tested to evaluate the proposed frameworks. Some samples of Caltech101 are shown in Fig. 1.
3.2 Tested blur kernels
The tested blur kernels (PSFs) are a Gaussian kernel, motion kernel 1 (generated by a MATLAB function) and motion kernel 2 (provided by Levin [9]). Their details are as follows:
(1) Gaussian kernel: a Gaussian low-pass filter of size 9×9 with standard deviation 5. This kernel simulates blur resulting from camera defocus.
(2) Motion kernel 1: linear motion with a length of 20 pixels and a direction of 45°. This kernel simulates blur resulting from simple relative motion between object and camera.
(3) Motion kernel 2: the sixth kernel in the file LevinEtalCVPR09Data.rar. This kernel simulates blur resulting from camera shake.
The three kernels and the corresponding blurred images are illustrated in Fig. 2.
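The two synthetic kernels can be approximated as follows. This NumPy sketch makes explicit assumptions. The Gaussian matches the stated 9×9 size and σ = 5. The motion kernel is a simple rasterized line segment, so it is not an exact replica of MATLAB's `fspecial('motion', 20, 45)`, which is one common way to generate such a kernel and applies its own anti-aliasing. Motion kernel 2 comes from Levin's data [9] and is not reproduced. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=5.0):
    """Normalized size x size Gaussian PSF (defocus model)."""
    r = (size - 1) / 2.0
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return k / k.sum()

def linear_motion_kernel(length=20, angle_deg=45.0, supersample=10):
    """Approximate linear-motion PSF: a centred line segment of `length`
    pixels at `angle_deg` (counter-clockwise), rasterized by dense sampling.
    `length` must be an int."""
    theta = np.deg2rad(angle_deg)
    half = (length - 1) / 2.0
    size = int(np.ceil(length)) | 1  # odd support so the segment is centred
    c = size // 2
    k = np.zeros((size, size))
    t = np.linspace(-half, half, length * supersample)
    xs = np.rint(c + t * np.cos(theta)).astype(int)
    ys = np.rint(c - t * np.sin(theta)).astype(int)  # image rows grow downward
    np.add.at(k, (ys, xs), 1.0)  # accumulate samples per pixel
    return k / k.sum()
```

Either kernel can then be applied to a sharp image by 2-D convolution, for example with `scipy.signal.convolve2d(img, k, mode='same', boundary='symm')`.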
Fig. 1. Sample images in Caltech101.
Fig. 2. (a) Sharp images. (b) Gaussian kernel and blurred images. (c) Motion kernel 1 and blurred images. (d) Motion kernel 2 and blurred images.
The sparsity level is set to L = 5 and the number of dictionary atoms to K = 1024. In Framework II, η and T are set as η = and T = respectively. The grid size is 16×16 pixels, so the gradient feature (horizontal and vertical derivatives) has dimensionality 512. Principal Component Analysis (PCA) is therefore used to reduce its dimensionality before it is combined with the SIFT feature.
Altogether, four methods and three kernels are evaluated, and the results are listed in Table I. Besides the two proposed frameworks, two methods are compared: recognition with the sharp dictionary, and deblurring with the Richardson-Lucy algorithm before recognition. The recognition accuracy on sharp images is 75%, which roughly agrees with the result reported in [6]. However, when blurred images are recognized with the sharp dictionary, performance declines dramatically. After the blur is removed with the Richardson-Lucy algorithm, accuracy increases to some degree. The proposed frameworks obtain higher accuracy, and Framework II achieves the highest accuracy for all three blur kernels.
TABLE I. ACCURACY COMPARISON OF MULTIPLE METHODS AND KERNELS
Kernel / Method | Using sharp dictionary | Deblurring with R-L algorithm | Framework I | Framework II
Gaussian | 47.5% | 61% | 64%
Motion1 | 46.5% | 57% | 62%
Motion2 | 55.5% | 71% | 69%
The accuracy of sharp image classification is 75%.
IV. CONCLUSIONS AND FUTURE WORK
In this paper, we propose two frameworks for blurred image classification, assuming a space-invariant blur kernel. Both frameworks are based on an adaptive dictionary, and neither requires image deblurring. The essential idea is that, for every input image, a new dictionary is learned that is adapted to the PSF inferred from that blurred image. Therefore, for each blurred image patch, the sparse coefficients obtained with the adaptive dictionary are insensitive to arbitrary blur. Of the two frameworks, the latter performs better than the former. The former infers the PSF in a separate step, whereas the latter updates the PSF and the sparse coefficients of the gradient feature alternately, which better combines PSF estimation and sparse coefficient calculation. The proposed frameworks can handle blur ranging from camera defocus and simple relative motion between camera and object to camera shake. Future work may proceed in two directions. First, the adaptive dictionary could be learned from the input image itself rather than from an external image database. Second, the frameworks could be extended to recognize images with space-variant blur.
REFERENCES
[1] J. Heikkila and V. Ojansivu, "Methods for local phase quantization in blur-insensitive image analysis," International Workshop on Local and Non-Local Approximation in Image Processing, 2009.
[2] J. Heikkila, V. Ojansivu, and E. Rahtu, "Improved blur insensitivity for decorrelated local phase quantization," 20th International Conference on Pattern Recognition, 2010.
[3] H. Zhang, H. Shu, et al., "Blurred image recognition by Legendre moment invariants," IEEE Transactions on Image Processing, vol. 19, no. 3, March 2010.
[4] M. Nishiyama, A. Hadid, et al., "Facial deblur inference using subspace analysis for recognition of blurred faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 4, April 2011.
[5] H. Zhang, J. Yang, et al., "Close the loop: Joint blind image restoration and recognition with sparse representation prior," in ICCV, 2011.
[6] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in CVPR, 2009.
[7] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, "Removing camera shake from a single photograph," ACM Transactions on Graphics, vol. 25, no. 3, pp. 787-794, 2006.
[8]