Sparse Graph-based Transduction for Image Classification
Sheng Huang (a), Dan Yang (a,∗), Jia Zhou (a), Luwen Huangfu (b), Xiaohong Zhang (c,d)

(a) College of Computer Science, Chongqing University, Chongqing, 400044, P.R.C.
(b) Eller College of Management, University of Arizona, Tucson, AZ, 85712, USA
(c) School of Software Engineering, Chongqing University, Chongqing, 400044, P.R.C.
(d) Ministry of Education Key Laboratory of Dependable Service Computing in Cyber Physical Society, Chongqing, 400044, P.R.C.
Abstract
Motivated by the remarkable successes of Graph-based Transduction (GT) and Sparse Representation (SR), we present a novel classifier named the Sparse Graph-based Classifier (SGC) for image classification. In SGC, SR is leveraged to measure the correlation (similarity) between each pair of samples, and a graph is constructed to encode these correlations. Laplacian eigenmapping is then adopted to derive the graph Laplacian of this graph. Finally, SGC is obtained by plugging the graph Laplacian into the conventional GT framework. In the image classification procedure, SGC utilizes the correlations, which are encoded in the learned graph Laplacian, to infer the labels of unlabeled images. SGC inherits the merits of both GT and SR: inheriting from SR, it improves the robustness and the discriminating power of GT; inheriting from GT, it sufficiently exploits the whole data set and thereby alleviates the undercomplete-dictionary issue suffered by SR. Four popular image databases are employed for evaluation. The results demonstrate that SGC achieves a promising performance in comparison with state-of-the-art classifiers, particularly in the small training sample size case and the noisy sample case.
Keywords:
Image Classification, Sparse Representation, Graph Learning, Transductive Learning, Semi-supervised Learning
1. Introduction
As two popular techniques for classification, Sparse Representation (SR) and Graph-based Transduction (GT) have attracted a lot of attention in the machine learning, computer vision and image processing communities [1, 2, 3, 4, 5, 6, 7]. The idea of SR stems from compressed sensing: most signals have a sparse representation as a linear combination of a reduced subset of signals from the same space [1, 8]. The basic idea of GT is to utilize the similarities between each pair of samples to infer the labels of unlabeled samples, where these similarities are encoded in a graph or hypergraph [7, 9, 10, 11, 12]. In SR, the signals tend to have a representation biased towards their own class, and only the most relevant signals are highlighted [4]. These facts endow SR with strong discriminating power and robustness to noise. However, an important prior condition of SR is that the dictionary must be overcomplete. When training samples are lacking (the undercomplete-dictionary case), which is actually very common in real-world applications, the dictionary constructed from the training samples is too small to sparsely represent the query sample, which restricts the classification performance of SR. Another shortcoming of SR is that it cannot utilize the self-similarities of the training data or the self-similarities of the testing data. On the contrary, GT can well alleviate these shortcomings, since the graph, which is the core of GT and encodes the similarities, is constructed from both training and testing samples. In other words, all data can be sufficiently exploited. The main problem of current GT approaches is that they are easily corrupted by noise. This is due to the fact that most GT approaches generate their graphs (or hypergraphs) by k-nearest-neighbour or ε-ball construction [13].

∗ Corresponding author (Dan Yang): [email protected]
However, improving the robustness to noise is exactly what SR is good at. Apparently, the advantages of SR and GT are complementary. So a question arises: is there a classification approach that can combine SR and GT and inherit their advantages? Fortunately, this paper gives a positive answer.

Figure 1: The top 10 most relevant face images selected by SR and K-Nearest Neighbour (KNN) for a given query face image: (a) raw samples and their rank scores; (b) samples with noise and their rank scores. This experiment is conducted on a subset of the FERET database [14] (72 subjects with 6 images per subject). The first two rows of the figure are the selection results of SR, while the last two rows are the selection results of KNN. The left subfigure reports the results on the original FERET database, while the right one reports the results on a modified FERET database in which 30% of the pixels of each image have been corrupted by noise. In each image array, the first face image is the query image and the remaining ten images are the relevant face images selected by SR or KNN. The histograms above each image array show the confidence scores of the top ten relevant face images; a bar is positive if the returned face image and the query face image belong to the same subject, and negative otherwise. SR gets five hits on both the original and the noisy FERET databases, while KNN gets only three and two hits on these two datasets respectively. Clearly, this phenomenon verifies that the sparse graph, which is generated by SR, is more discriminative and robust.

Recently, many works have leveraged SR to construct a sparse graph (or ℓ1-graph) for tackling subspace learning, clustering and semi-supervised learning tasks [3, 4, 15, 13, 16]. These approaches achieve remarkable successes because the sparse graph incorporates the merits of SR: it is more discriminative and robust than the conventional graph. Although many impressive related works have been proposed, as far as we know, there is no prior work that directly employs the sparse graph for transduction. In this paper, we utilize the sparse graph to present a novel Graph-based Transduction (GT) algorithm for classification. Following the same graph construction manner as [4, 13], each sample is taken out in turn as the query sample, and the remaining samples are considered as the dictionary of a Sparse Representation (SR) system in which the correlations (or similarities) between the query sample and the other samples are measured. In this way, a sparse graph encoding the correlations between each pair of samples can be constructed, and it is not hard to derive its graph Laplacian. Note that the graph Laplacian is constructed from both training samples and testing samples. Finally, we obtain our proposed classification approach by plugging this graph Laplacian into the conventional Graph-based Transduction framework. We name this novel graph-based classification approach the Sparse Graph-based Classifier (SGC). SGC inherits the advantages of both SR and GT, which is exactly the positive answer to the aforementioned question. Compared with SR, since the graph Laplacian is constructed from both training and testing samples, SGC can not only use the correlations between the given testing sample and the training samples, as the traditional SR-based classifier does, but also use the correlations within the testing data and within the training data to further improve the discriminating power of SR. Moreover, since the testing samples are used to complement a larger dictionary, SGC alleviates the undercomplete-dictionary issue suffered by SR [1]. Compared with GT, the graph Laplacian of SGC is generated by SR instead of k-nearest-neighbour or ε-ball construction.
So there are two merits inherited from SR: the relevant samples can be better and adaptively selected for each sample to constitute its local clique (or neighbourhood), and the obtained graph is more robust to noise [1, 13] (see the examples in Figure 1). We apply our work to image classification. The Yale [17], AR [18], FERET [14] and Caltech256 [19] databases are employed for evaluation. The experimental results show that our method achieves promising results in comparison with state-of-the-art classifiers, particularly in the small training sample size case.

The rest of the paper is organized as follows: Section 2 presents the related works; Section 3 describes the proposed approach; Section 4 shows the experimental evaluation of our work; the conclusion is summarized in Section 5.
2. Related Works
2.1. Sparse Representation

Sparse Representation (SR) has been a hot topic in the recent decade and is widely applied in extensive areas [1, 5, 13, 20, 21, 4]. Since SR enjoys good discriminating power and robustness to noise, it is often considered a popular classification technique. For example, Wright et al. considered the testing face image as a query and the training face images as the visual dictionary of an SR system to address the face recognition task [1]. Gao et al. kernelized this approach and applied the kernel version to face recognition and image classification [5]. To overcome the undercomplete-dictionary situation and further improve the performance of SR-based face recognition, Ma et al. [2] complemented the visual dictionary by adding the gradient images of the faces; however, the original faces and the gradient faces lie in entirely different feature domains. Agarwal et al. introduced a work for learning a sparse, part-based representation for object detection [20]. Yuan et al. presented a multitask joint sparse representation model to combine the strength of multiple features and/or instances for visual classification [21]. Although these SR-based approaches achieve remarkable successes, two main shortcomings are still not essentially overcome. The first is that SR cannot perform well in the undercomplete-dictionary case (the small training sample size case). The second is that conventional SR can only utilize the correlations (or similarities) between the training samples and the testing sample to infer the class label, and cannot sufficiently exploit the correlations among the training samples or among the testing samples. The proposed Sparse Graph-based Classifier (SGC) can well overcome these two shortcomings.

2.2. Sparse Graph
Motivated by the recent successes of SR [1, 5, 22], some researchers leverage SR instead of the conventional k-nearest-neighbour or ε-ball construction to build a sparse graph for addressing different issues [3, 4, 13, 15, 16]. More specifically, Qiao et al. and Timofte et al. successively used SR to construct a sparse graph for dimensionality reduction [4, 15]. The Sparse Subspace Clustering (SSC) algorithms [3, 23, 24] learn a sparse graph for clustering by casting the data self-representation problem as an SR issue. Similar to [4, 15], Cheng et al. utilized SR to construct the ℓ1-graph (sparse graph) for spectral clustering, subspace learning and semi-supervised learning [13]. Although the applications and the learning (or construction) procedures of these works are very different, the obtained sparse graphs are very similar, and all demonstrate good discriminative ability and robustness. In this paper, we intend to use the sparse graph to present a GT algorithm that incorporates these desirable properties. Like these works [4, 13, 15, 16], our approach is also an application of the sparse graph.

2.3. Graph-based Transduction

As a transductive learning algorithm, Graph-based Transduction (GT) labels samples based on the similarities between each pair of samples (whether training or testing samples), where these similarities are encoded in a graph (or hypergraph). In other words, GT can sufficiently exploit the information of the whole data set, and therefore it often performs well in the small training sample size case. This fact makes GT a very popular approach for classification and labeling [6, 7, 9, 11, 25, 26]. For example, Duchenne et al. presented a state-of-the-art segmentation method by leveraging conventional GT to infer the label of each pixel [9].
Graph Transduction via Alternating Minimization (GTAM) enhanced GT by introducing a propagation algorithm that more reliably minimizes a cost function over both a function on the graph and a binary label matrix, and applied it to classification [11]. Similarly, in order to address the classification issue, Orbach et al. presented a new GT algorithm that introduces an additional quantity of confidence in label assignments and learns it jointly with the weights [25]. Zhou et al. provided a new way to construct the hypergraph and used it to replace the graph in the GT framework for tackling a labeling task [7]. Following the same framework as [7], Yu et al. presented a GT-based image classification approach that adaptively generates the hyperedges and learns their weights [6]. From this short review, it is not hard to conclude that one of the important factors affecting the success of a GT algorithm is the quality of the graph (or hypergraph). The graphs (or hypergraphs) of the aforementioned approaches are generated by k-nearest-neighbour or ε-ball construction. However, some works have indicated that such graphs can be easily corrupted by noise [13] (see the examples in Figure 1). Inspired by the approaches mentioned in Section 2.2 [4, 15, 13], in our approach we adopt the more robust and discriminative sparse graph to alleviate this problem.
3. Methodology
The graph plays a very important role in Graph-based Transduction (GT), since it depicts the relationships (similarities or correlations) of the samples, which are regarded as the basis for classification (or labeling). However, the conventional GT approaches generate their graphs (or hypergraphs) by k-nearest-neighbour or ε-ball construction. It has been shown that these graphs often cannot well reveal the real relationships of samples, due to noise and other factors [13]. Some recent works [4, 13, 15] indicate that using Sparse Representation (SR) can generate a more discriminative and robust graph. So, in this section, we introduce how to use SR to construct a high quality graph.

Following the same graph construction manner as [4, 13], we take out one sample from the whole dataset and consider the remaining samples as the dictionary of an SR system. Let the d × n matrix X = [x_1, ..., x_i, ..., x_n] be the sample matrix, where d is the dimension of a sample and n is the number of samples. We denote the sample that we want to represent by x_q, where q is its index. The matrix X_{i≠q} = [x_1, ..., x_{q−1}, x_{q+1}, ..., x_n] is the sample matrix that excludes the sample x_q. The correlations (or similarities) between the query sample x_q and the other samples are measured by solving the following SR problem

  ĉ_q = arg min_{c_q} ||c_q||_0,   s.t. ||x_q − X_{i≠q} c_q^T|| ≤ ε    (1)

where the vector c_q = [c_q(1), ..., c_q(q−1), c_q(q+1), ..., c_q(n)]^T holds the representation coefficients (regression weights) of the sample x_q, c_q(t) is the element of c_q corresponding to the sample x_t, and ε is the measurement noise. However, this ℓ0-norm constrained representation problem is NP-hard and difficult even to approximate [1, 27].
Only a few very recent works attempt to solve this problem as a non-convex minimization issue [28, 29], and some of them cannot even guarantee convergence. Researchers more often seek a tractable solution by relaxing this ℓ0-norm constrained regression problem into an ℓ1-norm constrained problem:

  ĉ_q = arg min_{c_q} ||c_q||_1,   s.t. ||x_q − X_{i≠q} c_q^T|| ≤ ε    (2)
  ⇒ ĉ_q = arg min_{c_q} { (1 − β) ||x_q − X_{i≠q} c_q^T||^2 + β ||c_q||_1 }

where β is a parameter in the range [0, 1] that controls the trade-off between the reconstruction error and the sparsity. This is a typical convex problem, so it can be solved by many mature convex optimization techniques. Moreover, another reason that the ℓ1-norm may be more suitable for constructing a high quality sparse graph is that, unlike the ℓ0-norm, which only counts the nonzero elements of the coefficients, the ℓ1-norm also pays attention to the values of the coefficients, which indicate the degrees of similarity. Of course, the idea of the sparse graph is general, so other norms can also be applied to construct graphs that incorporate different specific properties.

In our model, we adopt the SLEP package [30] to efficiently solve the problem in Equation 2. The correlation between samples x_i and x_j, which is also regarded as the weight of the edge between x_i and x_j, is calculated as

  w_ij = w_ji = |c_i(j)| + |c_j(i)|    (3)

where w_ij is the (i, j)-th element of the affinity matrix W of the sparse graph. Moreover, we define the self-similarity of a sample as

  w_ii = Σ_{t≠i} w_it    (4)

We use Laplacian eigenmapping [31] to derive the graph Laplacian. The normalized graph Laplacian is computed as

  L = D^{−1/2} (D − W) D^{−1/2} = I − D^{−1/2} W D^{−1/2}    (5)

where D is a diagonal matrix with D_ii = Σ_j w_ij, and I is the identity matrix. This normalized graph Laplacian incorporates the properties of SR: it is more discriminative and enjoys robustness to noise.

Graph-based Transduction (GT) methods label input data by learning a classification function that is regularized to be smooth along a graph over labeled and unlabeled samples [7, 11]. In other words, the GT model can be viewed as a regularized graph cut problem in which the graph cut is considered as a classification function.
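As a concrete illustration, the sparse-graph construction of Equations 2–5 can be sketched in Python. The paper solves the ℓ1 problem with the SLEP package; here scikit-learn's `Lasso` is used as a stand-in (its objective is a rescaled version of the one in Equation 2), so the function name and the α mapping below are our assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_graph_laplacian(X, beta=0.01):
    """Sketch of the sparse-graph construction of Eqs. (2)-(5).

    X : (d, n) sample matrix, one sample per column.
    Returns the affinity matrix W and the normalized Laplacian L.
    """
    d, n = X.shape
    C = np.zeros((n, n))                       # row q holds c_q from Eq. (2)
    for q in range(n):
        D_q = np.delete(X, q, axis=1)          # dictionary excluding x_q
        # Lasso minimizes (1/2d)||x_q - Dc||^2 + alpha*||c||_1, which is
        # Eq. (2) up to a rescaling of the trade-off parameter.
        alpha = beta / (1.0 - beta) / (2 * d)
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        lasso.fit(D_q, X[:, q])
        C[q] = np.insert(lasso.coef_, q, 0.0)  # re-align to n entries
    W = np.abs(C) + np.abs(C).T                # w_ij = |c_i(j)| + |c_j(i)|, Eq. (3)
    np.fill_diagonal(W, 0.0)
    np.fill_diagonal(W, W.sum(axis=1))         # self-similarity, Eq. (4)
    deg = W.sum(axis=1)                        # D_ii = sum_j w_ij
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]  # Eq. (5)
    return W, L
```

By construction W is symmetric and nonnegative, so L is symmetric positive semi-definite, which is what the transduction objective below relies on.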
Based on the obtained sparse graph Laplacian L, we first formulate our GT method in the binary-class case and then generalize it to the multi-class case. Since our method is based on the sparse graph, we name the proposed GT algorithm Sparse Graph-based Transduction (SGT) and its corresponding classifier the Sparse Graph-based Classifier (SGC). In SGC, a graph cut f is defined as the classification function, and this cut should not only minimize the similarity losses (sparse representation relationship losses) but also reduce the classification errors on the training samples. Mathematically, the model is formulated as

  f̂ = arg min_f { Ω(L, f) + λ Φ(y, f) } = arg min_f { f^T L f + λ ||f − y||^2 }    (6)

where the similarity loss function Ω(L, f) = f^T L f is a normalized cut function [32], and the classification error function Φ(y, f) = ||f − y||^2 measures the classification errors by computing the Euclidean distance between the predicted labels and the ground-truth labels. The vector y is the label vector: its i-th element y(i) is +1 or −1 if the sample x_i has been labeled as positive or negative respectively, and 0 if it is unlabeled. λ is a positive parameter that reconciles the similarity losses Ω(L, f) and the classification errors Φ(y, f). Note that the graph Laplacian L is constructed from both training and testing samples. Moreover, it is worth pointing out that the GT framework is very flexible: one can design these two loss functions differently to address different issues.

We employ the one-versus-all strategy to generalize the algorithm from the binary classification case to the multi-class classification case.
The multi-class version is written as

  F̂ = arg min_F Σ_i { Ω(L, f_i) + λ Φ(y_i, f_i) }
  ⇒ F̂ = arg min_F Σ_i { f_i^T L f_i + λ ||f_i − y_i||^2 }    (7)
  ⇒ F̂ = arg min_F { F^T L F + λ ||F − Y||^2 }

where F = [f_1, ..., f_i, ..., f_c] and Y = [y_1, ..., y_i, ..., y_c] collect the classification functions and the defined label vectors for the different classes, and c is the number of classes. In the label vector y_i, only the samples from class i are considered positive, while the samples from the other classes are considered negative.

Since L is a positive semi-definite matrix, Equation 7 can be efficiently solved as a Regularized Least Squares (RLS) problem. Taking the partial derivative of Equation 7 with respect to F and setting it to zero,

  ∂/∂F { F^T L F + λ ||F − Y||^2 } = 0 ⇒ L F + λ (F − Y) = 0 ⇒ F = λ (L + λ I)^{−1} Y    (8)

Finally, the classification of the i-th sample is accomplished by assigning it to the j-th class that satisfies

  ĵ = arg max_j F_ij    (9)

where F_ij is the (i, j)-th element of the matrix F.

SGT inherits the desirable properties of both GT and SR. More specifically, SGC can well exploit the correlations of both the testing samples and the training samples, since the graph Laplacian is constructed from the whole data set. SGC performs much better in the small training sample size case, since it utilizes the testing samples to complement the dictionary of SR in the sparse graph construction procedure. Moreover, SGC is more discriminative and robust to noise.
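The closed-form solve of Equations 7–9 can be sketched as follows, assuming a precomputed Laplacian L over both training and testing samples. The label encoding (+1 for the class, −1 for other labeled samples, 0 for unlabeled ones) follows the one-versus-all scheme above; the function name and the `labels` convention (−1 marking unlabeled samples) are ours.

```python
import numpy as np

def sgc_predict(L, labels, lam=10.0):
    """Closed-form SGC: F = lam * (L + lam*I)^{-1} Y (Eq. 8), then
    assign sample i to argmax_j F_ij (Eq. 9).

    L      : (n, n) sparse-graph Laplacian over train + test samples.
    labels : length-n int array; class id >= 0 for labeled samples,
             -1 for unlabeled (test) samples.
    """
    n = len(labels)
    classes = np.unique(labels[labels >= 0])
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):            # one-versus-all labels y_i
        Y[labels == c, j] = 1.0
        Y[(labels >= 0) & (labels != c), j] = -1.0
    # L is PSD, so L + lam*I is invertible for any lam > 0.
    F = lam * np.linalg.solve(L + lam * np.eye(n), Y)
    return classes[np.argmax(F, axis=1)]
```

A single linear solve labels every sample (training and testing) at once, which is what makes the transductive formulation attractive.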
4. Experimental Results
The Yale [17], FERET [14], AR [18] and Caltech256 [19] databases are used to evaluate our work. The Yale face database has 15 subjects in total, with 11 samples per subject [17]; the image size is 32 × 32 pixels. The FERET database contains 13539 images of 1565 subjects [14]. Following [10], a subset containing 436 images of 72 individuals is selected for our experiments; this subset involves variations in facial expression, illumination and pose. The AR database consists of more than 4,000 images of 126 subjects [18]; it characterizes divergence from ideal conditions by incorporating various facial expressions, luminance alterations and occlusion modes. Following [33], a subset containing 1680 images of 120 subjects is constructed for our experiments. All these images are 50 × 40 pixels. Similarly, we follow [6] and select a subset of the Caltech256 database [19] with 20 classes and 100 images per class. Since Caltech256 is more challenging than the other databases, we adopt the Picodes feature [34] to represent its images. AR, Yale and FERET are face databases, while Caltech256 is a general image database. Figure 2 shows some example images of these databases.

Figure 2: Sample images of the datasets used in our experiments: (a) Yale, (b) AR, (c) FERET, (d) Caltech256.

The Sparse Representation-based Classifier (SRC) [1], the Collaborative Representation-based Classifier (CRC) [35], LIBSVM [36], the Graph-based Classifier (GC) (the classifier corresponding to the Graph-based Transduction (GT) algorithm [9, 11]), the Normalized Hypergraph-based Classifier (NHC) [7] and the Adaptive Hypergraph-based Classifier (AHC) [6] are employed for comparison. The last three algorithms are all transductive learning methods, and their graph (or hypergraph) matrices are generated based on Euclidean distance (Heat Kernel Weighting).
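For reference, the baselines' graph construction (k-nearest-neighbour with heat-kernel weighting over Euclidean distances) can be sketched as follows; the neighbourhood size k and the kernel width t are free parameters, and the function name is ours.

```python
import numpy as np

def heat_kernel_knn_graph(X, k=5, t=1.0):
    """Baseline affinity used by the compared GT methods: connect each
    sample to its k nearest neighbours (Euclidean distance) and weight
    each edge with the heat kernel exp(-||xi - xj||^2 / t).

    X : (n, d) samples, one per row. Returns a symmetric (n, n) matrix.
    """
    n = len(X)
    sq = np.sum(X ** 2, axis=1)
    # pairwise squared Euclidean distances, clipped at 0 for stability
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D2[i])
        nbrs = [j for j in order if j != i][:k]   # exclude self
        W[i, nbrs] = np.exp(-D2[i, nbrs] / t)
    return np.maximum(W, W.T)                     # symmetrize
```

Note that, unlike the sparse graph of Section 3, this construction requires choosing k by hand, which is one of the weaknesses discussed in the results below.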
We apply the different classifiers to these four databases, and two-fold cross validation is adopted in our experiments. Table 1 reports the classification results. From these observations, we can see that the proposed Sparse Graph-based Classifier (SGC) outperforms all the compared classifiers on the AR, Yale and FERET databases and achieves a very promising performance on the Caltech256 database. Moreover, SGC improves upon both SRC and the GT-based algorithms (NHC, AHC and GC). For example, the average classification accuracy gains of SGC over SRC, NHC, AHC and GC are 1.25%, 12.88%, 12.49% and 9.02% respectively. In the experiments, the GT-based algorithms do not perform well in comparison with SRC and SGC. We believe there are two reasons behind this phenomenon. The first is that k-nearest-neighbour is not discriminative enough to select the relevant samples well. The second is that it is hard to select a suitable k to define a neighbourhood that well reveals the local relationships of the samples, whereas SGC avoids this selection entirely, since the relevant samples are selected adaptively (without specifying any k). We also observe from the classification performances of SRC and SGC that the performance of SGC relies on the performance of SRC. This phenomenon verifies that the core of SGC is the sparse graph, which is generated by SR and incorporates the properties of SR.

In SGC, the graph Laplacian is generated by SR, so SGC should inherit some merits from SRC. Theoretically speaking, compared to the original GT approaches, SGT should be more robust to noise. In this section, we conduct experiments on the AR and FERET databases to validate this. In the experiments, several noisy face databases are constructed by randomly adding salt-and-pepper noise to each face image. We define four noise levels based on the proportion of noisy pixels in an image and study the effect of the noise proportion on the classification performance.
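A sketch of how such noisy copies can be generated, assuming 8-bit gray-scale images; the paper specifies only the noise proportion, so the remaining details (replacing the chosen pixels with 0 or 255 with equal probability) are our assumptions.

```python
import numpy as np

def add_salt_and_pepper(image, proportion, rng=None):
    """Corrupt a given fraction of pixels with salt-and-pepper noise,
    mirroring the noise levels of the robustness experiments.

    image      : 2-D array with values in [0, 255].
    proportion : fraction of pixels to corrupt (e.g. 0.1, 0.2, 0.3).
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    n_noise = int(round(proportion * image.size))
    idx = rng.choice(image.size, size=n_noise, replace=False)
    flat = noisy.reshape(-1)                   # view into the copy
    flat[idx] = rng.choice([0, 255], size=n_noise)  # pepper or salt
    return noisy
```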
Following the experimental setting of the previous section, two-fold cross-validation is adopted to measure the classification performance. Figure 3 shows the experimental results. In this figure, the x-axis indicates the proportion of noise and the y-axis indicates the classification error. From the figure, SGC outperforms SRC and GC in all experiments and behaves similarly to SRC. GC fails quickly, even when only 10% noise is introduced. On the contrary, the classification performances of SRC and SGC drop slowly as the noise percentage increases. Clearly, this phenomenon verifies that SGC is more robust to noise than the conventional GT algorithms.

Table 1: Two-fold cross validation results on the Yale, AR, FERET and Caltech256 databases. Classification error (mean ± STD, %), comparing CRC [35], LIBSVM [36], SRC [1], NHC [7], AHC [6], GC and our method.

              CRC [35]   LIBSVM [36]   SRC [1]   NHC [7]   AHC [6]   GC
  Average     23.52      23.67         20.88     32.51     32.12     28.65

Figure 3: Classification performances of the different approaches (SGC, SRC, GC) under different noise levels (0%–30%): (a) FERET, (b) AR.

The main advantage of GT approaches is that the information of both the training data and the testing data can be fully exploited, so most of the time these approaches perform much better than other approaches in the small training sample size case. As an instance of the GT framework, SGT should also have this desirable property. In this section, we conduct several experiments on the AR and Yale databases to investigate the effect of the training sample size on the classification performance of SGC. In these experiments, a cross-validation strategy is employed to measure the classification performance, and five training sample sizes are defined; for example, if the proportion of training samples is 0.1, we adopt ten-fold cross-validation to conduct the experiments. We plot the classification errors of the different approaches under different training sample sizes in Figure 4, where the x-axis indicates the training sample percentage of the data and the y-axis indicates the mean classification error. From the observations in Figure 4, SGC consistently outperforms SRC, and the improvement of SGC over SRC increases as the training proportion decreases. These phenomena all verify that SGC performs much better than SRC in the small training sample size case.

Figure 4: Classification errors of the different methods (SRC, AHC, NHC, GC, SGC) under different training sample sizes (10%–50%): (a) Yale, (b) AR.

There are two parameters in SGC. One is β, which is introduced by SR and controls the degree of sparsity. The other is λ, which reconciles the correlation loss and the classification errors on the training samples. In this section, we conduct some experiments to study the effects of these parameters on the classification performance. As in the previous sections, two-fold cross-validation is adopted. Figure 5 plots the relationships between the classification error and the parameter values. From this figure, we find that SGC is quite insensitive to β on the three face databases (Yale, AR and FERET) over the smaller values of the tested range, so the same small optimal β can be suggested for all three face databases. For the Caltech256 database, however, the optimal β is much greater, with a value of 0.1. The setting of β on the Caltech256 database differs from that on the three face databases because their features are different: the feature of the face databases is the simple gray-scale image, while the feature of the Caltech256 database is Picodes. Similarly, SGC is quite insensitive to λ when its value is greater than 1. From these observations, we conclude that the optimal λ for all four databases is 10.

Figure 5: The effects of the parameters on the classification performance on the Yale, FERET, AR and Caltech256 databases: (a) the effect of β (tested from 10^−5 to 10^−1); (b) the effect of λ.
5. Conclusion
We introduced Sparse Representation (SR) into Graph-based Transduction (GT) and presented a novel GT-based classifier called the Sparse Graph-based Classifier (SGC) for image classification. In SGC, SR is utilized to measure the correlation between each pair of samples. Then a sparse graph is constructed to depict these correlations. Finally, the graph Laplacian of this graph is plugged into the GT framework to infer the labels of the unlabeled samples. According to the theoretical analysis and the experimental verification on four popular image databases, we conclude that SGC incorporates the advantages of both SR and GT. SGC is a very flexible framework, since its parts are all replaceable, so much interesting work can be done based on SGC. For example, to enhance SGC, one can design the classification error function Φ(y, f) differently or utilize other, more advanced regression techniques in place of SR to construct the high quality graph.

Acknowledgement
The work described in this paper was partially supported by the National Natural Science Foundation of China (Nos. 60975015 and 61173131) and the Fundamental Research Funds for the Central Universities (No. CDJXS11181162). The authors would like to thank the anonymous reviewers and editors for their useful comments.
References

[1] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227.
[2] P. Ma, D. Yang, Y. Ge, X. Zhang, Y. Qu, S. Huang, J. Lu, Robust face recognition via gradient-based sparse representation, Journal of Electronic Imaging 22 (1) (2013) 013018–013018.
[3] E. Elhamifar, R. Vidal, Sparse subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 2790–2797.
[4] R. Timofte, L. Van Gool, Sparse representation based projections, in: British Machine Vision Conference (BMVC), 2011, pp. 61–1.
[5] S. Gao, I. W.-H. Tsang, L.-T. Chia, Kernel sparse representation for image classification and face recognition, in: European Conference on Computer Vision (ECCV), 2010, pp. 1–14.
[6] J. Yu, D. Tao, M. Wang, Adaptive hypergraph learning and its application in image classification, IEEE Transactions on Image Processing 21 (7) (2012) 3262–3272.
[7] D. Zhou, J. Huang, B. Schölkopf, Learning with hypergraphs: Clustering, classification, and embedding, in: Advances in Neural Information Processing Systems (NIPS), 2006, pp. 1601–1608.
[8] D. L. Donoho, For most large underdetermined systems of linear equations the minimal L1-norm solution is also the sparsest solution, Communications on Pure and Applied Mathematics 59 (6) (2006) 797–829.
[9] O. Duchenne, J.-Y. Audibert, R. Keriven, J. Ponce, F. Ségonne, Segmentation by transduction, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[10] S. Huang, D. Yang, Y. Ge, D. Zhao, X. Feng, Discriminant hyper-Laplacian projections with its application to face recognition, in: IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2014, pp. 1–6.
[11] J. Wang, T. Jebara, S.-F. Chang, Graph transduction via alternating minimization, in: International Conference on Machine Learning (ICML), 2008, pp. 1144–1151.
[12] X. Zhu, Z. Ghahramani, J. Lafferty, et al., Semi-supervised learning using Gaussian fields and harmonic functions, in: International Conference on Machine Learning (ICML), Vol. 3, 2003, pp. 912–919.
[13] B. Cheng, J. Yang, S. Yan, Y. Fu, T. S. Huang, Learning with L1-graph for image analysis, IEEE Transactions on Image Processing 19 (4) (2010) 858–866.
[14] P. J. Phillips, H. Wechsler, J. Huang, P. J. Rauss, The FERET database and evaluation procedure for face-recognition algorithms, Image and Vision Computing 16 (5) (1998) 295–306.
[15] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (1) (2010) 331–341.
[16] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, N. Yu, Non-negative low rank and sparse graph for semi-supervised learning, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2328–2335.
[17] A. M. Martínez, A. C. Kak, PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2) (2001) 228–233.
[18] A. Martínez, R. Benavente, The AR Face Database (Jun 1998).
[19] G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset.
[20] S. Agarwal, D. Roth, Learning a sparse representation for object detection, in: European Conference on Computer Vision (ECCV), 2002, pp. 113–127.
[21] X.-T. Yuan, X. Liu, S. Yan, Visual classification with multitask joint sparse representation, IEEE Transactions on Image Processing 21 (10) (2012) 4349–4360.
[22] C.-Y. Lu, H. Min, J. Gui, L. Zhu, Y.-K. Lei, Face recognition via weighted sparse representation, Journal of Visual Communication and Image Representation 24 (2) (2013) 111–116.
[23] X. Peng, L. Zhang, Z. Yi, Scalable sparse subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 430–437.
[24] G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: International Conference on Machine Learning (ICML), 2010, pp. 663–670.
[25] M. Orbach, K. Crammer, Graph-based transduction with confidence, in: Machine Learning and Knowledge Discovery in Databases, 2012, pp. 323–338.
[26] X. Yang, X. Bai, L. J. Latecki, Z. Tu, Improving shape retrieval by learning graph transduction, in: European Conference on Computer Vision (ECCV), 2008, pp. 788–801.
[27] E. Amaldi, V. Kann, On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems, Theoretical Computer Science 209 (1) (1998) 237–260.
[28] X.-T. Yuan, T. Zhang, Truncated power method for sparse eigenvalue problems, The Journal of Machine Learning Research 14 (1) (2013) 899–925.
[29] L. Xu, S. Zheng, J. Jia, Unnatural L0 sparse representation for natural image deblurring, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1107–1114.
[30] J. Liu, S. Ji, J. Ye, SLEP: Sparse learning with efficient projections, Arizona State University 6.
[31] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation 15 (6) (2003) 1373–1396.
[32] J. Shi, J.
Malik, Normalized cuts and image segmentation, IEEE Trans-actions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 888–905.[33] I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recog-nition, IEEE Transactions on Pattern Analysis and Machine Intelligence32 (11) (2010) 2106–2112.[34] A. Bergamo, L. Torresani, A. W. Fitzgibbon, Picodes: Learning a com-pact code for novel-category recognition, in: Advances in Neural Infor-mation Processing Systems (NIPS), 2011, pp. 2088–2096.[35] D. Zhang, M. Yang, X. Feng, Sparse representation or collaborative rep-resentation: Which helps face recognition?, in: IEEE International Con-ference on Computer Vision (ICCV), 2011, pp. 471–478.[36] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines,ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1–27:27.20] S. Agarwal, D. Roth, Learning a sparse representation for object detec-tion, in: European Conference on Computer Vision (ECCV), 2002, pp.113–127.[21] X.-T. Yuan, X. Liu, S. Yan, Visual classification with multitask jointsparse representation, IEEE Transactions on Image Processing 21 (10)(2012) 4349–4360.[22] C.-Y. Lu, H. Min, J. Gui, L. Zhu, Y.-K. Lei, Face recognition via weightedsparse representation, Journal of Visual Communication and Image Rep-resentation 24 (2) (2013) 111–116.[23] X. Peng, L. Zhang, Z. Yi, Scalable sparse subspace clustering, in: IEEEConference on Computer Vision and Pattern Recognition (CVPR), 2013,pp. 430–437.[24] G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank rep-resentation, in: International Conference on Machine Learning (ICML),2010, pp. 663–670.[25] M. Orbach, K. Crammer, Graph-based transduction with confidence, in:Machine Learning and Knowledge Discovery in Databases, 2012, pp.323–338.[26] X. Yang, X. Bai, L. J. Latecki, Z. Tu, Improving shape retrieval by learn-ing graph transduction, in: European Conference on Computer Vision(ECCV), 2008, pp. 788–801.[27] E. Amaldi, V. 
Kann, On the approximability of minimizing nonzero vari-ables or unsatisfied relations in linear systems, Theoretical Computer Sci-ence 209 (1) (1998) 237–260.[28] X.-T. Yuan, T. Zhang, Truncated power method for sparse eigenvalueproblems, The Journal of Machine Learning Research 14 (1) (2013) 899–925.[29] L. Xu, S. Zheng, J. Jia, Unnatural l0 sparse representation for naturalimage deblurring, in: IEEE Conference on Computer Vision and PatternRecognition (CVPR), 2013, pp. 1107–1114.[30] J. Liu, S. Ji, J. Ye, SLEP: Sparse learning with efficient projections, Ari-zona State University 6.[31] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reductionand data representation, Neural computation 15 (6) (2003) 1373–1396.[32] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans-actions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 888–905.[33] I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recog-nition, IEEE Transactions on Pattern Analysis and Machine Intelligence32 (11) (2010) 2106–2112.[34] A. Bergamo, L. Torresani, A. W. Fitzgibbon, Picodes: Learning a com-pact code for novel-category recognition, in: Advances in Neural Infor-mation Processing Systems (NIPS), 2011, pp. 2088–2096.[35] D. Zhang, M. Yang, X. Feng, Sparse representation or collaborative rep-resentation: Which helps face recognition?, in: IEEE International Con-ference on Computer Vision (ICCV), 2011, pp. 471–478.[36] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines,ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1–27:27.