Sparse Representation-based Open Set Recognition
He Zhang, Student Member, IEEE, and Vishal M. Patel, Senior Member, IEEE

(He Zhang and Vishal M. Patel are with the Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA. This work has been accepted by T-PAMI.)
Abstract—We propose a generalized Sparse Representation-based Classification (SRC) algorithm for open set recognition, where not all classes presented during testing are known during training. The SRC algorithm uses class reconstruction errors for classification. As most of the discriminative information for open set recognition is hidden in the tail parts of the matched and sum of non-matched reconstruction error distributions, we model the tails of those two error distributions using the statistical Extreme Value Theory (EVT). We then simplify the open set recognition problem into a set of hypothesis testing problems. The confidence scores corresponding to the tail distributions of a novel test sample are then fused to determine its identity. The effectiveness of the proposed method is demonstrated using four publicly available image and object classification datasets, and it is shown that this method can perform significantly better than many competitive open set recognition algorithms. Code is publicly available: https://github.com/hezhangsprinter/SROSR

Index Terms—Open set recognition, sparse representation-based classification, extreme value theory.
I. INTRODUCTION
In recent years, sparse representation-based techniques have drawn much interest in the computer vision and image processing fields [1], [2]. A number of image classification and restoration algorithms have been proposed based on sparse representations. In particular, the sparse representation-based classification (SRC) algorithm [3] has gained a lot of traction. The basic idea of SRC is to identify the correct class by seeking the sparsest representation of the test sample in terms of the training data. The SRC algorithm was originally proposed for face recognition and later extended for iris recognition and automatic target recognition in [4] and [5], respectively. A simultaneous dimension reduction and classification framework based on SRC was proposed in [6]. Furthermore, non-linear kernel extensions of the SRC method have also been proposed in [7], [8], [9], [10].

The SRC algorithm and its variants are essentially based on the closed world assumption. In other words, it is assumed that the testing data pertains to one of the $K$ classes that are used during training. But in practice, testing data may come from a class that is not necessarily seen in training. This problem, where the testing data corresponds to a class that is not seen during training, is known as open set recognition [11]. Consider the problem of animal classification. If the training samples correspond to $K$ different animals, then given a test image corresponding to an animal from one of the $K$ classes, the algorithm should be able to determine its identity. However, if the test image corresponds to an animal which does not match one of the $K$ animals seen during training, then the algorithm should have the capability to ignore or reject the test sample [12].

The goal of an open set recognition algorithm is to learn a predictive model that classifies the known data into the correct class and rejects the data from open classes. As a result, one can view open set recognition as tackling both the classification and novelty detection problems at the same time. Novelty detection refers to the problem of finding anomalous behaviors that are inconsistent with the expected pattern. A novelty detection problem can be formulated as a hypothesis testing problem where the null hypothesis, $H_0$, implies that the test sample comes from the normal class and the alternative hypothesis, $H_1$, indicates the presence of anomalies; the objective is to find the best threshold that separates $H_0$ from $H_1$.

A number of approaches have been proposed in the literature for open set recognition. For instance, [11] introduced a concept of open space risk and developed a 1-vs-Set Machine formulation using linear SVMs for open set recognition. In [13], the concept of Compact Abating Probability (CAP) was introduced for open set recognition. In particular, the Weibull-calibrated SVM (W-SVM) algorithm was developed, which essentially combines the statistical Extreme Value Theory (EVT) with binary SVMs for open set recognition. Also, the W-SVM framework was recently used in [14] for fingerprint spoof detection.
In [15], an open set recognition-based method was developed to identify whether or not an image was captured by a specific digital camera.

In order to reject invalid samples, the notion of Sparsity Concentration Index (SCI) was proposed in [3]. Similarly, a rejection rule based on the ratio of the first two highest projection scores was developed for rejecting non-face images in [16]. The rejection rules defined using sparse representations in [3] and [16] were specifically designed to reject non-face images. As will be shown later, these rules do not work well on general open set recognition problems.

In this paper, we extend the SRC formulation for open set recognition. Our method relies on the statistical EVT [17] and consists of two main stages. In the first stage, the tail distributions of the matched reconstruction errors and the sum of non-matched reconstruction errors are modeled using the EVT to simplify the open set recognition problem into two hypothesis testing problems. In the second stage, the reconstruction errors corresponding to a test sample from each class are calculated and the confidence scores based on the two tail distributions are fused to determine the identity of the test sample. Figure 1 gives an overview of the proposed Sparse Representation-based Open Set Recognition (SROSR) algorithm.

Fig. 1: Overview of the proposed SROSR algorithm. Given training samples, we model the tail part of the matched reconstruction error distribution and the sum of non-matched reconstruction error distribution using the statistical EVT. Given a novel test sample, the modeled distributions and the matched and the sum of non-matched reconstruction errors are used to calculate the confidence scores. Then, these scores are fused to obtain the final score for recognition.

This paper is organized as follows. In Section II, we give a brief background on the EVT and the SRC algorithm. Details of the proposed SROSR algorithm are given in Section III. Experimental results are presented in Section IV, and Section V concludes the paper with a brief summary and discussion.

II. BACKGROUND
In this section, we review some related work on SRC and the EVT.
A. Sparse Representation-based Classification
Stack the training samples from the $i$-th class as columns of a matrix $Y_i \in \mathbb{R}^{M \times N_i}$, and let $Y = [Y_1, Y_2, \ldots, Y_K] \in \mathbb{R}^{M \times N}$ be the dictionary of training samples from all $K$ classes, where $N = \sum_i N_i$ is the total number of training samples and $M$ is the dimension of each training sample. Let $\mathcal{L}_Y$ denote the corresponding label set. If the $Y_i$ are sufficiently expressive [18], a new input sample from the $i$-th class, stacked as a vector $y_t \in \mathbb{R}^M$, will have a sparse representation $y_t = Yx$ in terms of the training data $Y$: $x$ will be nonzero only for those samples from class $i$. The sparse coefficient vector $x \in \mathbb{R}^N$ can be estimated by solving the following optimization problem

$$\hat{x} = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad \|y_t - Yx\|_2 < \epsilon, \qquad (1)$$

where we have assumed that the observations are noisy with noise energy $\epsilon$, and $\|x\|_1 = \sum_i |x_i|$. The sparse code $\hat{x}$ can then be used to determine the class of $y_t$ based on the class residuals

$$r_k = \|y_t - Y_k \hat{x}_k\|_2, \quad k = 1, \ldots, K, \qquad (2)$$

where $\hat{x}_k$ is the part of $\hat{x}$ that corresponds to class $k$. Finally, the class $k^*$ associated to the test sample $y_t$ can be declared as the one that produces the smallest approximation error,

$$k^* = \text{class of } y_t = \arg\min_k r_k.$$

This method provides excellent performance on several image classification datasets [3], [4], and is provably robust to errors and occlusion [19]. The basic SRC algorithm is summarized in Algorithm 1.
Algorithm 1: Sparse Representation-based Classification

Input: $Y$, $\mathcal{L}_Y$, $\epsilon$, $y_t$
1. $\hat{x} = \arg\min_x \|x\|_1$ s.t. $\|y_t - Yx\|_2 < \epsilon$
2. $r_k = \|y_t - Y_k \hat{x}_k\|_2$ for $k = 1, \ldots, K$
3. $k^* = \arg\min_k r_k$
Output: $k^*$, $r = [r_1, r_2, \ldots, r_K]$

In order to reject outliers, the following SCI rule was defined in [3]:

$$\text{SCI}(x) = \frac{K \cdot \max_k \|x_k\|_1 / \|x\|_1 - 1}{K - 1} \in [0, 1]. \qquad (3)$$

The Sparsity Concentration Index takes values between 0 and 1. SCI values close to 1 correspond to the case where the test image can be approximately represented by using only images from a single class. If the SCI value of the recovered coefficient vector is close to zero, then the coefficients are spread across all classes; hence, the test vector is not similar to any of the classes and can be rejected. A threshold can be chosen so that a test sample is rejected as invalid if $\text{SCI}(\hat{x}) < \alpha$ and otherwise accepted as valid, where $\alpha$ is some chosen threshold between 0 and 1.
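To make Algorithm 1 and the SCI rule concrete, the following is a minimal sketch in Python. It uses a scikit-learn Lasso (the Lagrangian form) as a stand-in for the constrained $\ell_1$ problem in (1); the function name, default values and data layout are our own illustration, not the authors' reference implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(Y, labels, y_t, lam=1e-3, sci_threshold=None):
    """Y: (M, N) dictionary with l2-normalized columns; labels: length-N
    array of class ids per column; y_t: length-M test sample."""
    classes = np.unique(labels)
    K = len(classes)
    # Lagrangian surrogate for (1): min ||y_t - Yx||_2^2 / (2M) + lam*||x||_1.
    x_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(Y, y_t).coef_
    # Class residuals r_k = ||y_t - Y_k x_hat_k||_2, as in (2).
    r = np.array([np.linalg.norm(y_t - Y[:, labels == c] @ x_hat[labels == c])
                  for c in classes])
    if sci_threshold is not None:
        # SCI rule (3): values near 1 mean the coefficients concentrate on a
        # single class; values near 0 mean they are spread over all classes.
        l1 = max(np.abs(x_hat).sum(), 1e-12)
        sci = (K * max(np.abs(x_hat[labels == c]).sum() for c in classes) / l1 - 1) / (K - 1)
        if sci < sci_threshold:
            return None, r  # rejected as an invalid (open set) sample
    return classes[np.argmin(r)], r
```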
B. Extreme Value Theory

Extreme value theory is a branch of statistics that analyzes the distribution of data of abnormally high or low values. It has been applied in finance [20], hydrology [21] and novelty detection problems [22], [23], [24]. In this section, we give a brief overview of the statistical EVT.

Assume that we are given $n$ i.i.d. samples $\{Z_1, Z_2, \ldots, Z_n\}$ drawn from an unknown distribution $F(z)$. Denote $Z_m = \max_{i \in [1, n]} Z_i$. The Fisher-Tippett-Gnedenko theorem [25] states that if there exists a pair of normalizing sequences $(a_n, b_n)$, with $a_n > 0$ and $b_n \in \mathbb{R}$, then

$$\lim_{n \to \infty} P\left(\frac{Z_m - b_n}{a_n} \le z\right) = E(z), \qquad (4)$$

where $E(z)$ is a non-degenerate distribution belonging to either the Fréchet, Weibull or Gumbel family. These distributions can be represented as a Generalized Extreme Value (GEV) distribution as follows:

$$E(z; \mu, \sigma, \xi) = \exp(-p(z)), \qquad (5)$$

where

$$p(z) = \left(1 + \xi\left(\frac{z - \mu}{\sigma}\right)\right)^{-1/\xi}$$

and $\mu$, $\sigma$ and $\xi$ are the location, scale and shape parameters, respectively.

There are two challenges that one has to overcome before using the GEV distribution to model the tail distribution of data. First, we have to choose which of the three distributions to use based on prior knowledge. Second, we need to segment the data into several parts and model the maximum of each part using the GEV. To overcome these challenges, an alternative method based on the Generalized Pareto Distribution (GPD), denoted as $G(z)$, was proposed in [17] to estimate the tail distribution of data samples. It was shown that, given a sufficiently large threshold $u$, the probability of an observation exceeding $u$ by $z$, conditioned on exceeding $u$, can be approximated by

$$\lim_{n \to \infty} P(Z > z + u \mid Z > u) = 1 - G(z), \qquad (6)$$

with

$$G(z) = 1 - \left(1 + \frac{\xi z}{\sigma}\right)_+^{-1/\xi}, \quad z > 0,$$

where $\sigma > 0$, $\xi \in \mathbb{R}$ and $x_+ = \max(x, 0)$. Here, $G$ is the Cumulative Distribution Function (CDF) of the GPD.

To estimate the parameters of the GPD, one can use the maximum likelihood estimation (MLE) method introduced in [26]. Even though there is the possibility that the parameters of the GPD do not exist and that maximum likelihood estimation may not converge, it has been shown that these are extremely rare cases in practice [26], [27].

III. SPARSE REPRESENTATION-BASED OPEN SET RECOGNITION (SROSR)

In [11], the notion of "open set risk" was defined as the cost of labeling an open set sample as a known sample. Based on this, one can minimize the following cost to develop an open set recognition algorithm:

$$\arg\min_f \, C_o(f) + \lambda_r C_\epsilon(f), \qquad (7)$$

where $f$ is a measurable function, $C_o(f)$ denotes the open set risk, $C_\epsilon(f)$ denotes the empirical risk for classification, and $\lambda_r$ is a parameter that balances the open set risk and the empirical risk.

The SRC algorithm uses the residuals from (2) for classification, and these residuals can also be used to model $f$ in (7) for open set recognition, for the following reason. If the test sample corresponds to class $k$, then the reconstruction error corresponding to class $k$ should be much lower than that corresponding to the other classes. As a result, there should be a distinction between matched and non-matched reconstruction errors. To illustrate this, we plot the distributions of matched and non-matched reconstruction errors using samples from the MNIST handwritten digits dataset [28] in Figure 2. Training samples consist of digits 0 to 9 and test samples correspond to digit 9.
Matched reconstruction errors here mean the errors corresponding to the sparse coefficients of digit 9, and non-matched reconstruction errors mean the errors generated by the sparse coefficients of all other digits. One can see from this figure that the matched class's reconstruction errors follow some underlying distribution. If one can fit a probability model $P(r_k)$ to describe the distribution of the reconstruction errors of the matched class, then one can reformulate the open set recognition problem as a hypothesis testing problem for novelty detection:

$$H_0: P(r_k) \le \delta, \qquad H_1: P(r_k) > \delta, \qquad (8)$$

where the null hypothesis $H_0$ implies that the test data are generated from the distribution $P(r_k)$, the alternative hypothesis $H_1$ implies that the test data correspond to classes other than the ones considered in training, and $\delta \in [0, 1]$ is the threshold for rejection.

However, as we have no prior knowledge of the underlying distribution of the matched reconstruction errors, we cannot fit a proper distribution to them. Instead, we can apply the EVT to the tail of the matched distribution, as we are only concerned with the right tail of this distribution for hypothesis testing. As the implementation of the GEV on real data is difficult, we instead use the GPD to model the tail of the matched distribution. Once we learn the distribution of the tail, we can modify the hypothesis testing problem (8) to the following:

$$H_0: G(r_k) \le \delta_g, \qquad H_1: G(r_k) > \delta_g, \qquad (9)$$

where $G(r_k)$ is the learned GPD distribution fitted to the right tail of $r_k$ and $\delta_g$ is the rejection threshold.
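As a concrete illustration of (6) and (9), the following sketch fits a GPD to the exceedances of error samples over a high threshold and evaluates the tail CDF $G(r_k)$. It assumes scipy's `genpareto`; the quantile-based choice of the threshold $u$ and all names are our own illustration, not the paper's code.

```python
import numpy as np
from scipy.stats import genpareto

def fit_right_tail(samples, rho=0.1):
    """Fit a GPD to exceedances over a high threshold u, as in (6).
    rho is the tail fraction; u is set to the (1 - rho) quantile."""
    z = np.asarray(samples, dtype=float)
    u = np.quantile(z, 1.0 - rho)
    xi, _, sigma = genpareto.fit(z[z > u] - u, floc=0.0)  # MLE, location fixed at 0
    return u, xi, sigma

def tail_score(r, u, xi, sigma):
    """G(r): the fitted tail CDF; it is 0 whenever r does not reach the tail."""
    return genpareto.cdf(r - u, xi, scale=sigma)

# Hypothesis test (9): declare the sample unknown when G(r_k) > delta_g.
```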
Fig. 2: Histogram of the matched and non-matched reconstruction errors. Matched reconstruction errors are the errors corresponding to the sparse coefficients of digit 9 and non-matched reconstruction errors are the errors generated by the sparse coefficients of all other digits, when training samples consist of digits 0 to 9 and the test samples correspond to digit 9. All samples are from the MNIST dataset.

When SRC is used for classification, we do not only get the matched reconstruction errors, but we also have access to the non-matched reconstruction errors, which can be used to enhance the performance of our open set recognition algorithm. Due to the self-expressiveness property of the SRC algorithm [3], the sparse coefficients corresponding to open set samples are very different from those of the closed set samples, and they follow a certain pattern. If an open set sample is written as a linear combination of the training samples from the closed set, then the resulting sparse coefficient vector will not concentrate on any class but will instead spread widely across the entire closed training set. Thus, the distribution of the estimated sparse coefficients contains important information about the validity of an open set sample. In order to illustrate this point, we conduct the following toy experiment using the digits from the MNIST dataset. Suppose that the training data only contain digits 0 to 5 and the test samples consist of closed set digits 0 to 5 and open set digits 6 to 9. In Figure 3, we plot the sum of the non-matched reconstruction errors corresponding to the closed set digits 0 to 5 and the sum of the non-matched reconstruction errors corresponding to the open set digits 6 to 9. As one can see from this figure, the sum of the non-matched reconstruction errors from the closed set digits 0 to 5 also follows a certain distribution that is very different from the distribution obtained from the errors corresponding to the open set digits.

Fig. 3: Histogram of the sum of non-matched reconstruction errors corresponding to the closed set classes 0 to 5 and the sum of non-matched reconstruction errors corresponding to the open set digits 6 to 9. All samples are from the MNIST dataset.

As a result, we can formulate another hypothesis testing problem similar to (8) for the sum of non-matched reconstruction errors, and we can combine the two hypothesis testing problems to make the open set recognition algorithm more accurate. As we are only interested in the right tail of the matched distribution and the left tail of the sum-of-non-matched distribution, we apply an inversion to the random variable $Z$ as $Z_I = -Z$, so that the right tail of $Z_I$ is the left tail of $Z$.
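This inversion amounts to one line on top of the right-tail fit sketched earlier; `fit_right_tail` below is the assumed helper from that sketch.

```python
# Left-tail fitting via the inversion Z_I = -Z: the right tail of -Z is
# the left tail of Z, so the same right-tail routine covers both cases.
def fit_left_tail(samples, rho=0.1):
    return fit_right_tail([-s for s in samples], rho)
```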
A. Training

In the training phase, we have to estimate the parameters for fitting the tail distributions based on the GPD. Estimating the parameters based on MLE requires the availability of multiple reconstruction errors. To deal with this issue, we propose the following iterative procedure. In each iteration, we first randomly order the training samples from each class $Y_i$ and then partition them into two sets: a cross-train set $Y_i^{tr}$ and a cross-test set $Y_i^{te}$. Samples in the cross-train set $Y_i^{tr}$ and samples in the cross-test set $Y_i^{te}$ are used as training and testing samples, respectively, for the SRC algorithm during this particular iteration. The cross-train and cross-test sets contain 80 and 20 percent of the training samples in $Y_i$, respectively. Let $\mathcal{L}_i^{tr}$ and $\mathcal{L}_i^{te}$ denote the label sets corresponding to $Y_i^{tr}$ and $Y_i^{te}$, respectively. Once the training samples from all classes are partitioned into cross-train and cross-test sets, we combine the cross-train samples from all $K$ classes into a cross-train matrix $Y^{tr} = [Y_1^{tr}, Y_2^{tr}, \ldots, Y_K^{tr}]$ and their associated labels into a label set $\mathcal{L}^{tr} = \{\mathcal{L}_1^{tr}, \mathcal{L}_2^{tr}, \ldots, \mathcal{L}_K^{tr}\}$. Similarly, we combine the cross-test sets into a cross-test matrix $Y^{te} = [Y_1^{te}, Y_2^{te}, \ldots, Y_K^{te}]$ and their labels into a label set $\mathcal{L}^{te} = \{\mathcal{L}_1^{te}, \mathcal{L}_2^{te}, \ldots, \mathcal{L}_K^{te}\}$. We then use $(Y^{tr}, Y^{te}, \mathcal{L}^{tr}, \mathcal{L}^{te}, \epsilon)$ as the inputs to the SRC algorithm and obtain the reconstruction error vector $r_i$. We repeat this process $L$ times and gather the matched reconstruction errors $R_i^m$ and the sums of non-matched reconstruction errors $R_i^{nm}$, respectively, for $i = 1, \ldots, K$, for fitting the tail distributions based on the GPD. The entire training phase of our method is summarized in Algorithm 2, where $\rho$ indicates the tail size.
Algorithm 2: Pseudocode for SROSR Training

Input: $Y$, $\rho$, $\epsilon$, $L$, $\mathcal{L}_Y$
for $i = 1 : K$ do
  for $j = 1 : L$ do
    $\tilde{Y}_i$ = randomly ordered $Y_i \in \mathbb{R}^{M \times N_i}$
    $N^{tr} = N_i \times 0.8$
    $Y_i^{tr} = \tilde{Y}_i(:, 1 : N^{tr})$; $\mathcal{L}_i^{tr}$ = labels of $Y_i^{tr}$
    $Y_i^{te} = \tilde{Y}_i(:, N^{tr} + 1 : \text{end})$; $\mathcal{L}_i^{te}$ = labels of $Y_i^{te}$
    $r_i(j, :) \leftarrow \text{SRC}(Y^{tr}, Y^{te}, \mathcal{L}^{tr}, \mathcal{L}^{te}, \epsilon)$
  end for
  $R_i^m = [r_i(1, i), \ldots, r_i(L, i)]$
  $R_i^{nm} = [\sum_{p \ne i} r_i(1, p), \ldots, \sum_{p \ne i} r_i(L, p)]$
  $\sigma_m(i), \xi_m(i) \leftarrow \text{GPDfit}(R_i^m, \rho)$
  $\sigma_{nm}(i), \xi_{nm}(i) \leftarrow \text{GPDfit}(-R_i^{nm}, \rho)$
end for
Output: $\sigma_m$, $\xi_m$, $\sigma_{nm}$, $\xi_{nm}$
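A compact sketch of this procedure, reusing `src_classify`, `fit_right_tail` and `fit_left_tail` from the earlier sketches; the per-class list-of-matrices layout and the default of $L = 20$ repetitions are assumptions, as the paper does not fix $L$.

```python
import numpy as np

def srosr_train(Y_by_class, L=20, rho=0.1, seed=0):
    """Y_by_class: list of K arrays, each of shape (M, N_i), one per class."""
    rng = np.random.default_rng(seed)
    K = len(Y_by_class)
    matched = [[] for _ in range(K)]
    nonmatched = [[] for _ in range(K)]
    for _ in range(L):
        # Randomly partition every class into 80% cross-train / 20% cross-test.
        tr_blocks, te_blocks = [], []
        for Yc in Y_by_class:
            perm = rng.permutation(Yc.shape[1])
            n_tr = int(0.8 * Yc.shape[1])
            tr_blocks.append(Yc[:, perm[:n_tr]])
            te_blocks.append(Yc[:, perm[n_tr:]])
        Y_tr = np.hstack(tr_blocks)
        lab_tr = np.concatenate(
            [np.full(b.shape[1], c) for c, b in enumerate(tr_blocks)])
        # Run SRC on every cross-test sample and record its error pattern.
        for c, Yte in enumerate(te_blocks):
            for y in Yte.T:
                _, r = src_classify(Y_tr, lab_tr, y)
                matched[c].append(r[c])               # matched reconstruction error
                nonmatched[c].append(r.sum() - r[c])  # sum of non-matched errors
    # Right tail of matched errors; left tail of the summed non-matched errors.
    matched_tails = [fit_right_tail(m, rho) for m in matched]
    nonmatched_tails = [fit_left_tail(nm, rho) for nm in nonmatched]
    return matched_tails, nonmatched_tails
```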
B. Testing

Given a novel test sample $y_t$, we compute its sparse coefficient vector $\hat{x}$ by solving the $\ell_1$-minimization problem (1). We then obtain the $K$ reconstruction errors as required by the SRC algorithm and choose the class with the minimum reconstruction error as the candidate class. We then obtain two probability scores by evaluating the matched and the sum of non-matched reconstruction errors under their corresponding GPDs. As the two raw reconstruction errors are both normalized into probabilities by their corresponding GPDs, we can add the two probability scores together with appropriate weights to obtain the final score. We set the weight, $w$, as $w = \frac{1}{3}(1 - \text{Openness})$, where

$$\text{Openness} = 1 - \sqrt{\frac{2 \times N_{TA}}{N_{TG} + N_{TE}}}, \qquad (10)$$

and $N_{TA}$, $N_{TG}$ and $N_{TE}$ are the number of training classes, the number of target classes to be identified, and the number of testing classes, respectively [11]. If Openness $= 0$, then our setting reduces to the traditional classification problem (i.e., a completely closed problem). As Openness grows, more and more unknown classes appear during testing; as a result, the weight on the non-matched probability score decreases.

Our testing algorithm is summarized in Algorithm 3. The inputs required during testing are the test sample $y_t$, the training samples $Y$, the estimated parameters for the matched $(\sigma_m, \xi_m)$ and the sum of non-matched distributions $(\sigma_{nm}, \xi_{nm})$, the rejection threshold $\delta_t$, and the weight $w$. The output of the testing phase is one of the classes $\{1, 2, \ldots, K, \mathcal{O}\}$, where $\mathcal{O}$ represents the open class.
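The scoring and fusion step of Algorithm 3 (shown next) can be sketched as follows, reusing the helpers from the earlier sketches. The openness-based weight follows (10); treating the stored tail threshold $u$ as part of $G$, and all names and class counts, are our own illustrative choices.

```python
import numpy as np

def openness(n_train, n_target, n_test):
    """Openness measure from (10)."""
    return 1.0 - np.sqrt(2.0 * n_train / (n_target + n_test))

def srosr_test(y_t, Y, labels, matched_tails, nonmatched_tails, delta_t, w):
    _, r = src_classify(Y, labels, y_t)             # K class reconstruction errors
    k = int(np.argmin(r))                           # candidate class
    r_m, r_nm = r[k], r.sum() - r[k]
    S_m = tail_score(r_m, *matched_tails[k])        # right tail of matched errors
    S_nm = tail_score(-r_nm, *nonmatched_tails[k])  # left tail via Z_I = -Z
    S = S_m + w * S_nm                              # fused confidence score
    return "open" if S > delta_t else k             # reject or accept candidate

# Example weight, with purely illustrative class counts:
# w = (1.0 / 3.0) * (1.0 - openness(10, 10, 38))
```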
Algorithm 3: Pseudocode for SROSR Testing

Input: $y_t$, $Y$, $\sigma_m$, $\xi_m$, $\sigma_{nm}$, $\xi_{nm}$, $\delta_t$, $w$, $\epsilon$
1. $r \leftarrow \text{SRC}(Y, y_t, \mathcal{L}_Y, \epsilon)$
2. $k^* = \arg\min_i r_i$
3. $r_m = r_{k^*}$, $r_{nm} = \sum_{i=1, i \ne k^*}^{K} r_i$
4. $S_m = G(r_m; \sigma_m(k^*), \xi_m(k^*))$, $S_{nm} = G(r_{nm}; \sigma_{nm}(k^*), \xi_{nm}(k^*))$
5. $S = S_m + w \cdot S_{nm}$
6. if $S > \delta_t$ then class of $y_t = \mathcal{O}$ else class of $y_t = k^*$
Output: $k^*$ or $\mathcal{O}$

IV. EXPERIMENTAL RESULTS

In this section, we present several experimental results demonstrating the effectiveness of the proposed SROSR method on open set recognition. In particular, we present open set recognition results on the MNIST handwritten digits dataset [28], the Extended Yale B face dataset [29], the UIUC attribute dataset [30] and the Caltech-256 dataset [31]. The comparison in [13] with other existing open set recognition methods, such as 1-vs-All multi-class RBF SVM with Platt probability estimation [32] and pairwise multi-class RBF SVM [33], suggests that the W-SVM algorithm is among the best. Hence, we treat it as the state of the art and use it as a benchmark for comparisons in this paper. Furthermore, we compare the performance of our method with two other sparse representation-based methods for rejecting invalid samples: SCI [3] and the Ratio method [16]. Finally, we compare our method with a "Naive" baseline where we estimate a reconstruction error threshold directly from training rather than using the GPD to model the tail distributions.

Recognition accuracy and F-measure are used to measure the performance of the different algorithms on open set recognition. The F-measure is defined as the harmonic mean of Precision and Recall:

$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad (11)$$

where

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \qquad \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}.$$

Here TP, FN and FP denote true positives, false negatives and false positives, respectively. The F-measure is always between 0 and 1; the higher the F-measure, the better the performance of an object recognition system. Accuracy is defined as

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TN} + \text{TP} + \text{FP} + \text{FN}},$$

where TN denotes true negatives.
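These metrics transcribe directly into code; the confusion counts are assumed to be tallied by the caller.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall, as in (11)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)
```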
The rejection threshold $\delta_t$ was empirically determined; in our experiments it was set proportional to $(1 + w)$, with the constant of proportionality chosen separately for the simulations with the MNIST dataset, the Extended Yale B dataset, the UIUC attribute dataset and the Caltech-256 dataset. We choose the tail size $\rho$ by cross-validation, again separately for each of the four datasets. The noise level $\epsilon$ is set to a fixed value for solving the SRC problem in our proposed SROSR framework.

A. Results on the Extended Yale B Dataset
The Extended Yale B dataset consists of 2,414 frontal images of 38 individuals. These images were captured under various controlled indoor lighting conditions. Each class contains about 64 images; they were cropped and normalized to a fixed size. We randomly choose 10 classes for training and vary the openness by randomly selecting 10 to 28 classes. The following steps summarize our data partition procedure on the Extended Yale B dataset (a code sketch of this partition appears at the end of this subsection).

1) Randomly select 10 classes among the 38 classes.
2) Randomly choose 80% of the samples in each of the 10 selected classes as training samples.
3) Select the remaining 20% of the samples from step 2 and all the samples from the other 28 classes as testing samples.

We repeat the above procedure 50 times and report the average F-measure and accuracy of different methods.

To show the significance of using the sum of non-matched reconstruction error distribution along with the matched error distribution, in this experiment we also consider just the matched reconstruction error distribution, without fusing the sum of non-matched reconstruction error distribution, in our method. The results are shown in Figure 4. Figure 4(a) shows the average F-measure results on this dataset. The face images in this dataset are cropped and well-aligned; furthermore, the images contain almost the same background. As a result, all compared methods achieve very high F-measures on this dataset. Figure 4(b) shows the average accuracy of different methods as we vary openness. As can be seen from both of these plots, the proposed SROSR method outperforms the other compared methods. In particular, if only the matched reconstruction error distribution is considered, then the performance degrades significantly. On the other hand, when both the sum of non-matched and matched distributions are used, the performance of the proposed SROSR algorithm is greatly enhanced. This experiment clearly indicates that the matched and the sum of non-matched reconstruction errors contain complementary information which can be used to improve the performance of an open set recognition algorithm.
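The partition procedure above can be sketched as follows; the class counts and split ratio come from the text, while the helper name and data layout are assumptions.

```python
import numpy as np

def open_set_split(samples_by_class, n_known=10, train_frac=0.8, seed=0):
    """Steps 1-3 above: pick the known classes, split them 80/20, and send
    all samples of the remaining (unknown) classes to the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples_by_class))
    known, unknown = order[:n_known], order[n_known:]
    train, test = [], []
    for c in known:
        idx = rng.permutation(len(samples_by_class[c]))
        n_tr = int(train_frac * len(idx))
        train += [(samples_by_class[c][i], c) for i in idx[:n_tr]]
        test += [(samples_by_class[c][i], c) for i in idx[n_tr:]]
    for c in unknown:  # open set classes appear only at test time
        test += [(s, "open") for s in samples_by_class[c]]
    return train, test
```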
B. Results on the MNIST Dataset

The MNIST dataset contains gray-scale images of handwritten digits of size 28 × 28. There are about 60,000 training images and 10,000 testing images corresponding to 10 classes in this dataset. Following the experimental setting described in [13], we randomly choose 6 classes for training and alter the openness using the remaining 4 classes. We repeat this experiment 50 times and record the average F-measure and accuracy. Finally, we plot the Openness vs F-measure and Openness vs Accuracy curves to validate our approach.
Fig. 4: Results on the Extended Yale B dataset. (a) Openness vs F-measure results. (b) Openness vs Accuracy results. (Curves shown for SROSR, W-SVM, SCI, Ratio, Matched and Naive.)
The Openness vs F-measure and Openness vs Accuracy curves corresponding to this experiment are shown in Figure 5(a) and Figure 5(b), respectively. It can be seen from these results that the proposed SROSR method performs better than the Naive method, the W-SVM method and the sparsity-based rejection methods. Our method achieves the highest F-measure and accuracy among all five methods as we vary openness. Rejection methods such as SCI and Ratio are based on the sparsity of the test vector with respect to the training samples; if an open set sample has a sparsity pattern similar to that corresponding to one of the training samples, then the SRC method based on SCI will not reject that sample. This demonstrates that incorporating matched as well as non-matched reconstruction errors can significantly enhance the performance of a sparsity-based classification method on open set recognition.
Fig. 5: Results on the MNIST dataset. (a) Openness vs F-measure results. (b) Openness vs Accuracy results. (Curves shown for SROSR, W-SVM, SCI, Ratio and Naive.)
By comparing Figure 4(b) with Figure 5(b), we see that the accuracy in Figure 4(b) increases while the accuracy in Figure 5(b) decreases. This is mainly due to the fact that the rejection accuracy is higher than the recognition accuracy on the Extended Yale B dataset, while the rejection accuracy is lower than the recognition accuracy on the MNIST dataset.
C. Results on the UIUC Attribute Dataset
The UIUC attribute dataset contains data in two parts: a-Pascal and a-Yahoo. The a-Pascal dataset has twenty object classes, such as animals and vehicles, and each category contains 150 to 1000 samples. The a-Yahoo dataset contains twelve additional object classes, which can be used as open set classes during testing. We randomly choose 10 classes from the a-Pascal dataset for training and vary the openness by randomly selecting 1 to 10 classes from the a-Yahoo dataset. In each training class, we randomly choose 50 samples, and in each testing class, we randomly choose 20 samples. We repeat the above procedure 50 times and average the F-measure and accuracy results. The results are shown in Figure 6. As can be seen from this figure, SROSR outperforms the other methods. In particular, as the openness is increased, our method achieves much better F-measures and accuracies than the other compared methods.
Fig. 6: Results on the UIUC attribute dataset. (a) Openness vs F-measure results. (b) Openness vs Accuracy results. (Curves shown for SROSR, W-SVM, SCI, Ratio and Naive.)
D. Results on the Caltech-256 Dataset
The Caltech-256 dataset contains 257 categories, including one background clutter class. Each category has about 80 to 827 images, and most of the categories have about 100 images. In this experiment, we extracted spatial pyramid features [34] from these images as input for all compared methods. The evaluation protocol is very similar to the previous three experiments. We randomly select 20 categories as training classes and vary the openness by randomly selecting 31 to 40 classes out of the other 237 classes. For all the selected classes, we randomly choose 50 samples for each training class and 20 samples for each testing class. The openness of our experiments on the Caltech-256 dataset thus varies from 24.94% to 29.29%. We average the results over 50 random trials. Figure 7(a) and (b) show the average F-measure and accuracy curves of the different methods as we vary the openness. Overall, the proposed SROSR achieves the best F-measure and accuracy results on this dataset compared to the other competitive open set recognition methods.

Fig. 7: Results on the Caltech-256 dataset. (a) Openness vs F-measure results. (b) Openness vs Accuracy results. (Curves shown for SROSR, W-SVM, SCI, Ratio and Naive.)

V. CONCLUSION
The SRC algorithm classifies a test sample by seeking the sparsest representation in terms of the training data and does not work well under the open world assumption. In this paper, we have introduced a training stage to the SRC algorithm so that it can be adapted to tackle open set recognition problems. The resulting algorithm makes use of the reconstruction error distributions modeled by the EVT. Various experiments on popular image and object classification datasets have shown that our method can perform significantly better than many competitive open set recognition algorithms.

If the dataset contains extreme variations in pose, illumination or resolution, then the self-expressiveness property required by the SRC algorithm will no longer hold. In this case, the proposed SROSR algorithm will fail. A possible solution to this problem would be to develop kernel-based methods for SROSR, where kernel SRC [10], [9], [7] is used to find the sparse representation in the high-dimensional feature space. Another limitation of the proposed SROSR method is that, for good recognition performance, the training set is required to be extensive enough to span the conditions that might occur in the test set. The development of a sparsity-based open set recognition method where only a single image or very few images are given per class for training is an interesting open problem. Furthermore, it remains an interesting topic for future work to develop a sparse representation or dictionary learning-based open set recognition algorithm by directly minimizing the open risk criterion.

ACKNOWLEDGEMENT
This work was supported by ARO grant W911NF-16-1-0126.

REFERENCES
[1] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, "Sparse representation for computer vision and pattern recognition," Proceedings of the IEEE, vol. 98, no. 6, pp. 1031-1044, June 2010.
[2] R. Rubinstein, A. M. Bruckstein, and M. Elad, "Dictionaries for sparse representation modeling," Proceedings of the IEEE, vol. 98, no. 6, pp. 1045-1057, June 2010.
[3] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
[4] J. K. Pillai, V. M. Patel, R. Chellappa, and N. K. Ratha, "Secure and robust iris recognition using random projections and sparse representations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 9, pp. 1877-1893, Sept. 2011.
[5] V. M. Patel, N. M. Nasrabadi, and R. Chellappa, "Sparsity-motivated automatic target recognition," Applied Optics, vol. 50, no. 10, pp. 1425-1433, Apr. 2011.
[6] D. Zhang, M. Yang, Z. Feng, and D. Zhang, "On the dimensionality reduction for sparse representation based face recognition," in International Conference on Pattern Recognition, Aug. 2010, pp. 1237-1240.
[7] L. Zhang, W.-D. Zhou, P.-C. Chang, J. Liu, Z. Yan, T. Wang, and F.-Z. Li, "Kernel sparse representation-based classifier," IEEE Transactions on Signal Processing, vol. 60, no. 4, pp. 1684-1695, Apr. 2012.
[8] H. V. Nguyen, V. M. Patel, N. M. Nasrabadi, and R. Chellappa, "Design of non-linear kernel dictionaries for object recognition," IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 5123-5135, 2013.
[9] S. Gao, I. Tsang, and L.-T. Chia, "Sparse representation with kernels," IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 423-434, Feb. 2013.
[10] A. Shrivastava, V. M. Patel, and R. Chellappa, "Multiple kernel learning for sparse representation-based classification," IEEE Transactions on Image Processing, vol. 23, no. 7, pp. 3013-3024, July 2014.
[11] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult, "Toward open set recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 7, pp. 1757-1772, 2013.
[12] M. Wilber, W. Scheirer, P. Leitner, B. Heflin, J. Zott, D. Reinke, D. Delaney, and T. Boult, "Animal recognition in the Mojave desert: Vision tools for field biologists," in IEEE Workshop on Applications of Computer Vision, Jan. 2013, pp. 206-213.
[13] W. J. Scheirer, L. P. Jain, and T. E. Boult, "Probability models for open set recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, Nov. 2014.
[14] A. Rattani, W. Scheirer, and A. Ross, "Open set fingerprint spoof detection across novel fabrication materials," IEEE Transactions on Information Forensics and Security, vol. 10, no. 11, pp. 2447-2460, Nov. 2015.
[15] F. de O. Costa, E. Silva, M. Eckmann, W. J. Scheirer, and A. Rocha, "Open set source camera attribution and device linking," Pattern Recognition Letters, vol. 39, pp. 92-101, 2014.
[16] V. M. Patel, T. Wu, S. Biswas, P. J. Phillips, and R. Chellappa, "Dictionary-based face recognition under variable lighting and pose," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 954-965, 2012.
[17] J. Pickands III, "Statistical inference using extreme order statistics," The Annals of Statistics, pp. 119-131, 1975.
[18] Y. Zhang, C. Mu, H.-W. Kuo, and J. Wright, "Toward guaranteed illumination models for non-convex objects," in IEEE International Conference on Computer Vision, 2013, pp. 937-944.
[19] J. Wright and Y. Ma, "Dense error correction via $\ell_1$-minimization," IEEE Transactions on Information Theory, vol. 56, no. 7, pp. 3540-3560, July 2010.
[20] S. Kotz and S. Nadarajah, Extreme Value Distributions. World Scientific, 2000, vol. 31.
[21] R. L. Smith, "Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone," Statistical Science, pp. 367-377, 1989.
[22] S. J. Roberts, "Novelty detection using extreme value statistics," IEE Proceedings - Vision, Image and Signal Processing, vol. 146, no. 3, pp. 124-129, 1999.
[23] D. A. Clifton, S. Hugueny, and L. Tarassenko, "Novelty detection with multivariate extreme value statistics," Journal of Signal Processing Systems, vol. 65, no. 3, pp. 371-389, 2011.
[24] X. Gibert-Serra, V. M. Patel, and R. Chellappa, "Sequential score adaptation with extreme value theory for robust railway track inspection," in IEEE International Conference on Computer Vision (ICCV) Workshop on Computer Vision for Road Scene Understanding and Autonomous Driving (CVRSUAD), 2015.
[25] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution of the largest or smallest member of a sample," in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 24, no. 2. Cambridge University Press, 1928, pp. 180-190.
[26] S. D. Grimshaw, "Computing maximum likelihood estimates for the generalized Pareto distribution," Technometrics, vol. 35, no. 2, pp. 185-191, 1993.
[27] V. Choulakian and M. Stephens, "Goodness-of-fit tests for the generalized Pareto distribution," Technometrics, vol. 43, no. 4, pp. 478-484, 2001.
[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[29] A. Georghiades, P. Belhumeur, and D. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001.
[30] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, "Describing objects by their attributes," in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1778-1785.
[31] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," 2007.
[32] J. Platt et al., "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," 1999.
[33] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415-425, 2002.
[34] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.