
Publication


Featured research published by Sang-Woon Kim.


Pattern Analysis and Applications | 2003

A brief taxonomy and ranking of creative prototype reduction schemes

Sang-Woon Kim; B. John Oommen

Various Prototype Reduction Schemes (PRS) have been reported in the literature. Based on their operating characteristics, these schemes fall into two fairly distinct categories: those that are of a creative sort, and those that are essentially selective. The norms for evaluating these methods are typically the reduction rate and the classification accuracy. It is generally believed that the former class of methods is superior to the latter. In this paper, we report the results of executing various creative PRSs and attempt to comparatively quantify their capabilities. The paper presents a brief taxonomy of the various reported PRS schemes. Our experimental results for three artificial data sets, and for samples involving real-life data sets, demonstrate that no single method is uniformly superior to the others for all kinds of applications. This result, though consistent with the findings of Bezdek and Kuncheva [1], is, in one sense, counter-intuitive, because the various researchers have presented their specific PRS with the hope that it would be superior to the previously reported methods. However, the fact is that while one method is superior in certain domains, it is inferior to another method when dealing with a data set with markedly different characteristics. The conclusion of this study is that the question of determining when one method is superior to another remains open. Indeed, it appears as if the designers of a pattern recognition system will have to choose the appropriate PRS based on the specific characteristics of the data that they are studying. The paper also suggests answers to various hypotheses that relate to the accuracies and reduction rates of families of PRS.
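
As a rough illustration of the two evaluation norms mentioned above, the following sketch scores a stand-in creative PRS (per-class k-means centroids, which synthesize rather than select prototypes) by its reduction rate and 1-NN accuracy. The data set and parameter choices are illustrative, not those compared in the paper.

# Minimal sketch: evaluating a "creative" PRS by reduction rate and 1-NN accuracy.
# Per-class k-means centroids stand in for a creative scheme; all settings are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

protos, labels = [], []
for c in np.unique(y_tr):
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr[y_tr == c])
    protos.append(km.cluster_centers_)
    labels.append(np.full(3, c))
P, Py = np.vstack(protos), np.concatenate(labels)

reduction_rate = 1.0 - len(P) / len(X_tr)          # fraction of training points removed
acc = KNeighborsClassifier(n_neighbors=1).fit(P, Py).score(X_te, y_te)
print(f"reduction rate: {reduction_rate:.2f}, 1-NN accuracy: {acc:.3f}")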


Pattern Recognition | 2003

Enhancing prototype reduction schemes with LVQ3-type algorithms

Sang-Woon Kim; B.J. Oommen

Various prototype reduction schemes have been reported in the literature. Foremost among these are the prototypes for nearest neighbor (PNN), the vector quantization (VQ), and the support vector machines (SVM) methods. In this paper, we shall show that these schemes can be enhanced by the introduction of a post-processing phase that is related, but not identical, to the LVQ3 process. Although post-processing with LVQ3 has been reported for the SOM and the basic VQ methods, in this paper we shall show that an analogous philosophy can be used in conjunction with the SVM and PNN rules. Our essential modification to LVQ3 first entails a partitioning of the respective training sets into two sets, called the Placement set and the Optimizing set, which are instrumental in determining the LVQ3 parameters. Such a partitioning is novel to the literature. Our experimental results demonstrate that the proposed enhancement yields the best reported prototype condensation scheme to date for both artificial data sets and samples involving real-life data sets.
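
A minimal sketch of an LVQ3-type post-processing pass in the spirit described above, assuming the initial prototypes P, Py come from a prior PRS step. The Placement/Optimizing split, the parameter grid, and the update details here are illustrative and not the authors' exact procedure.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def lvq3_pass(P, Py, X, y, alpha=0.05, w=0.25, eps=0.3):
    """One LVQ3-style sweep over (X, y), updating a copy of the prototypes P."""
    s = (1 - w) / (1 + w)
    P = np.asarray(P, dtype=float).copy()
    for x, c in zip(X, y):
        d = np.linalg.norm(P - x, axis=1)
        i, j = np.argsort(d)[:2]                     # two nearest prototypes
        if d[j] == 0 or min(d[i] / d[j], d[j] / d[i]) <= s:
            continue                                 # x falls outside the LVQ3 window
        if Py[i] == c and Py[j] == c:                # both correct: weak attraction
            P[i] += eps * alpha * (x - P[i])
            P[j] += eps * alpha * (x - P[j])
        elif (Py[i] == c) != (Py[j] == c):           # exactly one correct
            k, l = (i, j) if Py[i] == c else (j, i)
            P[k] += alpha * (x - P[k])               # pull the correct prototype closer
            P[l] -= alpha * (x - P[l])               # push the wrong prototype away
    return P

def refine_prototypes(P, Py, X_tr, y_tr):
    """Split training data into Placement/Optimizing sets and pick LVQ3 parameters."""
    X_pl, X_opt, y_pl, y_opt = train_test_split(X_tr, y_tr, test_size=0.5, random_state=0)
    best_P, best_acc = P, -1.0
    for alpha in (0.01, 0.05, 0.1):                  # illustrative parameter grid
        for w in (0.2, 0.3):
            Q = lvq3_pass(P, Py, X_pl, y_pl, alpha=alpha, w=w)
            acc = KNeighborsClassifier(n_neighbors=1).fit(Q, Py).score(X_opt, y_opt)
            if acc > best_acc:
                best_P, best_acc = Q, acc
    return best_P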


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

On using prototype reduction schemes and classifier fusion strategies to optimize kernel-based nonlinear subspace methods

Sang-Woon Kim; B.J. Oommen

In Kernel-based Nonlinear Subspace (KNS) methods, the length of the projections onto the principal component directions in the feature space is computed using a kernel matrix, K, whose dimension is equivalent to the number of sample data points. Clearly, this is problematic, especially for large data sets. In this paper, we solve this problem by subdividing the data into smaller subsets and utilizing a Prototype Reduction Scheme (PRS) as a preprocessing module to yield more refined representative prototypes. Thereafter, a Classifier Fusion Strategy (CFS) is invoked as a postprocessing module to combine the individual KNS classification results and derive a consensus decision. Essentially, the PRS is used to yield a computational advantage, and the CFS, in turn, is used to compensate for the decreased efficiency caused by the division of the data set. Our experimental results demonstrate that the proposed mechanism significantly reduces the prototype extraction time as well as the computation time without sacrificing classification accuracy. The results especially demonstrate a significant computational advantage for large data sets within a parallel processing philosophy.
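
The divide/reduce/fuse pipeline can be sketched roughly as follows. The k-means prototype reduction and the RBF-kernel SVC are stand-ins for the paper's PRS and KNS classifier, and simple majority voting stands in for the fusion strategy.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def reduce_subset(X, y, n_protos=5, seed=0):
    """Per-class k-means centroids as a stand-in prototype reduction step."""
    P, Py = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(n_protos, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        P.append(km.cluster_centers_)
        Py.append(np.full(k, c))
    return np.vstack(P), np.concatenate(Py)

def fit_fused(X, y, n_subsets=4):
    """Split the data, reduce each subset, and train one kernel classifier per subset."""
    models = []
    for idx in np.array_split(np.random.default_rng(0).permutation(len(X)), n_subsets):
        P, Py = reduce_subset(X[idx], y[idx])
        models.append(SVC(kernel="rbf", gamma="scale").fit(P, Py))
    return models

def predict_fused(models, X):
    """Fuse the subset decisions by majority vote (assumes integer class labels)."""
    votes = np.stack([m.predict(X) for m in models])     # shape: (n_subsets, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)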


Pattern Recognition | 2004

On using prototype reduction schemes to optimize kernel-based nonlinear subspace methods

Sang-Woon Kim; B. John Oommen

The subspace method of pattern recognition is a classification technique in which pattern classes are specified in terms of linear subspaces spanned by their respective class-based basis vectors. To overcome the limitations of the linear methods, kernel-based nonlinear subspace (KNS) methods have recently been proposed in the literature. In KNS, kernel principal component analysis (kPCA) has been employed to obtain principal components, not in the input space, but in a high-dimensional space whose components are nonlinearly related to the input variables. The length of the projections onto the basis vectors in the kPCA is computed using a kernel matrix K, whose dimension is equivalent to the number of sample data points. Clearly, this is problematic, especially for large data sets. In this paper, we suggest a computationally superior mechanism to solve the problem. Rather than define the matrix K with the whole data set and compute the principal components, we propose that the data be reduced into a smaller representative subset using a prototype reduction scheme (PRS). Since a PRS has the capability of extracting vectors that satisfactorily represent the global distribution structure, we demonstrate that data points which are ineffective in the classification can be eliminated to obtain a reduced kernel matrix K without degrading the performance. Our experimental results demonstrate that the proposed mechanism dramatically reduces the computation time without sacrificing classification accuracy for samples involving real-life data sets as well as artificial data sets. The results especially demonstrate the computational advantage for large data sets, such as those involved in data mining and text categorization applications.
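
A minimal sketch of the reduced-kernel-matrix idea: kernel PCA is fitted on a small prototype set rather than the full training set, so K is m x m instead of n x n. The k-means prototype extraction and all parameters below are illustrative stand-ins for an actual PRS.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                 # illustrative "large" data set

n_protos = 200                                  # kernel matrix becomes 200 x 200 instead of 5000 x 5000
protos = KMeans(n_clusters=n_protos, n_init=10, random_state=0).fit(X).cluster_centers_

kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.1).fit(protos)
Z = kpca.transform(X)                           # projections only need K(x, prototypes)
print(Z.shape)                                  # (5000, 10)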


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

On utilizing search methods to select subspace dimensions for kernel-based nonlinear subspace classifiers

Sang-Woon Kim; B.J. Oommen

In kernel-based nonlinear subspace (KNS) methods, the subspace dimensions have a strong influence on the performance of the subspace classifier. In order to obtain a high classification accuracy, a large dimension is generally required. However, if the chosen subspace dimension is too large, it leads to low performance due to the overlapping of the resultant subspaces; if it is too small, it increases the classification error due to the poor resulting approximation. The most common approach is of an ad hoc nature, selecting the dimensions based on the so-called cumulative proportion computed from the kernel matrix for each class. We propose a new method of systematically and efficiently selecting optimal or near-optimal subspace dimensions for KNS classifiers using a search strategy and a heuristic function termed the overlapping criterion. The rationale for this function is motivated in the body of the paper. The task of selecting optimal subspace dimensions is reduced to finding the best ones from a given problem-domain solution space using this criterion as a heuristic function; the search space can thus be pruned to find the best solution very efficiently. Our experimental results demonstrate that the proposed mechanism selects the dimensions efficiently without sacrificing classification accuracy.
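
For concreteness, the ad hoc baseline mentioned above (the cumulative-proportion rule) can be sketched as follows. The paper's overlapping criterion and search strategy would replace the fixed threshold used here; the kernel and threshold values are illustrative.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def cumulative_proportion_dim(X_class, gamma=0.1, threshold=0.95):
    """Smallest subspace dimension whose eigenvalues reach the given cumulative proportion."""
    K = rbf_kernel(X_class, gamma=gamma)                 # class-specific kernel matrix
    n = len(K)
    J = np.eye(n) - np.ones((n, n)) / n                  # centre K in feature space
    Kc = J @ K @ J
    eig = np.clip(np.sort(np.linalg.eigvalsh(Kc))[::-1], 0, None)
    cum = np.cumsum(eig) / eig.sum()
    return int(np.searchsorted(cum, threshold) + 1)      # first d with cum[d-1] >= threshold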


IEEE Transactions on Systems, Man, and Cybernetics | 2008

On Using Prototype Reduction Schemes to Optimize Kernel-Based Fisher Discriminant Analysis

Sang-Woon Kim; B.J. Oommen

Fisher's linear discriminant analysis (LDA) is a traditional dimensionality reduction method that has proven to be successful for decades. Numerous variants, such as kernel-based Fisher discriminant analysis (KFDA), have been proposed to enhance the LDA's power for nonlinear discriminants. Although effective, the KFDA is computationally expensive, since the complexity increases with the size of the data set. In this correspondence, we suggest a novel strategy to enhance the computation for an entire family of KFDAs. Rather than invoke the KFDA for the entire data set, we advocate that the data be first reduced into a smaller representative subset using a prototype reduction scheme and that the dimensionality reduction be achieved by invoking a KFDA on this reduced data set. In this way, data points that are ineffective in the dimension reduction and classification can be eliminated to obtain a significantly reduced kernel matrix K without degrading the performance. Our experimental results demonstrate that the proposed mechanism dramatically reduces the computation time without sacrificing classification accuracy for artificial and real-life data sets.
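
A rough sketch of the reduce-then-kernel-discriminant idea. The per-class k-means reduction is a stand-in for a PRS, and kernel PCA followed by LDA is used here only as a practical approximation of a kernel Fisher discriminant, not the paper's exact KFDA formulation.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def reduce_per_class(X, y, n_protos=20, seed=0):
    """Condense each class to a handful of k-means centroids (stand-in PRS)."""
    P, Py = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(n_protos, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        P.append(km.cluster_centers_)
        Py.append(np.full(k, c))
    return np.vstack(P), np.concatenate(Py)

def fit_reduced_kernel_discriminant(X, y):
    """Fit the kernel discriminant on the reduced set, so K is |P| x |P| rather than n x n."""
    P, Py = reduce_per_class(X, y)
    kpca = KernelPCA(n_components=min(30, len(P) - 1), kernel="rbf", gamma=0.1).fit(P)
    lda = LinearDiscriminantAnalysis().fit(kpca.transform(P), Py)
    return lambda Xnew: lda.predict(kpca.transform(Xnew))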


Pattern Recognition Letters | 2010

A pre-clustering technique for optimizing subclass discriminant analysis

Sang-Woon Kim

Subclass discriminant analysis (SDA) [Zhu, M., Martinez, A.M., 2006. Subclass discriminant analysis. IEEE Trans. Pattern Anal. Machine Intell., 28(8), pp. 1274-1286] is a dimensionality reduction method that has proven successful for different types of class distributions. In SDA, the reduction of dimensionality is not achieved by assuming that each class is represented by a single cluster, but rather by approximating the underlying distribution with a mixture of Gaussians. The advantage of SDA is that, since it does not treat the class-conditional distributions as unimodal, nonlinearly separable problems can be handled as linear ones. The problem with this strategy, however, is that to estimate the number of subclasses needed to represent the distribution of each class, i.e., to find the best partition, all possible solutions should be verified, which leads to a high computational cost. In this paper, we propose a method that reduces the computational burden of SDA-based classification by simply limiting the number of classes to be examined, choosing only a few classes of the training set prior to executing the SDA. To select the classes to be partitioned, the intra-set distance is employed as a criterion, and k-means clustering is performed to divide them. Our experimental results for an artificial data set of XOR-type samples and three benchmark image databases, Kimia, AT&T, and Yale, demonstrate that the processing CPU time of the SDA optimized with the proposed scheme can be reduced dramatically without either sacrificing classification accuracy or increasing the computational complexity.
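
The pre-clustering step can be sketched as follows; the number of classes to split, the number of subclasses, and the final classifier are illustrative choices, not the paper's settings.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

def intra_set_distance(Xc):
    """Mean pairwise distance within one class (the selection criterion used above)."""
    D = pairwise_distances(Xc)
    return D.sum() / (len(Xc) * (len(Xc) - 1))

def fit_preclustered_sda(X, y, n_split=2, n_subclasses=2, seed=0):
    classes = np.unique(y)
    spread = {c: intra_set_distance(X[y == c]) for c in classes}
    to_split = sorted(classes, key=lambda c: -spread[c])[:n_split]   # widest classes only
    sub = np.array(y, dtype=object)
    for c in to_split:
        idx = np.where(y == c)[0]
        km = KMeans(n_clusters=n_subclasses, n_init=10, random_state=seed).fit(X[idx])
        sub[idx] = [f"{c}_{s}" for s in km.labels_]                  # relabel into subclasses
    lda = LinearDiscriminantAnalysis().fit(X, sub.astype(str))       # projection from subclass labels
    clf = KNeighborsClassifier(n_neighbors=1).fit(lda.transform(X), y)
    return lambda Xnew: clf.predict(lda.transform(Xnew))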


Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR), Volume 8621 | 2014

Metric Learning in Dissimilarity Space for Improved Nearest Neighbor Performance

Robert P. W. Duin; Manuele Bicego; Mauricio Orozco-Alzate; Sang-Woon Kim; Marco Loog

Showing the nearest neighbor is a useful explanation for the result of an automatic classification. Given distance measures, defined by experts, may be improved on the basis of a training set. We study several proposals to optimize such measures for nearest neighbor classification, explicitly including non-Euclidean measures. Some of them directly improve the distance measure; others construct a dissimilarity space in which the Euclidean distances show significantly better performance. Results are application dependent and raise the question of which characteristics of the original distance measures influence the possibilities of metric learning.
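
For context, a dissimilarity space and a Euclidean 1-NN in it can be built as sketched below; the representatives, the underlying distance, and the classifier are illustrative, and the metric-learning variants studied in the paper would operate on such a space.

import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

def fit_dissimilarity_nn(X_tr, y_tr, n_repr=30, seed=0):
    """Represent each object by its distances to a few representatives, then apply Euclidean 1-NN."""
    rng = np.random.default_rng(seed)
    R = X_tr[rng.choice(len(X_tr), size=min(n_repr, len(X_tr)), replace=False)]
    D_tr = pairwise_distances(X_tr, R)                    # rows live in the dissimilarity space
    clf = KNeighborsClassifier(n_neighbors=1).fit(D_tr, y_tr)
    return lambda Xnew: clf.predict(pairwise_distances(Xnew, R))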


Canadian Conference on Artificial Intelligence | 2007

On Combining Dissimilarity-Based Classifiers to Solve the Small Sample Size Problem for Appearance-Based Face Recognition

Sang-Woon Kim; Robert P. W. Duin

For high-dimensional classification tasks, such as face recognition, the number of samples is smaller than the dimensionality of the samples. In such cases, a problem encountered in Linear Discriminant Analysis (LDA)-based methods for dimension reduction is the so-called Small Sample Size (SSS) problem. A number of LDA-extension approaches that attempt to solve the SSS problem have been proposed in the literature. Recently, a different approach employing a dissimilarity representation was proposed [18], in which an object is represented by its dissimilarities to representatives extracted from the training samples, rather than by the feature vector itself. Beyond utilizing the dissimilarity representation, this paper proposes a new way of employing a fusion technique, both in representing features and in designing classifiers, in order to increase the classification accuracy. The proposed scheme is completely different from the conventional ones in terms of the computation of the transformation matrix as well as the selection of the number of dimensions. The experimental results demonstrate that the proposed combining mechanism works well and achieves further improved efficiency compared with the LDA-extension approaches on the well-known AT&T and Yale face databases. The results especially demonstrate that the highest accuracy rates are achieved when the combined representation is classified with the trained combiners.


Pattern Recognition Letters | 2011

An empirical evaluation on dimensionality reduction schemes for dissimilarity-based classifications

Sang-Woon Kim

This paper presents an empirical evaluation of methods for reducing the dimensionality of dissimilarity spaces in order to optimize dissimilarity-based classifications (DBCs). One problem of DBCs is the high dimensionality of the dissimilarity spaces. To address this problem, two kinds of solutions have been proposed in the literature: prototype selection (PS)-based methods and dimension reduction (DR)-based methods. Although PS-based and DR-based methods have been explored separately by many researchers, little analysis has been done comparing the two. Therefore, this paper aims to find a suitable method for optimizing DBCs through a comparative study. Our empirical evaluation, obtained with the two approaches on an artificial data set and three real-life benchmark databases, demonstrates that DR-based methods, such as those based on principal component analysis (PCA) and linear discriminant analysis (LDA), generally improve the classification accuracy more than PS-based methods. In particular, the experimental results demonstrate that PCA is more useful for well-represented data sets, while LDA is more helpful for small sample size problems.
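
The two families compared above can be contrasted on a precomputed dissimilarity matrix D (training objects by representatives) roughly as follows. The random column selection is only a placeholder for a real prototype selection method, and the retained dimensionality is illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def ps_reduce(D, k=20, seed=0):
    """Prototype selection: keep k representative columns (randomly here; a real PS method chooses them systematically)."""
    cols = np.random.default_rng(seed).choice(D.shape[1], size=k, replace=False)
    return (lambda Dn: Dn[:, cols]), D[:, cols]

def dr_reduce(D, k=20):
    """Dimension reduction: project the full dissimilarity space with PCA (assumes k <= D.shape[1])."""
    pca = PCA(n_components=k).fit(D)
    return pca.transform, pca.transform(D)

def evaluate(reducer, y_tr, D_te, y_te):
    """Score a 1-NN classifier in the reduced dissimilarity space."""
    transform, Z_tr = reducer
    clf = KNeighborsClassifier(n_neighbors=1).fit(Z_tr, y_tr)
    return clf.score(transform(D_te), y_te)

# e.g. evaluate(ps_reduce(D_tr), y_tr, D_te, y_te) vs. evaluate(dr_reduce(D_tr), y_tr, D_te, y_te)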

Collaboration


Dive into Sang-Woon Kim's collaborations.

Top Co-Authors

Robert P. W. Duin

Delft University of Technology
