Kwang Hyung Lee | Researchain

Publication

Featured research published by Kwang Hyung Lee.

Pattern Recognition | 2004

On cluster validity index for estimation of the optimal number of fuzzy clusters

Dae-Won Kim; Kwang Hyung Lee; Doheon Lee

A new cluster validity index is proposed that determines the optimal partition and optimal number of clusters for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed validity index exploits an overlap measure and a separation measure between clusters. The overlap measure, which indicates the degree of overlap between fuzzy clusters, is obtained by computing an inter-cluster overlap. The separation measure, which indicates the isolation distance between fuzzy clusters, is obtained by computing a distance between fuzzy clusters. A good fuzzy partition is expected to have a low degree of overlap and a large separation distance. Tests of the proposed index and nine previously formulated indexes on well-known data sets showed that the proposed index is more effective and reliable than the other indexes.
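The abstract does not give the formula of the proposed index, so the following is only a minimal sketch of the general workflow it describes: run fuzzy c-means, then score the resulting partition with a validity index that rewards compact, well-separated clusters. The index used here is the well-known Xie-Beni index (one of the older indices, not the one proposed in the paper), and all function names are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(points, centers, m=2.0):
    """Standard FCM membership update; u[i][k] is the degree of point k in cluster i."""
    u = [[0.0] * len(points) for _ in centers]
    for k, p in enumerate(points):
        d = [sq_dist(p, c) for c in centers]
        hits = [i for i, v in enumerate(d) if v == 0.0]
        if hits:
            # The point coincides with one or more centers: share crisp membership.
            for i in hits:
                u[i][k] = 1.0 / len(hits)
            continue
        for i in range(len(centers)):
            # Squared distances are used, so the exponent is 1/(m-1) rather than 2/(m-1).
            u[i][k] = 1.0 / sum((d[i] / dj) ** (1.0 / (m - 1.0)) for dj in d)
    return u

def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [x ** m for x in row]
        total = sum(w)
        centers.append(tuple(
            sum(wk * p[d] for wk, p in zip(w, points)) / total for d in range(dim)
        ))
    return centers

def fcm(points, init_centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(init_centers)
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

def xie_beni(points, centers, u):
    """Xie-Beni index (compactness / separation); lower values indicate a better partition."""
    compact = sum(
        u[i][k] ** 2 * sq_dist(p, c)
        for i, c in enumerate(centers)
        for k, p in enumerate(points)
    )
    separation = min(
        sq_dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return compact / (len(points) * separation)
```

To estimate the number of clusters, one would run `fcm` for each candidate c and keep the c with the lowest index value. The sketch assumes the initial centers are supplied by the caller and does not guard against centers that collapse onto each other.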

Pattern Recognition | 2005

Rapid and brief communication: Evaluation of the performance of clustering algorithms in kernel-induced feature space

Dae-Won Kim; Ki-Young Lee; Doheon Lee; Kwang Hyung Lee

By using a kernel function, data that are not easily separable in the original space can be clustered into homogeneous groups in the implicitly transformed high-dimensional feature space. Kernel k-means algorithms have recently been shown to perform better than conventional k-means algorithms in unsupervised classification. However, few reports have examined the benefits of using a kernel function and the relative merits of the various kernel clustering algorithms with regard to the data distribution. In this study, we reformulated four representative clustering algorithms based on a kernel function and evaluated their performances for various data sets. The results indicate that each kernel clustering algorithm gives markedly better performance than its conventional counterpart for almost all data sets. Of the kernel clustering algorithms studied in the present work, the kernel average linkage algorithm gives the most accurate clustering results.
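As an illustration of how a clustering algorithm can be reformulated with a kernel function, the sketch below shows kernel k-means only (the abstract does not list the four algorithms' formulations, so this is the textbook version, not necessarily the authors'). The key step is that the squared distance from a point to a cluster mean in feature space can be computed from kernel evaluations alone. The function names and the choice of an RBF kernel are assumptions made for the example.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear(x, y):
    """Linear kernel; with it, kernel k-means reduces to ordinary k-means."""
    return sum(a * b for a, b in zip(x, y))

def feature_space_sq_dist(x, cluster, kernel):
    """||phi(x) - mean(phi(cluster))||^2 computed from kernel values only."""
    n = len(cluster)
    self_term = kernel(x, x)
    cross = sum(kernel(x, y) for y in cluster) / n
    within = sum(kernel(y, z) for y in cluster for z in cluster) / (n * n)
    return self_term - 2.0 * cross + within

def kernel_kmeans(points, labels, k, kernel, iters=20):
    """Reassign points to the nearest feature-space mean until labels stop changing."""
    labels = list(labels)
    for _ in range(iters):
        clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
        new_labels = []
        for p in points:
            dists = [
                feature_space_sq_dist(p, cl, kernel) if cl else float("inf")
                for cl in clusters
            ]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

With `linear` as the kernel, `feature_space_sq_dist` equals the ordinary squared Euclidean distance to the cluster centroid, which is a convenient sanity check before switching to a nonlinear kernel such as `rbf`.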

Pattern Recognition Letters | 2004

Fuzzy clustering of categorical data using fuzzy centroids

Dae-Won Kim; Kwang Hyung Lee; Doheon Lee

In this paper the conventional fuzzy k-modes algorithm for clustering categorical data is extended by representing the clusters of categorical data with fuzzy centroids instead of the hard-type centroids used in the original algorithm. Use of fuzzy centroids makes it possible to fully exploit the power of fuzzy sets in representing the uncertainty in the classification of categorical data. To test the proposed approach, the proposed algorithm and two conventional algorithms (the k-modes and fuzzy k-modes algorithms) were used to cluster three categorical data sets. The proposed method was found to give markedly better clustering results.
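The difference between a hard-type centroid and a fuzzy centroid can be sketched as follows. This is one plausible reading of the idea, not the paper's exact formulation: a fuzzy centroid keeps, for each attribute, a weight for every category value (weights sum to 1), whereas a hard centroid keeps only the single most frequent value. The function names, the fuzzifier value, and the dissimilarity definition are assumptions made for the example.

```python
from collections import defaultdict

def fuzzy_centroid(records, memberships, m=1.5):
    """Return one dict per attribute mapping category value -> weight (weights sum to 1).

    Assumes at least one record has a nonzero membership.
    """
    n_attrs = len(records[0])
    centroid = []
    for a in range(n_attrs):
        weights = defaultdict(float)
        for rec, u in zip(records, memberships):
            # Each record votes for its own category value with weight u**m.
            weights[rec[a]] += u ** m
        total = sum(weights.values())
        centroid.append({cat: w / total for cat, w in weights.items()})
    return centroid

def dissimilarity(centroid, record):
    """Per attribute, add the centroid weight placed on values other than the record's."""
    return sum(1.0 - attr.get(value, 0.0) for attr, value in zip(centroid, record))

def hard_mode(centroid):
    """Collapse a fuzzy centroid to a hard-type centroid (most heavily weighted value)."""
    return tuple(max(attr, key=attr.get) for attr in centroid)
```

For example, if a cluster's members are split evenly between `"small"` and `"large"` on one attribute, the fuzzy centroid records a weight of 0.5 for each value, while a hard mode would have to pick one and discard the other.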

Pattern Recognition Letters | 2005

A kernel-based subtractive clustering method

Dae-Won Kim; Ki-Young Lee; Doheon Lee; Kwang Hyung Lee

In this paper the conventional subtractive clustering method is extended by calculating the mountain value of each data point based on a kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as the data points are mapped into a high dimensional space. Use of the kernel function makes it possible to cluster data that is linearly non-separable in the original space into homogeneous groups in the transformed high dimensional space. Application of the conventional subtractive method and the kernel-based subtractive method to well-known data sets showed the superiority of the proposed approach.
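A minimal sketch of the idea, assuming the standard subtractive clustering scheme in which each point's mountain value (potential) is a sum of exponentially decaying contributions from all points, and the selected center's influence is subtracted before the next center is chosen. The kernel-induced squared distance is computed as K(x,x) - 2K(x,y) + K(y,y). The parameter names `ra` and `rb`, their default values, and the Gaussian kernel are assumptions; the paper's exact formulation is not given in the abstract.

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))

def kernel_sq_dist(x, y, kernel):
    """Squared distance between phi(x) and phi(y) in the kernel-induced feature space."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def subtractive_centers(points, n_centers, kernel, ra=1.0, rb=1.5):
    """Return the indices of the selected cluster centers, in order of selection."""
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    d = [[kernel_sq_dist(p, q, kernel) for q in points] for p in points]
    # Mountain value: points with many close neighbours get a high potential.
    potential = [sum(math.exp(-alpha * dij) for dij in row) for row in d]
    centers = []
    for _ in range(n_centers):
        best = max(range(len(points)), key=potential.__getitem__)
        centers.append(best)
        peak = potential[best]
        # Subtract the chosen center's influence so nearby points are not picked next.
        potential = [
            p - peak * math.exp(-beta * d[i][best]) for i, p in enumerate(potential)
        ]
    return centers
```

Note that with a Gaussian kernel the induced squared distance is bounded between 0 and 2, so `ra` and `rb` have to be chosen on that scale rather than on the scale of the original data.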

Information Sciences | 2004

A cluster validation index for GK cluster analysis based on relative degree of sharing

Young-Il Kim; Dae-Won Kim; Doheon Lee; Kwang Hyung Lee

In this paper, the problems of traditional validity indices when applied to Gustafson-Kessel (GK) clustering are reviewed. A new cluster validity index for the GK algorithm is proposed. This validity index is defined as the average value of the relative degrees of sharing of all possible pairs of fuzzy clusters in the system. It computes the overlap of each pair of fuzzy clusters by considering the degree of sharing of each data point in the overlap. The optimal number of clusters is obtained by minimizing the validity index. Experiments in which the proposed validity index and several traditional validity indices were applied to six data sets highlight the superior qualities of the proposed index. The results indicate that the proposed validity index is very reliable.

Pattern Recognition Letters | 2004

A novel initialization scheme for the fuzzy c-means algorithm for color clustering

Dae-Won Kim; Kwang Hyung Lee; Doheon Lee

A novel initialization scheme for the fuzzy c-means (FCM) algorithm is proposed for the color clustering problem. Given a set of color points, the proposed initialization scheme extracts the most vivid and distinguishable colors, referred to here as the dominant colors. The color points closest to these dominant colors are selected as the initial centroids in the FCM calculations. To obtain the dominant colors and their closest color points, we introduce reference colors and define a fuzzy membership model between a color point and a reference color. The effectiveness and reliability of the proposed method are demonstrated through various color clustering examples.

IEEE Transactions on Neural Networks | 2007

Density-Induced Support Vector Data Description

Ki-Young Lee; Dae-Won Kim; Kwang Hyung Lee; Doheon Lee

The purpose of data description is to give a compact description of the target data that represents most of its characteristics. In a support vector data description (SVDD), the compact description of target data is given in a hyperspherical model, which is determined by a small portion of data called support vectors. Despite its usefulness, the conventional SVDD may not identify the optimal target description, especially when the support vectors do not reflect the overall characteristics of the target data. To address this issue in SVDD methodology, we propose a new SVDD by introducing new distance measurements based on the notion of a relative density degree for each data point in order to reflect the distribution of a given data set. Moreover, for a real application, we extend the proposed method to the protein localization prediction problem, which is a multiclass and multilabel problem. Experiments with various real data sets show promising results.

Pattern Recognition Letters | 2003

Fuzzy cluster validation index based on inter-cluster proximity

Dae-Won Kim; Kwang Hyung Lee; Doheon Lee

A new cluster validity index is proposed for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed validity index exploits an inter-cluster proximity between fuzzy clusters. The inter-cluster proximity is used to measure the degree of overlap between clusters. A low proximity value indicates well-partitioned clusters. The best fuzzy c-partition is obtained by minimizing the inter-cluster proximity with respect to c. Well-known data sets are tested to show the effectiveness and reliability of the proposed index.

Bioinformatics | 2005

Detecting clusters of different geometrical shapes in microarray gene expression data

Dae-Won Kim; Kwang Hyung Lee; Doheon Lee

Motivation: Clustering has been used as a popular technique for finding groups of genes that show similar expression patterns under multiple experimental conditions. Many clustering methods have been proposed for clustering gene-expression data, including hierarchical clustering, k-means clustering and self-organizing maps (SOM). However, the conventional methods are limited in their ability to identify clusters of different shapes because they use a fixed distance norm when calculating the distance between genes. The fixed distance norm imposes a fixed geometrical shape on the clusters regardless of the actual data distribution. Thus, different distance norms are required for handling the different shapes of clusters.
Results: We present the Gustafson-Kessel (GK) clustering method for microarray gene-expression data. To detect clusters of different shapes in a dataset, we use an adaptive distance norm that is calculated from a fuzzy covariance matrix (F) of each cluster, in which the eigenstructure of F is used as an indicator of the shape of the cluster. Moreover, the GK method is less prone to falling into local minima than k-means and SOM because it makes decisions through the use of membership degrees of a gene to clusters. The algorithmic procedure is accomplished by the alternating optimization technique, which iteratively improves a sequence of sets of clusters until no further improvement is possible. To test the performance of the GK method, we applied the GK method and well-known conventional methods to three recently published yeast datasets, and compared the performance of each method using the Saccharomyces Genome Database annotations. The clustering results of the GK method are significantly more relevant to the biological annotations than those of the other methods, demonstrating its effectiveness and potential for clustering gene-expression data.
Availability: The software was developed in Java and can be executed on any platform running a Java Virtual Machine (JVM). It is available from the authors upon request.
Supplementary information: Supplementary data are available at http://dragon.kaist.ac.kr/gk.
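The adaptive distance norm described in the abstract above can be sketched for the two-dimensional case as follows. This uses the standard Gustafson-Kessel formulas (fuzzy covariance matrix F, norm-inducing matrix A = (rho det F)^(1/n) F^(-1)) and is not taken from the authors' Java software; the function names and the restriction to 2-D points are assumptions made to keep the example dependency-free.

```python
def fuzzy_covariance(points, u, center, m=2.0):
    """Fuzzy covariance matrix F of one cluster (2-D points, memberships u)."""
    w = [x ** m for x in u]
    total = sum(w)
    sxx = sum(wk * (p[0] - center[0]) ** 2 for wk, p in zip(w, points)) / total
    syy = sum(wk * (p[1] - center[1]) ** 2 for wk, p in zip(w, points)) / total
    sxy = sum(
        wk * (p[0] - center[0]) * (p[1] - center[1]) for wk, p in zip(w, points)
    ) / total
    return ((sxx, sxy), (sxy, syy))

def gk_sq_dist(x, center, F, rho=1.0):
    """Squared GK distance (x - v)^T A (x - v) with A = (rho * det F)^(1/2) * F^-1."""
    (a, b), (_, d) = F
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("fuzzy covariance matrix is singular")
    scale = (rho * det) ** 0.5  # (rho * det F)^(1/n) with n = 2 dimensions
    dx, dy = x[0] - center[0], x[1] - center[1]
    # Quadratic form with the closed-form inverse of a symmetric 2x2 matrix.
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return scale * quad
```

For a cluster that is elongated along the x-axis, a point one unit away along x is much closer under this norm than a point one unit away along y, which is how the method adapts to non-spherical cluster shapes.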

Systems, Man, and Cybernetics | 2004

Fuzzy branching temporal logic

Seong-ick Moon; Kwang Hyung Lee; Doheon Lee

Intelligent systems require a systematic way to represent and handle temporal information containing uncertainty. In particular, a logical framework is needed that can represent uncertain temporal information and its relationships with logical formulae. Fuzzy linear temporal logic (FLTL), a generalization of propositional linear temporal logic (PLTL) with fuzzy temporal events and fuzzy temporal states defined on a linear time model, was previously proposed for this purpose. However, many systems are best represented by branching time models in which each state can have more than one possible future path. In this paper, fuzzy branching temporal logic (FBTL) is proposed to address this problem. FBTL adopts and generalizes computation tree logic (CTL*), which is a classical branching temporal logic. The temporal model of FBTL is capable of representing fuzzy temporal events and fuzzy temporal states, and the order relation among them is represented as a directed graph. The utility of FBTL is demonstrated using a fuzzy job shop scheduling problem as an example.