Woong-Kee Loh
KAIST
Publications
Featured research published by Woong-Kee Loh.
Data Mining and Knowledge Discovery | 2004
Woong-Kee Loh; Sang-Wook Kim; Kyu-Young Whang
In this paper, an algorithm is proposed for subsequence matching that supports normalization transform in time-series databases. Normalization transform enables finding sequences with similar fluctuation patterns even though they are not close to each other before the normalization transform. Simply applying existing subsequence matching algorithms to support normalization transform is not feasible, since those algorithms do not have the information needed for the normalization transform of subsequences of arbitrary lengths. Applying the existing whole matching algorithm that supports normalization transform to subsequence matching is feasible, but it requires an index for every possible length of the query sequence, causing serious overhead in both storage space and update time. The proposed algorithm generates indexes only for a small number of different query-sequence lengths and, for subsequence matching, selects the most appropriate index among them; this approach is called index interpolation. Using more indexes yields better search performance, so search performance can be traded off against storage space by adjusting the number of indexes. It is formally proved that the proposed algorithm does not cause false dismissal. For performance evaluation, a series of experiments is conducted using indexes for only five different lengths out of query-sequence lengths from 256 to 512. The results show that the proposed algorithm outperforms the sequential scan by up to 2.4 times on average when the selectivity of the query is 10⁻², and by up to 14.6 times when it is 10⁻⁵. Since the proposed algorithm performs better with smaller selectivities, it is suitable for practical situations, where queries with smaller selectivities are much more frequent.
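The normalization transform itself is simple to state, and a small example shows why it matters: two sequences can be far apart in raw Euclidean distance yet identical after normalization. The sketch below is our own Python illustration, not code from the paper. It assumes the common z-normalization (zero mean, unit standard deviation) and implements only the sequential-scan baseline the abstract compares against, not index interpolation. The function names `normalize`, `euclidean`, and `normalized_subsequence_scan` are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(seq: Sequence[float]) -> List[float]:
    """Normalization transform (assumed z-normalization): zero mean, unit std."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        # A constant sequence has no fluctuation; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_subsequence_scan(data: Sequence[float],
                                query: Sequence[float],
                                eps: float) -> List[int]:
    """Sequential-scan baseline: offsets whose normalized subsequence lies
    within eps of the normalized query. Every subsequence is renormalized,
    which is the cost an index-based method tries to avoid."""
    m = len(query)
    q = normalize(query)
    return [i for i in range(len(data) - m + 1)
            if euclidean(normalize(data[i:i + m]), q) <= eps]
```

For example, `[1, 2, 3, 2, 1]` and `[100, 200, 300, 200, 100]` are roughly 400 apart in raw distance, but their normalized forms coincide, so a scan with a tiny `eps` still reports the offset where the first pattern occurs inside a longer data sequence.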
Conference on Information and Knowledge Management | 2000
Woong-Kee Loh; Sang-Wook Kim; Kyu-Young Whang
In this paper, we propose a subsequence matching algorithm that supports normalization transform in time-series databases. Normalization transform enables finding sequences with similar fluctuation patterns although they are not close to each other before the normalization transform. Application of the existing whole matching algorithm supporting normalization transform to subsequence matching is feasible, but it requires an index for every possible length of the query sequence, causing serious overhead on both storage space and update time. The proposed algorithm generates indexes only for a small number of different lengths of query sequences. For subsequence matching, it selects the most appropriate index among them. We can obtain better search performance by using more indexes. We call our approach index interpolation. We formally prove that the proposed algorithm does not cause false dismissal. For performance evaluation, we have conducted experiments using the indexes for only five different lengths out of the lengths 256 to 512 of the query sequence. The results show that the proposed algorithm outperforms the sequential scan by up to 14.6 times on average when the selectivity of the query is 10⁻….
Information Systems | 2001
Yang-Sae Moon; Kyu-Young Whang; Woong-Kee Loh
In this paper, we propose a new subsequence matching method, Dual Match. Dual Match exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows; it is thus a dual of the approach by Faloutsos et al. (Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, Washington, 1994, pp. 419–429) (FRM for short), which divides data sequences into sliding windows and the query sequence into disjoint windows. FRM causes many false alarms (i.e., candidates that do not qualify) because, to save index storage space, it stores minimum bounding rectangles rather than the individual points representing windows. Dual Match solves this problem by storing points directly without incurring excessive storage overhead. Experimental results show that, in most cases, Dual Match provides large improvements over FRM in both false alarms and performance given the same amount of storage space. In particular, for low selectivities (less than 10⁻⁴), Dual Match improves performance significantly, by up to 430-fold. On the other hand, for high selectivities (more than 10⁻²), it shows a very minor degradation (less than 29%). For selectivities in between (10⁻⁴ to 10⁻²), Dual Match performs slightly better than FRM. Overall, these results indicate that our approach provides a new paradigm in subsequence matching that improves performance significantly in large database applications.
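The windowing duality can be made concrete with a short sketch. Below, data sequences are cut into disjoint windows and the query into sliding windows; a close pair (data window at offset `i`, query window at offset `j`) proposes a candidate match starting at `i - j`, and candidates are then verified with the exact distance. This is our own Python illustration under stated assumptions: it uses raw windows and a brute-force pairing loop instead of the feature-space index the method relies on, and `guaranteed_windows`, `dual_match`, and the `eps / sqrt(p)` pruning threshold are our hypothetical rendering of the standard no-false-dismissal argument, not code from the paper.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def disjoint_windows(data: Sequence[float], w: int) -> List[Tuple[int, Sequence[float]]]:
    """Data side: non-overlapping windows of size w, keyed by start offset."""
    return [(i, data[i:i + w]) for i in range(0, len(data) - w + 1, w)]


def sliding_windows(query: Sequence[float], w: int) -> List[Tuple[int, Sequence[float]]]:
    """Query side: every window of size w, one per start offset."""
    return [(j, query[j:j + w]) for j in range(len(query) - w + 1)]


def guaranteed_windows(m: int, w: int) -> int:
    """Fewest disjoint windows lying entirely inside any length-m span.
    The count only depends on the span's start offset modulo w."""
    return min((s + m) // w - (-(-s // w)) for s in range(w))


def dual_match(data: Sequence[float], query: Sequence[float],
               w: int, eps: float) -> List[int]:
    m = len(query)
    p = guaranteed_windows(m, w)
    if p < 1:
        raise ValueError("query must be at least 2*w - 1 long")
    # If a whole span is within eps of the query, its p (or more) inner
    # windows share that squared distance, so at least one of them is
    # within eps / sqrt(p) of the aligned query window.
    limit = eps / math.sqrt(p)
    candidates = set()
    for i, dwin in disjoint_windows(data, w):  # an index would replace this loop
        for j, qwin in sliding_windows(query, w):
            start = i - j
            if 0 <= start <= len(data) - m and euclidean(dwin, qwin) <= limit:
                candidates.add(start)
    # Post-processing: discard false alarms using the exact distance.
    return sorted(s for s in candidates
                  if euclidean(data[s:s + m], query) <= eps)
```

Because every true match contains at least one whole disjoint window when the query length is at least `2*w - 1`, the candidate set never misses a qualifying offset, so the result equals that of a brute-force scan.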
Information Sciences | 2015
Woong-Kee Loh; Hwanjo Yu
Graphics processing units (GPUs) have been used to speed up many conventional data mining algorithms. DBSCAN, a popular clustering algorithm that is often used in practice, has been extended to execute on a GPU. However, existing GPU-based DBSCAN extensions still have two impediments: the distances from all objects must be repeatedly computed to find neighbor objects, and the objects and intermediate clustering results are stored in the GPU's costly off-chip memory. This paper proposes CudaSCAN, a novel algorithm that improves the efficiency of DBSCAN by making better use of the GPU. CudaSCAN consists of three phases: (1) partitioning the entire dataset into sub-regions whose size is an integer multiple of the on-chip shared memory size of the GPU; (2) local clustering within sub-regions in parallel; and (3) merging the local clustering results. CudaSCAN allows sub-regions to overlap so that local clustering in each sub-region can proceed independently and in parallel; this in turn allows objects and/or intermediate results to be stored in on-chip shared memory, whose access cost is a few hundred times lower than that of off-chip global memory. The independence also allows merging to be parallelized. This paper proves the correctness of CudaSCAN, and according to our extensive experiments, CudaSCAN outperforms CUDA-DClust, a previous GPU-based DBSCAN extension, by up to 163.6 times.
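The key idea behind the overlap is that each sub-region can answer its own neighbor queries without looking elsewhere. The following CPU-only Python sketch illustrates that property under our own assumptions: 2-D points, square grid cells standing in for shared-memory-sized sub-regions, and an overlap margin of `eps` on every side. It covers only partitioning and a local core-point test; local cluster expansion and the merging phase are omitted, and all names (`partition_with_overlap`, `local_core_points`, etc.) are hypothetical, not CudaSCAN's actual kernels.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def partition_with_overlap(points: List[Point], cell: float, eps: float):
    """Give each point one owning cell, and copy it into every cell whose
    box, expanded by eps on each side, contains it (the overlap)."""
    owned: Dict[Cell, List[int]] = defaultdict(list)
    extended: Dict[Cell, List[int]] = defaultdict(list)
    reach = math.floor(eps / cell) + 1  # farthest cell an eps-margin can touch
    for idx, (x, y) in enumerate(points):
        cx, cy = math.floor(x / cell), math.floor(y / cell)
        owned[(cx, cy)].append(idx)
        for nx in range(cx - reach, cx + reach + 1):
            for ny in range(cy - reach, cy + reach + 1):
                if (nx * cell - eps <= x <= (nx + 1) * cell + eps
                        and ny * cell - eps <= y <= (ny + 1) * cell + eps):
                    extended[(nx, ny)].append(idx)
    return owned, extended


def region_query(points: List[Point], i: int,
                 candidates: Iterable[int], eps: float) -> List[int]:
    """Indices among candidates within eps of point i (including i itself)."""
    px, py = points[i]
    return [j for j in candidates
            if math.hypot(points[j][0] - px, points[j][1] - py) <= eps]


def local_core_points(points: List[Point], owned, extended,
                      eps: float, min_pts: int) -> Set[int]:
    """Core-point test per sub-region, touching only that sub-region's
    extended point list; sub-regions are independent of one another."""
    core: Set[int] = set()
    for cid, members in owned.items():
        for i in members:
            if len(region_query(points, i, extended[cid], eps)) >= min_pts:
                core.add(i)
    return core
```

Any point within `eps` of an owned point differs from it by at most `eps` in each coordinate, so it always lands in that cell's extended list. That is why the local core-point set equals the one computed over the whole dataset.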
Archive | 2014
Woong-Kee Loh; Young-Ho Park
Density-based clustering forms clusters of densely gathered objects separated by sparse regions. In this paper, we survey earlier and recent density-based clustering algorithms. DBSCAN [6], OPTICS [1], and DENCLUE [5, 6] are representative earlier density-based clustering algorithms. Several recent algorithms, such as PDBSCAN [8], CUDA-DClust [3], and GSCAN [7], have been proposed to improve the performance of DBSCAN by making the most of multi-core CPUs and GPUs.
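For readers unfamiliar with DBSCAN [6], a minimal textbook version makes the notions of core points, border points, and noise concrete. This is our own Python sketch with a brute-force O(n²) neighbor search, not code from any of the surveyed systems; the neighbor search and cluster expansion shown here are the steps that the parallel and GPU variants aim to accelerate. The names `dbscan` and `NOISE` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1


def dbscan(points: Sequence[Tuple[float, ...]], eps: float,
           min_pts: int) -> List[int]:
    """Textbook DBSCAN; returns a cluster id per point, NOISE (-1) for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighborhood, including the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is core: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels  # type: ignore[return-value]
```

With two tight groups of four points and one isolated point, `eps=0.2`, and `min_pts=3`, the groups become clusters 0 and 1 and the isolated point is labeled noise.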
Multimedia Systems | 2015
Yang-Sae Moon; Woong-Kee Loh
Nowadays there are many efforts to develop image matching applications that exploit the large number of images stored in smart devices such as smartphones, smart pads, and smart cameras. Boundary image matching converts boundary images to time-series and identifies similar boundary images using time-series matching on those time-series. In boundary image matching, computing the rotation-invariant distance between image time-series is very time-consuming, since it requires many Euclidean distance computations, one for every possible rotation. To support boundary image matching on smart devices, we need to devise a simple but fast mechanism for computing rotation-invariant distances. For this purpose, in this paper we propose a novel rotation-invariant matching solution that significantly reduces the number of distance computations using the triangular inequality. To this end, we first present the notion of self-rotation distance and formally show that the self-rotation distance with the triangular inequality produces a tight lower bound and prunes many unnecessary distance computations. Using the self-rotation distance, we then propose a triangular inequality-based solution to rotation-invariant image matching. We next present the concept of k-self rotation distance as a generalized version of the self-rotation distance and formally show that this k-self rotation distance produces a tighter lower bound and prunes more unnecessary distance computations. Using the …
Computers & Industrial Engineering | 2012
Wookey Lee; Woong-Kee Loh; Mye M. Sohn
Information Sciences | 2010
Wook-Shin Han; Woong-Kee Loh; Kyu-Young Whang
IEICE Transactions on Information and Systems | 2011
Woong-Kee Loh; Yang-Sae Moon; Wookey Lee
Information Sciences | 2016
Dongha Lee; Jinoh Oh; Woong-Kee Loh; Hwanjo Yu
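The self-rotation distance idea from the Multimedia Systems (2015) abstract on boundary image matching can be sketched in a few lines. Assuming Euclidean distance D and cyclic rotation of the boundary time-series, the triangular inequality gives D(q, rot_r(s)) ≥ |D(q, s) − D(s, rot_r(s))|, where D(s, rot_r(s)) depends only on the data series and can be precomputed. Any rotation whose bound cannot beat the best distance found so far is skipped. This is our own Python illustration, not the authors' algorithm; the k-self rotation distance generalization is not shown, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rotate(s: Sequence[float], r: int) -> List[float]:
    """Cyclic rotation of a boundary time-series by r positions."""
    return list(s[r:]) + list(s[:r])


def self_rotation_distances(s: Sequence[float]) -> List[float]:
    """srd[r] = D(s, rotate(s, r)); computed once per data series."""
    return [euclidean(s, rotate(s, r)) for r in range(len(s))]


def rid_brute_force(q: Sequence[float], s: Sequence[float]) -> float:
    """Rotation-invariant distance: minimum over all rotations of s."""
    return min(euclidean(q, rotate(s, r)) for r in range(len(s)))


def rid_with_pruning(q: Sequence[float], s: Sequence[float],
                     srd: Sequence[float]) -> Tuple[float, int]:
    """Same minimum, skipping rotations whose triangular-inequality lower
    bound |D(q, s) - srd[r]| already reaches the best distance so far.
    Returns (distance, number of full distance computations)."""
    d0 = euclidean(q, s)  # rotation 0 is always computed
    best, computed = d0, 1
    for r in range(1, len(s)):
        if abs(d0 - srd[r]) >= best:
            continue  # cannot improve on best: prune this rotation
        best = min(best, euclidean(q, rotate(s, r)))
        computed += 1
    return best, computed
```

The pruned search returns the same rotation-invariant distance as brute force; only the number of full distance computations changes, and how many are saved depends on the data.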