Ingyu Lee | Researchain

Publication

Featured research published by Ingyu Lee.

Knowledge and Information Systems | 2012

Scalable clustering methods for the name disambiguation problem

Byung-Won On; Ingyu Lee; Dongwon Lee

When non-unique values are used as the identifiers of entities, homonyms can cause confusion. In particular, when (parts of) the “names” of entities are used as their identifiers, the problem is often referred to as the name disambiguation problem, where the goal is to sort out the entities that are confused because of name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish “Masao Obama” from “Norio Obama”). In this paper, we study the scalability issue of the name disambiguation problem, which arises when (1) a small number of entities with large contents or (2) a large number of entities become indistinguishable due to homonyms. First, we carefully examine two state-of-the-art solutions to the name disambiguation problem and point out their limitations with respect to scalability. Then, we propose two scalable graph partitioning algorithms, multi-level graph partitioning and multi-level graph partitioning and merging, to solve the large-scale name disambiguation problem. Our claim is empirically validated via experimentation: our proposal shows orders of magnitude improvement in terms of performance while maintaining equivalent or reasonable accuracy compared to competing solutions.

Microprocessors and Microsystems | 2012

A hybrid SSD with PRAM and NAND Flash memory

Gyu Sang Choi; Ingyu Lee; Mankyu Sung; Choongjae Im

The speed of computing processors has improved dramatically with multi-core architectures. However, overall computer system performance has improved slowly because of the sluggish speed of storage systems. Several studies have sought to improve the performance of storage systems by introducing Solid-State Disk technology with NAND Flash memory. In this paper, we propose a new hybrid Solid-State Disk (SSD) architecture that combines Phase-change Memory (PRAM) and NAND Flash memory to achieve high performance. Our experimental results show that, in write-intensive workloads, the proposed scheme achieves up to 140% performance improvement without endurance problems in PRAM, compared to an SSD with only NAND Flash memory.

Artificial Intelligence Review | 2011

An effective web document clustering algorithm based on bisection and merge

Ingyu Lee; Byung-Won On

To cluster web documents, all of which have the same name entities, we attempted to use existing clustering algorithms such as K-means and spectral clustering. Unexpectedly, it turned out that these algorithms are not effective at clustering such web documents. Through intensive investigation, we found that clustering these web pages is more complicated because (1) the number of clusters (known from the ground truth) is larger than the two or three clusters typical of general clustering problems and (2) the clusters in the data set have extremely skewed size distributions. To overcome these problems, we propose an effective clustering algorithm that boosts the accuracy of the K-means and spectral clustering algorithms. In particular, to deal with skewed distributions of cluster sizes, our algorithm performs both bisection and merge steps based on normalized cuts of the similarity graph G to correctly cluster web documents. Our experimental results show that our algorithm improves performance by approximately 56% compared to spectral bisection and 36% compared to K-means.
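The normalized cut criterion mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it only evaluates the standard normalized cut value, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), for one candidate bisection of a small similarity graph. The function name `normalized_cut` and the toy weight matrix are assumptions made for illustration.

```python
def normalized_cut(weights, part_a):
    """Return Ncut(A, B) for a symmetric weight matrix and a node subset A.

    weights: square list of lists, weights[i][j] = similarity of nodes i and j.
    part_a:  iterable of node indices forming side A; all other nodes form B.
    """
    in_a = set(part_a)
    cut = 0.0      # total weight of edges going from A to B
    assoc_a = 0.0  # total weight from nodes in A to all nodes
    assoc_b = 0.0  # total weight from nodes in B to all nodes
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if i in in_a:
                assoc_a += w
                if j not in in_a:
                    cut += w
            else:
                assoc_b += w
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("each side of the bisection needs some edge weight")
    return cut / assoc_a + cut / assoc_b


# Toy similarity graph (hypothetical): two tight pairs joined by one weak edge.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
balanced = normalized_cut(W, [0, 1])  # cuts only the weak edge
lopsided = normalized_cut(W, [0])     # cuts a strong edge
```

In this toy graph the split along the weak edge yields the smaller value, which is the behaviour a bisection step driven by normalized cuts would prefer. How the paper chooses candidate splits and decides when to merge is not stated in the abstract and is not modelled here.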

International Conference on Digital Information Management | 2014

A big data management system for energy consumption prediction models

Wonjin Lee; Byung-Won On; Ingyu Lee; Jungin Choi

In this work, we develop a prototype of a big data management system for storing, indexing, and searching huge-scale energy usage data. Unlike existing commercial relational databases such as Oracle and IBM DB2, this system provides high availability and performance at low cost. It can also manage unstructured data and store big data in a distributed environment. In addition, target data can be quickly retrieved from the proposed system through data access APIs. To utilize our prototype system, we also propose an energy consumption prediction model based on penalized linear regression-based map/reduce algorithms. We then exploit discriminative features with respect to the time stamp. Finally, given a time stamp (e.g., 2014-01-05 12:01:08), our proposed learning model gives a predicted value of the energy usage (e.g., 90 watts) at that time. According to experimental results obtained from about 7.5 million records, each of which consists of an energy usage value and a time stamp collected during three months in 2014, our prediction model predicts values that are very close to the actual energy usage at that time, and is about 1.72 times faster than on a single machine.

Journal of Information Science | 2015

LDA topics

Muhammad Omar; Byung-Won On; Ingyu Lee; Gyu Sang Choi

In recent years, many automated topic coherence formulas (using the top-m words of a topic inferred by latent Dirichlet allocation) based on word similarities have been proposed and evaluated against human ratings. We treat a wordy topic as an object and quantitatively describe it via normalized mean values of pair-wise word similarities. Two types of word similarities, thesaurus-based and local corpus-based, are used as the descriptive features of a topic. We perform topic classification using the represented topics as input and bi-level human ratings of topic coherence as class labels. Classification results (precision, recall and accuracy) based on two datasets and three supervised classification algorithms suggest that the novel topic representation is consistent with human ratings. Corpus-based word similarities are positively correlated with human ratings, whereas thesaurus-based similarities are negatively related. The proposed representation of topics opens a window for us to investigate the utilization of topics from different perspectives.
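As a rough illustration of describing a topic by the mean of its pair-wise word similarities, the sketch below averages a similarity score over all unordered pairs of a topic's top words. The abstract does not specify the normalization or the similarity measures, so the function `mean_pairwise_similarity`, the lookup table `toy_similarity`, and its scores are hypothetical; the only assumption is that similarities already lie in [0, 1].

```python
from itertools import combinations


def mean_pairwise_similarity(top_words, similarity):
    """Average similarity(w1, w2) over all unordered pairs of top words."""
    pairs = list(combinations(top_words, 2))
    if not pairs:
        raise ValueError("need at least two words to form a pair")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical similarity scores in [0, 1]; a real feature would come from a
# thesaurus or from co-occurrence statistics of a corpus.
toy_scores = {
    frozenset(("disk", "flash")): 0.9,
    frozenset(("disk", "memory")): 0.6,
    frozenset(("flash", "memory")): 0.3,
}


def toy_similarity(a, b):
    """Look up a symmetric score; unknown pairs count as 0.0."""
    return toy_scores.get(frozenset((a, b)), 0.0)


topic_feature = mean_pairwise_similarity(["disk", "flash", "memory"], toy_similarity)
```

One such value per similarity type could then serve as an input feature for a classifier trained on human coherence labels, which is the general setup the abstract describes.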

Symmetry | 2017

Prognosis Essay Scoring and Article Relevancy Using Multi-Text Features and Machine Learning

Arif Mehmood; Byung-Won On; Ingyu Lee; Gyu Sang Choi

This study develops a model for essay scoring and article relevancy. Essay scoring is a costly process when we consider the time spent by an evaluator. It may also lead to inconsistencies in how different evaluators apply the same evaluation criteria. Bibliometric research uses evaluation criteria to determine the relevancy of articles instead. Researchers often face relevancy issues while searching for articles and therefore classify the articles manually. However, manual classification is burdensome due to the time needed for evaluation. The proposed model performs automatic essay evaluation using multi-text features and ensemble machine learning. The proposed method is applied to two data sets: a Kaggle short answer data set for essay scoring that covers four disciplines (Science, Biology, English, and English Language Arts), and a bibliometric data set with IoT (Internet of Things) and non-IoT classes. The efficacy of the model is measured against the Tandalla and AutoP approaches using Cohen’s kappa. The model achieves kappa values of 0.80 and 0.83 for the first and second data sets, respectively. These kappa values show that the proposed model performs better than the earlier approaches.
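For readers unfamiliar with the metric, Cohen's kappa compares observed agreement with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes the unweighted form for two label sequences. Whether the study uses the unweighted or a weighted variant is not stated in the abstract, so treating it as unweighted is an assumption, and the function name `cohens_kappa` and the sample labels are illustrative only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores from a model and a human grader on four answers.
model_scores = [1, 1, 0, 0]
human_scores = [1, 0, 0, 0]
kappa = cohens_kappa(model_scores, human_scores)
```

Here the raters agree on three of four items (p_o = 0.75) while chance agreement is p_e = 0.5, so kappa is 0.5. Values near 1 indicate agreement well beyond chance.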

IEEE Transactions on Knowledge and Data Engineering | 2015

PB+-Tree: PCM-Aware B+-Tree

Gyu Sang Choi; Byung-Won On; Ingyu Lee

Phase change memory (PCM) is non-volatile memory that is byte-addressable. It is two to four times denser than DRAM, orders of magnitude better than NAND Flash memory in read latency, and 10 times better than NAND Flash memory in write endurance. However, it still limits the number of write operations to at most

International Conference on Digital Information Management | 2011