Duck-Ho Bae | Researchain

Publication

Featured research published by Duck-Ho Bae.

International Symposium on Computer Architecture | 2016

Biscuit: a framework for near-data processing of big data workloads

Bon-Cheol Gu; Andre S. Yoon; Duck-Ho Bae; Insoon Jo; Jin-Young Lee; Jonghyun Yoon; Jeong-Uk Kang; Moon-sang Kwon; Chanho Yoon; Sangyeun Cho; Jaeheon Jeong; Duckhyun Chang

Data-intensive queries are common in business intelligence, data warehousing and analytics applications. Typically, processing a query involves full inspection of large in-storage data sets by CPUs. An intuitive way to speed up such queries is to reduce the volume of data transferred over the storage network to a host system. This can be achieved by filtering out extraneous data within the storage, motivating a form of near-data processing. This work presents Biscuit, a novel near-data processing framework designed for modern solid-state drives. It allows programmers to write a data-intensive application to run on the host system and the storage system in a distributed, yet seamless manner. In order to offer a high-level programming model, Biscuit builds on the concept of data flow. Data processing tasks communicate through typed and data-ordered ports. Biscuit does not distinguish tasks that run on the host system from those that run on the storage system. As a result, Biscuit has desirable traits like generality and expressiveness, while promoting code reuse and naturally exposing concurrency. We implement Biscuit on a host system that runs the Linux OS and a high-performance solid-state drive. We demonstrate the effectiveness of our approach and implementation with experimental results. When data filtering is done by hardware in the solid-state drive, the average speed-up obtained for the top five queries of TPC-H is over 15x.
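The data-flow idea in this abstract can be illustrated with a minimal Python sketch. It is not Biscuit's API: the `Row` type and the `scan`, `filter_task` and `total_quantity` functions are hypothetical names chosen for illustration, and Python generators stand in for the typed, ordered ports. The point is only that a filter placed next to the data forwards fewer rows to the consumer.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    """Hypothetical record type carried between tasks."""
    order_id: int
    quantity: int


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source task: emits stored rows in order (stands in for the storage side)."""
    yield from rows


def filter_task(inp: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter task: forwards only the rows that satisfy the predicate."""
    return (row for row in inp if predicate(row))


def total_quantity(inp: Iterator[Row]) -> int:
    """Sink task: aggregates whatever reaches it (stands in for the host side)."""
    return sum(row.quantity for row in inp)


stored = [Row(1, 5), Row(2, 40), Row(3, 12), Row(4, 60)]

# Only rows passing the filter would need to cross the storage interface.
filtered = list(filter_task(scan(stored), lambda r: r.quantity > 30))
result = total_quantity(iter(filtered))
```

In this toy run two of the four stored rows reach the sink, so the consumer sees half of the input while computing the same aggregate.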

Conference on Information and Knowledge Management | 2013

Intelligent SSD: a turbo for big data mining

Duck-Ho Bae; Jinhyung Kim; Sang-Wook Kim; Hyunok Oh; Chanik Park

This paper introduces the notion of intelligent SSDs. First, we present the design considerations of intelligent SSDs, and then examine their potential benefits under various settings in data mining applications.

Very Large Data Bases | 2016

YourSQL: a high-performance database system leveraging in-storage computing

Insoon Jo; Duck-Ho Bae; Andre S. Yoon; Jeong-Uk Kang; Sangyeun Cho; Daniel D. G. Lee; Jaeheon Jeong

This paper presents YourSQL, a database system that accelerates data-intensive queries with the help of additional in-storage computing capabilities. YourSQL realizes very early filtering of data by offloading data scanning of a query to user-programmable solid-state drives. We implement our system on a recent branch of MariaDB (a variant of MySQL). In order to quantify the performance gains of YourSQL, we evaluate SQL queries of varying complexity. Our results show that YourSQL reduces the execution time of the entire set of TPC-H queries by 3.6×, compared to a vanilla system. Moreover, the average speed-up of the five TPC-H queries with the largest performance gains reaches over 15×. Thanks to this significant reduction of execution time, we observe sizable energy savings. Our study demonstrates that the YourSQL approach, combining the power of early filtering with end-to-end datapath optimization, can accelerate large-scale analytic queries with lower energy consumption.

IEEE Transactions on Systems, Man, and Cybernetics | 2014

On Constructing Seminal Paper Genealogy

Duck-Ho Bae; Se-Mi Hwang; Sang-Wook Kim; Christos Faloutsos

Let us consider someone who is starting research on a topic that is unfamiliar to them. Which seminal papers have influenced the topic the most? What is the genealogy of the seminal papers in this topic? These are the questions that they can raise, which we try to answer in this paper. First, we propose an algorithm that finds a set of seminal papers on a given topic. We also address the performance and scalability issues of this sophisticated algorithm. Next, we discuss the measures to decide how much a paper is influenced by another paper. Then, we propose an algorithm that constructs a genealogy of the seminal papers by using the influence measure and citation information. Finally, through extensive experiments with a large volume of real-world academic literature data, we show the effectiveness and efficiency of our approach.
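A small Python sketch can make the genealogy step more concrete. The abstract does not define the influence measure or the linking rule, so both are assumptions here: the `influence` scores are made-up placeholder values, and the hypothetical `build_genealogy` function simply links each seminal paper to the cited seminal paper that influenced it most.

```python
from typing import Dict, Optional, Set, Tuple


def build_genealogy(
    seminal: Set[str],
    citations: Dict[str, Set[str]],
    influence: Dict[Tuple[str, str], float],
) -> Dict[str, Optional[str]]:
    """Map each seminal paper to its most influential cited seminal paper.

    Assumed rule (not taken from the paper): a paper's parent is the cited
    seminal paper with the highest influence score, or None if it cites none.
    """
    parents: Dict[str, Optional[str]] = {}
    for paper in sorted(seminal):
        # Keep only citations that point at other seminal papers.
        candidates = sorted(c for c in citations.get(paper, set()) if c in seminal)
        parents[paper] = max(
            candidates,
            key=lambda cited: influence.get((paper, cited), 0.0),
            default=None,
        )
    return parents


seminal_papers = {"A", "B", "C"}
cites = {"B": {"A"}, "C": {"A", "B", "X"}}  # "X" is not seminal
scores = {("B", "A"): 0.9, ("C", "A"): 0.2, ("C", "B"): 0.7}  # placeholder values

genealogy = build_genealogy(seminal_papers, cites, scores)
```

With these placeholder scores, paper A has no parent, B descends from A, and C descends from B, giving a single chain A, B, C.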

Conference on Information and Knowledge Management | 2011

Constructing seminal paper genealogy

Duck-Ho Bae; Se-Mi Hwang; Sang-Wook Kim; Christos Faloutsos

When a researcher starts on a new topic, it would be very useful if the seminal papers on the topic and their relationships were provided in advance. We propose an approach to construct a seminal paper genealogy and show the effectiveness and efficiency of our approach.

Database and Expert Systems Applications | 2009

Clustering and Non-clustering Effects in Flash Memory Databases

Duck-Ho Bae; Ji-Woong Chang; Sang-Wook Kim

Flash memory has unique characteristics: the write operation is much more costly than the read operation, and in-place updating is not allowed. In this paper, we analyze how these characteristics affect the performance of clustering and non-clustering in record management, and show that non-clustering is more suitable in a flash memory environment, which does not hold in a disk environment. We also discuss the problems of the existing non-clustering method, and identify design factors to be considered for record management methods in a flash memory environment.

IEEE International Conference on Network Infrastructure and Digital Content | 2009

SD-Miner: A spatial data mining system

Duck-Ho Bae; Ji-Haeng Baek; Hyun-Kyo Oh; Ju-Won Song; Sang-Wook Kim

Owing to GIS technology, a vast volume of spatial data has been accumulated, creating the need for spatial data mining techniques. In this paper, we propose a new spatial data mining system named SD-Miner. SD-Miner consists of three parts: a graphical user interface for inputs and outputs, a data mining module that processes spatial data mining functionalities, and a data storage model that stores and manages spatial as well as non-spatial data by using a DBMS. In particular, the data mining module provides major spatial data mining functionalities such as spatial clustering, spatial classification, spatial characterization, and spatio-temporal association rule mining. SD-Miner has its own characteristics: (1) it lets users perform non-spatial data mining functionalities as well as spatial data mining functionalities intuitively and effectively; (2) it provides users with spatial data mining functions in the form of libraries, so that applications can use those functions conveniently; (3) it takes mining parameters in the form of database tables to increase flexibility.

Conference on Information and Knowledge Management | 2015

Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis

Yong-Yeon Jo; Sang-Wook Kim; Duck-Ho Bae

As a number of social network services have recently appeared online, there have been many attempts to analyze social networks to extract valuable information. Most existing methods first represent a social network as a very sparse adjacency matrix, and then analyze it through matrix operations such as matrix multiplication. Due to the large scale and high complexity, efficiently processing multiplications is an important issue in social network analysis. In this paper, we propose a GPU-based method for efficient sparse matrix multiplication through the parallel computing paradigm. The proposed method aims at balancing the workload at both fine- and coarse-grained levels to maximize the degree of parallelism on the GPU. Through extensive experiments using synthetic and real-world datasets, we show that the proposed method outperforms previous methods by up to three orders of magnitude.
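To show what multiplying a sparse adjacency matrix involves, here is a minimal single-threaded Python sketch. It is not the GPU method proposed in the paper: it is a plain row-by-row sparse product, and the `SparseMatrix` alias, the `spgemm` function and the example graph are all hypothetical. Squaring the adjacency matrix of a directed graph counts the two-hop paths between nodes.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical sparse layout: row index -> {column index -> nonzero value}
SparseMatrix = Dict[int, Dict[int, int]]


def spgemm(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Row-by-row sparse product that touches only nonzero entries."""
    c: SparseMatrix = {}
    for i, a_row in a.items():
        acc: Dict[int, int] = defaultdict(int)
        for k, a_ik in a_row.items():
            # Row k of b contributes to row i of the result.
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            c[i] = dict(acc)
    return c


# Directed edges: 0->1, 0->2, 1->2, 2->3
adjacency: SparseMatrix = {0: {1: 1, 2: 1}, 1: {2: 1}, 2: {3: 1}}

# Entry (i, j) of the square counts the two-hop paths from i to j.
two_hop = spgemm(adjacency, adjacency)
```

Rows with many nonzeros require more inner-loop work than sparse rows, which hints at why a parallel implementation has to balance work across rows rather than assign one row per worker blindly.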

ACM Symposium on Applied Computing | 2015

On running data-intensive algorithms with intelligent SSD and host CPU: a collaborative approach

Yong-Yeon Jo; SungWoo Cho; Sang-Wook Kim; Duck-Ho Bae; Hyunok Oh

A solid-state drive (SSD), with characteristics such as high I/O bandwidth and low access latency, is drawing attention as a next-generation storage device. Even though an SSD provides high internal bandwidth, the relatively low bandwidth of the host interface remains a performance bottleneck. To overcome this bottleneck, the notion of an intelligent SSD (iSSD) has been proposed. However, an iSSD still has problems processing algorithms of high complexity. In this paper, we address an effective collaboration of the iSSD and the host CPU in order to maximize the performance of data-intensive algorithms. Extensive experimental results show that our approach runs up to 2.43 times faster than a previous approach.

Conference on Information and Knowledge Management | 2012

Outlier detection using centrality and center-proximity

Duck-Ho Bae; Seo Jeong; Sang-Wook Kim; Minsoo Lee

An outlier is an object that is considerably dissimilar to the remainder of the dataset. In this paper, we first propose the notions of centrality and center-proximity as novel outlierness measures, which can be considered to represent the characteristics of all of the objects in the dataset. We then propose a graph-based outlier detection method which can solve the problems of local density, micro-cluster, and fringe objects. Finally, through extensive experiments, we show the effectiveness of the proposed method.
