Yang-Suk Kee | Researchain

Publication

Featured research published by Yang-Suk Kee.

international conference on management of data | 2014

Durable write cache in flash memory SSD for relational and NoSQL databases

Woon-Hak Kang; Sang-Won Lee; Bongki Moon; Yang-Suk Kee; Moon-Wook Oh

To meet stringent requirements for low latency and high throughput, web service providers with large data centers have been replacing magnetic disk drives with flash memory solid-state drives (SSDs). They commonly use relational and NoSQL database engines to manage OLTP workloads in warehouse-scale computing environments. These modern database engines rely heavily on redundant writes and frequent cache flushes to guarantee the atomicity and durability of transactional updates. This has become a serious performance bottleneck in both relational and NoSQL database engines. This paper presents a new SSD prototype called DuraSSD equipped with tantalum capacitors. The tantalum capacitors make the device cache inside DuraSSD durable, and additional firmware features of DuraSSD take advantage of the durable cache to support the atomicity and durability of page writes. It is the first time that a flash memory SSD with durable cache has been used to achieve an order of magnitude improvement in transaction throughput without compromising atomicity and durability. Considering that the simple capacitors increase the total cost of an SSD by no more than one percent, DuraSSD clearly provides a cost-effective means for transactional support. DuraSSD is also expected to alleviate the problem of high tail latency by minimizing write stalls.

The Journal of Supercomputing | 2015

Optimizing the Hadoop MapReduce Framework with high-performance storage devices

Sangwhan Moon; Jaehwan Lee; Xiling Sun; Yang-Suk Kee

Solid-state drives (SSDs) are an attractive alternative to hard disk drives (HDDs) to accelerate the Hadoop MapReduce Framework. However, the SSD characteristics and today’s Hadoop framework exhibit mismatches that impede indiscriminate SSD integration. This paper explores how to optimize a Hadoop MapReduce Framework with SSDs in terms of performance, cost, and energy consumption. It identifies extensible best practices that can exploit SSD benefits within Hadoop when combined with high network bandwidth and increased parallel storage access. Our Terasort benchmark results demonstrate that Hadoop currently does not sufficiently exploit SSD throughput. Hence, using faster SSDs in Hadoop does not enhance its performance. We show that SSDs presently deliver significant efficiency when storing intermediate Hadoop data, leaving HDDs for Hadoop Distributed File System (HDFS). The proposed configuration is optimized with the JVM reuse option and frequent heartbeat interval option. Moreover, we examined the performance of a state-of-the-art non-volatile memory express interface SSD within the Hadoop MapReduce Framework. While HDFS read and write throughput increases with high-performance SSDs, achieving complete system performance improvement requires carefully balancing CPU, network, and storage resource capabilities at a system level.

international conference on management of data | 2016

SHARE Interface in Flash Storage for Relational and NoSQL Databases

Gihwan Oh; Chiyoung Seo; Ravi Mayuram; Yang-Suk Kee; Sang-Won Lee

Database consistency and recoverability require guaranteeing write atomicity for one or more pages. However, contemporary database systems consider write operations non-atomic. Thus, many database storage engines have traditionally relied on either journaling or copy-on-write approaches for atomic propagation of updated pages to the storage. This reliance achieves write atomicity at the cost of various write amplifications such as redundant writes, tree-wandering, and compaction. This write amplification results in reduced performance and, for flash storage, accelerates device wear-out. In this paper, we propose a flash storage interface, SHARE. Being able to explicitly remap the address mapping inside flash storage using the SHARE interface enables host-side database storage engines to achieve write atomicity without causing write amplification. We have implemented SHARE on a real SSD board, OpenSSD, and modified the MySQL/InnoDB and Couchbase NoSQL storage engines to make them compatible with the extended SHARE interface. Our experimental results show that these SHARE-based MySQL/InnoDB and Couchbase configurations can significantly boost database performance. In particular, the inevitable and costly Couchbase compaction process can complete without copying any data pages.
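
The remapping idea in this abstract can be illustrated with a toy model. The sketch below is not the SHARE interface from the paper: `ToyFlashDevice`, `share()`, `atomic_update()`, and the page numbers are hypothetical names chosen for illustration, and the model assumes a simple logical-to-physical mapping table with out-of-place writes. It only shows why remapping an address, instead of writing a page twice, avoids a redundant write.

```python
class ToyFlashDevice:
    """Toy flash translation layer; share() is a hypothetical remap command."""

    def __init__(self):
        self.mapping = {}       # logical page number -> physical page index
        self.flash = []         # append-only physical pages (out-of-place writes)
        self.pages_written = 0  # counts physical page writes

    def write(self, lpn, data):
        # Flash does not overwrite in place: append a page and repoint the LPN.
        self.flash.append(data)
        self.mapping[lpn] = len(self.flash) - 1
        self.pages_written += 1

    def read(self, lpn):
        return self.flash[self.mapping[lpn]]

    def share(self, dst_lpn, src_lpn):
        # Assumed semantics: dst now refers to src's physical page; no data copied.
        self.mapping[dst_lpn] = self.mapping[src_lpn]


def atomic_update(dev, home_lpn, scratch_lpn, new_data):
    """Update a page with one physical write plus one remap (illustrative only)."""
    dev.write(scratch_lpn, new_data)  # new version goes to a scratch address
    dev.share(home_lpn, scratch_lpn)  # home address switches to the new version


dev = ToyFlashDevice()
dev.write(10, "v1")
atomic_update(dev, home_lpn=10, scratch_lpn=99, new_data="v2")
print(dev.read(10), dev.pages_written)
```

In this toy model the update costs one physical page write, whereas a journaling scheme would write the new version once to the journal and once to its home location. Whether the remap is atomic on a real device depends on the firmware, which this sketch does not model.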

IEEE Transactions on Computers | 2016

In-Storage Computing for Hadoop MapReduce Framework: Challenges and Possibilities

Dong-Chul Park; Jianguo Wang; Yang-Suk Kee

Solid State Drives (SSDs) were initially developed as faster storage devices intended to replace conventional magnetic Hard Disk Drives (HDDs). However, high computational capabilities enable SSDs to be computing nodes, not just faster storage devices. Such capability is generally called “In-Storage Computing (ISC)”. Today’s Hadoop MapReduce framework has become a de facto standard for big data processing. This paper explores In-Storage Computing challenges and opportunities for the Hadoop MapReduce framework. For this, we integrate a Hadoop MapReduce system with ISC SSD devices that implement the Hadoop Mapper inside real SSD firmware. This offloads Map tasks from the host MapReduce system to the ISC SSDs. We additionally optimize the host Hadoop system to make the best use of our proposed ISC Hadoop system. Experimental results demonstrate that our ISC Hadoop MapReduce system achieves a remarkable performance gain (2.3× faster) as well as significant energy savings (11.5× lower) compared to a typical Hadoop MapReduce system. Further, the experiment suggests such ISC-augmented systems can provide a very promising computing model in terms of system scalability.

international conference on big data | 2015

Early experience with optimizing I/O performance using high-performance SSDs for in-memory cluster computing

I. Stephen Choi; Weiqing Yang; Yang-Suk Kee

This paper describes our experience with storage optimization that utilizes cost-effective PCIe solid-state drives (SSDs) to improve the overall performance of a Spark framework. A key problem we address is the limited memory system performance. In particular, we adopt high-performance SSDs to alleviate the saturated DRAM bandwidth and its limited capacity. We utilize SSDs to store shuffle data and persisted RDDs. As a result, the overall performance improves due to the larger capacity and increased bandwidth provided by SSDs, which also alleviate memory contention. Our experiments show that we can improve the performance of data-intensive applications by 23.1% on average, compared to the performance of the memory-only approach. To our knowledge, this is the first work to demonstrate performance optimizations using PCIe SSDs on Spark.

Proceedings of the 2015 International Symposium on Memory Systems | 2015

Energy Efficient Scale-In Clusters with In-Storage Processing for Big-Data Analytics

I. Stephen Choi; Yang-Suk Kee

Big data drives a computing paradigm shift. Due to enormous data volumes, data-intensive programming frameworks are pervasive and scale-out clusters are widespread. As a result, data-movement energy dominates overall energy consumption, and this will get worse with technology scaling. We propose scale-in clusters with In-Storage Processing (ISP) devices that would enable energy-efficient computing for big-data analytics. ISP devices eliminate or reduce data movement toward CPUs and execute tasks more energy-efficiently. Thus, with energy-efficient computing near data and higher throughput enabled, clusters with ISP can achieve more than quadruple the energy efficiency of similarly performing scale-out clusters while using fewer nodes.

international conference on management of data | 2013