Nae Young Song | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Nae Young Song is active.

Explore More

Publication

Featured researches published by Nae Young Song.

ACM Transactions on Computer Systems | 2014

Optimizing the Block I/O Subsystem for Fast Storage Devices

Young Jin Yu; Dong In Shin; Woong Shin; Nae Young Song; Jae Woo Choi; Hyeong Seog Kim; Hyeonsang Eom; Heon Young Yeom

Fast storage devices are an emerging solution to satisfy data-intensive applications. They provide high transaction rates for DBMS, low response times for Web servers, instant on-demand paging for applications with large memory footprints, and many similar advantages for performance-hungry applications. In spite of the benefits promised by fast hardware, modern operating systems are not yet structured to take advantage of the hardware’s full potential. The software overhead caused by an OS, negligible in the past, adversely impacts application performance, lessening the advantage of using such hardware. Our analysis demonstrates that the overheads from the traditional storage-stack design are significant and cannot easily be overcome without modifying the hardware interface and adding new capabilities to the operating system. In this article, we propose six optimizations that enable an OS to fully exploit the performance characteristics of fast storage devices. With the support of new hardware interfaces, our optimizations minimize per-request latency by streamlining the I/O path and amortize per-request latency by maximizing parallelism inside the device. We demonstrate the impact on application performance through well-known storage benchmarks run against a Linux kernel with a customized SSD. We find that eliminating context switches in the I/O path decreases the software overhead of an I/O request from 20 microseconds to 5 microseconds and a new request merge scheme called Temporal Merge enables the OS to achieve 87% to 100% of peak device performance, regardless of request access patterns or types. Although the performance improvement by these optimizations on a standard SATA-based SSD is marginal (because of its limited interface and relatively high response times), our sensitivity analysis suggests that future SSDs with lower response times will benefit from these changes. The effectiveness of our optimizations encourages discussion between the OS community and storage vendors about future device interfaces for fast storage devices.

ACM Transactions on Storage | 2016

Efficient Memory-Mapped I/O on Fast Storage Device

Nae Young Song; Yongseok Son; Hyuck Han; Heon Young Yeom

In modern operating systems, memory-mapped I/O (mmio) is an important access method that maps a file or file-like resource to a region of memory. The mapping allows applications to access data from files through memory semantics (i.e., load/store) and it provides ease of programming. The number of applications that use mmio are increasing because memory semantics can provide better performance than file semantics (i.e., read/write). As more data are located in the main memory, the performance of applications can be enhanced owing to the effect of a large cache. When mmio is used, hot data tend to reside in the main memory and cold data are located in storage devices such as HDD and SSD; data placement in the memory hierarchy depends on the virtual memory subsystem of the operating system. Generally, the performance of storage devices has a direct impact on the performance of mmio. It is widely expected that better storage devices will lead to better performance. However, the expectation is limited when fast storage devices are used since the virtual memory subsystem does not reflect the performance feature of those devices. In this article, we examine the Linux virtual memory subsystem and mmio path to determine the influence of fast storage on the existing Linux kernel. Throughout our investigation, we find that the overhead of the Linux virtual memory subsystem, negligible on the HDD, prevents applications from using the full performance of fast storage devices. To reduce the overheads and fully exploit the fast storage devices, we present several optimization techniques. We modify the Linux kernel to implement our optimization techniques and evaluate our prototyped system with low-latency storage devices. Experimental results show that our optimized mmio has up to 7x better performance than the original mmio. We also compare our system to a system that has enough memory to keep all data in the main memory. The system with insufficient memory and our mmio achieves 92% performance of the resource-rich system. This result implies that our virtual memory subsystem for mmap can effectively extend the main memory with fast storage devices.

Cluster Computing | 2015

Design and evaluation of a user-level file system for fast storage devices

Yongseok Son; Nae Young Song; Hyuck Han; Hyeonsang Eom; Heon Young Yeom

Lately, fast storage devices are rapidly increasing in social network services, cloud platforms, etc. Unfortunately, the traditional Linux I/O stack is designed to maximize performance on disk-based storage. Emerging byte-addressable and low-latency non-volatile memory technologies (e.g., phase-change memories, MRAMs, and the memristor) provide very different characteristics, so the disk-based I/O stack cannot lead to high performance. This paper presents a high performance I/O stack for the fast storage devices. Our scheme is to remove the concept of block and to simplify the whole I/O path and software stack, which results in only two layers that are the byte-capable interface and the byte-aware file system called BAFS. We aim to minimize I/O latency and maximize bandwidth by eliminating the unnecessary layers and supporting byte-addressable I/O without requiring changes to applications. We have implemented a prototype and evaluated its performance with multiple benchmarks. The experimental results show that our I/O stack achieves 6.2 times on average and up to 17.5 times performance gains compared to the existing Linux I/O stack.

2014 International Conference on Cloud and Autonomic Computing | 2014

A User-Level File System for Fast Storage Devices

Yongseok Son; Nae Young Song; Hyuck Han; Hyeonsang Eom; Heon Young Yeom

Lately, fast storage devices are rapidly increasing in social network services, cloud platforms, etc. Unfortunately, the traditional Linux I/O stack is designed to maximize performance on disk-based storage. Emerging byte-addressable and low-latency non-volatile memory (NVM) technologies (e.g., Phase-change memories, spin-transfer torque MRAMs, and the memristor) provides very different characteristics, so the disk-based I/O stack cannot lead to high performance. This paper presents a high performance I/O stack for the fast storage devices. Our scheme is to remove the concept of block and to simplify the whole I/O path and software stack, which results in only two layers that are composed of the byte-capable interface and the byte-aware file system called BAFS. We aim to minimize I/O latency and maximize bandwidth by eliminating the unnecessary layers and supporting byte-addressable I/O without requiring changes to applications. We have implemented a prototype and evaluated its performance with multiple benchmarks. The experimental results show that our I/O stack achieves 6.2 times on average and up to 17.5 times performance gains compared to existing Linux I/O stack.

ieee international conference on high performance computing data and analytics | 2012

Low-latency Memory-Mapped I/O for Data-Intensive Applications on Fast Storage Devices

Nae Young Song; Young Jin Yu; Woong Shin; Hyeonsang Eom; Heon Young Yeom

Thesedays, along with read()/write(), mmap() is used to file I/O in data-intensive applications as an alternative method of I/O on emerging low-latency device such as flash-based SSD. Although utilizing memory-mapped file I/O have many advantages, it does not produce much benefit when combined with large-scale data and fast storage devices. When the working set of an application accessing file with mmap() is larger than the size of physical memory, the I/O performance is severely degraded compared to the application with read/write(). This is mainly due to the virtual memory subsystem that does not reflect the performance feature of the underlying storage device. In this paper, we examined linux virtual memory subsystem and mmap() I/O path to figure out the influence of low-latency storage devices on the existing virtual memory subsystem. Also, we suggest some optimization policies to reduce the overheads of mmap() I/O and implement the prototype in a recent Linux kernel. Our solution guarantees that 1) memory-mapped I/O will be several times faster than read-write I/O when cache-hit ratio becomes high, and 2) the former will show at least the performance of the latter even when cache-miss frequently occurs and the overhead of mapping/unmapping pages becomes significant, which are not achievable by the existing virtual memory subsystem.

Cluster Computing | 2018

Optimizing of metadata management in large-scale file systems

Nae Young Song; Hwajung Kim; Hyuck Han; Heon Young Yeom

As modern computer systems face the challenge of large data, filesystems have to deal with a large number of files. This leads to amplified concerns of metadata operations as well as data operations. Most filesystems manage metadata of files by constructing in-memory data structures, such as directory entry (dentry) and inode. We found inefficiencies on management of metadata in existing filesystems, such as path traversal mechanism. In this article, we optimize the metadata operations by (1) looking up dentry cache (dcache) hash table in backward manner. To adopt the backward finding mechanism, we devise the rename and permission-granted mechanism. We also propose (2) compacting the metadata into dentry structures for in-memory space efficiency. We evaluate our optimized metadata managing mechanisms with several benchmarks, including a real-world workload. These optimizations significantly reduce dcache lookup latency by up to 40% and improve overall throughput by up to 72% in a real-world benchmark.

Cluster Computing | 2017

A low-latency storage stack for fast storage devices

Yongseok Son; Nae Young Song; Heon Young Yeom; Hyuck Han

Modern storage systems are facing an important challenge of making the best use of fast storage devices. Even though the underlying storage devices are being enhanced, the traditional storage stack falls short of utilizing the enhanced characteristics, as it has been optimized specifically for hard disk drives. In this article, we optimize the storage stack to maximize the benefit of low latency that fast storage devices provide. Our approach is to simplify the I/O path from application to the fast storage device by removing inefficient layers and the conventional block I/O. The proposed stack consists of three layers: an optimized device driver, a low-latency file system called L2FS, and a simplified VFS. The device driver provides a simple file I/O API to the file system instead of the existing block I/O API. L2FS, a variant of EXT4, performs low-latency I/O operations by using the file I/O API that our optimized device driver provides. We implement our storage stack on Linux 3.14.3 and evaluate it with multiple benchmarks. The results show that our system improves the throughput by up to 6.6 times and reduces the latency by an average of 54% compared to the existing storage stack on fast storage.

usenix conference on hot topics in storage and file systems | 2012