Myoungsoo Jung | Researchain

Publication

Featured research published by Myoungsoo Jung.

measurement and modeling of computer systems | 2013

Revisiting widely held SSD expectations and rethinking system-level implications

Myoungsoo Jung; Mahmut T. Kandemir

Storage applications leveraging Solid State Disk (SSD) technology are being widely deployed in diverse computing systems. These applications accelerate system performance by exploiting several SSD-specific characteristics. However, modern SSDs have undergone a dramatic technology and architecture shift in the past few years, which makes widely held assumptions and expectations regarding them highly questionable. The main goal of this paper is to question popular assumptions and expectations regarding SSDs through an extensive experimental analysis using six state-of-the-art SSDs from different vendors. Our analysis leads to several conclusions that are either not reported in prior SSD literature or contradict current conceptions. For example, we found that SSDs are not biased toward read-intensive workloads in terms of performance and reliability. Specifically, random read performance of SSDs is worse than sequential and random write performance by 40% and 39% on average, and more importantly, the performance of sequential reads gets significantly worse over time. Further, we found that reads can shorten SSD lifetime more than writes, which is very unfortunate, given that many existing systems/platforms already employ SSDs as read caches or in applications that are highly read intensive. We also performed a comprehensive study to understand the worst-case performance characteristics of our SSDs, and investigated the viability of recently proposed enhancements that are geared towards alleviating the worst-case performance challenges, such as TRIM commands and background tasks. Lastly, we uncover the overheads of these enhancements and their limits, and discuss system-level implications.

international symposium on computer architecture | 2012

Physically addressed queueing (PAQ): improving parallelism in solid state disks

Myoungsoo Jung; Ellis Herbert Wilson; Mahmut T. Kandemir

NAND flash storage has proven to be a competitive alternative to traditional disk thanks to its high random-access speeds, low power consumption, and presumed efficacy for random reads. Ironically, we demonstrate that when packaged in SSD format, many barriers arise to reaching full parallelism in reads, resulting in random writes outperforming them. Motivated by this, we propose Physically Addressed Queuing (PAQ), a request scheduler that avoids resource contention resulting from shared SSD resources. PAQ makes the following major contributions: First, it exposes the physical addresses of requests to the scheduler. Second, I/O clumping is utilized to select groups of operations that can be simultaneously executed without major resource conflict. Third, inter-request NAND transaction packing empowers multi-plane-mode operations. We implement PAQ in a cycle-accurate simulator and demonstrate bandwidth and IOPS improvements greater than 62% and latency decreases as much as 41.6% for random reads, without degrading performance of other access types.
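The I/O clumping idea in the abstract above can be illustrated with a minimal sketch. This is not the PAQ implementation from the paper: the `Request` record, the `clump` function, and the rule that two requests conflict only when they target the same (channel, chip) pair are all assumptions made for illustration, and channel-bus contention is ignored.

```python
from collections import namedtuple

# Hypothetical request record; the physical location is assumed to be
# visible to the scheduler, as the abstract describes.
Request = namedtuple("Request", ["req_id", "channel", "chip"])


def clump(queue):
    """Greedily split a queue into groups whose members touch distinct chips."""
    groups = []
    pending = list(queue)
    while pending:
        busy = set()          # (channel, chip) pairs already claimed in this group
        group, rest = [], []
        for req in pending:
            key = (req.channel, req.chip)
            if key in busy:
                rest.append(req)      # conflict: defer to a later group
            else:
                busy.add(key)
                group.append(req)
        groups.append(group)
        pending = rest
    return groups


queue = [
    Request(0, 0, 0),
    Request(1, 0, 0),   # same chip as request 0
    Request(2, 1, 0),
    Request(3, 0, 1),
    Request(4, 1, 0),   # same chip as request 2
]
ids = [[r.req_id for r in g] for g in clump(queue)]
print(ids)
```

In this toy queue, requests 1 and 4 are deferred to a second group because their target chips are already claimed in the first.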

ieee conference on mass storage systems and technologies | 2012

NANDFlashSim: Intrinsic latency variation aware NAND flash memory system modeling and simulation at microarchitecture level

Myoungsoo Jung; Ellis Herbert Wilson; David Donofrio; John Shalf; Mahmut T. Kandemir

As NAND flash memory becomes popular in diverse areas ranging from embedded systems to high performance computing, exposing and understanding flash memory's performance, energy consumption, and reliability becomes increasingly important. Moreover, with an increasing trend towards multiple-die, multiple-plane architectures and high speed interfaces, high performance NAND flash memory systems are expected to continue to scale. This scaling should further reduce costs and thereby widen proliferation of devices based on the technology. However, when designing NAND flash-based devices, making decisions about the optimal system configuration is non-trivial because NAND flash is sensitive to a large number of parameters, and some parameters exhibit significant latency variations. Such parameters include varying architectures such as multi-die and multi-plane, and a host of factors that affect performance, energy consumption, diverse node technology, and reliability. Unfortunately, there are no public domain tools for high-fidelity, microarchitecture-level NAND flash memory simulation to assist with making such decisions. Therefore, we introduce NANDFlashSim, a latency variation-aware, detailed, and highly configurable NAND flash simulation model. NANDFlashSim implements a detailed timing model for operations in sixteen state-of-the-art NAND flash operation mode combinations. In addition, NANDFlashSim models energies and reliability of NAND flash memory based on statistics. From our comprehensive experiments using NANDFlashSim, we found that 1) most read cases were unable to leverage the highly-parallel internal architecture of NAND flash regardless of the NAND flash operation mode, 2) the main source of this performance bottleneck is I/O bus activity, not NAND flash activity itself, 3) multi-level-cell NAND flash provides lower I/O bus resource contention than single-level-cell NAND flash, but the resource contention becomes a serious problem as the number of dies increases, and 4) employing many dies rather than many planes promises better performance in disk-friendly real workloads. The simulator can be downloaded from http://www.cse.psu.edu/~mqj5086/nfs.

high-performance computer architecture | 2014

Sprinkler: Maximizing resource utilization in many-chip solid state disks

Myoungsoo Jung; Mahmut T. Kandemir

Resource utilization is one of the emerging problems in many-chip SSDs. In this paper, we propose Sprinkler, a novel device-level SSD controller, which targets maximizing resource utilization and achieving high performance without additional NAND flash chips. Specifically, Sprinkler relaxes parallelism dependency by scheduling I/O requests based on internal resource layout rather than the order imposed by the device-level queue. In addition, Sprinkler improves flash-level parallelism and reduces the number of transactions (i.e., improves transactional locality) by over-committing flash memory requests to specific resources. Our extensive experimental evaluation using a cycle-accurate large-scale SSD simulation framework shows that a many-chip SSD equipped with our Sprinkler provides at least 56.6% shorter latency and 1.8 to 2.2 times better throughput than the state-of-the-art SSD controllers. Further, it improves overall resource utilization by 68.8% under different I/O request patterns and provides, on average, 80.2% more flash-level parallelism by reducing half of the flash memory requests at runtime.

international symposium on computer architecture | 2014

HIOS: a host interface I/O scheduler for solid state disks

Myoungsoo Jung; Wonil Choi; Shekhar Srikantaiah; Joonhyuk Yoo; Mahmut T. Kandemir

Garbage collection (GC) and resource contention on I/O buses (channels) are among the critical bottlenecks in Solid State Disks (SSDs) that cannot be easily hidden. Most existing I/O scheduling algorithms in the host interface logic (HIL) of state-of-the-art SSDs are oblivious to such low-level performance bottlenecks in SSDs. As a result, SSDs may violate quality of service (QoS) requirements by not being able to meet the deadlines of I/O requests. In this paper, we propose a novel host interface I/O scheduler that is both GC-aware and QoS-aware. The proposed scheduler redistributes the GC overheads across non-critical I/O requests and reduces channel resource contention. Our experiments with workloads from various application domains reveal that the proposed scheduler reduces the standard deviation for latency over state-of-the-art I/O schedulers used in the HIL by 52.5%, and the worst-case latency by 86.6%. In addition, for I/O requests with sizes smaller than a superpage, our proposed scheduler avoids channel resource conflicts and reduces latency by 29.2% compared to the state-of-the-art.
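One way to picture "redistributing the GC overheads across non-critical I/O requests" is a slack-based split, sketched below. This is an illustrative assumption, not the scheduler from the paper: the function name `spread_gc`, the tuple layout, and the idea of measuring slack as a latency budget minus estimated service time are all hypothetical, and each request is treated independently (queueing delay is ignored).

```python
def spread_gc(requests, gc_time):
    """Assign GC time to requests without pushing any of them past its deadline.

    requests: list of (req_id, service_time, deadline) with times in the same
              unit; deadline is a latency budget relative to issue time.
    gc_time:  total GC work to hide.
    Returns (shares, leftover): GC time charged to each request, and the GC
    work that could not be hidden behind any slack.
    """
    shares = {}
    remaining = gc_time
    for req_id, service_time, deadline in requests:
        slack = max(0, deadline - service_time)   # zero slack means critical
        take = min(slack, remaining)
        shares[req_id] = take
        remaining -= take
    return shares, remaining


requests = [
    ("a", 100, 100),   # critical: no slack, receives no GC work
    ("b", 100, 400),   # 300 units of slack
    ("c", 50, 300),    # 250 units of slack
]
shares, leftover = spread_gc(requests, gc_time=500)
print(shares, leftover)
```

Here the critical request "a" is left untouched, while "b" and "c" absorb the 500 units of GC work between them.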

international middleware conference | 2012

Taking garbage collection overheads off the critical path in SSDs

Myoungsoo Jung; Ramya Prabhakar; Mahmut T. Kandemir

Solid state disks (SSDs) have the potential to revolutionize the storage system landscape, mostly due to their good random access performance compared to hard disks. However, garbage collection (GC) in SSDs introduces significant latencies and large performance variations, which makes widespread adoption of SSDs difficult. To address this issue, we present a novel garbage collection strategy, consisting of two components, called Advanced Garbage Collection (AGC) and Delayed Garbage Collection (DGC), that operate collectively to migrate GC operations from busy periods to idle periods. More specifically, AGC is employed to defer GC operations to idle periods in advance, based on the type of the idle periods and on-demand GC needs, whereas DGC complements AGC by handling the collections that could not be handled by AGC. Our comprehensive experimental analysis reveals that the proposed strategies provide stable SSD performance by significantly reducing GC overheads. Compared to the state-of-the-art GC strategies, P-FTL, L-FTL and H-FTL, our AGC+DGC scheme reduces GC overheads, on average, by about 66.7%, 96.7% and 98.2%, respectively.

architectural support for programming languages and operating systems | 2014

Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems

Myoungsoo Jung; Wonil Choi; John Shalf; Mahmut T. Kandemir

Solid State Disk (SSD) arrays are in a position to (at least partially) replace spinning disk arrays in high performance computing (HPC) systems due to their better performance and lower power consumption. However, these emerging SSD arrays are facing enormous challenges, which are not observed in disk-based arrays. Specifically, we observe that the performance of SSD arrays can significantly degrade due to various array-level resource contentions. In addition, their maintenance costs exponentially increase over time, which renders them difficult to deploy widely in HPC systems. To address these challenges, we propose Triple-A, a non-SSD based Autonomic All-Flash Array, which is a self-optimizing, from-scratch NAND flash cluster. Triple-A can detect two different types of resource contentions and autonomically alleviate them by reshaping the physical data-layout on its flash array network. Our experimental evaluation using both real workloads and a micro-benchmark shows that Triple-A can offer a 53% higher sustained throughput and an 80% lower I/O latency than non-autonomic SSD arrays.

ieee international conference on high performance computing data and analytics | 2013

Exploring the future of out-of-core computing with compute-local non-volatile memory

Myoungsoo Jung; Ellis Herbert Wilson; Wonil Choi; John Shalf; Hasan Metin Aktulga; Chao Yang; Erik Saule; Mahmut T. Kandemir

Drawing parallels to the rise of general-purpose graphics processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as accelerators for I/O-intensive scientific applications. However, existing works have explored use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels in the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.

ieee international conference on high performance computing data and analytics | 2016

Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems

Narges Shahidi; Mohammad Arjomand; Myoungsoo Jung; Mahmut T. Kandemir; Chita R. Das; Anand Sivasubramaniam

In the last decade, NAND flash-based SSDs have been widely adopted for high-end enterprise systems in an attempt to provide high-performance and reliable storage. However, inferior performance is frequently attained mainly due to the need for Garbage Collection (GC). GC in flash memory is the process of identifying and clearing the blocks of unneeded data to create space for the new data to be allocated. GC is a high-latency operation and once it is scheduled for service to a block of a plane in a flash chip (each flash chip consists of multiple planes), it can increase latency for later arriving I/O requests to the same plane. Apart from that, the consequent high latency also keeps other planes of the same chip, which are not involved in this GC, idle for a long time. We show that for the baseline SSD with modern FTL, GC considerably reduces the plane-level parallelism, causing significant performance degradation. There are several circuit-level constraints that make it difficult to allow subsequent I/O operations and/or GCs to be served concurrently from the same chip, but different planes, during the long latency GC. This paper proposes a novel GC strategy, called Parallel GC (PaGC), whose goal is to proactively run GC on the remaining planes of a flash chip whenever any of its planes needs to execute on-demand GC. The resulting PaGC system improves the response time of I/O requests by up to 45% (32% on average) for different GC settings and across a wide spectrum of enterprise I/O workloads.
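The trigger rule described in the abstract above (run GC proactively on a chip's other planes once any plane needs on-demand GC) can be sketched as follows. The `Plane` fields, the free-block threshold, and the choice to skip planes with nothing to reclaim are assumptions for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    free_blocks: int     # blocks currently available for new writes
    invalid_pages: int   # pages a GC pass could reclaim (hypothetical metric)


def planes_to_collect(planes, threshold):
    """Return (on_demand, proactive) plane indexes for one flash chip.

    A plane needs on-demand GC when its free blocks drop below `threshold`.
    If at least one plane does, the remaining planes that have something to
    reclaim are collected at the same time instead of sitting idle.
    """
    on_demand = [i for i, p in enumerate(planes) if p.free_blocks < threshold]
    if not on_demand:
        return [], []
    proactive = [
        i for i, p in enumerate(planes)
        if i not in on_demand and p.invalid_pages > 0
    ]
    return on_demand, proactive


chip = [Plane(1, 40), Plane(8, 12), Plane(9, 0), Plane(7, 30)]
on_demand, proactive = planes_to_collect(chip, threshold=2)
print(on_demand, proactive)
```

In this example only plane 0 is short of free blocks, but planes 1 and 3 are collected alongside it; plane 2 is skipped because it has no invalid pages.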

international conference on parallel architectures and compilation techniques | 2015

NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures

Jie Zhang; David Donofrio; John Shalf; Mahmut T. Kandemir; Myoungsoo Jung

Thanks to massive parallelism in modern Graphics Processing Units (GPUs), emerging data processing applications in GPU computing exhibit ten-fold speedups compared to CPU-only systems. However, this GPU-based acceleration is limited in many cases by the significant data movement overheads and inefficient memory management for host-side storage accesses. To address these shortcomings, this paper proposes a non-volatile memory management unit (NVMMU) that reduces the file data movement overheads by directly connecting the Solid State Disk (SSD) to the GPU. We implemented our proposed NVMMU on real hardware with commercially available GPU and SSD devices by considering different types of storage interfaces and configurations. In this work, NVMMU unifies two discrete software stacks (one for the SSD and the other for the GPU) in two major ways. While a new interface provided by our NVMMU directly forwards file data between the GPU runtime library and the I/O runtime library, it supports non-volatile direct memory access (NDMA) that pairs those GPU and SSD devices via physically shared system memory blocks. This unification in turn can eliminate unnecessary user/kernel-mode switching, improve memory management, and remove data copy overheads. Our evaluation results demonstrate that NVMMU can reduce the overheads of file data movement by 95% on average, improving overall system performance by 78% compared to a conventional IOMMU approach.
