Nohhyun Park
University of Minnesota
Publications
Featured research published by Nohhyun Park.
ieee international symposium on workload characterization | 2010
Nohhyun Park; David J. Lilja
The compression and throughput performance of data deduplication system is directly affected by the input dataset. We propose two sets of evaluation metrics, and the means to extract those metrics, for deduplication systems. The First set of metrics represents how the composition of segments changes within the deduplication system over five full backups. This in turn allows more insights into how the compression ratio will change as data accumulate. The second set of metrics represents index table fragmentation caused by duplicate elimination and the arrival rate at the underlying storage system. We show that, while shorter sequences of unique data may be bad for index caching, they provide a more uniform arrival rate which improves the overall throughput. Finally, we compute the metrics derived from the datasets under evaluation and show how the datasets perform with different metrics. Our evaluation shows that backup datasets typically exhibit patterns in how they change over time and that these patterns are quantifiable in terms of how they affect the deduplication process. This quantification allows us to: 1) decide whether deduplication is applicable, 2) provision resources, 3) tune the data deduplication parameters and 4) potentially decide which portion of the dataset is best suited for deduplication.
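As a rough illustration of the first set of metrics, the Python sketch below tracks how segment composition and the cumulative compression ratio evolve across a series of full backups. The fixed-size chunking, SHA-1 fingerprints, and function names are illustrative assumptions, not the authors' implementation.

```python
import hashlib

def chunk(data: bytes, size: int = 4096):
    """Split a backup image into fixed-size segments. (A production
    deduplicator would likely use content-defined chunking; fixed-size
    segments keep the sketch simple.)"""
    return [data[i:i + size] for i in range(0, len(data), size)]

def dedup_metrics(backups):
    """Track, for a sequence of full backups, what fraction of each
    backup's segments duplicate previously seen segments, and the
    cumulative compression ratio after each backup."""
    seen = set()        # fingerprints of all unique segments so far
    total_segments = 0  # logical segments ingested so far
    history = []
    for generation, data in enumerate(backups, start=1):
        fingerprints = [hashlib.sha1(s).digest() for s in chunk(data)]
        new = {fp for fp in fingerprints if fp not in seen}
        seen |= new
        total_segments += len(fingerprints)
        history.append({
            "generation": generation,
            "duplicate_fraction": 1 - len(new) / len(fingerprints),
            # logical data ingested vs. unique data actually stored
            "compression_ratio": total_segments / len(seen),
        })
    return history
```

A rising `compression_ratio` across generations corresponds to the accumulation pattern the paper quantifies, while `duplicate_fraction` hints at how much of each new backup the index must absorb.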
high performance computing and communications | 2011
Youngjin Nam; Guanlin Lu; Nohhyun Park; Weijun Xiao; David Hung-Chang Du
Data deduplication has recently become commonplace in most secondary storage, and even in some primary storage, for capacity optimization. Beyond write performance, the read performance of deduplication storage has been gaining significance as its deployments widen. In this paper, we emphasize the importance of read performance in reconstituting a data stream from its unique and shared chunks physically dispersed across deduplication storage. We introduce a new read performance indicator called the Chunk Fragmentation Level (CFL). Through a theoretical performance model and extensive experiments, we validate that the CFL is an effective indicator of the read performance of deduplication storage. Finally, we articulate further research issues.
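The abstract does not give the CFL formula, but one plausible way to compute a CFL-like indicator is sketched below: compare the minimum number of container reads a perfectly sequential layout would need against the number of distinct containers the stream's chunks actually occupy. The container model and names here are assumptions for illustration, not the paper's exact definition.

```python
import math

def chunk_fragmentation_level(chunk_containers, container_capacity):
    """A CFL-style indicator: the minimum number of container reads a
    perfectly sequential layout would need for this stream, divided by
    the number of distinct containers its chunks actually occupy.
    1.0 means an unfragmented stream; values near 0 mean the chunks
    are widely scattered.

    chunk_containers   -- container id holding each chunk, in stream order
    container_capacity -- chunks that fit in one container
    """
    optimal = math.ceil(len(chunk_containers) / container_capacity)
    actual = len(set(chunk_containers))
    return optimal / actual

# A six-chunk stream scattered over three containers, each able to
# hold four chunks: CFL = ceil(6/4) / 3 = 2/3.
print(chunk_fragmentation_level([0, 0, 0, 7, 3, 0], container_capacity=4))
```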
symposium on cloud computing | 2012
Nohhyun Park; Irfan Ahmad; David J. Lilja
Workload consolidation is a key technique for reducing costs in virtualized datacenters. When considering storage consolidation, a key problem is the unpredictable performance behavior of consolidated workloads on a given storage system. In practice, this often forces system administrators to grossly overprovision storage to meet application demands. In this paper, we show that existing modeling techniques are inaccurate and ineffective in the face of heterogeneous devices. We introduce Romano, a storage performance management system designed to optimize truly heterogeneous virtualized datacenters. At its core, Romano automatically constructs and adapts approximate workload-specific performance models of storage devices, along with prediction intervals. It then applies these models to enable highly efficient I/O load balancing. End-to-end experiments demonstrate that Romano reduces prediction error by 80% on average compared with existing techniques. The result is improved load balancing: variance is lowered by 82%, and the average and maximum latencies observed across the storage systems are reduced by 52% and 78%, respectively.
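A minimal sketch of the modeling idea, assuming a toy linear latency-versus-load model per device with a crude two-sigma prediction interval; Romano's actual models, interval construction, and load balancer are more sophisticated, and all names here are invented.

```python
import statistics

class DeviceModel:
    """A toy workload-specific performance model: fit latency as a
    linear function of outstanding I/O load from observed samples and
    report a crude two-sigma prediction interval from the residual
    spread."""

    def __init__(self):
        self.samples = []  # (load, observed_latency_ms) pairs

    def observe(self, load, latency_ms):
        self.samples.append((load, latency_ms))

    def predict(self, load):
        xs, ys = zip(*self.samples)
        slope, intercept = statistics.linear_regression(xs, ys)
        predicted = intercept + slope * load
        residuals = [y - (intercept + slope * x) for x, y in self.samples]
        spread = statistics.stdev(residuals) if len(residuals) > 1 else 0.0
        return predicted, (predicted - 2 * spread, predicted + 2 * spread)

def place_workload(models, load):
    """Pick the device whose model predicts the lowest latency for the
    incoming load -- the load-balancing step, greatly simplified."""
    return min(models, key=lambda name: models[name].predict(load)[0])
```

Feeding each device's observed (load, latency) samples into its model and routing new workloads with `place_workload` captures, in miniature, how per-device models replace one-size-fits-all assumptions on heterogeneous hardware.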
international symposium on parallel and distributed processing and applications | 2012
Weijun Xiao; Xiaoqiang Lei; Ruixuan Li; Nohhyun Park; David J. Lilja
Recent advances in flash memory show great potential for replacing traditional hard drives (HDDs) with flash-based solid state drives (SSDs), from personal computing to distributed systems. However, SSDs still have a long way to go before they can completely take over enterprise data storage. Considering the cost, performance, and reliability of SSDs, a practical solution is to combine SSDs and HDDs. This paper proposes a hybrid storage system named PASS (Performance-dAta Synchronization hybrid storage System) to trade off I/O performance against data discrepancy between SSDs and HDDs. PASS pairs a high-performance SSD with a traditional HDD that stores mirrored data for reliability. All I/O requests are redirected to the primary SSD first, and the updated data blocks are then copied to the backup HDD asynchronously. To hide the latency of these copy operations, we use an I/O window to coalesce write requests and maintain an ordered I/O queue to shorten the HDD's seek and rotation times. Depending on the characteristics of different I/O workloads, we develop an adaptive policy to dynamically balance foreground I/O processing and background mirroring. We implement a prototype of PASS as a Linux device driver and conduct experiments with the IoMeter, PostMark, and TPC-C benchmarks. Our results show that PASS can achieve up to 12 times the performance of a RAID1 storage system for the IoMeter and PostMark workloads while tolerating less than 2% data discrepancy between the primary SSD and the backup HDD. Interestingly, while PASS does not produce any performance benefit for the TPC-C benchmark, it does allow the system to scale to larger sizes than an HDD-based RAID system alone.
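A toy sketch of the write-coalescing window and offset-ordered mirror queue described above; the class name, window policy, and parameters are invented for illustration, and the real PASS prototype operates at the block layer as a Linux device driver.

```python
class MirrorQueue:
    """Deferred SSD-to-HDD mirroring with write coalescing. Writes are
    acknowledged once the primary SSD has them (the SSD path is
    omitted here); the HDD copy is queued, rewrites of the same block
    coalesce inside the window, and the queue drains in ascending
    offset order to shorten HDD seeks."""

    def __init__(self, window_size=64):
        self.window_size = window_size
        self.pending = {}  # block offset -> latest data for that block

    def write(self, offset, data):
        # Foreground path: queue the mirror copy; a rewrite simply
        # replaces the pending data, so the HDD sees at most one copy
        # of each block per window.
        self.pending[offset] = data
        if len(self.pending) >= self.window_size:
            return self.drain()
        return []

    def drain(self):
        # Background path: flush queued copies sorted by offset so the
        # HDD services them with minimal seek and rotation overhead.
        batch = sorted(self.pending.items())
        self.pending.clear()
        return batch  # a real driver would submit each (offset, data) to the HDD
```

An adaptive policy like the paper's would additionally tune when `drain` runs, trading a larger window (more coalescing, more discrepancy) against a smaller one (fresher mirror, more HDD traffic).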
international conference on computer design | 2012
Zhe Zhang; Weijun Xiao; Nohhyun Park; David J. Lilja
Phase change memory (PCM) is a promising technology for solving energy and performance bottlenecks in memory and storage systems. To help understand the reliability characteristics of PCM devices, we present a simple fault model that categorizes four types of PCM errors. Based on our proposed fault model, we conduct extensive experiments on real PCM devices at the memory module level. Numerical results uncover many interesting trends in the lifetime of PCM devices and their error behaviors. Specifically, PCM lifetime for the memory chips we tested is greater than 14 million cycles, which is much longer than for flash memory devices. In addition, the distributions of the four error types are quite different. These results can be used for estimating PCM lifetime and for measuring the fabrication quality of individual PCM memory chips.
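The four error types are not named in this abstract, so the sketch below only illustrates the general shape of such an endurance experiment: cycle a word with alternating patterns, read back after each write, and tally bit errors by flip direction until the first failure. The `device.write`/`device.read` interface and the cycle cap are hypothetical stand-ins, not the paper's apparatus.

```python
def endurance_test(device, address, max_cycles=20_000_000):
    """Cycle one PCM byte with alternating all-ones/all-zeros patterns,
    read back after every write, and tally bit errors by flip
    direction until the first failure. `device.write`/`device.read`
    are assumed interfaces to the memory module under test."""
    counts = {"one_read_as_zero": 0, "zero_read_as_one": 0}
    patterns = (0xFF, 0x00)
    for cycle in range(max_cycles):
        expected = patterns[cycle % 2]
        device.write(address, expected)
        got = device.read(address)
        counts["one_read_as_zero"] += bin(expected & ~got & 0xFF).count("1")
        counts["zero_read_as_one"] += bin(~expected & got & 0xFF).count("1")
        if any(counts.values()):
            return cycle, counts  # lifetime = cycles survived before first error
    return max_cycles, counts
```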
file and storage technologies | 2015
Carl A. Waldspurger; Nohhyun Park; Alexander Thomas Garthwaite; Irfan Ahmad
usenix annual technical conference | 2017
Carl A. Waldspurger; Trausti Saemundsson; Irfan Ahmad; Nohhyun Park
Archive | 2014
Carl A. Waldspurger; Alexander Thomas Garthwaite; Nohhyun Park; Irfan Ahmad
storage network architecture and parallel i/os | 2011
Nohhyun Park; Weijun Xiao; Kyubaik Choi; David J. Lilja
Archive | 2014
Carl A. Waldspurger; Nohhyun Park