Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Youjip Won is active.

Publication


Featured research published by Youjip Won.


IEEE Transactions on Computers | 2011

Efficient Deduplication Techniques for Modern Backup Operation

Jaehong Min; Daeyoung Yoon; Youjip Won

In this work, we focus on optimizing the deduplication system by adjusting the pertinent factors in fingerprint lookup and chunking, which we identify as the key ingredients of efficient deduplication. For efficient fingerprint lookup, we propose a fingerprint management scheme called LRU-based Index Partitioning. For efficient chunking, we propose the Incremental Modulo-K (INC-K) algorithm, an optimized version of Rabin's algorithm in which we significantly reduce the number of arithmetic operations by exploiting the algebraic nature of modulo arithmetic. LRU-based Index Partitioning uses the notion of a tablet and enforces access locality of fingerprint lookups when storing fingerprints. We maintain tablets in an LRU manner to exploit the temporal locality of fingerprint lookups, and to preserve access correlation across tablets, we apply prefetching when maintaining the tablet list. We also propose Context-aware chunking to maximize chunking speed and deduplication ratio. We develop a prototype backup system and perform a comprehensive analysis of various factors and their relationships: average chunk size, chunking speed, deduplication ratio, tablet management algorithms, and overall backup speed. By increasing the average chunk size from 4 KB to 10 KB, chunking time increases by 34.3 percent, the deduplication ratio decreases by 0.66 percent, and the overall backup speed increases by 50 percent (from 51.4 MB/sec to 77.8 MB/sec).
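The abstract is prose only; the sketch below illustrates the general idea behind rolling-hash, content-defined chunking of the kind INC-K accelerates. It is not the paper's algorithm: the window size, modulus, boundary mask, and chunk-size limits are assumed values.

```python
# Minimal sketch of content-defined chunking with an incremental (rolling)
# modulo hash. This is NOT the paper's exact INC-K algorithm; window size,
# modulus, and the boundary condition below are illustrative assumptions.

def chunk_boundaries(data: bytes, window=48, base=257, mod=(1 << 31) - 1,
                     boundary_mask=(1 << 13) - 1, min_chunk=2048, max_chunk=16384):
    """Yield end offsets of chunks; a boundary is declared when the rolling
    hash of the last `window` bytes matches a fixed bit pattern (expected
    chunk size is roughly boundary_mask + 1 bytes)."""
    boundaries = []
    h = 0
    # Precompute base^(window-1) mod `mod` so the oldest byte can be removed
    # with one multiply and one subtract instead of rehashing the window.
    top = pow(base, window - 1, mod)
    start = 0
    for i, b in enumerate(data):
        if i - start >= window:
            h = (h - data[i - window] * top) % mod   # drop the oldest byte
        h = (h * base + b) % mod                     # fold in the newest byte
        length = i - start + 1
        at_pattern = (h & boundary_mask) == 0
        if length >= min_chunk and (at_pattern or length >= max_chunk):
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))
    return boundaries
```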


IEEE Transactions on Knowledge and Data Engineering | 1999

Server capacity planning for Web traffic workload

Krishna Kant; Youjip Won

The goal of the paper is to provide a methodology for determining bandwidth requirements for the various hardware components of a World Wide Web server. The paper assumes a traditional symmetric multiprocessor (SMP) architecture for the Web server, although the same analysis applies to an SMP node in a cluster. The paper derives formulae for the bandwidth demands on memory, the processor data bus, network adapters, disk adapters, I/O memory paths, and I/O buses. Since Web workload characteristics vary widely, three sample workloads are considered for illustrative purposes: 1) standard SPECweb96; 2) a SPECweb96-like workload that assumes dynamic data and retransmissions; and 3) WebProxy, which models a Web proxy server that does little caching and thus has rather severe requirements. The results point to a few general conclusions regarding Web workloads. In particular, reducing memory/data bus bandwidth by using the Virtual Interface Architecture (VIA) is very desirable, and the connectivity needs may go well beyond the capabilities of systems based on the traditional PCI bus. Web workloads also demand significantly higher memory bandwidth than data bus bandwidth, and this disparity is expected to increase with the use of VIA. Also, current efforts to offload TCP/IP processing may require more headroom in I/O subsystem bandwidth than in the processor-memory subsystem.
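As a rough companion to the paper's methodology, the sketch below shows a back-of-envelope bandwidth-demand calculation. The per-request byte counts, copy counts, and protocol overhead are illustrative assumptions, not the formulae derived in the paper.

```python
# Back-of-envelope sketch of Web server bandwidth sizing in the spirit of the
# paper's methodology. The coefficients below (reply size, number of memory
# copies, protocol overhead) are illustrative assumptions.

def bandwidth_demand(requests_per_sec, avg_reply_bytes, copies_per_reply=3,
                     protocol_overhead=0.10):
    """Return (network, memory) bandwidth demand in MB/s.

    network: wire traffic including protocol overhead.
    memory : every reply byte is assumed to cross the memory bus
             `copies_per_reply` times (device DMA, kernel copy, user copy);
             VIA-style zero-copy I/O would reduce this multiplier.
    """
    wire_bytes = requests_per_sec * avg_reply_bytes * (1.0 + protocol_overhead)
    mem_bytes = requests_per_sec * avg_reply_bytes * copies_per_reply
    to_mb = 1.0 / (1024 * 1024)
    return wire_bytes * to_mb, mem_bytes * to_mb

# Example: 5000 requests/s with 14 KB average replies.
net_mb, mem_mb = bandwidth_demand(5000, 14 * 1024)
print(f"network ~{net_mb:.0f} MB/s, memory ~{mem_mb:.0f} MB/s")
```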


Embedded Software | 2012

Smart layers and dumb result: IO characterization of an Android-based smartphone

Kisung Lee; Youjip Won

In this paper, we offer an in-depth IO characterization of an Android-based smartphone. We analyze the IO behaviors of a total of 14 Android applications from six different categories. We examine the correlations among seven IO attributes: originating application, file type, IO size, IO type (read/write), random/sequential access, block semantics (data/metadata/journal), and session type (buffered vs. synchronous IO). For the purposes of our study, we develop the Mobile Storage Analyzer (MOST), a framework for collecting IO attributes across layers. To summarize our findings briefly: SQLite, which is the most popular tool for maintaining persistent data in Android, puts too much burden on the storage. For example, a single SQLite operation (update or insert) results in at least 11 write operations being sent to the storage, for creating short-lived files, updating database tables, and accessing the EXT4 journal. From the storage point of view, more than 50% of writes are for EXT4 journal updates. Excluding metadata and journal accesses, 60-80% of the writes are random. More than 50% of the writes are synchronous. 4 KB IO accounts for 70% of all writes. In the Android platform, SQLite and the EXT4 filesystem each expend a great amount of effort to ensure reliability, through transactions and journaling, respectively. When they are combined, however, the result is rather dumb: together they generate unnecessarily excessive write operations to the NAND-based storage. This not only degrades IO performance but also significantly reduces the lifetime of the underlying NAND flash storage. The results of this study clearly suggest that SQLite, EXT4, and the underlying NAND-based storage need to be completely overhauled and vertically integrated so as to properly and effectively incorporate their respective characteristics.
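To make the layered attribute analysis concrete, the sketch below tallies block-level write records by semantics, size, synchrony, and sequentiality. The record format is a hypothetical simplification for illustration; it is not the actual MOST trace format.

```python
# Minimal sketch of the kind of block-level write classification MOST performs.
# Each record is assumed to carry a sector number, a size in bytes, a
# synchronous flag, and a semantic tag ("data", "metadata", or "journal").

from collections import Counter

def classify_writes(records):
    """records: iterable of dicts like
       {"sector": int, "size": int, "sync": bool, "sem": "data"}"""
    stats = Counter()
    prev_end = None
    for r in records:
        stats["total"] += 1
        stats[r["sem"]] += 1
        if r["sync"]:
            stats["sync"] += 1
        if r["size"] == 4096:
            stats["4KB"] += 1
        # Count a data write as random if it does not start where the
        # previous write ended (metadata/journal writes are excluded).
        if prev_end is not None and r["sector"] != prev_end and r["sem"] == "data":
            stats["random_data"] += 1
        prev_end = r["sector"] + r["size"] // 512
    return stats
```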


ACM Transactions on Storage | 2010

FRASH: Exploiting storage class memory in hybrid file system for hierarchical storage

Jaemin Jung; Youjip Won; Eun-Ki Kim; Hyung-Jong Shin; Byeonggil Jeon

In this work, we develop a novel hybrid file system, FRASH, for storage-class memory and NAND Flash. Despite the promising physical characteristics of storage-class memory, its capacity is an order of magnitude smaller than that of current storage devices, which makes it less than desirable for use as an independent storage device. We carefully analyze the in-memory and on-disk file system objects in a log-structured file system and exploit both the memory and the storage aspects of storage-class memory to overcome the drawbacks of the current log-structured file system. FRASH provides a hybrid view of storage-class memory: it harbors an in-memory data structure as well as an on-disk structure. It provides nonvolatility to key data structures that have been maintained in memory in a legacy log-structured file system. This approach greatly improves the mount latency and effectively resolves the robustness issue. By maintaining the on-disk structure in storage-class memory, FRASH provides byte-addressability to file system objects and per-page metadata, and consequently greatly improves I/O performance compared to the legacy log-structured approach. While storage-class memory offers byte granularity, it is still far slower than its DRAM counterpart. We therefore develop a copy-on-mount technique to overcome the access latency difference between main memory and storage-class memory. Our file system reduces the mount time by 92%, and file system I/O performance increases by 16%.
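The copy-on-mount idea can be summarized with a small sketch: persistent metadata lives in slower storage-class memory but is bulk-copied into DRAM at mount time and written back on sync. The classes and fields below are illustrative, not FRASH's actual layout.

```python
# Minimal sketch of copy-on-mount: metadata persists in byte-addressable but
# slower SCM, is copied wholesale into DRAM at mount so the hot path runs at
# DRAM speed, and dirty entries are written back at sync/unmount.

class SCMRegion:
    """Stand-in for a persistent, byte-addressable SCM area."""
    def __init__(self, metadata):
        self.metadata = dict(metadata)   # survives "reboots" in this toy model

class MountedFS:
    def __init__(self, scm: SCMRegion):
        self.scm = scm
        self.cache = dict(scm.metadata)  # copy-on-mount: one bulk copy into DRAM
        self.dirty = set()

    def update(self, key, value):        # served from DRAM; no SCM access here
        self.cache[key] = value
        self.dirty.add(key)

    def sync(self):                      # write back only what changed
        for key in self.dirty:
            self.scm.metadata[key] = self.cache[key]
        self.dirty.clear()
```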


IEEE Conference on Mass Storage Systems and Technologies | 2012

Deduplication in SSDs: Model and quantitative analysis

Jonghwa Kim; Choonghyun Lee; Sang Yup Lee; Ikjoon Son; Jongmoo Choi; Sungroh Yoon; Hu-ung Lee; Sooyong Kang; Youjip Won; Jaehyuk Cha

In NAND Flash-based SSDs, deduplication can effectively address three critical issues: cell lifetime, write performance, and garbage collection overhead. However, deduplication at the SSD device level differs in many respects from deduplication in enterprise storage systems; its success lies in properly exploiting the very limited underlying hardware resources and the workload characteristics of SSDs. In this paper, we develop a novel deduplication framework elaborately tailored for SSDs. We first develop an analytical model that enables us to calculate the minimum duplication rate required to achieve a performance gain given the deduplication overhead. Then, we explore a number of design choices for implementing the deduplication components in hardware or software. As a result, we propose two acceleration techniques: sampling-based filtering and recency-based fingerprint management. The former selectively applies deduplication based upon sampling, and the latter effectively exploits the limited controller memory while maximizing the deduplication ratio. We prototype the proposed deduplication framework on three physical hardware platforms and investigate deduplication efficiency under various CPU capabilities and hardware/software alternatives. Experimental results show that we achieve duplication rates ranging from 4% to 51%, with an average of 17%, for the nine workloads considered in this work. The response time of a write request can be improved by up to 48%, with an average of 15%, while the lifespan of SSDs is expected to increase by up to 4.1 times, with an average of 2.4 times.
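The break-even condition mentioned in the abstract can be illustrated with a simplified model (not the paper's actual derivation): deduplication pays off once the duplication rate exceeds the ratio of per-write dedup overhead to NAND program cost. The latencies below are assumed values.

```python
# Simplified break-even sketch in the spirit of the paper's analytical model.
# With deduplication, a duplicate write costs only fingerprinting + lookup,
# while a unique write pays both that overhead and the NAND program cost.
# Deduplication wins on average when
#     t_dedup + (1 - d) * t_write  <  t_write
# i.e. when the duplication rate d exceeds t_dedup / t_write.

def min_duplication_rate(t_fingerprint_us, t_lookup_us, t_write_us):
    """Minimum fraction of duplicate writes needed for dedup to pay off."""
    t_dedup = t_fingerprint_us + t_lookup_us
    return t_dedup / t_write_us

# Example with assumed costs: 60 us hashing, 20 us index lookup,
# 900 us NAND page program (all values are illustrative only).
d_min = min_duplication_rate(60, 20, 900)
print(f"dedup pays off once the duplication rate exceeds {d_min:.1%}")
```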


IEEE Conference on Mass Storage Systems and Technologies | 2013

VSSIM: Virtual machine based SSD simulator

Jinsoo Yoo; Youjip Won; Joongwoo Hwang; Sooyong Kang; Jongmoo Choi; Sungroh Yoon; Jaehyuk Cha

In this paper, we present a virtual machine based SSD simulator, VSSIM (Virtual SSD Simulator). VSSIM intends to address the issues of trace-driven simulation, e.g., trace re-scaling and accurate replay. VSSIM operates on top of QEMU/KVM with a software-based SSD module. VSSIM runs in real time and allows the user to measure both the host performance and the SSD behavior under various design choices. VSSIM can flexibly model the various hardware components, e.g., the number of channels, the number of ways, block size, page size, planes per chip, the program, erase, and read latency of NAND cells, channel switch delay, and way switch delay. VSSIM can also facilitate the implementation of SSD firmware algorithms. To demonstrate the capability of VSSIM, we performed a number of case studies. The results of the simulation study deliver important guidelines for the firmware and hardware design of future NAND-based storage devices. The following are some of the findings: (i) as the page size increases, the performance benefit of increasing channel parallelism over increasing way parallelism becomes less significant; (ii) due to the bimodality of the IO size distribution, the FTL should be designed to handle multiple mapping granularities; (iii) hybrid mapping does not work in SSDs with four or more ways due to severe log block fragmentation; (iv) as a performance metric, the write amplification factor can be misleading; (v) compared to sequential writes, random write operations benefit more from channel-level parallelism, and therefore in a multi-channel environment it is beneficial to categorize a larger fraction of IO as random. VSSIM is validated against a commodity SSD, the Intel X25-M. VSSIM models the sequential IO performance of the X25-M within a 3% offset.
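To give a flavor of the hardware parameters VSSIM models, the sketch below defines an assumed SSD configuration and a naive peak-write-throughput estimate based on channel/way interleaving. It is not VSSIM's actual configuration interface; the field names and the throughput model are illustrative assumptions.

```python
# Illustrative SSD parameter set and a toy peak-throughput estimate from
# channel/way parallelism. Not VSSIM's configuration format.

from dataclasses import dataclass

@dataclass
class SSDConfig:
    channels: int = 8
    ways_per_channel: int = 2
    page_size_kb: int = 4
    pages_per_block: int = 128
    program_latency_us: int = 900     # NAND page program time
    channel_switch_delay_us: int = 2
    way_switch_delay_us: int = 2

    def peak_write_mb_s(self) -> float:
        # Assume perfectly interleaved programs across all channels and ways;
        # each program additionally pays one channel and one way switch delay.
        per_page_us = (self.program_latency_us / self.ways_per_channel
                       + self.channel_switch_delay_us + self.way_switch_delay_us)
        pages_per_sec = self.channels * 1_000_000 / per_page_us
        return pages_per_sec * self.page_size_kb / 1024

print(f"~{SSDConfig().peak_write_mb_s():.0f} MB/s peak write (toy model)")
```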


ACM Multimedia | 2002

Empirical study of user perception behavior for mobile streaming

Seungho Song; Youjip Won; Injae Song

The objective of this study is to examine the effect of individual factors on human perception behavior and to determine the right set of parameters that effectively exploits the underlying network and system capacity while maximizing the QoS perceived by the user. For a comprehensive test, we examine three different types of video clips: news, drama, and a sports game. From each original video clip, we vary the encoding factors as follows: playback rate (384 Kbits/sec and 1.5 Mbits/sec), frame rate (5 frames/sec, 15 frames/sec, and 25 frames/sec), and spatial resolution (176x244 and 320x240). We performed extensive user experiments, focusing in particular on video streaming in mobile wireless environments where the playback rate and screen size are relatively small. The analysis reveals that, of the three encoding factors, frame rate is the most influential. Spatial resolution does not make a significant difference in QoS for the three video categories. Playback rate results in a noticeable difference in QoS; however, the analysis suggests that the improvement in QoS obtained by quadrupling the playback rate (from 384 Kbits/sec to 1.5 Mbits/sec) may not be justifiable, particularly when the screen size is small.


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2008

Efficient index lookup for De-duplication backup system

Youjip Won; Jongmyeong Ban; Jaehong Min; Jungpil Hur; Sangkyu Oh; Jangsun Lee

We minimize fingerprint management overhead (index lookup and index insert) by introducing a main-memory index lookup structure and workload-aware index partitioning of the index file in the storage. The backup server maintains three data structures for redundancy elimination: header files, chunk files, and fingerprint tables. Together, these data structures enable PRUNE to effectively eliminate redundancy and to perform efficient backup. We perform various experiments to measure the overhead of each task in the backup operation and to examine the efficiency of redundancy elimination. Incremental Modulo-K reduces the file chunking latency by approximately 60%. With the filter-based in-memory index data structure and index partitioning, PRUNE eliminates 99.4% of the disk accesses involved in fingerprint management.
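The filter-based in-memory index is a summary structure that lets most lookups skip the on-disk index. The sketch below shows one plausible form, a Bloom filter in front of a prefix-partitioned index; the filter sizing and partitioning scheme are assumptions, not PRUNE's exact design.

```python
# Minimal sketch of a filter-fronted fingerprint index: a small in-memory
# Bloom filter answers "definitely new" without touching the on-disk index,
# so index accesses are needed only for probable duplicates.

import hashlib

class FingerprintIndex:
    def __init__(self, bits=1 << 20, partitions=16):
        self.bloom = bytearray(bits // 8)
        self.bits = bits
        # On-disk index modeled as per-partition dicts keyed by fingerprint prefix.
        self.partitions = [dict() for _ in range(partitions)]

    def _positions(self, fp: bytes):
        for seed in (b"a", b"b", b"c"):
            h = int.from_bytes(hashlib.sha1(seed + fp).digest()[:8], "big")
            yield h % self.bits

    def lookup_or_insert(self, fp: bytes, location):
        """Return the stored location for a duplicate, or None after inserting."""
        maybe_dup = all(self.bloom[p // 8] >> (p % 8) & 1 for p in self._positions(fp))
        part = self.partitions[fp[0] % len(self.partitions)]
        if maybe_dup and fp in part:      # only probable duplicates reach the index
            return part[fp]
        for p in self._positions(fp):     # record the new fingerprint
            self.bloom[p // 8] |= 1 << (p % 8)
        part[fp] = location
        return None
```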


ACM Transactions on Storage | 2015

HEAPO: Heap-Based Persistent Object Store

Taeho Hwang; Jaemin Jung; Youjip Won

In this work, we developed a Heap-Based Persistent Object Store (HEAPO) to manage persistent objects in byte-addressable nonvolatile RAM (NVRAM). HEAPO defines its own persistent heap layout, persistent object format, name space organization, object sharing and protection mechanism, and undo-only log-based crash recovery, all of which are effectively tailored for NVRAM. We put our effort into developing a lightweight and flexible layer that exploits the DRAM-like access latency of NVRAM. To this end, we developed (i) a native management layer for NVRAM that eliminates the redundancy between in-core and on-disk copies of the metadata, (ii) an expandable object format, (iii) a burst-trie-based global name space with local name space caching, (iv) static address binding, and (v) minimal logging for undo-only crash recovery. We implemented HEAPO on a commodity OS (Linux 2.6.32) and measured its performance. By eliminating metadata redundancy, HEAPO improved the speed of creating, attaching, and expanding an object by 1.3×, 4.5×, and 3.8×, respectively, compared to a memory-mapped file-based persistent object store. The burst-trie-based name space organization of HEAPO yielded 7.6× better lookup performance than the hashed B-tree-based name space of EXT4. We modified memcachedb to use HEAPO to maintain its search structure. For hash table updates, HEAPO-based memcachedb yielded a 3.4× performance improvement over the original memcachedb implementation, which uses mmap() over a ramdisk to maintain the key-value store in memory.
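Undo-only logging, named in the abstract, can be sketched as follows: persist an undo record before each in-place update, discard it on commit, and roll back any incomplete update at recovery. The log format and the persist() barrier below are illustrative stand-ins, not HEAPO's actual code; real NVRAM code would use cache-line flushes and memory fences.

```python
# Minimal sketch of undo-only logging for an in-place update to a persistent
# object. `heap` is modeled as a dict of object id -> bytearray.

class UndoLog:
    def __init__(self):
        self.entries = []        # (object_id, offset, old_bytes), made durable before the update

    def persist(self):
        pass                     # stand-in for clflush/sfence to make data durable

def update_field(log, heap, obj_id, offset, new_bytes):
    old = heap[obj_id][offset:offset + len(new_bytes)]
    log.entries.append((obj_id, offset, bytes(old)))
    log.persist()                            # the undo record must be durable first
    heap[obj_id][offset:offset + len(new_bytes)] = new_bytes
    log.persist()                            # then persist the in-place update
    log.entries.clear()                      # commit: discard the undo record

def recover(log, heap):
    """After a crash, roll back any incomplete update using the undo records."""
    for obj_id, offset, old in reversed(log.entries):
        heap[obj_id][offset:offset + len(old)] = old
    log.entries.clear()
```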


IEEE Transactions on Signal Processing | 2010

On-Line Prediction of Nonstationary Variable-Bit-Rate Video Traffic

Sungjoo Kang; Seongjin Lee; Youjip Won; Byeongchan Seong

In this paper, we propose a model-based bandwidth prediction scheme for variable-bit-rate (VBR) video traffic with a regular group-of-pictures (GOP) pattern. A multiplicative ARIMA (autoregressive integrated moving average) process called GOP ARIMA (ARIMA for GOP) is used as the base stochastic model, and the scheme consists of two key ingredients: prediction and a model validity check. For traffic prediction, we deploy a Kalman filter over the GOP ARIMA model, and we use confidence interval analysis for validity determination. The GOP ARIMA model explicitly models inter- and intra-GOP frame size correlations, and the Kalman filter-based prediction maintains state across the prediction rounds. The synergy of the two successfully addresses a number of challenging issues, such as a unified framework for frame-type-dependent prediction, accurate prediction, and robustness against noise. With few exceptions, a single video session consists of several scenes whose bandwidth processes may exhibit different stochastic natures, which hinders recursive adjustment of the Kalman filter parameters because its stochastic model structure is fixed at deployment. To effectively address this issue, the proposed prediction scheme harbors a statistical hypothesis test within the prediction framework. By formulating the confidence interval of a prediction in terms of the Kalman filter components, the scheme not only predicts the frame size but also determines the validity of the stochastic model. Based upon the results of the model validity check, the proposed prediction scheme updates the structure of the underlying GOP ARIMA model. We perform a comprehensive performance study using publicly available MPEG-2 and MPEG-4 traces and compare the prediction accuracy of four different prediction schemes. In all traces, the proposed model yields higher prediction accuracy than the other prediction schemes. We show that confidence interval analysis effectively detects structural changes in the sample sequence and that properly updating the model results in more accurate prediction. However, a model update requires a certain length of observation period, e.g., 60 frames (2 s); due to this learning overhead, the advantage of model updates becomes less significant when scene lengths are short. Through queueing simulation, we examine the effect of prediction accuracy on user-perceivable QoS. The proposed bandwidth prediction scheme allocates less than 50% of the queue (buffer) space compared to the other bandwidth prediction schemes, while still yielding better packet loss behavior.
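The prediction-plus-validity-check mechanism can be illustrated with a scalar Kalman filter: predict the next frame size with a confidence interval, then flag the model as suspect when the observation falls outside it. This toy uses a simple local-level model, not the paper's GOP ARIMA structure; the noise variances and the 95% threshold are assumed values.

```python
# Minimal sketch of Kalman-filter-based one-step prediction with a confidence
# interval check. State model: x_t = x_{t-1} + w (process noise q),
# observation: y_t = x_t + v (observation noise r).

import math

class FrameSizePredictor:
    def __init__(self, init_size, process_var=1e4, obs_var=1e5):
        self.x = float(init_size)   # state estimate (predicted frame size)
        self.p = obs_var            # estimate variance
        self.q = process_var        # process noise variance
        self.r = obs_var            # observation noise variance

    def predict(self):
        """Return (predicted size, 95% confidence half-width) for the next frame."""
        innovation_var = self.p + self.q + self.r
        return self.x, 1.96 * math.sqrt(innovation_var)

    def update(self, observed):
        """Fold in the observed frame size; return False if it fell outside the
        confidence interval, signaling that the model may need restructuring."""
        pred, half_width = self.predict()
        valid = abs(observed - pred) <= half_width
        p_prior = self.p + self.q
        k = p_prior / (p_prior + self.r)       # Kalman gain
        self.x = self.x + k * (observed - self.x)
        self.p = (1 - k) * p_prior
        return valid
```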

Collaboration


Dive into Youjip Won's collaborations.

Top Co-Authors

Kern Koh
Seoul National University

Soohan Ahn
Seoul National University