Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Changwoo Min is active.

Publication


Featured research published by Changwoo Min.


International Conference on Management of Data | 2013

X-FTL: transactional FTL for SQLite databases

Woon-Hak Kang; Sang-Won Lee; Bongki Moon; Gihwan Oh; Changwoo Min

In the era of smartphones and mobile computing, many popular applications such as Facebook, Twitter, Gmail, and even the Angry Birds game manage their data using SQLite. This is mainly due to its development productivity and solid transactional support. For transactional atomicity, however, SQLite relies on less sophisticated but costlier page-oriented journaling mechanisms, which are often cited as the main cause of tardy responses in mobile applications. Flash memory does not allow data to be updated in place, so the copy-on-write strategy is adopted by most flash storage devices. In this paper, we propose X-FTL, a transactional flash translation layer (FTL) for SQLite databases. By offloading the burden of guaranteeing transactional atomicity from the host system to flash storage and by taking advantage of the copy-on-write strategy used in modern FTLs, X-FTL drastically improves transactional throughput almost for free, without resorting to costly journaling schemes. We have implemented X-FTL on an SSD development board called OpenSSD, and minimally modified SQLite and the ext4 file system to make them compatible with the extended abstractions provided by X-FTL. We demonstrate the effectiveness of X-FTL using real and synthetic SQLite workloads for smartphone applications, the TPC-C benchmark for OLTP databases, and the FIO benchmark for file systems.
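As a rough sketch of the idea, the host could tag each page write with a transaction ID and let the FTL defer its mapping-table switch until commit; the device commands and names below are hypothetical illustrations, not the actual OpenSSD/X-FTL interface.

```c
/* Hypothetical host-side view of an X-FTL-style transactional interface.
 * The FTL buffers copy-on-write pages per transaction and atomically
 * switches its mapping table at commit; names and signatures are
 * illustrative, not the real X-FTL API. */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t txn_id_t;

/* Stubbed device commands; a real implementation would issue extended
 * storage commands to the SSD. */
static void dev_write_page(txn_id_t tx, uint64_t lpn, const void *buf) {
    printf("WRITE  tx=%u lpn=%llu (held as CoW pages until commit)\n",
           (unsigned)tx, (unsigned long long)lpn);
    (void)buf;
}
static void dev_commit(txn_id_t tx) {
    /* FTL atomically remaps all pages written under tx; no host journal. */
    printf("COMMIT tx=%u (mapping table switched atomically)\n", (unsigned)tx);
}
static void dev_abort(txn_id_t tx) {
    /* FTL simply discards the CoW pages; the old mapping stays valid. */
    printf("ABORT  tx=%u (old page versions remain visible)\n", (unsigned)tx);
}

int main(void) {
    char page[4096] = {0};

    dev_write_page(1, 100, page);   /* SQLite B-tree page */
    dev_write_page(1, 101, page);   /* overflow page      */
    dev_commit(1);                  /* all-or-nothing, no rollback journal */

    dev_write_page(2, 100, page);
    dev_abort(2);                   /* crash/abort: old version survives */
    return 0;
}
```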


European Conference on Computer Systems | 2017

Mosaic: Processing a Trillion-Edge Graph on a Single Machine

Steffen Maass; Changwoo Min; Sanidhya Kashyap; Woon-Hak Kang; Mohan Kumar; Taesoo Kim

Processing a one-trillion-edge graph has recently been demonstrated by distributed graph engines running on clusters of tens to hundreds of nodes. In this paper, we employ a single heterogeneous machine with fast storage media (e.g., NVMe SSDs) and massively parallel coprocessors (e.g., Xeon Phi) to reach similar dimensions. By fully exploiting the heterogeneous devices, we design a new graph processing engine, named Mosaic, for a single machine. We propose a new locality-optimizing, space-efficient graph representation, Hilbert-ordered tiles, and a hybrid execution model that enables vertex-centric operations on fast host processors and edge-centric operations on massively parallel coprocessors. Our evaluation shows that for smaller graphs, Mosaic consistently outperforms other state-of-the-art out-of-core engines by 3.2-58.6x and shows comparable performance to distributed graph engines. Furthermore, Mosaic can complete one iteration of the PageRank algorithm on a trillion-edge graph in 21 minutes, outperforming a distributed disk-based engine by 9.2x.
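The Hilbert ordering behind the tile layout can be illustrated with the standard coordinate-to-index conversion below; this is a generic sketch of Hilbert indexing applied to tile coordinates, not Mosaic's actual code.

```c
/* Sketch of Hilbert-order indexing: mapping a tile's (x, y) coordinates in
 * the adjacency matrix to its position on a Hilbert curve, so tiles that
 * are close in curve order are also close in 2D, improving locality. */
#include <stdio.h>

/* Rotate/flip a quadrant so the curve keeps its recursive orientation. */
static void rot(int n, int *x, int *y, int rx, int ry) {
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x; *x = *y; *y = t;   /* swap x and y */
    }
}

/* Map (x, y) in an n-by-n grid (n a power of two) to its Hilbert index. */
static int xy2d(int n, int x, int y) {
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        rot(n, &x, &y, rx, ry);
    }
    return d;
}

int main(void) {
    /* Print the Hilbert index of every tile in a 4x4 tiling of the
     * adjacency matrix; consecutive indices stay spatially adjacent. */
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            printf("%2d ", xy2d(4, x, y));
        printf("\n");
    }
    return 0;
}
```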


Symposium on Operating Systems Principles | 2015

Cross-checking semantic correctness: the case of finding file system bugs

Changwoo Min; Sanidhya Kashyap; Byoungyoung Lee; Chengyu Song; Taesoo Kim

Today, systems software is too complex to be bug-free. To find bugs in systems software, developers often rely on code checkers, like Linux's Sparse. However, the capability of existing tools used in commodity, large-scale systems is limited to finding only shallow bugs that tend to be introduced by simple programmer mistakes and so do not require a deep understanding of code to find. Unfortunately, the majority of bugs, as well as those that are difficult to find, are semantic ones, which violate high-level rules or invariants (e.g., missing a permission check). Thus, it is difficult for code checkers that lack an understanding of a programmer's true intention to reason about semantic correctness. To solve this problem, we present Juxta, a tool that automatically infers high-level semantics directly from source code. The key idea in Juxta is to compare and contrast multiple existing implementations that obey latent yet implicit high-level semantics. For example, the implementation of open() at the file system layer is expected to handle an out-of-space error from the disk in all file systems. We applied Juxta to 54 file systems in the stock Linux kernel (680K LoC), found 118 previously unknown semantic bugs (one bug per 5.8K LoC), and provided corresponding patches to 39 different file systems, including mature, popular ones like ext4, btrfs, XFS, and NFS. These semantic bugs are not easy to locate, as the ones found by Juxta had existed for 6.2 years on average. Not only do our empirical results look promising, but the design of Juxta is generic enough to be extended easily beyond file systems to any software that has multiple implementations, like Web browsers or protocols at the same layer of a network stack.
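The kind of deviation a cross-checker flags can be shown with a toy example: two hypothetical write paths, where the majority handles an allocation failure and one silently drops it. The code is purely illustrative and not taken from any real file system.

```c
/* Illustration (not actual kernel code) of the latent rule Juxta-style
 * cross-checking infers: if most file systems propagate an out-of-space
 * error from block allocation, an implementation that ignores it deviates
 * from the majority and is reported as a likely semantic bug. */
#include <errno.h>
#include <stdio.h>

/* Hypothetical block allocator that can fail with -ENOSPC. */
static int alloc_block(int fs_full) { return fs_full ? -ENOSPC : 42; }

/* Majority behaviour: propagate the allocation failure to the caller. */
static int fsA_write(int fs_full) {
    int blk = alloc_block(fs_full);
    if (blk < 0)
        return blk;            /* error handled, as most file systems do */
    return 0;
}

/* Deviant behaviour: the error path is missing, so a full disk looks like
 * a successful write -- exactly the pattern a cross-checker reports. */
static int fsB_write(int fs_full) {
    int blk = alloc_block(fs_full);
    (void)blk;                 /* BUG: -ENOSPC silently dropped */
    return 0;
}

int main(void) {
    printf("fsA on full disk: %d (propagates -ENOSPC)\n", fsA_write(1));
    printf("fsB on full disk: %d (hides the failure)\n", fsB_write(1));
    return 0;
}
```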


Grid Computing | 2012

VMMB: Virtual Machine Memory Balancing for Unmodified Operating Systems

Changwoo Min; Inhyeok Kim; Taehyoung Kim; Young Ik Eom

Virtualization technology has been widely adopted in Internet hosting centers and cloud-based computing services, since it reduces the total cost of ownership by sharing hardware resources among virtual machines (VMs). In a virtualized system, a virtual machine monitor (VMM) is responsible for allocating physical resources such as CPU and memory to individual VMs. Whereas CPU and I/O devices can be shared among VMs in a time-sharing manner, main memory is not amenable to such multiplexing. Moreover, it is often the primary bottleneck in achieving higher degrees of consolidation. In this paper, we present VMMB (Virtual Machine Memory Balancer), a novel mechanism to dynamically monitor the memory demand and periodically re-balance the memory among the VMs. VMMB accurately measures the memory demand with low overhead and effectively allocates memory based on the memory demand and the QoS requirement of each VM. It is applicable even to guest OSes whose source code is not available, since VMMB does not require modifying the guest kernel. We implemented our mechanism on Linux and experimented with synthetic and realistic workloads. Our experiments show that VMMB can improve the performance of VMs that suffer from insufficient memory allocation by up to 3.6 times, with low performance overhead (below 1%) for monitoring memory demand.
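A minimal sketch of demand-driven re-balancing, assuming each VM periodically reports an estimated working-set size and a QoS weight; the proportional policy and all numbers below are illustrative, not VMMB's actual algorithm.

```c
/* Minimal sketch (not the VMMB implementation) of demand-driven memory
 * re-balancing: a fixed pool of machine memory is redistributed in
 * proportion to each VM's estimated demand weighted by its QoS factor. */
#include <stdio.h>

#define NVMS 3

struct vm {
    const char *name;
    double demand_mb;   /* estimated working set (e.g., from page sampling) */
    double qos_weight;  /* higher weight -> preferential allocation */
    double alloc_mb;    /* balloon target computed by the balancer */
};

static void rebalance(struct vm vms[], int n, double total_mb) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += vms[i].demand_mb * vms[i].qos_weight;
    for (int i = 0; i < n; i++)
        vms[i].alloc_mb = total_mb * (vms[i].demand_mb * vms[i].qos_weight) / sum;
}

int main(void) {
    struct vm vms[NVMS] = {
        { "vm0", 1500.0, 1.0, 0 },
        { "vm1",  600.0, 1.0, 0 },
        { "vm2",  900.0, 2.0, 0 },   /* latency-sensitive, higher QoS */
    };
    rebalance(vms, NVMS, 4096.0);    /* 4 GB of machine memory to share */
    for (int i = 0; i < NVMS; i++)
        printf("%s: demand=%.0f MB -> target=%.0f MB\n",
               vms[i].name, vms[i].demand_mb, vms[i].alloc_mb);
    return 0;
}
```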


IEEE Transactions on Consumer Electronics | 2013

Virtual memory partitioning for enhancing application performance in mobile platforms

Geunsik Lim; Changwoo Min; Young Ik Eom

Recently, the amount of software running on smart mobile devices has been gradually increasing due to the introduction of application stores. An application store is a digital distribution platform for application software, provided as a component of the operating system on a smartphone or tablet. Mobile devices have limited memory capacity and, unlike server and desktop systems, do not have memory slots that can expand that capacity. The low memory killer (LMK) and out-of-memory killer (OOMK) are widely used memory management solutions in mobile systems; they forcibly terminate applications when the available physical memory becomes insufficient. In addition, before the forced termination, the memory shortage incurs thrashing and fragmentation, slowing down application performance. Although the existing page reclamation mechanism is designed to secure available memory, it can seriously degrade user responsiveness due to thrashing. Memory management is therefore still important, especially in mobile devices with small memory capacity. This paper presents a new memory partitioning technique that resolves the deterioration of the existing application life cycle induced by the LMK and OOMK. It provides a completely isolated virtual memory node at the operating system level. Evaluation results demonstrate that the proposed method improves application execution time under memory shortage, compared with methods in previous studies.


International Conference on Parallel Architectures and Compilation Techniques | 2013

DANBI: dynamic scheduling of irregular stream programs for many-core systems

Changwoo Min; Young Ik Eom

The stream programming model has received a lot of interest because it naturally exposes task, data, and pipeline parallelism. However, most prior work has focused on static scheduling of regular stream programs. Irregular applications cannot be handled by static scheduling, and the resulting load imbalance limits scalability on many-core systems. In this paper, we introduce the DANBI programming model, which supports irregular stream programs, and propose dynamic scheduling techniques for it. Scheduling irregular stream programs is very challenging, and load imbalance becomes a major hurdle to achieving scalability. Our dynamic load-balancing scheduler exploits the producer-consumer relationships already expressed in the stream program to achieve scalability. Moreover, it effectively avoids the thundering-herd problem and dynamically adapts to load imbalance in a probabilistic manner. It surpasses prior static stream scheduling approaches, which are vulnerable to load imbalance, as well as prior dynamic stream scheduling approaches, which have many restrictions on supported program types, on the scope of dynamic scheduling, and on preserving data ordering. Our experimental results on a 40-core server show that DANBI achieves almost linear scalability and outperforms state-of-the-art parallel runtimes by up to 2.8 times.
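A toy sketch of a probabilistic, queue-occupancy-driven scheduling decision in the spirit described above; the thresholds and probabilities are invented for illustration and do not reproduce DANBI's scheduler.

```c
/* Sketch (not DANBI's scheduler) of a probabilistic, queue-driven
 * scheduling decision: a worker tends to move toward the consumer when its
 * output queue fills up and toward the producer when its input queue
 * drains, with a randomized choice so that many workers do not all migrate
 * to the same kernel at once (the thundering-herd problem). */
#include <stdio.h>
#include <stdlib.h>

enum move { STAY, TO_PRODUCER, TO_CONSUMER };

static enum move schedule_next(double in_fill, double out_fill) {
    /* in_fill / out_fill: queue occupancy in [0, 1]. */
    double r = (double)rand() / RAND_MAX;
    if (out_fill > 0.8 && r < out_fill)
        return TO_CONSUMER;            /* yield the core to the consumer  */
    if (in_fill < 0.2 && r < (1.0 - in_fill))
        return TO_PRODUCER;            /* yield the core to the producer  */
    return STAY;                       /* keep running the current kernel */
}

int main(void) {
    srand(42);
    const char *names[] = { "STAY", "TO_PRODUCER", "TO_CONSUMER" };
    /* A nearly full output queue usually hands the core to the consumer. */
    for (int i = 0; i < 5; i++)
        printf("in=0.5 out=0.9 -> %s\n", names[schedule_next(0.5, 0.9)]);
    /* A nearly empty input queue usually hands the core to the producer. */
    for (int i = 0; i < 5; i++)
        printf("in=0.1 out=0.4 -> %s\n", names[schedule_next(0.1, 0.4)]);
    return 0;
}
```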


IEEE Transactions on Consumer Electronics | 2015

Effective flash-based SSD caching for high performance home cloud server

Dongwoo Lee; Changwoo Min; Young Ik Eom

In the home cloud environment, the storage performance of home cloud servers, which govern connected devices and provide resources with virtualization features, is critical to improving the end-user experience. To improve the storage performance of virtualized home cloud servers in a cost-effective manner, caching schemes using flash-based solid state drives (SSDs) have been widely studied. Although previous studies successfully narrow the speed gap between memory and hard disk drives, they focus only on how to manage the cache space and pay less attention to using the cache space efficiently in light of the characteristics of flash-based SSDs. Moreover, the SSD is typically used as a read-only cache due to two well-known limitations of SSDs: slow writes and limited lifespan. Since storage access in virtual machines is performed in a more complex and costly manner, these limitations affect storage performance even more significantly. This paper proposes a novel SSD caching scheme and virtual disk image format, named sequential virtual disk (SVD), for achieving high-performance home cloud environments. The proposed techniques are based on the workload characteristics, in which synchronous random writes dominate, while taking into consideration the characteristics of flash memory and the storage stack of virtualized systems. Unlike previous studies, the SSD is used as a read-write cache in the proposed caching scheme to effectively mitigate the performance degradation caused by synchronous random writes. The prototype was evaluated with realistic workloads, showing that the proposed scheme improves storage access performance by 21% to 112% and reduces the number of erasures on the SSD by about 56% on average.
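A minimal write-back caching sketch under the stated workload assumption (synchronous random writes dominate); the direct-mapped structure and function names are hypothetical and far simpler than the actual SVD design.

```c
/* Minimal write-back caching sketch (not the paper's SVD implementation):
 * synchronous random writes are absorbed by the SSD cache and acknowledged
 * once durable there, then destaged to the backing virtual disk later;
 * reads are served from the cache on a hit. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_BLOCKS 8

struct cache_entry {
    uint64_t lba;
    bool     valid;
    bool     dirty;      /* written on SSD but not yet destaged to the image */
};

static struct cache_entry cache[CACHE_BLOCKS];

static struct cache_entry *lookup(uint64_t lba) {
    struct cache_entry *e = &cache[lba % CACHE_BLOCKS];   /* direct-mapped */
    return (e->valid && e->lba == lba) ? e : NULL;
}

/* A synchronous write completes as soon as the SSD copy is durable. */
static void sync_write(uint64_t lba) {
    struct cache_entry *e = &cache[lba % CACHE_BLOCKS];
    if (e->valid && e->dirty && e->lba != lba)
        printf("destage lba=%llu to backing image\n", (unsigned long long)e->lba);
    e->lba = lba; e->valid = true; e->dirty = true;
    printf("ack write lba=%llu from SSD cache\n", (unsigned long long)lba);
}

static void read_block(uint64_t lba) {
    printf("read lba=%llu: %s\n", (unsigned long long)lba,
           lookup(lba) ? "SSD cache hit" : "miss, go to backing image");
}

int main(void) {
    sync_write(3); sync_write(11);   /* 11 evicts 3 (same direct-mapped slot) */
    read_block(11); read_block(3);
    return 0;
}
```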


International Conference on Consumer Electronics | 2014

Reducing excessive journaling overhead in mobile devices with small-sized NVRAM

Jung-Hoon Kim; Changwoo Min; Young Ik Eom

Excessive journaling degrades the performance and shortens the lifetime of NAND flash storage in mobile devices. We propose a novel journaling scheme that resolves this problem by using small-sized NVRAM efficiently. Experimental results show that our proposed scheme outperforms ext4 by up to 16.8 times for synthetic workloads. For the TPC-C SQLite benchmark, it also enhances transaction throughput by 20% and reduces the number of journal writes by 58% with only 16 MB of NVRAM.


International Conference on Consumer Electronics | 2013

Enhancing application performance by memory partitioning in Android platforms

Geunsik Lim; Changwoo Min; Young Ik Eom

This paper suggests a new memory partitioning scheme that can enhance the process life cycle while avoiding low memory killer and out-of-memory killer operations on mobile devices. Our proposed scheme provides completely isolated virtual memory nodes at the operating system level on Android devices.


Operating Systems Review | 2016

Opportunistic Spinlocks: Achieving Virtual Machine Scalability in the Clouds

Sanidhya Kashyap; Changwoo Min; Taesoo Kim

With increasing demand for big-data processing and faster in-memory databases, cloud providers are moving towards large virtualized instances in addition to focusing on horizontal scalability. However, our experiments reveal that such instances in popular cloud services (e.g., a 32-vCPU instance with 208 GB of memory on Google Compute Engine) do not achieve the desired scalability with increasing core count, even for a simple, embarrassingly parallel job (e.g., a Linux kernel compile). More seriously, the internal synchronization scheme (e.g., the paravirtualized ticket spinlock) of a virtualized instance on a machine with a higher core count (e.g., 80 cores) dramatically degrades its overall performance. Our finding is different from the previously well-known scalability problem (i.e., the lock contention problem) and occurs because of sophisticated optimization techniques implemented in the hypervisor, which we call the sleepy spinlock anomaly. To solve this problem, we design and implement OTICKET, a variant of the paravirtualized ticket spinlock that effectively scales virtualized instances in both undersubscribed and oversubscribed environments.
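For context, a plain ticket spinlock looks like the sketch below; paravirtualized variants such as OTICKET add a hypervisor hook in the spin loop so that a waiter whose turn is far away can halt and be woken near its turn instead of burning (or blocking on) the physical CPU. The code shows only the generic ticket lock, not OTICKET itself.

```c
/* Generic ticket spinlock sketch: each acquirer takes a ticket and spins
 * until the "now serving" counter reaches it. Under virtualization, a
 * waiter whose vCPU is descheduled stalls everyone queued behind it, which
 * is what paravirtualized variants address inside the spin loop. */
#include <stdatomic.h>
#include <stdio.h>

struct ticket_lock {
    atomic_uint next;     /* next ticket to hand out */
    atomic_uint serving;  /* ticket currently served */
};

static void lock(struct ticket_lock *l) {
    unsigned me = atomic_fetch_add(&l->next, 1);
    while (atomic_load(&l->serving) != me) {
        /* Paravirtual variants would call into the hypervisor here
         * (e.g., halt and request a wakeup) after spinning for a while. */
    }
}

static void unlock(struct ticket_lock *l) {
    atomic_fetch_add(&l->serving, 1);
}

int main(void) {
    struct ticket_lock l = { 0, 0 };
    lock(&l);
    puts("in critical section");
    unlock(&l);
    return 0;
}
```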

Collaboration


Dive into Changwoo Min's collaborations.

Top Co-Authors

Young Ik Eom (Sungkyunkwan University)

Taesoo Kim (Georgia Institute of Technology)

Sanidhya Kashyap (Georgia Institute of Technology)

Jeehong Kim (Sungkyunkwan University)

Inhyeok Kim (Sungkyunkwan University)

Sang-Won Lee (Sungkyunkwan University)

Steffen Maass (Georgia Institute of Technology)