
Publication


Featured research published by Hyeontaek Lim.


Symposium on Operating Systems Principles (SOSP) | 2011

SILT: a memory-efficient, high-performance key-value store

Hyeontaek Lim; Bin Fan; David G. Andersen; Michael Kaminsky

SILT (Small Index Large Table) is a memory-efficient, high-performance key-value store system based on flash storage that scales to serve billions of key-value items on a single node. It requires only 0.7 bytes of DRAM per entry and retrieves key/value pairs using on average 1.01 flash reads each. SILT combines new algorithmic and systems techniques to balance the use of memory, storage, and computation. Our contributions include: (1) the design of three basic key-value stores each with a different emphasis on memory-efficiency and write-friendliness; (2) synthesis of the basic key-value stores to build a SILT key-value store system; and (3) an analytical model for tuning system parameters carefully to meet the needs of different workloads. SILT requires one to two orders of magnitude less memory to provide comparable throughput to current high-performance key-value systems on a commodity desktop system with flash storage.
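The multi-store design can be sketched in miniature. The following is a hypothetical illustration of the idea only, not SILT's implementation: a write-friendly log store absorbs updates at the cost of a full in-memory index, and is later converted into a sorted store where binary search over sorted entries stands in for a compact index.

```python
import bisect

class LogStore:
    """Write-friendly: appends to a log, at the cost of a full in-memory index."""
    def __init__(self):
        self.log = []            # simulates an append-only flash log
        self.index = {}          # key -> log offset (memory-hungry)

    def put(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        off = self.index.get(key)
        return None if off is None else self.log[off][1]

class SortedStore:
    """Read-optimized: entries kept sorted (on flash in SILT), so lookups
    need no per-key DRAM entry; binary search stands in for a compact index."""
    def __init__(self, entries):
        self.entries = sorted(entries)       # simulates a sorted on-flash file

    def get(self, key):
        i = bisect.bisect_left(self.entries, (key,))
        if i < len(self.entries) and self.entries[i][0] == key:
            return self.entries[i][1]
        return None

def convert(log_store):
    """Background conversion: keep the latest value per key, then bulk-sort."""
    latest = {k: log_store.log[off][1] for k, off in log_store.index.items()}
    return SortedStore(latest.items())
```

The real system interposes a third, intermediate store and uses far more compact indexes than a Python dict; this sketch only shows the write-friendly-to-memory-efficient conversion pipeline.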


International Conference on Virtual Execution Environments (VEE) | 2009

Task-aware virtual machine scheduling for I/O performance

Hwanju Kim; Hyeontaek Lim; Jinkyu Jeong; Heeseung Jo; Joonwon Lee

As virtualization is adopted in virtual desktop and cloud computing environments, it must accommodate increasingly diverse and unpredictable workloads. Since a virtual machine monitor lacks knowledge of each virtual machine's internals, this unpredictability makes resource allocation difficult. In particular, virtual machine scheduling has a critical impact on I/O performance in cases where the virtual machine monitor is agnostic about the internal workloads of virtual machines. This paper presents a task-aware virtual machine scheduling mechanism based on inference techniques using gray-box knowledge. The proposed mechanism infers the I/O-boundness of guest-level tasks and correlates incoming events with I/O-bound tasks. With this information, we introduce partial boosting, a priority-boosting mechanism with task-level granularity, so that an I/O-bound task is selectively scheduled to handle its incoming events promptly. Our technique focuses on improving the performance of I/O-bound tasks within heterogeneous workloads using lightweight mechanisms while preserving complete CPU fairness among virtual machines. All implementation is confined to the virtualization layer, based on the Xen virtual machine monitor and its credit scheduler. We evaluate our prototype in terms of I/O performance and CPU fairness over synthetic mixed workloads and realistic applications.
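As a hypothetical illustration of the gray-box inference and partial boosting described above (class names and the threshold are invented for this sketch, not Xen code): tasks that block soon after running are inferred to be I/O-bound, and only those tasks jump the run queue when an I/O event arrives.

```python
from collections import deque

class Task:
    def __init__(self, name, cpu_burst):
        self.name = name
        self.cpu_burst = cpu_burst           # CPU time used before blocking

class Scheduler:
    IO_BOUND_THRESHOLD = 2                   # short bursts suggest I/O-boundness

    def __init__(self):
        self.runqueue = deque()
        self.io_bound = set()

    def account(self, task):
        # gray-box inference: observe CPU bursts without guest cooperation;
        # a task that blocks quickly after running is deemed I/O-bound
        if task.cpu_burst <= self.IO_BOUND_THRESHOLD:
            self.io_bound.add(task.name)

    def on_io_event(self, task):
        # partial boosting: only an inferred I/O-bound task jumps the run
        # queue so it can handle its incoming event promptly
        if task.name in self.io_bound:
            self.runqueue.appendleft(task)
        else:
            self.runqueue.append(task)
```

The real mechanism operates on virtual CPUs inside the credit scheduler and caps boosting to preserve CPU fairness, which this sketch omits.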


Conference on Emerging Networking Experiments and Technologies (CoNEXT) | 2013

Scalable, high performance ethernet forwarding with CuckooSwitch

Dong Zhou; Bin Fan; Hyeontaek Lim; Michael Kaminsky; David G. Andersen

Several emerging network trends and new architectural ideas are placing increasing demand on forwarding table sizes. From massive-scale datacenter networks running millions of virtual machines to flow-based software-defined networking, many intriguing design options require FIBs that can scale well beyond the thousands or tens of thousands possible using today's commodity switching chips. This paper presents CuckooSwitch, a software-based Ethernet switch design built around a memory-efficient, high-performance, and highly concurrent hash table for compact and fast FIB lookup. We show that CuckooSwitch can process 92.22 million minimum-sized packets per second on a commodity server equipped with eight 10 Gbps Ethernet interfaces while maintaining a forwarding table of one billion forwarding entries. This rate is the maximum packets per second achievable across the underlying hardware's PCI buses.
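A hypothetical sketch of the core data structure behind such a FIB (illustrative code, not the paper's implementation): a 2-way cuckoo hash table mapping MAC addresses to output ports. Each key has two candidate slots, lookups probe at most two locations regardless of occupancy, and inserts may displace ("kick") a resident to its alternate slot.

```python
import hashlib

class CuckooTable:
    def __init__(self, size=64, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks
        self.slots = [None] * size           # each slot: (key, value) or None

    def _hashes(self, key):
        d = hashlib.blake2b(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "little") % self.size
        h2 = int.from_bytes(d[8:16], "little") % self.size
        return h1, h2

    def lookup(self, key):
        # at most two probes, independent of how full the table is
        for i in self._hashes(key):
            if self.slots[i] is not None and self.slots[i][0] == key:
                return self.slots[i][1]
        return None

    def insert(self, key, value):
        for i in self._hashes(key):          # update in place if present
            if self.slots[i] is not None and self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return True
        item = (key, value)
        i = self._hashes(key)[0]
        for _ in range(self.max_kicks):
            if self.slots[i] is None:
                self.slots[i] = item
                return True
            self.slots[i], item = item, self.slots[i]   # kick the resident
            h1, h2 = self._hashes(item[0])
            i = h2 if i == h1 else h1        # send it to its alternate slot
        return False                          # too full; a real switch would resize
```

A production FIB adds multi-way buckets, optimistic concurrent reads, and batched memory prefetching, all of which this sketch omits.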


International Symposium on Computer Architecture (ISCA) | 2015

Architecting to achieve a billion requests per second throughput on a single key-value store server platform

Sheng Li; Hyeontaek Lim; Victor W. Lee; Jung Ho Ahn; Anuj Kalia; Michael Kaminsky; David G. Andersen; Seongil O; Sukhan Lee; Pradeep Dubey

Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented datacenter infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of datacenters. Traditionally, these systems have had significant overheads from inefficient network processing, OS kernel involvement, and concurrency control. Two recent research thrusts have focused on improving key-value performance. Hardware-centric research has started to explore specialized platforms including FPGAs for KVSs; results demonstrated an order of magnitude increase in throughput and energy efficiency over stock memcached. Software-centric research revisited the KVS application to address fundamental software bottlenecks and to exploit the full potential of modern commodity hardware; these efforts too showed orders of magnitude improvement over stock memcached. We aim to architect high-performance, efficient KVS platforms, and we start with a rigorous architectural characterization across system stacks over a collection of representative KVS implementations. Our detailed full-system characterization not only identifies the critical hardware/software ingredients for high-performance KVS systems, but also leads to guided optimizations atop a recent design to achieve a record-setting throughput of 120 million requests per second (MRPS) on a single commodity server. Our implementation delivers 9.2X the performance (RPS) and 2.8X the system energy efficiency (RPS/watt) of the best-published FPGA-based claims. We craft a set of design principles for future platform architectures, and via detailed simulations demonstrate the capability of achieving a billion RPS with a single server constructed following our principles.


ACM Symposium on Cloud Computing (SoCC) | 2011

Small cache, big effect: provable load balancing for randomly partitioned cluster services

Bin Fan; Hyeontaek Lim; David G. Andersen; Michael Kaminsky

Load balancing requests across a cluster of back-end servers is critical for avoiding performance bottlenecks and meeting service-level objectives (SLOs) in large-scale cloud computing services. This paper shows how a small, fast popularity-based front-end cache can ensure load balancing for an important class of such services; furthermore, we prove an O(n log n) lower-bound on the necessary cache size and show that this size depends only on the total number of back-end nodes n, not the number of items stored in the system. We validate our analysis through simulation and empirical results running a key-value storage system on an 85-node cluster.
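The effect can be demonstrated with a hypothetical simulation (parameters and the popularity distribution are illustrative, not the paper's experiments): a front-end cache holding only the O(n log n) hottest keys flattens per-node load even under heavily skewed request popularity.

```python
import math
import random
from collections import Counter

def simulate(n_nodes=8, n_keys=10000, n_reqs=200000, cache_size=None, seed=1):
    """Return the back-end load imbalance factor (1.0 = perfectly balanced)."""
    rng = random.Random(seed)
    if cache_size is None:
        cache_size = int(n_nodes * math.log(n_nodes)) + 1   # O(n log n) in nodes
    # Zipf-like popularity: key i is requested with weight 1/(i+1)
    weights = [1.0 / (i + 1) for i in range(n_keys)]
    requests = rng.choices(range(n_keys), weights=weights, k=n_reqs)
    cached = set(range(cache_size))                  # the hottest keys
    node_of = {k: rng.randrange(n_nodes) for k in range(n_keys)}  # random partition
    load = Counter()
    for k in requests:
        if k not in cached:                          # cache hits bypass the back-ends
            load[node_of[k]] += 1
    total = sum(load.values())
    return max(load.values()) / (total / n_nodes)    # peak load / average load
```

With these defaults, disabling the cache (`cache_size=0`) yields a markedly higher imbalance factor than the O(n log n)-sized cache, because the cache absorbs exactly the hot keys that would otherwise overload whichever node they hash to.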


Journal of Parallel and Distributed Computing | 2011

Transparently bridging semantic gap in CPU management for virtualized environments

Hwanju Kim; Hyeontaek Lim; Jinkyu Jeong; Heeseung Jo; Joonwon Lee; Seungryoul Maeng

Consolidated environments are progressively accommodating diverse and unpredictable workloads in conjunction with virtual desktop infrastructure and cloud computing. Unpredictable workloads, however, aggravate the semantic gap between the virtual machine monitor and guest operating systems, leading to inefficient resource management. In particular, CPU management for virtual machines has a critical impact on I/O performance in cases where the virtual machine monitor is agnostic about the internal workloads of each virtual machine. This paper presents virtual machine scheduling techniques for transparently bridging the semantic gap caused by consolidated workloads. To achieve this goal, we make the virtual machine monitor aware of task-level I/O-boundedness inside a virtual machine using inference techniques, thereby improving I/O performance without compromising CPU fairness. In addition, we address performance anomalies arising from the indirect use of I/O devices via a driver virtual machine at the scheduling level. The proposed techniques are implemented on the Xen virtual machine monitor and evaluated with micro-benchmarks and real workloads on Linux and Windows guest operating systems.


Algorithm Engineering and Experiments (ALENEX) | 2013

Practical batch-updatable external hashing with sorting

Hyeontaek Lim; David G. Andersen; Michael Kaminsky

This paper presents a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets (millions to billions of items) with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1-KiB key-value pairs). Our scheme combines three key techniques: (1) a new index data structure (Entropy-Coded Tries); (2) the use of sorting as the main data manipulation method; and (3) support for incremental index construction for dynamic datasets. We evaluate our scheme by building an external dictionary on flash-based drives and demonstrate our scheme's high performance, compactness, and practicality.
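A much-simplified sketch of the indexing idea (not Entropy-Coded Tries themselves; names and the hash are illustrative): a trie over the bits of each key's hash maps the key to its rank among the sorted entries, so the entry can be fetched with a single read at that offset in the sorted external file.

```python
import hashlib

def key_hash(key):
    # 32-bit hash; assumed collision-free for this small example
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

def build_trie(hashes, bit=31):
    # leaf: zero or one stored hash remains under this bit prefix
    if len(hashes) <= 1:
        return len(hashes)
    left = [x for x in hashes if not (x >> bit) & 1]
    right = [x for x in hashes if (x >> bit) & 1]
    return (len(left), build_trie(left, bit - 1), build_trie(right, bit - 1))

def rank(trie, hval, bit=31):
    # count how many stored hashes sort before hval
    r, node = 0, trie
    while isinstance(node, tuple):
        nleft, lsub, rsub = node
        if (hval >> bit) & 1:
            r, node = r + nleft, rsub    # the whole left subtree is smaller
        else:
            node = lsub
        bit -= 1
    return r

class ExternalDict:
    def __init__(self, items):
        # "flash": entries sorted by key hash; only the trie lives in memory
        self.flash = sorted((key_hash(k), k, v) for k, v in items)
        self.trie = build_trie([e[0] for e in self.flash])

    def get(self, key):
        r = rank(self.trie, key_hash(key))
        if r < len(self.flash) and self.flash[r][1] == key:
            return self.flash[r][2]      # exactly one "flash read"
        return None
```

The real index entropy-codes the trie down to roughly 2.5 bits per item and rebuilds it via sorting for batch updates, which this sketch omits.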


ACM SIGCOMM | 2012

Supporting network evolution and incremental deployment with XIA

Robert Grandl; Dongsu Han; Suk-Bok Lee; Hyeontaek Lim; Michel Machado; Matthew K. Mukerjee; David Naylor

eXpressive Internet Architecture (XIA) [1] is an architecture that natively supports multiple communication types and allows networks to evolve their abstractions and functionality to accommodate new styles of communication over time. XIA embeds an elegant mechanism for handling unforeseen communication types for legacy routers. In this demonstration, we show that XIA overcomes three key barriers in network evolution (outlined below) by (1) allowing end-hosts and applications to start using new communication types (e.g., service and content) before the network supports them, (2) ensuring that upgrading a subset of routers to support new functionalities immediately benefits applications, and (3) using the same mechanisms we employ for (1) and (2) to incrementally deploy XIA in IP networks.


International Conference on Management of Data (SIGMOD) | 2017

Cicada: Dependably Fast Multi-Core In-Memory Transactions

Hyeontaek Lim; Michael Kaminsky; David G. Andersen

Multi-core in-memory databases promise high-speed online transaction processing. However, the performance of individual designs suffers when the workload characteristics miss their small sweet spot of a desired contention level, read-write ratio, record size, processing rate, and so forth. Cicada is a single-node multi-core in-memory transactional database with serializability. To provide high performance under diverse workloads, Cicada reduces overhead and contention at several levels of the system by leveraging optimistic and multi-version concurrency control schemes and multiple loosely synchronized clocks while mitigating their drawbacks. On the TPC-C and YCSB benchmarks, Cicada outperforms Silo, TicToc, FOEDUS, MOCC, two-phase locking, Hekaton, and ERMIA in most scenarios, achieving up to 3X higher throughput than the next fastest design. It handles up to 2.07 M TPC-C transactions per second and 56.5 M YCSB transactions per second, and scans up to 356 M records per second on a single 28-core machine.
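An illustrative sketch of one ingredient named in the abstract, multi-version concurrency control (not Cicada's actual design; names are invented): each record keeps a list of timestamped versions, and a transaction reads the newest version whose write timestamp does not exceed its own, giving it a consistent snapshot.

```python
class Clock:
    """Stand-in for Cicada's loosely synchronized clocks: a single
    monotonic counter issuing transaction timestamps."""
    def __init__(self):
        self._t = 0

    def next_ts(self):
        self._t += 1
        return self._t

class MVRecord:
    """A record stored as versions ordered by increasing write timestamp."""
    def __init__(self):
        self.versions = []                   # list of (write_ts, value)

    def write(self, ts, value):
        # this sketch only allows writes in timestamp order
        assert not self.versions or ts > self.versions[-1][0]
        self.versions.append((ts, value))

    def read(self, ts):
        # snapshot read: newest version with write_ts <= ts
        for wts, val in reversed(self.versions):
            if wts <= ts:
                return val
        return None
```

The real system adds optimistic validation, per-thread distributed clocks, and garbage collection of stale versions; the point here is only that readers at an earlier timestamp are never blocked by, or exposed to, later writes.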


IEEE Micro | 2016

Achieving One Billion Key-Value Requests per Second on a Single Server

Sheng Li; Hyeontaek Lim; Victor W. Lee; Jung Ho Ahn; Anuj Kalia; Michael Kaminsky; David G. Andersen; Seongil O; Sukhan Lee; Pradeep Dubey

Distributed in-memory key-value stores (KVSs) have become a critical data-serving layer in cloud computing and big data infrastructure. Unfortunately, KVSs have demonstrated a gap between achieved and available performance, QoS, and energy efficiency on commodity platforms. Two research thrusts have focused on improving key-value performance: hardware-centric research has started to explore specialized platforms for KVSs, and software-centric research revisited the KVS application to address fundamental software bottlenecks. Unlike prior research focusing on hardware or software in isolation, the authors aimed to architect high-performance and efficient KVS platforms across the full stack (software through hardware). Their full-system characterization identifies the critical hardware/software ingredients for high-performance KVS systems and suggests optimizations to achieve record-setting performance and energy efficiency: 120 to 167 million requests per second (RPS) on a single commodity server. They propose a future many-core platform and, via detailed simulations, demonstrate the capability of achieving a billion RPS with a single server platform.

Collaboration


Dive into Hyeontaek Lim's collaborations.

Top Co-Authors

David G. Andersen (Carnegie Mellon University)
Bin Fan (Carnegie Mellon University)
Dong Zhou (Carnegie Mellon University)
Aditya Akella (University of Wisconsin-Madison)
Anuj Kalia (Carnegie Mellon University)
Peter Steenkiste (Carnegie Mellon University)