Juchang Lee
Seoul National University
Publication
Featured research published by Juchang Lee.
international conference on data engineering | 2013
Juchang Lee; Yong Sik Kwon; Franz Färber; Michael Muehle; Chulwon Lee; Christian Bensberg; Joo Yeon Lee; Arthur H. Lee; Wolfgang Lehner
One of the core principles of the SAP HANA database system is comprehensive support for a distributed query facility. Supporting scale-out scenarios was one of the major design principles of the system from the very beginning. In this paper, we first give an overview of the overall functionality with respect to data allocation, metadata caching, and query routing. We then discuss specific topics in more detail and explain features and methods that are not common in traditional disk-based database systems. In summary, the paper provides a comprehensive overview of distributed query processing in the SAP HANA database, which achieves the scalability needed to handle large databases and heterogeneous types of workloads.
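The abstract names metadata caching and query routing without detailing them. The sketch below assumes a simple scheme purely for illustration: a client caches a table-to-node map, sends each statement directly to the cached owner, and refreshes its cache only when the target node reports that it no longer hosts the table. `Cluster`, `RoutingClient`, and the refresh-on-miss rule are hypothetical names and choices, not SAP HANA's actual client protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical server side: authoritative table -> node placement."""
    placement: dict[str, str]

    def execute_on(self, node: str, table: str) -> bool:
        # A node accepts the statement only if it currently hosts the table.
        return self.placement.get(table) == node


@dataclass
class RoutingClient:
    """Hypothetical client that caches placement metadata to route statements."""
    cluster: Cluster
    cache: dict[str, str] = field(default_factory=dict)
    metadata_fetches: int = 0

    def _refresh(self, table: str) -> str:
        # Fetch the current placement from the cluster and cache it.
        self.metadata_fetches += 1
        self.cache[table] = self.cluster.placement[table]
        return self.cache[table]

    def execute(self, table: str) -> str:
        node = self.cache.get(table) or self._refresh(table)
        if not self.cluster.execute_on(node, table):
            # Cached entry is stale (e.g. the table was moved): refresh, retry once.
            node = self._refresh(table)
            if not self.cluster.execute_on(node, table):
                raise RuntimeError(f"cannot route statement on {table!r}")
        return node


cluster = Cluster({"orders": "node1", "customers": "node2"})
client = RoutingClient(cluster)
print(client.execute("orders"), client.execute("orders"), client.metadata_fetches)  # node1 node1 1
cluster.placement["orders"] = "node3"  # table relocated to another node
print(client.execute("orders"), client.metadata_fetches)  # node3 2
```

Caching keeps the common case at one round trip per statement; the cost of a stale entry is paid only when data actually moves.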
embedded and real-time computing systems and applications | 1995
Sang Kyun Cha; Byoung Dae Park; Sehwan Lee; S. H. Song; Jang Ho Park; Juchang Lee; S. Y. Park; D. Y. Hur; G. B. Kim
Many applications, such as telecommunication, process control, and virtual reality, require real-time access to databases. Main-memory DBMSs, which become feasible with the increasing availability of large and relatively cheap memory, can provide better performance than disk-based systems for real-time applications. This paper presents the overall architecture of M²RT, a main-memory real-time DBMS, and an object-oriented design of its storage system, called M²RTSS. M²RTSS provides classes that implement the core functionality of storage management, real-time transaction scheduling, and recovery. Implementation-specific information is encapsulated in these classes, and extensions can be made by inheritance. With these object-oriented features, M²RTSS can easily incorporate new developments in application requirements and the results of ongoing research in real-time systems.
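To illustrate what "extensions can be made by inheritance" can look like for real-time transaction scheduling, here is a minimal sketch in which a base scheduler fixes the queueing mechanics and a subclass overrides only the priority policy. The class names and the choice of earliest-deadline-first (a standard real-time policy) as the extension are assumptions for illustration, not M²RTSS's actual classes.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class _Entry:
    key: float                       # priority value; smaller runs first
    seq: int                         # tie-breaker: submission order
    name: str = field(compare=False)


class TransactionScheduler:
    """Hypothetical base class: queueing mechanics plus a default FIFO policy."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._seq = count()

    def priority(self, arrival: float, deadline: float) -> float:
        # Default policy: earlier arrival runs first (deadline ignored).
        return arrival

    def submit(self, name: str, arrival: float, deadline: float) -> None:
        entry = _Entry(self.priority(arrival, deadline), next(self._seq), name)
        heapq.heappush(self._heap, entry)

    def next_transaction(self) -> str:
        return heapq.heappop(self._heap).name


class EDFScheduler(TransactionScheduler):
    """Extension by inheritance: earliest-deadline-first for real-time work."""

    def priority(self, arrival: float, deadline: float) -> float:
        return deadline


for sched in (TransactionScheduler(), EDFScheduler()):
    sched.submit("T1", arrival=0.0, deadline=9.0)
    sched.submit("T2", arrival=1.0, deadline=3.0)
    print(type(sched).__name__, sched.next_transaction())
# TransactionScheduler T1
# EDFScheduler T2
```

The subclass reuses the heap and tie-breaking logic unchanged, so a new scheduling policy touches one method.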
very large data bases | 2014
Carsten Binnig; Stefan Hildenbrand; Franz Färber; Donald Kossmann; Juchang Lee; Norman May
Modern database systems employ Snapshot Isolation to implement concurrency control and isolation because it promises superior query performance compared to lock-based alternatives. Furthermore, Snapshot Isolation never blocks readers, which is an important property for modern information systems with mixed workloads of heavy OLAP queries and short update transactions. This paper revisits the problem of implementing Snapshot Isolation in a distributed database system and makes three important contributions. First, a complete definition of Distributed Snapshot Isolation is given, extending existing definitions from the literature. Based on this definition, a set of criteria is proposed to efficiently implement Snapshot Isolation in a distributed system. Second, the design space of alternative methods to implement Distributed Snapshot Isolation is presented based on this set of criteria. Third, a new approach to implement Distributed Snapshot Isolation is devised; we refer to this approach as Incremental. The results of comprehensive performance experiments with the TPC-C benchmark show that the Incremental approach significantly outperforms any other known method from the literature. Furthermore, the Incremental approach requires no a priori knowledge of which nodes of a distributed system are involved in executing a transaction. Also, the Incremental approach can execute transactions that involve data from only a single node with the same efficiency as a centralized database system. This way, the Incremental approach takes advantage of sharding or other ways to improve data locality. The cost of synchronizing transactions in a distributed system is paid only by transactions that actually involve data from several nodes. All these properties make the Incremental approach more practical than related methods proposed in the literature.
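As a reference point for the distributed variants the paper compares, the following is a minimal sketch of the classic centralized Snapshot Isolation rules: a transaction reads the latest version committed at or before its snapshot timestamp, and a commit aborts if another transaction has already committed a newer version of a key it wrote (first-committer-wins). It is not the paper's Incremental approach, and `Store`, `Txn`, and `WriteConflict` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class WriteConflict(Exception):
    """Raised when a concurrent transaction already committed a newer version."""


@dataclass
class Store:
    # key -> list of (commit_ts, value), appended in commit order
    versions: dict[str, list[tuple[int, str]]] = field(default_factory=dict)
    clock: int = 0  # timestamp of the most recent commit

    def begin(self) -> Txn:
        # A new transaction's snapshot includes everything committed so far.
        return Txn(self, snapshot_ts=self.clock)


@dataclass
class Txn:
    store: Store
    snapshot_ts: int
    writes: dict[str, str] = field(default_factory=dict)

    def read(self, key: str) -> str | None:
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        history = self.store.versions.get(key, [])
        visible = [v for ts, v in history if ts <= self.snapshot_ts]
        return visible[-1] if visible else None

    def write(self, key: str, value: str) -> None:
        self.writes[key] = value  # buffered until commit

    def commit(self) -> int:
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise WriteConflict(key)  # first committer wins
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return self.store.clock


store = Store()
t0 = store.begin()
t0.write("x", "a")
t0.commit()
reader, w1, w2 = store.begin(), store.begin(), store.begin()
w1.write("x", "b")
w1.commit()
print(reader.read("x"))  # a
w2.write("x", "c")
try:
    w2.commit()
except WriteConflict:
    print("w2 aborted")
```

`reader` never waits for `w1`; it simply ignores versions committed after its snapshot, which is the non-blocking-reader property the abstract emphasizes.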
international conference on management of data | 2016
Juchang Lee; Hyungyu Shin; Chang Gyoo Park; Seongyun Ko; Jaeyun Noh; Yongjae Chuh; Wolfgang Stephan; Wook-Shin Han
While multi-version concurrency control (MVCC) supports fast and robust performance in in-memory relational databases, it has a potential problem: the number of versions grows over time as obsolete versions accumulate. Although enterprise machines offer a few TB of main memory, memory should be used carefully for economic and practical reasons. Thus, to keep only the versions that MVCC actually needs, versions that will no longer be used must be deleted. This process is called garbage collection. MVCC uses the concept of visibility to define garbage. For each record, versions whose version timestamps are lower than the minimum snapshot timestamp of the active snapshots in the system are first identified as candidates. All such candidates, except the one with the maximum version timestamp, can be safely reclaimed as garbage versions. In mixed OLTP and OLAP workloads, a typical garbage collector may not reclaim record versions effectively. In these workloads, OLTP applications generate a high volume of new versions, while long-lived queries or transactions in OLAP applications often block garbage collection, because the version timestamp of each record version must be compared with the snapshot timestamp of the oldest, long-lived snapshot. Thus, these workloads typically cause the in-memory version space to grow. Additionally, version chains that grow over time also increase the cost of traversing them. In this paper, we present an efficient and effective garbage collector called HybridGC in SAP HANA. HybridGC integrates three novel concepts of garbage collection: timestamp-based group garbage collection, table garbage collection, and interval garbage collection. Through experiments using mixed OLTP and OLAP workloads, we show that HybridGC effectively and efficiently collects garbage versions with negligible overhead.
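The garbage definition in the abstract translates almost directly into code. The function below applies it to the version timestamps of a single record. The function name and the treatment of an empty set of active snapshots (every version except the newest becomes a candidate) are assumptions, and none of HybridGC's group, table, or interval techniques are modeled.

```python
import math


def reclaimable_versions(version_ts: list[int], active_snapshot_ts: list[int]) -> list[int]:
    """Version timestamps of one record that are garbage under the visibility rule."""
    # The oldest active snapshot bounds what any reader might still need.
    horizon = min(active_snapshot_ts, default=math.inf)
    # Candidates: versions older than every active snapshot.
    candidates = [ts for ts in version_ts if ts < horizon]
    if not candidates:
        return []
    # The newest candidate is still visible to the oldest snapshot, so keep it.
    newest = max(candidates)
    return sorted(ts for ts in candidates if ts != newest)


versions = [10, 20, 30, 40]
print(reclaimable_versions(versions, active_snapshot_ts=[35, 50]))  # [10, 20]
# A long-lived OLAP snapshot taken at ts=5 blocks reclamation entirely.
print(reclaimable_versions(versions, active_snapshot_ts=[5, 50]))   # []
```

The second call shows the problem the paper targets: a single old snapshot pins every version of every record, no matter how many newer snapshots exist.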
database and expert systems applications | 1999
Jang Ho Park; Ki Hong Kim; Sang Kyun Cha; Sang Ho Lee; Min Seok Song; Juchang Lee
Newly emerging spatial applications, such as intelligent transportation systems, require high-performance access to databases. Although research prototypes and spatial extensions on top of commercial DBMSs have been built, the high-performance requirement is difficult to satisfy because most of them employ the traditional disk-based database architecture. With the steadily increasing memory capacity of computer systems, the main-memory database architecture becomes a feasible approach to meeting this requirement, and a few commercial products have recently been developed. However, there has been little work on applying main-memory databases to the spatial domain. This paper presents Xmas-SX, a high-performance spatial storage system based on the main-memory database architecture. It provides the core subset of the OpenGIS geometry types, operators, and spatial indexes. Variable-length spatial data are managed efficiently by storing each item as a sequence of fixed-size fragments. An experiment shows that, compared with a disk-based ODBMS with data fully cached, Xmas-SX performs only 6% better on the spatial range query. Before the data are fully cached, however, the performance gap is much larger. For updates, Xmas-SX outperforms the ODBMS by more than a factor of ten.
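A minimal sketch of storing a variable-length value as a sequence of fixed-size fragments, using a 3-point line string packed as doubles. The 32-byte fragment size, zero padding of the last fragment, and keeping the original length alongside the fragments are assumptions; the abstract does not specify Xmas-SX's fragment layout.

```python
import struct

FRAGMENT_SIZE = 32  # bytes per fragment; an assumed value for illustration


def to_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> tuple[int, list[bytes]]:
    """Split a variable-length value into equally sized, zero-padded fragments."""
    fragments = [data[i:i + size].ljust(size, b"\x00") for i in range(0, len(data), size)]
    return len(data), fragments  # the true length is needed to strip the padding


def from_fragments(length: int, fragments: list[bytes]) -> bytes:
    """Concatenate fragments and drop the padding of the last one."""
    return b"".join(fragments)[:length]


# A 3-point line string stored as six little-endian doubles (48 bytes).
points = (0.0, 0.0, 1.5, 2.0, 3.0, 4.5)
raw = struct.pack("<6d", *points)
length, frags = to_fragments(raw)
print(length, [len(f) for f in frags])  # 48 [32, 32]
assert struct.unpack("<6d", from_fragments(length, frags)) == points
```

Because every fragment has the same size, a memory manager can serve all geometries from one pool of fixed-size slots instead of handling arbitrary allocation sizes.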
embedded and real-time computing systems and applications | 1996
Sang Kyun Cha; Jang Ho Park; Sehwan Lee; Byoung Dae Park; Juchang Lee
M²RTSS is a main-memory storage system that has recently been under development at Seoul National University as a vehicle for high-performance and real-time database application research. To meet requirements from various applications, the M²RTSS architecture is designed to be extensible in many aspects of storage system functionality. One crucial aspect of this extensibility is that users can compose application-specific, high-level operations from a basic set of operations supplied by the system. Called composite actions in M²RTSS, these operations run on a customized M²RTSS server with minimal interprocess communication with user processes, thus improving the overall performance of executing user transactions. The object-oriented design and implementation of the whole M²RTSS system facilitates this type of application-specific extension, as well as other types in transaction scheduling and recovery. This paper first describes the overall architecture of M²RTSS and then explores the composition of application-specific operations in detail.
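The benefit of composite actions can be illustrated by counting round trips. In this hypothetical sketch, each `call` on `StorageServer` stands for one interprocess round trip; a transfer built from basic read and write operations on the client costs four round trips, while the same logic registered as a server-side composite costs one. All names are illustrative, not the M²RTSS API.

```python
from typing import Any, Callable


class StorageServer:
    """Hypothetical storage server; each call() models one IPC round trip."""

    _BASIC = {"read", "write"}

    def __init__(self) -> None:
        self.records: dict[str, int] = {}
        self.round_trips = 0
        self._composites: dict[str, Callable[..., Any]] = {}

    # Basic operations supplied by the system.
    def read(self, key: str) -> int:
        return self.records.get(key, 0)

    def write(self, key: str, value: int) -> None:
        self.records[key] = value

    def register(self, name: str, action: Callable[..., Any]) -> None:
        """Install an application-specific composite action on the server."""
        self._composites[name] = action

    def call(self, op: str, *args: Any) -> Any:
        self.round_trips += 1  # one request/response between processes
        if op in self._composites:
            return self._composites[op](self, *args)  # runs entirely server-side
        if op not in self._BASIC:
            raise ValueError(f"unknown operation {op!r}")
        return getattr(self, op)(*args)


def transfer(server: StorageServer, src: str, dst: str, amount: int) -> None:
    # Composite action built from the basic read/write operations.
    server.write(src, server.read(src) - amount)
    server.write(dst, server.read(dst) + amount)


server = StorageServer()
server.records.update(a=100, b=0)

# Client-side composition: every basic operation is a separate round trip.
server.call("write", "a", server.call("read", "a") - 10)
server.call("write", "b", server.call("read", "b") + 10)
print(server.records, server.round_trips)  # {'a': 90, 'b': 10} 4

# Server-side composite action: the same logic in a single round trip.
server.register("transfer", transfer)
server.round_trips = 0
server.call("transfer", "a", "b", 10)
print(server.records, server.round_trips)  # {'a': 80, 'b': 20} 1
```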
very large data bases | 2017
Juchang Lee; SeungHyun Moon; Kyu Hwan Kim; Deok Hoe Kim; Sang Kyun Cha; Wook-Shin Han
Modern in-memory database systems need to efficiently support mixed OLTP and OLAP workloads. A conventional approach to this requirement relies on ETL-style, application-driven data replication between two very different OLTP and OLAP systems, sacrificing real-time reporting on operational data. An alternative approach is to run OLTP and OLAP workloads on a single machine, which eventually limits the maximum scalability of OLAP query performance. To tackle this challenging problem, we propose a novel database replication architecture called Asynchronous Parallel Table Replication (ATR). ATR supports OLTP workloads on one primary machine and heavy OLAP workloads on replicas. Here, row-store formats can be used for OLTP transactions at the primary, while column-store formats are used for OLAP analytical queries at the replicas. ATR is designed to support elastic scalability of OLAP query performance while minimizing the overhead for transaction processing at the primary and the CPU consumption for replayed transactions at the replicas. ATR employs a novel optimistic lock-free parallel log replay scheme that exploits characteristics of multi-version concurrency control (MVCC) to enable real-time reporting by minimizing the propagation delay between the primary and replicas. Through extensive experiments with a concrete implementation available in a commercial database system, we demonstrate that ATR achieves sub-second visibility delay even for update-intensive workloads, providing scalable OLAP performance without notable overhead to the primary.
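The abstract does not describe the optimistic lock-free replay scheme in enough detail to reproduce it. As a simplified stand-in, the sketch below shows table-level parallel replay: log records are grouped by table, each table's records are applied in log order by one worker, and every replayed write is appended as an MVCC version tagged with its commit timestamp so that replica readers can use snapshot reads. The record layout `(commit_ts, table, key, value)` and all function names are assumptions, and a real design must also keep readers from seeing a transaction that is only partially replayed across tables, which this sketch ignores.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

LogRecord = tuple[int, str, str, str]  # (commit_ts, table, key, value), assumed layout
Versions = dict[str, list[tuple[int, str]]]  # key -> [(commit_ts, value), ...]


def replay_parallel(log: list[LogRecord], workers: int = 4) -> dict[str, Versions]:
    """Replay a primary's log at a replica, one worker per table."""
    by_table: dict[str, list[LogRecord]] = defaultdict(list)
    for record in log:  # log arrives in the primary's commit order
        by_table[record[1]].append(record)

    # Pre-create one version store per table so workers never share a dict.
    replica: dict[str, Versions] = {table: defaultdict(list) for table in by_table}

    def replay_table(table: str) -> None:
        # Records of one table are applied in log order; each write becomes a
        # new MVCC version tagged with its commit timestamp (no in-place update).
        for commit_ts, _, key, value in by_table[table]:
            replica[table][key].append((commit_ts, value))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(replay_table, by_table))  # list() surfaces worker exceptions
    return replica


def snapshot_read(replica: dict[str, Versions], table: str, key: str,
                  snapshot_ts: int) -> Optional[str]:
    """Latest replayed version of (table, key) committed at or before snapshot_ts."""
    history = replica.get(table, {}).get(key, [])
    visible = [v for ts, v in history if ts <= snapshot_ts]
    return visible[-1] if visible else None


log = [
    (1, "orders", "o1", "new"),
    (2, "stock", "s1", "9"),
    (3, "orders", "o1", "paid"),
    (4, "stock", "s1", "8"),
]
replica = replay_parallel(log)
print(snapshot_read(replica, "orders", "o1", 2))  # new
print(snapshot_read(replica, "stock", "s1", 4))   # 8
```

Appending versions instead of overwriting rows is what lets replay proceed without blocking readers: a query at the replica filters by its snapshot timestamp and never needs a lock on the record being replayed.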
very large data bases | 2018
Juchang Lee; Wook-Shin Han; Hyoung Jun Na; Chang Gyoo Park; Kyu Hwan Kim; Deok Hoe Kim; Joo Yeon Lee; Sang Kyun Cha; SeungHyun Moon
Modern in-memory database systems need to efficiently support mixed OLTP and OLAP workloads. A conventional approach to this requirement relies on ETL-style, application-driven data replication between two very different OLTP and OLAP systems, sacrificing real-time reporting on operational data. An alternative approach is to run OLTP and OLAP workloads on a single machine, which eventually limits the maximum scalability. To tackle this challenging problem, we propose a novel database replication architecture called HANA Asynchronous Parallel Table Replication (ATR). ATR supports OLTP workloads on one primary machine and heavy OLAP workloads on replicas. Here, row store formats can be used for OLTP transactions at the primary, while column store formats are used for OLAP analytical queries at the replicas. ATR is designed to support elastic scalability of OLAP query performance while minimizing the overhead for transaction processing at the primary and the CPU consumption for replayed transactions at the replicas. ATR employs a novel optimistic lock-free parallel log replay scheme that exploits characteristics of multi-version concurrency control (MVCC) to enable real-time reporting by minimizing the propagation delay between the primary and replicas. It also supports adaptive query routing based on a predefined acceptable staleness range. Through extensive experiments with a concrete implementation available in a commercial product, we demonstrate that ATR achieves sub-second visibility delay even for update-intensive workloads, providing scalable OLAP performance without notable overhead to the primary. In addition, by extending ATR to eager parallel replication, we demonstrate how the parallel log replay and its log-less replica recovery mechanisms improve run-time transaction performance under eager replication.
Lecture Notes in Computer Science | 1999
Sang Kyun Cha; Ki Hong Kim; Juchang Lee
international conference on data engineering | 2001
Juchang Lee; Kihong Kim; Sang Kyun Cha
With a GByte of memory priced at less than $2000, main-memory DBMSs (MMDBMSs) are emerging as an economically viable alternative to disk-resident DBMSs (DRDBMSs) in many problem domains. An MMDBMS can show significantly higher performance than a DRDBMS by reducing disk accesses to sequential log writing and occasional checkpointing. Upon a system crash, the recovery process begins by accessing the disk-resident log and checkpoint data to restore a consistent state. With increasing CPU speed, however, such disk access remains the dominant bottleneck in MMDBMSs. To overcome this bottleneck, this paper explores alternatives for parallel logging and recovery. The major contribution of this paper is the so-called differential logging scheme, which permits unrestricted parallelism in logging and recovery. By using the bitwise XOR operation both to compute the differential log between the before and after images and to recover the consistent database state, this scheme offers room for significant performance improvement in the MMDBMS. First, because logging is done on the difference, the log volume is reduced to almost half of that of conventional physical logging. Second, the commutativity and associativity of XOR enable log records to be processed in an arbitrary order. This means that log records can be freely distributed to multiple disks to improve logging performance. At recovery time, a parallel restart can be done independently for each log disk. This paper shows the superior performance of differential logging compared to physical logging in a shared-memory multiprocessor environment.
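The differential logging scheme of the 2001 ICDE paper (Lee, Kim, Cha) can be demonstrated in a few lines: each log record is the XOR of the before and after images, and redo XORs the records into the checkpoint image. Because XOR is commutative and associative, every permutation of the log yields the same final state. This sketch assumes a single fixed-length record and a checkpoint image taken before all logged updates; the function names are hypothetical.

```python
from functools import reduce
from itertools import permutations


def xor_bytes(x: bytes, y: bytes) -> bytes:
    if len(x) != len(y):
        raise ValueError("images must have equal length")
    return bytes(a ^ b for a, b in zip(x, y))


def differential_log(before: bytes, after: bytes) -> bytes:
    # One log record per update: the bitwise XOR of the before and after images.
    return xor_bytes(before, after)


def redo(checkpoint: bytes, log_records: list[bytes]) -> bytes:
    # XOR each record into the checkpoint image; order does not matter.
    return reduce(xor_bytes, log_records, checkpoint)


images = [b"AAAA", b"ABAA", b"ABCA", b"ZBCA"]  # successive states of one record
log = [differential_log(b, a) for b, a in zip(images, images[1:])]
final_states = {redo(images[0], list(order)) for order in permutations(log)}
print(final_states)  # {b'ZBCA'}
# Undo is the same operation: XOR-ing the last record again restores the prior image.
print(redo(images[3], [log[-1]]))  # b'ABCA'
```

Each record stores one image-sized difference rather than separate before and after images, which is consistent with the roughly halved log volume the abstract reports relative to physical logging; order independence is what allows the records to be spread across disks and replayed in parallel.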