Sang Kyun Cha | Researchain

Publication

Featured research published by Sang Kyun Cha.

international conference on management of data | 2012

SAP HANA database: data management for modern business applications

Franz Färber; Sang Kyun Cha; Jürgen Primsch; Christof Bornhövd; Stefan Sigg; Wolfgang Lehner

The SAP HANA database is positioned as the core of the SAP HANA Appliance to support complex business analytical processes in combination with transactionally consistent operational workloads. Within this paper, we outline the basic characteristics of the SAP HANA database, emphasizing the distinctive features that differentiate the SAP HANA database from other classical relational database management systems. On the technical side, the SAP HANA database consists of multiple data processing engines with a distributed query processing environment to provide the full spectrum of data processing -- from classical relational data supporting both row- and column-oriented physical representations in a hybrid engine, to graph and text processing for semi- and unstructured data management within the same system. From a more application-oriented perspective, we outline the specific support provided by the SAP HANA database of multiple domain-specific languages with a built-in set of natively implemented business functions. SQL -- as the lingua franca for relational database systems -- can no longer be considered to meet all requirements of modern applications, which demand the tight interaction with the data management layer. Therefore, the SAP HANA database permits the exchange of application semantics with the underlying data management platform that can be exploited to increase query expressiveness and to reduce the number of individual application-to-database round trips.

international conference on management of data | 2001

Optimizing multidimensional index trees for main memory access

Kihong Kim; Sang Kyun Cha; Keunjoo Kwon

Recent studies have shown that cache-conscious indexes such as the CSB+-tree outperform conventional main memory indexes such as the T-tree. The key idea of these cache-conscious indexes is to eliminate most child pointers from a node to increase the fanout of the tree. When the node size is chosen on the order of the cache block size, this pointer elimination effectively reduces the tree height, and thus improves the cache behavior of the index. However, the pointer elimination cannot be directly applied to multidimensional index structures such as the R-tree, where the size of a key, typically an MBR (minimum bounding rectangle), is much larger than that of a pointer. Simple elimination of four-byte pointers does not help much to pack more entries in a node. This paper proposes a cache-conscious version of the R-tree called the CR-tree. To pack more entries in a node, the CR-tree compresses MBR keys, which occupy almost 80% of index data in the two-dimensional case. It first represents the coordinates of an MBR key relative to the lower left corner of its parent MBR to eliminate the leading 0s from the relative coordinate representation. Then, it quantizes the relative coordinates with a fixed number of bits to further cut off the trailing less significant bits. Consequently, the CR-tree becomes significantly wider and smaller than the ordinary R-tree. Our experimental and analytical study shows that the two-dimensional CR-tree performs search up to 2.5 times faster than the ordinary R-tree while maintaining similar update performance and consuming about 60% less memory space.
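
The two compression steps in this abstract (make coordinates relative to the parent MBR, then quantize to a fixed number of bits) can be illustrated with a minimal Python sketch. The function names, the tuple layout `(x_lo, y_lo, x_hi, y_hi)`, and the choice to round lower bounds down and upper bounds up (so the quantized rectangle still covers the original) are assumptions made for illustration; the abstract does not spell out these details.

```python
import math

def quantize_mbr(mbr, parent, bits=8):
    """Quantize `mbr` relative to `parent` into four small integers.

    Both rectangles are (x_lo, y_lo, x_hi, y_hi). Assumption: lower bounds
    are rounded down and upper bounds rounded up, so the result is
    conservative (it never shrinks the original rectangle).
    """
    levels = (1 << bits) - 1                 # largest value representable in `bits`
    px_lo, py_lo, px_hi, py_hi = parent
    width = (px_hi - px_lo) or 1.0           # guard against a degenerate parent
    height = (py_hi - py_lo) or 1.0

    def clamp(q):
        return max(0, min(levels, q))

    x_lo, y_lo, x_hi, y_hi = mbr
    return (
        clamp(math.floor((x_lo - px_lo) / width * levels)),
        clamp(math.floor((y_lo - py_lo) / height * levels)),
        clamp(math.ceil((x_hi - px_lo) / width * levels)),
        clamp(math.ceil((y_hi - py_lo) / height * levels)),
    )

def dequantize_mbr(q, parent, bits=8):
    """Map a quantized rectangle back to absolute coordinates (lossy)."""
    levels = (1 << bits) - 1
    px_lo, py_lo, px_hi, py_hi = parent
    width, height = px_hi - px_lo, py_hi - py_lo
    return (
        px_lo + q[0] / levels * width,
        py_lo + q[1] / levels * height,
        px_lo + q[2] / levels * width,
        py_lo + q[3] / levels * height,
    )

parent = (0.0, 0.0, 100.0, 100.0)
child = (10.3, 20.7, 30.2, 40.9)
q = quantize_mbr(child, parent)      # four 8-bit values
approx = dequantize_mbr(q, parent)   # slightly larger rectangle that covers `child`
```

With 8 bits per coordinate, the four values fit in 4 bytes, compared with 16 bytes for four 4-byte coordinates. The price is that the restored rectangle is slightly larger than the original, so a search may visit some entries that an exact MBR would have excluded.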

international conference on management of data | 2012

Efficient transaction processing in SAP HANA database: the end of a column store myth

Vishal Sikka; Franz Färber; Wolfgang Lehner; Sang Kyun Cha; Thomas Peh; Christof Bornhövd

The SAP HANA database is the core of SAP's new data management platform. The overall goal of the SAP HANA database is to provide a generic but powerful system for different query scenarios, both transactional and analytical, on the same data representation within a highly scalable execution environment. Within this paper, we highlight the main features that differentiate the SAP HANA database from classical relational database engines. Therefore, we outline the general architecture and design criteria of the SAP HANA database in a first step. In a second step, we challenge the common belief that column store data structures are only superior in analytical workloads and not well suited for transactional workloads. We outline the concept of record life cycle management to use different storage formats for the different stages of a record. We not only discuss the general concept but also dive into some of the details of how to efficiently propagate records through their life cycle and move database entries from write-optimized to read-optimized storage formats. In summary, the paper aims at illustrating how the SAP HANA database is able to efficiently work in analytical as well as transactional workload environments.
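
The idea of moving entries from a write-optimized to a read-optimized format can be sketched for a single column as follows. This is a toy illustration, not SAP HANA's implementation: the class name `Column`, the two-stage layout, and the use of a sorted dictionary with integer value ids for the read-optimized part are all assumptions chosen to keep the example short.

```python
class Column:
    """One column with a write-optimized delta and a read-optimized main part."""

    def __init__(self):
        self.dictionary = []  # sorted distinct values (read-optimized part)
        self.main = []        # one dictionary index per merged row
        self.delta = []       # raw values appended by recent inserts

    def insert(self, value):
        # Writes only append to the delta, so they stay cheap.
        self.delta.append(value)

    def merge(self):
        # Rebuild the dictionary-encoded main part from main + delta.
        values = [self.dictionary[i] for i in self.main] + self.delta
        self.dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(self.dictionary)}
        self.main = [index[v] for v in values]
        self.delta = []

    def scan(self):
        # Readers must see both merged and not-yet-merged rows.
        return [self.dictionary[i] for i in self.main] + list(self.delta)

col = Column()
for city in ["Seoul", "Berlin", "Seoul"]:
    col.insert(city)
col.merge()            # rows now live in the compact, encoded main part
col.insert("Walldorf") # new row waits in the delta until the next merge
```

The design point the sketch tries to show is that inserts never touch the encoded structure directly; the cost of re-encoding is paid in batches during `merge`, while `scan` hides the split from readers.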

symposium on large spatial databases | 2003

Performance Evaluation of Main-Memory R-tree Variants

Sangyong Hwang; Keunjoo Kwon; Sang Kyun Cha; Byung Suk Lee

There have been several techniques proposed for improving the performance of main-memory spatial indexes, but there has not been a comparative study of their performance. In this paper we compare the performance of six main-memory R-tree variants: R-tree, R*-tree, Hilbert R-tree, CR-tree, CR*-tree, and Hilbert CR-tree. CR*-trees and Hilbert CR-trees are respectively a natural extension of R*-trees and Hilbert R-trees by incorporating CR-trees’ quantized relative minimum bounding rectangle (QRMBR) technique. Additionally, we apply the optimistic, latch-free index traversal (OLFIT) concurrency control mechanism for B-trees to the R-tree variants while using the GiST-link technique. We perform extensive experiments in the two categories of sequential accesses and concurrent accesses, and pick the following best trees. In sequential accesses, CR*-trees are the best for search, Hilbert R-trees for update, and Hilbert CR-trees for a mixture of them. In concurrent accesses, Hilbert CR-trees are the best for search if data is uniformly distributed, CR*-trees for search if data is skewed, Hilbert R-trees for update, and Hilbert CR-trees for a mixture of them. We also provide detailed observations of the experimental results, and rationalize them based on the characteristics of the individual trees. As far as we know, our work is the first comprehensive performance study of main-memory R-tree variants. The results of our study provide a useful guideline in selecting the most suitable index structure in various cases.

IEEE Transactions on Knowledge and Data Engineering | 2012

A Performance Anomaly Detection and Analysis Framework for DBMS Development

Dong-Hun Lee; Sang Kyun Cha; Arthur H. Lee

Detecting performance anomalies and finding their root causes are tedious tasks requiring much manual work. Functionality enhancements in DBMS development as in most software development often introduce performance problems in addition to bugs. To detect the problems as soon as they are introduced, which often happens during the early phases of a development cycle, we adopt performance regression testing early in the process. In this paper, we describe a framework that we developed to manage performance anomalies after establishing a set of conditions for a problem to be considered an anomaly. The framework uses Statistical Process Control (SPC) charts to detect performance anomalies and differential profiling to identify their root causes. By automating the tasks within the framework we were able to remove most of the manual overhead in detecting anomalies and reduce the analysis time for identifying the root causes by about 90 percent in most cases. The tools developed and deployed based on the framework allow us continuous, automated daily monitoring of performance in addition to the usual functionality monitoring in our DBMS development.
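
As a rough illustration of how a control chart can flag a performance regression, the sketch below computes limits from a baseline of benchmark timings and reports later runs that fall outside them. The abstract does not say which chart type or thresholds the framework uses; the Shewhart-style limits at three standard deviations, the function names, and the sample numbers are assumptions.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits: baseline mean +/- k sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center + k * sigma

def find_anomalies(baseline, new_runs, k=3.0):
    """Return (index, value) pairs for runs outside the control limits."""
    lower, upper = control_limits(baseline, k)
    return [(i, x) for i, x in enumerate(new_runs) if x < lower or x > upper]

# Hypothetical daily timings (ms) for one benchmark query.
baseline = [100, 102, 98, 101, 99, 100]
new_runs = [101, 99, 112, 100]
anomalies = find_anomalies(baseline, new_runs)
```

Here the baseline mean is 100 ms and the limits are roughly 95.8 ms and 104.2 ms, so only the 112 ms run is reported. In a daily regression setting, a flagged run would then be handed to the root-cause step, which the abstract describes as differential profiling.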

international conference on management of data | 1998

Xmas: an extensible main-memory storage system for high-performance applications

Jang Ho Park; Yong Sik Kwon; Ki Hong Kim; Sang Ho Lee; Byoung Dae Park; Sang Kyun Cha

Xmas is an extensible main-memory storage system for high-performance embedded database applications. Xmas not only provides the core functionality of DBMS, such as data persistence, crash recovery, and concurrency control, but also pursues an extensible architecture to meet the requirements from various application areas. One crucial aspect of such extensibility is that an application developer can compose application-specific, high-level operations with a basic set of operations provided by the system. Called composite actions in Xmas, these operations are processed by a customized Xmas server with minimum interaction with application processes, thus improving the overall performance. This paper first presents the architecture and functionality of Xmas, and then demonstrates a simulation of mobile communication service.

web information systems engineering | 2000

Efficient Web-based access to multiple geographic databases through automatically generated wrappers

Sang Kyun Cha; Ki Hong Kim; Changbin Song; Yong Sik Kwon; Sangyong Hwang

With the proliferation of various geographic database servers on the Internet, the need to access them simultaneously through the Web arises frequently for high-level decision making. The ongoing OpenGIS standard addresses many of the interoperability issues to make such global utilization of geographic databases possible. Based on the OpenGIS standard, the paper presents an object-oriented architecture for efficient Web-based access to multiple geographic databases on the Internet. Called MEADOW, it provides a pair of automatically generated modules: the OpenGIS wrapper on the server side and the matching transparent access provider (TAP) on the client side. In cooperation with the wrapper, TAP supports an application's efficient access to the databases through prefetching and caching of remote objects.

Archive | 1999

A Middleware Architecture for Transparent Access to Multiple Spatial Object Databases

Sang Kyun Cha; Ki Hong Kim; Chang Bin Song; Joo Kwan Kim; Yong Sik Kwon

The need to access multiple databases arises frequently in geographic information processing because a single spatial object database may not contain all the information at the desired level of abstraction, completeness, and accuracy. For example, in planning the extension of underground urban utility networks such as gas and telecommunication lines, it is necessary to access databases of existing and planned utility networks. Such databases are usually maintained independently by individual operating companies. It is also common that these independent databases maintain much geographic information redundantly with different levels of abstraction, completeness, and accuracy. Many high-level decision making processes can take advantage of such redundancy to unify the content of one database with those of others. This unification of spatial objects in multiple databases is expected to extend the solution space that would otherwise be very limited or nonexistent.

international conference on data engineering | 2005

Paradigm Shift to New DBMS Architectures: Research Issues and Market Needs

Sang Kyun Cha; Anastassia Ailamaki; Yoshinori Hara; Vishal Sikka

Moore’s law has driven CPU power and memory capacity to grow a million times since the System R and Ingres projects started thirty years ago. The underlying software technology has also changed substantially. Today, operating systems support POSIX lightweight multithread library and virtually infinite address space for efficient utilization of multiprocessor systems with very large memory. Despite these dramatic advances in underlying hardware and software, the initial RDBMS architecture of managing data and indexes as disk-resident block structures remains the same. The heavyweight process architecture is still dominant, incurring costly context switch overhead among multiple processes involved in transaction execution. Typical commercial RDBMS implementations involve several millions of lines of complex code that has been evolving over decades. Because it is extremely risky to overhaul any software of this size, commercial RDBMS implementations are likely to maintain the current, disk-resident heavyweight-process architecture. On the application side, there is growing demand for real-time acquisition and analysis of a large volume of data, especially continuously arriving stream data. Examples are traditionally found in financial services, telecom, defense and intelligence, logistics, and this list is being expanded to include other domains such as supply chain and retail with the development of RFID technology for ubiquitous tracking of physical objects. These so-called real-time enterprise applications demand

database systems for advanced applications | 1999

A middleware implementation of active rules for ODBMS

Sang Bong Yoo; K. C. Kim; Sang Kyun Cha

Throughout many research and development projects for active rule systems, active rules are implemented with different syntax and semantics. This becomes one of the stumbling blocks to applying active database systems, especially in networked heterogeneous multidatabase environments. Utilizing the recent development of CORBA and ODMG standards, a middleware approach to provide active rule systems for heterogeneous ODBMS is presented in this paper. The active rule system described is applied for integrity maintenance of spatial objects. According to the events included in application programs, the active rules represented in ECA type are inserted into the program by a preprocessor. One advantage of this compile approach is that the preprocessed program can be compiled and executed without the overhead of runtime monitoring. For rules changed after compilation, a run time interpreter is included in the executable program.
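
For readers unfamiliar with the ECA (event, condition, action) form mentioned above, here is a minimal Python sketch of the rule structure applied to a spatial integrity check. It shows only the shape of a rule and a simple dispatch loop, closer in spirit to a runtime interpreter than to the preprocessor approach the abstract describes; the names `Rule` and `RuleEngine`, the event strings, and the example objects are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Rule:
    event: str                          # E: which event triggers the rule
    condition: Callable[[Dict], bool]   # C: predicate on the affected object
    action: Callable[[Dict], None]      # A: what to do when the predicate holds

@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def signal(self, event: str, obj: Dict) -> None:
        # Run every rule registered for this event whose condition holds.
        for rule in self.rules:
            if rule.event == event and rule.condition(obj):
                rule.action(obj)

violations = []
engine = RuleEngine()
# Integrity rule: after an update, a bounding box must not be inverted.
engine.add(Rule(
    event="update",
    condition=lambda o: o["x_max"] < o["x_min"],
    action=lambda o: violations.append(o["id"]),
))
engine.signal("update", {"id": 1, "x_min": 0, "x_max": 5})
engine.signal("update", {"id": 2, "x_min": 9, "x_max": 3})
```

In the compile-time approach described in the abstract, the equivalent of the `signal` call would be inserted by a preprocessor at the places where the event occurs in the application program, so no separate monitoring is needed at run time.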
