Gap-Joo Na
Sungkyunkwan University
Publication
Featured research published by Gap-Joo Na.
Conference on Information and Knowledge Management | 2009
Gap-Joo Na; Sang-Won Lee; Bongki Moon
This paper presents the Dynamic IPL B+-tree (d-IPL in short) as a B+-tree index variant for flash-based storage systems. The d-IPL B+-tree adopts a dynamic In-Page Logging (IPL) scheme in order to address a few new problems that are caused by the unique characteristics of B+-tree indexes. The d-IPL B+-tree avoids the frequent log overflow problem by allocating a log area in a flash block dynamically. It also elegantly addresses the problem of page evaporation, imposed by contemporary NAND flash chips, by introducing ghost nodes within the context of the dynamic IPL scheme. This simple but elegant design of the d-IPL B+-tree improves performance significantly. For a random insertion workload, the d-IPL B+-tree index outperformed a B+-tree with a plain IPL scheme by more than a factor of two in terms of page write and block erase operations.
Database Systems for Advanced Applications | 2009
Gap-Joo Na; Bongki Moon; Sang-Won Lee
We demonstrate the IPL B+-tree prototype, which has been designed as a flash-aware index structure by adopting the in-page logging (IPL) scheme. The IPL scheme has been proposed to improve the overall write performance of flash memory database systems by avoiding the costly erase operations that would be caused by the small random write requests common in database workloads. The goal of this demonstration is to provide a proof of concept for the IPL scheme as a viable and effective solution for flash memory database systems.
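The idea of in-page logging described in this abstract can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the authors' implementation: the class `IPLBlock`, its `log_capacity` parameter, and the dictionary-based page layout are all hypothetical. Small updates are appended to a log area that lives in the same flash block as the data pages, and an erase is counted only when the log area fills up and has to be merged.

```python
class IPLBlock:
    """Toy model of one flash block with data pages plus a co-located log area.

    Hypothetical illustration: names and layout are assumptions, not the
    design from the paper.
    """

    def __init__(self, num_pages, log_capacity):
        self.pages = [dict() for _ in range(num_pages)]  # page contents: key -> value
        self.log = []                 # pending (page_no, key, value) log records
        self.log_capacity = log_capacity
        self.erases = 0               # one erase is counted per merge

    def update(self, page_no, key, value):
        # Append a log record instead of rewriting the data page in place.
        if len(self.log) == self.log_capacity:
            self.merge()
        self.log.append((page_no, key, value))

    def merge(self):
        # Apply all pending log records to the data pages. In this toy model
        # the merge stands for rewriting the block, so it costs one erase.
        for page_no, key, value in self.log:
            self.pages[page_no][key] = value
        self.log.clear()
        self.erases += 1

    def read(self, page_no, key):
        # The current value is the stored page value overlaid with any
        # pending log records for the same page and key, in arrival order.
        value = self.pages[page_no].get(key)
        for p, k, v in self.log:
            if p == page_no and k == key:
                value = v
        return value


block = IPLBlock(num_pages=4, log_capacity=3)
for i in range(7):
    block.update(0, "a", i)
print(block.read(0, "a"), block.erases)
```

In this sketch, seven small updates trigger only two merges, whereas rewriting the block on every update would need one erase per update. The real scheme also has to handle recovery and log placement, which the sketch ignores.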
IEEE Transactions on Knowledge and Data Engineering | 2012
Gap-Joo Na; Sang-Won Lee; Bongki Moon
Unlike database tables, B+-tree indexes are hierarchical and their structures change over time by node splitting operations, which may propagate changes from one node to another. The node splitting operation is difficult for the basic In-Page Logging (IPL) scheme to deal with, because it involves more than one node that may be stored separately in different flash blocks. In this paper, we propose the Dynamic IPL B+-tree (d-IPL B+-tree in short) as a variant of the IPL scheme tailored for flash-based B+-tree indexes. The d-IPL B+-tree addresses the problem of frequent log overflow by allocating a log area in a flash block dynamically. It also avoids the page evaporation problem, imposed by contemporary NAND flash chips, by introducing ghost nodes to the d-IPL B+-tree. This simple but elegant design of the d-IPL B+-tree provides significant performance improvement over existing approaches. For a random insertion workload, the d-IPL B+-tree outperformed a B+-tree with the plain IPL scheme by more than a factor of two in terms of page write and block erase operations.
Human Computer Interaction with Mobile Devices and Services | 2007
Sang-Won Lee; Gap-Joo Na; Jae-Myung Kim; Joo-Hyung Oh; Sang-Woo Kim
Recently, flash memory (in particular, NAND) has been rapidly deployed as data storage for mobile platforms such as PDAs, MP3 players, mobile phones, and digital cameras, mainly because of its many advantages over its competitor, the hard disk, including low power consumption, non-volatile storage, high performance, physical stability, smaller size, light weight, and portability. Considering its rapid technical improvement in both capacity and speed, it will have a competitive advantage over its rival, the minidrive (i.e., a small hard disk), under 100 Gbytes within a few years. As the applications in next-generation mobile platforms become large, complex, and more data-oriented, they require database technology, because the file interface is too complex to manage their complicated data requirements. However, flash memory, compared to the hard disk, has a few unique characteristics, and thus traditional disk-based database technology does not seem to go well with flash memory. Therefore, we need to revisit almost every aspect of DBMS implementation techniques from the perspective of flash memory. In this paper, we introduce to the database community the technical characteristics of flash memory that we think might have a huge impact on database performance: 1) no overwrite (the erase-before-write paradigm), 2) asymmetric read and write speeds, and 3) no seek or rotation time. These small differences require us to revisit all the major DBMS modules, which have evolved over several decades. Based on these characteristics, we identify several key issues in implementing major DBMS modules and suggest alternative approaches to solve them. The topics covered in this article are neither comprehensive nor in-depth; the main goal of this article is simply to point out that a practical and urgent research topic lies ahead and that it poses many challenges and opportunities.
Asia-Pacific Web Conference | 2010
Byung-Woo Nam; Gap-Joo Na; Sang-Won Lee
Flash memory has many advantages over hard disks, such as high performance, low power consumption, non-volatile storage, and physical stability. For this reason, flash memory has been deployed as data storage for mobile devices, including PDAs, MP3 players, and laptop computers, as well as for database systems. According to the cell type, flash memory can be divided into SLC (Single-Level Cell) and MLC (Multi-Level Cell). In general, SLC is known to have higher performance and a longer lifetime (i.e., more than 100K erase cycles), while MLC offers larger capacity at a lower price but endures no more than 10K erase cycles. In this paper, we show that it is possible to design fast and cost-efficient storage by combining the two types of flash memory in a hybrid fashion. Specifically, we propose a hybrid flash memory solid state disk (SSD) scheme using the FAST FTL for enterprise applications, where an SLC chip is used as the log space for FAST while MLC chips store the normal data blocks. SLC chips provide fast and durable write performance, while MLC chips provide large capacity. This is mainly due to the characteristics of the FAST FTL algorithm: it tends to direct random writes to the SLC chips and most of the remaining random reads to the MLC chips. By taking advantage of both chip types, we can find an economically desirable flash SSD design option. Experimental results show that our hybrid flash SSD scheme outperforms the MLC-only flash scheme by far in terms of both performance and price.
Database and Expert Systems Applications | 2006
Gap-Joo Na; Sang-Won Lee
As XML is rapidly becoming the de facto standard for data representation and exchange in the Internet age, there has been a lot of research on how to store and retrieve XML data in relational databases. However, even though XML data is mostly tree-structured, the XML research community has paid little attention to the traditional RDBMS-based encoding schemes for tree data. In this paper, we investigate one of these encoding schemes, called Nested Interval, for the storage and retrieval of XML data. In particular, our approach is very robust in updating XML data, including the insertion of new nodes. In fact, the existing RDBMS-based XML storage and indexing techniques work very poorly against XML data updates because the XML data must be re-encoded from scratch for virtually any update. In contrast, the Nested Interval scheme does not require re-encoding all nodes. In this respect, our approach is a viable option for storing and querying update-intensive XML applications.
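The property claimed above, that a node can be inserted without re-encoding existing nodes, can be sketched with a simplified interval encoding. This is an illustrative assumption, not the encoding used in the paper: the `Node` class, the halving rule in `add_child`, and the helper `is_ancestor` are hypothetical. Each node owns a rational interval, every child receives a sub-interval of its parent's unused space, and ancestry reduces to interval containment.

```python
from fractions import Fraction


class Node:
    """Tree node labelled with a half-open rational interval [lo, hi)."""

    def __init__(self, name, lo, hi):
        self.name = name
        self.lo, self.hi = lo, hi
        self.next_free = lo   # left edge of the still-unallocated space

    def add_child(self, name):
        # Hand the child the first half of the remaining free space.
        # Intervals of existing nodes are never touched.
        mid = (self.next_free + self.hi) / 2
        child = Node(name, self.next_free, mid)
        self.next_free = mid
        return child


def is_ancestor(a, d):
    # a is a proper ancestor of d iff d's interval lies strictly inside a's.
    return a.lo <= d.lo and d.hi <= a.hi and (a.lo, a.hi) != (d.lo, d.hi)


root = Node("book", Fraction(0), Fraction(1))
ch1 = root.add_child("chapter1")
ch2 = root.add_child("chapter2")
sec = ch1.add_child("section")
ch3 = root.add_child("chapter3")   # a later insert; earlier labels stay as they were
print(is_ancestor(root, sec), is_ancestor(ch2, sec))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that repeated halving never loses precision. In a relational table the same containment test would become a range predicate over two columns.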
Database Systems for Advanced Applications | 2007
Hyun-Ho Kang; Jae-Myung Kim; Gap-Joo Na; Sang-Won Lee
In the era of the Internet, more and more privacy-sensitive data is published online. Even though this kind of data is published with sensitive attributes such as name and social security number removed, privacy can still be compromised by joining the data with other external data. This technique is called a joining attack. Among the many techniques developed against the joining attack, k-anonymization generalizes and/or suppresses some portions of the released microdata so that no individual can be uniquely distinguished within a group of size k. Incognito is one of the most efficient k-anonymization algorithms. However, Incognito requires many repeated sorts over large volumes of data. In this paper, we propose a bitmap-based Incognito algorithm. Using the bitmap technique, we can completely eliminate the expensive sort operations and can even prune some steps of the traditional Incognito algorithm. Therefore, our new algorithm can improve performance by an order of magnitude. From the perspective of implementation, the key issue in bitmap-based Incognito is the speed of the bitwise AND/OR and bit-count operations. For this, we designed and implemented a bitmap package that exploits the Single Instruction Multiple Data (SIMD) technique. Our experimental results show that bitmap-based Incognito outperforms the traditional Incognito by an order of magnitude.
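The role of bitwise AND and bit-count operations can be shown with a small sketch of a k-anonymity check. It covers only the group-size test, not the Incognito search over generalization levels, and it is not the authors' bitmap package: the functions `build_bitmaps` and `is_k_anonymous`, and the sample rows, are hypothetical. Python integers stand in for bitmaps, with bit i set when row i holds a given value.

```python
from itertools import product


def build_bitmaps(rows, column):
    """Map each distinct value of `column` to a bitmap of the rows holding it."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps


def is_k_anonymous(rows, columns, k):
    """Check that every non-empty quasi-identifier group has at least k rows.

    Group sizes come from AND-ing one bitmap per column and counting the
    set bits, so no sorting of the rows is needed.
    """
    per_column = [build_bitmaps(rows, c) for c in columns]
    all_rows = (1 << len(rows)) - 1
    for combo in product(*(bm.values() for bm in per_column)):
        group = all_rows
        for bitmap in combo:
            group &= bitmap
        size = bin(group).count("1")   # bit count of the intersected bitmap
        if 0 < size < k:
            return False
    return True


# Hypothetical, already generalized microdata.
rows = [
    {"zip": "130**", "age": "20-29"},
    {"zip": "130**", "age": "20-29"},
    {"zip": "148**", "age": "30-39"},
    {"zip": "148**", "age": "30-39"},
]
print(is_k_anonymous(rows, ["zip", "age"], 2))
print(is_k_anonymous(rows, ["zip", "age"], 3))
```

The sketch enumerates every combination of values, which is fine for a handful of columns but grows quickly. A production implementation would use packed bit vectors with hardware bit-count instructions, as the abstract indicates.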
Asia Information Retrieval Symposium | 2005
Gap-Joo Na; Sang-Won Lee
XML data is a typical kind of tree data. However, the XML research community has given little attention to the traditional Relational Database Management System (RDBMS)-based encoding schemes for tree data. In this paper, we investigate one of the traditional RDBMS-based encoding schemes, called Nested Interval, for the storage and retrieval of XML data. In particular, our approach is very robust for updating XML data, including the insertion of new nodes. In fact, the existing RDBMS-based XML storage and indexing techniques work very poorly against XML data updates because they must be rebuilt from scratch whenever any update occurs in the XML data. In contrast, our scheme does not require re-encoding. In this respect, our approach is a viable option for storing and querying update-intensive XML applications.
International Conference on Computational Science and Its Applications | 2009
Gap-Joo Na; Sang-Won Lee
Containment queries over XML documents are one of the most important query types, and thus efficient support for this type of query is crucial for XML databases. Recently, object-relational database management system (ORDBMS) vendors have tried to store and retrieve XML data in their products. In this paper, we propose an extensible index to support containment queries over XML data stored as a BLOB type in ORDBMSs. That is, we describe how to implement an index using the extensibility feature of an ORDBMS, and describe its usage. The main advantage of this index is user productivity in handling XML data in SQL.
Journal of Information Science and Engineering | 2011
Gap-Joo Na; Bongki Moon; Sang-Won Lee