Bongki Moon | Researchain

Publication

Featured research published by Bongki Moon.

IEEE Transactions on Knowledge and Data Engineering | 2001

Analysis of the clustering properties of the Hilbert space-filling curve

Bongki Moon; H. V. Jagadish; Christos Faloutsos; Joel H. Saltz

Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties of such linear mappings is clustering, which means that the locality between objects in the multidimensional space is preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering (Abel and Mark, 1990; Jagadish, 1990). We analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z curve. From a practical point of view, the formulas given provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time.
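To make the notion of a "cluster" concrete, the following is a minimal sketch, not the paper's closed-form analysis: it maps the cells of a small grid to curve positions and counts the runs of consecutive positions that fall inside a rectangular query. The function names (`hilbert_index`, `z_index`, `clusters_in_rectangle`) and the grid size are assumptions chosen for illustration; the Hilbert mapping is the commonly used bit-manipulation construction for power-of-two grids.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def z_index(n, x, y):
    """Map cell (x, y) to its z curve (bit-interleaving) position."""
    d = 0
    bit = 0
    while (1 << bit) < n:
        d |= ((x >> bit) & 1) << (2 * bit)
        d |= ((y >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return d


def count_clusters(indices):
    """Count maximal runs of consecutive curve positions."""
    ordered = sorted(indices)
    return sum(1 for i, v in enumerate(ordered) if i == 0 or v != ordered[i - 1] + 1)


def clusters_in_rectangle(index_fn, n, x0, y0, x1, y1):
    """Number of clusters for the inclusive rectangle (x0, y0)-(x1, y1)."""
    cells = [index_fn(n, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
    return count_clusters(cells)


if __name__ == "__main__":
    for name, fn in (("Hilbert", hilbert_index), ("z curve", z_index)):
        print(name, clusters_in_rectangle(fn, 4, 1, 1, 2, 2))
```

On a 4x4 grid, the 2x2 query covering cells (1,1) through (2,2) splits into 3 clusters under this Hilbert mapping and 4 under the z curve. A single query only illustrates the definition; the general comparison is what the paper's formulas address.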

international conference on management of data | 2012

Parallel data processing with MapReduce: a survey

Kyong Ha Lee; Yoon Joon Lee; Hyunsik Choi; Yon Dohn Chung; Bongki Moon

MapReduce, a prominent parallel data processing tool, is gaining significant momentum from both industry and academia as the volume of data to analyze grows rapidly. While MapReduce is used in many areas where massive data analysis is required, there are still debates on its performance, efficiency per node, and simple abstraction. This survey intends to assist the database and open source communities in understanding various technical aspects of the MapReduce framework. In this survey, we characterize the MapReduce framework and discuss its inherent pros and cons. We then introduce its optimization strategies reported in the recent literature. We also discuss the open issues and challenges raised on parallel data analysis with MapReduce.
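For readers unfamiliar with the programming model the survey characterizes, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is an illustration only: the names `run_job`, `wc_mapper`, and `wc_reducer` are hypothetical, and nothing here reflects the API of any real MapReduce implementation or the distribution, fault tolerance, and I/O behavior the survey discusses.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def wc_mapper(line):
    """Emit (word, 1) for each whitespace-separated word."""
    for word in line.split():
        yield (word.lower(), 1)


def wc_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


if __name__ == "__main__":
    print(run_job(["a b a", "B c"], wc_mapper, wc_reducer))
```

The user supplies only the mapper and reducer; grouping by key happens between the two phases, which is the "simple abstraction" the abstract refers to.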

international conference on management of data | 2007

Design of flash-based DBMS: an in-page logging approach

Sang-Won Lee; Bongki Moon

The popularity of high-density flash memory as data storage media has increased steadily for a wide spectrum of computing devices such as PDAs, MP3 players, mobile phones and digital cameras. More recently, computer manufacturers started launching new lines of mobile or portable computers that did away with magnetic disk drives altogether, replacing them with tens of gigabytes of NAND flash memory. Like EEPROM and magnetic disk drives, flash memory is non-volatile and retains its contents even when the power is turned off. As its capacity increases and price drops, flash memory will compete more successfully with lower-end, lower-capacity disk drives. It is thus not inconceivable to consider running a full database system on the flash-only computing platforms or running an embedded database system on the lightweight computing devices. In this paper, we present a new design called in-page logging (IPL) for flash memory based database servers. This new design overcomes the limitations of flash memory such as high write latency, and exploits unique characteristics of flash memory to achieve the best attainable performance for flash-based database servers. We show empirically that the IPL approach can yield considerable performance benefit over traditional design for disk-based database servers. We also show that the basic design of IPL can be elegantly extended to support transactional database recovery.

international conference on management of data | 2008

A case for flash memory ssd in enterprise database applications

Sang-Won Lee; Bongki Moon; Chanik Park; Jae-Myung Kim; Sang-Woo Kim

Owing to advantages such as low access latency, low energy consumption, light weight, and shock resistance, flash memory has steadily expanded from a storage alternative for mobile computing devices into the personal computer and enterprise server markets, with ever-increasing storage capacity. However, since flash memory exhibits poor performance for small-to-moderate sized writes requested in a random order, existing database systems may not be able to take full advantage of flash memory without elaborate flash-aware data structures and algorithms. The objective of this work is to understand the applicability and potential impact that flash memory SSD (Solid State Drive) has for certain types of storage spaces of a database server where sequential writes and random reads are prevalent. We show empirically that transaction processing can improve by more than an order of magnitude when magnetic disk is replaced with flash memory SSD for transaction log, rollback segments, and temporary table spaces.

international conference on data engineering | 2004

PRIX: indexing and querying XML using Prüfer sequences

Praveen Rao; Bongki Moon

We propose a new way of indexing XML documents and processing twig patterns in an XML database. Every XML document in the database can be transformed into a sequence of labels by Prüfer's method, which constructs a one-to-one correspondence between trees and sequences. During query processing, a twig pattern is also transformed into its Prüfer sequence. By performing subsequence matching on the set of sequences in the database, and performing a series of refinement phases that we have developed, we can find all the occurrences of a twig pattern in the database. Our approach allows holistic processing of a twig pattern without breaking the twig into root-to-leaf paths and processing these paths individually. Furthermore, we show that all correct answers are found without any false dismissals or false alarms. Experimental results demonstrate the performance benefits of our proposed techniques.
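The tree-to-sequence transformation mentioned above can be sketched with the classic Prüfer construction for a labeled tree: repeatedly delete the leaf with the smallest label and record the label of its neighbor. This is a sketch of the textbook method under the assumption that nodes are numbered 1..n; it is not the PRIX indexing pipeline, and the function name `prufer_sequence` is chosen here for illustration.

```python
import heapq


def prufer_sequence(edges, n):
    """Return the Prüfer sequence of a tree on nodes 1..n given as an edge list."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Min-heap of current leaves so the smallest-labeled leaf is removed first.
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)

    sequence = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        neighbor = adj[leaf].pop()      # a leaf has exactly one neighbor
        adj[neighbor].remove(leaf)
        sequence.append(neighbor)
        if len(adj[neighbor]) == 1:     # the neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return sequence


if __name__ == "__main__":
    tree = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
    print(prufer_sequence(tree, 6))
```

For the six-node tree in the usage example the result is `[4, 4, 4, 5]`. Because distinct labeled trees give distinct sequences, matching on sequences can stand in for matching on tree structure, which is the property the abstract relies on.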

international conference on data engineering | 1997

Titan: a high-performance remote-sensing database

Chialin Chang; Bongki Moon; Anurag Acharya; Carter T. Shock; Alan Sussman; Joel H. Saltz

There are two major challenges for a high-performance remote-sensing database. First, it must provide low-latency retrieval of very large volumes of spatio-temporal data. This requires effective declustering and placement of a multidimensional dataset onto a large disk farm. Second, the order-of-magnitude reduction in data size due to post-processing makes it imperative, from a performance perspective, that the post-processing be done on the machine that holds the data. This requires careful coordination of computation and data retrieval. The paper describes the design, implementation and evaluation of Titan, a parallel shared-nothing database designed for handling remote-sensing data. The computational platform for Titan is a 16-processor IBM SP-2 with four fast disks attached to each processor. Titan is currently operational and contains about 24 GB of AVHRR data from the NOAA-7 satellite. The experimental results show that Titan provides good performance for global queries and interactive response times for local queries.

IEEE Transactions on Knowledge and Data Engineering | 2005

Spatiotemporal aggregate computation: a survey

Inés Fernando Vega López; Richard T. Snodgrass; Bongki Moon

Spatiotemporal databases are becoming increasingly more common. Typically, applications modeling spatiotemporal objects need to process vast amounts of data. In such cases, generating aggregate information from the data set is more useful than individually analyzing every entry. In this paper, we study the most relevant techniques for the evaluation of aggregate queries on spatial, temporal, and spatiotemporal data. We also present a model that reduces the evaluation of aggregate queries to the problem of selecting qualifying tuples and the grouping of these tuples into collections on which an aggregate function is to be applied. This model gives us a framework that allows us to analyze and compare the different existing techniques for the evaluation of aggregate queries. At the same time, it allows us to identify opportunities for research on types of aggregate queries that have not been studied.

international conference on management of data | 2009

Advances in flash memory SSD technology for enterprise database applications

Sang-Won Lee; Bongki Moon; Chanik Park

The past few decades have witnessed a chronic and widening imbalance among processor bandwidth, disk capacity, and disk access speed. According to Amdahl's law, the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. This implies that the performance enhancement of an OLTP system would be seriously limited without a considerable improvement in I/O throughput. Since the market debut of flash memory SSD a few years ago, we have made a continued effort to overcome its poor random write performance and to provide stable and sufficient I/O bandwidth. In this paper, we present three different flash memory SSD models prototyped recently by Samsung Electronics. We then show how the flash memory SSD technology has advanced to reverse the widening trend of performance gap between processors and storage devices. We also demonstrate that even a single flash memory drive can outperform a level-0 RAID with eight enterprise class 15k-RPM disk drives with respect to transaction throughput, cost effectiveness and energy consumption.
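The Amdahl's law argument can be checked with a short calculation. The sketch below uses the standard form of the law, speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time that benefits from the improvement and s is the factor by which that part gets faster. The workload split in the usage example (10% processor-bound, 90% I/O-bound) is a made-up illustration, not a figure from the paper.

```python
def amdahl_speedup(improved_fraction, factor):
    """Overall speedup when `improved_fraction` of the run time is sped up by `factor`."""
    if not 0.0 <= improved_fraction <= 1.0:
        raise ValueError("improved_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / factor)


if __name__ == "__main__":
    # Hypothetical workload: 10% of time in the processor, 90% waiting on I/O.
    print("10x faster processor:", amdahl_speedup(0.10, 10))
    print("10x faster I/O:      ", amdahl_speedup(0.90, 10))
```

Under that assumed split, a tenfold faster processor yields an overall speedup of about 1.1, while a tenfold improvement in I/O yields about 5.3, which is the sense in which an I/O-bound system gains little without better I/O throughput.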

conference on high performance computing (supercomputing) | 1994

Run-time and compile-time support for adaptive irregular problems

Shamik D. Sharma; Ravi Ponnusamy; Bongki Moon; Yuan-Shin Hwang; Raja Das; Joel H. Saltz

In adaptive irregular problems, data arrays are accessed via indirection arrays, and data access patterns change during computation. Parallelizing such problems on distributed memory machines requires support for dynamic data partitioning, efficient preprocessing and fast data migration. This paper describes CHAOS, a library of efficient runtime primitives that provides such support. To demonstrate the effectiveness of the runtime support, two adaptive irregular applications have been parallelized using CHAOS primitives: a molecular dynamics code (CHARMM) and a code for simulating gas flows (DSMC). We have also proposed minor extensions to Fortran D which would enable compilers to parallelize irregular forall loops in such adaptive applications by embedding calls to primitives provided by a runtime library. We have implemented our proposed extensions in the Syracuse Fortran 90D/HPF prototype compiler, and have used the compiler to parallelize kernels from two adaptive applications.
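To illustrate why indirection arrays call for a preprocessing step, the following sketch shows an irregular access `data[indirection[i]]` and a simple scan that determines which referenced elements live on other processes under a block partition. This is not the CHAOS API: the names `owner` and `build_schedule`, and the block distribution itself, are assumptions made for the example.

```python
def owner(index, n, nprocs):
    """Process that owns global element `index` under a block partition of n elements."""
    block = -(-n // nprocs)  # ceiling division
    return index // block


def build_schedule(indirection, n, nprocs, me):
    """Off-process elements that process `me` must fetch, grouped by owning process."""
    schedule = {}
    for g in indirection:
        p = owner(g, n, nprocs)
        if p != me:
            schedule.setdefault(p, set()).add(g)
    return {p: sorted(indices) for p, indices in schedule.items()}


def gather(data, indirection):
    """The irregular access itself: read data through the indirection array."""
    return [data[g] for g in indirection]


if __name__ == "__main__":
    data = [10, 11, 12, 13, 14, 15, 16, 17]
    indirection = [0, 5, 2, 7, 5]
    print(gather(data, indirection))
    print(build_schedule(indirection, n=8, nprocs=2, me=0))
```

Because the indirection array is only known at run time, the set of remote elements cannot be derived by a compiler alone. When access patterns change during the computation, a scan like `build_schedule` has to be repeated, which is why the abstract stresses efficient preprocessing.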

Software - Practice and Experience | 1995

Runtime and language support for compiling adaptive irregular programs on distributed-memory machines

Yuan-Shin Hwang; Bongki Moon; Shamik D. Sharma; Ravi Ponnusamy; Raja Das; Joel H. Saltz

In many scientific applications, arrays containing data are indirectly indexed through indirection arrays. Such scientific applications are called irregular programs and are a distinct class of applications that require special techniques for parallelization.
