Hongchan Roh | Researchain

Publication

Featured research published by Hongchan Roh.

very large data bases | 2011

B+-tree index optimization by exploiting internal parallelism of flash-based solid state drives

Hongchan Roh; Sanghyun Park; Sungho Kim; Mincheol Shin; Sang-Won Lee

Previous research addressed the potential problems that the hard-disk-oriented design of DBMSs causes on flashSSDs. In this paper, we focus on exploiting the potential benefits of flashSSDs. First, we examine the internal parallelism of flashSSDs by running benchmarks on various flashSSDs. Then, we suggest algorithm-design principles for making the best use of the internal parallelism. We present a new I/O request concept, called psync I/O, that can exploit the internal parallelism of flashSSDs in a single process. Based on these ideas, we introduce B+-tree optimization methods that utilize internal parallelism. By integrating these methods, we present a B+-tree variant, PIO B-tree. We confirmed that each optimization method substantially enhances index performance. Consequently, PIO B-tree improved the B+-tree's insert performance by a factor of up to 16.3, while improving point-search performance by a factor of 1.2. The range search of PIO B-tree was up to 5 times faster than that of the B+-tree. Moreover, PIO B-tree outperformed other flash-aware indexes in various synthetic workloads. We also confirmed that PIO B-tree outperforms the B+-tree on index traces collected inside the PostgreSQL DBMS with the TPC-C benchmark.

Information Sciences | 2009

A B-Tree index extension to enhance response time and the life cycle of flash memory

Hongchan Roh; Woo-Cheol Kim; Seungwoo Kim; Sanghyun Park

Flash memory has critical drawbacks such as the long latency of its write operation and a short life cycle. In order to overcome these limitations, the number of write operations to flash memory devices needs to be minimized. The B-Tree index structure, which is a popular hard-disk-based index structure, requires an excessive number of write operations when it is updated on flash memory. To address this, it was proposed that another layer that emulates a B-Tree be placed between the flash memory and B-Tree indexes. This approach succeeded in reducing the write operation count, but it greatly increased search time and main memory usage. This paper proposes a B-Tree index extension that reduces both the write count and search time with limited main memory usage. First, we designed a buffer that accumulates update requests per leaf node and then processes, in one batch, the update requests of the leaf node carrying the largest number of requests. Second, a type of header information was written on each leaf node. Finally, we made the index automatically control each leaf node size. In experiments, the proposed index structure achieved a significantly lower write count and a greatly decreased search time, with less main memory usage, than placing a layer that emulates a B-Tree.
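
The per-leaf buffering step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `LeafUpdateBuffer`, the `capacity` limit, and the `write_leaf` callback are hypothetical names chosen for illustration, and the header information and adaptive leaf sizing mentioned in the abstract are left out.

```python
from collections import defaultdict


class LeafUpdateBuffer:
    """Buffers index updates per leaf and flushes the busiest leaf first.

    Hypothetical sketch: names and the capacity policy are assumptions.
    """

    def __init__(self, capacity, write_leaf):
        self.capacity = capacity          # max buffered requests (assumed knob)
        self.write_leaf = write_leaf      # callback: (leaf_id, updates) -> None
        self.pending = defaultdict(list)  # leaf_id -> buffered (key, value) pairs
        self.size = 0                     # total buffered requests

    def add(self, leaf_id, key, value):
        self.pending[leaf_id].append((key, value))
        self.size += 1
        if self.size > self.capacity:
            self.flush_largest()

    def flush_largest(self):
        # Pick the leaf carrying the largest number of buffered requests.
        leaf_id = max(self.pending, key=lambda lid: len(self.pending[lid]))
        updates = self.pending.pop(leaf_id)
        self.size -= len(updates)
        # One leaf write now covers several logical updates.
        self.write_leaf(leaf_id, updates)


# Usage: record (leaf_id, number_of_updates) for each simulated leaf write.
writes = []
buf = LeafUpdateBuffer(capacity=4,
                       write_leaf=lambda lid, ups: writes.append((lid, len(ups))))
for leaf_id, key in [(1, 10), (2, 20), (1, 11), (3, 30), (1, 12)]:
    buf.add(leaf_id, key, None)
```

In this run the fifth request overflows the buffer, so the three updates queued for leaf 1 are applied with a single simulated write, which is the write-count saving the abstract refers to.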

Journal of Information Science and Engineering | 2014

AS B-tree: A Study of an Efficient B+-tree for SSDs

Hongchan Roh; Sungho Kim; Daewook Lee; Sang-Hyun Park

Recently, flash memory has been utilized as the primary storage device in mobile devices. SSDs have been gaining popularity as the primary storage device in laptop and desktop computers and even in enterprise-level server machines. SSDs have an array of NAND flash memory packages and are therefore able to achieve concurrent parallel access to one or more flash memory packages. In order to take advantage of the internal parallelism of an SSD, it is beneficial for DBMSs to request input/output (I/O) operations on sequential logical block addresses (LBAs). However, the B+-tree structure, which is a representative index scheme of current relational DBMSs, produces excessive I/O operations in random order when its node structures are updated. Therefore, the conventional B+-tree structure is unfavorable for use in SSDs. In this paper, we propose the Always Sequential (AS) B-tree, which consists of the Legacy B+-tree structure, a Sequential Writer, a Write Buffer, an Address Mapping Table, and a Node Validation Manager. All of the modified nodes in the Legacy B+-tree are stored in the Write Buffer. If the Write Buffer is full, the Sequential Writer contiguously writes each node of the Write Buffer at the end of the file. To support this algorithm, the Address Mapping Table links NodeIDs of the Legacy B+-tree to the LBA of the corresponding node. Because AS B-tree writes the modified nodes on sequential LBAs in this same manner, it is able to take advantage of the internal parallelism of SSDs. In the experiments presented as part of this paper, AS B-tree enhanced the insertion performance of the conventional B+-tree by 21%. We also confirmed AS B-tree demonstrates better performance than other flash-aware indexes in search-oriented workloads.
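
The interaction between the Write Buffer, the Sequential Writer, and the Address Mapping Table can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's code: the class `ASBTreeWriter`, its method names, the in-memory list standing in for the index file, and the `invalid` set (a stand-in for what the Node Validation Manager might track) are all hypothetical.

```python
class ASBTreeWriter:
    """Append-only node writer with a NodeID -> LBA mapping (hypothetical sketch)."""

    def __init__(self, buffer_capacity):
        self.buffer_capacity = buffer_capacity
        self.write_buffer = {}  # Write Buffer: node_id -> latest node image
        self.mapping = {}       # Address Mapping Table: node_id -> LBA
        self.log = []           # stands in for the file; list index plays the LBA
        self.invalid = set()    # LBAs that hold stale node versions

    def update_node(self, node_id, image):
        # Modified nodes are held in memory until the buffer fills up.
        self.write_buffer[node_id] = image
        if len(self.write_buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        # Sequential Writer: append every buffered node at the end of the file.
        for node_id, image in self.write_buffer.items():
            old_lba = self.mapping.get(node_id)
            if old_lba is not None:
                self.invalid.add(old_lba)  # the previous copy is now stale
            self.mapping[node_id] = len(self.log)
            self.log.append(image)
        self.write_buffer.clear()

    def read_node(self, node_id):
        # Serve from the buffer first, otherwise follow the mapping table.
        if node_id in self.write_buffer:
            return self.write_buffer[node_id]
        return self.log[self.mapping[node_id]]


w = ASBTreeWriter(buffer_capacity=2)
w.update_node("A", "a1")
w.update_node("B", "b1")  # buffer full: A and B are appended at LBAs 0 and 1
w.update_node("A", "a2")
w.update_node("C", "c1")  # buffer full: A is rewritten at LBA 2, C at LBA 3
```

Every flush touches only consecutive addresses at the tail of the log, which is the access pattern the abstract argues suits the internal parallelism of SSDs. The cost is the extra lookup through the mapping table on reads and the stale copies left behind.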

bioinformatics and bioengineering | 2008

A novel evolutionary algorithm for bi-clustering of gene expression data based on the Order Preserving Sub-Matrix (OPSM) constraint

Hongchan Roh; Sanghyun Park

Biclustering is a popular method which can reveal unknown genetic pathways. However, even though many algorithms have been suggested, none has proven clearly dominant so far, owing to the very large search space. In this respect, several evolutionary algorithms have tried to address this problem by utilizing the powerful search capability of Evolutionary Computation (EC). However, most algorithms focused on exploiting the Mean Square Residue (MSR) measure, which was proposed by Cheng and Church (CC). The Order Preserving Sub-Matrix (OPSM) constraint was rarely considered even though it promises more biologically relevant biclusters than the MSR measure. The goal of this paper is to design an EC algorithm which ensures biologically significant biclusters by using the OPSM constraint and finds better biclusters than the original OPSM algorithm. We designed a novel encoding method and evolutionary operators suitable for the OPSM constraint. To efficiently explore the search space, we modularized our evolutionary algorithm and applied the co-evolution concept. Through a set of experiments, it was confirmed that our algorithm outperformed a representative EC biclustering algorithm based on CC and the original OPSM algorithm.

Information Sciences | 2015

BulkAligner: A novel sequence alignment algorithm based on graph theory and Trinity

Junsu Lee; Yunku Yeu; Hongchan Roh; Youngmi Yoon; Sanghyun Park

Sequence alignment is a widely-used tool in genomics. With the development of next generation sequencing (NGS) technology, the production of sequence read data has recently increased. A number of read alignment algorithms for handling NGS data have been developed. However, these algorithms suffer from a trade-off between throughput and alignment quality, due to the large computational costs of processing repeat reads. Conversely, alignment algorithms on distributed systems such as Hadoop and Trinity can obtain a better throughput than existing algorithms on a single machine without compromising the alignment quality. In this paper, we suggest BulkAligner, a novel sequence alignment algorithm on the graph-based in-memory distributed system Trinity. We convert the original reference sequence into graph form and perform sequence alignment by finding the longest paths on the graph. Our experimental results show that BulkAligner has at least a 1.8× and up to 57× better throughput, with the same or higher quality, than existing algorithms with Hadoop. We analyze the scalability and show that we can obtain a better throughput by simply adding machines.

conference on information and knowledge management | 2010

Yet another write-optimized DBMS layer for flash-based solid state storage

Hongchan Roh; Daewook Lee; Sanghyun Park

Flash-based Solid State Storage (flashSSS) has write-oriented problems such as low write throughput and a limited lifetime. In particular, flashSSDs are vulnerable to random writes because of their control logic, which utilizes parallelism between the flash memory chips. In this paper, we present a write-optimized layer of DBMSs to address the write-oriented problems of flashSSS in on-line transaction processing environments. The layer consists of a write-optimized buffer, a corresponding log space, and an in-memory mapping table, closely associated with a novel logging scheme called InCremental Logging (ICL). The ICL scheme enables DBMSs to reduce page writes at minimal expense in additional page reads, while converting random writes into sequential writes. In experiments, our approach demonstrated up to an order of magnitude improvement in I/O processing time compared to the original DBMS, while increasing the longevity of flashSSS by approximately a factor of two.

acm symposium on applied computing | 2016

Optimizing hash partitioning for solid state drives

Mincheol Shin; Hongchan Roh; Wonmook Jung; Sanghyun Park

The use of flashSSDs has increased rapidly in a wide range of areas due to their superior energy efficiency, shorter access time, and higher bandwidth when compared to HDDs. The internal parallelism created by multiple flash memory packages embedded in a flashSSD is one of the unique features of flashSSDs. Many new DBMS technologies have been developed for flashSSDs, but query processing for flashSSDs has drawn less attention than other DBMS technologies. Hash partitioning is popularly used in query processing algorithms to materialize their intermediate results in an efficient manner. In this paper, we propose a novel hash partitioning algorithm that exploits the internal parallelism of flashSSDs. The devised hash partitioning method outperforms the traditional hash partitioning technique regardless of the amount of available main memory and independently of the buffer management strategy (blocked I/O vs. page-sized I/O). We implemented our method based on the source code of the PostgreSQL storage manager. PostgreSQL relation files created by the TPC-H workload were employed in the experiments. Our method was found to be up to 3.55 times faster than the traditional method with blocked I/O, and 2.36 times faster than the traditional method with page-sized I/O.

The KIPS Transactions: Part D | 2012

The Efficient Merge Operation in Log Buffer-Based Flash Translation Layer for Enhanced Random Writing

Jun-Hyuk Lee; Hongchan Roh; Sang-Hyun Park

Recently, flash memory has steadily grown in storage capacity while its price has fallen, which has made mass-storage SSDs (Solid State Drives) popular. Flash memory, however, has a number of drawbacks. To compensate for them, a special layer called the FTL (Flash Translation Layer) is needed. The FTL, which is essential for handling the hardware's restrictions efficiently, translates the logical sector numbers of the file system into the physical sector numbers of the flash memory. Among flash memory's restrictions, erase-before-write in particular is responsible for poor performance, and although many studies based on log blocks exist, several problems remain in operating mass-storage flash memory. When FAST, a log block-based FTL, is used under wide-locality workloads that cause random writes, merge operations occur while sectors in the data block remain unused. In other words, ineffective block thrashing occurs and the flash memory's performance degrades. When overwrites occur in a log block, the log block behaves like a cache, and this technique helps improve flash memory performance. To improve random writes, this study shows that the log block can operate not only as a cache but also as the entire flash memory, so that merge and erase operations are reduced by means of a distinct mapping table called the offset mapping table. The new FTL is named XAST (extensively-Associative Sector Translation). XAST manages the offset mapping table efficiently on the basis of spatial and temporal locality.

The KIPS Transactions: Part D | 2011

A novel page replacement policy associated with ACT-R inspired by human memory retrieval process

Hongchan Roh; Sang-Hyun Park

The cache structure, which is designed to ensure fast access to frequently accessed data, resides at various levels of the computer system hierarchy. Many studies on this cache structure have been conducted and thus many page-replacement algorithms have been proposed. Most page-replacement algorithms are designed on the basis of heuristic methods that use their own criteria, such as how recently pages were accessed and how often they are accessed. This data-retrieval process in computer systems is analogous to the human memory retrieval process, since the retrieval process of human memory depends on the frequency and recency of retrieval events as well. A recent study regarding human memory cognition revealed that the probability of retrieval success and the retrieval latency have a strong correlation with the frequency and recency of previous retrieval events. In this paper, we propose a novel page-replacement algorithm that utilizes the knowledge from this recent research on human memory cognition. Through a set of experiments, we demonstrated that our new method achieves a better hit ratio than the LRFU algorithm, which has been known as the best performing page-replacement algorithm for DBMS caches.
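
The abstract does not spell out how frequency and recency are combined. One plausible reading, shown here purely as an assumption, is the standard ACT-R base-level activation, B = ln(sum over past accesses of t^(-d)), where t is the time since each access and d is a decay parameter (0.5 is the conventional ACT-R default). The function names and the sample access history below are hypothetical, and the paper's actual policy may differ.

```python
import math


def activation(access_times, now, decay=0.5):
    """ACT-R base-level activation for one page.

    Assumes every access time is strictly earlier than `now`.
    """
    return math.log(sum((now - t) ** -decay for t in access_times))


def choose_victim(history, now, decay=0.5):
    """Return the page with the lowest activation as the eviction candidate."""
    return min(history, key=lambda page: activation(history[page], now, decay))


# Hypothetical access history: page -> timestamps of past accesses.
history = {
    "P1": [1, 2, 3],  # accessed often, but long ago
    "P2": [9],        # accessed once, very recently
    "P3": [4],        # accessed once, long ago
}
victim = choose_victim(history, now=10)
```

With this history the single old access of P3 gives it the lowest activation, so it is evicted before both the frequently used P1 and the recently used P2, showing how one score blends the two criteria that LRU and LFU treat separately.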

The KIPS Transactions: Part D | 2011

AS B-tree: A study on the enhancement of the insertion performance of B-tree on SSD

Sungho Kim; Hongchan Roh; Daewook Lee; Sang-Hyun Park

Recently, flash memory has been utilized as a main storage device in mobile devices, and flashSSDs are gaining popularity as a major storage device in laptop and desktop computers, and even in enterprise-level server machines. Unlike on HDDs, an overwrite operation on flash memory cannot be performed unless it is preceded by an erase operation on the same block. To address this, an FTL (Flash memory Translation Layer) is employed on flash memory. Even when a modified data block is overwritten at the same logical address, the FTL writes the updated data block to a different physical address from the previous one and maps the logical address to the new physical address. This enables flash memory to avoid the high block-erase cost. A flashSSD has an array of NAND flash memory packages, so it can access one or more flash memory packages in parallel at once. To take advantage of the internal parallelism of flashSSDs, it is beneficial for DBMSs to request I/O operations on sequential logical addresses. However, the B-tree structure, which is a representative index scheme of current relational DBMSs, produces excessive I/O operations in random order when its node structures are updated. Therefore, the original B-tree is not favorable for SSDs. In this paper, we propose the AS (Always Sequential) B-tree, which writes each updated node contiguously after the previously written node in the logical address space for every update operation. In the experiments, AS B-tree improved the B-tree's insertion performance by 21%.
