
Publication


Featured research published by Dong Hoon Choi.


BioMed Research International | 2013

Exploiting GPUs in Virtual Machine for BioCloud

Heeseung Jo; Jinkyu Jeong; Myoungho Lee; Dong Hoon Choi

Recently, biological applications have begun to be reimplemented to exploit the many cores of GPUs for better computational performance. By providing virtualized GPUs to VMs in a cloud computing environment, many biological applications can therefore move into the cloud to enhance their computational performance and utilize virtually unlimited cloud computing resources while reducing computation expenses. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on mechanisms for sharing GPUs among VMs, it cannot achieve sufficient performance for biological applications, for which computational throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI Express (PCI-E) channel. By allowing each VM to access the underlying GPUs directly, applications achieve almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug/unplug device features of the PCI-E channel. By adding or removing GPUs in each VM in an on-demand manner, VMs on the same physical host can time-share the GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in a cloud environment.
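The on-demand GPU multiplexing described in the abstract could be driven from the host with Xen's PCI hot-plug tooling. The following minimal sketch assumes the xl toolstack's pci-attach/pci-detach commands; the PCI address, domain names, and round-robin time slice are hypothetical placeholders, not the paper's actual scheduler.

// Sketch: time-share one GPU between two Xen VMs by hot-(un)plugging
// the PCI-E device. The BDF address and domain names are hypothetical.
#include <cstdlib>
#include <string>
#include <unistd.h>

static const std::string GPU_BDF = "0000:02:00.0";  // placeholder GPU address

// Attach the GPU to a VM via PCI direct pass-through.
void attach_gpu(const std::string& dom) {
    std::system(("xl pci-attach " + dom + " " + GPU_BDF).c_str());
}

// Detach the GPU so another VM can use it.
void detach_gpu(const std::string& dom) {
    std::system(("xl pci-detach " + dom + " " + GPU_BDF).c_str());
}

int main() {
    const std::string vms[] = {"bio-vm1", "bio-vm2"};  // illustrative domains
    for (;;) {                        // naive round-robin time sharing
        for (const auto& vm : vms) {
            attach_gpu(vm);
            sleep(60);                // let the VM run its GPU job slice
            detach_gpu(vm);
        }
    }
}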


Procedia Computer Science | 2013

Data Encryption on GPU for High-Performance Database Systems

Heeseung Jo; Seung-Tae Hong; Jae-Woo Chang; Dong Hoon Choi

Graphics processing units have proved their capability for general-purpose computing in many research areas. In this paper, we propose the mechanism and implementation of a database system that encrypts and decrypts data using a GPU. The proposed mechanism is mainly designed for a database system that requires data encryption and decryption to support a high security level, or for an ODBS. By exploiting the computational capability of the GPU, we achieve not only a fast encryption and decryption time per operation but also higher overall database system performance by offloading computation to the GPU. Moreover, the proposed system includes a mechanism that decides whether or not to offload computation to the GPU for further performance gain. We implemented the AES algorithm on the CUDA framework and integrated it with MySQL, a commodity database system. Our evaluation demonstrates that encryption and decryption on the GPU perform eight times better than on the CPU when the data size is 16 MB. We also show that the proposed system reduces CPU utilization and that the overall performance of the database system is improved by offloading the heavy encryption and decryption computation to the GPU.
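The offload decision mentioned above can be realized as a simple size threshold on the host. Below is a minimal sketch; encrypt_aes_cpu, encrypt_aes_gpu, and the 1 MB crossover point are hypothetical placeholders (the paper reports the GPU winning clearly at 16 MB), and the real decision logic may use more inputs than buffer size.

// Sketch: choose between the CPU and GPU AES paths based on payload size.
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the paper's CPU and CUDA AES implementations.
void encrypt_aes_cpu(const uint8_t* in, uint8_t* out, size_t n) { /* AES omitted */ }
void encrypt_aes_gpu(const uint8_t* in, uint8_t* out, size_t n) { /* CUDA AES omitted */ }

// Below this size, PCI-E transfer cost outweighs the GPU's throughput;
// the crossover point is illustrative and must be tuned per system.
static const size_t kGpuThreshold = 1 << 20;  // 1 MB

void encrypt(const uint8_t* in, uint8_t* out, size_t n) {
    if (n >= kGpuThreshold)
        encrypt_aes_gpu(in, out, n);  // offload large buffers to the GPU
    else
        encrypt_aes_cpu(in, out, n);  // keep small buffers on the CPU
}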


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

Memory-Efficient Parallelization of 3D Lattice Boltzmann Flow Solver on a GPU

Nhat-Phuong Tran; Myungho Lee; Dong Hoon Choi

The Lattice Boltzmann Method (LBM) is a powerful numerical simulation method for fluid flow. With its data-parallel nature and simple kernel structure, it is a promising candidate for parallel implementation on a GPU. The LBM, however, is heavily data-intensive and memory-bound. In particular, moving data to adjacent cells in the streaming phase of the LBM incurs many uncoalesced memory accesses on the GPU, which hurts overall performance. In this paper, we parallelize the LBM on a GPU by incorporating memory-efficient techniques such as a tiling optimization with data layout changes and a data update scheme known as the pull scheme. Furthermore, we developed optimization techniques such as removing branch divergence, reducing register usage, and reducing the number of double-precision floating-point instructions. Experimental results on an Nvidia Tesla K20 GPU show that our approach delivers up to 1105 MLUPS (Million Lattice Updates Per Second) and a 156-times speedup compared with a serial implementation.
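The pull scheme reads post-collision distributions from neighboring cells rather than scattering writes to them, which keeps the writes coalesced. The kernel below is a minimal sketch for a simplified D2Q5 lattice with a structure-of-arrays layout; the paper's actual solver is 3D (with many more directions) and adds the tiling and register optimizations, which are omitted here, and the collision step is reduced to a toy relaxation.

// Sketch: LBM streaming with the "pull" scheme on a D2Q5 lattice,
// structure-of-arrays layout (f[q][y][x] flattened).
__global__ void lbm_pull_step(const float* __restrict__ f_src,
                              float* __restrict__ f_dst,
                              int nx, int ny, float omega) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    // Direction vectors for D2Q5: rest, +x, -x, +y, -y.
    const int cx[5] = {0, 1, -1, 0, 0};
    const int cy[5] = {0, 0, 0, 1, -1};

    float f[5], rho = 0.0f;
    for (int q = 0; q < 5; ++q) {
        // Pull: read f_q from the upstream neighbor (periodic boundaries),
        // so the write below lands at this thread's own coalesced slot.
        int xs = (x - cx[q] + nx) % nx;
        int ys = (y - cy[q] + ny) % ny;
        f[q] = f_src[(q * ny + ys) * nx + xs];
        rho += f[q];
    }
    // Toy BGK-style relaxation toward a uniform equilibrium rho/5.
    for (int q = 0; q < 5; ++q) {
        float feq = rho / 5.0f;
        f_dst[(q * ny + y) * nx + x] = f[q] + omega * (feq - f[q]);
    }
}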


High Performance Computing and Communications | 2014

Optimizing Cache Locality for Irregular Data Accesses on Many-Core Intel Xeon Phi Accelerator Chip

Nhat Phuong Tran; Dong Hoon Choi; Myungho Lee

Many-core accelerator chips are becoming increasingly popular these days for their floating-point performance, which exceeds 1 Tflops per chip. Aho-Corasick (AC) is a multiple-pattern string matching algorithm commonly used in computer and network security and bioinformatics, among other areas. In order to match a number of string patterns simultaneously against the input text data, a Deterministic Finite Automaton (DFA) is constructed from the given set of pattern strings. The DFA is referenced almost randomly, whereas the input data is accessed sequentially. As the number of pattern strings increases, the irregular DFA accesses lead to poor cache locality and low overall performance. In this paper, we present a cache-locality-optimizing parallelization on a many-core accelerator chip, the Intel Xeon Phi. A given set of pattern strings is partitioned into multiple sets with a smaller number of patterns, so that multiple small DFAs are constructed instead of a single large DFA. The accesses to the multiple small DFAs produce significantly smaller cache footprints in each core's cache and result in impressive performance improvements. Experimental results on the Intel Xeon Phi 5110P show that our approach delivers up to a 2.00-times speedup compared with the previous approach using a single large DFA.
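The core idea, splitting the pattern set so that each resulting DFA fits in a core's cache, can be sketched as follows; the Dfa type, build_ac_dfa, and the round-robin assignment are hypothetical placeholders for the paper's construction.

// Sketch: split the patterns into K subsets and build one small AC DFA
// per subset, so each DFA's transition table fits in a core's cache.
#include <string>
#include <vector>

struct Dfa { /* transition table, failure links, output sets */ };

// Hypothetical stand-in for the Aho-Corasick construction.
Dfa build_ac_dfa(const std::vector<std::string>& patterns) { return Dfa{}; }

std::vector<Dfa> build_partitioned_dfas(
        const std::vector<std::string>& patterns, size_t num_parts) {
    std::vector<std::vector<std::string>> parts(num_parts);
    // Round-robin assignment keeps the subsets balanced in pattern count.
    for (size_t i = 0; i < patterns.size(); ++i)
        parts[i % num_parts].push_back(patterns[i]);

    std::vector<Dfa> dfas;
    for (const auto& p : parts)
        dfas.push_back(build_ac_dfa(p));  // each DFA is much smaller
    return dfas;
}
// Each core then scans the *whole* input against one small DFA, so its
// cache footprint is that DFA alone rather than one giant table.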


MUSIC | 2014

Performance Analysis of MapReduce-Based Distributed Systems for Iterative Data Processing Applications

Min Yoon; Hyeong-Il Kim; Dong Hoon Choi; Heeseung Jo; Jae-Woo Chang

Recently, research on big data has been actively pursued because big data are generated in various scientific applications, such as biology and astronomy. Accordingly, distributed data processing techniques have been studied to manage big data across a large number of servers. Meanwhile, some scientific applications, such as genome data analysis, require loop control when analyzing big data using a MapReduce framework. In this paper, we first describe the existing MapReduce-based distributed systems that support iterative data processing. In addition, we analyze the performance of these existing systems in terms of execution time for various scientific applications that require iterative data processing. Finally, based on the performance analysis, we discuss some requirements for a new MapReduce-based distributed system that supports iterative data processing efficiently.


MUSIC | 2014

Multi-stream Parallel String Matching on Kepler Architecture

Nhat-Phuong Tran; Myungho Lee; Sugwon Hong; Dong Hoon Choi

The Aho-Corasick (AC) algorithm is a commonly used string matching algorithm. It performs multiple-pattern matching for computer and network security and bioinformatics, among many other applications. These applications impose high computational requirements, so efficient parallelization of the AC algorithm is crucial. In this paper, we present a multi-stream parallelization approach for string matching using the AC algorithm on the latest Nvidia Kepler architecture. Our approach efficiently utilizes the Hyper-Q feature of the Kepler GPU, so that multiple streams generated by a number of OpenMP threads running on the host multicore processor can be efficiently executed on a large number of fine-grain processing cores. Experimental results show that our approach delivers up to 420 Gbps throughput on an Nvidia Tesla K20 GPU.
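The multi-stream structure, one CUDA stream per host OpenMP thread so that Hyper-Q can overlap copies and kernels from different streams, might look like the sketch below; the ac_match kernel, the chunking, and the launch configuration are hypothetical placeholders for the paper's matching code.

// Sketch: each OpenMP thread owns a private CUDA stream; on Kepler,
// Hyper-Q lets the per-chunk copies and kernels from different
// streams execute concurrently on the device.
#include <algorithm>
#include <omp.h>
#include <cuda_runtime.h>

__global__ void ac_match(const char* text, int n, int* hits) {
    // Placeholder: the real kernel walks the AC DFA over the text.
}

void match_chunks(const char* h_text, int total, int num_streams) {
    int chunk = (total + num_streams - 1) / num_streams;
    #pragma omp parallel num_threads(num_streams)
    {
        int tid = omp_get_thread_num();
        int off = tid * chunk;
        int n = std::min(chunk, total - off);
        if (n > 0) {
            cudaStream_t s;
            cudaStreamCreate(&s);
            char* d_text; int* d_hits;
            cudaMalloc(&d_text, n);
            cudaMalloc(&d_hits, sizeof(int));
            // Async copy + launch, both in this thread's private stream.
            cudaMemcpyAsync(d_text, h_text + off, n,
                            cudaMemcpyHostToDevice, s);
            ac_match<<<(n + 255) / 256, 256, 0, s>>>(d_text, n, d_hits);
            cudaStreamSynchronize(s);
            cudaFree(d_text); cudaFree(d_hits);
            cudaStreamDestroy(s);
        }
    }
}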


Applied Mechanics and Materials | 2013

GPU Virtualization using PCI Direct Pass-Through

Heeseung Jo; Myungho Lee; Dong Hoon Choi

Machine virtualization and cloud computing have been in the spotlight for the last several years. This trend stems from the effort to enhance machine utilization and reduce the cost of machine ownership. Meanwhile, in high performance computing, the graphics processing unit (GPU) has proved its capability for general-purpose computing in many research areas. Evolving from traditional APIs such as OpenGL and Direct3D, which program the GPU as a graphics device, NVIDIA's CUDA and OpenCL provide a more general programming environment. By supporting a memory access model, interfaces for accessing GPUs directly, and programming toolkits, they let users perform parallel computation using hundreds of GPU cores. In this paper, we propose a GPU virtualization mechanism that exploits GPUs in a virtualized cloud computing environment. Unlike previous work, which mostly reimplemented GPU programming APIs and virtual device drivers, our proposed mechanism uses direct pass-through of the PCI-E channel hosting the GPU. The main limitation of the previous approaches is virtualization overhead: since they focused on sharing the GPU among virtual machines, they reimplemented the GPU programming APIs at the virtual machine monitor (VMM) level, which incurred significant performance overhead. Moreover, if the APIs change, most of the reimplemented APIs must be reengineered. In our approach, by bypassing the virtual machine monitor layer with negligible overhead, the mechanism achieves computation performance similar to a bare-metal system and is transparent to the GPU programming APIs.
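For reference, this kind of pass-through assignment is expressed in the guest's Xen domain configuration. The fragment below is a minimal illustration; the PCI address is a hypothetical placeholder, and the device must first be hidden from the host (e.g., bound to pciback) before it can be assigned.

# Illustrative Xen domU configuration fragment. The BDF address is a
# placeholder; the GPU must be detached from dom0 before assignment.
pci = [ '0000:02:00.0' ]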


Concurrency and Computation: Practice and Experience | 2017

Enhancing network I/O performance for a virtualized Hadoop cluster

Jinkyu Jeong; Dong Hoon Choi; Heeseung Jo

The MapReduce programming model is used to process big data with Hadoop, one of the major cloud computing frameworks. With the increasing adoption of cloud computing, running the Hadoop framework on a virtualized cluster is a compelling approach to reducing costs and increasing efficiency. In this paper, we measure the performance of a virtualized network and analyze the impact of network performance on Hadoop workloads running on a virtualized cluster. We then propose a virtualized network I/O architecture as a novel optimization of a virtualized Hadoop cluster for a public/private cloud provider. The proposed network architecture combines traditional network configurations and achieves better performance for Hadoop workloads. We also show a better way to utilize the rack awareness feature of the Hadoop framework in the proposed computing environment. The evaluation demonstrates that the proposed network architecture and mechanisms improve performance by up to 4.1 times compared with a bridge network architecture. The novel architecture can even virtually match the performance of the expensive, hardware-based single-root I/O virtualization (SR-IOV) network architecture.


Scientific Programming | 2015

Cache locality-centric parallel string matching on many-core accelerator chips

Nhat-Phuong Tran; Myungho Lee; Dong Hoon Choi

The Aho-Corasick (AC) algorithm is a multiple-pattern string matching algorithm commonly used in computer and network security and bioinformatics, among many other areas. In order to meet the demanding computational requirements these applications impose, achieving high performance for the AC algorithm is crucial. In this paper, we present a high-performance parallelization of AC on many-core accelerator chips such as the Graphics Processing Unit (GPU) from Nvidia and the Intel Xeon Phi. Our parallelization approach significantly improves the cache locality of AC by partitioning a given set of string patterns into multiple smaller pattern sets in a space-efficient way. Using the multiple pattern sets, intensive pattern matching operations are conducted concurrently against the whole input text data. Compared with previous approaches, in which the input data is partitioned among multiple threads instead of partitioning the pattern set, our approach significantly improves performance. Experimental results show that our approach yields up to a 2.73-times speedup on the Nvidia K20 GPU and a 2.00-times speedup on the Intel Xeon Phi compared with the previous approach. Our parallel implementation delivers up to 693 Gbps throughput on the K20.
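On the GPU side, each thread block can then scan the entire input against one small DFA, for example by launching the kernel once per DFA in separate streams. The traversal sketch below assumes a flattened transition table (dfa[state * 256 + byte]) and a per-state accept flag; both are illustrative simplifications of the paper's data structures, and the overlap needed to catch matches straddling chunk boundaries is omitted.

// Sketch: scan the whole input with one small DFA whose transition
// table fits in cache. Launched as <<<1, threads, 0, stream>>> per DFA.
__global__ void match_small_dfa(const int* __restrict__ dfa,
                                const unsigned char* __restrict__ text,
                                int n,
                                const int* match_flag,   // 1 if state accepts
                                unsigned long long* match_count) {
    // Partition the text among this block's threads; boundary-overlap
    // handling is omitted for brevity.
    int stride = blockDim.x;
    int chunk = (n + stride - 1) / stride;
    int begin = threadIdx.x * chunk;
    int end = min(begin + chunk, n);

    int state = 0;
    for (int i = begin; i < end; ++i) {
        state = dfa[state * 256 + text[i]];   // one table lookup per byte
        if (match_flag[state])
            atomicAdd(match_count, 1ULL);     // record a pattern hit
    }
}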


International Conference on Conceptual Modeling | 2014

A Semi-clustering Scheme for High Performance PageRank on Hadoop

Seung-Tae Hong; Jeonghoon Lee; Jae-Woo Chang; Dong Hoon Choi

As global Internet business has evolved, large-scale graphs have become common. PageRank computation on large-scale graphs using Hadoop with the default data partitioning method suffers from poor performance, because Hadoop scatters even a set of directly connected vertices across arbitrary nodes. In this paper, we propose a semi-clustering scheme to address this problem and improve the performance of PageRank on Hadoop. Our scheme divides a graph into a set of semi-clusters, each consisting of connected vertices, and assigns each semi-cluster to a single data partition in order to reduce the cost of data exchange between nodes during the PageRank computation. The semi-clusters are merged and split before the PageRank computation in order to distribute a large-scale graph evenly across a number of data partitions. Our semi-clustering scheme drastically improves performance: the total elapsed time, including the cost of the semi-clustering computation, is reduced by up to 36%. Furthermore, the effectiveness of our scheme increases as the size of the graph increases.
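The merge-and-split balancing can be sketched independently of Hadoop, as below; the SemiCluster type, the single merging pass, and the size target are illustrative placeholders rather than the paper's exact heuristics.

// Sketch: balance semi-clusters to a target size before assigning each
// one to a single Hadoop data partition.
#include <algorithm>
#include <vector>

using SemiCluster = std::vector<long>;  // vertex ids of connected vertices

std::vector<SemiCluster> balance(std::vector<SemiCluster> clusters,
                                 size_t target) {
    std::vector<SemiCluster> out;
    SemiCluster pending;                     // accumulator for merges
    for (auto& c : clusters) {
        if (c.size() >= target) {
            // Split oversized clusters into target-sized pieces.
            for (size_t i = 0; i < c.size(); i += target)
                out.emplace_back(c.begin() + i,
                                 c.begin() + std::min(i + target, c.size()));
        } else {
            // Merge undersized clusters until they reach the target.
            pending.insert(pending.end(), c.begin(), c.end());
            if (pending.size() >= target) {
                out.push_back(std::move(pending));
                pending.clear();
            }
        }
    }
    if (!pending.empty()) out.push_back(std::move(pending));
    return out;  // each element maps to one data partition
}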

Collaboration


Dive into Dong Hoon Choi's collaboration.

Top Co-Authors

Heeseung Jo (Chonbuk National University)
Jae-Woo Chang (Chonbuk National University)
Seung-Tae Hong (Chonbuk National University)
Jinkyu Jeong (Sungkyunkwan University)
Hyeong-Il Kim (Chonbuk National University)
Jeonghoon Lee (Korea Institute of Science and Technology Information)
Min Yoon (Chonbuk National University)