Hong Jun Choi | Researchain

Publication

Featured research published by Hong Jun Choi.

The Journal of Supercomputing | 2013

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems

Hong Jun Choi; Dong Oh Son; Seung Gu Kang; Jong-Myon Kim; Hsien-Hsin Lee; Cheol Hong Kim

Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), which has led to the wide use of heterogeneous computing systems that utilize the CPU and the GPU together. In heterogeneous computing systems, the efficiency of the scheduling scheme, which selects whether the CPU or the GPU executes an application, is one of the most critical factors in determining performance. This paper proposes a dynamic scheduling scheme that selects between the CPU and the GPU for executing an application based on estimated-execution-time information. The proposed scheduling scheme selects the device that minimizes the completion time, resulting in better system performance, even though it requires a training period to collect the execution history. According to our simulations, the proposed estimated-execution-time scheduling can improve the utilization of the CPU and the GPU compared to existing scheduling schemes, resulting in reduced execution time and enhanced energy efficiency of heterogeneous computing systems.
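The selection rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name, the averaging of past run times, and the `training_runs` parameter are assumptions made for illustration only.

```python
from collections import defaultdict


class EstimatedTimeScheduler:
    """Hypothetical sketch: choose CPU or GPU by estimated completion time."""

    def __init__(self, devices=("cpu", "gpu"), training_runs=1):
        self.devices = devices
        self.training_runs = training_runs          # assumed length of the training period
        self.history = defaultdict(list)            # (app, device) -> observed run times
        self.busy_until = {d: 0.0 for d in devices}  # time at which each device is free

    def estimate(self, app, device):
        """Estimated execution time: mean of recorded runs (an assumption)."""
        runs = self.history[(app, device)]
        return sum(runs) / len(runs) if runs else None

    def select(self, app, now=0.0):
        # Training period: run on any device that lacks history for this app.
        for d in self.devices:
            if len(self.history[(app, d)]) < self.training_runs:
                return d

        # Afterwards: pick the device with the earliest estimated completion.
        def completion(d):
            return max(now, self.busy_until[d]) + self.estimate(app, d)

        return min(self.devices, key=completion)

    def record(self, app, device, start, elapsed):
        """Store an observed run and mark the device busy until it finishes."""
        self.history[(app, device)].append(elapsed)
        self.busy_until[device] = max(self.busy_until[device], start + elapsed)
```

In this sketch a faster device can still lose the selection when its queue is long, because the comparison uses completion time rather than raw execution time.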

International Conference on Computational Science and Its Applications | 2012

Adaptive dynamic frequency scaling for thermal-aware 3D multi-core processors

Hong Jun Choi; Young Jin Park; Hsien-Hsin Lee; Cheol Hong Kim

3D integration technology can provide significant benefits of reduced interconnection delay and low power consumption in designing multi-core processors. However, 3D integration technology magnifies the thermal challenges in multi-core processors due to the high power density caused by stacking multiple layers vertically. For this reason, the 3D multi-core architecture cannot be practical without proper solutions to the thermal problems, such as Dynamic Frequency Scaling (DFS). This paper investigates how DFS handles the thermal problems in 3D multi-core processors from the perspective of the function-unit level. We also propose an adaptive DFS technique to mitigate the thermal problems in 3D multi-core processors by assigning different DFS levels to each core based on the corresponding cooling efficiency. Experimental results show that the proposed adaptive DFS technique reduces the peak temperature of 3D multi-core processors by up to 10.35°C compared to the conventional DFS technique, leading to improved reliability.
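The idea of giving each core its own DFS level according to cooling efficiency can be sketched as follows. The threshold, the frequency levels, and the linear mapping from cooling efficiency to a level are all hypothetical values chosen for illustration; the abstract does not specify them.

```python
def adaptive_dfs_level(temp_c, cooling_efficiency,
                       threshold_c=90.0, levels=(1.0, 0.75, 0.5, 0.25)):
    """Return a frequency scale factor for one core (hypothetical policy).

    cooling_efficiency is assumed to lie in [0, 1], where 1 means the core
    sits close to the heat sink and 0 means it is far from it.
    """
    if temp_c < threshold_c:
        return levels[0]  # below the thermal threshold: run at full frequency
    # Above the threshold, poorly cooled cores are throttled more aggressively.
    step = 1 + round((1.0 - cooling_efficiency) * (len(levels) - 2))
    return levels[min(step, len(levels) - 1)]


# Usage: two overheated cores in different layers of the stack.
near_sink = adaptive_dfs_level(95.0, cooling_efficiency=1.0)
far_from_sink = adaptive_dfs_level(95.0, cooling_efficiency=0.0)
```

A conventional DFS scheme would correspond to applying the same level to every overheated core regardless of its position.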

Symposium/Workshop on Electronic Design, Test and Applications | 2010

Energy-aware Filter Cache Architecture for Multicore Processors

Young Jin Park; Hong Jun Choi; Cheol Hong Kim; Jong-Myon Kim

Energy consumption as well as performance should be considered when designing high-performance multicore processors. The energy consumed in the instruction cache accounts for a significant portion of total processor energy consumption. Therefore, energy-aware instruction cache design techniques are essential for high-performance multicore processors. In this paper, we propose a new instruction cache architecture for multicore processors, based on a level-0 cache composed of a filter cache and a victim cache together. The proposed architecture reduces the energy consumption in the instruction cache by reducing the number of accesses to the level-1 instruction cache. We evaluate the proposed design using a simulation infrastructure based on SimpleScalar and CACTI. Simulation results show that the proposed technique reduces the energy consumption in the instruction cache by up to 3.4% compared to the conventional filter cache architecture. Moreover, the proposed architecture shows better performance than the conventional filter cache architecture.
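A toy model can show how a level-0 cache that pairs a filter cache with a victim cache cuts level-1 accesses. This is only a sketch under assumptions: the direct-mapped filter cache, the FIFO victim cache, and the sizes are illustrative choices, not details taken from the paper.

```python
from collections import OrderedDict


class Level0Cache:
    """Hypothetical L0: direct-mapped filter cache plus a small victim cache."""

    def __init__(self, filter_lines=4, victim_lines=2):
        self.filter = [None] * filter_lines  # index -> cached block address
        self.victim = OrderedDict()          # fully associative, FIFO eviction
        self.victim_lines = victim_lines
        self.l1_accesses = 0                 # accesses that reach the L1 cache

    def fetch(self, block):
        """Return which structure served the instruction block."""
        idx = block % len(self.filter)
        if self.filter[idx] == block:
            return "filter"
        evicted = self.filter[idx]
        if block in self.victim:
            del self.victim[block]           # served from the victim cache
            source = "victim"
        else:
            self.l1_accesses += 1            # only now is L1 accessed
            source = "l1"
        self.filter[idx] = block
        if evicted is not None:
            self.victim[evicted] = True      # keep the displaced block nearby
            if len(self.victim) > self.victim_lines:
                self.victim.popitem(last=False)
        return source
```

In this model, two blocks that conflict in the filter cache ping-pong between the filter and victim caches instead of repeatedly accessing L1.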

The Journal of the Korea Contents Association | 2013

Analysis of Impact of Correlation Between Hardware Configuration and Branch Handling Methods Executing General Purpose Applications

Hong Jun Choi; Cheol Hong Kim

Due to the increased computing power and flexibility of GPUs, recent GPUs execute general-purpose parallel applications as well as graphics applications. Programmers can use GPGPU through the APIs provided by GPU vendors. Unfortunately, the computational resources of the GPU are not fully utilized when executing general-purpose applications because of frequent branch instructions. To handle the branch problem, several warp formations have been proposed. Intuitively, we expect that the warp formations providing higher computational resource utilization show higher performance. Contrary to our expectations, according to simulation results, the performance of the warp formation providing better utilization is lower than that of the warp formation providing worse utilization. This is because a warp formation providing high utilization causes a serious memory bottleneck due to increased memory requests. Therefore, a warp formation providing high computational utilization cannot guarantee high performance without proper hardware resources. For this reason, we analyze the correlation between hardware configuration and warp formation. Our simulation results provide a guideline for solving the underutilization problem due to branch instructions when designing recent GPUs.

International Conference on IT Convergence and Security, ICITCS | 2014

Analysis on the Power Efficiency of Mobile Systems Varying Device Parameters

Dong Oh Son; Hong Jun Choi; Jong-Myon Kim; Cheol Hong Kim

Recent mobile devices such as smartphones can provide diverse functions by supporting various kinds of applications. To maximize the efficiency of mobile devices, power consumption should be considered since it is strongly related to battery life. This paper measures and analyzes the power consumption of smart devices while varying critical configuration components such as processor type, display device, and operating system. Additionally, various kinds of applications are executed to cover the diverse characteristics of different applications. As we predicted, increasing processor complexity and display size results in more power consumption. Compared to the idle case, Android consumes 161% more power on average whereas iOS consumes 142% more power on average. This indicates that the power efficiency of iOS is better than that of Android by 19.7% on average. This is because hardware/software optimization in iOS is better than in Android.

International Conference on IT Convergence and Security, ICITCS | 2014

Impact of Clock Frequency and Number of Cores on GPU Performance

Hong Jun Choi; Dong Oh Son; Cheol Hong Kim; Jong-Myon Kim

Modern graphics processing units (GPUs) containing massively parallel hardware have become more flexible with unified shader cores, which can run diverse graphics operations. Moreover, programmers can run general-purpose applications on GPUs easily, since GPU vendors provide user-friendly application programming interfaces (APIs). Improving system performance using GPUs has been studied intensively. Studying the GPU architecture is challenging, because the GPU architecture is totally different from the traditional CPU architecture. This paper analyzes GPU performance while varying the number of cores and the clock frequency. According to our simulations, the GPU performance improves by 125.8% and 16.2% on average as the number of cores and the clock frequency increase, respectively. However, the performance saturates when the memory system cannot service the data requests efficiently, resulting in a memory bottleneck. Consequently, the memory bottleneck problem should be considered for efficient GPU architecture design.

International Conference on IT Convergence and Security, ICITCS | 2014

A Novel Prefetch Technique for High Performance Embedded System

Hong Jun Choi; Dong Oh Son; Cheol Hong Kim; Jong-Myon Kim

The performance of embedded systems can be improved by increasing the hit rates of last-level caches (LLC). To enhance the hit rates of the LLC, we propose a new prefetch technique. The proposed prefetch technique fetches data from main memory prior to actual requests to hide the long latency of main memory. To support the proposed technique, we introduce a new structure, the LLC buffer, which contains several memory blocks near the previously referenced memory block. When the LLC capacity is insufficient, the proposed prefetch technique can improve the performance of embedded systems significantly.
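The role of a buffer holding blocks near the previously referenced block can be sketched with a small model. The class name, the symmetric neighbourhood of `degree` blocks, and the replace-on-every-miss policy are assumptions for illustration; the abstract does not describe the actual buffer organization.

```python
class LLCPrefetchBuffer:
    """Hypothetical LLC buffer holding neighbours of the last referenced block."""

    def __init__(self, degree=2):
        self.degree = degree          # assumed number of neighbours on each side
        self.buffer = set()           # block addresses currently prefetched
        self.demand_memory_accesses = 0  # demand fetches only; prefetch traffic is not counted

    def on_llc_miss(self, block):
        """Handle an LLC miss; return True if the buffer already held the block."""
        hit = block in self.buffer
        if not hit:
            self.demand_memory_accesses += 1  # pay the full main-memory latency
        # Prefetch the blocks adjacent to the block that was just referenced.
        self.buffer = {
            block + d
            for d in range(-self.degree, self.degree + 1)
            if d != 0 and block + d >= 0
        }
        return hit
```

With a sequential access stream, only the first miss in this model goes to main memory on demand; the following misses are served from the buffer.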

International Conference on Information Science and Applications | 2013

Analysis of Memory Management Policies for Heterogeneous Cloud Computing

Dong Oh Son; Hong Jun Choi; Jae Hyung Park; Cheol Hong Kim

Cloud computing has become a mainstream way of providing computing resources as services. Recent cloud computing systems are composed of homogeneous hardware resources. Heterogeneous cloud computing can provide better performance and energy efficiency than homogeneous cloud computing. Therefore, heterogeneous cloud computing is expected to be used widely. In general, the memory management policy has a huge impact on the performance of heterogeneous computing systems. In this paper, we analyze three memory management policies for heterogeneous cloud computing systems. According to our experimental results, the dynamic partition memory management policy provides better performance than the static partition memory management policy by 4.65% on average.

International Journal of Computer and Communication Engineering | 2013

Impact of Warp Formation on GPU Performance

Hong Jun Choi; Dong Oh Son; Cheol Hong Kim

As the computing power of the GPU has increased dramatically, the GPU is widely used for general-purpose parallel applications as well as graphics applications. In particular, programmers using the GPU can easily create multiple threads with the help of APIs provided by GPU vendors. In GPU architecture, threads are grouped into a warp to run on the SIMD pipeline, leading to high performance. However, the computational resources of the GPU are not fully utilized in executing general-purpose applications due to control-flow instructions, resulting in performance degradation. To improve GPU performance, several warp formations for handling branch divergence due to control-flow instructions have been proposed. In this work, we analyze GPU performance according to warp formations with a real GPU hardware configuration. Our simulation results show that the warp formation providing high hardware utilization does not guarantee high performance if hardware resources are not fully supported. Therefore, hardware configuration should be considered together with hardware utilization to improve GPU performance by using warp formation.
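The underutilization caused by branch divergence inside a warp can be made concrete with a short calculation. The sketch assumes a simple model in which the two sides of a branch are executed one after the other with inactive lanes masked off; the function name and the path-length parameters are hypothetical.

```python
def simd_utilization(taken_mask, taken_len=1, not_taken_len=1):
    """Fraction of SIMD lane-cycles doing useful work for one branch.

    taken_mask: non-empty list of booleans, one per thread in the warp.
    taken_len / not_taken_len: instruction counts of the two branch paths.
    Model assumption: divergent paths are serialized with masked lanes.
    """
    width = len(taken_mask)
    taken = sum(taken_mask)
    not_taken = width - taken
    # Useful work: every thread executes only its own path.
    useful = taken * taken_len + not_taken * not_taken_len
    # Cycles spent: each path that has at least one thread occupies all lanes.
    cycles = taken_len * (taken > 0) + not_taken_len * (not_taken > 0)
    return useful / (width * cycles)


# Usage: a 4-thread warp where one thread takes a long path.
u = simd_utilization([True, False, False, False], taken_len=30, not_taken_len=10)
```

Warp formation schemes aim to raise this ratio by regrouping threads that follow the same path, which is why the abstract stresses that higher utilization alone does not guarantee higher performance when memory resources are limited.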

Research in Applied Computation Symposium | 2011

Exploration of CPU/GPU co-execution: from the perspective of performance, energy, and temperature

Seung Gu Kang; Hong Jun Choi; Cheol Hong Kim; Sung Woo Chung; Dongseop Kwon; Joong Chae Na

In recent computing systems, CPUs have encountered situations in which they cannot meet the increasing throughput demands. To overcome the limits of CPUs in processing heavy tasks, especially for computer graphics, GPUs have been widely used. Therefore, the performance of up-to-date computing systems can be maximized when the task scheduling between the CPU and the GPU is optimized. In this paper, we analyze the system from the perspective of performance, energy efficiency, and temperature according to the execution methods between the CPU and the GPU. Experimental results show that the GPU leads to better efficiency than the CPU when a single application is executed. However, when two applications are executed, the GPU does not guarantee higher efficiency than the CPU, depending on the application characteristics.
