Dong Oh Son | Researchain

Publication

Featured research published by Dong Oh Son.

The Journal of Supercomputing | 2013

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems

Hong Jun Choi; Dong Oh Son; Seung Gu Kang; Jong-Myon Kim; Hsien-Hsin Lee; Cheol Hong Kim

Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), which has led to the wide use of heterogeneous computing systems that utilize the CPU and the GPU together. In heterogeneous computing systems, the efficiency of the scheduling scheme, which selects whether the CPU or the GPU executes an application, is one of the most critical factors in determining performance. This paper proposes a dynamic scheduling scheme that selects between the CPU and the GPU for each application based on estimated-execution-time information. The proposed scheme selects the device that minimizes the completion time, resulting in better system performance, although it requires a training period to collect the execution history. According to our simulations, the proposed estimated-execution-time scheduling can improve the utilization of the CPU and the GPU compared to existing scheduling schemes, resulting in reduced execution time and enhanced energy efficiency of heterogeneous computing systems.
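
The device selection described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name EstimatedTimeScheduler, the use of a plain average over past run times as the estimate, and the rule of trying any device without history first (standing in for the training period) are all assumptions made for illustration.

```python
from collections import defaultdict


class EstimatedTimeScheduler:
    """Hypothetical sketch: pick the device with the earliest estimated completion."""

    def __init__(self, devices=("cpu", "gpu")):
        self.devices = devices
        # (app, device) -> list of observed execution times (the execution history)
        self.history = defaultdict(list)
        # Time at which each device is expected to become free
        self.busy_until = {d: 0.0 for d in devices}

    def record(self, app, device, seconds):
        """Store one observed execution time."""
        self.history[(app, device)].append(seconds)

    def estimate(self, app, device):
        """Average of past runs, or None if the app never ran on this device."""
        runs = self.history[(app, device)]
        return sum(runs) / len(runs) if runs else None

    def select(self, app, now=0.0):
        # Assumed training rule: first try any device with no history for this app.
        for d in self.devices:
            if self.estimate(app, d) is None:
                return d
        # Otherwise minimise (time the device becomes free + estimated run time).
        return min(
            self.devices,
            key=lambda d: max(now, self.busy_until[d]) + self.estimate(app, d),
        )

    def dispatch(self, app, now=0.0):
        """Select a device and reserve it for the estimated duration."""
        d = self.select(app, now)
        est = self.estimate(app, d) or 0.0
        self.busy_until[d] = max(now, self.busy_until[d]) + est
        return d
```

In this sketch a faster device can still lose the selection when its queue is long, because the comparison uses estimated completion time rather than raw execution time.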

Archive | 2016

CTA-Aware Dynamic Scheduling Scheme for Streaming Multiprocessors in High-Performance GPUs

Dong Oh Son; Cong Thuan Do; Hong Jun Choi; Jong-Myon Kim; Jaehyung Park; Cheol Hong Kim

GPGPUs can provide powerful computational capability and are employed to execute both graphics and general-purpose applications. Hardware resource utilization is one of the most important factors in determining GPGPU performance. For GPGPUs, multiple-application execution can increase data parallelism, resulting in high resource utilization. However, applications have different execution times depending on their workload sizes. Therefore, if one application completes earlier than the others, resource underutilization may occur because the hardware resources allocated to the early-completed application become idle. In this work, a CTA-aware dynamic streaming multiprocessor (SM) scheduling scheme is proposed for multiple-application execution in the GPGPU to efficiently manage hardware resources. Compared to the baseline architecture, the proposed CTA-aware dynamic SM scheduling scheme can increase GPU performance by up to 25.6% on average.
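
The underutilization problem in this abstract can be shown with a toy model. The sketch below is an assumption-laden illustration, not the paper's scheme: it assumes each SM retires exactly one CTA per time step, and the function names and the even split of spare SMs among running applications are hypothetical.

```python
import math


def static_partition_steps(ctas, sms_per_app):
    """Steps to finish all apps when each app keeps its fixed SM share."""
    return max(math.ceil(c / s) for c, s in zip(ctas, sms_per_app))


def dynamic_partition_steps(ctas, sms_per_app):
    """Steps to finish when SMs of finished apps are handed to running apps."""
    remaining = list(ctas)
    total_sms = sum(sms_per_app)
    steps = 0
    while any(r > 0 for r in remaining):
        active = [i for i, r in enumerate(remaining) if r > 0]
        # SMs that belonged to already finished applications would sit idle
        spare = total_sms - sum(sms_per_app[i] for i in active)
        for i in active:
            # Assumed policy: split the spare SMs evenly among active apps
            share = sms_per_app[i] + spare // len(active)
            remaining[i] = max(0, remaining[i] - share)
        steps += 1
    return steps
```

With two applications of 8 and 40 CTAs and 4 SMs each, the static split needs 10 steps in this model, while handing over the idle SMs finishes in 6.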

Archive | 2016

A New Prefetch Policy for Data Filter Cache in Energy-Aware Embedded Systems

Dong Oh Son; Cong Thuan Do; Hong Jun Choi; Jong-Myon Kim; Ji-Seung Nam; Cheol Hong Kim

As process technology scales down, energy consumption in embedded processors becomes a crucial issue. In embedded processors, the data cache accounts for a considerable portion of total dynamic energy consumption. In this paper, we propose a novel energy-efficient Prefetch Data Filter cache (PDF-cache) technique that filters cache accesses by using the data cache access pattern observed when a loop instruction is executed. In the proposed architecture, accesses to the data cache are partly migrated to the PDF-cache, which is very small. According to our experimental results, the proposed cache architecture with the PDF-cache can reduce dynamic energy consumption by about 7.1% on average compared to the baseline, with little storage overhead.

Archive | 2018

Cache Reuse Aware Replacement Policy for Improving GPU Cache Performance

Dong Oh Son; Gwang Bok Kim; Jong-Myon Kim; Cheol Hong Kim

The performance of computing systems has improved significantly over several decades. However, increasing the throughput of recent CPUs (Central Processing Units) is restricted by power consumption and thermal issues. GPUs (Graphics Processing Units) are recognized as an efficient computing platform with powerful hardware resources to support CPUs in computing systems. Unlike CPUs, GPUs contain a large number of CUDA (Compute Unified Device Architecture) cores; hence, some cache blocks are referenced repeatedly. If those cache blocks reside in the cache for a long time, hit rates can be improved. On the other hand, many cache blocks are referenced only once and never again. These blocks waste cache space, resulting in reduced GPU performance. The conventional LRU replacement policy does not distinguish non-reused cache blocks from frequently reused cache blocks. In this paper, a new cache replacement policy based on the reuse pattern of cache blocks is proposed. The proposed policy manages cache blocks by separating reused cache blocks from thrashing cache blocks. According to simulation results, the proposed cache reuse replacement policy can increase IPC by up to 4.4% compared to the conventional GPU architecture.
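
The idea of separating reused blocks from thrashing blocks can be sketched as follows. This is only one possible reading of the abstract, not the published policy: the class name ReuseAwareCache, the single "reused" flag per block, and the fallback to plain LRU are assumptions.

```python
from collections import OrderedDict


class ReuseAwareCache:
    """Hypothetical sketch: evict never-reused blocks before reused ones."""

    def __init__(self, capacity):
        self.capacity = capacity
        # tag -> reused flag, ordered from least to most recently used
        self.blocks = OrderedDict()

    def access(self, tag):
        """Return True on a hit and False on a miss."""
        if tag in self.blocks:
            self.blocks[tag] = True       # the block has now shown reuse
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[tag] = False          # new block, no reuse observed yet
        return False

    def _evict(self):
        # Prefer the least recently used block that was never reused.
        for tag, reused in self.blocks.items():
            if not reused:
                del self.blocks[tag]
                return
        # Every block has been reused: fall back to plain LRU.
        self.blocks.popitem(last=False)
```

For the access sequence A, A, B, C with a capacity of two, plain LRU would evict A when C arrives, whereas this sketch evicts the single-use block B and keeps A. A practical design would also need to age or reset the reuse flags so that stale blocks do not stay protected indefinitely; that is omitted here.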

Cluster Computing | 2017

A dynamic CTA scheduling scheme for massive parallel computing

Dong Oh Son; Cong Thuan Do; Hong Jun Choi; Ji-Seung Nam; Cheol Hong Kim

Recent computing devices process massive amounts of parallel data, which requires extensive computing hardware. To satisfy the increasing computing demand, GPUs providing powerful computational capability are employed to execute both graphics and general-purpose applications (GPGPUs). In the GPGPU, executing multiple applications together can increase data parallelism, resulting in high resource utilization. Improving the resource utilization of the GPGPU can increase GPGPU performance. However, different kinds of applications have different execution times depending on their workload sizes. Therefore, if one application completes earlier than the others, resource underutilization may occur because the hardware resources allocated to the early-completed application become idle. In this work, a CTA-aware dynamic streaming multiprocessor (SM) scheduling scheme is proposed for multiple-application execution in the GPGPU to efficiently manage hardware resources. Simulation results show that the proposed CTA-aware dynamic SM scheduling scheme can increase GPU performance by up to 25.6% on average.

Seventh International Conference on Graphic and Image Processing (ICGIP 2015) | 2015

Impact of memory bottleneck on the performance of graphics processing units

Dong Oh Son; Hong Jun Choi; Jong-Myon Kim; Cheol Hong Kim

Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resources in the GPU efficiently is a challenging problem, since the GPU architecture is totally different from the traditional CPU architecture. To solve this problem, many studies have focused on techniques for improving system performance using GPUs. In this work, we analyze GPU performance while varying GPU parameters such as the number of cores and the clock frequency. According to our simulations, GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and the clock frequency increase, respectively. However, performance saturates when memory bottlenecks occur due to the large number of data requests to memory. GPU performance can be improved when the memory bottleneck is reduced by changing GPU parameters dynamically.

International Conference on IT Convergence and Security (ICITCS) | 2014

Analysis on the Power Efficiency of Mobile Systems Varying Device Parameters

Dong Oh Son; Hong Jun Choi; Jong-Myon Kim; Cheol Hong Kim

Recent mobile devices such as smartphones can provide diverse functions by supporting various kinds of applications. To maximize the efficiency of mobile devices, power consumption should be considered, since it is strongly related to battery life. This paper measures and analyzes the power consumption of smart devices while varying critical configuration components such as processor type, display device, and operating system. Additionally, various kinds of applications are executed to cover the diverse characteristics of different applications. As we predicted, increasing processor complexity and display size results in more power consumption. Compared to the idle case, Android consumes 161% more power on average, whereas iOS consumes 142% more power on average. Therefore, the power efficiency of iOS is better than that of Android by 19.7% on average. This is because hardware/software optimization in iOS is better than in Android.

International Conference on IT Convergence and Security (ICITCS) | 2014

Impact of Clock Frequency and Number of Cores on GPU Performance

Hong Jun Choi; Dong Oh Son; Cheol Hong Kim; Jong-Myon Kim

Modern graphics processing units (GPUs) containing massively parallel hardware have become more flexible with unified shader cores, which can run diverse graphics operations. Moreover, programmers can easily run general-purpose applications on GPUs, since GPU vendors provide user-friendly application programming interfaces (APIs). Improving system performance using GPUs has been studied intensively. Studying the GPU architecture is challenging, because it is totally different from the traditional CPU architecture. This paper analyzes GPU performance as a function of GPU parameters, varying the number of cores and the clock frequency. According to our simulations, GPU performance improves by 125.8% and 16.2% on average as the number of cores and the clock frequency increase, respectively. However, performance saturates when the memory system cannot service data requests efficiently, resulting in a memory bottleneck. Consequently, the memory bottleneck problem should be considered for efficient GPU architecture design.

International Conference on IT Convergence and Security (ICITCS) | 2014

A Novel Prefetch Technique for High Performance Embedded System

Hong Jun Choi; Dong Oh Son; Cheol Hong Kim; Jong-Myon Kim

The performance of embedded systems can be improved by increasing the hit rate of the last-level cache (LLC). To enhance the LLC hit rate, we propose a new prefetch technique. The proposed technique fetches data from main memory prior to the actual requests in order to hide the long main-memory latency. To support the proposed technique, we introduce a new structure, the LLC buffer, which holds several memory blocks near the previously referenced memory block. When the LLC capacity is insufficient, the proposed prefetch technique can improve the performance of embedded systems significantly.
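
A minimal sketch of how an LLC buffer holding blocks near the previously referenced block might behave is given below. The abstract does not specify the details, so everything here is an assumption: the class name LLCWithPrefetchBuffer, prefetching only the next sequential blocks, refilling the buffer on every LLC miss, and the LRU-managed LLC.

```python
class LLCWithPrefetchBuffer:
    """Hypothetical sketch of an LLC backed by a small prefetch buffer."""

    def __init__(self, llc_blocks, buffer_blocks=4):
        self.llc_blocks = llc_blocks
        self.buffer_blocks = buffer_blocks
        self.llc = []          # block addresses, least recently used first
        self.buffer = set()    # prefetched neighbours of the last missed block
        self.demand_misses = 0 # requests that had to wait for main memory

    def _fill_llc(self, block):
        # Insert a block into the LLC, evicting the LRU block if it is full.
        if len(self.llc) >= self.llc_blocks:
            self.llc.pop(0)
        self.llc.append(block)

    def access(self, block):
        """Return 'llc_hit', 'buffer_hit', or 'miss' for a block address."""
        if block in self.llc:
            self.llc.remove(block)
            self.llc.append(block)  # refresh the LRU position
            return "llc_hit"
        if block in self.buffer:
            result = "buffer_hit"   # served without waiting for main memory
        else:
            self.demand_misses += 1
            result = "miss"
        self._fill_llc(block)
        # Assumed policy: prefetch the next sequential blocks after this one.
        self.buffer = {block + i for i in range(1, self.buffer_blocks + 1)}
        return result
```

In this model a sequential stream larger than the LLC still avoids most demand misses, which matches the abstract's remark that the technique helps most when LLC capacity is insufficient. Prefetch traffic to main memory is not counted here.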

Journal of KIISE | 2014

A New Cache Replacement Policy for Improving Last Level Cache Performance

Cong Thuan Do; Dong Oh Son; Jong-Myon Kim; Cheol Hong Kim

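Cache replacement policies were developed to reduce cache misses. The performance of the cache replacement policy is important for bridging the speed gap between the microprocessor and main memory. LRU is the most common cache replacement policy, and most microprocessors use it. However, according to recent studies, the performance gap between LRU and the optimal replacement (OPT) policy is very large. Although the performance of LRU has been validated by many studies, the gap between LRU and OPT grows as cache associativity increases. This paper proposes a cache replacement policy that builds on the conventional LRU policy to improve cache performance. The proposed policy selects the block to be replaced according to the access rate of the cache blocks. With a 512KB L2 cache, the proposed policy reduces the miss rate by 15% on average compared to the conventional LRU policy and improves processor performance by 4.7%.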
