Hong-Gyu Kim
Seoul National University
Publications
Featured research published by Hong-Gyu Kim.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2011
Jungwon Kim; Hong-Gyu Kim; Joo Hwan Lee; Jaejin Lee
In this paper, we propose an OpenCL framework that combines multiple GPUs and treats them as a single compute device. Providing a single virtual compute device image to the user makes an OpenCL application written for a single GPU portable to a platform with multiple GPU devices. It also lets the application exploit the full computing power of the multiple GPU devices and the total amount of GPU memory available in the platform. Our OpenCL framework automatically distributes at run time the OpenCL kernel written for a single GPU into multiple CUDA kernels that execute on the multiple GPU devices. It applies a run-time memory access range analysis to the kernel by performing a sampling run and identifies an optimal workload distribution for the kernel. To achieve a single compute device image, the runtime maintains a virtual device memory that is allocated in the main memory. The OpenCL runtime treats this memory as if it were the memory of a single GPU device and keeps it consistent with the memories of the multiple GPU devices. Our OpenCL-C-to-C translator generates the sampling code from the OpenCL kernel code, and our OpenCL-C-to-CUDA-C translator generates the CUDA kernel code for the distributed OpenCL kernel. We show the effectiveness of our OpenCL framework by implementing the OpenCL runtime and the two source-to-source translators. We evaluate its performance on a system that contains 8 GPUs using 11 OpenCL benchmark applications.
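The run-time workload distribution described above can be illustrated with a minimal sketch. This is not the paper's implementation; `partition_ndrange` and `sample_access_range` are hypothetical names, and the sketch assumes a 1-D NDRange and a monotone affine buffer-index function, so a "sampling run" at a chunk's boundary work-items bounds the buffer range that chunk accesses.

```python
# Sketch: split a 1-D NDRange across devices, then sample the kernel's
# buffer index at each chunk's boundary work-items to bound the memory
# range that chunk accesses (valid for monotone affine index functions).

def partition_ndrange(global_size, num_devices):
    """Split [0, global_size) into near-equal contiguous chunks."""
    base, rem = divmod(global_size, num_devices)
    chunks, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < rem else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

def sample_access_range(index_fn, chunk):
    """Evaluate the index function at the chunk's first and last
    work-item; for monotone accesses these bound the whole range."""
    lo, hi = chunk[0], chunk[1] - 1
    idxs = [index_fn(lo), index_fn(hi)]
    return min(idxs), max(idxs) + 1

chunks = partition_ndrange(1000, 4)
# e.g. a kernel that reads buf[2 * get_global_id(0) + 1]
ranges = [sample_access_range(lambda gid: 2 * gid + 1, c) for c in chunks]
```

With the ranges known, the runtime only needs to copy each device's slice of the virtual device memory, rather than the whole buffer.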
International Conference on Parallel Architectures and Compilation Techniques | 2010
Jaejin Lee; Jungwon Kim; Sangmin Seo; Seungkyun Kim; Jungho Park; Hong-Gyu Kim; Thanh Tuan Dao; Yongjin Cho; Sung Jong Seo; Seung Hak Lee; Seung Mo Cho; Hyo Jung Song; Sang-bum Suh; Jong-Deok Choi
In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Each accelerator core, instead, has a small internal local memory. Our OpenCL runtime is based on software-managed caches and coherence protocols that guarantee OpenCL memory consistency to overcome the limited size of the local memory. To boost performance, the runtime relies on three source-code transformation techniques, work-item coalescing, web-based variable expansion, and preload-poststore buffering, performed by our OpenCL C source-to-source translator. Work-item coalescing is a procedure that serializes multiple SPMD-like tasks that execute concurrently in the presence of barriers and runs them sequentially on a single accelerator core. It requires the web-based variable expansion technique to allocate local memory for private variables. Preload-poststore buffering is a buffering technique that eliminates the overhead of software cache accesses. Together with work-item coalescing, it has a synergistic effect on boosting performance. We show the effectiveness of our OpenCL framework by evaluating its performance on a system that consists of two Cell BE processors. The experimental results show that our approach is promising.
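The effect of work-item coalescing can be sketched as follows. This is an illustration, not the translator's output: a kernel with one barrier is split at the barrier into two loops over the work-items, and a private variable that lives across the barrier (`tmp`) is expanded into a per-work-item array, which is the role variable expansion plays above.

```python
# Sketch: a work-group kernel with one barrier, serialized for a single
# core. Phase 1 (code before the barrier) runs for every work-item
# before phase 2 (code after the barrier) starts, preserving barrier
# semantics; tmp is "expanded" from a private scalar to an array.

def run_workgroup(group_size, input_data):
    tmp = [0] * group_size          # expanded private variable
    out = [0] * group_size
    # phase 1: each work-item doubles its input
    for lid in range(group_size):
        tmp[lid] = input_data[lid] * 2
    # --- barrier: the phase-1 loop has fully completed here ---
    # phase 2: each work-item reads its neighbour's phase-1 result
    for lid in range(group_size):
        out[lid] = tmp[(lid + 1) % group_size]
    return out

print(run_workgroup(4, [1, 2, 3, 4]))  # prints [4, 6, 8, 2]
```

Without the split into two loops, work-item 0 could read `tmp[1]` before work-item 1 had written it, which is exactly what the barrier forbids.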
Journal of Parallel and Distributed Computing | 2010
Jaejin Lee; Jungho Park; Hong-Gyu Kim; Changhee Jung; Daeseob Lim; SangYong Han
In simultaneous multithreading (SMT) multiprocessors, using all the available threads (logical processors) to run a parallel loop is not always beneficial due to the interference between threads and parallel execution overhead. To maximize the performance of a parallel loop on an SMT multiprocessor, it is important to find an appropriate number of threads for executing the parallel loop. This article presents adaptive execution techniques that find a proper execution mode for each parallel loop in a conventional loop-level parallel program on SMT multiprocessors. A compiler preprocessor generates code that, based on dynamic feedback, automatically determines at run time the optimal number of threads for each parallel loop in the parallel application. We evaluate our techniques using a set of standard numerical applications, running them on a real SMT multiprocessor machine with 8 hardware contexts. Our approach is general enough to work well with other SMT multiprocessors or multicore systems.
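The dynamic-feedback idea above can be sketched minimally. This is not the paper's preprocessor-generated code; `AdaptiveLoop` is a hypothetical name, and the sketch simply probes each candidate thread count on early invocations of a loop, then locks in the fastest one for all later invocations.

```python
# Sketch: per-loop adaptive thread-count selection via dynamic feedback.
# Early invocations each probe one candidate count and record its run
# time; once every candidate has been timed, the fastest is used forever.

import time

class AdaptiveLoop:
    def __init__(self, candidates):
        self.candidates = list(candidates)   # thread counts to probe
        self.timings = {}                    # count -> measured seconds
        self.best = None                     # chosen count, once known

    def run(self, loop_body):
        """loop_body(n) executes the parallel loop with n threads."""
        if self.best is None:                # probing phase
            n = self.candidates[len(self.timings)]
            t0 = time.perf_counter()
            loop_body(n)
            self.timings[n] = time.perf_counter() - t0
            if len(self.timings) == len(self.candidates):
                self.best = min(self.timings, key=self.timings.get)
            return n
        loop_body(self.best)                 # execution phase
        return self.best
```

A real implementation would also re-probe when the loop's workload changes; here the choice is made once per loop, mirroring the one-mode-per-loop decision described above.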
Archive | 2013
Min-Ju Lee; Bernhard Egger; Jaejin Lee; Young-Lak Kim; Hong-Gyu Kim; Hong-June Kim