Network

Latest external collaborations at the country level.

Hotspot

Research topics where Deokho Kim is active.

Publication

Featured research published by Deokho Kim.


Sensors | 2011

Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks

Deokho Kim; Karam Park; Won Woo Ro

While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive cost and complexity of decoding still hinder its practical adoption. At the same time, high-performance microprocessors with heterogeneous multi-cores are expected to serve as processing nodes of wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for heterogeneous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors, the Cell Broadband Engine.


Sensors | 2014

A Malicious Pattern Detection Engine for Embedded Security Systems in the Internet of Things

Doohwan Oh; Deokho Kim; Won Woo Ro

With the emergence of the Internet of Things (IoT), a large number of physical objects in daily life have been aggressively connected to the Internet. As the number of connected objects increases, security systems face a critical challenge due to the global connectivity and accessibility of the IoT. However, traditional security systems are difficult to adapt to IoT objects because of their limited computing power and memory size. In light of this, we present a lightweight security system that uses a novel malicious pattern-matching engine. We limit the memory usage of the proposed system so that it can run on resource-constrained devices. To mitigate the performance degradation caused by limited computation power and memory, we propose two novel techniques, auxiliary shifting and early decision, which together efficiently reduce the number of matching operations on resource-constrained systems. Experiments and performance analyses show that the proposed system achieves a maximum speedup of 2.14 on an IoT object and provides scalable performance for a large number of patterns.
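
The abstract does not spell out how auxiliary shifting and early decision work, so the sketch below only illustrates the general idea of memory-bounded, shift-based multi-pattern matching (in the spirit of Horspool/Wu-Manber skipping): a single small shift table lets the scanner jump over bytes that cannot end a pattern prefix, and full comparisons are attempted only at promising positions. The function names and example patterns are hypothetical, not taken from the paper.

    # Minimal sketch of a memory-bounded, shift-based multi-pattern matcher.
    # It is NOT the paper's engine; it only illustrates skip-based matching.

    def build_shift_table(patterns):
        """Map each byte value to how far the scan window may safely skip."""
        m = min(len(p) for p in patterns)          # the shortest pattern bounds the skip
        shift = {b: m for b in range(256)}         # default: skip a whole window
        for p in patterns:
            for i in range(m):                     # positions inside the m-byte prefix
                shift[p[i]] = min(shift[p[i]], m - 1 - i)
        return shift, m

    def match(data, patterns):
        """Return (offset, pattern) pairs for every pattern occurrence in `data`."""
        shift, m = build_shift_table(patterns)
        hits, pos = [], 0
        while pos + m <= len(data):
            skip = shift[data[pos + m - 1]]
            if skip == 0:                          # promising window: verify each pattern
                hits.extend((pos, p) for p in patterns if data.startswith(p, pos))
                pos += 1
            else:
                pos += skip                        # safe to jump without comparing
        return hits

    packet = b"GET /index.html?id=../../etc/passwd HTTP/1.1"
    print(match(packet, [b"../..", b"passwd", b"cmd.exe"]))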


Journal of Information Processing Systems | 2012

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

Sangpil Lee; Deokho Kim; Jaeyoung Yi; Won Woo Ro

This paper presents a study on a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). Advances in VLSI technology make it feasible to fabricate many processing cores on a single chip and enable general-purpose computation on a GPU (GPGPU). This approach offers significant performance improvements for general-purpose computation and can support a broad variety of applications, including cryptography. We propose an efficient implementation of the encryption/decryption operations of the SEED block cipher on off-the-shelf NVIDIA many-core graphics processors. In a thorough experiment, we achieve throughput capable of sustaining a network speed of up to 9.5 Gbps on an NVIDIA GTX285 system (which has 240 processing cores). Our implementation provides up to 4.75 times higher encoding and decoding throughput than an 8-core Intel system.
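
As a rough illustration of the decomposition such GPU implementations rely on, the sketch below encrypts independent 128-bit blocks in parallel, with a process pool standing in for the GPU's thread array and a toy XOR transform standing in for the SEED round function; neither the real SEED cipher nor the paper's GPU kernels are reproduced.

    # Minimal sketch of the data-parallel decomposition behind GPU block ciphers:
    # in a non-chaining mode every 128-bit block can be encrypted independently,
    # so blocks map one-to-one onto threads.  A toy XOR stands in for SEED.

    from concurrent.futures import ProcessPoolExecutor

    BLOCK = 16  # 128-bit blocks, as in SEED

    def toy_encrypt_block(args):
        """Placeholder per-block transform (NOT the SEED cipher)."""
        block, key = args
        return bytes(b ^ k for b, k in zip(block, key))

    def parallel_encrypt(plaintext, key, workers=8):
        # Pad to a whole number of blocks, then split; each block is independent.
        pad = (-len(plaintext)) % BLOCK
        data = plaintext + b"\x00" * pad
        blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            cipher_blocks = pool.map(toy_encrypt_block, [(b, key) for b in blocks])
        return b"".join(cipher_blocks)

    if __name__ == "__main__":
        key = bytes(range(BLOCK))
        print(parallel_encrypt(b"example payload for block-parallel encryption", key).hex())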


IEEE Transactions on Parallel and Distributed Systems | 2015

Dynamic Load Balancing of Parallel SURF with Vertical Partitioning

Deokho Kim; Minwoo Kim; Kyungah Kim; Minyong Sung; Won Woo Ro

The demand for real-time processing of robust feature detection is one of the major issues in the computer vision field. To meet this requirement, this paper proposes a parallelization and optimization method that effectively accelerates SURF. The parallelization method is developed from a workload analysis of SURF, focusing in particular on the load balancing problem. First, the average parallel workload is divided into identical portions using vertical partitioning. Then, the remaining load imbalance is resolved using a dynamic partition balancing method. In addition, an optimization method is proposed alongside the parallelization to find and exclude redundant operations in SURF, further accelerating feature detection when the parallelization is applied. The proposed method shows a maximum speedup of 19.21 over single-threaded performance on a 24-core system and achieves up to 83.80 fps in a real-machine experiment, enabling real-time processing.
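
A minimal sketch of the vertical-partitioning idea, assuming a placeholder detector: the frame is split into column strips processed by a thread pool, per-strip times are measured, and the strip boundaries are shifted so the next frame's work is spread more evenly. The actual SURF kernels and the paper's exact balancing policy are not reproduced; detect_strip and the rebalancing rule here are illustrative only.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def detect_strip(image, x0, x1):
        """Placeholder for running SURF on the columns [x0, x1) of the frame."""
        time.sleep(0.001 * (x1 - x0))          # stand-in for real, uneven work
        return []                              # would return the detected keypoints

    def process_frame(image, bounds, workers=4):
        """Run every strip in parallel and record how long each strip took."""
        strips = list(zip(bounds[:-1], bounds[1:]))
        costs = [0.0] * len(strips)
        def run(i):
            start = time.perf_counter()
            keypoints = detect_strip(image, *strips[i])
            costs[i] = time.perf_counter() - start
            return keypoints
        with ThreadPoolExecutor(max_workers=workers) as pool:
            keypoints = list(pool.map(run, range(len(strips))))
        return keypoints, costs

    def rebalance(bounds, costs):
        """Shift strip boundaries so each strip gets an equal share of the measured
        cost, assuming cost is spread evenly over the columns of each old strip."""
        n, total = len(costs), sum(costs)
        per_col = []
        for (x0, x1), c in zip(zip(bounds[:-1], bounds[1:]), costs):
            per_col.extend([c / (x1 - x0)] * (x1 - x0))
        target = total / n
        new_bounds, acc = [bounds[0]], 0.0
        for x, c in enumerate(per_col, start=bounds[0]):
            acc += c
            if len(new_bounds) < n and acc >= target * len(new_bounds) and x + 1 < bounds[-1]:
                new_bounds.append(x + 1)
        new_bounds.append(bounds[-1])
        return new_bounds

    bounds = [0, 160, 320, 480, 640]           # four equal strips of a 640-wide frame
    _, costs = process_frame(None, bounds)
    print("measured costs:", [round(c, 3) for c in costs])
    print("rebalanced bounds:", rebalance(bounds, costs))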


IEEE Transactions on Circuits and Systems for Video Technology | 2016

Exploiting Thread-Level Parallelism on HEVC by Employing a Reference Dependency Graph

Minwoo Kim; Deokho Kim; Kyungah Kim; Won Woo Ro

This paper presents an optimized parallel algorithm for the next-generation video codec High Efficiency Video Coding (HEVC). The proposed method maximizes parallel scalability by exploiting two levels of parallelism: 1) frame level and 2) task level. Frame-level parallelism is exploited using a graph that efficiently provides a parallel coding order for frames with complex reference dependencies. The proposed reference dependency graph is generated at runtime by a novel construction algorithm that dynamically analyzes the configuration of the HEVC codec. Task-level parallelism provides further scalability on top of frame-level parallelization: a single coding process is divided and categorized into multiple types of tasks, and independent tasks are executed in a pipeline. The proposed parallel encoder and decoder suffer no loss in coding efficiency because neither constraints nor modifications of coding options are required. The proposed parallel methods achieve an average encoding speedup of 1.75, and an aggressive variant that exploits additional frame-level parallelism achieves a speedup of 6.52 using eight physical cores.
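
A minimal sketch of how frame-level parallelism falls out of such a graph: a frame becomes ready as soon as every frame it references has been coded, so all ready frames form one parallel wave. The reference structure below is a hand-written, hypothetical hierarchical-B GOP; the paper constructs the graph at runtime from the actual codec configuration.

    from collections import defaultdict

    def parallel_coding_order(refs):
        """refs[f] lists the frames that frame f references (predicts from)."""
        pending = {f: set(r) for f, r in refs.items()}
        dependents = defaultdict(set)
        for f, r in refs.items():
            for d in r:
                dependents[d].add(f)
        waves = []
        ready = sorted(f for f, r in pending.items() if not r)
        while ready:
            waves.append(ready)                      # these frames can run in parallel
            next_ready = set()
            for done in ready:
                for f in dependents[done]:
                    pending[f].discard(done)
                    if not pending[f]:
                        next_ready.add(f)
            ready = sorted(next_ready)
        return waves

    # Hypothetical 8-frame GOP: frame 0 is intra, frame 8 references 0,
    # frame 4 references 0 and 8, and so on down the hierarchy.
    refs = {0: [], 8: [0], 4: [0, 8], 2: [0, 4], 6: [4, 8],
            1: [0, 2], 3: [2, 4], 5: [4, 6], 7: [6, 8]}
    for level, wave in enumerate(parallel_coding_order(refs)):
        print("wave", level, "->", wave)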


International Symposium on Performance Analysis of Systems and Software | 2015

DRAW: investigating benefits of adaptive fetch group size on GPU

Myung Kuk Yoon; Yunho Oh; Sangpil Lee; Seung Hun Kim; Deokho Kim; Won Woo Ro

Hiding operation stalls is one of the important issues in suppressing performance degradation of Graphics Processing Units (GPUs). In this paper, we first conduct a detailed study of the factors affecting operation stalls in terms of the fetch group size used by the warp scheduler. We find that the fetch group size strongly affects how well various types of operation stalls are hidden. Short-latency stalls are hidden by issuing other available warps from the same fetch group; with a small fetch group, such stalls may not be hidden well because the group contains only a limited number of issuable warps. In contrast, long-latency stalls are hidden by dividing the warps into multiple fetch groups: the scheduler switches fetch groups when the warps in one group reach a long-latency memory operation. With a large fetch group there are fewer groups to switch between, so these stalls may not be hidden well. In addition, load/store unit stalls are caused by the limited hardware resources available to handle memory operations. To hide all of these stalls effectively, we propose a Dynamic Resizing on Active Warps (DRAW) scheduler, which adjusts the size of the active fetch group. Evaluation results show that the DRAW scheduler reduces stall cycles by an average of 16.3% and improves performance by an average of 11.3% compared to the conventional two-level warp scheduler.
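
A minimal sketch of the resizing decision only, with hypothetical thresholds, counter names, and sampling intervals: the fetch group grows when short-latency stalls dominate (more issuable warps are needed inside one group) and shrinks when long-latency or load/store stalls dominate (more groups are needed to switch between). The real DRAW scheduler is hardware logic inside the GPU warp scheduler; this is not its actual policy.

    def resize_fetch_group(size, stalls, total_warps, min_size=4):
        """Return the fetch-group size to use for the next sampling interval."""
        short, long_, lsu = stalls["short"], stalls["long"], stalls["lsu"]
        if short > long_ + lsu and size < total_warps:
            return min(size * 2, total_warps)     # fewer, larger groups
        if long_ + lsu > short and size > min_size:
            return max(size // 2, min_size)       # more, smaller groups
        return size                               # balanced stall mix: keep the size

    # Synthetic per-interval stall counters (cycles), e.g. from performance counters.
    intervals = [
        {"short": 900, "long": 100, "lsu": 50},   # compute-heavy phase
        {"short": 200, "long": 700, "lsu": 300},  # memory-heavy phase
        {"short": 250, "long": 200, "lsu": 50},   # balanced phase
    ]
    size = 8
    for counters in intervals:
        size = resize_fetch_group(size, counters, total_warps=48)
        print("next fetch group size:", size)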


Journal of Systems Architecture | 2013

Parallelized sub-resource loading for web rendering engine

Deokho Kim; Changmin Lee; Sangpil Lee; Won Woo Ro

High-performance web browsers are increasingly important in commercial electronic devices, including smartphones, tablet PCs, netbooks, laptops, and smart TVs. However, web browsers still suffer performance degradation as the number of resources on a web page grows; in particular, pages with a large number of images require very complex rendering operations. In this paper, we propose a parallel web browser that speeds up web rendering by exploiting thread-level parallelism. The proposed architecture parallelizes the sub-resource loading operation on various platforms, including a conventional PC system and a mobile embedded system. The parallel sub-resource loading operation achieves a maximum speedup of 1.87 on a quad-core system and 1.45 on a dual-core embedded system.
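
A minimal sketch of the loader-side idea, assuming placeholder URLs: the sub-resources referenced by a page are fetched by a pool of worker threads instead of one after another. The paper integrates this into the rendering engine itself; plain urllib and a thread pool merely stand in here.

    from concurrent.futures import ThreadPoolExecutor, as_completed
    from urllib.request import urlopen

    def fetch(url, timeout=10):
        """Download one sub-resource and return its size."""
        with urlopen(url, timeout=timeout) as resp:
            return url, len(resp.read())

    def load_sub_resources(urls, workers=4):
        results = {}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(fetch, u): u for u in urls}
            for fut in as_completed(futures):
                try:
                    url, size = fut.result()
                    results[url] = size
                except OSError as exc:            # network failures must not stall parsing
                    results[futures[fut]] = exc
        return results

    sub_resources = [                             # placeholder sub-resource URLs
        "https://example.com/style.css",
        "https://example.com/app.js",
        "https://example.com/logo.png",
    ]
    print(load_sub_resources(sub_resources))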


International Midwest Symposium on Circuits and Systems | 2011

Performance evaluation of adaptive progressive network coding

Deokho Kim; Karam Park; Won Woo Ro

This paper introduces adaptive progressive network coding, a hybrid decoding method for random linear network coding. The adaptive scheme progressively decodes a group of rows at a time instead of a single row. We evaluate the performance and trade-offs of the proposed model for network coding in a multitasking environment. The adaptive method outperforms the original progressive decoding, with fewer interrupts and less performance degradation caused by frequent pauses.
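
A minimal sketch of group-wise progressive decoding, using GF(2) coefficients (random XOR combinations) for brevity rather than the full random linear network coding field, and a fixed group size rather than the paper's adaptive one: incoming coded rows are buffered and an entire group is reduced against the current basis in one pass.

    import random

    def eliminate_group(basis, group, n):
        """Reduce a whole group of (coeff_bits, payload) rows against the basis.
        `basis` maps a pivot index (source packet number) to its reduced row."""
        for coeff, data in group:
            for pivot, (bcoeff, bdata) in basis.items():
                if coeff >> (n - 1 - pivot) & 1:           # row still involves this pivot
                    coeff ^= bcoeff
                    data = bytes(a ^ b for a, b in zip(data, bdata))
            if coeff:                                      # row is innovative: keep it
                basis[n - coeff.bit_length()] = (coeff, data)
        return basis

    def recover(basis, n):
        """Back-substitute once n innovative rows have been collected."""
        for pivot in sorted(basis, reverse=True):
            coeff, data = basis[pivot]
            for other in sorted(basis):
                if other > pivot and coeff >> (n - 1 - other) & 1:
                    ocoeff, odata = basis[other]
                    coeff ^= ocoeff
                    data = bytes(a ^ b for a, b in zip(data, odata))
            basis[pivot] = (coeff, data)
        return [basis[p][1] for p in sorted(basis)]

    def encode(source, n):
        """One random GF(2) combination of the source packets."""
        coeff = random.randrange(1, 1 << n)
        data = bytes(len(source[0]))
        for j in range(n):
            if coeff >> (n - 1 - j) & 1:
                data = bytes(a ^ b for a, b in zip(data, source[j]))
        return coeff, data

    source = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
    n, basis = len(source), {}
    while len(basis) < n:
        group = [encode(source, n) for _ in range(2)]      # decode two rows at a time
        eliminate_group(basis, group, n)
    print(recover(basis, n))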


IEEE Transactions on Parallel and Distributed Systems | 2017

Dynamic Resizing on Active Warps Scheduler to Hide Operation Stalls on GPUs

Myung Kuk Yoon; Yunho Oh; Seung Hun Kim; Sangpil Lee; Deokho Kim; Won Woo Ro

This paper conducts a detailed study of the factors affecting operation stalls in terms of the fetch group size used by the warp scheduler of GPUs. We show that the fetch group size strongly affects the hiding of various types of operation stalls: short-latency stalls, long-latency stalls, and Load/Store Unit (LSU) stalls. A scheduler with a small fetch group cannot hide short-latency stalls because of the limited number of warps in a fetch group. In contrast, a scheduler with a large fetch group cannot hide long-latency and LSU stalls because of the limited number of fetch groups and the limited memory subsystem resources, respectively. To hide these various types of stalls, this paper proposes a Dynamic Resizing on Active Warps (DRAW) scheduler, which adjusts the size of a fetch group dynamically based on the execution phases of applications. For applications that perform best under LRR (one fetch group), the DRAW scheduler matches the performance of LRR and outperforms TL (multiple fetch groups) by 22.7 percent. For applications that perform best under TL, our scheduler achieves 11.0 and 5.5 percent better performance than LRR and TL, respectively.


International Conference on Image Processing | 2015

True motion compensation with feature detection for frame rate up-conversion

Kyungah Kim; Minwoo Kim; Deokho Kim; Won Woo Ro

This paper presents a feature-based frame rate up-conversion algorithm that provides a more comfortable visual experience by exploiting the true motion of objects. By considering the movement of objects rather than pixel values, the proposed method creates interpolated frames that reflect the true movement of the video content. We first find local features within a frame using a feature detection algorithm. The local features are then matched between adjacent frames and clustered to form object regions. The interpolated frame is created using a perspective transformation, which adequately tracks the dynamic movement of the defined objects. The proposed scheme efficiently resolves the blocking artifact problem and provides superior visual quality compared to the conventional block-based motion-compensated interpolation algorithm.
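
A minimal sketch of the interpolation step, assuming OpenCV and NumPy are available: features are detected and matched between two adjacent frames, each matched point is moved halfway toward its destination, a perspective transform from the first frame to those halfway positions is fitted, and the first frame is warped with it. The paper additionally clusters matches into per-object regions and interpolates each region separately; this sketch fits a single global transform for brevity.

    import cv2
    import numpy as np

    def interpolate_halfway(frame_a, frame_b, max_matches=200):
        # Detect and describe local features in both frames.
        orb = cv2.ORB_create()
        kp_a, des_a = orb.detectAndCompute(frame_a, None)
        kp_b, des_b = orb.detectAndCompute(frame_b, None)

        # Match features between the adjacent frames, keeping the best matches.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
        matches = matches[:max_matches]

        pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
        pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
        pts_mid = (pts_a + pts_b) / 2.0        # where the features should sit at t + 0.5

        # Perspective transform from frame_a to the halfway positions (RANSAC
        # rejects mismatched features), then warp frame_a to synthesize the frame.
        h, _ = cv2.findHomography(pts_a, pts_mid, cv2.RANSAC, 3.0)
        height, width = frame_a.shape[:2]
        return cv2.warpPerspective(frame_a, h, (width, height))

    # Usage: mid = interpolate_halfway(cv2.imread("frame_10.png"), cv2.imread("frame_11.png"))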

Collaboration

An overview of Deokho Kim's collaborations.
