Minwoo Kim | Researchain

Publication

Featured research published by Minwoo Kim.

Journal of Network and Computer Applications | 2013

Benefits of using parallelized non-progressive network coding

Minwoo Kim; Karam Park; Won Woo Ro

Network coding improves communication rate and saves bandwidth by performing a special coding at the sending or intermediate nodes. However, encoding and decoding at the nodes create computation overhead on large input data, which causes coding delays. Previous work therefore proposed the progressive method, which hides decoding delay in waiting time. However, network speed has increased greatly, and progressive schemes are no longer the most efficient decoding method. Thus, we present a non-progressive decoding algorithm that can be parallelized more aggressively than progressive network coding; by utilizing multi-core processors, it diminishes the advantage that progressive methods gain from hidden decoding time. Moreover, the block algorithm implemented by non-progressive decoding helps to reduce cache misses. In experiments on multi-core systems, our scheme, which relies on matrix inversion and multiplication, shows a 46.0% improvement in execution time and an 89.2% reduction in last-level cache misses compared to the progressive method.

IEEE Transactions on Parallel and Distributed Systems | 2015

Dynamic Load Balancing of Parallel SURF with Vertical Partitioning

Deokho Kim; Minwoo Kim; Kyungah Kim; Minyong Sung; Won Woo Ro

The demand for real-time processing of robust feature detection is one of the major issues in the computer vision field. To meet this requirement, this paper proposes a parallelization and optimization method that effectively accelerates SURF. The proposed parallelization method is based on a workload analysis of SURF from various aspects, focusing in particular on the load balancing problem. First, the average parallel workload is divided into identical portions using the vertical partitioning method. Then, the load imbalance problem is further resolved using the dynamic partition balancing method. In addition, an optimization method is proposed together with the parallelization method to find and exclude redundant operations in SURF, thus effectively accelerating the feature detection operation when the proposed parallelization method is applied. The proposed method shows a maximum speedup of 19.21 compared to single-threaded performance on a 24-core system and achieves a maximum of 83.80 fps in a real-machine experiment, enabling real-time processing.
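
The two partitioning steps can be illustrated with a small sketch. The abstract does not give the balancing rule, so the code below assumes one plausible rule: strip boundaries are moved so that each strip receives an equal share of the cost measured on the previous frame, with cost treated as uniform inside each old strip. All names are hypothetical.

```python
def vertical_partition(width, workers):
    """Split `width` image columns into near-equal vertical strips.

    Returns boundaries [b0, ..., bn]; strip i covers columns b[i]..b[i+1]-1.
    """
    return [width * i // workers for i in range(workers + 1)]


def rebalance(bounds, strip_costs):
    """Move strip boundaries so each strip gets an equal share of measured cost.

    Assumption (not from the paper): cost is spread uniformly over the
    columns of each old strip, so boundaries can be placed by interpolation.
    """
    n = len(strip_costs)
    total = sum(strip_costs)
    if total <= 0:
        return list(bounds)
    target = total / n          # cost each strip should carry
    new_bounds = [bounds[0]]
    acc = 0.0                   # cost accumulated before the current strip
    k = 1                       # index of the next boundary to place
    for i, cost in enumerate(strip_costs):
        left, right = bounds[i], bounds[i + 1]
        while k < n and cost > 0 and acc + cost >= k * target:
            frac = (k * target - acc) / cost
            new_bounds.append(round(left + frac * (right - left)))
            k += 1
        acc += cost
    while len(new_bounds) < n:  # guard against floating-point shortfall
        new_bounds.append(bounds[-1])
    new_bounds.append(bounds[-1])
    return new_bounds


bounds = vertical_partition(80, 4)                 # [0, 20, 40, 60, 80]
balanced = rebalance(bounds, [40, 20, 10, 10])     # left strips were busiest
```

In this example the busiest strip is narrowed and the lightly loaded strips on the right are merged into one wider strip.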

International Symposium on Consumer Electronics | 2014

Accelerating HEVC transcoder by exploiting decoded quadtree

Minyong Sung; Minwoo Kim; Minsik Kim; Won Woo Ro

This paper proposes an accelerated High-Efficiency Video Coding (HEVC) transcoder which can promptly provide downscaled video content to various devices. The quadtree information is first extracted from the decoding process and is transformed for the target resolution. By utilizing the decoded depth information, the encoder can be accelerated by searching only the optimal depths in the quadtree, without losing video quality. The encoder also adaptively changes the depth search ranges according to the picture order count (POC) for further optimization. Our proposed method shows a maximum encoding speedup of 2.18, with only a 0.3% BD-rate increase in the best case.

IEEE Transactions on Circuits and Systems for Video Technology | 2016

Exploiting Thread-Level Parallelism on HEVC by Employing a Reference Dependency Graph

Minwoo Kim; Deokho Kim; Kyungah Kim; Won Woo Ro

This paper presents an optimized parallel algorithm for the next-generation video codec High Efficiency Video Coding (HEVC). The proposed method provides maximized parallel scalability by exploiting two levels of parallelism: 1) frame level and 2) task level. Frame-level parallelism is exploited using a graph that efficiently provides a parallel coding order of the frames with complex reference dependencies. The proposed reference dependency graph is generated at runtime by a novel construction algorithm that dynamically analyzes the configuration of the HEVC codec. Task-level parallelism is exploited to provide further scalability to frame-level parallelization. A pipelined execution is allowed for independent tasks, which are defined by dividing and categorizing a single coding process into multiple types of tasks. The proposed parallel encoder and decoder do not suffer from loss in coding efficiency because neither constraints nor modifications in coding options are required. The proposed parallel methods achieve an average encoding speedup of 1.75, and the aggressive method, which exploits additional frame-level parallelism, achieves a speedup of 6.52 using eight physical cores.
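
How a dependency graph yields a parallel coding order can be shown with a small sketch. It is not the paper's construction algorithm: it takes the reference lists as given and groups frames into waves whose references are already coded. The function name and the example group of pictures are hypothetical.

```python
def parallel_coding_order(refs):
    """Group frames into waves that can be coded in parallel.

    `refs` maps a frame id to the frame ids it references. A frame becomes
    ready once every frame it references has been coded in an earlier wave.
    """
    remaining = {frame: set(deps) for frame, deps in refs.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(f for f, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unresolved reference dependency")
        waves.append(ready)
        done.update(ready)
        for f in ready:
            del remaining[f]
    return waves


# Hypothetical hierarchical-B group of pictures, keyed by picture order count.
gop = {
    0: [], 8: [0], 4: [0, 8],
    2: [0, 4], 6: [4, 8],
    1: [0, 2], 3: [2, 4], 5: [4, 6], 7: [6, 8],
}
waves = parallel_coding_order(gop)
```

Here the last wave holds four mutually independent frames, which is where frame-level parallelism pays off in this example.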

Future Generation Computer Systems | 2014

Architectural investigation of matrix data layout on multicore processors

Minwoo Kim; Won Woo Ro

Many practical applications include matrix operations as essential procedures. In addition, recent studies of matrix operations rely on parallel processing to reduce calculation delays. Because these operations are highly data intensive, many studies have investigated work distribution techniques and data access latency to accelerate algorithms. However, previous studies have not adequately considered hardware architectural features, although they greatly affect the performance of matrix operations. Thus, the present study considers the architectural characteristics that affect the performance of matrix operations on real multicore processors. We use matrix multiplication, LU decomposition, and Cholesky factorization as the test applications, which are well-known data-intensive mathematical algorithms in various fields. We argue that applications only access matrices in a particular direction, and we propose that the canonical data layout is the optimal matrix data layout compared with the block data layout. In addition, the tiling algorithm is utilized to increase the temporal data locality in multilevel caches and to balance the workload as evenly as possible in multicore environments. Our experimental results show that applications using the canonical data layout with tiling have an 8.23% faster execution time and 3.91% of the last-level cache miss rate compared with applications executed with the block data layout.
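
The difference between the two layouts comes down to how a matrix element (i, j) maps to a memory offset. The sketch below assumes row-major storage for the canonical layout and square b x b blocks stored contiguously, block by block in row-major order, for the block layout; the abstract does not spell out these details, and the function names are hypothetical.

```python
def canonical_index(i, j, n):
    """Offset of element (i, j) in an n x n matrix stored row by row."""
    return i * n + j


def block_index(i, j, n, b):
    """Offset of element (i, j) when the matrix is stored as b x b blocks.

    Assumes n is a multiple of b. Blocks are laid out in row-major order,
    and the elements inside each block are also row-major.
    """
    blocks_per_row = n // b
    block = (i // b) * blocks_per_row + (j // b)
    return block * b * b + (i % b) * b + (j % b)


n, b = 4, 2
canonical = [canonical_index(i, j, n) for i in range(n) for j in range(n)]
blocked = [block_index(i, j, n, b) for i in range(n) for j in range(n)]
```

With the canonical layout, walking along a row always advances the offset by one; with the block layout the offset jumps whenever the walk crosses a block boundary.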

International Midwest Symposium on Circuits and Systems | 2011

Parallel transpose of matrix multiplication based on the tiling algorithms

Minwoo Kim; Yong J. Jang; Won Woo Ro

This paper introduces a technique for parallel matrix multiplication with the tiling method. First, we examine the effect of the matrix transpose on the tiling algorithm in comparison with the standard tiling algorithm. The experimental results show that the transpose tiling algorithm is more efficient than the standard tiling algorithm for most usable tile sizes. Moreover, we propose a parallel transpose tiling algorithm, which is developed further from the transpose tiling algorithm. The parallel transpose tiling algorithm reduces the overhead of the transpose operation by distributing the matrix over multiple threads. As a result, the parallel transpose tiling algorithm is up to 4.76% and 6.61% faster than the original transpose tiling algorithm on Core2 9400 and Phenom 9550 processors, respectively.
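
A minimal single-threaded sketch of transpose tiling follows. It is an illustration of the idea and not the authors' code: the second operand is transposed once so that the inner loop reads both operands along rows, and the loops are blocked into tiles. The parallel transpose step from the abstract is omitted here, and the names and the default tile size are assumptions.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def tiled_matmul_transposed(a, b, tile=2):
    """Compute a * b using tiling on a and on the transpose of b."""
    n, k, m = len(a), len(b), len(b[0])
    bt = transpose(b)                      # one-time transpose overhead
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Work on one tile; bounds are clamped at the matrix edge.
                for i in range(i0, min(i0 + tile, n)):
                    row = a[i]
                    for j in range(j0, min(j0 + tile, m)):
                        col = bt[j]        # a row of bt is a column of b
                        partial = 0
                        for kk in range(k0, min(k0 + tile, k)):
                            partial += row[kk] * col[kk]
                        c[i][j] += partial
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
product = tiled_matmul_transposed(a, b)
```

The tile size of 2 only keeps the example small; in practice it would be tuned to the cache sizes of the target processor.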

International Conference on Image Processing | 2015

True motion compensation with feature detection for frame rate up-conversion

Kyungah Kim; Minwoo Kim; Deokho Kim; Won Woo Ro

This paper presents a feature-based frame rate up-conversion algorithm which provides a more comfortable visual experience by exploiting the true motion of objects. By considering the movement of the objects rather than the pixel values, the proposed method can create interpolated frames that reflect the true movement of the video content. We first find local features within a frame by using a feature detection algorithm. Then, the local features are matched between adjacent frames and are clustered to form an object region. The interpolated frame is created by using the perspective transformation, which makes it possible to adequately track the dynamic movement of the defined objects. The proposed scheme efficiently resolves the blocking artifact problem and presents outstanding visual quality compared to the conventional block-based motion compensated interpolation algorithm.

Archive | 2014

Efficient Descriptor-Filtering Algorithm for Speeded Up Robust Features Matching

Minwoo Kim; Deokho Kim; Kyungah Kim; Won Woo Ro

This paper presents an efficient descriptor filtering algorithm for the feature matching process of SURF. The matching algorithm used in OpenSURF compares every feature descriptor by calculating the root-mean-square error of the descriptor vectors. The proposed instant-termination and Bloom filtering algorithm pre-compares the feature descriptors and decides whether the compared descriptor pairs should be further inspected. The proposed pre-comparison process compares the most significant bits of the descriptor for an early decision. Also, the descriptor bits are interleaved to adapt to the Bloom filter, increasing the reliability of the filtering process. Our proposed filtering algorithm effectively reduces the number of root-mean-square error calculations.
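
The pre-comparison and instant-termination ideas can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: the "most significant bit" of each element is taken to be its sign, pairs are rejected when too many sign bits differ, and the Bloom filter and bit interleaving are left out. The names `msb_signature`, `max_bit_diff`, and `threshold` are hypothetical.

```python
import math


def msb_signature(desc):
    """Pack one bit per descriptor element (assumed here: the sign bit)."""
    bits = 0
    for value in desc:
        bits = (bits << 1) | (1 if value >= 0 else 0)
    return bits


def rmse_with_termination(a, b, threshold):
    """Return the RMSE of two descriptors, or None once it must exceed threshold."""
    limit = threshold * threshold * len(a)   # largest allowed squared-error sum
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > limit:                    # instant termination
            return None
    return math.sqrt(total / len(a))


def filtered_match(a, b, threshold, max_bit_diff):
    """Pre-compare signatures, then fall back to the early-terminating RMSE."""
    differing = bin(msb_signature(a) ^ msb_signature(b)).count("1")
    if differing > max_bit_diff:
        return None                          # rejected without any RMSE work
    return rmse_with_termination(a, b, threshold)


d1 = [0.5, -0.5, 0.25, -0.25]
d2 = [-0.5, 0.5, -0.25, 0.25]
same = filtered_match(d1, d1, threshold=0.1, max_bit_diff=1)
rejected = filtered_match(d1, d2, threshold=0.1, max_bit_diff=1)
```

Both filters only skip work for pairs that would fail anyway under the chosen thresholds, so the saving comes from computing fewer full root-mean-square errors.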

Archive | 2013