Hyun-Ho Jo | Researchain

Publication

Featured research published by Hyun-Ho Jo.

International Symposium on Consumer Electronics | 2014

Fast intra mode decision for HEVC intra coding

M. Ismail; Hyun-Ho Jo; Dong-Gyu Sim

This paper proposes an adaptive intra mode selection algorithm with a rough mode decision (RMD) stage and an early termination scheme for the rate-distortion optimization (RDO) stage for fast intra coding in High Efficiency Video Coding (HEVC). The adaptive mode selection uses the prediction modes from the upper layer of the coding unit (CU) and from neighboring prediction units (PUs) to minimize the BD-rate increase with a reduced set of candidate modes. In addition, the early termination approach for the RDO stage skips some candidate modes based on the average of the rate-distortion (RD) costs calculated in the RMD stage. Experimental results show that the proposed approaches reduce total encoding time by 28% with a 1.7% BD-rate increase, compared to the HEVC reference software.
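To illustrate the early-termination idea, the following Python sketch keeps only the RMD candidates whose cost does not exceed the average RMD cost before they go on to full RDO. The abstract does not state the exact skipping rule, so the threshold, the function name `prune_candidates`, and the sample costs are all assumptions made for illustration.

```python
def prune_candidates(rmd_costs):
    """Return the modes to pass to full RDO, cheapest first.

    rmd_costs: dict mapping intra mode index -> cost from the RMD stage.
    Assumed rule: skip any mode whose RMD cost exceeds the average cost.
    """
    if not rmd_costs:
        return []
    average = sum(rmd_costs.values()) / len(rmd_costs)
    kept = [mode for mode, cost in rmd_costs.items() if cost <= average]
    return sorted(kept, key=lambda mode: rmd_costs[mode])


# Hypothetical RMD costs for four HEVC intra modes
# (0 = planar, 1 = DC, 10 = horizontal, 26 = vertical).
costs = {0: 120.0, 1: 95.0, 10: 210.0, 26: 101.0}
selected = prune_candidates(costs)
```

With these sample costs the average is 131.5, so mode 10 is skipped and the remaining three modes are evaluated in order of increasing RMD cost.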

International Congress on Image and Signal Processing | 2013

Hybrid parallelization for HEVC decoder

Hyun-Ho Jo; Dong-Gyu Sim; Byeungwoo Jeon

This paper presents a new hybrid parallelization method for a High Efficiency Video Coding (HEVC) decoder. The proposed method groups the HEVC decoding modules into entropy decoding, pixel decoding, and in-loop filtering parts so that each part can be parallelized according to its characteristics. The method employs a coding tree unit (CTU)-level 2D wavefront for the pixel decoding part. To decrease the delay between entropy decoding and pixel decoding, task-level parallelism (TLP) is additionally employed for these two parts. For the HEVC deblocking filter, CTU-level data-level parallelism (DLP) with equally partitioned CTUs is proposed. In addition, CTU row-level DLP for sample adaptive offset (SAO) is proposed to achieve maximum parallel performance and to minimize the overhead of organizing a backup buffer. The experimental results show that the proposed parallel deblocking filter achieves a speed-up of up to 5.4× and the parallel SAO approach a speed-up of up to 3.7× on a multi-core platform. Furthermore, the proposed parallel HEVC decoder shows a speed-up of 2.9× with 6 threads without any encoder-side parallel tools such as wavefront parallel processing (WPP) coding and picture partitioning with tiles and slice segments.
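A CTU-level 2D wavefront can be pictured as a schedule in which each CTU waits for its left and upper-right neighbours. The Python sketch below groups CTUs by the earliest step at which they could run under that dependency pattern. The dependency pattern is the usual wavefront assumption rather than a detail taken from the abstract, and `wavefront_schedule` is a hypothetical name.

```python
from collections import defaultdict


def wavefront_schedule(rows, cols):
    """Group CTU coordinates (row, col) by the earliest step they can run.

    Assumed dependencies: the left CTU and the upper-right CTU must be done,
    which gives step = col + 2 * row.
    """
    steps = defaultdict(list)
    for row in range(rows):
        for col in range(cols):
            steps[col + 2 * row].append((row, col))
    return [steps[s] for s in sorted(steps)]


# A small picture of 3 CTU rows by 4 CTU columns.
schedule = wavefront_schedule(rows=3, cols=4)
```

Each inner list holds CTUs that are independent of one another, so they could be handed to different threads in the same step. For this 3 by 4 example at most two CTUs are ready at once, which shows why the achievable parallelism depends on picture size.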

EURASIP Journal on Image and Video Processing | 2013

Fast CAVLD of H.264/AVC on bitstream decoding processor

Jung-Han Seo; Hyun-Ho Jo; Dong-Gyu Sim; Doohyun Kim; Joon-Ho Song

This paper presents a fast context-based adaptive variable-length decoding (CAVLD) method of H.264/AVC with a very long instruction word-based bitstream processing unit (BsPU) designed for entropy decoding of multiple video formats. A new table mapping algorithm for the coeff_token, level, and run_before syntax elements of the quantized transform coefficients is proposed, and many branch operations are removed by utilizing several designated instructions in the BsPU. By applying designated instructions and the proposed table mapping algorithm to CAVLD, we found that the proposed fast CAVLD method achieves an increase of approximately 47% in the decoding speed and a reduction of approximately 59% in memory requirements for the table mapping.

Multimedia Signal Processing | 2011

Macroblock-Based Adaptive Loop Filter for Video Compression

Hyun-Ho Jo; Dong-Gyu Sim; Hae-Kwang Kim

In this paper, we propose an adaptive loop filter that can work on macroblock-level encoding. First, we calculate the filter coefficients that minimize the mean square error between the original and encoded frames, and we then apply 2D filtering to the encoded macroblock using these coefficients. In addition to alleviating blocking artifacts, the proposed filtering process improves coding efficiency by allowing filtered pixels to be used as reference pixels when coding subsequent macroblocks. The experimental results show that the proposed macroblock-based adaptive loop filter (MBALF) methods can achieve a 6% bitrate reduction on average, compared with the H.264/AVC High Profile.

Optical Engineering | 2014

Bitstream decoding processor for fast entropy decoding of variable length coding-based multiformat videos

Hyun-Ho Jo; Dong-Gyu Sim

We present a bitstream decoding processor for entropy decoding of variable length coding-based multiformat videos. Since most of the computational complexity of entropy decoders comes from bitstream accesses and the table look-up process, the developed bitstream processing unit (BsPU) has several designated instructions to access bitstreams and to minimize branch operations in the table look-up process. In addition, the instruction for bitstream access can remove the emulation prevention bytes (EPBs) of H.264/AVC without an initial delay, repeated memory accesses, or an additional buffer. Experimental results show that the proposed method for EPB removal achieves a speed-up of 1.23 times compared to the conventional EPB removal method. In addition, the BsPU achieves speed-ups of 5.6 and 3.5 times in entropy decoding of H.264/AVC and MPEG-4 Visual bitstreams, respectively, compared to an existing processor without designated instructions and a new table mapping algorithm. The BsPU is implemented on a Xilinx Virtex5 LX330 field-programmable gate array. MPEG-4 Visual (ASP, Level 5) and H.264/AVC (Main Profile, Level 4) bitstreams are processed in real time using the developed BsPU with a core clock speed of under 250 MHz.
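For readers unfamiliar with EPBs: in H.264/AVC a 0x03 byte is inserted after two consecutive 0x00 bytes inside a NAL unit so that the payload cannot imitate a start code, and the decoder has to drop it again. The Python sketch below is a plain software reference for that removal step, not the BsPU instruction described in the abstract; the function name `remove_epb` and the sample bytes are assumptions.

```python
def remove_epb(nal_payload: bytes) -> bytes:
    """Strip emulation prevention bytes (0x03 following 0x00 0x00)."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just seen
    for byte in nal_payload:
        if zeros >= 2 and byte == 0x03:
            zeros = 0  # drop the EPB and restart the zero run
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)


# Hypothetical payload containing two emulation prevention bytes.
raw = bytes([0x65, 0x00, 0x00, 0x03, 0x01, 0x00, 0x00, 0x03, 0x00])
clean = remove_epb(raw)
```

This byte-by-byte loop copies the payload into a second buffer, which is the kind of extra buffering and repeated access that a dedicated bitstream-access instruction aims to avoid.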

Multimedia and Ubiquitous Engineering | 2013

Sample Adaptive Offset Parallelism in HEVC

Eun-Kyung Ryu; Jung-Hak Nam; Seon-Oh Lee; Hyun-Ho Jo; Dong-Gyu Sim

We propose a parallelization method for SAO, an in-loop filter of HEVC. SAO filtering proceeds along CTB lines, and there is a data dependency between the inside and outside of CTB boundaries. This data dependency makes data-level parallelization hard. In this paper, we divide an entire frame equally into sub-regions. With a small amount of memory, the proposed method achieves a 1.9 times improvement in processing time.

Journal of Broadcast Engineering | 2012

Complexity-based Sample Adaptive Offset Parallelism

Eun-Kyung Ryu; Hyun-Ho Jo; Jung-Han Seo; Dong-Gyu Sim; Doohyun Kim; Joon-Ho Song

In this paper, we propose a complexity-based parallelization method for the sample adaptive offset (SAO) algorithm, which is one of the HEVC in-loop filters. The SAO algorithm can be regarded as a region-based process, and the regions are obtained and represented with a quad-tree scheme. An offset that minimizes the reconstruction error is sent for each partitioned region. The SAO of HEVC can be parallelized at the data level. However, because the sizes and complexities of the SAO regions are not regular, workload imbalance occurs on a multi-core platform. We therefore propose an LCU-based SAO algorithm and a complexity prediction algorithm for each LCU. With the proposed complexity-based LCU processing, the proposed algorithm is faster than the sequential implementation by a factor of 2.38. In addition, it is 21% faster than a regular parallel SAO implementation.

International Conference on Consumer Electronics | 2013

Bitstream parsing processor with emulation prevention bytes removal for H.264/AVC decoder

Hyun-Ho Jo; Jung-Han Seo; Dong-Gyu Sim; Doohyun Kim; Joon-Ho Song; Do-Hyung Kim; Shihwa Lee

In this paper, we present a bitstream parsing processor including emulation prevention byte (EPB) removal for an H.264/AVC decoder. The proposed bitstream parsing processor includes several specific instructions for bitstream parsing. Furthermore, it employs double bitstream buffers to remove EPBs for sequential bitstream parsing. Experimental results show that the proposed bitstream parsing processor achieves a cycle reduction of 18% compared with conventional EPB removal methods. In addition, the proposed method greatly reduces the size of the buffer needed to hold a network abstraction layer (NAL) unit for the removal of EPBs.

Journal of Broadcast Engineering | 2012

Parallel Method for HEVC Deblocking Filter based on Coding Unit Depth Information

Hyun-Ho Jo; Eun-Kyung Ryu; Jung-Hak Nam; Dong-Gyu Sim; Doohyun Kim; Joon-Ho Song

In this paper, we propose a parallel deblocking algorithm to resolve the workload imbalance that arises when the deblocking filter of a high efficiency video coding (HEVC) decoder is parallelized. In HEVC, the deblocking filter, which is one of the in-loop filters, conducts two-step filtering, on vertical edges first and horizontal edges afterwards. Deblocking can be performed at high speed through data-level parallelism because there is no dependency between adjacent edges in the deblocking process. However, workloads can be imbalanced among regions even when the same amount of data is allocated to each region, which causes a performance loss in decoder parallelization. We solve the workload imbalance problem by predicting the complexity of deblocking filtering from the coding unit (CU) depth information of each coding tree block (CTB) and by allocating the same amount of workload to each core. Experimental results show that the proposed method achieves an average time saving (ATS) of 64.3% compared to single-core deblocking filtering, and an ATS of 6.7% on average and 13.5% at maximum compared to conventional uniform data-level parallelism.
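The load-balancing idea can be sketched as splitting the CTBs into contiguous groups whose predicted costs, rather than CTB counts, are roughly equal. The Python sketch below assumes a per-CTB weight has already been predicted from CU depth. The abstract does not give the complexity model, so the weights, the greedy split, and the name `balance_ctbs` are assumptions.

```python
def balance_ctbs(weights, cores):
    """Split CTB indices into `cores` contiguous groups of similar total weight.

    weights: predicted deblocking cost per CTB (assumed to come from CU depth).
    """
    total = sum(weights)
    groups = [[] for _ in range(cores)]
    running = 0.0  # predicted cost assigned so far
    core = 0
    for index, weight in enumerate(weights):
        # Move to the next core once the cores so far have reached their share.
        while core < cores - 1 and running >= total * (core + 1) / cores:
            core += 1
        groups[core].append(index)
        running += weight
    return groups


# Hypothetical weights: CTBs with deeper CU splits are assumed to cost more.
weights = [4, 4, 1, 1, 1, 1, 2, 2]
groups = balance_ctbs(weights, cores=2)
loads = [sum(weights[i] for i in group) for group in groups]
```

For these sample weights a uniform split of four CTBs per core would give predicted loads of 10 and 6, whereas the weighted split assigns two CTBs to the first core and six to the second for loads of 8 and 8.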

Journal of Real-time Image Processing | 2017

Software pipelining with CGA and proposed intrinsics on a reconfigurable processor for HEVC decoders

Yong-Jo Ahn; Jonghun Yoo; Hyun-Ho Jo; Dong-Gyu Sim

This work proposes several intrinsics for a reconfigurable processor intended for HEVC decoding, together with software pipelining algorithms that use a coarse-grained array (CGA) architecture and the proposed intrinsic instructions. Software pipelining algorithms are developed for CGA acceleration of the inverse transform, pixel reconstruction, deblocking filter, and sample adaptive offset modules. To enable efficient software pipelining, several very long instruction word-based intrinsics are designed to maximize parallelization rather than computational acceleration. The HEVC decoder with the proposed intrinsics runs 2.3 times faster in clock cycles than a decoder that does not use the intrinsics. In addition, the HEVC decoder with the CGA pipelining algorithms executes 10.9 times faster than one without the CGA mode.
