Hyuk-Jae Lee
Seoul National University
Publication
Featured research published by Hyuk-Jae Lee.
International Symposium on Circuits and Systems | 2007
Yongje Lee; Chae-Eun Rhee; Hyuk-Jae Lee
To reduce the size and bandwidth requirement of a frame memory for video compression, a number of memory recompression algorithms have been proposed. These previous algorithms are performed independently of a video compression standard and therefore do not take advantage of the information obtained during the processing of the compression standard. This paper proposes a new recompression algorithm that makes use of the information from H.264 intra prediction results. The proposed algorithm decomposes a frame into 4×4 blocks, which are then compressed into 64-bit segments. The result of 4×4 intra prediction is used to select the scan order of the 4×4 block, and DPCM (differential pulse code modulation) is performed along this scan order. Then, the DPCM results are further compressed by Golomb-Rice coding. The proposed recompression algorithm is implemented in hardware and integrated with an H.264 encoder. The proposed algorithm improves the average PSNR by 2.9 dB compared to the previous work in (Lee, 2003). The hardware cost for the implementation of the recompression algorithm is 28 K gates, and the additional latency to read the compressed frame memory is 162 cycles per macroblock.
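The compression path sketched in the abstract (choose a scan order from the intra prediction result, apply DPCM along that order, then Golomb-Rice-code the residuals) can be illustrated as follows. The scan order, the signed-to-unsigned mapping, and the Golomb-Rice parameter are assumptions for illustration, not the paper's exact choices.

```python
# Hedged sketch: scan a 4x4 block in a prediction-dependent order, apply DPCM,
# then Golomb-Rice-code the residuals. Scan order and parameter k are illustrative.

def dpcm(samples):
    """Differential coding: keep the first sample, then successive differences."""
    out = [samples[0]]
    out.extend(samples[i] - samples[i - 1] for i in range(1, len(samples)))
    return out

def zigzag_map(v):
    """Map signed residuals to non-negative integers for Golomb-Rice coding."""
    return (v << 1) if v >= 0 else ((-v << 1) - 1)

def golomb_rice(value, k=2):
    """Golomb-Rice code of a non-negative integer: unary quotient + k-bit remainder."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")

def compress_block(block4x4, scan_order):
    """Compress one 4x4 block: scan -> DPCM -> Golomb-Rice bitstream."""
    samples = [block4x4[r][c] for r, c in scan_order]
    return "".join(golomb_rice(zigzag_map(d)) for d in dpcm(samples))

# Example: a row-by-row scan, as might be chosen after a horizontal intra prediction.
horizontal_scan = [(r, c) for r in range(4) for c in range(4)]
block = [[120, 121, 122, 122], [119, 120, 121, 121],
         [118, 119, 120, 120], [117, 118, 119, 119]]
print(compress_block(block, horizontal_scan))
```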
Physical Review A | 2003
Doyeol Ahn; Hyuk-Jae Lee; Young Hoon Moon; Sung Woo Hwang
In this paper, the Lorentz transformation of entangled Bell states seen by a moving observer is studied. The calculated Bell observable for four joint measurements turns out to give a universal value, ⟨a,b⟩ + ⟨a,b′⟩ + ⟨a′,b⟩ − ⟨a′,b′⟩ = (2/√(2−β²))(1 + √(1−β²)), where a, a′, b, b′ are the relativistic spin observables derived from the Pauli-Lubanski pseudovector and β = v/c. We found that the degree of violation of Bell's inequality decreases with increasing velocity of the observer, and that Bell's inequality is satisfied in the ultrarelativistic limit where the boost speed reaches the speed of light.
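As a quick consistency check on the closed-form value above, the sketch below evaluates (2/√(2−β²))(1 + √(1−β²)) at a few boost speeds: at β = 0 it reproduces the familiar non-relativistic CHSH value 2√2 ≈ 2.83, and it falls toward the classical bound 2 as β → 1.

```python
# Evaluate the Bell observable from the abstract,
# B(beta) = (2 / sqrt(2 - beta^2)) * (1 + sqrt(1 - beta^2)),
# to illustrate how the violation of Bell's inequality (B > 2) weakens
# as the observer's boost speed beta = v/c increases.
import math

def bell_observable(beta):
    return (2.0 / math.sqrt(2.0 - beta**2)) * (1.0 + math.sqrt(1.0 - beta**2))

for beta in (0.0, 0.5, 0.9, 0.99, 0.999999):
    print(f"beta = {beta:>8}: B = {bell_observable(beta):.4f}")
# beta = 0 gives 2*sqrt(2) ~ 2.8284; B approaches 2 as beta -> 1.
```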
Computer and Information Technology | 2006
Genhua Jin; Hyuk-Jae Lee
Recently, efficient hardware architectures have been proposed for the fast execution of H.264/AVC intra prediction. However, these architectures waste hardware resources because the intra predictions and reconstructions of the sixteen 4x4 blocks in one macroblock are serialized. This paper proposes an efficient pipelining method for the intra predictions and reconstructions of the 4x4 blocks. A new processing order is proposed to reduce dependencies between consecutively executed blocks. Thus, pipelined executions of these thirteen blocks speed up the execution time by 41% with a negligible drop of the compression efficiency when compared to the non-pipelined execution of the order in the JM reference software. Parallel execution of the intra prediction of Luma 4x4 blocks with the intra predictions of Luma 16x16 blocks and Chroma 8x8 blocks further reduces the execution time. When compared with the best previous work [1], the execution time is reduced by 13%.
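The gain from reordering can be illustrated with a toy two-stage pipeline model: a 4x4 block may enter the prediction stage only after the neighbors it predicts from have been reconstructed, so an order that separates dependent blocks removes most stalls. The left/top-only dependency model and the anti-diagonal order below are illustrative assumptions, not the processing order proposed in the paper.

```python
# Toy model of a two-stage (predict -> reconstruct) pipeline over the sixteen
# 4x4 blocks of a macroblock. Only left/top dependencies inside the macroblock
# are modeled; the anti-diagonal order is an illustrative example.

POS = [(r, c) for r in range(4) for c in range(4)]   # raster positions of the 4x4 blocks

def deps(idx):
    """Indices of the left/top neighbors (inside the macroblock) that block idx needs."""
    r, c = POS[idx]
    out = []
    if c > 0: out.append(POS.index((r, c - 1)))
    if r > 0: out.append(POS.index((r - 1, c)))
    return out

def pipeline_cycles(order):
    """Cycles needed when each stage takes one cycle and the pipeline stalls on dependencies."""
    done_at, cycle = {}, 0
    for blk in order:
        ready = max((done_at[d] for d in deps(blk)), default=0)
        start = max(cycle + 1, ready + 1)      # predict stage start
        done_at[blk] = start + 1               # reconstruct finishes one cycle later
        cycle = start
    return max(done_at.values())

raster = list(range(16))
antidiag = sorted(range(16), key=lambda i: (sum(POS[i]), POS[i][0]))
print("raster order:       ", pipeline_cycles(raster), "cycles")    # stalls on every block
print("anti-diagonal order:", pipeline_cycles(antidiag), "cycles")  # far fewer stalls
```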
IEEE Transactions on Consumer Electronics | 2009
Nam-Joon Kim; Sarp Ertürk; Hyuk-Jae Lee
Binary motion estimation algorithms reduce the computational complexity of motion estimation but sometimes generate an inaccurate motion vector. This paper proposes a novel two-bit representation, called two-bit transform-second derivatives (2BT-SD), which improves the efficiency of image binarization and the accuracy of motion estimation by making use of the positive and negative second derivatives independently in the derivation of the second bit plane. The second derivatives are also used to detect flat or background regions, avoiding expensive motion vector search operations for macroblocks in these areas and deriving their motion vectors by prediction from neighboring blocks. In applying the proposed 2BT-SD to the H.264/AVC standard, a further reduction of motion estimation complexity with a minor accuracy reduction is achieved by using the 2BT-SD representation for secondary motion estimation while using the full-resolution representation for the primary motion estimation. A hardware cost analysis shows that about 209 K gates of hardware logic and 2.7 K bytes of memory are saved by 2BT-SD for motion estimation of 1280×720 videos when compared with full-resolution motion estimation. Experiments show that the proposed 2BT-SD achieves better motion estimation accuracy than other binary motion estimation algorithms and provides faster processing time in flat or background regions with an acceptable bit-rate increase.
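A very rough sketch of the binarization step is given below: the first bit plane marks pixels above a local mean, and the second bit plane is driven by a discrete second derivative (Laplacian), whose small magnitude also flags flat regions. The filters, the threshold, and the way the positive and negative derivative parts are combined are assumptions for illustration; the paper treats the positive and negative second derivatives separately in a way this sketch does not reproduce.

```python
# Illustrative two-bit-plane binarization driven by second derivatives.
# Filters and threshold are assumptions, not the exact 2BT-SD construction.
import numpy as np

def two_bit_planes(frame, t=4):
    """Return (bp1, bp2, flat_mask) for a grayscale frame (2-D uint8 array)."""
    f = frame.astype(np.int32)
    pad = np.pad(f, 1, mode="edge")
    # First bit plane: pixel above its 3x3 local mean (simple mean-threshold binarization).
    local_mean = sum(pad[r:r + f.shape[0], c:c + f.shape[1]]
                     for r in range(3) for c in range(3)) // 9
    bp1 = (f > local_mean).astype(np.uint8)
    # Discrete Laplacian as the second derivative.
    lap = pad[:-2, 1:-1] + pad[2:, 1:-1] + pad[1:-1, :-2] + pad[1:-1, 2:] - 4 * f
    pos, neg = lap > t, lap < -t
    # Second bit plane: strong curvature of either sign (simplified combination).
    bp2 = (pos | neg).astype(np.uint8)
    # Pixels with small second derivatives mark flat/background regions whose
    # motion vectors would be predicted from neighbors instead of searched.
    flat_mask = np.abs(lap) <= t
    return bp1, bp2, flat_mask

frame = np.random.randint(0, 256, size=(16, 16), dtype=np.uint8)
bp1, bp2, flat = two_bit_planes(frame)
print(bp1.shape, bp2.mean(), flat.mean())
```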
International Symposium on Circuits and Systems | 2007
Genhua Jin; Jin-Su Jung; Hyuk-Jae Lee
A number of recent efforts have been made to speed up H.264 intra frame coding. When these algorithms are implemented by dedicated hardware accelerators, hardware resources are often wasted if the intra predictions and reconstructions of 4×4 blocks are serialized. To avoid this waste, this paper proposes a pipelined execution of the intra predictions and reconstructions of 4×4 blocks. Processing orders of 4×4 intra predictions are derived for encoding and decoding, respectively, to reduce the dependencies between consecutively processed blocks and minimize pipeline stalls. The proposed pipelined execution of 4×4 intra predictions for encoding is integrated with the other intra frame encoding operations through an efficient scheduling that allows these other operations to be executed in parallel with intra prediction. When compared with the best previous work for intra frame coding (Suh et al., 2005), the execution time is decreased by 41% even with reduced hardware resources.
IEEE Transactions on Circuits and Systems for Video Technology | 2012
Yongseok Jin; Hyuk-Jae Lee
Set-partitioning in hierarchical trees (SPIHT) is a widely used compression algorithm for wavelet-transformed images. One of its main drawbacks is a slow processing speed due to its dynamic processing order, which depends on the image contents. To overcome this drawback, this paper presents a modified SPIHT algorithm called block-based pass-parallel SPIHT (BPS). BPS decomposes a wavelet-transformed image into 4 × 4 blocks and simultaneously encodes all the bits in a bit-plane of a 4 × 4 block. To exploit parallelism, BPS reorganizes the three passes of the original SPIHT algorithm and then encodes/decodes the reorganized three passes in a parallel and pipelined manner. The precalculation of the stream length of each pass enables the parallel and pipelined execution of these three passes by not only the encoder but also the decoder. The modification of the processing order slightly degrades the compression efficiency. Experimental results show that the peak signal-to-noise ratio loss by BPS is between approximately 0.23 and 0.59 dB when compared to the original SPIHT algorithm. Both an encoder and a decoder are implemented in hardware that can process 120 million samples per second at an operating clock frequency of 100 MHz. This processing speed allows a video of size 1920 × 1080 in the 4:2:2 format to be processed at the rate of 30 frames/s. The gate count of the hardware is about 43.9 K.
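The block-level parallelism can be pictured with the sketch below: for a 4 × 4 block of wavelet coefficients, all sixteen bits of a given bit plane are produced at once from the block's magnitudes instead of being visited coefficient by coefficient as in the original SPIHT ordering. This is only an illustration of bit-plane coding over 4 × 4 blocks and does not reproduce the actual three-pass BPS bitstream.

```python
# Illustration of bit-plane coding over a 4x4 block of wavelet coefficients:
# all bits of one bit plane are produced together, which is the kind of
# block/bit-plane parallelism BPS exploits. Not the actual BPS bitstream format.
import numpy as np

def bitplanes_of_block(coeffs4x4):
    """Yield (plane_index, significance_bits, new_sign_bits) from the MSB plane down to 0."""
    mags = np.abs(coeffs4x4).astype(np.uint32)
    signs = (coeffs4x4 < 0).astype(np.uint8)
    top = int(mags.max()).bit_length() - 1 if mags.max() else 0
    became_significant = np.zeros_like(mags, dtype=bool)
    for n in range(top, -1, -1):
        plane = (mags >> n) & 1                    # all 16 bits of plane n at once
        newly = plane.astype(bool) & ~became_significant
        became_significant |= newly
        yield n, plane.flatten(), signs[newly]     # sign sent when a coefficient first becomes significant

block = np.array([[34, -17,  5, 0],
                  [-9,   4, -2, 1],
                  [ 3,  -1,  0, 0],
                  [ 1,   0,  0, 0]])
for n, bits, new_signs in bitplanes_of_block(block):
    print(f"plane {n}: bits={bits.tolist()} new_signs={new_signs.tolist()}")
```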
International Conference on Supercomputing | 1997
Hyuk-Jae Lee; James P. Robertson; José A. B. Fortes
Cannon’s algorithm is a memory-efficient matrix multiplication technique for parallel computers with toroidal mesh interconnections. This algorithm assumes that the input matrices are block distributed, but it is not clear how it can deal with block-cyclic distributed matrices. This paper generalizes Cannon’s algorithm to the case where the input matrices are block-cyclic distributed across a two-dimensional processor array with an arbitrary number of processors and toroidal mesh interconnections. An efficient scheduling technique is proposed so that the number of communication steps is reduced to the least common multiple of P and Q for a given P x Q processor array. In addition, a partitioning and communication scheme is proposed to reduce the number of page faults for the case when matrices are too large to fit into main memory. Performance analysis shows that the proposed generalized Cannon’s algorithm (GCA) requires fewer page faults than a previously proposed algorithm (SUMMA). Experimental results on the Intel Paragon show that GCA performs better than SUMMA when blocks larger than about 65 x 65 are used. However, GCA performance degrades if the block size is relatively small, while SUMMA maintains the same performance. It is also shown that GCA maintains higher performance for large matrices than SUMMA.
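For reference, the sketch below simulates the classic block-distributed Cannon's algorithm that the paper generalizes: after an initial skew of A's block-rows and B's block-columns, each of the P steps multiplies the locally held blocks and then shifts A left and B up around the torus. The GCA extension to block-cyclic layouts and lcm(P, Q) communication steps is not reproduced here.

```python
# Simulation of classic Cannon's algorithm on a virtual P x P processor torus
# (the block-distributed baseline that GCA generalizes). Each "processor" (i, j)
# holds one block of A, B, and C; communication is modeled as cyclic shifts.
import numpy as np

def cannon_matmul(A, B, P):
    n = A.shape[0]
    b = n // P                                               # block size (assume P divides n)
    blk = lambda M, i, j: M[i*b:(i+1)*b, j*b:(j+1)*b].copy()
    # Initial skew: block-row i of A shifted left by i, block-column j of B shifted up by j.
    a = [[blk(A, i, (j + i) % P) for j in range(P)] for i in range(P)]
    bm = [[blk(B, (i + j) % P, j) for j in range(P)] for i in range(P)]
    c = [[np.zeros((b, b)) for _ in range(P)] for _ in range(P)]
    for _ in range(P):                                       # P compute/shift steps
        for i in range(P):
            for j in range(P):
                c[i][j] += a[i][j] @ bm[i][j]
        a = [[a[i][(j + 1) % P] for j in range(P)] for i in range(P)]    # shift A left
        bm = [[bm[(i + 1) % P][j] for j in range(P)] for i in range(P)]  # shift B up
    return np.block(c)

A = np.random.rand(6, 6); B = np.random.rand(6, 6)
assert np.allclose(cannon_matmul(A, B, P=3), A @ B)
```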
IEEE Journal of Selected Topics in Signal Processing | 2013
Kyujoong Lee; Hyuk-Jae Lee; Jaehyun Kim; Yong-Seok Choi
In HEVC, the effect of rate-distortion optimization (RDO) on compression efficiency increases, while its computational complexity accounts for an important part of the computation burden. For H.264/AVC, zero block (ZB) detection has been used as an efficient scheme to reduce the complexity of RDO operations. For HEVC, ZB detection differs from the scheme for H.264/AVC because HEVC supports large transform sizes, 16 × 16 and 32 × 32. The increased transform size increases the complexity of ZB detection. It also decreases the accuracy of ZB detection because the variance among the quantized DCT coefficients increases, and consequently the possibility that a block is detected as a ZB even though not all of its quantized coefficients are zero also increases. For effective ZB detection of 16 × 16 and 32 × 32 blocks, this paper proposes a new ZB detection algorithm in which the DC components of the Hadamard coefficients are transformed again by Hadamard basis functions for 16 × 16 and 32 × 32 blocks, and the results are compared with a predefined threshold. To reduce the computational complexity, an upper bound on the sum of absolute differences (SAD) or sum of absolute transformed differences (SATD) is defined, and the Hadamard threshold is then tested only for blocks whose SAD (or SATD) is smaller than this upper bound. Experimental results show that the proposed ZB detection reduces the computational complexity of RDO by about 40% with a negligible degradation of the RD performance.
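The detection test itself can be pictured roughly as follows: take the Hadamard transform of each sub-block of the residual, gather the DC terms, transform that DC map by Hadamard basis functions again, and declare a likely zero block when the result stays below a threshold. The 4 × 4 sub-block size, the threshold value, and the max-magnitude comparison in the sketch are assumptions for illustration.

```python
# Hedged sketch of the secondary-Hadamard zero-block test: 4x4 Hadamard
# transforms over sub-blocks, DC terms gathered and transformed again,
# then compared with a threshold. Sub-block size and threshold are assumptions.
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1,  1, -1, -1],
               [1, -1, -1,  1],
               [1, -1,  1, -1]])

def hadamard4(block):
    return H4 @ block @ H4.T

def looks_like_zero_block(residual, threshold):
    """Predict whether a 16x16 residual would quantize to all zeros."""
    n = residual.shape[0] // 4
    # DC (top-left) coefficient of the 4x4 Hadamard transform of each sub-block.
    dc = np.array([[hadamard4(residual[4*i:4*i+4, 4*j:4*j+4])[0, 0]
                    for j in range(n)] for i in range(n)])
    # Transform the DC map again (only handled here for n == 4, i.e. a 16x16 residual;
    # a 32x32 residual would need an 8x8 Hadamard).
    secondary = hadamard4(dc) if n == 4 else dc
    return np.max(np.abs(secondary)) < threshold

residual = np.random.randint(-2, 3, size=(16, 16))
print(looks_like_zero_block(residual, threshold=600))
```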
IEEE Transactions on Consumer Electronics | 2012
Chae Eun Rhee; Kyujoong Lee; Tae Sung Kim; Hyuk-Jae Lee
The emerging High Efficiency Video Coding (HEVC) standard attempts to improve the coding efficiency by a factor of two over H.264/AVC using new compression tools with high computational complexity. The increased computational complexity makes real-time execution with reasonable computing power one of the critical concerns for the commercialization of HEVC. The large number of prediction modes is a main cause of the increased complexity of HEVC; thus, fast prediction mode decision needs to be used effectively to reduce the computational complexity. To take advantage of the large body of previous work and to find a guide for application to HEVC, this paper presents a survey of these efforts for the previous standards, especially H.264/AVC, and examines whether the previous algorithms are applicable to HEVC. To this end, previous algorithms are categorized and the effectiveness of each category for HEVC is evaluated. For this evaluation, a previous algorithm is modified for HEVC when it is not directly applicable. Simulation results show that most previous algorithms, with slight modification, improve the encoding speed with a relatively small degradation of the compression efficiency. Among them, hierarchical mode decision is especially effective, whereas mode pre-decision using motion or spatial homogeneity often produces inaccurate results.
IEEE Transactions on Circuits and Systems for Video Technology | 2010
Chae Eun Rhee; Jin-Su Jung; Hyuk-Jae Lee
This paper presents a novel processing time control algorithm for a hardware-based H.264/AVC encoder. The encoder employs three complexity scaling methods: partial cost evaluation for fractional motion estimation (FME), block size adjustment for FME, and search range adjustment for integer motion estimation (IME). With these methods, 12 complexity levels are defined to support tradeoffs between the processing time and compression efficiency. A speed control algorithm is proposed to select the complexity level that compresses most efficiently among those that meet the target time budget. The time budget is allocated to each macroblock based on the complexity of the macroblock and on the execution time of the other macroblocks in the frame. For main profile compression, an additional complexity scaling method called direction filtering is proposed to select the prediction direction of FME by comparing the costs resulting from forward and backward IMEs. With direction filtering in addition to the three complexity scaling methods for baseline compression, 32 complexity levels are defined for main profile compression. Experimental results show that the speed control algorithm guarantees that the processing time meets the given time budget with negligible quality degradation. The various complexity levels for speed control can also be used to speed up encoding with a slight degradation in quality and a minor reduction of the compression efficiency.
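The budget-driven level selection can be sketched as below: each macroblock receives a share of the remaining time budget, and the encoder picks the most compression-efficient complexity level whose estimated cost still fits. The level table and the equal-share budget split are made-up numbers for illustration, not the 12 levels defined in the paper.

```python
# Hedged sketch of per-macroblock complexity-level selection under a time budget.
# The level table (estimated cycles, relative compression efficiency) is illustrative.
LEVELS = [(300, 0.90), (450, 0.94), (600, 0.97), (800, 1.00)]  # cheapest to best

def pick_level(remaining_cycles, macroblocks_left):
    """Choose the best-compressing level that fits this macroblock's share of the budget."""
    budget = remaining_cycles / max(macroblocks_left, 1)
    feasible = [lv for lv in LEVELS if lv[0] <= budget]
    return max(feasible, key=lambda lv: lv[1]) if feasible else LEVELS[0]

# Example: walk through a frame of 10 macroblocks with a shrinking cycle budget.
remaining, mbs = 5000, 10
for mb in range(mbs):
    cycles, eff = pick_level(remaining, mbs - mb)
    remaining -= cycles
    print(f"MB {mb}: level costing {cycles} cycles (efficiency {eff}), {remaining} cycles left")
```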