Publication


Featured research published by Liang-Gee Chen.


IEEE Transactions on Circuits and Systems for Video Technology | 2002

Efficient moving object segmentation algorithm using background registration technique

Shao-Yi Chien; Shyh-Yih Ma; Liang-Gee Chen

An efficient moving object segmentation algorithm suitable for real-time content-based multimedia communication systems is proposed in this paper. First, a background registration technique is used to construct a reliable background image from the accumulated frame-difference information. The moving object region is then separated from the background region by comparing the current frame with the constructed background image. Finally, a post-processing step is applied to the obtained object mask to remove noise regions and to smooth the object boundary. In situations where object shadows appear in the background region, a pre-processing gradient filter is applied to the input image to reduce the shadow effect. To meet the real-time requirement, no computationally intensive operation is included in this method. Moreover, the implementation is optimized with parallel processing, and a processing speed of 25 QCIF frames per second is achieved on a personal computer with a 450-MHz Pentium III processor. Simulation results demonstrate good segmentation performance.
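The background-registration pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the paper's implementation: the difference threshold, the stationarity count, and the fallback to frame differencing for unregistered pixels are invented values and conventions.

```python
import numpy as np

DIFF_T = 10       # frame-difference threshold (assumed value)
STILL_FRAMES = 5  # frames a pixel must stay unchanged to enter the background

def segment(frames):
    """Return per-frame object masks using background registration."""
    h, w = frames[0].shape
    still = np.zeros((h, w), dtype=int)      # per-pixel stationarity counter
    bg = np.zeros((h, w), dtype=float)       # registered background image
    bg_valid = np.zeros((h, w), dtype=bool)  # which pixels have a background
    prev = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        f = f.astype(float)
        moving = np.abs(f - prev) > DIFF_T           # frame-difference mask
        still = np.where(moving, 0, still + 1)
        # register pixels that stayed still long enough into the background
        register = still >= STILL_FRAMES
        bg[register] = f[register]
        bg_valid |= register
        # object mask: compare against the background where one exists,
        # otherwise fall back to the frame-difference mask
        mask = np.where(bg_valid, np.abs(f - bg) > DIFF_T, moving)
        masks.append(mask)
        prev = f
    return masks
```

Once the background is registered, a moving region is detected even where consecutive frames happen to agree, which is the advantage over plain frame differencing.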


IEEE Transactions on Circuits and Systems for Video Technology | 2005

Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder

Yu-Wen Huang; Bing-Yu Hsieh; Tung-Chien Chen; Liang-Gee Chen

Intra prediction with rate-distortion constrained mode decision is the most important technology in the H.264/AVC intra frame coder, which is competitive with the latest image coding standard, JPEG2000, in terms of both coding performance and computational complexity. The predictor generation engine for intra prediction and the transform engine for mode decision are critical because the operations require a lot of memory access and occupy 80% of the computation time of the entire intra compression process. A low-cost general-purpose processor cannot process these operations in real time. In this paper, we propose two solutions for platform-based design of an H.264/AVC intra frame coder. One solution is a software implementation targeted at low-end applications. Context-based decimation of unlikely candidates, subsampling of matching operations, bit-width truncation to reduce the computations, and an interleaved full-search/partial-search strategy to stop error propagation and maintain image quality are proposed and combined as our fast algorithm. Experimental results show that our method can reduce 60% of the computation used for intra prediction and mode decision while keeping the peak signal-to-noise ratio degradation below 0.3 dB. The other solution is a hardware accelerator targeted at high-end applications. After comprehensive analysis of instructions and exploration of parallelism, we propose a system architecture with four-parallel intra prediction and mode decision to enhance the processing capability. Hadamard-based mode decision is modified into a discrete cosine transform-based version to reduce 40% of memory access. Two-stage macroblock pipelining is also proposed to double the processing speed and hardware utilization. The other features of our design are a reconfigurable predictor generator supporting all 13 intra prediction modes, a parallel multitransform and inverse transform engine, and a CAVLC bitstream engine.
A prototype chip is fabricated with TSMC 0.25-µm CMOS 1P5M technology. Simulation results show that our implementation can process 16 megapixels (4096 × 4096) within 1 s, or equivalently 720 × 480 4:2:0 30-Hz video in real time, at an operating frequency of 54 MHz. The transistor count is 429 K, and the core size is only 1.855 × 1.885 mm².
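The Hadamard-based mode decision mentioned above rests on the SATD (sum of absolute transformed differences) cost. The sketch below shows a 4×4 SATD and a bare-bones mode decision that picks the prediction minimizing it; the halving normalization follows common reference-software practice, and the Lagrangian rate term used in real rate-distortion constrained decision is omitted.

```python
import numpy as np

# 4x4 Hadamard matrix (unnormalized), the transform behind SATD
H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

def satd4x4(orig, pred):
    """SATD of a 4x4 block: 2-D Hadamard transform of the residual."""
    d = orig.astype(int) - pred.astype(int)
    t = H @ d @ H.T
    return np.abs(t).sum() // 2   # common normalization in reference code

def best_mode(orig, predictions):
    """Pick the prediction mode with minimum SATD (rate term omitted)."""
    costs = {m: satd4x4(orig, p) for m, p in predictions.items()}
    return min(costs, key=costs.get), costs
```

A real coder evaluates all candidate intra modes this way for every block, which is why the paper parallelizes prediction and mode decision four ways.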


IEEE Transactions on Circuits and Systems for Video Technology | 2006

Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder

Tung-Chien Chen; Shao-Yi Chien; Yu-Wen Huang; Chen-Han Tsai; Ching-Yeh Chen; To-Wei Chen; Liang-Gee Chen

H.264/AVC significantly outperforms previous video coding standards with many new coding tools. However, the better performance comes at the price of extraordinary computational complexity and memory access requirements, which makes it difficult to design a hardwired encoder for real-time applications. In addition, the complex, sequential, and highly data-dependent characteristics of the essential algorithms in H.264/AVC make both pipelining and parallel processing difficult to employ. Hardware utilization and throughput are also decreased by the block-, MB-, and frame-level reconstruction loops. In this paper, we describe our techniques for designing an H.264/AVC video encoder for HDTV applications. At the system level, in consideration of the characteristics of the key components and the reconstruction loops, a four-stage macroblock-pipelined system architecture is first proposed with efficient scheduling and a memory hierarchy. At the module level, the design considerations of the significant modules are addressed, followed by their hardware architectures, including low-bandwidth integer motion estimation, parallel fractional motion estimation, a reconfigurable intra-predictor generator, a dual-buffer block-pipelined entropy coder, and a deblocking filter. With these techniques, a prototype chip of the efficient H.264/AVC encoder is implemented with 922.8 K logic gates and 34.72 KB of SRAM at a 108-MHz operating frequency.
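The four-stage macroblock pipeline can be visualized with a toy schedule. The stage names loosely follow the module list in the abstract, and the timing model (exactly one MB-time per stage) is a deliberate simplification for illustration.

```python
# Assumed stage split; the real encoder's stage boundaries may differ.
STAGES = ["IME", "FME", "INTRA/REC", "EC/DB"]

def schedule(num_mbs):
    """Per MB-time slot, which MB index each stage works on (None = idle)."""
    slots = []
    total = num_mbs + len(STAGES) - 1   # fill + drain time
    for t in range(total):
        row = {}
        for s, name in enumerate(STAGES):
            mb = t - s                   # stage s lags the front by s slots
            row[name] = mb if 0 <= mb < num_mbs else None
        slots.append(row)
    return slots
```

Once the pipeline fills, all four stages are busy every slot, so throughput approaches one macroblock per MB-time despite each MB taking four stages end to end.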


IEEE Transactions on Circuits and Systems | 2006

Analysis and architecture design of variable block-size motion estimation for H.264/AVC

Ching-Yeh Chen; Shao-Yi Chien; Yu-Wen Huang; Tung-Chien Chen; Tu-Chih Wang; Liang-Gee Chen

Variable block-size motion estimation (VBSME) has become an important video coding technique, but it increases the difficulty of hardware design. In this paper, we use inter-/intra-level classification and various data flows to analyze the impact of supporting VBSME in different hardware architectures. Furthermore, we propose two hardware architectures that can support traditional fixed block-size motion estimation as well as VBSME with less chip area overhead than previous approaches. By broadcasting reference pixel rows and propagating partial sums of absolute differences (SADs), the first design has fewer reference pixel registers and a shorter critical path. The second design utilizes a two-dimensional distortion array and one adder tree with a reference buffer that maximizes data reuse between successive search candidates. The first design is suitable for low resolution or a small search range, while the second has the advantages of supporting a high degree of parallelism and VBSME. Finally, we propose an eight-parallel SAD tree with a shared reference buffer for H.264/AVC integer motion estimation (IME). Its processing ability is eight times that of a single SAD tree, while the reference buffer size is only doubled. Moreover, the most critical issue of H.264 IME, its huge memory bandwidth, is overcome: 99.9% of off-chip memory bandwidth and 99.22% of on-chip memory bandwidth are saved. We demonstrate a 720p, 30-fps solution at 108 MHz with a 330.2k gate count and 208k bits of on-chip memory.
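The key trick behind VBSME hardware such as the SAD tree is partial-SAD reuse: compute the sixteen 4×4 SADs of a macroblock once, then merge them into the larger block sizes instead of recomputing. A minimal NumPy sketch of that merging (H.264's 8×4 and 4×8 sizes are left out for brevity):

```python
import numpy as np

def sad_4x4_grid(cur_mb, ref_mb):
    """16 partial SADs of a 16x16 macroblock, the shared base of the tree."""
    d = np.abs(cur_mb.astype(int) - ref_mb.astype(int))
    # axes: (block row, row in block, block col, col in block)
    return d.reshape(4, 4, 4, 4).sum(axis=(1, 3))

def vbs_sads(cur_mb, ref_mb):
    """SADs for several H.264 block sizes from one set of 4x4 partials."""
    g = sad_4x4_grid(cur_mb, ref_mb)
    sad8 = g.reshape(2, 2, 2, 2).sum(axis=(1, 3))  # four 8x8 SADs
    return {
        "4x4": g,
        "8x8": sad8,
        "16x8": sad8.sum(axis=1),   # top / bottom halves
        "8x16": sad8.sum(axis=0),   # left / right halves
        "16x16": g.sum(),
    }
```

In hardware, the merge is an adder tree rather than array reshapes, but the data flow is the same: every larger SAD is a sum of already-computed 4×4 partials, so supporting VBSME costs adders, not extra absolute-difference units.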


IEEE Transactions on Circuits and Systems for Video Technology | 2003

Analysis and architecture design of block-coding engine for EBCOT in JPEG 2000

Chung-Jr Lian; Kuanfu Chen; Hong-Hui Chen; Liang-Gee Chen

Embedded block coding with optimized truncation (EBCOT) is the most important technology in the latest image-coding standard, JPEG 2000. The hardware design of the block-coding engine in EBCOT is critical because the operations are bit-level processing and occupy more than half of the computation time of the whole compression process. A general-purpose processor (GPP) is therefore very inefficient at these operations. We present detailed analysis and a dedicated hardware architecture of the block-coding engine to execute the EBCOT algorithm efficiently. The context formation process in EBCOT is analyzed to gain insight into the characteristics of the operation. A column-based architecture and two speed-up methods for context generation, sample skipping (SS) and group-of-column skipping (GOCS), are then proposed. As for the arithmetic encoder design, pipelining and look-ahead techniques are used to speed up the processing. It is shown that about 60% of the processing time is reduced compared with a sample-based straightforward implementation. A test chip is designed, and simulation results show that it can process a 4.6-megapixel image (2400 × 1800) within 1 s, or a CIF (352 × 288) 4:2:0 video sequence at 30 frames per second, at a 50-MHz working frequency.
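The skipping speed-ups exploit a property of EBCOT's significance-propagation pass: a sample is only coded there when at least one of its eight neighbors is already significant. A stripe column whose samples and surrounding neighbors are all insignificant can therefore be skipped wholesale. The sketch below checks that condition for one 4-sample stripe column; the indexing conventions are assumptions, not the standard's exact scan details.

```python
import numpy as np

def column_skippable(sig, col, top):
    """True if a 4-row stripe column can be skipped in the significance-
    propagation pass: neither its samples nor any of their 8-neighbors
    is significant.  `sig` is the boolean significance-state map."""
    h, w = sig.shape
    r0, r1 = max(top - 1, 0), min(top + 5, h)   # 4 rows plus one above/below
    c0, c1 = max(col - 1, 0), min(col + 2, w)   # column plus both neighbors
    return not sig[r0:r1, c0:c1].any()
```

Group-of-column skipping extends the same test to runs of adjacent columns, which is why it pays off most in early bit-planes where almost nothing is significant yet.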


IEEE Transactions on Signal Processing | 2004

Flipping structure: an efficient VLSI architecture for lifting-based discrete wavelet transform

Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen

In this paper, an efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The precision issues are also analyzed. By case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter, the efficiency of the proposed flipping structure is demonstrated.
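A lifting ladder like the (9,7) filter alternates predict and update steps on the even/odd samples, and is invertible simply by running the steps backwards with subtraction. The sketch below uses the standard CDF 9/7 coefficients, but periodic extension (`np.roll`) instead of JPEG 2000's symmetric extension and one common scaling convention, so it illustrates the structure rather than bit-exact JPEG 2000 behavior. The flipping structure of the paper itself rescales these steps to shorten the critical path; that rescaling is not shown here.

```python
import numpy as np

# CDF 9/7 lifting coefficients (JPEG 2000 lossy filter)
A, B, G, D, K = -1.586134342, -0.052980118, 0.882911076, 0.443506852, 1.230174105

def fwd97(x):
    """Forward 1-D (9,7) lifting transform; returns (lowpass, highpass)."""
    s, d = x[0::2].astype(float), x[1::2].astype(float)
    d += A * (s + np.roll(s, -1))   # predict 1  (periodic extension)
    s += B * (d + np.roll(d, 1))    # update 1
    d += G * (s + np.roll(s, -1))   # predict 2
    s += D * (d + np.roll(d, 1))    # update 2
    return s * K, d / K             # scaling

def inv97(s, d):
    """Inverse transform: undo each lifting step in reverse order."""
    s, d = s / K, d * K
    s -= D * (d + np.roll(d, 1))
    d -= G * (s + np.roll(s, -1))
    s -= B * (d + np.roll(d, 1))
    d -= A * (s + np.roll(s, -1))
    x = np.empty(s.size + d.size)
    x[0::2], x[1::2] = s, d
    return x
```

Perfect reconstruction holds by construction, whatever the coefficients: each lifting step only adds a function of the *other* channel, so subtracting the same function undoes it exactly.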


IEEE Transactions on Circuits and Systems for Video Technology | 2006

Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC

Yu-Wen Huang; Bing-Yu Hsieh; Shao-Yi Chien; Shyh-Yih Ma; Liang-Gee Chen

In the new video coding standard H.264/AVC, motion estimation (ME) is allowed to search multiple reference frames. The required computation is therefore greatly increased, in proportion to the number of searched reference frames. However, the reduction in prediction residues depends mostly on the nature of the sequence, not on the number of searched frames. Sometimes the prediction residues can be greatly reduced, but frequently a lot of computation is wasted without any gain in coding performance. In this paper, we propose a context-based adaptive method to speed up multiple reference frame ME. Statistical analysis is first applied to the information available for each macroblock (MB) after intra-prediction and inter-prediction from the previous frame. Context-based adaptive criteria are then derived to determine whether it is necessary to search more reference frames. The reference frame selection criteria are related to the selected MB modes, inter-prediction residues, intra-prediction residues, motion vectors of subpartitioned blocks, and quantization parameters. Many standard video sequences are tested as examples. The simulation results show that the proposed algorithm maintains essentially the same video quality as an exhaustive search of multiple reference frames, while 76%–96% of the computation for searching unnecessary reference frames is avoided. Moreover, our fast reference frame selection is orthogonal to conventional fast block matching algorithms, and the two can easily be combined for even more efficient implementations.
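The control flow of such an early-termination scheme is simple: search the nearest reference frame first and continue to older frames only while the best cost so far suggests further search may pay off. The threshold rule below is an invented stand-in for the paper's context-based criteria, and `eval_cost` abstracts away the (expensive) per-frame motion search.

```python
def search_references(eval_cost, num_refs, threshold):
    """Adaptive multiple-reference-frame search sketch.

    eval_cost(k) runs motion estimation on reference frame k (0 = nearest)
    and returns its best matching cost; `threshold` is a hypothetical
    'good enough' bound standing in for the paper's adaptive criteria."""
    best_frame, best_cost = 0, eval_cost(0)
    for k in range(1, num_refs):
        if best_cost <= threshold:   # good enough: skip the older frames
            break
        c = eval_cost(k)
        if c < best_cost:
            best_frame, best_cost = k, c
    return best_frame, best_cost
```

Because the decision is made per macroblock from already-available statistics, the savings come precisely from the frequent case where extra reference frames would not have helped.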


IEEE Transactions on Signal Processing | 1993

An efficient and simple VLSI tree architecture for motion estimation algorithms

Yeu-Shen Jehng; Liang-Gee Chen; Tzi-Dar Chiueh

A low-latency, high-throughput tree architecture is proposed. This architecture implements both the full-search block-matching algorithm and the three-step hierarchical search algorithm for motion estimation. Owing to its simplicity and modularity, the proposed architecture is suitable for VLSI implementation. Furthermore, it can be decomposed into subtrees to reduce hardware cost and pin count. Memory interleaving and pipeline interleaving are also employed to enhance memory bandwidth and achieve 100% pipeline utilization. Theoretical calculations and simulation results are presented to demonstrate its attractive performance.
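Of the two algorithms the tree architecture supports, the three-step hierarchical search is the less familiar; a software sketch makes its data flow clear. Nine candidates around the current best are probed with step sizes 4, 2, 1, re-centering after each step (bounds checking is omitted for brevity, so the search window must stay inside the reference frame).

```python
import numpy as np

def sad(cur, ref, y, x):
    """Sum of absolute differences of `cur` against ref at (y, x)."""
    h, w = cur.shape
    return np.abs(cur - ref[y:y + h, x:x + w]).sum()

def three_step_search(cur, ref, y, x):
    """Classic three-step search; returns (motion vector, cost)."""
    best, best_cost = (0, 0), sad(cur, ref, y, x)
    for step in (4, 2, 1):
        cy, cx = best                    # re-center on the best so far
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mv = (cy + dy, cx + dx)
                cost = sad(cur, ref, y + mv[0], x + mv[1])
                if cost < best_cost:
                    best, best_cost = mv, cost
    return best, best_cost
```

Only 25 candidates are evaluated instead of the 225 a ±7 full search needs, which is why a tree of absolute-difference units can serve both algorithms with the matching-cost datapath unchanged.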


Signal Processing Systems | 2006

Survey on Block Matching Motion Estimation Algorithms and Architectures with New Results

Yu-Wen Huang; Ching-Yeh Chen; Chen-Han Tsai; Chun-Fu Shen; Liang-Gee Chen

Block matching motion estimation is the heart of video coding systems. During the last two decades, hundreds of fast algorithms and VLSI architectures have been proposed. In this paper, we try to provide an extensive exploration of motion estimation together with our new developments. The main concepts of fast algorithms can be classified into six categories: reduction in search positions, simplification of the matching criterion, bitwidth reduction, predictive search, hierarchical search, and fast full search. Comparisons of various algorithms in terms of video quality and computational complexity are given as useful guidelines for software applications. As for hardware implementations, full-search architectures derived from systolic mapping are first introduced. The systolic arrays can be divided into inter-type and intra-type with 1-D, 2-D, and tree structures. Hexagonal plots are presented for system designers to evaluate the architectures clearly in six aspects: gate count, required frequency, hardware utilization, memory bandwidth, memory bitwidth, and latency. Next, architectures supporting fast algorithms are reviewed. Finally, we propose our algorithmic and architectural co-development. The main idea is a quick check of the entire search range with a simplified matching criterion to globally eliminate impossible candidates, followed by finer selection among the potential best-matched candidates. The operations of the two stages are mapped to the same hardware for resource sharing. Simulation results show that our design is ten times more area-speed efficient than full-search architectures while the video quality is essentially the same.
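The coarse-then-fine co-development described at the end of the abstract can be sketched directly: stage one scans the whole search range with a simplified criterion (here 4:1 pixel subsampling, an assumed simplification), stage two applies the exact SAD only to the surviving candidates. The survivor count `keep` is an illustrative parameter.

```python
import numpy as np

def full_sad(cur, ref, y, x):
    h, w = cur.shape
    return np.abs(cur - ref[y:y + h, x:x + w]).sum()

def coarse_sad(cur, ref, y, x):
    """Simplified matching criterion: 4:1 pixel subsampling."""
    h, w = cur.shape
    return np.abs(cur[::2, ::2] - ref[y:y + h:2, x:x + w:2]).sum()

def two_stage_search(cur, ref, y0, x0, rng, keep=4):
    """Coarse global elimination, then exact SAD on the survivors."""
    cands = [(coarse_sad(cur, ref, y0 + dy, x0 + dx), dy, dx)
             for dy in range(-rng, rng + 1)
             for dx in range(-rng, rng + 1)]
    cands.sort(key=lambda t: t[0])           # cheapest-looking first
    best = min(cands[:keep],
               key=lambda t: full_sad(cur, ref, y0 + t[1], x0 + t[2]))
    return (best[1], best[2])
```

Mapping both stages onto the same datapath works because the coarse criterion is just the full SAD with most inputs gated off, which is the resource-sharing point the paper exploits.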


IEEE Transactions on Circuits and Systems for Video Technology | 1997

Error concealment of lost motion vectors with overlapped motion compensation

Mei-Juan Chen; Liang-Gee Chen; Ro-Min Weng

Most video sequence coding systems use block motion compensation to remove temporal redundancy, owing to its regularity and simplicity. A new error concealment algorithm for recovering lost or erroneously received motion vectors is presented. It combines overlapped motion compensation with the side match criterion to make the effect of lost motion vectors subjectively imperceptible. The side match criterion takes advantage of the spatial contiguity and interpixel correlation of images to select the best-fit replacement among the motion vectors of spatially contiguous candidate blocks. In particular, to mask blocking artifacts, we incorporate an overlapping technique to create a subjectively closer approximation to the true error-free image.
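The side match criterion itself is easy to sketch: for each candidate motion vector, fetch the replacement block from the reference frame and measure how well its border pixels continue the intact pixels just outside the lost block; the candidate with the smallest boundary mismatch wins. This sketch covers only the selection step; the paper's overlapped motion compensation, which blends several predictions to hide blocking artifacts, is not shown.

```python
import numpy as np

def side_match_cost(frame, ref, top, left, n, mv):
    """Boundary mismatch between a candidate replacement block (taken from
    `ref` with motion vector `mv`) and the intact pixels surrounding the
    lost n-by-n block at (top, left) in `frame`."""
    dy, dx = mv
    blk = ref[top + dy:top + dy + n, left + dx:left + dx + n]
    cost = 0.0
    cost += np.abs(blk[0, :]  - frame[top - 1, left:left + n]).sum()  # top
    cost += np.abs(blk[-1, :] - frame[top + n, left:left + n]).sum()  # bottom
    cost += np.abs(blk[:, 0]  - frame[top:top + n, left - 1]).sum()   # left
    cost += np.abs(blk[:, -1] - frame[top:top + n, left + n]).sum()   # right
    return cost

def conceal(frame, ref, top, left, n, candidate_mvs):
    """Replace the lost block using the candidate MV with best side match."""
    best = min(candidate_mvs,
               key=lambda mv: side_match_cost(frame, ref, top, left, n, mv))
    dy, dx = best
    frame[top:top + n, left:left + n] = \
        ref[top + dy:top + dy + n, left + dx:left + dx + n]
    return best
```

The candidate set is typically the motion vectors of the neighboring correctly received blocks, which is what "spatially contiguous candidate blocks" refers to.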

Collaboration

Top co-authors of Liang-Gee Chen:

Shao-Yi Chien | National Taiwan University
Tung-Chien Chen | National Taiwan University
Yu-Wen Huang | National Taiwan University
Hung-Chi Fang | National Taiwan University
Chao-Tsung Huang | National Tsing Hua University
Ching-Yeh Chen | National Taiwan University
Chung-Jr Lian | National Taiwan University
Chung-Te Li | National Taiwan University
Yi-Min Tsai | National Taiwan University
Chao-Chung Cheng | National Taiwan University