Tsu-Ming Liu
National Chiao Tung University
Publication
Featured research published by Tsu-Ming Liu.
international solid-state circuits conference | 2006
Tsu-Ming Liu; Ting-An Lin; Sheng-Zen Wang; Wen-Ping Lee; Jiun-Yan Yang; Kang-Cheng Hou; Chen-Yi Lee
A low-power dual-standard video decoder has been developed for mobile applications. It supports MPEG-2 SP@ML and H.264/AVC BL@L4 video decoding in a single chip and features a scalable architecture to reach area/power efficiency. This chip integrates diverse algorithms of MPEG-2 and H.264/AVC to reduce silicon area. Three low-power techniques are proposed. First, a domain-pipelined scalability (DPS) technique is used to optimize the pipelined structure according to the number of processing cycles. Second, bandwidth scalability is implemented via a line-pixel-lookahead (LPL) scheme to improve the external bandwidth and reduce the internal memory size, leading to 51% of memory power reduction compared to a conventional design. Third, low-power motion compensation and deblocking filter are designed to reduce the operating frequency without degrading system performance. A test chip is fabricated in a 0.18 μm one-poly six-metal CMOS technology with an area of 15.21 mm². For mobile applications, H.264/AVC and MPEG-2 video decoding of quarter-common intermediate format (QCIF) sequences at 15 frames per second are achieved at 1.15 MHz clock frequency with power dissipation of 125 μW and 108 μW, respectively, at 1 V supply voltage.
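The reported power figures can be turned into energy per decoded frame and per macroblock, which makes the two modes easier to compare. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted in the abstract; it assumes the standard QCIF size of 176×144 pixels (99 macroblocks of 16×16), and the function names are illustrative rather than taken from the paper.

```python
QCIF_MACROBLOCKS = (176 // 16) * (144 // 16)  # 11 x 9 = 99 macroblocks per frame

def energy_per_frame_uj(power_uw, fps):
    """Energy per decoded frame in microjoules (uW divided by frames/s)."""
    return power_uw / fps

def energy_per_macroblock_nj(power_uw, fps, mbs_per_frame=QCIF_MACROBLOCKS):
    """Energy per decoded macroblock in nanojoules."""
    return energy_per_frame_uj(power_uw, fps) / mbs_per_frame * 1000

# Figures quoted in the abstract: QCIF at 15 frames/s, 1 V supply.
print(round(energy_per_frame_uj(125, 15), 2))       # H.264/AVC: 8.33 uJ/frame
print(round(energy_per_frame_uj(108, 15), 2))       # MPEG-2:    7.2 uJ/frame
print(round(energy_per_macroblock_nj(125, 15), 1))  # H.264/AVC: 84.2 nJ/macroblock
```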
international symposium on circuits and systems | 2005
Sheng-Zen Wang; Ting-An Lin; Tsu-Ming Liu; Chen-Yi Lee
Motion compensation is always the main bottleneck in real-time or high quality video applications; thus, fast and efficient motion compensation is necessary. A new motion compensation design is presented to overcome the large calculation time of the complicated motion vector prediction (MVP) algorithm and high motion resolution in H.264/AVC. By applying 4×4 block based parallelism and context switch buffers in our design, it can efficiently reduce memory access and increase data reuse probability such that real-time decoding can be achieved with 1080HD (1920×1088) at 100 MHz and search range [-64, +63.75].
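The fractional upper bound of the search range [-64, +63.75] reflects the quarter-pixel motion resolution of H.264/AVC: each motion vector component is an integer count of quarter pixels. Assuming the range is expressed in pixels, it corresponds to -256..+255 quarter-pel units. The following minimal sketch splits such a component into its integer-pixel offset and fractional phase; the function names are illustrative and not from the paper.

```python
def to_pixels(mv_qpel):
    """Convert a motion vector component from quarter-pel units to pixels."""
    return mv_qpel / 4.0

def split_quarter_pel(mv_qpel):
    """Split a quarter-pel component into (integer offset, fractional phase).

    The phase is 0..3 quarter pixels. Arithmetic shift and mask give floor
    semantics, so the phase stays non-negative for negative vectors.
    """
    return mv_qpel >> 2, mv_qpel & 3

print(to_pixels(-256), to_pixels(255))  # -64.0 63.75
print(split_quarter_pel(255))           # (63, 3)  -> 63 + 3/4 pixels
print(split_quarter_pel(-1))            # (-1, 3)  -> -1 + 3/4 = -0.25 pixels
```

The integer part selects which reference pixels to fetch, while the phase selects the interpolation case, which is why high motion resolution raises both memory traffic and computation.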
international symposium on circuits and systems | 2005
Tsu-Ming Liu; Wen-Ping Lee; Ting-An Lin; Chen-Yi Lee
A memory-efficient architecture design for a de-blocking filter in H.264/AVC is presented. We use the novel column-of-pixel data arrangement to facilitate the memory access and reuse the pixel value. Further, we propose a hybrid filter scheduling to improve the system throughput. As compared with some existing approaches of realizing the de-blocking filter, the proposed design saves about one-half of the processing cycles. With the novel data arrangement and hybrid filter scheduling, an efficient architecture design is implemented. Further, it is evaluated on an H.264 system and easily achieves real-time decoding with 1080 HD (1920×1088 @ 30 fps) when the working frequency is 100 MHz.
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Tsu-Ming Liu; Wen-Ping Lee; Chen-Yi Lee
In this paper, we propose a high-throughput deblocking filter to perform the in-loop or post-loop filtering process for different standard requirements. The performance improvement is very mild if we replace a post-loop filter with an in-loop filter. To alleviate this problem, we derive an integration-oriented algorithm that can be reconfigured as the in-loop or post-loop filter. Moreover, we develop a hybrid filtering schedule to reach a lower bound of processing cycles. In particular, we reschedule the filtering order and reuse the intermediate pixels when the deblocking filter switches the filtered edges from vertical to horizontal direction. Finally, a 0.18-μm CMOS design that performs the in/post-loop filter with the hybrid filtering schedule is implemented. The synthesized gate counts are 21.1 K which is reduced to 70% of preliminary design that performs the in-loop or post-loop filter separately. Moreover, it achieves 4×10^5 macroblock/s of throughput rate at a 100-MHz clock rate.
international symposium on circuits and systems | 2005
Ting-An Lin; Sheng-Zen Wang; Tsu-Ming Liu; Chen-Yi Lee
In this paper, we propose a 4×4-block level pipelining architecture with instantaneous switching scheme and optimal decoding ordering of H.264/AVC decoder. Compared with conventional H.264/AVC video decoders, which adopt macroblock level pipelines, our proposed 4×4-block level pipelining architecture of H.264/AVC decoder achieves better hardware utilization. Moreover, our proposed decoding ordering can effectively save memory access and reduce processing cycles, which results in 260000 MB/s under 100 MHz clock frequency. By adopting these two techniques, our proposed design supports real time decoding with 1080HD (1920×1088) video sequence in 30fps (244800 MB/s required) and level 4 of baseline profile.
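The 244800 MB/s requirement follows directly from the frame geometry: a 1920×1088 frame contains 120×68 macroblocks of 16×16 pixels, and 30 such frames must be decoded each second. A minimal Python sketch of that arithmetic, with function names chosen here for illustration (they do not come from the paper):

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma pixels

def macroblocks_per_second(width, height, fps):
    """Macroblock rate needed to decode width x height video at fps."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return mbs_per_frame * fps

def cycles_per_macroblock(clock_hz, mb_per_second):
    """Average number of clock cycles per macroblock at a given throughput."""
    return clock_hz / mb_per_second

required = macroblocks_per_second(1920, 1088, 30)
print(required)                                        # 244800
print(round(cycles_per_macroblock(100e6, 260000), 1))  # 384.6
print(round(260000 / required, 3))                     # 1.062
```

At the reported 260000 MB/s the decoder spends about 385 cycles per macroblock on average, which leaves roughly 6% headroom over the 1080HD requirement.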
international symposium on vlsi design, automation and test | 2005
Ting-An Lin; Tsu-Ming Liu; Chen-Yi Lee
In this paper, memory access could be saved in inter and intra prediction by adopting the proposed memory-efficient decoding ordering. In the proposed hierarchical syntax parser, gated clock technique could be effectively applied to reduce power. Simulation shows the proposed design consumes 88mW in real time decoding 1080HD video sequence.
asian solid state circuits conference | 2005
Tsu-Ming Liu; Ting-An Lin; Sheng-Zen Wang; Wen-Ping Lee; Kang-Cheng Hou; Jiun-Yan Yang; Chen-Yi Lee
A low power H.264/AVC video decoder LSI for mobile applications is presented. Video decoding of quarter-common intermediate format (QCIF) sequence at 30 frames per second is achieved at 1.2 MHz clock frequency and requires about 865 μW at 1.8-V supply voltage. Moreover, CIF, SD and HD sequence format are also supported. The decoder architecture is based on 4×4 sub-block level pipelining that achieves better buffer allocation and decoding throughput. In addition, several modules are designed with new features to improve overall system throughput (up to 260,000 macro-block/sec). The proposed solution integrates 456-k logic gates with 161Kb of embedded SRAM in 0.18-μm single-poly six-metal CMOS process with area of 11.3 mm².
international conference on image processing | 2005
Tsu-Ming Liu; Wen-Ping Lee; Chen-Yi Lee
In this paper, we propose an area-efficient design approach to cover both in-loop and post-loop filtering processes for multiple video coding standards. In addition, we propose a hybrid filter scheduling to improve system throughput. Compared with available designs [Yu-Wen Huang et al, 2003][M. Sima et al, 2004], the proposed approach saves about one-half of processing cycles, and hence reduces power dissipation. Compared to the original loop-filter, the proposed loop/post filter only incurs 20.7% of extra cost. Simulation results show that our proposal can easily achieve real-time decoding for 1080HD when the working frequency is 100 MHz.
international symposium on circuits and systems | 2009
Yu-Fan Lai; Tsu-Ming Liu; Yao Li; Chen-Yi Lee
This paper presents a high-profile intra predictor for H.264 video decoder. To alleviate the starved bandwidth of intra compensation in high-definition video, we reuse the neighboring pixels and optimize the buffer size and access latency. In particular, a dedicated pixel buffer reuses neighboring pixels for realizing MB-adaptive frame-field (MBAFF) decoding in intra compensation. Moreover, a base-mode predictor is explored to optimize the area efficiency for reference sample filtering process (RSFP) in intra 8×8 modes. Simulation results show that the proposed data-reused intra prediction module requires 14K logic gates and 688 bits of SRAM, and operates at a 100 MHz frequency for realizing 1080HD video playback at 30 fps.
international symposium on vlsi design, automation and test | 2007
Wen-Ping Lee; Tsu-Ming Liu; Chen-Yi Lee
In this paper, we propose a new spatial error concealment (SEC) method, the error-concealed de-blocking filter (ECDF), for a real-time decoding system over an error-prone or wireless channel. There are several advantages of ECDF. The first is that ECDF can conceal I frames without the need for flexible macroblock ordering (FMO). Second, the hardware cost of interpolation can be saved. Third, the required information for error concealment (EC) only includes the pixels in the top and left neighbors of the current corrupted macroblock (MB). Hence, we can conceal the corrupted MB without requiring the information on the right and bottom sides, leading to a reduction of memory space as well as bandwidth. The implementation results show that the hardware cost of ECDF is about 30% lower than that of a direct implementation. Without the FMO, the proposal gains 1.3 dB in PSNR compared to bilinear interpolation in JM9.8.
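For readers unfamiliar with the metric, PSNR for 8-bit video is conventionally computed as 10·log10(255² / MSE), where MSE is the mean squared error between the reference and the concealed picture. The sketch below is a generic illustration of that definition, not the evaluation code used by the authors; the sample values and function names are made up for the example.

```python
import math

def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of pixel samples."""
    if len(reference) != len(decoded):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10 ** (gain_db / 10)

# Hypothetical samples: squared errors 1, 1, 4, 4 give MSE = 2.5.
print(round(psnr([100, 120, 140, 160], [101, 119, 142, 158]), 2))
print(round(mse_ratio_for_gain(1.3), 3))
```

Under this definition, a 1.3 dB gain corresponds to the mean squared error dropping by a factor of about 1.35, that is, roughly 26% lower MSE.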