Publication


Featured research published by Sung-Fang Tsai.


IEEE Transactions on Circuits and Systems for Video Technology | 2007

Fast Algorithm and Architecture Design of Low-Power Integer Motion Estimation for H.264/AVC

Tung-Chien Chen; Yu-Han Chen; Sung-Fang Tsai; Shao-Yi Chien; Liang-Gee Chen

In an H.264/AVC video encoder, integer motion estimation (IME) accounts for 74.29% of the computational complexity and 77.49% of the memory access, making it the most critical component for low-power applications. According to our analysis, an optimal low-power IME engine should be a parallel hardware architecture supporting fast algorithms and efficient data reuse (DR). In this paper, a hardware-oriented fast algorithm is proposed with intra-/inter-candidate DR considerations. In addition, based on a systolic array and 2-D adder-tree architecture, a ladder-shaped search-window data arrangement and an advanced searching flow are proposed to efficiently support inter-candidate DR and reduce latency cycles. According to the implementation results, the proposed fast algorithm saves 97% of the computational complexity, and the proposed architecture-level DR techniques save a further 77.6% of the memory bandwidth. In the ultra-low-power mode, the power consumption is 2.13 mW for real-time encoding of CIF 30-fps video at a 13.5-MHz operating frequency.
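The cost kernel that such fast IME algorithms prune is the sum of absolute differences (SAD) evaluated over a search window. As an illustration only (not the paper's algorithm), a minimal pure-Python full search over integer-pel candidates looks like this; all names and the tiny block/search sizes are chosen for readability:

```python
def sad(block, ref, bx, by, rx, ry, n=4):
    """Sum of absolute differences between the n x n current block
    at (bx, by) and the candidate block at (rx, ry) in the reference."""
    return sum(abs(block[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, bx, by, sr=2, n=4):
    """Exhaustive integer-pel search over a (2*sr+1)^2 candidate window.
    Fast IME algorithms skip most of these candidates; DR schemes keep
    the overlapping window pixels on-chip between candidate evaluations."""
    best, best_mv = float("inf"), (0, 0)
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            rx, ry = bx + dx, by + dy
            # Stay inside the reference frame.
            if 0 <= rx and 0 <= ry and rx + n <= len(ref[0]) and ry + n <= len(ref):
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv, best
```

A hardware IME engine evaluates many such candidates in parallel; the intra-/inter-candidate DR techniques in the paper exist precisely so that neighboring candidates do not each re-read the shared reference pixels from memory.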


Symposium on VLSI Circuits | 2007

2.8 to 67.2mW Low-Power and Power-Aware H.264 Encoder for Mobile Applications

Tung-Chien Chen; Yu-Han Chen; Chuan-Yung Tsai; Sung-Fang Tsai; Shao-Yi Chien; Liang-Gee Chen

A 2.8 to 67.2 mW H.264 encoder is implemented on a 12.8 mm² die in 0.18 µm CMOS technology. The proposed parallel architectures, along with fast algorithms and data reuse schemes, enable 77.9% power savings. Power awareness is provided through a flexible system hierarchy that supports content-aware algorithms and module-wise clock gating.


International Conference on Consumer Electronics | 2011

A real-time 1080p 2D-to-3D video conversion system

Sung-Fang Tsai; Chao-Chung Cheng; Chung-Te Li; Liang-Gee Chen

In this paper, we demonstrate a 2D-to-3D video conversion system capable of real-time 1920×1080p conversion. The proposed system generates 3D depth information by fusing an edge-feature-based global scene depth gradient with texture-based local depth refinement. By combining the global depth gradient and local depth refinement, the generated 3D images have comfortable and vivid quality, and the algorithm has very low computational complexity. The software runs on a system with a multi-core CPU and a GPU. To optimize performance, we use several techniques, including a unified streaming dataflow, multi-thread schedule synchronization, and GPU acceleration for depth image-based rendering (DIBR). With the proposed method, real-time 1920×1080p 2D-to-3D video conversion at 30 fps is achieved.
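The fusion idea can be sketched in a few lines. Everything below is an illustrative assumption rather than the paper's actual cues: the global cue is a plain bottom-near/top-far gradient, the local refinement is simply pixel brightness, and `alpha` is an arbitrary blend weight:

```python
def fuse_depth(luma, alpha=0.3, depth_max=255):
    """Toy fusion of a global scene depth gradient with a local,
    intensity-based refinement. Larger depth value = nearer."""
    h, w = len(luma), len(luma[0])
    depth = []
    for y in range(h):
        # Global cue: linear gradient, near at the bottom of the frame.
        g = depth_max * y / (h - 1) if h > 1 else depth_max
        row = []
        for x in range(w):
            # Local cue (assumption): brighter pixels treated as nearer.
            local = luma[y][x] / 255 * depth_max
            d = (1 - alpha) * g + alpha * local
            row.append(min(depth_max, max(0, round(d))))
        depth.append(row)
    return depth
```

The real system replaces both toy cues with edge-feature analysis and texture analysis, but the structure, a cheap global prior corrected by a cheap local term, is what keeps the computational complexity low enough for real time.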


IEEE Transactions on Circuits and Systems for Video Technology | 2009

Algorithm and Architecture Design of Power-Oriented H.264/AVC Baseline Profile Encoder for Portable Devices

Yu-Han Chen; Tung-Chien Chen; Chuan-Yung Tsai; Sung-Fang Tsai; Liang-Gee Chen

Because video services are becoming popular on portable devices, power has become the primary design issue for video coders. H.264/AVC is an emerging video coding standard that provides outstanding coding performance and is thus suitable for mobile applications. In this paper, we target a power-efficient H.264/AVC encoder. The main power consumption in an H.264/AVC encoding system is induced by the data access of motion estimation (ME). First, we propose hardware-oriented algorithms and corresponding parallel architectures for integer ME (IME) and fractional ME (FME) to reduce memory access power. Then, a parameterized encoding system and a flexible system architecture are proposed to provide power scalability and hardware efficiency, respectively. Finally, our design is implemented in TSMC 0.18 µm CMOS technology with a 12.84 mm² core area. The required hardware resources are 452.8 K logic gates and 16.95 KB of SRAM. The power consumption ranges from 43.5 to 67.2 mW for D1 (720 x 480) 30 frames/s video encoding, and more than 128 operating configurations are provided.


Signal Processing Systems | 2008

Data Reuse Exploration for Low Power Motion Estimation Architecture Design in H.264 Encoder

Yu-Han Chen; Tung-Chien Chen; Chuan-Yung Tsai; Sung-Fang Tsai; Liang-Gee Chen

Data access usually accounts for more than 50% of the power cost in a modern signal processing system, so reducing memory access power is a critical issue in low-power design. Data reuse (DR) is a technique that recycles data read from memory and can be used to reduce memory access power. In this paper, a systematic method of DR exploration for low-power architecture design is presented. First, the signal processing algorithm is formulated as a nested-loop structure, and its data locality is explored through loop analysis. Then, corresponding DR techniques are applied to reduce memory access power. The proposed design methodology is applied to the motion estimation (ME) algorithms of the H.264 video coding standard. After analyzing the ME algorithms, suitable parallel architectures and processing flows for integer ME (IME) and fractional ME (FME) are proposed to achieve efficient DR. The amount of memory access is reduced to 0.91% and 4.37% in the proposed IME and FME designs, respectively, saving substantial memory access power. The design methodology is also beneficial for other signal processing systems with low-power requirements.
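The payoff of inter-block search-window reuse can be illustrated with a simple access-count model. The block size, search range, and column-wise reuse scheme below are illustrative assumptions (a Level-C-style reuse, not the paper's exact scheme):

```python
def window_accesses(num_blocks, n=16, sr=16, reuse=True):
    """Count reference-frame pixel reads for a row of num_blocks
    n x n blocks, each searched over a (2*sr + n)-square window.
    Without reuse, every block reloads its full window; with
    inter-block reuse, consecutive windows overlap by all but n
    columns, so only the n new columns are fetched per block."""
    win_w = 2 * sr + n   # search window width in pixels
    win_h = 2 * sr + n   # search window height in pixels
    if not reuse:
        return num_blocks * win_w * win_h
    # First block loads the full window; each subsequent block shifts
    # the window right by n columns and loads only those new columns.
    return win_w * win_h + (num_blocks - 1) * n * win_h
```

For 16x16 blocks with a ±16 search range, a row of 10 blocks needs 23,040 reads without reuse but only 9,216 with column reuse, a 2.5x reduction from loop analysis alone, before any algorithmic pruning.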


IEEE Transactions on Circuits and Systems for Video Technology | 2013

Brain-Inspired Framework for Fusion of Multiple Depth Cues

Chung-Te Li; Yen-Chieh Lai; Chien Wu; Sung-Fang Tsai; Tung-Chien Chen; Shao-Yi Chien; Liang-Gee Chen

2-D-to-3-D conversion is an important step for obtaining 3-D videos, and a variety of monocular depth cues have been explored to generate 3-D videos from 2-D videos. As in a human brain, a fusion of these monocular depth cues can regenerate 3-D data from 2-D data. By mimicking how our brains generate depth perception, we propose a reliability-based fusion of multiple depth cues for automatic 2-D-to-3-D video conversion. A series of comparisons between the proposed framework and previous methods shows that significant improvement is achieved in both subjective and objective experiments. From the subjective viewpoint, the brain-inspired framework outperforms earlier conversion methods by preserving more reliable depth cues. Objectively, an enhancement of 0.70-3.14 dB in the modified peak signal-to-noise ratio and of 0.0059-0.1517 in the disparity distortion model is realized.
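A minimal sketch of reliability-weighted cue fusion follows. This is a simplified stand-in: the paper's reliability measures and fusion rule are more elaborate, and the per-cue weights here are supplied by the caller rather than estimated:

```python
def fuse_cues(cues, reliabilities):
    """Reliability-weighted per-pixel fusion of several monocular
    depth estimates.
    cues: list of equal-size 2-D depth maps (list of rows).
    reliabilities: one non-negative weight per cue."""
    total = sum(reliabilities)
    if total == 0:
        raise ValueError("at least one cue must have nonzero reliability")
    h, w = len(cues[0]), len(cues[0][0])
    # Weighted average: cues judged more reliable dominate the result.
    return [[sum(r * c[y][x] for c, r in zip(cues, reliabilities)) / total
             for x in range(w)]
            for y in range(h)]
```

With per-region (or per-pixel) reliabilities instead of scalar ones, the same structure lets a strong cue win where it is trustworthy and defer elsewhere, which is the intuition behind the brain-inspired framework.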


Picture Coding Symposium | 2009

Mapping Scalable Video Coding decoder on multi-core stream processors

Yu-Chi Su; Sung-Fang Tsai; Tzu-Der Chuang; You-Ming Tsao; Liang-Gee Chen

Scalable Video Coding (SVC) is an advanced video compression technique that supports temporal, spatial, and quality scalability for terminals with different network conditions. SVC adopts layered coding techniques to improve coding efficiency for spatial and quality scalability, with upsampling and inter-layer prediction as the two key mechanisms for removing redundant information between layers. However, upsampling occupies around 75% of the SVC decoder's memory bandwidth and causes serious performance degradation, especially at high resolutions. Moreover, the complex scheduling of inter-layer prediction makes the SVC decoder difficult to parallelize. In this paper, we propose a method to parallelize the SVC decoder on a multi-core stream processor platform with both efficiency and flexibility, focusing on the mapping issues of spatial scalability with various decoded-frame resolutions. Experimental results show that the proposed SVC decoder design reduces the memory bandwidth of the upsampling module by 95% compared with JSVM running on a single general-purpose processor.
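The bandwidth-dominating upsampling step can be sketched as a 2x dyadic interpolator. The simple four-tap averaging below is a stand-in only; the SVC standard specifies longer polyphase filters for inter-layer upsampling:

```python
def upsample2x(plane):
    """Toy 2x spatial upsampling of a luma plane (list of rows) by
    averaging the up-to-four nearest base-layer samples. Every output
    pixel reads base-layer memory, which is why upsampling dominates
    the decoder's memory bandwidth at high resolutions."""
    h, w = len(plane), len(plane[0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(2 * h):
        sy = min(y // 2, h - 1)
        sy1 = min(sy + (y % 2), h - 1)   # next row for odd output lines
        for x in range(2 * w):
            sx = min(x // 2, w - 1)
            sx1 = min(sx + (x % 2), w - 1)   # next column for odd pixels
            out[y][x] = (plane[sy][sx] + plane[sy][sx1] +
                         plane[sy1][sx] + plane[sy1][sx1]) // 4
    return out
```

Because each output row depends only on two base-layer rows, rows can be distributed across cores independently, which is the kind of mapping a stream-processor parallelization exploits.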


Asian Solid-State Circuits Conference | 2010

Tera-Scale Performance Machine Learning SoC (MLSoC) With Dual Stream Processor Architecture for Multimedia Content Analysis

Tse-Wei Chen; Chi-Sun Tang; Sung-Fang Tsai; Chen-Han Tsai; Shao-Yi Chien; Liang-Gee Chen

A new machine learning SoC (MLSoC) for multimedia content analysis is implemented with 16 mm² area in 90-nm CMOS technology. Unlike traditional VLSI architectures, it focuses on the co-acceleration of computer vision and machine learning algorithms, and two stream processors with massively parallel processing elements are integrated to achieve tera-scale performance. In the dual stream processor (DSP) architecture, data are transferred between the processors and the high-bandwidth dual memory (HBDM) through the local media bus without consuming AMBA AHB bandwidth. The image stream processor (ISP) of the MLSoC handles common window-based image processing operations, and the feature stream processor (FSP) handles machine learning algorithms of different dimensions. The power efficiency of the proposed MLSoC is 1.7 TOPS/W, and the area efficiency is 81.3 GOPS/mm².


International Conference on Consumer Electronics | 2012

3D image correction by Hilbert Huang decomposition

Chung-Te Li; Yen-Chieh Lai; Chien Wu; Sung-Fang Tsai; Liang-Gee Chen

This paper presents a novel correction for 3D image signals based on Hilbert-Huang decomposition, which is applied to separate the edges of textures from the edges of objects. The depth map is corrected by regrouping pixels according to object edges. The proposed correction outperforms conventional methods, especially at object boundaries.


Custom Integrated Circuits Conference | 2009

Tera-scale performance machine learning SoC with dual stream processor architecture for multimedia content analysis

Tse-Wei Chen; Chi-Sun Tang; Sung-Fang Tsai; Chen-Han Tsai; Shao-Yi Chien; Liang-Gee Chen

A new SoC architecture for multimedia content analysis is implemented with 16 mm² area in 90-nm CMOS technology. It focuses on the co-acceleration of computer vision and machine learning algorithms, and two stream processors with massively parallel processing elements are integrated to achieve tera-scale performance. In the dual-processor architecture, data are transferred between the processors and the high-bandwidth dual memory through the local media bus, which reduces the power consumption of AHB data access. The power efficiency of the proposed machine learning SoC is 1.7 TOPS/W, and the area efficiency is 81.3 GOPS/mm².

Collaboration


Dive into Sung-Fang Tsai's collaborations.

Top Co-Authors

Liang-Gee Chen, National Taiwan University
Chung-Te Li, National Taiwan University
Shao-Yi Chien, National Taiwan University
Tung-Chien Chen, National Taiwan University
Yu-Han Chen, National Taiwan University
Chen-Han Tsai, National Taiwan University
Chi-Sun Tang, National Taiwan University
Chuan-Yung Tsai, National Taiwan University
Pei-Kuei Tsung, National Taiwan University
Tse-Wei Chen, National Taiwan University