Publication


Featured research published by Tian-Sheuan Chang.


IEEE Transactions on Circuits and Systems for Video Technology | 2010

Algorithm and Architecture of Disparity Estimation With Mini-Census Adaptive Support Weight

Nelson Yen-Chung Chang; Tsung-Hsien Tsai; Bo-Hsiung Hsu; Yi-Chun Chen; Tian-Sheuan Chang

A high-performance real-time stereo vision system is crucial to various stereo vision applications, such as robotics, autonomous vehicles, multiview video coding, freeview TV, and 3-D video conferencing. In this paper, we proposed a high-performance, hardware-friendly disparity estimation algorithm called mini-census adaptive support weight (MCADSW) and its corresponding real-time very large scale integration (VLSI) architecture. To make the proposed MCADSW algorithm hardware-friendly, we proposed simplification techniques such as using mini-census, removing proximity weight, using YUV color representation, using Manhattan color distance, and using scaled-and-truncate weight approximation. After applying these simplifications, the MCADSW algorithm was not only hardware-friendly but also 1.63 times faster. In the corresponding real-time VLSI architecture, we proposed partial column reuse and access reduction with expanded window to significantly reduce the bandwidth requirement. The proposed architecture was implemented using United Microelectronics Corporation (UMC) 90 nm complementary metal-oxide-semiconductor technology and can achieve a disparity estimation frame rate of 42 frames/s for common intermediate format size images when clocked at 95 MHz. The synthesized gate count and memory size are 563 k and 21.3 kB, respectively.
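The simplifications named in this abstract can be illustrated with a rough Python sketch. It is not the authors' implementation: the sampling offsets in `MINI_CENSUS_OFFSETS`, the constants `STEP` and `MAX_SHIFT`, and all function names are hypothetical, chosen only to show how a reduced census pattern, a Manhattan color distance on YUV triples, and power-of-two (shift-based) weights fit together.

```python
# Hypothetical parameters; the abstract does not give the actual values.
MINI_CENSUS_OFFSETS = [(-1, 0), (1, 0), (0, -2), (0, -1), (0, 1), (0, 2)]
STEP = 16       # assumed color-distance step per halving of the weight
MAX_SHIFT = 6   # assumed truncation point for the weight


def census_bits(img, y, x, offsets=MINI_CENSUS_OFFSETS):
    """Compare a few neighbors with the center pixel (a reduced census)."""
    center = img[y][x]
    return tuple(1 if img[y + dy][x + dx] < center else 0 for dy, dx in offsets)


def hamming(a, b):
    """Matching cost between two census bit vectors."""
    return sum(p != q for p, q in zip(a, b))


def manhattan_color_distance(p, q):
    """Sum of absolute differences between two (Y, U, V) triples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def shift_weight(distance):
    """Weight 2**-shift, expressed as a right-shift amount and truncated."""
    return min(distance // STEP, MAX_SHIFT)


def aggregate_cost(costs, shifts):
    """Weighted average of costs using only shifts, adds, and one divide."""
    scale = 1 << MAX_SHIFT
    num = sum(c * (scale >> s) for c, s in zip(costs, shifts))
    den = sum(scale >> s for s in shifts)
    return num / den
```

Representing each weight as a shift amount means the multiply in the weighted sum becomes a bit shift, which is the usual motivation for this kind of approximation in hardware.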


IEEE Transactions on Circuits and Systems for Video Technology | 2007

A Fast Algorithm and Its VLSI Architecture for Fractional Motion Estimation for H.264/MPEG-4 AVC Video Coding

Yu-Jen Wang; Chao-Chung Cheng; Tian-Sheuan Chang

This paper presents a fast algorithm and its VLSI architecture for H.264 fractional motion estimation. Motivated by the high correlation of cost between neighboring fractional-pel positions, the proposed algorithm efficiently explores the neighborhood around the minimum-cost position and skips other unlikely ones. The proposed search pattern and early termination under a constant quantization parameter reduce computational complexity by about 50% compared with the reference software, with only 0.1-0.2 dB peak signal-to-noise ratio degradation and a bit rate increase of less than 2%. The VLSI architecture of the proposed algorithm saves 40% of area cost because it needs only half of the processing elements, and saves 14% of searching time when compared with the previous design.


IEEE Transactions on Circuits and Systems | 2008

A Hardware-Efficient H.264/AVC Motion-Estimation Design for High-Definition Video

Yu-Kun Lin; Chia-Chun Lin; Tzu-Yun Kuo; Tian-Sheuan Chang

Motion estimation (ME) in high-definition H.264 video coding presents a significant design challenge for memory bandwidth, latency, and cost because of its large search range and various modes. To conquer this problem, this paper presents a low-latency and hardware-efficient ME design with three design techniques. The first technique on integer-pel ME (IME) adopts parallel instead of serial multiresolution search so that we can process 1080p @ 60 fps videos with ±128 search range within just 256 cycles, 5.95-KB buffers, and 213.7 K gates. The second technique on fractional-pel ME (FME) uses a single-iteration six-point search to reduce the cycle count by half with similar gate count and negligible quality loss. The third technique applies a mode-filtering approach to further reduce the bandwidth and cycles and share the buffer of IME and FME. The final ME implementation with 0.13-μm process can support processing of 1080p @ 60 fps with just 128.8 MHz, 282.6 K gates, and 8.54-KB buffer, which saves 60% gate count and 68.9% SRAM buffers when compared with the previous design.
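As a rough plausibility check on the figures in this abstract, the per-macroblock cycle budget can be estimated from the clock rate, frame size, and frame rate. The sketch below is an illustration rather than part of the paper: it assumes 16 × 16 macroblocks and a frame height padded up to a multiple of 16, and the function names are hypothetical.

```python
import math


def macroblocks_per_frame(width, height, mb_size=16):
    """Number of macroblocks, padding each dimension up to a multiple of mb_size."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def cycles_per_macroblock(clock_hz, width, height, fps):
    """Average clock cycles available to process one macroblock."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


if __name__ == "__main__":
    budget = cycles_per_macroblock(128.8e6, 1920, 1080, 60)
    print(f"{macroblocks_per_frame(1920, 1080)} macroblocks/frame, "
          f"about {budget:.1f} cycles/macroblock")
```

Under these assumptions the budget comes out slightly above the 256 cycles quoted for the IME stage, which is consistent with a macroblock pipeline running at 128.8 MHz.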


IEEE Transactions on Image Processing | 2013

Fast SIFT Design for Real-Time Visual Feature Extraction

Liang-chi Chiu; Tian-Sheuan Chang; Jiun-Yen Chen; Yen-Chung Chang

Visual feature extraction with scale invariant feature transform (SIFT) is widely used for object recognition. However, its real-time implementation suffers from long latency, heavy computation, and high memory storage because of its frame level computation with iterated Gaussian blur operations. Thus, this paper proposes a layer parallel SIFT (LPSIFT) with integral image, and its parallel hardware design with an on-the-fly feature extraction flow for real-time application needs. Compared with the original SIFT algorithm, the proposed approach reduces the computational amount by 90% and memory usage by 95%. The final implementation uses 580-K gate count with 90-nm CMOS technology, and offers 6000 feature points/frame for VGA images at 30 frames/s and ~ 2000 feature points/frame for 1920 × 1080 images at 30 frames/s at the clock rate of 100 MHz.
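The abstract mentions an integral image without saying how LPSIFT uses it. The general technique is sketched below in plain Python as an illustration only: once the integral image is built, the sum over any rectangular window costs four lookups regardless of window size, which is why it is a common substitute for repeated blur passes. The function names are hypothetical.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows [top, bottom) and columns [left, right)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Dividing `box_sum` by the window area gives a box-filter (mean) response at constant cost per pixel.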


IEEE Transactions on Circuits and Systems for Video Technology | 2006

A High-Definition H.264/AVC Intra-Frame Codec IP for Digital Video and Still Camera Applications

Chun-Wei Ku; Chao-Chung Cheng; Guo-Shiuan Yu; Min-Chi Tsai; Tian-Sheuan Chang

This paper presents a real-time high-definition 720p@30fps H.264/MPEG-4 AVC intra-frame codec IP suitable for digital video and digital still camera applications. The whole design is optimized at both the algorithm and architecture levels. At the algorithm level, we propose to remove the area-costly plane mode and enhance the cost function to reduce hardware cost and increase the processing speed while providing nearly the same quality. In the architecture design, in addition to the fast module implementation, the process is arranged in a macroblock-level pipelining style together with three careful scheduling techniques to avoid idle cycles and improve the data throughput. The whole codec design needs only 103 K gate count for a core size of 1.28 × 1.28 mm² and achieves real-time encoding and decoding at 117 and 25.5 MHz, respectively, when implemented in 0.18-μm CMOS technology.


IEEE Transactions on Circuits and Systems for Video Technology | 2009

A 140-MHz 94 K Gates HD1080p 30-Frames/s Intra-Only Profile H.264 Encoder

Yu-Kun Lin; Chun-Wei Ku; De-Wei Li; Tian-Sheuan Chang

This paper presents an HD1080p 30-frames/s H.264 intra encoder operated at 140 MHz with just 94 K gate count and 0.72-mm² core area for digital video recorder or digital still camera applications. To achieve high throughput and low area cost for high-definition video, we apply the modified three-step fast intra prediction technique to reduce the cycle count while keeping the quality close to that of full search. Then, in architecture scheduling, we further adopt variable pixel parallelism instead of constant four-pixel parallelism to speed up the critical intra prediction part while keeping other parts unchanged for low area cost. The achieved design needs only half of the working frequency and reduces the gate count cost by 23.5% compared with the previous design with the same HD720p 30-frames/s requirement. Besides, our design at 140 MHz can support HD1080p 30 frames/s for digital video encoder or 4096 × 2304 images with 6.78 frames/s for digital still camera application.


IEEE Transactions on Circuits and Systems II: Express Briefs | 2006

An in-place architecture for the deblocking filter in H.264/AVC

Chao-Chung Cheng; Tian-Sheuan Chang; Kun-Bin Lee

This brief presents an in-place computing design for the deblocking filter used in the H.264/AVC video coding standard. The proposed in-place computing flow reuses intermediate data as soon as the data is available. Thus, the intermediate data storage is reduced to only four 4 × 4 blocks instead of the whole 16 × 16 macroblock. The resulting design can achieve 100 MHz with only 13.41K gate count and support real-time deblocking operation of 2K × 1K@30 Hz video application when clocked at 73.73 MHz by using 0.25-μm CMOS technology.


International Solid-State Circuits Conference | 2008

A 242mW 10mm² 1080p H.264/AVC High-Profile Encoder Chip

Yu-Kun Lin; De-Wei Li; Chia-Chun Lin; Tzu-Yun Kuo; Sian-Jin Wu; Wei-Cheng Tai; Wei-Cheng Chang; Tian-Sheuan Chang

High-profile H.264 has been adopted as the major coding standard in popular high-definition video due to its excellent coding efficiency. Several implementations have been developed (Y. W. Huang, et al., 2005), (H.C. Chang, et al., 2007), (T.C. Chen, et al., 2007), but their performance is limited to baseline 720p (Y. W. Huang, et al., 2005), (H.C. Chang, et al., 2007) or SDTV (T.C. Chen, et al., 2007). The mainstream 1080p high-profile application presents a series of new design challenges in throughput, cost, and power because of at least a 4x higher complexity than in the 720p baseline. Thus, a 0.13μm 1080p high-profile H.264 video encoder is presented with a 10mm² core and 242 mW power. Compared to a state-of-the-art 720p baseline design (H.C. Chang, et al., 2007), this design achieves a 46.7% and 54% reduction in area and power, respectively. These savings come from parallelism-enhanced throughput and a cross-stage sharing pipeline.


Design Automation Conference | 2008

A 242mW, 10mm² 1080p H.264/AVC high profile encoder chip

Yu-Kun Lin; De-Wei Li; Chia-Chun Lin; Tzu-Yun Kuo; Sian-Jin Wu; Wei-Cheng Tai; Wei-Cheng Chang; Tian-Sheuan Chang

A 1080p high profile H.264 encoder is designed by the robust reusable silicon IP methodology and fabricated in a 0.13 μm CMOS technology with an area of 10 mm² and 242 mW at 145 MHz. Compared to the state-of-the-art design targeted at 720p baseline, this design reduces 53.4% power and 46.7% area through parallelism enhanced throughput and cross stage sharing pipeline.


International Symposium on Circuits and Systems | 2006

A zero-skipping multi-symbol CAVLC decoder for MPEG-4 AVC/H.264

Guo-Shiuan Yu; Tian-Sheuan Chang

This paper presents a high-performance CAVLC decoding VLSI architecture for MPEG-4 AVC/H.264. Instead of just skipping zero blocks, the proposed design exploits the features of the CAVLC decoding process to efficiently skip processes when nothing needs to be decoded, and can decode multiple symbols in the sign and run-before stages. The proposed design needs an average of only 90 cycles to decode one MB, which can meet the real-time HDTV requirement and saves 64% of the cycle count on average when compared with the previous design. The hardware cost is about 13192 gates when synthesized at 125 MHz.

Collaboration


Dive into Tian-Sheuan Chang's collaboration.

Top Co-Authors

Nelson Yen-Chung Chang

National Chiao Tung University

Yu-Kun Lin

National Chiao Tung University

Gwo-Long Li

National Chiao Tung University

Yu-Cheng Tseng

National Chiao Tung University

Cheng-Wen Wei

National Chiao Tung University

Chia-Chun Lin

National Chiao Tung University

Shyh-Jye Jou

National Chiao Tung University

Chao-Chung Cheng

National Chiao Tung University

Tzu-Yun Kuo

National Chiao Tung University
