Nelson Yen-Chung Chang

Publication

Featured researches published by Nelson Yen-Chung Chang.

IEEE Transactions on Circuits and Systems for Video Technology | 2010

Algorithm and Architecture of Disparity Estimation With Mini-Census Adaptive Support Weight

Nelson Yen-Chung Chang; Tsung-Hsien Tsai; Bo-Hsiung Hsu; Yi-Chun Chen; Tian-Sheuan Chang

A high-performance real-time stereo vision system is crucial to various stereo vision applications, such as robotics, autonomous vehicles, multiview video coding, freeview TV, and 3-D video conferencing. In this paper, we propose a high-performance hardware-friendly disparity estimation algorithm called mini-census adaptive support weight (MCADSW), together with its corresponding real-time very large scale integration (VLSI) architecture. To make the MCADSW algorithm hardware-friendly, we propose simplification techniques such as using mini-census, removing the proximity weight, using YUV color representation, using Manhattan color distance, and using scale-and-truncate weight approximation. After applying these simplifications, the MCADSW algorithm is not only hardware-friendly but also 1.63 times faster. In the corresponding real-time VLSI architecture, we propose partial column reuse and access reduction with expanded window to significantly reduce the bandwidth requirement. The proposed architecture was implemented using United Microelectronics Corporation (UMC) 90 nm complementary metal-oxide-semiconductor technology and achieves a disparity estimation frame rate of 42 frames/s for common intermediate format size images when clocked at 95 MHz. The synthesized gate count and memory size are 563 k and 21.3 kB, respectively.
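As a rough illustration of two of the ingredients named in this abstract, the sketch below computes a small census-style bit vector, compares two such vectors by Hamming distance, and evaluates a Manhattan color distance. It is not the authors' MCADSW implementation: the six-point sampling pattern `OFFSETS`, the border clamping, and all function names are assumptions made for illustration, since the abstract does not give the actual mini-census pattern.

```python
# Hypothetical 6-point sampling pattern as (row, col) offsets from the centre.
# The real mini-census pattern is NOT given in the abstract; this is assumed.
OFFSETS = [(-2, 0), (-1, 0), (0, -2), (0, 2), (1, 0), (2, 0)]


def census_bits(img, r, c, offsets=OFFSETS):
    """Pack one bit per sampled neighbour: 1 if it is darker than the centre."""
    h, w = len(img), len(img[0])
    centre = img[r][c]
    bits = 0
    for dr, dc in offsets:
        # Clamp coordinates at the image border (an assumed border policy).
        rr = min(max(r + dr, 0), h - 1)
        cc = min(max(c + dc, 0), w - 1)
        bits = (bits << 1) | (1 if img[rr][cc] < centre else 0)
    return bits


def hamming(a, b):
    """Matching cost between two census bit vectors."""
    return bin(a ^ b).count("1")


def manhattan_color_distance(p, q):
    """Sum of absolute channel differences, e.g. between two YUV pixels."""
    return sum(abs(x - y) for x, y in zip(p, q))


if __name__ == "__main__":
    left = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    print(census_bits(left, 1, 1))
    print(manhattan_color_distance((10, 20, 30), (12, 18, 33)))
```

A short census vector needs only a handful of comparisons per pixel, and the Manhattan distance avoids the multiplications and square root of a Euclidean distance, which is consistent with the hardware-friendly motivation stated in the abstract.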

International Conference on Multimedia and Expo | 2007

Real-Time DSP Implementation on Local Stereo Matching

Nelson Yen-Chung Chang; Ting-Min Lin; Tsung-Hsien Tsai; Yu-Cheng Tseng; Tian-Sheuan Chang

Real-time DSP stereo matching solutions have been important to various applications relying on stereo vision. We propose a 4×5 jigsaw matching template and a dual-block parallel processing technique to enhance the performance of VLIW DSP stereo matchers. The 4×5 jigsaw template improves the matching quality by 1% compared with a regular 4×5 block template while consuming the same amount of memory access bandwidth. Along with the benefit of the jigsaw template, the dual-block parallel processing technique, which doubles the throughput, can be implemented on the DSP. Together with instruction scheduling and operation pipelining, our DSP stereo matcher achieves 50 FPS with 16 disparity levels for a 384×288 stereo image pair. Both quantitative and qualitative stereo matching results are provided at the end of this work.

International Conference on Multimedia and Expo | 2007

Low Memory Cost Block-Based Belief Propagation for Stereo Correspondence

Yu-Cheng Tseng; Nelson Yen-Chung Chang; Tian-Sheuan Chang

Typical belief propagation has good accuracy for stereo correspondence but suffers from a large run-time memory cost. In this paper, we propose a block-based belief propagation algorithm for stereo correspondence that partitions an image into regular blocks for optimization. With independently partitioned blocks of size 32×32, the required memory size is reduced by 99% with slightly degraded performance compared with the original algorithm. Such blocks are also suitable for parallel hardware implementation. Experimental results using the Middlebury stereo test bed demonstrate the performance of the proposed method.

IEEE Transactions on Circuits and Systems for Video Technology | 2008

Architecture Design of Shape-Adaptive Discrete Cosine Transform and Its Inverse for MPEG-4 Video Coding

Hui-Cheng Hsu; Kun-Bin Lee; Nelson Yen-Chung Chang; Tian-Sheuan Chang

This paper presents efficient VLSI architectures of the shape-adaptive discrete cosine transform (SA-DCT) and its inverse transform (SA-IDCT) for MPEG-4. Two of the challenges encountered during the exploitation of more efficient architectures for the SA-DCT and SA-IDCT are addressed. One challenge is to handle the architectural irregularity due to the shape-adaptive nature. The other is to provide acceptable throughput using minimal hardware. In the algorithm-level optimization, this work exploits the numerical properties found in the transform matrices of various lengths, and derives a fine-grained zero-skipping scheme for the IDCT which can perform 22.6% more zero-skipping than the common vector-based coarse-grained zero-skipping scheme does. In the architecture-level design, 1-D variable-length DCT/IDCT architectures designed on the basis of the numerical properties are proposed. An auto-aligned transpose memory that aligns the data of different lengths is also incorporated. In addition, a zero-index table is included in the transpose memory to support the fine-grained zero-skipping in the SA-IDCT. The synthesized designs of the SA-DCT and SA-IDCT are implemented using UMC 0.18-μm technology. The SA-DCT architecture has 26 635 gates, and its average cycle-throughput is 0.66 pixels/cycle, which is comparable to other proposed architectures. On the other hand, the SA-IDCT architecture has 29 960 gates, and its cycle-throughput is 6.42 pixels/cycle. While decoding for CIF@30FPS, the SA-IDCT is clocked at 0.7 MHz, and the power consumption is 0.14 mW. Both the throughput and power consumption of the proposed SA-IDCT architecture are an order of magnitude better than those of the existing SA-IDCT architectures.

International Conference on Consumer Electronics | 2003

Optimal frame memory and data transfer scheme for MPEG-4 shape coding

Kun-Bin Lee; Hao-Yun Chin; Nelson Yen-Chung Chang; Hui-Cheng Hsu; Chein-Wei Jen

An optimal frame memory and data transfer scheme is proposed for MPEG-4 shape coding in embedded systems. The proposed alpha frame buffer scheme contains two approaches. First, a distributed tile-based memory organization is used to efficiently support the time-varying size of the alpha plane. Second, a compression scheme is used to reduce both the number of accesses to the alpha frame memory and its size. Under the criteria of the MPEG-4 standard, the size of the alpha frame memory can be reduced to 50% by introducing a small index table (2.73%-5.08% of the original frame memory size). A coarse assessment shows that the number of memory references can be reduced to 56.25%. In addition, the proposed data transfer scheme combines run-length coding and addressing mode to reduce the average data transfer time to 9.39%. Therefore, the shared system bus can be kept as free as possible, which in turn leaves more room for improving system performance. Furthermore, this data transfer scheme also helps accelerate the processing of shape coding.
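The abstract mentions run-length coding as one component of the data transfer scheme. As a minimal sketch of that general idea only, the code below run-length encodes one row of a binary alpha plane and decodes it again. How the paper combines run-length coding with the addressing mode is not described here, so the function names and the (value, run length) representation are assumptions for illustration.

```python
def rle_encode(row):
    """Encode a row of alpha samples as a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original row."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


if __name__ == "__main__":
    # A binary alpha row: 0 = transparent, 255 = opaque.
    alpha_row = [0, 0, 0, 255, 255, 0]
    encoded = rle_encode(alpha_row)
    print(encoded)
    print(rle_decode(encoded) == alpha_row)
```

Binary alpha planes tend to contain long uniform runs, so a row is often described by far fewer pairs than samples, which is the intuition behind sending runs over the bus instead of raw pixels.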

IET Computers and Digital Techniques | 2009

Analysis of shared-link AXI

Nelson Yen-Chung Chang; Ying-Ze Liao; Tian-Sheuan Chang

Shared-link AXI provides decent communication performance and requires half the cost of its crossbar counterpart. The authors analysed the performance impact of the factors in a shared-link AXI system. The factors include interface buffer size, arbitration combination and task access setting (transfer mode mapping). A hybrid data locked transfer mode was also proposed to mitigate the performance loss caused by AXI's extra transition cycle. The analysis is carried out by simulating a multi-core platform with a shared-link AXI backbone running a video phone application. The performance is evaluated in terms of bandwidth utilisation, average transaction latency and system task completion time. The analysis showed that channel-independent arbitration could contribute up to 23.2% of bandwidth utilisation and completion time difference. Moreover, the analysis suggests that the proposed hybrid data locked mode should be used only by long access latency devices. Such a setting resulted in up to 21.1% completion time reduction compared with the setting without the hybrid data locked mode. The design options in the shared-link AXI bus are also discussed.

Asia Pacific Conference on Circuits and Systems | 2008

Analysis of color space and similarity measure impact on stereo block matching

Nelson Yen-Chung Chang; Yu-Cheng Tseng; Tian-Sheuan Chang

The impact of color space and similarity measure on the complexity, speed, and performance of stereo matching is especially important to applications adopting stereo vision. This work analyzes the complexity of several of the most commonly considered color spaces and similarity measures. In addition, the execution speed and performance of color space and similarity measure combinations are compared on the same basis. The comparison suggests that the Y-only rank combination provides the best trade-off between speed and performance.
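To make "Y-only rank" concrete, the sketch below converts RGB to luma and applies a textbook rank transform, where each pixel is replaced by the number of window neighbours darker than it. This is an assumed reading of the term, not the paper's exact formulation: the BT.601 luma weights, the 3×3 default window, and the truncated window at image borders are illustrative choices.

```python
def luma(r, g, b):
    """Y component from RGB using the ITU-R BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def rank_transform(y, radius=1):
    """Replace each pixel by the count of window neighbours darker than it.

    The window is (2*radius + 1) square and is truncated at the image border
    (an assumed border policy).
    """
    h, w = len(y), len(y[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            count = 0
            for rr in range(max(r - radius, 0), min(r + radius, h - 1) + 1):
                for cc in range(max(c - radius, 0), min(c + radius, w - 1) + 1):
                    if y[rr][cc] < y[r][c]:
                        count += 1
            out[r][c] = count
    return out


if __name__ == "__main__":
    y_plane = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in rank_transform(y_plane):
        print(row)
```

Matching on a single luma channel processes one value per pixel instead of three, and the rank values depend only on intensity ordering, which makes them insensitive to gain and offset differences between the two cameras.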

IEEE Transactions on Circuits and Systems for Video Technology | 2006

Combined Frame Memory Motion Compensation for Video Coding

Nelson Yen-Chung Chang; Tian-Sheuan Chang

The frame memory has long been the dominant component in a video decoder in terms of energy, area, and latency. We propose a combined frame memory motion compensation (CFMMC) for video decoding which exploits the characteristic of the perfect-matched macroblock (MB) to avoid unnecessary memory access and to save energy. The statistical results confirm that some sequences have more than 70% of MBs being perfect-matched MBs. The CFMMC hardware architecture is further evaluated for latency, area, and energy. The hardware architecture shows that with SRAM-based frame memory, the equivalent gate count can be reduced by 37.7%, and the energy consumption and the latency may also be improved for sequences with a sufficient percentage of perfect-matched MBs. Since the benefit of the CFMMC is highly dependent on the percentage of perfect-matched MBs, it is best suited for applications with a large portion of static background, such as video surveillance, video telephony, and video conferencing.

International Symposium on Circuits and Systems | 2008

Data reuse analysis of local stereo matching

Tsung-Hsien Tsai; Nelson Yen-Chung Chang; Tian-Sheuan Chang

External memory bandwidth and internal memory size have been major bottlenecks in designing VLSI architectures for real-time stereo matching hardware because of the large amount of pixel data and the large disparity range. To address these bottlenecks, this work explores the impact of data reuse on disparity-order and pixel-order along with the partial column reuse (PCR) and vertically expanded row reuse (VERR) techniques we proposed. The analysis suggests that disparity-order reuse with both PCR and VERR techniques is suitable for low memory cost and low external bandwidth design, whereas pixel-order reuse with both techniques is more suitable for low computation resource requirements.

International Symposium on Circuits and Systems | 2004

Trace-path analysis and performance estimation for multimedia application in embedded system

Nelson Yen-Chung Chang; Kun-Bin Lee; Chein-Wei Jen

High-level performance estimation can help assess the performance in the early development stage of embedded systems with real-time constraints efficiently; however, the estimation accuracy has long been an issue. In this paper, a performance estimation method called trace-path analysis based on high-level execution path tracing analysis, which takes the effect of multiple execution paths into consideration, is proposed. Multiple execution path resolutions and a linear model for non-deterministic node cost are incorporated to yield better estimation accuracy. The estimation of error is also covered in this context so that the degree of estimation accuracy can be known. Experiments taking MPEG-4 shape coding as an example show that the proposed approach can achieve an average of 1.88% estimation error per QCIF frame, which is better than the 12.38% of the bitstream analysis approach.
