Jui-Chin Chu

National Chung Cheng University

Publication

Featured research published by Jui-Chin Chu.

International Conference on Acoustics, Speech, and Signal Processing | 2008

Joint algorithm/code-level optimization of H.264 video decoder for mobile multimedia applications

Ting-Yu Huang; Guo-An Jian; Jui-Chin Chu; Ching-Lung Su; Jiun-In Guo

In this paper, we propose a joint algorithm/code-level optimization scheme that makes real-time H.264/AVC video decoding in software feasible on ARM-based platforms for mobile multimedia applications. At the algorithm level, we propose techniques such as a fast interpolation scheme, a zero-skipping technique for texture decoding, a fast boundary strength decision for in-loop filtering, and a pattern matching algorithm for CAVLD. At the code level, we propose design techniques for minimizing memory accesses and branches. The experimental results show that we have reduced the complexity of the H.264 video decoder by up to 93% compared to the reference software JM9.7. The optimized H.264 video decoder achieves QCIF@30Hz video decoding on an ARM9 processor operating at a 120MHz clock.
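The abstract does not spell out how zero skipping is applied, so the following is only a minimal sketch of the general idea, assuming it means bypassing the inverse transform for residual blocks whose coefficients are all zero. The function names (`inverse_transform_4x4`, `reconstruct_block`) and the `use_zero_skip` flag are hypothetical, and the input is assumed to be already dequantized.

```python
def inverse_transform_4x4(coeffs):
    """H.264-style 4x4 inverse core transform on dequantized coefficients."""
    def butterfly(d0, d1, d2, d3):
        e0, e1 = d0 + d2, d0 - d2
        e2, e3 = (d1 >> 1) - d3, d1 + (d3 >> 1)
        return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

    rows = [butterfly(*row) for row in coeffs]        # horizontal pass
    cols = [butterfly(*col) for col in zip(*rows)]    # vertical pass, cols[j] is column j
    # Transpose back to row-major order and apply the final rounding shift.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def clip8(value):
    """Clamp a sample to the 8-bit range."""
    return max(0, min(255, value))


def reconstruct_block(pred, coeffs, use_zero_skip=True):
    """Add the decoded residual to the prediction (hypothetical helper).

    Assumes pred already holds valid 8-bit samples, so an all-zero
    coefficient block leaves the prediction unchanged.
    """
    if use_zero_skip and not any(c for row in coeffs for c in row):
        # Zero skipping: no transform, no addition, no clipping.
        return [row[:] for row in pred]
    residual = inverse_transform_4x4(coeffs)
    return [[clip8(p + r) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```

Because an all-zero block produces an all-zero residual, the skipped path returns the same samples as the full path while avoiding the transform work, which is why the shortcut is safe under the stated assumption.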

International Symposium on Circuits and Systems | 2009

A system architecture exploration on the configurable HW/SW co-design for H.264 video decoder

Guo-An Jian; Jui-Chin Chu; Ting-Yu Huang; Tao-Cheng Chang; Jiun-In Guo

In this paper we focus on a design methodology for an H.264 video decoder that is more flexible than an ASIC solution and more efficient than a processor-based solution. We explore the memory access bandwidth requirements and different software/hardware partitions in order to propose a configurable architecture that adopts a DEM (Data Exchange Mechanism) controller to find the best tradeoff between performance and cost when realizing an H.264 video decoder for different applications. The proposed architecture achieves more than a threefold acceleration in performance.

IEEE Transactions on Circuits and Systems for Video Technology | 2009

VisoMT: A Collaborative Multithreading Multicore Processor for Multimedia Applications With a Fast Data Switching Mechanism

Wei-Chun Ku; Shu-Hsuan Chou; Jui-Chin Chu; Chi-Lin Liu; Tien-Fu Chen; Jiun-In Guo; Jinn-Shyan Wang

Multithreading and multicore processing are powerful ways to take advantage of parallelism in applications in order to boost a system's performance. However, exploring sufficient parallelism and achieving data locality with low communication overhead are still important research issues in embedded multithreading/multicore design. This paper introduces the design of a fast data switching mechanism between multilevel storage structures in a new multicore architecture. This paper makes several contributions to the development of contemporary sophisticated multimedia applications with advanced standards such as H.264. The first contribution, collaborative multithreading, tightly unifies a reduced instruction set computer and collaborative multithreading digital signal processing (DSP) in order to exploit high parallelism and provide sufficient computing power to applications. Each collaborative thread of our DSP is constructed from a heterogeneous simultaneous multithreading single instruction, multiple data structure and four media processing cores, which are connected by a fast switch that provides a fast data exchange mechanism among correlated streams on a thread-level basis. Our second contribution is one-stop streaming processing, which aims to keep data in the system until it is no longer needed, thus making data access more efficient. Our third contribution is a chunk threading programming model, including a thread management library and threading communication directives for reducing data communication and synchronization overhead. By combining coarse-grained and fine-grained threading, programmers can choose among threading levels based on the amount of data exchange in a program.
With our proposed techniques and an appropriate programming model, we can reduce processing time by 54.9% in H.264 video encoding (common intermediate format video at 16.574 f/s) with the 1-virtual independent and streaming processing by open collaborative multithreading configuration, compared to the Texas Instruments C62 core, which has 8 functional units. We realize our design as a prototype chip fabricated in the Taiwan Semiconductor Manufacturing Company Ltd. 0.13 μm process. The die size of the processor core is 16.12 mm², including 414 k logic transistors and 34.4 kB of on-chip static random access memory. The processor runs at 180 MHz/1.2 V and consumes 245 mW according to post-simulation results.

International Conference on Information Technology: New Generations | 2009

Optimization of VC-1/H.264/AVS Video Decoders on Embedded Processors

Guo-An Jian; Ting-Yu Huang; Jui-Chin Chu; Jiun-In Guo

In this paper we propose optimization techniques for real-time decoding of new-generation video formats such as VC-1, H.264, and AVS on embedded processors. We optimize the VC-1/H.264/AVS video decoders from a variety of viewpoints, including algorithmic complexity reduction, memory access minimization, branch minimization, and zero skipping. The proposed techniques reduce complexity by about 80% to 90% compared to the original reference code. The proposed low-complexity video decoders achieve about 12 to 14 fps at CIF resolution and 47 to 50 fps at QCIF resolution when running on an ARM9 processor at 200 MHz.

ACM Transactions on Design Automation of Electronic Systems | 2009

A 252Kgates/4.9Kbytes SRAM/71mW multistandard video decoder for high definition video applications

Chih-Da Chien; Cheng-An Chien; Jui-Chin Chu; Jiun-In Guo; Ching-Hwa Cheng

This article proposes a low-cost, low-power multistandard video decoder for high definition (HD) video applications. The proposed design supports multiple-standard (JPEG baseline, MPEG-1/2/4 Simple Profile (SP), and H.264 Baseline Profile (BP)) video decoding through interactive parsing control and a common parameter bus interface. To reduce hardware cost, a shared adder-based structure and reusable data management are proposed to achieve hardware sharing and reduce internal memory size, respectively. In addition, the proposed design reduces memory bandwidth by increasing both the amount of data reuse and the burst length of memory accesses, and it eliminates cycle overhead in data access, in order to support HD video decoding with a single AHB-based SDR memory. The proposed 252Kgates/4.9kB/71mW/0.13μm multistandard video decoder reduces gate count by 72% and power consumption by 87% compared to the state-of-the-art design when operating at 120MHz for real-time HD1080 video decoding with a single AHB-based SDR memory.

International SoC Design Conference | 2008

A multi-mode entropy decoder with a generic table partition strategy

Jui-Chin Chu; Liang-Fei Su; Yao-Chang Yang; Jiun-In Guo; Ching-Lung Su

In this paper, a low-cost and low-power multi-mode entropy decoder is proposed. The proposed design is compatible with entropy decoding for the JPEG, MPEG-1/2/4, H.264, and VC-1 video coding standards. It adopts code-word table merging and sharing, and integrates the various entropy decoding schemes into a single programmable design. To reduce the required memory space, a generic look-up table partition strategy covering various video coding standards is proposed. In addition, the low-power concept of a high-probability data path with lower capacitance is taken into account. The proposed multi-mode entropy decoder is implemented in TSMC 0.13 μm technology at a cost of 113,884 gates and 0.54 KB of SRAM. Its maximum operating frequency reaches 166 MHz, which can support entropy decoding of high definition video beyond HD1080@48 fps.

Design Automation Conference | 2007

An embedded coherent-multithreading multimedia processor and its programming model

Jui-Chin Chu; Wei-Chun Ku; Shu-Hsuan Chou; Tien-Fu Chen; Jiun-In Guo

Multithreading and multi-core processing have been shown to be powerful approaches for boosting system performance by taking advantage of parallelism in applications. This paper presents a processor design that unifies RISC and multithreading DSP for sophisticated multimedia applications with advanced standards such as H.264. The proposed design not only minimizes integration costs for embedded multithreading/multi-core design through independent coherent threads, but also reduces memory bandwidth requirements through a one-stop streaming buffer and a very fast data exchange mechanism. With the proposed techniques and an appropriate programming model, we achieve a 78% reduction in memory bandwidth and an 89% reduction in processing time in H.264 video encoding, compared to a traditional single-stream microprocessor.

Intelligent Information Hiding and Multimedia Signal Processing | 2009

Optimization of AVS-M Video Decoder for Real-time Implementation on Embedded RISC Processors

Guo-An Jian; Jui-Chin Chu; Jiun-In Guo

In this paper, we propose an AVS-M video decoder that can perform real-time decoding of AVS-M video on embedded RISC processors. Our optimization schemes cover algorithmic improvement, zero skipping, early termination, and data reuse. The proposed techniques reduce complexity by about 90% to 93% compared to the original reference code. The proposed low-complexity AVS-M video decoder achieves about 30 to 50 fps at QVGA resolution when running on an ARM920T processor at 384 MHz.

International Symposium on Circuits and Systems | 2006

Design of customized functional units for the VLIW-based multi-threading processor core targeted at multimedia applications

Jui-Chin Chu; Chih-Wen Huang; He-Chun Chen; Keng-Po Lu; Ming-Shuan Lee; Jiun-In Guo; Tien-Fu Chen

In this paper, we propose customized functional units (CFUs) for the UniCore, a VLIW-based multi-threading processor core. The CFUs work as hardware accelerators and play a key role in increasing performance for multimedia applications. Compared to the ARM9TDMI, the number of execution cycles is reduced by a factor of 21.7 when running the operations in MPEG video coding. In addition, the proposed design achieves about twice the data throughput of other SIMD-based architectures on average.

Asia Pacific Conference on Circuits and Systems | 2006

Predictive Mode Searching Policy for H.264/AVC Intra Prediction

Ming-Shuan Lee; Jui-Chin Chu; Jiun-In Guo

In this paper, a predictive mode searching policy for H.264/AVC intra prediction is presented. The algorithm reduces the computational complexity of the Luma 4×4-block mode decision by skipping the less likely candidates, and reduces the computation of the intra prediction of Chroma 8×8 and Luma 16×16 blocks by using an early termination mechanism with an empirical threshold. The experimental results show that, compared to JM97, the algorithm reduces the computational complexity by about 49.6% with a PSNR degradation of less than 0.02 dB for the intra prediction of 4×4 blocks, and by about 68% with a PSNR degradation of less than 0.08 dB for the intra prediction of 8×8 and 16×16 blocks.
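The abstract names early termination with an empirical threshold but gives neither the cost function nor the threshold values, so the sketch below is only an illustration of the general mechanism. It assumes a sum-of-absolute-differences (SAD) cost and a caller-supplied candidate order; the names `sad` and `search_mode`, the mode labels, and the threshold values are all hypothetical.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(a - b)
               for block_row, pred_row in zip(block, prediction)
               for a, b in zip(block_row, pred_row))


def search_mode(block, candidates, threshold):
    """Pick an intra mode, stopping early once the cost is good enough.

    candidates: list of (mode_name, prediction) pairs, most likely mode first
                (the ordering policy itself is assumed, not taken from the paper).
    threshold:  empirical cost below which the search stops (assumed value).
    Returns (best_mode, best_cost, number_of_modes_evaluated).
    """
    best_mode, best_cost, evaluated = None, float("inf"), 0
    for mode, prediction in candidates:
        cost = sad(block, prediction)
        evaluated += 1
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost <= threshold:
            break  # early termination: remaining candidates are never evaluated
    return best_mode, best_cost, evaluated


block = [[10, 10], [10, 10]]
candidates = [
    ("dc", [[10, 10], [10, 11]]),
    ("vertical", [[10, 10], [10, 10]]),
    ("horizontal", [[0, 0], [0, 0]]),
]
print(search_mode(block, candidates, threshold=2))  # stops after the first candidate
print(search_mode(block, candidates, threshold=0))  # needs a second candidate
```

A looser threshold saves more evaluations at the risk of settling for a slightly worse mode, which is the kind of complexity versus PSNR tradeoff the abstract reports.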
