Guo-An Jian
National Chung Cheng University
Publication
Featured research published by Guo-An Jian.
international conference on acoustics, speech, and signal processing | 2008
Ting-Yu Huang; Guo-An Jian; Jui-Chin Chu; Ching-Lung Su; Jiun-In Guo
In this paper, we propose a joint algorithm/code-level optimization scheme that makes real-time H.264/AVC software video decoding feasible on ARM-based platforms for mobile multimedia applications. At the algorithm level, we propose techniques such as a fast interpolation scheme, zero-skipping for texture decoding, fast boundary-strength decision for in-loop filtering, and a pattern-matching algorithm for CAVLD. At the code level, we propose design techniques that minimize memory accesses and branch instructions. Experimental results show that the proposed techniques reduce the complexity of the H.264 video decoder by up to 93% compared to the reference software JM9.7. The optimized H.264 video decoder achieves QCIF@30Hz video decoding on an ARM9 processor running at a 120 MHz clock.
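The zero-skipping idea named above can be illustrated with a minimal sketch: if a residual block has only zero coefficients, its inverse transform is known to be all zeros and can be bypassed. This is a hypothetical Python simplification (a naive 1-D DCT-III stands in for the real H.264 integer transform), not the paper's actual code.

```python
import math

def idct_1d(coeffs):
    """Naive 1-D inverse DCT (DCT-III); a hypothetical stand-in for the
    real H.264 integer inverse transform."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = coeffs[0] / 2.0
        for k in range(1, n):
            s += coeffs[k] * math.cos(math.pi * k * (2 * x + 1) / (2 * n))
        out.append(2.0 * s / n)
    return out

def idct_with_zero_skip(coeffs):
    # Zero-skipping: an all-zero coefficient block decodes to an
    # all-zero residual, so the costly transform is bypassed entirely.
    if all(c == 0 for c in coeffs):
        return [0.0] * len(coeffs)
    return idct_1d(coeffs)
```

In a real decoder the all-zero check is nearly free (the entropy decoder already knows the coded-block flags), which is why skipping pays off so well on zero-heavy texture data.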
international symposium on circuits and systems | 2009
Guo-An Jian; Jui-Chin Chu; Ting-Yu Huang; Tao-Cheng Chang; Jiun-In Guo
In this paper we focus on a design methodology for an H.264 video decoder that is more flexible than an ASIC solution and more efficient than a processor-based one. We explore the memory-access bandwidth requirements and different software/hardware partitions to propose a configurable architecture adopting a Data Exchange Mechanism (DEM) controller, achieving the best trade-off between performance and cost when realizing the H.264 video decoder for different applications. The proposed architecture achieves a performance acceleration of more than three times.
international symposium on vlsi design, automation and test | 2011
Jiff Kuo; Alan P. Su; Kuen-Jong Lee; Ing-Jer Huang; Guo-An Jian; Cheng-An Chien; Jiun-In Guo; Chien-Hung Chen
Multi-core systems are becoming the next-generation embedded design platform. Multiple Instruction Multiple Data (MIMD) System-on-a-Chip (SoC) designs that integrate heterogeneous and homogeneous processor cores to provide complex services, e.g. smart phones, are on the horizon. However, distributed programming on such systems is difficult. Today, only very few MIMD SoC designs offer comprehensive multi-core software/hardware co-debug capability that can stop at both software and hardware breakpoints to inspect data and system status for identifying bugs. In this work we integrate various debug mechanisms so that the entire multi-core SoC can iterate through software and hardware breakpoints an unlimited number of times, inspecting data and status and then stepping forward to resume execution until the next breakpoint. This debug mechanism is realized with a chip containing four ARM1176 cores and the ARM CoreSight™ on-chip debug and trace system, a Field Programmable Gate Array (FPGA) loaded with an on-chip test architecture and bus monitor, and a software debug platform that downloads system traces and processor-core data for inspection and debug control. The key contributions of this work are (1) the development of a multi-clock, multi-core software/hardware co-debug platform and (2) the exercise of multi-core program debugging to visualize the physical behavior of race conditions.
international conference on information technology: new generations | 2009
Guo-An Jian; Ting-Yu Huang; Jui-Chin Chu; Jiun-In Guo
In this paper we propose optimization techniques to achieve real-time decoding of new-generation video standards such as VC-1, H.264, and AVS on embedded processors. We optimize the VC-1/H.264/AVS video decoders from a variety of viewpoints, including algorithmic complexity reduction, memory-access minimization, branch minimization, and zero skipping. The proposed techniques reduce complexity by about 80% ~ 90% compared to the original reference codes. The resulting low-complexity decoders achieve about CIF@12fps ~ 14fps and QCIF@47 ~ 50fps when running on an ARM9 processor at 200 MHz.
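Branch minimization, one of the viewpoints listed above, is often realized by replacing per-sample conditionals with a precomputed lookup table. The sketch below is a hypothetical Python rendering of a classic C-level trick (clipping reconstructed pixels to [0, 255] via a table), offered only as an illustration of the general technique, not as the paper's code.

```python
# Precompute a clipping table once; the min/max branches run only at
# initialization, so the per-pixel path is a single table lookup.
CLIP_OFFSET = 512
CLIP_TABLE = [min(max(v - CLIP_OFFSET, 0), 255) for v in range(1536)]

def clip_pixel(v):
    # Branch-free per-pixel clipping for v in [-512, 1023].
    return CLIP_TABLE[v + CLIP_OFFSET]
```

On deeply pipelined embedded cores, removing data-dependent branches from inner decoding loops avoids misprediction stalls, which is where techniques like this recover much of their speedup.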
international symposium on vlsi design, automation and test | 2013
Guo-An Jian; Jui-Sheng Lee; Kheng-Joo Tan; Peng-Sheng Chen; Jiun-In Guo
Scalable video coding (SVC) is a video coding technique that mainly aims to resolve the problems of multimedia communication between servers and various clients with different computational power, transmission bandwidths, and display resolutions. In this paper, we develop a parallel SVC encoding system to achieve real-time SVC coding performance. First, we propose a GOP-level bitstream structure, fully compatible with the standard SVC decoder, that eliminates encoding data dependencies among GOPs. Based on this bitstream structure, we develop a parallel SVC encoding algorithm to exploit parallelism on multicore systems. Finally, the proposed parallel SVC encoder is implemented and integrated into a multimedia streaming system for evaluation. The experimental results show that for CIF/QCIF and VGA/QVGA 2-layer SVC encoding, the proposed parallel SVC encoder achieves 50.96 and 103.99 times speedup, respectively, with negligible PSNR drop.
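The GOP-level parallelism described above can be sketched as follows: once the bitstream structure removes inter-GOP dependencies, each GOP can be encoded by an independent worker and the chunks concatenated in display order. This is a minimal, hypothetical Python sketch with a stub per-GOP encoder, not the paper's SVC implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_gop(gop_frames):
    # Hypothetical stand-in for a self-contained per-GOP SVC encode;
    # it just hashes frame data into one byte per frame.
    return bytes(sum(f) % 256 for f in gop_frames)

def parallel_encode(frames, gop_size=8, workers=4):
    # Split the sequence into GOPs; with a dependency-free GOP-level
    # bitstream structure, GOPs encode independently in parallel.
    gops = [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(encode_gop, gops))  # map preserves GOP order
    return b"".join(chunks)
```

Because `map` returns results in submission order, the concatenated stream is bit-identical to a serial encode, which is the property that keeps the output compatible with a standard decoder.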
IEEE Transactions on Consumer Electronics | 2012
Cheng-An Chien; Guo-An Jian; Hsiu-Cheng Chang; Kuan-Hung Chen; Jiun-In Guo
This paper presents an efficient VLSI architecture of an in-loop deblocking filter (ILF) with a high-efficiency data-access system supporting multiple video coding standards, including H.264 BP/MP/HP, SVC, MVC, AVS, and VC-1. Advanced standards such as H.264 MP/HP, SVC, and MVC adopt Macroblock Adaptive Frame Field (MBAFF) coding to improve coding efficiency, which makes the deblocking filter a performance bottleneck due to its complex data-access requirements. To the best of our knowledge, this design challenge has not been discussed in previous works. We therefore develop a Prediction Data Management (PDM) scheme to manage the order of prediction data input to the deblocking filter for different coding types (frame/field) and multiple standards. We also design an extended output frame-buffer module that resolves system-bus restrictions (such as the 1K boundary and burst length) and achieves high-efficiency data access through an MB-based scan order. These techniques solve the data-access design challenge and reduce bus latency by 67%. Implemented in 90 nm CMOS technology, the proposed design meets the real-time performance requirement of QFHD (3840×2160@30fps) when operated at 156 MHz, at a cost of 50.6K gates and 2.4K bytes of local memory. The maximum operating frequency of the design, 370 MHz, exceeds the required real-time operating frequency, so voltage scaling may be adopted to reduce power consumption.
intelligent information hiding and multimedia signal processing | 2009
Guo-An Jian; Jui-Chin Chu; Jiun-In Guo
In this paper, we propose a powerful AVS-M video decoder that performs real-time decoding of AVS-M video on embedded RISC processors. Our optimization schemes cover algorithmic improvement, zero skipping, early termination, and data reuse. The proposed techniques reduce complexity by about 90% ~ 93% compared to the original reference codes. The resulting low-complexity AVS-M video decoder achieves about QVGA@30fps ~ 50fps when running on an ARM920T processor at 384 MHz.
international symposium on circuits and systems | 2007
Guo-An Jian; Chih-Da Chien; Jiun-In Guo
In this paper we propose a memory-based hardware accelerator for MPEG-4 audio coding and reverberation that achieves both high audio quality and realism. The proposed design realizes both the computation-intensive 256/2048-point IMDCT and the 1024-point FFT-based reverberation on the same hardware engine by adopting a unified IMDCT/FFT/IFFT algorithm, which greatly reduces the hardware cost. The design achieves real-time 5.1-channel audio decoding at a sampling rate of 44.1 kHz together with audio reverberation, at a hardware cost of 26,633 gates and 4.6K words of local memory for storing transform coefficients and temporary results. The maximum working frequency reaches 220 MHz when implemented in UMC 0.18 µm CMOS technology, which meets the real-time processing requirements of many high-quality MPEG-4 audio coding applications.
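The engine-sharing idea behind a unified IMDCT/FFT/IFFT kernel can be illustrated in software: an inverse FFT can reuse the forward-FFT kernel via conjugation, so one "engine" serves both directions. This is a hypothetical Python sketch of that reuse principle, not the paper's unified algorithm.

```python
import cmath

def fft(x):
    # Naive recursive radix-2 FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    # Inverse transform via the *same* forward-FFT kernel:
    # ifft(x) = conj(fft(conj(x))) / n.
    n = len(x)
    y = fft([v.conjugate() for v in x])
    return [v.conjugate() / n for v in y]
```

In hardware the same observation lets the butterfly datapath and twiddle ROM be shared, with only conjugation/scaling logic differing between the transforms.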
Archive | 2017
Po-Chun Shen; Kuan-Hung Chen; Jui-Sheng Lee; Guan-Yu Chen; Yi-Ting Lin; Bing-Yang Cheng; Guo-An Jian; Hsiu-Cheng Chang; Wei-Ming Lu; Jiun-In Guo
Intelligent vision processing technology has a wide range of applications in vehicles, many of them related to so-called Advanced Driver Assistance Systems (ADAS). Working with cameras, Pedestrian and Motorcyclist Detection System (PMD), Lane Departure Warning System (LDWS), Forward Collision Warning System (FCWS), Speed Limit Detection System (SLDS), and Dynamic Local Contrast Enhancement (DLCE) techniques can help drivers notice important events or objects around them. This chapter gives an in-depth exploration of these intelligent vision processing technologies from the viewpoints of methodology development, algorithm optimization, and system implementation on embedded platforms. More precisely, this chapter first surveys newly appeared state-of-the-art intelligent vision processing technologies for ADAS, and then highlights significant technologies including PMD, LDWS, FCWS, SLDS, and DLCE developed in the System on Chip (SoC) Laboratory, Feng Chia University, Taiwan, and the intelligent Vision System (iVS) Laboratory, National Chiao Tung University, Taiwan. Implementation and verification of these ADAS technologies are also presented. In summary, the proposed PMD design achieves 32.5 frames per second (fps) at 720×480 (D1) resolution on an AMD A10-7850K processor using heterogeneous computing. On an automotive-grade Freescale i.MX6 platform (4-core ARM Cortex-A9, 1 GB DDR3 RAM, Linux environment), the proposed LDWS, FCWS, and SLDS designs achieve 33 fps, 32 fps, and 30 fps, respectively, at D1 resolution. Finally, the proposed DLCE system is realized on a TREK-668 platform with a 1.6 GHz Intel Atom processor, meeting the real-time requirement of 50 fps at D1 resolution.
asia-pacific signal and information processing association annual summit and conference | 2013
Yuan-Hsiang Miao; Guo-An Jian; Li-Ching Wang; Jui-Sheng Lee; Jiun-In Guo
This paper proposes a low-complexity multi-view video encoder that includes mode decision and early termination based on B-frame characteristics. According to the statistics of coding-mode distribution in different B-frame types, we classify the coding modes into several classes and propose an early-terminated mode decision algorithm that largely reduces computing complexity. In addition, an MVD-based adaptive search range scheme is included in the proposed encoding strategy. In our experimental results, encoding time is reduced by up to 91%-93% while quality loss is kept within a 0.1 dB PSNR drop.
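The early-terminated mode decision described above can be sketched as follows: mode classes are visited in order of statistical likelihood, and evaluation stops as soon as a class yields a rate-distortion cost below a threshold. The class ordering, cost values, and threshold here are hypothetical placeholders for illustration, not the paper's actual statistics.

```python
def choose_mode(classes, threshold):
    """classes: list of mode classes, each a list of (mode_name, rd_cost)
    pairs, ordered by how likely each class is to win in this B-frame
    type (hypothetical ordering). Returns (best_mode, modes_evaluated)."""
    best_mode, best_cost = None, float("inf")
    evaluated = 0
    for mode_class in classes:
        for mode, cost in mode_class:
            evaluated += 1
            if cost < best_cost:
                best_mode, best_cost = mode, cost
        # Early termination: if the best cost so far is already good
        # enough, skip the remaining (less likely) mode classes.
        if best_cost < threshold:
            break
    return best_mode, evaluated
```

Because the most probable classes come first, the common case terminates after evaluating only a fraction of the modes, which is where the large encoding-time savings come from.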