Chih-Peng Fan

National Chung Hsing University

Publication

Featured research published by Chih-Peng Fan.

International Symposium on Intelligent Signal Processing and Communication Systems | 2007

Implementations of high throughput sequential and fully pipelined AES processors on FPGA

Chih-Peng Fan; Jun-Kui Hwang

In this paper, we use FPGA chips to realize a high-throughput 128-bit AES cipher processor built from new high-speed, hardware-sharing functional blocks. The AES functional calculations comprise four transformation stages: SubBytes, ShiftRows, MixColumns, and AddRoundKey. A content-addressable memory (CAM) based scheme realizes the proposed high-speed SubBytes block, and a new hardware-sharing architecture implements the proposed high-speed MixColumns block. An efficient low-cost AddRoundKey architecture is used for real-time key generation. The FPGA tool used is Xilinx ISE™ 7.1 with the XST™ synthesizer. In the proposed sequential AES design, the operating frequency reaches 75.3 MHz and the throughput is up to 0.876 Gbits/s. In the fully pipelined AES design, the operating frequency reaches 250 MHz and the throughput is up to 32 Gbits/s. Both proposed realizations achieve higher throughput than other sequential and fully pipelined designs, respectively.
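
The throughput figures quoted above can be sanity-checked with simple arithmetic. The sketch below assumes throughput = block size × clock frequency / cycles per block. The cycle counts (1 for the fully pipelined design, 11 for the sequential design) are assumptions chosen because they reproduce the quoted numbers; the abstract does not state them.

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_gbps(freq_mhz, cycles_per_block):
    """Throughput in Gbit/s for a design that finishes one 128-bit block
    every `cycles_per_block` clock cycles at `freq_mhz` MHz."""
    return BLOCK_BITS * freq_mhz * 1e6 / cycles_per_block / 1e9


# Assumed cycle counts (not given in the abstract):
pipelined = throughput_gbps(250.0, 1)    # one block per clock cycle
sequential = throughput_gbps(75.3, 11)   # eleven clock cycles per block

print(f"fully pipelined: {pipelined:.3f} Gbit/s")
print(f"sequential:      {sequential:.3f} Gbit/s")
```

Under these assumptions the results come out close to the 32 Gbits/s and 0.876 Gbits/s reported in the abstract.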

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 2000

Compact recursive structures for discrete cosine transform

Jar-Ferr Yang; Chih-Peng Fan

In this paper, we propose compact recursive structures for computing the discrete cosine transform. With a simple preprocessor, the proposed recursive computation, which can be realized in a fixed-coefficient second-order infinite-impulse response (IIR) filter, requires fewer recursive loops than the previous methods if the transformed length is not a prime number. Due to fewer recursive loops and selected coefficients, the proposed compact recursive structure achieves more accurate results than the other methods. With fast recursion and low roundoff error in transformation, the compact recursive algorithm can be easily realized in VLSI chips.
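
As a rough illustration of how a DCT coefficient can be produced by a fixed-coefficient second-order recursion, the sketch below uses the classic Goertzel/Clenshaw-style recurrence. It is not the compact structure proposed in the paper (it has no preprocessor and does not reduce the number of recursive loops); the function name and the unnormalized DCT-II definition are assumptions made for the example.

```python
import math


def dct2_recursive(x, k):
    """Unnormalized DCT-II coefficient
    X[k] = sum_n x[n] * cos((2n + 1) * k * pi / (2N)),
    computed with a second-order recursion that uses one fixed multiplier."""
    n_len = len(x)
    theta = k * math.pi / n_len
    coeff = 2.0 * math.cos(theta)  # fixed feedback coefficient for this k
    v1 = 0.0  # holds v[n + 1]
    v2 = 0.0  # holds v[n + 2]
    # Backward recursion: v[n] = x[n] + 2*cos(theta)*v[n+1] - v[n+2]
    for sample in reversed(x):
        v0 = sample + coeff * v1 - v2
        v2, v1 = v1, v0
    # Here v1 = v[0] and v2 = v[1]; a single output multiply finishes the job.
    return (v1 - v2) * math.cos(theta / 2.0)


def dct2_direct(x, k):
    """Direct evaluation of the same sum, used only as a reference."""
    n_len = len(x)
    return sum(
        x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_len))
        for n in range(n_len)
    )


samples = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0]
for k in range(len(samples)):
    print(k, dct2_recursive(samples, k), dct2_direct(samples, k))
```

The loop body is a second-order IIR filter with one fixed multiplier per output index, which is the property that makes such recursions attractive for hardware.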

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1998

Fast center weighted Hadamard transform algorithms

Chih-Peng Fan; Jar-Ferr Yang

In this paper, one-dimensional (1-D) and two-dimensional (2-D) fast algorithms for realizing the center weighted Hadamard transform (CWHT) are proposed. Based on the Kronecker product and direct sum operations, the proposed 1-D and 2-D CWHT algorithms through matrix decompositions require fewer computations than the existing methods. With low complexity and regular modularity, the proposed algorithms are suitable for VLSI implementation to achieve real-time signal processing and compression.
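
To illustrate how a Kronecker-product structure leads to a fast algorithm, the sketch below builds the ordinary (unweighted) Sylvester Hadamard matrix by repeated Kronecker products and compares it with a butterfly-based fast transform. It covers the plain Hadamard transform only; the center weighting and the direct-sum decomposition of the CWHT are not reproduced here, and all function names are illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def hadamard(n):
    """Sylvester Hadamard matrix of order n (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = kron([[1, 1], [1, -1]], h)
    return h


def fwht(x):
    """Fast Walsh-Hadamard transform (natural order) using in-place butterflies.
    Uses N*log2(N) additions instead of the N*N of a matrix-vector product."""
    y = list(x)
    h = 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for j in range(start, start + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y


data = [1, 0, 1, 0, 0, 1, 1, 0]
matrix_result = [sum(h * v for h, v in zip(row, data)) for row in hadamard(len(data))]
print(matrix_result)
print(fwht(data))
```

Both lines print the same vector, since each butterfly stage corresponds to one Kronecker factor of the matrix.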

Integration | 2011

Efficient RC low-power bus encoding methods for crosstalk reduction

Chih-Peng Fan; Chia-Hao Fang

In on-chip buses, the RC crosstalk effect leads to serious problems, such as wire propagation delay and dynamic power dissipation. This paper presents two efficient bus-coding methods. The proposed methods reduce both dynamic power dissipation and wire propagation delay more than existing bus encoding methods, and they also reduce total power consumption more than other encoding methods. Simulation results show that proposed method I reduces coupling activity by 26.7-38.2% and switching activity by 3.7-7% on 8-bit to 32-bit data buses. Proposed method II reduces coupling activity by 27.5-39.1% and switching activity by 5.3-9% on 8-bit to 32-bit data buses. Both proposed methods reduce dynamic power by 23.9-35.3% on 8-bit to 32-bit data buses and total propagation delay by up to 30.7-44.6% on 32-bit data buses, and they eliminate Type-4 coupling. Our methods also reduce total power consumption by 23.6-33.9%, 23.9-34.3%, and 24.1-34.6% on 8-bit to 32-bit data buses with the 0.18, 0.13, and 0.09 μm technologies, respectively.
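
The two activity measures mentioned above can be illustrated with a simplified counting model. The sketch below is an assumption-laden example, not the paper's metric definitions or encoding methods: switching activity is taken as the number of wires that toggle between consecutive bus words, and coupling activity as the sum over adjacent wire pairs of the change in their relative voltage (0 when both wires move together or stay put, 1 when only one toggles, 2 when they toggle in opposite directions).

```python
def switching_activity(prev, curr, width):
    """Number of wires that toggle between two consecutive bus words."""
    mask = (1 << width) - 1
    return bin((prev ^ curr) & mask).count("1")


def coupling_activity(prev, curr, width):
    """Simplified coupling count over adjacent wire pairs (assumed model)."""
    total = 0
    for i in range(width - 1):
        # Transition on each wire: -1 (falling), 0 (static), +1 (rising)
        d_low = ((curr >> i) & 1) - ((prev >> i) & 1)
        d_high = ((curr >> (i + 1)) & 1) - ((prev >> (i + 1)) & 1)
        total += abs(d_low - d_high)
    return total


# Worst case on a 4-bit bus: every wire toggles against its neighbours.
print(switching_activity(0b0101, 0b1010, 4), coupling_activity(0b0101, 0b1010, 4))
# All wires rise together: switching occurs, but no relative change.
print(switching_activity(0b0000, 0b1111, 4), coupling_activity(0b0000, 0b1111, 4))
```

A bus encoder would aim to choose code words that lower these counts across a data stream.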

IEEE Signal Processing Letters | 2008

Efficient Low-Cost Sharing Design of Fast 1-D Inverse Integer Transform Algorithms for H.264/AVC and VC-1

Chih-Peng Fan; Guo-An Su

In this letter, fast one-dimensional (1-D) algorithms and their sharing design for the 1-D inverse integer transforms of H.264/AVC and VC-1 are proposed, using matrix decompositions with sparse matrices and matrix offset computations. The computational complexities of the proposed fast 1-D 4×4 and 8×8 inverse integer transforms for H.264/AVC are the same as those of the previous fast methods. The shift operations of the proposed fast 1-D inverse integer transform for VC-1 are equivalent to those of the previous fast method. For playback environments, the video decoder can support multiple modes, which include the H.264/AVC and VC-1 video standards. The proposed hardware-sharing architecture requires lower hardware cost than individual, separate designs for VLSI realization.
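
For orientation, the sketch below shows the widely documented butterfly form of the 1-D 4×4 inverse integer transform used in H.264/AVC, which needs only additions and right shifts. It is the standard fast form, not the sparse-matrix decomposition or the shared H.264/VC-1 design proposed in the letter; the function name is illustrative.

```python
def h264_inverse_4pt(x):
    """1-D 4-point inverse integer transform of H.264/AVC (butterfly form).

    Equivalent to multiplying by
        [[1,  1,    1,  1/2],
         [1,  1/2, -1, -1  ],
         [1, -1/2, -1,  1  ],
         [1, -1,    1, -1/2]]
    with the halvings implemented as arithmetic right shifts.
    """
    x0, x1, x2, x3 = x
    # Stage 1: even part (x0, x2) and odd part (x1, x3)
    a0 = x0 + x2
    a1 = x0 - x2
    a2 = (x1 >> 1) - x3
    a3 = x1 + (x3 >> 1)
    # Stage 2: combine the even and odd parts
    return [a0 + a3, a1 + a2, a1 - a2, a0 - a3]


print(h264_inverse_4pt([4, 2, 0, 0]))
print(h264_inverse_4pt([0, 0, 0, 2]))
```

The butterfly uses eight additions and two shifts per 4-point transform, which is the kind of baseline that hardware-sharing designs compare against.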

IEEE Transactions on Circuits and Systems II: Express Briefs | 2008

Low-Cost Hardware-Sharing Architecture of Fast 1-D Inverse Transforms for H.264/AVC and AVS Applications

Guo-An Su; Chih-Peng Fan

In this paper, fast one-dimensional (1-D) algorithms and their hardware-sharing designs for the 1-D 2×2, 4×4, and 8×8 inverse transforms of H.264/AVC and the 1-D 8×8 inverse transform of AVS are proposed with low hardware cost, especially for multiple-standard decoding applications in China. The proposed 1-D hardware-sharing architecture is realized by adding offset computations, and it is implemented as a pipelined architecture. Thus, the hardware cost of the proposed sharing architecture is smaller than that of individual, separate designs. With regular modularity, the proposed sharing architecture is suitable for H.264/AVC and AVS signal processing in VLSI implementations.

IEEE Transactions on Circuits and Systems II: Express Briefs | 2009

Fast Algorithm and Low-Cost Hardware-Sharing Design of Multiple Integer Transforms for VC-1

Chih-Peng Fan; Guo-An Su

In this brief, fast 1-D multiple integer transforms for Windows Media Video 9 (WMV-9/VC-1) are proposed using matrix decompositions, additions, and row/column permutations. The proposed fast 1-D integer transforms share hardware, and they can be applied to the 2-D transform scheme. The hardware costs of the proposed fast 1-D and 2-D integer transform designs are smaller than those of the previous individual designs without sharing. With hardware sharing, the proposed architecture is suitable for low-cost implementation of the VC-1 codec.

IEEE Transactions on Signal Processing | 1997

Fixed-Pipeline Two-Dimensional Hadamard Transform Algorithms

Chih-Peng Fan; Jar-Ferr Yang

In this correspondence, we first propose a new two-dimensional (2-D) Hadamard transform algorithm, which can be realized in fixed and identical pipeline stages. By introducing exchangeable permutations, the fixed-pipeline algorithm can be further extended to provide all 2-D lower-dimension transformations in intermediate pipeline stages. Finally, a parallel pipeline realization of the proposed algorithm is also suggested. For VLSI implementation, the proposed fixed-pipeline structure, with the same computational complexity, provides better modularity than other well-known algorithms. With lower-dimension transformations, the proposed algorithm is suitable for applications in variable-block-size compression.

International Conference on Consumer Electronics | 2003

An adaptive carrier synchronizer for M-QAM cable receiver

Chun-Nan Ke; Cheng-Yi Huang; Chih-Peng Fan

A digital carrier recovery (CR) loop with an adaptive loop bandwidth for rapid carrier frequency offset acquisition and low steady-state jitter is proposed in this paper. In addition to the traditional CR functional blocks, the presented carrier synchronizer consists of a tracking-status detector and a loop bandwidth controller. The tracking-status detector monitors the frequency-estimate signals output from the frequency-tracking (integral) branch of the loop filter, detecting whether the frequency offset is locked or not. Then, by adjusting the loop bandwidth in response to the detected result, both the convergence time in the acquisition state and the carrier jitter in the steady state can be reduced. The new scheme, implemented on FPGAs, has been successfully applied to a 256-QAM baseband digital receiver and has also interoperated with a commercial CMTS. The SNR performance can be improved by up to 3 dB at the expense of only 3% of the hardware area of a Virtex-II-2000 FPGA for the proposed architecture.

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1999

Recursive discrete cosine transforms with selectable fixed-coefficient filters

Jar-Ferr Yang; Chih-Peng Fan

In this work, we propose new fixed-coefficient recursive structures for computing discrete cosine transforms of power-of-two length. The fixed-coefficient recursive structures are developed by exploring the periodicity embedded in the transform bases, whose indices can form a complete residue system or a complete odd residue system. After simple data manipulation, the proposed filtering structures, which require only fixed-coefficient multipliers, are better than the previous recursive methods, which need general multipliers in filter realization. In particular, we found that properly selected fixed-coefficient filters achieve lower roundoff errors than the nominal variable-coefficient ones for computing DCTs in finite-word-length machines.
