Dongpei Liu
National University of Defense Technology
Publications
Featured research published by Dongpei Liu.
IEICE Electronics Express | 2012
Jianfeng Zhang; Hengzhu Liu; Wenmin Hu; Dongpei Liu; Botao Zhang
The conventional Coordinate Rotation Digital Computer (CORDIC) algorithm is an effective way to implement trigonometric functions, and it has been widely applied in digital signal processing. In this paper, we propose a novel CORDIC architecture based on Scaling Free (SF) theory, which adopts an adaptive recoding method to reduce the number of iterations and eliminate the scaling factor. The proposed scheme has been synthesized in 65 nm CMOS technology with a standard cell library. Compared with SFB4C, our scheme saves 21.11% to 34.64% of the hardware overhead with no performance penalty.
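For readers unfamiliar with the baseline, the following is a minimal floating-point sketch of conventional rotation-mode CORDIC, not the scaling-free architecture proposed in the paper. The function name `cordic_sin_cos` and the iteration count are illustrative assumptions. It shows where the scaling factor (the accumulated gain) comes from, which is the term the paper's scheme aims to eliminate.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Return (cos(theta), sin(theta)) for theta in [-pi/2, pi/2].

    Conventional rotation-mode CORDIC, modelled in floating point.
    In hardware the multiplications by 2**-i are plain shifts.
    """
    # Elementary rotation angles atan(2^-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # the product of these stretches is the CORDIC scaling factor.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Pre-divide by the gain so the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate towards z == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y


c, s = cordic_sin_cos(0.5)
```

With 24 iterations the residual angle error is on the order of 2^-23, so the results agree with `math.cos` and `math.sin` to about six decimal places.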
Circuits Systems and Signal Processing | 2014
Dongpei Liu; Hengzhu Liu; Li Zhou; Jianfeng Zhang; Botao Zhang
A simplified DFT-based algorithm and its VLSI implementation for accurate frequency estimation of a single-tone complex sinusoid signal are investigated. The proposed algorithm estimates frequency by interpolation using Fourier coefficients. It consists of a coarse search followed by a fine search, and its performance closely approaches the Cramér–Rao lower bound (CRLB) even in the low-SNR region. Moreover, a pipelined triple-mode CORDIC architecture is designed to efficiently support complex multiplication, complex magnitude calculation and real division. The triple-mode CORDIC-based radix-4 architecture is employed for the hardware implementation of the frequency estimator, and is suitable not only for the fast Fourier transform but also for accurate frequency estimation. A frequency estimator with 1024-point samples is implemented and verified on FPGA. It works at 215 MHz on a Xilinx XC6VLX240T FPGA device, and uses 4,161 registers and 6,986 slice LUTs. ASIC synthesis results show that it requires an area of 60K equivalent NAND2 gates with a clock rate of 500 MHz in SMIC 0.18 μm technology. The total latency of the frequency estimator is 2336 cycles. The proposed architecture provides a good trade-off between hardware overhead, estimation performance and computation latency.
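The coarse-then-fine idea can be illustrated with a short sketch. The abstract does not give the paper's interpolation formula, so the fine step below substitutes a well-known three-bin interpolator (often attributed to Jacobsen) as a stand-in. The function names, the 64-point length and the test tone are all assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT; enough for a small illustrative example."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def estimate_frequency(x):
    """Estimate a single complex tone's frequency in cycles per sample."""
    n = len(x)
    spectrum = dft(x)

    # Coarse search: the DFT bin with the largest magnitude.
    k0 = max(range(n), key=lambda k: abs(spectrum[k]))

    # Fine search: interpolate with the two neighbouring Fourier
    # coefficients (three-bin formula, NOT the paper's estimator).
    xm = spectrum[(k0 - 1) % n]
    x0 = spectrum[k0]
    xp = spectrum[(k0 + 1) % n]
    delta = ((xm - xp) / (2 * x0 - xm - xp)).real

    return (k0 + delta) / n


N = 64
f_true = 10.3 / N  # tone deliberately placed between bins 10 and 11
signal = [cmath.exp(2j * math.pi * f_true * t) for t in range(N)]
f_est = estimate_frequency(signal)
```

In this noise-free case the coarse search alone would be off by 0.3 of a bin, while the interpolated estimate lands much closer to the true frequency. Behaviour under noise, which is what the CRLB comparison in the abstract concerns, is not modelled here.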
2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip | 2011
Wenmin Hu; Zhonghai Lu; Axel Jantsch; Hengzhu Liu; Botao Zhang; Dongpei Liu
A low-latency path setup approach with multiple setup packets for parallel set is presented. It reduces the header overhead compared to multiaddress encoding. Further, we propose four variants of deadlock-free multicast routing algorithms using different subpath generation methods, different destination partitioning, and channel sharing strategies. Experimental results show that the quatuor partitions path-like tree outperforms other algorithms.
International Conference on Computer Engineering and Technology | 2010
Botao Zhang; Dongpei Liu; Shixian Wang; Xucan Chen; Hengzhu Liu
BCH code is adopted as part of the Forward Error Correction subsystem of DVB-S2, the next-generation digital video broadcasting system based on satellite wireless communication. Because DVB-S2 uses very long codes and multiple code modes, a fully compatible BCH decoder has a high area cost. To reduce the area cost of the fully compatible DVB-S2 BCH decoder, we have modified the VLSI implementation of the Berlekamp–Massey algorithm for the key equation solver, which is the main part of the BCH decoder, and rebuilt the Galois field multiplications, including the constant Galois field multiplications and the general Galois field multiplications. To support all the code modes and Adaptive Coded Modulation, a novel duo-pipeline reconfigurable architecture is proposed, which can switch code modes without stalling the symbol stream. We have implemented the decoder in Verilog HDL and evaluated it on an FPGA platform and with an ASIC library. The results show that the logic area of the decoder is at least 13% smaller than that of other existing decoders.
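As background on the Galois field multiplications mentioned above, here is a minimal sketch of a general (two-variable) multiplier using shift-and-add with polynomial reduction. It is a toy over GF(2^4) with the primitive polynomial x^4 + x + 1, chosen only to keep the numbers small. The fields used by DVB-S2 are larger, and the paper's optimized multiplier structures are not reproduced here. The name `gf_mul` and its parameters are assumptions.

```python
def gf_mul(a, b, poly=0b10011, m=4):
    """Multiply a and b in GF(2^m), reducing by the given polynomial.

    poly encodes the field polynomial including its x^m term;
    the default 0b10011 is x^4 + x + 1 (toy field GF(16)).
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << m):
            a ^= poly        # reduce when the degree reaches m
    return result


# alpha = x (0b0010); alpha * (alpha^3 + 1) = alpha^4 + alpha = 1
product = gf_mul(0b0010, 0b1001)
```

A constant multiplier is the special case where one operand is fixed at design time, so the loop above collapses into a fixed XOR network. That is a common reason hardware designs treat the two kinds separately.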
Applied Reconfigurable Computing | 2013
Li Zhou; Dongpei Liu; Botao Zhang; Hengzhu Liu
The coarse-grained reconfigurable array (CGRA) is an efficient architecture in the digital signal processing domain. It is one of the best candidate architectures for exploiting instruction-level and loop-level parallelism in computation-intensive applications while maintaining a certain degree of flexibility. The performance of a CGRA relies heavily on the mapping algorithm, which associates each operation with a PE and with the time slot in which it executes.
CCF National Conference on Computer Engineering and Technology | 2013
Li Zhou; Hengzhu Liu; Dongpei Liu
The coarse-grained reconfigurable array (CGRA) is an architecture that offers hardware-like high performance and software-like flexibility. These two characteristics make the CGRA an effective candidate for computationally intensive applications. In this paper, we propose a novel cluster-based CGRA architecture that achieves high efficiency. The reconfigurable processing elements in the CGRA clusters share complex function units and registers. This resource sharing reduces area and improves performance. In addition, an ant-colony-based mapping algorithm is proposed. Experiments show that the cluster-based CGRA outperforms some existing architectures in efficiency, and that the proposed mapping algorithm outperforms other mapping heuristics.
BIC-TA | 2013
Li Zhou; Dongpei Liu; Min Tang; Hengzhu Liu
The coarse-grained reconfigurable array (CGRA) is a competitive hardware platform for computation-intensive tasks in many application domains. The performance of a CGRA depends heavily on the mapping algorithm, which exploits different levels of parallelism. Unfortunately, the mapping problem on CGRAs has been proven to be NP-complete. In this paper, we propose a genetic-based modulo scheduling algorithm to map application kernels onto a CGRA. An efficient routing heuristic is also presented to reduce the mapping time. Experimental results show that our algorithm outperforms other heuristic algorithms in both solution quality and mapping time.
Archive | 2012
Dongpei Liu; Hengzhu Liu; Jianfeng Zhang; Botao Zhang; Li Zhou
This paper presents a CORDIC-based radix-4 FFT processor, which adopts an improved conflict-free parallel memory access scheme and a pipelined CORDIC architecture. By generating the twiddle factor correctly, the proposed FFT processor eliminates the need for ROM, making it memory-efficient. Synthesis results show that the 16-bit 1024-point FFT processor has only 45 K equivalent gates with an area of 0.13 mm², excluding memories, in Chartered 90 nm CMOS technology. When the operating frequency is 350 MHz, the proposed FFT processor performs a 1024-point FFT every 3.94 μs.
International Journal of Electronics | 2015
Li Zhou; Dongpei Liu; Jianfeng Zhang; Hengzhu Liu
Coarse-grained reconfigurable arrays (CGRAs) have shown potential for application in embedded systems in recent years. Numerous reconfigurable processing elements (PEs) in CGRAs provide flexibility while maintaining high performance by exploiting different levels of parallelism. However, a gap remains between the CGRA and the application-specific integrated circuit (ASIC). Some application domains, such as software-defined radios (SDRs), require flexibility while their performance demands increase, so more effective CGRA architectures are needed. Customisation of a CGRA according to its application can improve performance and efficiency. This study proposes an application-specific CGRA architecture template composed of generic PEs (GPEs) and special PEs (SPEs). The hardware of the SPE can be customised to accelerate specific computational patterns. An automatic design methodology that includes pattern identification and application-specific function unit generation is also presented. A mapping algorithm based on ant colony optimisation is provided. Experimental results on the SDR target domain show that, compared with other ordinary and application-specific reconfigurable architectures, the CGRA generated by the proposed method performs more efficiently for the given applications.
MUSIC | 2014
Dongpei Liu; Hengzhu Liu; Li Zhou
The max* operator is the kernel operation in MAP decoding. An intuitive approximation to the correction term of the max* operator is presented. A binary-tree-based architecture for multi-variable max* calculation is also suggested. The proposed max* operator provides a good trade-off between hardware overhead and logic delay, and can be easily realized in parallel. Simulations on a (37,21) turbo code demonstrate that the BER performance of the proposed scheme is close to that of the optimal Log-MAP algorithm and significantly superior to that of the Max-Log-MAP algorithm. The proposed enhanced implementation of the max* operator has potential applications in turbo decoders.
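To make the operator concrete, the sketch below shows the exact max* (the Jacobian logarithm used in Log-MAP), the Max-Log-MAP simplification that drops the correction term, and a pairwise binary-tree reduction for several inputs. The paper's own approximation of the correction term is not given in the abstract and is not reproduced here. All function names are illustrative assumptions.

```python
import math


def max_star(a, b):
    """Exact max*: ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a-b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log-MAP simplification: the correction term is dropped."""
    return max(a, b)


def tree_reduce(op, values):
    """Combine values pairwise, level by level, like a binary tree."""
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes to the next level
        level = nxt
    return level[0]


metrics = [0.5, -1.0, 2.0, 1.5, 0.0]
exact = tree_reduce(max_star, metrics)
approx = tree_reduce(max_log, metrics)
```

Because max* is associative, the tree result equals the logarithm of the sum of exponentials of all inputs, while each tree level can be evaluated in parallel. The gap between `exact` and `approx` is the accumulated correction that an approximation scheme tries to recover cheaply.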