Hengzhu Liu
National University of Defense Technology
Publication
Featured research published by Hengzhu Liu.
Asia and South Pacific Design Automation Conference | 2011
Wenmin Hu; Zhonghai Lu; Axel Jantsch; Hengzhu Liu
In this paper, a novel hardware support for multicast on mesh Networks-on-Chip (NoC) is proposed. It supports multicast routing along tree-based paths of any shape. Two power-efficient tree-based multicast routing algorithms, Optimized tree (OPT) and Left-XY-Right-Optimized tree (LXYROPT), are also proposed. The XY tree-based (XYT) algorithm and multiple unicast copies (MUC) are implemented on the router as baselines. As the number of destinations increases, OPT and LXYROPT achieve a remarkable improvement over MUC in both latency and throughput, while the average power consumption is reduced by 50% and 45%, respectively. Compared with XYT, OPT is 10% higher in latency but saves 17% in power consumption, whereas LXYROPT is 3% lower in latency and 8% lower in power consumption. In some cases, OPT and LXYROPT consume up to 70% less power than XYT.
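To illustrate why a tree-based multicast can save power relative to multiple unicast copies, the sketch below counts link traversals on a mesh for the two baselines named in the abstract. It is not the authors' router or their OPT/LXYROPT algorithms: the function names, the example coordinates, and the use of link traversals as a rough proxy for power are all assumptions made for illustration.

```python
def xy_path_links(src, dst):
    """Directed links visited by XY (dimension-order) routing from src to dst."""
    x, y = src
    dst_x, dst_y = dst
    links = []
    while x != dst_x:                      # travel along the X dimension first
        next_x = x + (1 if dst_x > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst_y:                      # then along the Y dimension
        next_y = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def muc_link_traversals(src, dests):
    """Multiple unicast copies: each destination gets its own packet."""
    return sum(len(xy_path_links(src, d)) for d in dests)


def xy_tree_link_traversals(src, dests):
    """XY tree: a link shared by several destinations is traversed only once."""
    shared_links = set()
    for d in dests:
        shared_links.update(xy_path_links(src, d))
    return len(shared_links)


# Hypothetical example: one source and four destinations on a 4x4 mesh.
source = (0, 0)
destinations = [(3, 0), (3, 2), (3, 3), (1, 3)]
print(muc_link_traversals(source, destinations))      # 18
print(xy_tree_link_traversals(source, destinations))  # 9
```

In this example the XY tree halves the number of link traversals. An optimized tree would try to reduce the count further by choosing branch points that are not restricted to dimension order.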
IEICE Electronics Express | 2012
Jianfeng Zhang; Hengzhu Liu; Wenmin Hu; Dongpei Liu; Botao Zhang
The conventional Coordinate Rotation Digital Computer (CORDIC) algorithm is an effective way to implement trigonometric functions, and it has been widely applied in digital signal processing. In this paper, we propose a novel CORDIC architecture based on Scaling Free (SF) theory, which adopts an adaptive recoding method to reduce the number of iterations and eliminate the scaling factor. The proposed scheme has been synthesized using 65 nm CMOS technology with a standard cell library. Compared with SFB4C, our scheme saves 21.11% to 34.64% of the hardware overhead with no performance penalty.
IEEE International Conference on Communication Software and Networks | 2011
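For reference, the conventional rotation-mode CORDIC that the abstract starts from can be sketched as follows. This is the textbook algorithm, not the scaling-free architecture proposed in the paper: it uses one iteration per bit and needs the scaling factor K that the proposed scheme eliminates. The function name and the iteration count are assumptions, and floating-point arithmetic stands in for the shift-and-add hardware.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Conventional rotation-mode CORDIC; converges for |theta| up to about 1.74 rad."""
    # Elementary rotation angles atan(2^-i), stored in a small table in hardware.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Accumulated gain K of the micro-rotations (the "scaling factor").
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Pre-scale the start vector by 1/K so the result has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        direction = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i is a right shift in fixed-point hardware.
        x, y = x - direction * y * 2.0 ** -i, y + direction * x * 2.0 ** -i
        z -= direction * angles[i]
    return y, x  # (sin(theta), cos(theta))


sin_value, cos_value = cordic_sin_cos(0.5)
print(sin_value, math.sin(0.5))
print(cos_value, math.cos(0.5))
```

With 16 iterations the residual angle is bounded by atan(2^-15), so the results agree with `math.sin` and `math.cos` to about four decimal places.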
Shixian Wang; Hengzhu Liu; Lun-guo Xie; Wenmin Hu
Cognitive radio has been a research hotspot because of its promise to improve the utilization of assigned but unused radio spectrum. To achieve this, the cognitive engine, the intelligent part of a cognitive radio, should configure the system to adapt to its communication context. Although many artificial intelligence techniques have been proposed for realizing the cognitive engine, this paper proposes an agent-based realization method, which has been investigated in autonomic communication research. Based on the similarity between cognitive radio and autonomic communication, conceptual and architectural models of autonomic cognitive radio are proposed. Cognitive radio nodes realized with the proposed models can easily form an autonomic cognitive radio network, which can be viewed as an autonomic communication environment. The autonomic cognitive radio node is expressed using the autonomic communication element (ACE) architecture, and a realization method is given based on the open-source ACE toolkit, which establishes a simulation environment for cognitive radio research.
Circuits Systems and Signal Processing | 2014
Dongpei Liu; Hengzhu Liu; Li Zhou; Jianfeng Zhang; Botao Zhang
A simplified DFT-based algorithm and its VLSI implementation for accurate frequency estimation of a single-tone complex sinusoidal signal are investigated. The proposed algorithm estimates frequency by interpolation using Fourier coefficients. It consists of a coarse search followed by a fine search, and its performance closely approaches the Cramér–Rao lower bound (CRLB) even in the low-SNR region. Moreover, a pipelined triple-mode CORDIC architecture is designed to efficiently support complex multiplication, complex magnitude calculation and real division. The triple-mode CORDIC-based radix-4 architecture is employed for the hardware implementation of the frequency estimator, and is suitable not only for fast Fourier transformation but also for accurate frequency estimation. A frequency estimator with 1024-point samples is implemented and verified on FPGA. It works at 215 MHz on a Xilinx XC6VLX240T FPGA device, and uses 4,161 registers and 6,986 slice LUTs. ASIC synthesis results show that it requires an area of 60K equivalent NAND2 gates with a clock rate of 500 MHz in SMIC 0.18 μm technology. The total latency of the frequency estimator is 2336 cycles. The proposed architecture provides a good trade-off between hardware overhead, estimation performance and computation latency.
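The coarse-then-fine structure described above can be sketched in software. The abstract does not give the paper's simplified interpolation formula, so the fine search below substitutes a well-known interpolator on Fourier coefficients at half-bin offsets (in the style of Aboutanios and Mulgrew). The function names, the 64-sample length, and the two refinement iterations are assumptions, and a direct O(N²) DFT replaces the radix-4 FFT.

```python
import cmath
import math


def fourier_coefficient(samples, bin_index):
    """Fourier coefficient at a (possibly fractional) DFT bin index."""
    total = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * bin_index * n / total)
               for n, s in enumerate(samples))


def estimate_frequency(samples, iterations=2):
    """Return the tone frequency in cycles per sample."""
    total = len(samples)

    # Coarse search: integer bin with the largest DFT magnitude.
    peak = max(range(total),
               key=lambda k: abs(fourier_coefficient(samples, k)))

    # Fine search: refine the fractional offset from two coefficients
    # located half a bin above and below the current estimate.
    delta = 0.0
    for _ in range(iterations):
        upper = fourier_coefficient(samples, peak + delta + 0.5)
        lower = fourier_coefficient(samples, peak + delta - 0.5)
        delta += 0.5 * ((upper + lower) / (upper - lower)).real
    return (peak + delta) / total


# Hypothetical noiseless test tone located between bins 10 and 11.
true_frequency = 10.3 / 64
tone = [cmath.exp(2j * math.pi * true_frequency * n) for n in range(64)]
print(estimate_frequency(tone), true_frequency)
```

The fine search needs only complex multiplications, a complex sum and difference, and one division per iteration, which is the kind of arithmetic the abstract maps onto its triple-mode CORDIC unit.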
IEICE Electronics Express | 2012
Li Zhou; Hengzhu Liu; Botao Zhang
This paper presents a flexible and high-efficiency decoder for turbo product codes using extended Hamming codes. The supported component codes range from (8, 4) to (128, 120) to provide enough flexibility for various communication standards. A novel, highly efficient Chase decoder architecture is developed using a low-complexity algorithm. Moreover, a conflict-free interleaved memory access model for variable lengths is provided. Results in a 90 nm standard cell technology show that the decoder sustains a maximum throughput of 5.6 Gbps and consumes 300 k gates.
International Symposium on Signals, Systems and Electronics | 2010
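The range of component codes quoted above follows from the standard parameters of extended Hamming codes, (n, k) = (2^m, 2^m − m − 1). The short sketch below lists them for m = 3 to 7. The code rate it prints assumes a two-dimensional product code that uses the same component code on rows and columns, which is an assumption rather than something the abstract states.

```python
def extended_hamming_params(m):
    """Return (n, k) of the extended Hamming code of order m."""
    n = 2 ** m
    k = n - m - 1      # m parity bits plus one overall parity bit
    return n, k


for m in range(3, 8):
    n, k = extended_hamming_params(m)
    # Assumed 2-D product code with identical row and column component codes.
    product_rate = (k / n) ** 2
    print(f"({n}, {k})  product code rate = {product_rate:.3f}")
```

The first and last lines of the output correspond to the (8, 4) and (128, 120) endpoints named in the abstract.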
Shixian Wang; Botao Zhang; Hengzhu Liu; Lunguo Xie
Accurate and timely spectrum sensing is critical to cognitive radio. Cyclostationary feature detection can separate the signal of interest from noise and/or interference, but the computational complexity of cyclic spectral analysis limits its use as a signal analysis tool. To reduce this complexity, this paper proposes an efficient parallel FFT accumulation method (FAM) algorithm and realizes it on GAEA, a novel SDR processor for next-generation wireless communication. The simulation results show that the parallel FAM algorithm takes approximately 78.8 ms to process an 8 MHz bandwidth signal at a frequency resolution of 512 Hz. This approach is suitable for spectrum sensing in cognitive radio and for other cyclostationary feature detection applications.
IEICE Electronics Express | 2014
Zhengfa Liang; Hengzhu Liu; Botao Zhang; Benzhang Wang
This paper presents a real-time hardware accelerator for single image haze removal using dark channel prior and guided filter on an FPGA chip. Single image haze removal using dark channel prior and guided filter is one of the state-of-the-art algorithms recently proposed. However, its heavy computational load limits its real-time processing ability. In this paper, we therefore design an FPGA-based hardware accelerator for single image haze removal, which takes full advantage of the parallel processing ability of the hardware and the parallelism of the algorithm. Specifically, 1) the dark channel calculation and the atmospheric light calculation parts of the algorithm are modified to reduce the amount of computation; 2) two pipelines are applied in the guided filtering to speed up the processing; 3) in addition, a fast mean filtering technique is used to accelerate the mean filtering, which is the main calculation of the guided filter, by avoiding redundant computation. To the best of our knowledge, this is also the first FPGA design for single image haze removal using dark channel prior and guided filtering. The design processes a 720 × 576 image in 13.74 ms at 100 MHz, and gives almost the same results as the original algorithm.
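The idea of speeding up mean filtering by avoiding redundant computation can be illustrated with a running-sum filter. The one-dimensional sketch below is a generic example, not the paper's hardware design: the function names and the shrinking-window handling at the borders are assumptions. It compares a naive filter, which re-adds every sample in every window, with a version that updates the window sum by one addition and one subtraction per output.

```python
def naive_mean_filter(values, radius):
    """Recompute the full window sum for every output sample."""
    result = []
    for i in range(len(values)):
        low = max(0, i - radius)
        high = min(len(values), i + radius + 1)
        window = values[low:high]
        result.append(sum(window) / len(window))
    return result


def running_sum_mean_filter(values, radius):
    """Same output, but each sample enters and leaves the sum exactly once."""
    length = len(values)
    result = []
    count = min(length, radius + 1)       # window centred on index 0
    window_sum = sum(values[:count])
    for i in range(length):
        result.append(window_sum / count)
        entering = i + radius + 1         # sample joining the next window
        leaving = i - radius              # sample dropping out of it
        if entering < length:
            window_sum += values[entering]
            count += 1
        if leaving >= 0:
            window_sum -= values[leaving]
            count -= 1
    return result


pixels = [12, 40, 7, 99, 3, 58, 21, 64, 5, 30]
print(naive_mean_filter(pixels, 2) == running_sum_mean_filter(pixels, 2))  # True
```

The cost per output no longer depends on the window radius. A two-dimensional box mean can be built from the same routine by filtering along rows and then along columns.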
IEEE Micro | 2014
Shuming Chen; Yaohua Wang; Sheng Liu; Jianghua Wan; Haiyan Chen; Hengzhu Liu; Kai Zhang; Xiangyuan Liu; Xi Ning
Vector-SIMD architectures have gained increasing attention because of their high performance in signal-processing applications. However, the performance of existing vector-SIMD architectures remains limited because of their inefficiency in the coordinated exploitation of different hardware units. To solve this problem, this article proposes the FT-Matrix architecture, which improves the coordination of traditional vector-SIMD architectures from three aspects: the cooperation between the scalar and SIMD unit is refined with the dynamic coupling execution scheme, the communication among SIMD lanes is enhanced with the matrix-style communication, and data sharing among vector memory banks is accomplished by the unaligned vector memory accessing scheme. Evaluation results show an average performance gain of 58.5 percent against vector-SIMD architectures without the proposed improvements. A four-core chip with each core built on the FT-Matrix architecture is also under fabrication.
Journal of Computers | 2012
Wenmin Hu; Zhonghai Lu; Hengzhu Liu; Axel Jantsch
Multicast is an important traffic mode on multi-core systems, and efficient hardware support for multicast can greatly improve the performance of the whole system. Most multicast solutions use dimension-order routing to generate the multicast trees, which are neither bandwidth nor power efficient. This article presents a synthesizable router for network-on-chip (NoC) which supports arbitrarily shaped multicast paths on a mesh topology. In our scheme, incremental setup is adopted to simplify the process of multicast tree construction. For each sub-path setup, we present a novel scheme called two period sub-path setup (TPSS). TPSS is divided into two periods: routing to a predetermined intermediate router, and updating lookup tables from the intermediate router to the destination. This novel setup makes it feasible to support arbitrarily shaped path setup. In our case study, the Optimized tree algorithm (OPT) and the Left-XY-Right-Optimized tree algorithm (LXYROPT) are proposed for power-efficient path searching, but they need to be pre-configured because of their high computation cost. Moreover, Virtual Circuit Tree Multicasting (VCTM) is also supported in our scheme for dynamic construction of multicast paths, which needs no computation in path searching. The performance is evaluated using a cycle-accurate simulator developed in SystemC, and the hardware overhead is estimated using a synthesizable HDL model. Compared to VCTM (without FIFO, multicast table and network adapter), the area overhead of implementing our router is negligible (less than 0.5%).
2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip | 2011
Wenmin Hu; Zhonghai Lu; Axel Jantsch; Hengzhu Liu; Botao Zhang; Dongpei Liu
A low-latency path setup approach with multiple setup packets for parallel set is presented. It reduces the header overhead compared to multiaddress encoding. Further, we propose four variants of deadlock-free multicast routing algorithms using different subpath generation methods, different destination partitioning, and channel sharing strategies. Experimental results show that the quatuor partitions path-like tree outperforms other algorithms.