Hengzhu Liu

National University of Defense Technology

Publication

Featured research published by Hengzhu Liu.

Asia and South Pacific Design Automation Conference | 2011

Power-efficient tree-based multicast support for networks-on-chip

Wenmin Hu; Zhonghai Lu; Axel Jantsch; Hengzhu Liu

In this paper, novel hardware support for multicast on mesh Networks-on-Chip (NoC) is proposed. It supports multicast routing on tree-based paths of any shape. Two power-efficient tree-based multicast routing algorithms, Optimized tree (OPT) and Left-XY-Right-Optimized tree (LXYROPT), are also proposed. The XY tree-based (XYT) algorithm and multiple unicast copies (MUC) are implemented on the router as baselines. As the number of destinations increases, OPT and LXYROPT achieve a remarkable improvement over MUC in both latency and throughput, while the average power consumption is reduced by 50% and 45%, respectively. Compared with XYT, OPT is 10% higher in latency but saves 17% in power consumption, while LXYROPT is 3% lower in latency and 8% lower in power consumption. In some cases, OPT and LXYROPT achieve power savings of up to 70% relative to XYT.
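To give a feel for why tree-based multicast can beat multiple unicast copies, the following minimal sketch counts link traversals on a mesh as a crude stand-in for link energy. It only models the two baselines named in the abstract (MUC and an XY tree); it is not the OPT or LXYROPT algorithm, and the function names, coordinates, and the link-count metric are all assumptions made for illustration.

```python
def xy_path(src, dst):
    """Directed links on the XY (dimension-order) route from src to dst."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:                      # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                      # then along Y
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def muc_link_traversals(src, dests):
    """Multiple unicast copies: every destination gets its own packet."""
    return sum(len(xy_path(src, d)) for d in dests)


def xy_tree_link_traversals(src, dests):
    """XY tree: shared path prefixes are traversed once, then the packet forks."""
    used = set()
    for d in dests:
        used.update(xy_path(src, d))
    return len(used)


# Hypothetical example: one source, four destinations in the same column.
src = (0, 0)
dests = [(3, 0), (3, 1), (3, 2), (3, 3)]
muc = muc_link_traversals(src, dests)       # 3 + 4 + 5 + 6 = 18
tree = xy_tree_link_traversals(src, dests)  # 3 shared X links + 3 Y links = 6
print(muc, tree)
```

In this toy case the tree uses a third of the link traversals of MUC. The actual savings reported in the paper depend on its router design and power model, which this sketch does not attempt to reproduce.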

IEICE Electronics Express | 2012

Adaptive recoding CORDIC

Jianfeng Zhang; Hengzhu Liu; Wenmin Hu; Dongpei Liu; Botao Zhang

The conventional Coordinate Rotation Digital Computer (CORDIC) algorithm is an effective way to implement trigonometric functions, and it has been widely applied in digital signal processing. In this paper, we propose a novel CORDIC architecture based on Scaling Free (SF) theory, which adopts an adaptive recoding method to reduce the number of iterations and eliminate the scaling factor. The proposed scheme has been synthesized using 65 nm CMOS technology with a standard cell library. Compared with SFB4C, our scheme saves 21.11% to 34.64% of the hardware overhead with no performance penalty.
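For context, here is a minimal sketch of the conventional rotation-mode CORDIC that work of this kind starts from. It is the textbook baseline, not the paper's adaptive recoding or scaling-free scheme: it runs every iteration and compensates for the gain with a precomputed scale factor, which are the two costs the abstract says the proposed architecture reduces. Floating point is used for readability; hardware would use fixed-point shifts and adds. The function name and the default iteration count are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Conventional rotation-mode CORDIC; valid for |theta| up to about 1.74 rad."""
    # Elementary rotation angles atan(2^-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Scale factor K = 1 / prod(sqrt(1 + 2^-2i)) compensates the CORDIC gain.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta             # start pre-scaled so no final multiply
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0     # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x                         # (sin(theta), cos(theta))


s, c = cordic_sin_cos(0.5)
print(s, math.sin(0.5))
print(c, math.cos(0.5))
```

Each iteration adds roughly one bit of accuracy, so the iteration count is tied to the word length in the conventional algorithm.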

IEEE International Conference on Communication Software and Networks | 2011

Cognitive radio simulation environment realization based on autonomic communication

Shixian Wang; Hengzhu Liu; Lun-guo Xie; Wenmin Hu

Cognitive radio has been a research hotspot because of its promise to improve the utilization of assigned but unused radio spectrum. To achieve this, the cognitive engine, the intelligent part of a cognitive radio, should configure the system to adapt to its communication context. Although many artificial intelligence techniques have been proposed for realizing the cognitive engine, this paper proposes an agent-based realization method, which has been investigated in autonomic communication research. Based on the similarity between cognitive radio and autonomic communication, conceptual and architectural models of the autonomic cognitive radio are proposed. Cognitive radio nodes realized with the proposed models can easily form an autonomic cognitive radio network, which can be viewed as an autonomic communication environment. The autonomic cognitive radio node is expressed using the autonomic communication element (ACE) architecture, and a realization method is given based on the open-source ACE toolkit, which establishes a simulation environment for cognitive radio research.

Circuits Systems and Signal Processing | 2014

Computationally Efficient Architecture for Accurate Frequency Estimation with Fourier Interpolation

Dongpei Liu; Hengzhu Liu; Li Zhou; Jianfeng Zhang; Botao Zhang

A simplified DFT-based algorithm and its VLSI implementation for accurate frequency estimation of a single-tone complex sinusoidal signal are investigated. The proposed algorithm estimates frequency by interpolation using Fourier coefficients. It consists of a coarse search followed by a fine search, and its performance closely approaches the Cramér–Rao lower bound (CRLB) even in the low-SNR region. Moreover, a pipelined triple-mode CORDIC architecture is designed to efficiently support complex multiplication, complex magnitude calculation and real division. The triple-mode CORDIC-based radix-4 architecture is employed for the hardware implementation of the frequency estimator, and is suitable not only for fast Fourier transformation but also for accurate frequency estimation. A frequency estimator with 1024-point samples is implemented and verified on FPGA. It works at 215 MHz on a Xilinx XC6VLX240T FPGA device, and uses 4,161 registers and 6,986 slice LUTs. ASIC synthesis results show that it requires an area of 60K equivalent NAND2 gates with a clock rate of 500 MHz in SMIC 0.18 μm technology. The total latency of the frequency estimator is 2336 cycles. The proposed architecture provides a good trade-off among hardware overhead, estimation performance and computation latency.
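The coarse-then-fine structure described above can be sketched in a few lines. The fine step below uses one well-known interpolation on Fourier coefficients, the iterative half-bin estimator commonly attributed to Aboutanios and Mulgrew; the abstract does not state which interpolator the paper simplifies, so treat this as an illustrative stand-in rather than the authors' algorithm. The function names, the direct O(N²) DFT, and the default of two iterations are assumptions.

```python
import cmath
import math


def dft_bin(x, k):
    """DFT coefficient of x at a (possibly fractional) bin index k."""
    n_samples = len(x)
    return sum(s * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, s in enumerate(x))


def estimate_frequency(x, fs=1.0, iterations=2):
    """Estimate the frequency of a single complex tone in the range [0, fs)."""
    n_samples = len(x)
    # Coarse search: integer bin with the largest DFT magnitude.
    m = max(range(n_samples), key=lambda k: abs(dft_bin(x, k)))
    # Fine search: refine the fractional offset from two half-bin coefficients.
    delta = 0.0
    for _ in range(iterations):
        x_plus = dft_bin(x, m + delta + 0.5)
        x_minus = dft_bin(x, m + delta - 0.5)
        delta += 0.5 * ((x_plus + x_minus) / (x_plus - x_minus)).real
    return (m + delta) * fs / n_samples


# Hypothetical noiseless test tone sitting 0.3 bins above bin 10.
N = 64
tone = [cmath.exp(2j * math.pi * 10.3 * n / N) for n in range(N)]
print(estimate_frequency(tone) * N)   # close to 10.3
```

A hardware version would take the coarse search from an FFT rather than a direct DFT; the sketch trades speed for brevity and does not model noise.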

IEICE Electronics Express | 2012

Flexible and high-efficiency turbo product code decoder design

Li Zhou; Hengzhu Liu; Botao Zhang

This paper presents a flexible and high-efficiency decoder for turbo product codes using extended Hamming codes. The supported component codes range from (8, 4) to (128, 120) to provide enough flexibility for various communication standards. A novel, highly efficient Chase decoder architecture is developed using a low-complexity algorithm. Moreover, a conflict-free interleave memory access model for variable lengths is provided. Results in a 90 nm standard cell technology show that the decoder sustains a maximum throughput of 5.6 Gbps and consumes 300 k gates.
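As a concrete reference point, the smallest component code in the stated range, the (8, 4) extended Hamming code, can be sketched as follows. This shows only encoding and hard-decision syndrome decoding; the paper's decoder is a Chase (soft-input) decoder, which is not reproduced here. The bit ordering, function names, and status strings are assumptions chosen for illustration.

```python
def encode(data):
    """Encode 4 data bits into an (8, 4) extended Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    word = [p1, p2, d1, p3, d2, d3, d4]     # Hamming(7, 4) positions 1..7
    p0 = 0
    for bit in word:
        p0 ^= bit                           # overall parity extends to 8 bits
    return word + [p0]


def decode(received):
    """Hard-decision decode: correct one error, detect two."""
    word = list(received[:7])
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos                 # XOR of positions of set bits
    overall = 0
    for bit in received:
        overall ^= bit
    if overall == 1:                        # odd error count: assume one error
        if syndrome:
            word[syndrome - 1] ^= 1         # syndrome points at the bad bit
        status = "corrected"                # syndrome 0 means the parity bit itself
    elif syndrome:
        status = "double-error detected"
    else:
        status = "ok"
    return [word[2], word[4], word[5], word[6]], status


codeword = encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[4] ^= 1                           # flip one bit
print(decode(corrupted))                    # ([1, 0, 1, 1], 'corrected')
```

In a turbo product code, each row and column of a data block is protected by a component code of this family, and the decoder iterates between rows and columns.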

International Symposium on Signals, Systems and Electronics | 2010

Parallelized cyclostationary feature detection on a software defined radio processor

Shixian Wang; Botao Zhang; Hengzhu Liu; Lunguo Xie

Accurate and timely spectrum sensing is critical to cognitive radio. Cyclostationary feature detection can separate the signal of interest from noise and/or interference, but the computational complexity of cyclic spectral analysis limits its use as a signal analysis tool. To reduce this complexity, this paper proposes an efficient parallel FFT accumulation method (FAM) algorithm and realizes it on GAEA, a novel SDR processor for next-generation wireless communication. The simulation results show that the parallel FAM algorithm requires approximately 78.8 ms to process an 8 MHz bandwidth signal at a frequency resolution of 512 Hz. This approach is suitable for spectrum sensing in cognitive radio and for other cyclostationary feature detection applications.

IEICE Electronics Express | 2014

Real-time hardware accelerator for single image haze removal using dark channel prior and guided filter

Zhengfa Liang; Hengzhu Liu; Botao Zhang; Benzhang Wang

This paper presents a real-time hardware accelerator for single image haze removal using dark channel prior and guided filter on an FPGA chip. Single image haze removal using dark channel prior and guided filter is one of the state-of-the-art algorithms recently proposed. However, its heavy computational load limits its real-time processing ability. In this paper, we therefore design an FPGA-based hardware accelerator for single image haze removal, which takes full advantage of the parallel processing ability of the hardware and the parallelism of the algorithm. Specifically, 1) the dark channel calculation and the atmospheric light calculation are modified to reduce the amount of computation; 2) two pipelines are applied in the guided filtering to speed up the processing; 3) a fast mean filtering technique is used to accelerate the mean filtering, which is the main calculation of the guided filter, by avoiding redundant computation. To the best of our knowledge, this is the first FPGA design for single image haze removal using dark channel prior and guided filtering. The design processes a 720 × 576 image in 13.74 ms at 100 MHz and gives almost the same results as the original algorithm.
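The idea of accelerating mean filtering by avoiding redundant computation can be illustrated with a summed-area table (integral image), which makes the cost per pixel independent of the window radius. The abstract does not say which fast mean filter the accelerator uses, so this is a standard technique swapped in for illustration, not the paper's design. The function names and the clipped-border handling are assumptions.

```python
def box_mean_naive(img, r):
    """Mean over a (2r+1) x (2r+1) window, re-summing every window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = sum(img[j][i] for j in range(y0, y1) for i in range(x0, x1))
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def box_mean_fast(img, r):
    """Same result using a summed-area table: four lookups per pixel."""
    h, w = len(img), len(img[0])
    # integral[y][x] holds the sum of img[0..y-1][0..x-1].
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


# Hypothetical 4 x 5 integer test image.
image = [[(3 * x + 7 * y) % 11 for x in range(5)] for y in range(4)]
print(box_mean_fast(image, 1) == box_mean_naive(image, 1))
```

The guided filter is built from several such mean filters, which is why speeding up this one primitive matters for the whole pipeline.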

IEEE Micro | 2014

FT-Matrix: A Coordination-aware Architecture for Signal Processing

Shuming Chen; Yaohua Wang; Sheng Liu; Jianghua Wan; Haiyan Chen; Hengzhu Liu; Kai Zhang; Xiangyuan Liu; Xi Ning

Vector-SIMD architectures have gained increasing attention because of their high performance in signal-processing applications. However, the performance of existing vector-SIMD architectures remains limited because of their inefficiency in the coordinated exploitation of different hardware units. To solve this problem, this article proposes the FT-Matrix architecture, which improves the coordination of traditional vector-SIMD architectures from three aspects: the cooperation between the scalar and SIMD unit is refined with the dynamic coupling execution scheme, the communication among SIMD lanes is enhanced with the matrix-style communication, and data sharing among vector memory banks is accomplished by the unaligned vector memory accessing scheme. Evaluation results show an average performance gain of 58.5 percent against vector-SIMD architectures without the proposed improvements. A four-core chip with each core built on the FT-Matrix architecture is also under fabrication.

Journal of Computers | 2012

TPSS: A Flexible Hardware Support for Unicast and Multicast on Network-on-Chip

Wenmin Hu; Zhonghai Lu; Hengzhu Liu; Axel Jantsch

Multicast is an important traffic mode on multi-core systems, and efficient hardware support for multicast can greatly improve the performance of the whole system. Most multicast solutions use dimension-order routing to generate the multicast trees, which are neither bandwidth nor power efficient. This article presents a synthesizable router for network-on-chip (NoC) which supports arbitrarily shaped multicast paths based on a mesh topology. In our scheme, incremental setup is adopted to simplify the process of multicast tree construction. For each sub-path setup, we present a novel scheme called two period sub-path setup (TPSS). TPSS is divided into two periods: routing to a predetermined intermediate router, and updating lookup tables from the intermediate router to the destination. This novel setup makes it feasible to support arbitrarily shaped path setup. In our case study, the Optimized tree algorithm (OPT) and the Left-XY-Right-Optimized tree algorithm (LXYROPT) are proposed for power-efficient path searching, but they need to be pre-configured because of their high computation cost. Moreover, Virtual Circuit Tree Multicasting (VCTM) is also supported in our scheme for dynamic construction of multicast paths, which needs no computation in path searching. The performance is evaluated using a cycle-accurate simulator developed in SystemC, and the hardware overhead is estimated using a synthesizable HDL model. Compared to VCTM (without FIFO, multicast table and network adapter), the area overhead of implementing our router is negligible (less than 0.5%).

2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip | 2011

Network-on-Chip multicasting with low latency path setup

Wenmin Hu; Zhonghai Lu; Axel Jantsch; Hengzhu Liu; Botao Zhang; Dongpei Liu

A low-latency path setup approach with multiple setup packets for parallel set is presented. It reduces the header overhead compared to multiaddress encoding. Further, we propose four variants of deadlock-free multicast routing algorithms using different subpath generation methods, different destination partitioning, and channel sharing strategies. Experimental results show that the quatuor partitions path-like tree outperforms other algorithms.