Wenping Zhu | Researchain

Publication

Featured research published by Wenping Zhu.

design automation conference | 2015

A 127 fps in full hd accelerator based on optimized AKAZE with efficiency and effectiveness for image feature extraction

Guangli Jiang; Leibo Liu; Wenping Zhu; Shouyi Yin; Shaojun Wei

Visual feature extraction is a fundamental technique in vision-based applications. This paper proposes an effective and efficient VLSI architecture based on an optimized accelerated KAZE (AKAZE) algorithm for real-time feature extraction. AKAZE is a new feature detection algorithm with strong robustness for object recognition. To extract features more robustly and reduce hardware resources, a two-dimensional pipeline array named the Loop-Snake Architecture is presented. It takes advantage of computational similarity across octaves and provides flexibility in the precision-speed tradeoff on the fly. Furthermore, a Polar Local Difference Binary descriptor and the corresponding structure are proposed to greatly reduce the memory bandwidth requirement and improve speed. The experimental results indicate that the optimized algorithm maintains the same accuracy as the original algorithm. The whole hardware system achieves 127 fps at 1080p resolution with a 200 MHz clock frequency. The throughput is twice that of state-of-the-art solutions.

International Journal of Electronics | 2014

Motion-sensor fusion-based gesture recognition and its VLSI architecture design for mobile devices

Wenping Zhu; Leibo Liu; Shouyi Yin; Siqi Hu; Eugene Y. Tang; Shaojun Wei

With the rapid proliferation of smartphones and tablets, various embedded sensors are incorporated into these platforms to enable multimodal human–computer interfaces. Gesture recognition, as an intuitive interaction approach, has been extensively explored in the mobile computing community. However, most gesture recognition implementations to date are user-dependent and rely only on the accelerometer. To achieve competitive accuracy, users are required to hold the device in a predefined manner during operation. In this paper, a high-accuracy human gesture recognition system is proposed based on the fusion of multiple motion sensors. Furthermore, to reduce the energy overhead resulting from frequent sensor sampling and data processing, a highly energy-efficient VLSI architecture implemented on a Xilinx Virtex-5 FPGA board is also proposed. Compared with the pure software implementation, a speed-up of approximately 45 times is achieved while operating at 20 MHz. The experiments show that the average accuracy for 10 gestures reaches 93.98% in the user-independent case and 96.14% in the user-dependent case when subjects hold the device in an arbitrary manner while completing the specified gestures. Although a few percent lower than the best conventional result, this accuracy is still competitive and acceptable for practical usage. Most importantly, the proposed system allows users to hold the device in an arbitrary manner while performing the predefined gestures, which substantially enhances the user experience.

Sensors | 2015

A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction.

Guangli Jiang; Leibo Liu; Wenping Zhu; Shouyi Yin; Shaojun Wei

This paper proposes a real-time feature extraction VLSI architecture for high-resolution images based on the accelerated KAZE algorithm. Firstly, a new system architecture is proposed. It increases the system throughput, provides flexibility in image resolution, and offers trade-offs between speed and scaling robustness. The architecture consists of a two-dimensional pipeline array that fully utilizes computational similarities in octaves. Secondly, a substructure (block-serial discrete-time cellular neural network) that can realize a nonlinear filter is proposed. This structure decreases the memory demand through the removal of data dependency. Thirdly, a hardware-friendly descriptor is introduced in order to overcome the hardware design bottleneck through the polar sample pattern; a simplified method to realize rotation invariance is also presented. Finally, the proposed architecture is designed in TSMC 65 nm CMOS technology. The experimental results show a performance of 127 fps in full HD resolution at 200 MHz frequency. The peak performance reaches 181 GOPS and the throughput is double the speed of other state-of-the-art architectures.
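As a rough sanity check on the figures quoted in this abstract, the frame rate and clock frequency can be converted into an average number of input pixels consumed per clock cycle. The sketch below is only back-of-the-envelope arithmetic: it assumes that "full HD" means 1920 × 1080 pixels, and the function name `pixels_per_cycle` is hypothetical. It says nothing about how the accelerator is organized internally.

```python
def pixels_per_cycle(width, height, fps, clock_hz):
    """Average input pixels that must be consumed per clock cycle
    to sustain the given frame rate (illustrative arithmetic only)."""
    pixels_per_second = width * height * fps
    return pixels_per_second / clock_hz


# Assumed full HD frame size (1920 x 1080), with the 127 fps and
# 200 MHz figures taken from the abstract.
rate = pixels_per_cycle(1920, 1080, 127, 200e6)
print(f"{rate:.3f} pixels per cycle")
```

Under these assumptions the design has to sustain a little more than one input pixel per clock cycle on average.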

design automation conference | 2017

A 700fps Optimized Coarse-to-Fine Shape Searching Based Hardware Accelerator for Face Alignment

Qiang Wang; Leibo Liu; Wenping Zhu; Huiyu Mo; Chenchen Deng; Shaojun Wei

In this work, an accelerator based on a fast shape searching face alignment (F-SSFA) algorithm is proposed to achieve real-time processing. Firstly, a learning-based low-dimensional SURF feature is introduced to reduce the computation cost in the cascaded regression. Then the Euclidean distance and shape affine transformation are utilized to accelerate the shape searching procedure. F-SSFA therefore greatly reduces the computational complexity while keeping the same accuracy. In addition, a fixed-point F-SSFA-based VLSI architecture is designed with an approximately 80% decrease in data transmission traffic. The accelerator achieves a throughput of 700 fps, which makes it especially suitable for high-speed face-related applications.

IEEE Transactions on Circuits and Systems for Video Technology | 2016

A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator

Wenping Zhu; Leibo Liu; Guangli Jiang; Shouyi Yin; Shaojun Wei

Binary image descriptors, which derive image feature descriptions directly from local image patches, are widely adopted in mobile and embedded applications due to their lower computational complexity and memory requirements. With the aim of improving computation efficiency without degrading recognition performance, this paper proposes a lightweight, robust binary descriptor based on an analysis of state-of-the-art binary descriptors. A directional edge detection and an optimized keypoint score function are developed to refine the keypoints. In addition, rotation invariance is achieved by executing circular symmetric-based descriptor generation and a coarse-grained orientation calculation method concurrently. The experimental results demonstrate that the proposed keypoint detector and binary descriptor achieve more than two times speedup and at least 23.6% improvement in processing speed with comparable performance, respectively. Furthermore, a very large scale integration architecture is designed based on in-depth exploration of bit-level and task-level parallelism. Based on postlayout simulation in a TSMC 65-nm CMOS process, the accelerator achieves 135 frames/s on 1080p images while consuming only 87.5 mW at a 200-MHz operating frequency.

international symposium on circuits and systems | 2014

A 65 nm uneven-dual-core SoC based platform for multi-device collaborative computing

Wenping Zhu; Leibo Liu; Shouyi Yin; Yuan Dong; Shaojun Wei; Eugene Y. Tang; Jiqiang Song; Jinzhan Peng

Collaborative computing based on multiple mobile devices is emerging with the rapid proliferation of smart mobile devices such as smartphones and tablets, which provide always-on connectivity, information and communication. However, due to severe resource poverty and poor network connectivity, many traditional embedded electronic devices with attractive features cannot be conveniently incorporated into this computing paradigm. In this paper, an uneven-dual-core SoC, which integrates a CPU core and an MCU core on a single chip with support for multiple operating systems, is proposed to realize loosely coupled collaboration among multiple heterogeneous devices. A network file system, MRFS (Multi-client Raindrop File System), and FAT-X (File Allocation Table eXtension) are also proposed to provide client-centric cross-device data consistency and virtual file access, respectively. Comprehensive mobile services are enabled by offloading appropriate tasks from existing smart mobile devices to the traditional embedded devices involved. The SoC is implemented in 16.65 mm² of silicon with 65 nm CMOS technology. This paper also presents three typical applications to illustrate the universality of the proposed system and its large potential for innovative usage models.

international conference on electric information and control engineering | 2012

Gesture Recognition Approach on FPGA via Dynamic Time Warping

Siqi Hu; Leibo Liu; Shouyi Yin; Wenping Zhu; Eugene Tang; Shaojun Wei

This paper presents an FPGA-based hardware implementation of the Dynamic Time Warping (DTW) algorithm, which is commonly employed in gesture recognition systems. To meet the real-time operating requirement, a parallel processing structure is designed. To work around the high memory consumption of DTW, a memory reuse scheme is used to eliminate unnecessary storage. The proposed system was evaluated on a database of 600 samples for 10 gestures collected from 6 participants, and a recognition rate of 96.14% was achieved. Compared with software implementations reported in recent literature, the system response time drops substantially to an acceptable value. The method provides competitive performance and can be used in real-time recognition systems.
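To make the memory-reuse idea concrete, here is a minimal software sketch of DTW that keeps only two rows of the cost matrix instead of the full table. This is a common textbook formulation, not the paper's hardware design, whose details the abstract does not give. The function name `dtw_distance`, the one-dimensional inputs, and the absolute-difference local cost are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """DTW distance between two 1-D sequences.

    Only the previous and current rows of the accumulated-cost
    matrix are stored, so memory is O(len(b)) instead of
    O(len(a) * len(b)).
    """
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # prev[j] holds the accumulated cost of the previous row.
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)  # assumed local distance
            # Extend the cheapest of the three allowed predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr  # the older row is no longer needed
    return prev[len(b)]


# A time-stretched copy of a sequence aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Each cell depends only on its left, upper, and upper-left neighbors, which is why the older rows can be discarded once the next row has been computed.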

ieee international conference on solid-state and integrated circuit technology | 2010

A novel application data coordinator for mobile computing systems

Wenping Zhu; Leibo Liu; Shouyi Yin; Eugene Y. Tang; Jiqiang Song; Qian Huang; Shaojun Wei

Recent developments and technological advances in information and communication technologies are leading to increasing availability and functionality of portable devices, with improved QoS (Quality of Service) of wireless connections together with decreasing costs. As a consequence, the power consumption of portable devices and the required transmission bandwidth are rapidly increasing. This paper proposes a novel architecture named the FAST (Fast Application Service Transfer) coordinator, which aims at extending the battery lifetime of portable devices and adaptively regulating the network bandwidth of wireless communications in mobile computing environments. As a coprocessor capable of application-aware transfer acceleration with little involvement of the CPU, the architecture provides a direct and fast interface between the application software and the communication network. The coordinator has been implemented on an FPGA-based prototype platform. Emulation showed that CPU usage in the target client system (Atom Z500 processor at 800 MHz) was reduced by nearly 90% by moving most of the computing-intensive tasks to the proposed coordinator. Moreover, the transmission data rate was reduced from 9.8 Mbps to 2.4 Mbps. With these improvements, the FAST coordinator can be used in a wide range of applications, including multimedia service delivery, web browsing, and collaborative computing.

Archive | 2012