Dafei Huang
National University of Defense Technology
Publications
Featured research published by Dafei Huang.
British Machine Vision Conference | 2015
Dafei Huang; Lei Luo; Mei Wen; Zhaoyun Chen; Chunyuan Zhang
Among the increasingly complicated trackers in the visual tracking area, recently proposed correlation filter based trackers have achieved appealing performance despite their great simplicity and superior speed. However, the filter input is a bounding box of fixed size, so they cannot inherently adapt to changes in the target’s scale and aspect ratio. Although scale-adaptive variants have been proposed, they are not flexible enough due to their pre-defined scale sampling schemes. Moreover, to the best of our knowledge, no correlation filter variant has been proposed to handle aspect ratio variation. To tackle this problem, this paper integrates the class-agnostic detection proposal method, which is widely adopted in the object detection area, into a correlation filter tracker, and presents the KCFDP tracker. The correlation filter part of KCFDP is based on KCF [2] with some modifications. We extend the HOG feature in KCF to a combination of HOG, intensity, and color naming by simply concatenating the three features, resulting in 42 feature channels. The model updating scheme in KCF, which is simple linear interpolation, is replaced with a more robust scheme presented in [1]. EdgeBoxes [4] is adopted to generate flexible detection proposals and enable the scale and aspect ratio adaptability of our tracker. It traverses the whole image in a sliding window manner and scores every sampled bounding box according to the number of contours that are wholly enclosed. To accelerate EdgeBoxes and produce fewer unnecessary proposals, we set the minimum proposal area and aspect ratio range dynamically in sliding window sampling according to the current target size. In the tracking pipeline, KCF is first performed to estimate the preliminary target location \(l_d\). Within a patch \(z_d\) extracted from the current frame, KCF locates the target center according to the location of the maximum element in \(f\): \(f(z_d) = k^{x z_d} \cdot \alpha\), (1)
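The peak-location step described above (finding the maximum element of the response map f) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `locate_peak` and `peak_to_shift` are hypothetical, the response values are made up, and the cyclic wrap-around convention for turning a peak index into a displacement is an assumption commonly used with correlation filters.

```python
def locate_peak(response):
    """Return the (row, col) index of the maximum element in a 2-D response map."""
    best = (0, 0)
    for r, row in enumerate(response):
        for c, value in enumerate(row):
            if value > response[best[0]][best[1]]:
                best = (r, c)
    return best


def peak_to_shift(peak, shape):
    """Convert a peak index into a signed displacement.

    Assumption: the response map is cyclic, so indices past the midpoint
    are interpreted as negative shifts.
    """
    shift = []
    for index, size in zip(peak, shape):
        shift.append(index - size if index > size // 2 else index)
    return tuple(shift)


# Made-up 4x4 response map whose maximum sits in the last column of row 0.
response = [
    [0.1, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]

peak = locate_peak(response)
shift = peak_to_shift(peak, (len(response), len(response[0])))
print(peak, shift)
```

Under the wrap-around assumption, a peak in the last column corresponds to a one-pixel shift in the negative horizontal direction.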
International Journal of Computer Vision | 2017
Dafei Huang; Lei Luo; Zhaoyun Chen; Mei Wen; Chunyuan Zhang
The newly proposed correlation filter based trackers can achieve appealing performance despite their great simplicity and superior speed. However, trackers of this kind have no built-in scale and aspect ratio adaptability, which results in suboptimal tracking accuracy. To tackle this problem, this paper integrates the class-agnostic detection proposal method, which is widely adopted in the object detection area, into a correlation filter tracker. In the tracker part, optimizations such as feature integration, robust model updating and proposal rejection are applied for efficient integration. As for proposal generation, through integrating and comparing four detection proposal generators along with two baseline methods, the quality of detection proposals is found to have considerable influence on tracking accuracy. Therefore, as the most promising proposal generator, EdgeBoxes is chosen and further enhanced with background suppression. Evaluations are mainly performed on a challenging 50-sequence dataset (OTB50) and its two subsets, 28 sequences with significant scale variation and 14 sequences with obvious aspect ratio change. Among the trackers equipped with different proposal generators, state-of-the-art trackers and existing correlation filter variants, our proposed tracker reports the highest accuracy while running efficiently at an average speed of 20.4 frames per second. Additionally, a per-sequence numerical performance analysis and experimental results on the VOT2014 dataset are presented to enable deeper insights into our approach.
European Conference on Parallel Processing | 2014
Dafei Huang; Mei Wen; Changqing Xun; Dong Chen; Xing Cai; Yuran Qiao; Nan Wu; Chunyuan Zhang
When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. When executing GPU-specific kernels on CPUs, local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns by using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by removing all the unwanted local-memory arrays together with the obsolete barrier statements. Experiments show that the automated transformation can satisfactorily improve OpenCL kernel performance on a Sandy Bridge CPU and Intel’s Many-Integrated-Core coprocessor.
Multimedia Signal Processing | 2012
Nan Wu; Mei Wen; Ju Ren; Huayou Su; Dafei Huang
Model-based design is widely accepted in developing parallel programs. The stream model, an emerging model-based programming method, shows surprisingly good efficiency in many compute-intensive domains, especially complex media processing. On this basis, this paper illustrates how to map the stream H.264 encoder onto concrete parallel processors such as stream processors. Results show that our encoder achieves significant speedup over the original X264 encoder on various programmable architectures.
High Performance Computing and Communications | 2013
Dong Chen; Changqing Xun; Dafei Huang; Mei Wen; Chunyuan Zhang
In this paper, we propose a framework that automatically maps single-device OpenCL programs to heterogeneous multi-device platforms while taking performance into account. Our framework is based on the independence of work groups, which is built into the OpenCL programming model, and relies heavily on knowledge of the global memory access regions of work groups. Global memory access patterns of work groups are therefore analyzed, and an abstract representation, CCRwS, is designed to describe the exact memory access regions of each memory access statement in the kernels. A global memory access analyzer is designed to obtain CCRwSs by performing static program analysis on kernel code. Based on CCRwSs, data transfer between multiple devices and the host can be fully controlled by our framework. A kernel code regenerator is then designed to distribute the workload and perform architecture-specific optimizations by code transformation. We tested our framework on a platform with 2 Intel E5-2650 CPUs and 4 NVIDIA Tesla C2050 GPUs. Compared with the performance on a single GPU, the kernels running on all 6 devices are about 4.5x faster.
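The idea of distributing independent work groups across devices, and transferring only the global memory each device needs, can be sketched roughly as follows. This is an illustration under strong assumptions, not the framework's CCRwS analysis: the functions `partition_work_groups` and `access_region` are hypothetical, the device weights are made-up relative throughputs, and each work group is assumed to access one contiguous slice of a 1-D buffer.

```python
def partition_work_groups(num_groups, weights):
    """Split work-group indices [0, num_groups) into contiguous ranges.

    Each device gets a share roughly proportional to its weight
    (a made-up stand-in for relative device throughput).
    """
    total = sum(weights)
    ranges = []
    start = 0
    accumulated = 0
    for i, weight in enumerate(weights):
        accumulated += weight
        if i == len(weights) - 1:
            end = num_groups  # last device takes whatever remains
        else:
            end = (num_groups * accumulated) // total
        ranges.append((start, end))
        start = end
    return ranges


def access_region(group_range, group_size):
    """Element range [lo, hi) touched by a range of work groups.

    Assumption: work group g reads and writes exactly the elements
    [g * group_size, (g + 1) * group_size) of a 1-D buffer.
    """
    first_group, last_group = group_range
    return (first_group * group_size, last_group * group_size)


# Hypothetical platform: two slower devices and one faster device.
ranges = partition_work_groups(12, [1, 1, 4])
regions = [access_region(r, 64) for r in ranges]
print(ranges)
print(regions)
```

With per-device regions known ahead of time, a host program only needs to copy each device's slice rather than the whole buffer. Real kernels have less regular access patterns, which is the case the abstract's static analysis is meant to handle.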
Network and Parallel Computing | 2017
Yuran Qiao; Junzhong Shen; Dafei Huang; Qianming Yang; Mei Wen; Chunyuan Zhang
Nowadays, the rapid growth of data across the Internet has provided sufficient labeled data to train deep structured artificial neural networks. While deeper structured networks bring about significant precision gains in many applications, they also pose an urgent demand for higher computation capacity at the expense of power consumption. To this end, various FPGA based deep neural network accelerators have been proposed for higher performance and lower energy consumption. However, the development cycle of an FPGA application is much longer than that of a CPU or GPU application. Although FPGA vendors such as Altera and Xilinx have released OpenCL frameworks to ease programming, tuning OpenCL code for desirable performance on FPGAs is still challenging. In this paper, we look into the OpenCL implementation of a Convolutional Neural Network (CNN) on FPGA. By analysing how a CPU/GPU oriented version executes on FPGA, we identify the causes of the performance difference between FPGA and CPU/GPU and locate the performance bottlenecks. According to our analysis, we put forward a corresponding optimization method focusing on external memory transfers. We implement a prototype system on an Altera Stratix V A7 FPGA, which achieves a considerable 4.76× speedup over the original version. To the best of our knowledge, this implementation outperforms most of the previous OpenCL implementations on FPGA by a large margin.
Journal of Zhejiang University Science C | 2017
Zhaoyun Chen; Lei Luo; Dafei Huang; Mei Wen; Chunyuan Zhang
Recently correlation filter based trackers have attracted considerable attention for their high computational efficiency. However, they cannot handle occlusion and scale variation well enough. This paper aims at preventing the tracker from failure in these two situations by integrating the depth information into a correlation filter based tracker. By using RGB-D data, we construct a depth context model to reveal the spatial correlation between the target and its surrounding regions. Furthermore, we adopt a region growing method to make our tracker robust to occlusion and scale variation. Additional optimizations such as a model updating scheme are applied to improve the performance for longer video sequences. Both qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed tracker performs favourably against state-of-the-art algorithms.
Journal of Zhejiang University Science C | 2015
Mei Wen; Dafei Huang; Changqing Xun; Dong Chen
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL’s local memory on multi-core/many-core CPUs may actually degrade performance, because local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that performs this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, and which is accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel’s many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
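To make the transformation concrete, the following Python sketch emulates a GPU-style kernel that stages data in a local-memory tile behind a barrier, next to a coarsened CPU-style version in which the local array and the barrier have been removed. The 3-point stencil, the function names, and the zero-padding at the array boundaries are all assumptions chosen for illustration; the sketch does not reproduce the authors' tool chain or their array-access descriptors.

```python
def stencil_gpu_style(data, group_size):
    """Emulate a GPU-style kernel: each work group stages a tile in a local array."""
    n = len(data)
    out = [0] * n
    for group_start in range(0, n, group_size):
        group_end = min(group_start + group_size, n)
        # Phase 1: copy the tile plus a one-element halo on each side
        # into "local memory" (out-of-range elements are treated as 0).
        local = []
        for g in range(group_start - 1, group_end + 1):
            local.append(data[g] if 0 <= g < n else 0)
        # A barrier(CLK_LOCAL_MEM_FENCE) would separate the two phases
        # in the OpenCL kernel.
        # Phase 2: every work item reads its neighbours from the tile.
        for lid in range(group_end - group_start):
            out[group_start + lid] = local[lid] + local[lid + 1] + local[lid + 2]
    return out


def stencil_cpu_style(data, group_size):
    """Coarsened CPU-style version: one loop per work group, no local array, no barrier."""
    n = len(data)
    out = [0] * n

    def at(i):
        # Read global memory directly; out-of-range elements are treated as 0.
        return data[i] if 0 <= i < n else 0

    for group_start in range(0, n, group_size):
        for gid in range(group_start, min(group_start + group_size, n)):
            out[gid] = at(gid - 1) + at(gid) + at(gid + 1)
    return out


data = [1, 2, 3, 4, 5]
print(stencil_gpu_style(data, 2))
print(stencil_cpu_style(data, 2))
```

Both versions produce the same output. The second avoids the extra copy and the synchronization point, which is the kind of overhead the abstract describes as costly on CPUs, where caches already provide the locality that local memory supplies on GPUs.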
Computer Graphics and Imaging | 2013
Dafei Huang; Mei Wen; Yungang Xue; Nan Wu; Ju Ren; Huayou Su; Chunyuan Zhang
Traditional panoramic image stitching, which refers to single-viewpoint cylindrical or spherical panorama generation, has two crucial limitations when it is used for plane-like scenes. One is the severe distortion induced by warping, and the other is that the optical center of the camera must be fixed. However, existing stitching methods designed for plane-like scenes also have some drawbacks, such as ghosting and limited adaptability. In this paper, an Automatic Stitching System for Images of Plane-like Scenes (ASSIPS) is proposed to overcome these limitations and drawbacks. In ASSIPS, input images are taken with a handheld camera that is translated along the scene. The images are first used to estimate their camera poses and calculate the sparse structure of the scene. Then a dominant plane is found and all the input images are warped according to the plane. Finally, the warped images are aligned incrementally with a similarity model, and a simplified multiband blending algorithm is used for image blending. Results of ASSIPS for various plane-like scenes show that it can provide visually attractive panoramic images that look as if they were taken from a direction perpendicular to the scene, and that it effectively eliminates stitching seams, blurs and ghosts.
IEICE Transactions on Information and Systems | 2013
Jun Chai; Mei Wen; Nan Wu; Dafei Huang; Jing Yang; Xing Cai; Chunyuan Zhang; Qianming Yang