Dapeng Tao
Yunnan University
Publications
Featured research published by Dapeng Tao.
IEEE Transactions on Circuits and Systems for Video Technology | 2013
Dapeng Tao; Lianwen Jin; Yongfei Wang; Yuan Yuan; Xuelong Li
With the rapid development of intelligent video surveillance (IVS), person re-identification, a difficult yet unavoidable problem in video surveillance, has received increasing attention in recent years. This is because computing capacity has advanced remarkably and person re-identification plays a critical role in video surveillance systems. In short, person re-identification aims to find again an individual who has been observed across different cameras. It has been reported that KISS metric learning achieves state-of-the-art performance for person re-identification on the VIPeR dataset. However, given a small training set, the estimate of the inverse of a covariance matrix is not stable, and the resulting performance can therefore be poor. In this paper, we present regularized smoothing KISS metric learning (RS-KISS), which seamlessly integrates smoothing and regularization techniques for robustly estimating covariance matrices. RS-KISS is superior to KISS because it effectively enlarges the underestimated small eigenvalues and reduces the overestimated large eigenvalues of the estimated covariance matrix. Given additional data, RS-KISS can obtain a more robust model; however, retraining RS-KISS on all available examples in a straightforward way is time consuming, so we introduce incremental learning to RS-KISS. We conduct thorough experiments on the VIPeR dataset and verify that 1) RS-KISS outperforms all previously reported results for person re-identification and 2) incremental RS-KISS performs as well as RS-KISS while significantly reducing the computational cost.
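The covariance-smoothing idea behind this family of KISS variants can be sketched in a few lines of NumPy. The snippet below is a minimal illustration rather than the paper's exact formulation: it blends the eigenvalues of each covariance estimate toward their mean (raising underestimated small eigenvalues and shrinking overestimated large ones) before forming the KISS metric; the weight `beta` and the helper names are assumptions made for illustration.

```python
import numpy as np

def smooth_covariance(cov, beta=0.2):
    """Blend eigenvalues toward their mean: enlarges underestimated small
    eigenvalues and shrinks overestimated large ones (illustrative stand-in
    for the RS-KISS smoothing/regularization step; beta is hypothetical)."""
    eigval, eigvec = np.linalg.eigh(cov)
    eigval = (1.0 - beta) * eigval + beta * eigval.mean()
    return (eigvec * eigval) @ eigvec.T

def kiss_metric(diff_sim, diff_dis, beta=0.2):
    """KISS-style Mahalanobis metric M = inv(Sigma_S) - inv(Sigma_D), estimated
    from difference vectors x_i - x_j of similar (diff_sim) and dissimilar
    (diff_dis) pairs, each an array of shape (n_pairs, dim)."""
    sigma_s = smooth_covariance(diff_sim.T @ diff_sim / len(diff_sim), beta)
    sigma_d = smooth_covariance(diff_dis.T @ diff_dis / len(diff_dis), beta)
    return np.linalg.inv(sigma_s) - np.linalg.inv(sigma_d)

def kiss_distance(M, x, y):
    """Squared Mahalanobis distance between two feature vectors under M."""
    d = x - y
    return float(d @ M @ d)
```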
Information Sciences | 2014
Jun Yu; Dapeng Tao; Jonathan Li; Jun Cheng
How do we accurately browse a large set of images or efficiently annotate the images in an image library? Image clustering methods are invaluable tools for applications such as content-based image retrieval and image annotation. To perform these tasks, it is critical to have proper features that describe the visual and semantic content of images and to define an accurate distance metric to measure the dissimilarity between any two images. However, existing methods, which adopt features such as color histograms, edge direction histograms, and shape context, lack the ability to describe semantic content. To solve this problem, we propose a new approach that utilizes user-provided pairwise constraints to describe the semantic relationship between two images. A Semantic Preserving Distance Metric Learning (SP-DML) algorithm is developed to explore the complementary characteristics of the visual features and pairwise constraints in a unified feature space. In this space, the learned distance metric can be used to measure the dissimilarity between two images. Specifically, the manifold structure adopted in SP-DML is revealed by the images' visual features. To integrate semantic content into distance metric learning, SP-DML utilizes the pairwise constraints to build semantic patches and aligns these patches to obtain the optimal distance metric for the new feature space. Experimental results in image clustering demonstrate that the performance of SP-DML is appealing.
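As a rough illustration of how user-provided pairwise constraints can reshape a visual-feature distance metric, the sketch below uses a much simpler technique than SP-DML's semantic-patch alignment: RCA-style whitening of the covariance within must-link groups. The function names and the chunklet representation of the constraints are assumptions made for illustration; this is not the SP-DML algorithm.

```python
import numpy as np

def rca_whitening(X, chunklets):
    """RCA-style metric from must-link constraints.

    X: (n_samples, dim) visual features; chunklets: list of index lists,
    each containing samples the user marked as semantically similar.
    Returns a transform W so that d(x, y) = ||W (x - y)|| measures
    dissimilarity in the constraint-adjusted space.
    """
    dim = X.shape[1]
    cov = np.zeros((dim, dim))
    count = 0
    for idx in chunklets:
        grp = X[idx] - X[idx].mean(axis=0)   # center each must-link group
        cov += grp.T @ grp
        count += len(idx)
    cov /= count
    # inverse square root of the within-chunklet covariance
    eigval, eigvec = np.linalg.eigh(cov)
    eigval = np.maximum(eigval, 1e-12)       # guard against rank deficiency
    return (eigvec * eigval ** -0.5) @ eigvec.T

def constrained_distance(W, x, y):
    """Distance between two image feature vectors under the learned transform."""
    return float(np.linalg.norm(W @ (x - y)))
```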
IEEE Transactions on Systems, Man, and Cybernetics | 2016
Dapeng Tao; Xu Lin; Lianwen Jin; Xuelong Li
Chinese character font recognition (CCFR) has received increasing attention as intelligent applications based on optical character recognition become popular. However, traditional CCFR systems do not handle noisy data effectively. By analyzing the basic strokes of Chinese characters in detail, we propose that font recognition for a single Chinese character is a sequence classification problem, which can be effectively solved by recurrent neural networks. For robust CCFR, we integrate a principal component convolution layer with 2-D long short-term memory (2DLSTM) and develop the principal component 2DLSTM (PC-2DLSTM) algorithm. PC-2DLSTM considers two aspects: 1) the principal component convolution layer helps remove noise and obtain rational and complete font information and 2) 2DLSTM performs long-range contextual processing along the scan directions, which helps capture the contrast between the character trajectory and the background. Experiments on a frequently used CCFR dataset suggest the effectiveness of PC-2DLSTM compared with other state-of-the-art font recognition methods.
IEEE Transactions on Neural Networks | 2016
Dapeng Tao; Jun Cheng; Mingli Song; Xu Lin
Saliency detection is used to identify the most important and informative area of a scene, and it is widely used in various vision tasks, including image quality assessment, image matching, and object recognition. Manifold ranking (MR) has been used to great effect for saliency detection, since it not only incorporates local spatial information but also utilizes the labeling information from background queries. However, MR completely ignores the feature information extracted from each superpixel. In this paper, we propose an MR-based matrix factorization (MRMF) method to overcome this limitation. MRMF models the ranking problem in the matrix factorization framework and embeds query sample labels in the coefficients. By incorporating spatial information and embedding labels, MRMF enforces similar saliency values on neighboring superpixels and ranks superpixels according to the learned coefficients. We prove that MRMF has good generalizability and develop an efficient optimization algorithm based on the Nesterov method. Experiments using popular benchmark datasets illustrate the promise of MRMF compared with other state-of-the-art saliency detection methods.
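For context, the manifold ranking baseline that MRMF extends has a simple closed form. The sketch below is a minimal NumPy illustration of plain MR over a superpixel affinity graph with background queries; it does not reproduce the MRMF factorization itself, and the parameter `alpha` and the use of boundary superpixels as queries are assumptions taken from the standard MR formulation.

```python
import numpy as np

def manifold_ranking(W, query_labels, alpha=0.99):
    """Plain manifold ranking over a superpixel graph (the MR baseline).

    W: (n, n) symmetric affinity matrix between superpixels;
    query_labels: length-n indicator vector (1 for query superpixels,
    e.g. boundary/background superpixels, 0 otherwise).
    Returns ranking scores via the closed form
    f* = (I - alpha * S)^{-1} y, with S the normalized affinity.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, query_labels)

# In MR-based saliency detection, superpixels are ranked against background
# queries and the saliency map is taken as the complement of the ranking score.
```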
IEEE Transactions on Image Processing | 2016
Dapeng Tao; Yanan Guo; Mingli Song; Yaotang Li; Zhengtao Yu; Yuan Yan Tang
Person re-identification aims to match images of pedestrians across different camera views at different locations. This is a challenging intelligent video surveillance problem that remains an active area of research due to the need for performance improvement. Person re-identification involves two main steps: feature representation and metric learning. Although the keep it simple and straightforward (KISS) metric learning method for discriminative distance metric learning has been shown to be effective for person re-identification, the estimate of the inverse of a covariance matrix is unstable, and indeed may not exist, when the training set is small, resulting in poor performance. Here, we present dual-regularized KISS (DR-KISS) metric learning. By regularizing the two covariance matrices, DR-KISS improves on KISS by reducing the overestimation of the large eigenvalues of the two estimated covariance matrices and, in doing so, guarantees that the covariance matrices are invertible. Furthermore, we provide theoretical analyses to support these motivations. Specifically, we first prove why the regularization is necessary, and then prove that the proposed method is robust in terms of generalization. We conduct extensive experiments on three challenging person re-identification datasets, VIPeR, GRID, and CUHK01, and show that DR-KISS achieves new state-of-the-art performance.
IEEE Transactions on Systems, Man, and Cybernetics | 2015
Dapeng Tao; Lianwen Jin; Yongfei Wang; Xuelong Li
In recent years, person reidentification has received growing attention with the increasing popularity of intelligent video surveillance. This is because person reidentification is critical for human tracking across multiple cameras. Recently, keep it simple and straightforward (KISS) metric learning has been regarded as a top-level algorithm for person reidentification. The covariance matrices of KISS are estimated by maximum likelihood (ML) estimation. It is known that discriminative learning based on the minimum classification error (MCE) criterion is more reliable than classical ML estimation as the number of training samples increases. For small training sets, however, direct MCE KISS does not work well because of the estimation error of the small eigenvalues. We therefore further introduce a smoothing technique to improve the estimates of the small eigenvalues of a covariance matrix. Our new scheme is termed minimum classification error KISS (MCE-KISS). We conduct thorough validation experiments on the VIPeR and ETHZ datasets, which demonstrate the robustness and effectiveness of MCE-KISS for person reidentification.
Information Sciences | 2015
Chaoqun Hong; Jun Yu; Jane You; Xuhui Chen; Dapeng Tao
View-based methods are popular in 3D object recognition. However, current methods with traditional classifiers are usually based on one-to-one view matching and fail to capture the structural information of multiple views. Some multi-view methods take different views into consideration, but they still treat the views separately. In this paper, we propose a novel 3D object recognition method based on multi-view data fusion, called Multi-view Ensemble Manifold Regularization (MEMR). In this method, we model image features with a regularization term for the SVM, and we train this modified SVM via multi-view learning with alternating optimization. Hypergraph construction is used to better capture the connectivity among views. Experimental results show that the recognition accuracy improves by 20-25%, demonstrating the effectiveness of the proposed method.
IEEE Transactions on Systems, Man, and Cybernetics | 2013
Dapeng Tao; Lianwen Jin; Zhao Yang; Xuelong Li
With the rapid development of RGB-D sensors and the rapidly growing popularity of the low-cost Microsoft Kinect sensor, scene classification, a hard yet important problem in computer vision, has recently gained a resurgence of interest. This is because the depth information provided by the Kinect sensor opens an effective and innovative avenue for scene classification. In this paper, we propose a new scheme for scene classification that applies locality-constrained linear coding (LLC) to local SIFT features to represent the RGB-D samples and classifies scenes through the cooperation between a new rank preserving sparse learning (RPSL) based dimension reduction and a simple classification method. RPSL considers four aspects: 1) it preserves the rank order information of the within-class samples in a local patch; 2) it maximizes the margin between the between-class samples on the local patch; 3) the L1-norm penalty is introduced to obtain the parsimony property; and 4) it models the classification error minimization by utilizing the least-squares error minimization. Experiments are conducted on the NYU Depth V1 dataset and demonstrate the robustness and effectiveness of RPSL for scene classification.
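The LLC step of this pipeline can be sketched compactly. The snippet below is an illustrative implementation of approximated locality-constrained coding for a single local descriptor (the general LLC technique, not this paper's specific configuration); the codebook, the number of neighbors `k`, and the regularizer `beta` are assumptions, and the RPSL dimension reduction is not reproduced here.

```python
import numpy as np

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximated locality-constrained linear coding for one descriptor.

    x: (d,) local descriptor, e.g. a SIFT vector; codebook: (M, d) bases.
    Returns a sparse (M,) code whose non-zeros sit on the k nearest bases.
    """
    M = codebook.shape[0]
    # 1. find the k codebook bases nearest to x
    dist = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dist)[:k]
    # 2. solve the small constrained least-squares problem on those bases
    z = codebook[idx] - x                 # shift the local bases to x
    C = z @ z.T                           # (k, k) local covariance
    C += beta * np.trace(C) * np.eye(k)   # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                          # enforce the sum-to-one constraint
    code = np.zeros(M)
    code[idx] = w
    return code

# An RGB-D sample is then represented by pooling the LLC codes of its local
# SIFT descriptors before dimension reduction and classification.
```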
Information Sciences | 2017
Xinghao Yang; Weifeng Liu; Dapeng Tao; Jun Cheng
In recent years, deep learning has attracted increasing attention in the machine learning and artificial intelligence communities. Many deep network architectures, such as deep neural networks (DNNs), convolutional neural networks (CNNs), the wavelet scattering network (ScatNet), and the principal component analysis network (PCANet), have been proposed. Among them, PCANet is particularly effective and has achieved promising performance in image classification tasks such as face, object, and handwritten digit recognition. However, PCANet can only handle data represented by single-view features. In this paper, we present a canonical correlation analysis network (CCANet) to address image classification in which images are represented by two-view features. CCANet learns two-view multistage filter banks by the canonical correlation analysis (CCA) method and constructs a cascaded convolutional deep network. We then combine the filters with binarization and block-wise histogram processes to form the final deep structure. In addition, we introduce a variation of CCANet, dubbed RandNet-2, in which the filter banks are randomly generated. Extensive experiments are conducted using the ETH-80, Yale-B, and USPS databases for object classification, face classification, and handwritten digit classification, respectively. The experimental results demonstrate that the CCANet algorithm is more effective than PCANet, RandNet-1, and RandNet-2.
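The filter-learning step of a CCANet-style network reduces to classical two-view CCA on vectorized image patches. Below is a minimal NumPy sketch of that CCA computation under stated assumptions (a small ridge term `reg` for numerical stability, function names chosen for illustration); patch extraction, the cascaded second stage, binarization, and block-wise histogramming are omitted.

```python
import numpy as np

def cca_directions(X, Y, n_components, reg=1e-5):
    """Classical CCA between two views of (vectorized) image patches.

    X: (n, p) patches from view 1, Y: (n, q) patches from view 2.
    Returns (Wx, Wy): leading projection directions for each view; in a
    CCANet-style pipeline these would be reshaped into convolution filters.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # inverse square root of a symmetric positive-definite matrix
        eigval, eigvec = np.linalg.eigh(C)
        return (eigvec * eigval ** -0.5) @ eigvec.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt[:n_components].T
    return Wx, Wy
```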
IEEE Transactions on Multimedia | 2017
Yanhua Yang; Cheng Deng; Shangqian Gao; Wei Liu; Dapeng Tao; Xinbo Gao
With the proliferation of low-cost and easy-to-operate depth cameras, skeleton-based human action recognition has been extensively studied in recent years. However, most existing methods treat all 3D joints of a human skeleton as identical. In fact, these 3D joints exhibit diverse responses to different action classes, and some joint configurations are more discriminative for distinguishing a particular action. In this paper, we propose a discriminative multi-instance multitask learning (MIMTL) framework to discover the intrinsic relationship between joint configurations and action classes. First, a set of discriminative and informative joint configurations for the corresponding action class is captured by a multi-instance learning model that regards the action as a bag and the joint configurations as its instances. Then, a multitask learning model with group structure constraints is exploited to further reveal the intrinsic relationship between the joint configurations and the different action classes. We conduct extensive evaluations of MIMTL using three benchmark 3D action recognition datasets. The experimental results show that the proposed MIMTL framework performs favorably compared with several state-of-the-art approaches.