Publication


Featured research published by Pichao Wang.


IEEE Transactions on Human-Machine Systems | 2016

Action Recognition From Depth Maps Using Deep Convolutional Neural Networks

Pichao Wang; Wanqing Li; Zhimin Gao; Jing Zhang; Chang Tang; Philip Ogunbona

This paper proposes a new method, i.e., weighted hierarchical depth motion maps (WHDMM) + three-channel deep convolutional neural networks (3ConvNets), for human action recognition from depth maps on small training datasets. Three strategies are developed to leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating the 3-D points of the captured depth maps. This not only synthesizes more data, but also makes the trained ConvNets view-tolerant. Second, WHDMMs at several temporal scales are constructed to encode the spatiotemporal motion patterns of actions into 2-D spatial structures. The 2-D spatial structures are further enhanced for recognition by converting the WHDMMs into pseudocolor images. Finally, the three ConvNets are initialized with the models obtained from ImageNet and fine-tuned independently on the color-coded WHDMMs constructed in three orthogonal planes. The proposed algorithm was evaluated on the MSRAction3D, MSRAction3DExt, UTKinect-Action, and MSRDailyActivity3D datasets using cross-subject protocols. In addition, the method was evaluated on the large dataset constructed from the above datasets. The proposed method achieved 2-9% better results on most of the individual datasets. Furthermore, the proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased as the number of actions increased.
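
The two encoding steps described above are easy to prototype. Below is a minimal numpy sketch: accumulating weighted motion energy between depth frames at a given temporal scale, then pseudocoloring the single-channel map into a three-channel image that an ImageNet-pretrained ConvNet can consume. The exponential decay weighting and the rainbow-style color mapping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def whdmm(frames, temporal_scale=1, decay=0.9):
    """Accumulate weighted motion energy between frames sampled at a
    given temporal scale. `frames` is a list of 2-D depth arrays.
    The decay weighting is an illustrative choice."""
    sampled = frames[::temporal_scale]
    acc = np.zeros_like(sampled[0], dtype=np.float64)
    weight = 1.0
    for prev, cur in zip(sampled, sampled[1:]):
        acc += weight * np.abs(cur.astype(np.float64) - prev)
        weight *= decay  # later motion contributes less (assumption)
    return acc

def pseudocolor(dmm):
    """Map a single-channel motion map to a 3-channel pseudocolor
    image so ImageNet-pretrained ConvNets can consume it."""
    norm = (dmm - dmm.min()) / (dmm.max() - dmm.min() + 1e-8)
    # simple jet-like mapping: color varies with motion energy
    r = np.clip(1.5 - np.abs(4 * norm - 3), 0, 1)
    g = np.clip(1.5 - np.abs(4 * norm - 2), 0, 1)
    b = np.clip(1.5 - np.abs(4 * norm - 1), 0, 1)
    return np.stack([r, g, b], axis=-1)

# usage: several temporal scales of one synthetic depth sequence
frames = [np.random.rand(240, 320) for _ in range(30)]
maps = [pseudocolor(whdmm(frames, s)) for s in (1, 2, 4)]
```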


Pattern Recognition | 2016

RGB-D-based action recognition datasets

Jing Zhang; Wanqing Li; Philip Ogunbona; Pichao Wang; Chang Tang

Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets provide a useful resource for guiding an insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-a-vis limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for the collection of new datasets and use of evaluation protocols.

Highlights:
- A detailed review and in-depth analysis of 44 publicly available RGB-D-based action datasets.
- Recommendations on the selection of datasets and evaluation protocols for use in future research.
- Identification of some limitations of these datasets and evaluation protocols.
- Recommendations on the future creation of datasets and use of evaluation protocols.


ACM Multimedia | 2016

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

Pichao Wang; Zhaoyang Li; Yonghong Hou; Wanqing Li

Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition. How to use ConvNets effectively for video-based recognition is still an open problem. In this paper, we propose a compact, effective yet simple method to encode the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and adopt ConvNets to exploit the discriminative features for real-time human action recognition. The proposed method was evaluated on three public benchmarks, i.e., the MSRC-12 Kinect gesture dataset (MSRC-12), the G3D dataset and the UTD multimodal human action dataset (UTD-MHAD), and achieved state-of-the-art results.
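
The JTM encoding lends itself to a compact sketch. The snippet below, in the same hedged spirit, projects a skeleton sequence onto one plane and draws each joint's trajectory with hue encoding time; the paper additionally uses three orthogonal projection planes and encodes motion magnitude and direction, which are omitted here for brevity.

```python
import numpy as np
import colorsys

def joint_trajectory_map(skeleton, size=224):
    """Project a skeleton sequence (T x J x 3 array of joint
    coordinates) onto the XY plane and draw each joint's trajectory,
    encoding time as hue. A simplified rendering of the JTM idea."""
    T, J, _ = skeleton.shape
    img = np.zeros((size, size, 3))
    xy = skeleton[..., :2]
    lo, hi = xy.min(axis=(0, 1)), xy.max(axis=(0, 1))
    px = ((xy - lo) / (hi - lo + 1e-8) * (size - 1)).astype(int)
    for t in range(T):
        color = colorsys.hsv_to_rgb(t / T, 1.0, 1.0)  # hue encodes time
        for j in range(J):
            x, y = px[t, j]
            img[size - 1 - y, x] = color  # image row 0 at the top
    return img

# usage with a synthetic 40-frame, 20-joint sequence (random walk)
seq = np.cumsum(np.random.randn(40, 20, 3) * 0.01, axis=0)
jtm = joint_trajectory_map(seq)
```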


ACM Multimedia | 2015

ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

Pichao Wang; Wanqing Li; Zhimin Gao; Chang Tang; Jing Zhang; Philip Ogunbona

In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. Firstly, different viewpoints are mimicked by rotating virtual cameras around the subject represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured ones, but also makes the trained ConvNets view-tolerant. Secondly, DMMs are constructed and further enhanced for recognition by encoding them into Pseudo-RGB images, turning the spatial-temporal motion patterns into textures and edges. Lastly, through transfer learning from models originally trained on ImageNet for image classification, the three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results.
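
The virtual-camera step can be sketched as a rigid rotation of the captured point cloud about the subject, followed by re-rendering a depth map. The code below uses an orthographic re-projection as a simplification; the paper re-projects through the sensor's camera model, and the rotation axes and angle ranges here are illustrative assumptions.

```python
import numpy as np

def rotate_depth_points(points, yaw_deg, pitch_deg=0.0):
    """Rotate a point cloud (N x 3) as if a virtual camera orbited
    the subject. Rotation about the vertical (yaw) and lateral
    (pitch) axes; angles in degrees."""
    yaw, pitch = np.radians([yaw_deg, pitch_deg])
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    center = points.mean(axis=0)          # rotate about the subject
    return (points - center) @ (Ry @ Rx).T + center

def orthographic_depth(points, size=240):
    """Re-render rotated points as a depth map by orthographic
    projection, keeping the nearest point per pixel."""
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    px = ((xy - lo) / (hi - lo + 1e-8) * (size - 1)).astype(int)
    depth = np.full((size, size), np.inf)
    for (x, y), z in zip(px, points[:, 2]):
        depth[size - 1 - y, x] = min(depth[size - 1 - y, x], z)
    return np.where(np.isinf(depth), 0.0, depth)

# usage: synthesize a 30-degree side view of a synthetic cloud
cloud = np.random.rand(5000, 3)
view = orthographic_depth(rotate_depth_points(cloud, yaw_deg=30))
```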


IEEE Transactions on Circuits and Systems for Video Technology | 2018

Skeleton Optical Spectra-Based Action Recognition Using Convolutional Neural Networks

Yonghong Hou; Zhaoyang Li; Pichao Wang; Wanqing Li

This letter presents an effective method to encode the spatiotemporal information of a skeleton sequence into color texture images, referred to as skeleton optical spectra, and employs convolutional neural networks (ConvNets) to learn the discriminative features for action recognition. Such a spectrum representation makes it possible to use a standard ConvNet architecture to learn suitable “dynamic” features from skeleton sequences without training millions of parameters afresh, which is especially valuable when annotated training video data is insufficient. Specifically, the encoding consists of four steps: mapping of joint distribution, spectrum coding of joint trajectories, spectrum coding of body parts, and joint velocity weighted saturation and brightness. Experimental results on three widely used datasets have demonstrated the efficacy of the proposed method.
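
The last encoding step (velocity-weighted saturation and brightness) can be illustrated in a few lines. Below is a hedged sketch: each joint gets a fixed "spectral" hue by index, and per-frame joint speed scales saturation and value so faster motion renders more vividly. The specific hue assignment and scaling are assumptions; the paper's four-step encoding is more involved.

```python
import numpy as np
import colorsys

def spectrum_colors(skeleton):
    """For a T x J x 3 skeleton sequence, assign each joint a hue by
    its index (spectrum coding) and weight saturation/brightness by
    joint speed. Returns a T x J x 3 array of RGB colors to use when
    rendering the skeleton maps."""
    T, J, _ = skeleton.shape
    speed = np.zeros((T, J))
    speed[1:] = np.linalg.norm(np.diff(skeleton, axis=0), axis=-1)
    speed /= speed.max() + 1e-8
    colors = np.zeros((T, J, 3))
    for j in range(J):
        hue = j / J  # each joint gets a fixed spectral hue
        for t in range(T):
            s = v = 0.3 + 0.7 * speed[t, j]  # faster -> more vivid
            colors[t, j] = colorsys.hsv_to_rgb(hue, s, v)
    return colors

seq = np.cumsum(np.random.randn(40, 20, 3) * 0.01, axis=0)
rgb = spectrum_colors(seq)
```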


International Conference on Pattern Recognition | 2016

Large-scale Isolated Gesture Recognition using Convolutional Neural Networks

Pichao Wang; Wanqing Li; Song Liu; Zhimin Gao; Chang Tang; Philip Ogunbona

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI). These dynamic images are constructed from a sequence of depth maps using bidirectional rank pooling to effectively capture the spatial-temporal information. Such image-based representations enable us to fine-tune existing ConvNet models trained on image data for classification of depth sequences, without introducing a large number of parameters to learn. Upon the proposed representations, a Convolutional Neural Networks (ConvNets) based method is developed for gesture recognition and evaluated on the Large-scale Isolated Gesture Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. The method achieved 55.57% classification accuracy and ranked 2nd in the challenge, coming very close to the best performance even though only depth data was used.
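
Rank pooling collapses a sequence into one image whose pixels reflect temporal ordering. A widely used closed-form approximation (the "dynamic image" weights alpha_t = 2t - T - 1 of Bilen et al.) makes the idea concrete; note this is an approximation for illustration, whereas exact rank pooling solves a ranking machine. Bidirectional pooling simply applies it to the sequence and its reverse.

```python
import numpy as np

def approx_rank_pool(frames):
    """Collapse a list of 2-D frames into one image using the
    closed-form approximate rank pooling weights alpha_t = 2t - T - 1
    (the "dynamic image" approximation)."""
    T = len(frames)
    alphas = np.array([2 * (t + 1) - T - 1 for t in range(T)], float)
    stack = np.stack([f.astype(np.float64) for f in frames])
    # weighted sum over the time axis
    return np.tensordot(alphas, stack, axes=1)

def bidirectional_dynamic_images(frames):
    """Forward and backward pooling give two complementary images."""
    return approx_rank_pool(frames), approx_rank_pool(frames[::-1])

depth_seq = [np.random.rand(240, 320) for _ in range(16)]
fwd, bwd = bidirectional_dynamic_images(depth_seq)
```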


IEEE Signal Processing Letters | 2017

Joint Distance Maps Based Action Recognition With Convolutional Neural Networks

Chuankun Li; Yonghong Hou; Pichao Wang; Wanqing Li

Motivated by the promising performance achieved by deep learning, an effective yet simple method is proposed to encode the spatio-temporal information of skeleton sequences into color texture images, referred to as joint distance maps (JDMs); convolutional neural networks are then employed to exploit the discriminative features from the JDMs for human action and interaction recognition. The pair-wise distances between joints over a sequence of single- or multiple-person skeletons are encoded into color variations to capture temporal information. The efficacy of the proposed method has been verified by state-of-the-art results on the large NTU RGB+D dataset and the small UTD-MHAD dataset in both single-view and cross-view settings.
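
A minimal sketch of the JDM idea: rows index joint pairs, columns index frames, and hue sweeps with time so that the same distance at different moments renders in different colors. The pair enumeration and the hue/value mapping below are simplifying assumptions; the paper uses several joint-pair groupings and a more elaborate color coding.

```python
import numpy as np
import colorsys

def joint_distance_map(skeleton):
    """Encode a T x J x 3 skeleton sequence as an image whose rows
    are joint pairs and columns are frames; hue encodes time and
    brightness encodes normalized pairwise distance."""
    T, J, _ = skeleton.shape
    pairs = [(a, b) for a in range(J) for b in range(a + 1, J)]
    dist = np.zeros((len(pairs), T))
    for r, (a, b) in enumerate(pairs):
        dist[r] = np.linalg.norm(skeleton[:, a] - skeleton[:, b], axis=-1)
    dist /= dist.max() + 1e-8
    img = np.zeros((len(pairs), T, 3))
    for t in range(T):
        hue = t / T  # hue sweeps with time to capture dynamics
        for r in range(len(pairs)):
            img[r, t] = colorsys.hsv_to_rgb(hue, 1.0, dist[r, t])
    return img

seq = np.cumsum(np.random.randn(40, 20, 3) * 0.01, axis=0)
jdm = joint_distance_map(seq)
```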


International Conference on Pattern Recognition | 2016

Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

Pichao Wang; Wanqing Li; Song Liu; Yuyao Zhang; Zhimin Gao; Philip Ogunbona

This paper addresses the problem of continuous gesture recognition from sequences of depth maps using Convolutional Neural Networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into one image, is constructed and fed to a ConvNet for recognition. The IDMM effectively encodes both spatial and temporal information and allows fine-tuning with existing ConvNet models for classification without introducing millions of parameters to learn. The proposed method was evaluated on the Large-scale Continuous Gesture Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. It achieved a Mean Jaccard Index of 0.2655 and ranked 3rd in the challenge.
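
The segment-then-recognize pipeline is straightforward to sketch. Below, QOM is read as the count of pixels whose depth changed beyond a threshold between consecutive frames (a simplified reading of the paper's measure), gestures are cut where QOM falls to a rest level, and each segment is collapsed into an IDMM-style image by summing absolute frame differences. Thresholds and the rest level are illustrative assumptions.

```python
import numpy as np

def quantity_of_movement(frames, delta=0.05):
    """Per-frame QOM: pixels whose depth changed by more than
    `delta` relative to the previous frame (simplified)."""
    qom = [0]
    for prev, cur in zip(frames, frames[1:]):
        qom.append(int(np.sum(np.abs(cur - prev) > delta)))
    return np.array(qom)

def segment_gestures(qom, rest_level):
    """Split the sequence wherever QOM stays below rest_level."""
    active = qom > rest_level
    bounds, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            bounds.append((start, i))
            start = None
    if start is not None:
        bounds.append((start, len(qom)))
    return bounds

def idmm(frames):
    """IDMM-style image (sketch): sum of absolute frame-to-frame
    differences collapsed into one image."""
    acc = np.zeros_like(frames[0], dtype=np.float64)
    for prev, cur in zip(frames, frames[1:]):
        acc += np.abs(cur.astype(np.float64) - prev)
    return acc

seq = [np.random.rand(120, 160) for _ in range(64)]
q = quantity_of_movement(seq)
segments = segment_gestures(q, rest_level=np.median(q))
maps = [idmm(seq[s:e]) for s, e in segments if e - s > 2]
```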


IEEE Signal Processing Letters | 2017

Salient Object Detection via Weighted Low Rank Matrix Recovery

Chang Tang; Pichao Wang; Changqing Zhang; Wanqing Li

Image-based salient object detection is a useful and important technique that can improve the efficiency of several applications such as object detection, image classification/retrieval, object co-segmentation, and content-based image editing. In this letter, we present a novel weighted low-rank matrix recovery (WLRR) model for salient object detection. In order to facilitate efficient separation of salient objects from the background, a high-level background prior map is estimated by exploiting the properties of color, location, and boundary connectivity; this prior map is then incorporated into a weighting matrix that indicates the likelihood that each image region belongs to the background. The final salient object detection task is formulated as the WLRR model with the weighting matrix. Both quantitative and qualitative experimental results on three challenging datasets show competitive performance compared with 24 state-of-the-art methods.
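
The underlying decomposition is a weighted variant of robust PCA: the feature matrix D splits into a low-rank background L plus a sparse salient part S, with the l1 penalty on S weighted so that regions the prior marks as background are pushed toward zero saliency. The sketch below solves min ||L||_* + lam ||W o S||_1 s.t. D = L + S with a plain ADMM and singular value thresholding; it illustrates the weighted low-rank recovery idea under these assumptions, not the paper's exact model or solver.

```python
import numpy as np

def shrink(x, tau):
    """Elementwise soft-thresholding (tau may be a matrix)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0)

def weighted_lrr(D, W, lam=None, mu=1.0, iters=200):
    """Decompose D into low-rank L (background) plus sparse S
    (salient objects), weighting the l1 penalty on S by W."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(iters):
        # L-update: singular value thresholding
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt
        # S-update: weighted soft-thresholding
        S = shrink(D - L + Y / mu, lam * W / mu)
        # dual ascent on the constraint D = L + S
        Y += mu * (D - L - S)
    return L, S

# usage: rows/cols stand in for region features; larger weights
# where the background prior is strong suppress saliency there
D = np.random.rand(50, 80)
W = np.random.rand(50, 80) * 2
L, S = weighted_lrr(D, W)
```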


IEEE Signal Processing Letters | 2016

A Spectral and Spatial Approach of Coarse-to-Fine Blurred Image Region Detection

Chang Tang; Jin Wu; Yonghong Hou; Pichao Wang; Wanqing Li

Blur exists in many digital images and can be mainly categorized into two classes: defocus blur, which is caused by optical imaging systems, and motion blur, which is caused by relative motion between the camera and scene objects. In this letter, we propose a simple yet effective automatic blurred image region detection method. Based on the observation that blur attenuates high-frequency components of an image, we present a blur metric based on the log averaged spectrum residual to obtain a coarse blur map. Then, a novel iterative updating mechanism is proposed to refine the blur map from coarse to fine by exploiting the intrinsic relevance of similar neighboring image regions. The proposed iterative updating mechanism can partially resolve the problem of differentiating an in-focus smooth region from a blurred smooth region. In addition, our iterative updating mechanism can be integrated into other blurred image region detection algorithms to refine their final results. Both quantitative and qualitative experimental results demonstrate that the proposed method is more reliable and efficient than various state-of-the-art methods.
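
The coarse-map stage can be sketched directly from the stated observation: blur suppresses high frequencies, so the residual of the log amplitude spectrum (log spectrum minus its local average) carries little energy in blurred patches. The snippet below scores each patch by that residual's spread; patch size, filter width, and the std-based score are illustrative assumptions, and the paper's iterative refinement is omitted.

```python
import numpy as np

def box_blur(a, k=3):
    """Local average via a separable box filter (pure numpy)."""
    kernel = np.ones(k) / k
    a = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 1, a)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, 'same'), 0, a)

def spectral_blur_map(img, patch=32):
    """Coarse blur map: per patch, compute the residual of the log
    amplitude spectrum and score sharpness by the residual's spread.
    Sharp patches keep more high-frequency structure, so low scores
    mark likely-blurred regions."""
    H, W = img.shape
    rows, cols = H // patch, W // patch
    score = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            p = img[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            amp = np.abs(np.fft.fft2(p)) + 1e-8
            log_amp = np.log(amp)
            residual = log_amp - box_blur(log_amp)
            score[i, j] = residual.std()
    return score

img = np.random.rand(256, 256)
blur_map = spectral_blur_map(img)
```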

Collaboration


Dive into Pichao Wang's collaborations.

Top Co-Authors

Wanqing Li, University of Wollongong
Chang Tang, China University of Geosciences
Zhimin Gao, University of Wollongong
Jing Zhang, University of Wollongong
Jiajia Chen, Xuzhou Medical College
Song Liu, University of Wollongong
Xinwang Liu, National University of Defense Technology