Publication


Featured research published by Zhimin Gao.


IEEE Transactions on Human-Machine Systems | 2016

Action Recognition From Depth Maps Using Deep Convolutional Neural Networks

Pichao Wang; Wanqing Li; Zhimin Gao; Jing Zhang; Chang Tang; Philip Ogunbona

This paper proposes a new method, i.e., weighted hierarchical depth motion maps (WHDMM) + three-channel deep convolutional neural networks (3ConvNets), for human action recognition from depth maps on small training datasets. Three strategies are developed to leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating the 3-D points of the captured depth maps. This not only synthesizes more data, but also makes the trained ConvNets view-tolerant. Second, WHDMMs at several temporal scales are constructed to encode the spatiotemporal motion patterns of actions into 2-D spatial structures. The 2-D spatial structures are further enhanced for recognition by converting the WHDMMs into pseudocolor images. Finally, the three ConvNets are initialized with the models obtained from ImageNet and fine-tuned independently on the color-coded WHDMMs constructed in three orthogonal planes. The proposed algorithm was evaluated on the MSRAction3D, MSRAction3DExt, UTKinect-Action, and MSRDailyActivity3D datasets using cross-subject protocols. In addition, the method was evaluated on the large dataset constructed from the above datasets. The proposed method achieved 2-9% better results than existing methods on most of the individual datasets. Furthermore, the proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased as the number of actions increased.
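To make the viewpoint-mimicking step concrete, here is a minimal sketch of rotating the 3-D points of a depth map and re-rendering them as a synthetic view. It is an illustration rather than the authors' code; the camera intrinsics (fx, fy, cx, cy) and the single vertical rotation axis are illustrative assumptions.

```python
import numpy as np

def rotate_depth_view(depth, angle_deg, fx=365.0, fy=365.0, cx=160.0, cy=120.0):
    """Mimic a new camera viewpoint: back-project the depth map to 3-D
    points, rotate them about the vertical axis, and render them back
    into a depth map of the same size (nearest point wins per pixel)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32).ravel()
    valid = z > 0
    x = ((u.ravel() - cx) * z / fx)[valid]
    y = ((v.ravel() - cy) * z / fy)[valid]
    z = z[valid]
    a = np.deg2rad(angle_deg)
    # rotation about the y (vertical) axis
    xr = np.cos(a) * x + np.sin(a) * z
    zr = -np.sin(a) * x + np.cos(a) * z
    keep = zr > 0
    un = np.round(xr[keep] * fx / zr[keep] + cx).astype(int)
    vn = np.round(y[keep] * fy / zr[keep] + cy).astype(int)
    zn = zr[keep]
    out = np.zeros((h, w), dtype=np.float32)
    ok = (un >= 0) & (un < w) & (vn >= 0) & (vn < h)
    order = np.argsort(-zn[ok])          # write far points first, near last
    out[vn[ok][order], un[ok][order]] = zn[ok][order]
    return out
```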


ACM Multimedia | 2015

ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

Pichao Wang; Wanqing Li; Zhimin Gao; Chang Tang; Jing Zhang; Philip Ogunbona

In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. Firstly, different viewpoints are mimicked by rotating virtual cameras around the subject represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured ones, but also makes the trained ConvNets view-tolerant. Secondly, DMMs are constructed and further enhanced for recognition by encoding them into pseudo-RGB images, turning the spatial-temporal motion patterns into textures and edges. Lastly, through transfer learning from models originally trained on ImageNet for image classification, the three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results on all of them.
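The pseudocoloring idea can be sketched in a few lines: accumulate motion energy into a depth motion map and pass it through a rainbow-style coding so an ImageNet-pretrained ConvNet sees texture and edges. This is a simplified illustration; the piecewise-linear color coding below is a stand-in for the paper's exact scheme, and the three projection planes are not reproduced.

```python
import numpy as np

def depth_motion_map(depth_seq):
    """Accumulate absolute frame-to-frame depth differences of a
    (T, H, W) sequence into a single 2-D motion-energy map."""
    diffs = np.abs(np.diff(depth_seq.astype(np.float32), axis=0))
    return diffs.sum(axis=0)

def pseudocolor(dmm):
    """Encode the single-channel DMM as a pseudo-RGB image using a
    simple piecewise-linear rainbow coding, so that motion patterns
    become textures and edges."""
    d = (dmm - dmm.min()) / (np.ptp(dmm) + 1e-8)      # normalise to [0, 1]
    r = np.clip(1.5 - np.abs(4 * d - 3), 0, 1)
    g = np.clip(1.5 - np.abs(4 * d - 2), 0, 1)
    b = np.clip(1.5 - np.abs(4 * d - 1), 0, 1)
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

depth_seq = np.random.rand(32, 240, 320).astype(np.float32)  # toy clip
img = pseudocolor(depth_motion_map(depth_seq))               # (240, 320, 3)
```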


IEEE Journal of Biomedical and Health Informatics | 2017

HEp-2 Cell Image Classification With Deep Convolutional Neural Networks

Zhimin Gao; Lei Wang; Luping Zhou; Jianjia Zhang

Efficient Human Epithelial-2 cell image classification can facilitate the diagnosis of many autoimmune diseases. This paper proposes an automatic framework for this classification task, utilizing the deep convolutional neural networks (CNNs) that have recently attracted intensive attention in visual recognition. In addition to describing the proposed classification framework, this paper elaborates several interesting observations and findings obtained by our investigation. They include the important factors that impact network design and training, the role of rotation-based data augmentation for cell images, the effectiveness of cell image masks for classification, and the adaptability of the CNN-based classification system across different datasets. An extensive experimental study is conducted to verify the above findings and to compare the proposed framework with well-established image classification models in the literature. The results on benchmark datasets demonstrate that 1) the proposed framework can effectively outperform existing models by properly applying data augmentation, and 2) our CNN-based framework has excellent adaptability across different datasets, which is highly desirable for cell image classification under varying laboratory settings. Our system ranked highly in the cell image classification competition hosted at ICPR 2014.
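As a concrete illustration of rotation-based augmentation for cell images, which have no canonical orientation, here is a minimal sketch; the angle count and interpolation settings are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_augment(image, mask=None, n_rotations=8):
    """Generate evenly spaced in-plane rotations of a cell image.
    The cell mask, when given, is rotated by the same angle so it
    can still be applied to suppress background."""
    angles = np.linspace(0.0, 360.0, n_rotations, endpoint=False)
    images, masks = [], []
    for a in angles:
        images.append(rotate(image, a, reshape=False, order=1, mode="nearest"))
        if mask is not None:
            masks.append(rotate(mask.astype(np.float32), a,
                                reshape=False, order=0) > 0.5)
    return images, masks
```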


International Conference on Pattern Recognition | 2016

Large-scale Isolated Gesture Recognition using Convolutional Neural Networks

Pichao Wang; Wanqing Li; Song Liu; Zhimin Gao; Chang Tang; Philip Ogunbona

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI). These dynamic images are constructed from a sequence of depth maps using bidirectional rank pooling to effectively capture the spatial-temporal information. Such image-based representations enable us to fine-tune existing ConvNet models trained on image data for classification of depth sequences, without introducing large numbers of parameters to learn. Upon the proposed representations, a convolutional neural network (ConvNet) based method is developed for gesture recognition and evaluated in the Large-scale Isolated Gesture Recognition track of the ChaLearn Looking at People (LAP) challenge 2016. The method achieved 55.57% classification accuracy and ranked second in this challenge, coming very close to the best performance even though only depth data were used.
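Rank pooling itself can be approximated in closed form. The sketch below uses the harmonic-number coefficients from Bilen et al.'s dynamic-image approximation as a plausible stand-in; the paper's exact pooling setup (e.g. its normal-image inputs) is not reproduced.

```python
import numpy as np

def approx_rank_pool(frames):
    """Approximate rank pooling with the closed-form harmonic-number
    coefficients of Bilen et al.'s dynamic images: collapse a (T, H, W)
    sequence into one image encoding its temporal evolution."""
    T = len(frames)
    H = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, T + 1))])
    t = np.arange(1, T + 1)
    # alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1})
    alpha = 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])
    return np.tensordot(alpha, np.asarray(frames, dtype=np.float32), axes=1)

def bidirectional_rank_pool(frames):
    """One forward and one backward dynamic image per sequence, as in
    bidirectional rank pooling."""
    frames = np.asarray(frames)
    return approx_rank_pool(frames), approx_rank_pool(frames[::-1])
```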


International Conference on Pattern Recognition | 2016

Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

Pichao Wang; Wanqing Li; Song Liu; Yuyao Zhang; Zhimin Gao; Philip Ogunbona

This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neural networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into one image, is constructed and fed to a ConvNet for recognition. The IDMM effectively encodes both spatial and temporal information and allows fine-tuning with existing ConvNet models for classification without introducing millions of parameters to learn. The proposed method was evaluated in the Large-scale Continuous Gesture Recognition track of the ChaLearn Looking at People (LAP) challenge 2016. It achieved a mean Jaccard index of 0.2655 and ranked third in this challenge.
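One plausible reading of QOM-based segmentation is sketched below: measure per-frame movement against the resting posture of the first frame, then cut the sequence where movement falls back near its minimum. The threshold values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def quantity_of_movement(depth_seq, neutral, delta=60):
    """Per-frame quantity of movement: the number of pixels whose depth
    differs from the neutral (resting) posture by more than delta
    (depth units; 60 is an illustrative value)."""
    return (np.abs(depth_seq.astype(np.int32)
                   - neutral.astype(np.int32)) > delta).sum(axis=(1, 2))

def segment_gestures(depth_seq, low=0.1):
    """Split a continuous depth sequence into gestures at frames where
    the QOM falls back near its minimum, i.e. the performer returns to
    rest. Returns a list of (start, end) frame indices."""
    qom = quantity_of_movement(depth_seq, depth_seq[0])
    active = qom > low * qom.max()
    segments, inside, start = [], False, 0
    for i, a in enumerate(active):
        if a and not inside:
            start, inside = i, True
        elif not a and inside:
            segments.append((start, i))
            inside = False
    if inside:
        segments.append((start, len(active)))
    return segments
```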


IEEE Access | 2018

Spatially and Temporally Structured Global to Local Aggregation of Dynamic Depth Information for Action Recognition

Yonghong Hou; Shuang Wang; Pichao Wang; Zhimin Gao; Wanqing Li

This paper presents an effective yet simple video representation for RGB-D-based action recognition. It proposes to represent a depth map sequence as three pairs of structured dynamic images (DIs) at body, part, and joint levels, respectively, through hierarchical bidirectional rank pooling. Different from previous works that applied one convolutional neural network (ConvNet) for each part/joint separately, one pair of structured DIs is constructed from depth maps at each granularity level and serves as the input of a ConvNet. The structured DI not only preserves the spatial-temporal information but also enhances the structure information across both body parts/joints and different temporal scales. In addition, it requires little computation and memory to construct. This new representation, referred to as Spatially and Temporally Structured Dynamic Depth Images, aggregates motion and structure information in a depth sequence from global to fine-grained levels, and enables us to fine-tune existing ConvNet models trained on image data for classification of depth sequences, without a need for training the models afresh. The proposed representation is evaluated on six benchmark data sets, namely MSRAction3D, G3D, MSRDailyActivity3D, SYSU 3D HOI, UTD-MHAD, and M2I, and achieves state-of-the-art results on all six.
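To illustrate how one (forward, backward) pair of structured DIs per granularity level might be assembled, here is a rough sketch. The linear pooling weights, the (y0, y1, x0, x1) crop boxes, and the tile-by-hstack layout are all simplifying assumptions; crops within a level are assumed to share a height.

```python
import numpy as np

def rank_pool(frames):
    """Simplified rank pooling: a weighted temporal sum with linear
    coefficients (a crude stand-in for the harmonic-number weights in
    the sketch given earlier)."""
    T = len(frames)
    alpha = (2.0 * np.arange(1, T + 1) - T - 1).astype(np.float32)
    return np.tensordot(alpha, np.asarray(frames, dtype=np.float32), axes=1)

def structured_dynamic_images(depth_seq, level_boxes):
    """One (forward, backward) pair of dynamic images per granularity
    level. level_boxes maps a level name ('body', 'part', 'joint') to
    a list of (y0, y1, x0, x1) crops; crops within a level are tiled
    side by side so one ConvNet sees the whole level at once."""
    pairs = {}
    for level, boxes in level_boxes.items():
        fwd = np.hstack([rank_pool(depth_seq[:, y0:y1, x0:x1])
                         for (y0, y1, x0, x1) in boxes])
        bwd = np.hstack([rank_pool(depth_seq[::-1, y0:y1, x0:x1])
                         for (y0, y1, x0, x1) in boxes])
        pairs[level] = (fwd, bwd)
    return pairs
```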


Digital Image Computing: Techniques and Applications | 2014

Experimental Study of Unsupervised Feature Learning for HEp-2 Cell Images Clustering

Yan Zhao; Zhimin Gao; Lei Wang; Luping Zhou

Automatic identification of HEp-2 cell images has received increasing research attention, and feature representations play a critical role in achieving good identification performance. Much recent work has focused on supervised feature learning; typical methods include the BoW model (based on hand-crafted features) and deep learning models (which learn hierarchical features). However, the labels used in supervised feature learning are labour-intensive and time-consuming to obtain: they are commonly annotated manually by specialists and are very expensive. Motivated by this, we focus on unsupervised feature learning in this paper. We verify and compare the features of these two typical models by clustering. Experimental results show that the BoW model generally performs better than deep learning models. We also illustrate that the BoW model and deep learning models have complementary properties.
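A minimal version of such a clustering comparison might look like the following, using scikit-learn's k-means and the adjusted Rand index as an assumed quality measure; the ground-truth labels are used only for scoring, never for learning the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def clustering_quality(features, labels, n_classes=6):
    """Cluster image features with k-means and score the result
    against ground-truth classes; a higher adjusted Rand index means
    the feature space separates the cell classes better."""
    pred = KMeans(n_clusters=n_classes, n_init=10,
                  random_state=0).fit_predict(features)
    return adjusted_rand_score(labels, pred)

# bow_feats, deep_feats: (N, D) feature matrices from the two pipelines;
# compare clustering_quality(bow_feats, labels) with
# clustering_quality(deep_feats, labels).
```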


International Conference on Multimedia and Expo | 2017

Infomax principle based pooling of deep convolutional activations for image retrieval

Zhimin Gao; Lei Wang; Luping Zhou; Ming Yang

Neural activations produced by deep convolutional networks have recently become the state-of-the-art representation for image retrieval. To obtain a global image representation, sum-pooling has been frequently used to aggregate the activations of convolutional feature maps. This work first presents an understanding of the effectiveness of sum-pooling via a probabilistic interpretation, by proving that sum-pooling is an upper bound of the probability that a visual pattern is present in an image. To further examine the optimality of sum-pooling, a quantitative analysis based on the Infomax principle in neural networks is provided. It shows that sum-pooling aligns well with the leading eigenvector of principal component analysis (PCA) applied to the activations of a feature map. Moreover, considering the 2D matrix structure of feature maps, a two-directional 2DPCA-based pooling scheme is proposed to aggregate the convolutional activations. Experiments on multiple benchmark image retrieval datasets support the above analysis and demonstrate the superiority of the proposed pooling scheme.
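The contrast between sum-pooling and a 2DPCA-based scheme can be sketched as follows. The per-channel bilinear projection u^T A v onto the leading row- and column-scatter eigenvectors is a simplified reading of two-directional 2DPCA; the paper's exact aggregation may differ.

```python
import numpy as np

def sum_pool(fmaps):
    """Baseline: sum-pool each H x W activation map into one scalar,
    giving a C-dimensional global descriptor for (C, H, W) activations."""
    return fmaps.sum(axis=(1, 2))

def twodpca_pool(fmaps):
    """2DPCA-style pooling: for each channel, project the H x W
    activation matrix A onto the leading eigenvectors u (row scatter)
    and v (column scatter) and pool with u^T A v. The absolute value
    removes the arbitrary sign of the eigenvectors."""
    C = fmaps.shape[0]
    out = np.empty(C, dtype=np.float32)
    for c in range(C):
        A = fmaps[c].astype(np.float64)
        _, U = np.linalg.eigh(A @ A.T)   # eigenvalues ascending; last is leading
        _, V = np.linalg.eigh(A.T @ A)
        out[c] = abs(U[:, -1] @ A @ V[:, -1])
    return out
```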


Digital Image Computing: Techniques and Applications | 2016

Semi-Supervised Weight Learning for the Spatial Search Method in ConvNet-Based Image Retrieval

Yan Zhao; Lei Wang; Zhimin Gao; Ian Comor; Weichen Zhang; Luping Zhou

As the state-of-the-art ConvNet-based image retrieval method, spatial search has shown excellent retrieval performance and outperformed other competitors. A key component of this method is a weighted combination of distances evaluated at different regions of a query image. However, these weights are currently tuned manually, through trial-and-error exhaustive search. This not only incurs a lengthy parameter tuning process, but also makes it hard to guarantee the optimality of the tuned weights. Moreover, the tuned weights may not generalize when the nature of the image data set changes. To improve this situation, we propose to learn the combination weights automatically from retrieval groundtruth. Specifically, we develop a method, called semi-supervised weight learning (SWL), based on the framework of distance metric learning. In addition to generating triplet constraints from retrieval groundtruth, we leverage unlabelled images to generate numerous unsupervised constraints that stabilise the learning process and improve learning efficiency. By linking with the latest primal solver for linear support vector machines, an efficient algorithm is put forward to solve the resulting large-scale optimization problem. Experimental results on three benchmark data sets and a newly collected archival photo data set demonstrate the effectiveness of the proposed weight learning approach. It achieves comparable or better retrieval performance than the manual tuning approach, especially on the new archival photo data set.
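The core idea of learning a weighted combination of region distances from triplets can be sketched with plain projected subgradient descent on a hinge loss; the paper instead solves the problem with a linear-SVM primal solver, and the learning rate, margin, and epoch count below are illustrative.

```python
import numpy as np

def learn_region_weights(D_pos, D_neg, lr=0.01, epochs=200, margin=1.0):
    """Learn non-negative weights w over R per-region distances so that
    sum_r w_r * d_r(query, relevant) + margin <=
    sum_r w_r * d_r(query, irrelevant) holds for each triplet.
    D_pos, D_neg: (n_triplets, R) per-region distances to the relevant
    and irrelevant image of each triplet."""
    R = D_pos.shape[1]
    w = np.ones(R) / R
    for _ in range(epochs):
        viol = (D_pos - D_neg) @ w + margin > 0        # violated triplets
        grad = (D_pos[viol] - D_neg[viol]).sum(axis=0) / max(int(viol.sum()), 1)
        w = np.maximum(w - lr * grad, 0.0)             # project onto w >= 0
    return w
```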


Digital Image Computing: Techniques and Applications | 2016

Image Descriptors from ConvNets: Comparing Global Pooling Methods for Image Retrieval

Ian Comor; Yan Zhao; Zhimin Gao; Luping Zhou; Lei Wang

A major component of a generic image retrieval pipeline is producing concise and effective descriptors for each image. Previous works have shown impressive retrieval results when using descriptors taken from the black-box output of the fully-connected stage of pretrained Convolutional Neural Networks (ConvNets). However, previous work has also shown that descriptors pooled from the deep feature maps of late convolutional layers can be more discriminative for generic image retrieval, while remaining relatively concise. When globally pooling such feature maps from a ConvNet, the options to consider include (1) the depth of the network, (2) the choice of layer to pool, and (3) the level of dimension reduction. Previous work on global pooling methods uses differing techniques without a clear consensus on which method is best. This motivates us to establish a baseline pipeline from which to compare these options and their effect on retrieval results. Our contribution is a systematic and comprehensive experimental study of different pooling strategies of deep features for image retrieval, covering the options above. Our results show that the nature of the dataset (object-heavy or scene-heavy) warrants a different pooling strategy. Significantly, we visualise the level of image discrimination brought by the different pooling methods on the datasets, and show that pooling need not use a priori spatial weights to effectively find objects within the image. The results underline the need to consider the context of the image dataset when developing image retrieval pipelines using ConvNets.
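A baseline descriptor pipeline of the kind compared here can be sketched in a few lines: pool, normalise, optionally reduce dimension, re-normalise. The sum/max pooling choices and the sklearn-style pre-fitted PCA are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def global_descriptor(fmaps, pool="sum", pca=None):
    """Build a global image descriptor from late-layer activations of
    shape (C, H, W): globally pool each map, L2-normalise, optionally
    reduce dimension with a pre-fitted PCA, and re-normalise."""
    if pool == "sum":
        d = fmaps.sum(axis=(1, 2))
    elif pool == "max":
        d = fmaps.max(axis=(1, 2))
    else:
        raise ValueError(f"unknown pooling: {pool}")
    d = d / (np.linalg.norm(d) + 1e-8)
    if pca is not None:               # e.g. a sklearn PCA fitted on held-out data
        d = pca.transform(d[None, :])[0]
        d = d / (np.linalg.norm(d) + 1e-8)
    return d
```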

Collaboration


An overview of Zhimin Gao's collaborations and top co-authors.

Top Co-Authors

Pichao Wang (University of Wollongong)
Wanqing Li (University of Wollongong)
Lei Wang (Information Technology University)
Luping Zhou (Information Technology University)
Chang Tang (China University of Geosciences)
Jing Zhang (University of Wollongong)
Yan Zhao (Information Technology University)
Jianjia Zhang (University of Wollongong)
Song Liu (University of Wollongong)