Publication


Featured research published by Shenghua Gao.


Computer Vision and Pattern Recognition | 2010

Local features are not lonely – Laplacian sparse coding for image classification

Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia; Peilin Zhao

Sparse coding, which encodes the original signal in a sparse signal space, has shown state-of-the-art performance in the visual codebook generation and feature quantization processes of Bag-of-Words (BoW) based image representation. However, in the feature quantization process of sparse coding, similar local features may be quantized into different visual words of the codebook, because quantization is sensitive to small feature variations. In this paper, to alleviate this problem, we propose a Laplacian sparse coding method that exploits the dependence among local features. Specifically, we use a histogram-intersection-based kNN method to construct a Laplacian matrix that characterizes the similarity of local features, and we incorporate this Laplacian matrix into the objective function of sparse coding to preserve consistency in the sparse representations of similar local features. Comprehensive experimental results show that our method matches or outperforms existing state-of-the-art results, with particularly strong performance on the Scene 15 dataset.
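
As a rough numpy sketch of this formulation: the functions below build the histogram-intersection kNN Laplacian and evaluate an objective of the form ||X - AD||_F^2 + lambda ||A||_1 + beta tr(A^T L A). The codebook D is assumed given, and all names and parameter values are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def histogram_intersection(h1, h2):
    # Similarity between two non-negative local-feature histograms.
    return np.minimum(h1, h2).sum()

def knn_laplacian(X, k=5):
    # Graph Laplacian L = D - W from a histogram-intersection kNN graph,
    # where X is an (n, d) array of n local features.
    n = X.shape[0]
    S = np.array([[histogram_intersection(X[i], X[j]) for j in range(n)]
                  for i in range(n)])
    np.fill_diagonal(S, 0.0)
    W = np.zeros_like(S)
    for i in range(n):
        nbrs = np.argsort(S[i])[-k:]        # keep the k most similar features
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)                  # symmetrize the kNN graph
    return np.diag(W.sum(axis=1)) - W

def laplacian_sc_objective(X, D, A, lam=0.1, beta=0.1):
    # Rows of A are the sparse codes of the rows of X over codebook D.
    L = knn_laplacian(X)
    recon = np.linalg.norm(X - A @ D) ** 2          # reconstruction error
    sparsity = lam * np.abs(A).sum()                # l1 sparsity
    smooth = beta * np.trace(A.T @ L @ A)           # similarity preservation
    return recon + sparsity + smooth
```

The tr(A^T L A) term is what penalizes similar features (strong edges in W) receiving dissimilar sparse codes.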


IEEE Transactions on Image Processing | 2015

PCANet: A Simple Deep Learning Baseline for Image Classification?

Tsung-Han Chan; Kui Jia; Shenghua Gao; Jiwen Lu; Zinan Zeng; Yi Ma

In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) blockwise histograms. In the proposed architecture, PCA is employed to learn multistage filter banks, followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be designed and learned extremely easily and efficiently. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual datasets for different tasks, including Labeled Faces in the Wild (LFW) for face verification; the MultiPIE, Extended Yale B, AR, and Facial Recognition Technology (FERET) datasets for face recognition; and MNIST for handwritten digit recognition. Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with state-of-the-art features, whether prefixed, highly hand-crafted, or carefully learned (by deep neural networks, DNNs). Even more surprisingly, the model sets new records for many classification tasks on the Extended Yale B, AR, and FERET datasets and on MNIST variations. Additional experiments on other public datasets also demonstrate the potential of PCANet to serve as a simple but highly competitive baseline for texture classification and object recognition.
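
The pipeline is simple enough to sketch in plain numpy. Below is a single-stage simplification under stated assumptions: one PCA stage instead of a cascade, grayscale input, and 1-D blocks over the flattened response for brevity; it illustrates the idea rather than reproducing the authors' code.

```python
import numpy as np

def extract_patches(img, k=7):
    # All overlapping k x k patches of a 2-D image, flattened to rows,
    # with the per-patch mean removed (as in PCANet's first stage).
    H, W = img.shape
    out = np.array([img[i:i + k, j:j + k].ravel()
                    for i in range(H - k + 1) for j in range(W - k + 1)])
    return out - out.mean(axis=1, keepdims=True)

def pca_filters(images, k=7, n_filters=8):
    # Stage filters = leading principal directions of the pooled patches.
    P = np.vstack([extract_patches(im, k) for im in images])
    _, _, Vt = np.linalg.svd(P, full_matrices=False)
    return Vt[:n_filters].reshape(n_filters, k, k)

def pcanet_features(img, filters, block=8):
    # Filter responses -> binarize -> hash bit-planes into one integer
    # per patch -> pool with blockwise histograms (1-D blocks here).
    k = filters.shape[1]
    P = extract_patches(img, k)                      # (n_patches, k*k)
    resp = P @ filters.reshape(len(filters), -1).T   # (n_patches, n_filters)
    bits = (resp > 0).astype(np.int64)               # binary hashing
    codes = bits @ (2 ** np.arange(len(filters)))    # hashed integer code
    n_bins = 2 ** len(filters)
    hists = []
    for start in range(0, len(codes) - block + 1, block):
        h, _ = np.histogram(codes[start:start + block],
                            bins=n_bins, range=(0, n_bins))
        hists.append(h)
    return np.concatenate(hists)
```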


European Conference on Computer Vision | 2010

Kernel sparse representation for image classification and face recognition

Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia

Recent research has shown the effectiveness of using sparse coding (Sc) to solve many computer vision problems. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which may reduce the feature quantization error and boost sparse coding performance, we propose Kernel Sparse Representation (KSR). KSR is essentially sparse coding in a high-dimensional feature space induced by an implicit mapping function. We apply KSR to both image classification and face recognition. By incorporating KSR into Spatial Pyramid Matching (SPM), we propose KSRSPM for image classification. KSRSPM further reduces the information loss of the feature quantization step compared with Spatial Pyramid Matching using Sparse Coding (ScSPM), and can be regarded both as a generalization of the Efficient Match Kernel (EMK) and as an extension of ScSPM. Compared with sparse coding, KSR also learns more discriminative sparse codes for face recognition. Extensive experimental results show that KSR outperforms sparse coding and EMK, and achieves state-of-the-art performance for image classification and face recognition on publicly available datasets.
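
A minimal sketch of the kernel sparse coding step, assuming an RBF kernel and ISTA as the solver (neither choice is prescribed by this abstract): only the kernel values K(D, D) and K(D, x) are needed, so the implicit mapping phi is never formed explicitly.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # RBF kernel value between two feature vectors.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def ksr_code(x, D, lam=0.1, gamma=1.0, n_iter=200):
    # Kernel sparse code of x over codebook D (rows are atoms), minimizing
    # 0.5 * ||phi(x) - Phi(D) s||^2 + lam * ||s||_1 in the kernel-induced
    # feature space, via ISTA on the kernelized objective.
    K_DD = np.array([[rbf(di, dj, gamma) for dj in D] for di in D])
    k_Dx = np.array([rbf(di, x, gamma) for di in D])
    s = np.zeros(len(D))
    step = 1.0 / np.linalg.norm(K_DD, 2)        # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = K_DD @ s - k_Dx                  # gradient of the smooth part
        z = s - step * grad
        s = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrink
    return s
```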


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Laplacian Sparse Coding, Hypergraph Laplacian Sparse Coding, and Applications

Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia

Sparse coding exhibits good performance in many computer vision applications. However, due to the overcomplete codebook and the independent coding process, the locality and similarity among the instances to be encoded are lost. To preserve this locality and similarity information, we propose a Laplacian sparse coding (LSc) framework: by incorporating a similarity-preserving term into the objective of sparse coding, Laplacian sparse coding alleviates the instability of sparse codes. Furthermore, we propose Hypergraph Laplacian sparse coding (HLSc), which extends Laplacian sparse coding to the case where the similarity among the instances is defined by a hypergraph. Specifically, HLSc simultaneously captures the similarity among the instances within the same hyperedge and makes their sparse codes similar to each other. Both formulations enhance the robustness of sparse coding. We apply Laplacian sparse coding to feature quantization in Bag-of-Words image representation, where it outperforms sparse coding and achieves good performance on image classification, and we successfully use Hypergraph Laplacian sparse coding for semi-automatic image tagging. The good performance of these applications demonstrates the effectiveness of the proposed formulations in locality and similarity preservation.
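
For concreteness, the sketch below builds one common normalized hypergraph Laplacian (in the style of Zhou et al.) from a vertex-hyperedge incidence matrix; the paper's exact construction may differ.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    # Normalized hypergraph Laplacian from an incidence matrix H of shape
    # (n_vertices, n_edges), where H[v, e] = 1 if vertex v is in hyperedge e.
    # w holds optional hyperedge weights (uniform by default).
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                          # vertex degrees
    d_e = H.sum(axis=0)                  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    # Theta couples vertices that share hyperedges, weighted by w / d_e.
    Theta = Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - Theta
```

Adding s^T L s with this Laplacian to the coding objective is what pushes the codes of instances sharing a hyperedge toward each other.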


IEEE Transactions on Image Processing | 2013

Sparse Representation With Kernels

Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia

Recent research has shown the initial success of sparse coding (Sc) in solving many computer vision tasks. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which helps in finding a sparse representation of nonlinear features, we propose kernel sparse representation (KSR). Essentially, KSR is a sparse coding technique in a high-dimensional feature space mapped by an implicit mapping function. We apply KSR to feature coding in image classification, face recognition, and kernel matrix approximation. More specifically, by incorporating KSR into spatial pyramid matching (SPM), we develop KSRSPM, which achieves good performance for image classification. Moreover, KSR-based feature coding can be seen as a generalization of the efficient match kernel and an extension of Sc-based SPM, and we further show that KSR with a histogram intersection kernel (HIK) can be considered a soft-assignment extension of HIK-based feature quantization in the feature coding process. Beyond feature coding, and compared with sparse coding, KSR learns more discriminative sparse codes and achieves higher accuracy for face recognition. KSR can also be applied to kernel matrix approximation in large-scale learning tasks, where it demonstrates robustness, especially when only a small fraction of the data is used. Extensive experimental results demonstrate promising results of KSR in image classification, face recognition, and kernel matrix approximation, proving the effectiveness of KSR in computer vision and machine learning tasks.
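
As a small illustration of the kernel-approximation setting, the sketch below computes the histogram intersection kernel and approximates the full kernel matrix from a small landmark subset. Note the Nyström scheme shown is a generic baseline for approximating a kernel matrix from a fraction of the data, not the paper's KSR-based method.

```python
import numpy as np

def hik(X, Y):
    # Histogram intersection kernel matrix between row-histograms:
    # K[i, j] = sum_d min(X[i, d], Y[j, d]).
    return np.array([[np.minimum(x, y).sum() for y in Y] for x in X])

def nystrom_approx(X, n_landmarks=50, seed=0):
    # Approximate the full HIK matrix using only a random landmark subset
    # (generic Nystrom baseline, shown here for comparison purposes only).
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    C = hik(X, X[idx])                   # (n, m) cross-kernel block
    W = C[idx]                           # (m, m) landmark kernel block
    return C @ np.linalg.pinv(W) @ C.T   # rank-m approximation of K
```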


IEEE Transactions on Circuits and Systems for Video Technology | 2014

Region-Based Saliency Detection and Its Application in Object Recognition

Zhixiang Ren; Shenghua Gao; Liang-Tien Chia; Ivor W. Tsang

The objective of this paper is twofold. First, we introduce an effective region-based solution for saliency detection. Then, we apply the resulting saliency map to better encode image features for the object recognition task. To find perceptually and semantically meaningful salient regions, we extract superpixels based on an adaptive mean shift algorithm as the basic elements for saliency detection. The saliency of each superpixel is measured by its spatial compactness, which is calculated from the results of Gaussian mixture model (GMM) clustering. To propagate saliency between similar clusters, we adopt a modified PageRank algorithm to refine the saliency map. Our method not only improves saliency detection through large-salient-region detection and noise tolerance in cluttered backgrounds, but also generates saliency maps with well-defined object shapes. Experimental results demonstrate the effectiveness of our method. Since objects usually correspond to salient regions, and these regions usually play a more important role in object recognition than the background, we apply the saliency map to object recognition by incorporating it into sparse coding based spatial pyramid matching (ScSPM) image representation. To learn a more discriminative codebook and better encode the features corresponding to object patches, we propose a weighted sparse coding for feature coding. Moreover, we propose a saliency-weighted max pooling to further emphasize the importance of salient regions in the feature pooling module. Experimental results on several datasets illustrate that our weighted ScSPM framework greatly outperforms the ScSPM framework and achieves excellent performance for object recognition.
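
A compressed sketch of the saliency side of the pipeline, assuming per-superpixel mean colors and normalized spatial centers have already been extracted (e.g. by mean shift); the compactness score and the PageRank-style propagation below are simplified stand-ins for the paper's exact formulas.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def region_saliency(colors, centers, n_clusters=8, alpha=0.85, n_iter=50):
    # colors: (n, 3) per-superpixel mean colors; centers: (n, 2) normalized
    # spatial positions. Returns a per-superpixel saliency score.
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(colors)
    R = gmm.predict_proba(colors)                    # soft cluster assignments
    w = R / (R.sum(axis=0, keepdims=True) + 1e-12)
    mu = w.T @ centers                               # spatial cluster means
    # Spatial compactness: low weighted spatial variance -> high saliency.
    var = np.array([(w[:, c] * ((centers - mu[c]) ** 2).sum(1)).sum()
                    for c in range(n_clusters)])
    sal = np.exp(-var / (var.mean() + 1e-12))
    # PageRank-style refinement over cluster color similarity.
    cmu = w.T @ colors                               # color cluster means
    S = np.exp(-np.linalg.norm(cmu[:, None] - cmu[None], axis=-1))
    P = S / S.sum(axis=1, keepdims=True)             # row-stochastic
    s0 = sal / sal.sum()
    for _ in range(n_iter):
        sal = alpha * P.T @ sal + (1 - alpha) * s0   # propagate saliency
    return R @ sal                                   # back to superpixels
```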


Computer Vision and Pattern Recognition | 2016

Single-Image Crowd Counting via Multi-Column Convolutional Neural Network

Yingying Zhang; Desen Zhou; Siqin Chen; Shenghua Gao; Yi Ma

This paper aims to develop a method that can accurately estimate the crowd count from an individual image with arbitrary crowd density and arbitrary perspective. To this end, we propose a simple but effective Multi-column Convolutional Neural Network (MCNN) architecture that maps an image to its crowd density map. The proposed MCNN allows the input image to be of arbitrary size or resolution. By utilizing filters with receptive fields of different sizes, the features learned by each column CNN are adaptive to variations in people/head size due to perspective effects or image resolution. Furthermore, the true density map is computed accurately based on geometry-adaptive kernels, which do not require knowing the perspective map of the input image. Since existing crowd counting datasets do not adequately cover all the challenging situations considered in our work, we have collected and labelled a large new dataset that includes 1198 images with about 330,000 heads annotated. On this challenging new dataset, as well as all existing datasets, we conduct extensive experiments to verify the effectiveness of the proposed model and method. In particular, with the proposed simple MCNN model, our method outperforms all existing methods. In addition, experiments show that our model, once trained on one dataset, can be readily transferred to a new dataset.
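
The geometry-adaptive kernels admit a short, self-contained sketch: each annotated head is blurred with a Gaussian whose bandwidth scales with the mean distance to its k nearest neighboring heads, so dense regions get sharp kernels and sparse regions get wide ones. The k and beta defaults below are common choices for this construction, assumed rather than quoted from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.ndimage import gaussian_filter

def geometry_adaptive_density(points, shape, k=3, beta=0.3):
    # Ground-truth density map from head annotations: the bandwidth of each
    # Gaussian is beta times the mean distance to the k nearest other heads,
    # so no perspective map is needed. points: (n, 2) array of (row, col).
    density = np.zeros(shape, dtype=np.float64)
    if len(points) == 0:
        return density
    tree = cKDTree(points)
    # query returns each point itself first, so ask for k+1 neighbors.
    dists, _ = tree.query(points, k=min(k + 1, len(points)))
    for p, d in zip(points, dists):
        sigma = beta * d[1:].mean() if len(points) > 1 else 5.0
        impulse = np.zeros(shape)
        r = int(np.clip(round(p[0]), 0, shape[0] - 1))
        c = int(np.clip(round(p[1]), 0, shape[1] - 1))
        impulse[r, c] = 1.0
        density += gaussian_filter(impulse, sigma)   # ~unit-mass blob
    return density
```

Summing the resulting map recovers (approximately) the head count, which is what the network is trained to regress.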


IEEE Transactions on Image Processing | 2014

Learning Category-Specific Dictionary and Shared Dictionary for Fine-Grained Image Categorization

Shenghua Gao; Ivor W. Tsang; Yi Ma

This paper targets fine-grained image categorization by learning a category-specific dictionary for each category and a shared dictionary for all the categories. The category-specific dictionaries encode subtle visual differences among categories, while the shared dictionary encodes visual patterns common to all of them. To this end, we impose incoherence constraints among the different dictionaries in the objective of feature coding. In addition, to make the learned dictionaries stable, we require each dictionary to be self-incoherent. Our proposed dictionary learning formulation not only applies to fine-grained classification, but also improves conventional basic-level object categorization and other tasks such as event recognition. Experimental results on five datasets show that our method outperforms state-of-the-art fine-grained image categorization frameworks as well as sparse coding based dictionary learning frameworks. All these results demonstrate the effectiveness of our method.
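
The incoherence constraints can be written compactly. The sketch below evaluates one plausible penalty, cross-dictionary coherence ||D_i^T D_j||_F^2 plus self-coherence ||D_i^T D_i - I||_F^2 with unit-norm atoms assumed; the weights eta and gamma are illustrative, and the paper's exact penalty form may differ.

```python
import numpy as np

def incoherence_penalty(dicts, eta=1.0, gamma=1.0):
    # dicts: list of (d, K_c) dictionaries, one per category plus a shared
    # one. Cross terms push different dictionaries apart; self terms push
    # each dictionary toward an orthonormal (stable) atom set.
    cross = sum(np.linalg.norm(Di.T @ Dj) ** 2
                for i, Di in enumerate(dicts)
                for j, Dj in enumerate(dicts) if i < j)
    self_ = sum(np.linalg.norm(Di.T @ Di - np.eye(Di.shape[1])) ** 2
                for Di in dicts)
    return eta * cross + gamma * self_
```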


Computer Vision and Pattern Recognition | 2011

Multi-layer group sparse coding — For concurrent image classification and annotation

Shenghua Gao; Liang-Tien Chia; Ivor W. Tsang

We present a multi-layer group sparse coding framework for concurrent image classification and annotation. By leveraging the dependency between an image's class label and its tags, we introduce a multi-layer group sparse structure on the reconstruction coefficients. This structure fully encodes the mutual dependency between the class label, which describes the image content as a whole, and the tags, which describe its components. We then propose a multi-layer group based tag propagation method, which combines the class label and subgroups of instances with similar tag distributions to annotate test images. Moreover, we extend our multi-layer group sparse coding to the Reproducing Kernel Hilbert Space (RKHS), which captures the nonlinearity of features and further improves the performance of image classification and annotation. Experimental results on the LabelMe, UIUC-Sport, and NUS-WIDE-Object databases show that our method outperforms the baseline methods and achieves excellent performance on both image classification and annotation tasks.
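
The computational core of any group sparse coding scheme is the group soft-thresholding (block shrinkage) operator, sketched below. How the groups are nested across layers, class labels at one layer and tag-based subgroups at the next, is the paper's contribution and is not reproduced here.

```python
import numpy as np

def group_soft_threshold(a, groups, lam):
    # Proximal operator of the group-lasso penalty lam * sum_g ||a_g||_2:
    # each group of coefficients is shrunk toward zero as a block, so whole
    # groups are switched on or off together. groups: list of index arrays.
    out = a.copy()
    for g in groups:
        norm = np.linalg.norm(a[g])
        out[g] = 0.0 if norm <= lam else (1 - lam / norm) * a[g]
    return out
```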


International Journal of Computer Vision | 2015

Neither Global Nor Local: Regularized Patch-Based Representation for Single Sample Per Person Face Recognition

Shenghua Gao; Kui Jia; Liansheng Zhuang; Yi Ma

This paper presents a regularized patch-based representation for single sample per person face recognition. We represent each image by a collection of patches and seek their sparse representations over the gallery image patches and the intra-class variance dictionaries at the same time. By imposing a group sparsity constraint on the reconstruction coefficients corresponding to the gallery patches, and a plain sparsity constraint on those corresponding to the intra-class variance dictionaries, our formulation harvests the advantages of both patch-based and global image representations: it mitigates the side effects of patches that are severely corrupted by facial variations, while encouraging the less discriminative patches to be reconstructed from the gallery patches of the correct person. Moreover, instead of using manually designed intra-class variance dictionaries, we propose to learn them, which not only greatly accelerates prediction on probe images but also improves face recognition accuracy in the single sample per person scenario. Experimental results on the AR, Extended Yale B, CMU-PIE, and LFW datasets show that our method outperforms sparse coding related face recognition methods as well as other specially designed single sample per person face representation methods, achieving the best performance. These encouraging results demonstrate the effectiveness of the regularized patch-based representation for single sample per person face recognition.
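
A schematic of the objective: patches of a probe image are coded jointly over a gallery patch dictionary G (group penalty, so patches agree on which gallery atoms to use) and an intra-class variance dictionary V (plain l1 penalty). The l2,1 grouping below, taken over columns of A, is one plausible reading of the constraint; the paper groups gallery atoms by subject, which this sketch flattens for brevity.

```python
import numpy as np

def patch_repr_objective(patches, G, V, A, B, lam_g=0.1, lam_v=0.1):
    # patches: (p, d) probe patches; G: (m, d) gallery patch dictionary;
    # V: (v, d) intra-class variance dictionary. A (p, m) and B (p, v) hold
    # the reconstruction coefficients for all patches of one probe image.
    recon = np.linalg.norm(patches - A @ G - B @ V) ** 2
    group = lam_g * np.linalg.norm(A, axis=0).sum()   # l2,1: ties patches
    sparse = lam_v * np.abs(B).sum()                  # l1 on variance part
    return recon + group + sparse
```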

Collaboration


Dive into Shenghua Gao's collaborations.

Top Co-Authors

Yi Ma (ShanghaiTech University)
Liang-Tien Chia (Nanyang Technological University)
Yanyu Xu (ShanghaiTech University)
Weixin Luo (ShanghaiTech University)
Liansheng Zhuang (University of Science and Technology of China)
Wen Liu (ShanghaiTech University)
Zhixiang Ren (Nanyang Technological University)