Publications


Featured research published by Shizhou Zhang.


Pattern Recognition | 2017

Constructing Deep Sparse Coding Network for image classification

Shizhou Zhang; Jinjun Wang; Xiaoyu Tao; Yihong Gong; Nanning Zheng

This paper introduces a deep model called the Deep Sparse-Coding Network (DeepSCNet), which combines the advantages of Convolutional Neural Networks (CNN) and sparse-coding techniques for image feature representation. DeepSCNet consists of four types of basic layers: the Sparse-coding layer performs generalized linear coding for the local patch within each receptive field, replacing the convolution operation of a CNN with sparse coding; the Pooling layer and the Normalization layer perform the same operations as in a CNN; and the Map reduction layer reduces CPU/memory consumption by reducing the number of feature maps before stacking with the following layers. These four types of layers can be easily stacked to construct a deep model for image feature learning. The paper further discusses multi-scale and multi-locality extensions to the basic DeepSCNet; the overall approach is fully unsupervised. Compared to a CNN, DeepSCNet is relatively easy to train even with a training set of moderate size. Experiments show that DeepSCNet can automatically discover highly discriminative features directly from raw image pixels.

Highlights:
- We propose a novel deep model combining the advantages of CNN and sparse-coding techniques.
- Image representations can be obtained directly from raw pixels.
- Multi-scale and multi-locality extensions are proposed to boost recognition accuracy.
- The classification accuracy is very promising among traditional unsupervised methods.
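The core idea of the sparse-coding layer (encoding each receptive field over a dictionary instead of convolving it) can be sketched as below. This is a minimal illustration, not the authors' implementation: the dictionary is random, the solver is plain ISTA, and all names and hyperparameters are assumptions.

```python
import numpy as np

def ista_sparse_code(x, D, lam=0.1, n_iter=50):
    """Encode vector x over dictionary D (atoms in columns) via ISTA,
    minimizing 0.5*||x - D z||^2 + lam*||z||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient step
        z = z - grad / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z

def sparse_coding_layer(image, D, patch=3, stride=1, lam=0.1):
    """Replace convolution with patch-wise sparse coding: every receptive
    field is encoded over D, yielding one feature map per dictionary atom."""
    H, W = image.shape
    out_h = (H - patch) // stride + 1
    out_w = (W - patch) // stride + 1
    maps = np.zeros((D.shape[1], out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            p = image[i*stride:i*stride+patch, j*stride:j*stride+patch].ravel()
            maps[:, i, j] = ista_sparse_code(p, D, lam)
    return maps
```

The soft-thresholding step is what produces the sparsity that distinguishes this layer from an ordinary convolution.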


Neurocomputing | 2017

Combining local and global hypotheses in deep neural network for multi-label image classification

Qinghua Yu; Jinjun Wang; Shizhou Zhang; Yihong Gong; Jizhong Zhao

Multi-label image classification is a challenging problem in computer vision. Motivated by recent gains in image classification performance using Deep Neural Networks, in this work we propose a flexible deep Convolutional Neural Network (CNN) framework, called Local-Global-CNN (LGC), to improve multi-label image classification performance. LGC consists of, first, a local-level multi-label classifier that takes object segment hypotheses as inputs to a local CNN. The outputs for these local hypotheses are aggregated with max-pooling and then re-weighted to account for label co-occurrence or interdependency information, using a graphical model in the label space. LGC also utilizes a global CNN, trained on multi-label images, to directly predict the multiple labels of the input. The predictions of the local- and global-level classifiers are finally fused to obtain a MAP estimate of the final multi-label prediction. The LGC framework can benefit from pre-training on a large-scale single-label image dataset, e.g., ImageNet. Experimental results show that the proposed framework achieves promising performance on the Pascal VOC2007 and VOC2012 multi-label image datasets.
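The local/global fusion step described above can be sketched with plain arrays. This is an illustrative stand-in under assumptions: the CNNs are replaced by precomputed score matrices, the graphical model is reduced to a single co-occurrence matrix multiply, and the fusion weight `alpha` is hypothetical.

```python
import numpy as np

def lgc_fuse(local_scores, cooccur, global_scores, alpha=0.5):
    """local_scores: (n_hypotheses, n_labels) label scores per segment
    hypothesis. Max-pool across hypotheses, re-weight the pooled scores
    with a label co-occurrence matrix, then blend with the global
    classifier's scores."""
    pooled = local_scores.max(axis=0)           # aggregate hypotheses
    reweighted = cooccur @ pooled               # inject label dependencies
    reweighted /= max(reweighted.max(), 1e-12)  # rescale to [0, 1]
    return alpha * reweighted + (1 - alpha) * global_scores
```

With an identity co-occurrence matrix the local branch reduces to pure max-pooling, which makes the effect of the re-weighting step easy to isolate.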


International Conference on Image Processing | 2015

Incorporating image degeneration modeling with multitask learning for image super-resolution

Yudong Liang; Jinjun Wang; Shizhou Zhang; Yihong Gong

Learning the non-linear image upscaling process has previously been treated as a simple regression problem, where various models have been utilized to describe the correlations between high-resolution (HR) and low-resolution (LR) images/patches. In this paper, we present a multitask learning framework based on a deep neural network for image super-resolution, in which we jointly consider the image super-resolution process and the image degeneration process. By sharing parameters between these two highly related tasks, the proposed framework effectively improves the learned neural-network-based mapping model between HR and LR image patches. Experimental results demonstrate clear visual improvement and high computational efficiency, especially for large magnification factors.
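The joint objective (super-resolution plus degeneration, with shared parameters) can be written down in toy form. This is a sketch under strong assumptions: the deep network is replaced by a single linear map `W`, with parameter sharing expressed by using `W` for upscaling and its transpose for degeneration; `beta` is a hypothetical task weight.

```python
import numpy as np

def multitask_sr_loss(W, lr, hr, beta=0.5):
    """Joint multitask objective: W maps LR -> HR (super-resolution)
    and its transpose maps HR -> LR (degeneration), so both tasks
    train the same shared parameters."""
    sr_err = np.sum((W @ lr - hr) ** 2)    # super-resolution task
    deg_err = np.sum((W.T @ hr - lr) ** 2) # degeneration task
    return sr_err + beta * deg_err
```

Minimizing both terms jointly regularizes `W`: an upscaler whose transpose also explains the downscaling is constrained to be consistent with the imaging process.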


International Conference on Internet Multimedia Computing and Service | 2015

Deep sparse coding network for image classification

Sijun Zhou; Shizhou Zhang; Jinjun Wang

In this paper, we introduce a novel deep model called the Deep Sparse Coding Network (DeepSCNet) for image classification. The model includes four types of layers: the Sparse-coding layer, the Pooling layer, the Normalization layer and the Map reduction layer. The Sparse-coding layer performs generalized linear coding for the local patch within each receptive field. The Pooling layer and the Normalization layer perform the same operations as in a CNN. The Map reduction layer reduces CPU and memory consumption by reducing the number of feature maps. The paper further discusses multi-scale and multi-locality extensions to the basic DeepSCNet. Compared to a CNN, DeepSCNet is relatively easy to train even with a training set of moderate size. Experiments show that DeepSCNet can automatically discover highly discriminative features directly from raw image pixels.


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

Learning visual co-occurrence with auto-encoder for image super-resolution

Yudong Liang; Jinjun Wang; Shizhou Zhang; Yihong Gong

This paper proposes a novel neural network that learns the essential mapping function between low-resolution and high-resolution images for the image super-resolution problem. In our approach, the patch recurrence property of small patches in natural images is utilized as a prior to train the network. An auto-encoder neural network is designed to reconstruct the high-resolution patches. The constraint that the output of the coding part should be similar to the corresponding high-resolution patches is imposed to ameliorate the ill-posed nature of the super-resolution problem. In fact, the degeneration mapping from the high-resolution image to the low-resolution image is also integrated into the network. Both visual improvements and objective assessments are demonstrated on real images.


Multimedia Tools and Applications | 2018

Person re-identification by the asymmetric triplet and identification loss function

De Cheng; Yihong Gong; Weiwei Shi; Shizhou Zhang

Person re-identification (re-id) aims to match the same individual across different non-overlapping camera views. In this paper, we analyze the effectiveness of two widely used loss functions, the triplet loss and the softmax loss, on the person re-id task. We conclude that the triplet loss is suitable for relatively small datasets with shallow neural networks, while the softmax loss works better on larger datasets with relatively deeper network architectures; both are essential to the person re-id task. Moreover, we present a convolutional neural network (CNN) model trained under the joint supervision of the triplet loss and the softmax loss for person re-id, which performs slightly better than either loss alone. The triplet loss pulls images of the same individual closer together and pushes instances of different individuals far apart, which effectively reduces intra-personal variations. Meanwhile, the person identification cost, implemented by the softmax loss with the “center loss” embedded, discriminatively learns identity-related feature representations (i.e., features with large inter-personal variations). Extensive experimental results demonstrate the effectiveness of the proposed method, which obtains promising performance on the challenging i-LIDS, PRID2011 and CUHK03 datasets.
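The joint supervision described above can be sketched on raw feature vectors. This is an illustrative numpy version, not the authors' training code: the CNN is abstracted away, the "center loss" term is omitted, and the margin and weight `lam` are hypothetical values.

```python
import numpy as np

def softmax_loss(logits, label):
    """Cross-entropy of a softmax over class logits (identification cost)."""
    z = logits - logits.max()              # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on the gap between same-identity and different-identity
    squared distances: pulls positives in, pushes negatives out."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

def joint_loss(anchor, positive, negative, logits, label, lam=0.5):
    """Joint supervision: identification (softmax) plus metric (triplet)."""
    return softmax_loss(logits, label) + lam * triplet_loss(anchor, positive, negative)
```

The triplet term vanishes once negatives are already a margin farther than positives, so late in training the softmax term dominates, which matches the paper's observation that the two losses play complementary roles.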


Multimedia Tools and Applications | 2017

Correction to: Person re-identification by the symmetric triplet and identification loss function

De Cheng; Yihong Gong; Weiwei Shi; Shizhou Zhang

The original version of this article unfortunately contained mistakes: the word “symmetric” was misspelled as “asymmetric” in the article title and in some occurrences in the text.


Advances in Multimedia | 2016

A Biologically Inspired Deep CNN Model

Shizhou Zhang; Yihong Gong; Jinjun Wang; Nanning Zheng

Recently, Deep Convolutional Neural Networks (DCNN) have achieved state-of-the-art performance on many tasks in image and video analysis. However, devising a good DCNN model is very challenging, as a network designer faces many choices, including the depth, the number of feature maps, interconnection patterns, window sizes for convolution and pooling layers, etc. These choices constitute a huge search space that makes it impractical to discover an optimal network structure with any systematic approach. In this paper, we strive to develop a good DCNN model by borrowing biological guidance from the human visual cortex. By making an analogy between the proposed DCNN model and the human visual cortex, many critical design choices of the proposed model can be determined with a few simple calculations. Comprehensive experimental evaluations demonstrate that the proposed DCNN model achieves state-of-the-art performance on four widely used benchmark datasets: CIFAR-10, CIFAR-100, SVHN and MNIST.


International Conference on Multimedia and Expo | 2015

Multi-cue Normalized Non-Negative Sparse Encoder for image classification

Shizhou Zhang; Jinjun Wang; Yudong Liang; Yihong Gong; Nanning Zheng

Recently, sparse-coding-based image representations have achieved state-of-the-art recognition results on many benchmarks. In this paper, we propose the Multi-cue Normalized Non-Negative Sparse Encoder (MN3SE), which enforces both a non-negativity constraint and a shift-invariance constraint on top of the traditional sparse coding criteria, and exploits multiple cues to further boost performance. The former constraint reduces the information loss caused by negative coefficients and improves coding stability, while the latter allows the sparseness to adapt to the local feature. The proposed coding scheme is then approximated by a neural-network-based encoder for speed-up. More importantly, the multi-layer neural network architecture allows us to apply a multi-task learning strategy to fuse information from multiple cues. Specifically, we take one type of descriptor, such as SIFT, as the input, and force the learned encoder to produce sparse codes that can reconstruct not only SIFT but also other types of descriptors, such as color moments. In this way, we achieve a 10 to 33 times speed-up for sparse coding, and the multi-cue learning strategy gives features extracted by MN3SE superior image classification accuracy.
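The non-negativity constraint on the codes can be illustrated with projected gradient descent. This is a minimal sketch of non-negative sparse coding in general, not the MN3SE encoder: the shift-invariance constraint and the multi-cue network are omitted, and the dictionary and hyperparameters are assumptions.

```python
import numpy as np

def nn_sparse_code(x, D, lam=0.1, n_iter=100):
    """Non-negative sparse coding by projected gradient:
    min_{z >= 0}  0.5*||x - D z||^2 + lam*sum(z).
    For z >= 0 the L1 penalty is just lam*sum(z), so its gradient
    is a constant lam added to the data-fit gradient."""
    L = np.linalg.norm(D, 2) ** 2        # step-size scale from the Lipschitz constant
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x) + lam   # gradient of the penalized objective
        z = np.maximum(z - grad / L, 0.0)  # gradient step + projection onto z >= 0
    return z
```

Because the projection clips coefficients at zero, the resulting codes avoid the cancellation between positive and negative coefficients that the paper identifies as a source of information loss and instability.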


IEEE International Conference on Semantic Computing | 2015

Cascade object detection with complementary features and algorithms

De Cheng; Jinjun Wang; Xing Wei; Nan Liu; Shizhou Zhang; Yihong Gong; Nanning Zheng

This paper presents a novel method of combining object detection algorithms with methods used for image classification, aiming to further boost object detection performance. The algorithms and image features used in image classification tasks have not been well transplanted into object detection, largely because the features used in image classification are extracted from the whole image and carry no spatial information. In our framework, we first use a detection model to propose candidate windows; in the second stage, each candidate window is treated as a whole image to be classified. Intuitively, the first stage should have high recall, while the second stage should have high precision. In the proposed detection framework, an SVM model is trained to combine the scores computed in both stages. The framework is general; in our experiments, we used the LSVM object detector in the first stage and a widely used deep convolutional neural network classifier in the second stage. Our experiments show that the combined model further boosts object detection performance under this framework.
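The two-stage score combination can be sketched as a fixed linear combiner. This is an illustration only: in the paper the combination weights are learned by an SVM, whereas here `w` and `bias` are hypothetical pre-trained values, and the detector and classifier are abstracted to scalar scores.

```python
import numpy as np

def cascade_detect(detector_score, classifier_score, w=(0.4, 0.6), bias=-0.5):
    """Stage 1 proposes windows (tuned for high recall); stage 2
    re-scores each window as a whole image (tuned for high precision).
    A linear model, as a trained SVM would, maps the pair of stage
    scores to a final detection confidence."""
    s = np.array([detector_score, classifier_score])
    return float(np.dot(w, s) + bias)

def keep_detections(windows, threshold=0.0):
    """windows: list of (window_id, stage1_score, stage2_score);
    keep the windows whose combined confidence clears the threshold."""
    return [win for win, s1, s2 in windows if cascade_detect(s1, s2) > threshold]
```

Learning `w` and `bias` from validation detections is what lets the cascade trade the detector's recall against the classifier's precision instead of trusting either stage alone.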

Collaboration


Top co-authors of Shizhou Zhang, all at Xi'an Jiaotong University:

- Jinjun Wang
- Yihong Gong
- Nanning Zheng
- De Cheng
- Shun Zhang
- Yudong Liang
- Nan Liu
- Weiwei Shi
- Xinzi Zhang
- Jizhong Zhao