Publications

Featured research published by Wen Zhou.


Computer Vision and Pattern Recognition | 2012

Sparse representation for face recognition based on discriminative low-rank dictionary learning

Long Ma; Chunheng Wang; Baihua Xiao; Wen Zhou

In this paper, we propose a discriminative low-rank dictionary learning algorithm for sparse representation. Sparse representation seeks the sparsest coefficients to represent a test signal as a linear combination of the bases in an over-complete dictionary. Motivated by low-rank matrix recovery and completion, we assume that data from the same pattern are linearly correlated; if we stack these data points as the column vectors of a dictionary, the dictionary should be approximately low-rank. We propose an objective function combining sparse coefficients, class discrimination, and rank minimization, and optimize it during dictionary learning. We have applied the algorithm to face recognition, and numerous experiments showing improved performance over previous dictionary learning methods validate its effectiveness.
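
The abstract names the three ingredients of the objective without stating it; a plausible hedged form, using the nuclear norm as the usual convex surrogate for rank and a generic discrimination term f_disc (the weights and the exact discriminative term are assumptions, not the paper's formulation), is:

\min_{D, A}\; \|X - D A\|_F^2 \;+\; \lambda \|A\|_1 \;+\; \gamma \|D\|_* \;+\; \eta\, f_{\mathrm{disc}}(A)

Here X stacks training samples as columns, A holds the sparse coefficients, \|D\|_* is the nuclear norm encouraging a low-rank dictionary, and f_disc encourages coefficients of same-class samples to be similar.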


Computer Vision and Pattern Recognition | 2013

Cross-View Action Recognition via a Continuous Virtual Path

Zhong Zhang; Chunheng Wang; Baihua Xiao; Wen Zhou; Shuang Liu; Cunzhao Shi

In this paper, we propose a novel method for cross-view action recognition via a continuous virtual path that connects the source view and the target view. Each point on this virtual path is a virtual view obtained by a linear transformation of the action descriptor. All the virtual views are concatenated into an infinite-dimensional feature that characterizes continuous changes from the source view to the target view. However, these infinite-dimensional features cannot be used directly, so we propose a virtual view kernel to compute the similarity between two such features, which can readily be used to construct any kernelized classifier. In addition, many unlabeled samples from the target view can be exploited to improve classifier performance, so we present a constraint strategy to use the information they contain. The rationale behind the constraint is that any action video belongs to only one class. Our method is verified on the IXMAS dataset, and the experimental results demonstrate that it achieves better performance than state-of-the-art methods.
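
A minimal sketch of the virtual-path idea, assuming linear interpolation between a source and a target transformation and an RBF similarity at each virtual view; the transformations, the similarity, and the numeric integration are illustrative assumptions, not the paper's exact kernel:

import numpy as np

def virtual_view_kernel(x, y, T_src, T_tgt, n_steps=50, gamma=1.0):
    # Approximate the kernel by averaging similarities of the two action
    # descriptors x, y along the path of virtual views
    # V(t) = (1 - t) * T_src + t * T_tgt, t in [0, 1].
    ts = np.linspace(0.0, 1.0, n_steps)
    k = 0.0
    for t in ts:
        V = (1 - t) * T_src + t * T_tgt   # virtual view transformation
        vx, vy = V @ x, V @ y             # descriptors in that virtual view
        k += np.exp(-gamma * np.sum((vx - vy) ** 2))
    return k / n_steps                    # average similarity along the path

The resulting values can be assembled into a precomputed kernel matrix and handed to any kernelized classifier, e.g. an SVM.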


IEEE Signal Processing Letters | 2012

Action Recognition Using Context-Constrained Linear Coding

Zhong Zhang; Chunheng Wang; Baihua Xiao; Wen Zhou; Shuang Liu

Although the traditional bag-of-words model has shown promising results for action recognition, it does not consider the relationships among spatio-temporal points, and it suffers from serious quantization error. In this letter, we propose a novel coding strategy, context-constrained linear coding (CLC), to overcome these limitations. We first calculate the contextual distance between local descriptors and each codeword by considering spatio-temporal contextual information. Then, linear coding using the contextual distance is adopted to alleviate the quantization error. Our method is verified on two challenging databases (KTH and UCF sports), and the experimental results demonstrate that it achieves better results than previous action recognition methods.
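
A hedged sketch of the coding step, assuming the contextual distances d_ctx to each codeword have already been computed (how they are derived from spatio-temporal context is the paper's contribution and is not reproduced here); the closed form below is the standard one for locality/context-weighted linear coding:

import numpy as np

def context_constrained_coding(x, B, d_ctx, lam=1e-4):
    # Solve  min_c ||x - B^T c||^2 + lam * ||diag(d_ctx) c||^2  s.t. sum(c) = 1
    # x: local descriptor (d,), B: codebook (K, d), d_ctx: contextual distances (K,)
    K = B.shape[0]
    Z = B - x                        # shift codewords to the descriptor
    C = Z @ Z.T                      # K x K data covariance
    C += lam * np.diag(d_ctx ** 2)   # penalize contextually distant codewords
    c = np.linalg.solve(C, np.ones(K))
    return c / c.sum()               # enforce the sum-to-one constraint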


IEEE Transactions on Information Forensics and Security | 2013

Attribute Regularization Based Human Action Recognition

Zhong Zhang; Chunheng Wang; Baihua Xiao; Wen Zhou; Shuang Liu

Recently, attributes have been introduced as a kind of high-level semantic information to help improve classification accuracy. Multitask learning is an effective methodology for this goal, sharing low-level features between attributes and actions. Yet such methods neglect the constraints that attributes impose on classes, and so may fail to capture the semantic relationship between attributes and actions. In this paper, we explicitly consider this attribute-action relationship for human action recognition and, correspondingly, modify the multitask learning model by adding attribute regularization. In this way, the learned model not only shares the low-level features but is also regularized according to the semantic constraints. In addition, since attribute and class labels carry different amounts of semantic information, we treat attribute classifiers and action classifiers separately within the multitask learning framework for further performance improvement. Our method is verified on three challenging datasets (KTH, UIUC, and Olympic Sports), and the experimental results demonstrate that it achieves better results than previous human action recognition methods.
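
The abstract describes the pieces without the formula; one plausible hedged form of such an attribute-regularized multitask objective, with action classifiers W_c, attribute classifiers W_a, and a_y the attribute signature of class y (the exact losses and regularizers in the paper may differ), is:

\min_{W_c, W_a}\; \sum_i L\big(y_i,\, W_c^\top x_i\big) \;+\; \sum_{i,k} L\big(a_{y_i,k},\, w_{a,k}^\top x_i\big) \;+\; \lambda\, \Omega(W_c, W_a) \;+\; \eta \sum_i \big\| W_a^\top x_i - a_{y_i} \big\|^2

Here \Omega is the feature-sharing multitask regularizer, and the last term is the added attribute regularization tying predicted attributes to each action class's attribute signature.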


IEEE Transactions on Circuits and Systems for Video Technology | 2014

Cross-View Action Recognition Using Contextual Maximum Margin Clustering

Zhong Zhang; Chunheng Wang; Baihua Xiao; Wen Zhou; Shuang Liu

Recently, maximum margin clustering (MMC) has been proposed for cross-view action recognition. However, it neglects the temporal relationship between contiguous frames in the same action video. In this paper, we propose a novel method, contextual maximum margin clustering (CMMC), to tackle cross-view action recognition. In CMMC, we add temporal regularization that imposes a high penalty when contiguous frames are dissimilar. Thus, CMMC not only finds maximum margin hyperplanes but also explicitly considers the temporal information among contiguous frames. Our method is verified on the IXMAS dataset, and the experimental results demonstrate that it achieves better performance than state-of-the-art methods.
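
A minimal sketch of the temporal-regularization idea: on top of the maximum margin clustering objective, add a penalty that grows when the (soft) cluster assignments of contiguous frames disagree. The squared-difference form and the weight eta are illustrative assumptions:

import numpy as np

def temporal_penalty(frame_scores, eta=1.0):
    # frame_scores: (T, C) clustering/classifier scores for T contiguous frames
    diffs = frame_scores[1:] - frame_scores[:-1]
    return eta * np.sum(diffs ** 2)   # high penalty when neighbours are dissimilar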


Pattern Analysis and Applications | 2015

Robust relative attributes for human action recognition

Zhong Zhang; Chunheng Wang; Baihua Xiao; Wen Zhou; Shuang Liu

High-level semantic features are important for recognizing human actions. Recently, relative attributes, which describe relative relationships, have been proposed as one such feature and have shown promising performance. However, their training process is very sensitive to noise and is not robust in zero-shot learning. In this paper, to overcome these drawbacks, we propose a robust learning framework using relative attributes for human action recognition. We add both Sigmoid and Gaussian envelopes to the loss objective, so that the influence of outliers is greatly reduced during optimization, improving accuracy. In addition, we adopt Gaussian mixture models to better fit the distribution of actions in rank-score space, and we propose a novel transfer strategy to estimate the Gaussian mixture model parameters for unseen classes. Our method is verified on three challenging datasets (KTH, UIUC, and HOLLYWOOD2), and the experimental results demonstrate that it achieves better results than previous methods in both zero-shot classification and the traditional recognition task.
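
A hedged sketch of the envelope idea: the usual relative-attribute ranking margin w^T(x_i - x_j) is passed through a sigmoid (ordered pairs) or a Gaussian (similar pairs) envelope, so that grossly violated pairs, which are likely label noise, contribute only a bounded loss. The exact envelopes in the paper may differ; this only illustrates the bounding mechanism:

import numpy as np

def robust_rank_loss(w, ordered_pairs, similar_pairs, X, sigma=1.0):
    loss = 0.0
    for i, j in ordered_pairs:            # attribute strength: i > j
        margin = w @ (X[i] - X[j])
        loss += 1.0 / (1.0 + np.exp(margin))                 # sigmoid envelope: bounded in (0, 1)
    for i, j in similar_pairs:            # attribute strength: i ~ j
        diff = w @ (X[i] - X[j])
        loss += 1.0 - np.exp(-diff ** 2 / (2 * sigma ** 2))  # Gaussian envelope: bounded in [0, 1)
    return loss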


Advanced Video and Signal Based Surveillance | 2012

Multi-scale Fusion of Texture and Color for Background Modeling

Zhong Zhang; Chunheng Wang; Baihua Xiao; Shuang Liu; Wen Zhou

Background modeling from a stationary camera is a crucial component of video surveillance. Traditional methods usually adopt a single feature type, and their performance is often unsatisfactory in complex scenes. In this paper, we propose a multi-scale strategy that combines texture and color features to achieve a robust and accurate solution. Our contributions are twofold: we propose a novel texture operator, the Scale-invariant Center-symmetric Local Ternary Pattern, which is robust to noise and illumination variations, and we propose a multi-scale fusion strategy. Our method is verified on several complex real-world videos with illumination variation, soft shadows, and dynamic backgrounds. We compare it with four state-of-the-art methods, and the experimental results clearly demonstrate that it achieves the highest classification accuracy on these videos.
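
A minimal sketch of a center-symmetric local ternary pattern on a 3x3 neighbourhood: opposite pixel pairs are compared and quantized into three levels with a threshold proportional to the local intensity, which is what makes the code invariant to a global intensity scaling. The exact neighbourhood, threshold rule, and encoding in the paper may differ:

import numpy as np

def scs_ltp(patch, tau=0.05):
    # patch: 3x3 grayscale block; four center-symmetric pixel pairs around the centre
    pairs = [((0, 0), (2, 2)), ((0, 1), (2, 1)), ((0, 2), (2, 0)), ((1, 2), (1, 0))]
    c = patch[1, 1] + 1e-8             # scale-proportional threshold base
    code = 0
    for k, (p, q) in enumerate(pairs):
        d = (patch[p] - patch[q]) / c  # intensity-scale-normalized difference
        t = 0 if abs(d) < tau else (1 if d > 0 else 2)  # ternary quantization
        code += t * 3 ** k             # base-3 pattern index
    return code                        # one of 3^4 = 81 patterns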


Signal Processing-image Communication | 2014

Action recognition via structured codebook construction

Wen Zhou; Chunheng Wang; Baihua Xiao; Zhong Zhang

Bag-of-words models have been widely used to obtain global representations for action recognition. However, these models ignore structural information, such as the spatial and temporal contextual information, in the action representation. In this paper, we propose a novel structured codebook construction method that encodes the spatial and temporal contextual information among local features for video representation. Given a set of training videos, our method first extracts local motion and appearance features. Next, we encode the spatial and temporal contextual information among local features by constructing correlation matrices for local spatio-temporal features. Then, we discover the common patterns of movement to construct the structured codebook. Actions can then be represented by a set of sparse coefficients with respect to the structured codebook, and a simple linear SVM classifier predicts the action class from this representation. Our method has two main advantages over traditional methods. First, it automatically discovers mid-level common patterns of movement that capture rich spatial and temporal contextual information. Second, it is robust to unwanted background local features, mainly because most of them cannot be sparsely represented by the common patterns and are treated as residual errors that are not encoded into the action representation. We evaluate the proposed method on two popular benchmarks, the KTH action dataset and the UCF sports dataset, and the experimental results demonstrate the advantages of our structured codebook construction.
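
A hedged sketch of the final encoding step: once a structured codebook D of common movement patterns has been learned, a video feature x is represented by its sparse coefficients, and whatever D cannot sparsely reconstruct (e.g. background clutter) stays in the residual and is left out of the representation. Plain ISTA is used below only as a generic L1 solver; the paper's optimizer may differ:

import numpy as np

def sparse_codes(x, D, lam=0.1, n_iter=200):
    # Solve  min_a 0.5 * ||x - D^T a||^2 + lam * ||a||_1  by ISTA.
    # D: structured codebook (n_atoms, dim), x: video feature (dim,)
    L = np.linalg.norm(D, 2) ** 2     # Lipschitz constant of the gradient
    a = np.zeros(D.shape[0])
    for _ in range(n_iter):
        grad = D @ (D.T @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a                          # action representation; x - D.T @ a is the residual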


IEEE Signal Processing Letters | 2014

SLD: A Novel Robust Descriptor for Image Matching

Wen Zhou; Chunheng Wang; Baihua Xiao; Zhong Zhang

Image matching based on local features is a challenging task because it is difficult to build a local descriptor that is invariant to large variations in scale, viewpoint, illumination, and rotation. The Scale Invariant Feature Transform (SIFT) descriptor was proposed to build a robust and distinctive local descriptor, but it is not fully affine invariant. In this letter, we propose a novel robust descriptor, the Sampling-based Local Descriptor (SLD), to perform reliable image matching under large variations in scale, viewpoint, illumination, and rotation. We build the descriptor on elliptical sampling, which samples image pixels according to elliptic equations. The main advantage of elliptical sampling is that its two controllable parameters can generate descriptors for different viewpoints and rotations. The descriptor has two notable properties: 1) it is fully invariant to affine changes; and 2) it enables a fast matching process, because we only need to search over the two controllable parameters of the elliptical sampling, which is more efficient than other affine-invariant descriptors. We test the proposed descriptor on a standard evaluation benchmark, and the experimental results show its robustness under large variations in illumination, viewpoint, and scale.
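
A minimal sketch of elliptical sampling around a keypoint: pixels are taken along concentric ellipses, and two controllable parameters (here taken to be the axis ratio and the rotation angle, an assumption for illustration) sweep out descriptors for different viewpoints and rotations, so matching only has to search over these two parameters:

import numpy as np

def elliptical_samples(cx, cy, ratio, phi, n_rings=3, n_angles=16, radius=8.0):
    # Sample points on n_rings concentric ellipses (axis ratio `ratio`)
    # rotated by `phi` around the keypoint centre (cx, cy).
    pts = []
    for r in np.linspace(radius / n_rings, radius, n_rings):
        for theta in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
            x, y = r * np.cos(theta), ratio * r * np.sin(theta)  # point on the ellipse
            pts.append((cx + x * np.cos(phi) - y * np.sin(phi),  # rotate by phi,
                        cy + x * np.sin(phi) + y * np.cos(phi))) # translate to centre
    return np.array(pts)  # sample locations; the intensities there form the descriptor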


Asian Conference on Pattern Recognition | 2011

Background modeling by exploring multi-scale fusion of texture and intensity in complex scenes

Zhong Zhang; Baihua Xiao; Chunheng Wang; Wen Zhou; Shuang Liu

Background modeling is a fundamental yet challenging issue in video surveillance. Traditional methods usually adopt a single feature type, and their performance is often unsatisfactory in complex scenes. In this paper, we propose a multi-scale framework that combines texture and intensity features to achieve a robust and accurate solution. Our contributions are threefold: first, we provide a multi-scale analysis of the problem; second, for the texture feature we propose a novel texture operator, the Scale-invariant Center-symmetric Local Ternary Pattern, together with a corresponding Pattern Adaptive Kernel Density Estimation technique for its probability estimation; third, we design a Simplified Gaussian Mixture Model for the intensity feature. Our method is tested on several complex real-world videos with illumination variation, soft shadows, and dynamic backgrounds. The experimental results clearly demonstrate that our method is superior to previous methods.
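
For the intensity branch, a hedged sketch of a per-pixel Gaussian mixture background model: each pixel keeps a few (mean, variance, weight) components, the matching component is updated online, and high-weight components are treated as background. The update rules below follow the classic Stauffer-Grimson scheme, not necessarily the paper's simplification:

import numpy as np

def update_pixel_gmm(intensity, means, variances, weights, lr=0.05):
    # means, variances, weights: arrays of shape (K,) for one pixel's K components
    d2 = (intensity - means) ** 2
    match = d2 < 6.25 * variances            # within 2.5 standard deviations
    if match.any():
        k = int(np.argmax(match))            # first matching component
        means[k] += lr * (intensity - means[k])
        variances[k] += lr * (d2[k] - variances[k])
        weights = (1 - lr) * weights
        weights[k] += lr
    else:                                    # no match: replace the weakest component
        k = int(np.argmin(weights))
        means[k], variances[k], weights[k] = intensity, 30.0, lr
    return means, variances, weights / weights.sum()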

Collaboration

Dive into Wen Zhou's collaborations.

Top Co-Authors

Baihua Xiao (Chinese Academy of Sciences)
Chunheng Wang (Chinese Academy of Sciences)
Zhong Zhang (Chinese Academy of Sciences)
Shuang Liu (Chinese Academy of Sciences)
Cunzhao Shi (Chinese Academy of Sciences)
Song Gao (Chinese Academy of Sciences)
Long Ma (Chinese Academy of Sciences)
Yunxue Shao (Chinese Academy of Sciences)
Liang Han (Tianjin Normal University)
Shuaiqi Liu (Tianjin Normal University)