
Publication


Featured research published by Shaoxin Li.


International Conference on Multimodal Interfaces | 2014

Combining Multiple Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild

Mengyi Liu; Ruiping Wang; Shaoxin Li; Shiguang Shan; Zhiwu Huang; Xilin Chen

In this paper, we present the method for our submission to the Emotion Recognition in the Wild Challenge (EmotiW 2014). The challenge is to automatically classify the emotions acted by human subjects in video clips in real-world environments. In our method, each video clip is represented by three types of image set models (i.e., linear subspace, covariance matrix, and Gaussian distribution), all of which can be viewed as points residing on Riemannian manifolds. Different Riemannian kernels are then employed on these set models for similarity/distance measurement. For classification, three types of classifiers, i.e., kernel SVM, logistic regression, and partial least squares, are investigated for comparison. Finally, an optimal fusion of classifiers learned from different kernels and different modalities (video and audio) is conducted at the decision level to further boost performance. We perform an extensive evaluation on the challenge data (including the validation set and the blind test set) and evaluate the effects of different strategies in our pipeline. The final recognition accuracy reaches 50.4% on the test set, a significant gain of 16.7 percentage points over the challenge baseline of 33.7%.
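
The following is a minimal sketch, not the authors' released code, of one of the three set models mentioned above: a clip is summarized by the covariance matrix of its frame features and two clips are compared with an RBF kernel on the log-Euclidean distance. Feature dimensions and the gamma value are illustrative assumptions.

```python
import numpy as np

def covariance_model(frame_features, eps=1e-6):
    """frame_features: (n_frames, d) array of per-frame descriptors."""
    cov = np.cov(frame_features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])   # keep the matrix strictly positive definite

def spd_log(cov):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(np.log(vals)) @ vecs.T

def log_euclidean_kernel(cov_a, cov_b, gamma=0.1):
    """RBF kernel on the log-Euclidean distance between two SPD matrices."""
    dist_sq = np.linalg.norm(spd_log(cov_a) - spd_log(cov_b), 'fro') ** 2
    return np.exp(-gamma * dist_sq)

# Toy usage with random frame descriptors (dimensions are placeholders).
clip_a = covariance_model(np.random.randn(40, 16))
clip_b = covariance_model(np.random.randn(55, 16))
print(log_euclidean_kernel(clip_a, clip_b))
```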


IEEE International Conference on Automatic Face & Gesture Recognition | 2013

AU-aware Deep Networks for facial expression recognition

Mengyi Liu; Shaoxin Li; Shiguang Shan; Xilin Chen

In this paper, we propose to construct a deep architecture, AU-aware Deep Networks (AUDN), for facial expression recognition by elaborately exploiting the prior knowledge that the appearance variations caused by expressions can be decomposed into a batch of local facial Action Units (AUs). The proposed AUDN is composed of three sequential modules. The first module consists of two layers, i.e., a convolution layer and a max-pooling layer, which aim to generate an over-complete representation encoding all expression-specific appearance variations over all possible locations. In the second module, an AU-aware receptive field layer is designed to search for subsets of the over-complete representation, each of which aims to best simulate the combination of AUs. In the last module, multilayer Restricted Boltzmann Machines (RBMs) are exploited to learn hierarchical features, which are then concatenated for final expression recognition. Experiments on three expression databases, CK+, MMI, and SFEW, demonstrate the effectiveness of AUDN in both lab-controlled and wild environments. All of our results are better than, or at least competitive with, the best known results.
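
Below is a minimal sketch, assuming PyTorch, of the first AUDN module described above: a convolution layer followed by max-pooling that yields an over-complete map of local appearance variations. The filter count, kernel size, and input resolution are illustrative choices, not the paper's settings.

```python
import torch
import torch.nn as nn

# Module 1: convolution + max-pooling over an aligned grayscale face image.
overcomplete_module = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

faces = torch.randn(8, 1, 64, 64)        # a toy batch of face images
feature_maps = overcomplete_module(faces)
print(feature_maps.shape)                 # torch.Size([8, 32, 30, 30])
```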


Asian Conference on Computer Vision | 2014

Deeply Learning Deformable Facial Action Parts Model for Dynamic Expression Analysis

Mengyi Liu; Shaoxin Li; Shiguang Shan; Ruiping Wang; Xilin Chen

Expressions are facial activities invoked by sets of muscle motions, which give rise to large appearance variations mainly around facial parts. For vision-based expression analysis, localizing these action parts and encoding them effectively are therefore two essential yet challenging problems. To address them jointly, in this paper we propose to adapt 3D Convolutional Neural Networks (3D CNNs) with deformable action parts constraints. Specifically, we incorporate a deformable parts learning component into the 3D CNN framework, which can detect specific facial action parts under structured spatial constraints and simultaneously obtain a discriminative part-based representation. The proposed method is evaluated on two posed expression datasets, CK+ and MMI, and a spontaneous dataset, FERA. We show that, besides achieving state-of-the-art expression recognition accuracy, our method also enjoys the intuitive appeal that the part detection map can desirably encode the mid-level semantics of different facial action parts.
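
A minimal sketch, assuming PyTorch, of the spatio-temporal convolution at the core of the approach above: a 3D convolution slides over the temporal axis of a clip so that action parts can be detected jointly in space and time. The deformable parts constraint itself is omitted, and all shapes are illustrative.

```python
import torch
import torch.nn as nn

conv3d = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=(3, 5, 5))

clip = torch.randn(2, 1, 9, 64, 64)   # (batch, channels, frames, height, width)
part_maps = conv3d(clip)
print(part_maps.shape)                 # torch.Size([2, 16, 7, 60, 60])
```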


European Conference on Computer Vision | 2012

Morphable displacement field based image matching for face recognition across pose

Shaoxin Li; Xin Liu; Xiujuan Chai; Haihong Zhang; Shihong Lao; Shiguang Shan

Fully automatic Face Recognition Across Pose (FRAP) is one of the most desirable yet most challenging tasks in the face recognition field. Matching a pair of face images in different poses can be converted into matching the pixels corresponding to the same semantic facial points. Following this idea, given two images G and P in different poses, we propose a novel method, named Morphable Displacement Field (MDF), to match G with P's virtual view under G's pose. By formulating the MDF as a convex combination of a number of template displacement fields generated from a 3D face database, our model satisfies both global conformity and local consistency. We further present an approximate but effective solution of the proposed MDF model, named implicit Morphable Displacement Field (iMDF), which synthesizes the virtual view implicitly via an MDF by minimizing the matching residual. This formulation not only avoids intractable optimization of the high-dimensional displacement field but also facilitates a constrained quadratic optimization. The proposed method works well even when only two facial landmarks are labeled, which makes it especially suitable for fully automatic FRAP systems. Extensive evaluations on the FERET, PIE, and Multi-PIE databases show considerable improvement over state-of-the-art FRAP algorithms under both semi-automatic and fully automatic evaluation protocols.
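
The following is a minimal sketch of the core MDF idea described above: a displacement field built as a convex combination of template fields, used to warp a probe image toward the gallery pose. The templates here are random stand-ins for fields derived from a 3D face database, and image sizes are arbitrary.

```python
import numpy as np
from scipy.ndimage import map_coordinates

h, w, n_templates = 64, 64, 5
templates = np.random.randn(n_templates, 2, h, w)   # per-template (dy, dx) fields

weights = np.random.rand(n_templates)
weights /= weights.sum()                              # enforce a convex combination

field = np.tensordot(weights, templates, axes=1)      # combined field, shape (2, h, w)

probe = np.random.rand(h, w)                          # toy probe image
ys, xs = np.mgrid[0:h, 0:w]
warped = map_coordinates(probe, [ys + field[0], xs + field[1]], order=1, mode='nearest')
```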


Neurocomputing | 2015

AU-inspired Deep Networks for Facial Expression Feature Learning

Mengyi Liu; Shaoxin Li; Shiguang Shan; Xilin Chen

Most existing approaches to facial expression recognition rely on off-the-shelf feature extraction methods for classification. In this paper, aiming at learning better features specific to expression representation, we propose to construct a deep architecture, AU-inspired Deep Networks (AUDN), inspired by the psychological theory that expressions can be decomposed into multiple facial Action Units (AUs). To fully exploit this inspiration while avoiding explicit AU detection, we propose to automatically learn: (1) informative local appearance variations; (2) an optimal way of combining these local variations; and (3) high-level representations for final expression recognition. Accordingly, the proposed AUDN is composed of three sequential modules. First, we build a convolution layer and a max-pooling layer to learn the Micro-Action-Pattern (MAP) representation, which explicitly depicts local appearance variations caused by facial expressions. Second, feature grouping is applied to simulate larger receptive fields by adaptively combining correlated MAPs, aiming to generate more abstract mid-level semantics. Finally, a multi-layer learning process is employed in each receptive field to construct group-wise sub-networks for higher-level representations. Experiments on three expression databases, CK+, MMI, and SFEW, demonstrate that, by simply applying linear classifiers to the learned features, our method achieves state-of-the-art results on all the databases, which validates the effectiveness of AUDN in both lab-controlled and wild environments.
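
A minimal sketch of the feature-grouping step described above: correlated MAP responses are clustered so that each group acts as a larger receptive field. Hierarchical clustering is used here as a simple stand-in for the paper's grouping procedure, and the response matrix is random toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

n_samples, n_maps = 200, 32
map_responses = np.random.randn(n_samples, n_maps)   # toy MAP activations

corr = np.corrcoef(map_responses, rowvar=False)       # (n_maps, n_maps) correlation
dist = 1.0 - np.abs(corr)                              # correlated maps -> small distance
condensed = dist[np.triu_indices(n_maps, k=1)]         # condensed distance vector
labels = fcluster(linkage(condensed, method='average'), t=4, criterion='maxclust')

groups = [np.where(labels == g)[0] for g in np.unique(labels)]
print([len(g) for g in groups])                        # sizes of the receptive-field groups
```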


International Conference on Computer Vision | 2015

AgeNet: Deeply Learned Regressor and Classifier for Robust Apparent Age Estimation

Xin Liu; Shaoxin Li; Meina Kan; Jie Zhang; Shuzhe Wu; Wenxian Liu; Hu Han; Shiguang Shan; Xilin Chen

Apparent age estimation from face images has attracted increasing attention because it is valuable in several real-world applications. In this work, we propose an end-to-end learning approach for robust apparent age estimation, named AgeNet. Specifically, we address the apparent age estimation problem by fusing two kinds of models: real-value-based regression models and Gaussian label distribution based classification models. For both kinds of models, large-scale deep convolutional neural networks are adopted to learn informative age representations. Another key feature of the proposed AgeNet is that, to avoid over-fitting on the small apparent-age training set, we exploit a general-to-specific transfer learning scheme. Technically, AgeNet is first pre-trained on a large-scale web-collected face dataset with identity labels, then fine-tuned on a large-scale real-age dataset with noisy age labels, and finally fine-tuned on a small training set with apparent age labels. The experimental results on the ChaLearn 2015 Apparent Age Competition demonstrate that our AgeNet achieves state-of-the-art performance in apparent age estimation.
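
A minimal sketch of the Gaussian label-distribution encoding used by the classification branch described above: an age label is spread over discrete age bins with a Gaussian, and a prediction is decoded as the expectation over bins. The bin range and sigma are illustrative assumptions, not the paper's settings.

```python
import numpy as np

ages = np.arange(0, 101)                  # discrete age bins 0..100

def encode(age, sigma=3.0):
    """Turn a scalar age into a Gaussian probability distribution over bins."""
    dist = np.exp(-0.5 * ((ages - age) / sigma) ** 2)
    return dist / dist.sum()

def decode(prob):
    """Decode a distribution over bins back to a scalar age (its expectation)."""
    return float(np.dot(prob, ages))

target = encode(25)
print(decode(target))                      # ~25.0
```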


IEEE International Conference on Automatic Face & Gesture Recognition | 2015

Report on the FG 2015 Video Person Recognition Evaluation

J. Ross Beveridge; Hao Zhang; Bruce A. Draper; Patrick J. Flynn; Zhen-Hua Feng; Patrik Huber; Josef Kittler; Zhiwu Huang; Shaoxin Li; Yan Li; Meina Kan; Ruiping Wang; Shiguang Shan; Xilin Chen; Haoxiang Li; Gang Hua; Vitomir Struc; Janez Krizaj; Changxing Ding; Dacheng Tao; P. Jonathon Phillips

This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted, high-quality video camera; the second contained videos acquired from five different handheld video cameras. Each experiment comprised 1,401 videos of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The handheld-video experiment was also included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition, and the top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing.
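
For readers unfamiliar with the verification-rate metric reported above, the following is a minimal sketch of how it is typically computed: given similarity scores for genuine and impostor pairs, the verification rate is the fraction of genuine pairs accepted at the threshold that yields a chosen false accept rate. The 1% operating point and the toy score distributions are illustrative, not the evaluation's exact protocol.

```python
import numpy as np

def verification_rate(genuine_scores, impostor_scores, far=0.01):
    """Fraction of genuine pairs above the threshold set by the target FAR."""
    threshold = np.quantile(impostor_scores, 1.0 - far)
    return float(np.mean(np.asarray(genuine_scores) > threshold))

genuine = np.random.normal(0.7, 0.1, 1000)     # toy genuine-pair similarities
impostor = np.random.normal(0.3, 0.1, 10000)   # toy impostor-pair similarities
print(verification_rate(genuine, impostor, far=0.01))
```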


Asian Conference on Computer Vision | 2012

Relative forest for attribute prediction

Shaoxin Li; Shiguang Shan; Xilin Chen

Human-namable visual attributes are promising for leveraging various recognition tasks: intuitively, the more accurate the attribute prediction is, the more those tasks can benefit. Relative attributes [1] learn a ranking function per attribute, which provides more accurate attribute predictions and thus shows clear advantages over earlier binary attributes. In this paper, we inherit the idea of learning a ranking function per attribute but improve the algorithm in two aspects. First, we propose a Relative Tree algorithm, which facilitates more accurate nonlinear ranking to capture semantic relationships. Second, we develop a Relative Forest algorithm, which resorts to randomized learning to reduce the training time of Relative Tree; benefiting from the ensemble of multiple trees, Relative Forest achieves even more accurate final rankings. To show the effectiveness of the proposed method, we first compare the Relative Tree method with Relative Attributes on the PubFig and OSR datasets. Then, to verify the efficiency of the Relative Forest algorithm, we conduct an age estimation evaluation on the FG-NET dataset. With much less training time than Relative Attributes and Relative Tree, the proposed Relative Forest achieves state-of-the-art age estimation accuracy. Finally, experiments on the large-scale SUN Attribute database show the scalability of the proposed Relative Forest.
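
A minimal sketch of the ranking-function idea discussed above, using an ordinary random forest regressor on pairwise feature differences as a rough stand-in. This is not the Relative Tree/Forest algorithm itself, only an illustration of learning "A shows more of the attribute than B" from ordered pairs; the data and attribute are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
strength = X[:, 0] + 0.1 * rng.normal(size=300)    # hidden attribute strength

# Build ordered pairs (i, j) labeled +1 if i outranks j, else -1.
i, j = rng.integers(0, 300, 2000), rng.integers(0, 300, 2000)
pairs, labels = X[i] - X[j], np.sign(strength[i] - strength[j])

ranker = RandomForestRegressor(n_estimators=50).fit(pairs, labels)

def outranks(a, b):
    """Positive output means a is predicted to have more of the attribute than b."""
    return ranker.predict((a - b).reshape(1, -1))[0]

print(outranks(X[0], X[1]))
```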


IEEE Transactions on Image Processing | 2014

Maximal likelihood correspondence estimation for face recognition across pose

Shaoxin Li; Xin Liu; Xiujuan Chai; Haihong Zhang; Shihong Lao; Shiguang Shan

Due to the misalignment of image features, the performance of many conventional face recognition methods degrades considerably in cross-pose scenarios. To address this problem, many image-matching-based methods have been proposed to estimate semantic correspondence between faces in different poses. In this paper, we aim to solve two critical problems of previous image-matching-based correspondence learning methods: 1) they fail to fully exploit face-specific structure information in correspondence estimation, and 2) they fail to learn a personalized correspondence for each probe image. To this end, we first build a model, termed morphable displacement field (MDF), to encode the face-specific structure of semantic correspondence from a set of real correspondences computed from 3D face models. We then propose a maximal likelihood correspondence estimation (MLCE) method to learn personalized correspondence based on a maximal-likelihood frontal-face assumption. After obtaining the semantic correspondence encoded in the learned displacement, we can synthesize virtual frontal images of profile faces for subsequent recognition. Using linear discriminant analysis with pixel-intensity features, state-of-the-art performance is achieved on three multi-pose benchmarks, i.e., the CMU-PIE, FERET, and Multi-PIE databases. Owing to the rational MDF regularization and the novel maximal likelihood objective, the proposed MLCE method can reliably learn correspondence between faces in different poses even in complex wild environments, i.e., on the Labeled Faces in the Wild database.
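
A minimal sketch of the kind of constrained quadratic optimization the MDF formulation reduces to: find convex combination weights w (w >= 0, sum(w) = 1) over template displacement fields that minimize a matching residual. The matrix A and vector b are random stand-ins for linearized residual terms, so this only illustrates the optimization structure, not the full MLCE objective.

```python
import numpy as np
from scipy.optimize import minimize

n_templates = 8
A = np.random.randn(500, n_templates)   # per-template residual contributions (toy)
b = np.random.randn(500)                # target residual (toy)

objective = lambda w: np.sum((A @ w - b) ** 2)
constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)   # weights sum to 1
bounds = [(0.0, 1.0)] * n_templates                                  # non-negative weights
w0 = np.full(n_templates, 1.0 / n_templates)                         # uniform start

result = minimize(objective, w0, bounds=bounds, constraints=constraints, method='SLSQP')
print(result.x)                          # learned convex weights over the templates
```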


Journal on Multimodal User Interfaces | 2016

Video modeling and learning on Riemannian manifold for emotion recognition in the wild

Mengyi Liu; Ruiping Wang; Shaoxin Li; Zhiwu Huang; Shiguang Shan; Xilin Chen

In this paper, we present the method for our submission to the Emotion Recognition in the Wild challenge (EmotiW). The challenge is to automatically classify the emotions acted by human subjects in video clips in real-world environments. In our method, each video clip is represented by three types of image set models (i.e., linear subspace, covariance matrix, and Gaussian distribution), all of which can be viewed as points residing on Riemannian manifolds. Different Riemannian kernels are then employed on these set models for similarity/distance measurement. For classification, three types of classifiers, i.e., kernel SVM, logistic regression, and partial least squares, are investigated for comparison. Finally, an optimal fusion of classifiers learned from different kernels and different modalities (video and audio) is conducted at the decision level to further boost performance. We perform extensive evaluations on the EmotiW 2014 challenge data (including the validation set and the blind test set) and evaluate the effects of different components in our pipeline. Our method achieves the best performance reported so far on this data. To further evaluate generalization ability, we also perform experiments on the EmotiW 2013 data and two well-known lab-controlled databases, CK+ and MMI. The results show that the proposed framework significantly outperforms state-of-the-art methods.
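
A minimal sketch of the decision-level fusion step described above: class probabilities from classifiers trained on different kernels or modalities are combined with a weight chosen on a validation set. The two score matrices, the weight grid, and the seven-class setup are illustrative placeholders.

```python
import numpy as np

def fuse(scores_video, scores_audio, weight):
    """Weighted sum of two (n_samples, n_classes) probability matrices."""
    return weight * scores_video + (1.0 - weight) * scores_audio

def pick_weight(scores_video, scores_audio, labels, grid=np.linspace(0, 1, 21)):
    """Choose the fusion weight that maximizes accuracy on validation labels."""
    accs = [np.mean(fuse(scores_video, scores_audio, w).argmax(1) == labels) for w in grid]
    return grid[int(np.argmax(accs))]

video = np.random.dirichlet(np.ones(7), size=100)   # toy 7-class emotion scores
audio = np.random.dirichlet(np.ones(7), size=100)
labels = np.random.randint(0, 7, 100)
print(pick_weight(video, audio, labels))
```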

Collaboration


Dive into Shaoxin Li's collaborations.

Top Co-Authors

Shiguang Shan (Chinese Academy of Sciences)
Xilin Chen (Chinese Academy of Sciences)
Mengyi Liu (Chinese Academy of Sciences)
Xin Liu (Chinese Academy of Sciences)
Ruiping Wang (Chinese Academy of Sciences)
Meina Kan (Chinese Academy of Sciences)
Zhiwu Huang (Chinese Academy of Sciences)
Xiujuan Chai (Chinese Academy of Sciences)