Shizhong Han
University of South Carolina
Publication
Featured research published by Shizhong Han.
Computer Vision and Pattern Recognition | 2014
Ping Liu; Shizhong Han; Zibo Meng; Yan Tong
A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, feature set, and classifier to achieve good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) that performs the three training stages iteratively in a unified loopy framework. Through the proposed BDBN framework, a set of features that effectively characterize expression-related facial appearance/shape changes can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and, more importantly, the discriminative capabilities of the selected features are strengthened as well, according to their relative importance to the strong classifier, via a joint fine-tuning process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded dramatic improvements in facial expression analysis.
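The loopy training scheme can be pictured as a boosting loop wrapped around feature learning and selection. Below is a minimal sketch in NumPy of that learn-select-boost cycle, assuming decision stumps as stand-ins for the DBN-based weak learners and random data in place of learned facial features; the joint fine-tuning of the deep network is only indicated by a comment.

```python
# A minimal sketch of the boosted training loop described above, using plain
# NumPy. The real BDBN learns features with deep belief networks; here the
# "features" are hypothetical placeholders and the weak learners are simple
# decision stumps, just to illustrate the learn -> select -> boost cycle.
import numpy as np

rng = np.random.default_rng(0)

def weak_predict(x_feat, threshold, polarity):
    """Decision stump on a single learned feature."""
    return polarity * np.sign(x_feat - threshold)

def boosted_training_loop(X, y, n_rounds=10):
    n_samples, n_features = X.shape
    weights = np.full(n_samples, 1.0 / n_samples)   # AdaBoost sample weights
    strong = []                                      # selected weak learners
    for _ in range(n_rounds):
        # --- feature selection: pick the stump with the lowest weighted error ---
        best = None
        for j in range(n_features):
            thr = np.median(X[:, j])
            for pol in (+1, -1):
                pred = weak_predict(X[:, j], thr, pol)
                err = np.sum(weights[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        strong.append((alpha, j, thr, pol))
        # --- boosting: re-weight misclassified samples ---
        pred = weak_predict(X[:, j], thr, pol)
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
        # --- the joint fine-tuning of the DBN features would happen here ---
    return strong

def strong_predict(strong, X):
    score = sum(a * weak_predict(X[:, j], t, p) for a, j, t, p in strong)
    return np.sign(score)

# toy usage with random data standing in for learned facial features
X = rng.normal(size=(200, 16))
y = np.sign(X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=200))
model = boosted_training_loop(X, y)
print("training accuracy:", np.mean(strong_predict(model, X) == y))
```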
European Conference on Computer Vision | 2014
Ping Liu; Joey Tianyi Zhou; Ivor W. Tsang; Zibo Meng; Shizhong Han; Yan Tong
Studies in psychology show that not all facial regions are of importance in recognizing facial expressions, and different facial regions make different contributions to various facial expressions. Motivated by this, a novel framework, named Feature Disentangling Machine (FDM), is proposed to effectively select active features characterizing facial expressions. More importantly, the FDM aims to disentangle these selected features into non-overlapping groups: common features that are shared across different expressions and expression-specific features that are discriminative only for a target expression. Specifically, the FDM integrates a sparse support vector machine and multi-task learning in a unified framework, where a novel loss function and a set of constraints are formulated to precisely control the sparsity and naturally disentangle active features. Extensive experiments on two well-known facial expression databases have demonstrated that the FDM outperforms state-of-the-art methods for facial expression analysis. More importantly, the FDM achieves impressive performance in cross-database validation, which demonstrates the generalization capability of the selected features.
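One way to read the disentangling objective is as a multi-task sparse model in which each expression's weight vector splits into a shared part and an expression-specific part, both L1-regularized. The sketch below illustrates that reading with toy data and a proximal-gradient loop; the decomposition, penalties, and optimizer are assumptions for illustration, not the paper's exact formulation.

```python
# Rough multi-task sparse-hinge sketch (not the authors' optimizer): each
# expression t uses weights w_common + w_specific[t], with L1 penalties on
# both parts so that common and expression-specific features stay sparse.
import numpy as np

rng = np.random.default_rng(1)
n_tasks, n_samples, n_features = 3, 150, 20

# toy one-vs-rest problems, one per expression, with labels in {-1, +1}
X = [rng.normal(size=(n_samples, n_features)) for _ in range(n_tasks)]
y = [np.sign(X[t][:, t] + 0.1 * rng.normal(size=n_samples)) for t in range(n_tasks)]

def soft_threshold(w, tau):
    """Proximal operator of the L1 norm: shrinks toward zero, giving exact zeros."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

w_common = np.zeros(n_features)                 # features shared by all expressions
w_specific = np.zeros((n_tasks, n_features))    # expression-specific features
lam_common, lam_specific, lr = 0.01, 0.05, 0.05

for epoch in range(300):
    grad_common = np.zeros(n_features)
    for t in range(n_tasks):
        w_t = w_common + w_specific[t]          # weights used for expression t
        margins = y[t] * (X[t] @ w_t)
        active = margins < 1                    # samples violating the hinge margin
        if active.any():
            g = -(y[t][active, None] * X[t][active]).mean(axis=0)
        else:
            g = np.zeros(n_features)
        grad_common += g
        w_specific[t] = soft_threshold(w_specific[t] - lr * g, lr * lam_specific)
    w_common = soft_threshold(w_common - lr * grad_common, lr * lam_common)

print("nonzero common features:", int(np.sum(w_common != 0)))
print("nonzero expression-specific features:", (w_specific != 0).sum(axis=1))
```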
IEEE International Conference on Automatic Face and Gesture Recognition | 2013
Ping Liu; Shizhong Han; Yan Tong
Facial activity is the most direct signal for perceiving emotional states in people. Emotion analysis from facial displays has attracted increasing attention because of its wide applications, from human-centered computing to neuropsychiatry. Recently, image representations based on sparse coding have shown promising results in facial expression recognition. In this paper, we introduce a novel image representation for facial expression analysis. Specifically, we propose to use histograms of nonnegative sparse coded image features to represent a facial image. In order to capture fine appearance variations caused by facial expressions, a logarithmic transformation is further applied to each nonnegative sparse coded feature. In addition, the proposed Histograms of Log-Transformed Nonnegative Sparse Coding (HLNNSC) features are calculated and organized in a pyramid-like structure such that the spatial relationships among the features are captured and utilized to enhance facial expression recognition performance. Extensive experiments on the Cohn-Kanade database show that the proposed approach yields a significant improvement in facial expression recognition and outperforms the other sparse coding based baseline approaches. Furthermore, experimental results on the GEMEP-FERA2011 dataset demonstrate that the proposed approach is promising for recognition in less controlled and thus more challenging environments.
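The representation pipeline (nonnegative sparse coding, a log(1 + c) transform, and pyramid-structured pooling) can be sketched compactly. The example below uses a random dictionary, a synthetic patch grid, and hypothetical sizes; the paper would learn the dictionary from face patches extracted from real images.

```python
# A minimal sketch of the pipeline described above, using NumPy only. The
# dictionary is random and the "image" is synthetic; the point is to show
# nonnegative sparse coding, the log(1 + c) transform, and pyramid pooling.
import numpy as np

rng = np.random.default_rng(2)

def nonneg_sparse_code(patches, dictionary, alpha=0.1, n_iter=50):
    """Nonnegative ISTA: minimize ||x - cD||^2 + alpha*||c||_1 subject to c >= 0."""
    D = dictionary                                  # (n_atoms, patch_dim)
    L = np.linalg.norm(D, 2) ** 2                   # Lipschitz constant of the gradient
    codes = np.zeros((patches.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (codes @ D - patches) @ D.T
        codes = np.maximum(codes - (grad + alpha) / L, 0.0)
    return codes

def pyramid_histograms(codes, grid_hw, levels=(1, 2, 4)):
    """Pool log-transformed codes over spatial cells at several pyramid levels."""
    h, w = grid_hw
    feat = np.log1p(codes).reshape(h, w, -1)        # log(1 + c) on each code
    hist = []
    for cells in levels:
        for i in range(cells):
            for j in range(cells):
                cell = feat[i * h // cells:(i + 1) * h // cells,
                            j * w // cells:(j + 1) * w // cells]
                hist.append(cell.sum(axis=(0, 1)))  # histogram over the cell
    return np.concatenate(hist)

# toy data: an 8x8 grid of 6x6 patches from a synthetic image
patch_dim, n_atoms, grid = 36, 64, (8, 8)
patches = rng.random((grid[0] * grid[1], patch_dim))
dictionary = rng.random((n_atoms, patch_dim))
codes = nonneg_sparse_code(patches, dictionary)
descriptor = pyramid_histograms(codes, grid)
print("HLNNSC-style descriptor length:", descriptor.shape[0])   # 64 * (1 + 4 + 16)
```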
IEEE International Conference on Automatic Face and Gesture Recognition | 2017
Zibo Meng; Ping Liu; Jie Cai; Shizhong Han; Yan Tong
Facial expression recognition suffers under real-world conditions, especially on unseen subjects, due to high inter-subject variations. To alleviate variations introduced by personal attributes and achieve better facial expression recognition performance, a novel identity-aware convolutional neural network (IACNN) is proposed. In particular, a CNN with a new architecture is employed as the individual streams of a bi-stream identity-aware network. An expression-sensitive contrastive loss is developed to measure the expression similarity to ensure that the features learned by the network are invariant to expression variations. More importantly, an identity-sensitive contrastive loss is proposed to learn identity-related information from identity labels to achieve identity-invariant expression recognition. Extensive experiments on three public databases, including a spontaneous facial expression database, have shown that the proposed IACNN achieves promising results in the real world.
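A hedged sketch of the two contrastive terms is given below in NumPy. How the expression and identity streams are combined, the margin value, and the feature extractor are all assumptions here; the paper uses these losses to train the bi-stream CNN end to end.

```python
# Illustration only: the two contrastive terms computed on the features of one
# image pair. The feature vectors are random placeholders standing in for the
# outputs of the expression stream and the identity stream of the network.
import numpy as np

def contrastive_loss(f1, f2, same, margin=1.0):
    """Pull features together when `same` is True, push them apart otherwise."""
    d = np.linalg.norm(f1 - f2)
    return 0.5 * d ** 2 if same else 0.5 * max(margin - d, 0.0) ** 2

def iacnn_style_loss(feat_expr, feat_id, same_expression, same_identity):
    """Combine an expression-sensitive and an identity-sensitive term for one pair.

    feat_expr / feat_id: (2, dim) expression and identity features of the pair.
    """
    expr_term = contrastive_loss(feat_expr[0], feat_expr[1], same_expression)
    id_term = contrastive_loss(feat_id[0], feat_id[1], same_identity)
    return expr_term + id_term

# toy usage: a pair with the same expression but different identities
rng = np.random.default_rng(3)
feat_expr = rng.normal(size=(2, 128))
feat_id = rng.normal(size=(2, 128))
print("pair loss:", iacnn_style_loss(feat_expr, feat_id,
                                     same_expression=True, same_identity=False))
```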
International Conference on Image Processing | 2014
Shizhong Han; Zibo Meng; Ping Liu; Yan Tong
Face registration is a major and critical step for face analysis. Existing facial activity recognition systems often employ coarse face alignment based on a few fiducial points, such as the eyes, and extract features from an equal-sized grid. Such features are susceptible to variations in face pose, facial deformation, and person-specific geometry. In this work, we propose a novel face registration method, named facial grid transformation, to improve feature extraction for recognizing facial Action Units (AUs). Based on the transformed grid, novel grid edge features are developed to capture local facial motions related to AUs. Extensive experiments on two well-known AU-coded databases have demonstrated that the proposed method yields significant improvements over methods based on an equal-sized grid on both posed and, more importantly, spontaneous facial displays. Furthermore, the proposed method also outperforms state-of-the-art methods using either coarse alignment or mesh-based face registration.
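The notion of grid edge features can be illustrated with a toy example: place grid nodes (in the paper these come from the transformed facial grid), then describe local motion by how each grid edge stretches between a neutral and an expressive frame. The node placement and the edge descriptor below are simplified assumptions, not the paper's exact design.

```python
# Toy illustration of edge-based features on a face-aligned grid: each feature
# is the relative length change of one grid edge between two frames.
import numpy as np

def grid_edges(rows, cols):
    """4-connected edges of a rows x cols grid, as pairs of node indices."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))       # horizontal edge
            if r + 1 < rows:
                edges.append((node, node + cols))    # vertical edge
    return edges

def grid_edge_features(nodes_neutral, nodes_current, edges):
    """Relative length change of every grid edge between the two frames."""
    feats = []
    for a, b in edges:
        l0 = np.linalg.norm(nodes_neutral[a] - nodes_neutral[b])
        l1 = np.linalg.norm(nodes_current[a] - nodes_current[b])
        feats.append((l1 - l0) / (l0 + 1e-8))
    return np.array(feats)

# toy usage: a 6x6 grid whose nodes would be anchored to the transformed facial grid
rng = np.random.default_rng(4)
rows = cols = 6
base = np.stack(np.meshgrid(np.arange(cols), np.arange(rows)), -1).reshape(-1, 2).astype(float)
moved = base + 0.1 * rng.normal(size=base.shape)     # simulated local facial motion
feats = grid_edge_features(base, moved, grid_edges(rows, cols))
print("edge feature vector length:", feats.shape[0])
```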
IEEE Transactions on Affective Computing | 2017
Zibo Meng; Shizhong Han; Yan Tong
Extensive efforts have been devoted to recognizing facial action units (AUs). However, it is still challenging to recognize AUs from spontaneous facial displays especially when they are accompanied by speech. Different from all prior work that utilized visual observations for facial AU recognition, this paper presents a novel approach that recognizes speech-related AUs exclusively from audio signals based on the fact that facial activities are highly correlated with voice during speech. Specifically, dynamic and physiological relationships between AUs and phonemes are modeled through a continuous time Bayesian network (CTBN); then AU recognition is performed by probabilistic inference via the CTBN model. A pilot audiovisual AU-coded database has been constructed to evaluate the proposed audio-based AU recognition framework. The database consists of a “clean” subset with frontal and neutral faces and a challenging subset collected with large head movements and occlusions. Experimental results on this database show that the proposed CTBN model achieves promising recognition performance for 7 speech-related AUs and outperforms both the state-of-the-art visual-based and audio-based methods especially for those AUs that are activated at low intensities or “hardly visible” in the visual channel. The improvement is more impressive on the challenging subset, where the visual-based approaches suffer significantly.
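Continuous time Bayesian network inference is beyond a short snippet, but the basic idea of inferring an AU's activation from a phoneme stream can be hinted at with a much-simplified, discrete-time stand-in: a two-state forward pass whose dynamics are conditioned on the current phoneme. All probabilities below are made-up placeholders, not values from the paper.

```python
# Simplified discrete-time stand-in for phoneme-conditioned AU inference.
# States: 0 = AU off, 1 = AU on. Each phoneme selects a transition matrix.
import numpy as np

TRANSITION = {
    "p":   np.array([[0.6, 0.4],    # bilabial sounds tend to activate lip-related AUs
                     [0.2, 0.8]]),
    "aa":  np.array([[0.9, 0.1],
                     [0.5, 0.5]]),
    "sil": np.array([[0.95, 0.05],
                     [0.70, 0.30]]),
}

def forward_au_inference(phonemes, prior=np.array([0.9, 0.1])):
    """Return P(AU on) after each phoneme in the sequence."""
    belief = prior.copy()
    trajectory = []
    for ph in phonemes:
        belief = belief @ TRANSITION[ph]   # propagate through phoneme-conditioned dynamics
        trajectory.append(belief[1])
    return trajectory

print(forward_au_inference(["sil", "p", "aa", "p", "sil"]))
```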
International Journal of Multimedia Data Engineering and Management | 2016
Min Chen; Zibo Meng; Shizhong Han; Yan Tong
Recognizing facial actions is challenging, especially when they are accompanied by speech. Instead of employing information solely from the visual channel, this work aims to exploit information from both the visual and audio channels in recognizing speech-related facial action units (AUs). In this work, two feature-level fusion methods are proposed. The first method is based on a type of handcrafted visual feature. The other method utilizes visual features learned by a deep convolutional neural network (CNN). For both methods, features are independently extracted from the visual and audio channels and aligned to handle the difference in time scales and the time shift between the two signals. These temporally aligned features are integrated via feature-level fusion for AU recognition. Experimental results on a new audiovisual AU-coded dataset have demonstrated that both fusion methods outperform their visual counterparts in recognizing speech-related AUs. The improvement is more impressive with occlusions on the facial images, which would not affect the audio channel.
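The temporal alignment and feature-level fusion step can be sketched as resampling the audio features onto the video frame times and concatenating per frame. The frame rates, the fixed audio-visual lag, and the feature dimensions below are illustrative assumptions.

```python
# Minimal sketch of temporal alignment plus feature-level fusion: hypothetical
# visual features at 30 fps and audio features at 100 fps are interpolated onto
# a common time axis, a fixed lag is compensated, and the streams are concatenated.
import numpy as np

def align_and_fuse(visual, audio, fps_visual=30.0, fps_audio=100.0, lag_s=0.05):
    """Resample audio features to the video frame times and concatenate."""
    n_v, n_a = len(visual), len(audio)
    t_visual = np.arange(n_v) / fps_visual
    t_audio = np.arange(n_a) / fps_audio + lag_s         # audio shifted by the lag
    # linearly interpolate each audio feature dimension at the video frame times
    audio_at_video = np.stack(
        [np.interp(t_visual, t_audio, audio[:, d]) for d in range(audio.shape[1])],
        axis=1)
    return np.hstack([visual, audio_at_video])            # feature-level fusion

# toy usage: 3 s of video (90 frames) and audio (300 frames)
rng = np.random.default_rng(5)
visual = rng.normal(size=(90, 64))     # e.g. CNN or handcrafted visual features
audio = rng.normal(size=(300, 13))     # e.g. MFCC-style audio features
fused = align_and_fuse(visual, audio)
print("fused feature shape:", fused.shape)                # (90, 77)
```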
Neural Information Processing Systems | 2016
Shizhong Han; Zibo Meng; Ahmed Shehab Khan; Yan Tong
Computer Vision and Pattern Recognition | 2018
Shizhong Han; Zibo Meng; Zhiyuan Li; James O'Reilly; Jie Cai; Xiaofeng Wang; Yan Tong
International Symposium on Multimedia | 2015
Zibo Meng; Shizhong Han; Min Chen; Yan Tong