Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yuan Zong is active.

Publication


Featured research published by Yuan Zong.


IEEE Transactions on Multimedia | 2016

A Deep Neural Network-Driven Feature Learning Method for Multi-view Facial Expression Recognition

Tong Zhang; Wenming Zheng; Zhen Cui; Yuan Zong; Jingwei Yan; Keyu Yan

In this paper, a novel deep neural network (DNN)-driven feature learning method is proposed and applied to multi-view facial expression recognition (FER). In this method, scale-invariant feature transform (SIFT) features corresponding to a set of landmark points are first extracted from each facial image. Then, a feature matrix consisting of the extracted SIFT feature vectors is fed into a well-designed DNN model to learn optimal discriminative features for expression classification. The proposed DNN model employs several layers to characterize the relationship between the SIFT feature vectors and their corresponding high-level semantic information. By training the DNN model, we are able to learn a set of optimal features that are well suited to classifying facial expressions across different facial views. To evaluate the effectiveness of the proposed method, two non-frontal facial expression databases, namely BU-3DFE and Multi-PIE, are used to test our method, and the experimental results show that our algorithm outperforms the state-of-the-art methods.
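
As an illustration of the pipeline described above (landmark-anchored SIFT descriptors stacked into a feature matrix and classified by a deep network), a minimal sketch follows. The landmark coordinates, network width, and class count are illustrative assumptions, not the authors' exact model.

```python
# Sketch of the abstract's pipeline: SIFT descriptors at facial landmarks -> feature matrix -> DNN.
# Landmark positions, network width, and the 6 expression classes are illustrative assumptions.
import cv2                      # requires opencv-python with SIFT support (>= 4.4)
import numpy as np
import torch
import torch.nn as nn

def landmark_sift(gray_face, landmarks, patch_size=16.0):
    """Compute one 128-D SIFT descriptor per landmark point."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), patch_size) for (x, y) in landmarks]
    _, descriptors = sift.compute(gray_face, keypoints)   # shape: (n_landmarks, 128)
    return descriptors.astype(np.float32)

class SiftExpressionNet(nn.Module):
    """Small fully connected network over the flattened landmark-SIFT matrix."""
    def __init__(self, n_landmarks=49, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_landmarks * 128, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )
    def forward(self, x):          # x: (batch, n_landmarks, 128)
        return self.net(x)

if __name__ == "__main__":
    face = (np.random.rand(128, 128) * 255).astype(np.uint8)   # stand-in face crop
    pts = np.random.randint(8, 120, size=(49, 2))              # stand-in landmarks
    feat = landmark_sift(face, pts)
    logits = SiftExpressionNet()(torch.from_numpy(feat).unsqueeze(0))
    print(logits.shape)   # torch.Size([1, 6])
```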


International Conference on Multimodal Interfaces | 2016

Multi-clue fusion for emotion recognition in the wild

Jingwei Yan; Wenming Zheng; Zhen Cui; Chuangao Tang; Tong Zhang; Yuan Zong; Ning Sun

In the past three years, the Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn increasing attention due to its wide range of potential applications. For the fourth challenge, which targets video-based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework that models human emotion from three mutually complementary sources: facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture, where the face image from each frame is first fed into a fine-tuned VGG-Face network to extract face features, and the features of all frames are then traversed sequentially by a bidirectional RNN to capture dynamic changes in facial texture. To capture facial actions more accurately, a facial landmark trajectory model is proposed to explicitly learn emotion variations of facial components. Furthermore, audio signals are modeled in a CNN framework by extracting low-level energy features from segmented audio clips and stacking them into an image-like map. Finally, we fuse the results generated from the three clues to boost the performance of emotion recognition. Our proposed MCEF achieves an overall accuracy of 56.66%, a large improvement of 16.19% over the baseline.
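
A minimal sketch of the CNN-RNN clue and the score-level fusion described above is given below. The VGG-Face extractor is stubbed with random features, and the feature dimensionality, hidden size, class count, and fusion weights are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch of a CNN-RNN clue plus simple score-level fusion, in the spirit of the MCEF framework.
# Feature sizes, hidden size, class count, and fusion weights are illustrative assumptions.
import torch
import torch.nn as nn

class FrameSequenceEmotionRNN(nn.Module):
    """Bidirectional GRU over per-frame CNN features (e.g., taken from a face network)."""
    def __init__(self, feat_dim=4096, hidden=256, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, frame_feats):            # (batch, n_frames, feat_dim)
        out, _ = self.rnn(frame_feats)
        return self.head(out[:, -1])           # use the last time step's state

def fuse_scores(score_list, weights):
    """Late fusion: weighted sum of per-clue softmax score matrices."""
    probs = [torch.softmax(s, dim=1) * w for s, w in zip(score_list, weights)]
    return torch.stack(probs).sum(dim=0)

if __name__ == "__main__":
    frame_feats = torch.randn(2, 16, 4096)     # stand-in for VGG-Face features of 16 frames
    face_scores = FrameSequenceEmotionRNN()(frame_feats)
    audio_scores = torch.randn(2, 7)           # stand-in for the audio-clue classifier
    landmark_scores = torch.randn(2, 7)        # stand-in for the landmark-trajectory clue
    fused = fuse_scores([face_scores, audio_scores, landmark_scores], [0.5, 0.25, 0.25])
    print(fused.argmax(dim=1))
```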


IEEE Signal Processing Letters | 2016

Cross-Corpus Speech Emotion Recognition Based on Domain-Adaptive Least-Squares Regression

Yuan Zong; Wenming Zheng; Tong Zhang; Xiaohua Huang

In this letter, a novel cross-corpus speech emotion recognition (SER) method using a domain-adaptive least-squares regression (DaLSR) model is proposed. In this method, an additional unlabeled data set from the target speech corpus serves as an auxiliary data set and is combined with the labeled training data set from the source speech corpus to jointly train the DaLSR model. In contrast to the traditional least-squares regression (LSR) method, the major novelty of DaLSR is that it is able to handle the mismatch between the source and target speech corpora. Hence, the proposed DaLSR method is well suited to the cross-corpus SER problem. To evaluate the performance of the proposed method on this problem, we conduct extensive experiments on three emotional speech corpora and compare the results with several state-of-the-art transfer learning methods that are widely used for cross-corpus SER. The experimental results show that the proposed method achieves better recognition accuracies than the state-of-the-art methods.
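
The abstract does not spell out the DaLSR objective, but the least-squares regression (LSR) backbone it extends has a simple closed form, sketched below on stand-in data. The domain-adaptive term that folds the unlabeled target-corpus set into the objective is the letter's contribution and is not reproduced here.

```python
# Minimal sketch of the least-squares regression (LSR) backbone referenced by DaLSR.
# Only the plain ridge-regularized LSR step is shown; the domain-adaptive term that
# incorporates the unlabeled target-corpus set is the paper's contribution and is omitted.
import numpy as np

def fit_lsr(X, Y, lam=1e-2):
    """Closed-form ridge LSR: W = (X^T X + lam I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def predict(X, W):
    """Assign each sample to the class with the largest regression response."""
    return np.argmax(X @ W, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_src = rng.normal(size=(200, 40))       # stand-in acoustic features (source corpus)
    y_src = rng.integers(0, 4, size=200)     # 4 emotion classes (assumed)
    Y_src = np.eye(4)[y_src]                 # one-hot label matrix
    W = fit_lsr(X_src, Y_src)
    X_tgt = rng.normal(size=(50, 40))        # stand-in target-corpus features
    print(predict(X_tgt, W)[:10])
```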


International Conference on Multimodal Interfaces | 2015

Transductive Transfer LDA with Riesz-based Volume LBP for Emotion Recognition in The Wild

Yuan Zong; Wenming Zheng; Xiaohua Huang; Jingwei Yan; Tong Zhang

In this paper, we propose a method combining Transductive Transfer Linear Discriminant Analysis (TTLDA) and Riesz-based Volume Local Binary Patterns (RVLBP) for the image-based static facial expression recognition challenge of the Emotion Recognition in the Wild Challenge (EmotiW 2015). The task of this challenge is to assign facial expression labels to movie frames containing a face captured under real-world conditions. In our method, we first employ a multi-scale image partition scheme to divide each face image into blocks and describe each facial image with RVLBP features extracted from these blocks. Then, we apply TTLDA to the RVLBP features to handle the expression recognition task. Experiments on the test data of the SFEW 2.0 database, which is used for the image-based static facial expression challenge, demonstrate that our method achieves an accuracy of 50%, a 10.87% improvement over the baseline provided by the challenge organizers.
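
The block-wise texture description can be sketched with a standard uniform LBP from scikit-image standing in for the Riesz-based Volume LBP (RVLBP) used in the paper; the block grid and histogram binning are likewise illustrative assumptions.

```python
# Sketch of multi-block texture features. Standard uniform LBP from scikit-image is used
# as a stand-in for the paper's Riesz-based Volume LBP (RVLBP); the block grid and bin
# count are illustrative assumptions.
import numpy as np
from skimage.feature import local_binary_pattern

def block_lbp_features(gray_face, grid=(4, 4), n_points=8, radius=1):
    """Divide the face into a grid of blocks and concatenate per-block LBP histograms."""
    lbp = local_binary_pattern(gray_face, n_points, radius, method="uniform")
    n_bins = n_points + 2                                   # uniform patterns + "other"
    h, w = gray_face.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = lbp[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
            feats.append(hist)
    return np.concatenate(feats)

if __name__ == "__main__":
    face = (np.random.rand(128, 128) * 255).astype(np.uint8)   # stand-in grayscale face crop
    print(block_lbp_features(face).shape)                      # (4*4*10,) = (160,)
```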


International Conference on Neural Information Processing | 2016

Cross-Database Facial Expression Recognition via Unsupervised Domain Adaptive Dictionary Learning

Keyu Yan; Wenming Zheng; Zhen Cui; Yuan Zong

Dictionary learning based methods have achieved state-of-the-art performance in conventional facial expression recognition (FER), where the distributions of training and testing data are implicitly assumed to match. In practical settings, however, this assumption is usually violated, especially when testing and training samples come from different databases, a.k.a. the cross-database FER problem. To address this problem, we propose a novel method called unsupervised domain adaptive dictionary learning (UDADL) for the unsupervised case in which all samples in the target database are completely unlabeled. In UDADL, to obtain more robust representations of facial expressions and to reduce the time complexity of the training and testing phases, we introduce a dictionary pair consisting of a synthesis dictionary and an analysis dictionary to mutually bridge the samples and their codes. Meanwhile, to relieve the distribution disparity between source and target samples, we further integrate the learning of the unlabeled testing data into UDADL to adaptively adjust the misaligned distributions in an embedded space, where the geometric structures of both domains are also encouraged to be preserved. The UDADL model can be solved by an iterative optimization strategy in which each sub-problem has a closed-form solution. Extensive experiments on the Multi-PIE and BU-3DFE databases demonstrate that the proposed UDADL outperforms widely used domain adaptation methods in dealing with cross-database FER and achieves state-of-the-art performance.
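
A minimal sketch of the synthesis/analysis dictionary pair that UDADL builds on is given below, using simple alternating ridge updates on stand-in data. The update scheme and hyperparameters are assumptions for illustration; the cross-domain alignment and structure-preservation terms of UDADL itself are not reproduced.

```python
# Sketch of a synthesis/analysis dictionary pair (X ~= D @ A with A ~= P @ X), the building
# block UDADL is built on. The alternating ridge updates and all hyperparameters are
# illustrative; the cross-domain terms of UDADL are not reproduced.
import numpy as np

def learn_dictionary_pair(X, n_atoms=32, tau=1.0, lam=1e-2, n_iter=30, seed=0):
    """Alternate closed-form updates for synthesis dictionary D and analysis dictionary P."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.normal(size=(d, n_atoms))
    P = rng.normal(size=(n_atoms, d))
    I_atoms, I_d = np.eye(n_atoms), np.eye(d)
    for _ in range(n_iter):
        # coding step: A balances reconstruction by D and agreement with the analysis code P @ X
        A = np.linalg.solve(D.T @ D + tau * I_atoms, D.T @ X + tau * (P @ X))
        # analysis dictionary: ridge regression from X onto the codes A
        P = tau * A @ X.T @ np.linalg.inv(tau * X @ X.T + lam * I_d)
        # synthesis dictionary: ridge regression from the codes A back onto X
        D = X @ A.T @ np.linalg.inv(A @ A.T + lam * I_atoms)
    return D, P

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(120, 200))    # stand-in expression features
    D, P = learn_dictionary_pair(X)
    err = np.linalg.norm(X - D @ (P @ X)) / np.linalg.norm(X)
    print(f"relative reconstruction error: {err:.3f}")
```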


IEEE Transactions on Affective Computing | 2018

Cross-Domain Color Facial Expression Recognition Using Transductive Transfer Subspace Learning

Wenming Zheng; Yuan Zong; Xiaoyan Zhou; Minghai Xin

Facial expression recognition across domains, e.g., when training and testing facial images come from different facial poses, is very challenging due to the different marginal distributions of the training and testing facial feature vectors. To deal with this challenging cross-domain facial expression recognition problem, a novel transductive transfer subspace learning method is proposed in this paper. In this method, a labelled facial image set from the source domain is combined with an unlabelled auxiliary facial image set from the target domain to jointly learn a discriminative subspace and predict the class labels of the unlabelled facial images, for which a transductive transfer regularized least-squares regression (TTRLSR) model is proposed. Then, based on the auxiliary facial image set, we train an SVM classifier to classify the expressions of the remaining facial images in the target domain. Moreover, we investigate the use of color facial features to evaluate the recognition performance of the proposed method, where color scale-invariant feature transform (CSIFT) features associated with 49 facial landmark points are extracted to describe each color facial image. Finally, extensive experiments on the BU-3DFE and Multi-PIE multi-view color facial expression databases are conducted to evaluate the cross-database and cross-view facial expression recognition performance of the proposed method. Comparisons with state-of-the-art domain adaptation methods are also included in the experiments. The experimental results demonstrate that the proposed method achieves much better recognition performance than the state-of-the-art methods.
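
The final classification stage described above (an SVM trained on the auxiliary target-domain set and applied to the remaining target images) can be sketched as follows; the feature dimensionality, class count, and SVM settings are illustrative assumptions.

```python
# Sketch of the last stage described above: an SVM trained on the auxiliary target-domain
# set (features plus the labels predicted by TTRLSR) and applied to the remaining target
# images. Feature dimensionality, class count, and SVM settings are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes = 6                                       # six basic expressions (assumed)
X_aux = rng.normal(size=(300, 512))                 # stand-in for CSIFT features of the auxiliary set
y_aux = rng.integers(0, n_classes, size=300)        # stand-in for labels predicted by TTRLSR
X_rest = rng.normal(size=(100, 512))                # remaining target-domain images

clf = SVC(kernel="linear", C=1.0)                   # linear SVM trained on the auxiliary set
clf.fit(X_aux, y_aux)
print(clf.predict(X_rest)[:10])
```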


ACM Multimedia | 2017

Learning a Target Sample Re-Generator for Cross-Database Micro-Expression Recognition

Yuan Zong; Xiaohua Huang; Wenming Zheng; Zhen Cui; Guoying Zhao

In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples come from two different micro-expression databases. Under this setting, the training and testing samples have different feature distributions, and hence the performance of most existing micro-expression recognition methods may degrade greatly. To solve this problem, we propose a simple yet effective method called the Target Sample Re-Generator (TSRG). Using TSRG, we are able to re-generate the samples from the target micro-expression database so that the re-generated target samples share the same or similar feature distributions with the original source samples. Consequently, the classifier learned on the labeled source samples can accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments based on the SMIC and CASME II databases are conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.
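
The abstract does not give the TSRG formulation. As a simpler stand-in for the same idea of transforming target samples so that their feature distribution matches the source, the sketch below applies a CORAL-style second-order alignment; it is explicitly not the authors' method.

```python
# CORAL-style alignment of target micro-expression features toward the source distribution,
# shown only as a simpler stand-in for the idea of re-generating target samples (not TSRG).
import numpy as np

def coral_align(X_tgt, X_src, eps=1e-3):
    """Re-color target features so their covariance and mean match the source."""
    d = X_tgt.shape[1]
    C_t = np.cov(X_tgt, rowvar=False) + eps * np.eye(d)
    C_s = np.cov(X_src, rowvar=False) + eps * np.eye(d)
    whiten = np.linalg.inv(np.linalg.cholesky(C_t))   # whitens the target covariance
    color = np.linalg.cholesky(C_s)                   # re-colors with the source covariance
    X_centered = X_tgt - X_tgt.mean(axis=0)
    return X_centered @ whiten.T @ color.T + X_src.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_src = rng.normal(loc=0.0, scale=1.0, size=(400, 64))   # labeled source features (stand-in)
    X_tgt = rng.normal(loc=2.0, scale=3.0, size=(300, 64))   # unlabeled target features (stand-in)
    X_adapted = coral_align(X_tgt, X_src)
    print(round(X_adapted.mean(), 2), round(X_adapted.std(), 2))   # now close to source statistics
```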


ICMI '18 Proceedings of the 20th ACM International Conference on Multimodal Interaction | 2018

Multiple Spatio-temporal Feature Learning for Video-based Emotion Recognition in the Wild

Cheng Lu; Wenming Zheng; Chaolong Li; Chuangao Tang; Suyuan Liu; Simeng Yan; Yuan Zong

The difficulty of emotion recognition in the wild (EmotiW) lies in training a robust model that can cope with diverse scenarios and anomalies. The Audio-video Sub-challenge in EmotiW contains short audio-video clips labeled with several emotion categories, and the task is to determine which label each video belongs to. For better emotion recognition in videos, we propose a multiple spatio-temporal feature fusion (MSFF) framework that depicts emotional information more accurately in the spatial and temporal dimensions using two mutually complementary sources: facial images and audio. The framework consists of two parts: the facial image model and the audio model. In the facial image model, three different spatio-temporal neural network architectures are employed to extract discriminative emotion features from facial expression images. First, high-level spatial features are obtained from pre-trained convolutional neural networks (CNNs), VGG-Face and ResNet-50, which are fed with the frames extracted from each video. Then, the features of all frames are sequentially input to a Bidirectional Long Short-Term Memory (BLSTM) network to capture the dynamic variations of facial appearance texture in a video. In addition to this CNN-RNN structure, another spatio-temporal network, a deep 3-dimensional convolutional neural network (3D CNN) obtained by extending 2D convolution kernels to 3D, is applied to capture the evolving emotional information encoded in multiple adjacent frames. For the audio model, spectrogram images of speech generated by preprocessing the audio are modeled in a VGG-BLSTM framework to characterize affective fluctuations more effectively. Finally, a fusion strategy over the score matrices produced by the different spatio-temporal networks is proposed to further boost emotion recognition performance. Extensive experiments show that our proposed MSFF achieves an overall accuracy of 60.64%, a large improvement over the baseline that also outperforms the result of the 2017 champion team.
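
A minimal sketch of the 3D-convolutional branch mentioned above (2D kernels extended to 3D so that several adjacent frames are convolved jointly) is given below. The channel widths, clip length, and class count are illustrative assumptions, not the authors' exact architecture.

```python
# Sketch of a small 3D CNN over short face clips, illustrating the extension of 2D kernels
# to 3D. Channel widths, clip length, and class count are illustrative assumptions.
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),                  # pool only spatially at first
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                          # pool over time and space
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, clips):                 # clips: (batch, channels, frames, height, width)
        x = self.features(clips).flatten(1)
        return self.classifier(x)

if __name__ == "__main__":
    clips = torch.randn(2, 3, 16, 112, 112)   # two stand-in 16-frame face clips
    print(Tiny3DCNN()(clips).shape)           # torch.Size([2, 7])
```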


International Joint Conference on Artificial Intelligence | 2018

A Novel Neural Network Model based on Cerebral Hemispheric Asymmetry for EEG Emotion Recognition

Yang Li; Wenming Zheng; Zhen Cui; Tong Zhang; Yuan Zong

In this paper, we propose a novel neural network model, called the bi-hemispheres domain adversarial neural network (BiDANN), for EEG emotion recognition. BiDANN is motivated by neuroscience findings on the emotional asymmetry between the brain's left and right hemispheres. The basic idea of BiDANN is to map the EEG data of the left and right hemispheres into separate discriminative feature spaces in which the data representations can be classified easily. To predict the class labels of testing data more precisely, we narrow the distribution shift between training and testing data by using one global and two local domain discriminators, which work adversarially against the classifier to encourage domain-invariant data representations to emerge. The classifier learned from labeled training data can then be applied naturally to unlabeled testing data. We conduct two experiments to verify the performance of our BiDANN model on the SEED database. The experimental results show that the proposed model achieves state-of-the-art performance.
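
The adversarial interplay between the domain discriminators and the emotion classifier is commonly implemented with a gradient reversal layer, as in standard domain adversarial training (DANN). The minimal sketch below shows that building block only, not the full bi-hemispheric model; that BiDANN uses exactly this mechanism is an assumption here.

```python
# Gradient reversal layer, a standard building block for domain adversarial training (DANN).
# Shown as one plausible way to realize the adversarial discriminators described above;
# this is not the full BiDANN model.
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradientReversal.apply(x, lam)

if __name__ == "__main__":
    feats = torch.randn(4, 64, requires_grad=True)       # stand-in hemispheric EEG features
    # the domain discriminator receives reversed gradients, pushing features to be domain-invariant
    domain_logits = torch.nn.Linear(64, 2)(grad_reverse(feats))
    domain_logits.sum().backward()
    print(feats.grad.shape)                              # torch.Size([4, 64])
```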


Neurocomputing | 2018

Unsupervised Facial Expression Recognition Using Domain Adaptation based Dictionary Learning Approach

Keyu Yan; Wenming Zheng; Zhen Cui; Yuan Zong; Tong Zhang; Chuangao Tang

Over the past years, dictionary learning (DL) based methods have achieved excellent performance in facial expression recognition (FER), where training and testing data are usually presumed to have the same distribution. In practical scenarios, however, this assumption is often violated, especially when training and testing data come from different databases, a.k.a. the cross-database FER problem. In this paper, we focus on the unsupervised cross-domain FER problem, in which all samples in the target domain are completely unannotated. To address this problem, we propose an unsupervised domain adaptive dictionary learning (UDADL) model that bridges the source and target domains by learning a shared dictionary. The encodings of the two domains on this dictionary are constrained to be mutually embedded in each other. To bypass the solution complexity, we introduce an analysis dictionary as a latent variable to obtain approximate solutions and make the sub-solvers easier to analyze. To evaluate the performance of the proposed UDADL model, we conduct extensive experiments on the widely used Multi-PIE and BU-3DFE databases. The experimental results demonstrate that the proposed UDADL method outperforms recent domain adaptation FER methods and achieves state-of-the-art performance.

Collaboration


Dive into Yuan Zong's collaborations.

Top Co-Authors

Zhen Cui

Nanjing University of Science and Technology

Keyu Yan

Southeast University

Yang Li

Southeast University
