Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Qirong Mao is active.

Publication


Featured research published by Qirong Mao.


IEEE Transactions on Multimedia | 2014

Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks

Qirong Mao; Ming Dong; Zhengwei Huang; Yongzhao Zhan

As an essential way of understanding human emotional behavior, speech emotion recognition (SER) has attracted a great deal of attention in human-centered signal processing. Accuracy in SER heavily depends on finding good affect-related, discriminative features. In this paper, we propose to learn affect-salient features for SER using convolutional neural networks (CNN). The training of the CNN involves two stages. In the first stage, unlabeled samples are used to learn local invariant features (LIF) using a variant of sparse auto-encoder (SAE) with reconstruction penalization. In the second stage, LIF is used as the input to a feature extractor, salient discriminative feature analysis (SDFA), to learn affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality, and discrimination for SER. Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e.g., with speaker and language variation, and environment distortion) and outperforms several well-established SER features.
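To make the second-stage objective concrete, here is a minimal PyTorch sketch of a linear feature extractor trained with a discrimination term (cross-entropy) plus an orthogonality penalty, in the spirit of SDFA. The saliency term is omitted, and all names, layer sizes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch of SDFA-style stage-2 training: discrimination + orthogonality.
# Dimensions and the omission of the saliency term are assumptions.
import torch
import torch.nn as nn

class SDFAHead(nn.Module):
    def __init__(self, lif_dim=256, feat_dim=64, n_emotions=6):
        super().__init__()
        self.extract = nn.Linear(lif_dim, feat_dim, bias=False)  # U: LIF -> salient features
        self.classify = nn.Linear(feat_dim, n_emotions)

    def forward(self, lif):
        feats = torch.tanh(self.extract(lif))
        return feats, self.classify(feats)

def sdfa_loss(model, lif, labels, ortho_weight=0.1):
    feats, logits = model(lif)
    discrimination = nn.functional.cross_entropy(logits, labels)
    U = model.extract.weight                   # (feat_dim, lif_dim)
    gram = U @ U.t()
    eye = torch.eye(gram.size(0), device=gram.device)
    orthogonality = ((gram - eye) ** 2).sum()  # push U U^T toward identity
    return discrimination + ortho_weight * orthogonality

# Toy usage with random vectors standing in for LIF features.
model = SDFAHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lif = torch.randn(32, 256)
labels = torch.randint(0, 6, (32,))
loss = sdfa_loss(model, lif, labels)
loss.backward()
opt.step()
```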


ACM Multimedia | 2014

Speech Emotion Recognition Using CNN

Zhengwei Huang; Ming Dong; Qirong Mao; Yongzhao Zhan

Deep learning systems, such as Convolutional Neural Networks (CNNs), can infer a hierarchical representation of input data that facilitates categorization. In this paper, we propose to learn affect-salient features for Speech Emotion Recognition (SER) using a semi-CNN. The training of the semi-CNN has two stages. In the first stage, unlabeled samples are used to learn candidate features with a contractive convolutional neural network with reconstruction penalization. In the second stage, the candidate features are used as the input to the semi-CNN to learn affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality, and discrimination. Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e.g., with speaker and environment distortion) and outperforms several well-established SER features.
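The first-stage "reconstruction penalization" can be sketched as a contractive auto-encoder objective. Below is a minimal PyTorch illustration using a plain fully connected encoder rather than the convolutional one in the paper; layer sizes and the penalty weight are assumptions.

```python
# Sketch of a contractive auto-encoder trained on unlabeled inputs:
# reconstruction error plus the Frobenius norm of the encoder Jacobian.
import torch
import torch.nn as nn

class ContractiveAE(nn.Module):
    def __init__(self, in_dim=400, hid_dim=100):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return h, self.dec(h)

def cae_loss(model, x, contract_weight=1e-3):
    h, recon = model(x)
    reconstruction = ((recon - x) ** 2).mean()
    # For a sigmoid hidden layer, J_ji = h_j (1 - h_j) W_ji, so the
    # squared Frobenius norm of the Jacobian factorizes as below.
    w_sq = (model.enc.weight ** 2).sum(dim=1)        # (hid_dim,)
    contraction = ((h * (1 - h)) ** 2 @ w_sq).mean()
    return reconstruction + contract_weight * contraction

model = ContractiveAE()
x = torch.randn(16, 400)   # unlabeled spectrogram patches, flattened
loss = cae_loss(model, x)
loss.backward()
```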


Journal of Zhejiang University Science C | 2015

Using Kinect for real-time emotion recognition via facial expressions

Qirong Mao; Xin-yu Pan; Yongzhao Zhan; Xiangjun Shen

Emotion recognition via facial expressions (ERFE) has attracted a great deal of interest with recent advances in artificial intelligence and pattern recognition. Most studies are based on 2D images, and their methods are usually computationally expensive. In this paper, we propose a real-time emotion recognition approach based on both 2D and 3D facial expression features captured by Kinect sensors. To capture the deformation of the 3D mesh during facial expression, we combine the features of animation units (AUs) and feature point positions (FPPs) tracked by Kinect. A fusion algorithm based on improved emotional profiles (IEPs) and maximum confidence is proposed to recognize emotions from these real-time facial expression features. Experiments on both an emotion dataset and a real-time video show the superior performance of our method.
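A minimal numpy sketch of the maximum-confidence part of the fusion step: two classifiers (one on AUs, one on FPPs) each emit a posterior over emotions, and the prediction of the more confident stream wins. The IEP re-weighting step is omitted; this is an illustrative assumption, not the authors' exact algorithm.

```python
import numpy as np

def max_confidence_fusion(p_au, p_fpp):
    """p_au, p_fpp: (n_emotions,) posteriors from the AU and FPP classifiers."""
    return p_au if p_au.max() >= p_fpp.max() else p_fpp

p_au = np.array([0.10, 0.70, 0.20])    # AU classifier: confident
p_fpp = np.array([0.30, 0.40, 0.30])   # FPP classifier: uncertain
fused = max_confidence_fusion(p_au, p_fpp)
print(fused.argmax())                  # -> 1, following the more confident stream
```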


Journal of Zhejiang University Science C | 2015

Speech emotion recognition with unsupervised feature learning

Zhengwei Huang; Wentao Xue; Qirong Mao

Emotion-based features are critical for achieving high performance in a speech emotion recognition (SER) system. In general, it is difficult to develop these features due to the ambiguity of the ground truth. In this paper, we apply several unsupervised feature learning algorithms (including K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which show promise for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the performance of the proposed approach and present a detailed analysis of the effect of two important factors in the model setup: the content window size and the number of hidden-layer nodes. Experimental results show that larger content windows and more hidden nodes contribute to higher performance. We also show that a two-layer network does not clearly improve performance compared to a single-layer network.
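As one concrete instance of the feature learners discussed above, here is a scikit-learn sketch of K-means on windows of spectrogram frames ("content windows"), with soft cluster assignments used as learned features. The window size and cluster count, the two factors analyzed in the paper, take illustrative values here.

```python
import numpy as np
from sklearn.cluster import KMeans

def make_windows(spectrogram, window=9):
    """Stack `window` consecutive frames into one flat vector per position."""
    n_frames, n_bins = spectrogram.shape
    return np.stack([spectrogram[t:t + window].ravel()
                     for t in range(n_frames - window + 1)])

rng = np.random.default_rng(0)
spec = rng.standard_normal((200, 40))          # stand-in log-spectrogram
X = make_windows(spec, window=9)

km = KMeans(n_clusters=100, n_init=10, random_state=0).fit(X)

# "Triangle" soft assignment: activation = max(0, mean distance - distance).
d = km.transform(X)                            # distances to each centroid
features = np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)
utterance_feature = features.mean(axis=0)      # pool over time per utterance
```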


Multimedia Tools and Applications | 2017

Unsupervised domain adaptation for speech emotion recognition using PCANet

Zhengwei Huang; Wentao Xue; Qirong Mao; Yongzhao Zhan

Research in emotion recognition seeks to develop insights into the variance of emotion features within one common domain. However, automatic emotion recognition from speech is challenging when training data and test data are drawn from different domains due to different recording conditions, languages, speakers, and many other factors. In this paper, we propose a novel feature transfer approach with PCANet (a deep network), which extracts both domain-shared and domain-specific latent features to facilitate performance improvement. Our approach learns multiple intermediate feature representations along an interpolating path between the source and target domains using PCANet, accounting for the distribution shift between the two domains, and then aligns the other feature representations on the path with the target subspace so that they change in the right direction toward the target. To demonstrate the effectiveness of our approach, we select the INTERSPEECH 2009 Emotion Challenge's FAU Aibo Emotion Corpus as the target database and two public databases (ABC and Emo-DB) as source sets. Experimental results demonstrate that the proposed feature transfer learning method outperforms conventional machine learning methods and other transfer learning methods.
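The interpolating-path idea can be sketched with plain PCA standing in for PCANet: learn intermediate subspaces on mixtures of source and target data and concatenate the projections. This skips the alignment step and is purely an illustrative assumption about the mechanism, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_src = rng.standard_normal((300, 60))          # e.g., Emo-DB features
X_tgt = rng.standard_normal((300, 60)) + 0.5    # e.g., FAU Aibo features (shifted)

def path_subspaces(X_src, X_tgt, steps=5, dim=20):
    """PCA subspaces along an interpolating path from source to target."""
    subspaces = []
    for t in np.linspace(0.0, 1.0, steps):
        # Mix the two domains in proportions that slide from source to target.
        n_s = max(1, int((1 - t) * len(X_src)))
        n_t = max(1, int(t * len(X_tgt)))
        mix = np.vstack([X_src[:n_s], X_tgt[:n_t]])
        subspaces.append(PCA(n_components=dim).fit(mix))
    return subspaces

# Concatenate projections along the path as the transferred representation.
path = path_subspaces(X_src, X_tgt)
transferred = np.hstack([p.transform(X_src) for p in path])
print(transferred.shape)   # (300, steps * dim)
```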


International Journal of Humanoid Robotics | 2010

Speech emotion recognition method based on improved decision tree and layered feature selection

Qirong Mao; Xiaojia Wang; Yongzhao Zhan

In this paper, to improve classification accuracy with as few features as possible, we propose a new hierarchical recognition method based on an improved SVM decision tree, together with a layered feature selection method combining a neural network with a genetic algorithm. The improved SVM decision tree is constructed according to confusion degrees between two emotions or between two emotion groups; the classifier at each node of the tree is an SVM. On an emotional speech corpus covering 7 emotions recorded by our workgroup, using the features and parameters obtained by the neural-network/genetic-algorithm method, we evaluate the improved SVM decision tree, the multi-class SVM, an SVM-based binary decision tree, the traditional SVM-based decision directed acyclic graph, and an HMM. The experiments reveal that, compared with the other four methods, the proposed method achieves better classification accuracy with fewer features and less time.
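A minimal scikit-learn sketch of the confusion-driven tree construction: estimate which classes a flat SVM confuses most, give that pair its own subtree, and place an SVM at each node. Four toy classes stand in for the seven emotions, and the neural-network/genetic-algorithm feature selection step is omitted; everything here is illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, (60, 10)) for m in (0.0, 0.4, 3.0, 6.0)])
y = np.repeat([0, 1, 2, 3], 60)         # classes 0 and 1 overlap heavily

# Step 1: confusion degrees from a flat multi-class SVM.
pred = cross_val_predict(SVC(), X, y, cv=5)
cm = confusion_matrix(y, pred).astype(float)
conf = cm + cm.T                        # symmetric confusion degree
np.fill_diagonal(conf, 0)
a, b = np.unravel_index(conf.argmax(), conf.shape)   # most confused pair

# Step 2: the root SVM separates the confusable group {a, b} from the rest;
# dedicated SVMs then resolve each side.
group = np.isin(y, [a, b])
root = SVC().fit(X, group)
leaf = SVC().fit(X[group], y[group])
rest = SVC().fit(X[~group], y[~group])

def predict(x):
    x = np.asarray(x).reshape(1, -1)
    node = leaf if root.predict(x)[0] else rest
    return node.predict(x)[0]

print(predict(X[0]))                    # classify one training sample
```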


Affective Computing and Intelligent Interaction | 2015

Multi-pose facial expression recognition based on SURF boosting

Qiyu Rao; Xing Qu; Qirong Mao; Yongzhao Zhan

Human Computer Interaction (HCI) is today one of the most important topics in the machine vision and image processing fields. The ability to handle multi-pose facial expressions is important for computers to understand affective behavior in less constrained environments. In this paper, we propose a SURF (Speeded-Up Robust Features) boosting framework to address challenging issues in multi-pose facial expression recognition (FER). In our model, local SURF features from different overlapping patches are selected by boosting to focus on more discriminative representations of facial expression. This paper also proposes a novel training step during boosting. Experiments using the proposed method demonstrate favorable results on the RaFD and KDEF databases.
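A minimal sketch of patch selection by boosting: each overlapping face patch yields a descriptor (SURF in the paper; random vectors stand in here, since SURF itself requires opencv-contrib's nonfree module), and AdaBoost's feature importances reveal which patches carry discriminative information. Illustrative only.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
n_faces, n_patches, desc_dim = 200, 16, 64
descs = rng.standard_normal((n_faces, n_patches, desc_dim))
labels = rng.integers(0, 2, n_faces)        # toy 2-expression problem
descs[labels == 1, 3] += 1.0                # make patch 3 discriminative

X = descs.reshape(n_faces, -1)              # concatenate patch descriptors
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, labels)

# Aggregate importances per patch to see which patches boosting selected.
per_patch = boost.feature_importances_.reshape(n_patches, desc_dim).sum(axis=1)
print(per_patch.argmax())                   # expected: 3, the informative patch
```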


International Conference on Acoustics, Speech, and Signal Processing | 2016

Domain adaptation for speech emotion recognition by sharing priors between related source and target classes

Qirong Mao; Wentao Xue; Qiyu Rao; Feifei Zhang; Yongzhao Zhan

In speech emotion recognition (SER), speech data are usually captured from different scenarios, which often leads to significant performance degradation due to the inherent mismatch between training and test sets. To cope with this problem, we propose a domain adaptation method called Sharing Priors between Related Source and Target classes (SPRST), based on a two-layer neural network. Common priors between related classes are imposed on the classifier parameters, namely the weights of the second layer, so that classes with few labeled data in the target domain can borrow knowledge from related classes in the source domain. The method is evaluated on the INTERSPEECH 2009 Emotion Challenge two-class task. Experimental results show that our approach significantly improves performance when only a small number of labeled target instances are available.
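The shared-prior idea can be sketched in PyTorch as an L2 tie that pulls a target class's second-layer weights toward those of its related source class. The class pairing, sizes, and penalty weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, in_dim=100, hid_dim=50, n_classes=4):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hid_dim)
        self.out = nn.Linear(hid_dim, n_classes)   # classifier layer (second layer)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

# related[target_class] = source class whose weights share the prior (assumed pairing)
related = {0: 2, 1: 3}

def sprst_loss(model, x, y, prior_weight=0.1):
    ce = nn.functional.cross_entropy(model(x), y)
    W = model.out.weight                           # (n_classes, hid_dim)
    tie = sum(((W[t] - W[s]) ** 2).sum() for t, s in related.items())
    return ce + prior_weight * tie                 # Gaussian shared prior as L2 tie

model = TwoLayerNet()
x, y = torch.randn(8, 100), torch.randint(0, 4, (8,))
sprst_loss(model, x, y).backward()
```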


IEEE Transactions on Multimedia | 2017

Hierarchical Bayesian Theme Models for Multipose Facial Expression Recognition

Qirong Mao; Qiyu Rao; Yongbin Yu; Ming Dong

As an essential way of understanding human emotional behavior, facial expression recognition (FER) has attracted a great deal of attention in multimedia research. Most studies are conducted in a "lab-controlled" environment, and their real-world performance degrades greatly due to factors such as head pose variations. In this paper, we propose a pose-based hierarchical Bayesian theme model to address challenging issues in multipose FER. Local appearance features and global geometry information are combined in our model to learn an intermediate face representation before recognizing expressions. By sharing a pool of features across various poses, our model provides a unified solution for multipose FER, bypassing separate training and parameter tuning for each pose, and thus scales to a large number of poses. Experiments on both benchmark facial expression databases and Internet images show the superior or highly competitive performance of our system compared with the current state of the art.
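A flat stand-in for the intermediate "theme" representation: quantize local descriptors from all poses against one shared codebook, turn each face into a visual-word histogram, and let an LDA topic model supply theme proportions. This sketch drops the pose-conditioned hierarchy entirely, and all sizes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((5000, 32))   # local features pooled over all poses
codebook = KMeans(n_clusters=200, n_init=4, random_state=0).fit(descriptors)

def face_to_counts(face_descs):
    """Quantize one face's local descriptors into a visual-word histogram."""
    words = codebook.predict(face_descs)
    return np.bincount(words, minlength=200)

faces = [face_to_counts(rng.standard_normal((80, 32))) for _ in range(100)]
lda = LatentDirichletAllocation(n_components=10, random_state=0)
themes = lda.fit_transform(np.array(faces))     # (100, 10) theme proportions
# `themes` would then feed an expression classifier shared across poses.
```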


Journal of Zhejiang University Science C | 2013

Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features

Qirong Mao; Xiaolei Zhao; Zhengwei Huang; Yongzhao Zhan

Functional paralanguage carries considerable emotion information and is insensitive to speaker changes. To improve emotion recognition accuracy under the speaker-independent condition, a fusion method combining functional paralanguage features with accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. With this method, functional paralanguage such as laughter, crying, and sighing is used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database including six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage features are put forward for recognizing speech emotions in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the two feature sets for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for six emotions is over 67% in the speaker-independent condition using the functional paralanguage features.
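A minimal numpy sketch of confidence-weighted fusion: each stream (functional paralanguage, accompanying paralanguage) produces a posterior over emotions, and the fused posterior weights each stream by its confidence, taken here as its peak probability. The weighting rule is an illustrative assumption, not the authors' exact algorithm.

```python
import numpy as np

def fuse(p_functional, p_accompanying):
    c_f, c_a = p_functional.max(), p_accompanying.max()   # stream confidences
    fused = c_f * p_functional + c_a * p_accompanying
    return fused / fused.sum()

p_func = np.array([0.05, 0.80, 0.05, 0.05, 0.03, 0.02])  # e.g., laughter detected
p_acc = np.array([0.20, 0.30, 0.15, 0.15, 0.10, 0.10])
print(fuse(p_func, p_acc).argmax())                      # -> 1
```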

Collaboration


Dive into Qirong Mao's collaborations.

Top Co-Authors
Ming Dong

Wayne State University
