Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Guobao Zhang is active.

Publication


Featured research published by Guobao Zhang.


Chinese Conference on Pattern Recognition | 2009

Speech Emotion Recognition Research Based on the Stacked Generalization Ensemble Neural Network for Robot Pet

Yongming Huang; Guobao Zhang; Xiaoli Xu

In this paper, we present an emotion recognition system that uses a stacked generalization ensemble neural network to identify human affective states in the speech signal. 450 short emotional sentences with different contents from 3 speakers were collected as experimental material. Features related to energy, speech rate, pitch and formants are extracted from the speech signals. The stacked generalization ensemble neural network is used as the classifier for 5 emotions: anger, calmness, happiness, sadness and boredom. First, experimental results show that, compared with a traditional BP network or a wavelet neural network, the stacked generalization ensemble neural network converges faster and achieves a higher recognition rate. Second, after discussing the advantages and disadvantages of different ensemble neural networks, a suitable choice is made for the robot pet application.
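The stacked generalization idea above (base learners whose out-of-fold predictions train a combining meta-learner) can be sketched with scikit-learn stand-ins. This is an illustration only: the base and meta models here are a logistic regression and a decision tree rather than the paper's neural networks, and the synthetic features merely play the role of energy/pitch/formant descriptors.

```python
# Minimal stacked-generalization sketch (illustrative models, toy data).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# 450 toy "utterances", 5 emotion classes (anger, calm, happy, sad, bored)
X, y = make_classification(n_samples=450, n_features=12, n_informative=8,
                           n_classes=5, random_state=0)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=5))],
    final_estimator=LogisticRegression(max_iter=1000),  # level-1 combiner
    cv=5,  # out-of-fold base predictions become meta-learner inputs
)
stack.fit(X, y)
acc = stack.score(X, y)  # training-set accuracy of the stacked model
```

The `cv=5` argument is what makes this stacked generalization rather than naive stacking: the meta-learner sees only predictions the base models made on data they were not fitted on.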


IET Signal Processing | 2015

Extraction of adaptive wavelet packet filter-bank-based acoustic feature for speech emotion recognition

Yongming Huang; Ao Wu; Guobao Zhang; Yue Li

In this paper, a wavelet packet (WP)-based acoustic feature extraction approach is proposed for automatic speech emotion recognition (SER). First, the problem of optimising the WP filter-bank structure for a given classification task is formulated as a tree pruning problem, and different tree-pruning criteria are investigated. On this basis, a novel WP-based feature is introduced for SER, namely discriminative band WP power coefficients. Finally, an SER system is built and extensive experiments are carried out. Experimental results show that the proposed feature considerably improves emotion recognition performance over the conventional mel-frequency cepstrum coefficient (MFCC) feature. The proposed feature extraction approach is promising because it can be easily extended to two-dimensional (2D) facial expression analysis with 2D WP quadtree structures, opening the way to a high-quality audio-visual bimodal emotion recognition system.
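The wavelet-packet power coefficients underlying this line of work can be sketched in plain numpy. The sketch below uses an orthonormal Haar filter pair and keeps the full packet tree at a fixed depth for clarity, whereas the paper optimises the tree structure per task; the function names are illustrative.

```python
# Haar wavelet-packet decomposition and per-band power (illustrative sketch).
import numpy as np

def haar_split(x):
    """One Haar analysis step: (approximation, detail) at half rate."""
    x = x[: len(x) // 2 * 2].reshape(-1, 2)
    return (x[:, 0] + x[:, 1]) / np.sqrt(2), (x[:, 0] - x[:, 1]) / np.sqrt(2)

def wavelet_packet_energies(signal, depth):
    """Full wavelet-packet tree: split every band recursively, then
    return the energy of each of the 2**depth leaf bands."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(depth):
        bands = [half for b in bands for half in haar_split(b)]
    return np.array([np.sum(b ** 2) for b in bands])

t = np.linspace(0, 1, 512, endpoint=False)
sig = np.sin(2 * np.pi * 8 * t)            # toy stand-in for a speech frame
e = wavelet_packet_energies(sig, depth=3)  # 8 sub-band power values
```

Because the Haar pair here is orthonormal, the band energies sum exactly to the signal energy, which is the property that makes per-band power a faithful partition of the spectrum.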


Archive | 2013

Adaptive Wavelet Packet Filter-Bank Based Acoustic Feature for Speech Emotion Recognition

Yue Li; Guobao Zhang; Yongming Huang

In this paper, a wavelet packet based adaptive filter-bank construction method is proposed, with additive Fisher ratio used as wavelet packet tree pruning criterion. A novel acoustic feature named discriminative band wavelet packet power coefficients (db-WPPC) is proposed and on this basis, a speech emotion recognition system is constructed. Experimental results show that the proposed feature improves emotion recognition performance over the conventional MFCC feature.
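A Fisher ratio of the kind used as a pruning criterion above scores how well one sub-band feature separates two classes. The two-class, single-band form below is an illustrative simplification; the paper's additive criterion aggregates over all emotion classes, and the toy data is synthetic.

```python
# Per-band Fisher ratio as a class-separability score (illustrative sketch).
import numpy as np

def fisher_ratio(band_a, band_b):
    """(difference of class means)^2 over the summed class variances."""
    return (band_a.mean() - band_b.mean()) ** 2 / (band_a.var() + band_b.var())

rng = np.random.default_rng(0)
angry = rng.normal(3.0, 1.0, 200)  # toy band energies under one emotion
calm = rng.normal(0.0, 1.0, 200)   # toy band energies under another
score = fisher_ratio(angry, calm)  # high score -> keep this band in the tree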


International Conference on Intelligent Computing | 2014

Improved Emotion Recognition with Novel Task-Oriented Wavelet Packet Features

Yongming Huang; Guobao Zhang; Yue Li; Ao Wu

In this paper, a wavelet packet based adaptive filter-bank construction method is proposed for speech signal processing. On this basis, a set of acoustic features, namely Wavelet Packet Cepstral Coefficients (WPCC), is proposed for speech emotion recognition. WPCC extends the conventional Mel-Frequency Cepstral Coefficients (MFCC) by adapting the filter-bank structure to the decision task, selecting the frequency bands where the most discriminative emotion information is located. A speech emotion recognition system is constructed from the proposed features and a Gaussian mixture model classifier. Experimental results on the Berlin emotional speech database show that the proposed features improve emotion recognition performance over the conventional MFCC feature. The proposed feature extraction scheme has encouraging prospects, since it can be extended to 2D image processing with 2D wavelet packets and hence to audio-visual bimodal emotion recognition.
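The cepstral step shared by MFCC and features like WPCC is a DCT-II of log sub-band energies, which compacts and decorrelates them. The sketch below writes the DCT out explicitly so only numpy is assumed; in the paper the energies would come from a task-adapted packet tree, whereas the values here are toy numbers.

```python
# Cepstral coefficients from log band energies via an explicit DCT-II
# (illustrative sketch of the step behind WPCC-style features).
import numpy as np

def cepstral_coeffs(band_energies, n_coeffs):
    """DCT-II of log energies -> compact, decorrelated coefficients."""
    logs = np.log(np.asarray(band_energies, dtype=float) + 1e-10)
    n = len(logs)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # DCT-II basis rows
    return basis @ logs

energies = [4.0, 2.5, 1.0, 0.5, 0.25, 0.2, 0.1, 0.05]  # toy band powers
wpcc = cepstral_coeffs(energies, n_coeffs=4)  # keep the 4 lowest coefficients
```

Keeping only the lowest-order coefficients discards fine spectral detail while retaining the broad band-energy shape, the same trade-off MFCC makes.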


International Conference on Intelligent Computing | 2009

Speech emotion recognition research based on wavelet neural network for robot pet

Yongming Huang; Guobao Zhang; Xiaoli Xu

In this paper, we present an emotion recognition system using a wavelet neural network and a BP neural network to identify human affective states in the speech signal. 750 short emotional sentences with different contents from 5 speakers were collected as experimental material. Features related to energy, speech rate, pitch and formants are extracted from the speech signals. Neural networks are used as the classifier for 5 emotions: anger, calmness, happiness, sadness and boredom. Experimental results show that, compared with the traditional BP network, the wavelet neural network converges faster and achieves a higher recognition rate.
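What distinguishes a wavelet neural network from a plain BP network is its hidden activation: a shifted and scaled mother wavelet (commonly a Morlet) rather than a sigmoid. The forward pass below is a minimal, untrained sketch with toy parameters; all names and shapes are illustrative, not the paper's architecture.

```python
# Forward pass of a tiny wavelet neural network (untrained, toy values).
import numpy as np

def morlet(x):
    """Morlet mother wavelet, a common choice of WNN hidden activation."""
    return np.cos(1.75 * x) * np.exp(-x ** 2 / 2)

def wnn_forward(features, shifts, scales, weights):
    """Hidden layer: morlet((w.x - b) / a); output: linear combination."""
    hidden = morlet((features @ weights["in"] - shifts) / scales)
    return hidden @ weights["out"]

rng = np.random.default_rng(0)
weights = {"in": rng.normal(size=(4, 6)), "out": rng.normal(size=(6, 5))}
shifts, scales = rng.normal(size=6), np.ones(6)
x = rng.normal(size=(10, 4))                      # 10 toy feature vectors
scores = wnn_forward(x, shifts, scales, weights)  # 10 x 5 emotion scores
```

Training would adjust the shifts and scales along with the weights, which is what lets the hidden units localise in both time and frequency.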


Chinese Conference on Pattern Recognition | 2014

Speech Emotion Recognition Based on Coiflet Wavelet Packet Cepstral Coefficients

Yongming Huang; Ao Wu; Guobao Zhang; Yue Li

A wavelet packet based adaptive filter-bank construction method is proposed for speech signal processing in this paper. On this basis, a set of acoustic features, namely Coiflet Wavelet Packet Cepstral Coefficients (CWPCC), is proposed for speech emotion recognition. CWPCC extends the conventional Mel-Frequency Cepstral Coefficients (MFCC) by adapting the filter-bank structure to the decision task. A speech emotion recognition system is constructed with the proposed feature set and a Gaussian mixture model classifier. Experimental results on the Berlin emotional speech database show that the Coiflet wavelet packet is better suited to speech emotion recognition than other wavelet packets, and that the proposed features improve emotion recognition performance over conventional features.


International Conference on Network Communication and Computing | 2016

Feature Fusion Methods for Robust Speech Emotion Recognition Based on Deep Belief Networks

Ao Wu; Yongming Huang; Guobao Zhang

The speech emotion recognition accuracy of prosody and voice quality features declines as the SNR (signal-to-noise ratio) of the speech signal decreases. In this paper, we propose novel sub-band spectral centroid weighted wavelet packet cepstral coefficients (W-WPCC) for robust speech emotion recognition. The W-WPCC feature is computed by combining sub-band energies with sub-band spectral centroids via a weighting scheme to generate noise-robust acoustic features. Deep Belief Networks (DBNs) are artificial neural networks with more than one hidden layer, first pre-trained layer by layer and then fine-tuned with the back-propagation algorithm. Well-trained deep networks can model complex, non-linear structure in the input data and better predict the probability distribution over classification labels. We extracted prosody features, voice quality features and wavelet packet cepstral coefficients (WPCC) from the speech signals, combined them with W-WPCC, and fused them with DBNs. Experimental results on the Berlin emotional speech database show that the proposed fused feature with W-WPCC is better suited to speech emotion recognition under noisy conditions than other acoustic features, and that the proposed DBN feature-learning structure combined with W-WPCC improves emotion recognition performance over the conventional emotion recognition method.
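The two quantities W-WPCC fuses, per-band energy and per-band spectral centroid, can be sketched in numpy. The blending below (a fixed convex weight `alpha`) is only one plausible weighting scheme chosen for illustration; the paper's exact scheme is not reproduced here, and the spectrum and frequency axis are toy values.

```python
# Sub-band energies blended with sub-band spectral centroids
# (illustrative weighting, not the paper's exact W-WPCC formula).
import numpy as np

def weighted_band_features(spectrum, freqs, n_bands, alpha=0.5):
    """Per band: energy and spectral centroid, blended with weight alpha."""
    bands = np.array_split(np.arange(len(spectrum)), n_bands)
    feats = []
    for idx in bands:
        power = spectrum[idx] ** 2
        energy = power.sum()
        centroid = (freqs[idx] * power).sum() / (power.sum() + 1e-10)
        feats.append(alpha * energy + (1 - alpha) * centroid)
    return np.array(feats)

freqs = np.linspace(0, 8000, 256)   # toy frequency axis in Hz
spectrum = np.exp(-freqs / 2000.0)  # toy magnitude spectrum
f = weighted_band_features(spectrum, freqs, n_bands=8)
```

The intuition is that the centroid term tracks where energy sits within a band, which degrades more gracefully under additive noise than the raw band energy alone.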


Chinese Conference on Pattern Recognition | 2010

Multiple Features Extraction and Coordination Using Gabor Wavelet Transformation and Fisherfaces with Application to Facial Expression Recognition

Haibin Liu; Guobao Zhang; Yongming Huang; Fei Dong


Journal of Ambient Intelligence and Humanized Computing | 2017

Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition

Yongming Huang; Kexin Tian; Ao Wu; Guobao Zhang


Przegląd Elektrotechniczny | 2012

Speech Emotion Recognition Using Hybrid Generative and Discriminative Models

Yongming Huang; Guobao Zhang; Fei Dong; Yue Li; Feipeng Da

Collaboration


Dive into Guobao Zhang's collaborations.

Top Co-Authors

Ao Wu, Southeast University
Yue Li, Southeast University
Fei Dong, Southeast University