2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) | 2019

Emotion Recognition from Children Speech Signals Using Attention Based Time Series Deep Learning

 
 
 
 

Abstract


Children's emotional expression is concentrated in acoustic aspects of the voice, such as tone and timbre, rather than in semantics, and their speech contains many lengthy fragments. This paper proposes an emotion recognition model based on time series deep learning, named attention-based CNN-BiLSTM, which combines convolutional neural networks (CNNs) with a Bi-directional Long Short-Term Memory (BiLSTM) network to extract emotional features. After the speech signal is preprocessed, forty-dimensional Mel Frequency Cepstral Coefficient (MFCC) related parameters are extracted, including both static and dynamic features. These frequency-domain features are then enhanced by the CNNs to serve as the emotional features for children's speech emotion recognition. BiLSTM is used to address the poor performance in learning long-term dependent features, and an attention mechanism is used because only a few frames in a child's speech signal contain emotional features. Compared with related speech emotion recognition models such as LSTM-CNN and 2D-CNN-LSTM, the proposed model improves accuracy to as much as 71.6% on the FAU-AIBO children's speech emotion database.
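
The role of the attention mechanism described in the abstract, weighting the few frames that carry emotional information more heavily than the rest, can be illustrated with a minimal sketch. The abstract does not give the paper's attention formulation, so the code below assumes a generic dot-product attention pooling over per-frame vectors (standing in for BiLSTM outputs). The function name `attention_pool`, the `query` scoring vector, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def attention_pool(frames, query):
    """Collapse per-frame feature vectors into one utterance-level vector.

    frames: list of T frame vectors (each a list of D floats), standing in
            for BiLSTM outputs (an assumption; not the paper's exact setup).
    query:  list of D floats, a hypothetical learned scoring vector.
    Returns (weights, pooled); weights sum to 1 over the T frames.
    """
    # One relevance score per frame: dot product with the scoring vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]

    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the frame vectors.
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return weights, pooled


# Toy input: three frames, of which only the middle one is "salient".
frames = [[0.0, 0.1], [2.0, 1.5], [0.1, 0.0]]
query = [1.0, 1.0]
weights, pooled = attention_pool(frames, query)
```

In this toy input the middle frame receives the largest weight, so the pooled vector is dominated by it. In a trained model the scoring vector would be learned jointly with the rest of the network rather than fixed by hand.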

Pages 1296-1300
DOI 10.1109/BIBM47256.2019.8982992
Language English
Journal 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
