2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS) | 2021

Two-stream Emotion-embedded Autoencoder for Speech Emotion Recognition

 
 

Abstract


Speech emotion recognition is an important part of human-computer interaction and has received increasing attention in recent years. Although a wide variety of methods have been proposed over the past decades, recognition performance remains limited. The main cause of the low accuracy of emotion recognition systems is the difficulty of effectively extracting emotion-oriented features. In this paper, we propose a novel autoencoder architecture, the two-stream emotion-embedded autoencoder, to extract a deep emotion feature. In our method, the input is projected to two latent representations. One is meant to learn the best representation of the input, containing all the information in the speech, whereas the other captures emotion-independent information. The difference between the two latent representations is then taken as the deep emotion feature. This deep emotion feature is concatenated with global acoustic features obtained with the openSMILE toolkit. Finally, a fully connected network performs emotion classification on the concatenated feature vector. In addition, a simple data augmentation approach is applied to improve the generalization of our method. We evaluate our method on IEMOCAP, a publicly available and widely used database. Experimental results demonstrate that the proposed model achieves a significant performance improvement over other speech emotion recognition systems.
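
The feature pipeline described in the abstract (two latent representations, their difference, concatenation with global acoustic features, then a fully connected classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-layer encoders, the random untrained weights, the dimensions, and all names (`encode`, `z_full`, `z_neutral`, `global_feats`) are assumptions chosen only to show the data flow, and the global features are random placeholders rather than real openSMILE output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
BATCH, INPUT_DIM, LATENT_DIM, GLOBAL_DIM, NUM_CLASSES = 4, 40, 16, 88, 4


def encode(x, weights, bias):
    """Single-layer tanh encoder; a stand-in for a trained encoder stream."""
    return np.tanh(x @ weights + bias)


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Placeholder inputs: frame-level features and utterance-level global features.
x = rng.normal(size=(BATCH, INPUT_DIM))
global_feats = rng.normal(size=(BATCH, GLOBAL_DIM))

# Two encoder streams with random (untrained) weights.
w_full = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
w_neutral = rng.normal(scale=0.1, size=(INPUT_DIM, LATENT_DIM))
b_full = np.zeros(LATENT_DIM)
b_neutral = np.zeros(LATENT_DIM)

z_full = encode(x, w_full, b_full)            # meant to hold all speech information
z_neutral = encode(x, w_neutral, b_neutral)   # meant to hold emotion-independent information

# Deep emotion feature: difference between the two latent representations.
deep_emotion = z_full - z_neutral

# Concatenate with the global acoustic features along the feature axis.
fused = np.concatenate([deep_emotion, global_feats], axis=1)

# One fully connected layer followed by softmax over emotion classes.
w_cls = rng.normal(scale=0.1, size=(LATENT_DIM + GLOBAL_DIM, NUM_CLASSES))
b_cls = np.zeros(NUM_CLASSES)
probs = softmax(fused @ w_cls + b_cls)

print(fused.shape, probs.shape)
```

In a trained system the two streams would be learned with reconstruction and emotion-related objectives; here the weights are random, so only the shapes and the order of operations are meaningful.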

Pages 1-6
DOI 10.1109/IEMTRONICS52119.2021.9422602
Language English
Journal 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS)
