2021 International Wireless Communications and Mobile Computing (IWCMC) | 2021

Speech Emotion Recognition Model with Time-Scale-Invariance MFCCs as Input


Abstract


Speech Emotion Recognition (SER) is an important task in human communication. In recent years, Mel-frequency Cepstral Coefficient (MFCC) features have been widely used in SER-related tasks. In this study, we develop a multi-head-attention CNN model with an auxiliary gender classification task. Based on the proposed model, we explore how MFCCs at different time scales, and different combinations of them, affect the model's performance when used as input. Experimental results show that, within a moderate range, using MFCCs with higher time-scale resolution as input helps the model achieve better speech emotion recognition performance. Appropriately combining MFCCs at different time scales also helps the model achieve better performance.
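The abstract does not state how the time scales are defined. As a minimal sketch, assuming that "time scale" refers to the analysis window and hop length used when framing the signal for MFCC extraction, the following shows how those two settings determine the number of frames, and hence the time resolution, of the resulting feature matrix. The function names, the 3-second duration, the 16 kHz sample rate, and the window/hop values are all hypothetical choices for illustration, not settings from the paper.

```python
def frame_count(num_samples, win_length, hop_length):
    """Number of full analysis frames when no padding is applied."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def frames_per_setting(duration_s, sample_rate, settings_ms):
    """Map each (window_ms, hop_ms) pair to the number of frames it yields.

    The settings are illustrative assumptions, not the paper's values.
    """
    num_samples = int(duration_s * sample_rate)
    counts = {}
    for win_ms, hop_ms in settings_ms:
        win_length = int(sample_rate * win_ms / 1000)
        hop_length = int(sample_rate * hop_ms / 1000)
        counts[(win_ms, hop_ms)] = frame_count(num_samples, win_length, hop_length)
    return counts


if __name__ == "__main__":
    # Shorter windows and hops give more frames, i.e. finer time resolution.
    settings = [(25, 10), (50, 20), (100, 40)]
    for setting, n_frames in frames_per_setting(3.0, 16000, settings).items():
        print(setting, n_frames)
```

Under this assumption, each setting would produce an MFCC matrix with a different number of time steps for the same utterance, so a model that combines several of them needs some way to handle inputs of different lengths, for example separate input branches.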

Pages 537-542
DOI 10.1109/IWCMC51323.2021.9498598
Language English
Journal 2021 International Wireless Communications and Mobile Computing (IWCMC)
