International Journal of Speech Technology | 2021

Optimal feature selection for speech emotion recognition using enhanced cat swarm optimization algorithm

 

Abstract


Human interactions carry emotional cues that can be used to interpret the emotion expressed by a speaker. Because vocal emotions vary from one speaker to another, there is a chance of misinterpretation, and a speech emotion recognizer can be used to determine the emotion expressed. Speech conveys the emotional state of the speaker along with the syntactic and semantic content of linguistic sentences, so human emotion recognition from speech signals is possible. Speech emotion recognition is a crucial and challenging task in which feature extraction plays a prominent role in performance. Determining emotional states in speech signals is challenging for several reasons. The first issue for any speech emotion system is selecting the best features, that is, features powerful enough to distinguish the various emotions. Differences in language, pronunciation, sentences, speaking style, and speakers add further difficulty, since these characteristics affect pitch and energy, which directly alter most of the extracted features. Redundant features and high computational cost further complicate emotion recognition. Rather than focusing on the words themselves, the vocal changes and communicative pressure on the words should be the primary consideration. The Enhanced Cat Swarm Optimization (ECSO) algorithm for feature extraction is proposed to address these issues; it has not been used in any existing speech emotion recognition approach. The proposed approach achieves excellent performance in terms of accuracy, recognition rate, sensitivity, and specificity.

Volume 24
Pages 155-163
DOI 10.1007/s10772-020-09776-x
Language English
Journal International Journal of Speech Technology
