Multim. Tools Appl. | 2021

Performance of deer hunting optimization based deep learning algorithm for speech emotion recognition

 
 

Abstract


This paper proposes a speech emotion recognition technique based on Optimized Deep Neural Network. The speech signals are denoised by presenting a novel adaptive wavelet transform with a modified galactic swarm optimization algorithm (AWT_MGSO). From the noise removed speech signals, the spectral features like LPC (Linear Prediction Coefficients), MFCC (Mel frequency cepstral coefficients), PSD (power spectral density) and prosodic features like energy, entropy, formant frequencies and pitch are extracted and certain features are selected by ASFO (Adaptive Sunflower Optimization Algorithm). The optimized DNN-DHO (Deep Neural Network with Deer Hunting Optimization Algorithm) is proposed for emotion classification. An enhanced squirrel search algorithm is proposed to update the weight in the optimized DNN_DHO classifier. In this study, all the eight emotions of the speech from RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) databases for English and IITKGP-SEHSC (Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus) database for Hindi are classified. The experimental results are obtained and compared with the classifiers such as DNN_DHO, DNN (Deep Neural Network) and DAE (Deep Auto Encoder). The experimental results show that the proposed algorithm obtains maximum accuracy as 97.85% by the TESS dataset, 97.14% by the RAVDESS dataset and 93.75% by the IITKGP-SEHSC dataset by the DNN-HHO classifier.

Volume 80
Pages 9961-9992
DOI 10.1007/s11042-020-10118-x
Language English
Journal Multim. Tools Appl.

Full Text