IEEE Access | 2021

On the Usage of Pre-Trained Speech Recognition Deep Layers to Detect Emotions

 
 

Abstract


One of the landmarks of Industry 4.0 concerns the optimization of manufacturing processes by increasing operator productivity. Productivity, however, is strongly affected by the operator’s emotions. Positive emotions (e.g., happiness) are positively related to productivity, whereas negative emotions (e.g., frustration) are negatively related to productivity and positively related to misconduct and misbehavior in the workplace. Automatic recommendation systems could therefore suggest actions or instructions to eliminate or attenuate undesired negative emotions in the workplace. Such systems would base their actions on the output of emotion detectors and so depend on the reliability of those detectors. In this paper, emotions are detected through a speech-based system. Our solution is built on deep speech recognition layers, namely the first two convolutional layers of Baidu’s pre-trained 2015 speech recognition model. By reusing these two convolutional layers, we expect to extract robust meta-features. Our deep learning model attempts to predict the seven primary emotions on the MELD test set. Furthermore, our solution does not use any contextual data, yet it achieves robust results. The proposed weighted TrBaidu algorithm achieves state-of-the-art results in detecting the joy and surprise emotions, with an F1-score of 23% for each.
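The results above are reported as a per-emotion F1-score. As a minimal sketch of how such a score can be computed, the following Python snippet derives a one-vs-rest F1 for each label from paired true and predicted labels. The function name `per_class_f1` and the toy label lists are illustrative assumptions; they are not taken from the paper, and the values are not MELD data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this label
        else:
            fp[pred] += 1    # predicted label was wrong
            fn[truth] += 1   # true label was missed

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels for illustration only (hypothetical, not drawn from MELD).
y_true = ["joy", "joy", "surprise", "neutral", "neutral", "anger"]
y_pred = ["joy", "neutral", "surprise", "neutral", "joy", "neutral"]

scores = per_class_f1(y_true, y_pred)
for label, f1 in scores.items():
    print(f"{label}: {f1:.2f}")
```

Reporting F1 per emotion, rather than overall accuracy alone, makes the performance on each individual emotion visible, which is how the joy and surprise results are stated here.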

Volume 9
Pages 9699-9705
DOI 10.1109/ACCESS.2021.3051083
Language English
Journal IEEE Access
