J. Supercomput. | 2021

Deep neural network-based fusion model for emotion recognition using visual data


Abstract


In this study, we present a fusion model for emotion recognition based on visual data. The proposed model takes video as its input and generates an emotion label for each video sample. From the video data, we first select the most significant face regions using a face detection and selection step. We then employ three CNN-based architectures to extract high-level features from the face image sequence, and attach one additional module to each architecture to capture the sequential information of the entire video. Combining the three CNN-based models in a late-fusion approach yields competitive results compared to the baseline on two public datasets: AFEW 2016 and SAVEE.
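A minimal sketch of the late-fusion step described in the abstract, assuming each CNN branch outputs a per-class probability vector for a clip; the emotion label set, the example probabilities, and the uniform weighting are illustrative assumptions, not details from the paper:

```python
import numpy as np

# Hypothetical 7-class emotion label set (illustrative, not from the paper).
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def late_fusion(prob_list, weights=None):
    """Average class probabilities from several models (late fusion).

    prob_list: list of 1-D arrays, one per model, each summing to 1.
    weights:   optional per-model weights; defaults to uniform.
    """
    probs = np.stack(prob_list)                  # shape (n_models, n_classes)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    fused = np.average(probs, axis=0, weights=weights)
    return fused, EMOTIONS[int(np.argmax(fused))]

# Example: three CNN branches with differing confidence; fusion
# picks the class with the highest averaged probability.
p1 = np.array([0.05, 0.02, 0.03, 0.60, 0.20, 0.05, 0.05])
p2 = np.array([0.10, 0.05, 0.05, 0.40, 0.30, 0.05, 0.05])
p3 = np.array([0.05, 0.05, 0.10, 0.50, 0.15, 0.10, 0.05])
fused, label = late_fusion([p1, p2, p3])
print(label)  # -> happy
```

Averaging predictions (rather than concatenating features) keeps each branch independent, so a branch can be retrained or swapped without touching the others.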

Volume 77
Pages 10773-10790
DOI 10.1007/s11227-021-03690-y
Language English
Journal J. Supercomput.
