IEEE Transactions on Cognitive and Developmental Systems | 2021

Comparing Recognition Performance and Robustness of Multimodal Deep Learning Models for Multimodal Emotion Recognition


Abstract


Multimodal signals are powerful for emotion recognition because they represent emotions more comprehensively than any single modality. In this paper, we compare the recognition performance and robustness of two multimodal emotion recognition models: deep canonical correlation analysis (DCCA) and the bimodal deep autoencoder (BDAE). The contributions of this paper are threefold: 1) we propose two methods for extending the original DCCA model to multimodal fusion, namely weighted-sum fusion and attention-based fusion; 2) we systematically compare the performance of DCCA, BDAE, and traditional approaches on five multimodal datasets; and 3) we investigate the robustness of DCCA, BDAE, and traditional approaches on the SEED-V and DREAMER datasets under two conditions: adding noise to the multimodal features and replacing the EEG features with noise. Our experimental results demonstrate that DCCA achieves state-of-the-art recognition results on all five datasets: 94.6% on the SEED dataset, 87.5% on the SEED-IV dataset, 84.3% and 85.6% on the DEAP dataset, 85.3% on the SEED-V dataset, and 89.0%, 90.6%, and 90.7% on the DREAMER dataset. Moreover, DCCA is more robust when various levels of noise are added to the SEED-V and DREAMER datasets. By visualizing the features before and after the DCCA transformation on the SEED-V dataset, we find that the transformed features are more homogeneous and more discriminative across emotions.
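To make the two fusion extensions mentioned in the abstract concrete, the sketch below shows one plausible way to combine the DCCA-transformed features of two modalities (e.g., EEG and eye movements) with a fixed weighted sum and with a learned attention over modalities. This is a minimal illustration under stated assumptions: the module names, the single-linear-layer attention scoring, and the fixed weight alpha are hypothetical and are not taken from the paper's implementation.

```python
import torch
import torch.nn as nn

class WeightedSumFusion(nn.Module):
    """Fuse two DCCA-transformed modality features with a fixed weight alpha (assumed)."""
    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha

    def forward(self, eeg_feat: torch.Tensor, eye_feat: torch.Tensor) -> torch.Tensor:
        # eeg_feat, eye_feat: (batch, d) outputs of the two DCCA projection networks
        return self.alpha * eeg_feat + (1.0 - self.alpha) * eye_feat

class AttentionFusion(nn.Module):
    """Learn per-sample fusion weights over the two modalities via a softmax (illustrative)."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per modality feature

    def forward(self, eeg_feat: torch.Tensor, eye_feat: torch.Tensor) -> torch.Tensor:
        stacked = torch.stack([eeg_feat, eye_feat], dim=1)   # (batch, 2, d)
        weights = torch.softmax(self.score(stacked), dim=1)  # (batch, 2, 1), sums to 1 over modalities
        return (weights * stacked).sum(dim=1)                # (batch, d) fused representation

if __name__ == "__main__":
    eeg = torch.randn(32, 20)  # 32 samples, 20-dim DCCA output for EEG (dimensions assumed)
    eye = torch.randn(32, 20)  # matching eye-movement features
    fused = AttentionFusion(dim=20)(eeg, eye)
    print(fused.shape)         # torch.Size([32, 20])
```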

DOI 10.1109/TCDS.2021.3071170
Language English
Journal IEEE Transactions on Cognitive and Developmental Systems

Full Text