Multimedia Tools and Applications | 2021
User-generated video emotion recognition based on key frames
Abstract
Video is an important medium for communication and entertainment, and thus the intelligent understanding of videos has attracted widespread interest in the academic community. The diversity of video content and the sparsity of emotional expression make video emotion recognition challenging, especially for user-generated videos. In this paper, we propose a key-frame extraction algorithm based on affective saliency estimation. By estimating the affective saliency of each video frame, key frames are extracted so that emotion-irrelevant frames do not influence the recognition result. Efficient deep visual features are extracted using pretrained models, and both traditional classifiers, Support Vector Machines (SVM) and Random Forests (RF), and a deep model, Convolutional Neural Networks (CNN), are used to perform emotion recognition. Moreover, we propose a hybrid fusion mechanism that combines score fusion with Top-K decision fusion to further improve recognition accuracy. Extensive experiments are conducted on the user-generated video datasets Ekman-6 and VideoEmotion-8, on which the average recognition accuracies are 59.51% and 52.85%, respectively. The experimental results show that the proposed method improves recognition performance and is superior to current user-generated video emotion recognition methods.
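The abstract only names the hybrid fusion mechanism without specifying it; a minimal sketch of one plausible reading follows, assuming score fusion means averaging per-classifier probability scores and Top-K decision fusion means restricting the final vote to the K classes ranked highest by the fused scores. The function name `hybrid_fusion`, the equal weighting, and the tie-breaking rule are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def hybrid_fusion(score_vectors, k=3):
    """Hypothetical sketch of score fusion + Top-K decision fusion.

    score_vectors: list of per-classifier class-probability arrays
                   (e.g. from SVM, RF, and CNN), one array per classifier.
    k: number of top-ranked candidate classes kept for the decision step.
    """
    # Score fusion: average the probability scores across classifiers
    # (equal weights assumed here).
    fused = np.mean(np.stack(score_vectors), axis=0)
    # Top-K candidates by fused score, highest first.
    candidates = np.argsort(fused)[::-1][:k]
    # Decision fusion: count each classifier's argmax vote among candidates.
    votes = {int(c): sum(int(np.argmax(s) == c) for s in score_vectors)
             for c in candidates}
    # Pick the candidate with most votes; break ties by fused score.
    return max(votes, key=lambda c: (votes[c], fused[c]))
```

Under these assumptions, a class that no single classifier ranks first can still win if its fused score places it in the Top-K and the tie-break favors it, which is one way such a hybrid scheme can outperform either fusion step alone.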