IEEE Transactions on Multimedia | 2019

Hierarchical Concept Score Postprocessing and Concept-Wise Normalization in CNN-Based Video Event Recognition

 
 

Abstract


This paper focuses on video event recognition based on frame-level convolutional neural network (CNN) descriptors. Using transfer learning, image-trained descriptors are applied to the video domain to make event recognition feasible in scenarios with limited computational resources. After fine-tuning existing CNN concept score extractors pretrained on ImageNet, the outputs of the different fully connected layers are employed as frame descriptors. The resulting descriptors are hierarchically postprocessed and combined with novel and efficient pooling and normalization methods. As the major contributions of this paper to video event recognition, we present a postprocessing scheme in which the hierarchy and the relative shortest distance of concepts in the WordNet concept tree are taken into account to alleviate the uncertainty of the concept scores at the output of the CNN. In addition, we propose a concept-wise power law normalization method that outperforms the widely used power law normalization. The integration of these approaches results in high-performance average (max) pooling-based video event recognition. Compared to average (max) pooling combined with state-of-the-art normalization methods and fine-tuned support vector machine classification, the proposed processing scheme improves event recognition accuracy in terms of mean average precision on the Columbia Consumer Video and Unstructured Social Activity Attribute datasets, while achieving comparable results on the UCF101 and ActivityNet datasets.
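As a rough illustration of the pooling and normalization stage described in the abstract, the following minimal sketch pools per-frame concept scores into one video-level vector and applies a signed power law normalization followed by L2 normalization. The abstract does not define the concept-wise variant, so the per-concept exponent list used here is only an assumption about what "concept-wise" could mean; the function names `pool_frames` and `power_law_normalize` and all numeric values are hypothetical, and the hierarchical WordNet postprocessing is not modeled.

```python
import math


def pool_frames(frame_scores, mode="average"):
    """Pool per-frame concept scores (a list of equal-length lists)
    into a single video-level vector, one value per concept."""
    if not frame_scores:
        raise ValueError("need at least one frame")
    columns = list(zip(*frame_scores))  # one tuple of scores per concept
    if mode == "average":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def power_law_normalize(vector, alphas):
    """Signed power law normalization, sign(x) * |x|**alpha, then L2 normalization.

    `alphas` is either one exponent shared by all concepts (the standard form)
    or one exponent per concept (an ASSUMED reading of the concept-wise variant,
    not taken from the paper). Exponents are expected to lie in (0, 1].
    """
    if isinstance(alphas, (int, float)):
        alphas = [alphas] * len(vector)
    if len(alphas) != len(vector):
        raise ValueError("need one exponent per concept")
    powered = [math.copysign(abs(x) ** a, x) for x, a in zip(vector, alphas)]
    norm = math.sqrt(sum(p * p for p in powered))
    return [p / norm for p in powered] if norm > 0 else powered


# Hypothetical scores: two frames, two concepts.
frames = [[0.0, 0.25], [1.0, 0.75]]
video_avg = pool_frames(frames, "average")
video_max = pool_frames(frames, "max")
shared = power_law_normalize(video_avg, 0.5)            # one exponent for all concepts
per_concept = power_law_normalize(video_avg, [0.5, 1.0])  # assumed per-concept exponents
```

In a pipeline of the kind the abstract outlines, the normalized video-level vector would then be passed to a classifier such as a support vector machine. How the per-concept exponents would be selected is not stated in the abstract and is left open here.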

Volume 21
Pages 157-172
DOI 10.1109/TMM.2018.2844101
Language English
Journal IEEE Transactions on Multimedia
