2021 17th International Conference on Machine Vision and Applications (MVA) | 2021

Video Summarization With Frame Index Vision Transformer

 
 
 

Abstract


In this paper, we propose a novel frame index vision transformer for video summarization. Given the training frames, we linearly project the content of each frame to obtain a frame embedding. By combining the frame embeddings with index embeddings and a class embedding, the proposed frame index vision transformer can efficiently and effectively learn the importance of the input frames. Experimental results show that the proposed method outperforms state-of-the-art deep learning methods, including recurrent neural network (RNN) and convolutional neural network (CNN) based methods, on both the SumMe and TVSum datasets. In addition, our method achieves real-time computational efficiency at test time.
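The abstract describes the input pipeline only at a high level. The toy sketch below illustrates it under stated assumptions and is not the authors' implementation. Each frame's feature vector is linearly projected to a frame embedding, a learned index embedding is added per frame position, and a learned class embedding is prepended, mirroring how a vision transformer treats image patches. The single self-attention layer, the residual connection, the sigmoid scoring head, and all names (`FrameIndexEncoderSketch`, `feat_dim`, `embed_dim`, `max_frames`) are hypothetical choices made for illustration. Weights are random and untrained, so the scores carry no meaning.

```python
import math
import random

def matvec(W, x):
    # W is a list of rows (out_dim x in_dim); x is a vector of length in_dim.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class FrameIndexEncoderSketch:
    """Hypothetical toy model: frame + index + class embeddings, one attention
    layer, and a per-frame sigmoid importance head (all assumptions)."""

    def __init__(self, feat_dim, embed_dim, max_frames, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.max_frames = max_frames
        # Linear projection from per-frame features to frame embeddings.
        self.W_proj = rand_matrix(embed_dim, feat_dim, rng)
        # Index embedding table; row 0 is used for the class token position.
        self.index_embed = rand_matrix(max_frames + 1, embed_dim, rng)
        # Class embedding prepended to the frame sequence.
        self.class_embed = [rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
        # Query/key/value projections for a single attention head.
        self.W_q = rand_matrix(embed_dim, embed_dim, rng)
        self.W_k = rand_matrix(embed_dim, embed_dim, rng)
        self.W_v = rand_matrix(embed_dim, embed_dim, rng)
        # Scoring head: one importance logit per frame token (assumed design).
        self.w_score = [rng.gauss(0.0, 0.1) for _ in range(embed_dim)]

    def embed(self, frames):
        if len(frames) > self.max_frames:
            raise ValueError("too many frames for the index embedding table")
        tokens = [[c + p for c, p in zip(self.class_embed, self.index_embed[0])]]
        for i, f in enumerate(frames, start=1):
            e = matvec(self.W_proj, f)  # frame embedding
            tokens.append([a + b for a, b in zip(e, self.index_embed[i])])
        return tokens

    def attend(self, tokens):
        q = [matvec(self.W_q, t) for t in tokens]
        k = [matvec(self.W_k, t) for t in tokens]
        v = [matvec(self.W_v, t) for t in tokens]
        scale = 1.0 / math.sqrt(self.embed_dim)
        out = []
        for qi, ti in zip(q, tokens):
            # Scaled dot-product attention over all tokens, including the class token.
            weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            mixed = [sum(w * vj[d] for w, vj in zip(weights, v))
                     for d in range(self.embed_dim)]
            out.append([x + y for x, y in zip(ti, mixed)])  # residual connection
        return out

    def score(self, frames):
        hidden = self.attend(self.embed(frames))
        # Drop the class token (position 0) and map each frame token to (0, 1).
        return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(self.w_score, tok))))
                for tok in hidden[1:]]

rng = random.Random(42)
frames = [[rng.random() for _ in range(8)] for _ in range(5)]  # 5 frames, 8-dim features
model = FrameIndexEncoderSketch(feat_dim=8, embed_dim=4, max_frames=16)
scores = model.score(frames)
print(len(scores))  # 5
```

The fixed-size index embedding table caps the number of frames a single forward pass can score, so long videos would need subsampling or chunking; how the paper handles this is not stated in the abstract.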

Pages 1-5
DOI 10.23919/MVA51890.2021.9511350
Language English
Journal 2021 17th International Conference on Machine Vision and Applications (MVA)
