2021 17th International Conference on Machine Vision and Applications (MVA) | 2021

Video Summarization With Frame Index Vision Transformer

Abstract

In this paper, we propose a novel frame index vision transformer for video summarization. Given training frames, we linearly project the content of each frame to obtain a frame embedding. By combining the frame embedding with an index embedding and a class embedding, the proposed frame index vision transformer can efficiently and effectively learn the importance of the input frames. Experimental results show that the proposed method outperforms state-of-the-art deep learning methods, including recurrent neural network (RNN) and convolutional neural network (CNN) based methods, on both the SumMe and TVSum datasets. In addition, our method achieves real-time computational efficiency during testing.
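The embedding pipeline described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract does not give layer counts, dimensions, or the scoring head, so the single attention layer, the sigmoid head, the choice to prepend the class embedding as an extra token (as in the standard vision transformer), and every name and shape below are assumptions made only for illustration.

```python
import numpy as np


def build_tokens(frames, w_proj, b_proj, cls_embed, index_embed):
    """Turn per-frame features (T, D_in) into a (T + 1, D) token sequence."""
    # Linear projection of frame content gives the frame embedding.
    frame_embed = frames @ w_proj + b_proj
    # Assumption: the class embedding is prepended as one extra token.
    tokens = np.vstack([cls_embed[None, :], frame_embed])
    # The index embedding encodes each token's position in the sequence.
    return tokens + index_embed[: tokens.shape[0]]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def frame_importance(tokens, w_out, b_out):
    """Hypothetical scoring head: one importance value in (0, 1) per frame."""
    frame_tokens = tokens[1:]  # drop the class token
    return 1.0 / (1.0 + np.exp(-(frame_tokens @ w_out + b_out)))


# Toy run with random weights; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
T, D_IN, D = 8, 16, 4
frames = rng.normal(size=(T, D_IN))

tokens = build_tokens(
    frames,
    w_proj=rng.normal(size=(D_IN, D)),
    b_proj=np.zeros(D),
    cls_embed=rng.normal(size=D),
    index_embed=rng.normal(size=(T + 1, D)),
)
attended = tokens + self_attention(  # residual connection
    tokens,
    w_q=rng.normal(size=(D, D)),
    w_k=rng.normal(size=(D, D)),
    w_v=rng.normal(size=(D, D)),
)
scores = frame_importance(attended, w_out=rng.normal(size=D), b_out=0.0)
```

Here `scores` holds one value per input frame. In a trained model the weights would be learned from annotated importance scores, and a summary would then be selected from the highest-scoring frames or shots.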

Pages 1-5

DOI 10.23919/MVA51890.2021.9511350

Language English

Journal 2021 17th International Conference on Machine Vision and Applications (MVA)
