2019 IEEE Winter Conference on Applications of Computer Vision (WACV) | 2019

Online Video Summarization: Predicting Future to Better Summarize Present

Abstract

Automatically generating a video summary is a challenging problem because of its subjective nature. Most previous work in the field considers the entire video when extracting the important frames. In contrast, this paper presents MerryGoRoundNet, a supervised learning approach that solves the problem in an online fashion. We observe that effective video summarization must take into account both the spatial and the temporal relations between video frames. MerryGoRoundNet uses an encoder-decoder style architecture and a convolutional LSTM to establish spatiotemporal relationships and generates the summary on the fly, making it more efficient than offline counterparts in terms of time and memory. To make the summary more diverse and complete, we augment the network with an unsupervised task of next-frame prediction and a supervised task of scene-start detection, and we propose a loss function that explicitly balances continuity and diversity in the produced summary. An ablation study supports the architecture and learning objective of our approach. Evaluation of MerryGoRoundNet on different datasets demonstrates superior performance among online summarization approaches and competitive performance when compared with offline approaches.

Pages 471-480

DOI 10.1109/WACV.2019.00056

Language English

Journal 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
