2019 IEEE International Conference on Big Data and Smart Computing (BigComp) | 2019
Multiple Videos Captioning Model for Video Storytelling
Abstract
In this paper, we propose a novel video captioning model that utilizes the context information of correlated clips. Unlike ordinary “one clip - one caption” algorithms, we concatenate multiple neighboring clips into a chunk and train the network in a “one chunk - multiple captions” manner. We train and evaluate our algorithm on the M-VAD dataset and report its performance on caption and keyword generation. Our model is intended as a foundation for generating a video story from several captions. Therefore, in this paper, we focus on caption generation for several videos and trend analysis of the generated captions. In the experiments, we present intermediate results of our model, evaluated both qualitatively and quantitatively.
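The grouping step described in the abstract, where neighboring clips are concatenated into a chunk paired with several captions, might be sketched as follows. This is only an illustration under stated assumptions: the `(clip_id, caption)` representation, the function name `make_chunks`, the `chunk_size` parameter, and the use of non-overlapping chunks are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each clip is a (clip_id, caption) pair,
# listed in the order the clips appear in the source video.
Clip = Tuple[str, str]
Chunk = Tuple[List[str], List[str]]


def make_chunks(clips: Sequence[Clip], chunk_size: int = 3) -> List[Chunk]:
    """Group consecutive clips into non-overlapping chunks.

    Each chunk pairs a list of neighboring clip ids with the list of
    their captions, i.e. one chunk maps to multiple captions.
    The chunk size and the non-overlapping split are assumptions.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    chunks: List[Chunk] = []
    for start in range(0, len(clips), chunk_size):
        group = clips[start:start + chunk_size]
        clip_ids = [clip_id for clip_id, _ in group]
        captions = [caption for _, caption in group]
        chunks.append((clip_ids, captions))
    return chunks


if __name__ == "__main__":
    demo = [
        ("clip_1", "A man opens a door."),
        ("clip_2", "He walks into the room."),
        ("clip_3", "He sits down at a desk."),
        ("clip_4", "A woman enters."),
    ]
    for clip_ids, captions in make_chunks(demo, chunk_size=3):
        print(clip_ids, "->", captions)
```

In this sketch the final chunk may hold fewer than `chunk_size` clips; whether the actual model pads, drops, or overlaps such chunks is not stated in the abstract.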