IEEE Transactions on Multimedia | 2019

Structure-Constrained Motion Sequence Generation

Abstract

Video generation is a challenging task due to the extremely high-dimensional distribution of the solution space. Good constraints in the solution domain would thus reduce the difficulty of approximating optimal solutions. In this paper, instead of directly generating high-dimensional video data, we propose using object landmarks as explicit structure constraints to address this issue. Specifically, we propose a two-stage framework for an action-conditioned video generation task. In our framework, the first stage aims to generate landmark sequences according to predefined motion types, and a recurrent model (RNN/LSTM) is adopted for this purpose. The landmark sequence can be regarded as a low-dimensional structure embedding of high-dimensional video data, and generating landmark sequences is much easier than generating videos. The second stage is inspired by a conditional generative adversarial network (CGAN), and we take the generated landmark sequence as a structure condition to learn a landmark-to-image translation network. Such a one-to-one translation framework avoids the difficulty of generating videos and instead transfers the video generation task to image generation, which is resolvable due to the maturity of current GAN-based models. The experimental results demonstrate that our model not only achieves promising results on rigid/nonrigid motion generation tasks but also can be extended to multiobject motion situations.
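
The two-stage data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the function names (`make_rnn`, `generate_landmarks`, `render_frame`, `generate_video`), the plain Elman-style recurrent cell with random, untrained weights standing in for the RNN/LSTM, and the simple rasteriser standing in for the trained landmark-to-image generator. Only the overall shape follows the abstract: an action label drives a recurrent landmark generator, and each landmark frame is then translated into one image.

```python
import math
import random


def make_rnn(num_actions, num_landmarks, hidden=16, seed=0):
    """Hypothetical, untrained Elman-style RNN weights (illustration only)."""
    rng = random.Random(seed)
    in_dim = num_actions + 2 * num_landmarks  # one-hot action + previous (x, y) pairs
    out_dim = 2 * num_landmarks

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"W_in": mat(hidden, in_dim), "W_h": mat(hidden, hidden),
            "W_out": mat(out_dim, hidden)}


def matvec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def generate_landmarks(params, action, num_actions, start, steps):
    """Stage 1 (sketch): roll out a landmark sequence conditioned on an action label.

    `start` is a flat list (x0, y0, x1, y1, ...) with coordinates in [0, 1].
    """
    one_hot = [1.0 if i == action else 0.0 for i in range(num_actions)]
    h = [0.0] * len(params["W_h"])
    current = list(start)
    sequence = [current]
    for _ in range(steps):
        x = one_hot + current
        pre = [a + b for a, b in zip(matvec(params["W_in"], x),
                                     matvec(params["W_h"], h))]
        h = [math.tanh(p) for p in pre]
        delta = matvec(params["W_out"], h)  # predicted landmark displacement
        current = [min(1.0, max(0.0, c + d)) for c, d in zip(current, delta)]
        sequence.append(current)
    return sequence


def render_frame(landmarks, size=8):
    """Stage 2 placeholder: rasterise landmarks into a binary size x size image.

    A trained conditional generator would take this role; the rasteriser only
    shows the one-to-one landmark-frame-to-image mapping.
    """
    image = [[0] * size for _ in range(size)]
    for x, y in zip(landmarks[0::2], landmarks[1::2]):
        col = min(size - 1, int(x * size))
        row = min(size - 1, int(y * size))
        image[row][col] = 1
    return image


def generate_video(params, action, num_actions, start, steps, size=8):
    """Chain both stages: action label -> landmark sequence -> image frames."""
    sequence = generate_landmarks(params, action, num_actions, start, steps)
    return [render_frame(frame, size) for frame in sequence]


if __name__ == "__main__":
    params = make_rnn(num_actions=3, num_landmarks=4)
    start = [0.2, 0.2, 0.8, 0.2, 0.2, 0.8, 0.8, 0.8]
    video = generate_video(params, action=1, num_actions=3, start=start, steps=5)
    print(len(video), "frames of", len(video[0]), "x", len(video[0][0]), "pixels")
```

The point of the sketch is the factorisation: the sequence model only handles a few landmark coordinates per frame, while image synthesis is done independently for each frame.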

Volume 21

Pages 1799-1812

DOI 10.1109/TMM.2018.2885235

Language English

Journal IEEE Transactions on Multimedia
