Proceedings of the 27th ACM International Conference on Multimedia | 2019

Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent

Abstract


One-shot learning aims to recognize novel target classes from a few examples by transferring knowledge from source classes, under the general assumption that the source and target classes are semantically related but not identical. Based on this assumption, recent work has focused on image-based one-shot learning, while little work has addressed video-based one-shot learning. One challenge is that it is difficult to maintain the disjoint-class assumption for videos, since video clips of target classes may appear within videos of source classes. To address this issue, we introduce a novel setting, termed embodied-agent-based one-shot learning, which leverages synthetic videos produced in a virtual environment to understand realistic videos of target classes. In this setting, we further propose two types of learning tasks: embodied one-shot video domain adaptation and embodied one-shot video transfer recognition. These tasks serve as a testbed for evaluating video-related one-shot learning. In addition, we propose a general video segment augmentation method that significantly facilitates a variety of one-shot learning tasks. Experimental results validate the soundness of our setting and learning tasks, and also show the effectiveness of our augmentation approach for video recognition in the small-sample regime.

DOI 10.1145/3343031.3351015
