
Anticipating Activity from Multimodal Signals


Abstract


Images, videos, audio signals, and sensor data can easily be collected in huge quantities by different devices and processed to emulate the human capability of elaborating a variety of different stimuli. Are multimodal signals useful for understanding and anticipating human actions when acquired from the user's viewpoint? This paper proposes to build an embedding space in which inputs of different nature, but semantically correlated, are projected into a common representation and properly exploited to anticipate the future user activity. To this end, we built a new multimodal dataset comprising video, audio, tri-axial acceleration, angular velocity, tri-axial magnetic field, pressure, and temperature. To benchmark the proposed multimodal anticipation challenge, we consider classic classifiers on top of deep learning methods used to build the embedding space representing the multimodal signals. The achieved results show that exploiting different modalities improves the anticipation of the future activity.
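The abstract does not detail the architecture, so the sketch below is only a rough illustration of the general idea, not the paper's actual method: each of the listed modalities is mapped by its own (hypothetical) encoder into a shared embedding space, the per-modality embeddings are fused, and a classic classifier is trained on the result. The encoder design, the feature dimensions in modality_dims, the averaging fusion, EMBED_DIM, and the choice of an SVM are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's architecture): per-modality
# encoders project into a shared embedding space; a classic classifier
# (here an SVM) is trained on the fused embeddings.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

EMBED_DIM = 128  # hypothetical shared embedding size


class ModalityEncoder(nn.Module):
    """Projects one modality's feature vector into the shared space."""

    def __init__(self, in_dim, embed_dim=EMBED_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x):
        return self.net(x)


# Hypothetical per-modality input sizes (e.g., pre-extracted features);
# the modalities themselves are those listed in the abstract.
modality_dims = {
    "video": 512, "audio": 128, "acceleration": 3,
    "angular_velocity": 3, "magnetic_field": 3,
    "pressure": 1, "temperature": 1,
}
encoders = {m: ModalityEncoder(d) for m, d in modality_dims.items()}


def embed(sample):
    """Encode each modality, then fuse by averaging into one vector."""
    with torch.no_grad():
        embs = [encoders[m](torch.as_tensor(x, dtype=torch.float32))
                for m, x in sample.items()]
    return torch.stack(embs).mean(dim=0).numpy()


# Toy random data: 20 samples, 4 future-activity classes.
rng = np.random.default_rng(0)
samples = [{m: rng.normal(size=d) for m, d in modality_dims.items()}
           for _ in range(20)]
labels = rng.integers(0, 4, size=20)

X = np.stack([embed(s) for s in samples])
clf = SVC().fit(X, labels)   # classic classifier on the embeddings
print(clf.predict(X[:5]))    # predicted future activities
```

In this toy setup the encoders are untrained; in practice they would be learned so that semantically correlated inputs from different modalities land close together in the embedding space, which is what makes the shared representation useful for anticipation.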

Pages 4680-4687
DOI 10.1109/ICPR48806.2021.9412197
Language English
Journal 2020 25th International Conference on Pattern Recognition (ICPR)
