2021 18th International Conference on Ubiquitous Robots (UR)

Learning Multi-modal Attentional Consensus in Action Recognition for Elderly-Care Robots

Abstract


This paper addresses a practical action recognition method for elderly-care robots. Multi-stream models are a promising approach to handling the complexity of real-world environments. While multi-modal action recognition has been actively studied, there is a lack of research on models that effectively combine features from different modalities. This paper proposes a new mid-level feature fusion method for a two-stream action recognition network. In multi-modal approaches, extracting complementary information between different modalities is an essential task. Our network model is designed to fuse features at an intermediate level of feature extraction, leveraging the whole feature map from each modality. A consensus feature map and a consensus attention mechanism are proposed as effective ways to extract information from two different modalities: RGB data and motion features. We also introduce ETRI-Activity3D-LivingLab, a real-world RGB-D dataset for robots to recognize daily activities of the elderly. It is the first 3D action recognition dataset collected in a variety of home environments where the elderly actually reside. We expect our new dataset to contribute to practical studies of action recognition alongside the previously released ETRI-Activity3D dataset. To prove the effectiveness of the method, extensive experiments are performed on the NTU RGB+D, ETRI-Activity3D, and ETRI-Activity3D-LivingLab datasets. Our mid-level fusion method achieves competitive performance in various experimental settings, especially in domain-changing situations.
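The abstract does not spell out the fusion rule, but the described idea (form a consensus map from both mid-level feature maps, derive attention from it, and re-weight each stream) can be illustrated with a minimal PyTorch sketch. The module name ConsensusAttentionFusion, the element-wise-mean consensus rule, and the single-channel spatial attention are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn


class ConsensusAttentionFusion(nn.Module):
    """Hypothetical mid-level fusion of an RGB stream and a motion stream.

    A consensus feature map is formed from both modalities, a spatial
    attention map is derived from it, and the attention re-weights each
    stream before the features are fused.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv producing a single-channel spatial attention map in [0, 1].
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, motion_feat: torch.Tensor) -> torch.Tensor:
        # Consensus feature map: element-wise mean of the two streams
        # (an assumed aggregation rule; others are possible).
        consensus = 0.5 * (rgb_feat + motion_feat)
        # Consensus attention: spatial weights shared by both modalities.
        weights = self.attn(consensus)  # shape (N, 1, H, W)
        # Re-weight each stream with the shared attention, then fuse.
        return weights * rgb_feat + weights * motion_feat


if __name__ == "__main__":
    # Toy mid-level feature maps: batch 2, 64 channels, 28x28 spatial grid.
    rgb = torch.randn(2, 64, 28, 28)
    motion = torch.randn(2, 64, 28, 28)
    fusion = ConsensusAttentionFusion(channels=64)
    print(fusion(rgb, motion).shape)  # torch.Size([2, 64, 28, 28])

Because the attention is computed from the joint (consensus) map rather than from either stream alone, both modalities influence where the network attends, which is the intuition behind mid-level fusion over late score averaging.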

Pages 308-313
DOI 10.1109/UR52253.2021.9494666
Language English
Journal 2021 18th International Conference on Ubiquitous Robots (UR)
