2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2019

PA3D: Pose-Action 3D Machine for Video Recognition


Abstract


Recent studies have demonstrated the success of 3D CNNs for video action recognition. However, most 3D models are built upon RGB and optical flow streams, which may not fully exploit pose dynamics, an important cue for modeling human actions. To fill this gap, we propose a concise Pose-Action 3D Machine (PA3D), which effectively encodes multiple pose modalities within a unified 3D framework and consequently learns spatio-temporal pose representations for action recognition. More specifically, we introduce a novel temporal pose convolution to aggregate spatial poses over frames. Unlike classical temporal convolution, our operation explicitly learns the pose motions that are discriminative for recognizing human actions. Extensive experiments on three popular benchmarks (JHMDB, HMDB, and Charades) show that PA3D outperforms recent pose-based approaches. Furthermore, PA3D is highly complementary to recent 3D CNNs, e.g., I3D. Multi-stream fusion achieves state-of-the-art performance on all evaluated datasets.

Pages 7914-7923
DOI 10.1109/CVPR.2019.00811
Language English
