Signal, Image and Video Processing | 2021

View transform graph attention recurrent networks for skeleton-based action recognition


Abstract


Skeleton-based human action recognition has recently attracted the attention of researchers due to the accessibility and popularity of 3D skeleton data. However, it is difficult to represent spatial–temporal skeleton sequences effectively, given the large variations in action representations when they are captured from different viewpoints. To obtain a better representation of spatial–temporal skeletal features, this paper introduces a view transform graph attention recurrent network (VT+GARN) method for view-invariant human action recognition. We design a sequence-based view-invariant transform strategy to reduce the influence of viewpoint on the spatial–temporal positions of skeleton joints. A graph attention recurrent network then automatically calculates attention coefficients, learns the representation of the transformed spatiotemporal skeletal features, and outputs the classification result. Ablation studies and extensive experiments on three challenging datasets, Northwestern-UCLA, NTU RGB+D and UWA3DII, demonstrate the effectiveness and superiority of our method.
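The abstract summarizes the two components without their equations. As a rough illustration of the first component, a sequence-level view normalization is commonly implemented as a rigid translation plus rotation fixed by the body pose in the first frame; the NumPy sketch below follows that convention. The joint indices and the choice of reference axes are assumptions for illustration, not the authors' exact transform.

```python
import numpy as np

# Hypothetical joint indices; real datasets (e.g., NTU RGB+D) define their own layouts.
HIP_CENTER, LEFT_HIP, RIGHT_HIP, SPINE = 0, 1, 2, 3

def view_invariant_transform(seq):
    """Map a skeleton sequence of shape (frames, joints, 3) into a body-centred frame.

    Translation: move the hip centre of the first frame to the origin.
    Rotation: align the hip line with the x-axis and the spine with the z-axis,
    so the same action seen from different viewpoints yields similar coordinates.
    """
    seq = seq - seq[0, HIP_CENTER]            # translate to first-frame hip centre
    x = seq[0, RIGHT_HIP] - seq[0, LEFT_HIP]  # body x-axis: across the hips
    z = seq[0, SPINE] - seq[0, HIP_CENTER]    # body z-axis: up the spine
    x /= np.linalg.norm(x)
    z -= z.dot(x) * x                         # Gram-Schmidt: make z orthogonal to x
    z /= np.linalg.norm(z)
    y = np.cross(z, x)                        # third axis of a right-handed basis
    R = np.stack([x, y, z])                   # rows are the new basis vectors
    return seq @ R.T                          # rotate every joint in every frame

# Example: 40 frames, 20 joints, random pose data
seq = np.random.randn(40, 20, 3)
canon = view_invariant_transform(seq)
```

For the second component, the "coefficients of attention" presumably follow the standard graph-attention weighting (Veličković et al., 2018), where the coefficient between joints i and j is α_ij = softmax_j(LeakyReLU(aᵀ[Wh_i ‖ Wh_j])), computed over the neighbours of joint i in the skeleton graph; the recurrent part then aggregates the attended features over time. The abstract does not state the authors' exact formulation.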

Volume 15
Pages 599-606
DOI 10.1007/s11760-020-01781-6
Language English
Journal Signal, Image and Video Processing
