IEEE Transactions on Multimedia | 2019

Learning Composite Latent Structures for 3D Human Action Representation and Recognition


Abstract


3D human action representation and recognition are important problems in many multimedia applications. While latent state approaches have been widely used for action modeling, previous work assumes that each latent state of an action has a single attribute. This assumption is inadequate for representing the structure of complex actions. In this paper, we propose that latent states have composite attributes and introduce a novel composite latent structure (CLS) model to represent and recognize 3D human actions from skeleton sequences. A human action is modeled with a hierarchical graph, which represents the action sequence as a sequence of atomic actions. Each atomic action is represented as a composite latent state, which is composed of a latent semantic attribute and a latent geometric attribute. A discriminative EM-like algorithm is proposed to learn the model parameters and the composite latent structures of human actions. Given a 3D skeleton sequence, a composite attribute iterative programming algorithm is proposed to recognize the action and infer the action's latent temporal structure. We evaluate the proposed method on three challenging 3D action datasets: the MSR 3D Action Dataset, the Multiview 3D Event Dataset, and the UTKinect-Action 3D Dataset. Extensive experimental results on these datasets demonstrate the effectiveness and advantages of the proposed method.

Volume 21
Pages 2195-2208
DOI 10.1109/TMM.2019.2897902
Language English
Journal IEEE Transactions on Multimedia
