Pattern Recognition and Image Analysis | 2021

Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks


Abstract


Traditional human action recognition algorithms based on feature extraction are complicated, which leads to low recognition accuracy. We present an algorithm for recognizing human actions in videos based on spatio-temporal fusion using 3D convolutional neural networks (3D CNNs). The algorithm contains two subnetworks, which extract deep spatial information and deep temporal information, respectively, and a bilinear fusion policy is applied to obtain the final fused spatio-temporal information. The spatial information is represented by a gradient feature, and the temporal information is represented by optical flow. By constructing a new 3D CNN, the fused spatio-temporal information can be used to retrieve deep features from multiple angles. The proposed algorithm is compared with current mainstream algorithms on the KTH and UCF101 datasets, and the results show that it is effective and achieves high recognition accuracy.
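
As a rough illustration of the bilinear fusion step mentioned above, the sketch below combines a spatial feature vector and a temporal feature vector through their outer product, so that every spatial feature interacts with every temporal feature. The function name `bilinear_fuse`, the toy feature values, and the final L2 normalization are assumptions made for illustration only; the abstract does not specify the exact fusion layer, so this should not be read as the authors' implementation.

```python
import math


def bilinear_fuse(spatial, temporal):
    """Fuse two feature vectors via a flattened outer product (hypothetical helper)."""
    # Outer product, flattened row by row: one entry per (spatial, temporal) pair.
    fused = [s * t for s in spatial for t in temporal]
    # L2 normalization is an assumed post-processing step, not taken from the paper.
    norm = math.sqrt(sum(v * v for v in fused))
    return [v / norm for v in fused] if norm > 0 else fused


# Toy vectors standing in for the outputs of the two subnetworks (assumed values).
spatial_features = [1.0, 2.0]        # e.g. from the gradient-based spatial stream
temporal_features = [3.0, 0.0, 4.0]  # e.g. from the optical-flow temporal stream

fused = bilinear_fuse(spatial_features, temporal_features)
print(len(fused))  # one fused entry per pair of input features
```

The fused vector has length equal to the product of the two input lengths, which is why bilinear fusion is usually applied to compact feature maps rather than raw frames.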

Volume 31
Pages 580-587
DOI 10.1134/S105466182103024X
Language English
Journal Pattern Recognition and Image Analysis
