Multimedia Tools and Applications | 2021

Deep convolutional neural model for human activities recognition in a sequence of video by combining multiple CNN streams

 
 

Abstract


A video is a sequence of images, and this image sequence carries both spatial and temporal information. Optical flow and motion history images are two well-known representations for identifying human activities. Optical flow describes the velocity of every individual pixel in a frame, but this motion information alone cannot represent a complete action or distinguish different movement speeds. In a motion history image, local body parts that move for similar durations produce almost the same intensity, so similar actions are not identified with good precision. In this paper, a deep convolutional neural model for human activity recognition in video is proposed in which multiple CNN streams are combined, so that the model merges spatial and temporal information. Two fusion schemes for the spatial and temporal streams, average fusion and convolution fusion, are discussed. The proposed method outperforms other human activity recognition approaches on the benchmark datasets UCF101 and HMDB51: on UCF101, average fusion achieves 95.4% test accuracy and convolution fusion achieves 97.2%; on HMDB51, average fusion achieves 84.3% and convolution fusion achieves 85.1%.
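The abstract does not describe the architecture in detail, so the following is only a minimal sketch of the two fusion schemes it names, assuming a PyTorch-style setup in which spatial_feat and temporal_feat are feature maps produced by an RGB-stream CNN and an optical-flow-stream CNN; the class and variable names are illustrative and not taken from the paper.

# Minimal sketch (assumed, not the paper's exact architecture): feature maps
# from a spatial (RGB) stream and a temporal (optical-flow) stream are merged
# either by element-wise averaging ("average fusion") or by stacking them along
# the channel axis and mixing with a 1x1 convolution ("convolution fusion").
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, channels: int, num_classes: int, mode: str = "conv"):
        super().__init__()
        self.mode = mode
        if mode == "conv":
            # 1x1 convolution learns how to weight and mix the stacked streams
            self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # global average pooling over the spatial grid
            nn.Flatten(),
            nn.Linear(channels, num_classes),
        )

    def forward(self, spatial_feat: torch.Tensor, temporal_feat: torch.Tensor):
        if self.mode == "avg":
            fused = 0.5 * (spatial_feat + temporal_feat)        # average fusion
        else:
            stacked = torch.cat([spatial_feat, temporal_feat], dim=1)
            fused = self.fuse(stacked)                          # convolution fusion
        return self.classifier(fused)

# Usage with dummy feature maps (batch of 4, 512 channels, 7x7 spatial grid)
spatial = torch.randn(4, 512, 7, 7)
temporal = torch.randn(4, 512, 7, 7)
logits = TwoStreamFusion(channels=512, num_classes=101, mode="avg")(spatial, temporal)
print(logits.shape)  # torch.Size([4, 101])

In this sketch, average fusion simply averages the two streams element-wise, while convolution fusion stacks them along the channel dimension and lets a learned 1x1 convolution decide how to combine them before classification.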

Pages 1-13
DOI 10.1007/S11042-021-11220-4
Language English
Journal Multimedia Tools and Applications
