Multim. Tools Appl. | 2021

Enhanced 3D residual network for video event recognition in shipping monitoring

 
 

Abstract


Three-dimensional convolutional neural networks are widely used in video recognition, action recognition, and related tasks because they extract temporal and spatial features directly. Because such networks have many parameters, demand substantial computing resources, and are difficult to train, their structure is generally shallow. For example, the traditional C3D [17] method uses only an 11-layer VGGNet structure, and the traditional Res3D [18] method adopts residual networks of 18 and 34 layers. Experience with two-dimensional convolutional neural networks suggests that the deeper the network structure is, the higher the recognition accuracy will be. This paper therefore proposes a new method, 3D ResNet-66, which combines a 50-layer 3D residual network with four-layer residual blocks, effectively reducing the number of parameters while increasing the depth of the network; experiments yield a better video recognition model. We evaluate our method on shipping event datasets. Compared with the traditional C3D and Res3D methods, our method improves accuracy from 91.48% to 96.33%, reduces the model size from 561 MB to 135 MB, and halves the average processing time.
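The claim that a deeper residual design can still have fewer parameters can be illustrated with a small weight count. The sketch below is not the paper's architecture: it assumes the standard ResNet-style comparison between a two-layer basic block (two 3x3x3 convolutions) and a three-layer bottleneck block (1x1x1, 3x3x3, 1x1x1), and the function names, the channel width of 256, and the reduction factor of 4 are all hypothetical choices made for illustration. Batch normalization and shortcut projections are ignored.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), bias=False):
    """Number of learnable weights in one 3D convolution layer."""
    k_t, k_h, k_w = kernel
    weights = k_t * k_h * k_w * c_in * c_out
    return weights + (c_out if bias else 0)


def basic_block_params(channels):
    """Basic residual block: two 3x3x3 convolutions at full width.

    Assumed layout (not taken from the paper); normalization layers
    and the shortcut path are not counted.
    """
    return 2 * conv3d_params(channels, channels)


def bottleneck_block_params(channels, reduction=4):
    """Bottleneck residual block: 1x1x1 reduce, 3x3x3, 1x1x1 expand.

    `reduction` is a hypothetical parameter: the inner width is
    channels // reduction.
    """
    mid = channels // reduction
    return (
        conv3d_params(channels, mid, kernel=(1, 1, 1))
        + conv3d_params(mid, mid, kernel=(3, 3, 3))
        + conv3d_params(mid, channels, kernel=(1, 1, 1))
    )


if __name__ == "__main__":
    width = 256  # illustrative channel width, an assumption
    basic = basic_block_params(width)
    bottleneck = bottleneck_block_params(width)
    print(f"basic block (2 conv layers):      {basic:,} weights")
    print(f"bottleneck block (3 conv layers): {bottleneck:,} weights")
    print(f"ratio: {basic / bottleneck:.1f}x")
```

Under these assumptions the bottleneck block has one more convolution layer than the basic block yet far fewer weights, because the expensive 3x3x3 convolution runs at a quarter of the channel width. This is one common way a residual network can grow deeper while its parameter count shrinks; whether 3D ResNet-66 achieves its reported size reduction in exactly this way is not stated in the abstract.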

Volume 80
Pages 3337-3348
DOI 10.1007/S11042-020-09564-4
Language English
Journal Multim. Tools Appl.
