IEEE Access | 2021

An Image Captioning Model Based on Bidirectional Depth Residuals and its Application


Abstract


A novel network model, the bidirectional depth residuals gated recurrent unit network (BDR-GRU), is designed and implemented to improve the effectiveness of image captioning. BDR-GRU is based on an encoder-decoder architecture. It can also run on an NVIDIA Jetson TX2 processor, which makes the algorithm applicable to mobile robots. In the encoding stage, a convolutional neural network extracts multi-dimensional vector information from the image. In the decoding stage, the BDR-GRU network generates the sentence. BDR-GRU is a new recurrent neural network model built on the GRU network. First, the GRU network is extended from a single layer to multiple layers. Second, the bidirectional derivation structure is redesigned to strengthen its derivation ability. Finally, a residual mechanism between layers is designed to prevent the vanishing gradients and over-fitting caused by the increased number of layers. Experiments carried out on the TX2 processor verify the effectiveness of the design. The results are compared with those of the img-gLSTM model, the Neural Talk model, an attention model, and a unidirectional GRU model, and the differences are analyzed. The experimental results show that the CIDEr score of our model is 12.7% higher than that of the img-gLSTM network and 14.6% higher than that of the Neural Talk network, and other evaluation metrics also improve significantly. These results demonstrate the effectiveness of the BDR-GRU model.
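The abstract names three modifications to the GRU decoder: stacking multiple layers, a bidirectional structure, and residual connections between layers. The minimal pure-Python sketch that follows illustrates these three ingredients generically. It is not the authors' BDR-GRU, and it rests on several assumptions. The gate equations follow a standard GRU with biases omitted. Forward and backward states are merged by averaging so that layer widths match for the residual addition. The names `GRUCell`, `run_gru`, and `StackedBiResidualGRU` are hypothetical. The abstract does not say how bidirectionality is reconciled with word-by-word caption generation, so the sketch simply runs over a given input sequence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell (update gate z, reset gate r); biases omitted for brevity."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # New state interpolates between the candidate n and the previous state h.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def run_gru(cell, xs, reverse=False):
    """Run one direction over the sequence; outputs stay aligned with time index t."""
    h = [0.0] * cell.hidden_size
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    out = [None] * len(xs)
    for t in order:
        h = cell.step(xs[t], h)
        out[t] = h
    return out


class StackedBiResidualGRU:
    """Hypothetical illustration: multi-layer, bidirectional GRU with inter-layer residuals."""

    def __init__(self, input_size, hidden_size, num_layers, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for i in range(num_layers):
            in_size = input_size if i == 0 else hidden_size
            self.layers.append((GRUCell(in_size, hidden_size, rng),
                                GRUCell(in_size, hidden_size, rng)))

    def forward(self, xs):
        seq = xs
        for i, (fwd, bwd) in enumerate(self.layers):
            hf = run_gru(fwd, seq)
            hb = run_gru(bwd, seq, reverse=True)
            # Assumption: average the two directions so the width stays hidden_size.
            merged = [[(a + b) / 2 for a, b in zip(f, bk)] for f, bk in zip(hf, hb)]
            if i > 0:
                # Residual: add the layer's input (same width) to its output.
                merged = [[m + s for m, s in zip(mv, sv)] for mv, sv in zip(merged, seq)]
            seq = merged
        return seq


if __name__ == "__main__":
    model = StackedBiResidualGRU(input_size=4, hidden_size=3, num_layers=3)
    xs = [[0.1, -0.2, 0.3, 0.0] for _ in range(5)]
    out = model.forward(xs)
    print(len(out), len(out[0]))  # 5 3
```

Averaging the two directions is only one possible choice. Concatenation is also common, but it doubles the layer width and would need a projection before each residual addition. The residual is skipped on the first layer because its input width (for example, a CNN feature or word embedding) need not equal the hidden size.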

Volume 9
Pages 25360-25370
DOI 10.1109/ACCESS.2021.3057091
Language English
Journal IEEE Access
