2021 29th Signal Processing and Communications Applications Conference (SIU) | 2021

Multi-GRU Based Automated Image Captioning for Smartphones


Abstract


Image captioning is the description of an image in natural language, using techniques from computer vision and natural language processing. Recent advances in smartphone hardware and processing power have led to the development of many image captioning applications. In this study, a novel automatic image captioning system based on the encoder-decoder approach, which can be deployed on smartphones, is proposed. In the encoder, high-level visual information is extracted with the ResNet152V2 convolutional neural network; the proposed decoder then transforms the extracted visual information into natural-language descriptions of the images. The decoder's multilayer gated recurrent unit (GRU) structure allows more meaningful captions to be generated from the most relevant visual information. The proposed system has been evaluated on the MSCOCO dataset using several performance metrics, and it outperforms state-of-the-art approaches. The system is also integrated into our custom-designed Android application, named IMECA, which, unlike similar applications, generates captions in offline mode. The aim is to make image captioning practical for more people.
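To make the "multilayer gated recurrent unit" idea concrete, the following is a minimal sketch of one time step through a stack of GRU layers, written in plain Python. It is not the paper's decoder: the abstract does not give layer counts, layer sizes, or how the visual features are fed in, so the toy dimensions, the random weights, the `GRUCell` and `stacked_gru_step` names, and the choice to feed a stand-in feature vector at every step are all assumptions made for illustration. The update uses one common GRU convention, h' = (1 - z) * h + z * h_candidate.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """One GRU layer with randomly initialised weights (illustrative only)."""

    def __init__(self, input_size, hidden_size, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: update z, reset r, candidate h.
        self.params = {
            g: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for g in ("z", "r", "h")
        }

    def _pre(self, gate, x, h):
        """Pre-activation W x + U h + b for the given gate."""
        w, u, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]   # update gate
        r = [sigmoid(v) for v in self._pre("r", x, h)]   # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]        # r * h, elementwise
        cand = [math.tanh(v) for v in self._pre("h", x, gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def stacked_gru_step(cells, x, states):
    """Run one time step through a stack of GRU layers.

    Each layer's new hidden state becomes the input of the layer above it.
    """
    new_states = []
    for cell, h in zip(cells, states):
        h_new = cell.step(x, h)
        new_states.append(h_new)
        x = h_new
    return new_states


rng = random.Random(0)
feature_size, hidden_size, num_layers = 8, 4, 3  # toy sizes, not the paper's
cells = [
    GRUCell(feature_size if i == 0 else hidden_size, hidden_size, rng)
    for i in range(num_layers)
]
states = [[0.0] * hidden_size for _ in range(num_layers)]
# Stand-in for the CNN encoder output (assumption: a flat feature vector).
image_features = [rng.uniform(-1, 1) for _ in range(feature_size)]
for _ in range(5):
    states = stacked_gru_step(cells, image_features, states)
```

In a real captioning decoder, the top layer's state would typically be projected onto the vocabulary to pick the next word, and the word embedding would be part of each step's input; those parts are omitted here to keep the sketch focused on the layer stacking.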

Pages 1-4
DOI 10.1109/SIU53274.2021.9477901
Language English
Journal 2021 29th Signal Processing and Communications Applications Conference (SIU)
