2021 IEEE International Conference on Mechatronics and Automation (ICMA) | 2021

Improving Caption Consistency to Image with Semantic Filter by Adversarial Training

 
 

Abstract


Benefiting from large-scale datasets, image captioning has achieved remarkable success in generating human-like captions. However, for specific tasks (e.g., stylized image captioning) trained on small-scale datasets, the visual objects and semantic diversity are generally insufficient. Although the generated captions are stylistically suitable, they still fall short of depicting the image with comprehensive visual objects, which reduces the fluency and accuracy of the expressions. To address this issue, we propose an image captioning system based on an adversarial training strategy. To improve accuracy, a semantic filter module is implemented to extract informative context from the semantic vectors. With an architecture of two separate LSTMs, our model learns the image features and semantic vectors at the global and local levels. Through adversarial training, the generated captions integrate accurate information while being expressed in a fluent style. Experimental results show the outstanding performance of our approach in capturing semantic knowledge on the FlickrStyle10K dataset. Linguistic analysis demonstrates that our model succeeds in improving the accuracy and fluency of generated captions.
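The abstract does not give the exact formulation of the semantic filter, but the idea of keeping only informative context from a semantic vector is commonly realized as a learned gating. Below is a minimal, stdlib-only sketch under that assumption; the function name `semantic_filter` and the single-layer sigmoid gate are hypothetical illustrations, not the paper's actual implementation.

```python
import math

def semantic_filter(semantic_vec, weights, bias):
    """Gate each semantic feature with a sigmoid so that features the
    (hypothetical) learned weights mark as informative pass through,
    while uninformative ones are suppressed."""
    gates = [1.0 / (1.0 + math.exp(-(w * s + bias)))
             for w, s in zip(weights, semantic_vec)]
    return [g * s for g, s in zip(gates, semantic_vec)]

# Toy example: a strongly positive weight lets a feature pass almost
# unchanged, a negative weight suppresses it, a zero weight halves it.
sem = [2.0, 0.5, 1.5]
w = [3.0, -3.0, 0.0]
filtered = semantic_filter(sem, w, 0.0)
```

In a full model such a gate would be trained jointly with the two LSTM branches, so the filter learns which semantic vector components to forward to the caption decoder.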

Pages 269-274
DOI 10.1109/ICMA52036.2021.9512682
Language English
Journal 2021 IEEE International Conference on Mechatronics and Automation (ICMA)
