Proceedings of the 28th ACM International Conference on Information and Knowledge Management | 2019

Cross-modal Image-Text Retrieval with Multitask Learning

Abstract


In this paper, we propose a multitask learning approach for cross-modal image-text retrieval. First, a correlation network is proposed for the relation recognition task, which helps learn the complicated relations and common information across modalities. Then, we propose a correspondence cross-modal autoencoder for the cross-modal input reconstruction task, which correlates the hidden representations of two uni-modal autoencoders. In addition, to further improve retrieval performance, two regularization terms (variance and consistency constraints) are imposed on the cross-modal embeddings so that the learned common information has large variance and is modality invariant. Finally, to enable large-scale cross-modal similarity search, a flexible binary transform network is designed to convert the text and image embeddings into binary codes. Extensive experiments on two benchmark datasets demonstrate that our model consistently outperforms strong baseline methods. Source code is available at https://github.com/daerv/DAEVR.
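To make the multitask objective concrete, the following is a minimal PyTorch-style sketch of how the loss terms named in the abstract (reconstruction, correspondence, variance, consistency, and binarization) could be combined. All class and function names (CorrespondenceAE, multitask_loss), layer sizes, hinge thresholds, and loss weights here are illustrative assumptions, not the authors' released implementation; see the repository linked above for the actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrespondenceAE(nn.Module):
    """Two uni-modal autoencoders with a shared-size hidden code,
    plus a binary transform head (dimensions are assumptions)."""
    def __init__(self, img_dim=4096, txt_dim=300, hid=256, bits=64):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hid)
        self.img_dec = nn.Linear(hid, img_dim)
        self.txt_enc = nn.Linear(txt_dim, hid)
        self.txt_dec = nn.Linear(hid, txt_dim)
        self.hash = nn.Linear(hid, bits)  # binary transform network

    def forward(self, img, txt):
        h_img = torch.tanh(self.img_enc(img))
        h_txt = torch.tanh(self.txt_enc(txt))
        return h_img, h_txt, self.img_dec(h_img), self.txt_dec(h_txt)

def multitask_loss(model, img, txt):
    h_i, h_t, rec_i, rec_t = model(img, txt)
    # Reconstruction task: each autoencoder rebuilds its own input.
    l_rec = F.mse_loss(rec_i, img) + F.mse_loss(rec_t, txt)
    # Correspondence: hidden codes of a paired image and text should match.
    l_corr = F.mse_loss(h_i, h_t)
    # Variance constraint: hinge keeps per-dimension embedding variance
    # from collapsing (threshold 1.0 is an assumption).
    l_var = (F.relu(1.0 - h_i.var(dim=0)).mean()
             + F.relu(1.0 - h_t.var(dim=0)).mean())
    # Consistency constraint: modality-invariant embeddings, approximated
    # here by matching the modality-wise means of the two batches.
    l_cons = F.mse_loss(h_i.mean(dim=0), h_t.mean(dim=0))
    # Binarization: push hash outputs toward {-1, +1} for binary codes.
    b_i, b_t = torch.tanh(model.hash(h_i)), torch.tanh(model.hash(h_t))
    l_bin = ((b_i.abs() - 1) ** 2).mean() + ((b_t.abs() - 1) ** 2).mean()
    return l_rec + l_corr + 0.1 * l_var + 0.1 * l_cons + 0.1 * l_bin

With paired batches img of shape (batch, 4096) and txt of shape (batch, 300), loss = multitask_loss(CorrespondenceAE(), img, txt) can be minimized with any standard optimizer; at retrieval time, applying torch.sign to the hash-head outputs yields binary codes suitable for Hamming-distance search.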

DOI 10.1145/3357384.3358104
