IEEE Transactions on Multimedia | 2021

Joint Input and Output Space Learning for Multi-Label Image Classification


Abstract


Multi-label image classification aims to predict the labels associated with a given image. While most existing methods rely on a unified image representation, extracting label-specific features through input space learning can improve the discriminative power of the learned features. At the same time, most feature learning studies ignore learning in the output label space, even though exploiting label correlations can boost classification performance. In this paper, we propose a deep learning framework with flexible modules that learn from both the input and output spaces for multi-label image classification. For input space learning, we devise a label-specific feature pooling method that refines convolutional features to obtain features specific to each label. For output space learning, we design a Two-Stream Graph Convolutional Network (TSGCN) that learns multi-label classifiers by modelling spatial object relationships and semantic label correlations. More specifically, we build object spatial graphs to characterize the spatial relationships among objects in an image, which complement the label semantic graphs modelling the semantic label correlations. Experimental results on two popular benchmark datasets (i.e., Pascal VOC and MS-COCO) show that our proposed method outperforms state-of-the-art methods.
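To make the two-stream idea concrete, the following is a minimal NumPy sketch (not the authors' implementation): each stream runs a graph convolution over its own graph, here a semantic label-correlation graph and an object spatial-relation graph, and the two streams are fused (by simple averaging, an assumption for illustration) to produce per-label classifier weights. All names, dimensions, and the toy adjacency matrices are hypothetical.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: propagate node features over the
    row-normalized adjacency, then apply a linear map and ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    norm_adj = adj / np.maximum(deg, 1e-12)  # row-normalize to avoid scale blow-up
    return np.maximum(norm_adj @ feats @ weight, 0.0)

def two_stream_gcn(label_emb, semantic_adj, spatial_adj, w_sem, w_spa):
    """Illustrative two-stream GCN: one stream over the semantic
    label-correlation graph, one over the spatial-relation graph,
    fused by averaging into per-label classifier weights."""
    sem = gcn_layer(semantic_adj, label_emb, w_sem)
    spa = gcn_layer(spatial_adj, label_emb, w_spa)
    return 0.5 * (sem + spa)

# Toy setup: 4 labels, 8-d label embeddings, 6-d classifier weights.
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))
sem_adj = np.ones((4, 4))   # placeholder semantic co-occurrence graph
spa_adj = np.eye(4)         # placeholder spatial-relation graph
W = two_stream_gcn(emb, sem_adj, spa_adj,
                   rng.standard_normal((8, 6)), rng.standard_normal((8, 6)))

# Scoring: dot each label's classifier with its (label-specific) image feature.
img_feat = rng.standard_normal((1, 6))
scores = img_feat @ W.T  # shape (1, 4): one score per label
```

The dot product of each label's learned classifier row with the corresponding label-specific image feature yields one prediction score per label, matching the multi-label setting described above.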

Volume 23
Pages 1696-1707
DOI 10.1109/TMM.2020.3002185
Language English
Journal IEEE Transactions on Multimedia

Full Text