Proceedings of the 29th ACM International Conference on Multimedia | 2021

Graph Convolutional Multi-modal Hashing for Flexible Multimedia Retrieval

Abstract


Multi-modal hashing makes an important contribution to multimedia retrieval, where a key challenge is to encode heterogeneous modalities into compact hash codes. To address this challenge, graph-based multi-modal hashing methods generally define an individual affinity matrix for each modality and apply a linear algorithm to fuse the heterogeneous modalities and learn compact hash codes. Several other methods construct a graph Laplacian matrix from semantic information to help learn discriminative hash codes. However, these conventional methods largely ignore the structural similarity of the training set and the complex relations among multi-modal samples, which leads to unsatisfactory complementarity of the fused hash codes. More notably, they face two further problems: the huge computation and storage costs incurred by graph construction, and the loss of partial modality features when an incomplete query sample arrives. In this paper, we propose a Flexible Graph Convolutional Multi-modal Hashing (FGCMH) method that adopts GCNs with linear complexity to preserve both the modality-individual and the modality-fused structural similarity for discriminative hash learning. As a result, accurate multimedia retrieval can be performed on both complete and incomplete datasets with our method. Specifically, multiple modality-individual GCNs under semantic guidance act on each modality independently to preserve intra-modality similarity; their output representations are then fused into a fusion graph with an adaptive weighting scheme. A hash GCN and a semantic GCN, which share parameters in their first two layers, propagate the fused information and generate hash codes under the supervision of a high-level label space. In the query stage, our method adaptively captures various multi-modal contents in a flexible and robust way, even if partial modality features are missing. Experimental results on three public datasets demonstrate the flexibility and effectiveness of the proposed method.
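The abstract describes the pipeline only at a high level. The following is a minimal PyTorch sketch of how such a pipeline could be wired together: per-modality GCNs, adaptively weighted fusion, and a hash branch and semantic branch sharing their first two layers. All concrete choices here (the single-layer propagation rule H' = ReLU(Â H W), softmax-weighted fusion, an averaged adjacency standing in for the fusion graph, and names such as FGCMHSketch) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, adj, feats):
        return torch.relu(self.weight(adj @ feats))

class FGCMHSketch(nn.Module):
    """Hypothetical sketch of the FGCMH pipeline from the abstract.
    Depths, dimensions, and the fusion rule are assumptions."""
    def __init__(self, modal_dims, hidden_dim, code_len, num_classes):
        super().__init__()
        # One modality-individual GCN per modality (intra-modality similarity).
        self.modal_gcns = nn.ModuleList(
            [GCNLayer(d, hidden_dim) for d in modal_dims])
        # Learnable fusion weights, softmax-normalized (adaptive weighting).
        self.fusion_logits = nn.Parameter(torch.zeros(len(modal_dims)))
        # Two layers shared by the hash GCN and the semantic GCN.
        self.shared1 = GCNLayer(hidden_dim, hidden_dim)
        self.shared2 = GCNLayer(hidden_dim, hidden_dim)
        # Branch heads: relaxed hash codes and label predictions.
        self.hash_head = nn.Linear(hidden_dim, code_len)
        self.label_head = nn.Linear(hidden_dim, num_classes)

    def forward(self, adjs, feats, present=None):
        """adjs/feats: per-modality normalized adjacency and feature tensors.
        present: optional boolean mask marking which modalities a (possibly
        incomplete) query actually provides."""
        if present is None:
            present = [True] * len(feats)
        w = torch.softmax(self.fusion_logits, dim=0)
        fused, total = 0.0, 0.0
        for i in range(len(feats)):
            if not present[i]:      # skip missing modalities at query time
                continue
            fused = fused + w[i] * self.modal_gcns[i](adjs[i], feats[i])
            total = total + w[i]
        fused = fused / total       # renormalize over available modalities
        # Assumed stand-in for the fusion graph: average the modality graphs.
        adj_fused = sum(a for a, p in zip(adjs, present) if p) / sum(present)
        h = self.shared2(adj_fused, self.shared1(adj_fused, fused))
        codes = torch.tanh(self.hash_head(h))   # relaxed binary codes
        logits = self.label_head(h)             # semantic (label) branch
        return codes, logits
```

In this sketch, the `present` mask is what makes the query stage flexible: a query missing a modality simply contributes nothing to the weighted sum, and the remaining fusion weights are renormalized, so hash codes can still be produced from whatever modalities are available.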

DOI 10.1145/3474085.3475598
Language English
Journal Proceedings of the 29th ACM International Conference on Multimedia
