Proceedings of the 27th ACM International Conference on Multimedia | 2019

Perceptual Visual Reasoning with Knowledge Propagation


Abstract


Visual Question Answering (VQA) aims to answer natural language questions about given images, which poses great challenges in comprehensively understanding and reasoning over the rich content of both questions and images. Most existing literature on VQA fuses image and question features with attention mechanisms to answer the questions. To obtain a more human-like inferential ability, some preliminary module-based approaches decompose the whole problem into modular sub-problems. However, these methods still suffer from unsolved challenges such as insufficient explainability and weak logical inference; the gap between these preliminary studies and real human reasoning behavior remains large. To tackle these challenges, we propose a Perceptual Visual Reasoning (PVR) model, which advances one important step towards more explainable VQA. The proposed PVR model is a module-based approach that incorporates logical AND/OR operations for logical inference, introduces a richer group of perceptual modules for better logical generalization, and utilizes supervision on each sub-module for more explainability. Knowledge propagation is thereby enabled through the modular design and the supervision on sub-modules. We carry out extensive experiments with various evaluation metrics to demonstrate the superiority of the proposed PVR model over other state-of-the-art methods.
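To make the modular design concrete, below is a minimal PyTorch-style sketch of how logical AND/OR modules can combine the soft attention maps produced by perceptual modules. This is an illustrative sketch under our own assumptions, not the authors' released code: the names FindModule, AndModule, and OrModule, and the element-wise min/max realization of AND/OR, are hypothetical choices made for exposition.

import torch
import torch.nn as nn

class FindModule(nn.Module):
    """Hypothetical perceptual module: scores how well each image region
    matches a textual concept, producing a soft attention map."""
    def __init__(self, img_dim: int, txt_dim: int):
        super().__init__()
        self.proj = nn.Linear(img_dim + txt_dim, 1)

    def forward(self, img_feats: torch.Tensor, concept_emb: torch.Tensor) -> torch.Tensor:
        # img_feats: (num_regions, img_dim); concept_emb: (txt_dim,)
        tiled = concept_emb.unsqueeze(0).expand(img_feats.size(0), -1)
        scores = self.proj(torch.cat([img_feats, tiled], dim=-1)).squeeze(-1)
        return torch.sigmoid(scores)  # attention weight per region, in [0, 1]

class AndModule(nn.Module):
    """Logical AND: intersection of two attention maps (element-wise min)."""
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.minimum(a, b)

class OrModule(nn.Module):
    """Logical OR: union of two attention maps (element-wise max)."""
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.maximum(a, b)

# Usage: "Is there a red cube or a blue sphere?"
# Compose the layout OR(Find[red cube], Find[blue sphere]).
find = FindModule(img_dim=2048, txt_dim=300)
img_feats = torch.randn(36, 2048)   # e.g. 36 detected regions (placeholder features)
red_cube = torch.randn(300)         # placeholder concept embeddings
blue_sphere = torch.randn(300)
att = OrModule()(find(img_feats, red_cube), find(img_feats, blue_sphere))
answer_yes = att.max() > 0.5        # untrained network; threshold is illustrative

Because every module emits an interpretable attention map, supervision can in principle be attached to each intermediate output rather than only to the final answer, which is the mechanism the abstract refers to as knowledge propagation across sub-modules.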

DOI 10.1145/3343031.3350922
Language English
Journal Proceedings of the 27th ACM International Conference on Multimedia
