IEEE Transactions on Image Processing | 2021

Hierarchical Reasoning Network for Human-Object Interaction Detection


Abstract


Human-object interaction detection, which aims at detecting <human, interaction, object> triplets, is critical for holistic human-centric scene understanding. Existing approaches ignore the modeling of correlations among hierarchical human parts and objects. In this work, we introduce a Hierarchical Reasoning Network (HRNet) to capture relations among human parts at multiple scales (the holistic human, human region, and human keypoint levels) and objects via a unified graph. In particular, HRNet first constructs a multi-level human parts graph. The nodes at each level are the human parts at one specific scale, the objects, and the unions of human part-object pairs, and their mutual visual and spatial layout relations drive intra-level reasoning. To also capture relations across scales, we further introduce inter-level reasoning between the nodes of two consecutive levels, based on the prior of human body structure. During reasoning, the representations of graph nodes are propagated alternately through intra-level and inter-level reasoning. Extensive experiments demonstrate that HRNet obtains new state-of-the-art results on three challenging benchmarks, HICO-DET, V-COCO, and HOI-A, validating the effectiveness of the proposed method.
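
The alternating propagation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the node names, the edge lists, the mean aggregation, and the `mix` weight are all assumptions made for illustration, and a real model would use learned visual and spatial features rather than hand-set numbers.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[str, str]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def propagate(features: Dict[str, Vector], edges: List[Edge], mix: float = 0.5) -> Dict[str, Vector]:
    """One synchronous message-passing step over undirected edges.

    Each node blends its own feature with the mean of its neighbours.
    Mean aggregation and the fixed `mix` weight are illustrative assumptions.
    """
    neighbours: Dict[str, List[str]] = {name: [] for name in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated: Dict[str, Vector] = {}
    for name, feat in features.items():
        if neighbours[name]:
            message = mean_vectors([features[n] for n in neighbours[name]])
            updated[name] = [(1 - mix) * f + mix * m for f, m in zip(feat, message)]
        else:
            updated[name] = list(feat)
    return updated


def hierarchical_reasoning(features: Dict[str, Vector], intra_edges: List[Edge],
                           inter_edges: List[Edge], rounds: int = 1) -> Dict[str, Vector]:
    """Alternate intra-level and inter-level propagation for `rounds` rounds."""
    for _ in range(rounds):
        features = propagate(features, intra_edges)  # within each scale
        features = propagate(features, inter_edges)  # across consecutive scales
    return features


# Hypothetical three-level graph: holistic human, one region, one keypoint.
# Each level holds a part node, an object node, and a part-object union node.
features = {
    "human": [1.0], "obj@0": [0.0], "human+obj@0": [0.0],
    "arm": [0.0], "obj@1": [0.0], "arm+obj@1": [0.0],
    "wrist": [0.0], "obj@2": [0.0], "wrist+obj@2": [0.0],
}
intra_edges = [
    ("human", "human+obj@0"), ("obj@0", "human+obj@0"),
    ("arm", "arm+obj@1"), ("obj@1", "arm+obj@1"),
    ("wrist", "wrist+obj@2"), ("obj@2", "wrist+obj@2"),
]
# Body-structure prior (assumed): the human contains the arm, the arm contains the wrist.
inter_edges = [("human", "arm"), ("arm", "wrist")]

result = hierarchical_reasoning(features, intra_edges, inter_edges, rounds=2)
```

In this toy graph, information that starts at the holistic `human` node reaches the `wrist` keypoint only after it has crossed two inter-level edges, which is why more than one round is used here.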

Volume 30
Pages 8306-8317
DOI 10.1109/TIP.2021.3093784
Language English
Journal IEEE Transactions on Image Processing
