IEEE transactions on pattern analysis and machine intelligence | 2021

Human-centric Relation Segmentation: Dataset and Solution


Abstract


Vision and language techniques have achieved remarkable progress, but problems involving fine-grained details remain difficult to handle. For example, when a robot is told to "bring me the book in the girl's left hand", existing methods would fail if the girl holds a book in each hand. In this work, we introduce a new task named human-centric relation segmentation (HRS), a fine-grained case of human-object interaction detection (HOI-det). It aims to predict the relations between a human and the surrounding entities and to identify the interacting human parts, which are represented as pixel-level masks. Correspondingly, we collect a new Person In Context (PIC) dataset and propose a Simultaneously Matching and Segmentation (SMS) framework to solve the task. The framework contains three parallel branches. Specifically, the entity segmentation branch obtains entity masks through dynamically generated conditional convolutions; the subject-object matching branch links corresponding subjects and objects by displacement estimation and classifies the interacting human parts; and the human parsing branch generates pixel-wise human part labels. The outputs of the three branches are fused to produce the final HRS results. Extensive experiments on two datasets show that SMS outperforms the baselines while running at an inference speed of 36 FPS.
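The abstract does not say how displacement estimation links subjects to objects. As a minimal illustrative sketch, the code below assumes each subject has a 2D center and a predicted offset, and that it is matched to the object whose center lies closest to the shifted point. The function name, the use of centers, and the nearest-neighbor rule are all assumptions made here for illustration, not details confirmed by the paper.

```python
import math


def match_subjects_to_objects(subject_centers, displacements, object_centers):
    """Hypothetical matching rule (an assumption, not the paper's method):
    shift each subject center by its predicted displacement and pick the
    index of the nearest object center."""
    if not object_centers:
        # No candidate objects: every subject stays unmatched.
        return [None] * len(subject_centers)

    matches = []
    for (sx, sy), (dx, dy) in zip(subject_centers, displacements):
        # Point where the subject "expects" its object to be.
        tx, ty = sx + dx, sy + dy
        # Index of the object center closest to that point.
        nearest = min(
            range(len(object_centers)),
            key=lambda j: math.hypot(object_centers[j][0] - tx,
                                     object_centers[j][1] - ty),
        )
        matches.append(nearest)
    return matches


# Toy case echoing the two-book example: one person, two relations whose
# predicted offsets point toward the book on either side.
books = [(30.0, 60.0), (70.0, 60.0)]
person = [(50.0, 50.0), (50.0, 50.0)]
offsets = [(-18.0, 9.0), (19.0, 11.0)]
print(match_subjects_to_objects(person, offsets, books))
```

In this toy setting the two offsets separate the two books even though both relations start from the same person, which is the kind of ambiguity the task is meant to resolve.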

Volume PP
DOI 10.1109/TPAMI.2021.3075846
Language English
Journal IEEE transactions on pattern analysis and machine intelligence

Full Text