IEEE Transactions on Neural Networks and Learning Systems | 2021

DPSNet: Multitask Learning Using Geometry Reasoning for Scene Depth and Semantics


Abstract


Multitask joint learning continues to gain attention as a paradigm shift and has shown promising performance in many applications. Depth estimation and semantic understanding from monocular images remain challenging problems in computer vision. While other joint learning frameworks establish the relationship between semantics and depth from stereo pairs, they do not learn camera motion and therefore fail to model the geometric structure of the image scene. In this article, we take a further step by proposing a multitask learning method, DPSNet, which jointly performs depth estimation, camera pose estimation, and semantic scene segmentation. Our core idea for depth and camera pose prediction is a rigid semantic consistency loss that overcomes the limitation of image-reconstruction-based supervision on moving pixels, and we further infer the segmentation of moving instances from those pixels. In addition, the proposed model performs semantic segmentation by reasoning about the geometric correspondences between pixel-level semantic outputs and semantic labels at multiscale resolutions. Experiments on open-source datasets and a video dataset captured on a micro smart car demonstrate the effectiveness of each component of DPSNet, and DPSNet achieves state-of-the-art results on all three tasks compared with popular recent methods. All our models and code are available at https://github.com/jn-z/DPSNet.
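The abstract only names the mechanism, but the rigid semantic consistency idea can be illustrated concretely: warp the source-view semantic predictions into the target view using the predicted depth, the predicted relative camera pose, and the camera intrinsics, then penalize disagreement with the target-view semantics on pixels that obey the rigid-scene assumption. Below is a minimal PyTorch sketch of that idea; every function name, tensor shape, and the 0.5 residual threshold for flagging moving pixels are our own illustrative assumptions, not the authors' released implementation.

    # Minimal PyTorch sketch of a rigid semantic consistency loss
    # (illustrative only; names, shapes, and the threshold are assumptions).
    import torch
    import torch.nn.functional as F

    def backproject(depth, K_inv):
        # Lift every target pixel to a 3-D point using the predicted depth.
        # depth: (B, 1, H, W); K_inv: (B, 3, 3) inverse camera intrinsics.
        b, _, h, w = depth.shape
        ys, xs = torch.meshgrid(
            torch.arange(h, device=depth.device),
            torch.arange(w, device=depth.device), indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()  # (3, H, W)
        pix = pix.view(3, -1).unsqueeze(0).expand(b, -1, -1)      # (B, 3, HW)
        return depth.view(b, 1, -1) * (K_inv @ pix)               # (B, 3, HW)

    def rigid_semantic_consistency(sem_src, sem_tgt, depth_tgt, T_tgt2src, K, K_inv):
        # sem_src / sem_tgt: (B, C, H, W) semantic logits in source / target views.
        # T_tgt2src: (B, 4, 4) predicted relative pose; K: (B, 3, 3) intrinsics.
        b, _, h, w = sem_tgt.shape
        cam = backproject(depth_tgt, K_inv)                      # target-view 3-D points
        cam = T_tgt2src[:, :3, :3] @ cam + T_tgt2src[:, :3, 3:]  # rigid transform
        pix = K @ cam                                            # project into source view
        z = pix[:, 2].clamp(min=1e-6)
        grid = torch.stack([2 * (pix[:, 0] / z) / (w - 1) - 1,
                            2 * (pix[:, 1] / z) / (h - 1) - 1],
                           dim=-1).view(b, h, w, 2)
        warped = F.grid_sample(sem_src, grid, align_corners=True)  # source semantics in target frame
        residual = (F.softmax(warped, 1) - F.softmax(sem_tgt, 1)).abs().mean(1, keepdim=True)
        moving = (residual > 0.5).float()          # assumed threshold: likely moving pixels
        loss = (residual * (1.0 - moving)).mean()  # enforce consistency on static pixels only
        return loss, moving                        # mask can seed moving-instance segmentation

Pixels whose warped and observed semantics disagree strongly violate the rigid-scene assumption, which is the cue the abstract describes for inferring the segmentation of moving instances.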

Volume PP
DOI 10.1109/TNNLS.2021.3107362
Language English
Journal IEEE Transactions on Neural Networks and Learning Systems
