Comput. Vis. Image Underst. | 2021

Multi-modal semantic image segmentation


Abstract


We propose a modality-invariant method to obtain high-quality semantic object segmentation of human body parts across four imaging modalities: visible images, X-ray images, thermal images (heatmaps), and infrared (IR) images. We first consider two modalities (i.e., visible and X-ray images) to develop an architecture suitable for multi-modal semantic segmentation. Due to the intrinsic differences between images from the two modalities, state-of-the-art approaches such as Mask R-CNN do not perform satisfactorily. Insights from analysing how the intermediate layers of Mask R-CNN behave on both visible and X-ray modalities led us to propose a new and efficient network architecture that yields highly accurate semantic segmentation across both domains. We design multi-task losses to train the network across the different modalities. Through multiple experiments on visible and X-ray images of the human upper extremity, we validate the proposed approach, which outperforms the traditional Mask R-CNN by better exploiting the output features of the CNN. Building on the insights gained from the visible and X-ray domains, we extend the proposed multi-modal semantic segmentation method to two additional modalities (viz. heatmap and IR images). Experiments on these two modalities further confirm our architecture's capacity to improve segmentation by exploiting the complementary information in the different image modalities. Our method can also be extended to other modalities and can be effectively utilized for several tasks, including medical image analysis tasks such as cross-modality image registration and 3D reconstruction.
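As a concrete illustration of the multi-task training mentioned in the abstract, the following is a minimal sketch of a per-modality segmentation loss, assuming a weighted sum of cross-entropy terms over the four modalities. The abstract does not specify the actual loss terms; the class name, modality keys, and equal default weights here are illustrative assumptions, not the paper's formulation.

# Hypothetical sketch of a multi-task, multi-modality segmentation loss.
# Assumption: the multi-task losses are realized as a weighted sum of
# per-modality cross-entropy segmentation losses (not stated in the abstract).
import torch
import torch.nn as nn

class MultiModalSegLoss(nn.Module):
    def __init__(self, modalities=("visible", "xray", "heatmap", "ir"), weights=None):
        super().__init__()
        self.modalities = modalities
        # Equal weighting by default; the true per-modality weights are an assumption.
        self.weights = weights or {m: 1.0 for m in modalities}
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits_by_modality, targets_by_modality):
        # logits: dict modality -> (N, C, H, W); targets: dict modality -> (N, H, W) long
        total = 0.0
        for m in self.modalities:
            if m in logits_by_modality:
                total = total + self.weights[m] * self.ce(
                    logits_by_modality[m], targets_by_modality[m]
                )
        return total

if __name__ == "__main__":
    # Dummy usage: 2 body-part classes, batch of 4, 64x64 masks, two modalities.
    loss_fn = MultiModalSegLoss()
    logits = {m: torch.randn(4, 2, 64, 64) for m in ("visible", "xray")}
    targets = {m: torch.randint(0, 2, (4, 64, 64)) for m in ("visible", "xray")}
    print(loss_fn(logits, targets).item())

A shared loss of this shape lets one network be trained jointly on batches drawn from different modalities, which is one common way to encourage the modality-invariant behaviour the abstract describes.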

Volume 202
Pages 103085
DOI 10.1016/j.cviu.2020.103085
Language English
Journal Comput. Vis. Image Underst.
