2020 25th International Conference on Pattern Recognition (ICPR) | 2021

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions


Abstract


Recent work has shown promising results in real-time semantic segmentation. To keep inference fast, most existing networks use light decoders or omit the decoder entirely; however, their accuracy is significantly lower than that of non-real-time semantic segmentation networks. In this paper, we introduce two key modules for building a high-performance decoder for real-time semantic segmentation, with the aim of reducing the accuracy gap between real-time and non-real-time segmentation networks. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on top of the last stage of the encoder, obtaining richer contextual features. Our second module, the Multi-resolution Dilated Asymmetric (MDA) module, fuses and refines detail and contextual information from multi-scale feature maps coming from early and deeper stages of the network. Both modules exploit contextual information without excessively increasing the computational complexity by using asymmetric convolutions. Our proposed network, named “FASSD-Net”, reaches 78.8% mIoU on the Cityscapes validation set at 41.1 FPS on full-resolution images (1024 × 2048). In addition, a light version of our network reaches 74.1% mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080Ti card with no additional acceleration techniques. The source code and pre-trained models are available at github.com/GibranBenitez/FASSD-Net.
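
To illustrate why asymmetric convolutions keep the computational cost low, the sketch below compares the weight count of a standard k × k convolution with that of a factorized k × 1 plus 1 × k pair, and computes the span covered by a dilated kernel. It is a minimal arithmetic sketch, not the authors' implementation: the channel count, kernel size, and dilation rate are hypothetical values chosen for illustration, and the abstract does not specify the actual layer configuration of DAPF or MDA.

```python
def conv_weights(c_in, c_out, kh, kw):
    """Number of weights in a dense 2D convolution (bias ignored)."""
    return c_in * c_out * kh * kw


def effective_extent(k, dilation):
    """Span in pixels covered by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


def compare(channels, k, dilation):
    """Compare a k x k convolution with a (k x 1) + (1 x k) factorization.

    `channels` is used for both input and output width; this is an
    assumption made for illustration only.
    """
    standard = conv_weights(channels, channels, k, k)
    asymmetric = (conv_weights(channels, channels, k, 1)
                  + conv_weights(channels, channels, 1, k))
    return {
        "standard_weights": standard,
        "asymmetric_weights": asymmetric,
        "weight_ratio": asymmetric / standard,
        "extent": effective_extent(k, dilation),
    }


# Hypothetical setting: 128 channels, 3-tap kernels, dilation rate 12.
result = compare(channels=128, k=3, dilation=12)
print(result)
```

For a 3-tap kernel the factorized pair needs two thirds of the weights of the full 3 × 3 kernel, and the ratio 2/k shrinks further for larger kernels. Dilation widens the covered span without adding any weights, which is consistent with the abstract's claim of a larger receptive field at modest cost.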

Pages 2264-2271
DOI 10.1109/ICPR48806.2021.9413176
Language English
Journal 2020 25th International Conference on Pattern Recognition (ICPR)

Full Text