Automatic Polyp Segmentation using Fully Convolutional Neural Network
Nikhil Kumar Tomar Indira Gandhi National Open University, [email protected]
ABSTRACT
Colorectal cancer is one of the fatal cancers worldwide. Colonoscopy is the standard procedure for the examination, localization, and removal of colorectal polyps. However, it has been shown that the miss rate of colorectal polyps during colonoscopy is between 6% and 27% [1]. The use of automated, accurate, and real-time polyp segmentation during colonoscopy examinations can help clinicians eliminate missed lesions and prevent further progression of colorectal cancer. The “Medico automatic polyp segmentation challenge” provides an opportunity to study polyp segmentation and build a fast segmentation model. The challenge organizers provide the Kvasir-SEG dataset to train the model. The model is then tested on a separate unseen dataset to validate the efficiency and speed of the segmentation model. The experiments demonstrate that the model trained on the Kvasir-SEG dataset [5] and tested on an unseen dataset achieves a Dice coefficient of 0.7801, mIoU of 0.6847, recall of 0.8077, and precision of 0.8126, demonstrating the generalization ability of our model. The model achieves 80.60 FPS on the unseen dataset at an image resolution of 512 × 512.

INTRODUCTION
Colorectal cancer is a dangerous type of cancer that contributes to a significant number of deaths worldwide. Polyps are an early indicator of this type of cancer, and clinicians often detect them through colonoscopy. These polyps come in various shapes and sizes and are sometimes missed by clinicians because some polyps are hard to differentiate from the surrounding tissue. Sometimes these polyps are covered with stool, mucosa, and other surrounding structures, which poses challenges for clinicians. This is why it is essential to build a Computer-Aided Diagnosis (CADx) system for detecting polyps.

Automatic polyp segmentation can play a crucial role in identifying and localizing the affected regions in images or video frames. Semantic segmentation analyzes each pixel and classifies it into a well-defined polyp or non-polyp class.
The growing number of publicly available datasets, dominant methods such as convolutional neural networks, and improved hardware enable researchers to tackle the challenging task of automated, real-time diagnosis of colorectal cancer.

The “Medico Automatic Polyp Segmentation Challenge” [4] consists of two tasks. The first is the “Polyp segmentation task” and the second is the “Algorithm efficiency task”. A single model has been submitted for both tasks. The model is efficient in terms of both the evaluation metric scores and the FPS.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MediaEval’20, December 14-15 2020, Online
APPROACH
The proposed architecture is a fully convolutional network following an encoder-decoder approach. It combines the strength of residual learning [2] with the attention mechanism of the squeeze-and-excitation network [3]. The encoding network consists of 4 encoder blocks with 32, 64, 128, and 256 filters. The decoding network likewise consists of 4 decoder blocks with 128, 64, 32, and 16 filters. Both the encoder and decoder blocks use a residual block as their core component.

The residual block consists of two 3 × 3 convolutions with batch normalization, a 1 × 1 convolution with batch normalization, a ReLU activation, and a squeeze-and-excitation block (Figure 1a). In the first step of squeeze and excitation (squeeze), the incoming feature maps are reduced to a global feature vector of size 𝑛, where 𝑛 represents the number of feature channels in the incoming feature maps. In the second step (excitation), this global feature vector goes through a two-layer feed-forward neural network. Here the number of features is first reduced and then expanded back to the original size 𝑛. Finally, a sigmoid activation function scales each value of the feature vector to between 0 and 1. This scaled feature vector is then multiplied with the incoming feature maps.

The proposed network takes a polyp image of size 512 × 512 as the input to the first encoder block. Each encoder block starts with two residual blocks, each built around two 3 × 3 convolutions, followed by 2 × 2 max pooling (Figure 1b). [...] × 64) the original input size.

Figure 1: The proposed architecture and its components. a) Residual block (3x3 Convolution, Batch Norm. & ReLU, 3x3 Convolution, Batch Norm., 1x1 Convolution, Batch Norm., ReLU, Squeeze & Excitation). b) Encoder block (Residual block, Residual block, 2x2 MaxPool2D, with an output to the skip connection). c) Decoder block (4x4 Transpose Convolution, Concatenate with the skip connection, Residual block, Residual block). d) Proposed architecture (filters: 32, 64, 128, 256, 128, 64, 32, 16).
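The squeeze, excitation, and scaling steps described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function name `squeeze_excite`, the weight matrices, and the toy values are hypothetical, biases are omitted, and the use of global average pooling for the squeeze step and of a ReLU between the two feed-forward layers follows the squeeze-and-excitation network [3] rather than anything stated explicitly in this paper.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Rescale each channel of `feature_maps` by a weight in (0, 1).

    feature_maps: list of n channels, each a list of rows (H x W floats).
    w_reduce:     r x n weight matrix (n -> r features, r < n).
    w_expand:     n x r weight matrix (r -> n features).
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: one global value per channel (average pooling, assumed as in [3]).
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, layer 1: reduce n -> r features, then ReLU (assumed).
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, layer 2: expand r -> n features, then sigmoid into (0, 1).
    scales = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value of channel c by scales[c].
    scaled = [
        [[value * scale for value in row] for row in channel]
        for channel, scale in zip(feature_maps, scales)
    ]
    return scaled, scales


# Toy input with n = 2 channels of size 2 x 2 (hypothetical values).
maps = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1, mean 2.0
]
w_reduce = [[0.5, 0.5]]        # 2 -> 1 features
w_expand = [[1.0], [-1.0]]     # 1 -> 2 features
scaled, scales = squeeze_excite(maps, w_reduce, w_expand)
```

In this toy case the first channel receives a weight above 0.5 and the second a weight below 0.5, so the block emphasizes one channel and suppresses the other while leaving the spatial layout of the feature maps unchanged.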
The output of the last encoder block acts as the input of the first decoder block. In each decoder block, the incoming feature map is first upscaled using a 4 × 4 transpose convolution, then concatenated with the skip connection and passed through two residual blocks (Figure 1c). [...] × 512 using a 4 × [...]

RESULTS AND ANALYSIS
Table 1 shows the overall results on the validation dataset of Kvasir-SEG and on the unseen test dataset provided by the challenge organizers. For the evaluation of the results, the mean Intersection-over-Union (mIoU), Sørensen–Dice coefficient (DSC), recall, precision (Prec.), accuracy (Acc.), and F2 metrics were used for both task 1 and task 2. Additionally, FPS was calculated for task 2. The evaluation scores for task 1 and task 2 are the same, as a single model was used for both tasks. The proposed model trained on the Kvasir-SEG dataset [5] and tested on the unseen dataset achieves a DSC of 0.7801, an mIoU of 0.6847, a recall of 0.8077, and a precision of 0.8126, and runs at 80.60 FPS at an image resolution of 512 × 512.

Table 1: Quantitative results on Kvasir-SEG and unseen (Challenge) dataset for task 1 and 2.

Dataset      mIoU    DSC     Recall  Prec.   Acc.    F2      FPS
Kvasir-SEG   0.7565  0.8411  0.8643  0.8680  0.9532  0.8461  -
Unseen       0.6847  0.7801  0.8077  0.8126  0.9404  0.7854  80.60

CONCLUSION
The Medico Automatic Polyp Segmentation challenge [4] provides a platform to explore the potential and challenges of automated polyp segmentation on the Kvasir-SEG dataset, which contains 1000 images and their respective annotation masks. We trained the proposed model and obtained competitive results for both task 1 and task 2. We believe this approach can be an effective method for the rapid and automated segmentation of polyps. In the future, we will investigate how to further reduce the model complexity while improving performance.

MediaEval’20: Multimedia Evaluation Workshop
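For readers who want to reproduce the kind of scores listed in Table 1, the following is a minimal sketch of the per-image overlap metrics computed from two binary masks. The function name `segmentation_metrics` and the toy masks are hypothetical, and the sketch does not reproduce the challenge's evaluation script: how scores were thresholded and averaged over images is not stated in this paper, so the single-image definitions below are an assumption.

```python
def segmentation_metrics(pred, truth):
    """Compute overlap metrics for two binary masks of equal shape.

    pred, truth: lists of rows containing 0 (non-polyp) or 1 (polyp).
    Returns a dict with IoU, DSC, recall, precision, accuracy, and F2.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # polyp pixel correctly predicted
            elif p:
                fp += 1      # background predicted as polyp
            elif t:
                fn += 1      # polyp pixel missed
            else:
                tn += 1      # background correctly predicted

    def ratio(numerator, denominator):
        # Return 0.0 for an empty denominator instead of raising.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "recall": recall,
        "precision": precision,
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        # F2 weights recall more heavily than precision (beta = 2).
        "f2": ratio(5 * precision * recall, 4 * precision + recall),
    }


# Toy 2 x 3 masks (hypothetical values): 2 TP, 1 FP, 1 FN, 2 TN.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
metrics = segmentation_metrics(pred, truth)
```

A dataset-level score such as mIoU would then be an average of these per-image values. Because of that averaging, the F2 value in Table 1 need not equal the F2 computed directly from the table's mean precision and recall.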
REFERENCES
[1] Sang Bong Ahn, Dong Soo Han, Joong Ho Bae, Tae Jun Byun, Jong Pyo Kim, and Chang Soo Eun. 2012. The miss rate for colorectal adenoma determined by quality-adjusted, back-to-back colonoscopies. Gut and Liver 6, 1 (2012), 64.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
[3] Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7132–7141.
[4] Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard D. Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, and Pål Halvorsen. 2020. Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation. In Proc. of MediaEval 2020 CEUR Workshop.
[5] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. 2020. Kvasir-SEG: A Segmented Polyp Dataset. In