Enhancing MRI Brain Tumor Segmentation with an Additional Classification Network
Hieu T. Nguyen*, Tung T. Le*, Thang V. Nguyen, and Nhan T. Nguyen
Medical Imaging Department, Vingroup Big Data Institute (VinBigdata)
Hanoi University of Science and Technology (HUST)
University of Engineering and Technology (UET), VNU
Correspondence to Hieu Nguyen; e-mail: [email protected]. * These authors share first authorship on this work. This work was done while Hieu Nguyen and Tung Le were AI interns at the Medical Imaging Department, Vingroup Big Data Institute (VinBigdata).
Abstract.
Brain tumor segmentation plays an essential role in medical image analysis. In recent studies, deep convolutional neural networks (DCNNs) have proven extremely powerful for tumor segmentation tasks. In this paper, we propose a novel training method that enhances the segmentation results by adding an additional classification branch to the network. The whole network was trained end-to-end on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 training dataset. On the BraTS test set, it achieved an average Dice score of 80.57%, 85.67% and 82.00%, as well as a Hausdorff distance (95%) of 14.22, 7.36 and 23.27, respectively for the enhancing tumor, the whole tumor and the tumor core.
Keywords:
Deep learning · Brain tumor segmentation · FPN · U-Net
Introduction

Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histological sub-regions [3,1,2]. One objective of the Brain Tumor Segmentation (BraTS) challenge is to identify state-of-the-art machine learning methods for segmentation of brain tumors in magnetic resonance imaging (MRI) scans [19,4]. One MRI data sample consists of a native T1-weighted scan (T1), a post-contrast T1-weighted scan (T1Gd), a native T2-weighted scan (T2), and a T2 Fluid Attenuated Inversion Recovery (T2-FLAIR) scan. However, each tumor region of interest (TRoI) is best seen in a particular pulse sequence: the whole tumor is visible in T2-FLAIR, the tumor core is visible in T2, and the enhancing tumor is visible in T1Gd.

An accurate deep learning segmentation model not only saves time for neuroradiologists but also provides a reliable result for further tumor analysis. Recently, deep learning approaches have consistently surpassed traditional computer vision methods [6,11,22,24,27]. Specifically, convolutional neural networks (CNNs) are able to learn deep representative features to generate accurate segmentation masks in both 2D and 3D medical images.

The BraTS 2020 dataset comprises 369 cases for training and 125 cases for validation, manually annotated by both clinicians and board-certified radiologists. Each tumor is segmented into the enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor core. To evaluate segmentation performance, several metrics are used: Dice score, Hausdorff distance (95%), sensitivity and specificity.

Since the introduction of U-Net [23] in 2015, various types of U-shaped DCNN have been proposed and have achieved significant results in medical image segmentation tasks. In BraTS 2017, Kamnitsas et al. [10] won the segmentation challenge by exploring Ensembles of Multiple Models and Architectures (EMMA), combining several DCNNs including DeepMedic [11], 3D FCN [17] and 3D U-Net [5] for robust performance. In BraTS 2018, Myronenko [21] won the segmentation track using an asymmetrically large encoder to extract deep image features, with the decoder part reconstructing dense segmentation masks; the author also added a variational autoencoder (VAE) branch to regularize the network. In BraTS 2019, Jiang et al. [9], who achieved the highest score on the private test set, deployed a two-stage cascaded U-Net, which essentially stacks two U-Net networks. In the first stage, they trained a variant of U-Net to produce a coarse prediction; in the next stage, they increased the network capacity by using two decoders simultaneously. The model was trained in an end-to-end manner and achieved the best result.
Contribution.
Through exploratory model analysis after training, we noticed that deep learning segmentation models sometimes make false positive predictions. To improve segmentation efficiency and avoid these problems, we propose a novel end-to-end training method that combines segmentation and classification. The classification branch helps to predict whether a mask slice contains a region of interest, as well as to regularize the segmentation branch. We explored this approach with two architectures, a variant of nested U-Net [28] and a Bi-directional Feature Pyramid Network (BiFPN) [25]. Our method achieved Dice scores of 80.57%, 85.67% and 82.00% for the enhancing tumor, the whole tumor and the tumor core, respectively, on the BraTS 2020 test set.

Method

In this section, we describe the proposed approach, in which two different models, BiFPN and nested U-Net, are leveraged as the base segmentation architecture, enhanced by a classifier head. While the segmentation head relies largely on local features to segment the tumor area, the classification branch leverages global features of the whole slice, as well as of neighboring slices, to aid the segmentation task. The main advantage of the classification head is that it significantly reduces false positive regions, since minute, high-intensity regions of enhancing tumor are often confused with other non-tumor high-intensity regions. In addition, to tackle the small-batch-size problem that arises with batch normalization, we instead deploy Group Normalization [26] with 8 groups.

In this approach, an encoder-decoder network [28,12] is extended with an additional classification branch to further enhance the segmentation results. The classification head is placed at the end of the encoder to classify whether an image slice contains a tumor region. In the following subsections, we describe the details of the encoder and decoder parts (see Fig. 1).
Fig. 1. Overview of the BiFPN architecture with classifier.
Encoder
For the encoder part, we exploited residual blocks [7] for feature extraction, with the number of channels doubled after each convolutional layer of stride 2, resulting in multi-scale feature maps for the later parts. There are four scales of feature maps, where the smallest is 16 times smaller than the input image (see Table 1 for the details of the feature extractor). In order to combine features of multiple sizes, we adapted the BiFPN layer from the EfficientDet architecture [25] (see Fig. 2), an improved version of the Feature Pyramid Network [14]. We used three consecutive BiFPN layers with a feature dimension of 256, as deeper networks did not improve performance.
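As a concrete illustration, the following is a minimal sketch of one such residual downsampling stage in PyTorch, the framework used later in the paper. The layer order follows Table 1; the channel counts, dropout rate and class name are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Residual downsampling stage: two 3x3x3 convs with GroupNorm and dropout,
    plus a strided 3x3x3 skip conv (spatial size halves, channels double)."""
    def __init__(self, in_ch: int, out_ch: int, groups: int = 8, p_drop: float = 0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.Dropout3d(p_drop),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.Dropout3d(p_drop),
            nn.ReLU(inplace=True),
        )
        # Strided conv on the identity path, matching Table 1's "+ conv3 stride2".
        self.skip = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.skip(x)

# Stacking four such stages yields feature maps at 1/2, 1/4, 1/8 and 1/16
# of the input resolution, as described above.
```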
Table 1. Details of the feature extractor, where conv3 is a 3 × 3 × 3 convolution, GN denotes group normalization, and stride2 halves each spatial dimension. The remaining encoder blocks repeat the same structure with doubled channel counts.

Block            Details                                              Repeat  Output channels
Encoder block 1  (conv3 stride2, GN, dropout, ReLU,                   1       16
                 conv3 stride1, GN, dropout, ReLU) + conv3 stride2

Fig. 2. Illustration of the BiFPN layer.
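For reference, the core operation of the BiFPN layer in Fig. 2 is the fast normalized feature fusion of EfficientDet [25]. A minimal sketch of that fusion rule in PyTorch (this is the published formulation, not the authors' exact code; all inputs are assumed to have been resized to a common shape):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fast normalized fusion from EfficientDet:
    O = sum_i(w_i * I_i) / (eps + sum_j w_j), with learnable scalar weights."""
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs: list[torch.Tensor]) -> torch.Tensor:
        w = F.relu(self.weights)          # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)      # normalize so the weights sum to ~1
        return sum(wi * x for wi, x in zip(w, inputs))
```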
Decoder
In the decoder part, we followed the design of the semantic segmentation branch of the Panoptic Feature Pyramid Network [12]. Each feature map from the BiFPN layers is put through a number of up-sample blocks, depending on its spatial size. Each up-sample block consists of a 3 × 3 × 3 convolution followed by trilinear interpolation, with the feature dimension fixed to 256. Due to GPU memory constraints, all feature maps are up-sampled to a common size, which is half of the input image size, and concatenated before being fed into the final up-sample block, which has a 1 × 1 × 1 convolution followed by a trilinear interpolation layer.
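A minimal sketch of one such up-sample block under these assumptions (PyTorch; the GroupNorm/ReLU placement follows the Panoptic FPN design [12], and the class name is ours):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleBlock(nn.Module):
    """One decoder stage: 3x3x3 conv (GroupNorm, ReLU), then 2x trilinear upsampling."""
    def __init__(self, in_ch: int, out_ch: int = 256, groups: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        # 2x trilinear upsampling toward the common (half-input) resolution
        return F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
```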
Fig. 3. Overview of the nested U-Net (UNet++) architecture with an additional classifier.
Skip pathways
As proposed by Zhou et al. [28], nested U-Net (UNet++) uses dense convolution blocks whose number of convolution layers depends on the pyramid level. The skip pathways are accordingly re-designed to bring the semantic level of the encoder feature maps closer to that of the feature maps awaiting in the decoder. Fig. 3 clarifies how the feature maps travel through the top skip pathway of UNet++.
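For concreteness, one nested skip node can be sketched as follows (PyTorch; `conv_block` stands for the dense convolution block H of [28], the function name is ours, and trilinear upsampling is an assumption for the 3D setting):

```python
import torch
import torch.nn.functional as F

def nested_node(conv_block, same_level: list[torch.Tensor],
                below: torch.Tensor) -> torch.Tensor:
    """Compute x[i][j] = H(concat(x[i][0..j-1], upsample(x[i+1][j-1]))),
    i.e. fuse all earlier nodes at this level with the upsampled node below."""
    up = F.interpolate(below, scale_factor=2, mode="trilinear", align_corners=False)
    return conv_block(torch.cat(same_level + [up], dim=1))
```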
Deep supervision
In order to take advantage of lower-level feature maps, we used deep supervision [13], wherein the final segmentation map is the average of all segmentation branches. Instead of using another layer before upsampling to the output map size, our final prediction mask is upsampled directly from the last layer. Each branch contains a ReLU activation followed by two convolution layers.
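A minimal sketch of this fusion step, assuming PyTorch; the helper name and the choice of trilinear upsampling are illustrative:

```python
import torch
import torch.nn.functional as F

def fuse_deep_supervision(branch_logits: list[torch.Tensor], out_size) -> torch.Tensor:
    """Average segmentation logits from all decoder depths,
    upsampling each branch to the common output size first."""
    maps = [F.interpolate(m, size=out_size, mode="trilinear", align_corners=False)
            for m in branch_logits]
    return torch.stack(maps, dim=0).mean(dim=0)
```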
Classification branch

In all segmentation architectures, the concatenated feature maps before the final blocks are used as the input for the classification head. While the feature maps for the classifier in UNet++ have the same spatial size as the input image, their counterpart in BiFPN is only half the input size, leading to an additional up-sample layer in the classification branch of the BiFPN. The classification branch starts with a 3 × 3 × 3 convolution block; the feature maps are then reduced to a per-slice representation, which a 1-D convolution scores slice by slice (see Tables 2 and 3 for the layer details).

Table 2. Details of the classification branch in BiFPN, where conv1d1 is a 1-D convolution with kernel size 1, conv3d3 is a 3-D convolution with kernel size 3 × 3 × 3, tconv3d3 is a 3-D transpose convolution with kernel size 3 × 3 × 3, and GN denotes group-norm. The output shape of each layer corresponds to input features with 1024 channels.

Names       Details            Repeat  Output channels
Conv block  conv3d3, GN, ReLU  1       512

Table 3. Details of the classification branch in UNet++. The output shape of each layer corresponds to input features with 128 channels.

Names       Details            Repeat  Output channels
Conv block  conv3d3, BN, ReLU  1       256
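A minimal sketch of such a head in PyTorch: a 3-D conv block compresses the concatenated features (channel counts follow the surviving entries of Table 2), global pooling over the in-plane axes leaves one vector per slice, and a kernel-1 1-D convolution scores every slice. The pooling choice, the three-class output and the class name are illustrative assumptions; the BiFPN-specific transpose-conv upsampling is omitted for brevity:

```python
import torch
import torch.nn as nn

class SliceClassifier(nn.Module):
    """Classify each axial slice as tumor / no-tumor per label region."""
    def __init__(self, in_ch: int = 1024, mid_ch: int = 512, n_classes: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, kernel_size=3, padding=1),
            nn.GroupNorm(8, mid_ch),
            nn.ReLU(inplace=True),
        )
        self.slice_head = nn.Conv1d(mid_ch, n_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W, D) with D the slice axis
        x = self.conv(x)
        x = x.mean(dim=(2, 3))        # global average pool over H, W -> (B, C, D)
        return self.slice_head(x)     # per-slice logits: (B, n_classes, D)
```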
Dice loss

Dice loss originates from the Sørensen–Dice coefficient, a statistic developed in the 1940s to gauge the similarity between two samples. It was brought to volumetric segmentation in the V-Net paper [20]. The Dice similarity coefficient (DSC) measures the degree of overlap between the prediction map and the ground truth; it is a quantity ranging between 0 and 1 which we aim to maximize. The Dice loss is calculated as

\[ L_{\text{dice}} = 1 - \frac{2\sum_{i}^{N} p_i g_i}{\sum_{i}^{N} p_i + \sum_{i}^{N} g_i + \epsilon}, \tag{1} \]

where p_i are the predicted voxels and g_i the ground truth. The sums run over all N voxels, and a small constant ε is added to avoid division by zero.
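A direct implementation of Eq. (1), assuming PyTorch and sigmoid outputs; the ε value is illustrative, since the paper's exact constant was lost in extraction:

```python
import torch

def dice_loss(logits: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-5) -> torch.Tensor:
    """Soft Dice loss over all voxels; target holds {0, 1} floats."""
    pred = torch.sigmoid(logits)
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection) / (pred.sum() + target.sum() + eps)
```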
Focal loss

To deal with the large class imbalance in the segmentation problem, we also used focal loss [15] to penalize wrongly segmented regions more heavily:

\[ L_{\text{focal}}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t). \tag{2} \]

We directly optimized the label regions (whole tumor, tumor core, and enhancing tumor) with these losses.
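A binary focal loss matching Eq. (2), assuming PyTorch; α = 0.25 and γ = 2 are the defaults from [15], not values confirmed by the paper:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, target: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss; target holds {0, 1} floats of the same shape as logits."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-bce)                                  # p if y=1, else 1-p
    alpha_t = alpha * target + (1 - alpha) * (1 - target)  # per-voxel alpha weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```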
Classification loss

For the classification branch, we used focal loss and the standard binary cross-entropy loss. Finally, we summed all the losses to obtain the final loss:

\[ L_{\text{total}} = L_{\text{focal}}^{\text{seg}} + L_{\text{dice}} + L_{\text{focal}}^{\text{cls}} + L_{\text{BCE}}. \tag{3} \]

Data preprocessing and augmentation

For preprocessing, we cropped out zero-intensity regions in order to reduce the image size as well as to discard out-of-interest regions during training (see Fig. 4). To prevent the network from overfitting, we applied several types of data augmentation. First, we applied a random flip with probability 0.5 along every spatial axis. Then, we applied a random scale and shift of the input intensity. Finally, we randomly cropped a fixed-size patch with a depth of 96 voxels due to memory limitations.
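A minimal sketch of this augmentation pipeline in PyTorch. The in-plane patch size and the intensity scale/shift ranges are illustrative assumptions, as the exact values were lost in extraction; only the 96-voxel depth is from the paper:

```python
import torch

def augment(image: torch.Tensor, mask: torch.Tensor, patch=(128, 128, 96)):
    """Random flips on every spatial axis, a mild intensity scale/shift,
    and a random fixed-size crop. image: (C, H, W, D); mask: (K, H, W, D)."""
    for axis in (1, 2, 3):
        if torch.rand(1).item() < 0.5:
            image = torch.flip(image, dims=(axis,))
            mask = torch.flip(mask, dims=(axis,))
    scale = 1.0 + 0.1 * (2 * torch.rand(1).item() - 1)  # assumed range [0.9, 1.1]
    shift = 0.1 * (2 * torch.rand(1).item() - 1)        # assumed range [-0.1, 0.1]
    image = image * scale + shift
    # Random crop (assumes each spatial dim is at least the patch size).
    starts = [torch.randint(0, s - p + 1, (1,)).item()
              for s, p in zip(image.shape[1:], patch)]
    sl = tuple(slice(st, st + p) for st, p in zip(starts, patch))
    return image[(slice(None),) + sl], mask[(slice(None),) + sl]
```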
Fig. 4. Cropping out the zero-intensity region.
Training

Thanks to PyTorch 1.6, we took advantage of automatic mixed precision to save GPU memory. The Adam optimizer was exploited to update the model parameters. Moreover, instead of using a fixed learning rate during training, we deployed two learning rate schedulers (see below): a cosine learning rate scheduler [18] and a polynomial learning rate scheduler [16]. Both settings achieved the same level of performance after 200 epochs with the same base learning rate.

Cosine learning rate scheduler (Eq. 4). Ignoring the warm-up stage and assuming the total number of batches is T and the initial learning rate is η, the learning rate η_t at batch t is computed as

\[ \eta_t = \frac{1}{2}\left(1 + \cos\frac{t\pi}{T}\right)\eta. \tag{4} \]

Polynomial learning rate scheduler (Eq. 5). Ignoring the warm-up stage, with initial learning rate η, epoch counter e and total number of epochs N_e, the learning rate η_t at batch t is computed as

\[ \eta_t = \eta \times \left(1 - \frac{e}{N_e}\right)^{0.9}. \tag{5} \]

The cosine learning rate scheduler was used to train the BiFPN model and the polynomial learning rate scheduler was used for the nested U-Net model. Each network was trained from scratch, with neither pretrained weights nor external data, on two NVIDIA Tesla V100 GPUs with 32 GB RAM. BiFPN takes a batch of 4 samples and UNet++ a batch of 2 samples, each sample being a 4-channel crop with a depth of 96 voxels.
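Both schedules are straightforward to state in code. A sketch of Eqs. (4) and (5); the poly exponent 0.9 is the conventional choice for this scheduler and is assumed here, since the exact value was lost in extraction:

```python
import math

def cosine_lr(t: int, T: int, base_lr: float) -> float:
    """Eq. 4: cosine decay of the learning rate over T total batches."""
    return 0.5 * (1.0 + math.cos(t * math.pi / T)) * base_lr

def poly_lr(epoch: int, n_epochs: int, base_lr: float, power: float = 0.9) -> float:
    """Eq. 5: polynomial decay over epochs (assumed power 0.9, as in [16])."""
    return base_lr * (1.0 - epoch / n_epochs) ** power
```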
Inference

To achieve a robust prediction, we applied test-time augmentation (TTA) for every model before averaging the outputs. The augmentations were flips along each axis. Finally, we averaged the outputs before putting them through a sigmoid function. The decision threshold for both classification and segmentation of the three classes was set at 0.5. The negative slice-level predictions of the classification head were used to exclude predicted segmentation regions. While removing small tumor regions is a simpler way to address false positive predictions, that method is highly sensitive to the volume threshold and can mistakenly exclude an actual tumor that is minute. The classification head provides a more robust solution that significantly reduces the false positive rate while being less likely to miss tiny tumors.
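A minimal sketch of this inference procedure, assuming PyTorch; the function names are ours, and the (B, K, D) layout of the per-slice classifier output is an assumption:

```python
import torch

@torch.no_grad()
def predict_tta(model, volume: torch.Tensor) -> torch.Tensor:
    """Average logits over the identity and single-axis flips, then apply sigmoid.
    volume: (B, C, H, W, D)."""
    flip_axes = ((2,), (3,), (4,))
    logits = model(volume)
    for axes in flip_axes:
        logits = logits + torch.flip(model(torch.flip(volume, dims=axes)), dims=axes)
    return torch.sigmoid(logits / (1 + len(flip_axes)))

def apply_slice_filter(seg_prob: torch.Tensor, cls_prob: torch.Tensor,
                       thr: float = 0.5) -> torch.Tensor:
    """Zero out predicted masks on slices the classification head marks negative.
    seg_prob: (B, K, H, W, D); cls_prob: (B, K, D) per-slice probabilities."""
    keep = (cls_prob > thr).to(seg_prob.dtype)[:, :, None, None, :]
    return seg_prob * keep
```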
Results

In this section we show the performance of the two architectures with and without the classification head (see Table 4) in terms of Dice score and Hausdorff distance (95%). The results indicate that the classification head improves both architectures by a significant margin. An ensemble of 5 models of the same architecture trained with 5-fold cross-validation gives a marginal improvement, while an ensemble of the two architectures gives a more considerable enhancement. The results of the ensemble of 10 models on the testing set are given in Table 5.
Table 4. Mean Dice score and Hausdorff distance of the proposed method on the BraTS 2020 validation set.

                                     Dice score                 Hausdorff distance (95%)
Validation                           ET      WT      TC         ET       WT      TC
best single UNet++ w/o cls          0.7029  0.8967  0.8239     42.2474  7.4907  9.0179
best single UNet++ w. cls           0.7742  0.8940  0.8241     35.4246  8.4361  10.4074
ensemble of 5-fold UNet++ w/o cls   0.7017  0.8953  0.8239     47.1436  5.8179  11.0075
ensemble of 5-fold UNet++ w. cls    0.7841  0.8960  0.8233     35.4841  5.0862  10.0780
best single BiFPN w/o cls           0.7480  0.8896  0.8400     31.1209  5.8924  6.9682
best single BiFPN w. cls            0.7729  0.8881  0.8373     21.5720  6.9531  6.5573
ensemble of 5-fold BiFPN w/o cls    0.7471  0.8915  0.8371     28.9473  5.9362  6.8706
ensemble of 5-fold BiFPN w. cls     0.7774  0.8914  0.8380     24.6944  5.9834  6.8527
ensemble of 10 models
Table 5. Mean Dice score and Hausdorff distance of the proposed method on the BraTS 2020 testing set.

                        Dice score                    Hausdorff distance (95%)
Testing                 ET       WT       TC          ET        WT       TC
ensemble of 10 models   0.80569  0.85671  0.81997     14.21938  7.35549  23.27358
Conclusion

In this work, we described a novel training method for brain tumor segmentation from multimodal 3D MRIs. Our results on BraTS 2020 indicate that the model achieves a highly competitive segmentation result. On the BraTS 2020 test set, the proposed method obtained an average Dice score of 80.57%, 85.67% and 82.00%, as well as a Hausdorff distance (95%) of 14.22, 7.36 and 23.27, respectively for the enhancing tumor, the whole tumor and the tumor core.

Acknowledgments
This work was supported by the Medical Imaging Department at Vingroup Big Data Institute (VinBigdata).
References
1. Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection. The Cancer Imaging Archive (2017)
2. Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection. The Cancer Imaging Archive (2017)
3. Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data 4, 170117 (2017)
4. Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629 (2018)
5. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 424–432. Springer (2016)
6. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H.: Brain tumor segmentation with deep neural networks. Medical Image Analysis 35, 18–31 (2017)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE CVPR. pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
9. Jiang, Z., Ding, C., Liu, M., Tao, D.: Two-stage cascaded U-Net: 1st place solution to BraTS challenge 2019 segmentation task. In: International MICCAI Brainlesion Workshop. pp. 231–241. Springer (2019)
10. Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S., Sinclair, M., Pawlowski, N., Rajchl, M., Lee, M., Kainz, B., Rueckert, D., et al.: Ensembles of multiple models and architectures for robust brain tumour segmentation. In: International MICCAI Brainlesion Workshop. pp. 450–462. Springer (2017)
11. Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis 36, 61–78 (2017)
12. Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6399–6408 (2019)
13. Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial Intelligence and Statistics. pp. 562–570 (2015)
14. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2117–2125 (2017)
15. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2980–2988 (2017)
16. Liu, W., Rabinovich, A., Berg, A.C.: ParseNet: Looking wider to see better. arXiv preprint arXiv:1506.04579 (2015)
17. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440 (2015)
18. Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
19. Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2015)
20. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). pp. 565–571. IEEE (2016)
21. Myronenko, A.: 3D MRI brain tumor segmentation using autoencoder regularization. In: International MICCAI Brainlesion Workshop. pp. 311–320. Springer (2018)
22. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Transactions on Medical Imaging 35(5), 1240–1251 (2016)
23. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. arXiv preprint arXiv:1505.04597 (2015)
24. Shen, H., Wang, R., Zhang, J., McKenna, S.J.: Boundary-aware fully convolutional network for brain tumor segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 433–441. Springer (2017)
25. Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. arXiv preprint arXiv:1911.09070 (2019)
26. Wu, Y., He, K.: Group normalization. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 3–19 (2018)
27. Zhao, X., Wu, Y., Song, G., Li, Z., Zhang, Y., Fan, Y.: A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Medical Image Analysis 43, 98–111 (2018)
28. Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: A nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. pp. 3–11. Springer (2018)