Enhancing MRI Brain Tumor Segmentation with an Additional Classification Network
Hieu T. Nguyen*, Tung T. Le*, Thang V. Nguyen, and Nhan T. Nguyen
Medical Imaging Department, Vingroup Big Data Institute (VinBigdata)
Hanoi University of Science and Technology (HUST)
University of Engineering and Technology (UET), VNU
Correspondence to Hieu Nguyen; e-mail: [email protected]. * These authors share first authorship on this work. This work was done while Hieu Nguyen and Tung Le were AI interns at the Medical Imaging Department, Vingroup Big Data Institute (VinBigdata).
Abstract.
Brain tumor segmentation plays an essential role in medical image analysis. In recent studies, deep convolutional neural networks (DCNNs) have proven extremely powerful for tumor segmentation tasks. In this paper, we propose a novel training method that enhances the segmentation results by adding an additional classification branch to the network. The whole network was trained end-to-end on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 training dataset. On the BraTS test set, it achieved an average Dice score of 80.57%, 85.67% and 82.00%, as well as a Hausdorff distance (95%) of 14.22, 7.36 and 23.27, respectively for the enhancing tumor, the whole tumor and the tumor core.
Keywords:
Deep learning · Brain tumor segmentation · FPN · U-Net
Introduction

Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histological sub-regions [3,1,2]. One objective of the Brain Tumor Segmentation (BraTS) challenge is to identify state-of-the-art machine learning methods for segmentation of brain tumors in magnetic resonance imaging (MRI) scans [19,4]. One MRI data sample consists of a native T1-weighted scan (T1), a post-contrast T1-weighted scan (T1Gd), a native T2-weighted scan (T2), and a T2 Fluid Attenuated Inversion Recovery (T2-FLAIR) scan. However, each tumor region of interest (TRoI) is best seen in a particular pulse sequence: the whole tumor is visible in T2-FLAIR, the tumor core is visible in T2, and the enhancing tumor is visible in T1Gd.

An accurate deep learning segmentation model not only saves time for neuroradiologists but also provides a reliable result for further tumor analysis. Recently, deep learning approaches have consistently surpassed traditional computer vision methods [6,11,22,24,27]. Specifically, convolutional neural networks (CNNs) are able to learn deep representative features to generate accurate segmentation masks in both 2D and 3D medical images.

The BraTS 2020 dataset comprises 369 cases for training and 125 cases for validation, manually annotated by both clinicians and board-certified radiologists. Each tumor is segmented into the enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor core. To evaluate segmentation performance, several metrics are used: Dice score, Hausdorff distance (95%), sensitivity and specificity.

Since the introduction of U-Net [23] in 2015, various types of U-shaped DCNN have been proposed and have achieved significant results in medical image segmentation tasks. In BraTS 2017, Kamnitsas et al. [10] won the segmentation challenge by exploring Ensembles of Multiple Models and Architectures (EMMA), combining several DCNNs including DeepMedic [11], 3D FCN [17] and 3D U-Net [5] for robust performance. In BraTS 2018, Myronenko [21] won the segmentation track using an asymmetrically large encoder to extract deep image features, with the decoder part reconstructing dense segmentation masks; the author also added a variational autoencoder (VAE) branch to regularize the network. In BraTS 2019, Jiang et al. [9], who achieved the highest score on the private test set, deployed a two-stage cascaded U-Net, which essentially stacks two U-Net networks. In the first stage, they trained a variant of U-Net to produce a coarse prediction; in the next stage, they increased the network capacity by using two decoders simultaneously. The model was trained in an end-to-end manner and achieved the best result.
Contribution.
Through exploratory model analysis after training, we noticed that deep learning segmentation models sometimes make false positive predictions. To improve segmentation efficiency and avoid these problems, we propose a novel end-to-end training method that combines segmentation and classification. The classification branch helps to predict whether a mask slice contains a region of interest, as well as to regularize the segmentation branch. We explored this approach with two architectures, a variant of nested U-Net [28] and a Bi-directional Feature Pyramid Network (BiFPN) [25]. Our method achieved Dice scores of 80.57%, 85.67% and 82.00% for the enhancing tumor, the whole tumor and the tumor core, respectively, on the BraTS 2020 test set.

Method

In this section, we describe the proposed approach, in which two different models, BiFPN and nested U-Net, are leveraged as the base segmentation architecture, enhanced by a classifier head. While the segmentation head relies largely on local features to segment the tumor area, the classification branch leverages global features of the whole slice, as well as of neighboring slices, to aid the segmentation task. The main advantage of the classification head is that it significantly reduces false positive regions, since minute, high-intensity regions of enhancing tumor are often confused with other non-tumor high-intensity regions. In addition, to tackle the small-batch-size problem that arises with batch normalization, we instead deploy Group Normalization [26] with 8 groups.

In this approach, an encoder-decoder network [28,12] is extended with an additional classification branch to further enhance the segmentation results. The classification head is placed at the end of the encoder to classify whether an image slice contains a tumor region. In the following subsections, we describe the details of the encoder and decoder parts (see Fig. 1).
Fig. 1. Overview of the BiFPN architecture with classifier.
Encoder
For the encoder part, we exploited residual blocks [7] for feature extraction, with the number of channels doubled after each convolutional layer of stride 2, resulting in multi-scale feature maps for the later parts. There are four scales of feature maps, where the smallest is 16 times smaller than the input image (see Table 1 for the details of the feature extractor). In order to combine features of multiple sizes, we adapted the BiFPN layer from the EfficientDet architecture [25] (see Fig. 2), an improved version of the Feature Pyramid Network [14]. We used three consecutive BiFPN layers with a feature dimension of 256, as deeper networks did not improve performance.
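As a concrete illustration, the following is a minimal sketch of one such residual downsampling stage in PyTorch, the framework used later in the paper. The layer order follows Table 1; the channel counts, dropout rate and class name are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Residual downsampling stage: two 3x3x3 convs with GroupNorm and dropout,
    plus a strided 3x3x3 skip conv (spatial size halves, channels double)."""
    def __init__(self, in_ch: int, out_ch: int, groups: int = 8, p_drop: float = 0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.Dropout3d(p_drop),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.Dropout3d(p_drop),
            nn.ReLU(inplace=True),
        )
        # Strided conv on the identity path, matching Table 1's "+ conv3 stride2".
        self.skip = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.skip(x)

# Stacking four such stages yields feature maps at 1/2, 1/4, 1/8 and 1/16
# of the input resolution, as described above.
```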
Table 1. Details of the feature extractor, where conv3 is a 3 × 3 × 3 convolution, GN denotes group normalization, and stride2 halves each spatial dimension. The remaining encoder blocks repeat the same structure with doubled channel counts.

Block            Details                                              Repeat  Output channels
Encoder block 1  (conv3 stride2, GN, dropout, ReLU,                   1       16
                 conv3 stride1, GN, dropout, ReLU) + conv3 stride2

Fig. 2. Illustration of the BiFPN layer.
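For reference, the core operation of the BiFPN layer in Fig. 2 is the fast normalized feature fusion of EfficientDet [25]. A minimal sketch of that fusion rule in PyTorch (this is the published formulation, not the authors' exact code; all inputs are assumed to have been resized to a common shape):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fast normalized fusion from EfficientDet:
    O = sum_i(w_i * I_i) / (eps + sum_j w_j), with learnable scalar weights."""
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs: list[torch.Tensor]) -> torch.Tensor:
        w = F.relu(self.weights)          # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)      # normalize so the weights sum to ~1
        return sum(wi * x for wi, x in zip(w, inputs))
```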
Decoder
In the decoder part, we followed the design of the semantic segmentation branch of the Panoptic Feature Pyramid Network [12]. Each feature map from the BiFPN layers is put through a number of up-sample blocks, depending on its spatial size. Each up-sample block consists of a 3 × 3 × 3 convolution followed by trilinear interpolation, with the feature dimension fixed to 256. Due to GPU memory constraints, all feature maps are up-sampled to a common size, which is half of the input image size, and concatenated before being fed into the final up-sample block, which has a 1 × 1 × 1 convolution followed by a trilinear interpolation layer.
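A minimal sketch of one such up-sample block under these assumptions (PyTorch; the GroupNorm/ReLU placement follows the Panoptic FPN design [12], and the class name is ours):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleBlock(nn.Module):
    """One decoder stage: 3x3x3 conv (GroupNorm, ReLU), then 2x trilinear upsampling."""
    def __init__(self, in_ch: int, out_ch: int = 256, groups: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        # 2x trilinear upsampling toward the common (half-input) resolution
        return F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
```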
Fig. 3. Overview of the nested U-Net (UNet++) architecture with an additional classifier.
Skip pathways
As proposed by Zhou et al. [28], nested U-Net (UNet++) uses dense convolution blocks whose number of convolution layers depends on the pyramid level. The skip pathways are accordingly re-designed to bring the semantic level of the encoder feature maps closer to that of the feature maps awaiting in the decoder. Fig. 3 clarifies how the feature maps travel through the top skip pathway of UNet++.
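For concreteness, one nested skip node can be sketched as follows (PyTorch; `conv_block` stands for the dense convolution block H of [28], the function name is ours, and trilinear upsampling is an assumption for the 3D setting):

```python
import torch
import torch.nn.functional as F

def nested_node(conv_block, same_level: list[torch.Tensor],
                below: torch.Tensor) -> torch.Tensor:
    """Compute x[i][j] = H(concat(x[i][0..j-1], upsample(x[i+1][j-1]))),
    i.e. fuse all earlier nodes at this level with the upsampled node below."""
    up = F.interpolate(below, scale_factor=2, mode="trilinear", align_corners=False)
    return conv_block(torch.cat(same_level + [up], dim=1))
```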
Deep supervision
In order to take advantage of lower-level feature maps, we used deep supervision [13], wherein the final segmentation map is the average of all segmentation branches. Instead of using another layer before upsampling to the output map size, our final prediction mask is upsampled directly from the last layer. Each branch contains a ReLU activation followed by two convolution layers.
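A minimal sketch of this fusion step, assuming PyTorch; the helper name and the choice of trilinear upsampling are illustrative:

```python
import torch
import torch.nn.functional as F

def fuse_deep_supervision(branch_logits: list[torch.Tensor], out_size) -> torch.Tensor:
    """Average segmentation logits from all decoder depths,
    upsampling each branch to the common output size first."""
    maps = [F.interpolate(m, size=out_size, mode="trilinear", align_corners=False)
            for m in branch_logits]
    return torch.stack(maps, dim=0).mean(dim=0)
```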
Classification branch

In all segmentation architectures, the concatenated feature maps before the final blocks are used as the input for the classification head. While the feature maps for the classifier in UNet++ have the same spatial size as the input image, their counterpart in BiFPN is only half the input size, leading to an additional up-sample layer in the classification branch of the BiFPN. The classification branch starts with a 3 × 3 × 3 convolution block; the feature maps are then reduced to a per-slice representation, which a 1-D convolution scores slice by slice (see Tables 2 and 3 for the layer details).

Table 2. Details of the classification branch in BiFPN, where conv1d1 is a 1-D convolution with kernel size 1, conv3d3 is a 3-D convolution with kernel size 3 × 3 × 3, tconv3d3 is a 3-D transpose convolution with kernel size 3 × 3 × 3, and GN denotes group-norm. The output shape of each layer corresponds to input features with 1024 channels.

Names       Details            Repeat  Output channels
Conv block  conv3d3, GN, ReLU  1       512

Table 3. Details of the classification branch in UNet++. The output shape of each layer corresponds to input features with 128 channels.

Names       Details            Repeat  Output channels
Conv block  conv3d3, BN, ReLU  1       256
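A minimal sketch of such a head in PyTorch: a 3-D conv block compresses the concatenated features (channel counts follow the surviving entries of Table 2), global pooling over the in-plane axes leaves one vector per slice, and a kernel-1 1-D convolution scores every slice. The pooling choice, the three-class output and the class name are illustrative assumptions; the BiFPN-specific transpose-conv upsampling is omitted for brevity:

```python
import torch
import torch.nn as nn

class SliceClassifier(nn.Module):
    """Classify each axial slice as tumor / no-tumor per label region."""
    def __init__(self, in_ch: int = 1024, mid_ch: int = 512, n_classes: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, kernel_size=3, padding=1),
            nn.GroupNorm(8, mid_ch),
            nn.ReLU(inplace=True),
        )
        self.slice_head = nn.Conv1d(mid_ch, n_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W, D) with D the slice axis
        x = self.conv(x)
        x = x.mean(dim=(2, 3))        # global average pool over H, W -> (B, C, D)
        return self.slice_head(x)     # per-slice logits: (B, n_classes, D)
```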
Dice loss

Dice loss originates from the Sørensen–Dice coefficient, a statistic developed in the 1940s to gauge the similarity between two samples. It was brought to volumetric segmentation in the V-Net paper [20]. The Dice similarity coefficient (DSC) measures the degree of overlap between the prediction map and the ground truth; it is a quantity ranging between 0 and 1 which we aim to maximize. The Dice loss is calculated as

\[ L_{\text{dice}} = 1 - \frac{2\sum_{i}^{N} p_i g_i}{\sum_{i}^{N} p_i + \sum_{i}^{N} g_i + \epsilon}, \tag{1} \]

where p_i are the predicted voxels and g_i the ground truth. The sums run over all N voxels, and a small constant ε is added to avoid division by zero.
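A direct implementation of Eq. (1), assuming PyTorch and sigmoid outputs; the ε value is illustrative, since the paper's exact constant was lost in extraction:

```python
import torch

def dice_loss(logits: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-5) -> torch.Tensor:
    """Soft Dice loss over all voxels; target holds {0, 1} floats."""
    pred = torch.sigmoid(logits)
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection) / (pred.sum() + target.sum() + eps)
```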
Focal loss

To deal with the large class imbalance in the segmentation problem, we also used focal loss [15] to penalize wrongly segmented regions more heavily:

\[ L_{\text{focal}}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t). \tag{2} \]

We directly optimized the label regions (whole tumor, tumor core, and enhancing tumor) with these losses.
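A binary focal loss matching Eq. (2), assuming PyTorch; α = 0.25 and γ = 2 are the defaults from [15], not values confirmed by the paper:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, target: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss; target holds {0, 1} floats of the same shape as logits."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-bce)                                  # p if y=1, else 1-p
    alpha_t = alpha * target + (1 - alpha) * (1 - target)  # per-voxel alpha weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```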
Classification loss

For the classification branch, we used focal loss and the standard binary cross-entropy loss. Finally, we summed all the losses to obtain the final loss:

\[ L_{\text{total}} = L_{\text{focal}}^{\text{seg}} + L_{\text{dice}} + L_{\text{focal}}^{\text{cls}} + L_{\text{BCE}}. \tag{3} \]

Data preprocessing and augmentation

For preprocessing, we cropped out zero-intensity regions in order to reduce the image size as well as to discard out-of-interest regions during training (see Fig. 4). To prevent the network from overfitting, we applied several types of data augmentation. First, we applied a random flip with probability 0.5 along every spatial axis. Then, we applied a random scale and shift of the input intensity. Finally, we randomly cropped a fixed-size patch with a depth of 96 voxels due to memory limitations.
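A minimal sketch of this augmentation pipeline in PyTorch. The in-plane patch size and the intensity scale/shift ranges are illustrative assumptions, as the exact values were lost in extraction; only the 96-voxel depth is from the paper:

```python
import torch

def augment(image: torch.Tensor, mask: torch.Tensor, patch=(128, 128, 96)):
    """Random flips on every spatial axis, a mild intensity scale/shift,
    and a random fixed-size crop. image: (C, H, W, D); mask: (K, H, W, D)."""
    for axis in (1, 2, 3):
        if torch.rand(1).item() < 0.5:
            image = torch.flip(image, dims=(axis,))
            mask = torch.flip(mask, dims=(axis,))
    scale = 1.0 + 0.1 * (2 * torch.rand(1).item() - 1)  # assumed range [0.9, 1.1]
    shift = 0.1 * (2 * torch.rand(1).item() - 1)        # assumed range [-0.1, 0.1]
    image = image * scale + shift
    # Random crop (assumes each spatial dim is at least the patch size).
    starts = [torch.randint(0, s - p + 1, (1,)).item()
              for s, p in zip(image.shape[1:], patch)]
    sl = tuple(slice(st, st + p) for st, p in zip(starts, patch))
    return image[(slice(None),) + sl], mask[(slice(None),) + sl]
```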
Fig. 4. Cropping out the zero-intensity region.
Training

Thanks to PyTorch 1.6, we took advantage of automatic mixed precision to save GPU memory. The Adam optimizer was exploited to update the model parameters. Moreover, instead of using a fixed learning rate during training, we deployed two learning rate schedulers (see below): a cosine learning rate scheduler [18] and a polynomial learning rate scheduler [16]. Both settings achieved the same level of performance after 200 epochs with the same base learning rate.

Cosine learning rate scheduler (Eq. 4). Ignoring the warm-up stage and assuming the total number of batches is T and the initial learning rate is η, the learning rate η_t at batch t is computed as

\[ \eta_t = \frac{1}{2}\left(1 + \cos\frac{t\pi}{T}\right)\eta. \tag{4} \]

Polynomial learning rate scheduler (Eq. 5). Ignoring the warm-up stage, with initial learning rate η, epoch counter e and total number of epochs N_e, the learning rate η_t at batch t is computed as

\[ \eta_t = \eta \times \left(1 - \frac{e}{N_e}\right)^{0.9}. \tag{5} \]

The cosine learning rate scheduler was used to train the BiFPN model and the polynomial learning rate scheduler was used for the nested U-Net model. Each network was trained from scratch, with neither pretrained weights nor external data, on two NVIDIA Tesla V100 GPUs with 32 GB RAM. BiFPN takes a batch of 4 samples and UNet++ a batch of 2 samples, each sample being a 4-channel crop with a depth of 96 voxels.
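Both schedules are straightforward to state in code. A sketch of Eqs. (4) and (5); the poly exponent 0.9 is the conventional choice for this scheduler and is assumed here, since the exact value was lost in extraction:

```python
import math

def cosine_lr(t: int, T: int, base_lr: float) -> float:
    """Eq. 4: cosine decay of the learning rate over T total batches."""
    return 0.5 * (1.0 + math.cos(t * math.pi / T)) * base_lr

def poly_lr(epoch: int, n_epochs: int, base_lr: float, power: float = 0.9) -> float:
    """Eq. 5: polynomial decay over epochs (assumed power 0.9, as in [16])."""
    return base_lr * (1.0 - epoch / n_epochs) ** power
```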
Inference

To achieve a robust prediction, we applied test-time augmentation (TTA) for every model before averaging the outputs. The augmentations were flips along each axis. Finally, we averaged the outputs before putting them through a sigmoid function. The decision threshold for both classification and segmentation of the three classes was set at 0.5. The negative slice-level predictions of the classification head were used to exclude predicted segmentation regions. While removing small tumor regions is a simpler way to address false positive predictions, that method is highly sensitive to the volume threshold and can mistakenly exclude an actual tumor that is minute. The classification head provides a more robust solution that significantly reduces the false positive rate while being less likely to miss tiny tumors.
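A minimal sketch of this inference procedure, assuming PyTorch; the function names are ours, and the (B, K, D) layout of the per-slice classifier output is an assumption:

```python
import torch

@torch.no_grad()
def predict_tta(model, volume: torch.Tensor) -> torch.Tensor:
    """Average logits over the identity and single-axis flips, then apply sigmoid.
    volume: (B, C, H, W, D)."""
    flip_axes = ((2,), (3,), (4,))
    logits = model(volume)
    for axes in flip_axes:
        logits = logits + torch.flip(model(torch.flip(volume, dims=axes)), dims=axes)
    return torch.sigmoid(logits / (1 + len(flip_axes)))

def apply_slice_filter(seg_prob: torch.Tensor, cls_prob: torch.Tensor,
                       thr: float = 0.5) -> torch.Tensor:
    """Zero out predicted masks on slices the classification head marks negative.
    seg_prob: (B, K, H, W, D); cls_prob: (B, K, D) per-slice probabilities."""
    keep = (cls_prob > thr).to(seg_prob.dtype)[:, :, None, None, :]
    return seg_prob * keep
```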
Results

In this section we show the performance of the two architectures with and without the classification head (see Table 4) in terms of Dice score and Hausdorff distance (95%). The results indicate that the classification head improves both architectures by a significant margin. An ensemble of 5 models of the same architecture trained with 5-fold cross-validation gives a marginal improvement, while an ensemble of the two architectures gives a more considerable enhancement. The results of the ensemble of 10 models on the testing set are given in Table 5.
Table 4. Mean Dice score and Hausdorff distance of the proposed method on the BraTS 2020 validation set.

                                     Dice score                 Hausdorff distance (95%)
Validation                           ET      WT      TC         ET       WT      TC
best single UNet++ w/o cls          0.7029  0.8967  0.8239     42.2474  7.4907  9.0179
best single UNet++ w. cls           0.7742  0.8940  0.8241     35.4246  8.4361  10.4074
ensemble of 5-fold UNet++ w/o cls   0.7017  0.8953  0.8239     47.1436  5.8179  11.0075
ensemble of 5-fold UNet++ w. cls    0.7841  0.8960  0.8233     35.4841  5.0862  10.0780
best single BiFPN w/o cls           0.7480  0.8896  0.8400     31.1209  5.8924  6.9682
best single BiFPN w. cls            0.7729  0.8881  0.8373     21.5720  6.9531  6.5573
ensemble of 5-fold BiFPN w/o cls    0.7471  0.8915  0.8371     28.9473  5.9362  6.8706
ensemble of 5-fold BiFPN w. cls     0.7774  0.8914  0.8380     24.6944  5.9834  6.8527
ensemble of 10 models
Table 5. Mean Dice score and Hausdorff distance of the proposed method on the BraTS 2020 testing set.

                        Dice score                    Hausdorff distance (95%)
Testing                 ET       WT       TC          ET        WT       TC
ensemble of 10 models   0.80569  0.85671  0.81997     14.21938  7.35549  23.27358
Conclusion

In this work, we described a novel training method for brain tumor segmentation from multimodal 3D MRIs. Our results on BraTS 2020 indicate that the model achieves a highly competitive segmentation result. On the BraTS 2020 test set, the proposed method obtained an average Dice score of 80.57%, 85.67% and 82.00%, as well as a Hausdorff distance (95%) of 14.22, 7.36 and 23.27, respectively for the enhancing tumor, the whole tumor and the tumor core.

Acknowledgments
This work was supported by the Medical Imaging Department at Vingroup Big Data Institute (VinBigdata).
References
1. Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection. The Cancer Imaging Archive (2017)
2. Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection. The Cancer Imaging Archive (2017)
3. Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data 4, 170117 (2017)
4. Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629 (2018)
5. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 424–432. Springer (2016)
6. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H.: Brain tumor segmentation with deep neural networks. Medical Image Analysis 35, 18–31 (2017)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE CVPR. pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
9. Jiang, Z., Ding, C., Liu, M., Tao, D.: Two-stage cascaded U-Net: 1st place solution to BraTS challenge 2019 segmentation task. In: International MICCAI Brainlesion Workshop. pp. 231–241. Springer (2019)
10. Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S., Sinclair, M., Pawlowski, N., Rajchl, M., Lee, M., Kainz, B., Rueckert, D., et al.: Ensembles of multiple models and architectures for robust brain tumour segmentation. In: International MICCAI Brainlesion Workshop. pp. 450–462. Springer (2017)
11. Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis 36, 61–78 (2017)
12. Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6399–6408 (2019)
13. Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial Intelligence and Statistics. pp. 562–570 (2015)
14. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2117–2125 (2017)
15. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2980–2988 (2017)
16. Liu, W., Rabinovich, A., Berg, A.C.: ParseNet: Looking wider to see better. arXiv preprint arXiv:1506.04579 (2015)
17. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440 (2015)
18. Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
19. Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2015)
20. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). pp. 565–571. IEEE (2016)
21. Myronenko, A.: 3D MRI brain tumor segmentation using autoencoder regularization. In: International MICCAI Brainlesion Workshop. pp. 311–320. Springer (2018)
22. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Transactions on Medical Imaging 35(5), 1240–1251 (2016)
23. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. arXiv preprint arXiv:1505.04597 (2015)
24. Shen, H., Wang, R., Zhang, J., McKenna, S.J.: Boundary-aware fully convolutional network for brain tumor segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 433–441. Springer (2017)
25. Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. arXiv preprint arXiv:1911.09070 (2019)
26. Wu, Y., He, K.: Group normalization. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 3–19 (2018)
27. Zhao, X., Wu, Y., Song, G., Li, Z., Zhang, Y., Fan, Y.: A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Medical Image Analysis 43, 98–111 (2018)
28. Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: A nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. pp. 3–11. Springer (2018)