Efficient, high-performance pancreatic segmentation using multi-scale feature extraction
Moritz Knolle, Georgios Kaissis, Friederike Jungmann, Sebastian Ziegelmayer, Daniel Sasse, Marcus Makowski, Daniel Rueckert, Rickmer Braren
Department of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany
Institute for Artificial Intelligence and Data Science in Medicine and Healthcare, Technical University of Munich, Munich, Germany
OpenMined Research
Department of Computing, Imperial College London, London, United Kingdom

* shared first authorship
+ Corresponding author, e-mail: [email protected]

Abstract

Rationale
For artificial intelligence-based image analysis methods to reach clinical applicability, the development of high-performance algorithms is crucial. For example, existing segmentation algorithms developed on natural images are neither efficient in their parameter use nor optimized for medical imaging. Here we present MoNet, a highly optimized neural-network-based pancreatic segmentation algorithm focused on achieving high performance through efficient multi-scale image feature utilization.
Methods
We developed MoNet, a shallow, U-Net-like architecture based on repeated, dilated convolutions with decreasing dilation rates. The model was trained on publicly available pancreatic computed tomography (CT) scans in the portal-venous phase from the Medical Segmentation Decathlon (196 training and 85 validation scans) and tested for its out-of-sample generalization performance by evaluating the Dice coefficient on 85 manually segmented scans sourced from our institution's picture archiving and communication system (PACS). We compared the model's Dice coefficient and inference time against two standard architectures (U-Net and Attention U-Net).

Results
MoNet achieved a mean ± STD Dice coefficient of 0.… ± 0.… (U-Net: 0.… ± 0.…; Attention U-Net: 0.… ± 0.…) with 403,556 parameters (U-Net: 31,…; Attention U-Net: 31,…); the mean ± STD inference time was 14.… ± … s, compared to 45.… ± … s for U-Net and 53.… ± … s for Attention U-Net.

Conclusion
We present an optimized neural network architecture for pancreatic segmentation which provides performance competitive with the state of the art on out-of-sample data while utilizing fewer parameters and requiring a fraction of the inference time.

Introduction
Pancreatic ductal adenocarcinoma (PDAC) is projected to soon become the second leading cause of cancer-related death worldwide because of late diagnosis and diverse tumor biology [1]. Machine learning-based quantitative imaging workflows have demonstrated promising results in a variety of oncologic imaging tasks, such as the detection or sub-classification of lung [2] and breast [3] cancers in imaging data. Arguably, these successes have been driven in part by the relative ease of automatic detection and segmentation of these tumor entities, owing to their high contrast to the surrounding tissue and the high acquisition resolution of thoracic CT or mammography. To translate such applications to PDAC, e.g. to improve early diagnosis and the non-invasive classification of known molecular tumor subtypes with differential outcome in response to available chemotherapy regimens, high-performance pancreatic segmentation algorithms will be instrumental. However, the success of automated segmentation algorithms in pancreatic CT imaging has hitherto been limited by the organ's poor differentiability from adjacent structures of similar attenuation, variability in position and fat content, and alterations due to pathology such as tumor or inflammation.

Existing work in deep learning-assisted pancreatic segmentation has focused on extending previously available architectures such as the U-Net [4] into the three-dimensional context [5] or on improving segmentation results by incorporating attention mechanisms into the architecture [6]. These modifications, however, further increase the (already substantial) computational requirements and resulting costs of these architectures, rendering such U-Net derivatives impractical for cost-efficient use in rapid research workflows or in clinical practice. Here, we introduce MoNet, an optimized, shallow, U-Net-derived architecture achieving state-of-the-art or higher performance in pancreatic segmentation, based on efficient multi-scale feature extraction using repeated decreasingly dilated convolution (RDDC) layers with two global down-sampling operations and a total of 403,556 parameters, a >95% parameter reduction compared to the original U-Net architecture.
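The >95% figure can be verified with one line of arithmetic (a quick sketch; 31 million is a rounded count for the original 64-filter U-Net, as reported in the Results):

```python
# Parameter reduction of MoNet relative to the original U-Net.
monet_params = 403_556
unet_params = 31_000_000  # rounded, approximate 64-filter U-Net count

reduction = 1 - monet_params / unet_params
print(f"{reduction:.1%}")  # ≈ 98.7%
```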
Methods
Training, validation and independent testing datasets
All neural network architectures presented in this work were trained on the pancreas dataset from the Medical Segmentation Decathlon (MSD) [7]. A 70%/30% training-validation split was employed: 196 abdominal CT scans of the portal-venous contrast agent enhancement phase were used for training and 85 scans for validation. For processing, images were bilinearly down-sampled to 256 × 256 pixels.

Network architecture
The architecture of MoNet is depicted in Fig. 2. In brief, input tensors of shape B × 256 × 256 × 1, with B denoting the batch size, are progressively down-sampled across the encoder branch of the network using convolutions with a stride length of 2, resulting in an X × Y resolution of 64 × 64 in the bottleneck segment of the network. The resulting feature maps are then progressively up-sampled by transposed convolution (deconvolution) in the decoder branch, resulting in output masks of identical dimensions to the input. Each (de-)convolution block consists of a 3 × 3 convolutional layer followed by batch normalization and an exponential linear unit (ELU) activation. At every stage of the U-Net-like architecture, the convolution blocks are followed by a repeated decreasingly dilated convolution (RDDC) block (Fig. 3), consisting of four successive convolutional blocks as described above, but employing dilated convolutions [8] with decreasing dilation rates (4, 3, 2, 1, respectively), a feature extraction strategy that has been shown to perform well for small objects [9]. Each convolutional block within an RDDC block is followed by a spatial dropout layer [10]. Finally, residual-type longitudinal (short) connections are employed within each RDDC block, and transverse (long) skip connections are employed between the encoder and the decoder branch to assist signal and gradient flow, as originally described in [4, 11].
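As an illustration of the multi-scale design, the effective receptive field of an RDDC block follows from the dilation rates (a back-of-the-envelope sketch; `stacked_conv_rf` is our own helper name): each stride-1 convolution with kernel size k and dilation d enlarges the receptive field by (k − 1)·d pixels.

```python
def stacked_conv_rf(dilations, kernel=3):
    """Effective receptive field (pixels, one axis) of stacked stride-1
    2D convolutions; each layer with dilation d adds (kernel - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# RDDC block: four 3x3 convolutions with dilation rates 4, 3, 2, 1
print(stacked_conv_rf([4, 3, 2, 1]))  # 21
# Four plain 3x3 convolutions at the same stage, for comparison
print(stacked_conv_rf([1, 1, 1, 1]))  # 9
```

The dilated stack thus sees a 21 × 21 neighborhood where four plain 3 × 3 convolutions see only 9 × 9, at the same parameter cost.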
Figure 2: Schematic representation of the MoNet architecture. Legend: conv (3×3) + batch norm + ELU; down-conv; RDDC block; up-conv; skip connection.
Figure 3: Schematic diagram of an RDDC block (top) and its constituent convolutional blocks (bottom): four conv blocks (conv, BatchNorm, ELU) with dilation rates 4, 3, 2 and 1, each followed by spatial dropout.
Model training
All architectures were trained to convergence using the Nesterov-Adam optimizer [12] with an initial learning rate of 5 × 10^-… and learning rate decay by a factor of 10 upon validation loss stagnation for ≥ … epochs. Data augmentation consisted of random rotations (± …°), random zoom (± 0.25) and random pixel shifts of a maximum magnitude of 0.2 of the image height/width. The augmentations were validated and chosen based on the advice of a senior radiologist to represent plausible data expected to be encountered in real-world clinical use settings when imaging the pancreas.
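The learning-rate schedule described above (decay by a factor of 10 when the validation loss stagnates) can be sketched as a minimal stand-in for a plateau-based scheduler such as Keras' ReduceLROnPlateau; the starting rate and patience below are illustrative values, not the paper's exact settings.

```python
class PlateauDecay:
    """Decay the learning rate by `factor` when the validation loss has
    not improved for `patience` consecutive epochs (illustrative sketch)."""

    def __init__(self, lr, factor=0.1, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        if val_loss < self.best:
            # Improvement: remember it and reset the stagnation counter.
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr

# Usage: lr drops from 5e-4 to 5e-5 after two stagnant epochs.
sched = PlateauDecay(5e-4, factor=0.1, patience=2)
for loss in [1.0, 0.9, 0.95, 0.96]:
    lr = sched.step(loss)
```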
Performance Assessment
We compared MoNet's out-of-sample generalisation performance to the following two U-Net [4] variants:

• original U-Net, 64 base filters (U-Net)
• attention-gated U-Net, 2D, 64 base filters (Attention U-Net)

For all performance comparisons, repeated testing was performed under identical circumstances (no concurrent network traffic, all non-essential operating system processes suspended, identical CPU power settings). Mean inference times and Dice scores were compared using Student's t-test with multiple testing correction.
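The Dice coefficient used for all comparisons can be computed as follows (a standard NumPy sketch, not the authors' implementation; `eps` guards against empty masks):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
print(dice(a, a))  # 1.0
print(dice(a, b))  # ≈ 0.5
```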
Results
Inference-time comparison
A comparison of the time required to perform inference on 150 images of 256 × 256 pixels on CPU (2.4 GHz 8-core Intel Core i9) was performed with identical batch size and an otherwise consistent environment for U-Net, Attention U-Net (2D) and MoNet. MoNet significantly outperformed both U-Net and Attention U-Net with regard to inference time (Student's t-test with multiple testing correction, p < …).

Architecture        Mean ± STD inference time (s), N=5 repetitions
U-Net               45.… ± …
Attention U-Net     53.… ± …
MoNet (ours)        14.… ± …

Table 1: Inference time for a CT scan of 150 slices at 256 × 256 resolution; results averaged over 5 runs under an identical setup.
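The repeated-timing protocol (5 runs, mean ± STD of wall-clock time) can be reproduced with a small harness like the following (a sketch; `predict` stands in for any model's inference call and `time_inference` is our own helper name):

```python
import statistics
import time

def time_inference(predict, batch, repeats=5):
    """Run `predict` on `batch` `repeats` times and return the mean and
    sample STD of the wall-clock time, mirroring the protocol above."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

# Usage with a stand-in workload in place of a neural network:
mean_t, std_t = time_inference(lambda x: sorted(x), list(range(100_000)))
```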
Segmentation Performance Comparison
MoNet performed on par with the other U-Net variants on the validation dataset (all Student's t-tests after multiple testing correction non-significant) while outperforming the other U-Net variants on the independent validation dataset (all Student's t-tests after multiple testing correction p < …).

Architecture        Parameter count   Mean ± STD Dice, MSD   Mean ± STD Dice, IVD
MoNet (ours)        403,556           0.… ± 0.11             0.… ± 0.…
U-Net               31,…,145          0.… ± 0.15             0.… ± 0.…
Attention U-Net     31,…,349          0.… ± 0.15             0.… ± 0.…

Table 2: Comparison of MoNet with the other U-Net variants tested on the MSD and the independent validation set (IVD) (both N=85 scans).
Discussion
We here present an efficient, high-performance U-Net-like segmentation algorithm for pancreatic segmentation and show significant inference speed gains on CPU hardware while maintaining or exceeding the segmentation performance of similar algorithms. The poor prognosis and increasing incidence of PDAC [13, 1] mandate the development of enhanced diagnosis and treatment strategies. Our recent findings suggest that quantitative image analysis can identify molecular subtypes related to differential response to chemotherapeutic drugs [14] or predict patient survival [15]. Automated region-of-interest definition increases the reliability and validity of such findings and offers substantial time savings compared to manual expert-based segmentation. However, the widespread application of automatic segmentation algorithms will depend both on their real-world segmentation performance and on the ease of deployment in a wide range of hardware environments, e.g. on hardware lacking graphics processing units.

The work presented provides state-of-the-art segmentation performance with substantial efficiency gains through the utilization of higher-resolution feature maps in the bottleneck section of the network, making it suitable both for rapid prototyping and for large-scale deployment in, e.g., decentralized machine learning workflows [16]; network architectures with few parameters are therefore an excellent strategy to reduce network traffic.
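To put the deployment argument in numbers: at 32-bit precision, the parameter counts translate into roughly the following checkpoint sizes (a sketch; 31 million is a rounded figure for the 64-filter U-Net):

```python
def size_mb(n_params, bytes_per_param=4):
    """Approximate checkpoint size in megabytes at 32-bit precision."""
    return n_params * bytes_per_param / 1e6

print(size_mb(403_556))      # MoNet: ≈ 1.6 MB
print(size_mb(31_000_000))   # ~31M-parameter U-Net: 124.0 MB
```

In a federated setting where weights are exchanged every round, this is roughly a 75-fold reduction in per-round traffic.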
MoNet was trained to segment the entire pancreas including the tumor. This approach reflects the fact that the exact delineation of the tumor border is often infeasible, and is supported by literature findings noting the importance of the peritumoral tissue in PDAC [17, 18, 19] and in other tumor entities [20].

Recent work on semantic segmentation provides evidence in favor of architectures performing image feature extraction at multiple scales by utilizing dilated convolutions instead of relying merely on the scale-decreasing backbones employed in traditional fully convolutional architectures [21, 22, 9, 23]. Our work corroborates this notion: multi-scale feature extraction combined with larger receptive fields at the same hierarchical level appears to capture both more robust and higher-quality features than the fixed kernel size design encountered in U-Net-like architectures. Moreover, architectures with several down-sampling operations and/or many filters, such as the 64-filter U-Net (with 4 down-sampling stages), cannot leverage their large number of parameters sufficiently well to warrant their utilization, at least in medical imaging tasks, which are typically characterized by a lack of large datasets and by small segmentation targets (such as the pancreas).

Furthermore, segmentation algorithms are often trained on publicly available and/or single-institutional data such as the pancreas dataset from the MSD, which has recently been identified as a potential source of generalization challenges on data collected at different centers [24]. Our results show that MoNet, using repeated decreasingly dilated convolutions, extracts more robust features that generalize better to out-of-sample data than current methods, as shown by MoNet's performance on the independent validation set.

Our work is not without limitations. The generalizability of our findings is limited by the single-institution independent validation set and the relatively small sample size. All tested algorithms would have benefited from larger training sets and from performance evaluation on additional multi-center datasets. Furthermore, we only compared our algorithm against algorithms based on a single 2D U-Net-style network. Algorithms such as nnU-Net [25], based on U-Net ensembles, offer superior performance, however at the expense of high computational and post-processing requirements and thus much slower inference times (especially on CPU).
Conclusion

In conclusion, we propose an efficient pancreatic segmentation algorithm with state-of-the-art performance which can benefit both radiological research and the clinical translation of artificial intelligence workflows in medical imaging by providing consistent, high-quality segmentations for both radiomics and machine learning tasks.
Source code and data availability
Source code for MoNet, based on TensorFlow, is available at https://github.com/TUM-AIMED/MoNet. The training datasets are available from http://medicaldecathlon.com/. The independent test set contains confidential patient information and cannot be shared publicly.
References

[1] E. A. Collisson, P. Bailey, D. K. Chang, and A. V. Biankin, "Molecular subtypes of pancreatic cancer," Nature Reviews Gastroenterology & Hepatology, vol. 16, no. 4, pp. 207–220, 2019.
[2] D. Ardila, A. P. Kiraly, S. Bharadwaj, B. Choi, J. J. Reicher, L. Peng, D. Tse, M. Etemadi, W. Ye, G. Corrado, D. P. Naidich, and S. Shetty, "End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography," Nature Medicine, vol. 25, pp. 954–961, May 2019.
[3] S. M. McKinney, M. Sieniek, V. Godbole, J. Godwin, N. Antropova, H. Ashrafian, T. Back, M. Chesus, G. C. Corrado, A. Darzi, et al., "International evaluation of an AI system for breast cancer screening," Nature, vol. 577, no. 7788, pp. 89–94, 2020.
[4] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
[5] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," pp. 565–571, IEEE, 2016.
[6] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al., "Attention U-Net: Learning where to look for the pancreas," arXiv preprint arXiv:1804.03999, 2018.
[7] A. L. Simpson, M. Antonelli, S. Bakas, M. Bilello, K. Farahani, B. Van Ginneken, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze, et al., "A large annotated medical image dataset for the development and evaluation of segmentation algorithms," arXiv preprint arXiv:1902.09063, 2019.
[8] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, "A real-time algorithm for signal analysis with the help of the wavelet transform," in Wavelets, pp. 286–297, Springer, 1990.
[9] R. Hamaguchi, A. Fujita, K. Nemoto, T. Imaizumi, and S. Hikosaka, "Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery," pp. 1442–1450, IEEE, 2018.
[10] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, "Efficient object localization using convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656, 2015.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[12] T. Dozat, "Incorporating Nesterov momentum into Adam," 2016.
[13] American Cancer Society, "Cancer Facts & Figures 2020."
[14] G. A. Kaissis, S. Ziegelmayer, F. K. Lohöfer, F. N. Harder, F. Jungmann, D. Sasse, A. Muckenhuber, H.-Y. Yen, K. Steiger, J. Siveke, H. Friess, R. Schmid, W. Weichert, M. R. Makowski, and R. F. Braren, "Image-based molecular phenotyping of pancreatic ductal adenocarcinoma," Journal of Clinical Medicine, vol. 9, p. 724, Mar. 2020.
[15] G. A. Kaissis, F. Jungmann, S. Ziegelmayer, F. K. Lohöfer, F. N. Harder, A. M. Schlitter, A. Muckenhuber, K. Steiger, R. Schirren, H. Friess, R. Schmid, W. Weichert, M. R. Makowski, and R. F. Braren, "Multiparametric modelling of survival in pancreatic ductal adenocarcinoma using clinical, histomorphological, genetic and image-derived parameters," Journal of Clinical Medicine, vol. 9, p. 1250, Apr. 2020.
[16] W. Li, F. Milletarì, D. Xu, N. Rieke, J. Hancox, W. Zhu, M. Baust, Y. Cheng, S. Ourselin, M. J. Cardoso, and A. Feng, "Privacy-preserving federated brain tumour segmentation," in Machine Learning in Medical Imaging, pp. 133–141, Springer International Publishing, 2019.
[17] A. S. Bauer, P. V. Nazarov, N. A. Giese, S. Beghelli, A. Heller, W. Greenhalf, E. Costello, A. Muller, M. Bier, O. Strobel, T. Hackert, L. Vallar, A. Scarpa, M. W. Büchler, J. P. Neoptolemos, S. Kreis, and J. D. Hoheisel, "Transcriptional variations in the wider peritumoral tissue environment of pancreatic cancer," International Journal of Cancer, vol. 142, pp. 1010–1021, Oct. 2017.
[18] N. Fukushima, J. Koopmann, N. Sato, N. Prasad, R. Carvalho, S. D. Leach, R. H. Hruban, and M. Goggins, "Gene expression alterations in the non-neoplastic parenchyma adjacent to infiltrating pancreatic ductal adenocarcinoma," Modern Pathology, vol. 18, pp. 779–787, Mar. 2005.
[19] J. R. Infante, H. Matsubayashi, N. Sato, J. Tonascia, A. P. Klein, T. A. Riall, C. Yeo, C. Iacobuzio-Donahue, and M. Goggins, "Peritumoral fibroblast SPARC expression and patient outcome with resectable pancreatic adenocarcinoma," Journal of Clinical Oncology, vol. 25, pp. 319–325, Jan. 2007.
[20] Q. Sun, X. Lin, Y. Zhao, L. Li, K. Yan, D. Liang, D. Sun, and Z.-C. Li, "Deep learning vs. radiomics for predicting axillary lymph node metastasis of breast cancer using ultrasound images: Don't forget the peritumoral region," Frontiers in Oncology, vol. 10, Jan. 2020.
[21] X. Du, T.-Y. Lin, P. Jin, G. Ghiasi, M. Tan, Y. Cui, Q. V. Le, and X. Song, "SpineNet: Learning scale-permuted backbone for recognition and localization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11592–11601, 2020.
[22] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017.
[23] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[24] L. Zhang, X. Wang, D. Yang, T. Sanford, S. Harmon, B. Turkbey, B. J. Wood, H. Roth, A. Myronenko, D. Xu, et al., "Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation," IEEE Transactions on Medical Imaging, 2020.
[25] F. Isensee, J. Petersen, A. Klein, D. Zimmerer, P. F. Jaeger, S. Kohl, J. Wasserthal, G. Koehler, T. Norajitra, S. Wirkert, et al., "nnU-Net: Self-adapting framework for U-Net-based medical image segmentation," arXiv preprint arXiv:1809.10486.