Efficient, high-performance pancreatic segmentation using multi-scale feature extraction
Moritz Knolle, Georgios Kaissis, Friederike Jungmann, Sebastian Ziegelmayer, Daniel Sasse, Marcus Makowski, Daniel Rueckert, Rickmer Braren
Department of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany
Institute for Artificial Intelligence and Data Science in Medicine and Healthcare, Technical University of Munich, Munich, Germany
OpenMined Research
Department of Computing, Imperial College London, London, United Kingdom

* shared first authorship
+ Corresponding author, e-mail: [email protected]

Abstract

Rationale
For artificial intelligence-based image analysis methods to reach clinical applicability, the development of high-performance algorithms is crucial. For example, existing segmentation algorithms developed on natural images are neither efficient in their parameter use nor optimized for medical imaging. Here we present MoNet, a highly optimized neural-network-based pancreatic segmentation algorithm focused on achieving high performance through efficient multi-scale image feature utilization.
Methods
We developed MoNet, a shallow, U-Net-like architecture based on repeated, dilated convolutions with decreasing dilation rates. The model was trained on publicly available pancreatic computed tomography (CT) scans in the portal-venous phase from the Medical Segmentation Decathlon (196 training and 85 validation scans) and tested for its out-of-sample generalization performance by evaluating the Dice coefficient on 85 manually segmented scans sourced from our institution's picture archiving and communication system (PACS). We compared the model's Dice coefficient and inference time against two standard architectures (U-Net and Attention U-Net).

Results
MoNet achieved a mean ± STD Dice coefficient of 0.… ± 0.… (U-Net: 0.… ± 0.…; Attention U-Net: 0.… ± 0.…) with 403,556 parameters (U-Net: 31,…; Attention U-Net: 31,…); the mean ± STD inference time was 14.… ± … s, compared to 45.… ± … s for U-Net and 53.… ± … s for Attention U-Net.

Conclusion
We present an optimized neural network architecture for pancreatic segmentation which provides performance competitive with the state of the art on out-of-sample data while utilizing fewer parameters and requiring a fraction of the inference time.

Introduction
Pancreatic ductal adenocarcinoma (PDAC) is projected to soon become the second leading cause of cancer-related death worldwide because of late diagnosis and diverse tumor biology [1]. Machine learning-based quantitative imaging workflows have demonstrated promising results in a variety of oncologic imaging tasks, such as the detection or sub-classification of lung [2] and breast [3] cancers in imaging data. Arguably, these successes have been driven in part by the relative ease of automatic detection and segmentation of these tumor entities, owing to their high contrast to the surrounding tissue and the high acquisition resolution of thoracic CT or mammography. To translate such applications to PDAC, e.g. to improve early diagnosis and the non-invasive classification of known molecular tumor subtypes with differential outcome in response to available chemotherapy regimens, high-performance pancreatic segmentation algorithms will be instrumental. However, the success of automated segmentation algorithms in pancreatic CT imaging has hitherto been limited by the organ's poor differentiability from adjacent structures of similar attenuation, variability in position and fat content, and alterations due to pathology such as tumor or inflammation.

Existing work in deep learning-assisted pancreatic segmentation has focused on extending previously available architectures such as the U-Net [4] into the three-dimensional context [5] or on improving segmentation results by incorporating attention mechanisms into the architecture [6]. These modifications, however, further increase the (already substantial) computational requirements and resulting costs of these architectures, rendering such U-Net derivatives impractical for cost-efficient use in rapid research workflows or in clinical practice. Here, we introduce MoNet, an optimized, shallow, U-Net-derived architecture achieving state-of-the-art or higher performance in pancreatic segmentation, based on efficient multi-scale feature extraction using repeated decreasingly dilated convolution (RDDC) layers with two global down-sampling operations and a total of 403,556 parameters, a >95% parameter reduction compared to the original U-Net architecture.
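The >95% figure can be verified with one line of arithmetic (a quick sketch; 31 million is a rounded count for the original 64-filter U-Net, as reported in the Results):

```python
# Parameter reduction of MoNet relative to the original U-Net.
monet_params = 403_556
unet_params = 31_000_000  # rounded, approximate 64-filter U-Net count

reduction = 1 - monet_params / unet_params
print(f"{reduction:.1%}")  # ≈ 98.7%
```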
Methods
Training, validation and independent testing datasets
All neural network architectures presented in this work were trained on the pancreas dataset from the Medical Segmentation Decathlon (MSD) [7]. A 70%/30% training-validation split was employed: 196 abdominal CT scans of the portal-venous contrast agent enhancement phase were used for training and 85 scans for validation. For processing, images were bilinearly down-sampled to 256 × 256 pixels.

Network architecture
The architecture of MoNet is depicted in Fig. 2. In brief, input tensors of shape B × 256 × 256 × 1, with B denoting the batch size, are progressively down-sampled across the encoder branch of the network using convolutions with a stride length of 2, resulting in an X × Y resolution of 64 × 64 in the bottleneck segment of the network. The resulting feature maps are then progressively up-sampled by transposed convolution (deconvolution) in the decoder branch, resulting in output masks of identical dimensions to the input. Each (de-)convolution block consists of a 3 × 3 convolutional layer followed by batch normalization and an exponential linear unit (ELU) activation. At every stage of the U-Net-like architecture, the convolution blocks are followed by a repeated decreasingly dilated convolution (RDDC) block (Fig. 3), consisting of four successive convolutional blocks as described above, but employing dilated convolutions [8] with decreasing dilation rates (4, 3, 2, 1, respectively), a feature extraction strategy that has been shown to perform well for small objects [9]. Each convolutional block within an RDDC block is followed by a spatial dropout layer [10]. Finally, residual-type longitudinal (short) connections are employed within each RDDC block, and transverse (long) skip connections are employed between the encoder and the decoder branch to assist signal and gradient flow, as originally described in [4, 11].
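As an illustration of the multi-scale design, the effective receptive field of an RDDC block follows from the dilation rates (a back-of-the-envelope sketch; `stacked_conv_rf` is our own helper name): each stride-1 convolution with kernel size k and dilation d enlarges the receptive field by (k − 1)·d pixels.

```python
def stacked_conv_rf(dilations, kernel=3):
    """Effective receptive field (pixels, one axis) of stacked stride-1
    2D convolutions; each layer with dilation d adds (kernel - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# RDDC block: four 3x3 convolutions with dilation rates 4, 3, 2, 1
print(stacked_conv_rf([4, 3, 2, 1]))  # 21
# Four plain 3x3 convolutions at the same stage, for comparison
print(stacked_conv_rf([1, 1, 1, 1]))  # 9
```

The dilated stack thus sees a 21 × 21 neighborhood where four plain 3 × 3 convolutions see only 9 × 9, at the same parameter cost.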
Figure 2: Schematic representation of the MoNet architecture. Legend: conv (3×3) + batch norm + ELU; down-conv; RDDC block; up-conv; skip connection.
Figure 3: Schematic diagram of an RDDC block (top) and its constituent convolutional blocks (bottom): four conv blocks (conv, BatchNorm, ELU) with dilation rates 4, 3, 2 and 1, each followed by spatial dropout.
Model training
All architectures were trained to convergence using the Nesterov-Adam optimizer [12] with an initial learning rate of 5 × 10^-… and learning rate decay by a factor of 10 upon validation loss stagnation for ≥ … epochs. Data augmentation consisted of random rotations (± …°), random zoom (± 0.25) and random pixel shifts of a maximum magnitude of 0.2 of the image height/width. The augmentations were validated and chosen based on the advice of a senior radiologist to represent plausible data expected to be encountered in real-world clinical use settings when imaging the pancreas.
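The learning-rate schedule described above (decay by a factor of 10 when the validation loss stagnates) can be sketched as a minimal stand-in for a plateau-based scheduler such as Keras' ReduceLROnPlateau; the starting rate and patience below are illustrative values, not the paper's exact settings.

```python
class PlateauDecay:
    """Decay the learning rate by `factor` when the validation loss has
    not improved for `patience` consecutive epochs (illustrative sketch)."""

    def __init__(self, lr, factor=0.1, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        if val_loss < self.best:
            # Improvement: remember it and reset the stagnation counter.
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr

# Usage: lr drops from 5e-4 to 5e-5 after two stagnant epochs.
sched = PlateauDecay(5e-4, factor=0.1, patience=2)
for loss in [1.0, 0.9, 0.95, 0.96]:
    lr = sched.step(loss)
```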
Performance Assessment
We compared MoNet's out-of-sample generalisation performance to the following two U-Net [4] variants:

• original U-Net, 64 base filters (U-Net)
• attention-gated U-Net, 2D, 64 base filters (Attention U-Net)

For all performance comparisons, repeated testing was performed under identical circumstances (no concurrent network traffic, all non-essential operating system processes suspended, identical CPU power settings). Mean inference times and Dice scores were compared using Student's t-test with multiple testing correction.
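The Dice coefficient used for all comparisons can be computed as follows (a standard NumPy sketch, not the authors' implementation; `eps` guards against empty masks):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
print(dice(a, a))  # 1.0
print(dice(a, b))  # ≈ 0.5
```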
Results
Inference-time comparison
A comparison of the time required to perform inference on 150 images of 256 × 256 pixels on CPU (2.4 GHz 8-core Intel Core i9) was performed with identical batch size and an otherwise consistent environment for U-Net, Attention U-Net (2D) and MoNet. MoNet significantly outperformed both U-Net and Attention U-Net with regard to inference time (Student's t-test with multiple testing correction, p < …).

Architecture        Mean ± STD inference time (s), N=5 repetitions
U-Net               45.… ± …
Attention U-Net     53.… ± …
MoNet (ours)        14.… ± …

Table 1: Inference time for a CT scan of 150 slices at 256 × 256 resolution; results averaged over 5 runs under an identical setup.
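The repeated-timing protocol (5 runs, mean ± STD of wall-clock time) can be reproduced with a small harness like the following (a sketch; `predict` stands in for any model's inference call and `time_inference` is our own helper name):

```python
import statistics
import time

def time_inference(predict, batch, repeats=5):
    """Run `predict` on `batch` `repeats` times and return the mean and
    sample STD of the wall-clock time, mirroring the protocol above."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

# Usage with a stand-in workload in place of a neural network:
mean_t, std_t = time_inference(lambda x: sorted(x), list(range(100_000)))
```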
Segmentation Performance Comparison
MoNet performed on par with the other U-Net variants on the validation dataset (all Student's t-tests after multiple testing correction non-significant) while outperforming the other U-Net variants on the independent validation dataset (all Student's t-tests after multiple testing correction p < …).

Architecture        Parameter count   Mean ± STD Dice, MSD   Mean ± STD Dice, IVD
MoNet (ours)        403,556           0.… ± 0.11             0.… ± 0.…
U-Net               31,…,145          0.… ± 0.15             0.… ± 0.…
Attention U-Net     31,…,349          0.… ± 0.15             0.… ± 0.…

Table 2: Comparison of MoNet with the other U-Net variants tested on the MSD and the independent validation set (IVD) (both N=85 scans).
Discussion
We here present an efficient, high-performance U-Net-like segmentation algorithm for pancreatic segmentation and show significant inference speed gains on CPU hardware while maintaining or exceeding the segmentation performance of similar algorithms. The poor prognosis and increasing incidence of PDAC [13, 1] mandate the development of enhanced diagnosis and treatment strategies. Our recent findings suggest that quantitative image analysis can identify molecular subtypes related to differential response to chemotherapeutic drugs [14] or predict patient survival [15]. Automated region-of-interest definition increases the reliability and validity of such findings and offers substantial time savings compared to manual expert-based segmentation. However, the widespread application of automatic segmentation algorithms will depend both on their real-world segmentation performance and on the ease of deployment in a wide range of hardware environments, e.g. on hardware lacking graphics processing units.

The work presented provides state-of-the-art segmentation performance with substantial efficiency gains through the utilization of higher-resolution feature maps in the bottleneck section of the network, making it suitable both for rapid prototyping and for large-scale deployment in, e.g., decentralized machine learning workflows [16]; network architectures with few parameters are therefore an excellent strategy to reduce network traffic.
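To put the deployment argument in numbers: at 32-bit precision, the parameter counts translate into roughly the following checkpoint sizes (a sketch; 31 million is a rounded figure for the 64-filter U-Net):

```python
def size_mb(n_params, bytes_per_param=4):
    """Approximate checkpoint size in megabytes at 32-bit precision."""
    return n_params * bytes_per_param / 1e6

print(size_mb(403_556))      # MoNet: ≈ 1.6 MB
print(size_mb(31_000_000))   # ~31M-parameter U-Net: 124.0 MB
```

In a federated setting where weights are exchanged every round, this is roughly a 75-fold reduction in per-round traffic.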
MoNet was trained to segment the entire pancreas including the tumor. This approach reflects the fact that the exact delineation of the tumor border is often infeasible, and is supported by literature findings noting the importance of the peritumoral tissue in PDAC [17, 18, 19] and in other tumor entities [20].

Recent work on semantic segmentation provides evidence in favor of architectures performing image feature extraction at multiple scales by utilizing dilated convolutions instead of relying merely on the scale-decreasing backbones employed in traditional fully convolutional architectures [21, 22, 9, 23]. Our work corroborates this notion: multi-scale feature extraction combined with larger receptive fields at the same hierarchical level appears to capture both more robust and higher-quality features than the fixed kernel size design encountered in U-Net-like architectures. Moreover, architectures with several down-sampling operations and/or many filters, such as the 64-filter U-Net (with 4 down-sampling stages), cannot leverage their large number of parameters sufficiently well to warrant their utilization, at least in medical imaging tasks, which are typically characterized by a lack of large datasets and by small segmentation targets (such as the pancreas).

Furthermore, segmentation algorithms are often trained on publicly available and/or single-institutional data such as the pancreas dataset from the MSD, which has recently been identified as a potential source of generalization challenges on data collected at different centers [24]. Our results show that MoNet, using repeated decreasingly dilated convolutions, extracts more robust features that generalize better to out-of-sample data than current methods, as shown by MoNet's performance on the independent validation set.

Our work is not without limitations. The generalizability of our findings is limited by the single-institution independent validation set and the relatively small sample size. All tested algorithms would have benefited from larger training sets and from performance evaluation on additional multi-center datasets. Furthermore, we only compared our algorithm against algorithms based on a single 2D U-Net-style network. Algorithms such as nnU-Net [25], based on U-Net ensembles, offer superior performance, however at the expense of high computational and post-processing requirements and thus much slower inference times (especially on CPU).
Conclusion

In conclusion, we propose an efficient pancreatic segmentation algorithm with state-of-the-art performance which can benefit both radiological research and the clinical translation of artificial intelligence workflows in medical imaging by providing consistent, high-quality segmentations for both radiomics and machine learning tasks.
Source code and data availability
Source code for MoNet, based on TensorFlow, is available at https://github.com/TUM-AIMED/MoNet. The training datasets are available from http://medicaldecathlon.com/. The independent test set contains confidential patient information and cannot be shared publicly.
References

[1] E. A. Collisson, P. Bailey, D. K. Chang, and A. V. Biankin, "Molecular subtypes of pancreatic cancer," Nature Reviews Gastroenterology & Hepatology, vol. 16, no. 4, pp. 207–220, 2019.
[2] D. Ardila, A. P. Kiraly, S. Bharadwaj, B. Choi, J. J. Reicher, L. Peng, D. Tse, M. Etemadi, W. Ye, G. Corrado, D. P. Naidich, and S. Shetty, "End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography," Nature Medicine, vol. 25, pp. 954–961, May 2019.
[3] S. M. McKinney, M. Sieniek, V. Godbole, J. Godwin, N. Antropova, H. Ashrafian, T. Back, M. Chesus, G. C. Corrado, A. Darzi, et al., "International evaluation of an AI system for breast cancer screening," Nature, vol. 577, no. 7788, pp. 89–94, 2020.
[4] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
[5] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," pp. 565–571, IEEE, 2016.
[6] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al., "Attention U-Net: Learning where to look for the pancreas," arXiv preprint arXiv:1804.03999, 2018.
[7] A. L. Simpson, M. Antonelli, S. Bakas, M. Bilello, K. Farahani, B. Van Ginneken, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze, et al., "A large annotated medical image dataset for the development and evaluation of segmentation algorithms," arXiv preprint arXiv:1902.09063, 2019.
[8] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, "A real-time algorithm for signal analysis with the help of the wavelet transform," in Wavelets, pp. 286–297, Springer, 1990.
[9] R. Hamaguchi, A. Fujita, K. Nemoto, T. Imaizumi, and S. Hikosaka, "Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery," pp. 1442–1450, IEEE, 2018.
[10] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, "Efficient object localization using convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656, 2015.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[12] T. Dozat, "Incorporating Nesterov momentum into Adam," 2016.
[13] American Cancer Society, "Cancer Facts & Figures 2020."
[14] G. A. Kaissis, S. Ziegelmayer, F. K. Lohöfer, F. N. Harder, F. Jungmann, D. Sasse, A. Muckenhuber, H.-Y. Yen, K. Steiger, J. Siveke, H. Friess, R. Schmid, W. Weichert, M. R. Makowski, and R. F. Braren, "Image-based molecular phenotyping of pancreatic ductal adenocarcinoma," Journal of Clinical Medicine, vol. 9, p. 724, Mar. 2020.
[15] G. A. Kaissis, F. Jungmann, S. Ziegelmayer, F. K. Lohöfer, F. N. Harder, A. M. Schlitter, A. Muckenhuber, K. Steiger, R. Schirren, H. Friess, R. Schmid, W. Weichert, M. R. Makowski, and R. F. Braren, "Multiparametric modelling of survival in pancreatic ductal adenocarcinoma using clinical, histomorphological, genetic and image-derived parameters," Journal of Clinical Medicine, vol. 9, p. 1250, Apr. 2020.
[16] W. Li, F. Milletarì, D. Xu, N. Rieke, J. Hancox, W. Zhu, M. Baust, Y. Cheng, S. Ourselin, M. J. Cardoso, and A. Feng, "Privacy-preserving federated brain tumour segmentation," in Machine Learning in Medical Imaging, pp. 133–141, Springer International Publishing, 2019.
[17] A. S. Bauer, P. V. Nazarov, N. A. Giese, S. Beghelli, A. Heller, W. Greenhalf, E. Costello, A. Muller, M. Bier, O. Strobel, T. Hackert, L. Vallar, A. Scarpa, M. W. Büchler, J. P. Neoptolemos, S. Kreis, and J. D. Hoheisel, "Transcriptional variations in the wider peritumoral tissue environment of pancreatic cancer," International Journal of Cancer, vol. 142, pp. 1010–1021, Oct. 2017.
[18] N. Fukushima, J. Koopmann, N. Sato, N. Prasad, R. Carvalho, S. D. Leach, R. H. Hruban, and M. Goggins, "Gene expression alterations in the non-neoplastic parenchyma adjacent to infiltrating pancreatic ductal adenocarcinoma," Modern Pathology, vol. 18, pp. 779–787, Mar. 2005.
[19] J. R. Infante, H. Matsubayashi, N. Sato, J. Tonascia, A. P. Klein, T. A. Riall, C. Yeo, C. Iacobuzio-Donahue, and M. Goggins, "Peritumoral fibroblast SPARC expression and patient outcome with resectable pancreatic adenocarcinoma," Journal of Clinical Oncology, vol. 25, pp. 319–325, Jan. 2007.
[20] Q. Sun, X. Lin, Y. Zhao, L. Li, K. Yan, D. Liang, D. Sun, and Z.-C. Li, "Deep learning vs. radiomics for predicting axillary lymph node metastasis of breast cancer using ultrasound images: Don't forget the peritumoral region," Frontiers in Oncology, vol. 10, Jan. 2020.
[21] X. Du, T.-Y. Lin, P. Jin, G. Ghiasi, M. Tan, Y. Cui, Q. V. Le, and X. Song, "SpineNet: Learning scale-permuted backbone for recognition and localization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11592–11601, 2020.
[22] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017.
[23] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[24] L. Zhang, X. Wang, D. Yang, T. Sanford, S. Harmon, B. Turkbey, B. J. Wood, H. Roth, A. Myronenko, D. Xu, et al., "Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation," IEEE Transactions on Medical Imaging, 2020.
[25] F. Isensee, J. Petersen, A. Klein, D. Zimmerer, P. F. Jaeger, S. Kohl, J. Wasserthal, G. Koehler, T. Norajitra, S. Wirkert, et al., "nnU-Net: Self-adapting framework for U-Net-based medical image segmentation," arXiv preprint arXiv:1809.10486.