Anisotropic 3D Multi-Stream CNN for Accurate Prostate Segmentation from Multi-Planar MRI
Anneke Meyer, Grzegorz Chlebus, Marko Rak, Daniel Schindele, Martin Schostak, Bram van Ginneken, Andrea Schenk, Hans Meine, Horst K. Hahn, Andreas Schreiber, Christian Hansen
a Faculty of Computer Science and Research Campus STIMULATE, University of Magdeburg, Germany
b Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
c Radboud University Medical Center, Nijmegen, The Netherlands
d Clinic of Urology and Pediatric Urology, University Hospital Magdeburg, Germany
e University of Bremen, Medical Image Computing Group, Bremen, Germany
Abstract
Background and Objective:
Accurate and reliable segmentation of the prostate gland in MR images can support the clinical assessment of prostate cancer, as well as the planning and monitoring of focal and loco-regional therapeutic interventions. Despite the availability of multi-planar MR scans due to standardized protocols, the majority of segmentation approaches presented in the literature consider the axial scans only. In this work, we investigate whether a neural network processing anisotropic multi-planar images could work in the context of a semantic segmentation task, and if so, how this additional information would improve the segmentation quality.
Methods:
We propose an anisotropic 3D multi-stream CNN architecture, which processes additional scan directions to produce a high-resolution isotropic prostate segmentation. We investigate two variants of our architecture, which work on two (dual-plane) and three (triple-plane) image orientations, respectively. The influence of additional information used by these models is evaluated by comparing them with a single-plane baseline processing only axial images. To realize a fair comparison, we employ a hyperparameter optimization strategy to select optimal configurations for the individual approaches.
Results:
Training and evaluation on two datasets spanning multiple sites show a statistically significant improvement over the plain axial segmentation (p < 0.05) in the base (0.898 single-plane vs. 0.906 triple-plane) and apex regions (0.888 single-plane vs. 0.901 dual-plane).

∗ Corresponding author. Email address: [email protected]. Postal address: Universitaetsplatz 2, 39106 Magdeburg, Germany.
1 Anneke Meyer and Grzegorz Chlebus contributed equally to this work.
© Accepted manuscript in Elsevier Comput Methods Programs Biomed, November 2020.
Conclusion:
This study indicates that models employing two or three scan directions are superior to plain axial segmentation. The knowledge of precise boundaries of the prostate is crucial for the conservation of risk structures. Thus, the proposed models have the potential to improve the outcome of prostate cancer diagnosis and therapies.
Keywords:
MRI, Prostate Segmentation, Multi-Stream CNN, Anisotropic CNN, Hyperparameter Optimization
1. Introduction
Prostate cancer is the most prevalent type of cancer among men, accounting for over 164 thousand new cases and more than 29 thousand deaths in the US in 2018 [1]. Clinical workflows of prostate cancer patients commonly involve MR imaging, which, thanks to the high soft-tissue contrast, can be employed for diagnosis, staging, and therapy planning. Prostate segmentation in MRI is a time-consuming task, requiring expert knowledge and suffering from inter-observer variability. Knowledge of the gland size and shape, which can be derived from the segmentation mask, is often utilized in clinical and research applications. For instance, Shah et al. [2] have shown that MRI findings can be correlated with the prostatectomy specimen by employing the prostate segmentation. Moreover, it is often used to facilitate radiotherapy planning [3] and targeted biopsy with MRI-TRUS (transrectal ultrasound) fusion [4, 5]. Because neighboring structures such as the seminal vesicles, bladder, neurovascular bundles, and the external sphincter are essential for erectile function and urinary continence, the segmentation should be as precise as possible for the planning of prostate cancer therapy.
Before the advance of deep learning, prostate segmentation was mainly performed with atlas-based segmentation or deformable models based on hand-crafted features. A comprehensive summary of those methods is given in [6]. Early approaches incorporating deep learning used voxel-wise classification to yield a segmentation mask. For instance, Liao et al. [7] learned deep features with a stacked independent subspace analysis network in an unsupervised fashion and performed segmentation with label propagation from atlases. Guo et al. [8] also used deep features, but generated by a supervised stacked sparse autoencoder, yielding a prostate likelihood map, which is then segmented by a deformable model. Jia et al. [9] performed patch-based prediction with ensemble deep convolutional neural networks (CNNs).

CNNs are gaining attention in the medical image processing field thanks to state-of-the-art results on numerous classification and segmentation tasks. Various CNN architectures for segmentation problems were proposed, such as the fully convolutional network (FCN) by Long et al. [10]. The U-Net architecture following the encoder/decoder design with long skip connections to retain the locality information was successfully used for different image segmentation problems [11]. Established CNN architectures, as well as their modified versions, have been introduced for prostate segmentation on T2-weighted MRI. For instance, Tian et al. fine-tuned an FCN model for prostate segmentation [12]. Yan et al. [13] adopted an FCN to embed superpixel information as low-level features in combination with high-level deep features. Another modification strategy to improve network segmentation is to add deep supervision [14, 15, 16, 17].

Learning and segmentation performance can benefit from different aspects of network design to retain fine-detailed information and alleviate the vanishing gradient problem. While the U-Net architecture employs skip connections from the encoder to the decoder part of the network, Yu et al.
[18] analyzed the effect of short and long residual connections and showed that a combination of both is beneficial in a 3D CNN for segmentation. Wang et al. [16] observed improvements with residual connections between neighboring blocks in combination with strided convolutions. Hossain et al. [19] adapted the VGG19 architecture [20] into an FCN and added short and long residual connections. A ResNet [21] encoder was extended with a decoder with 3D global convolutional blocks and boundary refinement blocks in [22]. The authors combined this network with an adversarial network for higher-order consistent predictions. In the whole model, anisotropic convolutions are employed to reflect the high slice thickness of the MR input volumes. The authors furthermore suggested using the ResNet encoder in combination with an anisotropic decoder and multi-level pyramid convolutional skip connections as well as adversarial training [23].

The use of dense connections that enhance feature reuse and propagation has been shown in the last two years to improve performance additionally. Hassanzadeh et al. [24] evaluated the use of various residual and dense connections. Yuan et al. [25] made use of densely connected blocks in encoder and decoder and trained with a joint loss function that incorporates the Dice similarity coefficient and the reconstruction error of dense block outputs. Also, Zhu et al. [26, 27], To et al. [28], and Liu et al. [29] incorporated, amongst others, dense blocks into their architectures. Brosch et al. [30] formulated the segmentation as a regression task.
They combined a 3D shape model with a convolutional regression network, where the network is used to obtain the distance from the surface mesh to the corresponding boundary point of the prostate.

The above-mentioned methods use only the axial T2-weighted scan as input, which is suboptimal as MR images acquired in a typical prostate imaging protocol are highly anisotropic (in-plane to out-of-plane resolution ratio of 6-10), see Fig. 1. This leads to substantial partial volume artifacts, making it difficult to precisely identify prostate boundaries, especially in the apex and base regions. In addition, segmentations created only on axial volumes suffer from step artifacts due to large slice spacing. However, in prostate cancer imaging protocols as in Weinreb et al. [31], it is mandatory to acquire at least one additional scan direction (sagittal or coronal), and in multiple clinical routines, all three scan directions are acquired for better interpretation. These additional scans could be used to improve the prostate segmentation quality, especially in the areas suffering from partial volume effects.

Figure 1: Visualization of the independent orthogonal scans of one patient illustrating their anisotropic nature. The first row depicts the axial scan that is normally used for segmentation. As can be seen in the sagittal and coronal views of that axial scan, the apical (blue arrow) and base (orange arrow) regions lack clear boundaries of the prostate due to the partial volume effect. In the sagittal and coronal scans, the prostate tissue in these regions can be distinguished more clearly from non-prostate tissue.

An approach to compute a high-quality prostate mesh was proposed by Shah et al. [2], where three masks resulting from manual contouring on axial, coronal, and sagittal MR acquisitions were merged by means of shape-based interpolation. Cheng et al. introduced a fully automatic segmentation algorithm incorporating multi-planar MR information [32].
The algorithm includes an ensemble of three 2D neural networks trained separately on axial, coronal, and sagittal MR scans, respectively. The outputs are fused before a high-resolution prostate segmentation is extracted. Furthermore, Lozoya et al. [33] assessed the effect of single- and dual-plane segmentation by training ensembles of 2D CNNs independently on axial and sagittal volumes. The models process three consecutive image slices (downsampled to a 128 × 128 resolution) to segment the middle one. The results showed an improvement of 4% for the dual-plane approach.

While these multi-planar approaches show that the exploitation of multi-planar MR images is beneficial for the segmentation quality, they have some limitations. First, both approaches train independent CNNs per MRI orientation, which prevents the models from learning how to combine the information coming from different orientations. Second, only 2D neural networks are employed, which cannot capture the inherent volumetric information of MRI scans. Being able to analyze the 3D image context is important for prostate segmentation, as demonstrated by Ye et al. [34], who developed a volumetric ConvNet model that achieved the best performance at the PROMISE12 challenge so far [35]. In this work, we address both limitations by presenting a multi-stream 3D CNN architecture that simultaneously processes anisotropic multi-planar MR images to produce a high-resolution prostate segmentation. This paper builds on our previous work [36], where we demonstrated initial results of an isotropic multi-stream 3D network on a smaller dataset. Additionally, we evaluate the performance of one, two, and three input scan directions on the same dataset. While Lozoya et al. [33] only include two scan directions, Cheng et al. [32] and our previous work [36] use three planes. All works use different methods and datasets, and therefore a thorough investigation of the difference between two and three planes has been impossible so far.
The contribution of this work is two-fold:

1. We propose an anisotropic 3D multi-stream CNN architecture and show that it can process multi-planar MR images to produce a high-resolution prostate segmentation. Contrary to our prior work [36], the proposed network design fuses information from anisotropic images, alleviating the need for image resampling to isotropic voxel size. Additionally, the proposed architecture is computationally less expensive, which allows for faster inference and more efficient training.

2. We quantify the influence of information from additional image orientations on segmentation quality by comparing the performance of a baseline single-plane model (processing only axial images) with dual-plane (axial + sagittal) and triple-plane (axial + sagittal + coronal) models. To allow a fair comparison of the three approaches, we employ an automatic hyperparameter optimization strategy. We report quantitative results for the whole gland and the base, mid, and apex regions using image data from two datasets and multiple sites.

Our source code is available on GitHub, and we published ground truth segmentations that were created as part of this project for a publicly available challenge dataset [37].

In the following section, we describe the proposed architecture for the multi-plane segmentation of the prostate as well as our hyperparameter optimization method. Furthermore, we give a description of the datasets used in this work, the training procedure, and the evaluation measures.
https://github.com/AnnekeMeyer/AnisotropicMultiStreamCNN

[Figure 2 diagram: axial input (144 × 144 × 36), coronal input (144 × 36 × 144), and sagittal input (36 × 144 × 144) are processed in separate analysis-path branches with orientation-specific max pooling (2 × 2 × 1, 2 × 1 × 2, 1 × 2 × 2), concatenated, and decoded in a common synthesis path to a segmentation output of 144 × 144 × 144.]
Figure 2: Triple-planar multi-stream 3D network processing axial, coronal, and sagittal MR volumes. The number in parentheses denotes feature map count (conv layer), pool size (max pooling), and upsampling factor (upsampling). The upsampling is performed either by trilinear upsampling or 3D transposed convolution.
2. Materials and Methods
With respect to the literature, we can define two basic variants of combining multiple planes for CNNs. The first is to train three networks separately, with each network taking one orthogonal scan as input. The outputs of the three networks are then fused in a postprocessing step. The alternative is to input all planes into one multi-stream network and process them simultaneously. We compared the two variants to each other and could not find any significant difference in their performance (see results in Section 3.5). Due to its simplicity in deployment, we focus our work on the multi-stream architecture in this project. This has the additional benefit that we can investigate the influence of additional planes directly, as the ensembling of network outputs has a benefit on performance in general.

Our multi-stream model is a 3D U-Net-like architecture following an encoder/decoder design with four resolution levels [38]. The proposed network design is flexible with respect to the number of inputs, which enables information extraction from more than one volume. Fig. 2 illustrates a triple-planar model instance processing axial, coronal, and sagittal acquisitions. Depending on the desired input configuration (single-plane, dual-plane, or triple-plane), the analysis path has corresponding input-specific branches on the first two resolution levels. These branches perform downsampling by max pooling operations with anisotropic pool sizes (e.g., 2 × 2 × 1 for the axial, 2 × 1 × 2 for the coronal, and 1 × 2 × 2 for the sagittal stream), so that the feature maps of all branches reach a common isotropic resolution before they are concatenated. A final convolution with sigmoid activation maps the network output to the [0, 1] range.
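The effect of the branch-specific anisotropic pooling can be illustrated with a small NumPy sketch. This is a simplified illustration assuming the input sizes shown in Fig. 2 (the actual network pools multi-channel feature maps, not raw volumes): two pooling steps that act only along the two high-resolution axes of each scan bring all three streams to the same isotropic grid, so that they can be concatenated.

```python
import numpy as np

def max_pool3d(volume, pool):
    """Non-overlapping max pooling with an (anisotropic) pool size."""
    d, h, w = volume.shape
    pd, ph, pw = pool
    return volume.reshape(d // pd, pd, h // ph, ph, w // pw, pw).max(axis=(1, 3, 5))

# Orientation-specific pool sizes as in Fig. 2: pooling acts only along the
# two high-resolution axes of each anisotropic scan.
streams = {
    "axial": (np.zeros((144, 144, 36)), (2, 2, 1)),
    "coronal": (np.zeros((144, 36, 144)), (2, 1, 2)),
    "sagittal": (np.zeros((36, 144, 144)), (1, 2, 2)),
}

shapes = {}
for name, (vol, pool) in streams.items():
    for _ in range(2):  # two resolution levels with branch-specific pooling
        vol = max_pool3d(vol, pool)
    shapes[name] = vol.shape  # all streams end up on the same 36^3 grid
```

After two anisotropic pooling steps, every stream has the shape 36 × 36 × 36, which is what allows the concatenation in the shared part of the analysis path.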
Careful tuning of neural network hyperparameters, such as learning rate or regularization strength, is important in getting the best possible model performance. Moreover, hyperparameter optimization (HPO) should be performed whenever the architecture or the learned task changes, as a direct transfer of hyperparameter values may lead to sub-optimal prediction quality. We ran HPO to find hyperparameter values yielding the best segmentation performance for all three architectures (single-, dual-, and triple-plane) independently. This strategy minimizes the influence of the chosen hyperparameters, yielding a fair comparison among the investigated models.

We employed the HPO strategy that was proposed by Falkner et al. in [39]. The method combines Hyperband (HB) with Bayesian optimization (BO) to achieve fast convergence to optimal configurations. HB is an HPO method that evaluates n randomly sampled configurations with a small budget (e.g., maximal training epoch count), keeps the best half, and doubles their budget [40]. This process is repeated until only one configuration is left. BO builds a probabilistic model based on the already evaluated configurations [41]. This model is then employed to sample hyperparameter values that should result in better model performance. One iteration of our HPO involves sampling n configurations from the Bayesian model and another n by random sampling. The sampled configurations are then evaluated using the HB method.

We used two datasets for the evaluation of the proposed approaches. The first dataset is an in-house dataset containing 89 axial, sagittal, and coronal T2-weighted MR scans acquired on a Philips Achieva 3T imager. In the clinical routine, gland segmentations have been obtained with commercial software (DynaCAD, Philips Invivo) in a semi-supervised manner. As the software only considers the axial T2 volumes, we resampled the segmentations to an isotropic resolution via shape-based interpolation as in Herman et al. [42].
Subsequently, an expert urologist reviewed and corrected the isotropic segmentations with 3D Slicer [43] by simultaneously considering all three orthogonal scans.

The second dataset, ProstateX, is publicly available through the SPIE-AAPM-NCI Prostate MR Classification Challenge [44, 45, 46], which was designed for predicting the clinical significance of prostate lesions. The dataset comprises multiparametric MRI acquired on two different types of Siemens 3T MR imagers, the MAGNETOM Trio and Skyra. As no reference segmentation of the glands is available in the challenge dataset, we created 66 segmentations for randomly chosen T2-weighted volumes. The segmentations were obtained manually for each scan direction by a medical student, followed by a review and corrections of an expert urologist with 3D Slicer under consideration of all three orthogonal scans. The final isotropic high-resolution prostate mask is extracted by taking the average of linearly resampled distance transforms of the individual segmentations and thresholding the result at zero (similar to the approach employed by Herman et al. [42]). The final masks were reviewed by an expert and corrected if necessary using 3D Slicer. These segmentations were published as part of the study to support open research [37, 46].

The scans of both datasets were acquired without an endorectal coil. Details on the resolution of the orthogonal scans can be found in Table 1. The scans represent prostates with clinical variability such as tumors, cysts, benign prostatic hyperplasia, and scars from previous minimally invasive surgeries. The alignment of the orthogonal scans was checked visually using 3D Slicer. In about 10% of the cases, the scans were misaligned due to, for example, patient or bowel motion. For these cases, we performed a manual rigid registration of the affected images. Volumes in the ProstateX dataset that did not contain the whole prostate were excluded from this study to allow a fair comparison between the approaches.
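The distance-transform fusion used to build the isotropic reference masks can be sketched as follows. This is a simplified illustration with SciPy: the per-orientation masks are assumed to be already resampled to a common grid (the paper resamples the distance transforms linearly, which is omitted here), their signed distance transforms are averaged, and the mean is thresholded at zero.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fuse_segmentations(masks):
    """Fuse binary masks on a common grid by averaging their signed
    Euclidean distance transforms and thresholding the mean at zero
    (negative = inside the prostate)."""
    sdfs = []
    for mask in masks:
        inside = distance_transform_edt(mask)    # distance to background
        outside = distance_transform_edt(~mask)  # distance to foreground
        sdfs.append(outside - inside)            # negative inside the mask
    return np.mean(sdfs, axis=0) < 0

# Example: three slightly different spheres standing in for the axial,
# sagittal, and coronal segmentations of one case.
grid = np.indices((32, 32, 32)) - 16
r = np.sqrt((grid ** 2).sum(axis=0))
masks = [r < 8, r < 9, r < 10]
fused = fuse_segmentations(masks)
```

The fused mask lies between the smallest and the largest input mask, which is the intended averaging behavior of the shape-based fusion.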
For the in-house dataset, no such cases were found.

Methods for the segmentation of the prostate gland are often compared to each other on the PROMISE12 challenge [35]. As this challenge dataset only consists of axial T2-weighted MR images (see Table 1), we were not able to make this comparison in this project. Instead, we focus on the comparison of different network architectures that are based on the multi-planar input volumes.
For network training and prediction, the three scans are preprocessed by resampling them (linear interpolation) into a common coordinate system. The resulting in-plane resolution is 0.5 × 0.5 mm, with an in-plane size of 184 × 184 voxels and an out-of-plane size of 46 slices. As intensity normalization, the gray values are clipped to the 1st and 99th percentiles and afterwards normalized to a range of [0, 1].

Table 1: Resolution details for prostate MRI datasets.
Dataset      Scan      Resolution [mm]
ProstateX    axial     [0.5-0.6] × [0.5-0.6] × [3-5]
             sagittal  0.56 × 0.56 × [3-4]
             coronal   [0.56-0.6] × [0.56-0.6] × [3-4.5]
In-House     axial     0.5 × 0.5 × 2.75
             sagittal  0.5 × 0.5 × 3.25
             coronal   0.5 × 0.5 × 2.76
PROMISE12    axial     [0.27-0.63] × [0.27-0.63] × [2.2-3.6]
             sagittal  not available
             coronal   not available
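The percentile-based intensity normalization described in the preprocessing above can be sketched in a few lines of NumPy (a minimal sketch; the volume shape and intensity distribution below are illustrative only):

```python
import numpy as np

def normalize_intensities(volume, lower_pct=1, upper_pct=99):
    """Clip gray values to the 1st and 99th percentiles and rescale the
    clipped range linearly to [0, 1]."""
    lo, hi = np.percentile(volume, [lower_pct, upper_pct])
    clipped = np.clip(volume, lo, hi)
    return (clipped - lo) / (hi - lo)

# Illustrative synthetic T2-like volume (36 slices of 144 x 144 voxels)
rng = np.random.default_rng(42)
scan = rng.normal(500.0, 120.0, size=(36, 144, 144))
normalized = normalize_intensities(scan)
```

Clipping before rescaling makes the normalization robust against single very bright or very dark outlier voxels, which would otherwise compress the useful intensity range.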
We set aside 19 randomly chosen test cases for each dataset that were not considered for training. The remaining images were split into four folds for cross-validation. Hence, the folds of the in-house dataset consist of 52 training and 18 validation images each, while the ProstateX folds contain 35 training and 12 validation images. To augment the training set, random operations such as axial flips, elastic deformations, translations, and rotations were used. Unnatural transformations such as top-bottom and front-back flips were not considered. The input images were cropped to a size of 144 × 144 × 36 voxels in their respective scan orientations (cf. the input sizes in Fig. 2) before being fed to the network.

The objective function of our networks is the negative soft Dice similarity coefficient (DSC) loss

L = − (2 Σᵢᴺ pᵢ gᵢ + ε) / (Σᵢᴺ pᵢ + Σᵢᴺ gᵢ + ε),

with N being the total number of voxels, pᵢ and gᵢ the predicted and reference voxels, respectively, and ε a small constant to ensure numerical stability. We ran the training with the Adam optimizer [47] for a maximum of 270 epochs, with an early stopping criterion if the validation loss does not improve by at least δ = 0.001 for 100 iterations. The mini-batch size was set to one due to GPU memory capacity (NVIDIA GeForce GTX 1080 Ti). The prediction was post-processed with a connected-component analysis, removing every component except for the largest.

We ran the HPO on the concatenation of the first folds from both datasets. For each approach (single-, dual-, and triple-plane), a separate HPO was performed. We optimized the hyperparameters which were empirically found to have a substantial influence on model performance: learning rate, dropout rate, use of batch normalization, and upsampling mode.

Table 2: Best performing hyperparameters for each of the investigated network architectures.

                      Single-Plane   Dual-Plane               Triple-Plane
learning rate         1.… × 10⁻…     … × 10⁻…                 … × 10⁻…
dropout rate          0.6            0.2                      0.2
batch normalization   no             no                       yes
upsampling mode       trilinear      transposed convolution   transposed convolution

The numbers of trainable parameters of the proposed network architectures are 1.4, 1.6, and 1.7 million, respectively. Thus, the proposed strategies use similar network capacity. We implemented two training scenarios:

• Scenario I - train one model on merged datasets
• Scenario II - train separate models for each dataset

By comparing models resulting from both scenarios, we can verify whether segmentation quality for a target dataset can benefit from training on multi-site data. For each scenario, four-fold cross-validation was performed.
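The negative soft Dice loss used as the training objective above can be sketched as follows (a minimal NumPy sketch; in training this operates on the sigmoid probabilities of the network rather than binary masks):

```python
import numpy as np

def soft_dice_loss(pred, ref, eps=1e-5):
    """Negative soft Dice similarity coefficient over all voxels:
    L = -(2 * sum(p*g) + eps) / (sum(p) + sum(g) + eps),
    with eps ensuring numerical stability for empty masks."""
    intersection = np.sum(pred * ref)
    return -(2.0 * intersection + eps) / (np.sum(pred) + np.sum(ref) + eps)

# Illustrative binary reference mask: an 8^3 cube inside a 16^3 volume
mask = np.zeros((16, 16, 16))
mask[4:12, 4:12, 4:12] = 1.0
```

A perfect prediction gives a loss of -1, while a prediction with no overlap approaches 0; the epsilon keeps the loss well-defined when both volumes are empty.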
We evaluated the investigated models with the following measures that were also used in the PROMISE12 challenge [35]: Dice similarity coefficient (DSC) as well as the average boundary distance (ABD) and the 95th percentile Hausdorff distance (95-HD) between surface points of both volumes.

The Dice similarity coefficient is defined as

DSC(X, Y) = 2|X ∩ Y| / (|X| + |Y|)   (1)

with X being the predicted and Y being the ground truth voxels. The average boundary distance is defined as

ABD(X_S, Y_S) = 1/(|X_S| + |Y_S|) · ( Σ_{x∈X_S} min_{y∈Y_S} ED(x, y) + Σ_{y∈Y_S} min_{x∈X_S} ED(y, x) )   (2)

where X_S and Y_S are the sets of surface points of the predicted and ground truth segmentation, and ED is the Euclidean distance operator. The Hausdorff distance is defined as

HD(X_S, Y_S) = max( HD′(X_S, Y_S), HD′(Y_S, X_S) )  with  HD′(X_S, Y_S) = max_{x∈X_S} min_{y∈Y_S} ED(x, y).   (3)

As done in [35], we used the 95th percentile for the implementation of HD (the so-called 95-HD), as this measure is more commonly used, leveraging comparability with previous works.

All evaluation measures are computed in 3D for the whole gland and for the apex, base, and mid-gland regions. Each region corresponds to ca. one-third of the prostate and was partitioned in a slice-based manner with regard to the manual reference segmentation.

Table 3: Evaluation measures (DSC, ABD [mm], and 95-HD [mm] for whole gland, apex, mid, and base) for scenario I (training on merged datasets), averaged across all folds, for single-, dual-, and triple-plane models on the merged, ProstateX, and in-house datasets. Asterisks mark significantly better results when compared to the single-plane model (∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001). [Table body not recoverable from the source.]
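The evaluation measures above can be sketched in NumPy/SciPy as follows. This is a simplified illustration on toy volumes: surface points are extracted with a plain morphological erosion, which is an assumption for this sketch rather than the paper's exact implementation.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def dice_coefficient(x, y):
    """DSC(X, Y) = 2|X ∩ Y| / (|X| + |Y|) for binary volumes (Eq. 1)."""
    x, y = x.astype(bool), y.astype(bool)
    return 2.0 * np.sum(x & y) / (np.sum(x) + np.sum(y))

def surface_points(mask):
    """Coordinates of foreground voxels with at least one background neighbor."""
    return np.argwhere(mask & ~binary_erosion(mask))

def directed_distances(xs, ys):
    """Euclidean distance from each point in xs to its nearest point in ys."""
    diff = xs[:, None, :].astype(float) - ys[None, :, :].astype(float)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def abd(xs, ys):
    """Average boundary distance between two surface point sets (Eq. 2)."""
    d_xy, d_yx = directed_distances(xs, ys), directed_distances(ys, xs)
    return (d_xy.sum() + d_yx.sum()) / (len(xs) + len(ys))

def hd95(xs, ys):
    """95th-percentile Hausdorff distance (95-HD), the robust variant of Eq. 3."""
    d_xy, d_yx = directed_distances(xs, ys), directed_distances(ys, xs)
    return max(np.percentile(d_xy, 95), np.percentile(d_yx, 95))

# Toy example: a cube and the same cube shifted by one voxel
a = np.zeros((20, 20, 20), dtype=bool)
a[5:15, 5:15, 5:15] = True
b = np.roll(a, 1, axis=0)
s_a, s_b = surface_points(a), surface_points(b)
```

The brute-force pairwise distance computation is adequate for small surface point sets like those in this toy example; for full-resolution prostate masks, a k-d-tree-based nearest-neighbor query would be the more efficient choice.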
3. Results and Discussion
We report quantitative results (averaged across folds) of both scenarios in Table 3 and Table 4, respectively. Each approach was subject to four-fold cross-validation, and the performance of the resulting models was evaluated on left-out test cases. We applied the Wilcoxon signed-rank test to obtain the statistical significance of quantitative differences between single- and dual- or triple-plane approaches. The rationale against a standard Student t-test is that we cannot assume Gaussianity for the distribution of the result quality.

In general, the additional scans used by the dual- and triple-plane models improved the segmentation quality when compared with the single-plane model. In the following, we present a more detailed result analysis for both considered scenarios as well as a comparison with the inter-rater variability.

Table 4: Evaluation measures for scenario II (models are trained and evaluated on each dataset individually), averaged across all folds, for single-, dual-, and triple-plane models on the ProstateX and in-house datasets. Asterisks mark significantly better results when compared to the single-plane model; best results are marked bold (∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001). [Table body largely not recoverable from the source.]

In training scenario I (training on merged datasets), the dual-plane approach, which incorporates axial and sagittal volumes, works significantly better (p < 0.05) than the single-plane approach on both datasets and in every region of the prostate with regard to every evaluation measure. The dual-plane method achieved an average DSC of 0.933 for the whole gland (vs. 0.927 for single-plane), 0.901 (vs. 0.888) in the apex, and 0.958 (vs. 0.956) and 0.904 (vs. 0.898) for mid-gland and base, respectively. It has to be noted that the ABD and 95-HD for the mid-region are worse for dual-plane than for single-plane, but the boxplots in Fig. 3 indicate that the dual-plane model performs better when the median is considered. The triple-plane model performed significantly better (p < 0.05) than the single-plane model regarding the DSC of the whole prostate as well as of the base region. Regarding distance-based measures, only the ABD of the base region was significant (p < 0.05).

For scenario II, we find fewer significant differences between the approaches (see Table 4). This may be caused by the fact that less training data was available for each experiment. As opposed to scenario I (Table 3), where the dual-plane approach achieved the best performance for the evaluation measures in general, the triple-plane approach generally performs better than dual-plane in scenario II for each region and evaluation measure.

Figure 3: Boxplots showing (a) DSC, (b) ABD, and (c) 95-HD for the whole gland and its subregions for single- (dashed), dual- (dotted), and triple-plane (solid) models. Models were trained on merged datasets (scenario I).

Figure 4: Four examples with different characteristics. On the left, segmentations in the image plane are depicted: the left column shows the axial view, the central column the sagittal view, and the right column the coronal view. On the right, the surface distances between ground truth and prediction are shown for each approach. (a) Simple case where all approaches perform about equally well. (b) Challenging case where dual/triple-plane approaches are necessary; considering only the axial plane yields overestimation in the base region. (c) Challenging case where dual/triple-plane approaches are necessary; the segmentation in the apical region of the prostate is improved. (d) Challenging case where all approaches fail, presumably due to strong heterogeneity in the prostate gland.
Figure 5: Boxplots comparing Dice similarity coefficients for the whole gland and its subregions for models trained on only one dataset (scenario II, dashed) and on merged datasets (scenario I, solid). Results are accumulated from all folds. The quality differences between scenario I and II are not substantial, yet a slight improvement for the whole gland and most regions can be observed for models resulting from scenario I.
In general, the differences in performance between scenario II (training on individual datasets) and scenario I (training on merged datasets) were not substantial. However, we can see a slight improvement in the boxplots in Fig. 5 for the whole gland and most regions when models are trained on merged datasets.

We observed that the quantitative evaluation measures in both scenarios are considerably better for the in-house dataset than for the ProstateX data. We assume that the reason for these results is two-fold. Firstly, the numbers of cases in the datasets are not balanced: the in-house dataset had almost 50% more cases available for training (n=70) than the ProstateX dataset (n=47). Secondly, the reference annotations were created with different methods: while the annotations for the ProstateX dataset were created entirely manually, the in-house dataset was segmented semi-automatically in the first stage and later refined manually. Even when experts review and correct the semi-automatically generated segmentations, there may still be a potential bias towards the semi-automatic segmentations, which could result in more consistent segmentations than with manual delineation. One might also argue that the image quality is another factor for performance quality. However, we could not confirm this visually.

Another observation we made is that, on the one hand, the triple-plane approach performs better than dual-plane if models are trained on separate datasets (scenario II). On the other hand, the dual-plane approach is more often significantly better than single-plane than the triple-plane approach is when trained on merged datasets (scenario I). Thus, dual-plane seems to be more robust to variations in the training data if multiple data sources are used. However, the quantitative differences between dual- and triple-plane in both training scenarios are not statistically significant.
To put our automatic segmentation results into perspective, we were interested in the range of the inter-observer variability of prostate segmentation (see Table 5). In the literature, second-observer segmentation has been evaluated within the scope of the PROMISE12 challenge [35]. The authors report a mean DSC of 0.90 between two expert segmentations for the whole gland, and 0.80 and 0.86 for the apex and base, respectively. For the whole gland, they report an inter-rater 95-HD of 5.64 mm.

We carried out a similar study as part of another project, in which we asked two urologists to outline the glandular structures in the axial scans of 20 cases from the ProstateX challenge [48]. It has to be noted that those cases do not overlap with the test cases of this work. Nevertheless, they give a notion of how much two expert segmentations can vary. The inter-rater DSCs for the whole gland, apex and base for these 20 cases were 0.93, 0.90 and 0.89, respectively. The 95-HD was 3.15 mm for the whole gland, which corresponds approximately to the thickness of one slice. Comparing these results to the overall DSC of 0.93 for the dual- and triple-plane models, we are clearly in the range of inter-rater variability. However, individual cases, as shown in Fig. 4d, still indicate that automatic segmentations need to be improved further in the future.
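The DSC and 95-HD agreement values reported here can be computed directly from two raters' binary masks. A simplified sketch on synthetic masks, assuming isotropic unit spacing and taking distances over whole voxel sets rather than extracted surfaces (a common approximation; surface-based 95-HD can differ):

```python
# Inter-rater agreement metrics between two binary masks: Dice and 95-HD.
# Simplified sketch: isotropic 1-voxel spacing, volume-based distances.
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hd95(a: np.ndarray, b: np.ndarray) -> float:
    """95th-percentile symmetric distance between voxel sets (voxel units)."""
    a, b = a.astype(bool), b.astype(bool)
    dist_to_a = distance_transform_edt(~a)  # per-voxel distance to nearest voxel of a
    dist_to_b = distance_transform_edt(~b)
    d_ba = dist_to_a[b]  # distances from b's voxels to a
    d_ab = dist_to_b[a]  # distances from a's voxels to b
    return float(np.percentile(np.hstack([d_ab, d_ba]), 95))

# Two overlapping synthetic "prostate" blobs, offset by one voxel
a = np.zeros((32, 32, 32), bool); a[8:24, 8:24, 8:24] = True
b = np.zeros((32, 32, 32), bool); b[9:25, 9:25, 9:25] = True
print(f"DSC={dice(a, b):.3f}, HD95={hd95(a, b):.2f} voxels")
```

For real scans, the voxel distances would be scaled by the physical spacing so that 95-HD is reported in mm, as in Table 5.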
We compared our triple-plane architecture, which processes all orthogonal images simultaneously and directly outputs a prostate segmentation, with an ensemble approach from the literature [32, 33]. In the ensemble approach, three independent 3D models are trained, one per image orientation, and their outputs are combined in a post-processing step. We trained three single-plane models and used majority voting to compute the final segmentation. The experiment was performed on the ProstateX dataset, and results were averaged across 4 folds. The results are listed in Table 6. No significant differences were found between the two methods for any region or evaluation measure (Wilcoxon signed-rank test). The results are also in line with the outcome of our study that multi-planar input improves over single-plane input.

Although no differences were found, we consider the multi-stream approach superior to the ensemble because it requires fewer parameters (by a factor of 2.7) and is therefore easier to deploy in production. Moreover, using a common decoder for all image orientations (as in the multi-stream architecture) can be seen as a regularizer, which can help to minimize the generalization error on other datasets and tasks. For the ensemble, we also evaluated output combination using shape-based interpolation, but it performed worse than majority voting.

Table 5: Evaluation measures for inter-rater variability

                           Inter-Observer
                    PROMISE12 (n=30)  ProstateX (n=20)
  DSC        Whole       0.90              0.93
             Apex        0.80              0.90
             Mid         n.a.              0.96
             Base        0.86              0.89
  ABD (mm)   Whole       1.82              0.66
             Apex        2.55              0.63
             Mid         n.a.              0.49
             Base        2.21              0.86
  95-HD (mm) Whole       5.64              3.15
             Apex        6.36              2.84
             Mid         n.a.              2.02
             Base        6.28              3.56

Table 6: Comparison of two methods (ensemble and triple-plane) for generating segmentations from tri-planar input. No significant differences were found.
Conclusion and Future Work

We proposed an anisotropic 3D multi-stream segmentation CNN that allows the incorporation of different numbers of orthogonal input volumes. The objective of our work was to determine whether segmentation performance could be increased by incorporating sagittal and coronal volumes. To allow for a fair comparison between the single-, dual- and triple-plane approaches, we included an automatic hyperparameter optimization strategy.

The most important finding of this work is that the multi-planar strategies significantly improve segmentation performance compared to using only axial volumes in almost all cases. The quantitative differences between the three proposed approaches may not be large, but depending on the clinical application, the improved accuracy can be critical for the preservation of structures like the external sphincter, bladder, or seminal vesicles. The clinical utility of the multi-planar approaches will be addressed in future work. Whether to prefer the dual- or the triple-plane variant could not be answered unequivocally. However, the dual-plane approach seems to be a good trade-off between computational cost and segmentation quality.

Future work will include an automatic registration among the orthogonal scans to compensate for potential transformations between them. This may lead to an increased performance of the multi-planar approaches, as the manual registration may not compensate for all motion artifacts and may be less precise than an automatic method. Another field of future research will be the detailed investigation of the multi-stream network architecture. For example, the location where the encoders are merged could be examined further.

The quality of our results is comparable to the inter-rater variability. However, as mentioned above, some negative outliers would never have been produced by a medical expert.
Hence, future work should also investigate how those outliers could be detected automatically and how much correction time would be required to achieve clinically acceptable segmentations. Furthermore, it would be interesting to apply our multi-stream architecture to other clinical use cases in which multi-planar imaging is acquired (e.g., cardiac MRI).
Conflict of Interest
This work has been funded by the EU and the federal state of Saxony-Anhalt, Germany under grant number ZS/2016/08/80388. Co-funding was provided by the Fraunhofer-Society. The Titan Xp used for this research was donated by the NVIDIA Corporation. Data used in this research were obtained from The Cancer Imaging Archive (TCIA), sponsored by the SPIE, NCI/NIH, AAPM, and Radboud University.
References

[1] R. L. Siegel, K. D. Miller, A. Jemal, Cancer statistics, 2018, CA Cancer J Clin 68 (1) (2018) 7–30.
[2] V. Shah, T. Pohida, B. Turkbey, H. Mani, M. Merino, P. A. Pinto, P. Choyke, M. Bernardo, A method for correlating in vivo prostate magnetic resonance imaging and histopathology using individualized magnetic resonance-based molds, Rev Sci Instrum 80 (10) (2009) 104301.
[3] M. A. Schmidt, G. S. Payne, Radiotherapy planning using MRI, Phys Med Biol 60 (22) (2015) R323–61.
[4] A. Fedorov, S. Khallaghi, A. C. Sánchez, A. Lasso, S. Fels, K. Tuncali, E. S. Neubauer, T. Kapur, C. Zhang, W. Wells, P. L. Nguyen, P. Abolmaesumi, C. Tempany, Open-source image registration for MRI-TRUS fusion-guided prostate interventions, Int J Comput Assist Radiol Surg 10 (6) (2015) 925–934.
[5] C. J. Das, A. Razik, A. Netaji, S. Verma, Prostate MRI–TRUS fusion biopsy: a review of the state of the art procedure, Abdominal Radiology (Jan 2020). doi:10.1007/s00261-019-02391-8.
[6] S. Ghose, A. Oliver, R. Martí, X. Lladó, J. C. Vilanova, J. Freixenet, J. Mitra, D. Sidibé, F. Meriaudeau, A survey of prostate segmentation methodologies in ultrasound, magnetic resonance and computed tomography images, Comput Methods Programs Biomed 108 (1) (2012) 262–287.
[7] S. Liao, Y. Gao, A. Oto, D. Shen, Representation learning: a unified deep learning framework for automatic prostate MR segmentation, Med Image Comput Comput Assist Interv 16 (2) (2013) 254–261.
[8] Y. Guo, Y. Gao, D. Shen, Deformable MR prostate segmentation via deep feature learning and sparse patch matching, IEEE Trans Med Imaging 35 (4) (2016) 1077–1089.
[9] H. Jia, Y. Xia, W. Cai, M. Fulham, D. D. Feng, Prostate segmentation in MR images using ensemble deep convolutional neural networks, in: Proc IEEE 14th Int Symp Biomed Imaging (ISBI), IEEE, 2017, pp. 762–765.
[10] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit, 2015, pp. 3431–3440.
[11] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: Med Image Comput Comput Assist Interv, 2015, pp. 234–241.
[12] Z. Tian, L. Liu, B. Fei, Deep convolutional neural network for prostate MR segmentation, in: Proc SPIE Int Soc Opt Eng, Vol. 10135, 2017, p. 101351L.
[13] K. Yan, X. Wang, J. Kim, M. Khadra, M. Fulham, D. Feng, A propagation-DNN: deep combination learning of multi-level features for MR prostate segmentation, Comput Methods Programs Biomed 170 (2019) 11–21.
[14] Q. Zhu, B. Du, B. Turkbey, P. L. Choyke, P. Yan, Deeply-supervised CNN for prostate segmentation, in: Proc Int Jt Conf Neural Netw, 2017, pp. 178–184. doi:10.1109/IJCNN.2017.7965852.
[15] R. Cheng, H. R. Roth, N. Lay, L. Lu, B. I. Turkbey, W. Gandler, E. S. McCreedy, P. Choyke, R. M. Summers, M. J. McAuliffe, Automatic MR prostate segmentation by deep learning with holistically-nested networks, in: Proc SPIE Int Soc Opt Eng, 2017, pp. 101332H–101332H.
[16] B. Wang, Y. Lei, S. Tian, T. Wang, Y. Liu, P. Patel, A. B. Jani, H. Mao, W. J. Curran, T. Liu, X. Yang, Deeply supervised 3D fully convolutional networks with group dilated convolution for automatic MRI prostate segmentation, Med Phys 46 (4) (2019) 1707–1718.
[17] B. Wang, Y. Lei, J. J. Jeong, T. Wang, Y. Liu, S. Tian, P. Patel, X. Jiang, A. B. Jani, H. Mao, W. J. Curran, T. Liu, X. Yang, Automatic MRI prostate segmentation using 3D deeply supervised FCN with concatenated atrous convolution, in: Proc SPIE Int Soc Opt Eng, 2019, p. 141.
[18] L. Yu, X. Yang, H. Chen, J. Qin, P.-A. Heng, Volumetric ConvNets with mixed residual connections for automated prostate segmentation from 3D MR images, in: Proc Conf AAAI Artif Intell, 2017, pp. 66–72.
[19] M. S. Hossain, A. P. Paplinski, J. M. Betts, Residual semantic segmentation of the prostate from magnetic resonance images, in: L. Cheng, A. C. S. Leung, S. Ozawa (Eds.), Neural Information Processing, Vol. 11307 of Lecture Notes in Computer Science, Springer International Publishing, Cham, 2018, pp. 510–521.
[20] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).
[21] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc 29th IEEE Comput Soc Conf Comput Vis Pattern Recognit, IEEE, 2016, pp. 770–778.
[22] H. Jia, Y. Song, D. Zhang, H. Huang, D. Feng, M. Fulham, Y. Xia, W. Cai, 3D global convolutional adversarial network for prostate MR volume segmentation, arXiv preprint arXiv:1807.06742 (2018).
[23] H. Jia, Y. Xia, Y. Song, D. Zhang, H. Huang, Y. Zhang, W. Cai, 3D APA-Net: 3D adversarial pyramid anisotropic convolutional network for prostate segmentation in MR images, IEEE Trans Med Imaging (2019).
[24] T. Hassanzadeh, L. G. C. Hamey, K. Ho-Shon, Convolutional neural networks for prostate magnetic resonance image segmentation, IEEE Access 7 (2019) 36748–36760.
[25] Y. Yuan, W. Qin, X. Guo, M. Buyyounouski, S. Hancock, B. Han, L. Xing, Prostate segmentation with encoder-decoder densely connected convolutional network (ED-DenseNet), in: Proc IEEE 16th Int Symp Biomed Imaging (ISBI), 2019, pp. 434–437.
[26] Q. Zhu, B. Du, P. Yan, Boundary-weighted domain adaptive neural network for prostate MR image segmentation, arXiv preprint arXiv:1902.08128 (2019).
[27] Q. Zhu, B. Du, J. Wu, P. Yan, A deep learning health data analysis approach: automatic 3D prostate MR segmentation with densely-connected volumetric ConvNets, in: Proc Int Jt Conf Neural Netw, IEEE, 2018, pp. 1–6.
[28] M. N. N. To, D. Q. Vu, B. Turkbey, P. L. Choyke, J. T. Kwak, Deep dense multi-path neural network for prostate segmentation in magnetic resonance imaging, Int J Comput Assist Radiol Surg 13 (11) (2018) 1687–1696.
[29] Q. Liu, M. Fu, X. Gong, H. Jiang, Densely dilated spatial pooling convolutional network using benign loss functions for imbalanced volumetric prostate segmentation, arXiv preprint arXiv:1801.10517 (2018).
[30] T. Brosch, J. Peters, A. Groth, T. Stehle, J. Weese, Deep learning-based boundary detection for model-based segmentation with application to MR prostate segmentation, in: Med Image Comput Comput Assist Interv, 2018, pp. 515–522.
[31] J. C. Weinreb, J. O. Barentsz, P. L. Choyke, F. Cornud, M. A. Haider, K. J. Macura, D. Margolis, M. D. Schnall, F. Shtern, C. M. Tempany, H. C. Thoeny, B. Turkbey, A. Rosenkrantz, G. Villeirs, S. Verma, PI-RADS Prostate Imaging - Reporting and Data System: 2019, Version 2.1, Eur. Urol. 69 (1) (2016) 16–40.
[32] R. Cheng, N. Lay, F. Mertan, B. Turkbey, H. R. Roth, L. Lu, W. Gandler, E. S. McCreedy, T. Pohida, P. Choyke, M. J. McAuliffe, R. M. Summers, Deep learning with orthogonal volumetric HED segmentation and 3D surface reconstruction model of prostate MRI, in: Proc IEEE 14th Int Symp Biomed Imaging (ISBI), 2017, pp. 749–753.
[33] R. Cabrera Lozoya, A. Iannessi, J. Brag, S. Patriti, E. Oubel, Assessing the relevance of multi-planar MRI acquisitions for prostate segmentation using deep learning techniques, in: Proc SPIE Int Soc Opt Eng, 2018, p. 45.
[34] L. Yu, X. Yang, H. Chen, J. Qin, P.-A. Heng, Volumetric ConvNets with mixed residual connections for automated prostate segmentation from 3D MR images, in: Proc Conf AAAI Artif Intell, 2017, pp. 66–72.
[35] G. Litjens, R. Toth, W. van de Ven, C. Hoeks, S. Kerkstra, B. van Ginneken, G. Vincent, G. Guillard, N. Birbeck, J. Zhang, et al., Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge, Med Image Anal 18 (2) (2014) 359–373.
[36] A. Meyer, A. Mehrtash, M. Rak, D. Schindele, M. Schostak, C. Tempany, T. Kapur, P. Abolmaesumi, A. Fedorov, C. Hansen, Automatic high resolution segmentation of the prostate from multi-planar MRI, in: Proc IEEE 15th Int Symp Biomed Imaging (ISBI), 2018, pp. 177–181. doi:10.1109/ISBI.2018.8363549.
[37] D. Schindele, A. Meyer, D. F. von Reibnitz, V. Kiesswetter, M. Schostak, M. Rak, C. Hansen, High resolution prostate segmentations for the ProstateX-Challenge [Data set], The Cancer Imaging Archive (2020).
[38] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3D U-Net: learning dense volumetric segmentation from sparse annotation, in: Med Image Comput Comput Assist Interv, 2016, pp. 424–432.
[39] S. Falkner, A. Klein, F. Hutter, BOHB: robust and efficient hyperparameter optimization at scale, arXiv preprint arXiv:1807.01774 (2018).
[40] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar, Hyperband: a novel bandit-based approach to hyperparameter optimization, arXiv preprint arXiv:1603.06560 (2016).
[41] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, N. De Freitas, Taking the human out of the loop: a review of Bayesian optimization, Proc IEEE 104 (1) (2015) 148–175.
[42] G. T. Herman, J. Zheng, C. A. Bucholtz, Shape-based interpolation, IEEE Comput Graph Appl 12 (3) (1992) 69–79.
[43] A. Fedorov, R. Beichel, J. Kalpathy-Cramer, J. Finet, J.-C. Fillion-Robin, S. Pujol, C. Bauer, D. Jennings, F. Fennessy, M. Sonka, et al., 3D Slicer as an image computing platform for the quantitative imaging network, Magn Reson Imaging 30 (9) (2012) 1323–1341.
[44] G. Litjens, O. Debats, J. Barentsz, N. Karssemeijer, H. Huisman, ProstateX Challenge data, The Cancer Imaging Archive (2017).
[45] G. Litjens, O. Debats, J. Barentsz, N. Karssemeijer, H. Huisman, Computer-aided detection of prostate cancer in MRI, IEEE Trans Med Imaging 33 (5) (2014) 1083–1092. doi:10.1109/TMI.2014.2303821.
[46] K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, F. Prior, The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository, Journal of Digital Imaging 26 (6) (2013) 1045–1057.
[47] D. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[48] A. Meyer, M. Rak, D. Schindele, S. Blaschke, M. Schostak, A. Fedorov, C. Hansen, Towards patient-individual PI-RADS v2 sector map: CNN for automatic segmentation of prostatic zones from T2-weighted MRI, in: Proc IEEE 16th Int Symp Biomed Imaging (ISBI), 2019, pp. 696–700. doi:10.1109/ISBI.2019.8759572.