Learning Multi-Modal Volumetric Prostate Registration with Weak Inter-Subject Spatial Correspondence
Oleksii Bashkanov, Anneke Meyer, Daniel Schindele, Martin Schostak, Klaus Tönnies, Christian Hansen, Marko Rak
Faculty of Computer Science & Research Campus STIMULATE, University of Magdeburg, Germany
Clinic of Urology and Pediatric Urology, University Hospital Magdeburg, Germany
ABSTRACT
Recent studies demonstrated the eligibility of convolutional neural networks (CNNs) for solving the image registration problem. CNNs enable faster transformation estimation and greater generalization capability needed for better support during medical interventions. Conventional fully-supervised training requires a lot of high-quality ground truth data such as voxel-to-voxel transformations, which typically are obtained in a tedious and error-prone manner. In our work, we use weakly-supervised learning, which optimizes the model indirectly only via segmentation masks that are a more accessible ground truth than the deformation fields. Concerning the weak supervision, we investigate two segmentation similarity measures: the multiscale Dice similarity coefficient (mDSC) and the similarity between segmentation-derived signed distance maps (SDMs). We show that the combination of the mDSC and SDM similarity measures results in a more accurate and natural transformation pattern together with a stronger gradient coverage. Furthermore, we introduce an auxiliary input to the neural network for the prior information about the prostate location in the MR sequence, which is mostly available preoperatively. This approach significantly outperforms the standard two-input models. With weakly labelled MR-TRUS prostate data, we showed registration quality comparable to the state-of-the-art deep learning-based method.
Index Terms — multi-modal registration, image-guided prostate biopsy, weakly-supervised learning
1. INTRODUCTION
Prostate cancer is the second most prevalent cancer diagnosis among older men. A recent review [1] reports that 1,276,106 new cases and 358,989 deaths were registered in 2018 worldwide. Early-stage diagnosis is one of the essential countermeasures, easing treatment of prostate cancer and lowering mortality risks. Medical image registration plays a significant role in this context, supporting physicians during their image-guided biopsies and therapies. Transrectal ultrasound (TRUS) biopsies are among the most relevant techniques for early diagnosis of prostate cancer. However, TRUS alone does not provide the necessary soft-tissue contrast. Therefore, biopsies are usually preceded by magnetic resonance (MR) imaging to identify suspicious regions beforehand. During the actual biopsy procedure, the information from MR and TRUS needs to be registered in real time to steer the tissue sampling.

The main challenge of the underlying registration problem is the massive difference in the appearance of the prostate between MR and TRUS sequences, causing many classical iterative intensity-based methods to fail. The alternative would be to use segmentation-driven registration methods. While MR image segmentation is done preoperatively, the TRUS sequence requires a manual intra-procedural segmentation, which prolongs the biopsy. A recent practical survey [2] revealed that CNNs are the most promising way to address this challenge, enabling real-time MR-TRUS prostate registration without any additional manual guidance or effort.

Early research focused on segmentation-driven registration.
For instance, [3] presented two techniques: B-spline registration of segmentation-derived signed distance maps and a biomechanically constrained surface registration. In both cases, an intra-procedural prostate segmentation is required. Since many intensity-based similarity measures are prone to fail in multi-modal MR-TRUS settings, [4] proposed to learn image similarity through CNNs, outperforming mutual information on MR-TRUS pairs. However, this approach has its drawbacks. The learned similarity measure needs to be enveloped by an iterative registration, which contradicts the real-time requirement. Moreover, the training of similarity measures requires that voxel correspondence is available as ground truth, which is challenging to collect. This is especially true for TRUS because prostate deformations are involved due to the transrectal ultrasound transducer.

To address the ground truth collection problem, [5] proposed an agent-based generation of synthetic ground truth for deformable registration, which can be jointly used with real ground truth to train prostate registration in an MR-based setting. However, strategies already exist to avoid dense voxel correspondence altogether. To this end, [6, 7] proposed to use a weakly-supervised technique, combining sparse correspondence from anatomical landmarks and prostate gland segmentations into a weaker proxy measure. This neat idea reduces the effort involved in the creation of the ground truth data, but still requires tedious segmentation of the corresponding prostate anatomical landmarks on both sequences.

In this work, we go one step further, proposing to only use the available prostate gland segmentations as a weak proxy measure between the two modalities. This approach greatly reduces the effort spent in the creation of the ground truth data compared to the voxel-to-voxel correspondence or the patient-specific landmark pairs.
Apart from that, the multitude of research behind automatic prostate segmentation indicates a great potential for automating this task [8, 9, 10]. This can enable automatic annotation of massive datasets, where human input would be merely impractical on such a scale.

To appear in Proc. ISBI 2021, April 13-16, 2021, Nice, France. © IEEE 2021

We compare three approaches of utilizing segmentation similarity measures to optimize registration models: based on the mDSC, on the similarity between segmentation-derived SDMs, and on the combination of the two. Moreover, we show that the prior information about the prostate location on the preoperative MR sequences, for which segmentations are mostly available beforehand, can significantly improve the overall registration performance.
2. METHODS
Our pipeline consists of two stages: an initial coarse alignment of the regions of interest and a subsequent deformable fine registration. In this section, we provide the details for both stages.
In the first stage, we seek to guarantee a coarse alignment of both images. To this end, we noticed that in both images, the prostate is oriented according to the axes of the coordinate system (see Fig. 1), meaning that there is little to no rotation involved. Therefore, we could use the axes as a reference frame for coarse initialization. Regarding translation, we found that for TRUS, the prostate is approximately centrally positioned in the image due to the acquisition. This need not always be true for the MR images. To compensate for the difference, we used the available MR-based prostate segmentation to align its center of mass with the TRUS center. After the alignment, we resampled the MR according to the TRUS image and cropped both images to guarantee a fixed input size for our CNN. Typically, the MR segmentation is done by a radiologist before the biopsy procedure; however, this step can be automated easily in this stage, for example, using U-Net-based techniques [8].
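The center-of-mass initialization described above can be sketched as follows. This is a minimal NumPy/SciPy version: the function name, the constant-fill boundary handling, and the interpolation orders are our assumptions, and the resampling of the MR to the TRUS grid is omitted.

```python
import numpy as np
from scipy.ndimage import center_of_mass, shift

def coarse_align(mr_image, mr_mask, trus_shape):
    """Translate the MR volume so that the prostate's center of mass
    (taken from the MR segmentation) lands on the TRUS volume center."""
    com = np.array(center_of_mass(mr_mask))       # prostate centroid in MR voxels
    target = (np.array(trus_shape) - 1) / 2.0     # TRUS volume center
    offset = target - com
    # order=1: trilinear for the image; order=0 keeps the mask binary
    mr_aligned = shift(mr_image, offset, order=1, mode="constant")
    mask_aligned = shift(mr_mask.astype(float), offset, order=0, mode="constant")
    return mr_aligned, mask_aligned
```

After this translation, both volumes can be cropped to the fixed CNN input size around the shared center.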
In the second stage, a deformation field is predicted to establish the spatial correspondence between voxels of the MR (moving) and TRUS (fixed) image using a one-step CNN. We trained our network in a weakly-supervised fashion. Specifically, the training was driven by the overlap between the labels of the moving and fixed image. Standard binary overlap measures like the Dice similarity coefficient (DSC), widely used for segmentation problems, can fail for registration problems. For instance, in case of a complete label mismatch, any network will struggle to adapt its parameters due to zero gradients. To overcome this problem, we used the method presented in [7], which uses a multiscale DSC (mDSC) that averages over a
Fig. 1. Illustration of the coarse alignment via region-of-interest definition on the MR sequence, based on the center of mass of the MR segmentation and the target image size derived from the TRUS volume.
[Fig. 2 diagram: 3D U-Net-style encoder-decoder. Legend: conv 4×4×4 + LReLU(0.1) + BN with 2×2×2 max pooling; conv 4×4×4 + LReLU(0.1) + BN with 3D upsampling; conv 4×4×4 + LReLU(0.1) + BN; input; concatenation; additive trilinear upsampling; conv 3×3×3. Inputs: MR mask*, MR, TRUS; output: final DDF.]
Fig. 2. Network architecture with a cumulative summation of the transformation from the coarse- up to the full-scale level. *The third input for the MR segmentation is optional.

battery of subsequently Gaussian-smoothed binary masks. Specifically, we used

$$\mathcal{L}_{\mathrm{DSC}}(p, g) = \frac{1}{Z} \sum_{z \in \sigma} \frac{2 \sum_i^N p_{z,i} \cdot g_{z,i}}{\sum_i^N p_{z,i} + \sum_i^N g_{z,i}}, \qquad (1)$$

where $Z$ is the number of smoothness levels, and $p$ and $g$ are the registered moving and the fixed binary segmentation, respectively. We utilized Gaussian filters of five sizes $\sigma$; the special case $\sigma = 0$ indicates that no smoothing is applied, which recovers the standard DSC definition. The idea is that larger Gaussians will contribute more to the global convergence of the deformation, while smaller Gaussians will carve out the fine detail. A useful by-product of this mDSC definition is the inherent smoothness of the deformation.

The add-on to the regular DSC mentioned above is unable to cover the whole deformable region with strong gradients. To overcome this problem, we propose to use segmentation-derived Euclidean signed distance maps (SDMs) as a driving force to increase the overlap between $p$ and $g$. This strategy has already been shown to be favourable in the context of iterative prostate registration [3]. In our case, we sought to minimize the mean squared logarithmic error (MSLE) between the pre-computed deformed $\hat{p}$ and fixed $\hat{g}$ SDMs. MSLE operates on all positive sides of the SDMs and thus propagates the gradients to the area outside the prostate. Presumably, the resultant deformation field should imitate how the prostate is deformed during the procedure, with external forces applied to it. Another useful property is that the regions closer to the prostate boundary will be penalized more strongly.
$$\mathcal{L}_{\mathrm{SDM}}(\hat{p}, \hat{g}) = \frac{1}{N} \sum_{i=1}^{N} \left( \log(\hat{p}_i + 1) - \log(\hat{g}_i + 1) \right)^2 \qquad (2)$$

To ensure the smoothness of the deformations, we additionally applied bending energy [11] as a second-order smoothness penalty $\mathcal{T}$ for the volumetric dense deformation field (DDF) $u$, where only non-linear deformations are penalized while global affine transformations are allowed:

$$\mathcal{T}(u) = \int_x \int_y \int_z \left[ \left( \frac{\partial^2 u}{\partial x^2} \right)^{\!2} + \left( \frac{\partial^2 u}{\partial y^2} \right)^{\!2} + \left( \frac{\partial^2 u}{\partial z^2} \right)^{\!2} \right] \partial x\, \partial y\, \partial z. \qquad (3)$$

We utilized a 3D U-Net-like architecture (Fig. 2), actively used in medical image analysis [12]. The first part of this architecture operates over successively coarser resolution scaling levels.

[Fig. 3. Qualitative comparison of the proposed approaches: (a) multiscale Dice similarity coefficient (mDSC), (b) signed distance maps (SDM), (c) mixed strategy (MIX), and (d) its variation with mask (MIX+).]

Our network architecture comprises 3D convolutional layers with a kernel size of 4×4×4, a stride of 1×1×1, and same padding for the down- and upsampling parts. At the first level, the convolution layer outputs four filters, followed by a Leaky ReLU activation with a negative slope of 0.1 and a subsequent batch normalization layer. This layer is then downsampled by a 3D max-pooling operation with a stride of 2×2×2. With every next level moving down, the number of filters in the feature space is doubled (see Fig. 2). The residual convolution layers from the upsampling part share the kernel size of 4×4×4.

We trained our models with a total batch size of 5. The Adam optimization algorithm was used. All configurations were trained for 300 epochs at most (33,000 mini-batch iterations), while the best models were selected on the best validation DSC. The 3D affine image augmentation, together with a trilinear resampling layer and a differentiable deformation layer, was borrowed from [7], which is an adapted version of open-source methods from NiftyNet [13].
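The three loss terms, Eqs. (1)–(3), can be sketched in NumPy/SciPy as below. This is illustrative only: the Gaussian sizes are placeholders (the paper's exact σ values are not reproduced here), the SDM sign convention (positive outside the gland) and the clipping of negative values before the logarithm are our assumptions based on the statement that MSLE "operates on all positive sides" of the SDMs, and the bending energy is approximated by finite differences over the pure second derivatives shown in Eq. (3).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, distance_transform_edt

def multiscale_dsc(p, g, sigmas=(0, 1, 2, 4, 8)):
    """Eq. (1): average the soft Dice overlap of Gaussian-smoothed copies of
    the warped moving mask p and the fixed mask g. sigma = 0 means no
    smoothing (standard soft DSC). The sigma values are illustrative."""
    scores = []
    for s in sigmas:
        ps = gaussian_filter(p.astype(float), s) if s > 0 else p.astype(float)
        gs = gaussian_filter(g.astype(float), s) if s > 0 else g.astype(float)
        num = 2.0 * (ps * gs).sum()
        den = ps.sum() + gs.sum() + 1e-8      # epsilon guards empty masks
        scores.append(num / den)
    return np.mean(scores)                    # similarity score to maximize

def signed_distance_map(mask):
    """Signed Euclidean distance map: positive outside the prostate,
    negative inside (a common convention; the paper's sign may differ)."""
    mask = mask.astype(bool)
    return distance_transform_edt(~mask) - distance_transform_edt(mask)

def sdm_msle(sdm_p, sdm_g):
    """Eq. (2): mean squared logarithmic error between the deformed moving
    and fixed SDMs, evaluated on the non-negative (outside) part."""
    p = np.clip(sdm_p, 0, None)   # assumption: negative (inside) values
    g = np.clip(sdm_g, 0, None)   # are clipped before the logarithm
    return np.mean((np.log1p(p) - np.log1p(g)) ** 2)

def bending_energy(ddf):
    """Eq. (3) by finite differences: mean squared second derivatives of
    each DDF component; zero for any affine transformation."""
    total = 0.0
    for c in range(ddf.shape[-1]):            # ddf shape: (X, Y, Z, 3)
        comp = ddf[..., c]
        for axis in range(3):
            d2 = np.gradient(np.gradient(comp, axis=axis), axis=axis)
            total += np.mean(d2 ** 2)
    return total
```

Note how the Gaussian battery gives disjoint masks a non-zero overlap at large σ, which is exactly the gradient-coverage property the mDSC is meant to provide.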
Our dataset contained 155 segmented TRUS and T2-weighted MR image pairs. For each patient, the MR records were acquired in a transversal scanning position on a Philips Achieva 3T system. For the biopsies, the BK 3000 ultrasound system, together with an endorectal biplane transducer, was used to acquire the image sequences. While the prostate glands from the MRI scans were segmented semi-automatically by a radiologist with the commercial software Philips DynaCAD beforehand, the TRUS segmentation masks were acquired during the biopsy intervention in a semi-supervised manner using the commercial software Philips UroNav. Since not all TRUS segmentations were of the best quality due to the intra-procedural acquisition, 21 held-out test cases were thoroughly re-segmented on the TRUS images for a reliable evaluation. Moreover, 73 new anatomical landmark pairs in the form of patient-specific calcifications and cysts were annotated and verified by an experienced urologist on this data for a detailed evaluation of the methods.

Both MR and TRUS images were resampled to a common voxel size to place them into the same world coordinate system. For the resampling procedure on the image data, we utilized B-spline interpolation, whereas for the segmentation masks, linear interpolation followed by a binarization operation was used. Afterwards, all images were cropped to a fixed volume size. Subsequently, both TRUS and MR image intensities up to the 99th percentile were normalized to the interval [0, 1]. During the training, we separately augmented the input data on the fly by resampling the TRUS and MR images with a small randomly generated affine transformation, without flipping.
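The intensity normalization step can be sketched as follows. Clipping at the 99th percentile before rescaling is our assumption about how "intensities up to the 99th percentile were normalized to [0, 1]" was implemented; the paper does not spell out the clipping behaviour.

```python
import numpy as np

def normalize_intensity(volume, pct=99.0):
    """Rescale a volume to [0, 1], saturating intensities above the
    given percentile (assumed clipping behaviour, not confirmed)."""
    hi = np.percentile(volume, pct)
    lo = volume.min()
    v = np.clip(volume, lo, hi)                # outliers above pct saturate
    return (v - lo) / max(hi - lo, 1e-8)       # guard against flat volumes
```

Percentile-based clipping makes the normalization robust to the bright speckle outliers typical of ultrasound.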
Experiments

We performed several experiments with a five-fold cross-validation scheme, showing that the weakly-supervised learning strategy can achieve state-of-the-art performance in the automatic transformation estimation task with considerably reduced time and effort spent in the creation of the ground truth data.

First, we aimed to investigate how the mDSC- and SDM-driven registration methods perform on the registration task individually. Then, we examined the idea of combining these two approaches and how it impacts the deformation behaviour. Lastly, we compared the registration accuracy between the models with and without the third auxiliary input for the MR segmentation. We hypothesize that this prior information provides substantial support in identifying the prostate location in the moving images.
In our experiments, the loss function combines the two label similarity metrics with the smoothness penalty:

$$\mathcal{L} = \alpha \mathcal{L}_{\mathrm{DSC}} + \beta \mathcal{L}_{\mathrm{SDM}} + \gamma \mathcal{T}, \qquad (4)$$

where $\mathcal{L}_{\mathrm{DSC}}$ and $\mathcal{L}_{\mathrm{SDM}}$ denote the mDSC score and the similarity error between the SDMs, respectively. The bending energy $\mathcal{T}$ is weighted by the term $\gamma$, with $\gamma = 1 - \alpha - \beta$. To compare the different strategies, we trained the models independently with empirically chosen hyper-parameters: $\beta = 0$ for the mDSC-only model and $\alpha = 0$ for the SDM-only model. The final mixed strategy (MIX) combines the multiscale DSC and the similarity of the SDMs with non-zero $\alpha$ and $\beta$, since it yielded the best results.
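The weighting of Eq. (4) can be written as below. Negating the mDSC score to turn the similarity into a loss to minimize is our assumption; the paper writes the sum without making the sign convention explicit.

```python
def combined_loss(l_dsc, l_sdm, bending, alpha, beta):
    """Eq. (4): weighted sum of the (negated) mDSC score, the SDM similarity
    error, and the bending-energy penalty, with gamma = 1 - alpha - beta.
    Negating the mDSC converts the similarity score into a loss (assumption)."""
    assert 0.0 <= alpha + beta <= 1.0, "weights must leave gamma non-negative"
    gamma = 1.0 - alpha - beta
    return -alpha * l_dsc + beta * l_sdm + gamma * bending
```

Tying γ to 1 − α − β keeps the three terms on a single simplex, so tuning two weights fixes the third.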
3. RESULTS AND DISCUSSION
For each experiment, we measured the registration accuracy with the DSC score on the whole prostate gland and its three anatomical regions (base, mid, apex), together with a patient-based TRE calculated as the mean distance between the centers of mass of the corresponding anatomical landmarks. The coarse pre-alignment on the test data produced a median DSC score of 0.76 and a median TRE of 7.4 mm, which is sufficient for the initialization stage.

Visual inspection revealed that the mDSC-based models tend to stretch the inner organ structures to its boundaries, whereas the SDM-based models demonstrated a more uniform and natural deformation pattern (Fig. 3), also resulting in a better TRE score. However, the SDM model fails to produce smooth deformations according to the second-order smoothness measure ∇|J| (see Fig. 4).

Table 1. Quantitative comparison of the registration methods in terms of the Dice similarity coefficient (DSC), the target registration error (TRE), and the mean gradient of the Jacobian determinant. The results are in the mean (median) ± SD format. The symbol ↑ indicates that a higher score is better, ↓ that a lower score is better.
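The patient-based TRE described above, the mean distance between centers of mass of corresponding landmark segmentations, can be sketched as follows. The function name, the landmark format (one binary volume per landmark), and the voxel-spacing handling are our assumptions.

```python
import numpy as np
from scipy.ndimage import center_of_mass

def target_registration_error(landmarks_fixed, landmarks_moved,
                              spacing=(1.0, 1.0, 1.0)):
    """TRE as the mean Euclidean distance (in mm) between the centers of
    mass of corresponding landmark segmentations (binary volumes)."""
    spacing = np.asarray(spacing)
    dists = []
    for lf, lm in zip(landmarks_fixed, landmarks_moved):
        cf = np.array(center_of_mass(lf)) * spacing   # voxel index -> mm
        cm = np.array(center_of_mass(lm)) * spacing
        dists.append(np.linalg.norm(cf - cm))
    return float(np.mean(dists))
```

Using centers of mass makes the metric robust to small differences in how each calcification or cyst was outlined.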
[Table 1: rows Coarse, mDSC, SDM, MIX, MIX+; columns DSC (whole) ↑, TRE (mm) ↓, ∇|J| ↓.]
Fig. 4. Registration performance comparison of the multiscale Dice similarity coefficient (mDSC), signed distance maps (SDM), and mixed strategy (MIX), together with its variation with mask (MIX+), in terms of the Dice similarity coefficient (DSC), the target registration error (TRE), and the mean gradient of the Jacobian determinant. The coarse alignment indicates the starting conditions.

Since the mDSC approach entails a regularization effect on the DDF, it is reasonable to combine it with the SDM. The performance score of this approach indicates that the MIX learning yields a much smoother transformation than the SDMs while preserving the high registration accuracy.

Considering the auxiliary input of the network (MIX+), we found that the additional information in the form of the MR segmentation mask helped to achieve significantly better results for DSC and TRE than the standard two-image input (significant for both metrics in a paired Wilcoxon signed-rank test). These findings suggest that with such prior information, the network can more easily figure out where to move, without looking for what to move. In general, our MIX+ model with the auxiliary input reached state-of-the-art performance with a median TRE of 4.3 mm and a DSC of 0.89, which is similar to the results demonstrated by Hu et al. [7] with a TRE of 3.6 mm and a DSC of 0.88. In contrast to our method, they utilize plenty of anatomical landmarks during training, whereas our training relied purely on the fixed and moving segmentation masks. It is also worth noting that up to this point there is no similar publicly available dataset that would contain such MR-TRUS pairs per patient and enable a more transparent comparison.

In future work, it is reasonable to investigate the impact of deep-learning-generated segmentation masks on the registration performance. For instance, the coarse pre-alignment can be improved by having two segmentations during the inference stage instead of one MR mask, as it is now.
Consequently, a better starting position can lead to a more accurate fine registration. This will also enable the comparison of segmentation-based iterative registration methods on automatically created segmentations with the instant DDF prediction methods.
4. CONCLUSION
This work demonstrated that it is feasible to learn an effective registration model based solely on the segmentations of the prostate glands. The combination of the two segmentation similarity measures (mDSC and SDM) has proven to be the best option in terms of accuracy and smoothness of the transformation. We demonstrated that models with the given prior information, such as the MR prostate segmentation, significantly outperform the two-input models. Our approach enables efficient registration learning without the use of anatomical landmarks and requires far less ground truth data than the fully-supervised methods. Future work should also focus on the potential integration of the deep-learning-based intensity similarity metric [4] into the current setup to eliminate the bias imposed during the segmentation process, and explore the effects of DDF ensembling.
Acknowledgments
This work has been supported by the federal state of Saxony-Anhalt, Germany, within the framework of the postgraduate funding. We would like to thank all the colleagues and radiologists who have contributed their effort to this study.
Compliance with Ethical Standards
The retrospective analysis in this research study relies on fully anonymized treatment planning data. This work does not involve any studies with human participants or animals performed by any of the authors and is in line with the Declaration of Helsinki.
Conflict of Interest
The authors have no conflicts of interest to declare.
5. REFERENCES

[1] Prashanth Rawla, "Epidemiology of prostate cancer," World Journal of Oncology, vol. 10, no. 2, pp. 63, 2019.
[2] Natan Andrade, Fabio Augusto Faria, and Fábio Augusto Menocci Cappabianco, "A practical review on medical image registration: From rigid to deep learning based approaches," in SIBGRAPI Conference on Graphics, Patterns and Images, 2018, pp. 463–470.
[3] Andriy Fedorov, Reinhard Beichel, Jayashree Kalpathy-Cramer, Julien Finet, Jean-Christophe Fillion-Robin, Sonia Pujol, Christian Bauer, Dominique Jennings, Fiona Fennessy, Milan Sonka, et al., "3D Slicer as an image computing platform for the Quantitative Imaging Network," Magnetic Resonance Imaging, vol. 30, no. 9, pp. 1323–1341, 2012.
[4] Grant Haskins, Jochen Kruecker, Uwe Kruger, Sheng Xu, Peter A. Pinto, Brad J. Wood, and Pingkun Yan, "Learning deep similarity metric for 3D MR-TRUS image registration," International Journal of Computer Assisted Radiology and Surgery, vol. 14, no. 3, pp. 417–425, 2019.
[5] Julian Krebs, Tommaso Mansi, Hervé Delingette, Li Zhang, Florin C. Ghesu, Shun Miao, Andreas K. Maier, Nicholas Ayache, Rui Liao, and Ali Kamen, "Robust non-rigid registration through agent-based action learning," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 344–352.
[6] Yipeng Hu, Marc Modat, Eli Gibson, Nooshin Ghavami, Ester Bonmati, Caroline M. Moore, Mark Emberton, J. Alison Noble, Dean C. Barratt, and Tom Vercauteren, "Label-driven weakly-supervised learning for multimodal deformable image registration," in IEEE International Symposium on Biomedical Imaging (ISBI), 2018, pp. 1070–1074.
[7] Yipeng Hu, Marc Modat, Eli Gibson, Wenqi Li, Nooshin Ghavami, Ester Bonmati, Guotai Wang, Steven Bandula, Caroline M. Moore, Mark Emberton, et al., "Weakly-supervised convolutional neural networks for multimodal image registration," Medical Image Analysis, vol. 49, pp. 1–13, 2018.
[8] Anneke Meyer, Alireza Mehrtash, Marko Rak, Daniel Schindele, Martin Schostak, Clare Tempany, Tina Kapur, Purang Abolmaesumi, Andriy Fedorov, and Christian Hansen, "Automatic high resolution segmentation of the prostate from multi-planar MRI," in IEEE International Symposium on Biomedical Imaging (ISBI), 2018, pp. 177–181.
[9] Yi Wang, Haoran Dou, Xiaowei Hu, Lei Zhu, Xin Yang, Ming Xu, Jing Qin, Pheng-Ann Heng, Tianfu Wang, and Dong Ni, "Deep attentive features for prostate segmentation in 3D transrectal ultrasound," IEEE Transactions on Medical Imaging, vol. 38, no. 12, pp. 2768–2778, 2019.
[10] Nader Aldoj, Federico Biavati, Florian Michallek, Sebastian Stober, and Marc Dewey, "Automatic prostate and prostate zones segmentation of magnetic resonance images using DenseNet-like U-Net," Scientific Reports, vol. 10, no. 1, pp. 1–17, 2020.
[11] Daniel Rueckert, Luke I. Sonoda, Carmel Hayes, Derek L. G. Hill, Martin O. Leach, and David J. Hawkes, "Nonrigid registration using free-form deformations: application to breast MR images," IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712–721, 1999.
[12] Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 424–432.
[13] Eli Gibson, Wenqi Li, Carole Sudre, Lucas Fidon, Dzhoshkun I. Shakir, Guotai Wang, Zach Eaton-Rosen, Robert Gray, Tom Doel, Yipeng Hu, et al., "NiftyNet: a deep-learning platform for medical imaging," Computer Methods and Programs in Biomedicine, vol. 158, pp. 113–122, 2018.